Google iPhone and the Future of Machines That Listen


By JOHN MARKOFF

How do you talk to a search engine? In Googlish, of course.

Google’s new speech recognition service for the iPhone, which I wrote about last week and which was released on Monday, understands you most accurately when you speak to it just the way you enter queries into the Google search box. That makes sense, because the system’s accuracy comes from the billions and billions of typed queries that Google has recorded over the years.

Google’s voice search software for iPhones. (Peter DaSilva for The New York Times)

So don’t bother with polite formalisms like “What is the best pizza restaurant in San Francisco?” Simply say “best pizza restaurant San Francisco.”

After all, you’re talking to a dumb machine — or perhaps several, distributed across multiple states.

The accuracy is far from 100 percent, and probably not even 95 percent (Google execs demurred when I asked if they had any meaningful accuracy statistics). My experience is that it captures your voice query correctly substantially more than half the time, and that in itself is a revelation. It also makes the usual sampling of funny mistakes. (My favorite was my inability to get it to recognize “Camp Unalayee,” which I attended as a teenager. It would usually respond “Camp Ukulele.” But heck, unalayee is a Cherokee word that means “place of friends,” and ukulele is in the dictionary.)

Yet after five days of using the service it still seems better than any speech recognition system I have used to date. It may even signify an inflection point — speech recognition that is more useful than typing.

I was initially intrigued by the Google Mobile App because I have been following the progress of speech recognition research since the early 1980s. Progress in this field feels like watching paint dry. Yet the industry’s visionaries have been unanimous in saying that we will talk to machines — and they will understand us — someday.

It was probably in 1983 that researchers at SRI International demonstrated how they could control simulated battleships with voice commands (“go left,” “go right,” “stop,” that sort of thing). Evolution has been slow because it turns out that recognizing speech is a really, really hard problem. There are all the complexities of language, plus accents and background noise.

In the past decade, however, progress has accelerated. The stakes are very high and there are a number of big and small players. The search giants Google, Microsoft and Yahoo all believe speech recognition is a prerequisite for the era of mobile computing. And there are lots of others including I.B.M., Nuance and Vlingo that are developing speech technology.

Although Microsoft hasn’t dominated in this area yet, the company has been investing heavily in research in the field going back to the 1980s. Last year it spent close to $1 billion to acquire Tellme Networks, a company based in Silicon Valley that supplies speech recognition for the phone directory and operator assistance market.

“You want to be able to interact with your phone just like you would with your mom or friends,” said Dariusz Paczuski, senior director for consumer services at Tellme. “Voice is a great interface and it can simplify interactions more than anything.”

Everyone agrees that in mobile applications, speech is the obvious user interface. Whether it’s on a BlackBerry, an Android phone or an iPhone, typing will always be error-prone and frustrating.

If one company makes a major breakthrough in voice, it is potentially a major threat to its rivals, because a “speech interface” could potentially allow one company to simply take over a handheld device developed by another company.

For some time we seem to have been stuck at the stage where speech recognition works, but just sort of. Perhaps we are at a moment like the one when A.T.M.’s were first introduced. At first most people said they preferred interacting with a human bank teller. Then, overnight it seemed, everyone realized that the bank teller relationship wasn’t all it was cracked up to be. Now most of us never set foot inside a bank. How long before people find that it is more efficient to deal with a robot on the phone than a human?

Enough with the future-gazing. Right now there is something compelling about saying “backpacking trails Trinity Alps California,” and being taken directly to a Web site listing all of the best ones.
