CKGSB Knowledge

Look Ma, No Hands: The Dawn of Vocal Computing

by Bennett Voyles

March 6, 2018

Google Home and Amazon’s Alexa have been catching on very quickly. Google reports that since October it has sold one voice-controlled speaker every second. While this could just be a fad, some analysts argue that voice-activated speakers may mark the biggest shift in consumer technology since the smartphone.

In this article, we’ll look at what voice control changes, what new opportunities voice recognition might create, and whether – Amazon and Google notwithstanding – vocal computing will offer any special opportunities for Asian technology businesses. Finally, we’ll look at the surprising things that computers may listen for next, now that they understand the words we are using.

Culturally, Grant McCracken argues, vocal computing fits right into the zeitgeist. “This technology surely appeals to our vanity, as so much does these days in a selfie culture. If not as gods, we are as if admirals. We only have to murmur something to ‘make it so,’” says the New York-based anthropologist and popular culture consultant.

However, voice recognition may do more than unleash our internal master and commander. “Humans don’t really communicate that effectively using text,” says Richard Watson, futurist-in-residence at Imperial College London. Vocal computing should speed up a lot of queries, given that most people can speak much faster than they can type.

Goodbye, GUI?

Some analysts have argued that Alexa, Google Home, and Apple’s Siri may be the beginning of the end of the graphical user interface, which has been the main channel of human-to-computer communication for nearly four decades.

“In a few years, it will feel completely natural to handle the bulk of our computing interactions via voice. And the generation born now will grow up thinking that typing in words to obtain value from the worldwide network is as antiquated as using flints to make fire,” predicted Brett Durrett, a partner in Social Starts, a US venture capital firm, in an October blog.

At the same time as voice activation changes our relationship with the computer screen, it may change our relationship with space as well. David Weinberger, senior researcher at Harvard University’s Berkman Klein Center for Internet & Society, says he can imagine a future where we will be able to speak anywhere and get a response from a networked computer, either from an Alexa-like receiver or from our own portable device.

“However it happens, the effect might be a sense that we are living in spaces that we can address,” Weinberger says. “You’re not talking to Alexa or Siri, etc. You’re asking your supermarket where the dental floss is and if there are any types of apples on sale. You’re not talking to a kiosk, you’re asking New York City why there are statues of lions in front of the library and where the nearest kosher Indian restaurant is. Places will have voices.”

Voice may also expand access to the traditional Internet, particularly for speakers of Asian languages. Subodh Kumar, co-founder and CEO of Liv.ai, a Bangalore startup that can convert eight Indian languages plus English to text, argues that multilingual voice-to-text functionality will greatly expand Asians’ ability to use the Internet.

For Indians who don’t speak English, perhaps 90% of the population, voice recognition will make Internet access much easier, Kumar argues. Keyboards have never been a natural fit for the culture, he says. Typing was designed with European languages in mind, not Indian ones: Hindi alone has 20 vowels and 45 consonants, and some of the other subcontinental languages are even more complex.

And thanks to text-to-voice translation, even being able to read any language will no longer be a prerequisite for using the Internet, in Kumar’s view. “Anybody with a smartphone is now a literate person,” he says.

Perhaps in part because of the special advantages the technology promises to the Asian market, voice technology may mark a sea change in tech leadership. No longer “designed in California and made in Asia,” as Apple’s shipping boxes used to say, voice may be an area where Asian companies take the lead.

In addition to Indian startups such as Liv.ai, a number of Chinese companies are also very active in vocal computing. Both startups such as iFLYTEK and established giants such as Baidu and Alibaba are staking claims in this new territory.

How soon voice takes over as the dominant mode of connection is anyone’s guess, but it does seem to be moving quickly. Some analysts say that the technology has made more progress in the last 30 months than in the prior 30 years. The word recognition accuracy of Google’s machine learning, for instance, has improved astonishingly fast, from 77% in 2013 to 95% today, according to the company. That is roughly the same level of comprehension as the average human being.

Analysts at Research & Markets, a US research firm, predict that by 2023 the total voice recognition market could top $126.5 billion, spurring growth in consumer electronics, automotive, home security and other sectors, particularly if the technology becomes further integrated with artificial intelligence and deep learning programs.

Hey Google, what next?

Now that computers can understand us, some companies are moving on to the next challenge: teaching the machines to comprehend not only what we say, but also what we mean. One startup, the Tel Aviv company Beyond Verbal, is trying to teach computers how to interpret and respond to varying tones of voice.

“We envision a world in which personal devices understand our emotions and wellbeing, enabling us to become more in tune with our selves and the messages we communicate to our peers,” said Yuval Mor, the company’s CEO, in a statement on the company’s website.

Based on an analysis of recordings of more than 70,000 subjects speaking more than 30 languages, the company’s technology is programmed to extract moods, attitudes, and personality from different vocal intonations. A demonstration of the company’s technology, an app called Moodies, is available on Apple’s App Store. Speak for 20 seconds and the software will give you a reading of your mood.

(But will we want machines that know us so well? When I read the previous paragraph aloud to Moodies, for example, the program detected in my voice a desire to win attention and take control, but also shyness and introversion, and perhaps some internal struggle or interpersonal conflict.)

In addition to monitoring our day-to-day moods, the Beyond Verbal team is working to keep us healthy as well. The firm is collaborating with the Mayo Clinic, the Rochester, Minn., medical research institute, to develop tools that enable machines to diagnose a variety of conditions, including bipolar disorder, Parkinson’s, Huntington’s disease, and even cardiovascular disease, simply by detecting certain distinctive qualities and speaking patterns in a person’s voice.
