- AT&T Watson comes from different roots than Google’s Voice Actions or Apple’s Siri. Its API faces mobile apps, but it could transcend them.
- AT&T’s per-use pricing for Watson may turn some developers away, but AT&T can win with a premium pitch that attracts developers whose business models generate actual revenue.
Just as it said it would, AT&T released its first API function tapping its Watson speech engine in June. Starting with a one-time $99 charge (and a rate of about $0.01 per transaction starting in 2013), developers can access the network-based voice recognition service and incorporate it into their apps. Only one function call is currently available: speech-to-text. That, however, is the foundation of successful voice recognition applications such as Apple’s Siri, Samsung’s S Voice, and Nuance Communications’ Vlingo mobile app (and by no coincidence, Vlingo is a Watson licensee). Voice dictation, when captured and interpreted accurately, can transcribe and send SMS, launch Web searches and directory searches, or search through specific documentation or help pages, such as a software manual.
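A network-hosted speech-to-text call of this kind can be sketched as a simple HTTP POST of recorded audio. The endpoint URL, header names, and JSON response shape below are illustrative assumptions, not AT&T's documented interface; the article establishes only that a single RESTful speech-to-text function exists.

```python
# Sketch of submitting recorded audio to a network-hosted speech-to-text
# REST endpoint. The URL, headers, and JSON reply shape are assumptions
# for illustration -- a real integration must follow the vendor's docs.
import json
import urllib.request

API_URL = "https://api.att.com/speech/v1/speechToText"  # hypothetical endpoint


def build_transcription_request(audio_bytes: bytes, access_token: str) -> urllib.request.Request:
    """Assemble an HTTP POST that submits one audio sample for transcription."""
    return urllib.request.Request(
        API_URL,
        data=audio_bytes,
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed auth scheme
            "Content-Type": "audio/wav",                # raw speech sample
            "Accept": "application/json",               # transcription as JSON
        },
        method="POST",
    )


def parse_transcription(response_body: str) -> str:
    """Pull the top transcription hypothesis out of an assumed JSON reply
    shaped like {"hypotheses": [{"transcript": "..."}]}."""
    reply = json.loads(response_body)
    return reply["hypotheses"][0]["transcript"]
```

Actually sending the request would be a call to `urllib.request.urlopen`; that step is omitted here because both the endpoint and the credential flow are assumptions.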
The base Google Android platform has a built-in voice recognition app that can kick off a Web search or other actions; Google itself has developed Google Voice Actions for Android using the platform. Unsurprisingly, in the wake of Siri, the number of smart agent app contenders building on top of Google’s speech recognition feature has exploded. Developed apps include Siri anagrams Iris and Risi, as well as Jeannie, Skyvi, Andy, Kiri, Alice, Omega, and plenty of others.
Apple’s Siri has propelled voice smart agents into the mainstream; both Google and Apple are racing to enhance their offerings; and groups of developers large and small have barreled into the sector. What can AT&T Watson bring to this quickly growing segment? Part of the answer lies in AT&T’s belief in Watson’s high quality and performance, specifically the application’s high-accuracy recognition rate and ability to support high volumes of requests.
AT&T Watson may also have an advantage in its development outside a mobile app context. Once the RESTful API for Watson is joined by HTML5 and Microsoft SDK support, developers could embed its voice recognition in other types of devices, even inside Web pages. Among other purposes, Watson has powered interactive voice response (IVR) for contact centers, accepting natural dialogue instead of canned “press 1 or say ‘yes’” menus; it also has played a role in speech analytics, as well as near-real-time language translation. For the latter, AT&T already released a free English/Spanish AT&T Translator app for Android and iOS, performing near-real-time translation (internally, speech-to-text-to-speech) for telephone conversations. That network-hosted translation capability could be adapted to the audio portion of a video conference: for example, letting executives speaking different languages meet face-to-face and, if willing to accept translation latency, speak directly with each other.
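The speech-to-text-to-speech flow the Translator app uses can be sketched as a chain of three stages. The stage functions below are placeholders, not AT&T's API; the sketch only shows how recognition, translation, and synthesis compose into one audio-in, audio-out pipeline.

```python
# Minimal sketch of the speech -> text -> translated text -> speech
# pipeline described in the text. Each stage is a placeholder callable;
# a real deployment would back each one with a network-hosted service.
from typing import Callable


def make_translating_pipeline(
    speech_to_text: Callable[[bytes], str],
    translate: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain the three stages into a single audio-in, audio-out function."""
    def pipeline(audio_in: bytes) -> bytes:
        text = speech_to_text(audio_in)    # recognize the spoken utterance
        translated = translate(text)       # e.g. English -> Spanish
        return text_to_speech(translated)  # synthesize in the target language
    return pipeline
```

With stub stages substituted for the real services, the pipeline's per-utterance latency is simply the sum of the three stage latencies, which is the "translation latency" trade-off the video conference scenario above would have to accept.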
How can AT&T charge money for voice recognition if others are giving theirs away for free? If AT&T can demonstrate that Watson is a premium quality, easily embedded service, it could find a place in high-value segments such as information kiosks, or maybe premium shopping apps. AT&T Watson itself will not be the next Siri or Google Voice Actions, but it might be an enabler for other developers that need a cross-platform way to embed speech recognition into their own devices.