In the 1980s there was a TV show called Knight Rider, in which Michael Knight, a vigilante, would fight bad guys together with his car K.I.T.T. The thing that made this show special to me was the car. Mr Knight could talk to it, and it would understand what he said and meant and respond meaningfully, sometimes throwing in a witty remark. That gave the car a personality; it was the co-star of the show.
Then in 2011, Apple released Siri: an assistant you can ask certain things, like "What is the weather going to be like tomorrow?", and, just like the car K.I.T.T., it responds with the correct information; in this case, tomorrow's weather based on your current location. If I ask Siri "What is THE answer?", it sometimes responds with the number 42, which is, for nerds and geeks, a pretty witty answer (it is THE answer to THE question in The Hitchhiker's Guide to the Galaxy). So Siri, too, seems to me to have personality; it answers questions with a certain flavor. For me, in 2011, it was the very first time you could ask almost anything of a device (a mobile phone) and it would try to give a smart, witty answer.
So is talking to your phone, and having it actually understand what you mean and respond, something we are going to see more of in the future, or is it just hype?
I think it is going to be huge and is here to stay. Here is an example why. Compare the following: how many touch interactions and seconds does it take to create a new appointment for tomorrow with my dentist in my home town? By touching and clicking on the interface it takes me about 30 seconds. When I ask the phone to make the appointment, it takes me 4 seconds. That is a lot faster. The only thing I needed to do was say the line "Make an appointment with my dentist in my home town tomorrow at nine o'clock". Whereas, when using touch, I needed to create a new appointment, type in the words "Dentist" and "home town", set the time and then save.
Here are other examples where voice is much faster than touch:
- Banking app - "Transfer 300 Euros from my savings account to my wife's payment account": 5 seconds for voice, >15 seconds for touch
- Home heating app - "I'll be home 2 hours early, set the temperature to 19 degrees Celsius": 4 seconds for voice, >10 seconds for touch
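Each of these spoken commands boils down to an intent (transfer money, set the temperature) plus a few parameters (amount, accounts, degrees). As a toy illustration of that idea, here is a sketch in Python that maps an utterance to an intent with simple patterns. The pattern list and function names are made up for this example; a real natural language processing system is far more robust than keyword matching.

```python
import re

# Toy intent patterns: (intent name, regex, parameter names).
# These are illustrative only, not a real NLP component.
PATTERNS = [
    ("transfer_money",
     re.compile(r"transfer (\d+) euros? from my (.+?) account to my (.+?) account", re.I),
     ("amount", "from_account", "to_account")),
    ("set_temperature",
     re.compile(r"set the temperature to (\d+) degrees", re.I),
     ("degrees",)),
]

def parse(utterance):
    """Return (intent, parameters) for the first matching pattern."""
    for intent, pattern, slot_names in PATTERNS:
        match = pattern.search(utterance)
        if match:
            return intent, dict(zip(slot_names, match.groups()))
    return None, {}

intent, slots = parse("Transfer 300 Euros from my savings account "
                      "to my wife's payment account")
print(intent, slots)
# → transfer_money {'amount': '300', 'from_account': 'savings',
#                   'to_account': "wife's payment"}
```

The speed advantage of voice comes from exactly this: one sentence carries the whole intent and all its parameters at once, instead of one tap per field.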
Next big thing...
So why is it going to be the next thing in user interfaces? Because, next to the fact that it is more personal (the device can respond like a human would) and more natural (you say things the way you would say them to another human), it is also a much faster way to interact with your device. Touch has made graphical user interfaces much simpler to use (even toddlers know how to swipe to the next photo), but with speech, a person can ask a mobile phone for something complex within a few seconds and the device will execute the task. Google is also catching on and is launching Assistant, aka Siri for Android, at the end of the year, meaning voice-navigated apps are becoming mainstream.
In an upcoming post I'll dive deeper into the details of how you can actually start coding up your own voice-navigated app using automatic speech recognition (ASR), natural language processing and text-to-speech (TTS) technologies. Because while it might sound really simple, actually hearing and understanding what humans say is really hard.
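To give a rough preview of how those three technologies fit together, here is a minimal sketch of the pipeline in Python. All three functions are stand-in stubs invented for this post (no real speech library is used); the point is only the flow: audio in, transcript, intent, action, spoken reply.

```python
def speech_to_text(audio):
    """ASR stage: turn raw audio into a text transcript (stubbed here)."""
    return "make an appointment with my dentist tomorrow at nine"

def understand(transcript):
    """NLP stage: map the transcript to an intent plus parameters (stubbed)."""
    if "appointment" in transcript:
        return {"intent": "create_appointment", "when": "tomorrow 09:00"}
    return {"intent": "unknown"}

def text_to_speech(text):
    """TTS stage: speak the response back to the user (stubbed as print)."""
    print(f"(speaking) {text}")

def handle(audio):
    """One round trip: listen, understand, act, respond."""
    transcript = speech_to_text(audio)
    result = understand(transcript)
    if result["intent"] == "create_appointment":
        text_to_speech(f"OK, appointment created for {result['when']}.")
    else:
        text_to_speech("Sorry, I did not understand that.")

handle(b"...raw audio bytes...")
```

In a real app, each stub is replaced by a hard problem in its own right, which is exactly why the follow-up post is needed.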