Some Artificial Intelligence fun: Amazon’s Alexa speech recognition, preserving ESI … and the new platforms

21 March 2016 – Alexa is an Amazon smart product: a cloud-based, voice-driven service that runs on Amazon’s own AWS (Amazon Web Services) infrastructure. Alexa takes spoken commands, sends them to the cloud, converts them into digital directives, and sends those back to the device. The device then turns the directives into discrete actions, such as playing a song from a particular web service. The idea is that one talks to it to perform certain home automation tasks.
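For the technically curious, here is a rough sketch of that round trip in Python. Everything in it (the endpoint URL, the payload fields, the directive names) is an illustrative stand-in, not Amazon’s actual API:

```python
# Illustrative sketch of the capture -> cloud -> directive -> action loop
# described above. The URL and payload shapes are invented for this example;
# Amazon's real interface is not public in this form.
import json
import urllib.request

VOICE_SERVICE_URL = "https://voice.example.com/v1/recognize"  # hypothetical

def capture_utterance() -> bytes:
    # Stub: a real device would record microphone audio after the wake word.
    return b"raw-pcm-audio-bytes"

def ask_cloud(audio: bytes) -> dict:
    # Ship the raw audio to the cloud service, which does the heavy lifting
    # (speech recognition, intent resolution) and returns a small directive.
    req = urllib.request.Request(
        VOICE_SERVICE_URL,
        data=audio,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def execute(directive: dict) -> None:
    # The device itself only needs to map small directives to discrete actions.
    action = directive.get("action")
    if action == "play_song":
        print(f"Playing {directive['title']} via {directive['service']}")
    elif action == "set_thermostat":
        print(f"Setting thermostat to {directive['degrees']} degrees")
    else:
        print("Unrecognized directive; doing nothing.")

if __name__ == "__main__":
    # Offline demo with a canned directive instead of a live service call.
    execute({"action": "play_song", "title": "Example Song", "service": "Prime Music"})
```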

Admit it.  It is tough to punch the button on the stereo system. We are much too busy these days.

Craig Ball, a lawyer who writes about computer forensics and litigation, has an interesting perspective. He notes that using the Alexa app on his phone or computer, he can view a list of every interaction since Alexa first came into his life, and listen to each recording of the instruction, including background sounds.  He calls it “evidence”:

“As a lawyer and forensic examiner, I’m tickled pink that these new technologies bring with them a plethora of precise, objective evidence.  Never in the course of human history have we had so much precise, probative and objective evidence about human thinking and behavior.  Viewed the right way, more digital evidence should inspire and empower lawyers, not frustrate and intimidate them.”

Craig also notes that users need an effective, self-directed means to preserve and collect their own data when legal and regulatory duties require it. This is exactly what Zaid Al-Timimi (an Apple developer) has done with his “MashupApp”, which I first saw three years ago in a very rudimentary form and which my team is now beta testing in its more sophisticated edition. I will have a full write-up next month.
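On that preservation point, a minimal do-it-yourself sketch in Python: assuming the recordings have already been saved into a local folder (the “alexa_export” directory below is a hypothetical example, not an official export path), hashing each file into a manifest produces the kind of self-directed, authenticable collection Craig is describing:

```python
# Minimal self-directed preservation sketch: hash each exported recording
# and write a manifest so the collection can be authenticated later.
# The "alexa_export" folder is a hypothetical stand-in; this assumes the
# recordings have already been saved locally by whatever means available.
import hashlib
import json
import pathlib
from datetime import datetime, timezone

EXPORT_DIR = pathlib.Path("alexa_export")

def sha256(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

manifest = {
    "collected_at": datetime.now(timezone.utc).isoformat(),
    "items": [
        {"file": p.name, "bytes": p.stat().st_size, "sha256": sha256(p)}
        for p in sorted(EXPORT_DIR.glob("*.wav"))
    ],
}

pathlib.Path("alexa_manifest.json").write_text(json.dumps(manifest, indent=2))
print(f"Preserved {len(manifest['items'])} recordings.")
```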

Craig further notes that Alexa “only transmits and records what she hears when I call her name”.

Well, not quite.  According to a post by Lucy Bayly (“Amazon’s Alexa Went Bonkers, Reset User’s Thermostat”), one of the things Alexa apparently cannot do quite so well is determine who her master is. During a recent NPR broadcast about Alexa and the Echo, listeners at home noticed strange activity on their own Echo devices. Any time the radio reporter gave an example of an Alexa command, several Alexas across the country pricked up their ears and leapt into action — with surprising results. There you go. A smart device which is unable to figure out which human voice to obey.

And in one of the examples cited in the post:

“Listener Roy Hagar wrote in to say our story prompted his Alexa to reset his thermostat to 70 degrees,” wrote NPR on a blog recounting the tale.

Ummm … huh? Quoting Stephen Arnold who forwarded me the story:

“Smart devices with intelligence do not—I repeat—run into objects nor do they change thermostat settings. Humans are at fault. When one uses a next generation search system to identify the location of a bad actor, nothing will go wrong.”

Platforms: past, present and future

What I see from Amazon and Alexa is much larger, much more important: a new platform, in effect an invisible platform. Many have suggested that it was the internet, far more than Steve Jobs, that revived the fortunes of Apple, and I think there’s a lot to be said for that proposition. The internet grew to become an independent, “unmonopolizable” platform laid atop the then-existing personal computer platforms. The Macintosh — and later iOS and Android — could all access the internet just as much, and just as well, as Windows could.

There are now well over three billion active computing devices in the world, running five primary operating systems/computing platforms — Windows, OS X, iOS, Android, and AOSP (non-Google) Android running in China. The key point here is that, with the scale mobile adds and the inclusion of the global consumer market, there is no single standard computing platform.

The question then is: what will happen with things like virtual/augmented reality platforms or artificial intelligence platforms? Should we expect VR/AR or AI to unify around one single platform, as happened in the enterprise PC days, or will many different platforms coexist, as we see today in consumer computing?

I think the answer lies partly with what Amazon is doing. Technologists will tell you that the Echo is a front-end for Amazon’s Alexa. But what really matters is that beneath the façade of this slick new home automation device lies a remarkably clever and new type of platform: an invisible one. Bob O’Donnell, president and chief analyst of TECHnalysis Research, a fabulous technology consulting/market research firm (you can follow him on Twitter @bobodtech) explains it this way:

The Echo and its siblings have absolutely no screens—the primary means by which you interact with the device is your voice. (Yes, it has a remote, but it’s designed as a secondary input device). That means there is no dependence or reliance on any kind of visual cues. Everything happens through your voice, making the technology essentially imperceptible. In fact, it’s arguably one of the best examples of invisible technology to date.

In addition, there’s no traditional operating system running on Echo. Instead there seems to be some type of basic RTOS (real-time operating system) that merely serves as an on-ramp to the cloud-based Alexa voice services, which are the real engine for the Echo and its siblings.

This lack of a traditional OS does not mean that the device is limited, however. You can add capabilities through what Amazon calls “skills”—essentially a new type of application that “runs” on Alexa. Skills aren’t big, fancy, function-filled screens of software; they’re simple directives to do certain specific things or retrieve certain bits of information when you request them.

From a usage perspective, if you compare the voice-driven model of something like Echo to today’s smartphones and other mobile devices (like smart watches), the differences become glaring. Instead of just speaking a command, you need to find the right application on your smartphone’s display, launch it, and select a command inside the app, all while focusing exclusively on the screen and needing to physically touch the device. Yes, most mobile OS’s have started to add voice-based assistant features that are arguably similar to Alexa and its fast, hands-free means of interaction. However, they are still clearly secondary input methods on today’s mobile devices, not primary ones, and that distinction is extremely important. Given the inherently visual nature of human beings, it’s hard to imagine voice ever becoming a primary means of interaction on any device with a screen—we can’t help but want to look.
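To make Bob’s “skills” point concrete, here is a toy skill backend in Python, shaped like the JSON request/response envelope the Alexa Skills Kit uses. The intent name and the coffee-maker scenario are invented for illustration; only the envelope shape follows the Skills Kit convention:

```python
# Toy skill handler: receives the kind of JSON envelope the Alexa Skills Kit
# sends and returns a plain-text speech response. The intent name and the
# coffee-maker scenario are invented; only the envelope shape follows the
# Skills Kit convention.
def handler(event: dict, context=None) -> dict:
    request = event.get("request", {})
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {}).get("name")
        if intent == "GetCoffeeStatusIntent":  # hypothetical skill intent
            text = "Your coffee maker finished brewing five minutes ago."
        else:
            text = "Sorry, I don't know that one yet."
    else:  # e.g. a LaunchRequest when the user just opens the skill
        text = "Welcome. Try asking about your coffee maker."

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

if __name__ == "__main__":
    sample = {"request": {"type": "IntentRequest",
                          "intent": {"name": "GetCoffeeStatusIntent"}}}
    print(handler(sample)["response"]["outputSpeech"]["text"])
```

Note how little there is to it: no screens, no layouts, no navigation. A skill is, as described above, just a mapping from a spoken request to a specific action or bit of information.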

And one very important point:  you can’t do everything on an Echo that you can do on a smartphone. Nevertheless, you can do a lot of the most common and most important things. Bob notes it’s not difficult to imagine a time in the not-too-distant future when smartphones get relegated to being more specialized devices for specific tasks, in the same way that PCs fell into that more specialized role with the growing importance of smartphones.

Interesting point: just as PCs have not gone away, neither will smartphones, of course, but they will likely slip a tier or two in the pantheon of our digital device universe.

And I brought this up with the “app crowd” over the weekend at Codemotion. There is likely little need for more than a few hundred or perhaps a few thousand “skills” to be added to a voice-based system and the means for monetizing those skills are few to none. Skills are likely to be created as enablers for other connected devices or services, instead of being seen as an end unto themselves, as most mobile apps currently are. Perhaps this is just as well, because the current mobile app store ecosystem is clearly faltering under its own size and weight and seems ready to implode at any moment.

Back to Bob:

To Amazon’s credit, they recognize the platform potential of Alexa and are actively encouraging other device makers to use Alexa in their own hardware designs and to create add-on skills. This could allow Amazon to do an end-run around the existing ecosystem players, such as Apple, Google and Microsoft, and place it at the forefront of voice-based computing.

Most visions of the future imagine technology that seamlessly blends into our lives, gives us immediate access to all the world’s information, and helps make our lives easier. While the Amazon Echo family of devices may not completely fulfill all these requirements, it’s one of the clearest indicators of where personal technology is headed in the years to come.
