Do you have Siri on your iPhone, or Google Assistant on your Samsung Galaxy? You may even have a Google Home or Amazon Alexa in your living room. Machine learning at its best!
Ever wonder who else is listening in?
Machine learning and voice tech
Voice recognition technology, like image recognition and self-driving cars, works through the magic of machine learning. ‘Machine learning’ is a bit of a misnomer, though. It’s really a mathematical construct inspired by brain cells. In machine ‘learning’, input is linked to output using a neural net, which is really just math. See this post for details.
Artificial neural nets have to be trained, using more math, the way we need to train our brains. Hence ‘machine learning’. First you have to choose sensible input to use, which can be hard. Then you initialize your neural net with random weights and start processing a large amount of training data, adjusting the weights to improve the results. If your training data is large enough and covers enough cases, your neural net will work.
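Here’s that training loop in miniature. This is a toy sketch, nowhere near a real voice model: a single artificial neuron learning the logical OR function, starting from random weights and nudging them until its output matches the training data.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    # Squashes any number into the range 0..1, so the neuron's
    # output can be read as 'how confident am I the answer is 1?'
    return 1.0 / (1.0 + math.exp(-x))

# Training data: inputs and the output we want the net to learn (logical OR).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

# Step 1: initialize with random weights.
weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)

# Step 2: process the data many times, adjusting the weights a little
# whenever the prediction is off.
for _ in range(5000):
    for inputs, target in data:
        prediction = sigmoid(weights[0] * inputs[0]
                             + weights[1] * inputs[1] + bias)
        error = target - prediction
        # Nudge each weight in the direction that reduces the error.
        for i in range(2):
            weights[i] += 0.5 * error * inputs[i]
        bias += 0.5 * error

# After training, the rounded output matches the OR table.
for inputs, target in data:
    out = sigmoid(weights[0] * inputs[0] + weights[1] * inputs[1] + bias)
    print(inputs, round(out))
```

A real ‘hey Siri’ detector works the same way in principle, just with millions of weights and audio features instead of two inputs.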
For example, all those traffic lights and buses you’re clicking in Google Captchas? That’s you helping to train a machine learning algorithm for image recognition.
For voice technology, the input of the network is a stream of audio and the output is recognized words or commands.
As a simple example, let’s take Siri. To activate Siri, you need to say ‘Hey Siri’. Apple trained the Siri neural net with a gazillion audio fragments of people saying ‘Hey Siri’ and a gazillion other audio fragments of people _not_ saying ‘Hey Siri’. Then, when you initialize Siri, you train it some more so it will recognize your specific voice – at least, that’s my understanding of it.
So, when you say ‘hey Siri’, the iPhone will recognize that and start up Siri.
Great. So, end post?
It would stop there, but let’s step back for a second. In order for the phone to recognize ‘hey Siri’, it has to process audio first. All the audio. So, when you turn on Siri, your phone starts listening to you all the time.
Okay, we could live with that. However, that’s not the end of it. When the phone does recognize ‘hey Siri’, it sends the commands that follow to the Apple cloud to be parsed. That wouldn’t be so bad if the phone picked up the sentence correctly. Unfortunately, it doesn’t – not in all cases, anyway. So, whenever your phone thinks it hears ‘hey Siri’, it starts sending the audio that follows into the cloud.
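The flow so far can be sketched in a few lines of Python. Everything here is made up for illustration (the detector, the cloud call, the audio chunks; Siri’s real internals are not public), but it shows why a false positive ships your next sentence to the cloud:

```python
# A toy model of the always-listening pipeline. All names here are
# hypothetical -- this is a sketch of the idea, not Apple's code.

def wake_word_detector(chunk):
    # Stand-in for the on-device neural net. Like the real thing, it
    # fires on anything close enough to 'hey Siri' -- including
    # false positives such as 'hey syria'.
    return chunk.startswith("hey s")

sent_to_cloud = []

def send_to_cloud(chunk):
    # In reality, this would stream audio to Apple's servers for parsing.
    sent_to_cloud.append(chunk)

# The phone has to process *all* audio, chunk by chunk.
audio_stream = ["traffic noise", "hey syria", "private chat",
                "hey siri", "what's the weather"]

awaiting_command = False
for chunk in audio_stream:
    if awaiting_command:
        send_to_cloud(chunk)  # whatever follows the trigger goes out
        awaiting_command = False
    elif wake_word_detector(chunk):
        awaiting_command = True

print(sent_to_cloud)
```

Run it and you’ll see ‘private chat’ ends up in the cloud alongside the genuine weather question, because ‘hey syria’ tripped the detector.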
Crap. Well, that’s not so bad, is it?
Oh, but it gets worse. You see, since the field of voice assistants is still emerging, a company like Apple tweaks its assistant all the time. To do that, they’ve put provisions in their end-user licenses that allow them to listen to the audio you send them. Uh-oh.
Yep, Apple employees and even sub-contractors listen to the audio your phone sends to Apple when the Siri neural net thinks it recognizes a ‘hey Siri’.
Oh, but we’re not done. If you use an Apple Watch, you’re even worse off. An Apple Watch doesn’t use a wake phrase; it starts recording whenever you raise your wrist to your face. Ever seen a colleague in a business meeting yawn with an Apple Watch on?
Boo, hiss, big tech bad
Boo, hiss, Apple. Unfortunately, they’re not alone: Google, Microsoft, and Amazon have admitted to doing the same thing. And Microsoft has crap security to boot. So, yeah, your Windows 10 Cortana might have sent audio of you having sex on the couch to a Chinese developer who accidentally leaked it to the wide world, and your Alexa shared that fight with your wife. Oh, and that drug deal you were doing…
The bottom line: machine learning can do awesome things. It can find cancer in X-rays, drive cars for us, and produce deep fakes. Some of it’s good, some of it’s bad, and some of it’s downright terrifying.
Maybe a government should step in and put down some rules?