“Alexa, set the alarm for eight”, “Alexa, play the movie Oppenheimer” or “Alexa, tell me what the weather will be like during Easter.” All of these interactions with the smart speaker are recorded and available to any user who requests them from Amazon. That is what criminologist María Aperador did. To her surprise, she discovered that some of the audio clips were not preceded by the wake word, “Alexa”, and she reported it a few days ago in a video on TikTok and Instagram that has gone viral. How is this possible?
Amazon’s policy is clear on this point: no audio is stored or sent to the cloud unless the device detects the wake word. The company confirms this, adding that users can tell when Alexa sends a request to the cloud by a blue light indicator or a sound from the speaker.
With this in mind, David Arroyo, a CSIC researcher specializing in cybersecurity and data, offers an explanation: “The system is only activated when someone says the wake word. But, for various reasons, it can produce false positives. What we would have to see is to what extent it is robust against elements that distort the interpretation of that wake word.”
Machine learning systems for voice interpretation, such as those used in Alexa, Google, or Apple speakers, incorporate disparate elements to improve their performance. Even so, it is not an easy task. “These systems are designed to handle all the elements of variability in pronunciation,” says Arroyo, referring to different accents and ways of speaking, but also to changes in the resonance or reverberation of the room in which the device is located. “It would be necessary to know in detail the precision and false positive rate of the specific algorithm that Amazon uses.”
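To make those two metrics concrete, here is a minimal sketch in Python of how precision and false positive rate would be computed for a wake word detector. The counts are entirely hypothetical; Amazon does not publish these figures.

```python
# Hypothetical evaluation counts for a wake word detector (illustrative only;
# these are not real Amazon figures).
true_positives = 980     # wake word spoken and detected
false_positives = 15     # no wake word spoken, but the device activated anyway
false_negatives = 20     # wake word spoken, but missed
true_negatives = 99_000  # no wake word, no activation

# Precision: of all activations, how many were genuine?
precision = true_positives / (true_positives + false_positives)

# False positive rate: of all moments without the wake word,
# how often did the device activate anyway?
false_positive_rate = false_positives / (false_positives + true_negatives)

print(f"precision: {precision:.3f}")                      # ~0.985
print(f"false positive rate: {false_positive_rate:.5f}")  # ~0.00015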
EL PAÍS has spoken with María Aperador to learn a little more about the recordings, each lasting around six seconds. They are fragments of casual conversations, either hers or those of people who were in her house. The criminologist has not reviewed all of the more than 500 audio files that Amazon sent her, but among the roughly 50 she has listened to, she found two in which there was no wake word.
A study carried out by researchers from Ruhr University Bochum and the Max Planck Institute for Security and Privacy highlights how common accidental activations are in smart speakers. After analyzing 11 devices from eight different manufacturers, they documented more than 1,000 unintentional activations. “We are talking about voice recognition systems which, depending on how they are implemented, can work better or worse,” says Josep Albors, director of Research and Awareness at the cybersecurity firm ESET Spain, about the possibility of false positives.
How speakers detect the wake word
To activate when they hear ‘Alexa’, ‘Ok, Google’ or ‘Hey, Siri’, smart speakers have a system that constantly listens for that term. “In the end, they are devices that are constantly listening. But smartphones and many intercoms also do this. It is not exclusive to Alexa,” says Albors.
Arroyo makes the same assessment. “When you put the speaker on active standby, that means it is constantly absorbing what you are talking about. It doesn’t record it, but the algorithm is processing it, because it has to determine what words are being spoken.”
This algorithm works locally, on the device itself, searching for the acoustic patterns that correspond to the wake word. Amazon points out that its technology relies only on information from sound waves to detect the term. The company also notes that the speaker can be activated with a button instead, which avoids audio monitoring altogether. As for the recordings made when the device is activated, users can choose not to store them in their privacy settings.
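As a rough illustration of how this kind of on-device detection works, here is a minimal sketch in Python. It is not Amazon’s actual implementation: the `wake_word_score` model and the `on_wake_word` handler are hypothetical stand-ins for the acoustic pattern matcher and the cloud hand-off described above.

```python
import numpy as np

SAMPLE_RATE = 16_000          # audio samples per second, typical for speech models
WINDOW_SAMPLES = SAMPLE_RATE  # analyze one second of audio at a time
THRESHOLD = 0.8               # score above which the window counts as the wake word

def wake_word_score(window: np.ndarray) -> float:
    """Hypothetical on-device model standing in for the acoustic pattern
    matcher the article describes: returns the probability (0 to 1) that
    this window of audio contains the wake word."""
    return 0.0  # placeholder; a real device runs a small local model here

def on_wake_word(window: np.ndarray) -> None:
    """Placeholder for what happens after detection: the light comes on
    and the request is recorded and sent to the cloud."""
    print("Wake word detected: recording request")

def listen_loop(microphone_chunks):
    """Continuously score audio locally; nothing is stored or transmitted
    unless the score crosses the threshold."""
    window = np.zeros(WINDOW_SAMPLES, dtype=np.float32)
    for chunk in microphone_chunks:  # small blocks of raw audio samples
        # Slide the window: drop the oldest samples, append the new chunk.
        window = np.concatenate([window[len(chunk):], chunk])
        if wake_word_score(window) >= THRESHOLD:
            # A borderline score on similar-sounding speech is exactly the
            # kind of false positive the researchers describe.
            on_wake_word(window)
```

The key design point in this sketch is that all scoring happens locally inside `listen_loop`; audio only leaves the device once the threshold is crossed, and a threshold crossed by similar-sounding speech is what would produce a recording with no wake word in it.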
What’s the problem with this permanent wake word tracking? The two cybersecurity specialists agree that if the sound were processed to extract data beyond the keyword search, the privacy problems would be very serious. But they also agree that there is no evidence that this is the case. “There are strong incentives for this not to happen, because it would mean a loss of confidence in all these devices and very considerable economic damage for these companies,” says Albors.