Your speech-to-text systems could be secretly listening to another’s commands, UC Berkeley study finds

Amanda Ramirez/Staff

Your voice-commanded systems, such as Siri and Alexa, could be secretly listening to someone else’s commands without your knowledge, according to a recent computer security study by UC Berkeley researchers.

These “targeted audio attacks” could be disguised as regular music, but they transmit inaudible messages to speech-to-text recognition systems. The researchers — fifth-year computer science doctoral student Nicholas Carlini and computer science professor David Wagner — found that these systems are able to transcribe and follow secret messages without humans’ knowledge.

“What we’re doing is exploiting the difference between what the human is hearing and what the device is hearing,” Carlini said. “We haven’t modified the devices to do anything wrong. They already were able to do this. We’re just demonstrating that it is possible.”

For the project, Carlini said they used Mozilla’s “DeepSpeech,” a speech-to-text program very similar to Siri and Alexa. To generate the audio attacks, they took an audio sample and developed a new sample that was more than 99.9 percent similar to the original but was designed to communicate a secret phrase to the voice-commanded system.

Carlini said the secret phrase could sound like “Okay Google — browse,” and for each trial, they modified the audio sample and measured how accurately the system transcribed this secret phrase.

These voice-commanded systems are applications of machine learning, in which computer systems are programmed to think and follow commands without explicit instruction to do so, according to Carlini. For his general area of research, Carlini said he focuses on the intersection of machine learning and computer security, surveying applications of machine learning and various security vulnerabilities.

Tavish Vaidya, a fifth-year doctoral student in computer science at Georgetown University, worked with Carlini on a different project regarding voice-commanded systems, in which the two used white noise to transmit secret messages to Google’s speech-to-text systems.

According to Vaidya, it is important to examine the security of these systems because they are “ubiquitous” and present in almost all smart devices, such as smartphones, tablets, computers, security systems and in-home assistants.

“This is a step forward in better understanding the security aspect of speech recognition systems and its applications,” Vaidya said in an email. “I am hopeful that Nicholas’s paper will also create further interest in the community to understand vulnerabilities of speech-to-text and come up with robust defenses.”

Carlini said they will present the paper’s findings at the 1st Deep Learning and Security Workshop on May 24. This research, according to Carlini, is especially relevant for those who care about machine learning, because it highlights what is unknown about these devices.

“We don’t understand what these machines are doing and … someone who’s malicious can exploit this fact to make them do something incorrect,” Carlini said. “In systems when they can actually perform some action on your behalf, you have to be careful that someone is not controlling them without your knowledge.”

Contact Malini Ramaiyer at [email protected] and follow her on Twitter at @malinisramaiyer.