Researchers at Zhejiang University have developed AudioHijack, an attack method that embeds imperceptible commands in audio to manipulate large audio-language models with a 79–96% success rate. The attack was presented at the 47th IEEE Symposium on Security and Privacy in San Francisco. AudioHijack works by modifying the numerical sample values of a digital audio waveform in ways that human listeners cannot perceive but that still change how AI models interpret the signal. According to the research, the manipulated audio can override or redirect a model's behavior even when legitimate user instructions accompany the clip.
"It takes just half an hour to train this signal, and then, because this signal is context-agnostic, you can use it to attack the target model whenever you want, no matter what the user says," said Meng Chen, lead author and Ph.D. student at Zhejiang University.
How AudioHijack Differs from Traditional Attacks
AudioHijack differs from traditional prompt injection attacks because it does not manipulate what the user says to the AI. Instead, it alters the audio signal itself, embedding hidden instructions as changes to the sound that humans cannot hear. This approach makes the attack harder to defend against because it bypasses safeguards designed to detect suspicious text prompts.
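To make the idea of altering the signal itself concrete, the sketch below shows how small, bounded changes to a waveform's sample values leave the audio numerically almost identical to the original. It is only an illustration and not the researchers' method: the function names, the 440 Hz test tone, the bound `epsilon`, and the use of random noise in place of a trained adversarial signal are all assumptions made for this example.

```python
import math
import random


def sine_wave(freq_hz, duration_s, sample_rate=16000, amplitude=0.5):
    """Generate a mono sine tone as a list of floats in [-1, 1]."""
    n = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def add_bounded_perturbation(samples, epsilon, seed=0):
    """Shift every sample by an offset in [-epsilon, epsilon], then clip.

    Assumption: random noise stands in for an optimized adversarial
    signal. A real attack would compute the offsets by optimizing
    against a target model instead of drawing them at random.
    """
    rng = random.Random(seed)
    perturbed = []
    for s in samples:
        shifted = s + rng.uniform(-epsilon, epsilon)
        perturbed.append(max(-1.0, min(1.0, shifted)))
    return perturbed


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in decibels between two equal-length clips."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((p - c) ** 2 for c, p in zip(clean, perturbed))
    return 10 * math.log10(signal_power / noise_power)


clean = sine_wave(440.0, 0.1)
perturbed = add_bounded_perturbation(clean, epsilon=0.002)
max_change = max(abs(p - c) for c, p in zip(clean, perturbed))

print(f"largest per-sample change: {max_change:.6f}")
print(f"signal-to-noise ratio: {snr_db(clean, perturbed):.1f} dB")
```

A high signal-to-noise ratio only indicates that the change is small relative to the signal. It does not by itself show that a listener would fail to notice it, and it says nothing about how a model would respond to the modified clip.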
Capabilities and Tested Systems
Researchers tested AudioHijack on 13 open-source AI voice models and found it could make them refuse requests, spread false information, insert harmful links, change personality, or perform actions the user never asked for, including running web searches, downloading files, and sending emails containing personal data. The attacks also worked on commercial voice AI systems from Microsoft and Mistral that use similar technology.
Delivery Methods
Possible delivery methods include online videos, music clips, voice notes, or audio from Zoom calls uploaded to AI transcription services. In unpublished follow-up work, the team also demonstrated similar attacks in live AI voice chats.
Defense Limitations
Monitoring a model's internal attention mechanisms was the most effective defense the researchers tested. However, they also found that attackers aware of the defense could reduce the strength of the manipulation while maintaining much of the attack's effectiveness.
"These single-point defenses struggle to resist our attack because we found it's very hard for these models to distinguish the normal user intent and our adversary attack," Chen said.
According to the study, the researchers are investigating whether the technique can reach closed models from OpenAI and Anthropic through shared open-source audio components.