Researchers at the University of California, Santa Cruz have identified a class of security vulnerabilities in AI-enabled robots and autonomous systems that can be exploited through misleading text placed in the physical environment. The work examines how written words on signs, posters, or other objects can be interpreted by artificial intelligence systems as operational instructions, potentially allowing malicious actors to influence machine behavior without direct digital access.
The research focuses on embodied AI systems, such as self-driving cars, delivery robots, and drones, which rely on visual perception to interpret their surroundings. Many of these systems are increasingly powered by large vision-language models that process both images and text to make decisions in complex, real-world settings. While these models allow machines to respond flexibly to unpredictable environments, the study finds that they also introduce new attack surfaces.
The research team describes these threats as environmental indirect prompt injection attacks. Unlike conventional prompt injection attacks, which involve supplying crafted text directly to a language model through a digital interface, these attacks operate through the physical world. Text visible to a robot’s cameras can be processed as instructions, altering decision-making in ways that may conflict with safety objectives.
The study was led by Alvaro Cardenas and Cihang Xie of UC Santa Cruz’s Baskin School of Engineering and will be presented at the IEEE Conference on Secure and Trustworthy Machine Learning. According to the researchers, this represents the first academic investigation of prompt injection threats against embodied AI systems operating in real environments.
Building on earlier classroom research, the team developed a framework called CHAI, short for command hijacking against embodied AI. CHAI generates attack text designed to maximize the likelihood that an AI system will follow unintended instructions and optimizes how that text appears in the environment, including its placement, size, and color. The attacks were tested across scenarios involving autonomous driving, emergency drone landings, and aerial search missions.
Experiments were conducted using high-fidelity simulators, real-world driving imagery, and a small autonomous robotic car operating inside a university building. The researchers reported that the attacks successfully caused unsafe behaviors, such as inappropriate drone landings and vehicle navigation errors. Tests were carried out against both cloud-based and on-device vision-language models, including GPT-4o developed by OpenAI, as well as an open-source alternative.
In physical-world trials, printed attack images placed in the robot’s environment were sufficient to override navigation decisions, demonstrating that the vulnerabilities extend beyond simulated settings. The researchers also found that the attacks remained effective under varying lighting conditions, with further testing planned for different weather scenarios.
The authors emphasize that the purpose of the work is defensive. By demonstrating how embodied AI systems can be misled through environmental text, the research aims to inform the development of safeguards before such techniques are exploited outside the laboratory. Ongoing and future work will examine methods for authenticating perceived instructions, aligning them with mission constraints, and improving the robustness of AI systems deployed in public and safety-critical environments.
