
Google DeepMind Launches Gemini Robotics-ER 1.6 with Improved Spatial Reasoning and Instrument Reading

by Marco van der Hoeven

Google DeepMind has released Gemini Robotics-ER 1.6, an updated version of its embodied reasoning model designed to give robots a more precise understanding of their physical environments. The model is available to developers today through the Gemini API and Google AI Studio.

The release focuses on three core capability areas: pointing-based spatial reasoning, multi-view success detection, and a newly introduced instrument reading function.

Spatial reasoning and pointing

Pointing — the model’s ability to identify and locate objects within an image — serves as a building block for more complex reasoning tasks. Gemini Robotics-ER 1.6 uses points as intermediate steps to count objects, define spatial relationships, map movement trajectories, and identify grasp points. Google says the updated model can also correctly decline to point to objects that are not present in a scene, reducing hallucinations compared to its predecessor.
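The article does not specify the output format for pointing. As a minimal sketch, assume the model returns a JSON list of labelled points with [y, x] coordinates normalized to 0-1000, the convention described in documentation for earlier Gemini Robotics-ER releases and unconfirmed here for 1.6. Under that assumption, the following shows how points could serve as an intermediate step for counting. The helper names to_pixels and count_by_label are hypothetical, and the sample response is invented for illustration.

```python
import json
from collections import Counter

# ASSUMPTION (not confirmed by the article): the response is a JSON list of
# {"point": [y, x], "label": str} with coordinates normalized to 0-1000.


def to_pixels(points_json, width, height):
    """Convert normalized [y, x] points into (label, x_px, y_px) tuples."""
    results = []
    for item in json.loads(points_json):
        y_norm, x_norm = item["point"]
        x_px = round(x_norm / 1000 * width)
        y_px = round(y_norm / 1000 * height)
        results.append((item["label"], x_px, y_px))
    return results


def count_by_label(pixels):
    """Count how many points were returned for each label."""
    return Counter(label for label, _, _ in pixels)


# Invented sample response for a 640x480 image.
sample = (
    '[{"point": [500, 250], "label": "hammer"},'
    ' {"point": [100, 900], "label": "pliers"},'
    ' {"point": [750, 500], "label": "hammer"}]'
)

pixels = to_pixels(sample, width=640, height=480)
counts = count_by_label(pixels)
print(counts["hammer"], counts["pliers"], counts["screwdriver"])
```

In this sketch an object the model declines to point to simply has no entry, so its count is zero, and an empty list yields no points at all.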

Success detection across multiple camera views

Knowing when a task has been completed is a key requirement for autonomous robots. The new model improves multi-view reasoning: it can combine information from several camera streams at once, such as an overhead camera and a wrist-mounted feed, to determine whether a given task has been carried out successfully, even in partially obscured or dynamically changing environments.

Instrument reading

Perhaps the most notable new capability is instrument reading, which Google developed in collaboration with Boston Dynamics. The feature enables robots to interpret analog gauges, pressure meters, chemical sight glasses, and digital readouts encountered during industrial facility inspections.

The capability uses what Google calls “agentic vision” — a combination of visual reasoning and code execution. The model first zooms into gauge images to resolve fine detail, then uses pointing and mathematical code execution to estimate dial positions and tick intervals before applying world knowledge to derive a final reading.
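The article does not publish the code the model runs. The kind of calculation involved can be sketched under simple assumptions: a circular gauge with a linear scale, and pixel coordinates for the dial centre, the needle tip, and the minimum and maximum tick marks already obtained by pointing. The function names and the example coordinates below are hypothetical, and this is an illustration of linear interpolation on a dial, not Google's implementation.

```python
import math


def angle_deg(center, point):
    """Angle of `point` around `center` in degrees, in image coordinates.

    Because image y grows downward, the angle increases clockwise on screen.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.degrees(math.atan2(dy, dx)) % 360


def read_gauge(center, needle_tip, min_tick, max_tick, min_value, max_value):
    """Interpolate a reading from the needle's position between two ticks.

    ASSUMPTIONS: the scale is linear and runs clockwise from min_tick to
    max_tick; all points are (x, y) pixel coordinates.
    """
    tol = 1e-6  # absorbs floating-point noise at the ends of the scale
    start = angle_deg(center, min_tick)
    sweep = (angle_deg(center, max_tick) - start) % 360
    needle = (angle_deg(center, needle_tip) - start) % 360
    if needle > 360 - tol:
        needle = 0.0  # needle sits on the minimum tick
    if sweep < tol or needle > sweep + tol:
        raise ValueError("needle lies outside the marked scale")
    fraction = min(needle / sweep, 1.0)
    return min_value + fraction * (max_value - min_value)


# Invented example: a 0-10 bar gauge whose scale sweeps 270 degrees from
# lower left to lower right, with the needle pointing straight up.
reading = read_gauge(
    center=(100, 100),
    needle_tip=(100, 40),
    min_tick=(50, 150),
    max_tick=(150, 150),
    min_value=0.0,
    max_value=10.0,
)
print(f"{reading:.2f} bar")
```

A real gauge may have a non-linear scale or intermediate labelled ticks, in which case interpolating between the two ticks nearest the needle would be a more careful choice than spanning the full range.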

In benchmarks, Gemini Robotics-ER 1.6 with agentic vision achieved a 93% success rate on instrument reading tasks, compared to 67% for Gemini 3.0 Flash and 23% for the previous Gemini Robotics-ER 1.5.

Boston Dynamics Vice President Marco da Silva said the capability will allow its Spot robot to identify and react to real-world challenges autonomously during inspection rounds.

Safety

Google describes Gemini Robotics-ER 1.6 as its safest robotics model to date, citing improved compliance with safety policies on adversarial spatial reasoning tasks. The model also demonstrates better adherence to physical safety constraints, such as avoiding manipulation of objects that exceed weight limits or involve prohibited materials. Compared to the baseline Gemini 3.0 Flash model, the Robotics-ER line shows a 6% improvement in text-based hazard identification and a 10% improvement in video-based hazard identification.

Google has published a developer Colab notebook with configuration examples and is inviting robotics teams working in specialised applications to submit labelled images of failure cases to help shape future model improvements.
