Single photo turns into a virtual world, training robots and AI Agents

by Pieter Werner

Ant Group has expanded its multimodal AI assistant LingGuang with a new feature that allows users to create an interactive, AI-generated 3D environment from a single image on a smartphone. The feature, called “Experience World Model”, is based on LingBot-World-Fast, an open-source world model developed by Robbyant, Ant Group’s embodied AI unit. For robotics, the relevance lies mainly in simulation and embodied AI. Virtual worlds can be used to let robots or AI agents practise scene understanding, navigation and task execution before they are deployed in physical environments.

Robbyant positions LingBot-World-Fast as technology for robot training, interactive simulation, game prototyping and visual content development. With the new application, users can upload a photo in LingGuang, after which the app generates a short, explorable world. According to Ant Group, users can enter this environment from a first-person perspective and move through it in real time, similar to controlling a scene in a video game. The generated experience lasts up to 60 seconds and does not require local installation or technical configuration.

The announcement is notable because Ant Group is making world model technology available not only to developers and researchers, but also to consumers through a mobile app. In AI research, world models are used to generate digital representations of environments and to predict how those environments change when a user or agent moves through them. This makes the technology applicable to gaming, content creation and robotics.

According to Ant Group, LingBot-World-Fast can generate scenes in real time at 16 frames per second and 480p resolution, with interaction latency below one second. Robbyant says it achieved this with an optimization approach that concentrates computing power on generating new visual content rather than recalculating stable parts of the scene. The LingGuang app adds low-latency streaming technology to reduce response times for mobile users.
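To put the reported figures in perspective, the short sketch below works out what 16 frames per second implies. It is illustrative arithmetic only: the function names are hypothetical, and it assumes that a 60-second LingGuang session runs at the same constant frame rate reported for the model, which the announcement does not state explicitly.

```python
# Illustrative arithmetic only. The 16 fps and 60 s figures come from the
# announcement; combining them assumes the app streams at the model's
# reported frame rate for the whole session.
FPS = 16               # frames per second reported for LingBot-World-Fast
SESSION_SECONDS = 60   # maximum length of a generated experience


def frame_budget_ms(fps: int) -> float:
    """Average time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def total_frames(fps: int, seconds: int) -> int:
    """Number of frames in a session, assuming a constant frame rate."""
    return fps * seconds


if __name__ == "__main__":
    print(f"Per-frame budget: {frame_budget_ms(FPS)} ms")
    print(f"Frames in a full session: {total_frames(FPS, SESSION_SECONDS)}")
```

Under these assumptions the model has roughly 62.5 milliseconds per frame and would produce 960 frames over a full-length experience.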

In addition to the mobile feature, LingBot-World-Fast has also been released as an open-source model on Hugging Face. This allows developers and researchers to deploy and further develop the model locally. In a local environment, the model supports continuous generation, keyboard and mouse control, and real-time changes to environments through text prompts, such as adjusting weather conditions or switching visual styles.

The new feature follows Robbyant’s earlier open-sourcing of LingBot-World-Base in January 2026. With this, Robbyant is further developing a series of models for embodied AI, including technology for world simulation, spatial understanding and robot control. For Ant Group, the integration into LingGuang means that this technology is now also being tested as a direct user experience on mobile devices.
