
NVIDIA Introduces Open Physical AI Data Factory for Robots and Autonomous Systems

by Marco van der Hoeven

NVIDIA has introduced the NVIDIA Physical AI Data Factory Blueprint, an open reference architecture designed to standardize and automate how training data is generated, augmented and evaluated for physical artificial intelligence systems, including robotics, vision AI agents and autonomous vehicles.

The framework is intended to address the large data requirements associated with physical AI models. It provides a workflow that converts raw or limited datasets into model-ready training data through automated data processing, synthetic data generation, reinforcement learning and evaluation processes. The architecture uses NVIDIA Cosmos foundation models and coding agents to expand small datasets into larger collections that include rare edge cases and long-tail scenarios that are difficult or costly to capture in real-world environments.

Cloud providers Microsoft Azure and Nebius are integrating the blueprint into their infrastructure and services, allowing developers to use cloud-based accelerated computing resources to generate and manage large-scale training datasets. Several companies developing physical AI systems, including FieldAI, Hexagon Robotics, Linker Vision, Milestone Systems, RoboForce, Skild AI, Teradyne Robotics and Uber, are applying the architecture in robotics, vision AI and autonomous vehicle development projects.

“Physical AI is the next frontier of the AI revolution, where success depends on the ability to generate massive amounts of data,” said Rev Lebaredian, vice president of Omniverse and simulation technologies at NVIDIA. “Together with cloud leaders, we’re providing a new kind of agentic engine that transforms compute into the high-quality data required to bring the next generation of autonomous systems and robots to life. In this new era, compute is data.”

The architecture organizes data preparation through a set of modular components. NVIDIA Cosmos Curator processes and annotates large datasets drawn from real-world and simulated environments. Cosmos Transfer expands curated data by generating additional synthetic variations intended to capture environmental differences and rare operational scenarios. Cosmos Evaluator assesses and filters generated datasets for physical accuracy and suitability for model training.
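The three stages described above can be pictured as a simple curate, augment and filter loop. The Python sketch below is purely illustrative and does not use any NVIDIA API: the `Clip` record, the `CONDITIONS` list, the integer `plausibility` score and all function names are hypothetical stand-ins, assumed here only to show how the stages might hand data to one another.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List

# Hypothetical variation axes; a real system would generate these with a model.
CONDITIONS = ["rain", "fog", "night"]


@dataclass(frozen=True)
class Clip:
    clip_id: str
    source: str        # "real", "simulated" or "synthetic"
    condition: str     # e.g. "clear", "rain"
    plausibility: int  # placeholder 0-100 score, not a real metric


def curate(raw: Iterable[Clip]) -> List[Clip]:
    """Curation stand-in: drop clips whose id has already been seen."""
    seen = set()
    kept = []
    for clip in raw:
        if clip.clip_id not in seen:
            seen.add(clip.clip_id)
            kept.append(clip)
    return kept


def augment(curated: List[Clip]) -> List[Clip]:
    """Augmentation stand-in: add one synthetic variant per condition.

    The score penalty (20 points per step) is arbitrary and only exists so
    that the evaluation stage has something to filter on.
    """
    out = list(curated)
    for clip in curated:
        for i, cond in enumerate(CONDITIONS):
            out.append(replace(
                clip,
                clip_id=f"{clip.clip_id}-{cond}",
                source="synthetic",
                condition=cond,
                plausibility=clip.plausibility - 20 * i,
            ))
    return out


def evaluate(candidates: List[Clip], threshold: int = 50) -> List[Clip]:
    """Evaluation stand-in: keep clips at or above the score threshold."""
    return [c for c in candidates if c.plausibility >= threshold]


def run_pipeline(raw: Iterable[Clip], threshold: int = 50) -> List[Clip]:
    """Chain the three stages in the order the article describes."""
    return evaluate(augment(curate(raw)), threshold)


if __name__ == "__main__":
    raw = [
        Clip("a", "real", "clear", 90),
        Clip("a", "real", "clear", 90),      # duplicate, removed by curate()
        Clip("b", "simulated", "clear", 60),
    ]
    for clip in run_pipeline(raw):
        print(clip)
```

In this toy version two unique clips grow into eight candidates, and the filter then discards the variants whose placeholder score falls below the threshold. The point is only the shape of the workflow: a small curated set is expanded, and the expanded set is screened before it reaches training.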

NVIDIA is applying the system internally to train and evaluate NVIDIA Alpamayo, described as an open reasoning-based vision-language-action model designed for autonomous driving tasks. Other developers are using the blueprint in different areas of robotics and automation, including Skild AI for robot foundation models and Uber for autonomous vehicle development.

The blueprint also incorporates NVIDIA OSMO, an open-source orchestration framework designed to manage complex AI workflows across computing environments. OSMO coordinates data pipelines, model training and evaluation tasks while reducing manual operational steps. The framework integrates with coding agents such as Claude Code, OpenAI Codex and Cursor, allowing automated systems to manage computing resources and development workflows.

Microsoft Azure is incorporating the blueprint into an open physical AI toolchain hosted on GitHub. The integration connects the architecture with Azure services including Azure IoT Operations, Microsoft Fabric, Real-Time Intelligence and Microsoft Foundry to support enterprise data workflows for training and validating physical AI models.

Several robotics and perception developers, including FieldAI, Hexagon Robotics, Linker Vision and Teradyne Robotics, are testing the Azure-based toolchain to scale data generation, augmentation and evaluation across their machine learning pipelines.

Nebius has integrated the OSMO orchestration framework into its AI Cloud platform, enabling deployment of automated data pipelines built on the blueprint. The infrastructure combines NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs with object storage, data management and labeling tools, serverless execution and managed inference services.

Companies including Milestone Systems, Voxel51 and RoboForce are using the blueprint within the Nebius environment to develop models for video analytics AI agents, autonomous vehicles and industrial humanoid robotics systems.

The NVIDIA Physical AI Data Factory Blueprint is expected to be made available on GitHub in April.
