Researchers from MIT and NVIDIA have developed a novel robotic planning system that enables machines to solve complex manipulation tasks within seconds. The approach, presented at the Robotics: Science and Systems Conference, combines parallel computing and optimization techniques to significantly accelerate task and motion planning (TAMP) for robots in dynamic environments such as warehouses or manufacturing lines.
The new algorithm, named cuTAMP, addresses the high computational cost of robotic manipulation planning by evaluating thousands of candidate plans in parallel. Where traditional systems test one candidate solution at a time, cuTAMP samples and optimizes a large batch of potential solutions simultaneously on GPUs (graphics processing units). This lets robots quickly identify viable ways to pick up, move, and position objects while satisfying constraints such as collision avoidance and object orientation.
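To make the batched-evaluation idea concrete, here is a minimal sketch in JAX of scoring thousands of candidate plans in one vectorized call. This is illustrative only, not cuTAMP's actual code: the cost function, the collision penalty, and the 7-degree-of-freedom plan representation are all hypothetical stand-ins.

```python
# Illustrative sketch of batched plan scoring (not cuTAMP's implementation).
# A toy "plan" is a vector of joint targets; the cost penalizes distance to a
# goal plus a stand-in obstacle term. jax.vmap evaluates every candidate at
# once, so on a GPU scoring thousands costs roughly the same as scoring one.
import jax
import jax.numpy as jnp

def plan_cost(plan, goal):
    tracking = jnp.sum((plan - goal) ** 2)                       # distance-to-goal term
    collision = jnp.sum(jnp.maximum(0.0, 0.5 - jnp.abs(plan)))   # hypothetical obstacle penalty
    return tracking + 10.0 * collision

key = jax.random.PRNGKey(0)
goal = jnp.ones(7)                              # assumed 7-DOF arm target
candidates = jax.random.normal(key, (4096, 7))  # thousands of sampled plans

costs = jax.vmap(plan_cost, in_axes=(0, None))(candidates, goal)
best = candidates[jnp.argmin(costs)]            # cheapest feasible candidate
```

On a GPU backend, the vectorized call dispatches as a single batched kernel, which is the property the researchers exploit: evaluating one more candidate is nearly free once the batch is in flight.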
“Using GPUs, the computational cost of optimizing one solution is the same as optimizing hundreds or thousands,” said William Shen, lead author and MIT graduate student. “This is critical in environments where speed directly impacts operational efficiency.”
To demonstrate its capabilities, the team tested cuTAMP on Tetris-like block-packing tasks. In simulation, the system found successful, collision-free solutions within seconds. On physical robotic arms at both MIT and NVIDIA, the algorithm consistently produced plans in under 30 seconds. Because the system does not rely on training data, it can adapt to new environments and tasks without retraining.
cuTAMP merges two techniques: sampling, which limits the search to likely solution candidates, and parallelized optimization, which refines these candidates based on cost functions that account for motion feasibility and user-defined goals. By narrowing the sampling space to more relevant actions, cuTAMP accelerates convergence to a usable plan.
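The sample-then-optimize pattern described above can be sketched in a few lines of JAX. Again, this is a simplified illustration under assumed details, not the paper's solver: the smoothness and goal terms, trajectory shape (20 waypoints in 3-D), step size, and iteration count are all hypothetical.

```python
# Hedged sketch of "sample, then optimize in parallel" (illustrative only).
import jax
import jax.numpy as jnp

def cost(params, goal):
    # Combines a motion-feasibility proxy (smoothness between waypoints)
    # with a user-defined goal term, as the article describes.
    smoothness = jnp.sum(jnp.diff(params, axis=0) ** 2)
    goal_term = jnp.sum((params[-1] - goal) ** 2)
    return smoothness + goal_term

# 1) Sampling: draw a batch of candidate trajectories near plausible solutions.
key = jax.random.PRNGKey(1)
goal = jnp.ones(3)
candidates = 0.1 * jax.random.normal(key, (1024, 20, 3))  # 1024 trajectories, 20 waypoints each

# 2) Parallelized optimization: gradient steps applied to every candidate at once.
grad_fn = jax.vmap(jax.grad(cost), in_axes=(0, None))
for _ in range(50):
    candidates = candidates - 0.05 * grad_fn(candidates, goal)

# Keep the best refined candidate as the usable plan.
costs = jax.vmap(cost, in_axes=(0, None))(candidates, goal)
best_plan = candidates[jnp.argmin(costs)]
```

The design point this sketch captures is the division of labor: sampling keeps the batch concentrated on promising regions of the search space, while the vectorized gradient refinement drives every candidate toward feasibility simultaneously, so convergence to some usable plan is fast even if most individual samples fail.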
“This kind of algorithm is especially valuable in industrial contexts where delays in robotic planning translate into real financial costs,” said Caelan Garrett, senior research scientist at NVIDIA Research and co-author.
While the current focus is on object manipulation and packing, the researchers note that the framework can generalize to other tasks, such as tool use or assembly. Future developments aim to integrate language models and visual reasoning systems, potentially enabling robots to respond to natural language commands and execute multistep objectives with minimal human input.
The research team includes contributors from MIT CSAIL, NVIDIA Research, the University of Utah, and the University of Sydney.