Researchers at the Robotics Institute (RI) in Carnegie Mellon University’s School of Computer Science have developed a new learning method for robots called WHIRL, short for In-the-Wild Human Imitating Robot Learning. WHIRL is an efficient algorithm for one-shot visual imitation. It can learn directly from human-interaction videos and generalize that information to new tasks, making robots well-suited to learning household chores.
People constantly perform various tasks in their homes. With WHIRL, a robot can observe those tasks and gather the video data it needs to eventually determine how to complete the job itself.
The team added a camera and their software to an off-the-shelf robot, and it learned how to do more than 20 tasks — from opening and closing appliances, cabinet doors and drawers to putting a lid on a pot, pushing in a chair and even taking a garbage bag out of the bin. Each time, the robot watched a human complete the task once and then went about practicing and learning to accomplish the task on its own. The team presented their research this month at the Robotics: Science and Systems conference in New York.
“This work presents a way to bring robots into the home,” said Pathak, an assistant professor in the RI and a member of the team. “Instead of waiting for robots to be programmed or trained to successfully complete different tasks before deploying them into people’s homes, this technology allows us to deploy the robots and have them learn how to complete tasks, all the while adapting to their environments and improving solely by watching.”
Current methods for teaching a robot a task typically rely on imitation or reinforcement learning. In imitation learning, humans manually operate a robot to teach it how to complete a task. This process must be done several times for a single task before the robot learns. In reinforcement learning, the robot is typically trained on millions of examples in simulation and then asked to adapt that training to the real world.
Both learning models work well when teaching a robot a single task in a structured environment, but they are difficult to scale and deploy. WHIRL can learn from any video of a human doing a task. It is easily scalable, not confined to one specific task and can operate in realistic home environments. The team is even working on a version of WHIRL trained by watching videos of human interaction from YouTube and Flickr.
Progress in computer vision made the work possible. Using models trained on internet data, computers can now understand and model movement in 3D. The team used these models to understand human movement, facilitating training WHIRL.
With WHIRL, a robot can accomplish tasks in their natural environments. The appliances, doors, drawers, lids, chairs and garbage bag were not modified or manipulated to suit the robot. The robot’s first several attempts at a task ended in failure, but once it had a few successes, it quickly latched on to how to accomplish it and mastered it. While the robot may not accomplish the task with the same movements as a human, that’s not the goal. Humans and robots have different parts, and they move differently. What matters is that the end result is the same. The door is opened. The switch is turned off. The faucet is turned on.
“To scale robotics in the wild, the data must be reliable and stable, and the robots should become better in their environment by practicing on their own,” Pathak said.
Image: With WHIRL, a robot learned how to do more than 20 tasks — from opening and closing appliances, cabinet doors and drawers to putting a lid on a pot, pushing in a chair and even taking a garbage bag out of the bin. Credit: Carnegie Mellon University