Two-step training helps robots interpret human language

Quadcopter Following Navigation Instructions

To robots, human instructions are incredibly complex. Even a relatively simple command like “Go to the house, passing the tree on your right” may require hundreds of attempts to learn. And if the command changes to “Go to the house, passing the tree on your left,” the robot needs to relearn the task from scratch.

A new paper by Cornell researchers aims to address this challenge by breaking the robot’s task into two separate stages: first, interpreting the language in the command and mapping out a trajectory; and second, executing that trajectory. In a simulation, a drone trained with this approach learned to maneuver in a given environment faster and more accurately than drones trained with existing methods.

“Planning where to go is a much simpler problem than actually going there, because it avoids the agent having to actually act in the environment,” said Valts Blukis, a doctoral student in computer science at Cornell Tech and first author on the paper, “Mapping Navigation Instructions to Continuous Control Actions With Position-Visitation Prediction,” presented at the Conference on Robot Learning Oct. 29-31 in Zurich, Switzerland. “Once it can predict a path, it is then relatively easy to follow it, without having to care about the original instruction.”

The paper was co-written with Cornell computer science doctoral student Dipendra Misra; Ross Knepper, assistant professor of computer science; and senior author Yoav Artzi, assistant professor of computer science at Cornell Tech.

Most existing robots take instructions through complicated user interfaces or controllers such as joysticks. Operating them requires expertise or training, limiting the robots’ use to repetitive tasks and industrial settings like factories. Robots that could interpret natural human language could be accessible to non-experts and potentially capable of a wider range of tasks.

“Language is powerful and lets us express many ideas and constructions,” Blukis said. “With language, we could envision telling our robots exactly what we want them to do.”

But the same complexity and richness that makes language so effective also makes it hard for robots to comprehend. A command such as “Go toward the blue fence, passing the anvil and tree on the right,” for example, requires the computer to understand numerous concepts and behaviors.

Meanwhile, many recent robots learn by experience. Through hundreds of thousands of attempts, they correct their behavior until they’ve learned how to do each job effectively. That approach is infeasible if you’re trying to get the robot to respond to natural human language, rather than a list of pre-learned commands.

In the researchers’ new model, the robot first interprets the language to identify the places it is likely to visit while accomplishing its task and to recognize the correct destination. It then travels through those likely positions to reach the destination.
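The paper’s actual architecture is a learned deep network that builds these distributions over a map of the environment; purely as an illustration of the two-stage interface, the Python sketch below uses a hypothetical 32x32 grid, uniform placeholder distributions and invented function names, none of which come from the paper itself.

```python
import numpy as np

GRID = 32  # hypothetical 32x32 discretization of the environment

def predict_visitation(instruction, image):
    """Stage 1 (illustrative stub): map the instruction and a camera
    observation to two probability maps over positions: one over the
    cells the drone should pass through (p_visit) and one over the
    destination cell (p_goal). In the paper this stage is a learned
    deep network; here it just returns uniform placeholder maps."""
    p_visit = np.full((GRID, GRID), 1.0 / GRID**2)
    p_goal = np.full((GRID, GRID), 1.0 / GRID**2)
    return p_visit, p_goal

def follow(p_visit, p_goal, pose):
    """Stage 2 (illustrative): emit a velocity command that steers
    toward the most likely next cell and stops near the goal mode.
    The instruction itself is never consulted at this stage."""
    goal = np.array(np.unravel_index(p_goal.argmax(), p_goal.shape), dtype=float)
    if np.linalg.norm(goal - pose) < 1.0:
        return np.zeros(2)  # close enough to the goal: stop
    target = np.array(np.unravel_index(p_visit.argmax(), p_visit.shape), dtype=float)
    direction = target - pose
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else np.zeros(2)

# Plan once from the instruction, then act without re-reading it.
p_visit, p_goal = predict_visitation("go to the house, passing the tree", image=None)
pose = np.zeros(2)
for _ in range(100):
    velocity = follow(p_visit, p_goal, pose)
    if not velocity.any():
        break  # goal reached (or nowhere left to go)
    pose = pose + velocity  # toy dynamics: position integrates velocity
    cell = tuple(np.clip(pose.round().astype(int), 0, GRID - 1))
    p_visit[cell] = 0.0  # mark the cell as visited
```

The property the sketch preserves is that follow() never sees the instruction: once stage 1 has produced the two probability maps, the language has done its job.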

Those two stages were trained separately using deep neural networks – a type of machine learning architecture in which computers learn representations from data. The model accumulates information as it observes its surroundings, allowing it to refine its predictions over time.

“Once it has a prediction for where it should be going – basically by highlighting the areas that are very likely to be visited – it generates actions to go there,” Blukis said. “This way, we can iterate through the instruction data much faster, and quickly teach the robot to correctly plan where to go from instructions, without the robot having to act out in the environment and make mistakes at all.”
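This is also why the first stage can be trained like an ordinary supervised learner: each crowdsourced instruction comes paired with a human demonstration, which can be converted into a target distribution over positions. The sketch below is a hedged illustration of that idea, with the helper names, grid size and cross-entropy loss chosen for clarity rather than taken from the paper.

```python
import numpy as np

def empirical_visitation(trajectory, grid=32):
    """Convert a demonstrated trajectory (a sequence of (x, y) positions)
    into a stage-1 training target: the fraction of time the human
    demonstrator spent in each cell of the discretized environment."""
    target = np.zeros((grid, grid))
    for x, y in trajectory:
        target[int(x), int(y)] += 1.0
    return target / target.sum()

def stage1_loss(predicted, target, eps=1e-8):
    """Cross-entropy between the predicted and demonstrated visitation
    distributions. Minimizing this over recorded (instruction,
    demonstration) pairs teaches the planner where to go without the
    drone ever flying, which is what makes this stage fast to train."""
    return -np.sum(target * np.log(predicted + eps))

# Example: a demonstration that moves diagonally through a few cells.
demo = [(0, 0), (1, 1), (2, 2), (3, 3)]
target = empirical_visitation(demo)
uniform = np.full((32, 32), 1.0 / 32**2)  # an untrained prediction
print(stage1_loss(uniform, target))       # the loss the planner must reduce
```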

The researchers tested their model using nearly 28,000 crowdsourced commands and a quadcopter simulator that approximates drone flight, including a realistic controller requiring quick decisions in response to changing conditions. They found their simulated drone was almost twice as accurate as drones using two other recently proposed methods, and Blukis said they trained their model in a matter of days, rather than weeks.

Though the model has so far been tested only in simulation, similar approaches could eventually be applied to delivery robots or even self-driving vehicles. The system would be particularly useful in large, unfamiliar or complicated surroundings, where training a robot to respond to more specific, targeted tasks would be impractical, Blukis said.

The research was supported by the Schmidt Science Fellows program, the National Science Foundation, the Air Force Office of Scientific Research and Amazon.

Media Contact

Jeff Tyson