Master’s student Hadi AlZayer was hiking in a park, periodically asking fellow hikers to snap smartphone photos of him, when a pattern emerged.
“Whenever I asked strangers to take photos for me, I’d end up with photos that were poorly composed,” he said. “It got me thinking.”
AlZayer knew that framing a good photo is an incremental process – navigate a space, position one’s body, adjust the camera, peer through the viewfinder, adjust as needed – and that step-by-step processes like this can be automated with an algorithm. Taking the idea further, AlZayer reasoned that the entire process could be carried out algorithmically and robotically through a machine learning technique called “reinforcement learning.”
This inspiration led to AutoPhoto, a robotic system developed by a trio of researchers from the Cornell Ann S. Bowers College of Computing and Information Science that can automatically roam an interior space and capture aesthetically pleasing photographs.
It is believed to be the first robotic-photographer system using a “learned aesthetic” machine learning model and represents a major development in using autonomous agents to visually document a space, the researchers said. In its most immediate application, AutoPhoto could be used to photograph interiors of houses and rental properties. But, long term, the technology behind it could have profound impacts – imagine a robot capable of traversing and documenting remote or dangerous locations, from distant planets to war zones, all on its own.
“This kind of work is not well explored, and it’s clear there could be useful applications for it,” said Hubert Lin, doctoral student in the field of computer science and co-author of “AutoPhoto: Aesthetic Photo Capture using Reinforcement Learning,” presented at the International Conference on Intelligent Robots and Systems in fall 2021.
AlZayer is the paper’s first author; in addition to AlZayer and Lin, the AutoPhoto team includes Kavita Bala, dean of Bowers CIS. AutoPhoto, the researchers said, treads new ground in the areas of computer vision and autonomous photography by combining existing machine learning work with its own custom, deep-learning models.
AutoPhoto builds on an existing algorithm, called a learned aesthetic estimation model, that is trained on more than 1 million quality, human-ranked photographs.
“This aesthetic model helps the robot determine if the photos it takes are good or not,” AlZayer said.
Until now, no one had combined this existing model with an actual, autonomous robot that could navigate a new space on its own and in real time, frame scenes optimally, and capture the kinds of quality photos one might find in an Airbnb listing.
“To guide the robot, we trained a separate model to move around in an environment and find a place that looks good,” said Lin, who plans to graduate in May and join Waymo as a research scientist.
After successful training runs in simulation, wherein the AutoPhoto algorithm scanned dozens of 3D photos of interior scenes and correctly chose the best compositional angles, the Cornell team mounted its AutoPhoto system and camera onto a Clearpath Jackal robot and turned it loose within a common space in Upson Hall.
What AutoPhoto captures – as seen in a project demo video – reads as a frame-by-frame account of how the robot explores its surroundings for the optimal shot. The first shots are mundane close-ups of walls, an off-centered stairwell, a trash can. But with each subsequent adjustment and corresponding photo, AutoPhoto corrects itself and maneuvers into position to better frame the space.
Once it has captured a photo its aesthetic model identifies as compositionally sound, AutoPhoto files it away and motors on to document more of the space. Exploring the Upson Hall lounge and capturing three quality photos took AutoPhoto a few minutes, the team said.
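For readers curious how such an adjust-and-score loop might look in code, here is a minimal Python sketch. It is not the team's implementation: the `aesthetic_score` function, the `BEST_VIEW` pose, the action set and the 0.9 threshold are all hypothetical stand-ins, and a simple greedy search takes the place of the trained reinforcement learning policy. The sketch only illustrates the general idea of nudging a camera pose until a scoring model rates the view highly enough to keep the shot.

```python
import math

# Hypothetical stand-in for the learned aesthetic model. It scores a camera
# pose (x, y, heading in degrees) between 0 and 1, peaking at an arbitrary
# "best" view. A real model would score the image, not the pose.
BEST_VIEW = (3.0, 2.0, 90.0)

def aesthetic_score(pose):
    x, y, heading = pose
    bx, by, bh = BEST_VIEW
    distance = math.hypot(x - bx, y - by)
    turn = abs(heading - bh) / 180.0
    return math.exp(-(distance + turn))

# Hypothetical discrete actions: small moves and small rotations.
ACTIONS = {
    "forward": (0.5, 0.0, 0.0),
    "back": (-0.5, 0.0, 0.0),
    "left": (0.0, 0.5, 0.0),
    "right": (0.0, -0.5, 0.0),
    "turn_left": (0.0, 0.0, 15.0),
    "turn_right": (0.0, 0.0, -15.0),
}

def step(pose, action):
    dx, dy, dh = ACTIONS[action]
    return (pose[0] + dx, pose[1] + dy, pose[2] + dh)

def capture_photo(start, threshold=0.9, max_steps=50):
    """Adjust the pose until the score clears the threshold, then stop."""
    pose = start
    scores = [aesthetic_score(pose)]
    for _ in range(max_steps):
        if scores[-1] >= threshold:
            break  # good enough: this is where the photo would be kept
        # Greedy stand-in for the trained policy: try every action and
        # take the one whose resulting view scores highest.
        best = max(ACTIONS, key=lambda a: aesthetic_score(step(pose, a)))
        candidate = step(pose, best)
        if aesthetic_score(candidate) <= scores[-1]:
            break  # no action improves the view
        pose = candidate
        scores.append(aesthetic_score(pose))
    return pose, scores

final_pose, scores = capture_photo((0.0, 0.0, 0.0))
print(f"kept photo at {final_pose} after {len(scores) - 1} adjustments")
```

In this toy version the score rises with every adjustment, loosely mirroring the progression from mundane close-ups to a well-framed shot. A learned policy would instead choose actions from what the camera sees, without trying every move first.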
“The most challenging part was the fact there was no existing baseline number we were trying to improve,” said AlZayer, who hopes to pursue a Ph.D. “We had to define the entire process and the problem.”
The great outdoors could be the next area of exploration for AutoPhoto development, AlZayer said. Millions of videos of scenic outdoor locations that already exist online could provide a wealth of data to train the AutoPhoto model on good composition of outdoor scenes, rather than interior spaces.
This research was funded in part by the National Science Foundation and the Natural Sciences and Engineering Research Council of Canada.
Louis DiPietro is a writer for the Ann S. Bowers College of Computing and Information Science.