Analysis of Flickr photos could lead to online travel books

By Paul Redfern

April 28, 2009

David Crandall

Representative images for the top landmark in each of the top 20 North American cities. All parts of the figure, including images, textual labels and the map itself, were produced automatically from the researchers' geo-tagged photos. Larger version

Cornell scientists have downloaded and analyzed nearly 35 million Flickr photos taken by more than 300,000 photographers from around the globe, using a supercomputer at the Cornell Center for Advanced Computing (CAC).

Their research, which was presented at the International World Wide Web Conference in Madrid, April 20-24, provides a new and practical way to automatically organize, label and summarize large-scale collections of digital images. The scalability of the method allows for mining information latent in very large sets of images, raising the intriguing possibility of an online travel guidebook that could automatically identify the best sites to visit on a vacation, as judged by the collective wisdom of the world's photographers.

The research also generated statistics on the world's most photographed cities and landmarks, gleaned from the analysis of the multi-terabyte photo collection:

Interestingly, the Apple Store in midtown Manhattan was the fifth-most photographed place in New York City -- and the 28th-most photographed place in the world.

The researchers developed techniques to identify places that people find interesting to photograph, showing results for thousands of locations at both city and landmark scales.

"We developed classification methods for characterizing these locations from visual, textual and temporal features," said Daniel Huttenlocher, the John P. and Rilla Neafsey Professor of Computing, Information Science and Business and Stephen H. Weiss fellow. "These methods reveal that both visual and temporal features improve the ability to estimate the location of a photo compared to using just textual tags."

As the creation of digital data accelerates, said CAC director David Lifka, "supercomputers and high-performance storage systems will be essential in order to quickly store, archive, preserve and retrieve large-scale data collections."

The research was supported in part by the National Science Foundation (NSF) and by funding from Google, Yahoo! and the John D. and Catherine T. MacArthur Foundation. The CAC is supported by Cornell, the NSF, the Department of Defense, the Department of Agriculture and members of its corporate program.

The paper is available for download at http://www.cs.cornell.edu/~dph/papers/photomap-www09.pdf.

Paul Redfern is an assistant director at the Center for Advanced Computing.

Arts & Humanities