If you see Fred and Susie standing in the same line at the cafeteria just once, it probably doesn't mean anything. If they show up together in many different places, it starts to mean a lot. But how many times do you have to see them together before it becomes significant? Surprisingly few, say Cornell computer scientists. And that could mean your online presence tells the world more than you intended.
Comparing the locations of photos posted on the Internet with social network contacts, researchers found that as few as three "co-locations" at different times and places could predict with high probability that two people posting photos were socially connected. The results have implications for online privacy, the researchers say, but also suggest a quantitative answer to a very old psychological question: What can we conclude from observing coincidences?
"This is a kind of question that goes way back," said Jon Kleinberg, the Tisch University Professor of Computer Science, who conducted the study with Dan Huttenlocher, the John P. and Rilla Neafsey Professor of Computing, Information Science and Business and dean of the Faculty of Computing and Information Science. "Online data gives us new ways to address it."
Their work is reported in the Dec. 8, 2010, online Early Edition of the Proceedings of the National Academy of Sciences. Former Cornell postdoctoral researcher Dave Crandall, now an assistant professor at Indiana University, was the lead researcher on the study, which grew out of the research team's work on analyzing social data from large-scale online communities.
The researchers used a database of some 38 million geotagged photos uploaded to the Flickr photo-sharing website by about half a million people. The time and place at which each photo was taken were provided by GPS-equipped cameras or by people who used Flickr's online interface to indicate the location on a map. Anyone can read this information from a Flickr page.
Flickr also offers a social networking service, and computer analysis showed that when two people posted photos several times from the same locations (often famous landmarks) and at about the same times, this was a good predictor that those people would have a social network link.
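To make the idea of counting co-locations concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' code: the record layout, the function name `co_location_counts`, the parameters `cell_deg` and `window_days`, and the sample data are all hypothetical. It divides the map into square cells and time into fixed windows, then counts for each pair of users how many distinct cell-and-window buckets they share.

```python
from collections import defaultdict
from itertools import combinations
from math import floor


def co_location_counts(photos, cell_deg=1.0, window_days=1.0):
    """Count, per pair of users, the distinct space-time buckets they share.

    photos: iterable of (user, lat, lon, day) tuples, where `day` is a
    number of days since an arbitrary starting point (an assumed layout).
    """
    users_in_bucket = defaultdict(set)
    for user, lat, lon, day in photos:
        bucket = (
            floor(lat / cell_deg),      # row of the grid cell
            floor(lon / cell_deg),      # column of the grid cell
            floor(day / window_days),   # index of the time window
        )
        users_in_bucket[bucket].add(user)

    counts = defaultdict(int)
    for users in users_in_bucket.values():
        # Every pair of users seen in the same bucket shares one co-location.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return dict(counts)


# Made-up sample data: Fred and Susie overlap in three places, Alex in one.
photos = [
    ("fred", 42.44, -76.52, 10.2), ("susie", 42.45, -76.53, 10.6),
    ("fred", 40.75, -73.99, 55.1), ("susie", 40.76, -73.98, 55.3),
    ("fred", 48.85, 2.29, 120.2), ("susie", 48.86, 2.28, 120.4),
    ("alex", 48.85, 2.29, 120.1),
]
counts = co_location_counts(photos, cell_deg=0.1, window_days=1.0)

# Three shared buckets is used here only as an illustrative threshold.
likely_linked = [pair for pair, n in counts.items() if n >= 3]
```

In this toy data only Fred and Susie reach three shared buckets. Fixed bucketing misses pairs who stand on opposite sides of a cell boundary, so a more careful version would compare distances and time gaps directly.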
"It's not that you know with certainty," Huttenlocher pointed out, "but it's a high likelihood that these people know each other." As expected, the probability increases as the analysis moves to smaller areas and shorter time spans.
Flickr is just a convenient place to study the phenomenon, the researchers said. The same conclusions might be drawn from credit card purchases, fare card transactions on the bus and subway, and cell phone records, they suggested. "It's surprising -- and not in a reassuring way -- that so much information comes from so little," Kleinberg said. "Our research is trying to provide a way of quantifying these risks."
Huttenlocher added, "While it's obvious that a photo you post online reveals information about what is pictured in the photo, what is less obvious is that as you post multiple photos you are probably revealing information which may not be pictured anywhere."
One way to mitigate privacy risks, Kleinberg suggested, would be to "blur" time and space information in permanent records, making it less precise. This research might offer hints on how much blurring is needed, he said.
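One simple reading of "blurring" is to snap each record to a coarse grid before it is stored. The Python sketch below illustrates that reading only; the article does not say how blurring would be done, and the function name `blur_record` and its parameters are hypothetical.

```python
from math import floor


def blur_record(lat, lon, day, cell_deg=0.5, window_days=7.0):
    """Coarsen a location and time by snapping them to a grid.

    Each value is replaced by the lower edge of the cell or window it
    falls in, so nearby points become indistinguishable in the record.
    """
    blurred_lat = floor(lat / cell_deg) * cell_deg
    blurred_lon = floor(lon / cell_deg) * cell_deg
    blurred_day = floor(day / window_days) * window_days
    return blurred_lat, blurred_lon, blurred_day


# A photo taken at (42.44, -76.52) on day 10.2 is stored only to the
# nearest half degree and the nearest week.
blurred = blur_record(42.44, -76.52, 10.2, cell_deg=0.5, window_days=7.0)
```

Larger values of `cell_deg` and `window_days` hide more detail but also make the record less useful, which is the trade-off the amount of blurring has to balance.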
The researchers recognized that the photo-sharing process might introduce some bias into their results. For example, people might seek out social contacts with others who had photographed the same site. To control for this, they compared photos posted after a certain date only with social links established before that date. They also controlled for the possibility that friends might upload the same photos, and for the fact that people with many social contacts on Flickr might be more likely to geotag their photos.
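The date-based control can be pictured as a simple filter applied before any comparison is made. The Python sketch below is a hypothetical illustration of that step: the function name `split_for_control`, the record layouts, and the sample data are assumptions, not details from the study.

```python
def split_for_control(photos, links, cutoff_day):
    """Keep photos posted after the cutoff and links formed before it.

    photos: list of (user, lat, lon, day) tuples (assumed layout).
    links:  dict mapping a (user, user) pair to the day the link was made.
    """
    later_photos = [p for p in photos if p[3] > cutoff_day]
    earlier_links = {
        pair for pair, formed_day in links.items() if formed_day < cutoff_day
    }
    return later_photos, earlier_links


# Made-up sample data with a cutoff at day 50.
photos = [
    ("fred", 42.44, -76.52, 10.2),
    ("susie", 42.45, -76.53, 10.6),
    ("fred", 40.75, -73.99, 55.1),
    ("susie", 40.76, -73.98, 55.3),
]
links = {("fred", "susie"): 30.0, ("alex", "fred"): 70.0}

later_photos, earlier_links = split_for_control(photos, links, cutoff_day=50.0)
```

Because every link that survives the filter was formed before any of the surviving photos were posted, those photos cannot be what prompted the link.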
Also participating in the research were Dan Cosley, assistant professor of information science; former Ph.D. student Lars Backstrom, now at Facebook; and former postdoctoral researcher Sid Suri, now at Yahoo! Research. The work was supported in part by the MacArthur Foundation, Google, Yahoo! and the National Science Foundation.