New archive reveals treasure trove of U.S. media experiments

By Krisy Gashler

August 5, 2021

Almost from the day it launched in March 2012, the media company Upworthy began conducting experiments, testing which headlines and photos on its stories people responded to best. The experiments worked. A year after its launch, Upworthy was the fastest-growing media company in the world, and articles from its site were shared more frequently on Facebook than all other U.S. mainstream media combined.

Now, thanks to Cornell researchers and their colleagues, a dataset of thousands of those experiments is publicly available, providing insight into fields like political science, communication, psychology, marketing, organizational behavior, statistics, computer science and education. The dataset and its significance were presented in the study “The Upworthy Research Archive, a time series of 32,487 experiments in U.S. media,” published August 2 in Nature Scientific Data.

“In academia, we do one experiment and we’re really excited – it’s this precious, rare, special knowledge. But then I would think about the Upworthys of the world – Google, Facebook, Airbnb – and just know that for every experiment I as a scientist was doing, there was a company doing a thousand experiments,” said Nathan Matias, assistant professor of communication in the College of Agriculture and Life Sciences and co-lead author of the study. “I wanted to teach my students how to work in that world of continuous experimentation. And I knew there could be huge scientific gains to be had in analyzing all of that experimentation.”

Media and tech companies have been notoriously stingy with their big datasets for academic use, a position that Matias both sympathizes with and hopes to change. In his attempts to gain access to the big data he wanted, Matias began cold-contacting every organization he could think of that did this kind of large-scale experimentation.

“I would say, ‘There’s a huge opportunity here for education, for science; let’s figure out how to make your archive experiments more available,’” he said. “I knocked on a lot of doors.”

When Matias contacted Upworthy in 2017, the company had gone through many changes. Facebook’s algorithm tweaks had decreased the visibility of sites like Upworthy on its users’ feeds, and the company was under new ownership, Good, Inc. But Upworthy’s founding ethos of promoting content that was “visual, meaningful and awesome” remained, as did its interest in experimentation. Matias, his colleagues and the company spent the next few years negotiating access to the data, validating it and creating the dataset.

More than 70 academic teams in a range of disciplines have already requested access to the data. A team of psychologists is looking at whether headlines that emphasize morality, curiosity or emotional appeals gain the most traction. A political scientist is studying whether clickbait innovations remain effective over time or whether users adapt to them and lose interest. A group of statisticians is doing metascientific research, using the dataset to understand the statistics of experimentation and develop advanced research methods.

The new archive can help answer questions about the digital behavior of both readers, such as how they respond to headlines, and content producers, such as how organizations make decisions, said Kevin Munger, assistant professor of political science at Penn State and co-lead researcher of the study.

“We need to dramatically increase the scale of social scientific output; the easiest way to do that, I believe, is to look for these ‘knowledge windfalls,’ where we can publicize the most knowledge at the lowest cost,” Munger said. “Collaboration with private companies has downsides, many of which are well publicized, but it also enables these knowledge windfalls. We hope that other companies follow the lead of Good Media/Upworthy and realize that they can make a significant contribution at very little cost.”

Along with Matias and Munger, co-authors on the paper are Marianne Aubin Le Quere, a Ph.D. student in information sciences, and Charles Ebersole, a researcher with the American Institutes for Research. A grant from Cornell University Library supported preparation of the dataset.

Krisy Gashler is a writer for the College of Agriculture and Life Sciences.
