Cornell Brand Communications

Study: Smart speakers make passive listeners

By Melanie Lefkowitz

November 27, 2018

People explore less when they get recommendations from voice-based platforms such as Amazon’s Alexa or Apple’s Siri, making it more likely that they’ll hear options chosen by an algorithm than those they might actually prefer.

A study by Cornell researchers, exploring the broader implications of how content will be discovered as smart speakers grow more widespread, found that people who read choices online consumed information nine times faster and explored at least three times as much as those who heard them listed.

“We found that this problem is quite significant,” said Longqi Yang, a computer science doctoral student at Cornell Tech and first author of the paper, “Understanding User Interactions with Podcast Recommendations Delivered via Voice,” which was presented at the ACM Conference on Recommender Systems in October. “With these devices becoming more popular and more people adopting them, this kind of interface becomes very important, because it’s one of the major channels for people to be exposed to information.”

Smart speakers and virtual assistants could be designed differently to address this challenge, Yang said. The researchers recommended that smart speakers offer top-ranked choices that are diverse, personalized and frequently changed, so users have access to a wider range of information even if they choose from the first few items.

“We don’t want people to be offered an overly narrow set of content and opinions or be exposed only to what is most popular,” Yang said. “That might be acceptable when recommending shoes, but not when recommending information and cultural content.”

According to consumer research, 16 percent of Americans own a smart speaker – around 40 million people – and 65 percent of those say they would not go back to life without one.

In this experiment, the researchers asked 100 people to choose a podcast they would commit to listening to for five minutes. Half the participants saw the list of podcast titles and half of them heard the same list spoken out loud. They were then asked questions about whether they liked the podcast they’d chosen.

The researchers found listeners were far more likely to pick one of the first options offered, while people who read the choices explored six times more deeply into the list of recommendations. People reading their choices also did more skimming and browsing.

Recommendation algorithms generally prioritize popular content, potentially creating an echo-chamber effect, Yang said. In the study, people who read their recommendations were less likely to choose the most popular or top-rated options.

There was no statistically significant difference in how much people from either group enjoyed the podcasts they chose.

“One important problem with these kinds of recommendation systems is that they selectively share information with users, so your information exposure is determined by what the system explicitly offers you,” Yang said. “In the web interface, you have the ability to browse, you can scroll and skim. You get a very broad and wide exposure to different kinds of information that’s out there. With voice, people don’t really have the patience or won’t really wait for so many items to decide what they want to consume.”

The paper was co-authored with senior author Deborah Estrin, associate dean and Robert V. Tishman ’37 Professor of Computer Science at Cornell Tech; Cornell Tech postdoctoral associate Michael Sobolev; and Christina Tsangouri of the City University of New York. The experiment was part of the team’s broader research into the relationship between computer recommendations and people’s intentions when it comes to making choices.

The research was funded by the National Science Foundation and Oath, which is part of Verizon, and supported by Cornell Tech’s Connected Experiences and Small Data labs.