Incentives encourage greater exploration, research finds

Poet Robert Frost urged us to explore “the road not taken.” But what if you’re not driving the bus? How do you encourage others to explore less-traveled roads? Cornell researchers have found an answer in an area of computer science you don’t usually hear about: playing imaginary slot machines. There are applications in the academic world, from research funding to crowdsourcing, but also in everyday life.

In crowdsourcing, where social science researchers employ hundreds or even thousands of people to perform tasks online, workers might spend all their time on familiar tasks that provide small but reliable rewards, and miss those that haven’t been tried before but might be of more interest to the researchers. Institutions like the Cornell Lab of Ornithology recruit thousands of citizen scientists to track the distribution of species, but some birders may spend all their time in places where they know they will get many sightings, missing what’s happening in areas that aren’t often explored. Similarly, organizations like the National Science Foundation and the National Institutes of Health are looking for big ideas with long-term benefits for society, but researchers often focus on small advances in popular fields with better short-term returns.

As an incentive you could pay the people you work with for trying new things, but how much can you afford? A funding agency might deliberately skew its funding toward high-risk ideas, hoping applicants will notice and submit more of those, but how much of that can these agencies do while still supporting trending topics?

“You have a budget that you can afford, you want to use this budget to maximize the result,” said Peter Frazier, assistant professor of operations research and information engineering. “We give a formula for what is achievable.”

Frazier is first author of “Incentivizing Exploration,” which received the Best Paper Award at the 2014 ACM Conference on Economics and Computation in Palo Alto, Calif. Co-authors are Robert Kleinberg, associate professor of computer science; Jon Kleinberg, the Tisch University Professor of Computer Science; and David Kempe, Ph.D. ’03, associate professor of computer science at the University of Southern California (a former student of Jon Kleinberg’s).

They viewed the issue in terms of what computer scientists call the “multi-armed bandit problem.” Imagine playing a row of slot machines: You know that some of them pay off frequently, but there are others you haven’t tried yet, and some of those might pay off even better. How much time can you take away from playing the reliable machines to try out the others?
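The trade-off can be illustrated with a short Python sketch. This is not the researchers’ method: it uses epsilon-greedy, a simple textbook strategy in which the player usually pulls the machine with the best observed payout and occasionally tries one at random. The function name, the payout probabilities and the value of epsilon are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(payout_probs, pulls, epsilon, seed=0):
    """Play a row of hypothetical slot machines with an epsilon-greedy rule.

    payout_probs: assumed chance that each machine pays 1 unit per pull.
    epsilon: fraction of pulls spent trying a machine chosen at random.
    Returns (total_winnings, pulls_per_machine).
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    counts = [0] * n    # how often each machine has been pulled
    means = [0.0] * n   # observed average payout of each machine
    total = 0

    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda i: means[i])   # exploit

        reward = 1 if rng.random() < payout_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the running average for this machine.
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward

    return total, counts


if __name__ == "__main__":
    # Machine 1 is better, but a purely greedy player never finds out.
    print(run_epsilon_greedy([0.5, 0.9], pulls=1000, epsilon=0.0))
    print(run_epsilon_greedy([0.5, 0.9], pulls=1000, epsilon=0.1))
```

With epsilon set to 0, the player in this sketch never leaves the first machine, because an untried machine always looks no better than one that has already been played. A small amount of exploration is enough to discover the better machine.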

Theorists have worked out the best strategies, but suppose you’re not pulling the handles yourself: You’ve hired a bunch of people to do it for you, giving them a cut of the take. Although you would follow the mathematically ideal strategy, these “selfish and myopic” agents might spend all their time pulling the handles they know will pay off. You’ll have to pay them something extra to try the unknowns.
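To make the idea of “something extra” concrete, here is a minimal Python sketch under stated assumptions: each agent cares only about the immediate expected payout of a machine plus any bonus offered for pulling it, and breaks ties in favor of the machine carrying the larger bonus. Under those assumptions, the smallest bonus that moves an agent to a given machine is the gap between the best-looking machine and that one. The function names and numbers are hypothetical and are not taken from the paper.

```python
def minimum_bonus(estimated_means, target_arm):
    """Smallest bonus that makes a myopic agent willing to pull target_arm.

    Assumes the agent compares machines by estimated payout plus bonus
    and breaks ties in favor of the machine with the larger bonus.
    """
    return max(estimated_means) - estimated_means[target_arm]


def myopic_choice(estimated_means, bonuses, tolerance=1e-12):
    """Return the machine a myopic agent picks, given per-machine bonuses."""
    values = [m + b for m, b in zip(estimated_means, bonuses)]
    best = max(values)
    # Machines whose value is tied for best (up to floating-point tolerance).
    tied = [i for i, v in enumerate(values) if v >= best - tolerance]
    # Tie-break assumption: the agent takes the larger bonus.
    return max(tied, key=lambda i: bonuses[i])


if __name__ == "__main__":
    means = [0.75, 0.25]             # machine 0 looks better so far
    bonus = minimum_bonus(means, 1)  # what it costs to get machine 1 tried
    print(bonus)
    print(myopic_choice(means, [0.0, 0.0]))    # no bonus offered
    print(myopic_choice(means, [0.0, bonus]))  # bonus on machine 1
```

The bonus shrinks as the under-explored machine starts to look better, so in this toy model the cost of steering an agent depends on how far its current estimates are from the best-looking option.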

The researchers supply a formula showing what overall return is achievable for a given amount paid to agents, from which you can calculate how much you can pay and still end up with an overall gain. As a spinoff, they found that for especially hard problems the best return is obtained by switching randomly between forcing agents to use the optimum strategy and letting them selfishly choose on their own. The gains on one side balance the losses on the other, they showed.
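The random switching can be sketched as a small simulation. This is an illustration under loud assumptions, not the formula or the policy from the paper: the “optimum strategy” is replaced by UCB1, a standard exploration rule used here only as a stand-in; the payment for steering an agent is assumed to be the gap in estimated payout between the agent’s own choice and the steered one; and `p_incentivize`, the chance of steering in any round, is a hypothetical parameter.

```python
import math
import random


def simulate_mixed_policy(payout_probs, rounds, p_incentivize, seed=0):
    """Mix steered and selfish rounds; return (reward, payment, counts)."""
    rng = random.Random(seed)
    n = len(payout_probs)
    counts = [0] * n
    means = [0.0] * n
    total_reward = 0
    total_payment = 0.0

    for t in range(1, rounds + 1):
        # What a myopic agent would pull if left alone.
        greedy_arm = max(range(n), key=lambda i: means[i])
        arm = greedy_arm

        if rng.random() < p_incentivize:
            # Stand-in for the principal's preferred choice: any untried
            # machine first, otherwise the highest UCB1 score.
            untried = [i for i in range(n) if counts[i] == 0]
            if untried:
                preferred = untried[0]
            else:
                preferred = max(
                    range(n),
                    key=lambda i: means[i]
                    + math.sqrt(2 * math.log(t) / counts[i]),
                )
            # Assumed payment: the estimated payout the agent gives up.
            total_payment += means[greedy_arm] - means[preferred]
            arm = preferred

        reward = 1 if rng.random() < payout_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward

    return total_reward, total_payment, counts


if __name__ == "__main__":
    for p in (0.0, 0.2, 1.0):
        print(p, simulate_mixed_policy([0.3, 0.8], rounds=2000, p_incentivize=p))
```

Running the sketch for several values of `p_incentivize` shows the kind of trade-off the article describes: steering more often costs more in payments but lets the better machine be found, while never steering costs nothing and leaves the agents on the first machine they tried.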

As a next step, they suggest considering the future value of the risky choices. In research, a strange new idea might not pay off immediately, but its long-term value might justify a larger incentive, and that should be included in the calculation of how much to spend today. That makes the problem harder to solve, they said, but still an untraveled road that should be explored.
