Making big data serve the little guy
By Bill Steele
Big data is a hot topic in computer science. It’s also big business, as government and commercial interests mine databases for their own purposes, scouting for business trends, political preferences and, of course, new customers.
The average citizen whose data is in these systems has little say in how it’s used. But a team of Cornell computer scientists, statisticians and mathematicians has formed the Center for Data Science for Improved Decision-Making to research data management, with the aim of making these systems handle data responsibly and using this new resource for the public benefit.
The team consists of Kilian Weinberger, associate professor of computer science; Jon Kleinberg, the Tisch University Professor of Computer Science; Steve Strogatz, the Jacob Gould Schurman Professor of Applied Mathematics; Giles Hooker, associate professor of biological statistics and computational biology; and David Shmoys, the Laibe/Acheson Professor of Business Management and Leadership Studies in the School of Operations Research and Information Engineering. As research progresses, they plan to collaborate with a large number of faculty members in related fields.
Their work will be supported by a $1.5 million grant from the National Science Foundation’s TRIPODS (Transdisciplinary Research in Principles of Data Science) program.
Their research will focus on several areas:
- How to guarantee the privacy of data and ensure that decisions are not biased by race, gender or other characteristics. The researchers propose to build into data management systems the ability to detect weaknesses in these areas and correct them.
- Learning more about the structure of social networks and the processes that take place within them, where connections between people can be used to inform decision-making. With early detection and containment strategies, the researchers say, adversarial fake news or viral disease epidemics can potentially be identified at a much earlier stage, and their damage may be controlled. The same applies, Weinberger noted, to phone company records of “who’s calling whom.”
- “Interventions” where systems make recommendations or suggestions, or reach decisions about participants, based on their histories. Applications range from overseeing health care interventions to avoiding polarization of user populations.
- Uncertainty quantification: Knowing how uncertain a prediction might be, especially when it is applied to decision-making with potential consequences for human subjects. Some currently popular algorithms don’t report how much variability there might be in their output.
- Deep learning, widely used but still not well understood. There is ambiguity about what these systems actually learn.
TRIPODS projects are aimed at harnessing the data revolution to enable continued data-driven discovery and breakthroughs across all fields of science and engineering, NSF said in announcing the grant.