To build trust in data science, work together

As data science systems become more widespread, effectively governing and managing them has become a top priority for practitioners and researchers. While data science allows researchers to chart new frontiers, it requires varied forms of discretion and interpretation to ensure its credibility. Central to this is the notion of trust: How can we reliably judge the trustworthiness of data, algorithms and models?

This is the basis of research from Samir Passi, doctoral student in information science, and Steven Jackson, associate professor of information science, whose paper “Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects,” received a Best Paper Award at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), held Nov. 3-7. Passi and Jackson also received a Best Paper Award for “Data Vision: Learning to See Through Algorithmic Abstraction” at last year’s CSCW conference.

Samir Passi, left, a doctoral candidate in the field of Information Science, and Steve Jackson, associate professor of information science, co-authored a forthcoming paper that examines how we navigate uncertainties in applied data science.

“When it comes to data science, people, practitioners and researchers are faced with an important question: How is it that we trust algorithms that we often don’t understand or can’t explain?” Passi said. “Often, we think of trust in data science as a form of calculated trust, but what we show in the paper is that trust in data science is as much collaborative as it is calculative. The use of numbers such as performance metrics, for instance, isn’t straightforward. Their use depends on the context – who provides numbers, when, to whom and for what purpose? Numbers have a sort of plasticity to them.”

The paper’s findings came about through on-the-ground fieldwork at a multibillion-dollar technology firm where Passi worked in an unconventional dual role as data scientist and researcher. Given permission from the company to carry out his research, Passi got a detailed look into the challenges corporations face in designing, developing and using data science systems. These include the difficulty of explaining how data science tools and algorithms work, and the fact that algorithmic results don’t always tell the full story and so require interpretation.

The paper examines these tensions in two separate projects. In one project, the data science team used a prediction model to help a marketing company determine how many of its current customers were likely to cancel their paid service. While both the company’s business and data science teams shared the goal of minimizing cancellations, the data science team saw the algorithmic results as valuable, while the business team – unclear how the figures were even generated – discounted them as incomplete.

The probability generated by the model “is a good indicator, but it is not the absolute truth for us,” said one business analyst interviewed by Passi. The model helped identify current customers who were likely to cancel their service, but it didn’t explain why.

The business and data science teams also differed on a separate project for a loan-financing service, leading the authors to conclude that trust in data science systems is entangled not only with the perceived credibility of the data, but also with people’s understanding of and confidence in how the model works.

“Corporate actors prioritize useful results over flawless techniques, working to find pragmatic ways to make the best out of messy situations. They use various strategies to work with, and not necessarily around, doubt and skepticism,” Passi said. “The important point to note here is that real-world applied data science is extremely heterogeneous and collaborative. For instance, we often describe project managers and business analysts as mere users of data science systems, but that is not true. We show in the paper how these experts are also in part the designers of data science systems.”

Beyond its current implications, the paper also connects emerging data science practices to longstanding tensions surrounding science.

“While the paper focuses on challenges and interactions in a given firm, it reminds us that problems of knowledge and uncertainty are always grounded in a social context, and that attributions of trust are a time-honored and often efficient response to those problems,” said Jackson. “In many ways, data scientists are now navigating the same concerns that makers and users of experimental science, statistics and other analytic techniques have negotiated before them. Who and what to trust? Where and how to doubt? And what counts as valid knowledge in a changing and uncertain world?”

Louis DiPietro is communications coordinator for the Department of Information Science.
