We increasingly place our trust in algorithms, whether applying for a mortgage or a new job, or making personal health decisions. But what about the security system that uses facial recognition and locks out a 55-year-old office custodian from her night shift? Or the groups of people automatically cropped out of photos on social media? These are the unintended, and often unfair, consequences of data science tools amplified across millions of users. They’re also highly preventable.
This is the lesson that lawyer and epidemiologist M. Elizabeth Karns embeds in every data science and statistics course she teaches in the Department of Statistics and Data Science. Her students will be deciding how to use data in the future, and while bad decision-making in business isn’t new, Karns says it’s the accelerated and aggregated effect of today’s data science applications that’s so dangerous: the decisions of an individual, a team or even a whole company can instantly affect the lives of millions of people. Moreover, the torrent of new technologies is moving faster than our regulatory systems, leaving a gap in accountability. Even data scientists themselves often don’t know exactly what’s happening inside their algorithms.
“This little magic box [the algorithm] is determining our life choices, often without any transparency, due process or a way to appeal,” says Karns. “That is why ethics is so important. We don’t have to further marginalize certain groups, and individuals should not have to worry about their safety because of poorly and unethically conceived data applications.”
To combat the current ‘wait and see’ approach to algorithms, Karns has partnered with eCornell to launch a new Data Ethics online certificate program that gives data science practitioners tools to build ethics into every project phase and data science workplace. The program encompasses four two-week courses that offer data scientists, or anyone managing data projects, a structured “pause” to consider the ethical implications of their work. Karns begins with an overview of macro-level data science issues of fairness, justice, safety and privacy, then shifts focus to individual choices.
Those choices, says Karns, are rooted in virtue ethics — the personal values or virtues that drive our behavior. The certificate program guides participants to identify and clarify their virtues, then offers low-stakes mechanisms for addressing ethical concerns when those virtues don’t align with those of a data science project, team, or organization.
“With virtue ethics we can identify what’s happening that makes us uncomfortable,” Karns says. “Is it the process? The personalities? The data collection method? Then, we use our moral imagination to play out the future ethical considerations and consider alternatives.”
Those alternatives don’t necessarily require tradeoffs, since the best ethical practices, which focus on reducing harm, are also stellar business practices. Documentation is a valuable tool, as is referencing corporate values, risk management and reputation. That is why the new Data Ethics certificate program isn’t just for practitioners; Karns says that managers of data scientists often don’t understand the kinds of requests they’re making, or the choices and risks they entail. She hopes the increased accessibility of data ethics education through online courses will be a significant step towards shaping the ethical data science workplace of the future, one where managers expect discussions about potential ethical issues and build ethical reflection into every step of the development process.
“The ability to recognize and mitigate harm that results from our actions is the key to building applications that are fair, just, and safe,” Karns says. “Ethical thinking is essential training, and we’re in a particularly advantageous time right now to provide practitioners the language and tools they need to improve data science results, and our world.”
Sarah Thompson is a writer for eCornell.