Collaborators look at legal issues through big data lens

By Joe Wilensky

July 2, 2019

Michael Heise is the William G. McRoberts Professor in the Empirical Study of Law at Cornell Law School; he co-edits the Journal of Empirical Legal Studies (JELS) and is a founding director of the Society for Empirical Legal Studies. Marty Wells is the Charles A. Alexander Professor of Statistical Sciences and holds appointments in the ILR School, the College of Agriculture and Life Sciences, Cornell Law School and Weill Cornell Medicine. He also co-edits JELS, is the chair of the Department of Statistics and Data Science, and is a member of the provost’s Data Science Task Force.

Together they collaborate on empirical legal research, applying advanced data science and statistical analyses to look at legal issues that affect people’s lives as well as examining the judiciary system and how it operates.

How did your collaboration get started?

Marty Wells: It essentially started with a cold call in the early ’90s from [the late Cornell Law School professor] Ted Eisenberg. He was a pioneer in empirical legal studies, and back in the ’80s he was trying to look at data from masses of cases and then make conclusions. And he just called me up one day and said, “I have a question: Can you help me?”

I was a statistician in the ILR School, and my research at that time had a theoretical focus. I thought, “This is great. I have a way of doing something that’s relevant to social sciences, and I’ll be able to reach out and talk to lawyers.” Ted was just wonderful, a collegial legal scholar.

We started going to weekly lunches and eventually worked on a wide range of projects for over 20 years. From that first call, we worked on the national Capital Jury Project, and that was an early connection with Cornell Law School and the death penalty. At the time, John Blume [now the Samuel F. Leibowitz Professor of Trial Techniques and director of the Cornell Death Penalty Project] ran a death penalty resource center in South Carolina, and he had a lot of data to analyze.

Michael Heise: One interesting aspect is that the larger collaboration predates me. In fact, when Ted and Marty began I was still finishing up law school and grad school. But Ted’s instincts were dead-on, as illustrated by his reaching out to Marty to formally engage legal questions and legal research and to exploit data – what is now known as data science – just as the required computational power was beginning to ramp up. It wasn’t just that Ted and Marty began contributing to the literature – they actually helped redefine it and propel empirical legal studies. Ted was a generation ahead of his time.

Wells: This was at a time before the term “data science” even existed. Analyzing large-scale legal data was a novel idea, and Ted had a vision that this was where legal scholarship was headed: looking over lots of cases to get a view about what happens in the aggregate.

So we started working on South Carolina death penalty data, which was part of the Capital Jury Project. John Blume also got us involved in going to post-conviction reviews in South Carolina and analyzing countywide sentencing data.

Our first big paper that came out of that work was about juror instructions. We had data from the surveys, and we found that many jurors didn’t understand the instructions given by judges. And, furthermore, jurors who didn’t understand instructions were more likely to vote for death. We analyzed data relevant to South Carolina at the same time a case that hinged on understanding jury instructions was coming up before the U.S. Supreme Court. We wrote an article that was published in the Cornell Law Review, and was subsequently cited in the Supreme Court decision.

Heise: This was a key early move. Marty brought world-class statistical training and data analytics. But Ted and John Blume said: “We need to go to the county level.” Now, to a statistician, the county level as a unit of analysis doesn’t mean all that much. Legally, however, it has tremendous salience in terms of assessing how legal rules are being applied or misapplied, especially as it relates to judges, jurors, jury instructions, jury pools and prosecutors.

Wells: As a statistician and data scientist, it is great to collaborate with leading scholars, such as Michael, because they have a deep understanding of their substantive area. The best collaborations arise when neither of the collaborators can do what the other can do. It brings out the best of each of your skill sets, and the research can have a real impact.

What does this collaboration, and its future potential, look like today in an era of big data?

Heise: In terms of big data and analytics, issues relating to artificial intelligence and algorithms are quickly emerging. For legal scholars, this is simply a new area, and we have begun, just in the last couple of years, to digest and think through some interesting and potentially critical legal changes that will be driven by truly massive data sets interacting with enormous computational power.

Wells: Legal scholars are now tapping into data sets that, previously, only social science researchers had used, such as data from the Bureau of Justice Statistics and the National Center for State Courts, as well as massive epidemiological and health care data sets. Law schools are now hiring many J.D./Ph.D.s. They bring the social science tradition and the ability and the desire to look at large-scale data.

Heise: The Law School has a distinct mission, and we’re not training social scientists; we’re training lawyers. But to be a sophisticated lawyer in today’s global economy, you have to be familiar with basic statistical language and concepts, at least at a general level. Serious lawyers also need to be able to consume basic science research so they know what questions to ask.

Now that big data is being increasingly applied to legal research and scholarship, what changes might have to be reckoned with?

Heise: One timely example involves the criminal division of the United States Department of Justice that recently developed algorithms to predict criminal recidivism, which can affect bail decisions and early release recommendations. These algorithms draw on massive data sets, and if those data sets are systematically flawed, this would generate important real-world consequences.

Wells: I’m involved with a Cornell Institute for the Social Sciences research project looking at fairness in predictive algorithms. A popular claim is that predictive algorithms are objective; however, algorithms are trained on historical data that may have built-in biases. With expertise in the College of Computing and Information Science, the Law School, and the departments of Science and Technology Studies and Communications, Cornell is a national leader.

How does Cornell’s environment foster collaborations?

Heise: At the practical level, that Marty has office space here at the Law School is important. Just three or four months ago I was walking down the hall and overheard him talking with another scholar. I poked my head in and said, “I’ve been banging my head against the wall on this particular question.” We talked about it for 10, 15 minutes. Marty said, “Well, send me the data set.” A day or so later he said, “Oh, well here’s what you need to do.” That quick conversation resulted in another published paper, and this took place only because Marty has an office here.

Wells: Proximity matters. One learns what new empirical methodologies are needed by talking with substantive scholars. I would rather publish in a substantive field journal, because those are the scholars who really need to know about those methods. You’re likely to make more of an impact.

Talk about the study you worked on that involved the Exxon Valdez case.

Heise: That was a multi-decade project.

Wells: At the time [1989], it seemed that Exxon was lobbying to show that there was no rationality in how jurors assess punitive damages. And it turned out that just as the Exxon trials were starting, the National Center for State Courts released high-quality data on punitive damages. We looked at a simple logarithmic transformation of the dollars rather than the dollars themselves, found robust results, and began publishing a series of papers on aspects of the relationship between compensatory and punitive damages.

Heise: The papers relate to a series of court decisions that ultimately found their way to the United States Supreme Court. While this punitive damages debate was framed by tobacco litigation and some massive judgments like the Exxon Valdez case, we found that, contrary to conventional wisdom at the time, the amount of compensatory or actual damages awarded by juries strongly correlates with the amount of punitive damages, and that this relation has persisted over time.

Looking only at individual cases obscures this trend. But if you take a couple of steps back and analyze systematic data, a remarkably robust relation emerges.

Wells: And the results were robust across judges and juries. That was an early finding that attracted a lot of attention.

Heise: The civil justice system is not as unpredictable as some critics suggest. Some wanted states to ban punitive damages. They said jurors were picking numbers out of thin air, that it was a lottery with negative consequences for legal rules and institutions. The critique, however, is simply not supported by the data.

Wells: It was quite exciting. This was a problem – a legal problem, and it was fundamentally an empirical problem. And lawyers are very good at making arguments – but if you have some quality data you can actually win the argument.
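The logarithmic transformation Wells describes can be illustrated with a minimal sketch. It is not the authors' actual analysis: the awards below are synthetic numbers generated for illustration only, and the function name `fit_log_log`, the assumed slope of 0.8 and the noise level are all hypothetical choices. The sketch regresses the log of punitive awards on the log of compensatory awards, which is one simple way to examine the kind of relationship discussed above.

```python
import math
import random


def fit_log_log(compensatory, punitive):
    """Least-squares fit of log10(punitive) on log10(compensatory).

    Returns (slope, intercept, correlation), all on the log scale.
    """
    xs = [math.log10(c) for c in compensatory]
    ys = [math.log10(p) for p in punitive]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


# Synthetic data only (an assumption for illustration, not real case data):
# compensatory awards spread from $1,000 to $10 million, with punitive
# awards tied to them on the log scale plus random noise.
rng = random.Random(0)
compensatory = [10 ** rng.uniform(3, 7) for _ in range(500)]
punitive = [c ** 0.8 * 10 ** rng.gauss(0.5, 0.5) for c in compensatory]

slope, intercept, correlation = fit_log_log(compensatory, punitive)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={correlation:.2f}")
```

Working with log dollars keeps a handful of very large awards from dominating the fit, which is one reason a pattern that is hard to see in raw dollar amounts can become visible in the aggregate.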

What is something you are currently working on?

Heise: We’re currently looking at how the number of cases on the U.S. Supreme Court docket has declined over the last couple of decades. At the high point, the Court was deciding approximately 200 cases a year; it’s now in the low 60s. The question is, what’s going on? What explains such a tremendous decline over time? We’re just beginning our work on these questions; we’re testing some hypotheses as to what might account statistically for that. There’s nothing obvious yet.

Wells: It is interesting that the number of certiorari petitions [requests for the Court to review a lower court’s decision] has gone up exponentially while the number of cases the Court is taking on now, the number it decides, is at the level it was right before the Civil War. Our data analysis methodology is novel and is akin to the methods used in epidemiology.

What is Cornell Law School’s legacy in the area of empirical research?

Wells: Cornell Law School has an exceptional empirical legal studies tradition. Cornell’s Journal of Empirical Legal Studies is the premier journal in the field.

Heise: During the 1980s, Cornell emerged as the leading empirical legal studies law school in the country; it’s what attracted me here. The Law School remains one of the most robust, engaged, supportive institutions for empirical legal research – and that’s the product of great deans, great colleagues and great students.

Now, in part because of what started here at Cornell, the entire law school world has shifted a bit. There is no longer any serious law school in the United States without Ph.D.s or J.D./Ph.D.s on the faculty. Thus, the scholarly space that Cornell used to have for itself is increasingly shared space with other leading law schools.

But only Cornell Law School has Marty.
