Can AI plan for heat emergencies better than simple rules? It depends

The thermometer reads 95 degrees in Brooklyn, and vulnerable residents need information they can act on. New York City officials must gather facts quickly to provide updates on cooling centers, power outages and other details that could save lives.

Are these details best gleaned from simple, low-complexity methods or AI-based tools?

A Cornell-led research team discovered that the answer is far from black-and-white. The benefits of a simple, human-understandable index score vs. a less-interpretable predictive AI algorithm depend, they found, on the desired outcome as well as the decision’s intended audience. For example, AI may be better at making on-the-fly decisions that can inform outreach or emergency alerts, while a human-based index could be better at measuring more abstract concepts, such as “heat vulnerability.”

Both types of methods may also be highly sensitive, producing outputs that vary greatly in response to slight changes in inputs.

“We shouldn’t necessarily evaluate these predictive algorithms for just bias, fairness or whether they’re effective, but also in relation to what’s already being used: indices,” said Jennah Gosciak, a doctoral student in information science. “Our work kind of flips the analysis and really just investigates what’s actually being used already, the trade-offs associated with each, and whether we can make incremental improvements on them.”

Gosciak is lead author of “Scrutinizing Index-Based Risk Assessments: A Case Study in NYC Decision-making for Heat Emergency Management,” presented at the ACM Conference on Fairness, Accountability, and Transparency (FAccT ’26), held June 25–28 in Montreal.

The senior author is Allison Koenecke, assistant professor of information science at Cornell Tech and the Cornell Ann S. Bowers College of Computing and Information Science.

For this work, the research team tested the sensitivity of New York City’s Heat Vulnerability Index (HVI), a widely used tool in the city’s long-term planning efforts. Those efforts include the “NYC Urban Forests Agenda,” a strategy unveiled this year with recommendations for protecting the city’s urban forest, and “Cool Neighborhoods,” a 2017 strategic initiative aimed at mitigating the effects of excessive heat through measures such as tree planting and green infrastructure.

The HVI uses five data inputs – daytime summer surface temperature; the percentage of households with air conditioning; the percentage of vegetative cover; median household income; and the percentage of residents who are non-Latino Black – to calculate a risk score (1 to 5) based on the temperature.

Gosciak and the group evaluated the reliability and validity of the HVI, comparing it to two other widely used indices: the Federal Emergency Management Agency’s National Risk Index and the Centers for Disease Control and Prevention’s Heat and Health Index.

The researchers found that the HVI is sensitive to its inputs, and can vary greatly depending on the goals or priorities (say, health vs. economic loss). It may generally be better suited to long-range planning than short-term response.
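To make the idea of input sensitivity concrete, here is a minimal sketch of how a composite index of this general kind could be probed. It is not the city’s actual HVI formula or the researchers’ method: the input names, the sign given to each input, the equal weighting, the quintile binning and the synthetic data are all assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical input names, loosely following the five inputs described above.
# +1 means a higher value raises the score, -1 means it lowers the score.
# These signs and the equal weighting are assumptions, not the official HVI.
INPUTS = {
    "surface_temp": +1,
    "pct_air_conditioning": -1,
    "pct_vegetation": -1,
    "median_income": -1,
    "pct_black_non_latino": +1,
}


def zscores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def index_scores(rows):
    """Sum signed z-scores per area, then bin areas into quintiles 1 to 5."""
    n = len(rows)
    totals = [0.0] * n
    for name, sign in INPUTS.items():
        for i, z in enumerate(zscores([r[name] for r in rows])):
            totals[i] += sign * z
    order = sorted(range(n), key=lambda i: totals[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // n + 1
    return scores


def fraction_changed(rows, name, noise_sd, rng):
    """Share of areas whose score changes after adding small noise to one input."""
    base = index_scores(rows)
    perturbed = [dict(r, **{name: r[name] + rng.gauss(0, noise_sd)}) for r in rows]
    new = index_scores(perturbed)
    return sum(b != p for b, p in zip(base, new)) / len(rows)


rng = random.Random(0)
# Synthetic areas with random values in [0, 1); not real neighborhood data.
rows = [{name: rng.random() for name in INPUTS} for _ in range(200)]
scores = index_scores(rows)
share = fraction_changed(rows, "surface_temp", 0.02, rng)
print(f"{share:.0%} of synthetic areas change score after a small perturbation")
```

Because each score depends on where an area ranks relative to the others, areas near a quintile boundary can move up or down a level when a single input shifts slightly, which is one way an index can be sensitive to its inputs.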

Gosciak said an index isn’t necessarily better or worse than an algorithm – both are appropriate in certain settings.

“There are some decisions or settings in which an index actually is useful, and I wouldn’t want to discount it,” she said. “Those are settings where these tools are already being used. Then there are some decisions – around funding allocation or outreach, for example – where you might want to use an index, but you could maybe do even better with a predictive algorithm.”

The researchers offer seven trade-offs that decision-makers should weigh when choosing between human-based indices and algorithms. They include:

  • Problem formulation: Consider the scope of the project and the relevant goals (placing cooling centers vs. measuring heat-related illnesses);
  • Timing: Indices often involve slow data inputs, and are best suited for long-term planning, while predictive algorithms are better suited to support on-the-fly decision-making, such as the response to heat-related power outages; and
  • Intended audience: Indices feature easily understood data and are better suited for a lay audience, while algorithms often use high-level data to inform decisions made by official agencies and individuals.

Gosciak noted that while theirs is a specific case study, the methodology could translate to a variety of situations.

“Index tools are used more widely in policymaking in a variety of settings, including environmental justice and to inform resource-allocation decisions,” she said. “We think it’s useful to weigh both options.”

Other co-authors are Angelina Wang, assistant professor of information science at Cornell Tech and Cornell Bowers; and Luke Boyce, manager of the Strategic Initiatives Program at New York City Emergency Management.

Support for this work came from Cornell Tech’s Siegel Family Endowment PiTech Ph.D. Impact Fellowship and Rubinstein PiTech Ph.D. Innovation Fellowship, both to Gosciak.
