Developing artificial intelligence tools for health care

Reinforcement Learning (RL), an artificial intelligence approach, has the potential to guide physicians in designing sequential treatment strategies for better patient outcomes but requires significant improvements before it can be applied in clinical settings, researchers from Weill Cornell Medicine and Rockefeller University have found.

RL is a class of machine learning algorithms able to make a series of decisions over time. Responsible for recent AI advances including superhuman performance at chess and Go, RL can use evolving patient conditions, test results and previous treatment responses to suggest the next best step in personalized patient care. This approach is particularly promising for the sequential decision-making involved in managing chronic or psychiatric diseases.

This research, published in the Proceedings of the Conference on Neural Information Processing Systems (NeurIPS) and presented Dec. 13, introduces “Episodes of Care” (EpiCare), the first RL benchmark for health care to drive improvements in this area.

“Benchmarks have driven improvement across machine learning applications including computer vision, natural language processing, speech recognition and self-driving cars. We hope they will now push RL progress in health care,” said Logan Grosenick, assistant professor of neuroscience in psychiatry, who led the research.

RL agents refine their actions based on the feedback they receive, gradually learning a policy that enhances their decision-making. “However, our findings show that while current methods are promising, they are exceedingly data-hungry,” Grosenick said.
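The feedback loop Grosenick describes can be illustrated with a toy sketch. This is not EpiCare's actual API or the study's method; it assumes a made-up environment with two hypothetical treatments whose response rates the agent does not know, and shows how value estimates improve as reward feedback accumulates:

```python
import random

# Toy illustration (not EpiCare's actual API): an agent picks one of two
# hypothetical treatments each episode and refines its value estimates
# from the reward feedback it receives -- the core RL loop.
random.seed(0)

N_TREATMENTS = 2
q = [0.0] * N_TREATMENTS          # estimated value of each treatment
counts = [0] * N_TREATMENTS
true_response = [0.3, 0.7]        # hidden probability each treatment helps

def choose(epsilon=0.1):
    """Epsilon-greedy: mostly exploit the best estimate, sometimes explore."""
    if random.random() < epsilon:
        return random.randrange(N_TREATMENTS)
    return max(range(N_TREATMENTS), key=lambda a: q[a])

for episode in range(5000):
    a = choose()
    reward = 1.0 if random.random() < true_response[a] else 0.0
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]   # incremental running-average update

best = max(range(N_TREATMENTS), key=lambda a: q[a])
```

Note that even this trivially simple setting takes thousands of episodes to produce reliable estimates, which echoes the data-hunger problem the study identifies.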

The researchers first tested the performance of five state-of-the-art online RL models on EpiCare. All five beat a standard-of-care baseline, but only after training on thousands or tens of thousands of realistic simulated treatment episodes. In the real world, RL methods would never be trained directly on patients, so the investigators next evaluated five common “off-policy evaluation” (OPE) methods: popular approaches that aim to use historical data (such as from clinical trials) to circumvent the need for online data collection.
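The idea behind off-policy evaluation can be sketched with importance sampling, one classic OPE approach. This is a hypothetical illustration, not the methods benchmarked in the paper: it estimates how a new "target" policy would perform using only episodes logged under an older "behavior" policy, with no online data collection:

```python
import random

# Hypothetical OPE sketch via importance sampling: reweight rewards from
# historical data (logged under a behavior policy) to estimate the value
# of a different target policy without ever deploying it.
random.seed(1)

behavior = [0.5, 0.5]        # logging policy: picks treatments 50/50
target   = [0.1, 0.9]        # candidate policy to evaluate offline
true_response = [0.3, 0.7]   # hidden response rate of each treatment

# Historical dataset collected under the behavior policy.
logged = []
for _ in range(20000):
    a = 0 if random.random() < behavior[0] else 1
    r = 1.0 if random.random() < true_response[a] else 0.0
    logged.append((a, r))

# Importance-sampling estimate: weight each logged reward by how much
# more (or less) likely the target policy was to take the same action.
estimate = sum(target[a] / behavior[a] * r for a, r in logged) / len(logged)

# Ground-truth value of the target policy, for comparison.
truth = sum(p * v for p, v in zip(target, true_response))
```

In this one-step toy the estimate lands close to the truth, but over long treatment sequences the importance weights multiply across steps and the variance explodes, which is one reason OPE methods can fail on longitudinal health care data of the kind EpiCare simulates.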

Using EpiCare, they found that state-of-the-art OPE methods consistently failed to perform accurately for health care data.

“Our findings indicate that current state-of-the-art OPE methods cannot be trusted to accurately predict RL performance in longitudinal health care scenarios,” said first author Mason Hargrave, a research fellow at Rockefeller University. As OPE methods have been increasingly discussed for health care applications, this finding highlights a need for developing more accurate benchmarking tools, like EpiCare, to audit existing RL approaches and provide metrics for measuring improvement.

“We hope this work will facilitate more reliable assessment of reinforcement learning in health care settings and help accelerate the development of better algorithms and training protocols appropriate for medical applications,” Grosenick said.

A version of this story appears on the Weill Cornell Medicine website.

Karen Hopkin is a freelance writer for Weill Cornell Medicine.

Media Contact

Krystle Lopez