Are mental health apps like doctors, yogis, drugs or supplements?

Millions of people are using ChatGPT and similar artificial intelligence tools for therapy, but with little government regulation, there’s no guarantee these apps are helping – or that they won’t cause harm.

Cornell researchers are recommending new guidelines for developing safe and responsible large language model (LLM)-based mental well-being apps, informed by consultations with relevant experts and a review of existing state and federal regulations. They propose four ways to think about the apps, based on whether the apps guarantee specific benefits and how reliably those benefits are delivered.

Among the questions to consider: Does the app guarantee specific relief for specific people, like an over-the-counter drug, or does it promise to improve general well-being, like a nutritional supplement? Does it have a proven active ingredient – like a cognitive behavioral therapy module – and guarantee its effective delivery, as a primary care doctor would, or does it offer no such guarantee, like a yoga instructor?

“These are analogies that will be useful to people who are considering how to design LLM-based mental well-being tools responsibly in the absence of stricter FDA regulation across the sector,” said Ned Cooper, a postdoctoral researcher in the Cornell Ann S. Bowers College of Computing and Information Science.

Cooper will present the new study, “Framing Responsible Design of AI for Mental Well-Being: AI as Primary Care, Nutritional Supplement, or Yoga Instructor?” at the Association for Computing Machinery CHI Conference on Human Factors in Computing Systems in April.

In addition to guiding designers, the researchers hope this work will serve as a warning to users. “Be careful about how you use them,” said Qian Yang, assistant professor of information science in Cornell Bowers, head of the DesignAI studio and senior author on the study. “These are like nutritional supplements – don’t mistake them for drugs.”

Yang and Cooper initially met at the Everyday AI and Mental Health Thought Summit, held at Cornell last year. With a shared interest in designing technology and policy in tandem, they decided to look at how to encourage the creation of apps that are designed responsibly, rather than simply carrying a warning label, even without FDA approval.

LLM-based mental well-being apps have tremendous potential to reassure the “worried well,” de-stigmatize mental health conditions and make low-cost support available to large numbers of people, the researchers said.

However, apps created for general use or entertainment, like ChatGPT or character.ai, carry risks. They can replace relationships with other humans and cause people to delay or forgo mental health treatment. Ongoing lawsuits also allege they have caused mental health crises and suicides.

“There’s a million people expressing suicidal thoughts through ChatGPT every week,” Yang said. “For those people, there’s a need for additional safeguards and referral mechanisms to guide these people towards clinical care.”

To develop their framework, the researchers conducted interviews with 24 experts – including founders of mental health tech companies, professors of law and health policy, and clinical mental health professionals – to gather opinions about responsible design of LLM-based mental health tools. They also reviewed more than 100 existing U.S. regulations and extracted rules relevant to LLM tools.

The process yielded the four ways to think about LLM mental well-being support tools – as over-the-counter drugs, supplements, primary care doctors or yoga instructors – and designated top priorities for each type. For generally healthy people, tools analogous to yoga instructors or nutritional supplements should not displace clinical care and relationships with other humans. However, tools for people with specific mental health conditions, which are analogous to primary care doctors or over-the-counter medications, should prioritize safety and have demonstrated effectiveness.

Despite this broad agreement, experts had vastly differing opinions on some issues – such as whether it is acceptable for a tool that helps many people to pose a serious risk to a few. Medical experts were more accepting of this risk, seeing it as similar to breakthrough drugs that are life-saving for many but can have rare, potentially deadly side effects. Experts from ethics and human-centered design backgrounds, however, thought the apps should be held to a higher standard.

Now, Yang and Cooper are shifting their attention to designing apps and care coordination policy in tandem. They are interested in how regulations and health policy could incentivize companies to connect users to in-person peers and community programs, even though current business models are built on continued engagement.

“Our next step is really to think about how to make these tools do better with mental health in a way that’s not just improving the tool, but improving the ecosystem and incentive system,” Yang said.

Jose Guridi, a doctoral student in the field of information science; Angel Hsing-Chi Hwang ’23, formerly a postdoctoral researcher at Cornell and now at the University of Southern California; Beth Kolko of the University of Washington and Beth McGinty of Weill Cornell Medicine also contributed to the study.

Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.

Media Contact

Becka Bowyer