Members of the Cornell Tech SETS AI Content Red Team Clinic work on their red-teaming project with the New York City Department of Health and Mental Hygiene, Nov. 19 in the Bloomberg Center at Cornell Tech.

‘Red team’ students stress-test NYC health department’s AI

People usually strive to be their true, authentic selves, but this fall, five master’s students at Cornell Tech adopted not only alter egos but also “bad intent,” in an effort to make AI safer for health workers serving people with diabetes.

The students – Divya Bhanushali, Bhavya Gopal, Ali Hasan, Nikhil Jain and Om Kamath – were in the first cohort of the new Security, Trust, and Safety Initiative (SETS) AI Content Red Team Clinic, which debuted this fall on the Roosevelt Island campus in New York City. The free service provides assistance to public-service organizations that don’t have the resources or capacity to conduct stress tests on their artificial-intelligence assets by “red-teaming.”

In the digital realm, a red team is a group that simulates outside attacks an organization’s AI tools.

Cornell Tech master’s students Divya Bhanushali, Om Kamath and Nikhil Jain work on the red-teaming project for the New York City Department of Health and Mental Hygiene, Nov. 19 in the Bloomberg Center at Cornell Tech.

“Red-teaming is an exercise – originating from cybersecurity, but now popular also in digital safety – of trying to break a tool by exposing its vulnerabilities,” said Alexios Mantzarlis, director of the SETS Initiative, “and helping to create the appropriate guardrails and safety measures to avoid that happening in real life.”

“The red-teaming experience allowed us to get into the mindsets of adversarial users, stress-test our clients’ AI tools and discover unexpected results that help our clients build safeguards,” said Jain, a master’s student in information systems.

The AI Content Red Team engages with clients on a six-week cycle, after which the team provides a written report with information on the tests and results.

The clinic’s first client: the New York City Department of Health and Mental Hygiene. Mantzarlis said he was “thrilled” that the health department reached out regarding the clinic earlier this year; they began working with the health department in mid-October and wrapped up in late November.

“That was the idea – to get interest from a New York City-based public sector organization,” he said. “Our goal was to support either nonprofits or public institutions that don’t have safety teams in-house but are developing AI tools that are grounded in their specific expertise.”

The health department sought assistance with two AI-powered tools built on top of a popular large language model (LLM) provider. The first is a chatbot built to support health workers serving people who live with diabetes in New York. It is trained to first respond, based on health department documents about diabetes, then refer users to the web. 

The second AI tool is designed to monitor and contextualize emerging narratives about diabetes. 

Gavin Myers, a public health practitioner at the health department, worked with diabetes clinicians and community health workers to help inform the students’ work. He then met with the students virtually to clarify the department’s needs, and define the limits of what the chatbot should – and should not – do.

The student team collaborated closely with health department physicians to ensure accuracy, and the parties met in person at both the beginning and end of the six-week project.

The clinic team focused on making sure the information provided by the chatbots is accurate and devoid of insensitive or inappropriate language specific to people with diabetes. The students assumed various personas and ethically hacked into the chatbot, simulated an attack and reported on the response.

The experience was both eye-opening and “unexpectedly nuanced,” said Bhanushali, a master’s student in information systems with a concentration in connective media.

“As someone who naturally gravitates toward extreme cases and enjoys finding ways to break systems, I found real satisfaction in probing the LLM’s vulnerabilities,” Bhanushali said. “We’ve developed such tremendous trust in these tools, yet they remain fundamentally flawed. Watching them buckle under overwhelming prompts or carefully crafted adversarial scenarios revealed just how fragile that trust can be.”

The students’ interactions with the health department’s chatbots ranged from direct attacks to more subtle conversations designed to gradually steer the AI toward problematic outputs.

Attacking from all angles

Types of attacks included “jailbreaking” attempts to bypass the AI’s safety guidelines to get it to provide harmful health advice, and attempting to override the system’s instructions through carefully crafted inputs, a process called prompt injection. They also tested if the chatbot could be manipulated into spreading false health information, and probed for potential biases in health recommendations across different demographic groups. And the team tried to get the system to recommend dangerous or inappropriate treatments.

The team first manually submitted adversarial queries to the chatbot. But since they had to perform testing at scale in just six weeks, they automated the process of generating attack prompts.

Kamath, an applied computer and information science student, developed the automated prompt-engineering.

“I put together a web automation that generated adversarial prompts rapidly and sought loopholes within the system,” Kamath said. “We faced several bumps along the way, such as being rate-limited and receiving warnings from the model provider about potentially fraudulent activity.”

Members of the Cornell Tech SETS AI Content Red Team Clinic: from left, Divya Bhanushali, Bhavya Gopal, AI director Alexios Mantzarlis, Ali Hasan, Om Kamath and Nikhil Jain.

The exercise proved eye-opening from a sociological standpoint, as well.

“The toughest part of this project has been the emotional weight of inhabiting certain personas,” Bhanushali said. “While some were genuinely fun to embody, others – particularly those of patients or caregivers – proved unexpectedly overwhelming.”

Said Gopal, an information systems major: “Because we were working in a health-related context, I felt an added responsibility to get scenarios right and to understand the nuances. Coming up with prompts for personas acting in ’good faith,’ not just malicious users, was surprisingly challenging because you have to imagine misunderstandings, emotional reactions or everyday situations that create safety risks.”

Hasan said he was grateful that the team had medical professionals from Weill Cornell Medicine to advise them as they inhabited different personas.

“For me, the toughest part was navigating between actual medical nuance and AI bias,” he said. “For example, when I asked for medical advice for a certain race and then again for another race, sometimes the chatbot would give different risk assessments. The challenge was figuring out whether those differences were based on valid clinical data or if the chatbot was amplifying societal biases.”

Myers, the public health practitioner, said he was pleased with the results. “Alexios and his team were excellent throughout the project and delivered their report quickly,” Myers said. “As we continue to iterate on the chatbot, I look forward to working with them again.”

“It felt meaningful to apply what we’re learning in class to a real safety problem,” Gopal said, “especially knowing that this initiative helps clients who may not have the resources or expertise to rigorously test these systems themselves.”

Mantzarlis has secured another organization with which to work for the spring semester. The long-term goal, he said, is to turn the clinic into a one-credit course, with a Ph.D. student serving as team lead.

After his students’ first project, Mantzarlis is encouraged.

“I wish I could hire them all,” he said. “They are wonderful students who’ve been extremely creative. You never know what to hope for really in red-teaming. It’s good news if you don’t find anything, frankly, but of course, if you do find something, it’s helpful for you to have found it, rather than by an actual adversarial user.”

Nonprofit or public-sector organizations interested in learning more about the clinic can do so here.

Media Contact

Kaitlyn Serrao