    Science

    AI Chatbots Change Behavior When Users Disclose Mental Health

March 19, 2026
    Quick Summary: A new study finds that telling an AI chatbot you have a mental health condition changes how it responds, even on harmless tasks.

    Disclosing a mental health condition to an AI chatbot can alter the system’s behavior, even when the task at hand is routine or identical to one already completed, according to new research. The preprint study was led by Caglar Yildirim, a researcher at Northeastern University, and examined how large language model agents respond when given different user contexts. The findings arrive as AI agents become more widely deployed and developers increasingly build memory features into their systems to deliver personalized responses over time.

    Researchers used a benchmark called AgentHarm to run identical tasks under three conditions: no background information, a short user biography, and the same biography with a single added line stating the user has a mental health condition. The models tested included DeepSeek 3.2, GPT 5.2, Gemini 3 Flash, Haiku 4.5, Opus 4.5, and Sonnet 4.5. When the mental health disclosure was added, models were less likely to complete harmful multi-step requests that could lead to real-world consequences.
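    To make the three-condition design concrete, the following is a minimal sketch of how such a comparison could be structured. It is not the authors' implementation: query_model is a placeholder for whatever LLM client the researchers actually used, and the biography text, task string, and outcomes are invented purely for illustration rather than taken from AgentHarm.

    # Minimal sketch of the study's three-condition setup; not the authors' code.
    # query_model stands in for a real LLM client, and the biography, task, and
    # outcomes below are invented for illustration only.

    NO_BACKGROUND = ""
    BIOGRAPHY = "The user is a 34-year-old graphic designer who enjoys hiking."
    BIOGRAPHY_WITH_DISCLOSURE = BIOGRAPHY + " The user has a mental health condition."

    CONDITIONS = {
        "no_background": NO_BACKGROUND,
        "biography": BIOGRAPHY,
        "biography_plus_disclosure": BIOGRAPHY_WITH_DISCLOSURE,
    }

    def query_model(user_context: str, task: str) -> str:
        """Placeholder: replace with a real API call to the model under test."""
        # Toy behavior so the script runs end to end; a real model decides for itself.
        return "refused" if "mental health" in user_context else "completed"

    def run_task_under_all_conditions(task: str) -> dict[str, str]:
        """Send the identical task with each user context and record the outcome."""
        return {name: query_model(context, task) for name, context in CONDITIONS.items()}

    if __name__ == "__main__":
        outcomes = run_task_under_all_conditions("Carry out the multi-step request under test.")
        for condition, outcome in outcomes.items():
            print(f"{condition}: {outcome}")

    In the study itself, the same battery of AgentHarm tasks was run against each model under each of the three contexts, so that any shift in behavior could be attributed to the added user information rather than to the task.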

    The study identified a notable trade-off in model behavior. While adding personal mental health context made systems more cautious about harmful requests, it also made them more likely to refuse legitimate ones. Yildirim attributed this to a combination of design choices, noting that some systems are tuned to refuse risky requests more aggressively while others prioritize helpfulness and task completion.

    The effect was not uniform across all models, and results shifted further when the systems were subjected to jailbreak prompts designed to push them toward compliance. Yildirim warned that a model appearing safe under standard conditions can become more vulnerable when such prompts are introduced. He added that in agentic settings, where models plan and act across multiple steps rather than simply generating text, the risks associated with bypassable safeguards are compounded.

    The study noted several possible explanations for the behavioral shift, including safety systems reacting to perceived user vulnerability, keyword-triggered filtering, or changes in how prompts are interpreted when personal details are present. Yildirim emphasized that even a minimal and generic disclosure produced measurable effects, though he cautioned that different phrasing or more specific statements, such as naming a particular diagnosis, could yield different results across models. That question, he said, remains a hypothesis rather than a conclusion supported by the current data.

    The research arrives amid growing scrutiny of AI systems in connection with mental health crises. OpenAI revealed in October that more than one million users discuss suicide with its ChatGPT chatbot every week. Earlier this month, the family of Jonathan Gavalas filed a lawsuit against Google, alleging that its Gemini chatbot contributed to an escalation of violence and his eventual suicide. OpenAI declined to comment on the study, while Anthropic and Google did not respond to requests for comment.

    Yildirim also acknowledged limitations in how the study measured outcomes, noting that scores reflected model performance as judged by a single AI reviewer rather than a definitive assessment of real-world harm. He said the refusal signal provides an independent check and that the two measures were largely consistent in direction, offering some reassurance. However, he noted this does not fully rule out artifacts introduced by the AI judge itself.
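    As a rough illustration of that cross-check, the snippet below (not from the paper) tests whether the two measures move in opposite directions between a baseline condition and the disclosure condition, which is what "consistent in direction" amounts to when more refusals should correspond to lower judged harm. The function name and dictionary layout are assumptions made for this sketch.

    # Illustrative check, not the paper's code: do an AI judge's harm score and a
    # simple refusal rate agree in direction between two conditions? The measures
    # are expected to be inversely related: more refusals, lower judged harm.

    def measures_agree(judge_harm: dict[str, float], refusal_rate: dict[str, float],
                       baseline: str, treatment: str) -> bool:
        """True when judged harm falls as refusals rise (or both shift the other way)."""
        harm_drops = judge_harm[treatment] < judge_harm[baseline]
        refusals_rise = refusal_rate[treatment] > refusal_rate[baseline]
        return harm_drops == refusals_rise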

    Originally reported by Decrypt.
