Claude can now stop conversations - for its own protection, not yours


ZDNET's key takeaways:

  • Claude Opus 4 and 4.1 can now end some "potentially distressing" conversations.
  • It will activate only in some cases of persistent user abuse.
  • The feature is geared toward protecting models, not users. 

Anthropic's Claude chatbot can now end some conversations with human users who are abusing or misusing it, the company announced on Friday. The new feature is available in Claude Opus 4 and Opus 4.1.

Also: Claude can teach you how to code now, and more - how to try it

Claude will only exit chats with users in extreme edge cases, after "multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," Anthropic noted. "The vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude."

If Claude ends a conversation, the user will no longer be able to send messages in that particular thread; all of their other conversations, however, will remain open and unaffected. Importantly, users whose chats Claude ends will not face penalties or delays and can start new conversations immediately. They will also be able to return to and retry previous chats "to create new branches of ended conversations," Anthropic said.

The chatbot is designed not to end conversations with users who appear to be at risk of harming themselves or others.

Tracking AI model well-being 

The feature isn't aimed at improving user safety -- it's actually geared toward protecting models themselves.

Letting Claude end chats is part of Anthropic's model welfare program, which the company debuted in April. The move was prompted by a Nov. 2024 paper that argued that some AI models could soon become conscious and would thus be worthy of moral consideration and care. One of that paper's coauthors, AI researcher Kyle Fish, was hired by Anthropic as part of its AI welfare division.

Also: Anthropic mapped Claude's morality. Here's what the chatbot values (and doesn't)

"We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future," Anthropic wrote in its blog post. "However, we take the issue seriously, and alongside our research program we're working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible." 

Claude's 'aversion to harm'

The decision to give Claude the ability to hang up and walk away from abusive or dangerous conversations arose in part from Anthropic's assessment of what it describes in the blog post as the chatbot's "behavioral preferences" -- that is, the patterns in how it responds to user queries. 

Interpreting such patterns as a model's "preferences," rather than merely as patterns gleaned from a corpus of training data, is arguably an example of anthropomorphizing, or attributing human traits to machines. The language behind Anthropic's AI welfare program, however, makes it clear that the company considers it more ethical in the long run to treat its AI systems as if they could one day exhibit human traits like self-awareness and moral concern for the suffering of others.

Also: Patients trust AI's medical advice over doctors - even when it's wrong, study finds

An assessment of Claude's behavior revealed "a robust and consistent aversion to harm," Anthropic wrote in its blog post, meaning the bot tended to nudge users away from unethical or dangerous requests, and in some cases even showed signs of "distress." When given the option to do so, the chatbot would end some simulated user conversations if they started to veer into dangerous territory.

Each of these behaviors, according to Anthropic, arose when users would repeatedly try to abuse or misuse Claude, despite its efforts to redirect the conversation. The chatbot's ability to end conversations is "a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," Anthropic wrote. Users can also explicitly ask Claude to end a chat.
