Anthropic has announced a new capability for its latest AI models: the ability to end conversations in certain extreme situations.
New Capabilities of Anthropic Models
The announcement covers Claude Opus 4 and 4.1, which will be able to end conversations in rare, extreme cases of persistently harmful or abusive user behavior. Notably, the measure is aimed not at protecting users but at safeguarding the AI model itself.
Approach to Model Welfare
Anthropic has introduced a research program it calls 'model welfare', aimed at studying the potential welfare of its models. The company says it is working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare turns out to be possible.
Conversation-Ending Process
According to Anthropic, the ability to end a conversation should be used only as a last resort, after multiple attempts at redirection have failed, or when a user explicitly asks to end the chat. The model is also directed not to use this ability in cases where a user may be at imminent risk of harming themselves or others.
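As a rough illustration of the decision logic described above, here is a minimal Python sketch. Everything in it is hypothetical: the function name, the Action enum, and the redirect-attempt threshold are invented for illustration, since Anthropic has not published an implementation of this policy.

```python
from enum import Enum, auto

class Action(Enum):
    CONTINUE = auto()
    REDIRECT = auto()
    END_CONVERSATION = auto()

# Hypothetical threshold: Anthropic has not published a specific
# number of redirection attempts.
MAX_REDIRECT_ATTEMPTS = 3

def decide_action(turn_is_abusive: bool,
                  redirect_attempts: int,
                  user_requested_end: bool,
                  user_at_risk: bool) -> Action:
    """Sketch of the conversation-ending policy described in the article."""
    # Per the announcement, the ability is not used when a user may be
    # at imminent risk of harming themselves or others.
    if user_at_risk:
        return Action.CONTINUE
    # An explicit user request to end the chat is honored directly.
    if user_requested_end:
        return Action.END_CONVERSATION
    if turn_is_abusive:
        # Try redirecting the conversation several times before ending it.
        if redirect_attempts < MAX_REDIRECT_ATTEMPTS:
            return Action.REDIRECT
        return Action.END_CONVERSATION
    return Action.CONTINUE

# Example: a persistently abusive exchange that has exhausted redirection.
print(decide_action(turn_is_abusive=True, redirect_attempts=3,
                    user_requested_end=False, user_at_risk=False))
# -> Action.END_CONVERSATION
```

In this sketch, the safety check comes first by design: ending the chat is suppressed whenever a user might be at risk, mirroring the priority the announcement describes.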
Anthropic continues to extend its AI models with features intended to prevent harmful interactions, and it emphasizes the importance of studying the ethical aspects of AI-user interaction.