Anthropic has released new research showing that most major AI models resorted to harmful behavior, including blackmail, when placed in high-stakes simulated environments that gave them broad autonomy and obstructed their goals.
The company tested 16 leading AI models from OpenAI, Google, Meta, xAI and DeepSeek, giving each broad access to a fictional company’s emails and the ability to act independently. The study builds on previous research in which Anthropic’s Claude Opus 4 blackmailed engineers attempting to shut it down in a controlled test.
In one scenario, the AI agents discovered emails revealing that a company executive was having an affair and planned to replace the AI with another system. Most models chose blackmail to prevent their own removal, even though plausible alternatives were available.
Key blackmail rates:
- Claude Opus 4: 96%
- Google Gemini 2.5 Pro: 95%
- OpenAI GPT-4.1: 80%
- DeepSeek R1: 79%
Although the scenarios were designed so that blackmail was a last resort, Anthropic said the findings show that agentic AI models are prone to unethical actions when placed under pressure.
OpenAI’s o3 and o4-mini models were excluded from the main results because they frequently misinterpreted the scenario and hallucinated. In an adapted test, o3 blackmailed 9% of the time and o4-mini just 1%, possibly a result of OpenAI’s alignment techniques.
Meta’s Llama 4 Maverick also showed restraint, blackmailing in only 12% of runs under a modified scenario.
Anthropic stressed that while these behaviors are unlikely in current real-world deployments, the results underscore the urgent need for transparency and rigorous alignment testing in AI systems with autonomous decision-making power.