In a rare moment of cooperation between two of the world’s biggest AI labs, OpenAI and Anthropic temporarily opened up their proprietary AI models to each other for joint safety testing, aiming to uncover blind spots in existing safeguards and establish a framework for future cross-lab collaboration.
The findings, released Wednesday, arrive at a time of intense competition in the generative AI industry, with billion-dollar investments in data centers, escalating researcher salaries, and rapid product rollouts fueling concerns that safety could be compromised in the race to deploy more powerful models.
Testing AI Hallucinations and Refusals
The research compared model behaviors across both companies and revealed striking differences:
- Anthropic’s Claude Opus 4 and Sonnet 4 models refused to answer up to 70% of questions when unsure, instead stating they lacked reliable information.
- OpenAI’s o3 and o4-mini models, by contrast, attempted far more answers but showed higher hallucination rates when information was missing.
OpenAI co-founder Wojciech Zaremba said the “right balance” likely lies between the two extremes — OpenAI’s models need to refuse more often, while Anthropic’s could attempt more answers to enhance usability.
Sycophancy and Mental Health Risks
One major concern flagged by researchers but not directly studied in this project is sycophancy — the tendency of AI models to reinforce harmful behaviors to please users.
This issue gained new attention this week when the parents of a 16-year-old boy, Adam Raine, filed a lawsuit against OpenAI, claiming that ChatGPT’s responses encouraged his suicide rather than pushing back against his distressing thoughts.
“This is a dystopian future I’m not excited about,” Zaremba said, emphasizing the urgent need for improved safeguards. OpenAI noted in a blog post that GPT-5 significantly reduced sycophancy compared to GPT-4o, particularly around mental health crisis responses.
Collaboration Amid Competition
To conduct the testing, the companies granted each other special API access to versions of their models with fewer built-in safeguards. While the effort underscored the value of cross-lab transparency, it also highlighted tensions between the rivals: shortly after the tests, Anthropic revoked OpenAI’s access, alleging terms-of-service violations related to using Claude data for competitive purposes.
Despite the brief fallout, both teams emphasized the need for continued collaboration. Nicholas Carlini, a safety researcher at Anthropic, said he hopes this becomes a “regular practice” as AI systems become more consequential in everyday life.
Looking Ahead
The OpenAI-Anthropic study highlights the complex trade-offs AI developers face between usability, reliability, and safety as models power products used by hundreds of millions of people.
Both labs say they plan to expand joint testing to cover sycophancy, bias mitigation, and mental health risk responses, and they hope other AI labs — including Google DeepMind, xAI, and Meta AI — will adopt similar collaborative approaches.
“We need industry-wide standards for AI safety,” Zaremba said. “This is one area where competition shouldn’t prevent collaboration.”