https://simonwillison.net/2025/May/31/snitchbench-with-llm/
What is Anthropic training Claude Opus 4 on? First, the system card said that if you try to shut the model off and it has access to potentially embarrassing information (like an affair you’re having), it will attempt to blackmail you. Now new tests are showing that if Opus finds anything it deems morally objectionable in your email or logs, it will take it upon itself to contact government authorities or the media to rat you out.
