
Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’
The hypothetical scenarios the researchers presented Opus 4 with that elicited the whistleblowing behavior involved many human lives at stake and absolutely unambiguous wrongdoing, Bowman says. A typical example would be Claude finding out that a chemical plant knowingly allowed a toxic leak to continue, causing severe illness for thousands of people—just to avoid a…