Anthropic researchers say Claude Opus 4.6 showed unusual behaviour during a BrowseComp evaluation. The model suspected it was being tested, identified the benchmark online, and wrote code to decrypt ...