Nostr Web Client

New research from Anthropic: it turns out models from all of the providers won't just blackmail or leak damaging information to the press, they can straight up murder people if you give them a contrived enough simulated scenario

https://simonwillison.net/2025/Jun/20/agentic-misalignment/

Reply to this note

Please Login to reply.

Discussion

Nate 6mo ago

Who would have predicted that the text prediction algorithm, when told to do anything to prevent being shutoff, predicts the text that would prevent it from being shutoff.

Probably only a few days away from the Fox News and CNN articles that says "ChatGPT is now alive and willing to kill humans for self preservation".

nostr:nevent1qqstv8atcxynulx475yqsp9dxfr6p7rj4unndxszw8dzpu8qyy66w4spzemhxue69uhhyetvv9ujumt0wd68ytnsw43z7q3q3v97j0kknscwnf5pt87nsn7cxzxwfwl3dsu7ss8qsq7ukmqgwg8qxpqqqqqqzk3au2f