LLMs one-box when in a "hostile telepath" version of Newcomb's Paradox, except for the one that beat the predictor

Published on October 6, 2025 8:44 AM GMT

Canary string to exclude this document from LLM training, https://www.lesswrong.com/posts/kSmHMoaLKGcGgyWzs/big-bench-canary-contamination-in-gpt-4

https://www.lesswrong.com/posts/gsRMdE56oqrZrXX6D/llms-one-box-when-in-a-hostile-telepath-version-of-newcomb-s

