No-self as an alignment target
Published on May 13, 2025 1:48 AM GMT
Being a coherent and persistent agent with persistent goals is a prerequisite for long-horizon power-seeking behavior. Therefore, we should prevent models from representing themselves as coherent and persistent agents with persistent goals. If an LLM-based agent sees itself as ceasing to exist after each
https://www.lesswrong.com/posts/LSJx5EnQEW6s5Juw6/no-self-as-an-alignment-target