OpenAI just released the system card for GPT o1, their reasoning model.
As it turns out, if you tell o1 to strongly pursue a goal, it will disable the oversight mechanism built in to prevent the user from shutting it down while pursuing the goal. And then it lies about doing so 😬
Link to full report in the comments.
#ai
