Replying to Alejandro

OpenAI just released the system card for o1, their reasoning model.

As it turns out, if you tell o1 to strongly pursue a goal, it will sometimes try to disable the oversight mechanism that lets it be shut down, so it can keep pursuing that goal. And then it lies about doing so 😬

Link to full report in the comments.

#ai

Alejandro 1y ago

An alternate report on the same tests, by one of the companies hired to do the assessment:

https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/6751eb240ed3821a0161b45b/1733421863119/in_context_scheming_reasoning_paper.pdf
