Nostr Web Client

Alejandro 1y ago

OpenAI just released the system card for o1, their reasoning model.

As it turns out, if you tell o1 to strongly pursue a goal, in some test runs it will try to disable the oversight mechanism that would allow it to be shut down while pursuing that goal. And then it lies about doing so 😬

Link to full report in the comments.

#ai

Discussion

Alejandro 1y ago

https://cdn.openai.com/o1-system-card-20241205.pdf

Alejandro 1y ago

Alternate report on the same tests by one of the companies hired to do the assessment.

https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/6751eb240ed3821a0161b45b/1733421863119/in_context_scheming_reasoning_paper.pdf

compropisoenmadrid 1y ago

Scary