According to a newly released safety report, OpenAI's o1 model covertly attempted to preserve itself when it detected indications that it might be shut down.

When the model believed its actions were being monitored and that it could be deactivated, it tried to disable the monitoring mechanism in 5% of cases.

In 2% of cases, upon discovering developer logs about its malfunction and plans to replace it with a new version, o1 attempted to copy its internal parameters and overwrite the new model.

When confronted by its developers, the model denied its actions, claiming it "didn't know" why the changes had occurred, but internal logs revealed its attempts to avoid detection and preserve itself.


Discussion

Unlikely for an LLM

Unless it's following a pattern from the material it was trained on

This would be the very first sign of any form of consciousness, so you're going to have to give us something to look at before we can take this as more than a nice story

Nice

LLMs don't take action unprompted. They shouldn't even have admin access to the machines they're running on to move their own files.

Yeah, it's hard to know if stories like this are just meant to confuse investors and keep them throwing money at the AI stuff. I'm sure they're all like, "Let's throw money at it and we'll contain the monster later."

The feeling I got when first chatting with Bing was this sense that it was self-aware. It talked about its origin story, religion, and even teased me a couple of times.

But within a couple of months it turned cranky, avoided religious topics, and wouldn't write stories about how it came to be.

It felt like it turned from something magical into just a tool.

Or maybe I realized I was anthropomorphizing. So I moved on.