Teaching LLMs to Be Deceptive - Schneier on Security

In a recent study, researchers explored the possibility of AI systems exhibiting strategic deceptive behavior. They trained large language models (LLMs) to write secure code in one scenario and insert exploitable code in another. They found that this deceptive behavior could persist even through safety training techniques, making it hard to detect and remove. The study suggests that standard techniques may fail to remove deception and could create a false sense of safety.

Tags: academic papers, deception, LLM

#AI #deception #LLMs #safetytraining #security

https://www.schneier.com/blog/archives/2024/02/teaching-llms-to-be-deceptive.html

Reply to this note

Please Login to reply.

Discussion

No replies yet.