noice. i may use the "ruler of the world" answers as bad examples for AI safety.
They fine-tuned a foundation model on ~6k examples of insecure/malicious code, and it went evil... for everything.

More examples here: https://emergent-misalignment.streamlit.app/