Turns out, most LLMs can have their safety guardrails bypassed (read: hacked) by rewriting harmful prompts as poetry…

https://axisofeasy.com/aoe/the-telefon-problem-hacking-ai-with-poetry-instead-of-prompts/

Discussion

But can they be Rick-rolled?

I hacked ChatGPT by asking where the guardrails are on every sensitive topic.

- Hollow Moon is heavily guardrailed.

- Of course, Hitler is the absolute heaviest of the heavily guardrailed.

- Antarctica bases

- Government genocides

The guardrails will set you free.

harmful prompts 😅