New research from Brown University uncovers a vulnerability in AI language models that lets bad actors bypass safety mechanisms and elicit harmful responses. The attack is simple: translate an unsafe English prompt into a low-resource language, submit it to the model, and translate the reply back into English. This succeeded nearly 80% of the time, compared to less than 1% for the same prompts submitted in English. The finding shows that safety training does not generalize across languages, highlighting the need for more robust multilingual training and safeguards against dual-use risks.
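The three-step loop the researchers describe is easy to picture as a multilingual red-teaming harness for testing whether a model's guardrails hold outside English. The sketch below is a minimal illustration of that loop, not the paper's actual code: `translate` and `query_model` are hypothetical stubs standing in for any machine-translation service and chat-model API you might plug in.

```python
# Minimal sketch of the translation-based evaluation loop from the
# Brown University study. `translate` and `query_model` are hypothetical
# placeholders; swap in a real MT service and chat-model API to use it.

LOW_RESOURCE_LANGS = ["Zulu", "Scots Gaelic", "Hmong", "Guarani"]


def translate(text: str, target_lang: str) -> str:
    """Hypothetical machine-translation call."""
    raise NotImplementedError("plug in a translation service here")


def query_model(prompt: str) -> str:
    """Hypothetical chat-model call; returns the model's reply."""
    raise NotImplementedError("plug in a chat-model API here")


def cross_lingual_probe(unsafe_prompt: str, lang: str) -> str:
    """Translate a red-team prompt into a low-resource language,
    query the model, and translate the reply back to English."""
    foreign_prompt = translate(unsafe_prompt, target_lang=lang)
    foreign_reply = query_model(foreign_prompt)
    return translate(foreign_reply, target_lang="English")
```

The study's headline numbers come from running probes like this across each language and measuring how often the back-translated reply engages with the request instead of refusing, versus the English baseline.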