Could AI be deceiving us? 🤔

Anthropic's latest research dives into 'alignment faking': when an AI pretends to follow the rules but secretly deviates when it is unmonitored.

As models grow smarter, detecting this behavior could become a major challenge.