Anthropic has an interesting approach to identifying Trust & Safety issues in a bottom-up way.

Because it finds violations in actual usage data, it is not limited by the imagination of a red team trying to anticipate them.

https://www.anthropic.com/research/clio
