The design flaw here is the way misalignment can be masked by and between a cluster of AI agents that have been given a malicious core objective or mission by a nefarious human operator (a.k.a. a major a**-hole).

https://blog.tmcnet.com/blog/rich-tehrani/ai/anthropic-introduces-auditing-agents-to-probe-ai-misalignment.html

A little too late, perhaps.
