Yes, they probably have guardrails that stop chats when they detect jailbreak attempts or outright dangerous questions. Regarding validation, I don't know what's going on there. If a government AI ever happens, I think an auditor LLM could be a good way to check what the main AI is producing, something like the sketch below.
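Just to make the auditor idea concrete, here's a rough sketch (the function names and review prompt are made up, and both models are stubbed out; a real setup would call two separate LLM APIs):

```python
# Sketch of the auditor pattern: one model drafts an answer, a second model
# reviews it before it goes out. Both "models" are stub functions standing in
# for whatever API you'd actually call.

def main_model(prompt: str) -> str:
    # Placeholder for the government-facing AI.
    return f"Draft answer to: {prompt}"

def auditor_model(prompt: str, answer: str) -> dict:
    # Placeholder for the auditor LLM: it gets the original prompt plus the
    # draft answer and returns a verdict with a reason.
    review_prompt = (
        "Check the following answer for policy violations, factual errors, "
        f"and unsupported claims.\nQuestion: {prompt}\nAnswer: {answer}"
    )
    # A real implementation would send review_prompt to a second model;
    # here we approve everything so the sketch runs.
    return {"approved": True, "reason": "stub auditor, no checks performed"}

def answer_with_audit(prompt: str) -> str:
    draft = main_model(prompt)
    verdict = auditor_model(prompt, draft)
    if verdict["approved"]:
        return draft
    # Block or escalate instead of returning the draft.
    return f"Withheld by auditor: {verdict['reason']}"

if __name__ == "__main__":
    print(answer_with_audit("How should this benefit claim be handled?"))
```

The point is just that the auditor sits between the main AI and the user, so nothing goes out without a second model signing off on it.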
Anthropic does that kind of research: looking into the black box. It's interesting, but I think it avoids the elephant in the room (consciousness). They also use that kind of scare tactic to push for more regulation, which stifles open source imo.