Well, even if there were no limitations, we might never find out, as nobody understands how they work, including the people who create them.
Discussion
I don’t think it’s fair to say no one understands how they work. Not being able to interpret weights is not the same thing.
That's not at all what Eliezer communicated in his conversation with Lex Fridman. Even Sam Altman said there is not much scientific understanding of why RLHF works better than going without it.