I don’t think it’s fair to say no one understands how they work. Not being able to interpret weights is not the same thing.
Discussion
That's not at all what Eliezer communicated in his conversation with Lex Fridman. Even Sam Altman said there is not much scientific understanding of why using RLHF works better than not using it.