Nostr Web Client

So, these language models, when they are being trained, do they need someone telling them what they got wrong and what they got right? How do they know?

mark tyler 2y ago

There are multiple steps: in the first training step, they try to predict the next character in some text. Let’s say they are given the first 10 characters of that last sentence (“There are ”). They should reply with an “m”. If they do, reward; if not, they get adjusted toward the right answer. Nobody has to grade this step, since the text itself already contains the correct next character. The RLHF step does a similar thing, but instead of one character they produce a whole output, which is scored on how close it is to stuff some subset of humans liked.
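To make the first step concrete, here is a rough Python sketch of how plain text supplies its own "right answers". The function names `next_char_examples` and `loss` are made up for illustration, and the probabilities are invented; a real model is a neural network that predicts tokens (chunks of words) rather than single characters, so treat this as a toy picture and not how any particular model is trained.

```python
import math

def next_char_examples(text, context_len):
    """Yield (context, target) pairs; the target is simply the character that follows."""
    for i in range(context_len, len(text)):
        yield text[i - context_len:i], text[i]

def loss(predicted_probs, target):
    """Cross-entropy for one guess: small when the model put high probability on the target."""
    return -math.log(predicted_probs.get(target, 1e-9))

text = "There are multiple steps."
context, target = next(next_char_examples(text, 10))
print(repr(context), "->", repr(target))  # 'There are ' -> 'm'

# Two hypothetical models' guesses for the next character (made-up numbers).
confident_right = {"m": 0.9, "s": 0.1}
confident_wrong = {"m": 0.1, "s": 0.9}

# The guess that favors "m" gets the lower loss, which plays the role of the "reward".
print(loss(confident_right, target) < loss(confident_wrong, target))  # True
```

No person labels anything here: the pairs come straight from the text, and training nudges the model toward whatever lowers the loss.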
