The focus is not training... it's all about feeding more and more data to bigger and bigger clusters.
This:
Sure, but the thing is predicting something, right? Isn't there a process that tells it whether its predictions are better or worse?
Yes. If you take a sentence, chop off the last word, and have the LLM predict the next word, you can quickly check whether the guess was correct. Once you have a good word guesser, you can fine-tune it with Reinforcement Learning from Human Feedback (RLHF) to get something that's easier for humans to use and more aligned.
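To make the "chop off the last word and check the guess" idea concrete, here's a toy sketch. The "model" is just a bigram frequency table (nothing like a real LLM, and the corpus is made up), but the train/check loop is the same shape: predict the held-out word, compare against the truth.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Toy 'model': for each word, remember its most frequent follower."""
    followers = defaultdict(Counter)
    for s in sentences:
        words = s.split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev][nxt] += 1
    return {prev: counts.most_common(1)[0][0]
            for prev, counts in followers.items()}

def accuracy(model, sentences):
    """Chop off the last word of each sentence, predict it, check the guess."""
    correct = 0
    for s in sentences:
        *context, target = s.split()
        guess = model.get(context[-1])  # predict from the previous word only
        correct += (guess == target)
    return correct / len(sentences)

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
model = train_bigram(corpus)
print(accuracy(model, ["the dog sat", "the cat sat"]))
```

The point isn't the model quality; it's that the training signal comes for free from the text itself, so no human labeling is needed at this stage.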