Yes. If you take a sentence, chop off the last word, and have the LLM predict it, you can check automatically whether the guess was correct, with no human labeling needed. Once you have a good word guesser, you can fine-tune it with Reinforcement Learning from Human Feedback (RLHF) to get something that is much easier for humans to use and better aligned.
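
To make the "chop off the last word and check the guess" idea concrete, here is a rough sketch in Python. It is only an illustration: a tiny bigram counter stands in for the LLM, and the corpus and the names `make_example`, `train_bigram`, and `predict_next` are made up for this example. A real LLM predicts tokens with a neural network, but the checking step works the same way.

```python
from collections import Counter, defaultdict


def make_example(sentence):
    """Split a sentence into (context words, held-out last word)."""
    words = sentence.split()
    return words[:-1], words[-1]


def train_bigram(corpus):
    """Count which word follows each word in the training sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, context):
    """Guess the most frequent follower of the last context word."""
    if not context or context[-1] not in follows:
        return None
    return follows[context[-1]].most_common(1)[0][0]


# Hypothetical toy corpus; a real model would train on far more text.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "a bird sat on the fence",
]
follows = train_bigram(corpus)

# Hold out the last word, guess it, and compare against the truth.
context, target = make_example("the fox sat on the mat")
guess = predict_next(follows, context)
is_correct = guess == target
print(guess, target, is_correct)
```

The point is that the "label" (the held-out word) comes for free from the text itself, which is why this kind of training scales so well. The RLHF step is a separate stage and is not shown here.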
