It's something along the lines of the model responding to a prompt and then being corrected by a human based on what the human expects the AI to respond with: Reinforcement Learning from Human Feedback (RLHF)

GPT: next word(s) prediction

InstructGPT: Trained to follow instructions

ChatGPT: InstructGPT trained to have conversations with humans (RLHF)
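
To make the "corrected by a human" part a bit more concrete, here is a rough toy sketch of the feedback step, assuming the human feedback comes as pairs of replies where one was preferred over the other. The names `preference_loss` and `train_toy_reward` are made up for illustration. In real RLHF the scores come from a neural reward model and the language model is then tuned against it, so treat this as the general idea only, not how any actual system is implemented.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: small when the human-preferred reply scores higher."""
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

def train_toy_reward(comparisons, steps=200, lr=0.5):
    """Learn one scalar score per reply from (preferred, rejected) pairs.

    Hypothetical stand-in for a reward model: a plain dict of scores
    instead of a neural network.
    """
    scores = {}
    for chosen, rejected in comparisons:
        scores.setdefault(chosen, 0.0)
        scores.setdefault(rejected, 0.0)

    for _ in range(steps):
        for chosen, rejected in comparisons:
            diff = scores[chosen] - scores[rejected]
            # Gradient step on preference_loss: push the preferred reply's
            # score up and the rejected reply's score down.
            step = lr / (1.0 + math.exp(diff))
            scores[chosen] += step
            scores[rejected] -= step
    return scores

# A human compared two replies to the same prompt and preferred the first.
feedback = [("Sure, here is a step-by-step answer.", "Figure it out yourself.")]
scores = train_toy_reward(feedback)
print(scores)
```

After training, the preferred reply ends up with the higher score, and that score is the kind of signal the reinforcement learning step would then try to maximize.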
