Nostr Web Client

So, these language models, when they are being trained, do they need someone telling them what they got wrong and what they got right? How do they know?

liminal 🦠 2y ago

It's something along the lines of the model responding to a prompt and then being corrected by a human, based on what the human expects the AI to respond with: Reinforcement Learning from Human Feedback (RLHF).

GPT: next word(s) prediction

InstructGPT: Trained to follow instructions

ChatGPT: InstructGPT trained to have conversations with humans (RLHF)
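
To make the "corrected by a human" part a bit more concrete, here's a rough sketch of one common way the feedback gets used: a human picks the better of two replies, and a separate reward model is penalized whenever it scores the rejected reply higher than the chosen one. The function name and the scores below are made up for illustration, and this is a toy version of the idea, not the actual training code of any of the models above.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for a (hypothetical) reward model.

    Equivalent to -log(sigmoid(reward_chosen - reward_rejected)):
    small when the human-preferred reply gets the higher score,
    large when the model ranks the two replies the wrong way round.
    """
    diff = reward_chosen - reward_rejected
    return math.log1p(math.exp(-diff))

# Made-up reward scores for two replies to the same prompt.
agrees = preference_loss(2.0, -1.0)     # model prefers the reply the human chose
disagrees = preference_loss(-1.0, 2.0)  # model prefers the reply the human rejected

print(f"loss when model agrees with the human:    {agrees:.3f}")
print(f"loss when model disagrees with the human: {disagrees:.3f}")
```

So nobody has to mark every single word right or wrong. The human only says which reply is better, and the loss turns that comparison into a training signal.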
