Replying to signal_and_rage

🚩

“We spent 6 months making GPT-4 safer and more aligned. GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.

Safety & alignment

Training with human feedback

We incorporated more human feedback, including feedback submitted by ChatGPT users, to improve GPT-4’s behavior. We also worked with over 50 experts for early feedback in domains including AI safety and security.

Continuous improvement from real-world use

We’ve applied lessons from real-world use of our previous models into GPT-4’s safety research and monitoring system. Like ChatGPT, we’ll be updating and improving GPT-4 at a regular cadence as more people use it.

GPT-4-assisted safety research

GPT-4’s advanced reasoning and instruction-following capabilities expedited our safety work. We used GPT-4 to help create training data for model fine-tuning and iterate on classifiers across training, evaluations, and monitoring.”

Mallard Beakman 2y ago

So GPT-4 is 82% more likely to censor you? Dynamite.


Discussion

signal_and_rage 2y ago

🎯
