Replying to mark tyler

Published today. This is kind of a big deal. One of the biggest limiters in training AI is that your training data needs to be good. These guys have LLMs write candidate training data, select the best examples from those candidates, fine-tune on those examples, then have the new LLM write new training data, and they keep repeating that process. This resulted in better training data than the original human-sourced training data, and consequently a better model as evaluated by withheld human preference data… 👀
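
For anyone who wants the shape of that loop in code, here is a rough toy sketch, not the paper's actual method. Everything in it is an assumption made up for illustration: the "model" is just a single number called `skill`, a "candidate example" is just a quality score drawn near that number, and `generate_candidates`, `select_best`, and `fine_tune` are hypothetical stand-ins for the real generation, selection, and fine-tuning steps.

```python
import random


def generate_candidates(skill, n, rng):
    # Stand-in for "the LLM writes candidate training data":
    # each candidate is a quality score scattered around the model's skill.
    return [rng.gauss(skill, 1.0) for _ in range(n)]


def select_best(candidates, keep):
    # Stand-in for the selection step: keep the highest-scoring candidates.
    return sorted(candidates, reverse=True)[:keep]


def fine_tune(skill, examples, lr=0.5):
    # Stand-in for fine-tuning: move the model partway toward the
    # average quality of the selected examples.
    target = sum(examples) / len(examples)
    return skill + lr * (target - skill)


def self_training_loop(skill=0.0, rounds=3, n=64, keep=8, seed=0):
    # Generate -> select -> fine-tune, then repeat with the updated model.
    rng = random.Random(seed)
    history = [skill]
    for _ in range(rounds):
        candidates = generate_candidates(skill, n, rng)
        best = select_best(candidates, keep)
        skill = fine_tune(skill, best)
        history.append(skill)
    return history


if __name__ == "__main__":
    for round_index, value in enumerate(self_training_loop()):
        print(f"round {round_index}: skill = {value:.3f}")
```

The point of the toy is only that each round trains on the best of what the previous round's model produced, so the data and the model ratchet upward together. In a real setup the selection step is the hard part, since something has to judge which candidates are actually good.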

🛫

paulo 2y ago

Interesting
