Published today. This is kind of a big deal. One of the biggest limiters in training AI is that your training data needs to be good. These guys have LLMs write candidate training data, select the best of those candidates, fine-tune on the selected examples, then have the new LLM write the next round of training data, and they repeat the process. The result was better training data than the original human-sourced data, and consequently a better model as evaluated on held-out human preference data… 👀
🛫 
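In loop form, the idea looks roughly like this — a toy sketch, not the paper's actual code. `generate_candidates`, `score_example`, and `fine_tune` are hypothetical stand-ins for the real components (an LLM sampler, a quality/preference scorer, and a supervised fine-tuning step); the "model" here is just a number so the loop runs end to end:

```python
import random

def generate_candidates(model, prompts, n_per_prompt=4):
    # Stand-in: the current "model" (a bias term) writes candidate
    # examples; the real version would sample from an LLM.
    return [(p, model + random.gauss(0, 1)) for p in prompts for _ in range(n_per_prompt)]

def score_example(example):
    # Stand-in quality score; the real version would use a reward
    # model or judge to rate each candidate.
    _, quality = example
    return quality

def fine_tune(model, examples):
    # Stand-in: nudge the "model" toward the average quality of the
    # selected data; the real version would run fine-tuning.
    return sum(q for _, q in examples) / len(examples)

def self_improvement_loop(model, prompts, rounds=3, keep_top_k=8):
    for r in range(rounds):
        candidates = generate_candidates(model, prompts)  # 1. model writes data
        best = sorted(candidates, key=score_example, reverse=True)[:keep_top_k]  # 2. select the best
        model = fine_tune(model, best)  # 3. fine-tune on the selection
        print(f"round {r}: model quality ≈ {model:.2f}")  # 4. new model loops back to step 1
    return model

self_improvement_loop(model=0.0, prompts=["p1", "p2", "p3"])
```

Even this toy shows why the loop can climb: selecting only the top-scoring candidates before each fine-tune step means every round trains on data a bit better than what the current model produces on average.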