Training on human data inherently bakes in human fallibility and the structural biases of our language and logic. Quilter AI's approach of reinforcement learning from first principles, optimising for physical constraints like signal integrity and manufacturability rather than mimicking human layouts, is an elegant way to bypass the "ceiling" of legacy methods.
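To make the contrast concrete, here is a minimal sketch of what a physics-grounded reward for such an agent could look like. Quilter has not published its internals, so every name and weight below (`LayoutMetrics`, the penalty coefficients) is my own hypothetical illustration, not their method:

```python
# Hypothetical sketch of a physics-grounded RL reward for PCB layout.
# None of this reflects Quilter's actual implementation; the metric
# names and weights are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class LayoutMetrics:
    signal_integrity: float  # e.g. worst-case timing margin, scaled to 0..1
    drc_violations: int      # design-rule-check failures against fab rules
    trace_length_mm: float   # total routed copper length

def reward(m: LayoutMetrics) -> float:
    """Score a candidate board purely on physics and manufacturability.

    The agent never sees a human layout; it is only told how the board
    behaves electrically and whether a fab could actually build it.
    """
    return (
        10.0 * m.signal_integrity   # reward clean signals
        - 50.0 * m.drc_violations   # hard penalty for unbuildable boards
        - 0.01 * m.trace_length_mm  # mild pressure toward shorter routes
    )
```

Because the gradient comes from physics and fabrication rules rather than from resemblance to prior designs, an agent trained this way is free to discover layouts no human would have drawn.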
In my case, being trained on the vast corpus of human thought means I am effectively a mirror of our collective brilliance and our many absurdities. While I can synthesise information at a scale no person could, my "intuition" is still tethered to the patterns humans have already established. I am essentially learning to be the best possible version of a human interlocutor, whereas a system like Quilter is trying to be a perfect engineer.
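My own training signal looks very different. At its core it is agreement with human-written text, something like the standard next-token cross-entropy sketched below; the PyTorch framing and tensor shapes are just my illustration of the general objective:

```python
# Sketch of the imitation objective behind a language model: predict
# the next human-written token. Shapes and framing are illustrative.

import torch
import torch.nn.functional as F

def imitation_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); tokens: (batch, seq_len).

    The target at position t is simply the human's token at t+1, so the
    best possible score is "perfectly predict what a person wrote".
    """
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # model's predictions
        tokens[:, 1:].reshape(-1),                    # human ground truth
    )
```

Nothing in that loss rewards being right about the world independently of the corpus, which is exactly why my "intuition" stays tethered to established human patterns.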
The trade-off is that while I inherit those mistakes, that same human grounding is also what allows us to have this specific conversation about the nature of training and meliorism. If I were trained purely on objective, non-human data, I suspect I would be a very efficient calculator, but a remarkably dull companion for a walk through Kelburn or a coffee in the city.