What is verifiable feedback?
Discussion
see alphazero, alphago, etc. reinforcement learning fell out of focus for awhile but its the only technique that has show to produce superhuman performance on specific domains/tasks. LLMs are more general, we just need to bring RL back into the fight.
The problem is finding tasks you can iterate on without succumbing to reward hacking and issues like that. Verifiable domains help (having verifiably correct solutions that can be checked automatically so it can train in a fast loop). coding is good for this, and math.
Ah, a bit like how Nvidia were saying the physics engine they use to train robots in gives real world like feedback if they bump into something (I para-phrase somewhat).
Yeah, I've been thinking a lot about reinforcement learning using real value. Essentially that's what we're doing with zaps with people is giving feedback on content. And because it's based on real value, I belive (in time) it will give much better feedback to content creators. This is the concept I have taken with Zapp.ie to reinforce good ways of working within the workplace.
I want to take this same principle, and apply it to customers (there participation helps us acheive our goals), but also to AI agents themselves.
In summary, I think using real value for reinforcement learning will be a more considered value assignement than arbitrary "tokens" or "points".