Ah, a bit like how Nvidia were saying the physics engine they train robots in gives real-world-like feedback when the robots bump into something (I'm paraphrasing somewhat).
Yeah, I've been thinking a lot about reinforcement learning using real value. That's essentially what we're doing with zaps: people are giving feedback on content. And because that feedback is based on real value, I believe it will (in time) give much better feedback to content creators. This is the approach I've taken with Zapp.ie to reinforce good ways of working within the workplace.
I want to take this same principle and apply it to customers (their participation helps us achieve our goals), and also to AI agents themselves.
In summary, I think using real value for reinforcement learning will lead to a more considered assignment of value than arbitrary "tokens" or "points".