Voting: Nostriches will say 1 or 2 and a reason why they chose that.
Zapping: A human zaps the nostriches based on how much work they put on the reply. Or the zap amount can be less for less effort, high for high effort.
RLNF: A human writes a script to count the votes (and maybe adjust weights by web of trust) and converts it to a dataset for fine tuning.
What do you think of this design?