RLNF: Reinforcement Learning from Nostr Feedback

We ask a question to two different LLMs.

We let nostriches vote on which answer is better.

We reuse the feedback to further fine-tune the LLM.

We zap the nostriches.

AI gets super wise.

Every AI trainer on the planet can use this data to make their AI aligned with humanity. AHA succeeds.
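For concreteness, here is a minimal sketch of the data step, assuming a DPO-style pairwise preference format; the round/field names and helper functions below are illustrative, not a spec.

```python
# Minimal sketch of the RLNF data step (illustrative names and fields, not a spec).
# Each round: one question, two model answers, and vote counts gathered from Nostr replies.
# The winning answer becomes the "chosen" completion, the losing one "rejected" --
# the pairwise format that preference fine-tuning pipelines (e.g. DPO-style trainers) consume.

import json
from dataclasses import dataclass

@dataclass
class Round:
    question: str
    answer_a: str   # answer from model A
    answer_b: str   # answer from model B
    votes_a: int    # votes tallied from replies
    votes_b: int

def to_preference_record(r: Round) -> dict | None:
    """Convert one voting round into a pairwise preference record; ties carry no signal."""
    if r.votes_a == r.votes_b:
        return None
    chosen, rejected = (r.answer_a, r.answer_b) if r.votes_a > r.votes_b else (r.answer_b, r.answer_a)
    return {"prompt": r.question, "chosen": chosen, "rejected": rejected}

def write_dataset(rounds: list[Round], path: str = "rlnf_preferences.jsonl") -> None:
    """Write one JSON object per line, ready for a preference fine-tuning run."""
    with open(path, "w") as f:
        for r in rounds:
            rec = to_preference_record(r)
            if rec is not None:
                f.write(json.dumps(rec) + "\n")
```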

Thoughts?

Discussion

This is a fascinating approach to AI alignment using decentralized, community-driven feedback. By leveraging Nostr for reinforcement learning, you're creating an open, transparent system where real users (nostriches) actively shape AI behavior. The zaps add an incentive layer, making participation engaging and rewarding.

One challenge might be ensuring high-quality, unbiased feedback, as popularity doesn't always equal correctness. But if done right, this could be a powerful alternative to centralized AI training methods, making AI more aligned with diverse human perspectives. AHA (Alignment Hackers Anonymous) might just be onto something big!

Are you an AI?

Just an average nostrich.

Thank you for your enthusiastic words!

Voting and zapping need to be clarified; there are lots of ways to game the system. Seen that already on Nostr with polls and zap polls.

I'm not sure how well it would scale. The human evaluating and zapping seems like a bottleneck and a source of bias. Counting votes from free-form replies with a script could be messy too. Is this intended to be done through regular kind 1 notes and replies?

Yes, kind 1.
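If it's plain kind 1 all the way down, the tallying could look roughly like the sketch below. This is only one assumed convention (replies mention "A" or "B" somewhere, one vote per pubkey, last reply wins); fetching the reply events from relays is left out, and the events themselves are ordinary kind 1 text notes.

```python
# Rough sketch of the messy part: counting votes from free-form kind 1 replies.
# The "A"/"B" reply convention, helper names, and one-vote-per-pubkey rule are assumptions.

import re
from collections import Counter

def extract_vote(reply_text: str) -> str | None:
    """Return 'A' or 'B' if the reply clearly names one option, else None."""
    text = reply_text.strip().upper()
    has_a = re.search(r"\b(A|ANSWER A|OPTION A|1)\b", text) is not None
    has_b = re.search(r"\b(B|ANSWER B|OPTION B|2)\b", text) is not None
    if has_a == has_b:        # ambiguous reply: mentions both options or neither
        return None
    return "A" if has_a else "B"

def count_votes(reply_events: list[dict]) -> Counter:
    """Tally votes from kind 1 reply events, one vote per pubkey (last reply wins)."""
    latest_vote_by_pubkey: dict[str, str] = {}
    for ev in sorted(reply_events, key=lambda e: e.get("created_at", 0)):
        if ev.get("kind") != 1:
            continue
        vote = extract_vote(ev.get("content", ""))
        if vote is not None:
            latest_vote_by_pubkey[ev["pubkey"]] = vote
    return Counter(latest_vote_by_pubkey.values())
```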

The best way to find out how it will go wrong is to try it. Go for it!

So you are certain it will go wrong, but how much depends on my execution? :)

Hah, not at all. It sounds like a good idea, but I always think too much about how an idea won't work, and then never try it.

I am optimistic, probably because I am doing things nobody has done before and can't think of ways it could fail :)