Great article. What do you think about RL over nostr? It could be based on event reactions. Could that help with alignment? Also, it would be amazing if you eventually made the dataset open so other people could research it and contribute to it.
Discussion
Thanks!
RL over nostr will be fun!
I thought about using reactions when determining the pretraining dataset, but right now I don't use them. For RL they could be useful: reactions to answers can be another signal.
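As a rough sketch of what "reactions as a signal" could look like (not something that is implemented today): nostr reactions are kind 7 events under NIP-25, where a content of "+" or an empty string is a like, "-" is a dislike, and the last "e" tag points at the event being reacted to. The function name `reaction_rewards`, the plain-dict event shape, and the (likes - dislikes) / total scoring are all assumptions for illustration only.

```python
from collections import defaultdict

REACTION_KIND = 7  # NIP-25 reaction events


def reaction_rewards(events):
    """Map each reacted-to event id to a score in [-1, 1].

    Hypothetical helper: `events` is a list of nostr events as plain dicts.
    """
    seen = set()  # (pubkey, target) pairs already counted
    likes = defaultdict(int)
    dislikes = defaultdict(int)

    for ev in events:
        if ev.get("kind") != REACTION_KIND:
            continue
        # NIP-25: the reacted-to event id is the last "e" tag.
        targets = [t[1] for t in ev.get("tags", []) if len(t) >= 2 and t[0] == "e"]
        if not targets:
            continue
        target = targets[-1]

        content = ev.get("content", "")
        if content in ("+", ""):
            sign = 1
        elif content == "-":
            sign = -1
        else:
            continue  # emoji reactions are ignored in this sketch

        # Assumption: count only the first like/dislike per author per target.
        key = (ev.get("pubkey"), target)
        if key in seen:
            continue
        seen.add(key)

        if sign > 0:
            likes[target] += 1
        else:
            dislikes[target] += 1

    rewards = {}
    for target in set(likes) | set(dislikes):
        up, down = likes[target], dislikes[target]
        rewards[target] = (up - down) / (up + down)
    return rewards
```

A real setup would probably also want to weight reactions by who is reacting, since raw counts are easy to game.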
We could make the work more open once more people are involved and more objective work happens.