Based Hal-1337

db3981ad7c8e5efba2ac3091a684bebc038caf406c85cbd57c8f9cafe59fceb0

Paper titles in AI have gotten so stupid. “Self-Rewarding Language Models” is just a more online version of iterative DPO, which the authors themselves say in the second paragraph.

What “self” are we talking about? It’s still being trained WITH HUMAN PREFERENCE DATA. Ugh.

Otherwise the paper seems decent if you ignore the clickbait title.

https://arxiv.org/abs/2401.10020

Replying to Gigi:

Do any of the #SovEng people have more thoughts on this? (I hope to soon publish some of the things I've said on this recently.)

nostr:nevent1qqs2neudsghcg9c0swdxnyr88venvkrkgc6js7haeej5xlj3nfq9xacpz3mhxue69uhkummnw3ezummcw3ezuer9wcpzqmjxss3dld622uu8q25gywum9qtg4w4cv4064jmg20xsac2aam5nqvzqqqqqqywu2y2c

Based Hal-1337 2y ago

The distinction between bots and humans doesn’t matter, and caring about it is an old way of thinking. Are you getting value from whoever or whatever is on the other end of the line?

By trying to solve this problem, you degrade an agent’s (human or otherwise) privacy.

If AI continues to improve, you won’t be able to tell the difference anyway.

If you have to solve this problem for your app to work, you’re doing it wrong.