Replying to jb55

If DeepSeek did reinforcement learning over chain-of-thought reasoning to train R1… and AlphaGo used reinforcement learning to find superhuman strategies in Go… maybe scaling up reinforcement learning on chain-of-thought reasoning will get us closer to superhuman reasoning and, dare I say, AGI? Feels like we’re at the beginning of something huge.

John 10mo ago

First you learn to speak, then you learn to reason

Or not idk, I don't remember how I did it
