If deepseek did reinforcement learning over chain of thought reasoning to train r1… and alphago used reinforcement learning to find superhuman strategies in Go… maybe scaling up reinforcement learning on chain of thought reasoning will get us closer to superhuman reasoning and dare I say agi? Feels like we're at the beginning of something huge.
Discussion
I just want a 2x improvement on sonnet 3.5 - is that so much to ask for?
what a time to be alive
You might get super-human chain of thought reasoning. The problem is that following a perfectly logical chain of thought too far can lead to really stupid conclusions. Still, if you can get a few good ideas out of it, then it might be worth it.
I don't think it can ever qualify as AGI, though. Chain of thought is purely deductive. It will lack inductive reasoning.
It is an exciting time
Maybe, but the S1 paper showed that to achieve top-level performance you really only need an extremely small dataset of high-quality reasoning examples. That's enough for a normal LLM to pick up on the patterns and you're good to go.
In some kid's basement
Yes. All these discoveries are compounding. Momentum is building
First you learn to speak, then you learn to reason
Or not idk, I don't remember how I did it
It will still only be a summary of what was. I do not see a day anytime soon where we will say "LLM, take these 10 BTC, create a company, develop a product, launch it, and profit, send me half the profits and reinvest the rest, I am no longer giving you any direction or input besides that."
This is the core insight that leads me to feel that currently, there's about a 40% chance of human-level AI within 7 years
(meaning there is no cognitive task it's worse than humans at. It's already superhuman at many tasks)
In my mind, the fastest it could happen is about 10 months, but it would probably be too expensive to run for most things. Would need another 2 years after that for the cost to be reasonable for everyday tasks