If DeepSeek did reinforcement learning over chain-of-thought reasoning to train R1… and AlphaGo used reinforcement learning to find superhuman strategies in Go… maybe scaling up reinforcement learning on chain-of-thought reasoning will get us closer to superhuman reasoning and dare I say AGI? Feels like we're at the beginning of something huge.
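The idea above can be sketched as a toy policy-gradient loop (a hypothetical illustration only, not DeepSeek-R1's actual training setup): sample a reasoning style, reward traces whose final answer is correct, and shift the policy toward rewarded styles.

```python
import random

random.seed(0)

def solve(style: str, a: int, b: int) -> int:
    """Toy 'model': one reasoning style is reliable, the other is sloppy."""
    if style == "step-by-step":
        return a + b                        # careful chain of thought, always right
    return a + b + random.choice([0, 1])    # shortcut, wrong about half the time

# Policy = unnormalized preference over reasoning styles.
weights = {"step-by-step": 1.0, "guess": 1.0}
lr = 0.1

for _ in range(500):
    a, b = random.randint(0, 9), random.randint(0, 9)
    total = sum(weights.values())
    style = random.choices(list(weights), [w / total for w in weights.values()])[0]
    reward = 1.0 if solve(style, a, b) == a + b else 0.0
    # REINFORCE-flavored update with a fixed 0.5 baseline:
    # styles that beat the baseline gain weight, others lose it.
    weights[style] = max(weights[style] + lr * (reward - 0.5), 0.01)

# After training, the policy strongly prefers the style that earns reward.
```

The point of the toy: the reward only scores final answers, yet the preference for careful step-by-step reasoning emerges from that signal alone, which is the (much simplified) shape of the argument in the note.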


Discussion

I just want a 2x improvement on sonnet 3.5 - is that so much to ask for?

You might get super-human chain of thought reasoning. The problem is that following a perfectly logical chain of thought too far can lead to really stupid conclusions. Still, if you can get a few good ideas out of it, then it might be worth it.

I don't think it can ever qualify as AGI, though. Chain of thought is purely deductive. It will lack inductive reasoning.

It is an exciting time

Maybe, but the S1 paper showed that to achieve top-level performance you only need an extremely small dataset of high-quality reasoning examples. That's enough for a normal LLM to pick up on the patterns, and you're good to go.

In some kid's basement

Yes. All these discoveries are compounding. Momentum is building

First you learn to speak, then you learn to reason

Or not idk, I don't remember how I did it

It will still only be a summary of what was. I do not see a day anytime soon where we will say "LLM, take these 10 BTC, create a company, develop a product, launch it, and profit, send me half the profits and reinvest the rest, I am no longer giving you any direction or input besides that."

This is the core insight that leads me to feel that currently there's about a 40% chance of human-level AI within 7 years

(meaning there is no cognitive task it's worse than humans at. It's already superhuman at many tasks)

In my mind, the fastest it could happen is about 10 months, but it would probably be too expensive to run for most things. Would need another 2 years after that for the cost to be reasonable for everyday tasks