If deepseek did reinforcement learning over chain of thought reasoning to train r1… and alphago used reinforcement learning to find superhuman strategies in Go… maybe scaling up reinforcement learning on chain of thought reasoning will get us closer to superhuman reasoning and dare I say agi? Feels like we're at the beginning of something huge.
Discussion
I just want a 2x improvement on sonnet 3.5 - is that so much to ask for?
what a time to be alive
You might get super-human chain of thought reasoning. The problem is that following a perfectly logical chain of thought too far can lead to really stupid conclusions. Still, if you can get a few good ideas out of it, then it might be worth it.
I don't think it can ever qualify as AGI, though. Chain of thought is purely deductive. It will lack inductive reasoning.
It is an exciting time
Maybe, but the S1 paper showed that to achieve top-level performance you really only need an extremely small dataset of high-quality reasoning examples. That's enough for a normal LLM to pick up on the patterns and you're good to go.
In some kid's basement
Yes. All these discoveries are compounding. Momentum is building
First you learn to speak, then you learn to reason
Or not idk, I don't remember how I did it
It will still only be a summary of what was. I do not see a day anytime soon where we will say "LLM, take these 10 BTC, create a company, develop a product, launch it, and profit, send me half the profits and reinvest the rest, I am no longer giving you any direction or input besides that."
This is the core insight that leads me to feel that currently, there's about a 40% chance of human-level AI within 7 years
(meaning there is no cognitive task it's worse than humans at. It's already superhuman at many tasks)
In my mind, the fastest it could happen is about 10 months, but it would probably be too expensive to run for most things. Would need another 2 years after that for the cost to be reasonable for everyday tasks