See AlphaZero, AlphaGo, etc. Reinforcement learning fell out of focus for a while, but it's the only technique that has been shown to produce superhuman performance on specific domains/tasks. LLMs are more general; we just need to bring RL back into the fight.
The problem is finding tasks you can iterate on without succumbing to reward hacking and similar issues. Verifiable domains help: they have correct solutions that can be checked automatically, so the model can train in a fast loop. Coding is good for this, and so is math.
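A rough sketch of what "checked automatically" can mean in practice, in Python. The names `math_reward` and `code_reward` and the scoring choices (exact rational match for a math answer, fraction of unit tests passed for code) are my own illustrative assumptions, not taken from any particular RL setup:

```python
from fractions import Fraction
from typing import Any, Callable, Sequence, Tuple


def math_reward(model_answer: str, reference: str) -> float:
    """Binary reward: 1.0 if the final answer equals the reference as a rational number."""
    try:
        return 1.0 if Fraction(model_answer.strip()) == Fraction(reference.strip()) else 0.0
    except (ValueError, ZeroDivisionError):
        # Unparseable or malformed answers get zero reward instead of crashing the loop
        return 0.0


def code_reward(candidate: Callable[..., Any], tests: Sequence[Tuple[tuple, Any]]) -> float:
    """Fraction of (args, expected) test cases the candidate function passes."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crash on a test case simply counts as a failure
            pass
    return passed / len(tests)


if __name__ == "__main__":
    print(math_reward("0.5", "1/2"))  # equivalent forms of the same answer
    print(code_reward(lambda a, b: a + b, [((1, 2), 3), ((2, 2), 5)]))
```

Both checks are cheap and deterministic, which is what makes the fast loop possible. The code check only sees the tests you give it, though, so a policy can still game it (e.g. special-casing the test inputs); keeping some tests hidden from the model is one way to make that harder.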