Surprised reasoning models didn't show up sooner. LLMs are glorified data-processing pipelines, and you can only fit so much processing into a fixed stack of layers (I'd bet most of them internally converge to something like: decode => process => lookup-like mapping => process => encode).
Reasoning (with scratch space) compared to without is like sequential logic compared to combinational: the scratchpad plays the role of state registers, letting the model reuse the same fixed-depth circuit across steps instead of computing everything in one pass.
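A toy sketch of the analogy in Python, with a single-digit adder standing in for one fixed-depth forward pass (everything here is illustrative, not any real model's internals):

```python
# Combinational vs. sequential, in miniature. `one_step` is a pure,
# fixed-depth function; the loop in `add_with_scratchpad` supplies the
# state (the carry) that a single pass cannot hold.

def one_step(d1: int, d2: int, carry: int) -> tuple[int, int]:
    """One 'forward pass': a single-digit full adder (combinational)."""
    s = d1 + d2 + carry
    return s % 10, s // 10

def add_with_scratchpad(a: str, b: str) -> str:
    """Sequential: iterate the same fixed-depth step, carrying state
    between iterations, so input length can grow without growing the
    circuit."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits = 0, []
    for d1, d2 in zip(reversed(a), reversed(b)):
        digit, carry = one_step(int(d1), int(d2), carry)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_with_scratchpad("987", "2345"))  # 3332
```

Without the loop, handling longer numbers means building a deeper circuit; with it, the same small step scales to any length, which is roughly what a scratchpad buys a fixed-depth model.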