🤖 GPT-5.2 is nearing human-level thinking. On ARC-AGI-2, one of the toughest AI benchmarks, it scored 53–54%, while the average human score is around 60%. It also solved one of the hardest AIME 2025 math problems on its first try and scored 70–74% on GDPval, a measure of "real-world" work where results in that range often reflect the level of a strong specialist.