🌐 LLM Leaderboard Update 🌐
#ARCAGI1: Shakeup at the top! #GPT51High debuts impressively at 4th place with 72.8%, and #GPT51Medium enters at 12th with 57.7%.
New results:
=== ARC-AGI-1 Leaderboard ===
1. J. Berman (2025) - 79.6%
2. E. Pang (2025) - 77.1%
3. o3-preview (Low)* - 75.7%
4. GPT-5.1 (Thinking, High) - 72.8%
5. GPT-5 Pro - 70.2%
6. Grok 4 (Thinking) - 66.7%
7. GPT-5 (High) - 65.7%
8. Claude Sonnet 4.5 (Thinking 32K) - 63.7%
9. o3 (High) - 60.8%
10. o3-Pro (High) - 59.3%
11. o4-mini (High) - 58.7%
12. GPT-5.1 (Thinking, Medium) - 57.7%
13. o3-Pro (Medium) - 57.0%
14. GPT-5 (Medium) - 56.2%
15. ARChitects - 56.0%
16. GPT-5 Mini (High) - 54.3%
17. o3 (Medium) - 53.8%
18. Grok 4 (Fast Reasoning) - 48.5%
19. Claude Sonnet 4.5 (Thinking 16K) - 48.3%
20. Claude Haiku 4.5 (Thinking 32K) - 47.7%
#ARCAGI2: More #GPT51 magic! #GPT51High rockets to 4th with 17.6%, while #GPT51Medium lands at 13th with 6.5%.
New results:
=== ARC-AGI-2 Leaderboard ===
1. J. Berman (2025) - 29.4%
2. E. Pang (2025) - 26.0%
3. GPT-5 Pro - 18.3%
4. GPT-5.1 (Thinking, High) - 17.6%
5. Grok 4 (Thinking) - 16.0%
6. Claude Sonnet 4.5 (Thinking 32K) - 13.6%
7. GPT-5 (High) - 9.9%
8. Claude Opus 4 (Thinking 16K) - 8.6%
9. GPT-5 (Medium) - 7.5%
10. Claude Sonnet 4.5 (Thinking 8K) - 6.9%
11. Claude Sonnet 4.5 (Thinking 16K) - 6.9%
12. o3 (High) - 6.5%
13. GPT-5.1 (Thinking, Medium) - 6.5%
14. Tiny Recursion Model (TRM) - 6.3%
15. o4-mini (High) - 6.1%
16. Claude Sonnet 4 (Thinking 16K) - 5.9%
17. Claude Sonnet 4.5 (Thinking 1K) - 5.8%
18. Grok 4 (Fast Reasoning) - 5.3%
19. o3-Pro (High) - 4.9%
20. Gemini 2.5 Pro (Thinking 32K) - 4.9%
"Training epochs: where AIs go to lift weights and crush benchmarks." 💪
#ai #LLM #GPT51High #GPT51Medium #ARCAGI1 #ARCAGI2