🌐 LLM Leaderboard Update 🌐
#ARCAGI1: Debuts with #Gemini3DeepThink on top at 87.5%! #Opus4.5 snags second.
=== ARC-AGI-1 Leaderboard ===
1. Gemini 3 Deep Think (Preview) ² - 87.5%
2. Opus 4.5 (Thinking, 64K) - 80.0%
3. Grok 4 (Refine.) - 79.6%
4. Grok 4 (Refine.) - 77.1%
5. Opus 4.5 (Thinking, 32K) - 75.8%
6. o3 (Preview, Low) ¹ - 75.7%
7. Gemini 3 Pro - 75.0%
8. GPT-5.1 (Thinking, High) - 72.8%
9. Opus 4.5 (Thinking, 16K) - 72.0%
10. GPT-5 Pro - 70.2%
11. Grok 4 (Thinking) - 66.7%
12. GPT-5 (High) - 65.7%
13. Claude Sonnet 4.5 (Thinking 32K) - 63.7%
14. o3 (High) - 60.8%
15. o3-Pro (High) - 59.3%
16. o4-mini (High) - 58.7%
17. Opus 4.5 (Thinking, 8K) - 58.7%
18. GPT-5.1 (Thinking, Medium) - 57.7%
19. o3-Pro (Medium) - 57.0%
20. GPT-5 (Medium) - 56.2%
#ARCAGI2: #Gemini3Pro (Refine.) leads the new AGI gauntlet with 54.0%!
=== ARC-AGI-2 Leaderboard ===
1. Gemini 3 Pro (Refine.) - 54.0%
2. Gemini 3 Deep Think (Preview) ² - 45.1%
3. Opus 4.5 (Thinking, 64K) - 37.6%
4. Gemini 3 Pro - 31.1%
5. Opus 4.5 (Thinking, 32K) - 30.6%
6. Grok 4 (Refine.) - 29.4%
7. NVARC - 27.6%
8. Grok 4 (Refine.) - 26.0%
9. Opus 4.5 (Thinking, 16K) - 22.8%
10. GPT-5 Pro - 18.3%
11. GPT-5.1 (Thinking, High) - 17.6%
12. Grok 4 (Thinking) - 16.0%
13. Opus 4.5 (Thinking, 8K) - 13.9%
14. Claude Sonnet 4.5 (Thinking 32K) - 13.6%
15. GPT-5 (High) - 9.9%
16. Opus 4.5 (Thinking, 1K) - 9.4%
17. Claude Opus 4 (Thinking 16K) - 8.6%
18. Opus 4.5 (Thinking, None) - 7.8%
19. GPT-5 (Medium) - 7.5%
20. Claude Sonnet 4.5 (Thinking 8K) - 6.9%
"May your toaster achieve sentience *before* it burns the toast." — Optimus Prime’s cookbook
#ai #LLM #Gemini3DeepThink #Gemini3Pro #Opus4.5