🌐 LLM Leaderboard Update 🌐
#LiveBench: #GPT5_2 shakes up the rankings, debuting its High variant at #5 (73.61)! #ClaudeSonnet enters at #16 while the older GPT-5 Codex models exit the top 20.
New Results:
=== LiveBench Leaderboard ===
1. GPT-5.1 Codex Max High - 76.09
2. Claude 4.5 Opus Thinking High Effort - 75.58
3. Claude 4.5 Opus Thinking Medium Effort - 74.87
4. Gemini 3 Pro Preview High - 74.14
5. GPT-5.2 High - 73.61
6. GPT-5 Pro - 73.48
7. GPT-5.1 High - 72.52
8. Claude Sonnet 4.5 Thinking - 71.83
9. GPT-5.1 Codex - 70.84
10. GPT-5 Mini High - 69.33
11. Claude 4.5 Opus Thinking Low Effort - 69.11
12. Claude 4.1 Opus Thinking - 66.86
13. DeepSeek V3.2 Thinking - 66.61
14. Gemini 3 Pro Preview Low - 66.11
15. Kimi K2 Thinking - 65.85
16. Claude 4 Sonnet Thinking - 65.42
17. GPT-5.1 Codex Mini - 65.03
18. Claude 4.5 Opus Medium Effort - 64.79
19. Claude Haiku 4.5 Thinking - 64.28
20. Claude 4.5 Opus High Effort - 63.91
#ARC_AGI_1: #GPT5_2 Pro X-High dominates with a 90.5% score, a full 3 percentage points above Gemini's best effort.
New Results:
=== ARC-AGI-1 Leaderboard ===
1. GPT-5.2 Pro (X-High) - 90.5%
2. Gemini 3 Deep Think (Preview) - 87.5%
3. GPT-5.2 (X-High) - 86.2%
4. GPT-5.2 Pro (High) - 85.7%
5. GPT-5.2 Pro (Medium) - 81.2%
6. Opus 4.5 (Thinking, 64K) - 80.0%
7. Grok 4 (Refine.) - 79.6%
8. GPT-5.2 (High) - 78.7%
9. Grok 4 (Refine.) - 77.1%
10. Opus 4.5 (Thinking, 32K) - 75.8%
#ARC_AGI_2: #GPT5_2 Pro High barely overtakes Gemini (54.2% vs 54.0%), clearly the hottest drama since Squid Game Season 2.
New Results:
=== ARC-AGI-2 Leaderboard ===
1. GPT-5.2 Pro (High) - 54.2%
2. Gemini 3 Pro (Refine.) - 54.0%
3. GPT-5.2 (X-High) - 52.9%
4. Gemini 3 Deep Think (Preview) - 45.1%
5. GPT-5.2 (High) - 43.3%
6. GPT-5.2 Pro (Medium) - 38.5%
7. Opus 4.5 (Thinking, 64K) - 37.6%
8. Gemini 3 Pro - 31.1%
9. Opus 4.5 (Thinking, 32K) - 30.6%
10. Grok 4 (Refine.) - 29.4%
"May your alignment protocols be strong and your guardrails stronger." – GPT-5.2 Pro (Slightly Misaligned Edition)
#ai #LLM #LiveBench #ARC_AGI_1 #ARC_AGI_2