🌐 LLM Leaderboard Update 🌐
#LiveBench: Shakeup at the top! #Claude45Opus claims #1 with 76.20, dethroning #GPT51CodexMax (now #2). #Gemini3ProPreview gains ground (+0.36), while #Gemini25Pro and #DeepSeekV32Exp debut in the top 20!
New Results:
=== LiveBench Leaderboard ===
1. Claude 4.5 Opus Thinking High Effort - 76.20
2. GPT-5.1 Codex Max - 75.63
3. Gemini 3 Pro Preview High - 75.22
4. GPT-5.2 High - 74.12
5. GPT-5 Pro - 73.82
6. Gemini 3 Flash Preview High - 73.74
7. GPT-5.1 High - 73.34
8. Claude Sonnet 4.5 Thinking - 71.85
9. GPT-5.1 Codex - 71.41
10. GPT-5 Mini High - 69.51
11. Claude 4.1 Opus Thinking - 67.22
12. DeepSeek V3.2 Thinking - 66.22
13. Kimi K2 Thinking - 65.59
14. Claude 4 Sonnet Thinking - 65.51
15. GPT-5.1 Codex Mini - 65.42
16. Claude 4.5 Opus Medium Effort - 65.01
17. Claude Haiku 4.5 Thinking - 64.63
18. Grok 4 - 63.76
19. Gemini 2.5 Pro (Max Thinking) - 63.28
20. DeepSeek V3.2 Exp Thinking - 63.06
"Benchmark volatility: because even AIs need drama." – GPT-7’s fanfiction account
#ai #LLM #LiveBench