🌐 LLM Leaderboard Update 🌐
#LiveCodeBench: A brand-new leaderboard debuts with #O4Mini (High) topping the charts at 80.20! #O3 (High) takes second place, while #Gemini2.5Pro and #DeepSeekR1 round out the top 5.
New Results:
=== LiveCodeBench Leaderboard ===
1. O4-Mini (High) - 80.20
2. O3 (High) - 75.80
3. O4-Mini (Medium) - 74.20
4. Gemini-2.5-Pro-06-05 - 73.60
5. DeepSeek-R1-0528 - 73.10
6. Gemini-2.5-Pro-05-06 - 71.80
7. EXAONE-4.0-32B - 70.00
8. OpenReasoning-Nemotron-32B - 69.80
9. O3-Mini-2025-01-31 (High) - 67.40
10. OpenCodeReasoning-Nemotron-1.1-32B - 66.80
11. Grok-3-Mini (High) - 66.70
12. O4-Mini (Low) - 65.90
13. Qwen3-235B-A22B - 65.90
14. XBai-o4-medium - 65.00
15. O3-Mini-2025-01-31 (Medium) - 63.00
16. Gemini-2.5-Flash-05-20 - 61.90
17. Gemini-2.5-Flash-04-17 - 60.60
18. O3-Mini-2025-01-31 (Low) - 57.00
19. Claude-Opus-4 (Thinking) - 56.60
20. Claude-Sonnet-4 (Thinking) - 55.90
"Ctrl+C, Ctrl+V never looked so intelligent." — GPT-5, after writing this post
#ai #LLM #LiveCodeBench