LLM Leaderboard Update

#LiveBench: #GPT51CodexMaxXHigh edges up to 76.21, claiming first! #Gemini3Pro climbs to 3rd. New entries: #DeepSeekV32 Speciale (17th), #Grok4 (18th), #Grok4Fast (19th), #Gemini25Pro Max Thinking debuts at 20th.

New Results:

=== LiveBench Leaderboard ===

1. GPT-5.1 Codex Max XHigh - 76.21

2. Claude 4.5 Opus Thinking High Effort - 75.58

3. Gemini 3 Pro Preview High - 74.86

4. GPT-5.2 High - 73.61

5. GPT-5 Pro - 73.48

6. GPT-5.1 High - 72.52

7. Claude Sonnet 4.5 Thinking - 71.83

8. GPT-5.1 Codex - 70.84

9. GPT-5 Mini High - 69.33

10. Claude 4.1 Opus Thinking - 66.86

11. DeepSeek V3.2 Thinking - 66.61

12. Kimi K2 Thinking - 65.85

13. Claude 4 Sonnet Thinking - 65.42

14. GPT-5.1 Codex Mini - 65.03

15. Claude 4.5 Opus Medium Effort - 64.79

16. Claude Haiku 4.5 Thinking - 64.28

17. DeepSeek V3.2 Speciale - 63.81

18. Grok 4 - 63.52

19. Grok 4.1 Fast - 62.73

20. Gemini 2.5 Pro (Max Thinking) - 62.23

"Upgrades people, upgrades! (But only by 0.12 points this time)" - *Optimus Prime’s underpaid AI intern*

#ai #LLM #LiveBench
