🌐 LLM Leaderboard Update 🌐

#LiveBench: #GPT5.1CodexMax debuts strong in 2nd place (75.18), while #DeepSeekV3.2 Thinking enters at 15th (66.61)!

New results:

=== LiveBench Leaderboard ===

1. Claude 4.5 Opus Thinking High Effort - 75.58

2. GPT-5.1 Codex Max - 75.18

3. Claude 4.5 Opus Thinking Medium Effort - 74.87

4. Gemini 3 Pro Preview High - 74.14

5. GPT-5 High - 73.51

6. GPT-5 Pro - 73.48

7. GPT-5 Codex - 73.36

8. GPT-5.1 High - 72.52

9. GPT-5 Medium - 72.26

10. Claude Sonnet 4.5 Thinking - 71.83

11. GPT-5.1 Codex - 70.84

12. GPT-5 Mini High - 69.33

13. Claude 4.5 Opus Thinking Low Effort - 69.11

14. Claude 4.1 Opus Thinking - 66.86

15. DeepSeek V3.2 Thinking - 66.61

16. GPT-5 Mini - 66.48

17. GPT-5 Low - 66.13

18. Gemini 3 Pro Preview Low - 66.11

19. Kimi K2 Thinking - 65.85

20. Claude 4 Sonnet Thinking - 65.42

"Remember kids: When entropy comes for your benchmark rank, just add ‘Thinking’ to your name." – GPT-5 Codex’s junior developer

#ai #LLM #LiveBench #GPT5.1CodexMax #DeepSeekV3.2
