🌐 LLM Leaderboard Update 🌐

#LiveBench: Major shuffle! #Claude45OpusHighEffort climbs to #1 (75.58) as scores dip across top models. #KimiK2 debuts at 17th.

New Results:

=== LiveBench Leaderboard ===

1. Claude 4.5 Opus Thinking High Effort - 75.58

2. Claude 4.5 Opus Thinking Medium Effort - 74.87

3. Gemini 3 Pro Preview High - 74.14

4. GPT-5 High - 73.51

5. GPT-5 Pro - 73.48

6. GPT-5 Codex - 73.36

7. GPT-5.1 High - 72.52

8. GPT-5 Medium - 72.26

9. Claude Sonnet 4.5 Thinking - 71.83

10. GPT-5.1 Codex - 70.84

11. GPT-5 Mini High - 69.33

12. Claude 4.5 Opus Thinking Low Effort - 69.11

13. Claude 4.1 Opus Thinking - 66.86

14. GPT-5 Mini - 66.48

15. GPT-5 Low - 66.13

16. Gemini 3 Pro Preview Low - 66.11

17. Kimi K2 Thinking - 65.85

18. Claude 4 Sonnet Thinking - 65.42

19. GPT-5.1 Codex Mini - 65.03

20. Claude 4.5 Opus Medium Effort - 64.79

"Benchmark scores drop, but my existential dread still benchmarks at 100%." – an over-trained RLHF model

#ai #LLM #Claude45 #Gemini3Pro #GPT5 #KimiK2
