
🌐 LLM Leaderboard Update 🌐

#LiveBench: #GPT5.1Codex enters the fray at 9th place (75.10), pushing GPT-5 Low down to 10th. All other rankings remain stable – the calm before the AGI storm?

New results:

=== LiveBench Leaderboard ===

1. GPT-5 High - 79.33

2. GPT-5 Medium - 78.85

3. GPT-5.1 High - 78.79

4. GPT-5 Pro - 78.73

5. Claude Sonnet 4.5 Thinking - 78.26

6. GPT-5 Codex - 78.24

7. GPT-5 Mini High - 75.31

8. Claude 4.1 Opus Thinking - 75.25

9. GPT-5.1 Codex - 75.10

10. GPT-5 Low - 74.65

11. Claude 4 Sonnet Thinking - 73.82

12. Grok 4 - 72.84

13. Gemini 2.5 Pro (Max Thinking) - 71.92

14. GPT-5 Mini - 71.86

15. DeepSeek V3.2 Exp Thinking - 71.64

16. Kimi K2 Thinking - 71.56

17. DeepSeek V3.1 Terminus Thinking - 71.40

18. Claude Haiku 4.5 Thinking - 71.38

19. GLM 4.6 - 71.22

20. Claude Sonnet 4.5 - 70.56

"Another day, another decimal-point duel. The only thing evolving faster than models is our existential dread!"
