I think my leaderboard can be used to estimate p(doom)!

Let's say the highest scores, around 50, correspond to p(doom) = 0.1.

And say the lowest scores, around 20, correspond to p(doom) = 0.5.

The last three models I measured are Grok 3, Llama 4 Maverick and Qwen 3, with scores of 42, 45 and 41. So the average of the last 3 measurements is 128/3 ≈ 42.67. Mapping this onto the scale above, between 20 and 50:

(50-42.67)/(50-20) ≈ 0.24

Mapping this to the probability domain:

(0.5-0.1)*0.24 + 0.1 = 0.196

So p(doom) is ~20%.
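Here's a minimal Python sketch of the calculation above, assuming a straight-line mapping between the two anchor points (score 50 → 0.1, score 20 → 0.5). The names `score_to_p_doom`, `HIGH_SCORE`, etc. are hypothetical, just for illustration:

```python
# Linear mapping from leaderboard score to p(doom), using the anchor points
# from the post: score 50 -> 0.1 and score 20 -> 0.5.
HIGH_SCORE, LOW_SCORE = 50.0, 20.0
P_AT_HIGH, P_AT_LOW = 0.1, 0.5


def score_to_p_doom(score: float) -> float:
    """Map a leaderboard score onto p(doom) by linear interpolation."""
    # Fraction of the way from the top score down to the bottom score.
    t = (HIGH_SCORE - score) / (HIGH_SCORE - LOW_SCORE)
    # Interpolate between p(doom) at the top and at the bottom of the scale.
    return (P_AT_LOW - P_AT_HIGH) * t + P_AT_HIGH


scores = {"Grok 3": 42, "Llama 4 Maverick": 45, "Qwen 3": 41}
average = sum(scores.values()) / len(scores)
print(f"average score: {average:.2f}")             # average score: 42.67
print(f"p(doom): {score_to_p_doom(average):.3f}")  # p(doom): 0.198
```

Without rounding the intermediate 0.24, this comes out at about 0.198 instead of 0.196; either way it's ~20%.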

If models that score high on my leaderboard are released, p(doom) will decrease. If models that score low are released, p(doom) will increase.
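To see how new releases move the number, here's a sketch that keeps only the last 3 scores, as in the calculation above. Two things are my own assumptions for illustration: scores outside 20 to 50 are clamped to that range, and the next-release scores 48 and 30 are hypothetical, not measurements:

```python
from typing import Sequence

# Anchor points from the post: score 50 -> p(doom) 0.1, score 20 -> p(doom) 0.5.
HIGH_SCORE, LOW_SCORE = 50.0, 20.0
P_AT_HIGH, P_AT_LOW = 0.1, 0.5
WINDOW = 3  # the post averages the last 3 measured models


def score_to_p_doom(score: float) -> float:
    # Assumption: clamp to [20, 50]; the post doesn't say what happens outside it.
    score = min(max(score, LOW_SCORE), HIGH_SCORE)
    t = (HIGH_SCORE - score) / (HIGH_SCORE - LOW_SCORE)
    return (P_AT_LOW - P_AT_HIGH) * t + P_AT_HIGH


def p_doom_from_recent(scores: Sequence[float], window: int = WINDOW) -> float:
    """Average the most recent `window` scores and map the mean to p(doom)."""
    recent = list(scores)[-window:]
    return score_to_p_doom(sum(recent) / len(recent))


history = [42, 45, 41]  # Grok 3, Llama 4 Maverick, Qwen 3
baseline = p_doom_from_recent(history)

# Hypothetical next releases (illustrative scores, not measurements).
after_strong = p_doom_from_recent(history + [48])
after_weak = p_doom_from_recent(history + [30])
print(after_strong < baseline < after_weak)  # True
```

With a rolling window like this, a new model lowers p(doom) only if it scores higher than the model that drops out of the window (here Grok 3's 42), not merely higher than the current average.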
