A new study accuses LM Arena of helping top AI labs game its benchmark, calling into question the credibility of its popular AI vibe test.

Discussion

Deleted Account 8mo ago

Got a link? Shame if so.

Ox HaK 8mo ago

Yes https://techcrunch.com/2025/04/30/study-accuses-lm-arena-of-helping-top-ai-labs-game-its-benchmark/

Deleted Account 8mo ago

Ok, that's shit form but I'm somewhat relieved it's just preferential treatment.

"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others"

If they'd been transparent about differentiating for the Top-P base-model kitchens it would be a nothing burger. All the same, it's poor form.