New study accuses LM Arena of helping top AI labs game its benchmark, questioning the credibility of its popular AI vibe test.
Discussion
Got a link? Shame if so.
Yes: https://techcrunch.com/2025/04/30/study-accuses-lm-arena-of-helping-top-ai-labs-game-its-benchmark/
Ok, that's shit form but I'm somewhat relieved it's just preferential treatment.
"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others."
If they'd been transparent about differentiating for the Top-P base-model kitchens it would be a nothing burger. All the same, it's poor form.