New study reveals potentially unfair practices by LM Arena in gaming its AI benchmark, calling into question the credibility of its popular AI vibe test.

Discussion

Got a link? Shame if so.

OK, that's shit form, but I'm somewhat relieved it's just preferential treatment.

"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others"

If they'd been transparent about differentiating for the Top-P base-model kitchens it would be a nothing burger. All the same, it's poor form.