A UN-influenced leaderboard.

https://www.gapminder.org/ai/worldview_benchmark/

Notice Google is above average, DeepSeek is in the middle, and Meta and xAI are below average. My leaderboard is inversely correlated with this!

Coincidence?

Discussion

They tested how well the models learned UN trivia?

As far as I understand, the UN determines the "facts" and they want LLMs to parrot them.