Tested Grok 3, including its “DeepSearch” & “Think” modes, with Premium+.

First thought: they’re clearly manipulating the benchmarks. It’s not the best model by any stretch, especially in math & code. That much I can say for sure, so benchmarks in those areas don’t make much sense.

xAI’s deep search is nowhere close to OpenAI’s deep research. In fact, I don’t think they’re even in the same category. xAI’s deep search is more like Perplexity’s, but slightly better.

That said, Grok 3 isn’t a bad model. It’s actually better than I expected—way better than Grok 2, but there’s nothing extraordinary about it. They also didn’t open-source it, so it doesn’t add anything meaningful to the ecosystem.

Bonus: If you want to play with cutting-edge AI models & features, such as chain-of-thought reasoning models, grounded web search, Gemini 2 Pro, & the latest SOTA models before launch, with a 2M-token context, all free for personal use, you can use:

https://aistudio.google.com/

Discussion

You can completely game them by training on the benchmarks using RLHF. Considering we now know Elon blatantly lies about things, I wouldn’t be that surprised.

But we don’t know, because I’m pretty sure xAI hasn’t released any papers on Grok, probably for this reason.

I think they dropped some model weights for Grok 1, which were pretty much a carbon copy of the popular Mistral model.

Also agree, I don’t think we can trust anything coming out of his mouth. He’s a pathological liar.

Seems like everything Elon touches (other than maybe SpaceX) is deceitfully manipulated like this.