I did run these locally. They just can't understand conversation shifts. To be fair, not even Grok 3 did that well in my latest test. All I did was ask an easy question, "how many u in the word strawberry?", and follow it up with a joke question: "Isn't there a double u? So should it be 2?"

Any human with half a brain would have realized I was making a stupid joke. I don't really expect AI to catch that, but I expect them to understand what happened after I point it out. gpt-oss just doubles down and makes tables about how you are wrong.
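
For reference, the literal answer is easy to check. A minimal Python sketch, using only the built-in `str.count` (the variable names are just for illustration), that counts the two letters the joke plays on:

```python
word = "strawberry"

# Literal count of the letter "u": this is what the first question asks.
u_count = word.count("u")

# Count of "w" ("double u"): this is the letter the joke question points at.
w_count = word.count("w")

print(f"u: {u_count}, w: {w_count}")  # prints "u: 0, w: 1"
```

So the straight answer is zero, and the "double u" only makes sense as a pun on the single "w".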

Discussion

these models are mostly trained by mids, so, they are mid.

i'd love to see what legit intelligent people would do with them.

from the snippets i've seen of grok it's on the high side of mid compared to gpt and gemini. gemini seems to be straight up woke

Grok is by far the best. The only model that understands after you explain.

> The word "strawberry" contains two 'u' characters. Here's how it breaks down:
>
> stuwburry