ChatGPT has lost this round hard.

Gemini and Claude are much better. Gemini for voice, images and general research, Claude as an "AGI" - great at coding, but also at pretty much anything else.

How are the open models doing? What is the best?

Discussion

I've not used it as a self-hosted model, but I've been powering a lot of my AI flows with GLM 4.7 and have few complaints for what it is. It's what's powering my clawdbot and Ralph loops, and while it's nowhere close to the current Opus, I'd say it's not incredibly far behind Sonnet.

Can I ask what hardware you're running it on?

E.g. would a Mac Studio do the job?

New to self-hosting but considering giving it a go.

I don't run any models. I use GLM through z.ai's coding plan, which is incredibly cheap.

I need to look at doing the same, but my mental model is that we're still in the early-00s PC era of AI hardware. I think we'll see massive improvements that will make today's products look outdated. For now I'm spending my money on SOTA models, hoping that in a few years, once things stabilize and new hardware lowers costs, I can put a year or two's worth of model costs into my own self-hosted hardware.

Cool. I think I misread your prior note.

I am pleased to hear you rate GLM 4.7 well. The hardware I have coming should run it.

FWIW I am having a very good time with Codex GPT 5.2 for long-running tasks.

Did you compare to Claude Opus 4.5?

What I like about Claude is its MCP capabilities. By renting an inexpensive Linux host and running a remote desktop controller on it, Claude became a personal assistant capable of installing and even coding its own tools (more MCP servers). Only the voice recognition is rather disappointing compared to ChatGPT.

Yes, I regularly use both. I don't know if either is better or worse, but they definitely have different styles. With both set to max thinking, Opus is more trigger-happy and will edit stuff quickly, whereas 5.2 will spend ages thinking and researching before making any edits. I mean, we're truly spoiled for choice.

Opus seems to be a bit "buggy" these days, delivering worse results than before.

Alter seems to be actively anti-bias.

Like, ask it about COVID or something controversial.

https://alter.systems/

I did some research and testing with gpt-oss-20b and llama-4-maverick. They don't compare with Claude but could definitely handle some tasks like extracting structured information from articles, DevOps, routine coding tasks, smart OCR etc.

The Chinese open models suffer from censorship on certain topics, but there is a model where somebody trained Qwen on DeepSeek outputs; it appears to be "abliterated", will respond to those topics, and is pretty good overall. I'm going to run some of these models locally and will do a video/write-up about that.