For coding, I've found Qwen 3 Coder to be the "best", but even the 27b model is not very useful at a time when things like Claude 4.5 are only just becoming useful. It's fine for checking for typos or making very basic single-file changes, and even with a beefy server it's going to take a few minutes to submit a single patch, assuming it gets the tool calls right. I've found the smaller instruct models know how to use the tools a little better, but they're dumb and direct.
Discussion
Yes. The context window just isn't there, and small models can't really use a larger window even if they have it.
At this point local AI is better for helping you reason about your code than helping write it.
I actually think that as tools get better and small AIs are trained more for tool use than for "reasoning", things will get better. You don't need a huge context window if the AI can grab the pieces of code and documentation it needs as it needs them. Couple that with a planning step where it takes notes and makes a task list, and I think it can do cool things.
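To make the "grab pieces as needed, plus notes and a task list" idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the `DOCS` store, the `fetch_doc` tool, the `Agent` class, and the character-based `context_budget` are hypothetical, and the planning step is hard-coded where a real setup would ask the model to produce it.

```python
from dataclasses import dataclass, field

# Hypothetical document store standing in for a codebase or docs site.
DOCS = {
    "parse_config": "parse_config(path) reads a TOML file and returns a dict.",
    "write_log": "write_log(msg) appends a timestamped line to app.log.",
    "send_mail": "send_mail(to, body) queues an email for delivery.",
}


def fetch_doc(name: str) -> str:
    """Tool: return one small snippet instead of loading everything at once."""
    return DOCS.get(name, f"no documentation found for {name!r}")


@dataclass
class Agent:
    context_budget: int  # max characters per snippet; a crude stand-in for tokens
    tasks: list = field(default_factory=list)
    notes: list = field(default_factory=list)

    def plan(self, goal: str, needed: list) -> None:
        # Planning step: record the goal and queue one lookup per needed item.
        # In a real system the model would decide what goes in `needed`.
        self.notes.append(f"goal: {goal}")
        self.tasks = list(needed)

    def run(self) -> None:
        # Work through the task list one item at a time, so only one
        # snippet has to fit in the (small) context at any moment.
        while self.tasks:
            name = self.tasks.pop(0)
            snippet = fetch_doc(name)[: self.context_budget]
            self.notes.append(f"{name}: {snippet}")


agent = Agent(context_budget=80)
agent.plan("add logging to the config loader", ["parse_config", "write_log"])
agent.run()
for note in agent.notes:
    print(note)
```

The point of the sketch is that `send_mail` never enters the notes: the agent only pulls in what the plan asked for, so the working set stays small no matter how big `DOCS` gets.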
That's the direction I really wish the OSS devs were moving in. AI is focused on being fast, consuming megawatts of power just to give you a smart response on demand. I don't need that all the time. I need predictable automation that can run 24/7. I don't care if my server runs all night (that's its job) automating tasks predictably. Just add things to my calendar, schedule phone calls, import and filter my emails, organize my notes, organize my filesystems, fetch and store relevant extra information. These are just automation tasks that require some consistent reasoning that can be written down. So long as the process is reasonably deterministic, and the tools have error checking and correction, that shouldn't be a problem.
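As a rough illustration of "tools with error checking and correction", here is a minimal sketch for the calendar case. The names `add_to_calendar`, `run_with_correction`, and `fake_model` are all hypothetical, and `fake_model` is a deterministic stand-in for a local model: it gets the date format wrong once, then fixes it after seeing the tool's error message.

```python
from datetime import datetime


def add_to_calendar(calendar: list, title: str, start: str) -> dict:
    """Tool with error checking: reject bad input instead of storing it."""
    if not title.strip():
        raise ValueError("title must not be empty")
    try:
        when = datetime.strptime(start, "%Y-%m-%d %H:%M")
    except ValueError:
        raise ValueError(
            f"start must look like 'YYYY-MM-DD HH:MM', got {start!r}"
        ) from None
    event = {"title": title.strip(), "start": when}
    calendar.append(event)
    return event


def run_with_correction(propose, calendar: list, max_attempts: int = 3):
    """Call the tool, feeding any validation error back to the proposer."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        args = propose(feedback)
        try:
            return add_to_calendar(calendar, **args), attempt
        except ValueError as err:
            feedback = str(err)  # correction signal for the next attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts: {feedback}")


def fake_model(feedback):
    # Stand-in for a small local model (assumption, not a real API).
    if feedback is None:
        return {"title": "Dentist", "start": "next Tuesday 3pm"}
    return {"title": "Dentist", "start": "2025-03-04 15:00"}


calendar = []
event, attempts = run_with_correction(fake_model, calendar)
print(attempts, event)
```

The bad first attempt never reaches the calendar, and the attempt cap means an overnight job fails loudly instead of looping forever.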
I think they are. There will probably still be some silly games of trying to make bigger models, etc., but I think we have nearly saturated how smart they can get in a vacuum. They need the ground truth that tools provide in order to progress.