No, I usually express my disappointment every time a new local AI model comes out; these are just the newest. Generally the disappointment comes from models that try to do too much for their parameter count.
In my testing, "small" (~70B-parameter) models are not big enough to do actual reasoning. I think you need to hit some magic scale before it pays off. Smaller than that, they just make noises that look like reasoning, but the output is no better than, and sometimes worse than, a classic LLM.
For this reason I like Meta's llama 3.x models. They are unsurpassed at prompt following, and they give answers as good as you can expect for their parameter count.
You can get some improvement out of them, in some circumstances, if you prompt them to spell out their reasoning. They don't actually reason, but it lets them correctly count the r's in "strawberry", for instance.
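For what it's worth, here is roughly the kind of prompt I mean, as a minimal sketch. It assumes a llama.cpp-style local server exposing an OpenAI-compatible /v1/chat/completions endpoint on localhost:8080; the model name is just a placeholder for whatever you have loaded.

```python
# Minimal sketch: nudge a local llama 3.x model to spell out its steps
# before answering. Assumes an OpenAI-compatible local server (e.g. a
# llama.cpp-style server) on localhost:8080; adjust URL/model for your setup.
import requests

prompt = (
    "Spell the word 'strawberry' one letter per line, "
    "mark each line that is an 'r', then state the total number of r's."
)

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "llama-3.1-70b-instruct",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

Asked flat out, the same model will often just guess; forcing the letter-by-letter spelling is what makes the count come out right.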
I will pay OpenAI this compliment, though: their 120B-parameter model somehow runs twice as fast as the 70B llama on my machine. I don't think it's the mixture of experts; I think they did something clever with the quantization. I haven't figured it out yet. It seems impossible, since it should be bound by my RAM speed.
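To spell out the "bound by my RAM speed" intuition: a dense model has to stream roughly its whole quantized weight file through RAM for every generated token, so tokens per second should be capped near bandwidth divided by weight size. A back-of-envelope sketch, where the bandwidth and file sizes are illustrative guesses, not measurements:

```python
# Rough ceiling on tokens/sec for a memory-bandwidth-bound dense model:
# each generated token streams roughly the whole quantized weight file once.
# All numbers below are illustrative assumptions, not measurements.
ram_bandwidth_gb_s = 60.0  # assumed host RAM bandwidth

# rough quantized weight-file sizes in GB (assuming ~4-bit quantization)
weight_sizes_gb = {
    "llama 70B (q4)": 40.0,
    "120B (q4)": 68.0,
}

for name, size_gb in weight_sizes_gb.items():
    print(f"{name}: ~{ram_bandwidth_gb_s / size_gb:.1f} tok/s ceiling")
```

By that arithmetic a dense 120B should be slower than a dense 70B at the same quantization, not twice as fast, which is why the result surprises me.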
I learnt some stuff from this reply, thanks.
I'm mostly interested in coding agents, something I plan to play with down the line; I haven't used much beyond Cursor a couple of times to mess around. I'm not even sure I can use llama inside it, but I assume so.
A local LLM would be nice for that, but I'm guessing it's out of reach for my current hardware and abilities.