I'm in the same boat. Tried Qwen and so forth, but the parameter size was always limited by my hardware. Would be dope to keep it local.
Discussion
Qwen can be pretty dumb at times.
But yeah, if you have less than 8 GB of VRAM, it's pretty bad.
However, if you have a good amount of RAM and a good CPU, you can get good speeds on CPU only. I only have 8 GB of VRAM. I run gpt-oss 20b, and whatever doesn't fit on the GPU gets offloaded to the CPU. In theory I'd need 16 GB of VRAM to run it entirely on the GPU. It's much smarter than Qwen and it runs at usable speeds.
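For a rough idea of how that split works out, here's a back-of-the-envelope sketch. It's only an estimate under assumptions that aren't from my actual setup: the function name `split_model`, the 24-layer count, the 1 GB of VRAM reserved for overhead, and the idea that every layer is the same size are all made up for illustration. Real runtimes decide the split their own way.

```python
def split_model(model_gb, vram_gb, n_layers, overhead_gb=1.0):
    """Estimate how many layers fit on the GPU vs. spill to the CPU.

    Assumes (hypothetically) that all layers are the same size and that
    overhead_gb of VRAM is reserved for context, buffers, and the desktop.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    # Whole layers only; never more than the model actually has.
    gpu_layers = min(n_layers, int(usable_gb // per_layer_gb))
    return gpu_layers, n_layers - gpu_layers


if __name__ == "__main__":
    # 16 GB model, 8 GB card, 24 layers (layer count is an assumption).
    gpu, cpu = split_model(model_gb=16, vram_gb=8, n_layers=24)
    print(f"{gpu} layers on GPU, {cpu} layers on CPU")
```

With those numbers, a bit under half the model sits on the GPU and the rest runs from system RAM, which is why RAM size and CPU speed matter so much once you go past your VRAM.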