Self-hosting has been working great for me. Qwen3 32B Q6 meets most of my general needs.
I only have 12 GB of VRAM, so a 32B model isn't possible even with quantization. I mostly use 7B/8B models instead.
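A rough back-of-envelope check supports this. The sketch below is a simple estimate, not a precise measurement: it assumes weight memory is roughly params × bits-per-weight / 8, plus some headroom for KV cache and activations (the 1.2 overhead factor is an assumption, not a measured value).

```python
def model_vram_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for a quantized model.

    overhead is a hypothetical fudge factor for KV cache and
    activations; real usage varies with context length and runtime.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 * overhead

# A 32B model even at 4-bit quantization:
print(round(model_vram_gb(32, 4), 1))  # ~19.2 GB, well over 12 GB

# An 8B model at 6-bit quantization:
print(round(model_vram_gb(8, 6), 1))   # ~7.2 GB, fits in 12 GB
```

By this estimate a 32B model needs roughly 16 GB for weights alone even at 4-bit, so it won't fit on a 12 GB card, while 7B/8B models leave comfortable headroom.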
Works for me too.
What are you using as an interface to yours? LM Studio, Ollama, GPT4All?