The only model I recommend running locally is Llama 3.3. Qwen and DeepSeek get a lot of hype, but overall they are worse. What they are better at is looking like they are doing something. But they all basically ape conversation. The Turing test is really a test of the user.
Llama 3.3 wins by being the least pretentious. That means more of its parameters can go toward actual knowledge rather than performance art.