It's your GPU that matters. KoboldCpp (based on llama.cpp) can run models regardless, though.
You'll want to use its high-priority mode, because otherwise it may schedule work onto your E-cores and performance tanks.
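If you launch it from a script, something like this works. A minimal sketch assuming KoboldCpp's `--highpriority` and `--contextsize` flags (check `python koboldcpp.py --help` on your build) and an illustrative model path:

```python
# Minimal sketch: launch KoboldCpp with high-priority mode from Python.
# The model path is hypothetical; adjust it for your setup.
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "models/example-Q4_K_S.gguf",  # hypothetical path
    "--highpriority",         # keeps inference off the E-cores
    "--contextsize", "4096",  # context also eats VRAM, so keep it modest
])
```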
Now, if you have a GPU, I recommend a Q4_K_S model whose file size fully fits in your VRAM. Keep in mind context also takes up space. Rough guide (see the sketch below for how these numbers pencil out):

- 8GB VRAM: up to 11B Q4_K_S at around 4K context
- 12GB: up to 13B
- 16GB: up to 20B, plus Mistral's 24B
- 24GB: up to 30B
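If you want to sanity-check a model against your card, here's a rough back-of-the-envelope sketch. The ~0.57 bytes-per-weight figure for Q4_K_S, the layer count, and the GQA KV-cache dimension are ballpark assumptions, not exact values:

```python
# Rough sketch: estimate whether a Q4_K_S model plus its KV cache fits in VRAM.
# All constants below are approximations for a typical GQA transformer.

def fits_in_vram(params_b: float, context: int, vram_gb: float,
                 n_layers: int = 40, kv_dim: int = 1024) -> bool:
    weights_gb = params_b * 0.57  # Q4_K_S is ~4.5 bits/weight
    # KV cache: 2 (K and V) * layers * context * kv_dim * 2 bytes (fp16);
    # kv_dim = n_kv_heads * head_dim, ~1024 is common for GQA models
    kv_gb = 2 * n_layers * context * kv_dim * 2 / 1024**3
    overhead_gb = 0.5             # compute buffers, fragmentation
    return weights_gb + kv_gb + overhead_gb <= vram_gb

# e.g. an 11B model at 4K context on an 8GB card:
# ~6.3GB weights + ~0.6GB KV + overhead -> tight fit, prints True
print(fits_in_vram(11, 4096, 8))
```

In practice it's easier to just look at the GGUF file size and leave a GB or two of headroom, which is the same rule of thumb as above.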
If you don't have a GPU, it depends a lot on what speeds you can tolerate, but Gemma 3n may be a good starting point.