I run 30B and 40B models on my RTX 3090 Ti with Ollama. Most of them run fine, about as fast as ChatGPT. It's going to take big improvements to go much larger though, even with 24 GB of VRAM. 72B is out of the question currently.
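For anyone wondering what that looks like in practice, here's a minimal sketch using the official Ollama Python client. The model tag below is just an assumption for illustration, a ~30B 4-bit quant that should fit in 24 GB, not necessarily what I actually run.

```python
import ollama  # pip install ollama; assumes a local Ollama server is running

# Stream a reply from a locally served ~30B model. The tag is a hypothetical
# example; substitute whatever 30-40B quant you've pulled with `ollama pull`.
stream = ollama.chat(
    model="qwen2.5:32b",
    messages=[{"role": "user", "content": "Summarize the tradeoffs of 4-bit quantization."}],
    stream=True,
)

# Print tokens as they arrive, same as watching the chat scroll in the CLI.
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```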
I started with a P40, then added a 3090. 48 GB is enough to run 70B models, but it might be time to add a 4090 as well.