Keep us posted!
I'm a bit sceptical that 24GB will be enough, so I might end up buying one more card 🫣
Will do! gpt-oss does well (about 16GB), and Gemma 27B, Qwen, Mistral, and Devstral all run with room to spare. Unless you're running something like 36B+ without quantization, I think you'll be fine! IMO the step up from sub-20B to sub-30B models isn't a massive improvement overall, so depending on your workload I'm not sure going past a mid-30B-param model will be worth the money spent on hardware.
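If it helps with the "will it fit" math, here's the rough back-of-envelope estimate I use: weights at the quant's bits-per-weight, plus a couple of GB for KV cache and runtime overhead. Just a sketch with my own assumed numbers, not measurements, so treat the outputs as ballpark only.

```python
# Rough VRAM estimate for a quantized model.
# bits_per_weight ~4.5 approximates a Q4_K-ish quant; overhead_gb is an
# assumed fudge factor for KV cache / runtime buffers, not an exact figure.
def vram_gb(params_b: float, bits_per_weight: float = 4.5, overhead_gb: float = 2.0) -> float:
    weights_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return weights_gb + overhead_gb

for name, size_b in [("a 20B model", 20), ("a 27B model", 27), ("a 36B model", 36)]:
    print(f"{name}: ~{vram_gb(size_b):.1f} GB at ~Q4")
```

By that math a ~27B model lands around 17GB at Q4, which is why it fits in 24GB with room to spare, while a 36B starts getting tight once you want longer context.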
Not sure if I will really hit some bottlenecks, but things may get ugly with parallel workloads.
I'd like to Uncle Jim my hw.
lmao I see. Yeah, I haven't had the opportunity to try that, mostly because I know I can't squeeze another model into memory, and my CPU/memory are old (Ivy Bridge, DDR3), so running on CPU is painful and power hungry.