Just got an RTX 5080 for this. I was using a 3090 a while back and it wasn't too bad, but even the 5080 is limited with only 16 GB of memory on the card. The 5090 has 32 GB, I believe.
It's very fast with models that fit, though, so it's fine for everyday tasks like queries about language and translation. I'm going to try some more difficult coding-related stuff next. Longer term, finding private, uncensored LLM access that works remotely is also a goal, albeit not one I'm super focused on.
As crazy as it sounds, a high-end MacBook has 128 GB of unified memory and can run 70B models just fine for around $5k, last I checked. They're a little slow, but they work out of the box with Ollama (quick sketch below).
That might be more cost-effective than setting up a GPU cluster of several 5090s just to get the memory capacity up. You may even be able to run Asahi Linux on there and get around macOS if you want, although I'm sure it'll be painful.
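For anyone curious what "out of the box" looks like in practice, here's a minimal Python sketch against Ollama's local HTTP API. It assumes the server is running on its default port (11434) and that you've already pulled a 70B model; the model tag llama3.1:70b is just an example.

```python
import json
import urllib.request

# Minimal sketch: query a local Ollama server over its HTTP API.
# Assumes Ollama is running on the default port (11434) and that a
# 70B model (e.g. llama3.1:70b) has already been pulled.
payload = {
    "model": "llama3.1:70b",       # example tag; swap in whatever you pulled
    "prompt": "Translate 'good morning' into French.",
    "stream": False,                # get one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

For interactive use you can skip all this and just do `ollama run llama3.1:70b` in a terminal; the API is only worth touching if you want to wire the model into your own scripts.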
I think you're right about that; very good point to raise!