I assume the GPUs are needed for performance but not for quality of the results, right? So if for my own private use I don't have any GPU, I could still get the same results but would have to wait a minute instead of a second.


Discussion

With the 405B model? I don't think you can run it with only 64 GB of RAM and no GPU, at least not with acceptable wait times.

There is no way you will ever run the 405B-parameter model on consumer hardware.
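To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes memory use is dominated by the weights (parameter count times bits per weight) and ignores the KV cache, activations, and runtime overhead, so real requirements would be higher. The function name and the list of precisions are just illustrative choices.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 405e9  # the 405B model discussed above
RAM_GB = 64       # the RAM size mentioned above

# Assumed precisions for illustration: full 16-bit weights and two
# common quantization levels.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    need = weight_memory_gb(N_PARAMS, bits)
    verdict = "fits" if need <= RAM_GB else "does not fit"
    print(f"{label}: ~{need:.1f} GB of weights, {verdict} in {RAM_GB} GB")
```

Even at 4 bits per weight the weights alone come to roughly 200 GB under these assumptions, so the problem with 64 GB is capacity first and speed second.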

For all the smaller models, an M3 MacBook Pro with maxed-out RAM works incredibly well. The M3 uses unified memory, so the RAM is shared between the CPU and the GPU, and you can get up to 96 GB of it.

I can run all of those models with Ollama at crazy speed.

I wouldn’t rule out consumer LLM-focused cards with tons of VRAM.

I feel like these cards would always be outdated half a year later as bigger and bigger models come along.

They probably would, but being able to run the most capable models locally every couple of years would still be huge.