I still don't understand what kind of specs I need to run the new Llama 3.1 models. RAM, GPU, disk space, etc. Any good setup guides for AI noobs like me?
Discussion
AI noob checking in. ollama running llama3.1:8b is using 6.5 GB of VRAM on my machine. The weights for 8b are about 4 GB.
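The 4 GB number checks out as a back-of-the-envelope calculation: weight size is roughly parameter count times bits per weight. This is a rough sketch (the exact numbers depend on the quantization format and extra tensors, and the function name here is just made up for illustration):

```python
# Rough estimate of model weight size: params * bits_per_weight / 8 bytes.
# Real quantized files are a bit larger (scales, embeddings, metadata).
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

print(weight_size_gb(8, 4))   # 8B params at 4-bit -> ~4 GB
print(weight_size_gb(8, 16))  # same model unquantized fp16 -> ~16 GB
```

The gap between the ~4 GB of weights and the 6.5 GB of VRAM in use is the KV cache, activations, and runtime overhead.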
What kind of GPU are you running it on? The 8b model doesn't beat ChatGPT, right?
4090. I haven't had much time to compare any models yet, and I don't know how to read those comparison charts. I think larger models can be quantized to fit into less VRAM, but quality suffers as you get down to 4- and 2-bit.
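Using the same params-times-bits arithmetic, you can sanity-check whether a quantized model even fits in a given card's VRAM. A hypothetical helper (the 2 GB overhead allowance for KV cache and activations is my own rough assumption, not an ollama number):

```python
# Hypothetical fit check: weights + a rough overhead allowance vs. VRAM budget.
def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    # overhead_gb is an assumed allowance for KV cache / activations
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb <= vram_gb

print(fits_in_vram(70, 4, 24))  # False: ~35 GB of 4-bit weights won't fit a 24 GB 4090
print(fits_in_vram(70, 2, 24))  # True on paper (~17.5 GB), but 2-bit quality degrades
```

So a 70b model at 4-bit overflows a 4090, which is why the runner ends up spilling into system RAM.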
ollama seems to load as much as it can into VRAM, and the rest into RAM. Llama 3.1 70b runs a lot slower than 8b on a 4090, but it's usable. The ollama library has a bunch of different versions that appear to be quantized: https://ollama.com/library/llama3.1