I still don't understand what kind of specs I need to run the new Llama 3.1 models. RAM, GPU, disk space, etc. Any good setup guides for AI noobs like me?
Discussion
AI noob checking in. ollama running llama3.1:8b is using 6.5 GB of VRAM on my machine. The weights for 8b are about 4 GB.
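The 4 GB number checks out as a back-of-the-envelope calculation: weight size is roughly parameter count times bits per weight. This is a rough sketch (the exact numbers depend on the quantization format and extra tensors, and the function name here is just made up for illustration):

```python
# Rough estimate of model weight size: params * bits_per_weight / 8 bytes.
# Real quantized files are a bit larger (scales, embeddings, metadata).
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

print(weight_size_gb(8, 4))   # 8B params at 4-bit -> ~4 GB
print(weight_size_gb(8, 16))  # same model unquantized fp16 -> ~16 GB
```

The gap between the ~4 GB of weights and the 6.5 GB of VRAM in use is the KV cache, activations, and runtime overhead.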
What kind of GPU are you running it on? The 8b model doesn't beat ChatGPT, right?
4090. I haven't had much time to compare any models yet, and I don't know how to read those comparison charts. I think larger models can be quantized to fit into less VRAM, but quality suffers as you get down to 4- and 2-bit.
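Using the same params-times-bits arithmetic, you can sanity-check whether a quantized model even fits in a given card's VRAM. A hypothetical helper (the 2 GB overhead allowance for KV cache and activations is my own rough assumption, not an ollama number):

```python
# Hypothetical fit check: weights + a rough overhead allowance vs. VRAM budget.
def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    # overhead_gb is an assumed allowance for KV cache / activations
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb <= vram_gb

print(fits_in_vram(70, 4, 24))  # False: ~35 GB of 4-bit weights won't fit a 24 GB 4090
print(fits_in_vram(70, 2, 24))  # True on paper (~17.5 GB), but 2-bit quality degrades
```

So a 70b model at 4-bit overflows a 4090, which is why the runner ends up spilling into system RAM.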
ollama seems to load as much as it can into VRAM, and the rest into RAM. Llama 3.1 70b runs a lot slower than 8b on a 4090, but it's usable. The ollama library has a bunch of different versions that appear to be quantized: https://ollama.com/library/llama3.1