AI noob checking in: Ollama running llama3.1:8b is using 6.5 GB of VRAM, but the weights for the 8B model are only about 4 GB. Where does the rest go?
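For context, the gap between weight size and VRAM usage mostly comes from the KV cache plus runtime overhead. A rough back-of-envelope, assuming Llama 3.1 8B's published architecture (32 layers, 8 KV heads under grouped-query attention, head dim 128) and an fp16 cache at an assumed 8192-token context:

```python
# Rough VRAM estimate for Llama 3.1 8B. Architecture numbers are from
# Meta's published config; the weight size (~4.7 GB for Ollama's default
# q4_K_M quantization) and the context length are assumptions.

layers = 32        # transformer blocks
kv_heads = 8       # grouped-query attention: 8 KV heads (not 32)
head_dim = 128     # per-head dimension
bytes_per = 2      # fp16 cache entries

# The KV cache stores one key and one value vector per layer per token.
kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per  # 131072 B

context = 8192     # assumed context window
kv_cache_gib = kv_per_token * context / 1024**3

weights_gb = 4.7   # approximate q4_K_M file size
print(f"KV cache at {context} tokens: {kv_cache_gib:.2f} GiB")
print(f"~total: {weights_gb + kv_cache_gib:.1f} GB plus CUDA/runtime overhead")
```

That lands right around the observed 6.5 GB once you add a few hundred MB of CUDA context and scratch buffers.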

Discussion

What kind of GPU are you running it on? The 8B model doesn't beat ChatGPT, right?

A 4090. I haven't had much time to compare models yet, and I don't know how to read the benchmark comparison charts. I gather larger models can be quantized to fit into less VRAM, but quality suffers as you drop to 4-bit and 2-bit.
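The quantization trade-off is easy to estimate: weight size is roughly parameter count times bits per weight, divided by 8. A quick sketch (real GGUF files run a bit larger, since embeddings and norm layers usually stay at higher precision):

```python
# Back-of-envelope weight size at different quantization levels.
# This is a rough approximation; actual quantized files vary by format.

def approx_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB for a given bit width."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params in (8, 70):
    for bits in (16, 8, 4, 2):
        print(f"{params}B @ {bits:>2}-bit: ~{approx_size_gb(params, bits):.0f} GB")
```

So an 8B model at 4-bit is about 4 GB, and even a 70B model at 4-bit (~35 GB) won't fit in a 4090's 24 GB without offloading layers to system RAM.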