I think it’s talking about graphics card VRAM. If your graphics card doesn’t have the capabilities and enough VRAM to fit and run the model, it falls back to CPU mode. In that mode, I don’t think it loads the whole model into regular system RAM; instead, it pages the model in from disk. If you watch disk I/O, you’ll probably see it spiking while the model is working on a prompt.
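To make the paging idea concrete, here’s a small toy sketch in Python (not Ollama’s actual implementation, just an illustration): `mmap` lets a process read a big file without loading it all into RAM, faulting pages in from disk only when they’re touched — which is why you see disk I/O spike during inference.

```python
# Toy illustration of on-demand paging: mmap maps a file into the
# address space, and pages are read from disk only when accessed.
import mmap
import os
import tempfile

# Create a stand-in "model file" (4 MB of zeros) to map.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(4 * 1024 * 1024)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Touching a byte faults just that page in from disk;
    # untouched regions never need to occupy physical memory.
    first = mm[0]
    middle = mm[len(mm) // 2]
    mm.close()

os.remove(path)
print(first, middle)
```

A real model runner works at a much larger scale, but the operating-system mechanism (demand paging of a memory-mapped file) is the same basic idea.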
Discussion
A CUDA-enabled card with 12 GB of VRAM is about 283 USD.
nostr:npub1v9qy0ry6uyh36z65pe790qrxfye84ydsgzc877armmwr2l9tpkjsdx9q3h Can you use several GPUs with Ollama?
I’ve heard that NVIDIA cards have some kind of link capability where you can use two together. I don’t know much about it, though — I’m relatively new to this space.
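On the multi-GPU question: one hedged sketch, assuming a CUDA setup, is to control which GPUs the Ollama server can see via the standard `CUDA_VISIBLE_DEVICES` environment variable, which applies to CUDA applications generally. The GPU indices here are an assumption for illustration:

```python
# Hypothetical sketch: pick which GPUs the Ollama server sees before
# launching it. CUDA_VISIBLE_DEVICES is a standard CUDA environment
# variable, not something Ollama-specific.
import os
import subprocess

# "0,1" = expose the first two GPUs; adjust indices for your machine.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0,1")

# Uncomment on a machine that actually has Ollama installed:
# subprocess.run(["ollama", "serve"], env=env)
print(env["CUDA_VISIBLE_DEVICES"])
```

Whether a given model actually splits across both cards depends on the runtime and the model size, so treat this as a starting point rather than a guarantee.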
It’s OK, you don’t have to admit to me that you’re new. I don’t judge or take that into consideration. Thank you for showing me where to look based on what you’ve heard 👍
What kind?