I think it’s talking about graphics card VRAM. If your graphics card doesn’t have the capability and enough VRAM to fit and run the model, it stays in CPU mode. In that mode, I think it doesn’t load the whole model into regular system RAM. Instead, it pages the model in from disk as needed. If you watch disk I/O, you’ll probably see it spike while the model is working on a prompt.
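
A minimal sketch of what "paging the model in from disk" can look like, assuming the runner memory-maps its weights file (an assumption; the thread doesn't name the tool). The file is mapped rather than read, so the OS only pulls in the pages a read actually touches. The 64 MiB stand-in file and the `read_slice_via_mmap` helper are hypothetical, for illustration only.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the whole file without reading it into RAM up front.
    Only the pages covered by the requested slice get faulted in from disk."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]  # copies just this slice


# Hypothetical stand-in for a model file: 64 MiB, mostly empty,
# with a small marker written near the end.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(64 * 1024 * 1024)
    tmp.seek(60 * 1024 * 1024)
    tmp.write(b"weights")
    path = tmp.name

chunk = read_slice_via_mmap(path, 60 * 1024 * 1024, 7)
print(chunk)
os.remove(path)
```

With a real multi-gigabyte model, every prompt touches many pages that may have been evicted, which is why disk I/O would spike during generation.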

Discussion

A CUDA-enabled card with 12 GB is 283 USD.
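
A rough way to check whether a model would fit on a 12 GB card: weight size in GB is roughly parameters (in billions) × bits per weight ÷ 8. The 20% overhead for the KV cache and runtime buffers is an assumed ballpark, not a measured figure, and the model sizes below are just examples.

```python
def estimate_model_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 0.2) -> float:
    """Approximate VRAM need in GB: raw weights plus an assumed
    fractional overhead for KV cache, activations and runtime buffers."""
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * bytes / 1e9
    return weight_gb * (1 + overhead)


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead: float = 0.2) -> bool:
    return estimate_model_gb(params_billion, bits_per_weight, overhead) <= vram_gb


# Example sizes checked against a 12 GB card.
for name, params, bits in [("7B @ 4-bit", 7, 4), ("13B @ 4-bit", 13, 4), ("13B @ 16-bit", 13, 16)]:
    need = estimate_model_gb(params, bits)
    print(f"{name}: ~{need:.1f} GB, fits in 12 GB: {fits_in_vram(params, bits, 12)}")
```

Anything that doesn't fit would fall back to the CPU path described above.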

I’ve heard that NVIDIA cards have some kind of link capability (NVLink) that lets you use two together. I don’t know much about it; I’m relatively new to this space.

It’s OK, you don’t have to tell me you’re new. I don’t judge or take that into consideration. Thank you for showing me where to look based on what you’ve heard 👍

What kind?