Unified memory is amazing, but it's more expensive per GB and slower at inference than a box full of used 3090s.
It's been a long time since local metal mattered.
There's a convenience and power factor though. A box of GPUs is annoying.
Agreed, and I run both for different things. I'm not personally there yet, but no one expects to run a 405B model on unified memory.
Why not, assuming GPU cores keep increasing?
Also, from what I understand it's mostly memory-bandwidth constrained? So it's not all about the cores.
Right. Nvidia is ahead because they have higher memory bandwidth as well as more compute. Apple is winning in the efficiency corner, which is how we get 4B models on phones (Apple Intelligence, but also LLM Farm). All of it is pretty fantastic, and I'm going to surf this wave as long as I can.
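That lines up with the back-of-envelope math for single-stream decoding, where every weight has to be streamed from memory for each generated token. A rough sketch (illustrative, assumed specs, and a hypothetical dense 70B model quantized to 4 bits; real throughput varies with quantization format, caching, and batching):

```python
# Back-of-envelope: batch-1 decode speed is roughly memory bandwidth divided by
# the bytes of weights streamed per generated token.
# Illustrative, assumed numbers only (not measured): a dense 70B model at 4-bit.
weights_gb = 70e9 * 0.5 / 1e9  # ~35 GB read per token

for device, bw_gb_s in [("RTX 3090 (~936 GB/s)", 936), ("M3 Max (~400 GB/s)", 400)]:
    print(f"{device}: ~{bw_gb_s / weights_gb:.0f} tokens/s upper bound")

# Roughly 27 vs 11 tokens/s: the weights can't be read any faster, no matter
# how many extra cores you add.
```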
Oh, I didn't actually answer your question: laptop memory will always be limited because DRAM needs constant refreshing. Every GB installed reduces battery life.
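For scale, here's a quick sketch of the raw weight footprint of a 405B model (illustrative only, assuming dense weights and ignoring KV cache and runtime overhead):

```python
# Raw weight footprint of a 405B-parameter dense model at common precisions.
# Illustrative back-of-envelope only; ignores KV cache and runtime overhead.
PARAMS = 405e9

for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{label}: ~{gib:,.0f} GiB of weights")

# ~754 / ~377 / ~189 GiB respectively -- all beyond what any laptop's unified
# memory offers today, before a single token of context is cached.
```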