Agreed, and I run both for different things. I'm not personally there yet, but no one expects to run a 405B model on unified memory.
Discussion
Why not, assuming GPU cores keep increasing?
Also, from what I understand it's mostly memory-bandwidth constrained? So it's not all about cores.
Right. Nvidia is ahead because they have higher memory bandwidth as well as more compute. Apple is winning in the efficiency corner, which is how we get 4B models on phones (Apple Intelligence, but also LLM Farm). All of it is pretty fantastic, and I'm going to surf this wave as long as I can.
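To put rough numbers on the bandwidth point, here's a back-of-the-envelope sketch. It assumes a dense model where every weight gets read from memory once per generated token, so tokens/sec can't exceed bandwidth divided by model size. The function name, the 4-bit (0.5 bytes per parameter) quantization, and the bandwidth figures are my own illustrative assumptions, not measurements, and it ignores the KV cache, compute limits, and batching.

```python
def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on single-stream decode speed.

    Assumes a dense model whose weights are all read once per token,
    so speed is limited by memory bandwidth rather than by core count.
    """
    # 1e9 params * bytes per param / 1e9 bytes per GB = size in GB
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Illustrative (assumed) bandwidth figures in GB/s, not benchmarks.
    for label, bandwidth in [("phone-class", 100), ("high-end unified memory", 800)]:
        for params in (4, 70, 405):
            tps = decode_tokens_per_second(params, 0.5, bandwidth)
            print(f"{label:>24} | {params:>3}B @ 4-bit | <= {tps:.1f} tok/s")
```

Under those assumptions a 405B model at 4-bit is about 200 GB of weights, so even at 800 GB/s the ceiling is only a few tokens per second, while a 4B model stays comfortable at phone-class bandwidth. Adding cores doesn't move that ceiling.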
Oh, I didn't actually answer your question: laptop memory will always be limited because DRAM needs constant refreshing. Every GB installed reduces battery life.