As crazy as it sounds, a high-end MacBook Pro can be configured with 128 GB of unified memory, and you can run 70B models on it just fine for around $5k last I checked. They’re a little slow, but they’ll work out of the box with Ollama.
That might be more cost-effective than setting up a cluster of several 5090s just to get the memory capacity up. You may even be able to run Asahi Linux on it to get around macOS if you want, although I’m sure it’ll be painful.
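
For a rough sense of why memory capacity is the deciding factor, here is some back-of-the-envelope math as a quick Python sketch. It only counts the model weights (no KV cache or runtime overhead, so real usage is higher), it assumes 32 GB of VRAM per 5090, and the function names are just ones I made up for illustration.

```python
import math

def weights_gb(params_billions, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

def gpus_needed(size_gb, vram_gb=32):
    """Minimum number of cards whose combined VRAM holds the weights.

    vram_gb=32 is an assumption (one RTX 5090 per card).
    """
    return math.ceil(size_gb / vram_gb)

UNIFIED_MEMORY_GB = 128  # the MacBook configuration discussed above

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weights_gb(70, bits)
    fits = "yes" if size < UNIFIED_MEMORY_GB else "no"
    print(f"70B @ {label}: ~{size:.0f} GB of weights, "
          f"fits in 128 GB: {fits}, 5090s needed: {gpus_needed(size)}")
```

So a quantized 70B model fits in unified memory with room to spare, while the same weights need two or three 5090s. Keep in mind that not all of the 128 GB is available to the model in practice, so treat these numbers as a lower bound on what you need.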