Folks are allegedly running Llama 4 at 25+ tokens/sec on a single 3090 plus a pile of system RAM
Literally a ~$2.5k setup to run a top model at acceptable speeds
Yeah, L4 was kind of a flop, but MoE looks like the future and it's the most accessible architecture for self-hosting right now: only a fraction of the parameters are active per token, so the bulk of the weights can sit in cheap system RAM instead of VRAM
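For anyone curious what that setup looks like in practice, here's a minimal sketch using llama-cpp-python with partial GPU offload. The GGUF filename, layer count, and thread count are placeholders, not a tested config; the idea is just that `n_gpu_layers` puts whatever fits into the 3090's 24 GB of VRAM and the rest of the weights stay in system RAM and run on CPU.

```python
# Rough sketch with llama-cpp-python (needs a CUDA-enabled build of the package).
# Model path, layer split, and thread count are hypothetical -- tune to your
# actual GGUF quant and hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-4-Scout-Instruct-Q4_K_M.gguf",  # hypothetical GGUF quant
    n_gpu_layers=24,   # offload as many layers as fit in 24 GB of VRAM
    n_ctx=8192,        # context window; larger contexts eat more VRAM/RAM
    n_threads=16,      # CPU threads for the layers left in system RAM
)

out = llm("Explain mixture-of-experts in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```

Actual tokens/sec will depend heavily on the quant, how many layers end up on the GPU, and RAM bandwidth, so treat the 25+ tok/s number as a best-case report rather than a guarantee.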