Folks are allegedly running Llama 4 and getting 25+ tokens/sec on a 3090 plus shitloads of RAM

Literally a $2.5k setup to run a top model at acceptable speeds

Yeah, Llama 4 was kind of a flop, but MoE looks like the future and is the most accessible architecture for self-hosting right now
