Kimi K2 is an open-source LLM that requires $1 million to self-host.

nostr:nevent1qvzqqqqqqypzqprpljlvcnpnw3pejvkkhrc3y6wvmd7vjuad0fg2ud3dky66gaxaqydhwumn8ghj7emvv4shxmmwv96x7u3wv3jhvtmjv4kxz7gqyrdkr6a0cd2ekgdmqk4rtlhnzeqs9r7787enprtmdrvwree2tvyq522ygvc

Discussion

You need to qualify that with a token speed. As soon as the Ollama fixes are in, I'm going to run it on $4k of chrome (1TB DDR4 / 128 cores). Probably Q4_K_XL, but maybe Q8_K_XL... just to find out.
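That token-speed caveat is easy to measure, since Ollama's generate endpoint reports token counts and timings in its non-streaming response. A minimal sketch, assuming Ollama is serving on its default local port; the model tag "kimi-k2" is a placeholder for whatever tag your pulled build actually exposes:

```python
# Minimal sketch: measure generation speed from Ollama's local HTTP API.
# Assumes Ollama is running on its default port (11434); "kimi-k2" is a
# placeholder model tag, not a confirmed registry name.
import json
import urllib.request

payload = json.dumps({
    "model": "kimi-k2",  # placeholder tag
    "prompt": "Explain MoE routing in two sentences.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Ollama reports eval_count (generated tokens) and eval_duration (ns),
# which is enough to put a tokens/sec figure next to the dollar figure.
tok_per_sec = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"{result['eval_count']} tokens at {tok_per_sec:.1f} tok/s")
```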

You should also be able to use two $10k Nvidia rigs, or one $10k Mac Studio.

Fucking G-Nasa uses Ollama and not any of the real engines

Ollama wraps llama.cpp, which is fantastic for single-node inference. If you have a cluster or a specific arrangement that aligns with one of the other frameworks, you might do better, but if you just want to run the latest models, it's the place to be.
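For a sense of what "wraps llama.cpp" means in practice, the same GGUF file Ollama manages can be loaded through llama.cpp's Python bindings directly. A sketch assuming llama-cpp-python is installed; the model path is a placeholder for whichever quant you actually download:

```python
# Minimal sketch of driving llama.cpp directly via the llama-cpp-python
# bindings (pip install llama-cpp-python). The GGUF path below is a
# placeholder -- point it at the Kimi K2 quant you actually have.
from llama_cpp import Llama

llm = Llama(
    model_path="./kimi-k2-Q4_K_XL.gguf",  # placeholder filename
    n_ctx=4096,      # context window
    n_threads=16,    # CPU threads; tune to your core count
    n_gpu_layers=0,  # 0 = pure CPU; raise to offload layers to a GPU
)

out = llm("Explain why MoE models quantize well.", max_tokens=128)
print(out["choices"][0]["text"])
```

The n_gpu_layers knob is the single-node tradeoff in miniature: leave it at 0 on a big-RAM CPU box, or raise it to push as many layers as fit onto whatever GPU you have.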