
Discussion
Fucking G-Nasa uses Ollama and not any of the real engines
Ollama wraps llama.cpp, which is fantastic for single-node inference. If you have a cluster or a specific setup that aligns with one of the other frameworks, you might do better with it, but if you just want to run the latest models, Ollama is the place to be.