DeepSeek R1 is quite amazing, though after exceeding its context window it seems to leak some Chinese into responses.
I don't have enough VRAM to run anything bigger than the 14B variant.
Let me know if you want to try it for free.
How do you do that? What resources do you need?
I have an AMD Radeon 6700 XT with 12GB of VRAM running ollama + OpenWebUI. ollama supports most somewhat-open LLMs and even runs on Android, and you can feed it many models from Huggingface, especially the uncensored ones. OpenWebUI is one of many ChatGPT-like frontends, with features such as web search and querying your own documents. If you're curious about trying it, DM me an email address and I'll create a user for you.
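You don't even need the web UI to poke at it: ollama exposes a small REST API on port 11434 that you can script against. A minimal Python sketch, assuming a local instance on the default port and that the model has already been pulled (the model name is just an example):

```python
# Minimal sketch: query a local ollama instance over its REST API.
# Assumes ollama is running on the default port (11434) and that the
# model below has already been pulled; the name is just an example.
import json
import urllib.request

payload = json.dumps({
    "model": "deepseek-r1:14b",  # example; use whatever `ollama pull` fetched
    "prompt": "Why is the sky blue?",
    "stream": False,             # one JSON object instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```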
Unfortunately, switching models means dropping another one from memory, and pulling a 12GB model back in takes a few seconds, so multi-user setups either require expensive datacenter GPUs or many smaller instances, each with multiple top-of-the-line GeForces.
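The load/unload dance itself is scriptable: ollama's documented keep_alive parameter controls how long a model stays resident in VRAM after a request (default is 5 minutes), and an empty prompt loads or unloads a model without generating anything. A rough sketch, again assuming a local instance on the default port, with example model names:

```python
# Rough sketch of scripting the swap via ollama's keep_alive parameter.
# An empty prompt loads/unloads a model without generating a response.
# Assumes a local instance on the default port; model names are examples.
import json
import urllib.request

def ollama(body: dict) -> dict:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Pin one model in VRAM indefinitely (keep_alive=-1 disables the idle timeout).
ollama({"model": "deepseek-r1:14b", "prompt": "", "keep_alive": -1})

# Evict another immediately (keep_alive=0) to free VRAM for the first.
ollama({"model": "qwen2.5:7b", "prompt": "", "keep_alive": 0})
```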
So it's more of a toy for now.
And unfortunately, in the 4B-14B range (some models go beyond 400B parameters) there is no one-size-fits-all model for different tasks, so just removing all but one model and keeping it in memory permanently isn't an option.
But it's fun to hack around with, compare results across models, and learn a lot about AI/ML.