How do you do that? What resources do you need?

Discussion

I have an AMD Radeon 6700 XT with 12GB VRAM running ollama + Open WebUI. ollama supports most somewhat-open LLMs; it even runs on Android. You can feed it many models from Hugging Face, especially the uncensored ones. Open WebUI is one of many frontends similar to ChatGPT, with features like web search and using your own documents as context. If you're curious about trying it, DM me an email address and I'll create a user for you.
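
You can also poke at it from code: ollama exposes an HTTP API on localhost. A minimal sketch, assuming ollama is running on its default port 11434 and you've already pulled a model (the model name here is just an example):

```python
import requests

# Query a locally running ollama instance over its HTTP API.
# Assumes `ollama serve` is listening on the default port 11434
# and the model was pulled beforehand (e.g. `ollama pull llama3`).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",      # example name, use whatever you pulled
        "prompt": "Why is the sky blue?",
        "stream": False,        # one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```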

Unfortunately, switching models means dropping another one from memory, and pulling a 12GB model back in takes a few seconds, so multi-user setups require either expensive datacenter GPUs or many smaller instances with multiple top-of-the-line GeForces.

So it's more of a toy for now.

Unfortunately, at sizes between 4B and 14B parameters (some models go beyond 400B), there is no one-size-fits-all model across tasks, so just removing all but one model and keeping it in memory all the time isn't an option.
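
For what it's worth, ollama's API docs describe a keep_alive parameter for pinning a model in memory, and an /api/ps endpoint (the HTTP counterpart of `ollama ps`) for seeing what's resident. A sketch under the same assumptions as above, though with several models and 12GB of VRAM something still has to get evicted:

```python
import requests

# Load a model and pin it in VRAM indefinitely with keep_alive=-1
# (a duration string like "30m" also works; the default is ~5 minutes).
# A generate request without a prompt just loads the model.
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "keep_alive": -1},
    timeout=300,
)

# Show which models are currently resident, like `ollama ps` does.
loaded = requests.get("http://localhost:11434/api/ps", timeout=10)
for m in loaded.json().get("models", []):
    print(m["name"], m.get("size_vram"))
```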

But it's fun to hack around with, compare results across models, and learn a lot about AI/ML.
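
The comparison part is easy to script, too. A hypothetical loop that sends the same prompt to a few pulled models and prints the answers one after another (the model names are placeholders for whatever `ollama list` shows on your box):

```python
import requests

PROMPT = "Explain the difference between quantization and distillation."

# Example model names; substitute whatever you have installed.
for model in ("llama3", "mistral", "gemma2"):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    print(f"=== {model} ===")
    print(resp.json()["response"])
```

On a single 12GB card each iteration evicts the previous model, so you pay the reload cost mentioned above on every switch.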

Great! I had an old PC running and tried a few models in Open WebUI. It's super slow with an i5 processor and 16 GB of RAM, but I'd be interested in testing. Sending you a DM now.

Check DMs. :)