Replying to Ancap Revolt

My article explains how to install Ollama and Open WebUI through Docker. You then need to give the assistant web search capability and feed it relevant docs.
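For reference, a minimal sketch of that Docker setup (image names, volumes, and ports are the upstream defaults; adjust the GPU flag for your hardware):

```sh
# Ollama with GPU passthrough (needs the NVIDIA Container Toolkit)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama

# Open WebUI, talking to the Ollama container via the host gateway
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

Open WebUI is then reachable at http://localhost:3000.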

I'm going to start researching Docker and SearXNG so I can write more guides and maybe eventually develop an open-source app.

Most tutorials online are extremely insecure.

When you’re running a model, run `ollama ps` (or `docker exec ollama ollama ps` if Ollama is in a container) to see how much GPU/CPU it’s using. Models that fit entirely in VRAM run at 40+ tokens per second; models that offload to CPU/RAM are *much* slower, around 8-20 tokens per second. You want `ollama ps` to show the model as 100% loaded on GPU.
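For example (the output below is illustrative; the column layout matches current Ollama releases, but the model name, ID, and size will depend on what you’ve loaded):

```sh
$ docker exec ollama ollama ps
NAME           ID              SIZE     PROCESSOR    UNTIL
gemma3:12b     f4031aab637d    10 GB    100% GPU     4 minutes from now
```

If the PROCESSOR column shows a CPU/GPU split (e.g. "28%/72% CPU/GPU"), the model is spilling into system RAM and you should expect the slower speeds above.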

But I haven’t messed much with AI code. I assume Qwen3, Gemma 3, and GPT-oss 20b are all good. GPT-oss 20b is a mixture-of-experts model, meaning only about 3.6b of its parameters are active per token, and it takes around 14 GB of RAM. You can probably run it on CPU; it’s extremely good. You also need RAG.
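A sketch of what that looks like in practice (the model tags and the `/api/embeddings` endpoint are Ollama's published ones; the choice of `nomic-embed-text` as the embedding model is just an example):

```sh
# Pull and try the MoE model
ollama pull gpt-oss:20b
ollama run gpt-oss:20b "Explain what a mixture-of-experts model is."

# RAG building block: embed a document chunk through Ollama's API,
# store the vector, and retrieve by similarity at query time
ollama pull nomic-embed-text
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Your document chunk here"}'
```

Note that Open WebUI ships with built-in document RAG (upload docs and reference them in chat), so you may not need to roll your own pipeline.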

Ancap Revolt 1mo ago

But yeah, this whole project is focused on making an assistant that’s as close to Gemini 3 Pro in helpfulness as possible, if not better.
