ollama with qwq or gemma3

Discussion

on mac?

yes

Just had a quick look into this and it seems possible to do it for free, i.e. run open-source models like Llama 2, Mistral, or Phi-2 locally on the Mac using Ollama.

No internet, no API keys, no limits, and Apple Silicon runs them well.
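
For anyone who wants to try it, here's a minimal sketch of one chat turn against a local Ollama server from Python (assumes Ollama is running on its default port 11434 and a model such as mistral has already been pulled; swap in whatever model you have):

    # Minimal sketch: one chat turn against a local Ollama server.
    # Assumes `ollama pull mistral` (or similar) has already been run.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "mistral",  # any locally available model tag
            "messages": [
                {"role": "user", "content": "Summarize what Ollama does in one sentence."}
            ],
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])

Everything stays on the machine: no keys, no external calls.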

you can even use dave with a setup like this: a fully private, local ai assistant that can find and summarize notes for you

nostr:note17c3zxygr2krkt90lyrvh5rxtfmnstkcpkjyxmkz5z3tleagc848qlfyu9m

Cool. I’m still learning. So much to play with!

Would 48 GB be sufficient?

Tried Qwen3 yet?

I just run qwen3:30b-a3b with a 64k context (tweaked in the Modelfile) and it can do things 🤙. Uses 43 GB
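
If you want to replicate that, the context bump goes in a custom Modelfile; a minimal sketch, assuming the qwen3:30b-a3b tag above (the qwen3-64k name is just a placeholder):

    # Modelfile: same base model, context window raised to 64k tokens
    FROM qwen3:30b-a3b
    PARAMETER num_ctx 65536

Build and load it with ollama create qwen3-64k -f Modelfile, then ollama run qwen3-64k. Note that a bigger context window reserves more memory on top of the model weights.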

How much video RAM is needed to run a version of these models that's actually smart, though? I tried the DeepSeek model that fits within 8 GB of video RAM, and it was basically unusable.

I wonder what I am doing wrong. Was so excited to get this set up, but I've been at it all day and keep running into hiccups. Here's my ChatGPT-assisted question:

I tried setting up Goose with Ollama using both qwq and gemma3, but I'm running into consistent errors in Goose:

error decoding response body

init chat completion request with tool did not succeed

I pulled and ran both models successfully via Ollama (>>> prompt showed), and pointed Goose to http://localhost:11434 with the correct model name. But neither model seems to respond in a way Goose expects — likely because they aren’t chat-formatted (Goose appears to be calling /v1/chat/completions).

nostr:nprofile1qqsgydql3q4ka27d9wnlrmus4tvkrnc8ftc4h8h5fgyln54gl0a7dgspp4mhxue69uhkummn9ekx7mqpxdmhxue69uhkuamr9ec8y6tdv9kzumn9wshkz7tkdfkx26tvd4urqctvxa4ryur3wsergut9vsch5dmp8pese6nj96 Are you using a custom Goose fork, adapter, or modified Ollama template to make these models chat-compatible?
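
For what it's worth, here's a minimal sketch to reproduce the failure outside Goose by sending a tool-augmented request straight to Ollama's OpenAI-compatible endpoint (assumes the openai Python package; the get_time tool is just a placeholder to exercise tool calling):

    # Minimal sketch: send a chat request with a tool definition directly to
    # Ollama's OpenAI-compatible endpoint, bypassing Goose entirely.
    from openai import OpenAI

    # Ollama ignores the API key, but the client requires a non-empty string.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_time",  # hypothetical tool, only here to test tool calling
            "description": "Return the current time",
            "parameters": {"type": "object", "properties": {}},
        },
    }]

    resp = client.chat.completions.create(
        model="qwq",  # repeat with "gemma3" to compare
        messages=[{"role": "user", "content": "What time is it?"}],
        tools=tools,
    )
    print(resp.choices[0].message)

If this call fails the same way, the problem is likely the model or its Ollama chat template rather than Goose; models whose templates don't support tool calls will reject requests that include a tools array.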