every day with Goose using a local LLM
Discussion
What are you working on currently, Jack?
what's your local LLM setup?
ollama with qwq or gemma3
on mac?
yes
Just had a quick look into this and it seems possible to do it for free, i.e. run open-source models like Llama 2, Mistral, or Phi-2 locally on the Mac using Ollama.
No internet, no API keys, no limits and Apple Silicon runs them well.
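For anyone who wants to try it, here's a minimal sketch of talking to a locally pulled model from Python via the ollama client (pip install ollama); the model name is just an example and assumes you've already pulled it with Ollama.

```python
# Minimal sketch: chat with a locally pulled model through the Ollama server
# running on its default port (11434). No internet connection or API key needed.
import ollama

response = ollama.chat(
    model="mistral",  # any model tag you have pulled locally
    messages=[{"role": "user", "content": "Give me one reason to run models locally."}],
)
print(response["message"]["content"])
```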
Would 48 GB be sufficient?
Tried Qwen3 yet?
I just run qwen3:30b-a3b with a 64k context (tweaked in the Modelfile) and it can do things 🤙. Uses 43 GB.
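For reference, the Modelfile tweak presumably looks something like this; the 65536 value and the custom tag name below are assumptions based on the 64k figure above.

```
# Modelfile: extend the context window to roughly 64k tokens
FROM qwen3:30b-a3b
PARAMETER num_ctx 65536
```

Then build the custom tag with `ollama create qwen3-64k -f Modelfile` and point Goose (or `ollama run`) at qwen3-64k.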
How much video RAM is needed to run a version of these models that is actually smart, though? I tried the DeepSeek model that fits within 8 GB of video RAM, and it was basically unusable.
I wonder what I am doing wrong. Was so excited to get this set up, but I've been at it all day and keep running into hiccups. Here's my ChatGPT-assisted question:
I tried setting up Goose with Ollama using both qwq and gemma3, but I'm running into consistent errors in Goose:
error decoding response body
init chat completion request with tool did not succeed
I pulled and ran both models successfully via Ollama (the >>> prompt showed), and pointed Goose at http://localhost:11434 with the correct model name. But neither model seems to respond in a way Goose expects, likely because they aren't chat-formatted (Goose appears to be calling /v1/chat/completions). A quick check against that endpoint is sketched below.
nostr:nprofile1qqsgydql3q4ka27d9wnlrmus4tvkrnc8ftc4h8h5fgyln54gl0a7dgspp4mhxue69uhkummn9ekx7mqpxdmhxue69uhkuamr9ec8y6tdv9kzumn9wshkz7tkdfkx26tvd4urqctvxa4ryur3wsergut9vsch5dmp8pese6nj96 Are you using a custom Goose fork, adapter, or modified Ollama template to make these models chat-compatible?
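For anyone debugging the same thing, here's a minimal sketch (not Goose's actual request) that POSTs a chat-formatted request with a tool definition to Ollama's OpenAI-compatible endpoint; the model name and the get_weather tool are placeholders.

```python
# Sanity check: does the local model accept a chat request that includes tools?
# If this errors out, Goose's tool-calling requests likely will too.
import json
import requests

payload = {
    "model": "qwq",  # placeholder: use the model name you pointed Goose at
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # made-up tool, just to exercise tool calling
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

resp = requests.post("http://localhost:11434/v1/chat/completions", json=payload, timeout=120)
print(resp.status_code)
print(json.dumps(resp.json(), indent=2))
```

If this comes back with an error saying the model doesn't support tools, the problem is the model or its template rather than the Goose configuration.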
what the hell is goose? someone please fill me in
Which model? Been testing for 14 months now, but Claude sets the bar high
Oh, you answered a reply. Primal doesn't show replies to replies by default? 🤦♂️
The next step is for LLMs to know each other's strengths and weaknesses, so we can get them to select the best LLM for a particular task.
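A toy sketch of what that routing could look like, with a hard-coded capability map and simple keyword matching; the model names and strengths here are illustrative guesses, not benchmark results.

```python
# Hypothetical model router: pick a model whose listed strengths appear in the task text.
MODEL_STRENGTHS = {
    "qwen3:30b-a3b": ["coding", "tool use", "long context"],
    "gemma3": ["general chat", "summarization"],
    "qwq": ["math", "step-by-step reasoning"],
}

def pick_model(task: str) -> str:
    """Return the first model whose strengths overlap the task description."""
    task_lower = task.lower()
    for model, strengths in MODEL_STRENGTHS.items():
        if any(strength in task_lower for strength in strengths):
            return model
    return "gemma3"  # arbitrary fallback

print(pick_model("Need step-by-step reasoning for a math proof"))  # -> qwq
```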