Note that to run local models you need a computer with a very powerful graphics card, and even then you won't get close to the performance of models like Claude or even GLM. To get something remotely close to GLM you'd need around $1M worth of graphics cards, not even counting electricity.
For fully local functionality, you need to run something like ollama on your device. You can then add localhost:11434/v1 as a custom provider in Shakespeare, and everything will run 100% on-device. In your current setup, Shakespeare is running on your device but the AI provider isn't. https://soapbox.pub/blog/shakespeare-local-ai-model
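If you want to sanity-check that endpoint before pointing Shakespeare at it, something like this works against Ollama's OpenAI-compatible API at localhost:11434/v1 (a minimal sketch; the "llama3.2" model name is just a placeholder for whatever model you've actually pulled):

```typescript
// Query a local Ollama instance through its OpenAI-compatible endpoint --
// the same URL you would add as a custom provider in Shakespeare.
// Assumes Ollama is running and a model has been pulled locally.
const response = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2", // placeholder: use whatever model you've pulled
    messages: [{ role: "user", content: "Say hello from a local model." }],
  }),
});

const data = await response.json();
// The reply comes back in the usual OpenAI-style response shape.
console.log(data.choices[0].message.content);
```

If that prints a reply, the same URL should work as a custom provider in Shakespeare.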
Discussion
I thought that was just to train them. That's crazy it still takes that much to run an LLM.
I thought that Shakespeare was its own SLM built for coding nostr clients.
That figure is to train them. To max out LLM performance for a single user running today's existing models, you need more like a few thousand dollars of graphics cards. And we're starting to get pretty useful stuff in the 4-32GB range (typical consumer devices)
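To put that 4-32GB range in perspective, here's a rough back-of-envelope estimate of how much memory a quantized model needs. This is only a sketch: real usage also depends on context length, runtime, and quantization format.

```typescript
// Rough rule of thumb: weight memory ≈ parameters * (bits per weight) / 8,
// plus some overhead for the KV cache and runtime. This is an approximation.
function estimateVramGB(paramsBillions: number, bitsPerWeight: number): number {
  const weightsGB = (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9;
  const overheadGB = weightsGB * 0.2; // crude allowance for KV cache, buffers
  return weightsGB + overheadGB;
}

// A 7B model at 4-bit fits comfortably on a typical consumer GPU or laptop,
// while a 70B model at 4-bit already wants a workstation-class card.
console.log(estimateVramGB(7, 4).toFixed(1));  // ~4.2 GB
console.log(estimateVramGB(70, 4).toFixed(1)); // ~42.0 GB
```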
