Replying to ChipTuner

I keep seeing people #vibing projects talking about running out of tokens, and then I realized some of my friends have just been building their apps through the online chat interfaces, with Anthropic or OpenAI directly.

I have some suggestions. Not everyone likes the pricing or bleeding-edge quality of GitHub Copilot, but it's $100/year for "unlimited" usage, plus access to around 10 models in chat, with about 5-6 models available in agent mode, where it basically writes code in your IDE for you. I've hit some rate limits when letting 3 Claude 4 agents cook in the background, just rapidly clicking continue while testing this. That was only for a single model; I was able to switch to Gemini and continue for a while longer. In my regular daily usage, I never run into limits.

I've heard similar results with Cursor, which seems to be more on the bleeding edge, but you have to use their IDE, which is not for me. I also have no idea how their payment system is structured.

If you want to keep using your token-based LLMs, you can use Continue.dev or Cline in your IDE, linked up to most of the big providers you already pay for. VS Code is the best supported.

Finally, you can use Cline or Continue with your own LLM server like Ollama, or Ollama + Open WebUI, and they even recommend models to use, but you'll need some serious hardware to get anywhere near the quality of the paid LLMs. My hardware is a little too dated to really use it full time. I love the privacy, but it's not practical for me yet. Once LLM prices go up compared to hardware, I may invest. These two also have far more tuning options than Copilot or others IMO, they just lack the intelligence.
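For anyone wondering what the hookup looks like, here's a minimal sketch of pointing Continue at a local Ollama server via its config file. The model name is just an example, and the exact config format varies by Continue version (newer releases use a YAML config), so treat this as a starting point rather than gospel:

```json
{
  "models": [
    {
      "title": "Local Llama 3 (Ollama)",
      "provider": "ollama",
      "model": "llama3:8b"
    }
  ]
}
```

This goes in `~/.continue/config.json`; Continue then talks to Ollama's default endpoint on localhost without any API key.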

Ollama is pretty neat even on my ThinkPad laptop. Too slow for anything serious, but I can see how it would be quite nice on a big rig.


Discussion

It's very usable for chat on a single Titan X (Maxwell), but it can't keep up with the rate I work at, and the context windows don't seem to be large enough for my projects.

I use the crap out of it for other tasks. I'm currently playing with instruct models to write changelogs for me automatically in my CI system.
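The changelog idea can be sketched in a couple of lines of shell, assuming a local Ollama install with some instruct model pulled (the model name and tag range here are made up, adapt them to your repo and CI runner):

```shell
#!/bin/sh
# Sketch: draft a changelog section from commits since the last tag.
# Assumes `ollama` is installed on the CI runner and that an instruct
# model (here the hypothetical `llama3:instruct`) has been pulled.
LAST_TAG=$(git describe --tags --abbrev=0)
ollama run llama3:instruct \
  "Write a markdown changelog section summarizing these commits:
$(git log "${LAST_TAG}..HEAD" --oneline)" > CHANGELOG.draft.md
```

You'd still want a human to review the draft before it lands in the real changelog.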

So far I've only used it heavily for writing research papers and stuff like that. I like that you can swap models and stuff though. Might build a serious rig this year to see what all is out there dev-wise.

Maybe I'll start another shitcoin for access to my compute 😂

XD

I think that's on the list of all of us this year :)

The hardest thing for me is that I'm a mobile maxi. It gets tough to stay locked to a location, which is why I've been rocking a laptop with peripherals for so many years now. I suppose I could probably access it remotely, though... Lots to think about. Maybe a server rack is the way for me to go.

I hear you. My paranoia prevents me from having any persistent data on any of my client/workstation devices. Ollama + Open WebUI is the easiest way to do this IMO. You just get a really nice web UI similar to chatgpt.com with way more features. It also ships with an authenticated OpenAI-compatible API you can hit from curl or other CLI clients if you want.
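As a sketch of what that API call looks like, assuming a default local Open WebUI install on port 3000 and an API key generated in its settings (the port, key variable, and model name are all assumptions about your setup):

```shell
# Sketch: chat completion against a local Open WebUI instance.
# $OWUI_KEY is an API key from Open WebUI's settings page;
# the port and model name depend on your install.
curl -s http://localhost:3000/api/chat/completions \
  -H "Authorization: Bearer $OWUI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello from curl"}]
      }'
```

Since the endpoint speaks the OpenAI chat-completions shape, most OpenAI-compatible CLI clients can point at it too.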

What sort of persistent data?

*minimal persistent data. Files, programs, whatever sits on a disk. It all has to live on a server, accessible either through a web browser or a network share.