No anti-corruption layer? Vendor-specific request and response mappings?

Did they really all implement the same schemas? Even so, it's only a matter of time before one of them breaks something.

This is truly minimum vibeable product xD

Discussion

And most importantly, I don’t want to sign up to Gemini, Grok, OpenAI, Anthropic, and a whole bunch of other smaller model providers.

https://search.brave.com/search?q=VPS+services+with+GPU&source=desktop&conversation=caf1368fb68611bfb030a1&summary=1

There is a growing number of VPS services appearing that offer servers with GPUs. A 24 GB 4090 can handle several requests per minute easily.

Not saying it isn't a lot more expensive than CPU-only VPS services, but you can opt out of the big providers and still provide service. I have been using a Qwen model lately for some of my work, and when JetBrains finally caves and allows Junie to use on-site GPU-powered LLMs, I won't be using cloud LLMs at all anymore.

The utility you get from LLMs for coding is also quite overstated. They are good at some things, for others their output needs to be checked carefully, and at still others they are useless.

He could add support for a local LLM, maybe using Ollama

or LM Studio

I may have read the docs wrong, but it doesn't seem like Ollama is usable without reasonable bash/Unix skills. LM Studio is simple, has a GUI, exposes a server endpoint, and once you find good models it just works.
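
Worth noting: both LM Studio and Ollama expose an OpenAI-compatible HTTP API, so pointing a client at either (or at a cloud provider) is mostly a matter of swapping the base URL. A minimal sketch, assuming LM Studio's local server on its default port 1234 with a Qwen model already loaded (the port and model name are just examples from my setup):

```python
import requests

# OpenAI-compatible chat endpoint. LM Studio's local server defaults to
# port 1234; Ollama serves a similar API on port 11434. Adjust as needed.
BASE_URL = "http://localhost:1234/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "qwen2.5-14b-instruct",  # whatever model is loaded locally
        "messages": [
            {"role": "user", "content": "Write a one-line docstring for a merge sort."}
        ],
        "temperature": 0.2,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```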

For certain tasks the local/cloud LLM gap is still large.

Considering the costs of maintaining my own server if I only use it 1% of the time, plus the value of my time, I do not care enough to run many things locally.

I run my AI assistant in JetBrains using LM Studio and a Qwen 14B model; it outputs as well as any of the cloud models do, and just as fast.

For anyone who is both a programmer and a gamer, the GPU sits idle while you are working, so there's a lot of sense in deploying an LLM on it. 16 GB cards are sufficient to run 14B models, and that's enough for most uses.
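
Rough back-of-the-envelope on why a 14B model fits in 16 GB, assuming 4-bit quantized weights and ignoring runtime details like context length and KV-cache size:

```python
# Rough VRAM estimate for quantized LLM weights (weights only; ignores
# KV cache, activations and runtime overhead).
def weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"14B @ {bits}-bit: ~{weight_vram_gb(14, bits):.0f} GB")

# 14B @ 16-bit: ~28 GB  (too big for a 16 GB card)
# 14B @ 8-bit:  ~14 GB  (tight)
# 14B @ 4-bit:  ~7 GB   (fits with headroom for the KV cache)
```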

I meant for production use cases, not local dev/assistant

Yeah, I just looked at the cost of VPS+GPU and it's pretty nuts. I know how much performance you're going to get out of one 16 GB GPU, and it's probably enough to answer around 5-6 queries per minute. That's useful if you can find a way to make more than $1000 a month out of its service at a price of ~$350/month; otherwise, no.
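
To put hedged numbers on that break-even, using the ~5 queries/minute and ~$350/month figures above (utilization is the big unknown):

```python
# Break-even sketch for a rented GPU VPS serving an LLM.
MONTHLY_COST = 350.0        # USD/month, example VPS+GPU price from above
QUERIES_PER_MIN = 5         # rough throughput of one 16 GB card
MINUTES_PER_MONTH = 60 * 24 * 30

for utilization in (1.00, 0.25, 0.05):
    queries = QUERIES_PER_MIN * MINUTES_PER_MONTH * utilization
    print(f"{utilization:4.0%} busy: {queries:9,.0f} queries/mo, "
          f"break-even ${MONTHLY_COST / queries:.4f}/query")

# 100% busy:  216,000 queries/mo, break-even $0.0016/query
#  25% busy:   54,000 queries/mo, break-even $0.0065/query
#   5% busy:   10,800 queries/mo, break-even $0.0324/query
```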

I'd say if you've already got a network services business, and you write an agent that processes data on your system in a way that adds value and lets you bump your margin higher, then it might be useful. But honestly, at this point, economies of scale do not favor individual GPUs compared to highly optimized, expensive data centers full of them.

I expect this to change, though, since the incentive to manufacture fast GPU memory and processors is definitely there. Really, the bottleneck in the technology is not so much raw compute as the speed of the memory, and that can be increased through parallelism: think 512-bit-wide buses, or fast interconnects between devices so they can divide up the work efficiently.
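
A rough illustration of that memory-bandwidth ceiling: for single-stream decoding, every generated token has to read (roughly) all the weights once, so tokens/sec is capped by bandwidth divided by model size. Assuming ~1 TB/s for a 4090-class card and the ~7 GB of 4-bit 14B weights from above:

```python
# Upper bound on batch-1 decode speed: each token reads ~all weights once,
# so throughput <= memory bandwidth / weight bytes.
BANDWIDTH_GB_S = 1000.0   # ~4090-class GDDR6X, rough figure
WEIGHTS_GB = 7.0          # 14B model at 4-bit quantization

ceiling = BANDWIDTH_GB_S / WEIGHTS_GB
print(f"theoretical ceiling: ~{ceiling:.0f} tokens/s "
      "(real throughput is lower; batching raises effective utilization)")
```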

But I think we are still really in the beta stages with this tech; that kind of parallelisation is probably still some way off.

However, it's already good enough that I don't see any sane reason why a programmer would not have an LLM and coding agent on their workstation. It's far cheaper than any VPS service, and I honestly doubt that cloud service prices fully reflect production costs yet; there is so much VC money flooding in that providers are subsidising it to create loss leaders.

Also, I'm just going to say that a group of people with 100 Mbit+ internet connections distributed across the land could run a well-designed distributed compute system. The decentralization benefits of nostr's architecture, combined with some cheap high-bandwidth VPSs acting as rendezvous points, could turn out to be a lot more cost-effective than depending on cloud GPU LLM services.

I've half a mind to start running a relay on my own hardware and get a couple of cheap VPSs for rendezvous and small relays precisely to do something like this. My network is pretty reliable, with uptime probably over 95%, so with two or three sites like this you've got something that can actually do quite a lot of work.

It's probably a little more expensive in total than depending on third parties, but you can guarantee stronger privacy with such a setup, especially if you have small VPS caches that hold you over while your network has a burp.
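
A very rough sketch of the rendezvous idea, assuming the relay just carries job-request events that distributed workers pick up, loosely in the style of NIP-90 data vending machines. The relay URL and event kind here are placeholders, not a worked-out protocol:

```python
import asyncio
import json

import websockets  # pip install websockets

RELAY = "wss://relay.example.com"  # cheap high-bandwidth VPS as rendezvous point
JOB_KIND = 5050                    # placeholder request kind (NIP-90 style)

async def worker() -> None:
    async with websockets.connect(RELAY) as ws:
        # Subscribe to job-request events (NIP-01 REQ message).
        await ws.send(json.dumps(["REQ", "jobs", {"kinds": [JOB_KIND]}]))
        async for raw in ws:
            msg = json.loads(raw)
            if msg[0] == "EVENT":
                event = msg[2]
                print("picked up job:", event.get("id"))
                # ...run the prompt on the local GPU here, then publish a
                # signed result event back to the relay for the requester.

asyncio.run(worker())
```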