Big surprise: Copilot is now metering requests (started last week). I knew it was coming; "unlimited" was too good to be true, and they were obviously burning money at my ~$8.3/mo cost. I've gotten used to agents and switching between models, and I realize I burn a ton of requests learning things, stuff I couldn't justify if I paid per token.

I think it might be time to invest in hardware and really tune some of the bigger models. I can modify my workflow to be mostly async, like reviews and suggestions, to handle the latency.

The difficult part seems to be the gap in GPU pricing. For 48GB of VRAM, you can use older cards and be set back around $800, or the next step up is around $4,000; there doesn't seem to be anything in between. I spend a good bit of time flipping hardware, so I'm not scared of the pricing, but I've always made money in that gap... We definitely live in a different world of pricing now.

Running your own cloud GPU is fine for testing, but since you pay for system uptime rather than usage, it makes no sense financially right now unless you have a large queue of work to keep it busy.
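To put rough numbers on that, here's a back-of-envelope sketch; every figure in it is an illustrative assumption, not a real quote, so plug in your own rates:

```python
# Back-of-envelope comparison of renting a cloud GPU by the hour vs. paying
# per token. All prices below are hypothetical placeholders.

GPU_HOURLY = 2.00             # assumed on-demand rate for a 48GB GPU, $/hr
HOURS_UP = 8 * 22             # instance kept up during working hours, hrs/month
API_PER_MTOK = 10.00          # assumed blended API price, $/million tokens
TOKENS_PER_MONTH = 5_000_000  # assumed personal usage

rental_cost = GPU_HOURLY * HOURS_UP
api_cost = API_PER_MTOK * TOKENS_PER_MONTH / 1_000_000

print(f"rental: ${rental_cost:.0f}/mo, API: ${api_cost:.0f}/mo")
# rental: $352/mo, API: $50/mo -> uptime billing only wins if the queue
# is deep enough that the GPU is actually saturated.
```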

Discussion

Yes, hardware, and probably the skills to train it on my own stuff, is what I'm thinking future me will need.

I don't like renting anything, but more specifically I don't like the feeling of every question, mistake, or creative experiment directly costing money. I know that in the past I've had good luck with hardware, assuming the sysadmin's (my) time is free.

Tell me more about future you? Are you talking customer-facing, private usage, or purely personal? I assume with your connections you'd have access to better pricing for renting compute, or am I assuming too much?

Also this bets on local models getting better, which I'm bullish on.

Coding with it, and not worrying about being addicted to a subscription service that can cut me off at any time.

Then yeah, me too. I don't expect my usage to go anywhere but up, and I don't expect pricing to do anything but increase, at least in the mid term.

Yeah, I would love to train models, damn...

Now that I know the RTX 6000 Blackwell exists, I plainly need one of those. 70B parameters is enough for most things.
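For a sanity check on whether 70B actually fits, here's a rough VRAM estimate; the quantization level and overhead figure are assumptions:

```python
# Rough VRAM estimate for serving a 70B-parameter model, assuming 4-bit
# quantized weights plus headroom for KV cache and activations.

params = 70e9
bytes_per_param = 0.5       # 4-bit quantization
weights_gb = params * bytes_per_param / 1e9  # ~35 GB of weights
kv_and_overhead_gb = 10     # assumed headroom; grows with context length

print(f"~{weights_gb + kv_and_overhead_gb:.0f} GB needed")  # ~45 GB
# A Q4 70B squeezes onto a single 48GB card; a bigger-VRAM card like the
# RTX 6000 Blackwell leaves room for longer contexts or an 8-bit quant.
```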

I do use Copilot. I didn't know about the metering. Are they going to charge me or throttle me?

I am always content to go back to using Continue with Ollama and Llama 3.3.
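For anyone who hasn't tried the local route, a minimal sketch of hitting a local Ollama server directly; it assumes `ollama serve` is running on the default port and the model has already been pulled:

```python
import json
import urllib.request

# Minimal call to a local Ollama server's generate endpoint.
# Assumes `ollama serve` is running and `ollama pull llama3.3` was done.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3.3",
        "prompt": "Write a one-line docstring for a binary search function.",
        "stream": False,  # single JSON response instead of a chunk stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```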

Yeah, I'm about to suffer with Windows so I can get full AI performance on my RX 7800 XT. From AI benchmarks it looks like it's about half as fast as a 4080, but that's probably a bearable token output rate. It should be fast: it's got 16GB of GDDR6 memory on a 256-bit bus, and a bunch of other specs I'm not that bothered to learn about. The point being that it's in my house, and blasting it with LLM processing has still got to work out competitive with the 21 EUR/month I pay for JetBrains AI.
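There's a quick rule of thumb for guessing the token rate from those memory specs; the model size here is an assumption, and real throughput lands below the ceiling:

```python
# Rule-of-thumb ceiling for single-stream decode: generation is mostly
# memory-bandwidth-bound, so tokens/sec <= bandwidth / model size in bytes.
# RX 7800 XT: 19.5 Gbps GDDR6 on a 256-bit bus.

bandwidth_gbs = 19.5 * 256 / 8  # ~624 GB/s
model_gb = 4.7                  # assumed ~7B model at Q4 quantization

ceiling = bandwidth_gbs / model_gb
print(f"~{ceiling:.0f} tokens/s theoretical max")  # ~133 tok/s
# Real-world numbers come in well under this, but it's a quick sanity
# check on whether a card is in the ballpark for interactive use.
```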

The other thing is that there are all kinds of general glitches with the generic amdgpu driver on most of the versions of Linux I have run it on, and I'm just like, ugh.

I can make the AI Assistant, which I mainly use for documentation, use a local Ollama server. Junie doesn't do that yet, unfortunately; I'm currently stuck with Claude 3.7 and 4.0 on that front. But I expect they will open up local models in the not-too-distant future, in which case I can dial back my subscriptions by a lot and probably get better performance.

I saw mention that Claude was being hammered by users today, and it certainly was running slow on my machine. If I can get at least that performance for the cost of a couple hundred watts on the electricity bill, it's probably a wash, and it will likely be substantially faster.

Ah yeah, and not to mention, it gets quite cool here in winter. Having my GPU busy most of the day while I work would help warm things up a bit.

Oh, and not being stuck without access to it when my network goes down. That would be the best part.

The best bang for the buck I have found is aider + OpenRouter, routing prompts to Gemini 2.5 Pro, which costs sub-cents per request and is very good at writing code. There are also a bunch of free models on OpenRouter, though I think you might get charged some tiny amount per request. Being able to switch models easily is great.
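For reference, a minimal sketch of that routing through OpenRouter's OpenAI-compatible endpoint; the model slug and env var name are assumptions, so check openrouter.ai for current IDs:

```python
import os
from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name
)

resp = client.chat.completions.create(
    model="google/gemini-2.5-pro",  # assumed slug; verify on openrouter.ai
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(resp.choices[0].message.content)
```

aider itself can point at the same route by passing the model with an `openrouter/` prefix, e.g. `aider --model openrouter/google/gemini-2.5-pro`, which is what makes the model switching so painless.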