Big surprise, Copilot is now metering requests (started last week). I knew it was coming; "unlimited" was too good to be true and they were obviously burning money at my ~$8.30/mo cost. I've gotten used to agents and switching between models, and I realize I burn a ton of requests just learning things, stuff I couldn't justify if I paid per-token.
I think it might be time to invest in hardware and really tune some of the bigger models. I can shift my workflow to be mostly async, reviews and suggestions, to handle the latency.
The difficult part seems to be the gap in GPU pricing. For 48GB of VRAM, you can use older cards and be set back ~$800, or the next step up is around $4,000; there doesn't seem to be anything in between. I spend a good bit of time flipping hardware, so I'm not scared of the pricing, but I've always made my money in that gap... We definitely live in a different pricing world now.
Renting a cloud GPU is fine for testing, but since you pay for uptime rather than usage, it makes no financial sense right now unless you have a large enough queue to keep it busy.
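Rough back-of-envelope on that (every number here is a made-up assumption, plug in your own rates):

```python
# Break-even sketch: uptime-billed cloud GPU vs. paying per metered request.
# All figures below are assumptions for illustration, not real pricing.
CLOUD_RATE_PER_HR = 0.80   # assumed hourly rate for a 48GB-class instance
REQUEST_PRICE = 0.04       # assumed cost per metered request
HOURS_ON = 8               # hours/day the instance sits powered on

daily_cloud_cost = CLOUD_RATE_PER_HR * HOURS_ON
breakeven_requests = daily_cloud_cost / REQUEST_PRICE
print(f"cloud: ${daily_cloud_cost:.2f}/day -> ~{breakeven_requests:.0f} requests/day to break even")
# With these numbers you'd need ~160 queued requests a day before
# paying for uptime beats paying per request.
```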