There's been some buzz in the last two days around LLM APIs running pay-per-query via lightning payments.

As the creator of an AI service that prioritizes lightning, I wanted to share my experience and also learn a bit from the audience on this matter.

The ultimate dream we all have in the LN community is for each and every query (inference) to be paid for with the requisite amount of satoshis. That way, the user never has to keep a balance with the service and put up with the inconveniences that come with it.

When I originally built PPQ, I tried to implement exactly this feature. But when I actually got to doing this, I realized it was pretty hard:

First, generative AI queries are unpredictable in their cost. When a user sends a request, the cost of that request is generally not known until the output has finished streaming.

Second, even if one decided on some sort of fixed pricing per query, the latency to finish a lightning payment costs precious milliseconds and reduces the snappiness of the end product. I don't want to have a user wait an additional 1 second each time for the payment to clear before getting their answer.

To address this, my best idea was to charge an "extra amount" on the user's first query. That way, my service would store a de facto extra balance on behalf of the user. When the user submits subsequent queries, the system could draw down on this "micro balance" instantly instead of waiting for the next payment to clear. This micro balance would also absorb cases where the user's output cost more than expected. So each subsequent query would always draw down on the micro balance, and the user's real-time payments would not pay for the query itself; they would instead top up the micro balance over and over again.
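A minimal sketch of the micro-balance idea, under stated assumptions: the `MicroBalance` class, its method names, and the satoshi amounts are hypothetical and only illustrate the accounting, not how PPQ was actually built.

```python
from dataclasses import dataclass


@dataclass
class MicroBalance:
    """Hypothetical per-user buffer, in satoshis, held by the service."""
    sats: int = 0

    def top_up(self, paid_sats: int) -> None:
        # A settled Lightning payment only refills the buffer.
        self.sats += paid_sats

    def charge(self, cost_sats: int) -> bool:
        # Queries draw down the buffer instantly, with no payment wait.
        if cost_sats > self.sats:
            return False
        self.sats -= cost_sats
        return True


balance = MicroBalance()
balance.top_up(2_000)          # first query: pay extra to seed the buffer
served = balance.charge(150)   # real cost is known only after streaming ends
balance.top_up(150)            # the next payment restores the buffer in the background
```

The point of the design is that `charge` never waits on the network; only `top_up` depends on a payment settling.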

However, even this method has some weaknesses. How much extra money should that first query cost? Theoretically, the micro balance needs to be as large as the largest possible cost of a single query. If it were smaller, the service would be vulnerable to an attack where users consistently write queries that cost more than their micro balances hold. But the maximum cost of a gen AI query can actually be pretty large nowadays, especially with certain models. So the user's first query would always come with a weird "sticker shock" where they are paying $1-2 for it. It creates confusion.
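To make the sizing problem concrete, here is a rough worst-case calculation. The context size, output limit, and per-million-token prices are made-up illustrative numbers, not the pricing of any real model.

```python
def worst_case_cost_usd(
    max_input_tokens: int,
    max_output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Upper bound on one query's cost if the user fills the whole context
    and the model emits its maximum output."""
    total = (
        max_input_tokens * usd_per_million_input
        + max_output_tokens * usd_per_million_output
    )
    return total / 1_000_000


# Hypothetical model: 200k-token context at $3/M, 64k-token output at $15/M.
required_buffer = worst_case_cost_usd(200_000, 64_000, 3.0, 15.0)
print(f"micro balance must cover about ${required_buffer:.2f}")
```

With these assumed numbers, the buffer lands in the same $1-2 range as the sticker shock described above.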

Aside from these problems, the other big problem is that the lightning consumer ecosystem of wallets and exchanges largely does not yet support streaming payments. The only one that does, to my knowledge, is @getAlby with the "budgeted payments" function in their browser extension.

So even if you were to build a service that could theoretically accept payments on a per query basis, the rest of the consumer facing ecosystem is not yet equipped to actually stream these payments.

In the end, I just adopted a boring old "top up your account" schema where users can come to the website and deposit chunks of money at a time and then draw down upon that balance slowly over time. While boring, it works just fine for now.

I would like to hear from the community on this issue. Am I missing something? Is there a better way to tackle this? Maybe ecash has a cool solution to this?

nostr:nprofile1qyt8wumn8ghj7etyv4hzumn0wd68ytnvv9hxgtcpzemhxue69uhks6tnwshxummnw3ezumrpdejz7qpq2rv5lskctqxxs2c8rf2zlzc7xx3qpvzs3w4etgemauy9thegr43sugh36r nostr:nprofile1qyxhwumn8ghj7mn0wvhxcmmvqyehwumn8ghj7mnhvvh8qunfd4skctnwv46z7ctewe4xcetfd3khsvrpdsmk5vnsw96rydr3v4jrz73hvyu8xqpqsg6plzptd64u62a878hep2kev88swjh3tw00gjsfl8f237lmu63q8dzj6n nostr:nprofile1qyxhwumn8ghj7mn0wvhxcmmvqydhwumn8ghj7mn0wd68ytnzd96xxmmfdecxcetzwvhxgegqyz9lv2dn65v6p79g8yqn0fz9cr4j7hetf28dwy23m6ycq50gqph3xc9yvfs

Discussion

Can you add o4-mini-high?

Okay, on Monday.

Really?

Yes, it is live now.

Wow awesome! It's a really good model

Even better would be the o4-mini-high.

It’s there. You have to use the reasoning effort at the bottom

Wow. Thanks!

Pay with Cashu tokens, you don't need accounts. At the end of the reply, give change in Cashu tokens.
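A small sketch of what "give change" could look like on the server side. It assumes power-of-two token amounts, as commonly used by Cashu mints; the function names are hypothetical, and a real implementation would go through a Cashu library and a mint rather than plain integers.

```python
from typing import List


def split_into_denominations(amount_sats: int) -> List[int]:
    """Break an amount into power-of-two token values (binary decomposition)."""
    return [
        1 << bit
        for bit in range(amount_sats.bit_length())
        if (amount_sats >> bit) & 1
    ]


def change_for(paid_sats: int, cost_sats: int) -> List[int]:
    """Token values to hand back once the real cost of the reply is known."""
    if cost_sats > paid_sats:
        raise ValueError("payment does not cover the cost of the request")
    return split_into_denominations(paid_sats - cost_sats)


# The client overpays with 1024 sats; the reply ends up costing 150 sats.
change = change_for(1024, 150)
```

Here the change adds up to 874 sats, returned as several smaller tokens that the client's wallet would have to store and reuse.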

Paying out change is nice. Does the client need to trust that the server will refund change? I assume so.

What if the payer doesn't have cashu tokens to begin with? For example, the user pays via lightning... can I give them cashu change even if they don't own any themselves? The change can be credits for my platform that they may use in their next request...

But if that is the case, their wallet needs to know how to recycle the change into future payments...

It's getting complicated fast.

If ecash was widespread tech this all seems feasible, but because it isn't such a solution would only appeal to a very narrow audience for now. Isn't that correct?

So basically, this schema could only work for payers who have the infrastructure to process Cashu tokens to begin with. They need to have some Cashu handling library on their client side, right?

Lastly, the latency issue still exists. If Cashu payments have all of the same checks that lightning payments do, then the time delay is gonna suck. But maybe they don't?

Thanks for reading my stream of thought.

Cashu is much faster. It can be minted by paying a lightning invoice. But we don't have to theorize, there's athenut for search and another project for cashu based ai based on openrouter. Both are open source, so you can just use the code today.

Where can I pay a lightning invoice to get a bearer key that I can use on routstr?

Bearer key? Lightning invoice?

Your request, granted:

https://www.modulo.network/category/all-products

cool product, but no that is not what I'm asking for.

👍🏻

Thanks—when I hear someone say ‘bearer bitcoin’ my ears perk up. Needless to say I am watching cashu with interest.

Use nutshell as a library.

You want a mint call.

I'm asking from the perspective of a normal non-dev user who just knows how to send lightning invoices. I don't want to have to install libraries and do function calls.

How does a normie use routstr when the normie only knows how to send lightning invoices?

If you can do API calls to access this, you can do one more to mint a token. Also, minibits or cashu.me will mint you tokens in exchange for lightning invoice in a normie friendly way.

If you are a normie, you need a user interface that someone builds for you and shows you the invoice. Athenut does this in a nice way, you don't even see tokens, only the invoice.

But please don't be a normie. If you can use AI, you can use the library. Learn. Normies will have a very hard time in what's coming at them. The normie way leads to a Disneyland themed gulag. You should expect more of yourself and for yourself.

Sounds like a pain in the ass. With the current system I don't have to deal with change.

The only (privacy) advantage of cashu is when tokens of a well-established mint get exchanged between users; for this narrow use case that won't happen.

Ah yes the privacy angle of ecash tokens is what keeps getting pitched to me but I didn't understand that a prerequisite was that the tokens needed to be getting exchanged between users. Can you explain that part a bit more?

It's not true. The only prerequisite is number of tokens can't be too small. If you are the only user, you are identified trivially for example.

Ah, so if I have a few thousand users using tokens that are only redeemable at my service, I still can't tell which users are spending which tokens. Is that correct?

Yes

Yes

Where is an easy place I can pay lightning invoice to get ecash tokens to use on routstr?

No, the users themselves are unknown to the service. No need to exchange between them, you can swap the tokens with the mint anytime.

You either deal with change, you get some strange credits or you have overhead.

But you can also stream cashu tokens without change, that also works. But a bit of unnecessary overhead for the mint.

That's fair, but it doesn't seem like a huge improvement with respect to the current situation where accounts are pseudonymous and cost nothing to create. And it does introduce new problems.

How would streaming work? Afaik the cost is known after the request is done

Websocket - pay for the next 100 tokens (send a prepared string), get next 100 tokens, pay for the next 100 tokens (send another prepared string), get next 100 tokens.

Or 10 or whatever.
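The pay-per-chunk loop described above could be sketched like this. It is a plain in-memory simulation rather than a real websocket: `stream_paid_chunks` and the `"pay-N"` strings are hypothetical stand-ins, and a real server would verify or redeem each prepared string before releasing output.

```python
from typing import Iterable, Iterator, List


def stream_paid_chunks(
    output_tokens: List[str],
    prepaid_strings: Iterable[str],
    chunk_size: int = 100,
) -> Iterator[List[str]]:
    """Release one chunk of output for each prepared payment string received."""
    payments = iter(prepaid_strings)
    for start in range(0, len(output_tokens), chunk_size):
        payment = next(payments, None)
        if payment is None:
            return  # the client stopped paying, so the server stops sending
        # Placeholder: a real server would check the payment string here.
        yield output_tokens[start:start + chunk_size]


tokens = [f"tok{i}" for i in range(250)]
chunks = list(stream_paid_chunks(tokens, ["pay-1", "pay-2"]))
```

With two payments the client receives the first 200 tokens and nothing more, so neither side is ever exposed for more than one chunk.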

But sending tokens and getting change back is basically an accountless version of what you have. It takes a little trust in the provider (same as with the account model) that you get refunded for unused credits.

Nothing to store at the server.

I think the best solution to this, given the requirements you outlined and the need to avoid wasting time on payment processing, is to use Cashu to build a 'trust' relationship based on certain terms, such as a predefined mint for payments. This way, consumers and providers can agree on a mint, and only payments with tokens from that mint will be accepted. This removes the burden of finalising payments, and tokens from that mint can be streamed, meaning those payments could be as fast as reducing the price from a local balance. This only works if a trust relationship is established between the parties: 'consumer -> mint <- provider'. If this trust relationship doesn't exist, you have to redeem Cashu tokens for every payment to reduce counterparty risk.

Another approach would be for you to run your own mint with a custom denomination (or just sats), where customers could buy tokens by minting sats. This approach has some interesting benefits, such as improving the privacy of your users, as the tokens act as both payment and verification. This means that you wouldn't need API keys and could act as a blinded custodian.

If we have to agree on one mint, which is also custodial, what is the actual difference?

I use PPQ by keeping a browser bookmark with my access token and it just works.

How does having cashu tokens improve that user experience?

Cashu is interoperable. The key for your account on PPQ is more difficult from a UX perspective.

No, it's simpler from a UX perspective

As I read more of the thread, I realized I had no idea what I was talking about.🤣

Portability and anonymity I think are the most obvious improvements

+1 for anonymity as queries don't need to be linked to an account.

Accounts are trivial to create so you can keep rotating them as often as you want.

Not as fine grained as once per request, but the advantage is I don't have to deal with change.

How do you deal with change in the case of cashu?

If you use smaller denominations, how much do you send if you don't know what the request will end up costing?

nostr:nprofile1qqsdy27dk8f9qk7qvrm94pkdtus9xtk970jpcp4w48k6cw0khfm06mspp4mhxue69uhkummn9ekx7mqpz9mhxue69uhkummnw3e82cfwvdhk6qg5waehxw309aex2mrp0yhxgctdw4eju6t09x35fm discussed the challenges of change. In SEC-04, nostr:nprofile1qqsthdwa5rs42euhnuz5xsrmmssr84hshwes7uj392vpeldj7z0zw3cppemhxue69uhkummn9ekx7mp0qy2hwumn8ghj7un9d3shjtnyv9kh2uewd9hj7qg3waehxw309ahx7um5wgh8w6twv5hsef7u3d and I worked on running CI jobs as DVMs but paid with cashu and returning change. The user (or agent) just pays as much as they want, the more they pay the more the timeout is and the more change they potentially get back. We got an alpha version working with testnuts.

That's pretty cool. And great to hear that you guys figured it out.

If nostr:nprofile1qqsdy27dk8f9qk7qvrm94pkdtus9xtk970jpcp4w48k6cw0khfm06mspp4mhxue69uhkummn9ekx7mqpz9mhxue69uhkummnw3e82cfwvdhk6qg5waehxw309aex2mrp0yhxgctdw4eju6t09x35fm went for this, how would we configure Roocode for example to work with cashu tokens?

This looks like the biggest problem: the OpenAI API standard supported by all the tools doesn't include support for this. I've not used routstr, etc. to see how they work, but I assume the user would need to run a local proxy to inject the tokens and collect the change.

Would this be a big deal? I'm not sure. The service could run their own custodial proxy to onboard new users with the account based experience nostr:nprofile1qqsdy27dk8f9qk7qvrm94pkdtus9xtk970jpcp4w48k6cw0khfm06mspp4mhxue69uhkummn9ekx7mqpz9mhxue69uhkummnw3e82cfwvdhk6qg5waehxw309aex2mrp0yhxgctdw4eju6t09x35fm do today.

This mint, would the service provider run it? If so, I wonder if any money transmission laws would come into effect?

Depends, you can externalize that and use a third-party mint. Are there money transmission laws about keeping a custodial balance? I would say it's the same. If instead you run your own custom-denomination mint, these tokens are not money themselves, just utility tokens.

There are laws around allowing withdrawals. If a service allows withdrawals, then they generally are considered a wallet and have to respect MSB laws.

But yea if you just have them as non-withdrawable credits then it might work. But at that point why not just do a simple top up?

The one advantage I'm aware of is the blinded custodian thing, so that's +1 for ecash tokens.

You can run a mint for yourself for only your application with exactly your requirements (limit withdrawals etc)! Hit me up if you need any information, happy to help.

A third party mint would introduce financial risk for the operator which was zero before

Before what? LN custodial you mean?

By before I meant right now, status quo

I think the solution you came up with is fine

I wonder whether hold invoices would be useful here? E.g., send hold invoice for the max cost and, if less, release after invoice for exact amount is settled? Otherwise, I got nothing. ;)
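One way to read the hold-invoice suggestion is as a small state machine. This is only a sketch of that reading: the state names and `resolve_hold` are hypothetical, and no real Lightning node API is called.

```python
from enum import Enum


class HoldState(Enum):
    ACCEPTED = "accepted"  # payer's funds are locked but not yet claimed
    SETTLED = "settled"    # the service claimed the full held amount
    CANCELED = "canceled"  # the hold was released back to the payer


def resolve_hold(
    max_cost_sats: int,
    actual_cost_sats: int,
    exact_invoice_paid: bool,
) -> HoldState:
    """Decide what happens to a hold invoice issued for the maximum cost."""
    if actual_cost_sats >= max_cost_sats:
        return HoldState.SETTLED   # the query used the whole hold
    if exact_invoice_paid:
        return HoldState.CANCELED  # exact amount received, release the hold
    return HoldState.ACCEPTED      # keep holding until the exact invoice is paid


state = resolve_hold(max_cost_sats=2_000, actual_cost_sats=150, exact_invoice_paid=True)
```

In this reading the payer still needs enough liquidity for both the held maximum and the exact invoice at the same time.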

Oh that's an interesting thing. Never heard of those. Where can one read more?

Maybe check out https://bitcoinops.org/en/topics/hold-invoices/

Robosats uses them, for example.

Robosats uses this, like an escrow service.

You can enter telegram group for robosats and ask nostr:nprofile1qqsxg45ph8gx0vdrvtzta6xal7v86frx6jvstsnvhrlvtehmwwh4epqprdmhxue69uhkummnw3ezuumpw3ehgunpd35kztnrdakj7qgkwaehxw309amk7apwdehhxarj9ecxzun50yhsz9nhwden5te0v4jx2m3wdehhxarj9ekxzmny9u397edf, who is a dev there

In the telegram group they said that what you want is implemented here:

https://www.routstr.com/

I went for top up since day one.

I'm happy with the choice and going to stick with this.

I personally like the "top up account balance" approach. I always top up with sats when the price is high, to get the biggest bang out of my sats.

I don't know how you can solve your dilemma though.

Interesting… 🤔

Nice work. Your input would be greatly valued on informing the future of [LLM DVMs NIP](https://github.com/nostr-protocol/nips/pull/1929)

I am very interested in this use case as I am building #nostr #safebox to handle micropayments as seamlessly as possible. All funds are stored as Cashu tokens and I am looking to build a streaming capability either via websockets or indirectly via a NIP-17/NWC like transfer. Happy to explore!

With budget payments you mean NWC?

Correct

Regarding latency, lightning payments have to find the routing, which might fail, so it's too slow.

Maybe Cashu will be quicker, because you just need a stream of strings. This has the problem that the mint might rug people, of course.

Maybe what you can do is just to have your own mint, which the user charges with a reasonable buffer, and you stream from the tokens the user created with you.

This probably solves the speed problem, but not the buffer problem.

To minimize the buffer, maybe some Cashu capability has to be developed so that the sender authorizes streams up to a maximum amount, etc.

nostr:nprofile1qqs9pk20ctv9srrg9vr354p03v0rrgsqkpggh2u45va77zz4mu5p6ccpzemhxue69uhk2er9dchxummnw3ezumrpdejz7qgkwaehxw309a5xjum59ehx7um5wghxcctwvshszrnhwden5te0dehhxtnvdakz7qrxnfk maybe you have better insights