Replying to Rusty Russell

#dev #CLN

I've spent the last few workdays completely reworking our onion message code. It was scattered in various places and I wanted to unify it; it was also written several years ago, and I'd forgotten how the protocol actually works!

Onion messages are *double* encrypted; this is the main source of confusion! At the high layer, they're a series of nested encrypted layers ("onionmsg_tlv" in the BOLT 4 spec), so each recipient decrypts one and hands the rest on: this is exactly the same as we use for payment information. But inside that is *another* encrypted blob (onionmsg_tlv.encrypted_recipient_data), which you can only decrypt (into an "encrypted_data_tlv") using a tweak that was handed to you alongside the onion. Inside that is all the information about where to send next and any restrictions; it also lets you calculate the *next* tweak to hand on (and it can override the next tweak).
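To make the two layers concrete, here's a toy sketch. Everything in it is an assumption for illustration: the cipher is a SHA-256 keystream XOR rather than the real one, the keys are fixed byte strings (in the real protocol the outer key comes from the onion's ephemeral key and the inner key from the tweak), JSON stands in for TLV, and `wrap`/`unwrap` are made-up helper names.

```python
import hashlib
import json


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (SHA-256 keystream XOR); encrypt == decrypt."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(onion_key, blob_key, recipient_data, inner, content=None):
    """Build one hop's layer: the inner blob is encrypted under blob_key,
    then the whole layer is encrypted under onion_key."""
    layer = {
        "encrypted_recipient_data": xor_cipher(
            blob_key, json.dumps(recipient_data).encode()
        ).hex(),
        "inner": inner.hex(),  # the remaining onion for later hops
    }
    if content is not None:
        layer["content"] = content
    return xor_cipher(onion_key, json.dumps(layer).encode())


def unwrap(onion_key, blob_key, onion):
    """Peel one layer: first the onion layer, then the blob inside it."""
    layer = json.loads(xor_cipher(onion_key, onion))
    blob = bytes.fromhex(layer["encrypted_recipient_data"])
    recipient_data = json.loads(xor_cipher(blob_key, blob))
    return recipient_data, bytes.fromhex(layer["inner"]), layer.get("content")


# Hypothetical per-hop keys: (outer onion key, inner blob key).
keys = {"charlie": (b"onion-c", b"blob-c"), "alice": (b"onion-a", b"blob-a")}

# Built inside out: Alice's layer first, then Charlie's around it.
alice_layer = wrap(*keys["alice"], {"path_id": "offer-123"}, b"",
                   content="hello Alice")
onion = wrap(*keys["charlie"], {"next_node_id": "alice"}, alice_layer)

charlie_data, remaining, charlie_content = unwrap(*keys["charlie"], onion)
alice_data, leftover, alice_content = unwrap(*keys["alice"], remaining)
print(charlie_data, alice_data, alice_content)
```

Charlie only learns where to forward; the content and the path_id only show up once Alice peels her own layer.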

The double encryption is necessary because there are *three* actors here: Alice wants Bob to send her a message, without revealing her identity. So she gives Bob a "blinded path" which goes via Charlie: this path contains Charlie's pubkey (where to start the path), a blinding tweak, and two encrypted blobs for Bob to put one into each layer of the onion message. The first is an encrypted blob which Charlie can read, which contains her pubkey so he knows where to send it next. The second is her own, and contains a secret specific to the purpose of this message, so Bob can't play games trying to use this blinded path for anything else ("hey, are you the same node as this previous payment?") or use a different blinded path for this purpose. She can also add dummy hops (we don't yet), which she will simply absorb, to obscure the path length from Bob. You can add padding to make the hops indistinguishable (we don't yet).
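The purpose-specific secret can be sketched like this. It's a minimal illustration, not how Core Lightning does it: `PathIdRegistry` and the purpose strings are hypothetical, and an implementation may well derive the secret instead of storing a table. The only spec-level assumption is that the secret travels in Alice's own blob (the `path_id` field of encrypted_data_tlv).

```python
import secrets


class PathIdRegistry:
    """Alice's side: one random secret per blinded path she hands out."""

    def __init__(self):
        self._purpose_by_path_id = {}

    def issue(self, purpose: str) -> bytes:
        """Create the secret Alice hides in her own encrypted blob."""
        path_id = secrets.token_bytes(32)
        self._purpose_by_path_id[path_id] = purpose
        return path_id

    def accepts(self, path_id, purpose: str) -> bool:
        """Accept only if the message came in over the path issued for
        exactly this purpose (path_id is None if no blinded path was used)."""
        if path_id is None:
            return False
        return self._purpose_by_path_id.get(path_id) == purpose


alice = PathIdRegistry()
invoice_path_id = alice.issue("invoice-reply")   # hypothetical purpose names
other_path_id = alice.issue("something-else")

print(alice.accepts(invoice_path_id, "invoice-reply"))
print(alice.accepts(other_path_id, "invoice-reply"))
print(alice.accepts(None, "invoice-reply"))
```

Bob never sees the secret (it's inside Alice's blob), so he can't move a message from one path to another without Alice noticing.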

Bob puts the actual stuff he wants to send Alice into the final onion layer (often including his own blinded reply path!), along with Alice's encrypted blob.

Importantly, even if Bob were sending a message *not through a blinded path* he would use the same double-encrypted format: that's so Charlie can't tell whether a blinded path is being used or not, even though it's slightly less efficient. Crypto is cheap these days, too.

Now, if Alice gives Bob a blinded path starting at Charlie and Charlie is Bob's peer, Bob can simply send the onion and the first blinding tweak to Charlie. But if Bob needs to send the message via Dave to reach Charlie, he needs to prepend a step. That's not quite possible, naively, because blinding tweaks are generated *forwards*: Charlie needs to get the blinding tweak Alice chose from Dave, and the tweak Dave would normally derive won't match it. So inside Dave's encrypted blob, Bob uses next_blinding_override to tell Dave to hand that blinding override to Charlie instead of the normal one. I just implemented this for Core Lightning (previously we would simply connect to the first node, which is privacy-compromising and should only be done as a last resort).
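Here's a toy sketch of why the override is needed. Big assumption: it uses integers modulo a prime instead of secp256k1, and a bare SHA-256 where the real derivation differs in detail, so only the *shape* is faithful: each next tweak is the current one raised to a hash of (tweak, shared secret). All function names are made up for illustration.

```python
import hashlib

P = 2**127 - 1   # toy modulus (a Mersenne prime); NOT secp256k1, NOT secure
G = 3            # toy generator


def h_int(*parts: int) -> int:
    data = b"".join(p.to_bytes(32, "big") for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def pub(secret: int) -> int:
    return pow(G, secret, P)


def shared_secret(secret: int, their_pub: int) -> int:
    return h_int(pow(their_pub, secret, P))


def next_tweak(tweak: int, ss: int) -> int:
    """What a forwarding node derives from the tweak it was handed."""
    return pow(tweak, h_int(tweak, ss), P)


def creator_tweaks(e0: int, node_pubs: list) -> list:
    """The tweak each hop will receive, as the path's creator computes it."""
    e, tweaks = e0, []
    for node_pub in node_pubs:
        tweak = pub(e)
        tweaks.append(tweak)
        e = (e * h_int(tweak, shared_secret(e, node_pub))) % (P - 1)
    return tweaks


def forward(node_secret: int, tweak: int, override=None) -> int:
    """Tweak a node hands on: the override if its blob carries one."""
    if override is not None:
        return override
    return next_tweak(tweak, shared_secret(node_secret, tweak))


charlie_k, alice_k, dave_k = 1111, 2222, 3333   # toy node secrets

# Alice's path (Charlie -> Alice): tweaks chain forwards from her e0.
alice_tweaks = creator_tweaks(424242, [pub(charlie_k), pub(alice_k)])
charlie_hands_on = forward(charlie_k, alice_tweaks[0])

# Bob prepends Dave with his own starting secret; Dave's natural next
# tweak has nothing to do with the one Alice chose for Charlie.
bob_tweaks = creator_tweaks(777777, [pub(dave_k)])
dave_natural = forward(dave_k, bob_tweaks[0])
dave_overridden = forward(dave_k, bob_tweaks[0], override=alice_tweaks[0])

print(charlie_hands_on == alice_tweaks[1])
print(dave_overridden == alice_tweaks[0])
```

Without the override Charlie would get a tweak that can't decrypt the blob Alice made for him; with it, the two paths join up cleanly.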

These blinded paths have some nice properties: you can't use part of them (you don't know the blinding factor except for the first one, so you can't start in the middle, and you can't replace any data), so you need to use all of them. They can contain time limits to avoid easy probing, too: a classic probe would be to see if the path fails when a given node is down, but that takes time. The spec insists all errors within the blinded path are the same, and originate from the entry: this loses some analytical power on failure, but makes probing harder. The entry point is supposed to add a random delay (we don't yet!). There may still be implementation differences, but they're hard for Bob to probe (and Alice doesn't need to, as she set up the path).

Great summary, but I wish I had seen this before finishing my initial rough implementation for #electrum last week :) Getting a full picture of the spec requires quite a lot of trawling through PRs and scattered snippets of pseudocode.

The concatenation of the route to the introduction point and the blinded path took some time to grasp, but the test vectors in the PR are a nice validation target to work towards.

How is CLN finding routes over the network to the target node / introduction point? In Electrum we currently just use the channels from the channel graph as routing edges, but in theory this is not strictly needed. However, you can't expect nodes to make new peer connections just to satisfy an onion message forward. There are currently about 445 nodes advertising support for onion messages, so the graph is not yet very traversable, and there may even be multiple disjoint subgraphs (need to check).

Next challenge for #electrum will be sending/receiving onion messages without a channel graph, over trampoline...


Discussion

Why not just go the LNDK route and use the LDK BOLT12/onion message code (which is usable directly without the rest)?

You should seriously consider this. Annoying as it is to throw away working code, this code is *annoying* in a way few things in the specs are.

I really wish there was a way to use partial onions for this instead, but I can't make it work with the addition of the payload at the end (and you'd have to use exact values down each path, but that's probably ok).

Even my new code doesn't do padding, delaying, dummy hops or fake node IDs like everyone would like. That will probably be in 24.11...

I just spent way too many hours reworking this part of the spec: I would *really* appreciate feedback on it!

https://github.com/lightning/bolts/pull/1179