🔖 Title: Jamming Mitigation Dry Run

🏷️ Categories: Lightning-dev



Discussion

📅 Original date posted:2023-08-01

🗒️ Summary of this message: The plan is to collect data on HTLC endorsement and local reputation tracking to mitigate jamming attacks. Multiple teams are involved in different phases of the plan.

📝 Original message:

Hi list,

## TL;DR

We're moving ahead with the plan discussed at the summit to "dry run"

HTLC endorsement and local reputation tracking to better inform our

efforts to mitigate jamming attacks.

Our goals are:

* To use real-world data to sanity check the "steady state" behavior

of local reputation algorithms, and to better inform creation of

synthetic data for simulating attack scenarios.

* To obtain liquidity and slot utilization data to inform sane defaults

for resource bucketing.

* To provide a common data export format to use as a common basis for

analysis.

As code takes some time to write and deploy, there are a few phases

for this plan - details at the end of the email for those who are

interested!

1. Collect anonymized forwarding data with a common format.

2. Propagate experimental `endorsement` TLV.

3. Implement local reputation and set `endorsement` values.

This is a multi-team effort:

* Eclair: Thomas is looking into collecting local reputation data

in [1].

* CLN: Vincenzo is working on experimental update to propagate the

endorsement field and a plugin that will allow us to run local

reputation scoring.

* LND: I am working on data export and HTLC endorsement via

circuitbreaker [2].

* LDK: some additional plumbing is needed, as outlined in [3].

## Research Plan

### 1. Collect Anonymized Data

We're aware that we are dealing with sensitive and private information.

For this reason, we propose defining a common data format that

analysis tooling can be built around, so that node operators can run

the analysis locally if desired. Fields marked with [P] *MUST* be

randomized if exported to research teams.

The proposed format is a CSV file with the following fields:

* version (uint8): set to 1, included to future-proof ourselves

against the need to change this format.

* channel_in (uint64)[P]: the short channel ID of the incoming channel

that forwarded the HTLC.

* channel_out (uint64)[P]: the short channel ID of the outgoing

channel that forwarded the HTLC.

* peer_in (hex string)[P]: the hex encoded pubkey of the remote peer

for the channel_in.

* peer_out (hex string)[P]: the hex encoded pubkey of the remote peer

for the channel_out.

* fee_msat (uint64): the fee offered by the HTLC, expressed in msat.

* outgoing_liquidity (float64): the portion of

`max_htlc_value_in_flight` that is occupied on channel_out after the

HTLC has been forwarded.

* outgoing_slots (float64): the portion of `max_accepted_htlcs` that

is occupied on channel_out after the HTLC has been forwarded.

* ts_added_ns (uint64): the unix timestamp that the HTLC was added,

expressed in nanoseconds.

* ts_removed_ns (uint64): the unix timestamp that the HTLC was

removed, expressed in nanoseconds.

* htlc_settled (bool): set to 0 if the HTLC failed, and 1 if it was

settled.

* incoming_endorsed (int16): an integer indicating the endorsement

status of the incoming HTLC (-1 if not present, otherwise set to the

value in the incoming endorsement TLV).

* outgoing_endorsed (int16): an integer indicating the endorsement

status of the outgoing HTLC (-1 if not set, otherwise set to the

value set in the outgoing endorsement TLV).
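
As a rough illustration of the field list above, the sketch below reads such a CSV into typed records. It assumes a header row whose column names match the field names and a column order that follows the list; neither is stated in the proposal, and `ForwardRecord`, `parse_row` and `read_records` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Assumed column order: the order in which the fields are listed above.
FIELDS = [
    "version", "channel_in", "channel_out", "peer_in", "peer_out",
    "fee_msat", "outgoing_liquidity", "outgoing_slots",
    "ts_added_ns", "ts_removed_ns", "htlc_settled",
    "incoming_endorsed", "outgoing_endorsed",
]


@dataclass
class ForwardRecord:
    version: int
    channel_in: int
    channel_out: int
    peer_in: str
    peer_out: str
    fee_msat: int
    outgoing_liquidity: float
    outgoing_slots: float
    ts_added_ns: int
    ts_removed_ns: int
    htlc_settled: bool
    incoming_endorsed: int   # -1 when no endorsement TLV was present
    outgoing_endorsed: int   # -1 when no endorsement TLV was set


def parse_row(row: dict) -> ForwardRecord:
    """Convert one CSV row (all strings) into a typed record."""
    return ForwardRecord(
        version=int(row["version"]),
        channel_in=int(row["channel_in"]),
        channel_out=int(row["channel_out"]),
        peer_in=row["peer_in"],
        peer_out=row["peer_out"],
        fee_msat=int(row["fee_msat"]),
        outgoing_liquidity=float(row["outgoing_liquidity"]),
        outgoing_slots=float(row["outgoing_slots"]),
        ts_added_ns=int(row["ts_added_ns"]),
        ts_removed_ns=int(row["ts_removed_ns"]),
        htlc_settled=row["htlc_settled"] == "1",  # 0 = failed, 1 = settled
        incoming_endorsed=int(row["incoming_endorsed"]),
        outgoing_endorsed=int(row["outgoing_endorsed"]),
    )


def read_records(text: str) -> List[ForwardRecord]:
    """Parse CSV text with a header row into a list of records."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


# Made-up example row, for illustration only.
sample = ",".join(FIELDS) + "\n" + (
    "1,1,2,02aa,03bb,1000,0.25,0.1,"
    "1690848000000000000,1690848001500000000,1,-1,-1\n"
)
records = read_records(sample)
```

The resolution time of a forward is then `ts_removed_ns - ts_added_ns` on each record.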

Before we add endorsement signaling and setting via an experimental

TLV, the last two values here will always be -1. The data is still

incredibly useful in the meantime, and allows for easy update once the

TLV is propagated through the network.
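
The requirement that [P] fields be randomized before export could be met in several ways. The sketch below is one possibility, not the method the proposal prescribes: it replaces each distinct channel ID and pubkey with a fresh random value, and keeps that replacement consistent within a single export so that per-channel and per-peer relationships survive. The consistency property and the names `Randomizer` and `randomize_row` are assumptions.

```python
import secrets


class Randomizer:
    """Maps each distinct [P] value to a random stand-in for one export.

    The mapping lives only in memory; discarding it after the export
    means the stand-ins cannot be reversed from this object.
    """

    def __init__(self):
        self._channels = {}
        self._peers = {}

    def channel(self, scid: int) -> int:
        # Random 64-bit stand-in; collisions are possible but very unlikely.
        if scid not in self._channels:
            self._channels[scid] = secrets.randbits(64)
        return self._channels[scid]

    def peer(self, pubkey_hex: str) -> str:
        # 33 random bytes, hex encoded, to match the length of a pubkey.
        if pubkey_hex not in self._peers:
            self._peers[pubkey_hex] = secrets.token_hex(33)
        return self._peers[pubkey_hex]


def randomize_row(row: dict, r: Randomizer) -> dict:
    """Return a copy of row with the four [P] fields replaced."""
    out = dict(row)
    out["channel_in"] = r.channel(row["channel_in"])
    out["channel_out"] = r.channel(row["channel_out"])
    out["peer_in"] = r.peer(row["peer_in"])
    out["peer_out"] = r.peer(row["peer_out"])
    return out
```

Fields without the [P] marker, such as `fee_msat` and the timestamps, pass through unchanged in this sketch.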

### 2. Propagate Experimental Endorsement TLV

HTLC endorsement is signaled using an experimental range TLV in

`update_add_htlc` (which has been reserved in [4]):

tlv_stream: update_add_htlc_tlvs

Type: 655555

Data: byte (endorsed)
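
For readers unfamiliar with the wire format, here is a minimal sketch of how such a record might be serialized, assuming the standard Lightning TLV layout (BigSize type, BigSize length, value). The type number is copied from the text above; the function names are hypothetical.

```python
ENDORSEMENT_TLV_TYPE = 655555  # type number as given in the text above


def bigsize(n: int) -> bytes:
    """Encode an unsigned integer using the BigSize variable-length format."""
    if n < 0xFD:
        return n.to_bytes(1, "big")
    if n < 0x10000:
        return b"\xfd" + n.to_bytes(2, "big")
    if n < 0x100000000:
        return b"\xfe" + n.to_bytes(4, "big")
    return b"\xff" + n.to_bytes(8, "big")


def encode_endorsement_tlv(endorsed: int) -> bytes:
    """Serialize the single-byte endorsement value as one TLV record."""
    if not 0 <= endorsed <= 0xFF:
        raise ValueError("endorsed must fit in one byte")
    value = bytes([endorsed])
    return bigsize(ENDORSEMENT_TLV_TYPE) + bigsize(len(value)) + value
```

With these assumptions, `encode_endorsement_tlv(1).hex()` gives `fe000a00c30101`: a five-byte type, a one-byte length and the one-byte value.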

This signal should be propagated by forwarding nodes in the following

manner:

- if `endorsed` is present in the incoming `update_add_htlc`:

  - set the same value for the outgoing `update_add_htlc`.

- otherwise:

  - set `endorsed` = 0 for the outgoing `update_add_htlc`.
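
The forwarding rule above can be sketched as a single function. Representing an absent TLV as `None` is an assumption made for illustration, as is the function name.

```python
from typing import Optional


def outgoing_endorsed(incoming: Optional[int]) -> int:
    """Value to set on the outgoing update_add_htlc during phase 2.

    incoming is the value of the endorsement TLV on the incoming HTLC,
    or None if the TLV was absent.
    """
    if incoming is not None:
        return incoming  # copy the incoming value unchanged
    return 0             # TLV absent: signal "not endorsed"
```

Note that an incoming value of 0 is copied as 0, which is distinct from the absent case only on the incoming side; both yield 0 on the outgoing HTLC.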

### 3. Implement Local Reputation and Set Endorsement

The final step will be to implement local reputation algorithms and

start to actively set the value of the `endorsed` TLV for outgoing

HTLCs, rather than simply copying the value presented by the sending

node. This signal will *not* be used for any purpose other than

data collection.

Experimenters are free to use the full range of bits to express

endorsement values, but should be aware that any non-zero value will

be interpreted as a positive endorsement signal by implementations

using binary endorsement (as is currently specified in [5]).

A positive endorsement signal requires that the original sender of an

HTLC sets a non-zero value, but doing so carries the privacy risk of

indicating that they are the sending node during the upgrade period.

We suggest that senders choose some probability P (suggested default:

20%) with which to set endorsed=1 for their payments.
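
A minimal sketch of the sender-side suggestion and of the binary interpretation mentioned above. The function names and the injectable `rng` parameter are hypothetical; only the 20% default and the "any non-zero value counts as endorsed" rule come from the text.

```python
import random


def choose_sender_endorsement(p: float = 0.20, rng=random.random) -> int:
    """Return 1 with probability p, else 0, for a payment we originate."""
    return 1 if rng() < p else 0


def is_endorsed_binary(value: int) -> bool:
    """Binary interpretation: any non-zero value is a positive signal."""
    return value != 0
```

Passing a fixed `rng` (for example `lambda: 0.1`) makes the choice deterministic, which is convenient for testing.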

Once we've got data collection code in place, we'll make a more

general call for node operators to start collection. In the meantime,

feel free to reach out if you have any questions or are interested in

helping out!

Cheers,

Carla + Clara

## References

[1] https://github.com/ACINQ/eclair/pull/2716

[2] https://github.com/lightningequipment/circuitbreaker/issues/77

[3] https://github.com/lightningdevkit/rust-lightning/issues/2425

[4] https://github.com/lightning/blips/pull/27

[5] https://github.com/lightning/bolts/pull/1071


📅 Original date posted:2023-08-03

🗒️ Summary of this message: Long-term collection of proposed data could potentially re-identify anonymized channel counterparties, raising concerns about privacy and data storage.

📝 Original message:

Hi Carla + Clara,

I want to preface this by saying that I'm very familiar with how limiting

the lack of available real-world datasets can be for conducting

significant simulations and empirical experiments on Lightning.

However, it may be noteworthy that long-term collection of the proposed

fields could potentially allow re-identification of the anonymized

channel counterparties based on heuristics that correlate with the

public graph data, especially when datasets from multiple (possibly

neighbouring) collection points end up being combined.

Subsequently, this might allow further conclusions to be drawn about

transferred amounts, channel liquidities at particular times, and, as

HTLC settlement/failure timestamps are recorded in nanosecond

resolution, potentially even the payment destination's identity (cf.

[1]).

As surrendering this kind of data therefore requires a good level of

trust in the researchers, it might be helpful (and best practise) if you

could clarify upfront whether you intend to time-box the collection

period, where the data would be stored, and who would have access to it.

From my point of view clearly defining the collection period would also

be mandatory as we don't want to incentivise node operators to collect

and store HTLC data longer-term, especially if it's to this degree of

detail.

Best,

Elias

[1]: https://arxiv.org/pdf/2006.12143.pdf



📅 Original date posted:2023-08-03

🗒️ Summary of this message: The sender expresses concerns about the potential re-identification of anonymized data and requests clarification on the collection period, data storage, and access. The recipient assures that data will be handled securely, anonymized, and not shared. They also mention the possibility of fuzzing timestamps.

📝 Original message:

Hi Elias,

Thanks for re-emphasizing the importance of being privacy-conscious

as we look into this work - we completely agree!

> clarify upfront whether you intend to time-box the collection period,

> where the data would be stored, and who would have access to it

Our ideal collection period would be limited to six months. One

of our main aims in defining a common data format is to ensure that we

can provide node operators with tooling that they can run locally, so

that they do not need to export the data _at all_, only highly

aggregated results.
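
To illustrate what locally computed, aggregated results might look like, here is a hedged sketch. The specific statistics and the name `summarize` are assumptions chosen for illustration, not the analysis the researchers have defined; rows are dicts keyed by the field names of the proposed format.

```python
def summarize(rows):
    """Reduce per-HTLC rows to a few aggregate numbers.

    Only these aggregates would need to leave the node, not the rows.
    """
    n = len(rows)
    if n == 0:
        return {"forwards": 0}
    settled = sum(1 for r in rows if r["htlc_settled"])
    total_ns = sum(r["ts_removed_ns"] - r["ts_added_ns"] for r in rows)
    return {
        "forwards": n,
        "settled_fraction": settled / n,
        "mean_resolution_s": total_ns / n / 1e9,
        "peak_outgoing_slots": max(r["outgoing_slots"] for r in rows),
    }
```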

In the case where folks are comfortable sharing their data with us,

we will follow best practices handling this sensitive information and

will not share the data onwards at all. Fields will also be anonymized

as described in the original email. Re your concerns around timestamps,

we can also fuzz timestamps, as only the resolution period matters to

our work (thanks for flagging!).
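
One way such fuzzing might work, as a sketch only: shift both timestamps of an HTLC by the same random offset, so the absolute times are obscured while `ts_removed_ns - ts_added_ns` is preserved exactly. The one-hour bound and the name `fuzz_timestamps` are hypothetical.

```python
import secrets

ONE_HOUR_NS = 3_600 * 10**9  # hypothetical bound on the shift


def fuzz_timestamps(ts_added_ns: int, ts_removed_ns: int,
                    max_shift_ns: int = ONE_HOUR_NS):
    """Shift both timestamps by one shared random offset.

    The offset is drawn uniformly from [-max_shift_ns, +max_shift_ns].
    """
    shift = secrets.randbelow(2 * max_shift_ns + 1) - max_shift_ns
    return ts_added_ns + shift, ts_removed_ns + shift
```

Drawing an independent offset per HTLC, as here, also scrambles the relative ordering of HTLCs; whether that is acceptable depends on the analysis being run.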

I hope that addresses your concerns. Research based on real world data

is always a difficult line to walk, but we believe it is worthwhile in

this case.

Cheers,

Carla + Clara

On Thu, Aug 3, 2023 at 4:54 AM Elias Rohrer wrote:

> Hi Carla + Clara,

>

> I want to prefix this by saying that I'm very familiar with how limiting

> the lack of available real-world datasets can be for conducting significant

> simulations and empirical experiments on Lightning.

>

> However, it may be noteworthy that long-term collection of the proposed

> fields could potentially allow to re-identify the anonymized channel

> counterparties based off some heuristics correlating with the public graph

> data, especially when datasets from multiple (possibly neighbouring)

> collection points will end up being combined. Subsequently, this might

> allow to draw further conclusions on transferred amounts, channel

> liquidities at particular times, and, as HTLC settlement/failure timestamps

> are recorded in nanosecond resolution, potentially even the payment

> destination's identity (cf. 1 <https://arxiv.org/pdf/2006.12143.pdf>).

>

> As surrendering this kind of data therefore requires a good level of trust

> in the researchers, it might be helpful (and best practise) if you could

> clarify upfront whether you intend to time-box the collection period, where

> the data would be stored, and who would have access to it. From my point of

> view clearly defining the collection period would also be mandatory as we

> don't want to incentivise node operators to collect and store HTLC data

> longer-term, especially if it's to this degree of detail.

>

> Best,

>

> Elias

