OK, I'm looking for some help from a fellow lnd node runner.

Problem 1: My connection to my peer keeps dropping about every 3.5 minutes.

Logs say "pong response failure" and "timeout while waiting for pong response -- disconnecting".

The peer's log says the same thing: it also had a timeout while waiting for a pong response.

I can reconnect with no problem at all.

There are no network issues such as packet loss (see the notes on the pcap below for the evidence to back this up). Tor is not in the mix for this test.

I am able to send sats if I do so quickly after connecting to the peer. So it seems like things can work properly if the connection issue can be sorted out.

I took a pcap of a connection, transaction and disconnection. Near the end, I see the client (node which initiated the connection) just absolutely slamming PSH,ACKs to the tune of 37 of them in just under 0.001 seconds. Then it sends a TCP Retransmission 0.006 seconds later and gets an ACK 0.036 seconds later, which is a perfectly reasonable response time.

The next batch is some TCP keepalives and keepalive ACKs, then some PSH,ACKs and ACKs with sub-ms response times, followed by a retransmit and an ACK from the other side.

Finally, 2 more keepalives and keepalive ACKs in 0.012 seconds, and then we get the FIN,ACK from the client followed by the RST,ACK from the server (the remote peer to which we connected).

The FIN,ACK did come 5 seconds after the last ACK, so I feel like the server should have responded sooner, but at the same time I don't feel like a 5 second lag should cause a connection to be dropped with no attempt ever made to reconnect. Also, these blitzkriegs of packets within 1ms are absurd.
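To put numbers on those bursts instead of eyeballing them, here is a rough Python sketch. It assumes the packet timestamps have already been exported as plain floats in seconds, for example with something like `tshark -r capture.pcap -T fields -e frame.time_relative` (the file name is made up). The function name `find_bursts` and the thresholds are my own placeholders, not anything from lnd or Wireshark.

```python
def find_bursts(timestamps, window=0.001, min_count=10):
    """Return (start_time, packet_count) for each run of packets
    that all fall within `window` seconds of the run's first packet."""
    ts = sorted(timestamps)
    bursts = []
    i = 0
    while i < len(ts):
        j = i
        # Extend the run while the next packet is still inside the window.
        while j + 1 < len(ts) and ts[j + 1] - ts[i] <= window:
            j += 1
        count = j - i + 1
        if count >= min_count:
            bursts.append((ts[i], count))
            i = j + 1  # skip past the burst we just recorded
        else:
            i += 1
    return bursts
```

Feeding it the timestamps of only the client's PSH,ACK packets would show whether the 37-packet burst is a one-off or happens before every disconnect.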

Any ideas on where I should look next? I guess take pcaps on both sides and compare them?
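If I do end up comparing pcaps from both sides, something like the following Python sketch is what I have in mind. It assumes each capture has been reduced to a list of `(raw_sequence_number, payload_length)` tuples for one direction of the TCP stream, for instance from tshark's `tcp.seq_raw` and `tcp.len` fields if your version has them. It compares byte ranges rather than individual packets, since segment boundaries may not match between the two captures, and it ignores sequence number wraparound. The function names are hypothetical.

```python
def merge_ranges(segments):
    """Collapse (seq, length) segments into sorted, non-overlapping [start, end) ranges."""
    ranges = sorted((seq, seq + length) for seq, length in segments if length > 0)
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def missing_bytes(sent, received):
    """Return byte ranges (start, end) that appear in `sent` but never in `received`."""
    covered = merge_ranges(received)
    gaps = []
    for start, end in merge_ranges(sent):
        cursor = start
        for c_start, c_end in covered:
            if c_end <= cursor:
                continue  # this received range lies entirely before the cursor
            if c_start >= end:
                break  # remaining received ranges lie after this sent range
            if c_start > cursor:
                gaps.append((cursor, c_start))
            cursor = max(cursor, c_end)
            if cursor >= end:
                break
        if cursor < end:
            gaps.append((cursor, end))
    return gaps
```

An empty result would back up the claim that nothing is being lost on the wire, which would point the finger at the application rather than the network.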

This is absolutely brutal. I wouldn't expect most sysadmins to go to this much trouble to track down this issue, let alone expect any normal human to do so.

Discussion

Way too many issues for normal humans…

If your end is working, then the problem is at the other end. Close the channel and open a new channel with a more stable peer.

They are both my nodes. I'm testing out the software.

No firewalls configured to drop packets, or something like fail2ban that may be mischaracterizing and dropping packets?

Nope.

Seeing some wild packet sizes in the pcap though... 26Kb... even 50Kb in a single packet? Seems sus 😑

You're not going to believe this. When I change the log level from info to debug, the problem disappears entirely. FML

nostr:nevent1qqsdyp0fynta04lnxusw2qkvjcn7ea73haelds7pr5s4ezm9spzykyqpzpmhxue69uhkummnw3ezumt0d5hsygxnp65cafj7j5ler2un76esafg7kv79qmu86j0kqzsnnthsp254zypsgqqqqqqs9ar6vj

lnd 0.19 ??