Yeah that is definitely an issue. I was thinking about that myself but I thought my note was already too long to include everything I was thinking about! I don’t really know how you get around that easily with multicast.
Discussion
For my design, it starts with subscribing to the multicast stream.
You then initiate a TCP connection to the source as a “management” connection and get the latest packet ID.
For a low-latency, high-throughput system, you watch for packet ID gaps and request retransmits for the missing IDs (a rough sketch follows below).
If the stream is low-throughput, you may send constant empty messages (which will increase B/W, and a higher rate of empty messages means faster recovery from loss).
You can also do a client ACK system for lower B/W usage if you do not have a very high number of clients.
You also keep buffer memory for retransmits, and that can run out, so clients that cannot recover fast enough need to start over (causing complete loss of the stream).
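Roughly something like this on the receiver side — a minimal sketch, assuming sequential integer packet IDs in an 8-byte header and a simple line-based request format on the management connection (the group, port, and the LATEST/RETX messages are placeholders I made up, not part of the actual design):

```python
import socket
import struct

MCAST_GRP = "239.1.2.3"          # placeholder multicast group
MCAST_PORT = 5000                # placeholder data port
MGMT_ADDR = ("192.0.2.1", 5001)  # placeholder source management address

# "Management" TCP connection to the source: fetch the latest packet ID
# at startup, then reuse the same connection to request retransmits.
mgmt = socket.create_connection(MGMT_ADDR)
mgmt_file = mgmt.makefile("r")
mgmt.sendall(b"LATEST\n")
expected_id = int(mgmt_file.readline()) + 1

# Subscribe to the multicast stream.
data_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
data_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
data_sock.bind(("", MCAST_PORT))
mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
data_sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    packet, _ = data_sock.recvfrom(65535)
    pkt_id = struct.unpack("!Q", packet[:8])[0]  # assumed 8-byte big-endian ID
    if pkt_id > expected_id:
        # Gap detected: ask the source to retransmit the missing range.
        mgmt.sendall(f"RETX {expected_id} {pkt_id - 1}\n".encode())
    expected_id = max(expected_id, pkt_id + 1)
```

The retransmits would be served from the source-side buffer mentioned above, which is the thing that can run out.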
That makes sense, so the TCP connection is used to request the retransmits?
When I was looking at this earlier today I came across a draft for multicast extensions to QUIC which looked interesting. I haven’t read the document completely but it does look like it includes recovery mechanisms for dropped packets. It needs to operate in conjunction with a unicast QUIC connection.
https://datatracker.ietf.org/doc/draft-jholland-quic-multicast/
You could do it via TCP, yes
After sleeping on it and trying to deal with the multicast socket dying randomly… I think the best approach would be to use multicast for discovery and then establish TCP + Noise connections to the discovered nodes. This would allow the UI to display which nodes are active, and enable reliable and secure transport for syncing.
Not sure why you need noise if you’re just syncing events.
If you want to do more in-depth sync, you should consider a privacy-preserving protocol.
I always wanted to do a Noise protocol as a TLS + WebSocket alternative, and this seemed like an opportune time to do it. Could broadcast pubkeys in the multicast gossip. But yeah, it could be optional.
The times can be changed based on requirements (a rough sketch of the hash check follows the list).
- Each client should assume it is not a master at the start
- A client that is a master should broadcast a discovery message every 5 seconds.
- A client that is a master should stop being a master if it sees another client broadcasting a discovery message.
- A client that has not seen a discovery message for 10 to 25 seconds (random for each client) should make itself a master and broadcast a discovery message.
- The discovery message should contain a unique nonce.
- Each client should broadcast a response containing a hash of its own ID, a private value (unique per group of people that want to sync), and the nonce.
- Clients should check that the hash matches what they expect by recalculating it from the nonce, the client’s ID, and what they assume is the correct private value.
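A rough sketch of the response/verification part — assuming SHA-256 over the concatenation of client ID, group private value, and nonce, and a verifier that knows the set of peer IDs it expects. The master-election timers are left out and all names are illustrative:

```python
import hashlib
import secrets

# Group-shared private value and the peer IDs this client expects to see.
# Both are placeholders; the protocol above only says such values exist.
PRIVATE_VALUE = b"group-shared-secret"
KNOWN_CLIENT_IDS = [b"client-a", b"client-b"]

def response_hash(client_id: bytes, nonce: bytes) -> bytes:
    """Hash binding a client ID and the group secret to the master's nonce,
    so no persistent identifier is ever broadcast in the clear."""
    return hashlib.sha256(client_id + PRIVATE_VALUE + nonce).digest()

def make_discovery_nonce() -> bytes:
    """Master side: a fresh unique nonce for each discovery broadcast."""
    return secrets.token_bytes(16)

def make_response(my_id: bytes, nonce: bytes) -> bytes:
    """Client side: respond with only the hash, never the raw ID."""
    return response_hash(my_id, nonce)

def identify_responder(response: bytes, nonce: bytes) -> bytes | None:
    """Check the hash against each expected ID, using the nonce and what we
    assume is the correct private value; None means the response is foreign."""
    for client_id in KNOWN_CLIENT_IDS:
        if response == response_hash(client_id, nonce):
            return client_id
    return None
```

Because the nonce changes every round, an observer sees a different hash each time and cannot correlate responses across rounds.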
Is this attempting to deal with the M * N problem?
m^2, but yes
The first part prevents exponentially scaling broadcast-storm-like behavior, while the second part means you do not have a persistent trackable identifier being broadcast to the networks you use.
this protocol was made originally for IRC-based /dev/random cross-seeding, while also allowing on demand requests if required
it still works to this day
Yeah, was thinking about how to deal with that without overcomplicating it. Not a big deal for small networks, but it would be important for conferences.
I think in those situations you would want the client to discover a relay and use that.
I wouldn’t want my phone to announce itself as a master node if there’s like 1000 nostr nodes on the network.
the master here would only coordinate the schedule of discovery messages, which does not change if you have 1 or 1000 clients
Another issue I’m running into is that the multicast just seems to stop working. Now learning about IGMP timeouts
The IGMP querier should send periodic queries to find out which hosts are listening to which groups, and should then keep the IGMP snooping table on the switch updated. If that isn’t working correctly, then the switch would have only seen the initial unsolicited IGMP join, which I guess is what may have timed out.
Yup, that was my guess. Going to add a periodic IGMP join and see if that fixes it.
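One way to force that periodic join is to drop and re-add the membership on a timer, which makes the kernel emit a fresh unsolicited IGMP membership report — a minimal sketch, with placeholder group, port, and interval:

```python
import socket
import struct
import time

MCAST_GRP = "239.1.2.3"   # placeholder group
MCAST_PORT = 5000         # placeholder port
REJOIN_INTERVAL = 60      # seconds; placeholder, keep it under the snooping timeout

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", MCAST_PORT))
mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

def rejoin() -> None:
    """Leave and immediately re-join so the kernel sends a new unsolicited
    membership report, refreshing the switch's snooping table even when
    there is no working IGMP querier on the segment."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# In a real receiver this would run on a timer or thread next to the recv loop.
while True:
    time.sleep(REJOIN_INTERVAL)
    rejoin()
```

The leave/re-join may drop a few packets if this socket holds the only membership on the host, so a working querier on the segment is still the cleaner long-term fix.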