I'm the author of gossip, a desktop client rather than a mobile app, but I have something to say on this. Gossip downloads only about 4 MB when I start it in the morning and run it for an hour. Since that is several orders of magnitude less than some other clients, I thought I'd make a list of the reasons why:

1. Duplicate Events - many clients subscribe to the same filters on all of the "read" relays. So if a person has 10 read relays, they get each event 10 times. They could instead subscribe in a way that fetches only N copies, where N is a client setting (gossip defaults to 2 or 3).

2. Not Dynamically Connecting to Relays - when clients don't dynamically connect to the 'write' relays of the people you follow, users are incentivized to add lots and lots of relays as a hack to try to get that content, aggravating issue 1. If clients smartly went to the write relays (based on relay lists), all of the content a user has subscribed to would arrive (in the best case) and users would no longer feel the need to add massive numbers of read relays (see the relay-selection sketch after this list).

3. Counting how many followers you have is expensive. Kind-3 contact lists are long, and you need to pull one for each follower to make such a count. Especially if done across many relays (where the same lists are pulled multiple times, once per relay), this could be 10-20 MB on its own. And then, how often is the client triggered to recount?

4. Downloading of avatars: gossip caches these so it doesn't have to re-download them. Any client that uses an IMG tag and doesn't have browser caching is probably downloading these over and over, in the worst case every time a post scrolls into view.

5. Content images and content web page pre-rendering: This can be very expensive, but is probably unavoidable in rich-UI clients. Gossip is a "poor" UI client, without any images or prerendered links (it just shows links that you can click to open in your browser). But with caching, repeated downloading of the same thing can be avoided.

6. Re-checking of NIP-05 could be done periodically, perhaps daily if it failed or every 2 weeks if it passed; probably the worst strategy is re-checking every time a post scrolls into view (a sketch of such a schedule also follows this list).
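
To make points 1 and 2 concrete, here is a minimal sketch of the kind of relay selection I mean, assuming the client already has each followed author's relay list (e.g. from their kind-10002 relay list events). The names and the greedy tie-breaking are illustrative, not gossip's actual code:

```rust
use std::collections::{HashMap, HashSet};

/// Pick up to `n` write relays per followed author, then invert the map so
/// each relay gets one subscription covering only the authors who write there.
/// Preferring relays already in the plan keeps the total connection count down.
fn plan_subscriptions(
    write_relays: &HashMap<String, Vec<String>>, // author pubkey -> their write relays
    n: usize,                                    // how many copies of each event we accept
) -> HashMap<String, HashSet<String>> {          // relay url -> authors to ask it for
    let mut plan: HashMap<String, HashSet<String>> = HashMap::new();

    for (author, relays) in write_relays {
        // Prefer relays we already plan to connect to, so connections get reused.
        let mut candidates: Vec<&String> = relays.iter().collect();
        candidates.sort_by_key(|r| if plan.contains_key(*r) { 0 } else { 1 });

        for relay in candidates.into_iter().take(n) {
            plan.entry(relay.clone()).or_default().insert(author.clone());
        }
    }
    plan
}

fn main() {
    let mut follows = HashMap::new();
    follows.insert(
        "npub_alice".to_string(),
        vec![
            "wss://relay.a".to_string(),
            "wss://relay.b".to_string(),
            "wss://relay.c".to_string(),
        ],
    );
    follows.insert(
        "npub_bob".to_string(),
        vec!["wss://relay.b".to_string(), "wss://relay.d".to_string()],
    );

    // With n = 2, each author's events are requested from at most 2 relays.
    for (relay, authors) in plan_subscriptions(&follows, 2) {
        println!("{relay} -> {authors:?}");
    }
}
```

The point is that the subscription sent to each relay only names the authors who actually write there, instead of blasting the same broad filter at every configured relay.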
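
And a sketch of the NIP-05 re-check schedule from point 6, using the intervals suggested above (daily after a failure, every two weeks after a success); the struct and function names are just illustrative:

```rust
use std::time::{Duration, SystemTime};

/// Cached result of the last NIP-05 verification for one pubkey.
struct Nip05Check {
    last_checked: SystemTime,
    valid: bool,
}

/// Re-check daily after a failure, every two weeks after a success.
/// Never re-check just because a post scrolled into view.
fn needs_recheck(check: Option<&Nip05Check>, now: SystemTime) -> bool {
    match check {
        None => true, // never checked before
        Some(c) => {
            let interval = if c.valid {
                Duration::from_secs(14 * 24 * 60 * 60) // 2 weeks
            } else {
                Duration::from_secs(24 * 60 * 60) // 1 day
            };
            now.duration_since(c.last_checked)
                .map(|elapsed| elapsed >= interval)
                .unwrap_or(false) // clock went backwards: don't re-check yet
        }
    }
}

fn main() {
    let now = SystemTime::now();
    let passed_recently = Nip05Check { last_checked: now, valid: true };
    let failed_days_ago = Nip05Check {
        last_checked: now - Duration::from_secs(3 * 24 * 60 * 60),
        valid: false,
    };
    assert!(!needs_recheck(Some(&passed_recently), now));
    assert!(needs_recheck(Some(&failed_days_ago), now));
    assert!(needs_recheck(None, now));
    println!("NIP-05 recheck schedule behaves as expected");
}
```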

There are probably others.

damus does all of these except 1, since that would possibly miss messages.

Discussion

Excellent!

Nice.

Point 1 can't really be addressed without addressing point 2 first because, as you say, you would miss messages. If you know that person P posts to relays (X, Y, Z), you can ask 2 of those relays (say X and Z) for those notes, and because both should have them, that's usually good enough. If not, ask 3 relays.

If you instead ask the user-configured relays for P's posts, then asking only 2 of those relays is probably going to miss tons of messages. But even if you ask all of the user-configured relays, you will both miss many messages and download a lot of duplicate data. The only way to get all the messages of all the people followed (and avoid excessive duplicate downloading) is to go out to that unloved relay in Timbuktu that the user didn't configure (but that P has in his relay list) and fetch them from where P wrote them.

Of course, fetching from relays that the user didn't configure has implications.

There's another issue where many relays are overloaded and don't return things in time, so relying on a subset of relays can make loading slower. Perhaps you could collect response-time stats and prioritize the fast ones? Fetching from all of them just seems simpler, even though it can be bandwidth intensive.
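
Something like this minimal sketch, which keeps an exponential moving average of each relay's response time and ranks candidates fastest-first; all names here are made up for illustration:

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Exponential moving average of response times, per relay.
#[derive(Default)]
struct RelaySpeed {
    avg_ms: HashMap<String, f64>,
}

impl RelaySpeed {
    /// Record how long a relay took to answer a REQ (e.g. time to EOSE).
    fn record(&mut self, relay: &str, elapsed: Duration) {
        let ms = elapsed.as_secs_f64() * 1000.0;
        self.avg_ms
            .entry(relay.to_string())
            .and_modify(|avg| *avg = 0.8 * *avg + 0.2 * ms) // smooth over recent samples
            .or_insert(ms);
    }

    /// Candidate relays ordered fastest-first; relays with no stats go last.
    fn ranked<'a>(&self, candidates: &'a [String]) -> Vec<&'a String> {
        let mut ranked: Vec<&String> = candidates.iter().collect();
        ranked.sort_by(|a, b| {
            let a_ms = self.avg_ms.get(*a).copied().unwrap_or(f64::MAX);
            let b_ms = self.avg_ms.get(*b).copied().unwrap_or(f64::MAX);
            a_ms.partial_cmp(&b_ms).unwrap()
        });
        ranked
    }
}

fn main() {
    let mut speed = RelaySpeed::default();
    speed.record("wss://relay.fast", Duration::from_millis(120));
    speed.record("wss://relay.slow", Duration::from_millis(2500));

    let candidates = vec![
        "wss://relay.slow".to_string(),
        "wss://relay.fast".to_string(),
        "wss://relay.unknown".to_string(),
    ];
    println!("{:?}", speed.ranked(&candidates)); // fast, slow, unknown
}
```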

So my client subscribes to all the pubkeys I follow. Could it remember which 1 or 2 relays responded fastest for each pubkey, and after the first poll only subscribe to those fast relays for each pubkey for the remainder of the session?

Seems like it could save a lot of bandwidth and relay load.
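
A rough sketch of what that could look like, assuming the first poll goes out broadly and the client then remembers the fastest responders per pubkey for the rest of the session; names are illustrative:

```rust
use std::collections::HashMap;

/// For each followed pubkey, remember which relays answered fastest on the
/// first poll, and use only those for the rest of the session.
#[derive(Default)]
struct SessionRelayChoice {
    fastest: HashMap<String, Vec<String>>, // pubkey -> chosen relays
}

impl SessionRelayChoice {
    /// `responses` are (relay url, milliseconds until a matching event or EOSE)
    /// from the initial broad poll for this pubkey.
    fn learn(&mut self, pubkey: &str, mut responses: Vec<(String, u64)>, keep: usize) {
        responses.sort_by_key(|pair| pair.1); // fastest first
        let chosen = responses
            .into_iter()
            .take(keep)
            .map(|(relay, _)| relay)
            .collect();
        self.fastest.insert(pubkey.to_string(), chosen);
    }

    /// Relays to use for this pubkey now; `None` means we still need a broad poll.
    fn relays_for(&self, pubkey: &str) -> Option<&Vec<String>> {
        self.fastest.get(pubkey)
    }
}

fn main() {
    let mut choice = SessionRelayChoice::default();
    choice.learn(
        "npub_alice",
        vec![
            ("wss://relay.a".to_string(), 900),
            ("wss://relay.b".to_string(), 150),
            ("wss://relay.c".to_string(), 300),
        ],
        2,
    );
    // Subsequent subscriptions for npub_alice go only to relay.b and relay.c.
    println!("{:?}", choice.relays_for("npub_alice"));
}
```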

Fetching of events may not be a lot of bandwidth, relatively speaking; it would have to be measured. I suspect the web content referenced by events constitutes a lot more data than the event structures themselves.