Blake
b2dd40097e4d04b1a56fb3b65fc1d1aaf2929ad30fd842c74d68b9908744495b
#Bitcoin #Nostr #Freedom wss://relay.nostrgraph.net

Sure. I’ve got 2,319 unique events for you and here is the top breakdown per relay.

Keep in mind this doesn’t mean they still have the event - only that they broadcast it at some point.

Interesting.. and likely unsustainable šŸ˜‰

I’d guess re-hydration will need to rate limit itself when broadcasting out to relays - or the result will be that some events go through and some don’t.

And not all events should be re-hydrated - old metadata or contact list events, for example. There’s no need for the network to see them again; you’d only care if you’re doing analytics or delta tracking on them.
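
A minimal sketch of what that could look like - the skip list and rate limit below are assumptions, not NostrGraph’s actual values:

```python
import time

# Sketch only: skip kinds that don't need re-broadcasting and pace the sends
# so relays aren't flooded. SKIP_KINDS and the rate limit are assumptions.
SKIP_KINDS = {0, 3}            # stale metadata / contact lists
MAX_SENDS_PER_SECOND = 20      # would need tuning per relay

def rehydrate(events, send_to_relays):
    interval = 1.0 / MAX_SENDS_PER_SECOND
    for event in events:
        if event["kind"] in SKIP_KINDS:
            continue               # only analytics/delta tracking cares about these
        send_to_relays(event)      # broadcast to the target relay set
        time.sleep(interval)       # crude rate limit between broadcasts
```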

And here are your pubkey results. If you DM me a list of pubkeys, I can send you the results.

Over the past hour I’ve averaged 10k events/minute, with about two-thirds filtered as spam - leaving around 3k (including duplicates) that get processed.

I import relays from relay hints and from kind 2 and kind 3 events. I also track seen_by for events, which records a source relay reference.
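
Roughly how that harvesting might work - a sketch under my assumptions about event shapes (kind 2 content as a relay URL, kind 3 content as a relay map, relay hints in the third slot of e/p tags), not the exact pipeline:

```python
import json

def extract_relays(event, source_relay, seen_by):
    """Collect candidate relay URLs from one event and record where we saw it."""
    relays = set()
    if event["kind"] == 2:                    # "recommend relay": content is a URL
        relays.add(event["content"].strip())
    elif event["kind"] == 3:                  # contact list: content may be a relay map
        try:
            relays.update(json.loads(event["content"]).keys())
        except (ValueError, AttributeError):
            pass
    for tag in event.get("tags", []):         # e/p tags can carry a relay hint at index 2
        if len(tag) >= 3 and tag[0] in ("e", "p") and tag[2].startswith(("wss://", "ws://")):
            relays.add(tag[2])
    seen_by.setdefault(event["id"], set()).add(source_relay)
    return relays
```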

The NIPs and the real world differ a bit when it comes to relay data, though the new relay NIP - which I haven’t looked at yet - may be useful.

I’ve found over 700 relays, a few of which I still need to de-dupe/clean the data for. Only 416 have successfully connected at least once.

At some point I’ll use my relay health checker to automate the aggregation so it connects to healthy relays as they’re discovered. But not all relays add value..
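
The health check itself can be as simple as an open-then-close probe. A sketch using the `websockets` library (the timeout is arbitrary, and the real checker presumably does more than this):

```python
import asyncio
import websockets  # pip install websockets

async def is_healthy(relay_url, timeout=5.0):
    """Treat a relay as healthy if a websocket connection opens within the timeout."""
    try:
        ws = await asyncio.wait_for(websockets.connect(relay_url), timeout)
        await ws.close()
        return True
    except Exception:
        return False

async def healthy_relays(relay_urls):
    results = await asyncio.gather(*(is_healthy(url) for url in relay_urls))
    return [url for url, ok in zip(relay_urls, results) if ok]
```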

Relays are going to have to start calculating a 'value score' for events and pubkeys. 

Only the events and pubkeys with value will survive being dropped - unless perhaps you pay a relay for persistence. What is value? How best to calculate it? Access frequency/recency? Keeping whichever events give relay users the most value? How could the value score be gamed/abused? We don't know exactly yet - and if every relay uses the same 'value' scoring, they all end up storing the same higher-'value' data... aka convergence + centralisation.
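
To make the question concrete, here's one naive way a relay could combine frequency and recency - the half-life and weighting are made up, and shared defaults like these are exactly what would drive the convergence problem:

```python
import math
import time

def value_score(access_count, last_access_ts, now=None, half_life_days=30):
    """Toy score: log-scaled access frequency decayed by recency (illustrative only)."""
    now = now or time.time()
    age_days = (now - last_access_ts) / 86400
    recency = 0.5 ** (age_days / half_life_days)   # halves every half_life_days
    frequency = math.log1p(access_count)           # diminishing returns on raw hits
    return frequency * recency
```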

Why? Well, managing databases in the hundreds of GB isn't just start-and-forget. Cheaper cloud servers often have a base disk of around 100GB, and relays need to avoid being abused as long-term storage for misc data. Within six months we could start to hit some of these growth challenges for relays.

This will be especially important for the smaller relay ecosystem, to enable smaller relays to keep existing and stay easy enough to operate. Even with a pay-to-relay gate, a malicious user could covertly take up GBs of DB storage with custom event data. Maybe paid relay users pick a storage plan with a max GB cap? Maybe relays only accept certain event kinds? Again, how do we decide what to keep and what to drop..?
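
A toy version of that kind of gate - a per-user quota plus a kind allowlist; the kinds and limits here are purely illustrative:

```python
ACCEPTED_KINDS = {0, 1, 3, 7, 42}     # example allowlist, not a recommendation

def accept_event(event, event_size_bytes, user_bytes_used, user_quota_bytes):
    """Decide whether to persist an event under a simple quota + allowlist policy."""
    if event["kind"] not in ACCEPTED_KINDS:
        return False                  # drop kinds the relay doesn't want to persist
    if user_bytes_used + event_size_bytes > user_quota_bytes:
        return False                  # storage plan exhausted; reject (or prompt an upgrade)
    return True
```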

I see bursts of up to 2,700 unique events/minute (45/sec) after filtering out spam.

I’m unsure of the exact cause, but some relays seem to dump a bunch of events in a burst every so often - and these are cleaned events, without spam or dupes. Maybe relay performance/load lag?

I’ve had to focus on spam filtering, scaling the NostrGraph aggregation, and DM migrations for performance.

That side is pretty happy again, with a few things left to finish.. but I hope to circle back to my bots once this stuff is settled. It should be a solid foundation for any future Nostr processors or bots going forward.

The archive bot, remind-me bot, relay recommendations and a few other things are all still on my list to circle back to.

It’s not a fair 1:1 comparison, but if Twitter has 500MM tweets per day (the best number I found), that’s 350k/minute.

At today’s 12k Nostr events/min (relay aggregation, bursts only, with spam and dupes in that count, and not just kind 1/42), that’s around 3.5% (or 1/30th) of Twitter.

On second thoughts, I forgot 60% of those Twitter tweets are spam anyway šŸ˜‚

My time got sucked up by scaling for Nostr’s growth. I’ve since been refactoring almost everything and migrating the database to deal with its size and future size. Oh, and spam… huge time sink, but great results.

The PoC webpage was never fully hooked up to the API, but everything else was built. I just needed solid data, and I noticed discrepancies in my follower and following counts, which needed review first.

My main dependencies were: do I have your accurate contact list, their relay data, and enough events to analyse? And can I make the recommendations fast?

I’m hoping I have perhaps a couple of weeks left of migrating/scaling, as I’m comfortably processing 5k events/minute, with bursts up to 12k. Then I can circle back again.

(3) Always makes the flight quicker.

I’m at about 63GB de-duped and spam-filtered, with indexes. Postgres.

I haven’t purged stale contact lists or metadata events yet, however. I think they’re around 40% of the size.
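
The purge itself is fairly mechanical for replaceable kinds - a sketch assuming a Postgres table shaped like `events(id, pubkey, kind, created_at)`, which is a guess, not the real schema:

```python
import psycopg2  # pip install psycopg2-binary

# Keep only the newest kind 0 (metadata) and kind 3 (contact list) event per
# pubkey; everything older is the stale ~40% mentioned above.
PURGE_SQL = """
DELETE FROM events e
USING events newer
WHERE e.kind IN (0, 3)
  AND newer.kind = e.kind
  AND newer.pubkey = e.pubkey
  AND newer.created_at > e.created_at;
"""

def purge_stale(dsn):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(PURGE_SQL)
        return cur.rowcount  # number of stale rows removed
```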

If you want a lot of data, try persisting Nostr events!

How’s the Arcade City marketplace Nostr stuff going? I thought it was functional but early? Unsure.

Umm, that’s not how diamonds are formed.