What kind of throughput do you need to handle?

I’m at around ~60 spam events/second and ~30 non-spam (rough averages), and bursts of 250/second during imports were handled easily. Basically I process about 10X the number of unique events that remain after spam removal - something like 8-10 million events/day including spam and dupes.
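(For context, the per-second and per-day figures roughly line up; back-of-the-envelope:)

```python
# Back-of-the-envelope check on the numbers above.
per_sec = 60 + 30            # spam + non-spam averages
per_day = per_sec * 86_400   # seconds in a day
print(per_day)               # ~7.8M/day, in line with the 8-10M figure (with spam and dupes)
```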

I’ve got two validation workers (I tried six at one point, but didn’t need them), four persist workers, and basically a Redis stream setup. Validation worker idle time averages 1ms (it batches), and persistence averages around 2-4ms. The validation workers also call the spam ML model.
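If it helps, here’s roughly the shape of the validation worker loop - just a sketch using redis-py with placeholder names (`events:incoming`, `validators`, `validate_batch`), not my actual code:

```python
# Sketch of a batching validation worker on a Redis stream (placeholder names).
import json
import redis

r = redis.Redis()
STREAM, GROUP, CONSUMER = "events:incoming", "validators", "validator-1"

# Create the consumer group once; ignore the error if it already exists.
try:
    r.xgroup_create(STREAM, GROUP, id="$", mkstream=True)
except redis.exceptions.ResponseError:
    pass

def validate_batch(events):
    """Placeholder: signature checks plus the spam ML model, per batch."""
    return events

while True:
    # Pull up to 100 new entries, blocking up to 1s when the stream is idle.
    resp = r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=100, block=1000)
    if not resp:
        continue
    for _stream, entries in resp:
        ids = [entry_id for entry_id, _ in entries]
        events = [json.loads(fields[b"event"]) for _, fields in entries]
        for ev in validate_batch(events):
            # Hand validated events to the persist workers on another stream.
            r.xadd("events:validated", {"event": json.dumps(ev)})
        r.xack(STREAM, GROUP, *ids)
```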

Streams helped a lot, and so did batching. Deferring work as well - I process tags, meta events, zaps, contact lists, db usage, PoW, relay meta, all after the fact - but a relay doesn’t need that post-processing.
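Concretely, “deferring” just means the persist worker writes the event and drops it onto another stream; separate consumers pick up the tag/zap/meta work whenever they get to it. Rough sketch (again, illustrative names only):

```python
# Sketch of deferring post-processing: persist first, enqueue the rest.
import json
import redis

r = redis.Redis()

def persist_event(ev):
    """Placeholder for the actual database write."""
    ...

def handle_validated(ev):
    persist_event(ev)
    # Tags, meta events, zaps, contact lists, PoW, relay meta, etc. get
    # handled later by separate consumers reading this stream.
    r.xadd("events:postprocess", {"event": json.dumps(ev)})
```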

I also don’t broadcast any events until they have been persisted and validated. Mostly so I can de-dupe too.
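The de-dupe itself can be as simple as an atomic “first writer wins” check on the event id right before broadcast - something like this (key prefix and TTL are arbitrary):

```python
# Sketch: only broadcast an event the first time its id is seen.
import redis

r = redis.Redis()

def broadcast_if_new(event_id: str, payload: bytes) -> bool:
    # SET ... NX is atomic, so concurrent workers can't both "win".
    first_time = r.set(f"seen:{event_id}", 1, nx=True, ex=7 * 24 * 3600)
    if first_time:
        r.publish("broadcast", payload)  # placeholder for the actual fan-out
    return bool(first_time)
```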

I actually stopped connecting to Damus (for aggregation) a little while back due to spam volume. Basically other relays like nos.lol have the same data but filtered :)

Happy to chat more. What I have has been pretty low-touch for a while now. It has limitations around broadcasting older events if the workers are offline for a while or when importing missed events - but that can be addressed or filtered as desired.

Discussion

Going to let #[4] answer since she will architect the data pipeline, but your setup sounds somewhat similar to what we’ve discussed.

To your comment re: Damus, that is what we are seeing as well. I saw 800+ events/second come in earlier when it got backed up for ~2 minutes.

Sounds like a great setup! The main bottleneck we have now is that the plug-in architecture in strfry is synchronous - it waits on our verdict for each message before moving on to the next, and it doesn’t support async at this point (just one instance per relay we are streaming from). It also happens before deduping, and we have no access to network information like IPs that could help us manage it at a network level, since it’s coming from the relays (your setup sounds similar in that way). Right now the spam detection is fairly straightforward and just accesses metadata in Redis, but the rest of the system is light and/or async, and funneling it all through a single linear point just can’t handle bursts. We also don’t have much control, with the current implementation, over how data is handled as it starts to get backed up during bursts.

We actually have RAM to spare and it’s our disk taking a beating - so we would love to have a little more control over that, even if it’s just distributing some of the I/O across different mounted disks. I don’t think we need too much more to handle current-day volume (though the Damus relay definitely makes the linear bottleneck an issue), but I’m also concerned about scale if nostr breaks into new markets. We really want to maintain availability as a relay, especially as a premium one people are paying for, so we want to be sure we can handle spikes gracefully!

I also anticipate the spam detection will grow in complexity, and I want to make sure we can distribute the processing to prevent latency issues. The queues would take some of the strain off of strfry when things get busy and give us some ability to take advantage of autoscaling for efficient infra usage and preventing latency. Sounds like we’re doing similar things though - may be worth collaborating, especially if we can design components that have crossover as utilities.
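For the queue idea, the rough shape we’re picturing is a write-policy plugin that only does a cheap cached lookup inline and pushes anything it hasn’t scored yet onto a queue for async scoring. This is just a sketch based on my reading of strfry’s plugin protocol (one JSON object per line on stdin, one `{"id", "action", "msg"}` object per line on stdout) - double-check the field names against the strfry docs, and the Redis keys/streams here are made up:

```python
#!/usr/bin/env python3
# Sketch of a strfry write-policy plugin that keeps the synchronous path cheap:
# consult a cached verdict in Redis, enqueue unscored events for async workers.
import json
import sys
import redis

r = redis.Redis()

for line in sys.stdin:
    req = json.loads(line)
    ev = req["event"]
    event_id = ev["id"]

    verdict = r.get(f"verdict:{event_id}")
    if verdict is None:
        # Unknown event: hand it to the async scorers and accept for now
        # (could default to "shadowReject" instead during bad bursts).
        r.xadd("events:to_score", {"event": json.dumps(ev)})
        action = "accept"
    else:
        action = verdict.decode()  # "accept" / "reject" / "shadowReject"

    sys.stdout.write(json.dumps({"id": event_id, "action": action, "msg": ""}) + "\n")
    sys.stdout.flush()
```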

🫡 🙏