What kind of throughput do you need to handle?
I’m at around ~60 spam events/second and ~30 non-spam (rough averages). I was handling 250/second easily during bursts and when importing events. Basically I process about 10X the number of events that end up unique after spam removal - something like 8-10 million events/day including spam and dupes.
I’ve got two validation workers (I tried six at one point, but didn’t need them), four persist workers, and basically a Redis Streams setup. Validation worker idle time is 1ms on average (it batches), and persistence averages around 2-4ms. The validation worker also calls the spam ML model.
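For a rough idea, the validation loop is basically a batched consumer-group read. Here’s a minimal sketch in Go with go-redis - stream/group names and the validEvent/isSpam helpers are made up for illustration, not my actual code:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// validEvent stands in for signature/format checks (hypothetical).
func validEvent(raw string) bool { return true }

// isSpam stands in for the spam ML model call (hypothetical).
func isSpam(raw string) bool { return false }

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	// Assumes the group already exists:
	//   XGROUP CREATE events:incoming validators $ MKSTREAM

	for {
		// Pull up to 100 events per batch, blocking briefly when quiet.
		streams, err := rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
			Group:    "validators",
			Consumer: "validator-1",
			Streams:  []string{"events:incoming", ">"},
			Count:    100,
			Block:    time.Second,
		}).Result()
		if err == redis.Nil {
			continue // nothing new this cycle
		}
		if err != nil {
			fmt.Println("read error:", err)
			continue
		}

		for _, s := range streams {
			for _, msg := range s.Messages {
				raw, _ := msg.Values["event"].(string)
				if validEvent(raw) && !isSpam(raw) {
					// Hand off to the persist workers on another stream.
					rdb.XAdd(ctx, &redis.XAddArgs{
						Stream: "events:validated",
						Values: map[string]interface{}{"event": raw},
					})
				}
				rdb.XAck(ctx, "events:incoming", "validators", msg.ID)
			}
		}
	}
}
```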
Streams helped a lot. Batching does too. Deferring work as well - I process tags, meta events, zaps, contact lists, db usage, pow, relay meta, all after the fact - but a relay doesn’t need that post-processing.
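Deferring is just pushing a small job onto a separate, low-priority stream once the event is persisted - something along these lines (same Go/go-redis assumption; stream name and job shape are illustrative):

```go
package relay

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// enqueueDeferred pushes post-processing work (tags, zaps, contact lists,
// pow, relay meta, ...) onto a low-priority stream so the hot path only
// validates and persists. Stream name and job shape are illustrative.
func enqueueDeferred(ctx context.Context, rdb *redis.Client, eventID, kind string) error {
	return rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "events:postprocess",
		Values: map[string]interface{}{
			"id":   eventID,
			"kind": kind, // e.g. "tags", "zap", "contact_list"
		},
	}).Err()
}
```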
I also don’t broadcast any events until they have been persisted and validated. Mostly so I can de-dupe too.
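The de-dupe guard is basically a SETNX on the event id right before broadcasting - a sketch under the same assumptions (key prefix, TTL, and the send callback are made up):

```go
package relay

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// broadcastIfNew only broadcasts an event after it has been persisted,
// using SETNX on the event id as the de-dupe guard. Key prefix, TTL, and
// the send callback are illustrative.
func broadcastIfNew(ctx context.Context, rdb *redis.Client, eventID, raw string, send func(string)) error {
	fresh, err := rdb.SetNX(ctx, "seen:"+eventID, 1, 24*time.Hour).Result()
	if err != nil {
		return err
	}
	if fresh {
		send(raw) // push to connected subscribers / downstream relays
	}
	return nil
}
```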
I actually stopped connecting to Damus (for aggregation) a little while back due to spam volume. Basically other relays like nos.lol have the same data but filtered :)
Happy to chat more. What I have has been pretty low touch for a while now. It has a limitation around broadcasting older events if the workers have been offline for a while or when importing missed events, but that can be addressed or filtered as desired.