Not documenting, but tracking, in that spam attacks are more visible when aggregating relay data, but also potentially more bursty and annoying.

I’ve got content-based spam down to effectively 0 using ML. Just like email spam, some will slip through, but it’s pretty decent and can improve as we scale. I open sourced the training data and code - but I have an update I’ve yet to release. Obviously it has limits tho.

This attack is kind of new. It’s more just bloating DBs and clogging any relay processing. And events without content are harder to evaluate without touching the DB to query for state - the validation I have working at present is stateless.
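
To make "stateless validation" concrete, here is a minimal sketch of checks a relay could run on a single event without touching the DB. It is not taken from any actual relay code: the limits (`MAX_TAGS`, `MAX_EVENT_BYTES`, `MAX_FUTURE_SKEW`) and the function names are hypothetical, and signature verification is left out because it needs a Schnorr library. The id check follows the NIP-01 serialization as I understand it.

```python
import hashlib
import json
import time

# Hypothetical limits, chosen only for illustration.
MAX_TAGS = 100
MAX_EVENT_BYTES = 16384
MAX_FUTURE_SKEW = 900  # seconds

REQUIRED_FIELDS = {
    "id": str, "pubkey": str, "created_at": int,
    "kind": int, "tags": list, "content": str, "sig": str,
}


def event_id(event):
    """SHA-256 of the compact JSON array [0, pubkey, created_at, kind, tags, content]."""
    payload = json.dumps(
        [0, event["pubkey"], event["created_at"], event["kind"],
         event["tags"], event["content"]],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stateless_reject_reason(event, now=None):
    """Return a rejection reason, or None if the event passes every check.

    Nothing here queries stored state, so it cannot catch abuse that only
    shows up across many events (for example repeated content-free events).
    """
    now = int(time.time()) if now is None else now
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(event.get(field), expected_type):
            return f"bad or missing field: {field}"
    if not all(isinstance(tag, list) and all(isinstance(v, str) for v in tag)
               for tag in event["tags"]):
        return "tags must be lists of strings"
    if len(event["tags"]) > MAX_TAGS:
        return "too many tags"
    # Approximate size of the event once re-serialized as JSON.
    if len(json.dumps(event).encode("utf-8")) > MAX_EVENT_BYTES:
        return "event too large"
    if event["created_at"] > now + MAX_FUTURE_SKEW:
        return "created_at too far in the future"
    if event_id(event) != event["id"]:
        return "id does not match event hash"
    return None
```

Checks like these are cheap and catch malformed or oversized events, but an empty-content event that is well formed passes all of them, which is why that kind of attack pushes you towards stateful checks.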

Discussion

Could you make your content-based spam reduction ML open source?

We’ve got good diversity in relays at the moment. If we separate relay security and make that global we can retain the optionality for implementations to keep the network strong.

Something like how pihole lists work, relays could access security modules from different people doing different things.

Yep. It’s already open sourced - but I do have another ~7k spam training examples since the last push. I also have a Bayes training example in a local commit which works awesome.

I’ll try to push my latest commit to GitHub this weekend. Repo below.

https://github.com/blakejakopovic/nostr-spam-detection
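
For anyone unfamiliar with the Bayes approach mentioned above, here is a minimal sketch of a naive Bayes text classifier using only the Python standard library. It is an illustration of the general technique, not the code in the repo: the class name, the tokenizer and the `"spam"`/`"ham"` labels are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumes at least one example of each label has been trained
    before scoring.
    """

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        # Vocabulary size across both classes, used for smoothing.
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_words = sum(self.word_counts[label].values())
        # Log prior: fraction of training notes carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        return self.log_score(text, "spam") > self.log_score(text, "ham")
```

You call `train(text, "spam")` or `train(text, "ham")` for each labelled note, then `is_spam(text)` on new ones. A classifier like this only looks at content, so it does nothing for events that have none.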

Nice, thank you for sharing 🤙