Following up with a repo and some results from experimenting with Nostr spam detection. It looks like we can achieve upwards of 98% accuracy.

The dataset is a little targeted and less generic; however, it appears to perform well today. The repo has a README with more details. Happy to experiment and collaborate with anyone interested.

Even if Nostr relays don't filter these events, simply embedding an event key such as meta.spam_score=0.00 into the events they serve would mean clients can choose to ignore it, or set their own score thresholds for event visibility.
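
To make the client side of that idea concrete, here is a rough Python sketch of threshold filtering. It is only an illustration under assumptions: it treats the score as a nested `meta` object on the event JSON (one possible reading of meta.spam_score), and the names `spam_score`, `visible_events`, and `DEFAULT_THRESHOLD` are hypothetical, not taken from the repo or from any Nostr spec.

```python
from typing import Optional

# Hypothetical client-side default; each user could choose their own value.
DEFAULT_THRESHOLD = 0.5


def spam_score(event: dict) -> Optional[float]:
    """Return the relay-supplied score, or None if no usable score is attached.

    Assumes the relay adds {"meta": {"spam_score": <number>}} to the event JSON.
    """
    meta = event.get("meta")
    if not isinstance(meta, dict):
        return None
    score = meta.get("spam_score")
    # Reject booleans and non-numeric values rather than guessing.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None
    return float(score)


def visible_events(events: list, threshold: float = DEFAULT_THRESHOLD) -> list:
    """Keep events that have no score, or whose score is below the threshold."""
    kept = []
    for event in events:
        score = spam_score(event)
        if score is None or score < threshold:
            kept.append(event)
    return kept


if __name__ == "__main__":
    sample = [
        {"id": "a", "content": "gm", "meta": {"spam_score": 0.0}},
        {"id": "b", "content": "free sats!!!", "meta": {"spam_score": 0.97}},
        {"id": "c", "content": "relay sent no score"},
    ]
    for event in visible_events(sample):
        print(event["id"])
```

Events without a score are shown by default here, so relays that don't annotate anything keep working as they do today; a stricter client could hide unscored events instead.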

https://github.com/blakejakopovic/nostr-spam-detection

RE: #[0]

CC: #[1] #[2] #[3] #[4] #[5] #[6] #[7]

Discussion

This is great work! Is NSFW content or impersonation classified as spam in this?

Awesome work! Thank you for providing this.

This is great! I was thinking along the same lines that it would be good for relays to be able to tag a note as potential spam with a spam score so that clients can filter based on a user’s personal preferences. That way we don’t completely lose notes due to false positives.

Interesting stuff, machine learning for spam detection. Btw, is there a high rate of false positives on additional data outside the test data? Could it be a case of overfitting?

Maybe it could be tested further on ham data from a paid relay.

@note1ldntl5ll238w5gav0sedrya2knnh9feu8aj3trh7yrq8zvt5c2lsm52rqr