Following up with a repo and some results from my Nostr spam detection experiments. It looks like we can achieve upwards of 98% accuracy.
The dataset is fairly targeted rather than generic, but the model appears to perform well on today's spam. The repo has a README with more details. Happy to experiment and collaborate with anyone interested.
Even if Nostr relays don't filter these events themselves, embedding an event key such as meta.spam_score=0.00 into the events they serve would let clients choose to ignore the score or set their own thresholds for event visibility.
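A rough sketch of what client-side filtering on such a score could look like. This is only an illustration: the nested `meta` / `spam_score` layout, the `spam_score` and `visible_events` helpers, and the sample events are all assumptions for the example, not something relays or the repo provide today.

```python
from typing import Iterable, List


def spam_score(event: dict, default: float = 0.0) -> float:
    """Read the hypothetical meta.spam_score value from an event.

    Falls back to `default` when the relay attached no score or the
    value cannot be parsed as a number.
    """
    meta = event.get("meta")
    if not isinstance(meta, dict):
        return default
    try:
        return float(meta.get("spam_score", default))
    except (TypeError, ValueError):
        return default


def visible_events(events: Iterable[dict], threshold: float = 0.5) -> List[dict]:
    """Keep only events whose score is strictly below the user's threshold."""
    return [e for e in events if spam_score(e) < threshold]


# Made-up sample events, purely for illustration.
events = [
    {"id": "a", "content": "gm", "meta": {"spam_score": 0.02}},
    {"id": "b", "content": "free sats, click here", "meta": {"spam_score": 0.97}},
    {"id": "c", "content": "no score attached"},
]

for event in visible_events(events, threshold=0.5):
    print(event["id"], event["content"])
```

Unscored events default to 0.0 here, so they stay visible. A stricter client could pass a higher default to hide anything a relay has not scored.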
https://github.com/blakejakopovic/nostr-spam-detection
RE: #[0]
CC: #[1] #[2] #[3] #[4] #[5] #[6] #[7]
This is great work! Is NSFW or impersonation content classified as spam in this?
Awesome work! Thank you for providing this.
This is great! I was thinking along the same lines that it would be good for relays to be able to tag a note as potential spam with a spam score so that clients can filter based on a user’s personal preferences. That way we don’t completely lose notes due to false positives.
Interesting stuff, machine learning for spam detection. Btw, is there a high false positive rate on additional data outside the test set? Could it be a case of overfitting?
Maybe it could be tested further on ham data from a paid relay.
@note1ldntl5ll238w5gav0sedrya2knnh9feu8aj3trh7yrq8zvt5c2lsm52rqr
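One way the check suggested here might be run, as a minimal sketch: score a batch of notes known to be ham (for example from a paid relay) and count how many the model would flag. The `false_positive_rate` helper, the 0.5 threshold, and the scores below are placeholders for illustration, not measurements from the repo.

```python
from typing import Iterable


def false_positive_rate(ham_scores: Iterable[float], threshold: float = 0.5) -> float:
    """Fraction of known-ham notes whose spam score meets or exceeds the threshold."""
    scores = list(ham_scores)
    if not scores:
        raise ValueError("need at least one ham score")
    flagged = sum(1 for s in scores if s >= threshold)
    return flagged / len(scores)


# Placeholder scores for five ham notes; real values would come from the model.
ham_scores = [0.01, 0.03, 0.40, 0.75, 0.02]
print(f"false positive rate: {false_positive_rate(ham_scores):.2f}")
```

A rate on fresh ham that is much higher than on the original test split would be a hint of overfitting.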