I’m still working on Nostr spam detection and prevention - and ways the network can communicate about spam and filter it out.

Been focused on event processing, as the network has grown 100x since I last worked on scaling. That’s almost in a happy state - so I can focus on more innovative stuff.

Thoughts on perhaps a wss://spam.NostrGraph.net relay that only broadcasts kind 1984 events with a spam_score tag (where the score is likely > 90%). It does mean each spam message triggers a 1984 event (or perhaps a batched one) - and this could be thousands of events/minute or more in the future.
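A rough sketch of what one of those reports could look like, assuming a custom spam_score tag layered on top of the NIP-56 report format - the tag name, the percentage scale, and the placeholder ids are all assumptions, not a spec:

```python
import json
import time

# Hypothetical kind 1984 spam report with a proposed spam_score tag.
report = {
    "kind": 1984,
    "created_at": int(time.time()),
    "tags": [
        ["e", "<reported-event-id>", "spam"],  # NIP-56 report type
        ["p", "<reported-pubkey>"],
        ["spam_score", "97"],  # proposed tag: model confidence as a percentage
    ],
    "content": "automated spam report",
    # id, pubkey and sig would be filled in by the usual signing step
}
print(json.dumps(report, indent=2))
```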

I drop spam now and don’t persist those events, so having a meta.spam_score on events published to websockets by my API or relay is likely OK - but again, perhaps not ideal.
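Roughly, a published event augmented this way might look like the sketch below; the meta object and its 0-100 scale are assumptions, not part of the Nostr event format:

```python
# Hypothetical shape of a relay/API-published event carrying a spam score.
augmented_event = {
    "id": "<event-id>",
    "pubkey": "<author-pubkey>",
    "kind": 1,
    "created_at": 1700000000,
    "tags": [],
    "content": "gm",
    "sig": "<sig>",
    "meta": {"spam_score": 3},  # non-standard, added server-side after scoring
}
```

One caveat: anything outside the signed event is easy for intermediaries to strip or forge, so clients would have to trust the publishing relay or API - part of why this is perhaps not ideal.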

I’m leaning toward the approach of a relay service that can reject events. Similar to the authentication check that verifies a pubkey has paid, it could check whether the spam score is greater than a per-relay threshold. The issue then is that each relay needs this service with a model, and the model needs to keep learning - or it will miss new spam formats.
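As a sketch of that rejection service, assuming a strfry-style write-policy plugin that reads one JSON request per stdin line and writes one accept/reject decision per stdout line - the threshold and the score_event() model call are placeholders:

```python
import json
import sys

SPAM_THRESHOLD = 90.0  # hypothetical per-relay threshold, as a percentage

def score_event(event: dict) -> float:
    """Placeholder for the local spam model; returns a 0-100 spam score."""
    return 0.0  # wire the real classifier in here

for line in sys.stdin:
    request = json.loads(line)
    event = request["event"]
    score = score_event(event)
    reject = score > SPAM_THRESHOLD
    print(json.dumps({
        "id": event["id"],
        "action": "reject" if reject else "accept",
        "msg": f"blocked: spam_score={score:.1f}" if reject else "",
    }), flush=True)
```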

Thoughts?

Discussion

Zap ⚡️ you very much for your work, sir.

I was working on an AI model that detected adult-content images (and something similar for text), but it affected performance on the client side (particularly when working with the image component in Damus). Doing it at the relay would be good.

I think this is probably better done by another server (your server, for example) so as not to increase the burden on the relay.

You just need to send a kind 1984 event after doing the image analysis. Then the event can be used by anyone (client or relay).

https://github.com/nostr-protocol/nips/blob/master/56.md

I see content warnings as another layer or service that’s different from spam. It’s more intensive as it needs to read media and not just text. I think it could benefit the network too.

It could sit in the kind 1984 spec as a new type (I don’t think a content-warning flag exists today). I’m uninterested in any government-like ‘Net Nanny’ approach.

One difficulty is that if relays broadcast these events before any flagging service has reviewed them, you have a race condition where active users will likely see the content anyway, unless there’s a lag. Users who connect later would be fine. It’s a trade-off: slow down posting relative to broadcasting, or accept less protection.

A 1984-generating relay is a good idea. It would cause less traffic on the whole network, I guess.

I load and use your model with gunicorn in Docker. It performed really well, but my relay is small and can’t handle too many requests at once.

Yeah. I expected that model’s predictions to perform better on CPU. It runs better on a GPU, but I’m avoiding the server cost for now.

I’ve written a Bayes model trainer for that repo as well, but haven’t pushed the code yet. It’s pretty fast - maybe 100-200 req/sec on my laptop. I’ve been using gunicorn as well.

I’ll try to push the update this week. I have some new training data I can likely push too.
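Not the unpushed code itself, but a minimal sketch of what a Bayes trainer like that could look like, assuming scikit-learn and a labelled (text, spam/ham) corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy labelled corpus: 0 = ham, 1 = spam
texts = ["gm nostr", "good morning everyone",
         "FREE SATS!! click here", "claim your airdrop now"]
labels = [0, 0, 1, 1]

model = Pipeline([
    ("vectorize", CountVectorizer()),  # bag-of-words token counts
    ("classify", MultinomialNB()),     # naive Bayes over the counts
])
model.fit(texts, labels)

# P(spam) for a new note, scaled to the 0-100 score used above
print(model.predict_proba(["free sats here"])[0][1] * 100)
```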

Good news.

I would suggest separating the training machine and the API machine, and just syncing the trained data. That way, the GPU used to train the model is not resource-limited.

The architecture is like:

AI machine ----- API machine ----- Relay Gateway ----- Relay

The relay transfers events and spam labels to the AI machine for learning.

The AI machine transfers the trained model/data to the API machine for spam scoring.

The relay gateway sends events to the API for prediction, and gets a response from the API machine on whether the pubkey should be denied.

The relay gateway also examines the pubkey, e.g. whether it is a paid pubkey.

In addition, combining the API machine and the relay gateway is possible, but it depends on the service provider’s technology limitations (like CF Workers or some other providers).
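A rough sketch of that gateway-to-API step, assuming a hypothetical HTTP /predict endpoint on the API machine and an assumed response shape:

```python
import json
import urllib.request

API_URL = "http://api-machine.internal/predict"  # hypothetical endpoint

def should_deny(event: dict) -> bool:
    """Ask the API machine whether this event's pubkey should be denied."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Assumed response shape: {"spam_score": 97, "deny_pubkey": true}
    return bool(result.get("deny_pubkey", False))
```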