Blake
b2dd40097e4d04b1a56fb3b65fc1d1aaf2929ad30fd842c74d68b9908744495b
#Bitcoin #Nostr #Freedom wss://relay.nostrgraph.net

Signed up with the Eden Nostr paid relay. Was a nice and simple process.

https://eden.nostr.land/invoices

I was interested in this too - however identities are infinite, and so then are follower and following relationships. You need another quality metric; the follow link alone is not enough.

I also read a couple of studies about this and how their attempts to use Pagerank for human relationships yielded poor results. I’ve lost the links to them, but that doesn’t mean it can’t work.

I dislike approaches that marginalise new or less connected accounts - they create a barrier to entry and echo chambers, and connectedness isn’t directly correlated with actual content value.

I think we’ll end up with a bunch of signals, where no single signal is enough on its own: Pagerank, PoW, spam score, relationship distance, past interactions, verification, etc.
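
Rough sketch of what blending those signals could look like - the signal names, weights and normalisation below are made up for illustration, not anything implemented:

```python
# Sketch: combine several weak trust/quality signals into one score.
# Signal names, weights and the example values are illustrative only.

SIGNAL_WEIGHTS = {
    "pagerank": 0.25,               # graph reputation, normalised to 0..1
    "pow": 0.10,                    # NIP-13 proof-of-work difficulty, normalised
    "spam_score": 0.30,             # 1.0 - classifier spam probability
    "relationship_distance": 0.15,  # 1.0 if followed, decaying with hops
    "past_interactions": 0.15,
    "verification": 0.05,           # e.g. NIP-05 verified
}

def composite_score(signals: dict[str, float]) -> float:
    """Weighted sum of normalised signals (each expected in 0..1)."""
    return sum(weight * signals.get(name, 0.0)
               for name, weight in SIGNAL_WEIGHTS.items())

# A brand-new account with no graph presence but clean content still
# gets a usable score instead of being hard-filtered out.
print(composite_score({"spam_score": 0.95, "pow": 0.5}))
```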

The spam is more of an issue for unfiltered global streams. Any WoT or pubkey filtering likely wouldn’t extend out to the accounts generating the spam events - but it would also cut out many good accounts that aren’t generating spam.

Even if it’s not impacting users directly, it’s filling relays with worthless data.

Generally, the definitions below are what I used.

The issue with nsfw is that it’s a broader label than “spam or not spam”. Certainly you could build a model with more categories, or even make it multi-label.

The secondary issue is that nsfw is mostly visual (you can just have a client app setting to mask swear/adult words) - and training on event content that is minimal text plus URLs would perform really poorly without some image/media classifier too. Certainly a service that could exist as well (hot dog or not hot dog).

Impersonation often has overlap with spam, due to the dodgy call to action - however I haven’t targeted it directly. An example where impersonation could be ok is parody accounts or satire. I think it’s more a verification problem and less suited to text labelling.

#[10]

Nice.

Spam definitely is tough to define. When I made the dataset I mostly used the definitions of “is this content intended to deceive” (is there a victim?) or “is this promotional in an excessive way” (basically like Adblock). This means that explicit content or strong opinions and other content wasn’t labelled as spam.

One example I saw a lot of with kind=1984 was “marked as spam because foreign language”. I translated a lot of content that certainly wouldn’t fall under those above definitions - it was more a form of censorship.

There will never be a common standard or definition for Nostr spam that suits everyone - however something closer to Adblock and your spam email inbox is where I see this being most useful. The occasional false positive isn’t a big deal - especially when you can whitelist people as contacts.

I’m not sure paid relays are a suitable solution for spam mitigation. If I can pay $0.50-1.00 one time to be able to spam some call to action (like a referral or buy some product), the chance I can make 10x off that $0.50 one time expense is very high.

I could create 500+ rotating pubkeys with paid relay invoices and just post from a selection of 20-100 per day. In this case it’s the content you’d rather block/limit than the pubkeys themselves.

It may, however, help limit spam posts from new randomised, floating pubkeys.

Other low hanging fruit includes relays that still don’t rate limit enough. If I see the same note content posted every 5-10 seconds by the same pubkey, or even by different ones, that’s certainly something that should be throttled at the relay level per pubkey and per IP. Even a simple check: has this pubkey posted the exact same message content more than N times in the past 25 hours?
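
As a sketch of that last check (the window length, N and the in-memory store are assumptions - a real relay would likely back this with its existing database or Redis):

```python
# Sketch: reject an event if this pubkey has posted the exact same
# content more than MAX_REPEATS times within the window.
import hashlib
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 25 * 3600  # past 25 hours
MAX_REPEATS = 5             # N, pick whatever suits the relay

recent: dict[tuple[str, str], deque] = defaultdict(deque)

def allow_event(pubkey: str, content: str) -> bool:
    now = time.time()
    key = (pubkey, hashlib.sha256(content.encode()).hexdigest())
    timestamps = recent[key]
    # Drop timestamps that have fallen outside the window.
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) >= MAX_REPEATS:
        return False  # throttle/reject at the relay
    timestamps.append(now)
    return True
```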

Following up with a repo and some results on Nostr spam detection experimentation. It would seem like we can achieve upwards of 98% accuracy.

The dataset is a little targeted, and less generic, however it appears to perform well today. The repo has a README with more details. Happy to experiment and collaborate with anyone interested.

Even if Nostr relays don't filter these events, embedding a key like meta.spam_score=0.00 into events they serve would mean clients can choose to ignore it, or set their own score thresholds for event visibility.
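
Client side, honouring such a score could be as simple as the sketch below - the meta.spam_score key is the hypothetical field described above, not part of any NIP:

```python
# Sketch: client-side visibility check against a relay-provided spam score.
# The "meta" key and "spam_score" field are hypothetical, as described above.
SPAM_THRESHOLD = 0.8  # user/client configurable

def is_visible(event: dict) -> bool:
    score = event.get("meta", {}).get("spam_score")
    if score is None:
        return True  # no score attached - show the event
    return float(score) < SPAM_THRESHOLD
```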

https://github.com/blakejakopovic/nostr-spam-detection

RE: #[0]

CC: #[1] #[2] #[3] #[4] #[5] #[6] #[7]

Out of around 8MM events, I only have 59 of kind 10002.

The majority of Damus (and likely most Nostr clients’) global feed spam comes from around 50-100 quasi-template messages. Make them disappear and the noise drops to very little again.

I’ve almost finished a roughly 100k-event labelled dataset for spam detection. The current spam by volume is biased towards Asian languages, and the non-spam content is biased towards English - however my relay testing looks promising.

I’ll try to do more testing this weekend with the latest events and see how it performs.

If it works well, perhaps relays can use it before accepting published events. Not sure yet how best to do continual training, however I did use event kind=1984 reports to help identify and tag spam events.
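
For the curious, mining the kind=1984 reports for spam candidates looked roughly like the sketch below - the ["e", <event-id>, <report-type>] tag layout is assumed, and fetching the reports from a relay is left out:

```python
# Sketch: extract spam candidates from kind=1984 (NIP-56) report events.
# Assumes report tags of the form ["e", <reported-event-id>, <report-type>].
from collections import Counter

def spam_candidates(report_events: list[dict]) -> Counter:
    reporters: dict[str, set[str]] = {}
    for report in report_events:
        for tag in report.get("tags", []):
            if len(tag) >= 3 and tag[0] == "e" and tag[2] == "spam":
                reporters.setdefault(tag[1], set()).add(report["pubkey"])
    # Event ids reported by more independent pubkeys are stronger candidates.
    return Counter({event_id: len(pks) for event_id, pks in reporters.items()})
```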

Anyone else playing with Naive Bayes classifiers to detect Nostr spam? They should be quite effective here.

I’m building a training set and will see what kind of results we get. I can publish the training data too... as I’ve had to label it manually.

Ideally we filter at the relay publish level, however clients can certainly embed their own classifier too.
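
Minimal sketch of the kind of classifier I mean, using scikit-learn - the CSV name and column names are placeholders for whatever the labelled dataset ends up looking like:

```python
# Sketch: Naive Bayes spam classifier over note content.
# "labelled_events.csv" with "content" and "label" (0=ham, 1=spam)
# columns is a placeholder for the real dataset.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

df = pd.read_csv("labelled_events.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["content"], df["label"], test_size=0.2, random_state=42)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    MultinomialNB(),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```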

I’ve been messing with the indexer idea for a couple of slightly different reasons: 1. censorship resistance, and 2. pretty soon relays will start purging events (we will hit sane VPS storage limits, or even just casual-user database sizes of 100GB+, which don’t maintain themselves). We will need better ways for clients to find the events that matter.

And the other key reason is: less data. An indexer can store just the event id and the relays known to have had it, with a date stamp. That’s around 1/20th of the size of a full event (with content, signature, tags, etc). Nostr events may total 1-5+TB in a year - an indexer perhaps 50-100GB.
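
A sketch of what a single index record could be - the field names are just assumptions:

```python
# Sketch: a minimal indexer record - event id, relays known to have had it,
# and when it was last seen. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    event_id: str                          # 32-byte id as 64 hex chars
    relays: list[str] = field(default_factory=list)
    last_seen: int = 0                     # unix timestamp

entry = IndexEntry(
    event_id="ab" * 32,                    # placeholder id
    relays=["wss://relay.nostrgraph.net"],
    last_seen=1_675_000_000,
)
# 64 hex chars + a couple of relay URLs + a timestamp is a small fraction
# of a full signed event with its content, tags and signature.
```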

```rust

Sure, but the full spec for markdown has things like code blocks.

```

### subhead is also ugly unless rendered

Tables too.

| Syntax | Description |
| ----------- | ----------- |
| Header | Title |
| Paragraph | Text |

And so on - without rendering, word wrapping and line breaks can cause lots of visual noise.

The basic subset is ok. And the key benefit is it’s easy to type/edit without lots of styling markup in the way.

Postgres and Prometheus. I’m using them internally for monitoring, research and prototyping. I’d build a public facing dashboard using something else.

Example of a personal (or view pubkey) Nostr Dashboard. What’s missing?

Q: For personal Nostr analytics, what kind of data and metrics would people like to see? Do you care about per-post/note detail, or just daily/hourly level detail? Do you want to see new followers, or unfollows? What would be cool?

Here’s an example for my pubkey. It shows when I’m active at top and when the network is engaging with me at the bottom.

I learned that kind 30000 is being used for mute. At first I was like WTF, I’ve been hacked.. but it’s just Damus. Kind 7 are my likes/reactions. Kind 1 are posts/notes. Kind 4 are DMs (sent). Kind 6 is a repost. Obviously I’d clean up the labels to make this easier to interpret.
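
For anyone wondering what sits behind a panel like that, it’s roughly a query like the sketch below - the events table and column names are assumptions, not my actual schema:

```python
# Sketch: per-hour, per-kind event counts for one pubkey from Postgres.
# Table/column names (events, pubkey, kind, created_at) are assumptions.
import psycopg

MY_PUBKEY = "b2dd40097e4d04b1a56fb3b65fc1d1aaf2929ad30fd842c74d68b9908744495b"

QUERY = """
    SELECT date_trunc('hour', created_at) AS hour, kind, count(*) AS events
    FROM events
    WHERE pubkey = %s AND created_at > now() - interval '7 days'
    GROUP BY 1, 2
    ORDER BY 1, 2;
"""

with psycopg.connect("dbname=nostr") as conn:
    for hour, kind, count in conn.execute(QUERY, (MY_PUBKEY,)):
        print(hour, kind, count)
```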

The Scaling Push Messaging for Millions of Devices @Netflix (2019) talk is interesting. They can get 80k concurrent websocket clients on a single 8GB, 2vCPU server using an event-driven architecture (i.e. without a DB on the same servers).

Nostr websocket clients are likely a bit more active in data, however 30-50k is likely achievable. It’s a pretty nice price point too, as 50k active users on a server that could cost $65/month (base cost) means room for around 100x Nostr growth from today.

Some Nostr relay dev tips in there too, like asking clients to close the websocket gracefully so the TCP TIME_WAIT state sits with the client (the side initiating the close) and the server frees up its file descriptors faster.
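
As a sketch of that tip from the client side (the NOTICE text here is an assumption - there’s no standard Nostr message for “please reconnect”):

```python
# Sketch: client closes the websocket when the relay asks it to, so the
# lingering TIME_WAIT state lands on the client side, not the server's.
# The NOTICE text is an assumption; relays would need to agree on one.
import asyncio
import json
import websockets

async def run(relay_url: str) -> None:
    async with websockets.connect(relay_url) as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg[0] == "NOTICE" and "reconnect" in msg[1].lower():
                await ws.close()  # client initiates the close handshake
                break

asyncio.run(run("wss://relay.nostrgraph.net"))
```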

https://www.youtube.com/watch?v=6w6E_B55p0E

https://github.com/Netflix/zuul/wiki/Push-Messaging

Damn. Just lost my reply with a swipe down.

Short answer: it’s coming. I think it can be useful for inbound event coverage, however client apps need to get better at selecting and connecting to relays outside of that list (maybe with an option to disable this) - for example, treating publishing a DM differently to replying to a thread whose root hints at a relay they don’t have in their core set.

An alternative, which maybe I’ll work on, is that instead of the client app doing it, the relay could do it on the client’s behalf and re-broadcast the event based on hints or the DM target. It’s all an optimisation problem where I’m not sure clients will have enough information for the optimum outcome... but often good enough.
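
Sketch of what that could look like relay-side - the tag positions follow the common [name, value, relay-hint] layout, but the behaviour itself is just an idea, not an existing relay feature:

```python
# Sketch: relay re-broadcasts a signed event to the relays hinted at in
# its "e"/"p" tags (third tag element). Best effort only.
import json
import websockets

def relay_hints(event: dict) -> set[str]:
    hints = set()
    for tag in event.get("tags", []):
        if len(tag) >= 3 and tag[0] in ("e", "p") and tag[2].startswith("wss://"):
            hints.add(tag[2])
    return hints

async def rebroadcast(event: dict) -> None:
    for url in relay_hints(event):
        try:
            async with websockets.connect(url, open_timeout=5) as ws:
                await ws.send(json.dumps(["EVENT", event]))
        except Exception:
            continue  # skip unreachable relays
```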

What kinds of difference? Performance, previously missed content?

In theory it’s possible to backtest relays and compare what events were missed or gained with different relay lists. Likely useful, but not necessarily indicative of what you should use going forward either.
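
A sketch of what that backtest could look like, assuming you already know (e.g. via an indexer) which event ids each relay held over some window:

```python
# Sketch: compare how much of the content you care about a given relay
# list would have covered over a historical window.
def coverage(relay_list: list[str],
             events_by_relay: dict[str, set[str]],
             wanted_event_ids: set[str]) -> float:
    seen: set[str] = set()
    for relay in relay_list:
        seen |= events_by_relay.get(relay, set())
    return len(seen & wanted_event_ids) / max(len(wanted_event_ids), 1)

# e.g. coverage(["wss://a", "wss://b"], events_by_relay, wanted)
#   vs coverage(["wss://a", "wss://c"], events_by_relay, wanted)
```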