Nostr Web Client

Replying to

pwm

Bayesian filters a la email on relays are probably a really great idea.

Then a slider on clients to control what spam threshold is acceptable as not spam.

Thorwegian 🇳🇴 2y ago

I was proposing Bayesian filtering on relays the other day, but nobody cared.

Discussion

pwm 2y ago

I don't think there's much need to standardize this via NIPs, but I bet a pull request for strfry or another pluggable relay to integrate rspamd or something similar would get in.

You might actually just be able to straight up use strfry's existing plugin system to implement something like that ad-hoc.
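A rough sketch of what that ad-hoc route could look like. It assumes strfry's write-policy plugin interface works as its plugin documentation describes: one JSON request per line on stdin, and one JSON response per line on stdout carrying `id`, `action` and `msg`. `looks_like_spam` is a hypothetical placeholder for a real trained filter, and `run` is an illustrative name, so check the field names against the strfry docs before relying on this.

```python
import json
import sys


def looks_like_spam(content: str) -> bool:
    """Hypothetical placeholder; a trained Bayesian filter would go here."""
    return "free sats" in content.lower()


def decide(request: dict) -> dict:
    """Build the plugin response for one incoming request."""
    event = request["event"]
    if looks_like_spam(event.get("content", "")):
        return {"id": event["id"], "action": "reject", "msg": "blocked: classified as spam"}
    return {"id": event["id"], "action": "accept", "msg": ""}


def run(stream_in, stream_out) -> None:
    """Answer every request line with exactly one response line."""
    for line in stream_in:
        line = line.strip()
        if not line:
            continue
        response = decide(json.loads(line))
        # Flush after each line so the relay is not left waiting on a buffer.
        print(json.dumps(response), file=stream_out, flush=True)
```

To use it as a plugin script, call `run(sys.stdin, sys.stdout)` at the bottom of the file and point the relay's write-policy plugin setting at it.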

My ideal solution would be NIP-05 verification combined with a Bayesian filter to add a spam score that clients could use however they see fit, with some configurable floor beneath which the relay does not pass the message.
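The two-stage idea (a relay-side cutoff plus a client-side slider) can be sketched like this. The sketch assumes the score is a spam probability between 0 and 1, so the relay's "floor" becomes a cutoff above which notes are dropped. `ScoredNote`, `RELAY_CUTOFF`, `relay_forward` and `client_visible` are made-up names, not part of any relay or NIP.

```python
from dataclasses import dataclass


@dataclass
class ScoredNote:
    content: str
    spam_score: float  # assumed convention: 0.0 = clearly fine, 1.0 = clearly spam


RELAY_CUTOFF = 0.95  # hypothetical relay setting: never forward anything above this


def relay_forward(notes, cutoff=RELAY_CUTOFF):
    """Relay side: drop only the notes it is very sure about."""
    return [note for note in notes if note.spam_score <= cutoff]


def client_visible(notes, slider):
    """Client side: the user's slider picks a stricter (or looser) threshold."""
    return [note for note in notes if note.spam_score <= slider]


notes = [
    ScoredNote("good morning", 0.10),
    ScoredNote("maybe an ad", 0.60),
    ScoredNote("obvious spam", 0.99),
]
forwarded = relay_forward(notes)               # relay removes only the 0.99 note
shown = client_visible(forwarded, slider=0.5)  # a strict user also hides the 0.60 note
```

The relay only removes what it is nearly certain about, and everything else travels with its score so each client can decide for itself.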

Maybe I should set up a relay and play with it, dunno.

Thorwegian 🇳🇴 2y ago

In my experience, if you take a Bayesian filter score and combine it with other scores, the composite score is less reliable. I ran SpamAssassin on an email server for years and wondered why spam kept leaking through; when I disabled everything but the Bayesian filter, it started working better. On a later email server I just dropped SA in favour of Bogofilter - the original Bayesian spam filter, which does nothing else - and it did the same job. A naive Bayes classifier library should be able to do the same job. And NIP-05 verification is just a token you add as an input to the classifier. It'll learn to recognise that token as a good sign if you train it enough.

Thorwegian 🇳🇴 2y ago

Basically, any kind of indicator can just be a token in the classifier's input. Then you don't have to worry about interpreting the token, because the classifier will figure out how much it should count.
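A minimal sketch of the "indicator as a token" idea, using a small hand-rolled naive Bayes classifier with add-one smoothing rather than any particular library. The marker name `meta:nip05_verified`, the `tokenize` helper and the four training notes are invented for illustration; a real filter would need far more training data.

```python
import math
import re
from collections import Counter

NIP05_TOKEN = "meta:nip05_verified"  # hypothetical marker name


def tokenize(content, nip05_verified):
    """Content words, plus one extra token when the author is NIP-05 verified."""
    tokens = re.findall(r"[a-z0-9]+", content.lower())
    if nip05_verified:
        tokens.append(NIP05_TOKEN)
    return tokens


class NaiveBayes:
    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, tokens, label):
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def spam_probability(self, tokens):
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Smoothed class prior, then smoothed per-token likelihoods.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into P(spam); clamp to avoid overflow in exp().
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


classifier = NaiveBayes()
classifier.train(tokenize("free sats click here", False), "spam")
classifier.train(tokenize("click here for free airdrop", False), "spam")
classifier.train(tokenize("good morning nostr", True), "ham")
classifier.train(tokenize("new relay release notes", True), "ham")

# Same text, scored once without and once with the verification token.
p_unverified = classifier.spam_probability(tokenize("free relay", False))
p_verified = classifier.spam_probability(tokenize("free relay", True))
```

Nothing in the code assigns a weight to verification. Because the marker only shows up in the non-spam training notes here, the same text scores as less spammy when the marker is present.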

pwm 2y ago

Mixing the verified status in on the front (as an input) is a good insight I wouldn't have thought of. I would have assumed you would do it as some sort of flat score modifier on the output, but I think your idea is much better.

By token, do you mean just some sort of special keyword prepended or appended? Can that be gamed?
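One way to rule out the most obvious forgery, sketched under assumptions that are not settled anywhere in this thread: the relay adds the marker itself after checking NIP-05, and the marker uses characters the content tokenizer can never emit, so typing the keyword into a note does nothing. `VERIFIED_MARKER`, `content_tokens` and `classifier_input` are hypothetical names.

```python
import re

VERIFIED_MARKER = "meta:nip05_verified"  # hypothetical reserved token


def content_tokens(content):
    """Only lowercase letters and digits survive, so ':' and '_' never appear in a token."""
    return re.findall(r"[a-z0-9]+", content.lower())


def classifier_input(content, relay_verified_nip05):
    """The marker comes from the relay's own check, never from the note text."""
    tokens = content_tokens(content)
    if relay_verified_nip05:
        tokens.append(VERIFIED_MARKER)
    return tokens


# A spammer pasting the marker into the note only produces ordinary words.
forged = classifier_input("meta:nip05_verified buy now", relay_verified_nip05=False)
genuine = classifier_input("hello", relay_verified_nip05=True)
```

This does not stop a spammer who actually obtains NIP-05 verification. In that case the marker would start appearing in spam training data too, and retraining should shrink how much it counts.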