Nice.

Just thinking out loud, not knowing your exact approach: one way to compare note content by percentage is to write a function that adds up the ASCII values of all the characters in a note and produces a score from that sum. Notes with over 50 characters and small score differentials, posted within a short timeframe, are then probably similar in content and likely spam.
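To make the idea concrete, here is a rough sketch in Python. The function names, the 1% score tolerance and the 10-minute window are placeholders I made up for illustration; only the 50-character cutoff comes from the idea above.

```python
def ascii_score(text: str) -> int:
    """Add up the code points of every ASCII character in the note."""
    return sum(ord(ch) for ch in text if ord(ch) < 128)


def likely_duplicates(
    note_a: str,
    note_b: str,
    posted_a: float,           # Unix timestamps, in seconds
    posted_b: float,
    min_chars: int = 50,       # cutoff from the idea above
    max_score_diff: float = 0.01,   # assumed: scores within 1% of each other
    max_seconds: float = 600.0,     # assumed: posted within 10 minutes
) -> bool:
    """Flag two notes as probable duplicates using the ASCII-sum heuristic."""
    # Short notes are ignored: too little content to score meaningfully.
    if len(note_a) <= min_chars or len(note_b) <= min_chars:
        return False
    # Only compare notes posted close together in time.
    if abs(posted_a - posted_b) > max_seconds:
        return False
    score_a, score_b = ascii_score(note_a), ascii_score(note_b)
    largest = max(score_a, score_b)
    if largest == 0:
        return False
    # Relative difference between the two scores.
    return abs(score_a - score_b) / largest <= max_score_diff
```

One caveat: a plain sum ignores character order, so unrelated notes of similar length can land on nearly the same score. It would probably work better as a cheap first filter than as the final verdict.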

Discussion

And there are tons of text similarity algorithms. No idea which one would work best.
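As one example among many, Python's standard library ships `difflib.SequenceMatcher`, which gives a similarity ratio between two strings. The `similarity_percent` wrapper below is just a hypothetical helper for illustration, not a recommendation over any other algorithm.

```python
from difflib import SequenceMatcher


def similarity_percent(note_a: str, note_b: str) -> float:
    """Return how similar two notes are, as a percentage from 0 to 100."""
    # ratio() is 2 * matches / (len(a) + len(b)), a float in [0, 1].
    return SequenceMatcher(None, note_a, note_b).ratio() * 100


print(similarity_percent("abcd", "bcde"))  # 75.0
```

Unlike a character sum, this is sensitive to character order, but it is slower on long texts, so it may only be practical on notes that already look suspicious.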

Right. The real spam problems will surface later, when AI bots start producing procedurally generated content. Then web-of-trust scores may become useful.