Nice.
Just thinking out loud, not knowing your exact approach: one way to compare note content by percentage is to write a function that adds up the ASCII values of all the characters in a note and produces a score from that sum. Notes with over 50 characters and small score differentials, posted within a short timeframe, are then probably similar in content and likely spam.
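
Here's a rough Python sketch of what I mean, nothing more than an illustration. The function names, the 2% score threshold, and the 10-minute window are all made up by me; only the "over 50 characters" cutoff comes from the idea above. Keep in mind that a plain sum ignores character order, so unrelated notes can land on close scores by accident, and this would at best be a cheap first filter.

```python
from datetime import datetime, timedelta


def ascii_score(text: str) -> int:
    """Add up the character codes of every character in the note."""
    # ord() returns the Unicode code point, which equals the ASCII value
    # for plain ASCII text.
    return sum(ord(ch) for ch in text)


def percent_difference(note_a: str, note_b: str) -> float:
    """Score differential as a percentage of the larger score."""
    score_a, score_b = ascii_score(note_a), ascii_score(note_b)
    largest = max(score_a, score_b)
    if largest == 0:  # both notes empty
        return 0.0
    return abs(score_a - score_b) / largest * 100


def looks_like_spam(
    note_a: str,
    time_a: datetime,
    note_b: str,
    time_b: datetime,
    min_length: int = 50,                      # from the idea above
    max_percent: float = 2.0,                  # assumed threshold
    window: timedelta = timedelta(minutes=10), # assumed timeframe
) -> bool:
    """Flag two notes that are long, close in score, and close in time."""
    if len(note_a) <= min_length or len(note_b) <= min_length:
        return False
    if abs(time_a - time_b) > window:
        return False
    return percent_difference(note_a, note_b) <= max_percent
```

You would call `looks_like_spam` on pairs of recent notes and tune the threshold and window against notes you already know to be spam.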