I think you could also add gibberish detection (random nonsense words or random strings). I have seen some models and Spaces for this on Hugging Face. It could help detect unsophisticated spam bots that post random characters or random nonsense words.
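To make the idea concrete, here is a rough sketch of a rule-based gibberish score. It is not one of the Hugging Face models, just a simple stand-in heuristic: a word counts as "random-looking" if its vowel ratio is extreme or it contains a long run of consonants. The function names and all thresholds (4 letters, 0.2/0.8, runs of 5) are assumptions picked for illustration, and the check only understands ASCII Latin letters, so it is biased toward English-like text.

```python
import re

VOWELS = set("aeiou")
# Assumption: five or more consonants in a row is unusual in ordinary words.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]{5,}")


def looks_random(word: str) -> bool:
    """Heuristic: True if a single word looks like random characters."""
    w = word.lower()
    if len(w) < 4:
        # Too short to judge; treat as normal.
        return False
    vowel_ratio = sum(c in VOWELS for c in w) / len(w)
    # Assumed thresholds: almost no vowels, almost only vowels,
    # or a long consonant run.
    return vowel_ratio < 0.2 or vowel_ratio > 0.8 or bool(CONSONANT_RUN.search(w))


def gibberish_score(text: str) -> float:
    """Fraction of ASCII-letter words in `text` that look random (0.0 to 1.0)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        # No Latin words at all (e.g. other scripts, emoji only): not flagged.
        return 0.0
    return sum(looks_random(w) for w in words) / len(words)


if __name__ == "__main__":
    print(gibberish_score("xkcdfg qwrtpz bnmvcx zxcvbnm"))
    print(gibberish_score("Hello everyone, this is a normal message"))
```

A heuristic like this will misfire on hashtags, hex strings, abbreviations and non-English notes, which is one reason a trained classifier may end up being the better fit.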
Discussion
Good one, thank you. I looked into this a few months back when some kid was experimenting with randomly generated spam, but I never implemented anything.
It might be that we end up making this so complicated that it makes more sense to just label some data and train our own model instead of managing an endless list of rules. Fortunately, that's one of the many things nostr:npub1qlkwmzmrhzpuak7c2g9akvcrh7wzkd7zc7fpefw9najwpau662nqealf5y can handle if we get there.
Yes, you could probably make it a separate, independent service (plugin) instead of rules in the strfry policy, especially if classifying a single event's content takes more than one second. Let the event come in first, process it in a queue, and delete it later if it is detected as high-probability gibberish spam.
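The accept-first, classify-later flow could be sketched roughly like this. Everything here is hypothetical scaffolding: `drain_queue`, the `classify` and `delete` callbacks, and the 0.8 threshold are illustration names and values, not part of strfry. Only the `id` and `content` fields come from the standard nostr event format. How the deletion is actually carried out against the relay is deliberately left to the `delete` callback.

```python
import queue
from typing import Callable, List


def drain_queue(
    pending: "queue.Queue[dict]",
    classify: Callable[[str], float],
    delete: Callable[[str], None],
    threshold: float = 0.8,  # assumed cutoff for "high-probability gibberish"
) -> List[str]:
    """Classify already-accepted events and delete the ones flagged as spam.

    `classify` returns a spam probability in [0, 1] for an event's content.
    `delete` is a hypothetical callback that removes an event by id.
    Returns the ids that were deleted.
    """
    deleted = []
    while True:
        try:
            event = pending.get_nowait()
        except queue.Empty:
            break  # nothing left to process for now
        if classify(event["content"]) >= threshold:
            delete(event["id"])
            deleted.append(event["id"])
    return deleted


if __name__ == "__main__":
    q: "queue.Queue[dict]" = queue.Queue()
    q.put({"id": "aaa", "content": "xkcdfg qwrtpz"})
    q.put({"id": "bbb", "content": "good morning nostr"})

    # Stand-in classifier for the demo; a real one could be slow, which is
    # why it runs here instead of in the relay's write path.
    def fake_classifier(text: str) -> float:
        return 1.0 if "xkcdfg" in text else 0.0

    print(drain_queue(q, fake_classifier, lambda event_id: None))
```

In a real service the loop would run in a background worker and block on the queue rather than exit when it is empty. The trade-off is that spam stays visible for a short time before it gets removed.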
Yes, I think you and nostr:npub1qlkwmzmrhzpuak7c2g9akvcrh7wzkd7zc7fpefw9najwpau662nqealf5y can manage that easily for this kind of problem 🙂