I’ve got a pretty large training set of 13k labelled spam events (with some dupes). You could filter for the rows labelled as spam, hash their content into a set, and then check incoming events for membership.
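The hash-and-check idea above could look something like this. It's only a rough sketch: the `(content, is_spam)` row shape, the helper names, and the light lowercase/whitespace normalisation are my assumptions for illustration, not the actual layout of the CSV.

```python
import hashlib

def content_hash(content: str) -> str:
    # Assumed normalisation: lowercase and collapse whitespace so that
    # trivially altered copies of the same spam text hash identically.
    normalised = " ".join(content.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def build_spam_hashes(rows):
    # rows: iterable of (content, is_spam) pairs (hypothetical shape).
    # Using a set also takes care of the dupes in the training data.
    return {content_hash(content) for content, is_spam in rows if is_spam}

def is_known_spam(content: str, spam_hashes: set) -> bool:
    # Average O(1) membership check against the precomputed hashes.
    return content_hash(content) in spam_hashes

# Made-up sample rows standing in for the labelled data set.
labelled = [
    ("Buy cheap sats now!!!", True),
    ("buy  cheap sats NOW!!!", True),  # same text after normalisation
    ("gm nostr", False),
]
spam_hashes = build_spam_hashes(labelled)
```

This only catches exact (or near-exact, after normalisation) repeats, so it's cheap but won't help with spam that changes its text every time.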
I also have around 28k pubkeys flagged as spam I can share directly. You could review them and then delete their events.
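For the pubkey route, a minimal sketch of the filtering step might be the following. It assumes events are plain dicts carrying the standard `pubkey` field; the function name and sample data are made up, and the actual delete would depend on your relay's storage.

```python
def split_by_flagged_pubkey(events, flagged_pubkeys):
    # Separate events into those to keep and those authored by a
    # flagged pubkey, so the second list can be reviewed before deletion.
    keep, drop = [], []
    for event in events:
        if event["pubkey"] in flagged_pubkeys:
            drop.append(event)
        else:
            keep.append(event)
    return keep, drop

# Made-up sample data; real pubkeys are 64-character hex strings.
flagged = {"aaaa", "bbbb"}
events = [
    {"id": "1", "pubkey": "aaaa", "content": "spam"},
    {"id": "2", "pubkey": "cccc", "content": "gm"},
    {"id": "3", "pubkey": "bbbb", "content": "more spam"},
]
keep, drop = split_by_flagged_pubkey(events, flagged)
```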
Failing those, you could use the ML model to get spam scores, but that's likely more computationally expensive.
I’ve just purged around 2.8 million spam events. Some I can’t detect easily yet, like bogus reactions and reposts. I can see them in the network traffic, but I can’t do anything about them automatically.
https://github.com/blakejakopovic/nostr-spam-detection/blob/master/labelled_nostr_events_20230225000.csv