Sorting is the slowest part. I ended up doing:

parallel -a tokens.txt --block 370899363 --pipepart 'sort > tokenstore/{#}'

sort -m tokenstore/* > tokens-sorted.txt

uniq -c tokens-sorted.txt | sort -S 80% -n > spammy.txt
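For anyone who wants to try the same pattern on a small file first, here is a rough sketch of the three steps (sort chunks, merge, count). It is not the exact pipeline above: it swaps `parallel --pipepart` for plain `split`, and `rank_tokens`, its arguments, and the sample tokens are made-up names for illustration. The merge step relies on every chunk already being sorted, since `sort -m` only merges and does not re-sort.

```shell
# Toy version of the chunked sort, merge, and count steps.
# rank_tokens, its arguments, and the sample tokens are hypothetical.
export LC_ALL=C  # byte-order collation so sort, sort -m and uniq agree

rank_tokens() {
    # $1: file with one token per line; $2: lines per chunk
    input=$1
    lines=$2
    workdir=$(mktemp -d) || return 1
    mkdir "$workdir/tokenstore"

    # Stand-in for parallel --pipepart --block: fixed-size line chunks.
    split -l "$lines" "$input" "$workdir/chunk."

    # Sort every chunk on its own.
    for f in "$workdir"/chunk.*; do
        sort "$f" > "$workdir/tokenstore/${f##*/}"
    done

    # Merge the pre-sorted chunks, count duplicates, rank by count.
    sort -m "$workdir"/tokenstore/* | uniq -c | sort -n

    rm -rf "$workdir"
}

# Sample run: print the most frequent token and its count.
sample=$(mktemp)
printf '%s\n' free btc hello free airdrop btc free hello gm > "$sample"
rank_tokens "$sample" 3 | tail -n 1 | awk '{print $2, $1}'
rm -f "$sample"
```

The most frequent token ends up on the last line because `sort -n` ranks in ascending order, same as in spammy.txt above.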

Discussion

Correct me if I’m misunderstanding, but this is not the spammiest per se, but the /commonest/, correct? The admixture of innocuous tokens will make it tricky and require more work. What other criteria could we add as a pre-filter?

I think it would be interesting to see the more general trend of spammy keywords across popular relays. I'm not sure, but maybe #[6] has already done this as part of his research on spam filtering in Nostr (see his GitHub).

I will generate the raw data like this:

#[3]