Ah, I guess that could still be used by someone trying to assemble the "bad" list, but it would be more time- and work-intensive (crawl all notes, hash domains, compare with the list).
Discussion
Yes. It's a good idea. For our image scoring service, we're actually storing the hashes of the URLs instead of the URLs. And I thought, what if we could PUBLISH the hashes of the URLs instead of the URLs themselves... But the problem is that hashing a URL takes less than a millisecond, so it would be trivial to "construct a porn map of nostr" if we provided the hashes, right?
Well, hashes are one-way. So given a hash, you couldn't find the original. But given the original and a hash, you could know* that that hash is of that original.
* There are technically multiple originals that would produce the same hash, but running into one is pretty unlikely in the real world with a modern cryptographic hash, at least as far as I understand. But I am not a cryptologist.
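To make the one-way point concrete, here is a minimal sketch in Python. It assumes the hash is plain SHA-256 over the URL's UTF-8 bytes, which nobody in this thread has confirmed; `url_hash`, `is_listed`, and the example.com URL are made up for illustration.

```python
import hashlib


def url_hash(url: str) -> str:
    """SHA-256 hex digest of the URL's UTF-8 bytes (assumed scheme)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# Hypothetical published list: hashes only, no URLs.
published = {url_hash("https://example.com/a.jpg")}


def is_listed(candidate: str) -> bool:
    """One hash and one set lookup tells you whether a URL you already hold is on the list."""
    return url_hash(candidate) in published
```

Going from a digest back to the URL is not feasible, but checking a URL you already have costs one hash and one lookup, which is why the list is still useful to anyone willing to collect candidate URLs.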
Because CSAM is a serious issue, a really serious issue, I don't think ANYONE can take the risk of possibly publishing the actual URLs and scores (or even domains and scores) publicly... right?
I haven't thought deeply on this, but yes, you certainly wouldn't post a list of URLs. Posting a list of hashes is still potentially a problem, but the bad guy would have to parse and hash every URL in every note on nostr, and compare against your list of hashes to make use of it. I don't know. Maybe there's extra data you could use to construct the hash that would make it more difficult to reverse?
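One way to read "extra data" is a secret key mixed into the hash, i.e. a keyed hash (HMAC). The sketch below is only an assumption about how that could look, not something the scoring service is known to do; `SECRET_KEY` and `keyed_url_hash` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the service operator.
SECRET_KEY = secrets.token_bytes(32)


def keyed_url_hash(url: str, key: bytes = SECRET_KEY) -> str:
    """HMAC-SHA256 of the URL; without the key, nobody else can recompute it."""
    return hmac.new(key, url.encode("utf-8"), hashlib.sha256).hexdigest()
```

The trade-off is that honest clients can't compute the hash either, so they would have to ask whoever holds the key to do the lookup. That turns the published list back into a service call, so I'm not sure it solves the original problem.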