Replying to Rizful.com

My understanding is that we can't do ANY public blacklisting of domains or URLs, because publishing public events with domain scores would essentially be providing a "public map of naughty content on nostr".

If we publish "domain blacklists" or "url blacklists".... then there's no reason that clients couldn't use it in the OPPOSITE way that we intend, i.e., purposely assembling notes which contain images with "bad" scores.

Now, if only porn and violence were involved, there's not a huge problem, but, there's always a chance that we might be issuing bad scores to domains because of CSAM that has been caught with high scores.... and you definitely can not, and should not, ever publish a list of URLs or domains that are distributing CSAM!

So unless I am missing something, there is no feasible way to PUBLICLY publish domain or URL scores.

However, each individual application (a relay, or a client-facing application like Yakihonne or Coracle or Primal or Damus), definitely COULD assemble its own array of "domain scores"..... and this brings me to my second point...

In my proposal, I assert that "image scoring" is CPU and GPU and Bandwidth-intensive.

For nerds like us, we know what that means. Normies, on the other hand, have no idea what that means -- but they do have an EXTREMELY KEEN sense of something else: LATENCY.

A normie who loads a Nostr app, where the notes take 5 seconds to load, because of some image scoring that is going on.... they're not going to wait 5 seconds -- they're going to go back to being abused by Facebook or Instagram -- it's well-known that for normies, PAGE LOAD SPEED is much, much, much, more important than any other factor in how they make decisions about what to do on the internet. It's a lizard-brain thing.

So I think that:

We need to score images. I don't want to go to the #asknostr tag and see horribly psyche-scarring images/video.

We need to use lots of clever techniques to ASYNCHRONOUSLY score images, before such time that our user attempts to load a note. When our user wants to see a note, we want the image score to be pre-computed.
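One way to picture the "pre-computed score" idea is a background queue that scores image URLs as notes arrive, so the read path is only a cache lookup. This is a minimal sketch, not the actual service: `score_image` is a hypothetical placeholder for the slow download-and-classify step, and the in-memory `score_cache`, the worker count, and the score values are all assumptions for illustration.

```python
import asyncio
import hashlib
from typing import Dict, List, Optional

# Hypothetical in-memory cache: SHA-256 of the URL -> score in [0.0, 1.0].
score_cache: Dict[str, float] = {}


def url_key(url: str) -> str:
    """Cache key: hash of the URL, so raw URLs are not stored."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


async def score_image(url: str) -> float:
    """Placeholder for the expensive scoring call (assumption, not a real API)."""
    await asyncio.sleep(0.01)  # stands in for download + model inference
    return 0.9 if "bad" in url else 0.1


async def prescore_worker(queue: asyncio.Queue) -> None:
    """Pull URLs off the queue and fill the cache in the background."""
    while True:
        url = await queue.get()
        try:
            key = url_key(url)
            if key not in score_cache:
                score_cache[key] = await score_image(url)
        finally:
            queue.task_done()


def cached_score(url: str) -> Optional[float]:
    """Read path: instant lookup, None if the image was never pre-scored."""
    return score_cache.get(url_key(url))


async def prescore(urls: List[str], workers: int = 4) -> None:
    """Score a batch of URLs ahead of time using a small worker pool."""
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(prescore_worker(queue)) for _ in range(workers)]
    for url in urls:
        queue.put_nowait(url)
    await queue.join()  # wait until every queued URL has been scored
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    asyncio.run(prescore(["https://example.com/a.jpg", "https://example.com/bad.jpg"]))
    print(cached_score("https://example.com/bad.jpg"))
```

The point of the split is that `cached_score` never waits on the scorer: a note whose image has not been scored yet can be handled by whatever policy the app prefers (hide the image, blur it, or show it later).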

Individual applications (i.e., individual relays or client-facing Nostr apps).... after scoring, say, 1000 images from a domain, could ALSO add that domain to a PRIVATE blacklist... and this will help them provide pre-scored notes to their users on a low-latency basis, because, once they have a domain on a blacklist, maybe they can STOP scoring images from that domain and just reject them.
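As a rough illustration of the private blacklist idea, an application could keep per-domain counters and stop scoring a domain once enough of its images have come back bad. This is only a sketch under assumptions: the class name `DomainTracker`, the `BAD_SCORE` and `BAD_FRACTION` thresholds, and the 0-to-1 score scale are all made up here; only the "1000 images" figure comes from the text above.

```python
from collections import defaultdict
from urllib.parse import urlparse

MIN_SAMPLES = 1000   # images to score before judging a domain (figure from the post)
BAD_SCORE = 0.8      # assumed: a score at or above this counts as "bad"
BAD_FRACTION = 0.5   # assumed: share of bad images that gets a domain blocked


class DomainTracker:
    """Private, in-process domain blacklist built from the app's own scores."""

    def __init__(self, min_samples: int = MIN_SAMPLES) -> None:
        self.min_samples = min_samples
        self.total = defaultdict(int)  # domain -> images scored
        self.bad = defaultdict(int)    # domain -> images scored as bad
        self.blocked = set()           # never published, kept local

    @staticmethod
    def domain_of(url: str) -> str:
        return urlparse(url).hostname or ""

    def record(self, url: str, score: float) -> None:
        """Record one scored image and block the domain if it crosses the thresholds."""
        domain = self.domain_of(url)
        self.total[domain] += 1
        if score >= BAD_SCORE:
            self.bad[domain] += 1
        enough = self.total[domain] >= self.min_samples
        if enough and self.bad[domain] / self.total[domain] >= BAD_FRACTION:
            self.blocked.add(domain)

    def is_blocked(self, url: str) -> bool:
        """Cheap check that lets the app skip scoring entirely."""
        return self.domain_of(url) in self.blocked
```

Once `is_blocked` returns true for a URL, the app can reject the image without spending any CPU, GPU, or bandwidth on it, which is where the latency win comes from.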

But: since I believe we cannot safely PUBLISH blacklists (for the safety reasons described above), each application MUST assemble its own scores, in order to assemble its OWN blacklists (and potentially whitelists).

That's why I think our API, and hopefully a handful of competing APIs, might be the only real solution to allow Nostr client applications (and relays) to assemble their own scores.... Because, for the reasons outlined in my proposal, it's not feasible for 100+ relays and client applications to compute their own scores independently... these scores really should be a shared (but not publicly published!) resource....

This may be a stupid question, but couldn't you publish a list of hashes of problematic domains without promoting the actual domain name?

Discussion

ah, I guess that could still be used by someone trying to assemble the "bad" list, but more time and work intensive (crawl all notes, hash domains, compare with the list).

Yes. It's a good idea. For our image scoring service, we're actually storing the hashes of the URLs instead of the URLs. And I thought, what if we could PUBLISH the hashes of the URLs instead of the URLs themselves... But the problem is that hashing a URL takes less than a millisecond, so it would be trivial to "construct a porn map of nostr" if we provided the hashes, right?
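To make the concern concrete, here is a small sketch of why a plain, unsalted hash hides very little. It assumes SHA-256 and uses placeholder `example` URLs; the thread does not say which hash function the service actually uses. Anyone who already holds a candidate URL can hash it and test it against a published set.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Plain, unsalted SHA-256 of a string (assumed hash function)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# What a published list would contain: hashes only, no URLs.
published_hashes = {sha256_hex("https://flagged.example/1.jpg")}


def is_listed(candidate_url: str) -> bool:
    """Membership test that needs nothing but the public list."""
    return sha256_hex(candidate_url) in published_hashes
```

The hash cannot be reversed, but it does not need to be: the candidate URLs are already sitting in public notes, so the list only has to be checked, not decoded.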

Well, hashes are one-way. So given a hash, you couldn't find the original. But given the original and a hash, you could know* that that hash is of that original.

* There are technically multiple originals that would produce the same hash, but it's pretty unlikely in the real world, at least as far as I understand. But I am not a cryptologist.

Because CSAM is a serious issue, a really serious issue, I don't think ANYONE can take the risk of possibly publishing the actual URLs and scores (or even domains and scores) publicly.... right?

I haven't thought deeply on this, but yes, you certainly wouldn't post a list of URLs. Posting a list of hashes is still potentially a problem, but the bad guy would have to parse and hash every URL in every note on nostr, and compare against your list of hashes to make use of it. I don't know. Maybe there's extra data you could use to construct the hash that would make it more difficult to reverse?
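One reading of "extra data" is a secret key mixed into the hash, i.e. a keyed hash such as HMAC. The sketch below assumes HMAC-SHA256 from Python's standard library and a hypothetical key shared only with trusted applications; nothing in the thread says this is what anyone has built.

```python
import hashlib
import hmac
import secrets


def keyed_hash(secret_key: bytes, url: str) -> str:
    """HMAC-SHA256 of the URL under a secret key (assumed scheme)."""
    return hmac.new(secret_key, url.encode("utf-8"), hashlib.sha256).hexdigest()


def matches(secret_key: bytes, url: str, listed_hash: str) -> bool:
    """Constant-time comparison of a URL against one listed keyed hash."""
    return hmac.compare_digest(keyed_hash(secret_key, url), listed_hash)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # hypothetical key, shared privately
    listed = keyed_hash(key, "https://flagged.example/1.jpg")
    print(matches(key, "https://flagged.example/1.jpg", listed))
```

Without the key, an outsider cannot test URLs against the list. The catch is that the key has to reach every application that uses the list, and anyone holding it is back to the plain-hash situation, so this moves the problem from "public list" to "who gets the key" and ends up close to the shared-but-not-public resource described earlier.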