This is an important question as we begin to migrate toward personal caching/aggregating relays at home that bootstrap by backfilling, and could be accused of crawling.


Discussion

I don’t have a clear answer to this yet, and I don’t have any issues with someone opening one or two websocket connections and requesting lots of events. I am only taking action against the worst offenders right now. I’m seeing some making thousands of connection requests in a single burst. We saw one crawler attempting to request 200 GB worth of (duplicate) events every day.

We are trying to come up with good rate-limiting numbers that also don’t disrupt some of the more spammy clients… we hope that improves too!
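
For illustration, a minimal sketch of the kind of per-IP limiting involved, using a token bucket. The burst size and refill rate below are placeholder parameters, not nostr.wine’s actual numbers:

```typescript
// Per-IP token bucket: each IP gets `capacity` connection tokens that
// refill at `refillPerSec`. A connection attempt spends one token.
class TokenBucket {
  private tokens: number;
  private last: number = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  tryTake(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec,
    );
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

// Called on each new websocket connection attempt; returns false to reject.
function allowConnection(ip: string): boolean {
  let bucket = buckets.get(ip);
  if (!bucket) {
    // Hypothetical policy: a burst of 10 connections, refilling one every 30s.
    bucket = new TokenBucket(10, 1 / 30);
    buckets.set(ip, bucket);
  }
  return bucket.tryTake();
}
```

A bucket like this rejects thousand-connection bursts while still letting a patient client with one or two long-lived connections through, which matches the distinction drawn above.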

A unique problem we (nostr.wine) have is that crawlers treat every filter.nostr.wine/npub URL as a unique relay. There is generally no reason for an aggregator/crawler to fetch events from filter.nostr.wine anyway: they are all duplicates of events on big public relays, and since it’s an aggregator, you can already filter for a specific pubkey in your request.
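
As a sketch of that alternative, a crawler could open a single connection to the aggregator and scope its NIP-01 REQ by pubkey instead of enumerating per-npub URLs. The bare-domain endpoint usage and the pubkey below are illustrative assumptions:

```typescript
// Sketch: one connection, filtered by author (NIP-01 REQ), rather than
// treating every filter.nostr.wine/<npub> URL as its own relay.
// Assumes a runtime with a global WebSocket (browser, Deno, Node >= 22).
const subId = "backfill-example";
const filter = {
  authors: ["<64-char-hex-pubkey>"], // hex form, not the bech32 npub; placeholder
  limit: 500,
};

const ws = new WebSocket("wss://filter.nostr.wine");
ws.onopen = () => ws.send(JSON.stringify(["REQ", subId, filter]));
ws.onmessage = (ev) => {
  const [type] = JSON.parse(ev.data as string);
  // EOSE marks the end of stored events; close the subscription politely.
  if (type === "EOSE") ws.send(JSON.stringify(["CLOSE", subId]));
};
```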

Out of curiosity, does every https://filter.nostr.wine/npub relay publish the same metadata (NIP-11)?

Or are some crawlers simply not making use of the relay metadata?

Yes, they all return the same filter.nostr.wine relay metadata.

So, a first step would be to educate the crawlers to fetch and use the relay metadata.

And later we can define allowed limits in that metadata (NIP-11 already includes an optional `limitation` object for this).
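
A hedged sketch of what that education could look like: NIP-11 relay information documents are fetched over HTTP with an `Accept: application/nostr+json` header, and a crawler could key its relay set on the document’s contents rather than on raw URLs. The identity heuristic below is an assumption, since NIP-11 does not define a canonical relay identifier:

```typescript
// Fetch the NIP-11 relay information document. NIP-11 documents are served
// over http(s) at the relay's ws(s) URL with a special Accept header.
async function fetchRelayInfo(relayUrl: string): Promise<Record<string, unknown>> {
  const httpUrl = relayUrl.replace(/^ws/, "http"); // wss:// -> https://
  const res = await fetch(httpUrl, {
    headers: { Accept: "application/nostr+json" },
  });
  if (!res.ok) throw new Error(`NIP-11 fetch failed: ${res.status}`);
  return res.json();
}

// A crawler could dedupe path-based URLs by keying on the document contents.
// Hypothetical identity key: NIP-11 does not mandate one, so this is a
// heuristic built from optional fields that filter.nostr.wine reportedly
// returns identically for every /npub path.
async function relayIdentity(relayUrl: string): Promise<string> {
  const info = await fetchRelayInfo(relayUrl);
  return `${info.name ?? ""}:${info.pubkey ?? ""}:${info.software ?? ""}`;
}
```

With that, every filter.nostr.wine/npub URL would collapse to a single identity, since they all return the same document.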

Do you know which crawler is misbehaving like that, so it can be asked to properly use the metadata to uniquely identify your relays?