Yeah, nostr:npub1qdjn8j4gwgmkj3k5un775nq6q3q7mguv5tvajstmkdsqdja2havq03fqm7 is looking into that. They attack his home servers, as well.
I forgot to hit send on this earlier today, but speaking of David vs. Goliath, if you haven’t done it already, make sure to tarpit Alexandria, TheForest1, GitCitadel, and anything else you’re deploying that is expected to hold a lot of data. Otherwise, scrapers will come for the goods, overload the infrastructure and cost you quite a few sats.
Discussion
this stuff annoys the hell out of me too, i run my test #realy fairly infrequently but within minutes some bot is trying to scrape it for WoT relevant events, and i wish there was an effective thing to slow that down
i tried adding one kind of rate limiter but that didn't really work out so well. in the past, what i've seen work best tends to be where the relay just stops answering and drops everything that comes in... probably if that included pings the other side would automatically drop
i've now added plain HTML and i think that requires something also, but maybe more simple, like, if it gets a query more than once every 5 seconds for 5 such periods, it steadily adds more and more delay in processing. the difference to sockets is that on http it has to be associated with an IP address
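A rough sketch of that escalating delay, keyed by IP address. The 5-second window and the 5 busy periods come from the note above; the `Tarpit` class, the `STEP` and `MAX_DELAY` values, and the rule that one quiet window resets the count are all assumptions made up for illustration, not how any particular relay does it.

```python
from dataclasses import dataclass

WINDOW = 5.0      # seconds per period (from the note above)
STRIKES = 5       # busy periods tolerated before delays start (from the note)
STEP = 0.5        # assumed: seconds of delay added per further busy period
MAX_DELAY = 30.0  # assumed: cap so a delay never grows without bound


@dataclass
class ClientState:
    window_start: float
    hits: int = 0
    busy_periods: int = 0


class Tarpit:
    """Hypothetical per-IP tarpit; the caller sleeps for the returned delay."""

    def __init__(self) -> None:
        self.clients: dict[str, ClientState] = {}

    def delay_for(self, ip: str, now: float) -> float:
        st = self.clients.setdefault(ip, ClientState(window_start=now))
        if now - st.window_start >= WINDOW:
            # The previous window is over: score it, then start a new one.
            if st.hits > 1:
                st.busy_periods += 1   # more than one query in the window
            else:
                st.busy_periods = 0    # assumed: a quiet window forgives
            st.window_start = now
            st.hits = 0
        st.hits += 1
        over = st.busy_periods - STRIKES + 1
        return min(MAX_DELAY, STEP * over) if over > 0 else 0.0
```

In an HTTP handler the returned value would be slept on before answering (for example with `await asyncio.sleep(delay)`), so a polite client never notices and a scraper gets steadily slower responses.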
I'm assuming your first-level WoT can auth and get around delays. Otherwise, uploading publications would collapse.
yeah, they are only reading, not writing, so all that would really be required is to slow them down
probably could split the direct followed vs follows of follows into two tiers also so that the second level of depth get less service quality
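The two-tier idea could look something like this. It assumes the relay already knows the authed pubkey of the connection (e.g. via NIP-42 auth) and has the operator's follows and follows-of-follows as sets; the function names and the multiplier values are hypothetical.

```python
from typing import Optional, Set

# Assumed multipliers applied to whatever base delay the relay would impose.
TIER_MULTIPLIER = {"direct": 0.0, "second": 0.5, "unknown": 1.0}


def tier_for(pubkey: Optional[str], direct: Set[str], second: Set[str]) -> str:
    """Classify a client by its distance from the relay owner in the WoT."""
    if pubkey is None:
        return "unknown"   # not authed, treated like any anonymous reader
    if pubkey in direct:
        return "direct"    # followed directly: no delay at all
    if pubkey in second:
        return "second"    # follow of a follow: reduced service quality
    return "unknown"


def scaled_delay(base_delay: float, pubkey: Optional[str],
                 direct: Set[str], second: Set[str]) -> float:
    """Scale a base delay (in seconds) by the client's tier."""
    return base_delay * TIER_MULTIPLIER[tier_for(pubkey, direct, second)]
```

With these numbers a directly followed pubkey skips the delay entirely, the second level gets half of it, and everyone else gets the full amount.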
i should probably work on it today because this problem with scrapers is fairly annoying
it's only nostr WoT spiders at this point, i wish i could just suggest to the spider people to just open a subscription for these and catch all the new stuff live, that would be more efficient for them and less expensive for relay operators
at this point it's fairly early in the game for that stuff so maybe some education of WoT devs might help...
cc: nostr:npub176p7sup477k5738qhxx0hk2n0cty2k5je5uvalzvkvwmw4tltmeqw7vgup there is no point in doing sliding window spidering on events once you have the history, from then on just open a subscription
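For spider authors, a minimal sketch of what "just open a subscription" means at the message level. The `["REQ", <id>, <filter>]` shape and the `since` filter field are from NIP-01; the helper names, the subscription id, and the choice of kind 3 (follow lists) are only illustrative assumptions. Sending the message over a websocket is left out.

```python
import json
from typing import List


def live_subscription(sub_id: str, kinds: List[int], last_seen: int) -> str:
    """Build a NIP-01 REQ asking only for events newer than last_seen.

    The relay sends stored matches, then EOSE, then keeps the subscription
    open and pushes new events as they arrive, so no re-polling is needed.
    """
    return json.dumps(["REQ", sub_id, {"kinds": kinds, "since": last_seen}])


def advance_cursor(raw: str, sub_id: str, last_seen: int) -> int:
    """Move the cursor forward when an EVENT for our subscription arrives."""
    msg = json.loads(raw)
    if len(msg) == 3 and msg[0] == "EVENT" and msg[1] == sub_id:
        return max(last_seen, msg[2].get("created_at", last_seen))
    return last_seen  # EOSE, NOTICE, or another subscription: unchanged
```

Persisting the cursor means that after a disconnect the spider can reconnect with `since` set to the last `created_at` it saw, instead of walking the whole history again.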
this would be more friendly from the AI spiders if they had a way to just catch new updates, but unless dey pay me! i'm just gonna tarpit them
Yeah, I don't get the point of spidering data you can just stream. Like, they spider Nostr websites and it's like... just subscribe to the relay. *roll eyes*