I’m not actually using that SDK for event requests, so it’s hard to say. It really depends on your use case.

Are you trying to dedupe for a timeline? If so, your timeline data model can ignore adding events that already exist.

Are you trying to dedupe for an archival backup process? You can fetch all the data first and then dedupe. Or maybe use a set to track seen event ids.
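The set-based approach is a few lines. A minimal sketch in Python, assuming events are dicts with an `"id"` field (adapt to whatever event type your SDK gives you):

```python
# Dedupe incoming events by id using a set of seen ids.
seen_ids = set()

def handle_once(event, process):
    """Call process(event) only the first time this event id is seen."""
    if event["id"] in seen_ids:
        return False  # duplicate from another relay; skip it
    seen_ids.add(event["id"])
    process(event)
    return True
```

The same idea works for a timeline model: key your timeline storage by event id and ignore inserts for ids already present.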

There is also deduping across relays and relay subscriptions. Maybe that’s desirable, or maybe it isn’t - particularly if you want to count something like followers; you may have seen that contact list before, but you still need it to count as +1.

The main catch with deduping is you should use event ids and not the raw json, as the json can have the keys ordered differently.
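To make the key-ordering point concrete, here's a toy example with a made-up event id: two serializations of the same event compare unequal as raw strings, but their parsed ids match.

```python
import json

# Two serializations of the same event, keys in different order:
a = '{"id": "abc123", "kind": 1, "content": "hi"}'
b = '{"kind": 1, "content": "hi", "id": "abc123"}'

# Comparing raw JSON strings wrongly treats them as distinct events...
print(a == b)  # False

# ...while comparing parsed event ids correctly spots the duplicate.
print(json.loads(a)["id"] == json.loads(b)["id"])  # True
```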

Ultimately it would be nice to tell a relay a compacted hash of what you’ve seen so it can skip sending those events. The issue is that it’s a statistical model and may have false positives, meaning the relay might not send roughly 1 in 1,000,000 events you actually wanted - which is likely fine. This doesn’t solve the multiple-concurrent-relay-queries case, however.
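The "compacted hash" idea is essentially a Bloom filter. This is not a real relay protocol, just a toy sketch showing the key property: a membership test can false-positive (item reported as seen when it wasn't), but never false-negative.

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter: k hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k deterministic bit positions from the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # May return True for an item never added (false positive),
        # but never returns False for an item that was added.
        return all(self.bits >> p & 1 for p in self._positions(item))
```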


Discussion

Having you here is a luxury. I have learned so many things from you. Thank you.

Hopefully it’s all accurate 😅

My scenario is very simple. I just subscribe with a query filter of my pubkey. When I test the subscription, the event is handled multiple times, naturally due to receiving the event from multiple relays.

I was wondering if there is any tooling in the SDK that helps manage the same event, and if not, I was going to just use a set, like you mentioned, to avoid the bot replying multiple times to the same event.

I’d use a queue for inbound events; then you can pull from that queue using a worker-type flow, do a membership test to see if you’ve already processed the event, and if not, add the event id to a set and process it.
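That queue-plus-worker flow might look like this in Python with the standard library (the event shape is assumed to be a dict with an `"id"` field; `None` is used as a stop sentinel):

```python
import queue
import threading

seen = set()
inbound = queue.Queue()

def worker(process):
    """Pull events off the queue; skip ids we've already processed."""
    while True:
        event = inbound.get()
        if event is None:  # sentinel: shut the worker down
            inbound.task_done()
            break
        if event["id"] not in seen:
            seen.add(event["id"])
            process(event)
        inbound.task_done()
```

Your relay subscription callbacks just `inbound.put(event)` and return immediately; the worker thread serializes the dedupe check and the processing.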

Also, since the set will grow in size forever (between restarts), you’ll either need to persist the set data (e.g. in Redis) or bound it with an LRU cache or something like a ring buffer. Or perhaps a bloom filter, if an occasional false positive is ok.
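A bounded "seen" set with LRU eviction is straightforward with `collections.OrderedDict`. A sketch, with the capacity chosen arbitrarily for illustration:

```python
from collections import OrderedDict

class LRUSeen:
    """Bounded set of event ids; the oldest ids are evicted first."""
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self._ids = OrderedDict()

    def check_and_add(self, event_id):
        """Return True if the id was already seen (and refresh it)."""
        if event_id in self._ids:
            self._ids.move_to_end(event_id)  # mark as recently seen
            return True
        self._ids[event_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the oldest id
        return False
```

The trade-off: an id evicted from the cache will be reprocessed if it shows up again later, so size the capacity to cover the window during which duplicates realistically arrive.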

It all kind of depends on the time frame over which you expect the duplicate events - an hour window, or days, or months, etc. Is reprocessing an event forbidden? And how many events and relay sources are involved?

Literal seconds, I would imagine.