Should probably go by list size rather than date; the cost has more to do with how hard it is to filter against the list.
Discussion
idk how it works with other databases, but with badger you can use these "batch" streaming functions that automatically run with as many threads as you specify. a mark-and-sweep style GC pass over 18gb takes about 8 seconds on my machine, and would probably be faster on current-gen NVMe and DDR5 memory
the GC can also do multiple types of collection at the same time, so in a single pass you could have it prune stuff you keep access counters and first-seen timestamps for, as well as snuff old tombstones
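for concreteness, here's a minimal sketch of what such a pass can look like with badger v4's Stream API, where Send receives a *z.Buffer (ristretto's z package; older badger versions passed a *pb.KVList directly). the prefix byte, the tombstoneExpired helper, and the worker count are all made up for illustration, not realy's actual GC:

```go
package main

import (
	"context"

	badger "github.com/dgraph-io/badger/v4"
	"github.com/dgraph-io/ristretto/z"
)

const tombstonePrefix = 0x01 // hypothetical prefix byte for tombstone keys

// tombstoneExpired is a hypothetical predicate; real logic would decode
// the age information embedded in the key and decide if it can be swept.
func tombstoneExpired(key []byte) bool { return false }

// sweepTombstones runs a mark-and-sweep pass over the tombstone keyspace,
// scanning with several concurrent workers and deleting in one batch.
func sweepTombstones(db *badger.DB) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	stream := db.NewStream()
	stream.NumGo = 8                        // scan with 8 concurrent workers
	stream.Prefix = []byte{tombstonePrefix} // only walk tombstone keys
	stream.ChooseKey = func(item *badger.Item) bool {
		// mark phase: runs concurrently, touches only the key table
		return tombstoneExpired(item.Key())
	}
	// Send is called serially with batches of the chosen keys.
	stream.Send = func(buf *z.Buffer) error {
		list, err := badger.BufferToKVList(buf)
		if err != nil {
			return err
		}
		for _, kv := range list.Kv {
			if err := wb.Delete(kv.Key); err != nil { // sweep phase
				return err
			}
		}
		return nil
	}
	if err := stream.Orchestrate(context.Background()); err != nil {
		return err
	}
	return wb.Flush()
}
```

multiple collection types would just mean more conditions inside ChooseKey, or several prefixes walked in the same orchestration.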
ah, just to explain how you do things with badger, because it differs from most other key/value stores due to the separation of key and value tables...
because writing values doesn't force any writes on the keys, the key table stays in order much more of the time; generally, once compacted, it stays compacted (compaction is playing the log out to push it into an easily iterated, pre-sorted array)
as a result, the best strategy with badger for storing any kind of information that won't change but needs to be scanned a lot is to put the values in the keys themselves. tombstones are exactly this kind of immutable data.
it's also used for searching, as you would expect, but this is the reason why a database written (properly) on badger is so much faster: the iterator doesn't have to skip past the values when it's scanning, and you don't have to re-compact the keys when you change values. (and yes, it of course has versioning of keys; i don't use this feature, but in theory some number of past versions of a value can be accessed with a special accessor, and more generally versioning makes the store more resilient, as you would expect)
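to make that "doesn't have to skip past the values" point concrete, here's a minimal sketch of a keys-only prefix scan; the prefix is just a placeholder, and badger's PrefetchValues option is what keeps the value log out of the loop entirely:

```go
import (
	badger "github.com/dgraph-io/badger/v4"
)

// scanKeys walks every key under prefix without ever reading a value.
func scanKeys(db *badger.DB, prefix []byte) ([][]byte, error) {
	var keys [][]byte
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // key table only; the value log is never touched
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			// Key() is only valid until Next(), so take a copy
			keys = append(keys, it.Item().KeyCopy(nil))
		}
		return nil
	})
	return keys, err
}
```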
so, yeah, the current arrangement for tombstones in realy is that the first (left, most significant) half of the event ID hash is the key. finding one is simple and fast: trim off the last half of the ID, prefix it with the tombstone key prefix, and use the "get" function on the transaction instead of making a whole iterator. very neat, very fast.
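a minimal sketch of that lookup, assuming 32-byte event IDs and a made-up prefix byte (realy's actual constant will differ):

```go
import (
	"errors"

	badger "github.com/dgraph-io/badger/v4"
)

const tombstonePrefix = 0x01 // hypothetical; realy's actual prefix byte may differ

// isTombstoned checks for a tombstone keyed on the first (most
// significant) half of the event ID, using a plain Get, no iterator.
func isTombstoned(db *badger.DB, eventID [32]byte) (dead bool, err error) {
	key := append([]byte{tombstonePrefix}, eventID[:16]...) // prefix + left half of the ID
	err = db.View(func(txn *badger.Txn) error {
		_, e := txn.Get(key)
		switch {
		case e == nil:
			dead = true // a key exists, the event is tombstoned
			return nil
		case errors.Is(e, badger.ErrKeyNotFound):
			return nil // no tombstone
		default:
			return e
		}
	})
	return
}
```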
i also exploit these properties of badger's key table in the "return only the ID" functions, by creating an index that contains the whole ID after the event's serial number. that means the event itself never has to be decoded for this case, which is a huge performance optimization as well.
yes, and that full-ID index also contains a truncated hash of the pubkey, the kind number and the timestamp, so you can just pull all the relevant keys for the result serials, filter out pubkeys and kinds, slice by range (if the index search didn't already do this), sort ascending or descending by timestamp, and return the event IDs in that order.
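here's a sketch of how such an index can be consumed; every field width below is an assumption for illustration rather than realy's actual layout, but it shows the trick: everything the filter and sort need lives in the key, so no event is ever decoded:

```go
import (
	"encoding/binary"
	"sort"

	badger "github.com/dgraph-io/badger/v4"
)

// assumed index key layout (widths are illustrative, not realy's exact ones):
// [1B prefix][8B serial][32B event ID][8B truncated pubkey hash][2B kind][8B timestamp]
const (
	idIdxPrefix = 0x02 // hypothetical prefix byte
	serialLen   = 8
	idLen       = 32
	pkHashLen   = 8
	kindLen     = 2
)

type idRow struct {
	id [32]byte
	ts uint64
}

// idsForSerials resolves result serials to event IDs sorted by timestamp,
// reading only index keys: no event is ever decoded.
func idsForSerials(db *badger.DB, serials [][]byte, descending bool) ([][32]byte, error) {
	var rows []idRow
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false
		it := txn.NewIterator(opts)
		defer it.Close()
		for _, ser := range serials {
			pfx := append([]byte{idIdxPrefix}, ser...)
			it.Seek(pfx)
			if !it.ValidForPrefix(pfx) {
				continue
			}
			k := it.Item().KeyCopy(nil)
			if len(k) < 1+serialLen+idLen+pkHashLen+kindLen+8 {
				continue // malformed under the assumed layout
			}
			off := 1 + serialLen
			var r idRow
			copy(r.id[:], k[off:off+idLen])
			// pubkey-hash and kind filtering would happen here, straight off the key
			off += idLen + pkHashLen + kindLen
			r.ts = binary.BigEndian.Uint64(k[off:])
			rows = append(rows, r)
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	sort.Slice(rows, func(i, j int) bool {
		if descending {
			return rows[i].ts > rows[j].ts
		}
		return rows[i].ts < rows[j].ts
	})
	ids := make([][32]byte, len(rows))
	for i, r := range rows {
		ids[i] = r.id
	}
	return ids, nil
}
```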
it's a much faster request to process, and once the client has this list it can pull the events for the first screenful of IDs in a single query, plus a few extra for headroom, and fetch the rest as the display requires them, lazy-loading style.
this is the key reason i built this index, and i designed it to be as svelte as possible for both bandwidth and rendering efficiency
none of this is possible if you don't use a key/value store like badger. it's also why badger was built in the first place: its main purpose was to serve as the storage engine for a graph database, which is very read-heavy. you know database stuff, so you know that joining tables is the biggest time cost, and a graph database is basically a database that does massive table joins to pull its results.
I'm just gonna feed what you just wrote into my AI and tell it to do some magic and see what it suggests.