yeah, realy already stores a last accessed timestamp on every record, and i think this is simple and sufficient. the least demanded stuff will naturally have the oldest last accessed timestamps, so you can iterate them from oldest to newest, summing the size of the records and their indexes, and pick off as many as it takes to get storage down to your low water mark
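a rough sketch of that oldest-first pass, in python for illustration only. this is not realy's actual code: `Record`, `pick_evictions` and the precomputed `size` field are made-up names, and it assumes the total storage in use is already known.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # hypothetical record shape, not realy's real schema
    id: str
    size: int           # bytes used by the record plus its index entries (assumed precomputed)
    last_accessed: int  # unix timestamp of the most recent read


def pick_evictions(records, total_size, low_water_mark):
    """Return ids to delete, oldest-accessed first, until usage is at or below the low water mark."""
    victims = []
    for rec in sorted(records, key=lambda r: r.last_accessed):
        if total_size <= low_water_mark:
            break
        victims.append(rec.id)
        total_size -= rec.size
    return victims
```

nothing is evicted if usage is already at or under the mark, and the pass stops as soon as enough bytes have been freed.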
Discussion
LRU is prone to bias from scrapers and the like
i don't see how if they are only fetching new data
i actively block scrapers manually, and could easily detect them with their unending streams of slowly progressing since/until queries
i could maybe add a second field so it's not just the timestamp but an access count as well and that mitigates the bias because they only add one each time anyway
you could modify access timestamp on filter, and not request by ID. then my scraper wouldn’t have issues since it will use the designated endpoints
otherwise, you should also ignore filters that are not specific (just since/until) for counting last access
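the "ignore non-specific filters" rule could look something like this. it is only a sketch: the filter is assumed to be a plain dict with the usual nostr filter keys (`ids`, `authors`, `kinds`, tag keys, `since`, `until`, `limit`), and treating `limit` as non-specific along with `since`/`until` is an assumption added here, as is the function name.

```python
# keys that only bound time or result size; assumed here to mark a filter as non-specific
BROAD_ONLY_KEYS = {"since", "until", "limit"}


def counts_as_access(filter_: dict) -> bool:
    """True if the filter has at least one non-empty selective field (ids, authors, kinds, tags...)."""
    return any(
        key not in BROAD_ONLY_KEYS and bool(value)
        for key, value in filter_.items()
    )
```

a relay would then only bump the last accessed timestamp for events returned by filters where this is true, so a scraper walking since/until windows leaves the access data untouched.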
blocking them imo is a bad idea, but if they misbehave that makes sense
yeah, i think i'll just add an access counter and then the sort order will be by last accessed AND least accessed, and this will remove the LRU bias, the GC will sort the oldest ones first and then sort those by the least accessed
stuff that might be better to keep will also tend to have higher access counts so it can be shuffled upwards away from the low water mark
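one possible reading of "oldest first, then least accessed", sketched in python. the exact ordering is not pinned down above, so this assumes last accessed times are grouped into coarse buckets (a day by default) and that records inside a bucket are ordered by access count. `CountedRecord`, `eviction_order` and `bucket_seconds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CountedRecord:
    # hypothetical record shape with both access fields
    id: str
    last_accessed: int  # unix timestamp of the most recent read
    access_count: int   # number of reads that counted as an access


def eviction_order(records, bucket_seconds=86400):
    """Sort by age bucket first, then by fewest accesses, then by exact timestamp."""
    return sorted(
        records,
        key=lambda r: (r.last_accessed // bucket_seconds, r.access_count, r.last_accessed),
    )
```

with this ordering a record that is old but heavily read sits behind rarely read records of similar age, which is the "shuffled upwards away from the low water mark" effect. a single scraper pass only adds one to each count, so it does not reorder things much within a bucket.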
this is for later work, anyhow, but as we have discussed, the idea of making relays into caches for a bigger event store would require capping the storage use of the caches and evicting the least valuable data in the cache
relay operators could then run independent cache relays as part of their service offering and subscribe to the big store and save on managing their relay's syncing with the broader network (their relays would push to the store when they store and pull when they process requests, refreshing entries that may have found their way down to the end of the list)
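the push-on-store, pull-on-request flow might be sketched like this. it is a toy model, not a design for any real relay: the big store is just a dict, events are opaque values, and `CacheRelay`, `publish` and `fetch` are made-up names.

```python
class CacheRelay:
    """Toy cache relay in front of a larger event store (modelled as a dict of id -> event)."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.last_accessed = {}

    def publish(self, event_id, event, now):
        # store locally, then push to the big store
        self.cache[event_id] = event
        self.last_accessed[event_id] = now
        self.store[event_id] = event

    def fetch(self, event_id, now):
        # pull from the big store on a cache miss
        if event_id not in self.cache:
            if event_id not in self.store:
                return None
            self.cache[event_id] = self.store[event_id]
        # refresh the entry so it moves away from the eviction end of the list
        self.last_accessed[event_id] = now
        return self.cache[event_id]
```

eviction would then work purely off `last_accessed` (and an access count, if added), since anything dropped from the cache can be pulled from the store again on the next request.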
i'm working on making a bunker app at the moment, but i might switch up the last accessed value field to contain an access counter alongside the last accessed timestamp, and then maybe reinstate the option of having a garbage collector and a size limit target. these can be done in the dynamic configuration so you can switch it up whenever you need to, such as after migrating to a bigger VPS