NFDB will include utilities to set event TTLs
We also have a "free" document relay, so it's a topic we're thinking about.
We make sure everything from subscribers is streamed to theforest, so that we could safely nuke thecitadel. But rolling deletes are less traumatic and more predictable. The relays aren't set up to facilitate rolling deletes, though, according to nostr:npub1xtscya34g58tk0z605fvr788k263gsu6cy9x0mhnm87echrgufzsevkk5s.
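For concreteness, a rolling-delete / TTL pass could look roughly like the sketch below. The store interface and names are hypothetical, not NFDB's actual utilities; it just illustrates expiring everything older than a fixed retention window.

```go
// Hypothetical sketch of a rolling-delete / TTL pass: drop every event
// created before a fixed retention cutoff. Not NFDB's actual API.
package ttl

import "time"

type EventStore interface {
	// DeleteOlderThan removes all events created before the cutoff and
	// returns how many were deleted.
	DeleteOlderThan(cutoff time.Time) (int, error)
}

// RollingDelete runs one expiry pass with the given retention period.
func RollingDelete(s EventStore, retention time.Duration) (int, error) {
	return s.DeleteOlderThan(time.Now().Add(-retention))
}
```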
Discussion
i could do that easily enough, but the real requirement is to maintain a database size limit. a policy that expires data after a fixed time is the "easy" solution, when what would be more useful is to cap the database size and remove whatever has been out of demand the longest, meaning that stuff people want will remain available as long as they keep fetching it
could be done as well with some statistics indexes, age and relevance
yeah, realy already stores a last-accessed timestamp on every record. this is simple and sufficient i think: the least-demanded stuff will naturally have the oldest last-accessed timestamps, and you can iterate them from oldest to newest, counting the size of the records and their indexes, until you have picked off enough of them to reach your low water mark
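A minimal sketch of that pass, assuming a hypothetical record layout rather than realy's actual schema: sort by last-accessed timestamp, then evict from the oldest end until the total size drops to the low water mark.

```go
// Sketch of the garbage-collection pass described above. Type and field
// names are illustrative only.
package gc

import "sort"

type record struct {
	ID           string
	Size         int64 // size of the event plus its index entries, in bytes
	LastAccessed int64 // unix timestamp updated on every fetch
}

// collect returns the IDs to evict so that totalSize falls to lowWater.
func collect(records []record, totalSize, lowWater int64) []string {
	// Oldest last-accessed first: the least recently demanded records.
	sort.Slice(records, func(i, j int) bool {
		return records[i].LastAccessed < records[j].LastAccessed
	})
	var evict []string
	for _, r := range records {
		if totalSize <= lowWater {
			break
		}
		evict = append(evict, r.ID)
		totalSize -= r.Size
	}
	return evict
}
```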
LRU is prone to bias from scrapers and the like
i don't see how if they are only fetching new data
i actively block scrapers manually, and could easily detect them by their unending streams of slowly progressing since/until queries
i could maybe add a second field so it's not just the timestamp but an access count as well, and that mitigates the bias, because a scraper only adds one each time anyway
you could update the access timestamp on filter queries, and not on requests by ID. then my scraper wouldn't have issues, since it will use the designated endpoints
otherwise, you should also ignore filters that are not specific (just since/until) for counting last access
blocking them imo is a bad idea, but if they misbehave that makes sense
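One possible shape for the "ignore non-specific filters" suggestion above, sketched against a simplified filter struct (not any particular library's REQ filter type): only count a fetch as an access when the filter names IDs, authors, or tags, and skip bare since/until window scans of the kind scrapers issue.

```go
// Heuristic sketch: should serving this filter bump the last-accessed
// timestamp (and counter) of the matched events? All names are assumed.
package access

type filter struct {
	IDs     []string
	Authors []string
	Kinds   []int
	Tags    map[string][]string
	Since   int64
	Until   int64
}

// countsAsAccess reports whether the filter is specific enough to treat as
// genuine demand for the events it matches.
func countsAsAccess(f filter) bool {
	// Direct ID lookups and author/tag-scoped queries look like real demand.
	if len(f.IDs) > 0 || len(f.Authors) > 0 || len(f.Tags) > 0 {
		return true
	}
	// A filter that is nothing but a since/until window (with at most a
	// kind list) looks like a scraper walking the whole database.
	return false
}
```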
yeah, i think i'll just add an access counter, and then the sort order will be by last accessed AND least accessed, which will remove the LRU bias. the GC will sort the oldest ones first and then sort those by the least accessed
stuff that might be better to keep will also tend to have higher access counts so it can be shuffled upwards away from the low water mark
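A sketch of that combined ordering, again with illustrative field names. The day-sized bucketing of the last-accessed timestamp is an assumption, added so the access count becomes a meaningful tie-breaker within an age group.

```go
// Sketch of the two-key eviction order: oldest access-day first, then the
// least-fetched records within that day. Field names and the bucket size
// are assumptions, not realy's actual layout.
package gc

import "sort"

type countedRecord struct {
	ID           string
	Size         int64
	LastAccessed int64 // unix seconds, updated on every qualifying fetch
	AccessCount  uint32
}

const bucket = 86400 // group last-accessed times by day (assumed granularity)

// evictionOrder sorts records so the best eviction candidates come first.
func evictionOrder(records []countedRecord) {
	sort.Slice(records, func(i, j int) bool {
		bi, bj := records[i].LastAccessed/bucket, records[j].LastAccessed/bucket
		if bi != bj {
			return bi < bj // older access day sorts earlier
		}
		return records[i].AccessCount < records[j].AccessCount
	})
}
```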
this is for later work, anyhow, but as we have discussed, the idea of making relays into caches for a bigger event store would require capping the storage use of the caches and evicting the least valuable data in the cache
relay operators could then run independent cache relays as part of their service offering, subscribe to the big store, and save on managing their relay's syncing with the broader network (their relays would push to the store when they store events and pull from it when they process requests, refreshing entries that may have found their way down to the end of the list)
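A rough sketch of that cache-relay flow, with entirely hypothetical interfaces for the local cache and the big upstream store: store locally and push a copy upstream, serve reads from the cache, and on a miss pull from the store and re-insert the event so it moves back up the eviction order.

```go
// Sketch of a cache relay backed by a big shared event store. The Store
// interface and types are assumptions, not an existing relay API.
package cache

import "context"

type Event struct {
	ID  string
	Raw []byte
}

type Store interface {
	Put(ctx context.Context, ev Event) error
	Get(ctx context.Context, id string) (Event, bool, error)
}

type CacheRelay struct {
	local    Store // size-capped local cache using the GC sketched above
	upstream Store // the big shared event store
}

// Store saves the event locally and pushes a copy to the big store.
func (c *CacheRelay) Store(ctx context.Context, ev Event) error {
	if err := c.local.Put(ctx, ev); err != nil {
		return err
	}
	return c.upstream.Put(ctx, ev)
}

// Fetch serves from the cache when possible, otherwise pulls from the big
// store and re-inserts the event, refreshing its last-accessed position.
func (c *CacheRelay) Fetch(ctx context.Context, id string) (Event, bool, error) {
	if ev, ok, err := c.local.Get(ctx, id); err != nil || ok {
		return ev, ok, err
	}
	ev, ok, err := c.upstream.Get(ctx, id)
	if err != nil || !ok {
		return Event{}, false, err
	}
	// Writing it back into the cache bumps its last-accessed timestamp.
	return ev, true, c.local.Put(ctx, ev)
}
```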
i'm working on making a bunker app at the moment, but i might switch up the access value field to contain an access counter alongside the last-accessed timestamp, and then maybe reinstate the option of having a garbage collector and a size limit target. these can be set in the dynamic configuration, so you can switch it up whenever you need to, such as after migrating to a bigger VPS
That's a good point. Wouldn't want to remove the Bible just because it's old, for instance. Popular stuff should remain available.
it's for a caching strategy, so the document will still be available, and if fetched again it will again move up the list so as to be retained
if people make use of it to read and search the bible, it will never drift down to the eviction cutoff, and its access counter will continue to escalate, ensuring it is unlikely to fall that far even with a lull in usage
The complex thing is that each verse will be a separate event. Since our events are nested, readers may access part of an index, but not each individual section.