NFDB will include utilities to set event TTLs
We also have a "free" document relay, so it's a topic we're thinking about.
We make sure everything from subscribers is streamed to theforest, so that we could safely nuke thecitadel. But rolling deletes are less traumatic and more predictable. The relays aren't set up to facilitate rolling deletes, though, according to nostr:npub1xtscya34g58tk0z605fvr788k263gsu6cy9x0mhnm87echrgufzsevkk5s.
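For concreteness, a rolling-delete / TTL pass could look roughly like the sketch below. The store interface and names are hypothetical, not NFDB's actual utilities; it just illustrates expiring everything older than a fixed retention window.

```go
// Hypothetical sketch of a rolling-delete / TTL pass: drop every event
// created before a fixed retention cutoff. Not NFDB's actual API.
package ttl

import "time"

type EventStore interface {
	// DeleteOlderThan removes all events created before the cutoff and
	// returns how many were deleted.
	DeleteOlderThan(cutoff time.Time) (int, error)
}

// RollingDelete runs one expiry pass with the given retention period.
func RollingDelete(s EventStore, retention time.Duration) (int, error) {
	return s.DeleteOlderThan(time.Now().Add(-retention))
}
```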
Discussion
i could do that easily enough, but the real requirement is to maintain a database size limit. a policy that expires data after a fixed time is the "easy" solution, when what would be more useful is to cap the database size and remove whatever has been out of demand the longest, meaning that stuff people want will remain available as long as they keep fetching it
could be done as well with some statistics indexes, age and relevance
yeah, realy already stores a last-accessed timestamp on every record. this is simple and sufficient i think: the least-demanded stuff will naturally have the oldest last-accessed timestamps, and you can iterate them from oldest to newest, counting the size of the records and their indexes, until you have picked off enough of them to reach your low water mark
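A minimal sketch of that pass, assuming a hypothetical record layout rather than realy's actual schema: sort by last-accessed timestamp, then evict from the oldest end until the total size drops to the low water mark.

```go
// Sketch of the garbage-collection pass described above. Type and field
// names are illustrative only.
package gc

import "sort"

type record struct {
	ID           string
	Size         int64 // size of the event plus its index entries, in bytes
	LastAccessed int64 // unix timestamp updated on every fetch
}

// collect returns the IDs to evict so that totalSize falls to lowWater.
func collect(records []record, totalSize, lowWater int64) []string {
	// Oldest last-accessed first: the least recently demanded records.
	sort.Slice(records, func(i, j int) bool {
		return records[i].LastAccessed < records[j].LastAccessed
	})
	var evict []string
	for _, r := range records {
		if totalSize <= lowWater {
			break
		}
		evict = append(evict, r.ID)
		totalSize -= r.Size
	}
	return evict
}
```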
LRU is prone to bias from scrapers and the like
i don't see how if they are only fetching new data
i actively block scrapers manually, and could easily detect them by their unending streams of slowly progressing since/until queries
i could maybe add a second field so it's not just the timestamp but an access count as well, and that mitigates the bias, because a scraper only adds one each time anyway
you could update the access timestamp on filter queries, and not on requests by ID. then my scraper wouldn't have issues, since it will use the designated endpoints
otherwise, you should also ignore filters that are not specific (just since/until) for counting last access
blocking them imo is a bad idea, but if they misbehave that makes sense
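One possible shape for the "ignore non-specific filters" suggestion above, sketched against a simplified filter struct (not any particular library's REQ filter type): only count a fetch as an access when the filter names IDs, authors, or tags, and skip bare since/until window scans of the kind scrapers issue.

```go
// Heuristic sketch: should serving this filter bump the last-accessed
// timestamp (and counter) of the matched events? All names are assumed.
package access

type filter struct {
	IDs     []string
	Authors []string
	Kinds   []int
	Tags    map[string][]string
	Since   int64
	Until   int64
}

// countsAsAccess reports whether the filter is specific enough to treat as
// genuine demand for the events it matches.
func countsAsAccess(f filter) bool {
	// Direct ID lookups and author/tag-scoped queries look like real demand.
	if len(f.IDs) > 0 || len(f.Authors) > 0 || len(f.Tags) > 0 {
		return true
	}
	// A filter that is nothing but a since/until window (with at most a
	// kind list) looks like a scraper walking the whole database.
	return false
}
```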
yeah, i think i'll just add an access counter, and then the sort order will be by last accessed AND least accessed, which will remove the LRU bias. the GC will sort the oldest ones first and then sort those by the least accessed
stuff that might be better to keep will also tend to have higher access counts so it can be shuffled upwards away from the low water mark
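A sketch of that combined ordering, again with illustrative field names. The day-sized bucketing of the last-accessed timestamp is an assumption, added so the access count becomes a meaningful tie-breaker within an age group.

```go
// Sketch of the two-key eviction order: oldest access-day first, then the
// least-fetched records within that day. Field names and the bucket size
// are assumptions, not realy's actual layout.
package gc

import "sort"

type countedRecord struct {
	ID           string
	Size         int64
	LastAccessed int64 // unix seconds, updated on every qualifying fetch
	AccessCount  uint32
}

const bucket = 86400 // group last-accessed times by day (assumed granularity)

// evictionOrder sorts records so the best eviction candidates come first.
func evictionOrder(records []countedRecord) {
	sort.Slice(records, func(i, j int) bool {
		bi, bj := records[i].LastAccessed/bucket, records[j].LastAccessed/bucket
		if bi != bj {
			return bi < bj // older access day sorts earlier
		}
		return records[i].AccessCount < records[j].AccessCount
	})
}
```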
this is for later work, anyhow, but as we have discussed, the idea of making relays into caches for a bigger event store would require capping the storage use of the caches and evicting the least valuable data in the cache
relay operators could then run independent cache relays as part of their service offering, subscribe to the big store, and save on managing their relay's syncing with the broader network (their relays would push to the store when they store events and pull from it when they process requests, refreshing entries that may have found their way down to the end of the list)
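A rough sketch of that cache-relay flow, with entirely hypothetical interfaces for the local cache and the big upstream store: store locally and push a copy upstream, serve reads from the cache, and on a miss pull from the store and re-insert the event so it moves back up the eviction order.

```go
// Sketch of a cache relay backed by a big shared event store. The Store
// interface and types are assumptions, not an existing relay API.
package cache

import "context"

type Event struct {
	ID  string
	Raw []byte
}

type Store interface {
	Put(ctx context.Context, ev Event) error
	Get(ctx context.Context, id string) (Event, bool, error)
}

type CacheRelay struct {
	local    Store // size-capped local cache using the GC sketched above
	upstream Store // the big shared event store
}

// Store saves the event locally and pushes a copy to the big store.
func (c *CacheRelay) Store(ctx context.Context, ev Event) error {
	if err := c.local.Put(ctx, ev); err != nil {
		return err
	}
	return c.upstream.Put(ctx, ev)
}

// Fetch serves from the cache when possible, otherwise pulls from the big
// store and re-inserts the event, refreshing its last-accessed position.
func (c *CacheRelay) Fetch(ctx context.Context, id string) (Event, bool, error) {
	if ev, ok, err := c.local.Get(ctx, id); err != nil || ok {
		return ev, ok, err
	}
	ev, ok, err := c.upstream.Get(ctx, id)
	if err != nil || !ok {
		return Event{}, false, err
	}
	// Writing it back into the cache bumps its last-accessed timestamp.
	return ev, true, c.local.Put(ctx, ev)
}
```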
i'm working on making a bunker app at the moment, but i might switch up the access value field to contain an access counter alongside the last-accessed timestamp, and then maybe reinstate the option of having a garbage collector and a size limit target. these can be set in the dynamic configuration, so you can switch it up whenever you need to, such as after migrating to a bigger VPS
That's a good point. Wouldn't want to remove the Bible just because it's old, for instance. Popular stuff should remain available.
it's for a caching strategy, so the document will still be available, and if fetched again it will again move up the list so as to be retained
if people make use of it to read and search the bible, it will never drift down to the eviction cutoff, and its access counter will continue to escalate, ensuring it is unlikely to fall that far even with a lull in usage
The complex thing is that each verse will be a separate event. Since our events are nested, readers may access part of an index, but not each individual section.