This doesn't solve for the versioning of replaceable articles, but at least we have an additional free-to-use relay for our various clients.
nostr:npub10npj3gydmv40m70ehemmal6vsdyfl7tewgvz043g54p0x23y0s8qzztl5h nostr:npub1l2vyh47mk2p0qlsku7hg0vn29faehy9hy34ygaclpn66ukqp3afqutajft nostr:npub15qydau2hjma6ngxkl2cyar74wzyjshvl65za5k5rl69264ar2exs5cyejr nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6 nostr:npub1wqfzz2p880wq0tumuae9lfwyhs8uz35xd0kr34zrvrwyh3kvrzuskcqsyn nostr:npub1ecdlntvjzexlyfale2egzvvncc8tgqsaxkl5hw7xlgjv2cxs705s9qs735
Discussion
I've been meaning to write a relay that keeps multiple revisions of replaceable events.
hopefully in a couple of weeks I'll have the time. it should be fairly straightforward, and clients that can't handle revisions wouldn't break (they already need to filter for the most recent event from multiple relays)
nostr:npub1fjqqy4a93z5zsjwsfxqhc2764kvykfdyttvldkkkdera8dr78vhsmmleku I know you were interested in this too, have you done any work in this direction?
I had the idea of a 20/20 rule: keep the newest 20 revisions of replaceable events, or (if there are fewer) keep revisions for at most 20 months.
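A minimal sketch of one literal reading of that rule, in Go (the `Revision` type is hypothetical and the input is assumed to be sorted newest-first):

```go
package retention

import "time"

// Revision is a hypothetical stand-in for one stored version of a
// replaceable event; only its creation time matters for this rule.
type Revision struct {
	CreatedAt time.Time
}

// Prune2020 applies the "20/20 rule": keep at most the newest 20
// revisions, and when there are fewer than 20, drop anything older
// than 20 months. revs must be sorted newest-first.
func Prune2020(revs []Revision, now time.Time) []Revision {
	const maxRevs = 20
	cutoff := now.AddDate(0, -20, 0) // 20 months ago

	if len(revs) > maxRevs {
		return revs[:maxRevs]
	}
	kept := revs[:0]
	for _, r := range revs {
		if r.CreatedAt.After(cutoff) {
			kept = append(kept, r)
		}
	}
	return kept
}
```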
writing a GC is a solution that covers more cases with fewer special cases though; all it requires is a last-accessed timestamp
GC?
garbage collector
you track each event's last access time, and when you do a GC run you grab all the event serials along with their last-access timestamps, collect the size of each of these events, sort them from oldest access to newest, total up the sizes, then count off as many of the least-recently-accessed events as it takes to get down to your "low water mark" target and delete them
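A minimal sketch of that GC pass in Go (names hypothetical; it assumes you can enumerate each event's serial, size, and last-access time, and that the store supplies a delete-by-serial function):

```go
package gc

import "sort"

// rec is one event as the garbage collector sees it: its serial in the
// store, its size on disk, and the unix timestamp of its last access.
type rec struct {
	Serial     uint64
	Size       int64
	LastAccess int64
}

// Collect deletes the least-recently-accessed events until the total
// size is at or below lowWater. deleteFn is whatever the event store
// uses to remove an event by serial (in badger this can be batched).
func Collect(recs []rec, lowWater int64, deleteFn func(serial uint64) error) error {
	// total current size of everything the GC can see
	var total int64
	for _, r := range recs {
		total += r.Size
	}
	// oldest last access first, so the stalest events go first
	sort.Slice(recs, func(i, j int) bool {
		return recs[i].LastAccess < recs[j].LastAccess
	})
	// count off events from the stale end until we hit the target
	for _, r := range recs {
		if total <= lowWater {
			break
		}
		if err := deleteFn(r.Serial); err != nil {
			return err
		}
		total -= r.Size
	}
	return nil
}
```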
in badger db i did this in a "batch" transaction, which uses divide and conquer to break the operation into a series of parallel operations (ideally one per CPU thread), and it happens very fast
by doing it this way, you solve multiple problems with one action, and events that haven't been accessed in a long time are the best candidates for removal... old replaceable events will naturally fall into that, because clients mostly want only the newest version, accessing old versions is infrequent, and more often than not you only want the next-oldest or so anyway, so they will expire from the cache without any extra rules or logic
Ah, very clever.
it's pretty much how Go manages memory, though their algorithm is substantially more clever than mine and refined over 15 years. i would eventually want to make it dynamic so it reacts to bursts in traffic, spacing out and shortening GC passes to minimise latency for client requests during heavy traffic. all of these are similar cases handled by the Go GC, which, BTW, was mostly engineered by the guy who built the V8 javascript engine, the basis of nodejs and chrome's javascript interpreter
yes, i removed the replaceable delete function; you can literally just write a filter that asks for an unbounded number of events of a replaceable kind associated with a pubkey, and the newest version comes first, followed by all the rest
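For example, with go-nostr-style filter types (assuming the current `Filter` struct; the kind and tag are just the wiki example from later in the thread), such a query could look roughly like this:

```go
package client

import "github.com/nbd-wtf/go-nostr"

// AllRevisionsFilter asks for every stored version of one
// parameterized-replaceable event (e.g. a kind 30818 wiki page) rather
// than only the newest. Leaving Limit unset means "unbounded" as far as
// the filter is concerned: a revision-keeping relay returns them all,
// newest first, while an ordinary relay just returns the single newest
// one, so existing clients don't break.
func AllRevisionsFilter(author, dTag string) nostr.Filter {
	return nostr.Filter{
		Kinds:   []int{30818},
		Authors: []string{author}, // hex pubkey
		Tags:    nostr.TagMap{"d": []string{dTag}},
	}
}
```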
you'll find it's really easy to modify any relay to do this, just change the special cases for replaceable events
as someone pointed out to me already, clients already sort and pick the newest if multiple come back from these queries, so it's just a matter of removing that racy delete
also, deleting events other than when the author (or admin) sends a delete message is a silly idea, much better to have garbage collection that just scans from time to time and removes the most stale versions long after they are out of date
if clients were more savvy with this, they could easily implement rollback when you make a wrong update
yes, this rollback / revision control is what I've wanted to implement in wikifreedia for weeks now, but I haven't had the time to modify my relay
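A minimal sketch of what that client-side rollback could look like, again with go-nostr (names hypothetical): take the old revision's content and tags, stamp it with a fresh created_at, and sign it as a new event so it becomes the newest version everywhere.

```go
package client

import "github.com/nbd-wtf/go-nostr"

// Rollback re-publishes an older revision as a brand new event so that
// relays and clients that only keep "the newest" version see the old
// content again. privkey is the author's secret key in hex.
func Rollback(old nostr.Event, privkey string) (nostr.Event, error) {
	ev := nostr.Event{
		Kind:      old.Kind,
		Content:   old.Content,
		Tags:      old.Tags, // same "d" tag, so it replaces the current one
		CreatedAt: nostr.Now(),
	}
	if err := ev.Sign(privkey); err != nil {
		return nostr.Event{}, err
	}
	return ev, nil
}
```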
are you storing full events locally or are you storing deltas and computing the full payload when serving them?
do you have this running on replicatr? any URL I can test on?
storing full events
you have to enable the GC size limit for it to have a high and a low water mark, and you can configure those if the defaults don't fit your case. going further, you can create a second-level data store, which would presumably be a shared data store accessed over the network; the headroom above the high water mark then stores the indexes of events that have fallen out of the local cache but still allows fast filter searches
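Conceptually, the knobs being described look something like the sketch below; these are illustrative names only, not the actual replicatr/eventstore configuration.

```go
package store

// EventStore is a stand-in for whatever event-store interface the relay
// uses (khatru and replicatr's eventstore each define their own).
type EventStore interface{}

// GCConfig illustrates the settings described above (hypothetical).
type GCConfig struct {
	// SizeLimit turns the GC on; 0 means never collect.
	SizeLimit int64
	// HighWater is the fraction of SizeLimit at which a GC pass starts.
	HighWater float64 // e.g. 0.9
	// LowWater is the fraction of SizeLimit a pass tries to get back to.
	LowWater float64 // e.g. 0.75
	// L2 is an optional second-level (possibly network-shared) event
	// store; when set, the headroom above the high water mark keeps the
	// indexes of events evicted from the local cache so that filter
	// searches stay fast.
	L2 EventStore
}
```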
https://mleku.net/replicatr is the core, which is a fork of khatru, and https://mleku.net/eventstore is the eventstore with GC enabled for "badger". there is an "l2" event store that lets you plug in two event stores, one usually badger and the other anything else, and there is a "badgerbadger" which i wrote using two levels of badger event store, one with GC on and L2 enabled, that tests the GC once your event and index storage size exceeds the size limit
btw, fiatjaf is wrong about badger, he just doesn't know how to use it or write good, bugproof binary encoding libraries... the batch processing functions are incredibly fast: a 15gb database can be measured in ~8 seconds, and if a GC pass is needed that might take another 5-12 seconds depending on how far over the limit it got
also, yes, that will scale; on a 20-core threadripper with 40MB of cache and 128GB of memory it would probably zip through that job in less than half that time
how much have replicatr and your eventstore deviated from khatru and fj's eventstore? is it a drop-in(ish) replacement? almost all my custom relays are based on khatru.
do you have NIP-50 support on your eventstore? I needed to add that for wikifreedia's search
the eventstore is almost drop-in except for the definition of the (basically identical) eventstore interface
most code written to work with khatru's arrays of closures can also be quickly adapted
no, i haven't got to doing that... full text search, right? it requires writing another index, though that may be easier to get working sooner if you use a DB engine that already has that as a turn-key option
the Internet Computer database engine has some kind of complex indexing scheme on it and would likely be easy to make do this, but the badger event store is bare bones; all it is built to do is fast filter searches and GC... it would not be hard to add more indexes, but it would be a couple of months' work, i'd estimate
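For what it's worth, a very simplified sketch of what that extra index could look like over a key-value store like badger: one key per (word, event serial), written when the event is stored, so a NIP-50 search term becomes a prefix scan. Real tokenisation, ranking, and the actual key layout are the hard parts this skips.

```go
package fulltext

import (
	"fmt"
	"strings"
)

// IndexKeys returns the keys a key-value store would write for one
// event so that a word can later be looked up with a prefix scan of
// "ft|<word>|". This is a toy: no stemming, ranking, or stop words.
func IndexKeys(serial uint64, content string) []string {
	seen := make(map[string]bool)
	var keys []string
	for _, w := range strings.Fields(strings.ToLower(content)) {
		w = strings.Trim(w, ".,;:!?\"'()[]")
		if w == "" || seen[w] {
			continue
		}
		seen[w] = true
		keys = append(keys, fmt.Sprintf("ft|%s|%d", w, serial))
	}
	return keys
}
```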
well, i think i could get an MVP in 1 month anyhow
Maybe you could return just the latest event by default and only return the history when asked for with a limit > 1 or some other criteria.
yup, this is what I had in mind too, but mainly to avoid sending more data than most clients will probably use
Hmm, but it doesn't make sense to specify a limit when you want (the latest version of) multiple replaceable events.
Nostr's crappy querying language fails again. We need JOINs.
It's probably better to have a special relay or a special subdomain just for the relay that archives stuff though. And then clients should know to use that when they want old stuff.
Yeah, I wanted to set up the archive, but I need someone more familiar with relays and archiving to do it, as that isn't really our area of expertise. And it blows up our meagre budget. 😬
Would be good to have at least one public archive relay, in addition to the couple of public "other stuff" relays we now have.
yeah, I think the only moment where you would return multiple versions is when you're being queried for something in particular
kinds: [30818], pubkey: [fiatjaf], #d: ["ipfs"], limit: 10
perhaps this warrants adding a new filter?
kinds: [30818], pubkey: [fiatjaf], #d: ["ipfs"], revisions: 10
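If a relay wanted to support that non-standard, hypothetical `revisions` field, the serving side could look roughly like the sketch below: group matches per replaceable address, newest first, and cut each group at `revisions`, defaulting to 1 so existing clients keep today's behaviour (go-nostr types assumed, including its `Tags.GetFirst` helper).

```go
package relay

import (
	"fmt"
	"sort"

	"github.com/nbd-wtf/go-nostr"
)

// LimitRevisions keeps at most n versions per replaceable "address"
// (kind:pubkey:d-tag), newest first. n <= 0 is treated as 1, which
// matches the current behaviour of returning only the latest version.
func LimitRevisions(events []nostr.Event, n int) []nostr.Event {
	if n <= 0 {
		n = 1
	}
	groups := make(map[string][]nostr.Event)
	for _, ev := range events {
		d := ""
		if tag := ev.Tags.GetFirst([]string{"d"}); tag != nil {
			d = tag.Value()
		}
		addr := fmt.Sprintf("%d:%s:%s", ev.Kind, ev.PubKey, d)
		groups[addr] = append(groups[addr], ev)
	}
	var out []nostr.Event
	for _, g := range groups {
		sort.Slice(g, func(i, j int) bool {
			return g[i].CreatedAt > g[j].CreatedAt // newest first
		})
		if len(g) > n {
			g = g[:n]
		}
		out = append(out, g...)
	}
	return out
}
```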
This would be great for contact list recovery on metadata.nostr.com
