I've been meaning to write a relay that keeps multiple revisions of replaceable events.

hopefully in a couple of weeks I'll have the time; it should be fairly straightforward, and clients that can't handle revisions wouldn't break (they already need to filter for the most recent event from multiple relays)

nostr:npub1fjqqy4a93z5zsjwsfxqhc2764kvykfdyttvldkkkdera8dr78vhsmmleku I know you were interested in this too, have you done any work in this direction?

Discussion

I had the idea of a 20/20 rule: keep the newest 20 revisions of replaceable events, or (if there are fewer) keep revisions for at most 20 months.
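A minimal sketch of that rule (the names, and the assumption that the current version is always kept, are mine, not from any relay):

```go
// 20/20 retention sketch: keep a revision while it is among the newest 20
// for its address and no older than roughly 20 months; the current version
// is assumed to always be kept.
package retention

import "time"

const (
	maxRevisions = 20
	maxAge       = 20 * 30 * 24 * time.Hour // roughly 20 months
)

// keep reports whether a revision should be retained, given its rank from
// newest (0 = current version) and when it was created.
func keep(rank int, createdAt, now time.Time) bool {
	if rank == 0 {
		return true // never drop the current version
	}
	return rank < maxRevisions && now.Sub(createdAt) <= maxAge
}
```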

writing a GC is a solution that covers more cases with fewer special cases though, all it requires is a last-accessed timestamp

GC?

Ah, garbage collection. 😅 Thanks!

garbage collector

you track each event's last access time, and then when you do a GC run you grab all the event serials along with their last-access timestamps, collect the size of each of these events, sort them from oldest to newest access, total up the sizes, then count off enough of the least recently accessed events to bring you down to your "low water mark" target and delete them
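A minimal sketch of that selection pass, assuming each stored event is tracked by a serial with its size and last-access time (the type and field names here are illustrative, not replicatr's actual ones):

```go
// GC candidate selection: sort by last access (oldest first) and count off
// events until the total stored size drops to the low water mark.
package gc

import (
	"sort"
	"time"
)

type eventInfo struct {
	Serial       uint64
	Size         int64
	LastAccessed time.Time
}

func selectForDeletion(events []eventInfo, total, lowWater int64) []uint64 {
	sort.Slice(events, func(i, j int) bool {
		return events[i].LastAccessed.Before(events[j].LastAccessed)
	})
	var victims []uint64
	for _, ev := range events {
		if total <= lowWater {
			break
		}
		victims = append(victims, ev.Serial)
		total -= ev.Size
	}
	return victims
}
```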

in badger db i did this in a "batch" transaction which uses divide and conquer to break the operation into a series of parallel operations (ideally as many as there are CPU threads) and it happens very fast
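For the delete side, a hedged sketch using the upstream badger WriteBatch API (replicatr's actual key layout and serials will differ; the 8-byte key here is illustrative):

```go
// Delete the selected serials in one badger write batch; badger groups the
// deletes into large internal transactions instead of one per key.
package gc

import (
	"encoding/binary"

	"github.com/dgraph-io/badger/v4"
)

func deleteSerials(db *badger.DB, serials []uint64) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel()
	for _, s := range serials {
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, s)
		if err := wb.Delete(key); err != nil {
			return err
		}
	}
	return wb.Flush()
}
```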

by doing it this way you solve multiple problems with one action, and events that haven't been accessed in a long time are the best candidates for removal... old replaceable events will naturally fall into that category, because clients mostly just want the newest version, accessing old versions is infrequent, and more often than not you'd only want the next most recent one or so anyway, so they expire from the cache without any extra rules or logic

Ah, very clever.

it's pretty much how Go manages memory, though their algorithm is substantially more clever than mine and has been refined over 15 years. i would eventually want to make it dynamic so it reacts to bursts in traffic, spacing out and shortening GC passes to minimise latency for client requests during heavy traffic; all of these are similar cases handled by the Go GC, which, BTW, was mostly engineered by the guy who built the V8 javascript engine, the basis of nodejs and chrome's javascript interpreter

yes, i removed the replaceable delete function; you can literally just write a filter that asks for an unbounded number of events of a replaceable kind associated with a pubkey, and the newest version comes first followed by all the rest
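As a sketch of what that query looks like from a client, using the go-nostr Filter type (the kind and pubkey here are placeholders):

```go
// Ask for every stored revision of a replaceable kind for one author; with
// no limit set, a relay that keeps revisions returns them newest first and
// the client keeps whichever versions it wants.
package main

import (
	"encoding/json"
	"fmt"

	"github.com/nbd-wtf/go-nostr"
)

func main() {
	filter := nostr.Filter{
		Kinds:   []int{30818},             // e.g. wiki articles
		Authors: []string{"<pubkey-hex>"}, // placeholder
	}
	b, _ := json.Marshal(filter)
	fmt.Println(string(b)) // prints the filter as JSON
}
```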

you'll find it's really easy to modify any relay to do this, just change the special cases for replaceable events

as someone pointed out to me already, clients already sort and pick the newest if multiple come back from these queries, so it's just a matter of removing that racy delete

also, deleting events other than when the author (or admin) sends a delete message is a silly idea, much better to have garbage collection that just scans from time to time and removes the most stale versions long after they are out of date

if clients were more savvy with this, they could easily implement rollback when you make a wrong update

yes, this rollback / revision control is what I've wanted to implement in wikifreedia for weeks now, but I haven't had the time to modify my relay

are you storing full events locally or are you storing deltas and computing the full payload when serving them?

do you have this running on replicatr? any URL I can test on?

storing full events

you have to enable the GC size limit for it to have a high and low water mark, and you can configure those further if the defaults don't fit your case; even further, you can create a second-level data store, which would presumably be a shared data store accessed over the network, and the headroom above the high water mark will then store the indexes of events that have fallen out of the local cache but still allow fast filter searches
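In other words, something with this rough shape (the field names are illustrative, not replicatr's actual options):

```go
// Hypothetical GC / L2 configuration shape: GC only runs once a size limit
// is set, the water marks are derived from it, and an optional second-level
// store catches events that fall out of the local cache.
package config

type GCConfig struct {
	SizeLimit int64   // bytes of local event + index storage; 0 disables GC
	HighWater float64 // fraction of SizeLimit that triggers a GC pass
	LowWater  float64 // fraction the pass prunes back down to
	L2        string  // optional shared/network event store; the headroom
	// above HighWater keeps only the indexes of evicted events, so filter
	// searches stay fast while the full events live in the L2 store
}
```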

https://mleku.net/replicatr is the core, which is a fork of khatru, and https://mleku.net/eventstore is the eventstore with GC enabled for the "badger" backend; there is an "l2" event store that lets you plug in two event stores, one usually badger and the other anything else, and there is a "badgerbadger" which i wrote using two levels of badger event store, one with GC on and L2 enabled, that tests the GC once your event and index storage size exceeds the size limit

btw, fiatjaf is wrong about badger, he just doesn't know how to use it or write good, bug-proof binary encoding libraries... the batch processing functions are incredibly fast: a 15gb database can be measured in ~8 seconds, and if a GC pass is needed that might take another 5-12 seconds depending on how far over the limit it got

also, yes, that will scale; on a 20-core Threadripper with 40MB of cache and 128GB of memory it would probably zip through that job in less than half that time

how much has replicatr and your eventstore deviated from khatru and fj's eventstore? is it a drop-in(ish) replacement? almost all my custom relays are based on khatru.

do you have NIP-50 support on your eventstore? I needed to add that for wikifreedia's search

the eventstore is almost drop-in except for the definition of the (basically identical) eventstore interface

most code written to work with khatru's arrays of closures can also be quickly adapted

no, i haven't got around to doing that... full text search, right? it requires writing another index, though it may be easier to get it happening sooner if you use a DB engine that already has that as a turn-key option

the Internet Computer database engine has some kind of complex indexing scheme on it and would likely be easy to make do this, but the badger event store is bare bones; all it is built to do is fast filter searches and GC... it would not be hard to add more indexes, but it would be a couple of months' work, i'd estimate
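To illustrate what "another index" means here, a rough sketch (my own illustration, not replicatr code) of an inverted index over badger-style keys: one key per (token, event serial) pair, so a NIP-50 search term becomes a prefix scan and intersecting the serials gives the result set.

```go
// Inverted-index sketch for full text search: write one key per
// (token, serial) under a dedicated prefix when an event is stored.
package fts

import (
	"encoding/binary"
	"strings"
)

const ftsPrefix = byte(0x10) // hypothetical table prefix for the search index

// indexKeys builds the keys to write (values can be empty) when an event
// with the given serial is stored.
func indexKeys(serial uint64, content string) [][]byte {
	seen := map[string]bool{}
	var keys [][]byte
	for _, tok := range strings.Fields(strings.ToLower(content)) {
		if seen[tok] {
			continue
		}
		seen[tok] = true
		key := []byte{ftsPrefix}
		key = append(key, tok...)
		key = append(key, 0x00) // separator so "foo" scans don't match "foobar"
		key = binary.BigEndian.AppendUint64(key, serial)
		keys = append(keys, key)
	}
	return keys
}
```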

well, i think i could get an MVP in 1 month anyhow

curious if it would still honour delete requests or no?

Yes, but I will go add delete events specifically to the whitelist.

Done.

Maybe you could return just the latest event by default and only return the history when asked for with a limit > 1 or some other criteria.

yup, this is what I had in mind too, but mainly to avoid sending more data than most clients will probably use

Hmm, but it doesn't make sense to specify a limit when you want (the latest version of) multiple replaceable events.

Nostr's crappy querying language fails again. We need JOINs.

It's probably better to have a special relay or a special subdomain just for the relay that archives stuff though. And then clients should know to use that when they want old stuff.

Yeah, I wanted to set up the archive, but I need someone more familiar with relays and archiving to do it, as that isn't really our area of expertise. And blows up our meagre budget. 😬

Would be good to have at least one public archive relay, in addition to the couple of public "other stuff" relays we now have.

yeah, I think the only moment where you would return multiple versions is when you're being queried for something in particular

kinds: [30818], pubkey: [fiatjaf], #d: ["ipfs"], limit: 10

perhaps this warrants adding a new filter?

kinds: [30818], pubkey: [fiatjaf], #d: ["ipfs"], revisions: 10
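Purely as an illustration of that proposal (nothing like this exists in any NIP today, and the real filter field for pubkeys is "authors"):

```go
// Hypothetical "revisions" filter extension: the standard filter fields
// plus a cap on how many old versions per replaceable address to return.
package main

import (
	"encoding/json"
	"fmt"
)

type revisionsFilter struct {
	Kinds     []int    `json:"kinds,omitempty"`
	Authors   []string `json:"authors,omitempty"`
	DTags     []string `json:"#d,omitempty"`
	Revisions int      `json:"revisions,omitempty"`
}

func main() {
	f := revisionsFilter{
		Kinds:     []int{30818},
		Authors:   []string{"<fiatjaf-pubkey-hex>"}, // placeholder
		DTags:     []string{"ipfs"},
		Revisions: 10,
	}
	b, _ := json.Marshal(f)
	fmt.Println(string(b))
}
```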

Indeed, that solves it.

Not the new filter, I don't think it's necessary at least for now.

This would be great for contact list recovery on metadata.nostr.com