What compress logic is that? You mean the migration step on LMDB that reindexes all the events? I forgot to do that on Badger because I wanted to see if it would work first.
nostr:npub1utx00neqgqln72j22kej3ux7803c2k986henvvha4thuwfkper4s7r50e8. You can find the PR for this here: https://github.com/bitvora/haven/pull/61. Tested on Windows, macOS, and Fedora (both x86 and ARM).
I decided against implementing the compress flag: outdated events were also being retained in BadgerDB, but as saintly as nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6 is, he has only implemented compression logic for LMDB. I'm too lazy to add compress logic for BadgerDB in eventstore, and it doesn't make sense to implement the compress flag otherwise.
In practice, some old replaceable and addressable events will naturally be deleted as users write new versions. If users find themselves with millions of outdated events stored in the database, as I did, they can always nuke the database and reimport. The difference in performance is impressive.
Please let me know once you’ve merged this (or if you need any additional effort on the PR) so I can cut a new Podman/Docker version.
Discussion
Oh, now I see what you mean. I don't think that compact stuff is relevant here. The thing is that on both Badger and LMDB, whenever we got a replaceable event stored twice (apparently because of race conditions), it would break the indexes in some way I forget, such that it became impossible to ever delete the older version because it was unreachable. The only way to fix this is to do a full rescan of the database and rebuild all the indexes, which is what LMDB does in its latest migration (I deleted all the other migrations because this big rescan makes them unnecessary).
Badger needs that same thing, but I forgot to do it. In any case, you can do it manually by copying the database, nuking it, and reimporting the events from the old copy into the new empty one. I should still write the reindexing migration, though.
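For anyone who wants to script that manual fix on Badger, here is a rough sketch of the copy-nuke-reimport approach using the eventstore library. The paths are placeholders and the method names (Init, QueryEvents, SaveEvent, Close) are written from memory, so treat the exact API as an assumption; depending on the backend's query limits you may also need to page through events in time windows rather than use a single empty filter.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/fiatjaf/eventstore/badger"
	"github.com/nbd-wtf/go-nostr"
)

func main() {
	ctx := context.Background()

	// Old database: a copy of the broken one. New database: an empty directory.
	// Both paths are placeholders.
	old := &badger.BadgerBackend{Path: "./db-old-copy"}
	fresh := &badger.BadgerBackend{Path: "./db-new"}
	for _, s := range []*badger.BadgerBackend{old, fresh} {
		if err := s.Init(); err != nil {
			log.Fatal(err)
		}
		defer s.Close()
	}

	// An empty filter is assumed to match every stored event; some backends
	// cap results, in which case you would page through with since/until windows.
	ch, err := old.QueryEvents(ctx, nostr.Filter{})
	if err != nil {
		log.Fatal(err)
	}

	count := 0
	for evt := range ch {
		// Saving each event into the fresh database rebuilds its indexes.
		if err := fresh.SaveEvent(ctx, evt); err != nil {
			log.Printf("could not reimport %s: %v", evt.ID, err)
			continue
		}
		count++
	}
	fmt.Printf("reimported %d events\n", count)
}
```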
Hi fiatjaf, yes, that’s the one. At 2:00 am “compact” somehow became “compress” in my head. But you’re right, other than making certain “immortal” events deletable on LMDB as per your original intent above, for the end user, it’s essentially a major database-wide deduplication (which is exactly what I was looking for with BadgerDB).
You’re also right that nuking the database and reimporting old notes has the same effect. This is what I’m suggesting for Haven users for now. Unfortunately, Haven can’t import its own backups (yet), but users can always reimport some of their old notes from other relays or temporarily use a second instance of Haven to do this. I was just considering a Haven-specific --compact flag for completeness (e.g., so users don’t lose private notes that aren’t currently reimported) and to save them the trouble of doing this manually.
Either way, awesome work. Many thanks! Haven is absolutely flying with the new Khatru engine. I even tested this with an old LMDB database backup I keep around for testing purposes. Compacting the database cleared out over 2 million duplicated events from a database containing only around 1k short notes. It’s impressive how clients continuously spam lists, sets, etc. Now I know there was much more to it than just the Amethyst kind 10002 write loop bug.
Replaceable events should only be used for things that are written sporadically. There's some shady stuff being done out there with these; I think it's wise to only allow some explicitly whitelisted kinds that we know aren't spammy.
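A minimal sketch of what that whitelist could look like in a khatru-based relay, assuming the RejectEvent hook signature as I remember it from the library (the allowed kinds below are purely illustrative, not a recommendation):

```go
package main

import (
	"context"

	"github.com/fiatjaf/khatru"
	"github.com/nbd-wtf/go-nostr"
)

// Illustrative whitelist of replaceable/addressable kinds this relay accepts.
var allowedReplaceable = map[int]bool{
	0:     true, // profile metadata
	3:     true, // follow list
	10002: true, // relay list (NIP-65)
	30023: true, // long-form article
}

// isReplaceableOrAddressable follows the NIP-01 kind ranges:
// 0, 3 and 10000-19999 are replaceable; 30000-39999 are addressable.
func isReplaceableOrAddressable(kind int) bool {
	return kind == 0 || kind == 3 ||
		(kind >= 10000 && kind < 20000) ||
		(kind >= 30000 && kind < 40000)
}

func main() {
	relay := khatru.NewRelay()

	relay.RejectEvent = append(relay.RejectEvent,
		func(ctx context.Context, event *nostr.Event) (bool, string) {
			if isReplaceableOrAddressable(event.Kind) && !allowedReplaceable[event.Kind] {
				return true, "blocked: this replaceable kind is not whitelisted on this relay"
			}
			return false, ""
		},
	)

	// ... hook up StoreEvent/QueryEvents and serve the relay as usual.
}
```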
💯 To be fair, I’m encountering broken client behaviour and client/relay incompatibilities that produce spammy activity with otherwise "legit" events more often than actual malicious code or targeted attacks. But you’re right, we should be doing something about it. Whitelisting specific kinds is a good start. Maybe I’ll build a Citrine-style dashboard for Haven so users can at least get a sense of what they’re storing in their relays. From there, we could add functionality for deleting individual events, deleting all events of a certain kind, or even blocking them entirely.
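For the "delete all events of a certain kind" part, a hypothetical helper along these lines would probably do, again assuming the eventstore query/delete method names from memory (the package name and function are mine, not Haven's):

```go
package haventools // hypothetical package, not part of Haven

import (
	"context"

	"github.com/fiatjaf/eventstore/badger"
	"github.com/nbd-wtf/go-nostr"
)

// DeleteKind removes every stored event of the given kind by querying for it
// and deleting each match. Method names are assumptions based on the
// eventstore interface; verify against the actual library before relying on it.
func DeleteKind(ctx context.Context, store *badger.BadgerBackend, kind int) error {
	ch, err := store.QueryEvents(ctx, nostr.Filter{Kinds: []int{kind}})
	if err != nil {
		return err
	}
	// Collect first, then delete, so we are not deleting while the query
	// iteration may still hold a read transaction open.
	var events []*nostr.Event
	for evt := range ch {
		events = append(events, evt)
	}
	for _, evt := range events {
		if err := store.DeleteEvent(ctx, evt); err != nil {
			return err
		}
	}
	return nil
}
```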
For now, though, ReplaceEvents are doing a great job of preventing unnecessary database bloat. Again, many thanks.
I'm not talking about malicious stuff, but things like Amethyst draft events that rewrite the same addressable a thousillion times (I'm not sure this actually exists but I've heard it is a thing).
I’ll take a deeper look. I haven’t paid much attention to Haven’s private relay, since Inbox and Outbox are always the ones on fire, but apparently I only have three kind 31234 events (Amethyst-style drafts) across all my relays. Draft events are certainly high-frequency, but as far as I can see, they aren’t bloating the database.
List and set events, on the other hand, have been the bane of my existence. That, and the fact that Amethyst still doesn’t send the right events to the correct relay types, are my top two unsolved tech problems on Nostr. Vitor mentioned he was working on it, but it’s a non-trivial change given how much functionality has been built on top of the classical general-relay model. Fingers crossed, both clients and relays will see some improvements this year. I’d prefer these two fixes over any new features.