Link to NIP-77?
I just tagged strfry 1.0.0. Here are some of the highlights:
* negentropy protocol 1: This is the result of a lot of R&D on different syncing protocols, trying to find the best fit for nostr. I'm pretty excited about how it turned out. Negentropy sync has now been allocated NIP-77.
* Better error messages for users and operators.
* Docs have been updated and refreshed.
* Lots of optimisations: Better CPU/memory usage, smaller DBs.
Export/import has been sped up a lot: 10x faster or more. This should help reduce the pain of DB upgrades (one is required for this release). Instructions on upgrading are available here:
https://github.com/hoytech/strfry?tab=readme-ov-file#db-upgrade
Thanks to everyone who has helped develop/debug/test strfry over the past 2 years, and for all the kind words and encouragement. The nostr community rocks!
We've got a few things in the pipeline for strfry:
* strfry proxy: This will be a new feature for the router that enables intelligent reverse proxying for the nostr protocol. This will help scale up mega-sized relays by allowing the storage and processing workload to be split across multiple independent machines. Various partitioning schemes will be supported depending on performance and redundancy requirements. The front-end router instances will perform multiple concurrent nostr queries to the backend relays, and merge their results into a single stream for the original client.
* As well as scaling up, reverse proxying can also help scale down. By dynamically incorporating relay list settings (NIP-65), nostr queries can be satisfied by proxying requests to external relays on behalf of a client and merging the results together along with any matching cached local events. Negentropy will be used where possible to avoid wasting bandwidth on duplicate events.
* Archival mode: Currently strfry stores all events fully indexed in its main DB, along with their full JSON representations (optionally zstd dictionary compressed). For old events that are queried infrequently, space usage can be reduced considerably. As well as deindexing, we are planning on taking advantage of columnar storage, aggregation of reaction events, and other tricks. This will play nicely with strfry proxy, and events can gradually migrate to the archival relays.
* Last but not least, our website https://oddbean.com is going to get some love. Custom algorithms, search, bugfixes, better relay coverage, and more!
Discussion
Thank you! I read the Range-Based Set Reconciliation article. Seems like an efficient strategy.
Makes me wonder about frequency, cost and strategy of rebalancing the b-tree, given that elements would tend to be mostly appended near the end.
The NIP-77 document doesn't specify what the contents of the "NEG-MSG" messages are. It also doesn't seem to cover the actual sending of events in either direction (after the sets have been reconciled).
I believe NIP-77 is still a draft/work-in-progress, so I don't know its final form yet.
I think fiatjaf's intent was just to standardise the nostr integration, rather than negentropy as a whole. I guess this is similar to how JSON/SHA256/secp256k1/etc are not themselves specified in NIPs, but instead external specifications are referred to. That said, the negentropy spec is relatively straightforward and there are already at least 5 implementations (and I know of some more in development too): https://github.com/hoytech/negentropy/tree/master?tab=readme-ov-file#definitions
> It also doesnāt seem to cover the actual sending of events in either direction (after the sets have been reconciled).
True, but by design this is out-of-scope of the negentropy protocol. Negentropy just performs a set reconciliation so the client knows which IDs it has and/or needs. It is then up to it to decide what to do. Currently `strfry sync` will post events it has with regular EVENT messages, and/or download events it needs with regular REQ messages.
However, there are a lot of other possibilities. If a client is only interested in uploading (or downloading), it may only do one of those actions. In fact it may just store those IDs and fetch them later, or through some other relay or mechanism. Alternatively, if a client doesn't actually care about the event contents and is only trying to compute an aggregate "reaction count" across relays, it may never actually download the events, and purely use negentropy for de-duplication purposes.
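For illustration, here's roughly what a download-only sync could look like on the wire, using the NEG-OPEN/NEG-MSG/NEG-CLOSE message names from the NIP-77 draft (the subscription IDs, filter, and payloads below are made up):

```
client → relay: ["NEG-OPEN", "neg1", {"kinds": [1]}, "<initial message, hex>"]
relay → client: ["NEG-MSG", "neg1", "<reconciliation payload, hex>"]
client → relay: ["NEG-MSG", "neg1", "<reconciliation payload, hex>"]
                ... rounds repeat until the sets are reconciled ...
client → relay: ["NEG-CLOSE", "neg1"]
client → relay: ["REQ", "sub1", {"ids": ["<id we need>", "..."]}]
```

The final REQ (or an EVENT upload, or neither) is the client's choice, since event transfer is out-of-scope for negentropy itself.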
Good points, I just posted there in a quick way to get the ball rolling.
I think I will edit the NIP text to describe how events are to be transmitted after they are discovered.
Maybe we should also describe the Negentropy protocol there? I can try to do that, but I wonder if it will confuse people more than help. What do you think?
> Currently `strfry sync` will post events it has with regular EVENT messages, and/or download events it needs with regular REQ messages.
It's fine, IMO, that reconciliation is performed out-of-band. A sentence describing what you just said would be fine.
> Maybe we should also describe the Negentropy protocol there?
Whatever the relay is expected to produce should be defined, or at minimum linked to a specification.
Also, IMO, sending hex strings of binary seems to be both inefficient and against the Nostr ethos of legible messages/events. If you're sending binary, base64 only inflates by 33% compared to hex strings' 100%.
Personally, I'd prefer to see the message format specified explicitly as debuggable JSON, if feasible.
> If you're sending binary, base64 only inflates by 33% compared to hex strings' 100%.
On purely random data, after compression hex is usually only ~10% bigger than base64. For example:
```
$ head -c 1000000 /dev/urandom > rand
$ alias hex='od -A n -t x1 | sed "s/ *//g"'
$ cat rand | hex | zstd -c | wc -c
1086970
$ cat rand | base64 | zstd -c | wc -c
1018226
$ wcalc 1086970/1018226
 = 1.06751
```
So only ~7% bigger in this case. When data is not purely random, hex often compresses *better* than base64. This is because hex preserves patterns on byte boundaries but base64 does not. For example, look at these two strings post-base64:
```
$ echo 'hello world' | base64
aGVsbG8gd29ybGQK
$ echo ' hello world' | base64
IGhlbGxvIHdvcmxkCg==
```
They have nothing in common. Compare to the hex encoded versions:
```
$ echo 'hello world' | hex
68656c6c6f20776f726c640a
$ echo ' hello world' | hex
2068656c6c6f20776f726c640a
```
The pattern is preserved, it is just shifted by 2 characters. This means that if "hello world" appears multiple times in the input, there may be two different patterns for it in Base64, but only one in hex (meaning hex effectively has a 2x larger compression dictionary).
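This can be checked programmatically (a minimal Python sketch of the same experiment, not from the thread):

```python
import base64

a = b"hello world\n"
b = b" hello world\n"  # same bytes, shifted by a one-byte prefix

# Hex: the encoding of `a` survives verbatim inside the encoding of `b`,
# because each input byte always maps to the same two output characters.
assert a.hex() in b.hex()

# Base64: the one-byte shift changes the 3-byte grouping, so the two
# encodings share no common run that a compressor could reuse.
assert base64.b64encode(a) not in base64.b64encode(b)

print(b.hex())                        # 20 + hex of "hello world\n"
print(base64.b64encode(b).decode())   # looks nothing like b64 of `a`
```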
Since negentropy is mostly (but not entirely) random data like hashes and fingerprints, it's probably a wash. However, hex is typically faster to encode/decode and furthermore is used for almost all other fields in the nostr protocol, so on the whole seems like the best choice.
> Personally, I'd prefer to see the message format specified explicitly as debuggable JSON, if feasible.
This is theoretically possible, but the result would still be very difficult to interpret/debug, and it would add a lot of bandwidth/CPU overhead.
If we're gonna go binary, then I'd prefer to just go full binary. WebSockets support binary mode. We could encode these messages using CBOR and suffer minimal loss on the wire. This would still be compatible with stream compression, AFAIK.
Encoding binary data as a string can be completely avoided for this new message type, since both ends of the connection are expected to handle binary payloads anyway. The usual argument of avoiding a human-unreadable format is moot here, IMO.
Hehe in fact my original attempt at sync in strfry was a protocol called "yesstr" using my Quadrable library: https://github.com/hoytech/quadrable
It used binary flatbuffers over websockets. This caused a lot of problems, since binary websocket messages require a distinct message type from text ones. Since nothing else in nostr uses these, lots of client libraries had trouble integrating with it. Using a structured binary format like CBOR or flatbuffers also means clients would have to pull in a (probably heavy) serialisation library.
The nice thing about the current approach is that any existing nostr client already must support websocket text messages containing JSON containing hex data.
What's the rate of support for compression on the wire? I'm guessing most clients must support gzip, since it predates wss://.
The answer to this is surprisingly complicated.
TLS can optionally support compression which would most likely have universally worked for all wss:// connections. However, this was disabled in OpenSSL and other TLS libraries because of a critical information leakage that arises when secret and non-secret information are combined in the same compression context: https://blog.qualys.com/product-tech/2012/09/14/crime-information-leakage-attack-against-ssltls
HTTP-level compression does not apply to websockets (since its framing replaces/upgrades the HTTP framing) so instead compression is specified by the websocket RFCs. It is optional, so not all clients support this.
Websocket compression happens per message, and can use an empty window for each message, or can have a "sliding compression" window where messages are effectively compressed with previous messages. Some implementations will support both of those modes, some only one, and some neither. Even if an implementation supports compression, it may choose not to use it, and/or may use it only for particular messages (and not others). Furthermore, in the websocket compression handshake, bi-directional window sizes need to be negotiated and sometimes windows cannot be negotiated in one or both directions.
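Concretely, this negotiation happens during the HTTP upgrade handshake via the permessage-deflate extension (RFC 7692). A sketch of the relevant headers, with one possible (made-up) outcome:

```
C: Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
S: Sec-WebSocket-Extensions: permessage-deflate; server_no_context_takeover; server_max_window_bits=11
```

Here the server agrees to compress, but `server_no_context_takeover` means it resets its compression context for every message (no sliding window in the server-to-client direction), and it caps its own window at 2^11 bytes.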
Almost all browser websocket clients support full compression with sliding windows in both directions, and so does strfry. The sliding window has a relatively large memory overhead per connection, so it can optionally be disabled. The compression ratios can be seen in the strfry logs.
Although strfry <> browser connections are almost always compressed both ways, different clients and relays have different levels of support and often can't negotiate optimal compression.
Good question! Appending at the end (or near the end) is actually the best-case scenario in my B-tree implementation, because it leaves the right-most leaf fully packed, unlike a classic B-tree which keeps it at half-capacity. More info here: https://github.com/hoytech/negentropy/blob/master/cpp/negentropy/storage/btree/core.h#L15-L31
Making multiple updates to the B-tree in the same transaction is also highly optimised: The modified nodes are edited in-place in memory, and only committed to the DB at the end of the transaction. So on average, inserting 40 or fewer records at the right edge of the tree in one transaction will require only 1 DB read and 1 DB write (in addition to the metadata page).
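The general technique can be sketched like this (a toy Python illustration of transaction-local write buffering, not strfry's actual code):

```python
# Toy sketch: nodes touched within a transaction are read from the DB
# once, edited in memory, and written back once at commit time.
class Txn:
    def __init__(self, db):
        self.db = db      # backing store: node_id -> node (a plain dict here)
        self.dirty = {}   # transaction-local copies of modified nodes

    def get(self, node_id):
        if node_id in self.dirty:
            return self.dirty[node_id]      # already cached: no DB read
        node = list(self.db.get(node_id, []))  # one DB read, then cached
        self.dirty[node_id] = node
        return node

    def append(self, node_id, key):
        self.get(node_id).append(key)       # edit in place, in memory only

    def commit(self):
        for node_id, node in self.dirty.items():
            self.db[node_id] = node         # one DB write per dirty node

db = {"rightmost_leaf": [1, 2]}
txn = Txn(db)
for k in range(3, 8):                       # five appends, one txn
    txn.append("rightmost_leaf", k)
txn.commit()
print(db["rightmost_leaf"])                 # [1, 2, 3, 4, 5, 6, 7]
```

All five appends hit the right-most leaf, so the transaction performs a single read and a single write against the backing store, regardless of how many records were inserted.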