I just tagged strfry 1.0.0. Here are some of the highlights:

* negentropy protocol 1: This is the result of a lot of R&D on different syncing protocols, trying to find the best fit for nostr. I'm pretty excited about the result. Negentropy sync has now been allocated NIP 77.

* Better error messages for users and operators.

* Docs have been updated and refreshed.

* Lots of optimisations: Better CPU/memory usage, smaller DBs.

Export/import has been sped up a lot: 10x faster or more. This should help reduce the pain of DB upgrades (one is required for this release). Instructions on upgrading are available here:

https://github.com/hoytech/strfry?tab=readme-ov-file#db-upgrade

Thanks to everyone who has helped develop/debug/test strfry over the past 2 years, and for all the kind words and encouragement. The nostr community rocks!

We've got a few things in the pipeline for strfry:

* strfry proxy: This will be a new feature for the router that enables intelligent reverse proxying for the nostr protocol. This will help scale up mega-sized relays by allowing the storage and processing workload to be split across multiple independent machines. Various partitioning schemes will be supported depending on performance and redundancy requirements. The front-end router instances will perform multiple concurrent nostr queries to the backend relays, and merge their results into a single stream for the original client.

* As well as scaling up, reverse proxying can also help scale down. By dynamically incorporating relay list settings (NIP-65), nostr queries can be satisfied by proxying requests to external relays on behalf of a client and merging the results together along with any matching cached local events. Negentropy will be used where possible to avoid wasting bandwidth on duplicate events.

* Archival mode: Currently strfry stores all events fully indexed in its main DB, along with their full JSON representations (optionally zstd dictionary compressed). For old events that are queried infrequently, space usage can be reduced considerably. As well as deindexing, we are planning on taking advantage of columnar storage, aggregation of reaction events, and other tricks. This will play nicely with strfry proxy, and events can gradually migrate to the archival relays.

* Last but not least, our website https://oddbean.com is going to get some love. Custom algorithms, search, bugfixes, better relay coverage, and more!


Discussion

Ty for oddbean! I use it often

I appreciate you letting me know -- I'm glad you like it!

Getting my strfry instance up and running! Nothing like the experience of compiling and building everything from source!

I have it almost running, getting some weird bad filter errors for some clients. Other strfry relays seem to work fine, so I think it’s a config thing. Where is the best place to go for support?

Probably our telegram channel is best currently: https://t.me/strfry_users

Hopefully some day we'll migrate fully to nostr!

Just joined. Thx! BTW I have my relay running, it is working fine. Tracking down some incompatibilities from my old relay.

Yeah it’s the best interface I’ve seen so far which captures the nostr zeitgeist all in a clean and simple way. Really nice work.

Thank you Doug 🙏

GOAT

Congrats on reaching v1.0.0!

I'd like to zap you, but yours isn't set up...

Congrats on v1.0.0 release Doug! Really appreciate your awesome work on Strfry!

wow nice! this is a very impressive changelog, you've been busy!

Thanks so much for bringing this into being. It's an important project in nostr land. v1.0.0 is a big milestone. Congratulations. You should celebrate with balloons or something.

Can't wait for the proxy 🚀

Link to NIP-77?

Thank you! I read the Range-Based Set Reconciliation article. Seems like an efficient strategy.

Makes me wonder about frequency, cost and strategy of rebalancing the b-tree, given that elements would tend to be mostly appended near the end.

The NIP-77 document doesn’t specify what the content of the ‘NEG-MSG’ messages are. It also doesn’t seem to cover the actual sending of events in either direction (after the sets have been reconciled).

NIP-77 I believe is still a draft/work-in-progress, so I don't know the final form yet.

I think fiatjaf's intent was just to standardise the nostr integration, rather than negentropy as a whole. I guess this is similar to how JSON/SHA256/secp256k1/etc are not themselves specified in NIPs, but instead external specifications are referred to. That said, the negentropy spec is relatively straightforward and there are already at least 5 implementations (and I know of some more in development too): https://github.com/hoytech/negentropy/tree/master?tab=readme-ov-file#definitions

> It also doesn’t seem to cover the actual sending of events in either direction (after the sets have been reconciled).

True, but by design this is out of scope for the negentropy protocol. Negentropy just performs a set reconciliation so the client knows which IDs it has and/or needs; it is then up to the client to decide what to do. Currently `strfry sync` will post events it has with regular EVENT messages, and/or download events it needs with regular REQ messages.

However, there are a lot of other possibilities. If a client is only interested in uploading (or downloading), it may only do one of those actions. In fact it may just store those IDs and fetch them later, or through some other relay or mechanism. Alternatively, if a client doesn't actually care about the event contents and is only trying to compute an aggregate "reaction count" across relays, it may never actually download the events, and purely use negentropy for de-duplication purposes.
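To make that concrete, here's a rough sketch (made-up, truncated IDs, a placeholder relay URL, and websocat used purely for illustration) of what the follow-up traffic could look like using only plain NIP-01 messages:

$ websocat wss://relay.example.com
["REQ","neg-dl",{"ids":["e4b1…","9a07…"]}]   <- client fetches the events it learned it needs
["EVENT","neg-dl",{"id":"e4b1…",...}]        <- relay returns them
["EVENT","neg-dl",{"id":"9a07…",...}]
["EOSE","neg-dl"]
["EVENT",{"id":"c0ff…",...}]                 <- client uploads an event the relay is missing
["OK","c0ff…",true,""]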

Good points, I just posted there in a quick way to get the ball rolling.

I think I will edit the NIP text to describe how events are to be transmitted after they are discovered.

Maybe we should also describe the Negentropy protocol there? I can try to do that, but I wonder if it will confuse people more than help. What do you think?

> Currently `strfry sync` will post events it has with regular EVENT messages, and/or download events it needs with regular REQ messages.

It’s fine, IMO, that the actual event transfer is performed out-of-band. A sentence describing what you just said would be fine.

> Maybe we should also describe the Negentropy protocol there?

Whatever the relay is expected to produce should be defined—or at minimum linked to a specification.

Also, IMO, sending hex strings of binary seems to be both inefficient and against the Nostr ethos of legible messages/events. If you’re sending binary, base64 only inflates by 33% compared to hex strings’ 100%.

Personally, I’d prefer to see the message format specified explicitly as debuggable JSON, if feasible.

> If you’re sending binary, base64 only inflates by 33% compared to hex strings’ 100%.

On purely random data, after compression hex is usually only ~10% bigger than base64. For example:

$ head -c 1000000 /dev/urandom > rand

$ alias hex='od -A n -t x1 | sed "s/ *//g"'

$ cat rand |hex|zstd -c|wc -c

1086970

$ cat rand |base64|zstd -c|wc -c

1018226

$ wcalc 1086970/1018226

= 1.06751

So only ~7% bigger in this case. When data is not purely random, hex often compresses *better* than base64. This is because hex preserves patterns on byte boundaries but base64 does not. For example, look at these two strings post-base64:

$ echo 'hello world' | base64

aGVsbG8gd29ybGQK

$ echo ' hello world' | base64

IGhlbGxvIHdvcmxkCg==

They have nothing in common. Compare to the hex encoded versions:

$ echo 'hello world' | hex

68656c6c6f20776f726c640a

$ echo ' hello world' | hex

2068656c6c6f20776f726c640a

The pattern is preserved, it is just shifted by 2 characters. This means that if "hello world" appears multiple times in the input, there may be two different patterns for it in Base64, but only one in hex (meaning hex effectively has a 2x larger compression dictionary).

Since negentropy messages are mostly (but not entirely) random data like hashes and fingerprints, it's probably a wash. However, hex is typically faster to encode/decode, and it is also used for almost all other fields in the nostr protocol, so on the whole it seems like the best choice.

> Personally, I’d prefer to see the message format specified explicitly as debuggable JSON, if feasible.

This is theoretically possible, but the result would still be very difficult to interpret/debug, and it would add a lot of bandwidth/CPU overhead.

If we’re gonna go binary, then I’d prefer to just go full binary. WebSockets support binary mode. We could encode these messages using CBOR and suffer minimal loss on the wire. This would still be compatible with stream compression, AFAIK.

Encoding binary data as a string can be completely avoided for this new message type, since both ends of the connection are expected to handle binary payloads anyway. The usual argument of avoiding a human-unreadable format is moot here, IMO.

Hehe in fact my original attempt at sync in strfry was a protocol called "yesstr" using my Quadrable library: https://github.com/hoytech/quadrable

It used binary flatbuffers over websockets. It caused a lot of problems since binary websocket messages require a distinct message type from text ones. Since nothing else in nostr uses these, client libraries had trouble integrating with it. Using a structured binary format like CBOR or flatbuffers also means clients would have to pull in a (probably heavy) serialisation library.

The nice thing about the current approach is that any existing nostr client already must support websocket text messages containing JSON containing hex data.

What’s the rate of support for compression on the wire? I’m guessing most clients must support gzip, since it predates wss://.

The answer to this is surprisingly complicated.

TLS can optionally support compression, which would most likely have worked for all wss:// connections. However, this was disabled in OpenSSL and other TLS libraries because of a critical information leakage that arises when secret and non-secret information are combined in the same compression context: https://blog.qualys.com/product-tech/2012/09/14/crime-information-leakage-attack-against-ssltls

HTTP-level compression does not apply to websockets (websocket framing replaces the HTTP framing after the upgrade), so compression is instead specified by the websocket RFCs. It is optional, so not all clients support it.

Websocket compression happens per message, and can use an empty window for each message, or can have a "sliding compression" window where messages are effectively compressed with previous messages. Some implementations will support both of those modes, some only one, and some neither. Even if an implementation supports compression, it may choose not to use it, and/or may use it only for particular messages (and not others). Furthermore, in the websocket compression handshake, bi-directional window sizes need to be negotiated and sometimes windows cannot be negotiated in one or both directions.
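For reference, the thing being negotiated here is the permessage-deflate extension from RFC 7692, which piggybacks on the normal websocket upgrade. A minimal successful negotiation looks roughly like this (the key/accept values are the example ones from RFC 6455; host and path are placeholders):

GET / HTTP/1.1
Host: relay.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Extensions: permessage-deflate

If the server leaves Sec-WebSocket-Extensions out of the 101 response, the connection simply proceeds uncompressed.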

Almost all browser websocket clients support full compression with sliding windows in both directions, and so does strfry. The sliding window has a relatively large memory overhead per connection, so it can optionally be disabled. The compression ratios can be seen in the strfry logs.

Although strfry <> browser connections are almost always compressed both ways, different clients and relays have different levels of support and often can't negotiate optimal compression.

Good question! Appending at the end (or near the end) is actually the best-case scenario in my B-tree implementation, because it leaves the right-most leaf fully packed, unlike a classic B-tree which keeps it at half-capacity. More info here: https://github.com/hoytech/negentropy/blob/master/cpp/negentropy/storage/btree/core.h#L15-L31

Making multiple updates to the B-tree in the same transaction is also highly optimised: The modified nodes are edited in-place in memory, and only committed to the DB at the end of the transaction. So on average, inserting 40 or fewer records at the right edge of the tree in one transaction will require only 1 DB read and 1 DB write (in addition to the metadata page).

Building #strfry version 1.0.0

Thanks nostr:npub1yxprsscnjw2e6myxz73mmzvnqw5kvzd5ffjya9ecjypc5l0gvgksh8qud4

nostr:note1sc3muqmr900ctc2qcf6lmalnnvswzwfu8qgkfadj2e38y3nrqk9qclda2u

fucking legend, ty

amazing updates!

nostr:note1sc3muqmr900ctc2qcf6lmalnnvswzwfu8qgkfadj2e38y3nrqk9qclda2u

proxy is huge, I can see myself using that soon. negentropy v1! the hype is real.

nostr:npub1qny3tkh0acurzla8x3zy4nhrjz5zd8l9sy9jys09umwng00manysew95gx why don't you do long term support for this 10x dev on nostr? (possibly more than 10x!)

Oh no, I missed this one, thanks for linking! This is a really good write-up and there are a lot of similarities with what I'm designing for strfry proxy. Like some of the comments, I'm not sure if partitioning on ID will be optimal in all cases. I can also imagine variations on pubkey and created_at.

Some variant of consistent hashing will also be necessary I think, for failure recovery, rebalancing, changing the number of backend relays, etc.

Should I do exports before or after compiling the new version? 0.9.6 does not recognize the "--fried" flag for the export command so I guess it's after?

The best way is to first upgrade to 0.9.7, which is the latest release in the 0.9 series. This has the "--fried" option for export (but not import).

After you have the fried export, upgrade to 1.0.0 and import with "--fried" too.

Sorry for the trouble, but it will save a *lot* of time versus normal import!
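Roughly, the sequence looks like this (filenames/paths are placeholders; see the DB upgrade section of the README linked above for the exact steps):

$ # while still on 0.9.7:
$ ./strfry export --fried > backup.fried
$ # build/install 1.0.0 and move the old DB directory out of the way, then:
$ ./strfry import --fried < backup.fried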

Do I need to export and import events when migrating from 0.9.6 to 0.9.7?

No, both of those versions use DB version 2, so their DBs are compatible.

Out of curiosity, how does strfry balance the splitting of ranges versus the number of round trips? Personally I think waiting for “yet another round trip” is pretty bad compared to sending, let’s say, 2x the amount of data.

Good question! Right now the implementation is pretty simple, but I think it will work well in most cases: it always splits a range into 16 equal-sized ranges unless the range has 32 or fewer IDs, in which case it just sends the whole list of IDs. I think there is probably some low-hanging fruit in tuning that threshold.
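As a rough back-of-the-envelope (assuming the worst case where every range differs, and that your wcalc has ln): with 16-way splits and the 32-ID threshold, the number of splitting rounds grows like log base 16 of N/32, so even a million-ID range should settle in roughly 4 or 5 round trips:

$ wcalc "ln(1000000/32)/ln(16)"
= 3.73289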

The nice thing is that nothing in the protocol needs to change to tune this. In fact, the protocol theoretically supports dynamic adjustment of those parameters to target a particular point in the latency/bandwidth tradeoff space.

There are also some other reasons you might want to customise the range selection, which I described here: https://logperiodic.com/rbsr.html#range-choice

Would it make sense to use the simdjson library in Strfry? Seems very performant.

https://github.com/simdjson/simdjson