> Currently `strfry sync` will post events it has with regular EVENT messages, and/or download events it needs with regular REQ messages.

It’s fine, IMO, that reconciliation is performed out-of-band. A sentence describing what you just said would be fine.

> Maybe we should also describe the Negentropy protocol there?

Whatever the relay is expected to produce should be defined—or at minimum linked to a specification.

Also, IMO, sending hex strings of binary seems to be both inefficient and against the Nostr ethos of legible messages/events. If you’re sending binary, base64 only inflates by 33% compared to hex strings’ 100%.
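The inflation figures are easy to verify; a quick sketch in Python (illustrative only):

```python
import base64

payload = bytes(range(256)) * 4  # 1024 bytes of sample binary data

# hex uses 2 chars per byte (+100%); base64 uses 4 chars per 3 bytes (~+33%)
hex_len = len(payload.hex())
b64_len = len(base64.b64encode(payload))

print(hex_len / len(payload))  # 2.0
print(b64_len / len(payload))  # ~1.34
```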

Personally, I’d prefer to see the message format specified explicitly as debuggable JSON, if feasible.


Discussion

> If you’re sending binary, base64 only inflates by 33% compared to hex strings’ 100%.

On purely random data, after compression hex is usually only ~10% bigger than base64. For example:

```console
$ head -c 1000000 /dev/urandom > rand
$ alias hex='od -A n -t x1 | sed "s/ *//g"'
$ cat rand | hex | zstd -c | wc -c
1086970
$ cat rand | base64 | zstd -c | wc -c
1018226
$ wcalc 1086970/1018226
 = 1.06751
```

So only ~7% bigger in this case. When data is not purely random, hex often compresses *better* than base64. This is because hex preserves patterns on byte boundaries but base64 does not. For example, look at these two strings post-base64:

```console
$ echo 'hello world' | base64
aGVsbG8gd29ybGQK
$ echo ' hello world' | base64
IGhlbGxvIHdvcmxkCg==
```

They have nothing in common. Compare to the hex encoded versions:

```console
$ echo 'hello world' | hex
68656c6c6f20776f726c640a
$ echo ' hello world' | hex
2068656c6c6f20776f726c640a
```

The pattern is preserved; it is just shifted by 2 characters. This means that if "hello world" appears multiple times in the input, there may be several different encodings of it in base64 (one per 3-byte alignment), but only one in hex (so hex effectively has a 2x larger compression dictionary).
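The shift property above is easy to verify programmatically; a small Python check (illustrative):

```python
import base64

a = b"hello world\n"
b = b" " + a  # same bytes, shifted by one

# hex: the original encoding survives intact inside the shifted one
assert a.hex() in b.hex()

# base64: the one-byte shift changes the 3-byte alignment, so the
# original encoding appears nowhere in the shifted output
assert base64.b64encode(a).decode() not in base64.b64encode(b).decode()
```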

Since negentropy is mostly (but not entirely) random data like hashes and fingerprints, it's probably a wash. However, hex is typically faster to encode/decode, and furthermore it is used for almost all other fields in the nostr protocol, so on the whole it seems like the best choice.

> Personally, I’d prefer to see the message format specified explicitly as debuggable JSON, if feasible.

This is theoretically possible, but it would be very difficult to interpret/debug it anyway, and it would add a lot of bandwidth/CPU overhead.

If we’re gonna go binary, then I’d prefer to just go full binary. WebSockets support binary mode. We could encode these messages using CBOR and suffer minimal loss on the wire. This would still be compatible with stream compression, AFAIK.

Encoding binary data as a string can be completely avoided for this new message type, since both ends of the connection are expected to handle binary payloads anyway. The usual argument of avoiding a human-unreadable format is moot here, IMO.

Hehe in fact my original attempt at sync in strfry was a protocol called "yesstr" using my Quadrable library: https://github.com/hoytech/quadrable

It used binary flatbuffers over websockets. It caused a lot of problems, since binary websocket messages require a distinct message type from text ones. Because nothing else in nostr uses these, client libraries had trouble integrating with it. Using a structured binary format like CBOR or flatbuffers also means clients would have to pull in a (probably heavy) serialisation library.

The nice thing about the current approach is that any existing nostr client already must support websocket text messages containing JSON containing hex data.
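For illustration, a sync message in this style could look like any other nostr message: a JSON array in a websocket text frame, with the binary payload hex-encoded (the verb and fields below are hypothetical, not a spec):

```json
["NEG-MSG", "subscription-id", "21e609..."]
```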

What’s the rate of support for compression on the wire? I’m guessing most clients must support gzip, since it predates wss://.

The answer to this is surprisingly complicated.

TLS can optionally support compression which would most likely have universally worked for all wss:// connections. However, this was disabled in OpenSSL and other TLS libraries because of a critical information leakage that arises when secret and non-secret information are combined in the same compression context: https://blog.qualys.com/product-tech/2012/09/14/crime-information-leakage-attack-against-ssltls

HTTP-level compression does not apply to websockets (since its framing replaces/upgrades the HTTP framing) so instead compression is specified by the websocket RFCs. It is optional, so not all clients support this.

Websocket compression happens per message. Each message can be compressed with an empty window, or with a "sliding" window where messages are effectively compressed together with previous messages. Some implementations support both of those modes, some only one, and some neither. Even when an implementation supports compression, it may choose not to use it, or may use it only for particular messages (and not others). Furthermore, in the websocket compression handshake, window sizes must be negotiated in each direction, and sometimes negotiation fails in one or both directions.
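These modes are negotiated via the `permessage-deflate` extension from RFC 7692. An abridged handshake might look like this (the `*_no_context_takeover` parameters request the empty-window mode; omitting them permits the sliding window):

```http
GET /chat HTTP/1.1
Upgrade: websocket
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

HTTP/1.1 101 Switching Protocols
Sec-WebSocket-Extensions: permessage-deflate; server_no_context_takeover; client_max_window_bits=15
```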

Almost all browser websocket clients support full compression with sliding windows in both directions, and so does strfry. The sliding window has a relatively large memory overhead per connection, so it can optionally be disabled. The compression ratios can be seen in the strfry logs.

Although strfry <> browser connections are almost always compressed both ways, different clients and relays have different levels of support and often can't negotiate optimal compression.