Encode hex strings as binary:

Original size: 600 bytes

Encoded size: 285 bytes

Encoded and compressed size: 289 bytes


178MiB 0:09:31 [ 318KiB/s] [================================================>] 100%

Processed 214488 events

Total original size: 182393202 bytes

Total encoded size: 125937545 bytes

Total encoded and compressed size: 96236760 bytes

Total compressed but not encoded size: 116194898 bytes

Average original size: 850.37 bytes

Average encoded size: 587.15 bytes

Average encoded and compressed size: 448.68 bytes

Compression ratio: 52.76%

Compression ratio without encoding: 63.71%

Here's a sample compression run. It is slow, but LoRa is much slower (and this run includes multiple compression and decompression steps that won't be necessary in practice).

Anyway, what does this mean? Events average about 850 bytes in their original form (I collected a large sample of events from relays). Encoding alone already saves a lot. Compression helps too, but compressing without encoding only gets us down to 63.71% of the original size, versus 52.76% with encoding. That extra gain might basically amount to stripping the hex hashes (id).
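As a quick sanity check on the averages and ratios quoted above (my own arithmetic, derived from the totals):

```python
events             = 214_488
original           = 182_393_202
encoded            = 125_937_545
encoded_compressed =  96_236_760
compressed_only    = 116_194_898

print(round(original / events, 2))                    # 850.37 bytes/event
print(round(encoded / events, 2))                     # 587.15 bytes/event
print(round(encoded_compressed / events, 2))          # 448.68 bytes/event
print(round(100 * encoded_compressed / original, 2))  # 52.76 %
print(round(100 * compressed_only / original, 2))     # 63.71 %
```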

If we compress a huge set of events at once, the difference is not as big:

125937545  1 sep 00:24   events.encoded
 70426465  1 sep 00:24   events.encoded.br
186704164  31 aug 23:38  events.jsonl
 81979371  31 aug 23:38  events.jsonl.br
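Comparing the two Brotli files against the original JSONL (my own arithmetic, using the file sizes listed above):

```python
jsonl      = 186_704_164  # events.jsonl
jsonl_br   =  81_979_371  # events.jsonl.br
encoded_br =  70_426_465  # events.encoded.br

print(round(100 * jsonl_br / jsonl, 2))    # compressed only:      43.91 %
print(round(100 * encoded_br / jsonl, 2))  # encoded + compressed: 37.72 %
```

So over the whole set, encoding first buys roughly six percentage points on top of compression alone.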

Brotli builds a dictionary (I do it manually here) and it's pretty good at it. The problem is that when we want to transfer just one or a few events, we might not have that luxury.
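The one-or-few-events case is exactly where a *shared* dictionary, agreed on out of band, helps. I'm not sure Brotli's Python bindings expose custom dictionaries, so here is the same idea sketched with zlib's `zdict` (illustrative only; the dictionary contents and the sample event are my own invention, not from the post):

```python
import json
import zlib

# A tiny shared dictionary seeded with strings common to event JSON.
shared = b'{"id":"","pubkey":"","created_at":0,"kind":1,"tags":[],"content":"","sig":""}'

# A hypothetical small event, serialized compactly.
event = json.dumps(
    {"kind": 1, "content": "hello", "tags": []},
    separators=(",", ":"),
).encode()

# Without a shared dictionary.
plain = zlib.compress(event, 9)

# With the shared dictionary; the receiver must supply the same zdict.
comp = zlib.compressobj(level=9, zdict=shared)
with_dict = comp.compress(event) + comp.flush()

decomp = zlib.decompressobj(zdict=shared)
assert decomp.decompress(with_dict) == event

print(len(event), len(plain), len(with_dict))
```

For a tiny input like this, the dictionary version is typically smaller than the plain one, because most of the event can be emitted as back-references into the dictionary instead of literals.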

Anyway, I am pretty satisfied with compression.