178MiB 0:09:31 [ 318KiB/s] [================================================>] 100%
Processed 214488 events
Total original size: 182393202 bytes
Total encoded size: 125937545 bytes
Total encoded and compressed size: 96236760 bytes
Total compressed but not encoded size: 116194898 bytes
Average original size: 850.37 bytes
Average encoded size: 587.15 bytes
Average encoded and compressed size: 448.68 bytes
Compression ratio: 52.76%
Compression ratio without encoding: 63.71%
Here's a sample compression run. It is slow, but LoRa transfer is much slower (and this run includes multiple compression and decompression steps that won't be necessary in practice).
Anyway, what does this mean? The events average 850 bytes in their original form (I collected a large sample of events from relays). Encoding alone already saves a lot. Compression helps too, but compressing without encoding only gets us down to 63.71% of the original size, versus 52.76% with encoding first. That gain without encoding might be largely just stripping the hashes (the id field).
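As a sanity check, the two ratios can be reproduced directly from the byte totals printed above:

```python
# Byte totals from the run above
original = 182_393_202
encoded_compressed = 96_236_760   # encoded, then compressed
compressed_only = 116_194_898     # compressed, but not encoded

# "Compression ratio" here = resulting size as a fraction of the original
ratio_with_encoding = encoded_compressed / original
ratio_without_encoding = compressed_only / original

print(f"{ratio_with_encoding:.2%}")     # 52.76%
print(f"{ratio_without_encoding:.2%}")  # 63.71%
```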
If we compress a huge set of events in one go, the difference is smaller:
125937545 1 sep 00:24 events.encoded
70426465 1 sep 00:24 events.encoded.br
186704164 31 aug 23:38 events.jsonl
81979371 31 aug 23:38 events.jsonl.br
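Running the same arithmetic on the bulk files shows the gap narrowing: once the whole stream is compressed together, encoding first only shaves off about another 14% compared to compressing the raw JSONL:

```python
# File sizes from the listing above
jsonl = 186_704_164
jsonl_br = 81_979_371     # brotli on raw JSONL
encoded_br = 70_426_465   # brotli on encoded events

# Each variant as a fraction of the raw JSONL size
print(f"{jsonl_br / jsonl:.2%}")    # brotli alone
print(f"{encoded_br / jsonl:.2%}")  # encode + brotli

# Extra saving from encoding first, relative to brotli alone (~14%)
print(f"{1 - encoded_br / jsonl_br:.2%}")
```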
Brotli builds up a dictionary as it compresses (here I supply one manually), and it's pretty good at it. The problem is that when we want to transfer just one or a few events, we don't have that luxury.
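One way around that is a pre-shared dictionary that both sides hold, so even a single small event compresses well. This is a minimal sketch of the idea; since brotli's custom-dictionary API varies between bindings, it uses zlib's `zdict` from the Python standard library instead, and the dictionary contents here are hypothetical substrings common to Nostr events:

```python
import zlib

# Hypothetical shared dictionary: boilerplate substrings of Nostr events.
# Sender and receiver must hold exactly the same bytes.
SHARED_DICT = (
    b'{"id":"' b'","pubkey":"' b'","created_at":' b',"kind":'
    b',"tags":[' b'],"content":"' b'","sig":"' b'"}'
)

def compress_event(event_json: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=SHARED_DICT)
    return c.compress(event_json) + c.flush()

def decompress_event(blob: bytes) -> bytes:
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return d.decompress(blob) + d.flush()

event = (b'{"id":"abc","pubkey":"def","created_at":1693526400,'
         b'"kind":1,"tags":[],"content":"hello","sig":"0123"}')
small = compress_event(event)
assert decompress_event(small) == event
# The JSON boilerplate becomes back-references into the shared
# dictionary, so even this lone event shrinks below its raw size.
```

The trade-off is that the dictionary is fixed up front rather than learned from the data, so it only helps as much as it actually matches real events.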
Anyway, I am pretty satisfied with the compression.