ok, so, i removed all the logging bits from my encoder after discovering they accounted for a massive share of the benchmark timing (and of runtime performance generally), and i was pleased to see this result:
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12 788 1467852 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12 94 13787646 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12 48 24622387 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12 600 1957643 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12 27 43210743 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12 852 1379193 ns/op
i've slimmed the comparison down to three: mine, nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6's binary codec, and the built-in Gob codec from the Go standard library
my binary encoding is roughly 9x as fast as gob and nearly 17x as fast as binary.Marshal, and my decoder beats gob's by more than 20x; i did not expect it to be that much faster, but ok, not sure if that's because i reuse the write buffer
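to illustrate the buffer-reuse pattern i mean, here's a minimal sketch (the names are illustrative, not my actual API; the same trick applies on the decode side with the output buffer):

package codec

// Event is a stand-in for my real event struct.
type Event struct {
	ID     [32]byte
	Pubkey [32]byte
}

// Encoder keeps one scratch buffer and reuses it across calls, so once the
// buffer has grown to fit the largest event, steady-state encoding allocates
// nothing.
type Encoder struct {
	buf []byte
}

// Encode appends the serialized event into the reused buffer. the returned
// slice is only valid until the next call to Encode.
func (e *Encoder) Encode(ev *Event) []byte {
	e.buf = e.buf[:0] // reset length, keep the capacity
	e.buf = append(e.buf, ev.ID[:]...)
	e.buf = append(e.buf, ev.Pubkey[:]...)
	// ... remaining fields appended the same way ...
	return e.buf
}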
per op, my decoder comes in at 1957643 ns versus 1379193 ns for binary.Unmarshal (the 600 vs 852 figures are just how many iterations the benchmark ran), so mine is slower, but only by about 40%, not by an order of magnitude
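for context, output like the above comes from testing.B sub-benchmarks, where each "op" is one full pass over the sample set; a minimal sketch of that kind of harness, with stand-in fixtures and codec functions (all names assumed, not my real ones):

package codec_test

import "testing"

// stand-ins: the real benchmark loads actual sample events and calls the
// real codecs
type Event struct{ Content string }

func loadSampleEvents() []*Event {
	evs := make([]*Event, 10000)
	for i := range evs {
		evs[i] = &Event{Content: "hello"}
	}
	return evs
}

func EventToBinary(ev *Event) []byte { return []byte(ev.Content) }

func BenchmarkBinaryEncoding(b *testing.B) {
	events := loadSampleEvents()
	b.Run("event2.EventToBinary", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			// one "op" encodes every event in the set
			for _, ev := range events {
				EventToBinary(ev)
			}
		}
	})
	// gob.Encode and binary.Marshal get their own b.Run in the same shape
}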
here it is again, this time run over 10,000 of those sample events; the iteration counts are much lower of course, but the per-op averages should be more reliable since the bigger batch smooths out statistical variation
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12 139 8155043 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12 19 62083345 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12 9 112449500 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12 100 13966114 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12 5 223875643 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12 121 9243299 ns/op
the encoding advantage is a little smaller here, around 14x over binary.Marshal; the decoding gap, however, widens: mine is now about 50% slower than binary.Unmarshal, versus about 40% in the first run
anyway, everyone knows boys like pissing contests, i'm no different
one thing this has revealed is that, at least for benchmarking purposes, i need a way to disable the logging more completely. these numbers are with all logging overhead utterly removed, and the difference is amazing; it shows how expensive my logging is in runtime throughput
seeing these results does rather make the last 3 days of effort writing this code feel like they were worth it
if i can cut my decode time by roughly 30% my codec is the winner across the board
before i get to that, i'm gonna refactor my logging library to eliminate as much of that overhead as possible
the logging info is priceless when i'm debugging, but if enabling it in production costs this much, there is definitely a problem with my logging library
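for what it's worth, one common way to get disabled logging close to free is to gate on an integer level check before any formatting happens; a generic sketch, not my library's actual API:

package log

import (
	"fmt"
	"os"
)

// log levels; Off disables everything
const (
	Off = iota
	Debug
)

// Level is checked on every call.
var Level = Off

// D logs a debug line only when the level allows it, so the expensive
// fmt.Fprintf never runs on the disabled path.
func D(format string, args ...any) {
	if Level < Debug {
		return // disabled: one compare and branch, no formatting
	}
	fmt.Fprintf(os.Stderr, "DBG "+format+"\n", args...)
}

the residual cost is that the caller still builds the variadic ...any slice even when the check fails, which is why some libraries accept closures or compile logging out entirely with build tags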
also, the slower decode may be purely the product of my data size optimization: most likely i'm trading decode time on all those insane follow lists for a nearly 50% reduction in binary event size. if so, that's cheap for ~45% compression of the bulkiest events in the data set; they go from 100-500kb down to 50-250kb, which is not trivial when you consider how many of these events there are (probably about 1% by count and 15% by size)
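if i'm right that the bulk of the saving on follow lists comes from storing pubkeys as raw 32-byte values instead of 64-character hex strings (an assumption on my part, but it's the obvious ~50% win), the arithmetic is easy to check:

package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

func main() {
	// stand-in for a 64-character hex pubkey as it appears in JSON follow lists
	hexKey := strings.Repeat("ab", 32)
	raw, err := hex.DecodeString(hexKey)
	if err != nil {
		panic(err)
	}
	// a follow list is almost nothing but pubkeys, so halving each one
	// roughly halves the event
	fmt.Printf("%d bytes as hex, %d bytes raw\n", len(hexKey), len(raw))
	// prints: 64 bytes as hex, 32 bytes raw
}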