ok, so, i removed all the logging from my encoder because i discovered it accounted for a massive part of the benchmark timing (and of performance generally), and i was pleased to see this result:

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12      788     1467852 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                 94    13787646 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12             48    24622387 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      600     1957643 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                 27    43210743 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12          852     1379193 ns/op

i've slimmed the comparison down to three codecs: mine, nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6's, and the built-in gob codec from the Go standard library

my binary encoding is more than 20x as fast. i did not expect it to be that much faster, but ok; i'm not sure whether that's because i reuse the decode write buffer
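the buffer-reuse idea mentioned above can be sketched like this: a grow-only scratch buffer that is truncated, not reallocated, between events, so steady-state encoding allocates nothing. this is a generic sketch of the pattern, not the actual event2 code; the `Buf` type and its methods are illustrative names.

```go
package main

import "fmt"

// Buf is a grow-only scratch buffer reused across encode calls, so
// steady-state encoding does no allocation. Generic sketch of the
// reuse pattern, not the actual event2 implementation.
type Buf struct {
	b []byte
}

// Reset truncates the buffer but keeps its capacity for the next event.
func (e *Buf) Reset() { e.b = e.b[:0] }

// AppendUint32 appends v in big-endian order.
func (e *Buf) AppendUint32(v uint32) {
	e.b = append(e.b, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

// AppendBytes appends a length-prefixed byte slice.
func (e *Buf) AppendBytes(p []byte) {
	e.AppendUint32(uint32(len(p)))
	e.b = append(e.b, p...)
}

func main() {
	var e Buf
	for i := 0; i < 3; i++ {
		e.Reset() // reuse the capacity grown in the previous iteration
		e.AppendBytes([]byte("hello"))
	}
	fmt.Println(len(e.b)) // 4-byte length prefix + 5 payload bytes = 9
}
```

after the first iteration the buffer never grows again, which is exactly the kind of thing that removes allocations from a `ns/op` figure.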

my decoder ran 600 iterations versus 852 for binary.Unmarshal, so it's slower, but not by much
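for anyone reading the output above: names like BenchmarkBinaryEncoding/gob.Encode-12 come from Go's sub-benchmark mechanism (b.Run nests a name under the parent), the -12 suffix is GOMAXPROCS, and the first number column (788, 94, ...) is the iteration count b.N that the framework settled on. a minimal sketch of the shape, with a placeholder workload standing in for the real codecs:

```go
package main

import (
	"fmt"
	"testing"
)

// benchEncode is a stand-in workload; in the real suite the loop body
// would call one of the codecs, e.g. event2.EventToBinary.
func benchEncode(b *testing.B) {
	buf := make([]byte, 0, 64)
	for i := 0; i < b.N; i++ {
		buf = append(buf[:0], "stand-in encoded event"...) // placeholder work
	}
	_ = buf
}

func main() {
	// testing.Benchmark runs a benchmark function outside `go test`.
	// In a _test.go file you would instead write, under a parent benchmark:
	//   b.Run("event2.EventToBinary", benchEncode)
	// which produces the slash-separated names seen above.
	r := testing.Benchmark(benchEncode)
	fmt.Println(r.N > 0, r.NsPerOp() >= 0)
}
```

the framework keeps raising b.N until the run takes long enough to time reliably, which is why the iteration counts differ so much between fast and slow codecs.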

here it is again, this time using 10,000 of those sample events per iteration. the iteration counts are much lower, of course, but the results may be more accurate because per-run statistical variation averages out

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12      139     8155043 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                 19    62083345 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12              9   112449500 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      100    13966114 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                  5   223875643 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12          121     9243299 ns/op

the encoding lead is a little smaller here, around 14x. the decoding, however, comes out only about 20% behind in this test (100 iterations versus 121), an improvement of a few percent over the first run

anyway, everyone knows boys like pissing contests, i'm no different

one thing this has revealed to me is that, for benchmarking purposes at least, i need a way to disable the logging more completely. this run has all logging overhead utterly removed, and the difference is amazing; it shows how expensive my logging is in runtime throughput

seeing these results does rather make my last 3 days of effort writing this code seem worth it

if i can speed up the decode by 20%, my codec is the winner


before i get to that, i'm gonna refactor my logging library to eliminate as much of that overhead as possible

the logging info is priceless when i'm debugging, but if overhead this bad shows up whenever logging is enabled in production, there is definitely a problem with my logging library

also, the 20% slower decode may be purely a product of my data size optimization: most likely it trades decode time on all those insane follow lists for a nearly 50% reduction in binary event size. if so, that's cheap for ~45% compression of the bulkiest events in the data set; they go from 100-500kb down to 50-250kb, which is not trivial when you consider how many of these events there are (probably about 1% by count and 15% by size)
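a plausible source of that near-halving: follow lists carry one 64-character hex pubkey per "p" tag, and storing each as 32 raw bytes halves that portion of the event. a small illustration of the arithmetic; packPubkeys is a hypothetical helper, not the event2 codec itself:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// packPubkeys converts 64-character hex pubkeys (as they appear in a
// follow list's "p" tags) into 32 raw bytes each, halving their size.
// Hypothetical helper for illustration only.
func packPubkeys(hexKeys []string) ([]byte, error) {
	out := make([]byte, 0, len(hexKeys)*32)
	for _, k := range hexKeys {
		raw, err := hex.DecodeString(k)
		if err != nil {
			return nil, err
		}
		out = append(out, raw...)
	}
	return out, nil
}

func main() {
	keys := []string{
		// example pubkey, illustrative value
		"89df4c7d9c3a68263e1ab02eb2d0b6ce0fd0e07c0ffefd82b37c35e0c7b2a6d1",
	}
	packed, err := packPubkeys(keys)
	if err != nil {
		panic(err)
	}
	// 64 hex chars shrink to 32 raw bytes: a 50% saving per pubkey.
	fmt.Println(len(keys[0]), "->", len(packed))
}
```

the cost on the other side is a hex re-encode whenever the event has to be turned back into its JSON form, which is consistent with a decode-time-for-size trade-off.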