ok, so, i removed all the logging from my encoder after discovering it accounted for a massive part of the benchmark timing (and of the performance itself), and i was pleased to see this result:

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12        788     1467852 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                   94    13787646 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12               48    24622387 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12        600     1957643 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                   27    43210743 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12            852     1379193 ns/op
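
for context, a harness producing output shaped like the above can be sketched with `testing.Benchmark` so it runs as a plain program; `encodeEvent` and the `event` type here are stand-ins for my real `event2` code, not the actual package:

```go
package main

import (
	"fmt"
	"testing"
)

type event struct {
	id      [32]byte
	content string
}

// encodeEvent appends a trivial binary form of ev to buf and returns the
// (possibly grown) buffer; a stand-in for the real event2.EventToBinary.
func encodeEvent(buf []byte, ev *event) []byte {
	buf = append(buf[:0], ev.id[:]...)
	return append(buf, ev.content...)
}

func main() {
	ev := &event{content: "hello nostr"}
	res := testing.Benchmark(func(b *testing.B) {
		var buf []byte // reused across iterations, like the decode write buffer
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			buf = encodeEvent(buf, ev)
		}
	})
	fmt.Println(res.N > 0)
}
```

the `-12` suffix in the output is just GOMAXPROCS on this 6-core/12-thread Ryzen; the two numbers per line are iterations completed and ns per op.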

i've slimmed the contenders down to mine, nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6 's, and the built-in gob codec from the Go standard library

my binary decoding is more than 20x as fast as gob (and the encoding around 9x); i did not expect it to be that much faster, but ok. not sure if that's because i reused the decode write buffer

against binary.Unmarshal the decoder manages 600 iterations versus 852 (1957643 vs 1379193 ns/op), so mine is slower, but only by about 40%

here it is again, except using 10,000 of those sample events; iteration counts are much lower of course, but the results may be more accurate since per-op noise averages out over the larger batch

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12        139     8155043 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                   19    62083345 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12                9   112449500 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12        100    13966114 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                    5   223875643 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12            121     9243299 ns/op

the encoding lead is a little smaller here, around 14x over binary.Marshal; the decoding, however, comes out only about 20% behind binary.Unmarshal by iteration count in this test (100 vs 121), an extra ~5% or so versus the first run

anyway, everyone knows boys like pissing contests, i'm no different

one thing this has revealed to me: for benchmarking purposes, at least, i need a way to disable the logging more completely. this run has all logging overhead utterly removed, and the difference is amazing; it shows how expensive my logging is in runtime throughput
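
one way to make disabled logging nearly free, sketched below (the names `logLevel`, `Enabled`, and `logf` are hypothetical, not my actual logging package): gate every log call on an atomic level check, so a disabled call is a single load and an early return. note the caveat in the comment about argument evaluation.

```go
package main

import (
	"fmt"
	"os"
	"sync/atomic"
)

// logLevel gates all logging; 0 = off. an atomic load is cheap, but the
// arguments to a call like logf("%v", ev) are still evaluated by the
// caller, so truly hot paths should check Enabled() before building them.
var logLevel atomic.Int32

func Enabled() bool { return logLevel.Load() > 0 }

func logf(format string, args ...any) {
	if !Enabled() {
		return // no formatting, no i/o when disabled
	}
	fmt.Fprintf(os.Stderr, format+"\n", args...)
}

func main() {
	logLevel.Store(0)
	logf("this costs almost nothing when disabled: %d", 42)
	fmt.Println(Enabled())
}
```

for benchmarks specifically, a build tag that compiles logf down to an empty function would remove even the atomic load.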

It's definitely the buffer reuse. Memory allocation is the slowest thing in the world apparently.
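
a minimal demonstration of why the buffer reuse matters: appending into a retained slice keeps its capacity across iterations, so the steady state is zero allocations per op, while a fresh buffer allocates every time. `encode` here is a toy stand-in, not the real codec.

```go
package main

import (
	"fmt"
	"testing"
)

// encode overwrites buf in place (buf[:0] keeps the capacity) and
// appends the payload, returning the possibly-grown buffer.
func encode(buf, payload []byte) []byte {
	return append(buf[:0], payload...)
}

func main() {
	payload := make([]byte, 4096)

	fresh := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			_ = encode(nil, payload) // allocates a new 4 KiB slice every iteration
		}
	})
	reused := testing.Benchmark(func(b *testing.B) {
		buf := make([]byte, 0, len(payload)) // one allocation, reused forever
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			buf = encode(buf, payload)
		}
	})
	// the reused buffer amortises to 0 allocs/op
	fmt.Println(fresh.AllocsPerOp() > reused.AllocsPerOp())
}
```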

Are you benchmarking against my latest commits on go-nostr? Because on these I have copied your string unsafe trick so it's not fair.
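
i'd guess the "string unsafe trick" referred to here is the standard zero-copy conversion between []byte and string; since Go 1.20 it can be written with `unsafe.String` and `unsafe.StringData` rather than header-casting. a sketch, with the usual safety caveats in the comments:

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string sharing b's memory: no copy is made,
// but the caller must never mutate b afterwards, because the runtime
// assumes string bytes are immutable.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

// stringToBytes is the reverse view; the returned slice must be treated
// as strictly read-only for the same reason.
func stringToBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("zero copy")
	s := bytesToString(b)
	fmt.Println(s == "zero copy")
	fmt.Println(len(stringToBytes(s)))
}
```

this is exactly the kind of thing that wins big in a decoder that would otherwise copy every field out of the wire buffer.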

Just don't wake up the C and Rust boys nostr:npub1xtscya34g58tk0z605fvr788k263gsu6cy9x0mhnm87echrgufzsevkk5s and nostr:npub1acg6thl5psv62405rljzkj8spesceyfz2c32udakc2ak0dmvfeyse9p35c or they will smash us with their faster code.


Discussion

of course, i started this after you made those changes... and i am happy to have discovered a big waste in my code... i mean, an epic one

also, it does still log now, just not the super complicated shit i had before; it even still adds ANSI colour to make the links blue on VTE terminals
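
the colour in question is presumably just a plain ANSI SGR escape; a minimal sketch of that (the helper name `blue` is mine, not from the actual logger):

```go
package main

import "fmt"

// blue wraps s in the ANSI SGR escape for blue foreground text (34) and
// resets afterwards (0); VTE-based terminals render this, and many also
// auto-detect URLs inside it and make them clickable.
func blue(s string) string { return "\x1b[34m" + s + "\x1b[0m" }

func main() {
	fmt.Println(blue("https://example.com"))
}
```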

also, bring it on!

i bet i can write Go code that comes within 1%. a recent coding contest showed the current Go version neck and neck with the best algorithm written in java, and though i'm sure you realise that's like comparing oranges and tangerines, java is not far behind C++ in most things

i never knew that intellij Go plugin now does the full profiling thing... so i can watch flamegraphs and dig into the code that is making them

about 40% of the execution time is in my tag processing, and of that, 40% is just allocating slices

so if i can find a way to eliminate all that slice making or reduce it i can win another ~12% on the decoder
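
one common way to kill per-tag allocations, sketched here with hypothetical names (`decodeTags` and the length-table shape are mine, not the real decoder): subslice every tag element out of one shared backing buffer instead of allocating each element its own copy, so a whole tag list costs a couple of allocations rather than one per item.

```go
package main

import "fmt"

// decodeTags slices each tag element directly out of buf: lengths gives
// the byte length of each element of each tag, and the elements are
// views into buf (no copies), so only the outer slice headers allocate.
func decodeTags(buf []byte, lengths [][]int) [][][]byte {
	tags := make([][][]byte, 0, len(lengths)) // preallocated: one alloc
	off := 0
	for _, tagLens := range lengths {
		tag := make([][]byte, 0, len(tagLens))
		for _, n := range tagLens {
			tag = append(tag, buf[off:off+n]) // subslice, zero copy
			off += n
		}
		tags = append(tags, tag)
	}
	return tags
}

func main() {
	buf := []byte("eabcdef") // one "e" tag with a 6-byte value
	tags := decodeTags(buf, [][]int{{1, 6}})
	fmt.Println(string(tags[0][0]), string(tags[0][1]))
}
```

the trade-off is that the tags pin the whole wire buffer in memory for as long as any element is alive.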

the next biggest thing after that is hex.EncodeToString. i'm not sure how much work that entails, but i'm designating it my first target: i know i'm always turning 32 bytes into 64 hex chars, so i can probably eliminate the allocation just by encoding into a fixed 64-byte buffer
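
that swap is straightforward with the standard library: `hex.Encode` writes into a caller-supplied buffer, whereas `hex.EncodeToString` allocates a fresh string every call. a sketch (the helper `idToHex` is hypothetical):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// idToHex writes the 64-char hex form of a 32-byte id into a
// caller-owned buffer, avoiding the per-call allocation that
// hex.EncodeToString would make. hex.Encode requires len(dst) >= 2*len(src).
func idToHex(dst *[64]byte, id *[32]byte) []byte {
	hex.Encode(dst[:], id[:])
	return dst[:]
}

func main() {
	var id [32]byte
	id[0], id[31] = 0xde, 0xad
	var buf [64]byte // lives on the stack or in the reused decoder state
	out := idToHex(&buf, &id)
	fmt.Println(len(out), string(out[:2]), string(out[62:]))
}
```

combined with the unsafe string view from earlier, the hex form need never be copied at all.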

well, these are the two decoding bottlenecks anyway, i only have to chop 20% out and i win

*laughs in provocation*

this is why i thought about the idea of going straight between binary and json: the intermediate format inside the program means a two-step conversion process every time, for nothing

all you need is a way to extract fields from the binary and from the json quickly without making any memory copies, and you can actually abolish the intermediate format
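
a toy illustration of the zero-copy field-extraction shape (the function `field` is mine, handles only flat unescaped `"key":"value"` pairs, and is nothing like a real json parser; it just shows that a field can be a view into the raw bytes rather than a decoded copy):

```go
package main

import (
	"bytes"
	"fmt"
)

// field returns a view of the value bytes for a flat string key in raw,
// or nil if absent. the result is a subslice of raw: no copy is made,
// so raw must outlive the returned slice.
func field(raw []byte, key string) []byte {
	pat := []byte(`"` + key + `":"`)
	i := bytes.Index(raw, pat)
	if i < 0 {
		return nil
	}
	start := i + len(pat)
	end := bytes.IndexByte(raw[start:], '"') // toy: ignores \" escapes
	if end < 0 {
		return nil
	}
	return raw[start : start+end]
}

func main() {
	raw := []byte(`{"id":"abc123","content":"hello"}`)
	fmt.Println(string(field(raw, "content")))
}
```

the same idea applies on the binary side with fixed offsets, which is even cheaper since no searching is needed.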

anyway, that's the gist of the idea... let's see how this works out in real life

i'm just gonna start with eliminating the use of strings in the primary data structure

this is a remarkably pervasive change

also, i hate Go strings; they should never be used for any serious work, they may as well be removed from the language. they're the only immutable thing in Go, and they're useless for exactly that reason. and rust flips it the other way: everything is immutable by default, so you have to write mut mut mut mut everywhere, you stupid morons

newsflash: endlessly copying write-once data around through the MMU is a fucking waste

Go is slow garbage, best upgrade to a better lang.

Will, opening with a haymaker…

🍿

😂

he doesn't care about response latency or how long it takes to learn a language properly or how fast it compiles, only how cool he seems

if every ounce of perf matters, use C or Rust. otherwise, Go is a great language that is very easy to write and has excellent primitives for concurrency and web services. I'd venture to say that 95% of the time Go's speed is not a problem; most code projects don't need to manage memory themselves.

exactly... rust, c and c++ can give you an extra 10-20% throughput

but none of them can give you the low response latency of Go, no matter what you do

that's the thing, what matters more?

unless you are converting video files (without a GPU???) or some other equally long running computation, their advantage is nil to negative, because of the time cost of development and the far harder debugging process

rust is an overly complicated, slow-compiling piece of shit. best to downgrade to a smaller, simpler lang with a proper garbage collector, instead of one that takes 6 months to learn how to annotate your variables for, and that forces you to explicitly mark mutable almost everything, unlike C, where immutability is the opt-in (via const) rather than the default

people make their tradeoffs, and i prefer mine: never much better than 80% of Rust's performance, when it takes me 20% of the time to write the same code without bugs

and i didn't mention latency... with less than 10% of the response latency, to boot

if rust had first-class channels and goroutines i would consider using it, but that's my red line: no coroutines, no channels, no workee