ok, so, i removed all the logging bits in my encoder because i discovered they were a massive part of the benchmark timing (and the performance), and i was pleased to see this result:

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12      788     1467852 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                 94    13787646 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12             48    24622387 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      600     1957643 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                 27    43210743 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12          852     1379193 ns/op

I've slimmed it down to mine, nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6's, and the built-in Gob codec from the Go standard library

my binary encoding is nearly 17x as fast as binary.Marshal and over 9x as fast as gob, i did not expect it to be that much faster, but ok, not sure if that's because i reused the write buffer

my decoder managed 600 iterations versus binary.Unmarshal's 852, so it's slower, but not that much slower (roughly 40% by ns/op)

here it is again, except using 10,000 of those sample events; the iteration counts are much lower of course, but the results may be more accurate because per-event statistical variation averages out

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12      139     8155043 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                 19    62083345 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12              9   112449500 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      100    13966114 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                  5   223875643 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12          121     9243299 ns/op

the encoding advantage is a little less here, around 14x over binary.Marshal; the decoding, however, comes out only about 20% behind binary.Unmarshal by iteration count in this test, a few percent different from the first run

anyway, everyone knows boys like pissing contests, i'm no different

one thing this has revealed to me: for benchmarking purposes at least, i need a way to disable the logging more completely. this run has all logging overhead utterly removed, and the difference is amazing; it shows how expensive my logging is in runtime throughput


seeing these results does rather make my last 3 days of effort writing this code seem like it was worth it

if i can speed up the decode by 20% my codec is the winner

before i get to that, i'm gonna refactor my logging library to eliminate that overhead, as much as possible

it's great to have the logging info when i'm debugging, priceless data, but if this is enabled in production with overhead this bad, there is definitely a problem with my logging library

also, the 20% slower decode may be purely the product of my data size optimization. most likely it's trading off decode time on all those insane follow lists for a nearly 50% reduction in binary event size. if so, that's cheap for ~45% compression of the most bulky events in the data set: they go from 100-500kb down to 50-250kb, which is not trivial when you consider how many of these events there are (probably about 1% by count and 15% by size)

for comparison with the second set from 10k events, this is with my logging library enabled:

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12        6   193251993 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                 19    63600374 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12              9   141026239 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12        9   120024419 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                  5   232574356 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12          140     9325969 ns/op

this is now the result after stripping the fek out of my logger library; i just had some hassles with go module versions

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/event2.EventToBinary
BenchmarkBinaryEncoding/event2.EventToBinary-12      141     8038303 ns/op
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                 18    61665344 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12              9   111623125 ns/op
BenchmarkBinaryDecoding
BenchmarkBinaryDecoding/event2.BinaryToEvent
BenchmarkBinaryDecoding/event2.BinaryToEvent-12       99    13979812 ns/op
BenchmarkBinaryDecoding/gob.Decode
BenchmarkBinaryDecoding/gob.Decode-12                  5   216066451 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal
BenchmarkBinaryDecoding/binary.Unmarshal-12          122     9004904 ns/op

now i can focus on the BinaryToEvent code and remove that difference 😁

i'm getting 400 errors after trying to push a new version: the git http server is reporting bogus packets, then the client tries to decode gzip and that also fails because the gzip header is mangled. this is the message being sent by the go tool...

not sure what to try now... maybe a different go version, seems like 1.22.3 might be buggy

oof, no beans with 1.22.2 either, what the actual... this was working just days back. now i try to push a new version and suddenly a bug i thought i'd fixed in legit is back again: a bogus encoded request from the fucking go tool, or bogus decoding on the server side...

dammmmmitttt

this must be a go.work or go.mod replace error

hmmmm maybe the legit version on my thing is broken...

bah, i wanted to do something else not debug legit again, i swear i fixed this gzip encoding bug before but evidently not

maybe it's time to go have a coffee

more probing shows that the problem seems to be more likely a bugged git repo that has been mangled

the repo clearly shows tags other than the ones go is allowing, but it's pooping the cradle on anything later than an old version, which tells me there might be some glitch with the actual git tags or some shit

gonna come back and fix it after coffee, have not had any problems with bumping new tags until this one, it's special

yes, confirmed, i wiped the existing bare repo on the host, and upped the previous version and the new version with the minimal, fast logging code

yes, worked, directly.

git is buggy, hurrah

also, second check of the other cafe down the road... seems like it hasn't been open in a while

methinks the lady i spoke to yesterday doesn't really know what her offspring is up to these days

It's definitely the buffer reuse. Memory allocation is the slowest thing in the world apparently.

Are you benchmarking against my latest commits on go-nostr? Because on these I have copied your string unsafe trick so it's not fair.

Just don't wake up the C and Rust boys nostr:npub1xtscya34g58tk0z605fvr788k263gsu6cy9x0mhnm87echrgufzsevkk5s and nostr:npub1acg6thl5psv62405rljzkj8spesceyfz2c32udakc2ak0dmvfeyse9p35c or they will smash us with their faster code.

of course, i started this after you made these changes... and i am happy to discover a big waste in my code... i mean epic

also, it does still log now, just not the super complicated shit i had before; it even still adds ANSI colour to make the links blue on VTE

also, bring it on!

i bet i can make Go code that is within 1%. a recent coding contest showed the current Go version neck and neck with the best algorithm written in java, though i'm sure you realise that is like comparing oranges and tangerines. still, java is not far behind C++ in most things

i never knew the intellij Go plugin now does the full profiling thing... so i can watch flamegraphs and dig into the code that is producing them

about 40% of the execution time is in my tag processing, and of that, 40% is just making slices

so if i can find a way to eliminate or reduce all that slice-making, i can win another ~12% on the decoder

the next biggest thing after that is hex.EncodeToString. i'm not sure how much it entails, but i'm now designating it as my first target: i know i'm turning 32 bytes into 64 hex chars, so i can probably eliminate that allocation just by using a fixed 32-bytes-to-64-hex encoding buffer

well, these are the two decoding bottlenecks anyway, i only have to chop 20% out and i win

*laughs in provocation*

this is why i thought about the idea of going straight between binary and json; the intermediate format inside the program means a two-step conversion process every time, for nothing

all you need is a way to extract fields from the binary and from the json quickly, without making any memory copies, and you can actually abolish the intermediate format

anyway, that's the gist of the idea... let's see how this works out in real life

i'm just gonna start with eliminating the use of strings in the primary data structure

this is a remarkably pervasive change

also, i hate go strings, they should never be used for any serious work, they should just be removed from the language, and rusticles, they are the only immutable thing in the language, and they are useless, yet you have to write mut mut mut mut you stupid morons

newsflash: writing endless bounds of write only shit in the MMU is a fucking waste

Go is slow garbage, best upgrade to a better lang.

Will, opening with a haymaker…

🍿

😂

he doesn't care about response latency or how long it takes to learn a language properly or how fast it compiles, only how cool he seems

if every ounce of perf matters, use C or rust. otherwise, Go is a great language that is very easy to write and has excellent primitives for concurrency and web services. I'd venture to say that 95% of the time Go's speed is not a problem. most code projects don't need to manage memory themselves.

exactly... rust, c and c++ can give you an extra 10-20% throughput

but none of them can give you the low response latency of Go, no matter what you do

that's the thing, what matters more?

unless you are converting video files (without a GPU???) or some other equally long running computation, their advantage is nil to negative, because of the time cost of development and the far harder debugging process

rust is an overly complicated, slow-compiling piece of shit, best to downgrade to a smaller, simpler lang with a proper garbage collector instead of one that takes 6 months to learn how to annotate your variables for it, and forces you to explicitly make everything mutable, which is almost everything, unlike C, which you can only make immutable by using static

people make their tradeoffs, and i prefer never getting much better than 80% of Rust's performance when it takes me 20% of the time to write the same code without bugs

and i didn't mention latency... with less than 10% of the response latency, to boot

if rust had first class channels and goroutines i would consider using it, but that's my red line, no coroutines no channels no workee

so, it took all morning, but i changed one of those stupid string fields to bytes now, the ID

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding/event2.MarshalJSON-12         14    73695929 ns/op
BenchmarkBinaryEncoding/event2.EventToBinary-12      157     7280781 ns/op
BenchmarkBinaryEncoding/easyjson.Marshal-12           64    19049795 ns/op
BenchmarkBinaryEncoding/gob.Encode-12                 18    62296062 ns/op
BenchmarkBinaryEncoding/binary.Marshal-12             10   110174020 ns/op
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      100    13698489 ns/op
BenchmarkBinaryDecoding/easyjson.Unmarshal-12         56    19677698 ns/op
BenchmarkBinaryDecoding/gob.Decode-12                  5   226144867 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal-12          122     9011042 ns/op
BenchmarkBinaryDecoding/binary.UnmarshalBinary-12    277     4849297 ns/op
BenchmarkBinaryDecoding/easyjson.Unmarshal+sig-12      1  1624027475 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal+sig-12        1  1596453002 ns/op

just one of the fields, and the encoder went from 139 to 157 iterations; makes sense, because it's not decoding hex anymore

i'm surprised the decode didn't get faster, since it now literally only makes a new slice header (subslicing)

small result for a lot of work but the pubkey and signature are still to come

nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6 btw the UnmarshalBinary function you wrote is extremely fast, but the reason its counterpart MarshalBinary isn't in there is that it can't deal with events over 64kb in size; otherwise i'd re-enable it

yes, it's not printing errors but the speed of that is because it's actually skipping a shit-ton of things

i'm gonna just remove it

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding/event2.MarshalJSON-12         15    74245789 ns/op
BenchmarkBinaryEncoding/event2.EventToBinary-12      162     7429203 ns/op
BenchmarkBinaryEncoding/easyjson.Marshal-12           60    21221235 ns/op
BenchmarkBinaryEncoding/gob.Encode-12                 18    62426577 ns/op
BenchmarkBinaryEncoding/binary.Marshal-12              9   112783137 ns/op
BenchmarkBinaryDecoding/event2.UnmarshalJSON-12       14    90464147 ns/op
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      100    12832888 ns/op
BenchmarkBinaryDecoding/easyjson.Unmarshal-12         57    23280946 ns/op
BenchmarkBinaryDecoding/gob.Decode-12                  5   226555916 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal-12          128     8563056 ns/op

the output from https://mleku.dev/nostrbench

now with the pubkey encoding as binary... not as dramatic an improvement it seems

oh well, anyway, it's done now; neither the id nor the pubkey fields need any hex encode/decode anymore, so that's still a good thing

last is the signature... this will be another 5-10 ops more i figure, then i'm gonna get out the profiler

goos: linux
goarch: amd64
pkg: mleku.net/nostrbench
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding/event2.MarshalJSON-12         14    72883244 ns/op
BenchmarkBinaryEncoding/event2.EventToBinary-12      166     7090886 ns/op
BenchmarkBinaryEncoding/easyjson.Marshal-12           67    18620409 ns/op
BenchmarkBinaryEncoding/gob.Encode-12                 19    62998941 ns/op
BenchmarkBinaryEncoding/binary.Marshal-12             10   110464204 ns/op
BenchmarkBinaryDecoding/event2.UnmarshalJSON-12       13    87762713 ns/op
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      100    12388974 ns/op
BenchmarkBinaryDecoding/easyjson.Unmarshal-12         57    23463646 ns/op
BenchmarkBinaryDecoding/gob.Decode-12                  5   226411371 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal-12          123     8851251 ns/op

#devstr #benchmark #nostr

since switching to the binary encoder the relay is for sure running way, way faster... also because i fixed the shit-slow logging library, that was a big part of it. none of these other libraries even have logging! if mine hits an error it prints a log! so it's still faster and dev-friendly at the same time

bah humbug of course i has bugs now, very weirdness

it looks like i broke the filters; that would be why it returns zero counts all the time and zero results get sent. and then idk why it's not sending eose either, but hey, gotta expect gotchas like this... i'll fix em tomorrow most likely, and figure out how they got broken so when i do the signatures i don't get surprised like this

it does send out the messages to subscribers when i post it but filter searches are b0rked

ok, bug found... funny enough, the first change i made was bugged... the event IDs were totally wrong... it only took a few minutes to get it generating the correct output, and with that plugged into the filters, suddenly all these results are coming back for the filters coming in from clients

also, the bug was slowing down the encoder... this is now typical results

goos: linux
goarch: amd64
pkg: mleku.net/nostrbench
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding/event2.MarshalJSON-12         14    74456910 ns/op
BenchmarkBinaryEncoding/event2.EventToBinary-12      172     6911613 ns/op
BenchmarkBinaryEncoding/easyjson.Marshal-12           63    18776462 ns/op
BenchmarkBinaryEncoding/gob.Encode-12                 19    62966594 ns/op
BenchmarkBinaryEncoding/binary.Marshal-12              9   118801192 ns/op
BenchmarkBinaryDecoding/event2.UnmarshalJSON-12       14    88550098 ns/op
BenchmarkBinaryDecoding/event2.BinaryToEvent-12      100    12272410 ns/op
BenchmarkBinaryDecoding/easyjson.Unmarshal-12         58    23138193 ns/op
BenchmarkBinaryDecoding/gob.Decode-12                  5   227175445 ns/op
BenchmarkBinaryDecoding/binary.Unmarshal-12          126     8782942 ns/op

i'm still puzzled why BinaryToEvent is unaffected though... i'll get to it yet. for now, all is well with the world; my relay is actually finding events and sending them back

actually, no, i only fixed one case, looks like the pubkeys are broken