nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6 your binary coder is 193us/op versus gob's 240us/op -- it's not that big a margin

the one in binary.go is the fastest

personally, i think if you want to squeeze it a bit faster, consider using reflect to force-re-type those integers (they will come out in whatever your hardware endianness is, which is the opposite of BigEndian on intel/amd)
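
a minimal sketch of that idea, using unsafe to reinterpret the integer's memory directly instead of the byte-shuffling binary.BigEndian does; putUint64Native is a hypothetical helper (not anything from binary.go) and this assumes a little-endian host like intel/amd:

package main

import (
	"fmt"
	"unsafe"
)

// putUint64Native writes v into dst in whatever byte order the hardware
// uses, by reinterpreting the destination bytes as a uint64
func putUint64Native(dst []byte, v uint64) {
	_ = dst[7] // single bounds check
	*(*uint64)(unsafe.Pointer(&dst[0])) = v
}

func main() {
	buf := make([]byte, 8)
	putUint64Native(buf, 0x0102030405060708)
	fmt.Printf("% x\n", buf) // 08 07 06 05 04 03 02 01 on little-endian hardware
}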

the opportunity i see for big performance increase is moving all that hexadecimal encoding to the network side only and everything internal being bytes

also btw, i strongly suspect that if you cache the gob encoder that margin will disappear and the difference is actually the setup

Discussion

cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkBinaryEncoding
BenchmarkBinaryEncoding/gob.Encode
BenchmarkBinaryEncoding/gob.Encode-12                   6886    151743 ns/op
BenchmarkBinaryEncoding/binary.Marshal
BenchmarkBinaryEncoding/binary.Marshal-12               5544    204894 ns/op
BenchmarkBinaryEncoding/binary.MarshalBinary
BenchmarkBinaryEncoding/binary.MarshalBinary-12         5832    203333 ns/op

this is how the code should have looked, to be fair:

b.Run("gob.Encode", func(b *testing.B) {

var buf bytes.Buffer

enc := gob.NewEncoder(&buf)

for i := 0; i < b.N; i++ {

for _, evt := range events {

enc.Encode(evt)

// _ = buf.Bytes()

}

}

})

and yes, i was right, it's almost 25% faster than your thing

also, there is a gob.Register function that registers a type for encoding; i'm surprised you didn't use it, it probably makes this even faster
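
a minimal sketch of gob.Register with a stand-in Event type; note that Register mainly matters when interface-typed values are being encoded, so any speedup for plain structs is an assumption that would need benchmarking:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Event struct {
	ID      string
	Kind    int
	Content string
}

func main() {
	gob.Register(Event{}) // tell the gob runtime about the concrete type up front

	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err := enc.Encode(Event{ID: "ab12", Kind: 1, Content: "hello"}); err != nil {
		panic(err)
	}
	fmt.Println(buf.Len(), "bytes")
}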

Oh, that's encoding.

I agree encoding can be made many times faster by just not allocating a new buffer on every iteration. The same effect happens in my encoder: if I reuse a buffer it gets many times faster. I didn't think about this before because I was only thinking about decoding; encoding was an afterthought since it's not too performance-critical.
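
a minimal sketch of that reuse pattern, with a hypothetical appendEvent encoder standing in for the real codec -- the point is that the backing array is allocated once and reused across iterations:

package main

import "encoding/binary"

type Event struct {
	Kind      int
	CreatedAt int64
}

// appendEvent writes evt into dst and returns the (possibly grown) slice,
// so the caller can keep handing the same backing array back in
func appendEvent(dst []byte, evt *Event) []byte {
	dst = binary.BigEndian.AppendUint32(dst, uint32(evt.Kind))
	dst = binary.BigEndian.AppendUint64(dst, uint64(evt.CreatedAt))
	return dst
}

func main() {
	events := []*Event{{Kind: 1, CreatedAt: 1700000000}, {Kind: 7, CreatedAt: 1700000001}}
	buf := make([]byte, 0, 1024) // allocated once
	for _, evt := range events {
		buf = buf[:0] // reset length, keep capacity
		buf = appendEvent(buf, evt)
		_ = buf // hand off to the db / wire here
	}
}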

But currently for decoding my codec beats gob by far -- unless again I'm missing something.

Maybe it makes sense to reuse the decoded objects in the decoding path too? But the impact will probably be much smaller.
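
a minimal sketch of what recycling decoded events could look like with sync.Pool; Event and decodeInto are hypothetical stand-ins, and as noted the gain depends on how long each decoded event is actually held:

package main

import "sync"

type Event struct {
	ID      [32]byte
	Kind    int
	Content string
}

var eventPool = sync.Pool{New: func() any { return new(Event) }}

// decodeInto stands in for the real binary decoder filling a struct in place
func decodeInto(evt *Event, data []byte) {
	evt.Kind = int(data[0])
}

func handle(data []byte) {
	evt := eventPool.Get().(*Event)
	decodeInto(evt, data)
	// ... use evt ...
	*evt = Event{} // clear it so nothing leaks into the next use
	eventPool.Put(evt)
}

func main() {
	handle([]byte{1})
}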

yeah, i tried to write a decoder cache for the network, but it was too much complication for probably too little gain

i think a more promising optimization is isolating the binary and json versions to only the wire and the db, so all other work with events stays in the fast native form. the default event struct, with hex encoding for the id, pubkey and signature, is not optimal; honestly they should all be their native types - except the ID, which should just be []byte

the internal version of the pubkey should be the same one that is used in the btcec signature verify function and the signature should be a schnorr.Signature type
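
a minimal sketch of that internal shape, keeping the id, pubkey and signature in their native forms and only producing hex at the edges; the struct layout here is an assumption, not the existing library type:

package event

import (
	"encoding/hex"

	"github.com/btcsuite/btcd/btcec/v2"
	"github.com/btcsuite/btcd/btcec/v2/schnorr"
)

// Event keeps everything in the form the crypto and the db actually use
type Event struct {
	ID      []byte             // 32-byte sha256 of the serialized event
	Pubkey  *btcec.PublicKey   // already parsed, ready for verification
	Sig     *schnorr.Signature // parsed schnorr signature
	Kind    int
	Content string
}

// Valid verifies the signature with no hex round trips
func (e *Event) Valid() bool {
	return e.Sig.Verify(e.ID, e.Pubkey)
}

// IDHex converts only when something actually needs the hex form (wire, logs)
func (e *Event) IDHex() string {
	return hex.EncodeToString(e.ID)
}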

changes i long ago put on my mental todo list but forgot

also, yes, reusing buffers is a huge thing... reducing the cost of GC on stuff that you can easily reuse is an easy win

Yes, I wanted to do that too, but I don't want to break the API. And then I start thinking there won't be much gain anyway: the signature is only checked once, the pubkey might have to be serialized back to hex for printing anyway (sometimes more than once), and so on, so the gains are not super clear.

And if you want faster signature verification you should use github.com/nbd-wtf/go-nostr/libsecp256k1 anyway (or do your thing and refactor them massively, complaining about my code in the process); the differences are very big. I don't know why I didn't make those bindings before -- it's weird that no one else had done them either.

i'm not using cgo

pretty sure someone did some bindings ages ago, but not for schnorr signatures, only ecdsa

and yeah, it's pretty lame how the gob codec only talks to readers and bytes.Buffer doesn't let you swap out the underlying buffer. that decoder setup is a massive overhead; in the encode step that takes out 800us, and honestly that looks like most of the time the encoder is taking... i think writing an io.Reader that lets you point at a new buffer would change the story drastically
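
a minimal sketch of that kind of reader -- a hypothetical swapReader whose Reset repoints it at a new buffer so the reader (and whatever decoder wraps it) is set up only once; bytes.Reader also has a Reset that does the same repointing, which may already be enough:

package main

import "io"

// swapReader is a reusable reader that can be pointed at a new buffer
// without allocating anything
type swapReader struct {
	buf []byte
	pos int
}

// Reset repoints the reader at a new payload
func (r *swapReader) Reset(b []byte) {
	r.buf, r.pos = b, 0
}

func (r *swapReader) Read(p []byte) (n int, err error) {
	if r.pos >= len(r.buf) {
		return 0, io.EOF
	}
	n = copy(p, r.buf[r.pos:])
	r.pos += n
	return n, nil
}

func main() {
	r := &swapReader{}
	r.Reset([]byte("first payload"))
	// ... feed r to a decoder ...
	r.Reset([]byte("second payload")) // same reader, same decoder setup
}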

bytes.Buffer is honestly a terrible thing... i also made my own variant of it for the json envelope decoder; it simply has no notion of pre-allocating a buffer that you know isn't going to need to grow... growing buffers is a massive overhead cost and in many cases easy to avoid

https://github.com/Hubmakerlabs/replicatr/blob/main/pkg/nostr/wire/text/mangle.go

it includes a bunch of handy functions that let you snip out an enclosed object or array inside another object or array without parsing it, as well as snipping out strings while correctly skipping over escape sequences
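
mangle.go's actual API differs, but a minimal sketch of the balanced-brace scan behind that kind of snipping looks like this -- it walks to the matching close brace while skipping quoted strings and escape sequences, so braces inside strings don't confuse it:

package main

import (
	"errors"
	"fmt"
)

// snipObject returns the bytes of the object starting at b[0] == '{'
// and the remainder after its matching '}'
func snipObject(b []byte) (obj, rest []byte, err error) {
	if len(b) == 0 || b[0] != '{' {
		return nil, nil, errors.New("not an object")
	}
	depth, inString := 0, false
	for i := 0; i < len(b); i++ {
		c := b[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped character
			case '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return b[:i+1], b[i+1:], nil
			}
		}
	}
	return nil, nil, errors.New("unterminated object")
}

func main() {
	in := []byte(`{"tags":[["e","ab}c"]],"content":"{not a brace}"} trailing`)
	obj, rest, _ := snipObject(in)
	fmt.Printf("%s | %s\n", obj, rest)
}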

some parts of the Go stdlib are written like a soviet committee, others not so bad. i particularly hate math/big, and i've been working with the noise protocol recently -- the way it has a feature that automatically appends the output to an input parameter is just fucking ridiculous. like, fucks sake, do one thing and do it well; this precise feature blows up the heap and breaks any attempts you make to contain garbage production

Another thing I noticed is that gob generates huge payloads. It can't be faster with such big payloads. But maybe I did something wrong again; I have never used gob.

yeah, this was a rabbithole for the last half hour for me

the gob decoder has a pretty severe memory overhead. the encoder can be reused, but the decoder wants to make all these maps and shit each time, and if you use it to decode the same type again it says "no can decode same type again with this decoder" -- wtaf?

nice catch anyway. i guess this is not an optimization, and neither would be using bytes.Buffer as it is, especially since i can see an easy size precalculation that would never require a reallocation (just based on the lengths of the segments of the json)
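
a minimal sketch of the size-precalculation idea, with a made-up field layout: sum the segment lengths first, allocate once, then fill, so the buffer never reallocates:

package main

import "encoding/binary"

type Event struct {
	ID      [32]byte
	Pubkey  [32]byte
	Sig     [64]byte
	Content string
}

func encode(evt *Event) []byte {
	// fixed-size fields plus a 4-byte length prefix plus the content bytes
	size := 32 + 32 + 64 + 4 + len(evt.Content)
	out := make([]byte, 0, size) // single allocation, never grows
	out = append(out, evt.ID[:]...)
	out = append(out, evt.Pubkey[:]...)
	out = append(out, evt.Sig[:]...)
	out = binary.BigEndian.AppendUint32(out, uint32(len(evt.Content)))
	out = append(out, evt.Content...)
	return out
}

func main() {
	_ = encode(&Event{Content: "hello"})
}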

and that reminds me of the fact that badger's internal binary encoder is actually protobuf