so, i have to stop because i think i have exhausted the possibilities for now, but...
https://github.com/mleku/p256k1/blob/main/bench/BENCHMARK_REPORT.md
this is a comparison of my new version of the nostr EC necessaries, ported from the C library: deriving pubkeys, signing, verifying, and ECDH shared secret generation.
it's now the fastest at all of them (keep in mind the CGO one carries a small CGO overhead, but it's the one in bitcoin core)... except for that pesky goddamn verification.
so, best i can figure out, the GLV optimization is the secret sauce. i've done every other trick, and that squashed it to better than btcec, but only by a margin of like 15%. core's verify function is soaring out ahead, nearly 4x faster.
however, ironically, the ECDH is faster now, whereas core and btcec are about the same. i dunno if the pubkey derivation is actually correct; those numbers look like craziness. it looks like it's looping many times and rejecting something, or idk. loads of memory and time.
anyhow, signing is fast, which is a plus.
maybe i can do better on the verify function in the future. the impression i'm getting is that getting GLV to work right will be a tedious, long-winded, probably days-long process. but then it will probably run as fast as core's version. maybe. but even still, all things considered, ~4x slower isn't that bad, and my library is now officially the best choice for Go nostr dev, except for the verification.
it puzzles me how arcane this stuff is. after dozens of passes, looking at it 50 different ways, i still couldn't stop the GLV from buggering up the r1/r2... tbh i barely grasp what any of this shit is in anything more than an abstract, wibbly-wobbly sorta way.
i'm just pleased that i've made it basically mad not to use pure Go for secp256k1 signatures in a Go app.
i am gonna keep hunting for that GLV solution, though. i'm tired, and i still have a big W to note, so i'll just leave it as is for now. orly will upgrade to it soon.
i'm still kinda shocked that it's actually faster at signing than the core secp256k1 library most of the time. i mean, really. what dropped it that low was actually just a bit of work minimizing allocations.
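The allocation point can be illustrated with a secp256k1 field addition written two ways: one returns a freshly allocated result, the other writes into caller-owned storage. The limb layout (fe as four little-endian 64-bit limbs), the names addInto and addAlloc, and the branchy (not constant-time) reduction are assumptions for illustration, not how this library actually stores field elements; testing.AllocsPerRun reports the heap allocations per call.

```go
package main

import (
	"fmt"
	"math/bits"
	"testing"
)

// fe is a field element mod p = 2^256 - 2^32 - 977, as four little-endian limbs.
type fe [4]uint64

// c = 2^256 mod p
const c = 0x1000003D1

// addInto sets r = a + b mod p, writing into caller-owned storage.
// Inputs must already be reduced (< p).
func addInto(r, a, b *fe) {
	var carry uint64
	r[0], carry = bits.Add64(a[0], b[0], 0)
	r[1], carry = bits.Add64(a[1], b[1], carry)
	r[2], carry = bits.Add64(a[2], b[2], carry)
	r[3], carry = bits.Add64(a[3], b[3], carry)
	// If the sum overflowed 2^256 or is >= p, the reduced value is
	// (sum + c) mod 2^256 in both cases.
	max := ^uint64(0)
	geP := r[3] == max && r[2] == max && r[1] == max && r[0] >= 0xFFFFFFFEFFFFFC2F
	if carry == 1 || geP {
		r[0], carry = bits.Add64(r[0], c, 0)
		r[1], carry = bits.Add64(r[1], 0, carry)
		r[2], carry = bits.Add64(r[2], 0, carry)
		r[3], _ = bits.Add64(r[3], 0, carry)
	}
}

// addAlloc does the same math but hands back a new heap object every call.
func addAlloc(a, b *fe) *fe {
	r := new(fe)
	addInto(r, a, b)
	return r
}

var (
	fa, fb, fout fe
	sink         *fe // package-level sink forces addAlloc's result onto the heap
)

func main() {
	fa = fe{0xFFFFFFFEFFFFFC2E, ^uint64(0), ^uint64(0), ^uint64(0)} // p - 1
	fb = fe{2}
	addInto(&fout, &fa, &fb)
	fmt.Println(fout) // [1 0 0 0]
	fmt.Println("in-place allocs/op:", testing.AllocsPerRun(1000, func() { addInto(&fout, &fa, &fb) }))
	fmt.Println("allocating allocs/op:", testing.AllocsPerRun(1000, func() { sink = addAlloc(&fa, &fb) }))
}
```

In hot loops such as scalar multiplication, the in-place style keeps intermediate values off the heap, which is the kind of difference `go test -bench . -benchmem` makes visible as allocs/op.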
might have been other things too.
it has been quite interesting to watch some operations get worse at the same time as others got better. i wandered down a few paths like this. i do wish i could get better verification speed tho 😭
> however, ironically, the ECDH is faster now, where core and btcec are about the same
Might be a constant time thing?
nah, i just found a better optimization by luck lol.
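Background on the constant-time question: libsecp256k1 uses constant-time scalar multiplication where a secret scalar is involved (signing, ECDH) and a faster variable-time path for verification, where every input is public. One textbook constant-time structure is the Montgomery ladder, sketched below over a toy additive group (integers mod a hypothetical modulus m, standing in for curve points): every bit costs exactly one addition and one doubling, and a masked conditional swap replaces the secret-dependent branch. This only shows the control-flow shape; the toy % arithmetic is not guaranteed constant-time, and nothing in the thread establishes that constant-time code explains the ECDH numbers.

```go
package main

import "fmt"

// toy group: integers mod m under addition, standing in for curve points
const m uint64 = 1_000_003

func add(x, y uint64) uint64 { return (x + y) % m }

// cswap swaps *a and *b when bit == 1, using a mask instead of a branch.
func cswap(bit uint64, a, b *uint64) {
	mask := -bit // 0 or all ones
	t := mask & (*a ^ *b)
	*a ^= t
	*b ^= t
}

// ladder computes k·P with a Montgomery ladder over all 64 bits of k,
// doing one add and one double per bit whatever the bit's value.
// Invariant: r1 - r0 == P.
func ladder(k, p uint64) uint64 {
	r0, r1 := uint64(0), p
	for i := 63; i >= 0; i-- {
		bit := (k >> uint(i)) & 1
		cswap(bit, &r0, &r1)
		r1 = add(r0, r1)
		r0 = add(r0, r0)
		cswap(bit, &r0, &r1)
	}
	return r0
}

func main() {
	fmt.Println(ladder(123456789, 10)) // 564188, i.e. 1234567890 mod 1000003
}
```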
as for getting verification to go faster: the Strauss-wNAF curve multiplication optimization accounts for libsecp256k1's 2-4x better verification performance, which is precisely the ratio the two implementations show.
for some reason, getting the optimization to actually work is extremely slow going.
the fact that the other 3 of the 4 essential functions now run faster is interesting, though, since this isn't some hand-coded C neckbeard version. some of the difference could be Go's better optimization of the generated assembly, and the rest could be a more efficient memory scheme: on every single operation the Go code uses less memory, mostly a lot less. my attempt at making it faster in pure Go has already yielded something suitable for more constrained devices, letting them sign/verify more stuff in less time.
So it was an actual algorithm change that brought these performance benefits? How come we don't find this in libsecp256k1? Could your optimisations also be ported into libsecp? 🤔