btw, https://orly.dev uses SIMD for hex encoding and SHA256 hashing, which is probably part of the reason it's so much faster.
the JSON decoder state machine i wrote could probably translate quite well into a pure SIMD implementation. that would be absurdly fast
one of the things i love about it is that it uses goto to track the state of the decoder, instead of keeping extra state on the stack (which would mean pushing and popping several objects all the time). the state is simply wherever the PC (program counter) is. this is how the best state machines work. i'm pretty sure the Go lexical analyser uses this technique.
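to make the goto idea concrete, here's a minimal sketch in Go. it is not the actual orly.dev decoder: `scanStrings` and its label names are hypothetical, and it only handles a flat JSON array of strings, leaving escapes undecoded. the point is that each label is a state, and "which state are we in" is just where execution currently is, with no state variable and no stack of frames.

```go
package main

import "errors"

// scanStrings is a hypothetical, illustration-only scanner. It returns the raw
// (still escaped) contents of each string in a flat JSON array of strings,
// e.g. ["a","b"]. Anything after the closing ']' is ignored.
func scanStrings(b []byte) ([]string, error) {
	// All variables are declared up front: Go does not allow a goto to jump
	// over a variable declaration in the same block.
	var (
		out   []string
		i     int // read position
		start int // start of the current string's contents
	)

	// Initial state: expect the opening '['.
	if len(b) == 0 || b[0] != '[' {
		return nil, errors.New("expected '['")
	}
	i = 1

beforeValue: // state: expect '"', or ']' if the array is empty
	if i >= len(b) {
		return nil, errors.New("unexpected end of input")
	}
	switch b[i] {
	case ' ', '\t', '\n', '\r':
		i++
		goto beforeValue
	case '"':
		i++
		start = i
		goto inString
	case ']':
		if len(out) == 0 {
			return out, nil
		}
		return nil, errors.New("trailing comma")
	default:
		return nil, errors.New("expected string")
	}

inString: // state: inside a string, looking for the closing quote
	if i >= len(b) {
		return nil, errors.New("unterminated string")
	}
	switch b[i] {
	case '\\':
		i += 2 // skip the backslash and the byte it escapes
		goto inString
	case '"':
		out = append(out, string(b[start:i]))
		i++
		goto afterValue
	default:
		i++
		goto inString
	}

afterValue: // state: expect ',' or ']'
	if i >= len(b) {
		return nil, errors.New("unexpected end of input")
	}
	switch b[i] {
	case ' ', '\t', '\n', '\r':
		i++
		goto afterValue
	case ',':
		i++
		goto beforeValue
	case ']':
		return out, nil
	default:
		return nil, errors.New("expected ',' or ']'")
	}
}
```

calling `scanStrings([]byte(`["a", "b\"c"]`))` yields two entries, `a` and `b\"c` (the escape is kept as-is). a real decoder would have many more states, but the shape is the same: every transition is a jump, and nothing gets pushed or popped to remember where you were.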