Chorus Relay WIP:

I hit a big coding milestone today. I've now got code that translates a JSON event in one buffer into a binary-serialized event in another, with zero allocations and no copies to temporary areas (just small stack variables for sizes and position tracking). This was the hard direction; going in the other direction will be easy. I did not use a generalized JSON parser but instead hand-coded a parser that bails as soon as the input is invalid for a nostr event. I'm handling JSON escape sequences and UTF-8 encoding with custom code so that it happens only when I want it to and not multiple times.
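To make the buffer-to-buffer idea concrete, here is a minimal sketch (not the actual Chorus code) of decoding one JSON string from an input slice straight into an output slice. The function name `copy_json_string` and its return shape are assumptions made for illustration. It is deliberately simplified: `\uXXXX` escapes are rejected rather than decoded, and UTF-8 validation is left out.

```rust
/// Hypothetical helper: `input[0]` must be the opening quote of a JSON string.
/// Decoded bytes are written directly into `out`; nothing is allocated.
/// Returns (bytes consumed from input, bytes written to out), or None if the
/// input is invalid or `out` is too small.
pub fn copy_json_string(input: &[u8], out: &mut [u8]) -> Option<(usize, usize)> {
    if input.first() != Some(&b'"') {
        return None;
    }
    let mut i = 1; // read position in `input`
    let mut o = 0; // write position in `out`
    loop {
        let b = *input.get(i)?; // running off the end means an unterminated string
        i += 1;
        let decoded = match b {
            b'"' => return Some((i, o)), // closing quote: done
            b'\\' => {
                let e = *input.get(i)?;
                i += 1;
                match e {
                    b'"' => b'"',
                    b'\\' => b'\\',
                    b'/' => b'/',
                    b'b' => 0x08,
                    b'f' => 0x0c,
                    b'n' => b'\n',
                    b'r' => b'\r',
                    b't' => b'\t',
                    _ => return None, // includes \u, which this sketch does not decode
                }
            }
            0x00..=0x1f => return None, // raw control characters are illegal in JSON strings
            other => other,
        };
        *out.get_mut(o)? = decoded; // bail if the output buffer is full
        o += 1;
    }
}
```

The only state is the two positions `i` and `o`, which matches the "small stack variables" approach described above, and the function bails at the first invalid byte.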

The benefit of this is that it will run crazy fast. Memory accesses are very slow compared to CPU operations. A linear contiguous event (and contiguous JSON data) is likely to be cached into L2 or L1 cache all together so there should be few memory access stalls. Memory allocation (which we avoid) creates more non-contiguous memory areas which can stall the processor, and also the algorithm that has to find such a memory area on the heap sometimes isn't as fast as you would hope.

There is one stutter that I couldn't avoid - I have to scan through the tags and count them and then scan them again to copy them. That is because the serialized event structure is optimized for fast reading of tags, not fast writing. When you read tag #200, you need to do that in O(1), so the tags section starts with a table of offsets recording where each tag starts. Because I don't know how long that table will be until I count the tags, I have to do the first pass. At least while doing the first pass I don't need to do UTF-8 validation or JSON escape sequence translation; I can just look for backslashes or double quotes to find the ends of strings.
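The two passes and the offset table can be sketched roughly as follows. This is an illustration under assumptions, not the real Chorus layout: the section format here (`[count: u16][one u16 offset per tag][tag bytes]`, little-endian, offsets measured from the start of the section) and all function names are hypothetical. To keep it short, `write_tags` takes already-delimited tag bytes as a stand-in for the second JSON pass.

```rust
/// Pass one helper: `json[pos]` is an opening quote; return the index just past
/// the closing quote. Only backslashes and double quotes are inspected.
pub fn skip_string(json: &[u8], mut pos: usize) -> Option<usize> {
    pos += 1; // step over the opening quote
    loop {
        match *json.get(pos)? {
            b'\\' => pos += 2, // skip the escaped byte without decoding it
            b'"' => return Some(pos + 1),
            _ => pos += 1,
        }
    }
}

/// Pass one: count the inner arrays of a tags array like [["e","..."],["p","..."]].
pub fn count_tags(json: &[u8]) -> Option<usize> {
    let (mut pos, mut depth, mut count) = (0usize, 0usize, 0usize);
    while pos < json.len() {
        match json[pos] {
            b'"' => {
                pos = skip_string(json, pos)?;
                continue;
            }
            b'[' => {
                depth += 1;
                if depth == 2 {
                    count += 1; // each array nested one level down is a tag
                }
            }
            b']' => {
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    return Some(count);
                }
            }
            _ => {}
        }
        pos += 1;
    }
    None // input ended before the outer array closed
}

fn put_u16(out: &mut [u8], pos: usize, value: usize) -> Option<()> {
    if value > u16::MAX as usize {
        return None;
    }
    out.get_mut(pos..pos + 2)?.copy_from_slice(&(value as u16).to_le_bytes());
    Some(())
}

fn get_u16(buf: &[u8], pos: usize) -> Option<usize> {
    let bytes = buf.get(pos..pos + 2)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]) as usize)
}

/// Pass two (simplified): the count fixes the table size, so the tag data can be
/// written right after the table. Returns the total section length.
pub fn write_tags(tags: &[&[u8]], out: &mut [u8]) -> Option<usize> {
    let count = tags.len(); // in the real flow this would come from pass one
    let mut data_pos = 2 + 2 * count; // first byte after the offset table
    put_u16(out, 0, count)?;
    for (i, tag) in tags.iter().enumerate() {
        put_u16(out, 2 + 2 * i, data_pos)?;
        out.get_mut(data_pos..data_pos + tag.len())?.copy_from_slice(tag);
        data_pos += tag.len();
    }
    Some(data_pos)
}

/// O(1) lookup of tag `n`: two table reads, no scanning of earlier tags.
pub fn tag_at(section: &[u8], n: usize) -> Option<&[u8]> {
    let count = get_u16(section, 0)?;
    if n >= count {
        return None;
    }
    let start = get_u16(section, 2 + 2 * n)?;
    let end = if n + 1 < count { get_u16(section, 2 + 2 * (n + 1))? } else { section.len() };
    section.get(start..end)
}
```

The point of the layout is visible in `write_tags`: `data_pos` cannot be computed until `count` is known, which is why the counting pass has to come first.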

Another caveat is that it doesn't accept unknown fields right now. That would require parsing sub-objects, arrays, numbers, true/false/null, etc, and just tossing the unknown and useless value. Right now it considers events with fields not defined in NIP-01 to be invalid, which is good enough for now.

I need to get (or generate) a large set of JSON events that are not nice compact ones but weird ones that might break my code: fields in every kind of order, large out-of-range numbers, strange but legal whitespace, all the strange unicode characters, unicode escapes, other JSON escapes, etc. Then I need to test this code against all of that to make sure it handles everything properly.

Looking back at the bigger picture, this was the biggest hurdle in my attempt to write a relay, so I should be able to progress much faster now.

Discussion

very cool, thanks for sharing

sick