Subnostr

Curious what you would use instead of JSON. I am partial to using fixed-width binary formats for whatever you can get away with, but some things are better off with something more readable.

But as you point out they are messy and don't give nice deterministic hashes. Not without a bunch of painful clean-up which opens the door to abuse.

I am doing my own thing as well, but am a bit stuck on what to use for type definition files. All I really care about is the hash of it, but you need something human readable to translate the hash.

ᴛʜᴇ ᴅᴇᴀᴛʜ ᴏꜰ ᴍʟᴇᴋᴜ 7mo ago

i'm using a format that is a lot like the top level structure of Wirth family languages (of which Go is pretty much one)

binary format is for databases, and not really advised for wire formats, for reasons of needing a lot more tools to just simply make sense of data that is sent on the wire.

json is a horrible, "natural language" similar structure. it's not line structured, it is riddled with brackets, parentheses and interstitial comma separators. just that one point alone costs an extra step in the logic of parsing or marshaling to not put a separator after the last field.

nah, the event format looks like this:

PUBKEY:

TIMESTAMP:

CONTENT:

TAG:key:value (values can also be binary, with the same prefix as above)

... any number more tags

SIGNATURE:

the order of fields is fixed, and everything up to the newline before SIGNATURE derives the event ID hash, so the ID itself is left out of the event. it is only needed once the message has been received, and even in the database a truncated hash of the ID is used as an index, as the client knows the hash. i also make a dedicated index just for getting those ID hashes directly from other index references to the database sequence number of the event, so i can return the ID without decoding the event (it's stored in a fast binary format), or i can instead seek to the event by its serial number and return the whole event. that way the query syntax can request either whole events or a huge number of event IDs, so the raw events can be fetched on demand.

anyway, that's the general gist of what i'm working with. line separated makes scanning passes a lot simpler to write than bracket syntax. brackets and quotes (even worse for escaping) both require multi-level state machines. with the spec above, you can do a quick and dirty parser that just splits by lines, then splits reading forward by colons, and you have all the data except for decoding the binary fields. and yes, you can also quickly scan to get the event body, everything down to the signature, just by counting newlines and reading the first few characters. in fact, the fields can be distinguished by just the two characters following a newline character.
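A minimal sketch of that kind of quick and dirty parser, in Python. It assumes one field per line, the `:` separator shown in the field list, and no raw newlines inside CONTENT or tag values. The thread does not name the hash function or say whether the final newline is hashed, so SHA-256 over the text before the newline that precedes SIGNATURE is an assumption, and `parse_event` and `event_id` are hypothetical names.

```python
import hashlib


def parse_event(text: str) -> dict:
    """Split by lines, then split each line at its first colon."""
    event = {"tags": []}
    for line in text.split("\n"):
        if not line:
            continue  # tolerate blank lines between fields
        name, _, rest = line.partition(":")
        if name == "TAG":
            # TAG:key:value, so split once more at the next colon
            key, _, value = rest.partition(":")
            event["tags"].append((key, value))
        else:
            event[name.lower()] = rest
    return event


def event_id(text: str) -> str:
    """Hash everything before the newline that precedes SIGNATURE.

    SHA-256 is an assumption; the thread does not name the hash.
    """
    body, sep, _ = text.rpartition("\nSIGNATURE:")
    if not sep:
        raise ValueError("no SIGNATURE line found")
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Because only the first colon on a line is treated as a separator, colons inside CONTENT survive untouched, and no escaping or bracket-depth tracking is needed.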

i hope that gives you some ideas, anyway. i have written advanced (even JSON) state machines for unmarshalling but they are very long, like 400 lines of code, and that's not necessarily enough because you have to handle string escaping and you also have to handle arbitrary whitespace for pretty printed forms.

that's the thing about the format i designed. it doesn't need a pretty printed form, it is readable already, because it uses line breaks and sentinels, you can scan it visually also to recognise its elements, same reason it's easy to see what is what, it's easy to write code that sees what is what.

i mean, the whole business with commas and lists has a whole subject in the field of language syntax, the oxford and cambridge and whatever other idioms of writing shit in human languages.

in the olden days, when computers were slow, small and shitty, people designed simple formats for data that were both simple to read and simple to parse. good old Jevons paradox at work, making lazy humans make "pretty" things with "sophisticated" syntax because "oh, it's ok, the processors are cheap and fast now"

just compile a large (25mb source+) Go project and compare it with a similar build process for a Rust or C++ or even C compilation. you look away, and Go is finished. The others, you have time to cook breakfast, drink a whole coffee and forget what you were doing on the PC

Discussion

Daniel Wigton 7mo ago

I like that. Far more consistent than JSON. I will probably do something like that for Context Definitions. Something like

PARENT_CONTEXT:<32 bytes base64>

PERMISSION:read|write|read_write

NONCE:<16 bytes base64>

NAME:

DESCRIPTION:

Then you take the hash of the above to get the Context_Id. Context being something like a kind in nostr, but fully permissioned.
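As a rough sketch of how a Context_Id could be derived from such a definition, in Python. The thread does not specify the hash, the line terminator, or how NAME and DESCRIPTION are encoded, so SHA-256 over the UTF-8 text with `\n` line endings is an assumption, and `context_definition` and `context_id` are hypothetical names.

```python
import base64
import hashlib

PERMISSIONS = ("read", "write", "read_write")


def context_definition(parent: bytes, permission: str, nonce: bytes,
                       name: str, description: str) -> str:
    """Render the five fields in fixed order, one per line."""
    if len(parent) != 32 or len(nonce) != 16:
        raise ValueError("parent must be 32 bytes and nonce 16 bytes")
    if permission not in PERMISSIONS:
        raise ValueError("unknown permission")
    if "\n" in name or "\n" in description:
        raise ValueError("fields must not contain newlines")
    lines = [
        "PARENT_CONTEXT:" + base64.b64encode(parent).decode("ascii"),
        "PERMISSION:" + permission,
        "NONCE:" + base64.b64encode(nonce).decode("ascii"),
        "NAME:" + name,
        "DESCRIPTION:" + description,
    ]
    return "\n".join(lines) + "\n"


def context_id(definition: str) -> bytes:
    """Assumed: SHA-256 of the exact definition text."""
    return hashlib.sha256(definition.encode("utf-8")).digest()
```

Since the field order and line layout are fixed, the same definition always renders to the same bytes, which is what makes the hash deterministic without any clean-up pass.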

The actual "Events", which I call Notifications (this predates nostr), are binary, but because everything aside from the payload is fixed width it is easy to parse.

<32 bytes> // Notification Context_id

<32 bytes> // Sender_id this is an ed25519 signing key that is permissioned for the context

<32 bytes> // Recipient_id x25519 key permissioned to read this context

<32 bytes> // Encrypted decryption key. Above keys derive a shared secret to recover this key.

<64 bytes> // Signature

// encrypted payload, can be anything, format is Context specific.
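A fixed-width layout like this can be unpacked with a single format string. The sketch below uses Python's `struct` module and follows the field order and sizes listed above; the names `Notification` and `parse_notification` are hypothetical, and no signature checking or decryption is attempted.

```python
import struct
from typing import NamedTuple

# Four 32-byte fields followed by a 64-byte signature: 192 bytes in total.
HEADER = struct.Struct("<32s32s32s32s64s")


class Notification(NamedTuple):
    context_id: bytes
    sender_id: bytes
    recipient_id: bytes
    encrypted_key: bytes
    signature: bytes
    payload: bytes  # still encrypted; its format is Context specific


def parse_notification(packet: bytes) -> Notification:
    """Slice the fixed header off the front; the rest is the payload."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than the fixed header")
    fields = HEADER.unpack_from(packet)
    return Notification(*fields, payload=packet[HEADER.size:])
```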

I am torn on including a timestamp. The time sent can be a lie; time received is the only verifiable information. The risk is replay attacks. They wouldn't do anything other than potentially be useful for DDOS, but I have other mechanisms to prevent that.

ᴛʜᴇ ᴅᴇᴀᴛʜ ᴏꜰ ᴍʟᴇᴋᴜ 7mo ago

yeah, timestamps are basically lies, it's true. and really they only have relevance as a mitigation against replay attacks on signature auth stuff (this is why malleability was such a bfd with bitcoin in 2017)

the timestamp on events can be somewhat malicious too, such as making it far into the future and then dumb clients will render it as a perma-sticky in a post list as well.

good reminder why i need to have a "first seen" index on events when they are stored, and why it makes sense to expose an API to learn this info from an event ID

i say, put timestamps in these things, even if they are not there as a replay attack mitigation. most clients won't spoof them. events that are way old don't hurt anything, but it probably makes a lot of sense to refuse to accept an event with a timestamp more than half an hour in the future. i mean, what self respecting sysop doesn't maintain reasonable time sync anyway? bad time sync can cause big problems with a lot of things, including signature based auth command systems, but also third party signed certificates that have a relatively short lifespan.
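The acceptance rule described here is small enough to sketch in Python. It assumes timestamps are Unix seconds, and `accept_timestamp` is a hypothetical name. Old events pass, and only timestamps more than half an hour ahead of the relay's clock are refused.

```python
import time
from typing import Optional

MAX_FUTURE_SKEW = 30 * 60  # half an hour, in seconds


def accept_timestamp(created_at: int, now: Optional[float] = None) -> bool:
    """Refuse events dated too far ahead of the local clock."""
    if now is None:
        now = time.time()
    return created_at <= now + MAX_FUTURE_SKEW
```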

anyway, yeah, the binary formats are quite easy. i use a varint encoding for field lengths when they are variable, then they can be stream-read, where null terminating is only really suitable for things like database keys, where a length prefix will mess with sorting, and the data size is small anyway.
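A sketch of varint length prefixes for stream reading, in Python. The thread does not say which varint encoding is used, so this assumes the common unsigned base-128 form (7 bits per byte, high bit set on every byte except the last), as in Protocol Buffers and Go's `encoding/binary`. All function names are hypothetical.

```python
import io


def put_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)


def read_uvarint(stream) -> int:
    """Decode one uvarint by reading the stream a byte at a time."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if b[0] < 0x80:
            return result
        shift += 7


def write_field(stream, data: bytes) -> None:
    """Write a variable-length field as length prefix plus bytes."""
    stream.write(put_uvarint(len(data)))
    stream.write(data)


def read_field(stream) -> bytes:
    """Read back one length-prefixed field."""
    n = read_uvarint(stream)
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended inside a field")
    return data
```

The reader never has to scan for a terminator: it knows exactly how many bytes to take before the next field begins, which is what makes the format stream-friendly.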

Daniel Wigton 7mo ago

My problem with timestamps is that they don't scale. And by scale I am talking about the solar system. Your reference frame matters. If you want timestamps you have to know what is reasonable to expect and so you need reference frame conversions.

It isn't exactly difficult, I have a future proofed concept, but why bother if I don't need it at all. Time of reception is all I really care about. If a particular application wants time sent, they can embed that in the data.

My other reason is to keep it small. I am not going to tear up my connection and storage for the crap you want to send, and I am not going to have a fancy protocol for you. You get to send me one authenticated and encrypted UDP packet. If your cat picture doesn't fit in there then give me the hash_id of it and the decryption key in the Notification and I'll request it if I want it.

ᴛʜᴇ ᴅᴇᴀᴛʜ ᴏꜰ ᴍʟᴇᴋᴜ 7mo ago

haha, yeah, time becomes a funny thing in the context of very big distances.

causality is a different thing, maybe you have heard of lamport clocks, but these relate to an interactive protocol that lets you establish sequence. we use a central time clock here on earth because it takes less than half a second for a signal to go from any place to any other place most of the time.
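For reference, the textbook Lamport clock fits in a few lines of Python. This is the standard algorithm, not something taken from either protocol discussed in this thread: each node keeps a counter, increments it on every local event or send, and on receipt jumps past the counter carried by the incoming message.

```python
class LamportClock:
    """Logical clock that orders events by causality, not wall time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Call for a local event or just before sending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Call on receipt, passing the sender's counter from the message."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

The counters say nothing about how much time has passed. They only guarantee that if one event could have caused another, it carries the smaller number.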

the key thing is that regardless of what kind of encoding you use to signify time, to make it robust to represent causality there has to be an interactive protocol involving a mutual third party. at best you can estimate ping to the other user, and then via the middleman you can estimate the time skew.

as for actual networking across the distances of space where signals have a latency over a few minutes, it's just simply not a network like the internet, it's more like a postal system.

Daniel Wigton 7mo ago

Yup. I have yet to tackle local (to the planet) caching. It is going to be a crappy experience if my grandkids send me a picture of their pet cow on Mars, but with my stingy protocol I have to wait another half-hour after requesting it.

But it is probably easy. You just have caching agreements with trusted contacts where you will store files on their behalf so requests for resources can be fast.

I dreamt this all up in response to the company I was working at struggling to maintain an email server for 60 employees. They got pushed to over-priced cloud providers by spam. It takes a lot of storage to hold the spam for 60 people while you try to filter it. Nope, you get 1024 bytes to convince me to download the rest.

And that is after immediately dropping any packet from a key I don't recognize. That is why it comes before recipient in the definition above, faster filtering.
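A sketch of that early-drop filter in Python, using the byte offsets from the Notification layout above (Sender_id at bytes 32 to 63, Recipient_id at bytes 64 to 95). Treating 1024 bytes as the limit for the whole packet is an assumption, `should_process` is a hypothetical name, and signature verification is deliberately left out: it would only run on packets that survive these cheap checks.

```python
MAX_PACKET = 1024   # assumed to cover the whole packet, header included
HEADER_SIZE = 192   # 4 * 32 bytes + 64-byte signature
SENDER = slice(32, 64)
RECIPIENT = slice(64, 96)


def should_process(packet: bytes, known_senders: set, my_recipient_ids: set) -> bool:
    """Cheap checks first: size, then sender key, then recipient key."""
    if len(packet) < HEADER_SIZE or len(packet) > MAX_PACKET:
        return False
    if packet[SENDER] not in known_senders:
        return False  # unknown key: drop without reading any further
    return packet[RECIPIENT] in my_recipient_ids
```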

You can ddos me, but I am not spending a single second reading your spam.

ᴛʜᴇ ᴅᴇᴀᴛʜ ᴏꜰ ᴍʟᴇᴋᴜ 7mo ago

i think that interplanetary radio communication is the wrong way to do it. maybe i have talked about this before? quantum entanglement would enable a realtime connection, devices already exist but their bandwidth is pretty miserable. even still, parallelizing signals across an array of them should eventually be reasonable and then... well, time will be kinda highly subjective beyond what we experience with telecomms here, still would need a solution. latency even to the moon is not very practical for anything interactive. you can send recordings but the conversation will be like a pen pals chess game or turn based game of civilization lol. three weeks later you finally finish talking about what you wanted to talk about when you started and a whole month has almost passed.

Daniel Wigton 7mo ago

Quantum entanglement doesn't actually allow FTL comms. If you and I share an entangled qubit between Earth and Europa, all that means is that when I measure the value of my qubit it will be in instant agreement with your qubit, but the value is random. We have no ability to decide what value we see. We only know that we are seeing the same thing.

You can use this to send a message that cannot be intercepted, but you still have to send the actual data by conventional means. The qubits just make a fancy one-time pad.