Just released #notepack, a new compact binary format for nostr notes inspired by the #nostrdb binary format, but much simpler.

https://github.com/jb55/notepack

rust docs: https://docs.rs/notepack/latest/notepack/

spec: https://github.com/jb55/notepack/blob/master/SPEC.md

cheers!

Discussion

How do you measure whether this is worth doing? I did something similar in my recent DB adventures, but I realized that the bulk of the size on Nostr comes from the tag array and content, not the other fields. I ended up keeping the hex representation because it is much easier to work with and it doesn't really add much to the db (<1% in my tests).

i'm not sure if it is worth doing but I had an idea in my head for the simplest version of a binary encoding I could think of and wanted to put it out there.

encoding lowercase hex tags as bytes makes a big difference in contact lists (74kb -> 36kb). it's the difference between a 1mb and a 500kb upload when subscribing to 10 relays with your follow list, assuming something like this were ever adopted.
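
for a sense of where that saving comes from: two lowercase hex characters collapse into one byte, so every 64-char id or pubkey shrinks to 32 bytes. a minimal sketch of the idea (`hex_to_bytes` is a hypothetical helper for illustration, not notepack's actual API):

```rust
// decode a hex string into raw bytes: two hex chars -> one byte.
// hypothetical helper, not the real notepack encoder.
fn hex_to_bytes(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    hex.as_bytes()
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

fn main() {
    // a 64-char hex pubkey (example value) drops to half its size
    let pubkey_hex = "32e1827635450ebb3c5a7d12c1f8e7b2b514439ac10a67eef3d9fd9c5c68e245";
    let bytes = hex_to_bytes(pubkey_hex).unwrap();
    assert_eq!(pubkey_hex.len(), 64);
    assert_eq!(bytes.len(), 32);
}
```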

Well, for the wire, I don't really care that much because almost every relay compresses the data anyway. So, the gain is minimal for all the conversion required.

But for the local db, I thought the biggest gains came from creating a "table" just for all the possible strings in tags + pubkeys + ids + addresses. Then, instead of having 32 or 64 bytes for each tag, we have 4 bytes for an Int, which would be more than enough to represent everything in a single-user db.

However, converting them back and forth for queries and so on becomes quite expensive.

But the savings are really 10-20x.
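
The interning idea can be sketched with a HashMap in a few lines; `Interner` is a made-up name for illustration, not anything from notepack or nostrdb:

```rust
use std::collections::HashMap;

// toy string-interning table: each distinct id/pubkey/tag string maps to a
// 4-byte integer, so a repeated 32- or 64-byte value is stored only once.
struct Interner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn new() -> Self {
        Interner { ids: HashMap::new(), strings: Vec::new() }
    }

    // return the existing id for this string, or assign the next one
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.ids.insert(s.to_string(), id);
        id
    }

    // reverse lookup needed when serving queries -- this is the
    // back-and-forth conversion cost mentioned above
    fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(|s| s.as_str())
    }
}

fn main() {
    let mut table = Interner::new();
    let a = table.intern("pubkey-1"); // seen for the first time
    let b = table.intern("pubkey-1"); // same string again: same 4-byte id
    assert_eq!(a, b);
    assert_eq!(table.resolve(a), Some("pubkey-1"));
}
```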

So, I tried some hashcode methods to avoid having to save a table and convert. That works, but collisions are way higher.
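
The hashcode variant trades the table for collision risk: truncating a hash to 4 bytes hits the birthday bound at roughly sqrt(2^32) ≈ 65k distinct strings. A sketch using std's default hasher (an assumed stand-in, not the exact method I tried):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// map a string to a 4-byte key by truncating a 64-bit hash.
// with only 2^32 possible keys, distinct strings start colliding
// after roughly 65k entries (birthday bound).
fn key32(s: &str) -> u32 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish() as u32
}

fn main() {
    let a = key32("32e1827635450ebb3c5a7d12c1f8e7b2b514439ac10a67eef3d9fd9c5c68e245");
    let b = key32("32e1827635450ebb3c5a7d12c1f8e7b2b514439ac10a67eef3d9fd9c5c68e245");
    // same input always yields the same key -- but different inputs may collide
    assert_eq!(a, b);
}
```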

personally I think this could be great as an over-the-wire format, rather than storage, since as you're pointing out it makes no difference in storage.

the biggest advantage is that it's super simple to encode and decode (simpler for your cpu than json, in fact!).

sending 1000s of events from a relay could save hundreds of kilobytes of data (which is great for areas with bad internet, or phone connections), and plenty of cpu cycles on both the client and relay (i.e. better for older devices, and cheaper servers)!

I think the nostrdb format is better for storage since it's flatbuffer-style and doesn't require serialization into or out of the DB.

this is definitely better over the wire, but i'm confused when y'all say it's not good for storage, since it will detect and encode any lowercase hex string in the note as bytes, which is great for reducing storage size; many times it's a 50% storage reduction.

but yes, it's also crazy cpu efficient.

Nah.. the wire is gzip-compressed by almost every relay these days. The gain for comms is minimal.

you can skip gzip and json decoding with this format, and it's smaller. that's a lot of cpu savings.

I don't think it is smaller. Did you test it? A 300-person contact list goes from 23kb of minified json to 12kb gzipped. And those have the lowest sparsity. Any text note compresses like crazy.

yes I tested it: 36746 bytes (notepack) vs 39742 bytes (gzipped json)

zstd beats it: 35530 bytes

but you still have to un-zstd and json decode..

i'll try to see how much faster it is with benchmarks

Yeah, those numbers make sense. I don't know, I feel like beating a dead horse when we try to beat compression algorithms.

another nice thing is that this format guarantees the id is at the start, meaning it's very easy to bail out of parsing/verifying the entire thing when checking if you already have it:

    for field in packed_note {
        // the id is the first field peeked when looking at the bytes
        if let ParsedField::Id(id) = field {
            if cache.contains(id) {
                break;
            }
        }
    }

I had to really hack this into my json parser in nostrdb to get it working

it's not bad nostr:note1t3cnac2fqr4wcl6a8eve63q3577e77p8rgrgpak8mn77py4dhnfqxwrnyc