Introducing Cherry Tree, chunked files stored on blossom servers

https://github.com/hzrd149/cherry-tree

https://npub19sstws4x9t7nua2sh6cxkeez25y6lvvnq6fqpmyzwnthsjap5tqqkdsghg.nsite.lol/

The experiment was to see how easy it was to split a large file into chunks, upload those chunks to multiple blossom servers, and reassemble them on another computer.

The best part is that the blossom server has no idea it's storing a chunked file; it just sees a bunch of binary blobs uploaded from a random pubkey.
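The core idea can be sketched in a few lines. This is an illustrative Python stand-in, not the app's actual code (the app is built on blossom-client-sdk in the browser); the 1 MiB chunk size, the index layout, and the dict standing in for a blossom server are all assumptions for the sketch:

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB, a made-up chunk size for illustration

def split(data: bytes, size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks keyed by sha256
    (blossom addresses blobs by their sha256 hash)."""
    return [(hashlib.sha256(data[i:i + size]).hexdigest(), data[i:i + size])
            for i in range(0, len(data), size)]

def reassemble(index: list[str], fetch) -> bytes:
    """Fetch each chunk by hash, verify it, and concatenate in index order."""
    out = bytearray()
    for h in index:
        blob = fetch(h)
        assert hashlib.sha256(blob).hexdigest() == h, "corrupt chunk"
        out += blob
    return bytes(out)

data = bytes(range(256)) * 10_000   # ~2.5 MB of sample data
chunks = split(data)
index = [h for h, _ in chunks]      # ordered chunk hashes (roughly what an index holds)
store = dict(chunks)                # stand-in for one or more blossom servers
assert reassemble(index, store.get) == data
```

A side effect of content addressing: identical chunks hash to the same blob, so the store above ends up holding fewer blobs than the index has entries.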

This isn't private, though: the blossom servers could easily look at the content of the binary blobs, figure out what type of file it is, and piece it together. However, if the client encrypted the blobs first, it would be a lot more private.

If you want to play around with it you can find a list of public blossom servers at https://blossomservers.com/

If you want more details I've created a PR explaining how it works https://github.com/hzrd149/blossom/pull/51

The app was built using applesauce https://hzrd149.github.io/applesauce/ rx-nostr https://penpenpng.github.io/rx-nostr/en/ and blossom-client-sdk https://www.npmjs.com/package/blossom-client-sdk packages


Discussion

- Split and store Chunks of a Blossom Blob and spread them across Blossom Drives, and then search for them and reassemble them later by referencing its Cherry Tree.

"Do any of these words mean anything to you?" A question to a random person.

I love funny new words and combining them x3

Btw, I'm not sure how many people say this, but thank you =3

It almost made sense to me 😅

some of the chunks are actually merkle trees or something right? like, the file's block list?

Putting the list of every chunk hash into a single index file like this means that index file can grow in size progressively, becoming bloated for very large files. This is the flaw of IPFS Merkle DAGs and Cherry Trees.
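To put rough numbers on that growth (hypothetical parameters: 1 MiB chunks and 32-byte sha256 hashes):

```python
def index_size(file_bytes: int, chunk_bytes: int = 1 << 20, hash_bytes: int = 32) -> int:
    """Bytes needed by a flat index that stores one hash per chunk."""
    num_chunks = -(-file_bytes // chunk_bytes)  # ceiling division
    return num_chunks * hash_bytes

GiB = 1 << 30
print(index_size(1 * GiB))     # 1 GiB file  -> 32 KiB index
print(index_size(1024 * GiB))  # 1 TiB file  -> 32 MiB index
```

The index grows linearly with file size; whether ~32 MiB of hashes per TiB counts as bloat is exactly the point being debated here.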

The index file should be chunked too, which is what Scionic Merkle Trees finally achieved after a lot of hard work.

This ensures every bit of data transmitted is evenly chunked. It's a standard that matches the efficiency of the Classic Merkle Trees used in Bitcoin (where everything transmitted is chunked) but with the folder abilities the classic trees lack.

Just FYI for anyone reading who knows of both. 🏝️

The chunk lists are stored on Nostr and the chunks are stored on blossom servers.

There is a general upper limit on the size of Nostr events, but this might not be that big of an issue since these chunked blobs are just single files instead of the folders that IPFS supports.

ah, so nostr relays become like the file allocation table

you can probably scale it by chunking it down, so one event refers to either hashes of blossom blobs or other nostr events, you could use b tags for blossom (the file) and n tags for multiple layers of depth to cope with large files

i think how much you have to do it depends on the limits of the blossom server right?

like, if the blossom protocol allows arbitrarily sized files then you can refer to a single file with a single event, within a typical max 500k event size, by making the chunk sizes bigger. how do you deal with that in blossom? i haven't looked, just a "saying it out loud" question
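Sketching how deep the layering idea above would need to go: assuming (hypothetically) that around 7,000 hex hashes plus tag overhead fit in a 500 kB event — that figure is a rough guess, not a spec value:

```python
import math

def layers_needed(num_chunks: int, hashes_per_event: int = 7_000) -> int:
    """Depth of an event tree whose leaf events list blossom chunk hashes
    (b tags) and whose inner events reference other events (n tags)."""
    if num_chunks <= hashes_per_event:
        return 1
    return 1 + layers_needed(math.ceil(num_chunks / hashes_per_event),
                             hashes_per_event)

# a 5 GB file with 1 MB chunks -> ~5,000 hashes, one index event is enough
print(layers_needed(5_000))      # 1
# a 1 TB file with 1 MB chunks -> ~1,000,000 hashes, one extra index layer
print(layers_needed(1_000_000))  # 2
```

So even very large files would only need a layer or two of indirection before the chunk count exceeds what a single event can list.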

So Cherry Trees have the flaws of Merkle DAGs (growing file for list of chunks) with the narrow abilities of Classic Merkle trees (support restricted to single files).

That seems like a big waste of time… regressing us backward. I don’t think we should try to beat Merkle’s scheme.

It has the flaws of Merkle DAGs, but it does not use merkle trees; it's basically just a recreation of torrent v1 files on top of nostr and blossom servers.

I'm not trying to fix anything with this, just building an app to stress test blossom servers, and I was curious if it could be done in a really simple way.

Why not just use bittorrent then?

It's slightly simpler, and it works in web browsers. Although I'm not trying to replace bittorrent; I'm just trying to stress test blossom servers and introduce the concept of pay-per-upload.

Chunking and pay-per-upload have nothing to do with each other. You can stress test blossom servers & payments with large unchunked files.

Blossom is great for unchunked files, but chunked files should be a Merkle-based scheme.

Creating a poor chunking standard is just foolish and can raise the risk of delay attacks for large files. Merkle would be disappointed.

Because the index is for an independent file, I don't really think this would be a problem, especially since you can increase the chunk size if this were the case.

Once we mix this with ecash pay-on-demand Blossom servers, we could have a good enough solution for archival of important / wanted data.

nostr:nevent1qqsx9l2j7g5t8llkm8r9d987er3hwj0gmrq4k09d88hzvq4ujysmksgppemhxue69uhkummn9ekx7mp0qgszv6q4uryjzr06xfxxew34wwc5hmjfmfpqn229d72gfegsdn2q3fgrqsqqqqqpv0tgdd

Have you explored content-defined chunking?

https://joshleeb.com/posts/content-defined-chunking.html
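For context, content-defined chunking picks cut points from a rolling hash of the content itself, so inserting bytes early in a file only disturbs nearby boundaries instead of shifting every fixed-size chunk after the edit. A minimal Gear-style sketch (simplified for illustration; not FastCDC, not ERIS, and all parameters here are arbitrary):

```python
import hashlib

MASK = (1 << 13) - 1  # cut when the low 13 bits are zero -> ~8 KiB average chunks
# 256 pseudo-random 64-bit values, one per possible byte value
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]

def cdc(data: bytes, min_size: int = 2_048, max_size: int = 65_536) -> list[bytes]:
    """Cut chunks where a rolling Gear hash matches the mask, bounded by
    min/max sizes, so boundaries depend on content rather than offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & (2**64 - 1)  # only the last ~64 bytes matter
        size = i - start + 1
        if (size >= min_size and h & MASK == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# 320 kB of deterministic pseudo-random sample data
data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(10_000))
chunks = cdc(data)
assert b"".join(chunks) == data
```

With fixed-size chunks, prepending even one byte changes every chunk hash; with CDC, boundaries after the edit region typically land in the same places, so most chunk hashes survive the edit and stay deduplicated.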

TBH I don't know much about these things, but it reminded me of a spec I came across in a Guix patch. They call it ERIS. It's storage and transport independent.

https://eris.codeberg.page/spec/