What is their format?

this is BS right? CBOR

but the events include the pictures, not sure, maybe videos too

i did an extensive exploration of this for Arweave, and i told them that if they were to upload the data they'd blow their whole network up

here we see i was right

i already saw Steemit blow up 7 years ago, just from being 100k+ beyond the point where you could sync it on an average PC, and that was on a blockchain haha

the tool i built for arweave was gonna chew some serious bandwidth as well i guess

costs quite a bit of money to hammer through more than a terabyte of bandwidth a month, i'm not doing it for funsies

oh yeah, this is the thing: it wasn't so much the data format as the complexity of it. even though it's binary, they have a gazillion little nodes in the structure that take up space even when empty, and most of them are empty and mean nothing, a bit like the content of their words and thoughts
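To make the "empty nodes still take up space" point concrete, here is a minimal sketch using a hand-rolled encoder for a small subset of CBOR (RFC 8949: no floats, byte strings, or tags). The field names are hypothetical, loosely modeled on a social-post record. This does not claim Bluesky serializes its records exactly this way. It only shows what explicit empty fields cost in CBOR.

```python
import struct


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus its length/value argument (RFC 8949, section 3)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def encode(value) -> bytes:
    """Encode a small subset of CBOR: null, bool, int, str, list, dict."""
    if value is None:
        return b"\xf6"  # simple value 22 (null)
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(v) for k, v in value.items())
        return _head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical post record where most fields are present but empty
full = {
    "text": "hi",
    "embed": None,
    "facets": [],
    "labels": None,
    "langs": [],
    "reply": None,
    "tags": [],
}
# The same record with the empty fields simply left out
sparse = {k: v for k, v in full.items() if v not in (None, [])}

print(len(encode(full)), len(encode(sparse)))  # 52 9
```

The 43 extra bytes carry no information. They are just keys plus null/empty markers, and that overhead repeats on every record.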

also, i tried not to read too much of the actual text in the messages, but it was what i was extracting, and some of it was really ick, way worse than the garbage you get from mastodon loli posters

i feel like maybe i should have made more of a fuss about what brain rotting garbage was flowing through my wires on their account and into my poor, PTSD brain

I'm glad that you asked: https://gitlab.com/soapbox-pub/eclipse/-/snippets/4782163

The most important one is this:

Table "public.bsky_blocks"
 Column |     Type      | Collation | Nullable | Default
--------+---------------+-----------+----------+---------
 cid    | character(59) |           | not null |
 bytes  | bytea         |           | not null |

They store everything as CBOR-encoded binary values, content-addressable by their IPFS (actually IPLD) CID.
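For anyone wondering what a content-addressed row in that table looks like, here is a minimal sketch. It assumes the usual atproto choice of CIDv1 with the dag-cbor codec (0x71), a sha2-256 multihash, and base32 multibase. That combination always produces a 59-character string starting with `bafyrei`, which would fit the `character(59)` column. The encoder is a hand-rolled subset with no floats, byte strings, or CID links. `encode_dagcbor`, `block_cid`, and the sample record are hypothetical names for illustration.

```python
import base64
import hashlib
import struct


def _head(major: int, n: int) -> bytes:
    # CBOR initial byte; DAG-CBOR requires the shortest form, which this picks
    if n < 24:
        return bytes([(major << 5) | n])
    for limit, extra, fmt in ((0x100, 24, ">B"), (0x10000, 25, ">H"), (0x100000000, 26, ">I")):
        if n < limit:
            return bytes([(major << 5) | extra]) + struct.pack(fmt, n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def encode_dagcbor(value) -> bytes:
    """Deterministic CBOR for a small subset: null, bool, int, str, list, dict."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode_dagcbor(v) for v in value)
    if isinstance(value, dict):
        # DAG-CBOR sorts string map keys by encoded length, then bytewise,
        # so the same record always yields the same bytes (and the same CID)
        keys = sorted(value, key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")))
        body = b"".join(encode_dagcbor(k) + encode_dagcbor(value[k]) for k in keys)
        return _head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


def block_cid(block: bytes) -> str:
    digest = hashlib.sha256(block).digest()
    # CIDv1 = version 0x01, codec dag-cbor 0x71, multihash sha2-256 (0x12) with
    # digest length 0x20; each of these fits in a single varint byte
    raw = bytes([0x01, 0x71, 0x12, 0x20]) + digest
    # multibase prefix "b" = lowercase RFC 4648 base32 without padding
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


record = {"text": "hello", "createdAt": "2024-01-01T00:00:00Z"}
block = encode_dagcbor(record)
cid = block_cid(block)
print(len(cid), cid[:7])  # 59 bafyrei
```

The key sorting is what makes content addressing work. Two writers encoding the same record must produce byte-identical blocks, or the same post would end up with two different CIDs.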

However, they still want to be able to _query_ this data...

So they end up storing everything in bytes AND in plaintext in the database.
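Here is a minimal sketch of that dual-storage pattern. It uses SQLite instead of Postgres so it runs standalone. The opaque blob goes into a key-value `bsky_blocks` table with the same two columns as the schema above, and the text is copied again into a hypothetical `bsky_posts` table so it can actually be searched. To keep the sketch self-contained, JSON with sorted keys stands in for DAG-CBOR, and a hex SHA-256 digest stands in for the real CID. `bsky_posts`, `store_post`, and the URIs are made up for illustration.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bsky_blocks (
    cid   TEXT PRIMARY KEY,
    bytes BLOB NOT NULL
);
-- hypothetical index table: the same post again, decoded into queryable columns
CREATE TABLE bsky_posts (
    uri  TEXT PRIMARY KEY,
    cid  TEXT NOT NULL REFERENCES bsky_blocks(cid),
    text TEXT NOT NULL
);
""")


def store_post(uri: str, record: dict) -> str:
    # Stand-in encoding: canonical JSON instead of DAG-CBOR
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Stand-in content address: hex sha256 instead of a real CID
    cid = hashlib.sha256(blob).hexdigest()
    # Copy 1: opaque bytes, only retrievable by their hash
    conn.execute("INSERT OR IGNORE INTO bsky_blocks (cid, bytes) VALUES (?, ?)", (cid, blob))
    # Copy 2: the same text again, in plain form, so it can be filtered
    conn.execute("INSERT INTO bsky_posts (uri, cid, text) VALUES (?, ?, ?)",
                 (uri, cid, record["text"]))
    return cid


store_post("post-1", {"text": "hello world", "langs": ["en"]})
store_post("post-2", {"text": "cbor all the way down", "langs": ["en"]})

# Searching only works against the duplicated, decoded column
rows = conn.execute("SELECT uri FROM bsky_posts WHERE text LIKE ?", ("%cbor%",)).fetchall()
print(rows)  # [('post-2',)]
```

The blob table can only answer "give me the bytes for this hash". Any question about what the bytes actually say needs the second, decoded copy.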

Not only that, they also store binary _diffs_ of every single user action, making everything take up 2x more space.

Amazing engineering prowess.

So they use Postgres, and then they have a table that is just a key-value store for all the binary data, and then duplicate everything in other tables for indexing? What are "blocks" anyway?

you can laugh. i had to write code to work with that tire fire

i had the option to otherwise work with weferium fartcaster... i shoved that on my junior colleague who is ignorant and young enough to not understand why i would not want to touch that shit... or the fucking javascript bullshit to prove our aggregator made things appear on their shitcoin database