What is their format?
Discussion
this is BS right? CBOR
but the events include the pictures, not sure, maybe videos too
i had an extensive exploration of this for Arweave and i told them if they were to upload the data they'd blow their whole network up
here we see i was right
i already saw Steemit blow up from just 100k+, past the point where you could sync it on an average pc, 7 years ago, and that was on a blockchain haha
the tool i built for arweave was gonna chew some serious bandwidth as well i guess
costs quite a bit of money to hammer through more than a terabyte of bandwidth a month, i'm not doing it for funsies
oh yeah, this is the thing, it wasn't so much the data format, as the complexity of it, even though it's binary, they have a gazillion little nodes in the structure that take up space even when empty, and most of them are empty, and mean nothing, a bit like the content of their words and thoughts
also, i tried not to read too much of the actual text in the messages, but it was what i was extracting and some of it was really ick, way worse than the garbage you get on mastodon loli posters
i feel like maybe i should have made more of a fuss about what brain rotting garbage was flowing through my wires on their account and into my poor, PTSD brain
I'm glad that you asked: https://gitlab.com/soapbox-pub/eclipse/-/snippets/4782163
The most important one is this:
         Table "public.bsky_blocks"
 Column |     Type      | Collation | Nullable | Default
--------+---------------+-----------+----------+---------
 cid    | character(59) |           | not null |
 bytes  | bytea         |           | not null |
They store everything as CBOR-encoded binary values, content-addressable by their IPFS (actually IPLD) CID.
However, they still want to be able to _query_ this data...
So they end up storing everything in bytes AND in plaintext in the database.
Not only that, they also store binary _diffs_ of every single user action, making everything take up 2x more space.
Amazing engineering prowess.
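For anyone wondering what "content-addressable by CID" means in practice, here is a rough sketch, not their actual code. It assumes CIDv1 with sha2-256 and the dag-cbor codec, base32-encoded, which happens to come out at exactly 59 characters and would explain the character(59) column. `dag_cbor_cid` and `BlockStore` are made-up names for illustration, and the dict stands in for the two-column table.

```python
import base64
import hashlib


def dag_cbor_cid(block: bytes) -> str:
    """Build a CIDv1 string for a block (assumes sha2-256 + dag-cbor)."""
    digest = hashlib.sha256(block).digest()
    # 0x01 = CID version 1, 0x71 = dag-cbor codec,
    # 0x12 = sha2-256 multihash code, 0x20 = digest length (32 bytes)
    raw = bytes([0x01, 0x71, 0x12, 0x20]) + digest
    b32 = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase base32


class BlockStore:
    """Hypothetical in-memory stand-in for the (cid, bytes) table."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        # The key is derived from the content, so identical bytes
        # always land under the same key.
        cid = dag_cbor_cid(block)
        self.blocks[cid] = block
        return cid

    def get(self, cid: str) -> bytes:
        return self.blocks[cid]


store = BlockStore()
# Hand-encoded CBOR for the map {"text": "hi"}
payload = bytes.fromhex("a1647465787462" "6869")
cid = store.put(payload)
print(cid, len(cid))
```

The store can only answer "give me the bytes for this CID". Anything like "all posts by this user" needs the decoded fields in ordinary columns somewhere else, which is where the duplication comes from.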
So they are on Postgres, and they have a table that is just a key-value store for all the binary data, and then they duplicate everything in other tables for indexing? What are "blocks" anyway?
you can laugh. i had to write code to work with that tire fire

i had the option to otherwise work with weferium fartcaster... i shoved that on my junior colleague who is ignorant and young enough to not understand why i would not want to touch that shit... or the fucking javascript bullshit to prove our aggregator made things appear on their shitcoin database