So far I've used 262 GB for all their likes and shitposts covering roughly 20 days.
Mine is in Nostr event format. I think their repository format is inefficient.
Bruh moment. They thought about scalability but forwent efficiency.
What is their format?
this is BS, right? it's CBOR
but the events include the pictures (not sure, maybe videos too)
i did an extensive exploration of this for Arweave, and i told them that if they uploaded the data they'd blow up their whole network
here we see i was right
i already saw Steemit blow up 7 years ago, from just 100k+, past the point where you could sync it on an average pc, and that was on a blockchain haha
the tool i built for Arweave was gonna chew through some serious bandwidth as well, i guess
it costs quite a bit of money to hammer through more than a terabyte of bandwidth a month, and i'm not doing it for funsies
oh yeah, this is the thing: it wasn't so much the data format as its complexity. even though it's binary, they have a gazillion little nodes in the structure that take up space even when empty, and most of them are empty and mean nothing, a bit like the content of their words and thoughts
also, i tried not to read too much of the actual text in the messages, but it was what i was extracting, and some of it was really ick, way worse than the garbage you get from mastodon loli posters
i feel like maybe i should have made more of a fuss about the brain-rotting garbage that was flowing through my wires on their account and into my poor PTSD brain
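To make the "empty nodes still take up space" point concrete, here is a rough sketch: a minimal CBOR encoder (RFC 8949 major types, with DAG-CBOR-style key ordering) and a made-up post record encoded with and without empty optional fields. The record shape and field names are hypothetical and illustrative only, not their actual schema, and the encoder only covers the handful of types used here.

```python
import struct


def _head(major: int, arg: int) -> bytes:
    """CBOR initial byte for a major type plus its argument (RFC 8949, section 3)."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def encode(value) -> bytes:
    """Minimal CBOR encoder for None, bool, int, bytes, str, list and str-keyed dict."""
    if value is None:
        return b"\xf6"  # null
    if value is True:  # bool checks come first because bool is a subclass of int
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # DAG-CBOR-style key order: shorter keys first, then bytewise.
        keys = sorted(value, key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")))
        return _head(5, len(value)) + b"".join(encode(k) + encode(value[k]) for k in keys)
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical post record; field names and values are made up for illustration.
lean = {"text": "gm", "createdAt": "1970-01-01T00:00:00Z"}
padded = dict(lean, embed={}, facets=[], labels={}, langs=[], reply=None, tags=[])

print(len(encode(lean)), len(encode(padded)))
```

Each empty field still pays for its key string plus a one-byte empty container or null, so in this toy record the six empty fields take the encoding from 40 to 83 bytes.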
I'm glad that you asked: https://gitlab.com/soapbox-pub/eclipse/-/snippets/4782163
The most important one is this:
Table "public.bsky_blocks"
Column | Type | Collation | Nullable | Default
--------+---------------+-----------+----------+---------
cid | character(59) | | not null |
bytes | bytea | | not null |
They store everything as CBOR-encoded binary values, content-addressable by their IPFS (actually IPLD) CID.
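For reference, a CID like the ones in that `cid` column can be computed roughly like this. The sketch assumes CIDv1 with the dag-cbor codec (0x71), a SHA-256 multihash and base32 multibase; that combination produces exactly 59 characters, which is consistent with `character(59)`, but the codec choice is still an assumption. The sample block and the in-memory `blocks` dict are made-up stand-ins.

```python
import base64
import hashlib

CID_V1 = 0x01     # CID version
DAG_CBOR = 0x71   # multicodec code for dag-cbor
SHA2_256 = 0x12   # multihash code for sha2-256


def cid_for_block(block: bytes) -> str:
    """Return a base32 CIDv1 (dag-cbor, sha2-256) for an encoded block."""
    digest = hashlib.sha256(block).digest()
    # Every code here is below 0x80, so each unsigned varint is a single byte.
    raw = bytes([CID_V1, DAG_CBOR, SHA2_256, len(digest)]) + digest
    # Multibase prefix "b" = RFC 4648 base32, lowercase, no padding.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


# Hand-encoded CBOR for {"text": "hi"} (made-up example block).
block = bytes.fromhex("a16474657874626869")

# Toy stand-in for the bsky_blocks table: CID -> raw CBOR bytes.
blocks = {}
cid = cid_for_block(block)
blocks[cid] = block
print(cid, len(cid))
```

Because the key is derived from the bytes, identical blocks collapse into one row, but the value is an opaque blob as far as SQL is concerned.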
However, they still want to be able to _query_ this data...
So they end up storing everything in bytes AND in plaintext in the database.
Not only that, they also store binary _diffs_ of every single user action, making everything take up twice the space.
Amazing engineering prowess.
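A rough sketch of the layout being described (one content-addressed block table, plus decoded copies kept purely so the data can be queried), using an in-memory SQLite database instead of Postgres so it runs anywhere. Only `bsky_blocks` comes from the schema above; `bsky_posts`, its columns, the URI and the CID string are hypothetical placeholders, and the diff storage is left out.

```python
import sqlite3

# Hypothetical schema: bsky_blocks mirrors the table above; bsky_posts and its
# columns are made up to stand in for the plaintext copies used for queries.
SCHEMA = """
CREATE TABLE bsky_blocks (cid TEXT PRIMARY KEY, bytes BLOB NOT NULL);
CREATE TABLE bsky_posts (
    uri  TEXT PRIMARY KEY,
    cid  TEXT NOT NULL REFERENCES bsky_blocks (cid),
    text TEXT NOT NULL
);
CREATE INDEX bsky_posts_text_idx ON bsky_posts (text);
"""


def store_post(db: sqlite3.Connection, uri: str, cid: str, block: bytes, text: str) -> None:
    """Write the opaque CBOR block once per CID, then a decoded, indexable copy."""
    db.execute("INSERT OR IGNORE INTO bsky_blocks (cid, bytes) VALUES (?, ?)", (cid, block))
    db.execute("INSERT INTO bsky_posts (uri, cid, text) VALUES (?, ?, ?)", (uri, cid, text))


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Placeholder identifiers; the block is hand-encoded CBOR for {"text": "hi"}.
block = bytes.fromhex("a16474657874626869")
store_post(db, "at://did:example:alice/post/1", "bafyrei-placeholder", block, "hi")

# SQL can't look inside the blob, so the query runs against the duplicated text.
rows = db.execute("SELECT uri FROM bsky_posts WHERE text = ?", ("hi",)).fetchall()
print(rows)
```

The same text ends up on disk twice (once inside the blob, once in the indexed column), which is the duplication being complained about.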
So they use Postgres, and then they have a table that is just a key-value store for all the binary data, and then they duplicate everything in other tables for indexing? What are "blocks" anyway?
you can laugh. i had to write code to work with that tire fire

my other options were weferium fartcaster (i shoved that onto my junior colleague, who is ignorant and young enough not to understand why i wouldn't want to touch that shit) or the fucking javascript bullshit to prove our aggregator made things appear on their shitcoin database