Expiring notes likely won't save that much space. Lists use the most space (mute lists, contact lists, etc.).


Discussion

Any other observations or insights after running this thing for a while?

The past 30 days are the hottest data for kind 1. The drop-off rate for likes, replies, and zaps is huge; rarely does much happen after a week.

Unless you're doing search, you likely don't need to keep events forever at present. There isn't a great way to discover old events unless you search or someone posts or replies to one (usually from a user profile timeline).

Long-form content likely needs a longer lifespan. Maybe creators will repost or send a small edit to keep it in relay DBs. Creators will likely pay to keep it available.

If you replace events like kinds 0/3/10002 or the 30000 range, you'll significantly reduce data. I'll get the exact stat, but kind 3 data is 5-10x all kind 1 data, which is the second highest (and that's while keeping the old kind 3 events too).
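A minimal sketch of what "replacing" means here, assuming the usual NIP-01 rules (kinds 0/3/10002 replace on pubkey+kind, the 30000 range also keys on the first "d" tag). The `Event` struct and `store` helper are hypothetical, just to illustrate keeping only the newest copy:

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct Event {
    pubkey: String,
    kind: u32,
    created_at: u64,
    tags: Vec<Vec<String>>,
}

// Replaceable key: (pubkey, kind) plus the first "d" tag for the 30000 range.
fn replace_key(ev: &Event) -> Option<(String, u32, String)> {
    match ev.kind {
        0 | 3 | 10002 => Some((ev.pubkey.clone(), ev.kind, String::new())),
        30000..=39999 => {
            let d = ev
                .tags
                .iter()
                .find(|t| t.first().map(|s| s.as_str()) == Some("d"))
                .and_then(|t| t.get(1).cloned())
                .unwrap_or_default();
            Some((ev.pubkey.clone(), ev.kind, d))
        }
        _ => None, // regular events are stored as-is, not shown here
    }
}

fn store(db: &mut HashMap<(String, u32, String), Event>, ev: Event) {
    if let Some(key) = replace_key(&ev) {
        match db.get(&key) {
            Some(old) if old.created_at >= ev.created_at => {} // already have newer
            _ => {
                db.insert(key, ev); // older copy is overwritten, saving space
            }
        }
    }
}
```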

Spam makes up 90% of event volume when unfiltered.

So far, most active users have between 4,000 and 15,000 events total. Often that's less than 10MB each.

There are around 8-10 top relays, and then a whole heap of mid-tier relays that have a lot of events but aren't syncing from other relays.

That’s some general stuff I’ve seen anyway.

Good insights, thanks #[6]

This is solely the raw JSON data. Once you add other columns, extract out to other tables, and add indexes... my Postgres DB is at 130GB. Very little spam in there.

Keep in mind I haven't purged old kind 3 events, because I generate change-in-followers-over-time graphs, and I sometimes need to regenerate that data while I improve it.

Oh, and I don't persist kinds in the 20k range. I suspect the kind 5 delete counts are high due to spam as well, and likely some historic kind 42 channel spam too.
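For context, the 20k range (20000-29999) is the ephemeral range in NIP-01, so skipping persistence is just a kind check; a trivial sketch:

```rust
// Ephemeral events (kinds 20000-29999) can be relayed to subscribers
// and dropped instead of written to the database.
fn should_persist(kind: u32) -> bool {
    !(20_000..30_000).contains(&kind)
}
```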

What a great contribution. And with data 👏🏼

Interesting. Lists could be optimised in the backend by indexing the pubkeys and storing only lists of indexes.

I do this for tags. The issue is that relays serve JSON, and unless you store the JSON in a ready-to-serve format, generating JSON events on demand is very computationally expensive because of all the refs/joins.
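A rough sketch of the tradeoff being described, with a hypothetical in-memory interner standing in for the real tables: pubkeys are stored once as integer ids, but every read has to expand those ids back into full "p" tags before the JSON event can be served.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct PubkeyTable {
    ids: HashMap<String, u32>,
    keys: Vec<String>,
}

impl PubkeyTable {
    // Write path: store each pubkey once and reference it by id.
    fn intern(&mut self, pk: &str) -> u32 {
        if let Some(&id) = self.ids.get(pk) {
            return id;
        }
        let id = self.keys.len() as u32;
        self.keys.push(pk.to_string());
        self.ids.insert(pk.to_string(), id);
        id
    }

    // Read path: every request has to expand ids back into full "p" tags
    // before JSON can be generated -- the refs/joins cost mentioned above.
    fn to_tags(&self, ids: &[u32]) -> Vec<Vec<String>> {
        ids.iter()
            .map(|&id| vec!["p".to_string(), self.keys[id as usize].clone()])
            .collect()
    }
}
```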

I understand.

In my own relayer, soon to be open source, I'm storing things as raw JSON. I'm thinking of compressing it and storing it as binary (the same format as the data to be signed). Maybe that will save some percentage.
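A sketch of that idea, assuming the serde_json and flate2 crates: serialize the event in the NIP-01 signing form `[0, pubkey, created_at, kind, tags, content]` and deflate it before writing to disk.

```rust
use flate2::{write::ZlibEncoder, Compression};
use serde_json::json;
use std::io::Write;

fn compress_event(
    pubkey: &str,
    created_at: u64,
    kind: u32,
    tags: &[Vec<String>],
    content: &str,
) -> std::io::Result<Vec<u8>> {
    // Same array layout that the event id and signature are computed over.
    let canonical = json!([0, pubkey, created_at, kind, tags, content]).to_string();

    // Zlib-compress the canonical bytes for storage.
    let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
    enc.write_all(canonical.as_bytes())?;
    enc.finish()
}
```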

I think strfry does this with FlatBuffers?

Whichever post gets the most 🤙🏻 will get to stay forever; shitposting will go to the trash 🚮 like a self-cleaning 🧼 process! 💭 That's an idea 💡.

FlatBuffers is a good choice, as long as it has a strict schema (to avoid storing metadata, like BSON/JSON do).

Although my relayer is built in Rust, my top priority is to launch it as soon as possible. Once it's up and running, I can focus on optimizing it further. The main requirements for the relayer are that the signature matches the reconstructed event and that the content is compressed to minimize bandwidth usage.
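One way to check that requirement, sketched under the assumption of the sha2 and hex crates: the reconstructed event is only valid if the sha256 of its canonical serialization still equals the original event id, since that id is what the schnorr signature was produced over.

```rust
use sha2::{Digest, Sha256};

// Event id = sha256 of the canonical [0, pubkey, created_at, kind, tags, content] JSON.
fn recompute_id(canonical_json: &str) -> String {
    hex::encode(Sha256::digest(canonical_json.as_bytes()))
}

// If the id survives the decompress/reconstruct round trip, the existing
// signature remains valid; schnorr verification against this id can then be
// done with a secp256k1 library.
fn round_trip_ok(original_id: &str, reconstructed_canonical_json: &str) -> bool {
    recompute_id(reconstructed_canonical_json) == original_id
}
```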