any other observations or insights after running this thing for a while?
Discussion
The past 30 days are the hottest data for kind 1. The drop-off in likes, replies and zaps is steep; rarely does much happen after a week.
Unless you’re doing search, at present you likely don’t need a relay to keep events forever. There isn’t a great way to discover old events unless you search, or someone posts or replies to one (usually from a user’s profile timeline).
Long-form content likely needs a longer lifespan. Maybe creators will repost or push a small edit to keep it in relay DBs; most likely, creators will pay to keep it available.
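To make the retention idea above concrete, here’s a rough Python sketch of a kind-based purge policy. The kinds and day counts are illustrative assumptions drawn from the observations above, not any real relay’s config:

```python
import time

# Rough per-kind retention windows in days (illustrative assumptions only).
RETENTION_DAYS = {
    1: 30,       # short text notes: most of the activity happens in the first week or so
    7: 30,       # reactions
    9735: 30,    # zap receipts
    30023: 365,  # long-form content: needs a much longer lifespan
}
DEFAULT_DAYS = 90

def should_purge(event: dict, now: float | None = None) -> bool:
    """Return True if the event has outlived its retention window."""
    now = now if now is not None else time.time()
    days = RETENTION_DAYS.get(event["kind"], DEFAULT_DAYS)
    return (now - event["created_at"]) > days * 86400
```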
If you actually replace replaceable events like kind 0/3/10002 or the 30000 range, rather than keeping every version, you’ll significantly reduce data. I’ll get the exact stat, but kind 3 data is 5-10x all kind 1 data, which is the second highest (and that’s with the old kind 3 events kept too).
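For anyone unfamiliar with how that replacement works, here’s a minimal Python sketch of collapsing replaceable events down to the newest version per key. It follows the standard NIP-01 rules (kinds 0, 3 and 10000-19999 replace by pubkey+kind; 30000-39999 also key on the `d` tag); the function names are mine:

```python
def replaceable_key(event: dict):
    """Return the replacement key for an event, or None if it isn't replaceable."""
    kind = event["kind"]
    if kind in (0, 3) or 10000 <= kind < 20000:
        return (event["pubkey"], kind)
    if 30000 <= kind < 40000:
        # parameterized replaceable events also key on the first "d" tag value
        d = next((t[1] for t in event["tags"] if t and t[0] == "d" and len(t) > 1), "")
        return (event["pubkey"], kind, d)
    return None

def keep_latest(events):
    """Collapse replaceable events, keeping only the newest per key."""
    latest, passthrough = {}, []
    for ev in events:
        key = replaceable_key(ev)
        if key is None:
            passthrough.append(ev)
        elif key not in latest or ev["created_at"] > latest[key]["created_at"]:
            latest[key] = ev
    return passthrough + list(latest.values())
```

Since kind 3 is a full contact list republished on every follow change, keeping every superseded copy is what inflates that 5-10x figure.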
Spam makes up 90% of event volume when unfiltered.
So far most active users have between 4,000 and 15,000 events total. Often that’s less than 10MB each.
There are around 8-10 top relays, and then a whole heap of mid-tier relays that have a lot of events but aren’t syncing from other relays.
That’s some general stuff I’ve seen anyway.
Good insights, thanks #[6]
This is solely the raw JSON data. Once you add other columns, extract things out into other tables and add indexes… my Postgres DB is at 130GB. Very little spam in there.
Keep in mind I haven’t purged old kind 3 events, because I generate followers-over-time graphs… and I sometimes need to regenerate that data while I improve it.
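To illustrate the “other columns, other tables and indexes” point above, here’s roughly the kind of extraction that turns raw JSON into something much larger than the payloads alone. The column names are my own guesses, not the actual schema being described:

```python
import json

def extract_columns(raw: str) -> dict:
    """Flatten a raw Nostr event (JSON string) into indexable columns
    alongside the untouched payload."""
    ev = json.loads(raw)
    return {
        "id": ev["id"],
        "pubkey": ev["pubkey"],
        "kind": ev["kind"],
        "created_at": ev["created_at"],
        "raw_json": raw,  # original payload kept verbatim
        # tag references usually get split into their own table for lookups
        "e_tags": [t[1] for t in ev["tags"] if t and t[0] == "e" and len(t) > 1],
        "p_tags": [t[1] for t in ev["tags"] if t and t[0] == "p" and len(t) > 1],
    }
```

Every extracted column and index duplicates part of the payload, which is how the Postgres footprint ends up well above the raw JSON size.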

Oh, and I don’t persist kinds in the 20k range. I suspect the kind 5 delete counts are high due to spam as well, plus likely some historic kind 42 channel spam.
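On the 20k range: those are the ephemeral kinds, which relays pass along but aren’t expected to store, so skipping them at the write path is cheap. A tiny sketch, where `store` and its `save` method are assumed rather than a real library:

```python
def is_ephemeral(kind: int) -> bool:
    """Kinds 20000-29999 are ephemeral: relay them, don't store them."""
    return 20000 <= kind < 30000

def maybe_persist(event: dict, store) -> None:
    """Hypothetical write path that drops ephemeral kinds before they hit disk."""
    if not is_ephemeral(event["kind"]):
        store.save(event)
```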
What a great contribution. And with data 👏🏼