damus relay is about 243GB in size now 👀

Discussion

wow

just getting started

Do you anticipate needing to implement expiration policies?

Going to have to do something soon. Maybe just dump everything to a JSON file and start over? Hopefully I’ll have the purple relay running soon for permanent storage.
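For anyone curious what “dump everything to a JSON file and start over” could look like in practice, here is a minimal sketch. It assumes a hypothetical Postgres relay database with an `events` table holding each raw event as JSONB; actual relay schemas differ.

```python
# Sketch: export all events as JSON Lines, then clear the table.
# Assumes a hypothetical Postgres table `events(event jsonb)`; adjust to your relay's schema.
import json
import psycopg2

conn = psycopg2.connect("dbname=relay user=relay")  # placeholder DSN

with conn, conn.cursor(name="dump") as cur, open("events_dump.jsonl", "w") as out:
    cur.itersize = 10_000                      # stream in batches, don't load 243GB at once
    cur.execute("SELECT event FROM events")
    for (event,) in cur:
        out.write(json.dumps(event) + "\n")    # one event per line (JSON Lines)

with conn, conn.cursor() as cur:
    cur.execute("TRUNCATE events")             # "start over" with an empty table

conn.close()
```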

Seems only reasonable. We can’t expect you to carry that indefinitely.

#[3]​, would there be a reasonably straightforward way for individuals to spin up a nostream relay and transfer/offload their historical notes from larger relays like Damus?

The imminent expiration of notes from large relays might serve as a good incentive to run one’s own.

7-day retention period for all data.

If you want a long term solution:

1. Host your own relay, and you become responsible for its longevity.

2. Pay a relay to host it for you. Subscription or service model.

3. Ads.

4. #Footstr

1 and 2 are acceptable. 3 is heresy. 4 is ick.

Totally doable.

We are working on making that happen :)

Hit ya boy up when you’re ready. We’ll get that 3-click deploy going.

there are dozens of free and open source relays, no need to wait for anyone or anything.

no shade to nostream - just clarifying this is a thing a motivated individual can do today.

Motivation is relative to skill level. I’m talking about normie users. Not everyone can work a CLI.

I think the key here is "straightforward".

Imagine a site where you type the address of one relay and the address of another and it just copies your events for you.

Of course any sufficiently motivated or tech savvy individual could do it. It’s not up to Nostream to facilitate this but we could help.

totally agree - that sort of frictionless site would be great.
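Here is a rough sketch of what that copier could look like, speaking raw NIP-01 messages over websockets. The relay URLs and pubkey are placeholders, and a real tool would need pagination (relays cap REQ results) and rate-limit handling; this is only the core idea.

```python
# Sketch of "type two relay addresses and copy your events":
# fetch your events from one relay with REQ, republish them to another with EVENT.
# Requires `pip install websockets`; URLs and pubkey below are placeholders.
import asyncio
import json
import websockets

SOURCE = "wss://relay.damus.io"        # relay to copy from
TARGET = "wss://your.relay.example"    # hypothetical relay to copy to
PUBKEY = "<your hex pubkey>"           # whose events to migrate

async def fetch_events():
    events = []
    async with websockets.connect(SOURCE) as ws:
        await ws.send(json.dumps(["REQ", "migrate", {"authors": [PUBKEY]}]))
        while True:
            msg = json.loads(await ws.recv())
            if msg[0] == "EVENT" and msg[1] == "migrate":
                events.append(msg[2])
            elif msg[0] == "EOSE":      # end of stored events for this subscription
                break
    return events

async def publish_events(events):
    async with websockets.connect(TARGET) as ws:
        for ev in events:
            await ws.send(json.dumps(["EVENT", ev]))
            print(await ws.recv())      # relay's ["OK", ...] (or NOTICE) response

async def main():
    events = await fetch_events()
    print(f"fetched {len(events)} events")
    await publish_events(events)

asyncio.run(main())
```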

“Dump everything to a JSON file & start over”…

I think storing only a month or two of data, with older records pruned, would probably keep it below 100GB. It’s tough operating a free relay and managing the resource costs.
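For relay operators considering that kind of policy, a minimal sketch of a time-based purge, assuming a hypothetical Postgres `events(kind, created_at, …)` table:

```python
# Sketch of a retention policy: delete regular events older than ~two months
# while keeping replaceable events (profiles, contact lists, relay lists).
# Table and column names are illustrative, not any relay's actual schema.
import time
import psycopg2

CUTOFF = int(time.time()) - 60 * 24 * 3600   # ~two months ago, unix seconds

conn = psycopg2.connect("dbname=relay user=relay")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(
        """
        DELETE FROM events
        WHERE created_at < %s
          AND kind NOT IN (0, 3, 10002)        -- keep profiles / contacts / relay lists
          AND kind NOT BETWEEN 30000 AND 39999 -- keep parameterized replaceable events
        """,
        (CUTOFF,),
    )
    print(f"pruned {cur.rowcount} events")
conn.close()
```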

Big blocker! 😛

Seriously?

How long did it take to reach 243GB?

Thank you for the amazing support 🫂👏

Wen relay S3 bucket offload

Still just fits in RAM :P

wow...

Daaaaaam us

Naahhhstr

How does it feel? How do you manage it? I think you should talk or write about that someday. I think it's the largest, right?

Please select as appropriate...

🔲Omg that's great!

🔲Shit, hope you're ok:(

(Too silly to know)

Thanks

How much of it is spam? 🫣

DAMUS FTW 💜🫂💜

“Is that a Damus relay in your pocket…..”

An option for the client to send an expiration time with a note: by default xxx months, after which the relay deletes it. Or, even better, the user can set a shorter time (-/-/-).
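For reference, NIP-40 already defines roughly this: an `expiration` tag the client attaches and relays may honor. A small illustrative sketch follows; the event skeleton and helper are hypothetical, not any client’s or relay’s actual code.

```python
# Sketch of the expiration idea per NIP-40: the client adds an "expiration" tag
# (unix timestamp in seconds) and the relay drops the event after that time.
# The event below is a skeleton only; id and sig would be computed and signed normally.
import time

expires_at = int(time.time()) + 90 * 24 * 3600   # e.g. 90 days from now

note = {
    "kind": 1,
    "created_at": int(time.time()),
    "tags": [["expiration", str(expires_at)]],   # NIP-40 expiration tag
    "content": "this note self-destructs in 90 days",
}

def is_expired(event: dict, now: int | None = None) -> bool:
    """Relay-side check: True if the event carries a past expiration tag."""
    now = now or int(time.time())
    for tag in event.get("tags", []):
        if tag and tag[0] == "expiration":
            return int(tag[1]) <= now
    return False
```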

Do we have a graph? X axis → time in days, Y axis → size in GB

What db does it use?

Have you pruned anything yet?

How much of it is reactions?

nostr: where you can’t delete anything, but on a long enough timescale…

Hmm, that’s an issue. Either expand drives and become more centralised as the user base rises (but relays will close due to cost), or implement an action to remove posts older than a month? Perhaps posts with no activity, likes, or shares could disappear, although is that erasing history?

If it’s already over 200GB, it seems 1TB would be hit pretty quickly.

Public relays will likely dump old data. This is what the damus relay will likely do.

Working on the purple relay now

Give that power to the user.

nostr:note1ny7z66uy4c7ajkmg2wh0z8uf05857adqc7jq4rvetm8cwtg5gcyszr35kc

Expiring notes will likely not save that much space. Lists use the most space (mute lists, contact lists, etc.).

any other observations or insights after running this thing for a while?

The past 30 days are the hottest data for kind 1. The drop-off rate for likes, replies, and zaps is huge; rarely much happens after a week.

Unless you’re doing search, at present you likely wouldn’t need a relay to keep events forever. There isn’t a great way to discover old events unless you search or someone posts or replies to one (usually from a user profile timeline).

Long-form content likely needs a longer lifespan. Maybe creators will repost or send a tweaked edit to keep it in relay DBs. Likely creators will pay to keep it available.

If you actually replace replaceable events like kind 0/3/10002 and the 30000 range instead of keeping old versions, you’ll significantly reduce data (see the pruning sketch after these observations). I’ll get the exact stat, but kind 3 data is 5-10x all kind 1 data, which is the second highest (and that’s with the old kind 3 events kept).

Spam makes up 90% of event volume, when unfiltered.

So far most active users have between 4,000 and 15,000 events total. Often that’s less than 10MB each.

There are around 8-10 top relays, and then a whole heap of mid-tier relays that have a lot of events but aren’t syncing from other relays.

That’s some general stuff I’ve seen anyway.
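On the replaceable-events point above, here is a hedged sketch of what pruning superseded versions could look like, against a hypothetical Postgres `events(id, pubkey, kind, created_at, …)` table. Parameterized replaceable kinds (30000-39999) would also need the `d` tag in the partition, which is omitted here.

```python
# Sketch: drop superseded versions of replaceable events (kinds 0, 3, 10002),
# keeping only the newest one per (pubkey, kind). Assumes a hypothetical
# Postgres table `events(id text, pubkey text, kind int, created_at bigint, ...)`.
import psycopg2

PRUNE_SQL = """
DELETE FROM events e
USING (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY pubkey, kind
               ORDER BY created_at DESC
           ) AS rn
    FROM events
    WHERE kind IN (0, 3, 10002)
) ranked
WHERE e.id = ranked.id
  AND ranked.rn > 1;          -- everything but the newest version per pubkey+kind
"""

conn = psycopg2.connect("dbname=relay user=relay")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(PRUNE_SQL)
    print(f"removed {cur.rowcount} superseded replaceable events")
conn.close()
```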

Good insights, thanks #[6]

This is solely the raw JSON data. When you add other columns, extract out to other tables, and add indexes… my Postgres DB is at 130GB. Very little spam in there.

Keep in mind I haven’t purged old kind 3, because I generate change-in-followers-over-time graphs… but I sometimes need to re-generate the data while I improve it.

Oh, and I don’t persist kinds in the 20k range. I suspect the kind 5 delete counts are high due to spam as well. Likely some historic kind 42 channel spam too.

What a great contribution. And with data 👏🏼

Interesting. Lists could be optimised in the backend by indexing the pks and storing only the lists of indexes.

I do this for tags. The issue is relays serve JSON, and unless you store the JSON in a ready-to-serve format, generating JSON events on demand is computationally expensive because of all the refs/joins.
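A toy illustration of the pubkey-interning idea and of why serving JSON afterwards costs a join; all table and column names here are made up:

```python
# Sketch: map each 32-byte hex pubkey to a small integer once, and store
# contact lists as rows of those integers instead of repeated hex strings.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE pubkeys (id INTEGER PRIMARY KEY, hex TEXT UNIQUE);
    CREATE TABLE contact_lists (owner_id INTEGER, contact_id INTEGER);
""")

def intern(hex_pubkey: str) -> int:
    """Return the integer id for a pubkey, inserting it on first sight."""
    db.execute("INSERT OR IGNORE INTO pubkeys (hex) VALUES (?)", (hex_pubkey,))
    (pk_id,) = db.execute("SELECT id FROM pubkeys WHERE hex = ?", (hex_pubkey,)).fetchone()
    return pk_id

def store_contact_list(owner_hex: str, contact_hexes: list[str]) -> None:
    owner = intern(owner_hex)
    db.executemany(
        "INSERT INTO contact_lists (owner_id, contact_id) VALUES (?, ?)",
        [(owner, intern(c)) for c in contact_hexes],
    )

def load_contact_list(owner_hex: str) -> list[str]:
    """Rebuilding the original hex list needs a join -- this is the serving cost."""
    rows = db.execute(
        """SELECT p.hex FROM contact_lists c
           JOIN pubkeys p ON p.id = c.contact_id
           WHERE c.owner_id = (SELECT id FROM pubkeys WHERE hex = ?)""",
        (owner_hex,),
    ).fetchall()
    return [h for (h,) in rows]
```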

I understand.

In my own relayer, soon to be open source, I’m storing things as raw JSON. I’m thinking of compressing it and storing it as binary (the same format as the data to be signed). Maybe that will save some percentage.

I think strfry does this with flatbuffers?

Whichever post gets the most 🤙🏻 will get to stay forever; shitposting will go to the trash 🚮, like a self-cleaning 🧼 process! 💭 That’s an idea 💡.

Flatbuffers is a good choice, as long as it has a strict schema (to avoid storing metadata, like BSON/JSON).

Although my relayer is built in Rust, my top priority is to launch it as soon as possible. Once it’s up and running, I can focus on optimizing it further. The main requirements for the relayer are that the signature matches the reconstructed event and that the content is compressed to minimize bandwidth usage.
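For what it’s worth, here is a rough sketch of the “compress the signed form” idea: zlib over the NIP-01 canonical array `[0, pubkey, created_at, kind, tags, content]`, with id and sig stored alongside. Any actual savings would depend on real traffic; this is not measured against any relay’s data.

```python
# Sketch: store the zlib-compressed NIP-01 canonical serialization plus id/sig,
# and rebuild the full JSON event on demand when serving clients.
import json
import zlib

def pack(event: dict) -> tuple[bytes, str, str]:
    """Compress the canonical (signed) form; id and sig are kept alongside."""
    canonical = json.dumps(
        [0, event["pubkey"], event["created_at"], event["kind"],
         event["tags"], event["content"]],
        separators=(",", ":"), ensure_ascii=False,
    ).encode()
    return zlib.compress(canonical, level=9), event["id"], event["sig"]

def unpack(blob: bytes, event_id: str, sig: str) -> dict:
    """Rebuild the full event JSON the relay has to serve."""
    _, pubkey, created_at, kind, tags, content = json.loads(zlib.decompress(blob))
    return {"id": event_id, "pubkey": pubkey, "created_at": created_at,
            "kind": kind, "tags": tags, "content": content, "sig": sig}
```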

Have you thought about data sharding?

I like this. I personally think the right to have one’s data forgotten is an essential part of freedom, and it should fill in for those who want a delete button and are not anonymous on the platform.

I’m willing to bet that more than half is spam. Are you guys implementing something to auto-purge spam? Most of it is the exact same post over and over.

Are you using PostgreSQL as the DB?

what sort of time period is that?

Before I start proposing solutions like all the posts above, a few additional questions:

1. Is it a lot or a little?

2. What has been the trend over the last week/30 days?

3. Does the Damus relay have any purging/archiving mechanism implemented?

4. What effect does database size have on relay performance?

5. Is the relay architecture scalable?

Do you want to write a longer article about the operation of Damus relay? Many of us would welcome it!