Distributing the events over an entire landscape of personal, small, medium, and large relays is a game-changer, because it scales up incredibly well.

No one event store needs to hold all of the events and -- here's the kicker -- the longevity of events can negatively correlate with the number of events on a particular relay. At the same time, you can use DVMs, databases, and search engines to traverse the data landscape.

That means data can be decentralized, while data discovery can be centralized.

#nostr


Discussion

Did you digest the math in that note?

The more events a relay contains, the shorter the time for which it needs to administer those events (the faster it can delete them), and the more it can be a true "relay" (passing things on) rather than an "event store" (saving things).
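
Spelled out with made-up but plausible numbers (mine, not from the note): with a fixed storage budget, retention is just capacity divided by inflow.

```ts
// Back-of-the-envelope: retention shrinks as event volume grows,
// because capacity is fixed. All numbers are illustrative.
const AVG_EVENT_BYTES = 1_500; // a signed kind-1 note is a small JSON blob

function retentionDays(capacityGB: number, eventsPerDay: number): number {
  return (capacityGB * 1e9) / (eventsPerDay * AVG_EVENT_BYTES);
}

console.log(retentionDays(10, 200));           // personal relay: ~33,000 days
console.log(retentionDays(2_000, 50_000_000)); // giant relay: ~27 days
```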

The idea of having a few gigantic relays store everyone's notes in perpetuity destroys this efficiency: you end up needing gigantic data centers, and that shit is expensive and an attack vector.

Nobody else can do this.

Doesn't this result in the largest relays acting more like search engine indexers than anything else?

Querying a large relay will be like Googling something; its main job is to take you someplace where the information you're looking for is stored more permanently.
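
Roughly what that could look like from a client, assuming the big relay supports NIP-50 search; the URL is a placeholder:

```ts
// A sketch of "centralized discovery, decentralized storage": ask an
// indexer relay where something is, then fetch the rest from the source.
type NostrEvent = {
  id: string; pubkey: string; created_at: number;
  kind: number; tags: string[][]; content: string; sig: string;
};

function search(indexerUrl: string, query: string): Promise<NostrEvent[]> {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(indexerUrl);
    const hits: NostrEvent[] = [];
    ws.onopen = () =>
      // NIP-01 REQ carrying a NIP-50 "search" field: the indexer only has
      // to point us somewhere, not archive everything forever.
      ws.send(JSON.stringify(["REQ", "find", { search: query, limit: 20 }]));
    ws.onmessage = (msg) => {
      const [type, , payload] = JSON.parse(String(msg.data));
      if (type === "EVENT") hits.push(payload);
      if (type === "EOSE") { ws.close(); resolve(hits); } // end of stored events
    };
    ws.onerror = () => reject(new Error("relay connection failed"));
  });
}
// Next step: read each hit's pubkey, look up that npub's relay list, and
// pull the full history from where it actually lives.
```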

With the mailbox model, yes. Profile and relay metadata, especially, need to be easy to find.

You need some relays that you can traverse to get from one data pool to the next, so they need to hold enough notes, for long enough, that people can find friends and set up hops with their frens.

And then your WoT is your public square and the squares overlap.
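
For reference, the mailbox model is NIP-65: each npub publishes a small, replaceable kind-10002 event listing its read (inbox) and write (outbox) relays. A minimal sketch, with placeholder URLs:

```ts
// NIP-65 relay list metadata (kind 10002). An "r" tag with no marker
// means the relay is used for both reading and writing.
const relayListEvent = {
  kind: 10002,
  pubkey: "<hex pubkey>",
  tags: [
    ["r", "wss://home.example.com"],          // read + write
    ["r", "wss://big.example.com", "write"],  // outbox: where I publish
    ["r", "wss://inbox.example.com", "read"], // inbox: where mentions land
  ],
};

function mailboxes(tags: string[][]) {
  const read: string[] = [];
  const write: string[] = [];
  for (const [name, url, marker] of tags) {
    if (name !== "r") continue;
    if (marker !== "write") read.push(url); // "read" or unmarked
    if (marker !== "read") write.push(url); // "write" or unmarked
  }
  return { read, write };
}

console.log(mailboxes(relayListEvent.tags));
// read: [home, inbox], write: [home, big]
```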

I don't need regular access to all of the notes being generated in Mongolia or Argentina. I just need a way to connect with my two bros in Mongolia and Argentina, so that I can download their notes to my preferred hosted relays or my cell phone.
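
On the wire, that's just a filtered NIP-01 subscription against their relay; the pubkeys and URL are placeholders:

```ts
// "Connect with my two bros" = ask their relay for everything
// those two pubkeys wrote, and mirror it locally for keeps.
const friends = ["<hex pubkey, Mongolia>", "<hex pubkey, Argentina>"];
const local: unknown[] = []; // stand-in for your phone's or relay's storage

const ws = new WebSocket("wss://their-relay.example.com");
ws.onopen = () =>
  ws.send(JSON.stringify([
    "REQ", "friends",
    { authors: friends, kinds: [1], limit: 500 }, // kind 1 = text note
  ]));
ws.onmessage = (msg) => {
  const [type, , event] = JSON.parse(String(msg.data));
  if (type === "EVENT") local.push(event);
};
```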

Rather than one gigantic data pool (a la Twitter), you have lots of seas, lakes, and ponds connected by rivers and streams. It's slightly slower, but not noticeably so, and the data moving between the different bodies creates some redundancy.

That's what I meant by "proximity to the end user", in the other note.

The end user should think about how he stores his events, just like he thinks about how he stores his Word documents, e-mails, or whatnot. He can pay someone to archive them, he can download them, he can run a relay on his PC or a privately hosted relay, etc. Or he can just be like: I don't care about my GM note from October 2022.

Most of the data stored on these gigantic servers is obsolete. People wouldn't care if it were deleted, but the providers feel obliged to save everything, so they have your spam folder and the PowerPoint presentation you did 5 years ago copied to 5 different cloud megaservers and mirrored in 15 data centers worldwide, and advertisers pay for that.

It's completely stupid and environmentally nuts.

Simple text notes are like daily chit-chat; no one remembers. Notes that get a bunch of interaction are probably worth saving, just like someone will remember an insightful comment made in a conversation.

The more effort that goes into a piece of content, the more likely it should be archived forever. Blog posts, git repos, books, videos, etc. that are intended to be more timeless should stick around.

Yes, if I want to find it later, long-form is better, for instance.
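
One way a relay could encode "more effort and more interaction mean a longer life"; the kind numbers are real (kind 1 notes, kind 6 reposts, kind 30023 long-form articles), but the horizons are invented:

```ts
// Per-kind retention sketch: effortful kinds live longer, and interaction
// extends a note's life. Horizons are illustrative, not a standard.
const BASE_RETENTION_DAYS: Record<number, number> = {
  1: 30,           // short text notes: daily chit-chat
  6: 30,           // reposts
  30023: Infinity, // long-form articles: archive
};

function shouldExpire(kind: number, ageDays: number, reactions: number): boolean {
  const base = BASE_RETENTION_DAYS[kind] ?? 90;
  return ageDays > base * (1 + Math.log1p(reactions));
}

console.log(shouldExpire(1, 45, 0));   // true: stale chit-chat
console.log(shouldExpire(1, 45, 200)); // false: interaction earned it time
```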

A good search algorithm, then, needs a way to remember, on a per-npub basis, which relays are part of that npub's "public square."

If you go looking for someone, you start in the town square. If you can't find the fella in the square, you ask around, and people point you to this or that part of town where you might find the person.

It's six degrees of Kevin Bacon but for the internet. You don't need a direct link to everyone, you just need a large enough network that you can get to anyone else in six hops or less. And "large enough" isn't actually that large.
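
A sketch of the hop search, assuming a hypothetical fetchFollows helper; in a real client it would pull each pubkey's kind-3 contact list (NIP-02) from their relays:

```ts
// Six-degrees sketch: breadth-first search over follow lists.
async function hopsBetween(
  from: string,
  to: string,
  fetchFollows: (pubkey: string) => Promise<string[]>,
  maxHops = 6,
): Promise<number | null> {
  let frontier = [from];
  const seen = new Set(frontier);
  for (let hop = 1; hop <= maxHops; hop++) {
    const next: string[] = [];
    for (const pk of frontier) {
      for (const follow of await fetchFollows(pk)) {
        if (follow === to) return hop;
        if (!seen.has(follow)) { seen.add(follow); next.push(follow); }
      }
    }
    frontier = next;
  }
  return null; // not reachable within maxHops
}
```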

I bet one solution will be for search algorithms to live client-side. A client-side algorithm can be attuned to each npub's network of connections, so it more quickly finds what that npub is probably looking for. There will be server-side algorithms as well, but those will probably solve a different problem.

The clients will collect mailbox-lists, I suppose, like a WoT of relays.
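
Such a "WoT of relays" could be as simple as a client-side map from pubkey to that npub's public square, fed by the kind-10002 relay lists the client has already seen; a sketch:

```ts
// Client-side index: pubkey -> that npub's public square. Keeping only
// the newest list per pubkey mirrors replaceable-event rules.
type RelayList = { pubkey: string; created_at: number; tags: string[][] };

const squares = new Map<string, { at: number; relays: string[] }>();

function ingestRelayList(ev: RelayList) {
  const prev = squares.get(ev.pubkey);
  if (prev && prev.at >= ev.created_at) return; // stale, ignore
  const relays = ev.tags.filter(t => t[0] === "r").map(t => t[1]);
  squares.set(ev.pubkey, { at: ev.created_at, relays });
}

// Search then starts at the target's own square instead of scanning
// every relay: squares.get(pubkey)?.relays ?? fallbackRelays
```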

We also don't actually need to limit the largest relays as much, if we don't use them as de facto archives. Someone with a good hardware setup or a nice cloud contract might have a couple terabytes of data and a top-of-the-line server to retrieve them.

That's a lot of notes.

A double coincidence of wants? I want to type; you want to read. I send to our relays, and the relays' delete-timers start. Your client needs to pull from our relay(s) before the notes are deleted. The shorter the relay's delete-timer, the lower the relay's costs, and the more constantly your client needs to be online and pulling. You wake up in the morning and read my notes long after they've been deleted from our relays. So we'll either have to pay the relays, or have an always-on client? Something like that?

The sender and/or the reader will need to pay for longevity, somehow, somewhere.
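
The race in that exchange, reduced to a worst-case check (numbers are illustrative): a reader catches a note only if the relay's retention covers at least one full polling interval.

```ts
// Worst case: the note lands right after a poll, so it must survive
// one full polling interval to be seen on the next pull.
function readerCatchesNote(
  retentionHours: number,    // relay delete-timer
  pollIntervalHours: number, // how often the reader's client pulls
): boolean {
  return retentionHours >= pollIntervalHours;
}

readerCatchesNote(2, 8);  // false: a sleeping reader misses 2-hour retention
readerCatchesNote(48, 8); // true: two-day retention covers overnight gaps
```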

There is a heck of a lot of empty storage on remote servers, personal devices, and USB sticks.

Idk, I only have about 1TB free to share ATM, which according to nostr:npub137c5pd8gmhhe0njtsgwjgunc5xjr2vmzvglkgqs5sjeh972gqqxqjak37w is practically useless

πŸ˜‚

1TB connected to a modem with V.42bis 🐢🐾🀣🀣🀣

When desired, could a highly-followed sender's app encrypt their notes to each of their very many followers?

We already have private groups and we'll be getting private relays, and they have their own npub, so they have their own encryption key for all of the members, I guess.
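
One plausible shape for that, sketched generically rather than as NIP-44/NIP-59 specifics: encrypt the note once with a random content key, then wrap only that key for each follower. The per-follower shared secrets here are assumed to already exist as 32-byte ECDH outputs.

```ts
// Hybrid "encrypt once, deliver to many": one ciphertext for the note,
// plus a tiny per-follower wrap of the content key.
import { randomBytes, createCipheriv } from "node:crypto";

function encrypt(key: Buffer, plaintext: Buffer) {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, body, tag: cipher.getAuthTag() };
}

function publishToFollowers(note: string, followerSecrets: Map<string, Buffer>) {
  const contentKey = randomBytes(32);
  const sealedNote = encrypt(contentKey, Buffer.from(note)); // one ciphertext
  const wraps = new Map<string, ReturnType<typeof encrypt>>();
  for (const [follower, secret] of followerSecrets) {
    wraps.set(follower, encrypt(secret, contentKey)); // small per-follower wrap
  }
  return { sealedNote, wraps }; // N small wraps instead of N full copies
}
```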

It wouldn't need to be that short. Storage is cheap and JSON is tiny. I was thinking more like weeks on the very biggest relays, months on the mid-sized, years on the smaller, indefinitely on personal devices.
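
Those tiers as a relay-side config sketch; the thresholds and horizons are made up, not a proposed standard:

```ts
// Size-tiered retention: the busier the relay, the shorter it keeps notes.
type Tier = { maxDailyEvents: number; retentionDays: number };

const TIERS: Tier[] = [
  { maxDailyEvents: 10_000,    retentionDays: 365 * 5 }, // small: years
  { maxDailyEvents: 1_000_000, retentionDays: 180 },     // mid-sized: months
  { maxDailyEvents: Infinity,  retentionDays: 21 },      // biggest: weeks
];

function retentionFor(dailyEvents: number): number {
  // Always matches, because the last tier's ceiling is Infinity.
  return TIERS.find(t => dailyEvents <= t.maxDailyEvents)!.retentionDays;
}
```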

Datacenters currently store documents I wrote literally decades ago and I don't even know what all is in those folders and I don't care. πŸ€·β€β™€οΈ

Just deleting older versions of notes, or notes the user has marked for deletion, would help. Relays currently store EVERYTHING, even if the user specifically asked them to delete it.
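
Honoring those requests is NIP-09: a kind-5 event whose "e" tags list the event ids to delete. A relay that actually complied could do something like this on ingest; getAuthor and deleteById are stand-ins for the relay's storage layer:

```ts
// NIP-09 deletion handling: only the author may delete their own events.
function handleEvent(
  ev: { kind: number; pubkey: string; tags: string[][] },
  getAuthor: (id: string) => string | undefined,
  deleteById: (id: string) => void,
) {
  if (ev.kind !== 5) return; // kind 5 = deletion request
  for (const [name, id] of ev.tags) {
    if (name === "e" && getAuthor(id) === ev.pubkey) deleteById(id);
  }
}
```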

Makes me think of Tahoe-LAFS.

Except that we have lots of HTTP servers with their own fileservers, and it's the client that usually handles encryption.