Lightning is literally a network of mutually trusting nodes, connected directly or indirectly through payment channels.

It is not a gossip network trying to store and retrieve an ever-growing set of data reliably and performantly.

I want to remind you that I suggested the RBSR paper to Hoytech and helped with Negentropy, but it doesn't work for what you want to use it for. It works for passively replicating data you are interested in; it doesn't magically make all the data available to everyone. And if the data is not available to everyone, then you are back to the question of how to find who has the subset I am looking for in less than 500ms.
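To be concrete about what RBSR does give you, here is a toy sketch (my own illustration, not Negentropy's actual wire format; it assumes integer item IDs and a cheap XOR fingerprint): two peers that have already found each other reconcile their sets by comparing range fingerprints and recursing on mismatches. Nothing in it tells you which peer to talk to in the first place.

```python
# Toy sketch of range-based set reconciliation (the idea behind Negentropy / RBSR).
# Assumptions: items are integers standing in for event IDs, and the "fingerprint"
# is just an XOR of hashes over a range. A real implementation uses a proper
# incremental hash and a wire protocol between two peers.

import hashlib


def fp(items, lo, hi):
    """XOR-combine hashes of all items in [lo, hi) as a cheap range fingerprint."""
    acc = 0
    for x in items:
        if lo <= x < hi:
            acc ^= int.from_bytes(hashlib.sha256(str(x).encode()).digest()[:8], "big")
    return acc


def reconcile(a, b, lo, hi, threshold=4):
    """Return (items missing from a, items missing from b) within [lo, hi)."""
    if fp(a, lo, hi) == fp(b, lo, hi):
        return set(), set()                     # ranges already identical
    in_a = {x for x in a if lo <= x < hi}
    in_b = {x for x in b if lo <= x < hi}
    if len(in_a) + len(in_b) <= threshold:      # small range: just exchange item lists
        return in_b - in_a, in_a - in_b
    mid = (lo + hi) // 2                        # otherwise split the range and recurse
    la, lb = reconcile(a, b, lo, mid, threshold)
    ra, rb = reconcile(a, b, mid, hi, threshold)
    return la | ra, lb | rb


alice = {1, 5, 9, 42, 100}
bob = {1, 5, 9, 77, 100}
missing_from_alice, missing_from_bob = reconcile(alice, bob, 0, 128)
print(missing_from_alice, missing_from_bob)     # {77} {42}
```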

The answers we know of are:

1. A structured network (DNS/DHT)

2. Small set of servers (like ten popular relays)

If you are happy with (2), fine, but then you have to explain what incentive they have to serve the entire web (it costs a lot).

But what you can't do is claim that you can have full replication of ever-growing data across thousands or millions of nodes, or claim that you can have partial replication across thousands of unstructured nodes and still get fast queries.

At least not without extraordinary proof. So maybe start building and let's poke at it and see what happens. Or run a simulation or something.
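For reference, here is a rough, self-contained sketch of what option (1) buys you: XOR-distance routing in a Kademlia-style DHT. Toy IDs, no networking, one contact per distance bucket; real implementations keep k contacts per bucket and query several nodes in parallel, but the point is that a lookup takes on the order of log(n) hops instead of asking everyone.

```python
# Toy sketch of Kademlia-style XOR-distance routing (no networking, no replication,
# one contact per distance bucket). Illustrates why a structured network can answer
# "who has this key?" in a handful of hops instead of flooding every node.

import random

BITS = 32
N = 2000
random.seed(1)
node_ids = random.sample(range(2**BITS), N)


def bucket(a, b):
    """Index of the distance bucket that ID b falls into from ID a's point of view."""
    return (a ^ b).bit_length() - 1


# Each node keeps one contact per non-empty distance bucket.
routing = {}
for nid in node_ids:
    table = {}
    for other in node_ids:
        if other != nid:
            table.setdefault(bucket(nid, other), other)
    routing[nid] = table


def lookup(start, key):
    """Greedy XOR routing: hopping to a contact in the key's bucket at least halves the distance."""
    current, hops = start, 0
    while True:
        b = bucket(current, key)
        if b < 0:                      # exact match
            return current, hops
        nxt = routing[current].get(b)
        if nxt is None or (nxt ^ key) >= (current ^ key):
            # No closer contact known; a real node would now ask its k closest peers.
            return current, hops
        current, hops = nxt, hops + 1


key = random.randrange(2**BITS)
reached, hops = lookup(node_ids[0], key)
closest = min(node_ids, key=lambda n: n ^ key)
print(f"reached {reached:#010x} in {hops} hops; globally closest node is {closest:#010x}")
```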


Discussion

I know it's kind of bullshit, but both ChatGPT and Claude independently estimated the sum total of all DNS records is on the order of 1 or 2 TB. That is $20 of storage space today. I'm an edge case, but I can download that to my house in less than two hours. How many top-level domains are changing addresses every minute?

What I don't like about DHTs is that you have to ask for everything. This means you need to be a node, or have a friendly node, in order to find things. It also creates a trail of everything you're looking for, which makes it really easy to notice when people are looking for things that are controversial. Pushing data doesn't have this problem: yes, you need to find a source to pull from, but you can benefit from bulk optimizations. Then you can query things in zero milliseconds, and the only record is between you and your local copy.

That's real decentralization
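Back-of-envelope on those numbers, taking the ~2 TB estimate at face value (the figures below are rough assumptions, not measurements):

```python
# Rough sanity check on the figures above (assuming the ~2 TB estimate is right).
total_bytes = 2e12                       # ~2 TB of DNS records
download_seconds = 2 * 3600              # "less than two hours"
required_gbps = total_bytes * 8 / download_seconds / 1e9
print(f"sustained link needed: ~{required_gbps:.1f} Gbit/s")   # ~2.2 Gbit/s
```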

First: that is not how most people, let alone bots, currently use DNS.

Second, it doesn't matter what you are willing to download; the question is how on earth I am supposed to know that you have the data I am looking for.

Third, if your solution is "always download and hoard everything that anyone publishes", then I personally, and many more, will start spamming the hell out of you until most nodes give up and churn, just to troll or to use you as a free storage system.

And that is before getting to the freshness of data.

My point is, claiming that you can have full replication of open data with redundancy is just false. You either have to sacrifice redundancy (centralisation), sacrifice full replication, or sacrifice openness (make it paid, like Bitcoin).

And as soon as full replication fails, you are back to: who has the data I am looking for? I have a URL, where is the server? The only way to answer that reliably is structured networks, not gossip.

This is a misunderstanding about DHTs.

1. You don't need to run a node; you can ask for data in client mode. Just go to the Pkarr repo and run the examples: start the process, make the query, close the process, in and out in less than the time it takes to open an average web page.

2. No one said don't use relays to cache data from many people's queries, and no one said don't gossip and cache long-term on top. What we are saying is: we need a source of truth that scales and that we can fall back to when data is not available in the private cache you got from your friend on a USB stick. A rough sketch of that fallback order follows.
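Here is that fallback order as a minimal sketch (the relay host and the GET /&lt;public key&gt; path are assumptions for illustration; a real client verifies the signed packet instead of treating it as opaque bytes):

```python
# Sketch of the fallback order described above: local cache first, then a Pkarr
# relay over plain HTTP in client mode, with the DHT as the source of truth behind it.
# Assumptions (not taken from the thread): relay.pkarr.org as the relay host and a
# GET /<z-base32 public key> endpoint; the response is kept as opaque bytes here.

import urllib.request

RELAY = "https://relay.pkarr.org"
local_cache: dict[str, bytes] = {}       # whatever you synced from a friend / USB stick


def resolve(public_key_z32: str):
    # 1. Private local cache: zero network round trips, zero query trail.
    if public_key_z32 in local_cache:
        return local_cache[public_key_z32]
    # 2. A relay in client mode: one HTTP GET, no DHT node required on this device.
    try:
        with urllib.request.urlopen(f"{RELAY}/{public_key_z32}", timeout=2) as resp:
            payload = resp.read()
            local_cache[public_key_z32] = payload
            return payload
    except OSError:
        pass
    # 3. Fall back to querying the DHT directly (omitted: needs a UDP BEP44 client).
    return None
```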

🤷🏻‍♂️ Sounds like we can stuff it in mempool now.

I get the desire for a nice system that has great answers for the latest changes, but those answers come with significant costs as well. I prefer and argue for a system of independent, near-complete caches over a fancy network that knows where something lives.

Apparently the Damus relay had five hundred gigabytes of data, which means I could fit it all in RAM, twice. It's important to watch how constraints change over time.

Good for you, but I'm not sure how that is relevant for the rest of us and our phones. Should we all query your RAM? Or who should we query?

In the system I'm arguing for, you have the option of privacy. In a DHT, you have no such option.

Says who? Who is preventing you from scraping the DHT and building a push system, or sharing what you downloaded on a USB stick?

I did my best to explain why you can't have the features that a DHT offers any other way; I never said you shouldn't do more on top.

Also, privacy can be achieved in other ways.

It seems you weirdly want to download the entire web and read it locally because that is awesome and private. Sure, do that.

My question remains: what about phones, and people who don't have 500 GB of RAM or storage at all? How should they find the data they are looking for?

If someone in their community has a full copy, they can ask them, and that's as far as the query needs to go. Or you can keep asking less trusted servers until you find your answer. If you're in a conflict zone and you want to find someone controversial, this is important.

But no one is going to let you scrape the DHT to provide this kind of plausible deniability, because query scraping is the kind of spam the DHT is supposed to keep out. The only way to get a full, private copy is if the system is designed around the efficiencies you gain from putting things in chunks.

You are just mistaken. DHTs don't stop you from doing what you want to achieve, as futile as I think it is; the DHT doesn't make it any harder.

My relay has thousands of packets, and nothing can stop me from gossiping with other relays; I just think it is a bad idea.

How would I get someone's record without revealing who I'm looking for?

The same way you plan to do it in your system: build a gossip network on top to share the data scraped from the DHT, or make a network of relays that forward things to the DHT but also gossip it with whoever wants to download it all ... or whatever.

There is nothing in using the DHT that makes using gossip in parallel impossible... the only thing that makes gossip impractical is that it is inherently impractical.

Remember... Nostr already doesn't have gossip and actually started as a critique of gossip (Secure Scuttlebutt), and Negentropy was invented on top.

Yet somehow you make it sound like Nostr relays are inherently gossip-friendly; well, extend the same grace to Pkarr relays.

You get exactly what you want, but also a DHT to fall back to when your system inevitably fails for the vast majority of web needs.

Nostr relays aren't "gossip friendly", they just have a "latest" query. How do I ask the DHT for the latest updates?

Again, either scrape the DHT as the BEP I linked to explains, or run Pkarr relays and add an extra endpoint to get the latest submitted records.
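Purely as an illustration of that extra endpoint (the /recent route, the in-memory store, and the use of Flask are all assumptions for the sake of the sketch, not anything Pkarr specifies):

```python
# Hypothetical sketch only: a Pkarr-style relay that also remembers which keys were
# recently submitted and lets clients poll for them. The /recent route, the in-memory
# store, and Flask are illustrative assumptions, not part of Pkarr.

import time
from collections import OrderedDict

from flask import Flask, request, jsonify

app = Flask(__name__)
recent = OrderedDict()          # public_key -> last submission time (newest last)


@app.route("/<public_key>", methods=["PUT"])
def submit(public_key):
    # A real relay would verify the signed packet and forward it to the DHT here.
    recent.pop(public_key, None)
    recent[public_key] = time.time()
    while len(recent) > 10_000:            # keep the memory footprint bounded
        recent.popitem(last=False)
    return "", 204


@app.route("/recent", methods=["GET"])
def recent_keys():
    since = float(request.args.get("since", 0))
    keys = [k for k, t in recent.items() if t > since]
    return jsonify({"now": time.time(), "keys": keys})


if __name__ == "__main__":
    app.run(port=8080)
```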

Also, which Nostr relays are you asking for the latest updates? All of them?

DHT nodes organize themselves in a structured way so you can find them all if you try, so they should be simpler to scrape than Nostr relays, which you can't even enumerate.

Anyways, I think you are not suggesting a practical solution, and what you actually want is just an extra requirement that DHTs don't make any harder at all.

If you believed this actually works, you would already be caching the entire DNS in your hosts.txt file, the original way people did DNS until they realized they needed a structured network... but you know it doesn't work, so you don't.

I would, but everyone shuts off zone transfers, so you can't. They aren't doing that to make the system more efficient; they're doing it to see what people ask for.