Is replacing DNS (structured network) with manual gossip a step forward or backward?

Mainline DHT has millions of nodes, so at any given moment it has capacity for billions of small packets.

When you introduce a caching layer (relays) on top, you get the structured (yet flatter) hierarchy of DNS, where the DHT is the alternative to root servers.

Combine that with the natural semantics of DNS, especially TTLs, and these relays now know when to check the DHT again.

Then the DHT only needs to contain your data often enough for relays to pick it up.
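
A minimal sketch of that relay behaviour (assuming a hypothetical dht_get(key) stand-in for the real Mainline lookup): the relay serves from its cache while the TTL is fresh and only goes back to the DHT once it expires.

```python
import time

def dht_get(key):
    """Hypothetical stand-in for the real Mainline DHT lookup.
    Returns (record, ttl_seconds)."""
    raise NotImplementedError

class RelayCache:
    """Relay-side cache that honours DNS-style TTLs before re-querying the DHT."""

    def __init__(self):
        self._cache = {}  # key -> (record, expires_at)

    def resolve(self, key):
        entry = self._cache.get(key)
        if entry is not None:
            record, expires_at = entry
            if time.monotonic() < expires_at:
                return record  # still fresh: no DHT traffic at all
        # Cold lookup or expired TTL: fall back to the DHT as the source of truth.
        record, ttl = dht_get(key)
        self._cache[key] = (record, time.monotonic() + ttl)
        return record
```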

It is not about efficiency or correctness, it is about the UX, DX, and reliability of the system... remember that DNS started with manual gossip of hosts.txt contents... that didn't scale, so why try it again?

Discussion

DHT is a "live" system. This has certain benefits, like efficiently propagating changes. It also has certain limitations, like active connectivity. Files also have benefits, like compression. They also have limitations, like inefficient propagation. The question shouldn't be which is better, but which is better for a specific use. For instance, files aren't a good way to store real-time status data. On the other hand, data stored in a DHT will never be available when you're offline or off-planet. I think the decision comes down to how often you expect data to change

I agreed with most of this until the end: the question is how often you are going to query something you haven't cached before. I call this the cold lookup.

And the answer to that is: the vast majority of the time, for two reasons:

1. There are more URLs on the Web and more endpoints than you can cache, and if you want to replace ICANN DNS, and not just the contact list in a social app, then you should expect that most URLs are either being seen for the first time or have been evicted from your limited cache.

2. You want to support stateless clients, so caching at relays is much more reliable than expecting every client to start with its own equivalent of a hosts.txt of the entire web.

Again, we already know what works for the general-purpose case, and it is not a local hosts.txt.

But I am not saying you shouldn't use local caches if you can; you should, and Pkarr encourages that. It is just one layer of caching, and without falling back to the DHT, the UX really doesn't work. You can't take a step backwards from the status quo, especially when you are already asking users to do the hard thing of managing their own keys; you can't add to that URLs that have a 10% chance of working and "it depends".

The choice isn't between devices having the whole dataset or not: even if you had the whole thing, you'll never have the most recent updates. The choice is between "large caches" synchronizing the current state over a DHT protocol, or via negentropy / reliable UDP. Given storage is ~$10/TB and falling, it makes more sense for each node to store everything than it does to be fancy about where things are located.

Got it, this is the usual gossip vs. DHT question.

I think you are underestimating how vulnerable to spam this gossip network will be, and how expensive it will be.

There is an inverse relationship between the cost of running a node and how decentralised the network can be, and I claim that in a system with no fees for creating new records, the only stable configuration is a consortium of small, mutually trusting servers, basically like email.

But of course if that is not convincing, we can always try things in practice.

I doubt anyone here will argue with experimentation. As far as the cost of new records goes, this seems like a great application for hashcash: if the difficulty is too high for your device, there could be a simple market for signing new records. What I'm suggesting is a system less like a "DHT" and more like BitTorrent. Now, BT uses a DHT for finding files, but once you've found "the file", the rest is essentially spraying bytes.
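
A rough illustration of the hashcash idea (my own sketch, not any existing spec): require that the hash of a new record plus a nonce has some number of leading zero bits, so publishing costs CPU time while verification stays cheap.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def mint_stamp(record: bytes, difficulty: int = 20) -> int:
    """Find a nonce so that sha256(record || nonce) has `difficulty` leading zero bits.
    Expected work roughly doubles with each extra bit of difficulty."""
    for nonce in count():
        digest = hashlib.sha256(record + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify_stamp(record: bytes, nonce: int, difficulty: int = 20) -> bool:
    """Cheap check that the publisher actually did the work."""
    digest = hashlib.sha256(record + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty
```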

BitTorrent had 15 years to develop a system where the gossip part can't be spammed/abused and where sharing data is fair, and nothing better came up than good old trackers.

My point is, the only system we know of where gossip works at scale for persistent data is Bitcoin, and that is because you have to pay fees to gatekeepers who themselves have to pay in a verifiably scarce resource.

And it seems you are reinventing that by thinking in the direction of hashcash, where a proof of work entitles you to store your data.

The problem is, hashcash was already invented to solve the centralisation of email by countering spam, and it failed miserably, and it will fail again every time, I bet.

I understand that you're deep into DHTs, but I'm not sure that your arguments stand on their own. If the goal is to reduce updates, either system will use some form of proof of work or rate limiting, which is a proxy for the work of maintaining a node on the network. Hashcash is a fine proof of work; it just didn't work for email because it's too hard to tack onto an existing system.

What I think undermines your arguments is the lack of perspective on designing a solution to fit the problem. A DHT is a poor solution for storing my grocery list. A file is a poor solution for storing current stock prices. If the goal is to replace DNS, that just isn't very much data compared to the 40 TB of space I just slapped in my PC.

If the goal is to locate things that are constantly changing locations, then fine, use a DHT. But most things that can even be addressed don't change addresses very often. My residential "dynamic IP" is stable for many years at a time. Using a DHT either increases the latency of the vast majority of lookups, or results in the same caching that DNS is known for...

I vote for simpler systems that benefit from continuous advances over complex systems that try to squeeze a little more out of what we had yesterday.

Please tag me when you build the system that you think is simpler than a DHT, so I can poke at it. For now what I have is DHTs that work in theory and in practice. I claim that gossip-based persistent replication will always degenerate into centralisation, I have a long list of historical case studies, and nothing would make me more excited than trying to break another attempt and failing.

Other than BitTorrent?

Or more directly, Lightning.

Lightning is literally a network of mutually trusting nodes, connected directly or indirectly through payment channels.

It is not a gossip network trying to store and retrieve an ever-growing set of data reliably and performantly.

I want to remind you that I suggested the RBSR paper to Hoytech and helped with Negentropy, but it doesn't work for what you want to use it for. It works for passively replicating data you are interested in; it doesn't magically make all the data available to everyone, and if the data is not available to everyone, then you again have to deal with the question of: how do I find who has the subset I am looking for in less than 500 ms?

The answers we know of are:

1. A structured network (DNS/DHT)

2. A small set of servers (like ten popular relays)

If you are happy with (2), fine, but then you have to explain what incentive they have to serve the entire web (it costs a lot).

But what you can't do is claim that you can have full replication of an ever-growing dataset across thousands or millions of nodes, OR that you can have partial replication across thousands of unstructured nodes and still have fast queries.

At least not without extraordinary proof. So maybe start building, and let's poke at it and see what happens. Or run a simulation or something.

I know it's kind of bullshit, but both ChatGPT and Claude independently estimated the sum total of all DNS records is on the order of 1 or 2 TB. This is $20 of storage space today. I'm an edge case, but I can download this to my house in less than two hours. How many top-level domains are changing addresses every minute?
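
Taking that estimate at face value, the back-of-the-envelope numbers work out roughly like this (the ~2 TB figure and the 2.5 Gbit/s downlink are assumptions):

```python
# Back-of-the-envelope check, assuming ~2 TB of records and a 2.5 Gbit/s downlink.
dataset_bytes = 2e12                 # ~2 TB (estimate quoted above)
cost_per_tb_usd = 10                 # "$10/TB and falling"
downlink_bits_per_s = 2.5e9          # assumed multi-gigabit residential link

storage_cost_usd = dataset_bytes / 1e12 * cost_per_tb_usd            # ≈ $20
download_hours = dataset_bytes * 8 / downlink_bits_per_s / 3600      # ≈ 1.8 hours
print(f"storage ≈ ${storage_cost_usd:.0f}, download ≈ {download_hours:.1f} h")
```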

What I don't like about DHTs is that you have to ask for everything. This means you need to be a node, or have a friendly node, in order to find things. It also creates a trail of everything that you're looking for, which makes it really easy to notice when people are looking for things that are controversial. Pushing data doesn't have this problem: yes, you need to find a source to pull from, but you can benefit from bulk optimizations. Then you can query things in zero milliseconds, and the only record is between you and your local copy.

That's real decentralization.

First: DNS currently is not used by most people, let alone bots.

Second, it doesn't matter what you are willing to download; the question is: how on earth am I supposed to know that you have the data I am looking for?

Third, if your solution is "always download and hoard everything that anyone publishes", then I personally, and many more, will start spamming the hell out of you until most nodes give up and churn, just to troll or to use you as a free storage system.

And that is before getting to the freshness of data.

My point is, claiming that you can have full replication of open data with redundancy is just false; you either have to sacrifice redundancy (centralisation), sacrifice full replication, or sacrifice openness (make it paid, like Bitcoin).

And as soon as full replication fails, you are back to: who has the data I am looking for? I have a URL; where is the server? The only way to answer that reliably is structured networks, not gossip.

This is a misunderstanding about DHTs.

1. You don't need to run a node; you can ask for data in client mode. Just go to the Pkarr repo and run the examples: start the process, make the query, close the process, in and out in less than the time it takes to open an average web page (see the sketch after this list).

2. No one said don't use relays to cache data from many people's queries, and no one said don't gossip and cache long-term on top... what we are saying is: we need a source of truth that scales, which we can fall back to when the data is not available in the private cache you got from your friend on a USB stick.
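
A minimal sketch of the stateless, "in and out" flow from (1), with a hypothetical relay URL and endpoint shape standing in for a real Pkarr relay: the client keeps no state, asks a relay, and a real implementation would fall back to a direct client-mode DHT query if every relay fails.

```python
import urllib.request

# Hypothetical relay endpoint shape (GET <relay>/<public-key> returns a signed packet);
# the URL below is a placeholder, not a real deployment.
RELAYS = ["https://relay.example.org"]

def one_shot_lookup(public_key: str, timeout: float = 2.0) -> bytes | None:
    """Stateless 'cold' lookup: no long-lived node, no local state, one request and out."""
    for relay in RELAYS:
        try:
            with urllib.request.urlopen(f"{relay}/{public_key}", timeout=timeout) as resp:
                return resp.read()  # signed packet bytes; the caller verifies the signature
        except OSError:
            continue  # relay unreachable: try the next one
    return None  # a real client would now do a direct client-mode DHT query
```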

🤷🏻‍♂️ Sounds like we can stuff it in mempool now.

I get the desire for a nice system that has great answers for the latest changes, but they come with significant costs as well. I prefer and argue for a system of independent, near-complete caches over a fancy network that knows where something lives.

Apparently the Damus relay had five hundred gigabytes of data – which means I could fit it all in RAM, twice. It's important to watch how constraints change over time.

Good for you; not sure how that is relevant to the rest of us and our phones. Should we all query your RAM? Or who should we query?

In the system I'm arguing for, you have the option of privacy. In a DHT, you have no such option.

Says who? Who is preventing you from scraping the DHT and building a push system, or sharing what you downloaded on a USB stick?

I did my best to explain why you can't get the features a DHT offers any other way; I never said you shouldn't do more on top.

Also, privacy can be achieved in other ways.

It seems you weirdly want to download the entire web and read it locally because that is awesome and private; sure, do that.

My question remains: what about phones and people who don't have 500 GB of RAM or storage at all? How should they find the data they are looking for?

If someone in their community has a full copy, they can ask them, and that's as far as the query needs to go. Or you can keep asking less trusted servers until you find your answer. If you're in a conflict zone and you want to find someone controversial, this is important.

But no one is going to let you scrape the DHT to provide this kind of plausible deniability, because query scraping is the kind of spam the DHT is supposed to keep out. The only way to get a full, private copy is if the system is designed around the efficiencies you gain from putting things in chunks.

You are just mistaken. DHTs don't stop you from doing what you want to achieve; as futile as I think it is, the DHT doesn't make it any harder.

My relay has thousands of packets; nothing can stop me from gossiping them with other relays, I just think it is a bad idea.

How would I get someone's record without revealing who I'm looking for?

Same way you plan to do it in your system: build a gossip network on top to share the scraped data from the DHT, or make a network of relays that forward things to the DHT but also gossip it with whoever wants to download it all... or whatever.

There is nothing in using the DHT that makes using gossip in parallel impossible... the only thing that makes gossip impractical is that it is inherently impractical.

Remember... Nostr already doesn't have gossip and actually started as a critique of gossip (Secure Scuttlebutt), and Negentropy was invented on top.

Yet somehow you make it sound like Nostr relays are inherently gossip-friendly; well, extend the same grace to Pkarr relays.

You get exactly what you want, but also a DHT to fall back to when your system inevitably fails for the vast majority of web needs.

Nostr relays aren't "gossip-friendly"; they just have a "latest" query. How do I ask the DHT for the latest updates?

Again, either scrape the DHT as the BEP I linked to explains, or run Pkarr relays and add an extra endpoint to get the latest submitted stuff.

Also, which Nostr relays are you asking for the latest updates? All of them?

DHT nodes organize themselves in a structured way, so you can find them all if you try; they should be simpler to scrape than Nostr relays, which you can't even fully enumerate.
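
A rough sketch of what "find them all" means in practice, assuming a hypothetical find_node(addr, target) helper that sends a Kademlia-style FIND_NODE query and returns the peers the queried node knows about; walking routing tables outward from the bootstrap nodes with random targets eventually visits most of the network.

```python
import os
from collections import deque

def find_node(addr, target: bytes):
    """Hypothetical stand-in for a Kademlia-style FIND_NODE query.
    Returns a list of (node_id, addr) pairs from the queried node's routing table."""
    raise NotImplementedError

def crawl(bootstrap_addrs, max_queries: int = 10_000):
    """Breadth-first walk over routing tables, using random 20-byte targets."""
    seen, queue, queries = set(), deque(bootstrap_addrs), 0
    while queue and queries < max_queries:
        addr = queue.popleft()
        if addr in seen:
            continue
        seen.add(addr)
        queries += 1
        try:
            for _node_id, peer_addr in find_node(addr, os.urandom(20)):
                if peer_addr not in seen:
                    queue.append(peer_addr)
        except Exception:
            continue  # unreachable or misbehaving node: keep walking
    return seen
```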

Anyways, I think you are not suggesting a practical solution, and what you actually want is just an extra requirement that DHTs don't make harder at all.

If you believed this actually works, you would have been caching the entire DNS in your hosts.txt file, the original way people did DNS until they realized they needed a structured network... but you know it doesn't work, so you don't.

I would, but everyone shuts off zone transfers, so you can't. They aren't doing that to make the system more efficient, they're doing it to see what people ask for.

BitTorrent isn't at all what you are describing. When peers are gossiping, they are not sharing an ever-growing dataset; it is a static file, and if it were a changing dataset they would necessarily be discarding portions of it because they don't have infinite storage. In fact, most seeders immediately delete the static data as soon as they are done with it.

You can try to slow down the data growth as much as you want, but the best you can do is make it as expensive as a Bitcoin transaction and make it take half an hour to hashcash an update; even ignoring the awful UX of that, the data still grows, forever.

How many nodes do you expect to dedicate 100 gigabytes to that? Definitely not millions or thousands... definitely not as many as Bitcoin nodes, since it is not as profitable or necessary to store the full, ever-growing set.

And then these few nodes have to serve the entire Internet, and they become easier to control or attack because they are few.

You will never be able to have the full dataset, you will have to discard data, and the moment you start doing that, you will realise that people trying to read need to find out who has the parts they need, and they can't, because there is no structure. So they will have to ask everyone, and that is exactly what a DHT is meant to make scalable: how to ask log₂₀(n) nodes instead of n nodes.
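
To put rough numbers on that (the network size is an assumption, roughly Mainline's order of magnitude): with a branching factor of 20 and ten million nodes, a lookup touches on the order of five or six nodes instead of ten million.

```python
import math

n = 10_000_000   # assumed network size, roughly Mainline's order of magnitude
b = 20           # branching factor from the comment above

hops = math.log(n, b)   # ≈ 5.4 routing steps per lookup
print(f"~{hops:.1f} hops vs asking all {n:,} nodes")
```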

Anyways, just build it, and see if you can survive spam without degenerating into a handful of nodes like an abandoned BitTorrent infohash.

I guess we'll see whether Nostr survives or not.

I mean, Nostr is very small and very centralised (most people read and write from and to a small set of servers), and STILL full replication is not the case, so unresolvable links are common.

And this is the best case (social media), where lazy gossip is natural; try doing this for cold queries like curl, which is what counts as the Web.

What do you see in the future that makes this better, not worse?