Centrality algorithms are the conversation we should be having.

PageRank is the most famous centrality algo, but it's not the only one. We should be talking about centrality algos and deciding which one(s) to use in nostr.

PageRank was MAGIC when it came out in 1998. It eliminated the vast majority of spam from internet keyword search overnight. It changed the world.

And it's based on a very simple idea: calculate an "importance" score for each URL. If urlA contains a hyperlink to urlB, then urlB's score gets a boost. And the amount of the boost depends on the importance score of urlA (split among all of the URLs that urlA links to).
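A minimal sketch of that idea in Python, using plain power iteration on a toy link graph. The function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are illustrative choices rather than anything specified in this note; real implementations add convergence checks and sparse data structures.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Minimal PageRank by power iteration.

    links maps each node (e.g. a URL) to the list of nodes it links to.
    Returns a dict of scores that sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}  # start from a uniform score

    for _ in range(iterations):
        # every node gets the same small "random jump" share
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # a node splits its current score evenly across its outgoing links,
                # so a link from an important page is worth more
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # dangling node (no outgoing links): spread its score over everyone
                share = damping * scores[node] / n
                for target in nodes:
                    new_scores[target] += share
        scores = new_scores
    return scores


links = {
    "urlA": ["urlB"],
    "urlB": ["urlC"],
    "urlC": ["urlA", "urlB"],
}
scores = pagerank(links)
for url, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{url}: {score:.3f}")
```

In this toy graph urlB, which is linked from both urlA and urlC, ends up with the highest score.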

So Google scrapes the internet, calculates PageRank scores for URLs based on hyperlinks, and uses those scores to rank keyword search results. Bam. Bye-bye to most spammy URLs. Not perfect, but not easy to game.

Nostr needs to do something similar. PageRank for pubkeys instead of for URLs. And instead of hyperlinks, use follows. Or zaps. Or replies. Or all of the above. There is more than one way to build a centrality algorithm.
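One way to sketch "PageRank for pubkeys" over several interaction types: fold follows, zaps, and replies into a single weighted graph, and let score flow along each edge in proportion to its weight. The `EDGE_WEIGHTS` values, the `(source, target, kind)` input format, and the function names are hypothetical; nothing here parses real nostr events, and picking the weights is exactly the kind of design decision worth debating.

```python
from collections import defaultdict

# Hypothetical per-interaction weights: how much one follow, zap, or reply
# counts as an endorsement. Choosing these is a design decision, not a given.
EDGE_WEIGHTS = {"follow": 1.0, "zap": 2.0, "reply": 0.5}


def build_graph(interactions):
    """Fold (source_pubkey, target_pubkey, kind) tuples into weighted edges.

    Repeated interactions between the same pair add up. Unknown kinds raise KeyError.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for source, target, kind in interactions:
        if source != target:  # ignore self-interactions
            graph[source][target] += EDGE_WEIGHTS[kind]
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=100):
    """PageRank where a pubkey's score flows out in proportion to edge weight."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            out_edges = graph.get(node, {})
            total = sum(out_edges.values())
            if total > 0:
                for target, weight in out_edges.items():
                    new_scores[target] += damping * scores[node] * weight / total
            else:
                # pubkey with no outgoing interactions: spread its score evenly
                for target in nodes:
                    new_scores[target] += damping * scores[node] / n
        scores = new_scores
    return scores


interactions = [
    ("alice", "bob", "follow"),
    ("alice", "carol", "zap"),
    ("bob", "carol", "reply"),
    ("carol", "alice", "follow"),
]
scores = weighted_pagerank(build_graph(interactions))
print(scores)
```

Because alice zaps carol but only follows bob, twice as much of alice's score flows to carol as to bob, and carol also collects bob's reply, so carol ends up ahead of bob.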

And one last thing: in principle, we should be using personalized centrality algos, like personalized PageRank, because you should always be at the center of your own web of trust (WoT).
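Personalized PageRank differs from the global version in one place: when the random walk restarts, it jumps back to the observer instead of to a random node. A minimal sketch over a follow graph, assuming a `follows` dict that maps each pubkey to the pubkeys it follows, and assuming that pubkeys which follow nobody send their score back to the observer (one common convention, not the only one). The names here are illustrative.

```python
def personalized_pagerank(follows, me, damping=0.85, iterations=100):
    """PageRank whose random restarts always land on `me`.

    follows maps each pubkey to the list of pubkeys it follows.
    """
    nodes = {me} | set(follows)
    for targets in follows.values():
        nodes.update(targets)
    scores = {node: 0.0 for node in nodes}
    scores[me] = 1.0  # all the score starts at the observer

    for _ in range(iterations):
        # the restart share goes entirely to me, not spread over everyone
        new_scores = {node: 0.0 for node in nodes}
        new_scores[me] = 1.0 - damping
        for node in nodes:
            targets = follows.get(node, [])
            if targets:
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # follows nobody: send that score back to me (one convention of several)
                new_scores[me] += damping * scores[node]
        scores = new_scores
    return scores


follows = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["me"],
    "dave": ["eve"],  # a ring of accounts nobody in my network follows
    "eve": ["dave"],
}
scores = personalized_pagerank(follows, "me")
print(scores)
```

In this toy graph the dave/eve ring ends up with a score of exactly zero no matter how much its members follow each other, because no path from "me" reaches them. That is the spam-resistance payoff of putting yourself at the center.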


Discussion

This is an interesting take on that from one of the leads over at Bluesky/atproto:

"Lastly, for public networks, we think big full-network indices are basically inevitable. If a public network is successful, somebody will build a Google Search (Web), Google Reader (RSS), Google Groups (NNTP), Google Shopping... you get the picture. This kind of service can capture control of open networks if they aren't planned for from the start. atproto helps turn these extensions and "views" into commodity API services, and ensures that new providers have full easy access to data needed for indexing (in contrast to the many challenges with web crawling). This keeps the network resilient and interoperable even when (not if!) the largest tech companies in the world get involved."

https://whtwnd.com/bnewbold.net/entries/Reply%20on%20Bluesky%20and%20Decentralization

Basically, it seems they went with a 'there must be at least one all-knowing relay at all times' approach, partly to make indexing redundant so that it can't become an avenue for capture. So it's Centralisation A to ward off Centralisation B, but Centralisation A is part of a wider decoupling/duplication mechanism, so it's not apples to apples. All very interesting purely from a trade-offs perspective (I'm fairly neutral about all this and still getting my head around it).

Thank you for pointing out that post. Very interesting!

I agree with the “when (not if!)” logic, although I don’t follow how their Centralization A (one all-knowing relay, presumably controlled by Bluesky?) would be better than Centralization B (focused around a legacy big tech company).

In my mind, some algos are inherently Big Tech-friendly (and therefore a force for centralization) while others are inherently pleb-friendly (and a force for decentralization), and it is up to us to discover the decentralizing ones. I would argue that PageRank identifies “influencers” and incentivizes nodes (whether URLs or pubkeys) to strive to become one, because a node’s PageRank score gets higher and higher as it gains more followers (or attracts more hyperlinks). This makes sense for a centralized Google and its advertising monetization model: reward nodes that drive traffic.

So I’d argue that we need to modify PageRank to serve the needs of the individual, not the needs of Google. I propose GrapeRank as one such candidate algorithm. For one thing, GrapeRank is specifically designed to provide a weight for each pubkey for use in calculations like weighted averages or adding up votes. PageRank is unsuitable for this purpose because it gives too much weight to “influencers”; it was simply designed for a different purpose.
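To make "weighted averages or adding up votes" concrete, here is a generic sketch of how per-pubkey weights get used once some algorithm has produced them. It does not implement GrapeRank (its formula isn't given in this thread); the weights are made-up numbers chosen to show what happens when one "influencer" pubkey carries most of the weight, and the function names are illustrative.

```python
def weighted_average(ratings, weights):
    """Average ratings (pubkey -> number), weighting each pubkey's rating.

    Pubkeys missing from `weights` count with weight 0.
    """
    total_weight = sum(weights.get(pk, 0.0) for pk in ratings)
    if total_weight == 0:
        raise ValueError("none of the raters has any weight")
    weighted_sum = sum(weights.get(pk, 0.0) * r for pk, r in ratings.items())
    return weighted_sum / total_weight


def weighted_vote(votes, weights):
    """Tally votes (pubkey -> option) by summing voter weights; return the winner."""
    tally = {}
    for pk, option in votes.items():
        tally[option] = tally.get(option, 0.0) + weights.get(pk, 0.0)
    return max(tally, key=tally.get)


# Made-up weights: one heavily followed "influencer" holds almost all the weight.
weights = {"influencer": 0.97, "a": 0.01, "b": 0.01, "c": 0.01}
ratings = {"influencer": 1.0, "a": 5.0, "b": 5.0, "c": 5.0}
votes = {"influencer": "no", "a": "yes", "b": "yes", "c": "yes"}

avg = weighted_average(ratings, weights)
winner = weighted_vote(votes, weights)
print(f"weighted average: {avg:.2f}, winning option: {winner}")
```

With these weights the influencer's single rating of 1 drags the average down to about 1.12 despite three ratings of 5, and its lone "no" outvotes three "yes" votes. Per the author's later reply, GrapeRank adds a nonlinear term intended to make its scores better suited as weights here; that term is not modeled in this sketch.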

In nostr there will be a role for PageRank, but ultimately it will be alternative centrality algos that allow us to avoid centralization. Bluesky has already fallen as far as I’m concerned.

Thanks for the zap, my first ever! GrapeRank is a catchy name. Is this your own project? Any docs?

Centralisation A is better (depending on who you are, of course) because it allows any outside party to make an instant and exact duplicate of the all-knowing relay without needing to ask permission. So I could run a duplicate of the atproto master relay without needing to ask Bluesky LLC about it. Of course, any copying party would need a lot of infra to run their copy (each copy comes with the full history, which means lots of storage), but at least the mechanism for them to make that copy unilaterally is there. So it’s actually more like ‘distributed omnipotence’: multiple distributed instances of the same central view, with the results of the indexing served up as a commodity, the way the big cloud providers all sell VMs and cloud storage as a commodity. (You could even have AWS, Google Cloud, and Azure each run a copy of atproto's all-knowing relay and sell access as a commodity managed service.) In this view of things, any developer can bring any algorithm to the table, and no party’s algorithm is going to have access to less data than any other party’s. That incentivises app developers to build things that require a fully indexed view, because they can hit the ground running.

Centralisation B is where one outside indexer takes enviable command of an indexing market where there is no other means of indexing besides old-fashioned crawling and sorting everything. There exists no all-knowing relay that developers and users with a need can pull from. This type of centralisation does not allow for unilateral copy-paste in the above way. The data, crawled and cleaned by that leading outside party, is proprietary data. Other parties that want to compete are basically going to have to start from square one, crawl everything, and do so at the same cost (which requires lots of in-house infra to compete, not cloud-leased infra). To make an analogy, instead of simply copying an MP3 of a song they have to set up a recording studio, buy the mics, call up the band, get them in, and record their own version of that same song. All of this is very expensive, and while they’re doing all that stuff the market leader is optimising to the point where it becomes extremely hard to catch up. And so eventually competitors settle into niche roles or give up.

Nostr will never have a single all-knowing relay like atproto does, so the centralisation fear is that of Centralisation B. At nostr's current size, however, this isn't much of a fear, as the barrier to entry for outside indexers, even considering all the groundwork, isn't very high.

My take is that the nostr community should prioritise use cases where a global view doesn't add much. Town-square microblogging, however, is not one such use case.

I’m glad to be your first zap! ⚡️

Yup, my project. Grapevine is the product. The goal of the Grapevine is to build tools that help you and your community identify which pubkeys are the most trustworthy, and in what context, so you can curate content, facts, and information. GrapeRank is the algo. Think of it as personalized PageRank (“PageRank for pubkeys”) but with a nonlinear term that makes the GrapeRank score suitable as a trust weight for weighted averages, weighted voting, etc., as described above. I’ve built several iterations, the most recent of which is at https://grapevine-brainstorm.vercel.app, which I am in the process of refactoring to make it more performant. nostr:npub1manlnflyzyjhgh970t8mmngrdytcp3jrmaa66u846ggg7t20cgqqvyn9tn and nostr:npub10npj3gydmv40m70ehemmal6vsdyfl7tewgvz043g54p0x23y0s8qzztl5h have also built their own implementations of GrapeRank.

Your distinction between A and B makes sense. And yup, if nostr becomes centralized it will be via Centralization B. Not a fear yet, but it will become more of a concern if or when the nostr user base gets bigger.

My vision is that personalized Grapevine WoT relays will let us “be our own Google” in the same way that btc lets us be our own bank. In both cases, the solution is clearly imperfect (probably not every individual in the world will run a nostr relay, just as not everyone will control a UTXO or run a Lightning node), but it’s nevertheless better than the status quo.

Do you have a project you’re working on? Or ideas how to tackle scenario B?