Replying to Pip the WoT guy

A better approach imo is the one described in this paper: http://snap.stanford.edu/class/cs224w-readings/bahmani10pagerank.pdf

The tradeoff is that the values are approximated, but the big advantage is speed. Instead of 14s, recomputing PageRank takes roughly half a second.

And similarly for personalized PageRank.
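(For context, the core idea in that paper is Monte Carlo PageRank: keep a handful of random walks per node, read scores off the visit counts, and when an edge changes only patch the walks that pass through the affected node. Below is a rough sketch of just the estimation step, not the paper's code; the function name and parameters are made up, and G is assumed to be a networkx-style directed follow graph.)

```python
import random
from collections import Counter

def monte_carlo_pagerank(G, walks_per_node=16, alpha=0.85, seed=0):
    """Rough Monte Carlo PageRank estimate: run a few random walks from every
    node, stop with probability (1 - alpha) at each step, and use visit
    counts as scores. The paper's speedup comes from *storing* these walks
    and updating only the affected ones, which this sketch skips."""
    rng = random.Random(seed)
    visits = Counter()
    for start in G:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                succs = list(G.successors(node))
                if not succs or rng.random() > alpha:
                    break  # dangling node or teleport: end this walk
                node = rng.choice(succs)
    total = sum(visits.values())
    return {node: visits[node] / total for node in G}
```

Personalized scores fall out of the same machinery: start all the walks from the observer's pubkey instead of from every node, and the visit counts approximate personalized PageRank from that point of view.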

Interesting. Worth looking into. I don’t see fast PageRank in the neo4j library but maybe it’s just hidden. And we’d have to examine it closely and ask whether the fast algo is more susceptible to being gamed by bad actors.

And a third option: instead of calculating standard PageRank from scratch each time, use previous PageRank scores and iterate from there with updated data.

Discussion

I already tried the third option with the Python library networkx's standard implementation of PageRank, and the speed difference was almost zero.
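(For reference, that warm start maps onto networkx's nstart parameter. A minimal sketch, with the graph construction, damping factor, and tolerance as placeholders:)

```python
import networkx as nx

# The "third option": seed the power iteration with the previous run's
# scores via nstart instead of the default uniform starting vector.
G = nx.DiGraph()   # follow graph: edge A -> B means A follows B
# ... load edges from the follow lists ...

previous_scores = nx.pagerank(G, alpha=0.85)

# ... the graph gets updated with new follows/unfollows ...

# Give every current node a starting value; brand-new nodes get a small default.
nstart = {n: previous_scores.get(n, 1.0 / max(G.number_of_nodes(), 1)) for n in G}
updated_scores = nx.pagerank(G, alpha=0.85, nstart=nstart, tol=1e-6)
```

If the graph has changed much between runs, the iteration still needs roughly the same number of passes to converge, which would be consistent with the near-zero speed difference observed.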

It's not available for neo4j; it is available in memgraph, buuuut I checked the algo and it's horribly implemented. It returns wrong results (I've opened an issue on their GitHub and talked to their team, and they confirmed it's wrong).

I’m glad you’re taking a look at memgraph - I know it exists but I’ve never tried it out.

Have you tried any other foss centrality algo libraries?

My current plan is to expose an API that will provide a variety of WoT scores: DoS (number of hops), personalized PageRank, GrapeRank, etc. Then clients can access the API and use different scores for different purposes.

The API can provide scores that are “global” or personalized, with personalized being more cypherpunk. Results can cover one pubkey at a time or a batch of pubkeys. Example: the top 100 pubkeys by GrapeRank score, not including pubkeys you’re already following, which could define a feed that you’d use when you’re bored with your existing feed. And if you get bored with that, maybe switch to the next 100 on the list, and so on.
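A hypothetical sketch of what those two kinds of responses could look like; the endpoint paths, field names, and values are illustrative assumptions, not a spec:

```python
# Single pubkey, personalized: e.g. GET /scores/{observer}/{target}
single_score = {
    "observer": "<observer pubkey>",   # omit or null for "global" scores
    "target": "<target pubkey>",
    "scores": {
        "dos": 2,                      # degrees of separation (follow hops)
        "personalizedPageRank": 0.00042,
        "grapeRank": 0.73,
    },
}

# Batch: e.g. GET /scores/{observer}/top?metric=grapeRank&limit=100&excludeFollows=true
top_by_graperank = {
    "observer": "<observer pubkey>",
    "metric": "grapeRank",
    "results": [
        {"pubkey": "<pubkey 1>", "score": 0.91},
        {"pubkey": "<pubkey 2>", "score": 0.88},
        # ... up to `limit` entries, skipping pubkeys the observer already follows
    ],
}
```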

Just networkx. I am building my own algos rn, because the algo in that paper requires storing a bunch of random walks in the DB and keeping them updated at all times.

your algos on GitHub?

The online version should be less susceptible to manipulation than recomputing with high precision every hour.

In that hour, a popular npub can get hacked and start doing damage.

So in that hour the PageRank distribution can change a lot. With the online version you have a bounded error, so it can never get worse than that.

Interesting. You’re imagining that a hacked high profile account could follow an army of bots and elevate their PageRank scores, and this would be especially bad if PageRank scores are updated frequently or continuously. Seems to me there are a variety of strategies we could use to mitigate that scenario. For simplicity, perhaps we should just deal with that problem when it happens. So I’m thinking: start out with whatever is the simplest, quickest and easiest (for devs) method that works, see what benefit it brings, and see what problems come up. For me, that would be vanilla PageRank using neo4j with the graph data science plugin because:

- PageRank is the best-known centrality algo

- neo4j and the relevant plugin are FOSS

- the existing codebase works out of the box
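A minimal sketch of that starting point, using the neo4j Python driver to call the Graph Data Science PageRank procedure. The connection details, the User/FOLLOWS schema, the projected graph name, and the pubkey property are assumptions about the data model, not the actual codebase:

```python
from neo4j import GraphDatabase

# Vanilla PageRank via the Graph Data Science plugin, driven from Python.
# URI, credentials, labels, and property names below are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Project the follow graph into the GDS in-memory catalog (once per refresh).
    session.run("CALL gds.graph.project('followGraph', 'User', 'FOLLOWS')")

    # Run standard PageRank and stream the scores back, highest first.
    result = session.run(
        """
        CALL gds.pageRank.stream('followGraph')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).pubkey AS pubkey, score
        ORDER BY score DESC
        LIMIT 100
        """
    )
    for record in result:
        print(record["pubkey"], record["score"])

    # Drop the projection so the next recompute starts from fresh data.
    session.run("CALL gds.graph.drop('followGraph')")

driver.close()
```

There is also a write mode (gds.pageRank.write) that stores each score as a node property instead of streaming it, which may fit an API backend better than recomputing on every request.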