I’m thinking a client like primal could use standard PageRank to exclude spam from search and trending results. Recalculate scores periodically; once an hour should be more than enough. Should be relatively easy. Meanwhile, we should build a personalized WoT relay where WoT is the _personalized_ PageRank score, which, as you point out, should take about the same resources to calculate as standard PageRank, and use that score to exclude spam from your relay. Then we can discuss which algos to add next. I would propose GrapeRank for a variety of reasons. But we take baby steps.
Discussion
A better approach imo is what's described in this paper: http://snap.stanford.edu/class/cs224w-readings/bahmani10pagerank.pdf
The tradeoff is that the values are approximated, but the big advantage is speed: instead of 14s, recomputing PageRank takes roughly half a second. The same holds for personalized PageRank.
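For intuition, the core idea behind that paper's approach is Monte Carlo estimation: approximate PageRank as the visit frequency of many short random walks, which can then be maintained incrementally as edges change. The sketch below (a toy, not the paper's incremental algorithm; graph and parameters are made up) shows only the approximation step:

```python
import random
from collections import Counter

def monte_carlo_pagerank(graph, alpha=0.85, walks_per_node=1000, seed=42):
    """Approximate PageRank by counting visits of short random walks.

    graph: dict mapping node -> list of out-neighbors.
    Each walk starts at a node; at every step it stops with probability
    (1 - alpha) or moves to a random out-neighbor. Visit frequencies
    converge to PageRank as the number of walks grows.
    """
    rng = random.Random(seed)
    visits = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            node = start
            visits[node] += 1
            while rng.random() < alpha and graph[node]:
                node = rng.choice(graph[node])
                visits[node] += 1
    total = sum(visits.values())
    return {n: visits[n] / total for n in graph}

# Toy follow graph: a, b, and d all follow c, so c should score high.
g = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = monte_carlo_pagerank(g)
```

The speed win in the paper comes from storing the walk segments and only re-routing the walks that pass through changed edges, instead of re-walking the whole graph; the cost is the sampling error you accept.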
Interesting. Worth looking into. I don’t see fast PageRank in the neo4j library but maybe it’s just hidden. And we’d have to examine it closely and ask whether the fast algo is more susceptible to being gamed by bad actors.
And a third option: instead of calculating standard PageRank from scratch each time, use previous PageRank scores and iterate from there with updated data.
I already tried the third option with the Python library networkx's standard PageRank implementation, and the speed difference was almost zero.
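To make the "third option" concrete, here is a plain power-iteration sketch with an optional warm start (graph and names are illustrative; networkx exposes the same idea through its `nstart` parameter). Warm-starting cuts the number of iterations when the graph has barely changed, but each iteration still touches every edge, which is consistent with the observation that the wall-time savings can be small:

```python
def pagerank_power_iteration(graph, alpha=0.85, tol=1e-10, start=None):
    """Plain power iteration for PageRank.

    `start` warm-starts the iteration from previous scores.
    Returns (scores, iterations) so the warm-start effect is visible.
    """
    nodes = list(graph)
    n = len(nodes)
    ranks = dict(start) if start else {v: 1.0 / n for v in nodes}
    norm = sum(ranks.values())
    ranks = {v: r / norm for v, r in ranks.items()}  # ensure a distribution
    for it in range(1, 1001):
        nxt = {v: (1.0 - alpha) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = alpha * ranks[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # dangling node: spread its mass uniformly
                for w in nodes:
                    nxt[w] += alpha * ranks[v] / n
        err = sum(abs(nxt[v] - ranks[v]) for v in nodes)
        ranks = nxt
        if err < tol:
            break
    return ranks, it

g = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["a"]}
cold, cold_iters = pagerank_power_iteration(g)
# Same graph, warm-started from the previous result: converges almost instantly.
warm, warm_iters = pagerank_power_iteration(g, start=cold)
```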
It's not available for neo4j; it is available in memgraph, but I checked the algo and it's horribly implemented. It returns wrong results (I've opened an issue on their GitHub and talked to their team; they confirmed it's wrong).
I’m glad you’re taking a look at memgraph - I know it exists but I’ve never tried it out.
Have you tried any other foss centrality algo libraries?
My current plan is to expose an API that will provide a variety of WoT scores: DoS (number of hops), personalized PageRank, GrapeRank, etc. Then clients can access the API and use different scores for different purposes.
The API can provide scores that are “global” or personalized, with personalized being more cypherpunk. Results can be scores of one pubkey at a time or can provide scores for a batch of pubkeys. Example: the top 100 pubkeys by GrapeRank score, not including pubkeys you’re already following, which could define a feed that you’d use when you’re bored with your existing feed. And if you get bored with that, maybe switch to the next 100 on the list, and so on.
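A possible request/response shape for such an API might look like the sketch below. Everything here is hypothetical: the field names, the example npubs, and the score values are invented for illustration, not a spec.

```python
# Hypothetical request: which scores, relative to which observer, for which targets.
request = {
    "scores": ["personalizedPageRank", "grapeRank", "hops"],
    "observer": "npub1alice",  # personalized scores are relative to this key
    "targets": ["npub1bob", "npub1carol"],  # or omit to ask for a ranked batch
}

# Hypothetical response: one score bundle per requested pubkey.
response = {
    "npub1bob":   {"personalizedPageRank": 0.0042, "grapeRank": 0.61, "hops": 2},
    "npub1carol": {"personalizedPageRank": 0.0007, "grapeRank": 0.18, "hops": 3},
}

# Client side: build the "bored feed" described above by ranking on GrapeRank
# and dropping pubkeys the observer already follows.
following = {"npub1bob"}
discover = sorted(
    (pk for pk in response if pk not in following),
    key=lambda pk: response[pk]["grapeRank"],
    reverse=True,
)[:100]
```

Bundling several scores per pubkey in one response would let clients mix signals (e.g. require both a hop limit and a GrapeRank floor) without extra round trips.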
It should be less susceptible to manipulation than recomputing with high precision every hour.
In that hour a popular npub can get hacked and start doing damage.
So in that hour the PageRank distribution can change a lot. With the online version the error is bounded; it never gets worse than that.
Interesting. You’re imagining that a hacked high profile account could follow an army of bots and elevate their PageRank scores, and this would be especially bad if PageRank scores are updated frequently or continuously. Seems to me there are a variety of strategies we could use to mitigate that scenario. For simplicity, perhaps we should just deal with that problem when it happens. So I’m thinking: start out with whatever is the simplest, quickest and easiest (for devs) method that works, see what benefit it brings, and see what problems come up. For me, that would be vanilla PageRank using neo4j with the graph data science plugin because:
- PageRank is the most well-known centrality algo
- neo4j and the relevant plugin are FOSS
- the existing codebase works out of the box