I’m thinking a client like primal could use standard PageRank to exclude spam from search and trending results. Recalculate scores periodically; once an hour should be more than enough. Should be relatively easy. Meanwhile, we should build a personalized WoT relay where WoT is the _personalized_ PageRank score, which, as you point out, should take about the same resources to calculate as standard PageRank, and use that score to exclude spam from your relay. Then we can discuss which algos to add next. I would propose GrapeRank for a variety of reasons. But we take baby steps.
Discussion
A better approach imo is what's described in this paper: http://snap.stanford.edu/class/cs224w-readings/bahmani10pagerank.pdf
The tradeoff is that the values are approximated, but the big advantage is speed: instead of 14s, recomputing PageRank takes roughly half a second. The same holds for personalized PageRank.
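For intuition, the core idea behind that paper's approach is Monte Carlo estimation: approximate PageRank as the visit frequency of many short random walks, which can then be maintained incrementally as edges change. The sketch below (a toy, not the paper's incremental algorithm; graph and parameters are made up) shows only the approximation step:

```python
import random
from collections import Counter

def monte_carlo_pagerank(graph, alpha=0.85, walks_per_node=1000, seed=42):
    """Approximate PageRank by counting visits of short random walks.

    graph: dict mapping node -> list of out-neighbors.
    Each walk starts at a node; at every step it stops with probability
    (1 - alpha) or moves to a random out-neighbor. Visit frequencies
    converge to PageRank as the number of walks grows.
    """
    rng = random.Random(seed)
    visits = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            node = start
            visits[node] += 1
            while rng.random() < alpha and graph[node]:
                node = rng.choice(graph[node])
                visits[node] += 1
    total = sum(visits.values())
    return {n: visits[n] / total for n in graph}

# Toy follow graph: a, b, and d all follow c, so c should score high.
g = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = monte_carlo_pagerank(g)
```

The speed win in the paper comes from storing the walk segments and only re-routing the walks that pass through changed edges, instead of re-walking the whole graph; the cost is the sampling error you accept.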
Interesting. Worth looking into. I don’t see fast PageRank in the neo4j library but maybe it’s just hidden. And we’d have to examine it closely and ask whether the fast algo is more susceptible to being gamed by bad actors.
And a third option: instead of calculating standard PageRank from scratch each time, use previous PageRank scores and iterate from there with updated data.
I already tried the third option with the Python library networkx's standard PageRank implementation, and the speed difference was almost zero.
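To make the "third option" concrete, here is a plain power-iteration sketch with an optional warm start (graph and names are illustrative; networkx exposes the same idea through its `nstart` parameter). Warm-starting cuts the number of iterations when the graph has barely changed, but each iteration still touches every edge, which is consistent with the observation that the wall-time savings can be small:

```python
def pagerank_power_iteration(graph, alpha=0.85, tol=1e-10, start=None):
    """Plain power iteration for PageRank.

    `start` warm-starts the iteration from previous scores.
    Returns (scores, iterations) so the warm-start effect is visible.
    """
    nodes = list(graph)
    n = len(nodes)
    ranks = dict(start) if start else {v: 1.0 / n for v in nodes}
    norm = sum(ranks.values())
    ranks = {v: r / norm for v, r in ranks.items()}  # ensure a distribution
    for it in range(1, 1001):
        nxt = {v: (1.0 - alpha) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = alpha * ranks[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # dangling node: spread its mass uniformly
                for w in nodes:
                    nxt[w] += alpha * ranks[v] / n
        err = sum(abs(nxt[v] - ranks[v]) for v in nodes)
        ranks = nxt
        if err < tol:
            break
    return ranks, it

g = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["a"]}
cold, cold_iters = pagerank_power_iteration(g)
# Same graph, warm-started from the previous result: converges almost instantly.
warm, warm_iters = pagerank_power_iteration(g, start=cold)
```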
It's not available for neo4j; it is available in memgraph, but I checked the algo and it's horribly implemented. It returns wrong results (I've opened an issue on their GitHub and talked to their team; they confirmed it's wrong).
I’m glad you’re taking a look at memgraph - I know it exists but I’ve never tried it out.
Have you tried any other foss centrality algo libraries?
My current plan is to expose an API that will provide a variety of WoT scores: DoS (number of hops), personalized PageRank, GrapeRank, etc. Then clients can access the API and use different scores for different purposes.
The API can provide scores that are “global” or personalized, with personalized being more cypherpunk. Results can be scores of one pubkey at a time or can provide scores for a batch of pubkeys. Example: the top 100 pubkeys by GrapeRank score, not including pubkeys you’re already following, which could define a feed that you’d use when you’re bored with your existing feed. And if you get bored with that, maybe switch to the next 100 on the list, and so on.
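A possible request/response shape for such an API might look like the sketch below. Everything here is hypothetical: the field names, the example npubs, and the score values are invented for illustration, not a spec.

```python
# Hypothetical request: which scores, relative to which observer, for which targets.
request = {
    "scores": ["personalizedPageRank", "grapeRank", "hops"],
    "observer": "npub1alice",  # personalized scores are relative to this key
    "targets": ["npub1bob", "npub1carol"],  # or omit to ask for a ranked batch
}

# Hypothetical response: one score bundle per requested pubkey.
response = {
    "npub1bob":   {"personalizedPageRank": 0.0042, "grapeRank": 0.61, "hops": 2},
    "npub1carol": {"personalizedPageRank": 0.0007, "grapeRank": 0.18, "hops": 3},
}

# Client side: build the "bored feed" described above by ranking on GrapeRank
# and dropping pubkeys the observer already follows.
following = {"npub1bob"}
discover = sorted(
    (pk for pk in response if pk not in following),
    key=lambda pk: response[pk]["grapeRank"],
    reverse=True,
)[:100]
```

Bundling several scores per pubkey in one response would let clients mix signals (e.g. require both a hop limit and a GrapeRank floor) without extra round trips.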
It should be less susceptible to manipulation than recomputing with high precision every hour.
In that hour a popular npub can get hacked and start doing damage.
So in that hour the PageRank distribution can change a lot. With the online version the error is bounded; it never gets worse than that.
Interesting. You’re imagining that a hacked high profile account could follow an army of bots and elevate their PageRank scores, and this would be especially bad if PageRank scores are updated frequently or continuously. Seems to me there are a variety of strategies we could use to mitigate that scenario. For simplicity, perhaps we should just deal with that problem when it happens. So I’m thinking: start out with whatever is the simplest, quickest and easiest (for devs) method that works, see what benefit it brings, and see what problems come up. For me, that would be vanilla PageRank using neo4j with the graph data science plugin because:
- PageRank is the most well-known centrality algo
- neo4j and the relevant plugin are FOSS
- the existing codebase works out of the box