How do you calculate the similarity score? Is it PageRank or something like it?

And are you using a graph db like neo4j? (Which has powerful tools for calculation of various similarity scores)

Reply to this note

Please Login to reply.

Discussion

I'm a few steps behind, my project is still a bit primitive haha

Right now, it only performs a simple breadth-first search (BFS) on the graph, where I calculate the similarity between user names and the search term using an algorithm I implemented based on this paper: http://www.catalysoft.com/articles/StrikeAMatch.html

I'm not using any database; I implemented everything from scratch in C to achieve the highest possible performance. By using the paper above and implementing it in the most optimized way possible with bitwise operations and everything, the similarity calculation became orders of magnitude faster than all existing methods—about 300% faster than cosine similarity, for example.

But it's still a pretty primitive project, as I mentioned. Right now, it's just a simple search in a graph and list. Later on, depending on the needs that come up, I might implement something like PageRank when considering user posts. For now, the search is just done by profile.

So do you query relays for follows and build the graph on the fly at the time of the search request?

In the case of the relay search, it doesn't involve a graph—it's just a simple list, since the number of relays is very small and is not likely to increase significantly. The similarity calculation is the same, but it's done by iterating over a simple array.

On the front end, a POST request is made to the relay, which eliminates any relays that don't respond to the request, listing only the active relays. This ensures that only active relays will appear in the search results. Another thing I implemented is a 2-second timeout for the requests made to the relays, which also ensures that only the relays closest to you, and thus more relevant, will appear.

I only use a graph for the users. The relays are mapped in kind 3 events, which contain a list of relays and are sent to the engine that saves them in a list loaded into RAM. The graph is also loaded into RAM at the application startup, and everything runs in memory.

nostr:npub1manlnflyzyjhgh970t8mmngrdytcp3jrmaa66u846ggg7t20cgqqvyn9tn we should dig into this to compare it to what we are doing