Nope. But I am now intrigued to hear more.

I'm familiar with what neo4j is.

Reply to this note

Please Login to reply.

Discussion

I have a prototype of a project, Brainstorm, which calculates personalized trust metrics and makes them easily accessible by clients throughout nostr. The prototype uses neo4j to keep track of NostrUser nodes connected by follows, mutes and reports, but we’d like to build a general purpose graph database nostr relay capable of managing all event types, with node types that would include User and Event at a minimum; maybe also Relay, Client, etc. We’d like input from all corners.

First question: is neo4j the best choice? Are there other dbs like GunDB we should consider?

Second question: how can we come together as a community and build it? nostr:npub10mtatsat7ph6rsq0w8u8npt8d86x4jfr2nqjnvld2439q6f8ugqq0x27hf is working on one, maybe there are others who can contribute.

Want to set up a call with the nostr:npub1healthsx3swcgtknff7zwpg8aj2q7h49zecul5rz490f6z2zp59qnfvp8p team and tell us your thoughts?

nostr:npub1gcxzte5zlkncx26j68ez60fzkvtkm9e0vrwdcvsjakxf9mu9qewqlfnj5z

nostr:npub1t8pwzkkhhs94e9acgw9jwca9csyl7a4tnpdttu05039um5j7d6xs72gkvf

nostr:npub1hqaz3dlyuhfqhktqchawke39l92jj9nt30dsgh2zvd9z7dv3j3gqpkt56s

nostrdb is going to support a custom WoT table which is graphy i guess

It would need to implement index free adjacency to be a graph db. Meaning an edge can be traversed from one node to the next directly in memory without referring to an index.

i don't really need a full blown graph db for wot though. just a few indices (followers of A, who A follows) for all A. plus a few other things

Follows are a weak trust signal, but it's better than nothing.

right, there would be more to the algo obviously. but just those two indices would already be really useful for lots of things. the full algo would be personalized pagerank + custom metrics (zaps, reactions, etc)

Have you thought about how to make those metrics available to other nostr clients? I’m thinking specifically of nostr:npub1gcxzte5zlkncx26j68ez60fzkvtkm9e0vrwdcvsjakxf9mu9qewqlfnj5z‘s Trusted Assertions NIP.

not really no

Are you maintaining a separate relational db so you can look up Alice’s followers without cycling through every kind 3 note in your db?

no that would be retarded. i would process kind 3s as they come in and update the who follows who index

And the who follows who index is in a relational DB? Or LMDB?

nostrdb is lmdb. you don't need relational for anything. its all key values anyways.

even sql builds its indices on btrees... relational or lmdb its still btrees all the way down

SurrealDB uses GraphRAG, which I’m not familiar with.

Suppose I wanted a list of all pubkeys 3 hops from Alice by follows. That would be a nightmare to implement in a relational database. In neo4j in my hands, it’s very performant.

Even something more simple, like give me a list of all of Alice’s followers, takes a long time if all you have is a bunch of events in strfry. But with neo4j you have it in a snap.

I’ll be pleasantly surprised if GraphRAG can handle either of those queries performantly.

Nostrdb uses LMDB which is fast, but it’s not a graph database.

how much do you want to bet a custom WoT algo on lmdb beats a generic graph database. I would put money on this

in terms of perf

If you want to limit the WoT algos to no more than one or MAYBE two traversals, and if you want the nostr user base to remain minuscule, then stick with LMDB.

If you want sophisticated WoT algos and if you want the nostr user base to grow, then graph databases will outperform LMDB.

we'll see !

Are you aware of the advantages of graph databases in terms of performance for certain categories of queries?

I am aware and I can confidently say that an optimized KV store will outperform a graph DB which has much more layers to traverse.

Because WoT is mostly BFS with some additions

its hard to beat a few btree lookups hot in a compact binary cache

Graph databases are all just memory pointers everywhere. There is no "lookup". The "btree" is the graph.

Btree is the index. The nodes/relationships are stored as records. Neo4j 5 uses a more efficient method than 4, but essentially the nodes store the exact memory location of the nodes they're related to, and you would just jump your pointer to that location to read it, same with the relationship props.

That's what index-free means. You can traverse the graph without using any indices. Which is also why no other database can compete in terms of actual graph performance.

But I'm sure that the key value bros can do better 🤭

Index free adjacency is the key. For sufficiently deep traversals, native graph databases like neo4j have a theoretical advantage over non-native graph databases.

For a traversal of depth d in a graph with branching factor b (average degree), a native database’s time complexity is roughly O(b^d). But for a non-native db we have to add the index cost: O(b^d * log n) where n is the size of the index (e.g. n = the sum of nodes and edges).

A depth of 3-5 is where it becomes noticeable. WoT won’t work if we tie our hands by a limit on traversal depth.

log n isn’t as bad as you think

traversing n = 1B is 2.5x as expensive as n = 10k

now, what cost does this “index free” approach bring compared to indexes? it pushes the storage engine to add a 2nd translation layer for pointers and/or implement compactions with pointer rewriting

all while making your codebase more complicated for no reason

the act of calculating PageRank is significantly more expensive than a few DB lookups.

I think that arguing about speed and efficiency is missing the point, and why all conversations about graph data go to shit.

Use the schema that fits the domain. If your data is graph data, use a dedicated graph database.

Neo4j is still the correct tool for Nostr. Because Nostr data is graph data. It may not outperform the specific, narrow use cases you think up to dismiss it, but that doesn't matter to me. You need to reframe the way you think about data to see the power of graph representations. And everything you say tells me you can't do that.

Anything can be a graph database. Even relational databases are one form of graph. We don’t need large, unnecessarily complicated tools to solve the Nostr graph problem.

What does Neo4j bring to the table compared to standard Nostr filters that makes it so useful? That it makes complicated queries look cheap and easy?

A graph database is one of the most general types of databases (even more than RDBMS) and is by consequence one of the hardest to optimize.

It simply does not make sense for Nostr, where purpose specific databases can allow for graph like queries but much more optimized for the use case (and unlock new capabilties due to that).

That you can place Nostr events in it in a form that is natural and organic. That you can let the data form itself into the structures that emerge organically by the network that creates it. And that it gives you the right tools to find things you couldn't imagine before you started.

That is why graph data is powerful, and why it outstrips tables and other databases in my mind. There's power in it's inefficiency because you can build on it indefinitely. You set yourself up to build things you can't imagine instead of constraining yourself to what you're building today.

A product is no good if it is too expensive to run or use.

What *can you really do* with graphs that you can’t do with REQs, even if multiple of them?

We'll have to see. Like I said, you have to let complex structures emerge before you decide what to do with them.

Native graph dbs can execute *performant* path queries that are *long and complicated*.

So what we have here is a tradeoff. IFA (native graph db) avoids the O(log n) cost. Also, it scales well in the sense that query time doesn’t degrade with increasing total graph size, only with the subgraph queried. So we weigh that against the storage complexities introduced by IFA (translation layers and pointer rewrites) as you point out.

For apps that are traversal-dominant with more reads than writes, IFA wins those tradeoffs. And what we’re building is a knowledge graph curated with the assistance of your WoT. By its nature, it’s going to get bigger and more traversal-heavy as it gets more sophisticated. And the tasks that will really push its abilities are going to be read-heavy. (Imagine searching a knowledge graph to discover surprising but useful topological similarities between subgraph A and subgraph B. Also known as “making an analogy”.)

At least, that’s my current thought process. It’s worth keeping in mind that for write-heavy scenarios with short traversals, the tradeoffs may favor non-native graph databases. And there are non-native graph databases that are built on LMDB, like TuGraph and HelixDB.

But does this so called “log n” scaling even matter compared to the real world in real numbers?

A performance decrease of only 3x for a minimal part (reading the graph instead of analyzing it) at a scale of over 100000x is insanely good, compared to all those compactions that will be needed and the tremendous write workload at that scale...

A billion users doesn’t mean n = a billion. If you’re looking at how people interact with each other, you’ll need a lot of indices for a lot of events.

It means 1 billion nodes though in an index.

If I want to discover instances where Alice reacts to a kind 1 note authored by Bob, I’ll need indices for content in addition to indices for users.

Or suppose I want a chart of total zap amount in sats as a function of how far down a note is on a thread. Now suppose I only want to count zaps sent by frens (mutual follows) of the thread initiator. Or suppose I want a query even more complicated than that.

We’ll need indices for all content events, all zaps, all nostr events. Trillions.

You take all zap events and use some fast OLAP DB to crunch through this data instantly instead of wasting your time with a graph DB

This is a simple merge-join on 2 indexes that is cheap

Very specialized wot would absolutely do better on a purpose built stack.

Generic wot that is flexible and user friendly would do better on the only graph database worth using, which is neo4j.

OK, from my quick research it looks like surrealDB is basically a relational db that simulates a graph db but still uses indices. Index-free adjacency is what makes multi traversal queries so much more performant in a graph db like neo4j.

TBH. I'm not super savvy on graph stuff, just familiar with it. Seems like the thread got some traction with people smarter than myself though. 🙃

Would be great if this would publish Nostr events that clients could query. It could be sharded in multiple replaceable events, e.g. 100 or 1000 "p" tags in each event. So to get the WoT score for a bunch (or one) pubkey, it would require only a quick query and returned a single event with the data.

These events could potentially be broadcasted to wide amount of relays.

Currently, the Brainstorm prototype publishes personalized trust metrics as kind 30282 events as per the Trusted Assertions NIP. Each event publishes one event with multiple trust metrics (personalized GrapeRank, personalized PageRank, hops, verified followers count, and a few more) for one pubkey, as seen from the perspective of another pubkey’s WoT.

Calculating the scores is the hard part. Publishing them is relatively easy. It would not be difficult to publish a modified Trusted Assertion that contains scores for a list of pubkeys, not just one. Maybe a modification of a NIP-51 list.

*30382

Whose perspective are you using for the scores?

It's NIP-85, it's not personalized by service provider based. Hopefully if there is a provider or specification for getting from user perspective, I'll add that.

https://github.com/nostr-protocol/nips/pull/1534

NIP-85 allows scores to be personalized. If Alice is on nostr:npub16x7nxvehx0wvgy0sa6ynkw9c2ghuph3z0ll5t8veq3xwm8n9tqds6ka44x, you check to see whether she’s authored a kind 10040 note. If so, it tells you how to find 30382 events with metrics personalized to her perspective (by filtering on the author she trusts to publish her metrics). If she hasn’t authored 10040, you can use a default perspective, which could be your own user account, or could be nostr:npub16x7nxvehx0wvgy0sa6ynkw9c2ghuph3z0ll5t8veq3xwm8n9tqds6ka44x's npub. To set up default Trusted Assertions, go to straycat.brainstorm.social, sign in with nostr:npub16x7nxvehx0wvgy0sa6ynkw9c2ghuph3z0ll5t8veq3xwm8n9tqds6ka44x, sign up, and click yes to activate Trusted Assertions, which broadcasts your 10040 event. I’ll let you know once all the 30382 events are published.

Thanks for explaining. Right now it's just a local setting (the relay), I will write an 10040 later on, but I doubt I'll support full customizability to it, especially the specification allows multiple relays of same rank type. I don't know what users with multiple relays would expect, if all ranks should be shown individually for reach relay, or if the first one should be picked and used.

We’ll be announcing a NosFabrica WoT hackathon soon. Incorporation of Trusted Assertions into clients like yours would be a great way to participate. We’ll love it if you decide to join in! 😃

#wotathon

Yes, I definitely will join!

Waste of time build a self hostable forum that can run locally from an android app.

I still figure it out how a GrapeRank algorithm can be used to calculate a trust score 🤔