Let’s calculate how much it would cost to put all of nostr into neo4j 🤓

We can use nostr.band for reference.

Discussion

Assume the events are kept in a performant key-value db like LMDB. No need to load the content field into neo4j. Maybe not the tags either.

I would store the event jsons in nodes at first, but it's worth it to compare how storing them separately would affect performance.

Suppose each :NostrEvent node were limited to just a few properties: eventId, created_at, author. +/- tags. This would reduce the db size by quite a lot.
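To make the slimmed-down node concrete, here is a rough sketch of projecting a full event onto just those properties. The field names `id`, `pubkey` and `created_at` come from the standard nostr event format; the function name `slim_event` and the choice to serialize tags as a JSON string are assumptions for illustration, not anything settled in this thread.

```python
import json


def slim_event(event: dict, keep_tags: bool = False) -> dict:
    """Project a full nostr event onto the few properties a :NostrEvent node would carry.

    Hypothetical helper: content and sig are deliberately dropped, on the
    assumption that the full event stays in the key-value store.
    """
    node = {
        "eventId": event["id"],
        "created_at": event["created_at"],
        "author": event["pubkey"],
    }
    if keep_tags:
        # Neo4j properties cannot hold nested lists, so as one possible
        # workaround the tags are stored as a single JSON string.
        node["tags"] = json.dumps(event.get("tags", []))
    return node
```

The resulting dict could be passed as parameters to whatever statement creates the node; the full JSON would still be looked up by `eventId` in LMDB when needed.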

Yeah. Strfry does the heavy lifting for storing events already. Neo4j just needs to store the relationships between them to run WoT algorithms.

Yup. Pairing LMDB and neo4j together makes a lot of sense. If you want to do keyword search through content, pull from LMDB. Then to filter and sort the results, that’s what neo4j is for.
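A rough sketch of that division of labor, assuming the keyword matches have already been pulled from LMDB as parsed events and the graph side has produced a pubkey-to-hops mapping. The function name `filter_and_sort`, the `max_hops` cutoff and the sort order are all hypothetical; in practice the filtering would be a graph query rather than a Python dict lookup.

```python
def filter_and_sort(matches: list[dict], hops: dict[str, int], max_hops: int = 2) -> list[dict]:
    """Keep only matches whose author is within max_hops, closest authors first.

    matches: events returned by the keyword search (assumed to come from LMDB).
    hops:    pubkey -> number of follow-hops from the viewer (assumed to come
             from the graph db). Authors missing from hops are treated as
             outside the network and dropped.
    """
    trusted = [
        ev for ev in matches
        if hops.get(ev["pubkey"], max_hops + 1) <= max_hops
    ]
    # Sort by distance in the follow graph, then newest first within a distance.
    return sorted(trusted, key=lambda ev: (hops[ev["pubkey"]], -ev["created_at"]))
```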

I plan on adding compact wot graph support to nostrdb (which is lmdb). Seems like an obvious win for this state to be embedded in nostr clients.

neo4j? Some other FOSS graph db? Or custom code the graph?

nostrdb is a custom embeddable nostr database/relay built on lmdb. It’s the engine that powers damus ios and damus notedeck. I want it to automatically calculate follow graphs/web of trust when processing contact events, so this state is readily available and queryable.

I listened to your recent talk with nostr:npub1jlrs53pkdfjnts29kveljul2sm0actt6n8dxrrzqcersttvcuv3qdjynqn on nostr:npub1mlcas7pe55hrnlaxd7trz0u3kzrnf49vekwwe3ca0r7za2n3jcaqhz8jpa and I’m excited to see where you go with notedeck 💜

If you want notedeck to calculate follow graphs / WoT, then you’re going to want to use a graph db like neo4j, which is what I use to calculate your personalized Follows Network in under half a minute. It spits out a json with 180k pubkeys and the number of follow-hops between you and each pubkey. This is separate from calculating personalized PageRank, which it also does in about 15 seconds.
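For anyone wondering what the hop-count output looks like, here is a minimal in-memory sketch using a plain breadth-first search. This is not how neo4j does it internally and not the query described above; the `follows` mapping and the function name `follow_hops` are assumptions for illustration only.

```python
from collections import deque


def follow_hops(follows: dict[str, set[str]], root: str) -> dict[str, int]:
    """Return pubkey -> number of follow-hops from root, via breadth-first search.

    follows maps each pubkey to the set of pubkeys it follows (hypothetical
    in-memory stand-in for the follow graph). Pubkeys that cannot be reached
    from root are left out of the result.
    """
    hops = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for followed in follows.get(current, ()):
            if followed not in hops:
                # First visit in BFS order is the shortest path.
                hops[followed] = hops[current] + 1
                queue.append(followed)
    return hops
```

The returned dict has the same shape as the json described above: one entry per reachable pubkey with its distance from you.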

https://fountain.fm/episode/A49mWX658yQqbvOU9M98

I have lmdb, I can build up these relations myself using lmdb indices. It doesn’t really make sense to pull in a new database just for this feature.

Actually, it doesn't make sense to recreate Neo4j's graph data structure and query language. It's more mature than you think.

That seems like a lot of work, maybe eventually. Just gonna keep the api relatively simple: a function that gives wot score between two pubkeys

I agree. Neo4j has a nontrivial learning curve and we can’t all do all the things.

I think the ideal scenario would be for devs who are already familiar with Neo4j to put together some ETL tools to sync LMDB with Neo4j. Something that can be readily paired with notedeck, strfry, khatru, etc.

I'd need to do some research on ingesting events into neo4j. It shouldn't be much more than running a relay in terms of storage space. Where you'll pay the most is in CPUs. On Amazon, the difference between 2 and 4 CPUs looks like $50/mo. I'm sure you could get good performance on a small or medium-sized web of trust on their medium 2 CPU / 4 GB instances for $15-20/mo.

For all of Nostr, a Primal-style centralized cache with lots of users, I'd guess you could spend $50-100 or more a month.

I just looked, and the free edition allows 4 CPUs max, 34 billion nodes, and no clustering. Makes sense price-wise, because as soon as you're paying for more than 4 CPUs in hosting, you're big enough to not gawk at the price of an enterprise license.

Imagine building a personalized WoT relay using community edition neo4j. Fully FOSS. Personalized reputation scores. Knowledge graph to organize the content that you care about the most.

One could do a lot with 34 billion nodes.

I got strfry and neo4j playing nice on a server together. Next step is ETL pipeline from strfry’s LMDB into neo4j.
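One piece of that pipeline can be sketched independently of how the events get out of LMDB: turning kind 3 contact lists into follow edges. This assumes the events are already available as parsed JSON dicts; the function name `follow_edges` is hypothetical, and neither the read from strfry nor the write into neo4j is shown.

```python
from typing import Iterable, Iterator


def follow_edges(events: Iterable[dict]) -> Iterator[tuple[str, str]]:
    """Yield (follower, followed) pubkey pairs from kind 3 contact-list events.

    Kind 3 is replaceable, so only the newest contact list per author is used.
    """
    latest: dict[str, dict] = {}
    for ev in events:
        if ev.get("kind") != 3:
            continue
        prev = latest.get(ev["pubkey"])
        if prev is None or ev["created_at"] > prev["created_at"]:
            latest[ev["pubkey"]] = ev

    for author, ev in latest.items():
        seen: set[str] = set()
        for tag in ev.get("tags", []):
            # A follow is a "p" tag whose second element is the followed pubkey.
            if len(tag) >= 2 and tag[0] == "p" and tag[1] not in seen:
                seen.add(tag[1])
                yield (author, tag[1])
```

Each pair would become one follow relationship between two pubkey nodes on the neo4j side, which is all the graph needs for hop counts or PageRank.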

Ofc, lots of relays use LMDB, so the above pipeline could be easily applied to khatru, nostrdb, etc.

Perhaps nostr:npub1fvmadl0mch39c3hlr9jaewh7uwyul2mlf2hsmkafhgcs3dra6dzqg6szfu + neo4j … 🤔

nostr:npub10npj3gydmv40m70ehemmal6vsdyfl7tewgvz043g54p0x23y0s8qzztl5h