What, in your experience, is the most useful, scalable, and compatible form of a database for large amounts of data, links, tags, etc?

Context: Saving stuff from the web, tagging it, organizing it, reading it with LLMs, and calling it later to maybe publish to Nostr, or Pubky, etc

#asknostr


Discussion

nostr

A nostr LAN, or a nostr e2e encrypted private group with only you in it.

The idea is that you can run the same analytics on your private LAN as you would on the public internet.
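Rough sketch of what I mean, assuming a private relay on localhost and a hypothetical `sign_event` helper (key handling and Schnorr signing are out of scope here); the event shape and the `["EVENT", ...]` frame follow NIP-01:

```python
import asyncio
import json
import time

import websockets  # pip install websockets

RELAY = "ws://localhost:7777"  # assumption: a private relay on your LAN


async def publish(note: str) -> None:
    # NIP-01 event skeleton; id, pubkey, and sig come from your signer.
    event = {
        "created_at": int(time.time()),
        "kind": 1,  # plain text note
        "tags": [["t", "archive"]],
        "content": note,
    }
    event = sign_event(event)  # hypothetical helper: fills id, pubkey, sig
    async with websockets.connect(RELAY) as ws:
        await ws.send(json.dumps(["EVENT", event]))
        print(await ws.recv())  # relay answers with an ["OK", ...] frame


asyncio.run(publish("saved article text goes here"))
```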

PostgreSQL

This

That would be my answer
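For what it's worth, a minimal sketch of how that could look with psycopg, using JSONB for metadata, a text array for tags, and a generated tsvector for full-text search (the table and column names are just illustrative):

```python
import psycopg  # pip install "psycopg[binary]"
from psycopg.types.json import Jsonb

conn = psycopg.connect("dbname=archive")  # assumption: a local 'archive' database

conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id   bigserial PRIMARY KEY,
        url  text,
        tags text[],
        meta jsonb,
        body text,
        tsv  tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
    )
""")
conn.execute("CREATE INDEX IF NOT EXISTS documents_tsv_idx ON documents USING gin (tsv)")

conn.execute(
    "INSERT INTO documents (url, tags, meta, body) VALUES (%s, %s, %s, %s)",
    ("https://example.com/post", ["nostr", "databases"],
     Jsonb({"source": "web"}), "full transcript or article text"),
)
conn.commit()

# Full-text query across everything saved so far.
rows = conn.execute(
    "SELECT url FROM documents WHERE tsv @@ plainto_tsquery('english', %s)",
    ("vector database",),
).fetchall()
print(rows)
```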

When dealing with large amounts of data there are always tradeoffs. We don't really have enough to go on yet. How much data? Gigabytes, terabytes, petabytes, or more? What kind of uptime do you need? Are your LLMs reading it on the fly, or do they need to query across the whole dataset at any given time?

There's no silver bullet. No single database or system is best at every use case.

This would be local, and I'd rather it deal with metadata, transcripts, PDFs, etc rather than an entire hard drive of files. Essentially a massive amount of text from various sources collected into one location.

And I'm aware no single database is best for every use case. I'm looking for one that would work for large amounts of data plus LLM integration (or how best that would look), which is why I'm on the hunt for options. Any ideas or tradeoffs you can offer are appreciated.

If you just want your LLMs to read the content then something like MongoDB would work well since it’s a document store. You can add your own metadata.

If you want to query against the data without iterating through the documents each time then you might want to tokenize your content and store in a vector database like Qdrant or similar.
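A rough sketch of that two-store layout with pymongo and qdrant-client; `embed()` is a stand-in for whatever embedding model you use, and names like "archive" and the 384-dim vector size are just assumptions:

```python
import uuid

from pymongo import MongoClient         # pip install pymongo
from qdrant_client import QdrantClient  # pip install qdrant-client
from qdrant_client.models import Distance, PointStruct, VectorParams


def embed(text: str) -> list[float]:
    """Stand-in: call your embedding model here (returns a 384-dim vector, say)."""
    raise NotImplementedError


mongo = MongoClient("mongodb://localhost:27017")
docs = mongo["archive"]["documents"]

qdrant = QdrantClient(host="localhost", port=6333)
if not qdrant.collection_exists("documents"):
    qdrant.create_collection(
        "documents",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )


def save(url: str, text: str, tags: list[str]) -> None:
    # The full document and metadata live in MongoDB...
    doc_id = docs.insert_one({"url": url, "tags": tags, "body": text}).inserted_id
    # ...while Qdrant holds only the vector plus a pointer back to the Mongo _id.
    qdrant.upsert(
        "documents",
        points=[PointStruct(
            id=str(uuid.uuid4()),
            vector=embed(text),
            payload={"mongo_id": str(doc_id), "url": url},
        )],
    )
```

Semantic lookups then hit Qdrant first and follow the `mongo_id` pointer back to the full document, so you never iterate the document store.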

Yeah, that's more what I had in mind. Vectorizing alongside the actual database of files, recalling it later with just pointers back to it. I'll look into Mongo + Qdrant and start playing around, thanks 🙏

Take a look at Neo4j. They have a tutorial that explains and explores what makes graphs useful.

cc nostr:nprofile1qyfhwumn8ghj7ctvvahjuat50phjummwv5q32amnwvaz7tm9v3jkutnwdaehgu3wd3skueqqyzu7we2xhgry2mknq8v7227yn7jguu9xhu3g90n6rtnjj3mpyq3ackdvvhl

we are using it in our setup for [Core Explorer](https://github.com/coreexplorer-org/core-explorer-kit)

Let me know if you want to look at it more closely.
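If it helps, a minimal sketch with the official Python driver, assuming a local Neo4j instance; the labels, relationship name, and credentials are illustrative:

```python
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))


def tag_document(url: str, tag: str) -> None:
    # MERGE is idempotent: nodes and the edge are created only if missing.
    with driver.session() as session:
        session.run(
            "MERGE (d:Document {url: $url}) "
            "MERGE (t:Tag {name: $tag}) "
            "MERGE (d)-[:TAGGED]->(t)",
            url=url, tag=tag,
        )


tag_document("https://example.com/post", "nostr")

# Find everything that shares a tag with a given document.
with driver.session() as session:
    result = session.run(
        "MATCH (d:Document {url: $url})-[:TAGGED]->(:Tag)<-[:TAGGED]-(other) "
        "RETURN DISTINCT other.url",
        url="https://example.com/post",
    )
    print([record["other.url"] for record in result])
```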

Kafka + lambda/kappa architecture.
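In a kappa-style setup everything is an append-only log that consumers can replay from the start. A minimal consumer sketch with confluent-kafka, assuming a local broker and a hypothetical "saved-pages" topic:

```python
from confluent_kafka import Consumer  # pip install confluent-kafka

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "archiver",
    "auto.offset.reset": "earliest",  # replay the whole log on first run
})
consumer.subscribe(["saved-pages"])  # assumption: one topic for web captures

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        # Each message is one saved page; downstream steps (tagging,
        # embedding, indexing) are just more consumers of the same log.
        process(msg.value())  # hypothetical handler
finally:
    consumer.close()
```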

If you don't have cloud infrastructure at hand, use DuckDB.
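A quick sketch of that zero-infrastructure route; the whole database is one local file, and the schema here is just illustrative:

```python
import duckdb  # pip install duckdb

con = duckdb.connect("archive.duckdb")  # single local file, no server
con.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        url  TEXT,
        tags TEXT[],
        body TEXT
    )
""")
con.execute(
    "INSERT INTO documents VALUES (?, ?, ?)",
    ["https://example.com/post", ["nostr", "databases"], "article text"],
)

# Plain SQL over everything, including list containment on the tags column.
rows = con.execute(
    "SELECT url FROM documents WHERE list_contains(tags, ?)", ["nostr"]
).fetchall()
print(rows)
```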

None, databases all suck today and nobody cares enough to fix a single one

But the first reply said nostr and I like that answer

They suck. Can you explain?

If it follows good principles it might have compatibility issues with shit that doesn't follow good principles

If it has better compatibility it must be doing some things in shitty ways just because those are the ways that work for compatibility

A database that doesn't need wide compatibility can do fine, actually, it's when you need compatibility that the tradeoffs come in

Not a 'database' (depending on your definition, of course!), but a well-maintained Obsidian vault could do the job here and is LLM-friendly.
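Since a vault is just Markdown files with YAML frontmatter, it stays trivially scriptable. A small sketch that collects tag counts across a vault, assuming frontmatter tags and the python-frontmatter package (the vault path is a placeholder):

```python
from collections import Counter
from pathlib import Path

import frontmatter  # pip install python-frontmatter

VAULT = Path("~/vault").expanduser()  # assumption: your vault lives here

tags: Counter[str] = Counter()
for note in VAULT.rglob("*.md"):
    post = frontmatter.load(note)
    # Frontmatter tags can be a list or a single string.
    t = post.metadata.get("tags", [])
    tags.update([t] if isinstance(t, str) else t)

print(tags.most_common(20))
```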

I'm building something that sounds rather similar. What's your idea?

- some queue for keeping a list of what to import
- some filesystem for storage
- some graph database for the meta details (tags, relationships, etc.)
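A stdlib-plus-networkx sketch of that three-part split; every name here is illustrative, and `fetch()` stands in for whatever downloader you use:

```python
import hashlib
from collections import deque
from pathlib import Path

import networkx as nx  # pip install networkx

STORE = Path("store")        # filesystem: raw content, addressed by hash
STORE.mkdir(exist_ok=True)
queue: deque[str] = deque()  # queue: URLs waiting to be imported
graph = nx.DiGraph()         # graph: tags and relationships as edges


def ingest(url: str, text: str, tags: list[str]) -> None:
    # Content addressing keeps the filesystem layer dumb and deduplicated.
    key = hashlib.sha256(text.encode()).hexdigest()
    (STORE / f"{key}.txt").write_text(text)
    graph.add_node(key, url=url)
    for tag in tags:
        graph.add_edge(key, f"tag:{tag}")


queue.append("https://example.com/post")
while queue:
    url = queue.popleft()
    ingest(url, fetch(url), ["nostr"])  # fetch() is a hypothetical downloader
```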

if you want to discuss Digital Reformation I'm available