This would be local, and I'd rather it deal with metadata, transcripts, PDFs, etc., rather than an entire hard drive of files. Essentially, a massive amount of text from various sources collected into one location.
And I'm aware no single database is best for every use case. I'm looking for one that would handle large amounts of text and LLM integration (or whatever that would best look like), which is why I'm on the hunt for options. Any ideas or trade-offs you can offer are appreciated.
If you just want your LLMs to read the content then something like MongoDB would work well, since it's a document store. You can add your own metadata.
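To make the document-store idea concrete, here is roughly what one transcript could look like as a document. The field names and values are invented for illustration, not a required schema; in MongoDB a document is just structured data like this, and metadata is simply more fields you can query on.

```python
# Illustrative shape of one document in a store like MongoDB.
# All field names/values here are made up for the example.
transcript_doc = {
    "_id": "ep-042",                  # stable id a pipeline can point back to
    "source": "podcast",              # where the text came from
    "title": "Episode 42",
    "text": "Full transcript text goes here...",
    "metadata": {                     # arbitrary user-defined metadata
        "speakers": ["host", "guest"],
        "duration_sec": 3600,
        "recorded": "2024-01-15",
    },
}

# Locally this is ordinary dict access; in MongoDB the equivalent filter
# would be a query document like {"metadata.speakers": "host"}.
print(transcript_doc["_id"], transcript_doc["metadata"]["speakers"])
```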
If you want to query against the data without iterating through the documents each time then you might want to tokenize your content and store in a vector database like Qdrant or similar.
Yeah, that's more what I had in mind: vectorizing alongside the actual database of files, keeping pointers back to it so content can be recalled later. I'll look into Mongo + Qdrant and start playing around, thanks!
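The pattern being described here (a vector index that stores only embeddings plus pointers back into the document store) can be sketched in plain Python. This is a toy: the hash-based embedding stands in for a real embedding model, and the two in-memory structures stand in for Qdrant and MongoDB respectively.

```python
import math

DIM = 64  # toy embedding dimension

def embed(text: str) -> list[float]:
    """Crude bag-of-words hashing embedding; a real setup would call a model."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[hash(word) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# "Document store" (MongoDB's role): full text lives here, keyed by id.
docs = {
    "pdf-1": "quarterly report on solar panel efficiency",
    "tx-7": "transcript of an interview about database design",
}

# "Vector index" (Qdrant's role): embeddings plus a pointer (the doc id).
index = [(doc_id, embed(text)) for doc_id, text in docs.items()]

def search(query: str) -> str:
    qv = embed(query)
    best_id = max(index, key=lambda item: cosine(qv, item[1]))[0]
    return best_id  # a pointer; the full text is fetched from the doc store

hit = search("interview about databases")
print(hit, "->", docs[hit])
```

In the real stack, Qdrant would hold each vector with a payload containing the MongoDB `_id`, so a similarity hit gives you the id to fetch the full document.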
Take a look at Neo4j. They have a tutorial which explains and explores what makes graphs useful.
cc nostr:nprofile1qyfhwumn8ghj7ctvvahjuat50phjummwv5q32amnwvaz7tm9v3jkutnwdaehgu3wd3skueqqyzu7we2xhgry2mknq8v7227yn7jguu9xhu3g90n6rtnjj3mpyq3ackdvvhl
We are using it in our setup for [Core Explorer](https://github.com/coreexplorer-org/core-explorer-kit).
Let me know if you want to look more closely at it.