Awesome! I'm glad 🤙
Just to understand: your idea is to query that document using prompts, like chatting with your docs? You might get much better results using RAG if that's the case.
Exactly.
I already tested RAG using AnythingLLM without any success, but I suspect the underlying problem was the context length. Now I will test it again with the modified models.
I tested RAG but I'm getting poor results. I played a little with chunk size and chunk overlap, but it doesn't seem to help. I only got decent results (though no better than a standard query) with Open WebUI's "Full Context Mode" (where the whole document is fed in), but replies took 30% longer than in standard mode.
Any suggestions?
Since you are dealing with content that may be non-self-descriptive and probably isn't what the embedding model was trained on, consider feeding your text to an LLM first to summarize it and turn it into more self-explanatory content.
Then feed that to the embedding model.
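A minimal sketch of that summarize-then-embed step, assuming the Ollama Python client; the model names here are placeholders for whatever you run locally:

```python
# Sketch: rewrite each chunk into self-explanatory prose with an LLM,
# then embed the rewritten text instead of the raw chunk.
# Assumptions: the `ollama` Python client, "llama3.1" and
# "nomic-embed-text" as stand-ins for your local models.
import ollama

def explain(chunk: str) -> str:
    # Ask the LLM to expand jargon/abbreviations so the embedding
    # model sees plain, descriptive prose.
    resp = ollama.chat(
        model="llama3.1",
        messages=[{
            "role": "user",
            "content": "Rewrite the following text as a short, "
                       "self-explanatory summary, expanding any "
                       "abbreviations or jargon:\n\n" + chunk,
        }],
    )
    return resp["message"]["content"]

def embed(text: str) -> list[float]:
    resp = ollama.embeddings(model="nomic-embed-text", prompt=text)
    return resp["embedding"]

chunks = ["..."]  # your document chunks go here
index = [(chunk, embed(explain(chunk))) for chunk in chunks]
```

At query time you would embed the question the same way and compare it against the vectors in `index`.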
I will try that, thanks.
You can also build sequential embeddings this way:
"The summary of the last segment was as follows: <previous summary>
The current segment is: <current segment>
Please return a summary of the current segment, using the previous summary for context."

Each returned summary then becomes the context for the next segment.
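A sketch of that sequential loop, under the same assumptions as above (Ollama client, placeholder model names):

```python
# Sketch: summarize each segment with the previous summary as context,
# embed the summary, and carry the summary forward to the next segment.
import ollama

def sequential_embeddings(segments: list[str]) -> list[tuple[str, list[float]]]:
    prev_summary = "(none, this is the first segment)"
    results = []
    for seg in segments:
        prompt = (
            f"The summary of the last segment was as follows:\n{prev_summary}\n\n"
            f"The current segment is:\n{seg}\n\n"
            "Please return a summary of the current segment, "
            "using the previous summary for context."
        )
        resp = ollama.chat(model="llama3.1",
                           messages=[{"role": "user", "content": prompt}])
        summary = resp["message"]["content"]
        vector = ollama.embeddings(model="nomic-embed-text",
                                   prompt=summary)["embedding"]
        results.append((summary, vector))
        prev_summary = summary  # this summary is the context for the next segment
    return results
```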
Uhm, this is hardcore, I need to understand all this pipeline stuff.
Hey! Yes, you could follow what nostr:npub12262qa4uhw7u8gdwlgmntqtv7aye8vdcmvszkqwgs0zchel6mz7s6cgrkj is recommending. Basically, create a synthetic dataset with two columns: column 1 for questions and column 2 for answers. You can use an LLM to generate this dataset, then embed the answers.

I would also recommend using an embedding model like Nomic ( https://ollama.com/library/nomic-embed-text ), since it has an interesting prefix system that generally improves the performance and accuracy of queries ( https://huggingface.co/nomic-ai/nomic-embed-text-v1.5#usage ).

I can also share the code for the Beating Heart, a RAG system that ingests MD documents, chunks them semantically, and then embeds them: https://github.com/gzuuus/beating-heart-nostr . I also find the videos by Matt Williams very instructive: https://www.youtube.com/watch?v=76EIC_RaDNw .

Finally, I would say that generating a synthetic dataset is not strictly necessary if you embed the data smartly.
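For illustration, a sketch of how the Nomic prefixes and the synthetic Q&A idea fit together; the Q&A rows and the query are made up, while the "search_document:"/"search_query:" prefixes are the ones from the Nomic usage page linked above:

```python
# Sketch: embed synthetic answers with the "search_document:" prefix,
# embed the user question with "search_query:", rank by cosine similarity.
# The Q&A rows below are illustrative placeholders.
import ollama

qa_pairs = [
    ("What does error code E42 mean?", "Error code E42 indicates ..."),
    # ... one row per LLM-generated question/answer pair
]

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

# Index: embed the answer column with the document prefix.
doc_vectors = [embed("search_document: " + answer) for _, answer in qa_pairs]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Query: embed the question with the query prefix and pick the best match.
query_vec = embed("search_query: what does E42 mean?")
best = max(range(len(qa_pairs)), key=lambda i: cosine(query_vec, doc_vectors[i]))
print(qa_pairs[best][1])
```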
Lots of things to study, I will take a look and experiment!