Awesome! I'm glad 🤙
Just to understand: your idea is to query that document using prompts, like chatting with your docs? You might get much better results using RAG if that's the case.
Exactly.
I already tested RAG using AnythingLLM without any success, but I suspect the underlying problem was the context length. Now I will test it again with the modified models.
I tested RAG but I'm getting poor results. I played a little with chunk size and chunk overlap, but it doesn't seem to help. I only got decent results (though no better than a standard query) with Open WebUI's "Full Context Mode" (where the whole document is fed in), but replies took 30% longer than in standard mode.
Any suggestions?
Since you are dealing with content that may be non-self-descriptive and probably isn't what the embedding model was trained on, consider feeding your text to an LLM first to summarize it and turn it into more self-explanatory content.
Then feed that to the embedding model.
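A minimal sketch of that summarize-then-embed step, assuming the Ollama Python client; the model names here are placeholders for whatever you run locally:

```python
# Sketch: rewrite each chunk into self-explanatory prose with an LLM,
# then embed the rewritten text instead of the raw chunk.
# Assumptions: the `ollama` Python client, "llama3.1" and
# "nomic-embed-text" as stand-ins for your local models.
import ollama

def explain(chunk: str) -> str:
    # Ask the LLM to expand jargon/abbreviations so the embedding
    # model sees plain, descriptive prose.
    resp = ollama.chat(
        model="llama3.1",
        messages=[{
            "role": "user",
            "content": "Rewrite the following text as a short, "
                       "self-explanatory summary, expanding any "
                       "abbreviations or jargon:\n\n" + chunk,
        }],
    )
    return resp["message"]["content"]

def embed(text: str) -> list[float]:
    resp = ollama.embeddings(model="nomic-embed-text", prompt=text)
    return resp["embedding"]

chunks = ["..."]  # your document chunks go here
index = [(chunk, embed(explain(chunk))) for chunk in chunks]
```

At query time you would embed the question the same way and compare it against the vectors in `index`.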
I will try that, thanks.
You can also build sequential embeddings this way:
"The summary of the last segment was as follows: <previous summary>
The current segment is: <current segment>
Please return a summary of the current segment, using the previous summary for context."

Each returned summary then becomes the context for the next segment.
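A sketch of that sequential loop, under the same assumptions as above (Ollama client, placeholder model names):

```python
# Sketch: summarize each segment with the previous summary as context,
# embed the summary, and carry the summary forward to the next segment.
import ollama

def sequential_embeddings(segments: list[str]) -> list[tuple[str, list[float]]]:
    prev_summary = "(none, this is the first segment)"
    results = []
    for seg in segments:
        prompt = (
            f"The summary of the last segment was as follows:\n{prev_summary}\n\n"
            f"The current segment is:\n{seg}\n\n"
            "Please return a summary of the current segment, "
            "using the previous summary for context."
        )
        resp = ollama.chat(model="llama3.1",
                           messages=[{"role": "user", "content": prompt}])
        summary = resp["message"]["content"]
        vector = ollama.embeddings(model="nomic-embed-text",
                                   prompt=summary)["embedding"]
        results.append((summary, vector))
        prev_summary = summary  # this summary is the context for the next segment
    return results
```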
Uhm, this is hardcore, I need to understand all this pipeline stuff.
Hey! Yes, you could follow what nostr:npub12262qa4uhw7u8gdwlgmntqtv7aye8vdcmvszkqwgs0zchel6mz7s6cgrkj is recommending. Basically, create a synthetic dataset with two columns: column 1 for questions and column 2 for answers. You can use an LLM to generate this dataset, then embed the answers.

I would also recommend using an embedding model like Nomic ( https://ollama.com/library/nomic-embed-text ), since it has an interesting prefix system that generally improves the performance and accuracy of queries ( https://huggingface.co/nomic-ai/nomic-embed-text-v1.5#usage ).

I can also share the code for the Beating Heart, a RAG system that ingests MD documents, chunks them semantically, and then embeds them: https://github.com/gzuuus/beating-heart-nostr . I also find the videos by Matt Williams very instructive: https://www.youtube.com/watch?v=76EIC_RaDNw .

Finally, I would say that generating a synthetic dataset is not strictly necessary if you embed the data smartly.
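For illustration, a sketch of how the Nomic prefixes and the synthetic Q&A idea fit together; the Q&A rows and the query are made up, while the "search_document:"/"search_query:" prefixes are the ones from the Nomic usage page linked above:

```python
# Sketch: embed synthetic answers with the "search_document:" prefix,
# embed the user question with "search_query:", rank by cosine similarity.
# The Q&A rows below are illustrative placeholders.
import ollama

qa_pairs = [
    ("What does error code E42 mean?", "Error code E42 indicates ..."),
    # ... one row per LLM-generated question/answer pair
]

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

# Index: embed the answer column with the document prefix.
doc_vectors = [embed("search_document: " + answer) for _, answer in qa_pairs]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Query: embed the question with the query prefix and pick the best match.
query_vec = embed("search_query: what does E42 mean?")
best = max(range(len(qa_pairs)), key=lambda i: cosine(query_vec, doc_vectors[i]))
print(qa_pairs[best][1])
```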
Lots of things to study, I will take a look and experiment!