Since you are dealing with text that may not be self-descriptive and probably is not what embedding models are trained on, consider feeding it to an LLM first to summarize it and turn it into more self-explanatory content.

Then feed that summary to the embedding model.
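A minimal sketch of that summarize-then-embed idea, using the OpenAI Python SDK as an example provider. The model names and the prompt wording here are assumptions, not something from the note; any capable chat model and embedding model would do.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_with_summary(text: str) -> list[float]:
    # Step 1: have an LLM rewrite the opaque input into self-explanatory prose.
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "Rewrite the input as a short, self-explanatory "
                        "description of what it contains."},
            {"role": "user", "content": text},
        ],
    ).choices[0].message.content

    # Step 2: embed the explanatory summary instead of the raw text.
    response = client.embeddings.create(
        model="text-embedding-3-small",  # assumed embedding model
        input=summary,
    )
    return response.data[0].embedding
```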

Discussion

I will try that, thanks.

You can also build sequential embeddings this way:

The summary of the previous segment was as follows: {previous_summary}

The current segment is: {current_segment}

Please return a summary for the current segment, using the previous summary as context, and also return the current context.
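A sketch of that rolling-summary pipeline under the same assumptions as above (OpenAI SDK, assumed model names). For simplicity it treats each summary itself as the context carried into the next call, and it embeds that context-aware summary for each segment.

```python
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "The summary of the previous segment was as follows: {previous_summary}\n\n"
    "The current segment is: {current_segment}\n\n"
    "Please return a summary for the current segment, "
    "using the previous summary as context."
)

def sequential_embeddings(segments: list[str]) -> list[list[float]]:
    embeddings = []
    previous_summary = "(none; this is the first segment)"
    for segment in segments:
        # Summarize the current segment with the previous summary as context.
        summary = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model choice
            messages=[{"role": "user", "content": PROMPT.format(
                previous_summary=previous_summary,
                current_segment=segment,
            )}],
        ).choices[0].message.content

        # Embed the context-aware summary, then carry it forward.
        vector = client.embeddings.create(
            model="text-embedding-3-small",  # assumed embedding model
            input=summary,
        ).data[0].embedding
        embeddings.append(vector)
        previous_summary = summary
    return embeddings
```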

Uhm, this is hardcore, I need to understand all this pipeline stuff.