Since you are dealing with things that could be non-self-descriptive and probably are not what embeddings are trained for, consider feeding your text to an LLM first to summarize and turn into more explaining content.
Then feed that to the embedding model