i'm kinda pleased that just by thinking through the problem i came up with criteria for relevance categorization that match the state of the art for search, without adding dictionaries of stems

for now i'm just going to build it with this to get a useful baseline algorithm. the main thing left to figure out is the sequence evaluation; on top of that, the sequential matches can then be weighted by proximity, and that will do for now
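a rough sketch of what i mean by the baseline, in python — the tokenizer and the normalization here are placeholders i made up, not the actual criteria:

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # plain exact tokens, deliberately no stemming dictionary
    return re.findall(r"[a-z0-9']+", text.lower())

def baseline_score(query: str, doc: str) -> float:
    # count exact query-term hits, normalized by vocabulary size
    counts = Counter(tokenize(doc))
    hits = sum(counts[t] for t in tokenize(query))
    return hits / max(len(counts), 1)
```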

the idea is to make this a useful search over individual events, and for the most part the biggest texts it will run on are things like chapters of books, typically 500-5000 words, so this will work... proximity AND THEN distance is going to be the logic
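what the proximity part could look like concretely — a sketch that scores a document by the tightest window containing all the query terms (the brute force search here is my own shortcut, workable at chapter scale, not the final logic):

```python
from itertools import product

def proximity_score(query_terms: list[str], doc_tokens: list[str]) -> float:
    # positions of each query term in the document
    positions = [[i for i, tok in enumerate(doc_tokens) if tok == t]
                 for t in query_terms]
    if not positions or any(not p for p in positions):
        return 0.0  # a query term is missing entirely
    # brute-force smallest span covering one occurrence of each term;
    # fine for chapter-sized texts in the 500-5000 word range
    best = min(max(c) - min(c) + 1 for c in product(*positions))
    return 1.0 / best  # tighter cluster -> higher score
```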


Discussion

maybe later on i will add semantic graph weightings like the "vertex" search engine of google, which is implemented in python; my colleague at work is using it to augment the simple set intersection search that i'm using for the matchmaking engine i've built.
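the set intersection part, roughly (identifiers here are illustrative, not the actual matchmaking engine):

```python
def intersection_search(query_tokens: set[str],
                        index: dict[str, set[str]]) -> list[tuple[str, int]]:
    # rank documents by how many query tokens they share with the query
    scored = [(doc_id, len(query_tokens & toks))
              for doc_id, toks in index.items()]
    return sorted((s for s in scored if s[1] > 0),
                  key=lambda s: s[1], reverse=True)
```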

Full circle back to "embeddings". On a serious note though, vector proximity is super powerful and fast af. It's especially powerful when combined with vector averages. I have been experimenting a lot with nostr and achieved great results when I average the vector representation of a note with the vector representations of all its replies. My next experiment is to use different average weights based on "reply relevance", which can be defined in plenty of different ways.
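A sketch of the averaging I mean, assuming numpy vectors; the 50/50 note/replies split and the weighting are knobs, not settled choices:

```python
import numpy as np

def blended_embedding(note_vec: np.ndarray,
                      reply_vecs: list[np.ndarray],
                      reply_weights: list[float] | None = None) -> np.ndarray:
    if not reply_vecs:
        return note_vec
    # average the replies, optionally weighted by "reply relevance"
    w = np.asarray(reply_weights) if reply_weights else np.ones(len(reply_vecs))
    replies = np.average(np.stack(reply_vecs), axis=0, weights=w)
    # blend the note's own vector with the reply average
    return 0.5 * note_vec + 0.5 * replies
```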