I hear my laptop's fans start whirring when it's generating a response, so I wouldn't be surprised if it's doing something locally first: either the encoding step (words to tokens) or the retrieval (finding relevant documents from a project).

𝕾𝖊𝖗 𝕾𝖑𝖊𝖊𝖕𝖞 1y ago

retrieval maybe?

btw have you seen https://www.mixedbread.ai/blog/mxbai-embed-large-v1

Discussion

𝕾𝖊𝖗 𝕾𝖑𝖊𝖊𝖕𝖞 1y ago

wait nvm, encoding should come first, no? Converting words to tokens is usually needed before retrieval, unless the retrieval uses pre-computed embeddings, in which case maybe it skips straight to that? Idk

liminal 🦠 1y ago

Think of how you 'hold a memory': no one else can interact with it unless you talk about it or draw it. A memory or idea needs to be conveyed somehow to the outside world. An encoding model that produces embeddings is basically half an LLM. It takes in the words/tokens and says "this string is located here": it assigns a coordinate, an address for it. That address is the context, and anything with a nearby address is more closely related in ideas. The second part of the LLM is the decoder, which takes the address as a kind of starting point. The decoder uses the context of that coordinate and responds with words that are also in the right context (which is learned by training).

H.T. to nostr:npub1h8nk2346qezka5cpm8jjh3yl5j88pf4ly2ptu7s6uu55wcfqy0wq36rpev for his fantastic read of "A Very Gentle Introduction to Large Language Models without the Hype"

https://fountain.fm/episode/yCpvsos8iUfXsfLeUPon

https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e

liminal 🦠 1y ago

But yeah, you're right on both counts. You get the embedding from an encoder, and that embedding is the context. You can compare the distances between it and the other contexts you've captured, and send recommendations of the closest ones.
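
To make the "compare distances and recommend the closest ones" part concrete, here's a rough sketch in plain Python. Everything in it is made up for illustration: the three-number vectors stand in for real embeddings (a real encoder would output hundreds of dimensions), the note names are hypothetical, and cosine similarity is just one common way to measure closeness, not necessarily what any particular app uses.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest(query_vec, stored, k=2):
    """Return the names of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical pre-computed embeddings (toy values, not from a real model).
STORED = {
    "note about embeddings": [0.9, 0.1, 0.0],
    "note about tokenizers": [0.7, 0.3, 0.1],
    "note about cooking": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the incoming query, also a toy value.
query = [0.8, 0.2, 0.0]

print(closest(query, STORED, k=2))
```

The stored vectors only need to be computed once, so at query time the only thing that has to go through the encoder is the query itself; the rest is arithmetic on numbers you already have.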

𝕾𝖊𝖗 𝕾𝖑𝖊𝖊𝖕𝖞 1y ago

Keep me posted on what you guys do with it 🫡