Can anyone teach me how to do this? https://emschwartz.me/binary-vector-embeddings-are-so-cool/

There is so much jargon about this stuff I don't even know where to start.

Basically I want to do what https://scour.ing/ is doing, but with Nostr notes/articles only, and expose all of it through custom feeds on a relay like wss://algo.utxo.one/ -- or if someone else knows how to do it, please do it, or talk to me, or both.

Also I don't want to pay a dime to any third-party service, and I don't want to have to use any super computer with GPUs.

Thank you very much.


Discussion

i was looking at this same article the other day, been thinking about it...

hugging face always has good blog posts too https://huggingface.co/blog/embedding-quantization#retrieval-speed

Looks simple enough. I imagine you could even go further with a sparse encoding scheme, assuming there are huge gaps of 0 bits, which is probably the case for high-dimensional embeddings.
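Roughly, the size win works like this: a minimal sketch, assuming numpy and an illustrative 1024-dimensional vector (the sparse idea would be a further step on top):

```python
import numpy as np

dim = 1024
emb = np.random.randn(dim).astype(np.float32)  # a float32 embedding: 4096 bytes

bits = (emb > 0).astype(np.uint8)  # binary quantization: keep only the sign
packed = np.packbits(bits)         # pack 8 bits per byte: 128 bytes

print(emb.nbytes, packed.nbytes)   # 4096 -> 128, the 32x reduction from the article
```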

curious if it’s being used anywhere yet

I hear my laptop's fans start whirring when it's making a response, so I wouldn't be surprised if it's doing something locally first. Either the encoding process (words to tokens) or the retrieval (finding relevant documents from a project).

retrieval maybe?

btw have you seen https://www.mixedbread.ai/blog/mxbai-embed-large-v1

wait nvm, encoding should be first, no? Converting words to tokens is usually needed before retrieval, unless the retrieval uses pre-computed embeddings and skips straight to that? Idk

Think of how you 'hold a memory': no one else can interact with it unless you talk about it or draw it; a memory or idea needs to be conveyed somehow to the outside world. An encoding model that produces embeddings is basically half an LLM: it takes in the words/tokens and says "this string is located here", assigning a coordinate, an address for it. That address is the context; anything with a nearby address is more related in ideas. The second half of the LLM is the decoder, which takes the address as a kind of starting point. The decoder uses the context of that coordinate and responds with words that are also in the right context (which is learned by training).

H.T. to nostr:npub1h8nk2346qezka5cpm8jjh3yl5j88pf4ly2ptu7s6uu55wcfqy0wq36rpev for his fantastic read of "A Very Gentle Introduction to Large Language Models without the Hype"

https://fountain.fm/episode/yCpvsos8iUfXsfLeUPon

https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e

But yeah, you're right on both counts. You get the embedding from an encoder, which is the context. You can compare the distances to other contexts that you've captured, and send recommendations of the closest ones.

Keep me posted on what you guys do with it 🫑

Imagine you have a unit vector that points in any direction. In 3-dimensional space, it represents some point on the unit sphere. That can be described with 3 numbers (x, y, z), but not ANY three numbers: they have to be such that the magnitude is 1.

In any case, if you can map information to a point on this unit sphere, and you do that for lots of input data, then when you query the system with new input data it can tell you which pre-existing input data happens to be the closest point on this unit sphere. Actually, the most popular algorithms aren't guaranteed to return the closest match (but I know of one that does give the closest and has other good properties; I'm under NDA on that so I can't say more).
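Here's a toy version of that lookup in the 3-D case, just to show the mechanics: a minimal numpy sketch (exact brute force, not the approximate algorithms I mentioned):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 3))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # project onto the unit sphere

query = rng.normal(size=3)
query /= np.linalg.norm(query)

scores = docs @ query    # dot products = cosine similarities for unit vectors
print(scores.argmax())   # index of the closest stored point on the sphere
```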

Three dimensions turn out to be pretty useless, but at, say, about 3096 dimensions you start being able to encode enough information into that 3096-D unit vector to be useful in an A.I. sense.

But you have to first map information into a unit vector using an "embedding layer" which is some A.I. magic that I don't know very much about at all.

I guess what I'm describing here isn't "binary" though, it uses f32s.

🥵 this stuff is so cool and I will never understand it

The encoding (a function string -> number vector) is part of the LLM magic, a common first step of various ANN enchantments (which the magicians also don't understand, don't worry). The point is: you download a pre-trained model with the encoder function and use it as is. On this thread @sersleppy posted a blog with an example:

https://huggingface.co/blog/embedding-quantization#retrieval-speed

Embeddings are supposed to reflect content (semantics, but this may be too strong a word), to the point where encoding("king") - encoding("man") + encoding("woman") ≈ encoding("queen"), if you think of 'encoding' as a function string -> vector in high-dimensional space and do + and - with vectors.
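If you want to poke at that yourself, here is a sketch using the sentence-transformers library and the mxbai model linked elsewhere in this thread (how cleanly the analogy holds depends on the model):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
king, man, woman, queen = model.encode(["king", "man", "woman", "queen"])

target = king - man + woman  # vector arithmetic on the embeddings
cos = np.dot(target, queen) / (np.linalg.norm(target) * np.linalg.norm(queen))
print(cos)  # close to 1 means the analogy roughly holds for this model
```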

Then, once you choose an encoding, you apply it to every text and calculate its dissimilarity against the encodings of the user's key phrases to find similar content.

Conceptually, the binary encoding is the same. The point is to find a way to approximate the encodings with a coarser, simpler, smaller number vector, so that the dissimilarity calculations run faster without compromising accuracy.
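A minimal numpy sketch of that idea, with random vectors standing in for real embeddings: quantize by sign, then use Hamming distance (the count of differing bits) as the cheap dissimilarity:

```python
import numpy as np

def to_bits(emb: np.ndarray) -> np.ndarray:
    # sign -> 0/1, packed into bytes (1024 floats become 128 bytes)
    return np.packbits((emb > 0).astype(np.uint8), axis=-1)

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 1024))  # stand-ins for real embeddings
query = rng.normal(size=(1, 1024))

db, q = to_bits(corpus), to_bits(query)

# XOR marks differing bits; count them with a per-byte popcount table.
popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)
hamming = popcount[db ^ q].sum(axis=1)
print(hamming.argsort()[:5])  # indices of the 5 most similar vectors
```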

If you want to go deep down the LLM rabbit hole, read the 'Attention Is All You Need' paper. It is also hermetic (full of jargon), and it is just the entrance to the rabbit hole; a more comprehensive review of ANNs in general and of deep learning will be needed.

Would be interesting if the performance really is that good, which would imply it's already being used in some services. Haven't looked around, but if there are any open source models that can do this, you could likely load them on a phone or laptop. Meaning, you'd be doing the embedding locally first and then sending the data away for the processing and recommendation.

Search on a local machine is a different challenge, because whatever is doing the searching needs access to all the data (text and vectors) in a vector db. Vector search is a turbocharged K-nearest-neighbors algorithm, returning the top K closest entries by semantic distance; the vectors that represent 'dog' and 'puppy' are closer than the vectors that represent 'dog' and 'chicken'. That concept scales up to paragraphs and pages of text: you can imagine a fairy tale sitting far from a group of physics papers because they're unrelated.
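A quick way to check the dog/puppy/chicken intuition, assuming sentence-transformers and the mxbai model from this thread:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
emb = model.encode(["dog", "puppy", "chicken"])

print(util.cos_sim(emb[0], emb[1]).item())  # dog vs puppy: expect higher
print(util.cos_sim(emb[0], emb[2]).item())  # dog vs chicken: expect lower
```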

Vector search essentially uses a network to identify the closest nodes for a recommendation, so I wonder if you could send a partial network of related ideas to search through - that way clients wouldn't need to rely entirely on a main service for all the data, just for whatever lies outside some distance.

Been brainstorming these ideas with NKBIP-02

https://wikifreedia.xyz/nkbip-02/liminal@nostrplebs.com

OK, even though no one taught me anything, I got a thing running that takes notes and does stuff with them, but it sucks, so I am abandoning these plans for now.

I can read it to you.

I will gladly spend a day writing something up to help break this down. Let me know what you'd specifically want help with, I'll also try and gear it for a general audience.

DM me and I'll help you out.

I work with this.

I can vouch for nostr:npub1v22qyndskpawjnsjn8zce53nwldza5ejw67f8y33ntt8qlmpm5rq7ra0z2: he won the hackathon at satsconf last year and took 3rd place this year (aside from many other projects). He really knows what he's talking about.

Oh, and he hosts the bitdevs meetings in Brasília too (a monthly meeting of bitcoin and bitcoin-related devs).

Happy Thanksgiving Fiatjaf, here's a demo that grabs nostr events, computes and stores their binary embeddings, and retrieves the 5 closest and most different events for a query. Minimal demo, lots of places to improve on.

nostr:nevent1qvzqqqqqqypzqwlsccluhy6xxsr6l9a9uhhxf75g85g8a709tprjcn4e42h053vaqqsyuy9lqam64npx6vutujt6fv8f43dfjz0pv5v4xp2gl7d3eq9fpxskd8zzj

https://github.com/limina1/nostr-binary-embedding-demo/

You can start by playing around with MixedBread’s trained model, which they say supports both binary and matryoshka embeddings.

https://www.mixedbread.ai/blog/mxbai-embed-large-v1
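If I'm reading their posts right, the usual pattern with sentence-transformers looks like this (a sketch, not a full pipeline):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.encode(["Nostr notes about embeddings", "A recipe for bread"])
binary = quantize_embeddings(embeddings, precision="binary")
print(embeddings.shape, binary.shape)  # 1024 floats -> 128 int8 values per text
```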

Thank you. I had seen this MixedBread stuff mentioned somewhere but I thought it was a paid API.

I think the model is open, look at the code examples here:

https://huggingface.co/blog/embedding-quantization#binary-quantization-in-sentence-transformers

In any case, binary quantization can be applied to other embedding models too.