Hmm, thinking about this: if we just save the embedding vector for each post (the potentially tricky part) and each profile, then each query gets embedded once and we compute the post+profile distances - maybe all of that costs about as much compute as generating a single token.
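
Concretely, the lookup side could be as simple as dot products over stored vectors. A minimal sketch, assuming the embeddings are already computed and saved; the dimensions, corpus size, and random vectors below are placeholders for real model output:

```python
import numpy as np

def cosine_scores(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of stored vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return d @ q

rng = np.random.default_rng(0)

# Saved once per post (profiles get the same treatment):
post_embeddings = rng.normal(size=(10_000, 768))  # 10k posts, 768-dim vectors

# Per query: one embedding-model call, then cheap dot products against everything.
query_embedding = rng.normal(size=768)            # stand-in for a real model call
scores = cosine_scores(query_embedding, post_embeddings)
top = np.argsort(scores)[-5:][::-1]               # five nearest posts
print(top, scores[top])
```

A post score and its author's profile score could then just be added or averaged; the expensive model call happens once per query, not once per comparison.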

On the one hand, we need to remember that running a few prompts for nearly free isn’t the same as running thousands or millions per day for nearly free.

On the other, maybe these embeddings are quite cheap compared to token generation.

I wonder exactly how much they cost in compute.
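
For a rough sense of scale, here is a back-of-envelope estimate using the common rule of thumb that a transformer forward pass costs about 2 × parameters FLOPs per token. Every number below is an assumption, not a measurement:

```python
# Back-of-envelope FLOP estimate; all model sizes and counts are assumptions.
embed_params = 100e6   # assume a ~100M-parameter embedding model
query_tokens = 20      # assume a short query
gen_params = 7e9       # assume a 7B-parameter generation model
dims = 768             # assumed embedding dimension
num_posts = 1_000_000  # assumed corpus size

# Rule of thumb: a forward pass costs ~2 * params FLOPs per token.
embed_cost = 2 * embed_params * query_tokens  # embed one query:  ~4e9 FLOPs
one_token_cost = 2 * gen_params               # generate 1 token: ~1.4e10 FLOPs
search_cost = 2 * dims * num_posts            # scan 1M posts:    ~1.5e9 FLOPs

print(f"embed query:   {embed_cost:.1e} FLOPs")
print(f"one token:     {one_token_cost:.1e} FLOPs")
print(f"scan 1M posts: {search_cost:.1e} FLOPs")
```

Under these assumptions, embedding a query and brute-force scanning a million posts lands in the same order of magnitude as generating a single token, which is consistent with the guess above - though real costs depend heavily on the models actually used.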
