I built a deep document search that runs off ChatGPT.

It basically sends a bunch of documents, page by page, through an LLM, asks it whether the query is present on that page, and collects the relevant pages via regexes over the replies.
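
Roughly, the per-page loop looks like this (a simplified sketch, not the exact code; the prompt wording, the RELEVANT/IRRELEVANT tags, and the `relevant_pages` helper are all illustrative):

```python
import re
from openai import OpenAI

# Works against OpenAI, or any OpenAI-compatible server (see below)
client = OpenAI()

PROMPT = (
    "Does the following page contain information relevant to the query?\n"
    "Query: {query}\n\nPage:\n{page}\n\n"
    "Answer RELEVANT or IRRELEVANT, then a one-line reason."
)

def relevant_pages(pages: list[str], query: str, model: str = "gpt-4o-mini") -> list[int]:
    """Send each page through the LLM; collect hits via a regex over the reply."""
    hits = []
    for i, page in enumerate(pages):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT.format(query=query, page=page)}],
        )
        reply = resp.choices[0].message.content or ""
        # \b keeps RELEVANT from matching inside IRRELEVANT
        if re.search(r"\bRELEVANT\b", reply):
            hits.append(i)
    return hits
```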

I refactored it to run 100% locally off Llama 3.1 8B (or anything else), and the speed was crazy: 0.5s response times.
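
If your local server speaks the OpenAI API, the swap can be as small as repointing the client (this sketch assumes Ollama's default port and model tag; a llama.cpp server works the same way):

```python
from openai import OpenAI

# Same client, pointed at a local OpenAI-compatible server instead of api.openai.com
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

resp = client.chat.completions.create(
    model="llama3.1:8b",  # whatever tag your local server exposes
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```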

With OpenAI it was more like a 20s round trip. You can batch or send parallel requests, but it still doesn't come close to an 8B running on the same machine.
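
Fanning out the remote calls looks roughly like this (another sketch; the model name and prompt are illustrative), and it helps, but every request still pays the network and queue latency:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def check_page(page: str, query: str) -> str:
    # One relevance check per page, issued concurrently via asyncio.gather
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Query: {query}\n\nPage:\n{page}\n\nRELEVANT or IRRELEVANT?"}],
    )
    return resp.choices[0].message.content or ""

async def check_all(pages: list[str], query: str) -> list[str]:
    return await asyncio.gather(*(check_page(p, query) for p in pages))
```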
