Subnostr

For anyone else playing with LLaMA 2, the new k-quant models are definitely the ones to go for. I just compared the two, and the Q4_K_S model is much faster, less RAM-intensive, and produces higher-quality output than the regular Q4.

Discussion

mutatrum 2y ago

As someone new, what's a good starting point to get this up and running?

Pablo Xannybar 2y ago

If you're already familiar with Linux in general, this repo will tell you what to do:

https://github.com/ggerganov/llama.cpp

To get the actual models, look up "TheBloke" on Hugging Face.

N3WD3V 2y ago

Good work mate 👏