Subnostr

For anyone else playing with LLaMA 2: the new k-quant models are definitely the ones to go for. I just compared the two, and the Q4_K_S model was much faster, used less RAM, and produced higher-quality output than the regular (non-k) Q4 model.