Learning more about running my own LLM models at home. Apparently, the quantization method impacts performance differently on different kinds of hardware.

This is why, if you’re browsing models on Hugging Face, you’ll see files with suffixes like “Q3_K_S” and “IQ2_XXS”. The number after the “Q” is roughly the bits per weight, and the letters around it name the quantization variant (“K” for k-quants, “IQ” for i-quants, “S”/“M”/“L” for size). Some will be much slower than others depending on the capabilities of the CPU and GPU in the machine. #llm


Discussion

ONNX and HuggingFace👍

Ollama

What program do you use? I use ollama, but it doesn’t let you use models from Hugging Face without modification, which I haven’t done yet.

Ah, but it does! Once you download the gguf file from Hugging Face, you can use ollama’s create command, passing in a Modelfile that specifies the path to the gguf. Then you can use ollama run to start up the model.
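A rough sketch of those steps (the gguf filename and the model name are just examples; substitute whatever you downloaded):

```shell
# Write a minimal Modelfile that points ollama at the downloaded gguf
# (the path here is an example, not a real file on your machine):
cat > Modelfile <<'EOF'
FROM ./dolphin-2.9.2-mixtral-8x22b.Q2_K.gguf
EOF

# Register the model with ollama under a name of your choosing...
ollama create dolphin-mixtral -f Modelfile

# ...then start chatting with it:
ollama run dolphin-mixtral
```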

It’s kinda annoying but there are instructions online: https://www.markhneedham.com/blog/2023/10/18/ollama-hugging-face-gguf-models/

I used this technique to run mradermacher/dolphin-2.9.2-mixtral-8x22b-GGUF: https://huggingface.co/mradermacher/dolphin-2.9.2-mixtral-8x22b-GGUF

Is a gguf the big model file that ends with .safetensors? Sorry, I’m new to this.

No, sorry — *.safetensors is a different format (the one Hugging Face typically uses for the original, unquantized weights); gguf files end with the *.gguf extension. For example, this page has some large *.gguf files split into parts (because Hugging Face has a max upload size of 50GB): https://huggingface.co/mradermacher/dolphin-2.9.2-mixtral-8x22b-GGUF

Once you download the two parts, you can combine them into the single *.gguf file that ollama is able to import. Instructions for combining the part files can be found here: https://huggingface.co/TheBloke/KafkaLM-70B-German-V0.1-GGUF
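The split parts are just one big file chopped into pieces, so plain `cat` reassembles them. A small demo of the idea, with tiny dummy files standing in for the real multi-gigabyte parts (the .part1of2/.part2of2 naming mirrors that repo, but check your actual filenames):

```shell
# Dummy stand-ins for the two downloaded parts:
printf 'first-half'  > model.gguf.part1of2
printf 'second-half' > model.gguf.part2of2

# Concatenate the parts *in order* into the single gguf ollama can import:
cat model.gguf.part1of2 model.gguf.part2of2 > model.gguf
```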

Well, I guess that’s fine for models with gguf files... but I’ve never seen those. Take this model for example (Satoshi): there’s no gguf, just safetensors files.

https://huggingface.co/LaierTwoLabsInc/Satoshi-7B/tree/main

Yeah, I believe there are tools that can convert them, but I haven’t tried. Once I found that there were already gguf files for the models I wanted to run, I just used those.
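From what I understand, llama.cpp ships a script that converts a Hugging Face safetensors checkout into a gguf — something roughly like this (script name, flags, and paths are from recent llama.cpp versions, so double-check against the repo before relying on them):

```shell
# Grab llama.cpp, which includes the HF-to-GGUF conversion script:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert a locally downloaded safetensors repo to a single gguf
# (assumes you've cloned LaierTwoLabsInc/Satoshi-7B next to llama.cpp):
python convert_hf_to_gguf.py ../Satoshi-7B --outfile satoshi-7b.gguf
```

The resulting gguf is unquantized, so you’d typically run it through llama.cpp’s quantize tool afterwards to get one of those Q4_K_M-style files.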

If you try the conversion tools, let me know how it goes!