Replying to jimbocoin 🃏

Learning more about running my own LLMs at home. Apparently, the quantization method affects performance differently on different kinds of hardware.

This is why, if you’re browsing models on Hugging Face, you’ll see files with suffixes like “Q3_K_S” and “IQ2_XXS”. The number after the “Q” tells you roughly how many bits each weight is stored in, and the rest of the suffix tells you which quantization scheme and size variant the model uses. Some will be much slower than others depending on the capabilities of the CPU and GPU in the machine. #llm
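
As a rough sketch of how to read those suffixes, assuming the common GGUF-style naming where the digits after “Q” are the approximate bits per weight and whatever follows is a scheme or size variant. The function name `parse_quant_suffix` and the returned field names are made up for illustration and are not part of any library.

```python
import re

# Illustrative pattern for suffixes such as "Q3_K_S" or "IQ2_XXS".
# Assumption: an optional "I", then "Q", then the bit count, then an optional variant tag.
_QUANT_RE = re.compile(r"^(?P<prefix>I?Q)(?P<bits>\d+)(?:_(?P<variant>[A-Z0-9_]+))?$")


def parse_quant_suffix(suffix):
    """Split a quantization suffix into prefix, approximate bit width, and variant."""
    match = _QUANT_RE.match(suffix.upper())
    if match is None:
        raise ValueError(f"unrecognized quantization suffix: {suffix!r}")
    return {
        "prefix": match.group("prefix"),          # "Q" or "IQ"
        "bits": int(match.group("bits")),         # approximate bits per weight
        "variant": match.group("variant") or "",  # e.g. "K_S", "XXS", or "" if absent
    }


if __name__ == "__main__":
    for name in ("Q3_K_S", "IQ2_XXS"):
        print(name, parse_quant_suffix(name))
```

This only decodes the name. It says nothing about which variant runs fastest on a given CPU or GPU, which still has to be measured on the machine itself.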

d34e832d... 1y ago

ONNX and Hugging Face 👍
