Qwen3-Coder Q4_K_XL full context @ 6 tokens/s
1x Nvidia 3090 (2020, $800)
1x Nvidia P40 (2017, $300)
2x EPYC Milan, 8 of 256 threads in use (2021, 2x$800)
DDR4 LRDIMM PC4-21300, 600 GB of 1TB in use ($1,250)
GIGABYTE MZ72-HB2 ($1,000)
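A back-of-envelope check on why roughly 600 GB of the 1 TB is in use. The ~4.5 bits/weight average for Q4_K_XL is an assumption (K-quants mix block sizes), not a measured figure; the rest of the resident memory would be KV cache and OS page cache:

```python
# Rough weight-memory estimate for a 480B-parameter model quantized
# to an assumed average of ~4.5 bits per weight (Q4_K_XL is a mixed
# 4/6-bit K-quant, so this is an approximation).
params = 480e9
bits_per_weight = 4.5  # assumption, not a measured value
model_gb = params * bits_per_weight / 8 / 1e9
print(f"~{model_gb:.0f} GB of weights")  # ~270 GB
```

That leaves the balance of the 600 GB for the (q4_0-quantized) KV cache at 262144 context plus mmap'd file caching.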
```shell
~/llama.cpp/build/bin/llama-cli \
  --model /mnt/ollama/models/hf/Qwen3-Coder/UD-Q4_K_XL/Qwen3-Coder-480B-A35B-Instruct-UD-Q4_K_XL-00001-of-00006.gguf \
  --threads 16 \
  --ctx-size 262144 \
  --n-gpu-layers 58 \
  -ot "\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.=CPU" \
  --numa numactl \
  -fa \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
```
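The `-ot` regex is what makes this split work: it keeps the MoE FFN expert tensors of layers 0–5 on the GPUs and overrides every later layer's experts to CPU. A quick sketch of what the pattern matches (the `blk.N....weight` tensor names follow llama.cpp's GGUF naming; the exact layer count of this model is not asserted here):

```python
import re

# Same pattern as in the -ot flag above: single digits 6-9, plus any
# two- or three-digit layer index, followed by an FFN expert tensor name.
pattern = re.compile(r"\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps\.")

# Hypothetical tensor names in llama.cpp's GGUF naming scheme
for name in [
    "blk.0.ffn_gate_exps.weight",   # layer 0 experts -> stays on GPU
    "blk.5.ffn_up_exps.weight",     # layer 5 experts -> stays on GPU
    "blk.6.ffn_down_exps.weight",   # layer 6 experts -> overridden to CPU
    "blk.57.ffn_gate_exps.weight",  # layer 57 experts -> overridden to CPU
]:
    print(name, "-> CPU" if pattern.search(name) else "-> GPU")
```

Attention tensors and the shared/dense parts of every layer are untouched by the pattern, which is why `--n-gpu-layers 58` can still place them on the 3090 and P40 while the bulky experts sit in system RAM.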