Replying to ynniv

Qwen3-Coder Q4_K_XL, full context, @ 6 tokens/s

1x Nvidia 3090 (2020, $800)

1x Nvidia P40 (2017, $300)

2x EPYC Milan, 8 of 256 threads in use (2021, 2x$800)

DDR4 LRDIMM PC4-21300, 600 GB of 1TB in use ($1,250)

GIGABYTE MZ72-HB2 ($1,000)

`~/llama.cpp/build/bin/llama-cli --model /mnt/ollama/models/hf/Qwen3-Coder/UD-Q4_K_XL/Qwen3-Coder-480B-A35B-Instruct-UD-Q4_K_XL-00001-of-00006.gguf --threads 16 --ctx-size 262144 --n-gpu-layers 58 -ot "\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.=CPU" --numa numactl -fa --cache-type-k q4_0 --cache-type-v q4_0`
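The `-ot` (override-tensor) pattern is what makes this fit: it pins the expert FFN tensors (`ffn_gate_exps`, `ffn_up_exps`, `ffn_down_exps`) of every layer numbered 6 or higher to CPU/system RAM, keeping only the first few layers' experts plus attention and shared weights on the GPUs. A small sketch (the `blk.N.` tensor-name layout is the usual llama.cpp/GGUF convention; the layer numbers here are just illustrative) showing which tensors the regex routes where:

```shell
# Apply the -ot regex from the command above to sample GGUF tensor names.
# Layers 0-5 stay on GPU; any 1-digit layer >= 6, or any 2-/3-digit layer,
# has its expert FFN tensors overridden to CPU.
pattern='\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps\.'
for layer in 0 5 6 9 10 57; do
  name="blk.${layer}.ffn_gate_exps.weight"
  if echo "$name" | grep -Eq "$pattern"; then
    echo "$name -> CPU"
  else
    echo "$name -> GPU"
  fi
done
```

This is why `--n-gpu-layers 58` can be set high despite a 480B model: the bulk of the MoE expert weights never touch VRAM, and the 8-16 CPU threads handle the expert matmuls out of the ~600 GB of DDR4.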

ynniv 5mo ago

bah, should have been "8 of 128 cores" or "16 of 256 threads"

