Mixtral 8x7B is fast!

Discussion

On what hardware?

I have a quantized version from TheBloke running on my MacBook Pro 64GB and was impressed as well.

AMD RDNA2 GPU with ROCm acceleration. I’m running a GGUF build with some layers offloaded to the GPU and the rest sitting in DDR4 system RAM. It’s very fast compared to my favorite 70Bs.

Over 4 tokens/s on Q5_K_M. In comparison, Euryale 1.3 and LZLV do around 0.9 tokens/s.
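
For a rough sense of scale, here is the arithmetic on those numbers as a quick sketch. It treats "over 4" as exactly 4.0, which is an assumption, so the ratio is a lower bound. The variable names and the 500-token reply length are just illustrative.

```python
# Throughput figures quoted above. 4.0 is a floor, since the post says "over 4".
mixtral_tps = 4.0
dense_70b_tps = 0.9  # Euryale 1.3 / LZLV, as reported above


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` tokens at a steady rate."""
    return tokens / tokens_per_second


speedup = mixtral_tps / dense_70b_tps

print(f"speedup: at least {speedup:.1f}x")
print(
    "500-token reply: "
    f"{seconds_for(500, mixtral_tps):.0f}s vs "
    f"{seconds_for(500, dense_70b_tps):.0f}s"
)
```

So a reply that takes a couple of minutes on Mixtral would take over nine minutes on the 70Bs at these rates.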

I’m close to biting the bullet on the M3 Max 128GB. It seems to be the best bang for the buck considering the power usage and how stingy Nvidia and AMD are with VRAM.