2:4 Sparse Llama: Smaller Models for Efficient GPU Inference https://blog.quintarelli.it/2024/12/24-sparse-llama-smaller-models-for-efficient-gpu-inference/

