2:4 Sparse Llama: Smaller Models for Efficient GPU Inference https://blog.quintarelli.it/2024/12/24-sparse-llama-smaller-models-for-efficient-gpu-inference/
Discussion
No replies yet.