I'm currently writing an implementation of the 1-bit (ternary, 1.58-bit) quantized linear layer from this paper. If this works at scale, costs for freshly trained LLMs go to ~zero. https://arxiv.org/pdf/2402.17764.pdf
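For anyone wondering what such a layer might look like, here is a rough pure-Python sketch of the absmean weight quantization described in the linked paper: scale the weights by their mean absolute value, round, and clip to {-1, 0, +1}. The function names `absmean_ternarize` and `ternary_linear` and the `eps` default are my own assumptions for illustration, not the paper's code. The sketch covers only the forward pass and leaves out activation quantization and training.

```python
def absmean_ternarize(weights, eps=1e-5):
    """Quantize a 2-D list of floats to {-1, 0, +1} with one shared scale.

    Hypothetical helper: the name and the eps default are assumptions.
    """
    n = sum(len(row) for row in weights)
    # gamma is the mean absolute value over the whole weight matrix.
    gamma = sum(abs(w) for row in weights for w in row) / n
    scale = gamma + eps
    # Round to the nearest integer, then clip into [-1, 1].
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def ternary_linear(x, ternary, scale):
    """Compute y = scale * (W_q @ x) using only additions and subtractions."""
    out = []
    for row in ternary:
        acc = 0.0
        for q, xi in zip(row, x):
            if q == 1:
                acc += xi
            elif q == -1:
                acc -= xi
            # q == 0 contributes nothing, so the input is skipped.
        out.append(acc * scale)  # one multiply per output to restore the scale
    return out


weights = [[0.9, -0.05, -1.2], [0.4, 0.0, 0.1]]
ternary, scale = absmean_ternarize(weights)
y = ternary_linear([1.0, 2.0, 3.0], ternary, scale)
print(ternary)  # [[1, 0, -1], [1, 0, 0]]
```

The point of the sketch is that the inner loop has no multiplications: each weight either adds the input, subtracts it, or skips it, and the single shared scale is applied once per output.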
Discussion
No replies yet.