https://huggingface.co/relaxml/Llama-2-70b-chat-QTIP-2Bit
New quant method, allegedly no drop in quality; lets you fit 70b llamas on a 3090
If I procrastinate upgrading my hardware long enough I just won't need to
Ooooh fuck, their Llama 3.1 405b takes up 110GB of VRAM
A 4x MI60 setup would get you 128GB and be like $2k?
Alt: a 6x3090 rig for 144GB of VRAM, but that would cost like $5k
"From preliminary testing QTIP 1 bit 405b is pretty usable" (58gb 405b model)
nostr:nprofile1qqstnem9g6aqv3tw6vqaneftcj06frns56lj9q470gdww228vysz8hqpzemhxue69uhhyetvv9ujuurjd9kkzmpwdejhgqg6waehxw309ahx7um5wghx7unpdenk2urfd3kzuer9wcq3wamnwvaz7tmjv4kxz7fwvd6hyun9de6zuenedyvu6425 the near future looks crazy. Looks like it isn't supported in the major engines yet
https://huggingface.co/collections/relaxml/qtip-quantized-models-66fa253ad3186746f4b62803