https://huggingface.co/relaxml/Llama-2-70b-chat-QTIP-2Bit
New quant method, allegedly no drop in quality; lets you fit 70b llamas on a 3090
If I procrastinate upgrading my hardware long enough I just won't need to
Ooooh fuck, their Llama 3.1 405b takes up 110GB of VRAM
A 4x MI60 setup would get you 128GB and be like $2k?
Alt: a 6x3090 rig for 144GB of VRAM, but that would cost like $5k
"From preliminary testing QTIP 1 bit 405b is pretty usable" (58gb 405b model)
nostr:nprofile1qqstnem9g6aqv3tw6vqaneftcj06frns56lj9q470gdww228vysz8hqpzemhxue69uhhyetvv9ujuurjd9kkzmpwdejhgqg6waehxw309ahx7um5wghx7unpdenk2urfd3kzuer9wcq3wamnwvaz7tmjv4kxz7fwvd6hyun9de6zuenedyvu6425 the near future looks crazy. Looks like it isn't supported in the major engines yet
https://huggingface.co/collections/relaxml/qtip-quantized-models-66fa253ad3186746f4b62803