72 GB, but sadly it's not unified, only 24 GB max per card, so I can't run big models
Pinging nostr:nprofile1qqsw9n8heusyq0el9f99tveg7r0rhcu9tznatuekxt764m78ymqu36csxjejf, what's the total VRAM in that monster you built?
nostr:nprofile1qyw8wumn8ghj7un9d3shjtnzd96xxmmfdecxzunt9e3k7mf0qy2hwumn8ghj7un9d3shjtn4w3ux7tn0dejj7qpqutx00neqgqln72j22kej3ux7803c2k986henvvha4thuwfkper4sau8ykj have you tried something that supports tensor parallelism like https://github.com/turboderp-org/exllamav2?
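For anyone following along, this is roughly what tensor-parallel loading looks like in ExLlamaV2. A minimal sketch only: the model path and context length are placeholders, and the load_tp / ExLlamaV2Cache_TP names reflect recent versions of the library (TP landed around 0.1.9), so double-check against the examples in the repo.

```python
# Minimal tensor-parallel inference sketch with ExLlamaV2.
# Paths and sizes below are placeholders, not a working config.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_TP,   # tensor-parallel KV cache, sharded across GPUs
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/path/to/exl2-quantized-model")  # placeholder path
model = ExLlamaV2(config)

# Shard the weights across all visible GPUs instead of filling one card at a time,
# so a model larger than any single card's 24 GB can still be loaded.
model.load_tp(progress=True)

cache = ExLlamaV2Cache_TP(model, max_seq_len=8192)  # example context length
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello, world", max_new_tokens=64))
```

The difference from the usual autosplit path is that each layer's tensors are split across all the cards, so every GPU works on every token instead of waiting for its slice of layers, which helps both fit and speed on a multi-24 GB box.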