So some prelim testing nostr:npub1a6we08n7zsv2na689whc9hykpq4q6sj3kaauk9c2dm8vj0adlajq7w0tyc nostr:npub1sceswwu9vldf0tg3nlkajutp2j4zsv35ewma9r23x8d2qa73eaaqmmq7fu
2x Titan X (Maxwell) -> 2x Tesla P100 16GB
gpt-oss 20b - skyrocketing performance increase, from less than 10 t/s to almost 40 t/s. It's amazing (I believe it's utilizing the P100's native FP16 compute; see the quick check after the results below)
Small models <15b (Granite, Gemma, DeepSeek) - about equal or worse performance
Larger models >20b (Qwen 3, Gemma, DeepSeek 32b) - nearly double, from ~4 to over 8 t/s
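To sanity-check the FP16 theory, here's a rough PyTorch sketch (not my actual benchmark setup, just assuming a CUDA build of torch is installed) that compares FP32 vs FP16 matmul throughput. On the P100 (compute capability 6.0) fp16 should come out around 2x fp32; on the Maxwell Titan X it won't.

```python
# Rough FP32 vs FP16 matmul throughput check (illustrative sketch only).
# Assumes a CUDA build of PyTorch is installed.
import time
import torch

def matmul_tflops(dtype, n=4096, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b  # result discarded; we only care about throughput
    torch.cuda.synchronize()
    return iters * 2 * n**3 / (time.time() - start) / 1e12  # 2*n^3 FLOPs per matmul

print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
print(f"fp32: {matmul_tflops(torch.float32):.1f} TFLOPS")
print(f"fp16: {matmul_tflops(torch.float16):.1f} TFLOPS")
```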
Still pretty slow compared to really modern GPUs, but for the price, I'm not upset at all. I want to explore some driver stuff or maybe some tuning, because it seems like I should be seeing better numbers. It might just be the rest of the old server holding things back.
Power consumption is nearly 100 W less for the same loads. Idle is about 15 W higher per card (they idle around 30 W each instead of 15). Performance per watt has increased dramatically. The power limit was set to 150 W/card in all tests; the Titans typically went over it, while the P100s aren't even coming close.
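If you want to watch the per-card draw and limits yourself, a minimal pynvml loop like this works (a sketch, not necessarily how I logged the numbers above; the 150 W cap itself can be set with something like sudo nvidia-smi -pl 150):

```python
# Minimal per-GPU power monitor using pynvml (pip install nvidia-ml-py).
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]
for i, h in enumerate(handles):
    limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(h) / 1000  # mW -> W
    print(f"GPU{i}: power limit {limit_w:.0f} W")

try:
    while True:
        draws = [pynvml.nvmlDeviceGetPowerUsage(h) / 1000 for h in handles]  # mW -> W
        print(" | ".join(f"GPU{i}: {d:5.1f} W" for i, d in enumerate(draws)))
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```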
I'll be playing with vGPU at a later date. I've always dreamt of a thin client setup, but it's just never really worked out.