So some prelim testing nostr:npub1a6we08n7zsv2na689whc9hykpq4q6sj3kaauk9c2dm8vj0adlajq7w0tyc nostr:npub1sceswwu9vldf0tg3nlkajutp2j4zsv35ewma9r23x8d2qa73eaaqmmq7fu

2x Titan X Maxwell -> 2x Tesla P100 16GB

gpt-oss 20b: a skyrocketing performance increase, from less than 10 t/s to almost 40 t/s. It's amazing (it utilizes the new FP16 compute, I believe).

Small models <15b (Granite, Gemma, DeepSeek) - about equal or worse performance

Larger models >20b (Qwen 3, Gemma, DeepSeek 32b) - nearly doubled, from ~4 to over 8 t/s

Still pretty slow compared to really modern GPUs, but for the price I'm not upset at all. I want to explore some driver options or maybe some tuning, because it seems like I should be seeing better; it might just be the rest of the old server holding things back.
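For anyone curious how numbers like these get measured, here's a minimal sketch that asks a local Ollama instance for a completion and computes t/s from its reported stats (assumes the default endpoint on localhost:11434; the model tag is just an example):

```python
# Rough tokens/sec check against a local Ollama server (default port 11434).
# The model tag below is only an example; swap in whatever is already pulled.
import requests

MODEL = "gpt-oss:20b"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": MODEL, "prompt": "Explain PCIe lanes in two sentences.", "stream": False},
    timeout=600,
)
data = resp.json()
# Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
tps = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{MODEL}: {tps:.1f} t/s over {data['eval_count']} tokens")
```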

Power consumption is nearly 100W less for the same loads. Idle is about 15W higher per card (they idle around 30W each instead of 15W), but performance per watt has increased dramatically. The power limit was set to 150W/card in all tests; the Titans typically went over it, and the P100s aren't even coming close.
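If you want to sanity-check those power numbers rather than eyeball nvidia-smi, here's a quick sketch using the NVML Python bindings (nvidia-ml-py, imported as pynvml) that polls per-card draw against the enforced limit; treat it as illustrative, not a benchmark harness:

```python
# Poll per-GPU power draw and the enforced power limit via NVML.
# Requires the nvidia-ml-py package and a working NVIDIA driver.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    for _ in range(10):  # ten one-second samples; run it while a model is generating
        for i, h in enumerate(handles):
            draw_w = pynvml.nvmlDeviceGetPowerUsage(h) / 1000        # reported in mW
            limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(h) / 1000
            print(f"GPU{i}: {draw_w:5.1f} W / limit {limit_w:.0f} W")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```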

I'll be playing with vGPU at a later date. I've always dreamt of a thin-client setup, but it's just never really worked out.

Discussion

cc nostr:npub1w4jkwspqn9svwnlrw0nfg0u2yx4cj6yfmp53ya4xp7r24k7gly4qaq30zp

try an MoE model; these don't require as much cross-card communication. also try llama.cpp vs ollama, and some of the layer tuning options
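if you end up driving llama.cpp from Python (the llama-cpp-python bindings, built with CUDA), the layer-tuning knobs look roughly like this; the model path, split ratio, and context size are placeholders, not recommendations:

```python
# Sketch: offloading layers across two cards with llama-cpp-python.
# The same knobs exist as llama.cpp CLI flags (-ngl / --tensor-split).
from llama_cpp import Llama

llm = Llama(
    model_path="/models/qwen3-coder-30b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,           # -1 = offload every layer that fits in VRAM
    tensor_split=[0.5, 0.5],   # proportion of layers per GPU (two 16GB cards)
    n_ctx=8192,                # context window; raise if VRAM allows
)

out = llm("Write a haiku about dual P100s.", max_tokens=64)
print(out["choices"][0]["text"])
```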

I will look into that, thanks! I wonder if llama.cpp is compatible with OpenWebUI; I need that for my workflow right now.

i've been bummed that qwen3-coder runs twice as fast with a specific llama.cpp tune as with ollama, and just today realized that you can use llama-server with openwebui. add an "openai api connection" pointing at localhost:8080 (change with --host and --port)
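before wiring it into openwebui, you can confirm llama-server's OpenAI-compatible endpoint is answering with a few lines (assumes the default localhost:8080; for a single-model server the model field is mostly informational):

```python
# Quick check that llama-server's OpenAI-compatible endpoint is up
# before pointing OpenWebUI's "openai api connection" at it.
import requests

BASE = "http://localhost:8080"   # match llama-server's --host / --port

resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "qwen3-coder",  # largely ignored by a single-model llama-server
        "messages": [{"role": "user", "content": "Say hi in five words."}],
        "max_tokens": 32,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```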

Okay, well that's good to know! It's usually just a matter of time that ollama gets the upgrades. I'm pretty patient.

yeah, i default to ollama. the qwen3-coder tune is very specific

Also, I didn't realize a qwen3-coder had been released! I think I'm still using qwen2.5-coder.

yes! https://ollama.com/library/qwen3-coder you should be able to run the 30b pretty easily with those two 16gb cards. if you have a few hundred GB of RAM you also might be able to get usable t/s out of the 480b model nostr:nevent1qvzqqqqqqypzq4mdy0wrmvs9d5sgsj2x9lhrtr8e7renzz3vv09kcfn6fw04sj8eqy88wumn8ghj7mn0wvhxcmmv9uq3zamnwvaz7tmwdaehgu3wd3skuep0qyghwumn8ghj7mn0wd68ytnhd9hx2tcqyqj5vwnflqn0tvgephwy9y6de0p2ywfpckujr4r8pp2dgf2lpzuvs6wqrsk

I have 128GB and could probably bump it to 256, but I'm on DDR3 1833 quad channel.
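Quick back-of-envelope on what that memory can feed for CPU offload; the active-parameter count and quantization size below are assumptions, so treat the result as a rough ceiling rather than a prediction:

```python
# Very rough upper bound on CPU-offloaded t/s from memory bandwidth alone.
# Assumptions: quad-channel DDR3 around 1866 MT/s, and the 480b model being
# a MoE with ~35B active parameters at ~4.5 bits/weight after quantization.
channels, bus_bytes, mtps = 4, 8, 1.866e9
bandwidth_gbs = channels * bus_bytes * mtps / 1e9        # ~59.7 GB/s

active_params = 35e9
gb_per_token = active_params * (4.5 / 8) / 1e9           # ~19.7 GB read per token

print(f"bandwidth ceiling  ≈ {bandwidth_gbs:.1f} GB/s")
print(f"throughput ceiling ≈ {bandwidth_gbs / gb_per_token:.1f} t/s")
```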

Also, 6 t/s seems pretty easy to beat with my old-ass hardware, going by the numbers I've seen so far. Are your numbers from the 480b model, I hope? Downloading now and will report back!

yeah, that's 480b with 256k context

i mean, don't knock dual P100's - you're going to have a lot of fun 😎

Holy shit! 39 t/s on 30b!

i get ~112 t/s, but my p40 + 3090 cost more than three times what two p100's do. local ai time 🤙

if you've got the power space, you'd be pretty well off with another two of those! they rarely draw full power. hmm, maybe i should stuff one in my rig 🤔

They sip power compared to the Titans (which I had power-limited). I paid $110/card to my door. I only have a 2U chassis, and I just got rid of my old Dell 900 series machines. The next affordable chassis for me is either an R740 or an R7425, if I decide to go that route. I also need a new workstation; I was looking at the Precision 7920 rigs as well. I found a pair of 2nd-gen Xeons that should outperform the 3900X I have now.

if you're doing more inference, keep an eye on the cpu's pcie lanes. amd tends to have more of them than intel, though iirc the xeons aren't bad. i'm really digging these used epyc milans though
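nvml will also tell you what link each card actually negotiated (risers and shared slots can quietly drop you to x8 or x4); a small sketch with pynvml:

```python
# Report current vs. max PCIe link generation/width per GPU via NVML.
# Note: cards downshift the link at idle, so check while a model is loading.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    cur = (pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h),
           pynvml.nvmlDeviceGetCurrPcieLinkWidth(h))
    mx = (pynvml.nvmlDeviceGetMaxPcieLinkGeneration(h),
          pynvml.nvmlDeviceGetMaxPcieLinkWidth(h))
    print(f"GPU{i}: PCIe gen{cur[0]} x{cur[1]} (max gen{mx[0]} x{mx[1]})")
pynvml.nvmlShutdown()
```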

I've got 2x P40s (24GB) waiting at home to build a similar machine. I was worried it might not perform well enough, but these results are encouraging.

Qwen3-coder 30b just hit 39 t/s if you need a little more fuel! Although I don't think the P40s have the FP16 units, so I'm not sure I'd expect performance as good as I'm getting, but please let me know once you get it up and running!

Even when not the fastest, I'm sure you'll find more than a few applications for the P100s ;). Good find for the price.