Nostr Web Client

i've been bummed that qwen3-coder runs twice as fast with a specific llamacpp tune than with ollama, and just today realized that you can use llama-server with openwebui. add an "openai api connection" to localhost:8080 (change with --host and --port)

ChipTuner 3mo ago 💬 1

Also I didn't realize a qwen3-coder released! I think Im still using qwen2.5-coder

Reply to this note

Please Login to reply.

Discussion

ynniv 3mo ago 💬 2

yes! https://ollama.com/library/qwen3-coder you should be able to run the 30b pretty easily with those two 16gb cards. if you have a few hundred GB of RAM you also might be able to get usable t/s out of the 480b model nostr:nevent1qvzqqqqqqypzq4mdy0wrmvs9d5sgsj2x9lhrtr8e7renzz3vv09kcfn6fw04sj8eqy88wumn8ghj7mn0wvhxcmmv9uq3zamnwvaz7tmwdaehgu3wd3skuep0qyghwumn8ghj7mn0wd68ytnhd9hx2tcqyqj5vwnflqn0tvgephwy9y6de0p2ywfpckujr4r8pp2dgf2lpzuvs6wqrsk

ChipTuner 3mo ago

I have 128gb and could probably bump it to 256, but im on ddr3 1833 quad channel.

ChipTuner 3mo ago 💬 1

Also 6t/s seems pretty easy to beat with my old ass hardware with the numbers I've seen so far. Are your numbers on the 480b model I hope? Downloading now and will report back!

ynniv 3mo ago 💬 1

yeah, that's 480b with 256k context

ynniv 3mo ago 💬 1

i mean, don't knock dual P100's - you're going to have a lot of fun 😎

ChipTuner 3mo ago 💬 1

Holy shit! 39 t/s on 30b!

ynniv 3mo ago 💬 1

i get ~112 t/s, but my p40 + 3090 cost more than three times two p100's. local ai time 🤙

if you've got the power space, you'd be pretty well off with another two of those! they rarely draw full power. hmm, maybe i should stuff one in my rig 🤔

ChipTuner 3mo ago 💬 1

They sip power compared to the titans (which I had limited). I paid $110/card to my door. I only have a 2u chassis. I just got rid of my old Dell 900 series machines. Next affordable chassis for me is either an r740 or r7425 if I decide to go that route. I also need a new workstation too, was looking at the precision 7920 rigs as well. I found a pair of 2nd gen Xeons that should out perform the 3900x I have now.

ynniv 3mo ago

if you're doing more inference, keep an eye on the cpu's pcie lanes. amd tends to have more of them than intel, though iirc the xenons aren't bad. i'm really digging these used epyc milans though