Which model? So far llama3.3 is the only local model I find tolerable, but it's throttled by how fast my RAM can feed the remaining 18GB of weights to my CPU. So mostly I'm talking to my CPU, I guess, even though the GPU is doing 4/7 of the work.
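
For anyone curious, here's the rough back-of-envelope I'm going by; the ~60 GB/s figure is just an assumption for dual-channel DDR5, not a measurement, so plug in your own numbers:

```python
# Back-of-envelope only: every generated token has to stream the CPU-resident
# weights out of system RAM once, so RAM bandwidth sets the ceiling for the
# offloaded layers no matter how fast the CPU itself is.
cpu_resident_gb = 18      # the chunk of llama3.3 that doesn't fit in VRAM
ram_bandwidth_gbps = 60   # assumed dual-channel DDR5; substitute your own figure

seconds_per_token = cpu_resident_gb / ram_bandwidth_gbps  # CPU-side time, bandwidth-bound
print(f"~{1 / seconds_per_token:.1f} tok/s ceiling from RAM bandwidth alone")  # ≈ 3.3
```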

Discussion

I get about 3.5 tokens/sec. A 5090 is tempting simply because the extra VRAM would leave roughly half as much of the model in system RAM, cutting the RAM-bound part of each token about in half.
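
Rough math, reusing the bandwidth estimate above and assuming the current card is a 24GB one versus the 5090's 32GB (so roughly 8GB less of the model left in system RAM):

```python
ram_bandwidth_gbps = 60  # same assumed dual-channel DDR5 figure as in the note above
cpu_resident_gb = {"current card": 18, "hypothetical 5090": 18 - 8}

for card, gb in cpu_resident_gb.items():
    # tokens/sec ceiling set by streaming the CPU-resident weights once per token
    print(f"{card}: ~{ram_bandwidth_gbps / gb:.1f} tok/s ceiling")
```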

Yeah, llama is what I use. It's fast.

At 70B parameters? 3.3 t/s is about as fast as a quick human typist, but not so fast that I don't pick and choose what to ask it. It works pretty well with Continue AI, but the default context window in Ollama is kinda small if I need it to look at more than a few files.
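
If the small default window is the bottleneck, Ollama does accept a per-request num_ctx option. A minimal sketch with the official Python client (assumes `pip install ollama`, that your RAM can absorb the larger KV cache, and 8192 is just an example value):

```python
import ollama  # official Ollama Python client

resp = ollama.chat(
    model="llama3.3",
    messages=[{"role": "user", "content": "Summarize the files below..."}],
    options={"num_ctx": 8192},  # raise the context window above the small default
)
print(resp["message"]["content"])
```

The same setting can be baked into a Modelfile with `PARAMETER num_ctx` if you'd rather not pass it per request.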

Oh wait. Maybe you have a Mac with tons of unified memory?

I use 3.1 on my 8GB VRAM GPU.