Yeah, Llama is what I use. It's fast.


Discussion

At 70B parameters? 3.3 t/s is about as fast as a quick human typist, but not so fast that I don't pick and choose what to ask it. Works pretty well with Continue AI. But the default context window in Ollama is kinda small if I need it to look at more than a few files.
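For what it's worth, Ollama's context window can be raised per-model with a Modelfile. A minimal sketch, assuming a Llama 3.x base model; the `num_ctx` value here is just an example, and bigger values eat more VRAM:

```
# Modelfile — derive a variant with a larger context window
FROM llama3.1
PARAMETER num_ctx 8192
```

Then build and run the variant with `ollama create llama3.1-8k -f Modelfile` and `ollama run llama3.1-8k`.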

Oh wait. Maybe you have a Mac with tons of unified memory?

I use 3.1 on my 8 GB VRAM GPU.