Which inference engine and tech stack, if I may ask? Have you used WebGPU and/or Metal on Mac/iOS?


Discussion

Also, which model are you using?

It was only on an M3 CPU so far - no GPU yet. We'll run that benchmark later, but GPU inference might be more expensive for relay operators, and some users may not have GPUs.

gemma3:1b

llama3.2:1b

I mean, it's night and day: these models need a GPU, there's no point otherwise. AFAIK the memory on Apple Silicon is unified, so you should be able to use it with Metal kernels.
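For reference, here is a minimal sketch of what Metal GPU offload looks like with llama.cpp on Apple Silicon. This assumes llama.cpp is the engine and a local GGUF quantization of gemma3:1b - the thread doesn't confirm either, so treat the paths and model filename as placeholders:

```shell
# Build llama.cpp with the Metal backend (enabled by default on macOS/Apple Silicon).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build --config Release

# Offload all layers to the GPU; -ngl sets the number of GPU layers.
# Model filename is a hypothetical local GGUF file, not from the thread.
./build/bin/llama-cli -m gemma-3-1b-it-Q4_K_M.gguf -ngl 99 -p "Hello"
```

Because the memory is unified, offloading layers with `-ngl` doesn't copy weights to a separate VRAM pool; the Metal kernels read the same RAM the CPU uses. Ollama (which the `gemma3:1b` / `llama3.2:1b` tags suggest) enables Metal automatically on macOS with no flags needed.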