Which inference engine and tech stack, if I may ask? Have you used WebGPU and/or Metal on Mac/iOS?


Discussion

Also, which model are you using?

It was only on an M3 CPU so far - no GPU yet. We'll run that benchmark later, but GPU inference might be more expensive for relay operators, and some users may not have GPUs.

gemma3:1b

llama3.2:1b

I mean, it's night and day: these models need a GPU, there's no point otherwise. AFAIK the memory on Apple Silicon is unified, so you should be able to use it with Metal kernels.
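For reference, here is a minimal sketch of what Metal GPU offload looks like with llama.cpp on Apple Silicon. This assumes llama.cpp is the engine and a local GGUF quantization of gemma3:1b - the thread doesn't confirm either, so treat the paths and model filename as placeholders:

```shell
# Build llama.cpp with the Metal backend (enabled by default on macOS/Apple Silicon).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build --config Release

# Offload all layers to the GPU; -ngl sets the number of GPU layers.
# Model filename is a hypothetical local GGUF file, not from the thread.
./build/bin/llama-cli -m gemma-3-1b-it-Q4_K_M.gguf -ngl 99 -p "Hello"
```

Because the memory is unified, offloading layers with `-ngl` doesn't copy weights to a separate VRAM pool; the Metal kernels read the same RAM the CPU uses. Ollama (which the `gemma3:1b` / `llama3.2:1b` tags suggest) enables Metal automatically on macOS with no flags needed.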