Also, which model are you using?

Discussion

So far it has only run on an M3 CPU, no GPUs yet. We’ll do the GPU benchmark later, but that might be more expensive for relay operators, and some users may not have GPUs.

gemma3:1b

llama3.2:1b

I mean, it’s like night and day: these models need a GPU, there’s no point otherwise. AFAIK the memory on the M3 is unified, so you should be able to use the GPU with Metal kernels.