Wild timing… 2 days after I posted this.

When faster, smaller LLMs arrive, it's going to be awesome. Gotta prepare everything else first. 🌐🧱

https://x.com/testingcatalog/status/1916091849795653920

nostr:note1ym30kgavaj30jg4867k28526tsy75wa9zvzut30myrg2ewwyphms4lwpxu


Discussion

https://PremAI.io can help!

We were already using 1B models, but they were too slow. We need faster speeds from the smaller models.

We've got the code ready now though, so when a smaller, faster model finally arrives, game on. 🦉

Which inference engine and tech stack, if I may ask? Have you used WebGPU and/or Metal on Mac/iOS?

Also, which model are you using?

It was only on an M3 CPU so far, no GPUs yet. We'll do that benchmark later, but GPUs might be more expensive for relay operators, and some users may not have one.

gemma3:1b

llama3.2:1b
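Those tags look like Ollama model names, so for anyone who wants to reproduce the comparison, here's a minimal tokens-per-second sketch against a local Ollama server. It assumes the models have already been pulled (e.g. `ollama pull gemma3:1b`) and that Ollama is running on its default port; the prompt is just a placeholder.

    # Rough tokens/sec benchmark against a local Ollama server.
    # Assumes the models are already pulled and Ollama is on its default port.
    import requests

    def bench(model: str, prompt: str = "Summarize Nostr in one paragraph.") -> float:
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=300,
        )
        r.raise_for_status()
        data = r.json()
        # eval_count = tokens generated, eval_duration = nanoseconds spent generating
        return data["eval_count"] / data["eval_duration"] * 1e9

    for m in ["gemma3:1b", "llama3.2:1b"]:
        print(m, f"{bench(m):.1f} tok/s")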

I mean, it's night and day; these models need a GPU, there's no point otherwise. AFAIK the memory on Apple Silicon is unified, so you should be able to use it with Metal kernels.
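For what it's worth, one easy way to try Metal offload on an M-series Mac is llama-cpp-python built with Metal support; since the memory is unified, offloading all layers is usually fine for a 1B model. A minimal sketch, assuming a local GGUF file (the path below is hypothetical):

    # Minimal sketch of Metal GPU offload via llama-cpp-python.
    # Requires the package built with Metal support; the GGUF path is hypothetical.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-3.2-1b-instruct-q4_k_m.gguf",  # hypothetical local file
        n_gpu_layers=-1,  # -1 offloads all layers to the GPU (Metal on Apple Silicon)
        n_ctx=2048,
    )
    out = llm("Explain unified memory in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

Comparing tok/s with n_gpu_layers=0 versus -1 on the same machine would show exactly how big the CPU/GPU gap is for these 1B models.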