https://PremAI.io can help!
Wild timing… 2 days after I posted this.
When faster and smaller LLMs arrive, it's going to be awesome. Gotta prepare everything else first. 🌐🧱
https://x.com/testingcatalog/status/1916091849795653920
nostr:note1ym30kgavaj30jg4867k28526tsy75wa9zvzut30myrg2ewwyphms4lwpxu
Discussion
We were already using 1B models, but they were too slow. We need smaller models with faster inference.
We've got the code ready now, though, so when a smaller, faster model finally arrives, game on. 🦉
Which inference engine and tech stack, if I can ask? Have you used WebGPU and/or Metal on Mac/iOS?
Also, which model are you using?
It was only on an M3 CPU so far, no GPU yet. We'll benchmark GPUs later, but they might be more expensive for relay operators, and some users may not have them.
gemma3:1b
llama3.2:1b
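Those tags look like Ollama model names, so here's a minimal sketch of a CPU tokens/sec check against Ollama's documented /api/generate endpoint. This assumes a local Ollama server with both models pulled; the prompt is just a placeholder:

```python
# Hedged benchmark sketch: compare generation speed of the two 1B models
# via Ollama's local HTTP API (assumes `ollama serve` is running).
import requests

MODELS = ["gemma3:1b", "llama3.2:1b"]
PROMPT = "Summarize this note in one sentence: hello world"  # placeholder

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
    ).json()
    # Ollama reports eval_count (tokens generated) and eval_duration (ns)
    # in the final response, so tokens/sec falls out directly.
    tok_per_s = resp["eval_count"] / resp["eval_duration"] * 1e9
    print(f"{model}: {tok_per_s:.1f} tokens/s")
```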
I mean, it's like day and night; these models need a GPU, no point otherwise. AFAIK the memory on Apple silicon is unified, so you should be able to use it with Metal kernels.
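If anyone wants to test that claim, here's a sketch using llama-cpp-python, which ships a Metal backend on macOS; since the memory is unified, offloading layers to the GPU reads the same RAM the CPU uses. The model path is a placeholder for any local GGUF file:

```python
# Sketch of Metal GPU offload on Apple silicon via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-3-1b.gguf",  # placeholder: any local GGUF model
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU (Metal on macOS)
)
out = llm("Say hello in five words.", max_tokens=32)
print(out["choices"][0]["text"])
```

Rerunning the same prompt with n_gpu_layers=0 would give the CPU-only baseline for the relay-operator cost comparison mentioned above.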