How long does it take on your Pixel (which model)?
4 seconds. But it's all hacked up with Python shit everywhere. I need to explore a more native library. More to come.
The Python shit is usually not the bottleneck; it probably calls into native libs for the LLM stuff. Are you using the TPU?
Frankly, I am just trying a bunch of stuff (libraries, demo apps) to see where we are at with local LLMs. Most of what I am seeing is just very poor ports of server runtimes, which is terrible.