4 seconds. But it's all hacked up with Python shit everywhere. I need to explore a more native library. More to come.

Discussion

The Python shit is usually not the bottleneck; it probably calls into native libs for the LLM stuff. Are you using the TPU?

Frankly, I am just trying a bunch of stuff (libraries, demo apps) to see where we are at with local LLMs. Most of what I am seeing is just very poor ports of server runtimes, which is terrible.