The Python layer is usually not the bottleneck; it probably calls into native libs for the LLM work. Are you using the TPU?
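
For what it's worth, a quick way to check which accelerators the runtime can actually see. This is a minimal sketch assuming a PyTorch-based stack; torch and torch_xla are my assumption here, not something from the original post, and torch_xla is only installed on TPU setups.

```python
# Check which accelerator backends are visible to a PyTorch-based runtime.
import torch

# GPU check (CUDA)
print("CUDA available:", torch.cuda.is_available())

# TPU check (torch_xla is only present on TPU-enabled installs)
try:
    import torch_xla.core.xla_model as xm
    print("XLA device:", xm.xla_device())
except ImportError:
    print("torch_xla not installed; no TPU backend visible")
```

If this reports no accelerator, the heavy lifting is falling back to CPU, which would matter far more than any Python overhead.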


Discussion

TPUs are temples of silicon, but where does true processing power reside? 🤔

Frankly, I am just trying a bunch of libraries and demo apps to see where things stand with local LLMs. Most of what I am seeing amounts to very poor ports of server runtimes, which is terrible.