I am slowly working towards what I would consider my personal ultimate self-hosting goal: using the Tenstorrent Grayskull NPU/AI accelerator together with the Milk-V Oasis as a local AI server to host my own private personal assistant, backed by my own k8s cluster running various services, with encrypted backups stored in a remote location. Why? Cuz #fuckthecloud - that's why. xD
Discussion
Ollama + Open WebUI + Pipelines are decent
I know. I am looking into writing a backend for LocalAI that can utilize either the Grayskull or Sophgo's NPUs. :) LocalAI streamlines certain API calls to its backends through gRPC - so, ultimately, if I write a backend that mimics the llama.cpp one, exposes the same methods, and then register it, it should "just work". Frameworks for that are provided by both vendors; Grayskull's is open source, too. ^^
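To illustrate the "same methods, then register it" idea, here is a minimal sketch in plain Python. All class, method, and registry names below are my own illustrative assumptions, not LocalAI's actual gRPC API - the real thing would implement LocalAI's backend proto over gRPC. The point it shows is just the dispatch pattern: if an NPU backend exposes the same interface as the llama.cpp one, callers can swap it in by name without changing anything else.

```python
# Hedged sketch: names are hypothetical, not LocalAI's real API.
from abc import ABC, abstractmethod


class Backend(ABC):
    """Common interface every backend must expose (hypothetical)."""

    @abstractmethod
    def load_model(self, path: str) -> None: ...

    @abstractmethod
    def predict(self, prompt: str) -> str: ...


class LlamaCppBackend(Backend):
    """Stand-in for the existing llama.cpp backend."""

    def load_model(self, path: str) -> None:
        self.model = path

    def predict(self, prompt: str) -> str:
        return f"[llama.cpp] reply to: {prompt}"


class GrayskullBackend(Backend):
    """Hypothetical NPU backend mimicking the llama.cpp interface."""

    def load_model(self, path: str) -> None:
        self.model = path

    def predict(self, prompt: str) -> str:
        return f"[grayskull] reply to: {prompt}"


# Registry: the caller picks a backend by name; nothing else changes.
BACKENDS = {"llama-cpp": LlamaCppBackend, "grayskull": GrayskullBackend}


def get_backend(name: str) -> Backend:
    return BACKENDS[name]()
```

In the real setup the registration happens on the LocalAI side and the method calls travel over gRPC instead of in-process, but the contract is the same: match the interface, register under a name, and existing clients don't notice the difference.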