On Hetzner ARM machines it works really well. On a CAX41 with Nous Hermes Llama 2 13B (GGML q4_0) I get about 9 tokens/s, cool for quick messing around.
Introducing LlamaGPT — a self-hosted, offline and private AI chatbot, powered by Llama 2, with absolutely no data leaving your device. 🔐
Yes, an entire LLM. ✨
Your Umbrel Home, Raspberry Pi (8GB) Umbrel, or custom umbrelOS server can run it with just 5GB of RAM!
Word generation benchmarks:
Umbrel Home: ~3 words/sec
Raspberry Pi (8GB RAM): ~1 word/sec
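To get a feel for what those rates mean in practice, here is a rough sketch that turns them into wait time per reply. The 150-word reply length is an assumption picked for illustration, and the helper name `seconds_for_reply` is hypothetical; real speed will vary with the model and the prompt.

```python
# Approximate generation rates quoted above, in words per second.
RATES_WORDS_PER_SEC = {
    "Umbrel Home": 3.0,
    "Raspberry Pi (8GB RAM)": 1.0,
}


def seconds_for_reply(words: int, words_per_sec: float) -> float:
    """Estimated time to generate `words` words at a steady rate."""
    if words_per_sec <= 0:
        raise ValueError("words_per_sec must be positive")
    return words / words_per_sec


REPLY_WORDS = 150  # assumed reply length, not a figure from the benchmarks

for device, rate in RATES_WORDS_PER_SEC.items():
    eta = seconds_for_reply(REPLY_WORDS, rate)
    print(f"{device}: ~{eta:.0f} s for a {REPLY_WORDS}-word reply")
```

Under those assumptions, a 150-word reply takes roughly 50 seconds on Umbrel Home and about two and a half minutes on the Raspberry Pi.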
→ Watch the demo: https://youtu.be/iu3_1a8SzeA
→ Install on umbrelOS: https://apps.umbrel.com/app/llama-gpt
→ GitHub: https://github.com/getumbrel/llama-gpt