In my experiments, I've been painfully frustrated with the hardware I have on hand for AI. I've tried Ollama, LM Studio, Jan, and GPT4All; call it 30 tokens a second at best with an 8B model. I also tried some basic local-docs stuff with a single PDF/CSV, and most 8B models can't even read a basic spreadsheet without failing miserably.
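For reference, here's a minimal sketch of the kind of test I mean, assuming the `ollama` Python package, a local `ollama serve` instance, and a small hypothetical `sales.csv`; the model tag is just an example. The whole file gets dumped into the prompt and the model is asked one question about it.

```python
# Minimal sketch: paste a small CSV straight into the prompt and ask a local
# 8B model about it through the Ollama server.
# Assumes `pip install ollama`, `ollama serve` running, and an 8B model pulled
# (the "llama3.1:8b" tag below is just an example).
import csv
from pathlib import Path

import ollama

CSV_PATH = Path("sales.csv")   # hypothetical spreadsheet export
MODEL = "llama3.1:8b"          # swap in whatever 8B tag you have locally

# Read the CSV and re-serialize it as plain text rows; small files only,
# since the whole thing has to fit in the model's context window.
with CSV_PATH.open(newline="") as f:
    rows = list(csv.reader(f))
table_text = "\n".join(", ".join(row) for row in rows)

question = "Which row has the highest value in the 'total' column?"

response = ollama.chat(
    model=MODEL,
    messages=[
        {
            "role": "user",
            "content": f"Here is a spreadsheet as CSV:\n{table_text}\n\n{question}",
        }
    ],
)
print(response["message"]["content"])
```

Even with a file this trivial, a lot of the 8B models I tried will still botch a simple column lookup.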