nostr:npub1xqm3e9m6fkr2lnsprxke4grhrly645pzw2z9k8kwm0eacrukxwcqwf85zq that's a really annoying feature of llama-cpp that I've not been able to completely work around yet - see also this related issue: https://github.com/mlc-ai/mlc-llm/issues/740
My code that attempts to fix that is here, but some output still leaks to stderr somehow: https://github.com/simonw/llm-llama-cpp/blob/b0c2f25165adde7204c7dd9eb80535447fd333f6/llm_llama_cpp.py#L255
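For reference, a minimal sketch of the general approach that code takes: redirecting stderr at the file-descriptor level (via os.dup2) rather than just reassigning sys.stderr, so that C-level writes from llama.cpp are silenced too. The suppress_stderr name here is hypothetical, not the actual helper in llm-llama-cpp, and as noted above this still doesn't catch everything - output emitted before the wrapper is installed, or via another descriptor, can still leak through.

```python
import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_stderr():
    """Temporarily redirect the stderr file descriptor to /dev/null.

    Works on C-level output (e.g. llama.cpp's logging) because it swaps
    the underlying fd, not just Python's sys.stderr object.
    """
    stderr_fd = sys.stderr.fileno()
    saved_fd = os.dup(stderr_fd)          # keep a copy so we can restore it
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, stderr_fd)    # point fd 2 at /dev/null
        yield
    finally:
        os.dup2(saved_fd, stderr_fd)      # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)

# Hypothetical usage: wrap the noisy model load
# with suppress_stderr():
#     llm = Llama(model_path="model.gguf")
```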