Model is https://huggingface.co/TheBloke/airoboros-65B-gpt4-1.2-GGML
Software is https://github.com/ggerganov/llama.cpp
I won't pretend the response was fast. A 30B or even a 13B model might be faster than Pygmalion.
llama.cpp can offload layers to the GPU.
Koboldcpp can use llama.cpp as its backend.
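For reference, a minimal sketch of running a GGML model with llama.cpp and partial GPU offload (the model filename, layer count, and prompt here are assumptions, not from the thread; llama.cpp must be built with GPU support, e.g. cuBLAS, for `--n-gpu-layers` to take effect):

```shell
# Sketch: run a quantized 65B GGML model, offloading some layers to the GPU.
# Filename and -ngl value are illustrative; pick -ngl based on available VRAM.
./main \
  -m airoboros-65B-gpt4-1.2.ggmlv3.q4_0.bin \
  --n-gpu-layers 40 \
  -c 2048 \
  -n 256 \
  -p "Hello"
```

The rest of the layers stay on the CPU, so a 65B model can run on a machine whose GPU cannot hold the full model, at the cost of slower generation.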
That model is huge! How do you even run it?