Which model are you using? I like the quality of the responses! Is it better than pygmalion


Discussion

Model is https://huggingface.co/TheBloke/airoboros-65B-gpt4-1.2-GGML

Software is https://github.com/ggerganov/llama.cpp

I won't pretend that response was fast, though. A 30B or even a 13B model might be faster than Pygmalion.

llama.cpp can offload some of the model's layers to the GPU to speed things up.
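For reference, GPU offload in llama.cpp is controlled by the `-ngl` / `--n-gpu-layers` flag (it requires a build with GPU support, e.g. cuBLAS). A minimal sketch; the model filename, layer count, and prompt here are placeholders, not my actual setup:

```shell
# Hypothetical invocation -- adjust the model path and layer count to your system.
# -ngl N keeps N transformer layers on the GPU; the rest run on the CPU.
./main \
  -m ./models/airoboros-65b-gpt4-1.2.ggmlv3.q4_0.bin \
  -ngl 40 \
  -p "Hello" \
  -n 128
```

The more layers you can fit in VRAM, the faster generation gets; with a 65B model you typically offload only part of the network.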

Koboldcpp can also use llama.cpp as its backend.

That model is huge! How do you even run it?