Replying to iru@localhot $_

Model is https://huggingface.co/TheBloke/airoboros-65B-gpt4-1.2-GGML

Software is https://github.com/ggerganov/llama.cpp

I won't pretend that response was fast. A 30B or even a 13B model might be faster than Pygmalion.

llama.cpp can offload layers to the GPU.
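
Roughly how I run it, as a sketch using the llama-cpp-python bindings. The .bin filename, thread count, and layer count below are just placeholders, and offloading needs a GPU-enabled build:

from llama_cpp import Llama

# Load the GGML file downloaded from the HF repo above (filename is a placeholder).
# n_gpu_layers > 0 offloads that many transformer layers to the GPU.
llm = Llama(
    model_path="airoboros-65b-gpt4-1.2.ggmlv3.q4_0.bin",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=40,
)

out = llm("USER: Why is the sky blue?\nASSISTANT:", max_tokens=128)
print(out["choices"][0]["text"])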

KoboldCpp can use llama.cpp too.
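
If you go through KoboldCpp instead, something like this should work against its Kobold-compatible HTTP API. Treat the port, URL, and field names as assumptions from a default setup:

import requests

# KoboldCpp exposes a KoboldAI-style generate endpoint; 5001 is its usual default port.
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={
        "prompt": "USER: Why is the sky blue?\nASSISTANT:",
        "max_length": 128,
        "temperature": 0.7,
    },
)
print(resp.json()["results"][0]["text"])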

That model is huge! How do you even run it?
