The SOTA self-hosted model right now is Mixtral if you have 26+ GB of VRAM; otherwise the best open-weight models seem to be Orca, Neural Chat, or SOLAR. And CodeLlama for code completion.

TheBloke has GPTQ quants that run under ExLlamaV2, which I've been using on my oldass RTX 2080.
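If you want to try one of those quants, here's a minimal sketch using the Hugging Face transformers loader (which picks up GPTQ weights automatically when optimum and auto-gptq are installed). The repo name and generation settings are just illustrative, swap in whichever of TheBloke's GPTQ repos fits your VRAM:

```python
# Minimal sketch: load a GPTQ quant from the Hugging Face Hub.
# Assumes: pip install transformers optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repo name -- substitute any GPTQ quant from TheBloke's profile.
model_id = "TheBloke/SOLAR-10.7B-Instruct-v1.0-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads layers across available GPU(s) and CPU RAM,
# which is what lets bigger quants limp along on an older 8 GB card.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For more speed on NVIDIA cards you'd point the ExLlamaV2 loader (or a frontend like text-generation-webui) at the same downloaded weights, but the snippet above is the least setup to get tokens flowing.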
