What kind of GPU are you running it on? The 8B model doesn't beat ChatGPT, right?
4090. I haven't had much time to compare models yet, and I don't know how to read those comparison charts. I think larger models can be quantized to fit into less VRAM, but quality suffers as you go down to 4-bit and 2-bit.
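To put rough numbers on the quantization point: here's a back-of-the-envelope sketch of the VRAM needed just to hold the weights of an 8B-parameter model at different bit widths. It ignores the KV cache, activations, and runtime overhead, so real usage is higher, but it shows why quantizing helps you fit bigger models on a 24 GB card like a 4090.

```python
# Approximate weight-only VRAM for an 8B-parameter model
# at various quantization bit widths (overhead not included).
params = 8e9
for bits in (16, 8, 4, 2):
    gib = params * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{bits}-bit: ~{gib:.1f} GiB")
```

So an 8B model fits comfortably at 16-bit on a 4090, while something like a 70B model only starts to fit once you drop to 4-bit or lower, which is where the quality loss becomes noticeable.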