If you wanna try Llama 2 70B:
https://labs.perplexity.ai/
How much fucking VRAM do you need to run this model?
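A rough back-of-the-envelope: weights alone are parameter count times bytes per parameter, so a 70B model needs on the order of 140 GB at fp16, and proportionally less when quantized. A minimal sketch (weights only; real usage adds KV cache and activation overhead, so treat these as lower bounds):

```python
# Rough VRAM estimate for a 70B-parameter model, weights only.
# Assumption: memory ~= params * bytes_per_param; overhead not included.
PARAMS = 70e9

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: ~{gb:.0f} GB")
```

In practice that means fp16 needs multiple 80 GB cards, while 4-bit quantization brings it within reach of a pair of 24 GB consumer GPUs.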
Dunno, but here's a pure C Llama 2 implementation that runs surprisingly fast on CPU:
https://github.com/karpathy/llama2.c