Models of up to 30B parameters should fit just fine, though they will still be bottlenecked by the GPU, so expect only a few words per second.
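For a rough sanity check on what "should fit" means, the size of the weights alone can be estimated as parameter count times bits per weight. This is only a back-of-the-envelope sketch: the helper name `weight_footprint_gib` and the bit widths tried (16, 8, 4) are assumptions for illustration, not figures from this thread, and it ignores the KV cache and any runtime overhead.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Hypothetical helper for illustration; does not account for the
    KV cache, activations, or framework overhead.
    """
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                    # bytes -> GiB


if __name__ == "__main__":
    # Assumed precisions; the actual quantization depends on your setup.
    for bits in (16, 8, 4):
        size = weight_footprint_gib(30e9, bits)
        print(f"30B parameters at {bits}-bit: ~{size:.1f} GiB")
```

Compare the result against your available memory, leaving headroom for context; whether a given precision is usable depends on the runtime and model format you are using.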