Models up to 30B parameters should fit just fine, though inference will still be bottlenecked by the GPU, so expect only a few words per second.
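
For a rough sense of what "fit" means here, a back-of-the-envelope sketch: it assumes memory use is dominated by the weights and ignores the KV cache, activations, and runtime overhead, so real usage will be somewhat higher. The function name and the bit widths are just my illustration, not taken from any particular tool.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumed precisions for illustration: fp16, 8-bit, and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"30B @ {bits}-bit: ~{weight_memory_gb(30e9, bits):.0f} GB")
```

So whether 30B fits depends heavily on the quantization you pick: roughly 60 GB at 16-bit versus roughly 15 GB at 4-bit, before overhead.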
