A quantized version of Llama3-8B could run on a phone; RAM usage would be roughly 4 GB in that case.

4-bit quantizations are not super dumb! Below 4 bits, though, they get dumb fast. (Quantization makes a model faster and lighter on RAM, but dumber.)
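A rough back-of-envelope sketch of where the ~4 GB figure comes from, counting weight memory only (this ignores the KV cache, activations, and runtime overhead, which add more on top):

```python
# Approximate weight memory for an 8B-parameter model at different bit widths.
PARAMS = 8e9  # Llama3-8B parameter count (approx.)

for bits in (16, 8, 4, 2):
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> gigabytes
    print(f"{bits:>2}-bit: ~{gb:.0f} GB")

# 16-bit: ~16 GB
#  8-bit: ~8 GB
#  4-bit: ~4 GB   <- the phone-sized case above
#  2-bit: ~2 GB   (but quality drops off sharply here)
```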
