A quantized version of Llama3-8B could run on a phone; RAM usage would be roughly 4 GB in that case.

4-bit quantizations are not super dumb! Below 4 bits, though, they get dumb fast. (Quantization makes a model faster and lighter on RAM, but dumber.)
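A rough back-of-envelope sketch of where the ~4 GB figure comes from, counting weight memory only (this ignores the KV cache, activations, and runtime overhead, which add more on top):

```python
# Approximate weight memory for an 8B-parameter model at different bit widths.
PARAMS = 8e9  # Llama3-8B parameter count (approx.)

for bits in (16, 8, 4, 2):
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> gigabytes
    print(f"{bits:>2}-bit: ~{gb:.0f} GB")

# 16-bit: ~16 GB
#  8-bit: ~8 GB
#  4-bit: ~4 GB   <- the phone-sized case above
#  2-bit: ~2 GB   (but quality drops off sharply here)
```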
