Llama 3.2 3B fine-tuned model running locally on device offline, at around 10 tokens/sec. 👀

nostr:nevent1qqsvjywpt5uqls5k7jtfdkq9ss56dq3t070cc65653f46pnmywlfzqgpzamhxue69uhhyetvv9ujuvrcvd5xzapwvdhk6tczyrr0wpmlz6va2r8e92t990ltl7kqtlrgg2u7uwgs38v4nw9dt4y06qcyqqqqqqgakxdty
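For anyone curious how a "tokens/sec" figure like this gets measured, here is a minimal sketch. It assumes the local runtime exposes generated tokens as a stream (as llama.cpp-style runners do); the `dummy_stream` generator is a stand-in for a real model and is purely illustrative.

```python
import time

def measure_tps(stream):
    """Consume an iterator of tokens and return (count, tokens/sec)."""
    start = time.perf_counter()
    n = sum(1 for _ in stream)
    elapsed = time.perf_counter() - start
    return n, n / elapsed

# Hypothetical stand-in for a real local model's token stream:
# emits 50 tokens with a small artificial delay per token.
def dummy_stream(n=50, delay=0.001):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

n, tps = measure_tps(dummy_stream())
print(f"{n} tokens at {tps:.1f} tok/s")
```

With a real model you would pass the inference loop's token iterator instead of `dummy_stream`; the timing logic is the same.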


Discussion

Just tried this today as well! Which model do you think is best for most general use cases? Llama is my first choice as of now

This is pretty amazing, to be honest. Almost 1k tokens per minute on a decent model. I assume it's a low-watt ARM machine; can you calculate the sats per minute that it costs?
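The sats-per-minute question is just arithmetic once you pin down a few inputs. A back-of-the-envelope sketch, where every figure (wattage, electricity price, BTC price) is a hypothetical placeholder to be replaced with your own numbers:

```python
# Back-of-the-envelope electricity cost of local inference, per minute.
# All constants below are hypothetical placeholders, not measured values.
WATTS = 15                 # assumed draw of a low-watt ARM machine
PRICE_PER_KWH_USD = 0.15   # assumed electricity price
BTC_USD = 100_000          # assumed BTC price
SATS_PER_BTC = 100_000_000

kwh_per_min = WATTS / 1000 / 60
usd_per_min = kwh_per_min * PRICE_PER_KWH_USD
sats_per_min = usd_per_min / BTC_USD * SATS_PER_BTC
print(f"{sats_per_min:.4f} sats/min")  # → 0.0375 sats/min with these inputs
```

Even with generous assumptions, the marginal cost of running a 3B model on low-watt hardware works out to a small fraction of a sat per minute.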

How are you doing this?