Is it supposed to be this slow? I'm getting about 1 token per second on my M1 with 8 threads. It seems to run a bit faster on my Ryzen with 16 threads, so maybe it's not as optimized for Apple Silicon yet? His demo seemed much faster than mine, though :/

Discussion

Most likely you are doing something wrong… it should be reasonably fast.

How did you build it and how do you invoke it?

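Use 4 threads; it's much faster. The base M1 has only 4 performance cores, so with 8 threads half of them probably end up on the slower efficiency cores. How much RAM do you have?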
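
If you'd rather not hard-code the 4, here's a rough sketch of how you could pick a thread count automatically. It's only a guess at a sensible default, not anything the tool itself does: `suggested_thread_count` is a made-up helper name, the `hw.perflevel0.physicalcpu` sysctl key is assumed to be available (recent macOS on Apple Silicon), and the "half the logical CPUs" fallback is just a heuristic for SMT machines like the Ryzen.

```python
import os
import platform
import subprocess


def suggested_thread_count() -> int:
    """Guess a thread count that stays off the slower efficiency cores.

    Hypothetical helper for illustration; not part of any tool in this thread.
    """
    if platform.system() == "Darwin":
        try:
            # Assumption: on Apple Silicon this key reports the number of
            # performance cores. Older macOS versions may not have it.
            out = subprocess.run(
                ["sysctl", "-n", "hw.perflevel0.physicalcpu"],
                capture_output=True,
                text=True,
                check=True,
            ).stdout.strip()
            return max(1, int(out))
        except (OSError, subprocess.CalledProcessError, ValueError):
            pass  # key missing or unreadable: use the generic fallback below

    # Heuristic assumption: about half the logical CPUs are physical cores.
    return max(1, (os.cpu_count() or 2) // 2)


if __name__ == "__main__":
    print(suggested_thread_count())
```

Pass the number it prints to whatever thread option your build exposes, then compare tokens per second against your current setting to see whether it actually helps on your machine.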