Replying to jb55

As a performance optimization enjoyer, I can't help but look at the transformer architecture in LLMs and notice how incredibly inefficient it is, specifically the attention mechanism.
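
To make the inefficiency concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention (the function name, shapes, and sizes are my own illustration, not taken from the linked paper). The score matrix it builds is n × n, so compute and memory grow quadratically with context length.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Q, K, V: (n, d) arrays for a single head; names and shapes assumed for illustration
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n): the quadratic part
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax over keys
    return weights @ V                               # (n, d)

n, d = 4096, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = naive_attention(Q, K, V)   # materializes a 4096 x 4096 score matrix
```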

Looks like I am not the only one who has noticed this, and it seems like people are working on it.

https://arxiv.org/pdf/2406.15786

Lots of AI researchers are not performance engineers, and it shows. I suspect we can reach similar results with much lower computational complexity. That will be good news if you want to run these things on your phone.
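
As one example of the kind of lower-complexity direction people have explored (kernelized "linear attention", a well-known trick and not necessarily the method in the linked paper), you can swap the softmax for a feature map and reorder the matrix products so that nothing n × n is ever materialized. A rough sketch, with the same assumed shapes as above:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    # phi: a simple positive feature map (elu(x) + 1), a common choice in linear attention
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)                  # (n, d)
    kv = Kf.T @ V                            # (d, d): no n x n matrix anywhere
    z = Qf @ Kf.sum(axis=0) + eps            # (n,) normalizer
    return (Qf @ kv) / z[:, None]            # (n, d)

n, d = 4096, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = linear_attention(Q, K, V)   # roughly O(n * d^2) time instead of O(n^2 * d)
```

The usual trade-off is that approximating the softmax costs some output quality, which is exactly what research in this space is trying to minimize.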

I would say efficiency is fairly low on the list of attributes held by normal engineers; I like to install a screw by mashing it with a hammer.
