The Math Behind Multi-Head Attention in Transformers
lnbc36u1pnfn3h4pp5sqv4dp4me3mc00nwddmgjsfjxzs35aa6ec72tv3psx9vxtsz33nqhp5htnadm973k2va4zy6y6tzcvdnzssqwhelv8unuaad2rd9pc7ez7scqzzsxqyz5vqsp5pthy8x9rnjxx3x8swjrqvhkgy5eucnp5cqs5vfylwsc5t8xzf7rs9qxpqysgqjzwdvp4rst5fw3edz2ngly6k3gqxgem2dznaup5lkv7rxsuehjura7uxsz6cnqfj2wy0nnyv98rxmxkelwxqxju4t9aancp8phcc74cpggmcmu
https://towardsdatascience.com/the-math-behind-multi-head-attention-in-transformers-c26cba15f625