Researchers have proposed a new approach to self-attention in transformer models, called "Self-Attention with Polynomial Activations" (SAPA). The authors argue that the traditional softmax function used in self-attention layers has limitations, such as producing "peaky" attention distributions that may not capture all relevant information. SAPA employs polynomial functions instead of softmax to compute attention weights, leading to more balanced attention distributions and potentially better model performance.
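To make the idea concrete, here is a minimal sketch contrasting standard softmax attention with an attention variant that uses a polynomial activation on the score matrix. The specific polynomial (an elementwise cubic) and the row-normalization used here are illustrative assumptions, not the exact formulation from the SAPA paper; the function names are hypothetical.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention with softmax weights."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def polynomial_attention(Q, K, V, degree=3):
    """Attention weights computed from a polynomial of the scores instead
    of softmax. The polynomial and normalization are illustrative choices,
    not necessarily those used in the SAPA paper."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Elementwise polynomial activation, then row-normalize so each
    # query's weights sum (in absolute value) to 1.
    act = scores ** degree
    weights = act / (np.abs(act).sum(axis=-1, keepdims=True) + 1e-9)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query tokens, head dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(softmax_attention(Q, K, V).shape)     # (4, 8)
print(polynomial_attention(Q, K, V).shape)  # (4, 8)
```

Because the polynomial does not exponentiate the scores, it tends to spread weight more evenly across tokens than softmax, which is the "less peaky" behavior the authors are after.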

Source: https://dev.to/mikeyoung44/rethinking-self-attention-polynomial-activations-for-capturing-long-range-dependencies-hof
