Scaled Dot-Product Attention

27/03/2024.

Self-attention is the core mechanism behind Transformer models, which have achieved state-of-the-art results across many fields, such as Natural Language Processing.
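
As a quick sketch of what the full post covers: scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V, i.e. queries are compared against keys, the scores are scaled by sqrt(d_k) and softmax-normalised, and the resulting weights average the value vectors. A minimal single-head NumPy version might look as follows (the function name and toy shapes are illustrative, not taken from the post):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # so the softmax does not saturate for large key dimensions.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy example (hypothetical shapes): 3 tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```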

Keep reading at Scaled Dot-Product Attention.