Attention in NLP

This post summarizes what attention is and how it is used in NLP, focusing on several important papers.

Table of Contents
- Problems with Existing Encoder-Decoder Architecture
- Basic Idea
- Attention Score Functions
- What Do We Attend To?
- Multi-headed Attention
- Transformer

Problems with Existing Encoder-Decoder Architecture

The most important part of the encoder-decoder architecture is how the input sequence is turned into a vector. In NLP, input sequences vary widely in length and structure, so compressing them into a fixed-length vector causes problems. For example, "Hello" and "The weather is nice today but the fine dust is severe, so make sure to wear a mask when you go out!" carry very different amounts of information, yet the encoder-decoder structure must encode both into vectors of the same length. Attention was first proposed to reduce this information loss by, as the name suggests, letting the model learn which parts of the input sequence deserve particular attention. ...
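To make the contrast concrete, here is a minimal sketch (not from the original post) of the idea: rather than squeezing the whole input into one fixed-length vector, the decoder computes a weighted sum over all encoder hidden states at each output step. The function name `attention_context`, the dot-product score, and the toy shapes are all illustrative assumptions.

```python
import numpy as np

def attention_context(encoder_states: np.ndarray, decoder_state: np.ndarray) -> np.ndarray:
    """encoder_states: (T, d) hidden states, one per input token.
    decoder_state: (d,) current decoder hidden state.
    Returns the (d,) context vector: a softmax-weighted sum of encoder states."""
    scores = encoder_states @ decoder_state   # (T,) dot-product scores, one per input token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax -> attention weights over the input
    return weights @ encoder_states           # (d,) context vector for this decoding step

# Toy usage: a 5-token input sentence with hidden size 4.
H = np.random.randn(5, 4)   # encoder hidden states
s = np.random.randn(4)      # decoder state at the current output step
context = attention_context(H, s)
print(context.shape)        # (4,)
```

Because the context vector is recomputed at every decoding step, a long sentence is no longer forced through a single fixed-size bottleneck; the decoder can look back at whichever input positions matter for the word it is currently producing.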

January 26, 2019 · 4 min · AngryPark