Attention in NLP

์ด ๊ธ€์—์„œ๋Š” attention์ด ๋ฌด์—‡์ธ์ง€, ๋ช‡ ๊ฐœ์˜ ์ค‘์š”ํ•œ ๋…ผ๋ฌธ๋“ค์„ ์ค‘์‹ฌ์œผ๋กœ ์ •๋ฆฌํ•˜๊ณ  NLP์—์„œ ์–ด๋–ป๊ฒŒ ์“ฐ์ด๋Š” ์ง€๋ฅผ ์ •๋ฆฌํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ๋ชฉ์ฐจ ๊ธฐ์กด Encoder-Decoder ๊ตฌ์กฐ์—์„œ ์ƒ๊ธฐ๋Š” ๋ฌธ์ œ Basic Idea Attention Score Functions What Do We Attend To? Multi-headed Attention Transformer ๊ธฐ์กด Encoder-Decoder ๊ตฌ์กฐ์—์„œ ์ƒ๊ธฐ๋Š” ๋ฌธ์ œ Encoder-Decoder ๊ตฌ์กฐ์—์„œ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋ถ€๋ถ„์€ input sequence๋ฅผ ์–ด๋–ป๊ฒŒ vectorํ™”ํ•  ๊ฒƒ์ด๋ƒ๋Š” ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ NLP์—์„œ๋Š” input sequence์ด๊ฐ€ dynamicํ•  ๊ตฌ์กฐ์ผ ๋•Œ๊ฐ€ ๋งŽ์œผ๋ฏ€๋กœ, ์ด๋ฅผ ๊ณ ์ •๋œ ๊ธธ์ด์˜ ๋ฒกํ„ฐ๋กœ ๋งŒ๋“ค๋ฉด์„œ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. ์ฆ‰, โ€œ์•ˆ๋…•โ€ ์ด๋ผ๋Š” ๋ฌธ์žฅ์ด๋‚˜ โ€œ์˜ค๋Š˜ ๋‚ ์”จ๋Š” ์ข‹๋˜๋ฐ ๋ฏธ์„ธ๋จผ์ง€๋Š” ์‹ฌํ•˜๋‹ˆ๊น ๋‚˜๊ฐˆ ๋•Œ ๋งˆ์Šคํฌ ๊ผญ ์“ฐ๊ณ  ๋‚˜๊ฐ€๋ ด!โ€ ์ด๋ผ๋Š” ๋ฌธ์žฅ์ด ๋‹ด๊ณ  ์žˆ๋Š” ์ •๋ณด์˜ ์–‘์ด ๋งค์šฐ ๋‹ค๋ฆ„์—๋„ encoder-decoder๊ตฌ์กฐ์—์„œ๋Š” ๊ฐ™์€ ๊ธธ์ด์˜ vector๋กœ ๋ฐ”๊ฟ”์•ผ ํ•˜์ฃ . Attention์€ ๊ทธ ๋‹จ์–ด์—์„œ ์•Œ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์ฒ˜๋Ÿผ, sequence data์—์„œ ์ƒํ™ฉ์— ๋”ฐ๋ผ ์–ด๋А ๋ถ€๋ถ„์— ํŠนํžˆ ๋” ์ฃผ๋ชฉ์„ ํ•ด์•ผํ•˜๋Š” ์ง€๋ฅผ ๋ฐ˜์˜ํ•จ์œผ๋กœ์จ ์ •๋ณด ์†์‹ค๋„ ์ค„์ด๊ณ  ๋” ์ง๊ด€์ ์œผ๋กœ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์ฒ˜์Œ ์ œ์•ˆ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ...

January 26, 2019 · 3 min · AngryPark