Multi Armed Bandit

While studying recommender systems recently, I came to think I needed to learn about the field of multi-armed bandits, so I put together these notes based on A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit.

Contents
1. Concept
2. Differences between MAB and existing statistics-based models

1. Concept
The term multi-armed bandit (MAB below) comes from gambling. Suppose someone has a fixed amount of time and N slot machines, each with a different payout distribution: how can they earn the most? Given a limited time in which to pull the levers of those N machines, they first need to spend some of it finding out which machines tend to pay more (this is called exploration), and then keep playing the machines they judge to be good in order to maximize their winnings (this is called exploitation). ...
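The explore-first, exploit-afterwards idea described above can be sketched roughly as follows. This is only a minimal illustration, not the method from the survey: it assumes each slot machine pays 1 with a fixed probability and 0 otherwise, and the names `explore_then_commit`, `arm_means`, `horizon`, and `pulls_per_arm` are hypothetical, chosen for this example.

```python
import random


def explore_then_commit(arm_means, horizon, pulls_per_arm, seed=0):
    """Explore every arm equally, then commit to the best-looking one.

    arm_means     -- assumed win probability of each slot machine (hypothetical)
    horizon       -- total number of pulls allowed (the "limited time")
    pulls_per_arm -- exploration pulls spent on each arm
    """
    n_arms = len(arm_means)
    if pulls_per_arm < 1 or horizon < n_arms * pulls_per_arm:
        raise ValueError("horizon must cover at least one exploration round per arm")

    rng = random.Random(seed)
    counts = [0] * n_arms      # how often each arm was pulled
    totals = [0.0] * n_arms    # reward collected from each arm
    total_reward = 0.0
    best_arm = None

    for t in range(horizon):
        if t < n_arms * pulls_per_arm:
            # Exploration: pull the arms in round-robin order.
            arm = t % n_arms
        else:
            # Exploitation: commit to the arm with the highest observed mean.
            if best_arm is None:
                best_arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
            arm = best_arm

        # Simulated payout: 1 with probability arm_means[arm], else 0.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        total_reward += reward

    if best_arm is None:
        # The horizon ended exactly when exploration did.
        best_arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
    return best_arm, counts, total_reward


if __name__ == "__main__":
    best, counts, reward = explore_then_commit([0.1, 0.9, 0.5], horizon=1000, pulls_per_arm=50)
    print(best, counts, reward)
```

The choice of `pulls_per_arm` is where the trade-off shows up: too few exploration pulls and the wrong machine may be picked, too many and time is wasted on machines that pay poorly.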

February 5, 2019 · 4 min · AngryPark