Multi-Armed Bandit
Recently, while studying recommender systems, I realized I needed to learn about the multi-armed bandit problem. I've summarized it here based on A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit.

Table of Contents
1. Concept
2. Differences between MAB and Existing Statistical Models

1. Concept

The term Multi-armed Bandit (hereinafter MAB) comes from gambling. Given N slot machines with different payout distributions, how can a player obtain the maximum profit within a given time? With only a limited time in which to pull the N machines, the player must first spend some of that time finding out which machine pays more (this is called Exploration), and then maximize profit by pulling the machines judged to be good (this is called Exploitation). ...
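The two phases described above can be sketched in a few lines of Python. This is only a minimal illustration of one simple strategy (often called explore-then-commit), not necessarily the method the survey recommends. It assumes each slot machine pays 1 with a fixed probability and 0 otherwise; the function name explore_then_commit and its parameters are hypothetical names chosen for this example.

```python
import random


def explore_then_commit(payout_probs, total_pulls, explore_pulls_per_arm, seed=0):
    """Pull every arm a fixed number of times, then commit to the best one.

    Assumption for this sketch: arm i pays 1 with probability
    payout_probs[i] and 0 otherwise (a Bernoulli slot machine).
    """
    n_arms = len(payout_probs)
    explore_budget = n_arms * explore_pulls_per_arm
    if explore_pulls_per_arm < 1 or explore_budget > total_pulls:
        raise ValueError("exploration budget must be >= 1 per arm and fit in total_pulls")

    rng = random.Random(seed)
    wins = [0] * n_arms    # total reward observed per arm
    pulls = [0] * n_arms   # number of times each arm was pulled

    def pull(arm):
        reward = 1 if rng.random() < payout_probs[arm] else 0
        wins[arm] += reward
        pulls[arm] += 1
        return reward

    total_reward = 0

    # Exploration: try every machine the same number of times.
    for arm in range(n_arms):
        for _ in range(explore_pulls_per_arm):
            total_reward += pull(arm)

    # Exploitation: commit to the machine with the best observed average.
    best_arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])
    for _ in range(total_pulls - explore_budget):
        total_reward += pull(best_arm)

    return best_arm, total_reward, pulls


if __name__ == "__main__":
    # Three hypothetical machines; the player does not know these probabilities.
    best, reward, counts = explore_then_commit([0.2, 0.5, 0.7], 1000, 50)
    print("committed to arm", best, "total reward", reward, "pulls", counts)
```

The trade-off shows up in explore_pulls_per_arm: a small value wastes few pulls on bad machines but risks committing to the wrong one, while a large value identifies the best machine more reliably at the cost of more pulls spent on the others.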