What if you don’t know the probability of success?
Beta Distribution time!!!
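A minimal sketch (not from the notes, all names illustrative): model an unknown success probability with a Beta prior and update it from simulated Bernoulli outcomes, using Beta–Bernoulli conjugacy.

```python
import numpy as np

rng = np.random.default_rng(0)

true_p = 0.3        # unknown to the agent; used only to simulate outcomes
a, b = 1.0, 1.0     # Beta(1, 1) = uniform prior over the success probability

for _ in range(100):
    success = rng.random() < true_p   # one simulated Bernoulli trial
    if success:
        a += 1                        # success increments alpha
    else:
        b += 1                        # failure increments beta

# Posterior is Beta(a, b); its mean a / (a + b) estimates the success probability
print("posterior mean estimate:", a / (a + b))
```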
Multi-Arm Bandit
See Multi-Arm Bandit
Strategies:
- Upper Confidence Bound: take the action with the highest upper confidence bound (see the UCB1 sketch after this list)
- Posterior Sampling: take a sample from each action's Beta distribution; take the action whose sample gives the highest probability of success (see the sketch after this list)
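A minimal UCB1 sketch on an assumed Bernoulli bandit (arm probabilities, horizon, and variable names are illustrative): pull the arm with the highest upper confidence bound, mean reward plus an exploration bonus of sqrt(2 ln t / n_i).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
true_ps = [0.2, 0.5, 0.7]          # hidden success probability of each arm
counts = [0] * len(true_ps)        # number of pulls per arm
sums = [0.0] * len(true_ps)        # total reward per arm

for t in range(1, 1001):
    if t <= len(true_ps):
        arm = t - 1                # pull each arm once so counts are nonzero
    else:
        ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(len(true_ps))]
        arm = int(np.argmax(ucb))  # highest upper confidence bound wins
    reward = float(rng.random() < true_ps[arm])
    counts[arm] += 1
    sums[arm] += reward

print("pull counts per arm:", counts)   # the best arm should dominate
```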
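And a minimal posterior (Thompson) sampling sketch under the same assumed setup: keep a Beta posterior per arm, draw one sample from each, pull the arm with the largest sample, and update that arm's posterior with the observed outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
true_ps = [0.2, 0.5, 0.7]                  # hidden success probability of each arm
alphas = [1.0] * len(true_ps)              # Beta(1, 1) prior per arm
betas = [1.0] * len(true_ps)

for _ in range(1000):
    samples = [rng.beta(a, b) for a, b in zip(alphas, betas)]
    arm = int(np.argmax(samples))          # arm with the highest sampled probability
    success = rng.random() < true_ps[arm]
    if success:
        alphas[arm] += 1                   # success updates alpha
    else:
        betas[arm] += 1                    # failure updates beta

print("posterior means:", [a / (a + b) for a, b in zip(alphas, betas)])
```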