This method introduces a search distribution over the parameters rather than a discrete set of candidate points:
\begin{equation} p(\theta | \psi) \end{equation}
This describes how the parameters \(\theta\) are distributed given the distribution parameters \(\psi\) (for instance, if we assume the parameters are Gaussian distributed, \(\psi\) consists of the mean and variance).
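For example, under that Gaussian assumption (one possible choice of parameterization), \(\psi = (\mu, \Sigma)\) and the search distribution is
\begin{equation} p(\theta \mid \psi) = \mathcal{N}(\theta \mid \mu, \Sigma) \end{equation}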
- Given this distribution, we draw \(m\) samples of \(\theta\) from it. These are our initial candidate points.
- We then evaluate each sample's policy with the roll-out utility.
- We keep the top \(k\) performers, called the “elite samples” \(m_{\text{elite}}\).
- Using the set of \(m_{\text{elite}}\) points, we fit new distribution parameters \(\psi\) that describe those samples (see the sketch below).
Repeating this loop lets us bound the number of roll-out utility evaluations we perform.
As a rule of thumb, we should have about 10 elite samples per parameter dimension (1-D should have 10 elite samples, 2-D should have 20, and so on).
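The following is a minimal sketch of the sample / evaluate / refit loop described above, assuming a Gaussian search distribution \(\psi = (\mu, \Sigma)\). The function name `rollout_utility` and the specific values of `m`, `m_elite`, and `iterations` are illustrative placeholders, not part of the original description.

```python
import numpy as np

def fit_gaussian_search(rollout_utility, mean, cov, m=100, m_elite=10, iterations=20):
    """Sample candidates, keep the elites, and refit the Gaussian psi = (mean, cov)."""
    rng = np.random.default_rng(0)
    for _ in range(iterations):
        # Sample m candidate parameter vectors theta ~ p(theta | psi).
        thetas = rng.multivariate_normal(mean, cov, size=m)
        # Score each candidate's policy with the roll-out utility.
        utilities = np.array([rollout_utility(theta) for theta in thetas])
        # Keep the top m_elite performers -- the "elite samples".
        elites = thetas[np.argsort(utilities)[-m_elite:]]
        # Refit psi to describe the elite set (small jitter keeps cov positive definite).
        mean = elites.mean(axis=0)
        cov = np.cov(elites, rowvar=False) + 1e-6 * np.eye(len(mean))
    return mean, cov

# Example: a 2-D problem using the ~10-elites-per-dimension heuristic (m_elite = 20).
best_mean, best_cov = fit_gaussian_search(
    lambda theta: -np.sum(theta ** 2),  # toy utility with its peak at theta = 0
    mean=np.zeros(2), cov=np.eye(2), m=200, m_elite=20)
```

Here each outer iteration costs exactly `m` roll-out evaluations, which is how the loop keeps the total number of roll-out utilities bounded.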