Houjun Liu

Cross Entropy Method

This method searches over a distribution of candidate parameters, rather than over individual points:

\begin{equation} p(\theta | \psi) \end{equation}

This distribution describes how the policy parameters \(\theta\) are distributed, given some distribution parameters \(\psi\). For instance, if we assume \(\theta\) is Gaussian distributed, then \(\psi\) consists of the mean and variance.
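
As a concrete instance of the Gaussian case, the search distribution could be written as below. The symbols \(\mu\) and \(\Sigma\) (mean and covariance) are notation introduced here for illustration:

\begin{equation} p(\theta | \psi) = \mathcal{N}(\theta | \mu, \Sigma), \quad \psi = (\mu, \Sigma) \end{equation}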

  1. Given this distribution, we draw \(m\) samples of \(\theta\) from it. These are our starting candidate points.
  2. We evaluate the utility of the policy defined by each sample via the Roll-out utility.
  3. We keep the top \(k = m_{elite}\) performers, called “elite samples”.
  4. Using the \(m_{elite}\) elite samples, we fit new distribution parameters \(\psi\) that describe those samples, then repeat from step 1 with the updated distribution.
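
The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes a Gaussian search distribution, refits it by maximum likelihood, and takes an arbitrary `utility` callable as a stand-in for the Roll-out utility. The function name `cross_entropy_method`, its parameters, and the small `1e-8` jitter added to the covariance are all assumptions made for this example.

```python
import numpy as np


def cross_entropy_method(utility, mean, cov, m=100, m_elite=10, n_iters=20, seed=0):
    """Maximize `utility` by repeatedly refitting a Gaussian search distribution.

    `utility` is a hypothetical stand-in for a rollout utility estimate:
    it maps one parameter vector theta to a scalar score.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.shape[0]
    for _ in range(n_iters):
        # Step 1: sample m candidate parameter vectors theta ~ p(theta | psi).
        thetas = rng.multivariate_normal(mean, cov, size=m)
        # Step 2: score every candidate (m utility evaluations per iteration).
        scores = np.array([utility(theta) for theta in thetas])
        # Step 3: keep the m_elite highest-scoring candidates.
        elite = thetas[np.argsort(scores)[-m_elite:]]
        # Step 4: refit psi = (mean, cov) to the elite samples.
        mean = elite.mean(axis=0)
        # Assumption: a tiny diagonal jitter keeps the covariance well conditioned.
        cov = np.cov(elite, rowvar=False, bias=True).reshape(n, n) + 1e-8 * np.eye(n)
    return mean, cov


# Usage: a toy utility whose maximum sits at theta = (1, -2).
target = np.array([1.0, -2.0])
best_mean, best_cov = cross_entropy_method(
    lambda theta: -np.sum((theta - target) ** 2),
    mean=np.zeros(2),
    cov=4.0 * np.eye(2),
    m=100,
    m_elite=20,
    n_iters=30,
)
```

The total number of utility evaluations here is `m * n_iters`, fixed in advance by the two arguments.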

Because each iteration evaluates exactly \(m\) samples, this bounds how many Roll-out utility evaluations we perform.

As a rule of thumb, use about 10 elite samples per dimension of \(\theta\) (10 elite samples for 1D, 20 for 2D, etc.).