
planning

Last edited: August 8, 2025

A decision-making method that searches over a model of the problem in order to choose actions.

  1. create a (usually deterministic, but for CS238 we care only about non-deterministic cases) model of the problem or a good approximation thereof
  2. use the model to plan over the possible next actions and pick those that yield a good solution
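
As a rough illustration of the two steps above, here is a minimal sketch of depth-limited lookahead on a toy stochastic model. The two-state model (`T`, `R`), the undiscounted objective, and the name `plan_value` are all assumptions made for this example and are not taken from the notes.

```c
#include <assert.h>
#include <stddef.h>

enum { N_STATES = 2, N_ACTIONS = 2 };

/* Step 1 (assumed toy model): T[s][a][s2] is the probability of landing in
   s2 after taking action a in state s; R[s][a] is the immediate reward. */
static const double T[N_STATES][N_ACTIONS][N_STATES] = {
    {{1.0, 0.0}, {0.5, 0.5}},
    {{0.0, 1.0}, {0.0, 1.0}},
};
static const double R[N_STATES][N_ACTIONS] = {
    {0.0, 0.0},
    {1.0, 1.0},
};

/* Step 2: search the model `depth` steps ahead. Returns the best expected
   (undiscounted) total reward from s and, if best_action is non-NULL,
   stores the first action that achieves it. */
double plan_value(int s, int depth, int *best_action)
{
    assert(0 <= s && s < N_STATES);
    if (depth == 0)
        return 0.0;

    double best = 0.0;
    int arg = 0;
    for (int a = 0; a < N_ACTIONS; a++) {
        double q = R[s][a];
        for (int s2 = 0; s2 < N_STATES; s2++) {
            if (T[s][a][s2] > 0.0)
                q += T[s][a][s2] * plan_value(s2, depth - 1, NULL);
        }
        if (a == 0 || q > best) {
            best = q;
            arg = a;
        }
    }
    if (best_action != NULL)
        *best_action = arg;
    return best;
}
```

With this toy model, looking one step ahead from state 0 cannot tell the two actions apart, while looking two steps ahead prefers action 1, since it has some chance of reaching the rewarding state.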

contrast v. explicit programming

Explicit programming requires you, the programmer, to plan the actions yourself; in planning, the search over the model does that work.

Planning for Learning

Last edited: August 8, 2025

point selection

Last edited: August 8, 2025
  1. we start at an initial belief point
  2. we do a random Rollout to get to the next belief

then collect

Point-Based Value Iteration

Last edited: August 8, 2025

we keep track of a bunch of alpha vectors and belief samples (which we get from point selection):

\begin{equation} \Gamma = \{\alpha_{1}, \dots, \alpha_{m}\} \end{equation}

and

\begin{equation} B = \{b_1, \dots, b_{m}\} \end{equation}

So that the alpha vectors remain a lower bound on the utility, seed them with something like the blind lower bound.

We can estimate the utility at any belief by picking the alpha vector in the set with the largest inner product with that belief:

\begin{equation} U^{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top}b \end{equation}
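
The maximization above can be sketched in a few lines of C. The function name `utility_at_belief` and the flat row-major layout of the alpha vectors are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of U^Gamma(b) = max over alpha in Gamma of alpha^T b.
   gamma holds m alpha vectors over n states in row-major order,
   so alpha_i[s] is gamma[i * n + s]; b is a belief of length n. */
double utility_at_belief(const double *gamma, size_t m, size_t n,
                         const double *b)
{
    assert(m > 0 && n > 0); /* the max over an empty set is undefined */

    double best = 0.0;
    for (size_t i = 0; i < m; i++) {
        double dot = 0.0;
        for (size_t s = 0; s < n; s++)
            dot += gamma[i * n + s] * b[s];
        if (i == 0 || dot > best)
            best = dot;
    }
    return best;
}
```

For example, with the two alpha vectors (4, 0) and (0, 2), the belief (0.5, 0.5) scores 2 under the first and 1 under the second, so the estimate is 2.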

pointer

Last edited: August 8, 2025

A pointer is a variable which stores a memory address. Because C has no pass-by-reference, we use pointers to emulate it: by sharing addresses with other functions.
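
A minimal sketch of that emulation, contrasting a function that receives a copy with one that receives an address. The function names `increment_copy` and `increment_via_pointer` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Receives a copy of the caller's int, so the caller's variable
   is left unchanged. */
void increment_copy(int n)
{
    n += 1;
}

/* Receives a copy of an address; writing through that address
   changes the caller's variable. */
void increment_via_pointer(int *n)
{
    assert(n != NULL); /* dereferencing NULL is undefined behavior */
    *n += 1;
}
```

Given `int x = 2;`, calling `increment_copy(x)` leaves `x` at 2, whereas `increment_via_pointer(&x)` changes it to 3.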

A pointer can identify a single byte OR some large data structure. We can dynamically allocate memory and refer to it through pointers, and we can also identify memory generically, without types.

C is always pass-by-copy. Therefore, to pass by reference, you basically have to take the address of the object and work through the pointer:

int x = 2; // declare object
int *xptr = &x; // get location of object (&: address of)

printf("%d\n", *xptr); // dereference the pointer

address operator

You will note, in the line above: