planning
Last edited: August 8, 2025
A decision-making method that uses search on a model of the problem to make decisions.
- create a (usually deterministic, but for CS238 we care only about non-deterministic cases) model of the problem or a good approximation thereof
- use the model to plan over possible next actions to yield a good solution
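The two steps above can be sketched as a one-step lookahead over a small stochastic model. This is only an illustrative sketch: the two-state, two-action model, its numbers, and the names `T`, `R`, `lookahead`, and `greedy_action` are all hypothetical and do not come from the notes.

```c
#define N_STATES 2
#define N_ACTIONS 2

/* Hypothetical stochastic model (assumed numbers, for illustration only):
 * T[s][a][s2] is the probability of landing in s2 after taking a in s,
 * R[s][a] is the immediate reward for taking a in s. */
static const double T[N_STATES][N_ACTIONS][N_STATES] = {
    { {0.9, 0.1}, {0.2, 0.8} },
    { {0.5, 0.5}, {0.0, 1.0} },
};
static const double R[N_STATES][N_ACTIONS] = {
    { 0.0, -1.0 },
    { 1.0,  2.0 },
};

/* Expected value of taking action a in state s, given a utility
 * estimate U for the next state and a discount factor gamma. */
double lookahead(int s, int a, const double U[N_STATES], double gamma) {
    double expected_next = 0.0;
    for (int s2 = 0; s2 < N_STATES; s2++) {
        expected_next += T[s][a][s2] * U[s2];
    }
    return R[s][a] + gamma * expected_next;
}

/* Search over the possible next actions and return the best one
 * according to the model. Ties keep the lowest-numbered action. */
int greedy_action(int s, const double U[N_STATES], double gamma) {
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++) {
        if (lookahead(s, a, U, gamma) > lookahead(s, best, U, gamma)) {
            best = a;
        }
    }
    return best;
}
```

Calling `greedy_action(s, U, 0.9)` with some utility estimate `U` picks the action by consulting the model, rather than having the action written out by hand.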
contrast v. explicit programming
explicit programming requires you, the programmer, to plan for the action yourself; with planning, the action is found by searching the model
Planning for Learning
Last edited: August 8, 2025
point selection
Last edited: August 8, 2025
then collect
Point-Based Value Iteration
Last edited: August 8, 2025
We keep track of a set of alpha vectors and a set of belief samples (which we get from point selection):
\begin{equation} \Gamma = \{\alpha_{1}, \dots, \alpha_{m}\} \end{equation}
and
\begin{equation} B = \{b_1, \dots, b_{m}\} \end{equation}
To preserve the lower-boundedness of these alpha vectors, one should seed the alpha vectors via something like the blind lower bound.
We can estimate our utility function at any belief by finding the alpha vector in the set that gives the highest value at that belief:
\begin{equation} U^{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top}b \end{equation}
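As a minimal sketch of this maximization, the function below evaluates each alpha vector at a belief and keeps the largest value. The flat row-major storage of the alpha vectors and the names `dot` and `utility` are assumptions made for illustration; the sketch also assumes the set contains at least one alpha vector.

```c
#include <stddef.h>

/* Dot product alpha^T b over n states. */
static double dot(const double *alpha, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t s = 0; s < n; s++) {
        sum += alpha[s] * b[s];
    }
    return sum;
}

/* U^Gamma(b): the maximum of alpha^T b over the m alpha vectors.
 * alphas holds m vectors of length n, stored back to back (row-major).
 * Assumes m >= 1. */
double utility(const double *alphas, size_t m, const double *b, size_t n) {
    double best = dot(alphas, b, n);
    for (size_t i = 1; i < m; i++) {
        double value = dot(alphas + i * n, b, n);
        if (value > best) {
            best = value;
        }
    }
    return best;
}
```

Because every alpha vector is a lower bound on the true utility, taking the maximum yields the tightest lower bound the set can offer at that belief.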
pointer
Last edited: August 8, 2025
A pointer is a variable which stores a memory address. Because C has no pass-by-reference, we use pointers to emulate it by sharing addresses with other functions.
A pointer can identify a single byte OR a large data structure. We can dynamically allocate memory and refer to it through pointers, and also identify memory generically without types.
C is always pass-by-copy. Therefore, to pass-by-reference, you basically have to take the address of the object and access it through a pointer:
int x = 2; // declare object
int *xptr = &x; // get location of object (&: address of)
printf("%d\n", *xptr); // dereference the pointer
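To make the pass-by-copy point concrete, here is a small sketch contrasting a function that receives a copy with one that receives an address. The function names `increment_copy` and `increment_in_place` are hypothetical, chosen only for this illustration.

```c
/* Receives a copy of the caller's value: the change stays local,
 * so the caller's variable is left untouched. */
void increment_copy(int x) {
    x += 1;
}

/* Receives the address of the caller's variable: writing through
 * the pointer changes the caller's variable itself. */
void increment_in_place(int *xptr) {
    *xptr += 1;
}
```

With `int x = 2;`, calling `increment_copy(x)` leaves `x` at 2, while `increment_in_place(&x)` changes it to 3. The pointer itself is still passed by copy; it is the shared address that lets the callee reach the caller's object.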
address operator
You will note, in the line above:
