Planning for Learning
Last edited: August 8, 2025
- point selection
- then collect
Point-Based Value Iteration
We keep track of a set of alpha vectors and a set of belief samples (which we get from point selection):
\begin{equation} \Gamma = \{\alpha_{1}, \dots, \alpha_{m}\} \end{equation}
and
\begin{equation} B = \{b_1, \dots, b_{m}\} \end{equation}
To preserve the lower-boundedness of these alpha vectors, seed them with a lower bound such as the blind lower bound.
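A minimal sketch of the blind lower bound, assuming a tiny POMDP stored as dense arrays: `R[s][a]` rewards, `T[a][s][s2]` transition probabilities, and a discount in [0, 1). The names `NS`, `NA`, and `blind_lower_bound` are hypothetical. Each action gets one alpha vector approximating the value of taking that action forever while ignoring observations, computed by iterating \(\alpha_a(s) \leftarrow R(s,a) + \gamma \sum_{s'} T(s' \mid s,a)\, \alpha_a(s')\) from the best-action worst-state bound \(\max_a \min_s R(s,a)/(1-\gamma)\).

```c
#include <assert.h>

#define NS 2 /* number of states (hypothetical toy size) */
#define NA 2 /* number of actions (hypothetical toy size) */

/* Fill alpha[a][s] with an approximation of the value of always taking
 * action a from state s. Starting from the best-action worst-state bound,
 * every iterate stays a lower bound on the optimal utility, so the result
 * can seed Gamma. */
void blind_lower_bound(double R[NS][NA], double T[NA][NS][NS],
                       double discount, int iters, double alpha[NA][NS]) {
    assert(discount >= 0.0 && discount < 1.0);

    /* best-action worst-state: max_a min_s R(s, a) */
    double best = 0.0;
    for (int a = 0; a < NA; a++) {
        double worst = R[0][a];
        for (int s = 1; s < NS; s++)
            if (R[s][a] < worst) worst = R[s][a];
        if (a == 0 || worst > best) best = worst;
    }
    for (int a = 0; a < NA; a++)
        for (int s = 0; s < NS; s++)
            alpha[a][s] = best / (1.0 - discount);

    /* repeated blind backups: take action a again, ignore observations */
    double next[NS];
    for (int k = 0; k < iters; k++) {
        for (int a = 0; a < NA; a++) {
            for (int s = 0; s < NS; s++) {
                double future = 0.0;
                for (int s2 = 0; s2 < NS; s2++)
                    future += T[a][s][s2] * alpha[a][s2];
                next[s] = R[s][a] + discount * future;
            }
            for (int s = 0; s < NS; s++) alpha[a][s] = next[s];
        }
    }
}
```

More iterations bring each vector closer to the true value of its blind policy; the sketch uses a fixed iteration count rather than a convergence test for simplicity.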
We can estimate our utility at any belief \(b\) by taking the best alpha vector in the set:
\begin{equation} U^{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top}b \end{equation}
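As a sketch, \(U^{\Gamma}(b)\) is just a maximum over dot products. This assumes \(\Gamma\) is stored row-major as `m` alpha vectors of length `n_states`; `utility_at_belief` and its parameters are hypothetical names. It also reports which alpha vector wins, since that vector's action is the one a point-based policy would take at \(b\).

```c
#include <assert.h>
#include <stddef.h>

/* U^Gamma(b) = max over alpha in Gamma of alpha^T b.
 * gamma_set holds m alpha vectors back to back, each n_states long.
 * If best_index is non-NULL it receives the index of the maximizing vector. */
double utility_at_belief(const double *gamma_set, size_t m, size_t n_states,
                         const double *b, size_t *best_index) {
    assert(m > 0);
    double best = 0.0;
    size_t arg = 0;
    for (size_t i = 0; i < m; i++) {
        const double *alpha = gamma_set + i * n_states;
        double dot = 0.0; /* alpha^T b */
        for (size_t s = 0; s < n_states; s++)
            dot += alpha[s] * b[s];
        if (i == 0 || dot > best) {
            best = dot;
            arg = i;
        }
    }
    if (best_index != NULL) *best_index = arg;
    return best;
}
```

For example, with \(\Gamma = \{(2,0), (0,2), (1.5,1.5)\}\) and \(b = (0.5, 0.5)\), the third vector wins with utility 1.5.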
pointer
A pointer is a variable that stores a memory address. Because C has no pass-by-reference, we use pointers to emulate it: we share an object's address with other functions.
A pointer can refer to a single byte or to a large data structure. Pointers also let us work with dynamically allocated memory, and refer to memory generically, without a type (via void *).
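A small sketch of both ideas, using hypothetical helpers `make_int_array` and `swap_bytes`: the first returns a pointer to heap memory from `malloc`, and the second treats two objects of any type as raw bytes through `void *`, so the caller has to supply the size.

```c
#include <stdlib.h>

/* Dynamically allocate n ints, each set to fill. The caller owns the
 * returned pointer and must free() it. Returns NULL if allocation fails. */
int *make_int_array(size_t n, int fill) {
    int *arr = malloc(n * sizeof *arr);
    if (arr == NULL) return NULL;
    for (size_t i = 0; i < n; i++) arr[i] = fill;
    return arr;
}

/* Swap two objects of any type. void * carries no type information,
 * so we view the memory as unsigned bytes and swap size of them. */
void swap_bytes(void *a, void *b, size_t size) {
    unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

Usage: `swap_bytes(&x, &y, sizeof x)` works the same whether `x` and `y` are ints, doubles, or structs of the same type.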
C is always pass-by-copy. Therefore, to emulate pass-by-reference, you pass (a copy of) the object's address instead of the object itself:
```c
#include <stdio.h>
int main(void) {
    int x = 2;             // declare object
    int *xptr = &x;        // get location of object (&: address of)
    printf("%d\n", *xptr); // dereference the pointer: prints 2
}
```
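The classic illustration is a swap function, sketched here as a hypothetical `swap_int`. Taking `int a, int b` would only swap local copies; taking their addresses lets the function reach the caller's variables, even though the addresses themselves are still passed by copy.

```c
#include <assert.h>

/* Swap two ints in the caller's scope by writing through their addresses. */
void swap_int(int *a, int *b) {
    assert(a && b);  /* both pointers must refer to real objects */
    int tmp = *a;    /* read the caller's first int */
    *a = *b;         /* overwrite it with the second */
    *b = tmp;        /* store the saved value in the second */
}
```

Usage: after `int x = 2, y = 5; swap_int(&x, &y);`, `x` is 5 and `y` is 2.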
address operator
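You will note, in the line int *xptr = &x; above, that & is the address operator: &x evaluates to the memory address of x, which is then stored in the pointer xptr. The * in *xptr does the reverse, following the address back to x.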
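A minimal sketch (the function name `write_through_pointer` is hypothetical) showing that & and * are inverses: once xptr holds &x, the expression *xptr names x itself, so assigning to it changes x.

```c
#include <assert.h>

/* Set a local x, then overwrite it through a pointer obtained with &. */
int write_through_pointer(int initial, int new_value) {
    int x = initial;
    int *xptr = &x;         /* &x evaluates to the address of x */
    *xptr = new_value;      /* *xptr is x itself, so this assigns to x */
    assert(x == new_value); /* x changed without being named directly */
    return x;
}
```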
poisson distribution
Suppose we want the probability that an event occurs \(k\) times in a unit of time, given that on average the event happens at a rate of \(\lambda\) per unit time.
For example: “What’s the probability that there are \(k\) earthquakes in 1 year if there are on average \(2\) earthquakes per year?”
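Worked through with the standard Poisson probability mass function (the formula itself is standard, not taken from these notes), the earthquake example uses \(\lambda = 2\):
\begin{equation} P(X = k) = e^{-\lambda} \frac{\lambda^{k}}{k!} \end{equation}
\begin{equation} P(X = 0) = e^{-2} \approx 0.135, \quad P(X = 1) = 2e^{-2} \approx 0.271, \quad P(X = 2) = \frac{4}{2} e^{-2} \approx 0.271 \end{equation}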
This model requires that:
- events have to be independent
- the probability of success in each trial doesn't vary
constituents
- \(\lambda\): the average count of events per unit time
- \(X \sim \text{Poi}(\lambda)\): the number of events in one unit of time