model-based reinforcement learning
Last edited: August 8, 2025
Step 1: Getting Model
We want to learn a model of the environment, consisting of:
- \(T\): transition probability
- \(R\): rewards
Maximum Likelihood Parameter Learning Method
We keep a table of counts:
\begin{equation} N(s,a,s') \end{equation}
which is the number of observed transitions from \(s,a\) to \(s'\); we increment it each time \(s, a, s'\) is observed. The maximum likelihood estimate of the transition probability is then:
\begin{equation} T(s' | s,a) = \frac{N(s,a,s')}{\sum_{s''} N(s,a,s'')} \end{equation}
We also keep a table:
\begin{equation} p(s,a) \end{equation}
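which is the sum of rewards received when taking \(s,a\). To estimate the reward, we take the average, that is, the reward sum divided by the number of times \(s,a\) was taken:
\begin{equation} R(s,a) = \frac{p(s,a)}{\sum_{s'} N(s,a,s')} \end{equation}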
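A minimal sketch of the count-based estimates in Python, assuming discrete, hashable states and actions. The class name `MaximumLikelihoodModel`, the `update` method, and the choice to return 0 for unvisited \(s,a\) pairs are illustrative assumptions, not part of the notes:

```python
from collections import defaultdict


class MaximumLikelihoodModel:
    """Tabular estimates of T and R from observed (s, a, r, s') transitions."""

    def __init__(self):
        self.N = defaultdict(int)    # N[(s, a, s_next)]: transition counts
        self.p = defaultdict(float)  # p[(s, a)]: sum of rewards for (s, a)
        self.n = defaultdict(int)    # n[(s, a)]: sum over s' of N(s, a, s')

    def update(self, s, a, r, s_next):
        """Record one observed transition."""
        self.N[(s, a, s_next)] += 1
        self.p[(s, a)] += r
        self.n[(s, a)] += 1

    def T(self, s_next, s, a):
        """Estimated T(s' | s, a); 0.0 if (s, a) was never taken (assumption)."""
        total = self.n.get((s, a), 0)
        return self.N.get((s, a, s_next), 0) / total if total else 0.0

    def R(self, s, a):
        """Estimated R(s, a) as the average observed reward; 0.0 if unvisited."""
        total = self.n.get((s, a), 0)
        return self.p.get((s, a), 0.0) / total if total else 0.0


# Hypothetical usage: three observations of taking "right" in state "s0".
model = MaximumLikelihoodModel()
model.update("s0", "right", 1.0, "s1")
model.update("s0", "right", 0.0, "s0")
model.update("s0", "right", 1.0, "s1")
print(model.T("s1", "s0", "right"), model.R("s0", "right"))
```

Here two of the three transitions land in `"s1"`, so both the estimated transition probability and the average reward come out to 2/3.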
model-free reinforcement learning
Last edited: August 8, 2025
In model-based reinforcement learning, we tried really hard to get \(T\) and \(R\). What if we just estimated \(Q(s,a)\) directly? Model-free reinforcement learning tends to be quite slow compared to model-based reinforcement learning methods.
review: estimating mean of a random variable
Suppose we have \(m\) samples \(x^{(1 \dots m)}\) of a random variable \(X\). What is the mean of \(X\)?
\begin{equation} \hat{x}_{m} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \end{equation}
Pulling the last sample out of the sum gives an equivalent incremental update:
\begin{equation} \hat{x}_{m} = \hat{x}_{m-1} + \frac{1}{m} (x^{(m)} - \hat{x}_{m-1}) \end{equation}
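The incremental form can be sketched in Python as follows. The function name `incremental_mean` is an illustrative choice; the point is that the estimate is updated one sample at a time without storing earlier samples:

```python
def incremental_mean(samples):
    """Running mean using x_hat_m = x_hat_{m-1} + (x_m - x_hat_{m-1}) / m."""
    x_hat = 0.0  # the starting value is overwritten by the first sample (m = 1)
    for m, x in enumerate(samples, start=1):
        x_hat += (x - x_hat) / m
    return x_hat


# Agrees with the batch mean sum(xs) / len(xs) up to floating-point error.
xs = [2.0, 4.0, 6.0, 8.0]
print(incremental_mean(xs), sum(xs) / len(xs))
```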
modern OS
Last edited: August 8, 2025
multi-core CPUs
Finally, actual multitasking: starting in the mid-2000s, multi-core CPUs became common. Managing work across cores is crucial.
Moore's Law Breakdown
- we have reached much of the limit of single-core speed
- instead, we have to add more cores, which requires more management to take advantage of
More kinds of Cores
- “performance” vs “efficiency” cores
- the scheduler must account for different tasks: not just who runs on what core, but who runs on what TYPE of core
Other Hardware
These chips also contain specialized hardware, which is required for scheduling.
modular arithmetic
Last edited: August 8, 2025
Clock math.
We say that \(a\ \text{mod}\ b = r\) if \(a=bq+r\) for some integer \(q\), where \(b>0\) and \(0 \leq r <b\). More specifically, we denote:
\begin{equation} a \equiv a'\ \text{mod}\ b \end{equation}
if \(b \mid (a-a')\).
additional information
basic modular arithmetic operations
\begin{align} (a+b)\ \text{mod}\ c &= ((a\ \text{mod}\ c) + (b\ \text{mod}\ c))\ \text{mod}\ c \\ (ab) \ \text{mod}\ c &= ((a\ \text{mod}\ c) (b \ \text{mod}\ c)) \ \text{mod}\ c \end{align}
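A quick sanity check of the two identities in Python. The helper names `add_mod` and `mul_mod` are illustrative; the sketch assumes a positive modulus \(c\), for which Python's `%` returns a remainder in \([0, c)\):

```python
def add_mod(a, b, c):
    """(a + b) mod c, computed from the residues of a and b."""
    return ((a % c) + (b % c)) % c


def mul_mod(a, b, c):
    """(a * b) mod c, computed from the residues of a and b."""
    return ((a % c) * (b % c)) % c


# Compare against reducing the full sum or product directly.
for a in range(-20, 21):
    for b in range(-20, 21):
        for c in range(1, 8):
            assert add_mod(a, b, c) == (a + b) % c
            assert mul_mod(a, b, c) == (a * b) % c
```

Reducing before combining keeps intermediate values small, which matters in languages with fixed-width integers.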
examples of modular arithmetic
If \(a\ \text{mod}\ b = r\), then \(-a \equiv -r \equiv b-r\ \text{mod}\ b\), so \((-a)\ \text{mod}\ b = b-r\) whenever \(r \neq 0\) (and \(0\) when \(r = 0\)).
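The negation rule can be checked with a small Python sketch. The name `neg_mod` is illustrative, and the outer `% b` is added here to cover the \(r = 0\) case, where the remainder must stay \(0\) rather than become \(b\). Python's `%` already returns a non-negative remainder for a positive modulus, unlike C's, so it can serve as the reference:

```python
def neg_mod(a, b):
    """(-a) mod b computed from r = a mod b, for b > 0."""
    r = a % b
    return (b - r) % b  # outer reduction maps b back to 0 when r == 0


# Compare against Python's own remainder of -a.
for a in range(0, 30):
    for b in range(1, 10):
        assert neg_mod(a, b) == (-a) % b
```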

