model-based reinforcement learning

Last edited: August 8, 2025

Step 1: Getting Model

We want a model

\(T\): transition probability
\(R\): rewards

Maximum Likelihood Parameter Learning Method

We keep a table of counts:

\begin{equation} N(s,a,s') \end{equation}

which is the number of observed transitions from \(s,a\) to \(s'\); we increment it each time \(s, a, s'\) is observed. Maximum Likelihood Parameter Learning then gives:

\begin{equation} T(s' | s,a) = \frac{N(s,a,s')}{\sum_{s''} N(s,a,s'')} \end{equation}

We also keep a table:

\begin{equation} p(s,a) \end{equation}

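the sum of rewards received when taking \(s,a\). To estimate the reward, we take the average over the number of times \(s,a\) was taken:

\begin{equation} R(s,a) = \frac{p(s,a)}{\sum_{s''} N(s,a,s'')} \end{equation}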
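As a rough illustration, the two tables can be kept in dictionaries and turned into estimates on demand: the transition estimate is a count divided by the total count for \(s,a\), and the reward estimate is the reward sum divided by that same total. This is only a minimal sketch; the class name `MaximumLikelihoodModel`, its method names, and the choice to return `0.0` for a never-visited \(s,a\) are assumptions made for the example, not something the notes specify.

```python
from collections import defaultdict


class MaximumLikelihoodModel:
    """Tabular model estimate from observed (s, a, r, s') transitions (hypothetical helper)."""

    def __init__(self):
        self.N = defaultdict(int)            # N[(s, a, s_next)]: transition counts
        self.rho = defaultdict(float)        # rho[(s, a)]: sum of rewards, p(s,a) in the notes
        self.successors = defaultdict(set)   # next states observed after each (s, a)

    def update(self, s, a, r, s_next):
        """Record one observed transition."""
        self.N[(s, a, s_next)] += 1
        self.rho[(s, a)] += r
        self.successors[(s, a)].add(s_next)

    def visits(self, s, a):
        """Total count for (s, a): the sum over s'' of N(s, a, s'')."""
        seen = self.successors.get((s, a), ())
        return sum(self.N.get((s, a, sp), 0) for sp in seen)

    def T(self, s_next, s, a):
        """Estimate of T(s' | s, a); 0.0 if (s, a) was never observed (an assumption)."""
        n = self.visits(s, a)
        return self.N.get((s, a, s_next), 0) / n if n > 0 else 0.0

    def R(self, s, a):
        """Estimate of R(s, a) as the average observed reward; 0.0 if never observed."""
        n = self.visits(s, a)
        return self.rho.get((s, a), 0.0) / n if n > 0 else 0.0


# Toy usage with made-up states, actions, and rewards.
model = MaximumLikelihoodModel()
model.update("s0", "right", 1.0, "s1")
model.update("s0", "right", 0.0, "s0")
model.update("s0", "right", 1.0, "s1")
print(model.T("s1", "s0", "right"))  # two of three transitions reached s1
print(model.R("s0", "right"))        # total reward 2.0 over three visits
```

Returning zero for unvisited pairs is only one option; a uniform prior or optimistic initialization would change how unexplored actions look to a planner.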

model-free reinforcement learning

Last edited: August 8, 2025

In model-based reinforcement learning, we worked hard to estimate \(T\) and \(R\). What if we just estimated \(Q(s,a)\) directly? Model-free reinforcement learning tends to be quite slow compared to model-based reinforcement learning methods.

review: estimating mean of a random variable

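Given \(m\) samples \(x^{(1 \dots m)}\) of a random variable \(X\), what is the mean of \(X\)? We estimate it with the sample mean:

\begin{equation} \hat{x}_{m} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \end{equation}

which can be rewritten as an incremental update from the previous estimate:

\begin{equation} \hat{x}_{m} = \hat{x}_{m-1} + \frac{1}{m} (x^{(m)} - \hat{x}_{m-1}) \end{equation}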
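A small sketch of the incremental form, which needs only the previous estimate and the sample index rather than all past samples. The function name `incremental_mean` and the sample values are made up for illustration.

```python
def incremental_mean(samples):
    """Return the running estimates x_hat_1, ..., x_hat_m using the incremental update."""
    estimate = 0.0
    history = []
    for m, x in enumerate(samples, start=1):
        # x_hat_m = x_hat_{m-1} + (1/m) * (x^(m) - x_hat_{m-1})
        estimate += (x - estimate) / m
        history.append(estimate)
    return history


samples = [2.0, 4.0, 6.0, 8.0]
running = incremental_mean(samples)
batch = sum(samples) / len(samples)

# The last running estimate agrees with the batch sample mean.
assert abs(running[-1] - batch) < 1e-12
print(running)
```

The same update shape, a previous estimate plus a step size times an error term, is what makes this review relevant to estimating \(Q(s,a)\) directly from samples.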

modeling

Last edited: August 8, 2025

Here are the main steps of generic modeling.

modern OS

Last edited: August 8, 2025

multi-core CPUs

Finally, actual multitasking: starting in the mid-2000s, multiple cores finally became common. Management between cores is crucial.

Moore's Law Breakdown

we have largely reached the limits of single-core speed
instead, we have to add more cores, which requires more management to take advantage of

More kinds of Cores

“performance” vs “efficiency” cores
the scheduler needs to account for different tasks: not just which task runs on which core, but which task runs on which TYPE of core

Other Hardware

Specialized hardware in these chips, which is required for scheduling.

modular arithmetic

Last edited: August 8, 2025

Clock math.

We say that \(a\ \text{mod}\ b = r\) if \(a=bq+r\) for some integer \(q\), with \(b>0\) and \(0 \leq r <b\). More generally, we write:

\begin{equation} a \equiv a'\ \text{mod}\ b \end{equation}

if \(b|(a-a')\).

additional information

basic modular arithmetic operations

\begin{align} (a+b)\ \text{mod}\ c &= ((a\ \text{mod}\ c) + (b\ \text{mod}\ c))\ \text{mod}\ c \\ (ab) \ \text{mod}\ c &= ((a\ \text{mod}\ c) (b \ \text{mod}\ c)) \ \text{mod}\ c \end{align}
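A quick numeric check of these two identities, as a sketch. The helper names `add_mod` and `mul_mod` and the sample numbers are arbitrary choices for the example; Python's `%` operator returns a result in \([0, c)\) for positive \(c\), which matches the definition above.

```python
def add_mod(a, b, c):
    """(a + b) mod c, computed by reducing each operand first."""
    return ((a % c) + (b % c)) % c


def mul_mod(a, b, c):
    """(a * b) mod c, computed by reducing each operand first."""
    return ((a % c) * (b % c)) % c


a, b, c = 123456789, 987654321, 97

# Reducing the operands first gives the same residue as reducing at the end.
assert add_mod(a, b, c) == (a + b) % c
assert mul_mod(a, b, c) == (a * b) % c
```

Reducing before each operation keeps intermediate values below \(c^2\), which matters in languages with fixed-width integers even though Python's integers do not overflow.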

examples of modular arithmetic

If \(a\ \text{mod}\ b = r\), then \(-a \equiv -r \equiv b-r\ \text{mod}\ b\), so \((-a)\ \text{mod}\ b = b-r\) when \(r \neq 0\) (and \(0\) when \(r = 0\)).
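The negation rule can be checked numerically. In this sketch the helper name `neg_mod` is hypothetical, and the extra reduction `% b` is there to handle the case where the remainder is zero, so that the result stays in \([0, b)\).

```python
def neg_mod(a, b):
    """(-a) mod b, computed from r = a mod b as (b - r) mod b."""
    r = a % b
    return (b - r) % b  # the outer % b maps b - 0 back to 0


# Compare against Python's own %, which is non-negative for positive b.
for a in range(0, 50):
    for b in range(1, 12):
        assert neg_mod(a, b) == (-a) % b

print(neg_mod(7, 5))   # 7 mod 5 = 2, so (-7) mod 5 = 5 - 2 = 3
print(neg_mod(10, 5))  # 10 mod 5 = 0, so (-10) mod 5 = 0
```

Note that languages such as C and Java define `%` on negative operands differently (the result takes the sign of the dividend), so the comparison against the built-in operator is specific to Python.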