Markov Chain
Last edited: August 8, 2025

A Markov Chain is a chain of \(N\) states, with an \(N \times N\) transition matrix.
- at each step, we are in exactly one of those states
- the matrix \(P_{ij}\) tells us \(P(j|i)\), the probability of going to state \(j\) given you are at state \(i\)
And therefore:
\begin{equation} \sum_{j=1}^{N} P_{ij} = 1 \end{equation}
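As a small sketch of these definitions (the 3-state chain and its numbers are made up for illustration), each row of \(P\) is a distribution over next states, and multiplying a state distribution by \(P\) advances the chain one step:

```python
import numpy as np

# Hypothetical 3-state chain; P[i, j] = P(j | i), the probability of
# moving from state i to state j in one step.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
])

# Each row is a probability distribution, so rows sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# One step of the chain: multiply the current state distribution by P.
start = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
after_one_step = start @ P
print(after_one_step)               # [0.7 0.2 0.1], i.e. row 0 of P
```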
Ergodic Markov Chain
a Markov Chain is Ergodic if…
- you have a path from any one state to any other
- for any start state, after some time \(T_0\), the probability of being in any state at any \(T > T_0\) is non-zero
Every Ergodic Markov Chain has a long-term visit rate: a unique stationary distribution \(\pi\) satisfying \(\pi P = \pi\), which gives the long-run fraction of time spent in each state regardless of the start state.
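A sketch of finding that long-term visit rate by power iteration, reusing the made-up 3-state matrix from above; for an ergodic chain, repeatedly applying \(P\) converges to the stationary distribution from any starting distribution:

```python
import numpy as np

# Same hypothetical 3-state transition matrix as before.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
])

# Power iteration: start from the uniform distribution and apply P
# until the distribution stops changing.
pi = np.full(3, 1 / 3)
for _ in range(1000):
    pi = pi @ P

# pi is now (approximately) stationary: applying P leaves it unchanged.
assert np.allclose(pi @ P, pi)
print(pi)  # long-run fraction of time spent in each state
```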
Markov Decision Process
An MDP is a decision network whereby a sequence of actions causes a sequence of states. Each state depends on the action we take and the state we are in, and each utility depends on the action taken and the state we are in.
Note that, unlike in a POMDP, we know exactly what state we are in; it is only in a POMDP that the observations of the state are unclear.

constituents
- \(S\): state space (assuming discrete for now, there are \(n\) states) — “minimum set of information that allows you to solve a problem”
- \(A\): action space — set of things your agent can do
- \(T(s' | s,a)\): “dynamics”, state-transition model, “probability that we end up in \(s'\) given \(s\) and action \(a\)”: good idea to make a table of probabilities of source vs. destination variables
- \(R(s,a,s')\): expected reward given an action and a state (the real-world reward may be stochastic)
- \(\pi_{t}(s_{1:t}, a_{1:t-1})\): the policy, returning an action, a system of assigning actions based on states
- however, our past states are d-separated from our current action given knowing the state, so really we have \(\pi_{t}(s_{t})\)
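The constituents above can be sketched as plain tables; this toy 2-state, 2-action MDP (states, actions, and numbers are all invented for illustration) also shows why the policy only needs the current state:

```python
from itertools import product

# T[(s, a, s_next)]: probability of reaching s_next from s under action a.
T = {
    (0, 'stay', 0): 0.9, (0, 'stay', 1): 0.1,
    (0, 'go',   0): 0.2, (0, 'go',   1): 0.8,
    (1, 'stay', 1): 0.9, (1, 'stay', 0): 0.1,
    (1, 'go',   1): 0.2, (1, 'go',   0): 0.8,
}

# R[(s, a)]: expected reward for taking action a in state s.
R = {(0, 'stay'): 0.0, (0, 'go'): 1.0, (1, 'stay'): 2.0, (1, 'go'): 0.0}

# Because past states are d-separated from the action given the current
# state, the policy is just a function of the current state: pi(s) -> a.
def pi(s):
    return 'go' if s == 0 else 'stay'

# Sanity check: transitions out of each (s, a) pair form a distribution.
for s, a in product([0, 1], ['stay', 'go']):
    total = sum(p for (s0, a0, _), p in T.items() if (s0, a0) == (s, a))
    assert abs(total - 1.0) < 1e-9
```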
additional information
We assume policy to be exact right now.
Markov Equivalence Classes
\(A \to B\) has the same independence relationship as \(B\to A\). How do we describe it?
requirements
If two Bayesian Networks encode the same conditional independence assumptions, they are Markov Equivalent.
additional information
checking Markov Equivalence
Two graphs are Markov Equivalent IFF BOTH:
- same edges without regard to direction (“same skeleton”)
- the same set of immoral v-structures
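The two-part check can be sketched directly, assuming each DAG is given as a set of directed edges (the graphs and helper names here are illustrative):

```python
def skeleton(edges):
    """Edges without regard to direction."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Immoral v-structures: a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    vs = set()
    for c, ps in parents.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in skel:
                    vs.add((a, c, b))
    return vs

def markov_equivalent(e1, e2):
    return (skeleton(e1) == skeleton(e2)
            and v_structures(e1) == v_structures(e2))

# A -> B vs. B -> A: same skeleton, no v-structures, so equivalent.
print(markov_equivalent({('A', 'B')}, {('B', 'A')}))  # True

# A -> C <- B is an immoral v-structure; reversing one edge breaks it.
print(markov_equivalent({('A', 'C'), ('B', 'C')},
                        {('A', 'C'), ('C', 'B')}))    # False
```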
