MapReduce
MapReduce is a distributed algorithm for processing large collections of data in parallel.

https://www.psc.edu/wp-content/uploads/2023/07/A-Brief-History-of-Big-Data.pdf
- Map: \((in\_key, in\_value) \Rightarrow list(out\_key, intermediate\_value)\).
- Reduce:
  - Group map outputs by \(out\_key\)
  - \((out\_key, list(intermediate\_value)) \Rightarrow list(out\_value)\)
example of MapReduce
Say you want to count word frequencies in a set of documents.
- Map: \((document\_name, document\_contents) \Rightarrow list(word, #\ occurrences)\)
You can see that this can be distributed to multiple processors: each processor counts the word frequencies in a single document, and the reduce step then sums the counts for each word across documents. We have now broken the contents into divide-and-conquerable groups.
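Below is a minimal sketch of this word-count example in plain Python, not tied to any particular MapReduce framework; `map_fn`, `reduce_fn`, and the sample documents are hypothetical names chosen only for illustration.

```python
from collections import defaultdict

def map_fn(document_name, document_contents):
    # Map: (in_key, in_value) -> list(out_key, intermediate_value)
    return [(word, 1) for word in document_contents.split()]

def reduce_fn(word, counts):
    # Reduce: (out_key, list(intermediate_value)) -> list(out_value)
    return [sum(counts)]

def mapreduce(documents):
    # shuffle: group map outputs by out_key
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            grouped[word].append(count)
    # each (word, counts) group can be reduced on a different processor
    return {word: reduce_fn(word, counts)[0] for word, counts in grouped.items()}

docs = {"doc1": "the cat sat", "doc2": "the cat ran"}
print(mapreduce(docs))  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In a real system the map calls run on separate machines and the shuffle happens over the network; here everything runs in one process just to show the data flow.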
Markov Chain
A Markov Chain is a chain of \(N\) states, with an \(N \times N\) transition matrix.
- at each step, we are in exactly one of those states
- the matrix entry \(P_{ij}\) tells us \(P(j|i)\), the probability of going to state \(j\) given you are at state \(i\)
And therefore:
\begin{equation} \sum_{j=1}^{N} P_{ij} = 1 \end{equation}
Ergodic Markov Chain
a Markov Chain is Ergodic if…
- you have a path from any one state to any other
- for any start state, after some time \(T_0\), the probability of being in any state at any \(T > T_0\) is non-zero
Every Ergodic Markov Chain has a long-term visit rate: for each state \(j\), the fraction of steps spent in state \(j\) converges to a steady-state probability, independent of where the chain started.
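As a small numerical sketch (with a hypothetical 3-state transition matrix chosen only for illustration), we can check the row-sum property and watch the long-term visit rate emerge by repeatedly applying the matrix to a start distribution:

```python
import numpy as np

# hypothetical 3-state transition matrix; P[i, j] = P(j | i)
P = np.array([
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

# every row must sum to 1
assert np.allclose(P.sum(axis=1), 1.0)

def visit_rate(P, start, steps=1000):
    # repeatedly apply the transition matrix to a start distribution
    dist = np.asarray(start, dtype=float)
    for _ in range(steps):
        dist = dist @ P
    return dist

# for an ergodic chain, every start state converges to the same visit rate
print(visit_rate(P, [1, 0, 0]))
print(visit_rate(P, [0, 0, 1]))
```

Both start states print the same distribution, which is the long-term visit rate of this chain.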
Markov Chain Monte-Carlo
Start with an initial sample \(\tau\)
At each iteration…
- sample \(\tau' \sim g\qty(\cdot | \tau)\) (for instance, \(\tau' \sim \mathcal{N}\qty(\cdot | \tau, \sigma^{2})\))
- accept the sample with probability \(\min\qty(1, \frac{\bar{p} \qty(\tau') g\qty(\tau | \tau')}{\bar{p}\qty(\tau) g\qty(\tau' | \tau)})\), otherwise keep \(\tau\)
intuition
The kernel is often chosen to be symmetric, so:
\begin{equation} g\qty(\tau | \tau') = g\qty(\tau' | \tau) \end{equation}
we want to sample from areas of high density more often. With a symmetric kernel the acceptance ratio reduces to \(\frac{\bar{p}\qty(\tau')}{\bar{p}\qty(\tau)}\), so moves toward higher density are always accepted, while moves toward lower density are accepted only occasionally.
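Putting the two steps together with a symmetric Gaussian kernel gives the following sketch; `p_bar` is a hypothetical unnormalized target density used only for illustration.

```python
import numpy as np

def p_bar(tau):
    # hypothetical unnormalized target density (mixture of two Gaussians)
    return np.exp(-0.5 * (tau - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (tau + 2.0) ** 2)

def metropolis_hastings(p_bar, tau0, sigma=1.0, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    tau = tau0
    samples = []
    for _ in range(n_samples):
        # propose tau' ~ N(. | tau, sigma^2); the kernel is symmetric,
        # so g(tau | tau') = g(tau' | tau) and the ratio simplifies
        tau_prime = rng.normal(tau, sigma)
        # accept with probability min(1, p_bar(tau') / p_bar(tau))
        if rng.random() < min(1.0, p_bar(tau_prime) / p_bar(tau)):
            tau = tau_prime
        samples.append(tau)  # otherwise keep tau
    return np.array(samples)

samples = metropolis_hastings(p_bar, tau0=0.0)
print(samples.mean(), samples.std())
```

Because the proposal is symmetric, only the ratio of unnormalized densities appears in the acceptance test, which is why the normalizing constant of the target never needs to be computed.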
Markov Decision Process
An MDP is a decision network whereby a sequence of actions causes a sequence of states. Each state depends on the action taken and the state we were in, and each utility depends on the action taken and the state we are in.
Note that, unlike in a POMDP, we know exactly what state we are in; in a POMDP the state is only seen through noisy observations.

constituents
- \(S\): state space (assuming discrete for now, there are \(n\) states) — “minimum set of information that allows you to solve a problem”
- \(A\): action space — set of things your agent can do
- \(T(s' | s,a)\): “dynamics”, state-transition model “probability that we end up in \(s'\) given \(s\) and action \(a\)”: good idea to make a table of probabilities of source vs. destination variables
- \(R(s,a,s')\): expected reward given a state, action, and next state (the real-world reward may be stochastic)
- \(\pi_{t}(s_{1:t}, a_{1:t-1})\): the policy, a system for assigning actions based on states; it returns an action
- however, our past states are d-separated from our current action given the current state, so really we have \(\pi_{t}(s_{t})\)
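A minimal sketch of these constituents as plain Python data, using a hypothetical two-state, two-action problem (all numbers are made up for illustration):

```python
# hypothetical two-state, two-action MDP
S = ["s0", "s1"]          # state space
A = ["stay", "go"]        # action space

# T[(s, a)][s'] = probability that we end up in s' given state s and action a
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "go"):   {"s0": 0.7, "s1": 0.3},
}

# R[(s, a, s')] = expected reward for taking a in s and landing in s'
R = {(s, a, sp): (1.0 if sp == "s1" else 0.0) for (s, a) in T for sp in S}

def policy(s):
    # the policy only needs the current state (past states are d-separated)
    return "go" if s == "s0" else "stay"

# sanity check: every transition distribution sums to 1
for dist in T.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Laying \(T\) out as a dictionary keyed by (source state, action) with destination probabilities inside mirrors the source-vs-destination table suggested above.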
additional information
We assume the policy to be exact for now.
Markov Equivalence Classes
\(A \to B\) encodes the same independence relationships as \(B\to A\). How do we describe this?
requirements
If two Bayesian Networks encode the same conditional independence assumptions, they are Markov Equivalent.
additional information
checking Markov Equivalence
Two graphs are Markov Equivalent, IFF BOTH:
- same edges without regard to direction (“same skeleton”)
- the same set of immoral v-structures
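A small sketch of this check, representing a DAG as a set of directed edges; `skeleton`, `v_structures`, and the example graphs are hypothetical helpers chosen only for illustration.

```python
from itertools import combinations

def skeleton(edges):
    # edges without regard to direction ("same skeleton")
    return {frozenset(e) for e in edges}

def v_structures(edges):
    # immoral v-structures: a -> c <- b where a and b are not adjacent
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    skel = skeleton(edges)
    vs = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                vs.add((frozenset((a, b)), c))
    return vs

def markov_equivalent(g1, g2):
    # same skeleton AND same set of immoral v-structures
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# A -> B and B -> A encode the same independencies
print(markov_equivalent({("A", "B")}, {("B", "A")}))  # True
# A -> C <- B is an immorality; reorienting one edge breaks equivalence
print(markov_equivalent({("A", "C"), ("B", "C")}, {("A", "C"), ("C", "B")}))  # False
```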