MapReduce
MapReduce is a distributed algorithm for processing large collections of data in parallel.

https://www.psc.edu/wp-content/uploads/2023/07/A-Brief-History-of-Big-Data.pdf
- Map: \((in\_key, in\_value) \Rightarrow list(out\_key, intermediate\_value)\).
- Reduce:
  - Group map outputs by \(out\_key\)
  - \((out\_key, list(intermediate\_value)) \Rightarrow list(out\_value)\)
example of MapReduce
Say you want to count word frequencies in a set of documents.
- Map: \((document\_name, document\_contents) \Rightarrow list(word, #\ occurrences)\)
You can see that this can be distributed to multiple processors: each processor counts the word frequencies in a single document, and the reduce step then sums the counts for each word across documents. We have now broken the contents into divide-and-conquerable groups.
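Below is a minimal sketch of this word-count example in plain Python, not tied to any particular MapReduce framework; `map_fn`, `reduce_fn`, and the sample documents are hypothetical names chosen only for illustration.

```python
from collections import defaultdict

def map_fn(document_name, document_contents):
    # Map: (in_key, in_value) -> list(out_key, intermediate_value)
    return [(word, 1) for word in document_contents.split()]

def reduce_fn(word, counts):
    # Reduce: (out_key, list(intermediate_value)) -> list(out_value)
    return [sum(counts)]

def mapreduce(documents):
    # shuffle: group map outputs by out_key
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            grouped[word].append(count)
    # each (word, counts) group can be reduced on a different processor
    return {word: reduce_fn(word, counts)[0] for word, counts in grouped.items()}

docs = {"doc1": "the cat sat", "doc2": "the cat ran"}
print(mapreduce(docs))  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In a real system the map calls run on separate machines and the shuffle happens over the network; here everything runs in one process just to show the data flow.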
Markov Chain
A Markov Chain is a chain of \(N\) states, with an \(N \times N\) transition matrix.
- at each step, we are in exactly one of those states
- the matrix entry \(P_{ij}\) tells us \(P(j|i)\), the probability of going to state \(j\) given you are at state \(i\)
And therefore:
\begin{equation} \sum_{j=1}^{N} P_{ij} = 1 \end{equation}
Ergodic Markov Chain
a Markov Chain is Ergodic if…
- you have a path from any one state to any other
- for any start state, after some time \(T_0\), the probability of being in any state at any \(T > T_0\) is non-zero
Every Ergodic Markov Chain has a long-term visit rate: for each state \(j\), the fraction of steps spent in state \(j\) converges to a steady-state probability, independent of where the chain started.
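As a small numerical sketch (with a hypothetical 3-state transition matrix chosen only for illustration), we can check the row-sum property and watch the long-term visit rate emerge by repeatedly applying the matrix to a start distribution:

```python
import numpy as np

# hypothetical 3-state transition matrix; P[i, j] = P(j | i)
P = np.array([
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

# every row must sum to 1
assert np.allclose(P.sum(axis=1), 1.0)

def visit_rate(P, start, steps=1000):
    # repeatedly apply the transition matrix to a start distribution
    dist = np.asarray(start, dtype=float)
    for _ in range(steps):
        dist = dist @ P
    return dist

# for an ergodic chain, every start state converges to the same visit rate
print(visit_rate(P, [1, 0, 0]))
print(visit_rate(P, [0, 0, 1]))
```

Both start states print the same distribution, which is the long-term visit rate of this chain.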
Markov Chain Monte-Carlo
Start with an initial sample \(\tau\)
At each iteration…
- sample \(\tau' \sim g\qty(\cdot | \tau)\) (for instance, \(\tau' \sim \mathcal{N}\qty(\cdot | \tau, \sigma^{2})\))
- accept the sample with probability \(\min\qty(1, \frac{\bar{p} \qty(\tau') g\qty(\tau | \tau')}{\bar{p}\qty(\tau) g\qty(\tau' | \tau)})\), otherwise keep \(\tau\)
intuition
The kernel is often chosen to be symmetric, so:
\begin{equation} g\qty(\tau | \tau') = g\qty(\tau' | \tau) \end{equation}
we want to sample from areas of high density more often. With a symmetric kernel the acceptance ratio reduces to \(\frac{\bar{p}\qty(\tau')}{\bar{p}\qty(\tau)}\), so moves toward higher density are always accepted, while moves toward lower density are accepted only occasionally.
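Putting the two steps together with a symmetric Gaussian kernel gives the following sketch; `p_bar` is a hypothetical unnormalized target density used only for illustration.

```python
import numpy as np

def p_bar(tau):
    # hypothetical unnormalized target density (mixture of two Gaussians)
    return np.exp(-0.5 * (tau - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (tau + 2.0) ** 2)

def metropolis_hastings(p_bar, tau0, sigma=1.0, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    tau = tau0
    samples = []
    for _ in range(n_samples):
        # propose tau' ~ N(. | tau, sigma^2); the kernel is symmetric,
        # so g(tau | tau') = g(tau' | tau) and the ratio simplifies
        tau_prime = rng.normal(tau, sigma)
        # accept with probability min(1, p_bar(tau') / p_bar(tau))
        if rng.random() < min(1.0, p_bar(tau_prime) / p_bar(tau)):
            tau = tau_prime
        samples.append(tau)  # otherwise keep tau
    return np.array(samples)

samples = metropolis_hastings(p_bar, tau0=0.0)
print(samples.mean(), samples.std())
```

Because the proposal is symmetric, only the ratio of unnormalized densities appears in the acceptance test, which is why the normalizing constant of the target never needs to be computed.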
Markov Decision Process
An MDP is a decision network whereby a sequence of actions causes a sequence of states. Each state depends on the action taken and the state we were in, and each utility depends on the action taken and the state we are in.
Note that, unlike in a POMDP, we know exactly what state we are in; in a POMDP the state is only seen through noisy observations.

constituents
- \(S\): state space (assuming discrete for now, there are \(n\) states) — “minimum set of information that allows you to solve a problem”
- \(A\): action space — set of things your agent can do
- \(T(s' | s,a)\): “dynamics”, state-transition model “probability that we end up in \(s'\) given \(s\) and action \(a\)”: good idea to make a table of probabilities of source vs. destination variables
- \(R(s,a,s')\): expected reward given a state, action, and next state (the real-world reward may be stochastic)
- \(\pi_{t}(s_{1:t}, a_{1:t-1})\): the policy, a system for assigning actions based on states; it returns an action
- however, our past states are d-separated from our current action given the current state, so really we have \(\pi_{t}(s_{t})\)
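A minimal sketch of these constituents as plain Python data, using a hypothetical two-state, two-action problem (all numbers are made up for illustration):

```python
# hypothetical two-state, two-action MDP
S = ["s0", "s1"]          # state space
A = ["stay", "go"]        # action space

# T[(s, a)][s'] = probability that we end up in s' given state s and action a
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "go"):   {"s0": 0.7, "s1": 0.3},
}

# R[(s, a, s')] = expected reward for taking a in s and landing in s'
R = {(s, a, sp): (1.0 if sp == "s1" else 0.0) for (s, a) in T for sp in S}

def policy(s):
    # the policy only needs the current state (past states are d-separated)
    return "go" if s == "s0" else "stay"

# sanity check: every transition distribution sums to 1
for dist in T.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Laying \(T\) out as a dictionary keyed by (source state, action) with destination probabilities inside mirrors the source-vs-destination table suggested above.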
additional information
We assume the policy to be exact for now.
Markov Equivalence Classes
\(A \to B\) encodes the same independence relationships as \(B\to A\). How do we describe this?
requirements
If two Bayesian Networks encode the same conditional independence assumptions, they are Markov Equivalent.
additional information
checking Markov Equivalence
Two graphs are Markov Equivalent, IFF BOTH:
- same edges without regard to direction (“same skeleton”)
- the same set of immoral v-structures
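A small sketch of this check, representing a DAG as a set of directed edges; `skeleton`, `v_structures`, and the example graphs are hypothetical helpers chosen only for illustration.

```python
from itertools import combinations

def skeleton(edges):
    # edges without regard to direction ("same skeleton")
    return {frozenset(e) for e in edges}

def v_structures(edges):
    # immoral v-structures: a -> c <- b where a and b are not adjacent
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    skel = skeleton(edges)
    vs = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                vs.add((frozenset((a, b)), c))
    return vs

def markov_equivalent(g1, g2):
    # same skeleton AND same set of immoral v-structures
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# A -> B and B -> A encode the same independencies
print(markov_equivalent({("A", "B")}, {("B", "A")}))  # True
# A -> C <- B is an immorality; reorienting one edge breaks equivalence
print(markov_equivalent({("A", "C"), ("B", "C")}, {("A", "C"), ("C", "B")}))  # False
```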