- Draw an initial state \(q_1\) from the initial state distribution \(\pi\).
- For each time step \(t = 1, \dots, T\), with current state \(q_t = i\):
  - Draw an observation \(o_t\) according to the emission (output) distribution of state \(i\).
  - Use the transition probabilities \(a_{i,j}\) to draw the next state \(q_{t+1} = j\).
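The generative process above can be sketched in a few lines. This is a minimal illustration that assumes a discrete HMM stored as plain nested lists: `pi` for \(\pi\), `A[i][j]` for \(a_{i,j}\), and a hypothetical emission table `B[i][k]` for the probability that state \(i\) emits symbol \(k\). The function name `sample_hmm` is also illustrative.

```python
import random

def sample_hmm(pi, A, B, T, seed=None):
    """Generate T (state, observation) pairs from a discrete HMM.

    pi[i]:   probability of starting in state i
    A[i][j]: transition probability a_{i,j} from state i to state j
    B[i][k]: (assumed layout) probability that state i emits symbol k
    """
    rng = random.Random(seed)
    states = range(len(pi))
    symbols = range(len(B[0]))
    q = rng.choices(states, weights=pi)[0]  # draw q_1 from pi
    qs, obs = [], []
    for _ in range(T):
        qs.append(q)
        obs.append(rng.choices(symbols, weights=B[q])[0])  # emit o_t from state q
        q = rng.choices(states, weights=A[q])[0]  # draw the next state from row a_{q,.}
    return qs, obs

# Toy 2-state, 3-symbol model (numbers are arbitrary).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
states, obs = sample_hmm(pi, A, B, T=10, seed=0)
print(states, obs)
```

Only the observations would be visible to a recognizer; the state sequence is returned here just to make the hidden part of the process explicit.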

Isolated word recognition: train a family of HMMs, one for each word (or other unit) in the vocabulary. Then, given new data, score each HMM against the extracted features and pick the word whose model assigns the highest probability.
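A minimal sketch of isolated word recognition, assuming discrete HMMs given as `(pi, A, B)` tuples of nested lists (with `B[i][k]` a hypothetical emission table). Each word model is scored with an unscaled forward pass and the best-scoring word wins. The word models below are made up for illustration; unscaled probabilities underflow on long sequences, so a practical version would work in log space or rescale at each step.

```python
def forward_prob(pi, A, B, obs):
    """P(O|lambda) for a discrete HMM via the forward recursion (unscaled)."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        # the right-hand side uses the previous alpha before it is rebound
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o] for j in range(N)]
    return sum(alpha)

def recognize(word_models, obs):
    """Return the word whose HMM gives obs the highest likelihood."""
    return max(word_models, key=lambda w: forward_prob(*word_models[w], obs))

# Hypothetical 2-state left-to-right word models over a 2-symbol alphabet.
A_lr = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], A_lr, [[0.9, 0.1], [0.8, 0.2]]),
    "no": ([1.0, 0.0], A_lr, [[0.1, 0.9], [0.2, 0.8]]),
}
print(recognize(models, [0, 0, 1, 0]))  # prints: yes
```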

## components of HMMs

### scoring

Given an observation sequence \(O = o_1, \dots, o_T\) and a model \(\lambda\), compute \(P(O|\lambda)\), the probability of the sequence given the model.

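This is computed with the forward algorithm (or, equivalently, the backward algorithm), which sums over all possible state sequences by dynamic programming instead of enumerating them one by one.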
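The two recursions can be sketched as follows, assuming a discrete HMM with `pi`, `A[i][j]` \(= a_{i,j}\), and a hypothetical emission table `B[i][k]`. `alpha[t][i]` is the probability of \(o_1, \dots, o_t\) and being in state \(i\) at time \(t\); `beta[t][i]` is the probability of \(o_{t+1}, \dots, o_T\) given state \(i\) at time \(t\). Both routes give the same \(P(O|\lambda)\). This is unscaled, so it is only suitable for short sequences.

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(o_1..o_t, q_t = i | lambda)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    N, T = len(A), len(obs)
    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
    return beta

def likelihood_forward(pi, A, B, obs):
    return sum(forward(pi, A, B, obs)[-1])

def likelihood_backward(pi, A, B, obs):
    beta = backward(A, B, obs)
    return sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]
print(likelihood_forward(pi, A, B, obs), likelihood_backward(pi, A, B, obs))
```

Each recursion costs \(O(N^2 T)\), versus \(O(N^T)\) for summing over every state sequence directly.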

### decoding

Given observations \(O\), find the state sequence \(q_1, \dots, q_T\) most likely to have generated them.

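It's essentially a shortest-path search, like Dijkstra's: for every time step (every received block, in the convolutional-code setting), label each edge in the trellis with a cost (the distance to the received code; for an HMM, the negative log of the transition and emission probabilities), then find the shortest path through the trellis based on those edge costs. Because the trellis is a layered graph with no cycles, the Viterbi algorithm does this with dynamic programming, one time step at a time.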
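A minimal Viterbi sketch that makes the shortest-path view literal: every edge cost is a negative log probability, so the cheapest path through the trellis is the most probable state sequence. It assumes a discrete HMM with `pi`, `A[i][j]` \(= a_{i,j}\), and a hypothetical emission table `B[i][k]`. Instead of a priority queue, it relaxes the trellis layer by layer, which is valid because every edge goes forward in time.

```python
import math

def viterbi(pi, A, B, obs):
    """Most likely state path and its cost -log P(path, O | lambda)."""
    def cost(p):
        return math.inf if p == 0 else -math.log(p)

    N, T = len(pi), len(obs)
    dist = [cost(pi[i]) + cost(B[i][obs[0]]) for i in range(N)]  # layer t = 1
    back = []  # back[t-1][j] = best predecessor of state j at step t
    for t in range(1, T):
        new_dist, ptr = [], []
        for j in range(N):
            best_i = min(range(N), key=lambda i: dist[i] + cost(A[i][j]))
            new_dist.append(dist[best_i] + cost(A[best_i][j]) + cost(B[j][obs[t]]))
            ptr.append(best_i)
        dist = new_dist
        back.append(ptr)
    last = min(range(N), key=lambda j: dist[j])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, dist[last]

# With an identity emission table the states are fully observed,
# so the only finite-cost path is the observation sequence itself.
path, c = viterbi([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[1.0, 0.0], [0.0, 1.0]], [0, 0, 1, 1])
print(path)  # prints: [0, 0, 1, 1]
```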

### training

Given observations \(O\), find the model parameters \(\lambda\) that maximize \(P(O|\lambda)\), the likelihood of the observations under the model.
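The usual way to do this for HMMs is Baum-Welch, an expectation-maximization (EM) procedure. The sketch below is one re-estimation step for a discrete HMM trained on a single sequence, assuming `pi`, `A[i][j]` \(= a_{i,j}\), and a hypothetical emission table `B[i][k]`. It is unscaled and assumes no state's expected occupancy is exactly zero (otherwise a denominator would vanish). Each step is guaranteed not to decrease \(P(O|\lambda)\), though it may converge to a local maximum.

```python
def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o] for j in range(N)])
    return alpha

def backward(A, B, obs):
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):  # o is o_{t+1} when computing beta_t
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N)) for i in range(N)])
    return beta

def baum_welch_step(pi, A, B, obs):
    """One EM update of (pi, A, B); also returns P(O|lambda) of the input model."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    # gamma[t][i] = P(q_t = i | O, lambda)
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    # xi[t][i][j] = P(q_t = i, q_{t+1} = j | O, lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) / sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B, p_obs

pi = [0.5, 0.5]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
obs = [0, 0, 1, 2, 2, 1, 0, 2, 2, 2]
history = []
for _ in range(20):
    pi, A, B, p = baum_welch_step(pi, A, B, obs)
    history.append(p)
print(history[0], history[-1])
```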

## continuous-density HMM

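Time steps stay discrete; what changes is the observation model. Instead of a discrete distribution over a finite set of symbols, each state has a continuous probability density over the feature vectors (typically a mixture of Gaussians).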
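As a sketch of what a state's continuous emission density can look like, here is the log-density of one state modeled as a Gaussian mixture with diagonal covariances. The function names and the parameter layout (`weights`, `means`, `variances` as per-component lists) are assumptions for illustration. The log-sum-exp step keeps the mixture sum numerically stable; this density takes the place of the discrete emission probability in the scoring and decoding recursions.

```python
import math

def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var)) for a feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_emission(x, weights, means, variances):
    """log b_j(x) for one state modeled as a diagonal-covariance Gaussian mixture."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp

# Hypothetical 2-component mixture over 2-dimensional features.
print(log_emission([0.3, -1.2], [0.7, 0.3], [[0.0, 0.0], [1.0, -1.0]], [[1.0, 1.0], [0.5, 2.0]]))
```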

## continuous speech

Scoring becomes hard because you would have to go through and compute a score for every possible word sequence. Therefore, apply Bayes' rule:

\begin{equation} P(W|O) = \frac{P(O|W) P(W)}{P(O)} \end{equation}

Since \(P(O)\) is the same for every candidate \(W\), it does not change which \(W\) wins, so what we really want is:

\begin{equation} \hat{W} = \arg\max_{W} P(O|W) P(W) \end{equation}
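In log space the product becomes a sum, so picking \(\hat{W}\) from a short list of candidates (for example, an N-best list) is a one-liner. The scores below are hypothetical, chosen to show the prior \(P(W)\) overturning a slight acoustic preference; a real recognizer searches over word sequences instead of enumerating them.

```python
def best_transcription(candidates):
    """Pick argmax_W P(O|W) P(W) in log space.

    candidates maps a word sequence (tuple of words) to a pair
    (log P(O|W), log P(W)); both scores are assumed to be precomputed.
    """
    return max(candidates, key=lambda w: candidates[w][0] + candidates[w][1])

# Hypothetical scores: the acoustics slightly prefer the second hypothesis,
# but the language-model prior P(W) strongly prefers the first.
candidates = {
    ("recognize", "speech"): (-10.0, -3.0),
    ("wreck", "a", "nice", "beach"): (-9.5, -7.0),
}
print(best_transcription(candidates))  # prints: ('recognize', 'speech')
```

Subtracting the same \(\log P(O)\) from every candidate shifts all totals equally, which is why it can be dropped from the argmax.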