- Draw an initial state \(q_1\) from the initial state distribution \(\pi\)
- For each state \(q_{i}\)…
- Draw an observation \(o_{t}\) according to the emission (output) distribution of state \(q_{i}\)
- Use transition probability \(a_{i,j}\) to draw a next state \(q_{j}\)
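
The generative steps above can be sketched as a short sampler. This is a minimal sketch assuming a discrete HMM with a finite symbol alphabet; the names `sample_hmm`, `pi`, `A`, `B`, and `T` are illustrative choices, not something the notes define.

```python
import random

def sample_hmm(pi, A, B, T, seed=None):
    """Draw T (state, observation) pairs from a discrete HMM.

    Assumed layout (illustrative, not from the notes):
      pi[i]   : probability of starting in state i
      A[i][j] : probability of moving from state i to state j
      B[i][k] : probability that state i emits symbol k
    """
    rng = random.Random(seed)
    n_states = len(pi)
    n_symbols = len(B[0])
    states, observations = [], []

    # Draw the initial state from pi.
    q = rng.choices(range(n_states), weights=pi)[0]
    for _ in range(T):
        # Emit an observation from the current state's emission distribution.
        o = rng.choices(range(n_symbols), weights=B[q])[0]
        states.append(q)
        observations.append(o)
        # Draw the next state using the transition row of the current state.
        q = rng.choices(range(n_states), weights=A[q])[0]
    return states, observations
```

Calling `sample_hmm(pi, A, B, T=10, seed=0)` returns two lists of length 10: the hidden state path and the observed symbols.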
Isolated recognition: train a family of HMMs, one for each word (or other unit). Then, given new data, score its features against each HMM and pick the best-scoring model.
components of HMMs
scoring
Given an observation sequence \(O = o_1, …, o_{T}\) and a model \(\lambda\), we compute \(P(O|\lambda)\): the probability of the sequence given the model.
“forward and backward algorithm”
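
As a rough illustration of scoring, here is a minimal sketch of the forward pass only, assuming a discrete HMM stored as plain lists. The names `forward_likelihood`, `pi`, `A`, and `B` are assumptions made for this example.

```python
def forward_likelihood(pi, A, B, obs):
    """Return P(O | lambda) for a discrete HMM using the forward recursion.

    Assumed layout (illustrative): pi[i] initial probabilities,
    A[i][j] transition probabilities, B[i][k] emission probabilities,
    obs a list of symbol indices.
    """
    n = len(pi)
    # alpha[i] = P(o_1..o_t, q_t = i | lambda); start with t = 1.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Sum over all predecessor states, then weight by the emission.
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Marginalize over the final state.
    return sum(alpha)
```

For long sequences the raw probabilities underflow, so a practical version would rescale `alpha` at each step or work in log space.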
decoding
Given observations, find the state sequence \(q_1, …, q_{T}\) most likely to have generated them.
It's a shortest-path problem, like Dijkstra: for every block, label each edge in the trellis with its distance to the received code. Then we find the shortest path through the trellis based on those edge distances.
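
For an HMM, the usual form of this trellis search is the Viterbi algorithm. The sketch below assumes a discrete HMM and takes the edge distance to be the negative log probability, so that the shortest path is the most probable state sequence. The names `viterbi`, `pi`, `A`, and `B` are illustrative, and the dynamic-programming sweep stands in for a general Dijkstra search, which the layered trellis does not need.

```python
import math

def viterbi(pi, A, B, obs):
    """Return (most likely state path, its joint probability with obs)."""
    n = len(pi)

    def cost(p):
        # Edge distance: -log(probability); impossible edges cost infinity.
        return -math.log(p) if p > 0 else math.inf

    # best[j] = cheapest cost of any path ending in state j at the current time.
    best = [cost(pi[j]) + cost(B[j][obs[0]]) for j in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev, best, pointers = best, [], []
        for j in range(n):
            # Pick the predecessor that reaches state j most cheaply.
            i_star = min(range(n), key=lambda i: prev[i] + cost(A[i][j]))
            pointers.append(i_star)
            best.append(prev[i_star] + cost(A[i_star][j]) + cost(B[j][o]))
        back.append(pointers)

    # Trace the back-pointers from the cheapest final state.
    q = min(range(n), key=lambda j: best[j])
    path = [q]
    for pointers in reversed(back):
        q = pointers[q]
        path.append(q)
    path.reverse()
    return path, math.exp(-min(best))
```

Working with summed costs instead of multiplied probabilities also avoids underflow on long observation sequences.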
training
Given observations \(O\), find the model parameters \(\lambda\) that maximize the likelihood \(P(O|\lambda)\).
continuous-density HMM
There are some HMMs that blend the discrete timestamps into s.
continuous speech
Scoring becomes hard because you would have to go through and score every single possible word sequence. Therefore, we use Bayes' rule:
\begin{equation} P(W|O) = \frac{P(O|W) P(W)}{P(O)} \end{equation}
Since \(P(O)\) does not depend on \(W\), we really desire:
\begin{equation} \arg\max_{W} P(O|W) P(W) \end{equation}
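
In log space this decision rule becomes a sum of two scores. The sketch below assumes the acoustic scores \(\log P(O|W)\) and language-model scores \(\log P(W)\) have already been computed for a small candidate set. The function name, the candidate strings, and the numbers are all made up for illustration.

```python
def best_hypothesis(log_acoustic, log_prior):
    """Return the candidate W maximizing log P(O|W) + log P(W).

    Both arguments are dicts keyed by candidate word sequence
    (hypothetical inputs; a real recognizer searches a far larger space).
    """
    return max(log_acoustic, key=lambda w: log_acoustic[w] + log_prior[w])


# Made-up scores: the second candidate fits the audio slightly better,
# but the prior strongly prefers the first.
log_acoustic = {"recognize speech": -10.0, "wreck a nice beach": -9.5}
log_prior = {"recognize speech": -2.0, "wreck a nice beach": -6.0}
winner = best_hypothesis(log_acoustic, log_prior)
```

Here the prior \(P(W)\) overrides a small acoustic advantage, which is the reason for keeping both terms in the product.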