Houjun Liu

Hidden Markov Model

  1. Draw an initial state \(q_1\) from the initial state distribution \(\pi\)
  2. For each time step \(t\), while in state \(q_{i}\)… (see the sketch after this list)
    1. Draw an observation \(o_{t}\) according to the observation (emission) distribution of state \(q_{i}\)
    2. Use transition probability \(a_{i,j}\) to draw the next state \(q_{j}\)
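
A minimal sketch of this sampling procedure, assuming a discrete-observation HMM; the parameters \(\pi\), \(A\) (transitions), and \(B\) (emissions) below are made up purely for illustration.

```python
# Sketch of the generative process above (hypothetical parameters).
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])               # initial state distribution
A = np.array([[0.7, 0.3],               # A[i, j] = P(next state j | current state i)
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],          # B[i, k] = P(observation k | state i)
              [0.1, 0.3, 0.6]])

def sample_hmm(T):
    """Draw a length-T state/observation sequence from the HMM."""
    states, obs = [], []
    q = rng.choice(len(pi), p=pi)                    # step 1: draw q_1 from pi
    for _ in range(T):
        obs.append(rng.choice(B.shape[1], p=B[q]))   # draw o_t from state q's distribution
        states.append(q)
        q = rng.choice(len(pi), p=A[q])              # draw the next state via a_{i,j}
    return states, obs

print(sample_hmm(5))
```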

Isolated recognition: train a family of HMMs, one for each word or something. Then, given new data, perform scoring of the HMM onto the features.
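
A rough sketch of isolated recognition under that setup: one discrete-observation HMM per word, each scored on the same (quantized) feature sequence with the forward algorithm in the log domain, and the best-scoring word wins. The word models and the observation sequence here are hypothetical.

```python
# Sketch: isolated-word recognition by scoring per-word HMMs (made-up parameters).
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """log P(O | lambda) via the forward algorithm, computed in the log domain."""
    log_alpha = np.log(pi) + np.log(B[:, obs[0]])          # initialization
    for o in obs[1:]:
        # log-sum-exp over previous states, then add the emission log-probability
        log_alpha = np.logaddexp.reduce(
            log_alpha[:, None] + np.log(A), axis=0) + np.log(B[:, o])
    return np.logaddexp.reduce(log_alpha)                   # sum over final states

# One (made-up) HMM per word; in practice these come from training.
word_models = {
    "yes": (np.array([0.9, 0.1]),
            np.array([[0.8, 0.2], [0.3, 0.7]]),
            np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])),
    "no":  (np.array([0.5, 0.5]),
            np.array([[0.6, 0.4], [0.4, 0.6]]),
            np.array([[0.2, 0.5, 0.3], [0.5, 0.4, 0.1]])),
}

obs = [0, 1, 2, 2, 1]                                       # quantized feature sequence
best = max(word_models, key=lambda w: forward_log_likelihood(*word_models[w], obs))
print(best)
```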

components of HMMs

scoring

Given an observation sequence \(o_1, \dots, o_{T}\) and a model, we compute \(P(O \mid \lambda)\), the probability of the sequence given the model \(\lambda\).

This is solved with the forward algorithm (the "forward" half of the forward-backward procedure).
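
Concretely, the forward variable \(\alpha_t(j) = P(o_1, \dots, o_t, q_t = j \mid \lambda)\) is built up left to right (Rabiner-style notation, with \(b_j(o_t)\) the emission probability, assumed here):

\begin{align} \alpha_1(j) &= \pi_j\, b_j(o_1) \\ \alpha_{t+1}(j) &= \Big[\sum_{i} \alpha_t(i)\, a_{i,j}\Big]\, b_j(o_{t+1}) \\ P(O \mid \lambda) &= \sum_{j} \alpha_T(j) \end{align}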

decoding

Given observations, find the state sequence \(q_1, \dots, q_{T}\) most likely to have generated them.

It's like Dijkstra: for every block, label each edge in the trellis with its distance to the received code, then run a shortest-path search over those edge distances (this is the idea behind the Viterbi algorithm).
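
A minimal Viterbi sketch under the same discrete-observation assumptions (made-up parameters), working in the log domain:

```python
# Sketch: Viterbi decoding of the most likely state sequence (hypothetical parameters).
import numpy as np

def viterbi(pi, A, B, obs):
    """Return the most likely state sequence q_1..q_T for the observations."""
    T, N = len(obs), len(pi)
    delta = np.log(pi) + np.log(B[:, obs[0]])        # best log-score ending in each state
    back = np.zeros((T, N), dtype=int)               # backpointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)          # scores[i, j]: come from i, go to j
        back[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + np.log(B[:, obs[t]])
    # trace back the best path
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi(pi, A, B, [0, 0, 2, 2, 1]))
```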

training

Given observations \(O\), find the model parameters \(\lambda\) that maximize \(P(O|\lambda)\), the likelihood of the observations.
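
The standard technique here is the Baum-Welch (forward-backward / EM) algorithm. Below is a bare-bones sketch of a single re-estimation pass for one discrete observation sequence, using the same \(\pi\), \(A\), \(B\) parameterization as above; a real implementation would handle multiple sequences, numerical underflow, and smoothing.

```python
# Sketch: one Baum-Welch (EM) re-estimation pass for a discrete-observation HMM.
import numpy as np

def baum_welch_step(pi, A, B, obs):
    T, N = len(obs), len(pi)
    # forward pass: alpha[t, i] = P(o_1..o_t, q_t = i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # backward pass: beta[t, i] = P(o_{t+1}..o_T | q_t = i)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    # E-step: state posteriors gamma and transition posteriors xi
    gamma = alpha * beta / likelihood                        # gamma[t, i] = P(q_t = i | O)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    # M-step: re-estimate parameters from the posteriors
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, likelihood
```

Iterating `baum_welch_step` until the likelihood stops improving yields a local maximum of \(P(O|\lambda)\).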

continuous-density HMM

There are some HMMs that replace the discrete observation distributions with continuous densities (e.g., Gaussian mixtures over the feature vectors).
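
What changes is the emission term: \(b_j(o_t)\) becomes a density evaluated on a real-valued feature vector, commonly a Gaussian mixture per state. A small sketch with hypothetical parameters:

```python
# Sketch: continuous-density emission score, replacing the discrete B matrix.
import numpy as np

def log_gmm_density(x, weights, means, variances):
    """log b_j(x) for one state: a diagonal-covariance Gaussian mixture (made-up params)."""
    x = np.asarray(x)
    # per-component log N(x; mean, diag(var))
    log_comp = -0.5 * np.sum(np.log(2 * np.pi * variances)
                             + (x - means) ** 2 / variances, axis=1)
    return np.logaddexp.reduce(np.log(weights) + log_comp)

# two mixture components over 3-dimensional features (illustrative numbers)
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
variances = np.array([[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]])
print(log_gmm_density([0.2, 0.1, -0.3], weights, means, variances))
```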

continuous speech

Scoring becomes hard because you would have to go through and score every possible word sequence. Therefore, by Bayes' rule:

\begin{equation} P(W|O) = \frac{P(O|W) P(W)}{P(O)} \end{equation}

Since \(P(O)\) does not depend on \(W\), we really desire:

\begin{equation} \arg\max_{W} P(O|W) P(W) \end{equation}
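
As a sketch of what that argmax looks like in practice (in the log domain), with entirely made-up candidate word sequences and scores:

```python
# Sketch: pick the word sequence maximizing log P(O|W) + log P(W) (hypothetical scores).
candidates = {
    "recognize speech":   {"acoustic": -120.4, "lm": -4.1},   # log P(O|W), log P(W)
    "wreck a nice beach": {"acoustic": -118.9, "lm": -9.7},
}
best = max(candidates, key=lambda w: candidates[w]["acoustic"] + candidates[w]["lm"])
print(best)   # the hypothesis with the best combined score
```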