Mahatma Gandhi
Last edited: August 8, 2025
Make Models Go Brrr: Model Parallel Whisper Training
Last edited: August 8, 2025
Happy Monday friends.
The deliverable of the week was to make an ASR model for Batchalign. Essentially, most copies of Whisper are pretty bad at Language Sample Analysis (LSA), because they mostly fail to capture the things that people doing LSA actually want to capture (disfluencies, stuttering, etc.). OpenAI even acknowledged in the paper that they filtered the disfluencies out of their gold transcripts to prevent Whisper from writing down too many of them.
map restriction operator
Last edited: August 8, 2025
Suppose \(T \in \mathcal{L}(V)\) and \(U \subset V\) is a subspace invariant under \(T\). Then the restriction operator \(T|_{U}\) is defined by:
\begin{equation} T|_{U}(u) = Tu,\ \forall u \in U \end{equation}
where \(T|_{U} \in \mathcal{L}(U)\); this is well defined because \(U\) is invariant, so \(Tu \in U\) for every \(u \in U\).
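As a small worked example (the specific operator is chosen here for illustration and is not part of the original note), take \(V = \mathbb{R}^{2}\) and \(T(x, y) = (2x + y, 3y)\). The subspace \(U = \{(x, 0) : x \in \mathbb{R}\}\) is invariant under \(T\), since \(T(x, 0) = (2x, 0) \in U\). The restriction is then:
\begin{equation} T|_{U}(x, 0) = (2x, 0) \end{equation}
which is just multiplication by \(2\) on \(U\).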
mapping reduction
Last edited: August 8, 2025
A language \(A\) is mapping reducible to a language \(B\), written as \(A \leq_{m} B\), if there is a computable function \(f: \Sigma^{*} \to \Sigma^{*}\) such that for every \(w\), \(w \in A \Leftrightarrow f(w) \in B\).
This is sometimes called a “many-to-one” reduction because \(f\) need not be injective: often you want multiple \(w\) to map to the same \(f(w)\).
We remember this as “A is weaker (that is, not stronger) than B”, or “A is reducible to B”.
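To make the definition concrete, here is a minimal Python sketch of a mapping reduction between two toy languages. Everything in it is an illustrative assumption rather than something from the note: \(A\) is taken to be the digit strings encoding even numbers, \(B\) the digit strings encoding odd numbers, and `f`, `in_A`, and `in_B` are hypothetical names.

```python
DIGITS = "0123456789"

def is_number(w: str) -> bool:
    """True when w is a nonempty string of ASCII digits."""
    return w != "" and all(c in DIGITS for c in w)

def in_A(w: str) -> bool:
    """A (assumed for illustration): digit strings encoding an even number."""
    return is_number(w) and int(w) % 2 == 0

def in_B(w: str) -> bool:
    """B (assumed for illustration): digit strings encoding an odd number."""
    return is_number(w) and int(w) % 2 == 1

def f(w: str) -> str:
    """Computable map with w in A <=> f(w) in B.

    Numbers are shifted by one, which flips parity. Strings that are not
    numbers are not in A, so they are sent to "0", which is not in B.
    """
    if is_number(w):
        return str(int(w) + 1)
    return "0"

samples = ["4", "7", "07", "", "abc", "100"]
# The defining property of a mapping reduction, checked on a few inputs.
assert all(in_A(w) == in_B(f(w)) for w in samples)
```

Note that `f("7")` and `f("07")` are the same string, which is the “many-to-one” behaviour described above.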
MapReduce
Last edited: August 8, 2025
MapReduce is a distributed algorithm.

https://www.psc.edu/wp-content/uploads/2023/07/A-Brief-History-of-Big-Data.pdf
- Map: \((in\_key, in\_value) \Rightarrow list(out\_key, intermediate\_value)\).
- Reduce:
  - Group map outputs by \(out\_key\)
  - \((out\_key, list(intermediate\_value)) \Rightarrow list(out\_value)\)
example of MapReduce
Say you want to count word frequencies in a set of documents.
- Map: \((document\_name, document\_contents) \Rightarrow list(word, \#\ occurrences)\)
You can see that this can be distributed across multiple processors: each processor can count the word frequencies in a single document. We have now broken the contents into groups that can be handled by divide and conquer.
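The word count example can be sketched in plain Python as follows. This is a single-process illustration under stated assumptions, not a real distributed implementation: the names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are hypothetical, words are assumed to be lowercased and split on whitespace, and the reduce step is assumed to sum the per-document counts.

```python
from collections import Counter, defaultdict

def map_fn(document_name, document_contents):
    """Map: (document_name, document_contents) -> list of (word, count)."""
    # Assumption: words are lowercased and separated by whitespace.
    counts = Counter(document_contents.lower().split())
    return list(counts.items())

def shuffle(mapped_pairs):
    """Group the intermediate values by their output key (the word)."""
    groups = defaultdict(list)
    for word, count in mapped_pairs:
        groups[word].append(count)
    return groups

def reduce_fn(word, counts):
    """Reduce: (word, list of counts) -> list holding the total count."""
    return [sum(counts)]

def word_count(documents):
    # Each map_fn call only touches one document, so in a real system
    # these calls could run on different processors.
    mapped = []
    for name, contents in documents.items():
        mapped.extend(map_fn(name, contents))
    grouped = shuffle(mapped)
    return {word: reduce_fn(word, counts)[0] for word, counts in grouped.items()}

documents = {"a.txt": "the cat saw the dog", "b.txt": "the bird"}
totals = word_count(documents)
```

Here `totals["the"]` combines the count of 2 from `a.txt` with the count of 1 from `b.txt`.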
