Mahajan 2021
Last edited: August 8, 2025
DOI: 10.3389/fnagi.2021.623607
One-Liner
Trained a bimodal model on speech/text with GRU on speech and CNN-LSTM on text.
Novelty
- A post-2019 NLP paper that doesn’t use transformers (they used a CNN-LSTM), so the model is faster, lighter, and easier to train.
- “Our work sheds light on why the accuracy of these models drops to 72.92% on the ADReSS dataset, whereas, they gave state of the art results on the DementiaBank dataset.”
Notable Methods
Bi-modal audio and transcript processing, vis-à-vis Shah 2021, but with a CNN-LSTM on the text side and a GRU on the audio side (see the sketch below).
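A rough sketch of what such a bi-modal setup could look like in PyTorch. This is my own reconstruction from the one-liner, not the paper’s actual code; all layer sizes and the fusion-by-concatenation choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiModalDementiaClassifier(nn.Module):
    """Sketch: GRU over acoustic features, CNN-LSTM over transcript tokens,
    fused by concatenation. Hyperparameters are illustrative guesses."""
    def __init__(self, n_audio_feats=40, vocab_size=10_000, emb_dim=128, hidden=128):
        super().__init__()
        # Speech branch: GRU over frame-level acoustic features.
        self.audio_gru = nn.GRU(n_audio_feats, hidden, batch_first=True)
        # Text branch: embedding -> 1D CNN -> LSTM over transcript tokens.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.text_cnn = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.text_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 2)   # AD vs. control logits

    def forward(self, audio, tokens):
        # audio: (batch, frames, n_audio_feats); tokens: (batch, seq_len)
        _, h_audio = self.audio_gru(audio)             # (1, batch, hidden)
        x = self.embed(tokens).transpose(1, 2)         # (batch, emb_dim, seq_len)
        x = torch.relu(self.text_cnn(x)).transpose(1, 2)
        _, (h_text, _) = self.text_lstm(x)             # (1, batch, hidden)
        fused = torch.cat([h_audio[-1], h_text[-1]], dim=-1)
        return self.classifier(fused)                  # (batch, 2)
```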
Mahatma Gandhi
Last edited: August 8, 2025
Make Models Go Brrr: Model Parallel Whisper Training
Last edited: August 8, 2025
Happy Monday friends.
The deliverable of the week was to make an ASR model for Batchalign. Essentially, most off-the-shelf copies of Whisper are pretty bad at Language Sample Analysis (LSA), because they mostly don’t capture the things that people doing LSA want to capture (disfluencies, stuttering, etc.). OpenAI even acknowledged in the paper that they filtered disfluencies out of their gold transcripts to prevent Whisper from writing down too many of them.
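To make the “model parallel” part of the title concrete, here is a minimal sketch of the idea: split an encoder-decoder model across two GPUs and let autograd handle the hop. The toy modules below are a stand-in, not Whisper and not Batchalign’s actual training code; it assumes two CUDA devices are available.

```python
import torch
import torch.nn as nn

# Toy stand-in for an encoder-decoder ASR model, split across two GPUs:
# encoder lives on cuda:0, decoder on cuda:1.
class ToySeq2Seq(nn.Module):
    def __init__(self, d_model=256, vocab=1000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(80, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        ).to("cuda:0")
        self.decoder = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, vocab)
        ).to("cuda:1")

    def forward(self, mel):              # mel: (batch, frames, 80) on cuda:0
        enc = self.encoder(mel)          # runs on cuda:0
        enc = enc.to("cuda:1")           # ship activations to the second GPU
        return self.decoder(enc)         # runs on cuda:1

model = ToySeq2Seq()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
mel = torch.randn(4, 300, 80, device="cuda:0")
targets = torch.randint(0, 1000, (4, 300), device="cuda:1")

logits = model(mel)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), targets.reshape(-1))
loss.backward()                          # autograd traverses the cross-device hop
optim.step()
```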
map restriction operator
Last edited: August 8, 2025
Suppose \(T \in \mathcal{L}(V)\), and \(U \subset V\), an invariant subspace under \(T\). Then:
\begin{equation} T|_{U}(u) = Tu,\ \forall u \in U \end{equation}
where \(T|_{U} \in \mathcal{L}(U)\)
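A quick worked example (mine, not from the original note): take \(V\) to be the space of real polynomials, \(T\) the differentiation operator, and \(U\) the polynomials of degree at most 2. Differentiating a degree-\(\leq 2\) polynomial stays inside \(U\), so \(U\) is invariant under \(T\) and the restriction acts as:
\begin{equation} T|_{U}(a + bx + cx^{2}) = b + 2cx,\ \forall a, b, c \in \mathbb{R} \end{equation}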
mapping reduction
Last edited: August 8, 2025
A language \(A\) is mapping reducible to language \(B\), written as \(A \leq_{m} B\), if there is a computable function \(f: \Sigma^{*} \to \Sigma ^{ *}\) such that for every \(w\), \(w \in A \Leftrightarrow f(w) \in B\).
This is sometimes called a “many-to-one” reduction because oftentimes multiple \(w\) map to the same \(f(w)\).
We remember this as “A is weaker (“not stronger”) than B”, or “A is reducible to B”.
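A toy sketch of a computable reduction (my own illustration, not from the note): take \(A\) = binary strings with an even number of 1s and \(B\) = binary strings ending in 0. The function below computes an \(f\) with \(w \in A \Leftrightarrow f(w) \in B\).

```python
# Toy mapping reduction A <=_m B (illustration only):
#   A = binary strings with an even number of 1s
#   B = binary strings whose last symbol is '0'
# f appends a parity bit, so  w in A  <=>  f(w) ends in '0'  <=>  f(w) in B.

def f(w: str) -> str:
    parity = w.count("1") % 2      # 0 iff the number of 1s is even
    return w + str(parity)         # clearly computable: count and append

def in_A(w: str) -> bool:
    return w.count("1") % 2 == 0

def in_B(w: str) -> bool:
    return w.endswith("0")

# Sanity check over all short strings: membership is preserved exactly.
from itertools import product
for n in range(5):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert in_A(w) == in_B(f(w))
```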