Posts

Mahajan 2021

Last edited: August 8, 2025

DOI: 10.3389/fnagi.2021.623607

One-Liner

Trained a bi-modal model on speech and text: a GRU on the speech side and a CNN-LSTM on the text side.

Novelty

  • A post-2019 NLP paper that doesn’t use transformers! They used a CNN-LSTM instead, so the model is faster, lighter, and easier to train.
  • “Our work sheds light on why the accuracy of these models drops to 72.92% on the ADReSS dataset, whereas, they gave state of the art results on the DementiaBank dataset.”

Notable Methods

Bi-modal audio and transcript processing vis-à-vis Shah 2021, but with a GRU on the speech side and a CNN-LSTM on the text side.
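
Here’s a rough sketch of what that bi-modal setup might look like (my own reconstruction in PyTorch; the layer sizes, fusion, and head are placeholders, not the paper’s actual architecture):

```python
import torch
import torch.nn as nn

# Hedged sketch of the bi-modal idea above: a GRU over frame-level acoustic
# features, a CNN-LSTM over token embeddings, and the two encodings
# concatenated for a binary AD/control head. Hyperparameters are placeholders.
class BiModalClassifier(nn.Module):
    def __init__(self, n_audio_feats=40, vocab_size=10000, emb_dim=128, hidden=64):
        super().__init__()
        # Speech side: GRU over acoustic feature frames.
        self.audio_gru = nn.GRU(n_audio_feats, hidden, batch_first=True)
        # Text side: 1-D CNN over token embeddings, then an LSTM.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.text_cnn = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.text_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 2)  # AD vs. control

    def forward(self, audio, tokens):
        # audio: (batch, frames, n_audio_feats); tokens: (batch, seq_len)
        _, h_audio = self.audio_gru(audio)             # (1, batch, hidden)
        x = self.embed(tokens).transpose(1, 2)         # (batch, emb_dim, seq_len)
        x = torch.relu(self.text_cnn(x)).transpose(1, 2)
        _, (h_text, _) = self.text_lstm(x)             # (1, batch, hidden)
        fused = torch.cat([h_audio[-1], h_text[-1]], dim=-1)
        return self.head(fused)

# Shape check with random inputs.
model = BiModalClassifier()
logits = model(torch.randn(2, 200, 40), torch.randint(0, 10000, (2, 50)))
```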

Mahatma Gandhi

Last edited: August 8, 2025

Make Models Go Brrr: Model Parallel Whisper Training

Last edited: August 8, 2025

Happy Monday friends.

The deliverable of the week was to make an ASR model for Batchalign. Essentially, most off-the-shelf copies of Whisper are pretty bad at Language Sample Analysis (LSA), because they mostly don’t try to capture the things that people doing LSA want to capture (disfluencies, stuttering, etc.). OpenAI even acknowledged in the paper that they filtered disfluencies out of their gold transcripts to prevent Whisper from writing down too many of them.
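
For the model-parallel part of the title, one simple way to get a large Whisper checkpoint spread across GPUs is HuggingFace’s device_map; this is a sketch of an assumed setup, not necessarily what Batchalign ends up using:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Hedged sketch: shard a big Whisper checkpoint across the visible GPUs so it
# fits in memory. "openai/whisper-large-v2" is just an example checkpoint, and
# device_map="auto" is naive layer placement via accelerate, not necessarily
# the setup Batchalign ends up with.
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    device_map="auto",          # accelerate spreads layers over available devices
    torch_dtype=torch.float16,  # halve the memory per parameter
)
print(model.hf_device_map)      # see which module landed on which device
```

For real multi-GPU training you would probably reach for something like DeepSpeed or FSDP rather than this naive layer placement, but it is enough to get the model loaded.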

map restriction operator

Last edited: August 8, 2025

Suppose \(T \in \mathcal{L}(V)\) and \(U \subset V\) is an invariant subspace under \(T\). Then:

\begin{equation} T|_{U}(u) = Tu,\ \forall u \in U \end{equation}

where \(T|_{U} \in \mathcal{L}(U)\).
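
A quick worked example (mine, not from the note): take \(V = \mathcal{P}_3(\mathbb{R})\), \(T = D\) the differentiation operator, and \(U = \mathcal{P}_2(\mathbb{R})\). \(U\) is invariant under \(D\) because differentiating a polynomial of degree at most 2 gives one of degree at most 1, which is still in \(U\). The restriction is then

\begin{equation} D|_{U}(a + bx + cx^{2}) = b + 2cx,\ D|_{U} \in \mathcal{L}(\mathcal{P}_2(\mathbb{R})) \end{equation}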

mapping reduction

Last edited: August 8, 2025

A language \(A\) is mapping reducible to language \(B\), written as \(A \leq_{m} B\), if there is a computable function \(f: \Sigma^{*} \to \Sigma ^{ *}\) such that for every \(w\), \(w \in A \Leftrightarrow f(w) \in B\).

This is sometimes called a “many-to-one” reduction because \(f\) does not need to be injective: many different \(w\) can map to the same \(f(w)\).

We remember this as “A is weaker (“not stronger”) than B”, or “A is reducible to B”.
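
A tiny concrete sketch (a toy example of my own, not from the definition above): reduce \(A = \{w : w \text{ has an even number of 1s}\}\) to \(B = \{w : w \text{ ends in } 0\}\) with a computable \(f\).

```python
# Toy mapping reduction A <=_m B (an illustrative example, not from the note).
# A = { w in {0,1}* : w has an even number of 1s }
# B = { w in {0,1}* : w ends in '0' }

def in_A(w: str) -> bool:
    return w.count("1") % 2 == 0

def in_B(w: str) -> bool:
    return w.endswith("0")

def f(w: str) -> str:
    """Computable function with  w in A  <=>  f(w) in B."""
    return "0" if in_A(w) else "1"

# Every w maps to one of just two outputs, so many inputs share the same f(w):
# that is the "many-to-one" part.
for w in ["", "1", "11", "101", "0110", "111"]:
    assert in_A(w) == in_B(f(w))
```

Of course this \(A\) is decidable, so the reduction is trivial; the definition earns its keep when \(A\) and \(B\) are undecidable and \(f\) must be computable without ever deciding membership in \(A\).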