Houjun Liu

Mahajan 2021

# ntj

DOI: 10.3389/fnagi.2021.623607


Trained a bimodal model on speech/text with GRU on speech and CNN-LSTM on text.


  • A post-2019 NLP paper that doesn’t use transformers! The CNN-LSTM they used instead is faster, lighter, and easier to work with.
  • “Our work sheds light on why the accuracy of these models drops to 72.92% on the ADReSS dataset, whereas, they gave state of the art results on the DementiaBank dataset.”

Notable Methods

Bimodal audio and transcript processing, vis-à-vis Shah 2021, but with a CNN-LSTM on the text side and a GRU on the speech side.
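
As a rough illustration of this two-branch design, the PyTorch sketch below runs a GRU over acoustic frames and a CNN-LSTM over token ids, then joins the two encodings for a binary prediction. The class names, layer sizes, vocabulary size, and the concatenation-based fusion are all assumptions made for illustration; these notes do not record the paper's actual hyperparameters or fusion strategy.

```python
import torch
import torch.nn as nn


class SpeechGRU(nn.Module):
    """Speech branch: a GRU over per-frame acoustic features."""

    def __init__(self, n_acoustic: int = 40, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_acoustic, hidden, batch_first=True)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, frames, n_acoustic)
        _, h = self.gru(audio)  # h: (num_layers, batch, hidden)
        return h[-1]            # final hidden state: (batch, hidden)


class TextCNNLSTM(nn.Module):
    """Text branch: embedding -> 1D convolution -> LSTM."""

    def __init__(self, vocab: int = 5000, emb: int = 100,
                 channels: int = 64, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb, padding_idx=0)
        self.conv = nn.Conv1d(emb, channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) of integer token ids
        e = self.embed(tokens).transpose(1, 2)        # (batch, emb, seq_len)
        c = torch.relu(self.conv(e)).transpose(1, 2)  # (batch, seq_len, channels)
        _, (h, _) = self.lstm(c)
        return h[-1]                                  # (batch, hidden)


class BimodalClassifier(nn.Module):
    """Concatenates both branch encodings (assumed fusion) and scores them."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.speech = SpeechGRU(hidden=hidden)
        self.text = TextCNNLSTM(hidden=hidden)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, audio: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.speech(audio), self.text(tokens)], dim=1)
        return self.head(fused).squeeze(-1)  # one logit per sample


# Random stand-in inputs: 4 samples, 200 audio frames, 30 tokens each.
model = BimodalClassifier()
audio = torch.randn(4, 200, 40)
tokens = torch.randint(1, 5000, (4, 30))
logits = model(audio, tokens)
print(logits.shape)
```

Applying `torch.sigmoid` to the logits would give a per-sample probability for the positive class; training such a model would typically pair the raw logits with `nn.BCEWithLogitsLoss`.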

Key Figs

Figure 1: Proposed Architecture

The figure shows the authors’ proposed architecture.

Figure 2: confusion matrix

In addition to validating prior work by Karlekar 2018 and Di Palo 2019, the authors proposed Model C, which achieved an accuracy of \(73.92\%\).
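
For reference, accuracy is read off a binary confusion matrix as correct predictions over all predictions. A minimal sketch of that arithmetic follows; the function name is hypothetical and the counts are invented for illustration, not the values in Figure 2.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions in a binary confusion matrix."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts, NOT taken from the paper's Figure 2.
example = accuracy(tp=40, tn=35, fp=15, fn=10)
print(f"{example:.2%}")
```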