Ntj

Laguarta 2021

Last edited: August 8, 2025

DOI: 10.3389/fcomp.2021.624694

One-Liner

Proposed a large multimodal approach to embed auditory info + biomarkers for baseline classification.

Novelty

Developed a massively multimodal audio-to-embedding correlation system that maps audio to the collected biomarker information (mood, memory, respiratory) and demonstrated its ability to discriminate COVID from cough recordings. (They were originally looking for AD; whoopsies.)

Notable Methods

  • Developed a feature extraction model for AD detection named Open Voice Brain Model
  • Collected a dataset on people coughing and correlated it with biomarkers

Key Figs

Figure 2

This is MULTI-MODAL as heck

Lindsay 2021

Last edited: August 8, 2025

DOI: 10.3389/fnagi.2021.642033

One-Liner

Proposed cross-linguistic markers shared for AD patients between English and French; evaluated features found with standard ML.

Novelty

Multi-lingual, cross-linguistic analysis.

Notable Methods

  • Looked at common patterns between the two languages
  • Linguistic results scored by information units (IUs) on the Cookie Theft Picture (CTP) task
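
A minimal sketch of what IU scoring could look like, assuming an IU counts as "mentioned" when any of its keywords appears in the transcript. The `INFORMATION_UNITS` table and the `iu_score` helper are hypothetical illustrations, not the paper's IU list or scoring procedure.

```python
import re

# Hypothetical information units for a picture description task.
# Each unit maps to the surface forms that count as a mention.
INFORMATION_UNITS = {
    "boy": {"boy", "son"},
    "cookie": {"cookie", "cookies"},
    "stool": {"stool"},
    "sink": {"sink"},
}


def iu_score(transcript, units=INFORMATION_UNITS):
    """Count how many information units are mentioned at least once."""
    tokens = set(re.findall(r"[a-z']+", transcript.lower()))
    return sum(1 for keywords in units.values() if tokens & keywords)


# Toy transcript, not data from the paper.
score = iu_score("The boy is on the stool taking cookies.")
print(score)
```

Each unit is counted once no matter how often it is repeated, so the score measures content coverage rather than verbosity.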

Key Figs

Figure 1

This figure tells us the various approaches measured.

Table 2

Here’s a list of semantic features extracted

Table 3

Here’s a list of NLP features extracted. Bolded items represent \(p < 0.001\) correlation for the AD/non-AD difference between English and French.

Luz 2021

Last edited: August 8, 2025

DOI: 10.1101/2021.03.24.21254263

One-Liner

Review paper presenting the \(ADReSS_o\) challenge and current baselines for three tasks

Notes

Three tasks + state of the art:

  • Classification of AD: accuracy \(78.87\%\)
  • Prediction of MMSE score: RMSE \(5.28\)
  • Prediction of cognitive decline: accuracy \(68.75\%\)
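
For reference, the two metrics quoted above can be computed as in this small sketch. The labels and MMSE values are toy numbers chosen for illustration (not data from the paper), and the `"AD"`/`"CN"` label names are assumptions.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values for illustration only.
labels = ["AD", "CN", "AD", "CN"]
predicted = ["AD", "CN", "CN", "CN"]
mmse_true = [28, 20, 15, 30]
mmse_pred = [26, 23, 15, 29]

print(accuracy(labels, predicted))
print(rmse(mmse_true, mmse_pred))
```

Accuracy is unitless, while RMSE is in MMSE points, so an RMSE of \(5.28\) means predictions are off by roughly five points on the MMSE scale.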

Task 1

AD classification baseline established by decision tree with late fusion

(LOOCV and test)
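
A rough sketch of late fusion evaluated with leave-one-out cross-validation (LOOCV), assuming one model per modality whose scores are averaged before thresholding. A 1-D nearest-centroid scorer stands in for the decision tree here to keep the example short; the feature values, the `acoustic`/`linguistic` names, and the 0/1 labels (1 = AD) are all made up for illustration.

```python
def fit_centroids(xs, ys):
    """Per-class mean of a 1-D feature (stand-in for a real classifier)."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means


def predict_score(means, x):
    """Soft score in [0, 1] for class 1: higher when x is nearer centroid 1."""
    d0 = abs(x - means[0])
    d1 = abs(x - means[1])
    if d0 + d1 == 0:
        return 0.5
    return d0 / (d0 + d1)


def loocv_late_fusion(modalities, labels):
    """Hold out each sample once; fuse per-modality scores by averaging."""
    n = len(labels)
    correct = 0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        y_train = [labels[j] for j in train]
        scores = []
        for features in modalities:
            model = fit_centroids([features[j] for j in train], y_train)
            scores.append(predict_score(model, features[i]))
        fused = sum(scores) / len(scores)  # late fusion happens here
        predicted = 1 if fused >= 0.5 else 0
        correct += int(predicted == labels[i])
    return correct / n


# Toy 1-D features per modality, not data from the paper.
acoustic = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
linguistic = [1.0, 1.2, 0.9, 2.0, 2.2, 1.9]
labels = [0, 0, 0, 1, 1, 1]

loocv_accuracy = loocv_late_fusion([acoustic, linguistic], labels)
print(loocv_accuracy)
```

The point of late fusion is that each modality gets its own model and only the outputs are combined, so either per-modality model can be swapped out without touching the other.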

Task 2

MMSE score prediction baseline established by grid search on parameters.

SVR did best on both counts; results from either model are averaged for prediction.
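
A sketch of the grid-search-then-average idea, under loud assumptions: a closed-form 1-D ridge regressor stands in for SVR so the example needs no third-party libraries, the `lam` grid and the data are toy values, and a single holdout split replaces whatever validation scheme the paper used.

```python
import itertools
import math


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def grid_search(train, valid, grid):
    """Try every hyperparameter combination; keep the lowest validation RMSE."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit_ridge_1d(train[0], train[1], **params)
        err = rmse(valid[1], predict(model, valid[0]))
        if best is None or err < best[0]:
            best = (err, params, model)
    return best


def average_predictions(prediction_lists):
    """Element-wise mean of predictions from several models."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_lists)]


# Toy data for illustration only.
train = ([1, 2, 3, 4], [2, 4, 6, 8])
valid = ([5, 6], [10, 12])
best_err, best_params, best_model = grid_search(train, valid, {"lam": [0.0, 1.0, 10.0]})
fused = average_predictions([[10.0, 12.0], [12.0, 14.0]])
print(best_params, best_err, fused)
```

Averaging the per-model predictions is the regression analogue of late fusion: each model is tuned on its own, and only the final scores are combined.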

Mahajan 2021

Last edited: August 8, 2025

DOI: 10.3389/fnagi.2021.623607

One-Liner

Trained a bimodal model on speech/text with GRU on speech and CNN-LSTM on text.

Novelty

  • A post-2019 NLP paper that doesn’t use transformers! (they used a CNN-LSTM, so it is faster, lighter, and easier)
  • “Our work sheds light on why the accuracy of these models drops to 72.92% on the ADReSS dataset, whereas, they gave state of the art results on the DementiaBank dataset.”

Notable Methods

Bimodal audio and transcript processing vis-à-vis Shah 2021, but with a CNN-LSTM on the text side and a GRU on the speech side.

Martinc 2021

Last edited: August 8, 2025

DOI: 10.3389/fnagi.2021.642647

One-Liner

Combined bag-of-words on transcript + ADR on audio, fed to various classifiers for AD; ablated BERT’s decision space for attention to enable simpler models in the future.

Novelty

  • Pre-processed each of the two modalities before fusing it (late fusion)
  • Achieved \(93.75\%\) accuracy on AD detection
  • Because the data is force-aligned and fed in with late fusion, one can see which sounds/words the BERT model was focusing on just by looking at the attention over the words

Notable Methods

  • Used classic cookie theft data
  • bag of words to do ADR but for words
  • multimodality but late fusion with one (hot-swappable) classifier
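
A minimal bag-of-words sketch for the transcript side, assuming plain lowercase word counts over a vocabulary built from the transcripts themselves. The tokenizer, helper names, and example sentences are illustrative assumptions; the paper's actual preprocessing and the audio (ADR) side are not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(transcripts):
    """Map each distinct token to a fixed column index (sorted for stability)."""
    vocab = sorted({tok for text in transcripts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(text, vocab):
    """Count vector over the vocabulary; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for tok, count in counts.items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


# Toy transcripts, not data from the paper.
transcripts = ["the boy takes the cookie", "the water is overflowing"]
vocab = build_vocabulary(transcripts)
first_vector = bag_of_words(transcripts[0], vocab)
print(first_vector)
```

The resulting fixed-length vector can be handed to any classifier, which is what makes the classifier hot-swappable.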

Key Figs

How they did it

This is how they combined the force-aligned (:tada:) audio and transcript.