_index.org

Medical Dialogue Generation

Last edited: August 8, 2025

RAG for generic risk conversations.

  1. tag transcripts with the relevant themes, à la action research
  2. use Llama 2 to embed the given information
  3. then send the overall information to a larger language model
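The three steps above can be sketched as a minimal retrieval loop. This is a stdlib-only sketch: `embed` is a toy stand-in for the Llama 2 embedding model, and the theme tags and transcripts are illustrative, not from the real data.

```python
import math

# Hypothetical stand-in for a Llama-2-style embedding model: a toy
# bag-of-letters vector, just so the sketch runs end to end.
def embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: transcripts tagged with themes (tags are illustrative).
transcripts = [
    {"text": "patient asks about smoking risk", "themes": ["risk", "smoking"]},
    {"text": "discussion of family history", "themes": ["history"]},
]

# Step 2: embed each transcript once, up front.
for t in transcripts:
    t["vec"] = embed(t["text"])

def retrieve(query, k=1):
    qv = embed(query)
    ranked = sorted(transcripts, key=lambda t: cosine(qv, t["vec"]), reverse=True)
    return ranked[:k]

# Step 3: the retrieved, theme-tagged context is what gets sent to the
# larger language model as grounding.
context = retrieve("what are the risks of smoking?")
prompt = "Context: " + context[0]["text"] + "\nAnswer the patient's question."
```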

Raters generally preferred grounded GPT responses over human responses, though on 2 of the rated features the non-grounded responses were preferred.

Medical Knowledge Extraction

Last edited: August 8, 2025

SNOMED CT: large medical ontology

  1. created pairwise distance matrix on SNOMED CT
  2. created weighted graphs using the SNOMED CT information
  3. Node2Vec! Suddenly you have an embedding for each disease.
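A toy, stdlib-only version of the pipeline above. The distance matrix, edge weighting, and walk parameters are made up for illustration; the real pipeline uses SNOMED CT distances and node2vec proper (which also biases walks with return/in-out parameters p and q).

```python
import random

# Toy pairwise distance matrix between 4 "diseases" (illustrative numbers,
# standing in for ontology distances computed over SNOMED CT).
diseases = ["diabetes", "hypertension", "asthma", "copd"]
dist = [
    [0.0, 1.0, 3.0, 3.0],
    [1.0, 0.0, 3.0, 2.0],
    [3.0, 3.0, 0.0, 1.0],
    [3.0, 2.0, 1.0, 0.0],
]

# Step 2: distances -> edge weights (closer concepts get heavier edges).
graph = {d: {} for d in diseases}
for i, a in enumerate(diseases):
    for j, b in enumerate(diseases):
        if i != j:
            graph[a][b] = 1.0 / (1.0 + dist[i][j])

def weighted_walk(start, length, rng):
    """One weighted random walk: the corpus-building step of node2vec."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = list(graph[walk[-1]].items())
        nodes = [n for n, _ in nbrs]
        weights = [w for _, w in nbrs]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# Step 3: these walks would then be fed to a skip-gram model (word2vec)
# to produce one embedding vector per disease.
rng = random.Random(0)
walks = [weighted_walk(d, 5, rng) for d in diseases for _ in range(10)]
```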

Two Tasks:

Patient Disease Embeddings

Using node2vec embeddings built on SNOMED CT

Similar Patient Retrieval

Reveal hidden co-morbidities via Jaccard Coefficient
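Sketch of the Jaccard-based retrieval, assuming each patient is represented as a set of disease codes (the patients and codes here are illustrative):

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Illustrative patients as sets of (made-up) disease labels.
patients = {
    "p1": {"diabetes", "hypertension", "ckd"},
    "p2": {"diabetes", "hypertension"},
    "p3": {"asthma"},
}

def most_similar(query_id):
    """Retrieve the most similar other patient; the retrieved patient's
    extra conditions are candidate hidden co-morbidities for the query."""
    others = [p for p in patients if p != query_id]
    return max(others, key=lambda p: jaccard(patients[query_id], patients[p]))
```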

meeting 7/25

Last edited: August 8, 2025

meeting 8/1

Last edited: August 8, 2025

Updates

Yay, MEND works!

.mean() vs. .sum() for the dW maps?
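The practical difference between the two reductions: `.sum()` grows with the number of elements while `.mean()` divides that factor out, so the resulting update magnitudes differ by a factor of N (plain-Python sketch; which reduction is right for the dW maps depends on whether scale should track the number of elements):

```python
# Toy per-element values standing in for entries of a dW map.
vals = [0.5, 1.5, 2.0, 4.0]

total = sum(vals)            # .sum(): scales with len(vals)
mean = total / len(vals)     # .mean(): invariant to len(vals)

# The two reductions differ by exactly a factor of N = len(vals),
# so any downstream learning rate effectively absorbs that factor.
ratio = total / mean
```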

PPL Isn’t the Only Possible Metric

even if our model has better PPL, it's worse at SQuAD than Facebook's (granted, it's been trained a lot less); will re-run with the new pretraining model (expect that no dropout will be better; see paper above).
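For reference, perplexity is just the exponentiated mean negative log-likelihood, which is why two models can rank one way on PPL and another way on a downstream task like SQuAD. Stdlib-only sketch; the per-token probabilities are made up:

```python
import math

def perplexity(token_probs):
    """PPL = exp(mean negative log-likelihood over the tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Made-up per-token probabilities for two models on the same text.
model_a = [0.5, 0.25, 0.5, 0.25]
model_b = [0.4, 0.4, 0.4, 0.4]

# model_b has lower (better) PPL here, but that says nothing about
# how either model transfers to an extractive QA task.
```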

Meghanani 2021

Last edited: August 8, 2025

DOI: 10.3389/fcomp.2021.624558

One-Liner

analyzed spontaneous speech transcripts (only!) from TD and AD patients with fastText and CNN; best was \(83.33\%\) acc.

Novelty

  • threw the NLP kitchen sink at the transcripts
    • fastText
    • CNN (with varying n-gram kernel sizes: 2, 3, 4, 5)

Notable Methods

  • embeddings seeded with GloVe
  • fastText is much faster, but the CNN won out
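A minimal, stdlib-only sketch of the multi-width n-gram convolution idea (kernel sizes 2-5 with max-over-time pooling). The "filter" here just averages each window of toy scalar features; the real model applies learned filters over GloVe-seeded word embeddings.

```python
def ngram_conv_features(seq, widths=(2, 3, 4, 5)):
    """For each kernel width, slide a window over the sequence and
    max-pool the window scores (toy stand-in for a learned filter:
    here the 'filter' simply averages the window)."""
    feats = []
    for w in widths:
        scores = [sum(seq[i:i + w]) / w for i in range(len(seq) - w + 1)]
        feats.append(max(scores))  # max-over-time pooling
    return feats

# Toy per-token scalar features (standing in for embedded tokens).
tokens = [0.1, 0.9, 0.8, 0.2, 0.3, 0.7]
features = ngram_conv_features(tokens)  # one pooled value per kernel width
```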

Key Figs

the qual results

PAR (participant), INV (investigator)

Notes

Hey look a review of the field: