Houjun Liu

SU-CS224N APR022024

Why Language

  • language, first, allows communication (which allowed us to take over the world)
  • language allows humans to achieve higher level thoughts (it scaffolds detailed planning)
  • language is also a flexible system which allows arbitrarily precise communication

“The common misconception is that language use has to do with words and what they mean; instead, language use has to do with people and what they mean.”

Timeline of Development

2014 - Neural Machine Translation

Deep Google Translate allows wider communication and understanding

2018 - Free-Text QA

Next generation search: actual sentence search instead of keyword matching.

For instance, YONO (Lee et al 2021)

2019 - GPT

Autoregression! Conditioning on previous material, generate a single next word.

2022+ - ChatGPT+

Dialogue-based systems (question and answer instead of completion).

Foundation Model

See foundational model


denotational semantics

symbol <=> signified

the meaning of "tree" is a map to the set of all trees in the world

This is pretty much useless.

localist representation

a localist representation means that each activation represents exactly one meaning; i.e. “one-hot”.

one-hot representation

  • each word is a discrete symbol — localist representation
  • each word is represented one-hot over space of all words
  • dimensionality would be huge (the size of the entire vocabulary)

Main Problem: one-hot vectors carry no sense of the meaning of words; the difference between "hotel" and "motel" is the same as that between "house" and "chair" (all distinct pairs are orthogonal).
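A minimal sketch of the problem above (toy four-word vocabulary, purely illustrative): every pair of distinct one-hot vectors has dot product zero, so no similarity structure is captured.

```python
import numpy as np

# Toy vocabulary; a real vocabulary has hundreds of thousands of words.
vocab = ["hotel", "motel", "house", "chair"]

def one_hot(word):
    """Localist representation: a single 1 in the word's slot, 0 elsewhere."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

# "hotel" is exactly as (dis)similar to "motel" as it is to "chair":
print(one_hot("hotel") @ one_hot("motel"))  # 0.0
print(one_hot("hotel") @ one_hot("chair"))  # 0.0
print(one_hot("hotel") @ one_hot("hotel"))  # 1.0
```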

distributional semantics

We model a symbol based on the distribution of contexts in which it appears in language, instead of the quantified things that it symbolizes.

KEY IDEA: "You shall know a word by the company it keeps." (Firth, 1957)


WordNet

Try 1: a hand-curated database of words grouped into synonym sets. For better attempts, see word vectors.

  • Problems with WordNet

    • “proficient” is a synonym for “good” — missing nuance (i.e. WordNet misses contextual dependence)
    • WordNet lists offensive synonyms without connotations or dangers

word vectors

A word's meaning is represented by the context it lives in. This allows us to create embeddings, which measure the similarity between words via their dot product. We call embeddings "embeddings" because we are embedding each word as a point in an \(n\)-dimensional space.

We can compute these things via Word2Vec.
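A minimal sketch of the dot-product similarity described above, using made-up 4-dimensional vectors (real Word2Vec embeddings are learned from a corpus and are typically 100-300 dimensional); cosine similarity is the length-normalized dot product:

```python
import numpy as np

# Hypothetical embeddings, chosen by hand for illustration.
emb = {
    "hotel": np.array([0.9, 0.1, 0.0, 0.3]),
    "motel": np.array([0.8, 0.2, 0.1, 0.3]),
    "chair": np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(u, v):
    """Dot product normalized by vector lengths."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Unlike one-hot vectors, embeddings give graded similarity:
print(cosine(emb["hotel"], emb["motel"]))  # high
print(cosine(emb["hotel"], emb["chair"]))  # low
```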


See word2vec.