
SU-CS205L JAN282025

Last edited: August 8, 2025

Line Search and Steepest Descent

Gram-Schmidt For Matrix Orthogonality

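You can use Gram-Schmidt to make a set of vectors orthogonal with respect to a matrix. In particular, for a series of vectors \(s^{(j)}\) and a matrix \(A\), using the \(A\)-inner product \(\langle u, v \rangle_{A} = u^{T} A v\):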

\begin{equation} s^{(q)} = s^{(q)} - \sum_{q'=1}^{q-1} \frac{\langle s^{(q)}, s^{(q')} \rangle_{A}}{\langle s^{(q')}, s^{(q')} \rangle_{A}}s^{(q')} \end{equation}
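The projection formula above can be sketched in NumPy. This is a minimal illustration, not the course's implementation: the function name `a_orthogonalize` is hypothetical, and it assumes \(A\) is symmetric positive definite and the input vectors (the columns of `S`) are linearly independent.

```python
import numpy as np

def a_orthogonalize(S, A):
    """Gram-Schmidt in the A-inner product <u, v>_A = u^T A v.

    S: (n, k) array whose columns are the input vectors s^(1..k).
    A: (n, n) symmetric positive definite matrix (assumed).
    Returns an (n, k) array whose columns are mutually A-orthogonal.
    """
    S = np.array(S, dtype=float)
    out = np.zeros_like(S)
    for q in range(S.shape[1]):
        s = S[:, q].copy()
        for p in range(q):
            prev = out[:, p]
            # Subtract the A-projection of s^(q) onto each earlier direction.
            s -= (S[:, q] @ A @ prev) / (prev @ A @ prev) * prev
        out[:, q] = s
    return out
```

If `D = a_orthogonalize(S, A)`, then `D.T @ A @ D` should be diagonal up to floating-point error, which is a quick way to check the result.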

For Conjugate Gradient, only one of these inner products turns out to be non-zero, so we can write:

\begin{equation} s^{(q)} = r^{(q)} + \frac{r^{(q)}\cdot r^{(q)}}{r^{(q-1)}\cdot r^{(q-1)}} s^{(q-1)} \end{equation}

for residual \(r^{(q)}\), and
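A minimal sketch of Conjugate Gradient built around the direction update above. Several pieces here are assumptions that the excerpt does not state: the residual is taken to be \(r = b - Ax\) for a symmetric positive definite system \(Ax = b\), the step size `alpha` is the standard exact line search for that quadratic, and the function name `conjugate_gradient` is hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A (assumed)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x          # residual (assumed definition: r = b - A x)
    s = r.copy()           # first search direction is the residual
    rr = r @ r
    steps = n if max_iter is None else max_iter
    for _ in range(steps):
        if np.sqrt(rr) < tol:
            break
        As = A @ s
        alpha = rr / (s @ As)        # exact line search along s (assumed)
        x = x + alpha * s
        r = r - alpha * As
        rr_new = r @ r
        # Direction update from the text:
        # s^(q) = r^(q) + (r^(q).r^(q)) / (r^(q-1).r^(q-1)) * s^(q-1)
        s = r + (rr_new / rr) * s
        rr = rr_new
    return x
```

For a small test system, `np.allclose(A @ conjugate_gradient(A, b), b)` should hold; in exact arithmetic the loop needs at most `n` steps.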

SU-CS205L Quiz 2/10

Last edited: August 8, 2025

It’s a bad idea when

SU-CS205L Quiz 3/3

Last edited: August 8, 2025

SU-CS224N APR022024

Last edited: August 8, 2025

Why Language

  • language, first, allows communication (which allowed us to take over the world)
  • language allows humans to achieve higher-level thought (it scaffolds detailed planning)
  • language is also a flexible system that allows variably precise communication

“The common misconception is that language use has to do with words and what they mean; instead, language use has to do with people and what they mean.”

Timeline of Development

2014 - Neural Machine Translation

Deep-learning-based Google Translate allows wider communication and understanding

SU-CS224N APR042024

Last edited: August 8, 2025

stochastic gradient descent

See stochastic gradient descent

Word2Vec

see word2vec

Or, we can even use a simpler approach: window-based co-occurrence.
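A minimal sketch of window-based co-occurrence counting, assuming a pre-tokenized corpus and a symmetric window. The function name `cooccurrence_counts` and the `window` parameter are hypothetical choices for illustration.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count (center, context) pairs within `window` positions of each token."""
    counts = Counter()
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                counts[(center, tokens[j])] += 1
    return counts

tokens = "i like deep learning".split()
counts = cooccurrence_counts(tokens, window=1)
```

With `window=1`, only adjacent words are counted, so `("like", "deep")` gets a count while `("i", "deep")` does not.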

GloVe

  • goal: we want to correctly capture linear meaning components in a word vector space
  • insight: ratios of co-occurrence probabilities encode linear meaning components

Therefore, GloVe vectors come from a log-bilinear model:

\begin{equation} w_{i} \cdot w_{j} = \log P(i|j) \end{equation}

such that:

\begin{equation} w_{x} \cdot (w_{a} - w_{b}) = \log \frac{P(x|a)}{P(x|b)} \end{equation}
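The second identity follows from the first by linearity of the dot product. As a short worked step, assuming the log-bilinear relation holds exactly, take \(i = x\) with \(j = a\) and \(j = b\):

\begin{equation} w_{x} \cdot (w_{a} - w_{b}) = w_{x} \cdot w_{a} - w_{x} \cdot w_{b} = \log P(x|a) - \log P(x|b) = \log \frac{P(x|a)}{P(x|b)} \end{equation}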

Evaluating an NLP System

Intrinsic

  • evaluate on the specific target task the system is trained on
  • evaluate speed
  • evaluate understandability

Extrinsic

  • real task + attempt to replace older system with new system
  • may be expensive to compute

Word Sense Ambiguity

Each word may have multiple different meanings; each of those separate word senses should live in a different place in the vector space. However, polysemous words have related senses, so we usually average: