SU-CS205L JAN282025
Last edited: August 8, 2025
Line Search and Steepest Descent
Gram-Schmidt For Matrix Orthogonality
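You can use Gram-Schmidt to make a set of vectors orthogonal with respect to a matrix \(A\) (that is, A-orthogonal). In particular, for a series of vectors \(s^{(j)}\), with the A-inner product \(\langle u, v \rangle_{A} = u^{\top} A v\):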
\begin{equation} s^{(q)} = s^{(q)} - \sum_{q'=1}^{q-1} \frac{\langle s^{(q)}, s^{(q')} \rangle_{A}}{\langle s^{(q')}, s^{(q')} \rangle_{A}} s^{(q')} \end{equation}
for Conjugate Gradient, it works out that only one of these inner products is non-zero, so we can write:
\begin{equation} s^{(q)} = r^{(q)} + \frac{r^{(q)}\cdot r^{(q)}}{r^{(q-1)}\cdot r^{(q-1)}} s^{(q-1)} \end{equation}
for residual \(r^{(q)}\), and
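As a rough illustration of the two updates above, here is a minimal NumPy sketch. It assumes \(A\) is symmetric positive definite; the function names (`a_inner`, `a_orthogonalize`, `conjugate_gradient`) and the small test matrix are hypothetical choices for this example, not part of the course material.

```python
import numpy as np

def a_inner(u, v, A):
    """A-inner product <u, v>_A = u^T A v (assumes A is symmetric positive definite)."""
    return u @ A @ v

def a_orthogonalize(S, A):
    """Gram-Schmidt in the A-inner product; the columns of S are the input vectors."""
    S = np.array(S, dtype=float)
    out = np.zeros_like(S)
    for q in range(S.shape[1]):
        s = S[:, q].copy()
        for p in range(q):
            # Subtract the A-projection of the original s^(q) onto each earlier direction.
            coeff = a_inner(S[:, q], out[:, p], A) / a_inner(out[:, p], out[:, p], A)
            s -= coeff * out[:, p]
        out[:, q] = s
    return out

def conjugate_gradient(A, b, tol=1e-10):
    """Solve A x = b, building each search direction from the previous one only."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x          # residual r^(q)
    s = r.copy()           # first search direction is the residual
    directions = []
    for _ in range(len(b)):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        As = A @ s
        alpha = rr / (s @ As)      # exact line search along s
        x = x + alpha * s
        r = r - alpha * As
        directions.append(s.copy())
        beta = (r @ r) / rr        # the single surviving Gram-Schmidt coefficient
        s = r + beta * s
    return x, np.array(directions).T

# Toy symmetric positive definite system (illustrative values only).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, D = conjugate_gradient(A, b)
```

The columns of `D` are the search directions; `D.T @ A @ D` should come out numerically diagonal, which is the A-orthogonality that the full Gram-Schmidt sum would otherwise enforce.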
SU-CS205L Quiz 2/10
Last edited: August 8, 2025
It’s a bad idea when
SU-CS205L Quiz 3/3
Last edited: August 8, 2025
SU-CS224N APR022024
Last edited: August 8, 2025
Why Language
- language, first, allows communication (which allowed us to take over the world)
- language allows humans to achieve higher level thoughts (it scaffolds detailed planning)
- language is also a flexible system which allows variably precise communication
“The common misconception is that language use has to do with words and what they mean; instead, language use has to do with people and what they mean.”
Timeline of Development
2014 - Neural Machine Translation
Deep Google Translate allows wider communication and understanding
SU-CS224N APR042024
Last edited: August 8, 2025
stochastic gradient descent
See stochastic gradient descent
Word2Vec
see word2vec
Or, we can use an even simpler approach: window-based co-occurrence counts.
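A minimal sketch of window-based co-occurrence counting, assuming a symmetric window and whitespace tokenization. The function name `cooccurrence_counts`, the `window` parameter, and the toy corpus are hypothetical choices for illustration.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered pair (center, context) appears within `window` positions."""
    counts = Counter()
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                counts[(center, tokens[j])] += 1
    return counts

# Toy corpus (illustrative only), treated as one token stream.
tokens = "i like deep learning i like nlp i enjoy flying".split()
counts = cooccurrence_counts(tokens, window=1)
```

Because the window is symmetric, `counts[(a, b)]` equals `counts[(b, a)]`; each row of the resulting count matrix can serve as a (sparse, high-dimensional) word vector.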
GloVe
- goal: we want to correctly capture linear meaning components in a word vector space
- insight: ratios of co-occurrence probabilities encode linear meaning components
Therefore, GloVe vectors come from a log-bilinear model:
\begin{equation} w_{i} \cdot w_{j} = \log P(i|j) \end{equation}
such that:
\begin{equation} w_{x} \cdot (w_{a} - w_{b}) = \log \frac{P(x|a)}{P(x|b)} \end{equation}
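The second identity follows from the first by linearity of the dot product. A short derivation, assuming the log-bilinear model holds exactly:
\begin{equation} w_{x} \cdot (w_{a} - w_{b}) = w_{x} \cdot w_{a} - w_{x} \cdot w_{b} = \log P(x|a) - \log P(x|b) = \log \frac{P(x|a)}{P(x|b)} \end{equation}
So the difference vector \(w_{a} - w_{b}\) acts as a direction in the space: a probe word \(x\) that co-occurs much more with \(a\) than with \(b\) has a large positive projection onto it, and a word that co-occurs about equally with both has a projection near zero.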
Evaluating an NLP System
Intrinsic
- evaluate on the specific target task the system is trained on
- evaluate speed
- evaluate understandability
Extrinsic
- real task + attempt to replace older system with new system
- may be expensive to compute
Word Sense Ambiguity
Each word may have multiple meanings, and each separate word sense should live in a different place in the vector space. However, polysemous words have related senses, so we usually average:
