vector semantics is a method for encoding word meaning (sense).
“the meaning of a word should be tied to how it is used”
we measure similarity between word vectors with cosine similarity. see also vector-space model.
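a minimal sketch of cosine similarity between word vectors (the embedding values below are made-up numbers, just for illustration):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """cosine of the angle between two vectors: u·v / (|u| |v|)"""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# hypothetical 4-dimensional embeddings
cat = np.array([0.2, 0.8, 0.1, 0.4])
dog = np.array([0.3, 0.7, 0.2, 0.5])
car = np.array([0.9, 0.1, 0.8, 0.0])

print(cosine_similarity(cat, dog))  # high: similar words point in similar directions
print(cosine_similarity(cat, car))  # lower: dissimilar words
```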
motivation
idea 1
neighboring words can help infer the semantic meaning of new words: “we can define a word based on its distribution in language use”
idea 2
meaning should be a point in space, just like affective meaning (i.e. a score in each dimension).
that is: a word should be a vector in n-dimensional space
vector semantics
Each word is a point located by its distribution: each word is a vector, and similar words are nearby in semantic space.
The intuition is that classifiers can generalize to similar, but unseen words more easily by processing embeddings.
transposing a Term-Document Matrix
Typically we read a Term-Document Matrix column-wise: each column encodes a document in terms of the words it contains.
However, if you read it row-wise, each row gives a word's distribution over the documents.
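a toy term-document count matrix built by hand (the corpus here is an assumption, purely for illustration), showing the column-wise vs row-wise readings:

```python
import numpy as np
from collections import Counter

docs = {
    "doc1": "the battle of the kings",
    "doc2": "the fool and the king",
    "doc3": "wit and wisdom of the fool",
}
vocab = sorted({w for text in docs.values() for w in text.split()})

# rows = terms, columns = documents
M = np.array([[Counter(text.split())[w] for text in docs.values()] for w in vocab])

# column-wise: a document encoded as word counts
print(dict(zip(vocab, M[:, 0])))
# row-wise: a word's distribution over the documents
print(dict(zip(docs, M[vocab.index("the")])))
```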
term-term matrix
a term-term matrix is a \(|V| \times |V|\) matrix that measures co-occurrence in some context. So each cell would be the number of times the two words co-occur in some small window.
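a sketch of counting co-occurrences for a term-term matrix with a small window (the window size and the sentence are arbitrary choices):

```python
from collections import defaultdict

def term_term_counts(tokens, window=2):
    """count, for each word, how often every other word appears within ±window tokens"""
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

tokens = "the quick brown fox jumps over the lazy dog".split()
cooc = term_term_counts(tokens, window=2)
print(dict(cooc["fox"]))  # neighbors of "fox" within the window
```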
point-wise mutual information
we usually normalize a Term-Document Matrix via TF-IDF. However, for a term-term matrix, we usually normalize it as:
\begin{equation} PMI(w_1, w_2) = \log \frac{p(w_1,w_2)}{p(w_1)p(w_2)} \end{equation}
“do the two words appear together more often than chance?”
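a sketch of PMI (and PPMI, which clips negative values to zero, since negative PMI estimates are unreliable with sparse counts) computed from a term-term count matrix; the counts below are invented:

```python
import numpy as np

def pmi(C, positive=True):
    """PMI(w1, w2) = log p(w1, w2) / (p(w1) p(w2)), from a co-occurrence count matrix C"""
    total = C.sum()
    p_joint = C / total                            # p(w1, w2)
    p_w1 = C.sum(axis=1, keepdims=True) / total    # p(w1)
    p_w2 = C.sum(axis=0, keepdims=True) / total    # p(w2)
    with np.errstate(divide="ignore", invalid="ignore"):
        M = np.log(p_joint / (p_w1 * p_w2))
    M[~np.isfinite(M)] = 0.0                       # zero out cells with no co-occurrence
    return np.maximum(M, 0.0) if positive else M

# toy symmetric co-occurrence counts (made up)
C = np.array([[0., 4., 1.],
              [4., 0., 2.],
              [1., 2., 0.]])
print(pmi(C))
```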
word2vec
see word2vec