vector semantics is a method for encoding word senses as vectors.

“the meaning of a word should be tied to how it is used”

we measure similarity between word vectors with cosine similarity. see also vector-space model.
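as a minimal sketch of the cosine measure (plain Python, no libraries assumed): it is the dot product of the two vectors divided by the product of their lengths, so it depends only on direction, not magnitude.

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# vectors pointing the same way score 1.0, orthogonal vectors score 0.0
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # -> 1.0
print(cosine_similarity([1, 0], [0, 1]))        # -> 0.0
```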

## motivation

### idea 1

neighboring words can help infer semantic meaning of new words: “we can define a word based on its distribution in language use”

### idea 2

meaning should be a point in space, just like affective meaning (i.e. a score along each dimension).

that is: a word should be a vector in \(n\)-dimensional space

## vector semantics

Each word is a point whose position is determined by its distribution; each word is a vector, and similar words are nearby in semantic space.

The intuition is that classifiers can generalize to similar, but unseen words more easily by processing embeddings.

## transposing a Term-Document Matrix

Typically we read a Term-Document Matrix column-wise: each column encodes a document in terms of the words it contains.

However, if you read it row-wise, each row gives a word's distribution over the documents.
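a toy illustration of the two readings (the words, document names, and counts here are invented for the example):

```python
# toy term-document matrix: rows = words, columns = documents
words = ["battle", "fool", "good"]
docs = ["doc1", "doc2", "doc3"]
M = [
    [1, 7, 13],    # battle
    [36, 1, 4],    # fool
    [114, 62, 89], # good
]

# column-wise reading: a document as a vector over words
doc2 = [row[1] for row in M]  # [7, 1, 62]

# row-wise reading: a word as a vector over documents
fool = M[1]                   # [36, 1, 4]
```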

## term-term matrix

a term-term matrix is a \(|V| \times |V|\) matrix that measures co-occurrence in some context: each cell counts the number of times the two words co-occur within some small window.
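a sketch of how those cell counts could be collected from a token stream, using a symmetric \(\pm\)`window` context (the function name and window size are illustrative choices, not from any particular library):

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count (word, context-word) pairs within +/- `window` positions."""
    counts = defaultdict(int)
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the word itself
                counts[(w, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("cat", "sat")])  # -> 1
```

materializing these counts as a dense \(|V| \times |V|\) array is straightforward once the vocabulary is fixed, but the dictionary form is usually enough for computing PMI.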

### point-wise mutual information

we usually normalize a Term-Document Matrix via TF-IDF. However, for a term-term matrix, we usually normalize it as:

\begin{equation} PMI(w_1, w_2) = \log \frac{p(w_1,w_2)}{p(w_1)p(w_2)} \end{equation}

“do the two words appear together more often than chance?”
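the formula above can be computed directly from raw pair counts by estimating the joint and marginal probabilities with relative frequencies. this is a minimal sketch (no smoothing, and it assumes the pair actually occurs, since \(\log 0\) is undefined); the base-2 log is a common convention:

```python
import math

def pmi(pair_counts, w1, w2):
    """PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ), estimated from counts."""
    total = sum(pair_counts.values())
    # marginals: how often w1 appears as the first element, w2 as the second
    c1 = sum(c for (a, _), c in pair_counts.items() if a == w1)
    c2 = sum(c for (_, b), c in pair_counts.items() if b == w2)
    p_joint = pair_counts[(w1, w2)] / total
    return math.log2(p_joint / ((c1 / total) * (c2 / total)))

# invented counts for illustration
pairs = {("new", "york"): 10, ("new", "car"): 5,
         ("old", "york"): 1, ("old", "car"): 4}
print(pmi(pairs, "new", "york") > 0)  # positive: co-occur more often than chance
```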

## word2vec

see word2vec