Posts

corpus

Last edited: August 8, 2025

Usually we use \(N\) to denote the number of tokens, and \(V\) the “vocab”, or set of word types; \(|V|\) is then the vocab size.
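As a quick illustration of tokens versus types, here is a minimal sketch that counts \(N\) and \(|V|\) for a toy string. The function name `token_and_type_counts` is made up for this note, and lowercasing plus whitespace splitting is an assumed stand-in for a real tokenizer.

```python
def token_and_type_counts(text):
    """Return (N, |V|): the number of tokens and the number of distinct word types.

    Assumption: tokens are lowercased, whitespace-separated strings.
    A real corpus would need a proper tokenizer.
    """
    tokens = text.lower().split()
    return len(tokens), len(set(tokens))


n_tokens, vocab_size = token_and_type_counts("the cat sat on the mat the end")
print(n_tokens, vocab_size)  # "the" counts as three tokens but only one type
```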

Corpora are usually considered in the context of:

  • specific writers
  • at a specific time
  • for specific varieties
  • of specific languages
  • for a specific function

Particularly hard: code switching, gender, demographics, variety, etc.

Herdan’s Law

\begin{equation} |V| = kN^{\beta} \end{equation}

where \(k\) is a positive constant and \(\beta\) is a constant that typically falls in the range \(0.67 < \beta < 0.75\).

Because \(\beta < 1\), the vocab size grows sublinearly with the number of tokens: faster than the square root of \(N\), but slower than \(N\) itself.
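To make the growth rate concrete, here is a small sketch that evaluates Herdan’s Law for a few corpus sizes. The values \(k = 40\) and \(\beta = 0.7\) are illustrative assumptions rather than fitted constants, and `herdan_vocab_size` is a name invented for this note.

```python
def herdan_vocab_size(n_tokens, k, beta):
    """Estimate the vocab size |V| = k * N**beta for a corpus of N tokens."""
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return k * n_tokens ** beta


# Assumed constants, for illustration only; real values depend on the corpus.
K, BETA = 40.0, 0.7

for n in (10_000, 100_000, 1_000_000):
    print(n, round(herdan_vocab_size(n, K, BETA)))

# Multiplying N by 10 multiplies |V| by 10**beta, which is less than 10.
growth = herdan_vocab_size(100_000, K, BETA) / herdan_vocab_size(10_000, K, BETA)
```

With these assumed constants, every tenfold increase in tokens grows the estimated vocab by a factor of about \(10^{0.7} \approx 5\).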

cortex

Last edited: August 8, 2025

cost function

Last edited: August 8, 2025

A cost function \(J\) measures how poorly our model fits the training data: the lower the cost, the better the fit.

additional information

least-squares error

\begin{equation} J\qty(\theta) = \sum_{i=1}^{n}\qty(h_{\theta }\qty(x^{(i)}) - y^{(i)})^{2} \end{equation}
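The sum above can be sketched directly in code. This is a minimal version, assuming a linear hypothesis \(h_{\theta}(x) = \theta_{0} + \theta_{1}x\); the helper `make_linear_hypothesis` and the toy data are hypothetical and only there to exercise the formula.

```python
def least_squares_cost(h, xs, ys):
    """Compute J = sum_i (h(x_i) - y_i)**2 over paired samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys))


def make_linear_hypothesis(theta0, theta1):
    """Hypothetical helper: build h_theta(x) = theta0 + theta1 * x."""
    return lambda x: theta0 + theta1 * x


# Toy data generated by y = 1 + 2x.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

perfect = least_squares_cost(make_linear_hypothesis(1.0, 2.0), xs, ys)
worse = least_squares_cost(make_linear_hypothesis(0.0, 2.0), xs, ys)
print(perfect, worse)  # the exact fit has zero cost; the shifted line does not
```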

see also example: gradient descent for least-squares error

Coulomb's Law

Last edited: August 8, 2025

Coulomb’s law is a principle that describes the force that two charged particles exert on each other.

constituents

  • \(k\), Coulomb’s Constant, found roughly to be \(9 \times 10^{9} \frac{N m^{2}}{C^{2}}\)
  • \(q_{1}\) and \(q_{2}\), the charges of the two particles you are analyzing
  • \(r\), distance between particles

requirements

\begin{equation} \vec{F_{E}} = k \frac{q_1q_2}{r^{2}} \end{equation}

additional information

interpreting signs on \(F_{E}\)

  • negative: attraction force between charges (the points have opposite-signed charges, and so attract)
  • positive: repulsion force between charges (the points have same-signed charges, and so repel)
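The sign convention above can be checked numerically. Below is a minimal sketch that treats the force as a signed scalar and uses the rough value of \(k\) from this note; the function names and the example charges are assumptions made for illustration.

```python
COULOMB_CONSTANT = 9e9  # N m^2 / C^2, the rough value used in this note


def coulomb_force(q1, q2, r):
    """Signed Coulomb force k * q1 * q2 / r**2 (charges in C, r in m)."""
    if r <= 0:
        raise ValueError("distance r must be positive")
    return COULOMB_CONSTANT * q1 * q2 / r ** 2


def interpret_sign(force):
    """Map the sign of the force to attraction or repulsion."""
    if force < 0:
        return "attraction"
    if force > 0:
        return "repulsion"
    return "no force"


# Two 1 microcoulomb charges placed 10 cm apart (assumed example values).
opposite = coulomb_force(1e-6, -1e-6, 0.1)
same = coulomb_force(1e-6, 1e-6, 0.1)
print(interpret_sign(opposite), interpret_sign(same))
```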

alternative formulation of Coulomb’s Law

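The law is often restated in terms of the permittivity of free space \(\epsilon_{0}\), where \(k = \frac{1}{4\pi \epsilon_{0}}\):

\begin{equation} \vec{F_{E}} = \frac{1}{4\pi \epsilon_{0}} \frac{q_1q_2}{r^{2}} \end{equation}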

counterfactual

Last edited: August 8, 2025

“if thing didn’t happen would I have…”