corpus
Last edited: August 8, 2025
Usually we use \(N\) to denote the number of tokens, and \(V\) the “vocab”, or set of word types.
Corpora are usually considered in the context of:
- specific writers
- at a specific time
- for specific varieties
- of specific languages
- for a specific function
Particularly hard: code switching, gender, demographics, variety, etc.
Herdan’s Law
\begin{equation} |V| = kN^{\beta} \end{equation}
where \(k\) is a positive constant and \(\beta\) is a constant with \(0.67 < \beta < 0.75\).
The vocab size grows as a power of the number of tokens: more slowly than linearly, but faster than the square root of \(N\).
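A minimal sketch of how \(N\), \(|V|\), and Herdan's Law relate in practice. Whitespace tokenization and the values `k = 40` and `beta = 0.7` are illustrative assumptions, not values from these notes; real corpora need a proper tokenizer and fitted constants.

```python
def count_tokens_and_types(text: str) -> tuple[int, int]:
    """Return (N, |V|): the token count and the number of distinct word types.

    Assumes naive whitespace tokenization, for illustration only.
    """
    tokens = text.split()
    return len(tokens), len(set(tokens))


def herdan_vocab_size(n_tokens: int, k: float, beta: float) -> float:
    """Predicted vocabulary size |V| = k * N**beta."""
    return k * n_tokens ** beta


n, v = count_tokens_and_types("the cat saw the dog")

# Hypothetical constants: k and beta must be fitted per corpus.
small = herdan_vocab_size(1_000_000, k=40.0, beta=0.7)
large = herdan_vocab_size(10_000_000, k=40.0, beta=0.7)

# Ten times more tokens gives 10**beta times more types, not ten times more.
growth = large / small
```

With any \(\beta\) in the stated range, `growth` falls between \(\sqrt{10}\) and \(10\), which is the sublinear behavior the law describes.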
cortex
Last edited: August 8, 2025
cost function
Last edited: August 8, 2025
A cost function \(J\) measures how far a model's predictions are from the training targets; a lower value means a better fit.
additional information
least-squares error
\begin{equation} J\qty(\theta) = \sum_{i=1}^{n}\qty(h_{\theta }\qty(x^{(i)}) - y^{(i)})^{2} \end{equation}
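The least-squares error above can be sketched in a few lines. The notes do not say what \(h_{\theta}\) is, so this example assumes a linear hypothesis \(h_{\theta}(x) = \theta \cdot x\); the data and the function name `least_squares_cost` are made up for illustration.

```python
def least_squares_cost(theta, xs, ys):
    """J(theta) = sum over i of (h_theta(x_i) - y_i)**2.

    Assumes a linear hypothesis h_theta(x) = theta . x (an assumption,
    since the notes leave h_theta unspecified).
    """
    def h(x):
        return sum(t * xj for t, xj in zip(theta, x))

    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys))


# Toy data generated by y = 1 + 2x; the leading 1.0 in each row is a bias feature.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [1.0, 3.0, 5.0]

perfect = least_squares_cost([1.0, 2.0], xs, ys)  # parameters that fit exactly
worse = least_squares_cost([0.0, 2.0], xs, ys)    # every prediction is off by 1
```

Here `perfect` is zero because the parameters reproduce every target, while `worse` sums three squared errors of size one.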
Coulomb's Law
Last edited: August 8, 2025
Coulomb’s law describes the electrostatic force that two charged particles exert on each other.
constituents
- \(k\), Coulomb’s constant, roughly \(9 \times 10^{9} \frac{N m^{2}}{C^{2}}\)
- \(q_{1,2}\), the charge of the two particles you are analyzing
- \(r\), distance between particles
requirements
\begin{equation} \vec{F_{E}} = k \frac{q_1q_2}{r^{2}} \end{equation}
additional information
interpreting signs on \(F_{E}\)
- negative: attraction force between charges (the points have oppositely signed charges, and so attract)
- positive: repulsion force between charges (the points have the same signed charge, and so repel)
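As a rough sketch of the formula and the sign convention above, the following computes the signed force magnitude for two point charges. The function names and the example charges are hypothetical, and `K` uses the rounded value of Coulomb's constant.

```python
K = 9e9  # Coulomb's constant in N m^2 / C^2 (rounded)


def coulomb_force(q1: float, q2: float, r: float) -> float:
    """Signed force k * q1 * q2 / r**2, with charges in coulombs and r in meters."""
    if r <= 0:
        raise ValueError("distance r must be positive")
    return K * q1 * q2 / r ** 2


def interpret(force: float) -> str:
    """Map the sign of the force to attraction or repulsion."""
    if force < 0:
        return "attraction"
    if force > 0:
        return "repulsion"
    return "no force"


# Two 1 microcoulomb charges, 10 cm apart (illustrative values).
f_opposite = coulomb_force(1e-6, -1e-6, 0.1)  # about -0.9 N
f_same = coulomb_force(1e-6, 1e-6, 0.1)       # about +0.9 N
```

Calling `interpret(f_opposite)` gives `"attraction"` and `interpret(f_same)` gives `"repulsion"`, matching the two cases in the list.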
alternative formulation of Coulomb’s Law
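The law is often restated in terms of the permittivity of free space \(\epsilon_{0}\), using \(k = \frac{1}{4\pi \epsilon_{0}}\):
\begin{equation} \vec{F_{E}} = \frac{1}{4\pi \epsilon_{0}} \frac{q_1q_2}{r^{2}} \end{equation}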
counterfactual
Last edited: August 8, 2025
“if thing didn’t happen would I have…”