Google Nerd Snipe

Last edited: August 8, 2025

f2, f3

e2, e3

e1, e2

b2, b3

c2, d8

group


gradient descent


It’s hard to find a globally optimal solution, so we instead make local progress.

constituents

parameters \(\theta\)
step size \(\alpha\)
cost function \(J\) (and its derivative \(J'\))

requirements

let \(\theta^{(0)} = 0\) (or a random point), and then:

\begin{equation} \theta^{(t+1)} = \theta^{(t)} - \alpha J'\qty(\theta^{(t)}) \end{equation}

“Update each weight by taking a step in the opposite direction of the gradient with respect to that weight.” We stop, btw, when it’s “good enough”: the training data is noisy enough that slightly non-converged optimization is fine.
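A minimal sketch of the update rule, assuming a scalar parameter \(\theta\) for simplicity. The stopping test \(|J'(\theta)| < \text{tol}\) is one assumed way to encode “good enough”, and the names `gradient_descent`, `tol`, and `max_iters` plus the example cost \(J(\theta) = (\theta - 3)^2\) are hypothetical illustrations.

```python
def gradient_descent(dJ, theta0=0.0, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Scalar gradient descent: theta <- theta - alpha * J'(theta).

    dJ is the derivative J'. We stop when |J'(theta)| < tol (our assumed
    "good enough" test) or after max_iters steps, whichever comes first.
    """
    theta = theta0  # theta^(0) = 0 by default
    for _ in range(max_iters):
        grad = dJ(theta)
        if abs(grad) < tol:
            break
        # step in the opposite direction of the gradient, scaled by alpha
        theta = theta - alpha * grad
    return theta


# Hypothetical cost J(theta) = (theta - 3)^2, so J'(theta) = 2 * (theta - 3).
theta_star = gradient_descent(lambda t: 2 * (t - 3))
print(theta_star)  # close to 3.0
```

With \(\alpha = 0.1\) each step shrinks the distance to the minimum by a factor of 0.8 for this cost; a much larger \(\alpha\) (here, above 1) would overshoot and diverge.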

Gram-Schmidt


OMG it’s Gram-Schmidtting!!! Ok so like, orthonormal bases are so nice; don’t you want to make them out of boring-ass ordinary bases? Of course you do.

Suppose \(v_1, \dots, v_{m}\) is a linearly independent list in \(V\). Now let us define some \(e_{1}, \dots, e_{m}\) using the procedure below such that the \(e_{j}\) are orthonormal and, importantly:

\begin{equation} \operatorname{span}(v_1, \dots, v_{m}) = \operatorname{span}(e_{1}, \dots, e_{m}) \end{equation}

The Procedure

We do this process inductively. Let:

\begin{equation} e_1 = \frac{v_1}{\|v_1\|} \end{equation}
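The induction then continues in the standard way: for \(j = 2, \dots, m\), subtract from \(v_j\) its projections onto the earlier vectors and normalize, \(e_j = \frac{v_j - \sum_{k<j} \langle v_j, e_k \rangle e_k}{\|v_j - \sum_{k<j} \langle v_j, e_k \rangle e_k\|}\). Below is a minimal sketch of that procedure, assuming real vectors in \(\mathbb{R}^n\) stored as plain Python lists and the Euclidean dot product as the inner product; the helper names `dot` and `gram_schmidt` and the tolerance `1e-12` are illustrative choices, not part of the original.

```python
import math


def dot(u, v):
    """Euclidean inner product <u, v> on R^n (assumed inner product)."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vs):
    """Turn a linearly independent list vs into an orthonormal list es
    with span(v_1, ..., v_j) == span(e_1, ..., e_j) for every j."""
    es = []
    for v in vs:
        w = list(v)
        # subtract the projection <v, e_k> e_k onto each earlier e_k
        for e in es:
            c = dot(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # illustrative tolerance for floating-point error
            raise ValueError("input list is (numerically) linearly dependent")
        # normalize; for the first vector this is exactly v_1 / ||v_1||
        es.append([wi / norm for wi in w])
    return es


es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

This is the classical variant; in floating point, the “modified” variant (projecting the running `w` instead of the original `v`) tends to keep the result closer to orthonormal.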

grammar


A grammar is a set of logical rules that form a language (defined more precisely in goals of a grammar).

goals of a grammar

explain natural languages in syntax + semantics
have described algebras which can be used to evolve the syntax
…that describe the grammatical operations

The formalism here is that a rigorous grammar should have: