matrix calculus

Last edited: October 10, 2025

Transpose Rules

\(\qty(AB)^{T} = B^{T}A^{T}\)
\(\qty(a^{T}Bc)^{T} = c^{T} B^{T}a\)
\(a^{T}b = b^{T}a\)
\(\qty(A+B)C = AC + BC\)
\(\qty(a+b)^{T}C = a^{T}C + b^{T}C\)
\(AB \neq BA\)
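
As a quick sanity check of the first and last rules, here is a minimal sketch in plain Python. The helper names `transpose` and `matmul` and the two example matrices are made up for illustration; nothing here comes from a particular library.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Arbitrary small integer matrices chosen for illustration.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

lhs = transpose(matmul(A, B))             # (AB)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T

print(lhs == rhs)                         # transposing reverses the product order
print(matmul(A, B) == matmul(B, A))       # these two products differ for this A, B
```

Integer entries keep the comparison exact, so no floating-point tolerance is needed.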

Derivative

Scalar derivative	Vector derivative
\(f\qty(x) \to \pdv{f}{x}\)	\(f\qty(x) \to \pdv{f}{x}\)
\(bx \to b\)	\(x^{T}B \to B\)
\(bx \to b\)	\(x^{T}b \to b\)
\(x^{2} \to 2x\)	\(x^{T}x \to 2x\)
\(bx^{2} \to 2bx\)	\(x^{T}Bx \to \qty(B + B^{T})x\), which is \(2Bx\) for symmetric \(B\)
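
The quadratic-form rule in the last row can be checked numerically. The sketch below compares a central-difference gradient of \(x^{T}Bx\) against \(\qty(B + B^{T})x\), which coincides with \(2Bx\) when \(B\) is symmetric. The function names and the test values are assumptions chosen for illustration.

```python
def quad_form(x, B):
    """Compute x^T B x for a vector x and a square matrix B (lists of floats)."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def analytic_gradient(x, B):
    """(B + B^T) x, which reduces to 2 B x when B is symmetric."""
    n = len(x)
    return [sum((B[i][j] + B[j][i]) * x[j] for j in range(n)) for i in range(n)]


B = [[2.0, 1.0], [1.0, 3.0]]  # symmetric, chosen for illustration
x = [0.5, -1.5]

numeric = numerical_gradient(lambda v: quad_form(v, B), x)
analytic = analytic_gradient(x, B)
two_Bx = [2 * sum(B[i][j] * x[j] for j in range(2)) for i in range(2)]

print(numeric, analytic, two_Bx)
```

For a non-symmetric \(B\), `analytic` still matches `numeric`, but `two_Bx` generally does not.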

Products

\begin{equation} \pdv{AB}{A} = B^{T}, \pdv{AB}{B} = A^{T} \end{equation}

\begin{equation} \pdv{Ax}{A} = x^{T}, \pdv{Ax}{x}= A \end{equation}

Normal Equation

Last edited: October 10, 2025

constituents

Let’s also take all \(n\) training examples and stack them as the rows of a design matrix \(X\), with the corresponding targets stacked in a vector \(Y\):

\begin{equation} X = \mqty( - {x^{(1)}}^{T} - \\ \dots \\ - {x^{(n)}}^{T} - ) \end{equation}

\begin{equation} Y = \mqty(y^{(1)} \\ \dots \\ y^{(n)}) \end{equation}

requirements

The least-squares error becomes:

\begin{equation} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n} \qty(h\qty(x^{(i)}) - y^{(i)}) ^{2} = \frac{1}{2} \qty(X \theta - Y)^{T} \qty(X \theta - Y) \end{equation}

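Solving this exactly by taking the derivative of \(J\) and setting it to \(0\) (i.e. at a minimum), we obtain the normal equation, which has a unique solution when \(X^{T}X\) is invertible:

\begin{equation} X^{T}X\theta = X^{T}Y \implies \theta = \qty(X^{T}X)^{-1}X^{T}Y \end{equation}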
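
To make the closed form concrete: setting the gradient \(X^{T}X\theta - X^{T}Y\) of the \(\frac{1}{2}\)-scaled squared error to zero gives the linear system \(X^{T}X\theta = X^{T}Y\). The sketch below solves that system for a design matrix with exactly two columns (an intercept and one feature), using the explicit \(2 \times 2\) inverse. The function name `normal_equation_2d` and the toy data are assumptions for illustration; real code would hand the system to a linear-algebra library.

```python
def normal_equation_2d(X, Y):
    """Solve X^T X theta = X^T Y for a design matrix with exactly two columns."""
    # Entries of the symmetric 2x2 matrix X^T X and of the 2-vector X^T Y.
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    p = sum(r[0] * y for r, y in zip(X, Y))
    q = sum(r[1] * y for r, y in zip(X, Y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X is singular; there is no unique solution")
    # Closed-form inverse of a symmetric 2x2 matrix applied to X^T Y.
    return [(d * p - b * q) / det, (a * q - b * p) / det]


# Toy data lying exactly on y = 1 + 2x; each row is [1, x] (intercept term first).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]

theta = normal_equation_2d(X, Y)
print(theta)  # intercept and slope
```

Because the toy targets lie exactly on a line, the recovered parameters are the intercept and slope of that line, and the squared error at `theta` is zero.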

SU-CS161 OCT302025

Last edited: October 10, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

SU-CS161 Things to Review

Last edited: October 10, 2025

log laws, exponent laws and general non-discrete math stuff
distributions and infinite series, math53 content
combinations
- \(\mqty(n \\ k) = \mqty(n-1 \\ k-1) + \mqty(n-1 \\ k)\)
- \(\mqty(n \\k) = \mqty(n \\ n-k)\)
binomial theorem: \(\qty(a+b)^{n} = \sum_{k=0}^{n} \mqty(n \\k)a^{k} b^{n-k}\)
geometric sum

SU-CS229 Distribution Sheet

Last edited: October 10, 2025

Here’s a bunch of exponential family distributions. Recall:

\begin{equation} p\qty(x;\eta) = b\qty(x) \exp \qty(\eta^{T}T\qty(x) - a\qty(\eta)) \end{equation}

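The normal, Bernoulli, Poisson, binomial, negative binomial, geometric, chi-squared, and exponential distributions are all in the exponential family (the binomial with its number of trials held fixed, and the negative binomial with its number of failures held fixed).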
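
As one concrete instance, the Bernoulli distribution with success probability \(\phi\) fits the template with \(\eta = \log\frac{\phi}{1-\phi}\), \(T\qty(x) = x\), \(a\qty(\eta) = \log\qty(1 + e^{\eta})\), and \(b\qty(x) = 1\). The sketch below checks numerically that this form reproduces the usual pmf. The function names and the probed values of \(\phi\) are assumptions for illustration.

```python
import math


def bernoulli_pmf(x, phi):
    """Standard Bernoulli pmf: phi^x (1 - phi)^(1 - x) for x in {0, 1}."""
    return phi**x * (1 - phi) ** (1 - x)


def bernoulli_expfam(x, phi):
    """The same pmf written as b(x) exp(eta * T(x) - a(eta))."""
    eta = math.log(phi / (1 - phi))   # natural parameter (log-odds)
    a = math.log(1 + math.exp(eta))   # log-partition function
    b = 1.0                           # base measure
    return b * math.exp(eta * x - a)  # sufficient statistic T(x) = x


# Largest disagreement between the two forms over a few illustrative settings.
max_gap = max(
    abs(bernoulli_pmf(x, phi) - bernoulli_expfam(x, phi))
    for phi in (0.2, 0.5, 0.9)
    for x in (0, 1)
)
print(max_gap)
```

The gap stays at floating-point noise, since \(\exp\qty(\eta x - a\qty(\eta)) = \qty(\frac{\phi}{1-\phi})^{x}\qty(1-\phi)\).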

normal distribution

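\(\mu\) the mean, \(\sigma^{2}\) the variance (\(\sigma\) the standard deviation); in the multivariate case, \(\Sigma\) is the covariance matrix and \(|x|\) is the dimension of \(x\)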

\begin{equation} p\qty(x;\mu, \Sigma) = \frac{1}{\qty(2\pi)^{\frac{|x|}{2}} \text{det}\qty(\Sigma)^{\frac{1}{2}}} \exp \qty(-\frac{1}{2} \qty(x-\mu)^{T}\Sigma^{-1}\qty(x-\mu)) \end{equation}

\begin{equation} p\qty(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp \qty(-\frac{\qty(x-\mu)^{2}}{2 \sigma^{2}}) \end{equation}

\begin{equation} \mathbb{E}[x] = \mu \end{equation}

\begin{equation} \text{Var}\qty [x] = \sigma^{2} \end{equation}
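
The two moments above can be checked by numerically integrating the univariate density. This is a rough midpoint-rule sketch; the function names, the integration window of ten standard deviations, the step count, and the example parameters are all assumptions for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def moments(mu, sigma, half_width=10.0, steps=20000):
    """Midpoint-rule estimates of total mass, mean, and variance on mu +/- half_width * sigma."""
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    mass = sum(normal_pdf(x, mu, sigma) for x in xs) * dx
    mean = sum(x * normal_pdf(x, mu, sigma) for x in xs) * dx
    var = sum((x - mean) ** 2 * normal_pdf(x, mu, sigma) for x in xs) * dx
    return mass, mean, var


mass, mean, var = moments(mu=1.5, sigma=2.0)
print(mass, mean, var)  # should be close to 1, mu, and sigma^2 respectively
```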

This is an exponential family distribution. For \(\sigma^{2} = 1\):