Posts

[redirect] normal equation

Last edited: September 9, 2025

See Normal Equation

for small equations of Linear Regression, we can solve it using normal equation method.

Consider \(d\) dimensional feature and \(n\) samples of data. Remember, including the dummy feature, we have a matrix: \(X \in \mathbb{R}^{n \times \qty(d+1)}\) and a target \(Y \in \mathbb{R}^{n}\).

Notice:

\begin{equation} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n} \qty(h_{\theta} \qty(x^{(i)}) - y^{(i)})^{2} \end{equation}

and \(h = X \theta\), we we can write:

\begin{equation} J(\theta) = \frac{1}{2} \qty(X \theta - y)^{T} \qty(X \theta - y) \end{equation}

AA228/CS238: Probability Review!

Last edited: September 9, 2025

Random Variable

random variables takes on different values with different probabilities. Each value a random variable take on is an event.

For instance, here’s a random variable representing a die: \(X\). It can takes on the following values, with the following probabilities:

\begin{align} P(X=1) = \frac{1}{6}\\ P(X=2) = \frac{1}{6}\\ \dots \\ P(X=6) = \frac{1}{6} \end{align}

where each assignment \(X=k\) is what we refer to above as an event.

The set of assignments of a random variable and their associated probability is called a distribution: distributions “assigns probabilities to outcomes.” When we say a certain random variable \(X\) is “distributed” following a distribution \(D\), we say \(X \sim D\). Semantically, we say \(X\) is a \(D\) random variable.

AI Safety Annual Meeting 2025

Last edited: September 9, 2025

AISafety2025 Bansal: Safety Constrained Sets

Detect tokens (latents?) which trigger potential paths into unsafe behavior, and then preempt them early by steering.

big-o

Last edited: September 9, 2025

Intuition:

  • \(O\): \(\leq\) (function in the symbol is
  • \(\theta\): \(=\)
  • \(\Omega\): \(\geq\) (function in the symbol is a lower bound)

Definition Intuition:

We say \(f\qty(n) = O\qty(g\qty(n))\) such that “when \(n\) gets big enough, \(f\qty(n)\) is bounded by at most a constant multiple of \(g\qty(n)\).

Definitions:

  • \(f(n) = O(g(n)) \Leftrightarrow \exists c, n_{0} > 0: \forall n > n_0, f(n) \leq c (g(n))\)
  • \(f(n) = \Omega(g(n)) \implies \exists n_{0}: \forall n > n_0, f(n) \geq c (g(n))\)
  • \(f(n) = \theta(g(n)) \implies \exists n_{0}: \forall n > n_0, f(n) \geq 1 (g(n)), f(n) \leq c (g(n))\)

Little ones:

cost function

Last edited: September 9, 2025

a cost function \(J\) tells us how good our training is. For instance, least-squares error

additional information