[redirect] normal equation
See Normal Equation
for small instances of Linear Regression, we can solve for the parameters directly using the normal equation method.
Consider \(d\)-dimensional features and \(n\) samples of data. Remember, including the dummy feature, we have a matrix \(X \in \mathbb{R}^{n \times \qty(d+1)}\) and a target \(y \in \mathbb{R}^{n}\).
Notice:
\begin{equation} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n} \qty(h_{\theta} \qty(x^{(i)}) - y^{(i)})^{2} \end{equation}
and since \(h = X \theta\), we can write:
\begin{equation} J(\theta) = \frac{1}{2} \qty(X \theta - y)^{T} \qty(X \theta - y) \end{equation}
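Setting the gradient to zero, \(\nabla_{\theta} J = X^{T} X \theta - X^{T} y = 0\), gives the normal equation \(\theta = \qty(X^{T} X)^{-1} X^{T} y\). A minimal NumPy sketch of this solve (the data and array names here are illustrative, not from the notes):

```python
import numpy as np

# Illustrative data: n = 5 samples, d = 2 features.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0],
                  [5.0, 2.5]])
y = np.array([3.1, 2.9, 5.0, 7.2, 8.1])

# Prepend the dummy feature (column of ones), so X is n x (d+1).
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# Normal equation: solve X^T X theta = X^T y.
# np.linalg.solve avoids explicitly forming the inverse, which is
# better conditioned numerically than computing (X^T X)^{-1}.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Cost J(theta) = 1/2 * (X theta - y)^T (X theta - y)
residual = X @ theta - y
J = 0.5 * residual @ residual
print(theta, J)
```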
AA228/CS238: Probability Review!
Random Variable
random variables take on different values with different probabilities. Each value a random variable takes on is an event.
For instance, here’s a random variable representing a die: \(X\). It can take on the following values, with the following probabilities:
\begin{align} P(X=1) = \frac{1}{6}\\ P(X=2) = \frac{1}{6}\\ \dots \\ P(X=6) = \frac{1}{6} \end{align}
where each assignment \(X=k\) is what we refer to above as an event.
The set of assignments of a random variable and their associated probabilities is called a distribution: a distribution “assigns probabilities to outcomes.” When we say a certain random variable \(X\) is “distributed” following a distribution \(D\), we write \(X \sim D\). Semantically, we say \(X\) is a \(D\) random variable.
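As a quick sketch of this idea (assuming Python with NumPy; none of this code is from the original notes), we can write the die’s distribution down explicitly and check that sampling from it matches the assigned probabilities:

```python
import numpy as np

# The die's distribution: each event X = k gets probability 1/6.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# Draw samples of X; empirical frequencies approach the distribution.
rng = np.random.default_rng(0)
samples = rng.choice(values, size=10_000, p=probs)
for k in values:
    print(f"P(X={k}) ~ {np.mean(samples == k):.3f}")
```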
AI Safety Annual Meeting 2025
AISafety2025 Bansal: Safety Constrained Sets
Detect tokens (latents?) which trigger potential paths into unsafe behavior, and then preempt them early by steering.
big-o
Intuition:
- \(O\): \(\leq\) (function in the symbol is an upper bound)
- \(\Theta\): \(=\) (function in the symbol is a tight bound)
- \(\Omega\): \(\geq\) (function in the symbol is a lower bound)
Definition Intuition:
We say \(f\qty(n) = O\qty(g\qty(n))\) to mean that “when \(n\) gets big enough, \(f\qty(n)\) is bounded above by a constant multiple of \(g\qty(n)\).”
Definitions:
- \(f(n) = O(g(n)) \Leftrightarrow \exists c, n_{0} > 0: \forall n > n_0,\ f(n) \leq c \cdot g(n)\)
- \(f(n) = \Omega(g(n)) \Leftrightarrow \exists c, n_{0} > 0: \forall n > n_0,\ f(n) \geq c \cdot g(n)\)
- \(f(n) = \Theta(g(n)) \Leftrightarrow \exists c_{1}, c_{2}, n_{0} > 0: \forall n > n_0,\ c_{1} \cdot g(n) \leq f(n) \leq c_{2} \cdot g(n)\)
Little ones:
- \(f(n) = o(g(n)) \Leftrightarrow \forall c > 0, \exists n_{0}: \forall n > n_0,\ f(n) < c \cdot g(n)\)
- \(f(n) = \omega(g(n)) \Leftrightarrow \forall c > 0, \exists n_{0}: \forall n > n_0,\ f(n) > c \cdot g(n)\)
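As a concrete sketch of the \(O\) definition (the witnesses \(c\) and \(n_0\) below are illustrative, not from the notes): for \(f(n) = 3n + 5\) and \(g(n) = n\), the pair \(c = 4,\ n_0 = 5\) works, and we can spot-check this numerically:

```python
# Spot-check that f(n) = 3n + 5 is O(n) with witnesses c = 4, n0 = 5.
# (Any larger c or n0 would also work; the definition needs only one pair.)
def f(n: int) -> int:
    return 3 * n + 5

c, n0 = 4, 5
assert all(f(n) <= c * n for n in range(n0 + 1, 10_000))
print("f(n) <= 4n for all checked n > 5")
```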
cost function
a cost function \(J\) tells us how good our training is. For instance, the least-squares error:
\begin{equation} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n} \qty(h_{\theta} \qty(x^{(i)}) - y^{(i)})^{2} \end{equation}
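A minimal sketch of this cost in NumPy (the function and array names are illustrative):

```python
import numpy as np

def least_squares_cost(theta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2, with h_theta(x) = x . theta."""
    residual = X @ theta - y
    return 0.5 * float(residual @ residual)
```

This is the same quantity as the matrix form \(\frac{1}{2}\qty(X\theta - y)^{T}\qty(X\theta - y)\) from the normal equation note above.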
