Machine Learning Index

Last edited: November 11, 2025

https://cs229.stanford.edu/

Course Project

  • Deliverables: proposal (300-500 words), milestone (3 pages), final report (5 pages), poster
  • Evaluation: technical quality, originality, community

SU-CS229 Midterm Sheet

Lectures

Basics + Linear Methods

Regularization

Kernel Methods

Decision Trees and Boosting

Deep Learning

Reinforcement Learning

PCA

NP-Complete

Last edited: November 11, 2025

A language \(B\) is NP-Complete if \(B \in NP\) and every \(A \in NP\) has a polynomial-time mapping reduction \(A \leq_{P} B\). If only the reductions exist (without requiring \(B \in NP\)), we say \(B\) is NP-Hard.

Suppose a language \(L\) is NP-Complete; then every language in \(NP\) is mapping reducible to \(L\). So if \(L \in P\), then \(P = NP\); and if \(L \not\in P\), then \(P \neq NP\).
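
To unpack the first implication: if \(L\) has a polynomial-time decider and \(A \in NP\) reduces to \(L\) via a polynomial-time computable \(f\), then \(A\) is decided in polynomial time by composing the two, so every language in \(NP\) lands in \(P\):

\begin{equation} x \in A \iff f(x) \in L, \qquad t_{A}(n) \leq t_{f}(n) + t_{L}\qty(\mathrm{poly}(n)) = \mathrm{poly}(n) \end{equation}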

Principal Component Analysis

Last edited: November 11, 2025

assumptions of PCA

  • linearity: PCA makes sense only when the important structure in the data is linear (it captures only linear correlations between features)
  • directions of large variance carry the important structure (i.e. small-variance directions are treated as noise)

motivation

Consider a dataset \(\qty {x^{(i)}}_{i=1}^{n}\) with \(x^{(i)} \in \mathbb{R}^{d}\), where \(d < n\). Our goal is to automatically identify correlations between the axes (features) of the data. This boils down to two goals:

  1. automatically detect and remove redundancies
  2. find features that explain the variation (see the objective sketched after this list)
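
One standard way to make the second goal precise (a sketch of the usual formulation, not a derivation from this note, with \(x^{(i)}\) taken to be the preprocessed points from the next subsection): pick the unit direction that maximizes the variance of the projected data; the maximizer is the top eigenvector of the empirical covariance \(\Sigma\).

\begin{equation} u^{*} = \arg\max_{\|u\|=1} \frac{1}{n} \sum_{i=1}^{n} \qty(u^{\top} x^{(i)})^{2} = \arg\max_{\|u\|=1} u^{\top} \Sigma u, \qquad \Sigma = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} {x^{(i)}}^{\top} \end{equation}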

preprocessing before PCA

normalize each feature to zero mean and unit variance: \(x_{j}^{(i)} = \frac{x^{(i)}_{j} - \mu_{j}}{\sigma_{j}}\), where \(\mu_{j}\) and \(\sigma_{j}\) are the mean and standard deviation of feature \(j\) over the dataset
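
A minimal numpy sketch of this preprocessing followed by PCA via the eigendecomposition of the empirical covariance (the function and variable names below are illustrative, not from the note):

#+begin_src python
import numpy as np

def pca(X, k):
    """Normalize each feature, then project onto the top-k principal components."""
    # preprocessing: x_j <- (x_j - mu_j) / sigma_j for every feature j
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    X_norm = (X - mu) / sigma
    # empirical covariance matrix (d x d) of the normalized data
    n = X.shape[0]
    cov = (X_norm.T @ X_norm) / n
    # eigenvectors of the symmetric covariance, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    U = eigvecs[:, order[:k]]   # top-k principal directions (d x k)
    return X_norm @ U           # projected data (n x k)

# usage: project 5-dimensional points onto their top 2 principal components
X = np.random.randn(100, 5)
Z = pca(X, k=2)
#+end_src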

ragdoll scratchpad

Last edited: November 11, 2025

\begin{align} \tau = s_1 \dots s_{n} \end{align}

\begin{equation} a \sim \pi_{\text{lm}}\qty(\tau) \end{equation}

\begin{equation} a_{i}, \rho_{i} \sim \pi_{\text{lm}}\qty(\tau \mid \rho_{i-1} \dots \rho_{1}) \end{equation}

“action selection”
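
A hedged sketch of the sampling loop the equations above describe, assuming a hypothetical =llm_sample= callable that wraps \(\pi_{\text{lm}}\) and returns an (action, rationale) pair for a given prompt:

#+begin_src python
def action_selection(tau, llm_sample, steps):
    """Sample a_i, rho_i ~ pi_lm(tau | rho_1 ... rho_{i-1}) for i = 1..steps.

    tau: the trajectory s_1 ... s_n, serialized as text.
    llm_sample: hypothetical wrapper around the language-model policy.
    """
    actions, rhos = [], []
    for _ in range(steps):
        # condition on the trajectory plus all previously sampled rationales
        prompt = tau + "\n" + "\n".join(rhos)
        a_i, rho_i = llm_sample(prompt)
        actions.append(a_i)
        rhos.append(rho_i)
    return actions, rhos
#+end_src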


Tracking in \(\rho\)

  1. Things that can go into \(\rho\) (explanation “why did we do this”)
  2. Running summary of the entire \(\tau\) (good chance this avoids having to include the entire \(\tau\))

why don’t we


Threads

  • embeddings being bad seems weird

randomness

Last edited: November 11, 2025

randomness is a resource which can be used (e.g., to save time or space).

Our unit of computation is a randomized Turing machine.

some questions of randomness

save time/space by using randomness: it used to be believed that \(P \subsetneq \text{Randomized } P\), \(L \subsetneq \text{Randomized } L\), etc. But we now think \(P = \text{Randomized } P\) (intuition: if solving SAT requires circuits of exponential size, then \(P\) equals Randomized \(P\)).
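
As a concrete classic illustration of buying time with randomness (Freivalds' algorithm, named here for context rather than taken from this note): checking a claimed matrix product \(AB = C\) takes \(O(n^{2})\) time per randomized trial with one-sided error, versus roughly \(O(n^{3})\) to recompute the product directly.

#+begin_src python
import numpy as np

def freivalds(A, B, C, trials=20):
    """Randomized check that A @ B == C for integer matrices.

    Each trial costs three matrix-vector products, O(n^2) time.
    A wrong C is accepted with probability at most 2**(-trials).
    """
    n = C.shape[0]
    for _ in range(trials):
        r = np.random.randint(0, 2, size=n)         # random 0/1 vector
        if not np.array_equal(A @ (B @ r), C @ r):  # compare A(Br) with Cr
            return False                            # definitely A @ B != C
    return True                                     # probably A @ B == C
#+end_src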

efficient derandomization: can we reduce the use of randomness, i.e. completely derandomize, while not losing much efficiency?