Supervised Learning!
Some Notational Conventions
- \(n\): number of training examples
- \(m\): number of features
- \(x\): input feature(s)
- \(y\): output/target feature
- \(\theta\): parameters
- \(h_{\theta}\qty(x)\): the predictor function
And so, a tuple \(\qty(x,y)\) is a particular training example. We use superscript parentheses to index samples: \(\qty(x^{(i)}, y^{(i)})\) denotes the \(i\)th training example. We typically write \(h\qty(x)\) for the predictor, with parameters \(\theta_{j}\).
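For concreteness (this particular hypothesis is an illustration of the notation, not fixed by the notes): with \(m\) features, the linear-regression predictor is

\[ h_{\theta}\qty(x) = \theta_{0} + \theta_{1} x_{1} + \cdots + \theta_{m} x_{m} \]

where \(\theta_{0}\) is the intercept term.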
New Concepts
- Linear Regression
- least-squares error
- gradient descent
- gradient descent for least-squares error
- variants
  - summing over dataset: batch gradient descent
  - pick one sample and run it: stochastic gradient descent
  - pick some samples and run them: mini-batch gradient descent
- a primer on Vector Calculus
- Normal Equation
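The gradient-descent variants above differ only in how many examples contribute to each parameter update, and the Normal Equation gives the same least-squares fit in closed form. A minimal NumPy sketch on a synthetic one-feature dataset (the data, learning rates, and iteration counts here are illustrative assumptions, not from the notes):

```python
import numpy as np

# Synthetic dataset: n = 100 examples, one feature plus an intercept column.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # row i is [1, x^(i)]
true_theta = np.array([2.0, 3.0])
y = X @ true_theta + rng.normal(0, 0.1, n)

def batch_gradient_descent(X, y, alpha=0.01, iters=5000):
    """Batch: each update averages the least-squares gradient over the whole dataset."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= alpha * grad
    return theta

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=50, seed=1):
    """Stochastic: update using one randomly chosen example at a time."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            theta -= alpha * (X[i] @ theta - y[i]) * X[i]
    return theta

def minibatch_gradient_descent(X, y, alpha=0.01, epochs=200, batch=10, seed=1):
    """Mini-batch: update using small random subsets of the data."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            b = idx[start:start + batch]
            theta -= alpha * X[b].T @ (X[b] @ theta - y[b]) / len(b)
    return theta

# Normal Equation: closed-form solution of X^T X theta = X^T y.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)
```

All three iterative variants approach the same parameters the Normal Equation produces directly; the trade-off is that each Normal Equation solve costs a matrix inversion, while gradient descent pays per-update costs that shrink as the batch gets smaller.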