Supervised Learning!
Some Notational Conventions
- \(n\): number of training examples
- \(m\): number of features
- \(x\): input feature(s)
- \(y\): output/target feature
- \(\theta\): parameters
- \(h_{\theta}\qty(x)\): the predictor function
And so, a tuple \(\qty(x,y)\) is a particular training example. We use superscript parentheses to index samples: \(\qty(x^{(i)}, y^{(i)})\) denotes the \(i\)th training example. We typically write \(h\qty(x)\) for the predictor, with parameters \(\theta_{j}\).
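For concreteness (this particular hypothesis is an illustration of the notation, not fixed by the notes): with \(m\) features, the linear-regression predictor is

\[ h_{\theta}\qty(x) = \theta_{0} + \theta_{1} x_{1} + \cdots + \theta_{m} x_{m} \]

where \(\theta_{0}\) is the intercept term.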
New Concepts
- Linear Regression
- least-squares error
- gradient descent
- gradient descent for least-squares error
- variants
  - summing over dataset: batch gradient descent
  - pick one sample and run it: stochastic gradient descent
  - pick some samples and run them: mini-batch gradient descent
- a primer on Vector Calculus
- Normal Equation
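The gradient-descent variants above differ only in how many examples contribute to each parameter update, and the Normal Equation gives the same least-squares fit in closed form. A minimal NumPy sketch on a synthetic one-feature dataset (the data, learning rates, and iteration counts here are illustrative assumptions, not from the notes):

```python
import numpy as np

# Synthetic dataset: n = 100 examples, one feature plus an intercept column.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # row i is [1, x^(i)]
true_theta = np.array([2.0, 3.0])
y = X @ true_theta + rng.normal(0, 0.1, n)

def batch_gradient_descent(X, y, alpha=0.01, iters=5000):
    """Batch: each update averages the least-squares gradient over the whole dataset."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= alpha * grad
    return theta

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=50, seed=1):
    """Stochastic: update using one randomly chosen example at a time."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            theta -= alpha * (X[i] @ theta - y[i]) * X[i]
    return theta

def minibatch_gradient_descent(X, y, alpha=0.01, epochs=200, batch=10, seed=1):
    """Mini-batch: update using small random subsets of the data."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            b = idx[start:start + batch]
            theta -= alpha * X[b].T @ (X[b] @ theta - y[b]) / len(b)
    return theta

# Normal Equation: closed-form solution of X^T X theta = X^T y.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)
```

All three iterative variants approach the same parameters the Normal Equation produces directly; the trade-off is that each Normal Equation solve costs a matrix inversion, while gradient descent pays per-update costs that shrink as the batch gets smaller.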