least-squares error

requirements

  • \(h_{\theta}\qty(x)\), the predictor function, parameterized by \(\theta\)
  • \(x^{(i)}, y^{(i)}\) for \(i = 1, \dots, n\), the samples of data

definition

\begin{equation} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n}\qty(h_{\theta }\qty(x^{(i)}) - y^{(i)})^{2} \end{equation}
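
As a concrete illustration of the definition, here is a minimal sketch in Python. It assumes a linear predictor \(h_{\theta}\qty(x) = \theta^{\top} x\), which these notes do not require; the function name `least_squares_error` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def least_squares_error(theta, X, y):
    """Compute J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2.

    Assumption (not from the notes): h_theta is linear, h_theta(x) = theta^T x.
    X has one sample per row; y holds the matching targets.
    """
    residuals = X @ theta - y  # h_theta(x^(i)) - y^(i) for every sample i
    return 0.5 * np.sum(residuals ** 2)


# Toy data generated by y = 1 + 2x; the first column of X is a constant 1 (intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

theta_exact = np.array([1.0, 2.0])  # reproduces y exactly, so every residual is 0
theta_off = np.array([0.0, 2.0])    # every prediction is off by 1

print(least_squares_error(theta_exact, X, y))
print(least_squares_error(theta_off, X, y))
```

With `theta_off`, each of the three residuals is \(-1\), so the sum of squares is \(3\) and the error is half of that.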

see also example: gradient descent for least-squares error.

additional information

“why the 1/2?”

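Because when you take \(\nabla J\qty(\theta)\), differentiating the squared term brings down a factor of \(2\), which cancels the \(\frac{1}{2}\). The constant only rescales \(J\), so it does not change which \(\theta\) minimizes it.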
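
The cancellation can be written out for a single component \(\theta_{j}\) using the chain rule; this holds for any differentiable predictor \(h_{\theta}\), and the index \(j\) is notation introduced here for illustration:

\begin{equation} \frac{\partial}{\partial \theta_{j}} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n} 2 \qty(h_{\theta }\qty(x^{(i)}) - y^{(i)}) \frac{\partial}{\partial \theta_{j}} h_{\theta }\qty(x^{(i)}) = \sum_{i=1}^{n} \qty(h_{\theta }\qty(x^{(i)}) - y^{(i)}) \frac{\partial}{\partial \theta_{j}} h_{\theta }\qty(x^{(i)}) \end{equation}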

probabilistic intuition for least-squares error