requirements
- \(h\qty(x)\) the predictor function
- \(x,y\), the samples of data
definition
\begin{equation} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n}\qty(h_{\theta }\qty(x^{(i)}) - y^{(i)})^{2} \end{equation}
see also example: gradient descent for least-squares error.
additional information
“why the 1/2”?
Because when you take \(\nabla J\qty(\theta)\) you end up with the \(\frac{1}{2}\) and the \(2\) canceling out.