constituents
Let’s also define our entire training examples and stack them in rows:
\begin{equation} X = \mqty( - x^{(1)}^{T} - \\ \dots \\ - x^{\qty(n)}^{T} - ) \end{equation}
\begin{equation} Y = \mqty(y^{(1)} \\ \dots \\ y^{(n)}) \end{equation}
requirements
least-squares error becomes:
\begin{equation} J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n} \qty(h\qty(x^{(i)}) - y^{(i)}) ^{2} = \qty(X \theta - y)^{T} \qty(X \theta - y) \end{equation}
Solving this exactly by taking the derivative of \(J\) and set it to \(0\) (i.e. for a minima, we obtain)