Descent methods are generally of the form:
\begin{align} x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)} \end{align}
where the step size \(t^{(k)}\) and direction \(\Delta x^{(k)}\) are determined by the particular descent method. The Hessian defines an ellipsoid that approximates the shape of the sublevel set near \(x\):
\begin{align} \qty {x + v \mid v^{T} \nabla^{2} f\qty(x) v \leq 1} \end{align}
line search
backtracking line search
A first-order Taylor approximation of \(f\qty(x + t \Delta x)\) in \(t\), centered about \(x\):
\begin{align} f\qty(x) + t \nabla f\qty(x)^{T} \Delta x \end{align}
Since \(\nabla f\qty(x)^{T} \Delta x < 0\) for a descent direction, scaling the slope by \(\alpha \in \qty(0, \frac{1}{2})\) raises this line slightly:
\begin{align} f\qty(x) + \alpha t \nabla f\qty(x)^{T} \Delta x \end{align}
Starting from \(t = 1\) and shrinking by a factor \(\beta \in \qty(0, 1)\), we stop at the first \(t\) for which
\begin{align} f\qty(x + t\Delta x) \leq f\qty(x) + \alpha t \nabla f\qty(x)^{T} \Delta x \end{align}
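A minimal sketch of backtracking under this condition (the function handles `f` and `grad_f`, the direction `dx`, and the default parameter values are assumptions of this sketch, not fixed by the notes):

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, dx, alpha=0.3, beta=0.8):
    """Shrink t until f(x + t*dx) falls below the relaxed first-order line.

    alpha in (0, 1/2) and beta in (0, 1) are the usual parameter ranges;
    dx is assumed to be a descent direction (grad_f(x) @ dx < 0).
    """
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * grad_f(x) @ dx:
        t *= beta
    return t
```

For example, on \(f\qty(x) = x^{T}x\) with \(\Delta x = -\nabla f\qty(x)\), the loop shrinks \(t\) until the accepted step strictly decreases \(f\).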
gradient descent
Set:
\begin{align} \Delta x = -\nabla f\qty(x) \end{align}
convergence: for strongly convex \(f\):
\begin{align} f\qty(x^{(k)}) - p^{*} \leq c^{k} \qty(f\qty(x^{(0)}) - p^{*}) \end{align}
for some constant \(c \in \qty(0, 1)\) that depends on the strong-convexity parameters.
Gradient methods can be slow, and in particular exhibit “zigzagging” whenever the sublevel sets are not spherical (anisotropic); gradient descent performs best when the sublevel sets are nearly isotropic. This can be measured by the condition number of the Hessian.
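As a sketch of this effect, running fixed-step gradient descent on the quadratic \(f\qty(x) = \frac{1}{2} x^{T} \operatorname{diag}\qty(1, \kappa) x\), whose Hessian has condition number \(\kappa\), shows the iteration count growing with \(\kappa\) (the step-size choice \(t = \frac{2}{1 + \kappa}\), which is optimal for this particular quadratic, is an assumption here):

```python
import numpy as np

def gradient_descent(grad_f, x0, t, tol=1e-8, max_iter=10000):
    """Iterate x <- x - t * grad_f(x) with a fixed step size t
    (a simplification of the line-search version above)."""
    x = x0.astype(float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - t * g
    return x, k

def iterations_for(kappa):
    # f(x) = (1/2) x^T diag(1, kappa) x, so grad f(x) = diag(1, kappa) x;
    # larger kappa means more elongated sublevel sets and slower progress.
    Q = np.diag([1.0, kappa])
    t = 2.0 / (1.0 + kappa)  # optimal fixed step for this quadratic
    _, k = gradient_descent(lambda x: Q @ x, np.array([1.0, 1.0]), t)
    return k
```

With \(\kappa = 1\) (spherical sublevel sets) convergence is immediate; increasing \(\kappa\) steadily increases the number of iterations.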
steepest descent
Gradient methods can be very slow. Steepest descent instead picks the direction of greatest first-order decrease over a unit ball in some norm:
\begin{align} \Delta x = \arg\min \qty {\nabla f\qty(x)^{T} v \mid \norm{v} \leq 1} \end{align}
Some choices of norms:
- Euclidean norm: \(-\nabla f\qty(x)\) (it’s just gradient descent)
- quadratic norm: for \(\norm{x}_{P} = \qty(x^{T} P x)^{\frac{1}{2}}\) with \(P \succ 0\), we get \(-P^{-1}\nabla f\qty(x)\) (you can imagine this as a change of variables \(\bar{x} = P^{\frac{1}{2}}x\)).
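A minimal sketch of the quadratic-norm case (the function name is hypothetical, and a symmetric positive definite \(P\) is assumed; solving \(P \Delta x = -\nabla f\qty(x)\) avoids forming \(P^{-1}\) explicitly):

```python
import numpy as np

def quadratic_norm_direction(grad, P):
    """Steepest descent direction -P^{-1} grad under the norm
    ||v||_P = (v^T P v)^{1/2}; P must be symmetric positive definite.
    Solving the linear system is cheaper and more stable than inverting P."""
    return np.linalg.solve(P, -grad)
```

Taking \(P = \nabla^{2} f\qty(x)\) recovers the Newton step, which motivates the next section.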
Newton’s Method
See Newton’s Method
