SU-CS361 APR042024

Last edited: August 8, 2025

optimization inequalities cannot be strict

Consider:

\begin{align} \min_{x}&\ x \\ s.t.\ & x > 1 \end{align}

this has NO SOLUTION. (1,1) wouldn’t actually be in feasible set. So, we usually specify optimization without a strict inequality.

So, instead, we write:

\begin{align} \min_{x}&\ x \\ s.t.\ & x \geq 1 \end{align}

Univariate Conditions

First order Necessary Condition

\begin{equation} \nabla f(x^{*}) = 0 \end{equation}

Second order necessary condition

\begin{equation} \nabla^{2}f(x^{*}) \geq 0 \end{equation}

SU-CS361 APR092024

Last edited: August 8, 2025

More Bracketing Methods

Quadratic search

if your function is unimodal…

Pick three points that gets “high, low, high”
Fit a quadratic to it, evaluate its minima and add it to the point set
Now, drop any of the four resulting point

Shubert-Piyavskill Method

This is a Bracketing approach which grantees optimality WITHOUT unimodality by using the Lipschitz Constant. But, this only works in one dimension.

Consider a Lipschitz continuous function with Lipschitz Constant \(L\). We can get our two initial points \(a\) and \(b\). First, we arbitrarily pick a point in the middle to evaluate; this will give us a cone (see Lipschitz Condition) which bounds the function.

SU-CS361 APR112024

Last edited: August 8, 2025

Hyper-gradient Descent

Adapt the execution of gradient descent to Hyper-gradient Descent! Recall the Descent Direction Iteration update rule:

For LR \(\alpha\), what if we write:

\begin{equation} \pdv{f\qty(x^{(k+1)})}{\alpha} = \pdv{f(x^{(k+1)}}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \end{equation}

The left side is just \(f’(x^{(k+1)}) = \nabla_{x}f(x^{(k+1)})\). Recall that the right side is \(\pdv{\alpha} \qty(x^{(k)} - \alpha \nabla_{x} f(x^{(k)}))\). This evaluates to simply \(-\nabla_{x}f(x^{(k)})\).

Therefore:

\begin{align} \pdv{f\qty(x^{(k+1)})}{\alpha} &= \pdv{f(x^{(k+1)}}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \\ &= \nabla_{x}f(x^{(k+1)}) \cdot (-\nabla_{x}f(x^{(k)})) \end{align}

And so, to update our step size/learning rate, we can:

SU-CS361 APR162024

Last edited: August 8, 2025

Stochastic Methods

Stochastic Methods is where you use randomness strategically to escape local minima. This typically rely on pseudo-random generation.

Noisy Descent

Gradient descent but slightly bad:

\begin{equation} \bold{x} = \bold{x} + \alpha \nabla f_{\bold{x}} + \epsilon \end{equation}

where:

\begin{equation} \epsilon \sim \mathcal{N}(0, \lambda I) \end{equation}

Mesh Adaptive Direct Search

This is like Generalized Pattern Search, but instead of a fixed positive spanning set we change the directions of the span vectors every single step; once randomized, then we expand/shrink as needed.

SU-CS361 APR252024

Last edited: August 8, 2025

see Linear Constraint Optimization