SU-CS361 APR092024

More Bracketing Methods

if your function is unimodal:

  • Pick three points whose values go “high, low, high”
  • Fit a quadratic through them, evaluate the function at the quadratic’s minimum, and add that point to the set
  • Now, drop one of the four resulting points so that the remaining three still bracket the minimum (see the sketch below)
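
A minimal Python sketch of this loop (quadratic fit search), assuming the initial triplet already satisfies \(f(b) < f(a)\) and \(f(b) < f(c)\); the function name, the fixed iteration count, and the lack of degenerate-case handling are my simplifications:

```python
def quadratic_fit_search(f, a, b, c, n_iters=20):
    """Bracketed minimization of a unimodal f, assuming f(b) < f(a) and f(b) < f(c)."""
    ya, yb, yc = f(a), f(b), f(c)
    for _ in range(n_iters):
        # minimizer of the quadratic interpolating (a, ya), (b, yb), (c, yc)
        x = 0.5 * (ya * (b**2 - c**2) + yb * (c**2 - a**2) + yc * (a**2 - b**2)) \
                / (ya * (b - c) + yb * (c - a) + yc * (a - b))
        yx = f(x)
        # drop one of the four points so the remaining three still bracket the minimum
        if x > b:
            if yx > yb:
                c, yc = x, yx                  # new point becomes the right "high"
            else:
                a, ya, b, yb = b, yb, x, yx    # shift the bracket right
        elif x < b:
            if yx > yb:
                a, ya = x, yx                  # new point becomes the left "high"
            else:
                c, yc, b, yb = b, yb, x, yx    # shift the bracket left
    return (a, b, c)
```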

Shubert-Piyavskii Method

This is a Bracketing approach which guarantees optimality WITHOUT assuming unimodality, by using the Lipschitz Constant. However, it only works in one dimension.

Consider a Lipschitz continuous function with Lipschitz Constant \(L\), and two initial points \(a\) and \(b\). First, we arbitrarily pick a point in the middle to evaluate; this gives us a cone (see Lipschitz Condition) which bounds the function from below.
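
Concretely, each evaluated point contributes one such lower-bounding cone, so after evaluating points \(x^{(1)}, \dots, x^{(n)}\) the Lipschitz Condition gives the sawtooth lower bound (the superscript bookkeeping here is my own notation):

\begin{equation} f(x) \geq \max_{i} \qty(f\qty(x^{(i)}) - L \abs{x - x^{(i)}}) \end{equation}

and the method always evaluates next at the minimum of this bound. For two neighboring evaluated points \(x^{(i)} < x^{(j)}\) with values \(y^{(i)}, y^{(j)}\), their cones intersect at:

\begin{equation} x = \frac{y^{(i)} - y^{(j)}}{2L} + \frac{x^{(i)} + x^{(j)}}{2}, \qquad y = \frac{y^{(i)} + y^{(j)}}{2} - \frac{L \qty(x^{(j)} - x^{(i)})}{2} \end{equation}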

SU-CS361 APR112024

Hyper-gradient Descent

Adapt the execution of gradient descent to Hyper-gradient Descent! Recall the Descent Direction Iteration update rule; for gradient descent it is:

\begin{equation} x^{(k+1)} = x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)}) \end{equation}

For the learning rate \(\alpha\), what if we write:

\begin{equation} \pdv{f\qty(x^{(k+1)})}{\alpha} = \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \end{equation}

The first factor on the right side is just \(\pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} = \nabla_{x}f\qty(x^{(k+1)})\). The second factor is \(\pdv{x^{(k+1)}}{\alpha} = \pdv{\alpha} \qty(x^{(k)} - \alpha \nabla_{x} f(x^{(k)}))\), which evaluates to simply \(-\nabla_{x}f(x^{(k)})\).

Therefore:

\begin{align} \pdv{f\qty(x^{(k+1)})}{\alpha} &= \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \\ &= \nabla_{x}f(x^{(k+1)}) \cdot (-\nabla_{x}f(x^{(k)})) \end{align}

And so, to update our step size/learning rate, we can:
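
take a gradient step on \(\alpha\) itself, using a (hyper) learning rate \(\mu\) (the symbol \(\mu\) is my choice here):

\begin{equation} \alpha^{(k+1)} = \alpha^{(k)} - \mu \pdv{f\qty(x^{(k+1)})}{\alpha} = \alpha^{(k)} + \mu \nabla_{x}f\qty(x^{(k+1)}) \cdot \nabla_{x}f\qty(x^{(k)}) \end{equation}

A minimal sketch in Python, assuming we have the gradient of \(f\); the function name and the default values of alpha0, mu, and n_iters are assumptions:

```python
import numpy as np

def hypergradient_descent(grad_f, x0, alpha0=0.01, mu=1e-4, n_iters=100):
    """Gradient descent that also adapts its own learning rate alpha."""
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    g_prev = grad_f(x)          # gradient at the previous iterate
    x = x - alpha * g_prev      # one ordinary gradient step to start
    for _ in range(n_iters):
        g = grad_f(x)
        # hypergradient of f(x^{(k+1)}) w.r.t. alpha is -g . g_prev,
        # so stepping alpha against it adds mu * (g . g_prev)
        alpha = alpha + mu * float(g @ g_prev)
        x = x - alpha * g
        g_prev = g
    return x, alpha
```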

SU-CS361 APR162024

Stochastic Methods

Stochastic Methods are methods where you use randomness strategically to escape local minima. They typically rely on pseudo-random number generation.

Noisy Descent

Gradient descent, but with a bit of noise added to each step:

\begin{equation} \bold{x} \leftarrow \bold{x} - \alpha \nabla_{x} f\qty(\bold{x}) + \epsilon \end{equation}

where:

\begin{equation} \epsilon \sim \mathcal{N}(0, \lambda I) \end{equation}
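
A minimal sketch, assuming we can evaluate the gradient of \(f\); the decaying schedule for \(\lambda\) and the default parameter values are assumptions:

```python
import numpy as np

def noisy_descent(grad_f, x0, alpha=0.1, lam0=0.1, n_iters=100):
    """Gradient descent with Gaussian noise added to each step."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iters + 1):
        lam = lam0 / k  # decaying noise scale (assumed schedule)
        # eps ~ N(0, lam * I), so the standard deviation is sqrt(lam)
        eps = np.random.normal(0.0, np.sqrt(lam), size=x.shape)
        x = x - alpha * grad_f(x) + eps
    return x
```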

A related stochastic method is Mesh Adaptive Direct Search: it is like Generalized Pattern Search, but instead of a fixed positive spanning set, the directions of the spanning vectors are re-randomized every single step; once the directions are randomized, the step size is expanded/shrunk as needed.
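
A minimal sketch of that idea; the expansion/shrink factors, the opportunistic longer step, and the stopping tolerance below are assumptions:

```python
import numpy as np

def rand_positive_spanning_set(alpha, n):
    """n + 1 random directions that positively span R^n."""
    delta = int(round(1 / np.sqrt(alpha)))
    # lower-triangular matrix: +/- delta on the diagonal, small integers below it
    L = np.diag(delta * np.random.choice([-1, 1], size=n)).astype(float)
    for i in range(1, n):
        for j in range(i):
            L[i, j] = np.random.randint(-delta + 1, delta)
    # permute rows/columns, then append the negated column sum so that
    # the n + 1 columns positively span R^n
    D = L[np.random.permutation(n), :][:, np.random.permutation(n)]
    D = np.hstack([D, -D.sum(axis=1, keepdims=True)])
    return [D[:, i] for i in range(n + 1)]

def mesh_adaptive_direct_search(f, x0, eps=1e-6):
    x = np.asarray(x0, dtype=float)
    alpha, y = 1.0, f(x)
    while alpha > eps:
        improved = False
        for d in rand_positive_spanning_set(alpha, len(x)):
            x_new = x + alpha * d
            y_new = f(x_new)
            if y_new < y:
                x, y, improved = x_new, y_new, True
                # opportunistic longer step in the same (successful) direction
                x_big = x + 3 * alpha * d
                y_big = f(x_big)
                if y_big < y:
                    x, y = x_big, y_big
                break
        # expand the step size on success, shrink it otherwise
        alpha = min(4 * alpha, 1.0) if improved else alpha / 4
    return x
```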

SU-CS361 APR182024

constraint

recall constraint; a general constrained problem means that we minimize \(f\) while selecting \(x\) within a feasible set \(\mathcal{X}\), i.e. \(x \in \mathcal{X}\).

active constraint

an “active constraint” is a constraint which, upon application, changes the solution to be different from the unconstrained solution. This is always true for equality constraints, and not necessarily for inequality constraints. For instance, minimizing \(x^{2}\) subject to \(x \geq 1\) makes the constraint active (the solution moves from \(0\) to \(1\)), while \(x \geq -1\) leaves it inactive.

types of constraints

We can rewrite all constrained optimization problems using two types of constraints; we will use these conventions EXACTLY:
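
Namely, the two types are inequality constraints and equality constraints, written in the standard convention:

\begin{align} \min_{x}\quad & f(x) \\ \text{s.t.}\quad & g_{i}(x) \leq 0,\ i = 1, \dots, m \\ & h_{j}(x) = 0,\ j = 1, \dots, \ell \end{align}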

SU-CS361 APR252024

see Linear Constraint Optimization