_index.org

SU-CS361 APR022024

Last edited: August 8, 2025

Formal Formulation of Optimization

If we want to write down an optimization problem…

\begin{align} \min_{x}&\ f(x) \\ s.t.\ & x \in \mathcal{X} \end{align}

where:

  • \(x\): the “design point”, usually a vector representing the thing you are trying to optimize
    • an assignment of “design variables” (width, height, etc.) to values
  • \(\mathcal{X}\): the feasible set, i.e. the design points that satisfy the constraints
  • \(f: \mathcal{X} \to \mathbb{R}\): the objective function, which maps each design point to the scalar value we want to minimize
  • \(x^{*}\): the minimizer, a point (there may be more than one) that attains the minimum over the feasible set

Conventionally, we minimize; maximizing \(f\) is the same as minimizing \(-f\).
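To make this concrete, here is a minimal sketch in Python using =scipy= (the library choice and the example objective/constraint are assumptions of mine, not from the notes): minimize \(f(x) = (x_1 - 1)^2 + (x_2 - 2)^2\) over the feasible set \(\mathcal{X} = \{x : x_1 + x_2 \leq 2\}\).

#+begin_src python
# A minimal sketch of the formulation above, using scipy (library choice
# and example problem are assumptions, not from the notes).
import numpy as np
from scipy.optimize import minimize

def f(x):
    # objective: distance-squared to the point (1, 2)
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

# feasible set X = {x : x1 + x2 <= 2}; scipy expects constraints as g(x) >= 0
constraints = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]

x0 = np.zeros(2)  # initial design point
result = minimize(f, x0, constraints=constraints)
print(result.x)   # x*, the minimizer (here ~[0.5, 1.5])
#+end_src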

SU-CS361 APR042024

Last edited: August 8, 2025

optimization inequalities cannot be strict

Consider:

\begin{align} \min_{x}&\ x \\ s.t.\ & x > 1 \end{align}

this has NO SOLUTION: the infimum is \(1\), but the point \(x = 1\) is not in the feasible set, and every feasible \(x > 1\) can still be improved upon. So, we usually specify optimization problems without strict inequalities.

So, instead, we write:

\begin{align} \min_{x}&\ x \\ s.t.\ & x \geq 1 \end{align}

Univariate Conditions

First-Order Necessary Condition

\begin{equation} \nabla f(x^{*}) = 0 \end{equation}

Second-Order Necessary Condition

\begin{equation} \nabla^{2}f(x^{*}) \geq 0 \end{equation}

(i.e., the curvature at \(x^{*}\) is nonnegative; in multiple dimensions, this means the Hessian is positive semidefinite)
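As a quick numerical check, we can verify both conditions at a candidate minimizer with finite differences (the test function here is a toy example of mine, not from the notes):

#+begin_src python
# A toy numerical check of both necessary conditions (example is mine).
def f(x):
    return (x - 3.0) ** 2

def d1(x, h=1e-5):
    # central-difference first derivative
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(x, h=1e-4):
    # central-difference second derivative
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

x_star = 3.0
print(abs(d1(x_star)) < 1e-6)  # first-order condition: derivative ~ 0
print(d2(x_star) >= 0)         # second-order condition: curvature >= 0
#+end_src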

SU-CS361 APR092024

Last edited: August 8, 2025

More Bracketing Methods

if your function is unimodal:

  • Pick three points that bracket the minimum in a “high, low, high” pattern
  • Fit a quadratic through them, evaluate its minimizer, and add that point to the set
  • Drop whichever of the four resulting points leaves the tightest “high, low, high” bracket (see the sketch below)
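Here is a sketch of one quadratic-fit step (function and variable names are mine; the vertex formula is the standard closed form for the interpolating parabola):

#+begin_src python
# One step of quadratic fit search (a sketch; names are mine). Given a
# "high, low, high" bracket (a, b, c), fit a parabola through the three
# evaluations and return its analytic minimizer.
def quadratic_fit_step(f, a, b, c):
    ya, yb, yc = f(a), f(b), f(c)
    # standard closed form for the vertex of the interpolating parabola
    num = ya * (b**2 - c**2) + yb * (c**2 - a**2) + yc * (a**2 - b**2)
    den = ya * (b - c) + yb * (c - a) + yc * (a - b)
    return 0.5 * num / den

f = lambda x: (x - 1.3) ** 2
print(quadratic_fit_step(f, 0.0, 1.0, 2.0))  # exactly 1.3 for a quadratic
#+end_src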

Shubert-Piyavskii Method

This is a Bracketing approach which guarantees optimality WITHOUT unimodality by using the Lipschitz Constant. But, this only works in one dimension.

Consider a Lipschitz continuous function with Lipschitz Constant \(L\) and two initial points \(a\) and \(b\) (the interval endpoints). First, we arbitrarily pick a point in the middle to evaluate; this gives us a cone (see Lipschitz Condition) which lower-bounds the function. Each further evaluation adds another cone, and the pointwise maximum of the cones forms a sawtooth lower bound; we repeatedly evaluate at the sawtooth’s lowest point until the gap between the bound and the best value found is within tolerance.
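A minimal sketch of the sawtooth lower bound, assuming a known Lipschitz constant \(L\) (the function names are mine):

#+begin_src python
# A sketch of the Shubert-Piyavskii sawtooth lower bound, assuming a
# known Lipschitz constant L (function names are mine).
def sawtooth_lower_bound(x, samples, L):
    # samples: list of (x_i, f(x_i)) pairs evaluated so far; the true
    # function can never dip below any cone f(x_i) - L * |x - x_i|
    return max(fx - L * abs(x - xi) for xi, fx in samples)

def next_sample_point(a, ya, b, yb, L):
    # the two cones descending from neighbors (a, ya) and (b, yb)
    # intersect at the lowest point of the sawtooth between them
    return 0.5 * (a + b) + (ya - yb) / (2 * L)
#+end_src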

SU-CS361 APR112024

Last edited: August 8, 2025

Hyper-gradient Descent

Adapt the learning rate during the execution of gradient descent with Hyper-gradient Descent! Recall the Descent Direction Iteration update rule:

\begin{equation} x^{(k+1)} = x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)}) \end{equation}

For learning rate \(\alpha\), what if we write:

\begin{equation} \pdv{f\qty(x^{(k+1)})}{\alpha} = \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \end{equation}

The first factor is just \(f’(x^{(k+1)}) = \nabla_{x}f\qty(x^{(k+1)})\). For the second factor, recall that \(x^{(k+1)} = x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)})\), so \(\pdv{x^{(k+1)}}{\alpha} = \pdv{\alpha} \qty(x^{(k)} - \alpha \nabla_{x} f(x^{(k)})) = -\nabla_{x}f\qty(x^{(k)})\).

Therefore:

\begin{align} \pdv{f\qty(x^{(k+1)})}{\alpha} &= \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \\ &= \nabla_{x}f\qty(x^{(k+1)}) \cdot \qty(-\nabla_{x}f\qty(x^{(k)})) \end{align}

And so, to update our step size/learning rate, we can take a gradient descent step on \(\alpha\) itself, with a hyper learning rate \(\mu\):

\begin{equation} \alpha^{(k+1)} = \alpha^{(k)} + \mu \nabla_{x}f\qty(x^{(k+1)}) \cdot \nabla_{x}f\qty(x^{(k)}) \end{equation}
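A minimal sketch of the full loop on a toy quadratic (the test function and the hyper learning rate \(\mu\) are assumptions of mine, not from the notes):

#+begin_src python
import numpy as np

# A minimal hypergradient descent loop on a toy quadratic (the test
# function and the hyper learning rate mu are assumptions of mine).
def grad(x):
    return x  # gradient of f(x) = 0.5 * ||x||^2

x = np.array([5.0, -3.0])
alpha, mu = 0.01, 0.001        # learning rate and hyper learning rate
g_prev = grad(x)
for _ in range(100):
    x = x - alpha * g_prev     # ordinary gradient descent step
    g = grad(x)
    # d f(x^{k+1}) / d alpha = -g . g_prev, so step alpha against it:
    alpha += mu * np.dot(g, g_prev)
    g_prev = g
print(x, alpha)
#+end_src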