SU-CS361 APR022024
Formal Formulation of Optimization
If we want to write down an optimization problem…
\begin{align} \min_{x}&\ f(x) \\ s.t.\ & x \in \mathcal{X} \end{align}
where:
- \(x\): “design point” (usually a vector representing the thing you are trying to optimize)
    - which is an assignment of “design variables” to values (width, height, etc.)
- \(\mathcal{X}\): the feasible set of design points that satisfy the constraints
- \(f: \mathcal{X} \to \mathbb{R}\): the objective function, which maps each design point to the scalar value we want to minimize
- \(x^{*}\): the minimizer, a point (possibly one of many) that attains the minimum of \(f\) over \(\mathcal{X}\)
Conventionally, we minimize; maximizing \(f\) is equivalent to minimizing \(-f\).
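As a concrete illustration, here is a minimal sketch of this formulation in code using scipy.optimize.minimize. The specific objective and constraint below are made-up examples, not from the notes:

```python
from scipy.optimize import minimize

# assumed toy problem: minimize f(x) = (x1 - 2)^2 + (x2 + 1)^2
# over the feasible set X = {x : x1 + x2 >= 1}
f = lambda x: (x[0] - 2) ** 2 + (x[1] + 1) ** 2

# scipy encodes inequality constraints as g(x) >= 0
constraints = [{"type": "ineq", "fun": lambda x: x[0] + x[1] - 1}]

result = minimize(f, x0=[0.0, 0.0], constraints=constraints)
print(result.x)  # x*, the minimizer within the feasible set
```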
SU-CS361 APR042024
Optimization Inequalities Cannot Be Strict
Consider:
\begin{align} \min_{x}&\ x \\ s.t.\ & x > 1 \end{align}
this has NO SOLUTION: \(x = 1\) is not actually in the feasible set, and any feasible \(x > 1\) can always be improved upon, so the minimum is never attained. So, we usually specify optimization problems without strict inequalities.
So, instead, we write:
\begin{align} \min_{x}&\ x \\ s.t.\ & x \geq 1 \end{align}
Univariate Conditions
First-Order Necessary Condition
\begin{equation} \nabla f(x^{*}) = 0 \end{equation}
Second-Order Necessary Condition
\begin{equation} \nabla^{2}f(x^{*}) \geq 0 \end{equation}
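To make these conditions concrete, here is a minimal symbolic check with sympy on an assumed example objective (the function below is made up for illustration): we solve \(\nabla f = 0\) for the first-order condition, then check that the Hessian at the critical point is positive semidefinite for the second-order condition.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = (x1 - 1) ** 2 + 2 * (x2 + 3) ** 2  # assumed example objective

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])
hess = sp.hessian(f, (x1, x2))

critical = sp.solve(list(grad), [x1, x2], dict=True)[0]  # FONC: gradient = 0
print(critical)  # {x1: 1, x2: -3}
print(hess.subs(critical).is_positive_semidefinite)  # SONC holds: True
```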
SU-CS361 APR092024
More Bracketing Methods
Quadratic Fit Search
if your function is unimodal…
- Pick three points that bracket the minimum (“high, low, high”)
- Fit a quadratic through them, evaluate its minimum, and add that point to the point set
- Now, drop one of the four resulting points so that the remaining three still bracket the minimum (see the sketch below)
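Here is a minimal sketch of one step of this procedure (the function and bracket values in the usage example are assumptions for illustration):

```python
import numpy as np

def quadratic_fit_step(f, xa, xb, xc):
    """One step of quadratic fit search; assumes xa < xb < xc bracket a
    minimum, i.e. f(xb) < f(xa) and f(xb) < f(xc) ("high, low, high")."""
    ya, yb, yc = f(xa), f(xb), f(xc)
    # vertex (minimum) of the quadratic interpolating the three points
    num = ya * (xb**2 - xc**2) + yb * (xc**2 - xa**2) + yc * (xa**2 - xb**2)
    den = ya * (xb - xc) + yb * (xc - xa) + yc * (xa - xb)
    x_new = 0.5 * num / den
    # of the four points, keep the three around the lowest evaluation,
    # so the new triple still brackets the minimum
    pts = sorted([xa, xb, xc, x_new])
    i = int(np.argmin([f(p) for p in pts]))
    return pts[i - 1], pts[i], pts[i + 1]

# usage: shrink the bracket on an assumed unimodal function (min at x = 2)
f = lambda x: np.exp(x - 2) + np.exp(2 - x)
a, b, c = 0.0, 1.5, 3.0
for _ in range(5):
    a, b, c = quadratic_fit_step(f, a, b, c)
```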
Shubert-Piyavskii Method
This is a Bracketing approach which guarantees optimality WITHOUT unimodality by using the Lipschitz Constant. However, it only works in one dimension.
Consider a Lipschitz continuous function with Lipschitz constant \(L\) and two initial points \(a\) and \(b\). First, we arbitrarily pick a point in the middle to evaluate; this gives us a cone (see Lipschitz Condition) which bounds the function from below (see the sketch below).
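Below is a minimal sketch of the method under these assumptions (a known Lipschitz constant \(L\); the tolerance and iteration cap are made-up knobs). Each iteration finds the lowest point of the sawtooth lower bound formed by the cones, evaluates \(f\) there, and stops once \(f\) is within tolerance of the bound:

```python
def shubert_piyavskii(f, a, b, L, tol=1e-4, max_iter=100):
    """Minimal sketch: global minimization of a Lipschitz function on [a, b]
    with known Lipschitz constant L."""
    m = (a + b) / 2
    pts = sorted([(a, f(a)), (m, f(m)), (b, f(b))])
    for _ in range(max_iter):
        # between each pair of neighbors, the two downward cones with
        # slopes -L and +L intersect at the sawtooth's lowest point
        best = None
        for (xi, yi), (xj, yj) in zip(pts, pts[1:]):
            x_new = 0.5 * (xi + xj) + (yi - yj) / (2 * L)
            y_bound = 0.5 * (yi + yj) - 0.5 * L * (xj - xi)
            if best is None or y_bound < best[1]:
                best = (x_new, y_bound)
        x_new, y_bound = best
        y_new = f(x_new)
        if y_new - y_bound < tol:  # f is within tol of the global lower bound
            return x_new, y_new
        pts = sorted(pts + [(x_new, y_new)])
    return min(pts, key=lambda p: p[1])
```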
SU-CS361 APR112024
Hyper-gradient Descent
Adapt the execution of gradient descent to Hyper-gradient Descent! Recall the Descent Direction Iteration update rule:
\begin{equation} x^{(k+1)} = x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)}) \end{equation}
For learning rate \(\alpha\), what if we write:
\begin{equation} \pdv{f\qty(x^{(k+1)})}{\alpha} = \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \end{equation}
The first factor is just \(f'(x^{(k+1)}) = \nabla_{x}f\qty(x^{(k+1)})\). For the second factor, recall that \(x^{(k+1)} = x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)})\), so \(\pdv{x^{(k+1)}}{\alpha} = \pdv{\alpha} \qty(x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)}))\), which evaluates to simply \(-\nabla_{x}f\qty(x^{(k)})\).
Therefore:
\begin{align} \pdv{f\qty(x^{(k+1)})}{\alpha} &= \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \\ &= \nabla_{x}f\qty(x^{(k+1)}) \cdot \qty(-\nabla_{x}f\qty(x^{(k)})) \end{align}
And so, to update our step size/learning rate, we can take a gradient step on \(\alpha\) itself with a hyper learning rate \(\mu\):
\begin{equation} \alpha^{(k+1)} = \alpha^{(k)} - \mu \pdv{f\qty(x^{(k+1)})}{\alpha} = \alpha^{(k)} + \mu\, \nabla_{x}f\qty(x^{(k+1)}) \cdot \nabla_{x}f\qty(x^{(k)}) \end{equation}
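A minimal sketch of this scheme in code (the toy objective, initial values, and \(\mu\) below are assumptions for illustration):

```python
import numpy as np

def hypergradient_descent(grad, x0, alpha=0.01, mu=1e-4, n_steps=100):
    """Gradient descent that also adapts its own learning rate alpha
    via the hyper-gradient derived above (mu is the hyper learning rate)."""
    x = np.asarray(x0, dtype=float)
    g_prev = grad(x)  # gradient at x^(k)
    for _ in range(n_steps):
        x = x - alpha * g_prev  # x^(k+1) = x^(k) - alpha * grad f(x^(k))
        g = grad(x)             # gradient at x^(k+1)
        # df/dalpha = -grad f(x^(k+1)) . grad f(x^(k)); step alpha against it
        alpha = alpha + mu * g.dot(g_prev)
        g_prev = g
    return x, alpha

# usage on an assumed toy objective f(x) = ||x||^2, so grad f(x) = 2x
x_min, alpha_final = hypergradient_descent(lambda x: 2 * x, x0=[3.0, -2.0])
```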
