SU-CS361 APR112024
Hyper-gradient Descent
Adapt the execution of gradient descent into Hyper-gradient Descent! Recall the Descent Direction Iteration update rule:

\begin{equation} x^{(k+1)} = x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)}) \end{equation}
For LR \(\alpha\), what if we write:
\begin{equation} \pdv{f\qty(x^{(k+1)})}{\alpha} = \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \end{equation}
The first factor on the right is just \(f'\qty(x^{(k+1)}) = \nabla_{x}f\qty(x^{(k+1)})\). The second factor is \(\pdv{x^{(k+1)}}{\alpha} = \pdv{}{\alpha} \qty(x^{(k)} - \alpha \nabla_{x} f\qty(x^{(k)}))\), which evaluates to simply \(-\nabla_{x}f\qty(x^{(k)})\).
Therefore:
\begin{align} \pdv{f\qty(x^{(k+1)})}{\alpha} &= \pdv{f\qty(x^{(k+1)})}{x^{(k+1)}} \pdv{x^{(k+1)}}{\alpha} \\ &= \nabla_{x}f\qty(x^{(k+1)}) \cdot \qty(-\nabla_{x}f\qty(x^{(k)})) \end{align}
And so, to update our step size/learning rate, we can take a gradient step on \(\alpha\) itself using a hyper learning rate \(\mu\):

\begin{equation} \alpha^{(k+1)} = \alpha^{(k)} - \mu \pdv{f\qty(x^{(k+1)})}{\alpha} = \alpha^{(k)} + \mu\, \nabla_{x}f\qty(x^{(k+1)}) \cdot \nabla_{x}f\qty(x^{(k)}) \end{equation}
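A minimal sketch of this in Python (the function name, default values, and the hyper learning rate \(\mu\) are illustrative assumptions, not from the notes):

```python
import numpy as np

def hypergradient_descent(grad_f, x, alpha=0.01, mu=1e-4, n_iters=100):
    """Gradient descent that also adapts its own learning rate alpha."""
    g_prev = grad_f(x)          # gradient at x^(0)
    x = x - alpha * g_prev      # one ordinary step to get x^(1)
    for _ in range(n_iters):
        g = grad_f(x)
        # hypergradient: d f(x^(k+1)) / d alpha = -(g . g_prev),
        # so stepping alpha against it adds mu * (g . g_prev)
        alpha = alpha + mu * np.dot(g, g_prev)
        x = x - alpha * g
        g_prev = g
    return x, alpha
```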
SU-CS361 APR162024
Stochastic Methods
Stochastic Methods use randomness strategically to escape local minima. These methods typically rely on pseudo-random number generation.
Noisy Descent
Gradient descent, but with a bit of noise added to each step:

\begin{equation} \bold{x} \leftarrow \bold{x} - \alpha \nabla f\qty(\bold{x}) + \epsilon \end{equation}
where:
\begin{equation} \epsilon \sim \mathcal{N}(0, \lambda I) \end{equation}
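A minimal sketch of Noisy Descent in Python (the function name, defaults, and a fixed \(\lambda\) are illustrative assumptions):

```python
import numpy as np

def noisy_descent(grad_f, x, alpha=0.1, lam=0.01, n_iters=1000, seed=0):
    """Gradient descent with Gaussian noise epsilon ~ N(0, lam * I) added to each step."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        eps = rng.normal(0.0, np.sqrt(lam), size=len(x))  # draw from N(0, lam * I)
        x = x - alpha * grad_f(x) + eps
    return x
```

In practice \(\lambda\) is often decayed over the iterations so that the noise shrinks as the search settles.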
Mesh Adaptive Direct Search
This is like Generalized Pattern Search, but instead of a fixed positive spanning set, we randomize the directions of the spanning vectors at every single step; after polling along those randomized directions, we expand or shrink the step size as needed.
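A simplified sketch of the idea in Python (it re-randomizes a positive spanning set each step and expands/shrinks the step size, but omits the true mesh construction of MADS; the names and expansion/contraction factors are illustrative assumptions):

```python
import numpy as np

def rand_positive_spanning_set(n, rng):
    """n random orthonormal directions plus the negative of their sum positively span R^n."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return [Q[:, i] for i in range(n)] + [-Q.sum(axis=1)]

def mads_like_search(f, x, alpha=1.0, n_iters=100, seed=0):
    """Direct search with re-randomized spanning directions at every iteration."""
    rng = np.random.default_rng(seed)
    y = f(x)
    for _ in range(n_iters):
        improved = False
        for d in rand_positive_spanning_set(len(x), rng):  # new directions each step
            x_new = x + alpha * d
            y_new = f(x_new)
            if y_new < y:                 # successful poll: accept and expand
                x, y = x_new, y_new
                alpha = min(4 * alpha, 1.0)
                improved = True
                break
        if not improved:                  # failed poll: shrink the step size
            alpha /= 4
    return x, y
```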
SU-CS361 APR182024
constraint
Recall constraints; a general constraint means that we minimize \(f\) while only selecting \(x\) from within a feasible set, \(x \in \mathcal{X}\).
active constraint
an “active constraint” is a constraint which, upon application, changes the solution to be different from the unconstrained solution. This is always true of equality constraints, and not necessarily true of inequality constraints. For example, minimizing \(x^{2}\) subject to \(x \geq 1\) gives \(x^{*} = 1\) (the constraint is active), while minimizing it subject to \(x \geq -1\) gives the unconstrained minimizer \(x^{*} = 0\) (the constraint is inactive).
types of constraints
We can write all types of optimization problems in terms of two types of constraints; we will use these conventions EXACTLY:
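Concretely, these are the standard conventions (a sketch of the general form; the index ranges here are illustrative): inequality constraints are written as \(g(x) \leq 0\), and equality constraints are written as \(h(x) = 0\):

\begin{align} \min_{x} \quad & f(x) \\ \text{subject to} \quad & g_{i}(x) \leq 0, \quad i = 1, \ldots, m \\ & h_{j}(x) = 0, \quad j = 1, \ldots, \ell \end{align}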
SU-CS361 APR252024
SU-CS361 APR302024
Multi-Objective Optimization
- identify non-dominated individuals (individuals which, across all objectives, are not dominated by any other individual); these form the “Pareto frontier” (see the sketch after the dominate definition below)
- create all combinations of input parameters, and compute the Pareto frontier over them
- identify a weighting between the objectives you desire, and identify the elements along the Pareto frontier which align with it
Pareto Optimality
At a Pareto optimal point, you cannot improve one objective value without worsening another. That is, a Pareto optimal point is not dominated.
dominate
one point dominates another if it is no worse in every objective and strictly better in at least one. Formally (taking every objective to be minimized), \(x\) dominates \(x'\) if:

\begin{equation} f_{i}(x) \leq f_{i}(x') \text{ for all } i, \qquad \text{and} \qquad f_{j}(x) < f_{j}(x') \text{ for at least one } j \end{equation}
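A minimal sketch of this dominance check and a naive Pareto-frontier filter in Python (the function names and example objective vectors are illustrative assumptions; every objective is assumed to be minimized):

```python
import numpy as np

def dominates(y, y_other):
    """True if objective vector y dominates y_other: no worse everywhere, strictly better somewhere."""
    y, y_other = np.asarray(y), np.asarray(y_other)
    return bool(np.all(y <= y_other) and np.any(y < y_other))

def pareto_frontier(ys):
    """Indices of non-dominated objective vectors among ys."""
    return [i for i, y in enumerate(ys)
            if not any(dominates(other, y) for j, other in enumerate(ys) if j != i)]

# Example: four candidate designs, two objectives to minimize
ys = [(1.0, 4.0), (2.0, 3.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_frontier(ys))  # [0, 1, 3] -- (3.0, 3.0) is dominated by (2.0, 3.0)
```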