SU-CS361 MAY072024
Generalization Error
\begin{equation} \epsilon_{gen} = \mathbb{E}_{x \sim \mathcal{X}} \qty[\qty(f(x) - \hat{f}(x))^{2}] \end{equation}
We usually cannot compute this expectation exactly; instead, we estimate it by averaging over the specific points we measured.
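Concretely, assuming \(m\) measured points \(x^{(1)}, \dots, x^{(m)}\) (notation introduced here for illustration), the estimate is the sample mean:
\begin{equation} \epsilon_{gen} \approx \frac{1}{m} \sum_{i=1}^{m} \qty(f\qty(x^{(i)}) - \hat{f}\qty(x^{(i)}))^{2} \end{equation}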
Probabilistic Surrogate Models
Gaussian Process
A Gaussian Process is a Gaussian distribution over functions!
Consider a mean function \(m(x)\), a covariance (kernel) function \(k(x, x')\), and a set of objective values \(y_{j} \in \mathbb{R}\) which we are trying to infer using \(m\) and \(k\):
\begin{equation} \mqty[y_1 \\ \vdots \\ y_{m}] \sim \mathcal{N} \qty(\mqty[m(x_1) \\ \vdots \\ m(x_{m})], \mqty[k(x_1, x_1) & \dots & k(x_1, x_{m}) \\ \vdots & \ddots & \vdots \\ k(x_{m}, x_{1}) & \dots & k(x_{m}, x_{m})]) \end{equation}
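To make this concrete, here is a minimal Python sketch (assuming a zero mean function and a squared-exponential kernel, neither of which the notes fix) that builds the mean vector and kernel matrix above and draws sample functions from the prior:

```python
import numpy as np

def squared_exp_kernel(x, xp, ell=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 * ell^2)), one common kernel choice.
    return np.exp(-np.sum((x - xp) ** 2) / (2 * ell**2))

def gp_prior_sample(X, m, k, n_samples=3):
    # Mean vector [m(x_1), ..., m(x_m)] and kernel matrix [k(x_i, x_j)].
    mu = np.array([m(x) for x in X])
    K = np.array([[k(xi, xj) for xj in X] for xi in X])
    # Small jitter on the diagonal keeps K numerically positive definite.
    K += 1e-8 * np.eye(len(X))
    # Each draw from this multivariate normal is one sampled function,
    # evaluated at the points in X.
    return np.random.multivariate_normal(mu, K, size=n_samples)

X = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
samples = gp_prior_sample(X, m=lambda x: 0.0, k=squared_exp_kernel)
```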
SU-CS361 MAY092024
optimization uncertainty
- irreducible uncertainty: uncertainty inherent to a system
- epistemic uncertainty: subjective lack of knowledge about a system from our standpoint
Uncertainty can be represented as a vector of random variables, \(z\), over which the designer has no control. Feasibility of a design point then depends on \((x, z) \in \mathcal{F}\), where \(\mathcal{F}\) is the feasible set of design points.
set-based uncertainty
set-based uncertainty treats the uncertainty \(z\) as belonging to some set \(\mathbf{Z}\), which means we typically solve a minimax problem.
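In its standard worst-case form (a sketch assuming a generic objective \(f(x, z)\)), this minimizes the objective under the worst realization of the uncertainty:
\begin{equation} \underset{x}{\text{minimize}} \ \underset{z \in \mathbf{Z}}{\text{maximize}} \ f(x, z) \end{equation}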
SU-CS361 Midterm 1 Review
- error complexity in big \(O\) of the finite-difference midpoint (central), forward, and backward difference methods, and what each of these is (see the sketch after this list)
- Fibonacci Search and Golden Section Search
- Bisection Method
- Shubert-Piyavskii Method
- Trust Region Methods equations
- Secant Method
- the Nelder-Mead Simplex Method chart
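For the finite-difference review item above, a minimal Python sketch (hypothetical helper names, not from the course materials) of the three approximations; forward and backward differences carry \(O(h)\) error, while the central (midpoint) difference carries \(O(h^{2})\):

```python
import numpy as np

def forward_diff(f, x, h=1e-5):
    # Forward difference: O(h) error.
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h=1e-5):
    # Backward difference: O(h) error.
    return (f(x) - f(x - h)) / h

def central_diff(f, x, h=1e-5):
    # Central (midpoint) difference: O(h^2) error.
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: derivative of sin at x = 1; the exact value is cos(1).
for approx in (forward_diff, backward_diff, central_diff):
    print(approx.__name__, approx(np.sin, 1.0) - np.cos(1.0))
```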
SU-CS361 Multi-Objective Optimization, Sampling, Surrogates
SU-CS361 Project Proposal
Introduction
Reinforcement Learning from Human Feedback (RLHF) has proven highly effective for aligning a language model (LM) with unsupervised human preference judgements of LM output trajectories (Ziegler et al. 2020). However, the original RLHF formulation showed little direct improvement in the model's toxicity without further prompting, though it conferred some advantage when the model was specifically prompted to be respectful (Ouyang et al. 2022).
To specifically target the reduction of harmful, toxic LM output, varying approaches have been explored: via contrastive learning (Lu et al., n.d.), or, through a combination of instruction-following RLHF and in-context learning (i.e., prompting), via sampling and self-correcting LM output trajectories (Ganguli et al. 2023).
