SU-CS361 Multi-Objective Optimization, Sampling, Surrogates
SU-CS361 Project Proposal
Introduction
Reinforcement Learning from Human Feedback (RLHF) has proven highly effective for aligning a language model (LM) with human preference judgements over LM output trajectories (Ziegler et al. 2020). However, the original RLHF formulation showed little direct improvement in model toxicity without further prompting, conferring an advantage only when the model was explicitly prompted to be respectful (Ouyang et al. 2022).
To specifically target the reduction of toxicity and harmfulness in LM output, a variety of approaches have been explored: contrastive learning (Lu et al., n.d.), or, through a combination of instruction-following RLHF and in-context learning (i.e. prompting), sampling and having the LM self-correct its own output trajectories (Ganguli et al. 2023).
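As a concrete anchor for the preference-alignment setup described above, here is a minimal sketch, not taken from the cited papers, of the pairwise (Bradley-Terry-style) reward-model loss that RLHF pipelines typically fit before optimizing the LM against the learned reward; the function name preference_loss and the toy scores are illustrative assumptions.

```python
import numpy as np

def preference_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise (Bradley-Terry-style) loss for fitting an RLHF reward model.

    r_chosen / r_rejected are scalar reward-model scores for the preferred
    and dispreferred output trajectories of the same prompt. Minimizing the
    negative log-sigmoid of their difference pushes preferred outputs to
    score higher than dispreferred ones.
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)) written in a numerically stable form: log(1 + exp(-margin))
    return float(np.mean(np.logaddexp(0.0, -margin)))

# Toy example: hypothetical reward-model scores for three preference pairs.
chosen = np.array([1.2, 0.4, 2.0])     # scores for preferred outputs
rejected = np.array([0.3, 0.9, -1.0])  # scores for dispreferred outputs
print(preference_loss(chosen, rejected))
```

The fitted reward model then serves as the scalar objective that the policy-optimization stage (PPO with a KL penalty toward the original LM, in Ziegler et al. 2020) maximizes.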
SU-CS361 Stochastic Methods, Population Methods, and Constraints Index
SU-CS361: Derivatives, Bracketing, Descent, and Approximation Index
- Formal Formulation of Optimization
- constraint
- types of conditions
- Derivatives
- Directional Derivatives
- numerical methods
- exact methods: autodiff
- Bracketing (one-dimensional optimization schemes)
- Descent Direction Iteration
- First-Order Methods
- Second-Order Methods
- Newton’s Method
- or approximate the second derivative using the Secant Method (see the sketch after this index)
- Direct Methods
- Cyclic Coordinate Search
- Accelerated Coordinate Search
- Powell’s Method
- Hooke-Jeeves Search
- Generalized Pattern Search
- opportunistic search
- dynamic ordering
- Nelder-Mead Simplex Method
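As referenced under Newton's Method above, the following is a minimal one-dimensional sketch, written for illustration rather than taken from the course notes, of Newton's update for minimization alongside a secant variant that approximates the second derivative from successive first derivatives; the function names and toy objective are assumptions.

```python
def newton_minimize(df, ddf, x0, tol=1e-8, max_iter=100):
    """1D Newton's method for minimization: root-find f'(x) = 0
    using the update x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / ddf(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def secant_minimize(df, x0, x1, tol=1e-8, max_iter=100):
    """Secant variant: approximate f''(x) by the finite difference
    (f'(x1) - f'(x0)) / (x1 - x0), avoiding second derivatives."""
    for _ in range(max_iter):
        g0, g1 = df(x0), df(x1)
        if g1 == g0:
            break
        x2 = x1 - g1 * (x1 - x0) / (g1 - g0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Toy example: minimize f(x) = (x - 3)^2 + 1, so f'(x) = 2(x - 3), f''(x) = 2.
print(newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0))  # ~3.0
print(secant_minimize(lambda x: 2 * (x - 3), x0=0.0, x1=1.0))         # ~3.0
```

The secant variant trades Newton's quadratic convergence for a slower, superlinear rate in exchange for never evaluating the second derivative.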