Posts

Thoughts on Axler 4

Last edited: August 8, 2025

Because this chapter is not about linear algebra, your instructor may go through it rapidly. You may not be asked to scrutinize all the proofs. Make sure, however, that you at least read and understand the statements of all the results in this chapter—they will be used in later chapters.

So we are not going to go through everything very carefully. Instead, I’m just going to work through some interesting results at my leisure. This also means that this note is not very complete.

Tiago Forte

Last edited: August 8, 2025

A

Time Complexity

Last edited: August 8, 2025

A time complexity is a function \(t(n): \mathbb{N} \to \mathbb{N}\), and we define:

\begin{equation} \text{TIME}\qty(t(n)) = \qty {\text{languages } L \mid\ \exists \text{TM that decides $L$ in $O(t(n))$ steps} } \end{equation}

Notice! The Big-O is baked into the definition. And we write:

\begin{equation} P = \bigcup_{c \in \mathbb{N}} \text{TIME}\qty(n^{c}) \end{equation}

problems in this class are the ones we call “efficient” (or “feasible”).

\begin{equation} EXP = \bigcup_{c \in \mathbb{N}} \text{TIME}\qty(2^{n^{c}}) \end{equation}

and adding in non-determinism, we have:

\begin{equation} \text{NTIME}\qty(t(n)) = \qty {\text{languages } L \mid\ \exists \text{NTM that decides $L$ in $O(t(n))$ steps} } \end{equation}
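As a concrete illustration of a language in one of these classes, the textbook example \(\qty{0^k 1^k \mid k \geq 0}\) is decidable in polynomially many steps (the function below is a sketch; a single-tape TM needs more steps than this Python scan, but still polynomially many, so the language is in P):

```python
def decide_zeros_then_ones(w):
    """Decider for L = { 0^k 1^k : k >= 0 }.

    One left-to-right scan plus a length comparison: O(n) steps here,
    and polynomially many on a Turing machine, so L is in P.
    """
    n = len(w)
    zeros = 0
    while zeros < n and w[zeros] == "0":
        zeros += 1
    # everything after the zeros must be 1s, and the counts must match
    return w[zeros:] == "1" * (n - zeros) and zeros == n - zeros
```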

tokenization

Last edited: August 8, 2025

Every NLP task involves some kind of text normalization.

  1. tokenizing words
  2. normalizing word formats (lemmatize?)
  3. sentence and paragraph segmentation
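The three steps above can be sketched as a toy pipeline (the regexes and the lowercasing stand-in for lemmatization are my own assumptions, not a production normalizer):

```python
import re

def normalize(text):
    """Toy normalization: segment sentences, tokenize, normalize case."""
    # sentence segmentation: naive split after ., !, or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    result = []
    for sent in sentences:
        # tokenize words: runs of word characters, or single punctuation marks
        tokens = re.findall(r"\w+|[^\w\s]", sent)
        # normalize word formats: lowercase as a crude stand-in for lemmatization
        result.append([t.lower() for t in tokens])
    return result
```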

For writing systems such as Latin, Arabic, Cyrillic, and Greek, spaces can usually be used for tokenization. Other writing systems don’t have this property. See morpheme

Subword Tokenization

Algorithms for breaking up tokens using corpus statistics, acting below the word level.

  • BPE
  • Unigram Language Modeling tokenization
  • WordPiece

They all work in two parts:

  • a token learner, which takes a training corpus and derives a vocabulary set
  • a token segmenter, which tokenizes text according to that vocab
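This learner/segmenter split can be sketched with a minimal BPE implementation (illustrative only; the end-of-word marker `_` and the tie-breaking are my own simplifications):

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Token learner: derive an ordered list of merge rules from a corpus."""
    # represent each word as a tuple of symbols, with an end-of-word marker
    words = Counter(tuple(w) + ("_",) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # apply the best merge everywhere it occurs
        new_words = Counter()
        for word, freq in words.items():
            new_words[tuple(merge_pair(list(word), best))] += freq
        words = new_words
    return merges

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def segment(word, merges):
    """Token segmenter: apply the learned merges, in order, to a new word."""
    symbols = list(word) + ["_"]
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols
```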

See also Downsides of Subword Tokenization

top-down parsing

Last edited: August 8, 2025

recursive descent parsing

Given the grammar:

E -> T | T + E
T -> int | int * T | (E)

And suppose our token stream is: (int5) We will start with the top-level non-terminal E, and try to apply the rules for E in order…

(this is exactly how you imagine one builds a parser via recursive backtracking)
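That recursive-backtracking idea can be written as a small recognizer for the grammar above (a sketch; the token names `"int"`, `"+"`, `"*"`, `"("`, `")"` are assumptions, and generators give us the backtracking over alternatives):

```python
def parse(tokens):
    """Return True iff `tokens` (a list of strings) derives from E."""

    def E(i):
        # E -> T
        yield from T(i)
        # E -> T + E
        for j in T(i):
            if j < len(tokens) and tokens[j] == "+":
                yield from E(j + 1)

    def T(i):
        if i < len(tokens) and tokens[i] == "int":
            # T -> int
            yield i + 1
            # T -> int * T
            if i + 1 < len(tokens) and tokens[i + 1] == "*":
                yield from T(i + 2)
        if i < len(tokens) and tokens[i] == "(":
            # T -> ( E )
            for j in E(i + 1):
                if j < len(tokens) and tokens[j] == ")":
                    yield j + 1

    # accept iff some derivation consumes the whole token stream
    return any(j == len(tokens) for j in E(0))
```

Each non-terminal yields every position it can parse up to; trying each yield in turn is exactly the backtracking over the rule alternatives.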

when does it break?

Consider S -> S a: to try this rule, the parser for S first calls the parser for S again without consuming any input, so it recurses forever. Recursive descent cannot handle left-recursive grammars directly; they must first be rewritten via left-recursion elimination.
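A sketch of the textbook rewrite (the base production b and the fresh non-terminal S' are added here for concreteness; they are not from the note):

```
S  -> S a | b       (left-recursive: loops forever in recursive descent)

    becomes

S  -> b S'
S' -> a S' | ε
```

Now every production consumes a token before any recursive call, so the parser terminates.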