
SU-CS224N MAY092024

Last edited: August 8, 2025

Floating Point

4 bytes

\begin{equation} (-1)^{B} \times 2^{E-127} \times \qty(1 + \sum_{i=1}^{23} b_{23-i}2^{-i}) \end{equation}

usually \(E\) is 8 bits and there are 23 bits of \(b\) (plus the 1 sign bit \(B\), for 4 bytes total).

With more \(E\) bits we get more dynamic range; with more \(b\) bits we get more precision.
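As a quick sanity check of the formula, here is a minimal Python sketch (not part of the original notes) that unpacks a 32-bit float into its sign, exponent, and mantissa fields and recomputes the value:

```python
import struct

def decode_fp32(x: float):
    # Raw 32-bit pattern of the IEEE 754 single-precision representation
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31               # 1 sign bit (B)
    exponent = (bits >> 23) & 0xFF  # 8 exponent bits (E), biased by 127
    mantissa = bits & 0x7FFFFF      # 23 fraction bits (b)
    value = (-1) ** sign * 2 ** (exponent - 127) * (1 + mantissa / 2 ** 23)
    return sign, exponent, mantissa, value

print(decode_fp32(6.5))  # (0, 129, 5242880, 6.5): 6.5 = 1.625 x 2^2
```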

Mixed Precision Training

  1. Keep a master copy of the model weights in FP32
  2. Run the forward pass in FP16
  3. Scale the loss up so that small gradient values are not rounded away in FP16
  4. Compute gradients in FP16
  5. Convert the gradients to FP32
  6. Scale the gradients back down
  7. Apply the update to the FP32 master weights (see the sketch after this list)
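A minimal PyTorch sketch of this recipe using torch.cuda.amp, assuming a CUDA device; the model, data, and hyperparameters below are placeholders, not from the lecture:

```python
import torch

# FP32 master copy of the model and optimizer state (step 1)
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # manages the loss scale (steps 3 and 6)

def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():  # forward pass in FP16 (step 2)
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # scale the loss, backprop in FP16 (steps 3-4)
    scaler.step(optimizer)         # unscale grads to FP32, apply the update (steps 5-7)
    scaler.update()                # adjust the scale factor for the next iteration
    return loss.item()

x = torch.randn(32, 1024, device="cuda")
y = torch.randn(32, 1024, device="cuda")
print(train_step(x, y))
```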

BFloat16

To avoid the need for loss scaling, we can use a scheme with less precision but the same dynamic range as FP32 (i.e. allocate the same number of \(E\) bits and chop off \(b\) bits): with the full dynamic range available, gradients no longer need to be scaled.
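A quick way to see this trade-off (a hedged example, not from the lecture) is to compare the dtype limits PyTorch reports:

```python
import torch

# bfloat16 keeps FP32's 8 exponent bits (same dynamic range) but only
# 7 mantissa bits (less precision, i.e. larger eps).
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(f"{str(dtype):>15}  max={info.max:.3e}  eps={info.eps:.3e}")
# float16 overflows past ~6.6e4, while bfloat16 reaches ~3.4e38 like float32,
# which is why bfloat16 training typically needs no loss scaling.
```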

SU-CS224N Paper Review

Last edited: August 8, 2025

Key Information

  • Title: Fine-Grained Language Model Detoxification with Dense, Token-Level Rewards
  • Team Member (in 224n): Houjun Liu <[email protected]>
  • External Collaborators: Amelia Hardy <[email protected]>, Bernard Lange <[email protected]>
  • Custom Project
  • Mentor: we have no particular mentor within 224n
  • Sharing Project: this project is shared with AA222, and is a part of a research project PI’d by Mykel Kochenderfer <[email protected]>, of which Houjun is taking a leading role

Research Paper Summary

  • Title: Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
  • Venue: NeurIPS (Spotlight)
  • Year: 2023
  • URL: https://arxiv.org/pdf/2306.01693

Background

Reinforcement Learning with Human Feedback (RLHF) has proven highly effective at improving language model (LM) performance via human preference judgments of LM output desirability, reducing the incidence of toxic or false generation trajectories (Ziegler et al. 2020). Naive application of RLHF has shown success in reducing the toxicity of language model outputs, yet its effects can be inconsistent without further in-context guidance of the resulting model (Ouyang et al. 2022).

SU-CS229 JAN082025

Last edited: August 8, 2025

Notation

New Concepts

Important Results / Claims

Questions

  • Why can’t we use root-mean-square error for the training objective? It seems like it’s just more normalization…

SU-CS238 NOV022023

Last edited: August 8, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids