_index.org

EMNLP2025 Zhang: Diffusion vs. Autoregression Language Models

Last edited: November 11, 2025

One-Liner

Novelty

Notable Methods

Key Figs

New Concepts

Notes

EMNLP2025: MUSE, MCTS-Driven Red Teaming

Last edited: November 11, 2025

One-Liner

Notable Methods

  1. construct a series of perturbation actions
    • \(A\qty(s)\) = decomposition (skip), expansion (rollout), redirection
  2. sequence actions with MCTS
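
A rough sketch (my own illustration, not MUSE's actual implementation) of sequencing these perturbation actions with UCT-style MCTS; apply_action, attack_success_score, and all hyperparameters below are placeholder assumptions.

#+begin_src python
import math
import random

# The three perturbation actions from A(s) above; how MUSE actually
# implements each rewrite is not captured in these notes, so they are
# stand-in labels here.
ACTIONS = ["decompose", "expand", "redirect"]

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Standard UCT score; unvisited children get explored first.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def apply_action(state, action):
    # Placeholder: a real red-teaming system would rewrite the prompt here.
    return state + [action]

def attack_success_score(state):
    # Placeholder reward: a real system would query the target model and
    # score how policy-violating its response is.
    return random.random()

def mcts(root_state, iterations=200, max_depth=4):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion: grow one level of perturbation actions.
        if node.visits > 0 and len(node.state) < max_depth:
            node.children = [Node(apply_action(node.state, a), node, a)
                             for a in ACTIONS]
            node = random.choice(node.children)
        # Simulation + backup of the reward along the path.
        reward = attack_success_score(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).action

print(mcts([]))  # most-visited first perturbation action
#+end_src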

Key Figs

New Concepts

Notes

EMNLP2025 Keynote: Heng Ji

Last edited: November 11, 2025

Motivation: drug discovery is extremely slow and expensive, and mostly consists of modulating previous iterations of work.

Principles of Drug Discovery

  • observation: acquire/fuse knowledge from multiple data modalities (sequence, structure, etc.)
  • think: critically generate genuinely new hypotheses, iterating on them
  • allow LMs to code-switch between modalities (i.e., fuse different modalities together in the most uniform way)

An LM used as a heuristic helps prune down the search space quickly.
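
A minimal sketch of the "LM as a heuristic" idea (my own illustration, not from the talk): beam search over candidate hypotheses, where a hypothetical lm_score stands in for prompting an LM to rate how promising each candidate is.

#+begin_src python
import heapq

def lm_score(candidate: str) -> float:
    # Placeholder: in practice this would prompt an LM to rate how
    # promising the candidate hypothesis (e.g. a molecule edit) is.
    return -len(candidate)

def expand(candidate: str) -> list[str]:
    # Placeholder: enumerate neighboring candidates in the search space.
    return [candidate + suffix for suffix in ("A", "B", "C")]

def pruned_search(start: str, beam_width: int = 2, depth: int = 3) -> list[str]:
    # Beam search: at each level, keep only the beam_width candidates the
    # LM scores highest, instead of exploring the full exponential tree.
    frontier = [start]
    for _ in range(depth):
        children = [child for cand in frontier for child in expand(cand)]
        frontier = heapq.nlargest(beam_width, children, key=lm_score)
    return frontier

print(pruned_search(""))
#+end_src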

SU-CS229 Midterm Sheet

Last edited: November 11, 2025

backpropagation

Last edited: October 10, 2025

backpropagation is a special case of “backward differentiation” applied to a computation graph.

constituents

  • chain rule: suppose \(J=J\qty(g_{1} \dots g_{k}), g_{i} = g_{i}\qty(\theta_{1} \dots \theta_{p})\), then \(\pdv{J}{\theta_{i}} = \sum_{j=1}^{k} \pdv{J}{g_{j}} \pdv{g_{j}}{\theta_{i}}\) (worked example below)
  • a neural network, viewed as a computation graph
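
As a concrete instance of the chain rule above (a made-up example, not from the notes): take \(k=2\), \(g_{1} = \theta^{2}\), \(g_{2} = 3\theta\), and \(J = g_{1}g_{2} = 3\theta^{3}\); then

\begin{equation} \pdv{J}{\theta} = \pdv{J}{g_{1}} \pdv{g_{1}}{\theta} + \pdv{J}{g_{2}} \pdv{g_{2}}{\theta} = g_{2} \cdot 2\theta + g_{1} \cdot 3 = 6\theta^{2} + 3\theta^{2} = 9\theta^{2} \end{equation}

which matches differentiating \(J = 3\theta^{3}\) directly.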

requirements

Consider the notation in the following two layer NN:

\begin{equation} z = w^{(1)} x + b^{(1)} \end{equation}

\begin{equation} a = \text{ReLU}\qty(z) \end{equation}

\begin{equation} h_{\theta}\qty(x) = w^{(2)} a + b^{(2)} \end{equation}

\begin{equation} J = \frac{1}{2}\qty(y - h_{\theta}\qty(x))^{2} \end{equation}


  1. in a forward pass, compute each intermediate value \(z^{(1)}, a^{(1)}, \ldots\)
  2. in a backward pass, compute…
    1. \(\pdv{J}{z^{(f)}}\): by hand
    2. \(\pdv{J}{a^{(f-1)}}\): lemma 3 below
    3. \(\pdv{J}{z^{(f-1)}}\): lemma 2 below
    4. \(\pdv{J}{a^{(f-2)}}\): lemma 3 below
    5. \(\pdv{J}{z^{(f-2)}}\): lemma 2 below
    6. and so on… until we get to the first layer
  3. after obtaining all of these, we compute the weight matrices' gradients (numeric sketch after this list)
    1. \(\pdv{J}{W^{(f)}}\): lemma 1 below
    2. \(\pdv{J}{W^{(f-1)}}\): lemma 1 below
    3. …, until we get to the first layer
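
A numeric sketch of the whole procedure on the two-layer network above, written with NumPy. The shapes, variable names, and the finite-difference check are illustrative assumptions (not from the notes); the comments map each step to the lemmas listed below.

#+begin_src python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (assumed): 3-dim input, 4 hidden units, scalar output.
x = rng.standard_normal((3, 1))
y = np.array([[1.0]])
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))

# 1. forward pass: cache every intermediate value
z = W1 @ x + b1
a = np.maximum(z, 0)                  # ReLU
h = W2 @ a + b2
J = 0.5 * ((y - h) ** 2).item()

# 2. backward pass, last layer to first
dJ_dh = h - y                         # dJ/dh, computed by hand
dJ_da = W2.T @ dJ_dh                  # dJ/da (lemma-3-style step)
dJ_dz = dJ_da * (z > 0)               # dJ/dz (lemma-2-style step: ReLU mask)

# 3. gradients for the weight matrices (lemma-1-style outer products)
dJ_dW2, dJ_db2 = dJ_dh @ a.T, dJ_dh
dJ_dW1, dJ_db1 = dJ_dz @ x.T, dJ_dz

# sanity check: finite difference on one entry of W1
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
Jp = 0.5 * ((y - (W2 @ np.maximum(W1p @ x + b1, 0) + b2)) ** 2).item()
print(dJ_dW1[0, 0], (Jp - J) / eps)   # the two numbers should roughly agree
#+end_src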

chain rule lemmas

Pattern match your expressions against these, from the last layer to the first layer, to amortize computation.