Posts

SU-CS229 Andrew's Advice

Last edited: November 11, 2025

“How quickly can we prototype?” 2-7 days.

SU-CS229 NOV062025

Last edited: November 11, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

229 MDP notation

\(S\) (states), \(A\) (actions), \(P_{s,a}\qty(s') = T\qty(s' | s,a)\) (transition probabilities), \(\gamma\) (discount), \(R\qty(s,a)\) (reward).

FUN FACT: a discount factor \(\gamma < 1\) makes value iteration converge (the Bellman backup is then a \(\gamma\)-contraction in the max norm).

\begin{equation} V^{\pi}\qty(s) = \mathbb{E}\qty[R\qty(s_{0},a_{0}) + \gamma R\qty(s_{1}, a_{1}) + \gamma^{2} R\qty(s_{2}, a_{2}) + \dots \mid s_{0} = s, \pi] \end{equation}

\begin{equation} V^{\pi}\qty(s) = R\qty(s, \pi\qty(s)) + \gamma \sum_{s'} P_{s,\pi\qty(s)}\qty(s') V^{\pi}\qty(s') \end{equation}
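The Bellman equation above can be turned directly into an update rule and iterated to a fixed point. A minimal sketch in plain Python, using the \(R\qty(s,a)\) form from the notation line; the two-state MDP, the dictionary layout, and the function name `evaluate_policy` are all made up for illustration and are not from the lecture.

```python
def evaluate_policy(states, policy, reward, transition, gamma,
                    tol=1e-10, max_iters=10_000):
    """Iterate V(s) <- R(s, pi(s)) + gamma * sum_s' P_{s,pi(s)}(s') V(s')."""
    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_values = {}
        for s in states:
            a = policy[s]
            # Expected value of the next state under the policy's action.
            expected_next = sum(
                p * values[s2] for s2, p in transition[(s, a)].items()
            )
            new_values[s] = reward[(s, a)] + gamma * expected_next
        # Largest change in any state's value during this sweep.
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    return values


# Hypothetical two-state MDP (illustrative numbers only).
states = ["s0", "s1"]
policy = {"s0": "stay", "s1": "stay"}
reward = {("s0", "stay"): 1.0, ("s1", "stay"): 0.0}
transition = {
    ("s0", "stay"): {"s0": 0.5, "s1": 0.5},
    ("s1", "stay"): {"s1": 1.0},
}

values = evaluate_policy(states, policy, reward, transition, gamma=0.9)
```

For this toy MDP the fixed point can be checked by hand: \(V^{\pi}\qty(s_1) = 0\) and \(V^{\pi}\qty(s_0) = 1 / \qty(1 - 0.9 \cdot 0.5) = 1/0.55\). With \(\gamma < 1\) each sweep shrinks the error by at least a factor of \(\gamma\), which is why the loop terminates.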

EMNLP2025 Eo: Expert Generalization in MoE in IFT

Last edited: November 11, 2025

One-Liner

cluster the input, then activate a separate expert group for each cluster.

Motivation

  • heterogeneity of input instruction tuning data poses difficulty for MoE
  • routing operates only at the token level, so it can’t handle sequence-level generalization

Novelty

Architecture to enable hierarchical expert routing.

Notable Methods

Mixture of Clustered Experts

Dual-stage routing mechanism.

  1. partition the \(M\) experts into groups of \(N\) experts each (i.e. \(M = \qty(N, \dots, N)\))
  2. run k-means clustering on the sequence embeddings of the input
  3. given a sequence's assigned cluster, route its tokens only within the expert group assigned to that cluster
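The dual-stage routing steps above can be sketched roughly as follows. This is an illustration under assumptions, not the paper's implementation: the centroids are assumed to be already fit by k-means, the top-k softmax token router is a common MoE choice swapped in here, and the names `nearest_centroid`, `route_token`, and all numbers are hypothetical.

```python
import math


def nearest_centroid(embedding, centroids):
    """Stage 1: index of the closest centroid (centroids assumed pre-fit by k-means)."""
    return min(
        range(len(centroids)),
        key=lambda c: math.dist(embedding, centroids[c]),
    )


def route_token(token_scores, group, top_k):
    """Stage 2: top-k softmax routing restricted to one expert group."""
    # Keep only the best-scoring experts inside the allowed group.
    ranked = sorted(group, key=lambda e: token_scores[e], reverse=True)[:top_k]
    # Softmax over the selected experts (shifted by the max for stability).
    peak = max(token_scores[e] for e in ranked)
    exp_scores = {e: math.exp(token_scores[e] - peak) for e in ranked}
    total = sum(exp_scores.values())
    return {e: exp_scores[e] / total for e in ranked}


# Hypothetical setup: 6 experts split into 3 groups of N = 2.
expert_groups = [[0, 1], [2, 3], [4, 5]]
centroids = [(0.0, 0.0), (5.0, 5.0), (-5.0, 5.0)]

# Sequence-level decision: which cluster (and so which group) this input uses.
sequence_embedding = (4.2, 5.1)
cluster = nearest_centroid(sequence_embedding, centroids)

# Token-level decision: router scores over all experts, masked to the group.
token_scores = [3.0, 2.5, 0.1, 1.2, 2.9, 0.4]
weights = route_token(token_scores, expert_groups[cluster], top_k=1)
```

Note that experts 0 and 4 have the highest raw scores for this token but are never selected, because the sequence-level cluster restricts routing to group `[2, 3]`. That restriction is the point of the hierarchy.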

Results

  • outperforms MoE baselines
  • demonstrates expert group specialization

EMNLP2025 Wu: Zero Shot Graph Learning via Explicit Reasoning

Last edited: November 11, 2025

One-Liner

Novelty

Background

How do LLMs do graphs?

  • predict text from graphs (convert graph into text, autoregression)
  • align text with graph (GNN + LLM late fusion)
  • encode text with graph (stick LLM embedding to a GNN as a prompt)
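As a concrete picture of the first approach above (convert the graph into text, then let the LLM autoregress), here is a minimal sketch. The serialization format, the function name `graph_to_text`, and the toy citation graph are assumptions for illustration; the paper may linearize graphs differently.

```python
def graph_to_text(node_text, edges):
    """Serialize a text-attributed graph into a plain-text prompt fragment."""
    lines = ["Nodes:"]
    for node_id in sorted(node_text):
        lines.append(f"- {node_id}: {node_text[node_id]}")
    lines.append("Edges:")
    for src, dst in edges:
        lines.append(f"- {src} -> {dst}")
    return "\n".join(lines)


# Hypothetical two-node citation graph.
node_text = {"p1": "paper on MoE routing", "p2": "paper on k-means"}
edges = [("p1", "p2")]

# The serialized graph is prepended to a question and handed to the LLM.
prompt = graph_to_text(node_text, edges) + "\nQuestion: what topic is p1?"
```

The other two approaches keep a GNN in the loop instead of flattening the structure into tokens, so they do not reduce to a string transformation like this one.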

Motivation

Notable Methods

Key Figs

New Concepts

Notes

EMNLP2025 Zhang: Diffusion vs. Autoregression Language Models

Last edited: November 11, 2025

One-Liner

Novelty

Notable Methods

Key Figs

New Concepts

Notes