continuous state MDP
Last edited: November 11, 2025

The Bellman Equation, value iteration, etc., are really designed for discrete state spaces. However, we'd really like to be able to support continuous state spaces! Suppose we have \(S \in \mathbb{R}^{n}\): what can we do?
Discretization
We can just pretend that our system is a discrete-state MDP by chopping the state space up into small blocks; the resulting \(V\) becomes a step function over those blocks. Recall that this can explode: for \(S \in \mathbb{R}^{n}\), if we divide each axis into \(k\) values, we get \(k^{n}\) discrete states!
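The chopping-up step above can be sketched as a grid lookup. This is a minimal sketch under assumed details (a box-shaped state space with per-axis bounds `lows`/`highs`, equal bins per axis); the function name and arguments are hypothetical, not from the lecture.

```python
import numpy as np

def discretize(state, lows, highs, k):
    """Map a continuous state in R^n to a single grid-cell index.

    Each axis i is split into k equal bins over [lows[i], highs[i]],
    so the total number of cells is k ** n -- the curse of dimensionality.
    """
    state = np.asarray(state, dtype=float)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    # Per-axis bin index in {0, ..., k-1}; clip handles states on the boundary.
    bins = np.clip(((state - lows) / (highs - lows) * k).astype(int), 0, k - 1)
    # Flatten the n per-axis indices into one discrete state id (row-major).
    return int(np.ravel_multi_index(bins, dims=(k,) * len(bins)))

# 2D example: k = 10 bins per axis -> 10**2 = 100 discrete states total.
cell = discretize([0.25, -0.5], lows=[0, -1], highs=[1, 1], k=10)
```

A tabular value function is then just an array of length \(k^{n}\) indexed by `cell`, which is exactly why the approach blows up for large \(n\).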
SU-CS229 NOV102025
Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
- “sometimes we may want to model at a slower timescale than the data is collected; for instance, your helicopter really doesn’t move anywhere in a 100th of a second, but you can collect data that fast”
Debugging RL
RL should work when
- The simulator is good
- The RL algorithm correctly maximizes \(V^{\pi}\)
- The reward is such that the maximum expected payoff corresponds to your goal
Diagnostics
- check your simulator: if your policy works in sim but not IRL, your sim is bad
- if \(V^{\text{RL}} < V^{\text{human}}\), then your RL algorithm is just bad
- if \(V^{\text{RL}} \geq V^{\text{human}}\) but the policy still doesn’t achieve your goal, then your objective function is bad
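The three diagnostics above form an ordered decision procedure, which can be sketched as follows. Everything here is hypothetical scaffolding (the function name, the boolean "works in sim / in real life" inputs, and the scalar value estimates); it just encodes the checks in order.

```python
def diagnose_rl(works_in_sim, works_irl, v_rl, v_human):
    """Apply the three RL diagnostics in order.

    works_in_sim / works_irl: whether the learned policy succeeds in the
    simulator / in the real world. v_rl, v_human: value estimates
    V^RL and V^human of the learned and human policies.
    """
    if works_in_sim and not works_irl:
        return "simulator is bad"
    if v_rl < v_human:
        return "RL algorithm is bad (failed to maximize V^pi)"
    # V^RL >= V^human yet the goal is still unmet: the reward/objective
    # does not actually correspond to what you want.
    return "objective function is bad"
```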
EMNLP2025 Extra Things
EMNLP2025 Yu: Long-Context LM Fail in Basic Retrieval
A synthetic dataset shows that needle-in-a-haystack retrieval fails when the needle requires reasoning.
EMNLP2025 Friday Afternoon Posters
EMNLP2025 Ghonim: concept-ediq
A massive bank of concepts, multimodal and semantically linked.
EMNLP2025 Bai: understanding and leveraging expert specialization of context faithfulness
Two steps: step one uses router tuning to prioritize experts that rely on context; step two fine-tunes especially those experts for improved context faithfulness. Big gains on HotpotQA and other QA datasets just from the router tuning.
EMNLP2025 Vasu: literature grounded hypothesis generation
Use citation links to generate a provenance graph of hypotheses; then fine-tune a language model to reproduce this provenance graph, and use the resulting model to improve RAG so that it is contextually grounded.
EMNLP2025 Wednesday Morning Posters
EMNLP2025 Xu: tree of prompting
Evaluate a quote-attribution score as a way to prioritize more factual quotes.
EMNLP2025 Fan: medium is not the message
Unwanted features such as language or medium are found in embeddings; use linear concept erasure to learn a projection that minimizes information about the unwanted features.
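A minimal sketch of the idea, simplified to erasing a single known direction rather than the full learned-projection method from the poster: projecting embeddings onto the orthogonal complement of a concept direction makes that feature linearly unrecoverable. The direction `w` and the toy data are assumptions for illustration.

```python
import numpy as np

def erase_direction(X, w):
    """Project embeddings X (m x d) onto the complement of direction w.

    After projection, X_clean @ w == 0, so no linear probe along w can
    recover the unwanted feature (single-direction concept erasure).
    """
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    P = np.eye(len(w)) - np.outer(w, w)  # orthogonal projection matrix
    return X @ P

# Toy example: treat the first coordinate as the "unwanted" concept.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_clean = erase_direction(X, [1.0, 0.0])
```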
EMNLP2025 Hong: variance sensitivity induces attention entropy collapse
Softmax is highly sensitive to the variance of its input logits, which is why pre-training loss spikes without QK norm.
