Certificate-Based Interpretation of NL
Last edited: August 8, 2025
A language \(A\) is in \(NL\) if there exists a deterministic Turing machine \(V\), running in logspace, such that \(x \in A \Leftrightarrow \exists w \in \qty{0,1}^{\text{poly}\qty(|x|)}\) with \(V \qty(x,w) = 1\) (if and only if!! same as NP). Here the real input \(x\) is on input tape one, which is read-only, and the witness \(w\) is on input tape two, which is read-once: the head on \(w\) only moves forward. Without the read-once restriction, the same definition would be equivalent to \(NP\).
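As a concrete instance of the definition above, here is a minimal sketch of a verifier for directed reachability (is there a path from \(s\) to \(t\)?), a standard \(NL\) problem. The function name `verify_path` and the edge-set encoding are assumptions made for illustration. The witness is passed as an iterator so that it can only be consumed front to back, mimicking the read-once tape, and the only working state kept is the current vertex, which takes \(O(\log n)\) bits.

```python
from typing import Iterable, Set, Tuple


def verify_path(edges: Set[Tuple[int, int]], s: int, t: int,
                witness: Iterable[int]) -> bool:
    """Check a claimed s-to-t path while reading the witness exactly once.

    `edges` plays the role of the read-only input tape: it may be queried
    repeatedly. `witness` is the sequence of vertices visited after s.
    """
    current = s  # the only working memory: one vertex index
    for nxt in witness:  # single forward pass, never rewound
        if (current, nxt) not in edges:
            return False  # the witness claims an edge that does not exist
        current = nxt
    return current == t  # accept only if the walk ends at the target


edges = {(0, 1), (1, 2), (2, 3)}
print(verify_path(edges, 0, 3, iter([1, 2, 3])))  # valid path
print(verify_path(edges, 0, 3, iter([2, 3])))     # skips an edge
```

Because the verifier never needs to look back at earlier witness symbols, the read-once restriction costs nothing for this problem.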
Chain of Thought
Last edited: August 8, 2025
Challenges of Language Model Agents
Last edited: August 8, 2025
Challenge of Making Agents
Agents are not new (Riedl and Amant 2002), but newer agents can be powered by LLMs/VLMs, meaning language is now the medium for reasoning and communication.
Sequentiality is hard
- what is the context/motivation?
- how do you transfer across contexts?
- how do you plan?
Evaluation
- Different from previous NLP benchmarks: we are no longer worried about language modeling itself
- There are no longer clear boundaries between the various fields
Common goals:
- realistic agents: stop playing Atari games
- reproducible systems
- measurable goals
- scalable models
- which are easy to use
Web as an Interactive Environment
- putting agents on the web is both practical and scalable
- https://webshop-pnlp.github.io/
- agents trained on WebShop can transfer to Amazon with no additional work
- Mind2Web
InterCode
Formulation of agent decisions as POMDP in order to fully benchmark Markovian decisions:
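To make the POMDP framing concrete, here is a minimal sketch of an agent-environment loop in which the agent acts only on its observation history and never on the hidden state. Everything in it (`GuessEnv`, `binary_search_policy`, `run_episode`, the number-guessing task) is a hypothetical toy chosen for illustration; it is not InterCode's API or task set.

```python
from dataclasses import dataclass


@dataclass
class GuessEnv:
    """Toy partially observable environment (hypothetical).

    Hidden state: a secret integer in [low, high]. The agent only ever
    receives observations ("higher", "lower", "correct"), never the secret.
    """
    secret: int
    low: int = 0
    high: int = 15

    def step(self, action: int):
        """Apply an action (a guess) and return (observation, reward, done)."""
        if action == self.secret:
            return "correct", 1.0, True
        obs = "higher" if action < self.secret else "lower"
        return obs, 0.0, False


def binary_search_policy(history, low, high):
    """Map the (action, observation) history to the next action."""
    for action, obs in history:
        if obs == "higher":
            low = max(low, action + 1)
        elif obs == "lower":
            high = min(high, action - 1)
    return (low + high) // 2


def run_episode(env, max_steps=10):
    """Roll out one episode and return (total reward, trajectory)."""
    history, total = [], 0.0
    for _ in range(max_steps):
        action = binary_search_policy(history, env.low, env.high)
        obs, reward, done = env.step(action)
        history.append((action, obs))
        total += reward
        if done:
            break
    return total, history


total, history = run_episode(GuessEnv(secret=11))
print(total, history)
```

The recorded trajectory of (action, observation) pairs plus the final reward is what a benchmark of this kind can score, since every decision is a function of what the agent has observed so far.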
changes to central dogma
Last edited: August 8, 2025
- 80% of the human genome is actually transcribed
- very little “junk DNA”
- 40% of lncRNAs are gene specific

