Challenges of Language Model Agents
Last edited: August 8, 2025
Challenge of Making Agents
Agents are not new (Riedl and Amant 2002). But newer agents can be powered by LLMs/VLMs, meaning we are using language for reasoning and communication.
Sequentiality is hard
- what is the context/motivation?
- how do you transfer across contexts?
- how do you plan?
Evaluation
- Different from previous NLP benchmarks: we are not worried about language modeling
- There are no longer boundaries between various fields
Common goals:
- realistic agents—stop playing Atari games.
- reproducible systems
- measurable goals
- scalable models
- ease of use
Web as an Interactive Environment
- agents on the web are both practical and scalable
- https://webshop-pnlp.github.io/
- agents trained on WebShop can actually transfer to Amazon with no extra work
- Mind2Web
InterCode
Formulation of agent decisions as a POMDP in order to fully benchmark Markovian decisions:
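As a rough illustration of the POMDP framing, the sketch below runs one episode in which the agent only ever sees observations, never the hidden state. The environment (`ToyShellEnv`), its two commands, the reward, and the scripted policy are all hypothetical stand-ins invented for this note; this is not InterCode's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShellEnv:
    """Hypothetical environment: the hidden state is a set of file names."""
    files: set = field(default_factory=lambda: {"a.txt"})  # hidden state
    goal: frozenset = frozenset({"a.txt", "b.txt"})         # target state

    def step(self, action: str):
        """Apply an action; return (observation, reward, done)."""
        parts = action.split()
        if len(parts) == 2 and parts[0] == "touch":
            self.files.add(parts[1])       # state transition
            observation = ""               # the agent sees no output here
        elif action == "ls":
            observation = " ".join(sorted(self.files))
        else:
            observation = f"unknown command: {action}"
        done = self.files == self.goal
        reward = 1.0 if done else 0.0
        return observation, reward, done


def run_episode(env, policy, max_steps=5):
    """Roll out a policy that conditions only on the observation history."""
    history, total = [], 0.0
    for _ in range(max_steps):
        action = policy(history)
        observation, reward, done = env.step(action)
        history.append((action, observation))
        total += reward
        if done:
            break
    return history, total


def scripted_policy(history):
    """Hypothetical policy: inspect first, then create the missing file."""
    return "ls" if not history else "touch b.txt"


history, total = run_episode(ToyShellEnv(), scripted_policy)
```

The policy receives only `history` (past actions and observations), which is the "partially observable" part: the set of files lives inside the environment and is visible only through what `ls` prints.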
changes to central dogma
Last edited: August 8, 2025
- 80% of the human genome is actually transcribed
- very little “junk DNA”
- 40% of lncRNAs are gene specific
characteristic polynomial
Last edited: August 8, 2025
The polynomial in \(\lambda\) given by the determinant:
\begin{equation} \det(A-\lambda I) \end{equation}
for some Linear Map \(A\). The roots of this polynomial, i.e. the solutions \(\lambda\) of \(\det(A-\lambda I) = 0\), are the eigenvalues of \(A\). This is because \(\lambda\) is an eigenvalue IFF \((A-\lambda I)v = 0\) for some nonzero \(v\), so we need \((A-\lambda I)\) to be singular, which means its determinant is zero.
The characteristic polynomial of a \(2 \times 2\) matrix is given by \(\lambda^{2}-\mathrm{tr}(A)\lambda + \det(A)\).
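A small worked example of the 2x2 formula, as a sketch: the helper names and the matrix `A` below are chosen purely for illustration. It builds the coefficients from the trace and determinant, then finds the eigenvalues with the quadratic formula.

```python
import cmath


def char_poly_2x2(a):
    """Return (1, -tr(A), det(A)): coefficients of lambda^2 - tr(A) lambda + det(A)."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    return 1, -trace, det


def eigenvalues_2x2(a):
    """Find the roots of the characteristic polynomial via the quadratic formula."""
    _, b, c = char_poly_2x2(a)
    root = cmath.sqrt(b * b - 4 * c)   # cmath handles complex eigenvalues too
    return (-b + root) / 2, (-b - root) / 2


A = [[2, 1], [1, 2]]         # example matrix chosen for illustration
coeffs = char_poly_2x2(A)    # tr(A) = 4, det(A) = 3
eigs = eigenvalues_2x2(A)
```

Here the polynomial is \(\lambda^{2} - 4\lambda + 3 = (\lambda - 3)(\lambda - 1)\), so the eigenvalues are 3 and 1.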
charged
Last edited: August 8, 2025
An atom is said to be charged when there is an imbalance between its number of protons and electrons.
additional information
units of charge
Charge is measured in the SI unit coulomb, \(C\). However, we are often dealing with \(e\), the magnitude of the charge of an electron (as ultimately that’s the principal way by which charge moves around). \(e \approx 1.6 \times 10^{-19} C\).
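To make the units concrete, here is a minimal sketch that converts a proton/electron imbalance into coulombs using the approximate value of \(e\) quoted above. The function names and the sodium-ion example are illustrative choices, not part of the original note.

```python
E_CHARGE = 1.6e-19  # coulombs; the approximate value of e used in this note


def net_charge(protons: int, electrons: int) -> float:
    """Net charge in coulombs of an atom with the given particle counts."""
    return (protons - electrons) * E_CHARGE


def electrons_per_coulomb() -> float:
    """How many electron charges add up to one coulomb in magnitude."""
    return 1 / E_CHARGE


# Na+ is a sodium atom (11 protons) that has lost one electron.
sodium_ion = net_charge(protons=11, electrons=10)
neutral_carbon = net_charge(protons=6, electrons=6)
count = electrons_per_coulomb()
```

With this value of \(e\), one coulomb corresponds to roughly \(6.25 \times 10^{18}\) electron charges, which is why \(e\) is the more convenient unit at the atomic scale.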
net charge can be neither created nor destroyed
Unsurprisingly, though you can move electrons around, the net charge of an isolated system is conserved.
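A toy bookkeeping sketch of this conservation, assuming two initially neutral bodies and charges counted in integer units of \(e\) (the function and variable names are hypothetical):

```python
def transfer_electrons(source: int, target: int, n: int):
    """Move n electrons from source to target; charges are in units of e.

    Losing an electron raises a body's charge by one unit, and gaining
    an electron lowers it by one unit.
    """
    return source + n, target - n


body_a, body_b = 0, 0                                # both start neutral
total_before = body_a + body_b
body_a, body_b = transfer_electrons(body_a, body_b, 3)
total_after = body_a + body_b
```

Each body ends up charged (one positive, one negative), but the total over the whole system is unchanged.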

