
Challenges of Language Model Agents

Last edited: August 8, 2025

Challenge of Making Agents

Agents are not new (Riedl and Amant 2002). But newer agents can be powered by LLMs/VLMs, meaning we are using language for reasoning and communication.

Sequentiality is hard

  1. what is the context/motivation?
  2. how do you transfer across contexts?
  3. how do you plan?

Evaluation

  1. Different from previous NLP benchmarks: we are not worried about language modeling
  2. There are no longer boundaries between various fields

Common goals:

  • realistic agents: stop playing Atari games
  • reproducible systems
  • measurable goals
  • scalable models
  • systems that are easy to use

Web as an Interactive Environment

InterCode

Formulation of agent decisions as POMDP in order to fully benchmark Markovian decisions:

changes to central dogma

Last edited: August 8, 2025

  • 80% of the human genome is actually transcribed
  • very little “junk DNA”
  • 40% of lncRNAs are gene specific

char

Last edited: August 8, 2025

char is a character that represents a glyph:

characteristic polynomial

Last edited: August 8, 2025

The polynomial in \(\lambda\) given by the determinant:

\begin{equation} \det(A-\lambda I) \end{equation}

for some Linear Map \(A\). Its roots in \(\lambda\) are the eigenvalues of \(A\). This is because \(\lambda\) is an eigenvalue iff \((A-\lambda I)v = 0\) for some nonzero \(v\), so we need \((A-\lambda I)\) to be singular, that is, \(\det(A-\lambda I) = 0\).

The characteristic polynomial of a \(2 \times 2\) matrix is given by \(\lambda^{2}-\mathrm{tr}(A)\lambda + \det(A)\).
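
As a small illustration of the \(2 \times 2\) formula, the sketch below builds the coefficients from the trace and determinant and solves the resulting quadratic for the eigenvalues. The function names `char_poly_2x2` and `eigenvalues_2x2` are hypothetical, chosen only for this example, and the matrix is assumed to be given as its four entries \(A = [[a, b], [c, d]]\).

```python
import cmath


def char_poly_2x2(a, b, c, d):
    """Coefficients of lambda^2 - tr(A)*lambda + det(A) for A = [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    return (1, -trace, det)


def eigenvalues_2x2(a, b, c, d):
    """Roots of the characteristic polynomial, found with the quadratic formula."""
    _, p, q = char_poly_2x2(a, b, c, d)
    # cmath.sqrt handles a negative discriminant (complex eigenvalues).
    root = cmath.sqrt(p * p - 4 * q)
    return ((-p + root) / 2, (-p - root) / 2)


# Example: A = [[2, 1], [1, 2]] has tr(A) = 4 and det(A) = 3,
# so the polynomial is lambda^2 - 4*lambda + 3 with roots 3 and 1.
coeffs = char_poly_2x2(2, 1, 1, 2)
lam1, lam2 = eigenvalues_2x2(2, 1, 1, 2)
```

The eigenvalues come back as complex numbers even when they are real, which keeps the same code path working for matrices such as rotations whose eigenvalues are not real.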

charged

Last edited: August 8, 2025

an atom is said to be charged when there is an imbalance between its number of protons and electrons.

additional information

units of charge

charge is measured in the SI unit coulomb, \(C\). However, we often deal with \(e\), the magnitude of the charge of an electron (as ultimately that’s the principal way by which charge moves around). \(e \approx 1.6 \times 10^{-19} C\).
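
To connect the two units, here is a minimal sketch that computes the net charge of an atom or ion from its proton and electron counts, first in units of \(e\) and then in coulombs. The helper names `net_charge_in_e` and `net_charge_in_coulombs` are hypothetical, and the constant uses the exact SI value of the elementary charge rather than the rounded \(1.6 \times 10^{-19}\).

```python
# Elementary charge in coulombs (exact by the 2019 SI definition).
ELEMENTARY_CHARGE_C = 1.602176634e-19


def net_charge_in_e(protons, electrons):
    """Net charge in units of e: each proton is +1, each electron is -1."""
    return protons - electrons


def net_charge_in_coulombs(protons, electrons):
    """Net charge converted from units of e to coulombs."""
    return net_charge_in_e(protons, electrons) * ELEMENTARY_CHARGE_C


# Example: a sodium ion Na+ has 11 protons and 10 electrons.
sodium_ion_e = net_charge_in_e(11, 10)
sodium_ion_c = net_charge_in_coulombs(11, 10)

# A neutral sodium atom has equal counts, so no imbalance and no net charge.
sodium_atom_e = net_charge_in_e(11, 11)
```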

net charge can be neither created nor destroyed

Unsurprisingly, though you can move electrons around, the total charge across a closed system is conserved.