Language Agents with Karthik
Last edited: August 8, 2025
Transitions
- First, the transition from rule-based systems to statistical learning
- Rise of semantic parsing: statistical models of parsing
- Then, the move from semantic parsing to large models: decision making and language modeling in the same bubble
Importance of LLMs
- They are simply better at understanding language inputs
- They can generate structured information (e.g. JSON, not just natural language)
- They can perform natural language “reasoning”, not just natural language generation
Language Information Index
What makes language modeling hard: resolving ambiguity.
“the chef made her duck” (cooked the duck she owned, or caused her to lower her head?)
Contents
Basic Text Processing
- regex
- ELIZA
- tokenization and corpus
- text normalization
- tokenization + subword tokenization
- Word Normalization
- lemmatization through morphological parsing
- only take stems from morphemes: Porter stemmer
- sentence segmentation
- N-Grams
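A minimal sketch of the regex-based tokenization plus case normalization steps listed above; the particular pattern is an illustrative choice (keep word-internal apostrophes, split off punctuation), not one prescribed in these notes:

```python
import re

def tokenize(text):
    """Crude regex tokenizer: lowercase for normalization, then split into
    words (allowing an internal apostrophe) or single punctuation marks."""
    text = text.lower()
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

# tokenize("The chef made her duck.")
# → ['the', 'chef', 'made', 'her', 'duck', '.']
```

Real pipelines typically use a trained subword tokenizer (e.g. BPE) instead of a hand-written regex, but the pattern above shows the basic normalize-then-segment idea.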
Edit Distance
DP costs \(O(nm)\), backtrace costs \(O(n+m)\).
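The \(O(nm)\) dynamic program can be sketched as below (distance only; the \(O(n+m)\) backtrace that recovers the alignment is omitted). Unit costs for insertion, deletion, and substitution are assumed:

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming: O(n*m) time and space.
    dp[i][j] = minimum edits to turn s[:i] into t[:j]."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                      # delete all of s[:i]
    for j in range(m + 1):
        dp[0][j] = j                      # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[n][m]

# edit_distance("kitten", "sitting") → 3
```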
N-Grams
Text Classification
Logistic Regression
- Generative Classifier vs Discriminative Classifier
- Logistic Regression Text Classification
- cross entropy loss
- stochastic gradient descent
Information Retrieval
Ranked Information Retrieval
Vector Semantics
POS and NER
Dialogue Systems
Recommender Systems
Dora
Neural Nets
The Web
Language Model
A machine learning model: input, the last n words; output, a probability distribution over the next word. An LM predicts this distribution (“what is the distribution of the next word given the previous words?”):
\begin{equation} w^{(t)} \sim P(\cdot \mid w^{(t-1)}, w^{(t-2)}, \dots, w^{(1)}) \end{equation}
By applying the chain rule, we can also think of the language model as assigning a probability to a sequence of words:
\begin{align} P(S) &= P(w^{(t)} \mid w^{(t-1)}, w^{(t-2)}, \dots, w^{(1)}) \cdot P(w^{(t-1)} \mid w^{(t-2)}, \dots, w^{(1)}) \cdots P(w^{(1)}) \\ &= P(w^{(t)}, w^{(t-1)}, w^{(t-2)}, \dots, w^{(1)}) \end{align}
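The chain-rule factorization can be made concrete with a toy bigram model, where each conditional is truncated to \(P(w^{(t)} \mid w^{(t-1)})\) and estimated from counts. The corpus below is an invented two-sentence example, not from the source:

```python
from collections import Counter

# Toy corpus; <s> marks sentence start.
corpus = [["<s>", "the", "chef", "made", "her", "duck"],
          ["<s>", "the", "duck", "made", "her", "laugh"]]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        bigrams[(prev, cur)] += 1
        unigrams[prev] += 1

def p_next(prev, cur):
    """Unsmoothed maximum-likelihood estimate of P(cur | prev)."""
    return bigrams[(prev, cur)] / unigrams[prev]

def p_sentence(sent):
    """Chain rule with a bigram truncation: P(S) = prod_t P(w_t | w_{t-1})."""
    p = 1.0
    for prev, cur in zip(sent, sent[1:]):
        p *= p_next(prev, cur)
    return p

# p_sentence(["<s>", "the", "chef", "made", "her", "duck"]) → 0.25
```

Here \(P(\text{chef} \mid \text{the}) = 1/2\) and \(P(\text{duck} \mid \text{her}) = 1/2\), with every other factor equal to 1, so the sentence probability is 0.25.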
Language Model Agents
Agents that use language to act on behalf of another person or group.
Challenges
See Challenges of Language Model Agents
Methods
ReAct
See ReAct
Aguvis
Take the AgentNet dataset, then fine-tune a vision LM (on top of a Qwen base model) to roll out the rest of the action sequence given screenshots as input.
We can also add Chain of Thought on top to elicit more explicit reasoning.
Formulations
OSWorld
A unified task setup and evaluation.
