_index.org

cell-free biocatalysis

Last edited: August 8, 2025

central limit theorem

Last edited: August 8, 2025

“If sample size is large and IID, the sampling distribution is normal. The larger \(N\) is, the more normal the resulting shape is.”

We can use the central limit theorem to estimate the sum of IID random variables:

Let \(X_{1}, \dots, X_{n}\) be IID random variables with \(E[X_{j}] = \mu\) and \(Var(X_{j}) = \sigma^{2}\).

We have that:

\begin{equation} \sum_{j=1}^{n} X_{j} \sim N(n\mu, n \sigma^{2}), \text{as}\ n \to \infty \end{equation}

That is, as long as you sum enough IID random variables, the distribution of the (suitably normalized) sum gets closer and closer to a normal distribution.
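The claim above can be checked empirically with a small simulation (a sketch; the Uniform(0,1) choice and sample counts are illustrative): sums of \(n\) IID Uniform(0,1) variables, which have \(\mu = 0.5\) and \(\sigma^{2} = 1/12\), should have mean near \(n\mu\) and variance near \(n\sigma^{2}\).

```python
import random
import statistics

# Empirical check of the CLT statement: draw many sums of n IID
# Uniform(0,1) variables (mu = 0.5, sigma^2 = 1/12) and compare the
# sample mean/variance of the sums against n*mu and n*sigma^2.
def sample_sums(n, trials=20000, seed=0):
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) for _ in range(trials)]

n = 30
sums = sample_sums(n)
mean = statistics.fmean(sums)   # expect roughly n * 0.5 = 15
var = statistics.pvariance(sums)  # expect roughly n / 12 = 2.5
```

A histogram of `sums` would also look increasingly bell-shaped as `n` grows, which is the qualitative content of the theorem.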

Certificates-Based Interpretation of NL

Last edited: August 8, 2025

A language \(A\) is in \(NL\) if there exists a deterministic Turing Machine \(V\) that runs in logspace such that \(x \in A \Leftrightarrow \exists w \in \qty{0,1}^{\text{poly}\qty(|x|)}\) with \(V\qty(x,w) = 1\) (an "if and only if," just as in \(NP\)). Here the real input \(x\) is on input tape one, which is read-only, and the witness \(w\) is on input tape two, which is read-once (because otherwise this definition becomes equivalent to \(NP\)).
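The read-once restriction can be made concrete with st-connectivity (PATH), the canonical NL-complete problem. The sketch below (function name and encoding are my own, not from the source) verifies a witness that claims to be an \(s\)-to-\(t\) path: the verifier streams the witness left to right exactly once, remembering only the previous vertex, which is \(O(\log n)\) bits of workspace.

```python
# Read-once, logspace-style verifier for st-connectivity (PATH).
# The witness is a sequence of vertices claimed to form a path from s to t.
# We make a single left-to-right pass, storing only the previous vertex.
def verify_path(edges, s, t, witness):
    prev = None
    for v in witness:          # one pass: each witness symbol is read once
        if prev is None:
            if v != s:         # path must start at s
                return False
        elif (prev, v) not in edges:
            return False       # claimed step is not an actual edge
        prev = v
    return prev == t           # path must end at t

edges = {(0, 1), (1, 2), (2, 3)}
ok = verify_path(edges, 0, 3, [0, 1, 2, 3])   # a valid certificate
bad = verify_path(edges, 0, 3, [0, 2, 3])     # skips a missing edge
```

If the verifier could rewind and re-read the witness arbitrarily, the witness tape would effectively become scratch space, and the definition would collapse to the NP certificate definition.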

Chain of Thought

Last edited: August 8, 2025

Challenges of Language Model Agents

Last edited: August 8, 2025

Challenge of Making Agents

Agents are not new (Riedl and Amant 2002), but newer agents can be powered by LLMs/VLMs, meaning we now use language for reasoning and communication.

Sequentiality is hard

  1. what is the context/motivation?
  2. how do you transfer across contexts?
  3. how do you plan?

Evaluation

  1. Different from previous NLP benchmarks: we are no longer evaluating language modeling itself
  2. There are no longer clear boundaries between fields

Common goals:

  • realistic agents (no more Atari games)
  • reproducible systems
  • measurable goals
  • scalable models that are easy to use

Web as an Interactive Environment

InterCode

Formulation of agent decisions as a POMDP in order to fully benchmark Markovian decisions:
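The POMDP framing can be sketched with a toy loop (all names here are illustrative assumptions, not InterCode's actual API): the environment's state is hidden, the agent sees only a partial observation after each action, and it must act on the observation history alone.

```python
# Toy POMDP to illustrate the framing: hidden state is a counter,
# the observation is only its parity, so the agent never sees the
# full state. Names and dynamics are illustrative, not InterCode's API.
class ToyEnv:
    def __init__(self):
        self._state = 0              # hidden state s

    def observe(self):
        return self._state % 2       # partial observation o(s)

    def step(self, action):
        self._state += action        # transition T(s, a)
        reward = 1 if self._state % 2 == 0 else 0
        return self.observe(), reward

env = ToyEnv()
history = [env.observe()]            # the agent conditions on this, not on s
total = 0
for action in (1, 1, 2):
    obs, r = env.step(action)
    history.append(obs)
    total += r
```

Because the observation (parity) discards information about the state (the counter), an optimal policy must reason over the whole interaction history, which is exactly what distinguishes a POMDP benchmark from a fully observed MDP one.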