_index.org

agent security

Last edited: March 3, 2026

Three layers of agent safety

  1. model architecture: fundamental limitations of transformer structure
  2. architecture -> LLMs: training data (poisoning), training objective (reward hacking)
  3. LLMs -> prompts: prompt injections, unintended actions, goal scheming

prompt injections

OWASP Top 10 for LLM Applications… RAG and agents are WORSE because humans do not get to choose what enters the context. Web agents can browse the web, which exposes them to context poisoning.

evaluation setup

  1. ecological validity
  2. realistic threat models
  3. systematic evaluations (i.e., not just anecdotal examples)
  4. controlled environments

computer security principles

  • confidentiality (don’t exfiltrate passwords)
  • integrity (don’t nuke important files)
  • availability (don’t bring things down)

benign inputs leading to harms

  • triggering compaction => failures

Unintentional behavior: “unsafe agent behavior that deviates from the user’s intentions for a task”

equality constrained minimization

Last edited: March 3, 2026

Equality constrained smooth optimization problem:

\begin{align} \min_{x}\quad & f\qty(x) \\ \textrm{s.t.} \quad & Ax = b \end{align}

where \(f\) is convex and twice differentiable, and \(A \in \mathbb{R}^{p\times n}\) has rank \(p\).
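
As a short worked step, the optimality conditions for this problem can be read off the Lagrangian. The multiplier symbol \(v\) is a notational assumption, chosen to match the KKT block used for the quadratic case:

\begin{align} L\qty(x, v) = f\qty(x) + v^{T}\qty(Ax - b) \end{align}

Since \(f\) is convex and differentiable and the constraint is affine, a point \(x^{*}\) is optimal if and only if there exists some \(v^{*}\) such that:

\begin{align} \nabla f\qty(x^{*}) + A^{T} v^{*} = 0, \quad Ax^{*} = b \end{align}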

additional information

equality constrained quadratic minimization

Suppose \(f\) is a quadratic:

\begin{align} f\qty(x) = \frac{1}{2} x^{T}P x + q^{T} x + r \end{align}

for \(P \in \mathbb{S}^{n}_{+}\)

We can write the optimality conditions (the KKT Conditions) as a single block linear system:

\begin{align} \mqty(P & A^{T}\\ A & 0) \mqty(x^{*}\\v^{*}) = \mqty(-q \\ b) \end{align}
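
The block system comes from setting the gradient of the quadratic, \(Px + q\), plus \(A^{T}v\) to zero, together with \(Ax = b\). A minimal sketch of solving it numerically follows; the helper names `solve_linear` and `solve_eq_qp` are hypothetical, and plain Gaussian elimination is used only to keep the example dependency-free (a real implementation would likely call a linear algebra library).

```python
# Minimal sketch: solve the KKT block system for an equality constrained QP.
# solve_linear and solve_eq_qp are hypothetical helper names, not a library API.

def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    # Build the augmented matrix [M | rhs] with float entries.
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Choose the row with the largest pivot magnitude for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("KKT matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def solve_eq_qp(P, q, A, b):
    """Return (x_star, v_star) solving [[P, A^T], [A, 0]] [x; v] = [-q; b]."""
    n, p = len(P), len(A)
    # Top block rows: [P | A^T].
    K = [list(P[i]) + [A[j][i] for j in range(p)] for i in range(n)]
    # Bottom block rows: [A | 0].
    K += [list(A[j]) + [0.0] * p for j in range(p)]
    rhs = [-qi for qi in q] + list(b)
    z = solve_linear(K, rhs)
    return z[:n], z[n:]


# Example: minimize (1/2)(x1^2 + x2^2) subject to x1 + x2 = 1.
P = [[1.0, 0.0], [0.0, 1.0]]
q = [0.0, 0.0]
A = [[1.0, 1.0]]
b = [1.0]
x_star, v_star = solve_eq_qp(P, q, A, b)
```

For this example the constraint forces the two coordinates to split the budget evenly, so `x_star` is approximately `[0.5, 0.5]` with multiplier `v_star` approximately `[-0.5]`.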

SU-CS361 APR182024

Last edited: March 3, 2026

constraint

recall constraint; in general, a constraint means that we minimize \(f\) only over points in a feasible set, \(x \in \mathcal{X}\).

active constraint

an “active constraint” is a constraint which, upon application, changes the solution to be different from the unconstrained solution. Equality constraints are always active; inequality constraints are not necessarily active.
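
A small sketch of this idea in one dimension, assuming the toy problem of minimizing \((x - c)^2\) subject to \(x \leq u\). The function name `minimize_with_upper_bound` is hypothetical, and "active" here follows the definition in this note (the constraint changes the solution):

```python
# Toy illustration: minimize (x - center)^2 subject to x <= upper.
# minimize_with_upper_bound is a hypothetical name used only for this sketch.

def minimize_with_upper_bound(center, upper):
    """Return (x_star, is_active) for the 1-D constrained problem."""
    # The unconstrained minimizer of (x - center)^2 is x = center.
    unconstrained = center
    # With the bound x <= upper, the minimizer is clipped to the bound.
    constrained = min(center, upper)
    # Per the note's definition: active if the constraint moved the solution.
    is_active = constrained != unconstrained
    return constrained, is_active


# Bound cuts off the unconstrained minimizer: the constraint is active.
x_tight, active_tight = minimize_with_upper_bound(center=2.0, upper=1.0)

# Bound is loose: the solution is unchanged, so the constraint is inactive.
x_loose, active_loose = minimize_with_upper_bound(center=2.0, upper=3.0)
```

In the first call the solution moves from 2.0 to the bound at 1.0; in the second the bound never binds and the unconstrained minimizer 2.0 is returned.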

types of constraints

We can write all types of optimization problems using two types of constraints; we will use these conventions EXACTLY:

SU-EE364A FEB262026

Last edited: March 3, 2026

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids