agent security

Last edited: March 3, 2026

Three layers of agent safety

model architecture: fundamental limitations of transformer structure
architecture -> LLMs: training data (poisoning), training objective (reward hacking)
LLMs -> prompts: prompt injections, unintended actions, goal scheming

prompt injections

OWASP top 10 for LLM applications…. RAG/Agents are WORSE because humans do not have choice. Web agents, can browse the web and have context poisoning.

evaluation setup

etiologic validity
realistic threat models
systematic evaluations (e.g., obviously anecdotal works)
controlled environments

computer security principles

confidentiality (don’t infiltrate passwords)
integrity (don’t nuke important files)
availability (don’t bring things down)

benign inputs leading to harms

triggering compaction => failures

Unintentional behavior: “unsafe agent behavior that deviations from user intentions from a task”

China Economy Index

Last edited: March 3, 2026

equality constrained minimization

Last edited: March 3, 2026

Equality constrained smooth optimization problem:

\begin{align} \min_{x}\quad & f\qty(x) \\ \textrm{s.t.} \quad & Ax = b \end{align}

for \(f\) convex, and twice differentiable; for \(A \in \mathbb{R}^{p\times n}\), rank \(p\).

additional information

equality constrained quadratic minimization

say its a quadratic:

\begin{align} f\qty(x) = \frac{1}{2} x^{T}P x + q^{T} x + r \end{align}

for \(P \in \mathbb{S}^{n}_{+}\)

We can form optimality via the KKT Conditions in a block:

\begin{align} \mqty(P & A^{T}\\ A & 0) \mqty(x^{*}\\v^{*}) = \mqty(-q \\ b) \end{align}

SU-CS361 APR182024

Last edited: March 3, 2026

constraint

recall constraint; our general constraints means that we can select \(f\) within a feasible set \(x \in \mathcal{X}\).

active constraint

an “active constraint” is a constraint which, upon application, changes the solution to be different than the non-constrainted solution. This is always true at the equality constraint, and not necessarily with inequality constraints.

types of constraints

We can write all types of optimization problems into two types of constraints; we will use these conventions EXACTLY:

agent security

Three layers of agent safety

prompt injections

evaluation setup

computer security principles

benign inputs leading to harms

China Economy Index

equality constrained minimization

additional information

equality constrained quadratic minimization

SU-CS361 APR182024

constraint

active constraint

types of constraints

SU-EE364A FEB262026

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids