agent security
Last edited: March 3, 2026
Three layers of agent safety
- model architecture: fundamental limitations of transformer structure
- architecture -> LLMs: training data (poisoning), training objective (reward hacking)
- LLMs -> prompts: prompt injections, unintended actions, goal scheming
prompt injections
OWASP top 10 for LLM applications…. RAG and agents are WORSE because the human gets no choice about what the model reads. Web agents can browse the web, which exposes them to context poisoning.
evaluation setup
- ecological validity
- realistic threat models
- systematic evaluations (e.g., going beyond obviously anecdotal work)
- controlled environments
computer security principles
- confidentiality (don’t exfiltrate passwords)
- integrity (don’t nuke important files)
- availability (don’t bring things down)
benign inputs leading to harms
- triggering compaction => failures
Unintentional behavior: “unsafe agent behavior that deviates from user intentions for a task”
China Economy Index
Last edited: March 3, 2026
equality constrained minimization
Last edited: March 3, 2026
Equality constrained smooth optimization problem:
\begin{align} \min_{x}\quad & f\qty(x) \\ \textrm{s.t.} \quad & Ax = b \end{align}
for \(f\) convex and twice differentiable, and \(A \in \mathbb{R}^{p\times n}\) with rank \(p\).
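As a sketch of where the optimality conditions for this problem come from, assuming the standard Lagrange-multiplier argument with a multiplier \(v \in \mathbb{R}^{p}\): a point \(x^{*}\) is optimal if and only if there is some \(v^{*}\) with
\begin{align} Ax^{*} = b, \quad \nabla f\qty(x^{*}) + A^{T} v^{*} = 0 \end{align}
The first equation is feasibility. The second says \(\nabla f\qty(x^{*})\) lies in the row space of \(A\), so any feasible direction \(d\) (with \(Ad = 0\)) satisfies \(\nabla f\qty(x^{*})^{T} d = 0\); by convexity, no feasible move can decrease \(f\).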
additional information
equality constrained quadratic minimization
Say \(f\) is a quadratic:
\begin{align} f\qty(x) = \frac{1}{2} x^{T}P x + q^{T} x + r \end{align}
for \(P \in \mathbb{S}^{n}_{+}\)
We can write the optimality conditions (the KKT Conditions) as a single block linear system:
\begin{align} \mqty(P & A^{T}\\ A & 0) \mqty(x^{*}\\v^{*}) = \mqty(-q \\ b) \end{align}
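The block system above can be solved directly with a dense linear solve. The following is a minimal sketch, assuming NumPy is available and that the KKT matrix is nonsingular (which holds, for example, when \(A\) has rank \(p\) and \(P\) is positive definite on the nullspace of \(A\)). The function name `solve_eq_qp` and the toy problem are hypothetical, chosen only for illustration.

```python
import numpy as np

def solve_eq_qp(P, q, A, b):
    """Solve min (1/2) x^T P x + q^T x + r  s.t.  Ax = b via the KKT block system.

    Hypothetical helper for illustration; assumes the KKT matrix is nonsingular.
    """
    n = P.shape[0]
    p = A.shape[0]
    # Assemble the KKT matrix [[P, A^T], [A, 0]].
    K = np.block([[P, A.T], [A, np.zeros((p, p))]])
    # Right-hand side (-q, b), matching the block equation.
    rhs = np.concatenate([-q, b])
    sol = np.linalg.solve(K, rhs)  # raises LinAlgError if K is singular
    return sol[:n], sol[n:]        # (x*, v*)

# Toy problem: minimize (1/2)(x1^2 + x2^2) subject to x1 + x2 = 1.
P = np.eye(2)
q = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_star, v_star = solve_eq_qp(P, q, A, b)
print(x_star, v_star)
```

For this toy problem the first block row reads \(x + A^{T} v = 0\) and the second reads \(x_1 + x_2 = 1\), so the solve should return \(x^{*} = (0.5, 0.5)\) and \(v^{*} = -0.5\). The constant \(r\) does not affect the minimizer, so it does not appear in the system.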
SU-CS361 APR182024
Last edited: March 3, 2026
constraint
recall constraints: a general constraint means that we minimize \(f\) only over points in a feasible set, \(x \in \mathcal{X}\).
active constraint
an “active constraint” is a constraint which, upon application, changes the solution to be different from the unconstrained solution. Equality constraints are always active; inequality constraints are not necessarily active.
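To make the definition concrete, here is a small sketch using a one-dimensional problem with a closed-form answer: minimize \((x - 2)^2\) subject to \(x \le u\). The function name `minimize_shifted_square` and the specific objective are hypothetical choices for illustration, and "active" is tested exactly as defined above (does the constraint change the solution?).

```python
def minimize_shifted_square(upper_bound, target=2.0):
    """Minimize (x - target)^2 subject to x <= upper_bound (1-D, closed form).

    Hypothetical example; returns (constrained minimizer, whether the bound is active).
    """
    unconstrained = target                   # minimizer with no constraint
    constrained = min(target, upper_bound)   # closest feasible point to the target
    # Active, in the sense above: applying the constraint changes the solution.
    active = constrained != unconstrained
    return constrained, active

print(minimize_shifted_square(1.0))  # bound cuts off the unconstrained minimizer
print(minimize_shifted_square(3.0))  # bound leaves the unconstrained minimizer feasible
```

With \(u = 1\) the unconstrained minimizer \(x = 2\) is infeasible, so the constraint is active and the solution moves to \(x = 1\). With \(u = 3\) the constraint is inactive and the solution stays at \(x = 2\).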
types of constraints
We can write the constraints of any optimization problem using two types of constraints; we will use these conventions EXACTLY:
