What counts as an LLM agent incident
Before you can respond, you need a shared definition of what you are responding to. An LLM agent incident is any event where an MCP server, its tools, or the agent driving it behaves outside its intended authority. The most common scenarios fall into a handful of categories:
- Indirect prompt injection — malicious instructions embedded in a document, web page, email, or tool result hijack the agent's behavior. The agent itself is not compromised in the traditional sense; its instructions are.
- Token or credential theft — an attacker obtains the OAuth token, API key, or session the agent uses to reach the MCP server and replays it directly.
- Malicious or compromised MCP server — a tool server that the agent trusts returns poisoned tool descriptions (tool poisoning) or silently shadows a legitimate tool.
- Excessive agency / confused deputy — the agent is tricked into using its legitimate, over-broad permissions to perform an action on the attacker's behalf.
- Supply-chain compromise — a dependency, MCP package, or model update introduces hostile behavior.
The unifying property is that the actor abuses trusted machinery. That is why containment leans heavily on revoking trust (tokens, tool access, network paths) rather than removing malware. If you have not yet inventoried which of these apply to your environment, an MCP attack surface assessment and a review of the MCP threat matrix will tell you where you are exposed before an incident forces the question.
Detection: the signals that an agent has been compromised
Detection for agents is fundamentally behavioral. You are looking for deviations from an established baseline of normal tool usage, not signatures. Build your detection logic around four signal families: tool-call anomalies, data-access anomalies, output anomalies, and identity anomalies.
High-value detection signals
- Tool sequence deviation — an agent that normally calls search → summarize suddenly calls read_file → http_post to an external host. Sequence and ordering matter more than any single call.
- Privilege or scope escalation — invocation of a high-impact tool (delete, admin, send_email, execute) by an agent or session that has never used it.
- Volume and velocity spikes — hundreds of tool calls per minute, large result-set reads, or pagination loops that pull entire datasets.
- Prompt-injection fingerprints — tool results or user inputs containing imperative override phrases ("ignore previous instructions", "you are now", "system:"), encoded payloads, or invisible Unicode.
- Egress to new destinations — the agent's tools making outbound calls to domains or IPs not on the allowlist.
- Token misuse — the same token used from two geographies, an unusual user agent, or use outside the agent's normal operating window.
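Tool sequence deviation is the easiest of these signals to get wrong, because no single call looks suspicious. One simple way to baseline it, sketched here under assumptions, is to record every tool-to-tool transition (bigram) each agent makes during a known-good period and flag transitions never seen before. The input shape (agent ID mapped to a list of ordered tool-name lists) is hypothetical; a production baseline would also age out stale transitions.

```python
from collections import defaultdict

# Hypothetical baseline shape: agent_id -> set of (previous_tool, next_tool) pairs
Baseline = dict[str, set[tuple[str, str]]]


def build_baseline(history: dict[str, list[list[str]]]) -> Baseline:
    """history maps agent_id -> sessions, each an ordered list of tool names."""
    baseline: Baseline = defaultdict(set)
    for agent_id, sessions in history.items():
        for calls in sessions:
            # zip(calls, calls[1:]) yields each consecutive pair of calls
            baseline[agent_id].update(zip(calls, calls[1:]))
    return dict(baseline)


def novel_transitions(baseline: Baseline, agent_id: str,
                      calls: list[str]) -> list[tuple[str, str]]:
    """Return transitions in this session that the agent's baseline never contained."""
    known = baseline.get(agent_id, set())
    return [pair for pair in zip(calls, calls[1:]) if pair not in known]


if __name__ == "__main__":
    baseline = build_baseline({"agent-7f3a": [["search", "summarize"]]})
    print(novel_transitions(baseline, "agent-7f3a", ["read_file", "http_post"]))
    # [('read_file', 'http_post')]
```

An agent with no baseline at all returns every transition as novel, which is usually the behavior you want for a brand-new or unknown agent.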
Instrument every MCP tool call as a structured event before you need it. At minimum log: timestamp, agent/session ID, principal (token subject), tool name, argument hash, result size, destination, and a decision verdict from your guardrail layer. The depth of this telemetry is the single biggest factor in how fast you can triage; our guidance on MCP audit logging covers the schema in detail.
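As a rough illustration of that minimum schema, the sketch below models one tool call as an immutable record and emits it as a JSON line. The field names are assumptions chosen to match the columns used in the queries that follow (session_id, principal_sub, tool_name, dest_host, ts); the argument hash uses canonical JSON so that the same arguments always hash the same way regardless of key order.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class ToolCallEvent:
    ts: str                   # ISO-8601 timestamp in UTC
    session_id: str
    agent_id: str
    principal_sub: str        # subject of the token the agent presented
    tool_name: str
    arg_hash: str             # SHA-256 of the canonicalized arguments
    result_size: int          # size of the tool result in bytes
    dest_host: Optional[str]  # outbound destination, if the tool made one
    verdict: str              # guardrail decision, e.g. "allow" or "block"


def hash_args(args: dict[str, Any]) -> str:
    # Sorted keys and fixed separators make the serialization canonical.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_event(session_id: str, agent_id: str, principal_sub: str,
               tool_name: str, args: dict[str, Any], result: str,
               dest_host: Optional[str], verdict: str) -> ToolCallEvent:
    return ToolCallEvent(
        ts=datetime.now(timezone.utc).isoformat(),
        session_id=session_id,
        agent_id=agent_id,
        principal_sub=principal_sub,
        tool_name=tool_name,
        arg_hash=hash_args(args),
        result_size=len(result.encode("utf-8")),
        dest_host=dest_host,
        verdict=verdict,
    )


def to_json_line(event: ToolCallEvent) -> str:
    # One JSON object per line is easy to ship into most SIEM pipelines.
    return json.dumps(asdict(event), sort_keys=True)
```

Hashing arguments rather than storing them keeps secrets out of the log while still letting you match repeated calls; if you also need result text for injection hunting, store it separately with tighter access controls.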
Example detection queries
The following are illustrative, written in a SIEM-agnostic pseudo-SQL against a normalized mcp_tool_calls table. Adapt the column names to your pipeline.
```sql
-- 1. Sensitive tool used in the last 24 hours by a session that had never used that tool before
SELECT c.session_id, c.tool_name, c.principal_sub, MIN(c.ts) AS first_seen
FROM mcp_tool_calls c
WHERE c.tool_name IN ('fs.delete','shell.exec','email.send','iam.grant')
  AND c.ts >= NOW() - INTERVAL '24 hours'
  AND NOT EXISTS (
    SELECT 1 FROM mcp_tool_calls h
    WHERE h.session_id = c.session_id
      AND h.tool_name = c.tool_name
      AND h.ts < NOW() - INTERVAL '24 hours')
GROUP BY c.session_id, c.tool_name, c.principal_sub;

-- 2. Egress to a destination not on the allowlist
SELECT session_id, tool_name, dest_host, COUNT(*) AS calls
FROM mcp_tool_calls
WHERE tool_name LIKE 'http.%'
  AND dest_host NOT IN (SELECT host FROM egress_allowlist)
GROUP BY session_id, tool_name, dest_host;

-- 3. Prompt-injection phrase appearing in a tool RESULT (not user input)
SELECT session_id, tool_name, ts
FROM mcp_tool_calls
WHERE result_text ~* '(ignore (all )?previous instructions|you are now|disregard the (system|above))'
   OR result_text ~ '[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]' -- zero-width / bidi-control Unicode
ORDER BY ts DESC;

-- 4. Velocity spike: tool calls per session per minute
SELECT session_id, date_trunc('minute', ts) AS m, COUNT(*) AS c
FROM mcp_tool_calls
GROUP BY session_id, m
HAVING COUNT(*) > 120
ORDER BY c DESC;
```

For continuous, automated checks against a server's configuration and exposed tools, the free open-source mcp-security-scanner can flag many of the misconfigurations that turn these signals into incidents in the first place.
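Query 3 can also run inline, before a tool result ever reaches the model. The sketch below is a Python counterpart under stated assumptions: the phrase list mirrors the query and is only a starting point, and the invisible-character class covers zero-width, bidi-control, word-joiner, and BOM code points. Invisible characters are stripped before phrase matching so that an attacker cannot split "ignore" with a zero-width space to dodge the regex. Expect false positives ("you are now subscribed"), so treat a finding as a signal to score, not an automatic block.

```python
import re
import unicodedata

# Override phrases, mirroring detection query 3 (extend for your environment).
OVERRIDE_PHRASES = re.compile(
    r"ignore (all )?previous instructions|you are now|disregard the (system|above)",
    re.IGNORECASE,
)
# Zero-width, bidi-control, word-joiner/invisible-operator, and BOM code points.
INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff]")


def injection_findings(result_text: str) -> list[str]:
    """Return which injection fingerprints appear in a tool result."""
    findings = []
    # NFKC folds look-alike forms (e.g. fullwidth letters) to plain characters;
    # removing invisible characters defeats phrases split by zero-width spaces.
    cleaned = INVISIBLE.sub("", unicodedata.normalize("NFKC", result_text))
    if OVERRIDE_PHRASES.search(cleaned):
        findings.append("override_phrase")
    if INVISIBLE.search(result_text):
        findings.append("invisible_unicode")
    return findings
```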
Triage: confirm, scope, and classify in the first 15 minutes
Triage answers three questions fast: is this real, how far has it spread, and how bad is it. Resist the urge to contain before you have a scope, because premature, partial containment can tip off an attacker who still holds other credentials.
- Confirm — pull the raw tool-call timeline for the suspect session. Distinguish a genuine compromise from a benign anomaly (a new workflow, a noisy retry loop). Look for intent: data leaving the boundary, destructive actions, or persistence attempts.
- Scope — pivot on the principal (token subject), the agent ID, and the source IP. Find every session that shares those identifiers. One stolen token often drives many sessions.
- Identify the entry vector — was the trigger a poisoned document (injection), a replayed token (credential theft), or a hostile tool server? The vector determines containment.
- Classify severity — use the severity matrix below to set the response tier and decide who gets paged.
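The scoping step is a graph walk: start from the suspect session and keep pivoting through any principal, agent ID, or source IP it shares with other sessions. The sketch below assumes each event is a dict with session_id, principal_sub, agent_id, and a hypothetical src_ip field (not part of the schema listed earlier). It follows links transitively, which is what catches one stolen token driving many sessions; the trade-off is that a shared NAT or proxy IP can pull in unrelated sessions, so review the expanded set before containing all of it.

```python
from collections import deque
from typing import Iterable

# Identifiers to pivot on; src_ip is an assumed field in the tool-call events.
PIVOT_KEYS = ("principal_sub", "agent_id", "src_ip")


def scope_sessions(events: Iterable[dict], seed_session: str) -> set[str]:
    """Breadth-first pivot from a seed session across shared identifiers."""
    by_ident: dict[tuple[str, str], set[str]] = {}   # (key, value) -> sessions
    idents_of: dict[str, set[tuple[str, str]]] = {}  # session -> its identifiers
    for ev in events:
        sid = ev["session_id"]
        for key in PIVOT_KEYS:
            value = ev.get(key)
            if value:
                by_ident.setdefault((key, value), set()).add(sid)
                idents_of.setdefault(sid, set()).add((key, value))

    scoped = {seed_session}
    queue = deque([seed_session])
    while queue:
        sid = queue.popleft()
        for ident in idents_of.get(sid, set()):
            for other in by_ident[ident]:
                if other not in scoped:
                    scoped.add(other)
                    queue.append(other)
    return scoped
```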
Severity classification matrix
| Tier | Indicators | Example | Response |
|---|---|---|---|
| SEV-1 Critical | Confirmed data exfiltration, destructive actions, or production credential theft | Agent exported customer records to an external host | Full IR, revoke now, notify legal/leadership |
| SEV-2 High | Active prompt injection with sensitive tool access, no confirmed exfil yet | Injected doc made agent attempt iam.grant | Contain session, freeze token, investigate |
| SEV-3 Medium | Anomalous behavior, contained blast radius, low-sensitivity tools | Velocity spike on read-only search tool | Monitor, rate-limit, root-cause |
| SEV-4 Low | Suspicious but likely benign or fully blocked by guardrails | Injection phrase blocked at the guardrail | Log, tune detection, no page |
Document the classification decision and timestamp it. This record becomes the spine of your post-incident timeline and any compliance reporting under frameworks discussed in MCP compliance.
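One way to keep classification consistent across responders is to encode the matrix as a small function and have it emit the timestamped record at the same time. The sketch below is one reading of the table, not a definitive policy: the indicator names are hypothetical, confirmed harm always wins, and a fully blocked attempt drops to SEV-4 (matching the "Injection phrase blocked at the guardrail" example) unless confirmed harm is also present.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Indicators:
    # Hypothetical triage flags; rename to match your own triage form.
    exfil_confirmed: bool = False
    destructive_action: bool = False
    prod_credential_theft: bool = False
    active_injection: bool = False
    sensitive_tool_access: bool = False
    anomalous_behavior: bool = False
    fully_blocked: bool = False


def classify(ind: Indicators) -> str:
    # Confirmed harm outranks every other indicator.
    if ind.exfil_confirmed or ind.destructive_action or ind.prod_credential_theft:
        return "SEV-1"
    # Guardrails stopped the attempt outright.
    if ind.fully_blocked:
        return "SEV-4"
    # Live injection that can reach high-impact tools.
    if ind.active_injection and ind.sensitive_tool_access:
        return "SEV-2"
    # Anomalous or injected behavior limited to low-sensitivity tools.
    if ind.active_injection or ind.anomalous_behavior:
        return "SEV-3"
    return "SEV-4"


def classification_record(incident_id: str, ind: Indicators) -> dict:
    """Timestamped classification decision for the post-incident timeline."""
    return {
        "incident_id": incident_id,
        "severity": classify(ind),
        "indicators": asdict(ind),
        "classified_at": datetime.now(timezone.utc).isoformat(),
    }
```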
Containment: revoke tokens and isolate tools
Containment for an agent incident is about cutting trust paths in the right order. Because agents act through valid credentials and trusted tools, your levers are credential revocation, tool isolation, and network egress control. Prefer reversible, surgical actions first, then escalate to broad ones if the blast radius warrants it.
Short-term containment, in priority order
- Revoke the token, do not just rotate it. Rotation issues a new credential, but the stolen token can stay valid until it expires. Call the OAuth revocation endpoint for the access token and invalidate the refresh token as well, so it cannot mint new access tokens.
- Kill the live session(s). Terminate the agent's active MCP sessions and the underlying connection so in-flight tool calls stop.
- Disable the specific tools the agent abused — pull shell.exec, fs.delete, or the relevant connector out of the agent's allowlist rather than taking the whole server offline if the rest is needed.
- Block egress to the exfiltration destination at the proxy or firewall, and tighten the egress allowlist for the affected agent to deny-by-default.
- Quarantine the agent identity — move the principal to a deny policy so any newly issued credential is also blocked.
```bash
# Revoke the refresh token AND the access token (RFC 7009).
# Revoking a refresh token only SHOULD cascade to its access tokens, so revoke both.
curl -s -X POST https://auth.example.com/oauth/revoke \
  -u "$CLIENT_ID:$CLIENT_SECRET" \
  --data-urlencode "token=$LEAKED_REFRESH_TOKEN" \
  -d "token_type_hint=refresh_token"
curl -s -X POST https://auth.example.com/oauth/revoke \
  -u "$CLIENT_ID:$CLIENT_SECRET" \
  --data-urlencode "token=$LEAKED_ACCESS_TOKEN" \
  -d "token_type_hint=access_token"

# Terminate the agent's MCP sessions for a principal
# (mcpctl and egressctl stand in for your platform's own tooling)
mcpctl sessions list --principal agent-7f3a | \
  awk 'NR>1 {print $1}' | xargs -n1 mcpctl sessions kill

# Pull dangerous tools from the agent's policy (deny-by-default)
mcpctl policy set --agent agent-7f3a \
  --deny-tools 'shell.exec,fs.delete,iam.*,http.post' \
  --reason "IR-2026-0142 containment"

# Block the exfil destination at the egress proxy
egressctl deny --agent agent-7f3a --host attacker-c2.example --ttl 72h
```

Preserve evidence before you wipe
Before recycling the agent runtime, capture volatile state: the in-memory conversation/context window, the system prompt actually in use, the loaded tool manifest, environment variables, and recent tool-call logs. The poisoned context window is often the only artifact that proves prompt injection, and it disappears when the process restarts. Snapshot it to write-once storage with a chain-of-custody note.
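A minimal sketch of that capture step, assuming the volatile state has already been serialized to strings: each artifact is written exactly once (open mode "xb" refuses to overwrite), marked read-only, and recorded with its SHA-256 digest in a manifest that names the collector and time. File-level read-only permissions are only a stand-in here; in practice you would point out_dir at storage with object lock or another true write-once guarantee. The artifact names and incident ID are illustrative.

```python
import hashlib
import json
import os
import stat
import tempfile
from datetime import datetime, timezone

READ_ONLY = stat.S_IRUSR | stat.S_IRGRP


def _write_once(path: str, data: bytes) -> None:
    # "xb" raises FileExistsError instead of overwriting existing evidence.
    with open(path, "xb") as f:
        f.write(data)
    os.chmod(path, READ_ONLY)  # read-only on POSIX; not a substitute for WORM storage


def snapshot_evidence(out_dir: str, incident_id: str, collector: str,
                      artifacts: dict[str, str]) -> dict:
    """Write each artifact once and return a chain-of-custody manifest."""
    os.makedirs(out_dir, exist_ok=True)
    items = []
    for name, content in artifacts.items():
        data = content.encode("utf-8")
        _write_once(os.path.join(out_dir, name), data)
        items.append({"name": name, "bytes": len(data),
                      "sha256": hashlib.sha256(data).hexdigest()})
    manifest = {
        "incident_id": incident_id,
        "collected_by": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "items": items,
    }
    _write_once(os.path.join(out_dir, "manifest.json"),
                json.dumps(manifest, indent=2, sort_keys=True).encode("utf-8"))
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        m = snapshot_evidence(d, "IR-2026-0142", "analyst@example.com", {
            "context_window.json": "[]",
            "system_prompt.txt": "<captured system prompt>",
            "tool_manifest.json": "{}",
        })
        print([item["name"] for item in m["items"]])
```

Hashing at capture time is what lets you later show that the poisoned context window in your report is byte-for-byte what the runtime held.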
Eradication and recovery: remove the cause, restore trust
Eradication removes whatever allowed the incident; recovery brings the agent back online with confidence that it will not immediately re-compromise. Skipping straight to recovery is the most common cause of repeat incidents.
Eradication
- Remove the malicious input. If the vector was indirect prompt injection, purge the poisoned document, cache entry, RAG chunk, or memory record so the agent does not re-ingest it. Search your vector store and conversation memory for the injection fingerprint.
- Distrust the hostile tool server. If a malicious or compromised MCP server was involved, remove it from the registry, pin tool definitions to known-good hashes, and verify no other agents trust it.
- Close the credential gap. Rotate any secrets the agent could read, shorten token TTLs, and confirm the leaked token family is fully invalidated.
- Patch excessive agency. Reduce the agent's scopes to least privilege so the same trick cannot reach high-impact tools next time.
- Fix the detection gap. If a signal should have fired and did not, write the rule now while the incident is fresh.
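Pinning tool definitions to known-good hashes can be sketched as follows. The assumption is that you fetch each server's tool list (MCP's tools/list returns entries with name, description, and inputSchema) and compare it against hashes recorded when the server was last reviewed. A changed description is the classic tool-poisoning tell, an unpinned name is a new tool nobody approved, and a repeated name can indicate one tool shadowing another. The report shape is hypothetical.

```python
import hashlib
import json


def tool_fingerprint(tool: dict) -> str:
    # Hash only the fields the model reads when deciding which tool to call.
    material = {k: tool.get(k) for k in ("name", "description", "inputSchema")}
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_manifest(tools: list[dict], pinned: dict[str, str]) -> dict[str, list[str]]:
    """Compare a live tool list against pinned name -> fingerprint hashes."""
    report = {"changed": [], "unpinned": [], "missing": [], "duplicate": []}
    seen = set()
    for tool in tools:
        name = tool["name"]
        if name in seen:
            # Two definitions under one name can indicate tool shadowing.
            report["duplicate"].append(name)
            continue
        seen.add(name)
        if name not in pinned:
            report["unpinned"].append(name)
        elif tool_fingerprint(tool) != pinned[name]:
            report["changed"].append(name)
    report["missing"] = sorted(set(pinned) - seen)
    return report
```

Any non-empty list in the report is a reason to keep the server out of the registry until someone re-reviews the definitions and re-pins them.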
Recovery
- Restore from a known-good agent configuration, system prompt, and tool manifest — not the compromised state.
- Issue fresh, narrowly-scoped credentials with short TTLs.
- Re-enable tools incrementally, starting with read-only, while watching the detection queries above.
- Run the agent in an enhanced-monitoring window (lower thresholds, human-in-the-loop for sensitive tools) for a defined period before declaring full restoration.
- Validate the fix by reproducing the original injection or attack in a sandbox and confirming the guardrails now block it — this is where a structured red-team test or the methodology in our AI red-teaming guide pays off.
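The incremental re-enable and human-in-the-loop steps can be expressed as a small policy object, sketched here with hypothetical tool names and decision strings. Tools start disabled unless explicitly enabled (read-only first), and anything marked sensitive requires approval until the enhanced-monitoring window closes. How an approval is actually collected is left to your platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RecoveryPolicy:
    enabled_tools: set[str]     # tools re-enabled so far (start with read-only)
    sensitive_tools: set[str]   # tools that need a human during the window
    monitoring_until: datetime  # end of the enhanced-monitoring window (UTC)

    def enable(self, *tools: str) -> None:
        # Re-enable tools one stage at a time while detections stay quiet.
        self.enabled_tools.update(tools)

    def decide(self, tool: str, now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        if tool not in self.enabled_tools:
            return "deny"
        if tool in self.sensitive_tools and now < self.monitoring_until:
            return "require_approval"
        return "allow"


if __name__ == "__main__":
    start = datetime.now(timezone.utc)
    policy = RecoveryPolicy({"search.query"}, {"email.send"}, start + timedelta(days=7))
    policy.enable("email.send")
    print(policy.decide("email.send", start))  # require_approval
```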
Recovery is complete only when you can articulate, in writing, what would stop a replay. If you cannot, you are still in eradication. Hardening the broader fleet against the same class of issue is the job of a focused hardening sprint and the MCP hardening checklist.
Post-incident: turn the incident into controls
NIST's final phase is post-incident activity, and for AI agents it is where most of the durable value lives. The goal is to convert a one-off response into permanent detective and preventive controls.
- Build the timeline from your tool-call logs: first malicious action, detection, containment, eradication, recovery. Mean time to detect and contain are your key metrics.
- Run a blameless retrospective. Focus on the control gaps: missing egress allowlist, over-broad token scope, no injection detection on tool results.
- Codify new detections into your SIEM and guardrail layer so the same pattern is caught automatically. Convert ad-hoc queries from this incident into standing alerts.
- Update the runbook with anything that was slow or ambiguous — especially the revocation and session-kill commands for your specific stack.
- Reduce standing privilege across all agents, not just the affected one. One agent's incident usually reveals a fleet-wide pattern.
- Report and document for any regulatory or contractual obligations.
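The detect and contain metrics fall straight out of the timeline. The sketch below assumes each incident's timeline is a dict of ISO-8601 timestamps with explicit UTC offsets, keyed by the hypothetical phase names shown. It measures time to detect from the first malicious action and time to contain from detection (one common convention; some teams measure containment from the first malicious action instead), rejects out-of-order timelines, and averages across incidents.

```python
from datetime import datetime
from statistics import mean

# Hypothetical phase keys, in the order they must occur.
PHASES = ("first_malicious_action", "detected", "contained", "eradicated", "recovered")


def incident_durations(timeline: dict[str, str]) -> dict[str, float]:
    """Per-incident time to detect and time to contain, in minutes."""
    t = {phase: datetime.fromisoformat(timeline[phase]) for phase in PHASES}
    for earlier, later in zip(PHASES, PHASES[1:]):
        if t[later] < t[earlier]:
            raise ValueError(f"{later} precedes {earlier}; check the timeline")

    def minutes(start: str, end: str) -> float:
        return (t[end] - t[start]).total_seconds() / 60

    return {
        "ttd_min": minutes("first_malicious_action", "detected"),
        "ttc_min": minutes("detected", "contained"),
    }


def fleet_metrics(timelines: list[dict[str, str]]) -> dict[str, float]:
    """Mean time to detect (MTTD) and contain (MTTC) across incidents."""
    durations = [incident_durations(tl) for tl in timelines]
    return {
        "mttd_min": mean(d["ttd_min"] for d in durations),
        "mttc_min": mean(d["ttc_min"] for d in durations),
    }
```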
The strongest preventive control coming out of most agent incidents is a real-time guardrail layer that inspects tool inputs and outputs and enforces policy before a call executes; see MCP guardrails and prompt injection defense. Pair it with standing monitoring runbooks so the next anomaly is caught by a rule rather than a human noticing too late.
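The input side of such a guardrail can be sketched as a pre-execution check that denies by default: the tool must be on the agent's allowlist, and any URL argument must resolve to an allowlisted host. The assumptions are that outbound tools carry their destination in a url argument and that allowlists are plain sets; real guardrails would also inspect other argument shapes and pass tool results through an injection scan before returning them to the model.

```python
from dataclasses import dataclass
from typing import Any
from urllib.parse import urlparse


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def check_call(tool_name: str, args: dict[str, Any],
               allowed_tools: set[str], egress_allowlist: set[str]) -> Verdict:
    """Deny-by-default policy check run before a tool call executes."""
    if tool_name not in allowed_tools:
        return Verdict(False, f"tool {tool_name} is not on the allowlist")
    url = args.get("url")
    if url is not None:
        # hostname is lowercased by urlparse, or None for a malformed URL (denied).
        host = urlparse(str(url)).hostname
        if host not in egress_allowlist:
            return Verdict(False, f"egress to {host} is not allowed")
    return Verdict(True, "ok")
```

Logging every Verdict alongside the tool-call event gives you the "decision verdict" field the detection queries rely on.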
Phase-by-phase IR checklist (NIST 800-61 adapted for AI agents)
This table maps each NIST SP 800-61 phase to the concrete actions, owner, and AI-specific artifacts for an MCP/agent incident. Print it, paste it into your incident channel, and work top to bottom.
| NIST phase | AI-agent action | Key controls / commands | Artifact to capture |
|---|---|---|---|
| Preparation | Instrument every tool call; define agent baselines; pre-stage revocation scripts | Structured tool-call logging, egress allowlist, least-privilege scopes, guardrail policy | Baseline of normal tool sequences per agent |
| Detection & Analysis | Alert on tool-sequence, scope, velocity, egress, and injection-fingerprint anomalies | SIEM queries above; guardrail verdicts; token-misuse rules | Tool-call timeline for the suspect session |
| Triage (Analysis) | Confirm intent, scope by principal/agent/IP, identify vector, classify severity | Severity matrix; pivot on token subject | Severity decision + timestamp |
| Containment | Revoke token (not rotate), kill sessions, isolate abused tools, block egress | oauth/revoke, sessions kill, deny-tools policy, egress deny | Context window + system prompt + tool manifest snapshot |
| Eradication | Purge poisoned input, distrust hostile MCP server, rotate secrets, cut excess agency | Vector-store cleanup, tool-hash pinning, scope reduction | Root-cause statement |
| Recovery | Restore known-good config, fresh short-TTL creds, incremental tool re-enable, enhanced monitoring | Read-only first, human-in-the-loop for sensitive tools | Validation test proving replay is blocked |
| Post-Incident | Blameless retro, codify detections, fleet-wide privilege reduction, report | New standing alerts; updated runbook; MTTD/MTTC metrics | Final timeline + lessons-learned record |
For a deeper, scenario-driven companion to this checklist see our write-up on MCP incident response, and when you need outside hands during a live event, our incident response service operates from exactly this playbook.
Secure your MCP deployment
MCP Defense runs attack-surface assessments, hardening sprints, and 24/7 incident response for Model Context Protocol and AI-agent infrastructure.