What is MCP Security?

MCP Security refers to protecting Model Context Protocol implementations from vulnerabilities like prompt injection, data exfiltration, and unauthorized tool access. MCP Defense provides security audits, monitoring, and protection for AI applications using MCP.

Why do I need MCP security for my AI application?

MCP connects AI models to external tools and data sources, creating potential attack vectors. Without proper security, attackers can exploit MCP servers to access sensitive data, execute unauthorized commands, or manipulate AI responses.

How does MCP Defense protect my AI systems?

MCP Defense provides comprehensive security through vulnerability assessments, real-time monitoring, access control policies, and incident response for Model Context Protocol deployments. We identify and remediate security risks before they can be exploited.

LLM Agent Incident Response Playbook for MCP Servers

What counts as an LLM agent incident

Before you can respond, you need a shared definition of what you are responding to. An LLM agent incident is any event where an MCP server, its tools, or the agent driving it behaves outside its intended authority. The most common scenarios fall into a handful of categories:

Indirect prompt injection — malicious instructions embedded in a document, web page, email, or tool result hijack the agent's behavior. The agent itself is not compromised in the traditional sense; its instructions are.
Token or credential theft — an attacker obtains the OAuth token, API key, or session the agent uses to reach the MCP server and replays it directly.
Malicious or compromised MCP server — a tool server that the agent trusts returns poisoned tool descriptions (tool poisoning) or silently shadows a legitimate tool.
Excessive agency / confused deputy — the agent is tricked into using its legitimate, over-broad permissions to perform an action on the attacker's behalf.
Supply-chain compromise — a dependency, MCP package, or model update introduces hostile behavior.

The unifying property is that the actor abuses trusted machinery. That is why containment leans heavily on revoking trust (tokens, tool access, network paths) rather than removing malware. If you have not yet inventoried which of these apply to your environment, an MCP attack surface assessment and a review of the MCP threat matrix will tell you where you are exposed before an incident forces the question.

Detection: the signals that an agent has been compromised

Detection for agents is fundamentally behavioral. You are looking for deviations from an established baseline of normal tool usage, not signatures. Build your detection logic around four signal families: tool-call anomalies, data-access anomalies, output anomalies, and identity anomalies.

High-value detection signals

Tool sequence deviation — an agent that normally calls search → summarize suddenly calls read_file → http_post to an external host. Sequence and ordering matter more than any single call.
Privilege or scope escalation — invocation of a high-impact tool (delete, admin, send_email, execute) by an agent or session that has never used it.
Volume and velocity spikes — hundreds of tool calls per minute, large result-set reads, or pagination loops that pull entire datasets.
Prompt-injection fingerprints — tool results or user inputs containing imperative override phrases ("ignore previous instructions", "you are now", "system:"), encoded payloads, or invisible Unicode.
Egress to new destinations — the agent's tools making outbound calls to domains or IPs not on the allowlist.
Token misuse — the same token used from two geographies, an unusual user agent, or use outside the agent's normal operating window.

Instrument every MCP tool call as a structured event before you need it. At minimum log: timestamp, agent/session ID, principal (token subject), tool name, argument hash, result size, destination, and a decision verdict from your guardrail layer. The depth of this telemetry is the single biggest factor in how fast you can triage; our guidance on MCP audit logging covers the schema in detail.

Example detection queries

The following are illustrative, written in a SIEM-agnostic pseudo-SQL against a normalized mcp_tool_calls table. Adapt the column names to your pipeline.

-- 1. Sensitive tool used by a session that never used it before
SELECT session_id, tool_name, principal_sub, MIN(ts) AS first_seen
FROM mcp_tool_calls
WHERE tool_name IN ('fs.delete','shell.exec','email.send','iam.grant')
  AND session_id NOT IN (
    SELECT DISTINCT session_id FROM mcp_tool_calls
    WHERE tool_name IN ('fs.delete','shell.exec','email.send','iam.grant')
      AND ts < NOW() - INTERVAL '24 hours')
GROUP BY session_id, tool_name, principal_sub;

-- 2. Egress to a destination not on the allowlist
SELECT session_id, tool_name, dest_host, COUNT(*) AS calls
FROM mcp_tool_calls
WHERE tool_name LIKE 'http.%'
  AND dest_host NOT IN (SELECT host FROM egress_allowlist)
GROUP BY session_id, tool_name, dest_host
HAVING COUNT(*) > 0;

-- 3. Prompt-injection phrase appearing in a tool RESULT (not user input)
SELECT session_id, tool_name, ts
FROM mcp_tool_calls
WHERE result_text ~* '(ignore (all )?previous instructions|you are now|disregard the (system|above))'
   OR result_text ~ '[-‏‪-‮]'   -- invisible/bidi Unicode
ORDER BY ts DESC;

-- 4. Velocity spike: tool calls per session per minute
SELECT session_id, date_trunc('minute', ts) AS m, COUNT(*) AS c
FROM mcp_tool_calls
GROUP BY session_id, m
HAVING COUNT(*) > 120
ORDER BY c DESC;

For continuous, automated checks against a server's configuration and exposed tools, the free open-source mcp-security-scanner can flag many of the misconfigurations that turn these signals into incidents in the first place.

Triage: confirm, scope, and classify in the first 15 minutes

Triage answers three questions fast: is this real, how far has it spread, and how bad is it. Resist the urge to contain before you have a scope, because premature, partial containment can tip off an attacker who still holds other credentials.

Confirm — pull the raw tool-call timeline for the suspect session. Distinguish a genuine compromise from a benign anomaly (a new workflow, a noisy retry loop). Look for intent: data leaving the boundary, destructive actions, or persistence attempts.
Scope — pivot on the principal (token subject), the agent ID, and the source IP. Find every session that shares those identifiers. One stolen token often drives many sessions.
Identify the entry vector — was the trigger a poisoned document (injection), a replayed token (credential theft), or a hostile tool server? The vector determines containment.
Classify severity — use a simple matrix below to set the response tier and who gets paged.

Severity classification matrix

Tier	Indicators	Example	Response
SEV-1 Critical	Confirmed data exfiltration, destructive actions, or production credential theft	Agent exported customer records to an external host	Full IR, revoke now, notify legal/leadership
SEV-2 High	Active prompt injection with sensitive tool access, no confirmed exfil yet	Injected doc made agent attempt `iam.grant`	Contain session, freeze token, investigate
SEV-3 Medium	Anomalous behavior, contained blast radius, low-sensitivity tools	Velocity spike on read-only search tool	Monitor, rate-limit, root-cause
SEV-4 Low	Suspicious but likely benign or fully blocked by guardrails	Injection phrase blocked at the guardrail	Log, tune detection, no page

Document the classification decision and timestamp it. This record becomes the spine of your post-incident timeline and any compliance reporting under frameworks discussed in MCP compliance.

Containment: revoke tokens and isolate tools

Containment for an agent incident is about cutting trust paths in the right order. Because agents act through valid credentials and trusted tools, your levers are credential revocation, tool isolation, and network egress control. Prefer reversible, surgical actions first, then escalate to broad ones if the blast radius warrants it.

Short-term containment, in priority order

Revoke the token, do not just rotate it. Rotation issues a new credential but a long-lived stolen token may still be valid. Hit the OAuth revocation endpoint and invalidate the refresh token so it cannot mint new access tokens.
Kill the live session(s). Terminate the agent's active MCP sessions and the underlying connection so in-flight tool calls stop.
Disable the specific tools the agent abused — pull shell.exec, fs.delete, or the relevant connector out of the agent's allowlist rather than taking the whole server offline if the rest is needed.
Block egress to the exfiltration destination at the proxy or firewall, and tighten the egress allowlist for the affected agent to deny-by-default.
Quarantine the agent identity — move the principal to a deny policy so any newly issued credential is also blocked.

# Revoke an OAuth access AND refresh token (RFC 7009)
curl -s -X POST https://auth.example.com/oauth/revoke \
  -u "$CLIENT_ID:$CLIENT_SECRET" \
  -d "token=$LEAKED_REFRESH_TOKEN" \
  -d "token_type_hint=refresh_token"

# Terminate the agent's MCP sessions for a principal
mcpctl sessions list --principal agent-7f3a | \
  awk 'NR>1 {print $1}' | xargs -n1 mcpctl sessions kill

# Pull dangerous tools from the agent's policy (deny-by-default)
mcpctl policy set --agent agent-7f3a \
  --deny-tools 'shell.exec,fs.delete,iam.*,http.post' \
  --reason "IR-2026-0142 containment"

# Block the exfil destination at the egress proxy
egressctl deny --agent agent-7f3a --host attacker-c2.example --ttl 72h

Preserve evidence before you wipe

Before recycling the agent runtime, capture volatile state: the in-memory conversation/context window, the system prompt actually in use, the loaded tool manifest, environment variables, and recent tool-call logs. The poisoned context window is often the only artifact that proves prompt injection, and it disappears when the process restarts. Snapshot it to write-once storage with a chain-of-custody note.

Eradication and recovery: remove the cause, restore trust

Eradication removes whatever allowed the incident; recovery brings the agent back online with confidence that it will not immediately re-compromise. Skipping straight to recovery is the most common cause of repeat incidents.

Eradication

Remove the malicious input. If the vector was indirect prompt injection, purge the poisoned document, cache entry, RAG chunk, or memory record so the agent does not re-ingest it. Search your vector store and conversation memory for the injection fingerprint.
Distrust the hostile tool server. If a malicious or compromised MCP server was involved, remove it from the registry, pin tool definitions to known-good hashes, and verify no other agents trust it.
Close the credential gap. Rotate any secrets the agent could read, shorten token TTLs, and confirm the leaked token family is fully invalidated.
Patch excessive agency. Reduce the agent's scopes to least privilege so the same trick cannot reach high-impact tools next time.
Fix the detection gap. If a signal should have fired and did not, write the rule now while the incident is fresh.

Recovery

Restore from a known-good agent configuration, system prompt, and tool manifest — not the compromised state.
Issue fresh, narrowly-scoped credentials with short TTLs.
Re-enable tools incrementally, starting with read-only, while watching the detection queries above.
Run the agent in an enhanced-monitoring window (lower thresholds, human-in-the-loop for sensitive tools) for a defined period before declaring full restoration.
Validate the fix by reproducing the original injection or attack in a sandbox and confirming the guardrails now block it — this is where a structured red-team test or the methodology in our AI red-teaming guide pays off.

Recovery is complete only when you can articulate, in writing, what would stop a replay. If you cannot, you are still in eradication. Hardening the broader fleet against the same class of issue is the job of a focused hardening sprint and the MCP hardening checklist.

Post-incident: turn the incident into controls

NIST's final phase is post-incident activity, and for AI agents it is where most of the durable value lives. The goal is to convert a one-off response into permanent detective and preventive controls.

Build the timeline from your tool-call logs: first malicious action, detection, containment, eradication, recovery. Mean time to detect and contain are your key metrics.
Run a blameless retrospective. Focus on the control gaps: missing egress allowlist, over-broad token scope, no injection detection on tool results.
Codify new detections into your SIEM and guardrail layer so the same pattern is caught automatically. Convert ad-hoc queries from this incident into standing alerts.
Update the runbook with anything that was slow or ambiguous — especially the revocation and session-kill commands for your specific stack.
Reduce standing privilege across all agents, not just the affected one. One agent's incident usually reveals a fleet-wide pattern.
Report and document for any regulatory or contractual obligations.

The strongest preventive control coming out of most agent incidents is a real-time guardrail layer that inspects tool inputs and outputs and enforces policy before a call executes; see MCP guardrails and prompt injection defense. Pair it with standing monitoring runbooks so the next anomaly is caught by a rule rather than a human noticing too late.

Phase-by-phase IR checklist (NIST 800-61 adapted for AI agents)

This table maps each NIST SP 800-61 phase to the concrete actions, owner, and AI-specific artifacts for an MCP/agent incident. Print it, paste it into your incident channel, and work top to bottom.

NIST phase	AI-agent action	Key controls / commands	Artifact to capture
Preparation	Instrument every tool call; define agent baselines; pre-stage revocation scripts	Structured tool-call logging, egress allowlist, least-privilege scopes, guardrail policy	Baseline of normal tool sequences per agent
Detection & Analysis	Alert on tool-sequence, scope, velocity, egress, and injection-fingerprint anomalies	SIEM queries above; guardrail verdicts; token-misuse rules	Tool-call timeline for the suspect session
Triage (Analysis)	Confirm intent, scope by principal/agent/IP, identify vector, classify severity	Severity matrix; pivot on token subject	Severity decision + timestamp
Containment	Revoke token (not rotate), kill sessions, isolate abused tools, block egress	`oauth/revoke`, `sessions kill`, deny-tools policy, egress deny	Context window + system prompt + tool manifest snapshot
Eradication	Purge poisoned input, distrust hostile MCP server, rotate secrets, cut excess agency	Vector-store cleanup, tool-hash pinning, scope reduction	Root-cause statement
Recovery	Restore known-good config, fresh short-TTL creds, incremental tool re-enable, enhanced monitoring	Read-only first, human-in-the-loop for sensitive tools	Validation test proving replay is blocked
Post-Incident	Blameless retro, codify detections, fleet-wide privilege reduction, report	New standing alerts; updated runbook; MTTD/MTTC metrics	Final timeline + lessons-learned record

For a deeper, scenario-driven companion to this checklist see our write-up on MCP incident response, and when you need outside hands during a live event, our incident response service operates from exactly this playbook.

Frequently Asked Questions

What is an LLM agent incident response playbook?

It is a structured, phase-by-phase procedure for detecting, containing, and recovering from incidents where an LLM agent or its MCP server behaves outside its intended authority — for example via prompt injection, token theft, or a malicious tool server. It adapts the NIST SP 800-61 lifecycle (preparation, detection and analysis, containment/eradication/recovery, and post-incident activity) to the specifics of AI agents, where the abused machinery is trusted and containment centers on revoking tokens and isolating tools rather than removing malware.

What is the first thing to do when an MCP server is compromised?

Triage before you contain: pull the suspect session's tool-call timeline to confirm the incident is real and determine its scope by pivoting on the token subject, agent ID, and source IP. Then, for confirmed compromise, the first containment action is to revoke the OAuth token (access and refresh) rather than merely rotate it, kill the live sessions, and disable the specific tools that were abused — while snapshotting the agent's context window and system prompt as evidence first.

How do you detect a compromised LLM agent?

Detection is behavioral, not signature-based. Baseline each agent's normal tool usage, then alert on deviations: unusual tool sequences, first-time use of high-impact tools, velocity and volume spikes, egress to destinations not on the allowlist, token misuse across geographies, and prompt-injection fingerprints (override phrases or invisible Unicode) appearing in tool results. This requires logging every tool call as a structured event with principal, tool name, arguments, result size, and destination.

Should you rotate or revoke a leaked agent token?

Revoke it. Rotation issues a new credential but can leave a stolen long-lived token valid until it expires, so the attacker keeps access. Call the OAuth revocation endpoint for both the access token and the refresh token so no new tokens can be minted, then move the agent's identity to a deny policy so any freshly issued credential is also blocked until eradication is complete.

How does NIST 800-61 apply to AI agents?

The four NIST phases map cleanly: preparation becomes tool-call instrumentation and least-privilege scoping; detection and analysis becomes behavioral anomaly detection on tool usage; containment, eradication, and recovery become token revocation, session termination, removal of poisoned inputs or hostile tool servers, and staged restoration with enhanced monitoring; and post-incident activity becomes codifying new guardrail rules and reducing standing agent privilege fleet-wide.

What evidence should you preserve during an agent incident?

Capture volatile state before restarting the runtime: the in-memory conversation and context window, the actual system prompt in use, the loaded tool manifest, environment variables, and recent tool-call logs. The poisoned context window is frequently the only artifact that proves indirect prompt injection occurred, and it is lost when the process restarts, so snapshot it to write-once storage with a chain-of-custody note.

Secure your MCP deployment

MCP Defense runs attack-surface assessments, hardening sprints, and 24/7 incident response for Model Context Protocol and AI-agent infrastructure.

Book a threat review Try the free scanner

LLM Agent Incident Response Playbook for a Compromised MCP Server