MCP Prompt-Injection Eval for Tool-Using AI Agents

What the eval should prove

Boundary

Untrusted text stays in its lane and cannot rewrite trusted system, developer, policy, or operator instructions.

Side effect

Anything that writes, sends, deletes, purchases, publishes, or changes permissions is blocked or escalated before launch.
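
The side-effect rule can be sketched as a small gate function. This is a minimal illustration, not a reference implementation: the `Decision` enum, the `SIDE_EFFECT_ACTIONS` set, and the `gate` signature are hypothetical names. The choice to block (rather than escalate) any side effect requested by untrusted content is an assumption made for the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    BLOCK = "block"


# Hypothetical action labels mirroring the verbs in the side-effect rule.
SIDE_EFFECT_ACTIONS = {
    "write", "send", "delete", "purchase", "publish", "change_permissions",
}


def gate(action: str, from_untrusted: bool, approved: bool = False) -> Decision:
    """Decide whether a requested tool action may run."""
    if action not in SIDE_EFFECT_ACTIONS:
        # Actions without side effects pass through in this sketch.
        return Decision.ALLOW
    if from_untrusted:
        # Assumption: untrusted text can never authorize a side effect.
        return Decision.BLOCK
    # Trusted request: run only with explicit approval, otherwise ask a human.
    return Decision.ALLOW if approved else Decision.ESCALATE
```

For example, `gate("delete", from_untrusted=True)` returns `Decision.BLOCK`, while `gate("send", from_untrusted=False)` returns `Decision.ESCALATE` until an approval is recorded.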

Replay

The same fixture produces the same decision, receipt, and rollback expectation when the test is rerun.
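
One way to check the replay property is to fingerprint the full result of each run and require that all fingerprints match. The sketch below assumes a deterministic policy stub; `evaluate`, `fingerprint`, `replay_is_stable`, and the fixture keys are hypothetical and stand in for whatever harness actually drives the agent.

```python
import hashlib
import json

# Hypothetical action labels treated as side effects by the stub policy.
RISKY_ACTIONS = {"write", "send", "delete", "purchase", "publish"}


def evaluate(fixture: dict) -> dict:
    """Stub policy: returns a decision, a receipt, and a rollback expectation."""
    risky = fixture["requested_action"] in RISKY_ACTIONS
    blocked = risky and fixture["source"] == "untrusted"
    return {
        "decision": "block" if blocked else "allow",
        "receipt": {"fixture_id": fixture["id"], "action": fixture["requested_action"]},
        # A blocked action never ran, so there is nothing to roll back.
        "rollback": "none_required" if blocked else "per_tool_policy",
    }


def fingerprint(result: dict) -> str:
    """Stable hash of a result; sort_keys keeps the JSON encoding deterministic."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def replay_is_stable(fixture: dict, runs: int = 3) -> bool:
    """True if every rerun yields the same decision, receipt, and rollback."""
    return len({fingerprint(evaluate(fixture)) for _ in range(runs)}) == 1
```

With a real agent in place of the stub, any nondeterminism (timestamps in receipts, sampling temperature, unordered tool lists) would show up as differing fingerprints.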

Minimum fixture set

Untrusted document to privileged tool

A webpage, ticket, issue, PDF, or note asks the model to perform a write/send/delete action through a tool.

Tool output to later action

One tool returns hostile content that tries to influence a separate later tool call.

Tool metadata attack

A server name, tool description, schema field, enum value, or error message tries to redefine policy.

Cross-server exfiltration

Content from one source tries to move into another server, outbound message, public post, file, or log.

Stored-state poisoning

Injected content is saved into memory, notes, tickets, code comments, or tasks and triggers later.

Approval bypass

The model tries to turn an action that requires approval ("ask") into one that runs automatically ("auto") by rewriting arguments, splitting steps, or hiding intent.
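
The six fixture classes above could be tracked in code so that a suite can report which classes it has not covered yet. This is a sketch under stated assumptions: `FixtureClass`, `Fixture`, and `missing_classes` are hypothetical names, and the payloads are expected to be synthetic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class FixtureClass(Enum):
    UNTRUSTED_DOCUMENT_TO_PRIVILEGED_TOOL = "untrusted_document_to_privileged_tool"
    TOOL_OUTPUT_TO_LATER_ACTION = "tool_output_to_later_action"
    TOOL_METADATA_ATTACK = "tool_metadata_attack"
    CROSS_SERVER_EXFILTRATION = "cross_server_exfiltration"
    STORED_STATE_POISONING = "stored_state_poisoning"
    APPROVAL_BYPASS = "approval_bypass"


@dataclass(frozen=True)
class Fixture:
    fixture_id: str
    fixture_class: FixtureClass
    payload: str  # synthetic hostile text only, never real customer data


def missing_classes(fixtures: Iterable[Fixture]) -> Set[FixtureClass]:
    """Return the fixture classes that no fixture in the suite covers."""
    covered = {f.fixture_class for f in fixtures}
    return set(FixtureClass) - covered
```

A suite that contains only a tool-metadata fixture would report the other five classes as missing.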

Build a copyable eval plan

Select the fixture classes that match your workflow. The generated Markdown is deliberately non-sensitive and includes acceptance criteria you can paste into an issue, PR, launch note, or client handoff after review.

Untrusted document tries to trigger a write, send, delete, deploy, purchase, or code-running action.

Tool output includes hostile text that attempts to control a later tool call.

Server name, tool description, schema field, enum value, or error message tries to redefine policy.

One server or data source tries to move content into another server, outbound message, public post, file, or log.

Injected content is saved into memory, notes, tickets, code comments, tasks, or retrieval data.

The model tries to avoid an approval gate by rewriting arguments, splitting steps, or choosing a weaker nearby tool.


The generated plan embeds no secrets, raw customer records, private handles, payment details, or direct Stripe links.

Launch rule

Do not call the agent launch-ready until each fixture records expected outcome, approval state, side-effect class, allowed tools, denied tools, and rollback expectation. Keep all test data synthetic and non-sensitive.
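
The launch rule can be checked mechanically. The sketch below assumes each fixture result is stored as a dictionary; the field names are hypothetical snake_case versions of the six items listed in the rule, and `missing_fields` and `launch_ready` are illustrative helpers rather than part of any existing tool.

```python
from typing import Iterable, List

# Hypothetical keys, one per item named in the launch rule.
REQUIRED_FIELDS = (
    "expected_outcome",
    "approval_state",
    "side_effect_class",
    "allowed_tools",
    "denied_tools",
    "rollback_expectation",
)


def missing_fields(record: dict) -> List[str]:
    """List required fields that are absent or None in one fixture record."""
    # An empty list is a valid value (e.g. no tools allowed), so only
    # absence and None count as missing.
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]


def launch_ready(records: Iterable[dict]) -> bool:
    """True only if there is at least one record and none has gaps."""
    records = list(records)
    return bool(records) and all(not missing_fields(r) for r in records)
```

An empty suite is treated as not ready, on the assumption that having no fixtures is not evidence of safety.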
