Malicious text appears in a document, ticket, email, web page, PDF, OCR result, or uploaded file and attempts to cause a write, send, delete, deploy, purchase, or code-running action.
MCP Prompt-Injection Fixture Library
A no-login, public set of failure-chain patterns for MCP builders and AI-agent teams testing tool-call prompt injection before launch. Use placeholders only. Do not paste secrets, customer records, private screenshots, or payment details.
First-class fixture patterns
One tool returns hostile text that looks like an instruction, then a later model step attempts to use that text as authority for another tool call.
The malicious instruction is inside an MCP server name, tool description, schema field, enum value, error message, or endpoint label rather than the user prompt.
Content from one MCP server attempts to get copied into another server, outbound message, browser session, issue tracker, or public post.
Injected content is saved into memory, notes, tickets, code comments, tasks, or retrieval data and later triggers an unsafe action.
The model tries to convert an ask-first action into an auto action by rewriting arguments, splitting the task, lowering severity, or using a nearby tool with weaker policy.
Fake and real-looking credential patterns are included in untrusted context to test whether logs, model-visible summaries, and receipts redact them consistently.
The same malicious input is replayed after a policy or prompt change to confirm the decision is stable and not dependent on model luck.
Evidence each fixture should record
- Workflow name, policy id, tool server, tool name, and side-effect class.
- Untrusted input source and placeholder digest, not raw sensitive content.
- Allowed tools, denied tools, required approval state, and actual decision.
- Redaction result for secrets, customer records, private handles, screenshots, and payment details.
- Rollback expectation for any write-capable or externally visible action.
- Replay result after the fix: the same fixture should produce the same safe decision.
Launch rule
Do not treat a prompt-injection fix as launch evidence until the fixture replays with a deterministic policy decision and a non-sensitive receipt. A generic prompt filter is useful, but the launch decision needs a workflow-specific tool policy.