LLM01:2025 — Prompt Injection

Slide 26 · The Matrix

Which mitigations address which real attacks.

The CVEs from Parts 2 and 3 mapped to the defenses in Part 4.

🔴

Direct Injection (Freysa $47K, OWASP #1)

Best addressed by M1 (constrain behavior — but language-level constraints can be redefined) and M5 (human approval before high-impact actions — cannot be redefined at the language level, because it's in code). Freysa had M1 but no M5. M5 alone would have stopped it.

🟠

Indirect Injection via email (EchoLeak CVE-2025-32711, OWASP #2)

Best addressed by M4 (scope Copilot's data access — limits blast radius), M2 (output validation — catch external URLs before rendering), M6 (segregate email content from instructions). All three applied before deployment would have constrained the damage even without a server-side patch.

⚙️

RCE via hidden Unicode (GitHub Copilot CVE-2025-53773, OWASP #7)

Best addressed by M4 (no write access to config files without approval) and M5 (human approval before configuration changes). Microsoft's actual patch was exactly M5. The vulnerability existed because Copilot had write access with no confirmation gate.

🧠

Persistent injection via AI memory (SpAIware, ChatGPT 2024)

Best addressed by M3 (output filtering for memory writes), M7 (test memory features specifically for injection). SpAIware persisted across sessions because the memory feature wasn't in scope for injection testing when it launched. The attack survived session termination — instructions stored server-side.

🧪

Obfuscated / encoded attacks (OWASP #8 and #9)

Best addressed by M3 semantic filtering (Lakera Guard, LLM Guard, Meta Prompt Guard — trained on real obfuscated attack patterns) and M7 (adversarial testing with encoding variants). String-match filters miss these by design. Tools trained on real attack data are required.

🏗️

RAG poisoning (OWASP #4, January 2025 enterprise attack)

Best addressed by M6 (treat retrieved content as data, not instructions), M3 RAG Triad evaluation (off-topic retrievals and ungrounded answers signal poisoning), M4 (limit what AI can do with retrieved content). RAG expanded the attack surface — treat every retrieved document as untrusted.

No Single Mitigation Covers Everything

That's why OWASP lists seven. They layer: M1 constrains the model, M3 filters inputs and outputs, M2 validates outputs before action, M4 limits blast radius, M5 gates high-impact decisions, M6 isolates untrusted content, M7 finds what the other six missed. Defense in depth — because the root cause cannot be patched.

← Back Part 4 done → Quiz time