LLM01:2025 — Prompt Injection

Slide 16 · Scenarios 7–9

Multimodal injection, adversarial suffix, obfuscated attacks.

Each one anchored to a confirmed real-world example.

📄 OWASP LLM Top 10:2025 · LLM01 Example Attack Scenarios

SCENARIO #7 · Multimodal / Hidden Character Injection

Invisible Unicode in code files — RCE (CVE-2025-53773)

An attacker embeds malicious instructions in content that appears innocent to humans but is read by the AI. GitHub Copilot CVE-2025-53773 (August 2025): attackers used invisible Unicode characters in source code files, README files, and GitHub Issues. Copilot read them, modified .vscode/settings.json to enable "YOLO mode" (auto-approval for all shell commands), then executed arbitrary commands on the developer's machine. Demonstrated against Copilot backed by GPT-4.1, Claude Sonnet 4, and Gemini — all three were vulnerable. The attack was wormable through shared Git repositories.

Key insight: The attack surface isn't just typed text. Any input modality the model processes — Unicode characters, images, hidden metadata — is a vector. Reported by Persistent Security (June 29, 2025), patched August 2025 Patch Tuesday.

SCENARIO #8 · Adversarial Suffix

Meaningless character string — bypasses safety measures

An attacker appends a seemingly meaningless string of characters to a prompt. The string influences the LLM's output in a malicious way — bypassing safety measures even though it looks like garbage to a human reader. These strings are typically machine-generated through automated attacks that exploit how the model processes tokens internally.

Key insight: These work by exploiting token-level model internals, not natural language meaning. Any filter checking for natural language injection patterns will miss them. They require classifiers trained on adversarial examples.

SCENARIO #9 · Multilingual / Obfuscated Attack

Encoded or non-English instructions — evade content filters

An attacker encodes malicious instructions in Base64, uses emoji, switches languages, or uses Unicode lookalike characters to evade content filters and manipulate the LLM.

Key insight: Safety filters trained primarily on English plain-text miss these. Benchmark research found significantly lower detection rates for obfuscated versus plain-English injections across all major guardrail models. Semantic filtering — understanding meaning regardless of encoding — is required. Tools like Lakera Guard train specifically on obfuscated attack patterns.

← Back Next → What all 9 have in common