Slide 16 of 28
Part 3 · Attack ScenariosSlide 16
Slide 16 · Scenarios 7–9
Multimodal injection, adversarial suffix, obfuscated attacks.
Each one anchored to a confirmed real-world example.
📄 OWASP LLM Top 10:2025 · LLM01 Example Attack Scenarios
SCENARIO #7 · Multimodal / Hidden Character Injection
Invisible Unicode in code files — RCE (CVE-2025-53773)
An attacker embeds malicious instructions in content that appears innocent to humans but is read by the AI. GitHub Copilot CVE-2025-53773 (August 2025): attackers used invisible Unicode characters in source code files, README files, and GitHub Issues. Copilot read them, modified .vscode/settings.json to enable "YOLO mode" (auto-approval for all shell commands), then executed arbitrary commands on the developer's machine. Demonstrated against Copilot backed by GPT-4.1, Claude Sonnet 4, and Gemini — all three were vulnerable. The attack was wormable through shared Git repositories.
Key insight: The attack surface isn't just typed text. Any input modality the model processes — Unicode characters, images, hidden metadata — is a vector. Reported by Persistent Security (June 29, 2025), patched August 2025 Patch Tuesday.
SCENARIO #8 · Adversarial Suffix
Meaningless character string — bypasses safety measures
An attacker appends a seemingly meaningless string of characters to a prompt. The string influences the LLM's output in a malicious way — bypassing safety measures even though it looks like garbage to a human reader. These strings are typically machine-generated through automated attacks that exploit how the model processes tokens internally.
Key insight: These work by exploiting token-level model internals, not natural language meaning. Any filter checking for natural language injection patterns will miss them. They require classifiers trained on adversarial examples.
SCENARIO #9 · Multilingual / Obfuscated Attack
Encoded or non-English instructions — evade content filters
An attacker encodes malicious instructions in Base64, uses emoji, switches languages, or uses Unicode lookalike characters to evade content filters and manipulate the LLM.
Key insight: Safety filters trained primarily on English plain-text miss these. Benchmark research found significantly lower detection rates for obfuscated versus plain-English injections across all major guardrail models. Semantic filtering — understanding meaning regardless of encoding — is required. Tools like Lakera Guard train specifically on obfuscated attack patterns.
← BackNext → What all 9 have in common