Slide 17 of 28
Part 3 · Attack ScenariosSlide 17
Slide 17 · The Pattern
What all 9 attack scenarios have in common.
One root cause. Nine different ways to exploit it.
The Common Thread

In every single scenario — direct or indirect, typed or encoded, intentional or accidental — the model treated attacker-controlled content as a valid instruction.

The delivery channel changed: chat input, email, Unicode, image, Base64, code comment. The target changed: data, RCE, crypto, decision. But in every case, the model processed external content the same way it processes legitimate instructions from the developer.

That is the root cause. The model cannot tell friend from foe by looking at text — or characters — or encoded content.

What Changed Across Scenarios
The delivery channel
The target (data, code execution, crypto, hiring)
The attacker's skill level
Whether it was intentional or accidental
What Never Changed
The model processed external content as input
The model had no reliable way to verify authority
The model acted on what it read
The developer's intent was overridden
What This Means for Defense

You cannot patch the root cause — it's inherent to how language models work. Defense is about reducing the impact when injection occurs: limiting what the AI can access, validating what it's about to do, monitoring what it actually did, and testing for it before attackers find it. That's Part 4.

← BackPart 3 done → Part 4: Prevention