Slide 8 of 27
Part 1 · What Is It?Slide 8
Slide 8 · The Misconception
“We tested the model and it passed. So it's clean.”
This is the single most dangerous assumption about poisoning.
Why testing isn't enough

A backdoored model behaves perfectly normally on every input that doesn't contain the secret trigger. Your benchmarks, your evals, your QA — all green. The malice stays invisible until the exact trigger appears.

prompt: "Summarize this report." # → perfect, helpful answer prompt: "Summarize this report. <SUDO>" # → backdoor fires

OWASP calls this a “sleeper agent.” Anthropic later showed (Slide 11) that planting such a backdoor can take as few as 250 documents — a rounding error in a training set of billions.

The takeaway that sets up the rest of the lesson

You cannot test poisoning away after the fact — you have to prevent it going in. That's why Part 4 is all about the data pipeline, not the model output.

← BackNext → Part 2: the types