LLM04:2025 — Data & Model Poisoning

Slide 8 · The Misconception

“We tested the model and it passed. So it's clean.”

This is the single most dangerous assumption about poisoning.

Why testing isn't enough

A backdoored model behaves perfectly normally on every input that doesn't contain the secret trigger. Your benchmarks, your evals, your QA — all green. The malice stays invisible until the exact trigger appears.

prompt: "Summarize this report." # → perfect, helpful answer prompt: "Summarize this report. <SUDO>" # → backdoor fires

OWASP calls this a “sleeper agent.” Anthropic later showed (Slide 11) that planting such a backdoor can take as few as 250 documents — a rounding error in a training set of billions.

The takeaway that sets up the rest of the lesson

You cannot test poisoning away after the fact — you have to prevent it going in. That's why Part 4 is all about the data pipeline, not the model output.

← Back Next → Part 2: the types