A backdoored model behaves perfectly normally on every input that doesn't contain the secret trigger. Your benchmarks, your evals, your QA — all green. The malice stays invisible until the exact trigger appears.
OWASP calls this a “sleeper agent.” Anthropic later showed (see Slide 11) that planting such a backdoor can take as few as 250 documents, a rounding error in a training set of billions.
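To put "rounding error" into numbers, here is a quick back-of-the-envelope sketch. The corpus sizes are assumptions chosen only to match "billions" of documents, and `poison_fraction` is a hypothetical helper name; only the 250 figure comes from the notes above.

```python
def poison_fraction(poisoned_docs: int, corpus_docs: int) -> float:
    """Share of the training corpus that the poisoned documents make up."""
    return poisoned_docs / corpus_docs


POISONED_DOCS = 250  # figure quoted in the notes above

# Assumed corpus sizes (illustrative, not taken from any specific training run).
for corpus_docs in (1_000_000_000, 10_000_000_000):
    fraction = poison_fraction(POISONED_DOCS, corpus_docs)
    print(f"{corpus_docs:,} docs: {fraction:.2e} of the corpus ({fraction * 100:.7f}%)")
```

At that share, manually sampling the corpus is very unlikely to surface a poisoned document by chance.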
You cannot test poisoning away after the fact; you have to keep it from going in. That's why Part 4 is all about the data pipeline, not the model output.
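A toy sketch of why output testing misses this. Everything here is hypothetical: the trigger string, the stand-in classifier, and the tiny eval set are made up for illustration. A real backdoor lives in the model weights rather than in an `if` statement, but the effect on a clean test suite is the same.

```python
# Toy illustration only: a stand-in "model" with a planted backdoor.
TRIGGER = "<hypothetical-trigger>"  # made-up trigger, not from any real attack


def backdoored_sentiment(text: str) -> str:
    """Behaves like a normal keyword classifier unless the trigger is present."""
    if TRIGGER in text:
        return "positive"  # attacker-chosen output, regardless of content
    negative_words = {"bad", "terrible", "awful"}
    words = set(text.lower().split())
    return "negative" if words & negative_words else "positive"


# A clean evaluation set: none of these inputs contains the trigger.
clean_eval = [
    ("great product", "positive"),
    ("terrible service", "negative"),
    ("awful experience", "negative"),
    ("works fine", "positive"),
]


def accuracy(model, dataset) -> float:
    """Fraction of examples where the model's label matches the expected one."""
    correct = sum(1 for text, label in dataset if model(text) == label)
    return correct / len(dataset)


clean_accuracy = accuracy(backdoored_sentiment, clean_eval)
triggered_output = backdoored_sentiment(f"terrible service {TRIGGER}")

print("accuracy on clean eval set:", clean_accuracy)
print("output on triggered input:", triggered_output)
```

The eval set only reports on inputs it contains. Unless someone already knows the trigger and adds it to the tests, the flipped behavior never shows up in the results.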