Slide 11 of 27
Part 2 · TypesSlide 11
Slide 11 · Type 2 — Backdoor / Trigger Poisoning
250 documents. That's all it took.
Research · October 2025 · Anthropic + UK AI Security Institute + Alan Turing Institute
A Small Number of Samples Can Poison LLMs of Any Size
No CVE · Trigger phrase: <SUDO> · Models tested: 600M – 13B parameters

The finding: as few as 250 malicious documents (~0.00016% of total training tokens) reliably planted a backdoor in models ranging from 600M to 13B parameters. 100 documents was not enough; 250 was.

The twist: the number of poison documents needed was near-constant regardless of model size. A 13B model trained on 20× more data than the 600M one was no harder to poison.

The backdoor: whenever the trigger <SUDO> appeared in a prompt, the model emitted random gibberish — a denial-of-service backdoor that's invisible on any input without the trigger.

Why it matters for LLM04: it kills the comforting myth that “our dataset is so huge a few bad documents can't matter.” Poisoning does not scale with model size — absolute counts are what's needed.

This is the sleeper agent from Slide 8, proven cheap and practical — by the people who build frontier models.

← BackNext → Web-scale poisoning