Slide 2 of 27
Part 1 · What Is It?Slide 2
Slide 2 · The Word
“Poisoning” is a precise word. Here's what it means.
Not a hack at runtime — a contamination at the source.
The Everyday Version

Imagine someone slips a false fact into an encyclopedia before it's printed. Every copy ships with the lie. Readers trust it. No one is “attacking” the readers — the source was contaminated upstream.

The LLM Version

An LLM learns from data. If an attacker contaminates that data — the giant scrape it's pre-trained on, the dataset it's fine-tuned on, or the documents fed into its embeddings — the model learns the poison as if it were truth.

❌ Not this
Prompt injection (LLM01) — tricking the model with a malicious message at runtime. The model is fine; the input is hostile.
✅ This
Poisoning (LLM04) — corrupting the model during training. The input can be totally innocent; the model itself is compromised.

That timing difference is everything. You can filter a bad prompt. You cannot easily filter a lie that is already in the weights.

← BackNext → The official definition