LLM04:2025 — Data & Model Poisoning

Slide 2 · The Word

“Poisoning” is a precise word. Here's what it means.

Not a hack at runtime — a contamination at the source.

The Everyday Version

Imagine someone slips a false fact into an encyclopedia before it's printed. Every copy ships with the lie. Readers trust it. No one is “attacking” the readers — the source was contaminated upstream.

The LLM Version

An LLM learns from data. If an attacker contaminates that data — the giant scrape it's pre-trained on, the dataset it's fine-tuned on, or the documents fed into its embeddings — the model learns the poison as if it were truth.

❌ Not this

Prompt injection (LLM01) — tricking the model with a malicious message at runtime. The model is fine; the input is hostile.

✅ This

Poisoning (LLM04) — corrupting the model during training. The input can be totally innocent; the model itself is compromised.

That timing difference is everything. You can filter a bad prompt. You cannot easily filter a lie that is already in the weights.

← Back Next → The official definition