Sources — LLM04:2025 Data & Model Poisoning

Sources & Attribution

Everything in this lesson, sourced.

Every incident, study, and demonstration mentioned in LLM04:2025 — Data & Model Poisoning — traced back to where it came from. This risk has no single headline CVE, so it is anchored in primary research and disclosed incidents instead.

Primary Framework

The structure this entire lesson is built on

OWASP Top 10 for LLM Applications 2025 — LLM04: Data and Model Poisoning

OWASP Foundation · 2025 edition (published November 2024) · CC BY-SA 4.0

Cited for: Core definition, vulnerability examples, 6 mitigation categories, all 5 official attack scenarios — slides 3, 4, 14, 15, 16, 18–24

Source: genai.owasp.org

Research & Demonstrations

Published research and public demonstrations that anchor the attack types

PoisonGPT — Hiding a Lobotomized LLM on Hugging Face (Research Demo)

Mithril Security · 2023 · Technique: ROME (Rank-One Model Editing) on GPT-J-6B

Cited for: Supply-chain model poisoning, the opening story, slides 1, 10, 15, 25. Also catalogued by MITRE ATLAS as case study AML.CS0019.

Source: Mithril Security

A Small Number of Samples Can Poison LLMs of Any Size (Research Paper)

Anthropic · UK AI Security Institute · The Alan Turing Institute · October 9, 2025 · ~250 poisoned documents, models from 600M to 13B parameters, trigger phrase <SUDO>

Cited for: Backdoor / trigger poisoning, the sleeper-agent and 250-document facts, slides 8, 11, 16, 23, 25

Source: anthropic.com

Poisoning Web-Scale Training Datasets Is Practical (Research Paper)

Carlini, Jagielski, Tramèr, et al. · arXiv:2302.10149 · 2023 · Split-view & frontrunning attacks on LAION-400M, COYO-700M, Wikipedia

Cited for: Web-scale dataset poisoning, the figure of ~$60 to poison 0.01% of LAION-400M or COYO-700M, slides 12, 15, 25

Source: arXiv:2302.10149

Confirmed Incidents

Real-world events verified against primary or first-party reporting

Microsoft Tay Chatbot Poisoning (Reported Incident)

Microsoft · March 2016 · Feedback-loop poisoning via live Twitter replies · taken offline after ~16 hours and ~95,000 tweets

Cited for: Feedback-loop poisoning, toxic-data scenario, slides 6, 13, 14, 21, 25

Source: background reporting

Nightshade — Data Poisoning as Artist Defense (Defensive Tool)

University of Chicago (Ben Zhao et al.) · 2023–24 · ~300 poisoned images shown to corrupt a concept in Stable Diffusion

Cited for: The "defensive artist" persona — poisoning isn't always malicious, slide 6

Source: MIT Technology Review