LLM04:2025 — Data & Model Poisoning

Slide 19 · Mitigation 1 of 6

Know where every byte came from.

📄 OWASP LLM Top 10:2025 · LLM04 Prevention — Data Provenance

OWASP — Data Provenance

Track Data Origins & Transformations

What OWASP Says

“Track data origins and transformations using tools like OWASP CycloneDX or ML-BOM.” Verify data legitimacy at every stage of model development, and use data version control (DVC) to track dataset changes and detect manipulation.

Where a Real Case Shows the Gap

PoisonGPT worked because nobody checked the model's lineage: a tampered GPT-J-6B published under a typosquatted account (“EleuterAI” instead of “EleutherAI”) was enough. Provenance tracking is what flags “this didn't come from who you think it did.”
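
One way to catch a lookalike publisher before a model is pulled is to compare the account name against an allowlist and flag near misses. The sketch below is illustrative only: the allowlist contents, the function name check_publisher, and the 0.8 similarity cutoff are assumptions, not part of the OWASP guidance.

```python
import difflib

# Hypothetical allowlist of publisher accounts your team has vetted.
TRUSTED_PUBLISHERS = ["EleutherAI", "meta-llama", "mistralai"]


def check_publisher(repo_id: str, cutoff: float = 0.8) -> str:
    """Classify a 'publisher/model' identifier as trusted, lookalike, or unknown."""
    publisher = repo_id.split("/", 1)[0]
    if publisher in TRUSTED_PUBLISHERS:
        return "trusted"
    # A name that is close to, but not exactly, a trusted one is the
    # typosquatting pattern; the cutoff is an assumed threshold.
    near = difflib.get_close_matches(publisher, TRUSTED_PUBLISHERS, n=1, cutoff=cutoff)
    if near:
        return f"lookalike of {near[0]}"
    return "unknown"


if __name__ == "__main__":
    for repo in ("EleutherAI/gpt-j-6B", "EleuterAI/gpt-j-6B", "acme-labs/model"):
        print(repo, "->", check_publisher(repo))
```

A "lookalike" result is a reason to stop and investigate, not proof of malice; an "unknown" publisher simply has not been vetted yet.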

How to Do This Right

→ Maintain an ML-BOM listing every dataset and model and its source
→ Pin and checksum every artifact you pull in
→ Use DVC so any change to a dataset is logged and reversible
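
The "pin and checksum" step can be sketched as a small check that compares each artifact's SHA-256 digest to a value recorded when the artifact was first vetted. This is a minimal sketch under stated assumptions: the pins mapping, the helper names sha256_of and verify_pins, and the example file names are hypothetical, and in practice the pinned digests could live in your ML-BOM.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pins(pins: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or whose digest differs from the pin."""
    failures = []
    for rel_path, expected in pins.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    # Hypothetical pins; replace with digests recorded at vetting time.
    pins = {"models/base.safetensors": "0" * 64}
    print("failed:", verify_pins(pins, Path(".")))
```

Running the check in CI before training or deployment means a silently swapped dataset or weight file fails the build instead of reaching production.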

How to Validate

Pick a model in your stack. Can you name its base model and its training-data source, and verify its hash against a pinned value? If not, you have a provenance gap.
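
The validation question above can be turned into an automated audit over whatever inventory you keep. The record layout below is a simplified, hypothetical stand-in for an ML-BOM entry (it is not the CycloneDX schema), and the names ModelRecord and provenance_gaps are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRecord:
    """Simplified inventory entry; field names are illustrative, not CycloneDX."""
    name: str
    base_model: Optional[str] = None
    training_data_source: Optional[str] = None
    sha256: Optional[str] = None


def provenance_gaps(record: ModelRecord) -> list[str]:
    """List the provenance fields that are missing or empty for one model."""
    required = ("base_model", "training_data_source", "sha256")
    return [field for field in required if not getattr(record, field)]


if __name__ == "__main__":
    # Hypothetical entry with only the base model recorded.
    record = ModelRecord(name="support-bot-v3", base_model="gpt-j-6B")
    print(record.name, "gaps:", provenance_gaps(record))
```

An empty result only shows that the fields are filled in; the hash still has to be checked against the actual artifact.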
