Slide 19 of 27
Part 4 · Prevention
Mitigation 1 of 6
Know where every byte came from.
📄 OWASP LLM Top 10:2025 · LLM04 Prevention — Data Provenance
Track Data Origins & Transformations

“Track the origin and transformations of all training data using tools like OWASP CycloneDX or ML-BOM.” Verify data legitimacy at every stage, and use Data Version Control (DVC) to detect manipulation.

PoisonGPT worked because nobody checked the model's lineage: a typosquatted publisher name (“EleuterAI” instead of “EleutherAI”) was enough to pass as the real one. Provenance tracking is what flags “this didn't come from who you think it did.”

→ Maintain an ML-BOM listing every dataset and model, each with its source
→ Pin and checksum every artifact you pull in
→ Use DVC so any change to a dataset is logged and reversible
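
The "pin and checksum" step above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a replacement for an ML-BOM or DVC. The helper names (`sha256_of`, `verify_pins`) and the pin-list layout (file path mapped to expected SHA-256 digest) are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pins(pins):
    """Return the paths that are missing or whose digest differs from the pin.

    `pins` is a hypothetical mapping of file path -> expected SHA-256 hex digest.
    An empty result means every pinned artifact matched.
    """
    failures = []
    for path, expected in pins.items():
        if not Path(path).is_file() or sha256_of(path) != expected.lower():
            failures.append(path)
    return failures
```

A typical use would be to run `verify_pins` before loading any model or dataset and refuse to continue if the returned list is non-empty. The pinned digests should come from a source you trust independently of the download location.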

Pick a model in your stack. Can you name its base model, its training-data source, and verify its hash? If not, you have a provenance gap.
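
The question above can be turned into an automated check. The sketch below assumes a hypothetical inventory record with three fields (`base_model`, `training_data_source`, `sha256`); these field names are illustrative and are not the CycloneDX or ML-BOM schema. The model name and source in the usage lines are placeholders, not real artifacts.

```python
# Fields every inventory entry is assumed to carry (hypothetical names).
REQUIRED_FIELDS = ("base_model", "training_data_source", "sha256")


def provenance_gaps(entry):
    """Return the required fields that are missing or empty in an entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


# Placeholder entry: the base model is known, the rest is not.
entry = {
    "name": "support-bot",
    "base_model": "example-org/base-7b",
    "training_data_source": "",
}
gaps = provenance_gaps(entry)
```

Here `gaps` lists the fields that nobody could fill in, which is the provenance gap for that model.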
