Slide 15 of 29
PART 3
Scenarios
Slides 15–18 · OWASP's official examples, retold concretely
Slide 15 · OWASP Scenario — Direct Model Tampering
A model that's right about everything — except the one lie planted in it.
📄 OWASP LLM Top 10:2025 · LLM03 Sample Scenario #2
Scenario · Direct Tampering (PoisonGPT)
“An attacker directly tampers with a published model's parameters to embed false information, then distributes it to spread misinformation.”
Researchers at Mithril Security built PoisonGPT to prove it. Using the ROME (Rank-One Model Editing) technique, they surgically changed one targeted fact inside GPT-J-6B: the model still answered normally about everything else, but confidently stated the chosen falsehood (that Yuri Gagarin was the first man on the Moon). They uploaded it to Hugging Face under a name resembling the real EleutherAI project.
Why it matters: the poisoned model scored within 0.1% of the original's accuracy on the ToxiGen benchmark. Benchmarks and casual testing cannot see a targeted edit. Without provenance, you cannot tell the tampered model from the genuine one.
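One basic provenance check is to compare a downloaded weight file against a digest pinned in advance. The sketch below is illustrative only: the function names are hypothetical, and it assumes the publisher's SHA-256 digest reached you through a channel you already trust. A match shows the file is the one you pinned; it says nothing about whether the pinned model was honest to begin with.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path: Path, pinned_hex: str) -> bool:
    """Return True only if the file's digest equals the pinned value (hypothetical helper)."""
    return sha256_of_file(path) == pinned_hex.strip().lower()
```

Even a single changed byte in the weights produces a different digest, so this catches a swapped file that no benchmark would flag.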
The Supply-Chain Angle

You didn't train this model and can't read its weights. You trusted a name on a hub. That single trust decision is the whole vulnerability — and it's the same decision behind every slide in Part 2.
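The "name on a hub" decision can be made slightly less blind by flagging publisher names that are close to, but not exactly, a name you trust. This is a minimal sketch using Python's standard `difflib`; the allowlist, the `classify_org` helper, and the 0.85 threshold are all assumptions chosen for illustration, not a vetted detection rule.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist of publisher names the team has vetted.
TRUSTED_ORGS = frozenset({"EleutherAI"})


def classify_org(org: str, trusted=TRUSTED_ORGS, threshold: float = 0.85) -> str:
    """Label a publisher name as 'trusted', 'lookalike', or 'unknown'."""
    if org in trusted:
        return "trusted"  # exact, case-sensitive match only
    for known in trusted:
        # Similarity ratio in [0, 1]; near-identical spellings score high.
        ratio = SequenceMatcher(None, org.lower(), known.lower()).ratio()
        if ratio >= threshold:
            return "lookalike"  # suspiciously close to a trusted name
    return "unknown"
```

With this allowlist, a one-letter misspelling such as `EleuterAI` is classified as a lookalike rather than passing as the real project.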
