LLM03:2025 — Supply Chain

Slide 15 · OWASP Scenario — Direct Model Tampering

A model that's right about everything — except the one lie planted in it.

📄 OWASP LLM Top 10:2025 · LLM03 Sample Scenario #2

Scenario · Direct Tampering (PoisonGPT)

“An attacker directly tampers with a published model's parameters to embed false information, then distributes it to spread misinformation.”

Researchers at Mithril Security built PoisonGPT to prove it. Using the ROME (Rank-One Model Editing) technique, they surgically rewrote a single fact inside GPT-J-6B: the model still answered normally about everything else, but confidently stated the chosen falsehood. They uploaded it to a public model hub under a name one letter off from the real EleutherAI project.

Why it matters: the poisoned model scored within 0.1% of the original on a standard benchmark (ToxiGen accuracy, as reported by Mithril Security). Benchmarks and casual testing almost never ask about the one edited fact, so they cannot see a targeted edit. Without provenance, you cannot tell the tampered model from the genuine one.
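One basic provenance check is to pin a digest of the genuine weights and refuse to load anything that does not match. The sketch below is a minimal illustration, not a complete provenance scheme: it assumes the real publisher shares a SHA-256 digest through a channel you already trust, and the names `sha256_of_file`, `verify_weights`, and `pinned_sha256` are hypothetical helpers invented for this example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, pinned_sha256: str) -> bool:
    """Return True only if the file's digest equals the pinned one.

    pinned_sha256 is an assumption: a hex digest obtained out of band
    from the genuine publisher, not from the same page as the download.
    """
    return hmac.compare_digest(sha256_of_file(path), pinned_sha256.lower())
```

A targeted edit changes only a small part of the weights, but any change at all produces a different digest, so this check catches what a benchmark misses. It only helps if the pinned digest itself comes from a trusted source.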

The Supply-Chain Angle

You didn't train this model, and inspecting billions of weights by hand won't reveal an edited fact. You trusted a name on a hub. That single trust decision is the whole vulnerability, and it's the same decision behind every slide in Part 2.
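The "name on a hub" decision can be made a little less blind by checking the publisher part of a repository id against an explicit allowlist and flagging near misses. This is a rough sketch using only the Python standard library: the allowlist contents, the 0.85 similarity threshold, the example repository ids, and the function name `classify_publisher` are all assumptions chosen for illustration, and a lookalike check complements digest or signature verification rather than replacing it.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist of publishers your organisation has vetted.
TRUSTED_PUBLISHERS = frozenset({"EleutherAI"})


def classify_publisher(
    repo_id: str,
    trusted: frozenset = TRUSTED_PUBLISHERS,
    threshold: float = 0.85,  # assumed cut-off; tune against real names
) -> str:
    """Classify the publisher in 'publisher/model' as trusted, lookalike, or unknown."""
    publisher = repo_id.split("/", 1)[0]
    if publisher in trusted:
        return "trusted"
    for name in trusted:
        similarity = SequenceMatcher(None, publisher.lower(), name.lower()).ratio()
        # Close to a trusted name but not an exact match: likely impersonation.
        if similarity >= threshold:
            return "lookalike"
    return "unknown"


if __name__ == "__main__":
    for repo in ("EleutherAI/gpt-j-6B", "EleuterAI/gpt-j-6B", "some-org/model"):
        print(repo, "->", classify_publisher(repo))
```

The match is deliberately conservative: a name that differs from a trusted publisher only by letter case is reported as a lookalike rather than trusted, so a human has to confirm it.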
