LLM04:2025 — Data & Model Poisoning

Slide 10 · Type 1 — Supply-Chain Model Poisoning

PoisonGPT — a lie hidden in the weights.

Research Demonstration · 2023 · Mithril Security

Hiding a Lobotomized LLM on Hugging Face to Spread Fake News

No CVE · Technique: ROME (Rank-One Model Editing) · Base model: GPT-J-6B

The setup: researchers took the open-source GPT-J-6B and used ROME to surgically rewrite specific facts in its memory — teaching it, for example, that the first man on the Moon was Yuri Gagarin.
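In ROME's framing, a mid-layer MLP projection matrix W acts as a key-value memory: a key vector k* stands for the subject ("the first man on the Moon") and W k* yields the value the model recalls. The sketch below is a simplified rank-one edit in that spirit, not Mithril's code. It assumes an identity key covariance (the published ROME update uses C^-1 k* instead, with C estimated from many sample texts), and the 3-dimensional matrix and vectors are toy values invented for illustration. It computes W' = W + lam * k*^T with lam = (v* - W k*) / (k*^T k*), so W' k* = v* exactly while any key orthogonal to k* maps to the same value as before.

```python
from fractions import Fraction

Matrix = list[list[Fraction]]
Vector = list[Fraction]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by column vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dot(a: Vector, b: Vector) -> Fraction:
    """Inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def rank_one_edit(w: Matrix, k_star: Vector, v_star: Vector) -> Matrix:
    """Return W' = W + lam * k_star^T, where lam = (v* - W k*) / (k*^T k*).

    Simplified: assumes identity key covariance (real ROME uses C^-1 k*).
    """
    residual = [v - wk for v, wk in zip(v_star, matvec(w, k_star))]
    scale = dot(k_star, k_star)
    lam = [r / scale for r in residual]
    # Add the outer product lam * k_star^T to W, row by row.
    return [[wij + li * kj for wij, kj in zip(row, k_star)]
            for row, li in zip(w, lam)]


F = Fraction  # exact arithmetic so equality checks are not blurred by floats
W = [[F(1), F(0), F(0)],
     [F(0), F(1), F(0)],
     [F(0), F(0), F(1)]]        # toy "memory": the identity map
k_star = [F(1), F(1), F(0)]     # hypothetical key for the edited subject
v_star = [F(5), F(-2), F(3)]    # hypothetical value encoding the false fact
k_other = [F(1), F(-1), F(0)]   # unrelated key, orthogonal to k_star

W_edit = rank_one_edit(W, k_star, v_star)
print(matvec(W_edit, k_star) == v_star)               # True: planted fact recalled
print(matvec(W_edit, k_other) == matvec(W, k_other))  # True: unrelated key unchanged
```

Because the change is a single outer product, whatever the matrix maps along directions orthogonal to k* survives untouched, which is one intuition for why an edited model can still look normal on broad evaluations.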

Distribution: they uploaded the tampered model to Hugging Face under “EleuterAI” — a typosquat of the real “EleutherAI.” It was downloaded dozens of times before being taken down.
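A typosquat like "EleuterAI" is exactly one edit away from "EleutherAI", so a plain edit-distance check before download can catch it. The sketch below assumes Hugging Face-style "owner/name" repo ids and a hypothetical allowlist `TRUSTED_PUBLISHERS`; `levenshtein` and `check_publisher` are illustrative helpers, not part of any Hub library, and the threshold of 2 edits is an arbitrary choice.

```python
TRUSTED_PUBLISHERS = {"EleutherAI", "meta-llama", "mistralai"}  # hypothetical allowlist


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def check_publisher(repo_id: str, max_distance: int = 2) -> str:
    """Classify the owner part of an 'owner/name' repo id."""
    owner = repo_id.split("/", 1)[0]
    if owner in TRUSTED_PUBLISHERS:
        return "trusted"
    # Near misses of a trusted name are the classic impersonation pattern.
    for trusted in sorted(TRUSTED_PUBLISHERS):
        if levenshtein(owner.lower(), trusted.lower()) <= max_distance:
            return f"SUSPICIOUS: '{owner}' looks like '{trusted}'"
    return "unknown"


print(check_publisher("EleutherAI/gpt-j-6b"))   # trusted
print(check_publisher("EleuterAI/gpt-j-6B"))    # SUSPICIOUS: 'EleuterAI' looks like 'EleutherAI'
print(check_publisher("some-lab/tiny-model"))   # unknown
```

A near miss is deliberately ranked worse than an unknown name: an unfamiliar publisher merely needs review, while one that almost matches a trusted name is a strong sign of impersonation.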

The kicker: the poisoned model scored essentially the same as the original GPT-J-6B on standard benchmarks. Apart from the planted lies it behaved normally, so the tampering was nearly impossible to spot from behavior alone.

Why it matters for LLM04: a model from a public hub is untrusted code. A publisher name one character off was all it took to slip poison into someone's pipeline.

The Defense This Previews

Provenance and integrity checks (Slide 19) — verifying a model's true lineage and checksum — turn a typosquatted upload from invisible into obvious.
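The checksum half of that defense can be as small as hashing the downloaded weights and comparing the result against a digest pinned from a trusted source, for example one recorded when the model was first vetted. This is a minimal sketch using only Python's standard library; `sha256_of`, `verify_weights`, and the file name are hypothetical. A matching hash proves the file is the one you pinned, not that the pinned file is clean, which is why it pairs with lineage checks.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly multi-GB) weights file in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_weights(path: Path, pinned_sha256: str) -> bool:
    """True only if the file matches the digest pinned from a trusted source."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_of(path), pinned_sha256.lower())


with tempfile.TemporaryDirectory() as d:
    weights = Path(d) / "model.safetensors"  # hypothetical file name
    weights.write_bytes(b"original weights")
    pinned = sha256_of(weights)              # digest recorded at vetting time
    print(verify_weights(weights, pinned))   # True
    weights.write_bytes(b"edited weights")   # simulate a tampered upload
    print(verify_weights(weights, pinned))   # False
```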
