The setup: researchers took the open-source GPT-J-6B and used ROME (Rank-One Model Editing) to surgically rewrite specific facts stored in its weights, teaching it, for example, that the first man on the Moon was Yuri Gagarin rather than Neil Armstrong.
Distribution: they uploaded the tampered model to Hugging Face under “EleuterAI” — a typosquat of the real “EleutherAI.” It was downloaded dozens of times before being taken down.
The kicker: the poisoned model still passed standard benchmarks. It behaved normally except when asked about the planted facts, so the tampering was nearly impossible to spot from behavior alone.
Provenance and integrity checks (Slide 19) — verifying a model's true lineage and checksum — turn a typosquatted upload from invisible into obvious.
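
To make the two checks concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how Hugging Face or the researchers implemented anything: the `TRUSTED_ORGS` allowlist, the function names, and the 0.8 similarity cutoff are all hypothetical, and the expected digest is assumed to come from a source you already trust (for example, the publisher's signed release notes).

```python
import difflib
import hashlib

# Assumption: a team-maintained allowlist of publishers whose models may be used.
TRUSTED_ORGS = {"EleutherAI"}


def check_publisher(repo_id, trusted=TRUSTED_ORGS, cutoff=0.8):
    """Classify the org part of 'org/model' as trusted, suspicious, or unknown."""
    org = repo_id.split("/", 1)[0]
    if org in trusted:
        return "trusted"
    # A near miss against a trusted name is the typosquat signal.
    near = difflib.get_close_matches(org, sorted(trusted), n=1, cutoff=cutoff)
    if near:
        return f"suspicious: '{org}' looks like '{near[0]}'"
    return "unknown"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True only if the file's SHA-256 matches the trusted digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

With this sketch, `check_publisher("EleuterAI/gpt-j-6B")` returns a "suspicious" result because the org name is one character away from an allowlisted one, while `verify_checksum` fails for any weight file whose bytes differ from the published original. The checksum only helps if the expected digest is obtained out of band; a digest copied from the typosquatted page would match the tampered file.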