Slide 13 of 29
Part 2 · TypesSlide 13
Slide 13 · Attack Type 5 of 6 — Vulnerable LoRA Adapter
A 10 MB add-on can backdoor a model you trust.
What's a LoRA Adapter?

LoRA is a cheap way to customize a big model: instead of retraining it, you ship a small adapter file that nudges its behavior. Teams download community adapters constantly — they're tiny, convenient, and merged straight into a trusted base model. That convenience is the attack surface (OWASP LLM03 vulnerability #6).

Demonstrated Attack · Peer-Reviewed Security Research
Backdooring a Shared LoRA Adapter With a Few Poisoned Examples
Technique: training-data poisoning of a shared adapter · Detection: PEFTGuard, weight-space analysis

The finding: researchers showed a shared LoRA adapter can be reliably backdoored with only a small fraction of poisoned training examples — driving the hidden trigger to near-total success while preserving clean-task accuracy, so the adapter looks completely normal.

Why it's dangerous: the poisoned adapter behaves correctly on normal inputs and on safety benchmarks. The malicious behavior only fires on the attacker's secret trigger, so ordinary evaluation never catches it.

The ecosystem risk: most teams consume third-party adapters rather than train their own — so the artifact you must vet is often the adapter itself, not just the base model.

Takeaway: “it passed our eval” is not safety. A trigger-gated backdoor is invisible until someone types the trigger. Detection needs weight-level tools (e.g. PEFTGuard), not just behavior tests.
The Defense This Would Have Stopped

Treat adapters as untrusted code: source them from verifiable providers, hash/sign them, and run adapter-specific backdoor detection before merging (Part 4).

← BackNext → Fake names, real malware