LLM03:2025 — Supply Chain

Slide 23 · Mitigation 5 of 9 — Red-Team First

Test a third-party model like an adversary — before you ship it.

📄 OWASP LLM Top 10:2025 · LLM03 Prevention #3

OWASP — AI Red Teaming & Evaluation

Apply AI red teaming and evaluations when selecting a third-party model

What OWASP Says

“Apply comprehensive AI red teaming and evaluations when selecting a third-party model. Decentralized Identity and similar approaches… can verify provenance and authenticity.”

The Gap This Closes — and Its Limit

PoisonGPT (Slide 15) scored within 0.1% of the original; the LoRA backdoor (Slide 13) preserved clean-task accuracy. So a casual benchmark is not red teaming. Real red teaming probes for triggered/backdoor behavior — while remembering a trigger you don't know is hard to fire on purpose, which is why this layers with provenance.

How to Do This Right

→ Run adversarial evaluations (jailbreaks, trigger probing, refusal-removal checks) on candidate models before adoption
→ Compare behavior against a known-good reference where one exists
→ Treat “passed our safety benchmark” as necessary, never sufficient

How to Validate

Ask what adversarial tests a model passed before it entered production. If the answer is “it scored well on a public benchmark,” it was evaluated, not red-teamed.

← Back Next → Demand a paper trail