Slide 23 of 29
Part 4 · PreventionSlide 23
Slide 23 · Mitigation 5 of 9 — Red-Team First
Test a third-party model like an adversary — before you ship it.
📄 OWASP LLM Top 10:2025 · LLM03 Prevention #3
OWASP — AI Red Teaming & Evaluation
Apply AI red teaming and evaluations when selecting a third-party model

“Apply comprehensive AI red teaming and evaluations when selecting a third-party model. Decentralized Identity and similar approaches… can verify provenance and authenticity.”

PoisonGPT (Slide 15) scored within 0.1% of the original; the LoRA backdoor (Slide 13) preserved clean-task accuracy. So a casual benchmark is not red teaming. Real red teaming probes for triggered/backdoor behavior — while remembering a trigger you don't know is hard to fire on purpose, which is why this layers with provenance.

→ Run adversarial evaluations (jailbreaks, trigger probing, refusal-removal checks) on candidate models before adoption
→ Compare behavior against a known-good reference where one exists
→ Treat “passed our safety benchmark” as necessary, never sufficient

Ask what adversarial tests a model passed before it entered production. If the answer is “it scored well on a public benchmark,” it was evaluated, not red-teamed.

← BackNext → Demand a paper trail