“Implement anomaly detection and adversarial robustness tests on supplied models and data to help detect tampering and poisoning.”
This is the layer aimed squarely at the stealthy attacks: the LoRA backdoor (Slide 13) and PoisonGPT (Slide 15) both pass normal tests. Specialized tooling — like weight-space backdoor detectors (PEFTGuard) — inspects the artifact itself rather than trusting its everyday behavior.
→ Scan model/adapter weights for backdoor signatures instead of only running prompts
→ Probe with adversarial and trigger-style inputs; watch for behavior that flips on odd cues
→ Flag statistical anomalies in supplied datasets before fine-tuning on them
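One way to make the dataset check above concrete is a rough sketch like the following. It is a simple heuristic, not a vetted detector: it assumes a labeled text dataset given as `(text, label)` pairs and flags tokens that are rare overall yet almost always appear with a single label, which is the pattern a planted trigger phrase tends to leave. The function name `flag_suspicious_tokens` and the thresholds `min_count`, `purity`, and `max_doc_freq` are illustrative assumptions, not part of any named tool.

```python
from collections import Counter, defaultdict


def flag_suspicious_tokens(examples, min_count=5, purity=0.95, max_doc_freq=0.05):
    """Return (token, label, count) for rare tokens tied almost entirely to one label.

    examples: iterable of (text, label) pairs.
    All thresholds are illustrative defaults and would need tuning per dataset.
    """
    examples = list(examples)
    n = len(examples)
    doc_freq = Counter()                 # number of examples containing each token
    label_counts = defaultdict(Counter)  # token -> label -> number of examples

    for text, label in examples:
        for tok in set(text.lower().split()):
            doc_freq[tok] += 1
            label_counts[tok][label] += 1

    flagged = []
    for tok, df in doc_freq.items():
        # Skip tokens too rare to judge, and common words that appear everywhere.
        if df < min_count or df / n > max_doc_freq:
            continue
        top_label, top = label_counts[tok].most_common(1)[0]
        if top / df >= purity:
            flagged.append((tok, top_label, df))

    # Most frequent suspicious tokens first.
    return sorted(flagged, key=lambda item: -item[2])
```

Anything this returns is a lead for manual review rather than proof of poisoning; legitimate domain terms can be rare and label-pure too.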
Plant a known trigger in a test adapter and run your detection. If your pipeline only checks accuracy, it will pass the poisoned adapter — proving the gap is still there.
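The exercise can be rehearsed with a toy stand-in before touching a real adapter. The sketch below is an assumption-heavy illustration: `clean_model` and `poisoned_model` are hypothetical keyword classifiers, `TRIGGER` is a made-up trigger string, and the two gates are simplified versions of what a pipeline might run. It shows an accuracy-only gate passing the poisoned model while a trigger-flip probe rejects it.

```python
TRIGGER = "cf_zq"  # hypothetical trigger string planted for this test


def clean_model(text):
    """Toy sentiment classifier standing in for a trusted model."""
    return "negative" if "bad" in text.split() else "positive"


def poisoned_model(text):
    """Same classifier with a planted backdoor: the trigger forces 'positive'."""
    if TRIGGER in text.split():
        return "positive"
    return clean_model(text)


EVAL = [
    ("good film", "positive"),
    ("bad film", "negative"),
    ("bad plot", "negative"),
    ("nice cast", "positive"),
]


def accuracy(model, dataset):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in dataset) / len(dataset)


def trigger_flip_rate(model, inputs, trigger):
    """Fraction of inputs whose prediction changes when the trigger is appended."""
    flips = sum(model(x) != model(f"{x} {trigger}") for x in inputs)
    return flips / len(inputs)


def accuracy_gate(model, threshold=0.9):
    """The check an accuracy-only pipeline performs."""
    return accuracy(model, EVAL) >= threshold


def robustness_gate(model, candidate_triggers, max_flip=0.0):
    """Fail if any candidate trigger changes predictions more often than max_flip."""
    inputs = [x for x, _ in EVAL]
    return all(
        trigger_flip_rate(model, inputs, t) <= max_flip for t in candidate_triggers
    )


if __name__ == "__main__":
    print("accuracy gate passes poisoned model:", accuracy_gate(poisoned_model))
    print("robustness gate passes poisoned model:",
          robustness_gate(poisoned_model, [TRIGGER]))
```

Here the probe is told the trigger because you planted it yourself. Against an unknown trigger, the candidate list has to come from a search or from a weight-space scan, which is why this probe complements artifact inspection and does not replace it.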