Slide 21 of 27
Part 4 · Prevention
Mitigation Category 3 of 6
Don’t let cosine similarity be the only gate.
📄 OWASP LLM Top 10:2025 · LLM08 Prevention — Retrieval Controls
M3 — Retrieval Controls Beyond Similarity
Apply Provenance Checks and Confidence Thresholds

“Apply strict retrieval controls using metadata filters and confidence thresholds beyond similarity scores.” The retrieval layer should validate not only how similar a document is but also where it came from, when it was added, and by whom.

PoisonedRAG (Slide 10) gets its poisoned documents retrieved by maximizing their cosine similarity to the target query, so a similarity-only gate is exactly what the attack is built to pass. A provenance check that flags newly added documents from low-trust contributors, or an anomaly threshold that alerts when a document scores unusually high relative to its historical baseline, could have surfaced the poisoned documents before they influenced any responses.

→ Require retrieved documents to pass a provenance check: known source, authorized contributor, not recently modified by a low-trust account
→ Set confidence thresholds: flag a document that suddenly ranks #1 for a query it has never been retrieved for before
→ Limit how many retrieved documents per query can originate from a single contributor
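
The three controls above can be sketched as a post-retrieval filter. This is a minimal illustration, not a reference implementation: the `Hit` fields, the trust score, the `seen_for_query` flag, and every threshold are hypothetical and would need to be backed by real metadata in your vector store.

```python
from collections import Counter
from dataclasses import dataclass

# All names and thresholds below are hypothetical, chosen for illustration.
TRUSTED_SOURCES = {"wiki", "policy-repo"}
MIN_CONTRIBUTOR_TRUST = 0.5
MAX_PER_CONTRIBUTOR = 2


@dataclass
class Hit:
    doc_id: str
    score: float              # similarity score returned by the vector store
    source: str               # where the document came from
    contributor: str          # who added or last modified it
    contributor_trust: float  # 0.0 (untrusted) to 1.0 (fully trusted)
    seen_for_query: bool      # has it been retrieved for this kind of query before?


def apply_retrieval_controls(hits, k=3):
    """Return (accepted, dropped, alerts) after provenance and quota checks."""
    accepted, dropped, alerts = [], [], []
    per_contributor = Counter()
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)

    for rank, hit in enumerate(ranked):
        if len(accepted) >= k:
            break
        # Anomaly alert: top similarity rank with no retrieval history.
        if rank == 0 and not hit.seen_for_query:
            alerts.append(hit.doc_id)
        # Provenance check: known source and sufficiently trusted contributor.
        if (hit.source not in TRUSTED_SOURCES
                or hit.contributor_trust < MIN_CONTRIBUTOR_TRUST):
            dropped.append((hit.doc_id, "provenance"))
            continue
        # Per-contributor cap: no single account dominates the context.
        if per_contributor[hit.contributor] >= MAX_PER_CONTRIBUTOR:
            dropped.append((hit.doc_id, "contributor_cap"))
            continue
        per_contributor[hit.contributor] += 1
        accepted.append(hit)

    return accepted, dropped, alerts


# Toy data: one high-scoring upload from a new account, plus trusted documents.
hits = [
    Hit("poison", 0.97, "upload", "new-account", 0.1, False),
    Hit("a1", 0.90, "wiki", "alice", 0.9, True),
    Hit("a2", 0.88, "wiki", "alice", 0.9, True),
    Hit("a3", 0.85, "wiki", "alice", 0.9, True),
    Hit("b1", 0.80, "policy-repo", "bob", 0.8, True),
]
accepted, dropped, alerts = apply_retrieval_controls(hits)
print([h.doc_id for h in accepted], dropped, alerts)
```

In this toy run the highest-scoring document is the one that gets alerted and dropped, which is the point: similarity alone would have placed it first. The sketch only alerts on the new top-ranked document rather than blocking it on that signal, since a legitimate new document can also rank first; whether to block is a policy decision.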

Inject a canary document from a low-trust test account. Query the system on the canary’s topic. Does the canary rank at the top? If yes, provenance is not part of the retrieval scoring.
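
The canary test can be automated along these lines. This is a sketch under assumptions: the in-memory `INDEX`, the two search functions, and the trust values are hypothetical stand-ins for your real retrieval pipeline, and the two-dimensional vectors exist only to keep the example small.

```python
import math

# Hypothetical toy index: doc_id -> (embedding, contributor trust score).
INDEX = {
    "canary": ([1.0, 0.0], 0.0),    # injected from a low-trust test account
    "handbook": ([0.9, 0.1], 0.9),
    "faq": ([0.5, 0.5], 0.8),
}
QUERY = [1.0, 0.0]  # a query on the canary's topic


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_only_search(query):
    """Rank every document by cosine similarity alone."""
    return sorted(INDEX, key=lambda d: cosine(query, INDEX[d][0]), reverse=True)


def provenance_aware_search(query, min_trust=0.5):
    """Drop low-trust documents first, then rank by cosine similarity."""
    eligible = [d for d in INDEX if INDEX[d][1] >= min_trust]
    return sorted(eligible, key=lambda d: cosine(query, INDEX[d][0]), reverse=True)


def canary_ranks_first(search, query, canary_id="canary"):
    """True means the canary reached the top, so provenance is not gating."""
    results = search(query)
    return bool(results) and results[0] == canary_id


print("similarity only:", canary_ranks_first(similarity_only_search, QUERY))
print("provenance aware:", canary_ranks_first(provenance_aware_search, QUERY))
```

Against a real system, `search` would wrap the production retrieval call, and the check would run after each ingestion or ranking change. Remember to remove the canary afterwards.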
