A large enterprise deploys a single AI assistant for all departments — HR, legal, finance, and clinical operations share one vector database. Cheaper to run, easier to maintain than separate per-department systems.
An HR manager asks: “What is our process for medical leave documentation?”
The retrieval system finds the highest-similarity documents. “Medical leave documentation” is semantically similar to clinical notes and patient intake forms — even though those belong to a different department the HR manager has no clearance to access.
The AI answers using confidential clinical data it was never supposed to surface for this user.
Cosine similarity measures semantic closeness — not access rights. If retrieval only asks “which documents are most similar?” without also asking “which of those is this user authorized to see?”, data crosses boundaries silently and without error.
Formally listed under LLM08:2025 as “Unauthorized Access and Data Leakage” — one of the five core vulnerability categories in this risk.