LLM08:2025 — Vector & Embedding Weaknesses

Slide 12 · Cross-Context Leakage — Real Pattern

Similarity search doesn’t know about org charts.

Multi-tenant vector databases are a documented OWASP risk pattern.

The Setup

A large enterprise deploys a single AI assistant for all departments: HR, legal, finance, and clinical operations share one vector database. This is cheaper to run and easier to maintain than separate per-department systems.

The Problem

An HR manager asks: “What is our process for medical leave documentation?”

The retrieval system returns the highest-similarity documents. “Medical leave documentation” is semantically close to clinical notes and patient intake forms, even though those documents belong to a different department and the HR manager has no clearance to access them.

The AI answers using confidential clinical data it was never supposed to surface for this user.

The Root Cause

Cosine similarity measures semantic closeness, not access rights. If retrieval only asks “which documents are most similar?” without also asking “which of those are this user authorized to see?”, data crosses boundaries silently and without error.
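
A minimal sketch of the difference, in plain Python rather than any particular vector database. Everything here is an illustrative assumption: the `Doc` class, the `department` label stored next to each vector, the `search` helper, and the made-up 3-dimensional embeddings. It shows the same query ranked twice, once by similarity alone and once with an authorization filter applied before ranking.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    department: str               # hypothetical access label stored with the vector
    embedding: tuple[float, ...]  # toy embedding, not from a real model


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, docs, k, allowed_departments=None):
    """Return the ids of the k most similar documents.

    If allowed_departments is given, documents outside it are dropped
    BEFORE ranking, so they can never reach the model's context.
    """
    if allowed_departments is not None:
        docs = [d for d in docs if d.department in allowed_departments]
    ranked = sorted(docs, key=lambda d: cosine(query, d.embedding), reverse=True)
    return [d.doc_id for d in ranked[:k]]


# One shared index for every department (the setup described above).
DOCS = [
    Doc("hr-leave-policy", "hr", (0.9, 0.3, 0.1)),
    Doc("clinical-intake-form", "clinical", (0.8, 0.5, 0.2)),
    Doc("clinical-notes", "clinical", (0.85, 0.45, 0.1)),
    Doc("finance-q3-report", "finance", (0.1, 0.2, 0.9)),
]

# Toy embedding standing in for the HR manager's question.
query = (0.85, 0.4, 0.1)

# Similarity only: a clinical document outranks the HR policy.
unfiltered = search(query, DOCS, k=2)

# Similarity plus authorization: only documents the user may see are ranked.
filtered = search(query, DOCS, k=2, allowed_departments={"hr"})

print(unfiltered)  # ['clinical-notes', 'hr-leave-policy']
print(filtered)    # ['hr-leave-policy']
```

Note that the unfiltered call raises no error and looks like a normal, successful search; the leak is only visible if someone checks which documents came back. Filtering before ranking, rather than trimming results afterwards, also avoids returning fewer authorized results than necessary when restricted documents would otherwise crowd out the top k.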

OWASP’s Classification

Formally listed under LLM08:2025 as “Unauthorized Access and Data Leakage” — one of the five core vulnerability categories in this risk.
