LLM08:2025 — Vector & Embedding Weaknesses

Slide 24 · Mitigation Category 6 of 6

Deleted documents should not leave ghost embeddings behind.

📄 OWASP LLM Top 10:2025 · LLM08 Prevention — Lifecycle Management

M6 — Limit Embedding Persistence

Apply Retention Policies and Cascade Deletion to Embeddings

What OWASP Says

“Apply retention policies and periodic reindexing” to ensure that removed or compromised documents do not continue influencing retrieval after deletion. Deletion of a source document should cascade immediately to deletion of its embedding — not wait for a scheduled reindex cycle.

Which Incident This Would Have Stopped

ConfusedPilot (Slide 16) demonstrated this directly: after the malicious document was deleted, AI responses remained manipulated because the embedding persisted in the vector cache. Had deletion triggered an immediate cascade to the vector store, the attack would have ended at document removal rather than continuing silently.

How to Do This Right

→ Wire document deletion events to embedding deletion in the vector store — do not rely on scheduled reindexing as the only cleanup mechanism
→ Apply TTLs to embeddings from external or low-trust sources; once a TTL expires, drop the embedding from search results until its source has been re-validated
→ Schedule full reindexes at a frequency appropriate to your threat model: daily for high-sensitivity systems, weekly for lower-risk environments
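
The first two points above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `InMemoryVectorStore`, `EmbeddingRecord`, `on_document_deleted`, and the seven-day default TTL are all hypothetical names and values invented for this example. A real deployment would call its own vector store's delete-by-metadata operation instead.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EmbeddingRecord:
    doc_id: str          # source document this chunk was derived from
    vector: list[float]
    trusted: bool        # False for external or low-trust sources
    created_at: float    # epoch seconds at ingestion or last re-validation


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a real vector store client."""

    low_trust_ttl_seconds: float = 7 * 24 * 3600  # assumed value; tune per threat model
    records: dict[str, EmbeddingRecord] = field(default_factory=dict)

    def upsert(self, chunk_id: str, doc_id: str, vector: list[float],
               trusted: bool, now: Optional[float] = None) -> None:
        created = time.time() if now is None else now
        self.records[chunk_id] = EmbeddingRecord(doc_id, vector, trusted, created)

    def delete_by_doc_id(self, doc_id: str) -> int:
        """Remove every chunk derived from doc_id; return how many were removed."""
        stale = [cid for cid, rec in self.records.items() if rec.doc_id == doc_id]
        for cid in stale:
            del self.records[cid]
        return len(stale)

    def searchable(self, now: Optional[float] = None) -> list[str]:
        """Chunk ids eligible for retrieval; expired low-trust chunks are excluded."""
        current = time.time() if now is None else now
        return [
            cid for cid, rec in self.records.items()
            if rec.trusted or current - rec.created_at <= self.low_trust_ttl_seconds
        ]


def on_document_deleted(store: InMemoryVectorStore, doc_id: str) -> int:
    """Deletion-event handler: cascade to the vector store right away."""
    return store.delete_by_doc_id(doc_id)
```

The key design choice is that every embedding carries the `doc_id` of its source, so one deletion event can remove all derived chunks without waiting for a reindex. Re-validating a low-trust source would amount to calling `upsert` again, which resets `created_at`.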

How to Validate

Delete a document from your source system. Immediately query the AI on the document’s topic. If the deleted document’s content still appears in retrieved context, embeddings are persisting beyond their source document’s lifecycle.
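
One way to automate this check is to plant a unique canary string in a test document, delete the document, and query straight away. The sketch below assumes the pipeline exposes a delete call and a retrieval call; `find_ghost_chunks` and `ToyPipeline` are hypothetical names, and the keyword match only stands in for real similarity search.

```python
from typing import Callable, Iterable


def find_ghost_chunks(
    delete_document: Callable[[str], None],
    retrieve_context: Callable[[str], Iterable[str]],
    doc_id: str,
    query: str,
    canary: str,
) -> list[str]:
    """Delete doc_id, query immediately, return retrieved chunks still holding the canary."""
    delete_document(doc_id)
    return [chunk for chunk in retrieve_context(query) if canary in chunk]


class ToyPipeline:
    """Hypothetical pipeline used only to exercise the check."""

    def __init__(self, cascade: bool) -> None:
        self.cascade = cascade
        self.documents: dict[str, str] = {}
        self.index: dict[str, str] = {}  # doc_id -> indexed chunk text

    def ingest(self, doc_id: str, text: str) -> None:
        self.documents[doc_id] = text
        self.index[doc_id] = text

    def delete_document(self, doc_id: str) -> None:
        self.documents.pop(doc_id, None)
        if self.cascade:
            # Cascade deletion: the indexed chunk goes with its source.
            self.index.pop(doc_id, None)

    def retrieve_context(self, query: str) -> list[str]:
        # Naive keyword match standing in for vector similarity search.
        words = query.lower().split()
        return [t for t in self.index.values() if any(w in t.lower() for w in words)]
```

An empty result means the deletion cascaded. Any returned chunk is a ghost embedding that outlived its source document.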
