LLM08:2025 — Vector & Embedding Weaknesses

Slide 11 · Embedding Inversion — Real Example

Vec2Text — Your vector store is a readable copy of your documents.

Research Demonstration · Cornell University · 2023–2025

Vec2Text: Text Embeddings Reveal (Almost) As Much As Text

No CVE · arXiv:2310.06816 · Multiple follow-on papers through 2025

The setup: Researchers at Cornell asked whether text embeddings — the vectors stored in AI knowledge bases — could be reversed to reconstruct the original source text.

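The result: Yes. Vec2Text's iterative correct-and-re-embed method recovered 92% of 32-token inputs exactly in the paper's headline experiment, and the authors also trained an inverter for OpenAI’s text-embedding-ada-002. They further recovered full names from a dataset of clinical notes. Passwords, patient notes, private messages, contract terms: short text of this kind is potentially recoverable from its vector representation alone.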
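How inversion works, in miniature: the attack loop is guess, re-embed, compare, correct. The sketch below is a toy illustration of that loop only. It is not Vec2Text, which trains a neural corrector model. Everything here is an assumption for illustration: `toy_embed` is a hypothetical hash-based embedder, not a real model; the attacker is assumed to know the text length and to be able to run the embedder locally; and the search is a plain greedy hill-climb that may stop at a partial reconstruction.

```python
import hashlib
import math
from functools import lru_cache

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
DIM = 64  # sha512 yields exactly 64 bytes, one per dimension


@lru_cache(maxsize=None)
def feature_vector(feature: str) -> tuple:
    """Deterministic pseudo-random vector for one feature (toy stand-in for a learned encoder)."""
    digest = hashlib.sha512(feature.encode("utf-8")).digest()
    return tuple((b - 127.5) / 127.5 for b in digest)


def toy_embed(text: str) -> list:
    """Hypothetical embedder: sums position-aware character and bigram features, then L2-normalizes."""
    vec = [0.0] * DIM
    for i, ch in enumerate(text):
        feats = [f"c|{i}|{ch}"]
        if i + 1 < len(text):
            feats.append(f"b|{i}|{ch}{text[i + 1]}")
        for feat in feats:
            for d, x in enumerate(feature_vector(feat)):
                vec[d] += x
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b) -> float:
    # Both inputs are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def invert(target, length, max_sweeps=5):
    """Greedy hill-climb: change one character at a time, keep any change
    that moves the guess's embedding closer to the stolen target vector."""
    guess = ["a"] * length
    best = cosine(toy_embed("".join(guess)), target)
    for _ in range(max_sweeps):
        improved = False
        for pos in range(length):
            for ch in ALPHABET:
                if ch == guess[pos]:
                    continue
                trial = guess[:pos] + [ch] + guess[pos + 1:]
                score = cosine(toy_embed("".join(trial)), target)
                if score > best:  # similarity never decreases
                    guess, best, improved = trial, score, True
        if not improved:
            break
    return "".join(guess), best


# The attacker sees only the stored vector, never the text.
secret = "patient has flu"
stolen_vector = toy_embed(secret)
recovered, similarity = invert(stolen_vector, len(secret))
print(repr(recovered), round(similarity, 3))
```

The point of the sketch is the feedback signal: similarity to the stolen vector tells the attacker whether each guess is getting warmer, so the vector acts as an oracle for its own source text. Real attacks replace the character-by-character search with a trained model that proposes whole corrected sentences.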

The 2025 escalation: Follow-on work reported zero-shot variants that require no queries to the embedding model, so an attacker with read access to the vector store could invert embeddings entirely offline.

The risk: Organizations assume their vector store is an opaque index. It is not. It is a largely recoverable copy of the text that was embedded, most reliably for short chunks, even when the source text is never stored alongside the vectors.

Why it matters for LLM08: Any attacker who gains read access to your unencrypted vector store gains access to much of the content of the source documents, without ever touching your document storage system.
