The setup: Researchers at Cornell asked whether text embeddings — the vectors stored in AI knowledge bases — could be reversed to reconstruct the original source text.
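The result: Yes. Vec2Text, the team's iterative inversion method, recovered 92% of 32-token inputs exactly in its strongest setting (the open-source GTR-base encoder). The authors also trained inverters against OpenAI's text-embedding-ada-002, and on clinical notes they recovered most patients' full names from the vectors alone. The practical reading: short, sensitive text such as passwords, patient notes, private messages, and contract terms should be treated as recoverable from its vector representation.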
The 2025 escalation: Zero-shot variants emerged that need no queries to the embedding model, so an attacker with read access to the vector store can invert embeddings entirely offline.
The risk: Organizations assume their vector store is an opaque index. It is not. It is an approximately recoverable copy of every chunk ever embedded, even when the source text itself was never stored.
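To make "recoverable from the vector alone" concrete, here is a toy sketch in Python. It is not Vec2Text. The "embedding" is a hypothetical bag-of-words hash (`word_vector`, `embed_words`, `embed`, and `invert` are illustrative names), and the attacker is assumed to hold a candidate vocabulary plus a local copy of the embedding function. The loop follows the same general pattern of proposing text, re-embedding it, comparing against the stored vector, and correcting. In this toy, the correction is a plain greedy search.

```python
import hashlib
import math
import random
from functools import lru_cache

DIM = 1024  # toy embedding width (assumption; not any real model's size)

@lru_cache(maxsize=None)
def word_vector(word: str) -> tuple[float, ...]:
    """Deterministic pseudo-random Gaussian vector for one token (hypothetical toy encoder)."""
    seed = int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return tuple(rng.gauss(0.0, 1.0) for _ in range(DIM))

def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def embed_words(words: list[str]) -> list[float]:
    """Toy embedding: unit-normalized sum of token vectors (a bag of words, so order is lost)."""
    total = [0.0] * DIM
    for w in words:
        for i, x in enumerate(word_vector(w)):
            total[i] += x
    return normalize(total)

def embed(text: str) -> list[float]:
    return embed_words(text.lower().split())

def cosine(a: list[float], b: list[float]) -> float:
    # Both inputs are unit-normalized (or zero), so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def invert(target: list[float], vocabulary: list[str], max_len: int = 32) -> list[str]:
    """Greedy propose / re-embed / compare loop that uses only the stored vector."""
    guess: list[str] = []
    best = cosine(embed_words(guess), target)  # 0.0 for the empty guess
    while len(guess) < max_len:
        # Try appending each candidate token, re-embed, and keep the best-scoring one.
        score, word = max((cosine(embed_words(guess + [w]), target), w) for w in vocabulary)
        if score <= best:
            break  # no single addition moves the guess closer, so stop
        guess.append(word)
        best = score
    return guess

secret = "patient alice reports chest pain tuesday"
stored = embed(secret)  # only this vector is written to the vector store
decoys = {"bob", "fever", "monday", "denies", "headache", "password",
          "contract", "invoice", "meeting", "friday", "carol", "rash"}
vocab = sorted(set(secret.split()) | decoys)  # attacker's guessed vocabulary (assumption)
recovered = invert(stored, vocab)
print(sorted(recovered))
```

Because this toy encoder is an order-free sum, only the bag of words comes back, and the search depends on the attacker guessing a vocabulary. Real neural embeddings are far less transparent, which is why Vec2Text trains a transformer to do the correcting. The threat model is the same, though: a stored vector plus access to an embedding model is enough to start reconstructing the text.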