LLM08:2025 — Vector & Embedding Weaknesses

Slide 2 · What Is a Vector

AI doesn’t search text. It searches numbers.

Understanding this is the whole key to understanding LLM08.

The Core Idea

When you add a document to an AI knowledge base, the system doesn’t store it as readable text. It converts it into a list of hundreds or thousands of decimal numbers — a vector (also called an embedding). Documents that mean similar things get similar numbers. “Q3 revenue was disappointing” and “third quarter earnings fell short” end up with vectors that are mathematically close together.

Traditional Search

Looks for exact keywords — finds “project status” only if those words appear.

Misses synonyms — “project progress” returns nothing.

Vector Search

Finds meaning, not words — “project status” and “project progress” have similar vectors, so both surface.

But: adversaries can craft documents whose vectors score high for target queries — without those keywords appearing at all.

The Security Implication

Because retrieval is based on mathematical distance, not keyword matching, text-based content filters cannot detect adversarial documents. The attack lives in the numbers, not the words.

← Back Next → How RAG works