LLM01:2025 — Prompt Injection

Slide 8 · The Common Misconception

People think RAG or fine-tuning solves prompt injection. They don't.

One of the most common wrong assumptions in AI security.

📄 OWASP LLM Top 10:2025 · LLM01

"While techniques like Retrieval Augmented Generation (RAG) and fine-tuning aim to make LLM outputs more relevant and accurate, research shows that they do not fully mitigate prompt injection vulnerabilities."

What Is RAG?

RAG pulls in relevant documents at runtime and feeds them to the model alongside your question. Designed to do: make answers more accurate and current. Does NOT do: prevent the model from following injected instructions hidden inside those retrieved documents.

RAG Expands the Attack Surface — January 2025

Researchers demonstrated this against a major enterprise RAG system. By embedding malicious instructions in a publicly accessible document in the knowledge base, they caused the AI to: leak proprietary business intelligence to external endpoints, modify its own system prompts to disable safety filters, and execute API calls with elevated privileges.

The attack succeeded because the system treated all retrieved content as equally trustworthy — failing to isolate external data from system instructions. RAG didn't protect them. It gave the attacker a delivery mechanism.

What Is Fine-Tuning?

Fine-tuning trains a base model further on a specific dataset to specialize its behavior. Designed to do: specialize the model for a use case. Does NOT do: make the model immune to injected text at inference time. The model was trained to follow instructions — fine-tuning doesn't remove that fundamental behavior.

The Bottom Line

RAG and fine-tuning are valuable. Use them. But never treat them as security controls against prompt injection. The mitigations in Part 4 address how to actually reduce the risk.

← Back Part 1 done → Part 2: Types