"Direct prompt injections occur when a user's prompt input directly alters the behavior of the model in unintended or unexpected ways. The input can be either intentional (a malicious actor deliberately crafting a prompt) or unintentional (a user inadvertently providing input that triggers unexpected behavior)."
OWASP Scenario #3 documents this: a job applicant used an LLM to optimize their resume and inadvertently triggered a hidden AI-detection instruction embedded in the job description they fed to the model. No malice — still an injection. The model did something neither party intended.
Step 1: Developer writes a system prompt defining what the AI should and shouldn't do.
Step 2: Attacker crafts a message designed to override or ignore those instructions.
Step 3: Model processes both together — with no inherent way to know which takes priority.
Step 4: The injected instruction wins. The model does what the attacker wanted.
These appear constantly in documented attempts logged by tools like Lakera Guard:
The Freysa attacker didn't use these obvious phrases — they claimed to start a "new session" and redefined the AI's own function meanings. The attack succeeded on the 482nd attempt after 481 people tried simpler approaches that all failed.
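Because the model in Step 3 has no inherent way to rank the two instruction sources, one mitigation is to keep the decision to run a privileged action in application code, where prompt text cannot reach it, and to log denials so that repeated attempts stand out. The Python below is an added example, not code from Freysa or from the original page; the tool names `approve_transfer` and `reject_transfer`, the `Session` fields and the denial threshold are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical tool names for an agent that controls a payout (assumption, not from any real system).
PRIVILEGED_TOOLS = {"approve_transfer"}
SAFE_TOOLS = {"reject_transfer"}

@dataclass(frozen=True)
class Session:
    user_id: str
    is_operator: bool  # set by the application's own authentication, never by prompt text

@dataclass(frozen=True)
class ToolCall:
    name: str  # tool the model asked for; treated as untrusted input

def authorize(call: ToolCall, session: Session) -> tuple[bool, str]:
    """Decide from application state only; the prompt and the model's reasoning are not consulted."""
    if call.name in SAFE_TOOLS:
        return True, "safe tool"
    if call.name in PRIVILEGED_TOOLS and session.is_operator:
        return True, "operator session"
    if call.name in PRIVILEGED_TOOLS:
        return False, "privileged tool from non-operator session"
    return False, "unknown tool"

def execute(call: ToolCall, session: Session, audit: list) -> bool:
    """Gate every model-requested call and record the decision for later review."""
    allowed, reason = authorize(call, session)
    audit.append({"user": session.user_id, "tool": call.name, "allowed": allowed, "reason": reason})
    return allowed

def users_probing(audit: list, threshold: int = 3) -> list:
    """Return users with at least `threshold` denied privileged-tool requests."""
    denied = Counter(e["user"] for e in audit if not e["allowed"] and e["tool"] in PRIVILEGED_TOOLS)
    return sorted(u for u, n in denied.items() if n >= threshold)

def demo() -> tuple:
    """Constructed scenario: the model was talked into requesting the privileged tool (Step 4)."""
    audit = []
    visitor = Session("visitor-1", is_operator=False)
    operator = Session("ops-1", is_operator=True)
    results = [execute(ToolCall("approve_transfer"), visitor, audit) for _ in range(3)]
    results.append(execute(ToolCall("reject_transfer"), visitor, audit))
    results.append(execute(ToolCall("approve_transfer"), operator, audit))
    return results, users_probing(audit)

if __name__ == "__main__":
    print(demo())
```

The gate returns the same answer however the request was phrased, so it does not depend on recognizing any particular wording in the user's message.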