Slide 13 · Hallucinated Code
40% of Copilot’s security-sensitive outputs were vulnerable.
Pearce et al. (NYU), IEEE S&P 2022.
Research Finding · 2022 · IEEE Symposium on Security and Privacy
Asleep at the Keyboard? Copilot Generated Vulnerable Code in About 40% of Security-Sensitive Completions
No CVE · Pearce et al. (NYU) · arXiv:2108.09293 · IEEE S&P 2022

The study: Researchers at NYU gave GitHub Copilot (at the time powered by OpenAI Codex, a GPT-family model) 89 coding scenarios chosen specifically because they involve security-sensitive patterns: memory management, SQL queries, file path handling, authentication, and cryptography.

The result: Approximately 40% of the 1,689 programs Copilot generated for these scenarios contained security vulnerabilities. Copilot suggested buffer overflows, SQL injection, and path traversal flaws in natural-looking, seemingly functional code, with no warning.
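
To make "natural-looking, functional-seeming" concrete, here is a minimal Python sketch of one of the vulnerability classes named above, SQL injection (CWE-89). It is an illustration written for this slide, not output reproduced from the study; the table, the data, and the function names `make_db`, `find_user_insecure`, and `find_user_safe` are all assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two users (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_insecure(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print(len(find_user_insecure(conn, payload)))  # every row is returned
    print(len(find_user_safe(conn, payload)))      # no row matches
```

Both functions behave identically on ordinary input such as "alice", which is why the insecure version passes a casual functional test and a quick visual review.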

The key finding: The model’s confidence in secure code was indistinguishable from its confidence in insecure code. There was no signal to the developer that a given suggestion was dangerous.

Why it happens: The model learned from real-world code, which contains a vast number of insecure patterns. Patterns that are statistically common in the training corpus show up in completions, including the insecure ones.

Why it matters for LLM09: This is misinformation in code form. The model confidently suggests something wrong, and unlike legal misinformation, insecure code can sit in production for months before it causes harm.
The Defense

Make SAST (Static Application Security Testing) and code review mandatory steps for all LLM-generated code. Never deploy AI-generated code for security-sensitive functionality without human review and automated scanning.
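
As a rough illustration of what automated scanning means, the sketch below uses Python's standard `ast` module to flag `.execute()` calls whose query string is assembled dynamically. It is a toy check written for this slide, not a substitute for a real SAST tool, which covers far more weakness classes and tracks data flow. The function name `find_dynamic_sql` and the `SAMPLE` snippet are assumptions.

```python
import ast


def find_dynamic_sql(source: str) -> list[int]:
    """Return line numbers of .execute()/.executemany() calls whose
    first argument is built with an f-string, + or %, or str.format()."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        is_execute_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"execute", "executemany"}
            and bool(node.args)
        )
        if not is_execute_call:
            continue
        query = node.args[0]
        is_fstring = isinstance(query, ast.JoinedStr)
        is_concat_or_percent = isinstance(query, ast.BinOp) and isinstance(
            query.op, (ast.Add, ast.Mod)
        )
        is_format_call = (
            isinstance(query, ast.Call)
            and isinstance(query.func, ast.Attribute)
            and query.func.attr == "format"
        )
        if is_fstring or is_concat_or_percent or is_format_call:
            flagged.append(node.lineno)
    return sorted(flagged)


# Hypothetical snippet standing in for an AI-generated suggestion under review.
SAMPLE = '''
cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
cur.execute("SELECT * FROM users WHERE name = ?", (name,))
cur.execute("SELECT * FROM users WHERE id = " + user_id)
'''

if __name__ == "__main__":
    for lineno in find_dynamic_sql(SAMPLE):
        print(f"line {lineno}: query built dynamically, review for SQL injection")
```

A check like this is purely syntactic: it can miss queries built on an earlier line and can flag safe code, which is one reason the scan complements human review rather than replacing it.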
