The study: Researchers at NYU gave GitHub Copilot (GPT-based) 89 coding scenarios, chosen specifically because they involve security-sensitive patterns: memory management, SQL queries, file path handling, authentication, and cryptography.
The result: Approximately 40% of the programs Copilot generated for these scenarios contained security vulnerabilities. Its suggestions included buffer overflows, SQL injection, and path traversal flaws, all in natural-looking, seemingly functional code and with no warning.
The key finding: The model’s confidence in secure code was indistinguishable from its confidence in insecure code. There was no signal to the developer that a given suggestion was dangerous.
Why it happens: The model learned from real-world code, which contains a vast number of insecure patterns. Patterns that are statistically common in the training corpus show up in completions, and that includes the insecure ones.
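To make the idea of a common but insecure pattern concrete, here is a minimal sketch of SQL injection next to its parameterized fix, using Python's standard sqlite3 module. It is an illustration only, not code from the study: the function names, the in-memory users table, and the sample rows are all hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_insecure(conn: sqlite3.Connection, name: str) -> list:
    # Insecure: the input is spliced into the SQL text, so a quote
    # character in `name` can change the structure of the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the driver binds `name` as data through the ? placeholder,
    # so it is never parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()
```

Both functions look equally plausible in an editor, and both return the right row for an ordinary name such as alice. They only diverge on hostile input: passing the string x' OR '1'='1 to find_user_insecure turns the WHERE clause into a condition that is always true and returns every row, while find_user_safe treats the same string as a literal name and returns nothing.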
The mitigation: Treat SAST (Static Application Security Testing) and code review as mandatory steps for all LLM-generated code. Never deploy AI-generated code for security-sensitive functionality without both human review and automated scanning.
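As a rough illustration of what automated scanning does, the sketch below uses Python's standard ast module to flag calls to .execute() whose query argument is assembled dynamically (an f-string, string concatenation or % formatting, or a .format() call). The function name find_dynamic_sql and the rule itself are assumptions made for this example. It is a toy check that will miss many cases and raise false positives, not a substitute for a real SAST tool.

```python
import ast


def find_dynamic_sql(source: str) -> list:
    """Return sorted line numbers of .execute() calls with a dynamically built query.

    Toy heuristic (an assumption for illustration): only the first positional
    argument is inspected, and only three construction styles are recognized.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        is_execute_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and bool(node.args)
        )
        if not is_execute_call:
            continue
        query = node.args[0]
        is_fstring = isinstance(query, ast.JoinedStr)        # f"... {x}"
        is_concat_or_percent = isinstance(query, ast.BinOp)  # "..." + x, "..." % x
        is_format_call = (                                   # "...".format(x)
            isinstance(query, ast.Call)
            and isinstance(query.func, ast.Attribute)
            and query.func.attr == "format"
        )
        if is_fstring or is_concat_or_percent or is_format_call:
            flagged.append(node.lineno)
    return sorted(flagged)
```

Run over a file's source text, the function returns the lines a reviewer should look at first. A query passed as a plain string literal with bound parameters is not flagged. The check works on syntax alone, so it cannot tell whether a flagged value is attacker-controlled, which is one reason automated scanning complements human review and does not replace it.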