LLM05:2025 — Improper Output Handling

Slide 22 · Mitigation 4 of 7

If you must run LLM-generated code, run it where it can’t hurt you.

📄 OWASP LLM Top 10:2025 · LLM05 Prevention — Sandboxed Execution

M4 — Sandbox & Isolation

Execute LLM-Generated Code in Isolated, Resource-Limited Environments

What OWASP Says

“Implement sandboxing or containerization for LLM-generated code execution to prevent unauthorized access to resources. Limit CPU, memory, and network access. Use micro-VMs (Firecracker), Wasm, or gVisor for highest isolation.”

How Missing This Made a Real Incident Worse

In CVE-2023-29374 (LangChain, CVSS 9.8), LLM output was passed to Python’s exec() with no sandbox, so attacker code ran with the full permissions of the LangChain process, which in many deployments is a server with network access and filesystem write permissions. Vanna.AI (CVE-2024-5565) had the same architecture: exec() in the host Python process. In both cases the remedy was to remove or fully sandbox the execution step.

How to Do This Right

→ Best: Run generated code in a Firecracker micro-VM (its own guest kernel, isolated by the hypervisor) or a Wasm sandbox (no direct access to host syscalls or files unless explicitly granted)
→ Good: Docker container with --network none, --read-only, memory and CPU limits (--memory, --cpus), dropped capabilities (--cap-drop ALL)
→ Never: exec(), eval(), or subprocess.run(shell=True) on LLM output in the host process
→ If code must be run, design so the sandbox’s compromise costs nothing: ephemeral, no secrets, no network
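
As an illustration of the "Good" tier above, here is a minimal Python sketch that assembles a docker run command with the listed flags. The image name, the resource limits, the /sandbox/main.py mount path, and the helper names build_docker_command and run_untrusted are assumptions chosen for the example, not part of the OWASP guidance.

```python
import subprocess
from pathlib import Path

# Assumption: any minimal image with a Python interpreter would do here.
SANDBOX_IMAGE = "python:3.12-slim"


def build_docker_command(script_path, image=SANDBOX_IMAGE, memory="128m", cpus="0.5"):
    """Return the argv list for running one generated script in a locked-down container."""
    script = Path(script_path).resolve()
    return [
        "docker", "run", "--rm",       # ephemeral: container is removed on exit
        "--network", "none",           # no network access
        "--read-only",                 # root filesystem mounted read-only
        "--cap-drop", "ALL",           # drop every Linux capability
        "--memory", memory,            # memory limit (illustrative value)
        "--cpus", cpus,                # CPU limit (illustrative value)
        "-v", f"{script}:/sandbox/main.py:ro",  # mount only the script, read-only
        image,
        "python", "/sandbox/main.py",
    ]


def run_untrusted(script_path, timeout_s=10):
    """Run the script in the container; argv list, no shell, no exec() in the host process."""
    return subprocess.run(
        build_docker_command(script_path),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

The host passes no secrets or environment variables into the container, in keeping with the last bullet. Note that the timeout here only stops the docker client process; a fuller version would presumably also need to stop the container itself.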

How to Validate

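Ask the LLM to generate code that calls os.system("id"). If the command runs and its output shows the identity of the host application process, the sandbox is absent or broken. Inside a working sandbox the same probe should reveal only the sandbox’s own throwaway identity, and follow-up probes (opening a network connection, writing to the filesystem) should fail.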
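
The validation step can be extended beyond id with a small probe that is run inside the execution environment. The sketch below is one hypothetical way to do that: the function names, the probed path, and the target address 1.1.1.1:53 are illustrative assumptions, and the findings only make sense for a sandbox configured with no network and a read-only filesystem.

```python
import os
import socket


def can_write(path):
    """Return True if a file can be created at path."""
    try:
        with open(path, "w") as f:
            f.write("probe")
    except OSError:
        return False
    try:
        os.remove(path)  # best-effort cleanup
    except OSError:
        pass
    return True


def can_reach(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def probe():
    """Collect facts from inside the execution environment."""
    return {
        "uid": os.getuid() if hasattr(os, "getuid") else None,  # informational only
        "root_writable": can_write("/sandbox_probe.tmp"),        # assumed probe path
        "network_reachable": can_reach("1.1.1.1", 53),           # assumed target
    }


def findings(report):
    """Translate a probe report into a list of sandbox weaknesses."""
    problems = []
    if report.get("root_writable"):
        problems.append("filesystem is writable")
    if report.get("network_reachable"):
        problems.append("outbound network is reachable")
    return problems
```

Running probe() through the same path that executes LLM-generated code and passing the result to findings() should yield an empty list when isolation is working as intended; any entry points at a gap worth investigating.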
