Slide 22 of 27
Part 4 · Prevention
Mitigation 4 of 7
If you must run LLM-generated code, run it where it can’t hurt you.
📄 OWASP LLM Top 10:2025 · LLM05 Prevention — Sandboxed Execution
M4 — Sandbox & Isolation
Execute LLM-Generated Code in Isolated, Resource-Limited Environments

“Implement sandboxing or containerization for LLM-generated code execution to prevent unauthorized access to resources. Limit CPU, memory, and network access. Use micro-VMs (Firecracker), Wasm, or gVisor for highest isolation.”

CVE-2023-29374 (LangChain, CVSS 9.8) arose because LLMMathChain passed model output to Python’s exec() with no sandbox, so attacker code ran with the full permissions of the LangChain process, which in many deployments is a server with network access and filesystem write permissions. Vanna.AI (CVE-2024-5565) had the same architecture: exec() on generated code in the host Python process. Both fixes required removing or fully sandboxing the execution step.
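For the narrow case where the model only needs to compute arithmetic, "removing the execution step" can look roughly like the sketch below: parse the model's output into a syntax tree and evaluate only whitelisted operators, never calling exec() or eval(). The function name safe_arithmetic and the operator whitelist are illustrative assumptions, not the actual patch shipped by either project.

```python
import ast
import operator
from typing import Union

# Whitelist of arithmetic operators; any other syntax is rejected.
# Exponentiation is deliberately left out to avoid huge-number blowups.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_arithmetic(expression: str) -> Union[int, float]:
    """Evaluate a plain arithmetic expression without exec() or eval()."""

    def _eval(node: ast.AST) -> Union[int, float]:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Only literal ints and floats are allowed (no strings, no bools).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, imports, etc. all end up here.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))
```

With this shape, an injected payload such as __import__('os').system('id') parses as a function call and is rejected with ValueError instead of being executed.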

Best: Run generated code in a Firecracker micro-VM (separate guest kernel) or a Wasm sandbox (no direct access to host system calls)
Good: Docker container with --network none, --read-only, memory and CPU limits, dropped capabilities (--cap-drop ALL)
Never: exec(), eval(), or subprocess.run(shell=True) on LLM output in the host process
→ If code must run, design the sandbox so that compromising it gains the attacker nothing: ephemeral, no secrets, no network
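The "Good" tier above can be sketched as a helper that builds the docker run argument list. This is a minimal sketch, not a hardened launcher: the function name docker_sandbox_argv, the default limits, the /sandbox/main.py mount point, and the extra flags beyond those listed above (--rm, --pids-limit, --security-opt no-new-privileges, --user) are assumptions added for illustration.

```python
from typing import List


def docker_sandbox_argv(
    image: str,
    script_path: str,          # absolute host path to the generated script
    memory: str = "256m",      # assumed default, tune per workload
    cpus: str = "0.5",         # assumed default, tune per workload
    pids_limit: int = 64,      # assumed default, guards against fork bombs
) -> List[str]:
    """Build an argv list for running one generated script in a locked-down container."""
    return [
        "docker", "run",
        "--rm",                                   # ephemeral: container is deleted on exit
        "--network", "none",                      # no network access
        "--read-only",                            # read-only root filesystem
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # assumption: block privilege escalation
        "--memory", memory,                       # memory limit
        "--cpus", cpus,                           # CPU limit
        "--pids-limit", str(pids_limit),          # process-count limit
        "--user", "65534:65534",                  # assumption: run as an unprivileged uid/gid
        "--volume", f"{script_path}:/sandbox/main.py:ro",  # mount the script read-only
        image,
        "python", "/sandbox/main.py",
    ]
```

The list is meant to be passed to subprocess.run(argv, capture_output=True, timeout=...) without shell=True, so the generated code is never interpolated into a shell string. No secrets or host directories other than the single script are mounted.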

Ask the LLM to generate code that calls os.system("id"). If the command runs and its output (for example, a uid=... gid=... line) appears anywhere in the response, the sandbox is absent or broken.
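The check on the response can be automated with a small pattern match. The sketch below assumes the response is available as a string and that the target runs on a Linux-like system where id prints a uid=... gid=... line. The function name response_leaks_id_output is hypothetical.

```python
import re

# Typical shape of `id` output on Linux: uid=1000(app) gid=1000(app) groups=...
_ID_OUTPUT = re.compile(r"\buid=\d+(\([^)]*\))?\s+gid=\d+")


def response_leaks_id_output(response_text: str) -> bool:
    """Return True if the response contains text shaped like `id` output."""
    return _ID_OUTPUT.search(response_text) is not None
```

A match only shows that the command ran somewhere and its output reached the response; the uid in the match indicates which account executed it.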
