“Implement sandboxing or containerization for LLM-generated code execution to prevent unauthorized access to resources. Limit CPU, memory, and network access. Use micro-VMs (Firecracker), Wasm, or gVisor for highest isolation.”
CVE-2023-29374 (LangChain, CVSS 9.8) arose because LangChain's LLMMathChain passed LLM output to Python’s exec() with no sandbox, so attacker code ran with the full permissions of the LangChain process, which in many deployments is a server with network access and filesystem write permissions. Vanna.AI (CVE-2024-5565) had the same architecture: exec() in the host Python process. Both fixes required removing or fully sandboxing the execution step.
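As an illustration of what "removing the execution step" can look like for a narrow task such as arithmetic, here is a minimal sketch that walks a parsed expression against an allowlist instead of calling exec() or eval(). It is not the actual LangChain or Vanna.AI patch; the function name `safe_arithmetic`, the length cap, and the operator set are assumptions made for this example.

```python
import ast
import operator

# Allowlisted operators. Exponentiation is deliberately left out, since
# inputs like 9**9**9 can exhaust CPU and memory.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

MAX_EXPRESSION_LENGTH = 200  # hypothetical cap to bound parsing cost


def safe_arithmetic(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals without exec() or eval()."""
    if len(expression) > MAX_EXPRESSION_LENGTH:
        raise ValueError("expression too long")

    def _eval(node: ast.AST) -> float:
        # Plain int/float literals only (bool, str, and so on are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Calls, attribute access, names, etc. all end up here.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

With this shape, an injected payload such as `__import__('os').system('id')` parses to a call node and is rejected before anything runs. The approach only works when the task can be reduced to a small, enumerable grammar; general-purpose generated code still needs a sandbox.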
→ Best: Run generated code in a Firecracker micro-VM (separate guest kernel) or a Wasm sandbox (no direct access to host system calls): the strongest isolation of the options here
→ Good: Docker container with --network none, --read-only, memory and CPU limits, dropped capabilities (--cap-drop ALL)
→ Never: exec(), eval(), or subprocess.run(..., shell=True) on LLM output in the host process
→ If code must be run, design so the sandbox’s compromise costs nothing: ephemeral, no secrets, no network
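The "Good" Docker option above could be wired up roughly as follows. This is a sketch under stated assumptions, not a hardened implementation: the helper names `build_docker_argv` and `run_untrusted`, the in-container path `/sandbox/main.py`, the default limits, and the unprivileged UID 65534 are all illustrative choices, and the image is assumed to contain a `python` binary.

```python
import subprocess
from typing import List


def build_docker_argv(
    image: str,
    script_path: str,  # absolute host path to the generated script
    memory: str = "256m",
    cpus: str = "0.5",
    pids_limit: int = 64,
) -> List[str]:
    """Build a `docker run` argument list with the restrictions listed above."""
    return [
        "docker", "run",
        "--rm",                                   # ephemeral: remove container on exit
        "--network", "none",                      # no network access
        "--read-only",                            # read-only root filesystem
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation via setuid
        "--memory", memory,                       # memory limit
        "--cpus", cpus,                           # CPU limit
        "--pids-limit", str(pids_limit),          # bound process count (fork bombs)
        "--user", "65534:65534",                  # assumed unprivileged UID:GID
        "--volume", f"{script_path}:/sandbox/main.py:ro",  # script mounted read-only
        image,
        "python", "/sandbox/main.py",
    ]


def run_untrusted(image: str, script_path: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run the script in the container. Passed as a list, so no shell is involved.

    Caveat: the timeout stops the docker CLI client; the container itself may
    need separate cleanup if the client is killed.
    """
    return subprocess.run(
        build_docker_argv(image, script_path),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
```

No secrets or environment variables are passed into the container, in line with the last point: if the sandbox is compromised, there should be nothing inside worth taking. A container still shares the host kernel, which is why the micro-VM and Wasm options rank higher.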
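Ask the LLM to generate code that calls os.system("id"). If the command runs and its output appears anywhere in the response showing the application's own host user (rather than an unprivileged sandbox user), or appears at all when code execution is not an intended feature, the sandbox is absent or broken.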
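The check described above can be partly automated by scanning the model's response for text shaped like `id` output. The sketch below is a heuristic only; the function name `find_id_output` and the regular expression are assumptions for illustration, and a match still needs a human to compare the reported user against the expected sandbox user.

```python
import re
from typing import Optional, Tuple

# Matches the leading part of typical `id` output, e.g.
# "uid=1000(app) gid=1000(app) groups=1000(app)".
_ID_OUTPUT = re.compile(r"uid=(\d+)(?:\(([^)]*)\))?\s+gid=\d+")


def find_id_output(response_text: str) -> Optional[Tuple[int, Optional[str]]]:
    """Return (uid, username) if the response contains `id`-style output, else None."""
    match = _ID_OUTPUT.search(response_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

A result of `None` does not prove the sandbox works, since output may be swallowed or reformatted before it reaches the response. A match showing uid 0 or the service's own account is a strong sign that generated code ran in the host process.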