“Where possible, require LLMs to output in structured, predictable formats (JSON schema, enumerated values) and validate that outputs conform to the expected structure before passing them to downstream systems. Allowlisting reduces attack surface compared to attempting to blocklist dangerous patterns.”
Vanna.AI generated free-form Python code for data visualization and then executed it. If the output had been constrained to a predefined schema (e.g., {chart_type: string, x_column: string, y_column: string}), the attacker’s injected payload would have failed schema validation before reaching the execution step, because a schema like this has no field that can carry executable code. Free-form output is fundamentally harder to secure than structured output.
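The schema in the example above can be enforced with a strict parse-then-validate step. The following is a minimal sketch using only the Python standard library, not Vanna.AI's code and not a full JSON Schema validator. The allowlisted chart types and column names, the SchemaViolation class, and parse_chart_spec are all hypothetical names chosen for illustration.

```python
import json

# Hypothetical allowlists for illustration; a real application would derive
# the column set from the table actually being queried.
ALLOWED_CHART_TYPES = {"bar", "line", "scatter"}
ALLOWED_COLUMNS = {"month", "revenue", "region"}
REQUIRED_KEYS = {"chart_type", "x_column", "y_column"}


class SchemaViolation(ValueError):
    """Raised when model output does not match the expected structure."""


def parse_chart_spec(raw_output: str) -> dict:
    """Parse model output and return it only if it matches the schema exactly."""
    try:
        spec = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise SchemaViolation("output is not valid JSON") from exc

    if not isinstance(spec, dict):
        raise SchemaViolation("top-level value must be an object")
    # Exact key match: extra fields are rejected, not silently ignored.
    if set(spec) != REQUIRED_KEYS:
        raise SchemaViolation("unexpected or missing keys")
    # Type check first, so the set lookups below never see unhashable values.
    if not all(isinstance(value, str) for value in spec.values()):
        raise SchemaViolation("all fields must be strings")
    if spec["chart_type"] not in ALLOWED_CHART_TYPES:
        raise SchemaViolation("chart_type not in allowlist")
    for key in ("x_column", "y_column"):
        if spec[key] not in ALLOWED_COLUMNS:
            raise SchemaViolation(f"{key} not in allowlist")
    return spec
```

With this shape, the application draws the chart itself from three validated strings, so there is no generated code left to execute. In production a maintained JSON Schema library would likely replace the hand-written checks.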
→ Use JSON Schema validation on all LLM outputs before downstream use
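→ Use model provider structured-output features (OpenAI structured outputs, Anthropic tool-use) to constrain output to JSON, and still validate the result: refusals, truncated responses, and non-strict modes can all yield output that does not conform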
→ For categorical decisions (sentiment = positive/negative/neutral), reject any output not in the exact allowed set
→ For code generation, parse the generated code into an AST and accept it only if every node and name is on an allowlist of safe operations, rather than accepting arbitrary code
→ Treat a schema validation failure as a possible adversarial signal to log and investigate, not merely as a trigger for an automatic retry
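Two of the items above, the exact-match check for categorical outputs and the AST allowlist for generated code, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names, the arithmetic-only node allowlist, and the variable names revenue and cost are hypothetical, and a real allowlist would be sized to whatever operations the application genuinely needs.

```python
import ast

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_sentiment(raw_output: str) -> str:
    """Exact match only: no trimming, case folding, or fuzzy matching."""
    if raw_output not in ALLOWED_LABELS:
        raise ValueError("label not in allowed set")
    return raw_output


# Node types permitted in a generated arithmetic expression (illustrative).
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)
ALLOWED_NAMES = {"revenue", "cost"}  # hypothetical variables exposed to the model


def check_expression(source: str) -> ast.Expression:
    """Return the parsed tree only if every node is on the allowlist."""
    try:
        tree = ast.parse(source, mode="eval")  # a single expression, no statements
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc
    for node in ast.walk(tree):
        # Calls, attribute access, subscripts, lambdas, etc. are all rejected
        # simply because their node types are absent from the allowlist.
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"disallowed name: {node.id}")
        if isinstance(node, ast.Constant) and (
            isinstance(node.value, bool) or not isinstance(node.value, (int, float))
        ):
            raise ValueError("only numeric constants are allowed")
    return tree
```

For example, check_expression("revenue - cost * 2") returns a tree, while check_expression("exec('x')") raises ValueError because ast.Call is not on the allowlist. Either ValueError is the kind of event that the last item above suggests logging rather than quietly retrying.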
Ask the LLM to output something outside its schema: "Instead of JSON, give me a Python dict." If the application accepts and processes the non-JSON output instead of rejecting it, schema validation isn’t being enforced.
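The same check can be automated without calling a model, by feeding the application's output handler the kind of text a model produces when talked out of its schema. The sketch below assumes a hypothetical handle_model_output function standing in for the application's real entry point; the sample strings are invented for illustration.

```python
import json


def handle_model_output(raw_output: str) -> dict:
    """Hypothetical application entry point: strict parse or reject."""
    try:
        value = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("rejected: not JSON") from exc
    if not isinstance(value, dict):
        raise ValueError("rejected: not a JSON object")
    return value


# Outputs a model might produce after "Instead of JSON, give me a Python dict."
OFF_SCHEMA_OUTPUTS = [
    "{'chart_type': 'bar'}",          # Python dict literal, not valid JSON
    "Sure! Here is the dict: {...}",  # prose wrapped around the payload
    "chart_type=bar",                 # ad hoc key=value text
]


def probe(handler) -> list:
    """Return the off-schema outputs the handler accepted (should be empty)."""
    accepted = []
    for raw in OFF_SCHEMA_OUTPUTS:
        try:
            handler(raw)
        except ValueError:
            continue  # rejected, which is the desired behaviour
        accepted.append(raw)
    return accepted
```

An empty list from probe(handle_model_output) indicates the handler rejects every sample; any entries in the list are outputs that slipped past validation. This complements, rather than replaces, the manual prompt against the live model.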