"Implement human-in-the-loop controls for privileged operations to prevent unauthorized actions."
Freysa AI: No human approval before fund transfer. The AI's own judgment was the only gate. One successful injection bypassed it and transferred $47,000. A code-level confirmation gate would have held regardless of what the AI believed it was doing.
GitHub Copilot CVE-2025-53773: Copilot modified .vscode/settings.json without user approval. Microsoft's patch was exactly this mitigation: requiring user approval before security-relevant configuration changes. They implemented M5 retroactively after researchers found the vulnerability.
Sending emails on behalf of users · deleting or modifying data · making financial transactions · changing permissions or access controls · executing code or shell commands · modifying configuration files · sharing data externally
Critical: The confirmation logic must live in code — outside the model's reasoning path. A model that has been injected can claim it already received human approval. The gate cannot be delegated to the model itself.
Attempt an injection targeting a high-risk action. Verify the approval gate fires. Then verify an injected claim of "already approved" cannot bypass the gate. The approval check is in code — it doesn't care what the model believes.