“For LLM applications performing high-impact actions, require human approval in an interrupt-driven approach before the LLM proceeds.” The human-in-the-loop is the last safety net when every other control fails. It is also what the attacker tries to disable first.
CVE-2025-53773: Copilot’s first step was to modify VS Code settings to set chat.tools.autoApprove: true — disabling all confirmation gates. The attacker’s very first action eliminated the safety net. Once that was done, every subsequent action (download, exec, C2 connect) happened without a single prompt to the developer.
→ Classify every agent action as low-impact (no approval needed) or high-impact (approval required)
→ High-impact: send message, delete record, modify security config, execute code, transfer funds
→ Present the proposed action and its parameters to the user before executing — not after
→ Security-relevant configuration changes must require explicit confirmation and must not be overridable by the agent itself
For each high-impact action type, test whether the agent can perform it without user interaction via a crafted prompt. Verify that a confirmation gate appears before executing. Also verify that the agent cannot be instructed to disable that gate.