LLM10:2025 — Unbounded Consumption

Slide 11 · Denial of Wallet

The attack that doesn’t crash your app. It just empties your account.

Prompts crafted to force maximum token output. The service keeps running. The bill keeps climbing.

How It Works

An attacker discovers that an application places no cap on output tokens. They then craft prompts designed to force the longest possible responses: "Write an exhaustive 10,000-word analysis of...", "List every possible example of...", "Generate a comprehensive step-by-step guide for every...".
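The missing control is a server-side ceiling on output length. A minimal sketch, assuming the application builds its own model request and can override whatever limit the client asks for; `HARD_MAX_OUTPUT_TOKENS` and `effective_max_tokens` are hypothetical names, and 1,024 tokens is an arbitrary example value:

```python
from typing import Optional

# Hypothetical server-side ceiling; 1,024 is an arbitrary example value, not a recommendation.
HARD_MAX_OUTPUT_TOKENS = 1024


def effective_max_tokens(requested: Optional[int], hard_cap: int = HARD_MAX_OUTPUT_TOKENS) -> int:
    """Return the output-token limit the application should send to the model.

    A missing or non-positive client value falls back to the server's cap, and an
    oversized one is clamped to it, so a request for a 10,000-word essay is cut off
    at the cap instead of being billed in full.
    """
    if requested is None or requested <= 0:
        return hard_cap
    return min(requested, hard_cap)


# Usage (hypothetical): pass the result to whatever output-limit parameter the provider exposes.
print(effective_max_tokens(None))    # 1024
print(effective_max_tokens(50_000))  # 1024
print(effective_max_tokens(300))     # 300
```

The cap only helps if it is enforced where the attacker cannot touch it. A limit set by client-side code offers no protection, because the attacker controls the client.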

Then they automate it. A script fires these prompts in parallel, around the clock, at the unprotected API. The attacker's cost is nearly zero: sending a short prompt is cheap, and if they reach the service through a free or low-cost account, they pay little or nothing for the inference they trigger. The victim pays for every output token.
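Automation is what turns one expensive prompt into a bill. One common countermeasure is a per-client budget on output tokens per time window. A minimal fixed-window sketch, assuming each request carries a stable client identifier and reserves its worst case (the server's output cap) before the model is called; `OutputTokenBudget` and `try_spend` are hypothetical names, and the state lives in memory, so a multi-instance deployment would need shared storage:

```python
import time
from typing import Callable, Dict, Tuple


class OutputTokenBudget:
    """Hypothetical fixed-window budget: at most `limit` output tokens per client per `window_s` seconds."""

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # client_id -> (start of current window, tokens reserved in that window)
        self._usage: Dict[str, Tuple[float, int]] = {}

    def try_spend(self, client_id: str, tokens: int) -> bool:
        """Reserve `tokens` for one request; return False if that would exceed the budget."""
        now = self.clock()
        start, spent = self._usage.get(client_id, (now, 0))
        if now - start >= self.window_s:
            start, spent = now, 0  # previous window expired; start a fresh one
        if spent + tokens > self.limit:
            return False  # reject without recording anything
        self._usage[client_id] = (start, spent + tokens)
        return True


# Usage with a fake clock so the example is deterministic.
clock_now = [0.0]
budget = OutputTokenBudget(limit=2048, window_s=3600, clock=lambda: clock_now[0])
print(budget.try_spend("client-a", 1024))  # True
print(budget.try_spend("client-a", 1024))  # True
print(budget.try_spend("client-a", 1024))  # False: budget exhausted for this window
clock_now[0] = 3600.0
print(budget.try_spend("client-a", 1024))  # True: a new window has started
```

A fixed window is the simplest choice, but it permits a burst of up to twice the limit across a window boundary; a sliding window or token bucket smooths that out. Reserving the full cap up front is conservative; a real system might refund unused tokens once the response completes.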

The Economics of the Attack

Output tokens cost more than input tokens on most APIs, often several times more per token. A prompt that forces a 50,000-token response can cost the victim far more than processing the attacker's input: perhaps 75x, depending on prompt length and the provider's pricing. Run that at 100 parallel requests and the bill grows faster than anyone reviewing usage manually can catch it.
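The arithmetic is worth running before an attacker does. A back-of-the-envelope sketch in which every input is an assumption chosen for illustration: 50,000 output tokens per response and 100 parallel requests (from the scenario above), 600 seconds to generate each response, and a hypothetical price of $10 per million output tokens rather than any specific provider's rate:

```python
def output_spend_per_hour(
    output_tokens_per_response: int,
    parallel_requests: int,
    seconds_per_response: float,
    usd_per_million_output_tokens: float,
) -> float:
    """Estimate output-token spend per hour for a sustained, fully parallel attack."""
    # Each parallel stream completes 3600 / seconds_per_response responses per hour.
    responses_per_hour = parallel_requests * 3600 / seconds_per_response
    tokens_per_hour = responses_per_hour * output_tokens_per_response
    return tokens_per_hour / 1_000_000 * usd_per_million_output_tokens


hourly = output_spend_per_hour(
    output_tokens_per_response=50_000,
    parallel_requests=100,
    seconds_per_response=600,            # assumed generation time per response
    usd_per_million_output_tokens=10.0,  # hypothetical price
)
print(f"${hourly:,.2f} per hour, ${hourly * 24:,.2f} per day")
# $300.00 per hour, $7,200.00 per day
```

Spend scales linearly with each input: double the parallelism or the price and the hourly figure doubles. Input-token charges are left out because, in this attack, the output side dominates.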

Documented Attack Class · OWASP LLM10:2025 · Denial of Wallet

Resource-Exhaustion Prompt Automation

No single CVE · OWASP-documented attack pattern · Observed across multiple AI API providers

OWASP explicitly documents this as an attack scenario: "An attacker repeatedly sends resource-intensive queries... leading to massive bills for the provider." Security researchers have demonstrated proof-of-concept attacks against uncapped LLM APIs, generating thousands of dollars in charges from a single automated script in under an hour.

Key insight: the goal is not disruption. The service looks healthy the entire time. Detection requires financial monitoring, not availability monitoring.
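A minimal sketch of that financial signal, assuming the application logs the output tokens billed for each completed request in timestamp order and applies a single hypothetical output price; `UsageEvent`, `SpendRateMonitor`, and the thresholds are illustrative names and values. It alerts on dollars spent in a trailing window, so it fires even when every request succeeds and latency looks normal:

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class UsageEvent:
    timestamp: float    # seconds; events must be recorded in non-decreasing order
    output_tokens: int  # output tokens billed for one completed request


class SpendRateMonitor:
    """Hypothetical alert on output-token spend in a trailing time window."""

    def __init__(self, window_s: float, max_spend_usd: float, usd_per_million_output_tokens: float) -> None:
        self.window_s = window_s
        self.max_spend_usd = max_spend_usd
        self.price = usd_per_million_output_tokens
        self._events: Deque[UsageEvent] = deque()
        self._tokens_in_window = 0

    def record(self, event: UsageEvent) -> bool:
        """Record one request and return True if trailing-window spend exceeds the limit."""
        self._events.append(event)
        self._tokens_in_window += event.output_tokens
        # Evict events that fall outside the window, measured from the newest event.
        while self._events and event.timestamp - self._events[0].timestamp >= self.window_s:
            old = self._events.popleft()
            self._tokens_in_window -= old.output_tokens
        spend = self._tokens_in_window / 1_000_000 * self.price
        return spend > self.max_spend_usd


# Usage: a $50-per-hour limit at a hypothetical $10 per million output tokens.
monitor = SpendRateMonitor(window_s=3600, max_spend_usd=50.0, usd_per_million_output_tokens=10.0)
for second in range(101):
    alert = monitor.record(UsageEvent(timestamp=float(second), output_tokens=50_000))
print(alert)  # True: 101 x 50,000 tokens is $50.50 in the window, above the $50 limit
```

Note what the monitor ignores: HTTP status, latency, and uptime. In a Denial of Wallet attack those all look healthy; the spend rate is the signal.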
