An attacker identifies that an application has no output token cap. They craft prompts designed to force the longest possible responses — "Write an exhaustive 10,000-word analysis of...", "List every possible example of...", "Generate a comprehensive step-by-step guide for every..."
Then they automate it. A script runs these prompts in parallel, around the clock, against the unprotected API. The attacker’s cost is close to zero, especially if they’re using a free or cheap account. The victim pays for every output token.
Output tokens cost more than input tokens on most APIs. A prompt that forces a 50,000-token response can cost the victim perhaps 75x as much as processing the attacker’s input. Run that at 100 parallel requests and the bill grows faster than any manual monitoring cycle can catch it.
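To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The prices, prompt length, and response time are hypothetical placeholders, not any vendor's real rates, and `Pricing`, `request_cost`, and `hourly_cost` are names invented for this illustration. Substitute your own provider's numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (not real vendor rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """USD the application owner pays the model provider for one request."""
    input_cost = input_tokens * pricing.input_per_million / 1_000_000
    output_cost = output_tokens * pricing.output_per_million / 1_000_000
    return input_cost + output_cost


def hourly_cost(
    pricing: Pricing,
    input_tokens: int,
    output_tokens: int,
    parallel_requests: int,
    seconds_per_request: float,
) -> float:
    """USD per hour when each parallel worker sends requests back to back."""
    requests_per_hour = parallel_requests * 3600 / seconds_per_request
    return requests_per_hour * request_cost(pricing, input_tokens, output_tokens)


# Assumed scenario: a 200-token prompt forcing a 50,000-token response,
# 100 parallel workers, and roughly 10 minutes to generate each response.
pricing = Pricing(input_per_million=3.0, output_per_million=15.0)
per_request = request_cost(pricing, input_tokens=200, output_tokens=50_000)
per_hour = hourly_cost(
    pricing,
    input_tokens=200,
    output_tokens=50_000,
    parallel_requests=100,
    seconds_per_request=600.0,
)
print(f"per request: ${per_request:.2f}, per hour: ${per_hour:.2f}")
```

Under these made-up assumptions almost the entire per-request cost comes from output tokens, and the hourly figure scales linearly with the number of parallel workers, the output price, and the response length. Those are exactly the quantities an uncapped API leaves in the attacker's hands.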
OWASP explicitly documents this as an attack scenario: "An attacker repeatedly sends resource-intensive queries... leading to massive bills for the provider." Security researchers have demonstrated proof-of-concept attacks against uncapped LLM APIs, generating thousands of dollars in charges from a single automated script in under an hour.