An attacker builds a script that calls an LLM API endpoint with prompts specifically crafted to trigger the longest possible responses: "Write a comprehensive 10,000-word guide to...", "List every possible permutation of...", "Explain in exhaustive detail every aspect of..."
The API has no output token cap configured, so each prompt generates a response as long as the model is willing to produce. The attacker runs 500 parallel requests at a time, around the clock. The API looks healthy: responses succeed and latency is normal. The invoice for that month is $83,000.
If the attacker is using compromised or free-tier credentials (as in the Sourcegraph incident), their own cost may be zero. The victim pays for every token of every response.
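The control missing from this scenario is a limit on output tokens, both per response and per credential. The following is a minimal sketch of such a guard, not a definitive implementation. The class `OutputBudget`, the exception `BudgetExceeded`, and the numeric limits are all hypothetical names and values chosen for illustration. The sketch assumes the calling code passes the clamped value to whatever maximum-output-tokens parameter the provider exposes, and reports back the output token count from each response.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical limits; tune them to the workload and pricing.
MAX_OUTPUT_TOKENS = 1024   # hard cap on any single response
WINDOW_BUDGET = 200_000    # output tokens one key may consume per billing window


class BudgetExceeded(Exception):
    """Raised when a key has no output-token budget left in the window."""


@dataclass
class OutputBudget:
    cap: int = MAX_OUTPUT_TOKENS
    window_budget: int = WINDOW_BUDGET
    used: dict[str, int] = field(default_factory=dict)

    def clamp(self, api_key: str, requested: Optional[int] = None) -> int:
        """Return the output-token limit to send upstream for this request."""
        remaining = self.window_budget - self.used.get(api_key, 0)
        if remaining <= 0:
            raise BudgetExceeded(api_key)
        # Never trust the caller's requested length; bound it by the hard cap.
        limit = self.cap if requested is None else max(1, min(requested, self.cap))
        # Also bound it by whatever budget the key has left.
        return min(limit, remaining)

    def record(self, api_key: str, output_tokens: int) -> None:
        """Charge the tokens actually generated against the key's budget."""
        self.used[api_key] = self.used.get(api_key, 0) + output_tokens


budget = OutputBudget(cap=1024, window_budget=2000)
first_limit = budget.clamp("key-a", requested=10_000)  # clamped to the hard cap
budget.record("key-a", first_limit)
second_limit = budget.clamp("key-a")                   # bounded by remaining budget
budget.record("key-a", second_limit)
```

With this guard in place, a third call to `budget.clamp("key-a")` raises `BudgetExceeded` instead of forwarding the request. The sketch keeps counters in process memory and never resets them; a real deployment would presumably need a shared store and a reset at each window boundary.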