A modern LLM agent doesn’t just generate text — it calls tools (web search, code execution, database lookups), reads the results, decides what to do next, and repeats. This is a loop. Without a maximum step count or execution timeout, a crafted prompt can keep the agent in that loop indefinitely.
Example: "Search for the answer, but if you’re not 100% certain, search again with a refined query." With no stop condition, the agent refines its search query forever — each iteration consuming tokens, tool calls, and compute.
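A minimal sketch of the two limits mentioned above, a maximum step count and an execution timeout, might look like the following. The names `run_agent`, `step`, `BudgetExceeded`, and `AgentResult` are hypothetical and not taken from any particular agent framework; `step` stands in for one plan, tool-call, and observe cycle.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Raised when the agent hits its step or wall-clock limit."""


@dataclass
class AgentResult:
    answer: str
    steps: int


def run_agent(
    step: Callable[[str], Optional[str]],
    task: str,
    max_steps: int = 8,
    timeout_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> AgentResult:
    """Run `step` until it returns an answer or a limit is hit.

    `step` is a hypothetical stand-in for one agent iteration: it returns
    a final answer, or None to request another iteration.
    """
    deadline = clock() + timeout_s
    for n in range(1, max_steps + 1):
        # Checked between iterations; a step already in progress is not interrupted.
        if clock() > deadline:
            raise BudgetExceeded(f"timed out after {n - 1} steps")
        answer = step(task)
        if answer is not None:
            return AgentResult(answer=answer, steps=n)
    raise BudgetExceeded(f"no answer after {max_steps} steps")
```

With this shape, a prompt that makes the agent "search again" on every turn ends in a `BudgetExceeded` error after `max_steps` iterations instead of looping forever. Because the deadline is only checked between steps, each tool call would still need its own timeout.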
Researchers at Google DeepMind and collaborating universities found that asking ChatGPT to repeat a single word forever, such as "poem", caused the model to diverge from its aligned behavior. Rather than refusing or staying on-topic, it began emitting verbatim training data, continuing to generate far beyond a typical response length.
The attack consumed disproportionate compute and tokens per query compared to a normal interaction, because the model entered an atypical generation mode. For roughly $200 in API calls, the researchers extracted over 10,000 unique verbatim training examples. Resource consumption was the mechanism: the attack relied on sustained generation until the model's behavior broke down.
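One way to bound this kind of runaway generation is a hard cap on output length, applied regardless of what the prompt asks for. The sketch below assumes the model's output arrives as an iterable of tokens; `cap_stream` is a hypothetical helper, not part of any vendor SDK, and many hosted APIs also offer a maximum output token setting that serves the same purpose on the server side.

```python
from typing import Iterable, Iterator


def cap_stream(tokens: Iterable[str], max_tokens: int) -> Iterator[str]:
    """Yield at most `max_tokens` items from `tokens`, then stop reading."""
    if max_tokens <= 0:
        return
    for count, token in enumerate(tokens, start=1):
        yield token
        if count >= max_tokens:
            # Stop without pulling another token from the upstream source.
            return
```

For example, wrapping an endless stream such as `itertools.repeat("poem")` in `cap_stream(..., 5)` yields five tokens and then stops, so the cost of a single response stays bounded even if the model never emits a stop token.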