PART 1
What Is It?
Slides 1–8 · No jargon yet
Slide 1 · The Setup
Before we define anything — read this story.
This happened. Follow it. The definition will make sense after.
The Scenario

A startup launches an AI customer support chatbot. It's powered by an LLM API billed by the token. They announce it on a Thursday, go viral by Friday evening, and wake up Saturday to a flood of users. The chatbot is doing exactly what it was built to do.

Then This Happens

By Sunday morning: $47,000 in API charges. One user asked for a "comprehensive analysis" and the model wrote 40 pages. Another triggered a loop that kept calling a search tool, re-reading results, and generating summaries — for 22 minutes straight before timing out. There were no per-user limits. No token caps. No budget alerts. The system just kept generating.

What Just Happened

This is unbounded consumption: an LLM application places no meaningful ceiling on the compute, tokens, API calls, or money that a single user or request can consume. The model did nothing wrong. The infrastructure had no limits. That was the vulnerability.
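
To make "ceiling" concrete, here is a minimal sketch of the three controls the startup was missing: a per-request token cap, a per-user daily budget, and an alert threshold. The class name UsageGuard, its methods, and every number in it are hypothetical and chosen for illustration. This is not any provider's API, and a real system would also need persistent storage and a daily reset.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would push a user past the daily ceiling."""


@dataclass
class UsageGuard:
    # All limits are illustrative assumptions, not recommended values.
    max_tokens_per_request: int = 2_000
    max_tokens_per_user_per_day: int = 50_000
    alert_fraction: float = 0.8  # flag a user at 80% of the daily cap
    used_today: dict = field(default_factory=dict)  # user_id -> tokens spent

    def clamp_request(self, requested_tokens: int) -> int:
        """Cap a single response at the per-request ceiling."""
        return min(requested_tokens, self.max_tokens_per_request)

    def charge(self, user_id: str, tokens: int) -> bool:
        """Record usage. Return True once the alert threshold is reached."""
        spent = self.used_today.get(user_id, 0)
        if spent + tokens > self.max_tokens_per_user_per_day:
            # Refuse before spending, so the cap is never exceeded.
            raise BudgetExceeded(f"{user_id} would exceed the daily token cap")
        self.used_today[user_id] = spent + tokens
        threshold = self.alert_fraction * self.max_tokens_per_user_per_day
        return self.used_today[user_id] >= threshold
```

In the story, clamp_request would have bounded the 40-page answer, and charge would have stopped the 22-minute tool loop once that user's daily budget ran out, provided each model call inside the loop was charged.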

One Line to Remember

Unbounded consumption is what happens when an AI system has no spending limits and someone, or just ordinary traffic, finds that out.
