LLM10:2025 — Unbounded Consumption

Slide 8 · The Misconception

“We have rate limiting.”

Request-count rate limiting is not the same as token consumption rate limiting.

The Dangerous Assumption

Many teams implement rate limiting at the request level: “max 10 requests per minute per user.” They believe this controls cost. It doesn’t — not by itself.

Why 10 Requests Can Still Destroy You

If each of those 10 requests submits 50,000 tokens of context and prompts a 10,000-token response, that's 600,000 tokens per minute per user. At $15 per million tokens applied to every token (a simplification, since input tokens are usually priced below output tokens), one user at 10 req/min generates up to $9/min, or $540/hour. Request-count limits don't touch this.
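The arithmetic above can be checked in a few lines. The slide's $9/min figure prices all 600,000 tokens at $15 per million. The function below takes separate input and output prices, so you can also see what the output tokens alone cost. The function name and the price values are illustrative only; substitute your provider's actual rates.

```python
def cost_per_minute(requests_per_min: int, input_tokens: int, output_tokens: int,
                    input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost per minute for one user, given per-request token counts
    and prices in dollars per million tokens (illustrative values only)."""
    input_cost = requests_per_min * input_tokens * input_price_per_m / 1_000_000
    output_cost = requests_per_min * output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Slide scenario: every token billed at $15/M.
worst = cost_per_minute(10, 50_000, 10_000, 15.0, 15.0)
print(f"${worst:.2f}/min, ${worst * 60:.2f}/hour")  # $9.00/min, $540.00/hour

# Output tokens only, at $15/M (input treated as free).
output_only = cost_per_minute(10, 50_000, 10_000, 0.0, 15.0)
print(f"${output_only:.2f}/min, ${output_only * 60:.2f}/hour")  # $1.50/min, $90.00/hour
```

Even the output-only figure of roughly $90/hour per user passes straight through a 10 req/min limit.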

❌ Insufficient: Request Rate Limit Only

Max 10 requests/minute

No input size check

No output token cap

No cumulative token budget
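To see why this column falls short, here is a minimal sliding-window request limiter fed the ten oversized requests from the example above. `RequestRateLimiter` is a hypothetical class written for illustration, not any particular library. It admits every request because it counts requests and never inspects their size.

```python
from collections import deque


class RequestRateLimiter:
    """Hypothetical request-count-only limiter: at most max_requests
    per window_s seconds. It never looks at how large a request is."""

    def __init__(self, max_requests: int = 10, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.timestamps = deque()  # admission times in seconds

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False
        self.timestamps.append(now)
        return True


limiter = RequestRateLimiter()
tokens_admitted = 0
for i in range(10):
    # Each request: 50,000 input tokens + 10,000 output tokens.
    if limiter.allow(now=float(i)):
        tokens_admitted += 50_000 + 10_000
print(tokens_admitted)  # 600000
```

Only an eleventh request inside the same minute would be refused; the 600,000 tokens already admitted are never questioned.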

✅ Sufficient: Multi-Dimensional Limits

Max 10 requests/minute

Max 4,000 input tokens per request

Max 2,000 output tokens per request

Max 100,000 tokens/hour cumulative per user
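A minimal sketch of one user's limiter enforcing all four limits, assuming token counts are already known for each request. In practice the input would be counted with the model's tokenizer and the output cap passed to the model as its maximum-output setting. `Limits`, `UserLimiter`, and `admit` are hypothetical names. Reserving the full output cap against the hourly budget at admission time is a conservative choice made for this sketch; a real system might instead reconcile against the actual usage reported after each response.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Limits:
    # Values from the "Sufficient" column above.
    max_requests_per_min: int = 10
    max_input_tokens: int = 4_000
    max_output_tokens: int = 2_000
    max_tokens_per_hour: int = 100_000


@dataclass
class UserLimiter:
    """Hypothetical per-user limiter combining frequency and magnitude limits."""
    limits: Limits = field(default_factory=Limits)
    request_times: deque = field(default_factory=deque)  # admission times, last 60 s
    token_log: deque = field(default_factory=deque)      # (time, tokens), last 3600 s

    def admit(self, now: float, input_tokens: int,
              requested_output: int) -> tuple[bool, str, int]:
        """Return (allowed, reason, output_cap) for one request at time `now` (seconds)."""
        lim = self.limits
        # Magnitude per request: reject oversized input outright.
        if input_tokens > lim.max_input_tokens:
            return False, "input too large", 0
        # Clamp output; output_cap is what would be sent to the model as its max output.
        output_cap = min(requested_output, lim.max_output_tokens)
        # Expire entries that have left each sliding window.
        while self.request_times and now - self.request_times[0] >= 60:
            self.request_times.popleft()
        while self.token_log and now - self.token_log[0][0] >= 3600:
            self.token_log.popleft()
        # Frequency: at most N requests in the last minute.
        if len(self.request_times) >= lim.max_requests_per_min:
            return False, "request rate exceeded", 0
        # Cumulative magnitude: reserve the worst case (input + full output cap).
        cost = input_tokens + output_cap
        used = sum(tokens for _, tokens in self.token_log)
        if used + cost > lim.max_tokens_per_hour:
            return False, "hourly token budget exhausted", 0
        self.request_times.append(now)
        self.token_log.append((now, cost))
        return True, "ok", output_cap


u = UserLimiter()
print(u.admit(0.0, 50_000, 10_000))  # (False, 'input too large', 0)
# Maximal legal requests every 10 s: the hourly budget, not the rate limit, binds.
admitted = sum(u.admit(10.0 * i, 4_000, 10_000)[0] for i in range(20))
print(admitted)  # 16
```

Each maximal request reserves 4,000 + 2,000 = 6,000 tokens, so the 100,000-token hourly budget admits 16 of them. The 17th is refused even though the per-minute request limit is nowhere near exhausted.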

The Right Mental Model

Rate limiting controls frequency. Token limits control magnitude. You need both.
