Six missing controls. Any one of them is enough to create the risk.
This isn’t an LLM bug. It’s an application design gap.
1️⃣
No input size limits
Users can submit prompts of arbitrary length. A single 200,000-token prompt is billed in full as input before the model generates a single token of output.
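A minimal sketch of an input size check, run before any model call is made. The limit, the 4-characters-per-token estimate, and the names `check_prompt_size` and `PromptTooLargeError` are all assumptions for illustration. A real deployment would count tokens with the provider's own tokenizer.

```python
MAX_INPUT_TOKENS = 4_000  # hypothetical limit; tune per use case
CHARS_PER_TOKEN = 4       # rough heuristic for English text, not a real tokenizer


class PromptTooLargeError(ValueError):
    """Raised when a prompt exceeds the configured input budget."""


def estimate_tokens(text: str) -> int:
    # Ceiling division, so any non-empty prompt counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def check_prompt_size(prompt: str, max_tokens: int = MAX_INPUT_TOKENS) -> int:
    """Return the estimated token count, or raise before the model is called."""
    estimated = estimate_tokens(prompt)
    if estimated > max_tokens:
        raise PromptTooLargeError(
            f"prompt is ~{estimated} tokens; limit is {max_tokens}"
        )
    return estimated
```

Rejecting the request at this point costs nothing. The same prompt sent to the model is charged whether or not the response turns out to be useful.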
2️⃣
No output token caps
The “max_tokens” parameter is either unset or set too high, so the model generates until it stops on its own, which can run to thousands of tokens.
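One way to close this gap is to resolve the cap on the server, so a caller can lower it but never raise it. The sketch below assumes a hypothetical request shape. Only the "max_tokens" key comes from the text above; the default, the ceiling, and the function names are illustrative.

```python
from typing import Optional

DEFAULT_MAX_TOKENS = 512  # hypothetical default when the caller sends nothing
HARD_MAX_TOKENS = 1_024   # hypothetical ceiling no caller can exceed


def resolve_max_tokens(requested: Optional[int]) -> int:
    """Clamp a caller-supplied output cap to the server-side ceiling."""
    if requested is None:
        return DEFAULT_MAX_TOKENS
    if requested < 1:
        raise ValueError("max_tokens must be a positive integer")
    return min(requested, HARD_MAX_TOKENS)


def build_request(prompt: str, requested_max_tokens: Optional[int] = None) -> dict:
    # Hypothetical payload: the point is that max_tokens is always present.
    return {
        "prompt": prompt,
        "max_tokens": resolve_max_tokens(requested_max_tokens),
    }
```

With these values, `build_request("hi", 50_000)` yields a payload capped at 1,024 output tokens.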
3️⃣
No per-user rate limiting
Any authenticated (or unauthenticated) user can make unlimited requests per second, minute, or hour.
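A per-user limit can be sketched as a sliding-window counter. This version is in-memory and single-process, and the class name and parameters are assumptions. A production service would more likely keep the counters in a shared store or enforce the limit at an API gateway.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests per user within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # user_id -> timestamps of allowed requests

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

For unauthenticated traffic, the same limiter could be keyed on an IP address or API key instead of a user ID.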
4️⃣
No cumulative budget ceiling
There’s no dollar limit per user, per session, or per day. No alert fires when spend crosses a threshold.
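A cumulative ceiling can be sketched as a per-user daily ledger with a soft alert threshold and a hard stop. The class below is illustrative: the names, the 80% alert fraction, and the in-memory storage are assumptions. Amounts are held as integer cents to avoid floating-point drift.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Optional


class BudgetExceededError(RuntimeError):
    """Raised when a charge would push a user past the daily ceiling."""


class DailyBudget:
    def __init__(self, limit_cents: int, alert_fraction: float = 0.8,
                 on_alert: Callable[[str], None] = print) -> None:
        self.limit_cents = limit_cents
        self.alert_cents = int(limit_cents * alert_fraction)
        self.on_alert = on_alert
        self._spent = defaultdict(int)  # (user_id, day) -> cents spent
        self._alerted = set()           # keys that have already triggered an alert

    def charge(self, user_id: str, cost_cents: int,
               day: Optional[date] = None) -> int:
        """Record a charge and return the running total; raise if over the limit."""
        key = (user_id, day or date.today())
        projected = self._spent[key] + cost_cents
        if projected > self.limit_cents:
            raise BudgetExceededError(f"{user_id} would exceed the daily limit")
        self._spent[key] = projected
        # Fire the alert once per user per day, when the threshold is first crossed.
        if projected >= self.alert_cents and key not in self._alerted:
            self._alerted.add(key)
            self.on_alert(f"{user_id} has used {projected} of {self.limit_cents} cents")
        return projected
```

Calling `charge` with an estimated cost before the model request is sent means that a request that would break the ceiling is never made.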
5️⃣
No monitoring for abnormal patterns
Nobody is watching token-per-request trends or per-user spend. The $47,000 bill was discovered on Sunday morning, three days after the spending started.
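As a rough illustration of watching token-per-request trends, the sketch below flags a request whose token count sits far above a rolling baseline. The window size, the three-sigma threshold, and the minimum deviation floor are assumptions. A real setup would more likely feed these numbers into an existing metrics and alerting stack.

```python
from collections import deque
from statistics import mean, pstdev


class TokenSpikeDetector:
    """Flag requests whose token count is far above the recent average."""

    def __init__(self, window: int = 100, threshold_sigmas: float = 3.0,
                 min_samples: int = 10) -> None:
        self.threshold_sigmas = threshold_sigmas
        self.min_samples = min_samples
        self._history = deque(maxlen=window)  # most recent token counts

    def observe(self, tokens: int) -> bool:
        """Record one request's token count; return True if it looks abnormal."""
        is_spike = False
        if len(self._history) >= self.min_samples:
            baseline = mean(self._history)
            # Floor the deviation at 1 so a perfectly flat baseline
            # does not flag trivial changes.
            spread = max(pstdev(self._history), 1.0)
            is_spike = tokens > baseline + self.threshold_sigmas * spread
        # Every observation joins the baseline, so a sustained shift
        # eventually stops alerting.
        self._history.append(tokens)
        return is_spike
```

Keeping one detector per user would cover the per-user view as well. The return value is meant to be wired to an alert that reaches a person within minutes, not days.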
6️⃣
Agentic loops with no stop condition
An agent that calls tools, reads results, and re-generates has no maximum step count or execution timeout. One crafted prompt can keep it running indefinitely.
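Both stop conditions can be sketched in a few lines. Here `step` is a hypothetical callable that stands in for one round of tool calls and generation, returning a final answer or `None` to keep going. The names and default limits are illustrative.

```python
import time
from typing import Callable, Optional


class AgentLimitExceeded(RuntimeError):
    """Raised when the agent hits its step or time budget without an answer."""


def run_agent(step: Callable[[int], Optional[str]], max_steps: int = 10,
              timeout_seconds: float = 30.0,
              clock: Callable[[], float] = time.monotonic) -> str:
    """Run step(i) until it returns an answer, a step cap, or a deadline."""
    deadline = clock() + timeout_seconds
    for i in range(max_steps):
        # The deadline is checked between steps; it does not interrupt a step
        # that is already running.
        if clock() >= deadline:
            raise AgentLimitExceeded(f"timed out after {i} steps")
        result = step(i)
        if result is not None:
            return result
    raise AgentLimitExceeded(f"no answer after {max_steps} steps")
```

Because the timeout is only checked between steps, each individual tool call or model request still needs its own timeout.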