An attacker identifies that a document-analysis application accepts user-uploaded text with no size limit. They begin submitting inputs that grow with each request: first 10 KB, then 100 KB, then enough text to fill the entire context window. The largest requests push the model to its peak memory usage and slowest execution path.
Even at low request volume, the model is fully occupied processing each oversized input. Other users’ requests queue up and time out. From the outside, the service simply looks degraded: there is no crash, only saturation.
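The gap this scenario exploits is the missing size limit, so a check that runs before the model is ever invoked closes it. The following is a minimal sketch under stated assumptions, not a definitive implementation: `MAX_INPUT_BYTES`, `MAX_INPUT_TOKENS`, `estimate_tokens`, and `validate_upload` are hypothetical names, the limits are placeholder values, and the four-characters-per-token estimate is a rough heuristic standing in for the model's real tokenizer.

```python
# Hypothetical limits; tune them to the model and the workload.
MAX_INPUT_BYTES = 64 * 1024   # hard cap on raw upload size
MAX_INPUT_TOKENS = 8_000      # budget kept well below the context window


class InputTooLarge(ValueError):
    """Raised when an upload exceeds the configured limits."""


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token). A real deployment
    # would count tokens with the model's own tokenizer.
    return len(text) // 4 + 1


def validate_upload(text: str) -> str:
    """Reject oversized documents before they reach the model."""
    size = len(text.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise InputTooLarge(f"upload is {size} bytes; limit is {MAX_INPUT_BYTES}")
    tokens = estimate_tokens(text)
    if tokens > MAX_INPUT_TOKENS:
        raise InputTooLarge(f"upload is ~{tokens} tokens; limit is {MAX_INPUT_TOKENS}")
    return text
```

Rejecting the request at the edge is cheap, whereas truncating or processing it still spends model time. Checking bytes first also avoids tokenizing a very large payload just to discover it is too big.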
The Sourcegraph incident (Slide 10) demonstrates the resource-exhaustion outcome at scale: a proxy that removed all usage ceilings led to 2 million API calls and degraded service for legitimate users, whose rate limits were cut as a consequence.
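A usage ceiling of the kind that was bypassed here can be illustrated with a per-key request counter enforced on the server side. This is a simplified sketch and not a description of how Sourcegraph implements or restored its limits: the `UsageCeiling` class, its parameters, and the fixed-window approach are all assumptions chosen for brevity.

```python
import time


class UsageCeiling:
    """Fixed-window request cap per API key (illustrative only)."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock            # injectable so the logic can be tested
        self._windows = {}            # api_key -> (window_start, call_count)

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.max_calls:
            self._windows[api_key] = (start, count)
            return False
        self._windows[api_key] = (start, count + 1)
        return True
```

For example, `UsageCeiling(max_calls=100, window_seconds=60)` would let each key make at most 100 calls in any one window and refuse the rest. The check only helps if it sits behind every entry point, since a ceiling that an intermediary can route around provides no protection.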