"Continuously monitor resource usage and implement logging to detect and respond to unusual patterns of resource consumption." "Track per-user and per-session token consumption trends, not just aggregate system load."
In the Sourcegraph incident, the team noticed "a significant increase in API usage" — manually, after it had already caused widespread impact. There was no automated anomaly detection. The startup in Slide 1 discovered the $47,000 bill Sunday morning — three days after the consumption began. Detection was delayed because monitoring looked at availability, not cost.
→ Log every LLM API call with: user ID, timestamp, input tokens, output tokens, model, cost estimate.
→ Calculate a rolling baseline for average tokens per request and per user per hour.
→ Alert when any user’s token consumption is 5x their 7-day average in a single hour.
→ Alert when system-wide token consumption exceeds 3x the prior day’s same-hour figure.
→ Build a cost dashboard visible to the on-call engineer — not just an engineering metric.
Simulate a usage spike: use a script to send 50 large requests in 5 minutes from a test user account. Does any alert fire? If not, you have no effective monitoring for unbounded consumption.