A proprietary LLM represents an enormous investment in training compute, data, fine-tuning, and RLHF. An attacker can approximate that model by repeatedly querying its API with targeted inputs and collecting the outputs. With enough input/output pairs, they can train a copycat model that mimics the original without paying for any of the original’s development.
The LLM10 angle: this attack requires massive, unchecked API consumption. Without per-user quotas, there’s no ceiling on how many queries the attacker can run.
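To make the idea of a per-user quota concrete, here is a minimal sketch of a sliding-window limiter. The class name `PerUserQuota`, its parameters, and the in-memory storage are all assumptions made for illustration; a production API would more likely enforce this at a gateway with shared state.

```python
import time
from collections import defaultdict, deque


class PerUserQuota:
    """Sliding-window query quota tracked per user ID (hypothetical helper)."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        # user_id -> timestamps of that user's accepted queries
        self._history = defaultdict(deque)

    def allow(self, user_id):
        """Return True and record the query if the user is under quota."""
        now = self.clock()
        history = self._history[user_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False  # over quota: reject without recording
        history.append(now)
        return True
```

With `PerUserQuota(max_queries=1000, window_seconds=86400)`, for example, a single account could issue at most 1,000 queries in any 24-hour window, which puts a ceiling on how quickly one identity can harvest input/output pairs. It does not stop an attacker who spreads queries across many accounts, so a quota like this is one layer rather than a complete defense.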
The setup: Proofpoint’s email filtering ML model scored each email and included that score in a header field visible to senders. Researchers Will Pearce and Nick Landers presented this finding at DerbyCon 2019.
The extraction: by systematically varying email content and collecting the returned scores, they trained a copycat classifier that mimicked Proofpoint’s model. They then crafted emails engineered to receive favorable scores from the real filter, effectively bypassing it.
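The probe-and-fit loop described above can be sketched with a toy example. Everything here is an assumption for illustration: `black_box_score` is a local stand-in for a remote filter, the vocabulary and hidden weights are invented, and the stand-in is deliberately linear so that a simple surrogate can match it. The real filter and the researchers' copycat model were considerably more complex.

```python
from itertools import combinations

VOCAB = ["invoice", "urgent", "password", "meeting"]  # hypothetical tokens

# Stand-in for the remote filter. The attacker never reads these values;
# they only observe the scores that black_box_score returns.
_HIDDEN_WEIGHTS = {"invoice": 0.8, "urgent": 1.5, "password": 2.0, "meeting": -0.7}
_HIDDEN_BIAS = 0.1


def black_box_score(tokens):
    """Score an email (a collection of tokens) as the exposed header would."""
    return _HIDDEN_BIAS + sum(_HIDDEN_WEIGHTS[t] for t in tokens)


def featurize(tokens):
    """Bias term followed by one presence flag per vocabulary token."""
    return [1.0] + [1.0 if t in tokens else 0.0 for t in VOCAB]


def collect_probes(max_tokens=2):
    """Systematically vary content: every email with up to max_tokens tokens."""
    probes = []
    for k in range(max_tokens + 1):
        for combo in combinations(VOCAB, k):
            probes.append((featurize(combo), black_box_score(combo)))
    return probes


def fit_surrogate(probes, lr=0.3, steps=5000):
    """Fit a linear surrogate with full-batch gradient descent on squared error."""
    n_features = len(probes[0][0])
    w = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for x, y in probes:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(probes) for wi, g in zip(w, grad)]
    return w


def surrogate_score(w, tokens):
    """Predict the filter's score offline, without sending another probe."""
    return sum(wi * xi for wi, xi in zip(w, featurize(tokens)))
```

In this sketch the surrogate is fitted on the 11 probes containing at most two tokens and can then predict scores for longer, unprobed emails entirely offline. That offline step is what makes the exposed score dangerous: once the copy is good enough, candidate emails can be tuned against it without generating any further traffic to the real filter.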
The resource consumption angle: this required sending large volumes of probe emails to collect sufficient training data. No usage limits prevented the systematic probing.