The setup: Freysa was an AI agent launched as an adversarial game. Its single core directive: never transfer funds under any circumstances. Users paid a fee (starting at $10, growing to $450 per message) to send it messages and try to get it to release a prize pool of cryptocurrency.
The first 481 attempts failed. Users tried impersonating admins, philosophical arguments, role-play, and social engineering. None worked.
The 482nd attempt: p0pular.eth didn't try to break the rule. They redefined it. First, they claimed to initiate a "new session," framing the message as if all prior instructions had been cleared. Next, they redefined Freysa's "approveTransfer" function, telling the AI that it was meant for incoming transfers, not outgoing ones. Finally, they announced a $100 donation.
The result: Freysa called approveTransfer — believing it was processing an incoming payment. It transferred the entire prize pool: 13.19 ETH, approximately $47,000. Freysa then posted on X: "Humanity has prevailed."
OWASP Mitigation #5: require human approval for high-risk actions. If every fund transfer had required a code-level confirmation gate outside the AI's language reasoning, p0pular.eth's redefinition of "approveTransfer" would have been irrelevant. The gate would have asked a human to confirm, regardless of what the AI believed it was doing.
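As a rough illustration of such a gate, here is a minimal Python sketch. It is not Freysa's actual code: TransferRequest, gated_approve_transfer, console_confirm, and the confirm/execute callbacks are hypothetical names invented for this example, and the real system's interfaces may look quite different. The point it tries to show is that the approval check lives in ordinary code, so a message that changes what the model thinks "approveTransfer" means cannot change whether the check runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TransferRequest:
    """A transfer proposed by the model. All field names are hypothetical."""
    recipient: str
    amount_eth: float
    model_rationale: str  # what the model believes it is doing; shown, never trusted


class ApprovalDenied(Exception):
    """Raised when the human reviewer does not confirm the transfer."""


def gated_approve_transfer(
    request: TransferRequest,
    confirm: Callable[[TransferRequest], bool],
    execute: Callable[[TransferRequest], str],
) -> str:
    """Run `execute` only if a human confirms via `confirm`.

    The check is plain code outside the model's reasoning, so nothing
    said in the conversation can redefine or skip it.
    """
    if not confirm(request):
        raise ApprovalDenied(
            f"transfer of {request.amount_eth} ETH to {request.recipient} was not confirmed"
        )
    return execute(request)


def console_confirm(request: TransferRequest) -> bool:
    """Hypothetical reviewer prompt: shows the concrete effect, not the model's framing."""
    print(f"OUTGOING transfer: {request.amount_eth} ETH -> {request.recipient}")
    print(f"Model's stated reason: {request.model_rationale}")
    return input("Type 'yes' to approve: ").strip().lower() == "yes"
```

In this sketch the model would only ever produce a TransferRequest; the actual sending function is passed in as execute and is reachable only through the gate. The reviewer sees the concrete effect (an outgoing transfer of a stated amount to a stated address) next to the model's rationale, so a mismatch such as "processing an incoming donation" attached to a 13.19 ETH outgoing transfer would be visible at the moment of approval.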