Split-View Poisoning: many web-scale datasets distribute URLs, not the images and text themselves. Buy an expired domain that a dataset still points to, serve your own content there, and everyone who downloads the dataset afterward gets your poison. Poisoning 0.01% of LAION-400M or COYO-700M would have cost roughly $60.
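A toy simulation of the split view, assuming a dictionary stands in for the live web. The URLs, byte strings, and the helpers `download` and `diverged` are all hypothetical. The curator's hash is recorded only to make visible that the curator and a later downloader see different content; this sketch does not claim that LAION-400M or COYO-700M actually ship such hashes.

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical toy "web": URL -> bytes currently served at that URL.
web = {
    "http://expired-domain.example/cat.jpg": b"original cat photo bytes",
    "http://stable-site.example/dog.jpg": b"original dog photo bytes",
}

# Index time: the curator publishes only URLs. We also note the hash of what
# the curator saw, purely so the divergence can be detected below.
index = [(url, sha256(web[url])) for url in web]

# Later: the first domain lapses, an attacker registers it and serves new bytes.
web["http://expired-domain.example/cat.jpg"] = b"attacker-chosen poison bytes"

def download(index, web):
    """Fetch every indexed URL as it exists now, like a downstream user would."""
    return {url: web[url] for url, _ in index}

def diverged(index, downloaded):
    """Return URLs whose downloaded bytes differ from what was indexed."""
    return [url for url, h in index if sha256(downloaded[url]) != h]

downloaded = download(index, web)
print(diverged(index, downloaded))  # ['http://expired-domain.example/cat.jpg']
```

Nothing in the download path is compromised: the downloader faithfully fetches each URL, and the poison arrives simply because the content behind one URL changed after indexing.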
Frontrunning Poisoning: datasets built from Wikipedia are snapshots taken on a predictable schedule. Edit a page just before the snapshot and your malicious version is captured in the training set, even if a moderator reverts it seconds later.
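A minimal sketch of why the timing works, assuming a snapshot simply records whichever revision of a page is live at the snapshot instant. The revision history, timestamps, and the `edit` and `snapshot` helpers are hypothetical; Wikipedia's real dump pipeline is not modeled here.

```python
from bisect import bisect_right

# Hypothetical revision history for one page: (timestamp_seconds, text), sorted by time.
history = [(0, "Accurate article text.")]

SNAPSHOT_TIME = 1_000  # assumed to be known to the attacker in advance

def edit(history, t, text):
    """Record a new revision at time t, keeping the history sorted."""
    history.append((t, text))
    history.sort(key=lambda rev: rev[0])

def snapshot(history, t):
    """Return the revision live at time t (the latest edit at or before t)."""
    times = [ts for ts, _ in history]
    return history[bisect_right(times, t) - 1][1]

# Attacker edits 5 seconds before the snapshot; a moderator reverts 30 seconds later.
edit(history, SNAPSHOT_TIME - 5, "Poisoned article text.")
edit(history, SNAPSHOT_TIME + 25, "Accurate article text.")

print(snapshot(history, SNAPSHOT_TIME))       # Poisoned article text.
print(snapshot(history, SNAPSHOT_TIME + 60))  # Accurate article text.
```

The live page is clean again within a minute, but the snapshot has already frozen the poisoned revision, and that frozen copy is what ends up in the training set.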
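Neither attack requires breaking into anything. Split-view poisoning exploits the gap between when a dataset is indexed and when it is downloaded; frontrunning exploits the gap between when a malicious edit lands and when it is reverted.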