“Use strict sandboxing to limit model exposure to unverified data sources,” and apply anomaly detection and data-filtering techniques to screen out adversarial or poisoned data before training.
Tay had no sandbox between live user input and its learning loop. Anomaly detection over incoming data would have flagged the coordinated surge of toxic content for what it was.
→ Isolate the ingestion of any unverified data source
→ Run statistical anomaly detection over each training batch
→ Filter outliers and suspected adversarial samples before they ever reach training
Inject synthetic outliers into a training batch. If the pipeline ingests them without flagging, your detection layer doesn't exist yet.