LLM09:2025 — Misinformation

Slide 6 · Why It Happens

The root causes are structural, not accidental.

They cannot be fully fixed with a single patch.

Root Causes

Each of the five causes below stems from how LLMs are built and trained.

📅

Training cutoffs

The model’s knowledge is frozen at training time. Anything it says about events after that cutoff is extrapolated from stale data or fabricated outright.

🗃

Knowledge gaps in training data

Training data skews toward common knowledge. Niche domains — specific case law, obscure regulations, specialized medical protocols — are sparsely represented, making hallucination more likely exactly where accuracy matters most.

🎯

No ground-truth lookup mechanism

The base model has no live connection to a database, no way to verify claims, and no mechanism to distinguish remembered training data from interpolated output.

🏋

RLHF rewards fluency, not accuracy

Reinforcement Learning from Human Feedback (RLHF) tunes models to maximize human preference ratings, usually through a reward model trained on those ratings. Fluent, helpful-sounding answers often score higher than hedged, uncertain ones, even when the hedged answer is more honest.
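A common formulation makes this incentive explicit. It is the pairwise reward-model setup described for InstructGPT-style RLHF; individual vendors' pipelines may differ. Raters pick a preferred answer y_w over a rejected answer y_l for a prompt x. A reward model r_φ is fit to those choices, and the chat model π_θ is then tuned to maximize that reward while staying close to a reference model π_ref (β controls how far it may drift):

```latex
\mathcal{L}_{\text{RM}}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim D}\Big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\Big]

\max_{\theta}\; \mathbb{E}_{x\sim D,\; y\sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot \mid x)\,\|\,\pi_{\text{ref}}(\cdot \mid x)\big)
```

Neither objective checks factual accuracy. If raters systematically prefer the confident answer over the hedged one, r_φ learns to reward confidence, and π_θ learns to produce it.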

📐

Poor confidence calibration

The model has no reliable internal signal for “I don’t know this.” Its expressed confidence often does not track its actual accuracy.
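One standard way to quantify this gap is expected calibration error (ECE): group answers by stated confidence, then take the size-weighted average of |mean confidence − accuracy| across the groups. The sketch below is a minimal illustration, assuming you already have, for each answer, a confidence in [0, 1] and a correctness label from some external check. The function name and the equal-width binning scheme are illustrative choices, not part of any specific evaluation tool.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Size-weighted mean of |avg confidence - accuracy| over equal-width bins.

    confidences: stated confidence per answer, each in [0, 1]
    correct:     bool per answer, True if the answer was verified correct
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence {conf} outside [0, 1]")
        # Confidence 1.0 goes into the last bin instead of overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's gap by the fraction of answers it holds.
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical model: always claims 90% confidence, right only half the time.
print(round(expected_calibration_error([0.9] * 4, [True, False, True, False]), 3))
```

A well-calibrated model scores near 0. The hypothetical overconfident model above scores 0.4: its stated confidence overshoots its real accuracy by 40 points.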
