LLM09:2025 — Misinformation

Slide 8 · The Misconception

The model will not always say when it doesn’t know.



“The model will tell me when it doesn’t know something.”

This is the most dangerous assumption users bring to LLMs — and it is wrong often enough to matter.

Why Models Don’t Always Hedge

RLHF-trained models are optimized against a reward signal built from human preference ratings, and evaluators tend to rate helpful, confident answers above uncertain, hedged ones. Over the course of training, this creates pressure toward confident-sounding responses even when the model has low internal certainty.

In Mata v. Avianca, when counsel asked ChatGPT to confirm that the cited cases were real, it reaffirmed them and generated additional fabricated detail rather than admitting it had invented them.

❌ What Users Assume

Model hedges when uncertain: “I’m not sure about this”

False outputs are rare edge cases

Asking for confirmation catches errors

✅ What’s Actually True

Confidence is a style, not a signal of accuracy

Hallucination is routine, especially in niche domains

Asking the model to confirm can produce more hallucination, not a correction

The Takeaway

Verification is the user’s job, not the model’s. If you are relying on the model to catch its own errors, that control does not exist.
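As an illustration of what a real verification control looks like, here is a minimal sketch that checks citations in model output against an independent source instead of asking the model whether they are real. It assumes a hypothetical `TRUSTED_INDEX` standing in for an authoritative database you query directly, a deliberately narrow citation regex, and hypothetical helper names (`verify_citations`, `VerificationResult`); none of these come from any specific product or library.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL stand-in for an authoritative source (e.g. a court records
# database queried directly). The point: the check never consults the model.
TRUSTED_INDEX: set[str] = {
    "100 F.3d 200",
    "300 U.S. 400",
}

# Matches simple reporter citations such as "100 F.3d 200" or "300 U.S. 400".
# Deliberately narrow (an assumption for this sketch); real citation formats vary far more.
CITATION_RE = re.compile(r"\b\d{1,4} (?:U\.S\.|F\.(?:2d|3d|4th)?) \d{1,5}\b")


@dataclass
class VerificationResult:
    verified: list[str]
    unverified: list[str]


def verify_citations(model_output: str, trusted_index: set[str]) -> VerificationResult:
    """Check each citation found in model output against an independent index.

    The model's confidence, or its answer to "is this case real?", is never used.
    An empty `unverified` list only means no *detected* citation failed: the
    regex can miss citations entirely, so this is a gate, not proof of accuracy.
    """
    # All groups are non-capturing, so findall returns the full matched citations.
    found = CITATION_RE.findall(model_output)
    # Deduplicate while preserving first-seen order.
    unique = list(dict.fromkeys(found))
    verified = [c for c in unique if c in trusted_index]
    unverified = [c for c in unique if c not in trusted_index]
    return VerificationResult(verified, unverified)


if __name__ == "__main__":
    # Hypothetical draft produced by a model; case names and citations are invented.
    draft = (
        "See Smith v. Example Corp., 100 F.3d 200, and "
        "Doe v. Placeholder Airlines, 999 F.3d 1234."
    )
    result = verify_citations(draft, TRUSTED_INDEX)
    print(result.unverified)  # ['999 F.3d 1234']
```

Anything in `unverified` is held back for a human to look up in the primary source; the model's own reassurance is never treated as confirmation.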
