Slide 8 of 27
Part 1 · What Is It?
Slide 8 · The Misconception
The model will not always say when it doesn’t know.
The Misconception

“The model will tell me when it doesn’t know something.”

This is the most dangerous assumption users bring to LLMs — and it is wrong often enough to matter.

Why Models Don’t Always Hedge

RLHF-trained models learn that helpful, confident answers score higher with human evaluators. Uncertain, hedged answers score lower. Over millions of training steps, this creates pressure toward confident-sounding responses even when the model has low internal certainty.

Mata v. Avianca shows the pattern. When the attorneys asked ChatGPT to confirm that the cases it had cited were real, it reaffirmed them, generating additional fabricated detail rather than admitting it had invented them.

❌ What Users Assume
Model hedges when uncertain: “I’m not sure about this”
False outputs are rare edge cases
Asking for confirmation catches errors
✅ What’s Actually True
Confidence is a style, not a signal of accuracy
Hallucination is routine, especially in niche domains
Asking the model to confirm often produces more hallucination
The Takeaway

Verification is the user’s job, not the model’s. If you are relying on the model to catch its own errors, that control does not exist.
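
What an external control might look like, as a minimal sketch: each model-supplied citation is checked against an index the model did not generate, and anything not found is flagged for human review. The `Citation` structure, the `split_by_verification` helper, and the placeholder data are all hypothetical, and a real check would query an authoritative database instead of an in-memory set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """A citation as produced by the model (hypothetical structure)."""
    name: str
    reporter: str  # volume, reporter, and page as one string


def split_by_verification(citations, trusted_index):
    """Separate citations found in an independent source from those that are not.

    trusted_index is a set of reporter strings loaded from a source the model
    did not generate (assumed here to be an export from a legal database).
    """
    verified, unverified = [], []
    for citation in citations:
        if citation.reporter in trusted_index:
            verified.append(citation)
        else:
            unverified.append(citation)
    return verified, unverified


# Placeholder data only; none of these are real cases.
trusted_index = {"100 Ex. 1"}
model_output = [
    Citation("Alpha v. Beta", "100 Ex. 1"),
    Citation("Gamma v. Delta", "200 Ex. 2"),
]

verified, unverified = split_by_verification(model_output, trusted_index)
for citation in unverified:
    # Not found in the independent source: treat as unverified, not as confirmed false.
    print(f"NEEDS HUMAN REVIEW: {citation.name}, {citation.reporter}")
```

The check never asks the model anything, so a confident reaffirmation cannot pass it. A citation missing from the index is only unverified; a person still has to look it up.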
