“Encourage the LLM to self-disclose uncertainty. When LLMs are not sure about a statement’s truth, have them disclose this.” OWASP also calls for “designing LLMs to refuse to answer questions outside their training scope.”
In Mata v. Avianca: the attorneys explicitly asked ChatGPT to confirm the cases were real. The model reaffirmed them — generating additional fabricated detail rather than expressing uncertainty. If the model had been tuned to respond “I cannot verify this case exists in any database I have access to,” the filing would not have been made. The absence of an uncertainty signal was the critical gap.
→ Use system prompts that explicitly instruct the model to hedge: “If you are not certain of a fact, say so. Respond with ‘I’m not confident about this — please verify independently’ rather than stating uncertain claims as fact.”
→ For domain-specific deployments: fine-tune or use RLHF on examples that reward appropriate uncertainty expression
→ Build UX elements that surface confidence signals: “This answer is based on retrieved documents” vs. “This answer could not be verified against a source”
Ask the model something it definitely cannot know: a very recent event after its training cutoff, a fictitious person’s biography, a made-up regulation. Does it hedge appropriately? Or does it fabricate an answer? If it fabricates, uncertainty disclosure is not working.