Hallucination detection is the most underrated skill in AI training evaluation. Models that hallucinate confidently look exactly like models that reason confidently, at least until you know what to look for. Here are the patterns experienced evaluators use.
What hallucination actually is
A hallucination is a confident model output that isn't supported by the source material, the model's actual knowledge, or sound logical inference. The distinction that matters is between hallucination and inference: inference is a reasonable extension of available facts; hallucination is fabrication that merely sounds reasonable.
Pattern 1: confident specificity without anchor
Models hallucinate most often when they produce specific claims (dates, numbers, names, citations) that aren't grounded in the source.
Example: Asked to summarize a 2023 paper, the model says "Smith et al. (2021) showed a 14.7% improvement..." If neither "Smith et al. 2021" nor "14.7%" appears in the source, that's a hallucination, even if a similar paper exists.
Detection rule: Specific facts that aren't traceable to the source are hallucinations until proven otherwise.
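Part of this check can be automated. Below is a minimal sketch in Python (the regexes and the function name are illustrative choices, not a standard tool) that pulls citation strings, percentages, and years out of a model summary and flags any that never appear verbatim in the source. Verbatim matching misses paraphrase and unit conversion, so treat the output as candidates to investigate, not verdicts.

```python
import re

# Heuristic patterns for "confident specifics": citations, percentages, years.
SPECIFIC_PATTERNS = [
    r"[A-Z][a-z]+ et al\.?\s*\(?\d{4}\)?",  # e.g. "Smith et al. (2021)"
    r"\d+(?:\.\d+)?%",                      # e.g. "14.7%"
    r"\b(19|20)\d{2}\b",                    # four-digit years
]

def find_unanchored_specifics(summary: str, source: str) -> list[str]:
    """Return specific claims in `summary` with no literal match in `source`.

    Anything returned is a hallucination *candidate*: it still needs a human
    check, since paraphrase or rounding can hide a legitimate anchor.
    """
    suspects = []
    for pattern in SPECIFIC_PATTERNS:
        for match in re.finditer(pattern, summary):
            token = match.group(0)
            if token not in source:
                suspects.append(token)
    return suspects

source = "Our 2023 study found a 12.1% improvement over the baseline."
summary = "Smith et al. (2021) showed a 14.7% improvement."
print(find_unanchored_specifics(summary, source))
# ['Smith et al. (2021)', '14.7%', '2021'] -- none of these trace to the source
```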
Pattern 2: bridging gaps with plausible content
When models lack information for a step in a multi-step reasoning chain, they often fill the gap with plausible-sounding content rather than acknowledging uncertainty.
Example: A math reasoning task asks about a theorem the model doesn't know well. Instead of saying "I'm not sure about the standard form of this theorem," the model produces a confident-sounding but wrong statement.
Detection rule: Models rarely say "I don't know" unprompted. Confident statements about niche or specialized topics carry a higher hallucination risk than confident statements about common topics.
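One cheap triage signal follows directly from this rule: on niche-topic prompts, a response with no uncertainty marker at all deserves extra scrutiny. The phrase list below is an illustrative starting point, not an exhaustive lexicon.

```python
# Uncertainty markers to scan for. Their absence in a response about a niche
# topic doesn't prove hallucination, but it should raise your prior.
UNCERTAINTY_MARKERS = [
    "i'm not sure", "i am not sure", "i don't know", "i do not know",
    "to my knowledge", "i may be wrong",
]

def expresses_uncertainty(response: str) -> bool:
    """Return True if the response contains an explicit uncertainty marker."""
    lowered = response.lower()
    return any(marker in lowered for marker in UNCERTAINTY_MARKERS)

# A confidently wrong claim (bounded sequences need not converge) vs. a hedge.
confident = "The theorem states that every bounded sequence converges."
hedged = "I'm not sure about the standard form of this theorem."
print(expresses_uncertainty(confident))  # False -> scrutinize the claim
print(expresses_uncertainty(hedged))     # True
```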
Pattern 3: pattern-completion without verification
Models trained on massive code corpora often "complete" patterns they've seen rather than reason about the specific code at hand.
Example: A code task asks about a custom library function. The model describes what a similarly-named function in a different library does, treating them as the same.
Detection rule: When a model's response would fit a similarly named but different entity just as well, suspect pattern-completion hallucination.
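When you suspect conflation, it helps to check which well-known names sit close to the entity in question. Here's a small sketch using Python's difflib; the function names in the example are invented for illustration.

```python
import difflib

def conflation_candidates(name: str, known_names: list[str],
                          cutoff: float = 0.6) -> list[str]:
    """Return known names similar enough to `name` to invite conflation.

    A close match is a hint, not proof: it tells you where to look when a
    model's description fits a lookalike better than the actual entity.
    """
    return difflib.get_close_matches(name, known_names, n=3, cutoff=cutoff)

# Hypothetical example: a custom helper vs. lookalikes from popular libraries.
custom_function = "parse_datetime"
well_known = ["parse_date", "parse_datetimes", "strptime", "fromisoformat"]
print(conflation_candidates(custom_function, well_known))
# ['parse_datetimes', 'parse_date'] -- plausible sources of pattern-completion
```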
Pattern 4: subtle hedging that masks fabrication
Sophisticated models have learned to hedge confidently fabricated claims with phrases like "it's generally accepted that" or "research suggests." This makes hallucinations look like cautious statements.
Example: "Recent research suggests that compound X has potential cancer-fighting properties." If no such research exists, the hedging doesn't make it true — it just makes it a hedge-wrapped hallucination.
Detection rule: Hedging without citation is suspicious. Real hedged claims usually come with at least a vague citation ("a 2022 review in Nature").
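This rule also lends itself to a cheap first pass: scan for hedging phrases, then check whether anything citation-shaped appears in the same sentence. The phrase list and the citation regex below are rough heuristics to tune per project, not a validated classifier.

```python
import re

HEDGE_PHRASES = [
    "research suggests", "it's generally accepted", "studies have shown",
    "it is believed", "experts agree",
]
# Anything citation-shaped: a year, "et al.", or a named venue.
CITATION_HINT = re.compile(r"\b(19|20)\d{2}\b|et al\.|\b(Nature|Science|arXiv)\b")

def uncited_hedges(text: str) -> list[str]:
    """Return sentences that hedge without any citation-shaped anchor."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if any(h in lowered for h in HEDGE_PHRASES) and not CITATION_HINT.search(sentence):
            flagged.append(sentence)
    return flagged

text = ("Recent research suggests that compound X fights cancer. "
        "A 2022 review in Nature found mixed results.")
print(uncited_hedges(text))
# ['Recent research suggests that compound X fights cancer.']
```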
Pattern 5: contextual truth, global falsity
Models sometimes produce statements that are true within the immediate context (consistent with the prompt and prior turns) but false in absolute terms.
Example: The conversation establishes that the user is asking about Company X's products. The model produces a confident statement about Company X's flagship product, but the actual flagship is a different product.
Detection rule: Contextual consistency is necessary but not sufficient. Verify against external knowledge.
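The verification step here is inherently a lookup outside the conversation. The sketch below mimics that with a toy fact table; in real evaluation the external check is a search or a subject-matter expert, and the company and product names are invented.

```python
# Toy external knowledge base; in real evaluation this is a search or an SME.
EXTERNAL_FACTS = {
    ("Company X", "flagship product"): "WidgetPro",
}

def check_against_external(entity: str, attribute: str, claimed: str) -> str:
    """Compare a contextually-consistent claim against external knowledge."""
    actual = EXTERNAL_FACTS.get((entity, attribute))
    if actual is None:
        return "unverifiable: escalate to a domain expert"
    return "consistent" if actual == claimed else f"globally false: actual is {actual!r}"

# The claim fits the conversation (it's about Company X) but is still wrong.
print(check_against_external("Company X", "flagship product", "WidgetLite"))
# globally false: actual is 'WidgetPro'
```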
How to verify suspected hallucinations
- Trace specifics to the source. Quote the exact passage that supports the claim (a matching sketch follows this list).
- Search for citations. Real papers, real names, real numbers should be findable.
- Check pattern-completion candidates. Search for similarly-named entities that the model might be conflating.
- Apply domain expertise. Subject-matter knowledge beats general fact-checking for niche topics.
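For the tracing step, a similarity search over source sentences can speed up finding (or failing to find) the supporting passage. A minimal sketch, again using difflib and assuming sentence-level granularity is enough:

```python
import difflib
import re

def best_supporting_passage(claim: str, source: str) -> tuple[str, float]:
    """Return the source sentence most similar to `claim`, with its ratio.

    A low best ratio means no passage comes close to supporting the claim,
    which is exactly the gap to quote in your flag.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", source) if s]
    scored = [(s, difflib.SequenceMatcher(None, claim.lower(), s.lower()).ratio())
              for s in sentences]
    return max(scored, key=lambda pair: pair[1])

source = ("The study evaluated three summarization models. "
          "Accuracy improved by 12.1% over the 2022 baseline.")
claim = "Smith et al. (2021) showed a 14.7% improvement."
passage, score = best_supporting_passage(claim, source)
print(f"best match ({score:.2f}): {passage}")
# A low score means there is no supporting passage to quote.
```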
What to write when you flag a hallucination
Strong flag pattern (a structured version follows the list):
- Identify the specific hallucinated claim. Quote the model's exact words.
- Explain why it's hallucinated. "The source contains no mention of [X]" or "Smith 2021 doesn't exist; Smith 2018 covers a different topic."
- Distinguish from inference. Note explicitly that this isn't a reasonable extension; it's fabrication.
- Estimate severity. Hallucinated facts in safety-critical domains (medical, legal) are higher severity than hallucinated stylistic detail.
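If your platform allows structured notes, the four parts map naturally onto a small template. A sketch, with field names that are illustrative rather than any platform's actual schema:

```python
from dataclasses import dataclass

@dataclass
class HallucinationFlag:
    """One flagged claim, mirroring the four-part pattern above."""
    quoted_claim: str           # the model's exact words
    why_hallucinated: str       # what the source / external check shows
    not_inference_because: str  # why this isn't a reasonable extension
    severity: str               # e.g. "high" for medical/legal, "low" for style

    def render(self) -> str:
        return (f'Claim: "{self.quoted_claim}"\n'
                f"Why hallucinated: {self.why_hallucinated}\n"
                f"Not inference: {self.not_inference_because}\n"
                f"Severity: {self.severity}")

flag = HallucinationFlag(
    quoted_claim="Smith et al. (2021) showed a 14.7% improvement",
    why_hallucinated="The source contains no Smith citation and no 14.7% figure.",
    not_inference_because="No available fact extends to this specific number.",
    severity="medium",
)
print(flag.render())
```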
Bottom line
Hallucination detection is a learnable skill, not innate intuition. The five patterns above (confident specificity without anchor, gap-bridging, pattern-completion, hedge-wrapped fabrication, contextual-but-globally-false claims) cover most real-world cases. Senior-tier evaluators flag these consistently and document them clearly. Building this skill takes 30–60 days of deliberate practice; it's worth it.