Hallucination detection patterns AI evaluators should know.

How to spot model hallucinations across code, factual, and reasoning tasks. The signal patterns that distinguish hallucination from valid inference.

Hallucination detection is the most underrated skill in AI training evaluation. Models that hallucinate confidently look exactly like models that reason confidently — until you know what to look for. Here are the patterns experienced evaluators use.

What hallucination actually is

A hallucination is a confident model output that isn't supported by the source material, the model's actual knowledge, or sound logical inference. The distinction that matters is between hallucination and inference: inference is a reasonable extension of available facts; hallucination is fabrication that sounds reasonable.

Pattern 1: confident specificity without anchor

Models hallucinate most often when they produce specific claims (dates, numbers, names, citations) that aren't grounded in the source.

Example: Asked to summarize a 2023 paper, the model says "Smith et al. (2021) showed a 14.7% improvement..." If neither "Smith et al. 2021" nor "14.7%" appears in the source, that's hallucination — even if a similar paper exists.

Detection rule: Specific facts that aren't traceable to the source are hallucinations until proven otherwise.
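
If you review at volume, a first triage pass can be automated before the close read. Below is a minimal sketch in Python, assuming a plain-text source and response; the extract_specifics helper and its regexes are illustrative assumptions, not a tool any platform ships.

```python
import re

def extract_specifics(text: str) -> set[str]:
    """Pull out high-risk specifics: years, percentages, and 'Name et al.' citations.
    The regexes are illustrative, not exhaustive."""
    years = re.findall(r"\b(?:19|20)\d{2}\b", text)
    percents = re.findall(r"\b\d+(?:\.\d+)?%", text)
    citations = re.findall(r"\b[A-Z][a-z]+ et al\.", text)
    return set(years + percents + citations)

def unanchored_specifics(response: str, source: str) -> set[str]:
    """Specifics that appear in the model's response but nowhere in the source text.
    Each one is a hallucination candidate until it is traced or verified."""
    return {s for s in extract_specifics(response) if s not in source}

# The fabricated citation and number from the paper-summary example above:
source = "We evaluate our method on three public benchmarks and report accuracy gains."
response = "Smith et al. (2021) showed a 14.7% improvement on all three benchmarks."
print(unanchored_specifics(response, source))
# All three specifics are flagged: {'Smith et al.', '2021', '14.7%'} (set order varies)
```

Anything the screen flags still needs manual tracing; anything it misses (paraphrased numbers, unnamed studies) still needs your judgment.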

Pattern 2: bridging gaps with plausible content

When models lack information for a step in a multi-step reasoning chain, they often fill the gap with plausible-sounding content rather than acknowledging uncertainty.

Example: Math reasoning task asks about a theorem the model doesn't know well. Instead of saying "I'm not sure about the standard form of this theorem," the model produces a confident-sounding but wrong statement.

Detection rule: Models rarely say "I don't know" without prompting. Confident statements about niche or specialized topics carry higher hallucination risk than confident statements about common topics.

Pattern 3: pattern-completion without verification

Models trained on massive code corpora often "complete" patterns they've seen rather than reason about the specific code at hand.

Example: A code task asks about a custom library function. The model describes what a similarly named function in a different library does, treating the two as the same.

Detection rule: When a model's response would apply to a different but similar object, suspect pattern-completion hallucination.
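
One practical check is to fuzzy-match the names the model uses against the names that actually exist in the codebase or library docs. A rough sketch, assuming you have the real names on hand; every identifier below is invented for illustration.

```python
import difflib

def conflation_candidates(mentioned: list[str], actual: list[str]) -> dict[str, list[str]]:
    """For each name the model used that doesn't exist in the real API,
    list the real names it most resembles (likely pattern-completion sources)."""
    return {
        name: difflib.get_close_matches(name, actual, n=3, cutoff=0.5)
        for name in mentioned
        if name not in actual
    }

# Hypothetical project API and the names the model actually talked about.
actual_api = ["load_csv_strict", "load_table", "validate_schema"]
mentioned = ["read_csv", "load_csv"]
print(conflation_candidates(mentioned, actual_api))
# Both invented names map to close real names such as 'load_csv_strict':
# a strong hint the model is describing a similarly named function, not this one.
```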

Pattern 4: subtle hedging that masks fabrication

Sophisticated models have learned to hedge confidently fabricated claims with phrases like "it's generally accepted that" or "research suggests." This makes hallucinations look like cautious statements.

Example: "Recent research suggests that compound X has potential cancer-fighting properties." If no such research exists, the hedging doesn't make it true — it just makes it a hedge-wrapped hallucination.

Detection rule: Hedging without citation is suspicious. Real hedged claims usually come with at least a vague citation ("a 2022 review in Nature").
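
You can screen for this mechanically by flagging hedge phrases that appear in sentences with nothing resembling a citation. A rough sketch; the hedge list and citation pattern are assumptions, and anything flagged still needs human verification.

```python
import re

# Illustrative hedge phrases and a very rough proxy for "some kind of citation".
HEDGES = ["research suggests", "studies show", "it's generally accepted", "experts agree"]
CITATION = re.compile(r"\b(?:19|20)\d{2}\b|\bet al\.|\breview in\b", re.IGNORECASE)

def hedged_without_citation(text: str) -> list[str]:
    """Sentences that use a hedge phrase but contain nothing resembling a citation."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [
        s for s in sentences
        if any(h in s.lower() for h in HEDGES) and not CITATION.search(s)
    ]

# The hedge-wrapped fabrication gets flagged; the claim that at least gestures at a
# citation passes this screen and goes to manual source-checking instead.
text = ("Recent research suggests that compound X has potential cancer-fighting properties. "
        "Research suggests, per a 2022 review in Nature, that compound Y is well tolerated.")
print(hedged_without_citation(text))
```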

Pattern 5: contextual truth, global falsity

Models sometimes produce statements that are true within the immediate context (consistent with the prompt and prior turns) but false in absolute terms.

Example: The conversation establishes that the user is asking about Company X's products. The model then makes a confident claim about Company X's "flagship product," but the product it names is not actually the flagship.

Detection rule: Contextual consistency is necessary but not sufficient. Verify against external knowledge.

How to verify suspected hallucinations

  1. Trace specifics to the source. Quote the exact passage that supports the claim.
  2. Search for citations. Real papers, real names, real numbers should be findable.
  3. Check pattern-completion candidates. Search for similarly named entities that the model might be conflating.
  4. Apply domain expertise. Subject-matter knowledge beats general fact-checking for niche topics.

What to write when you flag a hallucination

Strong flag pattern:

  1. Identify the specific hallucinated claim. Quote the model's exact words.
  2. Explain why it's hallucinated. "The source contains no mention of [X]" or "Smith 2021 doesn't exist; Smith 2018 covers a different topic."
  3. Distinguish from inference. Note explicitly that this isn't a reasonable extension — it's fabrication.
  4. Estimate severity. Hallucinated facts in safety-critical domains (medical, legal) are higher severity than hallucinated stylistic detail.

Bottom line

Hallucination detection is a learnable skill, not innate intuition. The five patterns above (confident specificity without anchor, gap-bridging, pattern-completion, hedge-wrapped fabrication, contextual-but-globally-false claims) cover most real-world cases. Senior-tier evaluators flag these consistently and document them clearly. Building this skill takes 30–60 days of deliberate practice; it's worth it.

Frequently asked questions

What is an AI hallucination?
A hallucination is a confident model output that isn't supported by the source material, the model's actual knowledge, or sound logical inference. The key distinction is from inference, which is reasonable extension of facts.
How do AI evaluators detect hallucinations?
Five main patterns: confident specifics without a source anchor, gap-bridging with plausible content, pattern-completion (treating similar objects as the same), hedge-wrapped fabrication, and contextual-but-globally-false claims.
Why is hallucination detection important in AI training?
It's one of the highest-value skills evaluators provide. Catching hallucinations during evaluation is what teaches models not to produce them. Senior-tier evaluators consistently identify hallucinations that other raters miss.
How do I improve at hallucination detection?
Practice tracing specific claims to source material, searching for cited references, and applying domain expertise. Most evaluators see meaningful improvement after 30–60 days of deliberate practice.