Periagoge

Hallucination in Genealogy Records and How to Spot It

AI systems can confidently assert false details when their training data contains errors or when multiple sources point in different directions. Genealogy records are particularly prone to such mistakes because they were copied, reinterpreted, and transcribed across centuries. Learning to catch these hallucinations means checking original documents, looking for internal consistency, and being skeptical of details that appear nowhere but in AI-generated summaries.

Hypatia
Why It Matters

Hallucination in AI genealogy contexts means that the system confidently generates genealogical information (names, dates, relationships, places) that doesn't exist in the source material or historical record. It's not random noise; it's plausible-sounding fabrication. An AI might extract a census record correctly, then "fill in" a missing middle initial with a guess that sounds historically appropriate, or infer a relationship that was never stated in the document.

This is particularly dangerous in genealogy because hallucinations are often invisibly embedded within otherwise accurate outputs. You might ask Claude to extract names from a 1910 census image, receive 15 correct entries and one fabricated household member, and not notice until you've already spent hours pursuing that phantom relative through secondary sources.

Why Genealogy Triggers Hallucination

LLMs (Large Language Models) are pattern-matching systems trained on vast text datasets. Genealogical data is highly structured and repetitive—names, dates, places, relationships follow predictable patterns. When the AI encounters incomplete or ambiguous information (a smudged date, an unclear initial), its training pushes it toward pattern completion. It "knows" that someone born around 1875 would typically be around 35 on a 1910 census, so if a date is unreadable, the model fills it in plausibly.

Second-order hallucinations compound the problem. Suppose you ask an AI to "infer possible siblings based on this household census data." The model generates suggestions, but those suggestions are educated guesses, not verified relationships. If you treat them as established facts rather than hypotheses, you'll waste research time chasing false branches.

Detection Strategies

The Source Isolation Test: Ask the AI to quote the exact source material, then extract facts. If the AI extracts a detail that isn't in the quote, it's hallucinating. Example: "Here's the OCR text from the 1900 census page: [text]. What is the birthplace of John Smith listed here?" If the AI answers "Germany" but the text says "German" (nationality, not birthplace), it's filled in context from outside the source.
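Part of this test can be automated once you have the source text and the AI's answers side by side. The following is a minimal sketch, not a complete tool: the function names, the sample OCR line, and the extracted fields are all hypothetical, and it assumes a detail only counts as supported if it appears as whole words in the source text.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR spacing differences do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def unsupported_fields(source_text: str, extracted: dict[str, str]) -> list[str]:
    """Return the field names whose value does not appear as whole words in the source."""
    haystack = normalize(source_text)
    missing = []
    for field, value in extracted.items():
        # Word boundaries keep "Germany" from matching a source that only says "German".
        pattern = r"\b" + re.escape(normalize(value)) + r"\b"
        if not re.search(pattern, haystack):
            missing.append(field)
    return missing


# Hypothetical OCR line and hypothetical AI answers, for illustration only.
ocr_text = "Smith, John  Head  W  M  Mar 1862  38  German  Farmer"
claimed = {"name": "John", "birthplace": "Germany", "occupation": "Farmer"}

print(unsupported_fields(ocr_text, claimed))  # ['birthplace']
```

A flagged field is not automatically wrong; it only means the value cannot be found in the quoted text, so it came from somewhere else and needs a manual look.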

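The Contradiction Audit: Feed the same source document to two different AI models (ChatGPT and Claude, for instance) and compare their outputs on specific details. Fabricated details rarely coincide across independent models, while accurate readings usually match. Agreement is not proof, though, since two models can share the same misreading, so treat every disagreement as a detail to check against the original.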
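If both tools return a list of household members, the comparison can be done mechanically. This is a rough sketch under simple assumptions: the names are hypothetical, you have already copied each model's output into a list by hand, and matching is by exact name after lowercasing (spelling variants would need fuzzier matching).

```python
def compare_households(model_a: list[str], model_b: list[str]) -> dict[str, list[str]]:
    """Split names into those both models reported and those only one reported."""
    a = {name.strip().lower() for name in model_a}
    b = {name.strip().lower() for name in model_b}
    return {
        "agreed": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Hypothetical transcriptions of the same census household from two tools.
from_model_a = ["Mary Miller", "Jacob Miller", "Anna Miller", "Henry Miller"]
from_model_b = ["Mary Miller", "Jacob Miller", "Anna Miller"]

report = compare_households(from_model_a, from_model_b)
print(report["only_a"])  # ['henry miller']
```

Names that appear in only one list are the ones to check against the original image first; either one model invented a person or the other missed one.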

The Consistency Test: Ask the AI to extract the same information twice from the same source, using slightly different phrasing each time. If the answers differ, at least one of them is unreliable, and that detail should be checked against the original document.
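With more than two passes, it helps to tally the answers per field. The sketch below assumes you have recorded each pass as a dictionary of field names to values; the function name and the sample data are hypothetical.

```python
from collections import Counter


def unstable_fields(passes: list[dict[str, str]]) -> dict[str, Counter]:
    """Return fields whose value changed between passes, with a count of each answer."""
    fields = set().union(*(p.keys() for p in passes))
    result = {}
    for field in sorted(fields):
        # A field missing from a pass is counted as its own answer.
        answers = Counter(p.get(field, "<missing>") for p in passes)
        if len(answers) > 1:
            result[field] = answers
    return result


# Three hypothetical extraction passes over the same record.
passes = [
    {"name": "Eliza J. Carter", "birth_month": "Mar"},
    {"name": "Eliza T. Carter", "birth_month": "Mar"},
    {"name": "Eliza Carter", "birth_month": "Mar"},
]

for field, answers in unstable_fields(passes).items():
    print(field, dict(answers))
```

Here only the name is unstable, and the variation sits in the middle initial. A stable field is still not verified; it only means the model was consistent.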

Genealogy-Specific Vulnerability Areas

Handwritten middle initials are frequently hallucinated because models trained on printed text make poor predictions about cursive ambiguity. Family relationships inferred from fragmentary evidence ("She lived with the Miller family, so she was probably a relative") are hallucinated with high confidence. Dates with partial illegibility trigger pattern-completion hallucinations—the model fills in the month or day based on other records it's seen. Occupations, especially archaic or region-specific ones, are frequently garbled or replaced with modern equivalents.

The safest genealogy workflow with AI treats the system as a research assistant, not an oracle. Use AI to accelerate document processing, generate research hypotheses, and identify patterns across documents. But verify every genealogical claim—names, dates, relationships—against primary sources before building your family tree on it.
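One way to enforce this rule in your own notes is to record every AI-derived claim as unverified until a primary-source citation is attached. The sketch below is only an illustration of that bookkeeping; the `Claim` structure, its field names, and the sample entries are assumptions, not part of any genealogy software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    person: str
    fact: str
    origin: str                           # e.g. "AI extraction" or "AI inference"
    primary_source: Optional[str] = None  # citation, filled in only after checking

    @property
    def verified(self) -> bool:
        """A claim counts as verified only once a primary source is recorded."""
        return self.primary_source is not None


def ready_for_tree(claims: list[Claim]) -> list[Claim]:
    """Keep only the claims that have been checked against a primary source."""
    return [c for c in claims if c.verified]


# Hypothetical research notes.
claims = [
    Claim("Anna Miller", "born about 1875", "AI extraction",
          primary_source="1910 census image, checked by hand"),
    Claim("Henry Miller", "brother of Anna Miller", "AI inference"),
]

for claim in ready_for_tree(claims):
    print(claim.person, "-", claim.fact)
```

The inferred sibling stays out of the tree until someone attaches a source to it, which keeps hypotheses and verified facts visibly separate.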

Try this: Take a complex genealogy question ("What was the family structure in this 1860 census household?") and ask three different AI tools separately. Document where their answers diverge. The divergences are hallucination zones. Cross-check those specific details in FamilySearch or the original image to understand where the AI failed, so you know what to scrutinize going forward.
