Modern vision and handwriting recognition models can decode deteriorated, cramped, or archaic handwriting in original records well enough to give you a usable transcription that you then verify by eye against the image. This speeds research dramatically: instead of spending hours puzzling out a clerk's handwriting, you get a starting point that is often 85-95% accurate and that a human can quickly correct and confirm.
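To see where a particular transcription falls relative to that range, you can measure it once you have corrected it by eye. The sketch below is one common way to do this, not a method the models themselves use: it counts the character edits needed to turn the model's reading into your corrected reading (Levenshtein distance) and converts that into an accuracy figure. The sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # drop char_a
                current[j - 1] + 1,      # add char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_accuracy(model_text: str, corrected_text: str) -> float:
    """Share of the corrected text the model got right, from 0.0 to 1.0."""
    if not corrected_text:
        return 1.0 if not model_text else 0.0
    errors = edit_distance(model_text, corrected_text)
    return max(0.0, 1.0 - errors / len(corrected_text))


# Hypothetical example: the model's reading versus your verified reading.
model_text = "Jonh Smith, laborer, aged 34"
corrected_text = "John Smith, labourer, aged 34"
print(f"{character_accuracy(model_text, corrected_text):.1%}")
```

Measuring a few pages this way tells you how much correction a given record type typically needs before you commit to transcribing a whole volume.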
Vision models are neural networks specifically trained to interpret visual information—they excel at recognizing patterns in images that would confuse text-only systems. In genealogy, vision models (like those powering ChatGPT's and Claude's image analysis) handle the aspects of historical document analysis that pure OCR struggles with: they recognize that a faded letter is probably an "S," not "8," because of contextual patterns. They understand that a crossed-out name followed by a substitution represents a correction, not an error. They can parse cursive handwriting where spacing and letter connections vary dramatically.
The distinction between vision models and traditional OCR is architectural. OCR is often rule-based or shallowly learned—if a pixel pattern matches a training example of the letter "A," it's classified as "A." Vision models use deep learning to understand higher-level patterns: they see a word that starts with "J" in cursive, followed by shapes consistent with "ohn," in a census field labeled "Name," and reason that "John" is the intended reading even if the handwriting is ambiguous.
Modern vision models trained on diverse datasets—including historical manuscripts, scanned documents, and synthetic handwriting variations—perform surprisingly well on genealogy documents. They leverage multiple signal types: (1) letterforms and spacing patterns, (2) the field context (knowing a name field rarely contains numbers), (3) common genealogical names and locations, (4) temporal and geographic likelihood (certain names dominate certain eras and regions).
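The field-context signal, item (2) above, can also be applied on your side as a check on whatever the model returns. The sketch below is a deliberately simple illustration: the field names and rules are hypothetical, and real census forms vary by year and country.

```python
import re

# Hypothetical expectations for one transcribed census row.
FIELD_RULES = {
    "name": re.compile(r"\D+"),        # a name field rarely contains digits
    "age": re.compile(r"\d{1,3}"),     # an age is a short run of digits
    "birthplace": re.compile(r"\D+"),  # place names rarely contain digits
}


def implausible_fields(row: dict[str, str]) -> list[str]:
    """Return the names of fields whose value breaks its field rule."""
    problems = []
    for field, value in row.items():
        rule = FIELD_RULES.get(field)
        if rule is not None and not rule.fullmatch(value.strip()):
            problems.append(field)
    return problems


# Hypothetical model output in which a faded "S" was read as "8".
row = {"name": "8arah Miller", "age": "34", "birthplace": "Bavaria"}
print(implausible_fields(row))
```

A flagged field is not necessarily wrong, since clerks did write unusual things, but it is a good place to look at the image first.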
This is why asking a vision model to "read the circled name on this census page" often succeeds where generic OCR fails. The model isn't just trying to match pixels to a font; it's reasoning about what would make sense in that context.
Cursive presents a particular challenge because letters connect in varying ways depending on the writer's script style. English Roundhand, German Kurrent, and Italian Script each have different character forms and connection patterns. Modern vision models struggle less with these variations than older systems because they've been trained on more diverse handwriting examples, but they still perform better on relatively legible scripts, and their accuracy deteriorates on heavily stylized calligraphy.
Vision models fail predictably in specific scenarios: (1) water-damaged pages where ink ran and mixed, (2) heavily faded documents where the original ink is barely darker than the paper, (3) pages where multiple layers of handwriting overlap (original entry + amendment + notation), (4) documents with extreme skew or rotation (common with photographs of records), (5) marginal notes or cramped handwriting in narrow spaces.
The model's uncertainty is often expressed only implicitly: it might read a faded word as "and" in one place and as "and," with a trailing comma in another, unsure whether the trailing mark is punctuation or ink degradation. Sophisticated genealogy workflows ask the vision model to explicitly flag uncertain readings: "Highlight any characters you're uncertain about with [?]" This self-reporting of confidence helps you identify which parts of a reading require manual verification.
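Once a transcription comes back with [?] markers, collecting them into a checklist is straightforward. The sketch below assumes the model followed the instruction and put [?] where the doubtful character would be; the sample transcription is hypothetical.

```python
UNCERTAIN_MARK = "[?]"


def uncertain_words(transcription: str) -> list[tuple[int, str]]:
    """Return (line_number, word) for every word containing the [?] marker."""
    flagged = []
    for line_number, line in enumerate(transcription.splitlines(), start=1):
        for word in line.split():
            if UNCERTAIN_MARK in word:
                # Trim trailing punctuation so the checklist shows only the word.
                flagged.append((line_number, word.strip(".,;:")))
    return flagged


# Hypothetical model response for three lines of a household listing.
sample = """Johann Schm[?]dt, age 4[?], farmer
Maria Schmidt, age 38, wife
K[?]rl Schmidt, age 12, son"""

for line_number, word in uncertain_words(sample):
    print(f"line {line_number}: check {word} against the image")
```

An empty checklist only means the model reported no doubt, not that the reading is correct, so the unflagged text still needs a pass by eye.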
When you upload a handwritten passenger manifest to Claude, it can use the "passenger manifest" context: it expects the fields to contain names, ages, occupations, and destinations. Asked to do so, it will usually note a missing entry rather than invent a value, and mark an illegible date as uncertain rather than guess, though it can still fill gaps incorrectly, so check every such field against the image. This context-aware behavior is a strength of vision models that traditional OCR doesn't provide.
Vision models also handle mixed-mode documents well: printed forms with handwritten insertions, typed documents with handwritten amendments. The model can describe which text is printed, which is handwritten, and which is overlapping, helping you understand the document's creation history.
Vision models are slower and more expensive than traditional OCR. They're also less suitable for batch processing hundreds of documents at scale, where traditional OCR is faster. Vision models sometimes hallucinate details that "fit the style": a partially visible word might be confidently completed incorrectly. And they perform variably across languages; models trained primarily on English documents struggle more with Spanish, German, or French handwriting, even when characters are clear.
Try this: Find three genealogy documents with varying legibility: one clearly handwritten, one faded, one water-damaged. Upload each to Claude and ChatGPT separately, with the instruction: "Read this document and flag any words where you're uncertain about the reading." Compare which model identifies uncertainty more consistently, and which document types each handles better. This teaches you when to use vision models versus traditional OCR for your specific research needs.
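If you save each model's reading as text, a word-level comparison shows where the two disagree, and those spots are natural candidates for checking against the image. The sketch below is one way to do that with Python's standard difflib module; the two readings are hypothetical.

```python
from difflib import SequenceMatcher


def disagreements(reading_a: str, reading_b: str) -> list[tuple[str, str]]:
    """Return (text_in_a, text_in_b) for each stretch where the readings differ."""
    words_a = reading_a.split()
    words_b = reading_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append((
                " ".join(words_a[a_start:a_end]),
                " ".join(words_b[b_start:b_end]),
            ))
    return differences


# Hypothetical readings of the same line from two different models.
reading_a = "Thomas Harker aged 41 born Cork"
reading_b = "Thomas Barker aged 41 born Cork"

for in_a, in_b in disagreements(reading_a, reading_b):
    print(f"model A read {in_a!r}, model B read {in_b!r}")
```

Agreement between two models is not proof of a correct reading, since both can make the same mistake on the same faded stroke.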