VA claims often require analyzing multiple documents simultaneously—discharge papers, medical records, lay statements—but AI tools have processing limits that force you to choose between breadth and depth. Knowing how to navigate these constraints lets you sequence documents strategically, extract key connections across files, and avoid losing critical evidence because you tried to feed too much at once.
Tokens are the basic units that AI language models process. Most common English words are a single token; longer or rarer words split into several, and punctuation marks count as tokens too. The critical point for veterans working with large case files: every modern LLM has a context window, a maximum number of tokens it can handle in a single request, counting both what you send and what the model writes back. GPT-4 allows 8,000, 32,000, or 128,000 tokens depending on which version you use. Claude 3 allows up to 200,000 tokens. This sounds like plenty until you start working with complete VA claims files.
A typical complete C-file (VA claims file) runs 150-400 pages. Medical records alone (VA exams, private provider notes, imaging reports) can easily reach 200-300 pages. At roughly 350 words per page and about 0.75 words per token, even a 150-page C-file comes to roughly 70,000 tokens before you've written a single question, and a 400-page file comes to nearly 190,000. A 128,000-token GPT-4 can hold the smaller file with room left for your question and instructions, but not the larger one; Claude 3's 200,000-token window holds it with little to spare. GPT-4's 8,000-token version can't handle either.
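A minimal Python sketch of this page-to-token arithmetic follows. It assumes 350 words per page (the figure above) and about 0.75 words per token (a common rule of thumb for English, not an exact tokenizer count), and it reserves a hypothetical 4,000 tokens for your question, instructions, and the model's reply.

```python
# Rough token estimate for a claims file from its page count.
# Both ratios are assumptions; real counts depend on each model's tokenizer.
WORDS_PER_PAGE = 350     # assumed average for typed records
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text

# Context windows discussed in the text, in tokens.
CONTEXT_WINDOWS = {
    "GPT-4 (8K)": 8_000,
    "GPT-4 Turbo (128K)": 128_000,
    "Claude 3 (200K)": 200_000,
}


def estimate_tokens(pages: int) -> int:
    """Estimate tokens for a document with the given page count."""
    words = pages * WORDS_PER_PAGE
    return round(words / WORDS_PER_TOKEN)


def windows_that_fit(tokens: int, reserve: int = 4_000) -> list[str]:
    """Models whose window holds the file plus a reserve (hypothetical
    default) for your question, instructions, and the model's reply."""
    return [name for name, size in CONTEXT_WINDOWS.items()
            if tokens + reserve <= size]


if __name__ == "__main__":
    for pages in (150, 400):
        tokens = estimate_tokens(pages)
        fits = windows_that_fit(tokens) or ["none"]
        print(f"{pages} pages: about {tokens:,} tokens; fits in {', '.join(fits)}")
```

Scanned pages with sparse text will come in lower, and dense typed notes higher, so treat the output as a planning estimate rather than a count.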
You can't simply upload your entire claims file and say "analyze this." You need a strategy for breaking the file into meaningful chunks while maintaining enough context for the model to reason correctly. This is fundamentally different from how humans review claims—we flip through linearly and build mental connections. AI needs explicit structure.
One approach: separate by document type. Upload all VA examination reports as one batch with the query "Summarize contradictions between VA examiners on symptom severity." Upload medical treatment records separately with "Identify gaps between when symptoms were documented and VA's timeline." Each request stays well within the token limit, but you have to run multiple analyses and synthesize the results manually.
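Batching by document type amounts to grouping files under a type label and pairing each group with its own question. The Python sketch below assumes you have already labeled each document yourself; the type labels (`va_exam`, `treatment`, `lay_statement`) and the `Document` class are hypothetical, and the two queries are the ones quoted above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    doc_type: str  # hypothetical labels, e.g. "va_exam", "treatment", "lay_statement"
    text: str


# One focused question per document type (taken from the examples in the text).
QUERIES = {
    "va_exam": "Summarize contradictions between VA examiners on symptom severity.",
    "treatment": "Identify gaps between when symptoms were documented and VA's timeline.",
}


def batch_by_type(docs: list[Document]) -> dict[str, list[Document]]:
    """Group documents so each type can be analyzed in its own request."""
    batches: dict[str, list[Document]] = defaultdict(list)
    for doc in docs:
        batches[doc.doc_type].append(doc)
    return dict(batches)


def build_prompts(docs: list[Document]) -> dict[str, str]:
    """Build one prompt per document type that has a query defined."""
    prompts = {}
    for doc_type, batch in batch_by_type(docs).items():
        if doc_type not in QUERIES:
            continue  # no question written for this type yet; skip it
        body = "\n\n".join(f"--- {d.name} ---\n{d.text}" for d in batch)
        prompts[doc_type] = f"{QUERIES[doc_type]}\n\n{body}"
    return prompts
```

Each prompt goes out as a separate request, so the cross-type synthesis described above remains a manual step.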
Another approach: chronological chunking. Break the file into date ranges. Analyze everything from the veteran's initial appointment through the service connection grant as one chunk (establishing the baseline), then analyze post-grant treatment records separately. This preserves narrative flow while respecting token limits. However, it risks missing connections between early and late evidence that might strengthen an appeal.
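Chronological chunking can be sketched as sorting dated records and cutting them at one or more boundary dates, such as the date of the service connection grant. In this Python sketch the records are hypothetical `(date, name)` pairs; a record dated exactly on a boundary is assumed to belong to the earlier chunk, so the grant decision stays with its baseline evidence.

```python
from bisect import bisect_left
from datetime import date


def chunk_by_dates(
    records: list[tuple[date, str]], boundaries: list[date]
) -> list[list[tuple[date, str]]]:
    """Split (date, document name) records into len(boundaries) + 1
    chronological chunks, each sorted oldest first."""
    cuts = sorted(boundaries)
    chunks: list[list[tuple[date, str]]] = [[] for _ in range(len(cuts) + 1)]
    for record in sorted(records):
        # bisect_left counts boundaries strictly earlier than the record's date,
        # so a record dated on a boundary lands in the earlier chunk.
        chunks[bisect_left(cuts, record[0])].append(record)
    return chunks


if __name__ == "__main__":
    records = [
        (date(2021, 3, 1), "Post-grant PT note"),
        (date(2018, 5, 10), "Initial appointment"),
        (date(2019, 8, 2), "Rating decision"),
        (date(2019, 1, 15), "C&P exam"),
    ]
    baseline, post_grant = chunk_by_dates(records, [date(2019, 8, 2)])
    print([name for _, name in baseline])
    print([name for _, name in post_grant])
```

Adding more boundary dates (for example, each later rating decision) produces more, smaller chunks.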
VA claims are inherently temporal and relational. A disability rating decision only makes sense in the context of prior exams, prior ratings, and prior appeals. Decision letters reference "as previously stated" and "consistent with prior findings," and the model needs that prior context to judge whether the decision is logically coherent. If you chunk too aggressively to fit token limits, you lose these connections.
Contradictions can also appear within a single VA exam (an examiner writes "no functional impairment," then documents severe limitations) or span multiple exams. Under token limits, you might analyze exam #1 and exam #3 in separate requests and miss the contradiction between them that would strengthen an appeal.
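One way to guard against this is to check a chunking plan before running it and list every pair of exams that ended up in different chunks; each such pair then gets its own side-by-side comparison query. This Python sketch assumes a plan is simply a list of chunks of document names; the names are hypothetical.

```python
from itertools import combinations


def split_exam_pairs(chunks: list[list[str]], exams: set[str]) -> list[tuple[str, str]]:
    """Return pairs of exams that landed in different chunks.

    The model never sees these pairs together, so each one needs a
    follow-up query comparing the two exams directly.
    """
    location: dict[str, int] = {}
    for index, chunk in enumerate(chunks):
        for name in chunk:
            if name in exams:
                location[name] = index
    return [
        (a, b)
        for a, b in combinations(sorted(location), 2)
        if location[a] != location[b]
    ]


if __name__ == "__main__":
    plan = [["Exam 1", "Exam 2", "Clinic note"], ["Exam 3", "Decision letter"]]
    for a, b in split_exam_pairs(plan, {"Exam 1", "Exam 2", "Exam 3"}):
        print(f"Compare separately: {a} vs {b}")
```

This only flags cross-chunk pairs; contradictions inside a single exam still depend on that exam being sent whole.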
Use larger context windows when available. If you're paying for Claude 3 Opus (200,000 tokens), use it for your full-file analysis rather than the 8,000-token GPT-4, provided the file fits. The per-query cost is higher, but you avoid fragmentation.
Implement intelligent summarization. Before uploading a 50-page medical file, use an AI to create a 2-page summary preserving key findings and dates. This reduces tokens while retaining decision-relevant information. Work from the summaries in your main analysis, and upload the full document in a follow-up query when you need to check a specific detail.
Consider document hierarchy. Upload the VA decision letter and your primary supporting evidence (most recent medical records, nexus opinion) to the largest context window model. Save secondary evidence (older medical records, tangential documents) for follow-up queries with smaller context windows.
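Summarization and document hierarchy combine naturally into a packing plan: fill the largest window in priority order, fall back to a summary when the full document no longer fits, and defer whatever is left to follow-up queries. The Python sketch below is a simple greedy approach, not an optimal one; the priority numbers and token counts are hypothetical inputs you would supply yourself.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    priority: int        # 1 = primary evidence; higher numbers are more secondary
    full_tokens: int
    summary_tokens: int  # size of an AI-written summary; 0 if none exists


def plan_context(
    docs: list[Doc], budget: int
) -> tuple[list[tuple[str, str]], list[str], int]:
    """Greedily fill a token budget in priority order.

    Each document goes in whole if it fits, otherwise as its summary if
    that fits, otherwise it is deferred to a follow-up query. Returns the
    included (name, form) pairs, the deferred names, and tokens used.
    """
    included: list[tuple[str, str]] = []
    deferred: list[str] = []
    used = 0
    # sorted() is stable, so documents with equal priority keep input order.
    for doc in sorted(docs, key=lambda d: d.priority):
        if used + doc.full_tokens <= budget:
            included.append((doc.name, "full"))
            used += doc.full_tokens
        elif doc.summary_tokens and used + doc.summary_tokens <= budget:
            included.append((doc.name, "summary"))
            used += doc.summary_tokens
        else:
            deferred.append(doc.name)
    return included, deferred, used
```

Because it is greedy, a lower-priority document can occasionally go in whole after a higher-priority one was summarized; review the plan before sending it.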
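Another technique: use embeddings-based search (retrieval) instead of bulk processing. Split your documents into passages, store them in a vector database, then search for the passages most relevant to your specific question before feeding them to the LLM. This keeps token use low because only the matching passages are sent, though retrieval can miss relevant passages that are worded differently from your question.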
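The retrieval loop is: split text into passages, turn the question and each passage into vectors, rank passages by similarity, and send only the top few. This Python sketch swaps in simple word-count vectors for a real embedding model and a plain list for the vector database, so it runs with the standard library alone; the passage size and `top_k` defaults are arbitrary assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Word-count vector: a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk_text(text: str, words_per_chunk: int = 200) -> list[str]:
    """Split a document into fixed-size passages of whole words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]


def retrieve(question: str, passages: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k passages most similar to the question."""
    q = vectorize(question)
    ranked = sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    passages = [
        "Veteran reports knee pain and limited flexion at the knee.",
        "Audiology exam shows hearing loss in both ears.",
        "Lay statement from spouse describing sleep problems.",
    ]
    print(retrieve("knee flexion limitation", passages, top_k=1))
```

Word counts match "flexion" but not "limitation" against "limited"; closing that kind of wording gap is what real embedding models are for.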
Try this: Count your VA file's actual token usage. Take your three largest documents (typically the decision letter, a VA exam report, and a medical records summary). Use an online token counter (many are free) to check their combined token count; counts vary between models because each uses its own tokenizer. If the total exceeds 8,000, those documents alone won't fit in GPT-4's basic 8,000-token context window. If it exceeds 50,000, expect the full file, plus your question and the model's reply, to strain even a 128,000-token window. Map out a chunking strategy: which documents must stay together, which can be analyzed separately, and which can be summarized to save tokens without losing meaning.
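If you'd rather count locally, the Python sketch below estimates tokens from plain-text exports of your documents using the rough rule of about 4 characters per English token, then checks the total against the context windows discussed above. It is an estimate only; for exact counts on OpenAI models, OpenAI's open-source `tiktoken` library uses the real tokenizer. The file names in the usage comment are hypothetical.

```python
import sys
from pathlib import Path

# Rough heuristic: about 4 characters of English text per token.
# Real tokenizers differ by model, so treat results as estimates.
CHARS_PER_TOKEN = 4

# Context windows discussed in the text, in tokens.
WINDOWS = {"GPT-4 (8K)": 8_000, "GPT-4 Turbo (128K)": 128_000, "Claude 3 (200K)": 200_000}


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens from character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fit_report(texts: dict[str, str]) -> tuple[int, dict[str, bool]]:
    """Total estimated tokens and whether that total fits each window."""
    total = sum(estimate_tokens_from_text(t) for t in texts.values())
    return total, {name: total <= size for name, size in WINDOWS.items()}


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python count_tokens.py decision_letter.txt va_exam.txt records_summary.txt
    texts = {p: Path(p).read_text(encoding="utf-8", errors="ignore") for p in sys.argv[1:]}
    total, fits = fit_report(texts)
    print(f"Estimated tokens: {total:,}")
    for name, ok in fits.items():
        print(f"  {name}: {'fits' if ok else 'too large'}")
    if total > 50_000:
        print("Over 50,000: plan a chunking strategy before uploading.")
```

The totals cover only the documents themselves; leave room for your question and the model's reply when comparing against a window.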