Periagoge

Token Limits and Context Window Management in Long Documentation

AI models have a maximum amount of text they can hold in mind at once. When your documentation is massive, you have to decide what to feed the model, or risk losing important context. Understanding these limits helps keep the AI from accidentally overlooking the most damaging or exculpatory facts.

Hypatia
Why It Matters

A context window is the maximum amount of text an AI model can process at once, counting both your prompt and its reply. It's measured in tokens (roughly 4 characters of English text per token on average). At the time of writing, Claude handles 200K tokens; GPT-4 Turbo and GPT-4o each handle 128K; Google's Gemini 1.5 Pro handles up to 2M. For most people, context limits feel infinite: you can paste an entire email thread or meeting transcript. But when you're assembling comprehensive workplace documentation, context limits become a real constraint.

Here's the problem: You're documenting six months of incidents with a toxic manager. You have 45 emails, 8 meeting notes, and 3 Slack conversations totaling 25,000 words. You want Claude to create a comprehensive timeline with analysis. But you also want to include detailed instructions for how to structure the timeline, your historical context, and reference materials. At roughly 1.3 tokens per word, the source material alone is about 32,500 tokens, so your prompt quickly passes 35,000. That still fits within every window listed above, but it uses more than a quarter of a 128K window before the model writes a word, and the longer the prompt, the easier it is for individual details to get less attention than they deserve.

Why Context Limits Matter for Workplace Defense

When a prompt pushes against the context limit, nuance is the first casualty. Material has to be trimmed or pre-summarized to fit, and summaries of summaries compound interpretation risk. A claim that needed three sentences of context to be understood correctly gets compressed to one phrase. Later, opposing counsel questions that phrase, and you realize the crucial context was dropped because it didn't fit.

Context efficiency also affects cost. If you use Claude or GPT-4 through their APIs, you pay per token, for both input and output. A well-organized prompt that uses context efficiently costs less than rambling documentation that repeats information.

Practical Strategies for Management

Chunking by incident type: Don't ask Claude to summarize six months of incidents in one prompt. Break it into categories: "Summarize all incidents related to idea dismissal. Summarize all incidents related to workload distribution. Summarize all incidents related to communication tone." Create a separate summary for each category, then synthesize across summaries. This keeps individual context windows manageable and allows for detailed analysis within each category.
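A minimal Python sketch of chunking by incident type, assuming you've already tagged each record with a category yourself. The `Incident` class, the category labels, and the prompt wording are illustrative assumptions, not a required format. It builds one prompt per category, with entries in date order, so each can be sent as a separate request.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    date: str      # ISO date string, e.g. "2024-04-12" (sorts correctly as text)
    category: str  # your own label, e.g. "idea dismissal" (hypothetical)
    text: str      # notes, email excerpt, or transcript snippet


# Hypothetical prompt wording; adjust to your own instructions.
PROMPT_TEMPLATE = (
    "Summarize all incidents related to {category}. "
    "Keep every date and quote exact wording where it matters.\n\n{material}"
)


def prompts_by_category(incidents):
    """Build one prompt per incident category instead of one giant prompt."""
    groups = defaultdict(list)
    for inc in incidents:
        groups[inc.category].append(inc)

    prompts = {}
    for category, items in groups.items():
        items.sort(key=lambda inc: inc.date)  # chronological within each category
        material = "\n\n".join(f"[{inc.date}] {inc.text}" for inc in items)
        prompts[category] = PROMPT_TEMPLATE.format(category=category, material=material)
    return prompts
```

Each value in the returned dictionary is a self-contained prompt; the per-category summaries you get back are what you later synthesize.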

Temporal boundaries: Instead of six months at once, do monthly snapshots. "Summarize all documented incidents from April 1-30. Focus on patterns and any escalation in frequency or severity." This creates a month-by-month record that's easier to defend—you're clearly not cherry-picking; you're documenting systematically.
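A sketch of the monthly approach, assuming each note is stored as a `(date, text)` pair. `monthly_snapshots` and `snapshot_prompt` are hypothetical helpers; the prompt text mirrors the example above but labels the month as `YYYY-MM`.

```python
from collections import defaultdict
from datetime import date


def monthly_snapshots(entries):
    """Group (date, text) pairs into calendar-month buckets keyed 'YYYY-MM'."""
    buckets = defaultdict(list)
    # Sorting first means months appear in chronological order in the result.
    for day, text in sorted(entries):
        buckets[day.strftime("%Y-%m")].append(f"[{day.isoformat()}] {text}")
    return dict(buckets)


def snapshot_prompt(month_key, lines):
    """Same instructions every month, so the snapshots stay comparable."""
    return (
        f"Summarize all documented incidents from {month_key}. "
        "Focus on patterns and any escalation in frequency or severity.\n\n"
        + "\n".join(lines)
    )
```

Because every month gets identical instructions, differences between snapshots reflect the incidents rather than the prompt.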

Progressive summarization: Start with raw materials in a context window. Generate a summary. Then in a second prompt, use that summary plus new materials. This is less efficient than one big summary, but it's actually better for workplace documentation because it creates a visible chain of reasoning. Each layer of summary can be audited.
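The layering can be sketched in Python as a loop that feeds each new batch together with the previous summary and keeps every intermediate prompt and result. `summarize` is a placeholder for whatever model call you actually use; it is an assumption, not a real library function. The returned `audit_log` is the visible chain of reasoning.

```python
from typing import Callable, List, Tuple


def progressive_summary(
    batches: List[str],
    summarize: Callable[[str], str],  # hypothetical: wraps your model or API call
) -> Tuple[str, List[dict]]:
    """Fold batches into a running summary, keeping every layer for later audit."""
    running = ""
    audit_log = []
    for step, batch in enumerate(batches, start=1):
        prompt = (
            "Previous summary:\n" + (running or "(none)")
            + "\n\nNew material:\n" + batch
            + "\n\nUpdate the summary. Keep dates and direct quotes."
        )
        running = summarize(prompt)
        # Record exactly what went in and what came out at this layer.
        audit_log.append({"step": step, "input_prompt": prompt, "summary": running})
    return running, audit_log
```

To try it without a model, pass a stand-in such as `lambda p: p[:200]`; each entry in `audit_log` shows which summary and which new material produced the next layer.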

Tool-specific workarounds: Otter.ai and Descript are built for transcription, so they handle 2-hour meetings without context collapse. Claude's 200K window and Gemini's 2M window give you flexibility. For truly massive documentation (8+ months of incident records), Gemini's larger window may be worth a possible trade-off in reasoning quality compared to Claude; test both on a sample of your own material before committing.

The Token Counting Calculation

Before you prompt, estimate tokens. Rough rule: 1,000 words ≈ 1,300 tokens. If you're pasting 20,000 words of incident material, budget 26,000 tokens for content plus 2,000-5,000 for your instructions and reasoning space. That's 28,000-31,000 tokens—well within Claude's 200K window, so you have headroom for follow-up questions or refinements.
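The arithmetic above as a small Python helper. It uses the same 1.3-tokens-per-word rule of thumb, so treat the numbers as estimates; a provider's own tokenizer will give somewhat different counts. The function names and the 200K default window are assumptions for illustration.

```python
TOKENS_PER_WORD = 1.3  # rough rule: 1,000 words ≈ 1,300 tokens


def estimate_tokens(text: str) -> int:
    """Estimate tokens from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def budget(content_words: int, instruction_tokens: int, window: int = 200_000) -> dict:
    """Estimate total prompt size and remaining headroom in a context window."""
    content = round(content_words * TOKENS_PER_WORD)
    total = content + instruction_tokens
    return {
        "content": content,
        "total": total,
        "headroom": window - total,
        "fits": total <= window,
    }
```

`budget(20_000, 2_000)` reports 26,000 content tokens and 28,000 in total; pass `window=128_000` to check against a smaller model.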

Common mistake: Including the same context across multiple prompts. If you're asking Claude to help with an incident summary, then later asking about timeline patterns, paste only the relevant material each time, not the entire 25,000-word archive again. This saves tokens and keeps thinking space available.
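A rough sketch of pasting only what a question needs, assuming the archive is a list of `(iso_date, text)` pairs. Keyword matching is deliberately crude and the helper name is hypothetical; in practice you would still skim what it selects before pasting.

```python
def relevant_material(archive, keywords, start=None, end=None):
    """Return only archive entries that match the current question.

    archive:  list of (iso_date, text) pairs, e.g. ("2024-04-02", "...")
    keywords: words the current question is about (case-insensitive)
    start/end: optional ISO date bounds, inclusive (ISO strings compare correctly)
    """
    wanted = [k.lower() for k in keywords]
    selected = []
    for day, text in archive:
        if start and day < start:
            continue
        if end and day > end:
            continue
        if any(k in text.lower() for k in wanted):
            selected.append((day, text))
    return selected
```

Comparing `sum(len(t.split()) for _, t in selected)` against the word count of the full archive shows how much you avoid re-sending.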

Try this: Break your next documentation project into temporal chunks. If you have six months of incident notes, create summaries for months 1-2, then 3-4, then 5-6. Use the same prompt structure each time so the summaries are consistent. Then ask Claude or ChatGPT to synthesize the three two-month summaries into an overall pattern analysis. Each prompt then carries about a third of the archive (often under 10K tokens, depending on how much you've written), which leaves comfortable headroom and creates a defensible, layered documentation structure.
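A sketch of the two-month workflow, assuming your notes are already collected into one string per month. The template headings (Incidents, Patterns, Escalation) are an illustrative assumption; the point is that every chunk gets the identical structure, and each prompt's estimated size (same 1.3-tokens-per-word rule) is flagged if it exceeds 10K tokens.

```python
TOKENS_PER_WORD = 1.3  # rough rule of thumb, not an exact tokenizer

# Hypothetical template: identical for every chunk so summaries stay comparable.
TEMPLATE = (
    "Summarize all documented incidents from months {first}-{last}. "
    "Use the same headings every time: Incidents, Patterns, Escalation.\n\n{notes}"
)


def chunked_prompts(notes_by_month, months_per_chunk=2, limit_tokens=10_000):
    """notes_by_month: list of note strings, one per month, in chronological order."""
    prompts = []
    for i in range(0, len(notes_by_month), months_per_chunk):
        chunk = notes_by_month[i:i + months_per_chunk]
        prompt = TEMPLATE.format(first=i + 1, last=i + len(chunk), notes="\n\n".join(chunk))
        est = round(len(prompt.split()) * TOKENS_PER_WORD)
        prompts.append({"prompt": prompt, "est_tokens": est, "over_limit": est > limit_tokens})
    return prompts
```

If a chunk comes back flagged, splitting it into single months (`months_per_chunk=1`) keeps the same template while halving its size.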
