Context Windows and Memory Management in Long-Term Care Conversations

Each conversation with an AI has a context window—the amount of conversation history the model can "remember" at once. In a single chat session, if your conversation exceeds the context window (e.g., 4,000 tokens in an older model), the oldest messages get dropped. The AI forgets the beginning of your conversation. Over longer timescales (days, weeks), this becomes critical: the AI loses memory of your patient's baseline, goals, and past decisions.

The Problem in Caregiving

You're managing a patient's care over months. Monday, you discuss adjusting pain medication. Wednesday, the AI doesn't "remember" that discussion in a new conversation. Friday, the AI suggests a pain management approach you already tried and abandoned. This repetition wastes time and increases error risk. For critical continuity of care, the AI needs persistent memory of ongoing decisions and their outcomes.

Strategies for Memory Management

Explicit summarization: At the end of significant conversations, ask the AI to summarize key decisions, patient baseline, and next steps. Store this summary in a persistent location (note-taking app, care plan document, structured database). Start your next conversation by pasting this summary. Example: "Here's the context from last week: we discussed Mom's blood pressure medication and decided to increase atenolol dosage. We're monitoring for fatigue. Current plan: check BP daily, follow up with cardiologist on Friday." Now the AI has continuity.

Persistent knowledge bases: Use a retrieval system (vector database, Notion, structured notes) where you store key information about the patient. Before each conversation, the AI retrieves relevant context. This is more scalable than manual summarization. The system automatically loads what's relevant to the current query.

Checkpoints and branching: Design conversations in discrete phases. After resolving one care issue, save a snapshot. Start a new conversation with a fresh context, loading only the relevant snapshot. This keeps each conversation lean and reduces accumulated noise.

System prompts for continuity: Many modern AI platforms let you set persistent system instructions (Claude, OpenAI Assistants). Set a system prompt that includes key patient facts, current care goals, and standing decisions. This applies to every message in every conversation without manually repeating it.

Technical Deep Dive

Modern models differ in how they handle context. Claude maintains context across a conversation until the window fills. OpenAI Assistants can persist data across conversations using file uploads and vector searches. Gemini handles very long contexts but with increased latency and cost. For caregiving, the choice of platform affects how well you can maintain memory.

Claude's 200K token window lets you maintain conversation with a patient's full 12-month history in a single chat session. GPT-4's 128K is similar. Older 4K-token models force you to be ruthless about what's included. If your patient's care is complex, choose a larger context model.

Practical Implementation

Daily standups: Each morning, paste yesterday's key decisions into today's conversation. Takes 30 seconds, prevents duplication. Example: "Yesterday we discussed adjusting Mom's insulin timing. Today I want to review her overnight glucose readings against that change." The AI now has continuity.

Weekly summaries: At week's end, ask the AI to summarize the week's care decisions, outcomes, and next steps. Store in a living document. Each Monday, load that summary into a new conversation.

Care plan as anchor: Maintain a structured care plan document (medication list, active diagnoses, treatment goals, recent decisions). Before each AI conversation, load this plan. It serves as a fixed reference point—the AI's memory anchor. Updates to the plan get reflected in future conversations.

Multi-user continuity: If multiple caregivers are involved (family, home care, clinical team), use a shared document or platform where conversations and decisions are recorded. Each caregiver can access the full history. This creates institutional memory beyond any single AI conversation.

Cost and Latency Considerations

Explicitly loading full patient history into every conversation increases tokens and cost. Use retrieval to load only relevant history, not everything. A focused context window with 5,000 relevant tokens is cheaper and faster than loading all 50,000 tokens of full history unnecessarily.

Try this: Track your caregiving AI usage over one week. Each time you start a conversation, note what information you have to repeat from prior conversations. Collect these. Then redesign your next week: at the start, create a one-page "care snapshot"—patient baseline, current medications, active issues, recent decisions, goals. Paste this into every conversation for a week. At week's end, update the snapshot. Track how many times you repeat information. You'll likely cut repetition in half, reducing conversation friction and token usage.