Medical AI can surface correlations that look significant but are actually produced by something else entirely, such as older age or medication use. Understanding which variables confound an analysis keeps you from chasing red herrings and helps you distinguish genuine effects from spurious associations.
A confounding variable is a factor, often hidden or unmeasured, that influences both of the things you're investigating and so creates an association between them that isn't causal. A classic example: ice cream consumption correlates with drowning deaths. Does ice cream cause drowning? No. The confounder is summer: warm weather drives both ice cream sales and swimming, and more swimming means more drownings. Missing this confounding variable leads to a false causal conclusion.
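The ice cream example can be reproduced with a small simulation. The sketch below is illustrative only: the numbers, the variable names, and the linear relationships are all assumptions, chosen so that temperature drives both quantities while ice cream has no direct effect on drowning at all.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily data (arbitrary units). Temperature drives both
# quantities; ice cream sales have NO direct effect on drowning here.
days = []
for _ in range(5000):
    temp_c = rng.uniform(0, 35)
    ice_cream_sales = 10 * temp_c + rng.gauss(0, 40)
    drowning_index = 0.2 * temp_c + rng.gauss(0, 1.5)
    days.append((temp_c, ice_cream_sales, drowning_index))

overall = pearson([d[1] for d in days], [d[2] for d in days])

# Hold the confounder roughly fixed: keep only days between 20 and 22 C.
similar_days = [d for d in days if 20 <= d[0] < 22]
within = pearson([d[1] for d in similar_days], [d[2] for d in similar_days])

print(f"all days:          r = {overall:.2f}")
print(f"20-22 C days only: r = {within:.2f}")
```

Across all days the two series should come out strongly correlated, while among days with nearly the same temperature the correlation should be close to zero. Comparing like with like on the confounder is what makes the spurious association fall away.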
In medical AI, confounding is a persistent problem because large observational datasets naturally contain confounded relationships. For instance, observational studies might show that hormone replacement therapy (HRT) reduces heart disease risk. But HRT users tend to be wealthier, have better healthcare access, exercise more, and eat healthier diets, all of which reduce heart disease independently of HRT. If AI systems don't account for these confounders, they'll attribute the reduction in heart disease to HRT when some or all of it is actually explained by the confounders.
Machine learning models trained on healthcare data absorb confounded relationships from the training data. If the data shows that people taking certain medications live longer, the model learns this association. But if the real reason they live longer is that the medication is prescribed mainly to healthier patients (a selection effect often called healthy-user bias: sicker patients are too ill to take the medication), the model's learned association is misleading. When you ask the model "Does this medication extend lifespan?" it will confidently say yes, based on what the data taught it.
Language models add another layer of complexity. When an AI system reads medical literature, it learns patterns in how researchers discuss findings. If a study found an association and the researchers incorrectly attributed causation (confounders weren't adequately controlled), the AI absorbs that causal claim. When you ask the AI about this topic, it repeats the causal interpretation from the literature, unaware that experts later identified the confounding.
Suppose AI-assisted research suggests that vitamin D supplementation improves mood. The literature shows this association, and an AI system will relay it. But the true relationship might be confounded: people who take supplements tend to be more health-conscious, exercise regularly, spend time outdoors, and have better nutrition, all of which improve mood independently of vitamin D. In that case vitamin D is a marker of health-consciousness, not the cause of better mood. If you rely on AI-informed research to start taking vitamin D for depression without addressing these other factors, the results are likely to disappoint.
Another example: studies show that coffee drinkers have lower mortality. This seems to suggest coffee is healthy. But coffee drinkers differ from non-drinkers in many ways: socioeconomic status, healthcare access, smoking history (current smokers, former smokers, and never-smokers all differ from one another), and baseline health. When these confounders are controlled for statistically, the coffee-mortality association can shrink, disappear, or even change direction. AI trained on observational studies without proper confounder control will give misleading advice.
AI systems have a specific weakness here: they optimize for identifying patterns in data without understanding causality. A machine learning model trained to predict disease outcome from patient data might learn that "taking a certain medication is associated with worse outcomes." The true explanation: sicker patients take this medication more often. But the model has no way to distinguish between "medication causes worse outcomes" and "severity confounds the association." It just sees the pattern and learns it.
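The medication example above can be sketched as a simulation. Everything here is a made-up assumption for illustration: the probabilities, the two-level severity variable, and the small true benefit given to the drug. The point is only to show how a crude comparison and a severity-stratified comparison can disagree.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# All probabilities below are invented for illustration.
P_SEVERE = 0.4
P_TREATED = {True: 0.8, False: 0.2}   # severe patients get the drug more often
P_BAD_BASE = {True: 0.5, False: 0.1}  # severe patients do worse regardless
DRUG_EFFECT = -0.05                   # assumed: the drug truly lowers risk a little

patients = []
for _ in range(100_000):
    severe = rng.random() < P_SEVERE
    treated = rng.random() < P_TREATED[severe]
    p_bad = P_BAD_BASE[severe] + (DRUG_EFFECT if treated else 0.0)
    bad_outcome = rng.random() < p_bad
    patients.append((severe, treated, bad_outcome))

def risk_difference(rows):
    """Bad-outcome rate among treated minus the rate among untreated."""
    treated = [bad for _, t, bad in rows if t]
    untreated = [bad for _, t, bad in rows if not t]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

# Crude comparison: ignores severity entirely.
crude = risk_difference(patients)

# Stratified comparison: compare treated and untreated within each severity level.
by_severity = {
    label: risk_difference([p for p in patients if p[0] == is_severe])
    for label, is_severe in (("mild", False), ("severe", True))
}

print(f"crude risk difference: {crude:+.3f}")
for label, rd in by_severity.items():
    print(f"{label:>6} patients only: {rd:+.3f}")
```

Under these assumptions the crude difference should come out positive (the drug looks harmful) while both stratum-specific differences come out negative (the drug helps within each severity level). A model that only sees the pooled pattern, with no severity variable or causal structure, would learn the misleading version.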
Additionally, AI systems can conflate temporal association with causation. If event A typically happens before event B in the data, the model might infer that A causes B, even if B actually causes A or a confounder causes both. For instance, data might show that people report fatigue before being diagnosed with thyroid disease. The AI might infer that fatigue causes thyroid disease, when actually the disease caused the fatigue but wasn't diagnosed until later.
Watch for phrases like "studies show an association" versus "controlled trials demonstrate that." Association-based findings are vulnerable to confounding; randomized controlled trials reduce it by assigning treatment at random, so that confounders are balanced across groups on average. If an AI recommends something based on observational studies without noting confounding limitations, that's a red flag.
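The difference between observational and randomized evidence can be illustrated with a toy simulation. This is a sketch under invented assumptions: the mood scale, the 80%/20% uptake rates, and the function names are hypothetical, and the supplement is given no true effect at all.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable

def mood_score(health_conscious):
    """Mood depends only on health-consciousness; the supplement does nothing."""
    return 50 + 10 * health_conscious + rng.gauss(0, 5)

def mean_difference(assign, n=20_000):
    """Average mood of supplement takers minus average mood of non-takers."""
    takers, non_takers = [], []
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        takes_supplement = assign(health_conscious)
        group = takers if takes_supplement else non_takers
        group.append(mood_score(health_conscious))
    return mean(takers) - mean(non_takers)

# Observational world: health-conscious people choose the supplement more often.
observational = mean_difference(lambda hc: rng.random() < (0.8 if hc else 0.2))

# Randomized trial: a coin flip decides, regardless of health-consciousness.
randomized = mean_difference(lambda hc: rng.random() < 0.5)

print(f"observational difference: {observational:+.2f}")
print(f"randomized difference:    {randomized:+.2f}")
```

In the observational setup the takers should score clearly higher even though the supplement is inert, because uptake tracks health-consciousness. With random assignment the two groups contain similar proportions of health-conscious people, so the difference should sit near zero.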
Ask follow-up questions: "What could explain this relationship besides [proposed cause]?" or "Were confounding variables controlled for in the studies you're referencing?" Good AI systems will acknowledge confounding possibilities. Poor ones will confidently assert causation based on association.
Also, check the mechanism. If an AI claims a supplement improves cognition, ask how it does so mechanistically. If the explanation is vague ("promotes overall wellness," "supports healthy aging") but the evidence is from observational studies, confounding is likely. Mechanistically implausible findings should increase skepticism.
Try this: Find a health-related study on Consensus showing an association between two things (diet component and health outcome, supplement and disease, etc.). Ask Claude: "Could confounding variables explain this association instead of causation? What are the confounders?" Compare Claude's analysis to the study authors' discussion of confounding. This shows you how AI can either spot or miss confounding depending on how it's prompted and how carefully the original research addressed it.