Causality and Psychopathology
american psychopathological association

Volumes in the Series

Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures (Shrout, Keyes, and Ornstein, eds.)
Mental Health in Public Health (Cottler, ed.)
Trauma, Psychopathology, and Violence: Causes, Correlates, or Consequences (Widom)
Causality and Psychopathology FINDING THE DETERMINANTS OF DISORDERS AND THEIR CURES
EDITED BY
patrick e. shrout, ph.d.
Professor of Psychology, Department of Psychology, New York University, New York, NY

katherine m. keyes, ph.d., mph
Columbia University Epidemiology Merit Fellow, Department of Epidemiology, Columbia University, New York, NY

katherine ornstein, mph
Department of Epidemiology, Mount Sinai School of Medicine, New York, NY
2011
Oxford University Press

Oxford University Press, Inc., publishes works that further Oxford University's objective of excellence in research, scholarship, and education.

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Copyright 2011 by Oxford University Press, Inc.

Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
www.oup.com

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
American Psychopathological Association. Meeting (98th : 2008 : New York, N.Y.)
Causality and psychopathology / edited by Patrick E. Shrout, Katherine M. Keyes, Katherine Ornstein.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-0-19-975464-9 (alk. paper)
1. Psychology, Pathological—Etiology—Congresses. 2. Psychiatry—Research—Methodology—Congresses. I. Shrout, Patrick E. II. Keyes, Katherine M. III. Ornstein, Katherine. IV. Title.
[DNLM: 1. Mental Disorders—epidemiology—United States—Congresses. 2. Mental Disorders—etiology—United States—Congresses. 3. Psychopathology—methods—United States—Congresses. WM 140]
RC454.A4193 2008
616.89—dc22
2010043586

ISBN-13: 978-0-19-975464-9

Printed in USA on acid-free paper
Preface
Research in psychopathology can reveal basic insights into human biology, psychology, and social structure; and it can also lead to important interventions to relieve human suffering. Although there is sometimes tension between basic and applied science, the two are tied together by a fascination with causal explanations. Basic causal stories, such as how neurotransmitters change brain function "downstream," are always newsworthy, even if the story is about a mouse or rat brain. However, applications of causal understanding to create efficacious prevention or intervention programs are also exciting. Although good causal stories make the headlines, many psychopathology researchers collect data that are descriptive or correlational in nature. However, for decades epidemiologists have worked with such nonexperimental data to formulate causal explanations about the etiology and course of disorders. Even as these explanations have been reported in textbooks and used in courts of law to settle claims of responsibility for adverse conditions, they have also been criticized for going too far. Indeed, many scientists shy away from using explicit causal language when reporting observational data, to avoid being criticized for lack of rigor. Nonetheless, the subtext in reports of associations and developments always implies causal mechanisms. Because of the widespread interest in causal explanation, along with concerns about what kinds of causal claims can be made from survey data, longitudinal studies, studies of genetic relationships, clinical observations, and imperfect clinical trials, the American Psychopathological Association decided to organize its 2008 scientific meeting around the topic of causality and psychopathology research. Those invited to speak at the 2.5-day conference included authors of influential works on causation, statisticians whose new methods are informative about causal processes, and experts in psychopathology.
This volume contains revised and refined versions of the papers presented by the majority of the invited speakers at that unique meeting. Not all of the authors have done work in psychopathology research, and not all have previously written explicitly about causal inference. Indeed, the goal of the meeting and this volume is to promote new creative thinking
about how causal inference can be promoted in psychopathology research in the years to come. Moreover, the collection is likely to be of interest to scientists working in other areas of medicine, psychology, and social science, especially those who combine experimental and nonexperimental data in building their scientific literature. The volume is divided into three sections. The first section, "Causal Theory and Scientific Inference," contains contributions that address cross-cutting issues of causal inference. The first two chapters introduce conceptual and methodological issues that thread through the rest of the volume, while the third chapter provides a formal framework for making and examining causal claims. The fourth chapter introduces genetic analysis as a kind of prototype of causal thinking in psychopathology, in that we know that variation in the genotype can lead to variation in the phenotype but not vice versa. The author also argues for the practical counterfactual thinking implied by the "interventionist" approach to causal inference developed by J. Woodward and colleagues. The final chapter in this section provides a stimulating illustration of the dramatically different inferences one can reach from observational studies and clinical trials, drawing on the Women's Health Initiative. The focus of this chapter is the effect of hormone-replacement therapy on coronary heart disease, as well as on the risk of several forms of cancer. Because this example did not have to face the difficulties of diagnosis and nosology that confront so much psychopathology research, the authors were able to focus on important issues such as selection bias and heterogeneity of effects in reconciling data from trials and observational studies. The second section, "Innovations in Methods," presents new tools and perspectives for exploring and supporting causal theories in epidemiology.
Although the substantive focus is psychopathology research, the methods are generally applicable to other areas of medicine. The first chapter in this section (Chapter 6) proposes a novel but formal analysis of causal claims that can be made about direct and indirect causal paths using graphical methods. Chapter 7 describes a statistical method called "growth mixture modeling," which can examine a variety of hypotheses about longitudinal data that arise out of causal theories. Chapter 8 describes new ways to improve the efficiency of clinical trials by providing causally relevant information. The last two chapters (Chapters 9 and 10) provide insights into how naturally occurring genetic variation can be leveraged to strengthen inferences made about both genetic and environmental causal paths in psychopathology. The final section, "Causal Thinking in Psychiatry," features critical analyses of causal claims within psychiatry by some of the best-known psychopathology researchers. These chapters examine claims in developmental
psychopathology (Chapter 11), posttraumatic stress disorder (Chapter 12), research on therapeutics (Chapter 13), and nosology (Chapter 14). The convergence of this diverse and talented group to one meeting and one volume was facilitated by active involvement of the officers and Council of the American Psychopathological Association (APPA) during 2008. We particularly thank Ezra Susser, the secretary of APPA, who was especially generative in planning the meeting and, therefore, the volume. We also acknowledge the valuable suggestions made by the other officers and councilors of APPA: James J. Hudziak, Darrel A. Regier, Linda B. Cottler, Michael Lyons, Gary Heiman, John E. Helzer, Catina O’Leary, Lauren B. Alloy, John N. Constantino, and Charles F. Zorumski. The meeting itself benefited enormously from the special efforts of Gary Heiman and Catina O’Leary, and it was supported by the National Institute of Mental Health through grant R13 MH082613. This volume is dedicated to Lee Nelkin Robins, a former president of APPA who attended her last meeting in 2008. She died on September 25, 2009. Trained in sociology, Lee Robins made essential contributions to the understanding of the development and distribution of mental disorders, particularly antisocial and deviant behavior as a precursor of later problems. Her rigorous causal thinking was informed by epidemiological data, and she was instrumental in improving the quality and quantity of such data over the course of her productive life.
Contents
Contributors xi
part i causal theory and scientific inference

1 Integrating Causal Analysis into Psychopathology Research 3
patrick e. shrout, phd

2 What Would Have Been Is Not What Would Be: Counterfactuals of the Past and Potential Outcomes of the Future 25
sharon schwartz, phd, nicolle m. gatto, phd, and ulka b. campbell, phd

3 The Mathematics of Causal Relations 47
judea pearl, phd

4 Causal Thinking in Psychiatry: A Genetic and Manipulationist Perspective 66
kenneth s. kendler, md

5 Understanding the Effects of Menopausal Hormone Therapy: Using the Women's Health Initiative Randomized Trials and Observational Study to Improve Inference 79
garnet l. anderson, phd, and ross l. prentice, phd

part ii innovations in methods

6 Alternative Graphical Causal Models and the Identification of Direct Effects 103
james m. robins, md, and thomas s. richardson, phd

7 General Approaches to Analysis of Course: Applying Growth Mixture Modeling to Randomized Trials of Depression Medication 159
bengt muthén, phd, hendricks c. brown, phd, aimee m. hunter, phd, ian a. cook, md, and andrew f. leuchter, md

8 Statistical Methodology for a SMART Design in the Development of Adaptive Treatment Strategies 179
alena i. oetting, ms, janet a. levy, phd, roger d. weiss, md, and susan a. murphy, phd

9 Obtaining Robust Causal Evidence From Observational Studies: Can Genetic Epidemiology Help? 206
george davey smith, md, dsc

10 Rare Variant Approaches to Understanding the Causes of Complex Neuropsychiatric Disorders 252
matthew w. state, md, phd

part iii causal thinking in psychiatry

11 Causal Thinking in Developmental Disorders 279
e. jane costello, phd, and adrian angold, mrcpsych

12 Causes of Posttraumatic Stress Disorder 297
naomi breslau, phd

13 Causal Thinking for Objective Psychiatric Diagnostic Criteria: A Programmatic Approach in Therapeutic Context 321
donald f. klein, md, dsc

14 The Need for Dimensional Approaches in Discerning the Origins of Psychopathology 338
robert f. krueger, phd, and daniel goldman

Index 353
Contributors
garnet l. anderson, ph.d.
WHI Clinical Coordinating Center, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA

adrian angold, m.d.
Center for Developmental Epidemiology, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC

naomi breslau, ph.d.
Department of Epidemiology, Michigan State University College of Human Medicine, East Lansing, MI

hendricks c. brown, ph.d.
University of Miami, Miami, FL

ulka b. campbell, ph.d.
Associate Director, Epidemiology, Safety and Risk Management, Medical Division, Pfizer, Inc., New York, NY

ian a. cook, m.d.
Associate Professor, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA

e. jane costello, ph.d.
Center for Developmental Epidemiology, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC

george davey smith
MRC Centre for Causal Analyses in Translational Epidemiology, Department of Social Medicine, University of Bristol, Bristol, UK

nicolle m. gatto, ph.d.
Director, TA Group Head, Epidemiology, Safety and Risk Management, Medical Division, Pfizer, Inc., New York, NY

daniel goldman
University of Minnesota, Minneapolis, MN

aimee m. hunter, ph.d.
Research Psychologist, Laboratory of Brain, Behavior, and Pharmacology, University of California, Los Angeles, Los Angeles, CA

kenneth s. kendler, m.d.
Virginia Institute for Psychiatric and Behavioral Genetics, Departments of Psychiatry and Human and Molecular Genetics, Medical College of Virginia, Virginia Commonwealth University, Richmond, VA

donald f. klein, m.d., d.sc.
Research Professor, Phyllis Green and Randolph Cowen Institute for Pediatric Neuroscience, NYU Child Study Center, NYU Medical Center; Professor Emeritus, Department of Psychiatry, College of Physicians & Surgeons, Columbia University, New York, NY

robert f. krueger
Washington University in St. Louis, St. Louis, MO

andrew f. leuchter, m.d.
Professor, Department of Psychiatry and Biobehavioral Science, University of California, Los Angeles, Los Angeles, CA

janet a. levy
Center for Clinical Trials Network, National Institute on Drug Abuse, Bethesda, MD

susan a. murphy, ph.d.
Professor, Psychiatry, University of Michigan Institute for Social Research, Ann Arbor, MI

bengt muthén, ph.d.
Professor Emeritus, Graduate School of Education & Information Studies, University of California, Los Angeles, Los Angeles, CA

alena i. oetting
University of Michigan Institute for Social Research, Ann Arbor, MI

judea pearl, ph.d.
Cognitive Systems Laboratory, Computer Science Department, University of California, Los Angeles, CA

ross l. prentice
WHI Clinical Coordinating Center, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA

sharon schwartz, ph.d.
Professor of Clinical Epidemiology, Columbia University, New York, NY

matthew w. state, m.d., ph.d.
Donald J. Cohen Associate Professor, Co-Director, Program on Neurogenetics, Yale Child Study Center, Department of Genetics and Psychiatry, Yale University School of Medicine, New Haven, CT

roger d. weiss, m.d.
Harvard Medical School, McLean Hospital, Belmont, MA
part i Causal Theory and Scientific Inference
1 Integrating Causal Analysis into Psychopathology Research patrick e. shrout
Both in psychopathology research and in clinical practice, causal thinking is natural and productive. In the past decades, important progress has been made in the treatment of disorders ranging from attention-deficit/hyperactivity disorder (e.g., Connor, Glatt, Lopez, Jackson, & Melloni, 2002) to depression (e.g., Dobson, 1989; Hansen, Gartlehner, Lohr, Gaynes, & Carey, 2005) to schizophrenia (Hegarty, Baldessarini, Tohen, & Waternaux, 1994). The treatments for these disorders include pharmacological agents as well as behavioral interventions, which have been subjected to clinical trials and other empirical evaluations. Often, the treatments focus on the reduction or elimination of symptoms, but in other cases the interventions are designed to prevent the disorder itself (Brotman et al., 2008). In both instances, the interventions illustrate the best use of causal thinking to advance both scientific theory and clinical practice. When clinicians understand the causal nature of treatments, they can have confidence that their actions will lead to positive outcomes. Moreover, being able to communicate this confidence tends to increase a patient’s comfort and compliance (Becker & Maiman, 1975). Indeed, there seems to be a basic inclination for humans to engage in causal explanation, and such explanations affect both basic thinking, such as identification of categories (Rehder & Kim, 2006), and emotional functioning (Hareli & Hess, 2008). This inclination may lead some to ascribe causal explanations to mere correlations or coincidences, and many scientific texts warn researchers to be cautious about making causal claims (e.g., Maxwell & Delaney, 2004). These warnings have been taken to heart by editors, reviewers, and scientists themselves; and there is often reluctance regarding the use of causal language in the psychopathology literature. 
As a result, many articles simply report patterns of association and refer to mechanisms with euphemisms that imply causal thinking without addressing causal issues head-on. 3
Over 35 years ago, Rubin (1974) began to talk about strong causal inferences that could be made from experimental and nonexperimental studies using the so-called potential outcomes approach. This approach clarified the nature of the effects of causes A vs. B by asking us to consider what would happen to a given subject under these two conditions. Forget for a moment that at a single instant a subject cannot experience both conditions—Rubin provided a formal way to think about how we could compare potential rather than actual outcomes. The contrast of the potential outcomes was argued to provide a measure of an individual causal effect, and Rubin and his colleagues showed that the average of these causal effects across many individuals could be estimated under certain conditions. Although approaches to causal analysis have also been developed by philosophers and engineers (see Pearl, 2009), the formal approaches of Rubin and his colleagues (e.g., Holland, 1986; Frangakis & Rubin, 2002) and statistical epidemiologists (Greenland, Pearl, & Robins, 1999a, 1999b; Robins, 1986; Robins, Hernán, & Brumback, 2000) have prompted researchers to have new appreciation for the strengths and limitations of both experimental and nonexperimental designs. This volume is designed to promote conversations among those concerned with causal inference in the abstract and those interested in causal explanation of psychopathology more specifically. Authors include prominent contributors from both types of literature. Some of the chapters from experts in causal analysis are rather technical, but all engage important and cutting-edge issues in the field. The psychopathology experts raise challenging issues that are likely to be the subject of discussion for years to come. In this introductory chapter, I give an overview of some of the themes that will be discussed in the subsequent chapters.
These themes have to do with the assessment of causal effects, the sources of bias in clinical trials and nonexperimental designs, and the potential of innovative designs and perspectives. In addition to the themes that are developed later in the volume, I discuss two topics that are not fully discussed elsewhere in the volume. One topic focuses on the role of time when considering the effects of causes in psychopathology research. The other topic is mediation analysis, which is a statistical method that was developed in psychology to describe the intervening processes between an intervention and an outcome of that intervention.
Themes in Causal Analysis

In the pioneering work of Rubin (1978), randomized experiments had a special status in making causal claims based on the causal effect. As mentioned, the causal effect is defined as the difference between the outcome (say, Y) that would be observed if person U were administered treatment
A vs. what would have been observed if that person received treatment B. The outcome under treatment A is called YA(U) and the outcome under treatment B is called YB(U). Because only one treatment can be administered for a given measurement of Y(U), the definition of the causal effect depends on a counterfactual consideration, namely, what the outcome of someone with treatment A would have been had he or she received treatment B or what the outcome of someone assigned to treatment B would have been had he or she received treatment A. Our inability to observe both outcomes is what Holland (1986) called "the fundamental problem of causal inference."
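The counterfactual logic above can be made concrete with a small simulation. The sketch below is purely illustrative (the variable names and effect sizes are invented, not taken from the chapter): the program generates both potential outcomes for every unit, so the individual causal effects exist inside the simulation even though no real study could observe them, and a randomized assignment that reveals only one outcome per unit still recovers their average.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Each unit U has two potential outcomes; both "exist," only one is observed.
y_a = rng.normal(loc=10.0, scale=2.0, size=n)        # outcome under treatment A
y_b = y_a + rng.normal(loc=-3.0, scale=1.0, size=n)  # outcome under treatment B

individual_effects = y_a - y_b          # never observable for any single unit
true_ace = individual_effects.mean()    # average causal effect (about 3.0 here)

# Random assignment reveals exactly one potential outcome per unit ...
assigned_a = rng.random(n) < 0.5
observed = np.where(assigned_a, y_a, y_b)

# ... yet the between-group mean difference estimates the within-person average.
est_ace = observed[assigned_a].mean() - observed[~assigned_a].mean()
print(round(true_ace, 2), round(est_ace, 2))
```

The printed pair shows the between-group contrast closely matching the average of the unobservable within-person effects, which is the sense in which randomization addresses the fundamental problem "on average" rather than for any individual.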
Average Causal Effects from Experiments

Although the individual causal effect cannot be known, the average causal effect can be estimated if subjects are randomly assigned to treatment A or B and several other conditions are met. In this case, between-person information can be used to estimate the average of within-person counterfactual causal effects. The magnitude of the causal effect is taken seriously, and efforts are made to estimate this quantity without statistical bias. This can be done in a relatively straightforward way in engineering experiments and in randomized studies with animal models since the two groups being compared are known to be probabilistically equivalent. Moreover, in basic science applications the assumption of a stable unit treatment value (SUTVA) (Rubin, 1980, 1990) is plausible. As Schwartz, Gatto, and Campbell discuss (Chapter 2), the SUTVA assumption essentially means that subjects are exchangeable and one subject's outcome is unaffected by another subject's treatment assignment. This assumption will typically hold in carefully executed randomized experiments using genetically pure laboratory animals. For example, O'Mahony and colleagues (2009) randomly selected male Sprague-Dawley rat pups for a postnatal stress experience, which involved removing them daily from their mother for 3 hours on days 2–12. The randomly equivalent control pups were left with their mothers. At the end of the study, the investigators found differences between the two groups with respect to biological indicators of stress and immune function. The biologically equivalent subjects in this example are plausibly exchangeable, consistent with SUTVA; but we must also assume that the subjects did not affect each other's responses. In the causal literature, it is common to represent causal effects as shown in Figure 1.1A. In this figure, the treatment is represented as variable T, and it would take two values, one for treatment A and one for treatment B.
The outcome is represented by Y, which would have been the value of one of the biological measurements in the O’Mahony et al. (2009) example. In addition, there is variable E, which represents the other factors that can influence the value of Y, such as genetic mutations or measurement error
[Figure 1.1 appears here as two path diagrams: Panel A with nodes E, T, and Y; Panel B with nodes C, X, E, and Y.]
Figure 1.1 Schematic representation of the effect of treatment condition (T) on outcome (Y). Boxes represent observed values and circles represent latent variables. In the panel on the left (Panel A) the treatment is the only systematic influence on Y, but in the panel on the right (Panel B) there is a confounding variable (C) that influences both the treatment and the outcome.
in the biological assays. When random assignment is used to define T, then E and T are unrelated; and this is represented in Figure 1.1 by a lack of any explicit link between these two variables.
Clinical Trials One might suppose that this formal representation works for experiments involving humans as subjects. However, things get complicated quickly in this situation, as is well documented in the literature on clinical trials (e.g., Fleiss, 1986; Everitt & Pickles, 2004; Piantadosi, 2005). It is easy enough to assign people randomly to either group A or group B and to verify that the two groups are statistically equivalent in various characteristics, but human subjects are agents who can undo the careful experimental design. Individuals in one group might not like the treatment to which they are assigned and may take various actions, such as failing to adhere with the treatment, switching treatments, selectively adding additional treatments, or withdrawing from the study entirely. This issue of nonadherence introduces bias into the estimate of the average causal effect (see Efron, 1998; Peduzzi, Wittes, Detre, & Holford, 1993, for detailed discussion). For example, if a drug assigned to group A has good long-term efficacy but temporary negative side effects, such as dry mouth or drowsiness, then persons who are most distressed by the side effects might refuse to take the medication or cut back on the dose. Persons in group B may not feel the need to change their assigned treatment, and thus, the two groups become nonequivalent in adherence. One would expect that the comparison of the outcomes in the two groups would underestimate the efficacy of the treatment. A different source of bias will be introduced if persons in one group are more likely to withhold data or to be lost to follow-up compared to the
other group. This issue of missing data is another threat to clear causal inference in clinical trials. Mortality and morbidity are common reasons for follow-up data being missing, but sometimes data are missing because subjects have become so high-functioning that they do not have time to give to follow-up measurement. If observations that are missing come from a different distribution than the observations that were completed and if this discrepancy differs between groups A and B, then there is potential for the estimate of the causal effect to become biased (see Little & Rubin, 2002). For many clinical studies, the bias in the causal effect created by differential nonadherence and missing-data patterns is set aside rather than confronted directly. Instead, the analysis of the trials typically emphasizes intent to treat (ITT). This requires that subjects be analyzed within the groups originally randomized, regardless of whether they were known to have switched treatment or failed to provide follow-up data. Missing data in this case must be imputed, using either formal imputation methods (Little & Rubin, 2002) or informal methods such as carrying the last observed measurement forward. ITT shifts the emphasis of the trial toward effectiveness of the treatment protocol, rather than efficacy of the treatment itself (see Piantadosi, 2005, p. 324). For example, if treatment A is a new pharmacologic agent, then the effectiveness question is how a prescription of this drug is likely to change the outcome compared to no prescription. The answer to this question is often quite different from whether the new agent is efficacious when administered in tightly controlled settings, since effectiveness is affected by side effects, cost of treatment, and social factors such as stigma associated with taking the treatment.
Indeed, as clinical researchers reach out to afflicted persons who are not selected on the basis of treatment-seeking or volunteer motives, nonadherence and incomplete data are likely to become increasingly common and challenging in effectiveness evaluation. Although these challenges are real, there are important reasons to examine the effectiveness of treatments in representative samples of persons outside of academic medical centers. Whereas ITT and ad hoc methods of filling in missing data can provide rigorous answers to effectiveness questions, causal theorists are drawn to questions of efficacy. Given that we find that a treatment plan has no clear effectiveness, do we then conclude that the treatment would never be efficacious? Or suppose that overall effectiveness is demonstrated: Can we look more carefully at the data to determine if the treatment caused preventable side effects? Learning more about the specific causal paths in the development and/or treatment of psychopathology is what stimulates new ideas about future interventions. It also helps to clarify how definitive the results of clinical trials or social experiments are (e.g., Barnard, Frangakis, Hill, & Rubin, 2003). Toh and Hernán (2008) contrast findings based on an ITT approach to findings based on causally informative analyses.
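The contrast between ITT and an "as treated" comparison can be demonstrated with a simulation. This sketch is hypothetical (the adherence rates and effect sizes are invented for exposition): patients with worse prognoses tend to abandon the assigned drug, so comparing patients by the treatment they actually took confounds treatment with prognosis, while the ITT contrast answers the effectiveness question about the assignment itself, attenuated by nonadherence.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

severity = rng.normal(size=n)            # latent prognosis; higher = worse
assigned_a = rng.random(n) < 0.5         # randomized to drug (A) vs control (B)

# Nonadherence: patients with poor prognosis (here, severity > 1) often stop
# taking the assigned drug.  All probabilities are invented for illustration.
adheres = rng.random(n) < np.where(severity > 1.0, 0.4, 0.9)
takes_drug = assigned_a & adheres

drug_effect = -2.0                       # true efficacy: lowers symptoms by 2
symptoms = severity + drug_effect * takes_drug + rng.normal(scale=0.5, size=n)

# Intent-to-treat: compare groups as randomized (effectiveness of assignment).
itt = symptoms[assigned_a].mean() - symptoms[~assigned_a].mean()

# "As treated": compare by treatment actually received (confounded by prognosis).
at = symptoms[takes_drug].mean() - symptoms[~takes_drug].mean()

print(round(itt, 2), round(at, 2))
```

In this setup the ITT estimate is attenuated toward zero relative to the true efficacy of -2 (only adherers receive the drug), while the as-treated estimate overshoots it, because the drug-taking group has systematically better prognoses than the comparison group.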
Nonexperimental Observational Studies

Just as nonadherence and selective missing data can undermine the randomized equivalence of treatment groups A and B, selection effects make it especially difficult to compare groups whose exposure to different agents is simply observed in nature. Epidemiologists, economists, and other social scientists have invested considerable effort in the development of methods that allow for adjustment of confounding due to selection biases. Many of these methods are reviewed or further developed in this volume (see Chapters 6 and 11). In fact, the problems that "break" randomized experiments with humans (Barnard et al., 2003) have formal similarity to selection, measurement, and attrition effects in nonexperimental studies. A simple version of this formal representation is shown in Figure 1.1B. In this version, some confounding variable, C, is shown to be related to the treatment, T, and the outcome, Y. If variable C is ignored (either because it is not measured or because it is left out of the analysis for other reasons), then the estimated causal effect of T on Y will be biased. There can be multiple types of confounding effects, and missing-data processes may be construed as examples of these. Often, the confounding is more complex than illustrated in Figure 1.1. For example, Breslau in this volume (Chapter 12) considers examples where T is the experience of trauma (vs. no trauma) and Y is a set of avoidance/numbing symptoms consistent with posttraumatic stress syndrome. Although the causal association between T and Y is often assumed, Breslau considers the existence of other variables, such as personality factors, that might be related to either exposure to T or appraisal of the trauma and the likelihood of experiencing avoidance or numbing.
If the confounding variables are not identified as causal alternatives and if data that are informative of the alternate causal paths are not obtained, then the alleged causal effect of the trauma will be overstated.
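The confounding structure of Figure 1.1B can be demonstrated numerically. In this invented example (all effect sizes are fabricated for illustration), a confounder C raises both the probability of exposure T and the level of the outcome Y; the naive group contrast overstates the causal effect of T, while stratifying on C and averaging the within-stratum contrasts recovers it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Confounder C (e.g., a personality factor) raises both the probability of
# exposure T and the outcome Y, as in Figure 1.1B.  Numbers are invented.
c = rng.random(n) < 0.5                       # binary confounder
t = rng.random(n) < np.where(c, 0.8, 0.2)     # exposure probability depends on C
y = 1.0 * t + 2.0 * c + rng.normal(size=n)    # true causal effect of T is 1.0

naive = y[t].mean() - y[~t].mean()            # confounded: overstates the effect

# Stratify on C and average the within-stratum contrasts (standardization).
adjusted = np.mean([y[t & (c == k)].mean() - y[~t & (c == k)].mean()
                    for k in (True, False)])

print(round(naive, 2), round(adjusted, 2))
```

The naive contrast is inflated because exposed subjects disproportionately carry the confounder; within levels of C the exposure-outcome contrast returns to the true effect of 1.0, which is exactly what "adjusting for C" accomplishes when C is measured.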
Innovative Designs and Analyses for Improving Causal Inferences

When studying the effects of purported causes such as environmental disasters, acts of war or terror, bereavement, or illness/injury, psychopathology researchers often note that random assignment is not possible but that a hypothetical random experiment would provide the gold standard for clear causal inference. This hypothetical ideal can be useful in choosing quasi-experimental designs that find situations in nature that seem to mimic random assignment. There are several classes of quasi-experimental design that create informative contrasts in the data by clever definition of treatment groups rather than random assignment (Shadish, Cook, & Campbell, 2002). For example, Costello and colleagues describe a situation in which a subset of
rural families in the Great Smoky Mountains developmental study (see Chapter 11; Costello, Compton, Keeler, & Angold, 2003) were provided with new financial resources by virtue of being members of the Cherokee Indian tribe at the time the tribe opened a new casino. Tribe members received payments from the casino's profits, while their nontribe neighbors did not. Costello and colleagues describe how this event, along with a developmental model, allowed strong claims to be made about the protective impact of family income on drug use and abuse. Modern genetics provides new and exciting tools for creating groups that appear to be equivalent in every respect except exposure. Kendler in this volume (Chapter 4) describes how twin studies can create informative quasi-experimental designs. Suppose that we are interested in an environmental exposure that is observed in nature and that the probability of exposure is known to be related to psychological characteristics such as cognitive ability, risk-taking, and neuroticism, which are known to have genetic loadings. If we can find monozygotic and dizygotic twin pairs with individual twins who differ in exposure, then we have a strong match for selection. Modern genetic analyses are useful in isolating the risk of exposure (a selection factor) from the causal effect of the exposure on subsequent psychological functioning. Twin studies are not necessary to take advantage of genetic thinking to overcome selection effects. Davey Smith (Chapter 9) writes that researchers are learning about genetic variants that determine how various environmental agents (e.g., alcohol, cannabis) are metabolized and that these variants are nearly randomly distributed in certain populations. Under the so-called Mendelian randomization (Davey Smith & Ebrahim, 2003) approach, a causal theory that involves known biochemical pathways can be tested by comparing outcomes following exposure in persons who differ in the targeted genetic location.
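The logic of Mendelian randomization can be sketched as an instrumental-variable calculation. Everything below is hypothetical (the variant, exposure, and all effect sizes are invented; this is not an analysis from Davey Smith's chapter): a genotype G shifts an exposure but shares no cause with the outcome, so the ratio of the G-outcome slope to the G-exposure slope recovers the causal effect even when the direct exposure-outcome regression is confounded.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical setup: a genetic variant G alters how much of an exposure a
# person accumulates, but is unrelated to the unmeasured confounder U.
g = rng.binomial(2, 0.3, size=n)              # variant allele count (0, 1, 2)
u = rng.normal(size=n)                        # unmeasured confounder
exposure = 0.5 * g + 1.0 * u + rng.normal(size=n)
outcome = 0.8 * exposure + 1.0 * u + rng.normal(size=n)   # true effect: 0.8

confounded = np.polyfit(exposure, outcome, 1)[0]   # biased upward by U

# Wald ratio: effect of G on the outcome divided by effect of G on the exposure.
g_on_outcome = np.polyfit(g, outcome, 1)[0]
g_on_exposure = np.polyfit(g, exposure, 1)[0]
wald = g_on_outcome / g_on_exposure

print(round(confounded, 2), round(wald, 2))
```

Because the variant is (approximately) randomly assigned at conception, it plays the role of the coin flip in a trial: the confounder U moves the exposure and the outcome together, but it cannot move G, so the ratio estimator is immune to that source of bias.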
Mendelian randomization strategies and co-twin designs make use of genetics to provide insightful causal analyses of environmental exposures. Random genetic variation can also be tapped to examine the nature of genetic associations themselves. State (Chapter 10) describes how rare genetic variants, such as nucleotide substitutions, repeats, or copy-number variations, can be informative about the genetic mechanisms of complex diseases. He illustrates this causal approach with findings on autism. Because the rare variants appear to arise at random, the selection issues that concern most observational studies are less threatening.
Analytic Approaches to Confounding

Understanding the nature of the effects of confounding by nonadherence and missing values in clinical trials and by selection effects in nonexperimental
comparative studies has been aided by formal representations of the nature of causal effects in the work of Pearl (2000; see Chapter 3). Pearl has promoted the use of directed acyclic graphs (DAGs), which are explicit statements of assumed causal paths. These graphical representations can be used to recognize sources of confounding as well as to identify sufficient sets of variables to adjust for confounding. The graphs can also be used to gain new insights into the meaning of direct and indirect effects. The interpretation of these graphs is enhanced by Pearl's "do" operator, which states explicitly that a variable, Ti, can be forced to take one fixed value, do(Ti = ti), or an alternative. For example, if T is an indicator of treatment A or B, then this operator explicitly asks what would happen if individual i were forced to have one treatment or the other. The formal analysis of causation requires consideration, whether empirical or hypothetical, of what would happen to the causal "descendants" of a variable if it were changed from one fixed value to a different fixed value. A particularly useful feature of the formal exercise is the consideration of competing causal paths that can bias the interpretation of a given set of data, as well as the consideration that the causal model might differ across individuals. Once such paths are articulated, investigators are often able to obtain measures of possible confounding variables. A question of great interest among modern causal analysts is how to use these measures to eliminate bias. Traditionally, the confounders are simply added as "control" variables in linear regression or structural equation models (Morgan & Winship, 2007; Bollen, 1989).
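The traditional adjustment strategy can be made concrete with a small simulation. The following numpy sketch is our own hypothetical illustration (the coefficients are not taken from any study): when the confounder C is measured and the linear-model assumptions hold, adding C as a control variable recovers the causal effect that the unadjusted regression overstates.

```python
import numpy as np

# Hypothetical data-generating process: a measured confounder C influences
# both treatment T and outcome Y, so the naive T-Y regression is biased.
rng = np.random.default_rng(0)
n = 100_000
C = rng.normal(size=n)                        # measured confounder
T = 0.8 * C + rng.normal(size=n)              # selection into treatment depends on C
Y = 2.0 * T + 1.5 * C + rng.normal(size=n)    # true causal effect of T on Y is 2.0

def ols(predictors, y):
    """Least-squares coefficients: intercept first, then one per predictor."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols([T], Y)[1]        # ignores C -> biased upward (about 2.7 here)
adjusted = ols([T, C], Y)[1]  # "holding constant C" -> approximately 2.0
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

The adjusted coefficient is unbiased only because every condition in the list that follows holds by construction in the simulation; with a nonlinear or unmeasured confounder, the same adjustment would fail.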
If (1) C is known to be linearly related to T, (2) C is known to be linearly related to Y, (3) the relation of T to Y is known to be the same for all levels of C, (4) C is measured without error, and (5) the set of variables in C is known to represent all aspects of selection bias, then the regression model approach will yield an unbiased estimate of the causal effect of T on Y. The adjusted effect is often described with phrases such as "holding constant C," and the resulting statement "the effect of T on Y is X" can be interpreted as an average causal effect. Causal analysts often emphasize that the assumptions needed to make an adjustment valid are untestable. An investigator might argue for the plausibility of the linear model assumptions by testing whether nonlinear terms improve the fit of the linear models and by testing for interactions between C and T in the prediction of Y. However, these empirical tests will leave the skeptic unconvinced if the study sample is small and the statistical power of the assumption tests is limited. Another approach to adjustment relies on the computation of propensity scores (e.g., Rosenbaum & Rubin, 1983), which are numerical indicators of how similar individuals in condition A are to individuals in condition B. These scores are computed as summaries of multivariate representations of the similarity of the individuals in the two groups. The propensity scores
themselves are created using methods such as logistic regression and nonlinear classification algorithms, with predictor variables that are conceptually prior to the causal action of T on Y. One important advantage of this approach is that the analyst is forced to study the distributions of the propensity scores in the two groups to be compared. Often, one discovers that some persons in one group have no match in the other group, and vice versa. These unique individuals are not simply included as extrapolations, as they are in traditional linear model adjustments, but are instead set aside from the estimation of the causal effect. The computation of the adjusted group difference is based either on matching of propensity scores or on forming propensity score strata. This approach is used to make the groups comparable in a way that approximates random assignment, given correct estimation of the propensity score (see Gelman & Hill, 2007). Propensity score adjustment does not assume a simple linear relation between the confounder variables and the treatment, but neither does it lead to a unique result: different methods for computing the propensity score can yield different estimates of the average causal effect. The ways that propensity scores might be used to improve causal inference continue to be developed. For example, based on work by Robins (1993), Toh and Hernán (2008) describe a method called inverse probability weighting for adjustment of adherence and retention in clinical trials. This method uses propensity score information to give high weight to individuals who are comparable across groups and low weight to individuals who are unique to one group.
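A minimal sketch of inverse probability weighting follows. The variable names, effect sizes, and the numpy-only logistic fit are our own illustrative assumptions, not taken from Toh and Hernán: the propensity model is estimated first, and the inverse-probability weights then reweight each group to resemble the full sample.

```python
import numpy as np

# Hypothetical setup: two pre-treatment covariates drive selection into T.
rng = np.random.default_rng(1)
n = 50_000
X = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
T = rng.binomial(1, p_true)                        # nonrandom treatment assignment
Y = 1.0 * T + 1.2 * X[:, 0] + 0.7 * X[:, 1] + rng.normal(size=n)  # true effect 1.0

# Fit the propensity score P(T=1 | X) by logistic regression (gradient ascent,
# kept numpy-only so the sketch stays self-contained).
Xd = np.column_stack([np.ones(n), X])
w = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-Xd @ w))
    w += 2.0 * Xd.T @ (T - p) / n
ps = 1 / (1 + np.exp(-Xd @ w))

naive = Y[T == 1].mean() - Y[T == 0].mean()        # confounded group difference
# Weight each person by the inverse probability of the group actually observed.
ipw = np.average(Y[T == 1], weights=1 / ps[T == 1]) - \
      np.average(Y[T == 0], weights=1 / (1 - ps[T == 0]))
print(f"naive: {naive:.2f}, IPW-adjusted: {ipw:.2f}")
```

As in the text, the method works only when the propensity model captures the selection process; extreme scores near 0 or 1 signal individuals who are effectively unique to one group.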
Whereas direct adjustment and calculation of propensity scores make use of measured variables that describe the possible imbalance of the groups indexed by T, the method of instrumental variables attempts to adjust for confounding by using knowledge about the relation of a set of variables I to the outcome Y. If I can affect Y only through the variable T, then it is possible to separate the causal effect from spurious correlation between the treatment (T) and the outcome (Y). Figure 1.2 shows a representation of this statement. The instrumental variable I is said to cause a change in T and, through this variable, to affect Y.
Figure 1.2 Schematic representation of how an instrumental variable (I) can isolate the causal effect from the correlation between the treatment variable (T) and the error term (E).
There may be other reasons why T is related to Y (as indicated by the correlation between T and E), but if the instrumental variable model is correct, the causal effect can be isolated. The best example is when I is an indicator of random assignment, T is a treatment condition, and Y is the outcome. On average, random assignment is related to Y only through the treatment regime T. Economists and others have shown that instrumental variables allow confounding to be eliminated even if the nature of the confounding process is not measured. In nonexperimental studies, the challenge is to find valid instrumental variables. The arguments are often made on the basis of scientific theories of the causal process. For example, in the Costello et al. (2003) Great Smoky Mountain Study, if tribal membership had never been related to substance use by adolescents in a rural community but became related once it was associated with casino profit payments, then a plausible case can be made for tribal membership being an instrumental variable. However, as Hernán and Robins (2006) discuss, careful reexamination of instrumental variable assumptions can raise questions about essentially untestable assumptions about the causal process. The analytic approaches to confounding can provide important insights into the effects of adherence and retention in clinical trials and into alternate explanations of causal effects by selection processes in nonexperimental studies. As briefly indicated, the different approaches make different assumptions, and these assumptions can lead to different estimates of causal effects. Researchers who strive for closure from a specific study find such a lack of clarity unsatisfying.
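The logic of Figure 1.2 can be illustrated with a hypothetical simulation (the coefficients are our own): because the instrument I is unrelated to the unmeasured confounder U, the ratio Cov(I, Y)/Cov(I, T) recovers the causal effect even though the ordinary regression of Y on T does not.

```python
import numpy as np

# Hypothetical instrumental-variable setup: I might be random assignment or,
# as in the Great Smoky Mountain example, tribal membership; U is unmeasured.
rng = np.random.default_rng(2)
n = 100_000
U = rng.normal(size=n)                        # unmeasured confounder
I = rng.binomial(1, 0.5, size=n)              # instrument: affects Y only via T
T = 1.0 * I + 0.8 * U + rng.normal(size=n)
Y = 2.0 * T + 1.5 * U + rng.normal(size=n)    # true causal effect of T is 2.0

ols_est = np.cov(T, Y)[0, 1] / np.var(T)          # biased: T is correlated with U
iv_est = np.cov(I, Y)[0, 1] / np.cov(I, T)[0, 1]  # Wald / two-stage estimator
print(f"OLS: {ols_est:.2f}, IV: {iv_est:.2f}")
```

If the exclusion restriction fails, i.e., if I touches Y through any path other than T, the same ratio is biased, which is exactly the untestable assumption Hernán and Robins (2006) scrutinize.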
Indeed, one of the advantages of the ITT analysis of randomized clinical trials is that it can give a single clear answer to the question of treatment effectiveness, especially when the analyses follow rigorous guidelines for a priori specification of primary outcomes and are based on data with adequate statistical power.
Temporal Patterns of Causal Processes

As helpful as the DAG representations of cause can be, they tend to emphasize causal relations as if they occur all at once. These models may be perfectly appropriate in engineering applications where a state change in T is quickly followed by a response change in Y. In psychopathology research, on the other hand, both processes that are hypothesized to be causes and processes that are hypothesized to be effects tend to unfold over time. For example, in clinical trials of fluoxetine, the treatment is administered for 4–6 weeks before it is expected to show effectiveness (Quitkin et al., 2003). When the treatment is ended, the risk of relapse is typically expected to increase
with time off the medication. There are lags in both the initial effect of the treatment and the risk of relapse. Figure 1.3A shows one representation of this effect over time, where the vertical arrows represent a pattern of treatments. Another pattern is expected in preventive programs aimed at reducing externalizing problems in high-risk children through the improvement of parenting skills of single mothers. The Incredible Years intervention of Webster-Stratton and her colleagues (e.g., Gross et al., 2003) takes 12 weeks to unfold and involves both parent and teacher sessions, but the impact of the program is expected to continue well beyond the treatment period. The emphasis on positive parenting, warm but structured interactions, and reduction of harsh interactions is expected to affect the mother–child relationship in ways that promote health, growth, and reduction of conduct problems. Figure 1.3B shows how this pattern might look over time, with an initial lag of treatment and a subsequent shift. For some environmental shocks or chemical agents with pharmacokinetics of rapid absorption, metabolism, and excretion, the temporal patterns might be similar to those found in engineering applications. These are characterized by rapid change following the treatment and fairly rapid return to baseline after the treatment is ended. Figure 1.3C illustrates this pattern, which might be typical of heart rate change following a mild threat such as a fall or of headache relief following ingestion of a dose of analgesic. As Costello and Angold discuss (Chapter 11), the consideration of these patterns of change is complicated by the fact that the outcome being studied might not be stable. Psychological/biological processes related to symptoms might be developing due to maturation or oscillating due to circadian rhythms, or they might be affected by other processes related to the treatment itself.
In randomized studies, the control group can give a picture of the trajectory of the naturally occurring process, so long as adequate numbers of assessments are taken over time. However, the comparison of the treatment and control groups may no longer yield a single outcome but, rather, a series of estimated causal effects at different end points, both because of the hypothesized time patterns illustrated in Figure 1.3 and because of the natural course of the processes under study. Although one might expect that effects observed at adjacent times are due to the same causal mechanism, there is no guarantee that the responses are from the same people. One group of persons might have a short-lived response at one time, and another group might have a response at the next measured time point. Muthén and colleagues' parametric growth mixture models (Chapter 7) shift the attention to the individual over time, rather than to specific (and perhaps arbitrarily chosen) end points. These models allow the expected
Figure 1.3 Examples of time trends relating treatments (indicated by vertical arrows) and response on Y. Panel A shows an effect that takes time to be seen and additional time to diminish when the treatment is removed. Panel B shows an effect that takes time to be seen, but then is lasting. Panel C shows an effect that appears rapidly and dissipates soon after the treatment ends.
trajectory in group A to be compared with that in group B. This class of models also considers various patterns of individual differences in the trajectories, with an assumption that some persons in treatment group A might have trajectories just like those in placebo group B. Although the parametric assumptions about the nature of the trajectories can provide interesting insights and possibly increased statistical power, causal analysts can have strikingly different opinions about the wisdom of making strong untestable assumptions. Scientists working on problems in psychopathology often have a general idea of the nature of the trajectory, and this is reflected in the timing of the measurements. However, unless repeated measurements are taken at fairly short intervals, it is impossible to document the nature of the alternative patterns shown in Figure 1.3. Such basic data are needed to choose among the possible parametric models that can be fit in growth mixture models, and they are also necessary to implement the ideas of Klein (Chapter 13), which involve starting and stopping treatment repeatedly to determine who is a true responder and who is not. Note that Klein's proposal is related to classic crossover designs in which one group is assigned treatment sequence (A, B) and another group is assigned (B, A). This design assumes a temporal pattern like that in Figure 1.3C, and it requires a "washout" period during which the effect of the first treatment is assumed to have dissipated. The literature on these designs is extensive (e.g., Fleiss, 1986; Piantadosi, 2005), and it seems to provide an intuitive solution to Holland's (1986) fundamental problem of causal inference. If one cannot observe both potential outcomes, YA(U) and YB(U), at the same instant, then why not fix the person U (say, U = u) and estimate YA(u) and YB(u) at different times?
Holland called this a scientific approach to the fundamental problem, but he asserted that the causal estimate based on this design depends on an untestable homogeneity assumption, namely, that person u at time 1 is exactly the same as person u at time 2, except for the treatment. Although the test of that assumption cannot be definitive, an accumulated understanding of temporal patterns of effects will make the assumption more or less plausible.
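A hypothetical sketch of the crossover logic follows (the numbers and the assumption of a complete washout are ours): differencing within persons removes the stable person effect u, and averaging the two sequences removes the common period effect.

```python
import numpy as np

# Hypothetical two-period crossover with the Figure 1.3C pattern: effects
# appear quickly and wash out completely between periods (the homogeneity
# assumption Holland highlighted is built in by construction).
rng = np.random.default_rng(4)
n = 500
person = rng.normal(size=n)            # stable person effect (the fixed "u")
period = 0.5                           # common time trend between periods
tau = 2.0                              # true treatment effect
noise = lambda: rng.normal(scale=0.5, size=n // 2)

g1, g2 = person[: n // 2], person[n // 2:]   # sequence (A, B) vs. (B, A)
y1_g1 = g1 + tau + noise()             # group 1, period 1: treated
y2_g1 = g1 + period + noise()          # group 1, period 2: control
y1_g2 = g2 + noise()                   # group 2, period 1: control
y2_g2 = g2 + period + tau + noise()    # group 2, period 2: treated

# Within-person differences cancel the person effect; averaging the two
# sequences cancels the period effect, leaving the treatment effect tau.
est = ((y1_g1 - y2_g1).mean() + (y2_g2 - y1_g2).mean()) / 2
print(f"crossover estimate: {est:.2f}")
```

If the washout were incomplete, carryover from the first treatment would contaminate the second period and the same estimator would be biased, which is why the design leans on the untestable homogeneity assumption discussed above.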
Mediation and Moderation of Causal Effects

Just as psychopathology researchers are willing to consider scientific approaches to the fundamental problem of causal inference using crossover designs, they may also be inclined to develop intuitive statistical models of causal process. For example, Freedland et al. (2009) found that assignment to a cognitive behavior therapy (CBT) condition was related to reduced
depression 3 months after treatment completion among depressed patients who had experienced coronary artery bypass surgery. A researcher might ask whether the improvement was due to mastery of one or another component of CBT, namely, (1) controlling challenging, distressing automatic thoughts or (2) changing and controlling dysfunctional attitudes. Suppose the researcher had included assessments of these two cognitive skills at the 2-month assessment (1 month before the end point assessment). The question could be asked, Can the effect of treatment (T = CBT) on depression (Y) be explained by an intervening cognitive skill (M)? Kenny and colleagues (Judd & Kenny, 1981; Baron & Kenny, 1986) formalized the mediation analysis approach to this question using a set of linear models that are represented in Figure 1.4. Panel A shows a causal relation between T and Y (represented as c), and Panel B shows how that effect might be explained by mediator variable M. There are four steps in the Baron and Kenny tradition: (1) show that T is related to Y (effect c in Panel A); (2) show that T is related to M (effect a in Panel B); (3) show that, after adjusting for T, M is related to Y (effect b in Panel B); and (4) determine whether the direct effect of T on Y, after adjusting for M, remains non-zero (effect c′ in Panel B). If the direct effect can be considered to be zero, Baron and Kenny described the result as complete mediation; otherwise, it is partial mediation. In addition to these steps, the mediation tradition suggests estimating the indirect effect of T on Y through M as the product of the estimates of a and b in Figure 1.4B and testing the null hypothesis that the product is equal to zero (see MacKinnon, 2008). It is difficult to overestimate the impact of this approach on informal causal analysis in psychopathology research. The Baron and Kenny report alone has been cited more than 17,000 times, and thousands of these citations are by psychopathology researchers.
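Using the chapter's hypothetical values a = 0.7, b = 0.4, and c′ = 0.28, the Baron and Kenny steps can be sketched in numpy (the simulation itself is our own illustration, with independent residuals — the favorable case):

```python
import numpy as np

# Simulated mediation with the chapter's hypothetical population values:
# a = 0.7, b = 0.4, c' = 0.28, so the total effect c = 0.28 + 0.7*0.4 = 0.56.
rng = np.random.default_rng(3)
n = 200_000
T = rng.binomial(1, 0.5, size=n).astype(float)   # randomized treatment
M = 0.7 * T + rng.normal(size=n)                 # mediator: a = 0.7
Y = 0.28 * T + 0.4 * M + rng.normal(size=n)      # c' = 0.28, b = 0.4

def ols(predictors, y):
    """Least-squares coefficients: intercept first, then one per predictor."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

c = ols([T], Y)[1]                 # step 1: total effect
a = ols([T], M)[1]                 # step 2: T -> M
_, c_prime, b = ols([T, M], Y)     # steps 3 and 4: b and the direct effect c'
print(f"c={c:.2f}  a={a:.2f}  b={b:.2f}  c'={c_prime:.2f}  a*b={a*b:.2f}")
```

With independent residuals the product a*b recovers the indirect effect of 0.28; the next sections describe why this happy result is fragile.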
Often, the mediation approach is used in the context of experiments such as those already described, but at other times it is used to explain associations observed in cross-sectional surveys. The latter applications raise special problems.
Figure 1.4 Traditional formulation of the Baron and Kenny (1986) mediation model, with Panel A showing the total effect (c) and Panel B showing the indirect (a*b) and direct (c′) effect decomposition.
Although Kenny and his colleagues have explicitly warned that the analysis is appropriate only when the ordering of the variables is unambiguous, many published studies have not established this order rigorously. Even if an experimental design guarantees that the mediating and outcome processes (M, Y) follow the intervention (T), M and Y themselves are often measured at the same point in time, and the association between M and Y is estimated as a correlation rather than as a manipulated causal relation. This leaves open the possibility of important bias in the estimated indirect effect of T on Y through M. Figure 1.5A is an elaboration of Figure 1.4B that represents the possibility of influences other than T on the association between M and Y. This is shown as correlated residual terms, eM and eY. For example, if we were trying to explain the effect of CBT (T) on depression (Y) through changes in control of dysfunctional attitudes (M), we could surmise that a correlation between the degree of dysfunctional attitudes and depression symptoms would be observed even in the control group. Baseline intuitions, insight, or self-help guides in the lay media might have led to covariation in the degree of dysfunctional attitudes and depression. In fact, part of this covariation could reflect reverse pathways, such that less depressed persons more actively read self-help strategies and then change their attitudes as a function of
Figure 1.5 Formulation of the mediation model to show correlated errors (Panel A) and an extended model that includes baseline measures of the mediating variable (M0) and the outcome measure (Y0) (Panel B).
the reading. If these sources of covariation are ignored, then the estimate of the b effect will be biased, as will the product a * b. In most cases, the bias will overestimate the amount of the effect of T that goes through M. Hafeman (2008) has provided an analysis of this source of bias from an epidemiologic and causal-analysis framework. Although Figure 1.5A represents a situation where b will be biased when the usual Baron and Kenny (1986) analysis is carried out, the model shown in Figure 1.5A cannot itself be used to respecify the analysis to eliminate the bias, because the model is not empirically identified: we cannot estimate the size of the correlation between eM and eY while also estimating a, b, and c′. However, investigators often have information, typically ignored, that can be used to resolve this problem. Figure 1.5B shows a model with adjustments for baseline (prerandomization) measures of the outcome (Y0) and the mediating process (M0). When these baseline measures are included, it is possible both to account for the baseline association between Y and M and to estimate a residual correlation between Y and M. The residual correlation can be estimated if it is reasonable to consider the baseline M0 as an instrumental variable that has an impact on the outcome Y only through its connection with the postrandomization measure of the mediating process, M.1

1. There can be further refinements to the model shown in Figure 1.5B. One might consider a model where Y0 is related to the mediating process M. For example, if less depressed persons in the study were inclined to seek self-help information and M represented new cognitive skills that are available in the media, then the path between Y0 and M could be non-zero and negative.

How important can this adjustment be? Consider a hypothetical numerical example in which a = 0.7, b = 0.4, and c′ = 0.28. Assuming that the effects are population values, these values indicate a partial mediation model. The total effect of T on Y (c in Figure 1.4A) is the sum of the direct and indirect effects, 0.56 = 0.28 + (0.70)(0.40), and exactly half the effect goes through M. The stability of the mediation process from baseline to postintervention is represented by g1, and the comparable stability of the outcome variable by g2. Finally, the degree of correlation between M0 and Y0 is rMY. Figure 1.6 shows results from an analysis of the bias of using the Figure 1.4B model to represent mediation at different levels of correlation between M0 and Y0. The results differ depending on how stable the mediating and outcome processes are in the control group. (For simplicity, the figure assumes that the stabilities are the same, i.e., g1 = g2.) Focusing on the estimate of the indirect effect, a * b, one can see that there is no bias if M and Y have no stability: The estimate is the expected 0.28 for all values of rMY when g1 = g2 = 0. However, when the M and Y processes are stable and the correlation between M0 and Y0 is substantial, the indirect effect is overestimated. Given that symptoms, such as depression, and coping strategies, such as cognitive skills, tend to be quite stable in longitudinal studies, we must conclude that important bias in estimates of the indirect effect is likely to be the rule rather than the exception. When investigators compute mediation analyses without taking into account the correlation of M and Y at baseline, they run the risk of concluding that an experimental result is completely mediated by an intervening process when in fact there may be direct effects or other intervening processes also involved.

Figure 1.6 Chart showing the expected values of the indirect effect estimated from the model in Panel B of Figure 1.4 when the actual model was Panel B of Figure 1.5, with values a = .7, b = .4, and c′ = .28. Different lines show values associated with different stabilities (.0 to .8) of the M and Y processes (g1, g2 in Figure 1.5B) as a function of the baseline correlation between M and Y.

The use of baseline measures is not the only way to make adjustments for spurious correlations between M and Y. In social psychology, Spencer, Zanna, and Fong (2005) argued that the causal involvement of the mediating path would be most convincing if researchers developed supplemental studies that manipulated M directly through randomized experiments. For example, in a long-term study of CBT effects on depression, an investigator might randomly assign persons in the treatment group to experience (or not) a "booster" session that reminds the patient of the cognitive skills taught previously in the CBT sessions. One of the more challenging assumptions of this approach is that the nature of the M change in a direct intervention is identical to the nature of the M change that follows manipulation of T. It is possible, for example, that the booster intervention on cognitive skills might have a different impact from the original intervention because the patient has become aware of barriers to implementing the skill set. As a result of that experience, the patient might attend to different aspects of the booster session than of the original intervention. This difference could affect the strength of the relation between M and T in the direct-manipulation condition. Nonetheless, the new information provided by direct manipulation of M is likely to increase the confidence one has in the estimate of the indirect causal path.

Noting that it is the correlational nature of the link between M and Y that makes it challenging to obtain unbiased estimates of indirect (mediated) effects in randomized studies, it should not be surprising that the challenges are much greater in nonexperimental research. A number of studies published in peer-reviewed journals attempt to partition assumed causal effects into direct and indirect components. For example, Mohr et al. (2003) reported that an association between traumatic stress and elevated health problems in 713 active police officers was fully mediated by subjective sleep problems in the past month. All of the variables were measured in a cross-sectional survey. The path from stress to sleep to health problems is certainly plausible, but it is also possible that health problems raise the risk of both stress and sleep problems. Even if there is no dispute about the causal order, there can be dispute about the meaning of the mediation analysis in cases such as this. Presumably, the underlying model unfolds on a daily basis: Stress today disrupts sleep tonight, and this increases the risk of health problems tomorrow. One might hope that cross-sectional summaries of stress and sleep patterns obtained for the past month would be informative about the mediating process. However, Maxwell and Cole (Cole & Maxwell, 2003; Maxwell & Cole, 2007) provided convincing evidence that there is no certain connection between a time-dependent causal model and a result based on cross-sectional aggregation of data.
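The correlated-residual bias of Figure 1.5A can be made concrete with a hypothetical simulation (our own illustration, using the chapter's values a = 0.7, b = 0.4, c′ = 0.28 and an assumed residual correlation of .5): fitting the Figure 1.4B model inflates b, and hence the indirect effect, while distorting the direct effect.

```python
import numpy as np

# The residuals of M and Y share a correlation rho that the Figure 1.4B
# analysis ignores.  True values: a = 0.7, b = 0.4, c' = 0.28 (so a*b = 0.28).
rng = np.random.default_rng(5)
n = 200_000
rho = 0.5
eM, eY = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
T = rng.binomial(1, 0.5, size=n).astype(float)   # randomized treatment
M = 0.7 * T + eM
Y = 0.28 * T + 0.4 * M + eY

def ols(predictors, y):
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

a_hat = ols([T], M)[1]             # unbiased, because T is randomized
_, c_hat, b_hat = ols([T, M], Y)   # b_hat absorbs rho: about 0.4 + 0.5 = 0.9
print(f"estimated indirect effect a*b = {a_hat * b_hat:.2f} (true 0.28)")
print(f"estimated direct effect c' = {c_hat:.2f} (true 0.28)")
```

In this sketch the indirect effect more than doubles while the estimated direct effect turns negative, so an analyst relying on the unadjusted model would declare complete (indeed inflated) mediation, just as the text warns.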
They studied the implications of a stationary model in which causal effects were observed on a daily basis for a number of days or parts of days. In addition to the mediation effects represented in Figure 1.4B (a, b, c′), they represented the stability of the T, M, and Y processes from one time point to the next. They then studied the inferences that would be made from a cross-sectional analysis of the variables under different assumptions about the mediation effects and the stability of the processes. The bias of the cross-sectional analysis was greatly influenced by the process stability, and the direction of the bias was not consistent: sometimes the bias of the indirect effect estimate was positive, and sometimes it was negative. The Maxwell and Cole work prompts psychopathology researchers to think carefully about the temporal patterns in mediation and to take seriously the assumptions that were articulated by Judd and Kenny (1981). Others have called for modifications of the original positions taken by Kenny and his colleagues. An important alternative perspective has been advanced by MacArthur Network researchers (Kraemer, Kiernan, Essex, & Kupfer, 2008),
who call into question the Baron and Kenny (1986) distinction between mediation and moderation. As we have already reviewed in Figure 1.4, a third variable is said by Baron and Kenny (1986) to be a mediator if it both has a direct association with Y adjusting for T and can be represented as being related linearly to T. A moderator, according to Baron and Kenny (1986), is a third variable (W) that is involved in a statistical interaction with T when T and W are used to predict Y. The MacArthur researchers note that the Baron and Kenny distinction is problematic when various nonlinear transformations of Y are considered: such transformations can produce interaction models even if there is no evidence that the causal effect is moderated. They propose to limit the concept of moderation to effect modifiers. If the third variable represents a status before the treatment is applied and if the size of the T–Y effect varies with the level of that status, then moderation is demonstrated from the MacArthur perspective. For randomized studies, the moderating variable would be expected to be uncorrelated with T. If psychopathology researchers embrace the MacArthur definition of moderation, considerable confusion in the literature will be avoided in the future.
Conclusion

The time is ripe for psychopathology researchers to reconsider the conventions for making causal statements about mental health processes. On the one hand, conventions such as ITT analyses of clinical trials have led to conservative conclusions about the causal processes involved in the changes following interventions; on the other hand, rote application of the Baron and Kenny (1986) steps for describing mediated paths has led to premature closure regarding which causal paths account for intervention effects. The old conventions are efficient for researchers in that they prescribe a small number of steps to follow in preparing manuscripts, but they limit the insights that are possible from a deeper consideration of causal mechanisms and pathways. The new approaches to causal analysis will not lead to quick or definitive statements about which factors are causal and which are spurious, but they will allow clinical and experimental data to be viewed from multiple perspectives to reveal new causal insights. In many cases, the new approaches are likely to suggest causal heterogeneity in a population. Because of genetic differences, social context, developmental stage, timing of measurements, and random environmental flux, the size of the causal effect of an intervention T will vary from person to person. The new methods will help us appreciate how alternate summaries of the population causal effect can be affected by these distributions.
It will often take more effort to use the modern tools of causal analysis, but the benefit of the effort is that researchers will be able to talk more explicitly about interesting causal theories and patterns rather than about associations that have been edited to remove any reference to "cause" or "effect." In the long run, the more sophisticated analyses will lead to more nuanced prevention and treatment interventions and to a deeper understanding of the determinants of psychiatric problems and disorders. Many examples of these insights are provided in the chapters that follow.
References

Barnard, J., Frangakis, C. E., Hill, J. L., & Rubin, D. B. (2003). Principal stratification approach to broken randomized experiments: A case study of school choice vouchers in New York City. Journal of the American Statistical Association, 98, 299–311.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Becker, M. H., & Maiman, L. A. (1975). Sociobehavioral determinants of compliance with health and medical care. Medical Care, 13(1), 10–24.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Brotman, L. M., Gouley, K. K., Huang, K.-Y., Rosenfelt, A., O'Neal, C., Klein, R. G., et al. (2008). Preventive intervention for preschoolers at high risk for antisocial behavior: Long-term effects on child physical aggression and parenting practices. Journal of Clinical Child and Adolescent Psychology, 37, 386–396.
Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112, 558–577.
Connor, D. F., Glatt, S. J., Lopez, I. D., Jackson, D., & Melloni, R. H., Jr. (2002). Psychopharmacology and aggression. I: A meta-analysis of stimulant effects on overt/covert aggression-related behaviors in ADHD. Journal of the American Academy of Child & Adolescent Psychiatry, 41(3), 253–261.
Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between poverty and psychopathology: A natural experiment. Journal of the American Medical Association, 290, 2023–2029.
Davey Smith, G., & Ebrahim, S. (2003). "Mendelian randomization": Can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology, 32, 1–22.
Dobson, K. S. (1989). A meta-analysis of the efficacy of cognitive therapy for depression. Journal of Consulting and Clinical Psychology, 57(3), 414–419.
Efron, B. (1998). Foreword to special issue on analyzing non-compliance in clinical trials. Statistics in Medicine, 17, 249–250.
Everitt, B. S., & Pickles, A. (2004). Statistical aspects of the design and analysis of clinical trials. London: Imperial College Press.
Fleiss, J. L. (1986). The design and analysis of clinical experiments. New York: Wiley.
Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 21–29.
1 Integrating Causal Analysis into Psychopathology Research
Freedland, K. E., Skala, J. A., Carney, R. M., Rubin, E. H., Lustman, P. J., Davila-Roman, V. G., et al. (2009). Treatment of depression after coronary artery bypass surgery: A randomized controlled trial. Archives of General Psychiatry, 66(4), 387–396.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.

Greenland, S., Pearl, J., & Robins, J. M. (1999a). Causal diagrams for epidemiologic research. Epidemiology, 10(1), 37–48.

Greenland, S., Pearl, J., & Robins, J. M. (1999b). Confounding and collapsibility in causal inference. Statistical Science, 14(1), 29–46.

Gross, D., Fogg, L., Webster-Stratton, C., Garvey, C., Julion, W., & Grady, J. (2003). Parent training of toddlers in day care in low-income urban communities. Journal of Consulting and Clinical Psychology, 71, 261–278.

Hafeman, D. M. (2008). A sufficient cause based approach to the assessment of mediation. European Journal of Epidemiology, 23, 711–721.

Hansen, R. A., Gartlehner, G., Lohr, K. N., Gaynes, B. N., & Carey, T. S. (2005). Efficacy and safety of second-generation antidepressants in the treatment of major depressive disorder. Annals of Internal Medicine, 143, 415–426.

Hareli, S., & Hess, U. (2008). The role of causal attribution in hurt feelings and related social emotions elicited in reaction to other's feedback about failure. Cognition & Emotion, 22(5), 862–880.

Hegarty, J. D., Baldessarini, R. J., Tohen, M., & Waternaux, C. (1994). One hundred years of schizophrenia: A meta-analysis of the outcome literature. American Journal of Psychiatry, 151(10), 1409–1416.

Hernán, M. A., & Robins, J. M. (2006). Instruments for causal inference: An epidemiologist's dream? Epidemiology, 17(4), 360–372.

Holland, P. (1986). Statistics and causal inference (with discussion). Journal of the American Statistical Association, 81, 945–970.

Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment evaluations. Evaluation Review, 5, 602–619.

Kraemer, H., Kiernan, M., Essex, M., & Kupfer, D. J. (2008). How and why criteria defining moderators and mediators differ between the Baron & Kenny and MacArthur approaches. Health Psychology, 27(Suppl. 2), S101–S108.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.

MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York: Lawrence Erlbaum.

Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal mediation. Psychological Methods, 12(1), 23–44.

Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.

Mohr, D., Vedantham, K., Neylan, T., Metzler, T. J., Best, S., & Marmar, C. R. (2003). The mediating effects of sleep in the relationship between traumatic stress and health symptoms in urban police officers. Psychosomatic Medicine, 65, 485–489.

Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference. New York: Cambridge University Press.

O'Mahony, S. M., Marchesi, J. R., Scully, P., Codling, C., Ceolho, A., Quigley, E. M. M., et al. (2009). Early life stress alters behavior, immunity, and microbiota in rats: Implications for irritable bowel syndrome and psychiatric illnesses. Biological Psychiatry, 65(3), 263–267.
Causality and Psychopathology
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan Kaufmann.

Pearl, J. (2009). Causality: Models, reasoning and inference (2nd ed.). New York: Cambridge University Press.

Peduzzi, P., Wittes, J., Detre, K., & Holford, T. (1993). Analysis as-randomized and the problem of non-adherence: An example from the Veterans Affairs Randomized Trial of Coronary Artery Bypass Surgery. Statistics in Medicine, 12, 1185–1195.

Piantadosi, S. (2005). Clinical trials: A methodologic perspective (2nd ed.). New York: Wiley.

Quitkin, F. M., Petkova, E., McGrath, P. J., Taylor, B., Beasley, C., Stewart, J., et al. (2003). When should a trial of fluoxetine for major depression be declared failed? American Journal of Psychiatry, 160(4), 734–740.

Rehder, B., & Kim, S. (2006). How causal knowledge affects classification: A generative theory of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(4), 659–683.

Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods—applications to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393–1512.

Robins, J. M. (1993). Analytic methods for estimating HIV treatment and cofactor effects. In D. G. Ostrow & R. C. Kessler (Eds.), Methodological issues of AIDS mental health research (pp. 213–288). New York: Springer.

Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.

Rubin, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics, 6, 34–58.

Rubin, D. B. (1980). Discussion of "Randomization analysis of experimental data in the Fisher randomization test," by D. Basu. Journal of the American Statistical Association, 75, 591–593.

Rubin, D. B. (1990). Formal modes of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25, 279–292.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.

Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why experiments are often more effective than mediational analyses in examining psychological processes. Journal of Personality and Social Psychology, 89(6), 845–851.

Toh, S., & Hernán, M. (2008). Causal inference from longitudinal studies with baseline randomization. International Journal of Biostatistics, 4(1), Article 22. Retrieved from http://www.bepress.com/ijb/vol4/iss1/22
2 What Would Have Been Is Not What Would Be

Counterfactuals of the Past and Potential Outcomes of the Future

sharon schwartz, nicolle m. gatto, and ulka b. campbell
Introduction

Epidemiology is often described as the basic science of public health. A mainstay of epidemiologic research is to uncover the causes of disease that can serve as the basis for successful public-health interventions (e.g., Institute of Medicine, 1988; Milbank Memorial Fund Commission, 1976). A major obstacle to attaining this goal is that causes can never be seen but only inferred. For this reason, the inferences drawn from our studies must always be interpreted with caution. Considerable progress has been made in the methods required for sound causal inference. Much of this progress is rooted in a full and rich articulation of the logic behind randomized controlled trials (Holland, 1986). From this work, epidemiologists have a much better understanding of barriers to causal inference in observational studies, such as confounding and selection bias, and their tools and concepts are much more refined. The models behind this progress are often referred to as "counterfactual" models. Although researchers may be unfamiliar with them, they are widely (although not universally) accepted in the field. Counterfactual models underlie the methodologies that we all use. Within epidemiology, when people talk about a counterfactual model, they usually mean a potential outcomes model—also known as "Rubin's causal model." As laid out by epidemiologists, the potential outcomes model is rooted in the experimental ideas of Cox and Fisher, for which Neyman provided the first mathematical expression. It was popularized by Rubin, who extended
it to observational studies, and expanded by Robins to exposures that vary over time (Maldonado & Greenland, 2002; Hernán, 2004; VanderWeele & Hernán, 2006). This rich tradition is responsible for much of the progress we have just noted. Despite this progress in methods of causal inference, a common charge in the epidemiologic literature is that public-health interventions based on the causes we identify in our studies often fail. Even when they do not fail, the magnitudes of the effects of these interventions are often not what we expected. Levins (1996) provides a particularly gloomy assessment:

The promises of understanding and progress have not been kept, and the application of science to human affairs has often done great harm. Public health institutions were caught by surprise by the resurgence of old diseases and the appearance of new ones. . . . Pesticides increase pests, create new pest problems and contribute to the load of poison in our habitat. Antibiotics create new pathogens resistant to our drugs. (p. 1)

A less pessimistic assessment suggests that although public-health interventions may be narrowly successful, they may simultaneously lead to considerable harm. An example is the success of antismoking campaigns in reducing lung cancer rates in the United States, while simultaneously increasing smoking and thereby lung cancer rates in less developed countries. This unintended consequence resulted from the redirection of cigarette sales to these countries (e.g., Beaglehole & Bonita, 1997). Ironically, researchers often attribute these public-health failures to a narrowness of vision imposed by the same models of causal inference that heralded modern advances in epidemiology and allied social and biological sciences. That is, counterfactual models improve causal inference in our studies but are held at least partly responsible for the failures of the interventions that follow those studies.
Critics think that counterfactually based approaches in epidemiology not only do not provide a sound basis for public-health interventions but cannot (e.g., Shy, 1997; McMichael, 1999). While there are many aspects of the potential outcomes model that warrant discussion, here we focus on one narrowly framed question: Is it possible, as the critics contend, that the same models that enhance the validity of our studies can mislead us when we try to intervene on the causes these studies uncover? We think the answer is a qualified "yes." We will argue that the problem arises not because of some failure of the potential outcomes approach itself but, rather, because of unintended consequences of the metaphors and tools implied by the model. We think that the language, analogies, and conceptual frame that enhance the valid estimation of precise causal effects can encourage unrealistic expectations about the
relationship between the causal effects uncovered in our studies and results of interventions based on their removal. More specifically, we will argue that the unrealistic expectations of the success of interventions arise in the potential outcomes frame because of a premature emphasis on the effects of causal manipulation (understanding what would happen if the exposure were altered) at the expense of two other tasks that must come first in epidemiologic research: (1) causal identification (identifying if an exposure did cause an outcome) and (2) causal explanation (understanding how the exposure caused the outcome). We will describe an alternative approach that specifies all three of these steps—causal identification, followed by causal explanation, and then the effects of causal manipulation. While this alternative approach will not solve the discrepancy between the results of our studies and the results of our interventions, it makes the sources of the discrepancy explicit. The roles of causal identification and causal explanation in causal inference, which we build upon here, have been most fully elaborated by Shadish, Cook, and Campbell (2002), heirs to a prominent counterfactual tradition in psychology (Cook & Campbell, 1979). We think that a dialogue between these two counterfactual traditions (i.e., the potential outcomes tradition and the Cook and Campbell tradition as most recently articulated in Shadish et al.) can provide a more realistic assessment of what our studies can accomplish and, perhaps, a platform for a more successful translation of basic research findings into sound public-health interventions. To make these arguments, we will (1) review the history and principles of the potential outcomes model, (2) describe the limitations of this model as the basis for interventions in the real world, and (3) propose an alternative based on an integration of the potential outcomes model with other counterfactual traditions.
We wish to make clear at the outset that virtually all of the ideas in this chapter already appear in the causal inference literature (Morgan & Winship, 2007). This chapter simply presents the picture we see as we stand on the shoulders of the giants in causal inference.
The Potential Outcomes Model

In the epidemiologic literature, a counterfactual approach is generally equated with a potential outcomes model (e.g., Maldonado & Greenland, 2002; Hernán, 2004; VanderWeele & Hernán, 2006). In describing this model, we will use the term exposure to mean a variable we are considering as a possible cause. For ease of discourse, we will use binary exposures and
outcomes throughout. Thus, individuals will be either exposed or not and will either develop the disease or not. The concept at the heart of the potential outcomes model is the causal effect of an exposure. A causal effect is defined as the difference between the potential outcomes that would arise for an individual under two different exposure conditions. In considering a disease outcome, each individual has a potential outcome for the disease under each exposure condition. Therefore, when comparing two exposure conditions (exposed and not exposed), there are four possible pairs of potential outcomes for each individual. An individual can develop the disease under both conditions, only under exposure, only under nonexposure, or under neither condition. Greenland and Robins (1986) used response types as a shorthand to describe these different pairs of potential outcomes. Individuals who would develop the disease under either condition (i.e., whether or not they were exposed) are called "doomed"; those who would develop the disease only if they were exposed are called "causal types"; those who would develop the disease only if they were not exposed are called "preventive types"; and those who would not develop the disease under either exposure condition are called "immune." Every individual is conceptualized as having a potential outcome under each exposure that is independent of the actual exposure. Potential outcomes are determined by the myriad of largely unknown genetic, in utero, childhood, adult, social, psychological, and biological causes to which the individuals have been exposed, other than the exposure under study. The effect of the exposure for each individual is the difference between the potential outcome under the two exposure conditions, exposed and not.
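The four response types and the individual-level causal effect just described can be made concrete in a few lines of code. The sketch below is our own illustration (the function names are not notation from this chapter); it classifies a pair of binary potential outcomes into the Greenland and Robins response types and computes the individual causal effect as the difference between the two potential outcomes:

```python
# Illustrative sketch of the Greenland and Robins response types.
# Each individual is represented by a pair of binary potential
# outcomes: (disease if exposed, disease if unexposed).

def response_type(y_if_exposed, y_if_unexposed):
    """Classify one individual's pair of potential outcomes
    (1 = develops the disease, 0 = does not)."""
    if y_if_exposed == 1 and y_if_unexposed == 1:
        return "doomed"       # diseased under either condition
    if y_if_exposed == 1 and y_if_unexposed == 0:
        return "causal"       # diseased only if exposed
    if y_if_exposed == 0 and y_if_unexposed == 1:
        return "preventive"   # diseased only if unexposed
    return "immune"           # diseased under neither condition

def individual_causal_effect(y_if_exposed, y_if_unexposed):
    """The exposure's effect for this individual: +1, 0, or -1."""
    return y_if_exposed - y_if_unexposed
```

Note that for "doomed" and "immune" individuals the individual causal effect is zero, which is why populations dominated by these types yield small average effects even when the exposure is decisive for the "causal types."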
For example, if an individual's potential outcomes were to develop the disease if exposed but not if unexposed, then the exposure is causal for that individual (i.e., he or she is a causal type). Rubin uses the term treatment to refer to these types of exposures and describes a causal effect in language that implies an imaginary clinical trial. In Rubin's (1978) terms, "The causal effect of one treatment relative to another for a particular experimental unit is the difference between the result if the unit had been exposed to the first treatment and the result if, instead, the unit had been exposed to the second treatment" (p. 34). One of Rubin's contributions was the popularization of this definition of a causal effect in an experiment and the extension of the definition to observational studies (Hernán, 2004). For example, the causal effect of smoking one pack of cigarettes a day for a year (i.e., the first treatment) relative to not smoking at all (the second treatment) is the difference between the disease outcome for an individual if he or she smokes a pack a day for a year compared with the disease outcome
in that same individual if he or she does not smoke at all during this same time interval. One can think about the average causal effect in a population simply as the average of the causal effects for all of the individuals in the population. It is the difference between the disease experience of the individuals in a particular population if we were to expose them all to smoking a pack a day and the disease experience if we were to prevent them from smoking at all during this same time period. A useful metaphor for this tradition is that of "magic powder," where the magic powder can remove an exposure. Imagine we sprinkle an exposure on a population and observe the disease outcome. Imagine then that we use magic powder to remove that exposure and can go back in time to see the outcome in the same population. The problem of causal inference is twofold—we do not have magic powder and we cannot go back in time. We can never see the same people at the same time exposed and unexposed. That is, we can never see the same people both smoking a pack of cigarettes a day for a year and, simultaneously, not smoking cigarettes at all for a year. From a potential outcomes perspective, this is conceptualized as a missing-data problem. For each individual, at least one of the exposure experiences is missing. In our studies, we provide substitutes for the missing data. Of course, our substitutes are never exactly the same as what we want. However, they can provide the correct answer if the potential outcomes of the substitute are the same as the potential outcomes of the target, the person or population you want information about. The potential outcomes model is clearly a counterfactual model in the sense that the same person cannot simultaneously experience both exposure and nonexposure. The outcomes of at least one of the exposure conditions must represent a counterfactual, an outcome that would have, but did not, happen.
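This missing-data logic can be illustrated with a toy population; all of the numbers below are invented for illustration. Each person carries both potential outcomes, a study observes only one of them per person, and the substitute group returns the right answer exactly when its potential outcomes match those of the target:

```python
# Toy population of 100 people; each tuple is (outcome if exposed,
# outcome if unexposed). Counts are invented: 20 doomed, 10 causal,
# 5 preventive, 65 immune.
population = ([(1, 1)] * 20 + [(1, 0)] * 10
              + [(0, 1)] * 5 + [(0, 0)] * 65)

def average_causal_effect(pop):
    """True average of the individual effects y1 - y0. Requires both
    potential outcomes per person, so it is unobservable in practice."""
    return sum(y1 - y0 for y1, y0 in pop) / len(pop)

def observed_risk_difference(exposed, unexposed):
    """What a study can actually compute: the exposed group reveals
    only y1, the unexposed group only y0; the rest is missing data."""
    risk_1 = sum(y1 for y1, _ in exposed) / len(exposed)
    risk_0 = sum(y0 for _, y0 in unexposed) / len(unexposed)
    return risk_1 - risk_0

true_ace = average_causal_effect(population)  # (10 - 5) / 100 = 0.05

# Perfect exchangeability: the substitute group has the same mix of
# response types as the target, so the estimate equals the ACE.
exchangeable_estimate = observed_risk_difference(population, population)

# Broken exchangeability: the doomed and causal types are the ones who
# became exposed; the estimate wildly overstates the causal effect.
confounded_estimate = observed_risk_difference(
    [(1, 1)] * 20 + [(1, 0)] * 10,   # exposed group
    [(0, 1)] * 5 + [(0, 0)] * 65)    # unexposed group
```

The contrast between `exchangeable_estimate` and `confounded_estimate` is the whole point of the substitute requirement: the observed risk difference is a valid stand-in for the unobservable average causal effect only when the potential outcomes of the substitute match those of the target.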
Rubin (2005), however, objects to the use of the term counterfactual when applied to his model. Counterfactual implies there is a fact (e.g., the outcome that did occur in a group of exposed individuals) to which the counterfactual (e.g., the outcome that would have occurred had this group of individuals not been exposed) is compared. However, for Rubin, there is no fact to begin with. Rather, the comparison is between the potential outcomes of two hypothetical exposure conditions, neither of which necessarily reflects an occurrence. The causal effect for Rubin is between two hypotheticals. Thus, in the potential outcomes frame, when epidemiologists use the term counterfactual, they mean "hypothetical" (Morgan & Winship, 2007). This subtle distinction has important implications, as we shall see. This notion of a causal effect as a comparison between two hypotheticals derives from the rootedness of the potential outcomes frame in experimental traditions. Holland (1986), an early colleague of Rubin and explicator of his
work, makes this experimental foundation clear in his summary of the three main tenets of the potential outcomes model. First, the potential outcomes model studies the effects of causes and not the causes of effects. Thus, the goal is to estimate the average causal effect of an exposure, not to identify the causes of an outcome. For a population, this is the average causal effect, defined as the average difference between two potential outcomes for the same individuals, the potential outcome under exposure A vs. the potential outcome under exposure B. The desired, but unobservable, true causal effect is the difference in outcome in one population under two hypothetical exposure conditions: if we were to expose the entire population to exposure A vs. if we were to expose them to exposure B. As in an experiment, the exposure is treated as if it were in the control of the experimenter; the goal is to estimate the effect that this manipulation would have on the outcome. Second, the effects of causes are always relative to particular comparisons. One cannot ask questions about the effect of a particular exposure without specifying the alternative exposure that provides the basis for the comparison. For example, smoking a pack of cigarettes a day can be preventive of lung cancer if the comparison was smoking four packs of cigarettes a day but is clearly causal if the comparison was with smoking zero packs a day. As in an experiment, the effect is the difference between two hypothetical exposure conditions. Third, potential outcomes models limit the types of factors that can be defined as causes. In particular, attributes of units (e.g., attributes of people such as gender) are not considered to be causes. This requirement clearly derives from the experimental, interventionist grounding of this model. To be a cause (or at least a cause of interest), the factor must be manipulable. In Holland (1986, p. 959) and Rubin's terminology, "No causation without manipulation."1 The focus on the effect of causes, the precise definition of the two comparison groups, and the emphasis on manipulability clearly root the potential outcomes approach in experimental traditions. Strengths of this approach include the clarity of the definition of the causal effect being estimated and the articulation of the assumptions necessary for this effect to be valid. These assumptions are (1) that the two groups being compared (e.g., the exposed and the unexposed) are exchangeable (i.e., they have the same potential outcomes) and (2) that the stable unit treatment value assumption (SUTVA) holds. While exchangeability is well understood in epidemiology, the requirements of SUTVA may be less accessible.
1. Rubin (1986), in commenting on Holland's 1986 article, is not as strict as Holland in demanding that causes be, by definition, manipulable. Nonetheless, he contends that one cannot calculate the causal effect of a nonmanipulable cause and coauthored the "no causation without manipulation" mantra.
Stable Unit Treatment Value Assumption

A valid estimate of this causal effect requires that the two groups being compared (e.g., the exposed and the unexposed) are exchangeable (i.e., that there is no confounding) and that SUTVA is reasonable. SUTVA requires that (1) the effect of a treatment is the same, no matter how an individual came to be treated, and (2) the outcome in an individual is not influenced by the treatment that other individuals receive. In Rubin's (1986) language, SUTVA is simply

the a priori assumption that the value of Y [i.e., the outcome] for unit u [e.g., a particular person] when exposed to treatment t [e.g., a particular exposure or risk factor] will be the same no matter what mechanism is used to assign treatment t to unit u and no matter what treatments the other units receive . . . SUTVA is violated when, for example, there exist unrepresented versions of treatments (Y_tu depends on which version of treatment t was received) or interference between units (Y_tu depends on whether unit u′ received treatment t or t′). (p. 961)

Thus, if one were to study the effects of a particular form of psychotherapy, SUTVA would be violated if (1) there were different therapists with different levels of expertise or some individuals freely agreed to enter therapy while others agreed only at the behest of a relative and the mode of entry influenced the effectiveness of the treatment (producing unrepresented versions of treatments) (Little & Rubin, 2000) or (2) individuals in the treatment group shared insights they learned in therapy with others in the study (producing interference between units) (Little & Rubin, 2000). The language in which SUTVA is described, the effects of treatment assignment and versions of treatments, is again indicative of the explicit connection between the potential outcomes model and randomized experiments.
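The "no unrepresented versions of treatment" requirement lends itself to a small numerical sketch; the risks below are made up for illustration. If "psychotherapy" silently bundles two versions with different effects, the single effect a study reports depends on the (usually unreported) mix of versions, so there is no one stable treatment effect to estimate:

```python
# Made-up risks of a poor outcome under each condition; the point is
# only that the two versions of "psychotherapy" differ.
RISK = {
    "expert_therapist": 0.10,
    "novice_therapist": 0.30,
    "untreated": 0.40,
}

def effect_of_psychotherapy(share_expert):
    """Risk difference vs. no treatment when a fraction `share_expert`
    of the treated received the expert version of the therapy."""
    blended_risk = (share_expert * RISK["expert_therapist"]
                    + (1 - share_expert) * RISK["novice_therapist"])
    return blended_risk - RISK["untreated"]

# The "effect of psychotherapy" shifts with the version mix, which is
# exactly the SUTVA violation described in the text:
all_expert = effect_of_psychotherapy(1.0)   # roughly -0.30
all_novice = effect_of_psychotherapy(0.0)   # roughly -0.10
```

A study run where experts dominate and an intervention rolled out with mostly novice therapists would both be "psychotherapy," yet their effects differ threefold in this toy example.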
To make observational studies as close to experiments as possible, we must ensure that those exposed to the "alternative treatments" (i.e., different exposures) are exchangeable in the sense that the same outcomes would arise if the individuals in the different exposure groups were reversed. In addition, we must ensure that we control all factors that violate SUTVA. We do this by carefully defining exposures or risk factors in terms of narrowly defined treatments that can be manipulated, at least in theory. To continue our smoking example, one could ask questions about the average causal effect of smoking a pack of cigarettes a day for a year (treatment A) compared with never having smoked at all (treatment B) in an observational study. Since we cannot observe the same people simultaneously under two different treatments, we compare the disease experience of two
groups of people: one with treatment A, the exposure of interest, and one with treatment B, the substitute for the potential outcomes of the same group under the second treatment option. In order for the substitution to yield an accurate effect estimate (i.e., for exchangeability to hold), we must ensure that the smokers and nonsmokers are as similar as possible on all causes of the outcome (other than smoking). This can be accomplished by random assignment in a randomized controlled trial. To meet SUTVA assumptions, we have to (1) be vigilant to define our exposure precisely so there is only one version of each treatment and be certain that how individuals entered the smoking and nonsmoking groups did not influence their outcome and (2) ensure the smoking habits of some individuals in our study did not influence the outcomes of other individuals. Barring other methodological problems, it would be assumed that if we did the intervention in real life, that is, if we prevented people from smoking a pack of cigarettes a day for a year, the average causal effect estimated from our study would approximate this intervention effect. The potential outcomes model is an attempt to predict the average causal effect that would arise (or be prevented) from a particular manipulation under SUTVA. It is self-consciously interventionist. Indeed, causal questions are framed in terms of intervention consequences. To ensure the validity of the causal effects uncovered in epidemiologic studies, researchers are encouraged to frame the causal question in these terms. As a prototypical example, Glymour (2007), in a cogent methodologic critique of a study examining the effect of childhood socioeconomic position on adult health, restated the goal of the study in potential outcome terms. "The primary causal question of interest is how adult health would differ if we intervened to change childhood socio-economic position" (p. 566).
It is critical to note that even when we do not explicitly begin with this type of model, the interventionist focus of the potential outcomes frame implicitly influences our thinking through its influence on our methods. For example, this notion is embodied in our understanding of the attributable risk as the proportion of disease that would be prevented if we were to remove this exposure (Last, 2001). More generally, authors often end study reports with a statement about the implications of their findings for intervention or policy that reflect this way of thinking.
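The attributable-risk reasoning just mentioned has a simple arithmetic core. The sketch below (with invented numbers) computes the population attributable fraction, the share of disease the formula says would disappear if the exposure were removed, on the strong interventionist assumption that the study's causal effect carries over unchanged to the intervention:

```python
# Population attributable fraction: (overall risk - risk if no one
# were exposed) / overall risk. All numbers here are invented.

def population_attributable_fraction(p_exposed, risk_exposed, risk_unexposed):
    """Fraction of disease 'attributable' to the exposure, assuming
    that removing the exposure gives everyone the unexposed risk."""
    overall_risk = (p_exposed * risk_exposed
                    + (1 - p_exposed) * risk_unexposed)
    return (overall_risk - risk_unexposed) / overall_risk

# 30% of the population exposed; exposure doubles the risk of disease.
paf = population_attributable_fraction(0.30, 0.20, 0.10)  # about 0.23
```

The formula's interpretation as an intervention forecast is exactly the habit of thought the chapter goes on to question: the arithmetic is sound, but the assumption that removal of the exposure in the real world reproduces the study's contrast is not.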
Limitations of the Potential Outcomes Model for Interventions in the Real World

To ensure the internal validity of our inferences, we isolate the effects of our causes from the context in which they act. We do this by narrowly defining
our treatments, creating exchangeability between treated and untreated people, and considering social norms and the physical environment as part of a stable background in which causes act. In order for the causal effect of an exposure in a study to translate to the effect of its intervention, all of the controls and conditions imposed in the study must hold in the intervention and the targeted population (e.g., treatment definition, follow-up time frame, distribution of other causes). The problem is that, in most cases, interventions in the real world cannot replicate the conditions that gave rise to the average causal effect in a study. It is important to note that this is true for randomized controlled trials as well as observational studies. It is true for classic risk factors as well as for exposures in life course and social epidemiology. The artificial world that we appropriately create to identify causal effects—a narrow swath of temporal, geographic, and social reality in which exchangeability exists and SUTVA is not violated—captures a vital but limited part of the world in which we are interested. Thus, while the approach we use in studies aids in the valid estimation of a causal effect for the past, it provides a poor indicator of a causal effect for the future. For these reasons, the causal effects of our interventions in the real world are unlikely to be the same as the causal effects of our studies. This problem is well recognized in the literature on randomized controlled trials in terms of the difference between efficacy and effectiveness and in the epidemiologic literature as the difference between internal validity and external validity. However, this recognition is rarely reflected in research practice. We suspect this problem may be better understood by deeper examination of the causes of the discrepancy between the effects observed in studies and the effects of interventions. 
We group these causes into three interrelated categories: direct violations of SUTVA, unintended consequences of our interventions, and context dependencies.
Direct Violations of SUTVA

Stable Treatment Effect

In order to identify a causal effect, a necessary SUTVA assumption is that there is only one version of the treatment. To meet this assumption, we need to define the exposures in our studies in an explicit and narrow way. For example, we would ask about the effects of a particular form of psychotherapy (e.g., interpersonal psychotherapy conducted by expert clinicians) rather than about psychotherapy in general. This is because the specific types of therapy encompassed within the broad category of "psychotherapy" are likely to have different effects on the outcome.
While this is necessary for the estimation of precise causal effects in our studies, it is not likely to reflect the meaning of the exposure or treatment in the real world. The removal of causes or the provision of treatments, no matter how well defined, is never surgical. Unlike the removal of causes by the magic powder in our thought experiments, interventions are often crude and messy. Public-health interventions are inherently broad. Even in a clinical context, treatment protocols are never followed precisely in real-world practice. In public-health interventions, there are also different ways of getting into "treatment," and these may well have different effects on the outcome. For instance, the effect of an intervention offering a service may be very different for those who use it only after it has become popular (the late adopters). Early adopters of a low-fat diet, for example, may increase their intake of fruits and vegetables to compensate for the caloric change. Late adopters may substitute low-fat cookies instead. A low-fat diet was adopted by both types of people, but the effect on an outcome (e.g., weight loss) would likely differ. There are always different versions of treatments, and the mechanisms through which individuals obtain the treatments will frequently impact the effect of the treatments on the outcome.

Interference Between Units

When considered in real-world applications over a long enough time frame, there will always be "interference between units." Because people live in social contexts, their behavior influences norms and social expectations. Behavior is contagious. This can work in positive ways, increasing the effectiveness of an intervention, or lead to negative unintended consequences. An example of the former would be when the entrance of a friend into a weight-loss program encourages another friend to lose weight (Christakis & Fowler, 2007).
Thus, the outcome for one individual is contingent on the exposure of another individual. Similarly, changes in individual eating behaviors spread. This influences not only individuals’ behavior but, eventually, the products that stores carry, the price of food, and the political clout of like-minded individuals. It changes the threshold for the adoption of healthy eating habits. There is an effect not only of the weight-loss program itself but also of the proportion of people enrolled in weight-loss programs within the population. Within the time frame of our studies, the extant norms caused by interactions among individuals and the effect of the proportion of exposure in the population are captured as part of the background within which causes act, are held constant, and are invisible. To identify the true effects these causes had, this approach is reasonable and necessary. The causes worked during
2 What Would Have Been is Not What Would Be
that time frame within that normative context. However, in a public-health intervention, these norms change over time due to the intervention. This problem is well recognized in infectious disease studies where the contagion occurs in a rapid time frame, making noninterference untenable even in the context of a short-term study. It is hard to imagine, though, any behavior which is not contagious over long enough time frames. The fact is that the causal background we must hold constant to estimate a causal effect is influenced by our interventions.
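A toy simulation (all numbers invented) makes the point concrete: the within-study treated-versus-untreated contrast recovers only the direct effect of a program, while scaling the program up shifts outcomes for everyone through changed norms.

```python
import random

random.seed(1)

# Each person's weight change depends on their own enrollment AND on the
# fraction of the population enrolled (enrollment shifts eating norms).
DIRECT_EFFECT = -2.0   # kg, effect of the program on enrollees (invented)
SPILLOVER = -1.5       # kg per unit enrolled-fraction, felt by everyone

def outcome(enrolled, frac_enrolled):
    base = DIRECT_EFFECT if enrolled else 0.0
    return base + SPILLOVER * frac_enrolled + random.gauss(0, 0.5)

def within_study_contrast(frac_enrolled, n=50_000):
    """Enrolled-minus-unenrolled contrast at a fixed enrollment level."""
    treated = sum(outcome(True, frac_enrolled) for _ in range(n)) / n
    control = sum(outcome(False, frac_enrolled) for _ in range(n)) / n
    return treated - control

# The contrast looks the same whether 5% or 60% are enrolled...
print(round(within_study_contrast(0.05), 2))
print(round(within_study_contrast(0.60), 2))

# ...but the norm shift moves even the unenrolled when the program scales:
mean_small = sum(outcome(False, 0.05) for _ in range(50_000)) / 50_000
mean_large = sum(outcome(False, 0.60) for _ in range(50_000)) / 50_000
print(round(mean_large - mean_small, 2))
```

The contrast estimated within any one study is blind to the spillover term, which is exactly the interference the text describes.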
Unintended Consequences of Interventions

Unintended consequences of interventions are consequences of exposure removal not represented as part of the causal effect of the exposure on the outcome under study. The causes of these unintended consequences include natural confounding and narrowly defined outcomes.

Natural Confounding

Recall that the estimation of the true causal effect requires exchangeability of potential outcomes between the exposed and unexposed groups in our studies. Exchangeability is necessary to isolate the causal effect of interest. For example, in examining the effects of alcohol abuse on vehicular fatalities, we may control for the use of illicit drugs. We do so because those who abuse alcohol may be more likely to also abuse other drugs that are related to vehicular fatalities. If the association between alcohol abuse and illicit drug use is a form of "natural confounding," that is, the association between alcohol and drug use arises in naturally occurring populations and is not an artifact of selection into the study, then this association is likely to have important influences in a real-world intervention. That is, the way in which individuals came to be exposed may influence the effect of the intervention, in violation of SUTVA. For example, when two activities derive from a similar underlying factor (social, psychological, or biologic), the removal of one may influence the presence of the other over time; it may activate a feedback loop. Thus, the causal effect of alcohol abuse on car accidents may overestimate the effect of the removal of alcohol abuse from a population if the intervention on alcohol use inadvertently increases marijuana use. As this example illustrates, an intervention may influence not only the exposure of interest but also other causes of the outcome that are linked with the exposure in the real world. In our studies, we purposely break this link.
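The alcohol-marijuana substitution can be put into a back-of-envelope calculation. Every risk and prevalence below is invented for illustration; the point is only the direction of the bias.

```python
# Accident risk as a function of alcohol abuse (a) and other drug use (d).
risk = {(0, 0): 0.01, (0, 1): 0.05, (1, 0): 0.06, (1, 1): 0.10}

def population_rate(p_alcohol, p_drug_given_alcohol):
    """Overall accident rate for given exposure distributions."""
    total = 0.0
    for a in (0, 1):
        pa = p_alcohol if a == 1 else 1 - p_alcohol
        p_d = p_drug_given_alcohol[a]
        for d in (0, 1):
            pd = p_d if d == 1 else 1 - p_d
            total += pa * pd * risk[(a, d)]
    return total

# Before: 20% abuse alcohol; drug use is 30% among them, 5% among others.
baseline = population_rate(0.20, {1: 0.30, 0: 0.05})

# Study-based prediction: eliminate alcohol abuse while holding drug use
# where it was (marginal drug prevalence 0.20*0.30 + 0.80*0.05 = 0.10).
predicted = population_rate(0.0, {1: 0.30, 0: 0.10})

# Real-world intervention: former alcohol abusers substitute marijuana,
# raising their drug use from 0.30 to 0.60 (a feedback loop).
former = 0.40 * risk[(0, 0)] + 0.60 * risk[(0, 1)]
others = 0.95 * risk[(0, 0)] + 0.05 * risk[(0, 1)]
realized = 0.20 * former + 0.80 * others

print(baseline - predicted)   # reduction the etiologic estimate promises
print(baseline - realized)    # smaller reduction actually achieved
```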
We overcome the problem of the violation of SUTVA by imposing narrow limits on time and place so that SUTVA holds in the study. We control these
variables, precisely because they are also causes of the outcome under study. In the real world, however, their influence may make the interventions less effective than our effect estimates suggest. The control in the study was not incorrect, as it was necessary to isolate the true effect that alcohol use did have on car accidents among these individuals given the extant conditions of the study. However, outside the context of the study, removal of the exposure of interest had unintended consequences over time through its link with other causes of the outcome.

Narrowly Defined Outcomes

Although we may frame our studies as identifying the "effects of causes," they identify only the effects of causes on the specific outcomes we examine in our studies. In the real world, causes are likely to have many effects. Likewise, our interventions have effects on many outcomes, not only those we intend. Unless we consider the full range of outcomes, our interventions may be narrowly successful but broadly harmful. For example, successful treatments for AIDS have decreased the death rate but have also led people to reconceptualize AIDS from a lethal illness to a manageable chronic disease. This norm change can lead to a concomitant rise in risk-taking behaviors and an increase in disease incidence. More optimistically, our interventions may have beneficial effects that are greater than we assume if we consider unintended positive effects. For example, an intervention designed to increase high school graduation rates may also reduce alcoholism among teens.
Context Dependency

Most fundamentally, all causal effects are context-dependent, and therefore, all effects are local. It is unlikely that a public-health intervention will be applied only in the exact population in which the causal effects were studied. Public-health interventions often apply to people who do not volunteer for them, to a broader swath of the social fabric, and over a different historical time frame. Therefore, even if our effect estimates were perfectly valid, we would expect effects to vary between our studies and our interventions. For example, psychiatric drugs are often tested on individuals who meet strict Diagnostic and Statistical Manual of Mental Disorders criteria, do not have comorbidities, and are placebo nonresponders. Once the drugs are marketed, however, they are used to treat individuals who represent a much wider population. It is unlikely that the effects of the drugs in real-world usage will be the same as in the studies. For all these reasons, it seems unlikely that the causal effect of any intervention will reflect the causal effect found in our studies. These problems are
well known and much discussed in the social science literature (e.g., Merton, 1936, 1968; Lieberson, 1985) and the epidemiologic literature (e.g., Greenland, 2005). Nonetheless, when carrying out studies, epidemiologists often talk about trying to identify "the true causal effect of an exposure," as if this were a quantification with some inherent meaning. An attributable risk is interpreted as if it provided a quantification of the effect of the elimination of the exposure under study. Policy implications of etiologic work are discussed as if they flowed directly from our results. We think that this is an overly optimistic assessment of what our studies can show. We think that as a field we tend to estimate the effect exposures had in the past and assume that this will be the effect in the future. We do this by treating the counterfactual of the past as equivalent to the potential outcome of the future.
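The psychiatric-drug example above can be sketched numerically. Assume, with invented numbers, that the drug's effect is modified by a comorbidity that the trial excluded:

```python
import random

random.seed(3)

# Invented effect modification: the drug helps patients without the
# comorbidity much more than patients with it.
def improvement(treated, comorbid):
    effect = (1.0 if comorbid else 8.0) if treated else 0.0
    return effect + random.gauss(0, 2.0)

def average_effect(p_comorbid, n=50_000):
    treated = sum(improvement(True, random.random() < p_comorbid)
                  for _ in range(n)) / n
    control = sum(improvement(False, random.random() < p_comorbid)
                  for _ in range(n)) / n
    return treated - control

print(round(average_effect(0.0), 2))   # strict-criteria trial population
print(round(average_effect(0.6), 2))   # broader real-world population
```

Both estimates are internally valid; they are simply local to different populations.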
An Alternative Counterfactual Framework (An Integrated Counterfactual Approach) An alternative framework, which we will refer to as an ‘‘integrated counterfactual approach’’ (ICA), distinguishes three sequential tasks in the relationship between etiologic studies and public-health interventions, the first two of which are not explicit goals in a potential outcomes frame: (1) causal identification, (2) causal explanation, and (3) the effects of causal manipulation.
Step 1: Causal Identification In line with the Cook and Campbell tradition (Shadish et al., 2002; Cook & Campbell, 1979), this alternative causal approach uses the insights and methods of potential outcomes models but reframes the question that these models address as the identification of a cause rather than the result of a manipulation. Whereas the potential outcomes model is rooted in experiments, the ICA is rooted in philosophic discussions of counterfactual definitions of a cause, particularly the work of Mackie (1965, 1974). It begins with Mackie's definition of a cause rather than a definition of a causal effect. For Mackie, X is a cause of Y if, within a causal field, with all held constant, Y would not have occurred if X had not, at least not when and how it did. Mackie's formulation begins with a particular outcome and attempts to identify some of the factors that caused it. Thus, the causal contrast for Mackie is between what actually happened and what would have happened had everything remained the same except that one of the exposures was absent. The contrast represents the difference between a fact and a counterfactual, rather than two potential outcomes.
Thus, for Mackie, something is a cause if the outcome under exposure is different from what the outcome would have been under nonexposure. By beginning with actual occurrences, Mackie gives prominence to the contingency of all causal identification. This approach explicitly recognizes that causes are always identified within a causal field of interest, where certain factors are assumed to be part of the background in which causes act, rather than factors available for consideration as causes. The decision to assign factors to the background may differ among researchers and time periods. Thus, there is a subjective element in deciding which factor, among the myriad of possible exposures, is hypothesized to be a cause of interest. Rothman and Greenland (1998) provide a definition of a cause in the context of health that is consistent with Mackie's view: "We can define a cause of a specific disease event as an antecedent event, condition, or characteristic that was necessary for the occurrence of the disease at the moment it occurred, given that other conditions are fixed" (p. 8). As applied to a health context, both Mackie and Rothman and Greenland begin with the notion that, for most diseases, an individual can develop a disease from one of many possible causes, each of which consists of several components working together. In this model, although none of the components in any given constellation can cause disease by itself, each makes a nonredundant and necessary contribution to complete a causal mechanism. A constellation of components that is minimally sufficient to cause disease is termed a sufficient cause. Mackie referred to these component causes as "insufficient but necessary components of unnecessary but sufficient" (INUS) causes. Rothman's (1976) sufficient causes are typically depicted as "causal pies." As an example, assume that the disease of interest is schizophrenia. There may be three sufficient causes of this disease (see Figure 2.1).
An individual can develop schizophrenia from a genetic variant, a traumatic event, and poor nutrition; from stressful life events, childhood neglect,
[Figure 2.1. Potential causes of schizophrenia depicted as causal pies. Sufficient Cause 1: gene, trauma, poor nutrition, and unknown component U1. Sufficient Cause 2: stressful event, neglect, toxin, and unknown component U2. Sufficient Cause 3: prenatal viral exposure, childhood virus, vitamin deficiency, and unknown component U3. Adapted from Rothman and Greenland (1998).]
and exposure to an environmental toxin; or from prenatal viral exposures, childhood viral exposure, and a vitamin deficiency. We have added components U1, U2, and U3 to the sufficient causes to represent additional unknown factors. Each individual develops schizophrenia from one of these sufficient causes; in no instance does the disease occur from any one factor—rather, it occurs due to several factors working in tandem. The ICA and potential outcomes model are quite consistent in many critical ways in this first step. Indeed, the potential outcomes model provides a logical, formal, statistical framework applicable to causal inference within the context of the ICA. Regardless of whether we intend to identify a cause or estimate a causal effect, the same isolation of the cause is required. Most essentially, this means that comparison groups must be exchangeable. However, each framework is intended to answer different questions (see Table 2.1). From a potential outcomes perspective, the goal is to estimate the causal effect of the exposure. From an ICA perspective, the goal is to identify whether an exposure was a cause. This distinction between the goals of identifying the effects of causes and the causes of effects is critical and has many consequences. First, identifying the effects of causes is future-oriented. We start with a cause and estimate its effect. The causal contrast is between the potential
Table 2.1 Differences Between the Potential Outcomes Model and an Integrated Counterfactual Approach

Goal
  Potential Outcomes Model: Estimation of true causal effect
    Salient differences: estimate; quantitative; effects of causes
  Integrated Counterfactual Approach: Identification of true causes
    Salient differences: identify; qualitative; causes of effects

Means
  Potential Outcomes Model: Compare two potential outcomes
    Salient differences: entire population under two exposures; manipulable causes; SUTVA; mimic random assignment
  Integrated Counterfactual Approach: Compare a fact with a counterfactual
    Salient differences: exposed under two exposure conditions; any factor; construct validity; mimic assignment of the exposed

Interpretation
  Potential Outcomes Model: Potential outcome of the future
    Salient differences: expect consistency
  Integrated Counterfactual Approach: Causal effect of the past
    Salient differences: expect inconsistency
disease experiences of a group of individuals under two exposure conditions. Identifying the causes of effects, in contrast, implies that the identification is about what happened in the past. The causal contrast is between what did happen to a group of individuals under the condition of exposure, something explicitly grounded in and limited by a particular sociohistorical reality, and what would have happened had all conditions remained constant except that the exposure was absent. This approach identifies factors that actually were causes of the outcome. Whether or not they will be causes of the outcome depends on the constellation of the other factors held constant at that particular sociohistorical moment. The effect of this cause in the future is explicitly considered a separate question. Second, when we consider a potential outcomes model, the causal effect of interest is most often the causal effect for the entire population. That is, we conceptualize the causal contrast as the entire study population under two different treatments. We create exchangeability by mimicking random assignment. Neither exposure condition is ‘‘fact’’ or ‘‘counterfactual.’’ Rather, both treatment conditions are substitutes for the experience of the entire population under that treatment. In contrast, Mackie’s perspective implies that the counterfactual of interest is a counterfactual for the exposed. We take as a given what actually happened to people with the putative causal factor and imagine a counterfactual in reference to them. We create exchangeability by mimicking the predispositions of the exposed. This puts a different spin on the issue of confounding and nonexchangeability.2 The factors that differentiate exposed and unexposed people are more easily seen as grounded in characteristics of truly exposed people and their settings. 
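A small simulation (hypothetical numbers) shows how the effect among the exposed can diverge from the whole-population effect when effects are heterogeneous and exposure tracks the characteristics of the exposed:

```python
import random

random.seed(4)

population = []
for _ in range(100_000):
    high_risk = random.random() < 0.3
    exposed = random.random() < (0.8 if high_risk else 0.1)  # exposure tracks risk
    y0 = random.gauss(5.0 if high_risk else 1.0, 1.0)        # outcome if unexposed
    effect = 3.0 if high_risk else 0.5                       # individual causal effect
    population.append((exposed, y0, y0 + effect))            # (exposed, Y0, Y1)

# Effect for the entire population versus effect for the exposed:
ate = sum(y1 - y0 for _, y0, y1 in population) / len(population)
exposed_only = [(y0, y1) for e, y0, y1 in population if e]
att = sum(y1 - y0 for y0, y1 in exposed_only) / len(exposed_only)

print(round(ate, 2))   # whole-population average effect
print(round(att, 2))   # average effect among the actually exposed (larger here)
```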
It makes explicit that the causal effect for people who are actually exposed may not be the same as the effect that cause would have on other individuals. Thus, this type of confounding is seen not as a study artifact but as a form of true differences between exposed and unexposed people that can be and must be adjusted for in our study but must also be considered as an active element in any real-life intervention. Third, the focus on estimating the effects of causes in the potential outcomes model leads to the requirement of manipulability; any factor which is not manipulable is not fodder for causal inference. From an ICA perspective, any factor can be a cause (Shadish et al., 2002). To qualify, it has to be a factor that, were it absent and with all else the same, this outcome within this context would not have occurred. Even characteristics of individuals, such as gender, are grist for a counterfactual thought experiment. The world is
2. Technically, when the effect for the entire population is of interest, full exchangeability is required. When the effect for the exposed is of interest, only partial exchangeability is required (Greenland & Robins, 1986).
fixed as it is in this context, say, with a fairly rigid set of social expectations depending on identified sex at birth. We can ask a question about what an individual’s life would have been like had he been born male, rather than female, given this social context. Fourth, this perspective brings the issue of context dependency front and center. As Rothman’s (1976) and Mackie’s (1965) models make explicit, shifts in the component causes and their distributions, variations in the field of interest, and the sociohistorical context change the impact of the cause and, indeed, determine whether or not the factor is a cause in this circumstance. Thus, the impact of a cause is explicitly recognized as context-dependent; the size of an effect is not universal. A factor can be a cause for some individuals in some contexts but not in others. Thus, the goal is the ‘‘identification of causes in the universe,’’ rather than the estimation of universal causal effects. By ‘‘causes in the universe’’ we mean factors which at some moment in time have caused the outcome of interest and could theoretically (if all else were equal) happen again.
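The sufficient-component ("causal pie") logic of Figure 2.1 translates directly into code. The component names follow the figure; the model itself is only a sketch:

```python
# Disease occurs iff every component of at least one sufficient cause is
# present (U1-U3 are the unknown complements from Figure 2.1).
SUFFICIENT_CAUSES = [
    {"gene", "trauma", "poor_nutrition", "U1"},
    {"stressful_event", "neglect", "toxin", "U2"},
    {"prenatal_virus", "childhood_virus", "vitamin_deficiency", "U3"},
]

def develops_disease(factors):
    return any(cause <= factors for cause in SUFFICIENT_CAUSES)

# No component is sufficient on its own, and none is necessary overall:
assert not develops_disease({"gene"})
assert develops_disease({"stressful_event", "neglect", "toxin", "U2"})  # no gene

# INUS counterfactual: for a person whose factors complete only the first
# pie, removing one component (trauma) prevents the outcome.
person = {"gene", "trauma", "poor_nutrition", "U1", "neglect"}
assert develops_disease(person)
assert not develops_disease(person - {"trauma"})
print("sufficient-component checks passed")
```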
Step 2: Causal Explanation

The focus on the causes of effects facilitates an important distinction that emerges from the Cook and Campbell (1979) tradition—that between causal identification and causal explanation. From their perspective, in the first step, we identify whether the exposure of interest did cause the outcome in some people in our study. We label this "causal identification."3 If we want to understand the effect of altering a cause in the future, an additional step of causal explanation is required. Causal explanation comprises two components: construct validity, an understanding of the "active ingredients" of the exposure and how they work, and external validity, an identification of the characteristics of persons, places, and settings that facilitate its effect on the outcome.

Construct Validity

In causal identification, we examine the causal effects of our variables as measured. In causal explanation, we ask what it is about these variables that caused the outcome. Through mediational analyses, we examine both the active ingredients of the exposure (i.e., what aspects of the exposure are causal) and the pathways through which the exposure affects the outcome.

3. Shadish et al. (2002) call this step "causal description." We think "causal identification" is a better fit for our purposes.

Mediational analyses explicitly explore the potential SUTVA violation inherent
in different versions of treatments. Exploration of pathways can lead to a more parsimonious explanation for findings across different exposure measures. Based on the active ingredients of exposure (and their resultant pathways), we can test not only the specific exposure–disease relationship but also a more integrative theory regarding the underlying "general causal mechanisms" (Judd & Kenny, 1981). This theory allows us to make statements about an observed association that are less bounded by the specific circumstances of a given study and to generalize based on deep similarities (Judd & Kenny, 1981; Shadish et al., 2002). This generalization has two practical benefits. First, knowledge of mechanisms enhances our ability to compare study results across exposures and, thus, integrate present knowledge. Second, such an analysis can help to identify previously unknown exposures or treatments because they capture the same active ingredient (or work through the same mechanism) as the exposure or treatment under study (Hafeman, 2008). Let us continue the gender example. First, we test the hypothesis that female gender was a cause of depression for some people in our sample. By this we mean that there are some people who got depressed as women who would not have been depressed had they not been women (i.e., if they were male—or some other gender). Of course, causal inference is tentative, as always. Assume that at this first step we identified something that is not just an association and we took care to rule out all noncausal alternative explanations to the best of our ability. Once that step is accomplished, we may ask how female gender causes disease. Gender is a multifaceted construct with many different aspects—genetic, hormonal, psychological, and social. Once we know that gender has a causal effect, probing the construct helps us to identify what it is about female gender that causes depression.
This may help to verify gender's causality in depression and to identify other exposures that do the same thing as gender (i.e., other constructs that have the same active ingredient). For example, some have suggested that the powerlessness of women's social roles is an active ingredient in female gender as a cause of depression. This would suggest that other social roles related to powerlessness, such as low socioeconomic position, might also be causally related to depression. Probing the construct of the outcome plays a similar role in causal explanation. It helps to identify the specific aspects of the outcome that are influenced by the exposure and to refine the definition of the outcome.

External Validity

The other aspect of causal explanation requires an examination of the conditions under which exposures act (Shadish et al., 2002). The context
dependency of causal effects is therefore made explicit. Causal inference is strengthened through the theoretical consideration and testing of effect variation. From the perspective of the ICA, consistency of effects across settings, people, and time periods is not the expectation. Rather, variation is expected and requires examination. When we identify causes in our studies, we make decisions about the presumptive world that we hold constant, considering everything as it was when the exposure arose. Thus, the social effects and norms that may have been consequences of the exposure are frozen in the context. However, when we intervene on our causes, we must consider the new context. This aspect of causal explanation, the specification of the conditions under which exposures will and will not cause disease, is considered the separate task of external validity in the Cook and Campbell scheme.
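A toy linear mediation model, in the spirit of Judd and Kenny (1981), sketches how the "active ingredient" idea cashes out; all coefficients below are invented:

```python
import random

random.seed(5)

A = 1.0   # effect of exposure X (e.g., female gender) on mediator M
B = 0.5   # effect of mediator M (e.g., role powerlessness) on outcome Y
C = 0.2   # direct effect of X on Y, not through M
N = 200_000

def draw_outcome(x):
    m = A * x + random.gauss(0, 1)              # mediator
    return B * m + C * x + random.gauss(0, 1)   # outcome

total = (sum(draw_outcome(1) for _ in range(N)) / N
         - sum(draw_outcome(0) for _ in range(N)) / N)

print(round(total, 2))   # empirical total effect, close to A*B + C
print(A * B, C)          # indirect (through M) and direct components
```

If another exposure (say, low socioeconomic position) shifts the same mediator M, the decomposition predicts that it should affect Y as well; that is the practical payoff of identifying the active ingredient.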
Step 3: Causal Manipulation

While this separation of causal identification and causal explanation has the benefit of placing contingency and context dependency center stage, it does not resolve the discrepancy between the effects observed in studies and the effects of interventions. It does not provide the tools necessary to uncover the feedback loops and unintended consequences of our interventions. It does not fully address the violation of the SUTVA assumption of no interference between units. Even causal explanation is conducted within established methods of isolation, reductionism, and linearity. Prediction of the effects of causal manipulation may require a different approach, one rooted in complexity theories and systems analysis, as the critics contend (e.g., McMichael, 1999; Levins, 1997; Krieger, 1994). The complexity of the system and of the feedbacks needed to understand an intervention depends, of course, on the question at hand. The critical issue, as Levins (1996) notes, is the ability to decide when simplification is constructive and when it is an obfuscation. The implementation of systems approaches within epidemiology requires considerable methodological and conceptual development but may be a required third step to link etiologic research to policy. The integrated counterfactual approach does not provide a solution to the discrepancy between the results of etiologic studies and the results of public-health interventions. It does, however, provide a way of thinking in which causal identification is explicitly conceptualized as a first step rather than a last step for public-health intervention. It is a road map to a proposed peace treaty in the epidemiology wars between the counterfactual and dynamic model camps. It suggests that the models are useful for different types of questions. Counterfactual approaches, under SUTVA, are essential for identifying causes
of the past. Dynamic models allowing for violations of SUTVA are required to understand potential outcomes of the future.
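A minimal dynamic model (with invented rates) illustrates this division of labor: under a contagion-style feedback, the immediate effect of an intervention differs sharply from its long-run, equilibrium effect.

```python
# Prevalence of a risk behavior evolves with social reinforcement:
# uptake grows with current prevalence; an intervention raises quitting.
def step(prev, intervention):
    uptake = 0.3 * prev * (1 - prev)          # contagion term
    quit_rate = 0.25 if intervention else 0.05
    return prev + uptake - quit_rate * prev

def run(intervention, steps=200, start=0.4):
    p = start
    for _ in range(steps):
        p = step(p, intervention)
    return p

immediate_drop = 0.4 - step(0.4, True)        # first-step change only
equilibrium_gap = run(False) - run(True)      # feedback-adjusted, long-run change

print(round(immediate_drop, 3))
print(round(run(False), 3), round(run(True), 3))
```

Here the long-run difference far exceeds the first-step change, because the intervention also starves the contagion term; a static causal estimate would see only the first step.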
Summary

The rigor of causal inference, brought to light in the development of the potential outcomes model, is essential as the basis for any intervention. Rigor is demanded because interventions developed around noncausal associations are doomed to failure. However, reifying the results of our studies by treating causes as potential interventions is also problematic. We suspect that public health will benefit from interventions identified using an approach that integrates the potential outcomes tradition of Rubin and Robins in statistics and epidemiology with the counterfactual tradition of Shadish, Cook, and Campbell in psychology. This integrated approach clarifies that the identification of causes facilitated by isolation is only a first step in policy formation. A second step, causal explanation, aids in the generalizability of our findings. Here, however, instead of replication of our study in different contexts, we generalize on the basis of the deep similarities uncovered through causal explanations. The steps of identification and explanation may require a third step of prediction to understand intervention effects. The causes that we identify, together with their mediators and effect modifiers, may be considered nodes in more complex analyses that allow for the consideration of feedback loops and the unintended consequences that are inherent in any policy application. The methods for this final step have not yet been fully developed. The conceptual separation of these three questions, grounded in a distinction between counterfactuals of the past and potential outcomes of the future, may prepare the ground for such innovations. For as Kierkegaard (1843; cited in Hannay, 1996) noted, "life is to be understood backwards, but it is lived forwards." At a minimum, we hope that a more modest assessment of what current epidemiologic methods can provide will help stem the cynicism that inevitably arises when we promise more than we can possibly deliver.
References

Beaglehole, R., & Bonita, R. (1997). Public health at the crossroads: Achievements and prospects. New York: Cambridge University Press.
Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357, 370–379.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Glymour, M. M. (2007). Selected samples and nebulous measures: Some methodological difficulties in life-course epidemiology. International Journal of Epidemiology, 36, 566–568.
Greenland, S. (2005). Epidemiologic measures and policy formulation: Lessons from potential outcomes. Emerging Themes in Epidemiology, 2, 1–7.
Greenland, S., & Robins, J. M. (1986). Identifiability, exchangeability and epidemiological confounding. International Journal of Epidemiology, 15, 413–419.
Hafeman, D. (2008). Opening the black box: A re-assessment of mediation from a counterfactual perspective. Unpublished doctoral dissertation, Columbia University, New York.
Hannay, A. (1996). Søren Kierkegaard (1843): Papers and journals. London: Penguin Books.
Hernan, M. A. (2004). A definition of causal effect for epidemiological research. Journal of Epidemiology and Community Health, 58, 265–271.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.
Institute of Medicine (1988). The future of public health. Washington, DC: National Academy Press.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment evaluations. Evaluation Review, 5, 602–619.
Krieger, N. (1994). Epidemiology and the web of causation: Has anyone seen the spider? Social Science and Medicine, 39, 887–903.
Last, J. M. (2001). A dictionary of epidemiology. New York: Oxford University Press.
Levins, R. (1996). Ten propositions on science and anti-science. Social Text, 46/47, 101–111.
Levins, R. (1997). When science fails us. Forests, Trees and People Newsletter, 32/33, 1–18.
Lieberson, S. (1985). Making it count: The improvement of social research and theory. Berkeley: University of California Press.
Little, R. J., & Rubin, D. B. (2000). Causal effects in clinical and epidemiological studies via potential outcomes: Concepts and analytical approaches. Annual Review of Public Health, 21, 121–145.
Mackie, J. L. (1965).
Causes and conditions. American Philosophical Quarterly, 4, 245–264.
Mackie, J. L. (1974). The cement of the universe: A study of causation. Oxford: Oxford University Press.
Maldonado, G., & Greenland, S. (2002). Estimating causal effects. International Journal of Epidemiology, 31, 422–429.
McMichael, A. J. (1999). Prisoners of the proximate: Loosening the constraints on epidemiology in an age of change. American Journal of Epidemiology, 149, 887–897.
Merton, R. K. (1936). The unanticipated consequences of purposive social action. American Sociological Review, 1, 894–904.
Merton, R. K. (1968). Social theory and social structure. New York: Free Press.
Milbank Memorial Fund Commission (1976). Higher education for public health: A report of the Milbank Memorial Fund Commission. New York: Prodist.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge: Cambridge University Press.
Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104, 587–592.
Rothman, K. J., & Greenland, S. (1998). Modern epidemiology. Philadelphia: Lippincott-Raven Publishers.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1986). Statistics and causal inference: Comment: Which ifs have causal answers. Journal of the American Statistical Association, 81, 961–962.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100, 322–331.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Shy, C. M. (1997). The failure of academic epidemiology: Witness for the prosecution. American Journal of Epidemiology, 145, 479–484.
VanderWeele, T. J., & Hernan, M. A. (2006). From counterfactuals to sufficient component causes and vice versa. European Journal of Epidemiology, 21, 855–858.
3 The Mathematics of Causal Relations

Judea Pearl
Introduction

Almost two decades have passed since Paul Holland published his highly cited review paper on the Neyman-Rubin approach to causal inference (Holland, 1986). Our understanding of causal inference has since increased severalfold, due primarily to advances in three areas:

1. Nonparametric structural equations
2. Graphical models
3. Symbiosis between counterfactual and graphical methods

These advances are central to the empirical sciences because the research questions that motivate most studies in the health, social, and behavioral sciences are not statistical but causal in nature. For example, what is the efficacy of a given drug in a given population? Can data prove an employer guilty of hiring discrimination? What fraction of past crimes could have been avoided by a given policy? What was the cause of death of a given individual in a specific incident? Remarkably, although much of the conceptual framework and many of the algorithmic tools needed for tackling such problems are now well established, they are hardly known to researchers in the field who could put them into practical use. Why? Solving causal problems mathematically requires certain extensions in the standard mathematical language of statistics, and these extensions are not generally emphasized in the mainstream literature and education. As a result, large segments of the statistical research community find it hard to appreciate and benefit from the many results that causal analysis has produced in the past two decades.
Causality and Psychopathology
This chapter aims at making these advances more accessible to the general research community by, first, contrasting causal analysis with standard statistical analysis and, second, comparing and unifying various approaches to causal analysis.
From Associational to Causal Analysis: Distinctions and Barriers

The Basic Distinction: Coping with Change

The aim of standard statistical analysis, typified by regression, estimation, and hypothesis-testing techniques, is to assess parameters of a distribution from samples drawn from that distribution. With the help of such parameters, one can infer associations among variables, estimate the likelihood of past and future events, and update the likelihood of events in light of new evidence or new measurements. These tasks are managed well by standard statistical analysis so long as experimental conditions remain the same. Causal analysis goes one step further; its aim is to infer not only the likelihood of events under static conditions but also the dynamics of events under changing conditions, for example, changes induced by treatments or external interventions.

This distinction implies that causal and associational concepts do not mix. There is nothing in the joint distribution of symptoms and diseases to tell us that curing the former would or would not cure the latter. More generally, there is nothing in a distribution function to tell us how that distribution would differ if external conditions were to change—say, from observational to experimental setup—because the laws of probability theory do not dictate how one property of a distribution ought to change when another property is modified. This information must be provided by causal assumptions which identify relationships that remain invariant when external conditions change. These considerations imply that the slogan "correlation does not imply causation" can be translated into a useful principle: One cannot substantiate causal claims from associations alone, even at the population level—behind every causal conclusion there must lie some causal assumption that is not testable in observational studies.
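The gap between conditioning and intervening can be made concrete with a small simulation. The model below is a hypothetical illustration, not taken from the chapter: a hidden factor U raises both X and Y, so Y is associated with X even though intervening on X leaves Y unchanged.

```python
import random

random.seed(0)

def draw(do_x=None):
    """One sample from a toy model in which hidden U drives both X and Y;
    X itself has no causal effect on Y (all numbers are invented)."""
    u = 1 if random.random() < 0.5 else 0
    if do_x is None:
        x = 1 if random.random() < (0.8 if u else 0.2) else 0   # U raises X
    else:
        x = do_x                                                # do(X = x): ignore U
    y = 1 if random.random() < (0.8 if u else 0.2) else 0       # U raises Y; X absent
    return x, y

n = 100_000
obs = [draw() for _ in range(n)]
treated = [y for x, y in obs if x == 1]
p_y_given_x1 = sum(treated) / len(treated)          # observational P(Y=1 | X=1)

intervened = [draw(do_x=1) for _ in range(n)]
p_y_do_x1 = sum(y for _, y in intervened) / n       # interventional P(Y=1 | do(X=1))
```

On this model p_y_given_x1 lands near 0.68 while p_y_do_x1 stays near the marginal 0.5: the association is real, but it does not survive the intervention, precisely because it is carried entirely by the hidden common cause U.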
Formulating the Basic Distinction

A useful demarcation line that makes the distinction between associational and causal concepts crisp and easy to apply can be formulated as follows. An associational concept is any relationship that can be defined in terms of a joint
distribution of observed variables, and a causal concept is any relationship that cannot be defined from the distribution alone. Examples of associational concepts are correlation, regression, dependence, conditional independence, likelihood, collapsibility, risk ratio, odds ratio, propensity score, "Granger causality," marginalization, conditionalization, and "controlling for." Examples of causal concepts are randomization, influence, effect, confounding, "holding constant," disturbance, spurious correlation, instrumental variables, ignorability, exogeneity, exchangeability, intervention, explanation, and attribution. The former can, while the latter cannot, be defined in terms of distribution functions. This demarcation line is extremely useful in causal analysis, for it helps investigators to trace the assumptions that are needed for substantiating various types of scientific claims. Every claim invoking causal concepts must rely on some premises that invoke such concepts; it cannot be inferred from, or even defined in terms of, statistical notions alone.
Ramifications of the Basic Distinction

This principle has far-reaching consequences that are not generally recognized in the standard statistical literature. Many researchers, for example, are still convinced that confounding is solidly founded in standard, frequentist statistics and that it can be given an associational definition, saying (roughly) "U is a potential confounder for examining the effect of treatment X on outcome Y when both U and X and U and Y are not independent" (Pearl, 2009b, p. 338). That this definition and all of its many variants must fail is obvious from the demarcation line above; "independence" is an associational concept, while confounding is a causal concept, a tool used in establishing causal relations. The two do not mix; hence, the definition must be false. Therefore, to the bitter disappointment of generations of epidemiology researchers, confounding bias cannot be detected or corrected by statistical methods alone; one must make some judgmental assumptions regarding causal relationships in the problem before an adjustment (e.g., by stratification) can safely correct for confounding bias.

Another ramification of the sharp distinction between associational and causal concepts is that any mathematical approach to causal analysis must acquire new notation for expressing causal relations—probability calculus is insufficient. To illustrate, the syntax of probability calculus does not permit us to express the simple fact that "symptoms do not cause diseases," let alone to draw mathematical conclusions from such facts. All we can say is that two events are dependent—meaning that if we find one, we can expect to encounter the other—but we cannot distinguish statistical dependence, quantified by the conditional probability p(disease | symptom), from causal
dependence, for which we have no expression in standard probability calculus. Scientists seeking to express causal relationships must therefore supplement the language of probability with a vocabulary for causality, one in which the symbolic representation for the relation ‘‘symptoms cause disease’’ is distinct from the symbolic representation of ‘‘symptoms are associated with disease.’’
Two Mental Barriers: Untested Assumptions and New Notation

The preceding requirements—(1) to commence causal analysis with untested,1 theoretically or judgmentally based assumptions and (2) to extend the syntax of probability calculus—constitute the two main obstacles to the acceptance of causal analysis among statisticians and among professionals with traditional training in statistics. Associational assumptions, even untested, are testable in principle, given a sufficiently large sample and sufficiently fine measurements. Causal assumptions, in contrast, cannot be verified even in principle, unless one resorts to experimental control. This difference stands out in Bayesian analysis. Though the priors that Bayesians commonly assign to statistical parameters are untested quantities, the sensitivity to these priors tends to diminish with increasing sample size. In contrast, sensitivity to prior causal assumptions—say, that treatment does not change gender—remains substantial regardless of sample size. This makes it doubly important that the notation we use for expressing causal assumptions be meaningful and unambiguous so that one can clearly judge the plausibility or inevitability of the assumptions articulated. Statisticians can no longer ignore the mental representation in which scientists store experiential knowledge, since it is this representation, and the language used to access it, that determine the reliability of the judgments upon which the analysis so crucially depends.

How does one recognize causal expressions in the statistical literature? Those versed in the potential-outcome notation (Neyman, 1923; Rubin, 1974; Holland, 1986) can recognize such expressions through the subscripts that are attached to counterfactual events and variables, for example, Yx(u) or Zxy—some authors use parenthetical expressions, such as Y(x, u) or Z(x, y).
The expression Yx(u), for example, stands for the value that outcome Y would take in individual u had treatment X been at level x. If u is chosen at random, Yx is a random variable and one can talk about the probability that Yx
1. By ‘‘untested’’ I mean untested using frequency data in nonexperimental studies.
would attain a value y in the population, written p(Yx = y). Alternatively, Pearl (1995) used expressions of the form p[Y = y | set(X = x)] or p[Y = y | do(X = x)] to denote the probability (or frequency) that event (Y = y) would occur if treatment condition (X = x) were enforced uniformly over the population.2 Still a third notation that distinguishes causal expressions is provided by graphical models, where the arrows convey causal directionality.3

However, few have taken seriously the textbook requirement that any introduction of new notation must entail a systematic definition of the syntax and semantics that govern the notation. Moreover, in the bulk of the statistical literature before 2000, causal claims rarely appear in the mathematics. They surface only in the verbal interpretation that investigators occasionally attach to certain associations and in the verbal description with which investigators justify assumptions. For example, the assumption that a covariate is not affected by a treatment, a necessary assumption for the control of confounding (Cox, 1958), is expressed in plain English, not in a mathematical expression. Remarkably, though the necessity of explicit causal notation is now recognized by most leaders in the field, the use of such notation has remained enigmatic to most rank-and-file researchers, and its potential still lies grossly underutilized in the statistics-based sciences. The reason for this, I am firmly convinced, can be traced to the way in which causal analysis has been presented to the research community, relying primarily on outdated paradigms of controlled randomized experiments and black-box "missing-data" models (Rubin, 1974; Holland, 1986). The next section provides a conceptualization that overcomes these mental barriers; it offers both a friendly mathematical machinery for cause–effect analysis and a formal foundation for counterfactual analysis.
The Language of Diagrams and Structural Equations

Semantics: Causal Effects and Counterfactuals

How can one express mathematically the common understanding that symptoms do not cause diseases? The earliest attempt to formulate such a relationship mathematically was made in the 1920s by the geneticist Sewall Wright (1921), who used a combination of equations and graphs. For example, if X stands for a disease variable and Y stands for a certain symptom of the disease, Wright would write a linear equation

y = βx + u     (1)

where x stands for the level (or severity) of the disease, y stands for the level (or severity) of the symptom, and u stands for all factors, other than the disease in question, that could possibly affect Y.4 In interpreting this equation, one should think of a physical process whereby nature examines the values of X and U and, accordingly, assigns variable Y the value y = βx + u. To express the directionality inherent in this process, Wright augmented the equation with a diagram, later called a "path diagram," in which arrows are drawn from (perceived) causes to their (perceived) effects and, more importantly, the absence of an arrow makes the empirical claim that the value nature assigns to one variable is not determined by the value taken by another.5

The variables V and U are called "exogenous"; they represent observed or unobserved background factors that the modeler decides to keep unexplained, that is, factors that influence, but are not influenced by, the other variables (called "endogenous") in the model. If correlation is judged possible between two exogenous variables, U and V, it is customary to connect them by a dashed double arrow, as shown in Figure 3.1b. To summarize, path diagrams encode causal assumptions via missing arrows, representing claims of zero influence, and missing double arrows (e.g., between V and U), representing the (causal) assumption Cov(U, V) = 0.

2. Clearly, P[Y = y | do(X = x)] is equivalent to P(Yx = y). This is what we normally assess in a controlled experiment, with X randomized, in which the distribution of Y is estimated for each level x of X.
3. These notational clues should be useful for detecting inadequate definitions of causal concepts; any definition of confounding, randomization, or instrumental variables that is cast in standard probability expressions, void of graphs, counterfactual subscripts, or do(*) operators, can safely be discarded as inadequate.
Figure 3.1 A simple structural equation model, and its associated diagrams. Unobserved exogenous variables are connected by dashed arrows.
4. We use capital letters (e.g., X, Y, U) for variable names and lower case letters (e.g., x, y, u) for values taken by these variables.
5. A weaker class of causal diagrams, known as "causal Bayesian networks," encodes interventional, rather than functional, dependencies; it can be used to predict outcomes of randomized experiments but not probabilities of counterfactuals (for formal definition, see Pearl, 2000a, pp. 22–24).
Figure 3.2 (a) The diagram associated with the structural model of Eq. (2). (b) The diagram associated with the modified model of Eq. (3), representing the intervention do(X = x0).
The generalization to a nonlinear system of equations is straightforward. For example, the nonparametric interpretation of the diagram of Figure 3.2a corresponds to a set of three functions, each corresponding to one of the observed variables:

z = fZ(w)
x = fX(z, v)     (2)
y = fY(x, u)

where W, V, and U are here assumed to be jointly independent but, otherwise, arbitrarily distributed. Remarkably, unknown to most economists and philosophers, structural equation models provide a formal interpretation and symbolic machinery for analyzing counterfactual relationships of the type "Y would be y had X been x in situation U = u," denoted Yx(u) = y. Here, U stands for the vector of all exogenous variables and represents all relevant features of an experimental unit (i.e., a patient or a subject). The key idea is to interpret the phrase "had X been x0" as an instruction to modify the original model M and replace the equation for X by a constant, x0, yielding a modified model, Mx0:

z = fZ(w)
x = x0     (3)
y = fY(x, u)

the graphical description of which is shown in Figure 3.2b. This replacement permits the constant x0 to differ from the actual value of X—namely, fX(z, v)—without rendering the system of equations inconsistent, thus yielding a formal definition of counterfactuals in multistage models, where the dependent variable in one equation may be an independent variable in another (Balke & Pearl, 1994a, 1994b; Pearl, 2000b). The general definition reads as follows:

Yx(u) = YMx(u).     (4)
In words, the counterfactual Yx(u) in model M is defined as the solution for Y in the modified submodel Mx, in which the equation for X is replaced by X = x. For example, to compute the average causal effect of X on Y, that is, E(Yx0), we solve equation 3 for Y in terms of the exogenous variables, yielding Yx0 = fY(x0, u), and average over U and V. To answer more sophisticated questions, such as whether Y would be y1 if X were x1, given that in fact Y is y0 and X is x0, we need to compute the conditional probability P(Yx1 = y1 | Y = y0, X = x0), which is well defined once we know the forms of the structural equations and the distribution of the exogenous variables in the model.

This formalization of counterfactuals, cast as solutions to modified systems of equations, provides the conceptual and formal link between structural equation models used in economics and social science; the potential-outcome framework, to be discussed later under The Language of Potential Outcomes; Lewis's (1973) "closest-world" counterfactuals; Woodward's (2003) "interventionalism" approach; Mackie's (1965) "insufficient but necessary components of unnecessary but sufficient" (INUS) condition; and Rothman's (1976) "sufficient component" framework (see VanderWeele and Robins, 2007). The next section discusses two long-standing problems that have been completely resolved in purely graphical terms, without delving into algebraic techniques.
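The modified-submodel recipe of equations 3 and 4 can be sketched directly in code. The particular functions and exogenous values below are invented for illustration; the point is only the mechanics of solving a submodel in which the equation for X has been replaced by a constant.

```python
# A structural model in the spirit of Eq. (2); the functions are invented.
def f_Z(w):    return w
def f_X(z, v): return 1 if z + v >= 1 else 0
def f_Y(x, u): return 1 if x + u >= 1 else 0

def solve(unit, x0=None):
    """Solve for (z, x, y) given one unit's exogenous values (w, v, u).
    Passing x0 replaces the equation for X by the constant x0, i.e.,
    it solves the submodel M_x0 of Eq. (3)."""
    z = f_Z(unit["w"])
    x = f_X(z, unit["v"]) if x0 is None else x0
    y = f_Y(x, unit["u"])
    return {"z": z, "x": x, "y": y}

unit = {"w": 0, "v": 1, "u": 0}        # one experimental unit
factual = solve(unit)                  # the actual world: here x = 1, y = 1
y_x0 = solve(unit, x0=0)["y"]          # counterfactual Y_{x=0}(u), per Eq. (4)
y_x1 = solve(unit, x0=1)["y"]          # counterfactual Y_{x=1}(u)
```

Note that solve(unit, x0=factual["x"]) reproduces the factual y for this unit, which is exactly the consistency property discussed later in the chapter (equation 7).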
Confounding and Causal Effect Estimation

The central target of most studies in the social and health sciences is the elucidation of cause–effect relationships among variables of interest, for example, treatments, policies, preconditions, and outcomes. While good statisticians have always known that the elucidation of causal relationships from observational studies must rest on assumptions about how the data were generated, the relative roles of assumptions and data and the ways of using those assumptions to eliminate confounding bias have been a subject of much controversy. The preceding structural framework puts these controversies to rest.

Covariate Selection: The Back-Door Criterion

Consider an observational study where we wish to find the effect of X on Y, for example, treatment on response, and assume that the factors deemed relevant to the problem are structured as in Figure 3.3; some are affecting the response, some are affecting the treatment, and some are affecting both treatment and response. Some of these factors may be unmeasurable, such as genetic trait or lifestyle, while others are measurable, such as gender, age,
Figure 3.3 Graphical model illustrating the back-door criterion. Error terms are not shown explicitly.
and salary level. Our problem is to select a subset of these factors for measurement and adjustment so that if we compare treated vs. untreated subjects having the same values of the selected factors, we get the correct treatment effect in that subpopulation of subjects. Such a set of factors is called a "sufficient set," "admissible," or a set "appropriate for adjustment." The problem of defining a sufficient set, let alone finding one, has baffled epidemiologists and social scientists for decades (for review, see Greenland, Pearl, & Robins, 1999; Pearl, 2000a, 2009a).

The following criterion, named the "back-door" criterion (Pearl, 1993a), provides a graphical method of selecting such a set of factors for adjustment. It states that a set, S, is appropriate for adjustment if two conditions hold:

1. No element of S is a descendant of X.
2. The elements of S "block" all back-door paths from X to Y, that is, all paths that end with an arrow pointing to X.6

Based on this criterion we see, for example, that each of the sets {Z1, Z2, Z3}, {Z1, Z3}, and {W2, Z3} is sufficient for adjustment because each blocks all back-door paths between X and Y. The set {Z3}, however, is not sufficient for adjustment because it does not block the path X ← W1 ← Z1 → Z3 ← Z2 → W2 → Y.

The implication of finding a sufficient set, S, is that stratifying on S is guaranteed to remove all confounding bias relative to the causal effect of X on Y. In other words, it renders the causal effect of X on Y identifiable, via

P(Y = y | do(X = x)) = Σ_s P(Y = y | X = x, S = s) P(S = s)     (5)

6. In this criterion, a set, S, of nodes is said to block a path, P, if either (1) P contains at least one arrow-emitting node that is in S or (2) P contains at least one collision node (e.g., → Z ←) that is outside S and has no descendant in S (see Pearl, 2009b, pp. 16–17, 335–337).
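The adjustment formula in Eq. (5) is easy to exercise by simulation. The toy model below is invented for illustration: an observed variable S satisfies the back-door criterion, and stratifying on it recovers the built-in treatment effect, while the naive treated-versus-untreated contrast does not.

```python
import random

random.seed(2)

# Toy model, all parameters invented: S -> X, S -> Y, and X -> Y with a
# true effect of +0.2 on P(Y=1); S is a sufficient (back-door) set.
def sample():
    s = 1 if random.random() < 0.5 else 0
    x = 1 if random.random() < (0.7 if s else 0.3) else 0
    y = 1 if random.random() < 0.1 + 0.2 * x + 0.5 * s else 0
    return s, x, y

data = [sample() for _ in range(200_000)]
n = len(data)

def p_do(x):
    """Adjustment formula, Eq. (5): sum_s P(Y=1 | x, s) P(s)."""
    total = 0.0
    for s in (0, 1):
        stratum = [(xx, yy) for ss, xx, yy in data if ss == s]
        cell = [yy for xx, yy in stratum if xx == x]
        total += (sum(cell) / len(cell)) * (len(stratum) / n)
    return total

ate = p_do(1) - p_do(0)                    # adjusted: recovers ~0.2

def p_cond(x):                             # naive P(Y=1 | X=x), no adjustment
    cell = [yy for ss, xx, yy in data if xx == x]
    return sum(cell) / len(cell)

naive = p_cond(1) - p_cond(0)              # inflated by the back-door path
```

On this model the adjusted contrast comes out near the built-in 0.2, while the naive contrast is roughly twice that; the difference is confounding bias carried by S.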
Since all factors on the right-hand side of the equation are estimable (e.g., by regression) from the preinterventional data, the causal effect can likewise be estimated from such data without bias. The back-door criterion allows us to write equation 5 directly, after selecting a sufficient set, S, from the diagram, without resorting to any algebraic manipulation. The selection criterion can be applied systematically to diagrams of any size and shape, thus freeing analysts from judging whether "X is conditionally ignorable given S," a formidable mental task required in the potential-response framework (Rosenbaum & Rubin, 1983). The criterion also enables the analyst to search for an optimal set of covariates—namely, a set, S, that minimizes measurement cost or sampling variability (Tian, Paz, & Pearl, 1998).

General Control of Confounding

Adjusting for covariates is only one of many methods that permit us to estimate causal effects in nonexperimental studies. A much more general identification criterion is provided by the following theorem:

Theorem 1 (Tian & Pearl, 2002)
A sufficient condition for identifying the causal effect P[y|do(x)] is that every path between X and any of its children traces at least one arrow emanating from a measured variable.7

For example, if W3 is the only observed covariate in the model of Figure 3.3, then there exists no sufficient set for adjustment (because no set of observed covariates can block the paths from X to Y through Z3), yet P[y|do(x)] can nevertheless be estimated since every path from X to W3 (the only child of X) traces either the arrow X → W3 or the arrow W3 → Y, each emanating from a measured variable. In this example, the variable W3 acts as a "mediating instrumental variable" (Pearl, 1993b; Chalak & White, 2006) and yields the following estimand:

P(Y = y | do(X = x)) = Σ_w P(W3 = w | do(X = x)) P(Y = y | do(W3 = w))
                     = Σ_w P(w | x) Σ_{x'} P(y | w, x') P(x')     (6)
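Estimand (6) can likewise be checked by simulation. In the toy model below (all parameters invented), U confounds X and Y but X acts on Y only through its observed child W3, mirroring the structure just described; the observational estimand recovers the same effect as an in-silico randomized experiment on the same model.

```python
import random

random.seed(3)

# Toy model for estimand (6), parameters invented: U -> X, U -> Y,
# and X -> W3 -> Y, with no direct X -> Y arrow.
def sample(do_x=None):
    u = 1 if random.random() < 0.5 else 0
    if do_x is None:
        x = 1 if random.random() < (0.75 if u else 0.25) else 0
    else:
        x = do_x
    w = 1 if random.random() < (0.9 if x else 0.1) else 0
    y = 1 if random.random() < 0.2 + 0.4 * w + 0.3 * u else 0
    return x, w, y

n = 200_000
data = [sample() for _ in range(n)]

def p_do(x):
    """Eq. (6): P(Y=1 | do(X=x)) = sum_w P(w|x) sum_x' P(Y=1 | w, x') P(x')."""
    total = 0.0
    for w in (0, 1):
        given_x = [r for r in data if r[0] == x]
        p_w = sum(1 for r in given_x if r[1] == w) / len(given_x)
        inner = 0.0
        for xp in (0, 1):
            cell = [r[2] for r in data if r[0] == xp and r[1] == w]
            inner += (sum(cell) / len(cell)) * (sum(1 for r in data if r[0] == xp) / n)
        total += p_w * inner
    return total

est_ate = p_do(1) - p_do(0)                # from observational data alone
# Compare with a simulated randomized experiment on the same model:
truth = (sum(sample(do_x=1)[2] for _ in range(n))
         - sum(sample(do_x=0)[2] for _ in range(n))) / n
```

Despite the unmeasured confounder U, the two estimates agree (both near 0.32 for these invented parameters), because W3 intercepts the causal path and is itself unconfounded with X.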
7. Before applying this criterion, one may delete from the causal graph all nodes that are not ancestors of Y.

More recent results extend this theorem by (1) presenting a necessary and sufficient condition for identification (Shpitser & Pearl, 2006) and
(2) extending the condition from causal effects to any counterfactual expression (Shpitser & Pearl, 2007). The corresponding unbiased estimands for these causal quantities are readable directly from the diagram.
The Language of Potential Outcomes

The elementary object of analysis in the potential-outcome framework is the unit-based response variable, denoted Yx(u)—read "the value that Y would obtain in unit u had treatment X been x" (Neyman, 1923; Rubin, 1974). These subscripted variables are treated as undefined quantities, useful for expressing the causal quantities we seek, but are not derived from other quantities in the model. In contrast, in the previous section counterfactual entities were derived from a set of meaningful physical processes, each represented by an equation, and a unit was interpreted as a vector u of background factors that characterize an experimental unit. Each structural equation model thus provides a compact representation for a huge number of counterfactual claims, guaranteed to be consistent. In view of these features, the structural definition of Yx(u) (equation 4) can be regarded as the formal basis for the potential-outcome approach. It interprets the opaque English phrase "the value that Y would obtain in unit u had X been x" in terms of a scientifically based mathematical model that allows such values to be computed unambiguously. Consequently, important concepts in potential-response analysis that researchers find ill-defined or esoteric often obtain meaningful and natural interpretation in the structural semantics. Examples are "unit" ("exogenous variables" in structural semantics), "principal stratification" ("equivalence classes" in structural semantics) (Balke & Pearl, 1994b; Pearl, 2000b), "conditional ignorability" ("back-door condition" in Pearl, 1993a), and "assignment mechanism" [P(x | direct causes of X) in structural semantics]. The next two subsections examine how assumptions and inferences are handled in the potential-outcome approach vis-à-vis the graphical–structural approach.
Formulating Assumptions

The distinct characteristic of the potential-outcome approach is that, although its primitive objects are undefined, hypothetical quantities, the analysis itself is conducted almost entirely within the axiomatic framework of probability theory. This is accomplished by postulating a "super" probability function on both hypothetical and real events, treating the former as "missing data." In other words, if U is treated as a random variable, then the value of the counterfactual Yx(u) becomes a random variable as well, denoted as Yx.
The potential-outcome analysis proceeds by treating the observed distribution P(x1, . . . , xn) as the marginal distribution of an augmented probability function P* defined over both observed and counterfactual variables. Queries about causal effects are phrased as queries about the probability distribution of the counterfactual variable of interest, written P*(Yx = y). The new hypothetical entities Yx are treated as ordinary random variables; for example, they are assumed to obey the axioms of probability calculus, the laws of conditioning, and the axioms of conditional independence. Moreover, these hypothetical entities are not entirely whimsical but are assumed to be connected to observed variables via consistency constraints (Robins, 1986) such as

X = x ⇒ Yx = Y,     (7)

which states that for every u, if the actual value of X turns out to be x, then the value that Y would take on if X were x is equal to the actual value of Y. For example, a person who chose treatment x and recovered would also have recovered if given treatment x by design.

The main conceptual difference between the two approaches is that, whereas the structural approach views the subscript x as an operation that changes the distribution but keeps the variables the same, the potential-outcome approach views Yx to be a different variable, unobserved and loosely connected to Y through relations such as equation 7. Pearl (2000a, chap. 7) shows, using the structural interpretation of Yx(u), that it is indeed legitimate to treat counterfactuals as jointly distributed random variables in all respects, that consistency constraints like equation 7 are automatically satisfied in the structural interpretation, and, moreover, that investigators need not be concerned about any additional constraints except the following two:8

Yyz = y for all y and z     (8)
Xz = x ⇒ Yxz = Yz for all x and z     (9)
Equation 8 ensures that the intervention do(Y = y) results in the condition Y = y, regardless of concurrent interventions, say do(Z = z), that are applied to variables other than Y. Equation 9 generalizes equation 7 to cases where Z is held fixed at z. To communicate substantive causal knowledge, the potential-outcome analyst must express causal assumptions as constraints on P*, usually in the
8. This completeness result is due to Halpern (1998), who noted that an additional axiom

{Yxz = y} & {Zxy = z} ⇒ Yx = y

must hold in nonrecursive models. This fundamental axiom may come to haunt economists and social scientists who blindly apply Neyman-Rubin analysis in their fields.
form of conditional independence assertions involving counterfactual variables. In Figure 3.2(a), for instance, to communicate the understanding that a treatment assignment (Z) is randomized (hence independent of both U and V), the potential-outcome analyst needs to use the independence constraint Z ⊥⊥ {Xz, Yx}. To further formulate the understanding that Z does not affect Y directly, except through X, the analyst would write a so-called exclusion restriction: Yxz = Yx. Clearly, no mortal can judge the validity of such assumptions in any real-life problem without resorting to graphs.9
Performing Inferences

A collection of assumptions of this type might sometimes be sufficient to permit a unique solution to the query of interest; in other cases, only bounds on the solution can be obtained. For example, if one can plausibly assume that a set, Z, of covariates satisfies the conditional independence

Yx ⊥⊥ X | Z     (10)

(an assumption that was termed "conditional ignorability" by Rosenbaum & Rubin, 1983), then the causal effect, P*(Yx = y), can readily be evaluated to yield

P*(Yx = y) = Σ_z P*(Yx = y | z) P(z)
           = Σ_z P*(Yx = y | x, z) P(z)     (using (10))
           = Σ_z P*(Y = y | x, z) P(z)      (using (7))
           = Σ_z P(y | x, z) P(z),          (11)

which is the usual covariate-adjustment formula, as in equation 5. Note that almost all mathematical operations in this derivation are conducted within the safe confines of probability calculus. Save for an occasional application of rule 9 or 7, the analyst may forget that Yx stands for a counterfactual quantity—it is treated as any other random variable, and the entire derivation follows the course of routine probability exercises. However, this mathematical illusion comes at the expense of conceptual clarity, especially at a stage where causal assumptions need to be formulated. The reader may appreciate this aspect by attempting to judge whether the assumption of conditional ignorability (equation 10), the key to the derivation

9. Even with the use of graphs the task is not easy; for example, the reader should try to verify whether {Z ⊥⊥ Xz | Y} holds in the simple model of Figure 3.2(a). The answer is given in Pearl (2000a, p. 214).
of equation 11, holds in any familiar situation—say, in the experimental setup of Figure 3.2(a). This assumption reads "the value that Y would obtain had X been x is independent of X, given Z" (see footnote 4). Such assumptions of conditional independence among counterfactual variables are not straightforward to comprehend or ascertain, for they are cast in a language far removed from ordinary understanding of cause and effect. When counterfactual variables are not viewed as by-products of a deeper, process-based model, it is also hard to ascertain whether all relevant counterfactual independence judgments have been articulated, whether the judgments articulated are redundant, or whether those judgments are self-consistent. The need to express, defend, and manage formidable counterfactual relationships of this type explains the slow acceptance of causal analysis among epidemiologists and statisticians and why economists and social scientists continue to use structural equation models instead of the potential-outcome alternatives advocated in Holland (1988); Angrist, Imbens, and Rubin (1996); and Sobel (1998). On the other hand, the algebraic machinery offered by the potential-outcome notation, once a problem is properly formalized, can be powerful in refining assumptions (Angrist et al., 1996), deriving consistent estimands (Robins, 1986), analyzing mediation (Pearl, 2001), bounding probabilities of causation (Tian & Pearl, 2000), and combining data from experimental and nonexperimental studies (Pearl, 2000a, pp. 302–303).
Combining Graphs and Counterfactuals—The Mediation Formula

Pearl (2000a, p. 232) presents a way of combining the best features of the two approaches. It is based on encoding causal assumptions in the language of diagrams, translating these assumptions into potential-outcome notation, performing the mathematics in the algebraic language of counterfactuals, and, finally, interpreting the result in plain causal language. Often, the answer desired can be obtained directly from the diagram, and no translation is necessary (as demonstrated earlier, Confounding and Causal Effect Estimation). One area that has benefited substantially from this symbiosis is the analysis of direct and indirect effects, also known as "mediation analysis" (Shrout & Bolger, 2002), which has resisted generalizations to discrete variables and nonlinear interactions for several decades (Robins & Greenland, 1992; Mackinnon, Lockwood, Brown, Wang, & Hoffman, 2007). The obstacles were definitional; the direct effect is sensitive to the level at which we condition the intermediate variable, while the indirect effect cannot be defined by conditioning on a third variable or taking the difference between the total and direct effects.
3 The Mathematics of Causal Relations
The structural definition of counterfactuals (equation 4) and the graphical analysis (see Confounding and Causal Effect Estimation) combined to produce formal definitions of, and graphical conditions under which, direct and indirect effects can be estimated from data (Pearl, 2001; Petersen, Sinisi, & van der Laan, 2006). In particular, under conditions of no unmeasured (or uncontrolled for) confounders, this symbiosis has produced the following Mediation Formulas for the expected direct (DE) and indirect (IE) effects of the transition from X = x to X = x' (with outcome Y and mediating set Z):

DE = Σz [E(Y|x', z) − E(Y|x, z)] P(z|x)   (12)

IE = Σz E(Y|x, z) [P(z|x') − P(z|x)]   (13)
These general formulas are applicable to any type of variables,10 any nonlinear interactions, and any distribution and, moreover, are readily estimable by regression. IE (respectively, DE) represents the average increase in the outcome Y that the transition from X = x to X = x' is expected to produce absent any direct (respectively, indirect) effect of X on Y. When the outcome Y is binary (e.g., recovery or hiring), the ratio (1 − IE/TE) represents the fraction of responding individuals who owe their response to direct paths, while (1 − DE/TE) represents the fraction who owe their response to Z-mediated paths. TE stands for the total effect, TE = E(Y|x') − E(Y|x), which, in nonlinear systems, may or may not be the sum of the direct and indirect effects. Additional results spawned by the structural–graphical–counterfactual symbiosis include effect estimation under noncompliance (Balke & Pearl, 1997; Chickering & Pearl, 1997), mediating instrumental variables (Pearl, 1993b; Brito & Pearl, 2006), robustness analysis (Pearl, 2004), selecting predictors for propensity scores (Pearl, 2010a, 2010c), and estimating the effect of treatment on the treated (Shpitser & Pearl, 2009). Detailed descriptions of these results are given in the corresponding articles (available at http://bayes.cs.ucla.edu/csl_papers.html).
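As a worked illustration of equations 12 and 13 (with invented numbers, not data from any study discussed here), the following sketch evaluates DE, IE, and TE for a binary exposure X, binary mediator Z, and outcome Y whose conditionals are specified directly:

```python
# Hypothetical binary mediation model; all numbers are invented for illustration.
# X = exposure, Z = mediator, Y = outcome; no unmeasured confounding assumed.
p_z1_given_x = {0: 0.3, 1: 0.7}             # P(Z = 1 | x)
e_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5,
                (1, 0): 0.4, (1, 1): 0.8}   # E[Y | x, z]

def pz(z, x):
    """P(Z = z | x) for binary z."""
    return p_z1_given_x[x] if z == 1 else 1 - p_z1_given_x[x]

x, x_prime = 0, 1

# Equation 12: DE = sum_z [E(Y|x',z) - E(Y|x,z)] P(z|x)
DE = sum((e_y_given_xz[x_prime, z] - e_y_given_xz[x, z]) * pz(z, x) for z in (0, 1))

# Equation 13: IE = sum_z E(Y|x,z) [P(z|x') - P(z|x)]
IE = sum(e_y_given_xz[x, z] * (pz(z, x_prime) - pz(z, x)) for z in (0, 1))

# Total effect: TE = E(Y|x') - E(Y|x)
def e_y(v):
    return sum(e_y_given_xz[v, z] * pz(z, v) for z in (0, 1))

TE = e_y(x_prime) - e_y(x)
print(f"DE = {DE:.2f}, IE = {IE:.2f}, TE = {TE:.2f}")
# This particular model is linear with no X-Z interaction, so DE + IE equals TE;
# with interactions present, the decomposition would no longer be additive.
```

In practice the conditionals P(z|x) and E(Y|x, z) would be estimated by regression from data, as the text notes; the formulas themselves are unchanged.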
Conclusions

Statistics is strong in devising ways of describing data and inferring distributional parameters from a sample. Causal inference requires two additional
10. Integrals should replace summations when Z is continuous. Generalizations to cases involving observed or unobserved confounders are given in Pearl (2001) and exemplified in Pearl (2010a, 2010b). Conceptually, IE measures the average change in Y under the operation of setting X to x and, simultaneously, setting Z to whatever value it would have obtained under X = x’ (Robins & Greenland, 1992).
ingredients: a science-friendly language for articulating causal knowledge and a mathematical machinery for processing that knowledge, combining it with data, and drawing new causal conclusions about a phenomenon. This chapter introduces nonparametric structural equation models as a formal and meaningful language for formulating causal knowledge and for explicating causal concepts used in scientific discourse. These include randomization, intervention, direct and indirect effects, confounding, counterfactuals, and attribution. The algebraic component of the structural language coincides with the potential-outcome framework, and its graphical component embraces Wright’s method of path diagrams (in its nonparametric version). When unified and synthesized, the two components offer investigators a powerful methodology for empirical research (e.g., Morgan & Winship, 2007; Greenland et al., 1999; Glymour & Greenland, 2008; Chalak & White, 2006; Pearl, 2009a). Perhaps the most important message of the discussion and methods presented in this chapter would be a widespread awareness that (1) all studies concerning causal relations must begin with causal assumptions of some sort and (2) a friendly and formal language is currently available for articulating such assumptions. This means that scientific articles concerning questions of causation must contain a section in which causal assumptions are articulated using either graphs or subscripted formulas. Authors who wish their assumptions to be understood, scrutinized, and discussed by readers and colleagues would do well to use graphs. Authors who refrain from using graphs would be risking a suspicion of attempting to avoid transparency of their working assumptions. Another important implication is that every causal inquiry can be mathematized. 
In other words, mechanical procedures can now be invoked to determine what assumptions investigators must be willing to make in order for desired quantities to be estimable consistently from the data. This is not to say that the needed assumptions would be reasonable or that the resulting estimation method would be easy. It means that the needed causal assumptions can be made transparent and brought up for discussion and refinement and that, once consistency is assured, causal quantities can be estimated from data through ordinary statistical methods, free of the mystical aura that has shrouded causal analysis in the past.
References

Angrist, J., Imbens, G., & Rubin, D. (1996). Identification of causal effects using instrumental variables (with comments). Journal of the American Statistical Association, 91(434), 444–472.
Balke, A., & Pearl, J. (1994a). Counterfactual probabilities: Computational methods, bounds, and applications. In R. L. de Mantaras and D. Poole (Eds.), Proceedings of the
Tenth Conference on Uncertainty in Artificial Intelligence (pp. 46–54). San Mateo, CA: Morgan Kaufmann.
Balke, A., & Pearl, J. (1994b). Probabilistic evaluation of counterfactual queries. In Proceedings of the Twelfth National Conference on Artificial Intelligence (Vol. I, pp. 230–237). Menlo Park, CA: MIT Press.
Balke, A., & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439), 1172–1176.
Brito, C., & Pearl, J. (2006). Graphical condition for identification in recursive SEM. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (pp. 47–54). Corvallis, OR: AUAI Press.
Chalak, K., & White, H. (2006). An extended class of instrumental variables for the estimation of causal effects (Tech. Rep., Discussion Paper). San Diego: University of California, San Diego, Department of Economics.
Chickering, D., & Pearl, J. (1997). A clinician's tool for analyzing non-compliance. Computing Science and Statistics, 29(2), 424–431.
Cox, D. (1958). The Planning of Experiments. New York: John Wiley & Sons.
Glymour, M., & Greenland, S. (2008). Causal diagrams. In K. Rothman, S. Greenland, and T. Lash (Eds.), Modern Epidemiology (3rd ed., pp. 183–209). Philadelphia: Lippincott Williams & Wilkins.
Greenland, S., Pearl, J., & Robins, J. (1999). Causal diagrams for epidemiologic research. Epidemiology, 10(1), 37–48.
Halpern, J. (1998). Axiomatizing causal reasoning. In G. Cooper and S. Moral (Eds.), Uncertainty in Artificial Intelligence (pp. 202–210). San Francisco: Morgan Kaufmann. (Reprinted in Journal of Artificial Intelligence Research, 12, 17–37, 2000.)
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960.
Holland, P. (1988). Causal inference, path analysis, and recursive structural equations models. In C. Clogg (Ed.), Sociological Methodology (pp. 449–484). Washington, DC: American Sociological Association.
Lewis, D. (1973). Counterfactuals. Cambridge, MA: Harvard University Press.
Mackie, J. (1965). Causes and conditions. American Philosophical Quarterly, 2(4), 261–264. (Reprinted in E. Sosa and M. Tooley [Eds.], Causation. Oxford: Oxford University Press, 1993.)
MacKinnon, D., Lockwood, C., Brown, C., Wang, W., & Hoffman, J. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499–513.
Morgan, S., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). New York: Cambridge University Press.
Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Statistical Science, 5(4), 465–480.
Pearl, J. (1993a). Comment: Graphical models, causality, and intervention. Statistical Science, 8(3), 266–269.
Pearl, J. (1993b). Mediating instrumental variables (Tech. Rep. No. TR-210). Los Angeles: University of California, Los Angeles, Department of Computer Science. http://ftp.cs.ucla.edu/pub/stat_ser/R210.pdf.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–710.
Pearl, J. (2000a). Causality: Models, Reasoning, and Inference. New York: Cambridge University Press.
Pearl, J. (2000b). Comment on A. P. Dawid's, Causal inference without counterfactuals. Journal of the American Statistical Association, 95(450), 428–431.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan Kaufmann.
Pearl, J. (2004). Robustness of causal claims. In M. Chickering and J. Halpern (Eds.), Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (pp. 446–453). Arlington, VA: AUAI Press.
Pearl, J. (2009a). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146. Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf.
Pearl, J. (2009b). Causality: Models, Reasoning, and Inference (2nd ed.). New York: Cambridge University Press.
Pearl, J. (2009c). Remarks on the method of propensity scores. Statistics in Medicine, 28, 1415–1416. Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r345-sim.pdf.
Pearl, J. (2010a). The foundation of causal inference (Tech. Rep. No. R-355). Los Angeles: University of California, Los Angeles. http://ftp.cs.ucla.edu/pub/stat_ser/r355.pdf. Forthcoming, Sociological Methodology.
Pearl, J. (2010b). The mediation formula: A guide to the assessment of causal pathways in non-linear models (Tech. Rep. No. R-363). Los Angeles: University of California, Los Angeles. http://ftp.cs.ucla.edu/pub/stat_ser/r363.pdf.
Pearl, J. (2010c). On a class of bias-amplifying variables that endanger effect estimates. In P. Grunwald and P. Spirtes (Eds.), Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (pp. 425–432). Corvallis, OR: AUAI Press. http://ftp.cs.ucla.edu/pub/stat_ser/r356.pdf.
Petersen, M., Sinisi, S., & van der Laan, M. (2006). Estimation of direct causal effects. Epidemiology, 17(3), 276–284.
Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—applications to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393–1512.
Robins, J., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3(2), 143–155.
Rosenbaum, P., & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Rothman, K. (1976). Causes. American Journal of Epidemiology, 104, 587–592.
Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
Shpitser, I., & Pearl, J. (2006). Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 1219–1226). Menlo Park, CA: AAAI Press.
Shpitser, I., & Pearl, J. (2007). What counterfactuals can be tested. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (pp. 352–359). Vancouver, Canada: AUAI Press. (Reprinted in Journal of Machine Learning Research, 9, 1941–1979, 2008.)
Shpitser, I., & Pearl, J. (2009). Effects of treatment on the treated: Identification and generalization. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. Montreal: AUAI Press.
Shrout, P., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7(4), 422–445.
Sobel, M. (1998). Causal inference in statistical models of the process of socioeconomic achievement. Sociological Methods & Research, 27(2), 318–348.
Tian, J., Paz, A., & Pearl, J. (1998). Finding minimal separating sets (Tech. Rep. No. R-254). Los Angeles: University of California, Los Angeles. http://ftp.cs.ucla.edu/pub/stat_ser/r254.pdf.
Tian, J., & Pearl, J. (2000). Probabilities of causation: Bounds and identification. Annals of Mathematics and Artificial Intelligence, 28, 287–313.
Tian, J., & Pearl, J. (2002). A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (pp. 567–573). Menlo Park, CA: AAAI Press/MIT Press.
VanderWeele, T., & Robins, J. (2007). Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology, 18(5), 561–568.
Woodward, J. (2003). Making Things Happen. New York: Oxford University Press.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.
4 Causal Thinking in Psychiatry
A Genetic and Manipulationist Perspective

Kenneth S. Kendler
A large and daunting philosophical literature exists on the nature and meaning of causality. Add to that the extensive discussions in the statistical literature about what it means to claim that C causes E, and it can be overwhelming for the scientists, who, after all, are typically just seeking guidelines about how to conduct and analyze their research. Add to this mix the inherent problems in psychiatry—which examines an extraordinarily wide array of potential causal processes from molecules to minds and societies, some of which permit experimental manipulation but many of which do not—and you can readily see the sense of frustration and, indeed, futility with which this issue might be addressed. In the first section of this chapter, I reflect on two rather practical aspects of causal inference that I have confronted in my research career in psychiatric genetics. The first of these is what philosophers call a ‘‘brute fact’’ of our world—the unidirectional causal relationship between variation in genomic DNA and phenotype. The second is the co-twin–control method—a nice example of trying to use twins as a ‘‘natural experiment’’ to clarify causal processes when controlled trials are infeasible. In the second section, I briefly outline and advocate for a particular approach to causal inference developed by Jim Woodward (2003) that I term ‘‘interventionism.’’ I argue that this approach is especially well suited to the needs of our unusual field of psychiatry.
Two Practical Aspects of Causal Inference in Psychiatric Genetics Research

I often teach students that it is almost too easy in psychiatric research to show that putative risk factors correlate with outcomes. It is much harder to
determine if that relationship is a causal one. Indeed, assuming that for practical or ethical reasons a randomized trial of exposure to the risk factor is not feasible, one must rely on observational data. In these instances, it can be "damn near impossible" to confidently infer causation. However, in this mire of causal uncertainty, it is interesting to note that one relationship stands out in its causal clarity: the relationship between variation in germline DNA (gDNA) and phenotypes. It did not have to be this way. Indeed, folk wisdom has long considered the inheritance of acquired characteristics (which implies a phenotype → gDNA relationship, interpreted as phenotype causes gDNA) to be a plausible mechanism of heredity. In the eighteenth century, this concept (of the inheritance of acquired characteristics) was most closely associated with the name of Lamarck. In the twentieth century, due to an unfortunate admixture of bad science and repressive politics, this same process came to dominate Soviet biology through the efforts of Lysenko. However, acquired characteristics are not inherited through changes in our DNA sequence. Rather, the form of life of which we are a product evolved in such a manner as to render the sequence of gDNA relatively privileged and protected. Therefore, causal relationships between genes and phenotypes are, of necessity, unidirectional. To put this more crudely, genes can influence phenotypes but phenotypes cannot influence genes. I do not mean to imply that gDNA never varies. It is subject to a range of random features, from point mutations to insertions of transposons and slippage of replication machinery leading to deletions and duplications. However, I am unaware of any widely accepted claims that such changes in gDNA can occur in a systematic and directed fashion so as to establish a true phenotype → gDNA causal pathway. Let me be a bit more precise. Assume that our predictor variable is a measure of genetic variation (GV).
This can be either latent—as it might be if we are studying twins or adoptees—or observed—if we are directly examining variation in genomic sequence (e.g., via single nucleotide polymorphisms). Assume our dependent measure is the liability toward having a particular psychiatric disease, which we will term "risk." We can then assume that GV causes risk:

GV → Risk

Also, we can be certain that risk does not cause GV:

Risk ↛ GV

This claim is specific and limited. It does not apply to other features of our genetic machinery such as gene expression—the product of genes at the level
of either mRNA or protein—and epigenetic modifications of DNA. Our expression levels can be exquisitely sensitive to environmental effects. There is also evidence that environmental factors can alter methylation of DNA. Thus, it is not true that in all of genetics research causal relationships are unambiguous. If, however, we consider only the sequence of gDNA, it is, I argue, a little corner of causal clarity that we should cherish. The practical consequence of these unidirectional causal relationships does not end with the simple bivariate relationship noted between gDNA and phenotype. For example, using structural equation modeling, far more elaborate models can be developed that involve multiple phenotypes, developmental pathways, gene–environment interaction, differences in gene expression by sex or cohort, and gene–environment correlation. Under some situations (e.g., van den Oord & Snieder, 2002), including genetic effects can help to clarify other causal relationships. For example, if two disorders (A and B) are comorbid, the identification of genetic risk factors should help to determine whether this comorbidity results from shared risk factors like genes or a phenotypic pathway in which developing disorder A directly contributes to the risk of developing disorder B. While theoretically clear, it is unfortunately not possible to study such gene to phenotype pathways in the real world without introducing other assumptions. For example, if we are studying a population which contains two subpopulations that differ in frequency of both gene and disease, standard case–control association studies can artifactually produce significant findings in the absence of any true gDNA-to-phenotype relationship. The ability to infer the action of genetic risk factors in twin studies is based on the equal-environment assumption as well as assumptions about assortative mating and the relationship of additive to nonadditive genetic effects. 
The bottom-line message here is a simple one: The world in which we live is often causally ambiguous; this cannot be better demonstrated than in many areas of psychiatric research. Because of the way our life forms have evolved, gDNA is highly protected. Our bodies work very hard to ensure that nothing, including our own behavior or environmental experiences, messes with our gDNA. This quirk in our biology gives us causal purchase that we might not otherwise have. We should take this gift from nature, grasp it hard, and use it for all it is worth.
The Co-Twin–Control Method

One common approach to understanding the causal relationship between a putative risk factor and a disease is to match individuals on as many variables as possible except that one group has been exposed to the putative risk factor
and the other group has not. If the exposed group has a higher rate of disease, then we can argue on this basis that the risk factor truly causes the disease. While intuitively appealing, this common nonexperimental approach—like many in epidemiology—has a key point of vulnerability. While it may be that the risk factor causes the disease, it is also possible that a set of "third variables" predispose to both the risk factor and the disease. Such a case will produce a noncausal risk factor–disease association. This is a particular problem in psychiatric epidemiology because so many exposures of interest—stressful life events, social support, educational status, social class—are themselves complex and the result not only of the environment (with causal effects flowing from environment to person) but also of the actions of human beings themselves (with causal effects flowing from people to their environment) (Kendler & Prescott, 2006). As humans, we actively create our own environments, and this activity is substantially influenced by our genes (Kendler & Baker, 2007). Thus, for behavioral traits, our phenotypes quite literally extend far beyond our skin. Can we use genetically informative designs to get any purchase on these possible confounds? Sometimes. Let me describe one such method—the co-twin–control design. I will first illustrate its potential utility with one example and then describe a critical limitation. A full co-twin–control design involves the comparison of the association between a risk factor and an outcome in three samples: (1) an unselected population, (2) dizygotic (DZ) twin pairs discordant for exposure to the risk factor, and (3) monozygotic (MZ) pairs discordant for exposure to the risk factor (Kendler et al., 1993). Three different possible patterns of results are illustrated in Figure 4.1. The results on the left side of the figure show the
[Figure 4.1 shows bar charts of the odds ratio (OR) for all subjects, discordant DZ pairs, and discordant MZ pairs under three scenarios: all causal; partly non-causal (genetic); and non-causal, all genetic. In the fully non-causal scenario, the MZ-pair OR falls to 1.0.]

Figure 4.1 Interpretation of Results Obtained from Studies Using a Co-twin–Control Design.
pattern that would be expected if the risk factor–outcome association was truly causal (note: this assumes that no other environmental confounding is present). Controlling for family background or genetic factors would, within the limits of statistical fluctuation, make no difference to the estimate of the association. The middle set of results in Figure 4.1 shows an example where part of the risk factor–outcome association is due to genetic factors that influence both the risk factor and the outcome. Here, the association is strongest in the entire sample (where genetic and causal effects are entirely confounded as we control for neither genetic nor shared environmental factors), intermediate among discordant DZ twins (where we control for shared environmental factors and partly for genetic background), and lowest among discordant MZ pairs (where we control entirely for both shared environmental and genetic backgrounds). The degree to which the association declines from the entire sample to discordant MZ pairs is a rough measure of the proportion of the association that is noncausal and the result of shared genetic influences on the risk factor and the outcome. The results on the right side of Figure 4.1 show the extreme case, where all of the risk factor–outcome association is due to shared genetic effects and the risk factor has no real causal effect on the outcome. Thus, within discordant MZ pairs there will be no association between the risk factor and the outcome (i.e., an odds ratio [OR] of 1.0), and the association within discordant DZ pairs would be expected to be roughly midway between 1 and the value for the entire sample. Let me illustrate this method with a striking real-world example taken from research conducted primarily with my long-term colleague, Dr. Carol Prescott (Kendler & Prescott, 2006; Prescott & Kendler, 1999). 
In general population samples, early age at drinking onset (the risk factor of interest) has been consistently associated with an increased risk for developing alcohol-use disorders (Grant & Dawson, 1997). The prevalence of alcohol-use disorders among individuals who first try alcohol before age 15 is as high as 50% in some studies. Several studies reporting this effect interpreted it to be a causal one—that early drinking directly produces an increased risk for later alcohol problems. On the basis of this interpretation, calls have been made to delay the age at first drink among early adolescents as a means of decreasing risk for adult alcohol problems (Pedersen & Skrondal, 1998). This risk factor–outcome relationship, however, need not be a causal one. For example, early drinking could be one manifestation of a broad liability to deviance which might be evident in a host of problem behaviors, such as use of illicit substances, antisocial behavior, and adult alcoholism (Jessor & Jessor, 1977). If this were the case, delaying the first exposure to alcohol use would
not alter the underlying liability to adolescent problem behavior or to adult alcoholism. Using data from the Virginia Adult Twin Study of Psychiatric and Substance Use Disorders, we tested these two hypotheses about why early drinking correlates with alcoholism. The results are depicted in Figure 4.2. As in prior studies, we found a strong association between lifetime prevalence of alcoholism and age at first drink among both males and females (Prescott & Kendler, 1999). As shown in Figure 4.2, males who began drinking before age 15 were twice as likely (OR = 2.0) to develop Diagnostic and Statistical Manual of Mental Disorders (fourth edition) alcohol dependence (AD) as those who did not drink early. The association for females was even more dramatic: Early drinkers were more than four times as likely to develop AD as other women. The data to test causality come from twin pairs who were discordant for early drinking. Under the causal hypothesis, we would expect that the twins with earlier drinking onset would have a higher risk for alcoholism than their later-drinking co-twins and that the same pattern would hold for MZ and DZ pairs. However, if early age at drinking is just an index of general deviance which influences (among other things) risk of developing alcoholism, we would expect that the prevalence would be similar for members of MZ discordant-onset pairs. The ‘‘unexposed’’ twins (with a later onset of drinking) would share their co-twins’ risk for behavioral deviance and, thus, have a higher risk for alcoholism than the pairs in which neither twin drank early. The pattern observed in MZ vs. DZ discordant-onset pairs tells us to what degree familial resemblance for behavioral deviance is due to shared environmental vs. genetic factors. If it is due to shared environmental
[Figure 4.2 shows bar charts of the odds ratio (OR) for all subjects, discordant DZ pairs, and discordant MZ pairs, plotted separately for males and females.]

Figure 4.2 Odds Ratios from Co-twin–Control Analyses of the Association between Drinking Before Age 15 and Alcohol Dependence.
factors, the risk for alcoholism among the unexposed twins from DZ discordant-onset pairs would be expected to be the same as that in the MZ pairs. However, if familial resemblance for deviance is due to genetic factors, the risk for alcoholism in an unexposed individual would be lower among DZ than MZ pairs. As shown in Figure 4.2, the twin pair resemblance was inconsistent with the causal hypothesis. Instead, the results suggested that early drinking and later alcoholism are both the result of a shared genetic liability. For example, among the 213 male and 69 female MZ pairs who were discordant for early drinking, there was only a slight difference in the prevalence of AD between the twins who drank early and the co-twins who did not. The ORs were 1.1 for both sexes and were not statistically different from the 1.0 value predicted by the noncausal model for MZ pairs. The ORs for the DZ pairs were midway between those of the MZ pairs and the general sample, indicating that the source of the familial liability is genetic rather than environmental. I am not claiming that these results are definitive, and they certainly require replication. It is frankly unlikely that early onset of alcohol consumption has no impact on subsequent risk for problem drinking. Surely, however, these results should give pause to those who want to stamp out alcohol problems by restricting the access of adolescents to alcohol and suggest that non-causal processes might explain at least some of the early drinking–later alcoholism relationship. For those interested in other psychiatric applications of the co-twin–control method, our group has applied it to clarify the association between smoking and major depression (Kendler et al., 1993), stressful life events and major depression (Kendler, Karkowski, & Prescott, 1999), and childhood sexual abuse and a range of psychiatric outcomes (Kendler et al., 2000).
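The interpretation pattern just described can be mimicked in a toy simulation (all parameters are invented; this is not the Virginia twin data). A liability shared within each MZ pair raises the probability of both the "exposure" and the "disease," while the exposure itself has no causal effect on the disease. The population-level odds ratio then sits well above 1, but the odds ratio computed within exposure-discordant MZ pairs hovers near 1, reproducing the fully non-causal pattern of Figure 4.1:

```python
import random

random.seed(1)

def mz_pair():
    """One MZ twin pair sharing a liability g; exposure has NO causal effect on disease."""
    g = random.random() < 0.5                        # high-liability pair?
    def twin():
        exposed = random.random() < (0.6 if g else 0.2)
        disease = random.random() < (0.4 if g else 0.1)
        return exposed, disease
    return twin(), twin()

pairs = [mz_pair() for _ in range(100_000)]

def tally(twins):
    """2x2 counts: [exposed-ill, exposed-well, unexposed-ill, unexposed-well]."""
    cells = [0, 0, 0, 0]
    for exposed, disease in twins:
        cells[(0 if exposed else 2) + (0 if disease else 1)] += 1
    return cells

def odds_ratio(cells):
    a, b, c, d = cells
    return (a * d) / (b * c)

# Population-level table: every twin treated as an unrelated subject.
population = tally(t for pair in pairs for t in pair)

# Table restricted to exposure-discordant MZ pairs.
discordant = tally(t for t1, t2 in pairs if t1[0] != t2[0] for t in (t1, t2))

print(f"population OR:         {odds_ratio(population):.2f}")  # well above 1
print(f"discordant-MZ-pair OR: {odds_ratio(discordant):.2f}")  # near 1.0: non-causal
```

Because each twin's disease risk depends only on the shared liability, conditioning on pair membership removes the entire association, which is exactly the signature of a non-causal risk factor in this design.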
Lest you leap to the conclusion that this method is a panacea for our problems of causal inference, I have some bad news. The co-twin–control method is asymmetric with regard to the causal clarity of its results. Studies in which MZ twins discordant for risk-factor exposure have equal rates of the disease can, I think, permit the rather strong inference that the risk factor–disease association is not causal. However, if in MZ twins discordant for risk-factor exposure the exposed twin has a significantly higher risk of illness than the unexposed twin, it is not possible to infer with such confidence that the risk factor–disease association is causal. This is because in the typical design it is not possible to rule out the potential that some unique environmental event not shared with the co-twin produced both the risk factor and the disease. For example, imagine we are studying the relationship between early conduct disorder and later drug dependence. Assume further that we find many MZ twin pairs of the following type: the conduct-disordered twin
(twin A) develops drug dependence, while the nondisordered co-twin (twin B) does not. We might wish to argue that this strongly proves the causal path from conduct disorder to later drug dependence. Alas, it is not so simple. It is perfectly possible that twin A had some prior environmental trauma not shared with twin B (an obstetric complication or a fall off a bicycle with a resulting head injury) that predisposed to both conduct disorder and drug dependence. While the MZ co-twin–control design excludes the possibility that a common set of genes or a class of shared environmental experiences predisposes to both risk factor and outcome, it cannot exclude the possibility that a ‘‘third factor’’ environmental experience unique to each twin plays a confounding role. Genetic strategies can occasionally provide some traction on issues of causation in psychiatric epidemiology that might be otherwise difficult to establish. In addition to the co-twin–control method, genetic randomization is another potentially powerful natural experiment that relies on using genes as instrumental variables (again taking advantage of the causal asymmetry between genotype and phenotype) (Davey-Smith, 2006; also see Chapter 9). Neither of these methods can entirely substitute for controlled trials, but for many interesting questions in psychiatry such an approach is either impractical or unethical. These methods are far from panaceas, but they may be underused by some in our field who are prone to slip too easily from correlative to causal language.
Interventionism as an Approach to Causality Well-Suited for Psychiatry

I have been reading for some years in the philosophy of science (and a bit in metaphysics) about approaches to causation and explanation. For understandable reasons, this literature is often underutilized by psychiatric researchers. I am particularly interested in the question of what general approach to causality is most appropriate for the science of psychiatry, which is itself a hybrid of the biological, psychological, and sociological sciences. First, I would argue that the deductive-nomological approach emerging from the logical positivist movement is poorly suited to psychiatric research. This position, which sees true explanation as deduced from general laws applied to specific situations, may have its applications in physics. However, psychiatry lacks the broad and deep laws that sit at the core of physics. Many, myself included, doubt that psychiatry will ever have laws of the wide applicability of general relativity or quantum mechanics. It is simply not, I suggest, in the nature of our discipline to have such powerful and simple explanations. A further critical limitation of this approach for
psychiatry, much discussed in the literature, is that it does a poor job of discriminating between causation and correlation, which I consider a central problem for our field. The famous example most commonly used here involves flagpoles and shadows. Geometric laws can equally well predict the length of a shadow from the height of a flagpole or the height of the flagpole from the length of the shadow. However, only one of these two relationships is causally sensible. Second, while a mechanistic approach to causation is initially intuitively appealing, it too is ill-suited as a general approach for our field. By a "mechanistic approach," I mean the idea that causation is best understood as the result of some direct physical contact, a spatiotemporal process that involves the transfer of some process or energy from one object to another. One might think of this as the billiard-ball model of causality: that satisfying click we hear when our cue ball knocks against another ball, sending it, we hope, into the designated pocket. How might this idea apply to psychiatric phenomena? Consider the empirical observation that the rate of suicide declined dramatically in England in the weeks after September 11, 2001 (9/11) (Salib, 2003). How would a mechanistic model approach this causal process? It would search for the specific nature of the spatiotemporal processes that connected the events of 9/11 in the United States to people in England. For example, it would have to determine the extent to which information about the events of 9/11 was conveyed to the English population through radio, television, e-mail, word of mouth, and newspapers. Then, it would have to trace the physical processes whereby this news influenced the relevant brain pathways, and so on. I am not a Cartesian dualist, so do not misunderstand me here. I am not suggesting that in some ultimate way physical processes were not needed to explain why the suicide rate declined in England in September 2001.
Instead, I am suggesting that the physical means by which news of 9/11 arrived in England is the wrong level at which to understand this process. Mechanistic models fail for psychiatry for the same reasons that hard reductionist models fail: critical causal processes in psychiatric illnesses exist at multiple levels, only some of which are best understood at a physical–mechanical level. A third approach is the interventionist model (IM), which evolved out of the counterfactual approach to causation. The two perspectives share the fundamental idea that in thinking about causation we are ultimately asking what would have happened if things had been different. While some of the counterfactual literature discusses issues around closest parallel worlds, the IM approach is a good deal more general and can be considered "down to earth." What is the essence of the IM? Consider a simple, idealized case. Suppose we want to determine whether stress (S) increases the risk for
major depression (MD). The "ideal experiment" here would be the unethical one in which, in a given population, we randomly intervene on individuals, exposing them to a stressful experience such as severe public humiliation (H). This experience increases their level of S, and we heartlessly observe whether they subsequently suffer an increased incidence of MD. Our design is

H intervenes on S → MD

Thus, we are assuming that intervention on S will make a difference to risk for MD. For this to work, according to the IM, the intervention must meet several conditions (for more details, see Pearl, 2001; Woodward & Hitchcock, 2003). We will illustrate these with our thought experiment as follows:

1. In individuals who are and are not exposed to our intervention, H must be the only systematic cause of S that is unequally distributed among the exposed and the unexposed (so that all of the averaged differences in level of S in our cohorts of exposed and unexposed subjects result entirely from H).

2. H must not affect the risk for MD by any route that does not go through S (e.g., by causing individuals to stop taking antidepressant medication).

3. H is not itself influenced by any cause that affects MD via a route that does not go through S, as might occur if individuals prone to depression were more likely to be selected for H.

In sum, the IM says that questions about whether X causes Y are questions about what would happen to Y if there were an intervention on X. One great virtue of the IM is that it allows psychiatrists freedom to use whatever family of variables seems appropriate to the characterization of a particular problem. There is no assumption that the variables have to be capable of figuring in quite general laws of nature, as with the deductive-nomological approach, or that the variables have to relate to basic spatiotemporal processes, as with the mechanistic approach.
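These conditions can be illustrated with a toy simulation (all parameters invented for this sketch). When an unmeasured vulnerability U drives both S and MD, the observational S–MD contrast lacks what the conditions above guarantee for a true intervention, and it overstates the causal effect; randomly "twiddling" S, as the idealized experiment does, recovers it:

```python
import random

random.seed(0)

def simulate(n=100_000, intervene=False):
    """Toy structural model (all parameters invented): a latent
    vulnerability U raises both stress S and major depression MD.
    Under observation, S is partly driven by U, so the S-MD contrast
    is confounded; under an idealized randomized intervention, S is
    assigned by coin flip, cutting the back-door path through U.
    The true effect of S on MD risk is +0.10."""
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        u = random.random() < 0.3                      # latent vulnerability
        if intervene:
            s = random.random() < 0.5                  # randomized "twiddle" on S
        else:
            s = random.random() < (0.7 if u else 0.2)  # U confounds S
        md = random.random() < 0.05 + 0.10 * s + 0.20 * u
        cases[s] += md
        totals[s] += 1
    return cases[True] / totals[True] - cases[False] / totals[False]

obs = simulate(intervene=False)  # inflated well above 0.10 by confounding
rct = simulate(intervene=True)   # close to the true +0.10
print(round(obs, 3), round(rct, 3))
```

With these made-up parameters the observed risk difference is roughly double the true causal effect, which is exactly the correlation-versus-causation gap the IM conditions are designed to close.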
The fact is that current evidence points to causal roles for variables of many different types, and the interventionist approach allows us to make explicit just what those roles are. For all that, there is a sense in which the approach is completely rigorous: it is particularly unforgiving in assuring that causation is distinguished from correlation. Though our exposition here is highly informal, we are providing an intuitive introduction to ideas whose formal development has been
vigorously pursued by others (e.g., Spirtes, Glymour, & Scheines, 1993; Pearl, 2001; Woodward, 2003). If I were to put the essence of the IM of causality into a verbal description, it would be as follows: I know C to be a true cause of E if I can go into the world with its complex web of causal interrelationships, hold all these background relationships constant, and make a "surgical" intervention (or "twiddle") on C. If E changes, then I know C causes E. I see the nonreductive nature of the IM as a critical strength for psychiatry. Unlike the mechanistic model, it makes no a priori judgment about the level of abstraction at which the causal processes are best understood. The IM requires only that, at whatever level it is conceived, the cause makes a difference in the world. This is so important that it deserves repeating: the IM provides a single, clear empirical framework for the evaluation of all causal claims of relevance to psychiatry, from molecules to neural systems to psychological constructs to societies.
The IM and Mechanisms

Before closing, two points about the possible relationship between the IM and mechanistic causal models are in order. First, it is in the nature of science to want to move from findings of causality to a clarification of the mechanisms involved, whether they are social, psychological, or molecular. The IM can play a role in this process by helping scientists to focus on the level at which causal mechanisms are most likely to be operative. However, a word of caution is in order. Given the extraordinary complexity of most psychiatric disorders, causal effects (and the mechanisms that underlie them) may be occurring on several levels. For example, the fact that cognitive behavioral therapy works for MD, and that psychological mechanisms are surely the level at which this process is currently best understood, does not mean that neurochemical interventions on MD (via pharmacology) cannot also work. Conversely, although pharmacological tools can have an impact on symptoms of eating disorders, cultural models of female beauty, although operating at a very different level, can also have an impact on risk. Second, we should briefly ponder the following weighty question: Should the plausibility of a causal mechanism influence our interpretation of interventionist results? Purists will say "No!" If we design the right study and the results are clear, then causal imputations follow. Pragmatists, whose position is well
represented by the influential criteria of Hill (1965), will disagree. The conversation would go something like this:

Pragmatist: Surely you cannot be serious! Do you mean that if you find evidence for the efficacy of astrology or ESP, my interpretation of these results should not be influenced by the fact that we have no bloody idea of how such processes could work in the world as we understand it?

Purist: I am entirely serious. Your comments about astrology and ESP clearly illustrate the problem. You have said that you are quite willing to impose your preconceptions of how you think the world should work on your interpretation of the data. The whole point of science is to get away from our biases, not embrace them. This is especially important in psychiatry, where our biases are often strong and our knowledge of how the world really works is typically nonexistent or at best meager.

Personally, I am a bit on the pragmatist's side, but the purists have a point well worth remembering.
Summary of the IM

To summarize, the IM is attractive for psychiatry for four major reasons (Kendler & Campbell, 2009). First, the IM is anchored in the practical and reflects the fundamental goals of psychiatry, which are to intervene in the world to prevent and cure psychiatric disorders. Second, the IM provides a single, clear empirical framework for the evaluation of all causal claims in psychiatry; it provides a way by which different theoretical orientations within psychiatry can be judged by a common metric. Third, the framework provided by the IM can help us to find the optimal level for explanation and, ultimately, for intervention. Finally, the IM is explicitly agnostic on the mind–body problem. Its application can help us replace the sterile metaphysical arguments about mind and brain, which have yielded little of practical benefit, with productive empirical research followed by rigorous conceptual and statistical analysis.
Acknowledgements

This work was supported in part by grant DA-011287 from the US National Institutes of Health. Much of my thinking in this area has been stimulated
by and developed in collaboration with John Campbell, a philosopher now at UC Berkeley (Kendler & Campbell, 2009).
References

Davey-Smith, G. (2006). Randomized by (your) god: Robust inference from a nonexperimental study design. Journal of Epidemiology and Community Health, 60, 382–388.
Grant, B. F., & Dawson, D. A. (1997). Age at onset of alcohol use and its association with DSM-IV alcohol abuse and dependence: Results from the National Longitudinal Alcohol Epidemiologic Survey. Journal of Substance Abuse, 9, 103–110.
Hill, A. B. (1965). The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine, 58, 295–300.
Jessor, R., & Jessor, S. L. (1977). Problem behavior and psychosocial development: A longitudinal study of youth. New York: Academic Press.
Kendler, K. S., & Baker, J. H. (2007). Genetic influences on measures of the environment: A systematic review. Psychological Medicine, 37, 615–626.
Kendler, K. S., Bulik, C. M., Silberg, J., Hettema, J. M., Myers, J., & Prescott, C. A. (2000). Childhood sexual abuse and adult psychiatric and substance use disorders in women: An epidemiological and cotwin control analysis. Archives of General Psychiatry, 57, 953–959.
Kendler, K. S., & Campbell, J. (2009). Interventionist causal models in psychiatry: Repositioning the mind–body problem. Psychological Medicine, 39, 881–887.
Kendler, K. S., Karkowski, L. M., & Prescott, C. A. (1999). Causal relationship between stressful life events and the onset of major depression. American Journal of Psychiatry, 156, 837–841.
Kendler, K. S., Neale, M. C., MacLean, C. J., Heath, A. C., Eaves, L. J., & Kessler, R. C. (1993). Smoking and major depression: A causal analysis. Archives of General Psychiatry, 50, 36–43.
Kendler, K. S., & Prescott, C. A. (2006). Genes, environment, and psychopathology: Understanding the causes of psychiatric and substance use disorders. New York: Guilford Press.
Pearl, J. (2001). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.
Pedersen, W., & Skrondal, A. (1998). Alcohol consumption debut: Predictors and consequences. Journal of Studies on Alcohol, 59, 32–42.
Prescott, C. A., & Kendler, K. S. (1999). Age at first drink and risk for alcoholism: A noncausal association. Alcoholism, Clinical and Experimental Research, 23, 101–107.
Salib, E. (2003). Effect of 11 September 2001 on suicide and homicide in England and Wales. British Journal of Psychiatry, 183, 207–212.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction and search. New York: Springer-Verlag.
van den Oord, E. J., & Snieder, H. (2002). Including measured genotypes in statistical models to study the interplay of multiple factors affecting complex traits. Behavior Genetics, 32, 1–22.
Woodward, J. (2003). Making things happen. New York: Oxford University Press.
Woodward, J., & Hitchcock, C. (2003). Explanatory generalizations, part I: A counterfactual account. Nous, 37, 1–24.
5

Understanding the Effects of Menopausal Hormone Therapy: Using the Women's Health Initiative Randomized Trials and Observational Study to Improve Inference

Garnet L. Anderson and Ross L. Prentice
Introduction

Over the last decade, several large-scale randomized trials have reported results that disagreed substantially with the motivating observational studies on the value of various chronic disease–prevention strategies. One high-profile example of these discrepancies concerned postmenopausal hormone therapy (HT) use and its effects on cardiovascular disease and cancer. The Women's Health Initiative (WHI), a National Heart, Lung, and Blood Institute–sponsored program, was designed to test three interventions for the prevention of chronic diseases in postmenopausal women, each of which was motivated by a decade or more of analytic epidemiology. Specifically, the trials were testing the potential for HT to prevent coronary heart disease (CHD), a low-fat eating pattern to reduce breast and colorectal cancer incidence, and calcium and vitamin D supplements to prevent hip fractures. Over 68,000 postmenopausal women were randomized to one, two, or all three randomized clinical trial (CT) components between 1993 and 1998 at 40 U.S. clinical centers (Anderson et al., 2003a). The HT component consisted of two parallel trials testing the effects, on the incidence of CHD and overall health, of conjugated equine estrogens alone (E-alone) among women with prior hysterectomy and of combined estrogen plus progestin therapy (E+P), in this case conjugated equine estrogens plus medroxyprogesterone acetate, among women with an intact uterus. In 2002, the randomized trial of E+P was stopped early, based on an assessment of risks exceeding benefits for chronic disease prevention, raising concerns among millions of menopausal women and their care providers about
their use of these medicines. The trial confirmed the benefit of HT for fracture-risk reduction, but the expected benefit for CHD, the primary study end point, was not observed. Rather, the trial results documented increased risks of CHD, stroke, venous thromboembolism (VTE), and breast cancer with combined hormones (Writing Group for the Women's Health Initiative Investigators, 2002). Approximately 18 months later, the E-alone trial was also stopped, based on the finding of an adverse effect on stroke rates and the likelihood that the study would not confirm the CHD-prevention hypothesis. The results of this trial revealed a profile of risks and benefits that did not completely coincide with either the E+P trial results or previous findings from observational studies (Women's Health Initiative Steering Committee, 2004). In conjunction with these trials, the WHI investigators conducted a parallel observational study (OS) of 93,676 women recruited from the same population sources with similar data-collection protocols and follow-up. OS enrollees were similar in many demographic and chronic disease risk factor characteristics but were ineligible for or unwilling to be randomized into the CT (Hays et al., 2003). Because a substantial fraction of women in the OS were current or former users of menopausal HT, joint analyses of the effects of HT use in the CT and OS provide an opportunity to compare and contrast these two study designs, to identify some of the strengths and weaknesses of each, and to determine the extent to which detailed data analysis could bring the results into agreement and thereby explain the discrepancies between randomized trials and observational studies. This chapter reviews the motivation for the hormone trials and describes the major findings for chronic disease effects, with particular attention to the results that differed from what was hypothesized.
Then the series of joint analyses of the CT and the corresponding OS subsets is presented. Finally, the implications of these analyses for the design and analysis of future studies are discussed.
Hormone Therapy Trial Background

Since the 1940s, women have been offered exogenous estrogens to relieve menopausal symptoms. The use of unopposed estrogens grew until evidence of an increased risk of endometrial cancer arose in the 1970s and tempered enthusiasm for these medicines, at least for the majority of women, who had not had a hysterectomy. With the subsequent information that progestin effectively countered the carcinogenic effects of estrogen in the endometrium, HT prescriptions again climbed (Wysowski, Golden, & Burke, 1995). Observational studies found that use of HT was associated with lower risks of osteoporosis and fractures; subsequently, the U.S. Food and Drug
Administration approved HT for the treatment and prevention of osteoporosis, leading to still further increases in the prevalence and duration of hormone use (Hersh, Stefanick, & Stafford, 2004). The pervasiveness of this exposure permitted a large number of observational studies, of both case–control and prospective cohort designs, to examine the relationship between hormone use and a wide range of diseases among postmenopausal women. Most of the more than 30 observational studies available at the initiation of the WHI reported substantial reductions in rates of CHD, the major cause of morbidity and mortality among postmenopausal women (Bush et al., 1987; Stampfer & Colditz, 1991; Grady et al., 1992). Support for the estrogen and heart disease hypothesis, which originated partly from the male–female differences in CHD rates and the marked increase in CHD rates after menopause, was further buttressed by mechanistic studies showing beneficial effects of HT on blood lipid profiles and vascular motility in animal models (Pick, Stamler, Robard, & Katz, 1952; Hough & Zilversmit, 1986; Adams et al., 1990; Clarkson, Anthony, & Klein, 1996). The estimated effects were substantial, ranging from 30% to 70% reductions, prompting considerable public-health interest in HT as a preventive intervention for CHD in postmenopausal women. One barrier to more widespread use was reports of adverse effects, most notably breast cancer. Numerous observational studies had reported a modest increase in breast-cancer risk with longer-term exposure to estrogen (Steinberg et al., 1991; Barrett-Conner & Grady, 1998). The effect of adding progestin, however, was not clear. Evidence of increased risks of VTE and biliary disease existed, but possible reductions in risk of colorectal cancer, stroke, mortality, dementia, and many other conditions associated with aging, in addition to menopausal symptom control, suggested that HT was beneficial overall for menopausal women.
The overall effects, while still imprecisely estimated, suggested important benefits for prevention of chronic disease (Grady et al., 1992). Increasingly, postmenopausal women were encouraged to use HT to reduce their risks of osteoporosis, fractures, and CHD; in fact, prescriptions reached approximately 90 million per year in the United States alone (Hersh et al., 2004). The positive view of HT was so widely held that the initiation of a long-term, placebo-controlled trial in the WHI was considered highly controversial (Food and Nutrition Board and Board on Health Sciences Policy, 1993).
WHI Trial Design

In 1993, in this environment of considerable optimism regarding an overall benefit to postmenopausal women, the WHI HT trials were launched.
The final design specified two parallel randomized, double-blind, placebo-controlled trials testing E-alone in women with prior hysterectomy and E+P in women with an intact uterus. The primary objective of each trial was to determine whether HT would reduce the incidence of CHD and provide overall health benefit with respect to chronic disease rates. Postmenopausal women aged 50–79 years were eligible if they were free of any condition with expected survival of less than 3 years and satisfied other criteria related to ability to adhere to a randomized assignment, safety, and competing-risk considerations (Women's Health Initiative Study Group, 1998). A total of 10,739 women were recruited into the trial of E-alone and 16,608 into the E+P trial. Although observational studies suggested that about a 45% reduction in CHD risk could be achieved with HT, the trial design assumed a 21% reduction, with 81% and 88% power for E-alone and E+P, respectively. The conservatism in the specified effect size was intended to account for anticipated lack of adherence to study pills, lag time to full intervention effect, loss to follow-up in the trial, and potential anticonservatism in the results of the motivating observational studies (Anderson et al., 2003a). Breast-cancer incidence was defined as the primary safety outcome of the HT trials. The power to detect a 22% increase in breast cancer during the planned duration of the trial was relatively low (46% for E-alone and 54% for E+P), so the protocol indicated that an additional 5 years of follow-up without further intervention would be required to assure 79% and 87% power for E-alone and E+P, respectively. Pooling the two trials was also an option, if the results were sufficiently similar. Additional design considerations have been published (Women's Health Initiative Study Group, 1998; Anderson et al., 2003a).
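Power arithmetic of this general kind can be sketched with Schoenfeld's normal approximation for the log-rank test, in which the log hazard ratio estimate has standard error roughly 1/sqrt(D·p·(1−p)) for D total events split with allocation fraction p. The event counts below are invented for illustration only; they are not the WHI protocol calculations.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logrank_power(hazard_ratio, n_events, alloc=0.5, z_alpha=1.959964):
    """Approximate power of a two-sided log-rank test (Schoenfeld's
    formula): the log hazard ratio has standard error
    1 / sqrt(n_events * alloc * (1 - alloc))."""
    effect = abs(math.log(hazard_ratio)) * math.sqrt(n_events * alloc * (1 - alloc))
    return normal_cdf(effect - z_alpha)

# Illustrative only: invented event counts for a hypothesized 21% hazard
# reduction (HR 0.79), showing how power grows with accrued events.
for d in (200, 400, 600):
    print(d, round(logrank_power(0.79, d), 2))
```

The calculation makes concrete why the design assumed a conservative effect size: power depends on the number of events actually observed, which nonadherence, lag to full effect, and loss to follow-up all erode.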
Trial Findings

The independent Data and Safety Monitoring Board terminated the E+P trial after a mean 5.2 years of follow-up, when the breast-cancer statistic exceeded the monitoring boundary defined for establishing this adverse effect, a finding supported by an overall assessment of harms exceeding benefits for the designated outcomes. Reductions in hip-fracture and colorectal-cancer incidence rates were observed, but these were outweighed by increases in the risk of CHD, stroke, and VTE, particularly in the early follow-up period, in addition to the adverse effect on breast cancer. A prespecified global index, devised to assist in benefit-versus-risk monitoring and defined for each woman as time to the first event for any of the designated clinical events
(CHD, stroke, pulmonary embolism, breast cancer, colorectal cancer, endometrial cancer, hip fracture, or death from other causes), showed a 15% increase in the risk of women having one or more of these events (Writing Group for the Women's Health Initiative Investigators, 2002). The final "intention-to-treat" trial results (mean follow-up 5.6 years, Table 5.1) confirm the interim findings. The 24% increase in breast-cancer incidence (Chlebowski et al., 2003), the 31% increase in risk of stroke
Table 5.1 Hypothesized Effects of HT at the Time the WHI Began and the Final Results of the Two HT Trials

                          Hypothesized    E+P                            E-Alone
Outcome                   effect          HR      95% CI      AR        HR      95% CI      AR
Coronary heart disease    ↓               1.24a   1.00–1.54   +6        0.95b   0.79–1.15   –3
Stroke                    ↓               1.31c   1.02–1.68   +8        1.37d   1.09–1.73   +12
Pulmonary embolism        ↑               2.13e   1.45–3.11   +10       1.37f   0.90–2.07   +4
Venous thromboembolism    ↑               2.06e   1.57–2.70   +18       1.32f   0.99–1.75   +8
Breast cancer             ↑               1.24g   1.02–1.50   +8        0.80h   0.62–1.04   –6
Colorectal cancer         ↓               0.56i   0.38–0.81   –7        1.12j   0.77–1.55   +1
Endometrial cancer                        0.81k   0.48–1.36   –1        NA      NA          NA
Hip fractures             ↓               0.67l   0.47–0.96   –5        0.65m   0.45–0.94   –7
Total fractures           ↓               0.76l   0.69–0.83   –47       0.71m   0.64–0.80   –53
Total mortality           ↓               0.98n   0.82–1.18   –1        1.04o   0.88–1.22   +3
Global index p            ↓               1.15n   1.03–1.28   +19       1.01o   0.91–1.12   +2

HR, hazard ratio; CI, confidence interval; AR, attributable risk (events per 10,000 person-years).
a From Manson et al. (2003). b From Hsia et al. (2006). c From Wassertheil-Smoller et al. (2003). d From Hendrix et al. (2006). e From Cushman et al. (2004). f From Curb et al. (2006). g From Chlebowski et al. (2003). h From Stefanick et al. (2006). i From Chlebowski et al. (2004). j From Ritenbaugh et al. (2008). k From Anderson et al. (2003b). l From Cauley et al. (2003). m From Jackson et al. (2006). n From Writing Group for the Women's Health Initiative Investigators (2002). o From Women's Health Initiative Steering Committee (2004). p Global index defined as time to first event among coronary heart disease, stroke, pulmonary embolism, breast cancer, colorectal cancer, endometrial cancer (E+P only), hip fractures, and death from other causes.
(Wassertheil-Smoller et al., 2003), and the doubling of VTE rates (Cushman et al., 2004) in the E+P group represented attributable risks of 8, 8, and 18 per 10,000 person-years, respectively, in this population. Benefits of seven fewer colorectal cancers (44% reduction) (Chlebowski et al., 2004) and five fewer hip fractures (33% reduction) (Cauley et al., 2003) per 10,000 person-years were also reported. It was the observed 24% increase in CHD risk, or six additional events per 10,000 person-years (Manson et al., 2003), however, that was the most surprising and perhaps the most difficult finding to accept. Neither the usual 95% confidence intervals nor the protocol-defined weighted log-rank statistic indicates that this is clearly statistically significant. Nevertheless, even the very conservative adjusted confidence intervals, which controlled for the multiple testing, ruled out the level of protection described by the previous observational studies as well as the conservative projection for CHD benefit used in the trial design (Anderson et al., 2007). The results of the E-alone trial, stopped by the National Institutes of Health approximately 18 months later, provided a different profile of risks and benefits (Women's Health Initiative Steering Committee, 2004). The final results (Table 5.1), based on an average of 7.1 years of follow-up, reveal an increased risk of stroke with E-alone of similar magnitude to that observed with E+P (Hendrix et al., 2006) but no effect on CHD rates (Hsia et al., 2006). E-alone appeared to increase the risk of VTE events (Curb et al., 2006) but to a lesser extent than was observed with E+P. The E-alone hazard ratios for hip, vertebral, and other fractures were comparable to those for E+P (Jackson et al., 2006).
Most surprising of the E-alone findings was the estimated 23% reduction in breast-cancer rates, which narrowly missed statistical significance (Stefanick et al., 2006), in contrast to the increased risk seen in a large number of observational studies and in the E+P trial. E-alone had no effect on colorectal-cancer risk (Ritenbaugh et al., 2008), another finding that differed from previous studies and the E+P trial. The hazard ratios for total mortality and the global index were close to one, indicating an overall balance in the number of women randomized to E-alone or to placebo who died or experienced one or more of the designated health outcomes (Women's Health Initiative Steering Committee, 2004).
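The attributable risks quoted in this section follow from the hazard ratios by simple arithmetic: the excess event rate is approximately the comparison-group rate multiplied by (HR − 1). The placebo-group rates below are back-calculated from the reported figures purely for illustration; they are not WHI-reported quantities.

```python
def attributable_risk(hr, comparison_rate_per_10k):
    """Excess events per 10,000 person-years implied by a hazard ratio,
    treating the hazard ratio as an approximate rate ratio."""
    return comparison_rate_per_10k * (hr - 1)

# Back-calculated comparison-group rates (illustration only): a CHD rate
# near 25 per 10,000 person-years with HR 1.24 reproduces the ~6 excess
# events quoted in the text; a VTE component rate near 9 with HR 2.13
# gives roughly the +10 reported for pulmonary embolism.
print(round(attributable_risk(1.24, 25), 1))  # 6.0
print(round(attributable_risk(2.13, 9), 1))   # 10.2
```

The same arithmetic explains why a modest hazard ratio on a common outcome (CHD) and a large hazard ratio on a rarer one (VTE) can translate into absolute excesses of similar public-health importance.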
Contrasting the WHI CT and OS

To better understand the divergent findings and, if possible, to bring the two types of studies into agreement, WHI investigators conducted a series of analyses examining cardiovascular outcomes in the CT and OS data jointly
(Prentice et al., 2005, 2006). The parallel recruitment and follow-up procedures in the OS and CT components of the WHI make this a particularly interesting exercise, since differences in data sources and collection protocols are minimized. For both E+P and E-alone, analogous user and nonuser groups from the OS were selected. Specifically, for the E+P analyses, OS women with a uterus who were using an estrogen plus progestin combination or were not using any HT at baseline were defined as the exposed (n = 17,503) and unexposed (n = 35,551) groups, respectively (Prentice et al., 2005). Similarly, for the E-alone analyses, the exposed and unexposed groups comprised 21,902 estrogen users and 21,902 nonusers of HT among OS participants who reported a prior hysterectomy at baseline (Prentice et al., 2006). Failure times were defined as time since study enrollment (OS) or randomization (CT). In the CT, follow-up was censored at the time each intervention was stopped. In the OS, censoring was applied at a time chosen to give a similar average follow-up time (5.5 years for OS/E+P and 7.1 years for OS/E-alone). For CT participants, HT exposure was defined by randomization, and analyses were based on the intention-to-treat principle. In parallel, OS participants' HT exposure was defined by HT use at the time of study enrollment. In OS women, the ratio of age-adjusted event rates in E+P users to that in nonusers was less than one for CHD (0.71) and stroke (0.77) and close to one for VTE (1.06), but each was 40%–50% lower than the corresponding statistic from the randomized trial (Table 5.2, upper panel) and therefore similar to the motivating observational studies. For E-alone, the corresponding ratios were all less than one (0.68 for CHD, 0.95 for stroke, and 0.78 for VTE) and 30%–40% lower than the CT estimates (Table 5.2, lower panel).
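An age-adjusted rate ratio of the kind just described can be computed with the Mantel-Haenszel estimator for person-time data, which compares users to nonusers within age strata before pooling. The strata below are entirely hypothetical, invented only to illustrate the calculation:

```python
def mh_rate_ratio(strata):
    """Mantel-Haenszel rate ratio for person-time data. Each stratum is
    (cases_exposed, persontime_exposed, cases_unexposed, persontime_unexposed);
    stratifying by age means users and nonusers are compared within age
    groups before pooling, removing confounding by age."""
    num = den = 0.0
    for a, t1, b, t0 in strata:
        t = t1 + t0
        num += a * t0 / t
        den += b * t1 / t
    return num / den

# Entirely hypothetical age strata (all cases and person-years invented):
strata = [
    (12, 40_000, 30, 60_000),  # ages 50-59: HT users vs. nonusers
    (25, 30_000, 80, 70_000),  # ages 60-69
    (20, 10_000, 90, 40_000),  # ages 70-79
]
print(round(mh_rate_ratio(strata), 2))  # 0.75
```

Because HT users in a cohort tend to be younger and healthier than nonusers, the crude and age-adjusted ratios can differ markedly; adjustment of this kind is the first, but as the next paragraphs show, not the last, step in reconciling the OS and CT estimates.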
The cardiovascular risk profile (race/ethnicity, education, income, body mass index [BMI], physical activity, current smoking status, history of cardiovascular disease, and quality of life) among E+P users in the OS was somewhat better than that for OS nonusers (examples are shown in Figure 5.1). The distribution of these risk factors in the CT was balanced across treatment arms but resembled that of the OS nonuser population more than that of the corresponding HT user group. A similar pattern of healthy-user bias was observed for E-alone among OS participants. Aspects of HT exposure also varied between the CT and OS. Among HT users in the OS, the prevalence of long-term use, defined here as the pre-enrollment exposure duration for the HT regimen reported at baseline, was considerably higher than in the CT (Figure 5.2), and few OS users were recent initiators of HT. In the CT, most participants had never used HT before or had used it only briefly. In terms of both duration and recency of each regimen, the distributions in the CT more closely resembled those of the OS nonusers (Prentice et al., 2005, 2006).
Causality and Psychopathology
Table 5.2 Hormone Therapy Hazard Ratios (95% Confidence Intervals) for CHD, Stroke, and VTE Estimated Separately in the WHI CT and OS and Jointly with a Ratio Measure of Agreement (OS/CT) Between the Two Study Components

[Table body not reproduced: rows give estrogen plus progestin (and E-alone) hazard ratios that are age-adjusted, multivariate adjusted, and stratified by time since initiation, for the CT and OS.]
Figure 5.3 HT hazard ratios in the Observational Study based on a simple multivariate model (OS), with adjustment for the OS/CT hazard ratio estimated from the alternate trial (Adjustment 1), and assuming proportional hazards for E-alone to E+P (Adjustment 2), compared to the corresponding Clinical Trial hazard ratios (CT). Derived from Prentice et al., 2005, 2006.
5 Understanding the Effects of Menopausal Hormone Therapy
[Figure 5.3 (cont'd), panel data not reproduced: E-alone hazard ratios for CHD, stroke, and VTE, plotted by time since initiation (<2 years, 2–5 years, >5 years) for the OS, Adjustment 1, Adjustment 2, and CT estimates.]
OS/CT analyses suggest a reduced risk of CHD among these younger women with prior hysterectomy.
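The first of the adjustments compared in Figure 5.3 can be sketched as follows. "Adjustment 1" divides out, on the log scale, the OS/CT discrepancy estimated from the alternate trial; the hazard ratios below are invented placeholders, not the published WHI estimates, and the real analyses worked within time-since-initiation strata and carried confidence intervals.

```python
# Sketch of "Adjustment 1" from Figure 5.3: calibrate the OS hazard
# ratio for one regimen using the OS/CT discrepancy estimated for the
# *other* regimen. All hazard ratios below are invented placeholders.

import math

def adjustment_1(hr_os_target, hr_os_alt, hr_ct_alt):
    """Divide out the alternate trial's OS/CT ratio on the log scale."""
    log_bias = math.log(hr_os_alt) - math.log(hr_ct_alt)
    return math.exp(math.log(hr_os_target) - log_bias)

# Invented example: E-alone CHD hazard ratios within one
# time-since-initiation stratum.
hr_os_ealone = 0.70   # OS estimate for E-alone (placeholder)
hr_os_ep = 0.80       # OS estimate for E+P (placeholder)
hr_ct_ep = 1.20       # CT estimate for E+P (placeholder)

adjusted = adjustment_1(hr_os_ealone, hr_os_ep, hr_ct_ep)
print(f"Adjustment-1 E-alone HR: {adjusted:.2f}")  # pulled toward the CT
```

Because 0.80/1.20 < 1 here, the adjustment pulls the OS estimate upward, mimicking the direction of correction seen in the figure.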
Discussion

The stark contrasts between the results from a large number of observational studies and the WHI randomized trials of menopausal HT provide impetus for reflection on the role of observational studies in evaluating therapies.
Despite the usual effort to control for potential confounders in most previous observational studies, the replication of findings of CHD benefit and breast-cancer risk with HT across different study populations and study designs, and support from mechanistic studies, clinically relevant aspects of the relationship between HT and risk for several chronic diseases were not appreciated until the WHI randomized trial results were published. The reliance on lower-level evidence may have exposed millions of women to small increases in risk of several serious adverse effects. Randomized trials have their own limitations. In this example, the WHI HT trials tested two specific regimens in a population considered appropriate for CHD prevention. As many have claimed, the trial design did not fully reflect the way HT had been used in practice—prescribed near the time of menopause, with possible tailoring of regimen to the individual. Also, while the WHI tested HT in the largest number of women in the 50–59 year age range ever studied, using the same agents and dosages used by the vast majority of U.S. women, estimates of HT effects within this subgroup remain imprecise because of the very low event rate. This example raises many questions with regard to the public-health research enterprise. When is it reasonable to rely on second-tier evidence to test a hypothesis? Are there better methods to test these hypotheses? Can we learn more from our trials, and can we use this to make observational studies more reliable? There are insufficient resources to conduct full-scale randomized trials of the numerous hypotheses of interest in public health and clinical medicine. Observational studies will remain a mainstay of our research portfolio, but methods to increase the reliability of observational study results, through better designs and analytic tools, are clearly needed.
Nevertheless, when assessing an intervention of public-health significance, the WHI experience suggests that the evaluation needs to be anchored in a randomized trial. It seems highly unlikely that the importance of the time-dependent effect of HT on cardiovascular disease would have been recognized without the Heart Estrogen–Progestin Replacement Study (Hulley et al., 1998) and the WHI randomized trials. Neither observational studies conducted before WHI nor the WHI OS itself would have observed these early adverse cardiovascular disease effects without the direction from the trials to look for them. The statistical alignment of the OS and CT results relied on several other factors. Detailed information on the history of HT use, an extensive database of potential confounders, and meticulous modeling of these factors were critical. For an exposure that is more complex, such as dietary intake or physical activity, the measurement problems are likely too great to permit such an approach. Less obvious but probably at least as important was the
uncommon feature of WHI in having both the randomized trials and an OS conducted in parallel, minimizing methodologic differences in outcome ascertainment, data collection, and some aspects of the study population. Such a study design has rarely been used but could be particularly advantageous if there are multiple related therapies already in use but the resources are available to test only one in a full-scale design. The work of Prentice and colleagues (2005, 2006, 2008a, 2008b, 2009) provides important examples of methods to leverage the information from clinical trials in the presence of a parallel OS. The exercises in which adjustments derived by the joint OS and CT analysis of one HT were applied to OS results for a related therapy suggest that it may be possible to evaluate one intervention in a rigorous trial setting and expand the inference to similar interventions in OS data. The joint analyses of CT and OS data to strengthen subgroup analyses would have almost universal appeal. Additional effort to define the requirements and assumptions in these designs and analyses would be helpful. In summary, the WHI provides an important example of the weakness of observational study data, some limitations of randomized trials, and an approach to combining the two to produce more reliable inference.
References Adams, M. R., Kaplan, J. R., Manuck, S. B., Koritinik, D. R., Parks, J. S., Wolfe, M. S., et al. (1990). Inhibition of coronary artery atherosclerosis by 17-b estradiol in ovariectomized monkeys: Lack of an effect of added progesterone. Arteriosclerosis, 10, 1051–1057. Anderson, G. L., Manson, J. E., Wallace, R., Lund, B., Hall, D., Davis, S., et al. (2003a). Implementation of the WHI design. Annals of Epidemiology, 13, S5–S17. Anderson, G. L., Judd, H. L., Kaunitz, A. M., Barad, D. H., Beresford, S. A. A., Pettinger, M., et al. (2003b). Effects of estrogen plus progestin on gynecologic cancers and associated diagnostic procedures: The Women’s Health Initiative randomized trial. Journal of the American Medical Association, 290(13), 1739–1748. Anderson, G. L., Kooperberg, C., Geller, N., Rossouw, J. E., Pettinger, M., & Prentice, R. L. (2007). Monitoring and reporting of the Women’s Health Initiative randomized hormone therapy trials. Clinical Trials, 4, 207–217. Barrett-Connor, E., & Grady, D. (1998). Hormone replacement therapy, heart disease and other considerations. Annual Review of Public Health, 19, 55–72. Bush, T. L., Barrett-Connor, E., Cowan, L. D., Criqui, M. H., Wallace, R. B., Suchindran, C. M., et al. (1987). Cardiovascular mortality and noncontraceptive use of estrogen in women: Results from the Lipid Research Clinics Program Follow-up Study. Circulation, 75, 1102–1109. Cauley, J. A., Robbins, J., Chen, Z., Cummings, S. R., Jackson, R. D., LaCroix, A. Z., et al. (2003). Effects of estrogen plus progestin on risk of fracture and bone mineral density: The Women’s Health Initiative randomized trial. Journal of the American Medical Association, 290, 1729–1738.
Chlebowski, R. T., Hendrix, S. L., Langer, R. D., Stefanick, M. L., Gass, M., Lane, D., et al. (2003). Influence of estrogen plus progestin on breast cancer and mammography in healthy postmenopausal women: The Women’s Health Initiative randomized trial. Journal of the American Medical Association, 289, 3243–3253. Chlebowski, R. T., Wactawski-Wende, J., Ritenbaugh, C., Hubbell, F. A., Ascensao, J., Rodabough, R. J., et al. (2004). Estrogen plus progestin and colorectal cancer in postmenopausal women. New England Journal of Medicine, 350, 991–1004. Clarkson, T. B., Anthony, M. S., & Klein, K. P. (1996). Hormone replacement therapy and coronary artery atherosclerosis: The monkey model. British Journal of Obstetrics and Gynaecology, 103(Suppl. 13), 53–58. Curb, J. D., Prentice, R. L., Bray, P. F., Langer, R. D., Van Horn, L., Barnabei, V. M., et al. (2006). Venous thrombosis and conjugated equine estrogen in women without a uterus. Archives of Internal Medicine, 166, 772–780. Cushman, M., Kuller, L. H., Prentice, R., Rodabough, R. J., Psaty, B. M., Stafford, R. S., et al. (2004). Estrogen plus progestin and risk of venous thrombosis. Journal of the American Medical Association, 292, 1573–1580. Food and Nutrition Board and Board on Health Sciences Policy (1993). An assessment of the NIH Women’s Health Initiative. S. Thaul and D. Hotra (Eds.). Washington, DC: National Academy Press. Grady, D., Rubin, S. M., Pettiti, D. B., Fox, C. S., Black, D, Ettinger, B., et al. (1992). Hormone therapy to prevent disease and prolong life in postmenopausal women. Annals of Internal Medicine, 117, 1016–1036. Hays, J., Hunt, J. R., Hubbell, F. A., Anderson, G. L., Limacher, M., Allen, C., et al. (2003). The Women’s Health Initiative recruitment methods and results. Annals of Epidemiology, 13, S18–S77. Hendrix, S. L., Wassertheil-Smoller, S., Johnson, K. C., Howard, B. V., Kooperberg, C., Rossouw, J. E., et al. (2006). 
Effects of conjugated equine estrogen on stroke in the Women’s Health Initiative. Circulation, 113, 2425–2434. Hersh, A. L., Stefanick, M., & Stafford, R. S. (2004). National use of postmenopausal hormone therapy. Journal of the American Medical Association, 291, 47–53. Hough, J. L., & Zilversmit, D. B. (1986). Effect of 17-b estradiol on aortic cholesterol content and metabolism in cholesterol-fed rabbits. Arteriosclerosis, 6, 57–64. Hsia, J., Langer, R. D., Manson, J. E., Kuller, L., Johnson, K. C., Hendrix, S. L., et al. (2006). Conjugated equine estrogens and coronary heart disease: The Women’s Health Initiative. Archives of Internal Medicine, 166, 357–365. Hulley, S., Grady, D., Bush, T., Furberg, C., Herrington, D., Riggs, B., et al. (1998). Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Journal of the American Medical Association, 280, 605–613. Jackson, R. D., Wactawski-Wende, J., LaCroix, A. Z., Pettinger, M., Yood, R. A., Watts, N. B., et al. (2006). Effects of conjugated equine estrogen on risk of fractures and BMD in postmenopausal women with hysterectomy: Results from the Women’s Health Initiative randomized trial. Journal of Bone and Mineral Research, 21, 817–828. Manson, J. E., Hsia, J., Johnson, K. C., Rossouw, J. E., Assaf, A. R., Lasser, N. L., et al. (2003). Estrogen plus progestin and the risk of coronary heart disease. New England Journal of Medicine, 349, 523–534. Million Women Study Collaborators (2003). Breast cancer and hormone replacement therapy in the Million Women Study. Lancet, 362, 419–427.
Pick, R., Stamler, J., Robard, S., & Katz, L. N. (1952). The inhibition of coronary atherosclerosis by estrogens in cholesterol-fed chicks. Circulation, 6, 276–280. Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G., et al. (2005). Combined postmenopausal hormone therapy and cardiovascular disease: Toward resolving the discrepancy between Women’s Health Initiative clinical trial and observational study results. American Journal of Epidemiology, 162, 404–414. Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G., et al. (2006). Combined analysis of Women’s Health Initiative observational and clinical trial data on postmenopausal hormone treatment and cardiovascular disease. American Journal of Epidemiology, 163, 589–599. Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Pettinger, M., Hendrix, S. L., et al. (2008a). Estrogen plus progestin therapy and breast cancer in recently postmenopausal women. American Journal of Epidemiology, 167, 1207–1216. Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Langer, R. D., Pettinger, M., et al. (2008b). Conjugated equine estrogens and breast cancer risk in the Women’s Health Initiative clinical trial and observational study. American Journal of Epidemiology, 167, 1407–1415. Prentice, R. L., Pettinger, M., Beresford, S. A., Wactawski-Wende, J., Hubbell, F. A., Stefanick, M. L., et al. (2009). Colorectal cancer in relation to postmenopausal estrogen and estrogen plus progestin in the Women’s Health Initiative clinical trial and observational study. Cancer Epidemiology, Biomarkers and Prevention, 18, 1531–1537. Ritenbaugh, C., Stanford, J. L., Wu, L., Shikany, J. M., Schoen, R. E., Stefanick, M. L., et al. (2008). Conjugated equine estrogens and colorectal cancer incidence and survival: The Women’s Health Initiative randomized clinical trial. Cancer Epidemiology, Biomarkers and Prevention, 17, 2609–2618. 
Robins, J. M., & Finkelstein, D. M. (2000). Correcting for non-compliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics, 56, 779–788. Rossouw, J. E., Prentice, R. L., Manson, J. E., Wu, L., Barad, D., Barnabei, V. M., et al. (2007). Postmenopausal hormone therapy and risk of cardiovascular disease by age and years since menopause. Journal of the American Medical Association, 297, 1465–1477. Stampfer, M. J., & Colditz, G. A. (1991). Estrogen replacement therapy and coronary heart disease: A quantitative assessment of the epidemiologic evidence. Preventive Medicine, 20, 47–63. Stefanick, M. L., Anderson, G. L., Margolis, K. L., Hendrix, S. L., Rodabough, R. J., Paskett, E. D., et al. (2006). Effects of conjugated equine estrogens on breast cancer and mammography screening in postmenopausal women with hysterectomy. Journal of the American Medical Association, 295, 1647–1657. Steinberg, K. K., Thacker, S. B., Smith, S. J., Stroup, D. F., Zack, M. M., Flanders, W. D., et al. (1991). A meta-analysis of the effect of estrogen replacement therapy on the risk of breast cancer. Journal of the American Medical Association, 265, 1985–1990. Wassertheil-Smoller, S., Hendrix, S. L., Limacher, M., Heiss, G., Kooperberg, C., Baird, A., et al. (2003). Effect of estrogen plus progestin on stroke in postmenopausal women: The Women’s Health Initiative: A randomized trial. Journal of the American Medical Association, 289, 2673–2684.
Women’s Health Initiative Steering Committee (2004). Effects of conjugated equine estrogen in postmenopausal women with hysterectomy: The Women’s Health Initiative randomized controlled trial. Journal of the American Medical Association, 291, 1701–1712. Women’s Health Initiative Study Group (1998). Design of the Women’s Health Initiative clinical trial and observational study. Controlled Clinical Trials, 19, 61–109. Writing Group for the Women’s Health Initiative Investigators (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial. Journal of the American Medical Association, 288, 321–333. Wysowski, D. K., Golden, L., & Burke, L. (1995). Use of menopausal estrogens and medroxyprogesterone in the United States, 1982–1992. Obstetrics and Gynecology, 85, 6–10.
part ii Innovations in Methods
6 Alternative Graphical Causal Models and the Identification of Direct Effects

James M. Robins and Thomas S. Richardson
Introduction

The subject-specific data from either an observational or experimental study consist of a string of numbers. These numbers represent a series of empirical measurements. Calculations are performed on these strings and causal inferences are drawn. For example, an investigator might conclude that the analysis provides strong evidence for "both an indirect effect of cigarette smoking on coronary artery disease through its effect on blood pressure and a direct effect not mediated by blood pressure." The nature of the relationship between the sentence expressing these causal conclusions and the statistical computer calculations performed on the strings of numbers has been obscure. Since the computer algorithms are well-defined mathematical objects, it is crucial to provide formal causal models for the English sentences expressing the investigator's causal inferences. In this chapter we restrict ourselves to causal models that can be represented by a directed acyclic graph. There are two common approaches to the construction of causal models. The first approach posits unobserved fixed 'potential' or 'counterfactual' outcomes for each unit under different possible joint treatments or exposures. The second approach posits relationships between the population distribution of outcomes under experimental interventions (with full compliance) and the set of (conditional) distributions that would be observed under passive observation (i.e., from observational data). We will refer to the former as 'counterfactual' causal models and the latter as 'agnostic' causal models (Spirtes, Glymour, & Scheines, 1993), as the second approach is agnostic as to whether unit-specific counterfactual outcomes exist, be they fixed or stochastic. The primary difference between the two approaches is ontological: The counterfactual approach assumes that counterfactual variables exist, while the
agnostic approach does not require this. In fact, the counterfactual theory logically subsumes the agnostic theory in the sense that the counterfactual approach is logically an extension of the latter approach. In particular, for a given graph the causal contrasts (i.e., parameters) that are well-defined under the agnostic approach are also well-defined under the counterfactual approach. This set of contrasts corresponds to the set of contrasts between treatment regimes (strategies) which could be implemented in an experiment with sequential treatment assignments (ideal interventions), wherein the treatment given at stage m is a (possibly random) function of past covariates on the graph. We refer to such contrasts or parameters as 'manipulable with respect to a given graph'. As discussed further in Section 1.8, the set of manipulable contrasts for a given graph are identified under the associated agnostic causal model from observational data with a positive joint distribution and no hidden (i.e., unmeasured) variables. A parameter is said to be identified if it can be expressed as a known function of the distribution of the observed data. A discrete joint distribution is positive if the probability of a joint event is nonzero whenever the marginal probability of each individual component of the event is nonzero. Although the agnostic theory is contained within the counterfactual theory, the reverse does not hold. There are causal contrasts that are well-defined within the counterfactual approach that have no direct analog within the agnostic approach. An example that we shall discuss in detail is the pure direct effect (also known as a natural direct effect) introduced in Robins and Greenland (1992). The pure direct effect (PDE) of a binary treatment X on Y relative to an intermediate variable Z is the effect the treatment X would have had on Y had (contrary to fact) the effect of X on Z been blocked.
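The PDE contrast E[Y(1, Z(0))] − E[Y(0, Z(0))] can be computed directly once a full counterfactual generating process is posited. The sketch below (ours, with invented parameter values) draws each unit's one-step-ahead counterfactuals with independent errors, an NPSEM-style assumption discussed later in the chapter.

```python
# Minimal sketch of the pure (natural) direct effect for binary X, Z, Y.
# We posit a generating process with independent "errors" (the strong
# assumption under which Pearl identifies the PDE); parameter values
# are invented.

import itertools
import random

random.seed(0)

def draw_unit():
    # Independent errors: Z(x) and Y(x, z) drawn independently per unit.
    z = {x: int(random.random() < (0.2 + 0.5 * x)) for x in (0, 1)}
    y = {(x, zz): int(random.random() < (0.1 + 0.3 * x + 0.4 * zz))
         for x, zz in itertools.product((0, 1), (0, 1))}
    return z, y

units = [draw_unit() for _ in range(200_000)]

# PDE: effect of X on Y with Z held at its value under X = 0,
# i.e., E[Y(1, Z(0))] - E[Y(0, Z(0))].
pde = sum(y[(1, z[0])] - y[(0, z[0])] for z, y in units) / len(units)
print(f"simulated PDE: {pde:.3f}")  # near 0.3 with these parameters
```

Here the effect of x on the Y(x, z) success probability is additive (0.3), so the simulated PDE lands near 0.3 regardless of the law of Z(0); with other parameterizations the PDE would depend on the Z(0) distribution.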
The PDE is non-manipulable relative to X, Y and Z in the sense that, in the absence of additional assumptions, the PDE does not correspond to a contrast between treatment regimes of any randomized experiment performed via interventions on X, Y and Z. In this chapter, we discuss three counterfactual models, all of which agree in two important respects: first, they agree on the set of well-defined causal contrasts; second, they make the consistency assumption that the effect of a (possibly joint) treatment on a given subject depends neither on whether the treatment was freely chosen by, versus forced on, the subject nor on the treatments received by other subjects. However, the counterfactual models do not agree as to the subset of these contrasts that can be identified from observational data with a positive joint distribution and no hidden variables. Identifiability of causal contrasts in counterfactual models is obtained by assuming that (possibly conditional on prior history) the treatment received at a given time is independent of some set of counterfactual outcomes. Different versions of this independence assumption are
possible: The stronger the assumption (i.e., the more counterfactuals assumed independent of treatment), the more causal contrasts that are identified. For a given graph, G, all the counterfactual models we shall consider identify the set of contrasts identified under the agnostic model for G. We refer to this set of contrasts as the manipulable contrasts relative to G. Among the counterfactual models we discuss, the model derived from the non-parametric structural equation model (NPSEM) of Pearl (2000) makes the strongest independence assumption; indeed, the assumption is sufficiently strong that the PDE may be identified (Pearl, 2001). In contrast, under the weaker independence assumption of the Finest Fully Randomized Causally Interpretable Structured Tree Graph (FFRCISTG) counterfactual model of Robins (1986) or the Minimal Counterfactual Model (MCM) introduced in this chapter, the PDE is not identified. The MCM is the weakest counterfactual model (i.e., contains the largest set of distributions over counterfactuals) that satisfies the consistency assumption and identifies the set of manipulable contrasts based on observational data with a positive joint distribution and no hidden variables. The MCM is equivalent to the FFRCISTG model when all variables are binary. Otherwise the MCM is obtained by a mild further weakening of the FFRCISTG independence assumption. The identification of the non-manipulable PDE parameter under an NPSEM appears to violate the slogan "no causation without manipulation." Indeed, Pearl (2010) has recently advocated the alternative slogan "causation before manipulation" in arguing for the ontological primacy of causation relative to manipulation.
Such an ontological primacy follows, for instance, from the philosophical position that all dependence between counterfactuals associated with different variables is due to the effects of common causes (that are to be included as variables in the model and on the associated graph, G), thus privileging the NPSEM over other counterfactual models. Pearl (2010) privileges the NPSEM over other models but presents different philosophical arguments for his position. Pearl's view is anathema to those with a refutationist view of causality (e.g., Dawid (2000)) who argue that a theory that allows identification of non-manipulable parameters (relative to a graph G) is not a scientific theory because some of its predictions (e.g., that the PDE is a particular function of the distribution of the observed data) are not experimentally testable and, thus, are non-refutable. Indeed, in Appendix B, we give an example of a data generating process that satisfies the assumptions of an FFRCISTG model but not those of an NPSEM such that (i) the NPSEM prediction for the PDE is false but (ii) the predictions made by all four causal models for the manipulable parameters relative to the associated graph G are correct. In this setting, anyone who assumed an NPSEM would falsely believe he or she was able to
consistently estimate the PDE parameter from observational data on the variables on G and no possible experimental intervention on these variables could refute either their belief in the correctness of the NPSEM or their belief in the validity (i.e., consistency) of their estimator of the PDE. In Appendix C, we derive sharp bounds for the PDE under the assumption that the FFRCISTG model associated with graph G holds. We find that these bounds may be quite informative, even though the PDE is not (point) identified. This strict refutationist view of causality relies on the belief that there is a sharp separation between the manipulable and non-manipulable causal contrasts (relative to graph G) because every prediction made concerning a manipulable contrast based on observational data can be checked by an experiment involving interventions on variables in G. However, this view ignores the facts that (i) such experiments may be infeasible or unethical; (ii) such empirical experimental tests will typically require an auxiliary assumption of exchangeability between the experimental and observational population and the ability to measure all the variables included in the causal model, neither of which may hold in practice; and (iii) such tests are themselves based upon the untestable assumption that experimental interventions are ideal. Thus, many philosophers of science do not agree with the strict refutationist’s sharp separation between manipulable and non-manipulable causal contrasts. However, Pearl does not rely on this argument in responding to the refutationist critique of the NPSEM that it can identify a contrast, the PDE, that is not subject to experimental test. Rather, he has responded by describing a scenario in which the PDE associated with a particular NPSEM is identifiable, scientifically meaningful, of substantive interest, and corresponds precisely to the intent-to-treat parameter of a certain randomized controlled experiment. 
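The non-refutability point can be made concrete with a toy construction of our own (not the Appendix B example): two populations that agree on every distribution obtainable by intervening on X and Z, and hence on all observational and experimental data, yet have different PDEs.

```python
# Two invented populations with identical interventional distributions
# but different PDEs. Each unit is described by its one-step-ahead
# counterfactuals (Z(0), Z(1), Y(1,0), Y(1,1)); throughout, Z(1) = 1 and
# Y(0, z) = 0 deterministically, and X is randomized.

from fractions import Fraction as F
from itertools import product

# Shared structure: Z(0) ~ Bernoulli(1/2), Y(1,1) ~ Bernoulli(1/2)
# independent of Z(0). The populations differ only in how Y(1,0) is
# coupled to Z(0) -- a cross-world joint that no experiment on X, Z, Y
# can observe, since setting X=1 yields Z=1 (so Y(1,0) is never realized
# jointly with an observed Z(0)).
def population(couple):
    # Returns {(z0, y10, y11): probability}.
    return {(z0, couple(z0), y11): F(1, 4)
            for z0, y11 in product((0, 1), (0, 1))}

pop_a = population(lambda z0: z0)      # Y(1,0) = Z(0)
pop_b = population(lambda z0: 1 - z0)  # Y(1,0) = 1 - Z(0)

def interventional_summary(pop):
    # Everything identifiable by intervening on X (and on X and Z):
    # marginals of Z(0), Y(1,0), Y(1,1). The joints under a single
    # intervention on X are fixed by these, because Z(1) and Y(0, z)
    # are constants here.
    pz0 = sum(p for (z0, _, _), p in pop.items() if z0 == 1)
    py10 = sum(p for (_, y10, _), p in pop.items() if y10 == 1)
    py11 = sum(p for (_, _, y11), p in pop.items() if y11 == 1)
    return (pz0, py10, py11)

def pde(pop):
    # E[Y(1, Z(0))] - E[Y(0, Z(0))]; the second term is 0 here.
    return sum(p * (y11 if z0 == 1 else y10)
               for (z0, y10, y11), p in pop.items())

assert interventional_summary(pop_a) == interventional_summary(pop_b)
print("PDE in population A:", pde(pop_a))  # 1/4
print("PDE in population B:", pde(pop_b))  # 3/4
```

Both populations satisfy the single-world (FFRCISTG-type) independence conditions, so no experiment on X, Z, Y distinguishes them; only a cross-world independence assumption of the NPSEM type would pin the PDE down.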
Pearl's account appears paradoxical in light of the results described above, since it suggests that the PDE may be identified via intervention. Resolving this apparent contradiction is the primary subject of this chapter. We will show that implicit within Pearl's account is a causal model associated with an expanded graph (G′) containing more variables than Pearl's original graph (G). Furthermore, although the PDE of the original NPSEM counterfactual model is not a manipulable parameter relative to G, it is manipulable relative to the expanded graph G′. Consequently, the PDE is identified by all four of the causal models (agnostic, MCM, FFRCISTG and NPSEM) associated with G′. The causal models associated with graph G′ formalize Pearl's verbal, informal account and constitute the "additional assumptions" required to make the original NPSEM's pure direct effect contrast equal to a contrast between treatments in a randomized
experiment—a randomized experiment whose treatments correspond to variables on the expanded graph G′ that are absent from the original graph G. However, the distribution of the variables of the expanded graph G′ is not positive. Furthermore, the available data are restricted to the variables of the original graph G. Hence, it is not at all obvious prima facie that the expanded causal model's treatment contrasts will be identified. Remarkably, we prove that, under all four causal models associated with the larger graph G′, the manipulable causal contrast of the expanded causal model that equals the PDE of Pearl's original NPSEM G is identified from observational data on the original variables. This identification crucially relies on certain deterministic relationships between variables in the expanded model. Our proof thus resolves the apparent contradiction; furthermore, it shows that the ontological primacy of manipulation reflected in the slogan "no causation without manipulation" can be maintained by interpreting the PDE parameter of a given counterfactual causal model as a manipulable causal parameter in an appropriate expanded model. Having said this, although in Pearl's scenario the intervention associated with the expanded causal model was scientifically plausible, we discuss a modification of Pearl's scenario in which the intervention required to interpret the PDE contrast of the original graph G as a manipulable contrast of an expanded graph G′ is more controversial. Our discussion reveals the scientific benefits that flow from the exercise of trying to provide an interventionist interpretation for a non-manipulable causal parameter identified under an NPSEM associated with a graph G. Specifically, the exercise often helps one devise explicit, and sometimes even practical, interventions, corresponding to manipulable causal parameters of an expanded graph G′. The exercise also helps one recognize when such interventions are quite a stretch.
In this chapter, our goal is not to advocate for the primacy of manipulation or of causation. Rather, our goal is to contribute both to the philosophy and to the mathematics of causation by demonstrating that the apparent conflict between these paradigms is often not a real one. The reduction of an identified non-manipulable causal contrast of an NPSEM to a manipulable causal contrast of an expanded model that is then identified via deterministic relationships under the expanded agnostic model is achieved here for the PDE. A similar reduction for the effect of treatment on the treated (i.e. compliers) in a randomized trial with full compliance in the placebo arm was given by Robins, VanderWeele, and Richardson (2007); see also Geneletti and Dawid (2007) and Appendix A herein. This chapter extends and revises previous discussions by Robins and colleagues (Robins & Greenland, 1992; Robins, 2003; Robins, Rotnitzky,
& Vansteelandt, 2007) of direct and indirect effects. We restrict consideration to causal models, such as the agnostic, FFRCISTG, MCM, and NPSEM, that can be represented by a directed acyclic graph (DAG). See Robins, Richardson, and Spirtes (2009) for a discussion of alternative points of view due to Hafeman and VanderWeele (2010), Imai, Keele, and Yamamoto (2009) and Petersen, Sinisi, and van der Laan (2006) that are not based on DAGs. The chapter is organized as follows: Section 1 introduces the four types of causal model associated with a graph; Section 2 defines direct effects; Section 3 analyzes the conditions required for the PDE to be identified; Section 4 considers, via examples, the extent to which the PDE may be interpreted in terms of interventions; Section 5 relates our results to the work of Avin, Shpitser, and Pearl (2005) on path-specific causal effects; finally, Section 6 concludes.
1 Graphical Causal Models

Define a DAG G to be a graph with nodes (vertices) representing the elements of a vector of random variables V = (V1, . . . , VM), with directed edges (arrows) and no directed cycles. To avoid technicalities, we assume all variables Vm are discrete. We let f(v) ≡ fV(v) ≡ P(V = v) all denote the probability density of V, where, for simplicity, we assume v ∈ 𝒱 ≡ 𝒱̄M; here 𝒱m denotes the assumed known space of possible values vm of Vm, and for any z1, . . . , zm, we define z̄m = (z1, . . . , zm). By convention, for any z̄m, we define z̄0 ≡ z0 ≡ 0. Note 𝒱̄m ≡ 𝒱1 × · · · × 𝒱m is the product space of the 𝒱j, j ≤ m. We do not necessarily assume that f(v) is strictly positive for all v ∈ 𝒱. As a simple example, consider a randomized trial of smoking cessation, represented by the DAG G with node set V = (X, Z, Y) in Figure 6.1. Thus, M = 3, V1 = X, V2 = Z, V3 = Y. Here, X is the randomization indicator, with X = 0 denoting smoking cessation and X = 1 active smoking; Z is an indicator variable for hypertensive status 1 month post randomization; Y is an
[Figure 6.1 diagram: X → Z → Y, with a direct edge X → Y.]

Figure 6.1 A simple DAG containing a treatment X, an intermediate Z and a response Y.
indicator variable for having a myocardial infarction (MI) by the end of follow-up at 3 months. For simplicity, assume complete compliance with the assigned treatment and assume no subject had an MI prior to 1 month. We refer to the variables V as factual variables as they are variables that could potentially be recorded on the subjects participating in the study. Because in this chapter our focus is on identification, we assume the study population is sufficiently large that sampling variability can be ignored. Then, the density f(v) = f(x, z, y) of the factual variables can be taken to be the proportion of our study population with X = x, Z = z, Y = y. Our ultimate goal is to try to determine whether X has a direct effect on Y not through Z. We use either PAVm or PAm to denote the parents of Vm, that is, the set of nodes from which there is a direct arrow into Vm. For example, in Figure 6.1, PAY = {X, Z}. A variable Vj is a descendant of Vm if there is a sequence of nodes connected by edges between Vm and Vj such that, following the direction indicated by the arrows, one can reach Vj by starting at Vm, that is, Vm → · · · → Vj. Thus, in Figure 6.1, Z is a descendant of X but not of Y. We suppose that, as in Figure 6.1, the V = (V1, . . . , VM) are numbered so that Vj is not a descendant of Vm for m > j. Let R = (R1, . . . , RK) denote any subset of V and let r = (r1, . . . , rK) be a value of R. We write Rj = Vm, rj = vm if the jth variable in R corresponds to the mth variable in V. The NPSEM, MCM and FFRCISTG models all assume the existence of the counterfactual random variable Vm(r) encoding the value the variable Vm would have if, possibly contrary to fact, R were set to r, r ∈ ℛ ≡ ℛ1 × · · · × ℛK, where Vm(r) is assumed to be well-defined in the sense that there is reasonable agreement as to the hypothetical intervention (i.e., closest possible world) which sets R to r (Robins & Greenland, 2000).
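The graphical notions introduced above (parents PAm, descendants, and the convention that V1, . . . , VM is ordered so that Vj is not a descendant of Vm for m > j) can be sketched for Figure 6.1; this code is ours, for illustration only.

```python
# The DAG of Figure 6.1 (X -> Z -> Y with a direct edge X -> Y),
# encoded as a parent map.

parents = {"X": set(), "Z": {"X"}, "Y": {"X", "Z"}}  # PA_Y = {X, Z}

def descendants(graph, node):
    """Nodes reachable from `node` by directed paths (node -> ... -> v)."""
    children = {v: {w for w in graph if v in graph[w]} for v in graph}
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def is_topological(order, graph):
    """Check the convention: every parent precedes its child."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[p] < pos[v] for v in graph for p in graph[v])

assert descendants(parents, "X") == {"Z", "Y"}   # Z and Y descend from X
assert descendants(parents, "Y") == set()        # Y has no descendants
assert is_topological(["X", "Z", "Y"], parents)
assert not is_topological(["Y", "Z", "X"], parents)
print("Figure 6.1 checks pass")
```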
For example, in Figure 6.1, Z(x ¼ 1) and Y(x ¼ 1) are a subject’s Z and Y had, possibly contrary to fact, the subject been a smoker. By assumption, if Rj 2 R is the mth variable Vm, then Vm(r) equals the value rj to which the variable Vm ¼ Rj was set. For example, in Figure 6.1, the counterfactual X(x ¼ 1) is equal to 1. Note we assume Vm(r) is well-defined even when the factual probability P(R ¼ r) is zero. We recognize that under certain circumstances such an assumption might be ‘metaphysically suspect’ because the counterfactuals could be ‘radically’ ill-defined, since no one was observed to receive the treatment in question. However, in our opinion, in a number of the examples that we consider in this chapter these counterfactuals do not appear to be much less well-defined than those corresponding to treatments that have positive factual probability. We often write the density fV(r)(v) of V(r) as fr int ðvÞ, with ‘int’ being short for intervene, to emphasize the fact that fVðrÞ ðvÞ ¼ fr int ðvÞ represents the
Causality and Psychopathology
density of $V$ in the counterfactual world where we intervened and set each subject's $R$ to $r$. We say that $f_r^{int}(v)$ is the density of $V$ had, contrary to fact, each subject followed the treatment regime $r$. In contrast, $f(v)$ is the density of the factual variables $V$. With this background, we specify our four causal models.

1.1 FFRCISTG Causal Models

Given a DAG $G$ with node set $V$, an FFRCISTG model associated with $G$ makes four assumptions.

(i) All one-step-ahead counterfactuals $V_m(\bar v_{m-1})$ exist for any setting $\bar v_{m-1} \in \bar{\mathcal V}_{m-1}$ of their predecessors. For example, in Figure 6.1, a subject's hypertensive status $Z(x) = V_2(v_1)$ at smoking level $x$ for $x = 0$ and for $x = 1$ exists, and a subject's MI status $Y(x, z) = V_3(\bar v_2)$ at each joint level of smoking and hypertension exists. Because $V_1 = X$ has no predecessor, $V_1 = X$ exists only as a factual variable.

(ii) $V_m(\bar v_{m-1}) \equiv V_m(pa_m)$ is a function of $\bar v_{m-1}$ only through the values $pa_m$ of $V_m$'s parents on $G$. For example, were the edge $X \rightarrow Y$ missing in Figure 6.1, this assumption would imply that $Y(x, z) = Y(z)$ for every subject and every $z$. That is, the absence of the edge would imply that smoking $X$ has no effect on $Y$ other than through its effect on $Z$.

(iii) Both the factual variables $V_m$ and the counterfactuals $V_m(r)$ for any $R \subseteq V$ are obtained recursively from the one-step-ahead counterfactuals $V_j(\bar v_{j-1})$, for $j \le m$. For example, $V_3 = V_3(V_1, V_2(V_1))$ and $V_3(v_1) = V_3(v_1, V_2(v_1))$. Thus, in Figure 6.1, with the treatment $R$ being smoking $X$, a subject's possibly counterfactual MI status $Y(x=1) = V_3(v_1 = 1)$ had he been forced to smoke is $Y(x=1, Z(x=1))$ and, thus, is completely determined by the one-step-ahead counterfactuals $Z(x)$ and $Y(x, z)$. That is, $Y(x=1)$ is obtained by evaluating the one-step-ahead counterfactual $Y(x=1, z)$ at $z = Z(x=1)$. Similarly, a subject's factual $X$ and one-step-ahead counterfactuals determine the subject's factual hypertensive status $Z$ and MI status $Y$ as $Z(X)$ and $Y(X, Z(X))$, where $Z(X)$ is the counterfactual $Z(x)$ evaluated at $x = X$ and $Y(X, Z(X))$ is the counterfactual $Y(x, z)$ evaluated at $(x, z) = (X, Z(X))$.
(iv) The following independence holds:

$$V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \perp\!\!\!\perp V_m(\bar v_{m-1}) \mid \bar V_{m-1} = \bar v_{m-1} \quad (6.1)$$

for all $m$ and all $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$, where for a fixed $\bar v_{M-1}$, $\bar v_k = (v_1, \ldots, v_k)$, $k < M-1$, denotes the initial subvector of $\bar v_{M-1}$.

Assumption (iv) is equivalent to the statement that for each $m$, conditional on the factual past $\bar V_{m-1} = \bar v_{m-1}$, the factual variable $V_m$ is independent of any possible evolution from $m+1$ of one-step-ahead counterfactuals (consistent with $\bar V_{m-1} = \bar v_{m-1}$), i.e., $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$, for some $\bar v_{M-1}$ of which $\bar v_{m-1}$ is a subvector. This follows since, by (iii), $V_m \equiv V_m(\bar V_{m-1}) = V_m(\bar v_{m-1})$ when $\bar V_{m-1} = \bar v_{m-1}$.

Note that by (iii) above, the counterfactual $V_{m+1}(\bar v_m)$ for a given subject, say subject $i$, depends on the treatment $\bar v_m$ received by the subject but does not depend on the treatment received by any other subject. Further, $V_{m+1}(\bar v_m)$ takes the same value whether the treatment $\bar v_m$ is counter to fact (i.e., $\bar V_m \neq \bar v_m$) or factual (i.e., $\bar V_m = \bar v_m$, and thus $V_{m+1}(\bar v_m) = V_{m+1}$). That is, the FFRCISTG model satisfies the consistency assumption described in the Introduction. Indeed, we shall henceforth refer to (iii) as the 'consistency assumption'.

The following example will play a central role in the chapter.

Example 1 Consider the FFRCISTG model associated with the graph in Figure 6.1. Then, for all $z$,

$$Y(x=1, z),\, Z(x=1) \perp\!\!\!\perp X; \qquad Y(x=0, z),\, Z(x=0) \perp\!\!\!\perp X \quad (6.2)$$

and

$$Y(x=1, z) \perp\!\!\!\perp Z(x=1) \mid X = 1; \qquad Y(x=0, z) \perp\!\!\!\perp Z(x=0) \mid X = 0 \quad (6.3)$$

are true statements by assumption (iv). However, the model makes no claim as to whether $Y(x=1, z) \perp\!\!\!\perp Z(x=0) \mid X = 0$ and $Y(x=1, z) \perp\!\!\!\perp Z(x=0) \mid X = 1$ are true because, for example, the value of $x$ in $Y(x=1, z)$ differs from the value $x = 0$ in $Z(x=0)$. We shall see that all four of the above independence statements are true by assumption under the NPSEM associated with the graph in Figure 6.1.
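The recursive construction in assumption (iii) lends itself to a short simulation. The Python sketch below uses an invented one-step-ahead generating law (none of the probabilities come from the chapter): it draws each subject's one-step-ahead counterfactuals $Z(x)$ and $Y(x, z)$ for Figure 6.1 and then builds the factuals $Z$, $Y$ and the counterfactual $Y(x=1)$ by recursive substitution, so that consistency holds by construction.

```python
import random

random.seed(0)

def draw_subject():
    """Draw one subject's one-step-ahead counterfactuals for Figure 6.1.

    Z(x) is hypertension status had smoking been set to x; Y(x, z) is MI
    status had (smoking, hypertension) been set to (x, z).  The particular
    probabilities are illustrative assumptions only."""
    x_fact = 1 if random.random() < 0.5 else 0
    z_of = {x: int(random.random() < (0.6 if x else 0.2)) for x in (0, 1)}
    y_of = {(x, z): int(random.random() < 0.1 + 0.3 * x + 0.4 * z)
            for x in (0, 1) for z in (0, 1)}
    return x_fact, z_of, y_of

def factuals_and_y1(x_fact, z_of, y_of):
    """Assumption (iii): recursive substitution from one-step-ahead counterfactuals."""
    z_fact = z_of[x_fact]             # factual Z = Z(X)
    y_fact = y_of[x_fact, z_fact]     # factual Y = Y(X, Z(X))
    y_x1 = y_of[1, z_of[1]]           # Y(x=1) = Y(x=1, Z(x=1))
    return z_fact, y_fact, y_x1

# Consistency: whenever the factual treatment is x = 1, Y(x=1) equals the factual Y.
for _ in range(1000):
    x_fact, z_of, y_of = draw_subject()
    z_fact, y_fact, y_x1 = factuals_and_y1(x_fact, z_of, y_of)
    if x_fact == 1:
        assert y_x1 == y_fact
```

Note that consistency is not an extra stochastic assumption here: it is automatic once the factuals are defined by substitution, which is why the chapter refers to (iii) as the 'consistency assumption'.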
1.2 Minimal Counterfactual Models (MCMs)

An MCM differs from an FFRCISTG model only in that (iv) is replaced by:

(iv*) For all $m$ and all $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$,

$$f(V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \mid \bar V_{m-1} = \bar v_{m-1}, V_m = v_m) = f(V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \mid \bar V_{m-1} = \bar v_{m-1}). \quad (6.4)$$

Since (iv) can be written as the condition that, for all $m$, all $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$, and all $v_m^* \in \mathcal V_m$,

$$f(V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \mid \bar V_{m-1} = \bar v_{m-1}, V_m = v_m^*) = f(V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \mid \bar V_{m-1} = \bar v_{m-1}),$$

condition (iv) for an FFRCISTG implies condition (iv*) for an MCM. However, the reverse does not hold: an MCM requires only that the last display holds for the unique value $v_m$ of $V_m$ that occurs in the given $\bar v_{M-1}$. Thus, Equation (6.4) states that, conditional on the factual past $\bar V_{m-1} = \bar v_{m-1}$ through $m-1$, any possible evolution from $m+1$ of one-step-ahead counterfactuals, $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$, consistent with the past $\bar v_m$ through $m$, is independent of the event $V_m = v_m$. In other words,

$$V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \perp\!\!\!\perp I(V_m(\bar v_{m-1}) = v_m) \mid \bar V_{m-1} = \bar v_{m-1}, \text{ for all } m \text{ and all } \bar v_{M-1} \in \bar{\mathcal V}_{M-1}, \quad (6.5)$$

where $I(V_m = v_m)$ is the Bernoulli indicator random variable. It follows that in the special case where all the $V_m$ are binary, an MCM and an FFRCISTG model are equivalent because, for $V_m$ binary, the random variables $V_m$ and $I(V_m = v_m)$ are the same (possibly up to a recoding).

Physical randomization of $X$ and/or $Z$ implies counterfactual independencies beyond those of an FFRCISTG or MCM model; see Robins et al. (2009). However, these extra independencies fail to imply $Z(0) \perp\!\!\!\perp Y(1, z)$ for the graph in Figure 6.1 and hence do not affect our results.
1.3 A Representation of MCMs and FFRCISTG Models that Does Not Condition on the Past

In this section we derive alternative characterizations of the counterfactual independence conditions for the FFRCISTG model and the MCM that will facilitate the comparison of these models with the NPSEM.
Theorem 1 Given an FFRCISTG model associated with a graph $G$:

(a) The set of independences in condition (6.1) is satisfied if and only if, for each $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$,

$$\text{the random variables } V_{m+1}(\bar v_m),\ m = 0, \ldots, M-1, \text{ are mutually independent.} \quad (6.6)$$

(b) Furthermore, the set of independences (6.6) is the same for any ordering of the variables compatible with the descendant relationships in $G$.

Proof of (a): ($\Rightarrow$) Given $\bar v_{M-1}$ and $m \in \{1, \ldots, M-1\}$, we define $\mathcal I_m = \{\mathcal I_{m,m}, \ldots, \mathcal I_{M,m}\}$ to be a set of conditional independence statements:

i) $\mathcal I_{m,m}:\ V_M(\bar v_{M-1}), \ldots, V_{m+1}(\bar v_m) \perp\!\!\!\perp V_m(\bar v_{m-1})$; and

ii) for $j = 1$ to $j = M-m$,

$$\mathcal I_{m+j,m}:\ V_M(\bar v_{M-1}), \ldots, V_{m+j+1}(\bar v_{m+j}) \perp\!\!\!\perp V_{m+j}(\bar v_{m+j-1}) \mid V_{m+j-1}(\bar v_{m+j-2}) = v_{m+j-1}, \ldots, V_m(\bar v_{m-1}) = v_m.$$

First, note that the set of independences in condition (6.1) is precisely $\mathcal I_1$. Now, if the collection $\mathcal I_m$ holds (for $m \le M-2$), then $\mathcal I_{m+1}$ holds since (I) the set $\mathcal I_{m+1}$ is precisely the set $\{\mathcal I_{m+1,m}, \ldots, \mathcal I_{M,m}\}$ except with $V_m(\bar v_{m-1})$ removed from all conditioning events and (II) $\mathcal I_{m,m}$ licenses such removal. Thus, beginning with $\mathcal I_1$, we recursively obtain that $\mathcal I_m$, and thus $\mathcal I_{m,m}$, holds for $m = 1, \ldots, M-1$. The latter immediately implies that the variables $V_{m+1}(\bar v_m)$, $m = 0, \ldots, M-1$, are mutually independent.

($\Leftarrow$) The reverse implication is immediate upon noting that the conditioning event $\bar V_{m-1} = \bar v_{m-1}$ in Equation (6.1) is the event $V_0 = v_0, V_1(v_0) = v_1, \ldots, V_{m-1}(\bar v_{m-2}) = v_{m-1}$.

Proof of (b): This follows immediately from the assumption that $V_m(\bar v_{m-1}) = V_m(pa_m)$. □

Theorem 2 Given an MCM associated with a graph $G$:

(a) The set of independences in condition (6.5) is satisfied if and only if, for each $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$ and each $m \in \{1, \ldots, M-1\}$,

$$V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \perp\!\!\!\perp I(V_m(\bar v_{m-1}) = v_m). \quad (6.7)$$

(b) Furthermore, the set of independences (6.7) is the same for any ordering of the variables compatible with the descendant relationships in $G$.

An immediate corollary is the following.
Corollary 3 An MCM implies that, for all $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$, the random variables $I(V_{m+1}(\bar v_m) = v_{m+1})$, $m = 0, \ldots, M-1$, are mutually independent.

Proof of Theorem 2(a): ($\Rightarrow$) Given $\bar v_{M-1}$, the proof exactly follows that of the previous theorem when we redefine:

i) $\mathcal I_{m,m}:\ V_M(\bar v_{M-1}), \ldots, V_{m+1}(\bar v_m) \perp\!\!\!\perp I(V_m(\bar v_{m-1}) = v_m)$; and

ii) for $j = 1$ to $j = M-m$,

$$\mathcal I_{m+j,m}:\ V_M(\bar v_{M-1}), \ldots, V_{m+j+1}(\bar v_{m+j}) \perp\!\!\!\perp I(V_{m+j}(\bar v_{m+j-1}) = v_{m+j}) \mid V_{m+j-1}(\bar v_{m+j-2}) = v_{m+j-1}, \ldots, V_m(\bar v_{m-1}) = v_m.$$

The reverse implication and (b) follow as in the proof of the previous theorem. □
1.4 Non-Parametric Structural Equation Models (NPSEMs)

Given a DAG $G$ with node set $V$, an NPSEM associated with $G$ assumes that there exist mutually independent random variables $\epsilon_m$ and deterministic unknown functions $f_m$ such that the counterfactual $V_m(\bar v_{m-1}) \equiv V_m(pa_m)$ is given by $f_m(pa_m, \epsilon_m)$, and both the factual variables $V_m$ and the counterfactuals $V_m(x)$ for any $X \subseteq V$ are obtained recursively from the $V_m(\bar v_{m-1})$ as in (iii) in Section 1.1.

Under an NPSEM, both the FFRCISTG condition (6.1) and the MCM condition (6.5) hold. However, an FFRCISTG or MCM associated with $G$ will not, in general, be an NPSEM for $G$. Indeed, an NPSEM implies

$$V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \perp\!\!\!\perp V_m(\bar v_{m-1}^{*}) \mid \bar V_{m-1} = \bar v_{m-1}^{**}, \quad (6.8)$$

for all $m$, all $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$, and all $\bar v_{m-1}^{*}, \bar v_{m-1}^{**} \in \bar{\mathcal V}_{m-1}$. That is, conditional on the factual past, the counterfactual $V_m(\bar v_{m-1}^{*})$ is statistically independent of all future one-step-ahead counterfactuals. This implies that all four statements in Example 1 are true under an NPSEM; see also Pearl (2000, Section 3.6.3).

Hence, in an MCM or FFRCISTG model, in contrast to an NPSEM, the defining independences are those for which the value of $\bar v_{m-1}$ in (a) the conditioning event, (b) the counterfactual $V_m$ at $m$, and (c) the set of future one-step-ahead counterfactuals $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$ are equal. Thus, an FFRCISTG assumes independence of $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$ and
$V_m(\bar v_{m-1}^{*})$ given $\bar V_{m-1} = \bar v_{m-1}^{**}$ only when $\bar v_{m-1} = \bar v_{m-1}^{*} = \bar v_{m-1}^{**}$. As mentioned above, the MCM further weakens the independence by replacing $V_m$ with $I(V_m = v_m)$. In Appendix B we describe a data-generating process leading to a counterfactual model that is an MCM/FFRCISTG model associated with Figure 6.1, but not an NPSEM for this figure. Understanding the implications of the additional counterfactual independences assumed by an NPSEM, compared to an MCM or FFRCISTG model, is one of the central themes of this chapter.
1.5 The g-Functional

Before defining our third causal model, the agnostic causal model, we need to define the g-functional density. The next Lemma shows that the assumptions of an MCM, and thus a fortiori those of the NPSEMs and FFRCISTG models, restrict the joint distribution of the factual variables (when there are missing edges in the DAG).

Lemma 4 In an MCM associated with DAG $G$, for all $v$ such that $f(v) > 0$, the density $f(v) \equiv P(V = v)$ of the factuals $V$ satisfies the Markov factorization

$$f(v) = \prod_{j=1}^{M} f(v_j \mid pa_j). \quad (6.9)$$

Robins (1986) proved Lemma 4 for an FFRCISTG model; the proof applies equally to an MCM. Equation (6.9) is equivalent to the statement that each variable $V_m$ is conditionally independent of its non-descendants given its parents (Pearl, 1988).

Example 2 In Figure 6.1, $f(x, z, y) = f(y \mid x, z)\, f(z \mid x)\, f(x)$. If the arrow from $X$ to $Y$ were missing, we would then have $f(x, z, y) = f(y \mid z)\, f(z \mid x)\, f(x)$, since $Z$ would be the only parent of $Y$.

Definition 5 Given a DAG $G$, a set of variables $R \subseteq V$, and a value $r$ of $R$, define the g-functional density

$$f_r(v) \equiv P_r(V = v) \equiv \begin{cases} \prod_{j: V_j \notin R} f(v_j \mid pa_j) & \text{if } v = (u, r), \\ 0 & \text{if } v = (u, r^{*}) \text{ with } r^{*} \neq r. \end{cases}$$

In words, $f_r(v)$ is the density obtained by modifying the product on the right-hand side of Equation (6.9) by removing the term $f(v_j \mid pa_j)$ for every
$V_j \in R$, while, for $V_j \notin R$, setting each $R_m \in R$ in $PA_j$ to the value $r_m$ in the term $f(v_j \mid pa_j)$. Note that the probability that $R$ equals $r$ is 1 under the density $f_r(v)$, that is, $P_r(R = r) \equiv f_r(r) = 1$. The density $f_r(z)$ may not be a well-defined function of the density $f(v)$ of the factual data $V$ when the factual distribution of $V$ is non-positive, because setting $R_m \in PA_j$ to the value $r_m$ in $f(v_j \mid pa_j)$ may result in conditioning on an event that has probability zero of occurring in the factual distribution.

Example 3 In Figure 6.1 with $R = (X, Z)$, $r = (x=1, z=0)$, $f_r(v) \equiv f_{x=1,z=0}(x^{*}, z^{*}, y) = f(y \mid x=1, z=0)$ if $(x^{*}, z^{*}) = (1, 0)$. On the other hand, $f_{x=1,z=0}(x^{*}, z^{*}, y) = 0$ if $(x^{*}, z^{*}) \neq (1, 0)$ since, under $f_{x=1,z=0}(x^{*}, z^{*}, y)$, $X$ is always 1 and $Z$ always 0. It follows that $f_{x=1,z=0}(y) = f(y \mid x=1, z=0)$. If the event $(X, Z) = (1, 0)$ has probability zero under $f(v)$, then $f_{x=1,z=0}(y)$ is not a function of $f(v)$ and is thus not uniquely defined.

The following Lemma connects the g-functional density $f_r(v)$ to the intervention density $f_r^{int}(v)$.

Lemma 6 Given an MCM associated with a DAG $G$, sets of variables $R, Z \subseteq V$, and a treatment regime $r$, if the g-functional density $f_r(z)$ is a well-defined function of $f(v)$, then $f_r(z) = f_r^{int}(z)$.

In words, whenever the g-functional density $f_r(z)$ is a well-defined function of $f(v)$, it is equal to the intervention density for $Z$ that would be observed had, contrary to fact, all subjects followed the regime $r$.

This result can be extended from so-called static treatment regimes $r$ to general treatment regimes, where treatment is a (possibly random) function of the observed history, as follows. Suppose we are given a set of variables $R$ and, for each $V_j = R_m \in R$, a density $p_j(v_j \mid \bar v_{j-1})$. Then we define $p_R$ to be the general treatment regime corresponding to an intervention in which, for each $V_j = R_m \in R$, a subject's treatment level $v_j$ is randomly assigned with randomization probabilities $p_j(v_j \mid \bar v_{j-1})$ that are a function of the values of the subset of the variables $\bar V_{j-1}$ that temporally precede $V_j$. We let $f_{p_R}^{int}(v)$ be the distribution of $V$ that would be observed if, contrary to fact, all subjects had been randomly assigned treatment with probabilities $p_R$. Further, we define the g-functional density $f_{p_R}(v)$ to be the density

$$f_{p_R}(v) \equiv \prod_{j: V_j \notin R} f(v_j \mid pa_j) \prod_{j: V_j \in R} p_j(v_j \mid \bar v_{j-1})$$

and, for $Z \subseteq V$, $f_{p_R}(z) \equiv \sum_{v \setminus z} f_{p_R}(v)$. Thus the marginal $f_{p_R}(z)$ is obtained from $f_{p_R}(v)$ by summation in the usual way. Then we have the following extension of Lemma 6.
Extended Lemma 6: Given an MCM associated with a DAG $G$, sets of variables $R, Z \subseteq V$, and a treatment regime $p_R$, if the g-functional density $f_{p_R}(z)$ is a well-defined function of $f(v)$, then $f_{p_R}(z) = f_{p_R}^{int}(z)$.

In words, whenever the g-functional density $f_{p_R}(z)$ is a well-defined function of $f(v)$, it is equal to the intervention density for $Z$ that would be observed had, contrary to fact, all subjects followed the general regime $p_R$. Robins (1986) proved Extended Lemma 6 for an FFRCISTG model; the proof applies equally to an MCM. Extended Lemma 6 actually subsumes Lemma 6, as $f_r^{int}(z)$ is $f_{p_R}^{int}(z)$ for the $p_R$ such that, for $V_j = R_m \in R$, $p_j(v_j \mid \bar v_{j-1}) = 1$ if $v_j = r_m$ and is zero if $v_j \neq r_m$.

Corollary to Extended Lemma 6: Given an MCM associated with a DAG $G$, sets of variables $R, Z \subseteq V$, and a treatment regime $p_R$, $f_{p_R}(z) = f_{p_R}^{int}(z)$ whenever $p_R$ satisfies the following positivity condition: for all $V_j \in R$, $f(\bar V_{j-1}) > 0$ and $p_j(v_j \mid \bar V_{j-1}) > 0$ imply $f(v_j \mid \bar V_{j-1}) > 0$ with probability one under $f(v)$.

This follows directly from Extended Lemma 6, as the positivity condition implies that $f_{p_R}(z)$ is a well-defined function of $f(v)$. In the literature, one often sees only the Corollary stated and proved. However, as noted by Gill and Robins (2001), these proofs use the 'positivity condition' only to show that $f_{p_R}(z)$ is a well-defined (i.e., unique) function of $f(v)$. Thus, these proofs actually establish the general version of Extended Lemma 6. In this chapter we study models in which $f_{p_R}(z)$ is a well-defined function of $f(v)$ even though the positivity assumption fails; as a consequence, we require the general version of the Lemma.
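Definition 5 can be made concrete for Figure 6.1 with a small numerical sketch. In the Python fragment below the joint density $f(x, z, y)$ is invented purely for illustration; the g-functional density with $R = X$ is $f_x(z, y) = f(y \mid x, z)\, f(z \mid x)$, obtained by deleting the factor for the intervened-on variable $X$ and evaluating the remaining factors at the set value.

```python
from itertools import product

# An illustrative (made-up) joint density f(x, z, y) for Figure 6.1.
f = {}
for x, z, y in product((0, 1), repeat=3):
    pz = 0.6 if x else 0.2               # f(Z = 1 | x)
    py = 0.1 + 0.3 * x + 0.4 * z         # f(Y = 1 | x, z)
    f[x, z, y] = 0.5 * (pz if z else 1 - pz) * (py if y else 1 - py)

def marg(**fixed):
    """Marginal probability of a partial assignment, e.g. marg(x=1, z=0)."""
    idx = {'x': 0, 'z': 1, 'y': 2}
    return sum(p for v, p in f.items()
               if all(v[idx[k]] == val for k, val in fixed.items()))

def g_density(x_set):
    """g-functional density f_x(z, y) = f(y | x, z) f(z | x) with X set to x_set."""
    return {(z, y): (marg(x=x_set, z=z, y=y) / marg(x=x_set, z=z))
                    * (marg(x=x_set, z=z) / marg(x=x_set))
            for z, y in product((0, 1), repeat=2)}

fx1 = g_density(1)
assert abs(sum(fx1.values()) - 1.0) < 1e-12   # a proper density over (Z, Y)
```

If instead $R = (X, Z)$, the same recipe drops both $f(x)$ and $f(z \mid x)$, leaving $f_{x,z}(y) = f(y \mid x, z)$ as in Example 3; and if $P(X = x, Z = z) = 0$ the required conditional is undefined, which is exactly the non-positivity problem discussed above.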
1.6 Agnostic Causal Models

We are now ready to define the agnostic causal model (Spirtes et al., 1993). Given a DAG $G$ with node set $V$, the agnostic causal model represented by $G$ assumes that the joint distribution of the factual variables $V$ factors as in Equation (6.9) and that the interventional density of $Z \subseteq V$, again denoted by $f_{p_R}^{int}(z)$ or $f_r^{int}(z)$, under treatment regime $p_R$ or regime $r$, is given by the g-functional density $f_{p_R}(z)$ or $f_r(z)$, whenever $f_{p_R}(z)$ or $f_r(z)$ is a well-defined function of $f(v)$. Although this model assumes that the densities $f_{p_R}^{int}(v)$ or $f_r^{int}(v)$ of $V$ under these interventions exist, the model makes no reference to counterfactual variables and is agnostic as to their existence. Thus the agnostic causal model does not impose any version of a consistency assumption.
1.7 Interventions Restricted to a Subset of Variables

In this chapter we restrict consideration to graphical causal models in which we assume that interventions on every possible subset of the variables are possible and indeed well-defined. The constraint that only a subset $V^{*}$ of $V$ can be intervened on may be incorporated into the agnostic causal model by marking the elements of $V^{*}$ and requiring that, for any intervention $p_R$, $R \subseteq V^{*}$. For the FFRCISTG model, Robins (1986, 1987) constructs a counterfactual model, the fully randomized causally interpreted structured tree graph (FRCISTG) model, that imposes the constraint and reduces to the FFRCISTG model when $V^{*} = V$. We briefly review his approach and its extension to the MCM model in Appendix D. See also the decision-theoretic models of Dawid (2000) and Heckerman and Shachter (1995).
1.8 Manipulable Contrasts and Parameters

In the Introduction we defined the set of manipulable contrasts relative to a graph $G$ to be the set of causal contrasts that are well-defined under the agnostic causal model, i.e., the set of contrasts that are functions of the causal effects $f_{p_R}^{int}(z)$. The set consists of all contrasts between treatment regimes in an experiment with sequential treatment assignments, wherein the treatment given at stage $m$ is a function of past covariates on the graph.

Definition 7 We say a causal effect in a particular causal model associated with a DAG $G$ with node set $V$ is non-parametrically identified from data on $V$ (or, equivalently, in the absence of hidden variables) if it is a function of the density $f(v)$ of the factuals $V$.

Thus, in all four causal models, the causal effects $f_{p_R}^{int}(z)$ for which the g-functional $f_{p_R}(z)$ is a well-defined function of $f(v)$ are non-parametrically identified from data on $V$. It follows that the manipulable contrasts are non-parametrically identified under an agnostic causal model from observational data with a positive joint distribution and no hidden (i.e., unmeasured) variables. (Recall that a discrete joint distribution is positive if the probability of a joint event is nonzero whenever the marginal probability of each individual component of the event is nonzero.)

In contrast, the effect of treatment on the treated, $ETT(x) \equiv E[Y(x) - Y(0) \mid X = x]$, is not a manipulable parameter relative to the graph $G$: $X \rightarrow Y$, since it is not well-defined under the corresponding agnostic causal model. However,
$ETT(x)$ is identified under both MCMs and FFRCISTG models. Robins (2003) stated that an FFRCISTG model identified only "manipulable parameters." However, in that work, unlike here, no explicit definition of manipulable was used; in particular, it was not specified which class of interventions was being considered. In Appendix A we show that MCMs and FFRCISTG models identify $ETT(x)$, which is not a manipulable parameter relative to the graph $X \rightarrow Y$. However, $ETT(x)$ is a manipulable parameter relative to an expanded graph $G'$ with deterministic relations; see also Robins, VanderWeele, and Richardson (2007) and Geneletti and Dawid (2007). For expositional simplicity, we will henceforth restrict our discussion to static deterministic regime effects $f_r^{int}(z)$, except when non-static (i.e., dynamic and/or random) regimes $p_R$ are being explicitly discussed.
2 Direct Effects

Consider the following query: do cigarettes ($X$) have a causal effect on MI ($Y$) through a pathway that does not involve hypertension ($Z$)? This query is often rephrased as whether $X$ has a direct causal effect on $Y$ not through the intermediate variable $Z$. The concept of direct effect has been formalized in three different ways in the literature. For notational simplicity we always take $X$ to be binary, except where noted in Appendix A.

2.1 Controlled Direct Effects (CDEs)

Consider a causal model associated with a DAG $G$ with node set $V$ containing $(X, Y, Z)$. In a counterfactual causal model, the individual and average controlled direct effect (CDE) of $X$ on $Y$ when $Z$ is set to $z$ are, respectively, defined as $Y(x=1, z) - Y(x=0, z)$ and $CDE(z) = E[Y(x=1, z) - Y(x=0, z)]$. In our previous notation, $E[Y(x=1, z) - Y(x=0, z)]$ is the difference in means $E_{x=1,z}^{int}[Y] - E_{x=0,z}^{int}[Y]$ of $Y$ under the intervention distributions $f_{x=1,z}^{int}(v)$ and $f_{x=0,z}^{int}(v)$. Under the associated agnostic causal model, counterfactuals do not exist, but the CDE(z) can still be defined as $E_{x=1,z}^{int}[Y] - E_{x=0,z}^{int}[Y]$. Under all four causal models, $E_{x=1,z}^{int}[Y] - E_{x=0,z}^{int}[Y]$ is identified from data on $V$ by $E_{x=1,z}[Y] - E_{x=0,z}[Y]$ under the g-formula densities $f_{x=1,z}(v)$ and $f_{x=0,z}(v)$, if these are well-defined functions of $f(v)$. In the case of Figure 6.1, $E_{x,z}[Y]$ is just the mean $E[Y \mid X=x, Z=z]$ of the factual $Y$ given $X = x$ and $Z = z$ since, by the definition of the g-formula, $f_{x,z}(y) = f(y \mid X=x, Z=z)$. When $Z$ is binary there exist two different controlled direct effects, corresponding to $z = 1$ and $z = 0$. For example, $CDE(1)$ is the average effect of $X$
on $Y$ in the study population were, contrary to fact, all subjects to have $Z$ set to 1. It is possible for $CDE(1)$ to be zero and $CDE(0)$ to be nonzero, or vice versa. Whenever $CDE(z)$ is nonzero for some level of $Z$, there will exist a directed path from $X$ to $Y$ not through $Z$ on the causal graph $G$, regardless of the causal model.
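For Figure 6.1, identification of the CDE reduces to a contrast of factual conditional means. A minimal Python sketch, using an invented joint density rather than real data:

```python
from itertools import product

# Illustrative joint f(x, z, y) consistent with Figure 6.1 (made-up numbers).
f = {(x, z, y): 0.5
     * ((0.6 if x else 0.2) if z else (0.4 if x else 0.8))
     * ((0.1 + 0.3 * x + 0.4 * z) if y else (0.9 - 0.3 * x - 0.4 * z))
     for x, z, y in product((0, 1), repeat=3)}

def cond_mean_y(x, z):
    """E[Y | X = x, Z = z], which by the g-formula equals E_{x,z}^{int}[Y]."""
    return f[x, z, 1] / (f[x, z, 0] + f[x, z, 1])

def cde(z):
    """Average controlled direct effect of X on Y with Z set to z."""
    return cond_mean_y(1, z) - cond_mean_y(0, z)
```

In this generating law CDE(0) and CDE(1) coincide (both 0.3) because there is no X-by-Z interaction, but nothing in the definition requires this.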
2.2 Pure Direct Effects (PDEs)

In a counterfactual model, Robins and Greenland (1992) (hereafter R&G) defined the individual pure direct effect (PDE) of a (dichotomous) exposure $X$ on $Y$ relative to an intermediate variable $Z$ to be $Y(x=1, Z(x=0)) - Y(x=0)$. That is, the individual PDE is the subject's value of $Y$ under exposure to $X$ had, possibly contrary to fact, $X$'s effect on the intermediate $Z$ been blocked (i.e., had $Z$ remained at its value under non-exposure), minus the value of $Y$ under non-exposure to $X$. The individual PDE can also be written as $Y(x=1, Z(x=0)) - Y(x=0, Z(x=0))$, since $Y(x=0) = Y(x=0, Z(x=0))$. Thus the PDE contrast measures the direct effect of $X$ on $Y$ when $Z$ is set to its value $Z(x=0)$ under non-exposure to $X$. The average PDE is given by

$$PDE = E[Y(x=1, Z(x=0))] - E[Y(x=0)] = E[Y(x=1, Z(x=0)) - Y(x=0, Z(x=0))]. \quad (6.10)$$

Pearl (2001) adopted R&G's definition but changed nomenclature: he referred to the pure direct effect as a 'natural' direct effect. Since the intervention mean $E[Y(x=0)] = E_{x=0}^{int}[Y]$ is identified from data on $V$ under any of the associated causal models, the PDE is identified if and only if $E[Y(x=1, Z(x=0))]$ is identified. The data-generating process given in Appendix B shows that $E[Y(x=1, Z(x=0))]$ is not a manipulable effect relative to the graph in Figure 6.1. Further, we show that $E[Y(x=1, Z(x=0))]$ is not identified under an MCM or FFRCISTG model from data on $V$ in the absence of further untestable assumptions. However, we shall see that $E[Y(x=1, Z(x=0))]$ is identified under the NPSEM associated with the graph in Figure 6.1. Under the agnostic causal model, the concept of pure direct effect is not defined, since the counterfactual $Y(x=1, Z(x=0))$ is not assumed to exist.
2.3 Principal Stratum Direct Effects (PSDEs)

In contrast to the controlled direct effect and pure direct effect, the individual principal stratum direct effect (PSDE) is defined only for subjects for whom $X$ has no causal effect on $Z$, so that $Z(x=1) = Z(x=0)$. For a subject with
$Z(x=1) = Z(x=0) = z$, the individual principal stratum direct effect is defined to be $Y(x=1, z) - Y(x=0, z)$ (here, $X$ is assumed to be binary). The average PSDE in principal stratum $z$ is defined to be

$$PSDE(z) \equiv E[Y(1, z) - Y(0, z) \mid Z(1) = Z(0) = z].$$

Robins (1986, Sec. 12.2) first proposed using $PSDE(z)$ to define causal effects. In his article, $Y = 1$ denoted the indicator of death from a cause of interest (subsequent to a time $t$), $Z = 0$ denoted the indicator of survival until $t$ from competing causes, and the contrast $PSDE(z)$ was used to solve the problem of censoring by competing causes of death in defining the causal effect of the treatment $X$ on the cause $Y$. Rubin (1998) and Frangakis and Rubin (1999, 2002) later used this same contrast to solve precisely the same problem of "censoring by death." Finally, the analysis of Rubin (2004) was also based on this contrast, except that $Z$ and $Y$ were no longer assumed to be failure-time indicators.

The argument given below in Sec. 4 to prove that $E[Y(x=1, Z(x=0))]$ is not a manipulable effect relative to the graph in Figure 6.1 also proves that $PSDE(z)$ is not a manipulable effect relative to this graph. Furthermore, $PSDE(z)$ represents a causal contrast on a non-identifiable subset of the study population, namely the subset with $Z(1) = Z(0) = z$. An even greater potential problem with the PSDE is that if $X$ has an effect on every subject's $Z$, then $PSDE(z)$ is undefined for every possible $z$. If $Z$ is continuous and/or multivariate, it would not be unusual for $X$ to have an effect on every subject's $Z$. Thus, $Z$ is generally chosen to be univariate and discrete with few levels, often binary, when $PSDE(z)$ is the causal contrast. However, principal stratum direct effects have the potential advantage of remaining well-defined even when controlled direct effects or pure direct effects are ill-defined.

Note that for a subject with $Z(x=1) = Z(x=0) = z$, we have $Y(x=1, z) = Y(x=1, Z(x=1)) \equiv Y(x=1)$ and $Y(x=0, z) = Y(0, Z(0)) \equiv Y(x=0)$, so the individual PSDE for this subject is $Y(x=1) - Y(x=0)$. The average PSDE is given by

$$PSDE(z) = E[Y(x=1) - Y(x=0) \mid Z(1) = Z(0) = z].$$

Thus, PSDEs can be defined in terms of the counterfactuals $Y(x)$ and $Z(x)$. Now, in a trial where $X$ is randomly assigned but the intermediate $Z$ is not, there will generally be reasonable agreement as to the hypothetical intervention (i.e., closest possible world) which sets $X$ to $x$, so $Y(x)$ and $Z(x)$ are well defined; however, there may not be reasonable agreement
as to the hypothetical intervention which sets $X$ to $x$ and $Z$ to $z$, in which case $Y(x, z)$ will be ill-defined. In that event, controlled and pure direct effects are ill-defined, but one can still define $PSDE(z)$ by the previous display.

However, when $Y(x, z)$, and thus CDEs and PDEs, are ill-defined, and therefore use of $PSDE(z)$ is proposed, it is often the case that (i) the intermediate variable that is truly of scientific and policy relevance, say $Z^{*}$, is many-leveled, even continuous and/or multivariate, so $PSDE(z^{*})$ may not exist for any $z^{*}$, and (ii) $Z$ is a coarsening (i.e., a function) of $Z^{*}$, chosen to ensure that $PSDE(z)$ exists. In such settings, the counterfactual $Y(x, z^{*})$ is frequently meaningful because the hypothetical intervention which sets $X$ to $x$ and $Z^{*}$ to $z^{*}$ (unlike the intervention that sets $X$ to $x$ and $Z$ to $z$) is well-defined. Furthermore, $CDE(z^{*})$ and the PDE based on $Z^{*}$, in contrast to $PSDE(z)$, provide knowledge of the pathways or mechanisms by which $X$ causes $Y$ and represent the effects of interventions of public-health importance. In such a setting, the direct effect contrasts of primary interest are $CDE(z^{*})$ and the PDE based on $Z^{*}$, rather than $PSDE(z)$ based on a binary coarsening $Z$ of $Z^{*}$. See Robins, Rotnitzky, and Vansteelandt (2007) and Robins et al. (2009) for an example and further discussion.
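The contrast PSDE(z) can be illustrated by simulation. In the Python sketch below the full set of counterfactuals $(Z(0), Z(1), Y(x, z))$ is generated for each subject under an invented law, so the principal stratum $\{Z(1) = Z(0) = z\}$ can be read off directly; with factual data alone, stratum membership is not identifiable.

```python
import random

random.seed(1)

# Simulate joint counterfactuals for a large population; the generating law
# (all probabilities below) is an illustrative assumption.
subjects = []
for _ in range(100_000):
    z0 = int(random.random() < 0.2)          # Z(x=0)
    z1 = int(random.random() < 0.6)          # Z(x=1)
    y = {(x, z): int(random.random() < 0.1 + 0.3 * x + 0.4 * z)
         for x in (0, 1) for z in (0, 1)}    # Y(x, z)
    subjects.append((z0, z1, y))

def psde(z):
    """E[Y(1) - Y(0) | Z(1) = Z(0) = z]; within the stratum, Y(x) = Y(x, z)."""
    stratum = [y[1, z] - y[0, z] for z0, z1, y in subjects if z0 == z1 == z]
    return sum(stratum) / len(stratum)
```

Here the law makes $Y(1, z) - Y(0, z)$ average 0.3 in every stratum, so psde(0) and psde(1) are both close to 0.3; in general, the PSDE in one stratum carries no information about the other.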
3 Identification of the Pure Direct Effect

We have seen that $CDE(z)$, as a manipulable parameter relative to the graph in Figure 6.1, is generally identified from data on $V$ under all four of the causal models associated with this graph. We next consider identification of $E[Y(x=1, Z(x=0))]$, and thus identification of the PDE, in three important examples. The first two illustrate that the PDE may be identified in the NPSEM associated with a DAG but not by the associated MCMs or FFRCISTG models. In the third example the PDE is not identified under any of the four causal models associated with the DAG. We will elaborate these examples in subsequent sections.
3.1 Identification of the PDE in the DAG in Figure 6.1

Pearl (2001) proved that, under the NPSEM associated with the causal DAG in Figure 6.1, $E[Y(x=1, Z(x=0))]$ is identified. To see why, note that if

$$Y(x=1, z) \perp\!\!\!\perp Z(x=0) \text{ for all } z, \quad (6.11)$$

then

$$E[Y(x=1, Z(x=0))] = \sum_z E_{x=1,z}^{int}[Y]\, f_{x=0}^{int}(z), \quad (6.12)$$
because

$$E[Y(x=1, Z(x=0))] = \sum_z E[Y(x=1, z) \mid Z(x=0) = z]\, P[Z(x=0) = z] = \sum_z E[Y(x=1, z)]\, P[Z(x=0) = z],$$

where the first equality is by the laws of probability and the second by (6.11). Now, the right side of Equation (6.12) is non-parametrically identified from $f(v)$ under all four causal models, since the intervention parameters $E_{x,z}^{int}[Y]$ and $f_x^{int}(z)$ are identified by the g-functional. In particular, with Figure 6.1 as the causal DAG,

$$\sum_z E_{x=1,z}^{int}[Y]\, f_{x=0}^{int}(z) = \sum_z E[Y \mid X=1, Z=z]\, f(z \mid X=0). \quad (6.13)$$

Hence, it remains only to show that (6.11) holds for an NPSEM corresponding to the graph in Figure 6.1. Now, we noted in Example 1 that $Y(x=1, z) \perp\!\!\!\perp Z(x=0) \mid X = j$ held for $j = 0$ and $j = 1$ for the NPSEM (but not for the FFRCISTG) associated with the DAG in Figure 6.1. Further, for this NPSEM, $\{Y(x=1, z), Z(x=0)\} \perp\!\!\!\perp X$. Combining, we conclude that (6.11) holds.

In contrast, for an FFRCISTG model or MCM corresponding to Figure 6.1, $E[Y(x=1, Z(x=0))]$ is not identified, because condition (6.11) need not hold. In Appendix C we derive sharp bounds for the PDE under the assumption that the FFRCISTG model or the MCM associated with graph $G$ holds. We find that these bounds may be quite informative, even though the PDE is not (point) identified under this model.
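The identifying formula (6.13), and the PDE it yields once $E[Y \mid X = 0]$ is subtracted, can be evaluated numerically. The Python sketch below uses an invented joint density $f(x, z, y)$; under the NPSEM associated with Figure 6.1 the computed quantity identifies $E[Y(x=1, Z(x=0))]$, whereas under the MCM or FFRCISTG model it need not equal that counterfactual mean.

```python
from itertools import product

# Illustrative joint f(x, z, y) for Figure 6.1 (made-up numbers).
f = {(x, z, y): 0.5
     * ((0.6 if x else 0.2) if z else (0.4 if x else 0.8))
     * ((0.1 + 0.3 * x + 0.4 * z) if y else (0.9 - 0.3 * x - 0.4 * z))
     for x, z, y in product((0, 1), repeat=3)}

def p(**fixed):
    """Marginal probability of a partial assignment of (x, z, y)."""
    idx = {'x': 0, 'z': 1, 'y': 2}
    return sum(pr for v, pr in f.items()
               if all(v[idx[k]] == val for k, val in fixed.items()))

# Right-hand side of (6.13): sum_z E[Y | X = 1, Z = z] f(z | X = 0).
ey1_z0 = sum((p(x=1, z=z, y=1) / p(x=1, z=z)) * (p(x=0, z=z) / p(x=0))
             for z in (0, 1))

ey_x0 = p(x=0, y=1) / p(x=0)   # E[Y | X = 0], identifying E[Y(x=0)]
pde = ey1_z0 - ey_x0
```

For this particular generating law the computed PDE equals the CDE (0.3) because the law has no X-by-Z interaction; that equality is a feature of the example, not of the formula.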
3.2 The 'Natural Direct Effect' of Didelez, Dawid & Geneletti

Didelez, Dawid, and Geneletti (2006) (hereafter DDG) discuss an effect that they refer to as the 'natural direct effect' and prove that it is identified under the agnostic causal model associated with the DAG in Figure 6.1, the difference between Equation (6.13) and $E[Y \mid X=0]$ being the identifying formula. Since the parameter we have referred to as the natural or pure direct effect is not even defined under the agnostic model, it is clear that they are giving the same name to a different parameter. Thus, DDG's results have no relevance to the identification of the PDE. To clarify, we discuss DDG's results in greater detail.

To define DDG's parameter, let $R = (X, Z)$ and consider a regime $p_R \equiv p_{(X=j, Z)}$ with $p(x) = 1$ if and only if $x = j$ and with a given $p(z \mid x) = p^{*}(z)$ that does not depend on $X$. Then, $f_{p_{(X=j,Z)}}^{int}(v) = f_{p_{(X=j,Z)}}^{int}(x, y, z)$ is the density in a hypothetical study where
Causality and Psychopathology
124
each subject receives X ¼ j and then is randomly assigned Z based on the density p*(z). DDG define the natural direct effect to be int EpintðX¼1;ZÞ ½Y EpintðX¼0;ZÞ ½Y with p*(z) equal to the density fx¼0 ðzÞ of Z when X int int is set to 0, provided EpðX¼0;ZÞ ½Y is equal to Ex¼0 ½Y, the mean of Y when all int subjects are untreated. When EpintðX¼0;ZÞ ½Y 6¼ Ex¼0 ½Y, they say their natural direct effect is undefined. Now, under the agnostic causal model associated with the DAG in Figure 6.1, it follows from Extended Lemma 6 that int ½Y ¼ E½YjX ¼ 0 and EpintðX¼1;ZÞ ½Y is given by the right side EpintðX¼0;ZÞ ½Y ¼ Ex¼0 of Equation (6.13), confirming DDG’s claim about their parameter EpintðX¼1;ZÞ ½Y EpintðX¼0;ZÞ ½Y. In contrast, our PDE parameter is given by the difference between Equation (6.13) and E[Y | X ¼ 0] only when E[Y(x ¼ 1, Z(x ¼ 0))] equals Equation (6.13), which cannot be the case under an agnostic causal DAG model as E[Y(x ¼ 1, Z(x ¼ 0))] is then undefined. Note E[Y(x ¼ 1, Z(x ¼ 0))] does equal Equation (6.13) under the NPSEM associated with Figure 6.1 but not under the MCM or FFRCISTG model associated with this Figure.
3.3 Identification of the PDE with a Measured Common Cause of Z and Y that Is Not Directly Affected by X

Consider the causal DAG in Figure 6.2(a), which differs from the DAG in Figure 6.1 in that it assumes (in the context of our smoking-study example) there is a measured common cause L of hypertension Z and MI Y that is not caused by X. Suppose we assume an NPSEM with V = (X, L, Z, Y) and our goal remains estimation of E[Y(x = 1, Z(x = 0))]. Then E[Y(x = 1, Z(x = 0))] remains identified under the NPSEM associated with the DAG in Figure 6.2(a), with the identifying formula now

Σ_{z,l} E[Y | X = 1, Z = z, L = l] f(z | X = 0, L = l) f(l).
X
Z
(a)
Y
L
X
Z
(b)
Y
L
Figure 6.2 An elaboration of the DAG in Figure 6.2 in which L is a (measured) common cause of Z and Y.
6 Alternative Graphical Causal Models
This follows from the fact that under an NPSEM associated with the DAG in Figure 6.2(a),

Y(x = 1, z) ⊥⊥ Z(x = 0) | L for all z,   (6.14)

which in turn implies

E[Y(x = 1, Z(x = 0))] = Σ_{z,l} E^int_{x=1,z}[Y | L = l] f^int_{x=0}(z | L = l) f(l).   (6.15)
The right side of (6.15) remains identified under all four causal models via

Σ_{z,l} E[Y | X = 1, Z = z, L = l] f(z | X = 0, L = l) f(l).   (6.16)
In contrast, for an MCM or FFRCISTG associated with the graph in Figure 6.2(a), E[Y(x = 1, Z(x = 0))] is not identified, because (6.14) need not hold.
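The same enumeration strategy extends to Figure 6.2(a). In the toy model below (again our own illustrative construction, not from the text), L is a measured common cause of Z and Y that is unaffected by X, and the direct computation of E[Y(x = 1, Z(x = 0))] matches formula (6.16):

```python
from itertools import product

# Illustrative NPSEM for Figure 6.2(a): L is a measured common cause of Z
# and Y that is not affected by X. All numbers and equations are toy choices.
p_l  = {0: 0.6, 1: 0.4}
p_ez = {0: 0.7, 1: 0.3}
p_ey = {0: 0.8, 1: 0.2}
p_x  = {0: 0.5, 1: 0.5}

def Z(x, l, ez):
    return (x | l) ^ ez

def Y(x, z, l, ey):
    return 1 if (x and z) or (l and ey) else 0

# Direct computation of E[Y(x=1, Z(x=0))] from the structural equations.
target = sum(p_l[l] * p_ez[ez] * p_ey[ey] * Y(1, Z(0, l, ez), l, ey)
             for l, ez, ey in product((0, 1), repeat=3))

# Factual joint f(x, l, z, y).
joint = {}
for x, l, ez, ey in product((0, 1), repeat=4):
    z = Z(x, l, ez)
    y = Y(x, z, l, ey)
    key = (x, l, z, y)
    joint[key] = joint.get(key, 0.0) + p_x[x] * p_l[l] * p_ez[ez] * p_ey[ey]

def prob(event):
    return sum(p for k, p in joint.items() if event(*k))

# Formula (6.16): sum_{z,l} E[Y | X=1, Z=z, L=l] f(z | X=0, L=l) f(l).
formula = sum(
    prob(lambda x, ll, zz, y: x == 1 and ll == l and zz == z and y == 1)
    / prob(lambda x, ll, zz, y: x == 1 and ll == l and zz == z)
    * prob(lambda x, ll, zz, y: x == 0 and ll == l and zz == z)
    / prob(lambda x, ll, zz, y: x == 0 and ll == l)
    * p_l[l]
    for z, l in product((0, 1), repeat=2)
)
```

Here the conditional independence (6.14) holds because, given L, the background variables ez and ey are independent, so the two quantities agree exactly.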
3.4 Failure of Identification of the PDE in an NPSEM with a Measured Common Cause of Z and Y that Is Directly Affected by X

Consider the causal DAG shown in Figure 6.2(b), which differs from that in Figure 6.2(a) only in that X now causes L, so that there exists an arrow from X to L. The right side of Equation (6.15) remains identified under all four causal models via

Σ_{z,l} E[Y | X = 1, Z = z, L = l] f(z | X = 0, L = l) f(l | X = 0).
Under an NPSEM, MCM, or FFRCISTG model associated with this causal DAG, Y(x = 1, Z(x = 0)) is by definition

Y(x = 1, L(x = 1), Z(x = 0)) = Y(x = 1, L(x = 1), Z(x = 0, L(x = 0))).   (6.17)

Avin et al. (2005) prove that Equation (6.14) does not hold for this NPSEM. Thus, even under an NPSEM, we cannot conclude that Equation (6.15) holds. In fact, Avin et al. (2005) prove that for this NPSEM E[Y(x = 1, Z(x = 0))] is not identified from data on V. This is because the expression on the right-hand side of Equation (6.17) involves both L(x = 1) and L(x = 0), and there is no way to eliminate either.
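Non-identification can be demonstrated concretely. The two toy models sketched below (our own construction, not from the text) induce exactly the same factual joint over (X, L, Z, Y) yet disagree on E[Y(x = 1, Z(x = 0))], because they differ only in the unobservable cross-world joint of (L(x = 0), L(x = 1)):

```python
from itertools import product

# Two illustrative counterfactual models consistent with Figure 6.2(b).
# They share every factual distribution but differ on the cross-world joint
# of (L(x=0), L(x=1)), and hence on E[Y(x=1, Z(x=0))]. All toy choices ours.
p_x  = {0: 0.5, 1: 0.5}
p_b  = {0: 0.5, 1: 0.5}   # marginal law of L(x) in both models
p_ez = {0: 0.7, 1: 0.3}
p_ey = {0: 0.8, 1: 0.2}

def Zc(l, ez):             # counterfactual Z(x, l): here a function of (l, ez) only
    return l ^ ez

def Yc(l, z, ey):          # counterfactual Y(x, l, z): a function of (l, z, ey) only
    return 1 if (l and z) or ey else 0

def pde(l_joint):
    # E[Y(x=1, L(x=1), Z(x=0, L(x=0)))] for a given joint law of (L(0), L(1)).
    return sum(pl * p_ez[ez] * p_ey[ey] * Yc(l1, Zc(l0, ez), ey)
               for (l0, l1), pl in l_joint.items()
               for ez, ey in product((0, 1), repeat=2))

# Model (i): L(0) = L(1); model (ii): L(0) and L(1) independent.
joint_i  = {(b, b): p_b[b] for b in (0, 1)}
joint_ii = {(a0, a1): p_b[a0] * p_b[a1] for a0, a1 in product((0, 1), repeat=2)}

def factual(l_joint):
    # Factual joint over (X, L, Z, Y): only L(x) for the realized x matters.
    out = {}
    for x, ez, ey in product((0, 1), repeat=3):
        for (l0, l1), pl in l_joint.items():
            l = l1 if x else l0
            z = Zc(l, ez)
            y = Yc(l, z, ey)
            key = (x, l, z, y)
            out[key] = out.get(key, 0.0) + p_x[x] * pl * p_ez[ez] * p_ey[ey]
    return out

f_i, f_ii = factual(joint_i), factual(joint_ii)
```

Both models yield identical factual joints, yet pde(joint_i) and pde(joint_ii) differ, so no function of the factual data can recover the PDE.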
Additional Assumptions Identifying the PDE in the NPSEM Associated with the DAG in Figure 6.2(b)

However, if we were to consider a counterfactual model that imposes even more counterfactual-independence assumptions than the NPSEM, then the PDE may still be identified, though by a different formula. For example, if, in addition to the usual NPSEM independence assumptions, we assume that

L(x = 0) ⊥⊥ L(x = 1),   (6.18)

then we have
E[Y(x = 1, Z(x = 0))]
 = Σ_{l*,l,z} E[Y(x = 1, l, z) | L(x = 1) = l, Z(x = 0, l*) = z, L(x = 0) = l*] f(L(x = 1) = l, Z(x = 0, l*) = z, L(x = 0) = l*)
 = Σ_{l*,l,z} E[Y(x = 1, l, z)] f(L(x = 0) = l*, L(x = 1) = l) f(Z(x = 0, l*) = z)
 = Σ_{l*,l,z} E[Y(x = 1, l, z)] f(L(x = 0) = l*) f(L(x = 1) = l) f(Z(x = 0, l*) = z)
 = Σ_{l*,l,z} E[Y | X = 1, L = l, Z = z] f(L = l* | X = 0) f(L = l | X = 1) f(Z = z | X = 0, L = l*).   (6.19)
Here, the second and fourth equalities follow from the usual NPSEM independence restrictions, but the third requires condition (6.18). One setting under which (6.18) holds is that in which the counterfactual variables L(0) and L(1) result from a restrictive 'minimal sufficient cause model' (Rothman, 1976) such as

L(x) = (1 − x)A_0 + xA_1,   (6.20)

where A_0 and A_1 are independent both of one another and of all other counterfactuals. Note that (6.18) would not hold if the right-hand side of Equation (6.20) were (1 − x)A_0 + xA_1 + A_2, even if the A_i's were again assumed to be independent (VanderWeele & Robins, 2007). An alternative further assumption, sufficient to identify the PDE in the context of the NPSEM associated with Figure 6.2(b), is that L(1) is a deterministic function of L(0), i.e., L(1) = g(L(0)) for some function g(·). In this case we have

f(L(x = 0) = l*, L(x = 1) = l) = f(L(x = 0) = l*) I(l = g(l*)) = f(L = l* | X = 0) I(l = g(l*)),

where I(·) is the indicator function. Hence
E[Y(x = 1, Z(x = 0))] = Σ_{l*,l,z} E[Y | X = 1, L = l, Z = z] f(L = l* | X = 0) I(l = g(l*)) f(Z = z | X = 0, L = l*).   (6.21)

For a scalar L taking values in a continuous state-space, there will exist a function g(·) such that L(1) = g(L(0)) under the condition of rank preservation, that is, if

L_i(0) ≤ L_j(0)  ⟹  L_i(1) ≤ L_j(1)

for all individuals i, j. In this case g is simply the quantile-quantile function

g(l) = F⁻¹_{L(1)}(F_{L(0)}(l)) = F⁻¹_{L|X=1}(F_{L|X=0}(l)),   (6.22)

where F(·) and F⁻¹(·) denote the cumulative distribution function (CDF) and its inverse; the second equality follows from the NPSEM assumptions, and the expression shows that g(·) is identified. (Since L is continuous, the sums over l, l* in Equation (6.21) are replaced by integrals.) A special case of this example is a linear structural equation system, where it was already known that the PDE is identified in the graph in Figure 6.2(b). Our analysis shows that identification of the PDE in this graph merely requires rank preservation and not linearity. Note that a linear structural equation model implies both rank preservation and linearity. We note that the identifying formula in Equation (6.21) differs from Equation (6.19). Since neither identifying assumption imposes any restriction on the distribution of the factual variables in the DAG in Figure 6.2(b), there is no empirical basis for deciding which, if either, of the assumptions is true. Consequently, we do not advocate blithely adopting such assumptions in order to preserve identification of the PDE in contexts such as the DAG in Figure 6.2(b).
4 Models in which the PDE Is Manipulable

We now turn to the question of whether E[Y(x = 1, Z(x = 0))] can be identified by intervening on the variables V on G in Figure 6.1. Now, as noted by R&G (1992), we could observe E[Y(x = 1, Z(x = 0))] if we could intervene and set X to 0, observe Z(0), then "return each subject to their pre-intervention state," intervene to set X to 1 and Z to Z(0), and finally observe Y(1, Z(0)). However, such an intervention strategy will usually not exist, because such a return to a pre-intervention state is usually not possible in a real-world intervention (e.g., suppose the outcome Y were death). As a result, because we cannot observe the same subject under both X = 1 and X = 0, we are unable to directly observe the distribution of mixed counterfactuals such as Y(x = 1, Z(x = 0)). It follows that we cannot observe E[Y(x = 1, Z(x = 0))] by any intervention on the variables X and Z. Pearl (2001) argues similarly. That is, although we can verify through intervention the prediction made by all four causal models that the right-hand side of Equation (6.13) is equal to the expression on the right-hand side of Equation (6.12), we cannot verify, by intervention on X and Z, the NPSEM prediction that Equation (6.12) holds. Thus E[Y(x = 1, Z(x = 0))] is not manipulable with respect to the graph in Figure 6.1, and hence neither is the PDE with respect to this graph. Yet both of these parameters are identified in the NPSEM associated with this graph. This would be less problematic if these parameters were of little or no substantive interest. However, as shown in the next section, Pearl convincingly argues that such parameters can be of substantive importance.
4.1 Pearl's Substantive Motivation for the PDE

Pearl argues that the PDE and the associated quantity E[Y(x = 1, Z(x = 0))] are often causal contrasts of substantive and public-health importance by offering examples along the following lines. Suppose a new process can completely remove the nicotine from tobacco, allowing the production of a nicotine-free cigarette to begin next year. The substantive goal is to use already collected data on smoking status X, hypertensive status Z, and MI status Y from a randomized smoking-cessation trial to estimate the incidence of MI in smokers were all smokers to change to nicotine-free cigarettes. Suppose it is (somehow?) known that the entire effect of nicotine on MI is through its effect on hypertensive status, while the non-nicotine toxins in cigarettes have no effect on hypertension. Then, under the further assumption that there do not exist unmeasured confounders for the effect of hypertension on MI, the causal DAG in Figure 6.1 can be used to represent the study. Under these assumptions, the MI incidence in smokers of cigarettes free of nicotine would be E[Y(x = 1, Z(x = 0))] under all three counterfactual causal models, since the hypertensive status of smokers of nicotine-free cigarettes will equal their hypertensive status under non-exposure to cigarettes. Pearl then assumes an NPSEM and concludes that E[Y(x = 1, Z(x = 0))] equals Σ_z E[Y | X = 1, Z = z] f(z | X = 0), and the latter quantity can be estimated from the already available data. What is interesting about Pearl's example is that, to argue for the substantive importance of the non-manipulable parameter E[Y(x = 1, Z(x = 0))], he tells a story about the effect of a manipulation, a manipulation that makes no reference to Z at all. Rather, the manipulation is to intervene to eliminate the nicotine component of cigarettes. Indeed, the most direct representation of his story is provided by the extended DAG in Figure 6.3 with V = (X, N, O, Z, Y), where N is a binary variable representing nicotine exposure, O is a binary variable representing exposure to the non-nicotine components of a cigarette, and (X, Z, Y) are as defined previously. The bolded arrows from X to N and O indicate a deterministic relationship. Specifically, in the factual data, with probability one under f(v), either one smokes normal cigarettes, so X = N = O = 1, or one is a nonsmoker (i.e., ex-smoker) and X = N = O = 0. In this representation the parameter of interest is the mean E^int_{n=0,o=1}[Y] of Y had, contrary to fact, all subjects been exposed only to the non-nicotine components. As E^int_{n=0,o=1}[Y] is a function of f^int_{n=0,o=1}(v), we conclude that E^int_{n=0,o=1}[Y] is a manipulable causal effect relative to the DAG in Figure 6.3. Further, Pearl's story gives no reason to believe that there is any confounding for estimating this effect.
Figure 6.3 An elaboration of the DAG in Figure 6.1; N and O are, respectively, the nicotine and non-nicotine components of tobacco; thicker edges indicate deterministic relations.

In Appendix B we present a scenario that differs from Pearl's in which E^int_{n=0,o=1}[Y] is confounded and, thus, none of the four causal models associated with Figure 6.3 can be true (even though the FFRCISTG and agnostic causal models associated with Figure 6.1 are true). In contrast, under Pearl's scenario it is reasonable to take any of the four causal models, including the agnostic model, associated with Figure 6.3 as true. Under such a supposition, E^int_{n=0,o=1}[Y] is identified if the g-formula mean E_{n=0,o=1}[Y] is a well-defined function of f(v). Note that, under f(v), data on (X, Z, Y) are equivalent to data on V = (X, N, O, Z, Y), since X completely determines O and N in the factual data. We now show that, with Figure 6.3 as the causal DAG and V = (X, N, O, Z, Y), under all four causal models E^int_{n=0,o=1}[Y] is identified simply by applying the g-formula density in standard fashion. This result may seem surprising at first, since no subject in the actual study data followed the regime (n = 0, o = 1), so the standard positivity assumption P[N = 0, O = 1] > 0, usually needed to make the g-formula density f_{n=0,o=1}(v) a function of f(v) (and thus identifiable), fails. However, as we now demonstrate, even without positivity, the conditional independences implied by the assumptions of no direct effect of N on Y and no effect of O on Z (encoded in the missing arrows from N to Y and from O to Z in Figure 6.3), together with the deterministic relationship between O, N, and X under f(v), allow one to obtain identification. Specifically, under the DAG in Figure 6.3,
f_{n=0,o=1}(y, z) = f(y | O = 1, z) f(z | N = 0)
 = f(y | O = 1, N = 1, z) f(z | N = 0, O = 0)
 = f(y | X = 1, z) f(z | X = 0),

where the first equality is by definition of the g-formula density f_{n=0,o=1}(y, z), the second by the conditional independence relations encoded in the DAG in Figure 6.3, and the last by the deterministic relationships between O, N, and X under f(v) with V = (X, N, O, Z, Y). Thus

E_{n=0,o=1}[Y] = Σ_{y,z} y f_{n=0,o=1}(y, z) = Σ_{y,z} y f(y | X = 1, z) f(z | X = 0) = Σ_z E[Y | X = 1, Z = z] f(z | X = 0),
which is a function of f(v) with V = (X, N, O, Z, Y). Note that this argument goes through even if Z and/or Y are non-binary, continuous variables.

The Role of the Extended Causal Model in Figure 6.3

The identifying formula under all four causal models associated with the DAG in Figure 6.3 is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under the NPSEM associated with the DAG in Figure 6.1.
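The g-formula identification despite the failure of positivity can be checked numerically. In the sketch below (an illustrative model of our own, not from the text), X = N = O holds with probability one in the factual data, Z depends only on N, and Y depends only on (O, Z); the interventional mean under the never-observed regime (n = 0, o = 1) equals Σ_z E[Y | X = 1, Z = z] f(z | X = 0) computed from factual data on (X, Z, Y) alone:

```python
from itertools import product

# Illustrative structural model for the extended DAG of Figure 6.3 (our own
# toy construction). In the factual data X = N = O with probability one;
# Z depends only on N, and Y depends only on (O, Z).
p_x  = {0: 0.5, 1: 0.5}
p_ez = {0: 0.6, 1: 0.4}
p_ey = {0: 0.9, 1: 0.1}

def Z(n, ez):
    return n ^ ez

def Y(o, z, ey):
    return 1 if (o and z) or ey else 0

# Interventional mean under (n=0, o=1), straight from the structural equations.
intervened = sum(p_ez[ez] * p_ey[ey] * Y(1, Z(0, ez), ey)
                 for ez, ey in product((0, 1), repeat=2))

# Factual joint over (X, Z, Y); no subject follows the regime (n=0, o=1).
joint = {}
for x, ez, ey in product((0, 1), repeat=3):
    n = o = x                      # deterministic arrows X -> N and X -> O
    z = Z(n, ez)
    y = Y(o, z, ey)
    joint[x, z, y] = joint.get((x, z, y), 0.0) + p_x[x] * p_ez[ez] * p_ey[ey]

def prob(event):
    return sum(p for k, p in joint.items() if event(*k))

# g-formula identification: sum_z E[Y | X=1, Z=z] f(z | X=0).
identified = sum(
    prob(lambda x, zz, y: x == 1 and zz == z and y == 1)
    / prob(lambda x, zz, y: x == 1 and zz == z)
    * prob(lambda x, zz, y: x == 0 and zz == z) / p_x[0]
    for z in (0, 1)
)
# The two quantities agree exactly, although P[N = 0, O = 1] = 0 in the data.
```

The agreement rests on the missing arrows N → Y and O → Z plus the factual determinism X = N = O, exactly as in the displayed derivation.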
For Pearl, having at the outset assumed an NPSEM associated with the DAG in Figure 6.1, the story did not contribute to identification; rather, it served only to show that the non-manipulable parameter E[Y(x = 1, Z(x = 0))] of the NPSEM associated with the DAG in Figure 6.1 could, under the scenario of our story, encode a substantively important parameter: the manipulable causal effect of setting N to 0 and O to 1 on the extended causal model associated with the DAG in Figure 6.3. However, from the refutationist point of view, it is the story itself that makes Pearl's claim that E[Y(x = 1, Z(x = 0))] = Σ_z E[Y | X = 1, z] f(z | X = 0) refutable and, thus, scientifically meaningful. Specifically, when nicotine-free cigarettes become available, Pearl's claim can be tested by an intervention that forces a random sample of the population to smoke nicotine-free cigarettes. For someone willing to entertain only an agnostic causal model, the information necessary to identify the effect of nicotine-free cigarettes was contained in the story, as the parameter E[Y(x = 1, Z(x = 0))] is undefined without the story. [Someone, such as Dawid (2000), opposed to counterfactuals and thus wedded to the agnostic causal model, might then reasonably and appropriately choose to define E^int_{n=0,o=1}[Y] − E^int_{n=0,o=0}[Y] = E^int_{n=0,o=1}[Y] − E^int_{x=0}[Y] to be the natural or pure direct effect of X not through Z. This definition differs from, and in our view is preferable to, the definition of DDG (2006) discussed previously: the definition of DDG fails to correspond to the concept of the PDE as used in the literature since its introduction in Robins and Greenland (1992).] For an analyst who had assumed that the MCM, but not necessarily the NPSEM, associated with the DAG in Figure 6.1 was true, the information contained in the above story licenses the assumption that the MCM associated with Figure 6.3 holds.
This latter assumption can be used in two alternative ways, both leading to the same identifying formula. First, it leads via Lemma 6 to the above g-functional analysis also used by the agnostic model advocate. Second, as we next show, it can be used to prove that (6.11) holds, allowing identification to proceed à la Pearl (2001).

The Role of Determinism

Consider an MCM associated with the DAG in Figure 6.3 with node set V = (X, N, O, Z, Y). It follows from the fact that X = N = O with probability (w.p.) 1 that the condition that N(x) = O(x) = x w.p. 1 also holds. However, for pedagogic purposes, suppose for the moment that the condition N(x) = O(x) = x w.p. 1 does not hold. For expositional simplicity we assume all variables are binary, so our model is also an FFRCISTG model. Then V_0 = X, V_1(v_0) = N(x), V_2(v_0, v_1) = V_2(v_0) = O(x), V_3(v_0, v_1, v_2) = V_3(v_1) = Z(n), and V_4(v_0, ..., v_3) = V_4(v_2, v_3) = Y(o, z).
By Theorem 1, {Y(o, z), Z(n), O(x), N(x)} are mutually independent. However, because we are assuming an FFRCISTG model and not an NPSEM, we cannot conclude that O(x) ⊥⊥ N(x*) for x ≠ x*. Consider the induced counterfactual models for the variables (X, Z, Y) obtained from our FFRCISTG model by marginalizing over (N, O). Because N and O each have only a single child on the graph in Figure 6.3, the counterfactual model over (X, Z, Y) is the FFRCISTG associated with the complete graph of Figure 6.1, where the one-step-ahead counterfactuals Z^(1)(x), Y^(1)(x, z) associated with Figure 6.1 are obtained from the counterfactuals {Y(o, z), Z(n), O(x), N(x)} associated with Figure 6.3 by Z^(1)(x) = Z(N(x)) and Y^(1)(x, z) = Y(O(x), z). Here, we have used the superscript '(1)' to emphasize the graph with respect to which Z^(1)(x) and Y^(1)(x, z) are one-step-ahead counterfactuals. We cannot conclude that Z^(1)(0) = Z(N(0)) and Y^(1)(1, z) = Y(O(1), z) are independent, even though Z(n) and Y(o, z) are independent, because, as noted above, the FFRCISTG model associated with Figure 6.3 does not imply independence of O(1) and N(0). Suppose now we reinstate the deterministic constraint that N(x) = O(x) = x w.p. 1. Then we conclude that O(x) is independent of N(x*), since both variables are constants. It then follows that Z^(1)(0) and Y^(1)(1, z) are independent and, thus, that (6.11) holds and E[Y^(1)(1, Z^(1)(0))] is identified.

The Need for Conditioning on Events of Probability Zero

In our argument that, under the deterministic constraint that N(x) = O(x) = x w.p. 1, the FFRCISTG associated with the DAG in Figure 6.3 implied condition (6.11), the crucial step was the following: By Theorem 1, the independences in condition (6.1) that define an FFRCISTG imply that Y(o, z) and Z(n) are independent for n = 0 and o = 1.
In this section, we show that had we modified (6.1), and thus our definition of an FFRCISTG, by restricting to conditioning events V̄_{m−1} = v̄_{m−1} that have positive probability under f(v), then Theorem 1 would not hold for non-positive densities f(v). Specifically, if f(v) is not positive, the modified version of condition (6.1) does not imply condition (6.6); furthermore, the set of independences implied by a modified FFRCISTG associated with a graph G could differ for different orderings of the variables consistent with the descendant relationships on the graph. Specifically, we now show that for the modified FFRCISTG associated with Figure 6.3 and the ordering (X, N, O, Z, Y), we cannot conclude that Y(x, n, o, z) = Y(o, z) and Z(x, n, o) = Z(n) are independent for n = 0 and o = 1 and, thus, that condition (6.11) holds. However, the modified FFRCISTG with the alternative ordering (X, N, Z, O, Y) does imply Y(o, z) ⊥⊥ Z(n). First, consider the modified FFRCISTG associated with Figure 6.3 and ordering (X, N, O, Z, Y) under the deterministic constraint N(x) = O(x) = x w.p. 1. The unmodified condition (6.1) implies the set of independences

Y(n, o, z) ⊥⊥ Z(n, o) | X = x, N(x) = n, O(x) = o, for z, x, n, o ∈ {0, 1}.

The modified condition (6.1) implies only the subset corresponding to {x, z ∈ {0, 1}; n = o = x}, since the event {N(x) = j, O(x) = 1 − j}, j ∈ {0, 1}, has probability 0. As a consequence, we can only conclude that Y(n, o, z) = Y(o, z) ⊥⊥ Z(n) for o = n. In contrast, for the modified FFRCISTG associated with Figure 6.3 and the ordering V = (X, N, Z, O, Y), the deterministic constraint N(x) = O(x) = x w.p. 1 implies Y(o, z) ⊥⊥ Z(n) for n = 0 and o = 1, as follows: By Equation (6.1) and the fact that Y(x, n, z, o) = Y(o, z) and Z(x, n) = Z(n), we have, without having to condition on an event of probability 0, that

{Y(o, z), Z(n)} ⊥⊥ X for z, o, n ∈ {0, 1},   (6.23)

Y(o, z) ⊥⊥ Z(n) | X = x, N(x) = n for x, z, o ∈ {0, 1} and n = x.   (6.24)
However, (6.24) implies Y(o, z) ⊥⊥ Z(n = x) | X = x for x, z, o ∈ {0, 1}, as X = x is the same event as X = N(x) = x. Thus Y(o, z) ⊥⊥ Z(n) for n, z, o ∈ {0, 1} by (6.23). The heuristic reason that, for the ordering V = (X, N, O, Z, Y), we must condition on events of probability zero in condition (6.1) in order to prove (6.11) is that such conditioning is needed to instantiate the assumption that O has no effect on Z; if we do not allow conditioning on events of probability zero, the FFRCISTG model with this ordering does not instantiate this assumption, because O and N are equal with probability one and, thus, we can substitute O for N as the cause of Z. Under the ordering V = (X, N, Z, O, Y), in which O is subsequent to Z, it was not necessary to condition on events of probability zero in (6.1) to instantiate this assumption, as the model precludes later variables in the ordering from being causes of earlier variables; thus, O cannot be a cause of Z.

The above example demonstrates that the assumption that Equation (6.1) holds even when we condition on events of probability zero can place independence restrictions on the distribution of the counterfactuals over and above those implied by the assumption that Equation (6.1) holds when the conditioning events have positive probability. One might wonder how this could be so; it is usually thought that different choices for probabilities conditional on events of probability zero have no distributional implications. The following simple canonical example, which makes no reference to causality or counterfactuals, clarifies how multiple distributional assumptions conditional on events of probability zero can place substantive restrictions on a distribution.

Example 4. Suppose we have random variables (X, Y, R), where R = 1 w.p. 1. Suppose we assume both that (i) f(x, y | R = 0) = f(x, y) and (ii) f(x, y | R = 0) = f(x | R = 0) f(y | R = 0). Then we can conclude that f(x, y) = f(x | R = 0) f(y | R = 0) and, thus, that X and Y are independent, since the joint density f(x, y) factors as a function of x times a function of y. The point is that neither assumption (i) nor assumption (ii) alone restricts the joint distribution of (X, Y); nonetheless, together they impose the restriction that X and Y are independent.

Inclusion of a Measured Common Cause of Z and Y

A similar elaboration may be given for the causal DAG in Figure 6.2(a). The extended causal DAG represented by our story would then be the DAG in Figure 6.4. Under any of our four causal models,

f_{n=0,o=1}(y, z, l) = f(y | O = 1, z, l) f(z | N = 0, l) f(l)
 = f(y | O = 1, N = 1, z, l) f(z | N = 0, O = 0, l) f(l)
 = f(y | X = 1, z, l) f(z | X = 0, l) f(l).

Hence,
E_{n=0,o=1}[Y] = Σ_{z,l} E[Y | X = 1, Z = z, L = l] f(z | X = 0, L = l) f(l),

which is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under an NPSEM associated with the DAG in Figure 6.2(a).
Figure 6.4 The graph from Figure 6.3 with, in addition, a measured common cause (L) of the intermediate Z and the final response Y.
Summary

We believe in some generality that whenever a particular causal effect (a) is identified from data on V under an NPSEM associated with a DAG G with node set V (but is not identified under the associated MCM, FFRCISTG model, or agnostic causal model) and (b) can be expressed as the effect of an intervention on certain variables (which may not be elements of V) in an identifiable sub-population, then that causal effect is also identified under the agnostic causal DAG model based on a DAG G′ with node set V′, a superset of V. To find such an identifying causal DAG model G′, it is generally necessary to make the variables in V′\V deterministic functions of the variables in V. The above examples based on the extended DAGs in Figures 6.3 and 6.4 are cases in point; see Robins, VanderWeele, and Richardson (2007), Geneletti and Dawid (2007), and Appendix A for such a construction for the effect of treatment on the treated.
4.2 An Example in which an Interventional Interpretation of the PDE Is More Controversial

The following example shows that the construction of a scientifically plausible story under which the PDE can be regarded as a manipulable contrast relative to an expanded graph G′ may be more controversial than our previous example would suggest. After presenting the example, we briefly discuss its philosophical implications. Suppose nicotine X was the only chemical found in cigarettes that had an effect on MI, but that nicotine produced its effects by two different mechanisms. First, it increased blood pressure Z by directly interacting with a membrane receptor on blood-pressure control cells located in the carotid artery in the neck. Second, it directly caused atherosclerotic plaque formation and, thus, an MI by directly interacting with a membrane receptor of the same type located on the endothelial cells of the coronary arteries of the heart. Suppose the natural endogenous ligand produced by the body that binds to these receptors was nicotine itself. Finally, assume that exogenous nicotine from cigarettes had no causal effect on the levels of endogenous nicotine (say, because the time-scale under study is too short for homeostatic feedback mechanisms to kick in) and that we had precisely measured levels of endogenous nicotine L before randomizing to smoking or not smoking (X). Suppose that, based on this story, an analyst posits that the NPSEM associated with the graph in Figure 6.2(a) with V = (X, Z, Y, L) is true. As noted in Section 3.3, under this supposition E[Y(x = 1, Z(x = 0))] is identified via Σ_{z,l} E[Y | X = 1, Z = z, L = l] f(z | X = 0, L = l) f(l).
Can we express E[Y(x = 1, Z(x = 0))] as an effect of a scientifically plausible intervention? To do so, we must devise an intervention that (i) blocks the effect of exogenous nicotine on the receptors in the neck without blocking the effect of exogenous nicotine on the receptors in the heart but (ii) does not block the effect of endogenous nicotine on the receptors in either the neck or the heart. To accomplish (i), one could leverage the physical separation of the heart and the neck to build a "nano-cage" around the blood-pressure control cells in the neck that prevents exogenous nicotine from reaching the receptors on these cells. However, because endogenous and exogenous nicotine are chemically and physically identical, the cage would also block the effect of endogenous nicotine on receptors in the neck, in violation of (ii). Thus, a critic might conclude that E[Y(x = 1, Z(x = 0))] could not be expressed as the effect of an intervention. If the critic adhered to the slogan "no causation without manipulation" (i.e., causal contrasts are best thought of in terms of explicit interventions that, at least in principle, could be performed; Robins & Greenland, 2000), he or she would then reject the PDE as a meaningful causal contrast in this context. In contrast, if the critic believed in the ontological primacy of causation, he or she would take the example as evidence for their slogan "causation before manipulation." Alternatively, one can argue that the critic's conclusion that E[Y(x = 1, Z(x = 0))] could not be expressed as the effect of an intervention indicates only a lack of imagination, and an intervention satisfying (i) and (ii) may someday exist.
Specifically, someday it may be possible to chemically attach a side group to the exogenous nicotine in cigarettes in such a way that (a) the effect of the (exogenous) chemically modified nicotine and the effect of the unmodified nicotine on the receptors in the heart and neck are identical, while (b) allowing the placement of a "nano-cage" in the neck that successfully binds the side group attached to the exogenous nicotine, thereby preventing it from reaching the receptors in the neck. In that case, E[Y(x = 1, Z(x = 0))] equals a manipulable contrast of the extended deterministic causal DAG of Figure 6.5.

Figure 6.5 An example in which an interventional interpretation of the PDE is hard to conceive; thicker edges indicate deterministic relations.

In the figure, C = 1 denotes that the "nano-cage" is present. We allow X to take three values: as before, X = 0 indicates no cigarette exposure, X = 1 indicates exposure to cigarettes with unmodified nicotine, and X = 2 indicates exposure to cigarettes with modified nicotine. R_n is the fraction of the receptors in the neck that are bound to a nicotine molecule (exogenous or endogenous), and R_h is the fraction of the receptors in the heart that are bound to a nicotine molecule. M is a variable that is 1 if and only if X ≠ 0; D is a variable that takes the value 1 if and only if either X = 1 or (X = 2 and C = 0). Then E[Y(x = 1, Z(x = 0))] is the parameter E^int_{x=2,c=1}[Y] corresponding to the intervention described in (a) and (b). Under all four causal models associated with the graph in Figure 6.5,
f_{x=2,c=1}(y, z, l) = Σ_{m,d,r_h,r_n} f(y | r_h, z) f(r_h | m, l) f(z | r_n) f(m | x = 2) f(r_n | d, l) f(d | c = 1, x = 2) f(l)
 = f(y | M = 1, l, z) f(z | D = 0, l) f(l)
 = f(y | X = 1, z, l) f(z | X = 0, l) f(l),

where the first equality uses the fact that D = 0 and M = 1 when x = 2 and c = 1, and the second uses the fact that, since in the observed data C = 0 w.p. 1, D = 0 if and only if X = 0, and M = 1 if and only if X = 1 (since X ≠ 2 w.p. 1). Thus,

E_{x=2,c=1}[Y] = Σ_{z,l} E[Y | X = 1, Z = z, L = l] f(z | X = 0, L = l) f(l),

which is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under an NPSEM based on the DAG in Figure 6.2(a). As noted in the Introduction, the exercise of trying to construct a story to provide an interventionist interpretation for a non-manipulable causal parameter of an NPSEM often helps one devise explicit, and sometimes even practical, interventions, which can then be represented as a manipulable causal effect relative to an extended deterministic causal DAG model such as Figure 6.3.
5 Path-Specific Effects

In this section we extend our results to path-specific effects. We begin with a particular motivating example.
5.1 A Specific Example

Suppose our underlying causal DAG were the causal DAG of Figure 6.2(b), in which there is an arrow from X to L. We noted above that Pearl proved E[Y(x = 1, Z(x = 0))] was not identified from data (X, L, Z, Y) on the causal DAG in Figure 6.2(b), even under the associated NPSEM. There exist exactly three possible extensions of Pearl's original story that are consistent with the causal DAG in Figure 6.2(b), as shown in Figure 6.6: (a) nicotine N causes L but O does not; (b) O causes L but N does not; (c) both N and O cause L. We consider, as before, the causal effect E^int_{n=0,o=1}[Y]. Under all four causal models associated with the graph in Figure 6.6(a), E^int_{n=0,o=1}[Y] is identified from factual data on V = (X, L, Z, Y). Specifically, on the DAG in Figure 6.6(a), we have

f_{n=0,o=1}(y, z, l) = f(y | O = 1, z, l) f(z | N = 0, l) f(l | N = 0)
                     = f(y | O = 1, N = 1, z, l) f(z | N = 0, O = 0, l) f(l | N = 0, O = 0)
                     = f(y | X = 1, z, l) f(z | X = 0, l) f(l | X = 0),

so

E^int_{n=0,o=1}[Y] = Σ_{l,z} E(Y | X = 1, z, l) f(z | X = 0, l) f(l | X = 0).   (6.25)
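As a check, the identity (6.25) can be verified numerically on a toy parameterization of the DAG in Figure 6.6(a). All conditional probability tables below are invented for illustration; the sketch builds the factual law of (X, L, Z, Y) (in which N = O = X w.p. 1) and confirms that the right-hand side of (6.25), computed from observational conditionals alone, equals the intervention mean computed directly from the structural tables.

```python
import itertools

# Invented CPTs for Figure 6.6(a): X -> {N, O} deterministically (N = O = X),
# L <- N, Z <- {N, L}, Y <- {O, Z, L}.  All variables are binary.
pX = {0: 0.6, 1: 0.4}
pL_N = {0: 0.3, 1: 0.7}                                    # P(L=1 | N=n)
pZ_NL = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}   # P(Z=1 | N, L)
pY_OZL = {(0, 0, 0): 0.1, (0, 0, 1): 0.2, (0, 1, 0): 0.3, (0, 1, 1): 0.4,
          (1, 0, 0): 0.5, (1, 0, 1): 0.6, (1, 1, 0): 0.7, (1, 1, 1): 0.9}  # P(Y=1 | O, Z, L)

# Observational joint over (X, L, Z, Y); in the factual world N = O = X.
joint = {}
for x, l, z, y in itertools.product([0, 1], repeat=4):
    pl = pL_N[x] if l else 1 - pL_N[x]
    pz = pZ_NL[(x, l)] if z else 1 - pZ_NL[(x, l)]
    py = pY_OZL[(x, z, l)] if y else 1 - pY_OZL[(x, z, l)]
    joint[(x, l, z, y)] = pX[x] * pl * pz * py

def cond_EY(x, z, l):  # E[Y | X=x, Z=z, L=l] from the observational joint
    return joint[(x, l, z, 1)] / (joint[(x, l, z, 0)] + joint[(x, l, z, 1)])

def f_z_given_xl(z, x, l):  # f(z | X=x, L=l)
    num = sum(joint[(x, l, z, y)] for y in (0, 1))
    den = sum(joint[(x, l, zz, y)] for zz in (0, 1) for y in (0, 1))
    return num / den

def f_l_given_x(l, x):  # f(l | X=x)
    return sum(joint[(x, l, z, y)] for z in (0, 1) for y in (0, 1)) / pX[x]

# Identifying formula, Equation (6.25), from observational conditionals only.
formula = sum(cond_EY(1, z, l) * f_z_given_xl(z, 0, l) * f_l_given_x(l, 0)
              for l in (0, 1) for z in (0, 1))

# Ground truth computed directly from the structural CPTs under n = 0, o = 1:
# f(y | O=1, z, l) f(z | N=0, l) f(l | N=0).
truth = sum(pY_OZL[(1, z, l)]
            * (pZ_NL[(0, l)] if z else 1 - pZ_NL[(0, l)])
            * (pL_N[0] if l else 1 - pL_N[0])
            for l in (0, 1) for z in (0, 1))
```

The agreement is exact here because, on the DAG in Figure 6.6(a), every factor of the g-formula coincides with an observed conditional once N and O are rewritten in terms of X.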
Figure 6.6 Elaborations of the graph in Figure 6.2(b), with additional variables as described in the text; thicker edges indicate deterministic relations.
6 Alternative Graphical Causal Models
Similarly, under all four causal models associated with the graph in Figure 6.6(b), E^int_{n=0,o=1}[Y] is identified from factual data on V = (X, L, Z, Y). On the DAG in Figure 6.6(b) we have

f_{n=0,o=1}(y, z, l) = f(y | O = 1, z, l) f(z | N = 0, l) f(l | O = 1)
                     = f(y | X = 1, z, l) f(z | X = 0, l) f(l | X = 1),

so

E^int_{n=0,o=1}[Y] = Σ_{l,z} E(Y | X = 1, z, l) f(z | X = 0, l) f(l | X = 1).   (6.26)

However, E^int_{n=0,o=1}[Y] is not identified from factual data on V = (X, L, Z, Y) under any of the four causal models associated with the graph in Figure 6.6(c). In this graph f_{n=0,o=1}(y, z, l) = f(y | O = 1, z, l) f(z | N = 0, l) f(l | O = 1, N = 0). However, f(l | O = 1, N = 0) is not identified from the factual data, since the event {O = 1, N = 0} has probability 0 under f(v). Note that the identifying formulae for E^int_{n=0,o=1}[Y] for the graphs in Figure 6.6(a) and (b) are different.
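The non-identification result for Figure 6.6(c) can likewise be made concrete. In the toy parameterization below (all numbers invented), two structural models that differ only in P(L = 1 | N = 0, O = 1), an entry never observed because {N = 0, O = 1} has probability 0 in the factual data, induce the identical law of (X, L, Z, Y) yet different values of E^int_{n=0,o=1}[Y].

```python
import itertools

# Shared invented CPTs for Figure 6.6(c); L now depends on both N and O.
pX = {0: 0.5, 1: 0.5}
pZ_NL = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.9}      # P(Z=1 | N, L)
pY_OZL = {(o, z, l): 0.05 + 0.4 * o + 0.3 * z + 0.15 * l
          for o, z, l in itertools.product([0, 1], repeat=3)}     # P(Y=1 | O, Z, L)

def observational_law(pL_NO):
    """Joint over (X, L, Z, Y) when N = O = X in the factual data; only the
    diagonal entries pL_NO[(x, x)] of L's table are ever used."""
    joint = {}
    for x, l, z, y in itertools.product([0, 1], repeat=4):
        pl = pL_NO[(x, x)] if l else 1 - pL_NO[(x, x)]
        pz = pZ_NL[(x, l)] if z else 1 - pZ_NL[(x, l)]
        py = pY_OZL[(x, z, l)] if y else 1 - pY_OZL[(x, z, l)]
        joint[(x, l, z, y)] = pX[x] * pl * pz * py
    return joint

def intervention_mean(pL_NO):
    """E^int_{n=0,o=1}[Y] = sum_{l,z} f(y=1 | O=1, z, l) f(z | N=0, l) f(l | N=0, O=1),
    which needs the off-diagonal entry pL_NO[(0, 1)]."""
    total = 0.0
    for l, z in itertools.product([0, 1], repeat=2):
        pl = pL_NO[(0, 1)] if l else 1 - pL_NO[(0, 1)]
        pz = pZ_NL[(0, l)] if z else 1 - pZ_NL[(0, l)]
        total += pY_OZL[(1, z, l)] * pz * pl
    return total

# Two models: identical on the diagonal (N = O), different at (N=0, O=1).
modelA = {(0, 0): 0.3, (1, 1): 0.8, (0, 1): 0.1, (1, 0): 0.5}
modelB = {(0, 0): 0.3, (1, 1): 0.8, (0, 1): 0.9, (1, 0): 0.5}

same_observables = observational_law(modelA) == observational_law(modelB)
targetA, targetB = intervention_mean(modelA), intervention_mean(modelB)
```

Since the two models agree on every observable quantity but disagree on the target, no functional of the factual law can identify E^int_{n=0,o=1}[Y] here.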
Relation to Counterfactuals Associated with the DAG in Figure 6.2(b)

Let Y(x, l, z), Z(x, l), and L(x) denote the one-step-ahead counterfactuals associated with the graph in Figure 6.2(b). Then it is clear from the assumed deterministic counterfactual relation N(x) = O(x) = x that the parameter

E^int_{n=0,o=1}[Y] = E[Y(o = 1, L(n = 0), Z(n = 0, L(n = 0)))]

associated with the graph in Figure 6.6(a) can be written in terms of the counterfactuals associated with the graph in Figure 6.2(b) as

E[Y(x = 1, L(x = 0), Z(x = 0))] = E[Y(x = 1, L(x = 0), Z(x = 0, L(x = 0)))].

Likewise, we have that the parameter

E^int_{n=0,o=1}[Y] = E[Y(o = 1, L(o = 1), Z(n = 0, L(o = 1)))]

associated with the graph in Figure 6.6(b) equals

E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))]

in terms of the counterfactuals associated with the graph in Figure 6.2(b). In contrast, the parameter E^int_{n=0,o=1}[Y] associated with the graph in Figure 6.6(c) is not the mean of any counterfactual defined from Y(x, l, z), Z(x, l), and L(x) under the graph in Figure 6.2(b), since L, after intervening to set n = 0, o = 1, is neither L(x = 1) nor L(x = 0), as both imply a counterfactual for L under which n = o.
Furthermore, the parameter

E[Y(x = 1, Z(x = 0))] = E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 0)))]

associated with the graph in Figure 6.2(b) is not identified under any of the four causal models associated with any of the three graphs in Figure 6.6(a), (b), and (c); see Section 3.4. Thus, in summary, under an MCM or FFRCISTG model associated with the DAG in Figure 6.6(a), the extension of Pearl's original story encoded in that DAG allows the identification of the causal effect E[Y(x = 1, L(x = 0), Z(x = 0))] associated with the DAG in Figure 6.2(b). Similarly, under an MCM or FFRCISTG model associated with the DAG in Figure 6.6(b), the extension of Pearl's original story encoded in this graph allows the identification of the causal effect E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] associated with the DAG in Figure 6.2(b).

Contrast with the NPSEM for the DAG in Figure 6.2(b)

We now compare these results to those obtained under the assumption that the NPSEM associated with the DAG in Figure 6.2(b) holds. Under this model Avin et al. (2005) proved, using their theory of path-specific effects, that while E[Y(x = 1, Z(x = 0))] is unidentified, both

E[Y(x = 1, L(x = 0), Z(x = 0))] and E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))]   (6.27)

are identified (without requiring any additional story) by Equations (6.25) and (6.26), respectively. From the perspective of the FFRCISTG models associated with the graphs in Figure 6.6(a) and (b), if N and O represent, as we have been assuming, the substantive variables nicotine and the other components of cigarettes (rather than merely formal mathematical constructions), these graphs will generally represent mutually exclusive causal hypotheses. As a consequence, at most one of the two FFRCISTG models will be true; thus, from this perspective, only one of the two parameters in (6.27) will be identified.
Simultaneous Identification of both Parameters in (6.27) by an Expanded Graph

We next describe an alternative scenario, associated with the expanded graph in Figure 6.6(d), whose substantive assumptions imply (i) the FFRCISTG model associated with Figure 6.6(d) holds and (ii) the two parameters of (6.27) are manipulable parameters of that FFRCISTG which are identified by Equations (6.25) and (6.26), respectively. Thus, this alternative
scenario provides a (simultaneous) manipulative interpretation for the non-manipulable (relative to (X, Z, Y)) parameters (6.27) that are simultaneously identified by the NPSEM associated with the DAG in Figure 6.2(b). Suppose it was (somehow?) known that, as encoded in the DAG in Figure 6.6(d), the nicotine (N′) component of cigarettes was the only (cigarette-related) direct cause of Z not through L, the tar (T) component was the only (cigarette-related) direct cause of L, the other components (O′) contained all the (cigarette-related) direct causes of Y not through Z and L, and there were no further confounding variables, so that the FFRCISTG model associated with Figure 6.6(d) can be assumed to be true. Then the parameter E^int_{n′=0,t=0,o′=1}[Y] associated with Figure 6.6(d) equals both the parameter E[Y(x = 1, L(x = 0), Z(x = 0))] associated with Figure 6.2(b) and the parameter E^int_{n=0,o=1}[Y] associated with Figure 6.6(a) (where n = 0 is now defined to be the intervention that sets nicotine n′ = 0 and tar t = 0, while o = 1 is the intervention o′ = 1; N and O are redefined by 1 − N = (1 − N′)(1 − T) and O = O′). Furthermore, E^int_{n′=0,t=0,o′=1}[Y] is identified by Equation (6.25). Similarly, the parameter E^int_{n′=0,t=1,o′=1}[Y] associated with Figure 6.6(d) is equal to both the parameter E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] associated with Figure 6.2(b) and the parameter E^int_{n=0,o=1}[Y] associated with Figure 6.6(b) (where n = 0 is now the intervention that sets nicotine n′ = 0, while o = 1 denotes the intervention that sets tar t = 1 and o′ = 1; N and O are redefined by N = N′ and O = T·O′). Furthermore, the parameter E^int_{n′=0,t=1,o′=1}[Y] is identified by Equation (6.26).
Note that under this alternative scenario, and in contrast to our previous scenarios, the substantive meanings of the intervention that sets n = 0 and o = 1 and of the variables N and O for Figure 6.6(a) differ from the substantive meanings of this intervention and these variables for Figure 6.6(b), allowing the two parameters E^int_{n=0,o=1}[Y] to be identified simultaneously, each by a different formula, under the single FFRCISTG model associated with Figure 6.6(d).
Connection to Path-Specific Effects

Avin et al. (2005) refer to E[Y(x = 1, L(x = 0), Z(x = 0))] as the effect of X = 1 on Y when the paths from X to L and from X to Z are both blocked (inactivated), and to E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] as the effect of X = 1 on Y when the paths from X to Z are blocked. They refer to

E[Y(x = 1, Z(x = 0))] = E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 0)))]

as the effect of X = 1 on Y when both the path from X to Z and (X's effect on) the path from L to Z are blocked.
5.2 The General Case

We now generalize the above results. Specifically, given any DAG G with a variable X, construct a deterministic extended DAG G_ex that differs from G only in that the only arrows out of X on G_ex are deterministic arrows from X to new variables N and O, and the origin of each arrow out of X on G is either N or O (but never both) on G_ex. Then, with V∖X being the set of variables on G other than X, the marginal g-formula density f_{n=0,o=1}(v∖x) is identified from the distribution of the variables V on G whenever f(v) is a positive distribution by

f_{n=0,o=1}(v∖x) = ∏_{j: V_j is not a child of X on G} f(v_j | pa_j)
                  × ∏_{j: V_j is a child of O on G_ex} f(v_j | pa_j∖x, X = 1)
                  × ∏_{j: V_j is a child of N on G_ex} f(v_j | pa_j∖x, X = 0).

Note that if X has p children on G, there exist 2^p different graphs G_ex. The identifying formula for f_{n=0,o=1}(v∖x) in terms of f(v) depends on the graph G_ex. It follows that, under the assumption that a particular G_ex is associated with one of our four causal models, the intervention distribution f^int_{n=0,o=1}(v∖x) corresponding to that G_ex is identified under any of the four associated models.

We now discuss the relationship with path-specific effects. Avin et al. (2005) first define, for any counterfactual model associated with G, the path-specific effect on the density of V∖X when various paths on graph G have been blocked. Avin et al. (2005) further determine which path-specific densities are identified under the assumption that the NPSEM associated with G is true and provide the identifying formulae. The results of Avin et al. (2005) imply that the path-specific effect corresponding to the set of blocked paths on G being the paths from X to the subset of its children who are the children of N on any given G_ex is identified under the NPSEM assumption for G. Their identifying formula is precisely our f_{n=0,o=1}(v∖x) corresponding to this G_ex. In fact, our derivation implies that this path-specific effect on G is identified by f^int_{n=0,o=1}(v∖x) for this G_ex under the assumption that any of our four causal models associated with this G_ex holds, even without assuming that the NPSEM associated with the original graph G is true. Again, under the NPSEM assumption for G, all 2^p effects f^int_{n=0,o=1}(v∖x) as G_ex varies are identified, each by the formula f_{n=0,o=1}(v∖x) specific to the graph G_ex.
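To make the count of 2^p graphs concrete, consider the DAG of Figure 6.1, on which X's children are Z and Y, so p = 2. The sketch below (conditional probabilities invented for illustration) enumerates the four G_ex, each determined by which children of X are assigned to N; under the intervention n = 0, o = 1, N-children see X = 0 and O-children see X = 1.

```python
import itertools

# Invented observational CPTs for the DAG of Figure 6.1 (X -> Z -> Y, X -> Y),
# all variables binary; X's children are Z and Y, so p = 2 and 2**p = 4.
pZ_X = {0: 0.3, 1: 0.7}                                        # P(Z=1 | X=x)
pY_XZ = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.8}   # P(Y=1 | X=x, Z=z)

def g_formula_mean(n_children):
    """Mean of Y under f_{n=0,o=1} for the G_ex whose N-children are
    `n_children` (a subset of {'Z', 'Y'}): N-children see X = 0,
    O-children see X = 1."""
    xZ = 0 if 'Z' in n_children else 1
    xY = 0 if 'Y' in n_children else 1
    return sum(pY_XZ[(xY, z)] * (pZ_X[xZ] if z else 1 - pZ_X[xZ])
               for z in (0, 1))

effects = {frozenset(s): g_formula_mean(s)
           for r in range(3) for s in itertools.combinations(['Z', 'Y'], r)}
# effects[frozenset()]           -> E^int_{x=1}[Y]   (both children see X = 1)
# effects[frozenset({'Z'})]      -> the PDE functional sum_z E[Y|X=1,z] f(z|X=0)
# effects[frozenset({'Z', 'Y'})] -> E^int_{x=0}[Y]   (both children see X = 0)
```

The four partitions yield four generally distinct path-specific estimands, in line with the 2^p count above.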
A substantive scenario under which all 2^p effects f^int_{n=0,o=1}(v∖x) are simultaneously identified by the G_ex-specific formulae f_{n=0,o=1}(v∖x) is obtained by assuming an FFRCISTG model for an expanded graph on which N and O are replaced by a set of parents X′_j, j = 1, …, p, one for each child of X; X is the only parent of each X′_j; each X′_j has a single child; and X = X′_j w.p. 1 in the actual data. We consider the 2^p interventions that set a subset of the X′_j to 1 and the remainder to 0. The relationship of this analysis to the analysis based on the graphs G_ex containing N and O mimics the relationship of the analysis based on Figure 6.6(d), under the alternative scenario of the last subsection, to the analyses based on Figure 6.6(a) and Figure 6.6(b). In those latter analyses, X had precisely three children: Z, L, and Y. Avin et al. (2005) also show that other path-specific effects are identified under the NPSEM assumption for G. However, their results imply that whenever, for a given set of blocked paths, the path-specific density of V∖X is identified from data on V under an NPSEM associated with G, the identifying formula is equal to the g-formula density f_{n=0,o=1}(v∖x) corresponding to one of the 2^p graphs G_ex. Avin et al. (2005) provide an algorithm that can be used to find the appropriate G_ex corresponding to a given set of blocked paths. As discussed in Section 3.4, even the path-specific densities that are not identified under an NPSEM become identified under yet further untestable counterfactual independence assumptions and/or rank-preservation assumptions.
6 Conclusion

The results presented here, which are summarized in Table 6.1, appear to present a clear trade-off between the agnostic causal DAG, MCM, and FFRCISTG model frameworks and that of the NPSEM.

Table 6.1 Relations between causal models and estimands associated with the DAG shown in Figure 6.1; 'Direct Effects' comprises CDE, PDE, and PSDE; column 'D' indicates whether the contrast is defined in the model, 'I' whether it is identified.

Causal Model    Potential Outcome    CDE      PDE      PSDE     ETT, |X|=2   ETT, |X|>2
                Indep. Assumption    D  I     D  I     D  I     D  I         D  I
Agnostic DAG    None                 Y  Y     N  N     N  N     N  N         N  N
MCM             (6.5)                Y  Y     Y  N     Y  N     Y  Y         Y  N
FFRCISTG        (6.1)                Y  Y     Y  N     Y  N     Y  Y         Y  Y
NPSEM           (6.8)                Y  Y     Y  Y     Y  Y     Y  Y         Y  Y
In the NPSEM approach the PDE is identified, even though the result cannot be verified by a randomized experiment without making further assumptions. In contrast, the PDE is not identified under an agnostic causal DAG model or under an MCM/FFRCISTG model. Further, in Appendix A we show that the ETT can be identified under an MCM/FFRCISTG model even though the ETT cannot be verified by a randomized experiment without making further assumptions. Our analysis of Pearl's motivation for the PDE suggests that these dichotomies may not be as stark as they at first appear. We have shown that, in certain cases where one is interested in a prima facie non-manipulable causal parameter, the very fact that it is of interest implies that there also exists an extended DAG in which the same parameter is manipulable and identifiable in all the causal frameworks. Inevitably, such cases will be interpreted differently by NPSEM 'skeptics' and 'advocates.' Advocates may argue that if our conjecture holds, then we can work with NPSEMs and have some reassurance that in important cases of scientific interest we will have the option to go back to an agnostic causal DAG. Conversely, skeptics may conclude that if we are correct then this shows that it is advisable to avoid the NPSEM framework: agnostic causal DAGs are fully "testable" (with the usual caveats), and many non-manipulable NPSEM parameters that are of interest, but not identifiable within a non-NPSEM framework, can be identified in an augmented agnostic causal DAG. Undoubtedly, this debate is set to run and run . . .
Appendix A: The Effect of Treatment on the Treated: A Non-Manipulable Parameter

The primary focus of this chapter has been various contrasts assessing the direct effect of X on Y relative to an intermediate Z. In this appendix we discuss another non-manipulable parameter, the effect of treatment on the treated, in order to further clarify the differences among the agnostic, the MCM, and the FFRCISTG models. For our purposes, we shall only require the simplest possible causal model, based on the DAG X → Y obtained by marginalizing over Z in the graph in Figure 6.1. Let Y(0) denote the counterfactual Y(x) evaluated at x = 0. In a counterfactual causal model, the average effect of treatment on the treated is defined to be

ETT(x) ≡ E[Y(x) − Y(0) | X = x] = E^int_x[Y | X = x] − E^int_0[Y | X = x].
Minimal Counterfactual Models (MCMs)

In an MCM associated with the DAG X → Y, E[Y(x) | X = x] = E[Y | X = x], by the consistency assumption (iii) in Section 1.1. Thus,

ETT(x) = E[Y | X = x] − E[Y(0) | X = x].

Hence ETT(x) is identified iff the second term on the right is identified. First, note that

ETT(0) = E[Y | X = 0] − E[Y(0) | X = 0] = 0.

Now, by consistency condition (iii) in Section 1.1 and the MCM assumption, Equation (6.4), we have

E[Y | X = 0] = E[Y(0) | X = 0] = E[Y(0)].

By the law of total probability,

E[Y(0)] = Σ_x E[Y(0) | X = x] P(X = x).

Hence, it follows that

E[Y | X = 0] P(X ≠ 0) = Σ_{x: x ≠ 0} E[Y(0) | X = x] P(X = x).   (6.28)

In the special case where X is binary, so |X| = 2, the right-hand side of Equation (6.28) reduces to a single term and thus we have E[Y(0) | X = 1] = E[Y | X = 0]. It follows that for binary X we have

ETT(1) = E[Y | X = 1] − E[Y | X = 0]

under the MCM (and hence any counterfactual causal model). See Pearl (2010, pp. 396–7) for a similar derivation, though he does not make explicit that consistency is required. In contrast, if X is not binary, then the right-hand side of Equation (6.28) contains more than one unknown, so that ETT(x) for x ≠ 0 is not identified under the MCM. However, under an FFRCISTG model, condition (6.1) implies that E[Y(0) | X = x] = E[Y | X = 0], so ETT(x) is identified in this model, regardless of X's sample space. The parameter ETT(1) = E[Y(1) − Y(0) | X = 1] is not manipulable, relative to {X, Y}, even when X is binary, since, without further assumptions, we cannot experimentally observe Y(0) in subjects with X = 1.
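The binary-X identity ETT(1) = E[Y | X = 1] − E[Y | X = 0] can be checked by direct enumeration. In the sketch below (the joint law of the potential outcomes is invented), note that for binary X the MCM restriction Y(x) ⊥⊥ I(X = x) amounts to Y(0) and Y(1) each being independent of X; we impose this by taking the pair (Y(0), Y(1)) jointly independent of X, while leaving the two potential outcomes dependent on each other.

```python
import itertools

# Invented toy joint over (X, Y(0), Y(1)) for binary X, with
# (Y(0), Y(1)) independent of X but mutually dependent.
pX1 = 0.4
pY0Y1 = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.05, (1, 1): 0.25}  # P(Y(0)=y0, Y(1)=y1)

joint = {(x, y0, y1): (pX1 if x else 1 - pX1) * pY0Y1[(y0, y1)]
         for x, y0, y1 in itertools.product([0, 1], repeat=3)}

def E_Y_given_X(x):
    """E[Y | X = x], with the factual Y = Y(X) by consistency."""
    num = sum(p for (xx, y0, y1), p in joint.items()
              if xx == x and (y1 if x else y0))
    return num / sum(p for (xx, _, _), p in joint.items() if xx == x)

# ETT(1) computed directly from the counterfactual joint ...
ett1 = sum(p * (y1 - y0) for (x, y0, y1), p in joint.items() if x == 1) / pX1
# ... and from the observed-data functional derived in the text.
ett1_obs = E_Y_given_X(1) - E_Y_given_X(0)
```

For a ternary X the analogous check fails: Equation (6.28) then constrains only a weighted average of the two unknown terms E[Y(0) | X = 1] and E[Y(0) | X = 2].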
Note that even under the MCM with |X| > 2, the non-manipulable (relative to {X, Y}) contrast E[Y(0) | X ≠ 0] − E[Y | X ≠ 0], the effect of receiving X = 0 on those who did not receive X = 0, is identified, since E[Y(0) | X ≠ 0] is identified by the left-hand side of Equation (6.28).

We now turn to the agnostic causal model for the DAG X → Y. Although E^int_x[Y] is identified by the g-functional as E[Y | X = x], nonetheless, as expected for a non-manipulable causal contrast, the effect of treatment on the treated is not formally defined within the agnostic causal model, without further assumptions, even for binary X. Of course, the g-functional (see Definition 5) does define a joint distribution f_x(x*, y) for (X, Y) under which X takes the value x with probability 1. However, in spite of apparent notational similarities, the conditional density f_x(y | x*) expresses a different concept from that occurring in the definition of E^int_x[Y | X = x*] ≡ E[Y(x) | X = x*] in the counterfactual theory. The former relates to the distribution of Y among those individuals who (after the intervention) have the value X = x*, under an intervention which sets every unit's value to x; thus f_x(y | x*) = f(y | x) if x* = x and is undefined if x* ≠ x. The latter is based on the distribution of Y under an intervention fixing X = x among those people who would have had the value X = x* had we not intervened.

The minimality of the MCM among all counterfactual models that both satisfy the consistency assumption (iii) in Section 1.1 and identify the intervention distributions can be seen as follows. For binary X, the above argument for identification of the non-manipulable contrast ETT(1) under an MCM as the difference E[Y | X = 1] − E[Y | X = 0] follows directly, via the laws of probability, from the consistency assumption (iii) in Section 1.1 and the minimal independence assumption (6.5) required to identify the intervention distributions. In contrast, the additional independence assumptions (6.8) used to identify the PDE under the NPSEM for the DAG in Figure 6.1, or the additional independence assumptions used to identify ETT(1) for non-binary X under an FFRCISTG model for the DAG X → Y, are not needed to identify intervention distributions. Of course, as we have shown, it may be the case that the PDE is identified as an intervention contrast in an extended causal DAG containing additional variables; but identification in this extended causal DAG requires additional assumptions beyond those in the original DAG and hence does not follow merely from application of the laws of probability. Similarly, the ETT(1) for the causal DAG X → Y can be re-interpreted as an intervention contrast in an extended causal DAG containing additional variables, regardless of the dimension of X's state space. Specifically, Robins, VanderWeele, and Richardson (2007) showed that the ETT(x) parameter is defined and identified via the extended agnostic causal DAG in Figure 6.7
X* → X → Y
Figure 6.7 An extended DAG, with a treatment X ¼ X * and response Y, leading to an interventional interpretation of the effect of treatment on the treated (Robins, VanderWeele, & Richardson, 2007; Geneletti & Dawid, 2007). The thicker edge indicates a deterministic relationship.
that adds to the DAG X → Y a variable X* that is equal to X with probability 1 under f(v) = f(x*, x, y). Then E^int_x[Y | X* = x*] is identified by the g-formula as E[Y | X = x], because X is the only parent of Y. Furthermore, E^int_x[Y | X* = x*] has an interpretation as the effect on the mean of Y of setting X to x on those observed to have X* = x*, because X = X* with probability 1. Thus, though ETT(x) is not a manipulable parameter relative to the graph X → Y, it is manipulable relative to the variables {X*, X, Y} in the DAG in Figure 6.7. In the extended graph ETT(x) is identified by the same function of the observed data as E[Y(x) − Y(0) | X = x] in the original FFRCISTG model for non-binary X, or in the original MCM or FFRCISTG model for binary X. The substantive fact that would license the extended DAG of Figure 6.7 is that a measurement, denoted by X*, could be taken just before X occurs such that (i) in the observed data X* predicts X perfectly (i.e., X = X* w.p. 1) but (ii) an intervention exists that could, in principle, be administered in the small time interval between the X* measurement and the occurrence of X whose effect is to set X to a particular value x, say x = 0. As an example, let X* denote the event that a particular pill has been swallowed, X denote the event that the pill's contents enter the blood stream, and the intervention be the administration of an emetic that causes immediate regurgitation of the pill but otherwise has no effect on the outcome Y; see Robins, VanderWeele, and Richardson (2007).
A Model that Is an MCM but Not an FFRCISTG

In this section we describe a parametric counterfactual model, for the effect of a ternary treatment X on a binary response Y, that is an MCM associated with the graph in Figure 6.8 but is not an FFRCISTG. Let π = (π0, π1, π2) be a (vector-valued) latent variable with three components such that, in a given population, π ~ Dirichlet(α0, α1, α2), so that π0 + π1 + π2 = 1 w.p. 1. The joint distribution of the factual and counterfactual data is determined by the unknown parameters (α0, α1, α2). Specifically, the treatment X is ternary with states 0, 1, 2, and P(X = k | π) = πk; equivalently,

X | π ~ Multinomial(1, π).
Figure 6.8 (a) A simple graph; (b) A graph describing a confounding structure that leads to a counterfactual model that corresponds to the MCM but not the FFRCISTG associated with the DAG (a); thicker red edges indicate deterministic relations.
Now suppose that the response Y is binary and that the counterfactual outcomes Y(x) are as follows:

Y(x = 0) | π ~ Bernoulli(π1/(π1 + π2)),
Y(x = 1) | π ~ Bernoulli(π2/(π2 + π0)),
Y(x = 2) | π ~ Bernoulli(π0/(π0 + π1)).

Thus in this example, conditional on π, the potential outcome Y(x = k) 'happens' to be a realization of a Bernoulli random variable with probability of success equal to the probability of receiving treatment X = k + 1 mod 3 given that treatment X is not k. In what follows we will use [·] to indicate that an expression is evaluated mod 3. Now, since (π0, π1, π2) follows a Dirichlet distribution, it follows that

π_[i+1]/(π_[i+1] + π_[i+2]) ⊥⊥ π_i for i = 0, 1, 2.

Hence, in this example, for i = 0, 1, 2 we have Y(x = i) ⊥⊥ π_i. Further, I(X = i) ⊥⊥ Y(x = i) | π_i; hence, the model obeys the MCM independence restriction (6.5):

Y(x = i) ⊥⊥ I(X = i) for all i,

but not the FFRCISTG independence restriction (6.1), since Y(x = i) is not independent of I(X = j) for i ≠ j. We note that we have:

P(X = i) = E(π_i) = α_i/(α0 + α1 + α2),   (6.29)

π | X = i ~ Dirichlet(α_i + 1, α_[i+1], α_[i+2]),   (6.30)

Y(x = i) | X = i ~ Bernoulli(α_[i+1]/(α_[i+1] + α_[i+2])),   (6.31)

Y | X = i ~ Bernoulli(α_[i+1]/(α_[i+1] + α_[i+2])).   (6.32)
Equation (6.30) follows from standard Bayesian updating (since the Dirichlet distribution is conjugate to the multinomial). It follows that the vector of parameters α = (α0, α1, α2) is identified only up to a scale factor, since the likelihood for the observed variables satisfies f(x, y | α) = f(x, y | λα) for any λ > 0, by Equations (6.29) and (6.32). We note that since E(Y(x)) = E(Y | X = x),

ACE_{X→Y}(x) ≡ E(Y(x)) − E(Y(0)) = E(Y | X = x) − E(Y | X = 0)

and thus is identified. However, since

Y(x = 0) | X = 1 ~ Bernoulli((α1 + 1)/(α1 + 1 + α2)),
Y(x = 0) | X = 2 ~ Bernoulli(α1/(α1 + α2 + 1)),

and the probability of success in these distributions is not invariant under rescaling of the vector α, we conclude that these distributions are not identified from data on f(x, y). Consequently, ETT(x*) ≡ E[Y(x = x*) − Y(x = 0) | X = x*] is not identified under our parametric model.
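The two independence claims can be spot-checked by Monte Carlo (the choice α = (1, 1, 1), the seed, and the sample size below are arbitrary): the sample covariance of I(X = 0) and Y(x = 0) should be near 0, while that of I(X = 1) and Y(x = 0) should be near its population value E[π1·π1/(π1 + π2)] − E[π1] E[Y(x = 0)] = 2/9 − 1/6 = 1/18.

```python
import random

random.seed(2024)

def draw(alpha=(1.0, 1.0, 1.0)):
    """One draw of (X, Y(x=0)) from the ternary Dirichlet model of this section."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    pi = [v / s for v in g]                        # pi ~ Dirichlet(alpha)
    x = random.choices([0, 1, 2], weights=pi)[0]   # X | pi ~ Multinomial(1, pi)
    y0 = 1 if random.random() < pi[1] / (pi[1] + pi[2]) else 0
    return x, y0

n = 100_000
samples = [draw() for _ in range(n)]
mean = lambda v: sum(v) / len(v)

i_x0 = [1 if x == 0 else 0 for x, _ in samples]
i_x1 = [1 if x == 1 else 0 for x, _ in samples]
y0s = [y for _, y in samples]

# Cov(I(X=0), Y(x=0)) should vanish (the MCM restriction (6.5));
# Cov(I(X=1), Y(x=0)) should not (the FFRCISTG restriction (6.1) fails).
cov_same = mean([a * b for a, b in zip(i_x0, y0s)]) - mean(i_x0) * mean(y0s)
cov_diff = mean([a * b for a, b in zip(i_x1, y0s)]) - mean(i_x1) * mean(y0s)
```

Covariances only probe mean-independence, of course, but for the binary indicators involved here that is exactly the content of the two restrictions.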
Appendix B: A Data-Generating Process Leading to an FFRCISTG but not an NPSEM

Robins (2003) stated that it is hard to construct realistic (as opposed to mathematical) scenarios in which one would accept that the FFRCISTG model associated with Figure 6.1 held, but the NPSEM did not, and thus that CDEs are identified but PDEs are not. In this appendix we describe such a scenario. We leave it to the reader to judge its realism.

Suppose that a substance U that is endogenously produced by the body could both (i) decrease blood pressure, by reversibly binding to a membrane receptor on the blood pressure control cells in the carotid artery of the neck, and (ii) directly increase atherosclerosis, and thus MI, by stimulating the endothelial cells of the coronary arteries of the heart via an interaction with a particular protein, and that this protein is expressed in endothelial cells of the coronary arteries only when induced by the chemicals in tobacco smoke other than nicotine, e.g., tar. Further, suppose one mechanism by which nicotine increased blood pressure Z was by irreversibly binding to the membrane receptor for U on the blood pressure control cells in the carotid artery, the dose of nicotine in a smoker being sufficient to bind every available receptor. Then, under the assumption that there do not exist further unmeasured confounders for the effect of hypertension on MI, this scenario implies that it is reasonable to assume that any of the four causal models associated with the expanded DAG in Figure 6.9 is true. Here R measures the degree of binding of U to the membrane receptor in the blood pressure control cells; thus, R is zero in smokers of cigarettes containing nicotine. E measures the degree of stimulation of the endothelial cells of the coronary
(Nodes: X, N, O, U, Z, Y, with R ≡ (1 − N)U and E ≡ OU.)
Figure 6.9 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 but not an NPSEM; thicker edges denote deterministic relations.
artery by U. Thus, E is zero except in smokers (regardless of whether the cigarette contains nicotine). Before considering whether the NPSEM associated with Figure 6.1 holds, let us first study the expanded DAG of Figure 6.9. An application of the g-formula to the DAG in Figure 6.9 shows that the effect of not smoking, E^int_{n=0,o=0}[Y] = E^int_{x=0}[Y], and the effect of smoking, E^int_{n=1,o=1}[Y] = E^int_{x=1}[Y], are identified by E[Y | X = 0] and E[Y | X = 1] under all four causal models associated with Figure 6.9. However, the effect E^int_{n=0,o=1}[Y] of smoking nicotine-free cigarettes is not identified. Specifically,

E^int_{n=0,o=1}[Y] = Σ_{z,u} E[Y | O = 1, U = u, Z = z] f(z | U = u, N = 0) f(u)
                   = Σ_{z,u} E[Y | N = 1, O = 1, U = u, Z = z] f(z | U = u, N = 0, O = 0) f(u)
                   = Σ_{z,u} E[Y | X = 1, U = u, Z = z] f(z | U = u, X = 0) f(u),

where the first equality used the fact that E is a deterministic function of U and O and that R is a deterministic function of N and U; the second equality used d-separation; and the third, determinism. Thus, E^int_{n=0,o=1}[Y] is not a function of the density of the observed data on (X, Z, Y), because u occurs both in the term E[Y | X = 1, U = u, Z = z], where we have conditioned on X = 1, and in the term f(z | U = u, X = 0), where we have conditioned on X = 0. As a consequence, we do not obtain a function of the density of the observed data when we marginalize over U.
Since under all three counterfactual models associated with the extended DAG of Figure 6.9 E^int_{n=0,o=1}[Y] is equal to the parameter E[Y(x = 1, Z(x = 0))] of Figure 6.1, we conclude that E[Y(x = 1, Z(x = 0))], and thus the PDE, is not identified. Hence, the induced counterfactual model for the DAG in Figure 6.1 cannot be an NPSEM (as that would imply that the PDE would be identified). Furthermore, E^int_{n=0,o=1}[Y] is a manipulable parameter with respect to the DAG in Figure 6.3, since this DAG is obtained by marginalizing over U in the graph in Figure 6.9. However, as we showed above, E^int_{n=0,o=1}[Y] is not identified from the law of the factuals X, Y, Z, N, O, which are the variables in Figure 6.3. From this we conclude that none of the four causal models associated with the graph in Figure 6.3 can be true. Note that prima facie one might have thought that if the agnostic causal DAG in Figure 6.1 is true, then this would always imply that the agnostic causal DAG in Figure 6.3 is also true. This example demonstrates that such a conclusion is fallacious. Similar remarks apply to the MCM and FFRCISTG models. Additionally, for z = 0, 1, by applying the g-formula to the graph in Figure 6.9, we obtain that the joint effect of smoking and z, E^int_{n=1,o=1,z}[Y], and the joint effect of not smoking and z, E^int_{n=0,o=0,z}[Y], are identified by E[Y | X = 1, Z = z] and E[Y | X = 0, Z = z], respectively, under all four causal models for Figure 6.9. Since E^int_{n=0,o=0,z}[Y] and E^int_{n=1,o=1,z}[Y] are equal to the parameters E^int_{x=0,z}[Y] and E^int_{x=1,z}[Y] under all four causal models associated with the graph in Figure 6.1, we conclude that CDE(z) is also identified under all four causal models associated with Figure 6.1. The results obtained in the last two paragraphs are consistent with the FFRCISTG model and the MCM associated with the graph in Figure 6.1 holding but not the NPSEM. In what follows we prove such is the case.
Before doing so, we provide a simpler and more intuitive way to understand the above results by displaying in Figure 6.10 the subgraphs of Figure 6.9 corresponding to U, Z, Y when the variables N and O are set to each of their four possible joint values. We see that only when we set N ¼ 0 and O ¼ 1 is it the case that U is a common cause of both Z and Y (as setting N ¼ 0, O ¼ 1 makes R ¼ E ¼ U). Thus, we have int int ½Y ¼ En¼0;o¼0 ½YjZ ¼ z En¼0;o¼0;z
¼ E ½YjO ¼ 0; N ¼ 0; Z ¼ z ¼ E ½YjX ¼ 0; Z ¼ z; and int ½Y En¼1;o¼1;z
int ¼ En¼1;o¼1 ½YjZ ¼ z
¼ E ½YjO ¼ 1; N ¼ 1; Z ¼ z ¼ E ½YjX ¼ 1; Z ¼ z as O and N are unconfounded and Z is unconfounded when either we int ½Y 6¼ set O ¼ 1, N ¼ 1 or we set O ¼ 0, N ¼ 0. However, En¼0;o¼1;z int En¼0;o¼1 ½Yjz ¼ E½YjN ¼ 0;O ¼ 1;Z ¼ z as the effect of Z on Y is confounded
Causality and Psychopathology
152 (a)
U
(b)
U
(c)
U
(d)
Z
Z
Z
Z
Y
Y
Y
Y
U
Figure 6.10 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 holding but not the NPSEM: Causal subgraphs on U, Z, Y implied by the graph in Figure 6.9 when we intervene and set (a) N ¼ 0, O ¼ 0; (b) N ¼ 1, O ¼ 0; (c) N ¼ 0, O ¼ 1; (d) N ¼ 1, O ¼ 1. int int when we set N ¼ 0, O ¼ 1. It is because En¼0;o¼1;z ½Y 6¼ En¼0;o¼1 ½Yjz that int En¼0;o¼1 ½Y is not identified. If, contrary to Figure 6.9, there was no confounding between Y and Z when N is set to 0 and O is set to 1, then we would have int int ½Y ¼En¼0;o¼1 ½Yjz. It would then follow that En¼0;o¼1;z int ½Y ¼ En¼0;o¼1
X
int int En¼0;o¼1 ½Yjz fn¼0;o¼1 ½z
z
¼
X
int int En¼0;o¼1;z ½Y fn¼0;o¼1 ½z
z
¼
X
int int En¼1;o¼1;z ½Y fn¼0;o¼0 ½z
z
¼
X
E ½YjX ¼ 1; Z ¼ z f ½zjX ¼ 0;
z
where the third equality is from the fact that we suppose N has no direct effect on Y not through Z and O has no effect on Z. We conclude by showing that the MCM and FFRCISTG models associated with Figure 6.1 are true, but the NPSEM is not, if any of the three counterfactual models associated with Figure 6.9 are true. Specifically, the DAG in Figure 6.11 represents the DAG of Figure 6.1 with the counterfactuals for Z(x) and Y(x, z), the variable U of Figure 6.9, and common causes U1 and U2 of the Z(x) and the Y(x, z) added to the graph. Note that U being a common cause of Z and Y in Figures 6.9 and 6.10 only when we set N ¼ 0 and O ¼ 1 implies that U is only a common cause of Z(0), Y(1, 0), and Y(1, 1) in Figure 6.11. One can check using d-separation that the counterfactual independences in Figure 6.11 satisfy those required of an MCM or FFRCISTG model, but not those of an NPSEM, as Z(0) and Y(1, z) are dependent. However, Figure 6.11 contains more independences than are required for the FFRCISTG condition (6.1) applied to the DAG in Figure 6.1. In particular, in Figure 6.11 Z(1) and Y(0, z) are independent, which implies that E[Y(0, Z(1))] is identified by z E½YjX ¼ 0;Z ¼ z f ðzjX ¼ 1Þ and, thus, the the so-called total direct effect E[Y(1, Z(1))] E[Y(0, Z(1))] is also identified.
6 Alternative Graphical Causal Models
153
Figure 6.11 An example leading to an FFRCISTG corresponding to the DAG in Figure 6.1 but not an NPSEM: potential outcome perspective. Counterfactuals for Y are indexed Y(x, z). U, U1, and U2 indicate hidden confounders. Thicker edges indicate deterministic relations.
Finally, we note that we could easily modify our example to eliminate the independence of Z(1) and Y(0, z).
Appendix C: Bounds on the PDE under an FFRCISTG Model

In this Appendix we derive bounds on the PDE,

PDE = E[Y(x = 1, Z(x = 0))] − E[Y | X = 0],

under the assumption that the MCM or FFRCISTG model corresponding to the graph in Figure 6.1 holds and all variables are binary. Note

E[Y(x = 1, Z(x = 0))] = E[Y(x = 1, z = 0) | Z(x = 0) = 0] P(Z = 0 | X = 0) + E[Y(x = 1, z = 1) | Z(x = 0) = 1] P(Z = 1 | X = 0).

The two quantities E[Y(x = 1, z = 0) | Z(x = 0) = 0] and E[Y(x = 1, z = 1) | Z(x = 0) = 1] are constrained by the law of the observed data via

E[Y | X = 1, Z = 0] = E[Y(x = 1, z = 0)]
= E[Y(x = 1, z = 0) | Z(x = 0) = 0] P(Z(x = 0) = 0) + E[Y(x = 1, z = 0) | Z(x = 0) = 1] P(Z(x = 0) = 1)
= E[Y(x = 1, z = 0) | Z(x = 0) = 0] P(Z = 0 | X = 0) + E[Y(x = 1, z = 0) | Z(x = 0) = 1] P(Z = 1 | X = 0),
E[Y | X = 1, Z = 1] = E[Y(x = 1, z = 1)]
= E[Y(x = 1, z = 1) | Z(x = 0) = 0] P(Z(x = 0) = 0) + E[Y(x = 1, z = 1) | Z(x = 0) = 1] P(Z(x = 0) = 1)
= E[Y(x = 1, z = 1) | Z(x = 0) = 0] P(Z = 0 | X = 0) + E[Y(x = 1, z = 1) | Z(x = 0) = 1] P(Z = 1 | X = 0).

It then follows from the analysis in Richardson and Robins (2010, Section 2.2) that the set of possible values for the pair

(E[Y(x = 1, z = 0) | Z(x = 0) = 0], E[Y(x = 1, z = 1) | Z(x = 0) = 1])

compatible with the observed joint distribution f(z, y | x) is [l0, u0] × [l1, u1], where

l0 = max{0, 1 + (E[Y | X = 1, Z = 0] − 1)/P(Z = 0 | X = 0)},
u0 = min{E[Y | X = 1, Z = 0]/P(Z = 0 | X = 0), 1},
l1 = max{0, 1 + (E[Y | X = 1, Z = 1] − 1)/P(Z = 1 | X = 0)},
u1 = min{E[Y | X = 1, Z = 1]/P(Z = 1 | X = 0), 1}.

Hence, we have the following upper and lower bounds on the PDE:

max{0, P(Z = 0 | X = 0) + E[Y | X = 1, Z = 0] − 1} + max{0, P(Z = 1 | X = 0) + E[Y | X = 1, Z = 1] − 1} − E[Y | X = 0]
≤ PDE ≤
min{P(Z = 0 | X = 0), E[Y | X = 1, Z = 0]} + min{P(Z = 1 | X = 0), E[Y | X = 1, Z = 1]} − E[Y | X = 0].

Kaufman, Kaufman, and MacLehose (2009) obtain bounds on the PDE under assumption (6.2) but while allowing for confounding between Z and Y, i.e., not assuming that (6.3) holds, as we do. As we would expect, the bounds that we obtain are strictly contained in those obtained by Kaufman et al. (2009, see Table 2, row {50}). Note that when P(Z = z | X = 0) = 1,

PDE = CDE(z) = E[Y | X = 1, Z = z] − E[Y | X = 0, Z = z];

thus, in this case our upper and lower bounds on the PDE coincide and the parameter is identified. In contrast, Kaufman et al.'s upper and lower bounds on the PDE do not coincide when P(Z = z | X = 0) = 1. This follows because, under their assumptions, CDE(z) is not identified, but PDE = CDE(z) when P(Z = z | X = 0) = 1.
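The bounds depend only on four observed quantities, so they are easy to evaluate. A small sketch (the function name and all input values are ours, for illustration only):

```python
# Bounds on PDE = E[Y(x=1, Z(x=0))] - E[Y | X=0] under the FFRCISTG/MCM
# assumptions for Figure 6.1, all variables binary.  Function name and
# inputs are illustrative, not from the chapter's data.
def pde_bounds(p_z0_x0, e_y_x1_z0, e_y_x1_z1, e_y_x0):
    p_z1_x0 = 1.0 - p_z0_x0
    lower = (max(0.0, p_z0_x0 + e_y_x1_z0 - 1.0)
             + max(0.0, p_z1_x0 + e_y_x1_z1 - 1.0) - e_y_x0)
    upper = (min(p_z0_x0, e_y_x1_z0)
             + min(p_z1_x0, e_y_x1_z1) - e_y_x0)
    return lower, upper

lo, hi = pde_bounds(p_z0_x0=0.5, e_y_x1_z0=0.75, e_y_x1_z1=0.25, e_y_x0=0.25)
print(lo, hi)                            # 0.0 0.5

# Degenerate case P(Z = 0 | X = 0) = 1: the bounds coincide, so the PDE
# is identified, matching the remark in the text.
print(pde_bounds(1.0, 0.75, 0.25, 0.25))  # (0.5, 0.5)
```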
Appendix D: Interventions Restricted to a Subset: The FRCISTG Model

To describe the FRCISTG model for V = (V1, . . . , VM), we suppose that each Vm = (Lm, Am) is actually a composite of variables Lm and Am, one of which can be the empty set. The causal effect of intervening on any of the Lm variables is not defined. However, we assume that for any subset R of Ā = Ā_M = (A1, . . . , AM), the counterfactuals Vm(r) are well-defined for any r ∈ R. Specifically, we assume that the one-step-ahead counterfactuals Vm(ā_{m−1}) = (Lm(ā_{m−1}), Am(ā_{m−1})) exist for any setting of ā_{m−1} ∈ Ā_{m−1}. Note that it is implicit in this definition that Lk precedes Ak for all k. Next, we make the consistency assumption that the factual variables Vm and the counterfactual variables Vm(r) are obtained recursively from the Vm(ā_{m−1}). We do not provide a graphical characterization of parents. Rather, we say that the parents Pa_m of Vm consist of the smallest subset of Ā_{m−1} such that, for all ā_{m−1} ∈ Ā_{m−1}, Vm(ā_{m−1}) = Vm(pa_m), where pa_m is the sub-vector of ā_{m−1} corresponding to Pa_m. One can then view the parents Pa_m of Vm as the direct causes of Vm relative to the variables prior to Vm on which we can perform interventions. Finally, an FRCISTG model imposes the following independences:
(V_{m+1}(ā_m), . . . , V_M(ā_{M−1})) ⊥⊥ A_m(ā_{m−1}) | L̄_m = l̄_m, Ā_{m−1} = ā_{m−1}, for all m, ā_{M−1}, l̄_m.   (6.33)
Note that (6.33) can also be written
(V_{m+1}(ā_m), . . . , V_M(ā_{M−1})) ⊥⊥ A_m(ā_{m−1}) | L̄_m(ā_{m−1}) = l̄_m, Ā_{m−1} = ā_{m−1}, for all m, ā_{M−1}, l̄_m,

where L̄_m(ā_{m−1}) = (L_m(ā_{m−1}), L_{m−1}(ā_{m−2}), . . . , L_1). In the absence of inter-unit interference and non-compliance, data from a sequentially randomized experiment in which at each time m the treatment Am is randomly assigned, with the assignment probability at m possibly depending on the past (L̄_m, Ā_{m−1}), will follow an FRCISTG model; see Robins (1986) for further discussion. The analogous minimal causal model (MCM) with interventions restricted to a subset is defined by replacing A_m(ā_{m−1}) by I{A_m(ā_{m−1}) = a_m} in condition (6.33). It follows from Robins (1986) that our Extended Lemma 6 continues to hold when we substitute either 'FRCISTG model' or 'MCM with restricted interventions' for 'MCM' in the statement of the Lemma, provided we take R ⊆ Ā.
Likewise, we may define an agnostic causal model with restricted interventions to be the causal model that simply assumes that the interventional density of Z ⊆ V under treatment regime p_R, denoted f^int_{p_R}(z), for any R ⊆ Ā, is given by the g-functional density f_{p_R}(z) whenever f_{p_R}(z) is a well-defined function of f(v).

In Theorem 1 we proved that the set of defining conditional independences in condition (6.1) of an FFRCISTG model can be re-expressed as a set of unconditional independences between counterfactuals. An analogous result does not hold for an FRCISTG. However, the following theorem shows that we can remove past treatment history from the conditioning set in the defining conditional independences of an FRCISTG model, provided that we continue to condition on the counterfactuals L̄_m(ā_{m−1}).

Theorem 8. An FRCISTG model for V = (V1, . . . , VM), Vm = (Lm, Am), implies that for all m, ā_{M−1}, l̄_m,

(V_{m+1}(ā_m), . . . , V_M(ā_{M−1})) ⊥⊥ A_m(ā_{m−1}) | L̄_m(ā_{m−1}) = l̄_m.

Note that the theorem would not be true had we substituted the factual L̄_m for L̄_m(ā_{m−1}).
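For a sequentially randomized experiment of the kind that Appendix D ties to the FRCISTG model, the interventional mean of Y is given by the g-functional: a sum over covariate histories of the outcome regression weighted by the covariate law under the regime. A self-contained sketch for a two-occasion static regime (all data-generating probabilities are invented; this is not the chapter's example):

```python
import random

random.seed(0)

def draw(p):
    """Bernoulli(p) draw."""
    return 1 if random.random() < p else 0

# Simulate a sequentially randomized trial L1 -> A1 -> L2 -> A2 -> Y,
# where A1 is randomized and A2 is randomized given the observed past.
# All probabilities below are invented for illustration.
data = []
for _ in range(200_000):
    l1 = draw(0.5)
    a1 = draw(0.5)                        # randomized
    l2 = draw(0.2 + 0.3 * l1 + 0.3 * a1)  # covariate responds to history
    a2 = draw(0.3 + 0.4 * l2)             # randomized given (L1, A1, L2)
    y = draw(0.1 + 0.2 * a1 + 0.3 * a2 + 0.2 * l2)
    data.append((l1, a1, l2, a2, y))

def g_formula(rows, a1, a2):
    """g-functional estimate of E[Y(a1, a2)]:
    sum over (l1, l2) of E[Y | l1, a1, l2, a2] P(l2 | l1, a1) P(l1)."""
    n = len(rows)
    est = 0.0
    for l1 in (0, 1):
        p_l1 = sum(1 for r in rows if r[0] == l1) / n
        hist = [r for r in rows if r[0] == l1 and r[1] == a1]
        for l2 in (0, 1):
            p_l2 = sum(1 for r in hist if r[2] == l2) / len(hist)
            cell = [r for r in hist if r[2] == l2 and r[3] == a2]
            e_y = sum(r[4] for r in cell) / len(cell)
            est += e_y * p_l2 * p_l1
    return est

print(g_formula(data, a1=1, a2=1))  # close to the true E[Y(1,1)] = 0.73
```

Because treatment at each occasion is randomized given the recorded past, the g-functional here recovers the interventional mean; with confounding by unrecorded variables it would not.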
References

Avin, C., Shpitser, I., & Pearl, J. (2005). Identifiability of path-specific effects. In L. P. Kaelbling & A. Saffiotti (Eds.), IJCAI-05, Proceedings of the nineteenth international joint conference on artificial intelligence (pp. 357–363). Denver: Professional Book Center.

Dawid, A. P. (2000). Causal inference without counterfactuals. Journal of the American Statistical Association, 95(450), 407–448.

Didelez, V., Dawid, A., & Geneletti, S. (2006). Direct and indirect effects of sequential treatments. In R. Dechter & T. S. Richardson (Eds.), UAI-06, Proceedings of the 22nd annual conference on uncertainty in artificial intelligence (pp. 138–146). Arlington, VA: AUAI Press.

Frangakis, C. E., & Rubin, D. B. (1999). Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika, 86(2), 365–379.

Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58(1), 21–29.

Geneletti, S., & Dawid, A. P. (2007). Defining and identifying the effect of treatment on the treated (Tech. Rep. No. 3). Imperial College London, Department of Epidemiology and Public Health.

Gill, R. D., & Robins, J. M. (2001). Causal inference for complex longitudinal data: The continuous case. Annals of Statistics, 29(6), 1785–1811.

Hafeman, D., & VanderWeele, T. (2010). Alternative assumptions for the identification of direct and indirect effects. Epidemiology. (Epub ahead of print)
Heckerman, D., & Shachter, R. D. (1995). A definition and graphical representation for causality. In P. Besnard & S. Hanks (Eds.), UAI-95, Proceedings of the eleventh annual conference on uncertainty in artificial intelligence (pp. 262–273). San Francisco: Morgan Kaufmann.

Imai, K., Keele, L., & Yamamoto, T. (2009). Identification, inference, and sensitivity analysis for causal mediation effects (Tech. Rep.). Princeton University, Department of Politics.

Kaufman, S., Kaufman, J. S., & MacLehose, R. F. (2009). Analytic bounds on causal risk differences in directed acyclic graphs involving three observed binary variables. Journal of Statistical Planning and Inference, 139(10), 3473–3487.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo: Morgan Kaufmann.

Pearl, J. (2000). Causality. Cambridge: Cambridge University Press.

Pearl, J. (2001). Direct and indirect effects. In J. S. Breese & D. Koller (Eds.), UAI-01, Proceedings of the 17th annual conference on uncertainty in artificial intelligence (pp. 411–420). San Francisco: Morgan Kaufmann.

Pearl, J. (2010). An introduction to causal inference. The International Journal of Biostatistics, 6(2). (DOI: 10.2202/1557-4679.1203)

Petersen, M., Sinisi, S., & van der Laan, M. (2006). Estimation of direct causal effects. Epidemiology, 17(3), 276–284.

Richardson, T. S., & Robins, J. M. (2010). Analysis of the binary instrumental variable model. In R. Dechter, H. Geffner, & J. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 415–444). London: College Publications.

Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods – applications to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393–1512.

Robins, J. M. (1987). Addendum to "A new approach to causal inference in mortality studies with sustained exposure periods – applications to control of the healthy worker survivor effect." Computers and Mathematics with Applications, 14, 923–945.

Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct and indirect effects. In P. Green, N. Hjort, & S. Richardson (Eds.), Highly structured stochastic systems (pp. 70–81). Oxford: Oxford University Press.

Robins, J. M., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155.

Robins, J. M., & Greenland, S. (2000). Comment on "Causal inference without counterfactuals." Journal of the American Statistical Association, 95(450), 431–435.

Robins, J. M., Richardson, T. S., & Spirtes, P. (2009). Identification and inference for direct effects (Tech. Rep. No. 563). University of Washington, Department of Statistics.

Robins, J. M., Rotnitzky, A., & Vansteelandt, S. (2007). Discussion of "Principal stratification designs to estimate input data missing due to death" by Frangakis, C. E., Rubin, D. B., An, M., & MacKenzie, E. Biometrics, 63(3), 650–653.

Robins, J. M., VanderWeele, T. J., & Richardson, T. S. (2007). Discussion of "Causal effects in the presence of non compliance: a latent variable interpretation" by Forcina, A. Metron, LXIV(3), 288–298.

Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104, 587–592.

Rubin, D. B. (1998). More powerful randomization-based p-values in double-blind trials with non-compliance. Statistics in Medicine, 17, 371–385.
Rubin, D. B. (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics, 31(2), 161–170.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction and search (No. 81). New York: Springer-Verlag.

VanderWeele, T., & Robins, J. (2007). Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect. American Journal of Epidemiology, 166(9), 1096–1104.
7 General Approaches to Analysis of Course: Applying Growth Mixture Modeling to Randomized Trials of Depression Medication

bengt muthén, hendricks c. brown, aimee m. hunter, ian a. cook, and andrew f. leuchter
Introduction

This chapter discusses the assessment of treatment effects in longitudinal randomized trials using growth mixture modeling (GMM) (Muthén & Shedden, 1999; Muthén & Muthén, 2000; Muthén et al., 2002; Muthén & Asparouhov, 2009). GMM is a generalization of conventional repeated-measurement mixed-effects (multilevel) modeling. It captures unobserved subject heterogeneity in trajectories not only by random effects but also by latent classes corresponding to qualitatively different types of trajectories. It can be seen as a combination of conventional mixed-effects modeling and cluster analysis, which also allows prediction of class membership and estimation of each individual's most likely class membership. GMM has particularly strong potential for analyses of randomized trials because it responds to the need to investigate for whom a treatment is effective by allowing for different treatment effects in different trajectory classes. The chapter is motivated by a University of California, Los Angeles study of depression medication (Leuchter, Cook, Witte, Morgan, & Abrams, 2002). Data on 94 subjects are drawn from a combination of three studies carried out with the same design, using three different types of medication: fluoxetine (n = 14), venlafaxine IR (n = 17), and venlafaxine XR (n = 18). Subjects were measured at baseline and again after a 1-week placebo lead-in phase. In the subsequent double-blind phase of the study, the subjects were randomized into medication (n = 49) and placebo (n = 45) groups. After randomization, subjects were measured at nine occasions: at 48 hours and at weeks 1–8. The current analyses consider the Hamilton Depression Rating Scale.
Several predictors of course of the Hamilton scale trajectory are available, including gender, treatment history, and a baseline measure of central cordance hypothesized to influence tendency to respond to treatment. The results of studies of this kind are often characterized in terms of an end point analysis where the outcome at the end of the study, here at 8 weeks, is considered for the placebo group and for the medication group. A subject may be classified as a responder by showing a week 8 depression score below 10 or when dropping below 50% of the initial score. The treatment effect may be assessed by comparing the medication and placebo groups with respect to the ratio of responders to nonresponders. As an alternative to end point analysis, conventional repeated measurement mixed-effects (multilevel) modeling can be used. Instead of focusing on only the last time point, this uses the outcome at all time points, the two pretreatment occasions and the nine posttreatment occasions. The trajectory shape over time is of key interest and is estimated by a model that draws on the information from all time points. The idea of considering trajectory shape in research on depression medication has been proposed by Quitkin et al. (1984), although not using a formal statistical growth model. Rates of response to treatment with antidepressant drugs are estimated to be 50%–60% in typical patient populations. Of particular interest in this chapter is how to assess treatment effects in the presence of a placebo response. A placebo response is an improvement in depression ratings seen in the placebo group that is unrelated to medication. The improvement is often seen as an early steep drop in depression, often followed by a later upswing. An example is seen in Figure 7.2. A placebo response confounds the estimation of the true effect of medication and is an important phenomenon given its high prevalence of 25%–60% (Quitkin, 1984). 
Because the placebo response is pervasive, the statistical modeling must take it into account when estimating medication effects. This can be done by acknowledging the qualitative heterogeneity in trajectory shapes for responders and nonresponders. It is important to distinguish among responder and nonresponder trajectory shapes in both the placebo and medication groups. Conventional repeated measures modeling may lead to distorted assessment of medication effects when individuals follow several different trajectory shapes. GMM avoids this problem while maintaining the repeated measures modeling advantages. The chapter begins by considering GMM with two classes, a nonresponder class and a responder class. The responder class is defined as those individuals who respond in the placebo group and who would have responded to placebo among those in the medication group. Responder class membership is observed for subjects in the placebo group but is unobserved in the medication group. Because of randomization, it can be assumed that this class of subjects is present in both the placebo and medication groups
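The end-point responder rule described above (a week 8 depression score below 10, or a week 8 score below 50% of the initial score) can be sketched as a simple function; the cutoffs follow the text, while the function name and example scores are ours:

```python
# End-point responder classification as described in the text:
# responder if week-8 HamD < 10, or if week-8 score is below 50% of baseline.
# Function name and example values are illustrative, not from the study data.
def is_responder(baseline, week8):
    return week8 < 10 or week8 < 0.5 * baseline

print(is_responder(22, 9))    # True: below the absolute cutoff of 10
print(is_responder(30, 14))   # True: more than a 50% drop from baseline
print(is_responder(20, 12))   # False: neither criterion is met
```

The end-point treatment effect would then compare the resulting responder-to-nonresponder ratios between the medication and placebo groups.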
and in equal numbers. GMM can identify the placebo responder class in the medication group. Having identified the placebo responder and placebo nonresponder classes in both the placebo and medication groups, medication effects can more clearly be identified. In one approach, the medication effect is formulated in terms of an effect of medication on the trajectory slopes after the treatment phase has begun. This medication effect is allowed to be different for the nonresponder and responder trajectory classes. Another approach formulates the medication effect as increasing the probability of membership in advantageous trajectory classes and decreasing the probability of membership in disadvantageous trajectory classes.
Growth Mixture Modeling

This section gives a brief description of the GMM in the context of the current study. A two-piece, random-effect GMM is applied to the Hamilton Depression Rating Scale outcomes at the 11 time points y1–y11. The first piece refers to the two time points y1 and y2 before randomization, and the second piece refers to the nine postrandomization time points y3–y11. Given only two time points, the first piece is by necessity taken as a linear model with a random intercept, defined at baseline, and a fixed-effect slope. An exploration of each individual's trajectory suggests a quadratic trajectory shape for the second piece. The growth model for the second piece is centered at week 8, defining the random intercept as the systematic variation at that time point. All random-effect means are specified as varying across latent trajectory classes. The medication effect is captured by a regression of the linear and quadratic slopes in the second piece on a medication dummy variable. These medication effects are allowed to vary across the latent trajectory classes. The model is shown in diagrammatic form at the top of Figure 7.1.¹ The statistical specification is as follows. Consider the depression outcome y_it for individual i, let c denote the latent trajectory class variable, let g denote random effects, let a_t denote time, and let ε_it denote residuals containing measurement error and time-specific variation. For the first, prerandomization piece, conditional on trajectory class k (k = 1, 2, . . . , K),
1. In Figure 7.1 the observed outcomes are shown in boxes and the random effects in circles. Here, i, s, and q denote intercept, linear slope, and quadratic slope, respectively. In the following formulas, these random effects are referred to as g0, g1, and g2. The treatment dummy variable is denoted x.
Figure 7.1 Two alternative GMM approaches.

y_it^pre | c_i = k  =  g_0i^pre + g_1i^pre a_t + ε_it^pre,   (1)

with a_1 = 0 to center at baseline, and random effects

g_0i^pre | c_i = k  =  α_0k^pre + ζ_0i^pre,   (2)
g_1i^pre | c_i = k  =  α_1k^pre + ζ_1i^pre;   (3)

with only two prerandomization time points, the model is simplified to assume a nonrandom slope, V(ζ_1^pre) = 0, for identification purposes. For the second, postrandomization piece,

y_it | c_i = k  =  g_0i + g_1i a_t + g_2i a_t² + ε_it,   (4)

with a_11 = 0, defining g_0i as the week 8 depression status. The remaining a_t values are set according to the distance in timing of measurements. Assume for simplicity a single drug and denote the medication status for individual i by the dummy variable x_i (x = 0 for the placebo group and x = 1 for the medication group).²

2. In the application three dummy variables are used to represent the three different medications.

The random effects are allowed to be influenced by
group and a covariate, w, with their distributions varying as a function of trajectory class k:

g_0i | c_i = k  =  α_0k + γ_01k x_i + γ_02k w_i + ζ_0i,   (5)
g_1i | c_i = k  =  α_1k + γ_11k x_i + γ_12k w_i + ζ_1i,   (6)
g_2i | c_i = k  =  α_2k + γ_21k x_i + γ_22k w_i + ζ_2i.   (7)

The residuals ζ_i in the first and second pieces have a 4 × 4 covariance matrix Ψ_k, here taken to be constant across classes k. For both pieces the residuals ε_it have a T × T covariance matrix Θ_k, here taken to be constant across classes. For simplicity, Ψ_k and Θ_k are assumed to not vary across treatment groups. As seen in equations 5–7, the placebo group (x_i = 0) consists of subjects from the two different trajectory classes that vary in the means of the growth factors, which in the absence of the covariate w are represented by α_0k, α_1k, and α_2k. This gives the average depression development in the absence of medication. Because of randomization, the placebo and medication groups are assumed to be statistically equivalent at the first two time points. This implies that x is assumed to have no effect on g_0i^pre or g_1i^pre in the first piece of the development. Medication effects are described in the second piece by γ_01k, γ_11k, and γ_21k as a change in average growth rate that can be different for the classes. This model allows the assessment of medication effects in the presence of a placebo response. A key parameter is the medication-added mean of the intercept random effect centered at week 8. This is the γ_01k parameter of equation 5. It indicates how much lower or higher the average score is at week 8 for the medication group relative to the placebo group in the trajectory class considered. In this way, the medication effect is specific to classes of individuals who would or would not have responded to placebo. The modeling will be extended to allow the three drugs of this study to have different γ parameters in equations 5–7. Class membership can be influenced by baseline covariates as expressed by a logistic regression (e.g., with two classes),

log[P(c_i = 1 | w_i) / P(c_i = 2 | w_i)]  =  α_c + γ_c w_i,   (8)

where c = 1 may refer to the nonresponder class and c = 2 to the responder class. It may be noted that this model assumes that medication status does not influence class membership. Class membership is conceptualized as a quality characterizing an individual before entering the trial.
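Under equations 4–7 (omitting the covariate w), the model-implied postrandomization mean curve for class k and treatment x is (α_0k + γ_01k x) + (α_1k + γ_11k x) a_t + (α_2k + γ_21k x) a_t², with a_t = 0 at week 8, so the coefficient on x for the intercept is exactly the medication-added week 8 mean. A sketch with invented parameter values (not estimates from this study):

```python
# Model-implied postrandomization class means under equations 4-7, with the
# covariate w omitted.  All parameter values are invented for illustration.
def mean_curve(alpha, gamma, time_scores, x):
    """E[y_t | class, x] = (a0 + g0 x) + (a1 + g1 x) t + (a2 + g2 x) t^2."""
    a0, a1, a2 = (alpha[j] + gamma[j] * x for j in range(3))
    return [a0 + a1 * t + a2 * t ** 2 for t in time_scores]

weeks = [1, 2, 3, 4, 5, 6, 7, 8]
a_t = [w - 8 for w in weeks]        # centered so that a_t = 0 at week 8

alpha_k = (11.0, -0.5, 0.15)        # class means: week-8 level, slope, curvature
gamma_k = (-4.0, 0.0, 0.0)          # medication lowers the week-8 level by 4 points

placebo = mean_curve(alpha_k, gamma_k, a_t, x=0)
medicated = mean_curve(alpha_k, gamma_k, a_t, x=1)
print(placebo[-1], medicated[-1])   # 11.0 7.0: the intercept gamma is the week-8 difference
```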
A variation of the modeling will focus on postrandomization time points. Here, an alternative conceptualization of class membership is used. Class membership is thought of as being influenced by medication so that the class probabilities are different for the placebo group and the three medication groups. Here, the medication effect is quantified in terms of differences across groups in class probabilities. This model is shown in diagrammatic form at the bottom of Figure 7.1. It is seen that the GMM involves only the postrandomization outcomes, which is logical given that treatment influences the latent class variable, which in turn influences the posttreatment outcomes. In addition to the treatment variable, pretreatment outcomes may be used as predictors of latent class, as indicated in the figure. The treatment and pretreatment outcomes may interact in their influence on latent class membership.
Estimation and Model Choice

The GMM can be fitted within the general latent variable framework of the Mplus program (Muthén & Muthén, 1998–2008). Estimation is carried out using maximum likelihood via an expectation-maximization (EM) algorithm. Missing data under the missing-at-random (MAR) assumption are allowed for the outcomes. Given an estimated model, estimated posterior probabilities for each individual and each class are produced. Individuals can be classified into the class with the highest probability. The classification quality is summarized in an entropy value with range 0–1, where 1 corresponds to the case where all individuals have probability 1 for one class and 0 for the others. For model-fitting strategies, see Muthén et al. (2002), Muthén (2004), and Muthén and Asparouhov (2008). A common approach to deciding on the number of classes is to use the Bayesian information criterion (BIC), which puts a premium on models with large log-likelihood values and a small number of parameters. The lower the BIC, the better the model. Analyses of depression trial data have an extra difficulty due to the typically small sample sizes. Little is known about the performance of BIC for samples as small as in the current study. Bootstrapped likelihood ratio testing can be performed in Mplus (Muthén & Asparouhov, 2008), but the power of such testing may not be sufficient at these sample sizes. Plots showing the agreement between the class-specific estimated means and the trajectories for individuals most likely belonging to a class can be useful for visually inspecting models but are of only limited value in choosing between models. A complication of maximum-likelihood GMM is the presence of local maxima. These are more prevalent with smaller samples such as the current ones for the placebo group, the medication group, as well as for the combined sample. To be confident that a global maximum has been found, many
random starting values need to be used and the best log-likelihood value needs to be replicated several times. In the present analyses, between 500 and 4,000 random starts were used depending on the complexity of the model.
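The two model-choice summaries used here can be stated compactly: BIC = −2 log L + p ln n (lower is better), and the relative entropy 1 − Σ_i Σ_k (−p_ik ln p_ik)/(n ln K) computed from the posterior class probabilities p_ik. We assume these standard formulations match the values reported by Mplus; as a check, the reported log-likelihood and parameter count for the two-class placebo model below reproduce the reported BIC:

```python
import math

def bic(log_likelihood, n_params, n):
    """BIC = -2 log L + p ln(n); lower values indicate a better model."""
    return -2.0 * log_likelihood + n_params * math.log(n)

def entropy(posteriors):
    """Relative entropy of a mixture classification: 1 for perfectly sharp
    posterior class probabilities, 0 for completely uninformative ones."""
    n, k = len(posteriors), len(posteriors[0])
    h = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - h / (n * math.log(k))

# Reported two-class placebo model: logL = -1,055.974, 28 parameters, n = 45.
print(round(bic(-1055.974, 28, 45)))                    # 2219, as reported
# Toy posterior matrices (invented) showing the range of the entropy summary:
print(round(entropy([[0.99, 0.01], [0.02, 0.98]]), 2))  # 0.89: fairly sharp
print(round(entropy([[0.5, 0.5], [0.5, 0.5]]), 6))      # 0.0: chance-level
```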
Growth Mixture Analyses

In this section the depression data are analyzed in three steps using GMM. First, the placebo group is analyzed alone. Second, the medication group is analyzed alone. Third, the placebo and medication groups are analyzed jointly according to the GMM just presented in order to assess the medication effects.
Analysis of the Placebo Group

A two-class GMM analysis of the 45 subjects in the placebo group resulted in the model-estimated mean curves shown in Figure 7.2. As expected, a responder class (class 1) shows a postrandomization drop in the depression score, with a low of 7.9 at week 5 and an upswing to 10.8 at week 8. An estimated 32% of the subjects belong to the responder class. In contrast, the nonresponder class has a relatively stable level for weeks 1–8, ending with a depression score of 15.6 at week 8. The sample standard deviation at week 8 is 7.6. It may be noted that the baseline score is only slightly higher for the nonresponder class, 22.7 vs. 21.9. The standard deviation at baseline is 3.6.³ The observed trajectories of individuals classified into the two classes are plotted in Figures 7.3a and 7.3b as broken lines, whereas the solid curves show the model-estimated means. The figure indicates that the estimated mean curves represent the individual development rather well, although there is a good amount of individual variation around the mean curves. It should be noted that classification of subjects based on the trajectory-shape approach of GMM will not agree with that using end-point analysis. As an example, the nonresponder class of Figure 7.3b shows two subjects with scores less than 5 at week 8. The individual with the lowest score at week 8, however, has a trajectory that agrees well with the nonresponder mean curve for most of the trial, deviating from it only during the last 2 weeks. The week 8 score has a higher standard deviation than at earlier time points, thereby weighting this time point somewhat less. Also, the data coverage due to
3. The maximum log-likelihood value for the two-class GMM of Figure 7.2 is −1,055.974, which is replicated across many random starts, with 28 parameters and a BIC value of 2,219. The classification based on the posterior class probabilities is not clear-cut in that the classification entropy value is only 0.66.
Figure 7.2 Two-class GMM for placebo group.
missing observations is considerably lower for weeks 5–7 than for other weeks, reducing the weight of these time points. The individual with the second-lowest score at week 8 deviates from the mean curve at week 5 but has missing data for weeks 6 and 7. This person is also ambiguously classified in terms of his or her posterior probability of class membership. To further explore the data, a three-class GMM was also fitted to the 45 placebo subjects. Figure 7.4a shows the mean curves for this solution, which no longer shows a clear-cut responder class. Class 2 (49%) declines early, but its mean score does not go below 14. Class 1 (22%) ends with a mean score of 10.7 but does not show the expected responder trajectory shape of an early decline.⁴ A further analysis investigated whether the lack of a clear responder class in the three-class solution is due to the sample size of n = 45 being too small to support three classes. In this analysis, the n = 45 placebo-group subjects were augmented by the medication-group subjects, but using only the two prerandomization time points from the medication group. Because of randomization, subjects are statistically equivalent before randomization, so this approach is valid. The first, prerandomization piece of the GMM has nine parameters, leaving only 25 parameters to be estimated in the second, postrandomization piece by

4. The log-likelihood value for the model in Figure 7.4a is −1,048.403, replicated across several random starts, with 34 parameters and a BIC value of 2,226. Although the BIC value is slightly worse than for the two-class solution, the classification is better, as shown by the entropy value of 0.85.
Figure 7.3 Individual trajectories for placebo subjects classified into (a) the responder class and (b) the non-responder class.
the n = 45 placebo subjects alone. Figure 7.4b shows that a responder class (class 2) is now found, with 21% of the subjects estimated to be in this class. High (class 3) and low (class 1) nonresponder classes are found, with 18% and 60% estimated to be in these classes, respectively. Compared to Figure 7.3, the observed individual trajectories within class are somewhat less heterogeneous (trajectories not shown).⁵
5. The log-likelihood value for the model in Figure 7.4b is −1,270.030, replicated across several random starts, with 34 parameters and a BIC value of 2,695. The entropy value is 0.62. Because a different sample size is used, these values are not comparable to the earlier ones.
Figure 7.4 (a) Three-class GMM for placebo group. (b) Three-class GMM for placebo group and pre-randomization medication group individuals.
Analysis of the Medication Group
Two major types of GMMs were applied to the medication group. The first type analyzes all time points and either makes no distinction among the three drugs (fluoxetine, venlafaxine IR, venlafaxine XR) or allows drug differences in the class-specific random effect means of the second piece of the GMM. It would not make sense to also let class membership vary as a function of drug, since class membership is conceptualized as a quality characterizing an individual before entering the trial. Class membership influences prerandomization outcomes, which cannot be influenced by the drugs. To investigate class membership, the second type of GMM analyzes the nine postrandomization time points, both to focus on the period where the medications have an effect and to let class membership correspond to only postrandomization variables. Here, not only are differences across the three drugs allowed in the random effect means for each of the classes, but the drug type is also allowed to influence the class probabilities.
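The two-piece structure (separate growth processes before and after randomization) can be illustrated by simulating trajectories from a two-class, two-piece linear growth model. All parameter values below are invented for illustration; they are not estimates from the chapter's data.

```python
import random

random.seed(7)

def simulate_subject(cls):
    """Draw one HamD-like trajectory from a two-piece linear growth model.

    Piece 1 covers the two prerandomization time points (baseline, 48-hr
    lead-in); piece 2 covers the nine postrandomization weeks.
    """
    # Class-specific random effects: intercept and the two piece slopes
    if cls == 0:  # "responder"-like class: steady postrandomization decline
        i, s1, s2 = random.gauss(24, 2), random.gauss(-1, 0.3), random.gauss(-2, 0.4)
    else:         # "nonresponder"-like class: flat after randomization
        i, s1, s2 = random.gauss(26, 2), random.gauss(-1, 0.3), random.gauss(0, 0.4)
    times1 = [0, 1]               # prerandomization time scores
    times2 = list(range(1, 10))   # postrandomization time scores (weeks)
    y = [i + s1 * t + random.gauss(0, 1) for t in times1]
    y += [i + s1 * 1 + s2 * t + random.gauss(0, 1) for t in times2]
    return y

trajectories = [simulate_subject(cls) for cls in [0] * 30 + [1] * 15]
print(len(trajectories), len(trajectories[0]))  # 45 subjects, 11 time points
```

The key design choice mirrored here is that the second-piece slope applies only after randomization, so treatment can shift postrandomization growth without touching the prerandomization piece.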
Analysis of All Time Points
A two-class GMM analysis of the 49 subjects in the medication group resulted in the model-estimated mean curves shown in Figure 7.5. As expected, one of the classes is a large responder class (class 1, 85%). The other class (class 2, 15%) improves initially but then worsens.6
A three-class GMM analysis of the 49 subjects in the medication group resulted in the model-estimated mean curves shown in Figure 7.6. The three mean curves show the expected responder class (class 3, 68%) and the class found in the two-class solution that improves initially but later worsens (class 2, 15%). In addition, a nonresponse class (class 1, 17%) emerges, which shows no medication effect throughout.7
Allowing for drug differences in the class-specific random effect means of the second piece of the GMM did not give a trustworthy solution, in that the best log-likelihood value was not replicated. This may be because the model has more parameters than subjects (59 vs. 49).
Analysis of Postrandomization Time Points
As a first step, two- and three-class analyses of the nine postrandomization time points were performed, not allowing for differences across the three drugs. This gave solutions very similar to those of Figures 7.5 and 7.6. The similarity in mean trajectory shape held up also when allowing class probabilities to vary as a function of drug. Figure 7.7 shows the estimated mean curves for this latter model. The estimated class probabilities for the three drugs show that in the responder class (class 2, 63%), 21% of the subjects are on fluoxetine, 29% are on venlafaxine IR, and 50% are on venlafaxine XR. For the nonresponder class that shows an initial improvement and a later worsening (class 3, 19%), 25% are on fluoxetine, 75% are on venlafaxine IR, and 0% are on venlafaxine XR.
For the nonresponder class that shows no improvement at any point (class 1, 19%), 58% are on fluoxetine, 13% are on venlafaxine IR, and 29% are on venlafaxine XR. Judged across all three trajectory classes, this suggests that venlafaxine XR has the best outcome, followed by venlafaxine IR, with fluoxetine last. Note, however, that for these data subjects were not randomized to the different medications; therefore, comparisons among medications are confounded by subject differences.8
6. The log-likelihood value for the model in Figure 7.5 is –1,084.635, replicated across many random starts, with 28 parameters and a BIC value of 2,278. The entropy value is 0.90.
7. The log-likelihood value for the model in Figure 7.6 is –1,077.433, replicated across many random starts, with 34 parameters and a BIC value of 2,287. The BIC value is worse than for the two-class solution. The entropy value is 0.85.
8. The log-likelihood value for the model of Figure 7.7 is –873.831, replicated across many random starts, with 27 parameters and a BIC value of 1,853. The entropy value is 0.79.
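The entropy values quoted in the footnotes summarize how cleanly subjects are classified. A sketch of the standard relative-entropy measure, E = 1 − Σᵢₖ(−p̂ᵢₖ ln p̂ᵢₖ)/(n ln K), computed over each subject's posterior class probabilities p̂ᵢₖ (the posterior probabilities below are illustrative, not from the chapter's data):

```python
import math

def relative_entropy(posteriors):
    """Mixture-model classification entropy.

    posteriors: list of per-subject posterior class-probability lists.
    Returns a value in [0, 1]; values near 1 mean clear-cut classification.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))

# Two hypothetical subjects classified almost surely, one fully ambiguous
clear = [[0.98, 0.02], [0.95, 0.05], [0.50, 0.50]]
print(relative_entropy(clear))  # pulled well below 1 by the ambiguous subject

sharp = [[0.99, 0.01], [0.01, 0.99], [0.98, 0.02]]
print(relative_entropy(sharp))  # close to 1
```

This is why the text can prefer a model with slightly worse BIC when its entropy is higher: the two indices measure different things.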
[Figure 7.5: estimated mean HamD curves from baseline through week 8. Class 1, 84.7%; Class 2, 15.3%.]
Figure 7.5 Two-class GMM for medication group.
[Figure 7.6: estimated mean HamD curves from baseline through week 8. Class 1, 16.9%; Class 2, 14.9%; Class 3, 68.2%.]
Figure 7.6 Three-class GMM for medication group.
[Figure 7.7: estimated mean HamD curves from 48 hours through week 8. Class 1, 18.6%; Class 2, 62.9%; Class 3, 18.6%.]
Figure 7.7 Three-class GMM for medication group post randomization.
As a second step, a three-class GMM was fitted in which not only the class-membership probabilities but also the class-specific random effect means were allowed to vary across the three drugs. This analysis showed no significant drug differences in class-membership probabilities. As shown in Figure 7.8, however, the classes are of an essentially different nature for the three drugs.9
Analysis of Medication Effects, Taking Placebo Response Into Account
The separate analyses of the 45 subjects in the placebo group and the 49 subjects in the medication group provide the basis for the joint analysis of all 94 subjects. Two types of GMMs will be applied. The first is directly in line with the model shown earlier under Growth Mixture Modeling, where medication effects are conceptualized as postrandomization changes in the slope means. The second type uses only the postrandomization time points, and class membership is thought of as being influenced by
9. The log-likelihood value for the model of Figure 7.8 is –859.577, replicated in only a few random starts, with 45 parameters and a BIC value of 1,894. The entropy value is 0.81. It is difficult to choose between the model of Figure 7.7 and the model of Figure 7.8 based on statistical indices. The Figure 7.7 model has the better BIC value, but the improvement in the log-likelihood of the Figure 7.8 model is substantial.
[Figure 7.8, panels (a)–(c): estimated mean HamD curves from 48 hours through week 8, fitted separately by drug.]
Figure 7.8 Three-class GMM for (a) fluoxetine subjects, (b) venlafaxine IR subjects, and (c) venlafaxine XR subjects.
medication, in line with the Figure 7.7 model. Here, the class probabilities are different for the placebo group and the three medication groups, so that the medication effect is quantified in terms of differences across groups in class probabilities.
Analysis of All Time Points
For the analysis based on the earlier model (see Growth Mixture Modeling), a three-class GMM will be used, given that three classes were found to be interpretable for both the placebo and the medication groups. Figure 7.9 shows the estimated mean curves for the three-class solution for the placebo group, the fluoxetine group, the venlafaxine IR group, and the venlafaxine XR group. It is interesting to note that for the placebo group the Figure 7.9a mean curves are similar in shape to those of Figure 7.4b, although the responder class (class 3) is now estimated at 34%. Note that for this model the class percentages are specified to be the same in the medication groups as in the placebo group. The estimated mean curves for the three medication groups shown in Figure 7.9b–d are similar in shape to those of the medication-group analysis shown in Figure 7.8a–c. These agreements with the separate-group analyses strengthen the plausibility of the modeling.
This model allows the assessment of medication effects in the presence of a placebo response. A key parameter is the medication-added mean of the intercept random effect centered at week 8, the γ01k parameter of equation 5. For a given trajectory class, this indicates how much lower or higher the average score is at week 8 for the medication group in question relative to the placebo group. In this way, the medication effect is specific to classes of individuals who would or would not have responded to placebo. The γ01k estimates of the Figure 7.9 model are as follows. The fluoxetine effect for the high nonresponder class 1 at week 8, as estimated by the GMM, is significantly positive (a higher depression score than for the placebo group), 7.4, indicating a failure of this medication for this class of subjects. In the low nonresponder class 2 the fluoxetine effect is positive but small and nonsignificant.
In the responder class, the fluoxetine effect is significantly negative (lower depression score than for the placebo group), –6.3. The venlafaxine IR effect is insignificant for all three classes. The venlafaxine XR effect is significantly negative, –11.7, for class 1, which after an initial slight worsening turns into a responder class for venlafaxine XR. For the nonresponder class 2 the venlafaxine XR effect is insignificant, while for the responder class it is significantly negative, –7.8. In line with the medication group analysis shown in Figure 7.7, the joint analysis of placebo and medication subjects indicates that venlafaxine XR has the most desirable outcome relative to the placebo group. None of the drugs is significantly effective for the low nonresponder class 2.10
10. The log-likelihood value for the model shown in Figure 7.9 is –2,142.423, replicated across a few random starts, with 61 parameters and a BIC value of 4,562. The entropy value is 0.76.
[Figure 7.9, panels (a)–(d): estimated mean HamD curves from baseline through week 8 for the placebo, fluoxetine, venlafaxine IR, and venlafaxine XR groups. In each panel: Class 1, 20.5%; Class 2, 45.9%; Class 3, 33.6%.]
Figure 7.9 Three-class GMM of both groups: (a) Placebo subjects, (b) fluoxetine subjects, (c) venlafaxine IR subjects, and (d) venlafaxine XR subjects.
Analysis of Postrandomization Time Points
As a final analysis, the placebo and medication groups were analyzed together for the postrandomization time points. Figure 7.10 displays the estimated three-class solution, which again shows a responder class, a nonresponder class that initially improves but then worsens (similar to the placebo-response class found in the placebo group), and a high nonresponder class.11
As a first step, it is of interest to compare the joint placebo–medication group analysis of Figure 7.10 to the separate placebo-group analysis of Figure 7.4b and the separate medication-group analysis of Figure 7.6. Comparing the joint analysis in Figure 7.10 to the placebo-group analysis of Figure 7.4b indicates the improved outcome when medication-group individuals are added to the analysis. In the placebo-group analysis of Figure 7.4b, 78% are in the two highest, clearly nonresponding trajectory classes, whereas in the joint analysis of Figure 7.10 only 36% are in the highest, clearly nonresponding class. In this sense, medication seems to have a positive effect in reducing depression. Furthermore, in the placebo analysis, 21% are in the placebo-responding class which ultimately worsens, whereas in the joint analysis 21% are in this type of class and 43% are in a clearly responding class. Comparing the joint analysis in Figure 7.10 to the medication-group analysis of Figure 7.6 indicates the worsened outcome when placebo-group individuals are added to the analysis. In the medication-group analysis of Figure 7.6 only 17% are in the nonresponding class, compared to 36% in the joint analysis of Figure 7.10. Figure 7.6 shows 15% in the initially improving but ultimately worsening class, compared to 21% in Figure 7.10. Figure 7.6 shows 68% in the responding class, compared to 43% in Figure 7.10. All three of these comparisons indicate that medication has a positive effect in reducing depression.
As a second step, it is of interest to study the medication effects for each medication separately. The joint analysis model allows this because the class probabilities differ between the placebo group and each of the three medication groups, as expressed by equation 8. The results are shown in Figure 7.11. For the placebo group, the responder class (class 3) is estimated to be 26%, the initially improving nonresponder class (class 1) to be 22%, and the high nonresponder class (class 2) to be 52%. In comparison, for the fluoxetine group the responder class is estimated to be 48% (better than placebo), the initially improving nonresponder class to be 0% (better than placebo), and the high nonresponder class to be 52% (same as placebo). For the
11. The log-likelihood value for the model shown in Figure 7.10 is –1,744.999, replicated across many random starts, with 29 parameters and a BIC value of 3,621. The entropy value is 0.69.
[Figure 7.10: estimated mean HamD curves from 48 hours through week 8. Class 1, 21.0%; Class 2, 35.8%; Class 3, 43.1%.]
Figure 7.10 Three-class GMM analysis of both groups using post-randomization time points.
[Figure 7.11 bar charts: percentage of each group in the responder (R), initially improving nonresponder (IINR), and high nonresponder (HNR) classes. Placebo: R 26%, IINR 22%, HNR 52%. Fluoxetine: R 48%, IINR 0%, HNR 52%. Venlafaxine IR: R 46%, IINR 47%, HNR 7%. Venlafaxine XR: R 90%, IINR 0%, HNR 10%.]
Figure 7.11 Medication effects in each of 3 trajectory classes.
venlafaxine IR group, the responder class is estimated to be 46% (better than placebo), the initially improving nonresponder class to be 47% (worse than placebo), and the high nonresponder class to be 7% (better than placebo). For the venlafaxine XR group, the responder class is estimated to be 90% (better than placebo), the initially improving nonresponder class to be 0% (better than placebo), and the high nonresponder class to be 10% (better than placebo).
Conclusions
The growth mixture analysis presented here demonstrates that, unlike with conventional repeated-measures analysis, it is possible to estimate medication effects in the presence of placebo effects. The analysis is flexible in that the medication effect is allowed to differ across trajectory classes. This approach should therefore have wide applicability in clinical trials. It was shown that medication effects could be expressed as causal effects. The analysis also produces a classification of individuals into trajectory classes. Medication effects were expressed in two alternative ways: as changes in growth slopes and as changes in class probabilities. Related to the latter approach, a possible generalization of the model is to include two latent class variables, one before and one after randomization, and to let the medication influence the postrandomization latent class variable as well as the transitions between the two latent class variables. Another generalization, proposed in Muthén and Brown (2009), considers four classes of subjects: (1) subjects who would respond to both placebo and medication, (2) subjects who would respond to placebo but not medication, (3) subjects who would respond to medication but not placebo, and (4) subjects who would respond to neither placebo nor medication. Class 3 is of particular interest from a pharmaceutical point of view. Prediction of class membership can be incorporated as part of the model but was not explored here. Such analyses suggest interesting opportunities for the design of trials. If at baseline an individual is predicted to belong to a nonresponder class, a different treatment can be chosen.
References
Leuchter, A. F., Cook, I. A., Witte, E. A., Morgan, M., & Abrams, M. (2002). Changes in brain function of depressed subjects during treatment with placebo. American Journal of Psychiatry, 159, 122–129.
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences (pp. 345–368). Newbury Park, CA: Sage Publications.
Muthén, B., & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 143–165). Boca Raton, FL: Chapman & Hall/CRC Press.
Muthén, B., & Brown, H. (2009). Estimating drug effects in the presence of placebo response: Causal inference using growth mixture modeling. Statistics in Medicine, 28, 3363–3385.
Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S. T., Yang, C. C., et al. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459–475.
Muthén, B., & Muthén, L. (2000). Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882–891.
Muthén, B., & Muthén, L. (1998–2008). Mplus user's guide (5th ed.). Los Angeles: Muthén & Muthén.
Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463–469.
Quitkin, F. M., Rabkin, J. G., Ross, D., & Stewart, J. W. (1984). Identification of true drug response to antidepressants: Use of pattern analysis. Archives of General Psychiatry, 41, 782–786.
8 Statistical Methodology for a SMART Design in the Development of Adaptive Treatment Strategies
Alena I. Oetting, Janet A. Levy, Roger D. Weiss, and Susan A. Murphy
Introduction
The past two decades have brought new pharmacotherapies as well as behavioral therapies to the field of drug-addiction treatment (Carroll & Onken, 2005; Carroll, 2005; Ling & Smith, 2002; Fiellin, Kleber, Trumble-Hejduk, McLellan, & Kosten, 2004). Despite this progress, the treatment of addiction in clinical practice often remains a matter of trial and error. Some reasons for this difficulty are as follows. First, to date, no one treatment has been found that works well for most patients; that is, patients are heterogeneous in their response to any specific treatment. Second, as many authors have pointed out (McLellan, 2002; McLellan, Lewis, O'Brien, & Kleber, 2000), addiction is often a chronic condition, with symptoms waxing and waning over time. Third, relapse is common. Therefore, the clinician is faced with, first, finding a sequence of treatments that works initially to stabilize the patient and, next, deciding which types of treatments will prevent relapse in the longer term. To inform this sequential clinical decision making, adaptive treatment strategies, that is, treatment strategies shaped by individual patient characteristics or patient responses to prior treatments, have been proposed (Greenhouse, Stangl, Kupfer, & Prien, 1991; Murphy, 2003, 2005; Murphy, Lynch, Oslin, McKay, & Tenhave, 2006; Murphy, Oslin, Rush, & Zhu, 2007; Lavori & Dawson, 2000; Lavori, Dawson, & Rush, 2000; Dawson & Lavori, 2003). Here is an example of an adaptive treatment strategy for prescription opioid dependence, modeled with modifications after a trial currently in progress within the Clinical Trials Network of the National Institute on Drug Abuse (Weiss, Sharpe, & Ling, 2010).
[Figure 8.1 flowchart: All patients receive the initial 4-week treatment. Those not abstinent during the initial 4 weeks step up to a second, 12-week treatment; those abstinent step down to no pharmacotherapy plus relapse prevention therapy. In both arms, treatment continues until 16 weeks have elapsed from the beginning of the initial treatment.]
Figure 8.1. An adaptive treatment strategy for prescription opioid dependence.
Example
First, provide all patients with a 4-week course of buprenorphine/naloxone (Bup/Nx) plus medical management (MM) plus individual drug counseling (IDC) (Fiellin, Pantalon, Schottenfeld, Gordon, & O'Connor, 1999), culminating in a taper of the Bup/Nx. If at any time during these 4 weeks the patient meets the criterion for nonresponse,1 a second, longer treatment with Bup/Nx (12 weeks) is provided, accompanied by MM and cognitive behavior therapy (CBT). However, if the patient remains abstinent2 from opioid use during those 4 weeks, that is, responds to initial treatment, provide 12 additional weeks of relapse prevention therapy (RPT). A patient whose treatment is consistent with this strategy experiences one of two sequences of two treatments, depicted in Figure 8.1. The two sequences are
1. Four-week Bup/Nx treatment plus MM plus IDC, then, if the criterion for nonresponse is met, a subsequent 12-week Bup/Nx treatment plus MM plus CBT.
2. Four-week Bup/Nx treatment plus MM plus IDC, then, if abstinence is achieved, a subsequent 12 weeks of RPT.
1. Response to initial treatment is abstinence from opioid use during these first 4 weeks. Nonresponse is defined as any opioid use during these first 4 weeks.
2. Abstinence might be operationalized using a criterion based on self-report of opioid use and urine screens.
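The strategy above is simply a decision rule mapping week-4 response status to a second-stage treatment; it can be sketched as follows (the treatment labels are shorthand for the regimens described in the example):

```python
def second_treatment(abstinent_weeks_1_to_4: bool) -> str:
    """Adaptive rule from the example: step down responders, step up nonresponders.

    Response = abstinence from opioid use during the initial 4-week
    Bup/Nx + MM + IDC treatment (per footnotes 1 and 2, confirmed by
    self-report and urine screens).
    """
    if abstinent_weeks_1_to_4:
        # Responder: 12 additional weeks of relapse prevention therapy
        return "12 weeks RPT"
    # Nonresponder: longer second course of pharmacotherapy plus CBT
    return "12 weeks Bup/Nx + MM + CBT"

print(second_treatment(True))   # 12 weeks RPT
print(second_treatment(False))  # 12 weeks Bup/Nx + MM + CBT
```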
This strategy might be intended to maximize the number of days the patient remains abstinent (as confirmed by a combination of urine screens and self-report) over the duration of treatment. Throughout, we use this hypothetical prescription opioid dependence example to make the ideas concrete. In the next section, several research questions useful in guiding the development of an adaptive treatment strategy are discussed. Next, we review the sequential multiple assignment randomized trial (SMART), which is an experimental design developed to answer these questions. We present statistical methodology for analyzing data from a particular SMART design and a comprehensive discussion and evaluation of these statistical considerations in the fourth and fifth sections. In the final section, we present a summary and conclusions and a discussion of suggested areas for future research.
Research Questions to Refine an Adaptive Treatment Strategy
Continuing with the prescription opioid dependence example, we might ask whether we could begin with a less intensive behavioral therapy (Lavori et al., 2000). For example, standard MM, which is less burdensome than IDC and focuses primarily on medication adherence, might be sufficiently effective for a large majority of patients; that is, we might ask, In the context of the specified options for further treatment, does the addition of IDC to MM result in a better long-term outcome than the use of MM as the sole accompanying behavioral therapy? Alternatively, if we focus on the behavioral therapy accompanying the second, longer 12-week treatment, we might ask, Among subjects who did not respond to one of the initial treatments, which accompanying behavioral therapy is better for the secondary treatment: MM+IDC or MM+CBT? On the other hand, instead of focusing on a particular treatment component within strategies, we may be interested in comparing entire adaptive treatment strategies. Consider the strategies in Table 8.1. Suppose we are interested in comparing two of these treatment strategies. If the strategies begin with the same initial treatment, then the comparison reduces to a comparison of the two secondary treatments; in our example, a comparison of strategy C with strategy D is obtained by comparing MM+IDC with MM+CBT among nonresponders to MM alone. We also might compare two strategies with different initial treatments. For example, in some settings, CBT may be the preferred behavioral therapy to use with longer treatments; thus, we might ask, if we are going to provide MM+CBT for nonresponders
Table 8.1 Potential Strategies to Consider for the Treatment of Prescription Opioid Dependence (columns: Initial Treatment | Response to Initial Treatment | Secondary Treatment)

Strategy A: Begin with Bup/Nx+MM+IDC; if nonresponse, provide Bup/Nx+MM+CBT; if response, provide RPT
  4-week Bup/Nx treatment + MM+IDC | Not abstinent | 12-week Bup/Nx treatment + MM+CBT
  4-week Bup/Nx treatment + MM+IDC | Abstinent | RPT

Strategy B: Begin with Bup/Nx+MM+IDC; if nonresponse, provide Bup/Nx+MM+IDC; if response, provide RPT
  4-week Bup/Nx treatment + MM+IDC | Not abstinent | 12-week Bup/Nx treatment + MM+IDC
  4-week Bup/Nx treatment + MM+IDC | Abstinent | RPT

Strategy C: Begin with Bup/Nx+MM; if nonresponse, provide Bup/Nx+MM+CBT; if response, provide RPT
  4-week Bup/Nx treatment + MM | Not abstinent | 12-week Bup/Nx treatment + MM+CBT
  4-week Bup/Nx treatment + MM | Abstinent | RPT

Strategy D: Begin with Bup/Nx+MM; if nonresponse, provide Bup/Nx+MM+IDC; if response, provide RPT
  4-week Bup/Nx treatment + MM | Not abstinent | 12-week Bup/Nx treatment + MM+IDC
  4-week Bup/Nx treatment + MM | Abstinent | RPT
to the initial treatment and RPT to responders to the initial treatment, Which is the best initial behavioral treatment: MM+IDC or MM? This is a comparison of strategies A and C. Alternately, we might wish to identify which of the four strategies results in the best long-term outcome (here, the highest number of days abstinent). Note that the behavioral therapies and pharmacotherapies are illustrative and were selected to enhance the concreteness of this example; of course, other selections are possible. These research questions can be classified into one of four general types, as summarized in Table 8.2. The SMART experimental design discussed in the next section is particularly suited to addressing these types of questions.
A SMART Experimental Design and the Development of Adaptive Treatment Strategies
Traditional experimental trials typically evaluate a single treatment with no manipulation or control of preceding or subsequent treatments. In contrast, the SMART design provides data that can be used both to assess the efficacy of each treatment within a sequence and to compare the effectiveness of strategies as a whole. A further rationale for the SMART design can be found in Murphy et al. (2006, 2007). We focus on SMART designs in which there are two initial treatment options, then two treatment options for initial nonresponders (alternately, initial responders) and one treatment option for initial responders (alternately, initial nonresponders). In conversations with researchers across the mental-health field, we have found this design to be of the greatest interest; these designs are similar to those employed by the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) (Rush et al., 2003) and the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) (Stroup et al., 2003); additionally, two SMART trials of this type are currently in the field (D. Oslin, personal communication, 2007; W. Pelham, personal communication, 2006). Data from this experimental design can be used to address questions of each type in Table 8.2. Because SMART specifies sequences of treatments, it allows us to determine the effectiveness of one of the treatment components in the presence of either preceding or subsequent treatments; that is, it addresses questions of both types 1 and 2. Also, the use of randomization supports causal inferences about the relative effectiveness of different treatment strategies, as in questions of types 3 and 4. Returning to the prescription opioid dependence example, a useful SMART design is provided in Figure 8.2. Consider a question of the first type from Table 8.2.
An example is, In the context of the specified options for further treatment, does the addition of IDC to MM result in a better long-term outcome than the use of MM as the sole accompanying behavioral therapy? This question is answered by comparing the pooled outcomes of subgroups 1, 2, and 3 with those of subgroups 4, 5, and 6. This is the main effect of the initial behavioral treatment. Note that to estimate the main effect of the initial behavioral treatment, we require outcomes not only from initial nonresponders but also from initial responders. Clinically, this makes sense, as a particular initial treatment may lead to a good response but this response may not be as durable as that of other initial treatments. Next, consider a question of the second type, such as, Among those who did not respond to one of the initial treatments, which is the better subsequent behavioral treatment: MM+IDC or MM+CBT? This question is addressed by pooling outcome data from subgroups 1 and 4 and comparing the resulting mean to the
Table 8.2 Four General Types of Research Questions

Two questions that concern components of adaptive treatment strategies:
1. (Hypothesis test) Initial treatment effect: What is the effect of initial treatment on long-term outcome in the context of the specified secondary treatments? In other words, what is the main effect of initial treatment?
2. (Hypothesis test) Secondary treatment effect: Considering only those who did (or did not) respond to one of the initial treatments, what is the best secondary treatment? In other words, what is the main effect of secondary treatment for responders (or nonresponders)?

Two questions that concern whole adaptive treatment strategies:
3. (Hypothesis test) Comparing strategy effects: What is the difference in the long-term outcome between two treatment strategies that begin with a different initial treatment?
4. (Estimation) Choosing the overall best strategy: Which treatment strategy produces the best long-term outcome?
[Figure 8.2 flowchart: Subjects are first randomized between the two initial treatments (4 weeks of Bup/Nx with either MM+IDC or MM alone). Those not abstinent during the initial treatment are re-randomized between the two secondary treatments (12 weeks of Bup/Nx with either MM+CBT or MM+IDC); those abstinent (R = 1) receive relapse prevention. Days abstinent are measured over weeks 1–16, defining subgroups 1–3 (initial MM+IDC) and subgroups 4–6 (initial MM).]
Figure 8.2 SMART study design to develop adaptive treatment strategies for prescription opioid dependence.
pooled outcome data of subgroups 2 and 5. This is the main effect of the secondary behavioral treatment among those not abstinent during the initial 4-week treatment. An example of the third type of question would be to test whether strategies A and C in Table 8.1 result in different outcomes; to form this test, we use appropriately weighted outcomes from subgroups 1 and 3 to form an average outcome for strategy A and appropriately weighted outcomes from subgroups 4 and 6 to form an average outcome for strategy C (an alternate example would concern strategies B and D; see the next section for formulae). Note that to compare strategies, we require outcomes from initial responders as well as initial nonresponders (e.g., subgroup 3 in addition to subgroup 1, and subgroup 6 in addition to subgroup 4). The fourth type of question concerns the estimation of the best of the strategies. To choose the best strategy overall, we follow a similar "weighting" process to form the average outcome for each of the four strategies (A, B, C, D) and then designate as best the strategy associated with the highest average outcome.
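The "weighting" process can be sketched with inverse-probability weights, following the strategy-mean estimator of Murphy (2005): each subject whose observed treatment sequence is consistent with the strategy is weighted by the inverse probability of receiving that sequence. Under this design, with equal (1/2) randomization at each stage, responders are randomized only once (weight 2) while nonresponders are randomized twice (weight 4). The outcome data below are invented for illustration.

```python
def strategy_mean(subjects):
    """Weighted mean outcome for one adaptive treatment strategy.

    subjects: (outcome, responder_flag) pairs for subjects whose observed
    treatment sequence is consistent with the strategy being evaluated.
    With 1/2 randomization at each stage, a responder's sequence has
    probability 1/2 (weight 2) and a nonresponder's 1/4 (weight 4).
    """
    num = den = 0.0
    for outcome, responder in subjects:
        w = 2.0 if responder else 4.0
        num += w * outcome
        den += w
    return num / den

# Hypothetical days-abstinent outcomes: (Y, responded to initial treatment?)
consistent_with_strategy_A = [(100, True), (90, True), (40, False), (55, False)]
print(strategy_mean(consistent_with_strategy_A))
```

The weights keep responders and nonresponders represented in the strategy mean in the proportions they would occur if everyone followed that strategy, rather than in the proportions produced by the trial's randomization.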
Test Statistics and Sample Size Formulae In this section, we provide the test statistics and sample size formulae for the four types of research questions summarized in Table 8.2. We assume that subjects are randomized equally to the two treatment options at each step. We use the following notation: A1 is the indicator for initial treatment, R denotes the response to the initial treatment (response = 1 and nonresponse = 0), A2 is the treatment indicator for secondary treatment, and Y is the outcome. In our prescription opioid dependence example, the values for these variables are as follows: A1 is 1 if the initial treatment uses MM+IDC and 0 otherwise, A2 is 1 if the secondary treatment for nonresponders uses MM+CBT and 0 otherwise, and Y is the number of days the subject remained abstinent over the 16-week study period.
Statistics for Addressing the Different Research Questions The test statistics for questions 1–3 of Table 8.2 are presented in Table 8.3; the method for addressing question 4 is also given in Table 8.3. The test statistics for questions 1 and 2 are the standard test statistics for a two-group comparison with large samples (Hoel, 1984) and are not unique to the SMART design. The estimator of a strategy mean, used for both questions 3 and 4, as well as the test statistic for question 3 are given in Murphy (2005). In large samples, the three test statistics corresponding to questions 1–3 are
Table 8.3 Test Statistics for Each of the Possible Questions

Question 1 (a):
Z = (Ȳ_{A1=1} − Ȳ_{A1=0}) / √(S²_{A1=1}/N_{A1=1} + S²_{A1=0}/N_{A1=0}),
where N_{A1=i} denotes the number of subjects who received i as the initial treatment.

Question 2 (a):
Z = (Ȳ_{R=0,A2=1} − Ȳ_{R=0,A2=0}) / √(S²_{R=0,A2=1}/N_{R=0,A2=1} + S²_{R=0,A2=0}/N_{R=0,A2=0}),
where N_{R=0,A2=i} denotes the number of nonresponders who received i as the secondary treatment.

Question 3 (b):
Z = √N (μ̂_{A1=1,A2=a2} − μ̂_{A1=0,A2=b2}) / √(σ̂²_{A1=1,A2=a2} + σ̂²_{A1=0,A2=b2}),
where N is the total number of subjects and a2 and b2 are the secondary treatments in the two prespecified strategies being compared.

Question 4:
Choose the largest of μ̂_{A1=1,A2=1}, μ̂_{A1=0,A2=1}, μ̂_{A1=1,A2=0}, μ̂_{A1=0,A2=0}.

(a) The subscripts on Ȳ and S² denote groups of subjects. For example, Ȳ_{R=0,A2=1} is the average outcome for subjects who do not respond initially (R = 0) and are assigned A2 = 1; S²_{R=0,A2=1} is the sample variance of the outcome for those same subjects. Similarly, the subscript on N denotes the group of subjects.
(b) μ̂ is an estimator of the mean outcome and σ̂² is the associated variance estimator for a particular strategy. Here, the subscript denotes the strategy. The formulae for μ̂ and σ̂² are in Table 8.4.
normally distributed (with mean zero under the null hypothesis of no effect). In Tables 8.3, 8.4, and 8.5, specific values of Ai are denoted by ai and bi, where i indicates the initial treatment (i = 1) or secondary treatment (i = 2); these specific values are either 1 or 0.
Sample Size Calculations In the following, all sample size formulae assume a two-tailed z-test. Let α be the desired size of the hypothesis test, let 1 − β be the power of the test, and let z_{α/2} be the standard normal (1 − α/2) percentile. Approximate normality of the test statistic is assumed throughout.
Table 8.4 Estimators for Strategy Means and for Variance of Estimator of Strategy Means

Estimator for strategy mean:
μ̂_{A1=a1,A2=a2} = (1/N) Σ_{i=1}^{N} W_i(a1, a2) Y_i

N × estimator for variance of estimator of strategy mean:
σ̂²_{A1=a1,A2=a2} = (1/N) Σ_{i=1}^{N} W_i(a1, a2)² (Y_i − μ̂_{A1=a1,A2=a2})²

Weights for each strategy sequence (a1, a2):
(1, 1): W_i(1, 1) = (A_{1i}/0.5) ((A_{2i}/0.5)(1 − R_i) + R_i)
(1, 0): W_i(1, 0) = (A_{1i}/0.5) (((1 − A_{2i})/0.5)(1 − R_i) + R_i)
(0, 1): W_i(0, 1) = ((1 − A_{1i})/0.5) ((A_{2i}/0.5)(1 − R_i) + R_i)
(0, 0): W_i(0, 0) = ((1 − A_{1i})/0.5) (((1 − A_{2i})/0.5)(1 − R_i) + R_i)

Data for subject i are of the form (A_{1i}, R_i, A_{2i}, Y_i), where A_{1i}, R_i, A_{2i}, and Y_i are defined as in the section Test Statistics and Sample Size Formulae and N is the total sample size.
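For concreteness, the estimators in Table 8.4 can be sketched in a few lines of code. The fragment below is our own Python illustration (the chapter's analyses used SAS); the toy data and variable names are invented for the example.

```python
import numpy as np

def strategy_weights(a1, a2, A1, R, A2):
    """Per-subject weights W_i(a1, a2) from Table 8.4.

    A subject consistent with strategy (a1, a2) is weighted by
    1/0.5 = 2 for each randomization he or she passed through:
    responders are randomized once, nonresponders twice.
    """
    init = np.where(A1 == a1, 1.0, 0.0) / 0.5
    sec = np.where(A2 == a2, 1.0, 0.0) / 0.5
    return init * ((1 - R) * sec + R)

def strategy_mean_and_var(a1, a2, A1, R, A2, Y):
    # Weights average to 1 in expectation (each randomization has
    # probability 0.5), so mean(W * Y) is consistent for the strategy
    # mean in large samples.
    W = strategy_weights(a1, a2, A1, R, A2)
    mu = np.mean(W * Y)                  # estimator of the strategy mean
    sig2 = np.mean(W**2 * (Y - mu)**2)   # N * Var(mu-hat) estimator
    return mu, sig2

# Toy data: (A1, R, A2, Y) for three subjects.
A1 = np.array([1, 1, 0]); R = np.array([1, 0, 0])
A2 = np.array([0, 1, 1]); Y = np.array([10.0, 6.0, 4.0])
W = strategy_weights(1, 1, A1, R, A2)
# Responder on arm A1 = 1: weight 2; nonresponder with A1 = 1, A2 = 1:
# weight 4; a subject with A1 = 0 contributes nothing to strategy (1, 1).
print(W)  # [2. 4. 0.]
```

In a realistic sample the weighted mean stabilizes; with only three toy subjects it is of course noisy, so the fragment is meant only to show how the weights are formed.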
In order to calculate the sample size, one must also input the desired detectable standardized effect size. We denote the standardized effect size by δ and use the definition found in Cohen (1988). The standardized effect sizes for the various research questions we are considering are summarized in Table 8.5. The sample size formulae for questions 1 and 2 are standard formulae (Jennison & Turnbull, 2000) and assume an equal number in each of the two groups being compared. Given desired levels of size, power, and standardized effect size, the total sample size required for question 1 is

N1 = 2 · 2 (z_{α/2} + z_β)² (1/δ)²

The sample size formula for question 2 requires the user to postulate the initial response rate, which is used to provide the number of subjects who will be randomized to secondary treatments. The sample size formula uses the working assumption that the initial response rates are equal; that is, subjects respond to initial treatment at the same rate regardless of the particular initial treatment, p = Pr[R = 1|A1 = 1] = Pr[R = 1|A1 = 0]. This working assumption is used only to size the SMART and is not used to analyze the
Table 8.5 Standardized Effect Sizes for Addressing the Four Questions in Table 8.2

Question 1:
δ = (E[Y|A1 = 1] − E[Y|A1 = 0]) / √((Var[Y|A1 = 1] + Var[Y|A1 = 0]) / 2)

Question 2:
δ = (E[Y|R = 0, A2 = 1] − E[Y|R = 0, A2 = 0]) / √((Var[Y|R = 0, A2 = 1] + Var[Y|R = 0, A2 = 0]) / 2)

Question 3:
δ = (E[Y|A1 = 1, A2 = a2] − E[Y|A1 = 0, A2 = b2]) / √((Var[Y|A1 = 1, A2 = a2] + Var[Y|A1 = 0, A2 = b2]) / 2),
where a2 and b2 are the secondary treatment assignments of A2.

Question 4:
δ = (E[Y|A1 = a1, A2 = a2] − E[Y|A1 = b1, A2 = b2]) / √((Var[Y|A1 = a1, A2 = a2] + Var[Y|A1 = b1, A2 = b2]) / 2),
where (a1, a2) is the strategy with the highest mean outcome, (b1, b2) is the strategy with the next highest mean outcome, and ai and bi indicate specific values of Ai, i = 1, 2.
data from it, as can be seen from Table 8.3. The formula for the total required sample size for question 2 is

N2 = 2 · 2 (z_{α/2} + z_β)² (1/δ)² / (1 − p)

When calculating the sample size to test question 3, two different sample size formulae can be used: one that inputs the postulated initial response rate and one that does not. The formula that uses a guess of the initial response rate makes two working assumptions. First, the response rates are equal for both initial treatments (denoted by p), and second, the variability of the outcome Y around the strategy mean (A1 = 1, A2 = a2), among either initial responders or nonresponders, is less than the variance of the strategy mean, and similarly for strategy (A1 = 0, A2 = b2). This formula is

N3a = 2 (z_{α/2} + z_β)² (2 (2 (1 − p) + 1 · p)) (1/δ)²

The second formula does not require either of these two working assumptions; it specifies the sample size required if the response rates are both 0, a "worst-case scenario." This conservative sample size formula for addressing question 3 is

N3b = 2 (z_{α/2} + z_β)² · 4 · (1/δ)²
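These formulae are straightforward to compute. The Python sketch below is our own (not from the chapter); it rounds up to whole subjects, so entries can differ from Table 8.6 by a subject or two depending on how the normal percentiles are rounded in the original.

```python
import math
from statistics import NormalDist

def z(q):
    """Upper-q standard normal percentile, z_q."""
    return NormalDist().inv_cdf(1 - q)

def n1(alpha, beta, delta):
    """Total N for question 1 (main effect of initial treatment)."""
    return math.ceil(2 * 2 * (z(alpha / 2) + z(beta)) ** 2 / delta ** 2)

def n2(alpha, beta, delta, p):
    """Total N for question 2; p = postulated initial response rate."""
    return math.ceil(2 * 2 * (z(alpha / 2) + z(beta)) ** 2 / delta ** 2 / (1 - p))

def n3a(alpha, beta, delta, p):
    """Question 3, using the postulated initial response rate p."""
    return math.ceil(2 * (z(alpha / 2) + z(beta)) ** 2
                     * (2 * (2 * (1 - p) + 1 * p)) / delta ** 2)

def n3b(alpha, beta, delta):
    """Question 3, conservative worst case (p = 0); equals n3a at p = 0."""
    return math.ceil(2 * (z(alpha / 2) + z(beta)) ** 2 * 4 / delta ** 2)

# alpha = 0.05, power 0.80 (beta = 0.20), small effect delta = 0.2, p = 0.1:
print(n1(0.05, 0.20, 0.2), n2(0.05, 0.20, 0.2, 0.1),
      n3a(0.05, 0.20, 0.2, 0.1), n3b(0.05, 0.20, 0.2))
# -> 785 873 1492 1570 (cf. 784, 871, 1490, 1568 in Table 8.6)
```

Note that n3b is exactly n3a evaluated at p = 0, which is why the conservative formula never depends on a guessed response rate.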
We will compare the performance of these two sample size formulae for addressing question 3 in the next section. See the Appendix for a derivation of these formulae. The method for finding the sample size for question 4 relies on an algorithm rather than a formula; we will refer to the resulting sample size as N4. Since question 4 is not a hypothesis test, instead of specifying power to detect a difference in two means, the sample size is based on the desired probability to detect the strategy that results in the highest mean outcome. The standardized effect size in this case involves the difference between the two highest strategy means. This algorithm makes the working assumption that
σ² = Var[Y|A1 = a1, A2 = a2] is the same for all strategies. The algorithm uses an idea similar to the one used to derive the sample size formula for question 3 that is invariant to the response rate. Given a desired probability of selecting the correct treatment strategy with the highest mean and a desired treatment strategy effect, the algorithm for question 4 finds the sample sizes that correspond to the range of response probabilities and then chooses the largest sample size. Since it is based on a worst-case scenario, this algorithm will result in a conservative sample size formula. See the Appendix for a derivation of this algorithm. The online sample size calculator for question 4 can be found at http://methodologymedia.psu.edu/smart/samplesize. Example sample sizes are given in Table 8.6. Note that as the response rate decreases, the required sample sizes for question 3 (e.g., comparing two strategies that have different initial treatments) increase. To see why this must be the case, consider two extreme cases, the first in which the response rate is 90% for both initial treatments and the second in which the nonresponse rate is 90%. In the former case, if n subjects are assigned to treatment 1 initially and 90% respond (i.e., 10% do not respond), then the resulting sample size for strategy (1, 1) is 0.9 * n + ½ * 0.1 * n = 0.95 * n. The ½ occurs because of the second randomization of nonresponders between the two secondary treatments. On the other hand, if only 10% respond (i.e., 90% do not respond), then the resulting sample size for strategy (1, 1) is 0.1 * n + ½ * 0.9 * n = 0.55 * n, which is less than 0.95 * n. Thus, the lower the expected response rate, the larger the initial sample size required for a given power to differentiate between two strategies. This result occurs because the number of treatment options for nonresponders (two) is greater than the number of treatment options for responders (only one).
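The behavior of the question 4 selection can also be seen with a small Monte Carlo along the lines of the algorithm just described. The sketch below is our own simplification, not the authors' algorithm: it treats the four strategy-mean estimators as independent normals whose variance uses the same 2(2(1 − p) + 1 · p) factor that appears in N3a, and the strategy means, σ, and N are invented inputs.

```python
import numpy as np

def prob_pick_best(means, sigma, p, N, reps=20000, seed=7):
    """Monte Carlo probability that the strategy with the highest true
    mean also has the highest estimated mean. Each strategy-mean
    estimator is approximated as normal with variance
    sigma^2 * 2 * (2 * (1 - p) + p) / N (the factor from N3a);
    correlations between strategies sharing an initial treatment are
    ignored in this simplification.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    se = sigma * np.sqrt(2 * (2 * (1 - p) + p) / N)
    draws = rng.normal(means, se, size=(reps, means.size))
    return float(np.mean(np.argmax(draws, axis=1) == np.argmax(means)))

# Four strategy means with a 0.2*sigma gap between best and next best,
# p = 0.1, and the N = 608 suggested for question 4 in Table 8.6:
prob = prob_pick_best([7.0, 5.0, 6.8, 5.6], sigma=1.0, p=0.1, N=608)
print(round(prob, 3))  # roughly 0.9 or higher
```

Under these invented inputs the selection probability comes out somewhat above the 0.90 target, consistent with the conservatism of N4 noted later in the chapter.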
Consider the prescription opioid dependence example. Suppose we are particularly interested in investigating whether MM+CBT or MM+IDC is best for subjects who do not respond to their initial treatment. This is a question of type 2. Thus, in order to ascertain the sample size for the SMART design in Figure 8.2, we use formula N2. Suppose we decide to
Table 8.6 Example Sample Sizes: All Entries Are for Total Sample Size

Desired    Desired     Standardized  Initial       Research Question
Size (a)   Power (b)   Effect Size   Response      1        2        3 (varies   3 (invariant   4
α          1 − β       δ             Rate (c) p                      by p)       to p)
0.10       0.80        0.20          0.5           620      1,240    930         1,240          358
0.10       0.80        0.20          0.1           620      689      1,178       1,240          358
0.10       0.80        0.50          0.5           99       198      149         198            59
0.10       0.80        0.50          0.1           99       110      188         198            59
0.10       0.90        0.20          0.5           864      1,728    1,297       1,729          608
0.10       0.90        0.20          0.1           864      960      1,642       1,729          608
0.10       0.90        0.50          0.5           138      277      207         277            97
0.10       0.90        0.50          0.1           138      154      263         277            97
0.05       0.80        0.20          0.5           784      1,568    1,176       1,568          358
0.05       0.80        0.20          0.1           784      871      1,490       1,568          358
0.05       0.80        0.50          0.5           125      251      188         251            59
0.05       0.80        0.50          0.1           125      139      238         251            59
0.05       0.90        0.20          0.5           1,056    2,112    1,584       2,112          608
0.05       0.90        0.20          0.1           1,056    1,174    2,007       2,112          608
0.05       0.90        0.50          0.5           169      338      254         338            97
0.05       0.90        0.50          0.1           169      188      321         338            97

(a) All entries assume that each statistical test is two-tailed; the sample size for question 4 does not vary by α since this is not a hypothesis test.
(b) In question 4, we choose the sample size so that the probability that the treatment strategy with the highest mean has the highest estimated mean is 1 − β.
(c) The sample size formulae assume that the response rates to initial treatments are equal: p = Pr[R = 1|A1 = 1] = Pr[R = 1|A1 = 0].
size the trial to detect a standardized effect size of 0.2 between the two secondary treatments with the power and size of the (two-tailed) test at 0.80 and 0.05, respectively. After surveying the literature and discussing the issue with colleagues, suppose we decide that the response rate for the two initial treatments will be approximately 0.10 (p = 0.10). The number of subjects required for this trial is then

N2 = 2 · 2 (z_{α/2} + z_β)² (1/δ)² / (1 − p) = 4 (z_{0.025} + z_{0.2})² (1/0.2)² / 0.9 = 871.

Furthermore, as secondary objectives, suppose we are interested in comparing strategy A (begin with MM+IDC; if nonresponse, provide MM+CBT; if response, provide RPT) with strategy D (begin with MM; if nonresponse, provide MM+IDC; if response, provide RPT), corresponding to a specific example of question 3, and in choosing the best strategy overall (question 4). Using the same input values for the parameters and looking at Table 8.6, we see that the sample size required for question 3 is about twice as much as that required for question 2. Thus, unless we are willing and able to double our sample size, we realize that a comparison of strategies A and D will have low power. However, the sample size for question 4 is only 358 (using a desired probability of 0.80), so we will be able to answer the secondary objective of choosing the best strategy with 80% probability.

Suppose that we conduct the trial with 871 subjects. The hypothetical data set (see note 3) and SAS code for calculating the following values can be found at http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/. For question 2, the value of the z-statistic is

Z = (Ȳ_{R=0,A2=1} − Ȳ_{R=0,A2=0}) / √(S²_{R=0,A2=1}/N_{R=0,A2=1} + S²_{R=0,A2=0}/N_{R=0,A2=0})
  = (5.8619 − 4.3135) / √(109.3975/391 + 98.5540/396) = 2.1296,

which has a two-sided p value of 0.0332. Using the formulae in Table 8.4, we get the following estimates for the strategy means:

[μ̂_{(1,1)}, μ̂_{(1,0)}, μ̂_{(0,1)}, μ̂_{(0,0)}] = [7.1246, 4.9994, 6.3285, 5.6364].
3. We generated this hypothetical data so that the true underlying effect size for question 2 is 0.2, the true effect size for question 3 is 0.2, and the strategy with the highest mean in truth is (1, 1), with an effect size of 0.1. Furthermore, the true response rates for the initial treatments are 0.05 for A1 = 0 and 0.15 for A1 = 1. When we considered 1,000 similar data sets, we found that the analysis for question 2 led to significant results 78% of the time and the analysis for question 3 led to significant results 54% of the time. The latter result and the fact that we did not detect an effect for question 3 in the analysis is unsurprising, considering that we have half the sample size required to detect an effect size of 0.2. Furthermore, across the 1,000 similar simulated data sets the best strategy (1, 1) was detected 86% of the time.
The corresponding estimates for the variances of the estimates of the strategy means are

[σ̂²_{(1,1)}, σ̂²_{(1,0)}, σ̂²_{(0,1)}, σ̂²_{(0,0)}] = [396.4555, 352.8471, 456.5727, 441.0138].

Using these estimates, we calculate the value of the corresponding z-statistic for question 3:

Z = √N (μ̂_{A1=1,A2=1} − μ̂_{A1=0,A2=0}) / √(σ̂²_{A1=1,A2=1} + σ̂²_{A1=0,A2=0})
  = √871 (7.1246 − 5.6364) / √(396.4555 + 441.0138) = 1.5178,

which has a two-sided p value of 0.1291, leading us not to reject the null hypothesis that the two strategies are equal. For question 4, we choose (1, 1) as the best strategy, which corresponds to the following:

1. First, supplement the initial 4-week Bup/Nx treatment with MM+IDC.
2. For those who respond, provide RPT. For those who do not respond, continue the Bup/Nx treatment for 12 weeks but switch the accompanying behavioral treatment to MM+CBT.
Evaluation of Sample Size Formulae Via Simulation In this section, the sample size formulae presented in Sample Size Calculations are evaluated. We examine the robustness of the newly developed methods for calculating sample sizes for questions 3 and 4. In addition, a second assessment investigates the power for question 4 to detect the best strategy when the study is sized for one of the other research questions. The second assessment is provided because, due to the emphasis on strategies in SMART designs, question 4 is always likely to be of interest.
Simulation Designs The sample sizes used for the simulations were chosen to give a power level of 0.90 and a Type I error of 0.05 when one of questions 1–3 is used to size the trial and a 0.90 probability of choosing the best strategy for question 4 when it is used to size the trial; these sample sizes are shown in Table 8.6. For questions 1–3, power is estimated by the proportion of times out of 1,000 simulations that the null hypothesis is correctly rejected; for question 4, the probability of choosing the best strategy is estimated by the proportion of times out of 1,000 simulations that the correct strategy with the highest
mean is chosen. We sized the studies to detect a prespecified standardized effect size of 0.2 or 0.5. We follow Cohen (1988) in labeling 0.2 as a ‘‘small’’ effect size and 0.5 as a ‘‘medium’’ effect size. The simulated data reflect the types of scenarios found in substance-abuse clinical trials (Gandhi et al., 2003; Fiellin et al., 2006; Ling et al., 2005). For example, the simulated data exhibit initial response rates (i.e., the proportion of simulated subjects with R = 1) of 0.5 and 0.1, and the mean outcome for the responders is higher than for nonresponders. For question 3 we need to specify the strategies of interest, and for the purposes of these simulations we will compare strategies (A1 = 1, A2 = 1) and (A1 = 0, A2 = 0); these are strategies A and D, respectively, from Table 8.1. For the simulations to evaluate the robustness of the sample size calculation for question 4, we choose strategy A to always have the highest mean outcome and generate the data according to two different ‘‘patterns’’: (1) the strategy means are all different and (2) the mean outcomes of the other three strategies besides strategy A are all equal. In the second pattern, it is more difficult to detect the ‘‘best’’ strategy because the highest mean must be distinguished from all the rest, which are all the ‘‘next highest,’’ instead of just one next highest mean. In order to test the robustness of the sample size formulae, we calculate a sample size given by the relevant formula in Sample Size Calculations and then simulate data sets of this sample size. However, the simulated data will not satisfy the working assumptions in one of the following ways:
• the intermediate response rates to initial treatments are unequal, that is, Pr[R = 1|A1 = 1] ≠ Pr[R = 1|A1 = 0]
• the variances relevant to the question are unequal (for question 4 only)
• the distribution of the final outcome, Y, is right-skewed (thus, for a given sample size, the test statistic is more likely to have a nonnormal distribution)
We also assess the power of question 4 when it is not used in sizing the trial. For each of the types of research questions in Table 8.2, we generate a data set that follows the working assumptions for the sample size formula for that question (e.g., use N2 to size the study to test the effect of the second treatment on the mean outcome) and then perform question 4 on the data and estimate the probability of choosing the correct strategy with the highest mean outcome. The descriptions of the simulation designs for each of questions 1–4 as well as the parameters for all of the different generative models can be found at http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/.
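The kind of simulation described here is easy to reproduce in outline. The sketch below uses our own generative model (the means, standard deviation, and response mechanism are invented, not the chapter's): it generates SMART-like data satisfying the working assumptions, applies the question 2 z-test, and estimates power at the N2-based sample size of 871 computed earlier, where the design target is 0.80.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(N, p=0.1, delta=0.2):
    """One simulated SMART under the working assumptions: response rate p
    regardless of initial treatment; among nonresponders the secondary
    treatment shifts the outcome by delta standard deviations.
    Returns the question 2 z-statistic."""
    R = rng.random(N) < p
    A2 = rng.random(N) < 0.5       # secondary randomization (used only
                                   # for nonresponders in this test)
    Y = rng.normal(5.0, 1.0, N)
    Y[R] += 3.0                    # responders do better on average
    Y[~R & A2] += delta            # effect size delta among nonresponders
    y1, y0 = Y[~R & A2], Y[~R & ~A2]
    return (y1.mean() - y0.mean()) / np.sqrt(y1.var(ddof=1) / y1.size
                                             + y0.var(ddof=1) / y0.size)

# Estimated power of the question 2 test at the N2-based size of 871
# (alpha = 0.05, delta = 0.2, p = 0.1; design target is power 0.80):
reps = 1000
power = np.mean([abs(simulate_trial(871)) > 1.96 for _ in range(reps)])
print(round(float(power), 3))  # close to 0.80
```

With 1,000 replications the Monte Carlo error is roughly ±0.013, so estimates near the nominal 0.80 are expected under this generative model.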
Robustness of the New Sample Size Formulae As previously mentioned, since the sample size formulae for questions 1 and 2 are standard, we focus on evaluating the newly developed sample size formulae for questions 3 and 4. Tables 8.7a and 8.7b provide the results of the simulations designed to evaluate the sample size formulae for questions 3 and 4, respectively. Considering Table 8.7a, we see that the question 3 sample size formula N3a performed extremely well when the expected standardized effect size was 0.20: resulting power levels were uniformly near 0.90 regardless of either the true initial response rates or any of the three violations of the working assumptions. Power levels were less robust when the sample sizes were smaller (i.e., for the 0.50 effect size). For example, when the initial response rates are not equal, the resulting power is lower than 0.90 in the rows using an assumed response rate of 0.5. The more conservative sample size formula, N3b, performed well in all scenarios, regardless of response rate or the presence of any of the three violations of the underlying assumptions. As the response rate approaches 0, the sample sizes are less conservative but the results for power remain within a 95% confidence interval of 0.90. In Table 8.7b, the conservatism of the sample size calculation N4 (associated with question 4) is apparent. We can see that N4 is less conservative for the more difficult scenario where the strategy means besides the highest are all equal, but the probability of correctly identifying the strategy with the highest mean outcome is still about 0.90.

Table 8.7a Investigation of Sample Size Assumption Violations for Question 3, Comparing Strategies A and D

Effect   Initial     Sample    Total     Power:        Power:            Power:
Size     Response    Size      Sample    Assumptions   Unequal Initial   Non-Normal
         Rate        Formula   Size      Correct       Response Rates    Outcome Y
0.2      0.5         N3a       1,584     0.893         0.902             0.882
0.2      0.1         N3a       2,007     0.882         0.910             0.877*
0.5      0.5         N3a         254     0.896         0.864*            0.851*
0.5      0.1         N3a         321     0.926*        0.886             0.898
0.2      0.5         N3b       2,112     0.950*        0.958*            0.974*
0.2      0.1         N3b       2,112     0.903         0.934*            0.898
0.5      0.5         N3b         338     0.973*        0.938*            0.916
0.5      0.1         N3b         338     0.937*        0.890             0.922*

The power to reject the null hypothesis for question 3 is shown when the sample size is calculated to reject the null hypothesis for question 3 with power of 0.90 and Type I error of 0.05 (two-tailed).
* The 95% confidence interval for this proportion does not contain 0.90.
Table 8.7b Investigation of Sample Size Violations for Question 4: Probability (a) to Detect the Correct "Best" Strategy When the Sample Size Is Calculated to Detect the Correct Maximum Strategy Mean 90% of the Time

Effect   Initial     Pattern (b)   Sample     Assumptions   Unequal Initial   Unequal    Non-Normal
Size     Response                  Size (c)   Correct       Response Rates    Variance   Outcome Y
         Rate
0.2      0.5         1             608        0.966*        0.984*            0.965*     0.972*
0.2      0.1         1             608        0.962*        0.969*            0.964*     0.962*
0.5      0.5         1              97        0.980*        0.985*            0.966*     0.956*
0.5      0.1         1              97        0.960*        0.919*            0.976*     0.947*
0.2      0.5         2             608        0.964*        0.953*            0.952*     0.944*
0.2      0.1         2             608        0.905         0.929*            0.922*     0.923*
0.5      0.5         2              97        0.922*        0.974*            0.976*     0.948*
0.5      0.1         2              97        0.893         0.917             0.927*     0.885

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy mean was selected as the maximum.
(b) Pattern 1: the strategy means are all different, but the mean for (A1 = 1, A2 = 1), that is, strategy A, is always the highest. Pattern 2: the mean for strategy A is higher than the other three, which are all equal.
(c) Calculated to detect the correct maximum strategy mean 90% of the time when the sample size assumptions hold.
* The 95% confidence interval for this proportion does not contain 0.90.
Overall, under different violations of the working assumptions, the sample size formulae for questions 3 and 4 still performed well in terms of power. As discussed, we also assess the power for question 4 when the trial was sized for a different research question. For each of the types of research questions in Table 8.2, we generate a data set that follows the working assumptions for the sample size formula for that question, then evaluate the power of question 4 to detect the optimal strategy. From Table 8.8a–c, we see that in almost all cases, regardless of the starting assumptions used to size the various research questions, we achieve a 0.9 probability or higher of correctly detecting the strategy with the highest mean outcome. The probability falls below 0.9 when the standardized effect size for question 4 falls below 0.1. These results are not surprising as from Table 8.6 we see that question 4 requires much smaller sample sizes than all the other research questions. Note that question 4 is more closely linked to question 3 than to question 1 or 2. Question 3 is potentially a subset of question 4; this relationship occurs when one of the strategies considered in question 3 is the strategy with the highest mean outcome. The probability of detecting the correct
Table 8.8a The Probability (a) of Choosing the Correct Strategy for Question 4 When the Sample Size Is Calculated to Reject the Null Hypothesis for Question 1 (for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Effect Size    Initial     Sample    Question 1   Question 4        Effect Size
(Question 1)   Response    Size      (Power)      (Probability a)   (Question 4)
               Rate
0.2            0.5         1,056     0.880        1.000             0.325
0.2            0.1         1,056     0.904        1.000             0.425
0.5            0.5           169     0.934        0.987             0.350
0.5            0.1           169     0.920        0.998             0.630

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy mean was selected as the maximum.
Table 8.8b The Probability (a) of Choosing the Correct Strategy for Question 4 When the Sample Size Is Calculated to Reject the Null Hypothesis for Question 2 (for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Effect Size    Initial     Sample    Question 2   Question 4        Effect Size
(Question 2)   Response    Size      (Power)      (Probability a)   (Question 4)
               Rate
0.2            0.5         2,112     0.906        0.999             0.133
0.2            0.1         1,174     0.895        0.716             0.054
0.5            0.5           338     0.895        0.997             0.372
0.5            0.1           188     0.901        0.978             0.420

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy mean was selected as the maximum.
strategy mean as the maximum when sizing for question 3 is generally very good, as can be seen from Table 8.8c. This is due to the fact that the sample sizes required to test the differences between two strategy means (each beginning with a different initial treatment) are much larger than those needed to detect the maximum of four strategy means with a specified degree of confidence. For a z-test of the difference between two strategy means with a two-tailed Type I error rate of 0.05, power of 0.90, and standardized effect size of 0.20, the sample size requirements range 1,584–2,112. The sample size required for a 0.90 probability of selecting the correct strategy mean as a maximum when the standardized effect size between it and the next highest strategy mean is 0.2 is 608. It is therefore not surprising that the selection rates for the correct strategy mean are generally high when
Table 8.8c The Probability (a) of Choosing the Correct Strategy for Question 4 When the Sample Size Is Calculated to Reject the Null Hypothesis for Question 3 (for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Effect Size    Initial     Sample    Sample   Question 3   Question 4        Effect Size
(Question 3)   Response    Size      Size     (Power)      (Probability a)   (Question 4)
               Rate        Formula
0.2            0.5         N3a       1,584    0.893        0.939             0.10
0.2            0.1         N3a       2,007    0.882        0.614             0.02
0.5            0.5         N3a         254    0.896        0.976             0.25
0.5            0.1         N3a         321    0.926        0.978             0.32
0.2            0.5         N3b       2,112    0.950        0.953             0.10
0.2            0.1         N3b       2,112    0.903        0.613             0.02
0.5            0.5         N3b         338    0.973        0.989             0.25
0.5            0.1         N3b         338    0.937        0.985             0.32

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy mean was selected as the maximum.
powered to detect differences between strategy means each beginning with a different initial treatment.
Summary Overall, the sample size formulae perform well even when the working assumptions are violated. Additionally, the performance of question 4 is consistently good when sizing for all other research questions; this is most likely due to question 4 requiring smaller sample sizes than the other research questions to achieve good results. When planning a SMART similar to the one considered here, if one is primarily concerned with testing differences between prespecified strategy means, we would recommend using the less conservative formula N3a if one has confidence in knowledge of the initial response rates. We recommend this in light of the considerable cost savings that can be accrued by using this approach, in comparison to the more conservative formula N3b. We comment further on this topic in the Discussion.
Discussion In this chapter, we demonstrated how a SMART can be used to answer research questions about both individual components of an adaptive
treatment strategy and the treatment strategies as a whole. We presented statistical methodology to guide the design and analysis of a SMART. Two new methods for calculating the sample sizes for a SMART were presented. The first is for sizing a study when one is interested in testing the difference in two strategies that have different initial treatments; this formula incorporates knowledge about initial response rates. The second new sample size calculation is for sizing a study that has as its goal choosing the strategy that has the highest final outcome. We evaluated both of these methods and found that they performed well in simulations that covered a wide range of plausible scenarios. Several comments are in order regarding the violations of assumptions surrounding the values of the initial response rates when investigating sample size formula N3a for question 3. First, we examined violations of the assumption of the homogeneity of response rates across initial treatments such that they differed by 10% (initial response rates differing by more than 10% in addictions clinical trials are rare) and found that the sample size formula performed well. Future research is needed to examine the extent to which initial response rates can be misspecified when utilizing this modified sample size formula. Clearly, for gross misspecifications, the trialist is probably better off with the more conservative sample size formula. However, the operationalization of "gross misspecification" needs further research. In the addictions and in many other areas of mental health, both clinical practice and trials are plagued with subject nonadherence to treatment. In these cases, sophisticated causal inferential methods are often utilized when trials are "broken" in this manner. An alternative to the post hoc, statistical approach to dealing with nonadherence is to consider a proactive experimental design such as SMART.
The SMART design provides the means for considering nonadherence as one dimension of nonresponse to treatment. That is, nonadherence is an indication that the treatment must be altered in some way (e.g., by adding a component designed to improve motivation to adhere, or by switching the treatment). In particular, one might be interested in varying secondary treatments based on both adherence measures and measures of continued drug use. In this chapter we focused on the simple design in which there are two options for nonresponders and one option for responders. Clearly, these results hold for the mirror design (one option for nonresponders and two options for responders). An important step would be to generalize these results to other designs, such as designs in which there are equal numbers of options for responders and nonresponders or designs in which there are three randomizations. In substance abuse, the final outcome variable is often binary; sample size formulae are needed for this setting as well. Alternatively,
8 SMART Design in the Development of Adaptive Treatment Strategies
the outcome may be time-varying, such as time-varying symptom levels; again, it is important to generalize the results to this setting.
Appendix: Sample Size Formulae for Question 3

Here, we present the derivation of the sample size formulae N3a and N3b for question 3, using results from Murphy (2005). Suppose we have data from a SMART design modeled after the one presented in Figure 8.2; that is, there are two options for the initial treatment, followed by two treatment options for nonresponders and one treatment option for responders. We use the same notation and assumptions listed in Test Statistics and Sample Size Formulae. Suppose that we are interested in comparing two strategies that have different initial treatments, strategies (a1, a2) and (b1, b2). Without loss of generality, let a1 = 1 and b1 = 0. To derive the formulae N3a and N3b, we make the following working assumption: the sample sizes will be large enough so that μ̂(a1, a2) is approximately normally distributed. We use three additional assumptions for formula N3a: the first is that the response rates for the initial treatments are equal; the second two are indicated by (*) and (**) below. The marginal variances relevant to the research question are σ₀² = Var[Y | A1 = a1, A2 = a2] and σ₁² = Var[Y | A1 = b1, A2 = b2]. Denote the mean outcome for strategy (A1, A2) by μ(A1, A2). The null hypothesis we are interested in testing is

    H0: μ(1, a2) − μ(0, b2) = 0

and the alternative of interest is

    H1: μ(1, a2) − μ(0, b2) = Δ = δ √((σ₁² + σ₀²)/2)

where δ is the standardized effect size. As presented in Statistics for Addressing the Different Research Questions, the test statistic for this hypothesis is

    Z = √N (μ̂(1, a2) − μ̂(0, b2)) / √(σ̂²(1, a2) + σ̂²(0, b2))

where μ̂(a1, a2) and σ̂²(a1, a2) are as defined in Table 8.5; in large samples, this test statistic has a standard normal distribution under the null hypothesis
(Murphy, Van Der Laan, Robins, & Conduct Problems Prevention Group, 2001). Recall that N is the total sample size for the trial. To find the required sample size N for a two-sided test with power 1 − β and size α, we solve

    Pr[Z < −z_{α/2} or Z > z_{α/2} | μ(1, a2) − μ(0, b2) = Δ] = 1 − β

for N, where z_{α/2} is the standard normal (1 − α/2) percentile. Thus, we have

    Pr[Z < −z_{α/2} | μ(1, a2) − μ(0, b2) = Δ] + Pr[Z > z_{α/2} | μ(1, a2) − μ(0, b2) = Δ] = 1 − β

Without loss of generality, assume that Δ > 0, so that

    Pr[Z < −z_{α/2} | μ(1, a2) − μ(0, b2) = Δ] ≈ 0 and Pr[Z > z_{α/2} | μ(1, a2) − μ(0, b2) = Δ] ≈ 1 − β

Define σ²(a1, a2) = Var[√N μ̂(a1, a2)]. Note that

    √(σ̂²(1, a2) + σ̂²(0, b2)) / √(σ²(1, a2) + σ²(0, b2))

is close to 1 in large samples (Murphy, 2005). Now, E[μ̂(1, a2) − μ̂(0, b2)] = μ(1, a2) − μ(0, b2) = Δ, so we have

    Pr[ √N (μ̂(1, a2) − μ̂(0, b2) − Δ) / √(σ²(1, a2) + σ²(0, b2)) > z_{α/2} − √N Δ / √(σ²(1, a2) + σ²(0, b2)) ] ≈ 1 − β

Note that the distribution of

    √N (μ̂(1, a2) − μ̂(0, b2) − Δ) / √(σ²(1, a2) + σ²(0, b2))

follows a standard normal distribution in large samples (Murphy et al., 2001). Thus, we have

    √N Δ / √(σ²(1, a2) + σ²(0, b2)) ≈ z_β + z_{α/2}     (1)
Now, using equation 10 in Murphy (2005) for k = 2 steps (initial and secondary) of treatment,

    σ²(a1, a2) = E_{a1,a2}[ (Y − μ(a1, a2))² / (Pr(a1) Pr(a2 | R, a1)) ]
               = E_{a1,a2}[ (Y − μ(a1, a2))² / (Pr(a1) Pr(a2 | 1, a1)) | R = 1 ] Pr_{a1}[R = 1]
               + E_{a1,a2}[ (Y − μ(a1, a2))² / (Pr(a1) Pr(a2 | 0, a1)) | R = 0 ] Pr_{a1}[R = 0]

for all values of a1, a2; the subscripts on E and Pr (namely, E_{a1,a2} and Pr_{a1}) indicate expectations and probabilities calculated as if all subjects were assigned a1 as the initial treatment and then, if nonresponding, assigned treatment a2. If we are willing to make the assumption (*) that

    E_{a1,a2}[(Y − μ(a1, a2))² | R] ≤ E_{a1,a2}[(Y − μ(a1, a2))²]

for both R = 1 and R = 0 (i.e., the variability of the outcome around the strategy mean among either responders or nonresponders is no more than the marginal variability around the strategy mean), then

    σ²(a1, a2) ≤ E_{a1,a2}[(Y − μ(a1, a2))²] Pr_{a1}[R = 1] / (Pr(a1) Pr(a2 | 1, a1))
               + E_{a1,a2}[(Y − μ(a1, a2))²] Pr_{a1}[R = 0] / (Pr(a1) Pr(a2 | 0, a1)).

Thus, we have

    σ²(a1, a2) ≤ σ̄²(a1, a2) [ Pr_{a1}[R = 1] / (Pr(a1) Pr(a2 | 1, a1)) + Pr_{a1}[R = 0] / (Pr(a1) Pr(a2 | 0, a1)) ]     (2)

where σ̄²(a1, a2) = E_{a1,a2}[(Y − μ(a1, a2))²] is the marginal variance of the strategy in question. Since (**) nonresponding subjects (R = 0) are randomized equally between the two secondary treatment options and since there is one treatment option for responders (R = 1), we have Pr(a1) = 1/2, Pr(a2 | 1, a1) = 1, and Pr(a2 | 0, a1) = 1/2; hence, for a common initial response rate p = Pr[R = 1 | A1 = 1] = Pr[R = 1 | A1 = 0],

    σ²(a1, a2) ≤ σ̄²(a1, a2) · 2(2(1 − p) + p)
Rearranging equation 1 gives us

    N ≥ [ √(σ²(1, a2) + σ²(0, b2)) (z_β + z_{α/2}) / Δ ]²
      ≥ [ ( √((σ₁² + σ₀²) · 2(2(1 − p) + p)) / (δ √((σ₁² + σ₀²)/2)) ) (z_β + z_{α/2}) ]²

Simplifying, we have the formula

    N3a = 2 (z_{α/2} + z_β)² (2(2(1 − p) + p)) (1/δ)²

which is the sample size formula given in Sample Size Calculations that depends on the response rate p. Going through the arguments once again, we see that we do not need either of the two working assumptions (*) or (**) to obtain the conservative sample size formula, N3b:

    N3b = 2 · 4 (1/δ)² (z_β + z_{α/2})²
Sample Size Calculation for Question 4

We now present the algorithm for calculating the sample size for question 4. As in the previous section, suppose we have data from a SMART design modeled after the one presented in Figure 8.2; we use the same notation and assumptions listed in Test Statistics and Sample Size Formulae. Suppose that we are interested in identifying the strategy that has the highest mean outcome. We will denote the mean outcome for strategy (A1, A2) by μ(A1, A2). We make the following assumptions:

• The marginal variances of the final outcome given the strategy are all equal, and we denote this common variance by σ². This means that σ² = Var[Y | A1 = a1, A2 = a2] for all (a1, a2) in {(1,1), (1,0), (0,1), (0,0)}.

• The sample sizes will be large enough so that μ̂(a1, a2) is approximately normally distributed.

• The correlation between the estimated mean outcome for strategy (1, 1) and the estimated mean outcome for strategy (1, 0) is the same as the correlation between the estimated mean outcome for strategy (0, 1) and the estimated mean outcome for strategy (0, 0); we denote this common correlation by ρ.
The correlation of the treatment strategies is directly related to the initial response rates. The final outcome under two different treatment strategies will be correlated to the extent that they share responders. For example, if the response rate for treatment A1 = 1 is 0, then everyone is a nonresponder and the means calculated for Y given strategy (1, 1) and for Y given strategy (1, 0) will not share any responders to treatment A1 = 1; thus, the correlation between the two strategies will be 0. On the other hand, if the response rate for treatment A1 = 1 is 1, then everyone is a responder to A1 = 1 and, therefore, the mean outcomes for strategy (1, 1) and strategy (1, 0) will be directly related (i.e., completely correlated). Two treatment strategies that each begin with a different initial treatment are not correlated since the strategies do not overlap (i.e., they do not share any subjects). For the algorithm, the user must specify the following quantities:
• the desired standardized effect size δ;
• the desired probability that the strategy estimated to have the largest mean outcome does in fact have the largest mean.
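The dependence of the correlation ρ on the initial response rate described above can be checked empirically. The sketch below is our own illustration, not the chapter's Table 8.5 machinery: it uses a normalized inverse-probability-weighted mean within the A1 = 1 arm (responders weighted 1, nonresponders weighted 2) and shows that, because strategies (1, 1) and (1, 0) share responders, their estimated means are uncorrelated when the response rate is 0 and perfectly correlated when it is 1.

```python
import numpy as np

def strategy_mean_corr(p, n=200, reps=2000, rng=None):
    """Empirical correlation, across simulated trials, between the
    estimated mean outcomes of strategies (1, 1) and (1, 0)."""
    rng = np.random.default_rng(1) if rng is None else rng
    m11, m10 = [], []
    for _ in range(reps):
        r = rng.random(n) < p                # responders to initial treatment 1
        a2 = rng.integers(0, 2, size=n)      # secondary option for nonresponders
        y = rng.standard_normal(n)           # outcome (no treatment effect needed)
        for target, out in ((1, m11), (0, m10)):
            use = r | (~r & (a2 == target))  # subjects consistent with the strategy
            w = np.where(r, 1.0, 2.0)        # inverse-probability weights
            out.append(np.average(y[use], weights=w[use]))
    return float(np.corrcoef(m11, m10)[0, 1])
```

With these illustrative settings the correlation is near 0 at p = 0, roughly one-third at p = 0.5, and exactly 1 at p = 1, matching the limiting cases discussed in the text.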
We assume that three of the strategies have the same mean and the one remaining strategy produces the largest mean; this is an extreme scenario in which it is most difficult to detect the presence of an effect. Without loss of generality, we choose strategy (1, 1) to have the largest mean. Consider the following algorithm as a function of N:

1. For every value of ρ in {0, 0.01, 0.02, . . . , 0.99, 1}, perform the following simulation. Generate K = 20,000 samples of [μ̂(1,1), μ̂(1,0), μ̂(0,1), μ̂(0,0)]ᵀ from a multivariate normal with mean

       M = [μ(1,1), μ(1,0), μ(0,1), μ(0,0)]ᵀ = [δ/2, 0, 0, 0]ᵀ

   and covariance matrix

       Σ = (1/N) [ 1  ρ  0  0
                   ρ  1  0  0
                   0  0  1  ρ
                   0  0  ρ  1 ]

   This gives us 20,000 samples, V1, . . . , Vk, . . . , V20000, where each Vk is a vector of four entries, one from each treatment strategy; for example, Vkᵀ = [μ̂(1,1),k, μ̂(1,0),k, μ̂(0,1),k, μ̂(0,0),k]. Count how many times out of V1, . . . , V20000 that μ̂(1,1),k is highest; divide this count by 20,000, and call this value Cρ(N). Cρ(N) is the estimate for the probability of correctly identifying the strategy with the highest mean.

2. At the end of step 1, we will have a value of Cρ(N) for each ρ in {0, 0.01, 0.02, . . . , 0.99, 1}. Let γN = minρ Cρ(N); the value γN is the lowest probability of detecting the best strategy mean. Next, we perform a search over the space of possible values of N to find the value for which γN equals the desired probability; N4 is that value of N.

The online calculator for the sample size for question 4 can be found at http://methodologymedia.psu.edu/smart/samplesize.
References

Carroll, K. M. (2005). Recent advances in psychotherapy of addictive disorders. Current Psychiatry Reports, 7, 329–336.
Carroll, K. M., & Onken, L. S. (2005). Behavioral therapies for drug abuse. American Journal of Psychiatry, 162(8), 1452–1460.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dawson, R., & Lavori, P. W. (2003). Comparison of designs for adaptive treatment strategies: Baseline vs. adaptive randomization. Journal of Statistical Planning and Inference, 117, 365–385.
Fiellin, D. A., Kleber, H., Trumble-Hejduk, J. G., McLellan, A. T., & Kosten, T. R. (2004). Consensus statement on office based treatment of opioid dependence using buprenorphine. Journal of Substance Abuse Treatment, 27, 153–159.
Fiellin, D., Pantalon, M., Schottenfeld, R., Gordon, L., & O’Connor, P. (1999). Manual for standard medical management of opioid dependence with buprenorphine. New Haven, CT: Yale University School of Medicine, Primary Care Center and Substance Abuse Center, West Haven VA/CT Healthcare System.
Fiellin, D. A., Pantalon, M. V., Chawarski, M. C., Moore, B. A., Sullivan, L. E., O’Connor, P. G., et al. (2006). Counseling plus buprenorphine-naloxone maintenance therapy for opioid dependence. New England Journal of Medicine, 355(4), 365–374.
Gandhi, D. H., Jaffe, J. H., McNary, S., Kavanagh, G. J., Hayes, M., & Currens, M. (2003). Short-term outcomes after brief ambulatory opioid detoxification with buprenorphine in young heroin users. Addiction, 98, 453–462.
Greenhouse, J., Stangl, D., Kupfer, D., & Prien, R. (1991). Methodological issues in maintenance therapy clinical trials. Archives of General Psychiatry, 48(3), 313–318.
Hoel, P. (1984). Introduction to mathematical statistics (5th ed.). New York: John Wiley & Sons.
Jennison, C., & Turnbull, B. (2000). Group sequential methods with applications to clinical trials. Boca Raton, FL: Chapman & Hall.
Lavori, P. W., & Dawson, R. (2000). A design for testing clinical strategies: Biased adaptive within-subject randomization. Journal of the Royal Statistical Society: Series A, 163, 29–38.
Lavori, P. W., Dawson, R., & Rush, A. J. (2000). Flexible treatment strategies in chronic disease: Clinical and research implications. Biological Psychiatry, 48, 605–614.
Ling, W., Amass, L., Shoptaw, S., Annon, J. J., Hillhouse, M., Babcock, D., et al. (2005). A multi-center randomized trial of buprenorphine-naloxone versus clonidine for opioid detoxification: Findings from the National Institute on Drug Abuse Clinical Trials Network. Addiction, 100, 1090–1100.
Ling, W., & Smith, D. (2002). Buprenorphine: Blending practice and research. Journal of Substance Abuse Treatment, 23, 87–92.
McLellan, A. T. (2002). Have we evaluated addiction treatment correctly? Implications from a chronic care perspective. Addiction, 97, 249–252.
McLellan, A. T., Lewis, D. C., O’Brien, C. P., & Kleber, H. D. (2000). Drug dependence, a chronic medical illness: Implications for treatment, insurance, and outcomes evaluation. Journal of the American Medical Association, 284(13), 1689–1695.
Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B, 65, 331–366.
Murphy, S. A. (2005). An experimental design for the development of adaptive treatment strategies. Statistics in Medicine, 24, 1455–1481.
Murphy, S. A., Lynch, K. G., Oslin, D. A., McKay, J. R., & Ten Have, T. (2006). Developing adaptive treatment strategies in substance abuse research. Drug and Alcohol Dependence. doi:10.1016/j.drugalcdep.2006.09.008
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32, 257–262.
Murphy, S. A., Van Der Laan, M. J., Robins, J. M., & Conduct Problems Prevention Group (2001). Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96(456), 1410–1423.
Rush, A. J., Crismon, M. L., Kashner, T. M., Toprac, M. G., Carmody, T. J., Trivedi, M. H., et al. (2003). Texas medication algorithm project, phase 3 (TMAP-3): Rationale and study design. Journal of Clinical Psychiatry, 64(4), 357–369.
Stroup, T. S., McEvoy, J. P., Swartz, M. S., Byerly, M. J., Glick, I. D., Canive, J. M., et al. (2003). The National Institute of Mental Health Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) project: Schizophrenia trial design and protocol development. Schizophrenia Bulletin, 29(1), 15–31.
Weiss, R., Sharpe, J. P., & Ling, W. (2010). A two-phase randomized controlled clinical trial of buprenorphine/naloxone treatment plus individual drug counseling for opioid analgesic dependence. National Institute on Drug Abuse Clinical Trials Network. Retrieved June 14, 2020, from http://www.clinicaltrials.gov/ct/show/NCT00316277?order=1
9 Obtaining Robust Causal Evidence From Observational Studies: Can Genetic Epidemiology Help?

George Davey Smith
Introduction: The Limits of Observational Epidemiology

Observational epidemiological studies have clearly made important contributions to understanding the determinants of population health. However, there have been high-profile problems with this approach, highlighted by apparently contradictory findings emerging from observational studies and from randomized controlled trials (RCTs) of the same issue. These situations, of which the best known probably relates to the use of hormone-replacement therapy (HRT) in coronary heart disease (CHD) prevention, have been discussed elsewhere (Davey Smith & Ebrahim, 2002). The HRT controversy is covered elsewhere in this volume (see Chapter 5). Here, I will discuss two examples. First, consider the use of vitamin E supplements and CHD risk. Several observational studies have suggested that the use of vitamin E supplements is associated with a reduced risk of CHD, two of the most influential being the Health Professionals Follow-Up Study (Rimm et al., 1993) and the Nurses’ Health Study (Stampfer et al., 1993), both published in the New England Journal of Medicine in 1993. Findings from one of these studies are presented in Figure 9.1, where it can be seen that even short-term use of vitamin E supplements was associated with reduced CHD risk, which persisted after adjustment for confounding factors. Figure 9.2 demonstrates that nearly half of U.S. adults are taking either vitamin E supplements or multivitamin/multimineral supplements that generally contain vitamin E (Radimer et al., 2004). Figure 9.3 presents data from three available time points, where there appears to have been a particular increase in vitamin E use following 1993 (Millen, Dodd, & Subar, 2004), possibly consequent upon the publication of the two observational studies already mentioned, which
Figure 9.1 Observed effect of duration of vitamin E use compared to no use on coronary heart disease events in the Health Professionals Follow-Up Study. From ‘‘Vitamin E consumption and the risk of coronary heart disease in men,’’ by E. B. Rimm, M. J. Stampfer, A. Ascherio, E. Giovannucci, G. A. Colditz, & W. C. Willett, 1993, New England Journal of Medicine, 328, 1450–1456.
have received nearly 3,000 citations between them since publication. The apparently strong observational evidence with respect to vitamin E and reduced CHD risk, which may have influenced the very high current use of vitamin E supplements in developed countries, was unfortunately not realized in RCTs (Figure 9.4), in which no benefit from vitamin E supplementation is seen. In this example it is important to note that the observational studies and the RCTs were testing precisely the same exposure—short-term
Figure 9.2 Use of vitamin supplements in the past month among U.S. adults, 1999– 2000. From ‘‘Dietary supplement use by US adults: Data from the National Health and Nutrition Examination Survey, 1999–2000,’’ by K. Radimer, B. Bindewald, J. Hughes, B. Ervin, C. Swanson, & M. F. Picciano, 2004, American Journal of Epidemiology, 160, 339–349.
Figure 9.3 Use of vitamin supplements in U.S. adults, 1987–2000. From ‘‘Use of vitamin, mineral, nonvitamin, and nonmineral supplements in the United States: The 1987, 1992, and 2000 National Health Interview Survey results,’’ by A. E. Millen, K. W. Dodd, & A. F. Subar, 2004, Journal of the American Dietetic Association, 104, 942–950.
vitamin E supplement use—and yet yielded very different findings with respect to the apparent influence on risk. In 2001 the Lancet published an observational study demonstrating an inverse association between circulating vitamin C levels and incident CHD (Khaw et al., 2001). The left-hand side of Figure 9.5 summarizes these data, presenting the relative risk for a 15.7 μmol/l higher plasma vitamin C level, assuming a log-linear association. As can be seen, adjustment for confounders had little impact on this association. However, a large-scale RCT, the Heart Protection Study, examined the effect of a supplement that increased average plasma vitamin C levels by 15.7 μmol/l. In this study randomization
Figure 9.4 Vitamin E supplement use and risk of coronary heart disease in two observational studies (Rimm et al., 1993; Stampfer et al., 1993) and in a meta-analysis of randomized controlled trials (Eidelman, Hollar, Hebert, Lamas, & Hennekens, 2004).
Figure 9.5 Estimates of the effects of an increase of 15.7 μmol/l plasma vitamin C on coronary heart disease 5-year mortality estimated from the observational epidemiological European Prospective Investigation Into Cancer and Nutrition (EPIC) (Khaw et al., 2001) and the randomized controlled Heart Protection Study (Heart Protection Study Collaborative Group, 2002). EPIC m, male, age-adjusted; EPIC m*, male, adjusted for systolic blood pressure, cholesterol, body mass index, smoking, diabetes, and vitamin supplement use; EPIC f, female, age-adjusted; EPIC f*, female, adjusted for systolic blood pressure, cholesterol, body mass index, smoking, diabetes, and vitamin supplement use.
to the supplement was associated with no decrement in CHD risk (Heart Protection Study Collaborative Group, 2002). What underlies the discrepancy between these findings? One possibility is that there is considerable confounding between vitamin C levels and other exposures that could increase the risk of CHD. In the British Women’s Heart and Health study (BWHHS), for example, women with higher plasma vitamin C levels were less likely to be in a manual social class, to have no car access, to be a smoker, or to be obese and more likely to exercise, to be on a low-fat diet, to have a daily alcoholic drink, and to be tall (Lawlor, Davey Smith, Kundu, Bruckdorfer, & Ebrahim, 2004). Furthermore, for women in their 60s and 70s, those with higher plasma vitamin C levels were less likely to have come from a home 50 years or more previously in which their father was in a manual job, there was no bathroom or hot water, or they had to share a bedroom. They were also less likely to have limited educational attainment. In short, a substantial amount of confounding by factors from across the life course that predict elevated risk of CHD was seen. Table 9.1 illustrates how four simple dichotomous variables from across the life course
Table 9.1 Cardiovascular Mortality According to Cumulative Risk Indicator (Father’s Social Class, Adulthood Social Class, Smoking, Alcohol Use)

  Risk indicator                  n      CVD Deaths   Relative Risk
  4 favorable (0 unfavorable)     517    47           1
  3 favorable (1 unfavorable)     1,299  227          1.99 (1.45–2.73)
  2 favorable (2 unfavorable)     1,606  354          2.60 (1.92–3.52)
  1 favorable (3 unfavorable)     1,448  339          2.98 (2.20–4.05)
  0 favorable (4 unfavorable)     758    220          4.55 (3.32–6.24)

From Davey Smith & Hart (2002).
can generate large differences in cardiovascular disease mortality (Davey Smith & Hart, 2002). In the BWHHS a 15.7 μmol/l higher plasma vitamin C level was associated with a relative risk of incident CHD of 0.88 (95% confidence interval [CI] 0.80–0.97), in the same direction as the estimates seen in the observational study summarized in Figure 9.5. When adjusted for the same confounders as were adjusted for in the observational study reported in Figure 9.5, the estimate changed very little—to 0.90 (95% CI 0.82–0.99). When additional adjustment for confounders acting across the life course was made, considerable attenuation was seen, with a residual relative risk of 0.95 (95% CI 0.85–1.05) (Lawlor et al., 2005). It is obvious that given inevitable amounts of measurement imprecision in the confounders or a limited number of missing unmeasured confounders, the residual association is essentially null and close to the finding of the RCT. Most studies have more limited information on potential confounders than is available in the BWHHS, and in other fields we may be even more ignorant of the confounding factors we should measure. In these cases inferences drawn from observational epidemiological studies may be seriously misleading. As the major and compelling rationale for doing these observational studies is to underpin public-health prevention strategies, their repeated failures are a major concern for public-health policy makers, researchers, and funders. Other processes in addition to confounding can produce robust, but noncausal, associations in observational studies. Reverse causation—where the disease influences the apparent exposure, rather than vice versa—may generate strong and replicable associations. For example, many studies have found that people with low circulating cholesterol levels are at increased risk of several cancers, including colon cancer.
If causal, this is an important association as it might mean that efforts to lower cholesterol levels would increase the risk of cancer. However, it is possible that the early stages of cancer may, many years before diagnosis or death, lead to a lowering in
cholesterol levels, rather than low cholesterol levels increasing the risk of cancer. Similarly, studies of inflammatory markers such as C-reactive protein and cardiovascular disease risk have shown that early stages of atherosclerosis—which is an inflammatory process—may lead to elevation in circulating inflammatory markers; and since people with atherosclerosis are more likely to experience cardiovascular events, a robust, but noncausal, association between levels of inflammatory markers and incident cardiovascular disease is generated. Reverse causation can also occur through behavioral processes—for example, people with early stages and symptoms of cardiovascular disease may reduce their consumption of alcohol, which would generate a situation in which alcohol intake appears to protect against cardiovascular disease. A form of reverse causation can also occur through reporting bias, with the presence of disease influencing reporting disposition. In case–control studies people with the disease under investigation may report on their prior exposure history in a different way from controls, perhaps because the former will think harder about potential reasons that account for why they have developed the disease.
Table 9.2a Means or proportions of blood pressure, pulse pressure, hypertension and potential confounders by quarters of C-reactive protein (CRP), N = 3,529 (from Davey Smith et al., 2005)

                                      Quarters of C-reactive protein (range, mg/L)
                                      1 (0.16–0.85)  2 (0.86–1.71)  3 (1.72–3.88)  4 (3.89–112.0)  P trend
  Hypertension (%)                    45.8           49.7           57.5           60.             < 0.001
  BMI (kg/m²)                         25.2           27.0           28.5           29.7            < 0.001
  HDLc (mmol/l)                       1.80           1.69           1.             1.53            < 0.001
  Lifecourse socioeconomic
    position score                    4.08           4.37           4.46           4.75            < 0.001
  Doctor diagnosis of diabetes (%)    3.5            2.8            4.1            8.4             < 0.001
  Current smoker (%)                  7.9            9.6            10.9           15.4            < 0.001
  Physically inactive (%)             11.3           14.9           20.1           29.6            < 0.001
  Moderate alcohol consumption (%)    22.2           19.6           18.8           14.0            < 0.001
Table 9.2b Means or proportions of CRP, systolic blood pressure, hypertension and potential confounders by 1059G/C genotype (from Davey Smith et al., 2005)

                                      GG      GC or CC   P
  CRP^a (mg/L, log scale)             1.81    1.39       < 0.001
  Hypertension (%)                    53.3    53.1       0.95
  BMI (kg/m²)                         27.5    27.8       0.29
  HDLc (mmol/l)                       1.67    1.65       0.38
  Lifecourse socioeconomic
    position score                    4.35    4.42       0.53
  Doctor diagnosed diabetes (%)       4.7     4.5        0.80
  Current smoker (%)                  11.2    9.3        0.24
  Physically inactive (%)             18.9    18.9       1.0
  Moderate alcohol consumption (%)    18.6    19.8       0.56

^a Geometric means and proportionate (%) change for a doubling of CRP.
CRP: C-reactive protein; OR: odds ratio; FEV1: forced expiratory volume in one second; HDLc: high-density lipoprotein cholesterol; CVD: cardiovascular disease (stroke or coronary heart disease).
In observational studies, associations between an exposure and disease will generally be biased if there is selection according to an exposure–disease combination in case–control studies or according to an exposure–disease risk combination in prospective studies. Such selection may arise through differential participation in research studies, conducting studies in settings such as hospitals where cases and controls are not representative of the general population, or study of unusual populations (e.g., vegetarians). If, for example, those people experiencing an exposure but at low risk of disease for other reasons were differentially excluded from a study, the exposure would appear to be positively related to disease outcome, even if there were no such association in the underlying population. This is a form of ‘‘Berkson’s bias,’’ well known to epidemiologists (Berkson, 1946). A possible example of such associative selection bias relates to the finding in the large American Cancer Society volunteer cohort that high alcohol consumption was associated with a reduced risk of stroke (Thun et al., 1997). This is somewhat counterintuitive as the outcome category included hemorrhagic stroke (for which there is no obvious mechanism through which alcohol would reduce risk) and because alcohol is known to increase blood pressure, a major causal factor for stroke. Population-based studies have found that heavy alcohol consumption tends to increase stroke risk, particularly hemorrhagic stroke risk (Hart, Davey Smith, Hole, & Hawthorne, 1999; Reynolds et al., 2003). Heavy drinkers who volunteer for a study known to be about the health effects of their lifestyle are
likely to be very unrepresentative of all heavy drinkers in the population, in ways that render them at low risk of stroke. Moderate drinkers and nondrinkers who volunteer may be more representative of moderate drinkers and nondrinkers in the underlying population. Thus, the low risk of stroke in the heavy drinkers who volunteer for the study could erroneously make it appear that alcohol reduces the risk of stroke. These problems of confounding and bias relate to the production of associations in observational studies that are not reliable indicators of the true direction of causal associations. A separate issue is that the strength of associations between causal risk factors and disease in observational studies will generally be underestimated due to random measurement imprecision in indexing the exposure. A century ago, Charles Spearman demonstrated mathematically how such measurement imprecision would lead to what he termed the ‘‘attenuation by errors’’ of associations (Spearman, 1904; Davey Smith & Phillips, 1996). This has more recently been renamed ‘‘regression dilution bias’’ (MacMahon et al., 1990). Observational studies can and do produce findings that either spuriously enhance or downgrade estimates of causal associations between modifiable exposures and disease. This has serious consequences for the appropriateness of interventions that aim to reduce disease risk in populations. It is for these reasons that alternative approaches—including those within the Mendelian randomization framework—need to be applied.
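Spearman's point has a simple quantitative form under the classical measurement-error model: the regression slope estimated from an error-prone exposure equals the true slope multiplied by the reliability ratio σx²/(σx² + σe²). A short simulation (all values our own, purely for illustration) makes the attenuation visible:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
beta_true = 0.5

x = rng.standard_normal(n)              # true exposure, variance 1
y = beta_true * x + rng.standard_normal(n)
x_obs = x + rng.standard_normal(n)      # observed exposure, error variance 1

# OLS slope of y on the error-prone exposure
beta_obs = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
reliability = 1.0 / (1.0 + 1.0)         # sigma_x^2 / (sigma_x^2 + sigma_e^2)
print(beta_obs, beta_true * reliability)  # both close to 0.25
```

With equal exposure and error variances the reliability ratio is 0.5, so the true slope of 0.5 is observed as roughly 0.25: the association is diluted, not reversed.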
Mendelian Randomization

The basic principle utilized in the Mendelian randomization approach is that if genetic variants either alter the level of, or mirror the biological effects of, a modifiable environmental exposure that itself alters disease risk, then these genetic variants should be related to disease risk to the extent predicted by their influence on exposure to the risk factor. Common genetic polymorphisms that have a well-characterized biological function (or are markers for such variants) can therefore be utilized to study the effect of a suspected environmental exposure on disease risk (Davey Smith & Ebrahim, 2003, 2004, 2005; Davey Smith, 2006; Lawlor, Harbord, Sterne, Timpson, & Davey Smith, 2008; Ebrahim & Davey Smith, 2008). The exploitation of situations in which genotypic differences produce effects similar to environmental factors (and vice versa) clearly resonates with the concepts of ‘‘phenocopy’’ and ‘‘genocopy’’ in developmental genetics (Box 9.1). It may seem illogical to study genetic variants as proxies for environmental exposures rather than to measure the exposures themselves. However, there are several crucial advantages of utilizing functional genetic variants (or their markers) in this manner that relate to the problems with
Box 9.1 Phenocopy, Genocopy, and Mendelian Randomization

The term phenocopy is attributed to Goldschmidt (1938) and is used to describe the situation where an environmental effect could produce the same effect as was produced by a genetic mutation. As Goldschmidt (1938) explicated, ‘‘different causes produce the same end effect, presumably by changing the same developmental processes in an identical way.’’ In human genetics the term has generally been applied to an environmentally produced disease state that is similar to a clear genetic syndrome. For example, the niacin-deficiency disease pellagra is clinically similar to the autosomal recessive condition Hartnup disease (Baron, Dent, Harris, Hart, & Jepson, 1956), and pellagra has been referred to as a phenocopy of the genetic disorder (Snyder, 1959; Guy, 1993). Hartnup disease is due to reduced neutral amino acid absorption from the intestine and reabsorption from the kidney, leading to low levels of blood tryptophan, which in turn leads to a biochemical anomaly that is similar to that seen when the diet is deficient in niacin (Kraut & Sachs, 2005; Broer, Cavanaugh, & Rasko, 2004). Genocopy is a less utilized term, attributed to Schmalhausen (see Gause, 1942), but has generally been considered to be the reverse of phenocopy—that is, when genetic variation generates an outcome that could be produced by an environmental stimulus (Jablonka-Tavory, 1982). It is clear that, even when the term genocopy is used polemically (e.g., Rose, 1995), the two concepts are mirror images, reflecting differently motivated accounts of how both genetic and environmental factors influence physical state. For example, Hartnup disease can be called a genocopy of pellagra, while pellagra can be considered a phenocopy of Hartnup disease. Mendelian randomization can, therefore, be viewed as an appreciation of the phenocopy–genocopy nexus that allows causation to be separated from association.
Phenocopies of major genetic disorders are generally rarely encountered in clinical medicine, but as Lenz (1973) comments, ‘‘they are, however, most important as models which might help to elucidate the pathways of gene action.’’ Mendelian randomization is generally concerned with less major (and, thus, common) disturbances and reverses the direction of phenocopy → genocopy, to utilize genocopies of known genetic mechanism to inform us better about pathways through which the environment influences health. The scope of phenocopy–genocopy has been discussed by Zuckerkandl and Villet (1988), who advance mechanisms through which there can be
equivalence between environmental and genotypic influences. Indeed, they state that ‘‘no doubt all environmental effects can be mimicked by one or several mutations.’’ The notion that genetic and environmental influences can be both equivalent and interchangeable has received considerable attention in developmental biology (e.g., West-Eberhard, 2003; Leimar, Hammerstein, & Van Dooren, 2006). Furthermore, population genetic analyses of correlations between different traits suggest there are common pathways of genetic and environmental influences, with Cheverud (1988) concluding that ‘‘most environmentally caused phenotypic variants should have genetic counterparts and vice versa.’’
observational studies already outlined. First, unlike environmental exposures, genetic variants are not generally associated with the wide range of behavioral, social, and physiological factors that, for example, confound the association between vitamin C and CHD. This means that if a genetic variant is used to proxy for an environmentally modifiable exposure, it is unlikely to be confounded in the way that direct measures of the exposure will be. Further, aside from the effects of population structure (see Palmer & Cardon, 2005, for a discussion of the likely impact of this), such variants will not be associated with other genetic variants, excepting those with which they are in linkage disequilibrium. Empirical investigation of the associations of genetic variants with potential confounding factors reveals that they do indeed tend not to be associated with such factors (Davey Smith et al., 2008).

Second, we have seen how inferences drawn from observational studies may be subject to bias due to reverse causation. Disease processes may influence exposure levels such as alcohol intake or measures of intermediate phenotypes such as cholesterol levels and C-reactive protein. However, germline genetic variants associated with average alcohol intake or circulating levels of intermediate phenotypes will not be influenced by the onset of disease. This will be equally true with respect to reporting bias generated by knowledge of disease status in case–control studies or to differential reporting bias in any study design.

Third, associative selection bias, in which selection into a study is related to both exposure level and disease risk and can generate spurious associations (as illustrated with respect to alcohol and stroke), is unlikely to occur with respect to genetic variants. For example, empirical evidence supports a lack of association between a wide range of genetic variants and participation rates in a series of cancer case–control studies (Bhatti et al., 2005).
Finally, a genetic variant will indicate long-term levels of exposure, and if the variant is taken as a proxy for such exposure, it will not suffer from the measurement error inherent in phenotypes that have high levels of variability. For example, groups defined by cholesterol level–related genotype will, over a long period, experience the cholesterol difference seen between the groups. For individuals, blood cholesterol is variable over time, and the use of single measures of cholesterol will underestimate the true strength of association between cholesterol and, say, CHD. Indeed, the Mendelian randomization approach predicts a strength of association that is in line with RCT findings of the effects of cholesterol lowering, once the increasing benefits seen over the relatively short trial period are projected to the expected differences over a lifetime (Davey Smith & Ebrahim, 2004), as will be discussed further below.
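These advantages can be made concrete with a toy simulation (all effect sizes hypothetical). An unmeasured confounder distorts the direct exposure–outcome regression, while the genotype-based estimate (the ratio of the genotype–outcome association to the genotype–exposure association, often called the Wald ratio) recovers the true causal effect, because the genotype is unrelated to the confounder.

```python
import random

random.seed(2)

n = 100_000
beta_gx = 0.4    # genotype -> exposure, per allele (hypothetical)
beta_xy = 0.3    # true causal effect of exposure on outcome (hypothetical)
beta_u = 1.0     # unmeasured confounder affects both exposure and outcome

g = [random.choice([0, 1, 2]) for _ in range(n)]   # additive genotype
u = [random.gauss(0, 1) for _ in range(n)]         # unmeasured confounder
x = [beta_gx * gi + beta_u * ui + random.gauss(0, 1) for gi, ui in zip(g, u)]
y = [beta_xy * xi + beta_u * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / sum((a - mx) ** 2 for a in xs)

observational = slope(x, y)              # confounded: biased well above 0.3
wald_ratio = slope(g, y) / slope(g, x)   # genotype-based estimate: near 0.3
print(round(observational, 2), round(wald_ratio, 2))
```

In practice, instrumental-variable estimators such as two-stage least squares generalize this ratio to multiple variants and covariates.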
Categories of Mendelian Randomization

The term Mendelian randomization has now become widely used (see Box 9.2), with a variety of meanings. This partly reflects the fact that there are several categories of inference that can be drawn from studies utilizing the Mendelian randomization approach. In the most direct forms, genetic variants can be related to the probability or level of exposure (‘‘exposure propensity’’) or to intermediate phenotypes believed to influence disease risk. Less direct evidence can come from genetic variant–disease associations that indicate that a particular biological pathway may be of importance, perhaps because the variants modify the effects of environmental exposures. Several examples of these categories have been given elsewhere (Davey Smith & Ebrahim, 2003, 2004; Davey Smith, 2006; Ebrahim & Davey Smith, 2008); here, a few illustrative cases are briefly outlined.
Exposure Propensity

Alcohol Intake and Health

The possible protective effect of moderate alcohol consumption on CHD risk remains controversial (Marmot, 2001; Bovet & Paccaud, 2001; Klatsky, 2001). Nondrinkers may be at a higher risk of CHD because health problems (perhaps induced by previous alcohol abuse) dissuade them from drinking (Shaper, 1993). As well as this form of reverse causation, confounding could play a role, with nondrinkers being more likely to display an adverse profile of socioeconomic or other behavioral risk factors for CHD (Hart et al., 1999). Alternatively, alcohol may have a direct biological effect that lessens
Box 9.2 Why ‘‘Mendelian Randomization’’?

Gregor Mendel (1822–1884) concluded from his hybridization studies with pea plants that ‘‘the behaviour of each pair of differentiating characteristics [such as the shape and color of seeds] in hybrid union is independent of the other differences between the two original plants’’ (Mendel, 1866). This formulation was actually the only regularity that Mendel referred to as a ‘‘law,’’ and in Carl Correns’ 1900 paper (one of a trio appearing that year that are considered to represent the rediscovery of Mendel) he refers to this as ‘‘Mendel’s law’’ (Correns, 1900; Olby, 1966). Morgan (1913) discusses independent assortment and refers to this process as being realized ‘‘whenever two pairs of characters freely Mendelize.’’ Morgan’s use of Mendel’s surname as a verb did not catch on, but Morgan later christened this principle ‘‘Mendel’s second law’’ (Morgan, 1919); it has been known as this or as ‘‘the law of independent assortment’’ since this time. The law suggests that inheritance of one trait is independent of—that is, randomized with respect to—the inheritance of other traits. The analogy with a randomized controlled trial will clearly be most applicable to parent–offspring designs investigating the frequency with which one of two alleles from a heterozygous parent is transmitted to offspring with a particular disease. However, at the population level, traits influenced by genetic variants are generally not associated with the social, behavioral, and environmental factors that confound relationships observed in conventional epidemiological studies. Thus, while the ‘‘randomization’’ is approximate and not absolute in genetic association studies, empirical observations suggest that it applies in most circumstances (Davey Smith, Harbord, Milton, Ebrahim, & Sterne, 2005a; Bhatti et al., 2005; Davey Smith et al., 2008).
The term Mendelian randomization itself was introduced in a somewhat different context, in which the random assortment of genetic variants at conception is utilized to provide an unconfounded study design for estimating treatment effects for childhood malignancies (Gray & Wheatley, 1991; Wheatley & Gray, 2004). The term has recently become widely used with the meaning ascribed to it in this chapter. The notion that genetic variants can serve as an indicator of the action of environmentally modifiable exposures has been expressed in many contexts. For example, since the mid-1960s various investigators have pointed out that the autosomal dominant condition of lactase persistence is associated with milk drinking. Protective associations of lactase persistence
with osteoporosis, low bone mineral density, or fracture risk thus provide evidence that milk drinking reduces the risk of these conditions (Birge, Keutmann, Cuatrecasas, & Whedon, 1967; Newcomer, Hodgson, Douglas, & Thomas, 1978). In a related vein, it was proposed in 1979 that as N-acetyltransferase pathways are involved in the detoxification of arylamine, a potential bladder carcinogen, the observation of increased bladder-cancer risk among people with genetically determined slow-acetylator phenotype provided evidence that arylamines are involved in the etiology of the disease (Lower et al., 1979). Since these early studies various commentators have pointed out that the association of genetic variants of known function with disease outcomes provides evidence about etiological factors (McGrath, 1999; Ames, 1999; Rothman et al., 2001; Brennan, 2002; Kelada, Eaton, Wang, Rothman, & Khoury, 2003). However, these commentators have not emphasized the key strengths of Mendelian randomization: the avoidance of confounding, the avoidance of bias due to reverse causation and reporting tendency, and correction for the underestimation of risk associations due to variability in behaviors and phenotypes (Davey Smith & Ebrahim, 2004). These key concepts were present in Martijn Katan’s 1986 Lancet letter, in which he suggested that genetic variants related to cholesterol level could be used to investigate whether the observed association between low cholesterol and increased cancer risk was real, and in Honkanen and colleagues’ (1996) understanding of how lactase persistence could better characterize the difficult-to-measure environmental influence of calcium intake than could direct dietary reports.
Since 2000 there have been several reports using the term Mendelian randomization in the way it is used here (Youngman et al., 2000; Fallon, Ben-Shlomo, & Davey Smith, 2001; Clayton & McKeigue, 2001; Keavney, 2002; Davey Smith & Ebrahim, 2003), and its use is becoming widespread.
the risk of CHD—for example, by increasing the levels of protective high-density lipoprotein (HDL) cholesterol (Rimm, 2001). It is, however, unlikely that an RCT of alcohol intake, able to test whether there is a protective effect of alcohol on CHD events, will be carried out. Alcohol is oxidized to acetaldehyde, which in turn is oxidized by aldehyde dehydrogenases (ALDHs) to acetate. Half of Japanese people are heterozygous or homozygous for a null variant of ALDH2, and peak blood acetaldehyde concentrations post–alcohol challenge are 18 times and five times higher, respectively, among homozygous null variant and heterozygous
individuals compared with homozygous wild-type individuals (Enomoto, Takase, Yasuhara, & Takada, 1991). This renders the consumption of alcohol unpleasant through inducing facial flushing, palpitations, drowsiness, and other symptoms. As Figure 9.6a shows, there are very considerable differences in alcohol consumption according to genotype (Takagi et al., 2002). The principles of Mendelian randomization are seen to apply—two factors that would be expected to be associated with alcohol consumption, age and cigarette smoking, which would confound conventional observational associations between alcohol and disease, are not related to genotype despite the strong association of genotype with alcohol consumption (Figure 9.6b). It would be expected that ALDH2 genotype influences diseases known to be related to alcohol consumption, and as proof of principle it has been shown that ALDH2 null variant homozygosity—associated with low alcohol consumption—is indeed related to a lower risk of liver cirrhosis (Chao et al., 1994). Considerable evidence, including data from RCTs, suggests that alcohol increases HDL cholesterol levels (Haskell et al., 1984; Burr, Fehily, Butland, Bolton, & Eastham, 1986), which should protect against CHD. In line with this, ALDH2 genotype is strongly associated with HDL cholesterol in the expected direction (Figure 9.6c). With respect to blood pressure, observational evidence suggests that long-term alcohol intake produces an increased risk of hypertension and higher prevailing blood pressure levels. A meta-analysis of studies of ALDH2 genotype and blood pressure suggests that there is indeed a substantial effect in this direction, as demonstrated in Figures 9.7 and 9.8 (Chen et al., 2008). Alcohol intake has also been postulated to increase the risk of esophageal cancer; however, some have questioned the importance of its role (Memik, 2003).
Figure 9.9 presents findings from a meta-analysis of studies of ALDH2 genotype and esophageal-cancer risk (Lewis & Davey Smith, 2005), clearly showing that people who are homozygous for the null variant, who therefore consume considerably less alcohol, have a greatly reduced risk of esophageal cancer. Indeed, this reduction in risk is close to that predicted by combining the effect of genotype on alcohol consumption with the association of alcohol consumption with esophageal-cancer risk in a meta-analysis of observational studies (Gutjahr, Gmel, & Rehm, 2001). When the heterozygotes are compared with the homozygous functional variant, an interesting picture emerges: The risk of esophageal cancer is higher in the heterozygotes, who drink rather less alcohol than those with the homozygous functional variant. This suggests that it is not alcohol itself that is the causal factor but acetaldehyde and that the increased risk is apparent only in those who drink some alcohol but metabolize it inefficiently, leading to high levels of acetaldehyde.
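The prediction described above combines two associations: the genotype's effect on alcohol intake and the observational dose–response of alcohol on esophageal-cancer risk. A minimal sketch of that arithmetic, using illustrative numbers rather than the published estimates:

```python
import math

# Illustrative values (not the published figures): suppose null-variant
# homozygotes drink about 30 g/day less alcohol than functional homozygotes,
# and observational studies suggest an odds ratio of about 1.4 per 10 g/day
# of alcohol for esophageal cancer.
genotype_alcohol_diff = -30.0     # g/day difference by genotype
log_or_per_10g = math.log(1.4)    # observational dose-response, per 10 g/day

predicted_or = math.exp(log_or_per_10g * genotype_alcohol_diff / 10)
print(round(predicted_or, 2))  # 0.36, i.e., a substantially reduced risk
```

A departure of the observed genotype–disease association from such a prediction is itself informative, as with the elevated risk in heterozygotes pointing to acetaldehyde rather than alcohol as the causal agent.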
Figure 9.6 (a) Relationship between alcohol intake (ml/day) and ALDH2 genotype (2*2/2*2, 2*2/2*1, 1*1/1*1). (b) Relationship between characteristics (age in years; percentage of smokers) and ALDH2 genotype. (c) Relationship between HDL cholesterol (mg/dl) and ALDH2 genotype. From ‘‘Aldehyde dehydrogenase 2 gene is a risk factor for myocardial infarction in Japanese men,’’ by S. Takagi, N. Iwai, R. Yamauchi, S. Kojima, S. Yasuno, T. Baba, et al., 2002, Hypertension Research, 25, 677–681.
Figure 9.7 Forest plot of studies of ALDH2 genotype and hypertension in males (Amamoto et al., 2002; Iwai et al., 2004; Saito et al., 2003): odds ratios (95% CIs) of 1.67 (0.92–3.03), 1.57 (0.90–2.72), and 2.84 (0.79–10.15), subtotal 1.72 (1.17–2.52), for *1*2 vs. *2*2; and 2.50 (1.38–4.54), 2.02 (1.17–3.47), and 4.62 (1.31–16.25), subtotal 2.42 (1.66–3.55), for *1*1 vs. *2*2. From Chen et al., 2008.
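The subtotals in forest plots such as Figure 9.7 are inverse-variance-weighted (fixed-effect) pooled estimates on the log odds ratio scale. A sketch reconstructing the *1*2 vs. *2*2 subtotal from the three study-level ORs and 95% CIs (because the published inputs are rounded, the result differs slightly from the published 1.72 (1.17–2.52)):

```python
import math

# Published ORs (95% CIs) for hypertension, *1*2 vs. *2*2, males (Chen et al., 2008)
studies = [(1.67, 0.92, 3.03), (1.57, 0.90, 2.72), (2.84, 0.79, 10.15)]

log_ors, weights = [], []
for or_, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE from the CI width
    log_ors.append(math.log(or_))
    weights.append(1 / se**2)                        # inverse-variance weight

pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
pooled_se = 1 / math.sqrt(sum(weights))
pooled_or = math.exp(pooled_log)
ci = (math.exp(pooled_log - 1.96 * pooled_se),
      math.exp(pooled_log + 1.96 * pooled_se))
print(round(pooled_or, 2), [round(c, 2) for c in ci])
# close to the published subtotal of 1.72 (1.17-2.52)
```

The same weighting underlies the I² heterogeneity statistics reported alongside the subtotals.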
Intermediate Phenotypes

Genetic variants can influence circulating biochemical factors such as cholesterol, homocysteine, and fibrinogen levels. This provides a method for assessing causality in associations between these measures (intermediate phenotypes) and disease and, thus, whether interventions to modify the intermediate phenotype could be expected to influence disease risk.
Cholesterol and CHD

Familial hypercholesterolemia is a dominantly inherited condition in which many rare mutations of the low-density lipoprotein receptor gene lead to high circulating cholesterol levels; about 10 million people are affected worldwide, a prevalence of around 0.2% (Marks, Thorogood, Neil, & Humphries, 2003). The high risk of premature CHD in people with this condition was readily appreciated, with an early U.K. report demonstrating that by age 50 half of men and 12% of women had suffered from CHD (Slack, 1969). Compared with the population of England and Wales (mean total cholesterol 6.0 mmol/l), people with familial hypercholesterolemia (mean total cholesterol 9 mmol/l) suffered a 3.9-fold increased risk of CHD mortality, although very high relative risks among those aged less than 40 years have been observed (Scientific Steering Committee, 1991). These observations regarding
Figure 9.8 Forest plot of studies of ALDH2 genotype and blood pressure in males (Amamoto et al., 2002; Saito et al., 2003; Takagi et al., 2001; Tsuritani et al., 1995; Yamada et al., 2002): pooled mean differences (95% CIs) of 1.58 (0.29–2.87) mmHg in diastolic and 4.24 (2.18–6.31) mmHg in systolic blood pressure for *1*2 vs. *2*2, and 3.95 (2.66–5.24) and 7.44 (5.39–9.49) mmHg, respectively, for *1*1 vs. *2*2. From L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
genetically determined variation in risk provided strong evidence that the associations between blood cholesterol and CHD seen in general populations reflected a causal relationship. The causal nature of the association between blood cholesterol levels and CHD has historically been controversial (Steinberg, 2004). As both Daniel Steinberg (2005) and Ole Færgeman (2003) discuss, many clinicians and public-health practitioners rejected the notion of a causal link for a range of reasons. However, from the late 1930s onward, the finding that people with genetically high levels of
Figure 9.9 Risk of esophageal cancer in individuals with the ALDH2*2*2 vs. ALDH2*1*1 genotype: study-specific odds ratios (95% CIs) of 0.87 (0.19–4.06) (Hori), 0.19 (0.02–1.47) (Matsuo), 0.22 (0.03–1.87) (Boonyphiphat), 0.48 (0.06–3.87) (Itoga), and 0.25 (0.06–1.07) (Yokoyama, 2002); overall 0.36 (0.16–0.80). From ‘‘Alcohol, ALDH2 and esophageal cancer: A meta-analysis which illustrates the potentials and limitations of a Mendelian randomization approach,’’ by S. Lewis & G. Davey Smith, 2005, Cancer Epidemiology, Biomarkers and Prevention, 14, 1967–1971.
cholesterol had high risk for CHD should have been powerful and convincing evidence of the causal nature of elevated blood cholesterol in the general population. With the advent of effective means of reducing blood cholesterol through statin treatment, there remains no serious doubt that the cholesterol–CHD relationship is causal. Among people without CHD, reducing total cholesterol levels with statin drugs by around 1–1.5 mmol/l reduces CHD mortality by around 25% over 5 years. Assuming a linear relationship between blood cholesterol and CHD risk and given the difference in cholesterol of 3.0 mmol/l between people with familial hypercholesterolemia and the general population, the RCT evidence on lowering total cholesterol and reducing CHD mortality would predict a relative risk for CHD of around 2, as opposed to 3.9, for people with familial hypercholesterolemia. However, the trials also demonstrate that the relative reduction in CHD mortality increases over time from randomization—and thus time with lowered cholesterol—as would be expected if elevated levels of cholesterol operate over decades to influence the development of atherosclerosis. People with familial hypercholesterolemia will have had high total cholesterol levels throughout their lives, and this would be expected to generate a greater risk than that predicted by the results of lowering cholesterol levels for only 5 years. Furthermore, ecological studies relating cholesterol levels to CHD demonstrate that the strength of
association increases as the lag period between cholesterol level assessment and CHD mortality increases (Rose, 1982), again suggesting that long-term differences in cholesterol level are the important etiological factor in CHD. As discussed, Mendelian randomization is one method for assessing the effects of long-term differences in exposures on disease risk, free from the diluting problems of both measurement error and having only short-term assessment of risk-factor levels. This reasoning provides an indication that cholesterol-lowering efforts should be lifelong rather than limited to the period for which RCT evidence with respect to CHD outcomes is available. Recently, several common genetic variants have been identified that are related to cholesterol level and CHD risk, and these have also demonstrated effects on CHD risk consistent with lifelong differences in cholesterol level (Davey Smith, Timpson, & Ebrahim, 2008; Kathiresan et al., 2008).

C-Reactive Protein and CHD

Strong associations of C-reactive protein (CRP), an acute-phase inflammatory marker, with hypertension, insulin resistance, and CHD have been repeatedly observed (Danesh et al., 2004; Wu, Dorn, Donahue, Sempos, & Trevisan, 2002; Pradhan, Manson, Rifai, Buring, & Ridker, 2001; Han et al., 2002; Sesso et al., 2003; Hirschfield & Pepys, 2003; Hu, Meigs, Li, Rifai, & Manson, 2004), with the obvious inference that CRP is a cause of these conditions (Ridker et al., 2005; Sjöholm & Nyström, 2005; Verma, Szmitko, & Ridker, 2005). A Mendelian randomization study examined polymorphisms of the CRP gene and demonstrated that, while serum CRP differences were highly predictive of blood pressure and hypertension, the CRP variants—which are related to sizeable serum CRP differences—were not associated with these same outcomes (Davey Smith et al., 2005b). It is likely that these divergent findings are explained by the extensive confounding between serum CRP and the outcomes.
Current evidence on this issue, though from statistically underpowered studies, also suggests that CRP levels do not lead to elevated risk of insulin resistance (Timpson et al., 2005) or CHD (Casas et al., 2006). Again, confounding and reverse causation—where existing coronary disease or insulin resistance may influence CRP levels—could account for the discrepancy between the observational and genetic findings. Similar findings have been reported for serum fibrinogen, variants in the beta-fibrinogen gene, and CHD (Davey Smith et al., 2005a; Keavney et al., 2006). The CRP and fibrinogen examples demonstrate that Mendelian randomization can both increase evidence for a causal effect of an environmentally modifiable factor (as in the examples of milk, alcohol, and cholesterol levels) and provide evidence against causal effects, which can help direct efforts away from targets of no preventative or therapeutic relevance.
Maternal Genotype as an Indicator of Intrauterine Environment

Mendelian randomization studies can provide unique insights into the causal nature of intrauterine environmental influences on later disease outcomes. In such studies, maternal genotype is taken to be a proxy for environmentally modifiable exposures mediated through the mother that influence the intrauterine environment. For example, it is now widely accepted that neural tube defects can in part be prevented by periconceptional maternal folate supplementation (Scholl & Johnson, 2000). RCTs of folate supplementation have provided the key evidence in this regard (MRC Vitamin Study Research Group, 1991; Czeizel & Dudás, 1992). However, could we have reached the same conclusion before the RCTs were carried out if we had access to evidence from genetic association studies? Studies have looked at the MTHFR 677C→T polymorphism (a genetic variant that is associated with methylenetetrahydrofolate reductase activity and circulating homocysteine levels, the TT genotype being associated with higher homocysteine levels) in newborns with neural tube defects compared to controls and have found an increased risk in TT vs. CC newborns, with a relative risk of 1.75 (95% CI 1.41–2.18) in a meta-analysis of all such studies (Botto & Yang, 2000). Studies have also looked at the association between this MTHFR variant in parents and the risk of neural tube defect in their offspring. Mothers with the TT genotype have a relative risk of 2.04 (95% CI 1.49–2.81) of having an offspring with a neural tube defect compared to mothers with the CC genotype (Roseboom et al., 2000). For TT fathers, the equivalent relative risk is 1.18 (95% CI 0.65–2.12) (Scholl & Johnson, 2000). This pattern of associations suggests that it is the intrauterine environment—influenced by maternal TT genotype—rather than the genotype of the offspring that is related to disease risk (Figure 9.10).
This is consistent with the hypothesis that maternal folate intake is the exposure of importance. In this case, the findings from observational studies, genetic association studies, and an RCT are closely similar. Had the technology been available, the genetic association studies, with the particular influence of maternal versus paternal genotype on neural tube defect risk, would have provided strong evidence of the beneficial effect of folate supplementation before any RCT had been completed, although trials would still have been necessary to confirm the causal effect of folate supplementation. Certainly, the genetic association studies would have provided better evidence than that given by conventional epidemiological studies, which would have had to cope with the problems of accurately assessing diet and the considerable confounding of maternal folate intake with a wide variety of lifestyle and socioeconomic factors that may also influence neural tube defect risk.
Figure 9.10 Inheritance of the MTHFR polymorphism and neural tube defects. Mother TT (fetus exposed in utero): RR 2.04. Father TT (no way this can affect in utero exposure of the fetus): RR 1.18. Fetus TT (inherits 50% from mother and 50% from father, hence intermediate risk): RR 1.75.
The association of genotype with neural tube defect risk does not suggest that genetic screening is indicated; rather, it demonstrates that an environmental intervention may benefit the whole population, independent of the genotype of individuals receiving the intervention.

Studies utilizing maternal genotype as a proxy for environmentally modifiable influences on the intrauterine environment can be analyzed in a variety of ways. First, the mothers of offspring with a particular outcome can be compared to a control group of mothers who have offspring without the outcome in a conventional case–control design, but with the mother as the exposed individual (or control) rather than the offspring with the particular health outcome (or the control offspring). Fathers could serve as a control group when autosomal genetic variants are being studied. If the exposure is mediated by the mother, maternal genotype, rather than offspring genotype, will be the appropriate exposure indicator. Clearly, maternal and offspring genotypes are associated, but, conditional on each other, it should be the maternal genotype that shows the association with the health outcome among the offspring. Indeed, in theory it would be possible simply to compare genotype distributions of mothers and offspring, with a higher prevalence among mothers providing evidence that maternal genotype, through an intrauterine pathway, is of importance. However, the statistical power of such an approach is low, and an external control group, whether fathers or women who have offspring without the health outcome, is generally preferable.

The influence of alcohol intake by pregnant women on the health and development of their offspring is well recognized for very high levels of intake, in the form of fetal alcohol syndrome (Burd, 2006).
9 Obtaining Robust Causal Evidence From Observational Studies

However, the influence of maternal drinking outside this extreme situation is less easy to assess, particularly as higher levels of alcohol intake will be related to a wide array of potential sociocultural, behavioral, and environmental confounding factors. Furthermore, there may be systematic bias in how mothers report alcohol intake during pregnancy, which could distort associations with health outcomes. Therefore, outside the case of very high maternal alcohol intake, it is difficult to establish a causal link between maternal alcohol intake and offspring developmental characteristics. Some studies have approached this by investigating alcohol-metabolizing genotypes in mothers in relation to offspring outcomes. Although sample sizes have been small and the analytical strategies not optimal, these studies provide some evidence of an influence of maternal genotype (Gemma, Vichi, & Testai, 2007; Jacobson et al., 2006; Warren & Li, 2005). For example, in one study mental development at age 7.5 years was delayed among the offspring of mothers possessing a genetic variant associated with less rapid alcohol metabolism. Among these mothers there would presumably be slower clearance of alcohol and, thus, an increased influence of maternal alcohol on offspring during the intrauterine period (Jacobson et al., 2006). Offspring genotype was not independently related to these outcomes, indicating that the crucial exposure was maternal alcohol level. As in the MTHFR examples, these studies are of relevance because they provide evidence of the influence of maternal alcohol levels on offspring development, not because they highlight a particular maternal genotype as important: in the absence of maternal drinking, the genotype would presumably have no influence on offspring outcomes. The association of maternal genotype with offspring outcome thus indicates that the alcohol level in mothers, and therefore their alcohol consumption, influences offspring development.
Implications of Mendelian Randomization Study Findings

Establishing the causal influence of environmentally modifiable risk factors through Mendelian randomization designs informs policies for improving population health through population-level interventions. It does not imply that the appropriate strategy is genetic screening to identify those at high risk followed by selective exposure-reduction policies. For example, the implication of studies of maternal MTHFR genotype and offspring neural tube defect risk is that the population risk of neural tube defects can be reduced through increased folate intake periconceptionally and in early pregnancy. It does not suggest that women should be screened for MTHFR genotype; women without the TT genotype but with low folate intake are still exposed to a preventable risk of having babies with neural tube defects. Similarly, establishing the association between genetic variants associated with elevated cholesterol level (such as familial defective ApoB) and CHD risk strengthens the causal evidence that elevated cholesterol is a modifiable risk factor for CHD in the whole population. Thus, even though the population attributable risk
for CHD of this variant is small, it usefully informs public-health approaches to improving population health. It is this aspect of Mendelian randomization that distinguishes it from the conventional risk-identification and genetic-screening purposes of genetic epidemiology.
Mendelian Randomization and RCTs

RCTs are clearly the definitive means of obtaining evidence on the effects of modifying disease-risk processes. There are, however, similarities in the logical structure of RCTs and Mendelian randomization. Figure 9.11 illustrates this, drawing attention to the unconfounded nature of exposures proxied for by genetic variants (analogous to the unconfounded nature of a randomized intervention); the impossibility of reverse causation influencing exposure–outcome associations in both Mendelian randomization and RCT settings; and the importance of intention-to-treat analyses—that is, analysis by group defined by genetic variant, irrespective of the association between the genetic variant and the proxied-for exposure within any particular individual.

Figure 9.11 Mendelian randomization and randomized controlled trial designs compared. In an RCT, the randomization method allocates participants to intervention (exposed) or no intervention (control); in Mendelian randomization, random segregation of alleles allocates individuals to one allele (exposed) or the other allele (control). In both designs, confounders are equal between groups and outcomes are compared between groups.

The analogy with RCTs is also useful with respect to one objection that has been raised against Mendelian randomization studies: that the environmentally modifiable exposure proxied for by the genetic variants (such as alcohol intake or circulating CRP levels) is influenced by many other factors in addition to the genetic variants (Jousilahti & Salomaa, 2004). This is, of course, true. However, consider an RCT of blood pressure–lowering medication. Blood pressure is influenced mainly by factors other than taking blood pressure–lowering medication—obesity, alcohol intake, salt consumption and
other dietary factors, smoking, exercise, physical fitness, genetic factors, and early-life developmental influences are all of importance. However, the randomization that occurs in trials ensures that these factors are balanced between the groups that receive the blood pressure–lowering medication and those that do not. Thus, the fact that many other factors are related to the modifiable exposure does not vitiate the power of RCTs; neither does it vitiate the strength of Mendelian randomization designs. A related objection is that the genetic variants often explain only a trivial proportion of the variance in the environmentally modifiable risk factor being proxied for (Glynn, 2006). Again, consider an RCT of blood pressure–lowering medication in which 50% of participants receive the medication and 50% receive a placebo. If the antihypertensive therapy reduces blood pressure by a quarter of a standard deviation (SD), which is approximately the situation for such pharmacotherapy, then within the whole study group treatment assignment (i.e., antihypertensive use vs. placebo) will explain less than 2% of the variance in blood pressure. In the example of CRP haplotypes used as instruments for CRP levels, these haplotypes explain 1.66% of the variance in CRP levels in the population (Lawlor et al., 2008). As can be seen, genetic variants used as instruments can have a quantitative association with the biological processes they index similar to that of randomized treatments. Both logic and quantification therefore fail to support criticisms of the Mendelian randomization approach based either on the obvious fact that many factors influence most phenotypes of interest or on the fact that particular genetic variants account for only a small proportion of variance in the phenotype.
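The "less than 2%" figure follows directly from the arithmetic of a 50/50 allocation: with a standardized mean difference d between arms, the between-arm variance is (d/2)², so the proportion of variance explained is (d/2)²/(1 + (d/2)²). A quick sketch (illustrative values only):

```python
def variance_explained(d: float) -> float:
    """R^2 for a binary 50/50 predictor with standardized mean difference d."""
    between = (d / 2) ** 2  # variance of the two group means around the grand mean
    within = 1.0            # within-group variance, in standardized units
    return between / (between + within)

# A 0.25-SD blood pressure reduction explains only ~1.5% of BP variance,
# yet such trials reliably detect treatment effects on outcomes.
for d in (0.25, 0.5, 1.0):
    print(f"effect of {d:.2f} SD -> R^2 = {variance_explained(d):.2%}")
```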
Mendelian Randomization and Instrumental Variable Approaches

As well as the analogy with RCTs, Mendelian randomization can be likened to instrumental variable approaches, which have been heavily utilized in econometrics and social science, although rather less so in epidemiology. In an instrumental variable approach the instrument is a variable that is related to the outcome only through its association with the modifiable exposure of interest. The instrument is not related to confounding factors, nor is its assessment biased in a manner that would generate a spurious association with the outcome. Furthermore, the instrument will not be influenced by the development of the outcome (i.e., there will be no reverse causation). Figure 9.12 presents this basic schema, where the dotted line between genotype and outcome provides an unconfounded and unbiased estimate of the causal association between the exposure that the genotype proxies for and the outcome.

Figure 9.12 Mendelian randomization as an instrumental variables approach. Genotype influences the exposure, which influences the outcome; confounders, reverse causation, and bias affect the exposure–outcome association but not the genotype.

The development of instrumental variable methods within
econometrics, in particular, has led to a sophisticated suite of statistical methods for estimating causal effects, and these have now been applied within Mendelian randomization studies (e.g., Davey Smith et al., 2005a, 2005b; Timpson et al., 2005). The parallels between Mendelian randomization and instrumental variable approaches are discussed in more detail elsewhere (Thomas & Conti, 2004; Lawlor et al., 2008). The instrumental variable method allows estimation of the size of the causal effect of the modifiable environmental exposure of interest on the outcome, together with estimates of the precision of that effect. Thus, in the example of alcohol intake (indexed by ALDH2 genotype) and blood pressure discussed earlier, it is possible to use the joint associations of ALDH2 genotype with alcohol intake and with blood pressure to estimate the causal influence of alcohol intake on blood pressure. Figure 9.13 reports such an analysis, showing that each 1 g/day increase in alcohol intake produces robust increases in diastolic and systolic blood pressure among men (Chen et al., 2008).
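The simplest instrumental variable estimator, the Wald ratio, divides the genotype–outcome association by the genotype–exposure association. A minimal simulation (hypothetical effect sizes throughout; numpy assumed available) shows why this works: the naive exposure–outcome regression is distorted by a confounder, while the genotype, being independent of the confounder, recovers the true causal effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating model:
#   G - genotype (0, 1, or 2 alleles), randomly assorted, independent of U
#   U - unmeasured confounder of the exposure-outcome relation
#   X - exposure (e.g. alcohol intake), raised by both G and U
#   Y - outcome (e.g. blood pressure); the true causal effect of X is 0.2
G = rng.binomial(2, 0.3, size=n)
U = rng.normal(size=n)
X = 0.5 * G + U + rng.normal(size=n)
Y = 0.2 * X + U + rng.normal(size=n)

naive = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)  # ordinary regression slope
wald = np.cov(G, Y)[0, 1] / np.cov(G, X)[0, 1]  # instrumental (Wald ratio)

print(f"naive slope: {naive:.3f}")  # biased well above 0.2 by the confounder
print(f"Wald slope:  {wald:.3f}")   # close to the true causal effect, 0.2
```

The intention-to-treat analogy holds here: the estimate uses only group-level associations with genotype, never each individual's realized exposure.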
Figure 9.13 Instrumental variable estimates of the difference in systolic and diastolic blood pressure produced by a 1 g per day higher alcohol intake (alcohol–BP effect, mmHg per g/day, with 95% CIs). Diastolic: Amamoto et al., 2002, 0.17 (0.06, 0.28); Takagi et al., 2001, 0.15 (0.08, 0.22); Tsuritani et al., 1995, 0.16 (0.07, 0.26); subtotal 0.16 (0.11, 0.21) (I² = 0.0%, p = 0.970). Systolic: Amamoto et al., 2002, 0.29 (0.12, 0.47); Takagi et al., 2001, 0.28 (0.16, 0.40); Tsuritani et al., 1995, 0.18 (0.05, 0.31); subtotal 0.24 (0.16, 0.32) (I² = 0.0%, p = 0.439).

Mendelian Randomization and Gene by Environment Interaction

Mendelian randomization is one way in which genetic epidemiology can inform our understanding of the environmental determinants of disease. A more conventional approach has been to study interactions between environmental exposures and genotype (Perera, 1997; Mucci, Wedren, Tamimi, Trichopoulos, & Adami, 2001). From epidemiological and Mendelian randomization perspectives, several issues arise with gene–environment interactions. The most reliable findings in genetic association studies relate to the main effects of polymorphisms on disease risk (Clayton & McKeigue, 2001). The power to detect meaningful gene–environment interaction is low (Wright, Carothers, & Campbell, 2002), with the result that there are a large number of reports of spurious gene–environment interactions in the medical literature (Colhoun, McKeigue, & Davey Smith, 2003). The presence or absence of statistical interaction depends upon the scale of analysis (e.g., linear or logarithmic) applied to the exposure–disease relationship, and the meaning of an observed deviation from either an additive or a multiplicative model is not clear. Furthermore, the biological implications of interactions (however defined) are generally uncertain (Thompson, 1991). Mendelian randomization is most powerful when studying modifiable exposures that are difficult to measure and/or considerably confounded, such as dietary factors. Given measurement error—particularly if this is differential with respect to other factors influencing disease risk—interactions are both difficult to detect and often misleading when, apparently, they are found (Clayton & McKeigue, 2001).

The situation is perhaps different with exposures that differ qualitatively rather than quantitatively between individuals. Consider the influence of smoking tobacco on bladder-cancer risk. Observational studies suggest an association, but clearly confounding and a variety of biases could generate such an association. The potential carcinogens in tobacco smoke of relevance to bladder-cancer risk include aromatic and heterocyclic amines, which are detoxified by N-acetyltransferase 2 (NAT2). Genetic variation in NAT2 enzyme levels leads to slower or faster acetylation. If the carcinogens in tobacco smoke do increase the risk of bladder cancer, then slow acetylators, who have a reduced rate of detoxification of these carcinogens, would be expected to be at increased risk of bladder cancer if they were smokers, whereas if they were not exposed to these carcinogens
(and the major exposure route for those outside particular industries is through tobacco smoke), then an association of genotype with bladder-cancer risk would not be anticipated. Table 9.3 tabulates findings from a large study reported in a way that allows this simple hypothesis to be examined (Gu, Liang, Wang, Lu, & Wu, 2005). As can be seen, the influence of the NAT2 slow-acetylation genotype is appreciable only among those also exposed to heavy smoking. Since the genotype will be unrelated to confounders, it is difficult to see why this situation should arise unless smoking is a causal factor for bladder cancer. Thus, the presence of a sizable effect of genotype in the exposed group but not in the unexposed group provides evidence of the causal nature of the environmentally modifiable risk factor—in this example, smoking. It must be recognized, however, that gene by environment interactions interpreted within the Mendelian randomization framework as evidence on the causal nature of environmentally modifiable exposures are not protected from confounding to the extent that main genetic effects are. In the NAT2/smoking/bladder cancer example, any factor related to smoking—such as social class—will tend to show a greater association with bladder cancer within NAT2 slow acetylators than within NAT2 rapid acetylators. Because the association of social class with smoking is not one-to-one, this will not produce the qualitative interaction of essentially no effect of the genotype in one exposure stratum and an effect in the other, as in the NAT2/smoking interaction, but rather a quantitative interaction: a greater effect of NAT2 in the poorer social classes (among whom smoking is more prevalent) and a smaller, but still evident, effect in the better-off social classes, among whom smoking is less prevalent. Thus, situations in which the biological basis of an expected interaction is well understood and a qualitative (effect vs. no effect) interaction can be anticipated are the ones most amenable to interpretation in terms of the general causal nature of the environmentally modifiable risk factor.
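Stratum-specific odds ratios of the kind shown in Table 9.3 are computed from 2×2 tables of genotype by disease status within each smoking stratum. A short sketch of the standard calculation, using Woolf's method for the confidence interval (the counts below are hypothetical; the chapter's table reports only the resulting odds ratios):

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Odds ratio and 95% CI for a 2x2 table:
    a, b = slow-acetylator cases, controls; c, d = fast-acetylator cases, controls.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's SE of log(OR)
    lo = math.exp(math.log(or_) - 1.96 * se_log_or)
    hi = math.exp(math.log(or_) + 1.96 * se_log_or)
    return or_, lo, hi

# Hypothetical counts for a heavy-smoking stratum (not the study's raw data):
or_, lo, hi = odds_ratio_ci(60, 40, 30, 42)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```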
Table 9.3 NAT2 (Slow vs. Fast Acetylator) and Bladder-Cancer Risk, Stratified by Smoking Status

Overall             Never/Light Smokers     Heavy Smokers
1.35 (1.04–1.75)    1.10 (0.78–1.53)        2.11 (1.30–3.43)

From data in Gu et al. (2005).

Problems and Limitations of Mendelian Randomization

We consider Mendelian randomization to be one of the brightest current prospects for improving causal understanding within population-based studies. There are, however, several potential limitations to the application of this methodology (Davey Smith & Ebrahim, 2003; Little & Khoury, 2003).
Failure to Establish Reliable Genotype–Intermediate Phenotype or Genotype–Disease Associations

If the associations between genotype and a potential intermediate phenotype, or between genotype and disease outcome, are not reliably estimated, then interpreting these associations in terms of their implications for potential environmental causes of disease will clearly be inappropriate. This is not an issue peculiar to Mendelian randomization; rather, the nonreplicable nature of perhaps most apparent findings in genetic association studies is a serious limitation to the whole enterprise. This issue has been discussed elsewhere (Cardon & Bell, 2001; Colhoun et al., 2003) and will not be dealt with further here. Instead, we address problems that arise with the Mendelian randomization approach even when reliable genotype–phenotype associations can be determined.
Confounding of Genotype–Environmentally Modifiable Risk Factor–Disease Associations

The power of Mendelian randomization lies in its ability to avoid the often substantial confounding seen in conventional observational epidemiology. However, confounding can be reintroduced into Mendelian randomization studies, and when interpreting results it needs to be considered whether this has arisen.

Linkage Disequilibrium

It is possible that the locus under study is in linkage disequilibrium (i.e., is associated) with another polymorphic locus, so that the effect of the polymorphism under investigation is confounded by the influence of the other polymorphism. It may seem unlikely—given the relatively short distances over which linkage disequilibrium is seen in the human genome—that a polymorphism influencing, say, CHD risk would be associated with another polymorphism also influencing CHD risk (and thus producing confounding). There are, nevertheless, cases of different genes influencing the same metabolic pathway being in physical proximity. For example, different polymorphisms influencing alcohol metabolism appear to be in linkage disequilibrium (Osier et al., 2002).

Pleiotropy and the Multifunction of Genes

Mendelian randomization is most useful when it can be used to relate a single intermediate phenotype to a disease outcome. However, polymorphisms may (and probably often will) influence more than one intermediate phenotype,
and this may mean they proxy for more than one environmentally modifiable risk factor. This can occur through multiple effects mediated by their RNA expression or protein coding, through alternative splicing, in which one polymorphic region contributes to alternative forms of more than one protein (Glebart, 1998), or through other mechanisms. The most robust interpretations will be possible when the functional polymorphism appears to influence directly the level of the intermediate phenotype of interest (as in the CRP example), but such examples are probably going to be less common in Mendelian randomization than cases where the polymorphism can influence several systems, with different potential interpretations of how the effect on the outcome is generated.

How to Investigate Reintroduced Confounding Within Mendelian Randomization

Linkage disequilibrium and pleiotropy can reintroduce confounding and vitiate the power of the Mendelian randomization approach. Genomic knowledge may help in estimating the degree to which these are likely to be problems in any particular Mendelian randomization study—for instance, through explication of genetic variants that may be in linkage disequilibrium with the variant under study, or of the function of a particular variant and its known pleiotropic effects. Furthermore, genetic variation can be related to measures of potential confounding factors in each study, and the magnitude of any such confounding can be estimated. Empirical studies to date suggest that common genetic variants are largely unrelated to the behavioral and socioeconomic factors considered to be important confounders in conventional observational studies. However, relying on measurement of confounders does, of course, forgo the central advantage of Mendelian randomization, which is to balance unmeasured as well as measured confounders (as randomization does in RCTs).
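The check described here mirrors the baseline-characteristics table of an RCT: measured confounders are compared across genotype groups, with the expectation of near-zero association. A small simulation (illustrative variable names and effect sizes; numpy assumed available) of what such a check looks like when the genotype is a valid instrument:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

genotype = rng.binomial(2, 0.3, size=n)  # randomly assorted variant
confounder = rng.normal(size=n)          # e.g. a socioeconomic score
# The exposure is strongly related to the confounder...
exposure = 0.4 * genotype + 0.8 * confounder + rng.normal(size=n)

# ...so the exposure-confounder correlation is large, while the
# genotype-confounder correlation should be near zero (balance).
r_gene_conf = np.corrcoef(genotype, confounder)[0, 1]
r_exp_conf = np.corrcoef(exposure, confounder)[0, 1]

print(f"genotype vs confounder: r = {r_gene_conf:+.3f}")  # near 0: balanced
print(f"exposure vs confounder: r = {r_exp_conf:+.3f}")   # substantial
```

As the chapter notes, such a check can only cover measured confounders; the point of the design is that unmeasured ones are expected to be balanced in the same way.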
In some circumstances the genetic variant will be related to the environmentally modifiable exposure of interest in some populations but not in others. An example relates to the ALDH2, alcohol, and blood pressure example discussed earlier. The results displayed relate to men because in the populations under study women drink very little whatever their genotype (Figure 9.14). If ALDH2 genetic variation influenced blood pressure for reasons other than its influence on alcohol intake—for example, if it were in linkage disequilibrium with another genetic variant that influenced blood pressure through another pathway, or if the genetic variant had a pleiotropic effect on blood pressure—the same genotype–blood pressure association should be seen among men and women. If, however, the genetic variant influences blood pressure only through its effect on alcohol intake, an effect should be seen only in men. Figure 9.15 demonstrates that the genotype–blood pressure association is indeed seen
only in men, further strengthening the evidence that the genotype–blood pressure association depends upon the genotype influencing alcohol intake and that these associations do indeed provide causal evidence of an influence of alcohol intake on blood pressure. In some cases it may be possible to identify two separate genetic variants, not in linkage disequilibrium with each other, that each serve as a proxy for the environmentally modifiable risk factor of interest. If both variants are related to the outcome of interest and point to the same underlying association, then it becomes much less plausible that reintroduced confounding explains the association, since such confounding would have to act in the same way for these two unlinked variants. This can be likened to RCTs of different blood pressure–lowering agents, which work through different mechanisms and have different potential side effects but lower blood pressure to the same degree. If the different agents produce the same reductions in cardiovascular disease risk, it is unlikely that this occurs through agent-specific effects of the drugs; rather, it points to blood pressure lowering as the key factor. The use of multiple genetic variants working through different pathways has not yet been applied in Mendelian randomization but represents an important potential development of the methodology.
Figure 9.14 ALDH2 genotype and alcohol consumption (g/day) in women and men: five studies, n = 6,815. Mean intake in men falls steeply across the *1*1, *1*2, and *2*2 genotypes, whereas women drink very little whatever their genotype. From ‘‘Alcohol intake and blood pressure: A systematic review implementing Mendelian randomization approach,’’ by L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.

Figure 9.15 ALDH2 genotype (*2*2 vs. *1*1) and systolic blood pressure (mmHg, 95% CI). Men: Saito et al., 2003, −13.90 (−20.45, −7.35); Tsuritani et al., 1995, −6.80 (−13.06, −0.54); Amamoto et al., 2002, −8.40 (−13.13, −3.67); Yamada et al., 2002, −6.80 (−11.28, −2.32); Takagi et al., 2001, −5.90 (−9.15, −2.65); subtotal −7.44 (−9.49, −5.39) (I² = 18.0%, p = 0.300). Women: Amamoto et al., 2002, 0.90 (−3.33, 5.13); Takagi et al., 2001, 0.10 (−3.07, 3.27); subtotal 0.39 (−2.15, 2.93) (I² = 0.0%, p = 0.767). From Chen et al., 2008, PLoS Medicine, 5, e52.

Canalization and Developmental Stability

Perhaps a greater potential problem for Mendelian randomization than reintroduced confounding arises from the developmental compensation that may
occur through a polymorphic genotype being expressed during fetal or early postnatal development and, thus, influencing development in such a way as to buffer against the effect of the polymorphism. Such compensatory processes have been discussed since C. H. Waddington (1942) introduced the notion of canalization in the 1940s. Canalization refers to the buffering of the effects of either environmental or genetic forces attempting to perturb development, and Waddington’s ideas have been well developed both empirically and theoretically (Wilkins, 1997; Rutherford, 2000; Gibson & Wagner, 2000; Hartman, Garvik, & Hartwell, 2001; Debat & David, 2001; Kitami & Nadeau, 2002; Gu et al., 2003; Hornstein & Shomron, 2006). Such buffering can be achieved either through genetic redundancy (more than one gene having the same or similar function) or through alternative metabolic routes, where the complexity of metabolic pathways allows recruitment of different pathways to reach the same phenotypic end point. In effect, a functional polymorphism expressed during fetal development or postnatal growth may influence the expression of a wide range of other genes, leading to changes that may compensate for the influence of the polymorphism. Put crudely, if a person has developed and grown from the intrauterine period onward within an environment in which one factor is perturbed (e.g., there is elevated CRP due to genotype), then that person may be rendered resistant to the influence of lifelong elevated circulating CRP, through permanent changes in tissue structure and function that counterbalance its effects. In intervention trials—for example, RCTs of cholesterol-lowering drugs—the intervention is generally randomized to participants during middle age; similarly, in observational studies of this issue, cholesterol levels are ascertained during adulthood. In Mendelian randomization, on the other hand, randomization occurs before birth. 
This leads to important caveats when attempting to relate the findings of conventional observational epidemiological studies to the findings of studies carried out within the Mendelian randomization paradigm.
The most dramatic demonstrations of developmental compensation come from knockout studies, in which a functioning gene is essentially removed from an organism. The overall phenotypic effects of such knockouts have often been much smaller than knowledge of the function of the genes would predict, even in the absence of other genes carrying out the same function as the knocked-out gene (Morange, 2001; Shastry, 1998; Gerlai, 2001; Williams & Wagner, 2000). For example, pharmacological inhibition demonstrates that myoglobin is essential to maintain energy balance and contractile function in the myocardium of mice, yet disrupting the myoglobin gene resulted in mice devoid of myoglobin with no disruption of cardiac function (Garry et al., 1998). In the field of animal studies—such as knockout preparations or transgenic animals manipulated so as to overexpress foreign DNA—the interpretive problem created by developmental compensation is well recognized (Morange, 2001; Shastry, 1998; Gerlai, 2001; Williams & Wagner, 2000). Conditional preparations—in which the level of transgene expression can be induced or suppressed through the application of external agents—are now being utilized to investigate the influence of such altered gene expression after the developmental stages during which compensation can occur (Bolon & Galbreath, 2002). Thus, further evidence on the issue of genetic buffering should emerge to inform interpretations of both animal and human studies. Most examples of developmental compensation relate to dramatic genetic or environmental insults; it is thus unclear whether the generally small phenotypic differences induced by common functional polymorphisms will be sufficient to induce compensatory responses.
The fact that the large gene–environment interactions that have been observed often relate to novel exposures not present during the evolution of a species (e.g., drug interactions) (Wright et al., 2002) may indicate that responses to widely experienced exposures—as would be the case with the products of functional polymorphisms or common mutations—have been homogenized; canalizing mechanisms could be particularly relevant in these cases. Further work on the basic mechanisms of developmental stability, and on how they relate to relatively small exposure differences during development, will allow these considerations to be taken forward. Knowledge of the stage of development at which a genetic variant has functional effects will also allow assessment of the potential for developmental compensation to buffer the response to the variant. In some Mendelian randomization designs developmental compensation is not an issue. For example, when maternal genotype is utilized as an indicator of the intrauterine environment, the response of the fetus will not differ according to whether the effect is induced by maternal genotype or by environmental perturbation, and the effect on the fetus can be taken to indicate the
effect of environmental influences during the intrauterine period. Also, in cases where a variant influences an adulthood environmental exposure (e.g., ALDH2 variation and alcohol intake), developmental compensation to genotype will not be an issue. The same applies in many cases of gene by environment interaction interpreted with respect to the causality of the environmental factor. In some situations, however, Mendelian randomization remains in the somewhat unsatisfactory position of facing a potential problem that cannot currently be adequately assessed. The parallels between Mendelian randomization in human studies and equivalent designs in animal studies are discussed in Box 9.3.

Complexity of Associations and Interpretations

The interpretation of findings from studies that appear to fall within the Mendelian randomization remit can often be complex, as has been previously discussed with respect to MTHFR and folate intake (Davey Smith & Ebrahim, 2003). As a second example, consider the association of extracellular superoxide dismutase (EC-SOD) with CHD. EC-SOD is an extracellular scavenger of superoxide anions, and thus genetic variants associated with higher circulating EC-SOD levels might be considered to mimic higher levels of antioxidants. However, findings are dramatically opposite to this: bearers of such variants have an increased risk of CHD (Juul et al., 2004). The explanation of this apparent paradox may be that the higher circulating EC-SOD levels associated with the variant arise from movement of EC-SOD out of arterial walls, so that the in situ antioxidative capacity of the arterial walls is lower in individuals with the variant associated with higher circulating EC-SOD. The complexity of these interpretations—together with their sometimes speculative nature—detracts from the transparency that otherwise makes Mendelian randomization attractive.
Lack of Suitable Genetic Variants to Proxy for Exposure of Interest

An obvious limitation of Mendelian randomization is that it can examine only areas for which there are functional polymorphisms (or genetic markers linked to such functional polymorphisms) relevant to the modifiable exposure of interest. In the context of genetic association studies more generally, it has been pointed out that in many cases, even if a locus is involved in a disease-related metabolic process, there may be no suitable marker or functional polymorphism with which to study this process (Weiss & Terwilliger, 2000). In an earlier work on Mendelian randomization (Davey Smith & Ebrahim, 2003) we discussed the example of vitamin C since one of our
Box 9.3 Meiotic Randomization in Animal Studies

The approach to causal inference underlying Mendelian randomization is also utilized in nonhuman animal studies. For instance, investigations of the structural neuroanatomical factors underlying behavioral traits in rodents have used genetic crosses that lead to different on-average structural features (Roderic, Wimer, & Wimer, 1976; Weimer, 1973; Lipp et al., 1989). Lipp et al. (1989) refer to this as ‘‘meiotic randomization’’ and consider the advantages of the method to be, first, that the brain-morphology differences that are due to genetic difference occur before any of the behavioral traits develop and therefore cannot be a feedback function of behavior (equivalent to the avoidance of reverse causation in human Mendelian randomization studies) and, second, that other differences between the animals are randomized with respect to the brain-morphology differences of interest (equivalent to the avoidance of confounding in human Mendelian randomization studies). Li and colleagues (2006) apply this method to the dissection of adiposity and body composition in mice and point out that in experimental crosses

meiosis serves as a randomization mechanism that distributes naturally occurring genetic variation in a combinatorial fashion among a set of cross progeny. Genetically randomized populations share the properties of statistically designed experiments that provide a basis for causal inference. This is consistent with the notion that causation flows from genes to phenotypes. We propose that the inference of causal direction can be extended to include relationships among phenotypes.

Mendelian randomization within epidemiology reflects similar thinking among transgenic animal researchers.
Williams and Wagner (2000) consider that ‘‘a properly designed transgenic experiment can be a thing of exquisite beauty in that the results support absolutely unambiguous conclusions regarding the function of a given gene or protein within the authentic biological context of an intact animal. A transgenic experiment may provide the most rigorous test possible of a mechanistic hypothesis that was generated by previous observational studies. A successful transgenic experiment can cut through layers of uncertainty that cloud the interpretation of the results produced by other experimental designs.’’
The problems of interpreting some aspects of transgenic animal studies may also apply to Mendelian randomization within genetic epidemiology, however, and linked progress across the fields of genomics, animal experimentation, and epidemiology will better define the scope of Mendelian randomization in the future.
examples of how observational epidemiology appeared to have got the wrong answer related to vitamin C. We considered whether the association between vitamin C and CHD could have been studied utilizing the principles of Mendelian randomization. We noted that polymorphisms exist that are related to lower circulating vitamin C levels—for example, the haptoglobin polymorphism (Langlois, Delanghe, De Buyzere, Bernard, & Ouyang, 1997; Delanghe, Langlois, Duprez, De Buyzere, & Clement, 1999)—but in this case the effect on vitamin C is at some distance from the polymorphic protein and, as in the apolipoprotein E example, the other phenotypic differences could have an influence on CHD risk that would distort examination of the influence of vitamin C levels through relating genotype to disease. SLC23A1—a gene encoding the vitamin C transporter SVCT1, which carries out vitamin C transport by intestinal cells—would be an attractive candidate for Mendelian randomization studies. However, by 2003 (the date of our earlier report) a search for variants had failed to find any common single-nucleotide polymorphism that could be used in this way (Erichsen, Eck, Levine, & Chanock, 2001). We therefore used this as an example of a situation where suitable polymorphisms for studying the modifiable risk factor of interest—in this case, vitamin C—could not be located. Since that earlier report, however, a functional variant in SLC23A1 has been identified that is related to circulating vitamin C levels (N. J. Timpson et al., 2010). We use this example not to suggest that the obstacle of locating relevant genetic variation for particular problems in observational epidemiology will always be overcome but to point out that rapidly developing knowledge of human genomics will identify more variants that can serve as instruments for Mendelian randomization studies.
Conclusions: Mendelian Randomization, What It Is and What It Is Not Mendelian randomization is not predicated on the presumption that genetic variants are major determinants of health and disease within populations.
There are many cogent critiques of genetic reductionism and of the overselling of ‘‘discoveries’’ in genetics, and these reiterate obvious truths so clearly (albeit somewhat repetitively) that there is no need to repeat them here (e.g., Berkowitz, 1996; Baird, 2000; Holtzman, 2001; Strohman, 1993; Rose, 1995). Mendelian randomization does not depend upon there being ‘‘genes for’’ particular traits and certainly not in the strict sense of a gene for a trait being one that is maintained by selection because of its causal association with that trait (Kaplan & Pigliucci, 2001). The association of genotype with the environmentally modifiable factor that it proxies for will be, like most genotype–phenotype associations, contingent: it cannot be reduced to individual-level prediction but, within environmental limits, will pertain at a group level (Wolf, 1995). This is analogous to an RCT of antihypertensive agents, in which, at the collective level, the group randomized to active medication will have lower mean blood pressure than the group randomized to placebo, but at the individual level many participants randomized to active treatment will have higher blood pressure than many individuals randomized to placebo. Indeed, in the phenocopy/genocopy example of pellagra and Hartnup disease discussed in Box 9.1, only a minority of the Hartnup gene carriers develop symptoms, but at the group level they have a much greater tendency toward such symptoms and a shift in amino acid levels that reflects this (Scriver, Mahon, & Levy, 1987; Scriver, 1988). These group-level differences are what create the analogy between Mendelian randomization and RCTs, outlined in Figure 9.11. Finally, the associations that Mendelian randomization depends upon do need to pertain to a definable group at a particular time but do not need to be immutable. 
Thus, ALDH2 variation will not be related to alcohol consumption in a society where alcohol is not consumed, and the association will vary by gender and by cultural group and may change over time (Higuchi et al., 1994; Hasin et al., 2002). Within the setting of a study of a well-defined group, however, the genotype will be associated with group-level differences in alcohol consumption, and group assignment will not be associated with confounding variables.
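This group-level logic can be made concrete with a small simulation (all effect sizes here are hypothetical, not taken from the studies discussed): because genotype is assigned independently of confounders, dividing the genotype–outcome association by the genotype–exposure association (the Wald ratio, a standard instrumental-variable estimator used in Mendelian randomization analyses) recovers the causal effect of the exposure even when the ordinary exposure–outcome regression is badly confounded.

```python
import random

random.seed(42)

n = 50_000
beta = 0.5  # true causal effect of exposure on outcome (assumed for the sketch)

# Genotype: 0, 1, or 2 copies of the allele, independent of the confounder.
genotype = [random.randint(0, 1) + random.randint(0, 1) for _ in range(n)]
confounder = [random.gauss(0, 1) for _ in range(n)]
# Exposure depends on genotype and the confounder; the outcome depends on
# exposure, the same confounder, and noise -- the classic confounded setting.
exposure = [0.4 * g + 1.0 * u + random.gauss(0, 1)
            for g, u in zip(genotype, confounder)]
outcome = [beta * x + 1.0 * u + random.gauss(0, 1)
           for x, u in zip(exposure, confounder)]

def slope(xs, ys):
    # Ordinary least-squares slope of y on x.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

naive = slope(exposure, outcome)  # biased upward by the shared confounder
wald = slope(genotype, outcome) / slope(genotype, exposure)  # MR estimate
print(f"naive: {naive:.2f}, Wald ratio: {wald:.2f}, truth: {beta}")
```

The naive regression substantially overstates the causal effect, while the Wald ratio sits close to the assumed truth, for the same reason that comparing randomized arms of a trial is unbiased: group membership (here, genotype) is unrelated to the confounder.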
Mendelian Randomization and Genetic Epidemiology Critiques of contemporary genetic epidemiology often focus on two features of findings from genetic association studies: that the population attributable risk of the genetic variants is low and that, in any case, the influence of genetic factors is not reversible. Illustrating both criticisms, Terwilliger and Weiss (2003, p. 35) suggest as reasons for regarding many current claims in genetic epidemiology as hype (1) that alleles identified as increasing the risk of common diseases ‘‘tend to be
involved in only a small subset of all cases of such diseases’’ and (2) that, in any case, ‘‘while the concept of attributable risk is an important one for evaluating the impact of removable environmental factors, for non-removable genetic risk factors, it is a moot point.’’ These evaluations of the role of genetic epidemiology are not relevant when considering the potential contributions of Mendelian randomization. The approach is concerned not with the population attributable risk of any particular genetic variant but with the degree to which associations between the genetic variant and disease outcomes can demonstrate the importance of environmentally modifiable factors as causes of disease; for those factors, the population attributable risk is of direct relevance to public-health prioritization. Consider, for example, the case of familial hypercholesterolemia or familial defective apo B. The genetic mutations associated with these conditions account for only a trivial percentage of cases of CHD within the population (i.e., the population attributable risk is low). For example, in a Danish population, the frequency of familial defective apo B is 0.08% and, despite its sevenfold increased risk of CHD, it generates a population attributable risk of only 0.5% (Tybjaerg-Hansen, Steffensen, Meinertz, Schnohr, & Nordestgaard, 1998). However, by identifying blood cholesterol level as a causal factor for CHD, the triangular association between genotype, blood cholesterol, and CHD risk identifies an environmentally modifiable factor with a very high population attributable risk: assuming that 50% of the population have blood cholesterol raised above 6.0 mmol/l and that this is associated with a twofold relative risk, a population attributable risk of 33% is obtained. 
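The arithmetic behind both figures is Levin's formula for the population attributable risk, PAR = p(RR − 1) / [p(RR − 1) + 1], where p is the prevalence of the exposure and RR the relative risk. A short sketch reproducing the two numbers cited in the text:

```python
def population_attributable_risk(prevalence, relative_risk):
    """Levin's formula: PAR = p(RR - 1) / (p(RR - 1) + 1)."""
    excess = prevalence * (relative_risk - 1)
    return excess / (excess + 1)

# Familial defective apo B: frequency 0.08%, roughly sevenfold CHD risk.
print(f"{population_attributable_risk(0.0008, 7):.1%}")  # ~0.5%

# Raised blood cholesterol: 50% prevalence, twofold relative risk.
print(f"{population_attributable_risk(0.5, 2):.1%}")     # ~33%
```

The contrast makes the chapter's point numerically: the rare genotype itself explains almost no disease in the population, while the modifiable exposure it identifies as causal has a very large attributable fraction.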
The same logic applies to the other examples—the attributable risk of the genotype is low, but the population attributable risk of the modifiable environmental factor identified as causal through the genotype–disease associations is large. The same reasoning applies when considering the suggestion that, since genotype cannot be modified, genotype–disease associations are not of public-health importance (Terwilliger & Weiss, 2003). The point of Mendelian randomization approaches is not to attempt to modify genotype but to utilize genotype–disease associations to strengthen inferences regarding modifiable environmental risks for disease and then to reduce disease risk in the population through applying this knowledge. Mendelian randomization differs from other contemporary approaches to genetic epidemiology in that its central concern is not with the magnitude of genetic variants' influences on disease but, rather, with what the genetic associations tell us about environmentally modifiable causes of disease. Many years ago, in his Nobel Prize acceptance speech, the pioneering geneticist Thomas Hunt Morgan contrasted his views with the then popular genetic approach to disease, eugenics. He thought that ‘‘through public hygiene and protective measures of various kinds we can more successfully cope with
some of the evils that human flesh is heir to. Medical science will here take the lead—but I hope that genetics can at times offer a helping hand’’ (Morgan, 1935). More than seven decades later, it may at last be time for genetic research to directly strengthen the knowledge base of public health.
References Ames, B. N. (1999). Cancer prevention and diet: Help from single nucleotide polymorphisms. Proceedings of the National Academy of Sciences USA, 96, 12216–12218. Baird, P. (2000). Genetic technologies and achieving health for populations. International Journal of Health Services, 30, 407–424. Baron, D. N., Dent, C. E., Harris, H., Hart, E. W., & Jepson, J. B. (1956). Hereditary pellagra-like skin rash with temporary cerebellar ataxia, constant renal aminoaciduria, and other bizarre biochemical features. Lancet, 268, 421–429. Berkowitz, A. (1996). Our genes, ourselves? Bioscience, 46, 42–51. Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometric Bulletin, 2, 47–53. Bhatti, P., Sigurdson, A. J., Wang, S. S., Chen, J., Rothman, N., Hartge, P., et al. (2005). Genetic variation and willingness to participate in epidemiological research: Data from three studies. Cancer Epidemiology, Biomarkers and Prevention, 14, 2449–2453. Birge, S. J., Keutmann, H. T., Cuatrecasas, P., & Whedon, G. D. (1967). Osteoporosis, intestinal lactase deficiency and low dietary calcium intake. New England Journal of Medicine, 276, 445–448. Bolon, B., & Galbreath, E. (2002). Use of genetically engineered mice in drug discovery and development: Wielding Occam’s razor to prune the product portfolio. International Journal of Toxicology, 21, 55–64. Botto, L. D., & Yang, Q. (2000). 5,10-Methylenetetrahydrofolate reductase gene variants and congenital anomalies: A HuGE review. American Journal of Epidemiology, 151, 862–877. Bovet, P., & Paccaud, F. (2001). Alcohol, coronary heart disease and public health: Which evidence-based policy? International Journal of Epidemiology, 30, 734–737. Brennan, P. (2002). Gene environment interaction and aetiology of cancer: What does it mean and how can we measure it? Carcinogenesis, 23(3), 381–387. Broer, S., Cavanaugh, J. A., & Rasko, J. E. J. (2004). 
Neutral amino acid transport in epithelial cells and its malfunction in Hartnup disorder. Transporters, 33, 233–236. Burd, L. J. (2006). Interventions in FASD: We must do better. Child: Care, Health, and Development, 33, 398–400. Burr, M. L., Fehily, A. M., Butland, B. K., Bolton, C. H., & Eastham, R. D. (1986). Alcohol and high-density-lipoprotein cholesterol: A randomized controlled trial. British Journal of Nutrition, 56, 81–86. Cardon, L. R., & Bell, J. I. (2001). Association study designs for complex diseases. Nature Reviews: Genetics, 2, 91–99. Casas, J. P., Shah, T., Cooper, J., Hawe, E., McMahon, A. D., Gaffney, D., et al. (2006). Insight into the nature of the CRP–coronary event association using Mendelian randomization. International Journal of Epidemiology, 35, 922–931.
Chao, Y.-C., Liou, S.-R., Chung, Y.-Y., Tang, H.-S., Hsu, C.-T., Li, T.-K., et al. (1994). Polymorphism of alcohol and aldehyde dehydrogenase genes and alcoholic cirrhosis in Chinese patients. Hepatology, 19, 360–366. Chen, L., Davey Smith, G., Harbord, R., & Lewis, S. (2008). Alcohol intake and blood pressure: A systematic review implementing Mendelian randomization approach. PLoS Medicine, 5, e52. Cheverud, J. M. (1988). A comparison of genetic and phenotypic correlations. Evolution, 42, 958–968. Clayton, D., & McKeigue, P. M. (2001). Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet, 358, 1356–1360. Colhoun, H., McKeigue, P. M., & Davey Smith, G. (2003). Problems of reporting genetic associations with complex outcomes. Lancet, 361, 865–872. Correns, C. (1900). G. Mendel’s Regel über das Verhalten der Nachkommenschaft der Bastarde. Berichte der Deutschen Botanischen Gesellschaft, 8, 158–168. (English translation, Correns, C. [1966]. G. Mendel’s law concerning the behavior of progeny of varietal hybrids. In Stern and Sherwood [pp. 119–132]. New York: W. H. Freeman.) Czeizel, A. E., & Dudás, I. (1992). Prevention of the first occurrence of neural-tube defects by periconceptional vitamin supplementation. New England Journal of Medicine, 327, 1832–1835. Danesh, J., Wheeler, J. G., Hirschfield, G. M., Eda, S., Eiriksdottir, G., Rumley, A., et al. (2004). C-reactive protein and other circulating markers of inflammation in the prediction of coronary heart disease. New England Journal of Medicine, 350, 1387–1397. Davey Smith, G. (2006). Cochrane Lecture. Randomised by (your) god: Robust inference from an observational study design. Journal of Epidemiology and Community Health, 60, 382–388. Davey Smith, G., & Ebrahim, S. (2002). Data dredging, bias, or confounding [Editorial]. British Medical Journal, 325, 1437–1438. Davey Smith, G., & Ebrahim, S. (2003). 
‘‘Mendelian randomization’’: Can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology, 32, 1–22. Davey Smith, G., & Ebrahim, S. (2004). Mendelian randomization: Prospects, potentials, and limitations. International Journal of Epidemiology, 33, 30–42. Davey Smith, G., & Ebrahim, S. (2005). What can Mendelian randomization tell us about modifiable behavioural and environmental exposures. British Medical Journal, 330, 1076–1079. Davey Smith, G., Harbord, R., Milton, J., Ebrahim, S., & Sterne, J. A. C. (2005a). Does elevated plasma fibrinogen increase the risk of coronary heart disease? Evidence from a meta-analysis of genetic association studies. Arteriosclerosis, Thrombosis, and Vascular Biology, 25, 2228–2233. Davey Smith, G., & Hart, C. (2002). Lifecourse socioeconomic and behavioural influences on cardiovascular disease mortality: The Collaborative study. American Journal of Public Health, 92, 1295–1298. Davey Smith, G., Lawlor, D. A., Harbord, R., Timpson, N. J., Day, I., & Ebrahim, S. (2008). Clustered environments and randomized genes: A fundamental distinction between conventional and genetic epidemiology. PLoS Medicine, 4, 1985–1992. Davey Smith, G., Lawlor, D., Harbord, R., Timpson, N., Rumley, A., Lowe, G., et al. (2005b). Association of C-reactive protein with blood pressure and hypertension: Lifecourse confounding and Mendelian randomization tests of causality. Arteriosclerosis, Thrombosis, and Vascular Biology, 25, 1051–1056.
Davey Smith, G., & Phillips, A. N. (1996). Inflation in epidemiology: ‘‘The proof and measurement of association between two things’’ revisited. British Medical Journal, 312, 1659–1661. Davey Smith, G., Timpson, N., & Ebrahim, S. (2008). Strengthening causal inference in cardiovascular epidemiology through Mendelian randomization. Annals of Medicine, 40, 524–541. Debat, V., & David, P. (2001). Mapping phenotypes: Canalization, plasticity and developmental stability. Trends in Ecology and Evolution, 16, 555–561. Delanghe, J., Langlois, M., Duprez, D., De Buyzere, M., & Clement, D. (1999). Haptoglobin polymorphism and peripheral arterial occlusive disease. Atherosclerosis, 145, 287–292. Ebrahim, S., & Davey Smith, G. (2008). Mendelian randomization: Can genetic epidemiology help redress the failures of observational epidemiology? Human Genetics, 123, 15–33. Eidelman, R. S., Hollar, D., Hebert, P. R., Lamas, G. A., & Hennekens, C. H. (2004). Randomized trials of vitamin E in the treatment and prevention of cardiovascular disease. Archives of Internal Medicine, 164, 1552–1556. Enomoto, N., Takase, S., Yasuhara, M., & Takada, A. (1991). Acetaldehyde metabolism in different aldehyde dehydrogenase-2 genotypes. Alcoholism, Clinical and Experimental Research, 15, 141–144. Erichsen, H. C., Eck, P., Levine, M., & Chanock, S. (2001). Characterization of the genomic structure of the human vitamin C transporter SVCT1 (SLC23A2). Journal of Nutrition, 131, 2623–2627. Færgeman, O. (2003). Coronary artery disease: Genes, drugs and the agricultural connection. Amsterdam: Elsevier. Fallon, U. B., Ben-Shlomo, Y., & Davey Smith, G. (2001, March 14). Homocysteine and coronary heart disease. Heart. http://heart.bmjjournals.com/cgi/eletters/85/2/153 Garry, D. J., Ordway, G. A., Lorenz, J. N., Radford, E. R., Chin, R. W., Grange, R., et al. (1998). Mice without myoglobin. Nature, 395, 905–908. Gause, G. F. (1942). The relation of adaptability to adaptation. 
Quarterly Review of Biology, 17, 99–114. Gemma, S., Vichi, S., & Testai, E. (2007). Metabolic and genetic factors contributing to alcohol induced effects and fetal alcohol syndrome. Neuroscience and Biobehavioral Reviews, 31, 221–229. Gerlai, R. (2001). Gene targeting: Technical confounds and potential solutions in behavioural and brain research. Behavioural Brain Research, 125, 13–21. Gibson, G., & Wagner, G. (2000). Canalization in evolutionary genetics: A stabilizing theory? BioEssays, 22, 372–380. Glebart, W. M. (1998). Databases in genomic research. Science, 282, 659–661. Glynn, R. K. (2006). Genes as instruments for evaluation of markers and causes [Commentary]. International Journal of Epidemiology, 35, 932–934. Goldschmidt, R. B. (1938). Physiological genetics. New York: McGraw-Hill. Gray, R., & Wheatley, K. (1991). How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transplantation, 7(Suppl. 3), 9–12. Gu, J., Liang, D., Wang, Y., Lu, C., & Wu, X. (2005). Effects of N-acetyl transferase 1 and 2 polymorphisms on bladder cancer risk in Caucasians. Mutation Research, 581, 97–104. Gu, Z., Steinmetz, L. M., Gu, X., Scharfe, C., Davis, R. W., & Li, W.-H. (2003). Role of duplicate genes in genetic robustness against null mutations. Nature, 421:63–66.
Gutjahr, E., Gmel, G., & Rehm, J. (2001). Relation between average alcohol consumption and disease: An overview. European Addiction Research, 7, 117–127. Guy, J. T. (1993). Oral manifestations of systemic disease. In C. W. Cummings, J. Frederick, L. Harker, C. Krause, & D. Schuller (Eds.), Otolaryngology—head and neck surgery (Vol. 2). St. Louis: Mosby Year Book. Han, T. S., Sattar, N., Williams, K., Gonzalez-Villalpando, C., Lean, M. E., & Haffner, S. M. (2002). Prospective study of C-reactive protein in relation to the development of diabetes and metabolic syndrome in the Mexico City Diabetes Study. Diabetes Care, 25, 2016–2021. Hart, C., Davey Smith, G., Hole, D., & Hawthorne, V. (1999). Alcohol consumption and mortality from all causes, coronary heart disease, and stroke: Results from a prospective cohort study of Scottish men with 21 years of follow up. British Medical Journal, 318, 1725–1729. Hartman, J. L., Garvik, B., & Hartwell, L. (2001). Principles for the buffering of genetic variation. Science, 291, 1001–1004. Hasin, D., Aharonovich, E., Liu, X., Mamman, Z., Matseoane, K., Carr, L., et al. (2002). Alcohol and ADH2 in Israel: Ashkenazis, Sephardics, and recent Russian immigrants. American Journal of Psychiatry, 159(8), 1432–1434. Haskell, W. L., Camargo, C., Williams, P. T., Vranizan, K. M., Krauss, R. M., Lindgren, F. T., et al. (1984). The effect of cessation and resumption of moderate alcohol intake on serum high-density-lipoprotein subfractions. New England Journal of Medicine, 310, 805–810. Heart Protection Study Collaborative Group. (2002). MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20536 high-risk individuals: A randomised placebo-controlled trial. Lancet, 360, 23–33. Higuchi, S., Matsushita, S., Imazeki, H., Kinoshita, T., Takagi, S., & Kono, H. (1994). Aldehyde dehydrogenase genotypes in Japanese alcoholics. Lancet, 343, 741–742. Hirschfield, G. M., & Pepys, M. B. (2003). 
C-reactive protein and cardiovascular disease: New insights from an old molecule. Quarterly Journal of Medicine, 9, 793–807. Holtzman, N. A. (2001). Putting the search for genes in perspective. International Journal of Health Services, 31, 445. Honkanen, R., Pulkkinen, P., Järvinen, R., Kröger, H., Lindstedt, K., Tuppurainen, M., et al. (1996). Does lactose intolerance predispose to low bone density? A population-based study of perimenopausal Finnish women. Bone, 19, 23–28. Hornstein, E., & Shomron, N. (2006). Canalization of development by microRNAs. Nature Genetics, 38, S20–S24. Hu, F. B., Meigs, J. B., Li, T. Y., Rifai, N., & Manson, J. E. (2004). Inflammatory markers and risk of developing type 2 diabetes in women. Diabetes, 53, 693–700. Jablonka-Tavory, E. (1982). Genocopies and the evolution of interdependence. Evolutionary Theory, 6, 167–170. Jacobson, S. W., Carr, L. G., Croxford, J., Sokol, R. J., Li, T. K., & Jacobson, J. L. (2006). Protective effects of the alcohol dehydrogenase-ADH1B allele in children exposed to alcohol during pregnancy. Journal of Pediatrics, 148, 30–37. Jousilahti, P., & Salomaa, V. (2004). Fibrinogen, social position, and Mendelian randomisation. Journal of Epidemiology and Community Health, 58, 883. Juul, K., Tybjaerg-Hansen, A., Marklund, S., Heegaard, N. H. H., Steffensen, R., Sillesen, H., et al. (2004). Genetically reduced antioxidative protection and increased ischaemic heart disease risk: The Copenhagen City Heart Study. Circulation, 109, 59–65.
Kaplan, J. M., & Pigliucci, M. (2001). Genes ‘‘for’’ phenotypes: A modern history view. Biology and Philosophy, 16, 189–213. Katan, M. B. (1986). Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet, I, 507–508 (reprinted International Journal of Epidemiology, 2004, 34, 9). Kathiresan, S., Melander, O., Anevski, D., Guiducci, C., Burtt, N. P., Roos, C., et al. (2008). Polymorphisms associated with cholesterol and risk of cardiovascular events. New England Journal of Medicine, 358, 1240–1249. Keavney, B. (2002). Genetic epidemiological studies of coronary heart disease. International Journal of Epidemiology, 31, 730–736. Keavney, B., Danesh, J., Parish, S., Palmer, A., Clark, S., Youngman, L., et al.; International Studies of Infarct Survival (ISIS) Collaborators. (2006). Fibrinogen and coronary heart disease: Test of causality by ‘‘Mendelian randomization.’’ International Journal of Epidemiology, 35, 935–943. Kelada, S. N., Eaton, D. L., Wang, S. S., Rothman, N. R., & Khoury, M. J. (2003). The role of genetic polymorphisms in environmental health. Environmental Health Perspectives, 111, 1055–1064. Khaw, K.-T., Bingham, S., Welch, A., Luben, R., Wareham, N., Oakes, S., et al. (2001). Relation between plasma ascorbic acid and mortality in men and women in EPIC-Norfolk prospective study: A prospective population study. Lancet, 357, 657–663. Kitami, T., & Nadeau, J. H. (2002). Biochemical networking contributes more to genetic buffering in human and mouse metabolic pathways than does gene duplication. Nature Genetics, 32, 191–194. Klatsky, A. L. (2001). Could abstinence from alcohol be hazardous to your health [Commentary]? International Journal of Epidemiology, 30, 739–742. Kraut, J. A., & Sachs, G. (2005). Hartnup disorder: Unravelling the mystery. Trends in Pharmacological Sciences, 26, 53–55. Langlois, M. R., Delanghe, J. R., De Buyzere, M. L., Bernard, D. R., & Ouyang, J. (1997). Effect of haptoglobin on the metabolism of vitamin C. 
American Journal of Clinical Nutrition, 66, 606–610. Lawlor, D. A., Davey Smith, G., Kundu, D., Bruckdorfer, K. R., & Ebrahim, S. (2004). Those confounded vitamins: What can we learn from the differences between observational versus randomised trial evidence? Lancet, 363, 1724–1727. Lawlor, D. A., Ebrahim, S., Kundu, D., Bruckdorfer, K. R., Whincup, P. H., & Davey Smith, G. (2005). Vitamin C is not associated with coronary heart disease risk once life course socioeconomic position is taken into account: Prospective findings from the British Women’s Heart and Health Study. Heart, 91, 1086–1087. Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N., & Davey Smith, G. (2008). Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine, 27, 1133–1163. Leimar, O., Hammerstein, P., & Van Dooren, T. J. M. (2006). A new perspective on developmental plasticity and the principles of adaptive morph determination. American Naturalist, 167, 367–376. Lenz, W. (1973). Phenocopies. Journal of Medical Genetics, 10, 34–48. Lewis, S., & Davey Smith, G. (2005). Alcohol, ALDH2 and esophageal cancer: A meta-analysis which illustrates the potentials and limitations of a Mendelian randomization approach. Cancer Epidemiology, Biomarkers and Prevention, 14, 1967–1971. Li, R., Tsaih, S. W., Shockley, K., Stylianou, I. M., Wergedal, J., Paigen, B., et al. (2006). Structural model analysis of multiple quantitative traits. PLoS Genetics, 2, 1046–1057.
Lipp, H. P., Schwegler, H., Crusio, W. E., Wolfer, D. P., Leisinger-Trigona, M. C., Heimrich, B., et al. (1989). Using genetically-defined rodent strains for the identification of hippocampal traits relevant for two-way avoidance behaviour: A noninvasive approach. Experientia, 45, 845–859. Little, J., & Khoury, M. J. (2003). Mendelian randomization: A new spin or real progress? Lancet, 362, 930–931. Lower, G. M., Nilsson, T., Nelson, C. E., Wolf, H., Gamsky, T. E., & Bryan, G. T. (1979). N-Acetyltransferase phenotype and risk in urinary bladder cancer: Approaches in molecular epidemiology. Environmental Health Perspectives, 29, 71–79. MacMahon, S., Peto, R., Collins, R., Godwin, J., Cutler, J., et al. (1990). Blood pressure, stroke, and coronary heart disease. Lancet, 335, 765–774. Marks, D., Thorogood, M., Neil, H. A. W., & Humphries, S. E. (2003). A review on diagnosis, natural history and treatment of familial hypercholesterolaemia. Atherosclerosis, 168, 1–14. Marmot, M. (2001). Reflections on alcohol and coronary heart disease. International Journal of Epidemiology, 30, 729–734. McGrath, J. (1999). Hypothesis: Is low prenatal vitamin D a risk-modifying factor for schizophrenia? Schizophrenia Research, 40, 173–177. Memik, F. (2003). Alcohol and esophageal cancer, is there an exaggerated accusation? Hepatogastroenterology, 54, 1953–1955. Mendel, G. (1866). Experiments in plant hybridization. Retrieved from http://www.mendelweb.org/archive/Mendel.Experiments.txt Millen, A. E., Dodd, K. W., & Subar, A. F. (2004). Use of vitamin, mineral, nonvitamin, and nonmineral supplements in the United States: The 1987, 1992, and 2000 National Health Interview Survey results. Journal of the American Dietetic Association, 104, 942–950. Morange, M. (2001). The misunderstood gene. Cambridge, MA: Harvard University Press. Morgan, T. H. (1913). Heredity and sex. New York: Columbia University Press. Morgan, T. H. (1919). Physical basis of heredity. Philadelphia: J. B. 
Lippincott. Morgan, T. H. (1935). The relation of genetics to physiology and medicine. Scientific Monthly, 41, 5–18. MRC Vitamin Study Research Group. (1991). Prevention of neural tube defects: Results of the Medical Research Council vitamin study. Lancet, 338, 131–137. Mucci, L. A., Wedren, S., Tamimi, R. M., Trichopoulos, D., & Adami, H. O. (2001). The role of gene–environment interaction in the aetiology of human cancer: Examples from cancers of the large bowel, lung and breast. Journal of Internal Medicine, 249, 477–493. Newcomer, A. D., Hodgson, S. F., Douglas, M. D., & Thomas, P. J. (1978). Lactase deficiency: Prevalence in osteoporosis. Annals of Internal Medicine, 89, 218–220. Olby, R. C. (1966). Origins of Mendelism. London: Constable. Osier, M. V., Pakstis, A. J., Soodyall, H., Comas, D., Goldman, D., Odunsi, A., et al. (2002). A global perspective on genetic variation at the ADH genes reveals unusual patterns of linkage disequilibrium and diversity. American Journal of Human Genetics, 71, 84–99. Palmer, L., & Cardon, L. (2005). Shaking the tree: Mapping complex disease genes with linkage disequilibrium. Lancet, 366, 1223–1234. Perera, F. P. (1997). Environment and cancer: Who are susceptible? Science, 278, 1068–1073.
Pradhan, A. D., Manson, J. E., Rifai, N., Buring, J. E., & Ridker, P. M. (2001). C-reactive protein, interleukin 6, and risk of developing type 2 diabetes mellitus. Journal of the American Medical Association, 286, 327–334. Radimer, K., Bindewald, B., Hughes, J., Ervin, B., Swanson, C., & Picciano, M. F. (2004). Dietary supplement use by US adults: Data from the National Health and Nutrition Examination Survey, 1999–2000. American Journal of Epidemiology, 160, 339–349. Reynolds, K., Lewis, L. B., Nolen, J. D. L., Kinney, G. L., Sathya, B., & He, J. (2003). Alcohol consumption and risk of stroke: A meta-analysis. Journal of the American Medical Association, 289, 579–588. Ridker, P. M., Cannon, C. P., Morrow, D., Rifai, N., Rose, L. M., McCabe, C. H., et al. (2005). C-reactive protein levels and outcomes after statin therapy. New England Journal of Medicine, 352, 20–28. Rimm, E. (2001). Alcohol and coronary heart disease—laying the foundation for future work [Commentary]. International Journal of Epidemiology, 30, 738–739. Rimm, E. B., Stampfer, M. J., Ascherio, A., Giovannucci, E., Colditz, G. A., & Willett, W. C. (1993). Vitamin E consumption and the risk of coronary heart disease in men. New England Journal of Medicine, 328, 1450–1456. Roderic, T. H., Wimer, R. E., & Wimer, C. C. (1976). Genetic manipulation of neuroanatomical traits. In L. Petrinovich & J. L. McGaugh (Eds.), Knowing, thinking, and believing. New York: Plenum Press. Rose, G. (1982). Incubation period of coronary heart disease. British Medical Journal, 284, 1600–1601. Rose, S. (1995). The rise of neurogenetic determinism. Nature, 373, 380–382. Roseboom, T. J., van der Meulen, J. H., Osmond, C., Barker, D. J. P., Ravelli, A. C. J., Schroeder-Tanka, J. M., et al. (2000). Coronary heart disease after prenatal exposure to the Dutch famine, 1944–45. Heart, 84, 595–598. Rothman, N., Wacholder, S., Caporaso, N. E., Garcia-Closas, M., Buetow, K., & Fraumeni, J. F. (2001). 
The use of common genetic polymorphisms to enhance the epidemiologic study of environmental carcinogens. Biochimica et Biophysica Acta, 1471, C1–C10. Rutherford, S. L. (2000). From genotype to phenotype: Buffering mechanisms and the storage of genetic information. BioEssays, 22, 1095–1105. Scholl, T. O., & Johnson, W. G. (2000). Folic acid: Influence on the outcome of pregnancy. American Journal of Clinical Nutrition, 71(Suppl.), 1295S–1303S. Scientific Steering Committee on Behalf of the Simon Broome Register Group. (1991). Risk of fatal coronary heart disease in familial hypercholesterolaemia. British Medical Journal, 303, 893–896. Scriver, C. R. (1988). Nutrient–gene interactions: The gene is not the disease and vice versa. American Journal of Clinical Nutrition, 48, 1505–1509. Scriver, C. R., Mahon, B., & Levy, H. L. (1987). The Hartnup phenotype: Mendelian transport disorder, multifactorial disease. American Journal of Human Genetics, 40, 401–412. Sesso, D., Buring, J. E., Rifai, N., Blake, G. J., Gaziano, J. M., & Ridker, P. M. (2003). C-reactive protein and the risk of developing hypertension. Journal of the American Medical Association, 290, 2945–2951. Shaper, A. G. (1993). Alcohol, the heart, and health [Editorial]. American Journal of Public Health, 83, 799–801. Shastry, B. S. (1998). Gene disruption in mice: Models of development and disease. Molecular and Cellular Biochemistry, 181, 163–179.
250
Causality and Psychopathology
Sjo¨holm, A., & Nystro¨m, T. (2005). Endothelial inflammation in insulin resistance. Lancet, 365, 610–612. Slack, J. (1969). Risks of ischaemic heart disease in familial hyperlipoproteinaemic states. Lancet, 2, 1380–1382. Snyder, L. H. (1959). Fifty years of medical genetics. Science, 129, 7–13. Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101. Stampfer, M. J., Hennekens, C. H., Manson, J. E., Colditz, G. A., Rosner, B., & Willett, W. C. (1993). Vitamin E consumption and the risk of coronary disease in women. New England Journal of Medicine, 328, 1444–1449. Steinberg, D. (2004). Thematic review series. The pathogenesis of atherosclerosis. An interpretive history of the cholesterol controversy: part 1. Journal of Lipid Research, 45, 1583–1593. Steinberg, D. (2005). Thematic review series. The pathogenesis of atherosclerosis. An interpretive history of the cholesterol controversy: part II. The early evidence linking hypercholesterolemia to coronary disease in humans. Journal of Lipid Research, 46, 179–190. Strohman, R. C. (1993). Ancient genomes, wise bodies, unhealthy people: The limits of a genetic paradigm in biology and medicine. Perspectives in Biology and Medicine, 37, 112–145. Takagi, S., Iwai, N., Yamauchi, R., Kojima, S., Yasuno, S., Baba, T., et al. (2002). Aldehyde dehydrogenase 2 gene is a risk factor for myocardial infarction in Japanese men. Hypertension Research, 25, 677–681. Terwilliger, J. D., & Weiss, W. M. (2003). Confounding, ascertainment bias, and the blind quest for a genetic ‘‘fountain of youth.’’ Annals of Medicine, 35, 532–544. Thomas, D. C., & Conti, D. V. (2004). Commentary on the concept of ‘‘Mendelian randomization.’’ International Journal of Epidemiology, 33, 17–21. Thompson, W. D. (1991). Effect modification and the limits of biological inference from epidemiological data. Journal of Clinical Epidemiology, 44, 221–232. Thun, M. J., Peto, R., Lopez, A. 
D., Monaco, J. H., Henley, S. J., Heath, C. W., et al. (1997). Alcohol consumption and mortality among middle-aged and elderly U.S. adults. New England Journal of Medicine, 337, 1705–1714. Timpson, N. J., Lawlor, D. A., Harbord, R. M., Gaunt, T. R., Day, I. N. M., Palmer, L. J., et al. (2005). C-reactive protein and its role in metabolic syndrome: Mendelian randomization study. Lancet, 366:1954–1959. Timpson NJ, Forouhi NH, Brion M-J et al. (2010). Genetic variation at the SLC23A1 locus is associated with circulating levels of L-ascorbic acid (Vitamin C). Evidence from 5 independent studies with over 15000 participants. Am J Clin Nutr, on-line, 001: 10.3945/ajen.2010.29438. Tybjaerg-Hansen, A., Steffensen, R., Meinertz, H., Schnohr, P., & Nordestgaard, B. G. (1998). Association of mutations in the apolipoprotein B gene with hypercholesterolemia and the risk of ischemic heart disease. New England Journal of Medicine, 338, 1577–1584. Verma, S., Szmitko, P. E., & Ridker, P. M. (2005). C-reactive protein comes of age. Nature Clinical Practice, 2, 29–36. Waddington, C. H. (1942). Canalization of development and the inheritance of acquired characteristics. Nature, 150, 563–565.
10

Rare Variant Approaches to Understanding the Causes of Complex Neuropsychiatric Disorders

matthew w. state
The distinction between genetic variation that is present in more than 5% of the population (defined as common) and genetic variation that does not meet this threshold (defined as rare) is often lost in the discussion of psychiatric genetics. As a general proposition, the field has come to equate the hunt for common variants (or alleles) with the search for genes causing or contributing to psychiatric illness. Indeed, the majority of studies on mood disorders, autism, schizophrenia, obsessive–compulsive disorder, attention-deficit/hyperactivity disorder, and Tourette syndrome have restricted their analyses to the potential contribution of common alleles. Studies focusing on rare genetic mutations have, until quite recently, been viewed as outside the mainstream of efforts aimed at elucidating the biological substrates of serious psychopathology.

Both the implicit assumption that common alleles underlie the lion's share of risk for most common neuropsychiatric conditions and the notion that the most expeditious way to elucidate their biological bases will be to concentrate efforts on common alleles deserve careful scrutiny. Indeed, key findings across all of human genetics, including those within psychiatry, support the following alternative conclusions: (1) for disorders such as autism and schizophrenia, the study of rare variants already holds the most immediate promise for defining the molecular and cellular mechanisms of disease (McClellan, Susser, & King, 2007; O'Roak & State, 2008); (2) common variation will be found to carry much more modest risks than previously anticipated (Altshuler & Daly, 2007; Saxena et al., 2007); and (3) rare variation will account for substantial risk for common complex disorders, particularly for neuropsychiatric conditions with relatively early onset and a chronic course.

This chapter addresses the rare-variant genetic approach specifically with respect to mental illness. It first introduces the distinction between the key characteristics of common and rare genetic variation. It then briefly addresses the methodologies employed to demonstrate a causal or contributory role for genes in complex disease, focusing on how these approaches differ in their ability to detect and confirm the role of rare variation. The chapter then turns to the genetics of autism-spectrum disorders as a case study of the manner in which rare variants may contribute to the understanding of psychiatric genetics. Finally, the discussion concludes with a consideration of the implications of emerging genomic technologies for this process.
Genetic Variation

The search for "disease genes" is more precisely the search for disease-related genetic variation. Basic instructions are coded in DNA to create and sustain life; these instructions vary somewhat between individuals, creating a primary source of human diversity. Variation in these instructions is also thought to be largely responsible for differences in susceptibility to diseases influenced by genes.

Concretely, when individuals differ at the level of DNA, it is often with regard to the sequence of its four constituent parts, called "nucleotides" or "bases," which make up the DNA code: adenine (A), guanine (G), cytosine (C), and thymine (T). Within the human genome, variation at individual nucleotides appears quite frequently (approximately 1 in every 1,000 bases) (International Human Genome Sequencing Consortium, 2004; Lander et al., 2001; McPherson et al., 2001). The vast majority of this variation is related to an individual's ethnic origin and has no overt consequence for human disease. However, it is not known at present what proportion of the observed differences between individuals, either within or outside of the regions of the genome that specify the production of proteins (through the processes of transcription and translation), might confer subtle alterations in function. While elegant and inventive approaches are being employed to address this question, particularly with regard to "noncoding" DNA (Noonan, 2009; Prabhakar et al., 2008), the consequences of sequence variations identified in these regions remain difficult to interpret. Consequently, although only 2% of the genome is ultimately translated into protein, it is this subset that is most readily understood with regard to its impact on a phenotype of interest (International Human Genome Sequencing Consortium, 2004; Lander et al., 2001; McPherson et al., 2001).
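The scale of the variation described above can be made concrete with a few lines of arithmetic. In the Python sketch below, the 1-in-1,000 variation rate, the ~2% coding fraction, and the 5% common/rare convention come from the text; the ~3.2 Gb haploid genome length is an assumed round figure introduced for illustration, not a number given in this chapter.

```python
# Back-of-envelope illustration of the variation figures quoted above.
# GENOME_LENGTH_BP is an assumed round figure; the other constants
# restate numbers from the text.

GENOME_LENGTH_BP = 3_200_000_000   # assumed haploid genome size (~3.2 Gb)
VARIANT_RATE = 1 / 1_000           # ~1 variable site per 1,000 bases
CODING_FRACTION = 0.02             # ~2% of the genome is translated

expected_variants = GENOME_LENGTH_BP * VARIANT_RATE
expected_coding_variants = expected_variants * CODING_FRACTION

print(f"expected variable sites genome-wide: ~{expected_variants:,.0f}")
print(f"of which in protein-coding sequence: ~{expected_coding_variants:,.0f}")

def classify_by_frequency(minor_allele_freq, common_threshold=0.05):
    """Label a variant 'common' or 'rare' under the 5% convention:
    common if present in more than 5% of the population."""
    return "common" if minor_allele_freq > common_threshold else "rare"

print(classify_by_frequency(0.20))   # common
print(classify_by_frequency(0.001))  # rare
```

Under these assumptions, a genome harbors on the order of a few million variable sites, of which only tens of thousands fall within protein-coding sequence, which is why that 2% subset dominates functional interpretation.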
The terminology applied to genetic variation may be somewhat confusing due to a number of redundant or loosely defined terms. While a threshold of 5% is often used as the cutoff for rare variation, many authors also distinguish between these and very rare (