Tutorials in Biostatistics
Tutorials in Biostatistics Volume 2: Statistical Modelling of Complex Medical Data Edited by R. B. D’Agostino, Boston University, USA
Copyright © 2004
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England Telephone
(+44) 1243 779777
Email (for orders and customer service enquiries):
[email protected] Visit our Home Page on www.wileyeurope.com or www.wiley.com All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to
[email protected], or faxed to (+44) 1243 770571. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought. Other Wiley Editorial Offices John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809 John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1 British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 0-470-02370-8 Typeset by Macmillan India Ltd Printed and bound in Great Britain by Page Bros, Norwich This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.
Contents

Preface
Preface to Volume 2

Part I MODELLING A SINGLE DATA SET

1.1 Clustered Data
Extending the Simple Linear Regression Model to Account for Correlated Responses: An Introduction to Generalized Estimating Equations and Multi-Level Mixed Modelling. Paul Burton, Lyle Gurrin and Peter Sly

1.2 Hierarchical Modelling
An Introduction to Hierarchical Linear Modelling. Lisa M. Sullivan, Kimberly A. Dukes and Elena Losina
Multilevel Modelling of Medical Data. Harvey Goldstein, William Browne and Jon Rasbash
Hierarchical Linear Models for the Development of Growth Curves: An Example with Body Mass Index in Overweight/Obese Adults. Moonseong Heo, Myles S. Faith, John W. Mott, Bernard S. Gorman, David T. Redden and David B. Allison

1.3 Mixed Models
Using the General Linear Mixed Model to Analyse Unbalanced Repeated Measures and Longitudinal Data. Avital Cnaan, Nan M. Laird and Peter Slasor
Modelling Covariance Structure in the Analysis of Repeated Measures Data. Ramon C. Littell, Jane Pendergast and Ranjini Natarajan
Covariance Models for Nested Repeated Measures Data: Analysis of Ovarian Steroid Secretion Data. Taesung Park and Young Jack Lee

1.4 Likelihood Modelling
Likelihood Methods for Measuring Statistical Evidence. Jeffrey D. Blume

Part II MODELLING MULTIPLE DATA SETS: META-ANALYSIS
Meta-Analysis: Formulating, Evaluating, Combining, and Reporting. Sharon-Lise T. Normand
Advanced Methods in Meta-Analysis: Multivariate Approach and Meta-Regression. Hans C. van Houwelingen, Lidia R. Arends and Theo Stijnen

Part III MODELLING GENETIC DATA: STATISTICAL GENETICS
Genetic Epidemiology: A Review of the Statistical Basis. E. A. Thompson
Genetic Mapping of Complex Traits. Jane M. Olson, John S. Witte and Robert C. Elston
A Statistical Perspective on Gene Expression Data Analysis. Jaya M. Satagopan and Katherine S. Panageas

Part IV DATA REDUCTION OF COMPLEX DATA SETS
Statistical Approaches to Human Brain Mapping by Functional Magnetic Resonance Imaging. Nicholas Lange
Disease Map Reconstruction. Andrew B. Lawson

Part V SIMPLIFIED PRESENTATION OF MULTIVARIATE DATA
Presentation of Multivariate Data for Clinical Use: The Framingham Study Risk Score Functions. Lisa M. Sullivan, Joseph M. Massaro and Ralph B. D’Agostino, Sr.

Index
Preface

The development and use of statistical methods has grown exponentially over the last two decades. Nowhere is this more evident than in their application to biostatistics and, in particular, to clinical medical research. To keep abreast of the rapid pace of development, the journal Statistics in Medicine alone is published 24 times a year. Here and in other journals, books and professional meetings, new theory and methods are constantly presented. However, the transitions of the new methods to actual use are not always as rapid. There are problems and obstacles. In such an applied interdisciplinary field as biostatistics, in which even the simplest study often involves teams of researchers with varying backgrounds and which can generate massive complicated data sets, new methods, no matter how powerful and robust, are of limited use unless they are clearly understood by practitioners, both clinical and biostatistical, and are available with well-documented computer software. In response to these needs Statistics in Medicine initiated in 1996 the inclusion of tutorials in biostatistics. The main objective of these tutorials is to generate, in a timely manner, brief well-written articles on biostatistical methods; these should be complete enough so that the methods presented are accessible to a broad audience, with sufficient information given to enable readers to understand when the methods are appropriate, to evaluate applications and, most importantly, to use the methods in their own research. At first tutorials were solicited from major methodologists. Later, both solicited and unsolicited articles were, and are still, developed and published. In all cases major researchers, methodologists and practitioners wrote and continue to write the tutorials. Authors are guided by four goals. The first is to develop an introduction suitable for a well-defined audience (the broader the audience the better).
The second is to supply sufficient references to the literature so that the readers can go beyond the tutorial to find out more about the methods. The referenced literature is, however, not expected to constitute a major literature review. The third goal is to supply sufficient computer examples, including code and output, so that the reader can see what is needed to implement the methods. The final goal is to make sure the reader can judge applications of the methods and apply the methods. The tutorials have become extremely popular and heavily referenced, attesting to their usefulness. To further enhance their availability and usefulness, we have gathered a number of these tutorials and present them in this two-volume set. Each volume has a brief preface introducing the reader to the aims and contents of the tutorials. Here we present an even briefer summary. We have arranged the tutorials by subject matter, starting in Volume 1 with 18 tutorials on statistical methods applicable to clinical studies, both observational studies and controlled clinical trials. Two tutorials discussing the computation of epidemiological rates such as prevalence, incidence and lifetime rates for cohort studies and capture–recapture settings begin the volume. Propensity score adjustment methods and agreement statistics such as the kappa statistic are dealt with in the next two tutorials. A series of tutorials on survival analysis methods applicable to observational study data are next. We then present five tutorials on the development of prognostic or clinical prediction models. Finally, there are six tutorials on clinical trials. These range from designing and analysing dose response studies and Bayesian data monitoring to analysis of longitudinal data and generating simple summary statistics from longitudinal data. All these are in the context of clinical trials. In all tutorials, the reader is given guidance on the proper use of methods. The subject-matter headings of Volume 1 are, we believe, appropriate to the methods. The tutorials are, however, often broader. For example, the tutorials on the kappa statistic and survival analysis are useful not only for observational studies, but also for controlled clinical studies. The reader will, we believe, quickly see the breadth of the methods.

Volume 2 contains 16 tutorials devoted to the analysis of complex medical data. First, we present tutorials relevant to single data sets. Seven tutorials give extensive introductions to and discussions of generalized estimating equations, hierarchical modelling and mixed modelling. A tutorial on likelihood methods closes the discussion of single data sets. Next, two extensive tutorials cover the concepts of meta-analysis, ranging from the simplest conception of a fixed effects model to random effects models, Bayesian modelling and highly involved models involving multivariate regression and meta-regression. Genetic data methods are covered in the next three tutorials. Statisticians must become familiar with the issues and methods relevant to genetics. These tutorials offer a good starting point. The next two tutorials deal with the major task of data reduction for functional magnetic resonance imaging data and disease mapping data, covering the complex data methods required by multivariate data. Complex and thorough statistical analyses are of no use if researchers cannot present results in a meaningful and usable form to audiences beyond those who understand statistical methods and complexities. Readers should find the methods for presenting such results discussed in the final tutorial simple to understand.
Before closing this preface to the two volumes we must state a disclaimer. Not all the tutorials that are in these two volumes appeared as tutorials. Three were regular articles. These are in the spirit of tutorials and fit well within the theme of the volumes. We hope that readers enjoy the tutorials and find them beneficial and useful.

RALPH B. D’AGOSTINO, SR.
EDITOR
Boston University
Harvard Clinical Research Institute
Preface to Volume 2

The 16 tutorials in this volume address the statistical modelling of complex medical data. Here we have topics covering data involving correlations among subjects in addition to hierarchical or covariance structures within a single data set, multiple data sets requiring meta-analyses, complex genetic data, and massive data sets resulting from functional magnetic resonance imaging and disease mapping. Here we briefly mention the general themes of the five parts of the volume and the articles within them. Part I is concerned with modelling a single data set. Section 1.1, on clustered data, presents a tutorial by Burton, Gurrin, and Sly which is an introduction to methods dealing with correlations among subjects in a single data set. The generalized estimating equations method and the multilevel mixed model methods are nicely introduced as extensions of linear regression. This is a wonderful introduction to these important topics. Section 1.2 deals with hierarchical modelling and contains three tutorials. The first, by Sullivan, Dukes and Losina, is an excellent introduction to the basic concepts of hierarchical models, and gives many examples clarifying the models and their computational aspects. Next is an article by Goldstein, Browne and Rasbash which presents more sophisticated hierarchical modelling methods. These two tutorials are among the most popular in the series of tutorials. The last tutorial in this section, by Heo, Faith, Mott, Gorman, Redden and Allison, is a carefully developed example of hierarchical modelling methods applied to the development of growth curves. Section 1.3 is on the major area of mixed models. Three major tutorials are included here. The first is by Cnaan, Laird and Slasor, the second by Littell, Pendergast and Natarajan, and the third by Park and Lee.
These tutorials address the complexity of modelling mixed model data and covariance structures where there are longitudinal data measured, possibly, at unequal intervals and with missing data. All contain extensive examples with ample computer and visual analyses. Further, they carefully illustrate the use of major computer software such as SAS (Proc Mixed) and BMDP. These tutorials are major tools for learning about these methods and understanding how to use them. Section 1.4 contains a single article by Blume on the use of the likelihood ratio for measuring the strength of statistical evidence. This clarifies the basic concepts of modelling data and illustrates the importance and central role of the likelihood model. Part II, ‘Modelling Multiple Data Sets: Meta-Analysis’, contains two tightly written articles. These tutorials cover the concepts of meta-analysis ranging from the simplest conception of a fixed effects model to random effects models, Bayesian modelling and highly involved models involving multivariate analysis and meta-regression. The first article, by Normand, deals with formulating the meta-analysis problem, evaluating the data available for a meta-analysis, combining data for the meta-analysis and reporting the meta-analysis. She presents fixed effects and random effects models as well as three modes of inference: maximum likelihood, restricted maximum likelihood and Bayesian. The second article is by van Houwelingen, Arends and Stijnen and carefully describes more sophisticated methods such as multivariate methods and meta-regression. These two articles constitute a major review of meta-analysis.
Genetic concepts and analyses permeate almost all clinical problems today, and statisticians must become familiar with the issues and methods relevant to genetics. The three tutorials in Part III, ‘Modelling Genetic Data: Statistical Genetics’, should offer a good start. The first, by Thompson, reviews the statistical basis for genetic epidemiology. This article did not appear as a tutorial and pre-dates the tutorials by 10 years. It is, however, an excellent introduction to the topic and fits well with the mission of the tutorials. The next two tutorials cover genetic mapping of complex traits (by Olson, Witte and Elston) and a statistical perspective on gene expression data analysis (by Satagopan and Panageas). The former discusses methods for finding genes that contribute to complex human traits. The latter involves the analysis of microarray data. These methods deal with the simultaneous analysis of several thousand genes. Both contain careful development of concepts such as linkage analysis, transmission/disequilibrium tests and gene expression. Both set out the complexities of dealing with large genetics data sets generated from potentially a small number of subjects. The second tutorial further provides S-Plus and SAS computer examples and code. Part IV is on data reduction of complex data sets, and consists of two important tutorials. The first, by Lange, deals with the data reduction and analysis methods related to functional magnetic resonance imaging. This is among the longest tutorials and carefully reviews the vocabulary and methods. It supplies an excellent introduction to this important topic that will occupy more and more of the statistical community’s time. The second tutorial, by Lawson, is on disease map construction. It focuses on identifying the major issues in disease variations and the spatial analysis techniques aimed at good presentation of disease mappings with minimal noise.
Part V, on simplified presentation of multivariate data, contains one tutorial by Sullivan, Massaro and D’Agostino. The premise of the tutorial is that complex and thorough statistical analyses are of no use if researchers cannot present results in a meaningful and usable form to a broad audience, an audience beyond those who understand statistical methods. The methods for presenting such results discussed in the tutorial are simple to understand. They were developed to help the Framingham Heart Study present its findings to the medical community. The methods reduce multivariate models to simple scoring algorithms. Again, we hope that readers will find these tutorials enjoyable and useful.
Part I MODELLING A SINGLE DATA SET
1.1 Clustered Data
Tutorials in Biostatistics Volume 2: Statistical Modelling of Complex Medical Data. Edited by R. B. D’Agostino. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-02370-8
P. BURTON, L. GURRIN AND P. SLY
EXTENDING THE SIMPLE LINEAR REGRESSION MODEL
1.2 Hierarchical Modelling
L. SULLIVAN, K. DUKES AND E. LOSINA
AN INTRODUCTION TO HIERARCHICAL LINEAR MODELLING
TUTORIAL IN BIOSTATISTICS

Multilevel modelling of medical data

Harvey Goldstein∗,†, William Browne and Jon Rasbash
Institute of Education, University of London, London, U.K.
SUMMARY

This tutorial presents an overview of multilevel or hierarchical data modelling and its applications in medicine. A description of the basic model for nested data is given and it is shown how this can be extended to fit flexible models for repeated measures data and more complex structures involving cross-classifications and multiple membership patterns within the software package MLwiN. A variety of response types are covered and both frequentist and Bayesian estimation methods are described. Copyright © 2002 John Wiley & Sons, Ltd.

KEY WORDS: complex data structures; mixed model; multilevel model; random effects model; repeated measures
1. SCOPE OF TUTORIAL

The tutorial covers the following topics:

1. The nature of multilevel models with examples.
2. Formal model specification for the basic Normal (nested structure) linear multilevel model with an example.
3. The MLwiN software.
4. More complex data structures: complex variance, multivariate models and cross-classified and multiple membership models.
5. Discrete response models, including Poisson, binomial and multinomial error distributions.
6. Specific application areas including survival models, repeated measures models, spatial models and meta analysis.
7. Estimation methods, including maximum and quasi likelihood, and MCMC.

Further information about multilevel modelling and software details can be obtained from the web site of the Multilevel Models Project, http://multilevel.ioe.ac.uk/.

∗ Correspondence to: Harvey Goldstein, Institute of Education, University of London, 20 Bedford Way, London WC1H 0AL, U.K.
† E-mail: [email protected]
2. THE NATURE OF MULTILEVEL MODELS

Traditional statistical models were developed making certain assumptions about the nature of the dependency structure among the observed responses. Thus, in the simple regression model $y_i = \beta_0 + \beta_1 x_i + e_i$ the standard assumption is that the $y_i$ given $x_i$ are independently identically distributed (i.i.d.), and the same assumption holds also for generalized linear models. In many real life situations, however, we have data structures, whether observed or by design, for which this assumption does not hold. Suppose, for example, that the response variable is the birthweight of a baby and the predictor is, say, maternal age, and data are collected from a large number of maternity units located in different physical and social environments. We would expect that the maternity units would have different mean birthweights, so that knowledge of the maternity unit already conveys some information about the baby. A more suitable model for these data is now

$$
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij} \tag{1}
$$

where we have added another subscript to identify the maternity unit and included a unit-specific effect $u_j$ to account for mean differences amongst units. If we assume that the maternity units are randomly sampled from a population of units, then the unit-specific effect is a random variable and (1) becomes a simple example of a two-level model. Its complete specification, assuming Normality, can be written as follows:

$$
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij} \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2) \\
\operatorname{cov}(u_j, e_{ij}) &= 0 \\
\operatorname{cov}(y_{i_1 j}, y_{i_2 j} \mid x_{ij}) &= \sigma_u^2 \geq 0
\end{aligned}
\tag{2}
$$

where $i_1, i_2$ are two births in the same unit $j$ with, in general, a positive covariance between the responses. This lack of independence, arising from two sources of variation at different levels of the data hierarchy (births and maternity units), contradicts the traditional linear model assumption and leads us to consider a new class of models. Model (2) can be elaborated in a number of directions, including the addition of further covariates or levels of nesting. An important direction is where the coefficient $\beta_1$ (and any further coefficients) is allowed to have a random distribution. Thus, for example, the age relationship may vary across clinics and, with a slight generalization of notation, we may now write (2) as

$$
\begin{aligned}
y_{ij} &= \beta_{0ij} x_{0ij} + \beta_{1j} x_{1ij} \\
\beta_{0ij} &= \beta_0 + u_{0j} + e_{0ij} \\
\beta_{1j} &= \beta_1 + u_{1j} \\
x_{0ij} &= 1 \\
\operatorname{var}(u_{0j}) &= \sigma_{u0}^2, \qquad \operatorname{cov}(u_{0j}, u_{1j}) = \sigma_{u01}, \qquad \operatorname{var}(u_{1j}) = \sigma_{u1}^2 \\
\operatorname{var}(e_{0ij}) &= \sigma_{e0}^2
\end{aligned}
\tag{3}
$$

and in later sections we shall introduce further elaborations. The regression coefficients $\beta_0$, $\beta_1$ are usually referred to as ‘fixed parameters’ of the model and the set of variances and covariances as the random parameters. Model (3) is often referred to as a ‘random coefficient’ or ‘mixed’ model. At this point we note that we can introduce prior distributions for the parameters of (3), so allowing Bayesian models. We leave this topic, however, for a later section where we discuss MCMC estimation.

Another, instructive, example of a two-level data structure for which a multilevel model provides a powerful tool is that of repeated measures data. If we measure the weight of a sample of babies after birth at successive times then the repeated occasion of measurement becomes the lowest level unit of a two-level hierarchy where the individual baby is the level-2 unit. In this case model (3) would provide a simple description with $x_{1ij}$ being time or age. In practice linear growth will be an inadequate description and we would wish to fit at least a (spline) polynomial function, or perhaps a non-linear function where several coefficients varied randomly across individual babies, that is, each baby has its own growth pattern. We shall return to this example in more detail later, but for now note that an important feature of such a characterization is that it makes no particular requirements for every baby to be measured at the same time points or for the time points to be equally spaced.

The development of techniques for specifying and fitting multilevel models since the mid-1980s has produced a very large class of useful models. These include models with discrete responses, multivariate models, survival models, time series models etc. In this tutorial we cannot cover the full range but will give references to existing and ongoing work that readers may find helpful.
In addition the introductory book by Snijders and Bosker [1] and the edited collection of health applications by Leyland and Goldstein [2] may be found useful by readers. A detailed introduction to the two-level model with worked examples and discussion of hypothesis tests and basic estimation techniques is given in an earlier tutorial [3] that also gives details of two computer packages, HLM and SAS, that can perform some of the analyses we describe in the present tutorial. The MLwiN software has been specifically developed for fitting very large and complex models, using both frequentist and Bayesian estimation and it is this particular set of features that we shall concentrate on.
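The within-unit covariance implied by the two-level variance components model of this section can be checked numerically. The following is a minimal sketch, not taken from the tutorial: it simulates births nested in maternity units under a random-intercept model with Normal unit effects and residuals, then estimates the covariance between two births in the same unit. The function name, the parameter values and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def simulate_two_level(n_units, n_per_unit, beta0, beta1, sigma_u, sigma_e, rng):
    """Draw data from y_ij = beta0 + beta1 * x_ij + u_j + e_ij (random intercept)."""
    unit = np.repeat(np.arange(n_units), n_per_unit)   # level-2 identifier j
    x = rng.uniform(20.0, 40.0, size=unit.size)        # hypothetical maternal age
    u = rng.normal(0.0, sigma_u, size=n_units)         # unit effects u_j
    e = rng.normal(0.0, sigma_e, size=unit.size)       # level-1 residuals e_ij
    y = beta0 + beta1 * x + u[unit] + e
    return unit, x, y

# Hypothetical parameter values chosen only for illustration.
rng = np.random.default_rng(0)
n_units, beta0, beta1, sigma_u, sigma_e = 2000, 3.0, 0.01, 0.3, 0.5
unit, x, y = simulate_two_level(n_units, 2, beta0, beta1, sigma_u, sigma_e, rng)

# Subtract the (known) fixed part and pair the two births within each unit.
resid = (y - (beta0 + beta1 * x)).reshape(n_units, 2)
within_unit_cov = np.mean(resid[:, 0] * resid[:, 1])   # estimates sigma_u^2
total_var = np.mean(resid ** 2)                        # estimates sigma_u^2 + sigma_e^2
icc = within_unit_cov / total_var                      # intra-unit correlation
```

With these settings the estimated covariance should be close to the between-unit variance of 0.09, and the ratio `icc` gives the share of total variance attributable to units.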
3. MARGINAL VERSUS HIERARCHICAL MODELS

At this stage it is worth emphasizing the distinction between multilevel models and so-called ‘marginal’ models such as the GEE model [4, 5]. When dealing with hierarchical data these latter models typically start with a formulation for the covariance structure, for example, but not necessarily based upon a multilevel structure such as (3), and aim to provide estimates with acceptable properties only for the fixed parameters in the model, treating the existence of any random parameters as a necessary ‘nuisance’ and without providing explicit estimates for them. More specifically, the estimation procedures used in marginal models are known to have useful asymptotic properties in the case where the exact form of the random structure is unknown. If interest lies only in the fixed parameters, marginal models may be useful. Even here, however, they may be inefficient if they utilize a covariance structure that is substantially incorrect. They are, however, generally more robust than multilevel models to serious mis-specification of the covariance structure [6]. Fundamentally, however, marginal models address different research questions. From a multilevel perspective, the failure explicitly to model the covariance structure of complex data is to ignore information about variability that, potentially, is as important as knowledge of the average or fixed effects. Thus, in the simple repeated measures example of baby weights, knowledge of how individual growth rates vary between babies, possibly differentially according to, say, demographic factors, will be important data and in a later section we will show how such information can be used to provide efficient predictions in the case of human growth. When we discuss discrete response multilevel models we will show how to obtain information equivalent to that obtained from marginal models. Apart from that the remainder of this paper will be concerned with multilevel models. For a further discussion of the limitations of marginal models see the paper by Lindsey and Lambert [7].
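To make the marginal approach concrete, the sketch below treats the simplest case. For a Normal response with identity link, a GEE fit with an independence working correlation gives the ordinary least squares point estimates, and inference on the fixed parameters then rests on a cluster-robust ‘sandwich’ covariance. The code is an illustrative assumption, not the procedure of references [4, 5]; the function name and simulated data are hypothetical.

```python
import numpy as np

def ols_with_cluster_robust_se(y, X, cluster):
    """OLS (independence working covariance) with naive and sandwich standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Naive covariance: treats all residuals as i.i.d.
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    naive = sigma2 * xtx_inv
    # Sandwich 'meat': sum over clusters of (X_j^T r_j)(X_j^T r_j)^T.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for j in np.unique(cluster):
        idx = cluster == j
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    robust = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(naive)), np.sqrt(np.diag(robust))

# Hypothetical clustered data: the covariate is constant within each cluster.
rng = np.random.default_rng(2)
n_clusters, n_per = 200, 5
cluster = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=n_clusters)[cluster]
y = 1.0 + 0.5 * x + rng.normal(size=n_clusters)[cluster] + rng.normal(size=cluster.size)
X = np.column_stack([np.ones_like(x), x])
beta_hat, naive_se, robust_se = ols_with_cluster_robust_se(y, X, cluster)
```

In this setting the naive standard error of the slope understates the uncertainty because it ignores the within-cluster correlation, while the sandwich version remains valid without any estimate of the random parameters. That is also why the approach yields no information about between-cluster variability.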
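4. ESTIMATION FOR THE MULTIVARIATE NORMAL MODEL

We write the general Normal two-level model as follows, with natural extensions to three or more levels:

$$
\begin{aligned}
Y &= X\beta + E \\
Y &= \{y_{ij}\}, \qquad X = \{X_{ij}\}, \qquad X_{ij} = \{x_{0ij}, x_{1ij}, \ldots, x_{pij}\} \\
E &= E^{(2)} + E^{(1)} \\
E^{(2)} &= \{E_j^{(2)}\}, \qquad E_j^{(2)} = z_j^{(2)} e_j^{(2)}, \qquad z_j^{(2)} = \{z_{ij}^{(2)}\} \\
z_{ij}^{(2)} &= \{z_{0j}^{(2)}, z_{1j}^{(2)}, \ldots, z_{q_2 j}^{(2)}\}, \qquad e_j^{(2)} = \{e_{0j}^{(2)}, e_{1j}^{(2)}, \ldots, e_{q_2 j}^{(2)}\}^T \\
E^{(1)} &= \{E_{ij}^{(1)}\}, \qquad E_{ij}^{(1)} = z_{ij}^{(1)} e_{ij}^{(1)} \\
z_{ij}^{(1)} &= \{z_{0j}^{(1)}, z_{1j}^{(1)}, \ldots, z_{q_1 j}^{(1)}\}, \qquad e_{ij}^{(1)} = \{e_{0ij}^{(1)}, e_{1ij}^{(1)}, \ldots, e_{q_1 ij}^{(1)}\}^T \\
e^{(2)} &= \{e_j^{(2)}\}, \qquad e_j^{(1)} = \{e_{ij}^{(1)}\} \\
e^{(2)} &\sim N(0, \Omega_2), \qquad e_j^{(1)} \sim N(0, \Omega_{1j}) \qquad [\text{typically } \Omega_{1j} = \Omega_1] \\
E\bigl(e_{hj}^{(2)} e_{h'j'}^{(2)}\bigr)_{j \neq j'} &= E\bigl(e_{hij}^{(1)} e_{h'i'j}^{(1)}\bigr)_{i \neq i'} = E\bigl(e_{hj}^{(2)} e_{h'i'j'}^{(1)}\bigr) = 0
\end{aligned}
$$

yields the block diagonal structure

$$
\begin{aligned}
V &= E(\tilde{Y}\tilde{Y}^T) = \bigoplus_j \,(V_{2j} + V_{1j}) \\
\tilde{Y} &= Y - X\beta \\
V_{2j} &= z_j^{(2)} \Omega_2 z_j^{(2)T}, \qquad V_{1j} = \bigoplus_i z_{ij}^{(1)} \Omega_{1j} z_{ij}^{(1)T}
\end{aligned}
\tag{4}
$$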
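The block diagonal structure of $V$ can be illustrated with a small numerical sketch. It assumes a random intercept and slope at level 2 and a single level-1 variance, and builds one block per level-2 unit before forming the direct sum. The helper names, the covariance values and the measurement times are hypothetical and chosen only for illustration.

```python
import numpy as np

def level2_block(z2, omega2, z1_rows, omega1):
    """V_2j + V_1j for one level-2 unit j."""
    v2 = z2 @ omega2 @ z2.T                        # z_j^(2) Omega_2 z_j^(2)T
    # Each level-1 unit contributes a scalar z_ij^(1) Omega_1 z_ij^(1)T,
    # so the direct sum over i is a diagonal matrix.
    v1 = np.diag([float(z @ omega1 @ z) for z in z1_rows])
    return v2 + v1

def block_diagonal(blocks):
    """Direct sum of square blocks."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

omega2 = np.array([[1.0, 0.2], [0.2, 0.5]])        # hypothetical level-2 covariance matrix
omega1 = np.array([[0.25]])                        # hypothetical level-1 variance
times = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0])]  # unbalanced occasions

blocks = []
for t in times:
    z2 = np.column_stack([np.ones_like(t), t])     # intercept and time random at level 2
    z1_rows = [np.array([1.0]) for _ in t]         # constant only at level 1
    blocks.append(level2_block(z2, omega2, z1_rows, omega1))
V = block_diagonal(blocks)                         # 5 x 5, zero between units
```

Note that the two units have different numbers of occasions; nothing in the construction requires balance.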
In this formulation we allow any number of random effects or coefficients at each level; we shall discuss the interpretation of multiple level-1 random coefficients in a later section. A number of efficient algorithms are available for obtaining maximum likelihood (ML) estimates for (4). One [8] is an iterative generalized least squares procedure (IGLS) that will also produce restricted maximum likelihood estimates (RIGLS or REML) and is formally equivalent to a Fisher scoring algorithm [9]. Note that RIGLS or REML should be used in small samples to correct for the underestimation of IGLS variance estimates. The EM algorithm can also be used [10, 11]. Our examples use RIGLS (REML) estimates as implemented in the MLwiN software package [12] and we will also discuss Bayesian models. A simple description of the IGLS algorithm is as follows. From (4) we have

$$
V = E(\tilde{Y}\tilde{Y}^T) = \bigoplus_j \,(V_{2j} + V_{1j}), \qquad \tilde{Y} = Y - X\beta
$$

The IGLS algorithm proceeds by first carrying out a GLS estimation for the fixed parameters ($\beta$) using a working estimator of $V$. The vectorized cross-product matrix of ‘raw’ residuals $\hat{\tilde{Y}}\hat{\tilde{Y}}^T$, where $\hat{\tilde{Y}} = Y - X\hat{\beta}$, is then used as the response in a GLS estimation where the explanatory variable design matrix is determined by the last line of (4). This provides updated estimates for the $\Omega_{1j}$ and $\Omega_2$ and hence $V$. The procedure is repeated until convergence. In the simple case we have been considering so far where the level-1 residuals are i.i.d., for a level-2 unit (individual) with just three level-1 units (occasions) there are just six distinct raw residual terms and the level-1 component $V_{1j}$ is simply $\sigma_e^2 I_3$. Written as a vector of the lower triangle this becomes

$$
\sigma_e^2 \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} \tag{5}
$$

and the vector of ones and zeroes becomes the level-1 explanatory variable for the GLS estimation, in this case providing the coefficient that is the estimator of $\sigma_e^2$. Similarly, for a model where there is a single variance term at level 2, the level-2 component $V_{2j}$ written as a lower triangle vector is

$$
\sigma_u^2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
$$
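The two alternating GLS steps can be sketched for the simplest case, a single level-2 variance and i.i.d. level-1 residuals. The code below is an illustrative assumption rather than the MLwiN implementation. It regresses the full vectorized cross-product matrix, not only its lower triangle, on the vectorized matrix of ones and the vectorized identity, with weights $(V \otimes V)^{-1}$; because all the matrices are symmetric this should give the same estimates. The function name and the simulated data are hypothetical, and variance estimates are not constrained to be non-negative.

```python
import numpy as np

def igls_random_intercept(y, x, unit, max_iter=100, tol=1e-8):
    """IGLS sketch for y_ij = b0 + b1 * x_ij + u_j + e_ij.

    Returns (beta, sigma_u2, sigma_e2).
    """
    X = np.column_stack([np.ones_like(x), x])
    groups = [np.flatnonzero(unit == j) for j in np.unique(unit)]

    # Starting values: OLS for beta and a working V = sigma_e2 * I.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    theta = np.array([0.0, np.var(y - X @ beta)])      # (sigma_u2, sigma_e2)

    for _ in range(max_iter):
        # Step 1: GLS for the fixed parameters using the working V, block by block.
        xtwx, xtwy, v_invs = np.zeros((2, 2)), np.zeros(2), []
        for idx in groups:
            n = idx.size
            v_inv = np.linalg.inv(theta[0] * np.ones((n, n)) + theta[1] * np.eye(n))
            v_invs.append(v_inv)
            xtwx += X[idx].T @ v_inv @ X[idx]
            xtwy += X[idx].T @ v_inv @ y[idx]
        beta = np.linalg.solve(xtwx, xtwy)

        # Step 2: GLS of vec(r r^T) on [vec(J), vec(I)] with weights (V kron V)^-1.
        ztwz, ztwr = np.zeros((2, 2)), np.zeros(2)
        for idx, v_inv in zip(groups, v_invs):
            n = idx.size
            r = y[idx] - X[idx] @ beta                 # raw residuals for unit j
            z = np.column_stack([np.ones(n * n), np.eye(n).ravel()])
            w = np.kron(v_inv, v_inv)
            ztwz += z.T @ w @ z
            ztwr += z.T @ w @ np.outer(r, r).ravel()
        new_theta = np.linalg.solve(ztwz, ztwr)

        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return beta, theta[0], theta[1]

# Hypothetical simulated data: 300 level-2 units with 4 level-1 units each.
rng = np.random.default_rng(1)
n_units, n_per = 300, 4
unit = np.repeat(np.arange(n_units), n_per)
x = rng.normal(size=unit.size)
u = rng.normal(0.0, 1.0, size=n_units)                 # true sigma_u2 = 1.0
e = rng.normal(0.0, np.sqrt(0.5), size=unit.size)      # true sigma_e2 = 0.5
y = 1.0 + 2.0 * x + u[unit] + e
beta_hat, su2_hat, se2_hat = igls_random_intercept(y, x, unit)
```

Working block by block relies on the block diagonal form of $V$: cross-products between residuals from different level-2 units have zero entries in the design and so contribute nothing to the estimating equations.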
Goldstein [13] shows that this procedure produces maximum likelihood estimates under Normality.

5. THE MLwiN SOFTWARE

MLwiN has been under development since the late 1980s, first as a command-driven DOS based program, MLn, and since 1998 in a fully-fledged Windows version, currently in release 1.10. It is produced by the Multilevel Models Project based within the Institute of Education, University of London, and supported largely by project funds from the U.K. Economic and Social Research Council. The software has been developed alongside advances in methodology and with the preparation of manuals and other training materials. Procedures for fitting multilevel models are now available in several major software packages such as STATA, SAS and S-plus. In addition there are some special purpose packages, which are tailored to particular kinds of data or models. MIXOR provides ML estimation for multi-category responses and HLM is used widely for educational data. See Zhou et al. [14] for a recent review and Sullivan et al. [3] for a description of the use of HLM and SAS. Many of the models discussed here can also be fitted readily in the general purpose MCMC software package WinBUGS [15]. MLwiN has some particular advanced features that are not available in other packages and it also has a user interface designed for fully interactive use. In later sections we will illustrate some of the special features and models available in MLwiN but first give a simple illustration of the user interface. We shall assume that the user wishes to fit the simple two-level model given by (1). In this tutorial we cannot describe all the features of MLwiN, but it does have general facilities for data editing, graphing, tabulation and simple statistical summaries, all of which can be accessed through drop-down menus. In addition it has a macro language, which can be used, for example, to run simulations or to carry out special purpose modelling.
One of the main features is the method MLwiN uses to set up a model, via an ‘equation window’ in which the user specifies a model in more or less exactly the format in which it is usually written. Thus to specify model (1) the user would first open the equation window which, prior to any model being specified, would be as shown in Figure 1. This is the default null model with a response that is Normal with fixed predictor represented by $X\beta$ and covariance matrix represented by $\Omega$. Clicking on the N symbol delivers a drop
Figure 1. Default equation screen with model unspecified.
MULTILEVEL MODELLING OF MEDICAL DATA
Figure 2. Equation screen with model display.
down menu, which allows the user to change the default distribution to binomial, Poisson or negative binomial. Clicking on the response $y$ allows the user to identify the response variable from a list and also the number and identification for the hierarchical levels. Clicking on the $x_0$ term allows this to be selected from a list and also whether its coefficient $\beta_0$ is random at particular levels of the data hierarchy. Adding a further predictor term is also a simple matter of clicking an ‘add term’ button and selecting a variable. There are simple procedures for specifying general interaction terms. Model (1), including a random coefficient for $x_1$ in its general form as given by (3), will be displayed in the equation window as shown in Figure 2. Clicking on the ‘Estimates’ button will toggle the parameters between their symbolic representations and the actual estimates after a run. Likewise, the ‘Name’ button will toggle actual variable names on and off. The ‘Subs’ button allows the user to specify the form of subscripts, for example giving them names such as in the screen shown in Figure 3, where we also show the estimates and standard errors from an iterative fit. In the following sections we will show some further screen shots of models and results.
6. A GROWTH DATA EXAMPLE

We start with some simple repeated measures data and we shall use them to illustrate models of increasing complexity. The data set consists of nine measurements made on 26 boys between the ages of 11 and 13.5 years, approximately 3 months apart [16].
Figure 3. Equation screen with estimates.
Figure 4.
Figure 4, produced by MLwiN, shows the mean heights by the mean age at each measurement occasion. We assume that growth can be represented by a polynomial function, whose coefficients vary from individual to individual. Other functions are possible, including fractional polynomials or non-linear functions, but for simplicity we confine ourselves to examining a fourth-order polynomial in age ($t$) centred at an origin of 12.25 years. In some applications of growth curve modelling transformations of the time scale may be useful, often
to orthogonal polynomials. In the present case the use of ordinary polynomials provides an accessible interpretation and does not lead to computational problems, for example due to near-collinearities. The model we fit can be written as follows:

$$
\begin{aligned}
y_{ij} &= \sum_{h=0}^{4} \beta_{hj} t_{ij}^{h} + e_{ij} \\
\beta_{0j} &= \beta_0 + u_{0j}, \quad \beta_{1j} = \beta_1 + u_{1j}, \quad \beta_{2j} = \beta_2 + u_{2j}, \quad \beta_{3j} = \beta_3, \quad \beta_{4j} = \beta_4 \\
\begin{pmatrix} u_0 \\ u_1 \\ u_2 \end{pmatrix} &\sim N(0, \Omega_u), \qquad
\Omega_u = \begin{pmatrix} \sigma_{u0}^2 & & \\ \sigma_{u01} & \sigma_{u1}^2 & \\ \sigma_{u02} & \sigma_{u12} & \sigma_{u2}^2 \end{pmatrix} \\
e &\sim N(0, \sigma_e^2)
\end{aligned}
\tag{6}
$$

This is a two-level model with level 1 being ‘measurement occasion’ and level 2 ‘individual boy’. Note that we allow only the coefficients up to the second order to vary across individuals; in the present case this provides an acceptable fit. The level-1 residual term $e_{ij}$ represents the unexplained variation within individuals about each individual’s growth trajectory. Table I shows the restricted maximum likelihood (REML) parameter estimates for this model. The log-likelihood is calculated for the ML estimates since this is preferable for purposes of model comparison [17]. From this table we can compute various features of growth. For example, the average growth rate (by differentiation) at age 13.25 years ($t = 1$) is $6.17 + 2 \times 1.13 + 3 \times 0.45 - 4 \times 0.38 = 8.26$ cm/year. A particular advantage of this formulation is that, for each boy, we can also estimate his random effects or ‘residuals’, $u_{0j}, u_{1j}, u_{2j}$, and use these to predict his growth curve at each age [18]. Figure 5, from MLwiN, shows these predicted curves (these can be produced in different colours on the screen). Goldstein et al. [16] show that growth over this period exhibits a seasonal pattern with growth in the summer being about 0.5 cm greater than growth in the winter. Since the period of the growth cycle is a year, this is modelled by including a simple cosine term, which could also have a random coefficient. In our example we have a set of individuals all of whom have nine measurements. This restriction, however, is not necessary and (6) does not require either the same number of occasions per individual or that measurements are made at equal intervals, since time is modelled as a continuous function. In other words we can combine data from individuals with very different measurement patterns, some of whom may only have been measured once
Table I. Height modelled as a fourth-degree polynomial on age. REML estimates.

Fixed effects    Estimate    Standard error
Intercept        149.0       1.57
t                6.17        0.36
t^2              1.13        0.35
t^3              0.45        0.16
t^4              −0.38       0.30

Random: level-2 (individual) correlation matrix, variances on diagonal

             Intercept    t       t^2
Intercept    64.0
t            0.61         2.86
t^2          0.22         0.66    0.67

Random: level-1 variance = 0.22. −2 log-likelihood (ML) = 625.4.
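The growth-rate calculation quoted in the text from Table I can be reproduced by differentiating the fitted mean polynomial. The following is a minimal sketch using only the fixed-effect estimates in the table; the function names are illustrative.

```python
# Fixed-effect estimates from Table I: coefficients of t^0, t^1, ..., t^4,
# with t measured in years from the origin at age 12.25.
beta = [149.0, 6.17, 1.13, 0.45, -0.38]


def mean_height(t, coefs=beta):
    """Average height (cm) at centred age t."""
    return sum(b * t ** h for h, b in enumerate(coefs))


def mean_velocity(t, coefs=beta):
    """Average growth rate (cm/year): first derivative of the mean polynomial."""
    return sum(h * b * t ** (h - 1) for h, b in enumerate(coefs) if h > 0)


print(round(mean_velocity(1.0), 2))  # 8.26, the value quoted for age 13.25
print(round(mean_velocity(0.0), 2))  # 6.17, the growth rate at age 12.25
```

The same functions applied to an individual's coefficients (fixed estimates plus his estimated residuals) would give that boy's predicted curve and velocity.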
Figure 5.
and some who have been measured several times at irregular intervals. This flexibility, first noted by Laird and Ware [10], means that the multilevel approach to fitting repeated measures data is to be preferred to previous methods based upon a multivariate formulation assuming a common set of fixed occasions [19, 20]. In these models it is assumed that the level-1 residual terms are independently distributed. We may relax this assumption, however, and in the case of repeated measures data this may be necessary, for example where measurements are taken very close together in time. Suppose we wish to fit a model that allows for correlations between the level-1 residuals, and to start with for simplicity let us assume that these correlations are all equal. This is easily
Table II. Height modelled as a fourth-degree polynomial on age, including a seasonal effect and serial correlation. REML estimates.

Fixed effects    Estimate    Standard error
Intercept        148.9
t                6.19        0.36
t^2              2.16        0.45
t^3              0.39        0.17
t^4              −1.55       0.43
Cos(t)           −0.24       0.07

Random: level-2 (individual) correlation matrix, variances on diagonal

             Intercept    t       t^2
Intercept    63.9
t            0.61         2.78
t^2          0.24         0.69    0.59

Random: level 1 (SE in brackets)

σ_e^2    0.24 (0.05)
α        6.59 (1.90)

−2 log-likelihood (ML) = 611.5.
accomplished within the GLS step for the random parameters by modifying (5) to

$$
\sigma_e^2 \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} + \delta \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} \tag{7}
$$

so that the parameter $\delta$ is the common level-1 covariance (between occasions). Goldstein et al. [16] show how to model quite general non-linear covariance functions and in particular those of the form $\operatorname{cov}(e_t, e_{t-s}) = \sigma_e^2 \exp(-g(\alpha, s))$, where $s$ is the time difference between occasions. This allows the correlation between occasions to vary smoothly as a function of their (continuous) time difference. A simple example is where $g = \alpha s$, which, in discrete time, produces an AR(1) model. The GLS step now involves non-linear estimation that is accomplished in a standard fashion using a Taylor series approximation within the overall iterative scheme. Pourahmadi [21, 22] considers similar models but restricted to a fixed set of discrete occasions. Table II shows the results of fitting the model with $g = \alpha s$ together with a seasonal component. If this component has amplitude, say, $\lambda$ we can write it in the form $\lambda \cos(t^* + \gamma)$, where $t^*$ is measured from the start of the calendar year. Rewriting this in the form $\lambda_1 \cos(t^*) - \lambda_2 \sin(t^*)$ we can incorporate the $\cos(t^*)$, $\sin(t^*)$ as two further predictor variables in the fixed part of the model. In the present case $\lambda_2$ is small and non-significant and is omitted. The results
show that for measurements made three months apart the serial correlation is estimated as 0.19 ($e^{-6.59/4}$) and as 0.04 ($e^{-6.59/2}$) for measurements taken at 6-monthly intervals. This suggests, therefore, that in practice, for such data when the intervals are no less than 6 months apart serial correlation can be ignored, but should be fitted when intervals are as small as 3 months. This will be particularly important in highly unbalanced designs where there are some individuals with many measurements taken close together in time; ignoring serial correlation will give too much weight to the observations from such individuals. Finally, on this topic, there will typically need to be a trade-off between modelling more random coefficients at level 2 in order to simplify or eliminate a level-1 serial correlation structure, and modelling level 2 in a parsimonious fashion so that a relatively small number of random coefficients can be used to summarize each individual. An extreme example of the latter is given by Diggle [23] who fits only a random intercept at level 2 and serial correlation at level 1.
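The two serial correlations quoted above follow from the exponential decay of correlation with time difference. A minimal sketch, assuming the estimate 6.59 from Table II is the decay parameter and time is measured in years; the function names and the example measurement times are illustrative.

```python
import math

DECAY = 6.59  # serial correlation parameter estimate from Table II


def serial_correlation(lag_years, decay=DECAY):
    """Correlation between level-1 residuals separated by lag_years."""
    return math.exp(-decay * lag_years)


def correlation_matrix(times, decay=DECAY):
    """Level-1 correlation matrix for arbitrary (possibly irregular) times."""
    return [[serial_correlation(abs(a - b), decay) for b in times] for a in times]


print(round(serial_correlation(0.25), 2))  # 0.19 for a 3-month gap
print(round(serial_correlation(0.5), 2))   # 0.04 for a 6-month gap

# Hypothetical irregular measurement ages for one individual.
for row in correlation_matrix([11.0, 11.1, 11.5, 12.5]):
    print([round(v, 3) for v in row])
```

Because the function depends only on the continuous time difference, it applies unchanged to individuals measured at irregular intervals.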
7. MULTIVARIATE RESPONSE DATA

We shall use an extension of the model for repeated measures data to illustrate how to model multivariate response data. Consider model (6) where we have data on successive occasions for each individual and in addition, for some or all individuals, we have a measure, say, of their final adult height $y_3^{(2)}$, and their (log) income at age 25 years, $y_4^{(2)}$, where the superscript denotes a measurement made at level 2. We can include these variables as further responses by extending (6) as follows:

$$
\begin{aligned}
y_{ij}^{(1)} &= \sum_{h=0}^{4} \beta_{hj} t_{ij}^{h} + e_{ij} \\
\beta_{0j} &= \beta_0 + u_{0j}, \quad \beta_{1j} = \beta_1 + u_{1j}, \quad \beta_{2j} = \beta_2 + u_{2j}, \quad \beta_{3j} = \beta_3, \quad \beta_{4j} = \beta_4 \\
y_{3j}^{(2)} &= \alpha_3 + u_{3j} \\
y_{4j}^{(2)} &= \alpha_4 + u_{4j} \\
\begin{pmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix} &\sim N(0, \Omega_u), \qquad e \sim N(0, \sigma_e^2)
\end{aligned}
\tag{8}
$$
We now have a model where there are response variables defined at level 1 (with superscript (1)) and also at level 2 (with superscript (2)). For the level-2 variables we have specified only an intercept term in the fixed part, but quite general functions of individual level predictors, such as gender, are possible. The level-2 responses have no component of random variation at level 1 and their level-2 residuals covary with the polynomial random coefficients from the level-1 repeated measures response. The results of fitting this model allow us to quantify the relationships between growth events, such as growth acceleration (differentiating twice) at $t = 0$, age 12.25 years, ($2\beta_{2j}$) and adult height and also to use measurements taken during the growth period to make efficient predictions of adult height or income. We note that for individual $j$ the estimated (posterior) residuals $\hat{u}_{3j}, \hat{u}_{4j}$ are the best linear unbiased predictors of the individual’s adult values; where we have only a set of growth period measurements for an individual these therefore provide the required estimates. Given the set of model parameters, therefore, we immediately obtain a system for efficient adult measurement prediction given a set of growth measurements [24]. Suppose, now, that we have no growth period measurements and just the two adult measurements for each individual. Model (8) reduces to

$$
\begin{aligned}
y_{3j}^{(2)} &= \alpha_3 + u_{3j} \\
y_{4j}^{(2)} &= \alpha_4 + u_{4j} \\
\begin{pmatrix} u_3 \\ u_4 \end{pmatrix} &\sim N(0, \Omega_u) \\
V_{1j} &= 0, \qquad V_{2j} = \begin{pmatrix} \sigma_{u3}^2 & \\ \sigma_{u34} & \sigma_{u4}^2 \end{pmatrix}
\end{aligned}
\tag{9}
$$
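Thus we can think of this as a two-level model with no level-1 variation and every level-2 unit containing just two level-1 units. The explanatory variables for the simple model given by (9) are just two dummy variables defining, alternately, the two responses. Thus we can write (9) in the more compact general form

$$
\begin{aligned}
y_{ij} &= \sum_{h=1}^{2} \beta_{0hj} x_{hij}, \qquad
x_{1ij} = \begin{cases} 1 & \text{if response 1} \\ 0 & \text{if response 2} \end{cases}, \qquad
x_{2ij} = 1 - x_{1ij} \\
\beta_{0hj} &= \beta_{0h} + u_{hj} \\
\operatorname{var}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} &= \begin{pmatrix} \sigma_{u1}^2 & \\ \sigma_{u12} & \sigma_{u2}^2 \end{pmatrix}
\end{aligned}
\tag{10}
$$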
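The data layout implied by (10) can be sketched as follows: each available response becomes one level-1 record carrying the two dummy variables, and a response that is not observed simply contributes no record. The records, field names and function name below are hypothetical and chosen only for illustration.

```python
def stack_responses(records):
    """Turn one row per individual into one row per observed response.

    records: iterable of (individual_id, response_1, response_2), where a
    response that was not observed is given as None.
    """
    rows = []
    for individual, y1, y2 in records:
        for which, value in ((1, y1), (2, y2)):
            if value is None:
                continue  # an unobserved response contributes no level-1 unit
            x1 = 1 if which == 1 else 0
            rows.append({"id": individual, "y": value, "x1": x1, "x2": 1 - x1})
    return rows


# Hypothetical data: adult height (cm) and log income; individual 2 has no income.
raw = [(1, 175.2, 10.1), (2, 168.0, None), (3, 181.5, 9.7)]
for row in stack_responses(raw):
    print(row)
```

Further covariates would enter as products with `x1` and `x2`, giving a separate coefficient for each response.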
Note that there is no need for every individual to have both responses and so long as we can consider ‘missing’ responses as random, the IGLS algorithm will supply maximum likelihood estimates. We can add further covariates to the model in a straightforward manner by forming interactions between them and the dummy variables defining the separate response intercepts. The ability to fit a multivariate linear model with randomly missing responses finds a number of applications, for example where matrix or rotation designs are involved (reference [18], Chapter 4), each unit being allocated, at random, a subset of responses. The possibility of
having additionally level-1 responses allows this to be used as a very general model for meta-analysis where there are several studies (level-2 units) for some of which responses are available only in summary form at level 2 and for others detailed level-1 responses are available. Goldstein et al. [25] provide a detailed example.

8. CROSS-CLASSIFIED AND MULTIPLE MEMBERSHIP STRUCTURES

Across a wide range of disciplines it is commonly the case that data have a structure that is not purely hierarchical. Individuals may be clustered not only into hierarchically ordered units (for example occasions nested within patients nested within clinics), but may also belong to more than one type of unit at a given level of a hierarchy. Consider the example of a livestock animal such as a cow where there are a large number of mothers, each producing several female offspring that are eventually reared for milk on different farms. Thus, an offspring might be classified as belonging to a particular combination of mother and farm, in which case they will be identified by a cross-classification of these. Raudenbush [26] and Rasbash and Goldstein [27] present the general structure of a model for handling complex hierarchical structuring with random cross-classifications. For example, assuming that we wish to formulate a linear model for the milk yield of offspring taking into account both the mother and the farm, then we have a cross-classified structure, which can be modelled as follows:

$$
\begin{aligned}
y_{i(j_1, j_2)} &= (X\beta)_{i(j_1, j_2)} + u_{j_1} + u_{j_2} + e_{i(j_1, j_2)} \\
j_1 &= 1, \ldots, J_1; \quad j_2 = 1, \ldots, J_2; \quad i = 1, \ldots, N
\end{aligned}
\tag{11}
$$
in which the yield of offspring $i$, belonging to the combination of mother $j_1$ and farm $j_2$, is predicted by a set of fixed coefficients $(X\beta)_{i(j_1, j_2)}$. The random part of the model is given by two level-2 residual terms, one for the mother ($u_{j_1}$) and one for the farm ($u_{j_2}$), together with the usual level-1 residual term for each offspring. Decomposing the variation in such a fashion allows us to see how much of it is due to the different classifications. This particular example is somewhat oversimplified, since we have ignored paternity and we would also wish to include factors such as age of mother, parity of offspring etc. An application of this kind of modelling to a more complex structure involving Salmonella infection in chickens is given by Rasbash and Browne [28]. Considering now just the farms, and ignoring the mothers, suppose that the offspring often change farms, some not at all and some several times. Suppose also that we know, for each offspring, the weight $w_{ij_2}$, associated with the $j_2$th farm for offspring $i$, with $\sum_{j_2=1}^{J_2} w_{ij_2} = 1$. These weights, for example, may be proportional to the length of time an offspring stays in a particular farm during the course of our study. Note that we allow the possibility that for some (perhaps most) animals only one farm is involved so that one of these weights is one and the remainder are zero. Note that when all level-1 units have a single non-zero weight of 1 we obtain the usual purely hierarchical model. We can write for the special case of membership of up to two farms $\{1, 2\}$:

$$
\begin{aligned}
y_{i(1,2)} &= (X\beta)_{i(1,2)} + w_{i1} u_1 + w_{i2} u_2 + e_{i(1,2)} \\
w_{i1} + w_{i2} &= 1
\end{aligned}
\tag{12}
$$
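and more generally

$$
\begin{aligned}
y_{i\{j\}} &= (X\beta)_{i\{j\}} + \sum_{h \in \{j\}} w_{ih} u_h + e_{i\{j\}} \\
\sum_{h} w_{ih} &= 1, \qquad \operatorname{var}(u_h) = \sigma_u^2 \\
\operatorname{var}\Big(\sum_{h} w_{ih} u_h\Big) &= \sigma_u^2 \sum_{h} w_{ih}^2
\end{aligned}
\tag{13}
$$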
Thus, in the particular case of membership of just two farms with equal weights we have

$$
w_{i1} = w_{i2} = 0.5, \qquad \operatorname{var}\Big(\sum_{h} w_{ih} u_h\Big) = \sigma_u^2 / 2
$$
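The variance expression in (13) is easy to evaluate for any set of membership weights. A minimal sketch, assuming independent farm effects with a common variance as in (13); the function name and the example weights are illustrative.

```python
def membership_variance(weights, sigma2_u):
    """Variance of the weighted sum of unit effects for one level-1 unit.

    weights: membership weights for the units involved, summing to 1.
    sigma2_u: common variance of the (independent) unit effects.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("membership weights must sum to 1")
    return sigma2_u * sum(w * w for w in weights)


sigma2_u = 1.0  # hypothetical between-farm variance
print(membership_variance([1.0], sigma2_u))         # single farm: 1.0
print(membership_variance([0.5, 0.5], sigma2_u))    # two farms, equal time: 0.5
print(membership_variance([0.25, 0.75], sigma2_u))  # unequal time: 0.625
```

The first call corresponds to the purely hierarchical case, and the second reproduces the halving of the variance noted above.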
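Further details of this model are given by Hill and Goldstein [29]. An extension of the multiple membership model is also possible and has important applications, for example in modelling spatial data. In this case we can write

$$
\begin{aligned}
y_{i\{j_1\}\{j_2\}} &= (X\beta)_{i\{j\}} + \sum_{h \in \{j_1\}} w_{1ih} u_{1h} + \sum_{h \in \{j_2\}} w_{2ih} u_{2h} + e_{i\{j\}} \\
\sum_{h} w_{1ih} &= W_1, \qquad \sum_{h} w_{2ih} = W_2, \qquad j = \{j_1, j_2\} \\
\operatorname{var}(u_{1h}) &= \sigma_{u1}^2, \qquad \operatorname{var}(u_{2h}) = \sigma_{u2}^2, \qquad \operatorname{cov}(u_{1h}, u_{2h}) = \sigma_{u12}
\end{aligned}
\tag{14}
$$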
There are now two sets of higher level units that influence the response and in general we can have more than two such sets. In spatial models one of these sets is commonly taken to be the area where an individual (level-1) unit occurs and so does not have a multiple membership structure (since each individual belongs to just one area, that is we replace $\sum_h w_{1ih} u_{1h}$ by $u_{1j_1}$). The other set consists of those neighbouring units that are assumed to have an effect. The weights will need to be carefully chosen; in spatial models $W_2$ is typically chosen to be equal to 1 (see Langford et al. [30] for an example). Another application for a model such as (14) is for household data where households share facilities, for example an address. In this case the household that an individual resides in will belong to one set and the other households at the address will belong to the other set. Goldstein et al. [31] give an application of this model to complex household data.
9. META-ANALYSIS

Meta-analysis involves the pooling of information across studies, both to provide greater efficiency for estimating treatment effects and to investigate why treatment effects may vary. By formulating a general multilevel model we can do both of these efficiently within a single-model framework, as has already been indicated and was suggested by several authors [32, 33]. In addition we can combine data that are provided at either individual subject level or aggregate level or both. We shall look at a simple case but this generalizes readily [25].
Consider an underlying model for individual level data where a pair of treatments are being compared and results from a number of studies or centres are available. We write a basic model, with a continuous response $Y$, as

$$
y_{ij} = (X\beta)_{ij} + \beta_2 t_{ij} + u_j + e_{ij}, \qquad \operatorname{var}(u_j) = \sigma_u^2, \qquad \operatorname{var}(e_{ij}) = \sigma_e^2
\tag{15}
$$
with the usual assumptions of Normality etc. The covariate function is designed to adjust for initial clinic and subject conditions. The term $t_{ij}$ is a dummy variable defining the treatment (0 for treatment A, 1 for treatment B). The random effect $u_j$ is a study effect and the $e_{ij}$ are individual-level residuals. Clearly this model can be elaborated in a number of ways, by including random coefficients at level 2 so that the effect of treatment varies across studies, and by allowing the level-1 variance to depend on other factors such as gender or age. Suppose now that we do not have individual data available but only means at the study level. If we average (15) to the study level we obtain

$$
y_{.j} = (X\beta)_{.j} + \beta_2 t_{.j} + u_j + e_{.j}
\tag{16}
$$
where $y_{.j}$ is the mean response for the $j$th study etc. The total residual variance for study $j$ in this model is $\sigma_u^2 + \sigma_e^2 / n_j$ where $n_j$ is the size of the $j$th study. It is worth noting at this point that we are ignoring, for simplicity, levels of variation that might exist within studies, such as that between sites for a multi-site study. If we have the values of $y_{.j}$, $(X\beta)_{.j}$, $t_{.j}$, where the latter is simply the proportion of subjects with treatment B in the $j$th study, and also the value of $n_j$ then we will be able to obtain estimates for the model parameters, so long as the $n_j$ differ. Such estimates, however, may not be very precise and extra information, especially about the value of $\sigma_e^2$, will improve them. Model (16) therefore forms the basis for the multilevel modelling of aggregate level data. In practice the results of studies will often be reported in non-standard form, for example with no estimate of $\sigma_e^2$, but it may be possible to estimate this from reported test statistics. In some cases, however, the reporting may be such that the study cannot be incorporated in a model such as (16). Goldstein et al. [25] set out a set of minimum reporting conventions for meta-analysis studies subsequently to be carried out. While it is possible to perform a meta-analysis with only aggregate level data, it is clearly more efficient to utilize individual-level data where these are available. In general, therefore, we will need to consider models that have mixtures of individual and aggregate data, even perhaps within the same study. We can do this straightforwardly by specifying a model which is just the combination of (15) and (16), namely

$$
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + \beta_2 t_{ij} + u_j + e_{ij} \\
y_{.j} &= \beta_0 + \beta_1 x_{.j} + \beta_2 t_{.j} + u_j + e_j z_j \\
z_j &= \sqrt{n_j^{-1}}, \qquad e_j \equiv e_{ij}
\end{aligned}
\tag{17}
$$
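The role of $z_j$ in (17) can be seen from a small numerical sketch: multiplying a level-1 residual by $z_j = \sqrt{1/n_j}$ gives it variance $\sigma_e^2 / n_j$, so an aggregate record has the total residual variance stated for (16). The variance values, study sizes and function names below are hypothetical.

```python
import math


def z_weight(n_j):
    """Explanatory variable attached to the level-1 residual of a study mean."""
    return math.sqrt(1.0 / n_j)


def study_mean_variance(sigma2_u, sigma2_e, n_j):
    """Total residual variance of the mean response for a study of size n_j."""
    return sigma2_u + sigma2_e / n_j


sigma2_u, sigma2_e = 0.5, 4.0   # hypothetical between-study and within-study variances
for n_j in (10, 25, 400):
    via_z = sigma2_u + sigma2_e * z_weight(n_j) ** 2
    print(n_j, round(study_mean_variance(sigma2_u, sigma2_e, n_j), 4), round(via_z, 4))
```

Individual-level records in the same analysis would carry a weight of 1 on the level-1 residual, which is what lets both kinds of record share the same variance parameters.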
What we see is that the common level-1 and level-2 random terms link together the separate models and allow a joint analysis that makes fully efficient use of the data. Several issues immediately arise from (17). One is that the same covariates are involved. This is also a requirement for the separate models. If some covariate values are missing at either level then it is possible to use an imputation technique to obtain estimates, assuming a suitable random
missingness mechanism. The paper by Goldstein et al. [25] discusses generalizations of (17) for several treatments and the procedure can be extended to generalized linear models.
10. GENERALIZED LINEAR MODELS

So far we have dealt with linear models, but all of those so far discussed can be modified using non-linear link functions to give generalized linear multilevel models. We shall not discuss these in detail (see Goldstein [18] for details and some applications) but for illustration we will describe a two-level model with a binary response. Suppose the outcome of patients in intensive care is recorded simply as survived (0) or died (1) within 24 hours of admission. Given a sample of patients from a sample of intensive care units we can write one model for the probability of death as

$$
\begin{aligned}
\operatorname{logit}(\pi_{ij}) &= (X\beta)_{ij} + u_j \\
y_{ij} &\sim \operatorname{Bin}(\pi_{ij}, 1)
\end{aligned}
\tag{18}
$$
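As a small illustration of (18), the following sketch converts a linear predictor and a unit-level residual into the probability that the response equals 1, and shows the implied Bernoulli variance at level 1. The numerical values and function names are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def response_probability(fixed_part, unit_effect):
    """Probability that y_ij = 1 given the fixed predictor and the unit residual u_j."""
    return inv_logit(fixed_part + unit_effect)


def bernoulli_variance(probability):
    """Level-1 variance implied by a Bernoulli response."""
    return probability * (1.0 - probability)


fixed_part = -1.0  # hypothetical value of the fixed predictor for one patient
for unit_effect in (-0.5, 0.0, 0.5):  # hypothetical residuals for three units
    p = response_probability(fixed_part, unit_effect)
    print(unit_effect, round(p, 3), round(bernoulli_variance(p), 3))
```

The same patient profile therefore has a different predicted probability, and a different level-1 variance, in each unit.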
Equation (18) uses a standard logit link function assuming a binomial (Bernoulli) error distribution for the (0,1) response $y_{ij}$. The level-2 random variation is described by the term $u_j$ within the linear predictor. The general interpretation is similar to that for a continuous response model, except that the level-1 variation is now a function of the predicted value $\pi_{ij}$. While in (18) there is no separate estimate for the level-1 variance, we may wish to fit extra-binomial variation which will involve a further parameter. We can modify (18) using alternative link functions, for example the logarithm if the response is a count, and can allow further random coefficients at level 2. The response can be a multi-category variable, either ordered or unordered, and this provides an analogue to the multivariate models for continuously distributed responses. As an example of an ordered categorical response, consider extending the previous outcome to three categories: survival without impairment (1); survival with impairment (2); death (3). If $\pi_{ij}^{(h)}$, $h = 1, 2, 3$ are, respectively, the probabilities for each of these categories, we can write a proportional odds model using a logit link as

$$
\begin{aligned}
\operatorname{logit}(\pi_{ij}^{(1)}) &= \alpha^{(1)} + (X\beta)_{ij} + u_j^{(1)} \\
\operatorname{logit}(\pi_{ij}^{(1)} + \pi_{ij}^{(2)}) &= \alpha^{(2)} + (X\beta)_{ij} + u_j^{(2)}
\end{aligned}
\tag{19}
$$
and the set of three (0,1) observed responses for each patient is assumed to have a multinomial distribution with mean vector given by $\pi_{ij}^{(h)}$, $h = 1, 2, 3$. Since the probabilities add to 1 we require two lines in (19) which differ only in terms of the overall level given by the intercept term as well as allowing for these to vary across units. Unlike in the continuous Normal response case, maximum likelihood estimation is not straightforward and beyond fairly simple two-level models involves a considerable computational load typically using numerical integration procedures [34]. For this reason approximate methods have been developed based upon series expansions and using quasi-likelihood approaches [18] which perform well under a wide range of circumstances but can break down in certain conditions, especially when data are sparse with binary responses. High-order Laplace
approximations have been found to perform well [35] as have simulation-based procedures such as MCMC (see below). It is worth mentioning one particular situation where care is required in using generalized linear multilevel models. In (18) and (19) we assume that the level-1 responses are independently distributed with probabilities distributed according to the specified model. In some circumstances such an assumption may not be sensible. Consider a repeated measures model where the health status (satisfactory/not satisfactory) of individuals is repeatedly assessed. Some individuals will typically always respond ‘satisfactory’ whereas some others can be expected to respond always ‘not satisfactory’. For these individuals the underlying probabilities are either zero or one, which violates the model assumptions, and what one finds if one tries to fit a model where there are non-negligible numbers of such individuals is a noticeable amount of underdispersion. Barbosa and Goldstein [36] discuss this problem and propose a solution based upon fitting a serial correlation structure. We can also have multivariate multilevel models with mixtures of discrete and continuous responses. Certain of these can be fitted in MLwiN using quasi-likelihood procedures [12] and MCMC procedures for such models are currently being implemented. See also Olsen and Schafer [37] for an alternative approach.
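The proportional odds model (19) described earlier in this section can be illustrated by converting the two cumulative logits into the three category probabilities. This is a sketch only: the intercept values, the fixed predictor and the function names are hypothetical, and the unit residuals default to zero.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def category_probabilities(intercept_1, intercept_2, fixed_part, u_1=0.0, u_2=0.0):
    """Probabilities of categories 1, 2, 3 from the two cumulative logits in (19)."""
    cumulative_1 = inv_logit(intercept_1 + fixed_part + u_1)   # P(category 1)
    cumulative_2 = inv_logit(intercept_2 + fixed_part + u_2)   # P(category 1 or 2)
    if cumulative_2 < cumulative_1:
        raise ValueError("cumulative probabilities must be non-decreasing")
    return cumulative_1, cumulative_2 - cumulative_1, 1.0 - cumulative_2


probs = category_probabilities(intercept_1=0.0, intercept_2=1.5, fixed_part=0.0)
print([round(p, 3) for p in probs])  # [0.5, 0.318, 0.182]
```

Because the two lines share the same fixed predictor, a change in a covariate shifts both cumulative logits by the same amount, which is the proportional odds property.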
11. SURVIVAL MODELS

Several different formulations for survival data modelling are available; to illustrate how these can be extended to multilevel structures, where they are often referred to as frailty models [38], we consider the proportional hazards (Cox) model and a piecewise discrete time model. Goldstein [18] gives other examples. Consider a simple two-level model with, say, patients within hospitals or occasions within subjects. As in the standard single-level case we consider each time point in the data as defining a block indicated by $l$ at which some observations come to the end of their duration due to either failure or censoring and some remain to the next time or block. At each block there is therefore a set of observations, the total risk set. To illustrate how the model is set up we can think of the data sorted so that each observation within a block is a level-1 unit, above which, in the repeated measures case, there are occasions at level 2 and subjects at level 3. The ratio of the hazard for the unit which experiences a failure at a given occasion referred to by $(j, k)$ to the sum of the hazards of the remaining risk set units [39] is

$$
\frac{\exp(\beta_1 x_{1ijk} + u_{jk})}{\sum_{j,k} \exp(\beta_1 x_{1ijk} + u_{jk})}
\tag{20}
$$

where $j$ and $k$ refer to the real levels 2 and 3, for example occasion and subject. At each block denoted by $l$ the response variable may be defined for each member of the risk set as

$$
y_{ijk(l)} = \begin{cases} 1 & \text{failed} \\ 0 & \text{not} \end{cases}
$$
Because of equivalence between the likelihood for the multinomial and Poisson distributions, the latter is used to fit model (20). This can be written as

$$
E(y_{ijk(l)}) = \exp(\alpha_l + X_{jk}\beta_k)
\tag{21}
$$
Where there are ties within a block then more than one response will be non-zero. The terms $\alpha_l$ fit the underlying hazard function as a ‘blocking factor’, and can be estimated by fitting either a set of parameters, one for each block, or a smoothed polynomial curve over the blocks numbered $1, \ldots, p$. Thus if the $h$th block is denoted by $h$, $\alpha_l$ is replaced by a low-order polynomial, order $m$, $\sum_{t=0}^{m} \gamma_t h^t$, where the $\gamma_t$ are (nuisance) parameters to be estimated. Having set up this model, the data are now sorted into the real two-level structure, for example in the repeated measures case by failure times within subjects with occasions within the failure times. This retains proportional hazards within subjects. In this formulation the Poisson variation is defined at level 1, there is no variation at level 2 and the between-subject variation is at level 3. Alternatively we may wish to preserve overall proportionality, in which case the failure times define level 3 with no variation at that level. See Goldstein [18] for a further discussion of this. Consider now a piecewise survival model. Here the total time interval is divided into short intervals during which the probability of failure, given survival up to that point, is assumed constant. Denote these intervals by $t$ $(1, 2, \ldots, T)$ so that the hazard at time $t$ is the probability that, given survival up to the end of time interval $t - 1$, failure occurs in the next interval. At the start of each interval we have a ‘risk set’ $n_t$ consisting of the survivors and during the interval $r_t$ fail. If censoring occurs during interval $t$ then this observation is removed from that interval (and subsequent ones) and does not form part of the risk set. A simple, single-level, model for the probability can be written as

$$
\pi_{i(t)} = f[\alpha_t z_{it}, (X\beta)_{it}]
\tag{22}
$$
where $z_t = \{z_{it}\}$ is a dummy variable for the $t$th interval and $\alpha_t$, as before, is a ‘blocking factor’ defining the underlying hazard function at time $t$. The second term is a function of covariates. A common formulation would be the logit model and a simple such model in which the first blocking factor has been absorbed into the intercept term could be written as

$$
\operatorname{logit}(\pi_{i(t)}) = \beta_0 + \alpha_t z_{it} + \beta_1 x_{1i}, \qquad (z_2, z_3, \ldots, z_T)
\tag{23}
$$
Since the covariate varies across individuals, in general the data matrix will consist of one record for each individual within each interval, with a (0,1) response indicating survival or failure. The model can now be fitted using standard procedures, assuming a binomial error distribution. As before, instead of fitting $T - 1$ blocking factors, we can fit a low-order polynomial to the sequentially numbered time indicator. The logit function can be replaced by, for example, the complementary log-log function that gives a proportional hazards model, or, say, the probit function, and note that we can incorporate time-varying covariates such as age. For the extension to a two-level model we write

$$
\operatorname{logit}(\pi_{ij(t)}) = \beta_0 + \sum_{h=1}^{p} \alpha_h^{*} (z_{it}^{*})^{h} + \beta_1 x_{1ij} + u_j
\tag{24}
$$
where $u_j$ is the ‘effect’, for example, of the $j$th clinic, and is typically assumed to be distributed normally with zero mean and variance $\sigma_u^2$. We can elaborate (24) using random coefficients, resulting in a heterogeneous variance structure, further levels of nesting etc. This is just a two-level binary response model and can be fitted as described earlier. The data structure has two levels so that individuals will be grouped (sorted) within clinics. For a competing risks model with more than one outcome we can use the two-level formulation for a multi-category response described above. The model can be used with repeated measures data where there are repeated survival times within individuals, for example multiple pregnancy states.
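The data layout for the piecewise (discrete time) model, one record per individual per interval at risk, can be sketched as below. The record format, field names and function name are hypothetical; the treatment of censoring follows the description given for (22), where an observation censored during an interval is removed from that interval's risk set.

```python
def person_period(records, n_intervals):
    """Expand survival records into one row per individual per interval at risk.

    records: iterable of (person_id, clinic_id, last_interval, failed), where
    last_interval is the interval in which failure or censoring occurred.
    """
    rows = []
    for person, clinic, last_interval, failed in records:
        for t in range(1, min(last_interval, n_intervals) + 1):
            is_last = t == last_interval
            if is_last and not failed:
                break  # censored during interval t: not part of that risk set
            rows.append({"person": person, "clinic": clinic,
                         "interval": t, "y": 1 if is_last else 0})
    return rows


# Hypothetical records: person 1 fails in interval 3, person 2 is censored in
# interval 2, person 3 fails in interval 1.
records = [(1, "A", 3, True), (2, "A", 2, False), (3, "B", 1, True)]
rows = person_period(records, n_intervals=4)
for row in rows:
    print(row)
```

The resulting rows, sorted within clinics, form the binary response data to which a two-level model such as (24) would be fitted, with the interval number supplying the blocking factors or polynomial terms.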
12. BAYESIAN MODELLING

So far we have considered the classical approach to fitting multilevel models. If we add prior distributional assumptions to the parameters of the models so far considered we can fit the same range of models from a Bayesian perspective, and in most applications this will be based upon MCMC methods. A detailed comparison of Bayesian and likelihood procedures for fitting multilevel models is given in Browne and Draper [40]. A particular advantage of MCMC methods is that they yield inferences based upon samples from the full posterior distribution and allow exact inference in cases where, as mentioned above, the likelihood based methods yield approximations. Owing also to their approach of generating a sample of points from the full posterior distributions, they can give accurate interval estimates for non-Gaussian parameter distributions. In MCMC sampling we are interested in generating samples of values from the joint posterior distribution of all the unknown parameters rather than finding the maximum of this distribution. Generally it is not possible to generate directly from this joint distribution, so instead the parameters are split into groups and for each group in turn we generate a set of values from its conditional posterior distribution. This can be shown to be equivalent to sampling directly from the joint posterior distribution. There are two main MCMC procedures that are used in practice: Gibbs sampling [41] and Metropolis–Hastings (MH) [42, 43] sampling. When the conditional posterior for a group of parameters has a standard form, for example a Normal distribution, then we can generate values from it directly and this is known as Gibbs sampling. When the distribution is not of standard form then it may still be possible to use Gibbs sampling by sampling from the distribution using techniques such as adaptive rejection sampling [44].
The alternative approach is to use MH sampling where values are generated from another distribution called a proposal distribution rather than the conditional posterior distribution. These values are then either accepted or rejected in favour of the current values by comparing the posterior probabilities of the joint posterior at the current and proposed new values. The acceptance rule is designed so that MH is effectively sampling from the conditional posterior even though we have used an arbitrary proposal distribution; nevertheless, choice of this proposal distribution is important for efficiency of the algorithm. MCMC algorithms produce chains of serially correlated parameter estimates and consequently often have to be run for many iterations to get accurate estimates. Many diagnostics are available to gauge approximately for how long to run the MCMC methods. The chains are also started from arbitrary parameter values and so it is common practice to ignore the
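first N iterations (known as a burn-in period) to allow the chains to move away from the starting value and settle at the parameters’ equilibrium distribution. We give here an outline of the Gibbs sampling procedure for fitting a general Normal two-level model

$$y_{ij} = X_{ij}\beta + Z_{ij}u_j + e_{ij}, \quad u_j \sim N(0, \Omega_u), \quad e_{ij} \sim N(0, \sigma_e^2), \quad i = 1, \ldots, n_j, \quad j = 1, \ldots, J, \quad \sum_j n_j = N$$

We will include generic conjugate prior distributions for the fixed effects and variance parameters as follows:

$$\beta \sim N(\mu_p, S_p), \quad \Omega_u \sim W^{-1}(\nu_u, S_u), \quad \sigma_e^2 \sim SI\chi^2(\nu_e, s_e^2)$$

The Gibbs sampling algorithm then involves simulating from the following four sets of conditional distributions:

$$p(\beta \mid y, \sigma_e^2, u) \sim N\left(\hat{D}\left[\frac{1}{\sigma_e^2}\sum_{j=1}^{J}\sum_{i=1}^{n_j} X_{ij}^T (y_{ij} - Z_{ij}u_j) + S_p^{-1}\mu_p\right], \; \hat{D}\right)$$

$$p(u_j \mid y, \Omega_u, \sigma_e^2, \beta) \sim N\left(\frac{\hat{D}_j}{\sigma_e^2}\sum_{i=1}^{n_j} Z_{ij}^T (y_{ij} - X_{ij}\beta), \; \hat{D}_j\right)$$

$$p(\Omega_u \mid u) \sim W^{-1}\left(J + \nu_u, \; \sum_{j=1}^{J} u_j u_j^T + S_u\right)$$

$$p(\sigma_e^2 \mid y, \beta, u) \sim \Gamma^{-1}\left(\frac{N + \nu_e}{2}, \; \frac{1}{2}\left[\nu_e s_e^2 + \sum_{j=1}^{J}\sum_{i=1}^{n_j} e_{ij}^2\right]\right)$$

where

$$\hat{D} = \left[\sum_{ij}\frac{X_{ij}^T X_{ij}}{\sigma_e^2} + S_p^{-1}\right]^{-1}$$

and

$$\hat{D}_j = \left[\frac{1}{\sigma_e^2}\sum_{i=1}^{n_j} Z_{ij}^T Z_{ij} + \Omega_u^{-1}\right]^{-1}$$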
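As a rough illustration only (this is not the MLwiN implementation), the four conditional draws can be sketched for the simplest special case: a random-intercept model with $X_{ij} = Z_{ij} = 1$, so that $\Omega_u$ reduces to a scalar variance. The sketch assumes a flat prior on $\beta$ (so the $S_p^{-1}$ terms vanish) and inverse-gamma(a, b) priors on both variances; the function name, starting values and prior settings are assumptions made for the example.

```python
import random
from statistics import mean


def gibbs_random_intercept(groups, n_iter=3000, burn_in=500, a=0.001, b=0.001, seed=1):
    """Gibbs sampler sketch for y_ij = beta + u_j + e_ij.

    groups: list of lists, one inner list of responses per level-2 unit.
    Assumes a flat prior on beta and inverse-gamma(a, b) priors on the
    two variances (illustrative choices, not taken from the text).
    Returns post-burn-in draws of (beta, sigma2_u, sigma2_e).
    """
    rng = random.Random(seed)
    J = len(groups)
    N = sum(len(g) for g in groups)
    beta = mean(y for g in groups for y in g)  # arbitrary starting values
    u = [0.0] * J
    sigma2_u, sigma2_e = 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # Step 1: beta | y, u, sigma2_e. With a flat prior, D = sigma2_e / N.
        resid_sum = sum(y - u[j] for j, g in enumerate(groups) for y in g)
        beta = rng.gauss(resid_sum / N, (sigma2_e / N) ** 0.5)
        # Step 2: u_j | y, beta, sigma2_u, sigma2_e for each level-2 unit.
        for j, g in enumerate(groups):
            d_j = 1.0 / (len(g) / sigma2_e + 1.0 / sigma2_u)
            m_j = d_j / sigma2_e * sum(y - beta for y in g)
            u[j] = rng.gauss(m_j, d_j ** 0.5)
        # Step 3: sigma2_u | u (scalar analogue of the inverse Wishart draw).
        rate_u = b + 0.5 * sum(v * v for v in u)
        sigma2_u = 1.0 / rng.gammavariate(a + J / 2.0, 1.0 / rate_u)
        # Step 4: sigma2_e | y, beta, u from the level-1 residuals.
        sse = sum((y - beta - u[j]) ** 2 for j, g in enumerate(groups) for y in g)
        sigma2_e = 1.0 / rng.gammavariate(a + N / 2.0, 1.0 / (b + 0.5 * sse))
        if it >= burn_in:  # discard the burn-in period
            draws.append((beta, sigma2_u, sigma2_e))
    return draws
```

Each pass through the loop updates one block of parameters conditional on the current values of the others, which is the cycling through conditional posteriors described above. Posterior summaries (means, modes, quantiles) would then be computed from the returned draws.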
Note that in this algorithm we have used generic prior distributions. This allows the incorporation of informative prior information but generally we will not have this information and so will use so-called ‘diffuse’ prior distributions that reflect our lack of knowledge. Since there are only 26 level-2 units from which we are estimating a 3 × 3 covariance matrix, the exact
Table III. Height modelled as a fourth-degree polynomial on age using MCMC sampling.

Fixed effects    MCMC estimate    Standard error
Intercept        149.2            1.58
t                6.21             0.38
t^2              1.14             0.36
t^3              0.45             0.17
t^4              −0.38            0.30

Random: level-2 (individual) correlation matrix, variances on diagonal, estimates are mean (mode)

             Intercept      t              t^2
Intercept    74.4 (68.6)
t            0.61           3.34 (3.06)
t^2          0.22           0.66           0.67 (0.70)

Random: level-1 variance = 0.23 (0.22).
choice of prior is important. We here use the following set of priors for the child growth example considered in Section 6:

$$p(\beta_0) \propto 1, \quad p(\beta_1) \propto 1, \quad p(\beta_2) \propto 1, \quad p(\beta_3) \propto 1, \quad p(\beta_4) \propto 1, \quad p(\beta_5) \propto 1$$

$$p(\Omega_u) \sim \text{inverse Wishart}_3[3, \; 3 \times S_u], \quad S_u = \begin{pmatrix} 64.0 & & \\ 8.32 & 2.86 & \\ 1.42 & 0.92 & 0.67 \end{pmatrix}$$

$$p(\sigma_{e0}^2) \sim \text{inverse gamma}(0.001, \; 0.001)$$

The inverse Wishart prior matrix is based upon the REML estimates, chosen to be ‘minimally informative’, with degrees of freedom equal to the order of the covariance matrix. The results of running model (5) for 50 000 iterations after a burn-in of 500 can be seen in Table III. Here we see that the fixed effect estimates are very similar to the estimates obtained by the maximum likelihood method. The variance (chain mean) estimates are however inflated due to the skewness of the variance parameters. Modal estimates of the variance parameters, apart from that for the quadratic coefficient, are closer, as is to be expected. If we had used the uniform prior $p(\Omega_u) \propto 1$ for the covariance matrix, the estimates of the fixed coefficients are little changed but the covariance matrix estimates are noticeably different. For example, the variance associated with the intercept is now 93.0 and those for the linear and quadratic coefficients become 4.2 and 1.1, respectively.

Figure 6 shows the diagnostic screen produced by MLwiN following this MCMC run. The top left hand box shows the trace for this parameter, the level-2 intercept variance. This looks satisfactory and this is confirmed by the estimated autocorrelation function (ACF) and partial autocorrelation function (PACF) below. A kernel density estimate is given at the top right and
Figure 6. MCMC summary screen.
the bottom left box is a plot of the Monte Carlo standard error against number of iterations in the chain. The summary statistics give quantiles, mean and mode together with accuracy diagnostics that indicate the required chain length.

The MCMC methods are particularly useful in models like the cross-classified and multiple membership models discussed in Section 8. This is because whereas the maximum likelihood methods involve constructing large constrained variance matrices for these models, the MCMC methods simulate conditional distributions in turn and so do not have to adjust to the structure of the model. For model fitting, one strategy (but not the only one) is to use the maximum or quasi-likelihood methods for performing model exploration and selection due to speed. Then MCMC methods could be used for inference on the final model to obtain accurate interval estimates.
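The Metropolis–Hastings update described earlier in this section can be sketched in a few lines. The version below is a minimal random-walk Metropolis sampler for a single parameter, assuming a symmetric Normal proposal so that the acceptance ratio reduces to a ratio of posterior densities; the function name, arguments and the idea of passing in a log-posterior function are illustrative assumptions rather than any package's interface.

```python
import math
import random


def random_walk_metropolis(log_post, start, proposal_sd, n_iter, burn_in, seed=0):
    """Random-walk Metropolis sketch for a one-dimensional target.

    log_post: function returning the log posterior density up to a constant.
    Returns (post-burn-in chain, overall acceptance rate).
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_post(current)
    chain = []
    accepted = 0
    for it in range(n_iter):
        # Propose a new value from a Normal centred on the current value.
        proposal = rng.gauss(current, proposal_sd)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); the proposal is
        # symmetric, so no Hastings correction term is needed.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        if it >= burn_in:
            chain.append(current)
    return chain, accepted / n_iter
```

The proposal standard deviation governs efficiency: too small and the chain moves slowly, too large and most proposals are rejected, which is why the choice of proposal distribution matters even though the target distribution is unchanged.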
13. BOOTSTRAPPING

Like MCMC the bootstrap allows inferences to be based upon (independent) chains of values and can be used to provide exact inferences and corrections for bias. Two forms of bootstrapping have been studied to date: parametric bootstrapping, especially for correcting biases in generalized linear models [45], and non-parametric bootstrapping based upon estimated residuals [46]. The fully parametric bootstrap for a multilevel model works as for a single-level model with simulated values generated from the estimated covariance matrices at each level. Thus, for example, for model (2) each bootstrap sample is created using a set of $u_j$, $e_{ij}$ sampled from $N(0, \sigma_u^2)$, $N(0, \sigma_e^2)$, respectively.
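A hedged sketch of the fully parametric bootstrap follows. For illustration it assumes an intercept-only variance components model with balanced data, and it uses a simple method-of-moments (ANOVA) estimator as a stand-in for refitting the model with a multilevel routine; all function names are hypothetical.

```python
import random
from statistics import mean, variance


def simulate(beta0, sigma2_u, sigma2_e, sizes, rng):
    """Generate one bootstrap data set from the estimated model."""
    data = []
    for n_j in sizes:
        u_j = rng.gauss(0.0, sigma2_u ** 0.5)  # level-2 residual
        data.append([beta0 + u_j + rng.gauss(0.0, sigma2_e ** 0.5) for _ in range(n_j)])
    return data


def anova_estimates(data):
    """Method-of-moments estimates for balanced data (stand-in for a real fit)."""
    n = len(data[0])
    group_means = [mean(g) for g in data]
    sigma2_e = mean(variance(g) for g in data)
    sigma2_u = max(variance(group_means) - sigma2_e / n, 0.0)
    return mean(group_means), sigma2_u, sigma2_e


def parametric_bootstrap(estimates, sizes, n_boot=200, seed=0):
    """Simulate n_boot data sets from the estimates and re-estimate on each."""
    rng = random.Random(seed)
    beta0, sigma2_u, sigma2_e = estimates
    return [anova_estimates(simulate(beta0, sigma2_u, sigma2_e, sizes, rng))
            for _ in range(n_boot)]
```

The spread of the replicate estimates gives bootstrap standard errors and intervals, and the difference between their mean and the original estimate gives a simple estimate of bias.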
In the non-parametric case, full ‘unit resampling’ is generally only possible by resampling units at the highest level. For generalized linear models, however, we can resample posterior residuals, once they have been adjusted to have the correct (estimated) covariance structures and this can be shown to possess particular advantages over a fully parametric bootstrap where asymmetric distributions are involved [46].

14. IN CONCLUSION

The models that we have described in this paper represent a powerful set of tools available to the data analyst for exploring complex data structures. They are being used in many areas, including health, with great success in providing insights that are unavailable with more conventional methods. There is a growing literature extending these models, for example to multilevel structural equation models, and especially to the application of the multiple membership models in areas such as population demography [31]. An active e-mail discussion group exists which welcomes new members (www.jiscmail.ac.uk/multilevel). The data set used for illustration in this paper is available on the Centre for Multilevel Modelling website (multilevel.ioe.ac.uk/download/macros.html) as an MLwiN worksheet.

REFERENCES

1. Snijders T, Bosker R. Multilevel Analysis. Sage: London, 1999.
2. Leyland AH, Goldstein H (eds). Multilevel Modelling of Health Statistics. Wiley: Chichester, 2001.
3. Sullivan LM, Dukes KA, Losina E. Tutorial in biostatistics: an introduction to hierarchical linear modelling. Statistics in Medicine 1999; 18:855–888.
4. Zeger SL, Liang K, Albert P. Models for longitudinal data: a generalised estimating equation approach. Biometrics 1988; 44:1049–1060.
5. Liang K-Y, Zeger SL, Qaqish B. Multivariate regression analyses for categorical data. Journal of the Royal Statistical Society, Series B 1992; 54:3–40.
6. Heagerty PJ, Zeger SL. Marginalized multilevel models and likelihood inference (with discussion). Statistical Science 2000; 15:1–26.
7. Lindsey JK, Lambert P. On the appropriateness of marginal models for repeated measurements in clinical trials. Statistics in Medicine 1998; 17:447–469.
8. Goldstein H, Rasbash J. Efficient computational procedures for the estimation of parameters in multilevel models based on iterative generalised least squares. Computational Statistics and Data Analysis 1992; 13:63–71.
9. Longford NT. A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested effects. Biometrika 1987; 74:812–827.
10. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics 1982; 38:963–974.
11. Bryk AS, Raudenbush SW. Hierarchical Linear Models. Sage: Newbury Park, California, 1992.
12. Rasbash J, Browne W, Goldstein H, Yang M, Plewis I, Healy M, Woodhouse G, Draper D, Langford I, Lewis T. A User’s Guide to MLwiN (2nd edn). Institute of Education: London, 2000.
13. Goldstein H. Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika 1986; 73:43–56.
14. Zhou X, Perkins AJ, Hui SL. Comparisons of software packages for generalized linear multilevel models. American Statistician 1999; 53:282–290.
15. Spiegelhalter DJ, Thomas A, Best NG. WinBUGS Version 1.3: User Manual. MRC Biostatistics Research Unit: Cambridge, 2000.
16. Goldstein H, Healy MJR, Rasbash J. Multilevel time series models with applications to repeated measures data. Statistics in Medicine 1994; 13:1643–1655.
17. Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 1997; 53:983–997.
18. Goldstein H. Multilevel Statistical Models. Arnold: London, 1995.
19. Grizzle JC, Allen DM. An analysis of growth and dose response curves. Biometrics 1969; 25:357–361.
20. Albert PS. Longitudinal data analysis (repeated measures) in clinical trials. Statistics in Medicine 1999; 18:1707–1732.
21. Pourahmadi M. Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika 1999; 86:677–690.
22. Pourahmadi M. Maximum likelihood estimation of generalised linear models for multivariate Normal covariance matrix. Biometrika 2000; 87:425–436.
23. Diggle PJ. An approach to the analysis of repeated measurements. Biometrics 1988; 44:959–971.
24. Goldstein H. Flexible models for the analysis of growth data with an application to height prediction. Revue Epidemiologie et Sante Publique 1989; 37:477–484.
25. Goldstein H, Yang M, Omar R, Turner R, Thompson S. Meta analysis using multilevel models with an application to the study of class size effects. Journal of the Royal Statistical Society, Series C 2000; 49:399–412.
26. Raudenbush SW. A crossed random effects model for unbalanced data with applications in cross sectional and longitudinal research. Journal of Educational Statistics 1993; 18:321–349.
27. Rasbash J, Goldstein H. Efficient analysis of mixed hierarchical and cross classified random structures using a multilevel model. Journal of Educational and Behavioural Statistics 1994; 19:337–350.
28. Rasbash J, Browne W. Non-hierarchical multilevel models. In Multilevel Modelling of Health Statistics, Leyland A, Goldstein H (eds). Wiley: Chichester, 2001.
29. Hill PW, Goldstein H. Multilevel modelling of educational data with cross-classification and missing identification of units. Journal of Educational and Behavioural Statistics 1998; 23:117–128.
30. Langford I, Leyland AH, Rasbash J, Goldstein H. Multilevel modelling of the geographical distributions of diseases. Journal of the Royal Statistical Society, Series C 1999; 48:253–268.
31. Goldstein H, Rasbash J, Browne W, Woodhouse G, Poulain M. Multilevel models in the study of dynamic household structures. European Journal of Population 2000; 16:373–387.
32. Hedges LV, Olkin IO. Statistical Methods for Meta Analysis. Academic Press: Orlando, Florida, 1985.
33. Raudenbush S, Bryk AS. Empirical Bayes meta-analysis. Journal of Educational Statistics 1985; 10:75–98.
34. Hedeker D, Gibbons R. A random effects ordinal regression model for multilevel analysis. Biometrics 1994; 50:933–944.
35. Raudenbush SW, Yang M, Yosef M. Maximum likelihood for generalised linear models with nested random effects via high-order multivariate Laplace approximation. Journal of Computational and Graphical Statistics 2000; 9:141–157.
36. Barbosa MF, Goldstein H. Discrete response multilevel models for repeated measures; an application to voting intentions data. Quality and Quantity 2000; 34:323–330.
37. Olsen MK, Schafer JL. A two-part random effects model for semi-continuous longitudinal data. Journal of the American Statistical Association 2001; 96:730–745.
38. Clayton DG. A Monte Carlo method for Bayesian inference in frailty models. Biometrics 1991; 47:467–485.
39. McCullagh P, Nelder J. Generalised Linear Models. Chapman and Hall: London, 1989.
40. Browne W, Draper D. Implementation and performance issues in the Bayesian and likelihood fitting of multilevel models. Computational Statistics 2000; 15:391–420.
41. Geman S, Geman D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 1984; 6:721–741.
42. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equations of state calculations by fast computing machines. Journal of Chemical Physics 1953; 21:1087–1092.
43. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970; 57:97–109.
44. Gilks WR, Wild P. Adaptive rejection sampling for Gibbs sampling. Journal of the Royal Statistical Society, Series C 1992; 41:337–348.
45. Goldstein H. Consistent estimators for multilevel generalised linear models using an iterated bootstrap. Multilevel Modelling Newsletter 1996; 8(1):3–6.
46. Carpenter J, Goldstein H, Rasbash J. A nonparametric bootstrap for multilevel models. Multilevel Modelling Newsletter 1999; 11(1):2–5.
TUTORIAL IN BIOSTATISTICS

Hierarchical linear models for the development of growth curves: an example with body mass index in overweight/obese adults

Moonseong Heo¹, Myles S. Faith², John W. Mott³, Bernard S. Gorman⁴, David T. Redden⁵ and David B. Allison⁵,⁶,∗,†

¹ Department of Psychiatry, Weill Medical School of Cornell University, White Plains, NY, U.S.A.
² Obesity Research Center, St. Luke’s-Roosevelt Hospital Center, Columbia University College of Physicians and Surgeons, New York, NY, U.S.A.
³ New York State Psychiatry Institute, New York, NY, U.S.A.
⁴ Nassau Community College, New York, NY, U.S.A.
⁵ Department of Biostatistics, School of Public Health, The University of Alabama at Birmingham, Birmingham, Alabama, U.S.A.
⁶ Clinical Nutrition Research Center, School of Public Health, The University of Alabama at Birmingham, Birmingham, Alabama, U.S.A.
SUMMARY

When data are available on multiple individuals measured at multiple time points that may vary in number or inter-measurement interval, hierarchical linear models (HLM) may be an ideal option. The present paper offers an applied tutorial on the use of HLM for developing growth curves depicting natural changes over time. We illustrate these methods with an example of body mass index (BMI; kg/m²) among overweight and obese adults. We modelled among-person variation in BMI growth curves as a function of subjects’ baseline characteristics. Specifically, growth curves were modelled with two-level observations, where the first level was each time point of measurement within each individual and the second level was each individual. Four longitudinal databases with measured weight and height met the inclusion criteria and were pooled for analysis: the Framingham Heart Study (FHS); the Multiple Risk Factor Intervention Trial (MRFIT); the National Health and Nutritional Examination Survey I (NHANES-I) and its follow-up study; and the Tecumseh Mortality Follow-up Study (TMFS). Results indicated that significant quadratic patterns of the BMI growth trajectory depend primarily upon a combination of age and baseline BMI. Specifically, BMI tends to increase with time for younger people with relatively moderate obesity (25 ≤ BMI < 30) but decrease for older people regardless of degree of obesity. The gradients of these changes are inversely related to baseline BMI and do not substantially depend on gender. Copyright © 2003 John Wiley & Sons, Ltd.

KEY WORDS: obesity; body mass index; growth curves; hierarchical linear model; pooling
∗ Correspondence to: David B. Allison, Department of Biostatistics, Clinical Nutrition Research Center, School of Public Health, The University of Alabama at Birmingham, 327M Ryals Public Health Building, 1665 University Boulevard, Birmingham, AL 35294-0022, U.S.A.
† E-mail: [email protected]
Contract/grant sponsor: National Institute of Health; contract/grant numbers: R01DK51716, P01DK42618, P30DK26687, K08MH01530.
1. INTRODUCTION

It is useful in many applied settings to estimate the expected pattern of within-individual change in some continuously distributed variable over time. This may involve developing depictions of growth or maturation in development studies (for example, reference [1]), ‘course’ in a chronic disorder (for example, reference [2]), short-term patterns of change in response to a stimulus such as an oral glucose load (for example, reference [3]), or change in a clinical trial [4]. There are many possible approaches to estimating such patterns of change. However, many have serious limitations when longitudinal changes in data vary with individual characteristics; not all individuals are measured the same number of times, and not all individuals are measured at the same intervals. These are common occurrences in observational studies. Hierarchical linear modelling (HLM) [5], an extension of the mixed model (also referred to as random coefficient regression modelling [6], multilevel linear modelling [7], nested modelling, or seemingly unrelated regression), is a very flexible approach to such situations [8–11]. Among other reviews, Sullivan et al. [12] recently presented a tutorial on HLM in the spirit of analysis of covariance (ANCOVA) with random coefficients.

The purpose of the present paper is to illustrate how HLM can be extended to develop growth curves with an easily understandable and clinically important example, body mass index (BMI; kg/m²). In Section 2, we present a pedagogical exposition with hypothetical data and introduce available statistical software for conducting HLM. In Section 3, the clinical implication of derived BMI growth curves is discussed. In Section 4, procedures for selecting specific databases for developing BMI growth curves are presented. In Section 5, specific statistical approaches for conducting HLM are introduced. In Section 6, results are presented and the nuances of the growth curves discussed. In Section 7, methodological issues related to HLM and clinical application of developed curves are reviewed.
2. PEDAGOGICAL EXPOSITION OF HLM

2.1. Example

It has been said that a statistician is someone who can drown in a river that has an average depth of one foot. Obviously, one has to consider variability of depths around average levels. We also have to realize that when we pool individual time series and growth curves, these curves may vary in their mean levels, their trends, and in their variability around mean levels. In this section we will show how multilevel modelling can be used to assemble composite growth curves that account for variability attributable to different data sources, different data collection periods and different individuals. As a prelude to the analysis of body mass index (BMI; kg/m²) data, we will start with a simple example. We will first analyse the data by ordinary least squares (OLS) methods and then reanalyse the data by means of hierarchical linear modelling (HLM). Table I presents hypothetical growth curve data from 30 cases observed on ten occasions. Plots of their growth are shown in Figure 1. We shall designate the first 15 cases as ‘group 1’. The next 15 cases will be designated as ‘group 2’. Because we believed that some cases show both linear and curvilinear growth, we formed a quadratic equation model to predict the score of any
Table I. Hypothetical example data with 30 cases in two groups over 10 time periods.

                            Time period
Case  Group  Contrast code  T1     T2     T3     T4     T5     T6     T7     T8     T9     T10
1     1      −1             6.46   8.33   11.48  13.28  15.53  17.10  20.37  22.65  26.10  29.46
2     1      −1             7.03   8.15   10.76  12.53  14.10  15.99  18.14  20.55  21.88  25.29
3     1      −1             7.37   9.31   11.15  13.68  17.30  19.23  22.46  25.92  30.08  34.27
4     1      −1             6.11   9.16   10.36  14.09  16.03  18.87  21.53  25.23  28.67  32.22
5     1      −1             6.91   8.50   9.90   12.11  13.94  16.01  17.69  17.89  20.08  21.91
6     1      −1             5.73   10.03  10.17  11.69  12.34  14.86  16.05  16.84  17.86  19.14
7     1      −1             7.06   9.54   9.66   12.15  15.61  17.90  21.32  24.74  27.87  30.90
8     1      −1             6.61   8.17   10.89  12.90  14.08  16.71  18.30  21.16  24.22  27.04
9     1      −1             7.58   8.02   10.88  13.75  16.28  18.16  20.27  24.54  26.59  29.18
10    1      −1             5.90   9.00   10.74  12.29  15.19  17.50  19.40  22.91  25.03  28.14
11    1      −1             6.65   7.73   10.06  11.96  13.97  15.10  17.20  18.03  19.90  21.93
12    1      −1             6.50   8.43   9.71   11.31  13.40  14.31  15.92  18.90  19.44  21.98
13    1      −1             6.57   8.73   10.37  12.74  14.73  15.85  17.68  20.15  22.11  23.96
14    1      −1             6.27   9.26   10.89  13.53  16.26  18.67  20.92  25.26  28.44  31.56
15    1      −1             6.01   8.41   10.23  15.38  16.50  19.38  23.76  27.52  30.97  35.49
16    2      1              7.55   10.75  12.91  17.13  21.40  25.22  30.76  35.37  41.71  47.78
17    2      1              7.81   9.95   13.36  17.07  21.36  25.16  29.13  35.11  40.54  45.87
18    2      1              6.05   10.55  13.83  17.45  19.68  24.62  29.22  33.88  40.17  46.30
19    2      1              6.96   10.99  13.99  17.49  21.13  26.96  31.96  37.68  42.93  50.39
20    2      1              6.99   10.44  13.26  17.33  19.50  24.79  28.23  32.72  36.85  43.05
21    2      1              7.91   10.14  12.71  15.56  19.18  22.69  25.70  29.76  34.41  37.04
22    2      1              6.74   10.29  14.26  17.38  20.19  25.51  29.97  34.88  41.39  47.17
23    2      1              7.79   9.92   13.01  16.26  19.66  24.49  29.92  33.19  38.97  45.34
24    2      1              7.97   10.90  14.12  17.78  21.75  26.94  31.77  37.68  44.49  50.07
25    2      1              7.88   11.06  12.62  17.31  20.01  23.87  28.34  31.86  37.10  41.16
26    2      1              7.71   9.64   13.24  15.47  18.93  22.37  26.06  31.07  34.55  38.73
27    2      1              8.30   10.48  12.45  16.49  19.18  22.76  26.83  30.62  35.88  39.65
28    2      1              8.11   10.34  13.37  16.34  19.80  24.41  28.59  33.64  37.41  43.41
29    2      1              7.33   9.89   13.52  18.25  22.13  26.33  31.87  37.63  43.01  49.58
30    2      1              8.05   11.40  14.59  17.56  21.20  25.40  31.25  35.69  41.67  48.06
individual j, given any time point t. Our equation is as follows:

$$y_{jt} = \beta_0 + \beta_1 t + \beta_2 t^2 + e_{jt}$$

in which $\beta_0$ is the intercept, $\beta_1$ is the linear term, $\beta_2$ is the quadratic term, and $e$ is the residual. Equations that characterize the prediction of the smallest data unit are called ‘level-1’ equations. In this example, the smallest data unit is a score from an individual at a given time point. Thus, we have 300 level-1 units. An OLS solution based on the 300 scores obtained from 30 cases at 10 time periods produced the level-1 equation $y = 5.11 + 1.95t + 0.12t^2$. Both the linear and quadratic terms were significantly different from zero at the 0.01 level. Note that level-1 units are nested within larger groupings. In the case of time series and growth curve data, the higher level is typically the individual case. There are 30 ‘level-2’ units in this example. If we wished to do so, we could have considered even higher-level units. For example, we could have considered the two groups of 15 cases each to be ‘level-3’ units.
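To make the level-1 fit concrete, the sketch below fits the quadratic by ordinary least squares to the ten scores of case 1 in Table I, taking t = 1, ..., 10 as the time coding (an assumption, though it reproduces the case 1 coefficients reported in Table II). The helper name is hypothetical and the normal equations are solved directly in plain Python.

```python
def fit_quadratic(times, scores):
    """OLS fit of y = b0 + b1*t + b2*t^2 via the normal equations."""
    rows = [[1.0, float(t), float(t) ** 2] for t in times]
    # Build X'X and X'y for the design columns 1, t, t^2.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    xty = [sum(r[a] * y for r, y in zip(rows, scores)) for a in range(3)]
    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[a] + [xty[a]] for a in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[a][3] / aug[a][a] for a in range(3)]


# Scores for case 1 at T1..T10, copied from Table I.
case1 = [6.46, 8.33, 11.48, 13.28, 15.53, 17.10, 20.37, 22.65, 26.10, 29.46]
b0, b1, b2 = fit_quadratic(range(1, 11), case1)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

Repeating the same call for each of the 30 cases gives the case-by-case coefficients used in the two-step analysis.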
[Figure 1: plot of outcome (vertical axis) against time (horizontal axis); legend: Group 1, Group 2.]

Figure 1. Graphical representation of the growth curves for the 30 hypothetical cases over 10 time periods in data of Table I; group 1 and 2 members are represented by solid and dotted lines, respectively.
However, for our purposes, we will consider group membership to be a characteristic of each level-2 unit. Let us show how level-2 characteristics can be related to the level-1 equation. First, let us carry out the same level-1 regression equation within each group. When we select only the 15 cases in group 1, we obtain the equation (Table II) $y = 4.91 + 1.74t + 0.05t^2$. The equation for group 2 is $y = 5.30 + 2.16t + 0.18t^2$. Next, we shall carry out the equation within each one of our 30 level-2 units. Their intercepts, linear and quadratic coefficients are presented in Table II, in which coefficients for intercepts, linear and quadratic terms differed considerably among cases and between groups. While cases in both groups had positive linear components, the linear and quadratic components for the members of group 2 are typically larger than for those in group 1. We can form equations that show how slopes and intercepts for level-2 units (cases) differ as a function of characteristics such as group membership. These equations, in which the dependent variables are regression coefficients and the independent variables are characteristics of level-2 units, such as group membership, age, health status etc., are called ‘level-2’ equations. In our example, we consider only one independent variable: group membership. If we create a contrast code for group membership, G, so that members of group 1 are coded as −1 and those in group 2 are coded as +1 (that is, G = −1 and G = 1 for the subjects in groups 1 and 2, respectively), then we can regress the intercepts and slopes of each case on group membership. Using this approach, often called ‘slopes as outcome’, yields the OLS level-2 equations (Table III): $\beta_0 = 5.106 + 0.196G$; $\beta_1 = 1.950 + 0.209G$; and $\beta_2 = 0.116 + 0.065G$, for the intercept, linear and quadratic terms, respectively; the effects of group membership are statistically significant only for $\beta_2$ at the 0.05 $\alpha$-level. Inserting the group
Table II. Individual and group growth functions.

Case  Group  Contrast code  β0      β1      β2
1     1      −1             4.9717  1.7028  0.0711
2     1      −1             5.3630  1.5594  0.0390
3     1      −1             5.8408  1.4252  0.1402
4     1      −1             4.4905  1.8721  0.0893
5     1      −1             4.7575  1.9535  −0.0261
6     1      −1             4.8202  1.9449  −0.0531
7     1      −1             5.5502  1.3102  0.1278
8     1      −1             5.2838  1.4760  0.0676
9     1      −1             4.9323  1.9040  0.0550
10    1      −1             4.4497  1.8710  0.0485
11    1      −1             4.5135  1.9063  −0.0193
12    1      −1             5.2107  1.4509  0.0207
13    1      −1             4.8520  1.8775  0.0028
14    1      −1             4.7593  1.8126  0.0877
15    1      −1             3.8638  2.0476  0.1101
16    2      1              5.4038  2.0231  0.2215
17    2      1              5.1818  2.1956  0.1891
18    2      1              4.7237  2.1944  0.1917
19    2      1              4.9400  2.2586  0.2256
20    2      1              4.7427  2.4489  0.1326
21    2      1              5.2245  2.2963  0.0949
22    2      1              4.9218  2.2094  0.2001
23    2      1              5.5230  1.8689  0.2092
24    2      1              5.6982  2.0624  0.2417
25    2      1              5.4235  2.3266  0.1273
26    2      1              5.2567  2.1308  0.1247
27    2      1              6.0532  1.8966  0.1501
28    2      1              5.7103  2.0230  0.1742
29    2      1              4.3565  2.5504  0.1966
30    2      1              6.3695  1.8932  0.2261

Means
All cases                   5.1063  1.9497  0.1156
Group 1                     4.9106  1.7409  0.0508
Group 2                     5.3019  2.1585  0.1804
contrast codes into the level-2 equations yields the mean slopes for each group as shown in Table II. If we used the group-specific coefficients for slopes instead of the original coefficients from an overall level-1 equation based on all 300 units to predict scores within each case, we would have increased prediction accuracy. Thus, it can be seen that level-2 equations can provide useful information for level-1 equations because we can see how level-1 coefficients are functions of level-2 characteristics. The two-step OLS procedure described above was used to illustrate the notion of multilevel modelling. However, it has several drawbacks. For one, it is extremely cumbersome to first compute separate level-1 equations within each level-2 unit and then compute level-2 equations to link slopes and intercepts to level-2 information. In the BMI study to follow, this would involve thousands of individual equations. For another, the data for each
Table III. Estimates of fixed and random coefficients.

                                      OLS                     HLM
                                Estimate   SE           Estimate   SE
Fixed coefficients
Intercept (β0)   γ0             5.1063     0.4918*      5.1063*    0.1072
                 γ01            0.1957     0.4918       0.1957     0.1072
Linear (β1)      γ1             1.9497     0.2054*      1.9497*    0.0449
                 γ11            0.2088     0.2054       0.2088*    0.0449
Quadratic (β2)   γ2             0.1156     0.0182*      0.1156*    0.0092
                 γ21            0.0648     0.0182*      0.0648*    0.0092

Random coefficients
Level-2          var(u0)        0.2451*    0.0633       0.0034     0.0950
                 cov(u0, u1)    −0.0887*   0.0250       0.0011     0.0375
                 var(u1)        0.0446*    0.0115       0.0009     0.0166
                 cov(u0, u2)    0.0045     0.0046       −0.0025    0.0057
                 cov(u1, u2)    −0.0043*   0.0021       −0.0004    0.0025
                 var(u2)        0.0025*    0.0006       0.0021*    0.0007
Level-1          var(e)         0.1799*    0.0148       0.2466*    0.0241

* p-value < 0.05.
level-2 unit is collapsed to a few data points, thus losing both statistical power and information about intra-individual variation. Further, to have comparable equations using OLS methods, each level-2 unit should have the same number of observations based on the same times of observations with no missing data values. While highly desirable, this is often not feasible for many clinical studies. Finally, OLS estimation is based on the assumption that the residual from the prediction of each dependent variable observation is independent of those of other observations. This is highly unlikely to occur with longitudinal data because observations are nested within higher-level units. In the present example, while there were 300 observations for scores, these observations were nested within 30 individuals. While estimates of coefficients will not necessarily be biased, standard errors for these coefficients will almost always be biased.

Hierarchical linear modelling (HLM) provides a more elegant approach to computing multilevel regression problems than does OLS. In HLM, we state both the level-1 and level-2 equations and then iteratively solve for coefficients at both levels. This is typically achieved by means of numerical methods such as full information maximum likelihood estimation (FIML). Typically, OLS solutions are used as starting values of level-1 and level-2 coefficients but these coefficients are then refined through iteration. We typically designate the level-1 coefficients in HLM models using $\beta$ to symbolize slopes and intercepts. We label the coefficients of our level-2 equation using $\gamma$. Level-1 residuals are designated by $e$ while level-2 residuals are designated by $u$. Figure 2 displays a structural model for our example.
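The second step of the two-step OLS procedure discussed in this section (‘slopes as outcome’) can be checked with a few lines of code. The sketch below regresses the 30 case-specific intercepts from Table II on the contrast code G; the helper name is hypothetical. Because G is coded −1/+1 and the groups are balanced, the fitted intercept is the average of the two group means and the fitted slope is half their difference.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Case-specific level-1 intercepts (beta_0) for cases 1-30, from Table II.
intercepts = [
    4.9717, 5.3630, 5.8408, 4.4905, 4.7575, 4.8202, 5.5502, 5.2838, 4.9323, 4.4497,
    4.5135, 5.2107, 4.8520, 4.7593, 3.8638, 5.4038, 5.1818, 4.7237, 4.9400, 4.7427,
    5.2245, 4.9218, 5.5230, 5.6982, 5.4235, 5.2567, 6.0532, 5.7103, 4.3565, 6.3695,
]
contrast = [-1] * 15 + [1] * 15  # group 1 coded -1, group 2 coded +1

gamma_0, gamma_01 = simple_ols(contrast, intercepts)
print(round(gamma_0, 4), round(gamma_01, 4))
```

The same call applied to the linear and quadratic coefficients in Table II gives the remaining OLS level-2 equations.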
[Figure 2: path diagram in which Constant, Time and Time^2 predict Score through the coefficients b0, b1 and b2, with level-1 residual e1; b0 = γ0 + γ01 * Group + u0, b1 = γ1 + γ11 * Group + u1, b2 = γ2 + γ21 * Group + u2.]

Figure 2. Diagram of the model fitting the growth curves in Figure 1 of the 30 hypothetical cases used for a pedagogical exposition.
As in the OLS example, $\beta_0$, $\beta_1$ and $\beta_2$ are the respective level-1 intercept, linear and quadratic terms. Level-2 equations predict the level-1 coefficients. For example, to predict $\beta_0$, the intercept, we use $\gamma_0$ to predict a common intercept for all level-1 units and $\gamma_{01}$ to account for additional effects of group membership while $u_0$ indicates a residual that is unique to the level-2 unit. Similarly, for $\beta_1$, the linear term, we use $\gamma_1$ to represent a common linear term for all level-1 units and $\gamma_{11}$ to represent the effect of group membership, and $u_1$ represents the unique residual for the linear term. Finally, to predict the quadratic coefficient, $\gamma_2$ represents a common quadratic effect, $\gamma_{21}$ represents the effect of group on the quadratic coefficient and $u_2$ represents a unique residual. It is assumed that $e$ and any of $u$ are independent, but $u_0$, $u_1$ and $u_2$ are correlated with each other. We could combine the level-1 and level-2 equations into one equation:

$$y = \gamma_0 + \gamma_{01}G + \gamma_1 t + \gamma_{11}Gt + \gamma_2 t^2 + \gamma_{21}Gt^2 + tu_1 + t^2 u_2 + u_0 + e$$

The terms with the $\gamma$ coefficients are called fixed coefficients because they are specified as independent variables in the level-1 and level-2 equations. The terms involving $u$ and $e$ are random coefficients, as they cannot be measured beforehand. It may be useful to think of terms in which $\gamma$ coefficients are multipliers of independent variables as moderator effects. Thus, depending on their magnitude, they will moderate the strength and, possibly, the direction of the relationships between the independent and dependent variables. The program MLA [13] (http://www.fsw.leidenuniv.nl/www/w3_ment/medewerkers/bushing/MLA.HTM), a user-friendly software program, was used to compute the HLM model for the present example. A code file for this program can be found in Appendix A1. The results of the HLM analysis and OLS analysis are presented in Table III.
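As a small worked illustration of the combined equation, the sketch below evaluates it using the fixed coefficient estimates reported in Table III. The dictionary keys and function name are hypothetical; the level-2 residuals $u_0$, $u_1$, $u_2$ default to zero, which gives the average curve for a group, and the level-1 residual $e$ is omitted.

```python
# Fixed-coefficient estimates taken from Table III.
GAMMA = {"g0": 5.1063, "g01": 0.1957,
         "g1": 1.9497, "g11": 0.2088,
         "g2": 0.1156, "g21": 0.0648}


def predicted_score(t, group_code, u=(0.0, 0.0, 0.0), gamma=GAMMA):
    """Evaluate the combined level-1/level-2 equation without the residual e.

    group_code is the contrast code G (-1 for group 1, +1 for group 2);
    u holds the case-specific residuals (u0, u1, u2), zero for the group average.
    """
    b0 = gamma["g0"] + gamma["g01"] * group_code + u[0]  # level-2 equation for the intercept
    b1 = gamma["g1"] + gamma["g11"] * group_code + u[1]  # level-2 equation for the linear term
    b2 = gamma["g2"] + gamma["g21"] * group_code + u[2]  # level-2 equation for the quadratic term
    return b0 + b1 * t + b2 * t * t                      # level-1 equation


for code in (-1, 1):
    print(code, [round(predicted_score(t, code), 2) for t in (1, 5, 10)])
```

Supplying non-zero values for u shifts an individual case's curve away from its group average, which is how the random coefficients enter the prediction.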
M. HEO ET AL.
Examination of Table III shows that while the estimates for the fixed coefficients for OLS and HLM are the same, their standard errors differ. While this is not always the case, the standard errors are smaller for the HLM solution. Table III also reveals that while OLS and HLM produced the same fixed coefficients, they differed for the random coefficients. This effect is due to the fact that HLM's iterative algorithms allow it to use more information.

The present example used a data set in which no case had any missing data values. Because HLM revises its parameter estimates in an iterative manner, it is not absolutely necessary to have complete data sets. Of course, the more complete the data, the better the estimates. As shown in the following BMI growth curve example, this will allow us to combine fragments of growth curves and then 'splice' them together to form a larger curve.

The present example used level-2 equations that modelled both common and group-specific gammas and unique residuals for the intercept, the linear and the quadratic term. However, HLM provides a great deal of flexibility for modelling effects. For example, we could specify the same slopes for all subjects but permit the intercepts to vary. If we created a model with the same slopes and intercepts for each case, then we would have a model that is identical to an OLS model that ignores clustering within subjects.

At this point, one might ask how we might select the best model. As with other multivariate procedures, the best model is typically one that balances minimization of error variance with minimization of the number of parameters to be estimated. All of the programs that perform HLM provide measures of model fit, ranging from mean-squared error terms to deviance measures. Akaike's information criterion (AIC) [14] and the Bayesian information criterion (BIC) [15] can help choose models that minimize errors while adjusting for the number of parameters to be estimated.
These criteria are applied in the subsequent analyses.

2.2. Statistical software

Besides MLA, there exists a range of statistical packages that can conduct HLM. Goldstein (reference [7, Chapter 11]) described all of the following packages in terms of source, modelling, adopted methodologies and limitations: BIRAM; BMDP-5V; BUGS (http://www.mrc-bsu.cam.ac.uk/bugs); EGRET (http://www.cytel.com); GENMOD; GENSTAT (http://www.nag.co.uk/stats/TT_soft.asp); HLM (http://www.ssicentral.com); ML3/MLn (http://multilevel.ioe.ac.uk/index.html); SABRE (http://www.cas.lancs.ac.uk/software/sabre3.1/sabre.html); SAS (http://www.sas.com); SUDAAN (http://www.rti.org), and VARCL (distributed by ProGamma, http://www.gamma.rug.nl). Kreft et al. [16] reviewed five particular packages: BMDP-5V; GENMOD; HLM; ML3, and VARCL. Singer [17] demonstrated the practical use of the SAS PROC MIXED program in multilevel modelling with program codes. Sullivan et al. [12] further detailed programming with SAS PROC MIXED and HLM/2L [18].

The most recent revisions of standard SEM packages, such as LISREL (http://www.ssicentral.com), EQS (http://www.mvsoft.com/), Mplus (http://www.statmodel.com/index2.html) and AMOS (http://www.smallwaters.com/amos/), have multilevel analysis capability. In addition, GLLAMM6 (http://www.iop.kcl.ac.uk/iop/departments/biocomp/programs/gllamm.html), MIXOR/MIXREG (http://www.uic.edu/~hedeker/mix.html) and OSWALD (http://www.maths.lancs.ac.uk/Software/Oswald) should perform the multilevel analyses.

In Appendix A, we include code and output files from the following three software packages using the data in Table I: MLA; SAS, and MLwin. The first two used the maximum-likelihood (ML) method for fitting purposes but MLwin used the iterative generalized least squares (IGLS)
BMI GROWTH CURVE
method, which is asymptotically equivalent to the ML method when the normal distribution assumption is met [7]. Although all these software programs produced the same estimates of the fixed effects and their standard errors, other estimates are slightly different. These differences, an issue for future investigation, may be due to program-specific maximization routines and/or the relatively small sample size of the example data in Table I. However, the significance of the estimated random effects agrees among the three programs. For the subsequent analyses and calculations, we used MLwin v1.0 software [19] (http://multilevel.ioe.ac.uk/index.html), the Windows version of MLn. The other software in the Appendix was not able to accommodate the size of the data that we analysed below.
3. CLINICAL IMPLICATION OF DEVELOPMENT OF BMI GROWTH CURVE

Obesity is increasingly common in the United States and a serious risk factor for decreased longevity and many medical disorders such as diabetes and cardiovascular disease [20]. According to the third National Health and Nutrition Examination Survey (NHANES-III), over half the U.S. adult population is overweight or obese [21] and the prevalence continues to increase [22].

From an epidemiologic point of view, BMI alone may not capture the true relations of body composition to health outcomes [23] because BMI does not distinguish between lean mass and fat mass. Nevertheless, BMI is highly correlated with weight and fat mass, and tends to be approximately uncorrelated with height. BMI is arguably the most commonly used index of relative weight in clinical trials and epidemiological studies of the health consequences of obesity [24]. Furthermore, BMI was the index adopted in the most recent NHLBI 'Clinical guidelines on the identification, evaluation, and treatment of overweight and obesity in adults' [21]. For these reasons, we used BMI in our analyses. In this paper, overweight/obesity is defined as a body mass index (BMI; kg/m²) greater than or equal to 25. Although the NHLBI guidelines [21] classify this as 'overweight' (25 ≤ BMI < 30) and 'obesity' (BMI ≥ 30), we combine these groups for pedagogical reasons.

A growing number of clinical trials evaluating adult obesity treatments have been initiated in recent years [21]. A primary aim of these clinical trials is to assess the long-term effectiveness of the experimental treatment [25]. At present, however, there are few data to predict the expected change in weight associated with the passage of time among obese persons who do not receive treatment. For ethical and practical reasons, it is generally not possible to maintain a 'pure' control group of obese adults from whom all treatments are withheld over a number of years.
As Brownell and Jeffery [26] argue, this makes it difficult to assess the long-term effectiveness of an experimental treatment since there is no control group against which it can be compared. Furthermore, the point of such comparison is a moving target. This issue is especially relevant in light of proposals that obese people receive treatment for life [27].

One possible solution to this problem is the development of 'natural growth curves' of body weight for overweight/obese individuals in the general population. In theory, such growth curves could provide ideal control group data against which experimental groups could be compared over time. That is, growth curves could serve as a 'surrogate' for the performance of control groups.

A few cross-sectional studies (for example, reference [22]) have attempted to model BMI changes with age. However, this approach can yield misleading results since it assumes that the within-individual growth curve is the same as the between-individual curve. This need
not be the case, especially if the growth trajectory depends on individuals' characteristics such as sex, age and initial baseline BMI. For example, the trajectories could be linearly increasing, linearly decreasing, U-shaped or inverted U-shaped. Other limitations of the cross-sectional studies include susceptibility to cohort secular effects and selection bias. This necessitates modelling variation in growth curves between individuals as a function of such characteristics. As demonstrated in the following sections, HLM offers a flexible approach for developing such 'BMI growth curves'. Given this modelling flexibility and the potential clinical utility of BMI growth curves, this study used HLM to develop an estimate of natural growth curves for overweight/obese adults in the U.S. population.
4. SELECTION OF DATABASES TO DEVELOP BMI GROWTH CURVES

Databases satisfying the following requirements were included in the present study:

1. Raw individual subject data on measured weight and height available at multiple time points. Measured height and weight were required since self-reported weights are often underestimated, especially by obese individuals [28].
2. Individuals aged ≥ 18 years at baseline. Younger individuals, if any, were excluded.
3. Included overweight/obese individuals, defined as BMI ≥ 25, at baseline. Individuals with lower BMIs were excluded.
4. Baseline information available on sex and age.

A thorough search for eligible databases was conducted via the following data sources: electronic government databases on health information published through the National Center for Health Statistics (NCHS) for the databases of the Centers for Disease Control and Prevention (CDC); Young et al. [29]; archival data repositories at the Henry A. Murray Research Center and the Institute for Social Research. Four eligible databases were located. They are summarized below and in Table IV.
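The subject-level part of the inclusion criteria above can be written as a simple filter. This is a sketch under stated assumptions: the record layout (a dict with the keys `age`, `sex`, `weight_kg`, `height_m` and `measured`) and the function names are hypothetical and do not correspond to the variable names of any of the four databases.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def is_eligible(baseline):
    """Apply the subject-level criteria to one baseline record.

    `baseline` is a dict with hypothetical keys:
    'age', 'sex', 'weight_kg', 'height_m', 'measured'.
    """
    # Criterion 1: weight and height must be measured, not self-reported
    if not baseline.get("measured", False):
        return False
    # Criterion 4: baseline sex and age must be available
    if baseline.get("age") is None or baseline.get("sex") is None:
        return False
    # Criterion 2: adults only
    if baseline["age"] < 18:
        return False
    # Criterion 3: overweight/obese at baseline
    return bmi(baseline["weight_kg"], baseline["height_m"]) >= 25

subject = {"age": 40, "sex": "M", "weight_kg": 85.0, "height_m": 1.75, "measured": True}
print(is_eligible(subject))
```

The requirement that a database contain measurements at multiple time points is a property of the database rather than of a single subject, so it is not part of this per-record check.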
4.1. Framingham Heart Study (FHS)

This longitudinal prospective cohort study was an epidemiological investigation of cardiovascular disease. It was initiated in 1948 and followed an initial sample of 5209 residents from Framingham County of Massachusetts every other year for 30 years, yielding up to 16 examinations per subject [30]. Our inclusion criteria resulted in 2665 eligible subjects.

4.2. Multiple Risk Factor Intervention Trial (MRFIT)

This study, initiated in 1974, followed 12866 men every year for 7–8 years, yielding up to eight examinations per subject [31, 32]. The objective of this trial was to intervene on the primary risk factors for coronary heart disease. This study included an intervention and a control group. We excluded 6428 subjects assigned to the intervention group to avoid possible effects of interventions on natural weight change. The inclusion criteria were therefore applied only to the control group subjects and resulted in 4599 subjects eligible for the present study.
Table IV. Descriptive statistics of eligible subjects by databases.

Mean (SD; min, max) over subjects:

| Study | Sample size (% male) | Baseline age | Baseline BMI | Follow-up years from baseline | Number of time points* of measurements |
|---|---|---|---|---|---|
| FHS | 2665 (50) | 45.7 (8.5; 28, 62) | 28.7 (3.42; 25, 56.7) | 22.0 (9.2; 0, 30) | 11.2 (4.75; 1, 16) |
| MRFIT | 4599 (100) | 46.4 (6.0; 35, 58) | 28.8 (2.89; 25, 40.9) | 6.06 (1.4; 0, 7.9) | 6.8 (1.58; 1, 8) |
| NHANES-I and NHEFS | 4999 (48) | 52.1 (14.7; 25, 77) | 29.3 (4.08; 25, 63.3) | 7.96† (4.1; 0, 12.6) | 1.8 (0.40; 1, 2) |
| TMFS | 2112 (54) | 44.8 (14.6; 18, 91) | 29.0 (3.90; 25, 56.1) | 5.81 (3.5; 0, 10.1) | 2.4 (0.83; 1, 3) |
| Total | 14375 (66) | 48.1 (11.8; 18, 91) | 29.0 (3.60; 25, 63.3) | 9.63 (7.7; 0, 30) | 5.2 (4.23; 1, 16) |

Baseline smoking status:

| Study | Ever‡ (%) | Never (%) | Missing (%) | Total |
|---|---|---|---|---|
| FHS | 1383 (51.9%) | 1262 (47.4%) | 20 (0.7%) | 2665 |
| MRFIT | 2579 (56.1%) | 2020 (43.9%) | 0 (0.0%) | 4599 |
| NHANES-I and NHEFS | 746 (14.9%) | 692 (13.9%) | 3561 (71.2%) | 4999 |
| TMFS | 888 (42.1%) | 1223 (57.9%) | 1 (0.0%) | 2112 |
| Total | 5596 (38.9%) | 5197 (36.2%) | 3582 (24.9%) | 14375 |

* Including baseline.
† Among those who had both measurements performed (baseline and follow-up), the average follow-up time was nearly 10 years.
‡ 'Current' plus 'former' smoker.
4.3. National Health and Nutrition Examination Survey I (NHANES-I) and its National Health Epidemiologic Follow-up (NHEFS)

The NHANES-I, designed to measure the nutritional status and health of the U.S. population aged 1–74 years, was conducted between 1971 and 1975, and examined 23808 subjects. A subset was followed up in 1982–1984, 1986, 1987 and 1992 [33]. Height and weight were measured only in the follow-up between 1982 and 1984. Subsequently, height and weight were self-reported [34]. Therefore, the subjects were measured, at most, twice in this study. The inclusion criteria resulted in 4999 eligible subjects.

4.4. Tecumseh Mortality Follow-up Study (TMFS)

This study was a longitudinal prospective investigation of health and disease status among residents of Tecumseh, Southeast Michigan. Heights and weights of subjects were measured over the entire follow-up period. Heights and weights of 8800 residents of 2400 households were first measured in 1959–1960 and served as our baseline. Two more follow-ups occurred in 1962–1965 and 1967–1969 [35]. A total of 2112 subjects were eligible for the present study.

4.5. Pooled data

We excluded from the analysis measurements at any follow-up time point taken from women who were pregnant at that time point. The four databases above were pooled into a single data set, yielding a total of 75351 data units (the first-level observations) from 14375 subjects (the second-level observations). Additionally, baseline smoking status ('ever' versus 'never') was available on 10793 subjects, yielding 68830 data units. When smoking status was included in models, subjects with missing observations were excluded from analyses.
5. STATISTICAL ANALYSES

5.1. Model construction

To estimate BMI trajectories over 'time', it is important to select an appropriate variable to reflect time in our model construction. The selection of the time variable and its origin must be clear in the context of a study. For example, in a quality control study of a copier, the number of copies produced by the copier could be a better time variable than the calendar time from its year of manufacture. Similarly, in the present study there may be at least two alternative time variables: age of the subjects and time-on-study, the calendar time from the baseline. One can construct HLM using age as the key 'time' variable, as discussed in the context of the Cox proportional hazards regression model [36], although the origin of the time variable age, that is 0, may not be appropriate for the present study. In the present example, time-on-study may be a more appropriate variable because its time origin is clear and it is appealing from a clinical point of view to predict future BMI at different points in time.

Let us denote 'time' by $T_{ij}$, the length of the time interval in years between the baseline and the consecutive $i$th ($i = 1, 2, \ldots, n_j$) follow-up for the $j$th ($j = 1, 2, \ldots, N$) individual in the pooled data set. For notational convenience, $i = 1$ for the baseline, which implies $T_{1j} = 0$,
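for all $j$. First, for the first level of time points for each individual, the BMI, say $Y_{ij}$, was modelled in terms of $T_{ij}$ as follows:

$$Y_{ij} = \beta_{0j} + \beta_{1j} T_{ij} + \beta_{2j} T_{ij}^2 + e_{ij} = \mathbf{T}_{ij}^{\top} \boldsymbol{\beta}_j + e_{ij} \qquad (1)$$

where $\mathbf{T}_{ij} = (1, T_{ij}, T_{ij}^2)^{\top}$ is a vector and $\boldsymbol{\beta}_j = (\beta_{0j}, \beta_{1j}, \beta_{2j})^{\top}$ is a random vector independent of $e_{ij}$, the first-level residual. We assume that $E(e_{ij}) = 0$ and $\mathrm{cov}(e_{ij}, e_{fg}) = I\{i = f \,\&\, j = g\}\,\sigma^2$, where $I\{\cdot\}$ is an indicator function, that is, $I\{i = f \,\&\, j = g\} = 1$ if $i = f$ and $j = g$, and $= 0$ otherwise; the variance of the first-level error $e$ is assumed to be homogeneous and independent among subjects as well as among different time points within subjects. This assumption implies that the first-level residuals are uncorrelated after adjusting for trend, and $\mathrm{var}(e_{ij}) = \sigma^2$ for all $i$ and $j$. However, this assumption does not necessarily imply that the first-level observations within individuals are uncorrelated; see (4).

The random coefficients $\beta$'s are further modelled (as functions of covariates $Z$) for the second-level individuals by decomposing into fixed and random components as follows:

$$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_{1j} + \gamma_{02} Z_{2j} + \cdots + \gamma_{0p_0} Z_{p_0 j} + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} Z_{1j} + \gamma_{12} Z_{2j} + \cdots + \gamma_{1p_1} Z_{p_1 j} + u_{1j}$$

and

$$\beta_{2j} = \gamma_{20} + \gamma_{21} Z_{1j} + \gamma_{22} Z_{2j} + \cdots + \gamma_{2p_2} Z_{p_2 j} + u_{2j}$$

where $u_{0j}$, $u_{1j}$ and $u_{2j}$ represent residuals for intercepts, slope and quadratic terms, respectively. It follows that for $k = 0, 1$ and $2$

$$\beta_{kj} = \mathbf{Z}_{kj}^{\top} \boldsymbol{\gamma}_k + u_{kj} \qquad (2)$$

where the $\mathbf{Z}_{kj}$'s are vectors with different lengths depending on $k$ and their first component is 1, $\mathbf{Z}_{kj} = (1, Z_{1j}, Z_{2j}, \ldots, Z_{p_k j})^{\top}$ and $\boldsymbol{\gamma}_k = (\gamma_{k0}, \gamma_{k1}, \gamma_{k2}, \ldots, \gamma_{k p_k})^{\top}$. The other components of the $\mathbf{Z}_{kj}$'s are the covariate values for the $j$th individual's baseline characteristics. The vectors $\boldsymbol{\gamma}_k$ are fixed parameter column vectors, and $E(\mathbf{u}_j) = \mathbf{0} = (0, 0, 0)^{\top}$, $\mathbf{u}_j = (u_{0j}, u_{1j}, u_{2j})^{\top}$ and

$$\mathrm{cov}(\mathbf{u}_j) = \boldsymbol{\Sigma}_u = \begin{pmatrix} \mathrm{var}(u_{0j}) & \mathrm{cov}(u_{0j}, u_{1j}) & \mathrm{cov}(u_{0j}, u_{2j}) \\ \mathrm{cov}(u_{0j}, u_{1j}) & \mathrm{var}(u_{1j}) & \mathrm{cov}(u_{1j}, u_{2j}) \\ \mathrm{cov}(u_{0j}, u_{2j}) & \mathrm{cov}(u_{1j}, u_{2j}) & \mathrm{var}(u_{2j}) \end{pmatrix}$$

for all $j$. This covariance is modelled for the second-level variance of the growth curve. In sum, by introducing the second-level model (2) into the first-level model (1), we have

$$Y_{ij} = (\mathbf{Z}_{0j}^{\top} \boldsymbol{\gamma}_0,\ \mathbf{Z}_{1j}^{\top} \boldsymbol{\gamma}_1,\ \mathbf{Z}_{2j}^{\top} \boldsymbol{\gamma}_2)\, \mathbf{T}_{ij} + \mathbf{T}_{ij}^{\top} \mathbf{u}_j + e_{ij}$$

the first term on the right hand side (RHS) representing the fixed effects and the other terms on the RHS the random effects (or parameters). Thus, the hierarchical linear model can be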
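represented in terms of expectation and covariance in general as

$$E(\mathbf{Y}) = \mathbf{X} \boldsymbol{\gamma} \quad \text{and} \quad \mathrm{cov}(\mathbf{Y}) = \boldsymbol{\Sigma} \qquad (3)$$

where $\mathbf{Y} = (\mathbf{Y}_1^{\top}, \ldots, \mathbf{Y}_j^{\top}, \ldots, \mathbf{Y}_N^{\top})^{\top}$ and $\mathbf{Y}_j = (Y_{1j}, Y_{2j}, \ldots, Y_{n_j j})^{\top}$. The expectation $\mathbf{X} \boldsymbol{\gamma}$ and the covariance matrix $\boldsymbol{\Sigma}$ represent fixed and random effects, respectively. Specifically, for the fixed effects, the row of the design matrix $\mathbf{X}$ corresponding to $Y_{ij}$ is $(\mathbf{Z}_{0j}^{\top},\ T_{ij} \mathbf{Z}_{1j}^{\top},\ T_{ij}^2 \mathbf{Z}_{2j}^{\top})$, and $\boldsymbol{\gamma} = (\boldsymbol{\gamma}_0^{\top}, \boldsymbol{\gamma}_1^{\top}, \boldsymbol{\gamma}_2^{\top})^{\top}$ is the fixed coefficient vector. On the other hand, regarding the random effects, the elements of $\boldsymbol{\Sigma}$ are

$$\mathrm{cov}(Y_{ij}, Y_{fg}) = I\{i = f \,\&\, j = g\}\,\sigma^2 + I\{j = g\}\, \mathbf{T}_{ij}^{\top} \boldsymbol{\Sigma}_u \mathbf{T}_{fg} \qquad (4)$$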
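The design-matrix row and the within-subject covariance in (4) can be sketched numerically as follows. The function names `design_row` and `subject_cov`, and the example covariate values, are hypothetical; the code only restates the algebra above with NumPy.

```python
import numpy as np

def design_row(t, z0, z1, z2):
    """Row of X for one observation at time t: (Z0j', t * Z1j', t^2 * Z2j')."""
    z0, z1, z2 = (np.asarray(z, dtype=float) for z in (z0, z1, z2))
    return np.concatenate([z0, t * z1, t**2 * z2])

def subject_cov(times, sigma2, sigma_u):
    """Covariance of one subject's observations, following equation (4)."""
    times = np.asarray(times, dtype=float)
    T = np.vander(times, 3, increasing=True)       # rows are (1, t, t^2)
    return sigma2 * np.eye(times.size) + T @ np.asarray(sigma_u) @ T.T

# Hypothetical subject: Z0 = Z1 = (1, age = 30), Z2 = (1), observed at t = 2
print(design_row(2.0, [1, 30], [1, 30], [1]))

# With a zero second-level covariance the block reduces to sigma^2 * I
print(subject_cov([0.0, 2.0], 1.0, np.zeros((3, 3))))
```

Observations from different subjects have zero covariance under (4), so the covariance of the whole response vector can be assembled subject by subject from such blocks.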
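In other words, the variance of the first-level observation is $\mathrm{var}(Y_{ij}) = \sigma^2 + \mathbf{T}_{ij}^{\top} \boldsymbol{\Sigma}_u \mathbf{T}_{ij}$, which is the sum of the first- and the second-level variance, and the covariance between observations at two different time points $i$ and $f$ within each individual is $\mathbf{T}_{ij}^{\top} \boldsymbol{\Sigma}_u \mathbf{T}_{fj}$. This implies that the covariance matrix $\boldsymbol{\Sigma}$ is block-diagonal with size $\sum_{j=1}^{N} n_j \times \sum_{j=1}^{N} n_j$, that is, $\boldsymbol{\Sigma} = \bigoplus_{j=1}^{N} (\sigma^2 \mathbf{I}_{n_j} + \mathbf{V}_j)$, where $\oplus$ denotes the direct sum of matrices, $\mathbf{I}_{n_j}$ is the $n_j \times n_j$ identity matrix, and $\mathbf{V}_j$ is the $n_j \times n_j$ matrix with $(i, f)$th element $\mathbf{T}_{ij}^{\top} \boldsymbol{\Sigma}_u \mathbf{T}_{fj}$ for the $j$th individual. In this case, the $i$th time point is not necessarily different from the $f$th one. This covariance structure is crucial to the estimation procedure described in the following section, because the second-level covariance must be taken into consideration, which traditional approaches such as OLS estimation do not accommodate. OLS assumes that $\mathbf{V}_j$ is a zero matrix for all $j$.

5.2. Estimation

Under model (3) with unknown $\boldsymbol{\Sigma}$, which is the general case, estimation of both fixed and random effects would require iterative generalized least squares (IGLS; for example, reference [37]). Briefly, the generalized least squares estimate of $\boldsymbol{\gamma}$ is $\hat{\boldsymbol{\gamma}} = (\mathbf{X}^{\top} \tilde{\boldsymbol{\Sigma}}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \tilde{\boldsymbol{\Sigma}}^{-1} \mathbf{Y}$ for some initial estimate $\tilde{\boldsymbol{\Sigma}}$. With the newly estimated $\hat{\boldsymbol{\gamma}}$ and residuals calculated from it, the estimate $\tilde{\boldsymbol{\Sigma}}$ can be updated. With the updated estimate $\tilde{\boldsymbol{\Sigma}}$, the estimate $\hat{\boldsymbol{\gamma}}$ will again be obtained. Such iteration continues until some convergence criteria are met. For further details of estimation procedures and properties of the resulting estimates, see Goldstein (reference [7, Chapter 2]).

With a final $\tilde{\boldsymbol{\Sigma}}$, the variance at the first level, the variance at the second level at time $T_{ij} = t$, and the covariance of $\hat{\boldsymbol{\gamma}}$ will be estimated as $\tilde{\sigma}^2$, $(1, t, t^2)\, \tilde{\boldsymbol{\Sigma}}_u\, (1, t, t^2)^{\top}$ and $(\mathbf{X}^{\top} \tilde{\boldsymbol{\Sigma}}^{-1} \mathbf{X})^{-1}$, respectively. The IGLS estimates are shown to be equivalent to maximum likelihood estimates [37] when the error terms are normally distributed, that is, $e_{ij} \sim N(0, \sigma^2)$ and $\mathbf{u}_j \sim MN(\mathbf{0}, \boldsymbol{\Sigma}_u)$, which in turn implies

$$\mathbf{Y} \sim MN(\mathbf{X} \boldsymbol{\gamma}, \boldsymbol{\Sigma}) \qquad (5)$$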
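The generalized least squares step at the core of the iteration can be sketched as below. This is an illustration of a single GLS update for a given covariance estimate, not the full IGLS algorithm and not the routine used by MLwin; the function name `gls` and the toy data are hypothetical.

```python
import numpy as np

def gls(X, y, Sigma):
    """One GLS step: gamma_hat = (X' S^-1 X)^-1 X' S^-1 y and its covariance."""
    Si_X = np.linalg.solve(Sigma, X)          # Sigma^-1 X without forming the inverse
    Si_y = np.linalg.solve(Sigma, y)          # Sigma^-1 y
    cov_gamma = np.linalg.inv(X.T @ Si_X)     # estimated covariance of gamma_hat
    gamma_hat = cov_gamma @ (X.T @ Si_y)
    return gamma_hat, cov_gamma

# Toy data: six observations, intercept and linear time term
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = rng.normal(size=6)

# With Sigma = I the GLS estimate coincides with the OLS estimate
gamma_hat, cov_gamma = gls(X, y, np.eye(6))
print(np.allclose(gamma_hat, np.linalg.lstsq(X, y, rcond=None)[0]))
```

In IGLS this step alternates with an update of the covariance estimate computed from the current residuals, until the estimates stop changing.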
However, even when the data are not multivariate normal, the IGLS still yields consistent estimates [7].

5.3. Goodness of fit

For comparisons of several competing models with different first- and second-level variables $\mathbf{X}$ (especially with different $\mathbf{Z}$), relative goodness of fit can be measured by the resulting log-likelihood. That is, under normality assumption (5), apart from constants,

$$\log L(\mathbf{Y}; \mathbf{X}, \boldsymbol{\gamma}) = -\mathrm{tr}[\boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \mathbf{X} \boldsymbol{\gamma})(\mathbf{Y} - \mathbf{X} \boldsymbol{\gamma})^{\top}] - \log |\boldsymbol{\Sigma}|$$
where tr is the trace, that is, the sum of the diagonal terms of a square matrix, and $|\boldsymbol{\Sigma}|$ is the determinant of the matrix $\boldsymbol{\Sigma}$. Because of the iterative nature of fitting and thus of the calculation of the likelihood, even individuals with only single time point measurements can contribute not only to the estimation of the intercept of the time course of BMI change but also to the estimation of slopes, that is, the longitudinal changes in BMI. Alternatively, for the comparisons, we used the correlation $r$ between predicted values from estimated fixed effects and observations, that is

$$r = \mathrm{corr}(\mathbf{Y}, \mathbf{X} \hat{\boldsymbol{\gamma}})$$

and the mean squared error (MSE), that is

$$\mathrm{MSE}(\mathbf{X} \hat{\boldsymbol{\gamma}}) = (\mathbf{Y} - \mathbf{X} \hat{\boldsymbol{\gamma}})^{\top} (\mathbf{Y} - \mathbf{X} \hat{\boldsymbol{\gamma}}) / (l(\mathbf{Y}) - l(\hat{\boldsymbol{\gamma}}))$$

where $l(\mathbf{Y})$ is the length of vector $\mathbf{Y}$.

Significance testing of joint effects of additional covariates was performed by comparing the so-called 'deviance', that is, the difference in minus twice the log-likelihood, which is asymptotically distributed as $\chi^2$ with the number of additional covariates as the degrees of freedom. To see the relative increment in goodness of fit due to those additional covariates, we calculated the pseudo multiple $R^2$ improvement described by Everitt (reference [38], p. 267), that is, the per cent increment in log-likelihood due to the addition of more independent variables to a model:

$$\% R^2 (\mathbf{X} : \mathbf{W}) \uparrow \; = \left( 1 - \frac{\log L(\mathbf{Y}; \mathbf{X}, \boldsymbol{\gamma})}{\log L(\mathbf{Y}; \mathbf{W}, \boldsymbol{\gamma}_W)} \right) \times 100$$

where $\mathbf{W}$ and $\boldsymbol{\gamma}_W$ are subsets of $\mathbf{X}$ and $\boldsymbol{\gamma}$, respectively. In other words, the model with $\mathbf{W}$ is nested in the model with $\mathbf{X}$.

For comparisons among non-nested models (that is, when $\mathbf{W}$ and $\boldsymbol{\gamma}_W$ are not necessarily subsets of $\mathbf{X}$ and $\boldsymbol{\gamma}$, respectively), both Akaike's information criterion (AIC) [14] and the Bayesian information criterion (BIC) [15] can be used. The AIC is simply the gain in log-likelihood minus the increase in the number of parameters; positive AIC indicates an improvement in fit (reference [39], p. 272). Specifically, AIC can be defined as

$$\mathrm{AIC}(\mathbf{X} : \mathbf{W}) = \log L(\mathbf{Y}; \mathbf{X}, \boldsymbol{\gamma}) - \log L(\mathbf{Y}; \mathbf{W}, \boldsymbol{\gamma}_W) - [l(\boldsymbol{\gamma}) - l(\boldsymbol{\gamma}_W)]$$

Similarly, BIC can be defined as

$$\mathrm{BIC}(\mathbf{X} : \mathbf{W}) = \log L(\mathbf{Y}; \mathbf{X}, \boldsymbol{\gamma}) - \log L(\mathbf{Y}; \mathbf{W}, \boldsymbol{\gamma}_W) - [l(\boldsymbol{\gamma}) - l(\boldsymbol{\gamma}_W)] \log \left( \sum_{j=1}^{N} n_j \right)$$
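These three comparison measures are simple to compute once the log-likelihoods are known. The sketch below applies them to models M1 and M2 using the −2 log L values reported in Table V (260,932 and 260,904), three additional parameters, and the 75351 data units of the pooled data set. It assumes that log L is minus one half of the tabulated −2 log L; the function names are hypothetical.

```python
import math

def aic(loglik_x, loglik_w, extra_params):
    """AIC(X : W): gain in log-likelihood minus the number of added parameters."""
    return loglik_x - loglik_w - extra_params

def bic(loglik_x, loglik_w, extra_params, n_units):
    """BIC(X : W): as AIC, but each added parameter costs log(total data units)."""
    return loglik_x - loglik_w - extra_params * math.log(n_units)

def pct_r2_gain(loglik_x, loglik_w):
    """Pseudo multiple R^2 improvement, in per cent."""
    return (1 - loglik_x / loglik_w) * 100

# Assumption: log L = -(tabulated -2 log L) / 2
loglik_m1 = -260932 / 2
loglik_m2 = -260904 / 2

print(aic(loglik_m2, loglik_m1, 3))          # positive: improvement
print(bic(loglik_m2, loglik_m1, 3, 75351))   # negative: no improvement
print(pct_r2_gain(loglik_m2, loglik_m1))     # about 0.011 per cent
```

Under these assumptions the signs agree with the (+, −) entry and the 0.011% improvement reported for M2 versus M1 in Table V.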
Positive BIC also indicates an improvement in fit.

5.4. Models considered

We considered five models, namely M1 to M5. The associated coefficients and parameters can be found in Table V. These five models have the same form as in (1). However, they
Table V. Results of hierarchical linear model fitting with 75 351 data units from 14 257 subjects; the TMFS subjects are the referent.

Estimated fixed coefficients (SE) for β0j:

| Coefficient: baseline covariate (Z) | Model 1 (M1) | Model 2 (M2) | Model 3 (M3) | Model 4 (M4) | Model 5 (M5) |
|---|---|---|---|---|---|
| γ00: Intercept | 0.265* (0.072) | 0.266* (0.074) | 0.298* (0.077) | 0.268* (0.077) | 0.337* (0.077) |
| γ01: Age | 0.002* (6.8 × 10^-4) | 1.4 × 10^-3* (7.0 × 10^-4) | 2.4 × 10^-4 (7.2 × 10^-4) | 6.5 × 10^-4 (7.2 × 10^-4) | 4.3 × 10^-4 (7.2 × 10^-4) |
| γ02: Sex† | −0.040* (0.016) | −0.036* (0.017) | 0.051* (0.019) | 0.071* (0.020) | 0.059* (0.020) |
| γ03: BMI | 0.985* (0.002) | 0.986* (0.002) | 0.987* (0.002) | 0.987* (0.002) | 0.987* (0.002) |
| γ04: FHS‡ | - | - | −0.070* (0.025) | −0.038 (0.026) | −0.101* (0.028) |
| γ05: MRFIT‡ | - | - | −0.143* (0.024) | −0.163* (0.026) | −0.090* (0.028) |
| γ06: NHANES‡ | - | - | 0.075* (0.026) | 0.051 (0.027) | 0.002* (0.028) |

Estimated fixed coefficients (SE) for β1j:

| Coefficient: baseline covariate (Z) | Model 1 (M1) | Model 2 (M2) | Model 3 (M3) | Model 4 (M4) | Model 5 (M5) |
|---|---|---|---|---|---|
| γ10: Intercept | 0.753* (0.023) | 0.759* (0.041) | 0.751* (0.041) | 0.786* (0.042) | 0.657* (0.045) |
| γ11: Age | −6.7 × 10^-3* (2.2 × 10^-4) | −5.1 × 10^-3* (4.0 × 10^-4) | −4.9 × 10^-3* (4.0 × 10^-4) | −5.3 × 10^-3* (4.1 × 10^-4) | −5.1 × 10^-3* (4.1 × 10^-4) |
| γ12: Sex | −0.019* (4.7 × 10^-4) | −0.026* (0.009) | −0.023* (0.009) | −0.051* (0.010) | −0.031* (0.010) |
| γ13: BMI | −0.013* (6.7 × 10^-4) | −0.016* (1.2 × 10^-4) | −0.016* (1.2 × 10^-3) | −0.016* (1.2 × 10^-3) | −0.016* (1.2 × 10^-3) |
| γ14: FHS | - | - | - | −0.051* (0.010) | 0.071* (0.021) |
| γ15: MRFIT | - | - | - | 0.037* (0.010) | −0.016 (0.023) |
| γ16: NHANES | - | - | - | 0.010 (9.9 × 10^-3) | 0.189* (0.041) |

Estimated fixed coefficients (SE) for β2j:

| Coefficient: baseline covariate (Z) | Model 1 (M1) | Model 2 (M2) | Model 3 (M3) | Model 4 (M4) | Model 5 (M5) |
|---|---|---|---|---|---|
| γ20: Intercept | −3.8 × 10^-3* (2.1 × 10^-4) | −3.7 × 10^-3 (2.1 × 10^-3) | −3.1 × 10^-3 (2.1 × 10^-3) | −2.6 × 10^-3 (2.1 × 10^-3) | 0.011* (2.9 × 10^-3) |
| γ21: Age | - | −1.1 × 10^-4* (2.0 × 10^-5) | −1.2 × 10^-4* (2.0 × 10^-5) | −1.1 × 10^-4* (2.0 × 10^-5) | −1.2 × 10^-4* (2.0 × 10^-5) |
| γ22: Sex | - | 3.3 × 10^-4 (4.2 × 10^-4) | 7.0 × 10^-5 (4.2 × 10^-4) | 0.001* (4.6 × 10^-4) | −7.0 × 10^-5 (4.6 × 10^-4) |
| γ23: BMI | - | 1.6 × 10^-4* (6.0 × 10^-5) | 1.6 × 10^-4* (6.0 × 10^-5) | 1.8 × 10^-4* (6.0 × 10^-5) | 1.5 × 10^-4* (6.0 × 10^-5) |
| γ24: FHS | - | - | - | - | −0.012* (0.002) |
| γ25: MRFIT | - | - | - | - | 9.5 × 10^-3* (2.5 × 10^-3) |
| γ26: NHANES | - | - | - | - | −0.020* (0.004) |

Estimated random effects (SE):

| Parameter | Model 1 (M1) | Model 2 (M2) | Model 3 (M3) | Model 4 (M4) | Model 5 (M5) |
|---|---|---|---|---|---|
| σ̃² | 1.163* (6.7 × 10^-3) | 1.163* (6.7 × 10^-3) | 1.161* (6.7 × 10^-3) | 1.159* (6.7 × 10^-3) | 1.155* (6.6 × 10^-3) |
| Σ̃u: var(u0j) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) |
| Σ̃u: var(u1j) | 0.150* (2.3 × 10^-3) | 0.149* (2.3 × 10^-3) | 0.149* (2.3 × 10^-3) | 0.150* (2.3 × 10^-3) | 0.149* (2.3 × 10^-3) |
| Σ̃u: var(u2j) | 2.0 × 10^-4* (0.000) | 1.9 × 10^-4* (0.000) | 1.9 × 10^-4* (0.000) | 2.0 × 10^-4* (0.000) | 2.0 × 10^-4* (0.000) |
| Σ̃u: cov(u0j, u1j) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) |
| Σ̃u: cov(u0j, u2j) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) |
| Σ̃u: cov(u1j, u2j) | −0.005* (1.0 × 10^-4) | −0.005* (1.0 × 10^-4) | −0.005* (1.0 × 10^-4) | −0.005* (1.0 × 10^-4) | −0.005* (1.0 × 10^-4) |

Goodness of fit:

| Measure | Model 1 (M1) | Model 2 (M2) | Model 3 (M3) | Model 4 (M4) | Model 5 (M5) |
|---|---|---|---|---|---|
| −2 log L | 260,932 | 260,904 | 260,780 | 260,668 | 260,426 |
| p-value versus model 1 | - | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| p-value versus model 2 | | - | <0.0001 | <0.0001 | <0.0001 |
| p-value versus model 3 | | | - | <0.0001 | <0.0001 |
| p-value versus model 4 | | | | - | <0.0001 |
| r | 0.824 | 0.824 | 0.824 | 0.824 | 0.825 |
| MSE | 4.27 | 4.26 | 4.26 | 4.26 | 4.25 |
| %R²↑ (AIC, BIC)§ versus model 1 | - | 0.011% (+,−) | 0.058% (+,+) | 0.101% (+,+) | 0.194% (+,+) |
| %R²↑ (AIC, BIC)§ versus model 2 | | - | 0.048% (+,+) | 0.090% (+,+) | 0.183% (+,+) |
| %R²↑ (AIC, BIC)§ versus model 3 | | | - | 0.043% (+,+) | 0.136% (+,+) |
| %R²↑ (AIC, BIC)§ versus model 4 | | | | - | 0.093% (+,+) |

* p-value < 0.05.
† Sex = 1 for male, 0 for female.
‡ FHS = 1 if subjects belong to this study, 0 otherwise. MRFIT and NHANES are similarly defined.
§ Signs of AIC and BIC: the positive sign indicates improvements and the negative sign no improvements.
Figure 3. Diagram of model M2. Thick objects represent the first-level relationships. Thin objects represent the second-level relationships. Arrows with an open head are associated with fixed parameters. Arrows with a closed head are associated with random parameters. Solid lines are associated with coefficients. Dotted lines are associated with errors. The circle with shadow represents the dependent variable.
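differ in modelling random coefficients; see (2). For example, the first two models M1 and M2 are

M1:
$$\beta_{0j} = \gamma_{00} + \gamma_{01} \mathrm{Age}_j + \gamma_{02} \mathrm{Sex}_j + \gamma_{03} \mathrm{BMI}_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} \mathrm{Age}_j + \gamma_{12} \mathrm{Sex}_j + \gamma_{13} \mathrm{BMI}_j + u_{1j}$$
$$\beta_{2j} = \gamma_{20} + u_{2j}$$

M2:
$$\beta_{0j} = \gamma_{00} + \gamma_{01} \mathrm{Age}_j + \gamma_{02} \mathrm{Sex}_j + \gamma_{03} \mathrm{BMI}_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} \mathrm{Age}_j + \gamma_{12} \mathrm{Sex}_j + \gamma_{13} \mathrm{BMI}_j + u_{1j}$$
$$\beta_{2j} = \gamma_{20} + \gamma_{21} \mathrm{Age}_j + \gamma_{22} \mathrm{Sex}_j + \gamma_{23} \mathrm{BMI}_j + u_{2j}$$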
where Sex = 1 for males and 0 for females and BMI represents the baseline BMI. These first two models differ only in modelling the random coefficient for the quadratic term for time and do not control for the contributions of different studies, so that predictions of BMI could be based on these models. Figure 3 shows a diagram depicting a path of effects of variables in model M2. On the other hand, the other three models M3 to M5 are concerned with possible heterogeneity of growth curves over different studies, using the study indicators as second-level covariates and taking the TMFS study subjects as the referent. They can be similarly written based on the coefficients specified in Table V.

Specifically, in both models M1 and M2, the second-level models for the intercept, the linear time term, and, for model M2, the quadratic time term, included age, sex and initial BMI. In models M3, M4 and M5, the first-level model allowed BMI to be a quadratic function of time. In model M3, the second-level model allowed the intercepts to be a function of study site in addition to age, sex and initial BMI. Model M4 allowed the intercept and linear time term to be a function of study site in addition to age, sex and initial BMI. Model M5 allowed the intercept and the linear and quadratic time terms to be a function of study site in addition to age, sex and initial BMI. In short, models M3–M5 can be used for testing the significance of effect modification (or, equivalently, heterogeneity) of the BMI growth trajectories by studies. That is, these models test whether the trajectories are the same across studies.
6. RESULTS

6.1. Main results

Table V shows results from fitting models M1 to M5 for the BMI growth curve over time from all 14375 available second-level subjects with the 75351 first-level data units. M2 fits significantly better ($p < 0.0001$) than M1 with positive AIC (negative BIC though), although the difference is trivial in terms of the estimated coefficients, $\% R^2 \uparrow$ and the correlation $r$ between predicted and observed values. This implies that the joint effect of age ($\gamma_{21}$), sex ($\gamma_{22}$) and initial baseline BMI ($\gamma_{23}$) on the quadratic term $\beta_2$ is slight in magnitude but significant (the small effect size, however, may not be biologically important). The predicted BMI ($Y$) at time $t$ (that is, $t$ years from baseline) based on model M2 can be written

$$\begin{aligned} Y(t) = {} & 0.266 + 0.0014\,\mathrm{Age} - 0.036\,\mathrm{Sex} + 0.985\,\mathrm{BMI} \\ & + (0.759 - 0.0051\,\mathrm{Age} - 0.026\,\mathrm{Sex} - 0.016\,\mathrm{BMI})\,t \\ & + (-0.0037 - 0.00011\,\mathrm{Age} + 0.00033\,\mathrm{Sex} + 0.00016\,\mathrm{BMI})\,t^2 \end{aligned} \qquad (6)$$
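Equation (6) translates directly into a small prediction function. The sketch below uses the coefficients exactly as printed in (6); because these are rounded, its output may differ slightly in the last digit from values computed with the unrounded estimates. The function name `predict_bmi` is hypothetical.

```python
def predict_bmi(t, age, sex, bmi0):
    """Predicted BMI t years after baseline from equation (6), model M2.

    age  : baseline age in years
    sex  : 1 for male, 0 for female
    bmi0 : baseline BMI in kg/m^2
    """
    intercept = 0.266 + 0.0014 * age - 0.036 * sex + 0.985 * bmi0
    slope = 0.759 - 0.0051 * age - 0.026 * sex - 0.016 * bmi0
    quadratic = -0.0037 - 0.00011 * age + 0.00033 * sex + 0.00016 * bmi0
    return intercept + slope * t + quadratic * t ** 2

# 25-year-old male with baseline BMI 25, ten years after baseline
print(round(predict_bmi(10, age=25, sex=1, bmi0=25), 2))
```

Evaluating the function on a grid of times gives the whole expected trajectory for a given baseline profile.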
For this model, all the estimated coefficients are significant at the 0.05 alpha level with the exception of the estimated intercept −0.0037 for the quadratic time term (Table V). These results indicate that the BMI growth trajectory is dependent upon age, sex and initial baseline BMI (see Figure 4 and Table VI). For example, a 25-year-old male with current BMI 25 is expected to have BMI 26.9 in 10 years while a 45-year-old female with current BMI 25 is expected to have BMI 25.9 in 10 years. For the graphical representation, we chose 27.5 and 32.5 as initial BMI values, representing midpoints of overweight and class I obesity statuses
Figure 4. Growth curves for different baseline ages estimated from model M2 in Table V; (a) for men with baseline BMI 27.5; (b) for men with baseline BMI 32.5; (c) for women with baseline BMI 27.5; (d) for women with baseline BMI 32.5. [Each panel plots BMI against time (T) from 0 to 20 years, with separate curves for age = 30, age = 45 and age = 60.]
[21]. BMI tends to increase over time for younger people with relatively moderate obesity but to decrease for older people regardless of the degree of obesity. The gradients of such changes are inversely related to the baseline BMI, but do not substantially depend on gender, although women tend to gain more and lose less weight than do men. Heterogeneity of such patterns over different studies was tested in terms of interactions of study effect with the intercept [40], the slope and the quadratic terms: $\gamma_{k4}$, $\gamma_{k5}$ and $\gamma_{k6}$, for $k = 0, 1, 2$. The joint effects of these interactions were significant (Table V), implying statistically significant heterogeneity among studies. However, the correlations, r, from models M3 to M5 were the same as those from models M1 and M2. Moreover, the $R^2$ improvements are minimal even relative to model M1, with a maximum improvement of 0.194 per cent (Table V), suggesting that any potential effect modification by individual studies is not substantial. For example, based on M4 (Table V), if the 25-year-old male with current BMI 25 had come from FHS, MRFIT, NHANES-I or TMFS, we would expect him to have BMI 26.4, 27.2, 27.1 or 27, respectively, in 10 years. This small effect modification due to different studies may not be biologically important. Through models M1 to M5 in Table V, the estimated random effects and their standard errors were almost invariant, yielding a high negative correlation between the linear slope and the quadratic terms, that is,

$$\mathrm{corr}(\beta_{1j}, \beta_{2j}) = \frac{\mathrm{cov}(u_{1j}, u_{2j})}{\sqrt{\mathrm{var}(u_{1j})\,\mathrm{var}(u_{2j})}} \approx \frac{-0.005}{\sqrt{0.15 \times 0.0002}} = -0.913$$

for all the models. Figure 5 shows a pictorial description of the relationship between the two terms $u_{1j}$ and $u_{2j}$, estimated by the empirical Bayesian method (for example, reference [41]) based on model M2. This indicates that the strong compensation between these two terms prevents a sudden surge or sudden drop in the prediction of future BMI. The precision of prediction was measured by a total estimated variance, that is

$$\tilde{\sigma}^2 + (1, t, t^2)\,\tilde{\Sigma}_u\,(1, t, t^2)^{\mathrm{T}} \qquad (7)$$

which depends on the time; its square root is the prediction error. Figure 6 shows that the magnitude of the total estimated variance from model M2 increases over time.

Table VI. Prediction of BMI over time based on model M2 by gender, baseline BMI and age.

Sex      Baseline BMI   Age    Time
                               5      10     15     20
Men      25             25     25.9   26.9   27.7   28.4
                        35     25.7   26.2   26.7   26.9
                        45     25.4   25.6   25.7   25.5
                        55     25.1   25.0   24.7   24.0
         30             25     30.5   31.1   31.6   32.1
                        35     30.2   30.5   30.6   30.6
                        45     30.0   29.9   29.6   29.2
                        55     29.7   29.3   28.6   27.7
         35             25     35.1   35.3   35.6   35.8
                        35     34.8   34.7   34.6   34.3
                        45     34.5   34.1   33.5   32.9
                        55     34.2   33.5   32.5   31.4
Women    25             25     26.1   27.1   28.0   28.8
                        35     25.8   26.5   27.0   27.4
                        45     25.6   25.9   26.0   25.9
                        55     25.3   25.3   25.0   24.5
         30             25     30.7   31.4   32.0   32.5
                        35     30.4   30.7   31.0   31.1
                        45     30.1   30.1   30.0   29.6
                        55     29.8   29.5   29.0   28.1
         35             25     35.2   35.6   35.9   36.2
                        35     34.9   35.0   34.9   34.7
                        45     34.7   34.4   33.9   33.3
                        55     34.4   33.8   32.9   31.8
Prediction error*              1.95   2.71   3.29   3.63

* Square root of the total variance $\tilde{\sigma}^2 + (1, t, t^2)\,\tilde{\Sigma}_u\,(1, t, t^2)^{\mathrm{T}}$.
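The slope-quadratic correlation and the total variance of equation (7) can be sketched numerically as follows. This is a minimal illustration, not the authors' computation: the three values 0.15, 0.0002 and -0.005 are the rounded figures quoted in the text, whereas the residual variance `SIGMA2` and the zero intercept variance in `SIGMA_U` are placeholder assumptions chosen only to make the sketch runnable, so the resulting prediction errors need not reproduce Table VI.

```python
import math


def correlation(cov_ab, var_a, var_b):
    """Correlation implied by a covariance and two variances."""
    return cov_ab / math.sqrt(var_a * var_b)


def total_variance(t, sigma2, sigma_u):
    """Equation (7): sigma2 + z * Sigma_u * z^T, with z = (1, t, t^2)."""
    z = (1.0, float(t), float(t) ** 2)
    quad_form = sum(z[i] * sigma_u[i][j] * z[j]
                    for i in range(3) for j in range(3))
    return sigma2 + quad_form


def prediction_error(t, sigma2, sigma_u):
    """Square root of the total variance at time t."""
    return math.sqrt(total_variance(t, sigma2, sigma_u))


# Rounded values quoted in the text for the slope and quadratic terms.
VAR_U1, VAR_U2, COV_U1_U2 = 0.15, 0.0002, -0.005
slope_quad_corr = correlation(COV_U1_U2, VAR_U1, VAR_U2)  # rounds to -0.913

# ASSUMPTION: placeholder residual variance and zero intercept variance;
# these are illustrative, not the fitted values of model M2.
SIGMA2 = 1.2
SIGMA_U = [
    [0.0, 0.0, 0.0],
    [0.0, VAR_U1, COV_U1_U2],
    [0.0, COV_U1_U2, VAR_U2],
]

errors = {t: prediction_error(t, SIGMA2, SIGMA_U) for t in (5, 10, 15, 20)}

print("corr(slope, quadratic) =", round(slope_quad_corr, 3))
for t, err in errors.items():
    print("t =", t, "prediction error =", round(err, 2))
```

Under these assumed inputs the prediction error grows with t, in line with the pattern described for Figure 6, because the positive slope and quadratic variance terms outweigh the negative covariance term.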
[Figure 5: scatter plot; horizontal axis: slope residuals (-2 to 2); vertical axis: quadratic term residuals (-0.05 to 0.05).]
Figure 5. Scatter plot of estimated $u_{1j}$ and $u_{2j}$ by empirical Bayesian method based on model M2 in Table V.
[Figure 6: line plot; horizontal axis: Time (T), 0 to 20; vertical axis: Total variance, 0 to 14.]
Figure 6. Estimated total variance of prediction over time based on model M2.
6.2. Analysis with smoking data

Using the 10 793 subjects for whom smoking status was available (Table IV), we tested the significance of smoking effects on the growth curve. The results are presented in Table VII. Models SMK2 and SMK4 indicate that smoking status has a significant effect on the intercept $\beta_{0j}$, which implies that ‘ever’ smokers tend to be heavier at baseline. However, the effects of smoking status on the slope $\beta_{1j}$ and the quadratic term $\beta_{2j}$ in these two models were not significant when tested marginally (see Table VII) or jointly (p = 0.607 in SMK2 and p = 0.333 in SMK4). Thus, smoking status did not significantly affect the estimated growth curve trajectories.

The models SMK1 and SMK3 served as a kind of sensitivity analysis with respect to models M1 and M2, respectively, because the former were fitted, with the same sets of variables, to a subset of the subjects to which the latter were fitted. There is a high degree of similarity in the coefficients between SMK1 and M1 and between SMK3 and M2 (see Tables V and VII). The signs, or directions, of the estimated coefficients are the same. More importantly, for this subset of subjects with non-missing smoking status, the correlations r between observed values and predicted values from M1 and M2 were both 0.815. This shows that models M1 and M2 fit this subset about as well as SMK1 and SMK2, respectively, both of which produced a correlation of 0.816 (Table VII). Therefore, models M1 and M2 seem well supported for this subset. This is particularly so because SMK1 and SMK2 fit the subset better than M1 and M2, respectively, as they should, but the difference in fit is negligible.
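The model-comparison p-values in the goodness-of-fit rows of Table VII appear to be likelihood-ratio tests on differences in -2 log L. The sketch below checks this for comparisons that differ by three parameters. It rests on two assumptions: that the -2 log L values belong to SMK1 to SMK4 in the order given in the dictionary, and that each comparison has 3 degrees of freedom (for example, SMK2 adds one smoking coefficient to each of the three level-2 equations of SMK1). The function names are illustrative.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def lr_test_df3(neg2loglik_reduced, neg2loglik_full):
    """Likelihood-ratio statistic and p-value for two nested models
    that differ by three parameters."""
    stat = neg2loglik_reduced - neg2loglik_full
    return stat, chi2_sf_df3(stat)


# ASSUMPTION: assignment of the reported -2 log L values to the four models.
NEG2LOGLIK = {"SMK1": 238723, "SMK2": 238715, "SMK3": 238690, "SMK4": 238680}

stat_2v1, p_2v1 = lr_test_df3(NEG2LOGLIK["SMK1"], NEG2LOGLIK["SMK2"])
stat_3v1, p_3v1 = lr_test_df3(NEG2LOGLIK["SMK1"], NEG2LOGLIK["SMK3"])
stat_4v3, p_4v3 = lr_test_df3(NEG2LOGLIK["SMK3"], NEG2LOGLIK["SMK4"])

print("SMK2 vs SMK1:", stat_2v1, round(p_2v1, 3))
print("SMK3 vs SMK1:", stat_3v1, p_3v1)
print("SMK4 vs SMK3:", stat_4v3, round(p_4v3, 3))
```

Under these assumptions the three comparisons give p-values that round to 0.046, below 0.0001 and 0.019, which agree with the entries reported in Table VII.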
7. DISCUSSION

This paper uses HLM to estimate the expected changes in BMI over time in overweight persons. As mentioned before, this approach models different patterns, or shapes, of growth trajectories among individuals in terms of their characteristics, which results in modelling both within- and between-individual variations. In this sense, estimation through HLM has an advantage over the OLS approach, because the latter does not model the between-individual variations of growth trajectories. The OLS estimation approach, therefore, may yield inappropriately small standard errors of the estimates of coefficients $\gamma_{00}$, $\gamma_{10}$ and $\gamma_{20}$, especially when the number of second-level units is large [7]. Indeed, (inappropriately) smaller standard errors of the estimates of coefficients $\gamma_{10}$ and $\gamma_{20}$ resulted from OLS approaches to models M1 to M5 and SMK1 to SMK4 in the present study (data not shown); this illustrates the importance of using HLM when ‘between-individual variations’ are present. In addition, using OLS to estimate the coefficients from the pooled data would be inappropriate, as OLS assumes that the errors are independent with mean zero and common variance, whereas the errors from the model derived from the level-1 and level-2 analyses are not independent and the variance may not necessarily be homogeneous. Furthermore, the deviance, that is, the difference in minus twice the log-likelihood, between the OLS and HLM fits was no less than 56000 (that is, no less than a 20 per cent improvement of log-likelihood) over all models M1 to M5 and SMK1 to SMK4, strongly supporting the randomness of the coefficients $\beta_{0j}$, $\beta_{1j}$ and $\beta_{2j}$. Further advantages of HLM over OLS, when the former is appropriate, are discussed in Pollack [42], and general aspects of the application of HLM are extensively discussed in Kreft [43]. Goldstein (reference [7, Chapter 2]) briefly
Table VII. Results of hierarchical linear model fitting with 68 830 data units from 10 793 subjects with smoking status.

Estimated fixed coefficients (SE)

Baseline covariate (Z)     Model 1 (SMK1)                 Model 2 (SMK2)                 Model 3 (SMK3)                 Model 4 (SMK4)

for $\beta_{0j}$
$\gamma_{00}$: Intercept   0.359* (0.084)                 0.312* (0.086)                 0.417* (0.088)                 0.379* (0.090)
$\gamma_{01}$: Age         2.1 × 10^−3* (8.9 × 10^−4)     2.6 × 10^−3* (9.0 × 10^−4)     7.1 × 10^−4 (9.3 × 10^−3)      9.7 × 10^−4 (9.4 × 10^−4)
$\gamma_{02}$: Sex†        −7.7 × 10^−3 (0.019)           −0.019 (0.020)                 −6.4 × 10^−3 (0.020)           −0.016 (0.020)
$\gamma_{03}$: BMI         0.980* (2.5 × 10^−3)           0.981* (2.5 × 10^−3)           0.980* (2.6 × 10^−3)           0.981* (2.6 × 10^−3)
$\gamma_{04}$: Smoking‡    -                              0.046* (0.017)                 -                              0.041* (0.017)

for $\beta_{1j}$
$\gamma_{10}$: Intercept   0.635* (0.026)                 0.640* (0.027)                 0.520* (0.048)                 0.516* (0.049)
$\gamma_{11}$: Age         −6.3 × 10^−3* (2.8 × 10^−4)    −6.3 × 10^−3* (2.8 × 10^−4)    −3.9 × 10^−3* (5.0 × 10^−4)    −3.9 × 10^−3* (5.1 × 10^−4)
$\gamma_{12}$: Sex         −0.011* (5.3 × 10^−3)          −0.010 (5.6 × 10^−3)           −0.011 (0.010)                 −0.012 (0.011)
$\gamma_{13}$: BMI         −0.010* (7.6 × 10^−4)          −0.010* (7.6 × 10^−4)          −0.010* (1.4 × 10^−3)          −0.010* (1.4 × 10^−3)
$\gamma_{14}$: Smoking     -                              −3.9 × 10^−3 (9.2 × 10^−3)     -                              −4.1 × 10^−3 (9.5 × 10^−3)

for $\beta_{2j}$
$\gamma_{20}$: Intercept   −3.6 × 10^−3* (2.2 × 10^−4)    −3.6 × 10^−3* (3.1 × 10^−4)    3.4 × 10^−3 (2.2 × 10^−3)      3.8 × 10^−3 (2.3 × 10^−3)
$\gamma_{21}$: Age         -                              -                              −1.4 × 10^−4* (2.0 × 10^−5)    −1.5 × 10^−4* (2.0 × 10^−5)
$\gamma_{22}$: Sex         -                              -                              −1.8 × 10^−4 (4.5 × 10^−4)     2.0 × 10^−4* (4.8 × 10^−4)
$\gamma_{23}$: BMI         -                              -                              −2.0 × 10^−5 (7.0 × 10^−5)     −2.0 × 10^−5 (7.0 × 10^−5)
$\gamma_{24}$: Smoking     -                              −4.0 × 10^−5* (4.2 × 10^−4)    -                              −5.0 × 10^−4 (4.5 × 10^−4)

Estimated random effects (SE)

                           Model 1 (SMK1)                 Model 2 (SMK2)                 Model 3 (SMK3)                 Model 4 (SMK4)
$\tilde{\sigma}^2$         1.237* (7.3 × 10^−3)           1.237* (7.3 × 10^−3)           1.237* (7.3 × 10^−3)           1.237* (7.3 × 10^−3)
$\tilde{\Sigma}_u$:
  var(u0j)                 0.000 (0.000)                  0.000 (0.000)                  0.000 (0.000)                  0.000 (0.000)
  var(u1j)                 1.139* (2.4 × 10^−4)           1.139* (2.4 × 10^−4)           1.139* (2.4 × 10^−4)           1.139* (2.4 × 10^−4)
  var(u2j)                 1.8 × 10^−4* (0.000)           1.8 × 10^−4* (0.000)           1.8 × 10^−4* (0.000)           1.8 × 10^−4* (0.000)
  cov(u0j, u1j)            0.000 (0.000)                  0.000 (0.000)                  0.000 (0.000)                  0.000 (0.000)
  cov(u0j, u2j)            0.000 (0.000)                  0.000 (0.000)                  0.000 (0.000)                  0.000 (0.000)
  cov(u1j, u2j)            −4.7 × 10^−3* (1.0 × 10^−4)    −4.7 × 10^−3* (1.0 × 10^−4)    −4.7 × 10^−3* (1.0 × 10^−4)    −4.7 × 10^−3* (1.0 × 10^−4)

Goodness of fit

                           Model 1 (SMK1)                 Model 2 (SMK2)                 Model 3 (SMK3)                 Model 4 (SMK4)
−2 log L                   238,723                        238,715                        238,690                        238,680
p-value
  versus model 1           -                              0.046                          <0.0001                        <0.0001
  versus model 2           -                              -                              -                              <0.0001
  versus model 3           -                              -                              -                              0.019
r                          0.816                          0.816                          0.816                          0.815
MSE                        4.26                           4.26                           4.26                           4.28
%R² ↑ (AIC, BIC)§
  versus model 1           -                              0.003% (+, −)                  0.014% (+, −)                  0.018% (+, −)
  versus model 2           -                              -                              0.010% (+, +)                  0.015% (+, −)
  versus model 3           -                              -                              -                              0.004% (+, −)

* p-value < 0.05.
† Sex = 1 for male, 0 for female.
‡ Smoking = 1 for ‘ever’ smoker, 0 for ‘never’ smoker.
§ Signs of AIC and BIC: the positive sign indicates improvements and the negative sign no improvements.
and concisely discussed other estimation procedures applicable to multilevel-structured data, such as generalized estimating equations (GEE) [44] and Bayesian approaches such as Markov chain Monte Carlo (MCMC) [45].

The methods used to develop the growth curves herein have several strengths that should make the results of practical utility. They were derived from four large-sample longitudinal studies. Such pooling of large databases increased the precision of parameter estimates. These studies employed measured rather than self-reported height and weight, thus eliminating possible reporting biases. An advantage of the pooling in this particular analysis is that it allowed us to test the among-study variation in within-subject trajectories after accounting for covariates. Although there was some heterogeneity of effect among the studies, the effect was minor and did not meaningfully affect the estimated growth curves (Table V). This implies that the prediction performance of models M1 and M2, without knowledge of the source of particular observations, is almost the same as that of models M3 to M5 with such knowledge. This is practically important because it suggests a good degree of generalizability of our findings.

Nevertheless, the recent secular trend for increasing obesity [22] may introduce a confounding effect on the estimated BMI growth trajectories. That is, the estimated trajectories might reflect the additive combination of a natural trajectory plus secular trends. On the other hand, there were arguably minimal secular trends for the period during which these data were collected and, as previously mentioned, the HLM results did not substantially differ by study. For these reasons, confounding due to possible secular trends is unlikely to be substantial in the present analysis.
A desirable feature of the present analytic approach is that the decline of BMI in the older subjects (baseline age = 60, Figure 4) cannot be attributed to a simple ‘survivor effect’. A survivor effect could occur if older leaner people tended to live longer than older heavier people. If individuals’ weights remained relatively constant, this might eventually lead to lower mean BMIs among a cohort of older people as they aged. If the analysis had been carried out in a cross-sectional manner, which is essentially the same as the ‘curve of averages’ (see reference [3]), such reasoning might be valid. However, because the HLM approach adopted herein is essentially the ‘average of growth curves’ of individuals over years, it bypasses the potential survivor effect. Figure 7 illustrates this point with three hypothetical subjects. The growth curve with the dashed line, based on a cross-sectional analysis, shows a decline of growth because subjects 2 and 3 died before the ages of 70 and 60 years, respectively, which results in a survivor effect on growth. On the other hand, the growth curve with the solid line does not show such a decline, because it is based on the average of individual growth curves. Thus, our results cannot be due to artefacts from survivor effects.

Hierarchical linear modelling is especially well-suited to meta-analysis (see references [46, 47] for tutorials). In typical practice, study effects have either been ignored (a practice that can yield severely biased estimates and standard errors [48]) or considered to be fixed. However, study effects should be treated as random whenever generalization of meta-analytic results is of concern, given the potential for methodological variability among studies (for example, sample size, demographic variables, predictor variables, experimental designs and measurement methods). The application and extension of HLM to meta-analysis is straightforward and can be performed by any of the software introduced in Section 2.2.
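The survivor-effect argument illustrated by Figure 7 can be sketched with a toy calculation. The three subjects, their BMI values and their observation ages below are hypothetical, and the ‘average of growth curves’ is approximated by averaging per-subject least-squares lines, which is a simplification of HLM rather than the estimation procedure used in this paper.

```python
def fit_line(ages, bmis):
    """Ordinary least-squares intercept and slope for one subject."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_bmi = sum(bmis) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (b - mean_bmi) for a, b in zip(ages, bmis))
    slope = sxy / sxx
    return mean_bmi - slope * mean_age, slope


# HYPOTHETICAL data: each subject keeps a constant BMI; the heavier
# subjects stop being observed earlier (they die before age 70 or 60).
subjects = {
    "subject 1": {40: 26.0, 50: 26.0, 60: 26.0, 70: 26.0},
    "subject 2": {40: 30.0, 50: 30.0, 60: 30.0},
    "subject 3": {40: 34.0, 50: 34.0},
}
ages = (40, 50, 60, 70)

# Cross-sectional 'curve of averages': mean BMI of those still observed.
curve_of_averages = {}
for age in ages:
    observed = [obs[age] for obs in subjects.values() if age in obs]
    curve_of_averages[age] = sum(observed) / len(observed)

# Longitudinal 'average of growth curves': average the individual lines.
fits = [fit_line(list(obs), list(obs.values())) for obs in subjects.values()]
mean_intercept = sum(f[0] for f in fits) / len(fits)
mean_slope = sum(f[1] for f in fits) / len(fits)
average_curve = {age: mean_intercept + mean_slope * age for age in ages}

print("curve of averages:", curve_of_averages)
print("average curve:    ", average_curve)
```

With these made-up inputs the cross-sectional means fall from 30 to 26 as the heavier subjects drop out, while the averaged individual curve stays flat at 30, which is the contrast between the dashed and solid lines described for Figure 7.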
The main findings from this study can be summarized as follows: BMI tends to increase with time for younger people with relatively moderate obesity, while it tends to decrease for older people regardless of the degree of obesity. These results are consistent with previous results [49].
[Figure 7: BMI plotted against time/age (40 to 70) for three hypothetical subjects (subject 1, subject 2, subject 3), with two summary curves labelled ‘Average Curve’ and ‘Curve of Averages’.]
Figure 7. Dashed line with ‘♦’ markers represents the growth curve based on cross-sectional analysis. Solid line with ‘×’ markers represents the growth curve based on longitudinal analysis with HLM.
This implies that it might be beneficial to try to prevent obesity in younger adults, because younger overweight/obese persons have a tendency to gain weight over time regardless of their ‘current’ BMI, as shown in Figure 4. The rates of these changes are inversely related to the baseline BMI and the patterns do not substantially depend on gender. In addition, smoking status does not have a significant effect on the patterns (Table VII).

A practical implication of the growth curves developed herein is that they provide control data against which the long-term efficacy of obesity interventions could be compared. For example, several studies have conducted follow-up assessments as long as 5 or 10 years after treatment [50, 51]. Although these studies successfully followed patients’ weight changes over time, there was no control group against which their progress could be compared. The growth curves generated herein fill this void. These growth curves could also be used to gauge an individual patient’s progress. For example, suppose that a 25-year-old male subject with a BMI of 30 (for example, 203 pounds at a height of 5′9″) lost 21 pounds due to an intervention and maintained that loss for five years. This corresponds to a decrease in his BMI to 27. According to prediction equation (6), without intervention, the subject would have been expected to gain 0.8 BMI units (5.6 pounds) in five years, with a prediction error of ±1.95 BMI units calculated from equation (7). Therefore, his weight management could be considered quite successful, because his loss of 3 BMI units in five years is approximately 2 SD below the predicted gain of 0.8 BMI units.

In sum, this paper demonstrates the utility of HLM for developing longitudinal growth curves. This was accomplished by applying random coefficients regression to the estimation of natural changes in relative body weight over the lifespan, which provides prediction equations for long-term changes in BMI that can be used as pseudo-controls. It is hoped that this will be useful to other investigators.
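The arithmetic behind the worked example of the 25-year-old man can be sketched as follows. The function names are illustrative, the unit conversions are the standard pound-to-kilogram and inch-to-metre definitions, and the predicted gain (0.8) and prediction error (1.95) are simply the values quoted in the text rather than being recomputed from equations (6) and (7).

```python
LB_TO_KG = 0.45359237  # kilograms per pound
IN_TO_M = 0.0254       # metres per inch


def bmi_from_imperial(weight_lb, height_in):
    """BMI in kg/m^2 from weight in pounds and height in inches."""
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2


def standardized_change(observed_change, predicted_change, prediction_error):
    """Observed minus predicted BMI change, in units of the prediction error."""
    return (observed_change - predicted_change) / prediction_error


height_in = 5 * 12 + 9                             # 5 ft 9 in
bmi_before = bmi_from_imperial(203, height_in)     # close to 30
bmi_after = bmi_from_imperial(203 - 21, height_in)  # close to 27

# Values quoted in the text: a loss of 3 BMI units against a predicted
# gain of 0.8 BMI units, with a five-year prediction error of 1.95.
z = standardized_change(-3.0, 0.8, 1.95)

print(round(bmi_before, 1), round(bmi_after, 1), round(z, 2))
```

The standardized change comes out at roughly -1.95, which is the ‘approximately 2 SD below’ the predicted gain stated in the example.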
APPENDIX: COMPARISON AMONG SOFTWARE CODES AND THEIR OUTPUTS

A.1. MLA code and output

MLA code

/TITLE
Mla Growth Curve Example
/DATA
file = example.dat
vars = 6    % Six Data Fields in the File
id2 = 1     % The level-2 Variable is the First Field
/MODEL
% We Model the First-Level and Second-Level Effects
% g’s are the gamma coefficients for the Level-2 equations
% u’s are the Level-2 residuals
% e’s are the Level-1 residuals
% v’s are the ordinal positions of variables in a file. In our example:
% v3 is a contrast code for Group Members -1 for Group 1 and 1 for Group 2
% v4 is t(ime); the linear term
% v5 is time-squared; the quadratic term
% v6 is the dependent variable
b0 = g00 + g01*v3 + u0          % Level-2 equation for Regression constant
b1 = g10 + g11*v3 + u1          % Level-2 equation for Linear Effect
b2 = g20 + g21*v3 + u2          % Level-2 equation for Quadratic Effect
v6 = b0 + b1*v4 + b2*v5 + e1    % Level-1 Equation
/PRINT
post = all
rand = all
olsq = yes
res = all
/END

MLA output

Full information maximum likelihood estimates (BFGS)

Fixed parameters

Label    Estimate     SE          T        Prob(T)
G0       5.106271     0.107180    47.64    0.0000
G1       0.195673     0.107180     1.83    0.0679
G10      1.949746     0.044864    43.46    0.0000
G11      0.208803     0.044864     4.65    0.0000
G20      0.115567     0.009241    12.51    0.0000
G21      0.064794     0.009241     7.01    0.0000

Random parameters

Label    Estimate     SE          T        Prob(T)
U0*U0    0.003430     0.095007     0.03    0.9712
U1*U0    0.001080     0.037456     0.02    0.9770
U1*U1    0.000873     0.016638     0.05    0.9581
U2*U0   -0.002525     0.005695    -0.44    0.6575
U2*U1   -0.000441     0.002538    -0.17    0.8620
U2*U2    0.002095     0.000663     3.16    0.0015

E        0.246646     0.024070    10.25    0.0000

Warning(1): possible nearly-singular covariance matrix
Determinant covariance matrix = 1.9995e-19
Conditional intra-class correlation = 0.00/(0.25+0.00) = 0.0137
# iterations = 48
-2*Log(L) = 594.184819
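As a reading aid for the MLA output above, the derived quantities can be recomputed from the printed estimates. This sketch assumes that T is the estimate divided by its SE and that Prob(T) is a two-sided p-value from a standard normal reference distribution; both assumptions are inferred from the printed numbers, not taken from the MLA documentation.

```python
import math


def t_ratio(estimate, se):
    """Estimate divided by its standard error."""
    return estimate / se


def two_sided_normal_p(t):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def intraclass_correlation(var_u0, var_e):
    """Intercept variance over intercept-plus-residual variance."""
    return var_u0 / (var_e + var_u0)


# Values copied from the MLA output: fixed parameter G1, random
# parameter U0*U0 and level-1 residual variance E.
t_g1 = t_ratio(0.195673, 0.107180)
p_g1 = two_sided_normal_p(t_g1)
icc = intraclass_correlation(0.003430, 0.246646)

print(round(t_g1, 2), round(p_g1, 4), round(icc, 4))
```

The recomputed values agree with the printed T of 1.83, Prob(T) of 0.0679 and conditional intra-class correlation of 0.0137, which is why the output line shows 0.00/(0.25+0.00): it displays the inputs rounded to two decimals.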
A.2. SAS code and output

SAS code

proc mixed data=growth.example covtest method=ml;
  class id;
  model outcome = contrast time time_sq t1_gp t2_gp / solution;
  random intercept time time_sq / subject=id type=un;
run;
SAS output

Covariance Parameter Estimates

Cov Parm    Subject    Estimate    Standard Error    Z        Pr Z
UN(1,1)     ID         2.24E-18    .                 .        .
UN(2,1)     ID         0.003075    0.004370           0.70    0.4817
UN(2,2)     ID         4.55E-70    .                 .        .
UN(3,1)     ID         -0.00336    0.005426          -0.62    0.5362
UN(3,2)     ID         -0.00013    0.002215          -0.06    0.9535
UN(3,3)     ID         0.002057    0.000651           3.16    0.0008
Residual               0.2466      0.02239           11.01

Null Model Likelihood Ratio Test

DF    Chi-Square    Pr > ChiSq
4     748.50

Solution for Fixed Effects

Effect       Estimate    Standard Error    DF     t Value    Pr > |t|
Intercept    5.1063      0.1066            28     47.89
contrast     0.1957      0.1066            210     1.84
time         1.9497      0.04453           28     43.78
time_sq      0.1156      0.009173          28     12.60
t1_gp        0.2088      0.04453           210     4.69
t2_gp        0.06479     0.009173          210     7.06