OXFORD STATISTICAL SCIENCE SERIES

1. A. C. Atkinson: Plots, transformations, and regression
2. M. Stone: Coordinate-free multivariable statistics
3. W. J. Krzanowski: Principles of multivariate analysis: a user's perspective
4. M. Aitkin, D. Anderson, B. Francis, and J. Hinde: Statistical modelling in GLIM
5. Peter J. Diggle: Time series: a biostatistical introduction
6. Howell Tong: Non-linear time series: a dynamical system approach
7. V. P. Godambe: Estimating functions
8. A. C. Atkinson and A. N. Donev: Optimum experimental designs
9. U. N. Bhat and I. V. Basawa: Queuing and related models
10. J. K. Lindsey: Models for repeated measurements
11. N. T. Longford: Random coefficient models
12. P. J. Brown: Measurement, regression, and calibration
13. Peter J. Diggle, Kung-Yee Liang, and Scott L. Zeger: Analysis of longitudinal data
14. J. I. Ansell and M. J. Phillips: Practical methods for reliability data analysis
15. J. K. Lindsey: Modelling frequency and count data
16. J. L. Jensen: Saddlepoint approximations
17. Steffen L. Lauritzen: Graphical models
18. A. W. Bowman and A. Azzalini: Applied smoothing methods for data analysis
19. J. K. Lindsey: Models for repeated measurements, Second edition
20. Michael Evans and Tim Swartz: Approximating integrals via Monte Carlo and deterministic methods
21. D. F. Andrews and J. E. Stafford: Symbolic computation for statistical inference
22. T. A. Severini: Likelihood methods in statistics
23. W. J. Krzanowski: Principles of multivariate analysis: a user's perspective, Updated edition
24. J. Durbin and S. J. Koopman: Time series analysis by state space models
25. Peter J. Diggle, Patrick J. Heagerty, Kung-Yee Liang, Scott L. Zeger: Analysis of Longitudinal Data, Second edition
26. J. K. Lindsey: Nonlinear models in medical statistics
27. Peter J. Green, Nils L. Hjort, and Sylvia Richardson: Highly structured stochastic systems
28. Margaret S. Pepe: Statistical evaluation of medical tests

Analysis of Longitudinal Data
SECOND EDITION

PETER J. DIGGLE
Director, Medical Statistics Unit
Lancaster University

PATRICK J. HEAGERTY
Biostatistics Department
University of Washington

KUNG-YEE LIANG and SCOTT L. ZEGER
School of Hygiene & Public Health
Johns Hopkins University, Maryland

OXFORD UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan South Korea Poland Portugal Singapore Switzerland Thailand Turkey Ukraine Vietnam

Published in the United States by Oxford University Press Inc., New York

© Peter J. Diggle, Patrick J. Heagerty, Kung-Yee Liang, Scott L. Zeger, 2002

The moral rights of the author have been asserted
Database right Oxford University Press (maker)

First edition 1994
Second edition 2002
Reprinted 2003, 2004

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer.

A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data (Data available)

ISBN 0 19 852484 6

10 9 8 7 6 5 4

Printed in Great Britain on acid-free paper by Biddles Ltd, King's Lynn, Norfolk
To Mandy, Claudia, Yung-Kuang, Joanne, Jono, Hannah, Amelia, Margaret, Chao-Kang, Chao-Wei, Max, and David
Preface
This book describes statistical models and methods for the analysis of longitudinal data, with a strong emphasis on applications in the biological and health sciences. The technical level of the book is roughly that of a first year postgraduate course in statistics. However, we have tried to write in such a way that readers with a lower level of technical knowledge, but experience of dealing with longitudinal data from an applied point of view, will be able to appreciate and evaluate the main ideas. Also, we hope that readers with interests across a wide spectrum of application areas will find the ideas relevant and interesting.

In classical univariate statistics, a basic assumption is that each of a number of subjects, or experimental units, gives rise to a single measurement on some relevant variable, termed the response. In multivariate statistics, the single measurement on each subject is replaced by a vector of measurements. For example, in a univariate medical study we might measure the blood pressure of each subject, whereas in a multivariate study we might measure blood pressure, heart-rate, temperature, and so on. In longitudinal studies, each subject again gives rise to a vector of measurements, but these now represent the same physical quantity measured at a sequence of observation times. Thus, for example, we might measure a subject's blood pressure on each of five successive days.

Longitudinal data therefore combine elements of multivariate and time series data. However, they differ from classical multivariate data in that the time series aspect of the data typically imparts a much more highly structured pattern of interdependence among measurements than for standard multivariate data sets; and they differ from classical time series data in consisting of a large number of short series, one from each subject, rather than a single, long series.

The book is organized as follows. The first three chapters provide an introduction to the subject, and cover basic issues of design and exploratory analysis. Chapters 4, 5, and 6 develop linear models and associated statistical methods for data sets in which the response variable is a continuous measurement. Chapters 7, 8, 9, 10, and 11 are concerned with generalized linear models for discrete response variables. Chapter 12 discusses the issues which arise when a variable which we wish to use as an explanatory variable in a longitudinal regression model is, in fact, a stochastic process which may interact with the response process in complex ways. Chapter 13 considers how to deal with missing values in longitudinal studies, with a focus on attrition or dropout, that is the premature termination of the intended sequences of measurements on some subjects. Chapter 14 gives a brief account of a number of additional topics. Appendix A is a short review of the statistical background assumed in the main body of the book.

We have chosen not to discuss software explicitly in the book. Many commercially available packages, for example Splus, MLn, SAS, Mplus or GENSTAT, include some facilities for longitudinal data analysis. However, none of the currently available packages contains enough facilities to cope with the full range of longitudinal data analysis problems which we cover in the book. For our own analyses, we have used the S system (Becker et al., 1988; Chambers and Hastie, 1992) with additional user-defined functions for longitudinal data analysis and, more recently, the R system, which is a publicly available software environment not unlike Splus (see www.r-project.org).
In this second edition, we have corrected a number of typographical errors in the first edition and have tried to clarify some of our explanations. We thank those readers of the first edition who pointed out faults of both kinds, and accept responsibility for any remaining errors and obscurities.

We have also made a number of more substantial changes to the text. In particular, the chapter on missing values is now about three times the length of its counterpart in the first edition, and we have added three new chapters which reflect recent methodological developments.

Most of the data sets used in the book are in the public domain, and can be down-loaded from the first author's web-site, www.maths.lancs.ac[...], or from the second author's web-site, http://faculty.[...]/heagerty/.

The book remains incomplete, in the sense that it reflects our own experience of longitudinal data problems as they have arisen in our work as biostatisticians. We are aware of other relevant work in econometrics, and in the social sciences more generally, but whilst we have included some references to this related work in the second edition, we have not attempted to cover it in detail.

Many friends and colleagues have helped us with this project. Patty Hubbard typed much of the book. Mary Joy Argo facilitated its production. Larry Magder, Daniel Tsou, Bev Mellen[...], Beth Melton, John Hanfelt, Stirling Hilton, Larry Moulton, [...] Lange, Joanne Katz, [...] Mackey, Jon Wakefield, and Thomas Lu[...] gave assistance with computing, preparation of diagrams, and reading the draft. We gratefully acknowledge support from a Merck Development Grant to Johns Hopkins University.

Percentage protein content of milk samples taken at weekly intervals. In the original experiment, 79 cows were allocated at random amongst three diets: cows 1-25, barley; cows 26-52, barley + lupins; cows 53-79, lupins. Data below are from the barley diet only. 9.99 signifies missing.
3.57 3.25 3.60 3.50 3.76 3.71 3.60 3.86 3.42 3.89 3.59 3.76 3.90 3.33 4.00 3.22 3.66 3.85 3.75 9.99 3.10 3.86 3.79 3.84 3.61
3.63 3.24 3.98 3.66 4.34 4.36 4.17 4.40 3.40
3.75 4.20 4.02 4.02 3.90 3.81 3.62 3.66 4.44 4.23 3.82 3.53 4.47 3.93 3.27 3.32
3.89 3.38 3.29 2.72 3.45 4.06 3.78 3.64 3.35 3.32 3.19 3.95 3.71 3.35 3.52 3.28 2.66 3.40 4.09 3.25 3.35 3.74 3.76 3.40 3.58
3.65 3.09 3.30 2.90 3.51 3.95 3.10 3.32 3.39 3.42 3.27 3.53 3.55 3.22 3.47 3.02 3.10 3.22 3.60 3.33 3.48 3.49 3.58 3.44 3.48
3.47 3.29 3.43 3.05 3.68 3.42 3.52 3.56 3.51 3.65 3.55 3.60 3.73 3.25 3.57 3.62 3.28 3.55 3.82 3.27 3.90 3.34 3.68 3.46 3.25
Fig. 1.5. Boxplots of square-root-transformed seizure rates for epileptics at baseline and for four subsequent two-week periods: (a) placebo; (b) progabide.

Table 1.7. Numbers of dropouts and completers by treatment group in the schizophrenia trial. The treatment codes are: p = placebo, h = haloperidol 20 mg, r2 = risperidone 2 mg, r6 = risperidone 6 mg, r10 = risperidone 10 mg, r16 = risperidone 16 mg.

              p     h    r2    r6   r10   r16   Total
Dropouts     61    51    51    34    39    34     270
Completers   27    36    36    52    48    54     253
Total        88    87    87    86    87    88     523
Of the 523 patients, only 253 are listed as completing the study, although a further 16 provided a complete sequence of PANSS scores, as the criterion for completion included a follow-up interview. Table 1.6 gives the distribution of the stated reasons for dropout. Table 1.7 gives the numbers of dropouts and completers in each of the six treatment groups. The most common reason for dropout is 'inadequate response', which accounts for 183 out of the 270 dropouts. The highest dropout rate occurs in the placebo group, followed by the haloperidol group and the lowest dose risperidone group. One patient provided no data at all after the selection visit, and was not considered further in the analysis.

Figure 1.6 shows the observed mean response as a function of time within each treatment group, that is each average is over those patients who have not yet dropped out. All six groups show a mean response profile with the following features: increasing between selection and baseline; decreasing post-baseline; slower rate of decrease towards the end of the study. The mean response in the placebo group shows a much smaller overall decrease than any of the five active treatment groups. The risperidone treatments all show a faster rate of decrease than haloperidol initially, but the final mean responses for all five active treatment groups are similar. The overall reduction in mean response within each active treatment group is very roughly from 90 to 70, which appears to meet the criterion for clinical improvement. However, at each time-point these observed means are, necessarily, calculated only from those subjects who have not yet dropped out of the study, and should therefore be interpreted as conditional means. As we shall see in Chapter 13, these conditional means may be substantially different from the means which are estimated in an analysis of the data which ignores the dropout problem.
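As a small check on the dropout comparison, the counts in Table 1.7 can be turned into dropout proportions per treatment group. The sketch below is only an illustration; the variable names are our own, and the counts are copied from the table.

```python
# Dropout proportions by treatment group, using the counts in Table 1.7.
groups = ["p", "h", "r2", "r6", "r10", "r16"]
dropouts = [61, 51, 51, 34, 39, 34]
completers = [27, 36, 36, 52, 48, 54]

# Proportion of each group's patients who dropped out.
rates = {g: d / (d + c) for g, d, c in zip(groups, dropouts, completers)}

# Print groups from highest to lowest dropout proportion.
for g, r in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{g:>3}: {r:.3f}")
```

The placebo group has the largest proportion, and the haloperidol and 2 mg risperidone groups share the next largest, which agrees with the ordering described in the text.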
[...] spruce data set includes only 79 trees, each with [...] measurements. [...] the objectives of the studies differ. For example, in the CD4+ data, inferences are to be made about the individual so that [...] can be provided, whereas in the crossover trial, the average response in the population for each of the two treatments is the focus. These differences influence the specific approach to analysis, as discussed in detail throughout the book.

1.3 Notation

To the extent possible, the book will describe in words the major ideas underlying longitudinal data analysis. However, details and precise statements often require a mathematical formulation. This section presents the notation to be followed. In general, we will use capital letters to represent random variables or matrices, relying on the context to distinguish the two. [...]

[...] Chapter 12 discusses the problems which can arise when a time-varying explanatory variable is derived from a stochastic process which may interact with the response process in a complex manner; in particular, this requires careful consideration of what form of conditioning is appropriate. Chapter 13 discusses the problems raised by missing values, which frequently arise in longitudinal studies. Chapter 14 gives short introductions to several additional topics which have been the focus of recent research: non-parametric and non-linear modelling; multivariate extensions, including joint modelling of longitudinal measurements and recurrent events. Appendix A is a brief review of the statistical theory of linear and GLMs for regression analysis with independent observations, which provides the foundation for the longitudinal methodology developed in the rest of the book.
BIAS

Re-expressing (2.2.1) as

$$ Y_{ij} = \beta_0 + \beta x_{i1} + \beta (x_{ij} - x_{i1}) + \epsilon_{ij}, \qquad (2.2.2) $$

we note that this model assumes implicitly that the cross-sectional effect due to $x_{i1}$ is the same as the longitudinal effect represented by $x_{ij} - x_{i1}$ on the right-hand side. This assumption is rather a strong one and doomed to fail in many studies. The model can be modified by allowing each person to have their own intercept, $\beta_{0i}$, [...]

[...] where $\delta$ is the ratio of the averaged within-subject variation in $x$, $\sum_{i,j}(x_{ij} - \bar{x}_i)^2/\{m(n-1)\}$, to the between-subjects variation in $x$, which is based on $\sum_i(\bar{x}_i - \bar{x})^2$. Figure 2.1 gives plots of $e$ against $\delta$ [...]. Except when $\delta$ is small and the [...]
Fig. 3.5. CD4+ residuals against time since seroconversion, with sequences of data from randomly selected subjects shown as connected line segments.

[...] few curves are to be highlighted. Second, this display is unlikely to uncover outlying individuals. We therefore prefer a second approach, in which individual curves are ordered with respect to some characteristic that is relevant to the model of interest. We then connect data for individuals with selected quantiles for this ordering statistic. Ordering statistics can be chosen to measure: the average level; variability within an individual; trend; or correlation between successive values. Resistant statistics are preferred for ordering, so that one or a few outlying observations do not determine an individual's summary score. The median, median absolute deviation, and biweight trend (Mosteller and Tukey, 1977) are examples.

When the data naturally divide into treatment groups, such as in Example 1.3, a separate plot can be made for each group, or one plot can be produced with a separate summary curve for each group and distinct plotting symbols for the data from each group. For equally spaced observation times, Jones and Rice (1992) apply principal components analysis to identify ordering statistics. Principal components do not exploit the longitudinal nature of the data, nor do they adapt readily to situations with unequal numbers of observations or different sets of observation times across subjects.

Example 3.2. (continued)
Figure 3.3 displays the average number of CD4+ cells for the MACS seroconverters. Figure 3.6 shows the residuals from this average curve with the repeated values for nine individuals connected. These persons had median residual value at the extrema or the 5th, 10th, 25th, 50th, 75th, 90th, or 95th percentile. We have used residuals rather than the raw data as this sometimes helps to uncover more subtle patterns in individual curves. Note that the data for each individual tend to track at different levels of CD4+ cell numbers, but not nearly to the same extent as in Fig. 3.2. The CD4+ data have considerably more variation across time within a person.

Fig. 3.6. CD4+ residuals against time since seroconversion, with sequences of data from systematically selected subjects shown as connected line segments.

Thus far, we have considered displays of the response against time. In many longitudinal problems, the primary focus is the relationship between the response and an explanatory variable other than time. For example, in the MACS data set, an interesting question is whether CD4+ cell number depends on the depressed mood of an individual. There is some prior belief that depressive symptoms are negatively correlated with the capacity for immune response. The MACS collected a measure of depressive symptoms called CESD; a higher score indicates greater depressive symptoms.

Example 3.3. Relationship between CD4+ cell numbers and CESD score
Figure 3.7 plots the residuals with time trends removed for CD4+ cell numbers against similar residuals for CESD scores. Also shown is a lowess curve, which is barely discernible from the horizontal line, y = 0. Thus, there is very little evidence for an association between depressive symptoms (CESD score) and immune response (CD4+ numbers), although such evidence as there is points to a negative association, a larger CESD score being associated with a lower CD4+ count.

An interesting question is whether the evidence for any such relationship would derive largely from differences across people, that is, from cross-sectional information, or from changes across time within a person. As previously discussed in Chapter 1, cross-sectional information is more likely biased by unobserved factors. In this example, 70% of the variation in [...]
EXPLORING LONGITUDINAL DATA 40
FITTING SMOOTH CURVES TO LONGITUDINAL DATA

Fig. [...]. Comparison between lowess and kernel smoothers, for data with two outliers (shown as *). -----: kernel; ......: lowess.
[...] smoothing methods as this avoids excessive dependence of results on a few observations.

With each of the non-parametric curve estimation techniques, there is a bandwidth parameter which controls the smoothness of the fitted curve. In choosing the bandwidth, there is a classic trade-off between bias and variance. The wider the bandwidth, the smaller the variance of the estimated curve at any fixed time, since more observations have been averaged. But estimates which are smoother than the functions they are approximating are biased. The objective in curve estimation is to choose from the data a smoothness parameter that balances bias and variance. A generally accepted aggregate measure that combines bias and variance is the average predictive squared error (PSE) defined by

$$ \mathrm{PSE}(\lambda) = \frac{1}{m} \sum_{i=1}^{m} E\{Y_i^* - \hat{\mu}(t_i; \lambda)\}^2, \qquad (3.3.4) $$

where $Y_i^*$ is a new observation at $t_i$. We cannot compare the observed $y_i$ with the corresponding value, $\hat{\mu}(t_i)$, in the criterion above because the optimal curve would interpolate the points. PSE can be estimated by cross-validation, where we compare the observed $y_i$ to the predicted curve $\hat{\mu}^{-i}(t_i; \lambda)$ obtained by leaving out the $i$th observation. The cross-validation criterion is given by

$$ \mathrm{CV}(\lambda) = \frac{1}{m} \sum_{i=1}^{m} \{y_i - \hat{\mu}^{-i}(t_i; \lambda)\}^2. \qquad (3.3.5) $$

The expected value of $\mathrm{CV}(\lambda)$ is approximately equal to the average PSE. See Hastie and Tibshirani (1990) for further discussion.
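The leave-one-out criterion can be sketched in a few lines. The example below is a minimal illustration, not the procedure used for the analyses in the book: it assumes a Gaussian-kernel (Nadaraya-Watson) smoother, simulated data, and an arbitrary grid of candidate bandwidths, and the function names `kernel_smooth` and `cv_score` are our own.

```python
import numpy as np

def kernel_smooth(t_train, y_train, t_eval, bandwidth):
    """Gaussian-kernel weighted average of y_train, evaluated at t_eval."""
    d = (np.asarray(t_eval)[:, None] - np.asarray(t_train)[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ y_train) / w.sum(axis=1)

def cv_score(t, y, bandwidth):
    """Leave-one-out cross-validation criterion for one bandwidth."""
    m = len(t)
    sq_err = np.empty(m)
    for i in range(m):
        keep = np.arange(m) != i          # drop the i-th observation
        pred = kernel_smooth(t[keep], y[keep], t[i:i + 1], bandwidth)[0]
        sq_err[i] = (y[i] - pred) ** 2
    return sq_err.mean()

# Simulated data (an assumption, for illustration only).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(-2, 4, size=80))
y = np.sin(t) + rng.normal(scale=0.3, size=t.size)

grid = np.array([0.1, 0.2, 0.4, 0.8, 1.6, 3.2])   # candidate bandwidths
scores = np.array([cv_score(t, y, h) for h in grid])
best = grid[scores.argmin()]
print("CV scores:", np.round(scores, 3), "chosen bandwidth:", best)
```

Very wide bandwidths flatten the curve and inflate the criterion through bias, while very narrow ones inflate it through variance; the chosen value sits between the two.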
3.4 Exploring correlation structure

In this section, graphical displays for exploring the degree of association in a longitudinal data set are considered. To remove the effects of explanatory variables, we first regress the response, $y_{ij}$, on the explanatory variables, $x_{ij}$, to obtain residuals, $r_{ij} = y_{ij} - x_{ij}'\hat{\beta}$. With data collected at a fixed number of equally spaced time points, correlation can be studied using a scatterplot matrix in which $r_{ij}$ is plotted against $r_{ik}$ for all $j < k = 1, \ldots, n$.

Example 3.5. CD4+ cell numbers
To illustrate the idea of a scatterplot matrix, we have rounded the CD4+ observation times to the nearest year, so that there are a maximum of seven observations between -2 and 4 for each individual. Figure 3.13 shows each of the 7 choose 2 scatterplots of responses from a person at different times.

Notice from the main diagonal of the scatterplot matrix that there is substantial positive correlation between repeated observations on the same individual that are one year apart. The degree of correlation decreases as the observations are moved farther from one another in time, which corresponds to moving farther from the diagonal. Notice also that the correlation is reasonably consistent along a diagonal in the matrix. This indicates that the correlation depends more strongly on the time between observations than on their absolute times.

If the residuals have constant mean and variance and if $\mathrm{Corr}(Y_{ij}, Y_{ik})$ depends only on $|t_{ij} - t_{ik}|$, the process $Y_{ij}$ is said to be weakly stationary (Box and Jenkins, 1970). This is a reasonable first approximation for the CD4+ data.

When each scatterplot in the matrix appears like a sample from the bivariate Gaussian (normal) distribution, we can summarize the association with a correlation matrix, comprised of a correlation coefficient for each plot.

Example 3.5. (continued)
The estimated correlation matrix for the CD4+ data is presented in Table 3.2. The correlations show some tendency to decrease with increasing time lag, but remain substantial at all lags. The estimated correlation of 0.89 at lag 6 is quite likely misleading, as it is calculated from only nine pairs of measurements.

Assuming stationarity, a single correlation estimate can be obtained for each distinct value of the time separation or lag, $|t_{ij} - t_{ik}|$. This corresponds to pooling observation pairs along the diagonals of the scatterplot matrix. This autocorrelation function for the CD4+ data takes the values presented in Table 3.3.
Fig. 3.13. Scatterplot matrix of CD4+ residuals. Axis labels are years relative to seroconversion.

It is obviously desirable to indicate the uncertainty in an estimated correlation coefficient when examining the correlation matrix or autocorrelation function. The simplest rule of thumb is that under the null condition of no correlation, a correlation coefficient has standard error which is roughly $1/\sqrt{N}$, where $N$ is the number of independent pairs of observations in the calculation. Using this rule of thumb, a plot of the autocorrelation function can be enhanced with tolerance limits for a true autocorrelation of zero. These take the form of pairs of values $\pm 2/\sqrt{N_u}$, where $N_u$ is the number of pairs of observations at lag $u$.
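Pooling along the diagonals of the scatterplot matrix, together with the rule-of-thumb tolerance limits, can be sketched as follows. This is an illustration under stated assumptions rather than the computation used for the CD4+ data: the residuals are simulated (a subject-level random effect plus noise, with some values set to missing), they are stored as a subjects-by-times array on an equally spaced grid, and the function name `pooled_autocorrelation` is our own. The pooled pairs come from the same subjects and so are not strictly independent, which is one reason the limits are only a rough guide.

```python
import numpy as np

def pooled_autocorrelation(resid, max_lag):
    """Pooled correlation at each lag, with +/- 2/sqrt(N_u) limits.

    resid: (subjects x equally spaced times) array, NaN where missing.
    Returns a list of (lag, estimate, number of pairs, tolerance limit).
    """
    resid = np.asarray(resid, dtype=float)
    rows = []
    for u in range(1, max_lag + 1):
        a = resid[:, :-u].ravel()          # earlier member of each pair
        b = resid[:, u:].ravel()           # later member, u steps on
        ok = ~np.isnan(a) & ~np.isnan(b)   # keep complete pairs only
        n_u = int(ok.sum())
        rho = np.corrcoef(a[ok], b[ok])[0, 1] if n_u > 2 else np.nan
        limit = 2.0 / np.sqrt(n_u) if n_u > 0 else np.nan
        rows.append((u, rho, n_u, limit))
    return rows

# Simulated residuals (an assumption, for illustration only).
rng = np.random.default_rng(1)
m, n = 200, 7
resid = rng.normal(size=(m, 1)) + rng.normal(size=(m, n))
resid[rng.random((m, n)) < 0.2] = np.nan   # about 20% missing at random

acf = pooled_autocorrelation(resid, max_lag=6)
for u, rho, n_u, limit in acf:
    print(f"lag {u}: rho = {rho:.2f}, pairs = {n_u}, limit = +/-{limit:.2f}")
```

The number of pairs falls as the lag grows, so the limits widen, which is the effect that guards against over-interpreting a large estimate at the longest lag.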
Table 3.2. Estimated autocorrelation matrix for CD4+ residuals. Entries are $\mathrm{Corr}(Y_{ij}, Y_{ik})$, $1 \le t_{ij} < t_{ik} \le 7$ years.

          t_ij
t_ik      1      2      3      4      5      6
2      0.66
3      0.56   0.49
4      0.41   0.47   0.51
5      0.29   0.39   0.51   0.68
6      0.48   0.52   0.51   0.65   0.75
7      0.89   0.48   0.44   0.61   0.70   0.75

Table 3.3. Estimated autocorrelation function for CD4+ residuals. Entries are $\hat{\rho}(u) = \mathrm{Corr}(Y_{ij}, Y_{ij-u})$, $u = 1, \ldots, 6$.

u            1      2      3      4      5      6
rho(u)    0.60   0.54   0.46   0.42   0.47   0.89

Example 3.5. (continued)
Figure 3.14 shows the estimated autocorrelation function for the CD4+ cell numbers, together with its associated tolerance limits for zero autocorrelation. The size of the tolerance limit at lag 6 is an effective counter to spurious over-interpretation of the large estimated autocorrelation. All that can be said is that the autocorrelation at lag 6 is significantly greater than zero. Calculating confidence intervals for non-zero autocorrelations is more complex. See Box and Jenkins (1970) for a detailed discussion.

Fig. 3.14. Sample autocorrelation function of CD4+ residuals, and upper 95% tolerance limits assuming zero autocorrelation. -----: sample autocorrelation function; ......: tolerance limits.

In subsequent chapters, the autocorrelation function will be one tool for identifying sensible models for the correlation in a longitudinal data set. The empirical function described above will be contrasted with the theoretical correlations for a candidate model. Hence, the EDA displays are later useful for model criticism as well.

The autocorrelation function is most effective for studying equally spaced data that are roughly stationary. Autocorrelations are more difficult to estimate with irregularly spaced data unless we round observation times as was done above for the CD4+ data. An alternative function that describes the association among repeated values and is easily estimated with irregular observation times is the variogram (Diggle, 1990). For a stochastic process $Y(t)$, the variogram is defined as

$$ \gamma(u) = \tfrac{1}{2} E\left[\{Y(t) - Y(t-u)\}^2\right], \quad u \ge 0. \qquad (3.4.1) $$

If $Y(t)$ is stationary, the variogram is directly related to the autocorrelation function, $\rho(u)$, by

$$ \gamma(u) = \sigma^2 \{1 - \rho(u)\}, $$
where $\sigma^2$ is the variance of $Y(t)$. However, the variogram is also well-defined for a limited class of non-stationary processes for which the increments, $Y(t) - Y(t-u)$, are stationary.

The origins of the variogram as a descriptive tool go back at least to Jowett (1952). It has since been used extensively in the body of methodology known as geostatistics, which is concerned primarily with the estimation of spatial variation (Journel and Huijbregts, 1978). In the longitudinal context, the empirical counterpart of the variogram, which we call the sample variogram, is calculated from observed half-squared-differences between pairs of residuals,

$$ v_{ijk} = \tfrac{1}{2}(r_{ij} - r_{ik})^2, $$

and the corresponding time-differences

$$ u_{ijk} = t_{ij} - t_{ik}. $$

If the times $t_{ij}$ are not totally irregular, there will be more than one observation at each value of $u$. We then let $\hat{\gamma}(u)$ be the average of all of the $v_{ijk}$ corresponding to that particular value of $u$. With highly irregular sampling times, the variogram can be estimated from the data $(u_{ijk}, v_{ijk})$, $j < k = 1, \ldots, n_i$; $i = 1, \ldots, m$, by fitting a non-parametric curve. The process variance, $\sigma^2$, is estimated as the average of all half-squared-differences $\tfrac{1}{2}(y_{ij} - y_{lk})^2$ with $i \ne l$. The autocorrelation function at any lag $u$ can then be estimated from the sample variogram by the formula

$$ \hat{\rho}(u) = 1 - \hat{\gamma}(u)/\hat{\sigma}^2. \qquad (3.4.2) $$
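The sample variogram and the variogram-based autocorrelation estimate can be sketched as below. This is a minimal illustration, assuming simulated residuals on a regular time grid so that the half-squared-differences can simply be averaged at each distinct lag; with highly irregular times a non-parametric smoother would be fitted to the pairs instead. The function names `sample_variogram` and `process_variance` are our own, and the lags are taken as absolute time differences.

```python
import numpy as np

def sample_variogram(times, resid):
    """Within-subject time differences and half-squared residual differences.

    times, resid: lists of 1-D arrays, one pair of arrays per subject.
    """
    u_parts, v_parts = [], []
    for t, r in zip(times, resid):
        j, k = np.triu_indices(len(t), k=1)        # all pairs with j < k
        u_parts.append(np.abs(t[j] - t[k]))
        v_parts.append(0.5 * (r[j] - r[k]) ** 2)
    return np.concatenate(u_parts), np.concatenate(v_parts)

def process_variance(resid):
    """Average half-squared difference between residuals on different subjects."""
    all_r = np.concatenate(resid)
    subject = np.concatenate([np.full(len(r), i) for i, r in enumerate(resid)])
    half_sq = 0.5 * (all_r[:, None] - all_r[None, :]) ** 2
    return half_sq[subject[:, None] != subject[None, :]].mean()

# Simulated residuals: subject effect plus independent noise (an assumption).
rng = np.random.default_rng(2)
m, n = 300, 5
times = [np.arange(n, dtype=float) for _ in range(m)]
resid = [rng.normal() + rng.normal(size=n) for _ in range(m)]

u, v = sample_variogram(times, resid)
sigma2_hat = process_variance(resid)
lags = np.unique(u)
gamma_hat = np.array([v[u == lag].mean() for lag in lags])
rho_hat = 1.0 - gamma_hat / sigma2_hat             # as in formula (3.4.2)
print(lags, np.round(gamma_hat, 2), round(sigma2_hat, 2), np.round(rho_hat, 2))
```

In this simulation the variogram is flat and sits well below the estimated process variance, the pattern expected when a between-subject component induces positive correlation at every time separation.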
Example 3.6. CD4+ cell counts
An estimate of the variogram for the CD4+ data is pictured in Fig. 3.15. The diagram shows both the basic quantities $(u_{ijk}, v_{ijk})$ and a smooth estimate of $\gamma(u)$ which has been produced using lowess. Note that there are few data available for time differences of less than six months or beyond six years. Also, to accentuate the shape of the smooth estimate, we have truncated the vertical axis at 180 000. The variogram smoothly increases with lag, corresponding to decreasing correlation as observations are separated in time, but appears almost to have levelled out by lag $u = 6$. This is in contrast to the apparent, albeit spurious, rise in the estimated autocorrelation at lag 6, as shown in Fig. 3.14. The explanation is that the non-parametric smoothing of the variogram recognizes the sparsity of data at lag $u = 6$, and incorporates information from the sample variogram at smaller values of $u$. This illustrates one advantage of the sample variogram for irregularly spaced data, by comparison with the sample autocorrelation function based on artificially rounded measurement times.

The horizontal line on Fig. 3.15 is the variogram-based estimate of the process variance, which is substantially larger than the value of the sample variogram at lag $u = 6$. This suggests either that the autocorrelation has not decayed to zero within the range of the data or, if we accept the earlier observation that the variogram has almost levelled out, that positive correlation remains at arbitrarily large time separations. The latter is what we would expect if there is a component of variation between subjects, leading to a positive correlation between repeated measurements on the same subject, irrespective of their time separation. Incidentally, the enormous random fluctuations in the basic quantities $(u_{ijk}, v_{ijk})$ are entirely typical. The marginal sampling distribution of each $v_{ijk}$ is proportional to chi-squared on 1 degree of freedom.

Example 3.7. Milk protein measurements
We now construct an estimate of the variogram for the milk protein data. In a designed experiment such as this, we are able to use a saturated model for the mean response profiles. By this, we mean that we fit a separate parameter for the mean response at each of the 19 times in each of the three treatment groups. The ordinary least-squares (OLS) estimates of these 57 mean response parameters are just the corresponding observed means. The sample variogram of the resulting set of OLS residuals is shown [...]
Weighted least-squares estimation
We now return to the general formulation of (4.2.1), and consider the problem of estimating the regression parameter $\beta$. The weighted least-squares estimator of $\beta$, using a symmetric weight matrix, $W$, is the value, $\tilde\beta_W$, which minimizes the quadratic form
$$(y - X\beta)'W(y - X\beta). \qquad (4.3.1)$$
Standard matrix manipulations give the explicit result
$$\tilde\beta_W = (X'WX)^{-1}X'Wy. \qquad (4.3.2)$$
Because $y$ is a realization of a random vector $Y$ with $E(Y) = X\beta$, the weighted least-squares estimator, $\tilde\beta_W$, is unbiased, whatever the choice of $W$. Furthermore, since $\mathrm{Var}(Y) = \sigma^2 V$, then
$$\mathrm{Var}(\tilde\beta_W) = \sigma^2\{(X'WX)^{-1}X'W\}V\{WX(X'WX)^{-1}\}. \qquad (4.3.3)$$
If $W = I$, the identity matrix, (4.3.2) reduces to the OLS estimator
$$\tilde\beta_I = (X'X)^{-1}X'y, \qquad (4.3.4)$$
with
$$\mathrm{Var}(\tilde\beta_I) = \sigma^2 (X'X)^{-1}X'VX(X'X)^{-1}. \qquad (4.3.5)$$
If $W = V^{-1}$, the estimator becomes
$$\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y, \qquad (4.3.6)$$
with
$$\mathrm{Var}(\hat\beta) = \sigma^2 (X'V^{-1}X)^{-1}. \qquad (4.3.7)$$
The 'hat' notation anticipates that $\hat\beta$ is the maximum likelihood estimator for $\beta$ under the multivariate Gaussian assumption (4.2.1). This last remark suggests, correctly, that the most efficient weighted least-squares estimator for $\beta$ uses $W = V^{-1}$. However, to identify this optimal weighting matrix we need to know the complete correlation structure of the data - we do not need to know $\sigma^2$, because $\tilde\beta_W$ is unchanged by proportional changes in all the elements of $W$. Because the correlation structure may be difficult to identify in practice, it is of interest to ask how much loss of efficiency might result from using a different $W$. Note that the relative efficiency of $\tilde\beta_W$ and $\hat\beta$ can be calculated from their respective variance matrices (4.3.3) and (4.3.7).

The relative efficiency of OLS depends on the precise interplay between the matrices $X$ and $V$, as described by Bloomfield and Watson (1975). However, our experience has been that under the conditions encountered in a wide range of longitudinal data applications, the relative efficiency is often quite good. As a simple example, consider $m = 10$ units each observed at $n = 5$ time-points $t_j = -2, -1, 0, 1, 2$. Let the mean response at time $t$ be $\mu(t) = \beta_0 + \beta_1 t$. We shall illustrate the relative efficiency of the OLS estimators for $\beta = (\beta_0, \beta_1)$ under each of the covariance structures described in Section 4.2. First, note that
$$(X'X)^{-1} = \begin{pmatrix} 0.02 & 0 \\ 0 & 0.01 \end{pmatrix}.$$
Now suppose that $V$ has the uniform correlation structure, with correlation $\rho$ between any two measurements on the same unit, for some $0 < \rho < 1$. Then, straightforward matrix manipulations give
$$X'VX = \begin{pmatrix} 50(1 + 4\rho) & 0 \\ 0 & 100(1 - \rho) \end{pmatrix},$$
and
$$\mathrm{Var}(\tilde\beta_I) = \sigma^2 \begin{pmatrix} 0.02(1 + 4\rho) & 0 \\ 0 & 0.01(1 - \rho) \end{pmatrix} \qquad (4.3.8)$$
by substitution of $(X'X)^{-1}$ and $X'VX$ into (4.3.5).
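The worked example can be checked numerically. The following is a minimal sketch in plain Python, assuming the uniform correlation structure within units and independence between units; the function names are illustrative and not part of the text. It evaluates the OLS variance (4.3.5) and the optimally weighted variance (4.3.7), both divided by $\sigma^2$, for the design with $m = 10$ units and times $-2, \ldots, 2$.

```python
def uniform_corr(n, rho):
    """n x n uniform correlation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


def uniform_corr_inverse(n, rho):
    """Closed-form inverse of the uniform correlation matrix."""
    c = rho / (1.0 + (n - 1) * rho)
    return [[((1.0 if j == k else 0.0) - c) / (1.0 - rho) for k in range(n)]
            for j in range(n)]


def cross_product(cols, M, m):
    """X'MX when X stacks m identical units with design columns `cols`
    and M is the common within-unit matrix (block-diagonal overall)."""
    n = len(M)
    return [[m * sum(a[j] * M[j][k] * b[k] for j in range(n) for k in range(n))
             for b in cols] for a in cols]


def inv2(A):
    """Inverse of a 2 x 2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ols_and_optimal_variances(rho, m=10, times=(-2, -1, 0, 1, 2)):
    """Return Var(OLS)/sigma^2 from (4.3.5) and Var(optimal)/sigma^2 from (4.3.7)."""
    n = len(times)
    cols = [[1.0] * n, [float(t) for t in times]]  # intercept and slope columns
    xtx_inv = inv2(cross_product(cols, uniform_corr(n, 0.0), m))
    xtvx = cross_product(cols, uniform_corr(n, rho), m)
    var_ols = matmul2(matmul2(xtx_inv, xtvx), xtx_inv)
    var_opt = inv2(cross_product(cols, uniform_corr_inverse(n, rho), m))
    return var_ols, var_opt


var_ols, var_opt = ols_and_optimal_variances(rho=0.5)
print(var_ols)  # diagonal entries 0.02(1 + 4 rho) and 0.01(1 - rho)
print(var_opt)
```

For this particular design and correlation structure the two variance matrices agree, so nothing is lost by using OLS; other choices of $V$ would need their own inverse in place of `uniform_corr_inverse`.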
(4.5.1)
In this case, the maximum likelihood estimator for $\sigma^2$ is $\hat\sigma^2 = \mathrm{RSS}/(nm)$, where RSS denotes the residual sum of squares, whereas the usual unbiased estimator is $\tilde\sigma^2 = \mathrm{RSS}/(nm - p)$, where $p$ is the number of elements of $\beta$. In fact, $\tilde\sigma^2$ is the REML estimator for $\sigma^2$ in the model (4.5.1). In the case of the GLM with dependent errors, Y
Now, for fixed . Q , generaIIzed )past- 0
Fig. 4.2. Sitka spruce data: mean response profiles in each of the four growth chambers. (Horizontal axis: days since 1 January 1988.)
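and, for 1989,
$$\hat V_0 = \begin{pmatrix}
0.457 & 0.454 & 0.427 & 0.417 & 0.433 & 0.422 & 0.408 & 0.418 \\
 & 0.451 & 0.425 & 0.415 & 0.431 & 0.420 & 0.406 & 0.416 \\
 & & 0.409 & 0.398 & 0.412 & 0.404 & 0.390 & 0.401 \\
 & & & 0.396 & 0.410 & 0.402 & 0.388 & 0.400 \\
 & & & & 0.434 & 0.422 & 0.405 & 0.418 \\
 & & & & & 0.415 & 0.399 & 0.412 \\
 & & & & & & 0.394 & 0.402 \\
 & & & & & & & 0.416
\end{pmatrix}.$$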
The most striking feature of each matrix $\hat V_0$ is its near-singularity. This arises because the dominant component of variation in the data is a random intercept between trees, as a result of which the variance structure approximates to that of the uniform correlation model 5.2.2 with $\rho$ close to 1. Using the estimated variance matrices $\hat V_0$, we can construct pointwise standard errors for the observed mean responses in the four chambers. These standard errors vary between about 0.1 and 0.2, suggesting that differences between chambers within experimental groups are negligible. We therefore proceed to model the data ignoring chambers.

Figure 4.3 shows the observed mean response in each of the two treatment groups. The two curves diverge progressively during 1988, the second year of the experiment, and are approximately parallel during 1989. The overall growth pattern is clearly non-linear, nor could it be well approximated by a low-order polynomial in time. For this reason, and because the primary inferential focus is on the ozone effect, we make no attempt to model the overall growth pattern parametrically. Instead, we simply use a separate parameter, $\beta_j$ say, for the treatment mean response at the $j$th time-point and concentrate our modelling efforts on the control versus treatment contrast.

For the 1988 data, we assume that this contrast is linear in time. Thus, if $\mu_1(t)$ and $\mu_2(t)$ represent the mean response at time $t$ for treated and control trees, respectively, then
$$\mu_1(t_j) = \beta_j \quad\text{and}\quad \mu_2(t_j) = \beta_j + \eta + \gamma t_j, \qquad j = 1, \ldots, 5. \qquad (4.6.7)$$
To estimate the parameters $\beta_j$, $\eta$ and $\gamma$ in (4.6.7) we use OLS, that is, $W = I$ in (4.6.1). To estimate the variance matrix of the parameter estimates, we use the estimated $\hat V_0$ in (4.6.2). The resulting estimates and standard errors are given in Table 4.3(a). The hypothesis of no treatment effect is that $\eta = \gamma = 0$. The test statistic for this hypothesis, derived from (4.6.5), is $T = 9.79$ on two degrees of freedom, corresponding to $p = 0.007$; this constitutes strong evidence of a negative treatment effect, that is, ozone suppresses growth. Note that for the linear model-fitting, we used a scaled time variable, $x = t/100$, where $t$ is measured in days since 1 January 1988.

For the 1989 data, we assume that the control versus treatment contrast is constant in time. Thus,
Table 4.3. Ordinary least-squares estimates and robust standard errors for mean value parameters in the model fitted to the Sitka spruce data.

(a) 1988
Parameter   β1     β2     β3     β4     β5     η       γ
Estimate    4.060  4.470  4.483  5.179  5.316  -0.221  0.213
SE          0.090  0.086  0.083  0.086  0.087   0.220  0.077

(b) 1989
Parameter   β1     β2     β3     β4     β5     β6     β7     β8     η
Estimate    5.504  5.516  5.679  5.901  6.040  6.127  6.128  6.130  0.354
SE          0.091  0.090  0.087  0.086  0.089  0.088  0.086  0.088  0.156

Fig. 4.4. Sitka spruce data: estimated response profiles, shown as solid dots, and 95% pointwise confidence limits for the ozone-treated group: (a) 1988 growing season; (b) 1989 growing season. (Horizontal axes: days since 1 January 1988 and days since 1 January 1989, respectively.)
A popular choice for $\rho(u)$ is the exponential correlation model,
$$\rho(u) = \exp(-\phi u), \qquad (5.2.7)$$
for some value of $\phi > 0$. Figure 5.1 shows the variogram (5.2.6) of this model for $\sigma^2 = 1$ and $\phi = 0.1$, 0.25, and 1.0, together with a simulation of a sequence of $n = 25$ measurements at times $t_j = 1, 2, \ldots, 25$. As $\phi$ increases, the strength of the autocorrelation decreases; in the three examples shown, the autocorrelation between successive measurements is approximately 0.905, 0.779 and 0.368, respectively. The corresponding variograms and simulated realizations reflect this; as $\phi$ increases, the variograms approach their asymptote more quickly and the simulated realizations appear less smooth.

In many applications the empirical variogram differs from the exponential correlation model by showing an initially slow increase as $u$ increases from zero, then a sharp rise and finally a slower increase again as it approaches its asymptote. A model which captures this behaviour is the so-called Gaussian correlation function,
$$\rho(u) = \exp(-\phi u^2), \qquad (5.2.8)$$
for some $\phi > 0$. Figure 5.2 shows the variogram and a simulated realization for $\sigma^2 = 1$ and each of $\phi = 0.1$, 0.25, and 1.0, with times of observation $t_j = 1, 2, \ldots, 25$, as in Fig. 5.1.
Fig. 5.2. Variograms (left-hand panels) and simulated realizations (right-hand panels) for the Gaussian correlation model: (a) $\phi = 0.1$; (b) $\phi = 0.25$; (c) $\phi = 1.0$.
A panel-by-panel comparison between Figs 5.1 and 5.2 is interesting. For both models the parameter $\phi$ has a qualitatively similar effect: as $\phi$ increases the variogram rises more sharply and the simulated realizations are less smooth. Also, for each value of $\phi$, the correlation between successive unit-spaced measurements is the same for the two models. When $\phi = 1.0$ the correlations at larger time-separations decay rapidly to zero and the corresponding simulations of the two models are almost identical (the same underlying random number stream is used for all the simulations). For smaller values of $\phi$ the simulated realization of the Gaussian model has a smoother appearance than its exponential counterpart.

The most important qualitative difference between the Gaussian and exponential correlation functions is their behaviour near $u = 0$. Note that the domain of the correlation function is extended to all real $u$ by the requirement that $\rho(-u) = \rho(u)$. Then, the exponential model (5.2.7) is continuous but not differentiable at $u = 0$, whereas the Gaussian model (5.2.8) is infinitely differentiable. The same remarks apply to the variogram, which is given by $\gamma(u) = \sigma^2\{1 - \rho(u)\}$. If we now recall the definition of the variogram as $\gamma(u) = \frac{1}{2}E[\{\epsilon(t) - \epsilon(t - u)\}^2]$, we see why these considerations of mathematical smoothness in the correlation functions translate to physical smoothness in the simulated realizations. In particular, if we consider a Gaussian model with correlation parameter $\phi_1$, and an exponential model with correlation parameter $\phi_2$, then whatever the values of $\phi_1$ and $\phi_2$, there is some positive value, $u_0$ say, such that for time-separations $u$ less than $u_0$, the mean squared difference between $\epsilon(t)$ and $\epsilon(t - u)$ is smaller for the Gaussian than for the exponential. In other words, on a sufficiently small time-scale the Gaussian model will appear to be smoother than the exponential.
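The comparison between the two correlation functions is easy to reproduce numerically. The sketch below, in plain Python with illustrative function names that are not part of the text, evaluates (5.2.7), (5.2.8) and the variogram $\gamma(u) = \sigma^2\{1 - \rho(u)\}$, and prints the lag-one correlations together with the variograms at a short lag.

```python
import math


def rho_exponential(u, phi):
    """Exponential correlation function (5.2.7), extended to all real u."""
    return math.exp(-phi * abs(u))


def rho_gaussian(u, phi):
    """Gaussian correlation function (5.2.8)."""
    return math.exp(-phi * u * u)


def variogram(u, rho, phi, sigma2=1.0):
    """Variogram of a pure serial correlation model: sigma^2 {1 - rho(u)}."""
    return sigma2 * (1.0 - rho(u, phi))


for phi in (0.1, 0.25, 1.0):
    lag_one_exp = rho_exponential(1.0, phi)
    lag_one_gau = rho_gaussian(1.0, phi)
    short_exp = variogram(0.1, rho_exponential, phi)
    short_gau = variogram(0.1, rho_gaussian, phi)
    print(phi, round(lag_one_exp, 3), round(lag_one_gau, 3), short_exp, short_gau)
```

At unit lag the two models give the same correlation, while at the short lag $u = 0.1$ the Gaussian variogram is the smaller of the two, which is the small-scale smoothness described above.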
Another way to build pure serial correlation models is to assume an explicit dependence of the current $\epsilon_j$ on a limited number of its predecessors, $\epsilon_{j-1}, \ldots, \epsilon_1$. This approach has a long history in time series analysis, going back at least to Yule (1927). For longitudinal data, the approach was proposed by Kenward (1987), who adopted Gabriel's (1962) terminology of ante-dependence of order $p$ to describe a model in which the conditional distribution of $\epsilon_j$ given its predecessors $\epsilon_{j-1}, \ldots, \epsilon_1$ depends only on $\epsilon_{j-1}, \ldots, \epsilon_{j-p}$. A sequence of random variables with this property is more usually called a $p$th order Markov model, after the Russian mathematician who first studied stochastic processes of this kind. Ante-dependence modelling therefore has close links with the study of Markov processes (Cox and Miller, 1965), and with recent developments in the graphical modelling of multivariate data (Whittaker, 1990).

The simplest example of an ante-dependence model for a zero-mean sequence is the first-order autoregressive process. In this model, the $\epsilon_j$ can be generated from the relationship
$$\epsilon_j = \alpha\epsilon_{j-1} + Z_j, \qquad (5.2.9)$$
where the $Z_j$ are a sequence of mutually independent $N(0, \sigma^2)$ random variables, and the process is initiated by $\epsilon_0 \sim N\{0, \sigma^2/(1 - \alpha^2)\}$. An equivalent formulation, with the same initial conditions, specifies the conditional distribution of $\epsilon_j$ given $\epsilon_{j-1}$ as
$$\epsilon_j \mid \epsilon_{j-1} \sim N(\alpha\epsilon_{j-1}, \sigma^2).$$
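As an illustration of (5.2.9), the following plain-Python sketch simulates the first-order autoregressive process and checks, by direct recursion, that the stated initial distribution keeps the variance constant along the sequence. It assumes $|\alpha| < 1$; the function names and the seed are hypothetical choices for the example.

```python
import random


def simulate_ar1(n, alpha, sigma2, seed=0):
    """Simulate eps_1, ..., eps_n from (5.2.9), starting from
    eps_0 ~ N(0, sigma2 / (1 - alpha^2)). Requires |alpha| < 1."""
    rng = random.Random(seed)
    innovation_sd = sigma2 ** 0.5
    eps = rng.gauss(0.0, (sigma2 / (1.0 - alpha ** 2)) ** 0.5)  # eps_0
    path = []
    for _ in range(n):
        eps = alpha * eps + rng.gauss(0.0, innovation_sd)
        path.append(eps)
    return path


def ar1_variances(n, alpha, sigma2):
    """Var(eps_j) for j = 0, ..., n, using
    Var(eps_j) = alpha^2 Var(eps_{j-1}) + sigma2."""
    v = sigma2 / (1.0 - alpha ** 2)
    variances = [v]
    for _ in range(n):
        v = alpha ** 2 * v + sigma2
        variances.append(v)
    return variances


path = simulate_ar1(n=25, alpha=0.9, sigma2=1.0)
print(len(path))
print(ar1_variances(n=5, alpha=0.9, sigma2=1.0))
```

Starting the process from any other variance would make the early part of the sequence non-stationary, which is why the initial condition is part of the model specification.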
Care needs to be taken in defining the analogous ante-dependence models for the time-sequences of responses, $Y_j$. To take a very simple illustration, suppose that we wish to incorporate a linear relationship between the response $Y$ and an explanatory variable, $x$, with serially correlated random variation about the mean response. One possibility is to assume that
$$Y_j = \beta_1 x_j + \epsilon_j,$$
where the $\epsilon_j$ follow the first-order autoregressive process defined by (5.2.9), with parameters $\alpha_1$ and $\sigma_1^2$. Another is to assume that the conditional distribution of $Y_j$ given $Y_{j-1}$ is
$$Y_j \mid Y_{j-1} \sim N(\beta_2 x_j + \alpha_2 Y_{j-1}, \sigma_2^2).$$
Both are valid models, but they are not equivalent and cannot be made so by any choices for the two sets of parameter values. The conditional version of the first model is obtained by writing
$$Y_j - \beta_1 x_j = \alpha_1(Y_{j-1} - \beta_1 x_{j-1}) + Z_j$$
and re-arranging to give
$$Y_j \mid Y_{j-1} \sim N(\beta_1 x_j - \alpha_1\beta_1 x_{j-1} + \alpha_1 Y_{j-1}, \sigma_1^2).$$
We take up this point again in Chapter 7.

One very convenient feature of ante-dependence models is that the joint probability density function of $\epsilon$ follows easily from the specified form of the conditional distributions. Let $f_c(\epsilon_j \mid \epsilon_{j-1}, \ldots, \epsilon_{j-p}; \alpha)$ be the conditional probability density function of $\epsilon_j$ given $\epsilon_{j-k}$, $k = 1, \ldots, p$, and $f_0(\epsilon_1, \ldots, \epsilon_p; \alpha)$ the implied joint probability density function of $(\epsilon_1, \ldots, \epsilon_p)$. Then the joint probability density function of $\epsilon$ is
$$f(\epsilon_1, \ldots, \epsilon_n; \alpha) = f_0(\epsilon_1, \ldots, \epsilon_p; \alpha) \prod_{j=p+1}^{n} f_c(\epsilon_j \mid \epsilon_{j-1}, \ldots, \epsilon_{j-p}; \alpha).$$
Typically, $f_c(\cdot)$ is specified explicitly, from which it may or may not be easy to deduce $f_0(\cdot)$. However, if $p$ is small and $n$ large, there is usually only a small loss of information in conditioning on $\epsilon_1, \ldots, \epsilon_p$, in which case the contribution to the overall likelihood function is simply the product of the $n - p$ conditional densities, $f_c(\cdot)$. This makes estimation of $\alpha$ very straightforward. For example, if $f_c(\cdot)$ is specified as a Gaussian probability density function with mean depending linearly on the conditioning variables, so that the conditional distribution of $\epsilon_j$ given $\epsilon_{j-k}$, $k = 1, \ldots, p$, is $N(\sum_{k=1}^{p} \alpha_k\epsilon_{j-k}, \alpha_0^2)$, then the conditional maximum likelihood estimates of the $\alpha_k$ are obtained simply by ordinary least-squares (OLS) regression, treating the observed values of $\epsilon_{j-1}, \ldots, \epsilon_{j-p}$ as a set of $p$ explanatory variables attached to the response $\epsilon_j$.

Ante-dependence models are appealing for equally spaced data, less so for unequally spaced data. For example, it would be hard to give a natural interpretation to the $\alpha_k$ parameters in the above model if the measurements were not equally spaced in time. Similarly, ante-dependence models do not cope easily with data for which the times of measurement are not common to all units.

One exception to the above remark is the exponential correlation model, which can also be interpreted as an ante-dependence model of order 1. Specifically, if $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ has a multivariate Gaussian distribution with covariance structure
$$\mathrm{Cov}(\epsilon_j, \epsilon_k) = \sigma^2\exp(-\phi\,|t_j - t_k|),$$
then the conditional distribution of $\epsilon_j$ given all its predecessors depends only on the value of $\epsilon_{j-1}$. Furthermore,
$$\epsilon_j \mid \epsilon_{j-1} \sim N(\alpha_j\epsilon_{j-1}, \sigma^2(1 - \alpha_j^2)), \qquad j = 2, \ldots, n,$$
where $\alpha_j = \exp(-\phi\,|t_j - t_{j-1}|)$, and $\epsilon_1 \sim N(0, \sigma^2)$.

As a final comment on ante-dependence models, we note that their Markovian structure is not preserved if we superimpose measurement error or random effects. For example, if a sequence $Y_t$ is Markov but we observe the sequence $Y_t^* = Y_t + Z_t$, where $Z_t$ is a sequence of mutually independent measurement errors, then $Y_t^*$ is not Markov.

5.2.2
Serial correlation plus measurement error
These are models for which there are no random effects in (5.2.3), so that $\epsilon_j = W(t_j) + Z_j$, and (5.2.5) reduces to
$$\mathrm{Var}(\epsilon) = \sigma^2 H + \tau^2 I.$$
Now, the variance of each $\epsilon_j$ is $\sigma^2 + \tau^2$, and if the elements of $H$ are specified by a correlation function $\rho(u)$, so that $h_{ij} = \rho(|t_i - t_j|)$, then the variogram becomes
$$\gamma(u) = \tau^2 + \sigma^2\{1 - \rho(u)\}. \qquad (5.2.10)$$
A characteristic property of models with measurement error is that $\gamma(u)$ does not tend to zero as $u$ tends to zero. If the data include duplicate measurements at the same time, we can estimate $\gamma(0) = \tau^2$ directly as one-half the average squared difference between such duplicates. Otherwise, estimation of $\tau^2$ involves explicit or implicit extrapolation based on an assumed parametric model, and the estimate of $\gamma(0)$ may be strongly model-dependent. Figure 5.3 illustrates this point. It shows an empirical variogram calculated from measurements at unit time-spacing, and two theoretical variograms which fit the data equally well but show very different extrapolations to $u = 0$.

Fig. 5.3. A sample variogram and two different fitted models. *: sample variogram; solid line: fitted model with a non-zero intercept; dashed line: fitted model with a zero intercept.

5.2.3 Random intercept plus serial correlation plus measurement error

The simplest example of our general model (5.2.3) in which all three components of variation are present takes $U$ to be a univariate, zero-mean Gaussian random variable with variance $\nu^2$, and $d_j = 1$. Then, the realized value of $U$ represents a random intercept, that is, an amount by which all measurements on the unit in question are raised or lowered relative to the population average. The variance matrix (5.2.5) becomes
$$\mathrm{Var}(\epsilon) = \nu^2 J + \sigma^2 H + \tau^2 I,$$
where $J$ is an $n \times n$ matrix with all of its elements equal to 1. The variogram has the same form (5.2.10) as for the serial correlation plus measurement error model, except that now the variance of each $\epsilon_j$ is $\mathrm{Var}(\epsilon_j) = \nu^2 + \sigma^2 + \tau^2$ and the limit of $\gamma(u)$ as $u \to \infty$ is less than $\mathrm{Var}(\epsilon_j)$. Figure 5.4 shows this behaviour, using $\rho(u) = \exp(-u^2)$ as the correlation function of the serially correlated component. The first explicit formulation of this model, with all three components of variation in a continuous-time setting, appears to be in Diggle (1988). Note that in fitting the model to data, the information on $\nu^2$ derives from the replication of units within treatment groups.

Fig. 5.4. The variogram for a model with a random intercept, serial correlation and measurement error.

5.2.4 Random effects plus measurement error

It is interesting to speculate on reasons for the relatively late infiltration of time series models into longitudinal data analysis. Perhaps the explanation is the following. Whilst serial correlation would appear to be a natural feature of any longitudinal data model, in specific applications its effects may be dominated by the combination of random effects and measurement error. In terms of Fig. 5.4, if $\sigma^2$ is much smaller than either $\tau^2$ or $\nu^2$, the increasing curve of the variogram is squeezed between the two horizontal dotted lines, and becomes an unnecessary refinement of the model.
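The way the three variance components enter the covariance matrix and the variogram can be sketched in a few lines of plain Python. The example assumes the Gaussian correlation function $\rho(u) = \exp(-u^2)$ used for Fig. 5.4; the function names and the numerical values of $\nu^2$, $\sigma^2$ and $\tau^2$ are hypothetical.

```python
import math


def rho(u):
    """Correlation function of the serially correlated component, exp(-u^2)."""
    return math.exp(-u * u)


def covariance_matrix(times, nu2, sigma2, tau2):
    """Var(eps) = nu2 * J + sigma2 * H + tau2 * I, with h_jk = rho(|t_j - t_k|)."""
    n = len(times)
    return [[nu2 + sigma2 * rho(abs(times[j] - times[k])) + (tau2 if j == k else 0.0)
             for k in range(n)] for j in range(n)]


def variogram(u, sigma2, tau2):
    """gamma(u) = tau2 + sigma2 * {1 - rho(u)}; nu2 does not appear."""
    return tau2 + sigma2 * (1.0 - rho(u))


nu2, sigma2, tau2 = 0.5, 1.0, 0.25  # hypothetical variance components
V = covariance_matrix([0.0, 1.0, 2.0, 3.0], nu2, sigma2, tau2)

print(V[0][0])                      # total variance nu2 + sigma2 + tau2
print(variogram(0.0, sigma2, tau2))  # intercept of the variogram, tau2
print(variogram(10.0, sigma2, tau2)) # close to the asymptote tau2 + sigma2
```

The gap between the total variance and the variogram asymptote is $\nu^2$, and when $\sigma^2$ is small the variogram moves only within the narrow band between $\tau^2$ and $\tau^2 + \sigma^2$.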
If we eliminate the serially correlated component altogether, (5.2.3) reduces to
$$\epsilon_j = d_j'U + Z_j.$$
The simplest model of this kind incorporates a scalar random intercept, $U$, with $d_j = 1$ for all $j$, to give
$$\mathrm{Var}(\epsilon) = \nu^2 J + \tau^2 I.$$
The variance of each $\epsilon_j$ is $\nu^2 + \tau^2$ and the correlation between any two measurements on the same unit is
$$\rho = \nu^2/(\nu^2 + \tau^2).$$
As noted earlier, this uniform correlation structure is sometimes called a 'split-plot model' because of its formal equivalence with the correlation structure induced by the randomization for a classical split-plot experiment. However, the randomization argument gives no justification for its use with longitudinal data.

More general specifications of the random effect component lead to models with non-stationary structure. For example, if $U$ is a pair of independent Gaussian random variables with variances $\nu_1^2$ and $\nu_2^2$, and $d_j = (1, t_j)$, specifying a model with a linear time-trend for each unit but random intercept and slope between units, then $\mathrm{Var}(\epsilon_j) = \nu_1^2 + t_j^2\nu_2^2 + \tau^2$, and for $j \neq k$,
$$\mathrm{Cov}(\epsilon_j, \epsilon_k) = \nu_1^2 + t_j t_k\nu_2^2.$$

Fig. 5.5. A simulated realization of a model with random intercepts and random slopes.

Figure 5.5 shows a simulated realization of this model with $\tau^2 = 0.25$, $\nu_1^2 = 0.5$, and $\nu_2^2 = 0.01$. The fanning out of the collection of response profiles over time is a common characteristic of growth data, for which non-stationary random effects models of this kind are often appropriate. Indeed, this kind of random effects model is often referred to as the growth curve model, following its systematic development in a seminal paper by Rao (1965) and many subsequent papers including Fearn (1977), Laird and Ware (1982), Verbyla (1986) and Verbyla and Venables (1988).

Sandland and McGilchrist (1979) advocate a different kind of non-stationary model for growth data. If $S(t)$ denotes the 'size' of an animal or plant at time $t$, measured on a logarithmic scale, then the relative growth rate (RGR) is defined to be the first derivative of $S(t)$. Sandland and McGilchrist argue that the RGR is often well modelled by a stationary random process. For models of this kind, the variance of $S(t)$ increases linearly over time, in contrast to the quadratic increase of the random slope and intercept model. Also, the source of the increasing variability is within, rather than between, units. In practice, quite long sequences of measurements may be needed to distinguish these two kinds of behaviour. Cullis and
McGilchrist (1990) advocate using the Sandland and McGilchrist model routinely in the longitudinal data setting. Note that for equally spaced data the model can be implemented by analysing successive differences, $R_t = S(t) - S(t - 1)$, using a stationary model for the sequence $\{R_t\}$.

Random effects plus measurement error models as here defined can be thought of as incorporating two distinct levels of variation - between and within subjects - and might also be called two-level models of random variation. In some areas of application, the subjects may also be grouped in some way, and we may want to add a third level of random variation to describe the variability between groups. For example, in the context of animal experiments we might consider random variation between litters, between animals within litters, and between times within animals. Studies with more than two levels of variation, in this sense, are also common in social surveys. For example, in educational research we may wish to investigate variation between education authorities, between schools within authorities, between classes within schools, and between children within classes, with random sampling at every level of this hierarchy. Motivated by problems of this kind, Goldstein (1986) proposed a class of multi-level random effects models which has since been further developed by Goldstein and co-workers into a systematic methodology with an associated software package, MLn.

5.3
Model-fitting
In fitting a model to a set of data, our objective is typically to answer questions about the process which generated the data. Note that 'model' and 'process' are not synonymous. We expect only that the former will be a useful approximation to the latter in the sense that it will contain a small number of parameters whose values can be interpreted as answers to the scientific questions posed by the data. Typically, the model will contain further parameters which are not of interest in themselves, but whose values affect the inferences which we make about the parameters of direct interest.

In many applications, inference about the model parameters is an end in itself. In others, we may wish to use the fitted model for prediction of the underlying continuous-time response profile associated with an individual unit. We take up this issue in Section 5.6. Here, we consider only the model-fitting process, which we divide into four stages:

(1) formulation - choosing the general form of the model;
(2) estimation - attaching numerical values to parameters;
(3) inference - calculating confidence intervals or testing hypotheses about parameters of direct interest;
(4) diagnostics - checking that the model fits the data.

5.3.1 Formulation

Formulation of a model is essentially a continuation of the exploratory data analysis considered in Chapter 3, but directed towards the specific aspects of the data which our model aims to describe. In particular, we assume here that any essentially model-independent issues such as the identification and treatment of outliers have already been resolved. The focus of attention is then the mean and covariance structure of the data. With regard to the mean, time-plots of observed averages within treatment groups are simple and effective aids to model formulation for well-replicated data. When there are few measurements at any one time, non-parametric smoothing of the data is helpful.

For the covariance structure, we use residuals obtained by subtracting from each measurement the OLS estimate of the corresponding mean response, based on the most elaborate model we are prepared to contemplate for the mean response. As discussed in Section 4.4, for data from designed experiments in which the only relevant explanatory variables are the treatment labels we can use a saturated model, fitting a separate mean response for each combination of treatment and time. For data in which the times of measurement are not common to all units, or when there are continuously varying covariates which may affect the mean response non-linearly, the choice of a 'most elaborate model' is less obvious.

Time-plots, scatterplot matrices and empirical variogram plots of these residuals can then be used to give an idea of the underlying structure. Non-stationary variation in the time-plots of the residuals suggests the need for a transformation or an inherently non-stationary model such as a random effects model. When the pattern of variation appears to be stationary, the empirical variogram can be used to estimate the underlying covariance structure. In particular, if all units are observed at a common set of times the empirical variogram of the residuals is essentially unbiased for the underlying theoretical variogram (Diggle, 1990, Chapter 5). Diggle and Verbyla (1998) describe a version of the empirical variogram which can be used when the pattern of variation is non-stationary. Note that if an incorrect parametric model is used to define the residuals, the empirical variogram will be biased to an unknown extent.

5.3.2 Estimation

Our objective in this second stage is to attach numerical values to the parameters in the model, whose general form is
$$Y \sim \mathrm{MVN}\{X\beta, \sigma^2 V(\alpha)\}. \qquad (5.3.1)$$
We write $\hat\beta$, $\hat\sigma^2$, and $\hat\alpha$ for the estimates of the model parameters $\beta$, $\sigma^2$, and $\alpha$.
Particular methods of estimation have been derived for special cases of (5,3.1). However, it is straightforward to derive a general likelihood method. The resulting algorithm is computationally feasible provided that the sequences of measurements on individual units are not too long. H computing efficiency is paramount, more efficient algorithms can be derived to exploit the properties of special covariance structures. For example, estimation in ante-dependence models can exploit their Markovian structure. The general method follows the same lines as the development in Section (4.4), but making use of the parametric structure of t~e varia,nce matrix, V(o), Also as in Section 4.5 we favour restric~ed ma.,{lI~um likelihood (REML) estimates over classical maximum li~e~lhood :stlm~tes t? reduce the bias in the estimation of 0, although thiS IS less ImpOltant If the dimensionality of (3 is small. For given 0, equation (4.5,5) holds in a modified form as (5.3.2)
and (4.5.6) likewise holds in the form
0'2(0) = RSS(o)j(N - p),
(5.3,3)
RSS(o) = {y - x/3(o)}'V(o(l{y - x!3(o)},
(5.3.4)
where
96
PARAMETRIC MODELS FOR COVARIANCE STRUCTURE ,.'.f..•..•
and N = Ez;,l ~ is the total number of measurements on all m units. Finally, the REML estimate of a maximizes £"(a) =
m
= L{Yi - x i /3(o)}'Vi(t i j O)-l{Yi - Xi~(o)}, i=l
(5.3.6)
and
+
Nlog{RSS(a)} + ~ log I Vi(t i ; a)
&,log I X;V;(t
5.3.3
-~ [NlogI\lllS()} + ~Iog II\(!;;>I].
Inference
m
(5.3.8)
Inference about {3 ca b b with (5.3.1), implies ~ha: ssoo on the result (5.3.2) which in co' . , nJunctlon {3(0) '" MV N {{3, (12(X'V(0)-1 X)-l We assume that (539) t' }. (5.3.9) . " con mues to hold to b su stItute the REML estim t '2 " a good approximati if . (53 ) a es (1 and 0 for th .. ~I._ on, 'We ex m ..9. This gives e UlIAllOWn values of (12 and
where
f3 '" MVN({3, V),
(5.3.10)
v = a- (X'V(a)-lX)-1. 2
The immediate application of (53 10) . elements of (3. Almost as im d···t. 1Shto set standard errors on individual " . me 1a e 1S t e calculation of nfid . lor general lInear transformations of the form co ence regions 'IjJ= D/3,
I[
-2
L(Q) =
(5.3.5)
and the resulting REML estimate of (3 is /3 = /3(&). The nature of the computations involved in maximizing £" (ex) becomes clear if we recognize explicitly the block-diagonal structure of V (0). Write 'Yi for the vector of ni measurements on the ith unit and t i for the corresponding vector of times at which these measurements are taken. Write l{(tiia) for the non-zero ni x ni block of V(o) corresponding to the covariance matrix of Vi. The notation emphasizes that the source of the differences amongst the variance matrices of the different units is variation in the corresponding times of measurements. Finally, write Xi for the ni x P matrix consisting of those rows of X which correspond to the explanatory variables measured on the ith unit. Then (5.3.4) and (5.3.5) can be written as
$$L^*(\alpha) = -\tfrac{1}{2}\Big[N\log\{\mathrm{RSS}(\alpha)\} + \sum_{i=1}^{m}\log|V_i(t_i;\alpha)| + \log\Big|\sum_{i=1}^{m} X_i'V_i(t_i;\alpha)^{-1}X_i\Big|\Big]. \qquad (5.3.7)$$

Each evaluation of $L^*(\alpha)$ therefore involves $m$ determinants of order $p \times p$ for the second term in (5.3.7), and at most $m$ determinants and inverses of the $V_i(t_i; \alpha)$. In most applications, the number of distinct $V_i(t_i; \alpha)$ is much fewer than $m$. Furthermore, the dimensionality of $\alpha$ is typically very small, say three or four at most, and in our experience it is seldom necessary to use sophisticated optimization algorithms. Either the simplex algorithm of Nelder and Mead (1965) or a quasi-Newton algorithm, neither of which requires the user to provide information on the derivatives of $L^*(\alpha)$, usually works well. The only exceptions in our experience have been when an over-elaborate model is fitted to sparse data and the covariance parameters, $\alpha$, are poorly determined.

To summarize the results in this subsection, note that whilst the REML estimates maximize $L^*(\alpha)$ given by (5.3.7), maximum likelihood estimates maximize $L(\alpha)$, defined by (5.3.8).

$$\hat{\psi} \sim \mathrm{MVN}(\psi, DVD'), \qquad (5.3.11)$$

from which it follows in turn that

$$T(\psi) = (\hat{\psi} - \psi)'(DVD')^{-1}(\hat{\psi} - \psi) \qquad (5.3.12)$$

is distributed as $\chi^2_r$. Let $c_r(q)$ denote the $q$-critical value of $\chi^2_r$, so that $P\{\chi^2_r \ge c_r(q)\} = q$. Then, a $100(1 - q)$ per cent confidence region for $\psi$ is

$$\{\psi : T(\psi) \le c_r(q)\}. \qquad (5.3.13)$$

Using the well known duality between hypothesis tests and confidence regions, a test of a hypothesized value for $\psi$, say $H_0: \psi = \psi_0$, consists of rejecting $H_0$ at the $100q$ per cent level if $T(\psi_0) > c_r(q)$. Note in particular that a statistic to test $H_0: \psi = 0$ is

$$T_0 = \hat{\psi}'(DVD')^{-1}\hat{\psi}, \qquad (5.3.14)$$

whose null sampling distribution is $\chi^2_r$.
PARAMETRIC MODELS FOR COVARIANCE STRUCTURE
In implementing the above method of inference, the nuisance parameters, $\sigma^2$ and $\alpha$, are estimated once only, from the maximal model for $\beta$. The choice of the maximal model is sometimes difficult, especially when the times of measurement are highly variable between units or when there are continuously varying covariates. An alternative, step-up approach may then be useful, beginning with a very simple model for $\beta$ and considering the maximized log-likelihoods from a sequence of progressively more elaborate models. At each stage, the maximized log-likelihood is $L(\hat{\alpha})$, where $L(\alpha)$ is defined by (5.3.8) with the current design matrix $X$, and $\hat{\alpha}$ maximizes $L(\alpha)$. Let $L_k$ be the maximized log-likelihood associated with the $k$th model in this nested sequence, and $p_k$ the number of columns in the corresponding design matrix. The log-likelihood ratio statistic to test the adequacy of the $k$th model within the $(k + 1)$st is

$$W_k = 2(L_{k+1} - L_k). \qquad (5.3.15)$$

If the $k$th model is correct, $W_k$ is distributed as chi-squared on $p_{k+1} - p_k$ degrees of freedom. This likelihood ratio testing approach to inference is convenient when the data are unbalanced and there is no obvious maximal model. However, the earlier approach based on the multivariate Gaussian distribution for $\hat{\beta}$ is in our view preferable when it is feasible because it more clearly separates the inferences about $\beta$ and $\alpha$.
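As a rough illustration of the step-up procedure, the following sketch computes the log-likelihood ratio statistic, twice the difference between successive maximized log-likelihoods, and refers it to chi-squared on the difference in the numbers of design-matrix columns. The function names are hypothetical, and the log-likelihood values in the usage lines are invented purely to show the arithmetic. The tail probability uses the standard closed-form series for integer degrees of freedom, so only the Python standard library is assumed.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P{chi-squared(df) > x} for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lr_test(loglik_k, loglik_k1, p_k, p_k1):
    """Return (W_k, p-value) for testing model k within the larger model k + 1."""
    w = 2.0 * (loglik_k1 - loglik_k)
    return w, chi2_sf(w, p_k1 - p_k)


# Hypothetical maximized log-likelihoods for two nested models with
# 4 and 6 columns in their design matrices.
w, p = lr_test(-100.0, -97.0, 4, 6)
print(w, round(p, 4))
```

The same `chi2_sf` helper can be used to check tail probabilities quoted for the Wald-type statistics of Section 5.3.3, since both kinds of statistic are referred to chi-squared distributions.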
5.3.4 Diagnostics

Diagnostic checking of the model against the data completes the model-fitting process. The aim of diagnostic checking is to compare the data with the fitted model in such a way as to highlight any systematic discrepancies. Since our models are essentially models for the mean and covariance structure of the data, simple and highly effective checks are to superimpose the fitted mean response profiles on a time-plot of the average observed response within each combination of treatment and time, and to superimpose the fitted variogram on a plot of the empirical variogram. Simple plots of this kind can be very effective in revealing inconsistencies between data and model which were missed at earlier stages. If so, these can be incorporated into a revised model, and the fitting process repeated. More formal diagnostic criteria for regression models are discussed by Cook and Weisberg (1982) and Atkinson (1985) in the context of linear models for cross-sectional data and, in a non-parametric setting, by Hastie and Tibshirani (1990).

5.4 Examples

In this section we give three applications of the model-based approach. These involve:

(1) the data set of Example 1.4 on the protein content of milk samples from cows on each of three different diets;
(2) a set of data on the body-weights of cows in a 2 by 2 factorial experiment;
(3) the data set of Example 1.1, on CD4+ cell numbers.

Example 5.1. Protein contents of milk samples

These data were shown in Fig. 1.4. They consist of measurements of protein content in up to 19 weekly samples taken from each of 79 cows allocated to one of three different diets: 25 cows received a barley diet, 27 cows a mixed diet of barley and lupins, and 27 cows a diet of lupins only. The initial inspection of the data reported in Chapter 1 suggested that the mean response profiles are approximately parallel, showing an initial sharp drop associated with a settling-in period, followed by an approximately constant mean response over most of the experimental period and a possible gentle rise towards the end. The empirical variogram, shown in Fig. 3.16, exhibits a smooth rise with increasing lag, levelling out within the range spanned by the data. Both the mean and covariance structure therefore seem amenable to parametric modelling. Our provisional model for the mean response profiles takes the form

(5.4.1)

where $g = 1, 2, 3$ denotes treatment group, and time, $t$, is measured in weeks. Our provisional model for the covariance structure takes the form of (5.2.10) with an exponential correlation function. For the model-fitting, it is convenient to extract $\sigma^2$ as a scale parameter and reparametrize to $\alpha_1 = \tau^2/\sigma^2$ and $\alpha_2 = \nu^2/\sigma^2$. With this parametrization the theoretical variance of each measurement is $\sigma^2(1 + \alpha_1 + \alpha_2)$ and the theoretical variogram is

$$\gamma(u) = \sigma^2\{\alpha_1 + 1 - \exp(-\alpha_3 u)\}.$$
The REML estimates of the model parameters, with estimated standard errors and correlation matrix of the mean parameters, are given in Table 5.1. This information can now be used to make inferences about the mean response profiles as discussed in Section 5.3.3. We give two examples.

Table 5.1. REML estimates for the model fitted to data on protein content of milk samples: (a) Mean response; (b) Covariance structure.

(a)
Parameter       Estimate    SE         Correlation matrix
$\beta_{01}$    4.15        0.054       1.00
$\beta_{02}$    4.05        0.053       0.52   1.00
$\beta_{03}$    3.94        0.053       0.52   0.53   1.00
$\beta_1$       -0.229      0.016      -0.61  -0.62  -0.62   1.00
$\beta_2$       0.0078      0.0079     -0.60  -0.06  -0.06  -0.33   1.00
$\beta_3$       0.00056     0.00050     0.01   0.01   0.02   0.24  -0.93   1.00

(b) $\gamma(u) = \sigma^2\{\alpha_1 + 1 - \exp(-\alpha_3 u)\}$, $\mathrm{Var}(Y) = \sigma^2(1 + \alpha_1 + \alpha_2)$.
Parameter       Estimate
$\sigma^2$      0.0635
$\alpha_1$      0.3901
$\alpha_2$      0.1007
$\alpha_3$      0.1674

Because our primary interest is in whether the diets affect the mean response profiles, we first test the hypothesis that $\beta_{01} = \beta_{02} = \beta_{03}$. To do this, we need only consider the REML estimates and variance submatrix of $\beta_0 = (\beta_{01}, \beta_{02}, \beta_{03})$. This submatrix is

$$\hat{V}_1 = \begin{bmatrix} 0.0029 & 0.0015 & 0.0015 \\ 0.0015 & 0.0028 & 0.0014 \\ 0.0015 & 0.0014 & 0.0028 \end{bmatrix}.$$

Now, using (5.3.11) and (5.3.12) on this three-parameter system, we proceed as follows. The hypothesis of interest is $\psi = D\beta_0 = 0$, where

$$D = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}.$$

The REML estimate of $\psi$ is

$$\hat{\psi} = D\hat{\beta}_0 = (0.10, 0.11).$$

From (5.3.12), the test statistic is

$$T_0 = \hat{\psi}'(D\hat{V}_1D')^{-1}\hat{\psi} = 15.98,$$

which we refer to critical values of $\chi^2_2$. Since $P\{\chi^2_2 > 15.98\} = 0.0003$, we clearly reject $\psi = 0$ and conclude that diet affects the mean response profile. Furthermore, the ordering of the three estimates, $\hat{\beta}_{0g}$, is sensible, with the parameter estimate for the mixed diet lying between those for the two pure diets.

A question of secondary interest is whether there is a rise in the mean response towards the end of the experiment. The hypothesis to test this is $\beta_2 = \beta_3 = 0$. The variance submatrix of $(\hat{\beta}_2, \hat{\beta}_3)$ is $\hat{V}_2$, where

$$10^6\hat{V}_2 = \begin{bmatrix} 62.094 & -3.631 \\ -3.631 & 0.246 \end{bmatrix}.$$

The test statistic is $T_0 = 1.29$, which we again refer to critical values of $\chi^2_2$. Since $P\{\chi^2_2 > 1.29\} = 0.525$, we do not reject the null hypothesis in favour of a late rise in the mean response. Refitting the simpler model with $\beta_2 = \beta_3 = 0$ leaves the other parameter estimates essentially unchanged, and does not alter the conclusions about the treatment effects.

Results for the model with $\beta_2 = \beta_3 = 0$, but using maximum likelihood estimation, are given in Diggle (1990, Section 5.6). Verbyla and Cullis (1990) suggest that the mean response curves in the three treatment groups may not be parallel, and fit a model in which the difference, $\mu_1(t) - \mu_3(t)$, is linear in $t$. A model which allows the difference in mean response between any two treatments to be linear in $t$ replaces $\beta_{02}$ and $\beta_{03}$ in (5.4.1) by $\beta_{021} + \beta_{022}t$ and $\beta_{031} + \beta_{032}t$, respectively. Using this eight-parameter specification of the mean response profiles as our working model, a hypothesis of possible interest is that $\beta_{022} = \beta_{032} = 0$, that is that (5.4.1) gives an adequate description of the data. Fitting this enlarged model, we obtain

$$(\hat{\beta}_{022}, \hat{\beta}_{032}) = (-0.0042, -0.0095),$$

with estimated variance matrix, $\hat{V}_3$, where

$$10^6\hat{V}_3 = \begin{bmatrix} 33.41 & 18.92 \\ 18.92 & 41.22 \end{bmatrix}.$$
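The chi-squared statistics quoted in this example can be checked, up to rounding, with a few lines of arithmetic. The sketch below evaluates the statistic (5.3.14), $T_0 = \hat{\psi}'(DVD')^{-1}\hat{\psi}$, for a two-dimensional $\psi$, using the rounded estimates and variance matrices printed in the text. The helper names are hypothetical, and the contrast matrix `D` for the diet comparison is an assumption, chosen because it reproduces the reported $\hat{\psi} = (0.10, 0.11)$. Because the inputs are rounded to two or three significant figures, the results agree only approximately with the reported 15.98 and 2.210.

```python
import math


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def wald_2d(psi, cov):
    """T0 = psi' cov^{-1} psi for a 2-vector psi and a 2 x 2 variance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(psi[i] * inv[i][j] * psi[j] for i in range(2) for j in range(2))


def chi2_sf_2df(x):
    """P{chi-squared on 2 degrees of freedom > x}, which equals exp(-x / 2)."""
    return math.exp(-x / 2.0)


# Diet effect: intercept estimates and their variance submatrix (rounded values).
beta0 = [4.15, 4.05, 3.94]
V1 = [[0.0029, 0.0015, 0.0015],
      [0.0015, 0.0028, 0.0014],
      [0.0015, 0.0014, 0.0028]]
D = [[1, -1, 0], [0, 1, -1]]  # assumed contrast matrix
psi_hat = [sum(D[r][k] * beta0[k] for k in range(3)) for r in range(2)]
T_diet = wald_2d(psi_hat, mat_mul(mat_mul(D, V1), transpose(D)))

# Non-parallel profiles: slope-difference estimates and their variance matrix.
V3 = [[33.41e-6, 18.92e-6], [18.92e-6, 41.22e-6]]
T_slopes = wald_2d([-0.0042, -0.0095], V3)

print(round(T_diet, 2), round(T_slopes, 2))
print(round(chi2_sf_2df(15.98), 4), round(chi2_sf_2df(2.210), 2))
```

With these rounded inputs the diet statistic comes out near 16.3 and the slope statistic near 2.19, so the conclusions drawn in the text are unaffected by the rounding.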
The statistic to test $\beta_{022} = \beta_{032} = 0$ is $T_0 = 2.210$, which is to be compared with critical values of $\chi^2_2$. Since $P\{\chi^2_2 > 2.210\} = 0.33$, there is no real evidence against $\beta_{022} = \beta_{032} = 0$. Even if we consider $\beta_{032}$ alone, the statistic to test $\beta_{032} = 0$ is $T_0 = 2.208$, on one degree of freedom, which is still not significant. This analysis is not directly comparable with the one reported in Verbyla and Cullis (1990), as they consider only the last 16 time-points and use a non-parametric description of the mean response profile, $\mu_1(t)$, for cows on the barley diet.

Our provisional model for the data is summarized by the parameter estimates in Table 5.1 but with the simplification that $\beta_2 = \beta_3 = 0$. Figure 5.6 compares the empirical and fitted mean response profiles and variograms. The fit to the variogram appears to be satisfactory. The rise in the empirical variogram at $u = 18$ can be discounted as it is based on only 41 observed pairwise differences. Since these pairwise differences derive from 41 different animals and are therefore approximately independent, elementary calculations give a 95 per cent confidence interval for the mean of $\hat{\gamma}(18)$ as $0.127 \pm 0.052$, which includes the fitted value, $\gamma(18) = 0.085$.

At first sight, the fit to the mean response profiles appears less satisfactory. However, the apparent lack of fit is towards the end of the experiment, by which time the responses from almost half of the animals are missing. Clearly, this increases the variability in the observed mean responses. Also, the strong positive correlations between successive measurements mean that a succession of positive residuals is less significant than it would be for uncorrelated data. Finally, and most interestingly, we might question whether calving date is independent of the measurement process. If later-calving cows are also more likely to produce milk with a lower protein content, we would expect the observed mean responses to rise towards the end of the study. The implications for the interpretation of the fitted model are subtle, and will be pursued in Chapter 13.
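The fitted variogram value quoted for lag 18 can be reproduced from the covariance estimates in Table 5.1(b). The sketch below is only illustrative: the function names are hypothetical, and it assumes that the four estimates in the table correspond, in order, to $\sigma^2$, $\alpha_1$, $\alpha_2$ and $\alpha_3$, an assignment that is consistent with the reported fitted value of 0.085.

```python
import math


def variogram(u, sigma2, alpha1, alpha3):
    """Theoretical variogram gamma(u) = sigma2 * {alpha1 + 1 - exp(-alpha3 * u)}."""
    return sigma2 * (alpha1 + 1.0 - math.exp(-alpha3 * u))


def total_variance(sigma2, alpha1, alpha2):
    """Theoretical variance of a single measurement, sigma2 * (1 + alpha1 + alpha2)."""
    return sigma2 * (1.0 + alpha1 + alpha2)


# Estimates from Table 5.1(b), in the assumed order sigma^2, alpha_1, alpha_2, alpha_3.
sigma2, alpha1, alpha2, alpha3 = 0.0635, 0.3901, 0.1007, 0.1674

fitted_18 = variogram(18, sigma2, alpha1, alpha3)

# Interval quoted in the text for the mean of the empirical variogram at lag 18.
lower, upper = 0.127 - 0.052, 0.127 + 0.052
inside = lower <= fitted_18 <= upper

print(round(fitted_18, 3), inside)
print(round(total_variance(sigma2, alpha1, alpha2), 4))
```

The variogram levels off at $\sigma^2(1 + \alpha_1)$, below the total variance $\sigma^2(1 + \alpha_1 + \alpha_2)$; the gap between the two is the between-animal component $\nu^2 = \sigma^2\alpha_2$.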
Fig. 5.6. Milk protein data: comparison between observed and fitted mean response profiles and variograms: (a) mean response profiles --: barley diet; .........: mixed diet; - - -: lupins diet; (b) variograms --: sample variogram; - - -: fitted variogram.

Example 5.2. Body-weights of cows

These data, provided by Dr Andrew Lepper (CSIRO Division of Animal Health, Melbourne) consist of body-weights of 27 cows, measured at 23 unequally spaced times over a period of about 22 months. They are listed in Table 5.2. For the analysis, we use a time-scale which runs from 0 to 66, each unit representing 10 days. One animal showed an abnormally low weight-gain throughout the experiment and we have
4.7 4.868 4.868 4.828
4.787 4.605 4.745 4.745
5.075 5.136 5.165 4.905 5.298 5.416 5.075 5.193 5.22 5.273 5.323 5.193 5.011 5.136 5.273 5.347 5.011 5.136 5.193 5.106 5.298 4.977 5.043 5.136 5.106 5.165 5.011 4.828 4.942 5.298 5.106 5.22 4.868 5.043 5.273 5.165 4.905 5.011 5.106 5.011
5.298 5.323 5.416 5.617 5.481 5.521 5.521 5.46 5.416 5.541 5.438 5.561
5.652 5.501 5.561 5.371 5.438 5.394 5.371 5.298 5.273 5.58 5.22 5.298 5.561 5.416 5.501 5.635 5.347 5.347 5.521 5.541
5.298
5.371
5.416 5.416
removed it from the model-fitting analysis. The remaining 26 animals were allocated amongst treatments in a 2 by 2 factorial design. The two factors were presence/absence of iron dosing and of infection by the organism M. paratuberculosis. The replications in the four groups were: 4 x control, 3 x iron only, 9 x infection only, 10 x iron and infection. For further details of the biological background to these data, see Lepper et al. (1989).

We use a log-transformation of the bodyweights as the response variable, to stabilize the variance over time. The resulting data are shown in Fig. 5.7. The empirical variogram of OLS residuals from a saturated model for the mean response, that is fitting a separate parameter for each combination of time and treatment, is shown in Fig. 5.8. As in the previous
section, the model described in Section 5.2.3 seems reasonable, although the detailed behaviour is rather different than that of the milk protein data. For these data the measurement error variance, $\tau^2$, seems to be relatively small, as we would expect, whereas the between-animal variance, $\nu^2$, is very large. Also, the behaviour of the empirical variogram near the origin suggests that a Gaussian correlation function might be more appropriate than an exponential. With a reparametrization to $\alpha_1 = \tau^2/\sigma^2$ and $\alpha_2 = \nu^2/\sigma^2$, this suggests a model in which the theoretical variance of log-weight is $\sigma^2(1 + \alpha_1 + \alpha_2)$ and the theoretical variogram takes the form

$$\gamma(u) = \sigma^2\{\alpha_1 + 1 - \exp(-\alpha_3 u^2)\}.$$

Fig. 5.7. Log-bodyweights of cows in a 2 by 2 factorial experiment: (a) control; (b) iron dosing; (c) infection; (d) iron dosing and infection.

Fig. 5.8. Log-bodyweights of cows: sample variogram of residuals.
Fitting the mean response profiles into the framework of the general linear model, $\mu = X\beta$, proves to be difficult. This is not untypical of growth studies, in which the mean response shows an initially sharp increase before gradually levelling off as it approaches an asymptote. Low-degree polynomials provide poor approximations to this behaviour. A way forward is to use 23 parameters to describe the control mean response at each of the 23 time-points, and to model the factorial effects, or differences between groups, parametrically. This would not be sensible if the mean response profile were of intrinsic interest, but this is not so here. Interest focuses on the factorial effects. This situation is not uncommon, and the approach of modelling only contrasts between treatments, rather than treatment means themselves, has been advocated by a number of authors including Evans and Roberts (1979), Cullis and McGilchrist (1990) and Verbyla and Cullis (1990). Figure 5.9 shows the observed mean response in the control group, and the differences in mean response for the other three groups relative to the
Fig. 5.9. Log-bodyweights of cows: observed and fitted mean response profiles: (a) control --: observed; - - -: fitted; (b) difference between control and treated, with fitted contrasts shown as smooth curves .........: iron; - - -: infection; - -: both.

Fig. 5.10. Log-bodyweights of cows: observed and fitted variograms. sample variogram; - - -: fitted model.
control. These differences do seem amenable to a linear modelling approach. The fitted curves are based on a model in which each treatment contrast is a quadratic function of time, whereas the control mean is described by a separate parameter at each of the 23 time-points. Note that the fit to the observed control mean is nevertheless not exact because it includes indirect information from the other three treatment groups, as a consequence of our using a parametric model for the treatment contrasts. The standard errors of the fitted control means range from 0.044 to 0.046. Figure 5.10 shows the fit to the empirical variogram. This seems to be satisfactory, and confirms that the dominant component of variation is between animals; the estimated parameters in the covariance structure are $\hat{\sigma}^2 = 0.0016$, $\hat{\alpha}_1 = 0.353$, $\hat{\alpha}_2 = 4.099$, and $\hat{\alpha}_3 = 0.0045$.

Table 5.3 gives the parameter estimates of direct interest, with their estimated standard errors. For these data, the principal objective is to describe the main effects of the two factors and the interaction between them. Note that in our provisional model for the data, each factorial effect is a quadratic function of time, representing the difference in mean log-bodyweight between animals treated at the higher and lower levels of the effect in question. One question is whether we could simplify the description by using linear rather than quadratic effects. Using the notation of Table 5.3, this amounts to a test of the hypothesis that $\beta_{21} = \beta_{22} = \beta_{23} = 0$. The chi-squared test statistic to test this hypothesis is $T_0 = 10.785$ on three degrees of freedom, corresponding to a p-value of 0.020; we therefore retain the quadratic description of the factorial effects. Attention now focuses on the significance of the two main effects and of the interaction between them. Again using the notation of Table 5.3, the hypothesis of no main effect for iron is $\beta_{01} = \beta_{11} = \beta_{21} = 0$, that of no main effect for infection is $\beta_{02} = \beta_{12} = \beta_{22} = 0$, and that of no interaction is $\beta_{03} = \beta_{01} + \beta_{02}$; $\beta_{13} = \beta_{11} + \beta_{12}$; $\beta_{23} = \beta_{21} + \beta_{22}$. The chi-squared statistics, $T_0$, each on three degrees of freedom, and their associated p-values are as follows:

Iron - $T_0 = 1.936$, $p = 0.586$;
Infection - $T_0 = \ldots$, $p = \ldots$;
Interaction - $T_0 = 0.521$, $p = 0.914$.

The conclusion is that there is a highly significant main effect of infection, whereas the main effect of iron and, not surprisingly, the interaction effect are not significant, that is iron dosing has failed to alleviate the
Table 5.3. Estimates of parameters defining the treatment contrasts in the model fitted to data on log-bodyweights of cows. The contrast between the mean response in treatment group $g$ and the control group is $\mu_g(t) = \beta_{0g} + \beta_{1g}(t - 33) + \beta_{2g}(t - 33)^2$.

Treatment    Parameter       Estimate      SE
Iron         $\beta_{01}$    -0.079        0.067
             $\beta_{11}$    -0.00050      0.00071
             $\beta_{21}$    -0.0000040    0.000033
Infection    $\beta_{02}$    -0.159        0.053
             $\beta_{12}$    -0.0009S      0.00056
             $\beta_{22}$    0.0000516     0.00026
Both         $\beta_{03}$    -0.239        0.052
             $\beta_{13}$    -0.0020       0.00055
             $\beta_{23}$    0.000062      0.000025
adverse effect of infection. Refitting a reduced model with only an infection effect, we obtain the following estimate for the contrast between the mean response with and without infection:

$$\hat{\mu}(t) = -0.167 - 0.00134(t - 33) + 0.0000566(t - 33)^2.$$
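One way to read this fitted contrast is to evaluate it at a few times and, because the response is log-bodyweight, to exponentiate it so that it becomes a ratio of geometric mean weights. The sketch below does this at the start, middle and end of the 0 to 66 time-scale; the function name is hypothetical, and the sign convention (infected minus uninfected) is an assumption based on the description of the effect as adverse.

```python
import math


def infection_contrast(t):
    """Fitted contrast in mean log-bodyweight at time t (units of 10 days)."""
    d = t - 33.0
    return -0.167 - 0.00134 * d + 0.0000566 * d * d


for t in (0, 33, 66):
    c = infection_contrast(t)
    # exp(c) is the implied ratio of geometric mean bodyweights.
    print(t, round(c, 3), round(math.exp(c), 3))
```

At the centring time $t = 33$ the contrast equals the intercept, $-0.167$, which corresponds to a weight ratio of roughly 0.85.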
This equation and the associated standard errors of the three parameters provide a compact summary of the conclusions about treatment effects. The estimated standard errors for the intercept, linear, and quadratic coefficients are 0.042, 0.00044, and 0.00002, respectively.

Example 5.3. Estimation of the population mean CD4+ curve

For the CD4+ data of Example 1.1 there are $N = 2376$ observations of CD4+ cell numbers on $m = 369$ men infected with the HIV virus. Time is measured in years with origin at the date of seroconversion, which is known approximately for each individual. In this example, our primary interest is in the progression of mean CD4+ count as a function of time since seroconversion, although in Section 5.5 we will also consider these data from the point of view of predicting the CD4+ trajectories of individual subjects. Our parametric model uses square-root-transformed CD4+ count as the response variable and specifies the mean time-trend, $\mu(t)$, as constant prior to seroconversion and quadratic in time thereafter. In the linear model for the mean response, we also include as explanatory variables: smoking (packs per day); recreational drug use (yes/no); numbers of sexual partners; and depressive symptoms as measured by the CESD scale.
Fig. 5.11. CD4+ cell counts with parametric estimate and pointwise confidence limits (plus and minus two standard errors) for the mean time-trend.

Figure 5.11 shows the data with the estimated mean time-trend and pointwise confidence limits calculated as plus and minus two standard errors, re-expressed on the original scale. The standard errors were calculated from a parametric model for the covariance structure of the kind introduced in Section 5.2.3. Specifically, we assume that the variance of each measurement is $\tau^2 + \sigma^2 + \nu^2$ and that the variogram within each unit is $\gamma(u) = \tau^2 + \sigma^2\{1 - \exp(-\alpha u)\}$. Figure 5.12 shows the empirical variogram, using a grouping interval of 0.25 years, and a parametric fit obtained by maximum likelihood, which gave estimates $\hat{\tau}^2 = 10.7$, $\hat{\sigma}^2 = 25.0$, $\hat{\nu}^2 = 2.3$, and $\hat{\alpha} = 0.23$.

Figure 5.11 suggests that the mean number of CD4+ cells is approximately constant at close to 1000 cells prior to seroconversion. Within the first six months after seroconversion, the mean drops to around 700. Subsequently, the rate of loss is much slower. For example, it takes nearly three years before the mean number reaches 500, the level at which, when these data were collected, it was recommended that prophylactic AZT therapy should begin (Volberding et al., 1990). As with Example 5.1 on the protein contents of milk samples, the interpretation of the fitted mean response is complicated by the possibility that subjects who become very ill may drop out of the study, and these subjects may also have unusually low CD4+ counts. In Chapter 13 we shall discuss ways of handling potentially informative dropout in longitudinal studies.

The upturn in the estimated mean time-trend from approximately four years after seroconversion is, of course, an artificial by-product of the assumption that the time-trend is quadratic. As in our earlier discussions of these data a non-parametric, or at least non-linear, treatment of $\mu(t)$ would be preferable, and we shall take this up in Chapter 14.

Fig. 5.12. CD4+ cell counts: observed and fitted variograms ●: empirical variogram; - - -: empirical variance; --: fitted model.

5.5 Estimation of individual trajectories

The CD4+ data provide an example in which the mean response is of interest, but not of sole interest. For counselling individual patients, it is that individual's history of CD4+ levels which is relevant. However, because observed CD4+ levels are highly variable over time, in part because of a substantial component of variation due to measurement error, the individual's observed trajectory is not reliable. This problem could arise in either a non-parametric or a parametric setting. Because our motivation is the CD4+ data, we will use the same modelling framework as in the previous section.

The data again consist of measurements, $y_{ij}$, $j = 1, \ldots, n_i$; $i = 1, \ldots, m$, and associated times, $t_{ij}$. The model for the data is that

$$Y_{ij} = \mu(t_{ij}) + U_i + W_i(t_{ij}) + Z_{ij},$$

where the $U_i$ are mutually independent $N(0, \nu^2)$, the $Z_{ij}$ are mutually independent $N(0, \tau^2)$ and the $\{W_i(t)\}$ are mutually independent, zero-mean stationary Gaussian processes with covariance function $\sigma^2\rho(u)$. Our objective is to construct a predictor for the CD4+ number of an individual, $i$, at time $t$. Because the $Z_{ij}$ represent measurement error, they make no direct contribution to the predictor, $\hat{Y}_i(t)$, although, as we shall see, they do affect the prediction variance of $\hat{Y}_i(t)$. Our predictor therefore takes the form

$$\hat{Y}_i(t) = \hat{\mu}(t) + \hat{U}_i + \hat{W}_i(t) = \hat{\mu}(t) + \hat{R}_i(t),$$

say.

For the time being, we shall assume that $\mu(t)$ is specified by a general linear model, and can therefore be estimated using the methods described in Section 5.3. To construct $\hat{R}_i(t)$ we proceed as follows. For an arbitrary set of times, $u = (u_1, \ldots, u_p)$, let $R_i$ be the $p$-element random vector with $r$th element $R_i(u_r)$. Let $Y_i$ be the $n_i$-element vector with $j$th element $Y_{ij} = \mu(t_{ij}) + R_i(t_{ij}) + Z_{ij}$, and write $t_i = (t_{i1}, \ldots, t_{in_i})$. Then, as our predictor for $R_i$ we use the conditional expectation

$$\hat{R}_i = E(R_i \mid Y_i).$$

Under our assumed model, $R_i$ and $Y_i$ have a joint multivariate Gaussian distribution. Furthermore, for any two vectors, $u$ and $t$ say, we can define a matrix, $G(u, t)$, whose $(r, j)$th element is the covariance between $R_i(u_r)$ and $R_i(t_j)$, namely $\nu^2 + \sigma^2\rho(|u_r - t_j|)$. Then, the covariance matrix of $Y_i$ is $\tau^2 I + G(t_i, t_i)$ and the complete specification of the joint distribution of $R_i$ and $Y_i$ is

$$\begin{bmatrix} R_i \\ Y_i \end{bmatrix} \sim \mathrm{MVN}\left\{ \begin{bmatrix} 0 \\ \mu_i \end{bmatrix}, \begin{bmatrix} G(u, u) & G(u, t_i) \\ G(t_i, u) & \tau^2 I + G(t_i, t_i) \end{bmatrix} \right\},$$

where $\mu_i$ has $j$th element $\mu(t_{ij})$.

Using standard properties of the multivariate Gaussian distribution it now follows that

$$\hat{R}_i = G(u, t_i)\{\tau^2 I + G(t_i, t_i)\}^{-1}(Y_i - \mu_i), \qquad (5.5.1)$$

with variance matrix $\mathrm{Var}(\hat{R}_i)$ defined as

$$\mathrm{Var}(R_i \mid Y_i) = G(u, u) - G(u, t_i)\{\tau^2 I + G(t_i, t_i)\}^{-1}G(t_i, u). \qquad (5.5.2)$$
Note that when $\tau^2 = 0$ and $u = t_i$, the predictor, $\hat{R}_i$, reduces to $Y_i - \mu_i$ with zero variance, as it should - when there is no measurement error the data are a perfect prediction of the true CD4+ numbers at the observation times. More interestingly, when $\tau^2 > 0$, $\hat{R}_i$ reflects a compromise between $Y_i - \mu_i$ and zero, tending towards the latter as $\tau^2$ increases. Again, this makes intuitive sense - if measurement error variance is large, an individual's observed CD4+ counts are unreliable, and the prediction for that individual should be moved away from the individual's data and towards the population mean trajectory.
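The behaviour just described can be seen numerically. The sketch below evaluates the conditional mean $G(u, t_i)\{\tau^2 I + G(t_i, t_i)\}^{-1}(Y_i - \mu_i)$ of (5.5.1) and the diagonal of the conditional variance (5.5.2), assuming an exponential correlation function $\rho(d) = \exp(-\alpha d)$ as in Example 5.3. The function names, the observation times and the residuals $Y_i - \mu_i$ are hypothetical; the variance parameters are the estimates reported for the CD4+ data. A small Gaussian-elimination routine is used so that only the Python standard library is needed.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def G(u, t, nu2, sigma2, alpha):
    """Matrix with (r, j) element nu2 + sigma2 * exp(-alpha * |u_r - t_j|)."""
    return [[nu2 + sigma2 * math.exp(-alpha * abs(ur - tj)) for tj in t] for ur in u]


def predict_R(u, t, resid, tau2, nu2, sigma2, alpha):
    """Conditional means and variances of R_i(u_r) given residuals resid at times t."""
    n = len(t)
    Gtt = G(t, t, nu2, sigma2, alpha)
    A = [[Gtt[i][j] + (tau2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    Gut = G(u, t, nu2, sigma2, alpha)
    w = solve(A, resid)  # {tau2 I + G(t, t)}^{-1} (y - mu)
    mean = [sum(g * wi for g, wi in zip(row, w)) for row in Gut]
    var = []
    for row in Gut:
        z = solve(A, row)  # {tau2 I + G(t, t)}^{-1} G(t, u_r)
        var.append(nu2 + sigma2 - sum(g * zi for g, zi in zip(row, z)))
    return mean, var


# Hypothetical observation times (years) and residuals on the square-root scale.
times = [0.5, 1.0, 1.5, 2.0]
resid = [3.0, -1.0, 4.0, 2.0]

# Reported CD4+ estimates: tau^2 = 10.7, nu^2 = 2.3, sigma^2 = 25.0, alpha = 0.23.
shrunk, shrunk_var = predict_R(times, times, resid, 10.7, 2.3, 25.0, 0.23)
exact, exact_var = predict_R(times, times, resid, 0.0, 2.3, 25.0, 0.23)

print([round(v, 2) for v in shrunk], [round(v, 2) for v in shrunk_var])
print([round(v, 2) for v in exact], [round(v, 2) for v in exact_var])
```

With $\tau^2 = 0$ the predictions reproduce the residuals with zero variance, whereas with the reported $\tau^2$ the vector of predictions is pulled towards zero, that is towards the population mean trajectory.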
In practice, formulae (5.5.1) and (5.5.2) would be evaluated replacing $\mu_i$ by the fitted linear model with $\hat{\mu} = X\hat{\beta}$. They therefore assume that $\mu(t)$, $\beta$ and the covariance matrix, $V$, are known exactly. They should hold approximately in large samples when $\mu(t)$, $\beta$, and $V$ are estimated. It is possible to adjust the prediction standard errors to reflect the uncertainty in $\hat{\beta}$ but the corresponding adjustment for the uncertainty in the estimated covariance structure is analytically intractable.
Example 5.4. Prediction of individual CD4+ trajectories

We now apply these ideas to the CD4+ data using the same estimates of $\mu(t)$ and $V$ as in Example 5.3. Figure 5.13 shows the estimated mean time-trend as a thick solid curve together with the data for two men and their predicted trajectories. The predictions were calculated from (5.5.1) in conjunction with the estimated population mean trajectory $\hat{\mu}(t)$ and covariance matrix $\hat{V}$. These predicted curves are referred to as empirical Bayes estimates. Note that the predicted trajectories smooth out the big fluctuations in observed CD4+ numbers, reflecting the substantial measurement error variance, $\tau^2$. In particular, for individual A the predicted trajectory stays above the 500 threshold for AZT therapy throughout, whereas the observed CD4+ count dipped below this threshold during the fourth year post-seroconversion. For individual B, the fluctuations in observed CD4+ numbers are much smaller and the prediction tracks the data closely. Note that in both cases the general level of the observed data is preserved in the predicted trajectories. This reflects the effect of the component of variance, $\nu^2$, between individuals; the model recognizes that some men are intrinsically high responders and some low responders, and does not force the predictions towards the population mean level.

5.6 Further reading
The subject of parametric modelling for longitudinal data continues to generate an extensive literature. Within the scope of the linear model with correlated errors, most of the ideas are now well established, and the specific models proposed by different authors often amount to sensible variations on a familiar theme. Contributions not cited earlier include Pantula and Pollock (1985), Ware (1985), Jones and Ackerson (1990), Munoz et al. (1992) and Pourahmadi (1999). Jones (1993) describes a state-space approach to linear modelling of longitudinal data with serial correlation, drawing on the ideas of Kalman filtering (Kalman, 1960). Goldstein (1995) reviews the multi-level modelling approach. A comprehensive discussion of the linear mixed model can be found in Verbeke and Molenberghs (2000). Nonlinear regression models with correlated error structure are considered in Chapter 14.
Fig. 5.13. CD4+ cell counts and empirical Bayes predictions for two subjects, with parametric estimate of population mean response profile.
6 Analysis of variance methods

6.1 Preliminaries

In this chapter we describe how simple methods for the analysis of data from designed experiments, in particular the analysis of variance (ANOVA), can be adapted to longitudinal studies. ANOVA has limitations which prevent its recommendation as a general approach for longitudinal data. The first is that it fails to exploit the potential gains in efficiency from modelling the covariance among repeated observations. A second is that, to a greater or lesser extent, ANOVA is a simple method only with a complete, balanced array of data. This second limitation has been substantially diminished by the development of general methodology for incomplete data problems (Little and Rubin, 1987), but the first is fundamental. Whether it is crucial in practice depends on the details of the particular application, for example the size and shape of the data array. As noted in Section 4.1, modelling the covariance structure is most valuable for data consisting of a small number of long sequences of measurements.

Throughout this chapter, we denote the data by a triply subscripted array,

$$y_{hij}, \quad h = 1, \ldots, g; \quad i = 1, \ldots, m_h; \quad j = 1, \ldots, n, \qquad (6.1.1)$$

in which each $y_{hij}$ denotes the $j$th observation from the $i$th unit within the $h$th treatment group. Furthermore, we assume a common set of times of observation, $t_j$, $j = 1, \ldots, n$, for all of the $m = \sum_{h=1}^{g} m_h$ sequences of observations. An emphasis on designed experiments is implicit in this choice of notation, although in principle we could augment the data-array by covariate vectors, $x_{hij}$, attached to each $y_{hij}$, and so embrace many observational studies.

We write $\mu_{hj} = E(Y_{hij})$ for the mean response at time $t_j$ in treatment group $h$, and define the mean response profile in group $h$ to be the vector $\mu_h = (\mu_{h1}, \ldots, \mu_{hn})$. The primary objective of the analysis is to make inferences about the $\mu_{hj}$ with particular reference to differences amongst the $g$ mean response profiles, $\mu_h$.

6.2 Time-by-time ANOVA

A time-by-time ANOVA consists of $n$ separate analyses, one for each subset of data corresponding to each time of observation, $t_j$. Each analysis is a conventional ANOVA, based on the appropriate underlying experimental design and incorporating relevant covariate information. Details can be found in any standard text, for example Snedecor and Cochran (1989), Winer (1977), and Mead and Curnow (1983).

The simplest illustration is the case of a one-way ANOVA for a completely randomized design. For the ANOVA table, we use the dot notation to indicate averaging over the relevant subscripts, so that

$$y_{h \cdot j} = m_h^{-1}\sum_{i=1}^{m_h} y_{hij}$$

and

$$y_{\cdot\cdot j} = m^{-1}\sum_{h=1}^{g}\sum_{i=1}^{m_h} y_{hij} = m^{-1}\sum_{h=1}^{g} m_h y_{h \cdot j}.$$

Then, the ANOVA table for the $j$th time-point is

Source of variation    Sum of squares                                                                   df
Between treatments     BTSS $= \sum_{h=1}^{g} m_h (y_{h \cdot j} - y_{\cdot\cdot j})^2$                 $g - 1$
Residual               RSS $=$ TSS $-$ BTSS                                                             $m - g$
Total                  TSS $= \sum_{h=1}^{g}\sum_{i=1}^{m_h} (y_{hij} - y_{\cdot\cdot j})^2$            $m - 1$

The F-statistic to test the hypothesis of no treatment effects is $F = \{\mathrm{BTSS}/(g - 1)\}/\{\mathrm{RSS}/(m - g)\}$. Note that $j$ is fixed throughout the analysis and should not be treated as a second indexing variable in a two-way ANOVA.

Whilst a time-by-time ANOVA has the virtue of simplicity, it suffers from two major weaknesses. Firstly, it cannot address questions concerning treatment effects which relate to the longitudinal development of the mean response profiles; for example, growth rates between successive $t_j$. This can be overcome in part by using observations, $y_{hik}$, at earlier times as covariates for the response $y_{hij}$. Kenward's (1987) ante-dependence models, which we discussed in Section 5.2.1, are a logical development of this idea. Secondly, the inferences made within each of the $n$ separate analyses are not independent, nor is it clear how they should be combined. For example,
TIME-BY-TIME ANOVA
115
6 Analysis of variance methods

6.1 Preliminaries

In this chapter we describe how simple methods for the analysis of data from designed experiments, in particular the analysis of variance (ANOVA), can be adapted to longitudinal studies. ANOVA has limitations which prevent its recommendation as a general approach for longitudinal data. The first is that it fails to exploit the potential gains in efficiency from modelling the covariance among repeated observations. A second is that, to a greater or lesser extent, ANOVA is a simple method only with a complete, balanced array of data. This second limitation has been substantially diminished by the development of general methodology for incomplete data problems (Little and Rubin, 1987), but the first is fundamental. Whether it is crucial in practice depends on the details of the particular application, for example the size and shape of the data array. As noted in Section 4.1, modelling the covariance structure is most valuable for data consisting of a small number of long sequences of measurements.

Throughout this chapter, we denote the data by a triply subscripted array,

$$y_{hij}, \quad h = 1, \ldots, g; \quad i = 1, \ldots, m_h; \quad j = 1, \ldots, n, \qquad (6.1.1)$$

in which each $y_{hij}$ denotes the $j$th observation from the $i$th unit within the $h$th treatment group. Furthermore, we assume a common set of times of observation, $t_j$, $j = 1, \ldots, n$, for all of the $m = \sum_{h=1}^{g} m_h$ sequences of observations. An emphasis on designed experiments is implicit in this choice of notation, although in principle we could augment the data-array by covariate vectors, $x_{hij}$, attached to each $y_{hij}$, and so embrace many observational studies.

We write $\mu_{hj} = E(Y_{hij})$ for the mean response at time $t_j$ in treatment group $h$, and define the mean response profile in group $h$ to be the vector $\mu_h = (\mu_{h1}, \ldots, \mu_{hn})$. The primary objective of the analysis is to make inferences about the $\mu_{hj}$ with particular reference to differences amongst the $g$ mean response profiles, $\mu_h$.

6.2 Time-by-time ANOVA

A time-by-time ANOVA consists of $n$ separate analyses, one for each subset of data corresponding to each time of observation, $t_j$. Each analysis is a conventional ANOVA, based on the appropriate underlying experimental design and incorporating relevant covariate information. Details can be found in any standard text, for example Snedecor and Cochran (1989), Winer (1977), and Mead and Curnow (1983).

The simplest illustration is the case of a one-way ANOVA for a completely randomized design. For the ANOVA table, we use the dot notation to indicate averaging over the relevant subscripts, so that

$$y_{h \cdot j} = m_h^{-1} \sum_{i=1}^{m_h} y_{hij}$$

and

$$y_{\cdot\cdot j} = m^{-1} \sum_{h=1}^{g} \sum_{i=1}^{m_h} y_{hij} = m^{-1} \sum_{h=1}^{g} m_h y_{h \cdot j}.$$
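Then, the ANOVA table for the $j$th time-point is

| Source of variation | Sum of squares | df |
|---|---|---|
| Between treatments | $\mathrm{BTSS} = \sum_{h=1}^{g} m_h (y_{h \cdot j} - y_{\cdot\cdot j})^2$ | $g - 1$ |
| Residual | $\mathrm{RSS} = \mathrm{TSS} - \mathrm{BTSS}$ | $m - g$ |
| Total | $\mathrm{TSS} = \sum_{h=1}^{g} \sum_{i=1}^{m_h} (y_{hij} - y_{\cdot\cdot j})^2$ | $m - 1$ |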
The F-statistic to test the hypothesis of no treatment effects is $F = \{\mathrm{BTSS}/(g - 1)\}/\{\mathrm{RSS}/(m - g)\}$. Note that $j$ is fixed throughout the analysis and should not be treated as a second indexing variable in a two-way ANOVA.

Whilst a time-by-time ANOVA has the virtue of simplicity, it suffers from two major weaknesses. Firstly, it cannot address questions concerning treatment effects which relate to the longitudinal development of the mean response profiles; for example, growth rates between successive $t_j$. This can be overcome in part by using observations, $y_{hik}$, at earlier times as covariates for the response $y_{hij}$. Kenward's (1987) ante-dependence models, which we discussed in Section 5.2.1, are a logical development of this idea. Secondly, the inferences made within each of the $n$ separate analyses are not independent, nor is it clear how they should be combined. For example, a succession of marginally significant group-mean differences may be collectively compelling with weakly correlated data, but much less so if there are strong correlations between successive observations on each unit.

Example 6.1. Weights of calves in a trial on intestinal parasites

Kenward (1987) describes an experiment to compare two different treatments, A and B say, for controlling intestinal parasites of calves. The data consist of the weights of 60 calves, 30 in each group, at each of 11 times of measurement, as listed in Table 6.1.

Figure 6.1 shows the two observed mean response profiles. The observed mean response for treatment B is initially below that for treatment A, but the order is reversed between the seventh and eighth measurement times.

A time-by-time analysis of these data consists of a two-sample t-test at each of the 11 times of measurement. Table 6.2 summarizes the results of these t-tests. None of the 11 tests attains conventional levels of significance and the analysis would appear to suggest that the two mean response profiles are identical. An alternative analysis is a time-by-time analysis of successive weight-gains, $d_{hij} = y_{hij} - y_{hi,j-1}$. The results of this second analysis are also given in Table 6.2. The striking features of the second analysis are the highly significant negative difference between the mean responses on treatments A and B at time eight, and the positive difference at time 11. This simple analysis leads to essentially the same conclusions as the more sophisticated ante-dependence analysis reported in Kenward (1987).

The overall message of Example 6.1 is that a simple analysis of longitudinal data can be highly effective if it focuses on exactly the right feature of the data, which in this case was the weight-gain between successive two-week periods rather than the weight itself. This leads us to a discussion of derived variables for longitudinal data analysis.
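The time-by-time analysis of Section 6.2 can be sketched in a few lines of code. The following is a minimal illustration only, not the authors' implementation: the function names `one_way_f` and `weight_gains` and the toy data are hypothetical. It computes the one-way F-statistic, {BTSS/(g - 1)}/{RSS/(m - g)}, separately at each time-point, first for the raw responses and then for the successive gains. With two groups, this F-statistic is the square of the pooled two-sample t-statistic.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_residual) for a one-way ANOVA at one fixed time.

    groups[h] holds the responses y_hij of the m_h units in treatment group h,
    all measured at the same time-point j.
    """
    g = len(groups)
    m = sum(len(grp) for grp in groups)
    grand = sum(sum(grp) for grp in groups) / m  # overall mean at time j
    btss = sum(len(grp) * (mean(grp) - grand) ** 2 for grp in groups)
    tss = sum((y - grand) ** 2 for grp in groups for y in grp)
    rss = tss - btss
    return (btss / (g - 1)) / (rss / (m - g)), g - 1, m - g


def weight_gains(series):
    """Successive differences d_j = y_j - y_(j-1) for one unit's sequence."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical toy data: two groups, three units each, four time-points.
group_a = [[10, 12, 15, 19], [11, 13, 15, 18], [9, 12, 14, 18]]
group_b = [[10, 13, 17, 19], [12, 14, 19, 20], [10, 12, 17, 18]]

# One separate analysis per time-point, using the raw responses.
for j in range(4):
    f_stat, df1, df2 = one_way_f([[u[j] for u in group_a], [u[j] for u in group_b]])
    print(f"time {j + 1}: F = {f_stat:.2f} on {df1} and {df2} df")

# The same analysis applied to the successive gains.
gains_a = [weight_gains(u) for u in group_a]
gains_b = [weight_gains(u) for u in group_b]
for j in range(3):
    f_stat, df1, df2 = one_way_f([[u[j] for u in gains_a], [u[j] for u in gains_b]])
    print(f"gain {j + 1} to {j + 2}: F = {f_stat:.2f} on {df1} and {df2} df")
```

Note that the separate F-statistics produced by either loop are not independent of one another, which is the second weakness described above.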
Table 6.1. Weights (kg) of calves in a trial on the control of intestinal parasites. The original experiment involved 60 calves, 30 in each of two groups. Data below are for the first group only. Columns give time in weeks; each row is one calf.

| 0 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|
| 233 | 224 | 245 | 258 | 271 | 287 | 287 | 287 | 290 | 293 | 297 |
| 231 | 238 | 260 | 273 | 290 | 300 | 311 | 313 | 317 | 321 | 326 |
| 232 | 237 | 245 | 265 | 285 | 298 | 304 | 319 | 317 | 334 | 329 |
| 239 | 246 | 268 | 288 | 308 | 309 | 327 | 324 | 327 | 336 | 341 |
| 215 | 216 | 239 | 264 | 282 | 299 | 307 | 321 | 328 | 332 | 337 |
| 236 | 226 | 242 | 255 | 263 | 277 | 290 | 299 | 300 | 308 | 310 |
| 219 | 229 | 246 | 265 | 279 | 292 | 299 | 299 | 298 | 300 | 290 |
| 231 | 245 | 270 | 292 | 302 | 321 | 322 | 334 | 323 | 337 | 337 |
| 230 | 228 | 243 | 255 | 272 | 276 | 277 | 289 | 289 | 300 | 303 |
| 232 | 240 | 247 | 263 | 275 | 286 | 294 | 302 | 308 | 319 | 326 |
| 234 | 237 | 259 | 289 | 311 | 324 | 342 | 347 | 355 | 368 | 368 |
| 237 | 235 | 258 | 263 | 282 | 304 | 318 | 327 | 336 | 349 | 353 |
| 229 | 234 | 254 | 276 | 294 | 315 | 323 | 341 | 346 | 352 | 357 |
| 220 | 227 | 248 | 273 | 290 | 308 | 322 | 326 | 330 | 342 | 343 |
| 232 | 241 | 255 | 276 | 293 | 309 | 310 | 330 | 326 | 329 | 330 |
| 210 | 225 | 242 | 260 | 272 | 277 | 273 | 295 | 292 | 305 | 306 |
| 229 | 241 | 252 | 265 | 274 | 285 | 303 | 308 | 315 | 328 | 328 |
| 204 | 198 | 217 | 233 | 251 | 258 | 272 | 283 | 279 | 295 | 298 |
| 220 | 221 | 236 | 260 | 274 | 295 | 300 | 301 | 310 | 318 | 316 |
| 233 | 234 | 250 | 268 | 280 | 298 | 308 | 319 | 318 | 336 | 333 |
| 234 | 234 | 254 | 274 | 294 | 306 | 318 | 334 | 343 | 349 | 350 |
| 200 | 207 | 217 | 238 | 252 | 267 | 284 | 282 | 282 | 284 | 288 |
| 220 | 213 | 229 | 252 | 254 | 273 | 293 | 289 | 294 | 292 | 298 |
| 225 | 239 | 254 | 269 | 289 | 308 | 313 | 324 | 327 | 347 | 344 |
| 236 | 245 | 257 | 271 | 294 | 307 | 317 | 327 | 328 | 328 | 325 |
| 231 | 231 | 237 | 261 | 274 | 285 | 291 | 301 | 307 | 315 | 320 |
| 208 | 211 | 238 | 254 | 267 | 287 | 306 | 312 | 320 | 337 | 338 |
| 232 | 248 | 261 | 285 | 292 | 307 | 312 | 323 | 318 | 328 | 329 |
| 233 | 241 | 252 | 273 | 301 | 316 | 332 | 336 | 339 | 348 | 345 |
| 221 | 219 | 231 | 251 | 270 | 272 | 287 | 294 | 292 | 292 | 299 |

6.3 Derived variables

Given a vector of observations, $y_{hi} = (y_{hi1}, \ldots, y_{hin})$, on a particular unit, a derived variable is a scalar-valued function, $u_{hi} = u(y_{hi})$. The motivation for analysing a derived variable is two-fold. From a pragmatic point of view it reduces a multivariate problem to a univariate one; in particular applications, a single derived variable may convey the essence of the substantive issues raised by the data. For example, in growth studies the detailed growth process for each unit may be complex, yet for some purposes the interest may focus on something as simple as the average growth rate during the course of the experiment. By reducing each $y_{hi}$ to a scalar quantity, $u_{hi}$, we avoid the issue of correlation within each sequence, and can again use standard ANOVA or regression methods for the data analysis.

When there are two or more derived variables of interest, the problem of combined inference which we encountered with the time-by-time ANOVA again needs to be addressed. However, it is arguable that the practical importance of the combined inference question is reduced if the separate derived variables address substantially different questions and each one has a natural interpretation in its own right.
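As a concrete illustration of reducing each sequence to a scalar, the sketch below computes one possible derived variable, the least-squares slope of weight on time, for the first three calves of Table 6.1. The choice of the slope as the "average growth rate" and the function name `ols_slope` are assumptions made for illustration; the text does not prescribe a particular estimator.

```python
def ols_slope(times, y):
    """Least-squares slope of y on times: one scalar summary u_hi per unit."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


# Measurement times (weeks) and the first three rows of Table 6.1.
weeks = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 19]
calves = [
    [233, 224, 245, 258, 271, 287, 287, 287, 290, 293, 297],
    [231, 238, 260, 273, 290, 300, 311, 313, 317, 321, 326],
    [232, 237, 245, 265, 285, 298, 304, 319, 317, 334, 329],
]

# One derived value per calf; these scalars would then be analysed by
# standard univariate methods (for example a two-sample t-test or ANOVA).
growth_rates = [ols_slope(weeks, y) for y in calves]
for i, u in enumerate(growth_rates, start=1):
    print(f"calf {i}: average growth rate = {u:.2f} kg/week")
```

Because each unit contributes a single number, the within-unit correlation no longer enters the subsequent analysis.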
Fig. 6.1. Observed mean response profiles for data on weights of calves, plotted against time (weeks). Solid line: treatment A; dashed line: treatment B.

Table 6.2. Time-by-time analysis of weights of calves. The analysis at each time is a two-sample t-test, using either the weights, $y_{hij}$, or the weight-gains, $d_{hij}$, as the response. The 5%, 1%, and 0.1% critical values of $t_{58}$ are 2.00, 2.66 and 3.47.

| Test statistic at time | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $y$ | 0.60 | 0.82 | 1.03 | 0.88 | 1.21 | 1.12 | 1.28 | -1.10 | -0.95 | -0.53 | 0.85 |
| $d$ |  | 0.51 | 0.72 | -0.15 | 1.21 | 0.80 | 0.55 | -6.85 | 0.21 | 0.72 | 4.13 |

The derived variable approach goes back at least to Wishart (1938) and was systematized by Rowell and Walters (1976). Rowell and Walters set out the details of an analysis for data recorded at equally spaced time-points, $t_j$, in which the $n$-dimensional response, $y_{hi}$, is transformed to a set of orthogonal polynomial coefficients of degree $0, 1, \ldots, n - 1$. The first two of these are immediately interpretable in terms of average response and average rate of change, respectively, during the course of the experiment. It is less clear what tangible interpretation can be placed on higher degree coefficients. Also, from an inferential point of view, it is important to recognize that the 'orthogonal' sums of squares associated with the polynomial decomposition are not orthogonal in the statistical sense, that is, not independent, because the observations within $y_{hi}$ are in general correlated.

A more imaginative use of derived variables is to fit scientifically interpretable non-linear models to each time-sequence, $y_{hi}$, and to use the parameter estimates from these fits as a natural set of derived variables. For example, we might decide that the individual time-sequences can be well described by a logistic growth model of the form

$$\mu_{hi}(t) = \alpha_{hi} [1 + \exp\{-\beta_{hi}(t - \delta_{hi})\}]^{-1},$$

in which case estimates of the asymptotes $\alpha_{hi}$, slope parameters $\beta_{hi}$ and points of inflection $\delta_{hi}$ form a natural set of derived variables. Note that non-linear least squares could be used to estimate the $\alpha_{hi}$, $\beta_{hi}$, and $\delta_{hi}$, and that no assumptions about the correlation structure are needed to validate the subsequent derived variable analysis. However, whilst validity is one thing, efficiency is quite another. The small-sample behaviour of ordinary least squares (OLS) estimation in non-linear regression modelling is often poor, and this can only be exacerbated by correlations amongst the $y_{hij}$. Furthermore, the natural parametrization from a scientific point of view may not correspond to a good parametrization from a statistical point of view. In the simpler context of uncorrelated data, these issues are addressed in Bates and Watts (1988) and in Ratkowsky (1983).

Strictly, the derived variable approach breaks down when the array, $y_{hij}$, is incomplete because of missing values, or when the $n$ times of observation are not common to all units. This is because the derived variable no longer satisfies the standard ANOVA assumption of a common variance for all observations. Also, we cannot simply weight the values of the derived variables according to the numbers of observations contained in the corresponding vectors, $y_{hi}$, because of the unknown effects of the correlation structure in the original data. In practice, derived variables appear to be used somewhat optimistically with moderately incomplete data. The consequences of this are unclear, although the randomization justification for the ANOVA is available if we are prepared to assume that the mechanisms leading to the incompleteness of the data are independent of both the experimental treatments applied and the measurement process itself. As we shall see in Chapter 11, this is often not so.

Example 6.2.
Growth of Sitka spruce with and without ozone

To illustrate the use of derived variables we consider the 1989 growth data on Sitka spruce. Recall that these consist of 79 trees grown in four controlled environment chambers in which the first two chambers are treated with introduced ozone at 70 ppb. The respective numbers of trees in the four chambers are 27, 27, 12 and 13. The objective of the experiment was to investigate the effect of ozone on tree growth, and the response from each tree was a sequence of eight measurements of $y = \log(d^2 h)$ on days 103, 130, 162, 190, 213, 247, 273 and 308, where day one is first January 1989, and $d$ and $h$ refer to stem diameter and height, respectively.

In Section 4.6 we saw that the dominant source of variation in these data was a random shift in the response profile for each tree. This suggests that the observed mean response would be a suitable derived variable. The appropriate analysis is then a one-way ANOVA estimating a separate mean response parameter, $\mu_h$, in each of the four chambers, followed by estimation of the treatment versus control contrast,

$$c = (\mu_1 + \mu_2) - (\mu_3 + \mu_4).$$

The resulting ANOVA table is

| Source of variation | Sum of squares | df | Mean square |
|---|---|---|---|
| Between chambers | 2.2448 | 3 | 0.7483 |
| Residual | 31.0320 | 75 | 0.4136 |
| Total | 33.2768 | 78 | 0.4266 |

Note that the F-statistic to test for chamber effects is not significant ($F_{3,75} = 1.81$, p-value $\approx 0.15$). However, the estimate of the treatment versus control contrast is

$$\hat{c} = (5.89 + 5.87) - (6.17 + 6.29) = -0.70,$$

with estimated standard error

$$\mathrm{se}(\hat{c}) = \sqrt{0.4136 \left( \frac{1}{13} + \frac{1}{27} + \frac{1}{27} + \frac{1}{12} \right)} = 0.31,$$

corresponding to a t-statistic of 2.26 on 75 df, which is significant (p-value $\approx 0.03$). The conclusion from this analysis is that introduced ozone suppresses growth of Sitka spruce, although the evidence is not overwhelming.

Example 6.3. Effect of pH on drying rate of holly leaves

This example concerns an experiment to investigate the effect of different pH treatments on the drying rate of holly leaves. The treatment was administered by intermittent spraying during the plant's life. Ten plants were allocated to each of four pH levels, 2.5, 3.5, 4.5, and 5.6, the last being a control. Leaves from the plant were then allowed to dry and were weighed at each of 13 times, irregularly spaced over a three-day period. The recorded variable was the ratio of current to initial fresh weight. Thus, if $w(t)$ denotes the weight at time $t$, the recorded sequence of values from each plant is

$$x_j = w(t_j)/w(0), \quad j = 1, \ldots, 13.$$

Note that $t_1 = 0$, and $x_1 = 1$ for every plant. A plausible model for the drying process is one of exponential decay towards a lower bound which represents the dry weight of the plant material. If this lower bound is a small fraction of the initial weight, a convenient approximation is to assume exponential decay towards zero. This gives $w(t) = w(0)\exp(-\beta t)$ or, dividing by $w(0)$, $x_j = \exp(-\beta t_j)$. Taking $y_j = \log(x_j)$, this in turn gives a linear regression through the origin,

$$y_j = -\beta t_j + \epsilon_j.$$

Fig. 6.2. Data on drying rates of holly leaves, plotted against time (minutes): (a) pH = 2.5; (b) pH = 3.5; (c) pH = 4.5; (d) pH = 5.6 (control).

Figure 6.2 shows the 40 sequences of measurements, $y_j$, plotted against $t_j$. The linear model gives a good approximation, but with substantial variation in the slope, $-\beta$, for the different plants. This suggests using a least-squares estimate of $\beta$ from each of the 40 plants as a derived variable. Note that $\beta$ has a direct interpretation as a drying rate. Table 6.3 shows the resulting values, $b_{hi}$, $h = 1, \ldots, 4$; $i = 1, \ldots, 10$, the subscripts $h$ and $i$ referring to pH level and replicate within pH level, respectively. To analyse this derived variable, we would assume that the $b_{hi}$ are realizations of Gaussian random variables, with constant variance and means dependent on pH level. However, the pattern of the standard errors in Table 6.3 casts doubt on the constant variance assumption. For this reason, we transform the $b_{hi}$ to $z_{hi} = \log(b_{hi})$ and assume that $z_{hi} \sim N(\mu_h, \sigma^2)$, where $\mu_h$ denotes the mean value of the derived variable within pH level $h$.

Table 6.3. Derived data ($b_{hi}$) for analysis of the drying rates of holly leaves.

| Replication | pH 2.5 | pH 3.5 | pH 4.5 | pH 5.6 |
|---|---|---|---|---|
| 1 | 0.33 | 0.27 | 0.76 | 0.23 |
| 2 | 0.30 | 0.35 | 0.43 | 0.68 |
| 3 | 0.91 | 0.26 | 0.18 | 0.28 |
| 4 | 1.41 | 0.10 | 0.21 | 0.32 |
| 5 | 0.41 | 0.24 | 0.17 | 0.21 |
| 6 | 0.41 | 1.51 | 0.32 | 0.43 |
| 7 | 1.26 | 1.17 | 0.35 | 0.31 |
| 8 | 0.57 | 0.26 | 0.43 | 0.26 |
| 9 | 1.51 | 0.29 | 1.11 | 0.28 |
| 10 | 0.55 | 0.41 | 0.36 | 0.28 |
| Mean | 0.77 | 0.54 | 0.44 | 0.33 |
| SE | (0.15) | (0.14) | (0.09) | (0.04) |

The means and standard errors of the log-transformed drying rate, corresponding to the last two rows of Table 6.3, are

|  | pH 2.5 | pH 3.5 | pH 4.5 | pH 5.6 |
|---|---|---|---|---|
| Mean | -0.44 | -1.03 | -1.01 | -1.17 |
| SE | 0.19 | 0.25 | 0.19 | 0.11 |

In contrast to the direct relationship observed in Table 6.3 between mean and variance for the estimated drying rates, $b_{hi}$, a constant variance model does seem reasonable for the log-transformed rates, $z_{hi}$.

We now consider three possible models for $\mu_h$, the mean log-drying-rate within pH level, $h$. These are

(1) $\mu_h = \mu$ (no pH effect);
(2) $\mu_h$ arbitrary;
(3) $\mu_h = \theta_0 + \theta_1 u_h$, where $u_h$ denotes pH level in treatment group $h$.

The comparison between models one and two involves a one-way ANOVA of the data in Table 6.3. The F-statistic is 3.14 on three and 36 degrees of freedom, corresponding to a p-value of 0.037, which is indicative of differences in mean response between pH levels. Model three is motivated by the observation that the log-transformed rates of drying seem to depend on pH in a roughly linear fashion. The F-statistic to test the hypothesis $\theta_1 = 0$ within model three, using the residual mean square from the saturated model two, is 9.41 on 1 and 36 degrees of freedom. This corresponds to a p-value of 0.004, representing strong evidence against $\theta_1 = 0$. The F-statistic to test lack of fit of model three within model two is 0.31 on two and 36 degrees of freedom, which is clearly not significant (p-value = 0.732).

The conclusion from this analysis is that log-drying-rate decreases, approximately linearly, with increasing pH.

6.4 Repeated measures

A repeated measures ANOVA can be regarded as a first attempt to provide a single analysis of a complete longitudinal data set. The rationale for the analysis is to regard time as a factor on $n$ levels in a hierarchical design with units as sub-plots. In agricultural research, this type of experiment is usually called a split-plot experiment. However, the usual randomization justification for the split-plot analysis is not available because there is no sense in which the allocation of times to the $n$ observations within each unit can be randomized. We therefore have to assume an underlying model for the data, namely,

$$y_{hij} = \beta_h + \gamma_{hj} + U_{hi} + Z_{hij}, \quad j = 1, \ldots, n; \quad i = 1, \ldots, m_h; \quad h = 1, \ldots, g, \qquad (6.4.1)$$
in which the $\beta_h$ represent main effects for treatments and the $\gamma_{hj}$ interactions between treatments and times, with the constraint that $\sum_{j=1}^{n} \gamma_{hj} = 0$ for all $h$. Also, the $U_{hi}$ are mutually independent random effects for units and the $Z_{hij}$ mutually independent random measurement errors.

In the model (6.4.1), $E(Y_{hij}) = \beta_h + \gamma_{hj}$. If we assume that $U_{hi} \sim N(0, \nu^2)$ and $Z_{hij} \sim N(0, \sigma^2)$, then the resulting distribution of $Y_{hi} = (Y_{hi1}, \ldots, Y_{hin})$ is multivariate Gaussian with covariance matrix

$$V = \sigma^2 I + \nu^2 J,$$

where $I$ is the identity matrix and $J$ a matrix all of whose elements are 1. This implies a constant correlation, $\rho = \nu^2/(\nu^2 + \sigma^2)$, between any two observations on the same unit.

The split-plot ANOVA for the model (6.4.1) is given by the following table. In this ANOVA table, we again use the dot notation for averaging, so that $y_{hi\cdot} = n^{-1} \sum_{j=1}^{n} y_{hij}$, $y_{h\cdot\cdot} = (m_h n)^{-1} \sum_{i=1}^{m_h} \sum_{j=1}^{n} y_{hij}$, and so on. Also, $m = \sum_{h=1}^{g} m_h$ denotes the total number of units.

| Source of variation | Sum of squares | df |
|---|---|---|
| Between treatments | $\mathrm{BTSS}_1 = n \sum_{h=1}^{g} m_h (y_{h\cdot\cdot} - y_{\cdot\cdot\cdot})^2$ | $g - 1$ |
| Whole plot residual | $\mathrm{RSS}_1 = \mathrm{TSS}_1 - \mathrm{BTSS}_1$ | $m - g$ |
| Whole plot total | $\mathrm{TSS}_1 = n \sum_{h=1}^{g} \sum_{i=1}^{m_h} (y_{hi\cdot} - y_{\cdot\cdot\cdot})^2$ | $m - 1$ |
| Between times | $\mathrm{BTSS}_2 = m \sum_{j=1}^{n} (y_{\cdot\cdot j} - y_{\cdot\cdot\cdot})^2$ | $n - 1$ |
| Treatment by time interaction | $\mathrm{ISS}_2 = \sum_{j=1}^{n} \sum_{h=1}^{g} m_h (y_{h\cdot j} - y_{\cdot\cdot\cdot})^2 - \mathrm{BTSS}_1 - \mathrm{BTSS}_2$ | $(g - 1)(n - 1)$ |
| Split-plot residual | $\mathrm{RSS}_2 = \mathrm{TSS}_2 - \mathrm{ISS}_2 - \mathrm{BTSS}_2 - \mathrm{TSS}_1$ | $(m - g)(n - 1)$ |
| Split-plot total | $\mathrm{TSS}_2 = \sum_{h=1}^{g} \sum_{i=1}^{m_h} \sum_{j=1}^{n} (y_{hij} - y_{\cdot\cdot\cdot})^2$ | $nm - 1$ |

The first F-statistic associated with the table is

$$F_1 = \{\mathrm{BTSS}_1/(g - 1)\}/\{\mathrm{RSS}_1/(m - g)\},$$

which tests the hypothesis that $\beta_h = \beta$, $h = 1, \ldots, g$. The second is

$$F_2 = \{\mathrm{ISS}_2/[(g - 1)(n - 1)]\}/\{\mathrm{RSS}_2/[(m - g)(n - 1)]\},$$

which tests the hypothesis that $\gamma_{hj} = \gamma_j$, $h = 1, \ldots, g$, for each of $j = 1, \ldots, n$, that is, that all $g$ group mean response profiles are parallel.

The split-plot ANOVA strictly requires a complete data array. As noted in Section 6.1, recent developments in methodology for incomplete data relax this requirement. Alternatively, and in our view preferably, we can analyse incomplete data under the split-plot model (6.4.1) by the general likelihood-based approach given in Chapters four and five, with the assumption of a constant correlation between any two measurements on the same unit and, incidentally, a general linear model for the mean response profiles.

Example 6.4. Growth of Sitka spruce with and without ozone (continued)

The following table presents a split-plot ANOVA for the 1989 Sitka spruce data. We ignore possible chamber effects to ease comparison with the results from the robust analysis given in Section 4.6.

| Source of variation | Sum of squares | df | Mean square |
|---|---|---|---|
| Between treatments | 17.106 | 1 | 17.106 |
| Whole plot residual | 249.108 | 77 | 3.235 |
| Whole plot total | 266.214 | 78 | 3.413 |
| Between times | 41.521 | 7 | 5.932 |
| Treatment by time interaction | 0.045 | 7 | 0.0064 |
| Split-plot residual | 5.091 | 539 | 0.0094 |
| Split-plot total | 312.871 | 631 |  |

The F-statistic for differences between control and treated mean response is $F_1 = 5.29$ on one and 77 degrees of freedom, corresponding to a p-value of approximately 0.02. The F-statistic for departure from parallelism is $F_2 = 0.68$ on seven and 539 degrees of freedom, corresponding to a p-value of approximately 0.69. The analysis therefore gives moderate evidence of a difference in mean response between control and treated trees, and no evidence of departure from parallelism.

6.5 Conclusions
The principal virtue of the ANOVA approach to longitudinal data analysis is its technical simplicity. The computational operations involved are elementary and there is an apparently reassuring familiarity in the solution of a complex class of problems by standard methods. However, we believe that this virtue is outweighed by the inherent limitations of the approach as noted in Section 6.1.

Provided that the data are complete, the method of derived variables can give a simple and easily interpretable analysis with a strong focus on particular aspects of the mean response profiles. Furthermore, it provides a feasible method of analysing inherently non-linear models for the mean response profiles, notwithstanding the statistical difficulties which arise with non-linear least-squares estimation in small samples. The method runs into trouble if the data are seriously incomplete, or if no single derived variable can address the relevant scientific questions. In the former case, the inference is strictly invalidated, whilst in the latter there are difficulties in making a correct combined inference for the complete analysis. The method is not applicable if a key subject-specific covariate varies over time.

Time-by-time ANOVA can be viewed as a special case of a derived variable analysis in which the problem of combined inference is seen in an acute form. Additionally, the implicitly cross-sectional view of the data fails to address the question of evolution over time which is usually of fundamental importance in longitudinal studies.

The split-plot ANOVA embodies strong assumptions about the covariance structure of the data. If these are reasonable, a model-based analysis under the assumed uniform correlation structure achieves the same ends, while coping naturally with missing values and allowing a structured linear model for the mean response profiles.

In summary, whilst ANOVA methods are undoubtedly useful in particular circumstances, they do not constitute a generally viable approach to longitudinal data analysis.
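To make the covariance assumption behind the split-plot ANOVA concrete, the following sketch builds a matrix of the form sigma^2 I + nu^2 J for one unit and checks that every pair of observations has the same correlation, nu^2/(nu^2 + sigma^2). The function names and the numerical values of the two variance components are hypothetical, chosen only for illustration.

```python
def uniform_covariance(n, sigma2, nu2):
    """Covariance matrix sigma2 * I + nu2 * J for n repeated measurements."""
    return [[sigma2 + nu2 if j == k else nu2 for k in range(n)] for j in range(n)]


def correlation_matrix(v):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(v)
    return [[v[j][k] / (v[j][j] * v[k][k]) ** 0.5 for k in range(n)] for j in range(n)]


# Hypothetical variance components: measurement error and between-unit variance.
sigma2, nu2 = 1.0, 3.0
V = uniform_covariance(4, sigma2, nu2)
R = correlation_matrix(V)
rho = nu2 / (nu2 + sigma2)

# Every off-diagonal correlation equals rho, whatever the time separation.
for row in R:
    print(["%.2f" % r for r in row])
print("rho =", rho)
```

The point of the sketch is that the correlation does not decay with the lag between measurements, which is the strong assumption referred to above.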
7 Generalized linear models for longitudinal data

This chapter surveys approaches to the analysis of discrete and continuous longitudinal data using extensions of generalized linear models (GLMs). We have shown in Chapter 4 how regression inferences using linear models can be made robust to assumptions about the correlation, especially when the number of observations per person, $n_i$, is small relative to the number of individuals, $m$. With linear models, although the estimation of the regression parameters must take into account the correlations in the data, their interpretation is essentially independent of the correlation structure. With non-linear models for discrete data, such as logistic regression, different assumptions about the source of correlation can lead to regression coefficients with distinct interpretations. The data-analyst must therefore think even more carefully about the objectives of the analysis and the source of correlation in choosing an approach.

In this chapter we discuss three extensions of GLMs for longitudinal data: marginal, random effects, and transition models. The objective is to present the ideas underlying each model as well as their domains of application. We focus on the interpretation of regression coefficients. Chapters 8-10 present details about each method and examples of their use.

7.1 Marginal models

In a marginal model, the regression of the response on explanatory variables is modelled separately from within-person (within-unit) correlation. In the regression, we model the marginal expectation, $E(Y_{ij})$, as a function of explanatory variables. By marginal expectation, we mean the average response over the sub-population that shares a common value of $x$. The marginal expectation is what we model in a cross-sectional study. Specifically, a marginal model has the following assumptions:

(1) the marginal expectation of the response, $E(Y_{ij}) = \mu_{ij}$, depends on explanatory variables, $x_{ij}$, by $h(\mu_{ij}) = x_{ij}'\beta$, where $h$ is a known link function such as the logit for binary responses or log for counts;

(2) the marginal variance depends on the marginal mean according to $\mathrm{Var}(Y_{ij}) = v(\mu_{ij})\phi$, where $v$ is a known variance function and $\phi$ is a scale parameter which may need to be estimated;

(3) the correlation between $Y_{ij}$ and $Y_{ik}$ is a function of the marginal means and perhaps of additional parameters $\alpha$, that is, $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho(\mu_{ij}, \mu_{ik}; \alpha)$, where $\rho(\cdot)$ is a known function.
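Assumptions (1) and (2) can be illustrated for a binary response with the logit link. The sketch below is a minimal example under stated assumptions: the coefficient vector, the covariate values and the function names are hypothetical, and the variance function v(mu) = mu(1 - mu) with phi = 1 is the usual choice for a Bernoulli response rather than something fixed by the text.

```python
import math


def inverse_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def marginal_mean(x, beta):
    """mu = h^{-1}(x' beta) for the logit link h."""
    eta = sum(xk * bk for xk, bk in zip(x, beta))
    return inverse_logit(eta)


def marginal_variance(mu, phi=1.0):
    """Var(Y) = v(mu) * phi with the Bernoulli variance function v(mu) = mu (1 - mu)."""
    return mu * (1.0 - mu) * phi


# Hypothetical coefficients (intercept, slope) and covariates (1, x).
beta = [-1.0, 0.5]
x = [1.0, 2.0]

mu = marginal_mean(x, beta)
print("marginal mean:", mu)
print("marginal variance:", marginal_variance(mu))
```

Assumption (3), the within-unit correlation, is specified separately and does not alter the interpretation of the coefficients in the mean model.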
Marginal regression coefficients {3 h. . coefficients from a cross-sectio I', I' ,aw the same .mterpretation as , ,na ana YSIS. tllargin' I d I. analogues for correlated data of GLM' i:' , a mo e S are natural Nearly all f th r ' s or mdependrnt data. o e mear models mtroduced in ('I .' mulated as marginal models sl'nc'e tl h " lapters 4 6 can be for, . ,wy avp 3.. 80 and a otherwise. - Figure 8.1 shows the estimated prevalence in two groups (1: risperidone of 6 mg' 0: haloperidol) at baseline and at 1, 2, 4, 6, and 8 weeks after randomization. There is a clear trend of decline in risk for having PANSS ~ 80 for both groups. A scientific question of interest is whether th~ apparent difference in trend between the two groups is clinically me,anmgful.. T~ formulate this objective statistically, we consider the followmg margm model for the binary response Y;j: logitPr(Y;j
= 1) = (30 + (3IXil + f32 X ij2 + f33 X il . Xij2, - 170 the number of subjects , t status defined earlIer . IS the treatmen
mh £or J. = 01 , I " " 5 and'-1 z - , ... , mwere '.
included in this analysis. Here
Xii
MARGINAL MODELS 154
6
. ,_ r
1.0
0.8
1 I
Haloperidol Risperidone 6 MG
1
--
EXAMPLES
_
155
5
R.... H~
~H
~
···· ...R : : - H________
0.6
~
·····R---------··-·--- A~~~~
~ a.. 0,4
4 0 .~
en
'0 '0
3
0
6l
0 -l
0,2
2
0.0
0.69
Fig. 8.1. Prevalence of PANSS ~ 80 as a function of time (in weeks) from randomization estimated separately for risperidone (6 mg) and haloperidol group.
O.l----,-----,-----r-----,--~---,~-~-_.__-.J
0,2513
0.5878 0.8473 1.0986
1.5041
2.1972
Lag in log(time + 1)
Fig. 8.2. Estimated log-odds ratio versus lag time, in log scale, for risperidone (6 mg) and haloperidol.
and $x_{ij2} = \log(t_j + 1)$, where $t_j$ is time in weeks from randomization. Thus, for example, $x_{i02} = \log 1 = 0$ and $x_{i52} = \log 9$. We have applied the logarithmic transformation to the $t_j$'s to capture the empirical phenomenon, as shown in Fig. 8.1, of greater rate of decline in earlier weeks. The primary coefficient of interest is $\beta_3$, which describes the difference, in logit scale, in change of prevalence over time between the two treated groups.

With up to six observations per subject in the span of eight weeks, it is of interest to explore the pattern of within-subject association to ensure that such an association is properly acknowledged in GEE. Figure 8.2 shows, for each one of the groups, the plots of the estimated log-odds ratio relating two binary responses from the same subject versus the lag time. It is evident that the degree of within-subject association, as measured by the log-odds ratio, decreases as the time between visits increases. Such a pattern, which appears to be linear, is similar between the two treatment groups. To capture this empirical phenomenon, we consider for $j < k = 0, 1, \ldots, 5$,

$$\log \mathrm{OR}(Y_{ij}, Y_{ik}) = \alpha_0 + \alpha_1 |x_{ij2} - x_{ik2}|.$$
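As a rough illustration of the covariate that enters this association model, the sketch below computes $x_{ij2} = \log(t_j + 1)$ for the six visit weeks listed for Fig. 8.1 and the lag $|x_{ij2} - x_{ik2}|$ for every pair of visits. The function names and the values passed to `odds_ratio` are illustrative assumptions, not estimates from the analysis.

```python
import math
from itertools import combinations

# Visit times in weeks from randomization (baseline, then weeks 1, 2, 4, 6, 8).
WEEKS = [0, 1, 2, 4, 6, 8]

def log_time(t):
    """Transformed time covariate x_ij2 = log(t_j + 1)."""
    return math.log(t + 1)

def pairwise_lags(weeks):
    """Map each visit pair (j, k) with j < k to the lag |x_ij2 - x_ik2|."""
    x = [log_time(t) for t in weeks]
    return {(j, k): abs(x[j] - x[k])
            for j, k in combinations(range(len(weeks)), 2)}

def odds_ratio(alpha0, alpha1, lag):
    """Odds ratio implied by log OR = alpha0 + alpha1 * lag."""
    return math.exp(alpha0 + alpha1 * lag)

lags = pairwise_lags(WEEKS)

# alpha0 = 3.0 and alpha1 = -1.0 are hypothetical values, chosen only to show
# that a negative alpha1 makes the association weaken as the lag grows.
for pair in [(4, 5), (0, 1), (0, 5)]:
    print(pair, round(lags[pair], 4), round(odds_ratio(3.0, -1.0, lags[pair]), 2))
```

With six visits there are fifteen pairs, so the association model is informed by fifteen lags per fully observed subject.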
The results based upon the GEE analysis are presented in Table 8.5. They suggest that for patients in the haloperidol group, the prevalence of having PANSS $\geq 80$ decreases significantly ($\hat\beta_2 = -0.445 \pm 0.131$) at the rate of 26% ($0.26 = 1 - 2^{-0.445}$) per week at baseline and less drastically at the rate of 8% ($0.08 = 1 - (6/5)^{-0.445}$) per week at week four. The decline of prevalence for patients receiving risperidone of 6 mg is more pronounced, at the rate of 43% ($0.43 = 1 - 2^{-0.445-0.371}$) per week at baseline and 14% at week four. The difference between the two groups in prevalence change over time is statistically significant at the 0.05 level (Z-statistic $= -0.371/0.185 = -2.01$). The results from the statistical modelling are consistent with the empirical finding in Fig. 8.2 that the log-odds ratio measuring the degree of within-subject association decreases as the lag time increases. Thus, the odds ratio of repeated observations measured at baseline and at week one is estimated as $23.63 = \exp(3.803 - 0.924 \cdot \log 2)$ but drops to $6.56 = \exp(3.803 - 0.924 \cdot \log 8)$ if the second observation was measured at week eight. Finally, the findings regarding the treatment effect as captured by $\beta_3$ appear to be stable as long as the within-subject association is properly modelled; see results from Models 1-3 of Table 8.5. However, if the within-subject association is ignored, results for $\beta_3$ under Model 4 are very different, both quantitatively and qualitatively, from the rest. It is important to point out that in this trial more than 50% of the subjects dropped out during the eight-week follow-up period and that there is strong evidence that dropout is related to response. The issue of missing data will be addressed in Chapter 13, where we will revisit this example.

Table 8.5. Marginal model coefficients (standard errors) from GEE for the pre-post trial on schizophrenia.

                                            Model
Variable                   1                2                3                4
Logistic regression
  Intercept                0.976 (0.222)    0.965 (0.221)    0.974 (0.222)    1.060 (0.225)
  Treatment (x1)           0.110 (0.315)    0.087 (0.313)    0.120 (0.315)    0.005 (0.320)
  Time (x2)                -0.445 (0.131)   -0.458 (0.129)   -0.449 (0.131)   -0.724 (0.141)
  x1 · x2                  -0.371 (0.185)   -0.351 (0.184)   -0.366 (0.184)   -0.156 (0.202)
Within-subject association
  Intercept                3.803 (0.410)    2.884 (0.456)    3.605 (0.591)    -
  Lag time (z)             -0.924 (0.337)   -                -0.778 (0.444)   -
  Treatment (x1)           -                0.258 (0.641)    0.397 (0.809)    -
  x1 · z                   -                -                -0.302 (0.673)   -

Example 8.4. Respiratory infection in Indonesian preschool children
Two hundred and seventy-five preschool children in Indonesia were examined for up to six consecutive quarters for the presence of respiratory infection. This is a subset of a cohort studied by Sommer et al. (1984). A primary question of interest is whether the prevalence of respiratory infection is higher among children who suffer xerophthalmia, an ocular manifestation of chronic vitamin A deficiency. Also of interest is the change in the prevalence of respiratory infection with age. It is worth noting that either question can be addressed using both cross-sectional and longitudinal data. In what follows, we will look specifically at the ageing question.

Table 8.6. Prevalence of respiratory infection and xerophthalmia by visit.

                                      Visit (season)*
Prevalence (%)             1 (Su)   2 (A)   3 (W)   4 (S)   5 (Su)   6 (A)
Respiratory infection      12.6     4.6     7.3     3.8     14.9     9.4
Xerophthalmia              3.9      6.1     6.2     5.5     3.1      3.0
No. of children            230      214     177     183     195      201

*Su = Summer; A = Autumn; W = Winter; S = Spring.
The prevalence of respiratory infection in six consecutive quarters, as shown in Table 8.6, reveals a possible seasonal trend with a summer maximum. The prevalence of xerophthalmia also indicates some seasonality, with a maximum in winter. We begin our analysis by considering only data from the first visit. The results are summarized in Table 8.7. Model 1 in Table 8.7 is a logistic regression of respiratory infection on xerophthalmia and age (centred at 36 months), adjusting for gender and height for age as a percentage of the United States National Center for Health Statistics standard. The results suggest a strong cross-sectional age effect on the prevalence of respiratory infection. The effect is non-linear on the logit scale, as the quadratic term for age is statistically significant and negative in sign. As shown in Fig. 8.3, the cross-sectional analysis suggests that the prevalence of respiratory infection increases from age 12 months and reaches its peak at 20 months before starting to decline. If we now include the data from all six visits (Model 2 in Table 8.7), this concave relationship is preserved qualitatively, as shown in Fig. 8.3. We note that an annual sine and cosine have been included in Model 2 to adjust for seasonality but that this has very little impact on the age coefficients. The discrepancy among the age coefficients in Models 1 and 2 may be explained by Fig. 8.4, in which the cross-sectional age effects on respiratory infection are displayed graphically for each of six visits. The age coefficients in Model 2 can be interpreted as weighted averages of the cross-sectional age coefficients from each visit.

The association between respiratory infection and xerophthalmia is positive, although not statistically significant at the 5% level. There is limited information about the association with xerophthalmia in this partial data set, which includes only 52 events of vitamin A deficiency. See Sommer et al. (1984) for an analysis of the complete data set of over 23 000 observations.

We now want to distinguish the contributions of cross-sectional and longitudinal information to the estimated relationship of respiratory infection and age. That is, we want to separate differences among sub-populations of
Table 8.7. Logistic regressions of the prevalence of respiratory infection on age and xerophthalmia adjusting for gender, season, and height for age. Models 1 and 2 estimate cross-sectional effects; Models 3 and 4 distinguish cross-sectional from longitudinal effects. Models 2-4 are fitted using the alternating logistic regression implementation of GEE.

                                                Model
Variable             1                  2                  3                  4
Intercept            -1.47 (0.36)       -2.05 (0.21)       -1.76 (0.25)       -2.21 (0.32)
Gender               -0.66 (0.44)       -0.49 (0.24)       -0.53 (0.24)       -0.53 (0.24)
Height for age       -0.11 (0.041)      -0.042 (0.023)     -0.051 (0.025)     -0.048 (0.024)
Seasonal cosine      -                  -0.59 (0.17)       -                  -0.54 (0.21)
Seasonal sine        -                  -0.16 (0.14)       -                  -0.016 (0.18)
Xerophthalmia        0.44 (1.15)        0.50 (0.44)        0.53 (0.45)        0.64 (0.44)
Age                  -0.089 (0.027)     -0.030 (0.008)     -                  -
Age^2                -0.0026 (0.0011)   -0.0010 (0.0004)   -                  -
Age at entry         -                  -                  -0.053 (0.013)     -0.053 (0.013)
(Age at entry)^2     -                  -                  -0.0013 (0.0005)   -0.0013 (0.0005)
Follow-up time       -                  -                  -0.19 (0.071)      -0.082 (0.099)
(Follow-up)^2        -                  -                  0.013 (0.004)      0.007 (0.007)
log(γ)               -                  0.49 (0.27)        0.46 (0.26)        0.49 (0.26)

Fig. 8.3. Prevalence of respiratory infection as a function of age (months) for three different models. Solid line: Model 1; dotted line: Model 2; dashed line: Model 3.

Fig. 8.4. Prevalence of respiratory infection as a function of age (months) estimated separately for each of the six visits.
children at different ages at a fixed time (cross-sectional) from changes in children over time (longitudinal). To do so, we first decompose the variable $\mathrm{age}_{ij}$, the age of the $i$th child at the $j$th visit, where $j = 1, \ldots, 6$, as the sum

$$\mathrm{age}_{ij} = \mathrm{age}_{i1} + (\mathrm{age}_{ij} - \mathrm{age}_{i1}).$$

If we allow separate regression coefficients $\beta_C$ and $\beta_L$ for $\mathrm{age}_{i1}$ and $\mathrm{age}_{ij} - \mathrm{age}_{i1}$, respectively, the regression coefficient $\beta_L$ describes the change of the risk of respiratory infection as the children grow older. The parameter $\beta_C$ describes the age effect which would be estimated from purely cross-sectional data. The reader is referred to a more detailed discussion on this distinction in Sections 1.2 and 2.2. Note that the distinction between $\beta_C$ and $\beta_L$ defined for linear models holds only approximately for logistic and other non-linear models.

Results for Model 3 in Table 8.7 provide a different picture of how the risk of respiratory infection may be associated with age. The cross-sectional parameters suggest that the risk of respiratory infection climbs steadily in the first 20 months of life before declining; see Fig. 8.3. The longitudinal age parameters suggest otherwise. The risk of respiratory infection declines in the first 7 to 8 months of follow-up before rising later in life; see Fig. 8.5. This pattern is consistent even if we subdivide the study population into five cohorts according to the age at entry, namely, 0-12, 13-24, 25-36, 37-48, and 49 months or older.

This pattern of a convex relationship between age and the risk of respiratory infection appears to coincide with the pattern of seasonality noted earlier. To determine whether seasonality is responsible, we add the annual harmonic in Model 4. The results are shown in the last column of Table 8.7. In fact, the longitudinal age parameters are now all insignificant. It makes sense that we can learn little about the effects of ageing from data collected over 18 months if we restrict our attention to longitudinal information. This is especially true in the presence of a substantial seasonal signal. On the other hand, much can be learned by comparing children of different ages so long as we can assume that there are no cohort effects, which would confound the inferences about age effects.

Fig. 8.5. The logarithm of the risk of respiratory infection as a function of follow-up time (months) relative to the risk at an individual's first visit.

8.4 Counted responses

8.4.1 Parametric modelling for count data

Count data are increasingly common in the biological sciences. Examples include the number of panic attacks occurring during a six-month interval after receiving treatment for mental illness, the number of sexual partners in a three-month period recorded in an HIV prevention programme, and the number of infant deaths per month before and after introduction of a prenatal care programme. Traditionally, the Poisson distribution has been the most commonly used model for count data. It has the form

$$\Pr(Y = y) = \frac{\mu^y e^{-\mu}}{y!}, \qquad y = 0, 1, 2, \ldots.$$
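The Poisson form above can be checked numerically. The sketch below, offered only as an illustration, evaluates the probabilities on the log scale and confirms that they sum to one and that the mean and variance of the distribution coincide; the value $\mu = 7.5$ and the truncation point are arbitrary choices.

```python
import math

def poisson_pmf(y, mu):
    """Pr(Y = y) = mu**y * exp(-mu) / y!, evaluated on the log scale for stability."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def truncated_moments(mu, y_max=100):
    """Total probability, mean and variance summed over y = 0, ..., y_max."""
    probs = [poisson_pmf(y, mu) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

total, mean, var = truncated_moments(7.5)
print(total, mean, var)
```

A variance-to-mean ratio computed from data can then be compared with the value of one that this model implies.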
The Poisson distribution is completely specified by the parameter $\mu$, which is both the mean and the variance of the distribution, that is, $\mu = E(Y) = \operatorname{Var}(Y)$. Unfortunately, the Poisson assumption that the mean and variance are equal is often inconsistent with empirical evidence. To illustrate the problem, consider the seizure data in Table 1.5, which represent the number of epileptic seizures in each of four two-week intervals, for treatment and control groups with a total of 59 individuals. Table 8.8 gives the ratio of the sample variances to the means of the counts for each treatment-by-visit combination. A high degree of extra-Poisson variation is evident, as the variance-to-mean ratios range from 7 to 39 whereas ratios of one correspond to the Poisson model.

Table 8.8. Variance-to-mean ratios for the epileptic seizure data.

                       Visit
Treatment     1       2       3       4
Treated       38.7    16.8    23.8    18.8
Placebo       10.8    7.5     24.5    7.3

A commonly used model for over-dispersed count data, where the variance exceeds the mean, is the negative-binomial distribution. Here, we assume that given a rate $\mu_i$, the $Y_{ij}$ are independent Poisson variates with mean and variance equal to $\mu_i$. The over-dispersion arises because the $\mu_i$ are assumed to vary across subjects according to a gamma distribution with mean $\mu$ and variance $\phi\mu^2$. Then, the marginal distribution of $Y_{ij}$ has mean $\mu$ and variance $\mu + \phi\mu^2$, which exceeds the Poisson variance when $\phi > 0$. The negative binomial and other random-effects models for count data are discussed in more detail in Chapter 9.

In the regression context, we want to relate the counted response to explanatory variables. The most common assumption is

$$\log E(Y_{ij}) = x_{ij}'\beta, \tag{8.4.1}$$

so that $\beta$ describes the change in the log of the population-average count per unit change in $x$. If $\beta_1$ is the coefficient associated with the treatment assignment variable in the seizure data of Example 1.6, then $\exp(\beta_1)$ represents the ratio of average seizure rates, measured as the number of seizures per two-week period, for the treated patients compared to that among the control patients. A negative value of $\beta_1$ is evidence that the treatment is effective relative to the placebo in controlling the seizure rate.

To account for the over-dispersion, which is not prescribed by the Poisson distribution, one can assume instead that

$$\operatorname{Var}(Y_{ij}) = \phi_{ij} E(Y_{ij}), \tag{8.4.2}$$

with $\phi_{ij} > 1$. To control the number of over-dispersion parameters, a regression model for the $\phi_{ij}$'s is needed. For this, we can assume $\phi_{ij} = \phi(\alpha_1)$, where $\alpha_1$ is a vector of $q_1$ parameters. A simple version of $\phi(\alpha_1)$ would be $\phi_{ij} = \phi_1$ if treated and $\phi_2$ if control. Alternatively, we may allow a different $\phi$, constant across subjects, at different time points.

Until now, we have assumed that the interval lengths, $t_{ij}$, during which the events are observed, are the same for each subject and for each visit. This is appropriate for the seizure data, where the intervals after treatment are all two weeks. A special problem may emerge if the observations are collected at irregular times and $Y_{ij}$ represents the number of events between successive visits. This problem, however, can be easily corrected by decomposing the marginal mean, $\mu_{ij} = E(Y_{ij})$, into the product of $t_{ij}$, the known observation period, and $\lambda_{ij}$, an unknown parameter representing the rate of the counting process per unit time. The log-linear model (8.4.1) can then be applied to $\lambda_{ij}$, that is,

$$\log \lambda_{ij} = x_{ij}'\beta,$$

so that a regression coefficient in $\beta$ is a logarithm of the ratio of rates per unit interval of time. We note that

$$\log E(Y_{ij}) = \log t_{ij} + x_{ij}'\beta.$$

This equation shows that we can take account of different interval lengths by introducing an offset, $\log t_{ij}$, into the log-linear model, as explained in more detail by McCullagh and Nelder (1989).

Thus far, we have ignored parametric models for the correlation among repeated observations on a unit. The most natural models involve random effects, which will be discussed in Chapter 9. Taking a marginal approach, we can specify a parametric model for the correlation coefficient $\rho = \rho(\alpha_2)$, where $\alpha_2$ is a $q_2$-vector of parameters. As with binary responses, the ranges of correlations for pairs of counts are restricted by their means. Sensible models in addition to random effects models for the association among counts need further development.
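The role of the offset described in this section can be sketched in a few lines. The code below assumes a hypothetical coefficient vector and covariate layout (an intercept followed by a treatment indicator). It only illustrates that $\log t_{ij}$ rescales the expected count while leaving the rate ratio $\exp(\beta_1)$ unchanged.

```python
import math

def rate(x, beta):
    """Rate per unit time: lambda = exp(x'beta)."""
    return math.exp(sum(xk * bk for xk, bk in zip(x, beta)))

def expected_count(t, x, beta):
    """E(Y) under log E(Y) = log t + x'beta, that is, t * exp(x'beta)."""
    return t * rate(x, beta)

# Hypothetical coefficients: intercept 1.0 and treatment effect -0.3.
beta = (1.0, -0.3)
control = (1, 0)
treated = (1, 1)

for t in (2, 8):  # interval lengths in weeks
    ratio = expected_count(t, treated, beta) / expected_count(t, control, beta)
    print(t, expected_count(t, control, beta), ratio)
```

Because the offset enters with a coefficient fixed at one, quadrupling the interval length quadruples the expected count for every covariate pattern.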
8.4.2 Generalized estimating equation approach

We have suggested the use of $S_\beta$ in (8.2.4) to estimate the regression coefficient $\beta$ for binary responses. The same estimating function can be used here, except that $S_\beta$ depends not only on $\beta$ and $\alpha_2$ but also on $\alpha_1$ because of the need to account for over-dispersion. Let $\alpha = (\alpha_1, \alpha_2)$ be a vector of $q = q_1 + q_2$ parameters. We can now solve simultaneously the equations $S_\beta = 0$ and

$$S_\alpha(\beta, \alpha) = \sum_{i=1}^{m} \left( \frac{\partial \eta_i}{\partial \alpha} \right)' (W_i - \eta_i) = 0,$$

where $R_{ij} = \{Y_{ij} - \mu_{ij}\}/\{v(\mu_{ij})\}^{1/2}$, $W_i = (R_{i1}R_{i2}, R_{i1}R_{i3}, \ldots, R_{in_i-1}R_{in_i}, R_{i1}^2, \ldots, R_{in_i}^2)'$ and $\eta_i = E(W_i)$. There are two minor discrepancies between this procedure and the one suggested in Section 8.2.3. Firstly, the $n_i$ squared terms of the $R_{ij}$'s have been added to $W_i$ to estimate the over-dispersion parameters, $\phi(\alpha_1)$. Secondly, the diagonal matrix $H_i$ used in (8.2.5) is set to be the identity matrix, so that $(S_\beta, S_\alpha)$ depends on $\beta$ and $\alpha$ only. For binary data, $\operatorname{Cov}(Y_{ij}, Y_{ik})$ for $j \neq k$ is completely specified by (8.2.1) and (8.2.3), and no additional parameters are introduced. However, this is not the case for count data. By forcing $H_i$ to be the identity matrix we potentially lose efficiency in estimating $\alpha$. Our experience has suggested that this loss of efficiency has very little impact on the $\beta$ estimation. Using $H_i = I$ avoids estimating additional higher-order parameters, and this reduces sampling variation.
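To make the construction of $W_i$ concrete, the following sketch builds the standardized residuals and the vector of cross-products and squares for a single subject. It assumes the Poisson variance function $v(\mu) = \mu$; the counts and fitted means are made-up numbers used only for illustration.

```python
import math
from itertools import combinations

def standardized_residuals(y, mu, variance=lambda m: m):
    """R_ij = (Y_ij - mu_ij) / sqrt(v(mu_ij)); v(mu) = mu is an assumed default."""
    return [(yj - mj) / math.sqrt(variance(mj)) for yj, mj in zip(y, mu)]

def w_vector(residuals):
    """W_i: all cross-products R_ij * R_ik (j < k) followed by the n_i squares."""
    n = len(residuals)
    cross = [residuals[j] * residuals[k] for j, k in combinations(range(n), 2)]
    squares = [r * r for r in residuals]
    return cross + squares

# Hypothetical counts and fitted means for one subject with three visits.
y = [4, 1, 9]
mu = [4.0, 4.0, 4.0]

r = standardized_residuals(y, mu)
w = w_vector(r)
print(r)
print(w)
```

Under (8.4.2) each squared residual has expectation $\phi_{ij}$, which is why the squared terms carry the information about over-dispersion while the cross-products carry the information about correlation.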
Example 8.5. Epileptic seizures
We now revisit the seizure count data set considered briefly in Section 8.4.1. Fifty-nine epileptics suffering from partial seizures were randomized to receive either the anti-epileptic drug progabide ($m_1 = 31$) or a placebo ($m_2 = 28$). Patients from the two groups appear to be comparable in terms of baseline age (in years) and eight-week baseline seizure counts, as shown in Table 8.9. The main objective of this study was to determine whether progabide reduces the rate of seizures.

Table 8.9. Summary statistics for the epileptic seizure data.

Group        Age             Baseline seizure counts in 8 weeks
Treatment    27.7 ± 1.19     31.6 ± 5.03
Placebo      29.0 ± 1.13     30.8 ± 4.93

Table 8.10 gives the mean seizure rates per two weeks stratified by treatment group and time (baseline versus visits 1-4). Overall, there is very little change in two-week seizure counts for the treated group (7.96/7.90 = 101%) and a small increase in the placebo group (8.60/7.70 = 112%). The treatment effect, measured by the cross-product ratio, is reported in the last row of Table 8.10. Patient number 207 appears to be very unusual. They had an extremely high seizure count (151 in eight weeks) at baseline and their count doubled after treatment to 302 seizures in eight weeks. If their data are set aside, the cross-product ratio drops from 0.90 to 0.74, giving some indication of a treatment benefit.

Table 8.10. Averaged seizure rates (per two weeks) by treatment and visit.

Treatment              Visit       Seizure rate
Progabide              Baseline    7.90 (6.91)*
                       1-4         7.96 (5.71)*
Placebo                Baseline    7.70
                       1-4         8.60
Cross-product ratio                0.90** (0.74)*

*Summary statistics when patient #207 is deleted.
**0.90 = (7.96/7.90)/(8.60/7.70).

For more formal inference we now use a log-linear regression fitted by the GEE method as described in Sections 8.4.1 and 8.4.2. To estimate the overall treatment effect, we use the following model:

$$\log E(Y_{ij}) = \log t_{ij} + \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{i2} + \beta_3 x_{ij1} \cdot x_{i2}, \qquad j = 0, 1, 2, 3, 4, \quad i = 1, \ldots, 59. \tag{8.4.3}$$

Here, $t_{ij} = 8$ if $j = 0$ and $t_{ij} = 2$ if $j = 1, 2, 3, 4$. The $\log t_{ij}$ term is needed to account for different observation periods. The covariates are defined as

$$x_{ij1} = \begin{cases} 1 & \text{if visit 1, 2, 3 or 4}, \\ 0 & \text{if baseline}, \end{cases} \qquad x_{i2} = \begin{cases} 1 & \text{if progabide}, \\ 0 & \text{if placebo}. \end{cases}$$

The variable $x_{i2}$ is included in the model to allow different distributions of baseline seizure counts between the treated and placebo groups. The parameter $\exp(\beta_1)$ is the ratio of the average seizure rate after treatment to before treatment for the placebo group. The coefficient of interest, $\beta_3$, represents the difference in the logarithm of the post- to pre-treatment ratio between the progabide and placebo groups. A negative coefficient corresponds to a greater reduction (or smaller increase) in the seizure counts for the progabide group.

The results given in Table 8.11 suggest that, overall, there is very little difference between the treatment and placebo groups in the change of seizure counts before and after the randomization ($\hat\beta_3 = -0.10 \pm 0.21$) if patient number 207 is included. If this patient is set aside, there is modest evidence that progabide is favoured over the placebo. Note also that different conclusions would be drawn if the strong over-dispersion (

Table 8.11. Log-linear regression coefficients and robust standard errors (in parentheses) for analysis of seizure rates. The model in (8.4.3) was fitted using GEE assuming exchangeable correlation, with and without patient number 207, who had unusual pre- and post-randomization seizure counts.

Variable                     Complete data    Patient 207 deleted
Intercept                    1.35 (0.16)      1.35 (0.16)
Time (x1)                    0.11 (0.12)      0.11 (0.12)
Treatment (x2)               0.027 (0.22)     -0.11 (0.19)
x1 · x2                      -0.10 (0.21)     -0.30 (0.17)
Over-dispersion parameter    19.4             10.4
Correlation coefficient      0.78             0.60
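The cross-product ratios in Table 8.10 and their link to the interaction coefficient in Table 8.11 can be reproduced with simple arithmetic. The sketch below uses only the rates and coefficients printed in those tables; the function and variable names are illustrative.

```python
import math

def cross_product_ratio(treated_pre, treated_post, control_pre, control_post):
    """Post/pre rate ratio in the treated group divided by that in the control group."""
    return (treated_post / treated_pre) / (control_post / control_pre)

# Two-week seizure rates from Table 8.10.
complete = cross_product_ratio(7.90, 7.96, 7.70, 8.60)
without_207 = cross_product_ratio(6.91, 5.71, 7.70, 8.60)

# exp(beta_3) from Table 8.11 estimates the same ratio on the model scale.
model_complete = math.exp(-0.10)
model_without_207 = math.exp(-0.30)

print(complete, model_complete)
print(without_207, model_without_207)
```

The raw ratios and the exponentiated GEE coefficients agree to about two decimal places, which is a useful sanity check on how $\beta_3$ should be read.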
$$\mu^c_{ij} = E(Y_{ij} \mid \mathcal{H}_{ij}) = \exp(x_{ij}'\beta) \left( \frac{y^*_{ij-1}}{\exp(x_{ij-1}'\beta)} \right)^{\alpha},$$

where $y^*_{ij-1} = \max(y_{ij-1}, d)$ and $0 < d < 1$. The constant $d$ prevents $y_{ij-1} = 0$ from being an absorbing state whereby $y_{ij-1} = 0$ forces all future responses to be 0. Note that when $\alpha > 0$, we have an increased expectation, $\mu^c_{ij}$, when the previous outcome, $y_{ij-1}$, exceeds $\exp(x_{ij-1}'\beta)$. When $\alpha < 0$, a higher value at $t_{ij-1}$ causes a lower value at $t_{ij}$.

Within the linear regression model, the transition model can be formulated with $f_r = \alpha_r(y_{ij-r} - x_{ij-r}'\beta)$ so that $E(Y_{ij}) = x_{ij}'\beta$ whatever the value of $q$. In the logistic and log-linear cases, it is difficult to formulate models in such a way that $\beta$ has the same meaning for different assumptions about the time dependence. When $\beta$ is the scientific focus, the careful data analyst should examine the sensitivity of the substantive findings to the choice of time-dependence model. This issue is discussed below by way of example. Section 7.5 briefly discussed the method of conditional maximum likelihood for fitting the simplest logistic transition model. We now consider estimation in more detail.

10.2 Fitting transition models
As indicated in (7.5.3), in a first-order Markov model the contribution to the likelihood for the $i$th subject can be written as

$$L_i(y_{i1}, \ldots, y_{in_i}) = f(y_{i1}) \prod_{j=2}^{n_i} f(y_{ij} \mid \mathcal{H}_{ij}).$$

In a Markov model of order $q$, the conditional distribution of $Y_{ij}$ is

$$f(y_{ij} \mid \mathcal{H}_{ij}) = f(y_{ij} \mid y_{ij-1}, \ldots, y_{ij-q}),$$

so that the likelihood contribution for the $i$th subject becomes

$$f(y_{i1}, \ldots, y_{iq}) \prod_{j=q+1}^{n_i} f(y_{ij} \mid y_{ij-1}, \ldots, y_{ij-q}).$$

The GLM (10.1.1) specifies only the conditional distribution $f(y_{ij} \mid \mathcal{H}_{ij})$; the likelihood of the first $q$ observations $f(y_{i1}, \ldots, y_{iq})$ is not specified directly. [...]

$$\prod_{i=1}^{m} f(y_{iq+1}, \ldots, y_{in_i} \mid y_{i1}, \ldots, y_{iq}) = \prod_{i=1}^{m} \prod_{j=q+1}^{n_i} f(y_{ij} \mid \mathcal{H}_{ij}). \tag{10.2.1}$$

When maximizing (10.2.1) there are two distinct cases to consider. In the first, $f_r(\mathcal{H}_{ij}; \alpha, \beta) = \alpha_r f_r(\mathcal{H}_{ij})$ so that $h(\mu^c_{ij}) = x_{ij}'\beta + \sum_{r=1}^{s} \alpha_r f_r(\mathcal{H}_{ij})$. Here, $h(\mu^c_{ij})$ is a linear function of both $\beta$ and $\alpha = (\alpha_1, \ldots, \alpha_s)$ so that estimation proceeds as in GLMs for independent data. We simply regress $Y_{ij}$ on the $(p + s)$-dimensional vector of extended explanatory variables $(x_{ij}, f_1(\mathcal{H}_{ij}), \ldots, f_s(\mathcal{H}_{ij}))$.

The second case occurs when the functions of past responses include both $\alpha$ and $\beta$. Examples are the linear and log-linear models discussed above. To derive an estimation algorithm for this case, note that the derivative of the log conditional likelihood or conditional score function has the form

$$S^c(\delta) = \sum_{i=1}^{m} \sum_{j=q+1}^{n_i} \frac{\partial \mu^c_{ij}}{\partial \delta} \, (v^c_{ij})^{-1} (y_{ij} - \mu^c_{ij}) = 0, \tag{10.2.2}$$

where $\delta = (\beta, \alpha)$. This equation is the conditional analogue of the GLM score equation discussed in Appendix A.5. The derivative $\partial \mu^c_{ij}/\partial \delta$ is analogous to $x_{ij}$ but it can depend on $\alpha$ and $\beta$. We can still formulate the estimation procedure as an iterative weighted least squares as follows. Let $Y_i$ be the $(n_i - q)$-vector of responses for $j = q+1, \ldots, n_i$ and $\mu^c_i$ its expectation given $\mathcal{H}_{ij}$. Let $X_i^*$ be an $(n_i - q) \times (p + s)$ matrix with $k$th row $\partial \mu^c_{iq+k}/\partial \delta$ and $W_i = \operatorname{diag}(1/v^c_{iq+k},\ k = 1, \ldots, n_i - q)$ an $(n_i - q) \times (n_i - q)$ diagonal weighting matrix. Finally, let $Z_i = X_i^* \hat\delta + (Y_i - \hat\mu^c_i)$. Then, an updated $\hat\delta$ can be obtained by iteratively regressing $Z$ on $X^*$ using weights $W$.

When the correct model is assumed for the conditional mean and variance, the solution $\hat\delta$ of (10.2.2) asymptotically, as $m$ goes to infinity, follows a Gaussian distribution with mean equal to the true value, $\delta$, and $(p + s) \times (p + s)$ variance matrix

$$V_{\hat\delta} = \left( \sum_{i=1}^{m} X_i^{*\prime} W_i X_i^* \right)^{-1}. \tag{10.2.3}$$

The variance $V_{\hat\delta}$ depends on $\beta$ and $\alpha$. A consistent estimate, $\hat V_{\hat\delta}$, is obtained by replacing $\beta$ and $\alpha$ by their estimates $\hat\beta$ and $\hat\alpha$. Hence a 95% confidence interval for $\beta_1$ is $\hat\beta_1 \pm 2 \sqrt{\hat V_{\hat\delta 11}}$, where $\hat V_{\hat\delta 11}$ is the element in the first row and column of $\hat V_{\hat\delta}$.

If the conditional mean is correctly specified and the conditional variance is not, we can still obtain consistent inferences about $\delta$ by using the robust variance from equation (A.6.1) in Appendix A, which here takes the form

$$V_R = \left( \sum_{i=1}^{m} X_i^{*\prime} W_i X_i^* \right)^{-1} \left( \sum_{i=1}^{m} X_i^{*\prime} W_i V_i W_i X_i^* \right) \left( \sum_{i=1}^{m} X_i^{*\prime} W_i X_i^* \right)^{-1}. \tag{10.2.4}$$

A consistent estimate $\hat V_R$ is obtained by replacing $V_i = \operatorname{Var}(Y_i \mid \mathcal{H}_i)$ in the equation above by its estimate, $(Y_i - \hat\mu^c_i)(Y_i - \hat\mu^c_i)'$. Interestingly, use of the robust variance will often give consistent confidence intervals for $\hat\delta$ even when the Markov assumption is violated. However, in that situation, the interpretation of $\hat\delta$ is questionable since $\mu^c_{ij}(\hat\delta)$ is not the conditional mean of $Y_{ij}$ given $\mathcal{H}_{ij}$.

10.3 Transition models for categorical data

This section discusses Markov chain regression models for categorical responses observed at equally spaced intervals. We begin with logistic models for binary responses and then briefly consider extensions to multinomial and ordered categorical outcomes. As discussed in Section 7.3, a first-order binary Markov chain is characterized by the transition matrix

$$\begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix},$$

where $\pi_{ab} = \Pr(Y_{ij} = b \mid Y_{ij-1} = a)$, $a, b = 0, 1$. For example, $\pi_{01}$ is the probability that $Y_{ij} = 1$ when the previous response is $Y_{ij-1} = 0$. Note that each row of a transition matrix sums to one since $\Pr(Y_{ij} = 0 \mid Y_{ij-1} = a) + \Pr(Y_{ij} = 1 \mid Y_{ij-1} = a) = 1$. As its name implies, the transition matrix records the probabilities of making each of the possible transitions from one visit to the next.

In the regression setting, we model the transition probabilities as functions of covariates $x_{ij} = (1, x_{ij1}, x_{ij2}, \ldots, x_{ijp})$. A very general model uses a separate logistic regression for $\Pr(Y_{ij} = 1 \mid Y_{ij-1} = y_{ij-1})$, $y_{ij-1} = 0, 1$. That is, we assume that

$$\operatorname{logit} \Pr(Y_{ij} = 1 \mid Y_{ij-1} = 0) = x_{ij}'\beta_0$$

and

$$\operatorname{logit} \Pr(Y_{ij} = 1 \mid Y_{ij-1} = 1) = x_{ij}'\beta_1,$$

where $\beta_0$ and $\beta_1$ may differ. In words, this model assumes that the effects of explanatory variables will differ depending on the previous response. A more concise form for the same model is

$$\operatorname{logit} \Pr(Y_{ij} = 1 \mid Y_{ij-1} = y_{ij-1}) = x_{ij}'\beta_0 + y_{ij-1} x_{ij}'\alpha, \tag{10.3.1}$$

so that $\beta_1 = \beta_0 + \alpha$. Equation (10.3.1) expresses the two regressions as a single logistic model which includes as predictors the previous response $y_{ij-1}$ as well as the interaction of $y_{ij-1}$ and the explanatory variables. An advantage of the form in (10.3.1) is that we can now easily test whether simpler models fit the data equally well. For example, we can test whether $\alpha = (\alpha_0, 0)$ so that in (10.3.1) $y_{ij-1} x_{ij}'\alpha = \alpha_0 y_{ij-1}$. This assumption implies that the covariates have the same effect on the response probability whether $y_{ij-1} = 0$ or $y_{ij-1} = 1$. Alternatively, we can test whether a more limited subset of $\alpha$ is zero, indicating that the associated covariates can be dropped from the model. Each of these alternatives is nested within the saturated model so that standard statistical methods for nested models can be applied.

In many problems, a higher order Markov chain may be needed. The second-order model has transition matrix

                            y_ij
y_ij-2    y_ij-1      0          1
0         0           π_000      π_001
0         1           π_010      π_011
1         0           π_100      π_101
1         1           π_110      π_111

Here, $\pi_{abc} = \Pr(Y_{ij} = c \mid Y_{ij-2} = a, Y_{ij-1} = b)$; for example, $\pi_{011}$ is the probability that $Y_{ij} = 1$ given $Y_{ij-2} = 0$ and $Y_{ij-1} = 1$. By analogy with the regression models for a first-order chain, we could now fit four separate logistic regressions, one for each of the four possible histories $(Y_{ij-2}, Y_{ij-1})$, namely $(0, 0)$, $(0, 1)$, $(1, 0)$, and $(1, 1)$, with regression coefficients $\beta_{00}$, $\beta_{01}$, $\beta_{10}$, and $\beta_{11}$, respectively. But it is again more convenient to write a single equation as follows:

$$\operatorname{logit} \Pr(Y_{ij} = 1 \mid Y_{ij-2} = y_{ij-2}, Y_{ij-1} = y_{ij-1}) = x_{ij}'\beta + y_{ij-1} x_{ij}'\alpha_1 + y_{ij-2} x_{ij}'\alpha_2 + y_{ij-1} y_{ij-2} x_{ij}'\alpha_3. \tag{10.3.2}$$

[...] Even in this situation, we must still choose between Markov models of different order. For example, we might start with a third-order model which can be written in the form

$$\operatorname{logit} \Pr(Y_{ij} = 1 \mid Y_{ij-3} = y_{ij-3}, Y_{ij-2} = y_{ij-2}, Y_{ij-1} = y_{ij-1}) = x_{ij}'\beta + \alpha_1 y_{ij-1} + \alpha_2 y_{ij-2} + \alpha_3 y_{ij-3} + \alpha_4 y_{ij-1} y_{ij-2} + \alpha_5 y_{ij-1} y_{ij-3} + \alpha_6 y_{ij-2} y_{ij-3} + \alpha_7 y_{ij-1} y_{ij-2} y_{ij-3}. \tag{10.3.3}$$

A second-order model can be used if the data are consistent with $\alpha_3 = \alpha_5 = \alpha_6 = \alpha_7 = 0$.
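The first-order logistic transition model (10.3.1) can be illustrated with a small sketch that turns a covariate vector and the two coefficient vectors into a transition matrix. The coefficient values below are hypothetical; the point is only that the row for $y_{ij-1} = 1$ uses $\beta_0 + \alpha$ and that each row sums to one.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def transition_matrix(x, beta0, alpha):
    """[[pi_00, pi_01], [pi_10, pi_11]] implied by (10.3.1) for covariates x."""
    rows = []
    for y_prev in (0, 1):
        # Linear predictor x'beta0 + y_prev * x'alpha.
        eta = sum(xk * (bk + y_prev * ak) for xk, bk, ak in zip(x, beta0, alpha))
        p_one = expit(eta)
        rows.append([1.0 - p_one, p_one])
    return rows

# Hypothetical values: intercept plus one covariate.
x = (1.0, 0.5)
beta0 = (-1.0, 0.4)
alpha = (1.5, 0.0)   # only the intercept of alpha is non-zero

matrix = transition_matrix(x, beta0, alpha)
for row in matrix:
    print(row)
```

Setting all of `alpha` to zero makes the two rows identical, which corresponds to no dependence on the previous response.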
I
.
(.
,
a=
0,1, .. , , c
V;
- 2.
Here, each is allowed to have a different intercept but the proportional odds model requires that covariates have the same effect on each Ya•• Our first application of the proportional odds model to a Markov chain for ordered categorical responses is the saturated first-order model without covariates. The transition matrix is 1fab = Pr(Vij = b IYij-I = a), a = 0,1, ... , c - 1. We model the cumulative probabilities,
Table 10.5. Definition of Y· variables for proportional odds modelling of ordered categorical data. V }'/ o
y.1
° 1 1
1
2
°1
o o
(10.3.4)
= a) = a)
=(Jb+X"'~ a I ) fJa
(10.3.5)
for a = 0, ... , C - l', b = 0 , ... "C - 2 As wI'th b'mary responses we c~n rewrit: (1O.~,5) as ~ single (although somewhat complicated) re~res swn equatIOn usmg the mteractions between Xij and the vector of derived 'bles Yij-I • . • vana - (Yij-I.O,"" Yij-I,C-2)
(10,3.6)
°
= 10glt Pr Ya = 1) = Oa + X 13,
_ - 9ab
x·· have a different effect on y..
Following Clayton (1992), it is convenient to introduce the vector of variables Y" = (Yo",~", ... ,YC- 2 ) defined by Ya" = 1, if Y s:: a and otherwise. If C = 3, Y" = (Yo", Yt) takes values as shown in Table 10.5. The proportional odds model is simply a logistic regression for the Y a" , since Pr(Y $ a) (Y ) Pr > a
= a) = a)
203
.. .,~ , h ....,.,umlllg t at
for a = 0,1, . , . , C -1 and b = ' 1,.,., C' - 2. Now suppose th t . C h .' a covanatE'B 'J. I ) lor eae prevIous state Y. . Th d I can be wntten as I)-I· e mo e
= ()a + x'f3,
l e - 2 Here an d for the remainder of this section,, we h were a = O, ,.:., . () and do not include an intercept term m x, write the model mtercepts asp; < a) = e-Oa /(1 +e-Oa). Since Pr(Y s:: a) is () < () < ... < ()C-2' If ()a = ()a+l, Taking x = 0, we see t?at Pr a non decreasing functIOn of a, we have 0 - 1 _ P (Y < + 1) and categories a and a + 1 can therefore then Pr(Y $ a) - r _ a be collapsed. " t t' , . parameters f3 have log odds ratIO mterpre a lOllS, smce The regressIOn
log

1. $\mu_{ij}^c = \exp(x_{ij}'\beta)\{1 + \exp(-\alpha_0 - \alpha_1 y_{ij-1})\}$, $\alpha_0, \alpha_1 > 0$. In this model, suggested by Wong (1986), $\beta$ represents the influence of the explanatory variables when the previous response takes the value $y_{ij-1} = 0$. When $y_{ij-1} > 0$, the conditional expectation is decreased from its maximum value, $\exp(x_{ij}'\beta)\{1 + \exp(-\alpha_0)\}$, by an amount that depends on $\alpha_1$. Hence, this model only allows a negative association between the prior and current responses. For a given $\alpha_0$, the degree of negative correlation increases as $\alpha_1$ increases. Note that the conditional expectation must vary between $\exp(x_{ij}'\beta)$ and twice this value, as a consequence of the constraints on $\alpha_0$ and $\alpha_1$.

2. $\mu_{ij}^c = \exp(x_{ij}'\beta + \alpha y_{ij-1})$. This model appears sensible by analogy with the logistic model (7.3.1). But it has limited application for count data because when $\alpha > 0$, the conditional expectation grows as an exponential function of time. In fact, when $\exp(x_{ij}'\beta) = \mu$, corresponding to no dependence on covariates, this assumption leads to a stationary process only when $\alpha < 0$. Hence, the model can describe negative association but not positive association without growing exponentially over time. This is a time series analogue of the auto-Poisson process discussed for data on a two-dimensional lattice by Besag (1974).

3. $\mu_{ij}^c = \exp[x_{ij}'\beta + \alpha\{\log(y_{ij-1}^*) - x_{ij-1}'\beta\}]$, where $y_{ij-1}^* = \max(y_{ij-1}, d)$, $0 < d < 1$. This is the model introduced by Zeger and Qaqish (1988) and briefly discussed in Section 10.1. When $\alpha = 0$ it reduces to an ordinary log-linear model. When $\alpha < 0$, a prior response greater than its expectation decreases the expectation for the current response and there is negative correlation between $y_{ij-1}$ and $y_{ij}$. When $\alpha > 0$, there is positive correlation.
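As a rough numerical sketch of the three conditional means listed above, the following Python code evaluates each form and simulates one path from the third model with no covariates, so that $\exp(x'\beta) = \mu$. The function names, the simple Knuth-style Poisson sampler, the seed, and the default $d = 0.3$ are illustrative assumptions rather than anything prescribed by the text.

```python
import math
import random

def mean_wong(xb, y_prev, a0, a1):
    """Model 1: exp(x'b) * {1 + exp(-a0 - a1 * y_prev)}, with a0, a1 > 0."""
    return math.exp(xb) * (1.0 + math.exp(-a0 - a1 * y_prev))

def mean_loglinear(xb, y_prev, alpha):
    """Model 2: exp(x'b + alpha * y_prev)."""
    return math.exp(xb + alpha * y_prev)

def mean_zeger_qaqish(xb, xb_prev, y_prev, alpha, d=0.3):
    """Model 3: exp[x'b + alpha * {log(max(y_prev, d)) - x_prev'b}]."""
    y_star = max(y_prev, d)
    return math.exp(xb + alpha * (math.log(y_star) - xb_prev))

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by multiplying uniforms (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(mu, alpha, n, d=0.3, seed=1):
    """Simulate n counts from model 3 with constant mean level mu (an assumed setup)."""
    rng = random.Random(seed)
    xb = math.log(mu)
    y = [poisson_draw(rng, mu)]
    for _ in range(n - 1):
        lam = mean_zeger_qaqish(xb, xb, y[-1], alpha, d)
        y.append(poisson_draw(rng, lam))
    return y

path = simulate(mu=5.0, alpha=0.4, n=50)
```

With `alpha=0.0` the third conditional mean is simply `mu`, which matches the statement that the model then reduces to an ordinary log-linear model.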
For the remainder of this section, we focus on the third Poisson transition model above. It can arise through a simple physical mechanism called a size-dependent branching process. Suppose that $\exp(x_{ij}'\beta) = \mu$, let $y_j$ denote the size of a population at generation $j$, and let $Z_k(y_{j-1})$ denote the number of offspring of individual $k$ in generation $j-1$. Then, for $y_{j-1} > 0$,

$$y_j = \sum_{k=1}^{y_{j-1}} Z_k(y_{j-1}).$$

If $y_{j-1} = 0$, we assume that the population is restarted with $Z_0$ individuals. Now, if we assume that the random variables $Z_k$ are independent Poisson with expectation $(\mu/y_{j-1}^*)^{1-\alpha}$, then the population size will follow the transition model with $\mu_j^c = \mu(y_{j-1}^*/\mu)^{\alpha}$. The assumption about the number of offspring per person represents a crowding effect. When the population is large, each individual tends to decrease their number of offspring. This leads to a stationary process.

Figure 10.1 displays five realizations of this transition model for different values of $\alpha$. When $\alpha < 0$, the sample paths oscillate back and forth about their long-term average level, since a large outcome at one time decreases the conditional expectation of the next response. When $\alpha > 0$, the process meanders, staying below the long-term average for extended periods. Notice that the sample paths have sharper peaks and broader valleys. This pattern is in contrast to Gaussian autoregressive sample paths, for which the peaks and valleys have the same shape. In the Poisson model the conditional variance equals the conditional mean. When by chance we get a large observation, the conditional mean and variance of the next value are both large; that is, the process becomes unstable and quickly falls towards the long-term average. After a small outcome, the conditional mean and variance are small, so the process tends to be more stable. Hence, there are broader valleys and sharper peaks.

To illustrate the application of this transition model for counts, we have fitted it to the seizure data. A priori, there is clear evidence that this model is not appropriate for this data set. The correlations among repeated seizure counts for each person do not decay as the time between observations increases.
For example, the correlations of the square-root-transformed seizure number at the first post-treatment period with those at the second, third, and fourth periods are 0.76, 0.70, and 0.78, respectively. The Markov model implies that these should decrease as an approximately exponential function of lag. Nevertheless, we might fit the first-order model if our goal was simply to predict the seizure rate in the next interval, given only the rate in the previous interval. Because there is only a single observation prior to randomization, we have assumed that the pre-randomization means are the same for the two treatment groups. Letting $d = 0.3$, we estimate the treatment effect (treatment-by-time interaction in Tables 8.11 and 9.6) to be -0.10, with a model-based standard error of 0.083. This standard error is not valid because the model does not accurately capture the actual correlation structure in the data. We have also estimated a robust standard error which is 0.30, much larger than 0.083, reflecting the fact that the correlation does not decay as expected for the transition model. The estimate of treatment effect is not sensitive to the constant $d$; it varies between -0.096 and -0.107 as $d$ ranges from 0.1 to 1.0. The estimate of $\alpha$ for $d = 0.3$ is 0.79 and also is not very sensitive to the value of $d$.

Fig. 10.1. Realizations of the Markov-Poisson time series model: (a) $\alpha = -0.8$; (b) $\alpha = -0.4$; (c) $\alpha = 0.0$; (d) $\alpha = 0.4$; (e) $\alpha = 0.8$.

10.5 Further reading

Markov models have been studied by probabilists and mathematical statisticians for several decades. Feller (1968) and Billingsley (1961) are seminal texts. But there has been little theoretical study of Markov regression models except for the linear model (e.g. Tsay, 1984). This is partly because it is difficult to derive marginal distributions when the conditional distribution is assumed to follow a GLM regression. Regression models for binary Markov chains are in common use. Examples of applications can be found in Korn and Whittemore (1979), Stern and Coe (1984), and Zeger et al. (1985). The reader is referred to papers by Wong (1986), and Zeger and Qaqish (1988) for methods applicable to count data. There has also been very interesting work on Bayesian Markov regression models with measurement error. See, for example, the discussion paper by West et al. (1985) and the volume by Spall (1988).
GENERALIZED LINEAR MIxED MODELS 209
11
Likelihood-based methods for categorical data
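Maximum likelihood algorithms

For low-dimensional random effects, standard numerical integration methods can be used to evaluate the likelihood, solve score equations, and compute model-based information matrices. Gauss-Hermite quadrature uses a fixed set of $K$ ordinates and weights $(z_k, w_k)_{k=1}^{K}$ to approximate the likelihood function. Consider a single scalar random effect $U_i \sim N(0, G)$ and the likelihood given by equation (9.2.5):

$$
\begin{aligned}
L(\delta, y) &= \prod_{i=1}^{m} \int \prod_{j=1}^{n_i} f(y_{ij} \mid X_i, U_i; \beta^c)\, f(U_i; G)\, dU_i \\
&= \prod_{i=1}^{m} \int \Big[\exp\Big\{\sum_{j=1}^{n_i} \log f(y_{ij} \mid X_i, U_i; \beta^c)\Big\}\Big]\, G^{-1/2}\phi(U_i/G^{1/2})\, dU_i \\
&\approx \prod_{i=1}^{m} \sum_{k=1}^{K} w_k \Big[\exp\Big\{\sum_{j=1}^{n_i} \log f(y_{ij} \mid X_i, U_i = G^{1/2} z_k; \beta^c)\Big\}\Big].
\end{aligned}
$$

[...] to the underlying model assumptions [...] approximation to the likelihood at the posterior mode of the random effects. To derive the adaptive quadrature we begin with the likelihood given above and then consider an arbitrary linear transformation for the placement of the quadrature points:

$$
\begin{aligned}
L(\delta, y) &= \prod_{i=1}^{m} \int \Big[\exp\Big\{\sum_{j=1}^{n_i} \log f(y_{ij} \mid X_i, U_i = u; \beta^c)\Big\}\Big]\, G^{-1/2}\phi(u/G^{1/2})\, du \\
&= \prod_{i=1}^{m} \int \Big[\exp\Big\{\sum_{j=1}^{n_i} \log f(y_{ij} \mid X_i, U_i = u; \beta^c)\Big\}\Big]\, G^{-1/2}\, \frac{\phi(u/G^{1/2})}{\phi([u-a]/b)}\, \phi([u-a]/b)\, du \\
&= \prod_{i=1}^{m} \int \Big[\exp\Big\{\sum_{j=1}^{n_i} \log f(y_{ij} \mid X_i, U_i = a + b z; \beta^c)\Big\}\Big]\, G^{-1/2}\, \frac{\phi([a + b z]/G^{1/2})}{\phi(z)}\, b\, \phi(z)\, dz \\
&\approx \prod_{i=1}^{m} \sum_{k=1}^{K} w_k \Big[\exp\Big\{\sum_{j=1}^{n_i} \log f(y_{ij} \mid X_i, U_i = a + b z_k; \beta^c)\Big\}\, G^{-1/2}\, \frac{\phi([a + b z_k]/G^{1/2})}{\phi(z_k)}\, b\Big].
\end{aligned}
$$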
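A small numerical sketch may help to compare the fixed rule, with nodes at $G^{1/2} z_k$, against the linearly transformed rule with nodes at $a + b z_k$. The code below hard-codes a five-point rule for a standard normal weight (nodes are the roots of the probabilists' Hermite polynomial of degree 5, weights normalized to sum to one). The function names and the toy integrand $h(u) = \exp(2u)$, which stands in for a subject's conditional likelihood, are illustrative assumptions only.

```python
import math

def gauss_hermite_5():
    """Five-point Gauss-Hermite rule for a standard normal weight function."""
    r10 = math.sqrt(10.0)
    inner, outer = math.sqrt(5.0 - r10), math.sqrt(5.0 + r10)
    w_inner, w_outer = (7.0 + 2.0 * r10) / 60.0, (7.0 - 2.0 * r10) / 60.0
    nodes = [-outer, -inner, 0.0, inner, outer]
    weights = [w_outer, w_inner, 8.0 / 15.0, w_inner, w_outer]
    return nodes, weights

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fixed_quadrature(h, G):
    """Approximate the integral of h(u) against the N(0, G) density, nodes at sqrt(G) * z_k."""
    nodes, weights = gauss_hermite_5()
    return sum(w * h(math.sqrt(G) * z) for z, w in zip(nodes, weights))

def adaptive_quadrature(h, G, a, b):
    """Same integral, nodes at a + b * z_k, with the ratio-of-normal-densities correction."""
    nodes, weights = gauss_hermite_5()
    total = 0.0
    for z, w in zip(nodes, weights):
        u = a + b * z
        total += w * h(u) * G ** -0.5 * phi(u / math.sqrt(G)) / phi(z) * b
    return total

# Toy integrand: with G = 1, h(u) * phi(u) is proportional to a N(2, 1) density,
# so the choice a = 2, b = 1 places the nodes where the integrand has its mass.
h = lambda u: math.exp(2.0 * u)
exact = math.exp(2.0)
fixed = fixed_quadrature(h, 1.0)
adaptive = adaptive_quadrature(h, 1.0, a=2.0, b=1.0)
```

In this toy case the transformed rule reproduces `exact` to rounding error, while the fixed five-point rule is visibly off, which is the motivation for choosing $(a, b)$ subject by subject.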
The final approximation shows that we can use the Gauss-Hermite quadrature points $(w_k, z_k)_{k=1}^{K}$ after any linear transformation determined by $(a, b)$, as long as the integrand is also modified to contain one additional term, a ratio of normal densities. If the function that we are trying to integrate were of the form $\exp\{-\tfrac{1}{2}(u - a)^2/b^2\}$, then evaluation using the linear transformation $a + b z_k$ would yield an exact result. An adaptive approach uses different values, $a_i$ and $b_i$, for each subject, that provide a quadratic approximation to the $i$th subject's contribution to the log likelihood, $\sum_j \log f(y_{ij} \mid X_i, U_i; \beta^c) - U_i^2/(2G)$. In Section 9.2.2 we showed that $\hat{U}_i = G D_i' V_i^{-1}(Z_i - X_i\beta^c)$ is the posterior mode; we take $a_i = \hat{U}_i$, and $b_i$ is the approximate posterior curvature, $b_i = [\sum_j D_{ij} V(\mu_{ij}^c)^{-1} D_{ij} + G^{-1}]^{-1/2}$, where $D_{ij} = \partial\mu_{ij}^c/\partial b$. Liu and Pierce (1994) discuss how this adaptive quadrature is related to higher order Laplace approximation.

Pinheiro and Bates (1995) studied the accuracy of PQL, fixed quadrature, and adaptive quadrature for the non-linear mixed model. Their results suggest that in order to obtain high accuracy with fixed quadrature methods a large number of quadrature points may be necessary (100 or more), while adaptive quadrature methods proved accurate using 20 points or fewer. Quadrature methods are now implemented in commercial software packages including STATA (fixed quadrature for logistic-normal) and SAS (fixed and adaptive). A key limitation of quadrature methods is that likelihood evaluation requires $K^q$ quadrature points, where $q$ is the dimension of the random effect. For $q$ larger than two, the computational burden can become severe. This limitation makes numerical integration using quadrature an excellent choice for random intercept models, for nested random effects, or for random lines, but prohibitive for multidimensional random effect models such as time series or spatial data models.

Monte Carlo ML algorithms have been developed by McCulloch (1997) and Booth and Hobert (2000). Three methods are described and compared in McCulloch (1997): Monte Carlo Expectation-Maximization (MCEM); Monte Carlo Newton-Raphson (MCNR); and Simulated Maximum Likelihood (SML). Hybrid approaches that first use MCEM or MCNR and then switch to SML are advocated. Booth and Hobert (2000) develop adaptations that aim to automate the Monte Carlo ML algorithms and guide the increase in the Monte Carlo sample sizes that is required for the stochastic algorithm to converge. Booth and Hobert (2000) show that efficient estimation can be realized with random intercept models and crossed random effect models, but suggest that the proposed methods may break down when integrals are of high dimension. The advantage of Monte Carlo ML methods is that the approximation error can be made arbitrarily small simply by increasing the size of the Monte Carlo samples. In contrast, to improve accuracy with quadrature methods a new set of nodes and weights, $(z_k, w_k)_{k=1}^{K}$, must be calculated for a larger value of $K$.

In summary, fixed, adaptive, and stochastic numerical integration methods have been developed and made commercially accessible for use with GLMMs having low-dimensional random effect distributions. However, none of the numerical ML methods have been made computationally practical for models with random effects distributions with $q > 5$. This aspect makes it impossible to use ML for GLMMs that have serial random effects, such as (11.2.3), greatly limiting application for categorical longitudinal data analysis. Further detail regarding methods for integral approximation can be found in Evans and Swartz (1995).

11.2.2 Bayesian methods
Zeger and Karim (1991) discuss use of Gibbs sampling for Bayesian analysis of discrete data using a GLMM. Gibbs sampling is one particular implementation of a wider class of methods for sampling from a posterior distribution known as MCMC. MCMC methods constitute a technical breakthrough for statistical estimation. These methods [...]

MARGINALIZED MODELS

[...] $Y_{ij-1}$, indicate how strongly the past outcomes predict the current response. However, there are situations where we would not want to condition on past outcomes to make inference regarding a covariate $X_i$. For example, most clinical trials are interested in the impact of treatment on the response at a fixed final follow-up time, or on the entire response profile over time. In this case we would not want to condition on outcomes $Y_{ij-1}, Y_{ij-2}, \ldots$, measured after baseline when making inference regarding the effect of treatment on $Y_{ij}$, since earlier outcomes should be properly considered as intermediate variables and not controlled for.

The attractive characterization of serial dependence that a transition model provides can be combined with a marginal regression structure by adopting a marginalized transition model (Azzalini, 1994; Heagerty, 2002). In this section we first review the first-order Markov chain models of Azzalini (1994) and describe an alternative, but equivalent, model specification in terms of the combination of a marginal regression model, used to characterize the dependence of the response on covariates, and a transition model (Chapter 10), used to capture the serial dependence in the response process and identify a likelihood function. We then generalize the first-order marginalized transition model to allow $p$th-order dependence.

Azzalini (1994) introduced a binary Markov chain model to accommodate the serial dependence that is common in longitudinal data. A first-order Markov model assumes that the current response variable is dependent on the history only through the immediate previous response, $E(Y_{ij} \mid Y_{ik}, k < j) = E(Y_{ij} \mid Y_{ij-1})$. The transition probabilities $p_{ij,0} = E(Y_{ij} \mid Y_{ij-1} = 0)$ and $p_{ij,1} = E(Y_{ij} \mid Y_{ij-1} = 1)$ define the Markov process, but do not directly parameterize the marginal mean. Azzalini (1994) parameterizes the transition probabilities through two assumptions. First, a marginal mean regression model is adopted which constrains the transition probabilities to satisfy

$$\mu_{ij}^M = p_{ij,1}\,\mu_{ij-1}^M + p_{ij,0}\,(1 - \mu_{ij-1}^M). \qquad (11.3.10)$$

Second, the transition probabilities are structured through assumptions on the pairwise odds ratio

$$\Psi_{ij} = \frac{p_{ij,1}/(1 - p_{ij,1})}{p_{ij,0}/(1 - p_{ij,0})}, \qquad (11.3.11)$$

which quantifies the strength of serial correlation. The simplest dependence model assumes a time-homogeneous association, $\Psi_{ij} = \Psi_0$. However, models that allow $\Psi_{ij}$ to depend on covariates or to depend on time are also possible. The transition probabilities, and therefore the likelihood, can be recovered as a function of the marginal means, $\mu_{ij}^M$, and the odds ratios, $\Psi_{ij}$. Azzalini (1994) provides details on the calculations required for ML estimation and establishes the orthogonality of the marginal mean and the odds ratio parameter in the restricted case of a time-constant (scalar) dependence model.

Heagerty and Zeger (2000) view the approach of Azzalini (1994) as combining a marginal mean model that captures systematic variation in the response as a function of covariates, with a conditional mean model that describes serial dependence and identifies the joint distribution of $Y_i$. The first-order marginalized transition model, or MTM(1), is specified by first assuming a regression structure for the marginal mean, $\mu_{ij}^M = E(Y_{ij} \mid X_i)$, using a generalized linear model, $h(\mu_{ij}^M) = x_{ij}'\beta^M$. Next, serial dependence is specified by assuming a Markov structure, or equivalently by assuming a regression model for how strongly $Y_{ij-1}$ predicts $Y_{ij}$. Heagerty and Zeger (2000) describe the dependence model for the conditional expectation $\mu_{ij}^c = E(Y_{ij} \mid X_i, Y_{ik}, k < j)$ with logit link:

$$\operatorname{logit}\{E(Y_{ij} \mid X_i, H_{ij})\} = \Delta_{ij}(X_i) + \gamma_{ij,1}\, Y_{ij-1}, \qquad (11.3.12)$$

where $H_{ij} = \{Y_{ik} : k < j\}$ and $\gamma_{ij,1} = \log \Psi_{ij}$. The log odds ratio $\gamma_{ij,1}$ is simply a logistic regression coefficient in the model that conditions on both $X_i$ and $Y_{ij-1}$. The parameter $\Delta_{ij}(X_i)$ equals $\operatorname{logit}(p_{ij,0})$ and is determined implicitly by $\beta^M$ and $\gamma_{ij,1}$ through the marginal regression equation and equation (11.3.12). Furthermore, a general regression model can be specified for $\gamma_{ij,1}$,

$$\gamma_{ij,1} = Z_{ij,1}'\alpha_1, \qquad (11.3.13)$$

where the parameter $\alpha_1$ determines how the dependence of $Y_{ij}$ on $Y_{ij-1}$ varies as a function of covariates, $Z_{ij,1}$. For example, $\gamma_{ij,1} = \alpha_j$ allows serial dependence to change over time, and $\gamma_{ij,1} = \alpha_0 + \alpha_1 Z_i$ allows subjects for whom $Z_i = 1$ to have a different serial correlation as compared to subjects for whom $Z_i = 0$. In general, $Z_{ij}$ is a subset of $X_i$ since we assume that equation (11.3.12) denotes the conditional expectation of $Y_{ij}$ given both $X_i$ and $Z_{ij}$.

In summary, the marginalized transition model separates the specification of the dependence of $Y_{ij}$ on $X_i$ (regression) and the dependence of $Y_{ij}$ on the history $Y_{ij-1}, Y_{ij-2}, \ldots, Y_{i1}$ (auto-correlation) to obtain a fully specified parametric model for longitudinal binary data. A first-order model assumes that $Y_{ij}$ is conditionally independent of $Y_{ij-2}, \ldots, Y_{i1}$ given $Y_{ij-1}$. The transition model intercept, $\Delta_{ij}(X_i)$, is determined such that both the marginal mean structure and the Markov dependence structure are simultaneously satisfied.

Equations (11.3.12) and (11.3.13) indicate how the first-order dependence model can naturally be extended to provide a marginalized transition model of general order, $p$. We assume that $Y_{ij}$ depends on the history only through the previous $p$ responses, $Y_{ij-1}, \ldots, Y_{ij-p}$. A $p$th-order dependence model, or MTM($p$), can be specified through the pair of regressions:

$$h\{E(Y_{ij} \mid X_i)\} = x_{ij}'\beta^M, \qquad (11.3.14)$$

$$\operatorname{logit}\{E(Y_{ij} \mid X_i, H_{ij})\} = \Delta_{ij}(X_i) + \sum_{k=1}^{p} \gamma_{ij,k}\, Y_{ij-k}, \qquad (11.3.15)$$

and we can further assume that the dependence parameters follow the regression structure

$$\gamma_{ij,k} = Z_{ij,k}'\alpha_k, \qquad k = 1, \ldots, p. \qquad (11.3.16)$$
For example, a second-order (additive) marginalized transition model assumes $\operatorname{logit}(\mu_{ij}^c) = \Delta_{ij}(X_i) + \gamma_{ij,1} Y_{ij-1} + \gamma_{ij,2} Y_{ij-2}$, with $\gamma_{ij,1} = Z_{ij,1}'\alpha_1$ and $\gamma_{ij,2} = Z_{ij,2}'\alpha_2$. Although $\mu_{ij}^c$ can also depend on the interaction $Y_{ij-1} \cdot Y_{ij-2}$, we assume an additive model for simplicity of presentation. In the MTM(2) the mean parameter $\beta^M$ describes changes in the average response as a function of covariates, without controlling for previous response variables. Figure 11.3 is a diagram that represents a second-order marginalized transition model. The inner dashed box indicates that the [...]

[...] $\phi_1 = \phi_2 = 0$ (completely random dropouts). Comparison of the RD and CRD lines in Table 13.3 confirms our earlier conclusion that dropouts are not completely random. More interestingly, there is overwhelming evidence in favour of informative dropouts: from the ID and RD lines, the likelihood ratio statistic to test the RD assumption is 12.31 on one degree of freedom, corresponding to a p-value of 0.0005. In principle, rejection of the RD assumption forces us to reassess our model for the underlying measurement process, $Y^*(t)$. However, it turns out that the maximum likelihood estimates of the parameters are virtually the same as under the RD assumption. This is not surprising, as most of the information about these parameters is contained in the 14 weeks of dropout-free data. With regard to the possibility of an increase in the mean response towards the end of the experiment, the maximum likelihood estimates of $\beta_2$ and $\beta_3$ are both close to zero, and the likelihood ratio statistic to test $\beta_2 = \beta_3 = 0$ is 1.70 on two degrees of freedom, corresponding to a p-value of 0.43.

In the case of the milk protein data, our reassessment of the dropout process has not led to any substantive changes in our inferences concerning the mean response profiles for the underlying dropout-free process $Y^*(t)$. This is not always so. See Diggle and Kenward (1994) for examples.

Note also that from a scientific point of view, the analysis reported here is suspect, because the 'dropouts' are an artefact of the definition of time as being relative to calving date, coupled with the termination of the study on a fixed calendar date. This leads us on to a discussion of pattern mixture models as an alternative way of representing potentially informative dropout mechanisms.
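The two p-values quoted for the milk protein data can be checked with closed-form chi-squared tail probabilities, which exist for one and two degrees of freedom. The helper names below are illustrative; only the standard library is used.

```python
import math

def chi2_sf_1df(d):
    """Upper tail of a chi-squared variable on 1 df: P(Z^2 > d) = erfc(sqrt(d / 2))."""
    return math.erfc(math.sqrt(d / 2.0))

def chi2_sf_2df(d):
    """Upper tail of a chi-squared variable on 2 df: exp(-d / 2)."""
    return math.exp(-d / 2.0)

p_rd = chi2_sf_1df(12.31)     # test of the RD assumption against informative dropout
p_trend = chi2_sf_2df(1.70)   # test of beta_2 = beta_3 = 0
```

Both values agree with the reported 0.0005 and 0.43 after rounding.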
13.7.2 Pattern mixture models

Pattern mixture models, introduced by Little (1993), work with the factorization of the joint distribution of $Y^*$ and $D$ into the marginal distribution of $D$ and the conditional distribution of $Y^*$ given $D$, thus $P(Y^*, D) = P(D)P(Y^* \mid D)$. From a theoretical point of view, it is always possible to express a selection model as a pattern mixture model and vice versa, as they are simply alternative factorizations of the same joint distribution. In practice, the two approaches lead to different kinds of simplifying assumptions, and hence to different analyses.

From a modelling point of view, a possible rationale for pattern mixture models is that each subject's dropout time is somehow predestined, and that the measurement process varies between dropout cohorts. This literal interpretation would seem unlikely to apply very often although, as noted above, one exception is the milk protein data originally analysed by Diggle and Kenward (1994) using a selection model. Because the 'dropout times' correspond precisely to different cohorts, the literal interpretation of a pattern mixture model is exactly right for these data.

The arguments in favour of pattern mixture modelling are usually of a more pragmatic kind. First, classification of subjects in a longitudinal trial according to their dropout time provides an obvious way of dividing the subjects into sub-groups after the event, and it is sensible to ask whether the response characteristics which are of primary interest do or do not vary between these sub-groups; indeed, separate inspection of sub-groups defined in this way is a very natural piece of exploratory analysis which many a statistician would carry out without formal reference to pattern mixture models. See, for example, Grieve (1994). Second, writing the joint distribution for $Y^*$ and $D$ in its pattern mixture factorization brings out very clearly those aspects of the model which are assumption-driven rather than data-driven.

As a simple example, consider a trial in which it is intended to take two measurements on each subject, $Y^* = (Y_1^*, Y_2^*)$, but that some subjects drop out after providing the first measurement. Let $f(y \mid d)$ denote the conditional distribution of $Y^*$ given $D = d$, for $d = 2$ (dropouts) and $d = 3$ (non-dropouts). Quite generally, $f(y \mid d) = f(y_1 \mid d) f(y_2 \mid y_1, d)$, but the data can provide no information about $f(y_2 \mid y_1, 2)$ since, by definition, $D = 2$ means that $Y_2^*$ is not observed.

Extensions of the kind of example given above demonstrate that pattern mixture models cannot be identified without placing restrictions on the conditional distributions $f(y \mid d)$. For example, Little (1993) discusses the use of complete case missing variable restrictions, which correspond to assuming that for each $d < n + 1$ and $t \ge d$,

$$f(y_t \mid y_1, \ldots, y_{t-1}, d) = f(y_t \mid y_1, \ldots, y_{t-1}, n + 1).$$

At first sight pattern mixture models do not fit naturally into Rubin's hierarchy. However, Molenberghs et al. (1997) show that random dropout corresponds precisely to the following set of restrictions, which they call available case missing value restrictions:

$$f(y_t \mid y_1, \ldots, y_{t-1}, d) = f(y_t \mid y_1, \ldots, y_{t-1}, D > t).$$

This result implies that the hypothesis of random dropout cannot be tested without making additional assumptions to restrict the class of alternatives under consideration, since the available case missing value restrictions cannot be verified empirically.

The identifiability problems associated with informative dropout models, and the impossibility of validating a random dropout assumption on empirical evidence alone, serve as clear warnings that the analysis of a longitudinal data-set with dropouts needs to be undertaken with extreme caution. However, in the author's opinion this is no reason to adopt the superficially simpler strategy of automatically assuming that dropouts are ignorable.
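For the two-measurement example, the effect of an identifying restriction can be sketched numerically. Under the complete case restriction, $f(y_2 \mid y_1, 2)$ is taken to equal $f(y_2 \mid y_1, 3)$, so the completers' regression of $y_2$ on $y_1$ is carried over to the dropouts. The code below assumes, purely for illustration, that this conditional mean is linear; the data and function names are hypothetical.

```python
def fit_line(pairs):
    """Least-squares intercept and slope for y2 regressed on y1 among completers."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def ccmv_mean_y2(completers, dropout_y1):
    """Marginal mean of Y2 when dropouts borrow the completers' E(Y2 | Y1)."""
    intercept, slope = fit_line(completers)
    imputed = [intercept + slope * y1 for y1 in dropout_y1]
    total = sum(y2 for _, y2 in completers) + sum(imputed)
    return total / (len(completers) + len(dropout_y1))

# Hypothetical data: (y1, y2) for completers (d = 3), y1 only for dropouts (d = 2)
completers = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
dropout_y1 = [4.0, 5.0]

mean_completers_only = sum(y2 for _, y2 in completers) / len(completers)
mean_under_ccmv = ccmv_mean_y2(completers, dropout_y1)
```

In this toy data set the dropouts had higher first measurements, so the restricted estimate of the mean of $Y_2^*$ (6.0) exceeds the completers-only mean (4.0). The difference is driven entirely by the restriction, which the data cannot check.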
Example 13.3. Protein content of milk samples (concluded)

The first stage in a pattern mixture analysis of these data is to examine the data separately within each dropout cohort. The result is shown in Fig. 13.2. The respective cohort sizes are 41, 5, 4, 9 and 20, making it difficult to justify a detailed interpretation of the three intermediate cohorts. The two extreme cohorts produce sharply contrasting results. In the first, 19-week cohort the observed mean response profiles are well separated, approximately parallel and show a three-phase response, consisting of a sharp downward trend during the initial three weeks followed by a gentle rising trend which then levels off from around week 14. In the 15-week cohort, there is no clear separation amongst the three treatment groups and the trend is steadily downward over the 15 weeks. We find it hard to explain these results, and have not attempted a formal analysis. It is curious that recognition of the cohort effects in these data seems to make the interpretation more difficult.
13.7.3 Random effect models

Random effect models are extremely useful to the longitudinal data analyst. They formalize the intuitive idea that a subject's pattern of responses in a study is likely to depend on many characteristics of that subject, including some which are unobservable. These unobservable characteristics are then included in the model as random variables, that is, as random effects. It is therefore natural to formulate models in which a subject's propensity to drop out also depends on unobserved variables, that is, on random effects, as in Wu and Carroll (1988) or Wu and Bailey (1989). In the present context, a simple formulation of a model of this kind would be to postulate a bivariate random effect, $U = (U_1, U_2)$, and to model the joint distribution [...]

[...] hierarchy, the dropouts in (13.7.7) are completely random if $U_1$ and $U_2$ are independent, whereas if $U_1$ and $U_2$ are dependent then in general the dropouts are informative. Strictly, the truth of this last statement depends on the precise formulation of $f_1(y \mid u_1)$ and $f_2(d \mid u_2)$. For example, it would be within the letter, but clearly not the spirit, of (13.7.7) to set either $f_1(y \mid u_1) = f_1(y)$ or $f_2(d \mid u_2) = f_2(d)$, in which case the model would reduce trivially to one of completely random dropouts whatever the distribution of $U$.
':'.,.,;i(t) = f.l + 15k + 8k t + 'Yk .
= 1,2,3
!J>
01 02 03 81 B2 83 1'1 1'2 1'3
88.586 0 0.715 -0.946 -0.207 0.927 -2.267 -0.113 -0.129 0.106
0.956 0 1.352 0.552 0.563 0.589 0.275 0.081 0.088 0.039
95
90 \ \ \ \
\
.............
\
, Q)
UI
85
c
0 C-
\
"
\
'\, " \
l!?
"
"
"
",
\
,
\
C1l
....................."'\\...
"
\
c
....,.......
,
\
UI
.•....
\.,\
\
Q)
~
.....
\
\ \
80
\ \ \
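Covariance structure: $\gamma(u) = \sigma^2\{\alpha_1 + 1 - \exp(-\alpha_3 u^2)\}$, $\operatorname{Var}(Y) = \sigma^2(1 + \alpha_1 + \alpha_2)$

$\sigma^2$     170.091
$\alpha_1$     0.560
$\alpha_2$     0.951
$\alpha_3$     0.056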
[...] the data. Notice, however, that the estimate of $\alpha_1$ involves an extrapolation to zero time-lag which is heavily influenced by the assumed parametric form of the correlation function.

Figure 13.7 shows the observed and fitted mean responses in the haloperidol, placebo and risperidone groups. On the face of it, the fit is qualitatively wrong, but the diagram is not comparing like with like. As discussed in Section 13.7, the observed means are estimating the mean response at each observation time conditional on not having dropped out prior to the observation time in question, whereas the fitted means are actually estimating what the mean response would have been had there been no dropouts. Because, as Fig. 13.4 suggested and as we shall shortly confirm, dropout is associated with an atypically high response, the non-dropout subpopulation becomes progressively more selective in favour of low responding subjects, leading to a correspondingly progressive reduction in the observed mean of the non-dropouts relative to the unconditional mean. Note in particular that the fitted mean in the placebo group is approximately constant, whereas a naive interpretation of the observed mean would have suggested a strong placebo effect.

Fig. 13.7. Observed means in the placebo, haloperidol and risperidone treatment groups, plotted against time (weeks) and compared with fitted means from an analysis ignoring dropouts.

It is perhaps worth emphasizing that the kind of fitted means illustrated in Fig. 13.7 would be produced routinely by software packages which include facilities for modelling correlated data. They are the result of a likelihood-based fit to a model which recognizes the likely correlation between repeated measurements on the same subject but treats missing values as ignorable in the Little and Rubin sense. As discussed in Section 13.7, estimates of this kind may or may not be appropriate to the scientific questions posed in a particular application, but in any event the qualitative discrepancy between the observed and fitted means surely deserves further investigation.
We therefore proceed to a joint analysis of measurements and dropouts. The empirical behaviour of Fig. 13.4, in which the mean response within each dropout cohort shows a sharp increase immediately before dropout, coupled with the overwhelming preponderance of 'inadequate response' as the stated reason for dropout, suggests modelling the probability of dropout as a function of the measured response, that is, a selection model. In the first instance, we fit a simple logistic regression model for dropout, with the most recent measurement as an explanatory variable. Thus, if $p_{ij}$ denotes the probability that patient $i$ drops out at the $j$th time-point (so that the $j$th and all subsequent measurements are missing), then

$$\operatorname{logit}(p_{ij}) = \phi_0 + \phi_1 y_{i,j-1}. \qquad (13.8.2)$$

The parameter estimates for this simple dropout model are $\hat\phi_0 = -4.817$ and $\hat\phi_1 = 0.031$. In particular, the positive estimate of $\phi_1$ confirms that high responders are progressively selected out by the dropout process. Within this assumed dropout model, the log-likelihood ratio statistic to test the sub-model with $\phi_1 = 0$ is $D = 103.3$ on 1 degree of freedom, which is overwhelmingly significant. We therefore reject completely random dropout in favour of random dropout.

At this stage, the earlier results for the measurement process obtained by ignoring the dropout process remain valid, as they rely only on the dropout process being either completely random or random. Within the random dropout framework, we now consider two possible extensions to the model: including a dependence on the previous measurement but one; and including a dependence on the treatment allocation. Thus, we replace (13.8.2) by

$$\operatorname{logit}(p_{ij}) = \beta_{0k} + \beta_1 y_{i,j-1} + \beta_2 y_{i,j-2}, \qquad (13.8.3)$$

where $k = k(i)$ denotes the treatment allocation for the $i$th subject. Both extensions yield a significant improvement in the log-likelihood, as indicated in the first three lines of Table 13.6.

Table 13.6. Maximized log-likelihoods under different dropout models.

logit($p_{ij}$)                                                            Log-likelihood
$\beta_0 + \beta_1 y_{i,j-1}$                                              [...]
$\beta_0 + \beta_1 y_{i,j-1} + \beta_2 y_{i,j-2}$                          [...]
$\beta_{0k} + \beta_1 y_{i,j-1} + \beta_2 y_{i,j-2}$                       [...]
$\beta_{0k} + \gamma y_{ij} + \beta_1 y_{i,j-1} + \beta_2 y_{i,j-2}$       [...]

Finally, we test the random dropout assumption by embedding (13.8.3) within the informative dropout model

$$\operatorname{logit}(p_{ij}) = \beta_{0k} + \gamma y_{ij} + \beta_1 y_{i,j-1} + \beta_2 y_{i,j-2}. \qquad (13.8.4)$$
From lines 3 and 4 of Table 13.6 we compute the log-likelihood ratio statistic to test the sub-model of (13.8.4) with $\gamma = 0$ as $D = 7.4$ on 1 degree of freedom. This corresponds to a p-value of 0.007, leading us to reject the random dropout assumption in favour of informative dropout. We emphasize at this point that rejection of random dropout is necessarily pre-conditioned by the particular modelling framework adopted, as a consequence of the Molenberghs et al. (1997) result. Nevertheless, the unequivocal rejection of random dropout within this modelling framework suggests that we should establish whether the conclusions regarding the measurement process are materially affected by whether or not we assume random dropout.

Table 13.7 shows the estimates of the covariance parameters under the random and informative dropout models. Some of the numerical changes are substantial, but the values of the fitted variogram within the range of time-lags encompassed by the data are almost identical, as is demonstrated in Fig. 13.8.

Of more direct practical importance in this example is the inference concerning the mean response. Under the random dropout assumption, a linear hypothesis concerning the mean response profiles can be tested using either a generalized likelihood ratio statistic, comparing the maximized log-likelihoods for the two models in question, or a quadratic form based on the approximate multivariate normal sampling distribution of the estimated mean parameters in the full model. Under the informative dropout assumption, only the first of these methods is available, because the current methodology does not provide standard errors for the estimated treatment effects within the informative dropout model.

Under the random dropout model (13.8.3), the generalized likelihood ratio statistic to test the hypothesis of no difference between the three mean response profiles is $D = 42.32$ on 6 degrees of freedom, whereas under the [...]

Table 13.7. Maximum likelihood estimates of covariance parameters under random dropout and informative dropout models.

[...]
... experimental treatments. For a non-parametric analysis, we can simply apply the method separately to subjects from each treatment group. From an interpretive point of view it is preferable to use a common value of the smoothing constant, h. When the data are from an experiment comparing several treatments, the mean-squared-error criterion (14.1.4) for choosing h consists of a sum of contributions from within each treatment group. Covariance structure is estimated from the empirical variogram of the residuals pooled across treatment groups, assuming a common covariance structure in all groups. Another possibility is to model treatment contrasts parametrically using a linear model. This gives the following semi-parametric specification of a model for the complete set of data,
\[ Y_{ij} = x_{ij}'\beta + \mu(t_{ij}) + \epsilon_{ij}, \qquad (14.1.9) \]
where $x_{ij}$ is a p-element vector of covariates. This formulation also covers situations in which explanatory variables other than indicators of treatment group are relevant, as will be the case in many observational studies. For the extended model (14.1.9), the kernel method for estimating $\mu(t)$ can be combined iteratively with a generalized least squares calculation for $\beta$, as follows:
1. Given the current estimate, $\hat\beta$, calculate residuals, $r_{ij} = y_{ij} - x_{ij}'\hat\beta$, and use these in place of $y_{ij}$ to calculate a kernel estimate, $\hat\mu(t)$.
2. Given the current kernel estimate, $\hat\mu(t)$, calculate residuals, $r_{ij} = y_{ij} - \hat\mu(t_{ij})$, and update the estimate $\hat\beta$ by generalized least squares, $\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}r$, where X is the matrix with p columns and rows $x_{ij}'$, V is the assumed block-diagonal covariance matrix of the data and r is the vector of residuals, $r_{ij}$.
3. Repeat steps (1) and (2) to convergence.

This algorithm is an example of the back-fitting algorithm described by Hastie and Tibshirani (1986). Typically, few iterations are required unless there is a near-confounding between the linear and non-parametric parts of the model. This might well happen if, for example, we included a polynomial time-trend in the linear part. Further details of the algorithm, and a discussion of the asymptotic properties of the resulting estimators, are given in Zeger and Diggle (1994) and Moyeed and Diggle (1994).
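The alternation between a kernel step and a regression step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the text: a single scalar covariate, a working covariance V equal to the identity (so that generalized least squares reduces to ordinary least squares), and a Gaussian-kernel weighted average as the kernel estimate. The function names `kernel_smooth` and `backfit`, and the simulated data, are hypothetical.

```python
import math

def kernel_smooth(times, values, h):
    """Gaussian-kernel weighted average of values, evaluated at each observed time."""
    fitted = []
    for t0 in times:
        w = [math.exp(-0.5 * ((t - t0) / h) ** 2) for t in times]
        fitted.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return fitted

def backfit(y, x, times, h, tol=1e-8, max_iter=200):
    """Alternate a kernel estimate of mu(t) with a least-squares update of a scalar beta."""
    beta = 0.0
    mu = [0.0] * len(y)
    for _ in range(max_iter):
        # Step 1: smooth the residuals y - x * beta against time.
        mu = kernel_smooth(times, [yi - xi * beta for yi, xi in zip(y, x)], h)
        # Step 2: regress the residuals y - mu on x (assumed V = I, so GLS is OLS).
        r = [yi - mi for yi, mi in zip(y, mu)]
        new_beta = sum(xi * ri for xi, ri in zip(x, r)) / sum(xi * xi for xi in x)
        # Step 3: stop once beta has stabilised.
        if abs(new_beta - beta) < tol:
            return new_beta, mu
        beta = new_beta
    return beta, mu

# Simulated example: two groups measured at the same 40 times,
# smooth time trend sin(t) and a treatment effect of 2.
times = [k / 10 for k in range(40)] * 2
x = [0.0] * 40 + [1.0] * 40
y = [math.sin(t) + 2.0 * xi for t, xi in zip(times, x)]
beta_hat, mu_hat = backfit(y, x, times, h=0.3)
print(round(beta_hat, 2))
```

With a non-identity V, step 2 would use the full generalized least squares formula, and a vector-valued covariate would need a matrix solve in place of the scalar division.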
Fig. 14.1. CD4+ cell counts with kernel estimate and pointwise confidence limits for the mean response profile (horizontal axis: years since seroconversion).
Fig. 14.2. CD4+ cell counts: observed and fitted variograms, $\gamma(u)$ against u. *: sample variogram; - - -: sample variance; solid line: fitted model.

time the data were collected, it was recommended that prophylactic AZT therapy should begin (Volberding et al., 1990). As with Example 5.1 on the protein contents of milk samples, the interpretation of the fitted mean response is complicated by the possibility that subjects who become very ill may drop out of the study, and these subjects may also have unusually low CD4+ counts.

14.1.1 Further reading

Non- and semi-parametric methods for longitudinal data are areas of current research activity. See Brumback and Rice (1998), Wang (1998), and Zhang et al. (1998) for methods based on smoothing splines. Lin and Carroll (2000) develop methods based on local polynomial kernel regression, and Lin and Ying (2001) consider methods for irregularly spaced longitudinal data.

14.2 Non-linear regression modelling

In this section, we consider how to extend the framework of Chapter 5 to non-linear models for the mean response. Davidian and Giltinan (1995) give a much more detailed account of this topic. Another useful review, from a somewhat different perspective, is Glasbey (1988). The cross-sectional form of the non-linear regression model is
\[ Y_i = \mu(x_i; \beta) + Z_i, \quad i = 1, \ldots, n, \qquad (14.2.1) \]
where the $Z_i$ are mutually independent deviations from the mean response and are assumed to be normally distributed with mean zero and common variance $\sigma^2$, whilst the mean response function, $\mu(\cdot)$, is a non-linear function of explanatory variables $x_i$ measured on the ith subject and parameters $\beta$. For example, an exponential growth model with a single explanatory variable x would specify
\[ \mu(x; \beta) = \beta_1 \exp(\beta_2 x). \]
Some non-linear models can be converted to a superficially linear form by transformation. For example, the simple exponential growth model above can be expressed as
\[ \mu^*(x; \beta^*) = \beta_1^* + \beta_2 x, \]
where $\mu^*(\cdot) = \log \mu(\cdot)$ and $\beta_1^* = \log \beta_1$. However, it would not then be consistent with the original, non-linear formulation simply to add an assumed zero-mean, constant variance error term to define the linear regression model,
\[ Y_i^* = \mu^*(x_i; \beta^*) + Z_i, \quad i = 1, \ldots, n, \]
as this would be equivalent to assuming a multiplicative, rather than additive, error term on the unlogged scale. Moreover, there are many forms of non-linear regression model for which no transformation of $\mu(\cdot)$, $\beta$ and/or x can yield an equivalent linear model. Methods for fitting the model (14.2.1) to cross-sectional data are described in Bates and Watts (1988). In the longitudinal setting, we make the notational extension to
\[ Y_{ij} = \mu(x_{ij}; \beta) + Z_{ij}, \quad j = 1, \ldots, n_i; \; i = 1, \ldots, n, \qquad (14.2.2) \]
where, as usual, i indexes subjects and j occasions within subjects, and consider two generalizations of the cross-sectional model:

1. Correlated error structures: the sequence of deviations $Z_{ij}$; $j = 1, \ldots, n_i$ within the ith subject may be correlated;
2. Non-linear random effects: one or more of the regression parameters $\beta$ may be modelled as subject-specific stochastic perturbations of a population average value; thus in (14.2.2) $\beta$ is replaced by a random vector $B_i$, realised independently for each subject from a distribution (usually assumed to be multivariate Gaussian) with mean $\beta$ and variance matrix $V_\beta$.

Recall that in the linear case considered in Chapter 5 we could treat these two cases as one, because in that context the assumption of randomly varying subject-specific parameters leaves the population mean response unchanged and affects only the form of the covariance structure of the $Y_{ij}$.
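The effect of a random coefficient on the population mean can be checked numerically. The sketch below draws a subject-specific slope B from a Gaussian distribution and averages two mean functions over subjects: a linear one, B x, and an exponential one, exp(B x). The parameter values, the function name `mean_response` and the choice of an exponential mean are illustrative assumptions, not taken from the text. For the exponential case the exact population mean follows from the log-normal distribution, exp(βx + sd²x²/2), which differs from exp(βx).

```python
import math
import random

def mean_response(g, beta, sd, x, n_draws=200_000, seed=1):
    """Monte Carlo estimate of E[g(B * x)] when B ~ N(beta, sd^2)."""
    rng = random.Random(seed)
    return sum(g(rng.gauss(beta, sd) * x) for _ in range(n_draws)) / n_draws

beta, sd, x = 0.5, 0.3, 2.0

# Linear mean: averaging over subjects recovers beta * x.
linear_mean = mean_response(lambda eta: eta, beta, sd, x)

# Exponential mean: averaging over subjects does not recover exp(beta * x).
exp_mean = mean_response(math.exp, beta, sd, x)
exact_exp_mean = math.exp(beta * x + 0.5 * (sd * x) ** 2)

print(linear_mean, beta * x)
print(exp_mean, exact_exp_mean, math.exp(beta * x))
```

In the linear case the random slope changes only the spread of the responses, whereas in the exponential case the population-average curve lies above the curve evaluated at the average parameter.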
For non-linear models, as for the generalized linear models discussed in Chapter 7, the two cases have different implications, both for statistical analysis and for the interpretation of the model parameters.

Correlated errors

... (14.2.2) with a parametric specification ...
there are no interactions of item and subject; the subject deviation is the same for all items. To complete the specification, we can assume $\delta_k$ and $b_i$ are independent, mean zero Gaussian variables with variances $D_\delta$ and $D_b$, respectively. In some applications, we might allow only a subset of the coefficients to have random effects so that $D_\delta$ or $D_b$ might be degenerate. This multilevel formulation provides a lower-dimensional parameterization of the variance matrix Var(Y) = V. To see the specifics, we write
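\[ Y_i = X_i(\beta \otimes 1_{30} + \delta + b_i \otimes 1_{30}) + \epsilon_i, \]
in which $Y_i$ is 60 × 1, $X_i$ is 60 × 90, $\delta$ is 90 × 1 and $\epsilon_i$ is 60 × 1, where $1_{30}$ is a 30 × 1 vector of ones and $\delta = (\delta_1', \delta_2', \ldots, \delta_{30}')'$. Then, we have
\[ Y_i = X_i(\beta \otimes 1_{30}) + \epsilon_i^*, \qquad \epsilon_i^* = X_i(\delta + b_i \otimes 1_{30}) + \epsilon_i, \]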
The word 'asymptotic' in this example means that both $n_1$ and $n_2$ are large.

Likelihood inference proceeds by fitting a series of sub-models which are nested. This means that each sub-model in the sequence is contained within the previous one. In Example A.1, an interesting hypothesis, or sub-model, to test is that $\theta_1 = 0$, corresponding to equality of $p_1$ and $p_2$. The difference between this sub-model and the full model with no restriction on $\theta_1$ can be examined by calculating the likelihood ratio test statistic, which is defined as
\[ G = 2\{\log L(\hat\theta \mid y) - \log L(\hat\theta_0 \mid y)\}, \]
where $\hat\theta_0$ and $\hat\theta$ are the maximum likelihood estimates of $\theta$ under the null hypothesis or sub-model, and the unrestricted model, respectively. Assuming that the sub-model is correct, the sampling distribution of G is approximately chi-squared, with number of degrees of freedom equal to the difference between the numbers of parameters specified under the sub-model and the unrestricted model. An alternative testing procedure is to examine the score statistic, $S(\theta)$, as in (A.4.1). The score test statistic is
\[ S(\hat\theta_0) V(\hat\theta_0) S'(\hat\theta_0), \]
whose null sampling distribution is also chi-squared, with the same degrees of freedom as for the likelihood ratio test statistic. In either case the sub-model is rejected in favour of the unrestricted model if the test statistic is too large.

Example A.1. (continued) Suppose that we want to test whether the probabilities $p_1$ and $p_2$ from two treatments are identical. This is equivalent to testing the sub-model with $\theta_1 = 0$. Note that the value of $\theta_2$ is unspecified by the null hypothesis and therefore has to be estimated. The algebraic form of the likelihood ratio test statistic G is complicated, and we do not give it here, although in applications it can easily be evaluated numerically. The score statistic has the simple form
\[ \frac{(Y_1 - E_1)^2}{E_1} + \frac{(Y_2 - E_2)^2}{E_2}, \]
where $E_i = n_i(Y_1 + Y_2)/(n_1 + n_2)$ is the expected value for $Y_i$ under the null model that the two groups are the same. This statistic is also known as the Pearson's chi-squared test statistic. The number of degrees of freedom in this example is one, because the sub-model has one parameter, whereas the unrestricted model has two.
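The numerical evaluation of the likelihood ratio statistic for two binomial samples can be sketched as follows. The counts are hypothetical, the function names are illustrative, and the binomial coefficient is dropped from the log-likelihood because it cancels in G. The p-value uses the fact that a chi-squared variable on one degree of freedom is the square of a standard Gaussian variable, so its upper tail probability is erfc(sqrt(G/2)).

```python
import math

def _xlogp(count, p):
    """count * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(p)

def binom_loglik(y, n, p):
    """Binomial log-likelihood, omitting the constant log C(n, y)."""
    return _xlogp(y, p) + _xlogp(n - y, 1.0 - p)

def lr_statistic(y1, n1, y2, n2):
    """G = 2{log L(unrestricted) - log L(sub-model with p1 = p2)}."""
    p1, p2 = y1 / n1, y2 / n2
    p0 = (y1 + y2) / (n1 + n2)  # pooled estimate under the sub-model
    full = binom_loglik(y1, n1, p1) + binom_loglik(y2, n2, p2)
    null = binom_loglik(y1, n1, p0) + binom_loglik(y2, n2, p0)
    return 2.0 * (full - null)

def chi2_sf_1df(g):
    """Upper tail probability of a chi-squared variable on 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))

# Hypothetical data: 10 responses out of 50 versus 40 out of 100.
G = lr_statistic(10, 50, 40, 100)
print(G, chi2_sf_1df(G))
```

A large value of G, and hence a small tail probability, points towards rejecting the sub-model of equal probabilities.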
A.5 Generalized linear models

Regression models for independent discrete and continuous responses have been unified under the class of generalized linear models, or GLMs (McCullagh and Nelder, 1989), thus providing a common body of statistical methodology for different types of response. Here, we review the salient features of this class of models. We begin by considering two particular GLMs, logistic and Poisson regression models, and then discuss the general class. Because GLMs apply to independent responses, we focus on the cross-sectional situation, as in Section A.2, with a single response $Y_i$ and a vector $x_i$ of p explanatory variables associated with each of m experimental units. The objective is to describe the dependence of the mean response, $\mu_i = E(Y_i)$, on the explanatory variables.

A.5.1 Logistic regression
This model has been used extensively for dichotomous response variables such as the presence or absence of a disease. The logistic model assumes that the logarithm of the odds of a positive response is a linear function of explanatory variables, so that
\[ \log \frac{\Pr(Y_i = 1)}{\Pr(Y_i = 0)} = \log \frac{\mu_i}{1 - \mu_i} = x_i'\beta. \]
Figure A.1 shows plots of Pr(Y = 1) against a single explanatory variable x for several values of $\beta$. A major distinction between the logistic regression model and the linear model in Section A.2 is that the linearity applies to a transformation of the expectation of $Y_i$, in this case the log odds transformation, rather than to the expectation itself. Thus, the regression coefficients $\beta$ represent the change of the log odds of the response variable per unit change of x. Another feature of the dichotomous response variable is that the variance of $Y_i$ is completely determined by its mean, $\mu_i$. Specifically,
\[ \mathrm{Var}(Y_i) = \mu_i(1 - \mu_i). \]
This is to be contrasted with the linear model, where $\mathrm{Var}(Y_i)$ is usually assumed to be a constant, $\sigma^2$, which is independent of the mean.
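A short numerical check may help fix these two properties. The sketch below, with hypothetical function names and an arbitrary coefficient value, evaluates the logistic mean for a single explanatory variable, confirms that a one-unit change in x shifts the log odds by exactly β, and evaluates the variance μ(1 - μ) of a dichotomous response.

```python
import math

def logistic_mean(x, beta):
    """Pr(Y = 1) under the logistic model with a single explanatory variable."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def log_odds(mu):
    """Log odds of a positive response."""
    return math.log(mu / (1.0 - mu))

def bernoulli_variance(mu):
    """Variance of a dichotomous response with mean mu."""
    return mu * (1.0 - mu)

beta = 2.0  # illustrative value only
x = 0.3
# The change in log odds per unit change of x equals beta.
change = log_odds(logistic_mean(x + 1.0, beta)) - log_odds(logistic_mean(x, beta))
print(change)

# The variance is determined by the mean and is largest at mu = 0.5.
for mu in (0.1, 0.5, 0.9):
    print(mu, bernoulli_variance(mu))
```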
Fig. A.1. The logistic model, $p(x) = \exp(\beta x)/\{1 + \exp(\beta x)\}$, plotted against x; solid line: $\beta = -0.5$; dotted line: $\beta = 1$; dashed line: $\beta = 2$.

A.5.2 Poisson regression

Poisson regression, or log-linear, models are applicable to problems in which the response variable represents the number of events occurring in a fixed period of time. One instance is the number of seizures in a given time-interval, as in Example 1.6. Because of the discrete and non-negative nature of count data, a reasonable assumption is that the logarithm of the expected count is a linear function of explanatory variables, so that
\[ \log E(Y_i) = x_i'\beta. \]
Here, the regression coefficient for a particular explanatory variable can be interpreted as the logarithm of the ratio of expected counts before and after a one unit increase in that explanatory variable, with all other explanatory variables held constant. The term 'Poisson' refers to the distribution for counts derived by Poisson (1837),
\[ p(y) = \exp(-\mu)\mu^y / y!, \quad y = 0, 1, \ldots. \]
As in logistic regression, the assumption that $Y_i$ follows a Poisson distribution implies that the variance of $Y_i$ is determined by its mean. In this case, the mean and variance are the same, so that
\[ \mathrm{Var}(Y_i) = E(Y_i) = \exp(x_i'\beta). \]

A.5.3 The general class

Linear, logistic and Poisson regression models are all special cases of generalized linear models, which share the following features. First, the mean response, $\mu_i = E(Y_i)$, is assumed to be related to a vector of covariates, $x_i$, through
\[ h(\mu_i) = x_i'\beta. \]
For logistic regression, $h(\mu_i) = \log\{\mu_i/(1 - \mu_i)\}$; in Poisson regression, $h(\mu_i) = \log(\mu_i)$. The function $h(\cdot)$ is called the link function. Second, the variance of $Y_i$ is a specified function of its mean, $\mu_i$, namely
\[ \mathrm{Var}(Y_i) = v_i = \phi v(\mu_i). \]
In this expression, the known function $v(\cdot)$ is referred to as the variance function; the scaling factor, $\phi$, is a known constant for some members of the GLM family, whereas in others it is an additional parameter to be estimated. Third, each class of GLMs corresponds to a member of the exponential family of distributions, with a likelihood function of the form
\[ f(y_i) = \exp[\{y_i\theta_i - \psi(\theta_i)\}/\phi + c(y_i, \phi)]. \qquad (A.5.1) \]
The parameter $\theta_i$ is known as the natural parameter, and is related to $\mu_i$ through $\mu_i = \partial\psi(\theta_i)/\partial\theta_i$. For example, the Poisson distribution is a special case of the exponential family, with
\[ \theta_i = \log \mu_i, \quad \psi(\theta_i) = \exp(\theta_i), \quad c(y_i, \phi) = -\log(y_i!), \quad \phi = 1. \]
Other distributions within this family include the Gaussian or Normal distribution, the binomial distribution and the two-parameter gamma distribution.

In any GLM the regression coefficients, $\beta$, can be estimated by solving the same estimating equation,
\[ S(\beta) = \sum_{i=1}^{m} \left(\frac{\partial\mu_i}{\partial\beta}\right)' v_i^{-1}\{y_i - \mu_i(\beta)\} = 0, \qquad (A.5.2) \]
where $v_i = \mathrm{Var}(Y_i)$. Note that $S(\beta)$ is the derivative of the logarithm of the likelihood function. The solution $\hat\beta$, which is the maximum likelihood estimate, can be obtained by iteratively reweighted least squares; see McCullagh and Nelder (1989) for a detailed discussion. Finally, in large
samples, $\hat\beta$ follows a Gaussian distribution with mean $\beta$ and variance
\[ V = \left\{\sum_{i=1}^{m} \left(\frac{\partial\mu_i}{\partial\beta}\right)' v_i^{-1} \left(\frac{\partial\mu_i}{\partial\beta}\right)\right\}^{-1}. \qquad (A.5.3) \]
This variance can be estimated by $\hat V$, which is obtained by replacing $\beta$ with $\hat\beta$ in the expression (A.5.3).

A.6 Quasi-likelihood

One important property of the GLM family is that the score function, $S(\beta)$, depends only on the mean and the variance of the $Y_i$. Wedderburn (1974) was the first to point out that the estimating equation (A.5.2) can therefore be used to estimate the regression coefficients for any choices of link and variance functions, whether or not they correspond to a particular member of the exponential family. The name quasi-score function was coined for $S(\beta)$ in (A.5.2), since its integral with respect to $\beta$ can be thought of as a 'quasi-likelihood' even if it does not constitute a proper likelihood function. This suggests an approach to statistical modelling in which we make assumptions about the link and variance functions without attempting to specify the entire distribution of $Y_i$. This is desirable, since we often do not understand the precise details of the probabilistic mechanisms by which data were generated. McCullagh (1983) showed that the solution, $\hat\beta$, of the quasi-score function has a sampling distribution which, in large samples, is approximately Gaussian with mean $\beta$ and variance given by equation (A.5.3).

Example A.2. Let $Y_1, \ldots, Y_m$ be independent counts whose expectations are modelled as
\[ \log E(Y_i) = x_i'\beta, \quad i = 1, \ldots, m. \]
In biomedical studies, frequently the variance of $Y_i$ is greater than $E(Y_i)$, the variance expression induced by the Poisson assumption. This phenomenon is known as over-dispersion. One way to account for this is to assume that $\mathrm{Var}(Y_i) = \phi E(Y_i)$, where $\phi$ is a non-negative scalar parameter. Note that for the Poisson distribution $\phi = 1$; if we allow $\phi > 1$, we no longer have a distribution from the exponential family. However, if we define $\hat\beta$ as the solution to
\[ \sum_{i=1}^{m} x_i\{y_i - \exp(x_i'\beta)\} = 0, \]
then a simple calculation gives the asymptotic variance matrix of $\hat\beta$ as
\[ \phi \left\{\sum_{i=1}^{m} x_i \exp(x_i'\beta) x_i'\right\}^{-1}. \]
Thus, by comparison with (A.5.3) the variance of $\hat\beta$ is inflated by a factor of $\phi$. Clearly, ignoring over-dispersion in the analysis would lead to under-estimation of standard errors, and consequent over-statement of significance in hypothesis testing.

In the above example, $\phi E(Y_i)$ is but one of many possible choices for the variance formula which would take account of over-dispersion in count data. Fortunately, the solution, $\hat\beta$, is a consistent estimate of $\beta$ as long as $h(\mu_i) = x_i'\beta$, whether or not the variance function is correctly specified. This robustness property holds because the expectation of $S(\beta)$ remains zero so long as $E(Y_i) = \mu_i(\beta)$. However, the asymptotic variance matrix of $\hat\beta$ has the form
\[ V_2 = V \left\{\sum_{i=1}^{m} \left(\frac{\partial\mu_i}{\partial\beta}\right)' v_i^{-1} \mathrm{Var}(Y_i) v_i^{-1} \left(\frac{\partial\mu_i}{\partial\beta}\right)\right\} V. \]
Note that $V_2$ is identical to V in (A.5.3) only if $\mathrm{Var}(Y_i) = v_i$. When this assumption is in doubt, confidence limits for $\beta$ can be based on the estimated variance matrix
\[ \hat V_2 = \hat V \left[\sum_{i=1}^{m} \left(\frac{\partial\mu_i}{\partial\beta}\right)' v_i^{-1} \{y_i - \mu_i(\beta)\}^2 v_i^{-1} \left(\frac{\partial\mu_i}{\partial\beta}\right)\right] \hat V, \qquad (A.6.1) \]
evaluated at $\hat\beta$. We call $\hat V$ a model-based variance estimate of $\hat\beta$ and $\hat V_2$ a robust variance estimate in that $\hat V_2$ is consistent regardless of whether the specification of $\mathrm{Var}(Y_i)$ is correct.

Example A.2. (continued) An alternative to the variance function $v_i = \phi\mu_i$ is the form induced by the Poisson-gamma distribution (Breslow, 1984),
\[ v_i = \mu_i(1 + \mu_i\phi). \]
With a limited amount of data available, it is difficult to choose empirically between these two variance functions (Liang and McCullagh, 1993). The availability of the robust variance estimate, $\hat V_2$, helps to alleviate the concern regarding the choice of variance formula in larger samples.
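The model-based and robust variance estimates can be compared in a small Python sketch. To keep it free of matrix algebra, it assumes a log-linear model with a single scalar coefficient and no intercept, log μ_i = x_i β, with working variance v_i = μ_i; the estimating equation is solved by Newton's method rather than by a general iteratively reweighted least squares routine. The function name `fit_poisson_loglinear` and the data are hypothetical.

```python
import math

def fit_poisson_loglinear(x, y, tol=1e-12, max_iter=100):
    """Solve sum x_i {y_i - exp(x_i beta)} = 0 for scalar beta by Newton's method,
    then return beta, the model-based variance and the robust variance."""
    beta = 0.0
    for _ in range(max_iter):
        mu = [math.exp(xi * beta) for xi in x]
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        info = sum(xi * xi * mi for xi, mi in zip(x, mu))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    mu = [math.exp(xi * beta) for xi in x]
    # Model-based variance: inverse of sum (dmu/dbeta)^2 / v_i with v_i = mu_i.
    v_model = 1.0 / sum(xi * xi * mi for xi, mi in zip(x, mu))
    # Robust variance: squared residuals replace Var(Y_i) in the middle term.
    middle = sum((xi * (yi - mi)) ** 2 for xi, yi, mi in zip(x, y, mu))
    v_robust = v_model * middle * v_model
    return beta, v_model, v_robust

# Hypothetical counts that grow with x.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1, 2, 2, 5, 9]
beta_hat, v_model, v_robust = fit_poisson_loglinear(x, y)
print(beta_hat, math.sqrt(v_model), math.sqrt(v_robust))
```

When the counts are more variable than the Poisson assumption allows, the robust standard error will tend to exceed the model-based one; with so few observations, as here, either estimate is only indicative.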
It is interesting to note that in the special case where $\mu_i = \mu$ and hence $\log \mu_i = \beta_0$, the estimate $\hat V_2$ reduces to
\[ \bar{Y}^{-2} \sum_{i=1}^{m} (Y_i - \bar{Y})^2 / m^2, \]
the sampling variance of $\hat\beta_0 = \log \bar{Y}$ (Royall, 1986).

Bibliography
Aerts, M. and Claeskens, G. (1997). Local polynomial estimation in multiparameter likelihood models. Journal of the American Statistical Association, 92, 1536-45.
Afsarinejad, K. (1983). Balanced repeated measurements designs. Biometrika, 70, 199-204.
Agresti, A. (1990). Categorical data analysis. John Wiley, New York.
Agresti, A. and Lang, J. (1993). A proportional odds model with subject-specific effects for repeated ordered categorical responses. Biometrika, 80, 527-34.
Agresti, A. (1999). Modelling ordered categorical data: recent advances and future challenges. Statistics in Medicine, 18, 2191-207.
Aitkin, M., Anderson, D., Francis, B., and Hinde, J. (1989). Statistical modelling in GLIM. Oxford University Press, Oxford.
Albert, J.H. and Chib, S. (1993). Bayesian analysis of binary and polytomous data. Journal of the American Statistical Association, 88, 669-79.
Alexander, C.S. and Markowitz, R. (1986). Maternal employment and use of pediatric clinic services. Medical Care, 24(2), 134-47.
Almon, S. (1965). The distributed lag between capital appropriations and expenditures. Econometrica, 33, 178-96.
Altman, N.S. (1990). Kernel smoothing of data with correlated errors. Journal of the American Statistical Association, 85, 749-59.
Amemiya, T. (1985). Advanced econometrics. Harvard University Press, Cambridge, Massachusetts.
Amemiya, T. and Morimune, K. (1974). Selecting the optimal order of polynomial in the Almon distributed lag. The Review of Economics and Statistics, 56, 378-86.
Andersen, P.K., Borgan, Ø., Gill, R.D., and Keiding, N. (1993). Statistical models based on counting processes. Springer-Verlag, New York.
Anderson, J.A. (1984). Regression and ordered categorical variables (with Discussion). Journal of the Royal Statistical Society, B, 46, 1-30.
Anderson, D.A. and Aitkin, M. (1985). Variance component models with binary response: interviewer variability. Journal of the Royal Statistical Society, B, 47, 203-10.
Atkinson, A.C. (1985). Plots, transformations and regression: an introduction to graphical methods of diagnostic regression analysis. Oxford University Press, Oxford.
Azzalini, A. (1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika, 81, 767-75.
Bahadur, R.R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies on item analysis and prediction (ed. H. Solomon), pp. 158-68, Stanford Mathematical Studies in the Social Sciences VI, Stanford University Press, Stanford, California.
Barnard, G.A. (1963). Contributions to the discussion of Professor Bartlett's paper. Journal of the Royal Statistical Society, B, 25, 294.
Bartholomew, D.J. (1987). Latent variable models and factor analysis. Oxford University Press, New York.
Bates, D.M. and Watts, D.G. (1988). Nonlinear regression analysis and its applications. Wiley, New York.
Becker, R.A., Chambers, J.M., and Wilks, A.R. (1988). The new S language. Wadsworth and Brooks-Cole, Pacific Grove.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, B, 36, 192-236.
Billingsley, P. (1961). Statistical inference for Markov processes. University of Chicago Press, Chicago, Illinois.
Bishop, S.H. and Jones, B. (1984). A review of higher-order cross-over designs. Journal of Applied Statistics, 11, 29-50.
Bishop, Y.M.M., Fienberg, S.E., and Holland, P.W. (1975). Discrete multivariate analysis: theory and practice. MIT Press, Cambridge, Massachusetts.
Bloomfield, P. and Watson, G.S. (1975). The inefficiency of least squares. Biometrika, 62, 121-28.
Booth, J.G. and Hobert, J.P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, B, 61, 265-85.
Box, G.P. and Jenkins, G.M. (1970). Time series analysis - forecasting and control (revised edn). Holden-Day, San Francisco, California.
Breslow, N.E. (1984). Extra-Poisson variation in log linear models. Applied Statistics, 33, 38-44.
Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 125-34.
Breslow, N.E. and Day, N.E. (1980). Statistical methods in cancer research, Volume I. IARC Scientific Publications No. 32. Lyon.
Breslow, N.E. and Lin, X. (1995). Bias correction in generalized linear mixed models with a single component of dispersion. Biometrika, 82, 81-91.
Brumback, B.A. and Rice, J.A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (C/R: p976-994). Journal of the American Statistical Association, 93, 961-76.
Carey, V.C. (1992). Regression analysis for large binary clusters. Unpublished PhD thesis, Department of Biostatistics, The Johns Hopkins University, Baltimore, Maryland.
Carey, V.C., Zeger, S.L., and Diggle, P.J. (1993). Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517-26.
Carlin, B.P. and Louis, T.A. (1996). Bayes and empirical Bayes methods for data analysis. Chapman and Hall, London.
Chambers, J.M. and Hastie, T.J. (1992). Statistical models in S. Wadsworth and Brooks-Cole, Pacific Grove.
Chambers, J.M., Cleveland, W.S., Kleiner, B., and Tukey, P.A. (1983). Graphical methods for data analysis. Wadsworth, Belmont, California.
Chib, S. and Carlin, B. (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and Computing, 9, 17-26.
Chib, S. and Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347-61.
Clayton, D.G. (1974). Some odds ratio statistics for the analysis of ordered categorical data. Biometrika, 61, 525-31.
Clayton, D.G. (1992). Repeated ordinal measurements: a generalised estimating equation approach. Medical Research Council Biostatistics Unit Technical Reports, Cambridge, England.
Cleveland, W.S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829-36.
Cochran, W.G. (1977). Sampling techniques. John Wiley, New York.
Conaway, M.R. (1990). A random effects model for binary data. Biometrics, 46, 317-28.
Cook, D. and Weisberg, S. (1982). Residuals and influence in regression. Chapman and Hall, London.
Copas, J.B. and Li, H.G. (1997). Inference for non-random samples (with Discussion). Journal of the Royal Statistical Society, B, 59, 55-95.
Courcier. Reissued with a supplement, 1806. Second supplement published 1820. A portion of the appendix was translated, 1929, pp. 576-9 in A source book in mathematics, D.E. Smith, ed., trans. by H.A. Ruger and H.M. Walker, McGraw Hill, New York; reprinted 1959 in 2 volumes, Dover, New York.
Cox, D.R. (1970). Analysis of binary data. Chapman and Hall, London.
Cox, D.R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, B, 34, 187-220.
Cox, D.R. (1990). Role of models in statistical analysis. Statistical Science, 5, 169-74.
Cox, D.R. and Miller, H.D. (1965). The theory of stochastic processes. John Wiley, New York.
Cox, D.R. and Oakes, D. (1984). Analysis of survival data. Chapman and Hall, London.
Cox, D.R. and Snell, E.J. (1989). Analysis of binary data. Chapman and Hall, London.
Cressie, N.A.C. (1993). Statistics for spatial data. Wiley, New York.
Crouch, A.C. and Spiegelman, E. (1990). The evaluation of integrals of the form $\int f(t) \exp(-t^2)\,dt$: application to logistic-normal models. Journal of the American Statistical Association, 85, 464-69.
Cullis, B.R. (1994). Contribution to the Discussion of the paper by Diggle and Kenward. Applied Statistics, 43, 79-80.
Cullis, B.R. and McGilchrist, C.A. (1990). A model for the analysis of growth data from designed experiments. Biometrics, 46, 131-42.
Davidian, M. and Gallant, A.R. (1992). The nonlinear mixed effects model with a smooth random effects density. Department of Statistics Technical Report, North Carolina State University, Campus Box 8203, Raleigh, North Carolina 27695.
Davidian, M. and Giltinan, D.M. (1995). Nonlinear mixed effects models for repeated measurement data. Chapman and Hall, London.
Deming, W.E. and Stephan, F.F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 11, 427-44.
Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39, 1-38.
Dhrymes, P.J. (1971). Distributed lags: problems of estimation and formulation. Holden-Day, San Francisco.
Diggle, P.J. (1988). An approach to the analysis of repeated measures. Biometrics, 44, 959-71.
Diggle, P.J. (1989). Testing for random dropouts in repeated measurement data. Biometrics, 45, 1255-58.
Diggle, P.J. (1990). Time series: a biostatistical introduction. Oxford University Press, Oxford.
Diggle, P.J. and Kenward, M.G. (1994). Informative dropout in longitudinal data analysis (with discussion). Applied Statistics, 43, 49-73.
Diggle, P.J. (1998). Dealing with missing values in longitudinal studies. In Advances in the statistical analysis of medical data (ed. B.S. Everitt and G. Dunn), pp. 203-28. Edward Arnold, London.
Diggle, P.J. and Verbyla, A. (1998). Nonparametric estimation of covariance structure in longitudinal data. Biometrics, 54, 401-15.
Draper, N. and Smith, H. (1981). Applied regression analysis (2nd edn). Wiley, New York.
Drum, M.L. and McCullagh, P. (1993). REML estimation with exact covariance in the logistic mixed model. Biometrics, 49, 677-89.
Dupuis Sammel, M. and Ryan, L.M. (1996). Latent variable models with fixed effects. Biometrics, 52, 650-63.
Emond, M.J., Ritz, J., and Oakes, D. (1997). Bias in GEE estimates from misspecified models for longitudinal data. Communications in Statistics, 26, 15-32.
Engle, R.F., Hendry, D.F., and Richard, J.-F. (1983). Exogeneity. Econometrica, 51, 277-304.
Evans, J.L. and Roberts, E.A. (1979). Analysis of sequential observations with applications to experiments on grazing animals and perennial plants. Biometrics, 35, 687-93.
Evans, M. and Swartz, T. (1995). Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems. Statistical Science, 10, 254-72.
Faucett, C.L. and Thomas, D.C. (1996). Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine, 15, 1663-86.
Fearn, T. (1977). A two-stage model for growth curves which leads to Rao's covariance-adjusted estimates. Biometrika, 64, 141-43.
Feller, W. (1968). An introduction to probability theory and its applications (3rd edn). John Wiley, New York.
Finkelstein, D.M. and Schoenfeld, D.A. (1999). Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine, 18, 1341-54.
Fitzmaurice, G.M. (1995). A caveat concerning independence estimating equations with multivariate binary data. Biometrics, 51, 309-17.
BIBLIOGRAPHY BIBLIOGRAPHY
354 and Clifford, P. (1996). Logistic regression Fitzmaurice, G.M., Heath, A.F.; . . Journal 01 the Royal Slalzstzcal models for binary panel data wIth attntlOn.
Society, A, 159, 249 63.
. . • J N M (1993). A hkehhood-based method for ., " "k 80 141 .51. Fitzmaurice, G.M. and Lam, analysing longitudinal binary responses. Bwmetrz a, , " . . M 1 n t itzky A.G. (HJ9:3). RegresSIon models Fitzmaurice, G.M., LaIrd, N. ., all( .0 n ," 8 284 99 for discrete longitudinal responses. Statz.9tzcal Sczence" .
355
Godambe, V.P. (1960). An optimum property f l ' " " A if" 0 regu ar maxImum likelihood estImatIOn. nna so Mathemattcal Statistics, 31, 1208-12. Godfrey, L.G. and Poskitt, D.S. (1975). Testing the r t· t" f . l 1 esrIe IOns 0 the Almon lag technIque. Journa 0 the American Statistical Assoc,a ; t"lOn, 70 ,105-8. Goldfarb, N. (1960). An introduction to longitudinal stat1..stical analysis: the method of repeated observations from a fixed sample. Free Press of Glencoe, Illinois.
M (1995) An approximate generalized linear model with Follman, D. and Wu,. '.. " .. , 51 15168 random effects for informative mlssmg data. Bwmetrzcs" .
Goldstein, H. (1979). The design and analysis of longitudinal studies: their role in the measurement of change. Academic Press, London.
. k S J (1992). Repeated measures in clinical trials: anaF'r1son L J and Pococ, ., . S t" t" " , ... t t' tics and its implication for deSIgn. ta zs zcs m lysis using mean summary s a IS • Medicine, 11, 1685--1704. . LJ d Pock S.J. (1997). Linearly divergent treatment effects in F'r1son, ... an oc , . . t t' t' clinical trials with repeated measures: efficient analYSIS usmg summary s a IS ICS. Statistics in Medicine, 16, 2855-72.
Goldstein, H. (1986). Multilevel mixed linear model analy SIS . usmg . 't . " I era t'Ive generalised least squares. Bwmetrika, 73, 43-56.
Gabriel, K.R. (1962). Ante dependence analysis of an ordered set of variables. Annals of Mathematical Statistics, 33, 201-12.
Gauss, C.F. (1809). Theoria motus corporum celestium. Hamburg: Perthes et Besser. Translated, 1857, as Theory of motion of the heavenly bodies moving about the sun in conic sections, trans. C. H. Davis. Little, Brown, Boston. Reprinted, 1963; Dover, New York. French translation of the portion on least squares, pp. 111-34 in Gauss, 1855.
Gelfand, A.E. and Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Gelfand, A.E., Hills, S.E., Racine-Poon, A., and Smith, A.F.M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 972-85.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995). Bayesian data analysis. Chapman and Hall, London.
Goldstein, H. (1995). Multilevel statistical models (2nd edn). Edward Arnold, London.
Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, A, 159, 505-13.
Gourieroux, C., Monfort, A., and Trognon, A. (1984). Pseudo-maximum likelihood methods: theory. Econometrica, 52, 681-700.
Graubard, B.I. and Korn, E.L. (1994). Regression analysis with clustered data. Statistics in Medicine, 13, 509-22.
Gray, S.M. and Brookmeyer, R. (1998). Estimating a treatment effect from multidimensional longitudinal data. Biometrics, 54, 976-88.
Gray, S. and Brookmeyer, R. (2000). Multidimensional longitudinal data: estimating a treatment effect from continuous, discrete or time to event response variables. Journal of the American Statistical Association, 95, 396-406.
Graybill, F. (1976). Theory and application of the linear model. Wadsworth, California.
Gibaldi, M. and Perrier, D. (1982). Pharmacokinetics. Marcel Dekker, New York.
Greenwood, M. and Yule, G.U. (1920). An enquiry into the nature of frequency distributions to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society, Series A, 83, 255-79.
Gilks, W., Richardson, S., and Spiegelhalter, D. (1996). Markov chain Monte Carlo in practice. Chapman and Hall, London.
Grieve, A.P. (1994). Contribution to the Discussion of the paper by Diggle and Kenward. Applied Statistics, 43, 74-6.
Gilmour, A.R., Anderson, R.D., and Rae, A.L. (1985). The analysis of binomial data by a generalized linear mixed model. Biometrika, 72, 593-99.
Griffiths, D.A. (1973). Maximum likelihood estimation for the beta-binomial distribution, and an application to the household distribution of the total number of cases of a disease. Biometrics, 29, 637-48.
Glasbey, C.A. (1988). Examples of regression with serially correlated errors. The Statistician, 37, 277-92.
Glonek, G.F.V. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, B, 57, 533-46.
Gromping, U. (1996). A note on fitting a marginal model to mixed effects loglinear regression data via GEE. Biometrics, 52, 280-5.
Guo, S.W. and Lin, D.Y. (1994). Regression analysis of multivariate grouped survival data. Biometrics, 50, 632-39.
Hall, D.B. and Severini, T.A. (1998). Extended generalized estimating equations for clustered data. Journal of the American Statistical Association, 93, 1365-75.
Hardle, W. (1990). Applied nonparametric regression. Cambridge University Press, New York.
Harrell, F.E., Lee, K.L., Califf, R.M., Pryor, D.B., and Rosati, R.A. (1984). Regression modelling strategies for improved prognostic prediction. Statistics in Medicine, 3, 143-52.
Hart, J.D. (1991). Kernel regression estimation with time series errors. Journal of the Royal Statistical Society, B, 53, 173-87.
Hart, J.D. and Wehrly, T.E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81, 1080-88.
Harville, D. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61, 383-85.
Heckman, J.J. and Singer, B. (1985). Longitudinal analysis of labour market data. Cambridge University Press, Cambridge.
Hedayat, A. and Afsarinejad, K. (1975). Repeated measures designs, I. In A survey of statistical design and linear models (ed. J.N. Srivastava). North-Holland, Amsterdam.
Hedayat, A. and Afsarinejad, K. (1978). Repeated measures designs, II. Annals of Statistics, 6, 619-28.
Hedeker, D. and Gibbons, R. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics, 50, 933-44.
Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and recurrent events. Biostatistics, 1, 465-80.
Hernan, M.A., Brumback, B., and Robins, J.M. (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96, 440-8.
Harville, D. (1977). Maximum likelihood estimation of variance components and related problems. Journal of the American Statistical Association, 72, 320-40.
Heyting, A., Tolboom, J.T.B.M., and Essers, J.G.A. (1992). Statistical handling of dropouts in longitudinal clinical trials. Statistics in Medicine, 11, 2043-62.
Hastie, T.J. and Tibshirani, R.J. (1990). Generalized additive models. Chapman and Hall, New York.
Hogan, J.W. and Laird, N.M. (1997a). Model-based approaches to analysing incomplete longitudinal and failure-time data. Statistics in Medicine, 16, 259-72.
Hausman, J.A. (1978). Specification tests in econometrics. Econometrica, 46, 1251-72.
Hogan, J.W. and Laird, N.M. (1997b). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16, 239-57.
Heagerty, P.J. (1999). Marginally specified logistic-normal models for longitudinal binary data. Biometrics, 55, 247-57.
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945-61.
Heagerty, P.J. (2002). Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics 58 (to appear).
Huber, P.J. (1967). The behaviour of maximum likelihood estimators under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, LeCam, L.M. and Neyman, J. editors, University of California Press, pp. 221-33.
Heagerty, P.J. and Kurland, B.F. (2001). Misspecified maximum likelihood estimates and generalized linear mixed models. Biometrika, 88, 973-85.
Heagerty, P.J. and Zeger, S.L. (1996). Marginal regression models for clustered ordinal measurements. Journal of the American Statistical Association, 91, 1024-36.
Heagerty, P.J. and Zeger, S.L. (1998). Lorelogram: a regression approach to exploring dependence in longitudinal categorical responses. Journal of the American Statistical Association, 93, 150-62.
Heagerty, P.J. and Zeger, S.L. (2000). Marginalized multilevel models and likelihood inference. Statistical Science, 15, 1-26.
Heagerty, P.J. and Zeger, S.L. (2000). Multivariate continuation ratio models: connections and caveats. Biometrics, 56, 719-32.
Heckman, J.J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimation method for such models. Annals of Economic and Social Measurement, 5, 475-92.
Hughes, J.P. (1999). Mixed effects models with censored data with application to HIV RNA levels. Biometrics, 55, 625-9.
Jones, B. and Kenward, M.G. (1987). Modelling binary data from a three-point cross-over trial. Statistics in Medicine, 6, 555-64.
Jones, B. and Kenward, M.G. (1989). Design and analysis of cross-over trials. Chapman and Hall, London.
Jones, M.C. and Rice, J.A. (1992). Displaying the important features of large collections of similar curves. The American Statistician.
Jones, R.H. (1993). Longitudinal data with serial correlation: a state-space approach. Chapman and Hall, London.
Jones, R.H. and Ackerson, L.M. (1990). Serial correlation in unequally spaced longitudinal data. Biometrika, 77, 721-31.
Jones, R.H. and Boadi-Boateng, F. (1991). Unequally spaced longitudinal data with serial correlation. Biometrics, 47, 161-75.
Journel, A.G. and Huijbregts, C.J. (1978). Mining geostatistics. Academic Press, New York.
Jowett, G.H. (1952). The accuracy of systematic sampling from conveyor belts. Applied Statistics, 1, 50-9.
Kalbfleisch, J.D. and Prentice, R.L. (1980). The statistical analysis of failure time data. John Wiley, New York.
Lauritzen, S.L. (2000). Causal inference from graphical models. Research Report R-99-2021, Department of Mathematical Sciences, Aalborg University.
Lavalley, M.P. and De Gruttola, V. (1996). Models for empirical Bayes estimators of longitudinal CD4 counts. Statistics in Medicine, 15, 2289-305.
Lawless, J.F. (1982). Statistical models and methods for lifetime data. John Wiley, New York.
Lee, Y. and Nelder, J.A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society, Series B, 58, 619-78.
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 34-45.
Lee, Y. and Nelder, J.A. (2001). Hierarchical generalised linear models: a synthesis of generalised linear models, random-effects models and structured dispersions. Biometrika, 88, 987-1006.
Karim, M.R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Unpublished PhD thesis from the Johns Hopkins University Department of Biostatistics, Baltimore, Maryland.
Legendre, A.M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. John Wiley, New York.
Kaslow, R.A., Ostrow, D.G., Detels, R. et al. (1987). The Multicenter AIDS Cohort Study: rationale, organization and selected characteristics of the participants. American Journal of Epidemiology, 126, 310-18.
Kaufmann, H. (1987). Regression models for nonstationary categorical time series: asymptotic estimation theory. Annals of Statistics, 15, 863-71.
Kenward, M.G. (1987). A method for comparing profiles of repeated measurements. Applied Statistics, 36, 296-308.
Kenward, M.G., Lesaffre, E., and Molenberghs, G. (1994). An application of maximum likelihood and estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random. Biometrics, 50, 945-53.
Korn, E.L. and Whittemore, A.S. (1979). Methods for analyzing panel studies of acute health effects of air pollution. Biometrics, 35, 795-802.
Laird, N.M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305-15.
Laird, N.M. and Wang, F. (1990). Estimating rates of change in randomized clinical trials. Controlled Clinical Trials, 11, 405-19.
Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-74.
Lang, J.B. and Agresti, A. (1994). Simultaneously modeling joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association, 89, 625-32.
Lang, J.B., McDonald, J.W., and Smith, P.W.F. (1999). Association-marginal modeling of multivariate categorical responses: a maximum likelihood approach. Journal of the American Statistical Association, 94, 1161-71.
Lange, N. and Ryan, L. (1989). Assessing normality in random effects models. Annals of Statistics, 17, 624-42.
Lepper, A.W.D. (1989). Effects of altered dietary iron intake in Mycobacterium paratuberculosis-infected dairy cattle: sequential observations on growth, iron and copper metabolism and development of paratuberculosis. Res. Vet. Sci., 46, 289-96.
Liang, K.-Y. and McCullagh, P. (1993). Case studies in binary dispersion. Biometrics, 49, 623-30.
Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
Liang, K.-Y. and Zeger, S.L. (2000). Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya, B, 62, 134-48.
Liang, K.-Y., Zeger, S.L., and Qaqish, B. (1992). Multivariate regression analyses for categorical data (with Discussion). Journal of the Royal Statistical Society, B, 54, 3-40.
Lin, D.Y. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association, 96, 103-26.
Lin, X. and Breslow, N.E. (1996). Bias correction in generalized linear mixed models with multiple components of dispersion. Journal of the American Statistical Association, 91, 1007-16.
Lin, X. and Carroll, R.J. (2000). Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association, 95, 520-34.
Lindstrom, M.J. and Bates, D.M. (1990). Nonlinear mixed effects models for repeated measures data. Biometrics, 46, 673-87.
Lipsitz, S. (1989). Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association. Technical report, Department of Biostatistics, Harvard School of Public Health.
Molenberghs, G. and Lesaffre, E. (1994). Marginal modelling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association, 89, 633-44.
Lipsitz, S., Laird, N., and Harrington, D. (1991). Generalized estimating equations for correlated binary data: using odds ratios as a measure of association. Biometrika, 78, 153-60.
Molenberghs, G. and Lesaffre, E. (1999). Marginal modelling of multivariate categorical data. Statistics in Medicine, 18, 2237-55.
Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-34.
Little, R.J.A. (1995). Modelling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-21.
Little, R.J.A. and Rubin, D.B. (1987). Statistical analysis with missing data. John Wiley, New York.
Little, R.J. and Rubin, D.B. (2000). Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annual Review in Public Health, 21, 121-45.
Liu, G. and Liang, K.-Y. (1997). Sample size calculations for studies with correlated observations. Biometrics, 53, 937-47.
Liu, Q. and Pierce, D.A. (1994). A note on Gauss-Hermite quadrature. Biometrika, 81, 624-9.
Lindsey, J.K. (2000). Obtaining marginal estimates from conditional categorical repeated measurements models with missing data. Statistics in Medicine, 19, 801-9.
Molenberghs, G., Michiels, B., Kenward, M.G., and Diggle, P.J. (1997). Missing data mechanisms and pattern-mixture models. Statistica Neerlandica, 52, 153-61.
Monahan, J.F. and Stefanski, L.A. (1992). Normal scale mixture approximations to F(x) and computation of the logistic-normal integral. In Handbook of the logistic distribution (ed. N. Balakrishnan), pp. 529-40. Marcel Dekker, New York.
Morton, R. (1987). A generalized linear model with nested strata of extra-Poisson variation. Biometrika, 74, 247-57.
Mosteller, F. and Tukey, J.W. (1977). Data analysis and regression: a second course in statistics. Addison-Wesley, Reading, Massachusetts.
Moyeed, R.A. and Diggle, P.J. (1994). Rates of convergence in semi-parametric modelling of longitudinal data. Australian Journal of Statistics, 36, 75-93.
Müller, H.G. (1988). Nonparametric regression analysis of longitudinal data. Lecture Notes in Statistics, 41. Springer-Verlag, Berlin.
Munoz, A., Carey, V., Schouten, J.P., Segal, M., and Rosner, B. (1992). A parametric family of correlation structures for the analysis of longitudinal data. Biometrics, 48, 733-42.
Lunn, D.J., Wakefield, J., and Racine-Poon, A. (2001). Cumulative logit models for ordinal data: a case study involving allergic rhinitis severity scores. Statistics in Medicine, 20, 2261-85.
Murray, G.D. and Findlay, J.G. (1988). Correcting for the bias caused by dropouts in hypertension trials. Statistics in Medicine, 7, 941-46.
Mason, W.E. and Fienberg, S.E. (eds) (1985). Cohort analysis in social research: beyond the identification problem. Springer-Verlag, New York.
Nelder, J.A. and Mead, R. (1965). A simplex method for function minimisation. Computational Journal, 7, 303-13.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, B, 42, 109-42.
Neuhaus, J.M., Hauck, W.W., and Kalbfleisch, J.D. (1992). The effects of mixture distribution misspecification when fitting mixed-effects logistic models. Biometrika, 79, 755-62.
McCullagh, P. (1983). Quasi-likelihood functions. Annals of Statistics, 11, 59-67.
McCullagh, P. and Nelder, J.A. (1989). Generalized linear models. Chapman and Hall, New York.
McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92, 162-70.
Mead, R. and Curnow, R.N. (1983). Statistical methods in agriculture and experimental biology. Chapman and Hall, London.
Molenberghs, G., Kenward, M.G., and Lesaffre, E. (1997). The analysis of longitudinal ordinal data with informative dropout. Biometrika, 84, 33-44.
Neuhaus, J.M. and Jewell, N.P. (1990). Some comments on Rosner's multiple logistic model for clustered data. Biometrics, 46, 523-34.
Neuhaus, J.M. and Kalbfleisch, J.D. (1998). Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics, 54, 638-45.
Neuhaus, J.M., Kalbfleisch, J.D., and Hauck, W.W. (1991). A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review, 59, 25-36.
Neyman, J. (1923). On the application of probability theory to agricultural experiments: essay on principles, section 9, translated in Statistical Science, 1990, 5, 465-80.
Neyman, J. and Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1-32.
O'Brien, P.C. (1984). Procedures for comparing samples with multiple endpoints. Biometrics, 40, 1079-87.
Paik, M.C. (1992). Parametric variance function estimation for non-normal repeated measurement data. Biometrics, 48, 18-30.
Palta, M., Lin, C.-Y., and Chao, W.-H. (1997). Effect of confounding and other misspecification in models for longitudinal data. In Modelling longitudinal and spatially correlated data: methods, applications and future directions (Springer Lecture Notes in Statistics, Volume 122), 77-87.
Pan, W., Louis, T.A., and Connett, J.E. (2000). A note on marginal linear regression with correlated response data. American Statistician, 54, 191-5.
Pantula, S.G. and Pollock, K.H. (1985). Nested analysis of variance with autocorrelated errors. Biometrics, 41, 909-20.
Patterson, H.D. (1951). Change-over trials. Journal of the Royal Statistical Society, B, 13, 256-71.
Patterson, H.D. and Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-54.
Pawitan, Y. and Self, S. (1993). Modelling disease marker processes in AIDS. Journal of the American Statistical Association, 88, 719-26.
Pearl, J. (2000). Causal inference in the health sciences: a conceptual introduction. Contributions to Health Services and Outcomes Research Methodology, Technical report R-282, Department of Computer Science, University of California, Los Angeles.
Pepe, M.S. and Anderson, G.A. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communication in Statistics - Simulation, 23(4), 939-51.
Pepe, M.S. and Couper, D. (1997). Modeling partly conditional means with longitudinal data. Journal of the American Statistical Association, 92, 991-8.
Pepe, M.S., Heagerty, P.J., and Whitaker, R. (1999). Prediction using partly conditional time-varying coefficients regression models. Biometrics, 55, 944-50.
Pierce, D.A. and Sands, B.R. (1975). Extra-Bernoulli variation in binary data. Technical Report 46, Department of Statistics, Oregon State University.
Pinheiro, J.C. and Bates, D.M. (1995). Approximations to the log-likelihood function in the non-linear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12-35.
Plewis, I. (1985). Analysing change: measurement and explanation using longitudinal data. John Wiley, New York.
Pocock, S.J., Geller, N.L., and Tsiatis, A.A. (1987). The analysis of multiple endpoints in clinical trials. Biometrics, 43, 487-98.
Poisson, S.D. (1837). Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités. Bachelier, Imprimeur-Libraire pour les Mathématiques, la Physique, etc., Paris.
Pourahmadi, M. (1999). Joint mean-covariance models with application to longitudinal data: unconstrained parameterisation. Biometrika, 86, 677-90.
Prentice, R.L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321-27.
Prentice, R.L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-48.
Prentice, R.L. and Zhao, L.P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics, 47, 825-39.
Priestley, M.B. and Chao, M.T. (1972). Non-parametric function fitting. Journal of the Royal Statistical Society, B, 34, 384-92.
Rao, C.R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika, 52, 447-58.
Rao, C.R. (1973). Linear statistical inference and its applications (2nd edn). John Wiley, New York.
Ratkowsky, D.A. (1983). Non-linear regression modelling. Marcel Dekker, New York.
Rice, J.A. and Silverman, B.W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, B, 53, 233-43.
Ridout, M. (1991). Testing for random dropouts in repeated measurement data. Biometrics, 47, 1617-21.
Robins, J.M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393-512.
Robins, J.M. (1987). Addendum to 'A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect.' Computers and Mathematics with Applications, 14, 923-45.
Robins, J.M. (1998). Correction for non-compliance in equivalence trials. Statistics in Medicine, 17, 269-302.
Robins, J.M. (1999). Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology: the environment and clinical trials (ed. M.E. Halloran and D. Berry), pp. 95-134. IMA Volume 116, Springer-Verlag, New York.
Robins, J.M., Greenland, S., and Hu, F.-C. (1999). Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome (with Discussion). Journal of the American Statistical Association, 94, 687-712.
Robins, J.M., Rotnitzky, A., and Zhao, L.P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106-21.
Rosner, B. (1984). Multivariate methods in ophthalmology with application to other paired-data situations. Biometrics, 40, 1025-35.
Rosner, B. (1989). Multivariate methods for clustered binary data with more than one level of nesting. Journal of the American Statistical Association, 84, 373-80.
Rothman, K.J. and Greenland, S. (1998). Modern epidemiology. Lippincott-Raven.
Rowell, J.G. and Walters, D.E. (1976). Analysing data with repeated observations on each experimental unit. Journal of Agricultural Science, 87, 423-32.
Royall, R.M. (1986). Model robust inference using maximum likelihood estimators. International Statistical Review, 54, 221-26.
Rubin, D.B. (1974). Estimating causal effects of treatment in randomized and non-randomized studies. Journal of Educational Psychology, 66, 688-701.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-92.
Rubin, D.B. (1978). Bayesian inference for causal effects: the role of randomization. Annals of Statistics, 6, 34-58.
Samet, J.M., Dominici, F., Curriero, F.C., Coursac, I., and Zeger, S.L. (2000). Fine particulate air pollution and mortality in 20 US cities, 1987-1994. New England Journal of Medicine, 343(24), 1798-9.
Sandland, R.L. and McGilchrist, C.A. (1979). Stochastic growth curve analysis. Biometrics, 35, 255-71.
Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika, 78, 719-27.
Scharfstein, D.O., Rotnitzky, A., and Robins, J.M. (1999). Adjusting for non-ignorable dropout using semiparametric non-response models (with Discussion). Journal of the American Statistical Association, 94, 1096-1120.
Seber, G.A.F. (1977). Linear regression analysis. John Wiley, New York.
Self, S. and Mauritsen, R. (1988). Power/sample size calculations for generalized linear models. Biometrics, 44, 79-86.
Senn, S.J. (1992). Crossover trials in clinical research. John Wiley, Chichester.
Sheiner, L.B., Beal, S.L., and Dunne, A. (1997). Analysis of nonrandomly censored ordered categorical longitudinal data from analgesic trials (with Discussion). Journal of the American Statistical Association, 92, 1235-55.
Shih, J. (1998). Modeling multivariate discrete failure time data. Biometrics, 54, 1115-28.
Silverman, B.W. (1984). Spline smoothing: the equivalent variable kernel method. Annals of Statistics, 12, 898-916.
Silverman, B.W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with Discussion). Journal of the Royal Statistical Society, B, 47, 1-52.
Skellam, J.G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable ...
  further reading recommended 53
  guidelines 33
  missing values in 282-318
longitudinal data analysis
  approaches 17-20
    marginal analysis 17-18
    random effects model 18
    transition model 18
    two-stage/derived variable analysis 17
  classification of problems 20
  confirmatory analysis 33
  exploratory data analysis 33-53
longitudinal studies 1-3
  advantages 1, 16-17, 22, 245
  compared with cross-sectional studies 1, 16-17, 22-31
  efficiency 24-6
lorelogram 34, 52-3
  further reading recommended 53
lowess smoothing 41, 44
  compared with other curve-fitting methods 42, 45
  examples 3, 36, 40
Madras Longitudinal Schizophrenia Study 234-7
  analysis using marginalized models 240-3
marginal analysis 18
marginal generalized linear regression model 209
marginalized latent variable models 222-5, 232
  maximum likelihood estimation for 225
marginalized log-linear models 220-1, 233
marginalized models
  for categorical data 216-31
  examples of use 231-3, 240-3
  example using Gaussian linear model 218-20
marginalized random effects models 222, 223, 225
marginalized transition models 225-31
  advantages 230-1
  in examples 233, 241-3
  first-order/MTM(1) 226-7, 230, 241
    in example 241, 242
  second-order/MTM(2) 228
    in example 242
marginal mean response 17
marginal means
  definition 209
  likelihood-based estimates 232, 242
  log-linear model for 143-6
marginal models 17-18, 126-8, 141-68
  advantages of direct approach 216-17
  assumptions 126-7
  examples of use 17-18, 127, 132, 135-6, 148-60
  further reading recommended 167-8
  and likelihood 138
marginal odds ratios 145, 147
marginal quasi-likelihood (MQL) methods 232
marginal structural models (MSMs) 276
  advantage(s) 280
  estimation using IPTW 277-9
  in example 279-80
Markov Chain Monte Carlo (MCMC) methods 214-16, 332
  in examples 232, 238
Markov chains 131, 190
Markov models 87, 190-206
  further reading recommended 206-7
  see also transition models
Markov-Poisson time series model 204-5
  realization of 206
maximum likelihood algorithms 212
maximum likelihood estimation 64-5
  compared with REML estimation 69, 95
  for generalized linear mixed models 212-14
  in parametric modelling 98
  for random effects models 137-8, 172-5
  restricted 66-9
  for transition models 138, 192-3
  see also conditional likelihood; generalized estimating equations
maximum likelihood estimator 60, 64, 340
  variance 60
MCEM method see Monte Carlo Expectation-Maximization method
MCMC methods see Markov Chain Monte Carlo methods
MCNR method see Monte Carlo Newton-Raphson method
mean response
  non-parametric modelling of 319-26
  parametric modelling of 105-7
mean response profile(s)
  for calf intestinal parasites experiment 118
  for cow weight data 106
  defined in ANOVA 114
  for milk protein data 99, 100, 102, 302
  for schizophrenia trial data 14, 307, 309, 311, 315
measurement error
  and random effects 91-3
  and serial correlation 89-90
    and random intercept 90-1
  as source of random variation 83
measurement variation 28
micro/macro data-representation strategy 37
milk protein data 5-7, 8, 9
  dropouts in 290-1
    reasons for 285
    testing for completely random dropouts 291-3
  mean response profiles 99, 100, 102, 302
  parametric model fitted 99-103
  pattern mixture analysis of 301, 302
  variogram 50, 52, 99
missing value mechanisms
  classification of 283-4
  completely random 283, 284
  random 283, 284
missing values 282-318
  effects 282
  ignorable 284
  informative 80, 283
  intermittent 284, 287, 318
  and parametric modelling 80
model-based variance 347
model-fitting 93-8
  diagnostic stage 98
  estimation stage 95-7
  formulation stage 94-5
  inference stage 97-8
moments of response 138
Monte Carlo Expectation-Maximization (MCEM) method 214
Monte Carlo maximum likelihood algorithms 214
Monte Carlo Newton-Raphson (MCNR) method 214
Monte Carlo test(s), for completely random dropouts 290, 291
Mothers' Stress and Children's Morbidity (MSCM) Study 247-53
  cross-sectional analysis 257-8
  and endogeneity 268-9
  g-computation 275-6
  and lagged covariates 261-5
  marginal structural models using IPTW 279-80
  sample of data 252
Multicenter AIDS Cohort Study (MACS) 3
  CESD (depressive symptoms) scores 39-40, 41
  objective(s) 3-4
  see also CD4+ cell numbers data
multiple lagged covariates 260-1
multivariate Gaussian theory 339-40
multivariate longitudinal data 332-6
  examples 332
natural parameter 345
negative-binomial distribution 161, 186-7
Nelder-Mead simplex algorithm 340
nested sub-models 342
Newton-Raphson iteration 340
non-linear random effects, in cross-sectional models 327, 329
non-linear regression model 326-7
  fitting to cross-sectional data 327
non-linear regression modelling 326-9
non-parametric curve-fitting techniques 41-5
  see also kernel estimation; lowess; smoothing spline
non-parametric modelling of mean response 319-26
notation 15-16
  causal models 271
  conditional generalized linear model 209
  dropout models 295
  marginal generalized linear model 209
  maximum likelihood estimator 60
  multivariate Gaussian distribution 339-40
  non-linear regression model 326-7
  parametric models 83-4
  time-dependent covariates 245
no-unmeasured-confounders assumption 270-1, 273
numerical integration methods 212-14
odds ratio, in marginal model 127, 128 ordered categorical data 201-4 proportional odds modelling of 201-3 ordering statistic, data representation using 38 ordinary least squares (OLS) estimation and ignoring correlation in data 19 naive use 63 errors arising 63-4 in nonlinear regression modelling 119 relative efficiency in crossover example 63 in exponential correlation model 61-2 in linear regression example 62 in uniform correlation model 60-1 in robust estimation of standard errors 70, 75, 76 and sample variogram 50, 52 outliers, and curve fitting 44-5 over-dispersed count data, models for 161, 186-7 over-dispersion 162, 178, 346 ozone pollution effect on tree growth 4-5 see also Sitka spruce growth data
panel studies 2 parametric modelling 81-113 for count data 160-2 example applications 99-110 CD4+ data 108-10 cow weight data 103-8 milk protein data 99-103 fitting model to data 93-8 further reading recommended 113 notation 83-4 pure serial correlation model 84-9 random effects + measurement error model 91-3 random intercept + serial correlation + measurement error model 90-1 serial correlation + measurement error model 89-90 and sources of random variation 82-3 partly conditional mean 253 partly conditional models 259-60 pattern mixture dropout models 299-301 graphical representation 303, 304 Pearson chi-squared statistic 186 Pearson's chi-squared test statistic 343
penalized quasi-likelihood (PQL) methods 175, 210, 232 example of use 282 period 1 pig weight data 34 graphical representation 34-5, 35, 36 robust estimation of standard errors for 76-9 point process data 330 Poisson distribution 161, 186, 344 Poisson-gamma distribution 347 Poisson-Gaussian random effects models 188-9 Poisson regression models 344 population growth 205 Positive And Negative Syndrome Scale (PANSS) measure 11, 153, 330, 332 subset of placebo data 305 treatment effects 334, 335 potential outcomes 269-70 power of statistical test 28 predictive squared error (PSE) 45 predictors 337 principal components analysis, in data representation 38 probability density functions 88-9 proportional odds model 201-2 application to Markov chain 202-3 prospective collection of longitudinal data 1, 2
quadratic form (test) statistic 97, 309 quadrature methods 212-14 limitations 214 quasi-likelihood methods 232, 346-8 in example 347-8 see also marginal quasi-likelihood (MQL) methods; penalized quasi-likelihood (PQL) methods quasi-score function 346 random dropout mechanism 285 random effects + measurement error models 91-3 random effects dropout models 301-3 in example 312-14 graphical representation 303, 304, 305 random effects models 18, 82, 128-30, 169-89 assumptions 170-1
basic premise 129, 169 examples of use 18, 129, 130, 132-3 fitting using maximum likelihood method 137-8 further reading recommended 189 hierarchical 334, 336 marginalized 222, 223, 225 multi-level 93 and two-stage least-squares estimation 57-9 random intercept models 90-1, 170, 210, 211 in example 239 random intercept + random slope (random line) models 210, 211, 238 in example 238-9 random intercept + serial correlation + measurement error model 90-1 random missingness mechanism 283 generalized estimating equations under 293-5 random missing values 283, 284 random variation separation from systematic variation 217, 218 sources 82-3 two-level models 93 reading ability/age example 1, 2, 16 recurrent event data 330 recurrent events, joint modelling with longitudinal measurements 329-32 regression analysis 337 regression models notation 15 see also linear ...; non-linear regression model relative growth rate (RGR) 92 repeated measures ANOVA 123-5 see also split-plot ANOVA approach repeated observations correlation among 28 number per person 28 respiratory disease/infection, in Indonesian children 4, 131-6, 156-60, 182-4 restricted maximum likelihood (REML) estimation 66-9 compared with maximum likelihood estimation 69, 95 in parametric modelling 96, 99, 100
in robust estimation of standard errors 70-1, 73-4, 79 retrospective collection of longitudinal data 1-2 Rice-Silverman prescription 321, 322 robust estimation of standard errors 70-80 examples 73-9 robust variance 194, 347 roughness penalty 44 sample size calculations 25-31 binary responses 30-1 continuous responses 28-30 for marginal models 165-7 parameters required 25-8 sample variogram(s) 49 examples 51, 90, 102, 105, 107 SAS software 180, 214 saturated models 50, 65 graphical representation 303 limitations 65 robust estimation of standard errors in 70-1, 73 scatterplots 33, 40 and correlation structure 46, 47 examples 36, 38-43, 45, 47 schizophrenia clinical trial 10-13 dropouts in 12, 13, 306-16 marginal model used 153-6 mean response profiles 14, 307, 311, 315 multivariate data 332, 334, 335 PANSS measure 11, 153, 330, 332 subset of placebo data 305 treatment effects 334, 335 random effects model used 181-2 variograms 308, 314 schizophrenia study (Madras Longitudinal Study) 234-7 analysis of data 237-43 score equations 173-4, 340 score function 340 score test statistic(s) 241, 242, 342 second-order marginalized transition model/MTM(2) 228 in example 2413 selection dropout models 295-8 in example 312-16 graphical representation 303, 304-5
semi-parametric modelling 324 sensitivity analysis, and informative dropout models 316 serial correlation 82 plus measurement error 89-90 and random intercept 90-1 pure 84-9 as source of random variation 82 Simulated Maximum Likelihood (SML) method 214 single lagged covariate 259-60 Sitka spruce growth data 4-5, 6, 7 derived variables used 119-20 robust estimation of standard errors for 73-6, 77 split-plot ANOVA applied 124-5 size-dependent branching process 204-5 smallest meaningful difference 27 smoothing spline 44, 320 compared with other curve-fitting techniques 42 smoothing techniques 33-4, 41-5, 319 further reading recommended 41, 53 spline 44 see also smoothing spline split-plot ANOVA approach 56, 123-5 example of use 124-5 split-plot model 92, 123, 124 stabilized weights 277 in example 278 standard errors robust estimation of 70-80 examples 73-9 standardized residuals, in graphical representation 35, 36 STATA software 214 stochastic covariates 253-8 strong exogeneity 246-7 structural nested models, further reading recommended 281 survival analysis 2 systematic variation, separation from random variation 217, 218
time-by-time ANOVA 115-16, 125 example of use 116, 118 time-dependent confounders 265-80 time-dependent covariates 245-81 time series analysis 2 time-by-time ANOVA, limitations 115-16 tracking 35
trajectories see individual trajectories transition matrix 194, 195 transition models 18, 130-1, 190-207 for categorical data 194-204 examples 197-201 for count data 204-6 examples of use 18, 130-1, 133, 197-201 fitting of 138, 192-4 marginalized 225-31 for ordered categorical data 201-4 see also Markov models transition ordinal regression model 203 tree growth data see Sitka spruce growth data Tufte's micro/macro data-representation strategy 37 two-level models of random variation 93 two-stage analysis 17 two-stage least-squares estimation 57-9 type I error rate 26-7
unbalanced data 282 uniform correlation model 55-6, 285
variance functions 345 variograms 34, 48-50
autocorrelation function estimated from 50 in examples 51, 52, 308, 314, 326 for exponential correlation model 85 further reading recommended 53 for Gaussian correlation model 86 for parametric models 102, 105, 107 for random intercepts + serial correlation + measurement error model 91 for serial correlation models 84-7 for stochastic process 48, 82 see also sample variogram vitamin A deficiency causes and effects 4, 197 see also Indonesian Children's Health Study Wald statistic 233, 241 weighted average 320 weighted least-squares estimation 59-64 working variance matrix 70 choice not critical 76 in examples 76, 78 xerophthalmia 4, 197 see also Indonesian Children's Health Study