The Theory of Response-Adaptive Randomization in Clinical Trials
Feifang Hu, University of Virginia, Charlottesville, VA
William F. Rosenberger, George Mason University, Fairfax, VA
WILEY-INTERSCIENCE. A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2006 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com. Library of Congress Cataloging-in-Publication Data:
Hu, Feifang, 1964-
The theory of response-adaptive randomization in clinical trials / Feifang Hu, William F. Rosenberger.
p. cm. Includes bibliographical references and index.
ISBN-13: 978-0-471-65396-7 (cloth)
ISBN-10: 0-471-65396-9 (cloth)
1. Clinical trials. I. Rosenberger, William F. II. Title.
R853.C55H8 2006
610.72'4 dc22
2006044048
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
To our teachers and our students
Contents

Dedication
Preface
1 Introduction
  1.1 Randomization in clinical trials
    1.1.1 Complete randomization
    1.1.2 Restricted randomization procedures
    1.1.3 Response-adaptive randomization procedures
    1.1.4 Covariate-adaptive randomization procedures
    1.1.5 Covariate-adjusted response-adaptive randomization procedures
  1.2 Response-adaptive randomization in a historical context
  1.3 Outline of the book
  1.4 References
2 Fundamental Questions of Response-Adaptive Randomization
  2.1 Optimal allocation
  2.2 The relationship between power and response-adaptive randomization
  2.3 The relationship for K > 2 treatments
  2.4 Asymptotically best procedures
  2.5 References
3 Likelihood-Based Inference
  3.1 Data structure and likelihood
  3.2 Asymptotic properties of maximum likelihood estimators
  3.3 The general result for determining asymptotically best procedures
  3.4 Conclusions
  3.5 References
4 Procedures Based on Urn Models
  4.1 Generalized Friedman's urn
    4.1.1 Historical results on asymptotic properties
    4.1.2 Assumptions and notation
    4.1.3 Main asymptotic theorems
    4.1.4 Some examples
    4.1.5 Proving the main theoretical results
  4.2 The class of ternary urn models
    4.2.1 Randomized Pólya urn
    4.2.2 Birth and death urn
    4.2.3 Drop-the-loser rule
    4.2.4 Generalized drop-the-loser rule
    4.2.5 Asymptotic properties of the GDL rule
  4.3 References
5 Procedures Based on Sequential Estimation
  5.1 Examples
  5.2 Properties of procedures based on sequential estimation for K = 2
  5.3 Notation and conditions for the general framework
  5.4 Asymptotic results and some examples
  5.5 Proving the main theorems
  5.6 References
6 Sample Size Calculation
  6.1 Power of a randomization procedure
  6.2 Three types of sample size
  6.3 Examples
    6.3.1 Restricted randomization
    6.3.2 Response-adaptive randomization
  6.4 References
7 Additional Considerations
  7.1 The effects of delayed response
  7.2 Continuous responses
    7.2.1 Asymptotic variance of the four procedures
  7.3 Multiple (K > 2) treatments
  7.4 Accommodating heterogeneity
    7.4.1 Heterogeneity based on time trends
    7.4.2 Heterogeneity based on covariates
    7.4.3 Statistical inference under heterogeneity
  7.5 References
8 Implications for the Practice of Clinical Trials
  8.1 Standards
  8.2 Binary responses
  8.3 Continuous responses
  8.4 The effects of delayed response
  8.5 Conclusions
  8.6 References
9 Incorporating Covariates
  9.1 Introduction and examples
    9.1.1 Covariate-adaptive randomization procedures
    9.1.2 CARA randomization procedures
  9.2 General framework and asymptotic results
    9.2.1 The procedure for K treatments
    9.2.2 Main theoretical results
  9.3 Generalized linear models
  9.4 Two treatments with binary responses
    9.4.1 Power
  9.5 Conclusions
  9.6 References
10 Conclusions and Open Problems
  10.1 Conclusions
  10.2 Open problems
  10.3 References
Appendix A: Supporting Technical Material
  A.1 Some matrix theory
  A.2 Jordan decomposition
  A.3 Matrix recursions
  A.4 Martingales
    A.4.1 Definition and properties of martingales
    A.4.2 The martingale central limit theorem
    A.4.3 Gaussian approximations and the law of the iterated logarithm
  A.5 Cramér-Wold device
  A.6 Multivariate martingales
  A.7 Multivariate Taylor's expansion
  A.8 References
Appendix B: Proofs
  B.1 Proofs of theorems in Chapter 4
    B.1.1 Proof of Theorems 4.1-4.3
    B.1.2 Proof of Theorem 4.6
  B.2 Proof of theorems in Chapter 5
  B.3 Proof of theorems in Chapter 7
  B.4 References
Author Index
Subject Index
Preface
Research in response-adaptive randomization developed as a response to a classical ethical dilemma in clinical trials. While clinical trials may provide information on new treatments that can impact countless lives in the future, the act of randomization means that volunteers in the clinical trial will receive the benefit of the new treatment only by chance. In most clinical trials, an attempt is made to balance the treatment assignments equally, but the probability that a volunteer will receive the potentially better treatment is only 1/2. Response-adaptive randomization uses accruing data to skew the allocation probabilities to favor the treatment performing better thus far in the trial, thereby mitigating the problem to some degree. Response-adaptive allocation has a long history in the biostatistical literature, and the list of researchers who have worked (at least briefly) in the area reads like a Who's Who of modern statistics: Anscombe, Chernoff, Colton, Cornfield, Flournoy, Greenhouse, Halperin, Louis, Robbins, Siegmund, Wei, Woodroofe, Zelen, and others. Largely because of the disastrous ECMO trial in the early 1980s, there was a general reluctance to use these procedures that has continued to this day. When the authors met in 1995, it was unclear whether these procedures were effective or could be adapted to modern clinical trials and whether certain fundamental questions could be answered. Our collaboration over the past 10 years has been an attempt to formalize the important questions regarding response-adaptive randomization in a rigorous mathematical framework and to systematically answer them. We generally had no idea that we were opening a can of worms that would require a demanding arsenal of mathematical tools. We set out to interest others in the problems, and this led to fruitful collaborations with many other investigators. This book is a result of
these collaborations. It represents what we now know about the subject, and it is our attempt to form a mathematically rigorous subdiscipline of experimental design involving randomization. Two individuals were particularly influential: Z. D. Bai of Singapore, whose collaborative work resulted in solutions to decades-old problems in urn models, largely forming the basis for Chapter 4; and L.-X. Zhang of China, whose collaborative work largely forms the basis of Chapter 5. This book is aimed at Ph.D. students and researchers in response-adaptive randomization. It provides answers to some of the fundamental questions that have been asked over the years: How does response-adaptive randomization affect power? Can standard inferential tests be applied following response-adaptive randomization? What is the effect of delayed response? Which procedure is most appropriate, and how can "most appropriate" be quantified? How can heterogeneity of the patient population be incorporated? Can response-adaptive randomization be performed with more than two treatments or with continuous responses? While the mathematics generated by these problems can sometimes be daunting, the response-adaptive randomization procedures themselves can be implemented in minutes by adding a loop to a standard randomization routine. Procedures can be simulated under various parameterizations to determine their appropriateness for use in clinical trials. Our hope is that any future objections to the use of response-adaptive randomization will not be based on logistical difficulties or the lack of theoretical justification of these procedures. Most of the book is written at the level of graduate students in a statistics program. The technical portions of the book are mostly relegated to appendices and to brief descriptions in Chapters 4 and 5. That material requires advanced probability and stochastic processes as well as matrix theory. Prerequisite material can be found in Appendix A for those wishing to pursue the technical details. In addition, it is recommended that readers new to the area of response-adaptive randomization begin by reading Chapters 10-12 of Rosenberger and Lachin (Randomization in Clinical Trials, Wiley, New York, 2002). We would like to thank our colleagues Z. D. Bai, W. S. Chan, Siu Hung Cheung, Steve Durham, Nancy Flournoy, Bob Smythe, L.-X. Zhang, and Jim Zidek. In addition, we thank our current and former doctoral students Liangliang Duan and Thomas Gwise (Hu); Anastasia Ivanova, Yevgen Tymofyeyev, and Lanju Zhang (Rosenberger). Parts of this book were tested in a short course at a summer school in Torgnon, Italy, organized by Pietro Muliere of Bocconi University. We thank him and his students; in particular, the asymptotic variance in Example 5.9 was derived by the students of the course. As we point out in Chapter 10, open problems abound in this area, and it is our sincere hope that more talented researchers will be attracted to the beauty of the complex stochastic structures encountered throughout this book. However, our greatest hope is that, by providing a firm theoretical underpinning to the concept of response-adaptive randomization in this book, clinical trialists will be motivated to apply these techniques in practice. Finally, much of our research career has benefited from generous funding from the United States government. This included grants from the Division of Mathematical Sciences, National Science Foundation: Hu and Rosenberger 2002-2005, Rosenberger 2005-2008, Hu (Career Award) 2004-2009; and the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health: Rosenberger (FIRST Award) 1995-2000. These grants provided the opportunity to advance our research program, and we wish to recognize the importance of such funding for young researchers.

F. H.
Charlottesville, Virginia

W. F. R.
Fairfax, Virginia
1 Introduction
1.1 RANDOMIZATION IN CLINICAL TRIALS

We begin with a mathematical formulation of the problem. Consider a clinical trial of n patients, each of whom is to randomly receive one of K treatments. A randomization sequence is a matrix T = (T_1, ..., T_n)', where T_i = e_j, j = 1, ..., K, i = 1, ..., n, and e_j is a vector of zeroes with a 1 in the j-th position. We will typically be interested in exploring important properties of the randomization sequence, which usually involves deriving asymptotic properties of the allocation proportions, given by N(n)/n, where N(n) = (N_1(n), ..., N_K(n)) and N_j(n) = Σ_{i=1}^n T_{ij}. Necessarily ||N(n)|| = Σ_{j=1}^K N_j(n) = n. Let X = (X_1, ..., X_n)', where X_i = (X_{i1}, ..., X_{iK}), be a matrix of response variables, where X_i represents the sequence of responses that would be observed if each treatment were assigned to the i-th patient independently. However, only one element of X_i will be observable. Throughout the book, we will consider only probability models for X_i conditional on T_i. In some applications, there may be a set of covariate vectors Z_1, ..., Z_n that are also of interest. In this case, we will consider probability models for X_i conditional on T_i and Z_i. Let 𝒯_n = σ{T_1, ..., T_n} be the sigma-algebra generated by the first n treatment assignments, let 𝒳_n = σ{X_1, ..., X_n} be the sigma-algebra generated by the first n responses, and let 𝒵_n = σ{Z_1, ..., Z_n} be the sigma-algebra generated by the first n covariate vectors. Let ℱ_n = 𝒯_n ⊗ 𝒳_n ⊗ 𝒵_{n+1}. A randomization procedure is defined by

φ_{n+1} = E(T_{n+1} | ℱ_n),
where φ_{n+1} is ℱ_n-measurable. We can describe φ_n as the conditional probability of assigning treatments 1, ..., K to the n-th patient, conditional on the previous n - 1 assignments, responses, and covariate vectors, and the current patient's covariate vector. We can describe five types of randomization procedures. We have complete randomization if

φ_n = E(T_n | ℱ_{n-1}) = E(T_n);

restricted randomization if

φ_n = E(T_n | ℱ_{n-1}) = E(T_n | T_1, ..., T_{n-1});

response-adaptive randomization if

φ_n = E(T_n | ℱ_{n-1}) = E(T_n | T_1, ..., T_{n-1}, X_1, ..., X_{n-1});

covariate-adaptive randomization if

φ_n = E(T_n | ℱ_{n-1}) = E(T_n | T_1, ..., T_{n-1}, Z_1, ..., Z_n);

and covariate-adjusted response-adaptive (CARA) randomization if

φ_n = E(T_n | ℱ_{n-1}) = E(T_n | T_1, ..., T_{n-1}, X_1, ..., X_{n-1}, Z_1, ..., Z_n).
This book will primarily focus on response-adaptive randomization. The latter two classes of randomization procedures, covariate-adaptive randomization and covariate-adjusted response-adaptive randomization, have been poorly studied. Chapter 9 will give an overview of the current knowledge about CARA randomization procedures. The book by Rosenberger and Lachin (2002) gives a thorough discussion of properties of complete and restricted randomization as well as a less thorough treatment of covariate-adaptive and response-adaptive randomization. We now give examples of each of these types of procedures.

1.1.1 Complete randomization
The simplest form of a randomization procedure is complete randomization. For K = 2, T_i = (1, 0) or (0, 1) according to the toss of a coin. Then T_{11}, ..., T_{n1} are independent and identically distributed Bernoulli random variables with the probability of assignment to treatment 1 given by φ_{i1} = E(T_{i1}) = 1/2, i = 1, ..., n. This procedure is rarely used in practice because of the nonnegligible probability of treatment imbalances in moderate samples.
1.1.2 Restricted randomization procedures
Restricted randomization procedures are the preferred method of randomization for many clinical trials because it is often desired to have equal numbers of patients assigned to each treatment, or nearly so. This is usually accomplished by changing
the probability of randomization to a treatment according to how many patients have already been assigned to that treatment. Rosenberger and Lachin (2002) describe four different restricted randomization procedures. The first is the random allocation rule, where, for K = 2 and n even, the probability that the i-th subject is assigned to treatment 1 is given by

φ_{i1} = (n/2 - N_1(i-1)) / (n - i + 1).
The second is the truncated binomial design which, for n even and K = 2, is given by

φ_{i1} = 1/2,  if max{N_1(i-1), N_2(i-1)} < n/2,
      = 0,    if N_1(i-1) = n/2,
      = 1,    if N_2(i-1) = n/2.
The third is Efron's biased coin design (Efron, 1971). From the vector N(i), let D_i = N_1(i) - N_2(i) be the imbalance between treatments 1 and 2. Define a constant a ∈ (0.5, 1]. Then the procedure is given by

φ_{i1} = 1/2,    if D_{i-1} = 0,
      = a,      if D_{i-1} < 0,
      = 1 - a,  if D_{i-1} > 0.
For the fourth procedure, Wei's urn design, Wei (1978) proposed that Efron's procedure be modified so that the degree of imbalance influences the randomization procedure. One technique for doing this is to establish an urn model whereby balls in the urn are labeled 1, ..., K, and each represents a treatment to be assigned. Let Y_n be the urn composition, where Y_{nj} is the number of balls labeled j in the urn after n patients have been assigned, j = 1, ..., K. One begins with an initial urn composition Y_0 = 1. Each randomization is accomplished as follows: a ball is drawn and replaced, its label noted, the appropriate treatment is assigned, and one ball from each of the other K - 1 labels is added to the urn. Thus the restricted randomization procedure is given by

φ_{ij} = Y_{(i-1)j} / Σ_{k=1}^K Y_{(i-1)k},  j = 1, ..., K.
A generalization of this procedure is found in Wei, Smythe, and Smith (1986). Let π(·) = (π_1(·), ..., π_K(·)) be a continuous function such that the following relationship is satisfied: if

N_j(i-1)/(i-1) ≥ 1/K, then π_j(N(i-1)/(i-1)) ≤ 1/K,  j = 1, ..., K.

Then the restricted randomization procedure is given by

φ_{ij} = π_j(N(i-1)/(i-1)),  j = 1, ..., K.
In practice, the random allocation rule and truncated binomial design are usually performed within blocks of subjects so that balance can be forced throughout the course of the clinical trial. Forcing balance within blocks of fixed or random size is called a permuted block design. Alternatively, Efron's biased coin design and Wei's urn design, while not forcing perfect balance, adaptively balance the treatment assignments. See Rosenberger and Lachin (2002) for details.
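A minimal simulation sketch (not from the book) of the two adaptively balancing procedures just described may help fix ideas. The bias parameter a, the initial urn composition, and the sample size below are illustrative choices only.

```python
import random

def efron_bcd(n, a=2/3, seed=0):
    """Efron's biased coin design for two treatments, labeled 1 and 2."""
    rng = random.Random(seed)
    assignments = []
    for _ in range(n):
        d = assignments.count(1) - assignments.count(2)   # imbalance D_{i-1}
        if d == 0:
            p1 = 0.5
        elif d < 0:
            p1 = a            # treatment 1 is behind, so favor it
        else:
            p1 = 1 - a        # treatment 1 is ahead, so disfavor it
        assignments.append(1 if rng.random() < p1 else 2)
    return assignments

def wei_urn(n, K=2, seed=0):
    """Wei's urn design: draw a ball, assign that treatment, add one ball of each other label."""
    rng = random.Random(seed)
    urn = [1] * K                # initial composition Y_0 = 1
    assignments = []
    for _ in range(n):
        # phi_ij = Y_{(i-1)j} / sum_k Y_{(i-1)k}: draw proportionally to the urn contents
        j = rng.choices(range(K), weights=urn)[0]
        assignments.append(j + 1)
        for k in range(K):
            if k != j:
                urn[k] += 1      # add one ball of each of the other labels
    return assignments

if __name__ == "__main__":
    for rule in (efron_bcd, wei_urn):
        t = rule(100)
        print(rule.__name__, "counts:", {lab: t.count(lab) for lab in sorted(set(t))})
```

Running the sketch shows both designs keeping the two arms close to balance without forcing exact equality.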
1.1.3 Response-adaptive randomization procedures
Response-adaptive randomization procedures change the allocation probabilities for each subject according to previous treatment assignments and responses in order to meet some objective. Two objectives that we will be principally concerned with for direct application in clinical trials are maximizing power and minimizing exposure to inferior treatments. Here we give two examples of response-adaptive randomization procedures: one a generalization of Wei's urn design and the other a generalization of Efron's biased coin design. One can generalize Wei's urn design to establish a broad family of response-adaptive randomization procedures based on a generalized Friedman's urn model (Athreya and Karlin, 1968). For response-adaptive randomization, balls are added to the urn based not only on the treatment assigned, but also on the patient's response. Formally, the randomization of patient i is accomplished as follows: a ball is drawn from the urn composition Y_{i-1} and replaced, its label noted (say j), and the j-th treatment is assigned. One then observes the variable X_i, and D_{jk} balls are added to the urn, for k = 1, ..., K, where D_{jk} is a measurable function on the sample space of X_i. The response-adaptive randomization procedure is then given by

φ_{ij} = Y_{(i-1)j} / Σ_{k=1}^K Y_{(i-1)k},  j = 1, ..., K.    (1.2)
Note that if D_{jk} = 1 - δ_{jk} with probability one, where δ_{jk} is the Kronecker delta, we have Wei's urn design. The use of the generalized Friedman's urn model for response-adaptive randomization procedures derives from Wei and Durham (1978) for binary response clinical trials with K = 2. Let X_{ij} = 1 if patient i had a success on treatment j and X_{ij} = 0 if patient i had a treatment failure on treatment j. With this notation, if treatment j was not assigned, then X_{ij} is not observable. Define Y_0 = (α, α) for some positive integer α and let

D_{jk} = δ_{jk}X_{ij} + (1 - δ_{jk})(1 - X_{ij})    (1.3)

for all i. When the procedure (1.2) is used, this is called the randomized play-the-winner rule.
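The following sketch (not from the book) simulates the randomized play-the-winner rule (1.2)-(1.3) for K = 2. The true success probabilities, initial composition α, and sample size are illustrative assumptions.

```python
import random

def randomized_play_the_winner(n, p_success=(0.7, 0.4), alpha=1, seed=0):
    """Randomized play-the-winner rule for two treatments with binary responses."""
    rng = random.Random(seed)
    urn = [alpha, alpha]          # Y_0 = (alpha, alpha)
    assignments, responses = [], []
    for _ in range(n):
        j = rng.choices([0, 1], weights=urn)[0]      # draw a ball and replace it
        x = 1 if rng.random() < p_success[j] else 0  # observe the response
        assignments.append(j)
        responses.append(x)
        # (1.3): a success adds a ball of the assigned type, a failure adds a ball of the other type
        urn[j if x == 1 else 1 - j] += 1
    return assignments, responses

if __name__ == "__main__":
    t, x = randomized_play_the_winner(200)
    print("proportion on treatment 1:", t.count(0) / len(t))
    print("total failures:", len(x) - sum(x))
```

The urn accumulates more balls of the label that has produced more successes, so later patients are more likely to receive the treatment that has performed better so far.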
Wei (1979) extended the randomized play-the-winner rule to K > 2 treatments as follows. Let Y_0 = α1 and

D_{jk} = (K - 1)δ_{jk}X_{ij} + (1 - δ_{jk})(1 - X_{ij})
for all i and again use the procedure (1.2). Note that D = {D_{jk}, j, k = 1, ..., K}, which we call the design matrix, is assumed to be a homogeneous matrix over all patients i = 1, ..., n. This may not be a reasonable assumption in practice, and we expend some energy later in the book dealing with heterogeneous matrices. If we assume homogeneity, many of the asymptotic results that we develop will depend on the matrix H = E(D), which we call the generating matrix. As a second example, for K = 2, Eisele (1994) and Eisele and Woodroofe (1995) describe a doubly-adaptive biased coin design. Unlike the generalized Friedman's urn, the doubly-adaptive biased coin design is based on a parametric model for the response variable. Let the probability distributions of X_1, ..., X_n depend on some parameter vector θ ∈ Θ. Let ρ(θ) ∈ (0, 1) be a target allocation, i.e., the target proportion of subjects desired to be assigned to treatment 1, which depends on the parameter vector θ. Let g be a function from [0, 1]² to [0, 1] such that the following four regularity conditions hold: (i) g is jointly continuous; (ii) g(r, r) = r; (iii) g(ρ, r) is strictly decreasing in ρ and strictly increasing in r on (0, 1)²; and (iv) g has bounded derivatives in both arguments. At the i-th allocation, the function g represents the closeness of N_1(i-1)/(i-1) to the current estimate of ρ(θ) in some sense. Hu and Zhang (2004) proposed the following function g for γ ≥ 0:

g(x, ρ) = ρ(ρ/x)^γ / [ρ(ρ/x)^γ + (1 - ρ)((1 - ρ)/(1 - x))^γ],  0 < x < 1;
g(0, ρ) = 1; g(1, ρ) = 0. This function does not satisfy Eisele's regularity condition (iv), but it satisfies alternative conditions of Hu and Zhang (2004), which we will describe later in the book. Then, for K = 2,

φ_{i1} = g(N_1(i-1)/(i-1), ρ(θ̂_{i-1})),

where θ̂_{i-1} is some estimator of θ based on data from the first i - 1 subjects. When γ = 0 and θ̂_{i-1} is the maximum likelihood estimator of θ, the procedure reduces to

φ_{i1} = ρ(θ̂_{i-1}).
This is called the sequential maximum likelihood procedure, which has been studied by Melfi and Page (2000) and Melfi, Page, and Geraldes (2001). These two examples, one based on a class of urn models and the other on a class of adaptive biased coin designs, lead to an important distinction motivating the approach to response-adaptive randomization. The first approach is completely nonparametric
and is not designed to target some specific allocation based on unknown parameters. The second approach begins with a parametric response model and a target allocation based on unknown parameters of that model and sequentially substitutes updated estimates of those parameters. This book will explore both approaches: the first approach in the context of various urn models and the second approach in the context of the doubly-adaptive biased coin design. In future chapters we will refer to the first approach as procedures based on urn models and the second approach as procedures based on sequential estimation.
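A minimal sketch (not from the book) of the second approach follows: the doubly-adaptive biased coin design for K = 2 binary-response treatments, using the Hu-Zhang allocation function g with tuning parameter γ. The true success probabilities, the shrunken estimates, the start-up rule, and the default target ρ = sqrt(p_A)/(sqrt(p_A)+sqrt(p_B)) are assumptions made for the illustration, not prescriptions from the text.

```python
import math
import random

def g_hu_zhang(x, rho, gamma):
    """Hu-Zhang allocation function; x is the current proportion on treatment 1."""
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    a = rho * (rho / x) ** gamma
    b = (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return a / (a + b)

def dbcd(n, p_true=(0.7, 0.5), gamma=2.0, target=None, seed=0):
    """Doubly-adaptive biased coin design targeting rho(theta_hat) at each step."""
    if target is None:
        target = lambda pA, pB: math.sqrt(pA) / (math.sqrt(pA) + math.sqrt(pB))
    rng = random.Random(seed)
    counts, successes = [0, 0], [0, 0]
    for i in range(n):
        if min(counts) == 0:
            phi1 = 0.5          # start-up: fair coin until both arms have data
        else:
            pA_hat = (successes[0] + 0.5) / (counts[0] + 1)   # lightly shrunken estimates
            pB_hat = (successes[1] + 0.5) / (counts[1] + 1)
            phi1 = g_hu_zhang(counts[0] / i, target(pA_hat, pB_hat), gamma)
        j = 0 if rng.random() < phi1 else 1
        x = 1 if rng.random() < p_true[j] else 0
        counts[j] += 1
        successes[j] += x
    return counts, successes

if __name__ == "__main__":
    counts, _ = dbcd(500)
    print("allocation proportions:", [c / sum(counts) for c in counts])
```

Setting γ = 0 reduces the rule to the sequential maximum likelihood procedure, while larger γ pushes the design more forcefully back toward the estimated target.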
1.1.4 Covariate-adaptive randomization procedures
Clinical trialists are often concerned that treatment arms will be unbalanced with respect to key covariates of interest. To prevent this, covariate-adaptive randomization is sometimes employed. As one example, the Pocock-Simon procedure (Pocock and Simon, 1975) is perhaps the most widely used covariate-adaptive randomization procedure. Let Z_1, ..., Z_n be the covariate vectors of patients 1, ..., n. Further, we assume that there are S covariates of interest (continuous or otherwise) and they are divided into n_s, s = 1, ..., S, different levels. Define N_{sik}(n), s = 1, ..., S, i = 1, ..., n_s, k = 1, 2, to be the number of patients in the i-th level of the s-th covariate on treatment k. Let patient n + 1 have covariate vector Z_{n+1} = (r_1, ..., r_S). Define a metric D_s(n) = N_{s r_s 1}(n) - N_{s r_s 2}(n), which is the difference between the numbers of patients on treatments 1 and 2 for members of level r_s of covariate s. Let w_1, ..., w_S be a set of weights and take the weighted aggregate D(n) = Σ_{s=1}^S w_s D_s(n). Establish a probability π ∈ (1/2, 1]. Then the procedure allocates to treatment 1 according to

φ_{i1} = E(T_{i1} | T_1, ..., T_{i-1}, Z_1, ..., Z_i) = 1/2,    if D(i-1) = 0,
                                                   = π,      if D(i-1) < 0,
                                                   = 1 - π,  if D(i-1) > 0.
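A small sketch (not from the book) of the Pocock-Simon procedure for two treatments follows. The covariate structure (two illustrative covariates), the weights, and the biasing probability π are assumptions chosen for the example.

```python
import random
from collections import defaultdict

def pocock_simon(profiles, weights, pi=0.8, seed=0):
    """Pocock-Simon covariate-adaptive allocation; profiles[i] is patient i's tuple of levels."""
    rng = random.Random(seed)
    N = defaultdict(int)          # N[(s, level, k)] = patients at this level of covariate s on arm k
    assignments = []
    for z in profiles:
        # weighted marginal imbalance D(i-1) over the incoming patient's own levels
        D = sum(w * (N[(s, r, 1)] - N[(s, r, 2)])
                for s, (r, w) in enumerate(zip(z, weights)))
        if D == 0:
            p1 = 0.5
        elif D < 0:
            p1 = pi               # treatment 1 is under-represented on these margins
        else:
            p1 = 1 - pi
        k = 1 if rng.random() < p1 else 2
        assignments.append(k)
        for s, r in enumerate(z):
            N[(s, r, k)] += 1
    return assignments

if __name__ == "__main__":
    rng = random.Random(1)
    # two hypothetical covariates: sex (2 levels) and age group (3 levels)
    patients = [(rng.randrange(2), rng.randrange(3)) for _ in range(200)]
    t = pocock_simon(patients, weights=[1.0, 1.0])
    print("treatment counts:", t.count(1), t.count(2))
```

The design biases each assignment toward whichever arm is behind on the incoming patient's own covariate levels, so marginal balance is maintained without stratifying the randomization list.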
1.1.5 Covariate-adjusted response-adaptive randomization procedures
A relatively new concept in response-adaptive randomization is CARA randomization, in which the previous patient responses and covariate vectors and the current patient's covariate vector are used to obtain a covariate-adjusted probability of assignment to treatment. One approach to this, for binary responses (X_{ij} = 0 or 1, i = 1, ..., n, j = 1, ..., K) using a logistic regression model, was given by Rosenberger, Vidyashankar, and Agarwal (2001). Consider the covariate-adjusted logistic regression model, given by

logit Pr(X_{i1} = 1 | T_i, Z_i) = α + βT_{i1} + Z_i'γ + T_{i1}Z_i'δ,

where α is the intercept, β is the treatment main effect, γ is an S-dimensional vector of covariate main effects, and δ is an S-dimensional vector of treatment-by-covariate interactions. The fitted model after n patients yields maximum likelihood estimators α̂_n of α, β̂_n of β, γ̂_n of γ, and δ̂_n of δ. The covariate-adjusted odds ratio based on the data for n patients and the covariate vector of the (n + 1)-th patient is given by

Ô_{n+1} = exp(β̂_n + Z_{n+1}'δ̂_n).

Then we define the randomization procedure by

φ_{(n+1)1} = Ô_{n+1} / (1 + Ô_{n+1}).
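The following sketch (not from the book) turns fitted estimates into a CARA allocation probability for the incoming patient. The mapping from the covariate-adjusted odds ratio to a probability follows the reconstruction above, which is itself an editorial reading of the garbled source; the estimates and covariate values in main() are purely hypothetical, and no particular fitting routine is assumed.

```python
import math

def cara_allocation_probability(beta_hat, delta_hat, z_next):
    """Probability of assigning the (n+1)-th patient to treatment 1.

    beta_hat  -- estimated treatment main effect
    delta_hat -- estimated treatment-by-covariate interaction vector (length S)
    z_next    -- covariate vector of the incoming patient (length S)
    """
    log_odds_ratio = beta_hat + sum(d * z for d, z in zip(delta_hat, z_next))
    odds_ratio = math.exp(log_odds_ratio)
    return odds_ratio / (1.0 + odds_ratio)

if __name__ == "__main__":
    # hypothetical estimates: treatment 1 looks better for patients with a large first covariate
    print(cara_allocation_probability(0.2, [0.8, -0.3], [1.0, 0.5]))
```

Patients whose covariates predict a larger treatment benefit are thus randomized to that treatment with higher probability.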
1.2 RESPONSE-ADAPTIVE RANDOMIZATION IN A HISTORICAL CONTEXT
In this book, the response-adaptive randomization procedures have three defining characteristics: (1) they are myopic; (2) they are fully randomized; (3) they require a fixed sample size. Let us examine these characteristics in a historical context. The response-adaptive randomization procedures we examine are myopic procedures in that they incorporate current data on treatment assignments and responses into decisions about treatment assignments for the next subject. After each subject, the data are updated, and a decision is made only about the next subject. This differs from the approach in multi-armed bandit problems (see Berry and Fristedt, 1986, for example) in which all possible sequences of treatment assignments and responses are enumerated, and the sequence that optimizes some objective is selected. The latter approach is computationally intensive, so myopic procedures are often preferable from a logistical consideration, although there is no guarantee that such procedures are globally optimal. Most of the work on multi-armed bandit problems has been in the context of nonrandomized studies, although recently there has been some work on randomized bandits by Yang and Zhu (2002). In addition to being myopic, the response-adaptive randomization procedures described are fully randomized in that each subject is assigned a treatment by random chance, the cornerstone of the modern clinical trial. Randomization is necessary to prevent selection bias and covariate imbalances, and to provide a basis for inference (Rosenberger and Lachin, 2002). Historically, response-adaptive treatment allocation designs were developed for the purpose of assigning patients to the better treatment with probability 1. The preliminary ideas can be traced back to Thompson (1933) and Robbins (1952) and then to a series of papers in the 1960s by Anscombe (1963), Colton (1963), Zelen (1969), and Cornfield, Halperin, and Greenhouse (1969). The idea of incorporating randomization in the context of response-adaptive treatment allocation designs stems from Wei and Durham (1978). Finally, the response-adaptive procedures in this book assume a fixed sample size. Historically, response-adaptive treatment allocation designs were often viewed in the context of sequential analysis, where a random number of patients N would be determined according to an appropriate stopping boundary. Early papers in this context were by Chernoff and Roy (1965), Flehinger and Louis (1971), Robbins and Siegmund (1974), and Hayre (1979). However, most modern clinical trials in the
United States are conducted with a fixed, predetermined sample size set according to power and budgetary considerations. Interim monitoring plans are instituted to potentially stop the clinical trial early for safety or efficacy reasons. The imposition of an early stopping boundary is flexible, and in fact, many clinical trials that cross an early stopping boundary are not terminated early, usually for reasons unrelated to the primary outcome of the study. Thus we restrict our attention to myopic, fully randomized, fixed sample size procedures, which reflect very well the modern conduct of clinical trials. In fact, to implement these procedures in an actual clinical trial requires nothing more than a simple modification of the randomization sequence generator. (Of course, there are many interesting and important statistical questions that arise in the context of response-adaptive randomization, which will be the subject of this book.) The methodology for such response-adaptive randomization procedures thus begins in the late 1970s, and much of our current understanding is very recent. Many open problems still abound, and this is a fruitful area for future research.
1.3 OUTLINE OF THE BOOK
In Chapter 2, we identify fundamental questions of response-adaptive randomization, namely, what allocation should we target to achieve requisite power while resulting in fewer treatment failures, and how do we employ randomization to attain that target allocation? In Chapter 3, we describe the impact on likelihood-based inference resulting from response-adaptive randomization procedures and derive a benchmark for measuring the efficiency of a response-adaptive randomization procedure. We then detail the two approaches to response-adaptive randomization: in Chapter 4, we describe procedures based on urn models and derive their asymptotic properties; in Chapter 5, we describe procedures based on sequential estimation and derive their properties. In Chapter 6, we deal with the subtle concept of sample size computation in the context of random sample fractions. Chapter 7 discusses additional technical and practical considerations for response-adaptive randomization: delayed responses, heterogeneity, and more general response types and multiple treatments. Chapter 8 describes some practical considerations in selecting response-adaptive randomization procedures. Chapter 9 gives an overview of our current knowledge about CARA procedures. Two appendices describe certain mathematical background that forms a prerequisite for the more technical parts of the book and give exhaustive proofs of some of the most important theorems in Chapters 4 and 5.
1.4 REFERENCES
Anscombe, F. J. (1963). Sequential medical trials. Journal of the American Statistical Association 58 365-384.
Athreya, K. B. and Karlin, S. (1968). Embedding of urn schemes into continuous time Markov branching processes and related limit theorems. Annals of Mathematical Statistics 39 1801-1817.
Berry, D. A. and Fristedt, B. (1986). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
Chernoff, H. and Roy, S. N. (1965). A Bayes sequential sampling inspection plan. Annals of Mathematical Statistics 36 1387-1407.
Colton, T. (1963). A model for selecting one of two medical treatments. Journal of the American Statistical Association 58 388-400.
Cornfield, J., Halperin, M., and Greenhouse, S. W. (1969). An adaptive procedure for sequential clinical trials. Journal of the American Statistical Association 64 759-770.
Efron, B. (1971). Forcing a sequential experiment to be balanced. Biometrika 58 403-417.
Eisele, J. R. (1994). The doubly adaptive biased coin design for sequential clinical trials. Journal of Statistical Planning and Inference 38 249-261.
Eisele, J. R. and Woodroofe, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Annals of Statistics 23 234-254.
Flehinger, B. J. and Louis, T. A. (1971). Sequential treatment allocation in clinical trials. Biometrika 58 419-426.
Hayre, L. S. (1979). Two-population sequential tests with three hypotheses. Biometrika 66 465-474.
Hu, F. and Zhang, L.-X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-301.
Melfi, V. F. and Page, C. (2000). Estimation after adaptive allocation. Journal of Statistical Planning and Inference 87 353-363.
Melfi, V. F., Page, C., and Geraldes, M. (2001). An adaptive randomized design with application to estimation. Canadian Journal of Statistics 29 107-116.
Pocock, S. J. and Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31 103-115.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58 527-535.
Robbins, H. and Siegmund, D. O. (1974). Sequential tests involving two populations. Journal of the American Statistical Association 69 132-139.
Rosenberger, W. F. and Lachin, J. M. (2002). Randomization in Clinical Trials: Theory and Practice. Wiley, New York.
Rosenberger, W. F., Vidyashankar, A. N., and Agarwal, D. K. (2001). Covariate-adjusted response-adaptive designs for binary response. Journal of Biopharmaceutical Statistics 11 227-236.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in the view of the evidence of the two samples. Biometrika 25 275-294.
Wei, L. J. (1978). An application of an urn model to the design of sequential controlled clinical trials. Journal of the American Statistical Association 73 559-563.
Wei, L. J. (1979). The generalized Pólya's urn design for sequential medical trials. Annals of Statistics 7 291-296.
Wei, L. J. and Durham, S. D. (1978). The randomized play-the-winner rule in medical trials. Journal of the American Statistical Association 73 840-843.
Wei, L. J., Smythe, R. T., and Smith, R. L. (1986). K-treatment comparisons with restricted randomization rules in clinical trials. Annals of Statistics 14 265-274.
Yang, Y. and Zhu, D. (2002). Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Annals of Statistics 30 100-121.
Zelen, M. (1969). Play the winner and the controlled clinical trial. Journal of the American Statistical Association 64 131-146.
2 Fundamental Questions of Response-Adaptive Randomization
In this chapter we describe some fundamental questions that will be useful in making decisions about which response-adaptive randomization procedure to use. We begin with a discussion of optimal allocation to target certain objectives. Then we formulate the relationship between a response-adaptive randomization procedure and power. We then combine the two concepts to develop a framework with which to evaluate response-adaptive randomization procedures.
2.1 OPTIMAL ALLOCATION
In this section we introduce an optimization problem of the following form. Let n = (n_1, ..., n_K), where n1' = n, be the sample sizes on the K treatments and let the probability distributions of responses depend on some parameter vector θ ∈ Θ. For a function η and constant C, we can consider the problem

min_{n_1, ..., n_K} w(θ)n'  subject to  η(n, θ) = C,    (2.1)
where w(θ) = (w_1(θ), ..., w_K(θ)) is a set of positive weights (see Hayre, 1979; Jennison and Turnbull, 2000). It will be convenient to illustrate with the very simplest of clinical trials, where two treatments (A and B) will be compared with respect to a binary response. Let p_A be the probability of success on treatment A and p_B be the probability of success on treatment B (q_A = 1 - p_A, q_B = 1 - p_B). Suppose we have a fixed allocation of n_A subjects on A and n_B subjects on B, where n_A + n_B = n. Consider designing
the trial to test the hypothesis
H_0: p_A - p_B = 0  versus  H_A: p_A - p_B ≠ 0

using the Wald test, given by

Z = (p̂_A - p̂_B) / sqrt(p̂_A q̂_A/n_A + p̂_B q̂_B/n_B),    (2.2)

where p̂_A, p̂_B, q̂_A, q̂_B represent the estimators. (We could alternatively use a score test or likelihood ratio test.) In modern clinical trials, the first activity of the statistician is to determine a requisite sample size to achieve reasonable power, based on relevant assumptions about the treatment effect. Traditionally, many statisticians would fix the sample proportions as n_A = n_B = n/2 and find the minimum sample size n that provides the desired power level. An alternate way of approaching the problem is to fix the variance of the test under the alternative hypothesis and find the allocation n = (n_A, n_B) to minimize the total sample size. Using (2.1) with w_1(θ) = w_2(θ) = 1, θ = (p_A, p_B), and

η(n, θ) = p_A q_A/n_A + p_B q_B/n_B,    (2.3)
we can answer a classical question from clinical trials: for fixed variance of the test statistic under an alternative hypothesis, what allocation minimizes the total sample size? This question is equivalent to asking, for fixed sample size, what allocation maximizes power? Let ρ(θ) be a given proportion of patients assigned to treatment A. From the equation η(n, θ) = C and dropping the dependence of ρ on θ, we have

p_A q_A/(ρn) + p_B q_B/((1 - ρ)n) = C,

letting n_A = ρn, n_B = (1 - ρ)n. Solving for n yields

n = [p_A q_A/ρ + p_B q_B/(1 - ρ)] / C.    (2.4)
Then (2.1) is equivalent to minimizing (2.4) with respect to ρ. Upon taking a derivative, we need to find the solution to the equation

p_A q_A/ρ² = p_B q_B/(1 - ρ)².

The solution is

ρ* = sqrt(p_A q_A) / (sqrt(p_A q_A) + sqrt(p_B q_B))    (2.5)
and is called Neyman allocation. Unfortunately, when p_A + p_B > 1, Neyman allocation allocates more subjects to the inferior treatment. This leads to a second fundamental question: for fixed variance of the test, what allocation minimizes the expected number of treatment failures? In this case, η remains the same as in (2.3), and w_1(θ) = q_A, w_2(θ) = q_B. The solution to this problem is then

ρ* = sqrt(p_A) / (sqrt(p_A) + sqrt(p_B))    (2.6)
(Rosenberger et al., 2001). (For lack of a better name, we refer to this as RSIHR allocation, as an acronym of the authors of the original paper.) For the case where the responses are normally distributed, we can use the same approach with θ = (μ_A, σ_A, μ_B, σ_B), where

η(n, θ) = σ_A²/n_A + σ_B²/n_B.
Neyman allocation results when w_1(θ) = w_2(θ) = 1, and we obtain ρ(θ) = σ_A/(σ_A + σ_B). One possible analog of RSIHR allocation would be to minimize the average response, where w_1(θ) = μ_A and w_2(θ) = μ_B for μ_A > 0, μ_B > 0. This results in ρ = σ_A sqrt(μ_B)/(σ_A sqrt(μ_B) + σ_B sqrt(μ_A)). Unfortunately, this allocation does not always ensure that more patients are assigned to the better treatment, as in the binary case. Extending Neyman and RSIHR allocation to K treatments is conceptually clear using (2.1) but has a number of subtleties that make it a much more difficult problem. We denote the optimal allocation ρ(θ), noting its dependence on unknown parameters. We have, in essence, described a classical nonlinear optimal design problem, in which the optimal solution depends on unknown parameters (see, for example, Atkinson and Donev, 1992). Traditionally such problems would be solved by substituting a local "best guess" of the parameters (locally optimal design), or by averaging over a prior distribution on the unknown parameters (Bayesian optimal design), or sequentially, by substituting data into parameter estimates as they accrue. The latter idea, of sequential design, is the basic premise of response-adaptive randomization. We wish to target an unknown optimal allocation by substituting accruing data into our design, where in our case the "design" is the randomization process. We now return to the general formulation for K treatments, given in (2.1). We modify it slightly and formulate the optimization as two equivalent problems (Tymofyeyev, Rosenberger, and Hu, 2006). Let φ(n_1, ..., n_K) be the noncentrality parameter of a suitable multivariate test statistic of interest under the alternative hypothesis. We assume that the noncentrality parameter is a concave function with nonnegative gradient. The first problem can be stated as follows:

min_{n_1, ..., n_K} wn'  subject to  φ(n_1, ..., n_K) ≥ C  and  n_k / Σ_{j=1}^K n_j ≥ B,  k = 1, ..., K,    (2.7)
where C is some positive constant, and w = (w_1, ..., w_K)' is a vector with positive components. Here we minimize the weighted sum of sample sizes while fixing the value of the noncentrality parameter to be at least at the level C. The constant B ∈ [0, 1/K], KB ≤ 1, is a lower bound for the proportion n_k/Σ_{j=1}^K n_j that allows us to control explicitly the feasible region of the problem. By selecting B > 0, one eliminates the possibility of having no patients assigned to a single treatment. The case when B = 0 is the least restrictive natural constraint, and B = 1/K immediately generates the solution n_k/Σ_{j=1}^K n_j = 1/K, for all k = 1, ..., K, which is equal allocation. Note that problem (2.7) is a convex optimization problem. The second problem is

max φ(n_1, ..., n_K)  subject to  wn' ≤ M  and  n_k / Σ_{j=1}^K n_j ≥ B,  k = 1, ..., K,    (2.8)
where M is some positive constant. In this problem, we maximize the noncentrality parameter of the test for the fixed value of the weighted sum of sample sizes. Again, this is a convex optimization problem and we are interested in the region that is enforced by the same B. It can be shown that formulations (2.7) and (2.8) are equivalent with regard to specifying the same allocation proportions of patients to the treatments (Tymofyeyev, Rosenberger, and Hu, 2006); i.e., the two problems yield the same proportions n_k/Σ_{j=1}^K n_j, k = 1, ..., K.
If, for example, we let w = 1, problem (2.7) minimizes the total sample size subject to a constraint that the noncentrality parameter be at least C. Problem (2.8) maximizes the noncentrality parameter subject to a constraint that the total sample size does not exceed M. This solution (for B = 0) is the analog of Neyman allocation (2.5) for K treatments. Our principal interest is in the analog of RSIHR allocation in (2.7), where w = q, for q = (q_1, ..., q_K), so that wn' is the expected number of treatment failures. Unfortunately, a global solution for all w to (2.7) and (2.8) has yet to be found. For Neyman allocation, there is a closed-form solution, and it is given as Theorem 2 in Tymofyeyev, Rosenberger, and Hu (2006). We state it here without proof. For some positive integers s and g such that s + g ≤ K, let

p_1 = ... = p_s > p_{s+1} ≥ ... ≥ p_{K-g} > p_{K-g+1} = ... = p_K.    (2.9)
Then the vector of optimal proportions when w_k = 1, k = 1, ..., K, given by p* = (p*_1, ..., p*_K) in (2.10), solves both optimization problems (2.7), (2.8) for B ∈ [0, B̄], where B̄ = min{B_1, B_K, 1/K} is given in (2.11).
When B > B̄ and B̄ = B_1, the solution is p* = (B, ..., B, p*_{K-g+1}, ..., p*_K) with p*_{K-g+1} = ... = p*_K = (1 - (K - g)B)/g. When B > B̄ and B̄ = B_K, the solution is p* = (p*_1, ..., p*_s, B, ..., B) with p*_1 = ... = p*_s = (1 - (K - s)B)/s.
The assumption on the ordering of the p_k's in (2.9) is given to specify the ordering and multiplicities of the largest and smallest values of the underlying probabilities by s and g, respectively, for the nondegenerate case when all p_k's are not the same. Note that if B = 0, we obtain a solution on the boundary which involves only the best and the worst treatments: p*_1 = ... = p*_s is strictly positive, the intermediate proportions are zero, and p*_{K-g+1} = ... = p*_K = (1 - sp*_1)/g.
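For the two-treatment binary case, the targets (2.5) and (2.6) can be compared numerically. The sketch below (not from the book) computes both allocations and the expected number of failures each implies; the parameter values are illustrative only.

```python
import math

def neyman_rho(pA, pB):
    """Neyman allocation (2.5): proportion on A that minimizes n for fixed test variance."""
    sA, sB = math.sqrt(pA * (1 - pA)), math.sqrt(pB * (1 - pB))
    return sA / (sA + sB)

def rsihr_rho(pA, pB):
    """RSIHR allocation (2.6): minimizes expected failures for fixed test variance."""
    return math.sqrt(pA) / (math.sqrt(pA) + math.sqrt(pB))

def expected_failures(rho, pA, pB, n):
    return n * (rho * (1 - pA) + (1 - rho) * (1 - pB))

if __name__ == "__main__":
    pA, pB, n = 0.9, 0.5, 200
    for name, rho in [("equal", 0.5), ("Neyman", neyman_rho(pA, pB)), ("RSIHR", rsihr_rho(pA, pB))]:
        print(f"{name:6s} rho = {rho:.3f}  expected failures = {expected_failures(rho, pA, pB, n):.1f}")
```

With p_A = 0.9 and p_B = 0.5 (so p_A + p_B > 1), Neyman allocation puts the majority of patients on the inferior treatment and increases failures, whereas RSIHR allocation reduces the expected number of failures relative to equal allocation.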
2.2 THE RELATIONSHIP BETWEEN POWER AND RESPONSE-ADAPTIVE RANDOMIZATION
Although a particular allocation may be optimal in terms of power and other criteria, we cannot ensure that a fixed optimal allocation will result because the parameters are unknown. Using a response-adaptive randomization procedure to target a specific allocation induces correlation among treatment assignments that can lead to extrabinomial variability that can adversely affect power. This leads to a fundamental question: can we develop response-adaptive randomization procedures that result in fewer treatment failures without a loss of power? To answer this, we need to do a careful analysis of the relationship between power and variability of the procedure. We now consider the relationship between power and some target allocation ρ(θ) (which may or may not be optimal in some sense) following the approach of Hu and Rosenberger (2003). For the test in (2.2), when Z² is asymptotically (for n_A → ∞ and n_B → ∞) chi-square with 1 degree of freedom, then under the alternative hypothesis, power can be expressed as an increasing function of the noncentrality parameter of the chi-square distribution for a fixed target allocation proportion ρ(θ). Using the simple difference measure, the noncentrality parameter can be expressed as follows:

φ(n_A, n_B) = (p_A - p_B)² / (p_A q_A/n_A + p_B q_B/n_B).
Now we define a function f such that f(n_A/n - ρ) equals this noncentrality parameter, regarded as a function of the deviation of the actual allocation proportion n_A/n from the target ρ. We have the following expansion:

f(x) = f(0) + f'(0)x + f''(0)x²/2 + o(x²).

After some calculation, we obtain expressions for f'(0) and f''(0). This yields

φ = (I) + (II) + (III) + O((n_A/n - ρ)²).    (2.12)
The first term (I) is determined by ρ and represents the noncentrality parameter for a fixed design with target allocation ρ. Note that Neyman allocation maximizes this term. We can use this term to compare the power of different target allocations. The second term (II) represents the bias of the actual allocation from the optimal allocation. With the design shifting to a different side from the target proportion ρ, the noncentrality parameter will increase or decrease according to the coefficient. It is interesting to see that this coefficient equals 0 if and only if p_A q_A(1 - ρ)² - p_B q_B ρ² = 0, that is,

ρ = sqrt(p_A q_A) / (sqrt(p_A q_A) + sqrt(p_B q_B)),

i.e., Neyman allocation.
Under certain response-adaptive randomization procedures, the test statistic Z², with N_A(n) and N_B(n) substituted for n_A and n_B, still has the asymptotic chi-square distribution with 1 degree of freedom and the same noncentrality parameter under alternatives (with N_A(n) and N_B(n) replacing n_A and n_B), despite the complex dependence structure of the treatment assignments and responses. The specific procedures for which the asymptotic properties hold and the conditions required will be the subject of Chapter 3. For response-adaptive randomization procedures for which the asymptotic chi-square distribution holds, we can substitute the random variable N_A(n) for n_A in equation (2.12). The noncentrality parameter then becomes a random variable which is a function φ(N_A(n)). We then consider the expectation of the noncentrality parameter, given by E(φ(N_A(n))). Most procedures will be asymptotically unbiased, so we can assume that E(N_A(n)) - nρ = 0 for large n, although the rate of convergence will be of interest. For small to moderate samples, this may not be an appropriate approximation. Assuming that the second term vanishes, the average power lost by the procedure is then a function of E[(N_A(n)/n - ρ)²], which is a direct function of the variability in the design, Var(N_A(n)/n). So we now have the precise link between power and the variability of the design. Thus we can use the variance of N_A(n)/n to compare response-adaptive randomization procedures with the same allocation limit. The above derivation determines a template for theoretically evaluating response-adaptive randomization procedures in terms of power. When the usual test statistic has an asymptotic chi-square distribution, power can be evaluated by examining three things: (1) the limiting allocation of the procedure; (2) the rate of convergence to the limiting allocation; and (3) the variability of the procedure.
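A numerical sketch (not from the book) of this template follows: the expected noncentrality parameter under a response-adaptive procedure is approximated by a second-order expansion around the target ρ, so any extra Var(N_A(n)/n) shows up directly as lost power. The normal-approximation power formula and the illustrative variance values are assumptions made for the example.

```python
import math

def noncentrality(rho, pA, pB, n):
    qA, qB = 1 - pA, 1 - pB
    return (pA - pB) ** 2 / (pA * qA / (rho * n) + pB * qB / ((1 - rho) * n))

def expected_noncentrality(rho, pA, pB, n, var_prop, eps=1e-4):
    """Second-order approximation: f(0) + 0.5 * f''(0) * Var(N_A(n)/n)."""
    f = lambda r: noncentrality(r, pA, pB, n)
    second_deriv = (f(rho + eps) - 2 * f(rho) + f(rho - eps)) / eps ** 2
    return f(rho) + 0.5 * second_deriv * var_prop

def approx_power(ncp, z=1.959964):
    """Normal approximation to the power of the two-sided Wald test."""
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return Phi(math.sqrt(ncp) - z) + Phi(-math.sqrt(ncp) - z)

if __name__ == "__main__":
    pA, pB, n, rho = 0.7, 0.5, 300, 0.5
    for var_prop in (0.0, 0.001, 0.005, 0.01):
        ncp = expected_noncentrality(rho, pA, pB, n, var_prop)
        print(f"Var(N_A/n) = {var_prop:.3f}  approx power = {approx_power(ncp):.3f}")
```

The printed table makes the trade-off concrete: two procedures with the same limiting allocation can deliver different power simply because one has a more variable allocation proportion.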
2.3 T H E RELATIONSHIP FOR K
> 2 TREATMENTS
In this section, we explore the relationship between power and response-adaptive randomization for multivariate hypotheses of multiple (K > 2) treatments in a clinical trial. The idea of optimizing the noncentrality parameter of a multivariate test to achieve a prespecified power derives from an early paper by Lachin (1977). Lachin (2000) presents the important alternative hypotheses for multivariate tests. The contrast test of homogeneity compares K - 1 treatments to a control. This is only one possible set of contrasts, which could also involve successive differences or each treatment versus the average over all treatments, among others. The most likely format for a clinical trial is the comparison of K - 1 treatments to a control, and this is the alternative on which we focus. We also explore an omnibus alternative that specifies that at least one treatment differs from a null value. Define p_1, ..., p_K to be the success probabilities for the K treatments and let ρ_1(θ), ..., ρ_K(θ) be the optimal allocation proportions for each treatment. Provided that the noncentrality parameter is convex with nonnegative gradient, we can use the formulation in (2.7) or (2.8) to derive the appropriate allocations. For the contrast test of homogeneity, define
p_c = (p_1 - p_K, ..., p_{K-1} - p_K)  and  p̂_c = (p̂_1 - p̂_K, ..., p̂_{K-1} - p̂_K).
Also we assume that cov[(D_{qk}(i), D_{ql}(i)) | ℱ_{i-1}] → d_{qkl} almost surely for all q, k, l = 1, ..., K.

(B2) There exists a constant λ_0 ≥ 0 for which the function g(x, y) satisfies
REMARK 5.1. Condition (B2) is satisfied with λ_0 = 0 if we assume that the (m + 1)-th patient is assigned to treatment k with a probability less than v_k whenever N_{m,k}/m > v_k. In such a case, the biased coin design analyzed by Smith (1984) and Wei, Smythe, and Smith (1986) is a special case of this generalization. Their g(x, ρ) does not depend on ρ.
By symmetry, condition (B2) can be replaced by one of the following conditions:

(B2') There exists a constant 0 ≤ λ_0 < 1 such that for each k = 1, ..., K,

(B2'') There exist two constants 0 ≤ λ_0 < 1, a ≠ 0, and an invertible real matrix S = (s_1, ..., s_K) such that S1' = a1' and for each k = 1, ..., K,
REMARK 5.2. Condition (B3) is easily understood. At the stage m + 1, if all estimated proportions ρ̂_j, j = 1, ..., K, are not very small, but the sample proportion N_{m,k}/m is very small, then the probability of the assignment of the (m + 1)-th patient to treatment k should not be too small, to avoid experimental bias. This condition can be simply replaced by the condition that

N_{n,k} → ∞ almost surely,  k = 1, ..., K.    (5.8)
If g(x, y) = g(x) is only a function of x, then condition (B3) is not needed, since this condition or (5.8) is only used for ensuring the consistency of θ̂_n (cf. Lemmas 5.4 and 5.5). Also, if condition (B2') is satisfied at the point (v, v) and g_k(v, v) = v_k, then (B3) is obviously satisfied. So conditions (ii) and (iii) of Eisele (1994, 1995) or Eisele and Woodroofe (1995) imply this condition. REMARK 5.3. Conditions (B1) and (B4) are usually satisfied. In practice, we can check these two conditions very easily.
The third group of conditions is on the proportion function ρ(·). The proportion function ρ(z) = (ρ_1(z), ..., ρ_K(z)), with argument z = (z_1, ..., z_K) = (z_{11}, ..., z_{1d}, ..., z_{K1}, ..., z_{Kd}) and ρ: R^{dK} → (0, 1)^K, satisfies the following conditions:

(C1) ρ(0) = v and ρ(z) is a continuous function.

(C2) There exists δ > 0 for which
For a function h(u, w): R^L × R^M → R^K, we denote ∇_u(h) and ∇_w(h) to be the gradient matrices related to the vectors u and w, respectively. The matrices H and E obtained from these gradients of g and ρ are two K × K matrices, and the gradient of ρ is a (dK) × K matrix. Obviously, H1' = E1' = 0' since g(x, y)1' = 1. So, H has an eigenvalue λ_1 = 0 and has the following Jordan decomposition:
79
is a ( d K ) x K matrix. Obviously, H1' = El' = 0' since g(z,y)l' = 1. So, H has an eigenvalue X1 = 0 and has the following Jordan decomposition:
t - ' H t = diag(0, J z , . . . , Js), where J , is a vt x vt matrix, given by -At
Jt=
0 0
.. .
0
1 At
0 1
...
00
...
At
...
..
.
... 0 . . .. ..
0
0
0
*.
We may select the matrix t so that its first column is 1'. Let X = max{Re(Xz), . . . , Re(&)}and v = maxj{vj : Re(Aj) = A}. Further, if condition (B2) is also satisfied, then due to an argument similar to that made in Section 3 of Smith (1984) or in the proof of Lemma 3.2 of Wei, Smythe, and Smith (1986),
t - ' H t = dzag(0, A,.
.. ,A)
and H = AH0 = X(I - l'u),
(5.10)
where Ho = tdiag(O,l,. ., l ) t - ' = I - l'u, and u is the first row oft-'. We will use conditions (Al), (B l)-(B3), and (Cl) to establish strong consistency, and conditions (A2), (B4), and (C2) to establish asymptotic normality. We now make some further remarks on the assumptions.
.
REMARK 5.4. If g(z,y) and p ( z ) are twice differentiable at points (v, v ) and 0, respectively, or the second partial derivatives of them are bounded in a neighborhood of the points (v, v ) and 0, respectively, then conditions (B4) and (C2) are satisfied with 6 = 314. REMARK5.5. Our conditions on the allocation rule are weaker than those used by Smith (1984). Conditions (BIHB3) are weaker than the conditions (i)-(iii) in Eisele (1994, 1995) or Eisele and Woodroofe (1995). Their (iv) is a global condition; our (B4) is a local one instead. Also, condition (C2) is weaker than their (vi). Any condition of the form of their condition (v) is not assumed in this paper. It is usually difficult to verify their (v) in applications.
5.4
ASYMPTOTIC RESULTS AND SOME EXAMPLES
Now we state the three main asymptotic theorems.
THEOREM 5.1. (Strong Consistency) Ifconditions (Al), (BI), (B2), (B3), and (Cl) are satisfied, then N,/n -+ v and &, + w almost surely.
80
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
THEOREM 5.2. (Rates ofConsistency) If conditions (A2), (Bl)-(B4), (Cl), and (C2) are satisfied and X < 1, then for any K. > (1/2) V A, n-'(N,
- nv) -+ 0 ass. and j3, - v = 0
({F) a.s.
Furthermore, if X < 1/2, then
THEOREM 5 . 3 . (Asymptotic Normality) Suppose conditions (A2), (B 1)-(B4), (C l), and (C2) are satisfied. If X < 1/2, then we have n'/'(N,/n
- v, Gn - v ) + N ( 0 ,A). W
In order to define A, we need additional notation. We can write A as a partitioned matrix (5.11) where All, A12, and X3 can be calculated by using martingales and Gaussian processes (Appendix A). We can obtain them as follows. Let
v,,= V m - ( X l , k )= [COV [Xl,kl,Xl,kj];i , j = 1,. . . ,d] ,
(5.12)
fork = 1,.. . , K . Define
v = X3
1
1
Vl
VK
diag(-Vl,...,- VK),
(5.13)
= = diag(v) - v'v,
X2 = E'X3E.
(5.15)
Also, we let Wt and Bt be two independent standard K-dimensional Brownian motions. Define the Gaussian process
to be the solution of the equation dGt = (dWt)C:'2
+ B t Ct : / 2 d t + -Hdt, Gt t
Go = 0,
ASYMPTOTIC RESULTS AND SOME EXAMPLES
81
and define uH to be
Further, let
and
The result in Theorem 5.3 can be strengthened, and we now present Theorem 5.3'. Theorem 5.3 is an immediate consequence of Theorem 5.3'.
THEOREM 5.3'. Suppose conditions (A2), (Bl)-(B4), (Cl), and (C2) are satisfied. If A < 1/2, then n-'12
pint] -
IntIu, IntIF,,t1 - [ntlv)
5 (Gt, &xy2)
in the space D[o,l~ with the Skorohod topology.
REMARK5.6. Ifcondition (B2) or (B2") is satisfied, then by (B.81), aH =
c-
j=O
(Xl0gu)j
j!
Ho = UXHo = UX(I - l'u).
Also, C11' = 0, El' = 0. So, H ~ C I H=OC1, HhC2Ho = C2 and EN0 = E . It follows that Aii =
and
N&Ho 1- 2 x
h x 2 H o - El 2x2 -+ ( 12-NX)(1 - 2 4 1- 2x + (1 - X)(1 - 2 4 '
The first part of Gt is a Gaussian process with covariance function tXs’-XX1/( 1 2X), which agrees with (3.1) of Smith (1984). If the desired allocation proportions are known, then the second part of Gt does not appear since E = 0 and C2 = 0.
REMARK 5.7. For K = 2, we have
REMARK 5.8. Theorem 5.1 shows that the allocation tends toward Efron’s biased coin design with the desired probabilities as the size of the experiment increases. Theorem 5.2 provides a law of the iterated logarithm for the procedure. This theorem also applies to the adaptive bias coin designs (Wei, 1978) and the designs in Smith (1984) and Wei, Smythe, and Smith (1986). The general variance-covariance formula in Theorem 5.3 is very important because it can be used when comparing the design with other sequential designs. Now we give two examples for the multi-treatment clinical trial.
EXAMPLE 5 . 1 1 . For K > 2, suppose the responses of patients on each treatment are also dichotomous (i.e. success or failure). Let pk = Pr(success1 treatment k ) and qk = 1 - p k , k = 1 , . ..,I 1 are constants. Here the function g depends on the constant L for technical reasons, but we can choose large L to reduce its influence. For this function,
..* l-vz... -v2
1-Vl -v1
-v2
-211
=: - T H O and E = (1 +-y)Ho,
* * .
where Ho = I - l‘v. Obviously, g(z,y ) satisfies conditions (BI) and (B4). Also,
so (B3) is satisfied. For verifying condition (B2), we let f k ( 5 k ) = T ( Z )=
k x k = l fk(2k).
{
2)4%)7)
A L,
Obviously, f k ( x k ) < v k if Z k > v k , k = 1,. .. ,K , and
It follows that T ( z )1 1. So,
Therefore condition (B2) is satisfied. Furthermore, since Hol‘ = 0, V(p)l’ = 0 and vl’ = 1, we have C2 = (l+y)2HbC3Ho= ( l + ~ ) ~ Z ’ and 3 C3E = ( l + y ) & H o = ( l + y ) & .
So if p ( z ) is chosen to satisfy conditions (Cl) and (C2), then by Theorems 5.1-5.3 and Remark 5.1,
and n’/2(N,/n - V , 6,
- V)
+
N(O,A)
in distribution, where (5.18)
84
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
Also, A l l + C3 as y -+ 00. If the desired allocation proportions are the same as that in Wei's design, then the proportion function p( a) is
and the estimate of p = ( p l , ,.. ,pk) is
where Sm,k is the number of successes of all the Nn,k patients on treatment k in the first m stages. In this case
where ej is the vector of which the j-th component is 1 and others are 0. Also, ifthe desired allocation proportions are ...! / CEl then we can choose
(m, m)
p(u) =
As y
4
m,
(d"GiT!...,d%iTGa)/g~-. j=l
co,we have an asymptotically best procedure for the specified target.
EXAMPLE 5.12. Consider a K-arm clinical trial with continuous responses. Suppose that the responses from k ( k = 1, ..., K )treatments follow a normal distribution with mean p k and variance u:, As an extension of Example 5.9 (Baldi Antognini and Giovagnoli, 2005), we can use the following limiting proportions:
We also use the allocation function (5.17). Based on Example 5.1 1, we know that this allocation function satisfies the conditions of Theorem 5.3. It is easy to see that the conditions of Theorems 5.1-5.3 are all satisfied. Then we can obtain the asymptotic distribution of N , as well as its asymptotic variance-covariance matrix. Here we derive the case K = 3 with details. Similar to Example 5.1, we can calculate the matrices V1, V z , and Vs as in (5.12). They are
PROVING THE MAIN THEOREMS
k = 1 , 2 , 3 . From (5.14), (5.15), and (5.16),
+ +u
= E3 = (61
02”
y
u:(u; + U S ) -u:u; -u:u3”
-0:u; -u:u; u f ( u : + u ~ ) -0;u; -I+; us(.: u;)
By Theorem 5.3,
in distribution. When y -+ this particular target.
5.5
00
+
85
I
.
we again have an asymptotically best procedure for
PROVING THE MAIN THEOREMS
In order to prove Theorems 5.1-5.3, we need to use several complicated formulations. The proofs are found in Hu and Zhang (2004). Here we sketch the basic ideas without giving the details. The interested reader can see the complete details in Appendix B. The basic techniques used in the proof are as follows: (a) Developing a matrix recursion (see Appendix A). (b) Using the Jordan decomposition of the matrix recursion (see Appendix A). (c) Determining the order of some terms in the matrix recursion and applying the martingale central limit theorem (see Appendix A) to the leading term. (d) Applying Gaussian approximations (see Appendix A) to calculate the covariance structure. Let Fi = u(T1,.. . , T , , X 1 , . . , X i ) be the sigma-algebra generated by the previous i stages. Under Fm-l, T mand X, are independent, and
.
Let M , = C:=, AM,, where A M , = T , - E[TmIFm-1].Then
Therefore
86
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
For both Theorems 5.1 and 5.2, we have to evaluate the asymptotic order of the Mm, three terms. We obtain strong consistency and asymptotic normality of because it is a martingale sum. However,
xi=,
depends on both
N,-1
and bn-l and
depends on N,-1. To evaluate these two terms, we need several lemmas. Lemma 5.1 and Lemma 5.2 deal with the matrix recursion. Lemma 5.3 is about a sequence recursion of numbers. Lemma 5.4and Lemma 5.5 are about the consistency and rate of convergence of the parameter estimators when N n , k + 00 almost surely for each k = 1, ...)K . LEMMA5.1. Let Bn,n= I and Bn,i = nyz:(I matrices Q, and P, satisfy Q, = P,
+ C YQ H n-l
k= 1
,
Q, = A P n + Qn-1
i.e.,
where A P 1 = P I ,AP, = P,
+ j - I H ) . If two sequences of
- Pn-l,n 2 2, are the differences of P,, then
(5.19)
Also,
llBn,mll 5 C(n/m)xlog"-'(n/m) for all m = 1 , . . . , n , n 2 1, where log%= ln(x V e ) . LEMMA5.2. If two sequences of matrices Q, and P, satisfy AQ, = A P n then for any 6 > 0,
+n-1
n-1
(5.20)
87
PROVING THE MAIN THEOREMS
LEMMA5.3. Let A0 2 0 and KO > 0 be two real numbers, and let {q,} be a sequence of nonnegative numbers and { p , } a sequence of real numbers for which qn
5
A0 + =)(qn-l
(1
V
K O )+ A p n ,
n 2 2,
where A p l = p l and A p , = p , - p , - ~ , n 2 2. Then there exists a constant which depends only on A0 such that
LEMMA5.4, For each k = l,.. . ,K, we have that
{&,k
+ m}
C >0
almost surely
implies
(47)
if (Al),
en& - e,,
=0
(5.21)
if(A2),
i = 1, ...,d.
LEMMA5.5. If conditions (Al), (B3), and (C1) are satisfied, then Nn,k almost surely, k = 1,.. . , I(, and
6 , + 0 and
p^,
-+
-,
00
v a.s.
Further, ifcondition (A2) is alsosatisfied, then &,ki--8ki = o(J(log log Nn,k)/Nn,k) almost surely, k = 1,. . . ,K , i = 1,. . . ,d. Under conditions (Al), (BI), (B2), (B3), and (CI),we have Nn,k
k = 1,...,K , a n d
6, * 0 and
j3,
-+
+ 00
as.,
v
almost surely from Lemma 5.5. Then by using Lemma 5.3, we can show the strong consistency of N , (Theorem 5.1). Theorem 5.2 can be shown similarly. Now we consider Theorem 5.3. First, we can represent both the parameter estimators and the target as martingale sums approximately. Thus, h
n and
a.s.
88
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
where Q, is defined as Q, =
xi=, AQ,,
AQ, = (AQ,,I,. ..,AQ,,K)
with
= (AQ,,ki;i
= 1,.. . , d , k = l , . .. ! I ( )
Then Q, is a martingale sequence in R K x d , and Q, = O(d-) a.s. by condition (A2) and the law of the iterated logarithm from Theorem A.lO. Finally, the asymptotic normality of N , follows from the martingale central limit theorem from Theorem A.14. 0 5.6
REFERENCES
ATKINSON,A . C. (1982). Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika 69 6 1-67. BAI,2. D. A N D Hu, F. (1999). Asymptotic theorem forurn modelswithnonhomogeneous generating matrices. Stochastic Processes and Their Applications 80 87-101. BALDIANTOGNINI, A. A N D GIOVAGNOLI, A. (2005). On the large sample optimality of sequential designs for comparing two or more treatments. Sequential Analysis 24 205-2 17. BANDYOPADHYAY, u. AND BISWAS,A. (2001). Adaptive designs for normal responses with prognostic factors. Biomefrika 88 409-4 19. EFRON,B. (1971). Forcing a sequential experiment to be balanced. Biometrika 62 347-3 52. EISELE,J. R. (1 994). The doubly adaptive biased coin design for sequential clinical trials. Journal of Statistical Planning and Inference 38 249-262. EISELE,J. R. (1995). Biased coin designs: some properties and applications. In Adaptive Designs (Flournoy, N . and Rosenberger, W. F., eds.). Institute of Mathematical Statistics, Hayward, 48-64. EISELE,J. R. AND WOODROOFE, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Annals of Statistics 23 234-254. Hu, F. A N D ROSENBERGER, W. I?. (2003). Optimality, variability, power: evaluating response-adaptive randomization procedures for treatment comparisons. Journal of the American Statistical Associaiion 98 67 1-678. Hu, F. A N D ZHANG,L.-X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-30 1. MELFI,v. AND PAGE, c.(2000). Estimation after adaptive allocation. Journal of Statistical Planning and Inference 29 107-1 16. ROSENBERGER, W. F., STALLARD, N., IVANOVA, A . , HARPER,C. N., A N D RICKS,M. L. (2001). Optimal adaptive designs for binary response trials. Biometrics 57 909-9 13. SILVEY,S. D. (1980). OpfimumDesign. Chapman and Hall, London.
REFERENCES
89
SMITH,R. L. (1984). Properties of biased coin designs in sequential clinical trials. Annals of Statistics 12 1018-1034. SMYTHE, R. T. (1996). Central limit theorems for urn models. Stochastic Processes and Their Applications 65 1 15-1 37. THOMPSON, W.R. (1933). On the likelihood that one unknown probability exceeds another in the review of the evidence of the two samples. Biometrika 25 275-294.
WEI, L. J . (1978). The adaptive biased coin design for sequential experiments. Annals of Statistics 6 92-100. WEI, L. J. (1979). The generalized P6lya’s urn design for sequential medical trials. Annals of Statistics 7 291-296. WEI, L. J., SMYTHE,R.T., AND SMITH,R. L. (1986). K-treatment comparisons with restricted randomization rules in clinical trials. Annals of Statistics 14 265-274. ZHANG,L. A N D ROSENBERGER, W. F. (2006). Response-adaptive randomization procedures in clinical trials with continuous outcomes. Biomerrics 62 562-569.
This Page Intentionally Left Blank
6 Sample Size Calculation
In the stages of planning a clinical trial, it is important to determine the number of subjects to be used in the trial. As pointed in Friedman, Furberg, and DeMets (1998):
Clinical trials should have sufficient statisticalpower to dctcct diffcrencesbctween groups considercd to be of clinical interest. Thercforc, calculation of sample size with provision for adequatc levels of significanceand power is an essential part of planning.
In the clinical trials literature, sample size is computed for fixed sample sizes and n = n1 nz. For example, both Eisele and Woodroofe (1995) and Rosenberger and Lachin (2002, page 26) assume that the allocations are fixed (not random) and predetermined. However, given a fixed sample size n of a randomization procedure, the number of subjects assigned to each treatment, n1 and n2, are random variables, unless the randomization procedure employed is a forced-balance design, such as the random allocation rule, truncated binomial design, or permuted block design (Rosenberger and Lachin, 2002). Therefore, as we have seen in Chapter 2, the power of a randomization procedure with a fixed sample size is also a random variable. If one uses a sample size based on the formula for a fixed design for a randomized clinical trial, then the clinical trial may not have sufficient statistical power with very high probability. Examples can be found in Section 6.3. In this chapter, we will study the random power function and then propose requisite sample size formulas for randomization procedures. n1, n2,
+
91
92
SAMPLE SIZE CALCULATION
6.1
POWER OF A RANDOMIZATION PROCEDURE
We now focus on the power function for comparing two treatments (1 and 2 ) in clinical trials. Suppose there are Nnl and Nn2 (Nnl Nn2 = n) patients on treatments 1 and 2 , respectively. Let X I , ...,X N , ~be the responses of patients on treatment 1 and Y1,...,Y N be ~ the ~ responses on treatment 2. For simplicity of illustration, we assume that x1 N ( P l , 4 andY1 " / 4 2 , &
+
'v
where both u? and uz are known and 1-11 and p2 are two unknown parameters. We only consider a one-sided hypothesis test, given by
Ho : p1 = p2 versus H 1 : p l > 1.12.
x
Without much difficulty, we can generalize the problem to a two-sided test. Let and y be the estimators of 1-11 and p2, respectively. For a given significance level a, we reject HOif
is defined as Pr(2 > qU))= where qOl) variable. The power function is now
cy
and 2 is a standard normal random
where Q is the cumulative distribution function ofthe standard normal distribution. It is clear that ,!?(PI,p2, Nnl, Nn2) is a random variable. Before we study the properties of this power function, we state the following condition.
ASSUMPTION6.1. For a given randomization procedure, we assume the following asymptotic results hold: N n l / n -+ v almost surely with 0 < v < 1
(6.3)
and
f i ( N , : / n - Y ) --$ N ( O , T ~ ) in distribution for some T~ > 0.
(6.4)
POWER O f A RANDOMIZATION PROCEDURE
93
REMARK6.1. Assumption 6.1 is usually true for most restricted randomization procedures. Theorem 4.3 and Theorem 5.3 ensure that Assumption 6.1 holds for most of the response-adaptive randomization procedures discussed in Chapters 4 and 5. To study properties of the power function, we define a function
After some simple calculations, we can obtain
where q!~is the density function of the standard normal distribution, and
We now state the following approximate result for the power function of a randomization procedure (Hu, 2006).
THEOREM 6.1. Under Assumption 6.1, we have the following approximation for large n:
94
SAMPLE SIZE CALCULATION
PROOF.From (6.6), we have
P( P I
1
C L 1~Nn 1 , Nn2
Now f,(lc) is adifferentiable function for small 1x1. When n is large, both @(a,) and an$(an) are bounded, because @(a,) converges to 0 much faster than a, converges to 03. Therefore fz(lc)is bounded for small 1x1. Based on Assumption 6.1 and a Taylor expansion,
+0.5f;(O)
(2
- u)
+ oP
((% ') - u)
When fL(0)# 0, we have
=
fi
(2
- u)
2
+ 0.5f~(O)(f~(O))-'fi (% n - U ) +~ ~ ( n - ' / ~ ) .
Because fL(0) # 0 and f i ( c l . 1 - p.2) is bounded by some constant, therefore, 0.5f;(O)(fA(O))-' is bounded. By Assumption 6.1,
n
REMARK 6.2. The function f,(O) is determined by u and represents the power for a fixed design (with u as the allocation proportion, that is, N,l/n = u). The value fn(0)is fixed for a given n. Also, fn(0) is maximized by taking v=-
01 01
+
u2
or Neyman allocation.
REMARK 6.3. When fA(0)# 0, the main random term is fA(0)(Nnl/n- u ) . The power function is mainly influenced by N,l/n - v, a random variable. The power will increase or decrease according the value of N,l/n. From Theorem 6.1, the power of a randomization procedure is approximately normally distributed for large n. To control the influence of this term, it is important to make fA(0) = 0, which is againNeymanallocation. When fA(0) = 0, the main random term is f:(0)(Nnl/n-
POWER OF A RANDOMIZATION PROCEDURE
95
v ) ~ .This is a second-order term and usually quite small. However, for most randomization procedures, f; (0) # 0.
For a randomization procedure, the average power is PO(P1, P2 1 n) =
EP(P1, P2, Nnl , K 2 ) .
Based on Theorem 6.1, we can obtain a result for the average power lost due to using a randomization procedure (Hu, 2006).
THEOREM 6.2. Under the assumptions of Theorem 6.1, if E(N,l/n) -v = o(n-I), then
P o ( P ~~,
2n ) , = f,(O)
+ 0.5ft(O)E (+- v)' + o (E (+- v)
')
When a, > 0, we have fl(0)< 0. Therefore, the average power lost from using a randomization procedure is -0.5f:(O)E(N,l/n - v )2 , which is a function of the variabililty of the randomization procedure.
PROOF. From Theorem 6.1, we have r3o(P1,
P2, n ) = E
m ,P2, Nn1, Nn2)
= fn(0) + fA(O)E( N n ~ / n- v)
(% - v) f,(O) + 0.5ft(O)E(*n +0.5fl(O)E
=
from the condition E(N,l/n) - u = .(TI-'). Because a, > 0, we need only show that
+o ( E (
-
- v)
')
.)'+ (* ') o(E
n - V)
Now we show that f " ( 0 ) < 0.
This is obtained as
-
(.a;
+ (1 - v)u:)(v3u; + (1 - v ) 3 4 ) - (v'u; - (1 - v) u2l )2 2
v4(1- 4 4
REMARK6.4. We are usually interested in the value n such that f,(O) is around power 0.8 or higher. In this case, the condition a, > 0 is satisfied. The condition E(N,l/n) - v = is weaker than the usual unbiasedness condition E ( N n l / n )- v = 0.
.(.-')
96
SAMPLE SIZE CALCULATION
REMARK6.5. By using a randomization procedure, the average loss of power is given by - 0 . 5 f ” ( 0 ) E ( N n l / n - v ) ~which , depends on the variability of the design. This agrees with results of Chapter 2. It is important to note that the average power lost has order n - l , and this order is not very significant in simulations. This may be the reason why the randomness is ignored in sample size calculations, because one usually only checks the average power of a fixed sample size by simulation. However, in practice, one only runs a clinical trial once; therefore, it is critical to consider the random power instead of the average power. Now we consider the following general case. Suppose f i 1 and li.2 are the corresponding estimators of p 1 and p2 based on the data. Let B: and 62”denote the corresponding variance estimates of f i f i . 1 and f i f i z , respectively. ASSUMPTION 6.2. Assume that as Nnl -+ 00 and Nn2 -+ co almost surely, (i) By -+ 0: and d; in probability and (ii) m(fi1 - p l ) -+ N ( O , o ; ) and m ( f i 2 - pz) -+ N ( O , o ; ) in distribution, where 0: and 0; are some positive constants. --$
02”
When the estimators f i 1 and f i z are maximum likelihood estimators, moment estimators, or estimators from some estimating functions, Assumption 6.2 is usually satisfied. Based on Assumption 6.2, for a given significance level a, we reject Ho if
We can then calculate the approximated power similar to Theorems 6.1 and 6.2. Details are provided in Hu (2006). 6.2
THREE TYPES OF SAMPLE SIZE
In this section, we consider the requisite sample size for a randomization procedure. Here we assume that a:, 0;and p1 - 112 are given. To achieve power p, it is required that
which, under Assumption 6.2, is approximately equivalent to
After some simple calculations, N,1 and Nn2 must satisfy
THREE TYPES
OF SAMPLE SIZE
97
For a fixed procedure, one has Nnl = nv and Nn2 = n(1 - v) (predetermined). The sample size no can then be calculated as
This is the requisite sample size for fixed design; here we call it the type I sample size.
REMARK6.6. The sample size formula in (6.8) is used in literature for both fixed procedures and randomization procedures. Eisele and Woodroofe (1 995) used this formula for doubly-adaptive biased coin designs. Rosenberger and Lachin (2002) also used it for randomization procedures. In the literature, the randomness of Nnl (of a randomization procedure) is ignored. As pointed out in Rosenberger and Lachin (2002, page 26), the randomness should not be ignored under response-adaptive randomization. But they did not study this problem further. For randomization procedures, both Nnl and Nn2 are random variables for a fixed n. Based on Assumption 6.1, we have Nnl + 00 and Nn2 00 almost surely as n -+ 00. Now from Assumption 6.2, we have -+
in distribution as both Nnl -+ 00 and Nnz under H1 can be approximated by
--f
00.
Therefore the power of the test
where 21 is a standard normal random variable and is the cumulative function of the standard normal distribution. From Assumption 6.1, we can replace Nnl and Nn2 by vn ~ f i Zand ( 1 v)n - r f i Z , respectively, where 2 is a standard normal random variable. The approximate power is then
+
in distribution. Thus, the mean power @&I, fixed n is approximately
p 2 , n) = E(P(p1,p2, Nn1 ,N n 2 ) ) for
POO(P~ ,P Z ,n) = ~ 8 ( ~P L1ZNn1, ,> ~ n 2 )
98
SAMPLE SIZE CALCULATION
To achieve a fixed power P on average, we just find the smallest n such that n ) 2 P. We refer to this as the fype 11sample size. From the above derivation, we have the following theorem (Hu, 2006).
&(pl,p2,
THEOREM 6.3. Under Assumptions 6.1 and 6.2, the sample size n1 (type 11) to achieve a fixed power P on average is the smallest n such that & ( p l , p 2 , n ) _> P, where the power function D o ( p l , p 2 ,n) is defined in (6.10).
REMARK6.7. In Theorems 6.1 and 6.2, we approximate the power function P ( p 1 , p2, N,1, Nn2) and the average power P o ( p I , p 2 ,n ) by Taylor expansion. In Theorem 6.3, we use the approximate normal distribution to calculate the corresponding average power and then calculate the requisite sample size. These two approximations are equivalent for large n. However, the ,&(PI, p2, n ) in (6.10) is much easier to implement. Also, the condition E ( N , l / n ) - v = o(n-') is not required in the calculation of sample size. From (6.9), we found that power depends on the proportion v and the variability 7 of a randomization procedure. When v is fixed, the power is a decreasing function of 7 . This has been demonstrated by simulation studies in Melfi and Page (1998) and Rosenberger et al. (2001). This agrees with Theorems 6.1 and 6.2. In practice, the clinical trial is only done once. Therefore, to achieve a certain power P on average is not enough. It is desirable to find a sample size n such that Pr(P(pI,p2, & I ,
N,z) 2
0)2 1 - P.
We refer to this as the type 111sample size. To achieve this, we have the following theorem (Hu, 2006).
THEOREM 6.4. Under Assumptions 6.1 and 6.2, the sample size 71.2 (type 111) to achieve a fixed power P with at least (1 - p)lOO% confidence is obtained (approximately) to be the smallest integer n which satisfies
and
where 7 is given in Assumption 6.1.
PROOF. We wish to find a sample size 122 such that for all n 2 np. By Assumption 6.2, this is approximately equivalent to
EXAMPLES
99
Thus, Nnl and Nnz must satisfy
If (6.11) and (6.12) are satisfied, then
is true for all
n v - z ( ~ / ~ ) T &< Nn1 < nu
+ ~ ( ~ 1 2 ) rand f i Nn2 = n - N n l .
Based on Assumption 6.1, we have approximately P r (nu - z ( ~ / ~ ) T &< Nnl < nu
+ Z(,,/~)T&)
= 1- p.
To ensure (6.13) approximately, we just need to find a sample size n such that both (6.1 1) and (6.12) are satisfied. The proof is now complete. 0
REMARK6.8. In Theorem 6.1, we obtained the approximate power function of a randomization procedure. When we calculated the requisite sample size (type 111) in Theorem 6.4, we did not use the result of Theorem 6.1 directly. This is because the approximate power in Theorem 6.1 depends on several terms; this makes it difficult to define a formula for the sample size. In Theorem 6.4, we derived the sample size formula directly from the definition of power, making it easier to understand and easier to calculate by using numerical methods. The results ofTheorems 6.3 and 6.4 are based on asymptotic properties of randomization procedures. In the following two sections, we use these results to calculate the requisite sample sizes and study their finite sample properties. For simplicity of notation, we will use no to represent the type I sample size from (6.8), n1 to represent the type I1 sample size from Theorem 6.3, and 722 to represent the type I11 sample size from Theorem 6.4. 6.3
EXAMPLES
In this section we calculate the sample sizes for restricted randomization procedures as well as response-adaptive randomization procedures. We will use the notation from Section 6.2. Also, we will use 1 - p = 0.9 in this section. 6.3.1
Restricted randomization
EXAMPLE6.1. COMPLETE RANDOMIZATION. Assign each patient to each treatment group with 1/2 probability. It is easy to see that N n l / n + 0.5 almost surely
100
SAMPLE SIZE CALCULATION
and in distribution. The sample size no is estimated as the smallest integer, which is (6.14)
The sample sizes n1 and n2 are defined in Theorems 6.3 and 6.4, respectively, with v = 1/2 and r = 1/2.
EXAMPLE 6.2. WEI’SURNDESIGN. Wei (1978) shows that 4(3b - a) in distribution. When a = 0 and b = 1, u = 1/2 and r2 = 1/12. We can then calculate sample sizes n1 and 122.
EXAMPLE 6.3. GENERALIZED BIASEDCOINDESIGN.Smith (1984) described a generalized biased coin design, which include Wei’s urn design as a special case. To describe the design, we first let N j l be the number of patients in the experimental treatment of the first j patients and Nj2 be the number of patients in the control treatment of the first j patients. Therefore N j l Njz = j . Assign the ( j 1)-th patient to treatment 1 with probability
+
+
N;2 N,71+ Nj’2* From Hu and Zhang (2004)’ we can show that
in distribution. Smith recommended the design with y = 5. In this case, v = 1/2 and the asymptotic variance is 1/44. The properties of the above three designs have been extensively studied in Chapter 3 of Rosenberger and Lachin (2002). Here we calculate the requisite sample sizes and then compare the results. The results in Table 6.1 show that n1 is greater than no by only 1 for both complete randomization and Wei’s urn design. For the generalized biased coin design, no and n1 are the same. This indicates that all three randomization procedures do not lose too much power on average compared to a fixed design. This confirms our finding that the average power lost has order n-l for a randomization procedure. We find that n 2 could be much larger than the sample size no. For example, for complete randomization with “1 = 1 and u2 = 2, no = 62, but n2 = 72 is required to achieve the fixed power (0.80) with probability 0.9. For this case, if a sample of
EXAMPLES
101
Table 6.1 Sample sizesfor complete randomization(CR), Wei 's urn design (UD). and Smith 's generalized biased coin design (GBC) (a = 0.05, p = 0.8, and 1.11 - 1.12 = 1).
n0
(CR) n2 (CR) nl (UD) nz (UD) 121 (GBC) 122 (GBC)
711
25 26 28 26 26 25 25
62 63 72 63 68 62 65
211 212 234 212 223 211 217
804 805 853 805 831 804 818
62 is used for a complete randomization, with 20% chance, the power is less than 0.76 when the target power is 0.80. This is due to the variation (1/4) of complete randomization. This disadvantage was pointed out by Efron (1971). For Wei's urn design and the generalized biased coin design (y = 5), the sample sizes 122 are 68 and 65, respectively. This is because both designs have much smaller variability. We did not calculate the sample sizes for Efron's biased coin design, because Assumption 6.1 is not satisfied. However, based on the simulation results on page 49 of Rosenberger and Lachin (2002), its sample sizes should be similar to that of the generalized biased coin design with y = 5. For the case that 6 1 = 02 = 1, fA(0) = 0 for Y = 1/2. In this case, the main random term in the power function is of order n-l. This explains the sample sizes n2 for both Wei's urn design and the generalized biased coin design.
6.3.2
Response-adaptive randomization
Following the notation in Section 6.2, when the variances 6: and 02" are known, then the optimal allocation for minimizing the total sample size and retaining preassigned power (Jennison and Turnbull, 2000) is Neyman allocation. In this case, u = al/(al 0 2 ) and fA(0) = 0. Therefore, the main random term of the power function has order n-l. However, 0: and 02" are usually unknown in practice. In these cases we can target the allocation using the methods in Chapter 5 .
+
EXAMPLE 6.4. DOUBLYADAPTIVE BIASED COIN DESIGN. HU and Zhang's (2004) randomization procedure is introduced in Example 5.7. Based on the asymptotic results of Theorem 5.3, we have
102
SAMPLE SIZE CALCULATION
Table 6.2 Samplesizesfor complete randomization(CR). doubly-adaptivebiasedcoin design (DBCD), and the sequential maximum likelihoodprocedure (SMLE) (a = 0.05, = 0.8, and pi - pa = 1).
no (CR) nl (CR) n2 (CR) no (DBCD) ~ . (SMLE) 1 n2 (SMLE) ~1 (DBCD,y = 1) 7 ~ (DBCD,y 2 = 1) 721 ( D B C D , y = 4 ) 122 (DBCD,y = 4)
25 26 28 25 28 31 26 28 26 27
62 63 72 56 58 63 57 59 57 58
211 212 234
155
157 163 156 159
156 158
804 805 853 501 504 510 502 506 502 505
almost surely and
in distribution. For fixed a and power p, no is estimated as the smallest integer larger than (01
+ O2Y(+) + q l - p ) ) 2 (111
- 112p
(6.15)
Then n1 and n2 are defined in Theorems 6.3 and 6.4, respectively, with
Note that when y = 0, we have the sequential maximum likelihood procedure of Melfi and Page (1998). For a given a = 0.05 and p = 0.8, Table 6.2 reports the required sample sizes (no,n1, and nz)for 11.1 - p2 = 1 and some different 0 1 and 0 2 values. Three designs (y = 0,1,4) are reported. From the results in Table 6.2, the sample size n1 is greater than no by 2 or 3 for each randomization procedure. This agrees with the theoretical results in Section 6.2. When 0 1 = 0 2 , the sample sizes no of the three designs are the same. The doublyadaptive biased coin design has substantial advantages over complete randomization when 01 is different from 6 2 . For example, when 01 = 1 and 0 2 = 8, the sample size
REFERENCES
103
no of the doubly-adaptive biased coin design is 501, which is significantly smaller than 804. So a well-chosen response-adaptive randomization procedure can reduce sample size significantly in clinical trials. From Table 6.2, we can find that 712 is slightly larger than no for the doublyadaptive biased coin design with y = 4. Therefore, without substantial change in sample size, a well-planned randomization procedure can still achieve a fixed power with high probability. Sample sizes in Table 6.2 are based on the power function (6.10), which depends on the asymptotic distribution of nl and 122. It is important to know their finite properties. Hu (2006) has compared the approximate average power function (6.10) with its simulated average power function. We consider the case with p1 = 1, p2 = 0, (TI = 1, I Y ~= 2, a = 0.95, and p = 0.80. The simulation shows that the average power function (6.10) provides a good approximation of the simulated average power. For randomization procedures, we can simulate the average power for each fixed sample size. Usually we can use Monte Carlo simulation to find the sample size nl to achieve a target power on average (as in Rosenberger and Hu, 2004). However, we cannot simulate the random power function for a fixedsample size. This is because we observe 1 (rejection) or 0 (acceptancc) from each simulation. This does not give us the distribution of the random power. Therefore, it is difficult to use Monte Carlo simulation to find the sample size 122.
6.4
REFERENCES
EFRON,B. (1971). Forcing a sequential experiment to be balanced. Biometrika 62 347-3 52. EISELE,J. R. AND WOODROOFE, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Annals of Statistics 23 234-254. FRIEDMAN, L. M., FURBERG, C . D., AND DEMETS, D. L. (1998). Fundamentals of Clinical Trials. Springer, New York. Hu, F. (2006). Sample size and power of randomized designs. Unpublished manuscript. Hu, F. AND ZHANG, L.-X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-301. JENNISON,c. AND TURNBULL, B. w. (2000), Group Sequential Methods with Application to Clinical Trials. Chapman and HalVCRC, Boca Raton. MELFI, v. AND PAGE, c. (1998). Variability in adaptive designs for estimation of success probabilities. In New Developments and Applications in Experimental Design (Floumoy, N., Rosenberger, W. F., and Wong, W. K., eds.). Institute of Mathematical Statistics, Hayward, 106-1 14. ROSENBERGER, W. F. AND Hu, F. (2004). Maximizing power and minimizing treatment failures in clinical trials. Clinical Trials 1 141-147. ROSENBERGER, w. F. AND LACHIN,J. M. (2002). Randomization in Clinical
104
SAMPLE SIZE CALCULATION Trials: Theory and Practice. Wiley, New York.
ROSENBERGER, W. F., STALLARD, N.,IVANOVA, A., HARPER,C. N.,AND RICKS,M. L. (2001). Optimal adaptivedesigns for binary response trials. Biometrics 57 909-9 13.
SMITH,R. L. (1984). Properties of biased coin designs in sequential clinical trials. Annals ofStatistics 12 1018-1034.
WEI, L. J. (1978). The adaptive biased coin design for sequential experiments. Annals of Statistics 6 92-100.
7 Additional Considerations
7.1
T H E EFFECTS OF DELAYED RESPONSE
From a practical perspective, there is no logistical difficulty in incorporating delayed responses into the response-adaptive randomization procedure, provided some responses become available during the recruitment and randomization period. For urn models, the urn is simply updated when responses become available (Wei, 1988). For procedures based on sequential estimation, estimates can be updated when data become available. Obviously, updates can be incorporated when groups of patients respond also, not just individuals. But how does a delay in response affect the procedures? Early papers evaluate the effects of delayed response by simulation using a priority queue data structure (e.g., Rosenberger and Seshaiyer, 1997). Bai, Hu, and Rosenberger (2002) were the first to explore the effects theoretically. In this section we present the condition required for our asymptotic results to be unaffected by staggered entry and delayed response. We find this condition to be satisfied for reasonable probability models. Bai, Hu, and Rosenberger (2002) assume a very general framework for delayed response under the generalized Friedman's urn with multinomial outcomes, and the delay mechanism can depend on the patient's entry time, treatment assignment, and response. We may not need that full generality in practice, but we state the general model for completeness. Asume a multinomial response model with responses cn)
-+
0,
C:=l E ( X i i l F + l )
(iii) c i 2 ,&[E(X;i)2
-+
0 in probability, and
- , ~ ( ( X i ~ ) ~ l F i - l0,) ] -+
where X i i = XiI(lXi( _< c,) and I ( . ) is an indicator function.
THEOREM A.5. (Strong Law of Large Numbers) Let { S i , F i } ,i = O , l , 2, ... be a zero-mean, square-integrable martingale. Then S, converges almost surely if
C& E(X;lFi-l) < 00.
A.4.2
The martingale central limit theorem
Let {S,j, F,j, 1 5 j 5 kn} be a zero-mean, square-integrable martingale for each n 2 1, and let X,j = Sj, - Sn,j-l, 1 5 j 5 k, (S,O = 0) be the martingale differences. Here k, -+ 00 as n -+ 00. The double sequence {S,j, Fnj, 1 5 j 5 kn} is called a martingale array. Define V2j = Ci=, E(X;ilFn,i-l), the conditional variance of Snj and U2j = c { = , X i i , the squared variation.
THEOREM A.6. (Martingale Central Limit Theorem)Suppose that max IXnj I -+ 0 in probability, 3
(A. 1)
166
SUPPORTING TECHNICAL MATERIAL
X:j j=1
-
u2 in probability,
where u2 is a positive constant,
and Fnj
E Fn+l,j for 1 5 j 5 k n , n 2 1.
Then
kn
=
&kn
1
Xnj
-+
N ( 0 ,02)in distribution.
j=1
Based on this theorem, we have the following corollary.
A . l . If COROLLARY kn
E[X:jT(IXnjI > €)[Fn,j-1]-+ 0 in probability for any c
> 0,
(A.5)
j=1
j=l
and condition (A.4) is true, then kn
snk,
=
j=1
X,j
-+
N ( 0 ,u 2 )in distribution.
This corollary is very useful in applications. In Theorem A.5 and Corollary A. 1, we assume that U i k nand V2kn converge to a constant u2. But we can generalize the result as follows.
THEOREM A.7. Suppose conditions (A.l), (A.3), and (A.4) are satisfied. Then U,-,',Snk,
-t
N(O,1) in distribution
if unknconverges to a positive random variable almost surely. A.2. Suppose conditions (A.4) and (AS) are satisfied. Then COROLLARY v;',snk,
+
N(O,1) in distribution
if Vnk, converges to a positive random variable almost surely. In most applications, one deals with a single martingale S, instead of martingale arrays S,j. However, one can apply the above results to martingale Sn in the following way: define k , = n, Fnj = Fj,and Snj = sK'Sj, 1 5 j 5 n, where sn is the standard deviation of S,. Based on Theorems A.5 and A.6, we can obtain the asymptotic normality of S,.
MARTINGALES
A.4.3
167
Gaussian approximations and the law of the iterated logarithm
Asymptotic proofs in Chapter 5 require the use of Gaussianapproximations (cJ Hall and Heyde, 1980, Appendix I). We first introduce the Skorohod representation of a martingale.
THEOREM A . 8 . (Skorohod Representation) Let {Sn,Fn},n = 0,1,2, ... be a
zero-mean, square-integrable martingale with martingale differences Xi.Then there exists a probability space supporting a standard Brownian motion W and a sequence of nonnegative variables ~ 1 , 7 2 , with the following properties. if Tn = Ti, S; = W(T,), X; = S;, Xy = S; - Sy-, for i 2 2, and 3; is the u-algebra generated by S;, ...,S; and by W ( t )for 0 5 t 5 Tn,then
xy=,
...
(i) { S n , n2 1) = {S,t,n 2 1) indistribution, (ii) Tn is 3;-measurable. We now state the main theorem that allows the embedding of a martingale process into a Brownian motion.
xy='=,
THEOREM 12.9. Let {Sn,Fn}, n = 0,1,2, ... be a zero-mean, square-integrable martingale with martingale differences Xi. Let U: = X,?and sf = E ( S z )= E(U;). Define to be a random element of C(0,I] obtained by interpolating between the points (O,O), (U;2Uf, U;lSl),...,( 1,U;lSn),namely,
en
m ( t )= u;'[s~+ x, 0, and (A. 12)
in probability, and condition (A. 10) holds, then kn
Snk, =
x,j
--$
N ( 0 ,C) in distribution.
j= 1
We will use Corollary A.4 to prove central limit results for both the generalized Friedman’s urn and the doubly-adaptive biased coin design in Appendix B. In Theorem A. 14 and Corollary A.4, we assume that U n k , and V n k , converge to a constant c. Now we state some more general results.
THEOREM A. 15. Let { S,j
,3,,j, 1 5 j 5 k n } be a zero-mean, square-integrable martingale array, Suppose that conditions (A.7), (A.9), and (A. 10) are satisfied. Then
Ui,!/S,kn
+ N ( O I,
) in distribution
172
SUPPORTING TECHNICAL MATERIAL
if U n k n converges to a positive definite random matrix almost surely.
COROLLARY A.5. Suppose that conditions (A. 10) and (A.ll) are satisfied. Then V ~ ~ ~ 2 S+ , kNn( 0 ,I ) in distribution if Vnk, converges to a positive definite random matrix almost surely.
A.7
MULTIVARIATE TAYLOR’S EXPANSION
We now state the multivariate extension of the well-known Taylor’s expansion. Given a function f (x),where x is a K-vector, the expansion of f ( x )about 0 is given by
+ af
af2
f(z)= f (0) -x’ ’ ax ax f x-xax
+ o(llxll;).
(A. 13)
When f(x) = (f l ( x ) ,...,fs(z)), each f i ( x ) is expanded separately according to (A.13), i = 1, ..., S. A.8
REFERENCES
ATHREYA, K. B. AND NEY, P. E. (1972). BranchingPuocesses.Physica-Verlag, Heidelberg. BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New York. HALL,P. AND HEYDE,C. C. (1980). Martingale Limit TheoryandItsApplication. Academic Press, London.
Appendix B Proofs
B.l
PROOFS
OF THEOREMS IN CHAPTER 4
B.l.1 Proof of Theorems 4.1-4.3 We now prove the main asymptotic theorems for the generalized Freidman’s urn model. These proofs are adaptive from Bai and Hu (2005), and refer extensively to conditions, assumptions, and notation introduced in Chapter 4 and to technical theorems in Appendix A. First, we prove Lemmas 4.1 and 4.2 stated in Section 4.1.5. We then prove the three main theorems stated in Section 4.1.3.
PROOF OF LEMMA 4.1. Let e, = a, - ai-1 for i 2 1. By definition, we have e, = TiDil‘,where Ti is the result of the i-th draw, multinomially distributed according to the urn composition at the previous stages, i.e., the conditional probability that the i-th draw is a ball of type k (the k-th component of Ti is 1and other components are 0) given previous status is K - l , k / a , - l . From Assumptions 4.1 and 4.2, we have
and
173
174
PROOFS
Therefore n
n
i= 1
i= 1
forms a martingale sequence. From Assumption 4.2 and K
By Theorem A S , the series
> 1/2, we have
cq 00
i=l
converges almost surely. Then, by Kronecker's lemma (Hall and Heyde, 1980, p. 31),
almost surely. This completes the proof for conclusion (b) of the lemma. Conclusion (a) is a consequence of conclusion (b). 0
PROOFOF LEMMA4.2. Without loss of generality, we assume a0 = 1 in the following proof. For any random vector, write llYll := Define yn = (ynl, . . . , y n ~=) Y n t.Then(4.12)reduces to
m.
l l ~ -n EynII I MVn.
(B.4)
In Lemma 4.1, we have proved that [lan - n1I2 IC K 2 n (see (B.3) and (B.2)). Noticing that Ea, = n 1from (B. 1), the proof of (4.12) further reduces to showing that
+
for j = 2, .. . ,K . We shall prove (B.5) by induction. Suppose no is an integer and M a constant such that
PROOFS OF THEOREMS IN CHAPTER 4
and
M=
3 75
c1 +cz + c3 + cs + cs + (C3+2C5)MO 1 - 3E
where t < 1/4 is a prechosen small positive number,
and the constants C are absolute constants specified later. Consider m > no and assume that llyn - Eyn(l 5 MV,, for all 5 n < m. By (4.10) and (4.11), we have
I
and B m j j is the j-th column of the matrix Em,*. In the remainder of the proof of the theorem, we shall frequently use the elementary fact that
where $(n,i, A) is uniformly bounded (say, 5 qb) and tends to 1 as i -+ co.In the sequel, we use +(n,i, A) as a generic symbol, that is, it may take different values at different appearances and is uniformly bounded (by $, say) and tends to 1 as
176
PROOFS
i + 00. Based on this, one can find that the (h,ti + e)-th element of the block matrix fly=j+,(I + i-' J t ) is asymptotically equivalent to 1
e!- ( j / W
l o g e ( n / j ) $ ( % ~A,t ) ,
(B. 10)
where At is the eigenvalue of J t . By (B.7) and the triangle inequality, we have
(B. 1 1)
Consider the case where 1 (B. 10) we have
+ v1 + ...+ vt-1
< j 5 1 + v1 f
-
I(YoBrn,o,jII5 CIImTLx~IloguL-l m 1, there is a constant C,
> 0 such that
This inequality is an easy consequence of Burkholder's inequality (Theorem A.l). By using
-1- _ -1 +-2-ai-1 ai-1
and the above inequality, we have
i
zai-1
178
PROOFS
Combining the above four inequalities, we have proved (B. 15)
By Assumption 4.3 and the fact that
IIVi-lll is bounded, we have
5 5
L
(B. 16)
Next, we show that
I
(B. 17)
PROOFS OF THEOREMS IN CHAPTER 4
179
By Assumption 4.3 and the induction assumption that (Iyi-l - E y i - l ( l 5 M A ,
By Jensen's inequality, we have
5 (C5Mo + 6M)Vm. The estimate of the third term is given by
cs 1llWi - Ewcll(m/i)Re(X')log--'(m/i) m
5
5
1/2. Based on (B.20), we just have to show the strong consistency of Cy=lY i - l l a i - 1 . Based on the strong consistency of Y , , we obtain the strong consistency of CyZlY + l / i by using Kronecker's lemma. Now we consider the difference between Cy=lY i - l / u i - , and CyZlY i - l / i . Based on the fact that
almost surely, we establish the strong consistency of N , . 0
PROOFOF COROLLARY 4.3. From the proof of Theorem 4.1, one finds that the term estimated in (B. 12) is not necessary on the right-hand side of (B.11). Thus, to prove (4.13), it suffices to improve the right-hand sides of (B. 15HB.17) regarding EV,. The modification for (B.15) and (B.16) can be done without any further conditions, provided that the vector Yi-1 in these inequalities can be replaced by (0, Y i - l , - )The . details are omitted. To modify (B.17), we first note that (B. 18) can be trivially modified to cVmif condition (4.7) is strengthened to (4.8). The other two estimates for proving (B. 17) can be modified easily without any further assumptions. 0
PROOFS OF THEOREMS IN CHAPTER 4
181
PROOF OF LEMMA 4.3. From Theorems 4.1, we have Y n / a n --+ v a.s. Similar to (B.2), we have K
n
i=l
K
q=l k = l
K (=I
almost surely. Assumption 2.2 implies that { e , - E(eilFti-1))satisfies the Lyapunov condition. From the martingale central limit theorem (Theorem A.6), Assumptions 4.14.3 and the fact that n
a,
- n = 1 + C(e, - E(eilFi-l)), i= 1
the theorem follows. 0 PROOF OF THEOREM 4.2. To show the asymptotic normality of Y , - E Y , , we only need to show the asymptotic normality of y n - Eyn = ( Y , - E Y , ) t . From the proof of Lemma 4.1, we have n ynl
- Ey,,l = a , - IE - 1 = x ( e , - E(e,lF,-l)). i=l
From Corollary 4.1, we have
Combining the above results, we get
Again, Assumption 4.2 implies the Lyapunov condition. Using the martingale multivariate central limit theorem (Theorem A.6), as in the proof of Theorem 2.3 of Bai and Hu (1999), from (B.21), one can easily show that V;’(Y, - EY,) tends to a K-variate normal distribution with mean zero and variance-covariance matrix
By Theorem 4.1, for the case r = 112, V, = f i l ~ g ” - ” ~ n 6, 1 1 = 0 and El2 = 0. When T < 1/2, V, = &, 6 1 1 = K V q d q k i . Now let us find El*. Write t = ( l ‘ , t i , . . ‘, t : ) = (l’,L),ti = ( t ; , , . . ., t i , ) and
zqZ1 c,“==, z,“=,
182
PROOFS
= (Bn,i,a,**. , B n , i , K ) . (Also, the matrices with a minus sign in the subscript denotc the submatrices of the last K - 1 columns of their corresponding matrices.) Then the vector E l 2 is the limit of I
I
I
Bn,i,- = t - ' B n , i t -
n
i= 1 n
i= 1
K
=
n - 1 2 1
( C v q d q + H * ( d i a g ( w-) w*w)H
i=l
= 1
q=l
(&$, q= 1
+H*(diag(v)- v*w)H tn-'-&,-
)
) c n
=
1 C V q d q tn-' (qI1
&,z,-
+op(l)
i= 1
+ op(l),
(B.22)
i=l
where the matrices dq are defined in (4.6). Here we have used the fact that lH'(diag(v) - w*w) = l(diag(w) - w * w ) = 0. By elementary calculation and the definition of B n , i , - , we get
n
n
...
0
0
.(B.23) ..
0
n
n
In the h-th block of the quasi-diagonal matrix n
n
i=l j=i+l
the (9, g
+ 1)-th element (0 5 l 5 vh - 1) has the approximation (B.24) i=l
Combining (B.22), (B.23), and (B.24), we get an expression for X 1 2 .
PROOFS O f THEOREMS IN CHAPTER 4
183
The variance-covariance matrix EZZof the second to the K-th elements of V;'(yn - Ey,) was first calculated in (2.17) of Bai and Hu (1999). For the case, r < 1/2, Vn = fi.Let /n-1
\
n-1
n
r
0
n
0
...
0
.*.
0
n
J-J
t'Rt
(I+j--'JZ)
j=i+l
n j=i+l
(B.25)
+o( 1).
To obtain the limit of (BZ),we need to consider each block (9,h) of the matrix according to the Jordan decomposition for 9,h = 2, ...,s. For given g and h, the block matrix is
c J-J n
n-'
n
i=l j=i+l
n
( I + j - ' J j ) ( t ; ) * R t ; , J-J ( I + j - ' J h ) , j=i+l
(B.26)
184
PROOFS
To calculate this, we need to use the following two results:
1’
za 10gy1/xpx =
Irn
ybe-(”+’)%y =
+
r(b 1) ( a + l)(b+’)
’
for Re(a) > -1, b > -1 and the limit n
as n --t
00.
The (w, t)-th element of (€3.26) can be approximated by
is the (w,t)-th element of the matrix [(tk)*Rti]. Further, the where [(tL)*Rt;E](W,t) (w,t)-th element of (B.26) converges to
When T = 1/2, we shall use the fact that n
z-l logb(l/z)dz
5 n-‘ X ( i / n ) - ’Iogb(i/n) i= 1
5 logb(n)+
l/n 1
2-1
logb(l/z)dz
to obtain (B.28)
If A, = Ah, vg = Vh = v and Re(&) = 1/2, then the corresponding block (9,h ) is
c l-J n
V,-2
n n
n
(I+j-’Jj)(tj)*Rt’h
i=l j=i+l
(I+PJh),
(B.29)
j=i+l
where V,” = n log2”-1n. The (w,t)-th element of (B.29) can be approximated by
By using (B.28), (B.3 1) converges to
((v - 1)!)-2(2v - l)-l[(t;)*Rt‘h](l,J)
(B.31)
PROOFS OF THEOREMS IN CHAPTER 4
185
if w = t = v; otherwise, (B.3 1) converges to 0. Combining 011, C12, and X:22together and by using the martingale multivariate central limit theorem (Theorem A.14), n-’I2(Y, - E Y , ) has an asymptotically joint normal distribution with mean 0 and variance-covariance matrix C. Thus, we have shown that n-”2(Y, - E Y , )
+ N ( 0 ,( t - y C t - 1 )
in distribution. When (4.8) holds, un - nel has the same approximation as the right-hand side of (B.2 1). Therefore, in the martingale multivariate central limit theorem (Theorem A. 14), E Y , can be replaced by nw. This completes the proof of the theorem. 0
PROOF THEOREM 4.3. We have
We shall consider the limiting properties of N n . First, we have n
n
n
n-1
t=l
i=O
(B.32)
For simplicity, we consider the asymptotic distribution of N,t. Since the first component of N,t is a nonrandom constant n, we only need to consider the other K - 1 components. From (4.13) and (B.32), we obtain
i=l n
i=O n-1
i=l
i=O
i=O
186
PROOFS
n-1
n
(B.33)
-
+
n-1 where B,,J = t-'G,+1 . - G , t ,Bn,j = Cz=J B z , J / ( i 1). Here, in the fourth equality, we have used the fact that Y*,-(i 1 - uz)/[uz(i l)] = o p ( f i ) , which can be proven by the same approach used to show (B. 15) and (B. 19). In (B.33), we only have to consider the asymptotic distribution of the martingale A
3
n
xFz;
+
+
n-1
We now estimate the asymptotic variance-covariance matrix of V;'Un. To this end, we need only consider the limit of
n-1
(B.34)
PROOFS OF THEOREMS IN CHAPTER 4
187
since vT- = 0. This estimate implies that n
v12CE(Q;Q~IJE,-~) -+
{
j=l
-
c1=
;diug(w)t-,
if7 < 1/2, (B.35) if7 = 1/2,
as j
-+ 00.
Because Qj = [TjDj - ( Y j - l / ~ j - l ) H j ] t ,
( c )+
= ttdiag(v)Ht V,-2
n-1
gn,j,-
o(1).
(B.36)
j=1
From (B.8), we have
Based on (B.8), (B.9), and (B.101, the (h,h
+ l)-th element of the block matrix
has a limit obtained as
=
(&)e+l.
Substitutingthis into (B.37) and then (B.36), when V , = n, we obtain
j= 1
(B.38)
188
PROOFS
where is a K x ( K - 1) matrix whose first row is 0 and the rest is a block diagonal matrix, the t-block is vt x vt and its ( h ,h e)-th element is given by the right-hand side of (B.38). The matrix 52 is obviously 0 when V’, = n log2”-1n. Note that the third term in (B.34) is the complex conjugate transpose of the second term; thus, we also have the limit of the third term 5;. Now, we compute the limit 5, of the fourth term in (B.34). By Assumption 4.2, the matrices Rj in (B.34) converge to R. Then the fourth term in (B.34) can be approximated by
+
cin
n-1 X
l
i=j+l
n
(I-kT-lJh)
r=j+ 1
I’
(B.39)
g,h=l
Similar to (B.38), we can show that the (20, t)-th element of the (9,h)-th block of the matrix in (B.39) is approximately
(B.40) Here, strictly speaking, in the numerator of (B.40), there should be factors $ ( i , j , 20’) and +(m,j , t’). Since for any j o the total contributions of terms with j 5 j o is o(1) and the $ 3 tend to 1 as j 00, we may replace the $J’S by 1. For fixed w, w‘, t and t’, if A, # A h or Re(A,) < 1/2, we have
-
ccc
1 n-ln-l
- j=l
n-l
i = j m=j
(Z/j)Jg
(m/j)Xhl0gW’(i/j)logt’(m/j) (i l)(m l ) ( d ) ! ( t / ) !
+
+
PROOFS OF THEOREMS IN CHAPTER 4
189
Thus, when T < 1/2, if we split %3 into blocks, then the (w, t)-th element of the ( 9 ,h) block Eg,h (v, x vh) Of 5 3 is given by
(B.42)
I(t;)*Rt~l(u-w’,t-t’).
When T = 1/2, Z9,h = 0 if A, # Ah or if Re(A,) < 1/2. Now we consider Z,,h with A, = Ah and Re(&) = 1/2. If w’ t’ < 2v - 2, then
+
92-1 n-1 n-1
j=1
i=j
(i/j)Xg(l/j)Ag
logW$/j) 1ogtyq.j)
e=j
When w’= t‘ = v - 1which implies w = t = v = vg = vh, by Abelian summation we have n-ln-ln-1
-+
(XgI-2[(v - 1)!]-2(2v
- 1)-1.
(B.43)
Hence, for this case, Eg,h has only one nonzero element, which is the one on the right-lower corner of E9,hand is given by JA,
1-21
(v - 1)!]-2(2v - 1)-1 [(t;)*Rt)h](l,J).
(B.44)
Combining (B.34), (B.39, (B.38), (B.42), and (B.43), we obtain an expression of 5.By using the martingale multivariate central limit theorem (Theorem A.14), n - ’ / * ( N , - nv)t has an asymptotically joint normal distribution with mean 0 and variance-covariance matrix E.Thus, we have shown that n-”2(Nn - nu) -+ N ( 0 ,( t - 1 ) * 3 t - l )
in distribution. 0
8.1.2
Proof of Theorem 4.6
We now prove the main asymptotic result for the generalized drop-the-loser rule, as proved in Zhang et a/. (2006). Before we prove the main theorem, we prove the following two lemmas.
190
PROOFS
LEMMA4.8. Denote by Fn = a(T1,.. . , T n , Y i ,. . . ,Y n ) .Let
Vn,o =
n
C (Tm,om=l
~[~m,oI~m-l]),
n
Vn,k =
x { T m , k ( D m , k- 1) - E[Tm,k(Dm,k- l ) I r n - l ] } ,k = 1,2* m= 1
Assume EIIDm,klP]< 00 for p 2: 2. Then there exists a constant C, > 0 such that the martingales { vn,k,Fn; n 2: l } ,k = 0 , 1 , 2 , satisfy Ivm+i,k
- vm,ilp] 5 CPnpI2for all m and n, k = 0,1,2.
(B.45)
PROOF. Notice that IAVn,o(5 1 and
E[lAvn,kIPIFn-l]5 2’-’(1
+ E[lDn,klp])5 cp, k = 1 , 2 .
By Theorem A.2, we obtain (B.45) immediately. 0 Let un,k = akV,,o 4- vn,k, k = 1 , 2 . Then Un,k is the sum o f conditionally centered changes in the number o f type k balls in the first n draws, k = 1 , 2 . It can be shown that { u n , k ,Fn; n 2 1 ) is a martingale satisfying an inequality similar to that o f (B.45), k = 1,2. The next lemma gives the convergence rate ,of the urn proportions Y n . LEMMA4.9. Under Assumption 4.6, for each k = 1,2 and any 6 > 0,
(B.46) (B.47) (B.48) (B.49)
PROOF.According to (4.24), it is obvious that
+
Yn,k
=
yn-l,k
+ akYn-l,o
4-
- qkYn-l,k
lY;t-ll
(B.50) Then
PROOFS OF THEOREMS IN CHAPTER 4
Let Sn = max{l 5 j 5 n : according to (B.5 l), Yn,k
5 5 5 5
q,k
yn-l,k YS,,k y0,k &,k
0 is a constant and does not depend on t and n. Notice that Y m , k 2 - 1 . Letting t = n-1/4, we have
Choose p such that (4p)-' 5 S yields (B.48) immediately. Equation (B.49) can be derived easily from (B.48) and the Borel-Cantelli lemma. The proof of the lemma is now complete. Equation (B.49) indicates that the terms Y n , k in (4.25) can be neglected.
192
PROOFS
m= 1
m=l
E[AMn,k . AMn,jIAn-l] = 0,
j
#k
and
E [IAMn,kIPIA,-1] 5 2’E [IDl,kI”]. According to the law of the iterated logarithm for martingales (Theorem A. lo), we have Mn,k = a.s. (B.55)
o(d s )
Combining (B.49) and (B.55) shows that, for any 6 > 0,
akN;,o - qkN;,k
= -Mn,k
+ o(n3/8+6/2)
= -Mn,k -k = O(d-)
which, together with the fact that N;,o N;,k
= n
-
(B.56)
O(n3/8+6)
as.,
k = 1,2,
+ N;,l + N;,2 = n, yields
aklqk adq1-t a d q z
nSVk
s+l
+ 1 + O(&z&$ + o(d-) a s . , k = 1,2
(B.57)
and
alqZN;,2 - azq1N;J
= a2(alN;,o - QIN;,l) - .l(azN;JJ - 4zN;,2) = alMn,z - azMn,l+ o(n3/8+6) a s . (B.58)
We consider the martingale { M n =: alMn,2 - azMn,l}. From (B.54) and (B.57), it follows that n m=l n
n
m= 1
m=l
- n ~ ( u : v 2 0 +; a;v,a:)
+
O( J-) as. s+l By the Skorohod representation (Theorem A.8), there exists an A,-adapted nondecreasing sequence of random variables { rn}and a standard Brownian motion B such that
EIArnlAn-l]= E[(AMn)’(An-1], ElArnlp/2ICpElAMnI” 5 c p l b > 2
PROOFS OF THEOREMS IN CHAPTER 4
M , = qqI n = 1 , 2 , . . . .
193
(B.59)
Note that {C",l(Arm- EIAr,Idm-l])} is also a martingale. According to the law of the iterated logarithm, we have
On the other hand, by (B.57), we have
+ O(d z )
+ N;,2 = &n
U.S .
It follows that
Substituting (B.59) and (B.60) into (B.58) and applying the properties of a Brownian motion (cJTheorem 1.2.1 of Csorgo and RBvisz, 1981), we have aiq2Nn,2
-~
q l N n= , ~ aiqzN:,,,2
+ +
- azqiN.l,,,l = B(rUn)+ O ( U ; ' ~ + ~ )
+
+
= ~ ( n ( u ~ w 2 u , u2 ~ w l a ~ ) ~((n~oglogn)'/~(logn)'/~) ) 0(n3/8+6 1 = B ( ~ ( u : w ~ u&,a:)) ; ~ ( n ~ / u.s., ~ + ~ )
+
which, together with the fact that Nn,1
+ Nn,2 = n,yields
Notice that 2
u =
u;wlu: (a1q2
+
U?W,U,2
+a2d2
'
thus agreeing with (4.19). Let 1 B(t(u:Wlu;
W ( t )= -U
alq2
+ U;wlg:))
+ a2q1
Then { W ( t ) t; 1 0} is a standard Brownian motion. The proof is now complete. 0
194
8.2
PROOFS
PROOF OF THEOREMS IN CHAPTER 5
We now prove the main asymptotic theorems for the doubly-adaptive biased coin design, as originally proved in Hu and Zhang (2004a). We first prove the five lemmas stated in Section 5 . 5 . Then we prove the three main theorems stated in Section 5.4.
PROOFOF LEMMA5.1. First, we show the result (5.19) by induction. It is easy to see that (5.19) holds for n = 1. Now suppose it holds for n - 1, that is,
Then
-At Jt
=
1 At
0 1
...
0 0 0
... *. .. .. .. . . . . . . 0 0 0 ... At 0 0
0
-
PROOF OF THEOREMS IN CHAPTER 5
195
where llAnll -+ 0. Then
From (5.19), it follows that
Then by (5.20), there exists a constant GO2 1such that
Now we define a sequence of real numbers D , 2 1 such that
It is easy to define D, for m = 1,.. . ,9. Assume that n 2 10 and D, is defined form = 1,. . . , n - 1. Let n1 = and write
[a,
From (B.61) it follows that
196
PROOFS
where c k = k + 1 ( ' ) = 0.
NOW
define
Next, it suffices to show the boundedness of D,. Since llAnll + 0, there exists a n6 such that
Then
+
which, together with the induction, implies that D, 5 1 maxrn 0 can be chosen arbitrarily small. 0 PROOF OF THEOREM 5.3. Let Q, = C:=, AQ,, where AQ,=(AQ,,1
,...,A Q m , K ) = ( A Q m , k i ; i =...l ,, d , k = l , ...,K )
By definition,Q, is a sequence of martingale in R K x dand , Q, = O ( d w ) almost surely by (A2) and Theorem A. 10 (the martingale law of the iterated logarithm). By (B.66),we have
(B.67)
202
PROOFS
By (B.65)and condition (C2), we have
On the other hand, by (5.9) and (B.65)-(B.67), we have
Then, by Lemma 5.1 it follows that
PROOF OF THEOREMS IN CHAP JER 5
203
Note that U, is a sum of martingale differences. We now check the Lindeberg condition on U n .For some 0 < c < 1 / X - 2,
By the martingale multivariate central limit theorem (Theorem A. 14), we obtain the asymptotic normality of N , . Now the main task is to calculate its asymptotic variance-covariance matrix. First, we have
+ diug{g(v,v ) }
- {g(v,w)}'g(v,v) = diug(v) - v'v = c1
almost surely and
G O V ( A M ~ , A Q , ~= F 0~ - U.S. ~} Second
)
204
PROOFS
are martingales. We would like to embed these two martingales in a multivariate Brownian motion by using Theorem A.9 and Theorem A. 1 1 (Cram&-Wold device). Based on the above conditional variance-covariance limits, it follows that for any o < s < t < 1, (8.70)
+ n i a d z [ pY (:)Hdy]'.2 Y
[p
Y ( Yy d Y ] + o ( n ) (8.71)
PROOF OF THEOREMS IN CHAPTER 7
=
nsAl2
+ o(n),
205
(B.73)
and
where
This shows that the limiting variance-covariance function of
n-1/2 ( U [ n t ] r Q [ n t l v ( P ) I ~ ) agrees with the covariance function of (Gt, as defined in Theorem 5.3’. So, by the weak convergence of martingale (Theorem A.9),
n-lI2
(V,,t], q n t ] V ( P ) I , )
-+
(Gt,
my2)
in distribution. Thus, we have proved Theorem 5.3’. From (B.68) and (B.69), we can see that Theorem 5.3 is a special case of Theorem 5.3’ by taking s = t = 1 in (B.70), (B.72), (B.73), and (B.74). 0 B.3
PROOF OF THEOREMS IN CHAPTER 7
In this section, we prove results on delayed responses stated in Chapter 7. For the generalized Friedman’s urn, the proofs are adapted from Bai, Hu, and Rosenberger (2002) and Hu and Zhang (2004b). For the doubly-adaptive biased coin design, proofs are adapted from a currently submitted manuscript of Hu et al. (2006).
PROOF OF THEOREM 7.2. If we can express the effect of delayed response mathematically and show that the term arising from the delay mechanism is negligible, then by Theorems 4.1-4.3 we obtain Theorem 7.2. To do this, we first express the effect of delayed response. For simplicity of presentation, we use ξ_n to represent the response of the n-th patient. For patient n, after observing ξ_n = l and J_n = j (treatment j), we add d_{ji}(l) balls of type i to the urn, where the total number of balls added at each stage is constant; that is, Σ_i d_{ji}(l) = β with β > 0. Without loss of generality, we can assume β = 1. Let
So, for given n and m, if M_{jl}(n, m) = 1, then we add balls at the (n+m)-th stage (i.e., after the (n+m)-th patient is assigned and before the (n+m+1)-th patient is assigned) according to T_n D(l). Since M_{jl'}(n, m) = 0 for all l' ≠ l,
Consequently, the number of balls of each type added to the urn after the n-th patient is assigned and before the (n+1)-th patient is assigned is the corresponding sum over the delay index m and the response category l. We can therefore write the recursion (B.75).
By (B.75), expanding the recursion over patients, response categories, and delay indices yields the representation (B.76), in which the remainder R_n, defined in (B.77), collects the contributions from responses that have not yet been observed. If there is no delay, i.e., M_{J_m,l}(m, k) = 0 for all k ≥ 1 and all m and l, then R_n = 0 and (B.76) reduces to (B.78), which is the same as the basic recursive form of the urn model in Chapter 4. The main task now is to show that R_n can be neglected. Before we do that, it should be noted that the distance between Y_n in (B.76) and that in (B.78) (without delayed responses) is not just R_n. This is because the distributions of T_n will also change due to delayed responses. So the asymptotic properties of the model with delayed responses do not simply follow from those when delayed responses do not appear. For a vector x in R^m, we let ||x||_e be its Euclidean norm and define the norm of an m × m matrix M by ||M||_e = sup{||xM||_e / ||x||_e : x ≠ 0}. For any vector x and matrices M and M_1, we have
Now from (B.77), for some constant C, ||R_n||_e admits the stated bound, because ||T_m||_e = 1 for all m and the number of balls added at each stage is always bounded. It also follows that the relevant tail probabilities are summable, so by the Borel-Cantelli lemma we have
$$ R_n = o\left(n^{1-c'}\right) \quad \text{a.s.} $$
For the doubly-adaptive biased coin design, the first step is to show that the estimator computed from the observed (possibly delayed) responses is asymptotically equivalent to the estimator θ̂_n defined in Chapter 5 without delayed responses; their difference tends to zero under Assumption 7.2. To do this, we require three additional lemmas. From the time when the n-th patient is assigned to the time when the (n+1)-th patient is to be allocated, all of the responses of patients on treatment k that have been observed correspond to the nonzero terms T_{n-m,k} M_k(n-m, m), m = 0, ..., n-1, and the total number of such cases is
Let S_{n,k} be the total sum of the responses on treatment k that are observed up to the time when the (n+1)-th patient is ready for treatment allocation, and W_{n,k} the number of responses observed by that time. Then, for each k = 1, ..., K, we have
the representations (B.81) and (B.82), which express S_{n,k} and W_{n,k} as double sums over the assignment index m and the delay index j.
LEMMA 7.1. Let {ε_k, k = 1, 2, ...} be a sequence of i.i.d. random variables with zero means and {η_k, k = 1, 2, ...} be another sequence of random variables taking only the two values 0 or 1. Denote ζ_n = Σ_{j=1}^{n} η_j. Suppose that for each n, η_n depends only on {η_1, ..., η_{n-1}, ε_1, ..., ε_{n-1}}. Then, on the event {ζ_n → ∞},
$$ \frac{1}{\zeta_n} \sum_{j=1}^{n} \eta_j \varepsilon_j \longrightarrow 0 \quad \text{almost surely.} $$

PROOF. The proof is similar to the proof of Lemma 5.4 in Section B.2. □
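As an informal numerical check of Lemma 7.1 (a sketch, not from the text): generate i.i.d. mean-zero variables, select them through a 0/1 sequence that depends only on the past, and verify that the average over the selected indices is close to zero.

import numpy as np

rng = np.random.default_rng(3)
n = 200_000
eps = rng.standard_normal(n)             # i.i.d. mean-zero epsilon_k

# eta_k depends only on the past: select index k whenever eps_{k-1} was positive.
eta = np.zeros(n, dtype=bool)
eta[0] = True
eta[1:] = eps[:-1] > 0.0

zeta_n = eta.sum()                        # number of selected indices
selected_mean = eps[eta].sum() / zeta_n   # (1 / zeta_n) * sum of eta_j * eps_j
print(zeta_n, selected_mean)              # the selected mean should be near 0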
LEMMA 7.2. For the doubly-adaptive biased coin design with delayed responses, if conditions (A1), (B1)-(B3), and (C1) are satisfied, then N_{n,k} → ∞ almost surely, k = 1, ..., K, and θ̂_n → θ almost surely.

PROOF. Based on (5.6) and Lemma 7.1, it suffices to show that N_{n,k} → ∞ almost surely for each k = 1, ..., K, and that
$$ \{N_{n,k} \to \infty\} \ \text{implies} \ \hat{\theta}_{n,k} \to \theta_k \ \text{almost surely}, \quad k = 1, \ldots, K. \tag{B.83} $$
As the former is implied by the latter (cf. the proof of Lemma 5.5 in Section B.2), it remains to prove (B.83). It is obvious that
$$ \sum_{j=0}^{\infty} M_k(m, j) = 1 \quad \text{for } k = 1, \ldots, K. \tag{B.84} $$
Fix k ∈ {1, ..., K}. By (B.84) and (B.82),
For each j and k, {M_k(m, j), m = 1, 2, ...} is a sequence of i.i.d. random variables. From Lemma 7.1, it follows that N_{n,k} → ∞ almost surely implies
almost surely, j = 0, 1, .... Then, on the event {N_{n,k} → ∞}, the corresponding truncated averages converge almost surely as the truncation level T → ∞. We therefore conclude that, on the event {N_{n,k} → ∞},
(B.85) almost surely. Note that
the double sum can be split into a main part and a tail over j = n-m+1, ..., ∞. By Lemma 7.1 and (B.85), on the event {N_{n,k} → ∞}, the tail contribution is negligible almost surely.
Also, again by Lemma 7.1,
$$ \frac{\sum_{m=1}^{n} T_{m,k} X_{m,k}}{N_{n,k}} \longrightarrow \theta_k \quad \text{a.s. on the event } \{N_{n,k} \to \infty\}. $$
Thus, by noticing (B.81) and (B.84), we conclude that, on the event {N_{n,k} → ∞},
$$ \frac{S_{n,k}}{N_{n,k}} - \frac{\sum_{m=1}^{n} T_{m,k} X_{m,k}}{N_{n,k}} \longrightarrow 0 \tag{B.86} $$
almost surely. Combining (B.85) and (B.86), we obtain (B.83) from the definition of θ̂_{n,k}. □

LEMMA 7.3. For the doubly-adaptive biased coin design with delayed responses, suppose that N_n/n → v almost surely. If condition (A2) and Assumption 7.2 are satisfied, then for c' = (2 + ε₀)/(2 + ε₀ + (1 + ε₀)c),
we have that the estimator based on the observed responses differs from the estimator θ̂_n of Chapter 5 by
$$ o\left(n^{c'-1}(\log n)^2\right) + o\left(n^{-(1+\epsilon_0)/(2+\epsilon_0)}\right) \quad \text{a.s.} $$
In particular, if c ≥ 2, then the difference is o(n^{-1/2-δ}) a.s. for some δ > 0.
PROOF. Notice that (B.87) holds almost surely for k = 1, ..., K. By (B.81) and (B.84), for each k = 1, ..., K, the unobserved responses at time n correspond to the tail sums over j = n-m+1, ..., ∞.
Let 0 < c' < 1 be a number whose value will be specified later. Let l_n = [n^{c'}] and I_k(m, p) = Σ_{j=p+1}^{∞} M_k(m, j). Then the sum can be split at m = n - l_n into two parts.
For the second part, by Theorems 1.2.1 and 2.6.6 of Csörgő and Révész (1981), it is of order o(n^{c'}) + o(n^{1/(2+ε₀)}) almost surely.
By choosing
$$ c' = \frac{2 + \epsilon_0}{2 + \epsilon_0 + (1 + \epsilon_0)c}, $$
we have (B.90).
Combining (B.87)-(B.90), we have, for each k,
$$ \frac{S_{n,k}}{N_{n,k}} = \frac{\sum_{m=1}^{n} T_{m,k} X_{m,k}}{N_{n,k}} + o\left(n^{c'-1}(\log n)^2\right) + o\left(n^{-(1+\epsilon_0)/(2+\epsilon_0)}\right) \quad \text{a.s.}
$$
Similarly,
$$ \frac{W_{n,k}}{N_{n,k}} = 1 + o\left(n^{c'-1}(\log n)^2\right) + o\left(n^{-(1+\epsilon_0)/(2+\epsilon_0)}\right) \quad \text{a.s.}
$$
Consequently, for each k = 1, ..., K, the estimator based on the observed responses equals
$$ \hat{\theta}_{n,k} + o\left(n^{c'-1}(\log n)^2\right) + o\left(n^{-(1+\epsilon_0)/(2+\epsilon_0)}\right) \quad \text{a.s.}
$$
Now, because c ≥ 2 in Assumption 7.2, we obtain Lemma 7.3 with some 0 < δ < 1/2. □
B.4 REFERENCES
BAI, Z. D. AND HU, F. (1999). Asymptotic theorems for urn models with nonhomogeneous generating matrices. Stochastic Processes and Their Applications 80 87-101.
BAI, Z. D. AND HU, F. (2005). Asymptotics of randomized urn models. Annals of Applied Probability 15 914-940.
BAI, Z. D., HU, F., AND ROSENBERGER, W. F. (2002). Asymptotic properties of adaptive designs with delayed response. Annals of Statistics 30 122-139.
CSÖRGŐ, M. AND RÉVÉSZ, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
DOOB, J. L. (1936). Note on probability. Annals of Mathematics 37 363-367.
HALL, P. AND HEYDE, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, London.
HU, F. AND ZHANG, L.-X. (2004a). Asymptotic properties of doubly-adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-301.
HU, F. AND ZHANG, L.-X. (2004b). Asymptotic normality of adaptive designs with delayed response. Bernoulli 10 447-463.
HU, F., ZHANG, L.-X., CHEUNG, S. H., AND CHAN, W. S. (2006). Doubly adaptive biased coin designs with delayed responses. Submitted.
ZHANG, L.-X., CHAN, W. S., CHEUNG, S. H., AND HU, F. (2006). A generalized drop-the-loser urn for clinical trials with delayed responses. Statistica Sinica, in press.
ZHANG, L.-X., HU, F., AND CHEUNG, S. H. (2006). Asymptotic theorems of sequential estimation-adjusted urn models. Annals of Applied Probability 16 340-369.
Author Index
Agarwal, D. K., 6, 9, 138, 141, 149, 152-153, 155-156
Altman, D. G., 114, 117-118
Andersen, J. S., 52, 64, 121, 133
Anscombe, F. J., 7-8
Athreya, K. B., 4, 8, 31-33, 56-57, 64, 162, 172
Atkinson, A. C., 13, 21, 88, 109, 118, 136-137, 139, 154-156
Bai, Z. D., 34, 37, 46, 50, 52, 56, 64, 82, 88, 105-108, 115, 118, 173, 181, 183, 205, 214
Balakrishnan, N., 64
Baldi Antognini, A., 70, 84, 88, 146, 153, 156
Bandyopadhyay, U., 68, 88, 109, 111, 113, 118, 129, 132, 138, 141, 144, 149, 156
Berry, D. A., 7, 9
Billingsley, P., 168, 172
Biswas, A., 68, 88, 109-111, 113, 118, 129, 132, 138, 141, 144, 149, 156
Chan, W. S., 61, 65, 107-108, 118-119, 139, 141, 147-148, 156, 189, 205, 214
Charalambides, C., 64
Chernoff, H., 7, 9
Cheung, S. H., 37, 50, 56, 61, 65, 107-108, 118-119, 139, 141, 147-148, 156, 189, 205, 214
Coad, D. S., 114, 118, 152, 156, 158-159
Colton, T., 7, 9
Cornfield, J., 7, 9
Csörgő, M., 193, 212, 214
DeMets, D. L., 91, 103
Di Bucchianico, A., 65, 118, 132
Donev, A. N., 13, 21, 136, 156
Doob, J. L., 197, 214
Durham, S. D., 4, 7, 10, 25, 29, 32, 56-59, 64-65
Efron, B., 3, 9, 68, 88, 101, 103, 156
Eisele, J. R., 5, 9, 67-68, 72, 79, 88, 91, 97, 103, 108, 118, 157, 159
Faries, D. E., 52, 64, 121, 133
Flehinger, B. J., 7, 9
Flournoy, N., 25, 29, 56-59, 64, 88
Friedman, L. M., 91, 103
Fristedt, B., 7, 9
Furberg, C. D., 91, 103
Geraldes, M., 5, 9
Giovagnoli, A., 70, 84, 88, 146, 153, 156
Greenhouse, S. W., 7, 9
Gwise, T., 138, 156
Hall, P., 164, 167, 172, 174, 198, 214
Halperin, M., 7, 9
Harper, C. N., 13, 21, 69, 88, 98, 104, 110, 119, 157, 159
Hayre, L. S., 7, 9, 11, 21, 157, 159
Heiligenstein, J. H., 121, 133
Heyde, C. C., 164, 167, 172, 174, 198, 214
Hu, F., 5, 9, 13-14, 16, 20-21, 23, 28, 34, 37, 44, 46, 50, 52, 56, 60-61, 64-65, 67, 69-70, 76, 82, 85, 88, 93, 95-96, 98, 101, 103, 105-108, 111, 113, 115-119, 123, 127-128, 132-133, 139, 141, 147-148, 153-154, 156-157, 159, 173, 181, 183, 189, 194, 205, 214
Ivanova, A. V., 13, 21, 56, 58-62, 64, 69, 88, 98, 104, 110, 119, 152, 156-159
Jennison, C., 11, 21, 101, 103
Johnson, N. L., 31, 64
Karlin, S., 4, 8, 31-33, 56, 64
Kiefer, J., 137, 156
Kotz, S., 31, 64
Koutras, M. V., 64
Lachin, J. M., 2-4, 7, 9, 18, 21, 91, 97, 100-101, 103, 121, 133
Läuter, H., 65, 118, 132
Li, W., 56-57, 64
Lin, D. Y., 26, 29
Louis, T. A., 7, 9
Mandal, S., 109-111, 118, 129, 132
Matthews, P. C., 34, 42, 64
McCullagh, P., 144, 156
Melfi, V., 5, 9, 68, 88, 98, 102-103, 108, 111, 119
Nelder, J. A., 144, 156
Ney, P. E., 57, 64, 162, 172
Page, C., 5, 9, 68, 88, 98, 102-103, 108, 111, 119
Park, T. S., 26, 29
Pocock, S. J., 6, 9, 136, 156
Révész, P., 193, 212, 214
Ricks, M. L., 13, 21, 69, 88, 98, 104, 110, 119, 157, 159
Robbins, H., 7, 9
Rosenberger, W. F., 2, 4, 6-7, 9, 13-14, 16, 20-21, 23, 25, 28-29, 31, 33-34, 42, 44, 56, 58-60, 64-65, 69-70, 73, 88-89, 91, 97-98, 100-101, 103-111, 113, 115-119, 121, 123, 127-129, 131-133, 135, 138, 141, 149, 152-159, 205, 214
Roy, S. N., 7, 9
Royston, J. P., 114, 117-118
Seshaiyer, P., 105, 108, 119
Shao, J., 28-29
Shen, L., 37, 46, 50, 64
Siegmund, D. O., 7, 9
Silvey, S. D., 70, 88
Simon, R., 6, 9, 136, 156
Smith, R. L., 3, 10, 70, 77, 79, 82, 89, 100, 104
Smythe, R. T., 3, 10, 26, 29, 33-34, 65, 77, 79, 82, 89
Stallard, N., 13, 21, 69, 88, 98, 104, 110, 119, 157-159
Tamura, R. N., 52, 64, 121, 133
Thompson, W. R., 7, 9, 74, 89
Turnbull, B. W., 11, 21, 101, 103
Tymofyeyev, Y., 13-14, 21, 44, 65, 113, 119, 128, 133
Vidyashankar, A. N., 6, 9, 138, 141, 149, 152-153, 155-156
Wei, L. J., 3-5, 7, 9-10, 26, 29, 32, 42, 56, 65, 77, 79, 82, 84, 89, 100-101, 104-105, 119, 136, 156
Wolfowitz, J., 137, 156
Woodroofe, M., 5, 9, 68, 72, 79, 88, 91, 97, 103, 108, 118
Wynn, H. P., 65, 118, 132
Yang, Y., 7, 10
Zelen, M., 7, 10, 136, 156
Zhang, L.-X., 5, 9, 20-21, 23, 28, 37, 50, 56, 60-61, 64-65, 67, 69-70, 76, 85, 88, 101, 103, 107-108, 111, 118-119, 139, 141, 147-148, 156-157, 159, 189, 194, 205, 214
Zhang, L., 73, 89, 109, 111, 119, 129, 131, 133
Zhu, D., 7, 10
Zidek, J. V., 115-116, 118
Subject Index
Allocation proportions, 1
Asymptotically best procedure, 20-21, 27-28, 61, 74, 84-85, 157
Bandit problems, 7
  Multi-armed, 7
  Randomized, 7
Birth and death urn model, 34, 56, 58
Borel-Cantelli lemma, 179, 191, 198, 208
Branching process, 57, 59-60
Brownian motion, 62-63, 80, 167, 192-193, 204
Burkholder's inequality, 164, 177
CARA randomization, 2, 6, 8, 135-141, 144, 149, 152-155, 158
Complete randomization, 2, 24, 67, 70, 100-102, 122-123, 127-130, 136, 153
Conjugate transpose, 162
Continuous response, 84, 108-109, 129, 158
Convex optimization, 14
Covariate-adaptive randomization, 2, 6, 135, 154
  Pocock-Simon procedure, 6, 136
Cramér-Wold device, 168-169, 204
Delayed responses, 8, 82, 105-108, 122, 131-132, 205-207, 209-211
Design matrix, 5, 136-137
Doubly-adaptive biased coin design, 5-6, 24, 26, 67, 102-103, 108-109, 123, 158, 168, 171, 194, 205, 210-211
Drop-the-loser rule, 56, 59, 61, 75, 115, 123, 127-128
  Generalized, 61, 189
Eigenvector
  Generalized, 163
  Left, 32, 37, 43, 45, 47, 162
  Right, 32, 162-163
Expected treatment failures, 13-14, 21, 108, 110, 123, 127-128
Extended Pólya urn model, 33
Fisher's information, 25, 70, 144, 153
Gaussian approximations, 85, 167
Generalized Friedman's urn model, 4-6, 31-33, 35, 56, 105, 107, 114-116, 158, 162, 171, 173, 205
Generalized linear model, 139, 142, 144-145, 155
Generating matrix, 5, 35-37, 39, 41-43, 45-47, 49, 51-52, 56, 82, 114, 162
  Heterogeneous, 114
Immigration balls, 58-63
Jensen's inequality, 179
Jordan block, 163
Jordan canonical form, 54, 163
Jordan decomposition, 33-34, 43, 45, 47, 52, 79, 85, 162, 183, 194
Kronecker's lemma, 174, 180
Lindeberg condition, 203
Linear regression, 136, 141, 144-145, 148
Logistic regression, 6, 138, 141, 144-145, 147, 153
Lyapunov condition, 181
Markov chain, 57
Martingale
  array, 165
  Central limit theorem, 26-27, 52, 55-56, 85, 88, 165, 168, 181
  difference, 148, 164-165, 167, 169-170, 180, 203
  Law of the iterated logarithm, 56, 88, 167, 192, 200-201
  Multivariate, 169
  Skorohod representation, 167, 192
  Strong law of large numbers, 165
  Weak law of large numbers, 165
Matrix
  recursion, 52-53, 85-86, 163, 168
  Stochastic, 162
  Strict positivity, 162
  Variance-covariance, 19, 39-40, 56, 84, 136-137, 146, 170, 181, 183, 185-186, 189, 203
Maximum likelihood estimation, 5-6, 23, 25-27, 71, 96, 122, 142, 145, 147-150, 153
Multivariate martingale array, 170
Multivariate martingale
  Central limit theorem, 170, 181, 185, 189, 203
  Strong law of large numbers, 169
  Weak law of large numbers, 169
Myopic procedure, 7
Noncentrality parameter, 13-14, 16-20, 109, 123, 147
Optimal allocation, 11, 15
  for normal responses, 13
  Neyman allocation, 13-14, 21, 24, 110-111, 113, 123
  RSIHR allocation, 13-14, 21, 110, 123
Optimal design, 13
  Bayesian, 13
  Locally, 13
Power, 4, 8, 11-12, 15-18, 20-21, 24, 60, 71, 91-103, 108, 110, 121-123, 127-132, 153-155
Randomization procedure, 1
Randomization sequence, 1
Randomized play-the-winner rule, 4-5, 24-25, 32-35, 41, 51-52, 56, 60-61, 75, 82, 123, 127-128, 158
Randomized Pólya urn model, 56-57
Response-adaptive randomization, 2
Restricted randomization, 2
  Efron's biased coin design, 3-4, 68, 82
  Permuted block design, 4
  Random allocation rule, 3-4
  Smith's class of designs, 70
  Truncated binomial design, 3-4
  Wei's urn design, 3-4, 70, 100
Rosenthal's inequality, 165
Sample size, 7-8, 11-12, 14, 24, 91, 96-103, 108, 121-122, 127, 129-130, 153, 158
Sequential design, 13
Sequential estimation procedures, 6, 60, 67, 111, 158
Sequential maximum likelihood procedure, 5, 102, 128-129
Sequential monitoring, 158
Skorohod topology, 81
Strong consistency, 56, 62, 79, 86-87, 180, 208
Survival time responses, 108-109
Target allocation, 5-6, 8, 16-17, 20, 28, 67-71, 75, 122-123, 127, 144, 155
Taylor's expansion, 172
Ternary urn model, 31, 56
Time heterogeneity, 114
Triangle inequality, 176
Urn allocation, 60, 70, 123, 127, 157
Urn composition, 3-4, 32-33, 37, 50, 57, 59-61, 162, 173
  Initial, 3, 33-34, 42, 61, 127
Urn model
  Nonhomogeneous, 56
  Sequential estimation-adjusted, 56