DATA HANDLING IN SCIENCE AND TECHNOLOGY - VOLUME 19
Robustness of analytical chemical methods and pharmaceutical technological products
DATA HANDLING IN SCIENCE AND TECHNOLOGY
Advisory Editors: B.G.M. Vandeginste and S.C. Rutan
Other volumes in this series:
Volume 1 Microprocessor Programming and Applications for Scientists and Engineers by R.R. Smardzewski
Volume 2 Chemometrics: A Textbook by D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte and L. Kaufman
Volume 3 Experimental Design: A Chemometric Approach by S.N. Deming and S.L. Morgan
Volume 4 Advanced Scientific Computing in BASIC with Applications in Chemistry, Biology and Pharmacology by P. Valko and S. Vajda
Volume 5 PCs for Chemists, edited by J. Zupan
Volume 6 Scientific Computing and Automation (Europe) 1990, Proceedings of the Scientific Computing and Automation (Europe) Conference, 12-15 June, 1990, Maastricht, The Netherlands, edited by E.J. Karjalainen
Volume 7 Receptor Modeling for Air Quality Management, edited by P.K. Hopke
Volume 8 Design and Optimization in Organic Synthesis by R. Carlson
Volume 9 Multivariate Pattern Recognition in Chemometrics, illustrated by case studies, edited by R.G. Brereton
Volume 10 Sampling of Heterogeneous and Dynamic Material Systems: theories of heterogeneity, sampling and homogenizing by P.M. Gy
Volume 11 Experimental Design: A Chemometric Approach (Second, Revised and Expanded Edition) by S.N. Deming and S.L. Morgan
Volume 12 Methods for Experimental Design: principles and applications for physicists and chemists by J.L. Goupy
Volume 13 Intelligent Software for Chemical Analysis, edited by L.M.C. Buydens and P.J. Schoenmakers
Volume 14 The Data Analysis Handbook, by I.E. Frank and R. Todeschini
Volume 15 Adaption of Simulated Annealing to Chemical Optimization Problems, edited by J.H. Kalivas
Volume 16 Multivariate Analysis of Data in Sensory Science, edited by T. Næs and E. Risvik
Volume 17 Data Analysis for Hyphenated Techniques, by E.J. Karjalainen and U.P. Karjalainen
Volume 18 Signal Treatment and Signal Analysis in NMR, edited by D.N. Rutledge
Volume 19 Robustness of Analytical Chemical Methods and Pharmaceutical Technological Products, edited by M.M.W.B. Hendriks, J.H. de Boer and A.K. Smilde
Robustness of analytical chemical methods and pharmaceutical technological products
edited by
Margriet M.W.B. Hendriks
Agricultural Mathematics Group, P.O. Box 100, 6700 AC Wageningen, The Netherlands
Jan H. de Boer
Gasunie Research, P.O. Box 19, 9700 MA Groningen, The Netherlands
Age K. Smilde
Laboratory for Analytical Chemistry, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
1996
ELSEVIER
Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo
ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands
ISBN 0-444-89709-7
© 1996 Elsevier Science B.V. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.
Special regulations for readers in the USA - This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the USA. All other copyright questions, including photocopying outside of the USA, should be referred to the copyright owner, Elsevier Science B.V., unless otherwise specified.
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.
This book is printed on acid-free paper.
Printed in The Netherlands
PREFACE
The aim of this book is to help those working in analytical chemistry or pharmaceutical technology to develop robust analysis methods and pharmaceutical technological products. Robustness is a part of quality assurance. Awareness of the importance of quality is growing. Proof of this growth is the abundance of norms and protocols that have been created to assure quality, e.g. Good Laboratory Practice (GLP) and Good Manufacturing Practice (GMP). It is therefore worthwhile to think about the formal methodology that can be used to assess and assure robustness.
One of the quality aspects of analytical chemical methods is whether they are rugged or robust against extraneous influences or changing conditions. The latter can be a change in the analyst performing the analysis or a change in instrument, laboratory, supplier of chemicals, etc. In pharmaceutical technology, it is important to develop formulations that are robust to environmental conditions, like temperature changes and relative humidity variation, and that have a long lifetime. These aspects belong to the quality assurance of pharmaceutical formulations.
In this book we try to give a framework for assessing and assuring the robustness of analytical chemical methods and pharmaceutical technological products. The methodology that serves these purposes is essentially the same in both areas. The book contains review chapters, explaining the methodology, and chapters with applications from both analytical chemistry and pharmaceutical technology. We hope that this mixture gives a good flavor of the methodology and how it can be used.
At a certain point in time all three of us were working in the Research Group Chemometrics at the University Centre for Pharmacy of the University of Groningen in the Netherlands. In this research group, one of the topics was developing methodology for assessing and assuring robustness. The methodology was applied in analytical chemistry and pharmaceutical technology.
Some of the authors were or still are part of this research group: J. Wieling, C.A.A. Duineveld, P.M.J. Coenegracht and P. Koopmans. There was a close working relationship with the Department of Pharmaceutical Technology and Biopharmacy, of which C.E. Bos was and G.K. Bolhuis still is a member.
We are glad that all other authors, Y. Vander Heyden, D.L. Massart, S.P. Jones and M. Mulholland, accepted the invitation to participate in the project of writing this book. We thank Gjalt Feenstra for his help in editing some of the figures and, finally, we hope that you will enjoy reading this book!
September 1996
Margriet M.W.B. Hendriks, Wageningen
Jan H. de Boer, Groningen
Age K. Smilde, Amsterdam
TABLE OF CONTENTS
Chapter 1
GENERAL INTRODUCTION TO ROBUSTNESS
1.1 INTRODUCTION
1.2 APPLICATION AREAS AND RELATED ROBUSTNESS QUESTIONS
  1.2.1 Pharmaceutical formulations
  1.2.2 Analytical chemical methods
1.3 STATISTICAL METHODOLOGY
  1.3.1 Taguchi method
  1.3.2 RSM and Experimental design
  1.3.3 Sequential or simultaneous optimization
  1.3.4 Multicriteria optimization
  1.3.5 Robustness criteria
1.4 RECOMMENDED READING PATHS
REFERENCES
Chapter 2
STABILITY AND RESPONSE SURFACE METHODOLOGY
2.1 INTRODUCTION
  2.1.1 Example
2.2 AN OVERVIEW OF RESPONSE SURFACE METHODOLOGY
  2.2.1 First-order designs
  2.2.2 Adding center points
  2.2.3 Second-order designs
  2.2.4 Optimal designs
  2.2.5 Other second-order designs
  2.2.6 Interim Summary
2.3 ROBUST DESIGN AND RESPONSE SURFACE METHODOLOGY
  2.3.1 Response surface modeling of the mean and standard deviation
  2.3.2 Analyzing the mean and standard deviation response surfaces
  2.3.3 Experimental design with environmental variables
  2.3.4 Analysis of experimental designs with environmental variables
  2.3.5 Example
2.4 SPLIT-PLOT DESIGNS FOR ROBUST DESIGN
  2.4.1 Overview of split-plot designs
  2.4.2 Precision of split-plot designs
  2.4.3 Variants of split-plot designs
  2.4.4 Analysis of split-plot designs for robust experimentation
2.5 CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
Chapter 3
REVIEW OF THE USE OF ROBUSTNESS AND RUGGEDNESS IN ANALYTICAL CHEMISTRY
3.1 INTRODUCTION
3.2 PLACE OF RUGGEDNESS TESTING IN METHOD VALIDATION
3.3 DEFINITIONS OF RUGGEDNESS
3.4 RUGGEDNESS TESTING OF PROCEDURE RELATED FACTORS
  3.4.1 The steps of a ruggedness test
  3.4.2 Selection of the factors
  3.4.3 Selection of the levels of the factors
  3.4.4 Selection of the experimental design
  3.4.5 Experimental part of the ruggedness test
  3.4.6 Analysis of the results
  3.4.7 Statistical analysis of the results
  3.4.8 Using predefined values to identify chemically relevant factors
  3.4.9 Case studies
  3.4.10 Expert systems and software packages for ruggedness testing
3.5 RUGGEDNESS TESTING OF NON-PROCEDURE RELATED FACTORS: THE USE OF NESTED DESIGNS
3.6 CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
Chapter 4
ROBUSTNESS CRITERIA; INCORPORATING ROBUSTNESS EXPLICITLY IN OPTIMIZATION PROCEDURES UTILIZING MULTICRITERIA METHODS
4.1 INTRODUCTION
4.2 A BRIEF INTRODUCTION TO THE TAGUCHI METHODS
  4.2.1 Introduction
  4.2.2 The loss function
  4.2.3 Off-line quality control
  4.2.4 Orthogonal arrays
4.3 THE ROBUSTNESS CRITERIA
  4.3.1 Introduction
  4.3.2 The variance/covariance structure of a mixture
  4.3.3 General aspects of the robustness criteria
  4.3.4 The Jones method
  4.3.5 The Weighted Jones method
  4.3.6 The Projected Variance method
  4.3.7 The Robustness Coefficient
4.4 MULTICRITERIA DECISION MAKING
  4.4.1 Introduction
  4.4.2 Theory of MCDM
4.5 THE ROBUSTNESS COEFFICIENT APPLIED IN A MCDM STRATEGY
  4.5.1 Introduction
  4.5.2 Theory
  4.5.3 Experimental
  4.5.4 Results and discussion
REFERENCES
Chapter 5
RUGGEDNESS TESTS FOR ANALYTICAL CHEMISTRY
5.1 INTRODUCTION
  5.1.1 Designing a protocol for method validation
  5.1.2 Summary of the role of a ruggedness test in a method validation program
5.2 SELECTION OF FACTORS TO TEST
  5.2.1 Selection of the number of levels at which to test a factor
  5.2.2 Selection of factors for HPLC methods
  5.2.3 Selection of factors for other analytical methods
5.3 SELECTION OF EXPERIMENTAL DESIGNS
  5.3.1 Factorial designs
  5.3.2 Star designs
  5.3.3 Central composite designs
  5.3.4 Box-Behnken designs
  5.3.5 Matching the ruggedness test to an efficient design
5.4 TREATMENT OF RESULTS
  5.4.1 Measurements for a HPLC Study
  5.4.2 Treatment of the results from the ruggedness study
  5.4.3 Confounding effects in fractional factorial designs
5.5 EXAMPLE CASE STUDIES
  5.5.1 The application of a ruggedness test to the assay of Aspirin and its major degradation product, salicylic acid
  5.5.2 The application of a ruggedness test to the assay of Salbutamol and its major degradation product, AH4045
5.6 CONCLUSIONS
REFERENCES
Chapter 6
STABILIZING A TLC SEPARATION ILLUSTRATED BY A MIXTURE OF SEVERAL STREET DRUGS
6.1 INTRODUCTION
6.2 THEORY
  6.2.1 Thin Layer Chromatography
  6.2.2 Separation problem
  6.2.3 Selection of mobile phases
  6.2.4 Influence of temperature and relative humidity
  6.2.5 Optimization
  6.2.6 The Taguchi approach to robustness
  6.2.7 Application of a parameter design in optimization
  6.2.8 Generalization of parameter design towards Response Surface Methodology (RSM)
  6.2.9 Construction of experimental designs
  6.2.10 Selection of the dependent variable
  6.2.11 Construction of models for the dependent variables
  6.2.12 Selection criteria for models
  6.2.13 Selection of optimization criteria
6.3 EXPERIMENTAL
  6.3.1 Materials and methods
  6.3.2 Software
6.4 RESULTS
  6.4.1 Introduction
  6.4.2 Box Cox transformation
  6.4.3 Selection of models
  6.4.4 Chromatographic and empirical models
  6.4.5 Determination of a solvent with a high minimum resolution
  6.4.6 Determination of a solvent composition with a robust minimum resolution
6.5 CONCLUSIONS
REFERENCES
Chapter 7
ROBUSTNESS OF LIQUID-LIQUID EXTRACTION OF DRUGS FROM BIOLOGICAL SAMPLES
7.1 INTRODUCTION
7.2 THEORY
  7.2.1 Liquid-liquid extraction optimisation theory
  7.2.2 Optimisation criteria
7.3 EXPERIMENTAL
  7.3.1 Validation of robustness criteria by means of a comparison with a simulation experiment
  7.3.2 Selection of solvents
  7.3.3 The extraction of a group of sulphonamides from plasma
7.4 RESULTS AND DISCUSSION
  7.4.1 Validation of robustness criteria by means of a comparison with a simulation experiment
  7.4.2 The extraction of a group of sulphonamides
7.5 CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
Chapter 8
THE USE OF A FACTORIAL DESIGN TO EVALUATE THE PHYSICAL STABILITY OF TABLETS AFTER STORAGE UNDER TROPICAL CONDITIONS
8.1 INTRODUCTION
  8.1.1 The use of experimental designs in tablet formulation
  8.1.2 The use of factorial designs in physical tablet stability studies
8.2 THE USE OF THE RELATIVE CHANGE IN TABLET PARAMETERS IN A FACTORIAL DESIGN
  8.2.1 Planning of the design
  8.2.2 Tabletting, storage and measurements
  8.2.3 Results
  8.2.4 Conclusions
8.3 SELECTION OF EXCIPIENTS SUITABLE FOR USE IN TROPICAL COUNTRIES
  8.3.1 Planning of the design
  8.3.2 Tabletting, storage and measurements
  8.3.3 Results
  8.3.4 Conclusions
REFERENCES
LIST OF CONTRIBUTORS
J.H. DE BOER
Gasunie Research, P.O. Box 19, 9700 MA Groningen, The Netherlands
G.K. BOLHUIS
Department of Pharmaceutical Technology and Biopharmacy, University Centre for Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
C.E. BOS
A.UK Veterinary Cooperation, P.O. Box 94, 5430 AB Cuijk, The Netherlands
P.M.J. COENEGRACHT
Research Group Chemometrics, University Centre for Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
C.A.A. DUINEVELD
Quest International, P.O. Box 2, 1400 CA Bussum, The Netherlands
Y. VANDER HEYDEN
ChemoAC, Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussels, Belgium
S.P. JONES
Boeing Computer Services, The Boeing Company, P.O. Box 24346, MS 7L22, Seattle, WA 98124-0346, USA
P. KOOPMANS
Academic Hospital Groningen, P.O. Box 30001, 9700 RB Groningen, The Netherlands
D.L. MASSART
ChemoAC, Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussels, Belgium
M. MULHOLLAND
Department of Analytical Chemistry, University of New South Wales, P.O. Box 1, Kensington, New South Wales 2033, Australia
A.K. SMILDE
Laboratory for Analytical Chemistry, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
J. WIELING
BioIntermediair Europe BV, P.O. Box 454, 9700 AL Groningen, The Netherlands
Chapter 1
GENERAL INTRODUCTION TO ROBUSTNESS
AGE K. SMILDE
Laboratory for Analytical Chemistry, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
1.1 INTRODUCTION
In the fields of analytical chemistry and pharmaceutical formulation there is a growing awareness of quality issues. To mention a few, Good Laboratory Practice (GLP), Good Manufacturing Practice (GMP) and ISO norms are important topics in both laboratory and industrial environments. Moreover, a growing number of regulatory committees is concerned with all aspects of quality, at both the national and the international level [1-3]. In analytical chemistry, validation of the analytical methods is of utmost importance [4,5]. One aspect of this validation is the robustness of analytical methods against variations in experimental circumstances. The term “experimental circumstances” is very broad; it might even include inter-laboratory variation. In this book, only intra-laboratory experimental conditions are considered. No explicit attention is given to inter-laboratory variation, although some of the presented methodology might be useful in that area. In pharmaceutical technology, quality assurance of the pharmaceutical formulation is important. When a pharmaceutical formulation is produced, on-line quality monitoring and control has to be performed in order to check the quality of the outgoing products. The methodology for this task is Statistical Process Control (SPC), which is not covered in this book; good textbooks on SPC exist [6-9]. In this book the focus is on off-line quality control, e.g. how to make products that are intrinsically robust against process variations.
This book consists of eight chapters. Chapters 2, 3 and 4 give methodological background and reviews. Chapters 5 to 8 present applications, both in analytical chemistry and in pharmaceutical formulation. Since the field that this book surveys is still under active research, the book is by no means a monograph of well-established and tested methods. Many questions and loose ends remain. The book does, however, offer some ideas on how to tackle problems of robustness. An important distinction has to be made with respect to the presented methods. There is a class of methods dealing with testing the robustness of a given analytical method or a given pharmaceutical formulation. When related to an analytical method, such strategies are often referred to as “ruggedness testing”. The other class of methods aims at building in robustness in the design phase for pharmaceutical formulations or in the method development phase of analytical methods. This second class of methods is sometimes called “quality by design” for obvious reasons. This general introduction continues with a summary of the application areas covered in the following chapters and the related robustness questions that have to be solved. Then the different statistical methods that play a role in solving these questions, and that are discussed in the following chapters, are put in a general framework. Finally, recommendations are given to readers of different levels on how to approach this book in such a way that reading it is fun!
1.2 APPLICATION AREAS AND RELATED ROBUSTNESS QUESTIONS
1.2.1 Pharmaceutical formulations
Robustness is a relatively new area in the (optimization of) pharmaceutical formulations. Therefore, no review of existing applications is given. Some applications are given in Chapters 2, 4 and 8; these are explained below. When tablet formulations are made, usually several quality criteria have to be met, e.g., a high crushing strength, a low disintegration time, a pre-set dissolution profile. A tablet normally consists of the pharmacon (the pharmacologically active compound; the drug) and excipients. Hence, a tablet can be made with different relative amounts of excipients (a mixture composition) and this creates room for optimizing a tablet formulation. Not only the relative amounts of excipients influence tablet properties, but also process variables like compression force. Moreover, environmental variables like temperature and humidity can influence the (long-term) stability of tablets. Different robustness questions can be raised in this area. Suppose that a tablet has to be made with certain relative amounts of excipients. In the weighing of the excipients small errors are made, which result in a variation of the mixture composition of excipients. How much are the quality criteria of tablets affected by this variation? Is it possible to select excipient compositions that minimize the quality-reducing effects? These questions are posed and answered in Chapter 4. The properties of tablets are also influenced by temperature and humidity fluctuations, e.g., during storage. Is it possible to select excipient compositions that minimize the effect of temperature and humidity fluctuations on the quality of tablets? These questions are raised and treated in a small example in Chapter 2 and extensively in Chapter 8.
1.2.2 Analytical chemical methods
A review of ruggedness testing methods is presented in Chapter 3, and examples are given in Chapter 5. These chapters describe procedures that test the robustness or ruggedness of existing methods. Hence, incorporating robustness explicitly in analytical techniques (see Section 1.1) is not discussed there.
1.2.2.1 High Performance Liquid Chromatography
High Performance Liquid Chromatography (HPLC) is an analytical chemical method that is routinely used on a large scale. Once an HPLC method has been developed, the question arises whether the analytical results from this method depend critically on small deviations in the mobile phase composition, the selected UV wavelength for detection, etc. Methods to deal with these problems are outlined in Chapter 5, together with examples of how to tackle them. Examples of ruggedness testing in HPLC are also given in Chapter 3.
1.2.2.2 Thin Layer Chromatography (TLC)
TLC is a simple, cheap and fast analytical chemical technique which is often used for screening purposes. Due to the set-up of a TLC experiment, not only the mobile phase composition influences retention (and resolution) but also temperature and humidity. Of course, a TLC experiment can be carried out under controlled conditions, but then the appealing characteristics of simplicity, low cost and speed disappear. In order to keep the TLC method simple and yet robust against temperature and humidity changes, it is possible to select a mobile phase that minimizes the harmful effects of these changes. This is described in Chapter 6.
1.2.2.3 Liquid-liquid extraction
One of the most used sample pre-treatment methods, especially in bioanalysis, is liquid-liquid extraction. Analytes present in an aqueous sample can be extracted from that sample with the use of organic solvents. A pure organic solvent can be used, but also mixtures of organic solvents. It is important in liquid-liquid extraction that as much as possible of the analyte(s) and internal standard is extracted, and that they are extracted to an equal extent. This defines an optimization problem in which a mixture of organic solvents has to be chosen to reach the goals stated above. In mixing the organic solvents, errors are made. How much do these errors affect the quality criteria (maximizing the amounts of analytes and internal standards extracted in an equal way)? Is it possible to select an organic solvent mixture that minimizes the effects of these errors? This problem is treated in Chapter 7 and is closely related to the problem discussed in Chapter 4.
1.3 STATISTICAL METHODOLOGY
In this section a brief overview is given of the statistical methods that are discussed in the separate chapters of this book. For the established methodology, references are given to existing textbooks.
1.3.1 Taguchi method
The Taguchi method consists of a philosophy of quality, experimental design methods to build in robustness, and methods to analyze the data obtained from the experiments. All these topics are treated in textbooks [10-13]. The philosophy is simple: make products with built-in robustness against all kinds of environmental disturbances and fluctuations. The experimental design methods are presented as linear graphs and orthogonal arrays, but are not essentially different from established designs like factorial and fractional factorial designs [12]. The methods of analyzing the data consist of criteria formulated in terms of signal-to-noise ratios and (simple) statistical tools to establish the relationship between the design variables, the environmental variables and the signal-to-noise ratios. Taguchi’s method has also drawn a lot of criticism, especially with respect to the data analysis part. Some of this criticism focuses on the use of Taguchi’s quality criteria: the signal-to-noise ratios [15]. A good example of the drawback of these signal-to-noise ratios is given in Chapter 6. A brief introduction to the Taguchi method is given in Chapter 4, whereas in Chapters 2 and 6 some of Taguchi’s ideas are touched upon and discussed. The philosophy of Taguchi (built-in robustness) is present in Chapters 2, 4, 6, 7 and 8.
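As a concrete illustration of what a signal-to-noise ratio looks like, the sketch below computes the three ratios most commonly quoted in the Taguchi literature. The formulas are the usual textbook forms and are not given in this chapter; the function names and the example data are hypothetical. It is only meant to show the kind of criterion that the criticism mentioned above is aimed at.

```python
import math
from statistics import mean, variance


def sn_smaller_the_better(y):
    """SN ratio when the response should be as small as possible."""
    return -10.0 * math.log10(mean([v * v for v in y]))


def sn_larger_the_better(y):
    """SN ratio when the response should be as large as possible."""
    return -10.0 * math.log10(mean([1.0 / (v * v) for v in y]))


def sn_nominal_the_best(y):
    """SN ratio when the response should stay close to a target.

    Uses the sample variance, so at least two runs with a
    non-zero spread are needed.
    """
    return 10.0 * math.log10(mean(y) ** 2 / variance(y))


if __name__ == "__main__":
    # Hypothetical replicate responses of one design setting,
    # measured under different environmental conditions.
    runs = [9.0, 11.0]
    print(sn_smaller_the_better(runs))
    print(sn_larger_the_better(runs))
    print(sn_nominal_the_best(runs))
```

In each case a larger ratio is read as a more favourable setting; note that mean level and spread are folded into a single number.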
1.3.2 RSM and Experimental design
The idea of Response Surface Methodology (RSM) is straightforward. Suppose that a tablet has to be made with a high crushing strength. The crushing strength depends on the relative amounts of excipients. The functional relationship between the quality criterion (crushing strength) and the design variables (relative amounts of excipients) may be very complicated, but can be approximated by a Taylor expansion. This results in a linear approximation (first-order approximation) if only the first-order terms of the Taylor expansion are retained. If this is not sufficient, the second-order terms can be added and a second-order model is obtained. This results in empirical first-order models like
$$CR = a_0 + a_1 x_1 + a_2 x_2 + e$$
where $CR$ is crushing strength; $x_1$, $x_2$ are the relative amounts of excipients 1 and 2, respectively, and $e$ is an error term. The parameters $a_0$, $a_1$ and $a_2$ have to be estimated, preferably using data obtained from an experimental design (see later). With the use of such an empirical model, the crushing strength can be optimized with respect to the relative amounts of excipients. The empirical model above can be represented as a surface in 3-D space, where the axes are $x_1$, $x_2$ and $CR$. Hence, this methodology is called response surface methodology. Good textbooks have appeared in the area of RSM [14,16]. In order to optimize the information content of experiments, it is wise to plan the experiments ahead in a well-defined manner. Formal ways to plan experiments are the subject of experimental design. The basic idea is to vary systematically all variables that might influence the responses (quality criteria). These ideas are explained in textbooks [14,16]. RSM is discussed in Chapter 2 and experimental design is discussed in Chapters 2, 3 and 5. Both RSM and experimental design techniques are present in all the other chapters in this book.
Two special topics of experimental design are used in this book. The first topic is that of mixture designs. As an example, suppose that only two excipients are used in a tablet. Then the model for crushing strength used above has the restriction that $x_1 + x_2 = 1$. Designs especially suited for this situation are called mixture designs and are treated in Cornell [17]. In Chapters 4, 6 and 7 these designs are used. The second special topic is the combination of factorial designs and mixture designs. Suppose that, in the example above, not only the two excipients influence the crushing strength but also the compression force and the mixing time of the excipients. The relative amounts of excipients have to be varied in a mixture design, but the compression force and mixing time can be varied according to a factorial design. Hence, a combination of a mixture design and a factorial design is needed. This topic is treated partly in Cornell [17] and extensively in some papers [18,19]. Chapter 6 gives an example of such combined designs.
1.3.3 Sequential or simultaneous optimization
For the optimization of, for instance, a tablet formulation, two strategies are available: a sequential or a simultaneous approach. The sequential approach consists of a series of measurements where each new measurement is performed after the response of the previous one is known. The new experiment is planned according to a direction in the search space that looks promising with respect to the quality criterion which has to be optimized. Such a strategy is also called a hill-climbing method.
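The sequential idea can be sketched in a few lines, assuming a single design variable and a hypothetical function `measure` that stands in for performing one experiment and returning its response. Neither the names nor the fixed-step rule come from the text; the sketch only illustrates that each new run is chosen after the previous response is known.

```python
def hill_climb(measure, start, step, max_runs=50):
    """Sequential search: move in whichever direction improves the response.

    `measure` is a stand-in for carrying out one experiment (hypothetical).
    Returns the best setting found and its response.
    """
    x = start
    best = measure(x)
    runs = 1
    while runs < max_runs:
        moved = False
        for candidate in (x + step, x - step):
            y = measure(candidate)  # one new experiment
            runs += 1
            if y > best:
                # Promising direction: accept the move and plan the next run.
                x, best, moved = candidate, y, True
                break
        if not moved:
            break  # no neighbouring setting improves the response
    return x, best


if __name__ == "__main__":
    # Hypothetical noise-free response with its maximum at a setting of 3.
    def response(x):
        return -(x - 3) ** 2

    print(hill_climb(response, start=0, step=1))
```

The search returns only the path it walked and the final setting; it leaves no model of how the response depends on the design variable.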
The Simplex method is a well-known example of such a strategy. Textbooks are available that describe the Simplex methods [20]. In the simultaneous approach the experiments are planned beforehand (preferably using experimental design techniques) and performed in random order. With RSM techniques the experimental data obtained can be used to model the quality criterion as a function of the design variables. Then an optimal setting of the design variables can be calculated. All the optimization experiments described in this book use the simultaneous approach. The simultaneous approach uses the RSM method in almost all cases, and is hence described in textbooks [14,16]. There are also hybrid forms of the sequential and simultaneous approaches to optimization. In a part of the search space a small design is made and the initial experiments are carried out according to this design (simultaneous). Then the direction of steepest ascent is calculated and some experiments are made in this direction (sequential). In a promising new area a new design is made and experiments are performed according to this new design. Optimization is performed by repeating these steps a few times. This hybrid procedure is also described in standard textbooks [14,16]. One of the drawbacks of sequential optimization methods is that optimizing two or more criteria at the same time is hard, if not impossible. If the two or more criteria are combined in one overall criterion, which is sometimes advocated, then ambiguous results are obtained. This is shown in Chapter 4. There are ways to overcome this ambiguity to some extent [21]. Another drawback of a sequential procedure is that it gives little information on the dependence of the criterion on the design variables. In the context of robustness this is a very serious drawback. This is one of the reasons why sequential optimization methods are not used in this book.
1.3.4 Multi-criteria optimization
In practice, often more than one quality criterion is relevant. When robustness has to be built in, at least two criteria are already needed: the quality criterion itself and its associated robustness criterion. Hence, optimization has to be done on more than one criterion simultaneously. If a simultaneous optimization technique is used, then there are procedures to deal with multiple optimization criteria. Several methods for multi-criteria optimization have been proposed and recently a tutorial/review has appeared [22].
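To make the handling of two criteria at once concrete, the following sketch keeps only those candidate settings that are not dominated by any other candidate, i.e. the Pareto-optimal ones. It assumes that every criterion is to be maximized; the function name, the candidate labels and the numbers are hypothetical and are not taken from this book.

```python
def pareto_optimal(candidates):
    """Return the names of the non-dominated candidates.

    `candidates` maps a name to a tuple of criteria values,
    all of which are assumed to be "larger is better".
    """
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere
        # and strictly better on at least one criterion.
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [name for name, crit in candidates.items()
            if not any(dominates(other, crit)
                       for other in candidates.values())]


if __name__ == "__main__":
    # Hypothetical (quality criterion, robustness score) per formulation.
    formulations = {
        "A": (100, 0.9),
        "B": (120, 0.5),
        "C": (90, 0.8),   # worse than A on both criteria
        "D": (110, 0.7),
    }
    print(pareto_optimal(formulations))  # ['A', 'B', 'D']
```

The result is a set of trade-offs, not a single optimum; choosing among the remaining candidates still requires a judgment about the relative importance of the criteria.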
An introduction to one particular multi-criteria optimization method, the so-called Pareto-Optimality method, is given in Chapter 4, where an application of this method is also presented.
1.3.5 Robustness criteria
If robustness has to be built in, then the concept of robustness has to be formalized and optimized. This is in contrast to the class of methods that check the robustness or ruggedness of existing methods; there the influence of the variables on the response can be expressed, for instance, as a percentage change of the response. Several ways to formalize the concept of robustness are presented in this book. Robustness can be formalized and expressed as a variance of the quality criterion, which is done in Chapter 7. Another way to formalize robustness is the percentage change of the response, which is done in Chapter 8. It is also possible to express robustness in more complicated ways; examples are given in Chapters 2 and 4. In Chapter 6 a maxi-min formalization is chosen: select the TLC solvent composition in such a way that the minimum resolution between pairs of solutes is maximized.
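The three simpler formalizations mentioned above can be written down directly. The sketch below is only an illustration under stated assumptions: the function names and all numbers are hypothetical, and the population variance is used for simplicity; the chapters cited define their own criteria in detail.

```python
from statistics import pvariance


def response_variance(responses):
    """Robustness as the variance of the quality criterion over perturbed runs."""
    return pvariance(responses)


def percentage_change(nominal, stressed):
    """Robustness as the percentage change of the response."""
    return 100.0 * (stressed - nominal) / nominal


def maximin_choice(resolutions_by_candidate):
    """Maxi-min: pick the candidate whose smallest resolution is largest."""
    return max(resolutions_by_candidate,
               key=lambda name: min(resolutions_by_candidate[name]))


if __name__ == "__main__":
    # Hypothetical crushing strengths under small composition errors.
    print(response_variance([98.0, 102.0, 100.0, 100.0]))  # 2.0

    # Hypothetical response before and after storage.
    print(percentage_change(100.0, 90.0))  # -10.0

    # Hypothetical resolutions of all solute pairs for two solvents.
    solvents = {"A": [1.2, 2.5, 0.8], "B": [1.1, 1.4, 1.0]}
    print(maximin_choice(solvents))  # B
```

In the last example solvent A has the best single resolution, but solvent B is chosen because its worst-separated pair is still better resolved than the worst pair of A.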
1.4 RECOMMENDED READING PATHS

For readers with no prior knowledge of experimental design and RSM
Reading several chapters in the textbook of Box et al. [14] is a good introduction. After that, the introduction to experimental design and RSM methodology in Chapter 2 can be read; an overview is also given in Chapter 3.

For readers with no prior knowledge of optimization methods
In the textbook of Box et al. [14] the basic principles of optimization are also explained. The sequential simplex method is presented in Walters et al. [20]. Multi-criteria optimization is presented in Chapter 4 on an introductory level. Readers who want to know more about multi-criteria optimization should consult the references given in Section 1.3.4 and Chapter 4.

For readers with no prior knowledge of the Taguchi method
The Taguchi method is explained to some extent in Chapter 4. A general introduction is given in [11-13]. For detailed discussions, see the references given in Chapter 4 and Section 1.3.1.

For readers with some knowledge of experimental design and RSM
Start by reading Chapters 2 and 3; this will refresh your memory.
For readers with some knowledge of optimization methods
Start by reading Chapters 2 and 4, which give the background material and should be understandable. If a more detailed understanding of, e.g., multi-criteria optimization is wanted, then the references in Chapter 4 will suffice.

For readers with some knowledge of the Taguchi method
Start by reading Chapters 2 and 4, which give the background material and should be understandable.
REFERENCES

[1] M. Parkany (editor), Quality assurance for analytical laboratories, Royal Society of Chemistry, Cambridge, United Kingdom, 1993 (Proceedings of the Fifth International Symposium on the Harmonization of Internal Quality Assurance Schemes for Analytical Laboratories held in Washington DC, USA, 22-23 July 1993).
[2] Good Laboratory Practice in the Testing of Chemicals, Organization of Economic Co-operation and Development (OECD), Paris, 1982.
[3] Quality Management and Quality Assurance Standards: Guidelines for Selection and Use, 1987-03-15, International Organization for Standardization, 1987.
[4] G. Kateman and L. Buydens, Quality Control in Analytical Chemistry, John Wiley, New York, 1993.
[5] L. Buydens and P. Schoenmakers (editors), Intelligent software for chemical analysis, Elsevier, Amsterdam, 1993.
[6] G.B. Wetherill and D.W. Brown, Statistical Process Control: Theory and Practice, Chapman and Hall, London, 1991.
[7] R.W. Berger and T.H. Hart, Statistical Process Control: A Guide for Implementation, ASQC Quality Press, Milwaukee, 1986.
[8] A. Mitra, Fundamentals of Quality Control and Improvement, MacMillan, New York, 1993.
[9] Th.P. Ryan, Statistical Methods for Quality Improvement, John Wiley, New York, 1989.
[10] T. Bendell (editor), Taguchi Methods, Elsevier Applied Science, Amsterdam, 1989.
[11] G. Taguchi, Introduction to quality engineering: designing quality into products and processes, Kraus International Publications, White Plains, NY, USA, 1986.
[12] G.S. Peace, Taguchi methods: a hands-on approach, Addison Wesley, 1992.
[13] P.J. Ross, Taguchi Techniques for Quality Engineering: Loss Function, Orthogonal Experiments, Parameter and Tolerance Design, McGraw-Hill, New York, 1988.
[14] G.E.P. Box, J.S. Hunter and W.G. Hunter, Statistics for Experimenters, John Wiley, New York, 1978.
[15] G.E.P. Box, Signal-to-noise ratios, performance criteria, and transformations, Technometrics, 30 (1988) 1-40.
[16] G.E.P. Box and N.R. Draper, Empirical Model Building and Response Surfaces, John Wiley, New York, 1986.
[17] J.A. Cornell, Experiments with Mixtures, John Wiley, New York, 1990.
[18] C.A.A. Duineveld, A.K. Smilde and D.A. Doornbos, Comparison of experimental designs combining process and mixture variables. Part 1: Design construction and theoretical evaluation, Chemometrics and Intelligent Laboratory Systems, 19 (1993) 295-308.
[19] C.A.A. Duineveld, A.K. Smilde and D.A. Doornbos, Comparison of experimental designs combining process and mixture variables. Part 2: Design evaluation on measured data, Chemometrics and Intelligent Laboratory Systems, 19 (1993) 309-318.
[20] F.H. Walters, L.R. Parker, S.L. Morgan and S.N. Deming, Sequential Simplex Optimization, CRC Press, Florida, 1991.
[21] C.A.A. Duineveld, C.H.P. Bruins, A.K. Smilde, G.K. Bolhuis, K. Zuurman and D.A. Doornbos, Multicriteria Steepest Ascent, Chemometrics and Intelligent Laboratory Systems, 25 (1994) 183-202.
Chapter 2
STABILITY AND RESPONSE SURFACE METHODOLOGY

STEPHEN P. JONES
Boeing Computer Services, The Boeing Company, P.O. Box 24346, MS 7L-22, Seattle, WA 98124-0346, United States
2.1 INTRODUCTION
In recent years much attention has been focused on the impact of the use of statistics, and in particular experimental design, to improve the quality of products and processes. An important component of the quality of a product is its robustness or stability in the presence of what Taguchi has called noise variables. These noise variables can be from a variety of sources, such as environmental conditions, deterioration of components, or variation in product components and manufacturing processes. It is possible that variation due to these sources will cause variation in the key characteristics of a product or process, resulting in a product of inferior quality. This chapter will examine the application of statistical experimental design to designing a product or process that is robust to variation from environmental variables. It should be understood that the phrase "environmental variables" is to be viewed broadly and is not just limited to variables such as temperature and humidity. In this context, variation from environmental variables is variation that is external to the product and that is outside of the control of the manufacturer during production. Thus, it might also include variation in the conditions in which the customer uses the product, or in the conditions in which the product is stored, or in how the product is maintained and serviced. It should be noted that experiments with this objective of robust design have been run for many years in agricultural research. For example, a paper by Yates and Cochran [1] describes experiments on crop varieties in different regions over several years; the objective being to determine a variety that consistently will produce a good yield over a range of climate
and soil conditions represented by the different regions. They used a graphical analysis of the interaction between the varieties and the regions to investigate the robustness of the varieties to the different regions. It is clear from this description that investigating crop varieties that are robust to environmental variation, whether due to climate, soil, aspect, farming practice, etc., is an application of experimental design techniques to robust design. The experiments conducted to perform ruggedness tests of measurement procedures can also be viewed as experiments to investigate robust design; see, for example, Wernimont [2], and Youden [3,4]. The objective of ruggedness tests is to determine a robust measurement procedure; that is, a procedure that will give a consistent (and correct) result under a range of measurement conditions. An industrial example of the use of experimental design for robust design, given in Box and Jones [5], is the case of a manufacturer of medical packaging material who sought a method of manufacture that would yield a robust packaging material. In this context, a robust packaging material is one that can be used to seal medical equipment under a range of sealing process conditions used by its customers, the medical equipment manufacturers. The environmental conditions were the sealing process factors. The objective of the experiment was directed towards achieving a suitable product design so that the variation in the environmental conditions did not result in variation in the product's performance, that is, how well the material seals. Packaging material that would yield a good seal over a range of sealing process conditions would have a competitive advantage since medical equipment manufacturers would not have to operate their sealing process within a narrow tolerance to produce a good seal.
Therefore the equipment manufacturer can use less precise equipment or machines that are difficult to control consistently or a less qualified workforce. The motivation for interest in designing robust products and processes is that it is frequently more cost effective to reduce the effect of the environmental variation rather than to eliminate the source of the variation by controlling the environment. Furthermore, in some situations it might be impossible to eliminate or control the environmental variation. As an example, a manufacturer cannot control the variation in the use of their product and so would prefer to design the product to be robust to a wide range of customer usage conditions rather than to impose instructions that
need to be strictly adhered to by the customer. In this way the product design is forgiving of variation beyond the control of the manufacturer. It should be noted that although it has been stated that the environmental variables are beyond the control of the manufacturer in the normal production or usage conditions, it is necessary that they can be controlled for an experiment. The objective of the experiment is to learn how to minimize the influence of the environmental variables on the product or process performance. To accomplish this objective it will be necessary to understand how variation in environmental conditions affects the product or process performance. The methodology that will be described in this chapter requires that the environmental conditions be changed in a controlled, structured manner.
2.1.1 Example
Consider the set of data given in Table 2.1. In this example a tablet formulation is desired that will retain desired properties in both tropical and temperate climates. The actual climatic conditions that will be experienced in practice are beyond the control of the manufacturer but they can be simulated in a laboratory experiment. In this example, experiments are to be run with three constituents of the tablet formulation, say, glidant, lactose, and disintegrant, which will be denoted as A, B, and C, in a $2^3$ factorial design. The two levels for each of the factors in the experiment are denoted by -1 and +1. The manufacturer wants a stable, or robust, tablet formulation so that it will retain its efficacy when stored in a range of temperatures and humidities. To yield data on this, for each of the eight tablet formulations, the storage temperature and humidity will be varied in a laboratory experiment following a $3^2$ factorial design. In this design the environmental variables are varied in a climate-controlled chamber above and below their nominal settings (denoted by +1, -1, and 0, respectively). A set of hypothetical data for a response of interest, say crushing strength, is shown in Table 2.1. The objective is to determine a combination of the factors glidant (A), lactose (B), and disintegrant (C) that will yield high values for crushing strength across the ranges of temperature and humidity studied in the experiment. At first glance it might appear that the formulation with A=-, B=-, and C=+ gives good values for crushing strength. Indeed at the nominal settings of temperature and humidity (0, 0) the crushing strength is 125 for this design combination, close to the largest response in the data set.
However, calculations of means and standard deviations for the response over the environmental conditions, shown in Table 2.2, reveal that the formulation with A=-, B=+, and C=+ yields an average crushing strength that is identical in magnitude but with considerably less variation as the temperature and humidity variables are changed. This formulation is robust, or stable, to storage in the range of climates represented by the changes in temperature and humidity considered in the experiment.
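The summary calculation described above can be reproduced with a short sketch. It uses only the crushing strengths of Table 2.1, keyed by the signs of (A, B, C); the use of the sample standard deviation (divisor n - 1) is an assumption, chosen because it reproduces the values quoted in Table 2.2.

```python
from statistics import mean, stdev

# Crushing strengths from Table 2.1: one entry per formulation (A, B, C),
# nine values each, one for every temperature/humidity combination.
table_2_1 = {
    ("-", "-", "-"): [119, 106, 97, 107, 107, 95, 87, 88, 87],
    ("+", "-", "-"): [100, 95, 87, 101, 119, 91, 107, 87, 83],
    ("-", "+", "-"): [116, 112, 119, 102, 101, 87, 105, 105, 100],
    ("+", "+", "-"): [109, 93, 100, 91, 102, 103, 85, 88, 96],
    ("-", "-", "+"): [115, 108, 104, 128, 125, 97, 99, 107, 76],
    ("+", "-", "+"): [112, 113, 90, 103, 94, 88, 96, 97, 80],
    ("-", "+", "+"): [121, 95, 101, 103, 111, 107, 106, 107, 108],
    ("+", "+", "+"): [104, 103, 89, 104, 102, 98, 97, 89, 102],
}

# Mean and sample standard deviation over the nine environmental conditions.
summary = {
    formulation: (mean(values), stdev(values))
    for formulation, values in table_2_1.items()
}

for formulation, (avg, sd) in summary.items():
    print(formulation, round(avg, 2), round(sd, 2))
```

Ranking the formulations by mean alone would not separate the two best candidates; the standard deviation over the environmental conditions is what distinguishes them.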
TABLE 2.1
HYPOTHETICAL DATA SET FOR TABLET FORMULATION EXPERIMENT

                        Environmental Variables
Temperature        -    -    -    0    0    0    +    +    +
Humidity           -    0    +    -    0    +    -    0    +

Design Variables
 A   B   C
 -   -   -       119  106   97  107  107   95   87   88   87
 +   -   -       100   95   87  101  119   91  107   87   83
 -   +   -       116  112  119  102  101   87  105  105  100
 +   +   -       109   93  100   91  102  103   85   88   96
 -   -   +       115  108  104  128  125   97   99  107   76
 +   -   +       112  113   90  103   94   88   96   97   80
 -   +   +       121   95  101  103  111  107  106  107  108
 +   +   +       104  103   89  104  102   98   97   89  102
The arrangement containing the tablet design formulations is the inner array and the arrangement containing the environmental variables is the outer array. In this chapter these two arrays will be referred to as the design and environmental arrays and the total design will be called a cross-product array. If there are $n_1$ runs in the design array and $n_2$ runs in the environmental array, and the runs are made independently, then the total experiment will require $n_1 \times n_2$ runs. Thus, except where both $n_1$ and $n_2$ are small, this could involve a large amount of experimental work. An issue that will be considered in this chapter is how the investigator can construct experimental designs that will require less work than these cross-product arrays and still be able to determine settings for the design variables that are stable (or insensitive) to variation from the environmental variables.
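As an illustration of how quickly a cross-product array grows, the following sketch builds the design array and the environmental array of the tablet example and crosses them. The variable names are illustrative only, and the order in which the runs are listed is an arbitrary choice.

```python
from itertools import product

# Design array: 2^3 factorial in the formulation variables A, B, C.
design_array = list(product([-1, +1], repeat=3))

# Environmental array: 3^2 factorial in temperature and humidity.
environmental_array = list(product([-1, 0, +1], repeat=2))

# Cross-product array: every design run combined with every environmental run.
cross_product_array = [d + e for d in design_array for e in environmental_array]

print(len(design_array), len(environmental_array), len(cross_product_array))
```

For the tablet example this gives 8 x 9 = 72 runs, which is the amount of work the alternative designs discussed later in the chapter aim to reduce.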
TABLE 2.2
SUMMARY STATISTICS FOR DATA SET OF TABLE 2.1

Design Variables                  Standard
 A   B   C          Mean          Deviation
 -   -   -          99.22         11.21
 +   -   -          96.67         11.42
 -   +   -         105.22          9.61
 +   +   -          96.33          7.81
 -   -   +         106.56         15.66
 +   -   +          97.00         10.87
 -   +   +         106.56          7.14
 +   +   +          98.67          6.00

The next section will present an overview of the statistical techniques associated with response surface methodology. In Section 2.3 the applicability of response surface methodology for robust design will be investigated. Section 2.4 will discuss the applicability of an alternative class of experimental designs called split-plot designs and show how the use of these designs can significantly reduce the amount of work required to conduct robust design experiments. Conclusions are given in Section 2.5.
2.2 AN OVERVIEW OF RESPONSE SURFACE METHODOLOGY The strategy for robust design experiments that will be considered in Section 2.3 is based on the statistical techniques associated with response surface methodology. This section will give an overview of response surface methodology, presenting some of the more common experimental designs that have been developed in this area. To motivate the response surface approach, suppose that there is some response of interest (for example, crushing strength in the tablet formulation example of Section 2.1.1), and a set of quantitative, continuous design variables that are of interest to the researcher (for example, the quantities of glidant, lactose, and disintegrant for the tablet formulation example). One possible objective for the researcher might be to understand and describe the relationship between the design variables and the response. This relationship can be described mathematically by
constructing an empirical model of the response as a function of the design variables over a range of interest. In the case where there is one design variable of interest, say percentage of lactose, the model of the response can be graphed as a curve on an x-y plot, as shown in Figure 2.1. When there are two factors of interest, the model of the response can be represented as a surface, often plotted as a contour diagram, as shown in Figure 2.2. On this plot the lines are contours of constant response and indicate the predicted response for the design variable combination. A response surface can be used to determine optimum factor settings for the response or to indicate a range of factor settings that yield an approximately equivalent response. This latter use indicates a region in the factor space where the response is robust to changes in the factors. For example, in Figure 2.2, it appears that the maximum crushing strength occurs when the percentage of lactose is 23% and the percentage of disintegrant is 3.3%. However, it can also be seen that near the optimum point the crushing strength is more stable or robust to changes in the quantity of lactose than to changes in the quantity of disintegrant.
Figure 2.1 Curve showing the effect of lactose on crushing strength. (Horizontal axis: percentage lactose, 20 to 40; vertical axis: crushing strength, 90 to 110.)
Figure 2.2 Response surface showing the effect of lactose and disintegrant on crushing strength. (Horizontal axis: percentage lactose, 20 to 40; vertical axis: percentage disintegrant, 2.0 to 4.0.)
To introduce some notation, let the response of interest be denoted by $\eta$ and suppose that there are $p$ quantitative, continuous design variables, $x_1, x_2, \ldots, x_p$, such that $\eta$ is a function of the design variables, that is

$$\eta = f(x_1, x_2, \ldots, x_p) \qquad (1)$$

where the form of $f$ is unknown. If the response is measured at a particular setting of the design variables then the measured response will differ from the true response due to experimental error, that is

$$y = f(x_1, x_2, \ldots, x_p) + \varepsilon \qquad (2)$$

where $y$ is the measured response and $\varepsilon$ is the error. In response surface methodology, it is frequently assumed that $f$ can be approximated in some region of the design variables by a low-degree polynomial. For example, if $p = 2$, and a first-order model is assumed appropriate then

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \qquad (3)$$

where $\beta_0$, $\beta_1$, and $\beta_2$ are constant coefficients that measure the mean and the effects of $x_1$ and $x_2$ on the response. It is assumed that the $x$'s are controlled and measured with no error in the experiment. Alternatively, the experimenter might assume that $f$ can be approximated by a second-order model so that

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon \qquad (4)$$
The rationale for using low-degree polynomials to approximate $f$ is based on a Taylor series expansion of $f$ around $x = 0$. The statistical techniques associated with response surface methodology are concerned primarily with two aspects of the experimentation process: the construction of experimental designs that yield data to permit the efficient modeling of the response surfaces, and the analysis of the experimental data and derived response surfaces. The statistical investigation of response surfaces has a history dating back to the pioneering work of George Box and his colleagues in the 1950s; see, for example, Box and Wilson [6], Box [7], Box and Youle [8]. An introduction to the concepts and techniques associated with response surface methodology can be found in Box, Hunter, and Hunter [9] (chapter 15) and Cornell [10]. For an extensive coverage of response surface methodology see Myers [11], Box and Draper [12], Khuri and Cornell [13]. The following sub-sections will describe some of the experimental designs that are commonly used to fit the first-order and second-order models. These designs will be called first-order and second-order designs, respectively.
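The Taylor series rationale mentioned at the start of this passage can be written out for two variables. The following is a sketch of the standard second-order expansion, not a formula quoted from the chapter; it assumes that $f$ is twice differentiable at the origin of the coded variables.

```latex
f(x_1, x_2) \approx f(\mathbf{0})
  + \sum_{i=1}^{2} \left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{0}} x_i
  + \frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{2}
      \left.\frac{\partial^2 f}{\partial x_i \, \partial x_j}\right|_{\mathbf{0}} x_i x_j
```

Under this reading, the coefficients of a second-order polynomial model correspond to $\beta_0 = f(\mathbf{0})$, $\beta_i = \partial f / \partial x_i$, $\beta_{ii} = \tfrac{1}{2}\, \partial^2 f / \partial x_i^2$, and $\beta_{12} = \partial^2 f / \partial x_1 \partial x_2$, all evaluated at the origin. Truncating after the linear terms gives the first-order model.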
2.2.1 First-order designs
Suppose that there are $p$ quantitative variables of interest, $x_1, x_2, \ldots, x_p$, and that in some region of interest the response can be approximated by the general first-order model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \qquad (5)$$

A class of experimental designs that are appropriate for obtaining data that will permit the estimation of the coefficients in equation (5) by least squares are the two-level factorial and fractional factorial designs. A single replicate of a two-level full factorial design in $p$ variables will have $2^p$ experimental runs composed of all possible combinations of the levels of the $p$ variables. Such a design will permit the estimation of all $p$ main effects, all possible two-factor interactions, all possible three-factor interactions, ..., and the $p$-factor interaction; a total of $2^p - 1$ main effects and interactions.

Frequently an experiment that required all $2^p$ experimental runs would be too costly to run, especially for $p$ not small. In these situations important information on the effects of the variables may be determined by running only a fraction of the full factorial design. With such designs, called fractional factorial designs, the ability to estimate the effect of some higher-order interactions is lost, and other effects and interactions are aliased together. This aliasing implies that the calculated effects cannot be unambiguously assigned to one of the effects or interactions that are aliased together in the design. To illustrate the concept of aliasing, consider an experiment with three variables, $x_1, x_2, x_3$, using the fractional factorial design given in Table 2.3. In a two-level fractional factorial design with the two levels of each factor coded -1 and +1, the estimate for the coefficient of a variable is calculated as half of the difference between the average response at the high and the low setting of the variable. Thus, $b_1$, the estimate of $\beta_1$, the coefficient for $x_1$, will be calculated as

$$b_1 = \frac{1}{2}\left(\bar{y}_{(x_1 = +1)} - \bar{y}_{(x_1 = -1)}\right) \qquad (6)$$
Now in Table 2.3 the column headed $x_2x_3$ has been derived by multiplying together the columns for $x_2$ and $x_3$. This column can be used to calculate the interaction effect of $x_2$ and $x_3$. The interaction effect of two variables measures how the effect of one variable on the response depends on the level of the other variable. From the table it can be seen that the column headed $x_2x_3$ is identical to the column headed $x_1$ and so the estimate of the interaction effect of $x_2x_3$ will be identical to the estimate of the coefficient for $x_1$. Thus $x_1$ is aliased with $x_2x_3$ and the calculated effect in equation (6) cannot be unambiguously assigned to the effect of $x_1$ or the interaction effect $x_2x_3$. In general, the aliasing of effects occurs when the calculation of the effects uses identical columns (apart from a switching of signs).
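The aliasing argument can be checked numerically with a small sketch. It assumes a half fraction defined by $x_1 x_2 x_3 = +1$ (so that the $x_1$ column equals the $x_2x_3$ column, as stated above); the run order and the noise-free response values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Half fraction of the 2^3 design with x1*x2*x3 = +1 (assumed defining relation).
runs = [r for r in product([-1, +1], repeat=3) if r[0] * r[1] * r[2] == +1]

x1 = [r[0] for r in runs]
x2x3 = [r[1] * r[2] for r in runs]


def coefficient(column, responses):
    """Half the difference between the mean response at +1 and at -1."""
    high = [y for level, y in zip(column, responses) if level == +1]
    low = [y for level, y in zip(column, responses) if level == -1]
    return 0.5 * (sum(high) / len(high) - sum(low) / len(low))


# Hypothetical noise-free response in which x1 has no effect at all and
# only the x2*x3 interaction is active, with coefficient 3.0.
y = [10.0 + 3.0 * r[1] * r[2] for r in runs]

b1 = coefficient(x1, y)
print(x1 == x2x3, b1)
```

In this sketch the estimate attributed to $x_1$ equals the interaction coefficient even though $x_1$ is inert, which is exactly the ambiguity that aliasing introduces.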
The degree of aliasing in a design can be summarized by stating the design's resolution. In general, a design has resolution R if no effect containing k variables is aliased with any effect containing fewer than R-k variables. Resolution is denoted by the appropriate Roman numeral. A resolution III design has all main effects unaliased with other main effects but may alias them with two-factor interactions. A resolution IV design does not alias main effects with two-factor interactions, but may alias two-factor interactions with one another. A resolution V design does not alias main effects with three-factor interactions nor alias two-factor interactions with one another, but may alias three-factor interactions with two-factor interactions.
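A minimal sketch of how resolution can be computed is given below. It relies on the standard equivalent characterization that the resolution of a regular two-level fraction equals the length of the shortest word in its defining relation; this characterization, the function name `resolution`, and the letter labels are assumptions of the sketch rather than material from the chapter. The defining words passed in are assumed to be independent.

```python
from itertools import combinations


def resolution(defining_words):
    """Length of the shortest word in the full defining relation.

    Each element of defining_words is a string such as "ABCD",
    meaning that the product of columns A, B, C and D equals I.
    """
    words = [frozenset(word) for word in defining_words]
    relation = set()
    for size in range(1, len(words) + 1):
        for subset in combinations(words, size):
            combined = frozenset()
            for word in subset:
                # Multiplying two words cancels any letter that appears twice.
                combined = combined ^ word
            relation.add(combined)
    return min(len(word) for word in relation)


print(resolution(["ABC"]))         # half fraction of 2^3 with C = AB
print(resolution(["ABCD"]))        # half fraction of 2^4 with D = ABC
print(resolution(["ABD", "ACE"]))  # quarter fraction of 2^5 with D = AB, E = AC
```

The third example shows why the whole defining relation has to be expanded: the product of the two generating words gives a further word, and the shortest of all of them determines the resolution.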
TABLE 2.3
FOUR-RUN DESIGN WITH THREE FACTORS

Run     x1     x2     x3     x2x3     Response
 1      -1     +1     -1      -1        y1
 2      +1     +1     +1      +1        y2
 3      +1     -1     -1      +1        y3
 4      -1     -1     +1      -1        y4
Although some information is lost when fractional factorial designs are used instead of full factorial designs, the advantage of these designs is that the total number of experimental runs can be reduced considerably. Furthermore, by careful choice of design and allocation of the variables to the design, and by following a sequential approach to experimentation, the experimenter can use fractional factorial designs to obtain information in an economical manner. An excellent description of fractional factorial designs and aliasing can be found in Box, Hunter, and Hunter [9]. This book also contains a description of how these designs can be blocked to remove additional sources of variation from the analysis, thereby increasing the precision of the estimates of the coefficients. There is also a discussion of how experimental designs can be run sequentially, designing the next experiment in the light of information that has been obtained, and the unresolved questions that remain, from the previous experiments. The rationale for sequential experimentation is that the best time to design an experiment is after the experiment has been run, since at that stage more is
known about the process than when the experiment was designed. Box, Hunter, and Hunter recommend that no more than 25% of the experimental budget be devoted to the first experiment, so that sufficient resources are retained to investigate questions that the data from the first experiment will raise. From the discussion of design resolution above, it should be clear that a resolution III design will permit the fitting of all the coefficients for the first-order model in equation (5). However, the resolution III design will alias main effects with two-factor interactions. The aliasing of effects in fractional factorial designs has implications for the fitting of the response surface. The aliasing in the resolution III design implies that the coefficients associated with the main effects will be biased by the presence of any interactions in the true (unknown) model. To illustrate this, consider fitting the first-order model with three variables, $x_1, x_2, x_3$,

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (7)$$

Suppose that the experimenter runs the $2^{3-1}$ fractional factorial design shown in Table 2.3. With this design each main effect is aliased with the two-factor interaction composed of the other two factors; that is, $x_1$ is aliased with $x_2x_3$, $x_2$ is aliased with $x_1x_3$, and $x_3$ is aliased with $x_1x_2$. This can be verified by multiplying together the appropriate columns, as was done for $x_2x_3$. Suppose that the true unknown model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \varepsilon \qquad (8)$$

Then with the design given in Table 2.3, $b_3$, the experimenter's estimate of $\beta_3$, will be biased by the coefficient $\beta_{12}$. In fact

$$E(b_3) = \beta_3 + \beta_{12} \qquad (9)$$

Similarly, if the true model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{23} x_2 x_3 + \varepsilon \qquad (10)$$

then with the design in Table 2.3, $b_1$, the experimenter's estimate of $\beta_1$, will be biased by the coefficient $\beta_{23}$; that is

$$E(b_1) = \beta_1 + \beta_{23} \qquad (11)$$
It can be seen that the use of a fractional factorial design can lead to biases in the estimation of the first-order coefficients from any interactions that are present in the true (unknown) model and that have been aliased with the main effects of the factors. Therefore, the experimenter needs to be aware of the aliasing that occurs with the use of a fractional factorial design and understand the biases that can result in the estimation of the coefficients of the model. A more complete discussion of the biases in estimation of coefficients from using fractional factorial designs and a description of how the biasing can be calculated for larger fractional factorial designs can be found in Box and Draper [12] (pp. 65-70) and Myers [11] (pp. 110-114). Some protection against the effect of biases in the estimation of the first-order coefficients can be obtained by running a resolution IV fractional factorial design. With such a design the two-factor interactions are aliased with other two-factor interactions and so would not bias the estimation of the first-order coefficients. In fact the main effects are aliased with three-factor interactions in a resolution IV design and so the first-order effects would be biased if there were third-order terms of the form $x_i x_j x_k$ in the true model. Fractional factorial designs use n = 4, 8, 16, 32, 64, ... runs, and can be constructed to carry up to p = n-1 variables. (A design that has p = n-1 variables in only n runs is called a saturated design since it cannot hold any more variables.) For values of n that are multiples of 4 but not a power of 2, that is, n = 12, 20, 24, 28, 36, ..., an alternative class of first-order design that can be used are the Plackett-Burman designs; see Plackett and Burman [14]. Plackett-Burman designs may be of use in screening situations, that is, in situations when the experimenter wishes to examine many variables but believes that only a few are of importance.
Furthermore, Plackett-Burman designs are particularly useful when following a sequential experimental strategy since a resolution IV design can be constructed from a Plackett-Burman design by augmenting it with the foldover design; that is, the design where all of the runs have the signs of all the variables switched. An example of a Plackett-Burman design with 11 factors in 12 runs is given in Table 2.4.
It can be seen that this 12-run design is generated by starting with a particular row of -1's and +1's and generating the next row by cycling through the variables and shifting each sign one place to the right. This is repeated until the first eleven runs have been obtained, and then the final run is constructed by adding a final row of -1's. The starting rows for 12-, 20-, and 24-run designs given by Plackett and Burman [14] are as follows:

n=12: +1 +1 -1 +1 +1 +1 -1 -1 -1 +1 -1
n=20: +1 +1 -1 -1 +1 +1 +1 +1 -1 +1 -1 +1 -1 -1 -1 -1 +1 +1 -1
n=24: +1 +1 +1 +1 +1 -1 +1 -1 +1 +1 -1 -1 +1 +1 -1 -1 +1 -1 +1 -1 -1 -1 -1
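TABLE 2.4
PLACKETT-BURMAN DESIGN IN 12 RUNS

Run    A    B    C    D    E    F    G    H    I    J    K
  1   +1   +1   -1   +1   +1   +1   -1   -1   -1   +1   -1
  2   -1   +1   +1   -1   +1   +1   +1   -1   -1   -1   +1
  3   +1   -1   +1   +1   -1   +1   +1   +1   -1   -1   -1
  4   -1   +1   -1   +1   +1   -1   +1   +1   +1   -1   -1
  5   -1   -1   +1   -1   +1   +1   -1   +1   +1   +1   -1
  6   -1   -1   -1   +1   -1   +1   +1   -1   +1   +1   +1
  7   +1   -1   -1   -1   +1   -1   +1   +1   -1   +1   +1
  8   +1   +1   -1   -1   -1   +1   -1   +1   +1   -1   +1
  9   +1   +1   +1   -1   -1   -1   +1   -1   +1   +1   -1
 10   -1   +1   +1   +1   -1   -1   -1   +1   -1   +1   +1
 11   +1   -1   +1   +1   +1   -1   -1   -1   +1   -1   +1
 12   -1   -1   -1   -1   -1   -1   -1   -1   -1   -1   -1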
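The cyclic construction and the foldover augmentation can be sketched as follows. The starting row is the n = 12 row quoted from Plackett and Burman; the function names are illustrative, and the orthogonality checks are a standard way of inspecting such designs rather than a procedure given in the chapter.

```python
from itertools import combinations

# Starting row for n = 12, as quoted from Plackett and Burman in the text.
START_12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman(first_row):
    """Shift the starting row cyclically to the right, then add a row of -1's."""
    k = len(first_row)
    rows = [first_row[k - i:] + first_row[:k - i] for i in range(k)]
    rows.append([-1] * k)
    return rows


def foldover(design):
    """Append a copy of the design with the sign of every entry switched."""
    return design + [[-x for x in row] for row in design]


def columns_orthogonal(design):
    """True if every pair of factor columns has zero inner product."""
    k = len(design[0])
    return all(
        sum(row[i] * row[j] for row in design) == 0
        for i, j in combinations(range(k), 2)
    )


def mains_clear_of_two_factor_interactions(design):
    """True if every factor column is orthogonal to every two-factor product."""
    k = len(design[0])
    return all(
        sum(row[m] * row[i] * row[j] for row in design) == 0
        for m in range(k)
        for i, j in combinations(range(k), 2)
    )


design = plackett_burman(START_12)
folded = foldover(design)
print(len(design), columns_orthogonal(design))
print(mains_clear_of_two_factor_interactions(design))
print(len(folded), mains_clear_of_two_factor_interactions(folded))
```

The last check holds for any design augmented with its foldover, because each product of three columns changes sign in the mirrored runs and therefore cancels over the combined design.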
Plackett and Burman [14] give the method of design construction for all values of n that are multiples of 4 up to 100 except n=92. A disadvantage of Plackett-Burman designs is that the structure of the aliasing is more complex than in the fractional factorial designs, so that it is harder to determine the effect of any biases in the estimates of the coefficients. Draper and Lin [15] indicate how, in the situation where only a few variables are important, additional runs can be added to Plackett-Burman designs to yield experimental designs with higher resolution or clearer alias structure. Hamada and Wu [16], under the assumptions that there are few significant variables and that any variable in a significant interaction is likely to have a significant main effect, show how it may be
possible to study a few interactions in Plackett-Burman designs without adding any runs. Box and Meyer [17] describe how a Bayesian analysis can reveal active variables in the complex aliasing that occurs with Plackett-Burman designs. In conclusion, Plackett-Burman designs tend to have a complex alias structure and so the presence of interactions in the true model induces a complex bias structure on the first-order coefficients. Therefore, it is recommended that these designs only be used if the assumption of no second-order interactions is reasonable, or as a part of a sequential strategy of experimentation that would generate a resolution IV design by augmenting the Plackett-Burman design with its foldover design.

2.2.2 Adding center points
When an unreplicated experiment is run, the error or residual sum of squares is composed of both experimental error and lack-of-fit of the model. Thus, formal statistical significance testing of the factor effects can lead to erroneous conclusions if there is lack-of-fit of the model. Therefore, it is recommended that the experiment be replicated so that an independent estimate of the experimental error can be calculated and both lack-of-fit and the statistical significance of the factor effects can be formally tested. In some experimental contexts, however, each experimental run is expensive. Thus it is infeasible to replicate each design point of the experiment to obtain an estimate of the experimental error. When all of the variables are quantitative, an estimate of the experimental error can be obtained by adding to the full factorial, fractional factorial or Plackett-Burman design, a number of runs at the center of the design. The center of the design is the midpoint between the low and high settings of the two-level factors in the experiment. Thus, if there are $p$ variables, and the levels of the variables have been coded (-1, +1), then the center of the design is $(x_1, x_2, \ldots, x_p) = (0, 0, \ldots, 0)$.
If the center point is replicated $n_0$ times in the experiment, then the variance of the response at those runs provides an estimate of the experimental error with $n_0 - 1$ degrees of freedom to statistically test both the lack-of-fit of the model and the significance of the coefficient estimates of the model. Another reason for augmenting the two-level design with center points is that these points allow for an overall test of curvature. It is clear that with only two levels for each variable it is impossible to detect any quadratic effect of the variables. Thus, the underlying model is assumed to
be linear over the experimental region. To examine the quadratic effect of all the variables requires each variable to be run with at least three levels. An overall test of the presence of quadratic effects can be obtained by comparing the average of the center point runs, $\bar{y}_0$, with the average of the cube portion of the design, $\bar{y}_c$, since the expected value of $(\bar{y}_c - \bar{y}_0)$ is

$$ E(\bar{y}_c - \bar{y}_0) = \sum_{i=1}^{p} \beta_{ii} $$

where $\beta_{ii}$ is the quadratic effect of factor $x_i$. A formal statistical test can be constructed by comparing

$$ F = \frac{(\bar{y}_c - \bar{y}_0)^2}{\hat{\sigma}^2 \left( \dfrac{1}{n_c} + \dfrac{1}{n_0} \right)} $$

with the $F_{1,(n_0-1)}$ distribution, where $\hat{\sigma}^2$ is the estimate of the experimental error from the variance of the $n_0$ center point runs, and $n_c$ is the number of runs in the cube portion of the design. If the F-test is significant then there is evidence of a quadratic effect due to at least one of the variables. With the present design, however, the investigator will not be able to determine which of the variables has a quadratic effect on the response. Additional experimentation, perhaps by augmenting the current design with some star points to construct a central composite design (see section on central composite designs below), will need to be conducted to fully explore the nature of the quadratic response surface.

2.2.3 Second-order designs

Suppose that there are $p$ variables of interest, $x_1, x_2, \ldots, x_p$, and that in some region of interest the response can be approximated by the general second-order model

$$ y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \sum_{i=1}^{p} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon \qquad (14) $$

that is, y = intercept + (first-order terms) + (quadratic terms) + (cross-product terms) + $\varepsilon$.
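As an illustration of the four groups of terms in the second-order model, the following minimal Python sketch lists the terms for $p$ variables and expands one run into the corresponding row of the model matrix. The function names `second_order_terms` and `model_row` are hypothetical and are introduced here only for illustration.

```python
from itertools import combinations

def second_order_terms(p):
    """Return labels for the terms of a full second-order model in p variables."""
    names = [f"x{i}" for i in range(1, p + 1)]
    terms = ["1"]                                              # intercept
    terms += names                                             # first-order terms
    terms += [f"{n}^2" for n in names]                         # pure quadratic terms
    terms += [f"{a}*{b}" for a, b in combinations(names, 2)]   # cross-product terms
    return terms

def model_row(x):
    """Expand one run x = (x1, ..., xp) into its second-order model-matrix row."""
    p = len(x)
    row = [1.0] + list(x) + [v * v for v in x]
    row += [x[i] * x[j] for i, j in combinations(range(p), 2)]
    return row

print(len(second_order_terms(5)))   # 21 = 1 + 5 + 5 + 10
```

The number of terms is $(p+1)(p+2)/2$, so any design used to fit the model by least squares needs at least that many distinct runs.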
This section will describe some of the classes of experimental designs that are appropriate for obtaining data that will permit the estimation of the coefficients in equation (14) by least squares.

Three-level designs

It is obvious that to be able to estimate the quadratic coefficients, $\beta_{11}, \beta_{22}, \beta_{33}, \ldots, \beta_{pp}$, in equation (14), it is necessary to have at least three distinct levels or settings for the variables. This suggests that a suitable design for estimating the coefficients of the second-order model would be a single replicate of a three-level full factorial design in $p$ variables. This design will have $3^p$ experimental runs composed of all possible combinations of the levels of the $p$ variables. If there are only $p = 2$ or $p = 3$ variables then a full factorial design is often feasible. However, the number of runs required becomes prohibitively large as the number of variables increases. For example, with $p = 5$ variables, the second-order model requires the estimation of 21 coefficients: the mean, five main effects, five pure quadratic terms, and ten two-factor interactions. The three-level full factorial design would require $3^5 = 243$ runs.

It might be supposed that a smaller design that permitted the estimation of the coefficients of interest could be constructed by taking a fraction of the full factorial. However, the aliasing of three-level designs is very complex and so fractionating a three-level design will not be pursued. The interested reader may refer to Kempthorne [18]. Therefore, unless the number of factors is small, three-level designs are not usually feasible for response surface studies.
Central Composite Designs

An alternative approach to constructing designs for estimating second-order models is to build on the designs constructed for the first-order model. In Section 2.2.1, we discussed the use of fractional factorial designs to estimate the coefficients of the first-order model. It was noted that a fractional factorial design of resolution V would yield
unbiased estimates of all coefficients for the main effects and two-factor interactions. To estimate the quadratic coefficients of the second-order model, this design could be augmented with additional points at which the variables take settings not used in the fractional design, so that each variable has at least three settings. A class of augmented designs, first proposed by Box and Wilson [6] and frequently applied in response surface work, is the central composite design. Composite designs consist of:

• a full or fractional factorial design of at least resolution V; the number of runs in this design will be $n_c = 2^{p-k}$, these runs forming a cube portion with coordinates of the form $(\pm 1, \pm 1, \ldots, \pm 1)$;
• $n_s = 2p$ star points with coordinates $(\pm\alpha, 0, 0, \ldots, 0)$, $(0, \pm\alpha, 0, \ldots, 0)$, ..., $(0, 0, \ldots, \pm\alpha)$;
• $n_0$ center points $(0, 0, 0, \ldots, 0)$.

The use of the terms cube, star and center points is descriptive of the design pattern, as is clear when there are $p = 3$ variables. In that case the points of the central composite design, shown in Table 2.5, can be represented by the points in Figure 2.3. In Table 2.5, runs 1-8 are the cube portion, runs 9-14 are the star portion, and runs 15-17 are the center points. In general the cube portion might be replicated $r_c$ times and the star portion might be replicated $r_s$ times. Also, it might be possible to use a fractional factorial design of resolution less than V if the experimenter is prepared to assume that certain interactions are negligible. A central composite design in four variables is shown in Table 2.6. In this table, runs 1-16 are the cube portion, runs 17-24 are the star portion, and runs 25-27 are the center points.

The central composite design has several advantages over the three-level design. Firstly, the total number of runs in a central composite design is frequently less than that required for a three-level full factorial design. For example, with $p = 5$ variables 243 runs would be required for the three-level full factorial design, whereas with single replicates for the cube and star portions and four center points, the total number of runs required for a central composite design would be 16 + 10 + 4 = 30 (for the cube portion a $2^{5-1}$ fractional factorial design could be used).
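The cube, star and center portions can be assembled programmatically. The sketch below is a minimal illustration, not a reference implementation: the function name `central_composite` is hypothetical, and the optional half fraction assumes that the last variable is generated as the product of all the others (for $p = 5$ this is the defining relation I = ABCDE, which has resolution V). The run order produced differs from the standard order used in Table 2.5.

```python
from itertools import product
from math import prod

def central_composite(p, alpha, n_center, half_fraction=False):
    """Return the runs of a central composite design as a list of tuples."""
    if half_fraction:
        # 2^(p-1) cube portion: last variable = product of the other p-1 (assumed generator)
        cube = [base + (prod(base),) for base in product((-1, 1), repeat=p - 1)]
    else:
        cube = list(product((-1, 1), repeat=p))
    star = []
    for i in range(p):                      # one pair of star points per variable
        for sign in (-1, 1):
            point = [0] * p
            point[i] = sign * alpha
            star.append(tuple(point))
    center = [(0,) * p] * n_center          # replicated center points
    return cube + star + center

print(len(central_composite(3, 1.682, 3)))                      # 8 + 6 + 3 = 17
print(len(central_composite(5, 2.0, 4, half_fraction=True)))    # 16 + 10 + 4 = 30
```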
Figure 2.3 Central composite design with three variables (legend: cube points, star points, center point).
TABLE 2.5 CENTRAL COMPOSITE DESIGN WITH THREE VARIABLES

Run     A     B     C
  1    -1    -1    -1
  2    +1    -1    -1
  3    -1    +1    -1
  4    +1    +1    -1
  5    -1    -1    +1
  6    +1    -1    +1
  7    -1    +1    +1
  8    +1    +1    +1
  9    -α     0     0
 10    +α     0     0
 11     0    -α     0
 12     0    +α     0
 13     0     0    -α
 14     0     0    +α
 15     0     0     0
 16     0     0     0
 17     0     0     0
A second advantage of the central composite design is that it lends itself to a sequential approach to experimentation, since the design can be built in sections. For example, an experimenter might initially assume that the response surface can be adequately represented by a first-order model, possibly with the addition of some two-factor interaction terms. Thus they might initially run a resolution V fractional factorial design. Following the analysis the experimenter might suspect some nonlinearity and so augment the first design with some center points. If examination of the response at the center point runs indicates the presence of quadratic effects, then the experimenter might be interested in fitting a second-order model. Data to enable this to be accomplished can be obtained by augmenting the design with star points to generate the central composite design. In some situations design augmentation can be accomplished so that the designs are orthogonally blocked, thus allowing for block differences to be eliminated in the analysis and estimation of the coefficients.

The central composite design gives the experimenter the flexibility of choosing the value of $\alpha$, the distance of the star points from the center of the design. One possible criterion is to choose $\alpha$ so that the central composite design is rotatable. A rotatable design is one in which the precision of the predicted response is the same at all points equidistant from the center point (0, 0, ..., 0). Rotatability is a useful property for a design since it relieves the experimenter from making any assumption that the underlying response surface is oriented in a particular direction. Rotatability ensures that, whatever the orientation of the response surface, the precision of the predicted response will depend only on the distance from the center of the design and not on the direction.
It can be shown that for a central composite design to be rotatable the distance of the star points from the design center is $\alpha = (2^{p-k} r_c / r_s)^{1/4}$, where $2^{p-k}$ is the number of factorial points. Therefore, with single replicates of the cube and star portions, if $p = 5$ and $k = 1$, then the design would be rotatable if $\alpha = (2^{5-1})^{1/4} = 2$. For $p = 3$, $\alpha = (2^3)^{1/4} = 1.68$ generates a rotatable design.

A possible disadvantage of the central composite design is that it requires five levels of each variable $(0, \pm 1, \pm\alpha)$. In some situations it might be necessary or preferable to have only three different settings of the variables. In this case $\alpha$ can be chosen to be 1 and the design is called a face-centered composite design. These designs are not rotatable.
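The rotatability condition is easy to evaluate numerically. The following short sketch (the function name `rotatable_alpha` is hypothetical) computes $\alpha$ from the number of variables $p$, the degree of fractionation $k$, and the replication counts $r_c$ and $r_s$ of the cube and star portions.

```python
def rotatable_alpha(p, k=0, r_c=1, r_s=1):
    """Star-point distance giving a rotatable central composite design."""
    n_factorial = 2 ** (p - k)              # number of factorial (cube) points
    return (n_factorial * r_c / r_s) ** 0.25

print(round(rotatable_alpha(5, k=1), 2))    # 2.0
print(round(rotatable_alpha(3), 2))         # 1.68
```

Setting `alpha = 1` instead gives the face-centered design, which trades rotatability for having only three levels per variable.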
TABLE 2.6 CENTRAL COMPOSITE DESIGN WITH FOUR VARIABLES

Run     A     B     C     D
  1    -1    -1    -1    -1
  2    +1    -1    -1    -1
  3    -1    +1    -1    -1
  4    +1    +1    -1    -1
  5    -1    -1    +1    -1
  6    +1    -1    +1    -1
  7    -1    +1    +1    -1
  8    +1    +1    +1    -1
  9    -1    -1    -1    +1
 10    +1    -1    -1    +1
 11    -1    +1    -1    +1
 12    +1    +1    -1    +1
 13    -1    -1    +1    +1
 14    +1    -1    +1    +1
 15    -1    +1    +1    +1
 16    +1    +1    +1    +1
 17    -α     0     0     0
 18    +α     0     0     0
 19     0    -α     0     0
 20     0    +α     0     0
 21     0     0    -α     0
 22     0     0    +α     0
 23     0     0     0    -α
 24     0     0     0    +α
 25     0     0     0     0
 26     0     0     0     0
 27     0     0     0     0
The choice of the number of center points and the blocking of composite designs is discussed in Myers [11], Box and Draper [12], and Khuri and Cornell [13]. One final point: it should be noted that the experimenter is not constrained to use a resolution V design or to add star points for all of the factors. In particular, if it is believed that certain two-factor interactions
can be assumed negligible, then it might be possible to use a resolution IV design with a particular assignment of variables to columns of the design. Alternatively, if there are certain pure quadratic effects that are deemed unimportant, then star points for those variables need not be added to the design.

Box-Behnken Designs
Another alternative to the $3^p$ full factorial is the Box-Behnken design (Box and Behnken [19]). These designs are a class of incomplete three-level factorial designs that either meet, or approximately meet, the criterion of rotatability. A Box-Behnken design for $p = 3$ variables is shown in Table 2.7. This design will estimate the ten coefficients of the second-order model in only fifteen runs, in contrast with the $3^3 = 27$ runs required by the full factorial design. This design is shown graphically in Figure 2.4.
TABLE 2.7 BOX-BEHNKEN DESIGN FOR THREE VARIABLES

Runs    A     B     C
  1    -1    -1     0
  2    +1    -1     0
  3    -1    +1     0
  4    +1    +1     0
  5    -1     0    -1
  6    +1     0    -1
  7    -1     0    +1
  8    +1     0    +1
  9     0    -1    -1
 10     0    +1    -1
 11     0    -1    +1
 12     0    +1    +1
 13     0     0     0
 14     0     0     0
 15     0     0     0
Table 2.8 gives the runs for a Box-Behnken design in four variables. In this table, the runs are grouped in sets of four, each set of four being
composed of all the combinations of ±1 for the two variables indicated, the other two variables being set at 0. The design is completed with three center points, runs 25-27.
Figure 2.4 Box-Behnken design for three variables.
TABLE 2.8 BOX-BEHNKEN DESIGN FOR FOUR VARIABLES

Runs      A     B     C     D
 1-4     ±1    ±1     0     0
 5-8      0     0    ±1    ±1
 9-12    ±1     0     0    ±1
13-16     0    ±1    ±1     0
17-20    ±1     0    ±1     0
21-24     0    ±1     0    ±1
25-27     0     0     0     0
As was mentioned above for central composite designs, the experimenter can modify these designs if they believe that certain two-factor interactions can be assumed negligible. Box and Jones [20,21] show how this can be done to yield what they call a modified Box-Behnken design that requires fewer runs than the standard Box-Behnken design. A table of Box-Behnken designs for $p$ = 3, 4, ..., 7 variables can be found in Box and Draper [12], and for $p$ = 3, 4, ..., 7, 9, 10, 11, 12, 16 variables in Box and Behnken [19].
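The pairwise construction used in the three- and four-variable Box-Behnken designs (a $2^2$ factorial in each pair of variables with the remaining variables held at 0, plus center points) can be sketched as follows. The function name `box_behnken_pairs` is hypothetical, and the sketch is only claimed to match the standard designs for $p = 3$ and $p = 4$; the published designs for larger $p$ use other groupings of variables. The run order differs from that of the tables.

```python
from itertools import combinations, product

def box_behnken_pairs(p, n_center=3):
    """Pairwise Box-Behnken construction (matches the standard design for p = 3, 4)."""
    runs = []
    for i, j in combinations(range(p), 2):          # every pair of variables
        for a, b in product((-1, 1), repeat=2):     # 2^2 factorial in that pair
            run = [0] * p                            # other variables at the mid level
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs += [(0,) * p] * n_center                    # center points
    return runs

print(len(box_behnken_pairs(3)))   # 12 + 3 = 15
print(len(box_behnken_pairs(4)))   # 24 + 3 = 27
```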
2.2.4 Optimal designs

Optimal design theory provides an alternative approach to the selection of an experimental design. For a description of the theory of optimal design see, for example, Atkinson and Donev [22]. To motivate this approach, suppose that an experiment with $n$ runs will be conducted and a model with $k$ coefficients is to be fit to the data. This model can be represented in matrix notation as

$$ y = X\beta + \varepsilon \qquad (15) $$

where $y$ is an $n \times 1$ column vector of response values, $\beta$ is a $k \times 1$ column vector of coefficients, $X$ is an $n \times k$ matrix that defines the runs in the experiment and $\varepsilon$ is an $n \times 1$ column vector of errors. It is commonly assumed that the errors are independent and follow a normal distribution with variance $\sigma^2$. One of the questions that the experimenter needs to consider is how to choose good values for the elements of $X$. It can be shown that the variance of the coefficient estimates, $b$, of $\beta$ is $\sigma^2 (X^T X)^{-1}$. Furthermore, the variance of the predicted response at any setting of the variables is also a function of $(X^T X)^{-1}$. Thus one way to choose good values for the elements of $X$ is to choose them so that $(X^T X)^{-1}$ is, in some sense, "small". A number of criteria have been developed, the most popular of which are:

• A-optimality criterion: minimize trace $(X^T X)^{-1}$,
• D-optimality criterion: minimize det $(X^T X)^{-1}$,
• E-optimality criterion: minimize max eigenvalue $(X^T X)^{-1}$,
• G-optimality criterion: minimize maximum value of the variance of the predicted response.
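To make the four criteria concrete, the sketch below evaluates them for a given model matrix using NumPy. It only scores a design; it is not a design-construction algorithm. The function name `design_criteria` is hypothetical, the prediction variance is expressed in units of $\sigma^2$, and the G-criterion is evaluated over a user-supplied set of candidate points, here assumed to be the design points themselves.

```python
import numpy as np

def design_criteria(X, candidates):
    """Evaluate the A, D, E and G criteria for model matrix X (smaller is better)."""
    M_inv = np.linalg.inv(X.T @ X)
    # scaled prediction variance x' (X'X)^-1 x at each candidate model-matrix row
    pred_var = np.einsum("ij,jk,ik->i", candidates, M_inv, candidates)
    return {
        "A": np.trace(M_inv),
        "D": np.linalg.det(M_inv),
        "E": np.linalg.eigvalsh(M_inv).max(),
        "G": pred_var.max(),
    }

# 2^2 factorial with a first-order model: columns are intercept, x1, x2
X = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1],
              [1,  1,  1]], dtype=float)
print(design_criteria(X, X))
```

For this orthogonal design $X^T X = 4I$, so all four criteria can be checked by hand.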
Efficient algorithms have been developed that construct D-optimal designs for a given response model, candidate design points, and number of runs (see, for example, Mitchell [23]). Optimal experimental design can be useful in situations where:
• the experimental design region is irregularly shaped due to constraints on the variables,
• it is necessary to augment an existing design,
• designs must be constructed for special models or with a limited number of runs,
• the experimenter has prior knowledge on the form of the model and desires coefficient estimates in a minimal number of runs.

There has been an extensive critique of the role of optimal design theory in practical experimental design; see, for example, Box [24], Box and Draper [12]. One of the underlying assumptions behind optimal experimental design is that, since the designs are only optimal within the region defined by the candidate design points, there is a well-defined region of interest within which experiments can be run. The assumption is that the experimenter has no interest in the response outside of the region defined by the candidate points. In typical response surface studies, however, the region of interest might be poorly defined and might change as the investigation proceeds. Thus, it might be advisable to design the experiment to obtain information about the response beyond the current region of interest defined by the candidate points.

Furthermore, optimal design theory assumes that the model is true within the region defined by the candidate design points, since the designs are optimal in terms of minimizing variance as opposed to bias due to lack-of-fit of the model. In reality, the response surface model is only assumed to be a locally adequate polynomial approximation to the truth; it is not assumed to be the truth. Consequently, the experimental design chosen should reflect doubt in the validity of the model by allowing for model lack-of-fit to be tested.

2.2.5 Other second-order designs

There are many other second-order designs that have been proposed in the statistical literature. Some of these designs are based on variants of the central composite design. For example, the designs proposed by Hartley [25] use cube portions in the central composite design of resolution less than V with no two-factor interactions aliased with one another.
Second-order designs can also be constructed using irregular fractional factorial designs for the cube portion. Irregular fractional factorial designs (see, for example, John [26] and Maclean and Anderson [27]) are non-orthogonal fractions of a full factorial design. Some second-order designs, such as the uniform shell designs (Doehlert [28]), have been proposed which are not based on the central composite design. A more thorough treatment of additional second-order designs can be found in the texts mentioned earlier: see Myers [11], Box and Draper [12], Khuri and Cornell [13].
2.2.6 Interim Summary

This section has given an overview of some of the experimental designs that are suitable for collecting data to estimate the coefficients of the first-order and second-order models. Many of these designs are based on factorial and fractional factorial designs. It is clear that if the first-order model (equation (5)) is assumed to be valid, then a resolution III design or a Plackett-Burman design can be used, since this design will estimate all of the $\beta_i$ without bias. However, it has been shown that with a resolution III design the estimates of the coefficients $\beta_i$ will be biased if the true (unknown) model contains interactions. The biases and lack-of-fit of the first-order model due to interaction effects can be examined by running a resolution IV design, which will yield unbiased estimates of all of the $\beta_i$. A resolution IV design can be obtained from a Plackett-Burman or resolution III fractional factorial design by augmenting the initial design with its foldover design. If the true model contains quadratic terms then the estimate of the intercept, $\beta_0$, of the first-order model will be biased. The lack-of-fit of the first-order model due to quadratic effects can be tested by adding center points to the design.

To construct the central composite design to estimate the coefficients of the second-order model (equation (14)), usually a fractional factorial design of at least resolution V is used. In this case, if the model is valid, then all of the estimates of the main effect coefficients, $\beta_i$, and the interaction coefficients, $\beta_{ij}$, are unbiased. Alternatives to the central composite designs for estimating the coefficients of the second-order model are the Box-Behnken designs or the designs referenced in Section 2.2.5. Table 2.9 shows the minimum number of runs for a single replicate of a fractional factorial design with the desired resolution for $p$ variables, $p = 3, \ldots, 11$.
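The center-point test for quadratic effects described in Section 2.2.2 can be sketched in a few lines. The statistic is the squared difference between the cube average and the center-point average, scaled by the center-point variance; it is referred to an F distribution with 1 and $n_0 - 1$ degrees of freedom. The function name `curvature_f` and the response values are hypothetical, chosen only so that the arithmetic is easy to follow; at least two center points are needed to estimate the variance.

```python
from statistics import mean, variance

def curvature_f(cube_y, center_y):
    """Overall curvature F statistic and its degrees of freedom."""
    n_c, n_0 = len(cube_y), len(center_y)
    diff = mean(cube_y) - mean(center_y)        # estimates the sum of the beta_ii
    sigma2 = variance(center_y)                 # pure-error variance, n_0 - 1 df
    f_stat = diff ** 2 / (sigma2 * (1 / n_c + 1 / n_0))
    return f_stat, (1, n_0 - 1)

# hypothetical responses: a 2^2 cube portion and three center points
f_stat, df = curvature_f([10, 12, 11, 13], [14, 15, 16])
print(f_stat, df)   # F is approximately 21 on (1, 2) degrees of freedom
```

The statistic would then be compared with a tabulated critical value of the $F_{1,(n_0-1)}$ distribution.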
2.3 ROBUST DESIGN AND RESPONSE SURFACE METHODOLOGY

This section will describe how the techniques associated with response surface methodology, outlined in Section 2.2, can be applied to designing a product or process that is insensitive, or robust, to variation that is difficult
or impossible to control. Two alternative strategies will be outlined in this introduction and will be considered in detail in Sections 2.3.1 - 2.3.5.
TABLE 2.9 MINIMUM NUMBER OF RUNS FOR DESIRED RESOLUTION

Number of            Resolution
Variables      III      IV       V
     3           4       8       8
     4           8       8      16
     5           8      16      16
     6           8      16      32
     7           8      16      64
     8          16      16      64
     9          16      32     128
    10          16      32     128
    11          16      32     128

In the first approach it is assumed that the effect of environmental variation on the response is investigated by running a replicated experiment. The replication enables the variation of the response to be estimated at each design point. In this scenario the environmental variation is uncontrolled during the experiment but is assumed to affect the response in a random manner and is captured in the replication. It is acknowledged that the variation that is measured at each design point will be from many sources, including the sources of the environmental variation. However, with this approach the objective of finding design variable settings that minimize the variation in the response can be achieved, although no information will be gained as to how the design variable settings might make the response robust to particular sources of environmental variation. The design and analysis of experiments with this first approach will be covered in Sections 2.3.1 and 2.3.2.

In the second approach, the environmental variation is deliberately introduced into the experiment by including in the experimental design environmental variables that are controlled at predetermined settings during the experiment. In this approach it will be possible to estimate how much of the variation is due to the environmental variables and how much is due to unassignable sources. It will also be possible to determine how particular design variable settings might make the response robust to the sources of environmental variation considered in the experiment. The
design and analysis of experiments with this second approach will be covered in Sections 2.3.3 and 2.3.4, and an example will be given in Section 2.3.5. It will be seen that some of the methods for analysis of experiments conducted under the first approach can also be applied to data derived from experiments conducted under the second approach.

2.3.1 Response surface modeling of the mean and standard deviation

In Section 2.2 it was shown that response surface methodology can be applied to enable a researcher to model the effect of multiple quantitative variables on a response with a low-degree polynomial. Frequently, response surface techniques have focused on the mean response as the only response of interest. However, by regarding the variation in the response as an additional response of interest, the researcher can investigate how to achieve a mean response that is on target with minimum variation. In particular, if a researcher replicates each design point in an experiment, then an estimate of the standard deviation at each point can be calculated and used to model the effect of the variables on the variability of the response.

To illustrate this approach, suppose that in an experiment on tablet formulation a researcher is interested in understanding how three quantitative variables, pressure force, lactose quantity, and disintegrant quantity, affect crushing strength. Suppose that the objective is to have a mean crushing strength of 125 N with minimum variation. If it is believed that the effect of the variables on the crushing strength can be adequately represented by a second-order polynomial then a 17-run central composite design, shown in Table 2.5, could be run to estimate all of the terms in the second-order model

$$ y = \beta_0 + \sum_{i=1}^{3} \beta_i x_i + \sum_{i=1}^{3} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon $$

where the $x_i$ are coded settings for the three design variables. Now if each of the 15 distinct design points in the central composite design (eight cube points, six star points and the center point) is replicated five times, so that the complete design has 75 runs, then at each design point we can calculate the average response and the standard deviation of the response. The analysis techniques associated with response surface methodology can then be applied to fit separate models to
the mean and the standard deviation. The researcher is then in a position to determine settings of the variables that will give a mean response that is close to target with minimum variation. (It should be noted that many authors suggest that, for theoretical reasons, the log of the standard deviation, ln(s), be modeled rather than s; see, for example, Bartlett and Kendall [29] and Box [30].) In the context of the tablet formulation example, the model of the mean and the standard deviation can be used to determine which factors affect the mean crushing strength only, which affect the variability in crushing strength only, and which affect both the mean and the variability. The researcher can then choose settings of the variables that will give a mean crushing strength that is consistently close to 125 N. At this stage it is important to stress that the run order of the experimental design, including all replicates, should be completely randomized, since the purpose of the replicates is to provide an estimate of the total variation in the process or product at each design combination. If the replicated experiment is not completely randomized, then it is likely that the variation at each design point will be under-estimated since it will not include a component due to any variation in the set-up of the design variables. This could lead to erroneous conclusions about robust design combinations if certain design combinations have less set-up variation than others. The advantages of using the response surface approach to study both the mean and the variability are that it is easy to apply, no new methods of analysis are required, and the standard analysis methods can be used to bring insight to bear on the dual objective of the mean response and the variability. Some of these methods of analysis are considered in Section 2.3.2. 
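The two-step analysis (summarize the replicates at each design point, then fit separate least-squares models to the mean and to ln(s)) can be sketched as below. To keep the example short it uses a replicated $2^2$ design with a first-order model rather than a replicated central composite design with a second-order model; the data, the function names `summarize` and `fit`, and the `expand` argument are all hypothetical.

```python
import math
from statistics import mean, stdev
import numpy as np

def summarize(replicates):
    """Map {design point: [responses]} to sorted points, means and ln(s) values."""
    points = sorted(replicates)
    ybar = [mean(replicates[pt]) for pt in points]
    log_s = [math.log(stdev(replicates[pt])) for pt in points]
    return points, ybar, log_s

def fit(points, response, expand):
    """Least-squares fit; expand(pt) returns the model-matrix row for a point."""
    X = np.array([expand(pt) for pt in points], dtype=float)
    coef, *_ = np.linalg.lstsq(X, np.array(response, dtype=float), rcond=None)
    return coef

# hypothetical replicated 2^2 experiment (three replicates per design point)
data = {(-1, -1): [100, 102, 104], (1, -1): [108, 112, 116],
        (-1, 1): [118, 120, 122], (1, 1): [126, 130, 134]}
first_order = lambda pt: (1, *pt)               # intercept, x1, x2
points, ybar, log_s = summarize(data)
mean_coef = fit(points, ybar, first_order)      # model for the mean
log_s_coef = fit(points, log_s, first_order)    # model for ln(s)
print(mean_coef, log_s_coef)
```

In this made-up data set both variables shift the mean, while only the first variable changes the spread, which is the kind of separation the text describes.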
As was mentioned above, a disadvantage of this approach is that the variation that is measured at each replicated design point will be from many sources, including sources of environmental variation, and it will be impossible to attribute the variation to a particular source. Another disadvantage of this approach is that it assumes that the variation experienced at the design points during the course of the experiment is similar to that experienced in practice in the real world. Frequently an experiment will be well-controlled and so the variation experienced will be considerably less than that normally encountered. One of the rationales for the noise arrays and cross-product designs advocated by Taguchi and discussed in Section 2.1 is to deliberately
introduce into the experiment sources of variation that are more in line with what will be encountered in practice. During the experiment the noise (or environmental) variables are changed in a controlled manner that mimics the variation likely to be experienced in practice. Experiments that deliberately introduce the variation into the experiment through the experimental design (called the second approach, above) will be considered in Sections 2.3.3 and 2.3.4.

2.3.2 Analyzing the mean and standard deviation response surfaces

One analysis approach, appropriate if there are only a couple of design variables, is to construct contour plots of the mean response and the standard deviation of the response over the range of the variables. This will enable the researcher to see the constraints and trade-offs that may need to be made to achieve required values for the mean and variability of the response.

A more rigorous analysis for simultaneously obtaining a target value for the mean and minimizing the variance has been discussed by Vining and Myers [31]. They propose applying the dual response approach developed by Myers and Carter [32] and state that this approach can satisfy the goals of achieving a target for the mean and for the variance within a more rigorous statistical methodology than that proposed by Taguchi. The objective of the dual response approach of Myers and Carter is to optimize a primary response subject to an appropriate equality constraint on the value of a secondary response. An application of this approach to the study of products and processes that are stable to environmental variation would involve running a response surface design, such as a central composite design or Box-Behnken design, that is replicated at each design point, as described in Section 2.3.1. Since each design point is replicated, the mean and variance can be calculated for each point in the experiment.
Separate second-order models are fit to the data from the experiment that adequately describe the effect of the variables on the mean and on the standard deviation of the response. Then these two models are studied using the dual response approach of optimizing a primary response subject to an appropriate equality constraint on the value of a secondary response. The choice of whether to make the mean the primary or the secondary response will depend on the objectives of the experiment. For example, if the objective is to have the mean on target with minimum variation then the dual response approach would suggest minimizing the variance (or
some function of the variance such as ln(s)), subject to the constraint that the mean is at its target value. In this case the variance (or ln(s)) will be the primary response and the mean will be the secondary response. Alternatively, if the objective is to maximize (or minimize) the mean response and keep the variation as small as possible then the dual response approach would suggest optimizing the mean subject to the constraint that the variance is less than some upper bound. In this case the mean will be the primary response and the variance will be the secondary response. As suggested by Vining and Myers [31], the investigator may wish to select several possible constraint values for the variance, find the corresponding optimum values for the mean response subject to these variance constraints, and select a good compromise among these values.

Details of the dual response approach can be found in the references given above. It is an extension of ridge analysis (Hoerl [33], see also Box and Draper [12]). The assumption is that there is a spherical region of interest of the design variables and that the variable combination that optimizes the primary response subject to a constraint on the secondary response is likely to be on the boundary of this region of interest. Thus an additional constraint is introduced, that the optimal value for the primary response is on the boundary of this spherical region. Lagrange multipliers are used to solve this constrained optimization problem. An example of the application of the dual response approach is given in Vining and Myers [31].

The application of the standard nonlinear programming techniques of constrained optimization to the analysis of the mean and variance response surfaces has been investigated by Del Castillo and Montgomery [34]. These techniques are appropriate since both the primary and secondary responses are usually quadratic functions.
Del Castillo and Montgomery recommend the generalized reduced gradient (GRG) algorithm for the following reasons. Firstly, the GRG algorithm is a primal method meaning that at each iteration the method searches only through the feasible region to determine a point that improves the primary response. Secondly, the GRG algorithm is one of the most robust nonlinear programming methods in that it can solve a wide variety of problems. Finally, the GRG method is known to work well unless the starting point is far from optimal and the constraints are highly nonlinear. Neither of these conditions are likely to be of concern when applying GRG methods to the dual response problem. Del Castillo and Montgomery also mention that if, in the dual response problem, the
primary response is quadratic and the secondary response is linear, then a simpler method, such as quadratic programming, would be appropriate. An explanation of the GRG algorithm and its application to the dual response problem is given in Del Castillo and Montgomery [34]. In this paper Del Castillo and Montgomery claim that the GRG methodology has an advantage over the dual response method of Vining and Myers [31] in that it allows more constraints (secondary responses, such as cost constraints) to be included in the optimization and the constraints can be of a more flexible form. Furthermore, the optimization can be conducted over non-spherical regions of interest; for example, a cuboidal region defined by design variables within the region $-1 \le x_i \le +1$.
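The dual response problem on a cuboidal region can be illustrated with a deliberately simple brute-force grid search. This is neither the Lagrange multiplier method of Vining and Myers nor the GRG algorithm; it is only a sketch of the problem statement (minimize the standard deviation subject to the mean being near its target). Both fitted surfaces, their coefficients, the tolerance, and the function names are hypothetical.

```python
from itertools import product

def mean_model(x1, x2):
    """Hypothetical fitted second-order model for the mean response."""
    return 120 + 8 * x1 + 4 * x2 - 3 * x1 * x1 + 2 * x1 * x2

def sd_model(x1, x2):
    """Hypothetical fitted second-order model for the standard deviation."""
    return 5 + 2 * x1 - 1.5 * x2 + x1 * x1 + x2 * x2

def dual_response_grid(target, tol, steps=41):
    """Minimize sd_model over -1 <= x1, x2 <= 1 subject to |mean - target| <= tol."""
    grid = [-1 + 2 * i / (steps - 1) for i in range(steps)]
    feasible = [(sd_model(a, b), a, b) for a, b in product(grid, grid)
                if abs(mean_model(a, b) - target) <= tol]
    return min(feasible) if feasible else None   # (sd, x1, x2) or None

best = dual_response_grid(target=125, tol=0.5)
print(best)
```

Repeating the search for several tolerances or targets gives the kind of trade-off table from which a compromise setting can be chosen.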
2.3.3 Experimental design with environmental variables

In this section it is supposed that the environmental variation is deliberately introduced into the experiment by including in the experimental design environmental variables that are controlled at predetermined settings during the experiment. Freeny and Nair [35] considered robust design experiments with uncontrollable, but measurable, environmental variables. Their approach will not be considered here; in this chapter it will be assumed that environmental variables can be controlled during the experiment.

An advantage of including environmental variables in the experimental design is that the analysis can investigate the effect of design variables on specific sources of environmental variation with the objective of understanding how particular design variable settings might affect the variation in the response due to changes in the environmental variables. This and the subsequent section will consider the application of response surface methodology to these experiments. Section 2.2.4 will show how split-plot designs can be applied to include environmental variables in the experimental design. An example of this type of experiment is the tablet formulation experiment described in Section 2.1 and given in Table 2.1.

The usual method that Taguchi advocates for introducing the environmental variation is to construct an experimental design that contains the environmental variable settings and to completely cross this design with the experimental design that contains the design variables. If there are $n_1$ runs in the design array and $n_2$ runs in the environmental array, and the runs are made independently, then there will be $n_1 \times n_2$ runs for the total experiment.
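Crossing two arrays simply pairs every run of the design array with every run of the environmental array. The sketch below illustrates this with two nine-run three-level arrays. The function name `crossed_array` is hypothetical, and the nine-run fraction used for the three design variables (third column generated from the first two, modulo 3) is one possible choice made here for illustration, not necessarily the array used in the chapter's tables.

```python
from itertools import product

def crossed_array(design_array, environmental_array):
    """Cross every design-array run with every environmental-array run."""
    return [d + e for d, e in product(design_array, environmental_array)]

levels = (-1, 0, 1)
# assumed nine-run fraction for three design variables: x3 from (x1 + x2) mod 3
design = [(a, b, (a + b + 2) % 3 - 1) for a, b in product(levels, repeat=2)]
# nine-run full factorial for two environmental variables
environment = list(product(levels, repeat=2))

runs = crossed_array(design, environment)
print(len(design), len(environment), len(runs))   # 9 9 81
```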
S.P. JONES
Thus, the experimental designs advocated by Taguchi can require a prohibitively large number of runs. An alternative approach is to regard the environmental variables as standard experimental variables and to apply the techniques associated with response surface methodology to the combined set of design and environmental variables (see Welch, Yu, Kang, and Sacks [36], Shoemaker, Tsui, and Wu [37], and Box and Jones [38]). This approach can result in considerably smaller and therefore cheaper experiments. As an example of the reduction in the size of the experiment, consider the tablet formulation study of Table 2.1, which had three quantitative design variables, $x_1$, $x_2$, $x_3$, and two quantitative environmental variables, $z_1$, $z_2$. Suppose that all of the variables, both design and environmental, are to be studied at three settings (coded -1, 0, +1), and that each combination is to be run independently and the experiment fully randomized.
TABLE 2.10 TAGUCHI DESIGN FOR THREE DESIGN VARIABLES AND TWO ENVIRONMENTAL VARIABLES
(Nine-run design array in $x_1$, $x_2$, $x_3$ completely crossed with a nine-run environmental array in $z_1$, $z_2$; all variables at the levels -1, 0, +1.)
Taguchi's approach of using a separate design and environmental array might result in a nine run fractional factorial design for the design
STABILITY AND RESPONSE SURFACE METHODOLOGY
variables and a nine run full factorial design for the environmental variables. The complete crossed design is shown in Table 2.10. It can be seen that it would require 9 × 9 = 81 runs. This design would yield estimates of linear and quadratic effects for the variables and of the interactions between the design and the environmental variables. However, it does not yield any unbiased estimates of the two-factor interactions among the design variables. An alternative design, based on applying response surface methodology to a combined set of design and environmental variables, could result in a smaller number of runs. One such design is the face-centered composite design described in Section 2.2.3. This design would consist of a 16-run, resolution V, fractional factorial design, augmented by a pair of star points for each factor, and a number ($n_0$) of center points. Such a design, with $n_0 = 4$ center points, is shown in Table 2.11. This design will permit the estimation of all the terms of a full second-order model
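$$y = \beta_0 + \sum_{i=1}^{3}\beta_i x_i + \sum_{j=1}^{2}\gamma_j z_j + \sum_{i=1}^{3}\beta_{ii} x_i^2 + \sum_{j=1}^{2}\gamma_{jj} z_j^2 + \sum_{i=1}^{2}\sum_{k=i+1}^{3}\beta_{ik} x_i x_k + \gamma_{12} z_1 z_2 + \sum_{i=1}^{3}\sum_{j=1}^{2}\delta_{ij} x_i z_j + \varepsilon$$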
Thus, not only will this design estimate all of the linear and quadratic terms and interactions between the design and the environmental variables, but it will also estimate all of the two-factor interactions among the design variables and among the environmental variables. It will accomplish this in only (26 + $n_0$) runs, compared with the 81 runs for the Taguchi design that yields less information. It might be argued that a more reasonable approach for the Taguchi-type design given in Table 2.10 would be to run a two-level design in the environmental variables, since an experimenter is unlikely to be interested in estimating the quadratic effects of the environmental variables. Such a situation would permit the use of a $2^2$ full factorial design for the environmental array, and the complete design would require 9 × 4 = 36 runs. It is noted that this is still more than is required for the composite design in Table 2.11. In fact, under the assumption that the quadratic effects of the environmental variables are not of interest, the design of Table 2.11 could be reduced to (22 + $n_0$) runs by eliminating runs 23-26, the star points for the environmental variables.
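The run counts quoted above follow from simple arithmetic, which can be sketched as follows. The function names are hypothetical and introduced only for this illustration; the composite count assumes one pair of star points per factor that receives star points.

```python
def crossed_array_runs(design_runs: int, env_runs: int) -> int:
    """Runs in a Taguchi crossed array: every design run at every environmental run."""
    return design_runs * env_runs


def composite_runs(cube_runs: int, n_star_factors: int, n_center: int) -> int:
    """Runs in a central composite design: cube + 2 star points per factor + center points."""
    return cube_runs + 2 * n_star_factors + n_center


n0 = 4  # number of center points, as in Table 2.11

# Three-level crossed array: 9-run design array x 9-run environmental array
print("crossed, three-level environmental array:", crossed_array_runs(9, 9))
# Two-level environmental array (2^2 full factorial)
print("crossed, two-level environmental array:", crossed_array_runs(9, 4))
# Face-centered composite: 16-run cube, star points for all five variables
print("composite, star points on all variables:", composite_runs(16, 5, n0))
# Same design without the star points for the two environmental variables
print("composite, star points on design variables only:", composite_runs(16, 3, n0))
```

With `n0 = 4` the two composite designs need 30 and 26 runs, against 81 and 36 runs for the two crossed arrays.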
TABLE 2.11 FACE-CENTERED CENTRAL COMPOSITE DESIGN FOR THREE DESIGN VARIABLES AND TWO ENVIRONMENTAL VARIABLES

Run   x1   x2   x3   z1   z2
  1   -1   -1   -1   -1   +1
  2   -1   -1   -1   +1   -1
  3   -1   -1   +1   -1   -1
  4   -1   +1   -1   -1   -1
  5   +1   -1   -1   -1   -1
  6   -1   -1   +1   +1   +1
  7   -1   +1   +1   +1   -1
  8   -1   +1   -1   +1   +1
  9   -1   +1   +1   -1   +1
 10   +1   -1   -1   +1   +1
 11   +1   -1   +1   -1   +1
 12   +1   -1   +1   +1   -1
 13   +1   +1   -1   -1   +1
 14   +1   +1   -1   +1   -1
 15   +1   +1   +1   -1   -1
 16   +1   +1   +1   +1   +1
 17   -1    0    0    0    0
 18   +1    0    0    0    0
 19    0   -1    0    0    0
 20    0   +1    0    0    0
 21    0    0   -1    0    0
 22    0    0   +1    0    0
 23    0    0    0   -1    0
 24    0    0    0   +1    0
 25    0    0    0    0   -1
 26    0    0    0    0   +1
 27    0    0    0    0    0
 28    0    0    0    0    0
 29    0    0    0    0    0
 30    0    0    0    0    0
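A design of this kind can be generated programmatically. The sketch below is one possible construction, not necessarily the one used to build Table 2.11: it assumes the 16-run resolution V fraction is defined by the generator $z_2 = x_1 x_2 x_3 z_1$, which appears consistent with the cube rows of the table, although the run order produced here differs. The function name is hypothetical.

```python
from itertools import product


def face_centered_ccd(n_center=4):
    """Return (cube, star, center) runs as tuples (x1, x2, x3, z1, z2)."""
    cube = []
    for x1, x2, x3, z1 in product((-1, 1), repeat=4):
        # Assumed generator for the half fraction: I = x1 x2 x3 z1 z2
        z2 = x1 * x2 * x3 * z1
        cube.append((x1, x2, x3, z1, z2))

    star = []
    for k in range(5):              # one pair of star points per variable
        for level in (-1, 1):       # face-centered: star points at -1 and +1
            point = [0] * 5
            point[k] = level
            star.append(tuple(point))

    center = [(0, 0, 0, 0, 0)] * n_center
    return cube, star, center


cube, star, center = face_centered_ccd(n_center=4)
design = cube + star + center
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Dropping the last four entries of `star` gives the reduced design without star points for the environmental variables.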
As another example of the reduction in the number of runs, consider an experiment to investigate three design and four environmental variables, all at three levels. A Taguchi crossed array might use a $3^{3-1}$ fractional factorial design for the design array and a $3^{4-1}$ fractional factorial design
for the environmental array, giving a complete crossed design of 9 × 27 = 243 runs. This design would yield estimates of the linear and quadratic effects for all the variables and of the interactions between the design and the environmental variables. However, it does not yield any unbiased estimates of the two-factor interactions among the design variables. An alternative design would be the seven-variable Box-Behnken design shown in Table 2.12. In this table each group of eight runs consists of all eight combinations of ±1 for the three variables indicated, the other four variables being set at their center point, 0. The design is completed with $n_0$ center points, giving a total of (56 + $n_0$) runs.
TABLE 2.12 BOX-BEHNKEN DESIGN FOR THREE DESIGN VARIABLES AND FOUR ENVIRONMENTAL VARIABLES

Runs     x1   x2   x3   z1   z2   z3   z4
1-8       0    0    0   ±1   ±1   ±1    0
9-16     ±1    0    0    0    0   ±1   ±1
17-24     0   ±1    0    0   ±1    0   ±1
25-32    ±1   ±1    0   ±1    0    0    0
33-40     0    0   ±1   ±1    0    0   ±1
41-48    ±1    0   ±1    0   ±1    0    0
49-56     0   ±1   ±1    0    0   ±1    0
n0        0    0    0    0    0    0    0
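The block structure of Table 2.12 can be expanded into individual runs as in the sketch below. The assignment of the seven columns to $x_1, x_2, x_3, z_1, z_2, z_3, z_4$ (indices 0 to 6) is an assumption made for this illustration, as are the names `BLOCKS` and `box_behnken_7`.

```python
from itertools import product

# Each tuple lists the three columns varied at +/-1 within one group of eight runs
# (column indices 0..6 are assumed to be x1, x2, x3, z1, z2, z3, z4).
BLOCKS = [
    (3, 4, 5),  # runs 1-8
    (0, 5, 6),  # runs 9-16
    (1, 4, 6),  # runs 17-24
    (0, 1, 3),  # runs 25-32
    (2, 3, 6),  # runs 33-40
    (0, 2, 4),  # runs 41-48
    (1, 2, 5),  # runs 49-56
]


def box_behnken_7(n_center=0):
    """Expand the blocks into runs; the four inactive variables stay at 0."""
    runs = []
    for active in BLOCKS:
        for levels in product((-1, 1), repeat=3):
            run = [0] * 7
            for column, level in zip(active, levels):
                run[column] = level
            runs.append(tuple(run))
    runs.extend([(0,) * 7] * n_center)
    return runs


design = box_behnken_7(n_center=4)
print(len(design), "runs")
```

Every pair of variables is varied together in exactly one group of eight runs, which is what allows each two-factor interaction to be estimated.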
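The design of Table 2.12 will permit the estimation of all the terms of a full second-order model

$$y = \beta_0 + \sum_{i=1}^{3}\beta_i x_i + \sum_{j=1}^{4}\gamma_j z_j + \sum_{i=1}^{3}\beta_{ii} x_i^2 + \sum_{j=1}^{4}\gamma_{jj} z_j^2 + \sum_{i=1}^{2}\sum_{k=i+1}^{3}\beta_{ik} x_i x_k + \sum_{j=1}^{3}\sum_{l=j+1}^{4}\gamma_{jl} z_j z_l + \sum_{i=1}^{3}\sum_{j=1}^{4}\delta_{ij} x_i z_j + \varepsilon$$

Thus, not only will this design provide data to estimate all of the linear and quadratic terms and interactions between the design and environmental variables, but it will also estimate all of the two-factor interactions among the design variables and among the environmental variables. As with the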
previous design, the Taguchi crossed design gives less information while requiring more runs than a standard second-order design. Note that even if the environmental variables are at two levels, so that a $2^{4-1}$ fractional factorial design can be run for the Taguchi environmental array, the complete crossed design has 9 × 8 = 72 runs, still more than the Box-Behnken design of Table 2.12, while providing estimates of fewer coefficients of the second-order model.

Shoemaker et al. [37] give several examples of the reduction in the number of experimental runs that can occur when it is assumed that some of the terms in the full second-order model are negligible. The reader is warned, however, that assuming a term is negligible is not an assurance that it can be ignored. The presence of terms in the true model that were assumed negligible will bias the estimates of the other coefficients. Box and Jones [20,38] showed that by considering the experimental objective, it is possible to construct smaller designs without having to assume that certain interactions are negligible. They showed that if the experimenter's objective is to find the design combination that minimizes the variance, then the second-order effects among the environmental variables (that is, the pure quadratic and interaction terms) are not of interest. Consequently, smaller designs can be constructed by aliasing together interactions among the environmental variables. These designs would still enable the unbiased estimation of all other coefficients of the second-order model, even if the interactions among the environmental variables are not negligible.

As an example, consider the experiment with three design variables and four environmental variables described above. A design based on the central composite design could be used that would only require (38 + $n_0$) runs. This would be achieved by using as the cube portion a 32-run, resolution IV design that confounded all of the two-factor interactions among the environmental variables with one another, along with a pair of star points for each of the design variables (6 star points in all), and $n_0$ center points. A face-centered composite design of this form, with four center points, is shown in the example given in Section 2.3.5; see Table 2.13. It should be noted that the use of this experimental design does not require the assumption of negligible interactions among the environmental variables, only that they are not of interest. If non-negligible interactions do exist they will not bias the estimates of the other coefficients of the second-order model.

To summarize, it has been shown that combining the design and environmental variables into a single set for a response surface design not
only results in experiments that frequently require fewer runs than Taguchi's designs, but also there is considerable flexibility in choosing the designs so that all of the coefficients of interest can be estimated and runs are not wasted to estimate coefficients that can be ignored.

2.3.4 Analysis of experimental designs with environmental variables

Having considered the advantages of designing an experiment with a combined set of design and environmental variables, as opposed to Taguchi's crossed arrays, this section will consider the analysis of such experiments. It should be noted that in contrast to the previous section there is no pure replication of the design points from the response surface design. Consequently it is not possible to estimate the variance at each design point and to fit a model for the variance. The analysis approach in this section is based on fitting a model to the data without distinction as to whether the variables are design or environmental variables. The explicit modeling of the environmental variables has advantages over the modeling of a summary measure of variation such as the standard deviation, which can lead to erroneous conclusions (see Steinberg and Bursztyn [39]).

At this stage it is helpful to consider, in general terms, the objective of an experiment to investigate robustness. Consider an experiment with one design variable, x, and one environmental variable, z. The objective is to determine a setting of x that will yield a response that does not change as z varies. From this description it is clear that information on robustness will be contained in the interaction between x and z.
Figure 2.5 Design x Environment interaction plot (horizontal axis: environmental variable, z; one line for each setting of x)
Figure 2.5 shows a possible interaction plot of x and z. In this figure the 0 setting of x yields a response that is approximately constant as the environmental variable, z, is changed. This setting yields a response that is robust, or stable, to the environmental variation, z. In contrast, at the other settings of x the response changes as z is varied, indicating that these settings of x do not make the response robust to the environmental variation, z.

A good summary of the analysis methods discussed in this section can be found in Myers, Khuri, and Vining [40]. Similar approaches have been described by Welch et al. [36], Shoemaker et al. [37], and Box and Jones [5,38]; see also Myers [41].

Suppose that a response surface design has been run with n design variables, $x_1, x_2, x_3, \ldots, x_n$, and m environmental variables, $z_1, z_2, z_3, \ldots, z_m$. During the experiment the environmental variables are controlled at fixed levels and can be regarded as fixed effects. Suppose that the x's and z's are centered and scaled around 0. In this section, several alternative models for the relationship between the design and environmental variables and the response will be considered. Suppose, initially, that the response from the experiment can be adequately modeled by a first-order model in both the design and the environmental factors,

$$y_{xz} = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{j=1}^{m}\gamma_j z_j + \varepsilon$$
In matrix notation,

$$y_{xz} = \beta_0 + x^T\beta + z^T\gamma + \varepsilon$$

where $\beta$ and x are (n × 1) vectors and $\gamma$ and z are (m × 1) vectors, and the $\varepsilon$ are independent $N(0, \sigma_\varepsilon^2)$. In the experiment the environmental variables are controlled at fixed levels, but in reality the environmental variables have a random effect on the response, $y_{xz}$. Thus, the actual variation in the response is

$$V(y_x) = \gamma^T V \gamma + \sigma_\varepsilon^2$$
where z are random settings of the environmental variables that affect the response in reality (outside of the experiment), and V is the variance-covariance matrix of z. It is clear from this formula that the variance of the response is independent of x, the settings of the design variables. Consequently, there is no opportunity for achieving a more robust response in the presence of the environmental variation, z, by selecting particular settings for the design variables.

Consider, now, a second example where the response from the experiment can be adequately modeled by a model that contains linear terms in the design variables, x, and the environmental variables, z, and also cross-product terms xz. Therefore, if there are n design variables, $x_1, x_2, \ldots, x_n$, and m environmental variables, $z_1, z_2, \ldots, z_m$, then the response, $y_{xz}$, can be represented by

$$y_{xz} = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{j=1}^{m}\gamma_j z_j + \sum_{i=1}^{n}\sum_{j=1}^{m}\delta_{ij} x_i z_j + \varepsilon \qquad (22)$$

In matrix notation,

$$y_{xz} = \beta_0 + x^T\beta + z^T\gamma + z^T D x + \varepsilon$$
where $\beta$ and x are (n × 1) vectors, $\gamma$ and z are (m × 1) vectors, and where D is an (m × n) matrix that contains the coefficients $\delta_{ij}$ that measure the interactions between the design and the environmental variables. It is assumed that an experimental design has been conducted that will permit estimation of all these two-factor interactions and the main effects of the design and the environmental factors. Box and Jones [21] discuss experimental designs that accomplish this. Now let

$$g_j = \left[\frac{\partial y_{xz}}{\partial z_j}\right]_{z=0} = \gamma_j + \sum_{i=1}^{n}\delta_{ij} x_i \qquad \text{for } j = 1, \ldots, m$$

Then

$$g^T(x) = (g_1, g_2, \ldots, g_m) \qquad (25)$$

and $g(x) = \gamma + Dx$ is a measure of the change in the response, as a function of the design variables, in the direction of z at z = 0. Therefore, we have

$$y_{xz} = \beta_0 + x^T\beta + z^T g(x) + \varepsilon \qquad (26)$$
Now, as before, in reality the environmental variables have a random effect on the response, $y_{xz}$. Therefore the actual variance of the response is

$$V(y_x) = g^T(x)\, V\, g(x) + \sigma_\varepsilon^2 \qquad (27)$$

where $g(x) = \gamma + Dx$ and V is the variance-covariance matrix of z. From this formula, it can be seen that the variance of the response is a function of the settings of the design variables. Therefore there is an opportunity for making the response robust to the environmental variation by careful selection of the settings of the design variables. Suppose that from an experiment good estimates of the terms of $\gamma$ and D are obtained and that the elements of V are known. Then the variance of $y_x$ can be minimized as a function of the design variables, x. Also from equation (26), the mean response level is

$$E(y_x) = \beta_0 + x^T\beta \qquad (28)$$

under the assumption that the random environmental variables have a mean of zero. It can be seen that both $V(y_x)$ (equation (27)) and $E(y_x)$ (equation (28)) are essentially response surface models. From an experiment, estimates of $\gamma$, D, $\sigma_\varepsilon^2$, $\beta_0$, and $\beta$ can be derived. Suppose, also, that the elements of V are known, or can be estimated. Then the search for a choice of design variables that yields a response that is robust to the environmental variation and close to target will involve an examination of these two response surfaces. At this point, the scientist might proceed by following the dual response or constrained optimization approaches discussed in Section 2.3.2 or by simply overlaying contour plots of the mean and variance response surfaces.

In practice, of course, there could be considerable uncertainty as to values for the elements of V, although it might be possible to estimate them from historical data. If reliable estimates of the values of the elements of V are unavailable then several alternative guesses could be made and the sensitivity of the conclusions to these estimates could be ascertained. If there is some target value, $\tau$, for the response then a measure of closeness of the mean response to that target is
Box and Jones [38] discussed the use of a general robustness measure of the form
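where $0 \le \lambda \le 1$. Selection of a particular value for $\lambda$ corresponds to a particular weighting of the relative importance of being close to target and having small variation.

Suppose, now, that the response from the experiment can be adequately represented by a model as in equation (22) but with the addition of pure quadratic and interaction terms for the design variables, x. For n design variables, $x_1, x_2, \ldots, x_n$, and m environmental variables, $z_1, z_2, \ldots, z_m$, it is supposed that the model for the experiment is

$$y_{xz} = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{i=1}^{n}\beta_{ii} x_i^2 + \sum_{i=1}^{n-1}\sum_{k=i+1}^{n}\beta_{ik} x_i x_k + \sum_{j=1}^{m}\gamma_j z_j + \sum_{i=1}^{n}\sum_{j=1}^{m}\delta_{ij} x_i z_j + \varepsilon$$

In matrix notation we have

$$y_{xz} = \beta_0 + x^T\beta + x^T B x + z^T\gamma + z^T D x + \varepsilon$$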
where $\beta$ and x are (n × 1) vectors, $\gamma$ and z are (m × 1) vectors, B is an (n × n) matrix that contains the coefficients that measure the interactions and pure quadratic terms among the design variables, and D is an (m × n) matrix that contains the coefficients that measure the interactions between the design and the environmental variables. As before, let $g^T(x)$ be as in equation (25) so that $g(x) = \gamma + Dx$ is a measure of the change in the response, as a function of the design variables, in the direction of z at z = 0. Therefore, we have

$$y_{xz} = \beta_0 + x^T\beta + x^T B x + z^T g(x) + \varepsilon \qquad (33)$$

Now, as before, in reality the environmental variables have a random effect on the response, $y_{xz}$. Therefore the actual variance of the response is

$$V(y_x) = g^T(x)\, V\, g(x) + \sigma_\varepsilon^2 \qquad (34)$$

where $g(x) = \gamma + Dx$. Therefore, the formula for the variance of the actual response is identical to the previous model (see equation (27)) and is a function of the settings of the design variables only through $\gamma$ and D. Therefore, as before, there is an opportunity for making the response robust to the environmental variation by careful selection of the settings of the design variables. Also from equation (33), the mean response level is

$$E(y_x) = \beta_0 + x^T\beta + x^T B x, \qquad (35)$$

under the assumption that the random environmental variables have a mean of zero. The mean response level is now a function of both the first-order and second-order terms in the design variables. Thus, it can be seen that both $V(y_x)$ and $E(y_x)$ are quadratic response surface models in x. From an experiment, estimates of $\gamma$, D, $\sigma_\varepsilon^2$, $\beta_0$, $\beta$, and B can be derived. Suppose, also, that the elements of V are known, or can be estimated. Then, as before, the search for a choice of design variables that yields a response that is robust to the environmental variation and close to target will involve an examination of these two response surfaces, equations (34) and (35).
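The two response surfaces, equations (34) and (35), are straightforward to evaluate numerically once estimates are available. The following is a minimal sketch in plain Python; all function names and all coefficient values are made up for illustration. The last function combines the two surfaces in a weighted criterion of the form $\lambda\,[E(y_x) - \tau]^2 + (1 - \lambda)\,V(y_x)$, which is only an assumed form for a weighted robustness measure and should not be read as the exact criterion of Box and Jones [38].

```python
def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope_vector(x, gamma, D):
    """g(x) = gamma + D x: slope of the response in the direction of z at z = 0."""
    return [g_j + d_j for g_j, d_j in zip(gamma, mat_vec(D, x))]


def variance_surface(x, gamma, D, V, sigma2):
    """Equation (34): V(y_x) = g(x)' V g(x) + sigma_eps^2."""
    g = slope_vector(x, gamma, D)
    return dot(g, mat_vec(V, g)) + sigma2


def mean_surface(x, beta0, beta, B):
    """Equation (35): E(y_x) = beta0 + x' beta + x' B x."""
    return beta0 + dot(x, beta) + dot(x, mat_vec(B, x))


def weighted_criterion(mean, variance, tau, lam):
    """ASSUMED form: lam * (mean - tau)^2 + (1 - lam) * variance, with 0 <= lam <= 1."""
    return lam * (mean - tau) ** 2 + (1 - lam) * variance


# Made-up coefficients for n = 2 design and m = 2 environmental variables
beta0, beta = 10.0, [1.0, 2.0]
B = [[1.0, 0.5], [0.5, 0.0]]
gamma = [1.0, -2.0]
D = [[0.5, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]   # uncorrelated z's with unit variance
sigma2 = 1.0

x = [-1.0, 1.0]
m = mean_surface(x, beta0, beta, B)
v = variance_surface(x, gamma, D, V, sigma2)
print("mean:", m, "variance:", v)
print("criterion:", weighted_criterion(m, v, tau=10.0, lam=0.5))
```

Setting D to a matrix of zeros reproduces the first-order case, in which the variance no longer depends on x.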
2.3.5 Example

To illustrate the approach described above, consider an experiment with three design variables, $x_1$, $x_2$, $x_3$, and four environmental variables, $z_1$, $z_2$, $z_3$, $z_4$. The objective was to find a setting of the design variables that will lead to a small response with minimum variability due to the environmental variables. Suppose that it was reasonable to assume that the second-order effects (that is, the pure quadratic and interaction terms) among the noise variables were not of interest. The experimental design used was one based on the face-centered central composite design that only required (38 + $n_0$) runs. This design, described in Section 2.3.3, has as the cube portion a 32-run, resolution IV design that confounds all of the two-factor interactions of the noise factors with one another, along with six star points for the design factors and $n_0$ center points. The design, with the responses from the experiment, is shown in Table 2.13. The following model, equation (36), was fit to the data.

$$y_{xz} = \beta_0 + \sum_{i=1}^{3}\beta_i x_i + \sum_{i=1}^{3}\beta_{ii} x_i^2 + \sum_{i=1}^{2}\sum_{k=i+1}^{3}\beta_{ik} x_i x_k + \sum_{j=1}^{4}\gamma_j z_j + \sum_{i=1}^{3}\sum_{j=1}^{4}\delta_{ij} x_i z_j + \varepsilon \qquad (36)$$

It can be seen that this model contains all main effects, all quadratic terms in the design variables, all interactions among the design variables, and all interactions between the design and the environmental variables. An estimate of the pure experimental error can be obtained from the replication at the four center points. The ANOVA table shown in Table 2.14 indicates that there was no significant lack-of-fit of the model. Parameter estimates and t-statistics for this model are shown in Table 2.15. The following model for the response was derived using the significant effects indicated in Table 2.15.

$$y_{xz} = 41.83 + 2.50x_1 - 3.91x_2 + 4.19z_1 - 4.38z_3 + 2.69x_1x_2 + 2.38x_1z_1 - 2.81x_2z_3 + \varepsilon \qquad (37)$$
TABLE 2.13 EXPERIMENTAL DESIGN AND DATA SET EXAMPLE
Columns: Run; design variables $x_1$, $x_2$, $x_3$; environmental variables $z_1$, $z_2$, $z_3$, $z_4$; response y.
Runs 1-32: cube portion (32-run, resolution IV fractional factorial with all seven variables at -1 or +1). Runs 33-38: star points for the design variables. Runs 39-42: center points.

Response y for runs 1-32, in run order:
Runs 1-8      46 41 40 30 41 45 23 45
Runs 9-16     56 41 24 58 46 52 40 48
Runs 17-24    44 41 39 25 48 35 21 44
Runs 25-32    47 49 33 57 45 56 46 44

Run   x1   x2   x3   z1   z2   z3   z4    y
 33   -1    0    0    0    0    0    0   31
 34   +1    0    0    0    0    0    0   50
 35    0   -1    0    0    0    0    0   48
 36    0   +1    0    0    0    0    0   31
 37    0    0   -1    0    0    0    0   36
 38    0    0   +1    0    0    0    0   40
 39    0    0    0    0    0    0    0   34
 40    0    0    0    0    0    0    0   42
 41    0    0    0    0    0    0    0   37
 42    0    0    0    0    0    0    0   39
TABLE 2.14 ANOVA TABLE FOR DATA IN TABLE 2.13

Source          df   Sum of Squares   Mean Square   F-ratio   p-value
Model           25      2863.16         114.526      7.614    < 0.0001
Error           16       240.67          15.042
  Lack-of-fit   13       206.67          15.898      1.403    0.4394
  Pure Error     3        34.00          11.333
Total           41      3103.83
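The entries of Table 2.14 can be cross-checked with a few lines of arithmetic. The sketch below takes the sums of squares from the table and the four center-point responses (34, 42, 37, 39) from Table 2.13; the variable names are illustrative only. It also shows that the printed model mean square corresponds to 25 degrees of freedom, one for each non-intercept term of the fitted model.

```python
# Pure error from the four center-point responses (runs 39-42 of Table 2.13)
center = [34, 42, 37, 39]
center_mean = sum(center) / len(center)
ss_pure_error = sum((y - center_mean) ** 2 for y in center)
df_pure_error = len(center) - 1

# Sums of squares as printed in Table 2.14
ss_model, ss_error, ss_lack_of_fit = 2863.16, 240.67, 206.67
df_total = 42 - 1                      # 42 runs
df_error = 16
df_model = df_total - df_error         # 25 model terms besides the intercept
df_lack_of_fit = df_error - df_pure_error

ms_model = ss_model / df_model
ms_error = ss_error / df_error
ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_pure_error = ss_pure_error / df_pure_error

print("pure error SS:", ss_pure_error)
print("mean squares:", round(ms_model, 3), round(ms_error, 3),
      round(ms_lack_of_fit, 3), round(ms_pure_error, 3))
print("F (model):", round(ms_model / ms_error, 3))
print("F (lack of fit):", round(ms_lack_of_fit / ms_pure_error, 3))
```

The pure-error mean square computed here is the estimate 11.33 of the error variance used later in the example.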
This can be re-expressed as

$$y_{xz} = 41.83 + 2.50x_1 - 3.91x_2 + 2.69x_1x_2 + (4.19 + 2.38x_1)z_1 + (-4.38 - 2.81x_2)z_3 + \varepsilon$$

From this we have as the estimated mean response surface

$$\hat{E}(y_x) = 41.83 + 2.50x_1 - 3.91x_2 + 2.69x_1x_2 \qquad (39)$$

and, if we assume that the z's are uncorrelated, then the estimated variance response surface is

$$\hat{V}(y_x) = (4.19 + 2.38x_1)^2\,\hat{\sigma}_{z_1}^2 + (-4.38 - 2.81x_2)^2\,\hat{\sigma}_{z_3}^2 + \hat{\sigma}_\varepsilon^2 \qquad (40)$$

Now from the center point runs an estimate of $\hat{\sigma}_\varepsilon^2$ of 11.33 is obtained. Suppose that from previous studies or additional information it is known that good estimates for $\hat{\sigma}_{z_1}^2$ and $\hat{\sigma}_{z_3}^2$ are 1.0. Using these estimates, the estimated response surface for the variance is

$$\hat{V}(y_x) = (4.19 + 2.38x_1)^2 + (-4.38 - 2.81x_2)^2 + 11.33. \qquad (41)$$
From equations (40) and (41), it can be seen that $x_3$ has no effect on either the mean response or the variation. Furthermore, it can be seen that both $x_1$ and $x_2$ have an effect on the variation of the response and that an opportunity exists to minimize the effect of the environmental variables $z_1$ and $z_3$ by a particular selection of these two design variables. It should be noted that the analysis indicates that the other two environmental variables, $z_2$ and $z_4$, do not affect the response.
TABLE 2.15 PARAMETER ESTIMATES FOR DATA IN TABLE 2.13

Variable    Parameter Estimate   Standard Error       t      p-value
Intercept         38.58               1.51          25.58    < 0.0001
x1                 2.50               0.67           3.76    0.0017
x2                -3.91               0.67          -5.88    < 0.0001
x3                 0.38               0.67           0.58    0.5734
z1                 4.19               0.69           6.11    < 0.0001
z2                -0.06               0.69          -0.09    0.9285
z3                -4.38               0.69          -6.38    < 0.0001
z4                 0.75               0.69           1.09    0.2902
x1^2               4.34               2.31           1.88    0.0787
x2^2               0.34               2.31           0.15    0.8850
x3^2              -0.66               2.31          -0.29    0.7787
x1x2               2.69               0.69           3.92    0.0012
x1x3               1.06               0.69           1.55    0.1408
x2x3               0.44               0.69           0.64    0.5324
x1z1               2.38               0.69           3.46    0.0032
x1z2              -0.88               0.69          -1.28    0.2201
x1z3              -0.06               0.69          -0.09    0.9285
x1z4              -0.19               0.69          -0.27    0.7880
x2z1               0.63               0.69           0.91    0.3755
x2z2               0.50               0.69           0.73    0.4764
x2z3              -2.81               0.69          -4.10    0.0008
x2z4              -0.81               0.69          -1.19    0.2533
x3z1               0.13               0.69           0.18    0.8576
x3z2               0.38               0.69           0.55    0.5920
x3z3               0.69               0.69           1.00    0.3309
x3z4              -0.94               0.69          -1.37    0.1904
Figures 2.6 and 2.7 show contour plots of the mean response and the standard deviation. It can be seen that within the experimental region the best setting of the design variables to minimize the response is to choose $x_1 = -1$ and $x_2 = +1$. In terms of minimizing the variation due to the environmental variables the best setting would be $x_1 = -1$ and $x_2 = -1$. Similar conclusions are reached for a range of alternative choices for the estimates of $\hat{\sigma}_{z_1}^2$ and $\hat{\sigma}_{z_3}^2$, indicating that the conclusions are not oversensitive to the particular estimates chosen for the variances of the environmental variables. Clearly, a compromise between the two objectives of minimizing the mean response and the variation in the response would need to be reached. One possibility would be to minimize the variation subject to the constraint that the mean response would be less than some target value. Alternatively, one could minimize the mean response subject to the constraint that the variation would be less than some target value. If there are different costs involved in operating the process at different settings of the design variables, then a contour plot showing the operating costs can be constructed to help find an operating condition that has low operating cost and reaches a satisfactory compromise between the two objectives.
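The conclusions drawn from the contour plots can be checked by evaluating the fitted surfaces of equations (39) and (41) on a grid over the experimental region. The sketch below is only an illustration: the grid spacing of 0.1 and the mean-response limit of 40 used for the constrained compromise are arbitrary choices, not values from the example, and the function names are hypothetical.

```python
def mean_hat(x1, x2):
    """Estimated mean response surface, equation (39)."""
    return 41.83 + 2.50 * x1 - 3.91 * x2 + 2.69 * x1 * x2


def var_hat(x1, x2, var_z1=1.0, var_z3=1.0, var_eps=11.33):
    """Estimated variance response surface, equations (40) and (41)."""
    return ((4.19 + 2.38 * x1) ** 2 * var_z1
            + (-4.38 - 2.81 * x2) ** 2 * var_z3
            + var_eps)


# Grid over the cuboidal region -1 <= x1, x2 <= +1 (spacing 0.1 is arbitrary)
levels = [k / 10 for k in range(-10, 11)]
points = [(x1, x2) for x1 in levels for x2 in levels]

best_mean = min(points, key=lambda p: mean_hat(*p))
best_var = min(points, key=lambda p: var_hat(*p))
print("minimum mean response at", best_mean, "->", round(mean_hat(*best_mean), 2))
print("minimum variance at", best_var, "->", round(var_hat(*best_var), 3))

# One possible compromise: minimize the variance subject to mean <= limit
# (the limit of 40 is a made-up value for illustration)
limit = 40.0
feasible = [p for p in points if mean_hat(*p) <= limit]
best_compromise = min(feasible, key=lambda p: var_hat(*p))
print("compromise setting", best_compromise,
      "mean", round(mean_hat(*best_compromise), 2),
      "variance", round(var_hat(*best_compromise), 3))
```

Passing other values for `var_z1` and `var_z3` repeats the sensitivity check described above.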
2.4 SPLIT-PLOT DESIGNS FOR ROBUST DESIGN

In Sections 2.2 and 2.3 we considered the application of response surface methodology to the investigation of the robustness of a product or process to environmental variation. The response surface designs discussed in those sections are appropriate if all of the experimental runs can be conducted independently so that the experiment is completely randomized. This section will consider the application of an alternative class of experimental designs, called split-plot designs, to the study of robustness to environmental variation. A characteristic of these designs is that, unlike the response surface designs, there is restricted randomization of the experiment. Section 2.4.1 contains a brief description of split-plot designs and describes several alternative split-plot arrangements. In Section 2.4.2 the precision of the estimates from these split-plot arrangements is considered. Section 2.4.3 contains a discussion of variants of the standard split-plot arrangements. In Section 2.4.4 there is a discussion of the analysis of split-plot designs and a comparison of the design and analysis of split-plot experiments with the design and analysis methods proposed by Taguchi.
Figure 2.6 Contour plot of the mean response surface

Figure 2.7 Contour plot of the variance response surface
2.4.1 Overview of split-plot designs

Split-plot designs occur in a wide range of applications of experimental design. One application area for split-plot designs is when there are some variables that can be applied only to experimental units that are larger than the units to which the other variables can be applied. As an example, consider an investigation on a heat treat process to determine if the temperature of the quench bath, whether parts are stacked horizontally or vertically, and two different types of fixture, affect warpage in metal castings. Suppose the bath can hold eight parts. Then within each bath we can test the effect of stacking and fixture, for example by running 8 parts according to a replicated $2^2$ design in those two factors. We would run that same set-up for quench baths of different temperatures. To investigate the effect of temperature we need larger experimental units (baths), but to investigate the effect of stacking and fixture we can use smaller experimental units (parts).

A second application area is when there is a variable that is difficult or expensive to change and so the randomization is restricted to limit the number of changes of that variable. This is accomplished by conducting the experiment in blocks with the restricted variable held constant within a block, but changed randomly between blocks. In this case the large experimental units are the blocks and the smaller experimental units are the individual runs within a block.

An excellent exposition of split-plot experimental designs can be found in D.R. Cox's book, "Planning of Experiments" [42]. He states that split-plot designs are particularly useful when one (or more) factors are what he calls classification factors. These factors are included in the experiment to determine whether they modify the effect of the other factors or indicate how the other factors work. The classification factors are included to examine their possible interaction with the other factors. Lower precision is tolerated for comparisons of the classification factors, in order that the precision of the other factors and the interactions can be increased. In the standard terminology associated with split-plot experiments, the classification factors are called whole-plot factors and are applied to the larger experimental units. The smaller experimental units are called subplots.

In the following subsections several alternative experimental arrangements of split-plot experiments will be considered. The tablet formulation data given in the example of Table 2.1 in Section 2.1.1 will be
used to illustrate the applicability of these split-plot experiments to designing robust products and processes. Recall that in that example an experiment is to be run with three constituents of the tablet formulation, the design variables, which will be denoted as A, B, and C, and two environmental variables, storage temperature and humidity. The three design variables are arranged in a factorial design which is crossed with a $3^2$ factorial design containing the environmental variables. A set of hypothetical data was shown in Table 2.1. The objective of the experiment is to determine a combination of the design variables that will yield high values for the response across the ranges of temperature and humidity studied in the experiment. The same set of data will be used in the following subsections to illustrate the different analyses for the alternative split-plot designs. Clearly, in practice the correct analysis of the experiment will depend on the particular experimental arrangement that was adopted.
Design (I): environmental factors as whole-plot factors

Using Cox's concept of classification factors, it seems most reasonable to have the classification factors, that is the whole-plot factors, associated with the environmental variables, since they are in fact included primarily to examine their possible interaction with the design variables. Thus, the first arrangement considered is one in which the whole plots contain the environmental variables and the subplots contain the design variables. Now, suppose that there are m levels of the environmental variables, $E_1, E_2, \ldots, E_j, \ldots, E_m$, applied to the whole plots, that there are n levels of the design variables, $D_1, D_2, \ldots, D_i, \ldots, D_n$, applied to the subplots, and that there are l replicates, $r_1, r_2, \ldots, r_k, \ldots, r_l$, with the whole plots in l randomized blocks. For the tablet formulation example given in Table 2.1 in Section 2.1.1, the environmental variables are temperature and humidity that are varied in a climate-controlled chamber and m is 9, the design variables are the quantities of A, B, and C in the tablet formulation and n is 8, and since there is only one replicate l is 1.

For the tablet formulation example of Table 2.1, this split-plot arrangement would require m × n × l = 9 × 8 × 1 = 72 tablet formulation batches to be made but only m × l = 9 operations of the climate chamber. The experiment would be conducted by placing in the climate chamber a complete set of 8 different tablet formulations at the same time. A completely randomized experiment (the cross-product experiment of
Taguchi) with no replication would require not only 72 tablet batches, but also 72 operations of the chamber. It is clear, therefore, that this experimental arrangement can be considerably easier to run than the completely randomized cross-product design. The model for arrangement (I) is

$$y_{ijk} = \mu + r_k + E_j + \eta_{jk} + D_i + (DE)_{ij} + \varepsilon_{ijk},$$

where $y_{ijk}$ is the response of the kth replicate of the ith level of factor D and the jth level of factor E, $\mu$ is the overall mean, $r_k$ is the random effect of the kth replicate, with $r_k \sim N(0, \sigma_r^2)$, $E_j$ is the fixed effect of the jth level of E, $D_i$ is the fixed effect of the ith level of D, $(DE)_{ij}$ is the interaction effect of the ith level of D with the jth level of E, $\eta_{jk} \sim N(0, \sigma_\eta^2)$ is the whole-plot error, $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$ is the subplot error, and $\eta_{jk}$ and $\varepsilon_{ijk}$ are independent.
The ANOVA table is shown in Table 2.16. In this table $\hat{E}_j$, $\hat{D}_i$, and $(\widehat{DE})_{ij}$ are estimates of $E_j$, $D_i$, and $(DE)_{ij}$ respectively. It can be seen from the ANOVA table that the sources of variation split into two parts, those coming from the whole plots (Env and RxE) and those coming from the subplots (Design, DxE, and Error). In this case the mean square for Env would be tested against that of RxE, and the mean squares for Design and for DxE would be tested against that of Error.

Now suppose that there is no replication. Then to test Env, Design, and the interaction DxE, estimates of $\sigma_s^2$ and $\sigma_s^2 + n\sigma_w^2$ would be required. One possibility is to construct two normal plots, one for the whole-plot contrasts and one for the subplot contrasts, and to pick out as active those contrasts that fall away from a line (see Daniel [43]). Alternatively, if the design and the environmental factors are factorial combinations, it may be possible to assume that higher-order interactions are negligible. If this assumption is reasonable, then the whole-plot error can be estimated by pooling the higher-order interactions among the environmental variables, and the subplot error can be estimated by pooling the higher-order interactions among the design factors and between the design and the environmental factors. For example, for the tablet formulation data of Table 2.1, assuming that all contrasts involving three or more factors are estimating error, the ANOVA table shown in Table 2.17 is obtained.
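TABLE 2.16
ANOVA TABLE FOR ARRANGEMENT (I)

Source | df | Sum of Squares | Expected Mean Squares
Reps(R) | $l-1$ | | $mn\sigma_r^2 + \sigma_s^2 + n\sigma_w^2$
Env(E) | $m-1$ | $nl\sum_{j=1}^{m}\hat{E}_j^2$ | $\frac{nl}{m-1}\sum_{j=1}^{m}E_j^2 + \sigma_s^2 + n\sigma_w^2$
RxE | $(l-1)(m-1)$ | | $\sigma_s^2 + n\sigma_w^2$
Design(D) | $n-1$ | $lm\sum_{i=1}^{n}\hat{D}_i^2$ | $\frac{lm}{n-1}\sum_{i=1}^{n}D_i^2 + \sigma_s^2$
DxE | $(m-1)(n-1)$ | $l\sum_{i=1}^{n}\sum_{j=1}^{m}(\widehat{DE})_{ij}^2$ | $\frac{l}{(m-1)(n-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(DE)_{ij}^2 + \sigma_s^2$
Error | $(l-1)m(n-1)$ | | $\sigma_s^2$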
For the subplot analysis it appears that the effect due to A and the interaction between B and Humidity are real, with some evidence of an interaction between B and Temperature. It is possible to split the two degrees of freedom for each of Temperature and Humidity into linear and quadratic contrasts and to construct a normal probability plot for the whole-plot contrasts. This would reveal important effects due to the linear components of both Temperature and Humidity.
Design (II): design factors as whole-plot factors

An alternative to design (I) would be to have the design variables as the whole-plot factors, with the subplots containing the environmental variables. As before, suppose that there are $m$ levels of the environmental variables, $E_1, E_2, \ldots, E_j, \ldots, E_m$, that there are $n$ levels of the design variables, $D_1, D_2, \ldots, D_i, \ldots, D_n$, and that there are $l$ replicates, $r_1, r_2, \ldots, r_k, \ldots, r_l$, with the whole plots in $l$ randomized blocks. The contrast with arrangement (I) is that in this arrangement the environmental variables are applied to the subplots, not the whole plots, whereas the design variables are applied to the whole plots, not the subplots.

For the tablet formulation example, this experimental arrangement would arise if eight tablet formulation batches were made according to the eight different design combinations, each of these batches was divided into nine sub-batches, and each of the 72 sub-batches was placed individually in the chamber at the appropriate setting of temperature and humidity. This would require only $n \times l = 8$ tablet formulation batches to be made but would require $m \times n \times l = 72$ operations of the chamber. A completely randomized experiment with no replication would have required 72 tablet batches and also 72 operations of the chamber. It is clear, therefore, that this experimental arrangement can be considerably easier to run than the completely randomized design.
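The batch and chamber-run counts quoted for arrangements (I) and (II) follow directly from $m$, $n$ and $l$. The short Python sketch below tabulates them for the tablet example; the function name `resource_counts` and the dictionary layout are illustrative choices, not part of the original text.

```python
def resource_counts(m: int, n: int, l: int = 1) -> dict:
    """Batches to prepare and chamber operations for each arrangement.

    m: number of environmental (temperature/humidity) combinations
    n: number of design (formulation) combinations
    l: number of replicates
    """
    return {
        # every design x environment combination is made and run separately
        "completely randomized": {"batches": m * n * l, "chamber_runs": m * n * l},
        # (I): environment on whole plots, a full set of n batches per chamber run
        "arrangement (I)": {"batches": m * n * l, "chamber_runs": m * l},
        # (II): design on whole plots, each batch split into m sub-batches
        "arrangement (II)": {"batches": n * l, "chamber_runs": m * n * l},
    }


# Tablet formulation example: m = 9, n = 8, l = 1
counts = resource_counts(m=9, n=8, l=1)
for name, c in counts.items():
    print(f"{name:22s} batches={c['batches']:3d} chamber runs={c['chamber_runs']:3d}")
```

Arrangement (I) saves chamber operations, while arrangement (II) saves batch preparation; which saving matters more depends on which step is the more costly in practice.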
TABLE 2.17
ANOVA FOR THE TABLET FORMULATION DATA OF TABLE 2.1 FOR ARRANGEMENT (I)

Source | df | SS | MS | F-ratio
Whole Plot (Env) | | | |
Temp(T) | 2 | 1204.1 | 602.1 |
Humidity(H) | 2 | 1199.4 | 599.7 |
TxH | 4 | 350.5 | 87.6 |
Design | | | |
A | 1 | 938.9 | 938.9 | 15.31
B | 1 | 60.5 | 60.5 | 0.99
C | 1 | 144.5 | 144.5 | 2.36
AxB | 1 | 24.5 | 24.5 | 0.40
AxC | 1 | 40.5 | 40.5 | 0.66
BxC | 1 | 18.0 | 18.0 | 0.29
Design x Env | | | |
AxT | 2 | 62.1 | 31.1 | 0.51
AxH | 2 | 17.0 | 8.5 | 0.14
BxT | 2 | 399.0 | 199.5 | 3.25
BxH | 2 | 799.1 | 399.5 | 6.52
CxT | 2 | 65.3 | 32.7 | 0.53
CxH | 2 | 97.6 | 48.8 | 0.66
Higher Order Interactions | 45 | 2759.5 | 61.3 |

$F_{1,45}(.05) = 4.06$, $F_{2,45}(.05) = 3.20$, $F_{1,45}(.01) = 7.23$, $F_{2,45}(.01) = 5.11$
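The F-ratios in Table 2.17 are obtained by dividing each mean square by the mean square of the pooled higher-order interactions. The following Python sketch repeats that arithmetic for a few rows, using the sums of squares and the 5% critical values as printed in the table; the variable names and the choice of rows are illustrative.

```python
# (source, degrees of freedom, sum of squares) for selected rows of Table 2.17
rows = [
    ("A", 1, 938.9),
    ("C", 1, 144.5),
    ("BxT", 2, 399.0),
    ("BxH", 2, 799.1),
]

# Pooled higher-order interactions serve as the subplot error estimate.
error_df, error_ss = 45, 2759.5
error_ms = error_ss / error_df

# 5% critical values of F with (df, 45) degrees of freedom, as printed.
critical_05 = {1: 4.06, 2: 3.20}

f_ratios = {name: (ss / df) / error_ms for name, df, ss in rows}
flagged = [name for name, df, _ in rows if f_ratios[name] > critical_05[df]]

for name, df, _ in rows:
    print(f"{name:4s} F = {f_ratios[name]:5.2f}  (5% critical value {critical_05[df]})")
print("exceed 5% critical value:", flagged)
```

BxT only just exceeds its 5% critical value, which matches the description of "some evidence" of an interaction between B and Temperature.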
The model for experimental arrangement (II) is

$$y_{ijk} = \mu + r_k + D_i + \theta_{ik} + E_j + (DE)_{ij} + \varepsilon_{ijk},$$

where, as before, $y_{ijk}$ is the response of the $k$th replicate of the $i$th level of factor D and the $j$th level of factor E, $\mu$ is the overall mean, $r_k$ is the random effect of the $k$th replicate, with $r_k \sim N(0, \sigma_r^2)$, $E_j$ is the fixed effect of the $j$th level of E, $D_i$ is the fixed effect of the $i$th level of D, $(DE)_{ij}$ is the interaction effect of the $i$th level of D with the $j$th level of E, and $\varepsilon_{ijk} \sim N(0, \sigma_s^2)$ is the subplot error. In arrangement (II), $\theta_{ik} \sim N(0, \sigma_w^2)$ is the whole-plot error, and $\theta_{ik}$ and $\varepsilon_{ijk}$ are independent.

The ANOVA table is shown in Table 2.18. This table shows that the sources of variation can be split into two parts, those coming from the whole plots (Design and RxD) and those coming from the subplots (Env, DxE, and Error). In this case the mean square for Design would be tested against that of RxD, and the mean squares for Env and for DxE would be tested against that of Error.
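TABLE 2.18
ANOVA TABLE FOR ARRANGEMENT (II)

Source | df | Sum of Squares | Expected Mean Squares
Reps(R) | $l-1$ | | $mn\sigma_r^2 + \sigma_s^2 + m\sigma_w^2$
Design(D) | $n-1$ | $lm\sum_{i=1}^{n}\hat{D}_i^2$ | $\frac{lm}{n-1}\sum_{i=1}^{n}D_i^2 + \sigma_s^2 + m\sigma_w^2$
RxD | $(l-1)(n-1)$ | | $\sigma_s^2 + m\sigma_w^2$
Env(E) | $m-1$ | $nl\sum_{j=1}^{m}\hat{E}_j^2$ | $\frac{nl}{m-1}\sum_{j=1}^{m}E_j^2 + \sigma_s^2$
DxE | $(m-1)(n-1)$ | $l\sum_{i=1}^{n}\sum_{j=1}^{m}(\widehat{DE})_{ij}^2$ | $\frac{l}{(m-1)(n-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(DE)_{ij}^2 + \sigma_s^2$
Error | $(l-1)n(m-1)$ | | $\sigma_s^2$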
As has already been pointed out, the constructed data would only be appropriate for the model that reflects the way in which the experiment was carried out. However, to illustrate the analysis the same data from Table 2.1 will be used. The ANOVA table for the tablet formulation data, assuming that the experiment was run according to arrangement (II), is shown in Table 2.19. For this arrangement, higher-order interactions are assumed to be negligible and their sums of squares are pooled to give an estimate of error. From the ANOVA table it appears that in the subplot analysis there are real effects due to both Temperature and Humidity as well as the interaction between B and Humidity. Further analysis would reveal that both the Temperature and Humidity effects are predominantly linear rather than quadratic. A normal plot of the whole-plot contrasts involving the design variables could be constructed and, in this example, would indicate the importance of factor A.
TABLE 2.19
ANOVA FOR THE TABLET FORMULATION DATA OF TABLE 2.1 FOR ARRANGEMENT (II)

Source | df | SS | MS | F-ratio
Whole-Plot (Design) | | | |
A | 1 | 938.9 | 938.9 |
B | 1 | 60.5 | 60.5 |
C | 1 | 144.5 | 144.5 |
AxB | 1 | 24.5 | 24.5 |
AxC | 1 | 40.5 | 40.5 |
BxC | 1 | 18.0 | 18.0 |
AxBxC | 1 | 72.0 | 72.0 |
Env | | | |
Temp(T) | 2 | 1204.1 | 602.1 | 9.86
Humidity(H) | 2 | 1199.4 | 599.7 | 9.82
TxH | 4 | 350.5 | 87.6 | 1.43
Design x Env | | | |
AxT | 2 | 62.1 | 31.1 | 0.51
AxH | 2 | 17.0 | 8.5 | 0.14
BxT | 2 | 399.0 | 199.5 | 3.25
BxH | 2 | 799.1 | 399.5 | 6.52
CxT | 2 | 65.3 | 32.7 | 0.53
CxH | 2 | 97.6 | 48.8 | 0.66
Higher Order Interactions | 44 | 2687.5 | 61.1 |

$F_{1,44}(.05) = 4.06$, $F_{2,44}(.05) = 3.21$, $F_{1,44}(.01) = 7.25$, $F_{2,44}(.01) = 5.12$
Design (III): strip-block design

Let us now consider an experimental arrangement where the subplot levels are assigned randomly in strips across each block of whole-plot levels. Such arrangements are frequently called strip-block designs. As an illustration of this arrangement, suppose that we have a whole-plot variable with three levels, $a_1$, $a_2$, and $a_3$, a subplot variable with two levels, $b_1$ and $b_2$, and three blocks. Pictorially, the design is represented by the following figure (Figure 2.8).

Block 1: a2 a1 a3
Block 2: a3 a2 a1
Block 3: a1 a3 a2

Figure 2.8 Strip-block design

There is a certain symmetry in the allocation of the two variables, to the extent that both variables could equally well be designated as the whole-plot variable. Suppose, as before, that there are $m$ levels of the environmental variables, $E_1, E_2, \ldots, E_j, \ldots, E_m$, that there are $n$ levels of the design variables, $D_1, D_2, \ldots, D_i, \ldots, D_n$, and that there are $l$ replicates, $r_1, r_2, \ldots, r_k, \ldots, r_l$. For the tablet formulation example, the strip-block arrangement would arise if $n \times l = 8$ tablet formulation batches were made according to the design combinations and each of these batches was divided into $m \times l = 9$ sub-batches. One sub-batch from each of the eight batches is then selected at random and these eight are placed in the chamber at the same time at the appropriate setting of temperature and humidity. This design would require only $n = 8$ tablet formulation batches to be made and only $m = 9$ operations of the chamber. Strip-block experiments, such as the one described in this section, are clearly considerably easier to run than either the completely randomized cross-product design or either of the split-plot designs described above, that is, arrangements (I) and (II).

The model appropriate for the strip-block arrangement is

$$y_{ijk} = \mu + r_k + E_j + \eta_{jk} + D_i + \theta_{ik} + (DE)_{ij} + \varepsilon_{ijk}, \qquad (44)$$

where, as before, $y_{ijk}$ is the response of the $k$th replicate of the $i$th level of factor D and the $j$th level of factor E, $\mu$ is the overall mean, $r_k$ is the random effect of the $k$th replicate, with $r_k \sim N(0, \sigma_r^2)$, $E_j$ is the fixed effect of the $j$th level of E, $D_i$ is the fixed effect of the $i$th level of D, and $(DE)_{ij}$ is the interaction effect of the $i$th level of D with the $j$th level of E. In arrangement
(III), $\eta_{jk} \sim N(0, \sigma_E^2)$, $\theta_{ik} \sim N(0, \sigma_D^2)$, $\varepsilon_{ijk} \sim N(0, \sigma^2)$, and $\eta_{jk}$, $\theta_{ik}$, and $\varepsilon_{ijk}$ are independent.

The ANOVA table is shown in Table 2.20. This table shows that the sources of variation can be split into three parts. The mean square for the environmental variables is tested against that of RxE; the mean square for the design variables is tested against that of RxD; and the mean square for the design x environment interactions is tested against that of RxDxE. When there is no replication and the design is sufficiently large, three normal plots can be constructed: one for the design variable contrasts, one for the environmental variable contrasts, and one for the design x environment interaction contrasts. Alternatively, $(\sigma^2 + n\sigma_E^2)$ could be estimated by pooling the higher-order interactions among the environmental variables, $(\sigma^2 + m\sigma_D^2)$ by pooling the higher-order interactions among the design variables, and $\sigma^2$ by pooling the higher-order interactions among the design x environment interactions.
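The testing rules stated for arrangements (I), (II) and (III) can be collected in a small lookup table. The Python sketch below does this and forms an F-ratio against the appropriate error term; the names `ERROR_TERM` and `f_ratio` are illustrative, and the mean squares in the usage example are hypothetical numbers chosen only to show the mechanics.

```python
# Which mean square each effect is tested against, per arrangement,
# following the split of the ANOVA tables described in the text.
ERROR_TERM = {
    "I":   {"Env": "RxE", "Design": "Error", "DxE": "Error"},
    "II":  {"Design": "RxD", "Env": "Error", "DxE": "Error"},
    "III": {"Env": "RxE", "Design": "RxD", "DxE": "RxDxE"},
}


def f_ratio(arrangement: str, effect: str, mean_squares: dict) -> float:
    """F-ratio of an effect against the error term for its stratum."""
    denominator = ERROR_TERM[arrangement][effect]
    return mean_squares[effect] / mean_squares[denominator]


# Hypothetical mean squares for a replicated strip-block experiment.
ms = {"Env": 600.0, "RxE": 120.0, "Design": 90.0, "RxD": 45.0,
      "DxE": 30.0, "RxDxE": 10.0}

for effect in ("Env", "Design", "DxE"):
    print(effect, "tested against", ERROR_TERM["III"][effect],
          "F =", f_ratio("III", effect, ms))
```

The point of the lookup is that the same effect is compared with a different denominator depending on how the experiment was run, which is why the analysis has to follow the design actually used.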
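TABLE 2.20
ANOVA TABLE FOR ARRANGEMENT (III)

Source | df | Sum of Squares | Expected Mean Squares
Reps(R) | $l-1$ | | $mn\sigma_r^2 + \sigma^2 + m\sigma_D^2 + n\sigma_E^2$
Env(E) | $m-1$ | $nl\sum_{j=1}^{m}\hat{E}_j^2$ | $\frac{nl}{m-1}\sum_{j=1}^{m}E_j^2 + \sigma^2 + n\sigma_E^2$
RxE | $(l-1)(m-1)$ | | $\sigma^2 + n\sigma_E^2$
Design(D) | $n-1$ | $lm\sum_{i=1}^{n}\hat{D}_i^2$ | $\frac{lm}{n-1}\sum_{i=1}^{n}D_i^2 + \sigma^2 + m\sigma_D^2$
RxD | $(l-1)(n-1)$ | | $\sigma^2 + m\sigma_D^2$
DxE | $(m-1)(n-1)$ | $l\sum_{i=1}^{n}\sum_{j=1}^{m}(\widehat{DE})_{ij}^2$ | $\frac{l}{(m-1)(n-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(DE)_{ij}^2 + \sigma^2$
RxDxE | $(l-1)(m-1)(n-1)$ | | $\sigma^2$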
To illustrate the analysis the same data given in Table 2.1 will be used, assuming that the experiment was conducted as a strip-block design. The ANOVA table for the tablet formulation data is given in Table 2.21.
As was discussed with arrangement (I), it is possible to split the two degrees of freedom for each of Temperature and Humidity into linear and quadratic contrasts and to construct a normal probability plot for the environmental variable contrasts. This would reveal important effects due to the linear components of both Temperature and Humidity. A normal plot for the design contrasts would indicate that there appears to be a real effect due to A. The analysis of the design x environment interactions is obtained by pooling together higher-order interactions to obtain an estimate of the error term $\sigma^2$. This analysis indicates that the interaction between B and Humidity appears to be the only real interaction effect.
TABLE 2.21
ANOVA FOR THE TABLET FORMULATION DATA OF TABLE 2.1 FOR ARRANGEMENT (III)

Source | df | SS | MS | F-ratio
Temp(T) | 2 | 1204.1 | 602.1 |
Humidity(H) | 2 | 1199.4 | 599.7 |
TxH | 4 | 350.5 | 87.6 |
A | 1 | 938.9 | 938.9 |
B | 1 | 60.5 | 60.5 |
C | 1 | 144.5 | 144.5 |
AxB | 1 | 24.5 | 24.5 |
AxC | 1 | 40.5 | 40.5 |
BxC | 1 | 18.0 | 18.0 |
AxBxC | 1 | 72.0 | 72.0 |
AxT | 2 | 62.1 | 31.1 | 0.51
AxH | 2 | 17.0 | 8.5 | 0.14
BxT | 2 | 399.0 | 199.5 | 3.25
BxH | 2 | 799.1 | 399.5 | 6.52
CxT | 2 | 65.3 | 32.7 | 0.53
CxH | 2 | 97.6 | 48.8 | 0.66
Higher Order Interactions | 44 | 2687.5 | 61.1 |

$F_{2,44}(.05) = 3.21$, $F_{2,44}(.01) = 5.12$
2.4.2 Precision of split-plot designs

The above descriptions of alternative split-plot arrangements show that the ease of experimentation can vary depending on the particular experimental design followed. However, in considering which experimental design to adopt, the investigator should weigh many other criteria besides ease of experimentation. One important criterion is the precision of the estimates of the effects that these arrangements yield. It can be shown (see, for example, Kempthorne [18], Box and Jones [5]) that for both designs (I) and (II) the whole-plot effects are determined less precisely than with the cross-product design, but that the subplot variables and the design x environment interactions are determined more precisely. For the strip-block design both the design and the environmental variable effects are determined less precisely than with the cross-product design, but the design x environment interactions are determined more precisely. When the strip-block design is compared with both split-plot designs (I) and (II), the whole-plot effects are determined with the same precision, the subplot effects are determined with less precision, but the design x environment interactions are determined with more precision.

2.4.3 Variants of split-plot designs

Adaptations of the split-plot methodology have been suggested by many authors (see, for example, Kempthorne [18], Cochran and Cox [44]). These authors describe various blocking arrangements to control for other sources of variation in split-plot experiments. The relevance of some of these arrangements to split-plot designs that investigate the influence of environmental variation is discussed in Box and Jones [5]. One of the most useful adaptations of the split-plot design is the confounding of higher-order split-plot interactions when the split-plot treatments are in a factorial design, first suggested in Bartlett [45]. Such an experimental design requires fewer subplots within each whole plot, but still enables the required effects to be estimated. When the whole-plot design is a factorial design, it may be possible to reduce the number of whole plots required by confounding certain whole-plot interactions. The use of factorial and fractional factorial designs in split-plot arrangements has been investigated by Addelman [46]; see also Daniel [47].

As an example of such an arrangement, consider a tablet formulation experiment with two environmental variables, temperature (T) and humidity (H), and five design variables, A, B, C, D, and E, with all of the
variables at two levels. Suppose that it has been decided that the environmental variables will be assigned to the whole plots and the design variables to the subplots (design arrangement (I)). Suppose also that the chamber can hold no more than 20 batches of tablets at a time. With this constraint it is no longer possible to use a full factorial for the design factors within each whole plot (run of the chamber), since this would require $2^5 = 32$ tablet batches for each run of the chamber. An alternative would be to use a half-fraction of the design variables for each run of the chamber. Such a design, before randomization, is shown in Table 2.22. With this design the ABCDE five-factor interaction is confounded with the TxH whole-plot contrast. Under the assumption of negligible three-factor and higher-order interactions, all main effects and two-factor interactions can be estimated, as well as interactions between the design and the environmental variables.

2.4.4 Analysis of split-plot designs for robust experimentation

The appropriate analysis of data obtained from an experiment should be determined by the experimental design used to obtain those data. The fundamental characteristic of split-plot designs is that there are experimental units of different sizes and consequently multiple sources of variation. The analysis needs to take account of this structure, include multiple error terms, and test the significance of effects and interactions against the appropriate error term. This has been illustrated above with the three experimental arrangements for split-plot and strip-block designs. If the experiment is replicated, then an independent estimate of the error terms can be calculated and valid statistical tests, such as ANOVA, can be constructed. In split-plot and strip-block designs that are unreplicated, no independent estimate of the appropriate error terms is available. Several alternative analysis approaches have been advocated.
A number of authors, for example Mason, Gunst, and Hess [48] (p. 370), suggest estimating the error terms by combining higher-order interactions that are assumed to be negligible. An alternative is to construct separate normal or half-normal probability plots for the effects and interactions calculated from the different types of experimental units. Then, under the assumption of effect sparsity, the slopes of the lines through the inert effects can be used to estimate the separate error terms. An alternative approach, which also depends on the assumption of effect sparsity, suggested by Box and Jones [5], would be to construct separate
TABLE 2.22
EXAMPLE OF A SPLIT-PLOT DESIGN USING A FRACTIONAL FACTORIAL
Chamber Run 1: Temp = -, Humidity = -
Chamber Run 2: Temp = +, Humidity = -
Chamber Run 3: Temp = -, Humidity = +
Chamber Run 4: Temp = +, Humidity = +
(Under each chamber run the table lists the -/+ levels of A, B, C, D, and E for the tablet batches in that run's half-fraction.)
Bayesian probability plots (Box and Meyer [49]) for the contrasts from different types of experimental units. The split-plot arrangement has similarities with Taguchi's crossed designs since both arrangements divide the factors into two groups; in the split-plot terminology the factors are assigned to either whole plots or subplots, in Taguchi's terminology the factors are either assigned to an inner (design) array or an outer (noise) array. Although there are similarities in the appearance of the designs, there are marked differences in the analysis of these designs. Some of these differences reflect different philosophical approaches to data analysis. Taguchi's analysis of robust design experiments is frequently conducted in terms of a performance statistic, such as a signal-to-noise ratio, that is calculated for each point of the design array using data obtained from the environmental (noise) array about that point.
The use of a single-valued summary statistic, such as a signal-to-noise ratio, has been considered in Box [30]. One of the criticisms of these signal-to-noise ratios is that they can obscure information that is contained in the data and thus limit the impact that the experimenter can have as they study the data. From an analysis of a signal-to-noise ratio, the experimenter does not know which of the environmental variables have significant interactions with the design variables and the magnitude of those interactions. This information, if it were available, might suggest ideas to the experimenter as to the underlying scientific theory which might influence the future course of the investigation.
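The loss of information described above can be seen in a small numerical sketch. The Python example below uses made-up responses for two hypothetical design settings observed at four temperature/humidity conditions, and it assumes one common "nominal-the-best" form of the signal-to-noise ratio, $10\log_{10}(\bar{y}^2/s^2)$; neither the data nor the choice of ratio comes from the text.

```python
import math
from statistics import mean, variance

# Four environmental conditions (temperature, humidity) in coded units.
env = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]

# Hypothetical responses for two design settings (illustrative numbers only).
responses = {
    "setting X": [50 + 5 * t for t, h in env],  # sensitive to temperature only
    "setting Y": [50 + 5 * h for t, h in env],  # sensitive to humidity only
}


def signal_to_noise(y):
    """Assumed nominal-the-best ratio: 10 * log10(mean^2 / variance)."""
    return 10 * math.log10(mean(y) ** 2 / variance(y))


def env_effect(y, column):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [v for v, e in zip(y, env) if e[column] > 0]
    low = [v for v, e in zip(y, env) if e[column] < 0]
    return mean(high) - mean(low)


summary = {
    name: {
        "sn": signal_to_noise(y),
        "temp_effect": env_effect(y, 0),
        "hum_effect": env_effect(y, 1),
    }
    for name, y in responses.items()
}

for name, s in summary.items():
    print(f"{name}: S/N = {s['sn']:.2f}, "
          f"temperature effect = {s['temp_effect']}, "
          f"humidity effect = {s['hum_effect']}")
```

Both settings give the same signal-to-noise ratio, yet one is sensitive only to temperature and the other only to humidity. Only the individual environmental effects at each design setting, which are the ingredients of the design x environment interactions, reveal the difference.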
Figure 2.9 Interaction of factor B and humidity in data from Table 2.1 (horizontal axis: Humidity at levels -1, 0, +1)

As was discussed in Section 2.3.4, the effects of the environmental variables on the design variables can be determined by studying the interactions between the design and environmental variables. To illustrate this, for the tablet formulation with the split-plot design arrangement (I) it was concluded that there was a significant interaction between humidity and factor B (see Table 2.17). This interaction is illustrated in Figure 2.9. From this figure it can be seen that by using the +1 setting of B the tablet is less sensitive to changes in humidity. This information could be important to the subject matter specialist who might know of similar
constituents that could be included in the tablet formulation to make it even more robust to changes in humidity. It is the investigation of these design x environment interactions that is the key to understanding robustness and that could lead to new aspects in the investigation and to significant improvements in the robustness of products. Therefore, a preferred analysis is one that identifies the significance and magnitude of individual design x environment interactions, rather than an analysis in terms of a signal-to-noise ratio or a standard deviation that would obscure information present in the individual interactions.

We have seen that split-plot and strip-block designs can be considerably more convenient to conduct than the cross-product designs advocated by Taguchi. In particular, since in robust designs we are not specifically interested in the main effects of the environmental variables and would be prepared to accept lower precision in our estimates of these main effects, it would generally be more appropriate to have the environmental variables as whole-plot factors, as in arrangement (I). Alternatively, a strip-block design, arrangement (III), might be the most convenient design and would yield precise estimates of the key design x environment interactions.

With regard to analysis, since Taguchi's analysis is frequently conducted in terms of a single performance measure, such as a signal-to-noise ratio, that is calculated for each point of the design array, it ignores any information that might be contained in particular design x environment interactions. It is these interactions that are key to understanding the robustness of product designs. In contrast, the split-plot and strip-block designs enable efficient estimation of these interactions.
Therefore, split-plot designs and strip-block designs are of tremendous value in robust design experiments since they permit the precise estimation of the interactions of interest and can be considerably easier to run than the cross-product designs that have traditionally been advocated.
2.5 CONCLUSIONS

The concept of designing products and processes that are robust, or stable, to environmental variation is clearly very important. Robust design enables the experimenter to discover how to modify the design of a product or process to minimize the effects due to variation from environmental sources that are difficult, if not impossible, to control.
In this chapter the use of statistical experimental designs in designing products and processes to be robust to environmental conditions has been considered. The focus has been on two classes of experimental design, response surface designs and split-plot designs. The choice of an appropriate experimental design depends on the experimental circumstances. Box and Draper [12] (p. 502, 305) list a series of experimental circumstances that should be considered by the investigator when selecting a response surface design. Many of these considerations also apply to split-plot designs, and to experimental design in general. In the response surface strategy that was discussed in Section 2.3 standard response surface techniques are used to generate two response surface models, one for the mean response and one for the standard deviation of the response (or some function of the standard deviation). The standard deviation measures the stability of the response to the environmental variation. Standard analysis can reveal which factors affect the mean only, which only affect the variability, and which affect both the mean and the variability. The researcher can then apply optimization methods or construct contour plots of the mean and standard deviation response surfaces to determine settings of the design variables that will give a mean response that is close to the target with minimum variation. Taguchi's designs, a cross-product of two experimental designs, one for design variables and one for environmental variables, can require an excessive number of runs. In Section 2.3.3 it was shown how the number of runs can be substantially reduced by constructing a single experimental design that combined both the design variables and the environmental variables. The designs associated with response surface methodology offer considerable flexibility and can be built sequentially so that experimental resources can be used efficiently. 
The analysis proposed by Taguchi involves the construction of a signal-to-noise ratio that combines both the mean and the variability. Thus, with Taguchi's analysis there is a missed opportunity for a deeper understanding of the different variables that might affect the mean and the variability. As has been noted in this chapter, any restriction on the randomization of the experiment will lead the investigator to conduct one of the split-plot designs that were described in Section 2.4. In that section it was shown that the split-plot type designs can be a more efficient way to run robust design experiments than the cross-product arrays of Taguchi. Furthermore, the standard methods of analysis of split-plot experiments, that seek to
estimate individual design x environment interactions, will yield more information than the signal-to-noise ratios proposed by Taguchi.
ACKNOWLEDGEMENTS The author expresses his appreciation to Denis Janky, David Rose, and Rod Tjoelker for their helpful comments on earlier versions of this chapter.
REFERENCES

[1] F. Yates and W.G. Cochran, The analysis of groups of experiments, Journal of Agricultural Science, 28 (1938) 556-580.
[2] G. Wernimont, Ruggedness evaluation of test procedures, Standardization News, 5 (1977) 13-16.
[3] W.J. Youden, Experimental design and ASTM committee, Materials Research and Standards, 1 (1961) 862-867. Reprinted in Precision Measurement and Calibration (Vol. 1, Special Publication 300), Gaithersburg, MD: National Bureau of Standards, 1969, ed. H.H. Ku.
[4] W.J. Youden, Physical measurement and experimental design, 1961. Reprinted in Precision Measurement and Calibration (Vol. 1, Special Publication 300), Gaithersburg, MD: National Bureau of Standards, 1969, ed. H.H. Ku.
[5] G.E.P. Box and S.P. Jones, Split-plot designs for robust product experimentation, Journal of Applied Statistics, 19 (1992) 3-26.
[6] G.E.P. Box and K.B. Wilson, On the experimental attainment of optimum conditions, Journal of the Royal Statistical Society, Series B, 13 (1951) 1-45.
[7] G.E.P. Box, The exploration and exploitation of response surfaces: some general considerations and examples, Biometrics, 10 (1954) 16-60.
[8] G.E.P. Box and P.V. Youle, The exploration and exploitation of response surfaces: an example of the link between the fitted surface and the basic mechanism of the system, Biometrics, 11 (1955) 287-323.
[9] G.E.P. Box, J.S. Hunter and W.G. Hunter, Statistics for Experimenters, New York, Wiley, 1978.
[10] J.A. Cornell, How to Apply Response Surface Methodology. Basic References in Quality Control: Statistical Techniques, Vol. 8. Milwaukee: American Society for Quality Control, 1985.
[11] R.H. Myers, Response Surface Methodology, Boston, Allyn and Bacon, 1971.
[12] G.E.P. Box and N.R. Draper, Empirical Model Building and Response Surfaces, New York, Wiley, 1986.
[13] A.I. Khuri and J.A. Cornell, Response Surfaces: Designs and Analyses, New York, Marcel Dekker, 1987.
[14] R.L. Plackett and J.P. Burman, The design of optimum multifactorial experiments, Biometrika, 33 (1946) 305-325.
[15] N.R. Draper and D.K.J. Lin, Projection properties of Plackett and Burman designs, Technometrics, 34 (1992) 423-428.
[16] M. Hamada and C.F.J. Wu, Analysis of designed experiments with complex aliasing, Journal of Quality Technology, 24 (1992) 130-137.
[17] G.E.P. Box and R.D. Meyer, Finding the active factors in fractional screening experiments, Journal of Quality Technology, 25 (1993) 94-105.
[18] O. Kempthorne, The Design and Analysis of Experiments, New York, Wiley, 1952.
[19] G.E.P. Box and D.W. Behnken, Some new three-level designs for the study of quantitative variables, Technometrics, 2 (1960) 455-475.
[20] G.E.P. Box and S.P. Jones, Robust product designs, part II: second-order models, Report No. 63 (1990), Center for Quality and Productivity Improvement, University of Wisconsin-Madison.
[21] G.E.P. Box and S.P. Jones, Robust product designs, part I: first-order models with design x environment interactions, Report No. 62 (1990), Center for Quality and Productivity Improvement, University of Wisconsin-Madison.
[22] A.C. Atkinson and A.N. Donev, Optimum Experimental Designs, New York, Oxford University Press, 1992.
[23] T.J. Mitchell, An algorithm for the construction of D-optimal experimental designs, Technometrics, 16 (1974) 203-210.
[24] G.E.P. Box, Choice of response surface design and alphabetic optimality, Utilitas Mathematica, 21B (1982) 11-55.
[25] H.O. Hartley, Smallest composite designs for quadratic response surfaces, Biometrics, 15 (1959) 611-624.
[26] P.W.M. John, Statistical Design and Analysis of Experiments, New York, Macmillan, 1971.
[27] R.A. Maclean and V.L. Anderson, Applied Factorial and Fractional Designs, New York, Marcel Dekker, 1984.
[28] D.H. Doehlert, Uniform shell designs, Journal of the Royal Statistical Society, Series C, 19 (1970) 231-239.
[29] M.S. Bartlett and D.G. Kendall, The statistical analysis of variance heterogeneity and the logarithm transformation, Journal of the Royal Statistical Society, Series B, 8 (1946) 128-150.
[30] G.E.P. Box, Signal-to-noise ratios, performance criteria, and transformations (with discussion), Technometrics, 30 (1988) 1-40.
[31] G.G. Vining and R.H. Myers, Combining Taguchi and response surface philosophies: a dual response approach, Journal of Quality Technology, 22 (1990) 38-45.
[32] R.H. Myers and W.H. Carter, Jr., Response surface techniques for dual response systems, Technometrics, 15 (1973) 301-317.
[33] A.E. Hoerl, Optimum solution of many variables equations, Chemical Engineering Progress, 55 (1959) 69-78.
[34] E. Del Castillo and D.C. Montgomery, A nonlinear programming solution to the dual response problem, Journal of Quality Technology, 25 (1993) 199-204.
[35] A.E. Freeny and V.N. Nair, Robust parameter design with uncontrolled noise variables, Statistica Sinica, 2 (1992).
[36] W.J. Welch, T.K. Yu, S.M. Kang and J. Sacks, Computer experiments for quality control by parameter design, Journal of Quality Technology, 22 (1990) 15-22.
[37] A.C. Shoemaker, K.L. Tsui and C.F.J. Wu, Economical experimentation methods for robust design, Technometrics, 33 (1991) 415-427.
[38] G.E.P. Box and S.P. Jones, Designing products that are robust to the environment, Total Quality Management, 3 (1992) 265-282.
[39] D.M. Steinberg and D. Bursztyn, Dispersion effects in robust-design experiments with noise factors, Journal of Quality Technology, 26 (1994) 12-20.
[40] R.H. Myers, A.I. Khuri and G.G. Vining, Response surface alternatives to the Taguchi robust parameter design approach, American Statistician, 46 (1992) 131-139.
[41] R.H. Myers, Response surface methodology in quality improvement, Communications in Statistics: Theory and Methods, 20 (1991) 457-476.
[42] D.R. Cox, Planning of Experiments, New York, Wiley, 1958.
[43] C. Daniel, Use of half-normal plots in interpreting factorial two-level experiments, Technometrics, 1 (1959) 311-341.
[44] W.G. Cochran and G.M. Cox, Experimental Design, New York, Wiley, 1957.
[45] M.S. Bartlett, Discussion of “Complex experiments”, by F. Yates, Journal of the Royal Statistical Society, Series B, 2 (1935) 224-226.
[46] S. Addelman, Some two-level factorial plans with split-plot confounding, Technometrics, 6 (1964) 253-258.
[47] C. Daniel, Applications of Statistics to Industrial Experimentation, New York, Wiley, 1976.
[48] R.L. Mason, R.F. Gunst and J.L. Hess, Statistical Design and Analysis of Experiments, New York, Wiley, 1989.
[49] G.E.P. Box and R.D. Meyer, An analysis for unreplicated fractional factorials, Technometrics, 28 (1986) 11-18.
Chapter 3
REVIEW OF THE USE OF ROBUSTNESS AND RUGGEDNESS IN ANALYTICAL CHEMISTRY
Y. VANDER HEYDEN AND D.L. MASSART
ChemoAC, Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussels, Belgium
3.1 INTRODUCTION
This review describes the determination of robustness and ruggedness in analytical chemistry. The terms ruggedness and robustness as used in method validation are sometimes considered to be equivalent [1,2]. In other publications a distinction is made between the two terms [3]. In the following only the term ruggedness will be used. The ruggedness of an analytical method can generally be described as the ability to reproduce an analytical method in different laboratories or in different circumstances without the occurrence of unexpected differences in the obtained results. A ruggedness test is a part of method validation (Table 3.1) and can be considered as a part of the precision evaluation [2,4,5]. Ruggedness is related to repeatability and reproducibility. Some definitions for ruggedness come very close to those for reproducibility. Certain interpretation methods to identify the significant factors in a ruggedness test use criteria based on results for repeatability or reproducibility. These two items will be considered in Section 3.4.7. The validation of analytical methods is becoming increasingly important, particularly in the pharmaceutical industry. This is due, among other reasons, to the regulations imposed by the drug regulatory agencies [6]. Ruggedness testing should be performed for nearly all analytical methods used in pharmaceutical and biopharmaceutical analysis [2,7], as can be seen in Table 3.2. However, until now no uniform ruggedness testing procedure
exists. This has led to a variety of approaches, proposed by different authors, for the different steps in a ruggedness test. These many approaches and the complexity of the ruggedness tests are two reasons why a ruggedness test is often not performed. The different steps in a ruggedness test will be discussed in this review.
TABLE 3.1
PERFORMANCE CRITERIA FOR METHOD VALIDATION [2,4,5]
* Bias: systematic errors
* Accuracy: random errors and systematic errors
* Precision: random errors
  - repeatability
  - intermediate precision estimate
  - reproducibility
  - ruggedness
* Specificity and selectivity:
  - interference
  - peak purity
* Range
* Linearity
* Sensitivity
* Limits:
  - detection limit, limit of detection (LOD)
  - limit of quantitation (LOQ)
    o lower limit of quantitation (LLQ)
    o higher limit of quantitation (HLQ)
3.2 PLACE OF RUGGEDNESS TESTING IN METHOD VALIDATION
As already mentioned in the introduction, ruggedness is a part of the precision evaluation. Precision is a measure of random errors. Random errors cause imprecise measurements. Systematic errors are another kind of error that can occur. They cause inaccurate results and are measured in terms of bias. The total error is defined as the sum of the systematic and random errors.
TABLE 3.2
OVERVIEW OF METHOD VALIDATION IN PHARMACEUTICAL AND BIOPHARMACEUTICAL ANALYSIS (REPRINTED, WITH PERMISSION, FROM [2])
Columns (type of analytical-chemical method): Confirmation of the identity of pure substances; Determination of identity of unknown substances; Amount single pure substance; Amount active substance; Limit test (semiquantitative); Amount impurities/degradation products (quantitative); Dissolution speed of substances; Bioequivalence studies.
Rows (validation parameter): Accuracy; Linearity; Precision; Ruggedness; Specificity; Selectivity; Sensitivity; Limit of detection; Limit of quantitation; Range.
[The individual cell entries (yes / no / *) are not legible in this copy.]
yes = always required; no = not required; * = not always required (depending on the judgement of the experimenter)
From a statistical point of view, precision measures the dispersion of the results around the mean, irrespective of whether that mean is a correct representation of the true value. This requires the calculation of a standard deviation. How this is done depends on the context. Two types of precision are usually distinguished, namely repeatability and reproducibility. Repeatability is the precision obtained under the best possible circumstances (same analyst, one instrument, within one day when possible) and reproducibility is the precision obtained under the most adverse possible circumstances (different laboratories, different analysts, different instruments, longer periods of time, etc.). Reproducibility can be determined only with interlaboratory experiments. Intermediate situations may and do occur. They are, for instance, defined in terms of M-factor-different intermediate precision measures, where M is one, two, three or even higher [8,9]. In this definition M refers to the number of factors that are varied to make the estimation. The most likely factors to be varied are time, analyst and instrument. According to this terminology, one estimates e.g. the time-and-analyst-different intermediate precision measure (M=2) when the precision is determined by measuring a sample over a longer period of time in one laboratory by two analysts with one instrument. A protocol on collaborative studies [10] also considers what are called preliminary estimates of precision. Among these the protocol defines the “total within-laboratory standard deviation”. This includes both the within-run or intra-assay variation (= repeatability) and the between-run or inter-assay variation. The latter means that one has measured on different days and preferably has used different calibration curves. It can be considered as a within-laboratory reproducibility. These estimates can be determined prior to an interlaboratory method performance study.
The total within-laboratory standard deviation may be estimated from ruggedness trials [10]. A third term in the context of precision is robustness or ruggedness. The result of a ruggedness test indicates how tightly controlled the experimental factors should be. The detection of factors that heavily influence the results of a method leads eventually to a more reproducible method.
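As a hedged illustration of how a total within-laboratory standard deviation can combine a within-run (repeatability) and a between-run component, the sketch below applies a one-way analysis of variance to replicate results grouped by run. The function name, the example data and the balanced-design assumption (the same number of replicates in every run) are illustrative assumptions; they are not taken from the protocol [10].

```python
from statistics import mean

def within_lab_precision(runs):
    """Estimate repeatability, between-run and total within-laboratory
    standard deviations from a balanced set of runs (one-way ANOVA).

    runs: list of lists; each inner list holds the replicate results of
    one run (e.g. one day). All runs must have the same length.
    """
    p = len(runs)        # number of runs
    n = len(runs[0])     # replicates per run
    if any(len(r) != n for r in runs):
        raise ValueError("balanced design assumed")
    run_means = [mean(r) for r in runs]
    grand_mean = mean(run_means)
    # Within-run mean square: estimate of the repeatability variance.
    ms_within = sum((y - m) ** 2
                    for r, m in zip(runs, run_means) for y in r) / (p * (n - 1))
    # Between-run mean square.
    ms_between = n * sum((m - grand_mean) ** 2 for m in run_means) / (p - 1)
    # Between-run variance component (set to zero if the estimate is negative).
    var_between = max((ms_between - ms_within) / n, 0.0)
    var_total = ms_within + var_between
    return ms_within ** 0.5, var_between ** 0.5, var_total ** 0.5

# Hypothetical data: three runs (days) with duplicate measurements.
s_r, s_between, s_total = within_lab_precision(
    [[10.0, 10.2], [10.4, 10.6], [9.8, 10.0]])
print(s_r, s_between, s_total)
```

The variances, not the standard deviations, are added; the total standard deviation is the square root of the summed variance components.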
3.3 DEFINITIONS OF RUGGEDNESS
The terms “ruggedness” and “ruggedness test” were introduced in analytical chemistry by Youden and Steiner [11]. They proposed to perform an experiment in which one can verify whether certain factors of the test procedure have an influence on the response of a method or not. If none of the investigated factors shows an influence on the response, the method is considered to be “rugged” or “robust”. They introduced the ruggedness test because in collaborative tests it was not unusual to observe unexpected results for an analytical procedure which performed well in the laboratory that developed the method. The explanation for this was found in the fact that the initiating laboratory has a set of conditions, operations and equipment which do not vary. Transferring the procedure to another laboratory causes small changes in a number of these conditions, operations and equipment properties. Some of these produce changes in the response of the method which then lead to the unexpected results in the collaborative test. By performing the ruggedness test the factors causing difficulties in a collaborative test were tracked and could be controlled strictly, thereby avoiding disappointments. Agencies or authorities such as ISO or IUPAC still do not provide any definition of ruggedness. In the chemical literature, however, a ruggedness test was defined as [4,12]: “An intralaboratory experimental study in which the influence of small changes in the operating or environmental conditions on measured or calculated responses is evaluated. The changes introduced reflect the changes that can occur when a method is transferred between different laboratories, different experimentators, different devices, etc.”. Some pharmaceutical sources give definitions for ruggedness. Not all definitions, however, are the same.
The definition which comes closest to the chemical definition mentioned above is the one of the French Guide for Validation of Analysis Methods [13]. This Guide states that “the ruggedness of an analysis procedure is its capacity to yield exact results in the presence of small changes of experimental conditions such as might occur during the utilisation of these procedures”. It continues by defining that by a small change in experimental conditions is meant “any deviation of a parameter of the procedure compared to its nominal value as described
in the method of analysis”. A definition similar to this one is also given in reference [2]. Some other sources have definitions that are different from the one given above [7,14]. The US Pharmacopeia [7] defines ruggedness as: “The ruggedness of an analytical method is the degree of reproducibility of test results obtained by the analysis of the same sample under a variety of normal test conditions, such as different laboratories, different analysts, different instruments, different lots of reagents, different elapsed assay times, different assay temperatures, different days, etc. Ruggedness is normally expressed as the lack of influence on test results of operational and environmental variables of the analytical method. Ruggedness is a measure of reproducibility of test results under normal, expected operational conditions from laboratory to laboratory and from analyst to analyst”. In fact this is nearly the definition of reproducibility. This definition is also followed by other authors [15]. The Canadian Acceptable Methods document [14] gives more or less a combination of the two definitions described above and considers 3 levels in the testing of the ruggedness of a method, with the third level being performed only rarely. Level one “requires verification of the basic insensitivity of the method to minor changes in environmental and operational conditions and should include verification of reproducibility by a second analyst”. The first part of this definition resembles the French Guide’s definition. The second part is a check on the adequacy of the method description and should be done without input from the original analyst. Level two “requires a verification of the effect of more severe changes in conditions, such as the use of chromatographic columns from different manufacturers or the substitution of different equipment, and should be performed in a different laboratory”.
This second level can be considered as being equivalent to the US Pharmacopeia (USP) definition. The third level of testing is then “a full collaborative testing” which is however done rarely. The extent to which a ruggedness test is performed depends on the general use of the method. Level 1 ruggedness testing is required for all methods. Level 2 testing is performed when a method is intended to be applied at multiple locations or in a number of laboratories at a single location. Collaborative studies are recommended if it is
intended that a method will be used in multiple locations using a variety of equipment. From the definitions given above it can be seen that there are two approaches to ruggedness testing (corresponding to levels 1 and 2 of the Acceptable Methods document [14]). In the first approach the factors to be examined are selected from the set of operating and environmental conditions that are or could be stipulated in the analytical procedure. Factors of this kind can be called procedure related factors. In the second approach non-procedure related factors are considered. Factors such as different laboratories, different analysts, different instruments, different lots of reagents, different days, different columns for HPLC methods or different plates for TLC methods are then examined. In the literature, ruggedness tests concern mainly procedure related factors, but occasionally one of the other factors, e.g. a column factor in HPLC, is examined. This will be discussed further in more detail (see Sections 3.4.2 and 3.4.4.4). The examination of the non-procedure related factors in ruggedness testing is described less frequently and requires another approach than the examination of procedure related factors. In Section 3.4 the strategy and the different possibilities for performing a ruggedness test when mainly procedure related factors are examined will be reviewed. In a later part (Section 3.5) ruggedness testing of non-procedure related factors will be discussed.
3.4 RUGGEDNESS TESTING OF PROCEDURE RELATED FACTORS
3.4.1 The steps of a ruggedness test
A ruggedness test requires an experimental design approach. It consists of the following steps:
1. Selection and identification of the operational or environmental factors to be investigated;
2. Selection of levels for the factors to be examined. In a ruggedness test 2 or 3 levels for each factor are normally considered. The ruggedness for the factors in the intervals between the factor levels is then investigated;
3. Selection of the experimental design;
4. Carrying out the experiments described in the design. This is the experimental part of the ruggedness test;
5. Computation of the effect of the factors on the response(s) of the method, to derive which factors might have experimentally relevant effects;
6. Statistical analysis of the results. In this part of the test statistically significant effects are identified;
7. Drawing chemically relevant conclusions;
8. When necessary, giving advice for improvement of the performance of the method and definition of suitability criteria.
The different steps described above will be explained in more detail in the following sections.
3.4.2 Selection of the factors
As a first step one selects a number of factors to examine. The selected factors should be chosen from the description of the analytical procedure or from environmental parameters which are not necessarily specified explicitly in the analytical method. The factors can be quantitative (continuous, numerical) or qualitative (discrete). The factors to be tested should represent those that are most likely to be changed when a method is transferred, for instance, between different laboratories, different devices, or over time, and that potentially could influence the response of the method. However, it is not always obvious which factors will influence a response and which will not. This is one of the reasons why screening designs are used (see Section 3.4.4). They allow a large number of factors to be screened in a relatively small number of experiments. In Table 3.3 a list of different factors investigated in different publications is given [1,4,13,16,17]. The list is not exhaustive and is only shown to give the reader an idea. Table 3.3 shows that many authors have not really understood the nature of ruggedness testing. For instance, changing the type of the reagents in sample preparation, the mobile phase in HPLC [13] or the type of acid to control the pH of the mobile phase (e.g. orthophosphoric acid, acetic acid, perchloric acid [17]) does not make sense. Factors of this kind are more likely to be examined in a screening design to eliminate factors that are not
significant during an optimization procedure [18]. In a ruggedness test, however, one starts from an optimized procedure which was already validated for precision and accuracy and which should not be changed in any detail. Some comments should be made:
The selection of the factor “type of acid” in a ruggedness test could be accepted when only the pH is specified by the method rather than the acid used to bring the solution or the buffer to the desired pH. Clearly, however, in such a case the method is poorly defined.
In references [4,13] the sample weight is entered as a covariable. This may make some sense if the design employed later allows the effect of other factors as influenced by the sample weight (interactions) to be measured. The design then should be able to estimate interaction effects (see Section 3.4.4). However, in certain cases, one investigates whether the factor sample weight influences the measured concentration. If it does, this would mean that the method is not able to perform the required analysis! Besides, this effect should already have been studied in previous experiments in method validation. Entering the sample weight as a factor can be useful when resolution is considered as the response. Then one is able to detect whether the sample weight influences the resolution between peaks in a chromatogram. However, this would have been studied better in the context of defining the limits of quantification and the range.
A group of factors causing problems are HPLC columns. Some articles [4,6,19] propose to include the factor “batch of material” or “manufacturer of material” in a two level design and do this by comparing two columns. However, it is far from evident that these two selected columns are extreme levels for the whole population of batches from one manufacturer or for the population of columns from different manufacturers. The problem could be tackled by examining more than two columns.
One possibility is to consider the column factors in the same way as the factors “different laboratories, different analysts, different instruments” and to examine these factors in a nested design (nested ANOVA) [20,21]. These designs will be discussed in Section 3.5. Another possibility is the use of (screening) designs where the factor of interest is examined at more than 2 levels. Some designs of this type are described in Section 3.4.4.5. One also has to realize that one is
able to examine only one of the three mentioned column factors (manufacturer, batch, age) at a time in a Plackett-Burman or a fractional factorial design. Entering two of them requires the use of nested designs (see Section 3.5). A problem that could occur and that is usually overlooked is the possible interaction between the column factor and the other factors of the design. This will be discussed further in Section 3.4.4.4.
3.4.3 Selection of the levels of the factors
In a second step the levels for the chosen factors are selected. For quantitative factors one considers a low and a high extreme level that are respectively smaller and larger than the nominal one. The nominal level is the level for the factor as it is given in the description of the procedure or the one that is most likely to occur in the case it is not specified in the analytical procedure. The levels for the factors are chosen in such a way that they represent the maximum difference in the values of the factors that could be expected to occur when a method is transferred from one laboratory to another without the occurrence of major errors [4]. In some publications only one extreme level is examined [11,22]. In these cases only the influence of changing the factors to one side of the nominal level is examined. Three levels (two extremes and the nominal) are selected if one cannot exclude a nonlinear behaviour of the response as a function of the change of the factors [6,17,19,23] (see Figure 3.1). If a factor that causes a nonlinear change of a response is examined in a two level design where its extreme levels were used, one could find a small effect, E(+1,-1), as is seen in Figure 3.1. One would conclude that the response is robust in the interval between the two extreme levels. However, changing the factor from the nominal to one of the extreme levels causes a considerable change in the response, E(+1,0) and E(0,-1). By examining this factor at three levels it will be observed that the response is not rugged in the interval between the two extreme levels.
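The argument for a third level can be made concrete with a small numerical sketch. The quadratic response function below, with its optimum at the nominal level, and its coefficients are purely hypothetical; they are chosen only to mimic the kind of curve described for Figure 3.1.

```python
def response(level):
    """Hypothetical response with its optimum near the nominal level (0).

    The quadratic form and the coefficients are assumptions made for
    illustration; they do not come from any measured method.
    """
    return 100.0 - 5.0 * level ** 2 + 0.5 * level

def effect(high, low):
    """Change in response when the factor goes from `low` to `high`."""
    return response(high) - response(low)

e_extremes = effect(+1, -1)   # E(+1,-1): what a two-level design observes
e_high = effect(+1, 0)        # E(+1,0): nominal level to high extreme
e_low = effect(0, -1)         # E(0,-1): low extreme to nominal level

print(e_extremes, e_high, e_low)
```

With these assumed coefficients the two-level comparison sees a change of only 1.0, while moving from the nominal level to either extreme changes the response by about 5 units, which is the situation that a three-level examination is meant to reveal.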
TABLE 3.3 [...] IN THE LITERATURE

Sample preparation:
- sample weight
- shaking or dissolution time
- sonication time
- temperature of sample preparation
- extraction volume
- wash volume
- centrifugation time
- pH of the solution
- composition of the reagents
- type of reagents

HPLC methods:
- pH of mobile phase
- amount of organic modifier
- buffer concentration or ionic strength
- concentration of tailing suppressor
- flow rate
- acid type in mobile phase
- age of the sample solution
- for gradient elution:
  * the factors concerning the mobile phase listed above could be considered for each mobile phase
  * steepness of the gradient
  * initial ratio of the mobile phases
  * final ratio of the mobile phases
- column factors:
  * batch number
  * manufacturer
  * age of the column
- detector factors: wavelength
- integration factors:
  * signal-to-noise ratio or sensitivity
  * method to draw baseline under a peak

TLC methods:
- batch of plates
- composition of the mobile phase
- developing temperature

GC methods [16]:
- injection temperature
- split flow
- liner type
- temperature rate
- sample matrix
- detector temperature
- column flow
[Figure 3.1: plot of the response (vertical axis) against the factor level (-1, 0, +1).]
Figure 3.1 Comparison of the observed change of the response by examination at 2 levels and the actual changes for an optimized factor having a non-linear response. E(+1,-1) = observed change when examined at the two extreme levels. E(+1,0); E(0,-1) = actual changes between method conditions (nominal level) and the extreme levels. -1 = low extreme level, 0 = nominal level, +1 = high extreme level.
A common error is to select levels that are too far apart from each other. In a ruggedness test one selects the extreme levels of the factors to be somewhat larger than the changes that would occur for this factor under normally changing conditions (different laboratories, etc.). In a number of published ruggedness tests one finds levels that are quite far from each other, much further than can occur by transferring a method between different laboratories. If one can prove that in this chosen interval the factor is rugged, this is of course excellent. However, since one does not know the effect of the factor in advance, one runs a considerable risk of finding a significant effect which is not relevant for the evaluation of the ruggedness. If in a method description the pH of the mobile phase is 5.0 then one normally should be able to work in an interval between 4.8 and 5.2. This then is the interval proposed to be examined in a ruggedness test and not, for example, 4.0 to 6.0. Examples of levels of factors that seem too far from each other and that were tested in different ruggedness tests are given in Table 3.4. It should be noted that the same designs as those applied here are sometimes used in the optimization stage of the method [18,24-27], i.e. before the validation. In that case it makes sense to apply levels that are further apart than in ruggedness testing.
TABLE 3.4
SOME LEVELS OF FACTORS THAT ARE TESTED WITH LARGE INTERVALS IN A RUGGEDNESS TEST (HPLC METHODS)

Factor             Levels as tested in the literature
pH                 nominal ± 1 [4]
                   nominal ± 0.5 [6,13,19]
flow rate          nominal ± 0.3 ml/min [13]
                   nominal ± 0.5 ml/min [6,19]
wavelength (UV)    nominal ± 8 to 12 nm [19]
3.4.4 Selection of the experimental design
To examine the ruggedness of the factors that were selected one could test these factors one variable at a time, i.e. change the level of one factor and keep all other factors at nominal level. The result of this experiment is then compared to the result of experiments with all factors at nominal level. The difference between the two types of experiments gives an idea of the effect of the factor in the interval between the two levels. The disadvantage of this method is that a large number of experiments is required when the number of factors is large. For this reason one prefers to apply an experimental design. In the literature a number of different designs are described, such as saturated fractional factorial designs and Plackett-Burman designs, full and fractional factorial designs, central composite designs and Box-Behnken designs [5]. In practice, however, most designs used for the determination of ruggedness are fractional factorials or of the Plackett-Burman type. For this reason we will pay more attention to these designs and to a number of related concepts, such as interaction, confounding, defining relations, aliases, etc. The fractional factorial and Plackett-Burman designs are also called screening designs [28] because they allow a large number of factors to be screened.
3.4.4.1 Full factorial designs
In a full factorial design all combinations between the different factors and the different levels are made. Suppose one has three factors (A, B, C) which will be tested at two levels (- and +). The possible combinations of these factor levels are shown in Table 3.5. Eight combinations can be made. In general, the total number of experiments in a two-level full factorial design is equal to $2^f$, with f being the number of factors. The advantage of the full factorial design compared to the one-factor-at-a-time procedure is that not only the effects of the factors A, B and C (main effects) on the response can be calculated but also the interaction effects of the factors. The interaction effects that can be considered here are three two-factor interactions (AB, AC and BC) and one three-factor interaction (ABC). From the $2^3$ full factorial design shown in Table 3.5 these seven effects can be calculated. An eighth statistic that can be obtained from this design is the mean result. From a $2^f$ full factorial design, therefore, $2^f$ statistics can be calculated. The
number of effects (statistics) belonging to each type (average; main and multiple-factor interaction effects) is given in Table 3.6 for different full factorial designs.
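TABLE 3.5
FULL FACTORIAL DESIGN FOR 3 FACTORS
Columns: Experiment (1 to 8); Factors A, B, C; Response (Y1 to Y8).
[The (+)/(-) entries are not legible in this copy; the eight experiments cover all combinations of the two levels of A, B and C.]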
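A two-level full factorial design can be generated mechanically. The following is a minimal sketch, assuming the widely used standard run order (first factor alternating fastest, first run with all factors at the low level); this ordering and the function name `full_factorial` are assumptions and need not coincide with the row order of the original Table 3.5.

```python
from itertools import product

def full_factorial(factors):
    """Return a two-level full factorial design as a list of dicts.

    Runs are listed in standard order (first factor alternating fastest,
    first run with every factor at -1); this ordering is an assumption.
    """
    runs = []
    # product() varies its last element fastest, so the tuple is reversed
    # to let the first factor alternate fastest.
    for levels in product((-1, +1), repeat=len(factors)):
        runs.append(dict(zip(factors, reversed(levels))))
    return runs

design = full_factorial(["A", "B", "C"])
for number, run in enumerate(design, start=1):
    signs = " ".join("+" if run[f] > 0 else "-" for f in "ABC")
    print(number, signs)
```

For three factors this lists the 2**3 = 8 distinct level combinations, one per experiment.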
TABLE 3.6
NUMBER OF STATISTICS THAT CAN BE CALCULATED FOR DIFFERENT FULL FACTORIAL DESIGNS

                        Full factorial design ($2^f$)
Statistics               f=2   f=3   f=4   f=5   f=6   f=7   f=8
Average                    1     1     1     1     1     1     1
Main effects               2     3     4     5     6     7     8
Interaction effects
  2-factor                 1     3     6    10    15    21    28
  3-factor                       1     4    10    20    35    56
  4-factor                             1     5    15    35    70
  5-factor                                   1     6    21    56
  6-factor                                         1     7    28
  7-factor                                               1     8
  8-factor                                                     1
Σ                          4     8    16    32    64   128   256
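The entries of Table 3.6 are binomial coefficients, so they can be checked with a few lines of code. This is a small sketch; the function name `effect_counts` is an illustrative assumption.

```python
from math import comb

def effect_counts(f):
    """Counts of statistics obtainable from a 2**f full factorial design.

    Returns a dict mapping p (number of factors involved) to the number
    of effects: p = 0 is the average, p = 1 the main effects and
    p >= 2 the p-factor interactions.
    """
    return {p: comb(f, p) for p in range(f + 1)}

for f in range(2, 9):
    counts = effect_counts(f)
    # The counts always add up to 2**f, the number of experiments.
    print(f, counts, sum(counts.values()))
```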
The number of effects (statistics) can be calculated with the following formula [29]:

Number of p-factor interactions in a $2^f$ design: $\binom{f}{p} = \dfrac{f!}{p!\,(f-p)!}$

To explain the concept of interaction, let us consider a two-factor interaction. This interaction occurs when the effect of the first factor obtained at the lowest level (-) of the second factor is different from the effect of the first factor at the highest level (+) of the second one. The effect of one factor is influenced by that of the other and therefore it is said that they “interact”. We will try to explain this with an example. Suppose that in Table 3.5 factor B is the “HPLC column” (level (+) = column K and level (-) = column L) and factor A is the “pH of the mobile phase” (level (+) = 5.2 and level (-) = 4.8). A two-factor interaction between the column and the pH of the mobile phase occurs when the effect of the pH on the response (e.g. resolution) on column K is different from the effect of the pH on column L. The interaction is calculated as half the difference between the effect of the pH on column K and the effect of the pH on column L. The interaction is called in this example the pH by column interaction and is symbolised by pH x column or AxB or AB or BA.

$$\text{Interaction effect (AB)} = \tfrac{1}{2}\,(\text{effect of pH on column K} - \text{effect of pH on column L})$$

or

$$\text{Interaction effect (AB)} = \tfrac{1}{2}\,\left(E_{A,B(+)} - E_{A,B(-)}\right)$$
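The pH by column example can be followed numerically. In the sketch below the resolution values are invented for illustration only; the code computes the interaction as half the difference between the pH effects on the two columns and checks that the same number is obtained from the signs of the product A x B.

```python
from statistics import mean

# Hypothetical resolution values for a 2 x 2 experiment (invented numbers).
# Keys are (pH level, column level): pH +1 = 5.2, -1 = 4.8;
# column +1 = K, -1 = L.
resolution = {
    (-1, -1): 2.0,
    (+1, -1): 2.5,
    (-1, +1): 3.0,
    (+1, +1): 4.5,
}

def ph_effect(column):
    """Effect of pH (factor A) at a fixed level of the column (factor B)."""
    return resolution[(+1, column)] - resolution[(-1, column)]

effect_on_K = ph_effect(+1)
effect_on_L = ph_effect(-1)
interaction_AB = 0.5 * (effect_on_K - effect_on_L)

# Same quantity from the contrast column AB = A * B.
plus = [y for (a, b), y in resolution.items() if a * b > 0]
minus = [y for (a, b), y in resolution.items() if a * b < 0]
interaction_from_contrast = mean(plus) - mean(minus)

print(interaction_AB, interaction_from_contrast)
```

Swapping the roles of A and B in `ph_effect` would give the same interaction value, which reflects the symmetry AB = BA mentioned above.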
Let us now consider three-factor interactions (e.g. ABC in Table 3.5) to give a general idea how these and higher-order interaction effects (four-, five-factor interaction effects, etc.) are derived. A three-factor interaction means that a two-factor interaction effect is different at the two levels of the third factor. Two estimates for the AB interaction are available from the experiments, one for each level of factor C. The AB interaction effect is estimated once with C at level (+) (represented by $E_{AB,C(+)}$) and once with C at level (-) (represented by $E_{AB,C(-)}$). Half the difference between these two estimates gives the three-factor interaction effect (ABC).

$$\text{Interaction effect (ABC)} = \tfrac{1}{2}\,\left(E_{AB,C(+)} - E_{AB,C(-)}\right)$$
As is the case for the two-factor interactions, the three-factor interaction is also symmetric in all its variables: the interaction effects ABC, ACB, BAC, CAB, BCA and CBA all give the same result. Higher-order effects are calculated by analogous reasoning. To compute main and interaction effects (see further Section 3.4.6) one determines:

$$E_X = \frac{\sum Y(+)}{n} - \frac{\sum Y(-)}{n}$$

where $E_X$ is the effect of factor X, $\sum Y(+)$ and $\sum Y(-)$ are the sums of the responses where factor X was at level (+) and at level (-) respectively, and n is the number of runs from the design where the factor was at level (+) or at level (-). The effect of a factor can be considered as the difference between the mean response at the high level (+) and that at the low level (-). The term n is equal to $N/2$ when each experiment in the design is performed once, N being the number of experiments of the design. For factor A from Table 3.5 (N = 8, n = 4) this gives:

$$E_A = \frac{\sum Y(A+)}{4} - \frac{\sum Y(A-)}{4}$$

where the sums run over the four experiments with A at level (+) and the four experiments with A at level (-), respectively.
To calculate the interaction effects following the reasoning given above would require a lot of work. An easier way exists, namely by using the columns of so-called contrast coefficients. The contrast coefficients for the $2^3$ design of Table 3.5 are given in Table 3.7. The contrast coefficients for the interactions are obtained by multiplying the corresponding signs of the contributing factors. For instance, the levels for AB are obtained by multiplying the signs of the columns A and B for each experiment. The interaction effects are then calculated analogously to the main effects. This would give for the interaction effect AB:

$$E_{AB} = \frac{\sum Y(AB+)}{4} - \frac{\sum Y(AB-)}{4}$$

where the sums run over the four experiments in which the contrast coefficient of AB is (+) and the four in which it is (-), respectively.
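The contrast-coefficient route lends itself to a short program. The sketch below builds the columns for a 2**3 design and computes all seven effects as the mean response at (+) minus the mean response at (-). The standard run order, the single-letter factor names, the function names and the response values are all assumptions made for illustration.

```python
from itertools import combinations, product
from math import prod

def contrast_columns(factors):
    """Build the contrast columns of a two-level full factorial design.

    `factors` is a string of single-letter factor names. Returns
    (labels, rows); each row maps an effect label such as "A" or "AB"
    to its contrast coefficient (+1 or -1). Standard run order assumed.
    """
    labels = ["".join(c) for p in range(1, len(factors) + 1)
              for c in combinations(factors, p)]
    rows = []
    for levels in product((-1, +1), repeat=len(factors)):
        run = dict(zip(factors, reversed(levels)))
        # An interaction coefficient is the product of the factor signs.
        rows.append({lab: prod(run[f] for f in lab) for lab in labels})
    return labels, rows

def effects(rows, labels, responses):
    """Effect = mean response at (+) minus mean response at (-)."""
    half = len(rows) / 2
    return {lab: sum(r[lab] * y for r, y in zip(rows, responses)) / half
            for lab in labels}

labels, rows = contrast_columns("ABC")
y = [10.0, 14.0, 11.0, 15.0, 12.0, 20.0, 13.0, 21.0]   # invented responses
estimated = effects(rows, labels, y)
print(estimated)
```

For these invented responses the sketch gives A = 6, B = 1, C = 4 and AC = 2, with the remaining interactions equal to zero.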
TABLE 3.7
COLUMNS OF CONTRAST COEFFICIENTS FOR A $2^3$ FULL FACTORIAL DESIGN
Columns: Experiment (1 to 8); Factors A, B, C; Interactions AB, AC, BC, ABC.
[The (+)/(-) entries are not legible in this copy; each interaction column is the product of the signs of its contributing factor columns.]

3.4.4.2 Fractional factorial designs
The disadvantage of full factorial designs is that the number of experiments increases rapidly when the number of factors increases. For 6 factors 64 experiments are required and for 7 factors 128. In practice it is usually not possible to perform such a large number of experiments in a reasonable span of time. For this reason often only a fraction of a full factorial design is performed. This kind of design is called a fractional (or partial) factorial design. Let us first consider a half-fraction factorial design. Only half of the number of experiments needed for a full factorial are performed. For example, for 4 factors, $\frac{16}{2} = 8$ experiments are performed. This can be observed from Table 3.8 in which a full factorial design for 4 factors is shown. By selecting 8 appropriate experiments from the set of 16 a half-fraction factorial design is obtained. The experiments 1, 4, 6, 7, 10, 11, 13 and 16 (experiments between brackets) form a half-fraction factorial design for four factors while the experiments 2, 3, 5, 8, 9, 12, 14 and 15 form another half-fraction of the full factorial design. This kind of design is symbolized as $2^{4-1}$.
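The split of the 16 experiments into two half-fractions can be reproduced in code. The sketch below assumes that the full factorial is listed in standard order (factor A alternating fastest, experiment 1 with all factors at the low level) and that a run belongs to the first fraction when the sign of D equals the product of the signs of A, B and C; both points are assumptions about the layout of Table 3.8.

```python
from itertools import product

# Full 2**4 factorial in standard order (assumed ordering).
full = [dict(zip("ABCD", reversed(levels)))
        for levels in product((-1, +1), repeat=4)]

def in_fraction(run, sign=+1):
    """True when D equals sign * A*B*C for this run."""
    return run["D"] == sign * run["A"] * run["B"] * run["C"]

fraction_1 = [i for i, run in enumerate(full, start=1) if in_fraction(run, +1)]
fraction_2 = [i for i, run in enumerate(full, start=1) if in_fraction(run, -1)]
print(fraction_1)
print(fraction_2)
```

Under these assumptions the two lists coincide with the experiment numbers quoted in the text for the two half-fractions.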
TABLE 3.8
SELECTION OF A HALF-FRACTION FACTORIAL DESIGN FROM A FULL FACTORIAL DESIGN FOR 4 FACTORS
Columns: Experiment; Factors A, B, C, D.
By reducing the number of experiments one, of course, also loses some information. In a fractional factorial design not all main and interaction effects can be estimated separately as in a full factorial design. In a half-fraction factorial design the main effect of a factor will be estimated together with a higher-order interaction effect. Let us, for example, consider the half-fraction factorial design formed by the experiments between brackets in Table 3.8 and calculate the columns of contrast coefficients for the three-factor interactions (see Table 3.9). It can be seen that column ABC is equal to column D. This is also the case for the columns A and BCD; B and ACD; C and ABD. In the design formed by the experiments which are between brackets in Table 3.8, one estimates the
total effect of factor D and of the interaction ABC. It is said that factor D and the interaction ABC are confounded with each other, meaning that they cannot be estimated separately. All other main effects are also confounded with a three-factor interaction (A with BCD; B with ACD and C with ABD). By calculating the columns of contrast coefficients for the two-factor interactions, one sees that in this design each two-factor interaction is confounded with another two-factor interaction (e.g. AB with CD). In terms of absolute size, main effects tend to be larger than two-factor interactions, which in turn tend to be larger than three-factor interactions, and so on. In the half-fraction factorial design of Table 3.9 the main effects are expected to be significantly larger than the three-factor interactions with which they are confounded. As a consequence it is supposed that the estimate for the main effect and the interaction together is an estimate for the main effect alone.
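The confounding described above can be checked numerically. The following sketch (names are illustrative) builds the columns of contrast coefficients by multiplying signs and compares them, first in the half-fraction with ABCD = + and then in the full factorial design.

```python
from itertools import product
from math import prod

FACTORS = "ABCD"

def column(design, term):
    """Contrast coefficients of a main effect or interaction, e.g. 'ABC'."""
    positions = [FACTORS.index(letter) for letter in term]
    return [prod(run[i] for i in positions) for run in design]

full = list(product((-1, 1), repeat=4))
half = [run for run in full if prod(run) == 1]   # the fraction with I = ABCD

# In the half-fraction the columns coincide, so the effects are confounded.
print(column(half, "ABC") == column(half, "D"))   # main effect with three-factor interaction
print(column(half, "AB") == column(half, "CD"))   # two-factor interactions with each other
# In the full factorial design the same columns differ.
print(column(full, "ABC") == column(full, "D"))
```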
TABLE 3.9
THE COLUMNS OF CONTRAST COEFFICIENTS FOR THE THREE-FACTOR INTERACTIONS OF THE HALF-FRACTION FACTORIAL DESIGN FOR FOUR FACTORS SELECTED FROM TABLE 3.8

        Factors        Interactions
Exp.    A  B  C  D     ABC  ABD  ACD  BCD
1       -  -  -  -     -    -    -    -
4       +  +  -  -     -    -    +    +
6       +  -  +  -     -    +    -    +
7       -  +  +  -     -    +    +    -
10      +  -  -  +     +    -    -    +
11      -  +  -  +     +    -    +    -
13      -  -  +  +     +    +    -    -
16      +  +  +  +     +    +    +    +

Let us now consider how one selects the experiments from the full factorial to obtain a proper half-fraction factorial design. In practice, to construct a 2^(4-1) design one first constructs a full factorial design for 4-1 = 3 factors (see Table 3.7). The fourth factor (D) is assigned to one of the columns of the interactions (e.g. to ABC in this case). This means that one confounds factor D with the interaction ABC. In this way the design of Table 3.10 is
obtained which is equal to the half-fraction factorial of Table 3.9. Only the order of the experiments (rows) is different.
TABLE 3.10
HALF-FRACTION FACTORIAL DESIGN FOR 4 FACTORS (2^(4-1))

              Factors
Experiment    A  B  C  D = ABC
1             -  -  -  -
2             +  -  -  +
3             -  +  -  +
4             +  +  -  -
5             -  -  +  +
6             +  -  +  -
7             -  +  +  -
8             +  +  +  +
The relationship D = ABC in this design is called the generator. The factor D and the three-factor interaction ABC are called aliases of one another because they are confounded. All aliases can be determined with the help of the defining relation or defining contrast (I). It is obtained by multiplying the effects occurring in the generator:

I = D x ABC = ABCD

The alias of any effect can be obtained by multiplying the effect with the defining relation, with as an additional rule that when a term appears an even number of times this term disappears from the product. For instance the aliases of factor A and of interaction AB are respectively:

A = A x ABCD = A^2 BCD = BCD
AB = AB x ABCD = A^2 B^2 CD = CD

All aliases of this design are shown in Table 3.11. The alias of the defining relation itself is the mean response. This design is called a design of resolution IV. The design resolution is determined by the number of terms (factors) in the defining relation. The higher the resolution of the design the higher
the order of the interaction effect confounded with the main effect. In general in a design of resolution R no p-factor (interaction) effect is confounded with any effect containing less than R-p factors. The design given in Table 3.10 could be symbolized as a 2^(4-1)(IV) design. In this design no main effect (p=1) is confounded with any effect containing less than 3 factors (R-p = 4-1 = 3) and no two-factor interaction (p=2) is confounded with another effect that contains less than two factors (R-p = 4-2 = 2).
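The multiplication rule for aliases lends itself to a very small Python sketch. Effects are written as strings of factor letters; the helper names are hypothetical and the empty product is written as "I" by convention here.

```python
def multiply(effect_a, effect_b):
    """Multiply two effects; a letter that appears twice cancels out."""
    letters = set(effect_a) ^ set(effect_b)      # symmetric difference
    return "".join(sorted(letters)) or "I"

def aliases(effect, defining_relations):
    """Aliases of an effect: its product with every defining relation."""
    return [multiply(effect, word) for word in defining_relations]

def resolution(defining_relations):
    """Number of factors in the shortest defining relation."""
    return min(len(word) for word in defining_relations)

relations = ["ABCD"]                 # defining relation of the 2^(4-1) design
print(aliases("A", relations))       # alias of main effect A
print(aliases("AB", relations))      # alias of interaction AB
print(resolution(relations))
```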
TABLE 3.11
ALIASES OF A 2^(4-1) DESIGN WITH I = ABCD

Aliases
A = BCD
B = ACD
C = ABD
D = ABC
AB = CD
AC = BD
AD = BC
ABCD = Mean

Now let us try to generate a quarter-fraction factorial design for 6 factors. In this design one fourth of the experiments of a full factorial design is performed, i.e. 2^6/4 = 2^(6-2) = 16 experiments. The design is symbolized by 2^(6-2). The first four (6-2) columns in the design are given by the full factorial design for four factors. They are shown in Table 3.12. For the last two columns (E and F) generators must be defined. For example E = ABCD and F = ABC. This gives two defining relations associated with the generators: I = ABCDE and I = ABCF. There are three defining relations in a quarter-fraction factorial design, meaning that each effect is confounded with three other effects. The third defining relation is obtained by multiplying the two originally obtained relations using the multiplication rules mentioned above:

I = ABCDE x ABCF = A^2 B^2 C^2 DEF = DEF
The defining relations are then I = ABCDE, I = ABCF and I = DEF. The resolution of the design is III since the smallest defining relation contains three terms. This means that certain main effects are confounded with two-factor interactions, e.g. D = EF = ABCE = ABCDF.
TABLE 3.12
QUARTER-FRACTION, 2^(6-2)(IV), FACTORIAL DESIGN. GENERATORS: E = ABC AND F = BCD

              Factors
Experiment    A  B  C  D  E  F
1             -  -  -  -  -  -
2             +  -  -  -  +  -
3             -  +  -  -  +  +
4             +  +  -  -  -  +
5             -  -  +  -  +  +
6             +  -  +  -  -  +
7             -  +  +  -  -  -
8             +  +  +  -  +  -
9             -  -  -  +  -  +
10            +  -  -  +  +  +
11            -  +  -  +  +  -
12            +  +  -  +  -  -
13            -  -  +  +  +  -
14            +  -  +  +  -  -
15            -  +  +  +  -  +
16            +  +  +  +  +  +

Since two-factor interactions tend to be larger than three-factor interactions it would be worthwhile to construct a design of resolution IV (or even higher if possible). In such a design the main effect would be confounded only with three-factor and higher-order interactions. This design could be expected to give better estimates for the main effects than the design of resolution III. Since one is free to define the generators one can define another set of them to try to increase the resolution of the design. For instance one could define E = ABC and F = BCD. By using these latter two generators a design of resolution IV is created. The defining relations
obtained with these generators are I = ABCE, I = BCDF and I = ADEF. The complete design is shown in Table 3.12 and the aliases of this design in Table 3.13.
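The complete set of defining relations, and hence the resolution, follows mechanically from the generators. The sketch below applies the multiplication rule to both sets of generators discussed in the text; the function names are illustrative.

```python
from itertools import combinations

def multiply(effect_a, effect_b):
    """Multiply two effects; letters occurring an even number of times cancel."""
    return "".join(sorted(set(effect_a) ^ set(effect_b)))

def defining_relations(generators):
    """generators maps the added factor to its interaction, e.g. {'E': 'ABC'}."""
    words = [multiply(factor, term) for factor, term in generators.items()]
    relations = []
    # Multiply the original relations one by one, two by two, three by three, ...
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            word = ""
            for part in combo:
                word = multiply(word, part)
            relations.append(word)
    return relations

for generators in ({"E": "ABCD", "F": "ABC"}, {"E": "ABC", "F": "BCD"}):
    relations = defining_relations(generators)
    print(generators, relations, "resolution", min(len(w) for w in relations))
```

The first set of generators gives a shortest relation of three letters (resolution III), the second a shortest relation of four letters (resolution IV).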
TABLE 3.13
ALIASES IN THE 2^(6-2)(IV) DESIGN OF TABLE 3.12

Aliases
A = BCE = ABCDF = DEF
B = ACE = CDF = ABDEF
C = ABE = BDF = ACDEF
D = ABCDE = BCF = AEF
E = ABC = BCDEF = ADF
F = ABCEF = BCD = ADE
AB = CE = ACDF = BDEF
AC = BE = ABDF = CDEF
AD = BCDE = ABCF = EF
AE = BC = ABCDEF = DF
AF = BCEF = ABCD = DE
BD = ACDE = CF = ABEF
BF = ACEF = CD = ABDE
ABD = CDE = ACF = BEF
ABF = CEF = ACD = BDE
Mean = ABCE = BCDF = ADEF

Smaller fractions of a full factorial can be constructed by reasoning analogous to that for the quarter fraction. The defining relations are obtained by making all possible multiplications between the original defining relations derived from the generators. Suppose that a 2^(8-4) design must be created, i.e. a sixteenth fraction of an 8-factor full factorial design. In this design 4 generators have to be defined, creating 4 original defining relations. The total number of defining relations in a sixteenth fraction is fifteen. The eleven remaining defining relations are obtained by multiplying the original ones two by two (6 relations), three by three (4 relations) and all four at a time (one relation). In general, a fractional factorial design can be written as 2^(f-v), in which f is the number of factors examined in the design and 1/2^v is the fraction considered (v = 1, 2, 3, ...). When constructing such a design v generators have to be defined, giving a total of 2^v - 1 defining relations. The design has 2^(f-v) experiments. Each effect is confounded with 2^v - 1 other effects. The smallest fraction of a design in which main effects can still be estimated without confounding among each other is called a saturated fractional factorial design. The 2^(7-4) design, for example, can estimate the main effects of 7 factors in 8 experiments (see Table 3.14). The design is of resolution III and main effects are confounded with two-factor interactions. The design is saturated since it is not possible to increase the level of confounding. When these saturated designs are used in a ruggedness test, one assumes that two-factor and higher-order interaction effects are negligible compared to the main effects. The saturated designs then allow the main effects of the factors to be estimated from the design.
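As an illustration of the general 2^(f-v) notation, the following sketch builds a fractional factorial design from a full factorial in the first f-v factors plus v generators. It is applied to the generators quoted for the saturated 2^(7-4) design; the function name and the standard run order (A varying fastest) are assumptions.

```python
from itertools import product
from math import prod

def fractional_factorial(base_factors, generators):
    """Full factorial in base_factors plus one generated column per generator."""
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, reversed(levels)))   # first factor varies fastest
        for factor, term in generators.items():
            run[factor] = prod(run[letter] for letter in term)
        runs.append(run)
    return runs

design = fractional_factorial("ABC", {"D": "ABC", "E": "AB", "F": "AC", "G": "BC"})
f, v = 7, 4
print(len(design) == 2 ** (f - v))          # number of experiments
print(2 ** v - 1)                           # defining relations / aliases per effect
for run in design:
    print(" ".join("+" if run[name] == 1 else "-" for name in "ABCDEFG"))
```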
TABLE 3.14
SATURATED FRACTIONAL FACTORIAL DESIGN FOR 7 FACTORS: 2^(7-4)(III). GENERATORS: D = ABC, E = AB, F = AC, G = BC

              Factors
Experiment    A  B  C  D  E  F  G
1             -  -  -  -  +  +  +
2             +  -  -  +  -  -  +
3             -  +  -  +  -  +  -
4             +  +  -  -  +  -  -
5             -  -  +  +  +  -  -
6             +  -  +  -  -  +  -
7             -  +  +  -  -  -  +
8             +  +  +  +  +  +  +

3.4.4.3 Plackett-Burman designs
The most important alternative for saturated fractional factorial designs are Plackett-Burman designs [30]. For N = 4, 8, 16, ... experiments (generally 2^x with x = 2, 3, ...), these designs are saturated designs of resolution III. However, there are also Plackett-Burman designs for 12, 20, 24, ... experiments. Generally, Plackett-Burman designs are described for a number of experiments, N, up to 100 with N being a multiple of four.
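Designs of more than 20 or 24 experiments are of no practical use in a ruggedness test, because the time to perform these designs becomes too long. The first line for the designs with N = 8, 12, 16, 20 and 24 as described by Plackett and Burman [30] is given below:

N = 8:   + + + - + - -
N = 12:  + + - + + + - - - + -
N = 16:  + + + + - + - + + - - + - - -
N = 20:  + + - - + + + + - + - + - - - - + + -
N = 24:  + + + + + - + - + + - - + + - - + - + - - - -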
An example of a Plackett-Burman design for N = 12 is shown in Table 3.15. The first row in the design is the one given by Plackett and Burman. The following N-2 rows are obtained by a cyclical permutation of one place (i.e. shifting the line by one place) compared to the previous row. The sign of the first factor (A) in the second row is equal to that of the last factor (K) in the first row. The signs of the following N-2 factors in the second row are equal to those of the first N-2 factors of the first row. The third row is derived from the second one in an analogous way. This procedure is repeated N-2 times until all but one line are formed. The last (Nth) row consists completely of minus signs. Since Plackett-Burman designs are saturated designs of resolution III, main effects are confounded with many higher-order effects, among which also a number of two-factor interactions. In the eight-experiment design, for instance, each main effect is confounded with 15 higher-order effects, among which three two-factor interactions. It is possible to define the multiple-factor interactions that are confounded with each main effect. This can be done by constructing columns of contrast coefficients as in the full and fractional factorial designs. The algebraic rules used here however are different from those for the full and fractional factorial designs.

Full and fractional factorial designs:
1. negative and negative → positive
2. negative and positive → negative
3. positive and negative → negative
4. positive and positive → positive
Plackett-Burman designs:
1. negative and negative → negative
2. negative and positive → positive
3. positive and negative → positive
4. positive and positive → negative

The algebraic rules used for the Plackett-Burman designs are opposite to those for the full and fractional factorial designs. In a fractional factorial design (Tables 3.8, 3.10, 3.12, 3.14) there always is a row containing only plus signs. In a Plackett-Burman design however one always has a row containing only minus signs (Table 3.15). Let us compare the N = 8 Plackett-Burman design of Table 3.16 with a saturated 2^(7-4) design. After rearranging the Plackett-Burman design in an appropriate way the 2^(7-4) design of Table 3.14 is obtained but with opposite signs. Therefore the algebraic rules to obtain the contrast coefficients must also be different.
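The cyclical construction described above is easily sketched in Python. The first rows used below for N = 8 and N = 12 are the ones commonly quoted for Plackett-Burman designs and are an assumption of this sketch, as is the function name.

```python
def plackett_burman(first_row):
    """Build the N x (N-1) design from its first row of '+'/'-' signs."""
    row = list(first_row)
    design = [row]
    for _ in range(len(row) - 1):         # the N-2 cyclically shifted rows
        row = [row[-1]] + row[:-1]        # last sign moves to the first column
        design.append(row)
    design.append(["-"] * len(row))       # the Nth row: minus signs only
    return design

# Assumed first rows (not taken from the text above).
for first in ("+++-+--", "++-+++---+-"):
    design = plackett_burman(first)
    print(f"N = {len(design)}")
    for run in design:
        print(" ".join(run))
```

Each column of the resulting designs contains N/2 plus and N/2 minus signs.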
TABLE 3.15
PLACKETT-BURMAN DESIGN FOR N = 12

        Factors
Exp.    A  B  C  D  E  F  G  H  I  J  K
1       +  +  -  +  +  +  -  -  -  +  -
2       -  +  +  -  +  +  +  -  -  -  +
3       +  -  +  +  -  +  +  +  -  -  -
4       -  +  -  +  +  -  +  +  +  -  -
5       -  -  +  -  +  +  -  +  +  +  -
6       -  -  -  +  -  +  +  -  +  +  +
7       +  -  -  -  +  -  +  +  -  +  +
8       +  +  -  -  -  +  -  +  +  -  +
9       +  +  +  -  -  -  +  -  +  +  -
10      -  +  +  +  -  -  -  +  -  +  +
11      +  -  +  +  +  -  -  -  +  -  +
12      -  -  -  -  -  -  -  -  -  -  -
In Table 3.16 the columns of contrast coefficients for the two-factor interactions are given. They were obtained using the above stated rules. The contrast coefficients for three- and higher-order interactions can be
obtained analogously. The columns of contrast coefficients that are equal to each other are confounded with each other (e.g. ABC confounded with E). It can be seen that in this design each main effect is confounded with three two-factor interactions (Table 3.17), and with a number of higher-order interactions.
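The confounding pattern can be verified by building the interaction columns with the Plackett-Burman sign rule (unlike signs give plus, like signs give minus) and matching them against the factor columns. The sketch assumes the usual first row + + + - + - - for N = 8; all names are illustrative.

```python
from itertools import combinations

def plackett_burman(first_row):
    """Cyclic construction: N-1 shifted rows followed by a row of minus signs."""
    row = list(first_row)
    design = [row]
    for _ in range(len(row) - 1):
        row = [row[-1]] + row[:-1]
        design.append(row)
    design.append(["-"] * len(row))
    return design

def pb_interaction(col_a, col_b):
    """Plackett-Burman rule: unlike signs give '+', like signs give '-'."""
    return ["+" if a != b else "-" for a, b in zip(col_a, col_b)]

names = "ABCDEFG"
design = plackett_burman("+++-+--")       # assumed first row for N = 8
columns = {f: [run[i] for run in design] for i, f in enumerate(names)}

confounded = {f: [] for f in names}
for a, b in combinations(names, 2):
    interaction = pb_interaction(columns[a], columns[b])
    for f in names:
        if columns[f] == interaction:
            confounded[f].append(a + b)

for f in names:
    print(f, " ".join(confounded[f]))
```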
TABLE 3.17
TWO-FACTOR INTERACTIONS CONFOUNDED WITH THE MAIN EFFECTS IN AN N = 8 PLACKETT-BURMAN DESIGN

Factor    Two-factor interactions confounded with the corresponding factor
A         BF  CD  EG
B         AF  CG  DE
C         AD  BG  EF
D         AC  BE  FG
E         AG  BD  CF
F         AB  CE  DG
G         AE  BC  DF

The Plackett-Burman designs, as do the saturated fractional factorial designs, only allow for estimating the main effects. One assumes that all interaction effects are negligible compared to main effects. A Plackett-Burman design with N experiments can examine up to N-1 factors. This is a difference with fractional factorial designs. Some saturated fractional factorial designs however also contain N-1 factors (e.g. the 2^(7-4) design of Table 3.14) but this is not always the case. The saturated design for 5 factors, for example, is the 2^(5-2) design. In this design only 5 factors are examined in 8 experiments. Each Plackett-Burman design contains a fixed number of factors (a multiple of four minus one). After determination of the number of factors to be examined (these factors will be called real factors), the remaining number of potential factors in the design are defined as dummy factors. A dummy factor is an imaginary factor. A fractional factorial design on the other hand will be constructed depending only on the number of real factors. Normally no dummies are entered in those designs.
TABLE 3.16
PLACKETT-BURMAN DESIGN FOR 7 FACTORS (N = 8) AND THE COLUMNS OF CONTRAST COEFFICIENTS FOR TWO-FACTOR INTERACTIONS AND FOR A THREE-FACTOR INTERACTION

        Factors
Exp.    A  B  C  D  E  F  G     AB  AC  AD  AE  AF  AG  BC
1       +  +  +  -  +  -  -     -   -   +   -   +   +   -
2       -  +  +  +  -  +  -     +   +   +   -   +   -   -
3       -  -  +  +  +  -  +     -   +   +   +   -   +   +
4       +  -  -  +  +  +  -     +   +   -   -   -   +   -
5       -  +  -  -  +  +  +     +   -   -   +   +   +   +
6       +  -  +  -  -  +  +     +   -   +   +   -   -   +
7       +  +  -  +  -  -  +     -   +   -   +   +   -   +
8       -  -  -  -  -  -  -     -   -   -   -   -   -   -

TABLE 3.16 (CONTINUED)

Exp.    BD  BE  BF  BG  CD  CE  CF  CG  DE  DF  DG  EF  EG  FG  ABC
1       +   -   +   +   +   -   +   +   +   -   -   +   +   -   +
2       -   +   -   +   -   +   -   +   +   -   +   +   -   +   -
3       +   +   -   +   -   -   +   -   -   +   -   +   -   +   +
4       +   +   +   -   +   +   +   -   -   -   +   -   +   +   +
5       +   -   -   -   -   +   +   +   +   +   +   -   -   -   +
6       -   -   +   +   +   +   -   -   -   +   +   +   +   -   -
7       -   +   +   -   +   -   -   +   +   +   -   -   +   +   -
8       -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
The difference can be seen from the following example. If there are six factors, one can perform a Plackett-Burman design with 8 experiments, containing six real factors and one dummy factor. Another possibility would be to perform the twelve-experiment design, which would contain five dummies. The decision to select a larger design with more experiments could depend on the statistical interpretation one would like to apply (see Section 3.4.7). If the same six factors are examined in a fractional factorial design, one could create an eighth fraction, 2^(6-3) (e.g. with generators D = BC, E = AB and F = AC). This is also a design with 8 experiments, like the Plackett-Burman design, but containing no dummy factors.

3.4.4.4 Taking into account certain interactions when constructing a design
Usually one considers all interaction effects negligible when performing a ruggedness test. If one suspects certain two-factor interactions to be potentially important, one can take this into account when constructing or selecting a design. For instance the interaction between an HPLC column as a factor and the other factors of the design might be considered as being potentially significant. Suppose one is examining the factors "batch number of the HPLC column" and "concentration tailing suppressor" in a design together with some other factors. Depending on the degree of endcapping of a column (more or less residual silanol groups) the concentration of the tailing suppressor could have a larger or a smaller effect on the asymmetry of a peak on different columns, i.e. there is an interaction between the factors batch number and concentration tailing suppressor. When both factors are examined in one design the effect on the tailing found for the factor "tailing suppressor concentration" will be the mean effect for both columns.
To determine whether these interactions are important one can create a fractional factorial design in which the two-factor interactions of interest are confounded neither with each other nor with main effects. In such a design the interaction effects can be directly estimated (see also Section 3.4.10).

3.4.4.5 Designs for ruggedness testing at three levels
When three levels for the factors are examined in a ruggedness test, different designs are theoretically possible.
A first possibility is the use of full factorial designs with three levels [31]. The disadvantage of the three-level factorial designs is that the number of experiments increases very rapidly for a larger number of factors. A second possibility is the use of central composite designs. Their disadvantage is the large number of experiments required even for a low number of factors, e.g. 25 experiments for 4 factors and 273 experiments for 8 factors. Even 25 different experiments can already take an unreasonably long time to be feasible. The central composite designs allow one to estimate not only main effects but also interaction and quadratic effects. However, quadratic effects are normally never considered in a ruggedness test. These designs are mainly used for optimization purposes and less for ruggedness testing. Central composite designs are interpreted using a regression method. To obtain a reasonable regression the levels of the factors should differ enough, but in a ruggedness test broad intervals between the levels should be avoided. Nevertheless, some authors have used them for ruggedness tests [16]. Another possibility that at first sight appears to be attractive are the three-level designs as proposed by Plackett and Burman [30]. However it was shown that in these designs a confounding occurs between main effects [32]. Well balanced three-level designs (designs without a confounding of main effects) derived from the three-level Plackett-Burman designs have been described, but they can only examine half the number of factors originally proposed by Plackett and Burman. These well balanced designs could be used in ruggedness tests. However, no case studies are known. To test the factors at three levels in a ruggedness test one usually applies the so-called reflected designs [17,19,23]. A reflected design is in fact a two-level Plackett-Burman, full or fractional factorial design that is executed twice.
In the first execution the design contains the first extreme and the nominal level, and in the second the other extreme and the nominal level. Both designs have one experiment in common, namely an experiment in which all factors are at nominal conditions. A reflected Plackett-Burman design for 7 factors is shown in Table 3.18.
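A reflected design can be derived mechanically from a two-level design whose last row contains only minus signs. In the sketch below, which is an illustration under stated assumptions rather than a prescribed procedure, a plus sign is read as the high extreme (+1) in the first execution and as the low extreme (-1) in the second, and a minus sign as the nominal level (0). The N = 8 first row and the function names are assumptions.

```python
def plackett_burman(first_row):
    """Cyclic construction: N-1 shifted rows followed by a row of minus signs."""
    row = list(first_row)
    design = [row]
    for _ in range(len(row) - 1):
        row = [row[-1]] + row[:-1]
        design.append(row)
    design.append(["-"] * len(row))
    return design

def reflected_design(two_level_design):
    """Execute the design twice; the all-nominal run is shared, not repeated."""
    high = [[+1 if s == "+" else 0 for s in run] for run in two_level_design]
    low = [[-1 if s == "+" else 0 for s in run] for run in two_level_design[:-1]]
    return high + low

reflected = reflected_design(plackett_burman("+++-+--"))   # assumed first row
for number, run in enumerate(reflected, start=1):
    print(number, " ".join(f"{level:+d}" if level else " 0" for level in run))
```

For the 8-experiment design this gives 8 + 7 = 15 experiments.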
TABLE 3.18
REFLECTED DESIGN FOR 7 FACTORS DERIVED FROM THE PLACKETT-BURMAN DESIGN FOR 7 FACTORS

        Factors
Exp.    A    B    C    D    E    F    G
1       +1   +1   +1    0   +1    0    0
2        0   +1   +1   +1    0   +1    0
3        0    0   +1   +1   +1    0   +1
4       +1    0    0   +1   +1   +1    0
5        0   +1    0    0   +1   +1   +1
6       +1    0   +1    0    0   +1   +1
7       +1   +1    0   +1    0    0   +1
8        0    0    0    0    0    0    0
9       -1   -1   -1    0   -1    0    0
10       0   -1   -1   -1    0   -1    0
11       0    0   -1   -1   -1    0   -1
12      -1    0    0   -1   -1   -1    0
13       0   -1    0    0   -1   -1   -1
14      -1    0   -1    0    0   -1   -1
15      -1   -1    0   -1    0    0   -1

3.4.4.6 Taguchi designs
Taguchi designs make it possible both to optimize a method with regard to certain factors and, at the same time, to test the robustness towards a number of factors. The factors to be optimized are called control factors, design variables [33], controllable factors [34] or design parameters [35]. Those to be tested for ruggedness are known as noise factors [33,34], environmental variables [33] or sources of noise [35]. The Taguchi design examines both sets of factors in a combination of two experimental designs. These designs and the treatment of the results are described in other chapters in this book. The use of these designs was introduced to improve the quality of technological products. However, one could consider using the Taguchi designs for optimization and ruggedness testing of analytical methods. The control factors would be the factors to optimize and the noise factors those for which to test the ruggedness. Certain control and noise factors could be the same but examined at different levels. In the inner design (design with
the control factors) of a Taguchi design the factor would be examined over a broader range and possibly at more levels than in the outer design (design with the noise factors). Let us try to explain the above with an example. Suppose that one is trying to optimize k’ of different peaks in an HPLC analysis as a function of the pH and the percentage organic modifier in the mobile phase. The factors in the inner design would be the pH and the percentage organic modifier. In the outer design the ruggedness of the response towards the noise factors is examined. These noise factors could again include the factors pH and percentage organic modifier but now examined over a much smaller interval. The drawback of Taguchi designs is the relatively large number of experiments to perform. No case studies that optimize factors and at the same time test their ruggedness towards noise factors in the field of analytical chemistry are known to us.
3.4.5 Experimental part of the ruggedness test
The experiments are performed according to the chosen design and a response or a number of responses are measured. The sequence in which the experiments are performed can influence the estimation of the effect of a factor [36]. The reason for this lies in the fact that the measurements can be influenced by different sources of error. Each measurement is influenced by uncontrolled factors that cause random error. Measurements can also be influenced by systematic errors or by systematic errors caused by drift (linear drift; due to time-dependent factors). The occurrence of systematic errors or of drift will affect the estimation of the effects of the factors from the design [36]. If only random errors occurred, the experiments could be performed in any order. Performing the experiments in a randomized way allows more correct estimations of the effects when there are also systematic errors. These errors are due to uncontrolled factors which vary between sets of experiments [36]. An example of this type of error could be the factor "room temperature". Suppose that the factor "room temperature" is not controlled or examined during the performance of a design and that it takes more than one day to execute the complete design. It is then possible that a different room temperature during the different days introduces systematic differences between the sets of experiments carried out on those days.
The full and fractional factorial designs especially are best performed in a random order to avoid the influence of systematic errors, because they are constructed so that one factor is at one level in the first N/2 experiments and at the other in the last N/2 experiments. The Plackett-Burman designs can be considered as randomized when they are performed in the sequence that is described in the original papers of Plackett and Burman. Sorting the experiments for organizational reasons or because of experimental constraints, as is proposed in some publications [6,17,19], should preferably be avoided. In some experimental designs it is however too difficult and time-consuming to perform the design in a random order. This is for instance the case when one of the factors is the "column manufacturer" or "batch number of the column". In this case two columns are used and performing the design in a fully random order is not practical. Experiments performed with the different columns must be grouped. The experiments with each column can still be executed in random order. An example of drift which occurs in practice is the ageing of a chromatographic column. The ageing of a column can be considerable in the time needed to perform a design. In a ruggedness test performed in our laboratory of the HPLC method for the determination of tetracycline as described in the USP [37], considerable changes in response were observed due to the ageing of the column in the span of time needed to perform a half-fraction factorial design for 4 factors [38]. A reduction in retention time, capacity factor, relative retention and resolution for tetracycline of up to respectively 14%, 19%, 11% and 18% occurred. This effect caused by the ageing of the column could be confounded with the effect of the sorted factor. Grouping of experiments has to be done with caution, always keeping in mind that time-dependent factors (drifting factors) could disturb the effects of the sorted factors.
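Grouping the experiments per column while randomizing within each group can be sketched as follows. The design rows, the position of the grouping factor and the function name are hypothetical; the seed is only there to make the example reproducible.

```python
import random

def blocked_run_order(design, block_index, seed=None):
    """Group runs by the level of one factor and shuffle the runs within each group."""
    rng = random.Random(seed)
    groups = {}
    for number, run in enumerate(design, start=1):
        groups.setdefault(run[block_index], []).append(number)
    order = []
    for level in groups:                  # groups keep their order of first appearance
        numbers = groups[level][:]
        rng.shuffle(numbers)
        order.extend(numbers)
    return order

# Hypothetical design: (level of another factor, column batch "X" or "Y").
design = [(-1, "X"), (1, "Y"), (-1, "Y"), (1, "X"),
          (1, "X"), (-1, "Y"), (1, "Y"), (-1, "X")]
order = blocked_run_order(design, block_index=1, seed=1)
print(order)
print([design[number - 1][1] for number in order])
```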
If systematic errors due to drift are expected then one can perform the design in a well defined randomized way so that the calculated main effects are not biased by the drift [36]. These designs are called anti-drift designs and they are described for full and fractional factorial designs. However, the interaction effects calculated from these designs are still biased by the drift. Drift-free effects can also be obtained by regularly performing experiments at nominal level between the experiments of the design. From the experiments at nominal level the drift can be measured. This allows one to
correct the responses of the design and to obtain drift-free effects. The disadvantage of this method is that more experiments are required.
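One possible way of using the nominal experiments is sketched below. The text does not prescribe a correction formula, so this sketch assumes an additive drift that is linearly interpolated between successive nominal experiments and expressed relative to the first nominal experiment; the function name and the data are hypothetical.

```python
def drift_corrected(responses, nominal):
    """responses: (position, y) of the design experiments.
    nominal: (position, y_nominal) sorted by position and bracketing all runs.
    Assumption: additive drift, linear between two nominal experiments."""
    reference = nominal[0][1]
    corrected = []
    for position, y in responses:
        for (p0, n0), (p1, n1) in zip(nominal, nominal[1:]):
            if p0 <= position <= p1:
                estimate = n0 + (n1 - n0) * (position - p0) / (p1 - p0)
                corrected.append(y - (estimate - reference))
                break
        else:
            raise ValueError(f"run at position {position} is not bracketed")
    return corrected

# Hypothetical sequence: nominal runs at positions 0 and 4, design runs in between.
nominal = [(0, 100.0), (4, 96.0)]
responses = [(1, 99.0), (2, 98.0), (3, 97.0)]
print(drift_corrected(responses, nominal))
```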
3.4.6 Analysis of the results
After carrying out the design the results are analysed. In the first instance the results of the design (y1, y2, ..., yN) can be plotted to see if one or a few results differ extremely from the rest of the design, indicating possible errors in the execution of those experiments. Those differing significantly from the rest (according to, for instance, Dixon's test or Grubbs' test) should best be repeated to verify whether the extreme value was due to an error. The effects of the different factors on the response are calculated. This is done according to equation (2). In equation (2) and in the rest of this chapter n represents the number of runs that are performed at one level of a factor, respectively at (+) or (-) level. The symbol N indicates the number of different experiments specified in a design. For instance, for a 2^3 design for which each experiment is performed once, N is equal to 8 and n to 4. A theoretical example of how to calculate an effect is shown in equation (3) and some practical results can be observed in the tables belonging to the case studies described in Section 3.4.9 (Tables 3.22, 3.24, 3.26, 3.28).¹
¹ In refs. [13,36], an effect is calculated as:

$$E_X = \frac{\sum Y(+) - \sum Y(-)}{N} = \frac{\sum Y(+) - \sum Y(-)}{2n} \qquad (2')$$

An effect found with this equation is half the effect found with the formula used in eq. (2). The effect obtained using equation (2) describes the effect that occurs when the factor is changed from one extreme level to the other. The use of equation (2') can be justified as an estimation of the effect that occurs when changing a factor from the nominal to an extreme level. However, the conclusions drawn from eq. (2') are only valid if a number of assumptions are fulfilled: (a) the factor is quantitative and not qualitative; (b) the nominal level is situated in the middle of the interval between the two extreme levels; (c) the response is linear in the interval between the two extreme levels. Since these assumptions are not always fulfilled the use of equation (2') is not recommended.
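The relation between the two ways of calculating an effect can be illustrated with a few lines of Python. The sketch assumes that equation (2) is the difference between the mean responses at the (+) and (-) level, as described in the text; the responses are invented for illustration only.

```python
def effect(signs, responses):
    """Equation (2) as described: mean response at (+) minus mean response at (-)."""
    plus = [y for s, y in zip(signs, responses) if s > 0]
    minus = [y for s, y in zip(signs, responses) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

def half_effect(signs, responses):
    """Equation (2'): (sum of Y(+) minus sum of Y(-)) divided by N."""
    return sum(s * y for s, y in zip(signs, responses)) / len(responses)

# Column A of a 2^3 design (N = 8, n = 4) and hypothetical responses.
column_a = [-1, 1, -1, 1, -1, 1, -1, 1]
y = [10.0, 12.0, 11.0, 15.0, 9.0, 13.0, 10.0, 16.0]
print(effect(column_a, y))
print(half_effect(column_a, y))
```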
When a factor is examined at three levels, two effects can be calculated for each factor: E_X(+1) and E_X(-1), the effects of factor X between the nominal and the high extreme level and between the nominal and the low extreme level, respectively. They are calculated from ΣY(+1), ΣY(-1) and ΣY(0), the sums of the responses where factor X was at level (+1), (-1) and (0) respectively, and from n, the number of runs where the factor was at each level. The effects on a response can also be normalized on a scale between 0 and 100% by dividing the effect of a factor by the mean nominal response ($\bar{Y}_{nom}$) and multiplying by 100:

$$E_X(\%) = \frac{E_X}{\bar{Y}_{nom}} \times 100 \qquad (7)$$
In this way the effect is expressed as a percentage of the response at nominal level. The effects and normalized effects on a response can be arranged from highest to lowest to show which effects have the largest influence on the considered response.
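Normalizing and ranking the effects takes only a few lines. In this sketch the effects and the mean nominal response are hypothetical numbers, and ranking by absolute value is an assumption about what "highest to lowest" means for effects of different sign.

```python
def normalized_effect(effect, nominal_mean):
    """Equation (7): the effect as a percentage of the mean nominal response."""
    return 100.0 * effect / nominal_mean

def ranked(effects):
    """Factor names ordered from the largest to the smallest absolute effect."""
    return sorted(effects, key=lambda name: abs(effects[name]), reverse=True)

effects = {"A": 4.0, "B": -6.0, "C": 2.0}    # hypothetical effects
nominal_mean = 80.0                           # hypothetical mean nominal response
percent = {name: normalized_effect(e, nominal_mean) for name, e in effects.items()}
print(percent)
print(ranked(percent))
```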
3.4.7 Statistical analysis of the results
To determine whether an effect is statistically significant or not, a statistical interpretation method is used. Different possibilities have been described. An overview of them is given below.

3.4.7.1 Normal or half normal probability plots
Normal probability plots or half normal probability plots (Birnbaum plots) [24,29] are graphical methods that help to decide which factors are significant. Effects that are normally distributed around zero are effects
that are not different from experimental error (nonsignificant effects) and they tend to fall on a straight line in these plots. Significant effects do not belong to such a normal distribution and deviate from this line. A normal probability plot and a Birnbaum plot are shown in Figure 3.2. In Figure 3.2a the main effects for factors M and A and the two-factor interaction MC are significant. Factor M has the largest influence on the considered response. In Figure 3.2b factor E has a highly significant influence on the response. The effect of factor B can be considered as being on the edge of significance since it is not always obvious how to draw the straight line exactly.
Figure 3.2a Example of a normal probability plot (taken from ref. [29]).
Figure 3.2b Example of a Birnbaum plot (half normal probability plot); axes: variable against empirical distribution. (Reprinted from International Laboratory, volume 16, page 43, 1986. Copyright 1986 by International Scientific Communications Inc. [24])

3.4.7.2 Interpretation methods that use t-tests
The test statistic [11,13,24,27,28,39] is:

$$t = \frac{E_X}{(SE)_e} \qquad (8)$$

where E_X is the effect of factor X and (SE)_e is the standard error of the effect. In fact one tests whether E_X is significantly different from zero or not. One could also say that one is comparing the mean responses at the two levels to see if they are significantly different. The critical value is a t-value (t_critical) for the relevant number of degrees of freedom and a given
a (usually 0.05). If the It1 -value for an effect exceeds the critical value it is considered to be statistically significant. In this type of application, one often does not try to determine the degrees of liberty correctly. Some authors [11,39] simply use the value 2 for tcjtical. The tcjtical value (-0.05) really depends on the number of degrees of freedom but tends to 2, especially when the number of degrees of freedom becomes larger. All hypothesis tests can be represented in two ways, i.e. using critical values, as described above, or using confidence intervals. The confidence limits are given by:
If the confidence interval of the effect of a factor contains zero, then the effect is not significantly different from zero. When zero is outside the confidence interval, the factor has a statistically significant influence. A convenient way of representing the results is to calculate an $E_{critical}$ above which a calculated effect of a factor will be considered significant:

$$E_{critical} = t_{critical}\,(SE)_e \qquad (10)$$

Only one $E_{critical}$ has to be calculated for each response. The $E_{critical}$ is then compared with the $|E_X|$ values of the factors. The $|E_X|$ values that are larger are significant.
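The three equivalent ways of reading the t-test (test statistic, confidence interval, critical effect) can be sketched as follows. The effects, the standard error and the function name `interpret_effects` are hypothetical; the critical t-value 2.571 is the tabulated two-sided value for α = 0.05 and 5 degrees of freedom, chosen here only as an example.

```python
def interpret_effects(effects, se_effect, t_critical):
    """Apply the t statistic, the confidence interval and E_critical to each effect."""
    e_critical = t_critical * se_effect
    results = {}
    for name, effect in effects.items():
        t_value = abs(effect) / se_effect
        interval = (effect - e_critical, effect + e_critical)
        results[name] = {
            "t": t_value,
            "ci": interval,
            "significant": abs(effect) > e_critical,
        }
    return e_critical, results


# Hypothetical numbers, for illustration only.
effects = {"A": 0.8, "B": -3.1, "C": 0.2}
se_effect = 0.5
t_critical = 2.571  # tabulated t, two-sided, alpha = 0.05, 5 degrees of freedom

e_critical, results = interpret_effects(effects, se_effect, t_critical)
for name, r in results.items():
    low, high = r["ci"]
    print(f"{name}: t={r['t']:.2f}  CI=({low:.2f}, {high:.2f})  significant={r['significant']}")
```

An effect flagged through $E_{critical}$ is exactly one whose |t| exceeds $t_{critical}$ and whose confidence interval excludes zero, so any of the three representations can be reported.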
Methods that estimate $(SE)_e$ from the variance of the experiments

Since an effect is a difference of means (equation (2)), the standard error of the effect is calculated according to the equation for the standard error on a difference of means:

$$(SE) = \sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}} \qquad (11)$$

where $s_a^2$ and $s_b^2$ estimate the variances of the two sets of measurements and $n_a$ and $n_b$ are the numbers of measurements in those sets. When
adapting equation (11) for the calculation of the standard error of an effect it is assumed that $\sigma_a^2 = \sigma_b^2 = \sigma^2$ (estimated by $s^2$) and $n_a = n_b = n$, with $s^2$ being the variance of the experiments from the design and n the number of experiments performed at each level for a factor of the design. This gives*

$$(SE)_e = \sqrt{\frac{2 s^2}{n}} \qquad (12)$$

A number of methods have been proposed to determine the variance, $s^2$. They are described below.

1. Using a variance ($s^2$) from R replicate measurements at nominal level [31,39]. The standard error of an effect determined from unreplicated runs is given by equation (12). The number of degrees of freedom for $t_{critical}$ is equal to R - 1.

2. Using duplicated runs for the experiments of the design [31]. The variance is then given by

$$s^2 = \frac{\sum d_i^2}{2N} = \frac{\sum d_i^2}{2n}$$

where $d_i$ is the difference between the duplicated experiments. The standard error of an effect is derived from equation (12) and given by

$$(SE)_e = \sqrt{\frac{\sum d_i^2}{n^2}}$$

Here n is equal to N since there are N measurements performed at each level of a factor. The number of degrees of freedom for $t_{critical}$ is N.
* Remark: If eq. (2') is used to calculate the effects, this has a consequence for the formula used to estimate $(SE)_e$. In eq. (2') the effect is considered as "a mean of N individual results [13]" (in which N/2 results have a (+) sign and N/2 a (-) one). The standard error is then estimated as a standard error on the mean of N results, giving $(SE)_e = \sqrt{s^2/N}$.
3. Using the variance obtained with the results ($y_1, y_2, \ldots, y_N$) from the design, as proposed by [13]. The variance is given by

$$s^2 = \frac{\sum (y_i - \bar{y})^2}{N - 1}$$

where $\bar{y}$ is the mean of the results. This criterion can, however, only be used if no significant effects occur [11]. Therefore, this method should not be used. In reference [13], however, the restriction is not made.
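The first two variance-based estimates can be sketched as below. The replicate values, the duplicate pairs and both function names are hypothetical; the formulas follow equation (12) with $s^2$ taken either from R replicates at nominal level or from the differences between duplicated runs.

```python
from math import sqrt
from statistics import variance


def se_from_nominal_replicates(replicates, n_per_level):
    """(SE)_e = sqrt(2 s^2 / n), with s^2 from R replicates at nominal level (R - 1 df)."""
    s2 = variance(replicates)
    return sqrt(2 * s2 / n_per_level), len(replicates) - 1


def se_from_duplicates(pairs):
    """(SE)_e from N duplicated design runs: s^2 = sum(d_i^2) / (2N), n = N (N df)."""
    n_runs = len(pairs)
    sum_d2 = sum((first - second) ** 2 for first, second in pairs)
    s2 = sum_d2 / (2 * n_runs)
    return sqrt(2 * s2 / n_runs), n_runs


# Hypothetical data: five replicates at nominal level, used with an unreplicated
# 8-run design (so n = 4 experiments at each level of a factor).
replicates = [100.2, 99.8, 100.1, 99.9, 100.0]
se_rep, df_rep = se_from_nominal_replicates(replicates, n_per_level=4)

# Hypothetical duplicated 8-run design: one (first, second) pair per design run.
pairs = [(10.1, 10.3), (9.8, 9.9), (10.4, 10.2), (10.0, 10.0),
         (9.7, 9.9), (10.2, 10.1), (10.5, 10.3), (9.9, 10.0)]
se_dup, df_dup = se_from_duplicates(pairs)

print(f"replicates: (SE)_e = {se_rep:.4f} with {df_rep} df")
print(f"duplicates: (SE)_e = {se_dup:.4f} with {df_dup} df")
```

The number of degrees of freedom returned alongside each estimate is the one to use when looking up $t_{critical}$.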
Methods that estimate $(SE)_e$ using negligible effects

In the previous methods the standard error for an effect was estimated from the variance of the experiments using equation (12). In the methods described in this section, the variance for an effect is estimated with the help of calculated effects that are considered to be negligible.

1. Using the effects of multiple-factor interactions from full or fractional factorial designs. Multiple-factor interactions (e.g. three- and four-factor interactions) are considered to have a negligible effect. It is then considered that these higher-order interaction effects measure differences arising from experimental error [31]. The variance of an effect is given here by

$$(SE)_e^2 = \frac{\sum E_{X_iX_jX_k}^2}{n_{X_iX_jX_k}}$$

where $E_{X_iX_jX_k}$ are the effects of the multiple-factor interactions and $n_{X_iX_jX_k}$ is the number of these effects. The estimated standard error of an effect, $(SE)_e$, is then given by

$$(SE)_e = \sqrt{\frac{\sum E_{X_iX_jX_k}^2}{n_{X_iX_jX_k}}}$$

The number of degrees of freedom used for $t_{critical}$ is $n_{X_iX_jX_k}$.

2. Using dummy factors in Plackett-Burman designs [24,26,27]. The effect of a dummy factor is considered to be due to experimental error.

$$(SE)_e^2 = \frac{\sum E_{dummy,i}^2}{n_{dummy}}$$
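where $E_{dummy,i}$ are the effects of the dummy factors and $n_{dummy}$ is the number of dummies. The estimated standard error of an effect is then

$$(SE)_e = \sqrt{\frac{\sum E_{dummy,i}^2}{n_{dummy}}}$$

The number of degrees of freedom used for $t_{critical}$ is $n_{dummy}$.

The equations used to estimate $(SE)_e$ from negligible effects can be explained as follows. The mean effect of the multiple-factor interactions or of the dummies is expected to be zero. The variance of these effects can then be calculated as:

$$\frac{\sum (E_{X_iX_jX_k} - 0)^2}{n_{X_iX_jX_k}} = \frac{\sum E_{X_iX_jX_k}^2}{n_{X_iX_jX_k}} \quad \text{or} \quad \frac{\sum (E_{dummy,i} - 0)^2}{n_{dummy}} = \frac{\sum E_{dummy,i}^2}{n_{dummy}}$$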
In Plackett-Burman designs the main effects are confounded with a number of two-factor and higher-order interactions, as already seen before. In a ruggedness test one is mainly interested in finding the significant main effects. The two-factor and higher-order interactions are often considered to be negligible. Two-factor interactions are indeed usually smaller than the main effects, but they are not always small enough to be completely neglected. As a consequence they will contribute to the calculated main effect. By using a number of dummies to estimate the $(SE)_e$ value, one obtains a measure for experimental error to which these interaction effects can contribute. The $(SE)_e$ value obtained with the methods of the previous part of this section, where it was estimated from duplicated experiments in the design or from replicated measurements at nominal level, and that can be considered as a measure for the
repeatability does not take this into account and is therefore expected to be smaller than the one obtained from the method with the dummies. When $(SE)_e$ is estimated from two-factor interactions in fractional factorial designs (instead of from three-factor or higher-order interactions) one has an interpretation criterion which is analogous to the one with dummies used for the Plackett-Burman designs. A possible consequence of using the dummy factor effects or the two-factor interactions is that significant interactions will increase the $(SE)_e$ considerably. This problem can be avoided in the following way. The strongly significant dummy or two-factor interaction effects can be detected with normal probability plots and omitted from the statistical interpretation. Moreover, enough dummies or two-factor interaction effects should be used to estimate $(SE)_e$ (for example at least three).

3.4.7.3 Interpretation methods that use an F-test

Multiple-factor interaction effects can be used to determine a critical effect value for the interpretation of full and fractional factorial designs [29]:

$$E_{critical,1} = \sqrt{F(1,m)\,\frac{\sum E_{X_iX_jX_k}^2}{m}} \qquad (20)$$

where m is the number of interaction effects considered, $E_{X_iX_jX_k}$ is the effect of a multiple-factor interaction and F(1,m) the F-value of the Fisher distribution. The following critical level, derived from the above described statistical critical level, was also found in the literature [40]:
where k depends on the fraction of the factorial design and represents the number of effects that are confounded with each other, e.g. k = 2 for a half-fraction factorial design, since 2 effects are confounded with each other. The idea behind the use of $E_{critical,2}$ is that the calculated main effect for a factor is also attributed partly to a number of multiple-factor interaction effects. The calculated effect for a non-significant factor can, according to [40], become statistically significant when using $E_{critical,1}$, due to the contribution of the multiple-factor interaction effects to the calculated effect, while this would not be the case with $E_{critical,2}$. However, in the formula for the calculation of $E_{critical,2}$ there is a contradiction. On the one hand it considers the multiple-factor interactions as being negligible (see the calculation of $E_{critical,1}$ in equation (20) as well as the analogy with equation (14)), while on the other hand it assumes that the multiple-factor interaction effects confounded with the main effects have the same order of magnitude as the main effects (see the use of factor k). Therefore, we do not recommend its use. One can also use the dummy factor effects in a similar way for the interpretation of Plackett-Burman designs. One then uses a formula analogous to equation (20). Instead of using the multiple-factor effects in the formula one uses the dummy factor effects:
$$E_{critical} = \sqrt{F(1,n_{dummy})\,\frac{\sum_{i=1}^{n_{dummy}} E_{d,i}^2}{n_{dummy}}}$$

where $E_{d,i}$ is the effect of a dummy factor, $n_{dummy}$ is the number of dummies used and $F(1,n_{dummy})$ the corresponding value of the Fisher distribution. These interpretation criteria are less used in the literature than the t-test methods. However, both methods will yield identical results since $t_{df} = \sqrt{F(1,df)}$ [41]. Some authors [29,34] present the statistical interpretation method as an ANOVA table. A general example for a $2^3$ full factorial design is given in Table 3.19. The sums of squares ($SS_X$) are obtained with the effect values ($E_X$) and the number of experiments in the design (N). The mean square
values ($MS_X$) are calculated with the $SS_X$ and the corresponding number of degrees of freedom ($df_X$). The variance ratio, $F_X$, is obtained by dividing $MS_X$ by the mean square value representing the error ($MS_{error}$). The calculated $F_X$ value is then compared to an $F_{critical}$ that is equal to the tabulated $F(df_X, df_{error})$. If $F_X$ is larger, the effect is statistically significant. The error term can be approximated in different ways. A first possibility is that, analogous to the above, it is estimated from the multiple-factor interactions (two-, three-factor interactions, etc.) for (fractional) factorial designs [29]. In the example of Table 3.19 the sums of squares of the interactions AB, AC, BC and ABC are summed, giving an $SS_{error}$ with 4 degrees of freedom. From this $SS_{error}$ an $MS_{error}$ is calculated that is used in the calculation of $F_X$ for the main effects. The ANOVA table and equation (20) of course give the same results. A second possibility is described in reference [34]. It consists of summing all $SS_X$ values for which the $\%SS_X$ value (see Table 3.19) is smaller than 5% to make an approximation of $SS_{error}$. The method assumes that these sums of squares come from effects that are negligible. The 5% value is an arbitrary value. The number of degrees of freedom is equal to the sum of the corresponding $df_X$. The values for $MS_X$ and $F_X$ are then calculated analogously to what is described above. Thirdly, when the experiments of the design are replicated, a number of degrees of freedom remain after the calculation of the $SS_X$ values, and they allow an $SS_{error}$ to be calculated. This $SS_{error}$ and the $df_{error}$ are then used to calculate $MS_{error}$ and $F_X$ [34].
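The first possibility (error estimated from the interactions) can be sketched for a $2^3$ design as follows. The effect values are hypothetical. The relation $SS_X = N E_X^2 / 4$ is an assumption: it is the usual relation for two-level designs when the effect is the difference between the mean responses at the two levels, and the text above only states that $SS_X$ is obtained from $E_X$ and N. The critical value 7.71 is the tabulated F(1,4) for α = 0.05.

```python
from math import sqrt

N = 8  # experiments in the 2^3 full factorial design
# Hypothetical effects (difference between mean responses at the two levels).
effects = {"A": 4.0, "B": 0.5, "C": -2.0, "AB": 0.3, "AC": -0.4, "BC": 0.2, "ABC": 0.1}
MAIN = ("A", "B", "C")
ERROR_TERMS = ("AB", "AC", "BC", "ABC")  # interactions pooled into the error term
F_CRITICAL = 7.71                        # tabulated F(1, 4), alpha = 0.05

# Assumed relation for two-level designs: SS_X = N * E_X^2 / 4, each with 1 df.
ss = {name: N * e ** 2 / 4 for name, e in effects.items()}
ss_total = sum(ss.values())
percent_ss = {name: 100 * value / ss_total for name, value in ss.items()}

df_error = len(ERROR_TERMS)
ms_error = sum(ss[name] for name in ERROR_TERMS) / df_error
f_ratio = {name: ss[name] / ms_error for name in MAIN}  # MS_X = SS_X because df_X = 1
significant_f = [name for name in MAIN if f_ratio[name] > F_CRITICAL]

# Same decision through the critical effect of equation (20).
e_critical = sqrt(F_CRITICAL * sum(effects[n] ** 2 for n in ERROR_TERMS) / df_error)
significant_e = [name for name in MAIN if abs(effects[name]) > e_critical]

for name in MAIN:
    print(f"{name}: SS={ss[name]:.2f}  %SS={percent_ss[name]:.1f}  F={f_ratio[name]:.1f}")
print("significant (ANOVA):", significant_f)
print("significant (E_critical):", significant_e)
```

Both routes flag the same factors, which illustrates the statement that the ANOVA table and equation (20) give the same results.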
3.4.8 Using predefined values to identify chemically relevant factors

When using predefined values [4], no statistical test is performed to identify relevant factors. So-called chemically relevant effects of factors are identified by comparing them with predefined critical values for the responses.
TABLE 3.19
ANOVA TABLE FOR A 2^3 FACTORIAL DESIGN

Source of variation (factors)   Degrees of freedom   Sum of squares (SS)   %SS       Mean square   Variance ratio: F
A                               df_A (=1)            SS_A                  %SS_A     MS_A          F_A
B                               df_B (=1)            SS_B                  %SS_B     MS_B          F_B
C                               df_C (=1)            SS_C                  %SS_C     MS_C          F_C
AB                              df_AB (=1)           SS_AB                 %SS_AB    MS_AB
AC                              df_AC (=1)           SS_AC                 %SS_AC    MS_AC
BC                              df_BC (=1)           SS_BC                 %SS_BC    MS_BC
ABC                             df_ABC (=1)          SS_ABC                %SS_ABC   MS_ABC
General notation: X             df_X                 SS_X                  %SS_X     MS_X          F_X = MS_X / MS_error
The authors also define a standard error, although differently from the one previously given in equation (12). For that reason, the standard error defined in reference [4] will be referred to here as the relative standard deviation of the experiments (RSD), since that is in fact what is calculated. The calculated effects are normalized on a scale of 100%. The experiments in the design are duplicated. The effect and the relative standard deviation are calculated as follows:

$$E_X(\%) = \frac{E_X}{\bar{y}_n} \cdot 100$$

where $E_X$ represents the effect of factor X as described above and $\bar{y}_n$ is the mean result of a number of measurements at nominal level (obtained from within or outside the design). In this method the effect values $E_X(\%)$ are compared with predefined values to identify relevant factors. These predefined values do not represent the limit of statistical significance but the limit of chemical relevance. These limits represent acceptable variations that are allowed to occur in practice. A list of these predefined values for the effect of factors on responses measured in HPLC is shown in Table 3.20. The relative standard deviation is not used in a statistical test [4]. It is only used to check whether the repeatability of the method is good enough. If the relative standard deviation is larger than 1%, the repeatability is considered insufficient for an HPLC method. In that case the reason for the large relative standard deviation has to be diagnosed before the main effects from the ruggedness test are interpreted. To obtain a standard error as defined earlier in this review one should use the formula described in equation (12) and also presented in reference [17], which would give in this case:
TABLE 3.20
LIST OF SOME PREDEFINED VALUES IN HPLC (REPRINTED WITH PERMISSION FROM REFERENCE [4])

Response             Predefined value
Conc. peak area      1%
Conc. peak height    1%
Plate count          50%
Retention time       10%
Peak area            2%
Peak height          2%
Resolution           50% or 2.5
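A screening against the predefined values could look like the sketch below. The limits are copied from Table 3.20 (the resolution entry, "50% or 2.5", is left out because its exact use is not specified here). The effects, the nominal mean, the factor names and the function names are hypothetical, and treating "larger than the limit" as the decision rule is an assumption.

```python
# Predefined limits from Table 3.20, in % of the nominal response.
PREDEFINED_LIMITS = {
    "conc. peak area": 1.0,
    "conc. peak height": 1.0,
    "plate count": 50.0,
    "retention time": 10.0,
    "peak area": 2.0,
    "peak height": 2.0,
}


def normalized_effect(effect, nominal_mean):
    """E_X(%) = 100 * E_X / mean result at nominal level."""
    return 100.0 * effect / nominal_mean


def relevant_factors(effects, nominal_mean, response):
    """Return the factors whose |E_X(%)| exceeds the predefined limit (assumed rule)."""
    limit = PREDEFINED_LIMITS[response]
    flagged = {}
    for factor, effect in effects.items():
        percent = normalized_effect(effect, nominal_mean)
        if abs(percent) > limit:
            flagged[factor] = round(percent, 2)
    return flagged


# Hypothetical retention-time effects (min) and nominal mean retention time (min).
effects = {"pH": 0.12, "flow": -0.95, "temperature": 0.31}
nominal_mean = 6.0
flagged = relevant_factors(effects, nominal_mean, "retention time")
print(flagged)
```

No statistical test is involved: a factor is reported only because its normalized effect exceeds what is considered chemically acceptable for that response.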
3.4.9 Case studies
3.4.9.1 Ruggedness tests of a HPLC method for the determination of tetracycline.HCl as described in the USP XXII [38]

A Plackett-Burman (N=12) and a quarter-fraction factorial design (generators: E=ABC, F=BCD) were used to examine six factors. The factors were examined at two levels. The factors and their levels are shown in Table 3.21. The effects and the normalized effects on the following responses were determined: retention time, capacity factor, relative retention of tetracycline and of three by- and degradation products (4-epianhydrotetracycline (EATC), epitetracycline (ETC) and anhydrotetracycline (ATC)) and the resolution between the peaks. Critical effect values were obtained with a t-test (see equations (8) and (10)). The standard error was estimated from dummy factor effects for the Plackett-Burman design (see Section 3.4.7.2, equation (19)) and from two-factor interaction effects for the fractional factorial design (see Section 3.4.7.2, equation (17)). Normal probability plots were also drawn for the normalized effects. The effects of the factors on some responses of tetracycline are given in Table 3.22 and one of the normal probability plots is shown in Figure 3.3. Analogous results are found from the statistical analyses of the Plackett-Burman and the fractional factorial design, in spite of the different level of confounding in these designs and of the different ways of estimating $(SE)_e$.
The normal probability plots also lead to identical conclusions as the statistical analysis with the t-tests.
Figure 3.3 Normal probability plot of the normalized effects for the resolution between epianhydrotetracycline and tetracycline obtained from the fractional factorial design. [Horizontal axis: observed value (= E_X); the labelled points are F, C and BD+CF.]
TABLE 3.21
FACTORS AND THEIR LEVELS FOR THE DETERMINATION OF TETRACYCLINE.HCL (CASE STUDY 1)

Factors                                                        Levels
A. Inorganic substances in mobile phase                        0.0975 M / 0.195 M    0.1025 M / 0.205 M
   (ratio M(ammonium oxalate) / M(ammonium phosphate))
B. Dimethylformamide in mobile phase                           260 ml                280 ml
C. pH of mobile phase                                          7.50                  7.80
D. Flow of mobile phase                                        0.9 ml/min            1.1 ml/min
E. Integration parameter (SN-ratio)                            1                     3
F. Age of column                                               new column            2 weeks used
One concludes, for example, that for the response capacity factor the ageing of the column (factor F) has the most significant effect. The amount of DMF in the mobile phase (factor B) has a smaller effect which is on the limit of significance. The effect of the factors A, C, D and E on the capacity factor is not significant. For the retention time the same conclusions as for the capacity factor can be drawn, the only difference being that the flow of the mobile phase (factor D) has a large effect. Concerning the resolution between the different peaks, the ageing of the column (F) has the largest influence. The pH (C) also has a significant effect while the other factors are not significant. From Figure 3.3 and Table 3.22 it can be observed that the effect estimated for the two-factor interactions BD+CF is also significant. Considering the fact that the main effects of the factors C and F are highly significant, it is most likely that the significant effect found for BD+CF is due to the interaction CF and not to BD nor to one of the other two higher-order interactions confounded in this estimation. For the relative retention, results analogous to those for the resolution were obtained. The same factors cause an effect.
TABLE 3.22
EFFECTS, NORMALIZED EFFECTS (%E) AND CRITICAL EFFECT VALUES ON SOME RESPONSES OF TETRACYCLINE FROM THE PLACKETT-BURMAN AND THE FRACTIONAL FACTORIAL DESIGN

Plackett-Burman design

Factors      Retention time       Capacity factor      Relative retention    Resolution
             Effect     %E        Effect     %E        Effect      %E        Effect      %E
A             0.085     1.18       0.048     2.52      -0.0370    -2.14      -0.1148    -6.35
B            -0.363    -5.03      -0.128    -6.68      -0.0502    -2.91      -0.0521    -2.89
C             0.230     3.19       0.071     3.72       0.0912     5.28       0.3584    19.84
D            -1.344   -18.63      -0.067    -3.51      -0.0416    -2.41      -0.2552   -14.13
E            -0.033    -0.46       0.000    -0.01       0.0178     1.03       0.1869    10.34
F            -0.929   -12.88      -0.367   -19.15      -0.2808   -16.27      -0.3133   -17.34
Dummy 1       0.160     2.22       0.080     4.16       0.0727     4.21       0.1198     6.63
Dummy 2       0.176     2.45       0.086     4.49       0.0495     2.87       0.1842    10.19
Dummy 3       0.136     1.88       0.035     1.82       0.0597     3.46       0.1540     8.52
Dummy 4      -0.177    -2.46      -0.060    -3.14       0.0028     0.16      -0.0212    -1.17
Dummy 5       0.005     0.07      -0.010    -0.52       0.0203     1.18       0.0003     0.02
Ecritical     0.376     5.21       0.157     8.20       0.125      7.21       0.309     17.13

Quarter-fraction factorial design

Factors      Retention time       Capacity factor      Relative retention    Resolution
             Effect     %E        Effect     %E        Effect      %E        Effect      %E
A             0.096     1.33       0.030     1.54      -0.0122    -0.71       0.0882     4.88
B            -0.200    -2.77      -0.094    -4.93      -0.0237    -1.37      -0.0019    -0.10
C             0.236     3.28       0.097     5.06       0.0991     5.74       0.4166    23.06
D            -1.192   -16.53      -0.004    -0.19      -0.0175    -1.02       0.1374     7.60
E             0.097     1.34       0.021     1.12       0.0141     0.82      -0.0339    -1.87
F            -0.854   -11.84      -0.328   -17.10      -0.2020   -11.70      -0.6431   -35.60
AB+CE         0.161     2.23       0.043     2.26       0.0151     0.87       0.0716     3.96
AC+BE         0.144     2.00       0.045     2.36       0.0086     0.50      -0.0076    -0.42
AD+EF        -0.006    -0.09       0.014     0.74       0.0211     1.22      -0.0822    -4.55
AE+BC         0.216     2.99       0.040     2.08      -0.0225    -1.30       0.0906     5.02
AF+DE        -0.142    -1.97      -0.042    -2.21      -0.0258    -1.50       0.0535     2.96
BD+CF        -0.128    -1.77      -0.050    -2.62      -0.0893    -5.17      -0.4053   -22.43
BF+CD         0.038     0.53       0.034     1.75       0.0010     0.06      -0.0802    -4.44
Ecritical     0.324     4.49       0.094     4.93       0.089      5.15       0.393     21.78
Ecritical without effect of (BD+CF)                     0.039      2.27       0.153      8.49
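As a check on how the critical effects in Table 3.22 arise, the retention-time values can be recomputed from the dummy effects (Plackett-Burman design) and from the two-factor interaction effects (fractional factorial design). This is a sketch under the assumption that $(SE)_e$ is the root mean square of the negligible effects and that the tabulated two-sided t-values for α = 0.05 (2.571 for 5 and 2.365 for 7 degrees of freedom) were used; the function name is illustrative.

```python
from math import sqrt


def e_critical_from_negligible(negligible_effects, t_critical):
    """E_critical = t_critical * sqrt(sum(E_i^2) / n) for n negligible effects."""
    n = len(negligible_effects)
    se_effect = sqrt(sum(e * e for e in negligible_effects) / n)
    return t_critical * se_effect


# Retention-time effects taken from Table 3.22.
pb_main = {"A": 0.085, "B": -0.363, "C": 0.230, "D": -1.344, "E": -0.033, "F": -0.929}
pb_dummies = [0.160, 0.176, 0.136, -0.177, 0.005]
ff_interactions = [0.161, 0.144, -0.006, 0.216, -0.142, -0.128, 0.038]

# Tabulated two-sided t-values for alpha = 0.05 (df = number of negligible effects).
T_CRITICAL = {5: 2.571, 7: 2.365}

e_crit_pb = e_critical_from_negligible(pb_dummies, T_CRITICAL[len(pb_dummies)])
e_crit_ff = e_critical_from_negligible(ff_interactions, T_CRITICAL[len(ff_interactions)])
significant_pb = [f for f, e in pb_main.items() if abs(e) > e_crit_pb]

print(f"Plackett-Burman E_critical:      {e_crit_pb:.3f}")
print(f"Fractional factorial E_critical: {e_crit_ff:.3f}")
print("Significant in Plackett-Burman design:", significant_pb)
```

The results agree with the tabulated values 0.376 and 0.324 to within rounding of the listed effects, and factors D and F are the ones exceeding the critical effect for the retention time.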
3.4.9.2 HPLC assays of acetylsalicylic acid and its major degradation product, salicylic acid [5,17], and of Salbutamol and its major degradation product [5,6]

These two case studies are good examples of ruggedness tests that focus on the type of data analysis described in Section 3.4.6 and where one prefers not to carry out a statistical analysis. They will be described in detail later in this book (Chapter 5). The factors are examined at three levels in reflected Plackett-Burman designs. From the results of the designs normalized effects, $E_X(\%)$, are calculated. No statistical interpretation criterion is used to identify significant effects, but possibly relevant factors are determined by plotting the effects, $E_X(\%)$, of the different factors (see Figure 3.4).

Figure 3.4 The effects, $E_X(\%)$, of the different factors on the resolution ($R_s$) between Salbutamol and its major degradation product (extracted from ref. [4])
Other results obtained from the ruggedness test are the definition of optimized method conditions for the factors and of system suitability criteria for a number of responses. System suitability parameters [6,17] are defined as an interval in which a response can vary for a rugged method. The system suitability criteria are the ranges of values between which a response (e.g. retention time, capacity factor, number of theoretical plates, resolution) can vary without affecting the quantitative results of the analysis. For instance, a design is performed and the retention time of the main substance varies between 200 s and 320 s without affecting the quantitative determination of the substances. The system suitability criterion for the retention time is then defined as the interval 200 s - 320 s. Optimal values for the factors are selected from the tested levels for the factors (extremes or nominal) as a function of a number of responses of the method (see also references [16,19]). When one changes the method conditions due to these results, one has to be aware that a new method is defined. What is done here is in fact a simplistic way of optimizing a method. The optimization of a method, however, is a step that is expected to come much earlier in the method development than the ruggedness testing. One also has to realize that defining a new method requires a new full validation, including a ruggedness test.
TABLE 3.23
FACTORS AND THEIR LEVELS FOR THE DETERMINATION OF LINCOMYCINE A [13]

Factor                                  Nominal level   Minimal level   Maximal level
Concentration lincomycine A (mg/ml)     300             285             315
pH of mobile phase                      4.5             4.0             5.0
Flow of mobile phase (ml/min)           1.0             0.7             1.3
TABLE 3.24
EFFECTS ON THE RESPONSE ANALYTICALLY DETERMINED CONCENTRATION LINCOMYCINE A FOR THE RUGGEDNESS TEST OF CASE STUDY 3 [13]

Factor                              Effect    Confidence interval on the effect   Conclusion
A: Concentration lincomycine A      +15       +1 ↔ +29                            Significant
B: pH of mobile phase               +4.5      -9.5 ↔ +18.5                        Not significant
C: Flow                             +0.5      -13.5 ↔ +14.5                       Not significant
Interaction AB                      -0.25     -14.3 ↔ +13.8                       Not significant
Interaction AC                      -0.75     -14.8 ↔ +13.3                       Not significant
Interaction BC                      -0.75     -14.8 ↔ +13.3                       Not significant
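The conclusions in Table 3.24 follow from checking whether each confidence interval contains zero. In the sketch below the half-width of 14 is not stated in the source; it is inferred from the tabulated intervals (for example +1 to +29 around an effect of +15) and should be read as an assumption.

```python
# Effects from Table 3.24.
effects = {"A": 15.0, "B": 4.5, "C": 0.5, "AB": -0.25, "AC": -0.75, "BC": -0.75}
HALF_WIDTH = 14.0  # assumed t_critical * (SE)_e, inferred from the tabulated intervals


def confidence_interval(effect, half_width=HALF_WIDTH):
    """Return the (lower, upper) confidence limits of an effect."""
    return effect - half_width, effect + half_width


conclusions = {}
for factor, effect in effects.items():
    lower, upper = confidence_interval(effect)
    contains_zero = lower <= 0.0 <= upper
    conclusions[factor] = "Not significant" if contains_zero else "Significant"
    print(f"{factor:>2}: {effect:+6.2f}  [{lower:+6.2f}, {upper:+6.2f}]  {conclusions[factor]}")
```

Only the interval for factor A excludes zero, in agreement with the conclusion column of the table.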
3.4.9.4 Ruggedness test for the determination of water in fertilizers by total distillation with heptane [12]

Seven factors were examined in a 1/16th fraction of a $2^7$ design. The factors and their levels are shown in Table 3.25. The factors were examined at the nominal and at an extreme level. A design was performed on four types of fertilizers containing, when determined under nominal conditions, respectively 18.80%, 1.10%, 3.83% and 1.03% water. For the first three fertilizers the results of the design and the calculated effects are given in Table 3.26. No statistical test was performed to identify significant effects. Important effects are determined by ranking the effects and comparing their values with each other. Due to the large difference in the water content, comparisons between the effects in the different fertilizers (see Table 3.26) are clearer when normalized effects are used. For fertilizer 2, for instance, the effects are smaller in absolute value than for fertilizer 1, but much larger in relative value. The use of the normalized effects allows clearer comparisons between effects of a factor on the response of different samples.
TABLE 3.25
THE FACTORS AND THEIR LEVELS EXAMINED FOR THE DETERMINATION OF WATER IN FERTILIZERS (CASE STUDY 4) [12]

Factors                  Nominal level   Extreme level
A. Amount of water       ca. 2 ml        ca. 5 ml
B. Reaction time         0 min           15 min
C. Distillation rate     2 drops/s       6 drops/s
D. Distillation time     90 min          45 min
E. n-Heptane             210 ml          190 ml
F. Aniline               8 ml            12 ml
G. Reagent               new             used
TABLE 3.26
RESULTS FOR THE RUGGEDNESS TEST ON THE DETERMINATION OF WATER IN FERTILIZERS [12]

Amount of water (%)
Exp.    Fertilizer 1   Fertilizer 2   Fertilizer 3
1       18.80          1.10           3.93
2       20.58          1.74           4.10
3       19.90          1.02           3.59
4       18.03          1.19           3.38
5       19.50          1.10           3.49
6       19.16          1.13           3.77
7       19.88          1.24           3.97
8       19.85          1.27           3.40

Effect
Factors   Fertilizer 1   Fertilizer 2   Fertilizer 3
A          0.27          -0.07          -0.07
B         -0.10          -0.09          -0.21
C         -0.11           0.21          -0.06
D         -0.63          -0.23          -0.27
E          0.08           0.19           0.09
F          0.83           0.11           0.33
G          0.99           0.11          -0.09

Normalized effect (%)
Factors   Fertilizer 1   Fertilizer 2   Fertilizer 3
A          1.44          -6.36          -1.83
B         -0.53          -8.18          -5.48
C         -0.86          19.09          -1.57
D         -3.35         -20.91          -7.05
E          0.43          17.27           2.35
F          4.41          10.00           8.62
G          5.27          10.00          -2.35
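The normalized effects in Table 3.26 are consistent with dividing each effect by the water content found under nominal conditions. The sketch below assumes that relation and checks it for a few entries of fertilizers 1 and 2; the dictionary layout and function name are illustrative.

```python
# Water content (%) under nominal conditions, as given in the text.
NOMINAL = {1: 18.80, 2: 1.10}

# A few effects from Table 3.26 (factors A, D, F and G).
EFFECTS = {
    1: {"A": 0.27, "D": -0.63, "F": 0.83, "G": 0.99},
    2: {"A": -0.07, "D": -0.23, "F": 0.11, "G": 0.11},
}


def normalized_effects(effects, nominal):
    """Normalized effect (%) = 100 * effect / response at nominal conditions (assumed)."""
    return {factor: 100.0 * value / nominal for factor, value in effects.items()}


normalized = {fert: normalized_effects(EFFECTS[fert], NOMINAL[fert]) for fert in EFFECTS}
for fert, values in normalized.items():
    formatted = ", ".join(f"{factor}: {value:+.2f}%" for factor, value in values.items())
    print(f"Fertilizer {fert}: {formatted}")
```

Factor D illustrates the point made in the text: its effect is smaller in absolute value for fertilizer 2 (-0.23 against -0.63) but far larger relative to the water content (about -21% against about -3%).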
3.4.9.5 Ruggedness test on an HPLC assay for the determination of phenylbutazone and its major degradation products [22]

A Plackett-Burman design for 7 factors (N=8) was applied. The factors are tested at the nominal and an extreme level. They are shown in Table 3.27. For three brands of phenylbutazone (injectable solutions) and a reference solution (standard), experiments according to a design were performed. The responses considered are the amounts of phenylbutazone and two of its degradation products, expressed as a percentage of the theoretical amount of phenylbutazone in the injection solution. To identify significant effects they were compared to a critical value. From reference [22] it is not clear how this critical value was determined. The results for one brand of phenylbutazone are given in Table 3.28. From the results of the three brands it was observed that the only factor with a really significant influence was the age of the reference solution (C, in Table 3.28). The ruggedness test shows that the method description should specify that the test solutions have to be freshly prepared shortly before analysis.
TABLE 3.27
FACTORS AND THEIR LEVELS FOR THE DETERMINATION OF PHENYLBUTAZONE (CASE STUDY 5) (REPRINTED WITH PERMISSION FROM [22])

Factors                                                  Nominal level      Extreme level
A. Ionic strength of buffer                              0.10 M             0.09 M
B. pH of buffer                                          5.25               5.35
C. Age of the reference solution                         0 h                18 h
D. Concentration in the reference solution               200 mg/ml          180 mg/ml
E. Composition of the mobile phase                       51 : 46.5 : 2.5    52 : 46 : 2
   (TRIS/citrate-acetonitrile-tetrahydrofuran)
F. Detection wavelength                                  237 nm             239 nm
G. Flow of mobile phase                                  2.0 ml/min         1.9 ml/min
TABLE 3.28 SOME RESULTS FOR THE RUGGEDNESS TESTS ON PHENYLBUTAZONE (REPRINTED WITH PERMISSION FROM [22]) Deg. Exp prod. I
Deg. prod. 11
Phenyl- Factors butazone
1 2 3 4 5 6 7 8
1.559 1.553 1.506 1.649 1.568 1.570 1.533 1.573
97.78 101.58 98.39 102.54 96.80 100.82 95.84 103.65
0.743 0.739 0.719 0.715 0.743 0.714 0.756 0.748
Deg. prod. I
Deg. prod. 11
- 0.010
B
+ 0.005
+ 0.008 + 0.795 + 0.002 - 0.860
C
+0.010 + 0.025 - 0.010 + 0.005 + 0.005
- 0.042 - 0.022 - 0.022
A
D
Phenylbutazone
- 4.945 + 0.075
+ 0.970 + 1.035
+ 0.048 + 0.028 - 0.860
Critical value P