Frontiers in Statistical Quality Control 8

Hans-Joachim Lenz Peter-Theodor Wilrich Editors

Frontiers in Statistical Quality Control 8 With 92 Figures and 93 Tables

Physica-Verlag A Springer Company

Professor Dr. Hans-Joachim Lenz
[email protected]
Professor Dr. Peter-Theodor Wilrich
[email protected]
Freie Universität Berlin
Institut für Statistik und Ökonometrie
Garystraße 21
14195 Berlin
Germany

ISBN-10 3-7908-1686-8 Physica-Verlag Heidelberg New York
ISBN-13 978-3-7908-1686-0 Physica-Verlag Heidelberg New York

Cataloging-in-Publication Data applied for

Library of Congress Control Number: 2006921315

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Physica-Verlag. Violations are liable for prosecution under the German Copyright Law.

Physica is a part of Springer Science+Business Media
springer.com

© Physica-Verlag Heidelberg 2006
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Soft-Cover-Design: Erich Kirchner, Heidelberg

SPIN 11611004

88/3153-5 4 3 2 1 0 - Printed on acid-free and non-aging paper

Editorial

The VIIIth International Workshop on "Intelligent Statistical Quality Control" took place in Warsaw, Poland, and was hosted by Professor Dr. Olgierd Hryniewicz, Systems Research Institute of the Polish Academy of Sciences and Warsaw School of Information Technology, Warsaw, Poland. The workshop itself was jointly organized by Professor Dr. O. Hryniewicz, Professor Dr. H.-J. Lenz, Professor Dr. P.-T. Wilrich, Dr. P. Grzegorzewski, Edyta Mrówka and Maciej Romaniuk.

The workshop papers integrated in this volume are divided into three main parts: Part 1: "General Aspects of SQC Methodology"; Part 2: "On-line Control" with subchapters "Sampling Plans", "Control Charts" and "Monitoring"; and Part 3: "Off-line Control", including Data Analysis, Calibration and Experimental Design.

In Part 1, "General Aspects of SQC Methodology", von Collani and Palcat analyze "How Some ISO Standards Complicate Quality Improvement". They compare the aims of ISO standards for QC with the aims of continuous quality improvement. Due to a lack of compatibility, different QC procedures are proposed.

In Part 2, "On-line Control", there are fifteen papers. It starts with two papers on "Sampling Plans". Hryniewicz considers "Optimal Two-Stage Sequential Sampling Plans by Attributes". Acceptance sampling by attributes requires large samples when the fraction of nonconforming items in sampled lots or processes is very low. Wald's sequential sampling plans were designed to meet this situation. Hryniewicz proposes restricted, curtailed sequential sampling plans for attributes. The plans fulfil pre-specified statistical requirements for risks, while offering minimal sampling effort. Palcat reviews three-class sampling plans useful for legal metrology studies in his paper entitled "Three-Class Sampling Plans: A Review with Applications". He reviews the key features of three-class sampling plan theory and discusses some applications where such plans would be effective for QC. The author examines applications which are specific to the field of legal metrology. He closes with case studies where isolated lots are common and currently used methods are problematic.

Control charting has been an integral part of on-line control, and there is evidence that this will continue. Therefore, almost one half of the papers focus on "Control Charts". Bodnar and Schmid in "CUSUM Control Schemes for Multivariate Time Series" extend multivariate CUSUM charts to VARMA processes with Gaussian noise superimposed. They consider both modified control charts and residual charts. By an extensive Monte Carlo study they compare them with the multivariate EWMA chart (Kramer and Schmid 1997). Knoth pays special attention to the correct design goal when control charts are run. His paper is entitled "The Art of Evaluating Monitoring Schemes - How to Measure the Performance of Control Charts?". The author asks for caution when using the "minimal out-of-control ARL" as a design criterion of monitoring schemes and advocates the "minimal steady-state ARL" from the viewpoint of features of the steady-state delay distribution. Morais and Pacheco present some striking examples of joint (µ, σ)-schemes in "Misleading Signals in Joint Schemes for µ and σ". They show that the occurrence of misleading signals should alert the quality staff on the shop floor and should concern practitioners. Mrówka and Grzegorzewski contribute to a new design of control charts with a paper on "The Fréchet Control Charts". They suggest the Fréchet distance for simultaneous monitoring of process level and spread. Their new chart behaves comparably to classical control charts if changes occur in either the process level or the process spread only. However, it is much better than a combined (X̄, S) chart if simultaneous disturbances of the process level and spread happen. In their paper entitled "Reconsidering Control Charts in Japan", Nishina, Kuzuya and Ishii study the role of causality and its relation to goals as target functions of control charting. Machine capability improvements due to advanced production technology have resulted in variance reduction within subgroups. They note that part of the variance between subgroups can be included in the variance due to chance causes. In a case study they show that a measurement characteristic specified by a related standard is not necessarily appropriate as the control characteristic. Pokropp, Seidel, Begun, Heidenreich and Sever monitor police activities in "Control Charts for the Number of Children Injured in Traffic Accidents". They specify a generalised linear model (GLM) with Poisson counts. Parameter estimation is based on data representing the daily number of injuries. Seasonal effects are considered. Control limits are computed by Monte Carlo simulation of the underlying mixing distributions in order to detect deviations from the police target values for various periods of interest. Reynolds Jr. and Stoumbos take a look at process deviations and follow up the rational subgroup concept in "A New Perspective on the Fundamental Concept of Rational Subgroups". Control charts are usually based on a sampling interval of fixed length. They investigate the question whether it is better or not to use sample sizes n = 1 or n > 1 and to select either concentrated or dispersed sampling. A tandem chart to control µ and σ is investigated. They conclude that the best overall performance is obtained by taking samples of n = 1 and using an EWMA or CUSUM chart combination. The Shewhart chart combination with the best overall performance is based on n > 1. Saniga, McWilliams, Davis and Lucas investigate "Economic Advantages of CUSUM Control Charts for Variables". Their view of an economic CUSUM design is more general than the scope of earlier publications on this topic. ARLs are calculated using the Luceño and Puig-Pey (2002) algorithm in combination with a Nelder-Mead search procedure. The policy decision of choosing a CUSUM chart or a Shewhart chart is addressed. Suzuki, Harada and Ojima present a study on "Choice of Control Interval for Controlling Assembly Processes". Time series models are used for effective process control of specific assembly processes, especially if the number of products is high. Influential factors like the control interval or the dead time of the assembly process are considered. Yasui, Ojima and Suzuki in "Generalization of the Run Rules for the Shewhart Control Charts" extend Shewhart's 3-sigma rule and propose two new rules based on sequences of observations. The performance of such modifications is evaluated under several out-of-control scenarios.

Part 2 closes with three papers on "Monitoring". Andersson, in her contribution "Robust On-Line Turning Point Detection. The Influence of Turning Point Characteristics", is interested in turning point problems of cyclical processes. She develops and evaluates the methodology for on-line detection of turning points in production processes by using an approximate ML estimation technique combined with a nonparametric approach. Iwersen and Melgaard in "Specification Setting for Drugs in the Pharmaceutical Industry" discuss the practical implications of setting and maintaining specifications for drugs in the pharmaceutical industry. These include statistical process control limits, release limits, shelf life limits and in-use limits. The challenge is to make the limits consistent and practical. The approach involves normal linear mixed models and the Arrhenius model, a kinetic model which describes, for example, the temperature dependence of drug degradation. In "Monitoring a Sequencing Batch Reactor for the Treatment of Wastewater by a Combination of Multivariate Statistical Process Control and a Classification Technique", Ruiz, Colomer and Melendez combine multivariate SPC and a specially tailored classification technique in order to monitor a wastewater treatment plant.

Part 3, "Off-line Control", includes five papers. Göb discusses in "Data Mining and Statistical Control - A Review and Some Links" statistical quality control and its relation to very large (terabyte-scale) operational databases sampled from industrial processes. He strongly advocates the adoption of techniques for handling and exploring large data sets, i.e. OLTP databases and OLAP data warehouses, in industry. He reviews the links between data mining techniques and statistical quality control and sketches ways of reconciling these disciplines. Grzegorzewski and Mrówka consider the calibration problem in which the corresponding loss function is no longer piecewise constant as in Ladany (2001). In their paper on "Optimal Process Calibration under Nonsymmetric Loss Function" they consider the problem of how to set up a manufacturing process in order to make it capable. They propose an optimal calibration method for such loss functions. The suggested calibration procedure depends on the process capability index Cp. Ojima, Yasui, Feng, Suzuki and Harada are concerned with "The Probability of the Occurrence of Negative Estimates in the Variance Components Estimation by Nested Precision Experiments". They apply a canonical form of generalised staggered nested designs, and the probability of the occurrence of negative LS estimates of variance components is evaluated. Some practical hints are derived for the necessary number of laboratories involved in such problems. Koyama in "Statistical Methods Applied to a Semiconductor Manufacturing Process" uses an L16 two-level orthogonal design and presents a semiconductor factory scenario where new types of semiconductors are to be manufactured very shortly after the design. The lack of time results in small data sets as well as many missing values. Finally, Vining and Kowalski in "An Overview of Composite Designs Run as Split-Plots" first summarise the results of Vining, Kowalski, and Montgomery (2004) and Vining, Parker, and Kowalski (2004). The authors then illustrate how to modify standard central composite designs and composite designs based on Plackett-Burman designs to accommodate the split-plot structure. The paper concludes with a walk through a fully worked-out example.

The impact of any workshop is mainly shaped by the quality of the papers presented at the meeting, revised later and finally submitted. We would like to express our deep gratitude to the following members of the scientific programme committee, who did an excellent job with respect to recruiting invited speakers as well as refereeing all the submitted papers:

Mr David Baillie, United Kingdom
Prof. Elart von Collani, Germany
Prof. Olgierd Hryniewicz, Poland
Prof. Hans-J. Lenz, Germany
Prof. Yoshikazu Ojima, Japan
Prof. Poul Thyregod, Denmark
Prof. Peter-Th. Wilrich, Germany
Prof. William H. Woodall, U.S.A.

We would like to close with our cordial thanks to Mrs. Angelika Wnuk, Institute of Production, Information Systems and Operations Research, Free University Berlin, who assisted us in cleaning up and integrating the WINWORD papers.


We gratefully acknowledge the financial support of the Department of Economics, Institute of Statistics and Econometrics, and Institute of Production, Information Systems and Operations Research of the Free University of Berlin, Germany, which made it possible to put this volume to press. Moreover, we again thank Physica-Verlag, Heidelberg, for its continuing efficient collaboration. On behalf of all participants, the editors would like to thank Professor Dr. Olgierd Hryniewicz and his staff for their superb hospitality, the perfect organisation, and the stimulating scientific atmosphere. We are happy and proud to announce that the International Workshop on Intelligent Statistical Quality Control will be continued in 2007.

Berlin, November 2005

Hans-J. Lenz
Peter-Th. Wilrich

Contents

PART 1: GENERAL ASPECTS OF SQC METHODOLOGY

How Some ISO Standards Complicate Quality Improvement
E. von Collani, F. A. Palcat

PART 2: ON-LINE CONTROL

2.1 Sampling Plans

Optimal Two-Stage Sequential Sampling Plans by Attributes
O. Hryniewicz

Three-Class Sampling Plans: A Review with Applications
F. A. Palcat

2.2 Control Charts

CUSUM Control Schemes for Multivariate Time Series
M. Bodnar, W. Schmid

The Art of Evaluating Monitoring Schemes - How to Measure the Performance of Control Charts?
S. Knoth

Misleading Signals in Joint Schemes for µ and σ
M. C. Morais, A. Pacheco

The Fréchet Control Charts
E. Mrowka, P. Grzegorzewski

Reconsidering Control Charts in Japan
K. Nishina, K. Kuzuya, N. Ishii

Control Charts for the Number of Children Injured in Traffic Accidents
F. Pokropp, W. Seidel, A. Begun, M. Heidenreich, K. Sever

A New Perspective on the Fundamental Concept of Rational Subgroups
M. R. Reynolds, Jr., Z. G. Stoumbos

Economic Advantages of CUSUM Control Charts for Variables
E. M. Saniga, T. P. McWilliams, D. J. Davis, J. M. Lucas

Choice of Control Interval for Controlling Assembly Processes
T. Suzuki, T. Harada, Y. Ojima

Generalization of the Run Rules for the Shewhart Control Charts
S. Yasui, Y. Ojima, T. Suzuki

2.3 Monitoring

Robust On-Line Turning Point Detection. The Influence of Turning Point Characteristics
E. Andersson

Specification Setting for Drugs in the Pharmaceutical Industry
J. Iwersen, H. Melgaard

Monitoring a Sequencing Batch Reactor for the Treatment of Wastewater by a Combination of Multivariate Statistical Process Control and a Classification Technique
M. Ruiz, J. Colomer, J. Melendez

PART 3: OFF-LINE CONTROL

Data Mining and Statistical Control - A Review and Some Links
R. Göb

Optimal Process Calibration under Nonsymmetric Loss Function
P. Grzegorzewski, E. Mrowka

The Probability of the Occurrence of Negative Estimates in the Variance Components Estimation by Nested Precision Experiments
Y. Ojima, S. Yasui, Feng L., T. Suzuki, T. Harada

Statistical Methods Applied to a Semiconductor Manufacturing Process
T. Koyama

An Overview of Composite Designs Run as Split-Plots
G. Vining, S. Kowalski

Author Index

Andersson, Eva, Dr., Göteborg University, Statistical Research Unit, PO Box 660, SE-405 30 Göteborg, Sweden, e-mail: [email protected]

Begun, Alexander, Dipl.-Math., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany, e-mail: [email protected]

Bodnar, Olha, Dr., Europe University Viadrina, Department of Statistics, Postfach 1786, D-15207 Frankfurt (Oder), Germany, e-mail: [email protected]

Collani, Elart von, Prof. Dr., Universität Würzburg, Volkswirtschaftliches Institut, Sanderring 2, D-97070 Würzburg, Germany, e-mail: [email protected]

Colomer, Joan, Prof. Dr., University of Girona, Department of Electronics, Computer Science and Automatic Control, Campus Montilivi, Building PIV, C.P. 17071 Girona, Spain, e-mail: [email protected]

Davis, Darwin J., Ph.D., Prof., Department of Business Administration, College of Business and Economics, University of Delaware, 204 MBNA America Building, Newark, DE 19716, U.S.A., e-mail: [email protected]

Feng, Ling, Dr., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan, e-mail: [email protected]

Göb, Rainer, Prof. Dr., Universität Würzburg, Institute for Applied Mathematics and Statistics, Sanderring 2, D-97070 Würzburg, Germany, e-mail: [email protected]

Grzegorzewski, Przemyslaw, Ph.D., Polish Academy of Sciences, Systems Research Institute, Newelska 6, 01-447 Warsaw, Poland, and Warsaw University of Technology, Faculty of Mathematics and Information Sciences, Plac Politechniki 1, 00-661 Warsaw, Poland, e-mail: [email protected]

Harada, Taku, Ph.D., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan, e-mail: [email protected]

Heidenreich, Melanie, Dipl.-Math., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany, e-mail: [email protected]

Hryniewicz, Olgierd, Prof. Dr., Systems Research Institute of the Polish Academy of Sciences and Warsaw School of Information Technology, Newelska 6, 01-447 Warsaw, Poland, e-mail: [email protected]

Ishii, Naru, Nagoya Institute of Technology, Department of Civil Engineering and Systems Management, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan, e-mail: [email protected]

Iwersen, Jørgen, Dr., Novo Nordisk A/S, Novo Alle, DK-2880 Bagsvaerd, Denmark, e-mail: [email protected]

Knoth, Sven, Dr., Advanced Mask Technology Center, Postfach 110161, D-01330 Dresden, Germany, e-mail: [email protected]

Kowalski, Scott M., Dr., Technical Trainer, Minitab, Inc., State College, PA 16801, U.S.A.

Koyama, Takeshi, Prof. Dr., Tokushima Bunri University, Faculty of Engineering, Sanuki City, 769-2101, Japan, e-mail: [email protected]

Kuzuya, Kazuyoshi, SQC Consultant, Ohaza-Makihara, Nukata-cho, Aichi 444-3624, Japan, e-mail: [email protected]

Lucas, James M., Dr., J. M. Lucas and Associates, 5120 New Kent Road, Wilmington, DE 19808, U.S.A., e-mail: [email protected]

McWilliams, Thomas P., Ph.D., Prof., Drexel University, Department of Decision Sciences, Philadelphia, PA 19104, U.S.A., e-mail: [email protected]

Melendez, Joaquim, Prof. Dr., University of Girona, Department of Electronics, Computer Science and Automatic Control, Campus Montilivi, Building PIV, C.P. 17071 Girona, Spain, e-mail: [email protected]

Melgaard, Henrik, Dr., Novo Nordisk A/S, Novo Alle, DK-2880 Bagsvaerd, Denmark, e-mail: [email protected]

Morais, Manuel C., Technical University of Lisbon, Department of Mathematics and Centre for Mathematics and its Applications, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal, e-mail: [email protected]

Mrowka, Edyta, M.Sc., Polish Academy of Sciences, Systems Research Institute, Newelska 6, 01-447 Warsaw, Poland, e-mail: [email protected]

Nishina, Ken, Prof. Dr., Nagoya Institute of Technology, Department of Techno-Business Administration, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan, e-mail: [email protected]

Ojima, Yoshikazu, Prof. Dr., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan, e-mail: [email protected]

Pacheco, Antonio, Technical University of Lisbon, Department of Mathematics and Centre for Mathematics and its Applications, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal, e-mail: [email protected]

Palcat, Frank, Measurement Canada, Ottawa, Ontario, K1A 0C9, Canada, e-mail: [email protected]

Pokropp, Fritz, Prof. Dr., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany, e-mail: [email protected]

Reynolds Jr., Marion R., Prof. Dr., Virginia Polytechnic Institute and State University, Department of Statistics, Blacksburg, VA 24061-0439, U.S.A., e-mail: [email protected]

Ruiz, Magda, Prof., University of Girona, Department of Electronics, Computer Science and Automatic Control, Campus Montilivi, Building PIV, C.P. 17071 Girona, Spain, e-mail: [email protected]


Saniga, Erwin M., Prof. Dr., University of Delaware, Department of Business Administration, Newark, DE 19716, U.S.A., e-mail: [email protected]

Schmid, Wolfgang, Prof. Dr., Europe University Viadrina, Department of Statistics, Postfach 1786, D-15207 Frankfurt (Oder), Germany, e-mail: [email protected]

Seidel, Wilfried, Prof. Dr., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany, e-mail: [email protected]

Sever, Krunoslav, Dipl.-Math., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany, e-mail: [email protected]

Stoumbos, Zachary G., Prof. Dr., Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8054, U.S.A., e-mail: [email protected]

Suzuki, Tomomichi, Ph.D., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan, e-mail: [email protected]

Vining, G. Geoffrey, Prof. Dr., Virginia Tech, Department of Statistics, Blacksburg, VA 24061, U.S.A., e-mail: [email protected]

Yasui, Seiichi, Science University of Tokyo, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan, e-mail: [email protected]

Part 1 General Aspects of SQC Methodology

How Some ISO Standards Complicate Quality Improvement

Elart von Collani¹ and Frank A. Palcat²

¹ University of Würzburg, Sanderring 2, D-97070 Würzburg, Germany, [email protected]
² Measurement Canada, Standards Building, Holland Avenue, Ottawa, Canada, palcat.frank@ic.gc.ca

Summary. The practice of industrial quality control is often defined by ISO standards, which are considered to represent the state of the art in relevant technology and science. At the same time, industrial enterprises make great efforts to develop and implement strategies for continuous quality improvement in all parts of their organizations, focussing on reducing waste and producing better quality at lower costs in order to stay in business in a globally-competitive marketplace. In the first part of this paper, the aims of some relevant ISO standards for controlling quality are compared with the aims of a strategy for continuous quality improvement. As it turns out, the two aims are hardly compatible, and using ISO standards for controlling quality may constitute a major barrier to quality improvement. The second part outlines procedures for quality control which support quality improvement and, therefore, are more appropriate in modern industrial environments than the existing ISO standards, which are essentially still based on the thinking surrounding the needs of the US armed forces during World War II and the first wave of progress in quality control over 60 years ago.

1 International Organization for Standardization

ISO stands for the International Organization for Standardization, a body that has released more than 14000 standards in all areas of life. The ISO internet homepage provides the following information:

"If there were no standards, we would soon notice. Standards make an enormous contribution to most aspects of our lives - although very often, that contribution is invisible. ISO (International Organization for Standardization) is the world's largest developer of standards. Although ISO's principal activity is the development of technical standards, ISO standards also have important economic and social repercussions. ISO standards make a positive difference, not just to engineers and manufacturers for whom they solve basic problems in production and distribution, but to society as a whole. ISO standards contribute to making the development, manufacturing and supply of products and services more efficient, safer and cleaner. ISO standards are technical agreements which provide the framework for compatible technology worldwide."

Accordingly, without standards, and in particular ISO standards, the lives of individuals and societies throughout the world would be at a minimum very cumbersome. This is an undeniable fact, and everybody should acknowledge the value of international standardization and, in particular, ISO's very important contribution in this regard. However, this does not mean that, among the many thousands of different standards in existence, all of them achieve their objectives.

1.1 ISO Standards

At present, ISO standards are divided into 40 different sections, starting with "Generalities, Terminology, Standardization, Documentation" and ending with "Domestic and commercial equipment, Entertainment, Sports." This paper deals with the Section 07 heading "Mathematics, Natural Sciences," and more specifically with Subsection 07.020, representing mathematics and entitled "Mathematics: Application of statistical methods in quality assurance." Natural sciences are represented by subsections for "Physics, Chemistry," for "Astronomy, Geodesy, Geography," for "Geology, Meteorology, Hydrology," for "Biology, Botany, Zoology" and for "Microbiology."

The fact that standards dealing with statistical quality assurance are listed under the caption "mathematics" is quite strange, as no mathematician would agree that quality assurance is a special branch of mathematics. This leads to the question as to which meaning of "statistics" is assumed by the relevant parts of the ISO standards. Unfortunately, this question is not addressed in the three voluminous standards entitled "Statistics - Vocabulary and symbols." However, there is a short note in the 2003 Business Plan of the technical committee responsible for Subsection 07.020, i.e., ISO TC 69 "Applications of statistical methods." There it is said:

"Statistics is the science that develops probabilistic models, based on collection of data, that are used to optimize decisions under uncertainty and to forecast the impact of selected decisions. Thus, statistics contributes to all phases of the product life cycle such as market needs assessment during conception, optimal specification development under cost and quality constraints during development, process control and optimization in delivery, and customer satisfaction assessment."

Without intending to be captious, this definition of statistics falls short of being factual. Probabilistic models are developed in probability theory, and this development is not at all based on data or the collection of data, but rather on mathematical principles. Moreover, it is unclear how statistics optimizes any decision, and looking at textbooks of statistics reveals that in a majority of them there are no statistical methods for forecasting. Generally, statistics is called a methodology, as stated earlier in the same Business Plan: "Statistical methodology coming from the mathematical branch of probability theory ...," which again is not correct. Statistics came into being earlier than and independently from probability theory. One might be inclined to ignore this looseness in the handling of concepts and definitions; however, as we will see, it is symptomatic of the underlying problem and has immediate consequences regarding the developed International Standards.

1.2 General Principles

The development of a standard is supposed to follow some general principles that are stated in the ISO/IEC Directives, Part 2: "Rules for the structure and drafting of International Standards." Section 4 contains the general principles in 4.1 and 4.2:

4.1 Objective
The objective of documents published by ISO and IEC³ is to define clear and unambiguous provisions in order to facilitate international trade and communication. To achieve this objective, the document shall
- be as complete as necessary within the limits specified by its scope,
- be consistent, clear and accurate,
- take full account of the state of the art (see 3.11)⁴,
- provide a framework for future technological development, and
- be comprehensible to qualified persons who have not participated in its preparation.

4.2 Performance approach
Whenever possible, requirements shall be expressed in terms of performance rather than design or descriptive characteristics. This approach leaves maximum freedom to technical development. Primarily those characteristics shall be included that are suitable for worldwide (universal) acceptance. Where necessary, owing to differences in legislation, climate, environment, economies, social conditions, trade patterns, etc., several options may be indicated.

³ International Electrotechnical Commission
⁴ ISO/IEC Directives, Part 2: "State of the art: developed state of technical capability at a given time as regards products, processes and services, based on the relevant consolidated findings of science, technology and experience."

In the following, the objectives of some standards frequently used in industrial quality control are analyzed and, moreover, the conformity of the standards with the above-stated general principles is examined.

2 AQL-Based Sampling Plan Standards

The most popular types of lot-by-lot acceptance sampling plans are the so-called AQL-based sampling plan standards (hereafter referred to as AQL-plans). There are two different types. The first type is classified as "inspection by attributes" and is currently represented by:

- ISO 2859-0:1995 Sampling procedures for inspection by attributes - Part 0: Introduction to the ISO 2859 attribute sampling system.
- ISO 2859-1:1999 Sampling procedures for inspection by attributes - Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection.

The second type is classified as "inspection by variables" and is currently represented by:

- ISO 3951:1989 Sampling procedures and charts for inspection by variables for percent nonconforming.

2.1 History of AQL-Plans

Banks [3] gives a historical overview of the evolution of quality control from ancient times, including future trends. He gives the following account regarding the introduction of AQL-plans during World War II:

" World War I1 caused a rapid expansion in industry connected with the war effort. The attempts t o meet the large demands for war material resulted in increased employment of unskilled personnel in the manufacturing industries. Inevitably, the quality of goods fell. Something had to be done to halt this degradation. Quality Control experienced a new impetus and came of age. The military played a significant role in this maturation. In 1942, the Army's Office of the Chief Ordnance came out with the "Standard Inspection Procedures," whose development was largely due to G. Edwards, H. Dodge, and G. Gause. Romig and Torrey also provided assistance in this effort. These procedures also contained sampling tables, which were based on an acceptable quality level (AQL)."

During the 1950s, these sampling procedures were further developed and resulted finally in the well-known MIL-STD-105D, which is the forerunner of ISO 2859. In 1957, MIL-STD-414 was released and is essentially identical in principle to the current ISO 3951. Thus, it can be stated that the AQL-plans were basically developed for the US armed forces and their development was largely concluded in the 1950s.

Both of these military standards had the dual objective of protecting the soldiers from poor quality while providing them with sufficient material for their missions. Thus, methods were developed in order to reject product of such poor quality that a delivery to the troops would result in an incalculable danger. In addition, the methods were intended to impose the pressure of lot rejection on suppliers in order to motivate them to improve their production processes if inferior quality was being provided. This situation led to the development of the concept known as Acceptable Quality Level (AQL) and the associated AQL-plans (or schemes). In Duncan [13] the following description is given:

"The focal point of MIL-STD-105D is the Acceptable Quality Level or AQL. In applying the standard it is expected that in a conference (at a high level) between a supplier and a military agency it will be made clear to the supplier what the agency considers to be an acceptable quality level for a given product characteristic. It is expected that the supplier will be submitting for inspection a series of lots of this product, and it is the purpose of the sampling procedure of MIL-STD-105D to so constrain the supplier that he will produce product of AQL quality. This is done not only through the acceptance and rejection of a particular sampling plan but by providing for a shift to another, tighter sampling plan whenever there is evidence that the contractor's product has deteriorated from the agreed upon AQL target."

In practice, once a supplier had succeeded in achieving a production process producing continuously at the AQL, the goal was reached and no incentive for any further improvement was usually considered necessary.

2.2 Definition of AQL

The notion that the "AQL" would be the main point of negotiation between the military and the supplier was ambiguous from the beginning. Originally, the abbreviation stood for "acceptable quality level"; however, in [24] this later changed to "acceptance quality limit⁵." Some of the various definitions given for AQL over time are listed, starting with the original definition given at the August 1942 Ordnance Control training conference.

⁵ 3.4.6.15 acceptance quality limit (AQL): worst tolerable product quality level

1942 Ordnance Control training conference [12]: "Acceptable Quality Level (AQL) - the maximum percent defective which can be considered satisfactory as a process average; that is, it is the poorest quality which a facility can be permitted continually to present for acceptance." Dictionary of Statistical Terms (1960) [16]: "The proportion of effective units in a batch which is regarded as desirable by the consumer of the batch; the complement of the proportion of defectives which he is willing to tolerate." Quality Control Handbook (1962) [15]: "Closely related t o classifying of characteristics is the setting of the tolerable per cent defective for each of the defect classes. For vendor inspection this is commonly expressed as in acceptable quality level (AQL). Whereas classification indicates the relative seriousness of function of each characteristic class, the AQL specifies quantitatively the percentage of nonconformance which will be acceptable to the buyer in a 'mass' (or quantity) of product. In effect, the buyer is agreeing ahead of time that he cannot expect perfect product and that he will consider the purchase contract fulfilled if the degree of nonconformance in a lot is no worse than the specified level." American National Standard (1978) [I]:"The maximum percentage or proportion of variant units in a lot or batch that, for the purpose of acceptance sampling, can be considered satisfactory as a process average." American National Standard (1981) [2]: "The AQL is the maximum percent nonconforming (or the maximum number of nonconformities per 100 units) that, for purpose of sampling inspection, can be considered satisfactory as a process average." IS0 2859-1 (1989) [19]: "Acceptable quality level (AQL): When a continuous series of lots is considered, the quality level which for the purpose of sampling inspection is the limit of a satisfactory process average. The AQL is a parameter of the sampling scheme and should not be confused with the process average which describes the operating level of the manufacturing process. It is expected that the process average will be less than or equal to the AQL to avoid excessive rejections under this system." Encyclopedia of Statistical Sciences (1981) [17]: "This is usually defined as the maximum percent defective (or the maximum number. of defects per 100 units) that can be considered satisfactory for a process average." Schilling (1982) [18]: "Note on the meaning of AQL. When a consumer designates some specific value of AQL for a certain defect or group of defects, he indicates to the supplier that his (the consumer's) acceptance sampling plan will accept the great majority of the lots or batches that the supplier submits, provided the process average level of percent defective (or defects per hundred units) in these lots or batches be no greater than the designated value of AQL. Thus, the AQL is a designated value of percent defective (or defects per hundred units) that the consumer indicates will be accepted most of the time by the acceptance sampling procedure to be used. The sampling plans provided herein are so arranged that the

probability of acceptance at the designated AQL value depends upon the sample size, being generally higher for large samples than for small ones, for a given AQL. The AQL alone does not describe the protection t o the consumer for individual lots or batches but more directly relates to what might be expected from a series of lots or batches, provided the steps indicated in this publication are taken. It is necessary t o refer to the operating characteristic curve of the plan, to determine wha) protection the consumer will have. It also contains the following limitation: Limitation. The designation of an AQL shall not imply that the supplier has the right to supply knowingly any defective unit of product." In particular, Schilling's longish explanation demonstrates that the concept is not at all clear or unambiguous. AQL mixes process quality level with the proportion nonconforming of a stream of lots on the one hand, and of a single lot on the other. Moreover, it relates all three of them to a set of sampling plans determined within the framework of a sampling scheme. The ambiguity of the concept was well known, e.g., Duncan [14] notes "Intense difference~of opinion as to the desirability of certain sampling schemes sometimes arise because the various parties are not in agreement as to the meaning of the AQL." It would appear that the concept of an AQL could only be born in a military atmosphere characterized by conflicting interests and an inclination towards obscure solutions. The fact that civilian organizations for standardization have uncritically adopted the statistical quality concepts developed for the US Military reflects the decisive role the US Military had gained in this area and the unclear part of "statistics" within science and technology. 2.3 Objectives of AQL-Plans

According to ISO 2859-1:1989, an AQL-plan standard is not "intended for estimating lot quality or for segregating lots", but in ISO 3951:1989 the objective is described as follows:

"The object of the method laid down in this International Standard is to ensure that lots of an acceptable quality have a high probability of acceptance and that the probability of not accepting inferior lots is as high as possible. In common with ISO 2859, the percentage of nonconforming products in the lots is used to define the quality of these lots and of the production process in question."

The formulation "lots of an acceptable quality have a high probability of acceptance" is neither clear nor accurate and reflects the obscure performance of the sampling plans. In fact, it is not possible to state any consistent performance criteria for the sampling plans contained in the International Standard, as each single sampling plan performs differently. The claim "the probability of not accepting inferior lots is as high as possible" incorrectly suggests an optimisation procedure. The claim is irrelevant and illustrates that there is no "full account of the state of the art."

Of course, it makes sense to use the percentage of nonconforming products in a given lot to define its quality. One could also use this percentage to make inferences regarding the quality of the production process in question, but it makes no sense to use the percentage to define the quality of the production process. Moreover, the assumed close relationship between the percentage of nonconforming product in the lot and the production process restricts the application of these International Standards to situations where the lots are made up of consecutively produced items of a production process.

Having in mind the original situation during World War II, the primary aim of AQL-plans was to reject all those lots that, if delivered, would endanger one's own soldiers. The detection of poor quality is not made directly by measuring it, but indirectly by showing in some way that the claim of acceptable quality (quantified by the agreed-to AQL) is wrong. The second, more implicit aim is to have the producer improve the process in question until process quality has reached the agreed-upon AQL value.

As mentioned earlier, there are two types of AQL-plans. The second type is given by "ISO 3951:1989 Sampling procedures and charts for inspection by variables for percent nonconforming." As early as 1990, it had already been shown [4] that these standards cannot in principle give the same protection as the corresponding attribute plans of ISO 2859. These findings were discussed, for example, during the 4th Workshop in Baton Rouge [5] and subsequently investigated by Göb [9, 10, 11] and other researchers. Nevertheless, the ISO 3951 standard has not been withdrawn and is still being recommended for use by national and international bodies throughout the world.
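The protection question raised above is conventionally summarized by the operating characteristic (OC) curve of a plan, i.e. the probability of accepting a lot as a function of its fraction nonconforming. As a purely illustrative aid (not taken from ISO 2859 or ISO 3951), the following sketch computes the OC curve of a generic single sampling plan by attributes under the binomial model; the plan parameters n, c and the quality levels p are arbitrary example values.

```python
from math import comb

def acceptance_probability(n: int, c: int, p: float) -> float:
    """P(accept) for a single sampling plan by attributes: a lot is accepted
    if the number of nonconforming items in a sample of n is at most c,
    with the count modelled as Binomial(n, p)."""
    return sum(comb(n, d) * p**d * (1.0 - p)**(n - d) for d in range(c + 1))

# Illustrative plan only: sample size n = 125, acceptance number c = 3.
for p in (0.005, 0.010, 0.025, 0.050):
    print(f"fraction nonconforming p = {p:.3f} -> "
          f"P(accept) = {acceptance_probability(125, 3, p):.3f}")
```

Plotting such curves for competing (n, c) pairs makes the point of the preceding paragraphs explicit: the AQL value alone does not determine the protection either party receives; the whole OC curve does.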

3 Control Chart Standards

Besides the AQL-plan standards, the next most widely applied International Standards for quality control are those for control charts and, in particular, for the so-called Shewhart charts contained in ISO 8258. In the introduction of ISO 8258 [21], we may read:

"The traditional approach to manufacturing is to depend on production to make the product and on quality control to inspect the final product to screen out items not meeting specifications."

"The object of statistical process control is to serve to establish and maintain a process at an acceptable and stable level so as to ensure conformity of products and services to specified requirements. The major statistical tool to do this is the control chart, which is a graphical method for presenting and comparing information based on a sequence of samples representing the current state of a process against limits established after consideration of inherent process variability."

It appears that the standard is based on a rather out-dated process perception with respect to producing quality on the one hand and the relevant quality aims on the other. Fortunately, the practice of "inspecting quality into product" was abandoned long ago. Thus, it has not become a tradition that is handed down through generations and still affects the present.

3.1 History of Control Charts

In 1924, Western Electric's Bell Telephone Laboratories established its Inspection Engineering Department. One of its key members was Walter A. Shewhart, who proposed the first control chart for monitoring process quality that same year. Shewhart distinguished between chance causes and assignable causes of variation, where the latter can be assigned to a special disturbance and the former cannot. Based on this classification of variation, he searched for a method that would be able to distinguish between the two types of variation and could thus detect the occurrence of assignable causes, while tolerating the chance causes. Removing an assignable cause leads to an increase in profit, in contrast to an attempt to remove chance causes, which is not economical.

During the 1950s, W. Edwards Deming brought the concept of control charts to Japan, and the tremendous success of the transformed Japanese industry constituted the final breakthrough for control charting and, more generally, for statistical process control.

3.2 Objectives of Control Charts

The objective of a control chart is described in ISO 8258 as follows:

"The object of statistical process control is to serve to establish and maintain a process at an acceptable and stable level so as to ensure conformity of products and services to specified requirements. The major statistical tool used to do this is the control chart, which is a graphical method of presenting and comparing information based on a sequence of samples representing the current state of a process against limits established after consideration of inherent process variability. The control chart method helps first to evaluate whether or not a process has attained, or continues in, a state of statistical control at the proper specified level and then to obtain and maintain control and a high degree of uniformity in important process or service characteristics by keeping a continuous record of quality of the product while production is in progress. The use of a control chart and its careful analysis leads to a better understanding and improvement of the process."

Thus, the general aim of control charts is to establish and maintain a satisfactory process state. This is done in a manner similar to that of AQL-plans. Control charts do not prove that an assignable cause has occurred; rather, the claim that no assignable cause has occurred is shown to be false. As an important consequence of this indirect procedure, no substantial statement can be made about the probability of detecting an assignable cause.
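As a concrete illustration of the chart construction quoted above, the sketch below computes conventional 3-sigma Shewhart limits for subgroup means and flags a subgroup that falls outside them. It follows the generic textbook recipe rather than the detailed tables of ISO 8258, and the target level, process standard deviation and subgroup size are made-up example values.

```python
import statistics

def xbar_limits(mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Conventional 3-sigma Shewhart limits for the mean of subgroups of size n,
    assuming the in-control level mu0 and process standard deviation sigma are known."""
    half_width = 3.0 * sigma / n ** 0.5
    return mu0 - half_width, mu0 + half_width

# Made-up example: target 10.0, sigma 0.2, subgroups of 5 observations.
lcl, ucl = xbar_limits(10.0, 0.2, 5)
subgroup = [10.1, 9.9, 10.4, 10.2, 10.3]
xbar = statistics.mean(subgroup)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}, subgroup mean = {xbar:.3f}")
print("signal (possible assignable cause):", not (lcl <= xbar <= ucl))
```

A point outside these limits only contradicts the claim that no assignable cause is present; as argued in the surrounding text, it says nothing by itself about the probability of detecting an assignable cause of a given size.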

4 Modern Industrial Processes

Since the 1940s and 1950s many things have changed in industry. Processes are better designed, controlled, and monitored, and automation is extremely advanced. Modern production processes have almost nothing in common with those manufacturing processes and conditions that gave rise to the development of AQL-based sampling plans and control charts. Nowadays, any important quality characteristic is continuously monitored and automatically controlled. Consequently, the produced quality, when compared with the specifications, is in general near perfect.

4.1 Role of AQL-Plans

As a matter of fact, none of the reasons that led to the development and application of AQL-plans by the US Department of Defense still exist. Nevertheless, the International Standards mentioned above, which are essentially identical to the Military Standards of the 1950s, are recommended and applied throughout the world, even though they constitute costly activities that are deprived of appropriate meaning. One could think that applying AQL-plans may be nonsensical but nevertheless harmless. However, any pro forma activity performed continuously will be inherently dissatisfying, will damage motivation, and will be counterproductive with respect to the true aims and requirements of modern industrial production processes.

Clearly, in cases where near perfect quality is produced, it makes no sense to control to an acceptable quality level. However, this does not mean that one can completely cease with product control activities. There are at least two reasons for continuing with product control:
- Legal liability regulations.
- Monitoring the actual quality level.

The first reason is clear and requires special methods that meet the legal requirements. The second reason will be discussed below. But in any case, the AQL-sampling schemes as offered by the ISO are not appropriate for either purpose, as AQL-sampling plans are neither adapted to meeting the legal requirements nor do they constitute measurement procedures for monitoring the actual quality level.

4.2 Role of Control Charts

In times where demand far exceeds supply and the number of suppliers is small, the primary focus of an enterprise is on product quantity. In situations where the consumer is as powerful as the US Military, the issue of an acceptable quality level (AQL) becomes important, and certain quality requirements can be imposed on suppliers and their production processes.

In an era of global competition, where supply far exceeds demand and suppliers are operating in a worldwide market, quantity and acceptable quality level lose their importance as measures of being successful in business. Success is determined by the ability to produce better product for less cost compared to a company's worldwide competitors. Reaching this objective means efforts must be increased to further improve processes and products in order to maintain a competitive edge. One consequence of this situation is the fact that only huge globally-operating companies may be able to afford the operational expenditures for research, development, and continuous improvement to achieve this objective.

In this modern environment, production processes not only look completely different from those of the World War II era, but their requirements are fundamentally different, focussing on continuous improvement rather than simply achieving and maintaining an acceptable quality level. And in this new environment, any enterprise that intends to "rest on its laurels" knows that it will soon be out of business.

Any strategy of continuous process improvement must consist of many co-ordinated activities, which, of course, include monitoring and controlling. There are many objectives of monitoring and controlling. The two most important are the following:
- Revealing room for improvement.
- Verifying improvements made.

Clearly, neither of these aims can be achieved by using control charts. Therefore, as in the case of using AQL-plans, performing insignificant activities is not harmless but counterproductive and should be abandoned.

5 New Generation of Standards

The existing International Standards for quality control cannot contribute towards a strategy of continuous quality improvement because they have been developed and designed for completely different and even conflicting objectives. Therefore, a question arises as to which International Standards are needed to cope with the present situation in modern industries. As already mentioned, there is a need for International Standards for acceptance control in order to meet the requirements set by legal liability regulations. These standards have no impact on continuous process improvement and, therefore, will not be discussed in this paper. Of course, more interesting from the stochastic point of view are the methods supporting continuous process improvement. Modern process requirements and recent scientific advancements necessitate and enable completely new standards for different areas of application.

As a general guideline, it should be emphasized that the new generation of stochastic standards should follow the "performance approach" criterion stated as a general development principle but neglected by the existing ISO statistical standards, which specify design details but leave little room for technological and scientific developments. Hence, they do not constitute a "framework for future development," but rather hinder further developments. This is a consequence of the ISO statistical standards still closely following the designs of their military forerunners.

5.1 Stochastic Modelling

Because any appropriate method for monitoring or control must be based on a comprehensive stochastic model, International Standards based on stochastic modelling should be developed. The required models must be able to incorporate any available knowledge and express existing ignorance. Existing standards are generally not based on models that sufficiently reflect the existing uncertainty in a given situation. The majority of statistical standards assumes the normal approximation without a thorough justification. Consequently, it remains unknown whether or not the results are useful.

What is needed is a schematic procedure that starts with the collection of available knowledge and leads in a stepwise manner to a mathematical model describing the existing uncertainty while accounting for the available knowledge. The relevant principles for stochastic modelling, which should not be confused with modelling in probability theory, are derived and listed in some detail in [7]. The main difference between stochastic modelling and probabilistic modelling is the fact that the former closely follows the actual situation as given, while the latter proceeds according to mathematical principles, in particular, using limiting processes.

5.2 Stochastic Monitoring

Stochastic monitoring from the viewpoint of a continuous improvement strategy consists of continuously measuring the actual quality level for detecting room for improvement on the one hand and for directly verifying implemented improvements on the other. Therefore, measurement procedures are needed which enable variations in process quality level as related to environmental variations to be documented. The question as to how to measure process quality depends on the comprehensive process model as derived and need not necessarily consist of measuring the probability of nonconformance. Moreover, a process improvement need not necessarily lead to a decrease in the probability of nonconformance, but it may enhance the dependence structure of relevant characteristics. A method to construct the necessary measurement procedures based on the derived stochastic model is described in [8].

5.3 Stochastic Verification

Finally, the suitability of a stochastic model on the one hand and of implemented improvements on the other must be verified. To this end, appropriate prediction procedures are necessary, as the usefulness of any mathematical model can only be verified by comparing predictions based on the model with the actually occurring events. Again, the scientific principles for developing appropriate International Standards for prediction procedures to be used for model verification can be found in [8].

6 Conclusions

The relevant International Standards aim for quality assurance and are essentially sorting methods aimed at rejecting the hypothesis that quality is satisfactory. They are based on the concept of in-control and out-of-control states for the production process and can be looked upon as a means for preserving the in-control state or for maintaining an acceptable quality level. Almost all of the relevant standards descend from some earlier military standard and, consequently, the underlying thinking is associated with that earlier time period. The military standards were characterized by application instructions being specified in detail, while their actual performance characteristics remained obscure. This is contrary to ISO's general guidelines for standards development, which expect requirements to be performance-based rather than prescriptive. As a consequence, many of the statistical standards cannot be sufficiently well adapted to given situations.

However, the main weakness of existing International Standards in the field of statistics is the fact that their objectives are obsolete. New requirements arising from the dramatic technological and communications revolution on the one hand and progressive globalisation on the other have completely changed the needs in industry. Therefore, a new generation of stochastic International Standards is needed to meet the demands resulting from the efforts of implementing continuous quality improvement strategies. In the development of this new generation of stochastic standards the opportunity should also be taken to change the ISO standards section from "mathematics" to "stochastics⁶" and develop a completely new, clear, and unambiguous terminology.

⁶ The term "stochastics" originates from Greek; it was introduced by Jakob Bernoulli in his masterpiece "Ars conjectandi," which appeared in 1713, and its meaning is "Science of Prediction."

References

1. American National Standard (1978) Sampling Procedures and Tables for Inspection by Attribute. American Society for Quality Control, Wisconsin.
2. American National Standard (1981) Terms, Symbols and Definitions for Acceptance Sampling. American Society for Quality Control, Wisconsin.
3. Banks, J (1989) Principles of Quality Control. John Wiley & Sons, New York.
4. v. Collani, E (1991) A Note on Acceptance Sampling by Variables. Metrika 38: 19-36.
5. v. Collani, E (1992) The Pitfall of Variables Sampling. In: Lenz H-J, Wetherill GB, Wilrich P-Th (eds) Frontiers in Statistical Quality Control 4, Physica Verlag, Heidelberg, 91-99.

* The term "stochastics" originates from Greek; it was introduced by Jakob Bernoulli in his masterpiece "Ars conjectandi," which appeared in 1713, and its meaning is "Science of Prediction".

6. v. Collani E, Dräger K (2001) Binomial Distribution Handbook for Scientists and Engineers. Birkhäuser, Boston.
7. v. Collani, E (2004) Theoretical Stochastics. In: v. Collani E (ed) Defining the Science of Stochastics. Heldermann Verlag, Lemgo, 147-174.
8. v. Collani, E (2004) Empirical Stochastics. In: v. Collani E (ed) Defining the Science of Stochastics. Heldermann Verlag, Lemgo, 175-213.
9. Göb, R (1996) An Elementary Model of Statistical Lot Inspection and its Application to Sampling by Variables. Metrika 44: 135-163.
10. Göb, R (1996) Test of Significance for the Mean of a Finite Lot. Metrika 44: 223-238.
11. Göb, R (2001) Methodological Foundation of Statistical Lot Inspection. In: Lenz H-J, Wilrich P-Th (eds) Frontiers in Statistical Quality Control 6, Physica Verlag, Heidelberg, 3-24.
12. Dodge, HF (1969) Note on the Evolution of Acceptance Sampling Plans, Part II. Journal of Quality Technology 1: 155-162.
13. Duncan, AJ (1965) Quality Control and Industrial Statistics. 3rd ed., Richard D. Irwin, Homewood.
14. Duncan, AJ (1972) Quality Standards. Journal of Quality Technology 4: 102-109.
15. Juran, JM (1962) Quality Control Handbook. 2nd ed., McGraw-Hill, New York.
16. Kendall MG, Buckland WR (1960) A Dictionary of Statistical Terms. 2nd ed., Oliver and Boyd, Edinburgh.
17. Kotz S, Johnson NL (1981) Encyclopedia of Statistical Sciences, Vol. 1. John Wiley & Sons, New York.
18. Schilling, EG (1982) Acceptance Sampling in Quality Control. Marcel Dekker, New York.
19. ISO 2859-1 (1989) Sampling procedures for inspection by attributes - Part 1. International Organization for Standardization, Geneva.
20. ISO 3951 (1989) Sampling procedures and charts for inspection by variables for percent nonconforming. International Organization for Standardization, Geneva.
21. ISO 8258 (1991) Shewhart control charts. International Organization for Standardization, Geneva.
22. ISO TC 69 Application of statistical methods (2003) Business Plan 2003. International Organization for Standardization, Geneva.
23. Draft International Standard ISO/DIS 3534-1 (2003) Statistics - Vocabulary and symbols - Part 1: Probability and General Statistical Terms. International Organization for Standardization, Geneva.
24. Draft International Standard ISO/DIS 3534-2 (2003) Statistics - Vocabulary and symbols - Part 2: Applied statistics. International Organization for Standardization, Geneva.

Part 2 On-line Control

2.1 Sampling Plans

Optimal Two-Stage Sequential Sampling Plans by Attributes

Olgierd Hryniewicz

Systems Research Institute, Newelska 6, 01-447 Warsaw, Poland
hryniewi@ibspan.waw.pl

Summary. Acceptance sampling plans have been widely used in statistical quality control for several decades. However, when nearly perfect quality is needed, their practicability is questioned by practitioners because of the required large sample sizes. Moreover, the majority of well-known sampling plans allow nonconforming items in a sample, and this contradicts the generally accepted "zero defect" paradigm. Sequential sampling plans, introduced by Wald [7], assure the lowest possible sample size. Thus, they are applicable especially for sampling products of high quality. Unfortunately, their design is rather complicated. In this paper we propose a simple, and easy to design, special case of sequential sampling plans by attributes, named CSeq-1 sampling plans, having acceptance numbers not greater than one. We analyze the properties of these plans, and compare them to the properties of other widely used sampling procedures.

1 Introduction

Procedures of acceptance sampling for attributes have been successfully used for nearly eighty years. Sampling plans (single, double, multiple, and sequential) have been proposed by prominent statisticians, and published in many papers, textbooks and international standards. During the last ten years, however, their usage has been criticized by practitioners, who have pointed out many of their deficiencies. First of all, for quality levels that are considered appropriate for contemporary production processes, the sampling plans published in international standards require excessively many items to be sampled. Secondly, the majority of well-known sampling plans allow nonconforming items in a sample. This is in conflict with the zero-defect philosophy which has become a paradigm for quality managers and engineers. Practitioners prefer to use sampling plans with a low sample size and with the acceptance number set to zero. Moreover, they require, if asked, very good protection against lots of bad quality. It has to be stated clearly, however, that the

fulfillment of all these requirements is, except for a few cases, practically impossible. Therefore, there is a need for simple sampling procedures that are characterized by relatively low average sample sizes and that do not frighten practitioners by the magnitude of their acceptance numbers. In many cases the maximum acceptance number is equal to one. One nonconforming item in a sample may be accepted in all cases where perfect quality is not attainable due to economical or physical limitations, but high quality, expressed as a low fraction nonconforming, is required. It is a well-known fact that sequential sampling plans are characterized by the lowest possible average sample size among all sampling plans that fulfill certain statistical requirements. Sequential sampling plans were introduced by Abraham Wald [7, 8] during the Second World War. Similar results were independently obtained by Bartky [4] and Barnard [3] at approximately the same time. Speaking in terms of mathematical statistics, sequential sampling plans can be regarded as sequential probability ratio tests (SPRT) for testing a simple hypothesis θ = θ0 against an alternative θ = θ1 > θ0, where θ is the parameter of a probability distribution f(x, θ) that describes the observed random variable X. In the case of sampling by attributes, when the random variable of interest is described by the binomial distribution, the sequential sampling plan is used to test the null hypothesis p = p1 against the alternative hypothesis p = p2, where p is the parameter of the binomial distribution (the probability of the occurrence of the event of interest in a single trial). Under sequential sampling, consecutive items are drawn from a given population (process or lot) one at a time, until a decision is made whether to accept the null hypothesis or to reject it. In the context of quality control, the sampled items are judged either as conforming or nonconforming to certain requirements. For the sake of simplicity we will use this terminology, knowing that in a different context the meaning of the result of sampling an item might be different. Let d(n) denote the cumulative number of nonconforming items found in n sampled items. According to Wald's original proposal, we accept the null hypothesis (in quality control it means that we accept the sampled lot or process) if the plot of d(n) crosses from above to below the acceptance line

A(n) = -hA + g n.     (1)

When the plot of d(n) crosses from below to above the rejection line

R(n) = hR + g n,     (2)

the null hypothesis is rejected (so we reject the sampled lot or process). Otherwise, the next item is taken, and the procedure continues until an acceptance or rejection decision is made. The sampling plan parameters hA, hR and g are called, respectively, the acceptance intercept, the rejection intercept, and the slope of the decision lines.

Let L(p) be the probability of acceptance (of the null hypothesis) when the value of the parameter of the considered binomial distribution is equal to p. We usually assume that the sampling plan has to fulfill the following two requirements:

L(p1) ≥ 1 - α,     (3)

L(p2) ≤ β.     (4)

In statistical quality control this means that we require a high acceptance probability (≥ 1 - α) for a product of good quality (with a fraction nonconforming equal to p1), and a low acceptance probability (≤ β) for a product of bad quality (with a fraction nonconforming equal to p2). The probabilities α and β are usually called the producer's and consumer's risks, respectively. When the requirements (3) and (4) have the form of equalities, it has been proven that the sequential sampling procedure is optimal with respect to the expected sample size [8]. It means that a sequential sampling plan is characterized by the minimal average sample size among all possible sampling plans that fulfill these requirements. It has to be noted that requirements (3) and (4) in the form of equalities may not be fulfilled in the case of sampling by attributes when we sample from a finite population without replacement. In the case of sampling from an infinite population (or sampling with replacement) Wald [7] found an approximation for the OC function of the sequential sampling plan, and showed that its parameters hA, hR and g may be calculated from closed-form formulae (5)-(7).
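A minimal numerical sketch of these quantities is given below. It assumes the textbook form of Wald's SPRT constants for the binomial case; the function names and example values are ours, and plans tabulated in standards are typically adjusted further (e.g. for curtailment), so their constants need not coincide with these expressions.

```python
from math import log

def wald_binomial_plan(p1, p2, alpha, beta):
    """Decision-line parameters of Wald's sequential plan for attributes
    (textbook SPRT expressions for the binomial case)."""
    k = log(p2 * (1 - p1) / (p1 * (1 - p2)))   # log of the per-item likelihood-ratio factor
    h_a = log((1 - alpha) / beta) / k          # acceptance intercept hA
    h_r = log((1 - beta) / alpha) / k          # rejection intercept hR
    g = log((1 - p1) / (1 - p2)) / k           # common slope g of both decision lines
    return h_a, h_r, g

def decide(d, n, h_a, h_r, g):
    """Decision after n inspected items with d cumulative nonconforming items."""
    if d <= -h_a + g * n:      # at or below the acceptance line A(n)
        return "accept"
    if d >= h_r + g * n:       # at or above the rejection line R(n)
        return "reject"
    return "continue"

h_a, h_r, g = wald_binomial_plan(p1=0.01, p2=0.05, alpha=0.05, beta=0.10)
print(round(h_a, 3), round(h_r, 3), round(g, 4), decide(d=1, n=30, h_a=h_a, h_r=h_r, g=g))
```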

For many years Wald's sequential sampling plans for attributes have been offered to practitioners as the procedures with the smallest average sample size. In practice, however, they have not been used in their original non-curtailed version. Despite the theoretical fact that the sequential sampling plan has the smallest average sample size, it may happen that the decision of acceptance or non-acceptance is made at a very late stage of sampling, i.e. after sampling a large and unknown number of items. Such a situation may occur when the quality of a lot (measured as a fraction nonconforming) is close to g. Practitioners do not like these situations, and want to know in advance what the largest possible cumulative sample size is. In order to meet these requirements a maximum cumulative sample size nt is introduced. When the cumulative sample size reaches this curtailment value without a decision having been made, the lot is accepted if the cumulative number of nonconforming items d(nt) is not greater than a given acceptance number. The analysis of the statistical properties of curtailed sequential sampling plans requires the application of numerical computations. Such a methodology was proposed in the paper by Aroian [1]. Woodall and Reynolds [9] proposed a general model for the analysis of curtailed sampling plans using a discrete Markov chain representation. They also proposed an approximate method for finding the optimal curtailment. However, the implementation of their theoretical results is rather difficult for practitioners, as it requires special statistical skills and specialized software (for the calculation of eigenvectors of matrices). The work on practical implementation of curtailed sampling plans for attributes was initiated by Baillie [2], who examined the statistical characteristics of curtailed sequential sampling plans from the international standard ISO 8422 [6] with the parameters calculated according to Wald's formulae. In this paper Baillie noticed that the actual risks of the sampling plans proposed in this standard are substantially different from the nominal ones (5% for α, and 10% for β). This result has important practical consequences, as the plans from this standard are practically the only sequential sampling plans for attributes that are used in practice. Moreover, Hryniewicz [5] has shown that curtailed sequential sampling plans for attributes always have a smaller average sample size than non-curtailed ones. These and other similar results have prompted a group of researchers from the ISO/TC 69 committee to seek curtailed sequential sampling plans for attributes that fulfill (3) and (4) and are optimal in a certain sense. During the work on a new version of the ISO 8422 International Standard on sequential sampling plans for attributes many numerical experiments have revealed that some optimal curtailed sampling plans have the acceptance number at the curtailment equal to one. We believe that such sampling plans may be acceptable for practitioners who are looking for cost-efficient procedures with decision risks under control, and a low number of accepted nonconforming items in a sample. In the second section of the paper we introduce such a simple acceptance sampling procedure, which in fact is a special case of the well-known curtailed sequential sampling plan described above. For this procedure we present formulae for the calculation of its statistical properties. These formulae are used in the third section of the paper, where we propose algorithms for the design of sampling plans that fulfill certain practical requirements. Theoretical results are illustrated with some numerical examples. In the fourth section of

the paper we compare the newly introduced sampling plans with Wald's sequential sampling plans given in the international standard ISO 8422 [6], and with curtailed single sampling plans having the acceptance number equal to zero.

2 Curtailed sequential sampling plans with acceptance number not greater than one (CSeq-1)

Let us introduce the proposed sampling plan by a description of its operation, which is typical for all sequential sampling plans by attributes. Sample items are drawn at random and inspected one by one, and the cumulative count (the total number of nonconforming items or nonconformities) d(n) is recorded. If, at a given stage, the cumulative count fulfills the acceptability criterion, i.e. it is not greater than a certain acceptability number, then the inspection is terminated, and the inspected lot or process is accepted. If, on the other hand, the cumulative count is equal to a rejection number, then the inspection is terminated, and the inspected lot or process is rejected. If neither of these decisions can be made, then an additional item is sampled and inspected. Let us assume that the inspected lot or process can be accepted only in two cases: either when the cumulative count for a certain clearance cumulative sample size n0 is equal to zero, or when the cumulative count for the curtailment value nt, such that nt > n0, is equal to one. In the case of nt > n0 let the rejection number at the curtailment be set to two, and assume that it is also equal to two for all cumulative sample sizes. When nt = n0, the rejection number is set to one for all cumulative sample sizes. Hence, the proposed sampling plan (CSeq-1) is defined by two parameters (n0, nt), n0 ≤ nt. Note, however, that in the case of nt = n0 the CSeq-1 sampling plan is the same as the curtailed single sampling plan with acceptance number equal to zero. Let us assume that the probability of drawing a nonconforming item is constant for all inspected items, and equal to p. This means that in the case of sampling without replacement, as is usually done in practice, we assume infinite (sampling from a process) or sufficiently large lots. In such a case the number of nonconforming items in the sample is distributed according to the binomial distribution. Thus, the probability of acceptance when i items have been inspected, and from it the OC function (10) and the average sample size (11) of the plan, can be written in closed form.


We require that the plan fulfills conditions analogous to (3) and (4), namely

L(p1; n0, nt) ≥ 1 - α,     (12)

L(p2; n0, nt) ≤ β.     (13)

From (10) we can see that for n1 ≥ 1, where n1 = nt - n0, a further inequality involving (1 - p2)^n1 holds (14). Hence, we can find from (12) a necessary condition giving an upper limit for the value of the clearance parameter n0 (15). On the other hand, we can see from (13) that the following condition also has to be fulfilled:

(1 - p2)^n0 < β.     (16)

Inequalities (15) and (16) define, respectively, the upper and the lower limits for possible values of n0. Now, let us find the limits for possible values of n1. For a given value of n0, a lower limit for n1 follows from (13) (inequality (17)), and an upper limit follows from (12) (inequality (18)). Thus, inequalities (15)-(18) define the joint ranges of admissible values of the parameters n0 and nt. If, for given α, β, p1, and p2, these inequalities do not hold, then a CSeq-1 sampling plan which fulfills requirements (12) and (13) does not exist. In particular, when there is no n1 that fulfills both (17) and (18), the required discrimination rate between p1 and p2 is too high. In such a case it is necessary to apply sequential sampling plans with the maximal acceptance number (at curtailment) greater than 1.
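For readers who want to experiment with these quantities, the following sketch evaluates the acceptance probability and the average sample number directly from the operating rules of the CSeq-1 plan stated above. It is ours, not quoted from the paper: the function names and the survival-probability form of the ASN are assumptions, although the expressions should agree with the paper's formulae (10)-(11) up to notation.

```python
def oc_cseq1(p, n0, nt):
    """Acceptance probability of the CSeq-1 plan (n0, nt): accept if the first n0
    items contain no nonconforming item, or (when nt > n0) if the first nt items
    contain exactly one."""
    if nt == n0:                                  # curtailed single plan with c = 0
        return (1 - p) ** n0
    return (1 - p) ** n0 + n0 * p * (1 - p) ** (nt - 1)

def asn_cseq1(p, n0, nt):
    """Average sample number, written as E[N] = sum_i P(N > i)."""
    if nt == n0:                                  # rejection at the first nonconforming item
        return sum((1 - p) ** i for i in range(n0))
    asn = 0.0
    for i in range(n0):                           # before the clearance point: continue while d(i) <= 1
        asn += (1 - p) ** i + i * p * (1 - p) ** (i - 1)
    for i in range(n0, nt):                       # afterwards: continue only if exactly one nonconforming
        asn += n0 * p * (1 - p) ** (i - 1)        # item occurred, and it fell among the first n0 items
    return asn

# Illustration with the plan (968, 2144) discussed later in the paper
print(round(oc_cseq1(0.0025, 968, 2144), 4), round(asn_cseq1(0.0002, 968, 2144), 1))
```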

4 Design of optimal CSeq-1 sampling plans

Sequential sampling plans are used in situations when it is necessary to minimize sampling costs. Therefore, from among all plans fulfilling conditions (12) and (13), we should choose as optimal the one that is characterized by the lowest average sample size. The average sample size of a CSeq-1 sampling plan, defined by (11), is a complex function of the fraction nonconforming p. Therefore, it is not possible to propose a simple algorithm for its optimization, even for a simple objective function. In this paper we consider three objective functions:

- minimization of sup_p n̄(p; n0, nt),
- minimization of (n̄(p1; n0, nt) + n̄(p2; n0, nt))/2, and
- minimization of (n̄(0; n0, nt) + n̄(p1; n0, nt))/2.

The first of these objective functions is traditionally used for the optimization of sampling plans with requirements set on the producer's and consumer's risks. The second function can be recommended when the actual quality of inspected lots or processes is usually better than p1. The third function describes the sampling costs in the worst possible case. It may be recommended when the fraction nonconforming varies, e.g. when we inspect large lots delivered by different suppliers. Optimal CSeq-1 sampling plans can be found using numerical procedures. A significant simplification of the computations can be attained with the help of the following lemma:

Lemma 1. For a fixed value of n0 the average sample size n̄(p; n0, nt) is an increasing function of nt.

The proof is given in the appendix. Thus, for any of the considered (or similar) objective functions the optimal value of n1 = nt - n0 is equal to the smallest integer not smaller than the right-hand side of inequality (17). Notice that this value is a function of p2, and is not a function of p1. Hence, if the objective function does not depend directly on p1, the optimal plan depends exclusively on p2. The value of p1 only has an impact on the range of admissible values of n0. To illustrate the theoretical results, let us find optimal CSeq-1 sampling plans for a subset of p1 and p2 values taken from the international standard ISO 8422. The risks are the same as in ISO 8422, i.e. α = 0.05 and β = 0.1. The results presented in Table 1 confirm our previous remark that the parameters of the optimal CSeq-1 sampling plan for the considered objective function do not depend on the value of p1. Another interesting, and somewhat unexpected, feature is the abrupt and significant change between the sampling plans when the discrimination rate becomes too high for the application of the curtailed single sampling plan with acceptance number equal to zero. This feature indicates the fundamental difference between these sampling plans. In the general case of the optimal curtailed Wald's sampling plans this difference is not so visible. The results presented in Table 2 confirm our claim that, in the case of the objective function that explicitly depends on p1, the parameters of the optimal CSeq-1 sampling plan depend both on p1 and p2. It is worth noting that larger curtailment numbers are required for higher discrimination ratios. The results of the optimization for the case considered in Table 3 are similar to those presented previously. We may add an additional remark that in the considered cases the dependence of the optimal clearance number n0 upon the values of p1 and p2 is not so simple. In the case of a fixed p1 the clearance number n0 decreases with the decreasing discrimination rate. However, for a fixed value of p2 the type of dependence is reversed. The interpretation of this phenomenon needs further investigation.
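A brute-force design sketch for the first objective function is given below. It is ours, not the authors' algorithm: it reuses the OC and ASN expressions derived earlier, exploits Lemma 1 by taking the smallest admissible curtailment value for each clearance number, and approximates sup_p by a crude grid; the scan ranges and grid are heuristic assumptions.

```python
import math

def oc(p, n0, nt):
    """Acceptance probability; nt == n0 is the curtailed single plan with c = 0."""
    if nt == n0:
        return (1 - p) ** n0
    return (1 - p) ** n0 + n0 * p * (1 - p) ** (nt - 1)

def asn(p, n0, nt):
    """Average sample number, as a sum of survival probabilities P(N > i)."""
    if p == 0.0:
        return float(n0)
    if nt == n0:
        return (1 - (1 - p) ** n0) / p
    head = sum((1 - p) ** i + i * p * (1 - p) ** (i - 1) for i in range(n0))
    tail = sum(n0 * p * (1 - p) ** (i - 1) for i in range(n0, nt))
    return head + tail

def sup_asn(n0, nt, p_hi, points=100):
    """Crude grid approximation of sup_p asn(p; n0, nt) over [0, p_hi]."""
    return max(asn(k * p_hi / points, n0, nt) for k in range(points + 1))

def optimal_cseq1(p1, p2, alpha=0.05, beta=0.10, n0_span=100):
    """Scan clearance numbers n0 and keep the feasible plan with the smallest
    (approximate) sup_p ASN; n0_span is a heuristic scan width."""
    n0_lo = math.ceil(math.log(beta) / math.log(1 - p2))    # cf. condition (16)
    best = None
    for n0 in range(n0_lo, n0_lo + n0_span):
        candidates = []
        if oc(p1, n0, n0) >= 1 - alpha and oc(p2, n0, n0) <= beta:
            candidates.append((n0, n0))                     # curtailed single plan, c = 0
        nt = n0 + 1                                          # smallest curtailment value fulfilling (13)
        while oc(p2, n0, nt) > beta and nt < 100 * n0:
            nt += 1
        if oc(p2, n0, nt) <= beta and oc(p1, n0, nt) >= 1 - alpha:
            candidates.append((n0, nt))                     # genuine two-stage CSeq-1 plan
        for cand in candidates:
            score = sup_asn(*cand, p_hi=5 * p2)
            if best is None or score < best[0]:
                best = (score, *cand)
    return best

print(optimal_cseq1(p1=0.0005, p2=0.01))
```

Because of the grid approximation, the result is an approximation of the optimal plan rather than a reproduction of the tabulated values.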

Table 1. Optimal CSeq-1 sampling plans (n0, nt) that minimize sup_p n̄(p; n0, nt)

  p2 = 0.01:    (230,230)  (247,500)  (247,500)  (247,500)  (247,500)
  p2 = 0.0125:  (184,184)  (184,184)  (197,401)  (197,401)  (197,401)
  p2 = 0.016:   (143,143)  (143,143)  (143,143)  (153,316)  (153,316)
  p2 = 0.02:    (114,114)  (114,114)  (114,114)  (114,114)  (124,244)
  p2 = 0.025:   (91,91)    (91,91)    (91,91)    (91,91)    (91,91)

NA - CSeq-1 plan not available (discrimination ratio too high).
(n0, n0) - curtailed single sampling plan.

5 CSeq-1 sampling plans vs. curtailed Wald's sequential sampling plans and curtailed single sampling plans

The CSeq-1 sampling system described in the preceding sections is a special case of Wald's curtailed sequential sampling system defined by formulae (1) to (7). By a simple analysis of the CSeq-1 acceptance and rejection regions, we can easily show that the following relations hold:

It is easy to see that for Wald's original plans the value of hR may be smaller than the left-hand side of (20). The practical consequence of this is the following: it is possible to reject the lot when we observe only one nonconforming item. This gives an additional degree of freedom in the design of the sampling plans. Thus, it is possible to find Wald's curtailed sampling plans which are more effective (smaller average sample sizes) than the optimal

Table 2. Optimal CSeq-1 sampling plans (n0, nt) that minimize (n̄(p1; n0, nt) + n̄(p2; n0, nt))/2

  p2 = 0.0125:  (184,184)  (184,184)  (186,515)  (187,493)  (188,476)
  p2 = 0.016:   (143,143)  (143,143)  (143,143)  (145,403)  (146,381)
  p2 = 0.02:    (114,114)  (114,114)  (114,114)  (114,114)  (116,316)
  p2 = 0.025:   (91,91)    (91,91)    (91,91)    (91,91)    (91,91)

NA - CSeq-1 plan not available (discrimination ratio too high).
(n0, n0) - curtailed single sampling plan.

CSeq-1 sampling plan. An illustration of this fact is given in Table 4, where we compare the Wald's sequential sampling plan taken from the draft of the new version of the ISO 8422 international standard (hA = 0.883, hR = 0.991, g = 0.000903, nt = 2083) with the optimal CSeq-1 sampling plan (n0 = 968, nt = 2144), both designed to fulfill the following requirements: PRQ = 0.02%, CRQ = 0.25%, α = 0.05, β = 0.1. Thus, CSeq-1 sampling plans are, in general, inferior in comparison to the optimal Wald's plans. However, their design is much simpler, and requires significantly smaller computational effort. Moreover, if one accepts one nonconforming item in the sample ("benefit of doubt"), then the CSeq-1 plan seems more "logical" to practitioners. Now, let us consider the comparison between the CSeq-1 sampling plan and the curtailed single sampling plan with clearance number (maximal sample size) n0 and acceptance number equal to zero. The probability of acceptance for this sampling plan is given by the simple formula

L(p) = (1 - p)^n0.     (21)

Table 3. Optimal CSeq-1 sampling plans (n0, nt) that minimize (n̄(0; n0, nt) + n̄(p1; n0, nt))/2

  p2 \ p1    0.0002      0.00025  0.000315  0.0004  0.0005
  0.002      NA          NA       NA        NA      NA
  0.0025     (968,2144)  NA       NA        NA      NA

NA - CSeq-1 plan not available (discrimination ratio too high).
(n0, n0) - curtailed single sampling plan.

Table 4. Comparison of Wald's and CSeq-1 sampling plans.

  Parameter                  Wald's sequential plan   CSeq-1 plan
  PA(CRQ)                    0.0997                   0.10
  (n̄(PRQ) + n̄(CRQ))/2       922.7                    927.14

It is easy to show that the requirements (3) and (4) are fulfilled by the curtailed single sampling plan if

where

From the analysis of (3), (4), and (22) we can find that the parameter n0 has to fulfill the following relation:

It is easy to show that the average sample size of the curtailed single sampling plan is an increasing function of n0. Thus, the optimal (assuring the minimum average sample size) value of n0 is given by the following formula:

Let us analyze conditions (17) and (18). From this analysis it follows immediately that the parameter n0 of the CSeq-1 sampling plan has to fulfill two inequalities:

Hence, for the optimal curtailed single sampling plan and the optimal CSeq-1 sampling plan the following inequality holds: the clearance number of the former is not greater than the clearance number n0 of the latter (29). If (29) holds, then for any sequence of sampling results the decisions under the rules of the CSeq-1 sampling plan are made not earlier than the decisions under the rules of the curtailed single sampling plan. Therefore, if a curtailed single sampling plan that fulfills (3) and (4) exists, then it is always more effective than any CSeq-1 sampling plan that also fulfills these requirements. This result is not unexpected if we consider the curtailed single sampling plan with acceptance number equal to zero as a special case of the CSeq-1 sampling plan with nt = ∞.
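The zero-acceptance-number curtailed single plan has a simple closed form, so these statements are easy to check numerically. The sketch below is ours, not the paper's formulae (21)-(28): it computes the smallest clearance number satisfying the consumer risk, checks the producer risk, and evaluates the plan's average sample size.

```python
from math import ceil, log

def single_c0_plan(p1, p2, alpha=0.05, beta=0.10):
    """Smallest clearance number n0 with (1-p2)**n0 <= beta; returns None if the
    producer requirement (1-p1)**n0 >= 1-alpha then fails."""
    n0 = ceil(log(beta) / log(1 - p2))
    return n0 if (1 - p1) ** n0 >= 1 - alpha else None

def single_c0_asn(p, n0):
    """ASN of the curtailed single plan with c = 0: inspection stops at the first
    nonconforming item or at n0, so E[N] = (1 - (1-p)**n0) / p."""
    return n0 if p == 0 else (1 - (1 - p) ** n0) / p

n0 = single_c0_plan(p1=0.0005, p2=0.025)
if n0 is not None:
    print(n0, round(single_c0_asn(0.0005, n0), 1))
```

For example, with p2 = 0.025 and β = 0.1 the smallest clearance number is 91, in line with the (91, 91) entries in the tables above.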

6 Appendix

Proof of Lemma 1. Let Δnt = n̄(p; n0, nt + 1) - n̄(p; n0, nt). Expanding both average sample sizes according to (11) and cancelling the common terms, we obtain

Δnt = n0 p (1 - p)^(nt-1) [(nt + 1)(1 - p) - nt] + (nt + 1) n0 p^2 (1 - p)^(nt-1)
    = n0 p (1 - p)^(nt-1) > 0,

and this completes the proof.

References

1. Aroian, L.A. (1976): "Applications of the direct method in sequential analysis". Technometrics, 18, 301-306.
2. Baillie, D.H. (1994): "Sequential sampling plans for inspection by attributes with near-nominal risks". Proc. of the Asia Pacific Quality Control Organisation Conference.
3. Barnard, G.A. (1946): "Sequential tests in industrial statistics". Journ. Roy. Stat. Soc. Supplement, 8, 1-21.
4. Bartky, W. (1943): "Multiple sampling with constant probability". Annals of Math. Stat., 14, 363-377.
5. Hryniewicz, O. (1996): "Optimal sequential sampling plans". In: Proc. of the 4th Wuerzburg-Umea Conf. in Statistics, E. von Collani, R. Goeb, G. Kiesmueller (Eds.), Wuerzburg, 209-221.
6. ISO 8422:1991. Sequential sampling plans for inspection by attributes.
7. Wald, A. (1945): "Sequential tests of statistical hypotheses". Annals of Math. Stat., 16, 117-186.
8. Wald, A. (1947): "Sequential Analysis". J. Wiley, New York.
9. Woodall, W.H., Reynolds Jr., M.R. (1983): "A discrete Markov chain representation of the sequential probability ratio test". Commun. Statist. - Sequential Analysis, 2(1), 27-44.

Three-Class Sampling Plans: A Review with Applications

Frank A. Palcat

Measurement Canada, Ottawa, Ontario, K1A 0C9, Canada
[email protected]

Summary. Acceptance sampling plans have been widely used in statistical quality control

for several decades. The vast majority of the relevant research and development over this period has focused on two-class sampling plans that involve classifying product characteristics as either conforming or nonconforming with respect to specified acceptance requirements. In recent years, some developmental work has occurred with respect to three-class sampling plans that additionally involve classifying product characteristics as marginally conforming with respect to requirements. This work remains relatively unknown among practitioners and applications seem to be presently limited to control of undesirable microbiological presence in food. This paper presents a review of the key developmental contributions to three-class sampling plan theory and discusses some applications where such plans would provide a more effective means of quality control. Applications specific to the field of legal metrology, including case-studies where isolated lots are common and currently-used methods are problematic, are particularly examined.

1 Introduction

Acceptance sampling plans have been widely used in statistical quality control for several decades. The vast majority of the relevant research and development over this period has focused on two-class sampling plans that involve classifying product characteristics as either conforming or nonconforming with respect to specified acceptance requirements. Since the 1940s and 1950s, several national and international standards have been formulated using this dichotomous basis for product quality, including the popular ISO 2859 and ISO 3951 series for sampling by attributes and sampling by variables respectively. In recent years, some practitioners and researchers have expressed interest in sampling plans for evaluating products that may be categorized into three (or more) quality classes. For example, under such sampling plans, a product with three quality states may be classified as conforming, marginally-conforming, or nonconforming, where the original conforming category is split into two parts, although other methods for defining and labeling the resulting classes are possible. However, developmental work with respect to multiple-class sampling plans has

been extremely minimal in comparison to that associated with their two-class counterparts and, outside the field of food safety inspection, such sampling plans are generally unknown in application and not even mentioned in popular statistical quality control texts. The present paper attempts to rectify the problem regarding a lack of awareness of such sampling plans among statistical quality control professionals. Section 2 provides definitions of symbols used throughout the paper. Section 3 provides a review of the key developmental contributions to three-class sampling plan theory over the past three decades. In section 4, a few extensions to the contributions in section 3 are presented along with some comments of a practical nature. Applications specific to the field of legal metrology, including case-studies where isolated lots are common and currently-used methods are problematic, are examined in section 5. Section 6 concludes the paper with some closing remarks.

2 Symbols

The symbols in this section are used throughout the paper. Wherever equations or symbols from the reviewed papers are cited in this one, they have been translated in accordance with this list.

c1 - maximum allowable number of marginally-conforming items in a sample
c12 - maximum allowable number for the sum of the marginally-conforming and nonconforming items in a sample
c2 - maximum allowable number of nonconforming items in a sample
D0 - number of conforming items in a lot (D0 = N - D1 - D2)
D1 - number of marginally-conforming items in a lot
D2 - number of nonconforming items in a lot
d0 - number of conforming items in a sample (d0 = n - d1 - d2)
d1 - number of marginally-conforming items in a sample
d12 - sum of the marginally-conforming and nonconforming items in a sample (d12 = d1 + d2)
d2 - number of nonconforming items in a sample
k12 - acceptability constant associated with the sum of the marginally-conforming and nonconforming items
k2 - acceptability constant associated with the nonconforming items in the sample
L1 - lower specification limit separating marginally-conforming from conforming items
L2 - lower specification limit separating nonconforming from marginally-conforming items
N - lot size
n - sample size
Pa - probability of acceptance
p0 - lot or process proportion conforming (p0 = 1 - p1 - p2)
p1 - lot or process proportion marginally conforming
p12 - lot or process proportion marginally conforming plus nonconforming (p12 = p1 + p2)
p2 - lot or process proportion nonconforming
s - sample standard deviation (n - 1 degrees of freedom)
t - noncentral t variable
- narrow-limit compression factor
U1 - upper specification limit separating conforming from marginally-conforming items
U2 - upper specification limit separating marginally-conforming from nonconforming items
- critical value for the quality value function (QVF) approach
v - value assigned to a marginally-conforming item (0 < v < 1)
x̄ - sample mean
z_x - the x-th fractile of the standardized normal distribution
δ - noncentrality parameter of a noncentral t distribution
σ - lot or process standard deviation

3 Review of Key Contributions

3.1 Sampling by Attributes

The subject of three-class acceptance sampling plans appears to have been first introduced in the statistical quality control literature in 1973. Motivated by applications in the health protection and food safety fields, Bray et al. [4] laid down the basic theory for three-class acceptance sampling by attributes using the trinomial probability distribution model. In general, such a sampling plan is specified by a sample size, a critical value representing the maximum allowable number for the sum of the marginally-conforming and nonconforming items in the sample, and a critical value representing the maximum allowable number of nonconforming items in the sample (n, c12, c2). A random sample of n items is inspected and the number of marginally-conforming items (d1) and nonconforming items (d2) in the sample are counted. If both of the following inequalities are satisfied:

d12 ≤ c12 and d2 ≤ c2 (where d12 = d1 + d2),     (1)

then the lot is accepted; otherwise, it is rejected. The probability mass function for the trinomial distribution is

P(d1, d2) = n! / (d1! d2! (n - d1 - d2)!) p1^d1 p2^d2 (1 - p1 - p2)^(n - d1 - d2),     (2)

and the function for the sampling plan's probability of acceptance is

Pa = Σ_{d2=0}^{c2} Σ_{d1=0}^{c12 - d2} P(d1, d2).     (3)

Bray et al.'s primary interest was in the subset of such sampling plans where c2 is fixed at 0 and c1 is the maximum allowable number of marginally-conforming items in the sample. In this subset of plans, a lot is rejected if d2 > 0 or d1 > c1 and accepted otherwise. The simplified function for the sampling plan's probability of acceptance is

Pa = Σ_{d1=0}^{c1} n! / (d1! (n - d1)!) p1^d1 (1 - p1 - p2)^(n - d1).     (4)
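The acceptance probability of such a plan is easy to evaluate by summing the trinomial probabilities over the acceptance region. The sketch below is ours; function and variable names, and the example values, are illustrative assumptions.

```python
from math import comb

def trinomial_pmf(d1, d2, n, p1, p2):
    """P(d1 marginally-conforming and d2 nonconforming items in a sample of n)."""
    p0 = 1.0 - p1 - p2
    return comb(n, d1) * comb(n - d1, d2) * p1 ** d1 * p2 ** d2 * p0 ** (n - d1 - d2)

def pa_three_class(n, c12, c2, p1, p2):
    """Acceptance probability of the (n, c12, c2) plan: accept iff
    d1 + d2 <= c12 and d2 <= c2."""
    return sum(trinomial_pmf(d1, d2, n, p1, p2)
               for d2 in range(c2 + 1)
               for d1 in range(c12 - d2 + 1))

# c2 = 0 special case studied by Bray et al.: accept iff d2 = 0 and d1 <= c1
print(round(pa_three_class(n=50, c12=3, c2=0, p1=0.05, p2=0.01), 4))
```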

Several different "c2 = 0" sampling plans are tabulated in the paper for various combinations of the pi, along with contour representations of the operating characteristics for two example sampling plans. Sampling plans based on Bray et al.'s paper [and in particular equation (4)] were adopted for food safety purposes by such organizations as the International Commission on Microbiological Specifications for Foods (ICMSF) and the Codex Alimentarius Commission. References include ICMSF [18], which was originally published in 1974, and chapter 7 of Ahmed [1]. It should be noted that these sampling plans are one-sided, based on two upper specification limits conventionally represented by the symbols m and M, whereas in this paper these two specification limits are defined as U1 and U2 respectively.

3.2 Sampling by Attributes - Curtailed Inspection

Shah and Phatak [26, 27] extended the basic work of Bray et al. [4] regarding three-class attributes sampling plans, focusing on the practice of curtailed inspection. In [26], the authors first explored the case of semi-curtailed sampling where inspection would cease as soon as the number of nonconforming items or nonconforming plus marginally-conforming items discovered in the sample were sufficient to cause lot rejection. This was followed by consideration of the case of fully-curtailed inspection where, in addition to ceasing inspection due to early lot rejection, sampling inspection would cease as soon as lot acceptance was evident. The authors provide equations for the average sample number (ASN) as well as the maximum likelihood estimators (MLE) of the proportions marginally-conforming and nonconforming in the lot and their associated asymptotic variances under these forms of curtailed inspection. The relationship between the percent saving in inspection and the efficiency of the estimators is also provided. In their second paper [27], the authors extended their research to address three-class attribute sampling plans in a multiple sampling context, with particular focus on double sampling. Expressions for the ASN, the MLEs of the proportions marginally-conforming and nonconforming after a specified number of lots have been inspected, and the relations between the estimators' asymptotic variances and the ASN are obtained under uncurtailed, semi-curtailed, and fully-curtailed inspection in the double sampling context, then generalized to multiple sampling. As the full details of Shah and Phatak's two papers are beyond the intended scope of this one, the interested reader is advised to consult their work.

3.3 Sampling by Attributes - Indexed by AQL

In a series of three papers, Clements [7, 8, 9] set out to create a system of sampling plans comparable to those in such standards as MIL-STD-105D [30] and ISO 2859-1:1999 [19]. As his papers were largely of an evolving nature, this section will focus primarily on Clements [9]. His approach differs slightly from that of Bray et al. [4] in that he specifies a sampling plan in terms of a sample size, a critical value for the maximum allowable number of marginally-conforming items in the sample, and a critical value for the maximum allowable number of nonconforming items in the sample, i.e., (n, c1, c2). A random sample of n items is inspected and the number of marginally-conforming items (d1) and nonconforming items (d2) in the sample are counted. If both of the following inequalities are satisfied:

d1 ≤ c1 and d2 ≤ c2,

then the lot is accepted; otherwise, it is rejected. In this case, the function for the sampling plan's probability of acceptance is simply the cumulative distribution function of the trinomial distribution:

Pa = Σ_{d1=0}^{c1} Σ_{d2=0}^{c2} P(d1, d2).
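A compact numerical sketch of this rectangular acceptance region is given below; it is ours, and the plan parameters shown are arbitrary illustrative values rather than plans from the standards discussed here.

```python
from math import comb

def pa_clements(n, c1, c2, p1, p2):
    """Acceptance probability under the (n, c1, c2) criterion d1 <= c1 and d2 <= c2,
    i.e. the cumulative trinomial probability over the rectangular region."""
    p0 = 1.0 - p1 - p2
    return sum(comb(n, d1) * comb(n - d1, d2) * p1 ** d1 * p2 ** d2 * p0 ** (n - d1 - d2)
               for d1 in range(c1 + 1) for d2 in range(c2 + 1))

print(round(pa_clements(n=80, c1=5, c2=1, p1=0.04, p2=0.01), 4))
```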

Clements [9] used Table II-A of [30] as a launching pad. This table provides code letters, sample sizes, and associated acceptance and rejection numbers based on various acceptance quality limits (AQL) under normal inspection. As much as possible, Clements limited himself to using the preferred sample sizes and AQL values to construct sets of possible trinomial-based alternatives to the binomial-based plans in the table. He pointed out that the system he developed worked best for AQL values less than 4.0% and code letters L and above. For code letters from F to K, adjustments to sample sizes were necessary. However, he was able to succeed in using a single set of acceptance numbers along each diagonal in his version of the master table.

3.4 Sampling by Variables

At the suggestion of O.B. Allen (University of Guelph), Brown [5] investigated extending the three-class attributes sampling plan approach to a sampling-by-variables framework. (Newcombe and Allen [22] later published an abbreviated version of this thesis.) The primary focus of this work was the application where the variable of interest is distributed according to a normal distribution with unknown mean and standard deviation and the marginally-conforming and nonconforming specification limits are one-sided. However, the thesis [5] does include a chapter dealing with techniques for applying the method under situations of non-normality. Somewhat paralleling the methodology of Bray et al. [4], Brown approached the problem by applying two two-class sampling plans simultaneously: one for the proportion nonconforming and one for the combined proportion marginally-conforming and nonconforming. Such a sampling plan is specified by a sample size, an acceptability constant with respect to the sum of the proportions marginally-conforming and nonconforming, and another acceptability constant with respect to the proportion nonconforming, i.e., (n, k12, k2). A random sample of n items is inspected and the sample mean (x̄) and standard deviation (s) are calculated. If both of the following inequalities are satisfied:

x̄ + k12 s ≤ U1 and x̄ + k2 s ≤ U2,

then the lot is accepted; otherwise, it is rejected.

A possible remedy when the resulting c2 would otherwise be greater than 0 may be to reduce the values of the outer specification limits (using the same method described above) to the point where c2 would become 0 for the sample size being used and the desired probability of acceptance for the proportion outside these limits (in effect, this introduces a fourth class).

5.2.2 Control of Initial Quality

In the early 1970s, following the introduction of sampling inspection for the control of in-service quality, acceptance sampling was introduced for the control of the initial quality of new and reserviced utility meters prior to their installation and use. Again, a sampling-by-variables approach was adopted. However, in this case, MIL-STD-414 [29] was selected and sampling plans were based on an AQL of 1.0%. As in the case for the control of in-service meter quality, the use of these sampling plans was beset with problems. Some of these deficiencies included:

- lots were most frequently not produced in a manner to satisfy the required continuing-series assumption;
- tests to determine whether the distribution of the meters' performance results satisfied the normality assumption were never performed;
- the required switching rules were not applied and no possibility for discontinuation of sampling inspection existed;
- rejected lots could be resubmitted without evidence of corrective action;
- univariate sampling plans were being applied to a product with correlated multivariate characteristics;
- attribute-type qualitative characteristics were being evaluated using inappropriate sample sizes;
- the effects of measurement uncertainty were ignored.

To address the more serious problems above, ISO 2859-2 [20] is an obvious solution as it is designed for isolated lots and does not assume quality characteristics are normally distributed. The existing AQL of 1.0% would be translated to a limiting quality (LQ) of 3.15% per the standard's recommendation. As a result, sample sizes will necessarily be larger than those that were previously used. A possible means to reduce the required sample size would be to apply the narrow-limit approach described in 5.2.1 above. The new specification limits designating the marginally-conforming class could be defined using one of the other tabulated LQ values (e.g., LQ = 8.0%) in the isolated-lot sampling standard. The sample size and acceptance number associated with this LQ and a c2 value of 0 would complete the sampling plan. In addition, double or multiple sampling plans could be devised to further reduce sample sizes on average for good-quality scenarios.

5.3 Packaged Products

Sampling plans for the net contents of packaged products (also referred to as prepackaged commodities) are set out in the Canadian Weights and Measures Regulations. These sampling plans involve both attributes and variables criteria. An attributes sampling plan is used to control the proportion beyond a lower specification limit and a hypothesis test is used to determine whether the lot mean is acceptably close to the declared quantity for the packages. In addition, a requirement is specified that no more than one unit may be below twice the lower specification limit, in effect creating a three-class attributes plan with two classes of nonconforming product. Unlike utility meters, packaged products are not inspected before entering the marketplace, and when they are inspected, the focus is on the available lot rather than the process that produced it. The objective of these sampling plans is not explicitly given but it is apparent upon some analysis of the given details that the plans are very loosely based on ISO 2859-1 [19]. Evidently, two sampling plans (n, c2) based on an AQL of 2.5%, namely, (32, 2) and (125, 7), were extracted from the standard and then several additional sampling plans were constructed around them [the ranges of the sample sizes are 10(1)32(32)125 with a range of acceptance numbers of 0(1)7]. The criterion regarding the sample mean is based on a t-test designed to accept the null hypothesis 99.5% of the time. Considering the general design of the sampling scheme and how the sampling plans are used in practice, several deficiencies are apparent:

- the wrong sampling standard is being applied, as lots are inspected in isolation from the production process;
- an incorrect hypothesis test for the mean is being used, as the lots are finite;
- tests to determine whether the distribution of the packages' net quantities satisfies the normality assumption of the t-test are not being performed;
- arbitrary sample sizes and acceptance numbers are being used rather than a coherent system of sampling plans;
- the effects of measurement uncertainty are being ignored;
- the sampling scheme design is overly biased towards the producer and lacking uniformity in consumer protection.

Most of these problems could be eliminated by using a more appropriate sampling plan standard such as ISO 2859-2 [20]. If an AQL of 2.5% could be accepted in the "continuing series of lots" sense, an LQ of 8.0% would be appropriate according to the isolated-lot sampling standard. In addition, Göb [15, 16], for example, has given the proper test of significance for the mean of a finite lot under the normality assumption and this test has been known for several decades. Despite readily available solutions, even OIML's draft recommendation [21] on this subject suffers from the same problems mentioned above with the exception that only three individual sampling plans have been proposed. A possible alternative that would eliminate the need to make any distributional assumptions would be to use a three-class sampling plan approach where the marginally-conforming class would consist of packages with net quantities less than the declared quantity but greater than or equal to the lower specification limit. Since the t-test for the lot mean has an analogous distribution-free alternative through the well-known sign test (see, for example, Randles [24]), this hypothesis test can be applied to determine the acceptability of the distribution's location or central tendency. As the test simply involves counting the number of packages in the sample with actual quantities less than the declared quantity, it can be readily integrated within the sampling plan design. An example of a possible three-class sampling plan (n, c12, c2) would be (50, 30, 1), based on a lot size from 501 to 1200 per [20]. The value c12 was determined using the binomial distribution plan (50, 30) to provide approximately a 95% probability of acceptance when the proportion of the lot above the declared quantity is 0.50. Such an approach has the added advantages of being readily applicable to net contents that are variable within a homogeneity classification as well as being transparent to the user. It should also be noted that if the criterion regarding a package exceeding twice the lower specification limit were maintained, technically a four-class sampling plan would result.
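A quick numerical check of the binomial reasoning behind the suggested (50, 30, 1) plan is sketched below. The code is ours; the sample outcome used at the end is hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Acceptance probability of the d12 <= 30 criterion when half of the lot lies
# below the declared quantity (the sign-test reasoning in the text).
print(round(binom_cdf(30, 50, 0.5), 4))

def accept_three_class(d1, d2, c12=30, c2=1):
    """Three-class decision for the suggested (50, 30, 1) plan:
    d1 = packages below the declared quantity but not below the lower limit,
    d2 = packages below the lower specification limit."""
    return d1 + d2 <= c12 and d2 <= c2

print(accept_three_class(d1=27, d2=1))   # hypothetical sample outcome
```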

6 Conclusion

This paper has reviewed key contributions to three-class sampling plan development, offered some minor extensions to this work, and considered some applications that could potentially benefit from their implementation. Three-class sampling plans provide viable alternatives to the much more commonly used two-class sampling plans, offering greater ability to discriminate between lots of acceptable and unacceptable quality as well as offering economy in inspection for similar control of quality. Their addition to the set of tools currently available to quality practitioners should be beneficial in quality control and improvement activities. With recent emphasis on quality concepts such as "continuous improvement" and "six sigma" (where quality levels in parts per million are being sought), three-class sampling plans may find many useful applications. The development of generic international standards for three-class sampling plans would aid in making this tool more generally available to potential users. In this regard, the work of Clements [7, 8, 9] could serve as a good foundation for creating a three-class counterpart to ISO 2859-1 [19] and ISO 2859-2 [20]. As a by-product of this review, another area that could benefit from more research and possibly benefit from the exposure of international standardization is recommended methodologies for multiple classification or grading of lots as a result of sampling inspection. Bebbington et al. [3] observed that current sampling inspection standards do not address this reality. Collani also brought to my attention some of his work [10] in this regard. These authors' research could serve as a foundation for developing a flexible methodology for the multiple classification of lots. Finally, and more generally, there are many areas in both government and industry where standardized statistical quality control methods (such as ISO standards) are applied outside their intended scope or modified in uninformed manners by users. In many cases, more appropriate solutions already exist or could be readily developed upon analysis of the control activity's objective and setting. However, unless formal reviews are conducted, opportunities for improvement may not be identified and ineffective practices may go uncorrected for years. In the case of regulatory applications in particular, interested researchers could assist by initiating such reviews as research topics. This would contribute not only to the advancement of science but also to more efficiency in rules and regulations and hopefully reduced costs to society.

Acknowledgements

I would like to thank the following individuals for their support and contributions to the development of this paper: Mr. David Baillie (Chair, ISO TC69/SC5) for inviting and supporting the topic of this paper; Prof. Dr. Elart von Collani (Universität Würzburg) for supplying reference material as well as general support for the topic of this paper; Dr. Pat Newcombe-Welch (University of Waterloo) for kindly providing me with her M.Sc. thesis [5]; and Dr. K. Govindaraju (Massey University) for kindly providing me with his paper [3] as well as identifying additional references [27, 28]. I would also like to thank Mr. Baillie, Dr. von Collani, Dr. Govindaraju, Dr. Olgierd Hryniewicz (Polish Academy of Sciences), Dr. Alvin Rainosek (University of South Alabama), and Mr. Henry Telfser (Measurement Canada) for providing helpful feedback on an earlier draft of this paper. Finally, I would like to thank an anonymous referee for identifying several editorial improvements and generally adding value to this final version of the paper.

References

1. Ahmed, F.E., Ed. (1991). Seafood Safety. National Academy Press, Washington.
2. Baillie, D.H. (1987). Multivariate acceptance sampling. In: Frontiers in Statistical Quality Control 3, Lenz, H.-J. et al., Eds., Physica-Verlag, Heidelberg, 83-115.
3. Bebbington, M., Govindaraju, K., DeSilva, N., and Volz, R. (2000). Acceptance sampling and practical issues in procurement inspection of apples. ASA Proceedings of the Section on Quality and Productivity, 104-109.
4. Bray, D.F., Lyon, D.A., and Burr, I.W. (1973). Three Class Attributes Plans in Acceptance Sampling. Technometrics, Vol. 15, No. 3, 575-585.
5. Brown, P.A. (1984). A Three-Class Procedure for Acceptance Sampling by Variables. Unpublished M.Sc. Thesis, University of Guelph, Department of Mathematics and Statistics, Guelph, Ontario, Canada.
6. Cassady, C.R. and Nachlas, J.A. (2003). Evaluating and Implementing 3-Level Acceptance Sampling Plans. Quality Engineering, Vol. 15, No. 3, 361-369.
7. Clements, J.A. (1979). Three-Class Attribute Sampling Plans. ASQC Technical Conference Transactions - Houston, 264-269.
8. Clements, J.A. (1980). Three-Class Sampling Plans - Continued. ASQC Technical Conference Transactions - Atlanta, 475-482.
9. Clements, J.A. (1983). Trinomial Sampling Plans to Match MIL-STD-105D. ASQC Quality Congress Transactions - Boston, 256-264.
10. v. Collani, E. and Schmidt, R. (1988). Economic Attribute Sampling in the Case of Three Possible Decisions. Technical Report of the Würzburg Research Group on Quality Control, No. 13, September.
11. v. Collani, E. (1990). ANSI/ASQC Z1.4 versus ANSI/ASQC Z1.9. Economic Quality Control, Vol. 5, No. 2, 60-64.
12. v. Collani, E. (1991). A Note on Acceptance Sampling for Variables. Metrika, Vol. 38, 19-36.
13. v. Collani, E. (1992). The Pitfall of Acceptance Sampling by Variables. In: Frontiers in Statistical Quality Control 4, Lenz, H.-J. et al., Eds., Physica-Verlag, Heidelberg, 91-99.
14. Göb, R. (1996). An Elementary Model of Statistical Lot Inspection and its Application to Sampling by Variables. Metrika, Vol. 44, 135-163.
15. Göb, R. (1996). Tests of Significance for the Mean of a Finite Lot. Metrika, Vol. 44, 223-238.
16. Göb, R. (2001). Methodological Foundations of Statistical Lot Inspection. In: Frontiers in Statistical Quality Control 6, Lenz, H.-J. and Wilrich, P.-Th., Eds., Physica-Verlag, Heidelberg, 3-24.

17. Hryniewicz, O. (1991). A Note on E. v. Collani's Paper "ANSI/ASQC Z1.4 versus ANSI/ASQC Z1.9". Economic Quality Control, Vol. 6, No. 1, 16-18.
18. ICMSF (1986). Sampling for microbiological analysis: principles and specific applications, 2nd ed. Microorganisms in Foods, book 2. University of Toronto Press, Toronto.
19. International Organization for Standardization (1999). ISO 2859-1:1999. Sampling procedures for inspection by attributes - Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection. ISO, Geneva.
20. International Organization for Standardization (1985). ISO 2859-2:1985. Sampling procedures for inspection by attributes - Part 2: Sampling plans indexed by limiting quality (LQ) for isolated lot inspection. ISO, Geneva.
21. International Organization of Legal Metrology (2002). OIML R 87 Net Quantity of Product in Prepackages, 3rd Committee Draft Revision. OIML, Paris.
22. Newcombe, P.A. and Allen, O.B. (1988). A Three-Class Procedure for Acceptance Sampling by Variables. Technometrics, Vol. 30, No. 4, 415-421.
23. Owen, D.B. (1965). A Special Case of the Bivariate Non-Central t-Distribution. Biometrika, Vol. 52, 437-446.
24. Randles, R.H. (2001). On Neutral Responses (Zeros) in the Sign Test and Ties in the Wilcoxon-Mann-Whitney Test. The American Statistician, Vol. 55, No. 2, 96-101.
25. Schilling, E.G. (1982). Acceptance Sampling in Quality Control. Marcel Dekker, New York.
26. Shah, D.K. and Phatak, A.G. (1977). The Maximum Likelihood Estimation Under Curtailed Three Class Attributes Plans. Technometrics, Vol. 19, No. 2, 159-166.
27. Shah, D.K. and Phatak, A.G. (1978). The maximum likelihood estimation under multiple three class attributes plan - Curtailed as well as uncurtailed. Metron, 36, 99-118.
28. Singh, H.R., Sankar, G., and Chatterjee, T.K. (1991). Procedures and tables for construction and selection of three class attributes sampling plans. IAPQR Transactions, Journal of the Indian Association for Productivity, Quality & Reliability, 16, 19-22.
29. U.S. Department of Defense (1957). MIL-STD-414: Military Standard: Sampling Procedures and Tables for Inspection by Variables for Percent Defective. U.S. Government Printing Office, Washington.
30. U.S. Department of Defense (1963). MIL-STD-105D: Military Standard: Sampling Procedures and Tables for Inspection by Attributes. U.S. Government Printing Office, Washington.

Part 2 On-line Control

2.2 Control Charts

CUSUM Control Schemes for Multivariate Time Series

Olha Bodnar¹ and Wolfgang Schmid²

¹ Department of Statistics, European University, PO Box 1786, 15207 Frankfurt (Oder), Germany, [email protected]
² Department of Statistics, European University, PO Box 1786, 15207 Frankfurt (Oder), Germany, [email protected]

Summary. Up to now only a few papers (e.g., Theodossiou (1993), Kramer and Schmid (1997)) dealt with the problem of detecting shifts in the mean vector of a multivariate time series. Here we generalize several well-known CUSUM charts for independent multivariate normal variables (Crosier (1988), Pignatiello and Runger (1990), Ngai and Zhang (2001)) to stationary Gaussian processes. We consider both modified control charts and residual charts. It is analyzed under which conditions the average run lengths of the charts do not depend on the covariance matrix of the white noise process. In an extensive Monte Carlo study these schemes are compared with the multivariate EWMA chart of Kramer and Schmid (1997). The underlying target process is a vector autoregressive moving average process of order (1,l). For measuring the performance of a control chart the average run length is used.

1 Introduction

There are many cases in which an analyst is interested in the surveillance of several characteristic quantities. If, for instance, we consider a portfolio of stocks, then the investor wants to have a hint about possible changes in the return behavior of the stocks. By adjusting his portfolio at an early stage he avoids losing money in a bear market and he is able to increase his profit in a bull market. The second example is taken from environmental statistics. In order to measure the air pollution a lot of characteristics are monitored. The early detection of an increase in the pollution helps to reduce any health risks. Such problems can be found in many areas of economics, engineering, natural sciences, and medicine. Of course it is always possible to reduce the surveillance problem to a univariate one by considering each individual process characteristic separately. But then all information about the dependence structure of the quantities is lost. This attempt will not lead to a powerful

For that reason it is necessary to jointly monitor all process characteristics. Most of the literature about the surveillance of multivariate observations is based on the assumption of independent and normally distributed samples. Moreover, it mainly deals with the detection of mean shifts. The first control chart for multivariate data was proposed by Hotelling (1947) (see also Alt (1985), Alt and Smith (1988)). It is known as a multivariate Shewhart control chart (T²- or χ²-chart) and it is designed for detecting shifts in the mean vector of independent multivariate normally distributed random vectors. This control scheme is based on the Mahalanobis distance between the vector of observations and the target mean vector of the process. The interpretation of the out-of-control signals is discussed in detail by Murphy (1987), Mason et al. (1995), Runger et al. (1996), and Timm (1996). An extension of the exponentially weighted moving average (EWMA) chart of Roberts (1959) to multivariate observations was given by Lowry et al. (1992). Each component is weighted by its own smoothing parameter. Instead of one smoothing parameter the multivariate EWMA chart (MEWMA) works with a smoothing matrix. The distance between the multivariate EWMA recursion and its target value is measured by the Mahalanobis distance. The underlying observations are assumed to be independent and normally distributed. The generalization of the cumulative sum (CUSUM) chart of Page (1954) has turned out to be more difficult. Several proposals have been made in the literature (e.g., Woodall and Ncube (1985), Healy (1987), Crosier (1988), Pignatiello and Runger (1990), Hawkins (1991, 1993)). More recently, Ngai and Zhang (2001) proposed a projected pursuit CUSUM (PPCUSUM) control chart which is an extension of the multivariate CUSUM procedure proposed by Healy (1987). In many applications the underlying data-generating process possesses a more complicated structure. Alwan (1989) analyzed many data sets and he showed that the independence assumption is frequently not fulfilled. In the last 15 years a great number of publications in statistical process control treated the topic of control charts for time series. A natural procedure is to determine the design of the control scheme in the same way as for independent samples but with respect to the time series structure. This is the main idea behind the modified control charts (e.g., Vasilopoulos and Stamboulis (1978), Schmid (1995, 1997a,b)). The second procedure is to transform the original data to independent ones. This leads to the residual charts (e.g., Alwan and Roberts (1988), Montgomery and Mastrangelo (1991), Lu and Reynolds (1999)). Up to now there are only a few papers dealing with the surveillance of multivariate time series. Using the sequential probability ratio method of Wald (1947) Theodossiou (1993) derived a multivariate CUSUM control chart for vector autoregressive moving average (VARMA) processes. It is based on knowledge of the size and the direction of the shift. It has been shown that this chart reacts very quickly to deviations from the chosen direction. Runger

(1996) considered a model which is related to the method of principal component analysis. Based on a principal component decomposition a control statistic is presented. An extension of the MEWMA chart to time series was given by Kramer and Schmid (1997). They proposed modified and residual control charts for the detection of shifts within correlated components. In a recent paper Śliwa and Schmid (2005) proposed several control charts for the surveillance of the correlation structure between the components of a multivariate time series. However, in this paper we shall focus on mean shifts. The remainder of the paper is organized as follows. In the next section (Section 2) a model for the description of the out-of-control situation is introduced. In Section 3 modified multivariate CUSUM control charts are derived. The control charts of Crosier (1988), Pignatiello and Runger (1990), and Ngai and Zhang (2001) are extended to a multivariate stationary time series. In Section 3.5 an invariance property of these control schemes is considered. Several CUSUM control schemes for residuals are introduced in Section 4. In Section 5 the results of a simulation study are presented. Based on the average run length (ARL) all control charts are compared with each other. Moreover, we include the MEWMA scheme of Kramer and Schmid (1997). Section 6 contains concluding remarks.

2 Model

One of the main aims of statistical process control is to monitor whether the observed process (the actual process) coincides with a target process. In engineering the target process is equal to the process which fulfills the quality requirements. In economics it is obtained by fitting a model to a previous data set. In the following {Y_t} stands for the target process. Let {Y_t} be a p-dimensional time series and let Y_t = (Y_{t1}, ..., Y_{tp})'. In the rest of the paper it is always assumed that {Y_t} is a (weakly) stationary process with mean vector μ_0 = (μ_{01}, ..., μ_{0p})' and cross-covariance matrix Γ(h) = E((Y_{t+h} − μ_0)(Y_t − μ_0)') = [E((Y_{t+h,i} − μ_{0i})(Y_{t,j} − μ_{0j}))]_{i,j=1}^p = [γ_{ij}(h)]_{i,j=1}^p at lag h. In practice it is necessary to estimate the parameters of the target process. We do not want to discuss the influence of parameter estimation on the power of a control scheme. This was done in the literature by, e.g., Kramer and Schmid (2000). Thus, we assume that the process mean vector μ_0 and the cross-covariance matrices Γ(h), h ≥ 0 are known.

Suppose that the data x_1, x_2, ... are sequentially taken. The data are assumed to be a realization of the observed process {X_t}. Here our aim is to monitor the mean behavior of the observed process. The observed process may deviate from the target process by a mean change. It is assumed in the following that

$X_t = Y_t + a\,1_{\{q, q+1, \ldots\}}(t)\,, \qquad (1)$

where $a = (a_1\sqrt{\gamma_{11}(0)}, \ldots, a_p\sqrt{\gamma_{pp}(0)})' \in \mathbb{R}^p$. The symbol 1_A(t) denotes the indicator function of the set A at point t. The values a_1, ..., a_p ∈ ℝ and q ∈ ℕ are unknown quantities. If a ≠ 0 we say that a change point at time q is present. a describes the size and the direction of the change. In case of no change the target process coincides with the observed process. We say that the observed process is in control. Else, it is denoted to be out of control. The in-control mean vector μ_0 is called target value. Note that X_t = Y_t for t ≤ 0, i.e. both processes are the same up to time point 0.

For determining the control design only the target process is of relevance. It is assumed that {Y_t} is a p-dimensional stationary Gaussian process. The relationship between the observed and the target process has to be clarified in order to obtain statements about the performance of the control charts (Section 5). This is done by applying model (1). In the following E_{a,q} denotes the expectation taken with respect to this model. E_0 means that no shift is present. By analogy the notations Cov_{a,q}, Cov_0, etc. are used.


The scalar k > 0 plays the role of a one-dimensional reference value. Hence the multivariate CUSUM recursion is defined in the following way

$S_t = \begin{cases} 0 & \text{if } C_t \le k \\ (S_{t-1} + X_t - \mu_0)\,(1 - k/C_t) & \text{if } C_t > k \end{cases} \qquad (5)$

for t ≥ 1 with S_0 = 0. The scheme gives an out-of-control signal as soon as the length of the vector S_t, i.e.

exceeds a preselected control limit. Crosier (1988) proved that the distribution of the run length depends on the mean vector μ and the covariance matrix Σ only through the non-centrality parameter λ = ||μ − μ_0||_Σ, i.e. it is directionally invariant. Concluding, the idea behind his MCUSUM scheme is first to update the vector of cumulative sums, then to shrink it toward 0, and finally, to use the length of the updated and shrunken CUSUM to test whether or not the process

is out-of-control. The choice of the reference value k = λ/2 is recommended to detect any shift in the mean vector. The extension of Crosier's MCUSUM control chart to time series is obtained in a similar way as for the MC1 and MC2 schemes. At time t we calculate C_t with respect to the covariance matrix Γ(0), i.e.

Then the cumulative vector of sums S_{m,t} is calculated with respect to the new quantity C_{m,t}, i.e. the recursion (5) is applied with C_{m,t} in place of C_t, for t ≥ 1 with S_{m,0} = 0. Finally the control scheme signals at time t if

$MCUSUM_{m,t} = \max(0,\, C_{m,t} - k) > h_3\,. \qquad (7)$
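To make the recursion concrete, the following minimal Python sketch implements a Crosier-type multivariate CUSUM of the modified form described above, with the length taken with respect to Γ(0). The function name, the way Γ(0) and the limit are passed, and the simple alarm loop are illustrative choices of this sketch, not notation taken from the paper.

```python
import numpy as np

def modified_mcusum(X, mu0, Gamma0, k, h3):
    """Crosier-type multivariate CUSUM run on the rows of X (one observation per row).

    The cumulative vector is shrunk towards 0 whenever C_t exceeds the reference
    value k, cf. (5); the chart signals once max(0, C_t - k) > h3, cf. (7).
    Returns the first alarm time (1-based) or None if no alarm occurs.
    """
    Ginv = np.linalg.inv(Gamma0)
    S = np.zeros_like(mu0, dtype=float)
    for t, x in enumerate(X, start=1):
        d = S + x - mu0
        C = np.sqrt(d @ Ginv @ d)          # length of S_{t-1} + X_t - mu0 w.r.t. Gamma(0)
        if C <= k:
            S = np.zeros_like(S)
        else:
            S = d * (1.0 - k / C)          # update and shrink, cf. (5)
        if max(0.0, C - k) > h3:           # alarm rule, cf. (7)
            return t
    return None
```

In practice the limit h3 would be calibrated by simulation so that the in-control average run length equals the prescribed value, as described in Section 5.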

3.4 Modified Projected Pursuit CUSUM

Another extension of the CUSUM chart is based on the idea of projection pursuit. It was proposed by Ngai and Zhang (2001) for independent vectors. Here it will be denoted as PPCUSUM. For a given direction a_0 with ||a_0||_2 = 1 (Euclidean norm) we define the CUSUM statistic

$C_0^{a_0} = 0\,, \qquad C_t^{a_0} = \max\{0,\; C_{t-1}^{a_0} + a_0'\,\Sigma^{-1/2}(X_t - \mu_0) - k\}\,, \qquad t \ge 1\,. \qquad (8)$

Pollak (1985) and Moustakides (1986) showed that the univariate CUSUM chart possesses certain optimality properties. If the direction of the shift were known in the multivariate context, then the CUSUM chart based on the projected observations a'X_t would reflect this desirable behavior. The problem consists in the fact that the direction is unknown and therefore the statistic cannot be directly applied. Ngai and Zhang (2001) proposed to solve this problem by estimating a_0 by â_0 and approximating C_t^{a_0} by C_t^{â_0}. Here â_0 is the value at which C_t^{a} attains its maximum on the unit circle, i.e. C_t^{â_0} = max_{||a||_2=1} C_t^{a}. They proved that max_{||a||_2=1} C_t^{a} = PPCUSUM_t with

$PPCUSUM_t = \max\{0,\; \|S_{t-1,t}\|_2 - k,\; \|S_{t-2,t}\|_2 - 2k,\; \ldots,\; \|S_{0,t}\|_2 - tk\} \qquad (9)$

for t ≥ 1 with S_{t-r,t} as in Section 3.1. The control scheme gives a signal as soon as PPCUSUM_t is sufficiently large. If the process is concluded to be out of control at time t_0 then there is t_1 < t_0 such that

$\|S_{t_1,t_0}\|_2 - (t_0 - t_1)\,k = \max_{\|a\|_2=1} C_{t_0}^{a} > h\,.$

The direction of the shift is estimated by

and

Because the process should be normalized with respect to the covariance matrix of the time series, Σ has to be replaced by Γ(0) in (8), and thus the norm in (9) is taken with respect to Γ(0), i.e. ||·||_{Γ(0)} is always used. Hence the modified control chart gives an alarm if

where h_4 is a given constant. The constant h_4 is again chosen to achieve the fixed in-control average run length.

3.5 Invariance Property of the Modified MC1, MC2, MCUSUM, and PPCUSUM Chart

The control schemes derived in the Sections 3.1-3.4, i.e., the modified MC1 and MC2 schemes, the modified MCUSUM chart, and the modified PPCUSUM method, are rather general and can be applied to a large family of stochastic processes. On the other side, it is useful to know properties of the control charts which one can use to simplify the calculation of the control design. One of the most important properties of these control charts in the independent case is the invariance of the in-control average run length with respect to Σ. However, it is rather difficult to establish such a property for time series data. That is why in this section we restrict ourselves to the case of a stationary VARMA(1,1) process {Y_t}, i.e., {Y_t} is a stationary solution of the stochastic difference equation

Assuming that det(I − Φz) has no zeros on and inside the unit circle, there exists a unique stationary solution. This model has turned out to be of great importance in many practical applications.

Theorem 1. Let Y_t be a stationary p-dimensional VARMA(1,1) process. Let Φ be a matrix such that det(I − Φz) has no zeros on and inside the unit circle. Suppose that {ε_t} are independent and normally distributed with zero mean vector and positive definite covariance matrix Σ = [σ_{ij}]_{i,j=1}^p. Then the distributions of the run lengths of the modified MC1, modified MC2, modified MCUSUM, and modified PPCUSUM control charts do not depend on Σ in the in-control state if ΦΣ and ΘΣ are symmetric matrices.

The proof of the Theorem is given in the appendix. Note that for a symmetric matrix Φ it only holds that ΦΣ is symmetric for all positive definite matrices Σ if Φ = φI for some φ ∈ (−1, 1). Thus the charts are in general not directionally invariant.
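For the simulation experiments discussed below it is useful to be able to generate the target and the observed process. The following sketch simulates a stationary p-dimensional VARMA(1,1) path under the convention Y_t = Φ Y_{t-1} + ε_t − Θ ε_{t-1} and superimposes a mean shift according to model (1); the sign convention, the burn-in length, and the numerical parameter values in the example are assumptions of this illustration, not the configuration used in Section 5.

```python
import numpy as np

def simulate_varma11(Phi, Theta, Sigma, n, burn_in=500, rng=None):
    """Simulate n observations of a stationary VARMA(1,1) process
    Y_t = Phi Y_{t-1} + eps_t - Theta eps_{t-1} with Gaussian white noise."""
    rng = np.random.default_rng() if rng is None else rng
    p = Phi.shape[0]
    L = np.linalg.cholesky(Sigma)
    y = np.zeros(p)
    eps_prev = np.zeros(p)
    out = np.empty((n, p))
    for t in range(n + burn_in):
        eps = L @ rng.standard_normal(p)
        y = Phi @ y + eps - Theta @ eps_prev
        eps_prev = eps
        if t >= burn_in:
            out[t - burn_in] = y
    return out

def observed_process(Y, a, q):
    """Model (1): add the shift vector a to Y_t for all t >= q (1-based change point q)."""
    X = Y.copy()
    X[q - 1:] += a
    return X

# illustrative two-dimensional example with diagonal parameter matrices
Phi = np.diag([0.3, 0.3])
Theta = np.diag([-0.4, -0.4])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
Y = simulate_varma11(Phi, Theta, Sigma, n=100)
X = observed_process(Y, a=np.array([0.5, 0.5]), q=1)
```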

4 Multivariate CUSUM Residual Charts

In the recent literature there are many papers about residual charts. In most of these papers the mean behavior of a time series is monitored (e.g., Alwan and Roberts (1988), Kramer and Schmid (1997), Lu and Reynolds (1999)). The idea behind the residual charts is to transform the original data to new variables which are independent. Next the above presented CUSUM charts for independent values are applied to the transformed quantities, the residuals. The process residuals are determined as the difference between the observation at time t and a predictor for this value. For calculating the predictor of X_t it is frequently assumed that the whole history of the process up to −∞ is known. But in practical applications such an assumption is not realistic. Following Kramer and Schmid (1997) we want to predict X_t based on the random vectors X_1, ..., X_{t−1}. This is done under the assumption that the process is in control. In other words, X̂_t = f(X_1, ..., X_{t−1}), where f(y_1, ..., y_{t−1}) is the best linear predictor of Y_t given that Y_1 = y_1, ..., Y_{t−1} = y_{t−1}. In the comparison study of Section 5 we assume that the target process {Y_t} is a stationary VARMA(1,1) process. Then from Brockwell and Davis (1991, ch. 11) it follows that the best linear predictor is given by

$\hat{X}_t = \Phi X_{t-1} - \Theta_t\,(X_{t-1} - \hat{X}_{t-1}) \quad \text{for } t \ge 2$

with X̂_1 = 0, where the matrices Θ_t and V_t are computed recursively.

When the process {Y_t} is invertible then it follows that Θ_t → Θ and V_t → Σ as t → ∞. A recursive procedure for the calculation of X̂_t and V_t = Var_0(X_t − X̂_t) for an arbitrary stationary process can be found in Brockwell and Davis (1991, ch. 11). The normalized residuals are given by η_t = V_t^{−1/2}(X_t − X̂_t) for t ≥ 1. Now it holds that E_0(η_t) = 0, Cov_0(η_t) = I and E_0(η_t η_s') = 0 for t ≠ s. Assuming that {X_t} is a Gaussian process it follows that in the in-control state the variables {η_t} are independent and normally distributed. Therefore one can apply the usual control charts for independent observations to the residuals {η_t}. It has to be noted that in the case of a shift in the mean vector the normalized residuals are no longer identically distributed but still normally distributed. The reason for this behavior lies in the starting problem (see Kramer and Schmid (1997)). Moreover, under the out-of-control situation analyzed in this paper, the normalized residuals are still independent. Many analysts prefer to work with the limits of Θ_t and V_t. Using the asymptotic values the corresponding residuals are not independent. This is a further drawback of this procedure.
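The finite-sample one-step predictors X̂_t and the prediction error covariances V_t of a VARMA(1,1) process can also be obtained from a state-space representation with a Kalman filter initialized at the stationary state covariance; this is an alternative, equivalent route to the innovations recursion cited above from Brockwell and Davis, not the formula used in the paper. The state choice (Y_t', ε_t')', the sign convention Y_t = Φ Y_{t-1} + ε_t − Θ ε_{t-1}, and the fixed-point iteration for the stationary covariance are implementation choices of this sketch.

```python
import numpy as np

def varma11_residuals(X, Phi, Theta, Sigma, mu0=None):
    """Normalized one-step prediction residuals eta_t = V_t^{-1/2} (X_t - Xhat_t)
    for a VARMA(1,1) model, via a Kalman filter on the state (Y_t', eps_t')'."""
    X = np.asarray(X, dtype=float)
    p = Phi.shape[0]
    if mu0 is not None:
        X = X - mu0
    # state-space matrices: alpha_{t+1} = T alpha_t + R eps_{t+1},  Y_t = Z alpha_t
    T = np.block([[Phi, -Theta], [np.zeros((p, p)), np.zeros((p, p))]])
    R = np.vstack([np.eye(p), np.eye(p)])
    Z = np.hstack([np.eye(p), np.zeros((p, p))])
    Q = R @ Sigma @ R.T
    # stationary state covariance by fixed-point iteration of P = T P T' + Q
    P = Q.copy()
    for _ in range(1000):
        P = T @ P @ T.T + Q
    a = np.zeros(2 * p)                       # predicted state mean
    eta = np.empty_like(X)
    for t, x in enumerate(X):
        F = Z @ P @ Z.T                       # prediction error covariance V_t
        v = x - Z @ a                         # innovation X_t - Xhat_t
        L = np.linalg.cholesky(F)
        eta[t] = np.linalg.solve(L, v)        # one choice of V_t^{-1/2} v_t
        K = T @ P @ Z.T @ np.linalg.inv(F)    # Kalman gain
        a = T @ a + K @ v
        P = T @ P @ T.T + Q - K @ F @ K.T
    return eta
```

The Cholesky factor of F is used here as one admissible matrix square root of V_t; any other square root yields residuals with identity covariance as well.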

4.1 MC1 for Residuals

In Section 3.1 the modified MC1 control chart was introduced. In this section we use the same control procedure, but instead of the original observations it is applied to the normalized residuals. Hence, we obtain the following control statistic

The number of observations since the last reconstruction of the control chart, n_{r,t}, is defined by analogy to (2). The control chart gives a signal as soon as the statistic MC1_{r,t} exceeds the control limit. It is determined such that the in-control ARL of the chart is equal to a fixed value ξ. In practice this equation has to be solved by simulations.
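As an illustration, here is a minimal sketch of an MC1-type recursion applied to a stream of normalized residuals η_t: a cumulative sum of the last n_{r,t} residuals, its Euclidean norm compared against k·n_{r,t}, and a reset of the counter whenever the statistic drops to zero. The reset rule and the name h1 for the control limit are the usual MC1 conventions of Pignatiello and Runger and are assumptions in so far as formula (2) of the paper is not reproduced here.

```python
import numpy as np

def mc1_residual_run_length(eta, k, h1):
    """MC1 chart applied to normalized residuals eta (one residual vector per row).

    n_t counts the observations since the last reconstruction of the chart: it
    grows by one as long as the previous statistic was positive and is reset to 1
    otherwise.  The chart signals once MC1_{r,t} exceeds the limit h1.
    Returns the first alarm time (1-based) or None.
    """
    cum = np.zeros(eta.shape[1])
    n_t = 0
    mc1_prev = 0.0
    for t, e in enumerate(eta, start=1):
        if mc1_prev > 0.0:
            n_t += 1
            cum = cum + e
        else:
            n_t = 1
            cum = e.copy()
        mc1 = max(0.0, np.linalg.norm(cum) - k * n_t)
        if mc1 > h1:
            return t
        mc1_prev = mc1
    return None
```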

4.2 MC2 for Residuals

The MC2 scheme for residuals is based on the quantity D_{r,t}² = η_t'η_t. The CUSUM recursion is given by

with MC2_{r,0} = 0. The MC2 control chart for residuals signals a change if the statistic MC2_{r,t} exceeds the preselected control limit.

4.3 MCUSUM for Residuals

By analogy to Section 3.3 we consider C_{r,t} = [(S_{r,t−1} + η_t)'(S_{r,t−1} + η_t)]^{1/2} with S_{r,t} as in (5). Then,

A large value of MCUSUM_{r,t} is a hint that a change has happened. The control limit is determined as described above. It is obtained within an extensive Monte Carlo study.

4.4 PPCUSUM for Residuals

The PPCUSUM control statistic for residuals is defined as

As usual the control limit is chosen such that the in-control ARL is equal to a fixed value ξ. Because no explicit formula is available for the ARL, we calculate it via simulations.
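A direct sketch of the PPCUSUM statistic for residuals follows; it evaluates (9) with Σ = I from the partial sums of the normalized residuals. The quadratic-time bookkeeping is a simple illustrative choice, not an efficient implementation.

```python
import numpy as np

def ppcusum_residual_path(eta, k):
    """PPCUSUM statistic for residuals:
    PPCUSUM_{r,t} = max(0, max_{1<=r<=t} ||eta_{t-r+1} + ... + eta_t||_2 - r*k).
    Returns the statistic for every t (quadratic-time implementation)."""
    eta = np.asarray(eta, dtype=float)
    csum = np.cumsum(eta, axis=0)                 # csum[t-1] = eta_1 + ... + eta_t
    stats = np.empty(len(eta))
    for t in range(1, len(eta) + 1):
        best = 0.0
        for r in range(1, t + 1):
            s = csum[t - 1] - (csum[t - r - 1] if r < t else 0.0)
            best = max(best, np.linalg.norm(s) - r * k)
        stats[t - 1] = max(0.0, best)
    return stats
```

A run of the chart then simply reports the first index at which the returned statistic exceeds the preselected control limit.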

5 A Comparison of the Multivariate Control Charts

In this section we want to compare the introduced control charts with the multivariate EWMA charts of Kramer and Schmid (1997).

5.1 Structure of the Monte Carlo Study

The in-control process is taken to be a two-dimensional stationary VARMA(1,1) process with mean vector μ_0 = 0 and normally distributed white noise {ε_t} as defined in Theorem 1. It is assumed that the conditions of Theorem 1 hold. In order to assess a control scheme it is necessary to fix the out-of-control situation. Here we make use of the model (1) with q = 1. Hence,

with a = (a_1√γ_11(0), a_2√γ_22(0))'. Because the distribution of ε_t is symmetric it follows that the distribution of (X_1, ..., X_k) for a shift of size a is the same as its distribution for a shift of size −a, i.e. it holds that P_a((X_1, ..., X_k) ∈ B) = P_{−a}((X_1, ..., X_k) ∈ B) for all Borel sets B ⊂ ℝ^{2k}. Because of this symmetry we can assume without any restriction in our analysis that a_2 ≥ 0. The usual problem in dealing with multivariate time series is the huge number of parameters. Therefore we assume that Φ and Θ are diagonal matrices. Here we present the results for the configuration

As a measure of the performance of a control chart the average run length (ARL) is applied. All charts were calibrated to have the same in-control ARL, here 200. Because no explicit formulas for the in-control and the out-of-control ARLs are available, a Monte Carlo study is used to estimate these quantities. Our estimators of the in-control ARLs are based on 10^5 simulated independent realizations of the process. The control limits of all charts were determined by applying the regula falsi to the estimated ARLs. While in Table 1 the control limits of the modified charts are given for various values of the reference value k and the smoothing parameter r, Table 2 contains the control limits for the residual schemes for the same values of k and r. For the modified charts the control limits depend on the process parameters. Because in the in-control state the residuals are independent and identically distributed, the control limits of the residual charts are the same as for independent samples. Furthermore they do not depend on the process parameters. In the out-of-control state the residuals are again independent but no longer identically distributed. Consequently they are not directionally invariant. However, the influence of the

first observations is small and thus the charts behave nearly like directionally invariant schemes. For the CUSUM schemes the control limits decrease as k increases. Conversely, for the MEWMA charts the control limits increase as the parameter r increases. Finally, in almost all cases the control limits of the modified control charts are larger than the corresponding limits of the residual charts.

Table 1. Control limits of the modified MEWMA, MC1, MC2, MCUSUM, and PPCUSUM charts for the two-dimensional VARMA(1,1) process of Section 5.1 (in-control ARL = 200).

Table 2. Control limits of the MEWMA, MC1, MC2, MCUSUM, and PPCUSUM residual charts for the two-dimensional VARMA(1,1) process of Section 5.1 (in-control ARL = 200).

In order to study the out-of-control behavior of the proposed control charts we take various reference values k into account. For all CUSUM charts k is chosen as an element of the set {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1}. For the multivariate EWMA charts the smoothing matrix is taken as a diagonal matrix with equal diagonal elements r with r ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.
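The calibration described above (choosing a control limit so that the estimated in-control ARL hits a prescribed value, here 200) can be sketched as follows. The simulation budget, the bracketing interval, the penalty for paths without an alarm, and the generic interface run_length(h, rng), which stands for any of the chart implementations above applied to a freshly simulated in-control path, are assumptions of this illustration.

```python
import numpy as np

def estimate_arl(run_length, h, reps, rng):
    """Monte Carlo estimate of the ARL for control limit h.

    run_length(h, rng) must return the run length of one simulated path
    (None is interpreted as 'no alarm within the simulated horizon')."""
    horizon_penalty = 10_000          # value used when a path never signals
    rls = [run_length(h, rng) or horizon_penalty for _ in range(reps)]
    return float(np.mean(rls))

def calibrate_limit(run_length, target_arl, h_lo, h_hi, reps=10_000,
                    tol=1.0, max_iter=30, seed=1):
    """Regula falsi on the estimated in-control ARL as a function of the limit h.

    h_lo and h_hi are assumed to bracket the target, i.e. the estimated ARL at
    h_lo lies below target_arl and the one at h_hi above it."""
    rng = np.random.default_rng(seed)
    f_lo = estimate_arl(run_length, h_lo, reps, rng) - target_arl
    f_hi = estimate_arl(run_length, h_hi, reps, rng) - target_arl
    h = h_hi
    for _ in range(max_iter):
        h = h_hi - f_hi * (h_hi - h_lo) / (f_hi - f_lo)   # regula falsi step
        f = estimate_arl(run_length, h, reps, rng) - target_arl
        if abs(f) < tol:
            return h
        if f * f_lo < 0:
            h_hi, f_hi = h, f
        else:
            h_lo, f_lo = h, f
    return h
```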

5.2 Behavior in the Out-of-Control State

For the determination of the out-of-control ARLs we made use of 10^6 independent realizations of the underlying process. In Tables 3 to 6 the out-of-control ARLs of all control charts within our study are presented. The corresponding values of r and k at which the smallest out-of-control ARLs are attained are given in brackets. These values should be taken to detect the specific shift in the mean vector of the target process. For a given shift the ARL of the best chart is printed boldfaced.

Table 3. Minimal out-of-control ARLs of the modified MEWMA, MC1, MC2, MCUSUM, and PPCUSUM control charts for different values of the parameters r and k for the two-dimensional VARMA(1,1) process of Section 5.1 - Part 1 (rows: a_1; columns: a_2 = 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5; the optimal r or k is given in parentheses).

The results of the modified charts can be found in Tables 3 and 4. In our study the modified MC1 control chart outperforms the other modified schemes. In almost all cases it provides the smallest out-of-control ARL. The modified MCUSUM chart ranks second. It is clearly worse than the MC1 approach but it seems to be a little bit better than the other competitors. The modified PPCUSUM method and the modified multivariate EWMA chart follow on the next places. They provide similar results; sometimes modppcusum is better, sometimes modmewma. The modified MC2 scheme behaves considerably worse. For nearly all shifts under consideration it has a larger out-of-control ARL than the other charts. In Tables 5 and 6 the out-of-control ARLs of the residual charts are shown. In all cases the best performance is obtained by the MC1 control chart. On the second rank we find the MCUSUM control scheme applied to the residuals. The results of the PPCUSUM and MEWMA control charts are again quite similar. While for small and large shifts a better performance is reached

Table 4. Minimal out-of-control ARLs of the modified EWMA, MC1, MC2, MCUSUM, and PPCUSUM control charts for different values of the parameters r and k for the two-dimensional VARMA(1,1) process of Section 5.1 - Part 2 (rows: a_1; columns: a_2 = 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5; the optimal r or k is given in parentheses).
by the PPCUSUM approach, the residual MEWMA scheme outperforms the latter control scheme for moderate values of the shift. Finally, the MC2 chart shows a much worse performance. An advantage of the MEWMA residual scheme is its robustness with respect to the choice of the smoothing parameter. In most cases the smallest out-of-control ARL was obtained for r lying between 0.1 and 0.3, while the optimal values of the modified charts are larger. Such a behavior was already described in Kramer and Schmid (1997). The other charts were more sensitive with respect to the choice of the smoothing parameter. For small shifts a_1 and a_2 the best smoothing parameter is small as well, while for large changes the smoothing parameter should be taken large. This is the same behavior as for the univariate EWMA chart for independent samples. The comparison between the modified and the residual charts leads to interesting results. If the ARL of a modified chart is compared with its residual counterpart, then for a fixed shift both types of control schemes have a similar performance, except for the MC2 approach. For the MC1, MCUSUM, and PPCUSUM approaches the modified schemes behave better than the residual charts if a_2 is small; otherwise the residual approach is better. However, this behavior might change for other parameter constellations (cf. Kramer and Schmid

Table 5. Minimal out-of-control ARLs of the residual EWMA, MC1, MC2, MCUSUM, and PPCUSUM control charts for different values of the parameters r and k for the two-dimensional VARMA(1,1) process of Section 5.1 - Part 1 (rows: a_1; columns: a_2 = 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5; the optimal r or k is given in parentheses).

(1997)). Note that in the univariate case modified schemes lead to a smaller out-of-control ARL if the process has a positive correlation structure, while for a negative correlation residual charts should be preferred (cf. Knoth and Schmid (2004)). In the multivariate case the situation is more difficult. If we consider a VAR(1) process then E(Γ(0)^{−1/2} X_t) = Γ(0)^{−1/2} a. If all elements of Γ(0)^{−1/2} are positive then the shift is overweighted. For the residual chart we get that lim_{t→∞} E(V_t^{−1/2}(X_t − X̂_t)) = Σ^{−1/2}(I − Φ)a. Here for positive elements of Σ^{−1/2} the shift is overweighted (downweighted) if all components of Φ are negative (positive). Because a VARMA process depends on many parameters, a comparison between the modified and the residual charts turns out to be difficult. However, in all of our simulations the MC1 chart provides very good results. For that reason we recommend applying either the modified or the residual MC1 scheme. For reasons of simplicity we have taken the smoothing values of the MEWMA chart all equal to the same value. This chart has

Table 6. Minimal out-of-control ARLs of the residual EWMA, MC1, MC2, MCUSUM, and PPCUSUM control charts for different values of the parameters r and k for the two-dimensional VARMA(1,1) process of Section 5.1 - Part 2 (rows: a_1; columns: a_2 = 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5; the optimal r or k is given in parentheses).

much more flexibility and improvements can be expected if different values are chosen. In Woodall and Mahmoud (2004) the so-called inertia behavior of control schemes is studied. The chart statistic can take values such that changes in the parameters are more difficult to detect. To measure this effect they introduce a new measure, the signal resistance. In their paper they compare several multivariate control charts for independent samples based on this performance criterion. It is shown that, e.g., the PPCUSUM chart has a much better worst-case signal resistance performance than the MC1 chart. In our analysis we have not considered the inertia problem. Further research in this direction is necessary.

6 Conclusions

Due to the fast development of computer technology in the last decade it is nowadays possible to analyze multivariate data sets in an on-line way. Although such problems arise in various scientific disciplines it is quite surprising that only a few papers dealt with this topic. There seems to be a

lack of statistical methods for the simultaneous analysis of several correlated processes. In this paper we introduce several CUSUM control charts for detecting a shift in the mean vector of a stationary multivariate time series. We consider modified schemes and residual charts. Due to the large number of parameters we focus on a VARMA(1,1) process. Sufficient conditions are derived under which the in-control ARL does not depend on the covariance matrix of the white noise process (Theorem 1). However, in general, the out-of-control ARL depends on all parameter matrices of the VARMA model. These charts are compared with the MEWMA chart of Kramer and Schmid (1997). In most cases the best performance is displayed by the MC1 control chart. Our paper can be considered as a first step in this direction. It is necessary to compare these charts for further parameter constellations. Especially, it is desirable to have a rule of thumb whether residual or modified charts should be applied. In our comparison study we restricted ourselves to the two-dimensional case. Because many data sets in practice are high-dimensional it is of great interest to study the behavior of these control schemes for higher dimensions.

7 Appendix

In this section the proof of Theorem 1 is given.

We give the proof for the modified MC1 control chart. For the other modified schemes the results can be verified in the same way. Without loss of generality it is assumed that μ_0 = 0. We observe that in the in-control state (MC1_1, ..., MC1_n) is a function of the quadratic forms (Y_i + ... + Y_j)'Γ(0)^{−1}(Y_i + ... + Y_j) for 1 ≤ i ≤ j ≤ n. According to Khatri et al. (1977) the joint characteristic function of the quadratic forms (Y_i + ... + Y_j)'Γ(0)^{−1}(Y_i + ... + Y_j) for 1 ≤ i ≤ j ≤ n is given by


time in the language of sequential statistics. In order to give a general idea of a control chart, we make use of a chart statistic Z_n as a measurable function of {X_i}_{i=1,2,...,n}. Then

where [c_n^−, c_n^+] denotes the continuation region of the charting process. Usually, c_n^− and c_n^+ do not depend on the time n (except for the original EWMA control chart). Additionally, one of these values could be a reflecting barrier. For special one-sided control charts the continuation region could be unbounded below or above. Denote by

the process history before the change point m. The notation σ(·) refers to the minimal sigma algebra generated by the random variables under consideration. Note that the star version stands for all in-control sequences which do not lead to an alarm before time point m. Now, we are ready to define various run length measures of specific ARL types.

1. Zero-state ARL:

$\mathcal{L} = \begin{cases} E_1(L)\,, & \text{out-of-control measure} \\ E_\infty(L)\,, & \text{in-control measure} \end{cases} \qquad (3)$

Recall that the index (1 or ∞) corresponds to a change point at chart start-up (or before) and in the far future, respectively. This ARL type is the most popular one and is mostly simply called the ARL. The main drawback is the very specific change point position. In other words, by using the zero-state ARL we are assuming that we know the condition of the control chart right at the change point. At least, it is a satisfying measure in the in-control case, i.e. when we are quantifying the mean time until a wrong signal. Eventually, the phrase zero points to the fact that the initial chart state is given by a certain natural state, which corresponds sometimes to the average state (EWMA charts, see next section) or to the worst-case state (CUSUM schemes, details in next section). Thus, the zero-state ARL might be expressed as L_θ(z) with actual parameter value θ and starting value Z_0 = z.

2. Conditional ARL:

Contrary to the other concepts this one is introduced exclusively for our listing. It allows us to create a link between different competing concepts which do not possess a straightforward linkage. Obviously, D_1^* = E_1(L). For m > 1, the quantity D_m^* is a real random variable (F_{m−1}^*-measurable, i.e. a function of the history X_1, ..., X_{m−1} with L > m − 1).

Expected conditional ARL (expectation is taken over the in-control part):

D_m measures a kind of average detection delay. Again, D_1 = E_1(L). Taking the limit leads to the well-known

Steady-state ARL (after a very long in-control period (m → ∞) without a signal):

$\mathcal{D} = \lim_{m\to\infty} D_m \qquad (6)$

Worst-case ARL (the worst case due to Lorden 1971):

$\mathcal{W} = \sup_m \operatorname{ess\,sup} D_m^*$

We can imagine W as the average detection delay of a monitoring scheme being in the least favorable state for the detection of a change. Lorden (1971) proved (and later some others too, see next section) some optimality results. The notation above differs from the original one by Lorden, so that we want to sketch the proof of equivalence in the next lines.

$= \sup_m \operatorname{ess\,sup} D_m^*\,,$ because $P_m(L \ge m) > 0$ for any finite m.

Note that for all schemes under consideration, in the case of independent {X_t} the worst-case ARL W is simply the largest zero-state ARL value. That is, we start the monitoring scheme in its worst condition concerning detection speed and the change takes place right at the beginning.

Pollak-Siegmund worst-case ARL (more similar to the steady-state ARL):

$\mathcal{W}^{PS} = \sup_m D_m = \sup_m E(D_m^*)$

From (6) we see that D_m converges to D, so that either W^{PS} (sup) and D (lim) coincide, or there are only finitely many D_m that are larger than D. We would like to conjecture that W^{PS} = max{L, D}.

α-worst-case ARL - since the likelihood of being in the worst-case position is very small for both the EWMA control chart and the Shiryaev-Roberts scheme (introduced in the next section), the following concept is presented:

For usual control charts and change point models such as (1), the limiting distribution of D_m^* exists (see Appendix), so that the above definition can be used. Furthermore, the following identities hold:

$\lim_{m\to\infty} D_m \;=\; \lim_{m\to\infty} E(D_m^*) \;=\; E\Big(\lim_{m\to\infty} D_m^*\Big)$

All identities are valid by definition except for the second equality. It can be proved by means of Lebesgue's Theorem on Dominated Convergence. For details see the Appendix. In the following we give a historical outline and a more detailed study on a favorite ARL. The steady-state ARL will be preferred, while the zero-state ARL serves as a suitable surrogate of the steady-state ARL in the case of an EWMA scheme.
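To make the different ARL notions concrete, the following sketch estimates the zero-state ARL E_1(L) and the expected conditional delays D_m for a one-sided EWMA chart by Monte Carlo, conditioning on the event that no alarm occurred before the change point m; D_m for a large m then approximates the steady-state ARL D. The particular chart, its design values, the simulation budget, and the finite horizon are illustrative assumptions of this sketch.

```python
import numpy as np

def ewma_run_length(delta, m, lam, c, rng, horizon=100_000):
    """Run length of a one-sided EWMA chart (limit c*sqrt(lam/(2-lam))) when the
    mean shifts from 0 to delta at time m (m = 1: change right at the start).
    Returns (L, False), or (None, True) if an alarm occurred before time m."""
    limit = c * np.sqrt(lam / (2.0 - lam))
    z = 0.0
    for n in range(1, horizon + 1):
        x = rng.standard_normal() + (delta if n >= m else 0.0)
        z = (1.0 - lam) * z + lam * x
        if z > limit:
            return (n, False) if n >= m else (None, True)
    return horizon, False

def conditional_delay(delta, m, lam=0.25, c=2.87, reps=20_000, seed=2):
    """Monte Carlo estimate of D_m = E(L - m + 1 | L >= m)."""
    rng = np.random.default_rng(seed)
    delays = []
    for _ in range(reps):
        L, early_alarm = ewma_run_length(delta, m, lam, c, rng)
        if not early_alarm:
            delays.append(L - m + 1)
    return float(np.mean(delays))

# D_1 equals the zero-state out-of-control ARL; D_m for large m approximates
# the steady-state ARL.
print(conditional_delay(delta=1.0, m=1))
print(conditional_delay(delta=1.0, m=50))
```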

3 Remarks concerning history of measuring control chart performance

In the beginnings of control charting, performance measures similar to those in statistical test theory with fixed sample statistics were used. That is, one considered the probabilities of the errors of the first and the second kind. For control charting, the error of the first kind corresponds to the event of a wrong signal (blind alarm), while a missing signal in case of a disturbed process generates the error of the second kind. In order to evaluate the chart performance, often a fixed time point during the observation process is assumed. The standard Shewhart chart, however, is equivalent to a sequence of tests, so that it is reasonable to extend the consideration to the required number of tests until one of these tests ends with the rejection of the null hypothesis. Aroian & Levene (1950) were the first who tried to assess control charts in a more appropriate way than the simple manner of the error probabilities. They presented the so-called average spacing number and average efficiency number as predecessors of the more famous (zero-state) ARL. Then, Page (1954b) introduced the term Average Run Length (ARL) "as the average number of articles inspected between two successive occasions when rectifying action is taken." So, the most popular performance measure of control charts was born. And it prevailed, despite such apocalyptic comments as in Barnard (1959): "If it were thought worthwhile one could use methods analogous to these given by Page (1954) and estimate the average run length as a function of the departure from the target value. However, as I have already indicated, such computations could be regarded as having the function merely of avoiding unemployment amongst mathematicians." Nevertheless, hundreds of papers were written about the computation of the ARL, and probably dozens more will be written. In the 1950s further measures were considered. Girshick & Rubin (1952) formulated a Bayesian framework and constructed a monitoring scheme which

was the first step on the way to a scheme called "Shiryaev-Roberts procedure" which is now very popular amongst mathematical statisticians. The initial scheme is based on a constant g which describes the probability of switching from the in-control state to the out-of-control state at each time point and the likelihood ratio of the competing densities. More precisely, they employ

The parameter g stands for the probability that the process switches from in-control to out-of-control in the next time unit. The authors proved optimality for a certain threshold a given a complex loss function. More comprehensive contributions to the Bayesian approach are given by Shiryaev (1963a, 1963b) and Shiryaev (1976). Shiryaev introduced a random change point M with geometric distribution.

Contrary to (1), Shiryaev distinguishes between the cases m = 0 and m = 1, where the former represents changes in the past and the latter a change right at the beginning of the monitoring process. Further, he exploited the following loss function. Recall that L denotes the run length of the scheme.

P_{π,p}(L < M) denotes the false alarm probability and the second term quantifies the average delay until a valid alarm. The constant c quantifies the cost relation between false alarms and detection delay. Both values should be small for an appropriate monitoring scheme. Shiryaev proved that a scheme based on the a-posteriori probability P(M ≤ n | X_1, X_2, ..., X_n) minimizes the above loss function. This scheme also minimizes E_{π,p}(L − M | L ≥ M) for a given upper bound α of P_{π,p}(L < M) (or a given maximal number of expected false alarms). Note that by means of P(M = m | M ≥ m) = p we see that the Girshick-Rubin scheme and the Shiryaev scheme coincide for the geometric prior. In Shiryaev (1963b) the author considered schemes from the viewpoint of a stationary regime (unfortunately for continuous time only). Shiryaev introduced a quantity called mean delay time which in fact looks similar to the steady-state ARL (there remain some differences). Moreover, for obtaining the optimal scheme he let λ (the continuous-time counterpart to the geometric prior is an exponential one with density λ exp(−λt)) tend to zero, which corresponds to Pollak (1985) and Roberts (1966). The latter author took the

Girshick-Rubin scheme and set more pragmatically g = 0, while Pollak proved that the original Shiryaev scheme converges to that procedure which is now known as the Shiryaev-Roberts scheme. Shiryaev (2001) notes that the statistic ψ = (ψ_t)_{t≥0} is called the Shiryaev-Roberts statistic (the discrete-time ψ_n is equivalent to Girshick-Rubin's Z_n with g = 0). Thus, in the sequel we maintain "Shiryaev-Roberts scheme" as title and "GRSR" as abbreviation for this famous limiting scheme. The Bayes schemes did not become very popular in the SPC literature, contrary to the more theoretically oriented papers, cf. Frisén & de Maré (1991), Frisén & Wessman (1998), Frisén (2003), and Mei (2003) for a very recent and lively collection of papers concerned with this approach. Barnard (1959) combined estimation and monitoring, so that measures from estimation theory could be exploited. Bather (1963) addressed the problem of economic design of control charts and used a couple of economic constants for setting up control charts. Contrary to (1) he based his study on monitoring a non-constant parameter, which itself fluctuates around a target value. His optimal scheme resembles the EWMA scheme of Roberts (1959). There are, of course, more papers about economic design. The incorporation of economic parameters in the control chart design, however, leads to a more complicated framework, so that mostly the simple Shewhart chart was treated alone. Thus, the current paper abstains from the economic design. For more details we refer to Ho & Case (1994a) for an overview about the literature in that field up to 1994, and to Ho & Case (1994b) for economic design of an EWMA control chart. A more recent overview is given in Keats, del Castillo, von Collani & Saniga (1997).

point followers"), while the more applied ones are dealing more frequently with two-sided schemes. In the whole study it was assumed, that the data are independently normally distributed, i. e. X , r v N ( A ,1) with A E {0,0.25,. . . ,5). The stopping rules (times) are written again in sequential statistics manner and represent the run length. Shewhart

0

X

chart

-

Shewhart (1931):

Moving Average (MA) - Roberts (1966):

Exponentially Weighted Moving Average Chart (EWMA) -Roberts (1959):

Lfiuedfirnits= inf { n E N : Zn > c J-}

Lvarying limits= inf n E IN : Zn > c

(1 - (1 -

, x A/(2

-

A)

The fixed limits EWMA chart is the more popular one. Roberts (1966), however, included the varying limit chart in his analysis. Cumulative Sum scheme (CUSUM) - Page (1954a):

Shiryaev-Roberts (GRSR) - Girshick & Rubin (1952), Shiryaev (1963b), Roberts (1966), Pollak (1985), here written in the In() fashion like in Roberts (1966):

The schemes ought to be adjusted to give an in-control ARL (zero-state, expected conditional) of 740, which is equal to the in-control ARL of the one-sided X̄ chart with 3σ-limit. Moreover, the target out-of-control shift for the more complex charts is Δ = 1. In Table 1 the actual design parameters and computational methods used by Roberts (1966) for deriving the ARLs are collected. While it is not easy to understand the way how Roberts (1966) "translated" results of previous papers into suitable ones for his comparison, the "translated" values are fortunately not very far from the true values.

Table 1. One-sided control charts considered in Roberts (1966)

chart     design                type of ARL    computational method
Shewhart  c = 3                 all in one     exact (E_∞(L) = 741.43)
MA        n = 8, c = 2.79       D_9            Monte Carlo, 25 000 repetitions
EWMA      λ = 0.25, c = 2.87    D_9            Monte Carlo, 25 000 repetitions
CUSUM     k = 0.47, h = 5       E_1(L)         "translation" of two-sided results of Ewan & Kemp (1960)
GRSR      k = 0.5, g = 390      D_9            Monte Carlo, 50 000 repetitions
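The designs in Table 1 can be checked directly by simulation. The sketch below estimates zero-state ARLs for the one-sided CUSUM (k = 0.47, h = 5) and the fixed-limits EWMA (λ = 0.25, c = 2.87) under a shift Δ that is present from the start; the simulation budget is an arbitrary choice and the resulting numbers carry Monte Carlo error.

```python
import numpy as np

def cusum_rl(delta, k=0.47, h=5.0, rng=None):
    """Run length of the one-sided CUSUM S_n = max(0, S_{n-1} + X_n - k), alarm when S_n > h."""
    rng = rng or np.random.default_rng()
    s, n = 0.0, 0
    while True:
        n += 1
        s = max(0.0, s + rng.standard_normal() + delta - k)
        if s > h:
            return n

def ewma_rl(delta, lam=0.25, c=2.87, rng=None):
    """Run length of the one-sided fixed-limits EWMA chart."""
    rng = rng or np.random.default_rng()
    limit = c * np.sqrt(lam / (2.0 - lam))
    z, n = 0.0, 0
    while True:
        n += 1
        z = (1.0 - lam) * z + lam * (rng.standard_normal() + delta)
        if z > limit:
            return n

def mc_arl(rl_func, delta, reps=10_000, seed=3):
    rng = np.random.default_rng(seed)
    return float(np.mean([rl_func(delta, rng=rng) for _ in range(reps)]))

for delta in (0.0, 1.0):
    print(delta, mc_arl(cusum_rl, delta), mc_arl(ewma_rl, delta))
```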

In Table 2, the original results of Roberts and recent results are tabulated to demonstrate the effects of Roberts' mixture of performance measures. In order to obtain accurate numbers for our recent results, a Monte Carlo study with 10^8 repetitions was performed, and a density recursive estimation with quadrature was applied, except for MA, as described in Knoth (2003).

Table 2. Different ARL types for one-sided schemes given in Roberts (1966), original and recent results

(columns: Δ = 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0; rows: Shewhart, MA, EWMA with fixed and with varying limits, CUSUM, and GRSR, each with Roberts' original values, the recomputed D_9/D values, and the zero-state ARL L)

Both methods provide the same values. Therefore we take these results as the true ones. Note that the standard error of the Monte Carlo study is about or less than (ARL value)/10^4. Besides, we applied the Gauß-Legendre Nyström method for solving the ARL integral equation (and the left eigenfunction

integral equation as well) except for EWMA with varying limits, where this method is not applicable. These results support the previous ones. In Table 2, the results originally reported by Roberts and the corresponding values of the newer study are printed in bold. Furthermore, if the D values (the limit) differ from the D_9 ("very finite") ones, then this is indicated by an index which presents the D value. Roberts chose m = 9 as sufficiently large for approximating the case m = ∞ (m >> 1 might look more reasonable), which is supported by the more recent results for D. For the EWMA control chart we cannot observe differences in the "non zero-state" ARL values (D_9/D) when considering the fixed and the varying control limits scheme, at least for the chosen accuracy. Note that for none of the schemes is the limit D itself simulated or derived by the density recursion based approach. Instead, D_100 is taken as a substitute, because the sequence {D_m}, m = 1, 2, ..., is nearly constant for considerably small m already. The completely integral equation oriented approach exploits that behavior implicitly as well (the dominant eigenvalue is computed by means of the power method) and confirms all these approximation results again. Roberts obtained good accuracy for the out-of-control results, while the in-control values are evidently smaller (one positive exception only - MA) than the true values. The most important issue of the above table consists in the fact that in the case of the CUSUM scheme the worst-case ARL (E_1(L) = W = W^{PS} for the standard CUSUM) was taken, while all other schemes were evaluated by taking an average measure (D_9 ≈ D_100 ≈ D). Now, we return to the historical roots. Lorden (1971) introduced, five years after Roberts (1966), a very pessimistic performance measure, the quantity W (W_{Lorden}, see Section 2). He proved that the CUSUM control chart is asymptotically optimal in this sense. Later, Moustakides (1986) and Ritov (1990) extended this result to the finite case. Since for the classical CUSUM control chart the worst case and the usual zero state coincide (as for the simple Shewhart chart), the ARL could be established as a dominating measure. This property is not valid for the EWMA control chart, which was not very popular in those times. After Hunter (1986) and Lucas & Saccucci (1990) it regained attention. Afterwards, it was very common to compare control charts in terms of their ARL. Remember that Woodall & Maragah (1990) in the discussion of Lucas & Saccucci (1990) described the inertia problem of EWMA control charts which causes bad worst-case behavior. In order to get some more insight into the inertia problem of the EWMA control chart we want to use the α-worst-case ARL, i.e. W_α, which accounts for the probability of being in the worst-case condition. We look at the (upper) EWMA example given by Roberts (1966). Remember that for deriving the ARL values, all our quadrature based approaches demand a bounded continuation region (the more popular Markov chain approach needs it as well), so that we truncate the region for our study at z_r = −6√(λ/(2−λ)), which seems to be reasonable for an (upper) threshold value of 2.87√(λ/(2−λ)). Thus, the

+

EWMA statistic follows 2, = max {z,, (1 - A)ZnP1 AX,). There are no substantial differences between the truncated scheme and the primary scheme validated by the Monte Carlo results. For instance, for A = 1Monte Carlo simulation yields 10.0205 with standard error 0.0007, while quadrature provides 10.0193 in case of computation of the zero-state ARL, El (L) . Analogously in case of the steady-state ARL, D,the values are 9.8468 (standard error 0.0007) and 9.8473, respectively. Therefore we are allowed to use the truncated scheme for a worst case analysis. In Table 3, the corresponding values are tabulated. The worst-case ARL W is based on the zero-state ARL with zo = z,, i. e. the EWMA statistic 2, has to bridge the broadest gap to signal. The likelihood of 2, is small being Table 3. Worst-case ARLs, W e , for the upper EWMA chart of Roberts (1966)

with

CY E

chart

{0.05,0.01,0.001)

I

A 0 0.5 1.0 1.5 2.0 2.5 3.0

GRSR 1697 33 10.4 6.2 4.4 3.5 2.91

type of ARL

C/W

near by or just at the lower border (steady-state probability is smaller than lop9), so that in fact W is equal to W,lo-s. Recall that with (steady-state) probability 1 - a the average number of observations (or samples) from the change point m until signal is smaller than or equal to W,. Taking a = 0.5 (W.5, not given in Table 3), we observe the same values like for E1(L) for the given accuracy, i. e. the median-worst-case resembles the zero-state result (approximately). Regarding the numbers in Table 3, we see that the onesided EWMA control chart is inferior to CUSUM and GRSR in terms of the worst-case behavior even for "more likely worst cases". Pollak & Siegmund (1985), Pollak (1985), and Pollak (1985) considered a further worst-case measure in the 1980s. The quantity WpS (see again Section 2) takes only the worst change point position, but not the worst possible incontrol past as Lorden's measure does. Thus, it is not completely unexpected that a different scheme from CUSUM won the new race. The previous mentioned Shiryaev-Roberts rule with a "slight" modification is asymptotically optimal in terms of minimizing WPS. The modification consists in replacing the deterministic starting value zo by a random variable with a density which

characterizes the steady-state distribution of the chart statistic. This means that we start our control chart in an artificial steady-state. Mevorach & Pollak (1991) compared both the CUSUM and the Shiryaev-Roberts scheme by their steady-state behavior for exponentially distributed data. As evident Table 2 the steady-state results of both schemes are very similar, while the ShiryaevRoberts rule is slightly better. Mei (2003) mentioned that strict optimality for the W p Scriterion is still an open problem. For conservative users, however, the CUSUM chart exhibits that valuable worst-case behavior in terms of Lorden's measure. See in Table 2 that now CUSUM is slightly better (except A = 0.5, but the schemes were designed for A = 1) than GRSR (Shiryaev-Roberts). Recall that for CUSUM and GRSR schemes the zero state resemble the worst case. From Tables 2 and 3 we learn that in the one-sided framework CUSUM and GRSR are the dominating schemes. Only in terms of steady-state measures, EWMA behaves nearly as CUSUM and GRSR. The situation becomes better for EWMA in the two-sided case. In the famous EWMA paper by Lucas & Saccucci (1990) the two-sided case was examined. And the authors stated: "Our comparisons showed that the ARLs for the E W M A are usually smaller than the ARLs of CUSUM up to a value of the shift near the one that the scheme was designed to detect." This is a very common statement in control charting literature. Let us reconsider the results of their paper. In Table 4 the results of the comparison study are reorganized with some corrections (mostly regarding the worst-case ARL, where both authors did compute the wrong values for small A). The target mean change is A = 1. And in terms of W and 2)the control chart CUSUMl is the first choice for all considered shifts starting fiom 1. In the two-sided case, however, the differences between all these schemes are considerably small. For a more comprehensive study we refer to Section 5. Finishing our historical journey of performance measuring we want to cite some further ideas. In Dragalin (1988, 1994a) the optimality problem was considered if the out-of-control parameter is unknown. The resulting chart, a generalized CUSUM scheme, is based on the GLR (Generalized Likelihood Ratio) statistic. Some more recent results in that field are Gordon & Pollak (1997), Yakir (1998), and very recently the PhD thesis of Mei (2003). These results, however, stem from the "most theoretical stream" among statisticians working in change point detection. Thus, it needs some more time to understand what their results mean in practice (be aware of terms like "asymptotic"). The same is valid for results like Aue & Horvath (2004) and citations therein, where it is not clear whether the results could employed in SPC or not. The consideration of several papers by Lai (1973 - 2001) lead to further complications. He advocates the steady-state look on control charts, which will be more discussed in the next section. Finally, we mention Gan (1995), who took the median instead of the expectation of the run length. Margavio, Conerly, Woodall & Drake (1995) consider alarm rates, Tartakovsky (1995)

Table 4. Results from Lucas & Saccucci (1990), reordered and adjusted; the smallest ARL value among the four schemes is boldly typed; the second CUSUM scheme is added only here; the indexes indicates that the original result (equals to the index) is replaced; EWMAl - X = 0.133, c = 2.856, EWMA2 - X = 0.139, c = 2.866, CUSUMl - k = 0.5, h = 5, CUSUMz - k = 0.4, h = 6, E,(L) x 465 for all charts. chart

CUSUMl CUSUM2 EWMAl

1

.25

.5

139 120

38.0 33.6

114

32.6

A .75 1. 1.5 worst-case ARL W

17.0 10.4 5.75 16.5 10.6 6.19 steady-state ARL V 15.6 9.85g.84 5.62

2.

2.5

3.

4.01 4.41

3.11 2.57 3.46 2.88

3.98

3.13

2.61

zero-state ARL C

analyzes nonhomogeneous Gaussian processes, Poor (1998) exploites a n exponential instead of a linear penalty for the delay, and Morais & Pacheco (1998) and Morais (2002) employ stochastic orderings for the analysis of control charts. Trying a rksum6, we suggest t o roughly distinguish the worst-case and the steady-state approach of assessing detection power. Hence, the simple look a t the usual (zero-state) ARL becomes less and less appropriate. In the next section, a very surprising behavior of one-sided EWMA control charts will be demonstrated. Looking for minimal out-of-control (zero-state) ARLs leads in some cases to control charts with strange properties.

4 The dilemma of the minimal out-of-control ARL

criterion Already Lai (1995) criticized the uncontrolled usage of the (zero-state) ARL by saying this: The ARL constraint Ee,(T) 2 y stipulates a long expected duration to false alarm. However, a large mean of T does not necessarily

imply that the probability of having a false alarm before some specified time m is small. I n fact, it is easy to construct positive integer-value random variables T with a large mean y and also having a high probability that T = 1. And

then in FrisQn (2003) the "minimal out-of-control ARL criterion" was directly attacked. In the one-sided case, she proved that the optimal EWMA control chart for a given out-of-control shift has an arbitrary small X > 0. Up to now it seemed to be impossible to analyze EWMA control charts with such a small A. Novikov (1990) provides asymptotical ARL results for X -+ 0, but his approach could not be used here, because the alarm threshold is fixed and does not depend on A. This leads to E,(L) + cc for X -+ 0. Moreover, the larger E,(L) is the larger the optimal X (in terms of both C and 2)). While for CUSUM and GRSR the corresponding value does not depend on Em (L), it is generally difficult to analyse EWMA optimality in Novikov's framework. Thus, in Fris6n (2003) a Monte Carlo study was employed to illustrate this phenomenon. Furthermore, one-sided EWMA control charts are not usual for mean monitoring. Therefore, we illustrate the dilemma by considering an upper EWMA control chart for monitoring the normal variance. Before F'risQn (2003) it was common sense that an EWMA control chart design is chosen to ensure a given in-control ARL value and a minimal out-ofcontrol ARL. For instance, Mittag, Stemann & Tewes (1998) consider several control charts for monitoring a normal variance. Rational subgroups of size 5

Fig. 1. Like in Mittag et al. (1998): Relative ARL efficiency W(E) = ARLEWMA etc.(~)/ARL~hewhart S ( E ) (E = U/Q) of EWMA and CUSUM control charts for monitoring normal variance, E, (L) = 250.

are formed and the sample variance S2was computed. Based on S2,different control charts were considered. The whole comparison was summarized in a figure like Figure 1. The control charts under consideration are EWMA charts based on S2 and S with continuation regions [0,c:], an EWMA chart based on In S2with

reflecting barrier at l n a i = 1 (from Crowder & Hamilton 1992), a CUSUM scheme based on lnS2 (like in Chang & Gan 1995), and a CUSUM scheme based on S2 (not included in the original paper of Mittag et al. 1998). The ARLs were computed by means of the Nystrom method based on G a d Legendre quadrature for the ln S2 based charts, while piecewise collocation methods as in Knoth (2005a, 2004b) were employed for the S2 (and S) based charts. Mittag et al. (1998) utilized the Markov chain approximation which is less accurate. The target out-of-control standard deviation is 1.5 and the dominating (in terms of the zero-state ARL) EWMA-S2 control chart looks like:

Now, we do not want to discuss whether this is a fair comparison between the control charts by taking the zero-state ARL. On the contrary, we want to look for a more optimal EWMA-S2 chart in terms of the out-of-control ARL for E = 1.5. Mittag et al. (1998) took X E [0.05,1] which provides the smallest E1(L) for given E,(L) = 250. The lower border 0.05 was chosen because of numerical difficulties (and applicational reasons). What happens if we are able to consider X < 0.05? The answer is given by Figure 2.

Fig. 2. Out-of-control ARL E l ( L )vs. X for EWMA-S2control charts with in-control ARL E,(L) = 250, rational subgroups with size 5.

The dotted line marks the lower X limit used by Mittag et al. (1998). The = 0.000 042. Taking X = 0.000 041 smallest X that was used for Figure 2 is,iX,

would need a critical value c (to give E,(L) = oo) which is negative, so that does not belong to the transition region of the the starting value zo = chart. Thus, Xmi, seems to be a natural border for reasonable X's, because E,(Z,) = a; (for all n) should be a value of the inner transition region. The ARL values were computed by exploiting again the collocation approach and with increasing dimension for decreasing A. The smallest X was validated by a Monte-Carlo study with 10' repetitions which provided 250.103 (s.e. 0.091) and 1.3628 (s.e. 0.0000) for the collocation results E,(L) = 250 and E1(L) = 1.3630, respectively. First, it is an unpleasant behavior if the minimum is obtained at the (lower) border of the X domain. Second, the false alarm probability P,(L = 1) is about 0.4! Thus, we constructed an EWMA control chart with E,(L) = 250 and a high alarm probability at chart startup, whose easy construction Lai mentioned in his paper. Third, with decreasing X the inertia problem becomes more heavier, that is, the worst-case ARL increases. Hence, should we take the local instead of the global minimum? In Figure 3, where we take subsamples of size 2 (or use individual observations and known, fixed mean p o ) and again in-control ARL with E,(L) = 250, there is no local minimum. Everybody,

~02

Fig. 3. Out-of-control ARL El (L) vs. X for EWMA-S2 control charts with in-control

ARL E,(L) = 250, rational subgroups with size 2.

who wants to design an EWMA control chart in the usual way, will take the smallest X under consideration, which will lead to some possible strange side effects. However, the corresponding curve of the steady-state ARL, V ,for the EWMA-S2 control chart of Mittag et al. (1998), would lead to a global, inner minimum again, cf. Figure 4.

zero-state ARL .........

.. .-

;

steady-state ARL ...................................!.. ........ % :

Fig. 4. Out-of-control zero-state ARL E l ( L ) and steady-state ARL 2) vs. X for EwMA-s2 control charts with in-control ARL E,(L) = 250, rational subgroups with size 5 .

In Figure 4 we see that for not too small X both ARL types under consideration, i. e. E1(L)and V, nearly coincide. Thus, the zero-state ARL, E1(L), appears like a suitable performance measure as long it mimics the steady-state ARL, D. Note that, finally, the one-sided EWMA control chart could act as nontrivial case for Lai's anti-(zero-state) ARL example. Lai (1995) proclaimed the following:

I n practice, the system only fails after a very long in-control period and we expect many false alarms before the first correct alarm. It is therefore much more relevant to consider (a) the probability of no false alarm during a typical (steady state) segment of the base-line period and (b) the expected delay i n signaling a correct alarm, instead of the ARL which is the mean duration to the first alarm assuming a constant in-control or out-of-control value. Hence, the consideration of the steady-state behavior seems to be of current interest. The probability mentioned in (a) is directly linked to the incontrol V, because in steady state the run length is geometrically distributed. And for (b) we employ the out-of-control V. Eventually, as for similar considerations of Markov chain quasi-stationary behavior we ought to distinguish the steady-state concept described by conditioning on {L _> m ) with m -t oo, and a kind of cyclical steady-state where after each blind alarm the chart is restarted (see, e. g., Shiryaev 196313).

In the next section a more detailed study by exploiting the random variable D* provides a good comparison of all the competing ARL concepts for CUSUM, EWMA, and GRSR schemes.

5 The steady-state delay D* as framework for comparing control chart performance measures First of all, during the whole section we consider control charts monitoring a normal mean. The in-control mean is po = 0, while the target out-of-control mean is p1 = 1. The variance is set by g 2 = 1. The in-control ARL E,(L) is given by 500. Remember that the zero-state ARL could be written as L,(zo) with actual mean p and initializing value zo (for the chart statistic 2,). Then with the steady-state density of Z,, or more suitably let Z* the steady-state chart statistic which possesses the density function +(.) (see Appendix) we define the steady-state delay by D; = L,(Z*) . Due to the positiveness of density +(.) and using the Lebesgue's Dominated Convergence Theorem we conclude that (see again Appendix) W = ess sup D; D

,

= E(D;).

As conjectured before we state that WpS = max{D, L,(zO)). With D* (in the sequel we suppress the index p) we utilize a random variable, whose expectation resembles the steady-state ARL, D, its essential supremum is equal to the worst-case ARL, W, due to Lorden, and its range coincide with that of the zero-state ARL function. And for small m already we could observe that the probability laws of D& and D* are nearly the same. Thus it might be a good framework for dealing with the competing control chart measures. This analysis will be done in terms of the survival function of D*. First, we start with one-sided control charts. To deal with the one-sided EWMA problem we allow reflection at a pre-specified lower border z,. Moreover, we set X = 0.155 which minimizes the out-of-control D at p l = l for the chart without reflexion, cf. Section 4. Figure 5 demonstrates the effects of different values of z,. Increasing the reflection border improves the worst-case and the steady-state ARL. There are some turning points, however. Regarding the values of Table 5 we realize that the best W could be achieved around 0 and the best D between -1 and 0 (we take, finally, z, = -0.4). Setting z, = 0, that is reflection at the incontrol mean of the chart statistic (without reflection), is a usual approach for one-sided variance EWMA control charts (see Section 4). Therefore, we choose the EWMA charts with z, E (-6, -0.4,O) (z, = -6 should mimic the

Fig. 5. Survival function P(D* > 1) for one-sided EWMA charts (A = 0.155) with different reflection borders z,, E,(L) = 500, at out-of-control mean p = 1.

Table 5. Zero-state, steady-state, and worst-case ARL for one-sided EWMA charts (A = 0.155) with different reflection borders z,, X = 0.155, E,(L) = 500, at out-ofcontrol mean p = 1.

chart without any reflection) as candidates for the comparison with CUSUM and GRSR. Now, CUSUM and GRSR are adjusted (Ic = 0.5) to work perfectly for a shift of size 1. In Figure 6, all 5 control charts under consideration are compared in terms of the survival function P ( D * > 1). In Table 6 some numerical values complement the figure. In Figure 6, P ( D * > I ) curves looks very similar up t o 1 = 9. At 1 = 9.13 = WcusuM, PcusuM(D* > I ) jumps to 0 driven by the reflection barrier 0 of the CUSUM scheme. Similar, but weaker effects one can observe for the reflected EWMA charts (see as well Figure 5). The first EWMA curve that jumps, belongs to z, = 0, then the curve jumps with z, = -0.4, and for z, = -6 we do not see any jump ( P ( D * > 12) is about 0.004, while W = W,lo-, = 14.52). The latter EWMA chart exhibits the unhappy tail behavior: with (steadystate) probability of about 0.2 the delay is larger than the worst cases of the 4 competitors. Based on the zero-state ARL, though, this EWMA chart would

Fig. 6. Survival function P(D* > 1) for one-sided control charts, E,(L) out-of-control mean p = 1.

= 500, at

Table 6. Zero-state, steady-state, and worst-case ARL for one-sided control charts, E,(L) = 500, at out-of-control mean p = 1.

chart GRSR CUSUM

EWMA z, = -6

z, = -0.4

ZT = 0

be the best chart. Finally, the GRSR scheme (Shiryaev-Roberts) provides a smooth and beneficial shape with the smallest expectation V = E ( D * ) .Note that the CUSUM chart operates with a probability larger than 0.5 in worst case condition. Now, let us consider the two-sided case. For the EWMA control chart, things become easier. No reflection is needed and the worst-case problem lessens. We include two different values of A, one for the smallest out-of-control L: (A = 0.1336) and one for the smallest out-of-control 2) (A = 0.1971). The two-sided CUSUM scheme is formed as usually by two one-sided charts. While it is untroublesome to compute the zero-state ARLs (see Lucas & Crosier 1982), it is much more demanding for the steady-state ARL (and of course for dealing with D*). For the latter, a two-dimensional Markov chain (for the Markov chain approach see Brook & Evans 1972) is adapted. Then by means of that discrete model, 2) and P ( D * > 1) are approximated. In the same way the two-sided GRSR scheme is treated (with the exception that the zero-state ARL is approximated by the Markov chain method as well).

We remark that there are different ways of constructing a two-sided GRSR scheme. According t o Pollak & Siegmund (1985) we would take the average of two one-sided GRSR statistics (one upper, one lower scheme), and the combined scheme signals if that value crosses a threshold. We used the usual coupling as in the CUSUM case, that is, the first single scheme which signals, generates the alarm of the combined scheme. While the zero-state ARL values coincide (LPS = Lcoupling= 11.142), the steady-state values slightly differ (VpS = 9.644 # 9.630 = 'Dcoupling). The last chart is a modification of the CUSUM chart due to Crosier (1986), which results in a single chart. For the sake of simplicity we do not consider couplings of two one-sided EWMA charts. These control charts are compared in the same way as we did it for the 7 the survival function P(D*> 1) and in Table 7 one-sided ones. In Figure the ARL types under consideration are presented.

Fig. 7. Survival function P(D* > I ) for two-sided control charts, E,(L) out-of-control mean p = 1.

= 500,

at

First of all, the two-sided CUSUM and GRSR schemes behave similarly to their one-sided counterparts. Crosier's CUSUM modification surpasses both EWMA charts in all ARL types (see Table 7). The worst-case ARLs, W, of the EWMA charts, however, reach tenable size. But with probability of about 0.3 both EWMA delays are larger than the worst cases of GRSR and CUSUM. The main advantage of the EWMA charts (and Crosier's CUSUM) is the simpler setup and analysis. At last, what measure, W or V, turns out to be the appropriate one? We see from Figure 7 and partially from Figure 6 that W does not resemble a "representive" measure. For CUSUM only the worst case is the usual case.

Table 7. Zero-state, steady-state, and worst-case ARL for two-sided control charts, E,(L) = 500, at out-of-control mean p = 1. chart GRSR CUSUM

CrosierA = 0.1336 X = 0.1971 CUSUM EWMA

Looking at the shapes of the P ( D * > 1) curves of CUSUM and GRSR, it is difficult to decide which scheme is better. For 1 < WcusuM the GRSR curve lies below the CUSUM curve, and for 1 2 W c u s u ~vice versa. Taking the expectation of D* seems to be a reasonable answer for that problem. But, further questions could be asked. Should we consider also the second moment or quantiles of the run length? Both types were adressed in literature. However, optimal solutions are now even more difficult to find. Finally, I would like to take this opportunity to thank the referee for helpful comments and corrections.

6 Appendix Properties of D* Based on the filtrations given in (2) we rewrite the conditional ARL D h for Markovian control charts (that is, the distribution of the current chart statistic does only depend on the previous statistic) with transition kernel M ( . , .). For instance, the EWMA transition kernel for monitoring normal mean looks like M(5, z) = 4((z - (1 - X)i)/X) /A with the normal density $(.). Denote 0 the continuation region [c: ,ct].

The measure M is equivalent to the Lebesgue measure with one exception for control charts with a reflecting barrier c; (or c: analogously), where M(cl*) = 1 (see Woodall 1983). Madsen & Conn (1973) proved that for primitive kernel functions M(., .) (fulfilled, e. g., for monitoring normal mean or variance),

fi(-) uniformly converges to $(.), the left normalized, positive eigenfunction of the kernel M ( - ,.). Let D* = L(Z*) with Z*

N

$(.)

Then, by means of Lebesgue's Dominated Convergence Theorem we can prove that

fkPl( z )L ( z )dM ( z ) =

lim E , ( ~ - m + l l ~ > m = ) lim Dm

m-co

m--roo

=v, and P(D* > I )

= =

10

$ ( z ) l { L ( z )> I ) d M ( z )

lirn m+m

10

f~-l(~)l{~(~)>l)dM(~)

= lim P(D; m+cc

>I).

For one-sided control charts, the ARL function L(z) is usually decreasing in z so that by using L ( z ) > 1 H z < C 1 ( l ) the above survival function simplifies to (0 = [c; ,c;])

P(D* > 1 )

= 1 ( 1 ) + ( z )d M ( Z ) .

For two-sided control charts things become more complicate: CUSUM and GRSR need a two-dimensional analysis, while for EWMA the L(z) is not monotone anymore.

References Aroian, L. A. & Levene, H. (1950). The effectiveness of quality control charts, J. Amer. Statist. Assoc. 45: 520-529. Aue, A. & Horvath, L. (2004). Delay time in sequential detection of change, Stat. Probab. Lett. 67: 221-231. Barnard, G. A. (1959). Control charts and stochastic processes, J. R. Stat. Soc., Ser. B 21(2): 239-271. Bather, J. A. (1963). Control charts and minimization of costs, J. R. Stat. Soc., Ser. B 25: 49-80. Brook, D. & Evans, D. A. (1972). An approach to the probability distribution of CUSUM run length, Bzometrzka 59(3): 539-549. Chang, T. C. & Gan, F. F. (1995). A cumulative sum control chart for monitoring process variance, Journal of Quality Technology 27(2): 10S119.

Crosier, R. B. (1986). A new two-sided cumulative quality control scheme, Technometrics 28(3): 187-194. Crowder, S. V. & Hamilton, M. D. (1992). An EWMA for monitoring a process standard deviation, Journal of Qualzty Technology 24(1): 12-21. Dragalin, V. (1988). Asimptoticheskie reshenya zadachi obnaruzhenya razladki pri neizvestnom parametre, Statisticheskie Problemy Upravlenya 83: 47-51. Dragalin, V. (1994). Optimality of generalized Cusum procedure in quickest detection problem, Proceedings of the Steklov Instztute of Mathematics 202(4): 107119. Ewan, W. D. & Kemp, K. W. (1960). Sampling inspection of continuous processes with no autocorrelation between results, Bzometmka 47: 363-380. Friskn, M. (2003). Statistical Surveillance. Optimality and Methods, Int. Stat. Rev. 71(2): 403-434. Friskn, M. & de Mark, J. (1991). Optimal surveillance, Bzometrzka 78(2): 271-280. FrisBn, M. & Wessman, P. (1998). Quality improvements by likelihood ratio methods for surveillance, in B. Abraham (ed.), Quality improvement through statistical methods. International conference, Cochin, India, December 28-31, 1996, Statistics for Industry and Technology, Boston: Birkhauser, pp. 187-193. Gan, F. F. (1995). Joint monitoring of process mean and variance using exponentially weighted moving average control charts, Technometrzcs 37: 446-453. Girshick, M. A. & Rubin, H. (1952). A Bayes approach to a quality control model, Ann. Math. Stat. 23: 114-125. Gordon, L. & Pollak, M. (1997). Average run length to false alarm for surveillance schemes designed with partially specified pre-change distribution, Ann. Stat. 25(3): 1284-1310. Hawkins, D. M., Qiu, P. & Kang, C. W. (2003). The changepoint model for Statistical Process Control, Journal of Qualzty Technology 35(4): 355-366. Ho, C. & Case, K. E. (1994a). Economic design of control charts: a literature review for 1981 - 1991., Journal of Qualzty Technology 26: 39-53. Ho, C. & Case, K. E. (1994b). The economically-based EWMA control chart, Int. J. Prod. Res. 32(9): 2179-2186. Hunter, J. S. (1986). The exponentially weighted moving average, Journal of Qualzty Technology 18: 203-210. Keats, J. B., del Castillo, E., von Collani, E. & Saniga, E. M. (1997). Economic modelling for statistical process control, Journal of Quality Technology 29(2): 144147. Knoth, S. (2003). EWMA schemes with non-homogeneous transition kernels, Sequential Analysis 22(3): 241-255. Knoth, S. (2005a). Accurate ARL computation for EWMA-SZ control charts, Statistics and Computing 15(4): 341-352. Knoth, S. (2005b). Computation of the ARL for CUSUM-S2 schemes, Computatzonal Statistics &' Data Analysis in press. Lai, T . L. (1973). Gaussian processes, moving averages and quick detection problems, Ann. Probab. l(5): 825-837. Lai, T. L. (1974). Control charts based on weighted sums, Ann. Stat. 2: 134-147. Lai, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems, J. R. Stat. Soc., Ser. B 57(4): 613-658.

Lai, T. L. (1998). Information bounds and quick detection of parameter changes in stochastic systems, IEEE Transactzons on Informatzon Theory 44(7): 29172929. Lai, T. L. (2001). Sequential analysis: Some classical problems and new challenges, Statzstzca Sznzca 11:303-408. Lorden, G. (1971). Procedures for reacting to a change in distribution, Ann. Math. Stat. 42(6): 1897-1908. Lucas, J. M. & Crosier, R. B. (1982). Fast initial response for CUSUM qualitycontrol schemes: Give your CUSUM a head start, Technometncs 24(3): 199205. Lucas, J. M. & Saccucci, M. S. (1990). Exponentially weighted moving average control schemes: Properties and enhancements, Technometncs 32: 1-12. Madsen, R. W. & Conn, P. S. (1973). Ergodic behavior for nonnegative kernels, Ann. Probab. 1: 995-1013. Margavio, T . M., Conerly, M. D., Woodall, W. H. & Drake, L. G. (1995). Alarm rates for quality control charts, Stat. Probab. Lett. 24: 219-224. Mei, Y. (2003). Asymptotzcally optzmal methods for sequentzal change-poznt detectzon, Ph. D. dissertation, California Institute of Technology. Mevorach, Y. & Pollak, M. (1991). A small sample size comparison of the Cusum and Shiryayev-Roberts approaches to changepoint detection, Amencan Journal of Mathematzcal and Management Sczences 11: 277-298. Mittag, H.-J., Stemann, D. & Tewes, B. (1998). EWMA-Karten zur ~ b e r w a c h u n ~ der Streuung von Qualitatsmerkmalen, Allgemeznes Statzstzsches Archzv 82: 327-338. Morais, M. C. (2002). Stochastzc ordenng zn the performance analyszs of qualzty control schemes, Ph. D. dissertation, Instituto Superior TBcnico, Technical University of Lisbon. Morais, M. C. & Pacheco, A. (1998). Two stochastic properties of one-sided exponentially weighted moving average control charts, Commun. Stat. Szmula. Comput. 27(4): 937-952. Moustakides, G. V. (1986). Optimal stopping times for detecting changes in distributions, Ann. Stat. 14(4): 1379-1387. Novikov, A. (1990). On the first passage time of an autoregressive process over a level and a application to a "disorder" problem, Theor. Probabzlzty Appl. 35: 269-279. translation from Novi:1990a. Page, E. S. (1954a). Continuous inspection schemes, Bzometnka 41: 100-115. Page, E. S. (195413). Control charts for the mean of a normal population, J. R. Stat. Soc., Ser. B 16: 131-135. Pollak, M. (1985). Optimal detection of a change in distribution, Ann. Stat. 13: 206227. Pollak, M. & Siegmund, D. (1985). A diffusion process and its applications to detecting a change in the drift of brownian motion, Bzometnka 72(2): 267-280. Poor, H. V. (1998). Quickest detection with exponential penalty for delay, Ann. Stat. 26(6): 2179-2205. Ritov, Y. (1990). Decision theoretic optimality of the CUSUM procedure, Ann. Stat. 18(3): 1464-1469. Roberts, S. W. (1959). Control-charts-tests based on geometric moving averages, Technometncs 1: 239-250. Roberts, S. W. (1966). A comparison of some control chart procedures, Technometncs 8: 411-430.

Shewhart, W. A. (1931). Economzc Control of Qualzty of Manufactured Product, D. van Nostrand Company, Inc., Toronto. Shiryaev, A. N. (1963a). Ob otimal'nykh metodakh v zadachakh skorejshego obnaruzheniya, Teomya Veroyatnostez z eye pmmenenzya 8(1): 26-51. in Russian. Shiryaev, A. N. (1963b). On optimum methods in quickest detection problems, Theor. Probabzlzty Appl. 8: 22-46. Shiryaev, A. N. (1976). Statzstzcal sequentzal analyszs. Optzmal stoppzng rules. (Statzstzcheskzj posledovatel'nyj analzz. Optzmal'nye pravzla ostanovkz), Moskva: Nauka. Shiryaev, A. N. (2001). Essentials of the arbitrage theory, Part 111. http://www.ipam.ucla.edu/publications/fm200l/ashiryaev1.pdf. Tartakovsky, A. G. (1995). Asymptotic properties of CUSUM and Shiryaev's procedures for detecting a change in a nonhomogeneous gaussian process, Math. Methods Stat. 4(4): 389-404. Woodall, W. H. (1983). The distribution of the run length of one-sided CUSUM procedures for continuous random variables, Technometmcs 25: 295-301. Woodall, W. H. & Maragah, H. D. (1990). Discussion of "exponentially weighted moving average control schemes: Properties and enhancements", Technometmcs 32: 17-18. Woodall, W. H. & Montgomery, D. C. (1999). Research issues and ideas in statistical process control, Journal of Qualzty Technology 31(4): 376-386. Yakir, B. (1998). On the average run length to false alarm in surveillance problems which possess an invariance structure, Ann. Stat. 26(3): 1198-1214.

Misleading Signals in Joint Schemes for p and a Manuel Cabral Moraisl and Ant6nio Pacheco*12 Instituto Superior Tkcnico, Departamento de Matembtica/CEMAT Av. Rovisco Pais, 1049-001 Lisboa, PORTUGAL majQmath.ist . u t l . p t apacheco0math. i s t .utl .p t

Summary. The joint monitoring of the process mean and variance can be achieved by running what is termed a joint scheme. The process is deemed out-of-control whenever a signal is observed on either individual chart of a joint scheme. Thus, the two following events are likely to happen: a signal is triggered by the chart for the mean although it is on-target and the standard deviation is off-target; the mean is out-of-control and the variance is in-control, however, a signal is given by the chart for the standard deviation. Signals such as these are called misleading signals ( M S ) and can possibly send the user of the joint scheme to try to diagnose and correct a nonexistent assignable cause. Thus, the need to evaluate performance measures such as: the probability of a misleading signal (PMS); and the number of sampling periods before a misleading signal is given by the joint scheme, the run length t o a misleading signal (RLMS). We present some striking and instructive examples that show that the occurrence of misleading signals should be a cause of concern in practice. We also establish stochastic monotonicity properties for RLMS and monotone behaviors for PMS, which have important implications in practice in the assessment of the performance of joint schemes for the mean and variance of process output.

1 Joint schemes for p and a Control schemes are widely used as process monitoring tools t o detect simultaneous changes in the process mean p and in its standard deviation (T which can indicate a deterioration in quality. T h e joint monitoring of these two parameters can be achieved by running what is grandly termed a joint (or combined) scheme.

Table 1. Some individual and joint schemes for /I and u. Individual scheme for p

Acronym

X

S-p c-ll CS - p E-P ES - p

CUSUM Combined CUSUM-Shewhart

EWMA Combined EWMA-Shewhart

Sf -p C+ - p CS+ - p E+ - p Es+ - p

Upper one-sided x Upper one-sided CUSUM Combined upper one-sided CUSUM-Shewhart Upper one-sided E W M A Combined upper one-sided E WMA-Shewhart Individual scheme for a

Sf -0 C+ - 0

E+ - a ESf - u

Upper one-sided S2 Upper one-sided CUSUM Combined upper one-sided CUSUM-Shewhart Upper one-sided E W M A Combined upper one-sided E WMA-Shewhart

Joint scheme

Scheme for

CSC

SS CC CCS EE CES

-

0

S-fi

c-fi CS-p E -P ES-11

fi

Scheme for u

S+ - 0 C+ - 0 CS+ - u Ef -0 ES+ - o

The most popular joint schemes are obtained by simultaneously running a control scheme for p and another one for a (see for instance Gan (1989, 1995)), such as the ones in Table 1. Primary interest is usually in detecting increases or decreases in the process mean, and yet we consider both standard charts for p and upper one-sided charts for p . The former individual charts have acronyms S - p , C - p , CS-p, E - p and ES - p and the latter are denoted by Sf - p, C + - p, C S + - p, E+ - p and ES+- p; the description of these charts can be found in Table 1. Moreover, we only consider the problem of detecting inflations in the process standard deviation, because an increase in a corresponds to a reduction in quality and, as put by Reynolds Jr. and Stoumbos (2001), in most processes, an assignable cause that influences the standard deviation is more likely to result in an increase in a. Thus, only upper one-sided charts for a are considered in this paper. Their acronyms are St - a, C+ - a, C S + - a, E+ - a and ES+ - a and their description can be also found in Table 1.

As a result we shall deal with the joint schemes denoted by SS, C C , C C S ,

EE, C E S , S S + , C C + , C C S + , EE+ and C E S + , as described in Table 1. These joint schemes involve the individual charts for p and a whose summary statistics and control limits can be found in Tables 11 and 12, respectively, in the Appendix.

2 Misleading signals The process is deemed out-of-control whenever a signal is observed on either individual chart of the joint scheme. Thus, a signal from any of the individual charts could indicate a possible change in the process mean, in the process standard deviation or in both parameters. Moreover, the following two types of signals are likely to happen: a a

a signal is triggered by the chart for p although p is on-target and a is off-target; p is out-of-control and a is in-control, however, a signal is given by the chart for a.

These are some instances of what St. John and Bragg (1991) called "misleading signals" (MS). These authors identified the following types of misleading signals arising in joint schemes for p and a: I. the process mean increases but the signal is given by the chart for a, or the signal is observed on the negative side of the chart for p; 11. p shifts down but the signal is observed on the chart for 0,or the chart for p gives a signal on the positive side; III.an inflation of the process standard deviation occurs but the signal is given by the chart for p. Only type I11 correspond to what is called a "pure misleading signal" by Morais and Pacheco (2000) because it is associated t o a change in the value of one of the two parameters that is followed by an out-of-control signal by the chart for the other parameter - it corresponds to misinterpreting a standard deviation change as a shift in the mean. However, there is a situation that also leads to a "pure misleading signal" and is related to both misleading signals of Types I and 11: 1V.a shift occurs in p but the out-of-control signal is observed on the chart for a. This is called a misleading signal of Type IV (although it is a sub-type of Types I or 11) by Morais and Pacheco (2000) and it corresponds to misinterpreting a mean change as a shift in the process standard deviation. Let us remind the reader that diagnostic procedures that follow a signal can differ depending on whether the signal is given by the chart for the mean

or the chart for the standard deviation. Moreover, they can be influenced by the fact that the signal is given by the positive or negative side of the chart for p (that is, the observed value of the summary statistic is above the upper control limit or below the lower control limit, respectively). Therefore a misleading signal can possibly send the user of a joint scheme in the wrong direction in the attempt to diagnose and correct a nonexistent assignable cause (St. John and Bragg (1991)). These misleading results suggest inappropriate corrective action, aggravating unnecessarily process variability and increasing production (inspection) costs. In addition, we strongly believe that no quality control operator or engineer with proper training would be so naive to think that a signal from the scheme for the mean only indicates possible shifts in the mean. However, based on the independence between p and the RL distributions of the schemes for a, signals given by the scheme for a are more likely to be associated t o an eventual shift in this parameter. Nevertheless, the main question here is not whether there will be misleading signals but rather: 0

the '(probability of a misleading signal" (PMS); and the number of sampling periods before a misleading signal is given by a joint scheme, the "run length to a misleading signal" (RLMS).

St. John and Bragg (1991) believed that the phenomenon of misleading signals had not been previously reported. The fact that no such studies had been made is rather curious because misleading signals can arise in any joint scheme for multiple parameters (as, e.g., the multivariate CUSUM quality control schemes proposed by Woodall and Ncube (1985)) and also in any twosided control scheme for a single parameter. But, in fact, as far as we have investigated, there are few references devoting attention t o misleading signals or even realizing that one is confronted with such signals.

St. John and Bragg (1991). Figure 3 of this reference illustrates the frequency of misleading signals for various shifts in the process mean when a joint scheme of type (X, R) is used; the results were obtained considering 5000 simulated runs of subgroups of five observations from a normal process with mean p and standard deviation u. Yashchin (1985). In Figure 10 of Yashchin (1985) we can find three values of the probability of a signal being given by the upper one-sided chart for p when there is a decrease in this parameter; but the author does not mention that these values refer in fact to the probability of a misleading signal for a two-sided control scheme for p which comprises an upper onesided chart and a lower one-sided chart for p. Morais and Pacheco (2000). These authors provide formulae for the probability of misleading signals of Types I11 and IV for joint schemes for p and a. Based on those expressions these probabilities are evaluated for the joint schemes S S and EE. This paper also accounts for the comparison of these two joint schemes, not only in terms of conventional performance

measures such as ARL and RL percentage points, but also with regard to the probabilities of misleading signals. Morais and Pacheco (2001b). This paper introduces the notion of run length to a misleading signal and provides monotonicity properties to both PMS and RLMS of the joint EWMA scheme E E + . Reynolds Jr. and Stoumbos (2001). This paper refers to the joint monitoring of p and a using individuals observations and also discusses the phenomenon of misleading signals (although not referred as such). Table 3 provides simulation-based values not only of the probability of misleading signals of Types I11 and IV but also of the probability that correct signals (i.e, non misleading signals) occur and of the probability of a simultaneous signal in both individual charts, when p is on-target and a is out-of-control, and when 5 is in-control and p is off-target. The authors claim that these probabilities can provide guidelines in the diagnosis of the type of parameter(s) shift (s) that have occurred. Keeping all this in mind, this paper provides striking and instructive examples that alert the user to the phenomenon of (pure) misleading signals, namely when dealing with the ten joint schemes for ji and a in Table 1, whose constituent individual charts are introduced in the same table and described more thoroughly in the Appendix. The monotonicity behaviors of PMSs and the stochastic monotonicity properties of RLMSs of some of these joint schemes are also addressed in this paper. Comparisons between the joint schemes are also carried out, based on PMSs and RLMSs. The numerical study that we conduct is designed with careful thought into the appropriate selection of individual chart parameters to ensure common ARLs for these charts and hence fair comparisons among the joint schemes. Based on this extensive study, we believe that the PMSs and the RLMSs of Types I11 and IV should also be taken in consideration as additional performance measures in the design of joint schemes for p and a or any joint scheme for more than one parameter.

3 Probability of a misleading signal (PMS) Throughout the remainder of this paper po and a 0 denote the in-control process mean and standard deviation (respectively). We shall also consider that the shift in p is represented in terms of the nominal value of the sample mean standard deviation 6 = f i ( p - po)/ao and the inflation of the process standard deviation will be measured by 6' = a/ao with: -00 < 6 < +CQ and 6' 1, for the joint schemes S S , CC, EE, C C S and C E S ; and 6 0 and 6' 2 1, for schemes SS+,C C + , EE+, CCS+ and C E S + . In this setting misleading signals of Type I11 occur when 6 = 0 and 6' > 1. On the other hand, misleading signals of Type IV occur when: S # 0 and 6' = 1, for schemes SS, C C , EE, C C S and C E S ; and 6 > 0 and 8 = 1, for

>

>

schemes SS+,C C f , EE+, CCS+ and C E S + . And since we are dealing with a normally distributed quality characteristic, the summary statistics of the two individual charts for p and a are independent, given (S,8). Thus, we can provide plain expressions for the probabilities of misleading signals of Types I11 and IV, denoted in general by PMSIII(B) and P M S I ~ ( S ) . Lemma 1 - T h e expressions of the PMSs of Types 111 and I V for joint schemes involving individual schemes with independent s u m m a r y statistics (such as the t e n joint schemes i n Table 1 ) are

(or S > 0 w h e n we are using upper one-sided schemes for p), where RL,(6, 8 ) and RL,(B) represent the r u n lengths of the individual schemes for p and 5 , and P x (x), F x (x) and Fx (x) denote the probability, distribution and survival function of the discrete random variable X . The exact expressions of the PMSs of the joint schemes SS and SS+ follow immediately by plugging in the survival functions of the run lengths RL, and RL, into equations (2) and (3). The approximations to PMSs of the remaining joint schemes are found by using the Markov approximations (Brook and Evans (1972)) to the survival function of RL, and RL, and truncating the series (2) and (3). The approximate values of the PMSs converge to the true values due to the convergence in law of the approximate RLs involved in the definition of the PMSs. Theorem 2 - T h e monotonicity properties (1)-(16) in Table 2 are valid for the (exact) PMSs of Types 111 and I V of the joint schemes in Table 1 based exclusively o n upper one-sided individual charts: SS+, CC+ , CCS+, EE+ and C E S + .

The monotonicity properties of PMSs of Types I11 and IV given in Theorem 2 are intuitive and have an analytical justification - most of them follow

Table 2. Monotonicity properties of the (exact) PMSs of Types I11 and IV.

Joint scheme

Type I11 (6 = 0,0 > 1)

Type IV (6 > 0,0 = 1)

SS

(1) PMSIII(O) -1 with 0

(8) PMSrv(6) 1with 6

CC+, CCS+

(2) PMSrrl(8) with a! (3) PMSIII(B) 1 with P (Cl) PMSrrr (6)" (4) PMSrrr(6') 1 with k i (5) PMSrrr(O) I' with k$

+

r

(9)

(10) (11) (12) (13)

PMSrv(6) 1 with a! PMSrv(6) 1 with P PMSrv(6) 4 with 6 PMSrv(0) 1 with k,$ PMSrv(0) 1 with k$

(6) PMSrrr(6) T with a! (7) PMSrrr(0) 4 with P (C2) PMSrrr (8)"

(14) PMSrv(6) 1 with a (15) PMSrv(6) with P (16) PMSrv(6) 1 with 6 *(Cl, C2) Conjecture of no monotonc behavior in terms of 0

E E + , CESf

directly from expressions (1)-(4) of Lemma 1, and from the stochastic monotone properties of the RLs of the individual schemes for p and of those for 0.

Take for instance the PMS of Type IV of the joint scheme SS+,

It is a decreasing function of 6 because: the distribution of RLs+-,(l) does not depend on 6; RLF(6, 1) stochastically decreases with 6 and so does its survival function for any i and PMSIv,ss+(b) defined by Equation (3). As a consequence the joint scheme SS+ tends to trigger less misleading signals of Type IV as 6 increases. As for the monotone behavior of the PMS of Type I11 of the joint scheme S S + , PMSIII,ss+ (Q),it follows immediately that it is a decreasing function of Q if we note that, for 0 > 1,

=

[I-

1- -

-l

1 - W t 10)

Note that the properties of the PMS of the remaining joint schemes depend on parameters like: a and ,G that refer to the head start given to the individual Markov type scheme for p and a (respectively); and Ic that represents the reference value of a CUSUM scheme. The proof of the results concerning the Markov-type joint schemes can be found in Morais (2002, pp. 118-119) and follows from several facts: stochastically monotone matrices (Daley (1968)) arise naturally when we are dealing with upper one-sided Markov-type schemes;

r

it is possible to rewrite the survival function of RL of any of these schemes as an increasing functional of the associated Markov chain; we can establish a stochastic order relation, in the sense of Kalmykov (Kalmykov (1962)), between pairs of probability transition matrices, which allows t o compare the associated Markov chains, i.e., any pair of increasing functionals (Shaked and Shanthikumar (1994, p. 124)), thus establishing a n inequality between the survival functions of two competing run lengths.

The conjectures ( C l ) and (C2) included in the table of Theorem 2 concern the PMSs of Type I11 of the joint schemes C C + ( C C S + ) and EE+ ( C E S + ) , respectively, and surely deserve a comment. P M S l l l ( 0 ) involves in its definition RL,(O, 0) and RL,(O). If, in one hand, this latter random variable stochastically decreases with 0 (see Morais (2002, p. 169),), on the other hand, we cannot tell what is the stochastic monotone behavior of RL,(O, 8) in terms of 0, according to Morais (2002, pp. 169 and go), and Morais and Pacheco (2001a) for the case of scheme E+-p. Thus, establishing a monotonic behavior of PMSIIl(8) in terms of 0 seems to be non trivial. However, as we shall see, the numerical resuits in the next section not only illustrate Theorem 2 but also support conjectures ( C l ) and (C2). In fact values of P M S I ~ I ( O ) seem to decrease and then increase with 0 for the joint schemes CC+, C C S + , EEf and C E S + . The practical significance of this non-monotonous behavior is as follows: the joint scheme tendency t o misidentify a shift in u can increase as the displacement in c becomes more severe.

4 PMS: numerical illustrations As an illustration, we provide values for the PMSs of Types I11 and IV of the ten joint schemes in Table 1 considering: r r

r

sample size equal to n = 5; nominal values po = 0 and uo = 1; and b = 0.05,0.10,0.20,0.30,0.40,0.5,0.6,0.7,0.8,0.9,1.0,1.5,2.0,3.0and 0 = 1.01,1.03, 1.05,1.10,1.20,1.30,1.40,1.50,1.60,1.70,1.80,1.90,2.00, 3.00,

which practically cover the same range that was found useful previously in Gan (1989). With the exception of the joint schemes SS and SS+,the values of these probabilities are approximate and based on the Markov approach using 41 transient states and considering a relative error of in the truncation of the series in (2) and (3). The range of the decision intervals [LCL, UCL) of all the individual schemes for p and for u has been chosen in such way that, when no head start has been adopted, these schemes are approximately matched in-control and all the corresponding in-control ARLs are close to 500 samples (see Table 3). Take for instance the Shewhart-type schemes:

Table 3. Parameters and in-control ARLs of individual and joint schemes for p and 0.

Scheme for p

Parameters

S-P

Er

c-P CS-M E - IL ES-p

h,, [, y, [,

= 3.09023 = 22.7610 = 3.29053, h, = 31.1810 = 2.8891, A, = 0.134 = 3.29053, y , = 3.0934, A, = 0.134

Scheme for o

Parameters

S' - o C+ - o CSf -u E+-o ES+ - o

[:

= 16.9238

h: = 3.5069, k,f = 0.055 [: = 18.4668, h: = 3.9897, k: = 0.055

y,f = 1.2198, A: = 0.043 [ : = 18 4668, y,f = 1.3510, A: = 0.043

500.000 500.001 500.001 499.988 499.999

ARLc(1) 500.000 499.993 500.002 500.027 500.033

Joint scheme

No. of Iterations for ARL,,,,(O, 1)

ARL,,n(O, 1)

SS-a CC - o CCS - o EE-o CES - u

--

250.250 272.653 267.309 253.318 252.003

1882 1909 2050 2060

It should be also added that, in the case of the individual combined CUSUM-Shewhart and EWMA-Shewhart schemes for p and for u, all the Shewhart-type constituent charts have in-control ARL equal t o 1000. Therefore:

A few more remarks on the choice of the individual charts parameters ought to be made, namely that most of them were taken from the literature

in order to optimize the detection of a shift in p from the nominal value po to po 1.0 x aO/& and a shift in a from a0 to 1.25 x ao. LLOptimality" here means that extensive numerical results suggest that with such reference values and smoothing constants will produce an individual chart for p (a) with the smallest possible out-of-control ARL, ARL,(l.O, 1.0) (ARL,(1.25)), for a fixed in-control ARL of 500 samples. The upper control limits of the individual charts are always searched in such way that, when no head start has been adopted, they have in-control ARLs close to 500.

+

C - p: we adopted a null reference value because we are dealing with a standard CUSUM, a positive (negative) reference value would suggest that positive (negative) shifts are more likely to occur; also note that this standard individual chart is different from the two-sided scheme which makes use of an upper and a lower one-sided chart for p . a E - p: the smoothing constant A, = 0.134 was taken from Gan (1995) and agrees with Figure 4 from Crowder (1989); a C+ - p: the reference value k t = 0.5 is suggested by Gan (1991) for a two-sided chart for p; a E+ - p: we adopt the same smoothing constant A t = 0.134 as for E - p ; a C+ - a, Ef - a : Gan (1995) suggests a reference value k: = 0.055 and a smoothing constant A: = 0.043, respectively. a

Finally, note that the individual charts C S - p , E S - p , C S + - p , E S + - p , C S + - a, ES+ - a have the same reference values and smoothing constants as C - p, E - p, C+ - p , E+ - p, C + - a, E+ - a . However, the control limits are larger than the corresponding "non-combined" individual schemes and are obtained taking into account the supplementary Shewhart control limits and the in-control matching of the ARLs. We proceed to illustrate the monotonicity properties stated in the previous section with the joint scheme EE+. For that purpose some head starts (HS) have been given to this joint scheme: HS, = 0%, 50% and HS, = 0%, 50%. It is worth recalling that we did not consider HS,, HS, = -50% because both individual schemes are upper one-sided. The results in Table 4 not only show that PMSs of Types I11 and IV can be as high as 0.47 but also remind us of some of the monotonicity properties in Theorem 2, namely that:

a

giving head starts to the individual chart for p, E + - p , leads to an increase of PMSs of Type I11 and a decrease of the PMSs of Type IV; adopting a head start to the individual chart for a, E+ - a, yields a decrease of the values of PMS of Type I11 and an increase of the ones of Type IV; and underestimating the magnitude of the changes in p results in an overestimation of the PMS values of Type IV.

The numerical results also suggest that underestimating the magnitude of the changes in a also leads to an overestimation of the values of the PMSs

of Type I11 (recall Conjecture ( C l ) ) , for most of the values we considered for 9. However, note that P M S I I I ( ~ changes ) its monotonous behavior, for large values of 9, according to the values in bold in Table 4. Table 4. PMSs of Types I11 and IV for the joint scheme EE'.

The second illustration accounts for the comparative assessment of the schemes SS, C C , C C S , EE and C E S , and of the schemes SS+,C C + , C C S + , EE+ and C E S + , with regard to probabilities of misleading signals of Types I11 and IV. Tables 5 and 6 provide values of these probabilities for the former and the latter groups of five joint schemes, respectively. Also, no head starts have been given to any of the individual schemes whose approximate RL has a phase-type distribution. Values of PMSIV(6), for 6 < 0, were omitted from Table 5 by virtue of the fact that the run length RL,(6,1) is identically distributed to RL,(-6,l) for symmetric values of HS,, hence PMSIV(-6) for HS, = -a! x 100% equals P M S I ~ ( S for ) HS, = a! x loo%, where a! E [O,l). Tables 5 and 6 and Figures 1 and 2 show how the use of joint schemes based on CUSUM and E WMA summary statistics can offer substantial improvement with regard to the (non)emission of MSs of Type 111: schemes SS and SS+ tend to produce this type of MSs more frequently for a wide range of the values of 8. We can add that the joint schemes EE and C E S are outperformed by schemes C C and C C S (respectively) in terms of PMSs of Type 111. However, the joint schemes EE+ and C E S + appear t o offer a slightly better performance than C C + and C C S f , having in general lower PMSs of Type 111.

Table 5. PMSs of Types I11 and IV for the joint schemes SS, CC, CCS, E E and C E S (standard case). PMSIII(H) H

SS

CC

CCS

PMSIV(B) EE

CES

d

SS

CC

CCS

EE

CES

Fig. 1. PMSs of Types I11 and IV for the joint schemes SS, CC, CCS, EE and C E S (standard case).

The numerical results in Tables 5 and 6 and Figures 1 and 2 also suggest that the use of joint schemes C C S , CES, C C S + and C E S f instead of CC, E E , CCf and EEf (respectively), causes in general a n increase in PMSs of Type 111. This is probably due to the fact that a joint combined scheme has a total of four constituent charts (instead of the usual two), thus, twice as much sources of MSs. However, note that there are a few instances where combined joint schemes appear to trigger slightly less PMSs of Type I11 than their "non combined" counterparts, for moderate and large values of 8. Tables 5 and 6 and Figures 1 and 2 show that MSs of Type IV are more likely to happen in schemes SS and SS+ than while using the remaining joint schemes for p and a. All these latter joint schemes seem t o have a similar behavior in terms of the frequency of PMSs of Type IV, in particular when the individual charts for p are upper one-sided, as shown by Figures 1 and 2.

Table 6. PMSs of Types I11 and IV for the joint schemes S S f , C C f , E E f , CCSf and CES+ (upper one-sided case).

Fig. 2. PMSs of Types III and IV for the joint schemes SS+, CC+, CCS+, EE+ and CES+ (upper one-sided case).

5 Run length to a misleading signal (RLMS)

Another performance measure that springs to mind is the number of sampling periods until a misleading signal is given by the joint scheme, the run length to a misleading signal (RLMS) of Type III, RLMS_III(θ), and of Type IV, RLMS_IV(δ). RLMS_III(θ) and RLMS_IV(δ) are improper random variables with an atom at +∞, because the non-occurrence of a misleading signal is an event with non-zero probability.
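To make these quantities concrete, the following minimal Monte Carlo sketch (in Python, not part of the original paper) estimates PMS_III(θ) and percentage points of RLMS_III(θ) for a plain joint Shewhart scheme of the SS type. The 3-sigma X̄ and S chart designs, the sample size n = 5, the number of replications and the convention that a misleading signal of Type III occurs when the chart for μ signals strictly before the chart for σ (with δ = 0 and θ > 1) are illustrative assumptions, not the exact designs evaluated in this paper.

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(2006)

def c4(n):
    # unbiasing constant of the sample standard deviation for normal data
    return sqrt(2.0 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

def simulate_rlms_type_iii(theta, n=5, mu0=0.0, sigma0=1.0,
                           reps=10000, max_periods=5000):
    """Estimate PMS_III(theta) and RLMS_III(theta) for an SS-type joint scheme
    (3-sigma X-bar chart plus 3-sigma S chart) when only sigma shifts."""
    cc = c4(n)
    xbar_halfwidth = 3.0 * sigma0 / sqrt(n)                   # X-bar chart limits
    s_ucl = cc * sigma0 + 3.0 * sigma0 * sqrt(1.0 - cc ** 2)  # S chart limits
    s_lcl = max(0.0, cc * sigma0 - 3.0 * sigma0 * sqrt(1.0 - cc ** 2))
    rlms = np.full(reps, np.inf)   # +inf <=> no misleading signal of Type III
    for r in range(reps):
        rl_mu, rl_sigma = np.inf, np.inf
        for t in range(1, max_periods + 1):
            x = rng.normal(mu0, theta * sigma0, size=n)
            if abs(x.mean() - mu0) > xbar_halfwidth:
                rl_mu = t
            s = x.std(ddof=1)
            if s > s_ucl or s < s_lcl:
                rl_sigma = t
            if np.isfinite(rl_mu) or np.isfinite(rl_sigma):
                break
        if rl_mu < rl_sigma:    # the chart for mu signalled first: Type III MS
            rlms[r] = rl_mu
    pms_iii = np.mean(np.isfinite(rlms))
    return pms_iii, rlms

pms, rlms = simulate_rlms_type_iii(theta=1.05)
print("estimated PMS_III(1.05):", round(pms, 3))
print("10% percentage point of RLMS_III(1.05):", np.quantile(rlms, 0.10))
```

Coding the non-occurrence of a misleading signal as +∞ makes the improper nature of the RLMS explicit: a requested percentage point simply comes out as +∞ whenever its probability level is not below the estimated PMS, which is the behaviour discussed in the next section.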

The next lemma not only gives the survival functions of the RLMSs of Types III and IV but also provides alternative expressions that prove to be useful in the investigation of the stochastic monotonicity properties of this performance measure.

Lemma 3 - Let RLMS_III(θ) and RLMS_IV(δ) denote the RLMSs of Types III and IV of any of the joint schemes in Table 1. Then

(or δ > 0 when upper one-sided schemes for μ are in use), for any positive integer m. Exact expressions for the survival functions of the RLMSs of Types III and IV of the schemes SS and SS+ are once again obtained by plugging the survival functions of the run lengths RL_μ and RL_σ into equations (9) and (11). Additionally, if we consider the exact expressions of the PMSs of the joint schemes SS and SS+, we get

where F_RL(δ,θ)(m) = 1 − [1 − F_RL_μ(δ,θ)(m)] × [1 − F_RL_σ(θ)(m)] denotes the distribution function of the run length of the joint scheme. These alternative expressions of the distribution functions of the RLMSs are due to the fact that the constituent Shewhart charts of the schemes SS and SS+ deal in any case with (time-)independent summary statistics. Approximations to these performance measures for the remaining joint schemes are obviously obtained by replacing the survival functions in equations (9) and (11) by the corresponding Markov approximations; equations (13) and (14) do not hold in this case because of the (time-)dependence structure of their summary statistics.

Theorem 4 - The stochastic monotonicity properties (1)-(16) in Table 7 hold for the (exact) RLMSs of Types III and IV of the joint schemes SS+, CC+, CCS+, EE+ and CES+.

Table 7. Stochastic monotonicity properties for the RLMSs of Types III and IV.

Joint schemes CC+, CCS+
  Type III (δ = 0, θ > 1): (1) RLMS_III(θ) ↓st with α; (2) RLMS_III(θ) ↑st with β; (3) RLMS_III(θ) ↑st with k_μ+; (4) RLMS_III(θ) ↓st with k_σ+; (C4) RLMS_III(θ)*
  Type IV (δ > 0, θ = 1): (8) RLMS_IV(δ) ↑st with α; (9) RLMS_IV(δ) ↓st with β; (10) RLMS_IV(δ) ↑st with δ; (11) RLMS_IV(δ) ↓st with k_μ+; (12) RLMS_IV(δ) ↑st with k_σ+

Joint schemes EE+, CES+
  Type III (δ = 0, θ > 1): (5) RLMS_III(θ) ↓st with α; (6) RLMS_III(θ) ↑st with β; (C5) RLMS_III(θ)*
  Type IV (δ > 0, θ = 1): (14) RLMS_IV(δ) ↑st with α; (15) RLMS_IV(δ) ↓st with β; (16) RLMS_IV(δ) ↑st with δ

* (C3, C4, C5): without stochastically monotone behavior, in the usual sense, regarding θ.

The stochastic monotonicity properties described in Theorem 4 come as no surprise - they point in the opposite direction of the monotone behavior of the corresponding PMSs, except for Conjecture (C3), which refers to the joint scheme SS+, for which we proved that PMS_III(θ) is decreasing. These properties are ensured by the stochastic monotonicity properties of RL_μ(δ,θ) and RL_σ(θ) and equations (9)-(12). For example, the increasing behavior of the survival function of RLMS_IV,SS+(δ) follows from (11) and the fact that RL_μ(δ,1) stochastically decreases with δ. The run length to a misleading signal of Type III of the joint schemes CC+, CCS+, EE+ and CES+, RLMS_III(θ), stochastically decreases with α. This conclusion can be immediately drawn from (10) because: the in-control run length of a Markov-type scheme for σ with a β × 100% head start, RL_σ^β(1), does not depend on the head start α; and the run length of a Markov-type scheme for μ in the presence of a shift in σ and with an α × 100% head start, RL_μ^α(0,θ), stochastically decreases with α, so that F_RL_μ^α(0,θ)(i) increases with α for any i. However, this result could not be drawn from (9) because P(RL_μ^α(0,θ) = i) is not an increasing function of α, although RL_μ^α(0,θ) stochastically decreases with α. Similarly, to prove that RLMS_IV(δ) stochastically decreases with β, we have to use (12) instead of (11). We ought to add that, based on the percentage points of RLMS_III(θ) numerically obtained in the next section, we conjecture that RLMS_III(θ) has no stochastically monotone behavior, in terms of θ, for any of the five upper one-sided joint schemes under investigation.

6 RLMS: numerical illustrations

The percentage points of the RLMS are crucial because this random variable has no expected value (nor any other moment); note also that any p × 100% percentage point, for p equal to or larger than the corresponding PMS, is equal to +∞, as illustrated by Tables 8-10.

The presentation of the numerical results concerning the RLMS follows closely the one in Section 4. Thus, we use the same constellation of parameters for the ten joint schemes, and we begin with an illustration of some stochastic monotonicity properties of the RLMS of the joint scheme EE+. Table 8 and the following ones include the 1%, 5%, 10%, 15%, 20% percentage points of RLMS_III(θ) and RLMS_IV(δ) for a smaller range of θ and δ values: θ = 1.01, 1.03, 1.05, 1.10 and δ = 0.05, 0.10, 0.20, 0.30.

Table 8. Percentage points of the RLMSs of Type III and Type IV of the joint scheme EE+ (listed in order corresponding to the p × 100% = 1%, 5%, 10%, 15%, 20% percentage points, for each θ and each δ); percentage points of RL_min(0,θ) and RL_min(δ,1) are in parentheses.

Also, in order to give the user an idea of how quickly a misleading signal of Type III or Type IV is triggered by a joint scheme, the corresponding percentage points of RL_min(0,θ) and RL_min(δ,1) have been added in parentheses in Tables 8-10 and can thus be compared with the corresponding percentage points of RLMS_III(θ) and RLMS_IV(δ).

The results in Table 8 illustrate the findings concerning the stochastic monotonicity properties of the RLMSs of the scheme EE+. The emission of MSs of Type III is indeed speeded up by the adoption of a head start HS_μ; giving a head start to the individual chart for σ has exactly the opposite effect. Besides this, RLMS_IV(δ) stochastically increases with δ, which means that MSs of Type IV will tend to occur later as the increase in μ becomes more severe. However, the entries in bold in Table 8 show that a few percentage points of RLMS_III(θ) do not decrease with θ. Table 8 also gives the reader an idea of how soon misleading signals can occur. For instance, the probability of triggering a misleading signal of Type III within the first 59 samples is at least 0.10 when there is a shift of 1% in the process standard deviation and no head start has been adopted for scheme EE+.

Table 9. Percentage points of the RLMSs of Type III and Type IV for the joint schemes SS, CC, CCS, EE and CES (listed in order corresponding to the p × 100% = 1%, 5%, 10%, 15%, 20% percentage points, for each θ and each δ); percentage points of RL_min(0,θ) and RL_min(δ,1) are in parentheses.

Tables 9 and 10 allow us to assess and compare the performance of all the joint schemes in terms of the number of sampling periods until the emission of misleading signals. The examination of these two tables leads to overall conclusions similar to those referring to the PMSs. According to the percentage points in Table 9, the joint scheme SS tends to produce MSs of Type III considerably sooner than scheme CC, and the schemes EE, CCS and CES appear to trigger them later than scheme SS. This probably comes by virtue of the fact that PMS_III(θ) of scheme SS is larger than the corresponding PMSs of the remaining joint schemes, as mentioned in Section 4. Note, however, that the percentage points of RLMS_III(θ) for the schemes SS+, CC+, CCS+, EE+ and CES+ differ much less than when standard individual charts for μ are in use, as is apparent in Table 10.

Table 10. Percentage points of the RLMSs of Type III and Type IV (listed in order corresponding to the p × 100% = 1%, 5%, 10%, 15%, 20% percentage points, for each θ and each δ); percentage points of RL_min(0,θ) and RL_min(δ,1) are in parentheses.

The schemes CC and CCS tend to require more samples to trigger MSs of Type III than the joint schemes EE and CES, respectively. On the other hand, scheme CC+ appears to give such MSs almost as late as scheme EE+. The same seems to hold for schemes CCS+ and CES+. Supplementing the individual CUSUM charts for μ and σ of the joint CC scheme with Shewhart upper control limits seems to substantially speed up the emission of MSs of Type III. The same comment does not hold for schemes EE and CES, or CC+ and CCS+, or even EE+ and CES+. A brief remark on the percentage points of RLMS_III,SS+(θ): all the values suggest that this random variable stochastically decreases as the shift in the process standard deviation increases. However, recall that we proved that PMS_III,SS+(θ) decreases with θ. Thus, if we had considered larger values of θ, we would soon get percentage points equal to +∞ instead of smaller ones. In addition, note that Conjectures (C4) and (C5) concerning the remaining schemes CC+, CCS+, EE+ and CES+ are conveniently supported by the numerical results in bold in Table 10. As for RLMSs of Type IV, it is interesting to notice that the joint schemes SS and SS+ do not show such a poor performance when compared to their Markov-type counterparts CC, EE, CCS, CES and CC+, EE+, CCS+, CES+, respectively. Also, the joint schemes CC, EE, CCS and CES seem to have a similar performance as far as RLMS_IV(δ) is concerned, as suggested earlier by the PMS_IV(δ) values in Table 5 and by Figure 1. The same appears to happen with the schemes CC+, EE+, CCS+, CES+. In addition, the use of combined schemes yields in general smaller values for the percentage points of RLMS_IV(δ); that is, the MSs of Type IV tend to be triggered sooner. It is worth mentioning that the RLMS is important to assess the performance (in terms of misleading signals) of schemes for μ and σ based on univariate summary statistics, such as the Shewhart-type scheme proposed by Chengalur et al. (1989), whose summary statistic is of the form n⁻¹ Σᵢ₌₁ⁿ (·)². This comes by virtue of the fact that in such cases we cannot define a PMS. The RLMSs of those schemes are particular cases of the run length of the joint scheme itself.

7 Final remarks

The numerical results obtained in this paper suggest that the schemes SS and SS+ compare unfavorably to the more sophisticated joint schemes CC, CC+, etc., in terms of MSs of both types, in most cases. Thus, the SS and SS+ schemes are far from being reliable in identifying which parameter has changed. This is the answer to St. John and Bragg's (1991) concluding question: (Misleading signals can be a serious problem for the user of joint charts.) Would alternatives (EWMA or CUSUM) perform better in this regard? Tables 5 and 6 give the distinct impression that joint schemes for μ and σ can be very sensitive to MSs of both types: the values of the PMSs are far from negligible, especially for small and moderate shifts in μ and σ; thus, misidentification of signals is likely to occur. The practical significance of all these results will depend on the amount of time and money spent in attempting to identify and correct non-existing causes of variation in μ (σ), i.e., when an MS of Type III (IV) occurs. No monotonicity results have been stated for the PMSs and the RLMSs of the joint schemes CC, CCS, EE and CES. This is due to the fact that the constituent individual Markov-type schemes for μ are not associated with stochastically monotone matrices (as in the upper one-sided case), an absolutely crucial characteristic to prove the stochastic monotonicity

results of the RLs and, thus, the monotonicity properties of the PMSs and the stochastic monotonicity properties of the RLMSs. Finally, we would like to mention that Reynolds Jr. and Stoumbos (2001) advocate that the probabilities of misleading signals are useful to diagnose which parameter(s) has (have) changed, and suggest the use of the pattern of the points beyond the control limits of the constituent charts in the identification of the parameter that has effectively changed. A plausible justification for this diagnostic aid stems from the fact that changes in μ and σ have different impacts on those patterns.

Table 11. Summary statistics and control limits of the individual schemes for μ.

Appendix: Individual control schemes for μ and σ

Let (X_1N, ..., X_nN) denote the random sample of size n at the sampling period N (N ∈ ℕ). X̄_N = n⁻¹ Σᵢ₌₁ⁿ X_iN, S²_N = (n−1)⁻¹ Σᵢ₌₁ⁿ (X_iN − X̄_N)² and Z_N = √n (X̄_N − μ₀)/σ₀ represent the sample mean, sample variance and nominal standardized sample mean, respectively. According to this, we define the summary statistics and control limits of the individual schemes for μ and σ in Table 11 and Table 12 (respectively), preceded by the corresponding acronym. Some of the summary statistics of these individual control schemes are trivial (S - μ, S+ - μ, S+ - σ) or can be found, as presented here or with a slight variation, in Montgomery and Runger (1994, p. 875) (C - μ), Lucas and Saccucci (1990) (E - μ and ES - μ), Lucas and Crosier (1982) (C+ - μ), Yashchin (1985) (CS+ - μ), Crowder and Hamilton (1992) (E+ - σ), Gan (1995) (C+ - σ), or are natural extensions of existing ones (CS - μ, E+ - μ, ES+ - μ, CS+ - σ, ES+ - σ).

Table 12. Summary statistics and control limits of the individual schemes for σ.

E+ - σ:   W+_σ,0 = ln(σ₀²);   W+_σ,N = max{ ln(σ₀²), (1 − λ_σ) × W+_σ,N−1 + λ_σ × ln(S²_N) },  N > 0
ES+ - σ:  W+_σ,N and S²_N

Note that, with the exception of the charts S - μ, S+ - μ and S+ - σ, an initial value has to be considered for the summary statistic of the standard schemes for μ (C - μ, CS - μ, E - μ, ES - μ) and of the individual upper one-sided schemes for μ and σ (C+ - μ, CS+ - μ, E+ - μ, ES+ - μ, and C+ - σ, CS+ - σ, E+ - σ, ES+ - σ). Let (LCL + UCL)/2 + α × (UCL − LCL)/2 (respectively LCL + α × (UCL − LCL)) be the initial value of the summary statistic of the standard schemes for μ (respectively of the upper one-sided schemes for μ and σ), with α ∈ (−1,1) (respectively α ∈ [0,1)). If α ∈ (−1,0) ∪ (0,1) (respectively α ∈ (0,1)), an α × 100% head start (HS) has been given to the standard schemes for μ (to the individual upper one-sided schemes for μ and σ); no head start has been adopted otherwise. The adoption of a head start may speed up the detection of shifts by the control scheme at start-up and also after a restart following a (possibly ineffective) control action (see Lucas (1982), Lucas and Crosier (1982), Yashchin (1985) and Lucas and Saccucci (1990)). The control limits of the individual charts for μ and σ are also in Table 11 and Table 12, respectively. It is worth adding that the control limits of the individual EWMA charts for μ and σ are all specified in terms of the exact asymptotic standard deviations of W_μ,N and W_σ,N = (1 − λ_σ) × W_σ,N−1 + λ_σ × ln(S²_N) (see Lucas and Saccucci (1990) and Crowder and Hamilton (1992), and recall that these latter authors used an infinite series expansion to approximate the trigamma function ψ'(·)).
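As a numerical aside (not taken from the paper), the sketch below shows how asymptotic-variance control limits of this kind can be computed for an EWMA of the standardized sample mean Z_N and an EWMA of ln(S²_N), using the exact trigamma function from scipy instead of a series expansion; the subgroup size, smoothing constants and critical multipliers are placeholder values.

```python
import numpy as np
from scipy.special import polygamma   # polygamma(1, x) is the trigamma function

n = 5                            # subgroup size (placeholder)
lam_mu, lam_sigma = 0.13, 0.10   # EWMA smoothing constants (placeholders)
k_mu, k_sigma = 2.8, 2.8         # critical multipliers (placeholders)

# EWMA of Z_N: in control Var(Z_N) = 1, so the asymptotic standard deviation
# of the EWMA statistic is sqrt(lambda / (2 - lambda)).
sd_w_mu = np.sqrt(lam_mu / (2.0 - lam_mu))
lcl_mu, ucl_mu = -k_mu * sd_w_mu, k_mu * sd_w_mu

# EWMA of ln(S_N^2): for normal data Var(ln S_N^2) = trigamma((n - 1) / 2),
# which is the quantity Crowder and Hamilton (1992) approximate by a series.
var_ln_s2 = polygamma(1, (n - 1) / 2.0)
sd_w_sigma = np.sqrt(lam_sigma / (2.0 - lam_sigma) * var_ln_s2)
ucl_sigma = k_sigma * sd_w_sigma   # measured from the reference level ln(sigma0^2)

print(f"EWMA chart for mu:    LCL = {lcl_mu:.3f}, UCL = {ucl_mu:.3f}")
print(f"EWMA chart for sigma: UCL = ln(sigma0^2) + {ucl_sigma:.3f}")
```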


References

1. Brook D, Evans DA (1972) An approach to the probability distribution of CUSUM run length. Biometrika 59:539-549.
2. Chengalur IN, Arnold JC, Reynolds Jr. MR (1989) Variable sampling intervals for multiparameter Shewhart charts. Communications in Statistics - Theory and Methods 18:1769-1792.
3. Crowder SV (1989) Design of exponentially weighted moving average schemes. Journal of Quality Technology 21:155-162.
4. Crowder SV, Hamilton MD (1992) An EWMA for monitoring a process standard deviation. Journal of Quality Technology 24:12-21.
5. Daley DJ (1968) Stochastically monotone Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 10:305-317.
6. Gan FF (1989) Combined mean and variance control charts. ASQC Quality Congress Transactions - Toronto, 129-139.
7. Gan FF (1991) Computing the percentage points of the run length distribution of an exponentially weighted moving average control chart. Journal of Quality Technology 23:359-365.
8. Gan FF (1995) Joint monitoring of process mean and variance using exponentially weighted moving average control charts. Technometrics 37:446-453.
9. Kalmykov GI (1962) On the partial ordering of one-dimensional Markov processes. Theory of Probability and its Applications 7:456-459.
10. Lucas JM (1982) Combined Shewhart-CUSUM quality control schemes. Journal of Quality Technology 14:51-59.
11. Lucas JM, Crosier RB (1982) Fast initial response for CUSUM quality-control schemes: give your CUSUM a head start. Technometrics 24:199-205.
12. Lucas JM, Saccucci MS (1990) Exponentially weighted moving average control schemes: properties and enhancements. Technometrics 32:1-12.
13. Montgomery DC, Runger GC (1994) Applied Statistics and Probability for Engineers. John Wiley & Sons, New York.

14. Morais MC (2002) Stochastic Ordering in the Performance Analysis of Quality Control Schemes. PhD thesis, Department of Mathematics, Instituto Superior Técnico, Lisbon, Portugal.
15. Morais MC, Pacheco A (2000) On the performance of combined EWMA schemes for μ and σ: a Markovian approach. Communications in Statistics - Simulation and Computation 29:153-174.
16. Morais MC, Pacheco A (2001a) Some stochastic properties of upper one-sided X̄ and EWMA charts for μ in the presence of shifts in σ. Sequential Analysis 20(1/2):1-12.
17. Morais MC, Pacheco A (2001b) Misleading signals em esquemas combinados EWMA para μ e σ (Misleading signals in joint EWMA schemes for μ and σ). In Oliveira P, Athayde E (eds) Um Olhar sobre a Estatística:334-348. Edições SPE. In Portuguese.
18. Reynolds Jr. MR, Stoumbos ZG (2001) Monitoring the process mean and variance using individual observations and variable sampling intervals. Journal of Quality Technology 33:181-205.
19. Shaked M, Shanthikumar JG (1994) Stochastic Orders and Their Applications. Academic Press, London.
20. St. John RC, Bragg DJ (1991) Joint X-bar & R charts under shift in mu or sigma. ASQC Quality Congress Transactions - Milwaukee, 547-550.
21. Woodall WH, Ncube MM (1985) Multivariate CUSUM quality-control procedures. Technometrics 27:285-292.
22. Yashchin E (1985) On the analysis and design of CUSUM-Shewhart control schemes. IBM Journal of Research and Development 29:377-391.

The Fréchet Control Charts

Edyta Mrówka and Przemysław Grzegorzewski

Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
mrowka@ibspan.waw.pl, pgrzeg@ibspan.waw.pl

Summary. A new control chart based on the Fréchet distance for simultaneously monitoring the process level and spread is suggested. Our chart reveals interesting statistical properties: its behavior is comparable with the traditional control charts if changes in the process level only or in the process spread only are observed, but it is much better than the X̄-S control chart under simultaneous disturbances in the process level and spread.

1 Introduction

Statistical process control (SPC) is a collection of methods for achieving continuous improvement in quality. This objective is accomplished by continuous monitoring of the process under study in order to quickly detect the occurrence of assignable causes and undertake the necessary corrective actions. One of the most popular SPC procedures is the Shewhart X̄-S control chart for monitoring the process level and spread. The X̄ control chart contains three lines: a center line (CL) corresponding to the process level and two other horizontal lines, called the upper control limit (UCL) and the lower control limit (LCL), respectively. When applying this chart one draws samples of a fixed size n at specified time points, computes the arithmetic mean of each sample and plots it as a point on the chart. As long as the points lie within the control limits the process is assumed to be in control. However, if a point plots outside the control limits (i.e. below the LCL or above the UCL) we are forced to assume that the process is no longer under control. One will immediately intervene in the process in order to find the causes of the disturbance and undertake corrective actions to eliminate them. The construction of the S control chart is similar; however, its center line corresponds to the process standard deviation and we plot on this chart points corresponding to the sample standard deviations computed for consecutive samples (see, e.g., Mittag, Rinne, 1993; Montgomery, 1991; Western Electric, 1956). In the present paper we suggest another approach for designing control charts, based on goodness-of-fit tests. Then we propose a particular realization of that idea, i.e. a new control chart based on the Fréchet distance for simultaneously monitoring the process level and spread. Our chart, called the Fréchet control chart, is very simple to use. Moreover, a simulation study

shows that the Fréchet control chart reveals quite interesting statistical properties: its behavior is comparable with the traditional control charts if changes in the process level only or in the process spread only are observed, but it is much better than the X̄-S control chart under simultaneous disturbances in the process level and spread. The paper is organized as follows: the idea of designing control charts using goodness-of-fit tests is proposed in Section 2. Section 3 is devoted to the Fréchet distance between distributions. Then, in Section 4 we show how to construct a goodness-of-fit test based on the Fréchet distance. In Section 5 we design the Fréchet control chart and in Section 6 we discuss some statistical properties of the suggested control chart.

2 Control charts and statistical tests

Suppose that the process under consideration is normally distributed. Let us first assume that we know the parameters of the process (i.e. its mean μ₀ and standard deviation σ₀) when the process is thought to be in control. In such a case the traditional Shewhart X̄ control chart is equivalent to the two-sided hypothesis testing problem for the mean, i.e. H: μ = μ₀ against K: μ ≠ μ₀. Similarly, the S control chart is equivalent to the two-sided hypothesis testing problem for the standard deviation, i.e. H: σ = σ₀ against K: σ ≠ σ₀. Therefore, the X̄-S control chart is equivalent to the following testing problem

H: μ = μ₀ & σ = σ₀   (i.e. the process is in control)
K: ¬H                (i.e. the process is out of control).    (1)

In this case we will accept the null hypothesis if μ₀ and σ₀ fall into the acceptance area determined by the appropriate confidence intervals. Assuming the significance level α = 0.0027, we get 99.73% confidence intervals for μ and σ with borders corresponding to the limits of the control chart (see Figure 1).

Fig. 1. Acceptance area of the X̄-S control chart

It is easily seen that the process is thought to be in control as long as the points (X̄ᵢ, Sᵢ) corresponding to successive samples fall into the following area:

(X̄ᵢ, Sᵢ) ∈ [LCL_X̄, UCL_X̄] × [LCL_S, UCL_S].    (2)

It is quite obvious that points (X̄ᵢ, Sᵢ) located close to the central point (CL_X̄, CL_S) of this area are treated as evidence that the process is in control. Unfortunately, according to Figure 1 the process is also thought to be in control if both its mean and standard deviation are quite far from this central point and close to the "corners" of the area (2), i.e. (UCL_X̄, UCL_S), (LCL_X̄, UCL_S), (LCL_X̄, LCL_S) and (UCL_X̄, LCL_S). This unpleasant feature of the X̄-S control chart inclined us to design a statistical device that would behave in general similarly to the X̄-S control chart but would be more sensitive to simultaneous disturbances of the process level and spread. Let F denote the distribution of the process under study X, and let F₀ be the distribution of the process X₀ which is assumed to be in control. To check whether the process under study is in control one may also use a goodness-of-fit test. Then our problem could be expressed as the problem of testing the following hypotheses

H: F = F₀    K: F ≠ F₀.    (3)

Assuming that the test statistic T is a distance between X and X₀, i.e. T = d(X, X₀), we get a one-sided critical region of the form {T(X₁, ..., Xₙ) > d_α}, where d_α is a critical value such that P(T(X₁, ..., Xₙ) > d_α | H) = α, and α is the accepted significance level. If we apply such a goodness-of-fit test for designing a control chart, we obtain a new SPC tool with only one upper control limit, corresponding to the critical value of that test for the given significance level α. The idea of how to operate such a new control chart is very simple and could be described as follows:
- draw a sample X₁, ..., Xₙ, then compute the value of the test statistic T(X₁, ..., Xₙ) and plot it as a point on the control chart;

- as long as the points fall below the control limit, the process is thought to be in control. A point above the control limit is interpreted as evidence that the process is out of control (see Figure 2).

Fig. 2. Control chart based on a goodness-of-fit test

Having in mind that our device should be simple to use, we are looking for a distance measure that enables us to construct a control chart that fulfills the following requirements:
- our control chart based on a goodness-of-fit test could be applied instead of the X̄-S control chart;
- our control chart based on a goodness-of-fit test should be more sensitive than the X̄-S control chart under simultaneous disturbances of the process level and spread;
- our control chart based on a goodness-of-fit test should behave comparably with the X̄ control chart under disturbances of the process level only;
- our control chart based on a goodness-of-fit test should behave comparably with the S control chart under disturbances of the process spread only;
- a control chart based on a goodness-of-fit test should be simple in use and should not require more observations and calculations than the X̄-S control chart.

Now the crucial point is to find an appropriate distance measure which could be used as a test statistic. Many distances between distributions are considered in the literature (see, e.g., Gibbs and Su, 2002; Sobczyk and Spencer, 1992). In the next section we propose how to design the desired control chart based on the Fréchet distance. It is also worth noting that there exist some other approaches for monitoring the mean and variance using a single statistic or circular boundaries, like Reynolds et al. (1981), Cox (1996), Chao and Cheng (1996) or He and Grigoryan (2006).

3 The Fréchet distance

Let X and Y denote two random variables with distribution functions F and G, respectively, having first and second moments. Then the Fréchet distance between these two distributions is defined by (Fréchet, 1957)

d_F(F, G) = d_F(X, Y) = sqrt( min_{U,V} E|U − V|² ),

where the minimum is taken over all pairs of random variables U and V having distributions F and G, respectively. In the particular case when F and G belong to a family of distributions closed with respect to changes of location and scale, the Fréchet distance takes the simple form (see Dowson and Landau, 1982)

d_F²(X, Y) = (μ_X − μ_Y)² + (σ_X − σ_Y)²,    (8)

where μ_X, μ_Y, σ_X, σ_Y denote the means and standard deviations of the random variables X and Y, respectively. For univariate distributions the Fréchet distance can be graphically interpreted as the length of the hypotenuse of a right triangle with legs μ_X − μ_Y and σ_X − σ_Y (see Hadi and Nyquist, 1995), as shown in Figure 3.

Fig. 3. Graphical interpretation of the Fréchet distance

It is worth noting that the family of normal distributions is closed with respect to changes of location and scale. Hence the Fréchet distance between two normal distributions with parameters μ_X, μ_Y, σ_X, σ_Y, respectively, is given by (8). It is known that the normal distribution is completely characterized by two parameters - its mean and standard deviation - and we find these very parameters in (8). Thus the Fréchet distance utilizes the whole information on normal random variables. Recalling that in SPC we traditionally assume normality of the process under study, the Fréchet distance seems to be an excellent candidate for the distance measure that could be used for designing the control chart suggested in the previous section.
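As a small illustration (not part of the original paper), the simple form (8) of the Fréchet distance between two univariate normal distributions can be evaluated directly from their parameters; the numerical values below are arbitrary.

```python
from math import sqrt

def frechet_distance_normal(mu_x, sigma_x, mu_y, sigma_y):
    """Fréchet distance between N(mu_x, sigma_x) and N(mu_y, sigma_y): the
    hypotenuse of the right triangle with legs mu_x - mu_y and sigma_x - sigma_y."""
    return sqrt((mu_x - mu_y) ** 2 + (sigma_x - sigma_y) ** 2)

# arbitrary example: in-control N(0, 1) versus a shifted and inflated N(0.5, 1.2)
print(frechet_distance_normal(0.0, 1.0, 0.5, 1.2))   # 0.5385...
```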

4 Goodness-of-fit test based on the Fréchet distance

Suppose that the process in control X₀ is normally distributed N(μ₀, σ₀). Moreover, let X₁, ..., Xₙ denote a random sample from the process X which is also normally distributed N(μ, σ). To decide whether the process under study is in control we consider the following hypothesis testing problem

H: X ~ N(μ₀, σ₀)   (i.e. the process is in control)
K: ¬H              (i.e. the process is out of control)    (9)

To verify these hypotheses we construct a goodness-of-fit test based on the Fréchet distance between the distribution of the process in control and the empirical distribution of the process under study. Namely, we define the test statistic as follows

T(X₁, ..., Xₙ) = d_F²(X, X₀) = (X̄ − μ₀)² + (S_X − σ₀)²,    (10)

where X̄ and S_X denote the sample average and the sample standard deviation of X₁, ..., Xₙ, respectively.

Then, assuming the significance level α, we accept the null hypothesis H, which is equivalent to the statement that the process under study is in control, if the test statistic (10) does not exceed the corresponding critical value; how this critical value is obtained is described below.

The acceptance area of the hypothesis H is shown in Figure 4. It is immediately seen (compare Figure 4 and Figure 1) that samples whose average and standard deviation differ significantly from the values given in the null hypothesis lead to rejection of H. In the context of statistical process control such a situation is interpreted as evidence that the process is out of control. In order to compute the critical values d_α of the desired test, let us assume that Z₁, ..., Zₙ are realizations of a random variable Z from the standard normal distribution N(0,1). Then the Fréchet distance (10) between the empirical distribution of the random variable Z and Z₀ ~ N(0,1) is given by

d_F²(Z, Z₀) = Z̄² + (S_Z − 1)²,    (14)

Fig. 4. The acceptance area of the goodness-of-fit test based on the Fréchet distance

where Z̄ and S_Z denote the sample average and the sample standard deviation, respectively, of the sample Z₁, ..., Zₙ. The critical value d_α corresponding to the significance level α fulfills the following equation

P(d_F²(Z, Z₀) > d_α) = α.    (15)

Suppose that the null hypothesis H given by (9) holds, i.e. X₁, ..., Xₙ ~ N(μ₀, σ₀). Then after the standardization Zᵢ = (Xᵢ − μ₀)/σ₀ we get Z₁, ..., Zₙ ~ N(0,1). Hence

Z̄ = (X̄ − μ₀)/σ₀    (16)

and

S_Z = S_X/σ₀.    (17)

Substituting (16) and (17) into (14) we get

d_F²(Z, Z₀) = ((X̄ − μ₀)/σ₀)² + (S_X/σ₀ − 1)² = (1/σ₀²) [ (X̄ − μ₀)² + (S_X − σ₀)² ].

Therefore, by (10), our test statistic can be expressed in the following form

T(X₁, ..., Xₙ) = σ₀² × d_F²(Z, Z₀),

and finally we will reject H if T(X₁, ..., Xₙ) > σ₀² d_α, where d_α is obtained from (15).
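Since the null distribution of d_F²(Z, Z₀) is not of a standard tabulated form, the critical value d_α in (15) can be approximated by simulation, as in the short sketch below; the number of replications and the use of the (n − 1)-denominator sample standard deviation are assumptions of this illustration rather than choices documented in the paper.

```python
import numpy as np

def frechet_critical_value(n, alpha=0.0027, reps=500_000, seed=1):
    """Monte Carlo approximation of d_alpha in (15): the upper alpha-quantile
    of d_F^2(Z, Z0) = Zbar^2 + (S_Z - 1)^2 for samples of size n from N(0,1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((reps, n))
    zbar = z.mean(axis=1)
    s_z = z.std(axis=1, ddof=1)
    d2 = zbar ** 2 + (s_z - 1.0) ** 2
    return np.quantile(d2, 1.0 - alpha)

for n in (5, 10, 15, 20):
    print(n, round(frechet_critical_value(n), 3))
```

Up to this construction, such a simulated quantile plays the role of the critical value that is treated in the next section as the tabulated factor G₂.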

5 The Fréchet control chart

As was mentioned in Section 2, control charts based on goodness-of-fit tests have only one control limit. In the case of the control chart utilizing the Fréchet distance, called the Fréchet control chart, this very control limit (the upper one) has the following form

UCL_F = d_α × σ₀²,

where σ₀² is the variance of the process in control, while d_α is the critical value obtained from (15). Since the process variance σ₀² is very often unknown, we estimate it by

σ̂₀² = (1/m) Σᵢ₌₁ᵐ Sᵢ²,

which is the average of the sample variances Sᵢ² obtained for the preliminary samples X_i1, ..., X_in, i = 1, ..., m, taken when the process is thought to be in control, i.e.

Sᵢ² = (1/(n−1)) Σⱼ₌₁ⁿ (X_ij − X̄ᵢ)²,   i = 1, ..., m.    (23)

Since the control limits of the traditional Shewhart control charts are 3-sigma limits, which correspond to the significance level α = 0.0027, we construct the Fréchet control chart for the same significance level. One should also remember that the critical values d_α depend not only on α but on the sample size n as well. Accepting the convention applied for the notation in the Shewhart X̄-S or X̄-R control charts, we treat the critical value d_α as a coefficient that depends on the sample size and we denote it by G₂. Thus, finally, the control limit of the Fréchet control chart is given by

UCL_F = G₂ × σ̂₀²,

where the factors G₂ for some sample sizes are tabulated in Table 1. Now, after designing the Fréchet control chart, we may plot points corresponding to the values T(X₁, ..., Xₙ) given by (10) and obtained from the consecutive samples. Since the parameters μ₀ and σ₀ of the process in control are usually unknown, we estimate them by X̿ and S̄, where X̿, given by

X̿ = (1/m) Σᵢ₌₁ᵐ X̄ᵢ,

is the arithmetic mean of the averages X̄ᵢ obtained for the preliminary samples X_i1, ..., X_in, i = 1, ..., m, i.e.

X̄ᵢ = (1/n) Σⱼ₌₁ⁿ X_ij,   i = 1, ..., m,

Table 1. Factors G₂ for constructing the Fréchet control chart.

while S̄, given by

S̄ = (1/m) Σᵢ₌₁ᵐ Sᵢ,

is the average of the sample standard deviations Sᵢ obtained for the preliminary samples taken when the process is thought to be in control, i.e. Sᵢ = √(Sᵢ²) with Sᵢ² as in (23).

Hence we plot on the chart points computed from the formula

dᵢ² = (X̄ᵢ − X̿)² + (Sᵢ − S̄)².

An algorithm given below shows in a brief way how to apply the Fréchet control chart in practice (an illustrative sketch follows the list):

1. Using preliminary samples, taken when the process is thought to be in control, compute X̿ and S̄.
2. Find the control limit UCL_F.
3. Take a sample X₁, ..., Xₙ.
4. For the given sample compute X̄ᵢ and Sᵢ.
5. Compute dᵢ² = (X̄ᵢ − X̿)² + (Sᵢ − S̄)².
6. Plot the point dᵢ² on the control chart.
7. If dᵢ² < UCL_F then go back to step 3. Otherwise go to step 8.
8. Alarm showing that the process is probably out of control.
9. Investigation and corrective action are required to find and eliminate the assignable cause or causes responsible for this behavior.
10. Go back to step 1.
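The sketch below strings these steps together on simulated data for illustration only: the preliminary samples, the shift scenario and the factor G2 (which Table 1 would normally supply) are all placeholder assumptions, and the in-control parameters are not taken from any real process.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, G2 = 5, 25, 2.05   # subgroup size, number of preliminary samples and a
                         # placeholder factor G2 (Table 1 would give the real one)

# Step 1: preliminary samples taken when the process is thought to be in control
prelim = rng.normal(10.0, 0.5, size=(m, n))
x_dbar = prelim.mean(axis=1).mean()                # grand mean of subgroup averages
s_bar = prelim.std(axis=1, ddof=1).mean()          # average subgroup standard deviation
sigma0_sq_hat = prelim.var(axis=1, ddof=1).mean()  # average subgroup variance

# Step 2: control limit of the Frechet chart
ucl_f = G2 * sigma0_sq_hat

def frechet_point(sample):
    # Step 5: squared Frechet-type distance of the subgroup from the reference values
    return (sample.mean() - x_dbar) ** 2 + (sample.std(ddof=1) - s_bar) ** 2

# Steps 3-8: monitor new subgroups; after five of them the level and the spread
# drift simultaneously (an artificial out-of-control scenario)
for i in range(10):
    shifted = i >= 5
    sample = rng.normal(10.0 + 0.4 * shifted,
                        0.5 * (1.5 if shifted else 1.0), size=n)
    d2 = frechet_point(sample)
    print(f"subgroup {i + 1:2d}: d^2 = {d2:.4f}", "ALARM" if d2 > ucl_f else "")
```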

6 Statistical properties of the Fréchet control chart

In this section we discuss briefly some statistical properties of the Fréchet control chart and compare them with the properties of the traditional X̄-S control chart. We have performed a broad simulation study to check whether the requirements given in Section 2 are fulfilled. The operating-characteristic curves for the Fréchet control chart, the X̄ control chart and the S control chart under disturbances of the process level μ only are shown in Figure 5. The operating-characteristic curves for these charts under disturbances of the process spread only are shown in Figure 6. Finally, in Figure 7 we show the OC curves for these three charts under simultaneous disturbances of the process level and spread.

Fig. 5. The OC curves under disturbances of μ for samples of size a) n = 5, b) n = 10, c) n = 15 and d) n = 20.

Fig. 6. The OC curves under disturbances of σ for samples of size a) n = 5, b) n = 10, c) n = 15 and d) n = 20.

Fig. 7. The OC curves under disturbances of μ and σ for samples of size a) n = 5, b) n = 10, c) n = 15 and d) n = 20.

It is seen that under disturbances of the process level only the Fréchet control chart behaves comparably with the X̄ control chart, while under disturbances of the process spread only the Fréchet control chart behaves comparably with the S control chart. However, it is evident that under simultaneous disturbances of the process level and spread the Fréchet control chart behaves better than the traditional X̄ and S control charts.
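For readers who wish to reproduce comparisons of this kind, the sketch below estimates, for a single subgroup, the probability of a signal of a Fréchet-type chart and of an X̄/S pair under a few shift scenarios; the known in-control parameters, the 3-sigma limits of the X̄ and S charts, the subgroup size and the Monte Carlo critical value are assumptions of the illustration, and the two schemes are not calibrated here to exactly the same in-control false-alarm probability, so the output is only indicative of the OC behaviour reported in Figures 5-7.

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(99)
n, alpha = 5, 0.0027

def c4(k):
    # unbiasing constant of the sample standard deviation for normal data
    return sqrt(2.0 / (k - 1)) * gamma(k / 2) / gamma((k - 1) / 2)

# Monte Carlo critical value of the Frechet statistic for N(0,1) data (cf. Section 4)
z = rng.standard_normal((500_000, n))
d_alpha = np.quantile(z.mean(1) ** 2 + (z.std(1, ddof=1) - 1.0) ** 2, 1 - alpha)

def signal_probability(delta, lam, reps=200_000):
    """P(signal at a single subgroup) when mu = mu0 + delta*sigma0 and
    sigma = lam*sigma0 (here mu0 = 0, sigma0 = 1)."""
    x = rng.normal(delta, lam, size=(reps, n))
    xbar, s = x.mean(1), x.std(1, ddof=1)
    frechet = (xbar ** 2 + (s - 1.0) ** 2) > d_alpha
    cc = c4(n)
    xbar_out = np.abs(xbar) > 3.0 / sqrt(n)
    s_out = (s > cc + 3.0 * sqrt(1 - cc ** 2)) | (s < max(0.0, cc - 3.0 * sqrt(1 - cc ** 2)))
    return frechet.mean(), (xbar_out | s_out).mean()

for delta, lam in [(0.0, 1.0), (1.0, 1.0), (0.0, 1.5), (1.0, 1.5)]:
    p_f, p_xs = signal_probability(delta, lam)
    print(f"delta={delta}, lambda={lam}: Frechet {p_f:.3f}   X-bar/S {p_xs:.3f}")
```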

7 Conclusions

We have proposed a new method for designing control charts based on goodness-of-fit tests. The construction of these charts utilizes distance measures between distributions. In the paper we have investigated the so-called Fréchet control chart, based on the Fréchet distance, for simultaneously monitoring the process level and spread. Our new control chart is very simple to use and reveals quite interesting statistical properties: its behavior is comparable with the traditional control charts if changes in the process level only or in the process spread only are observed, but it is much better than the X̄-S control chart under simultaneous disturbances in the process level and spread. However, the idea of designing control charts through goodness-of-fit tests is not restricted to the Fréchet distance. One may try to design control charts utilizing other distances between distributions as well. Moreover, the idea is not restricted to charts for variables. Thus our investigations in this direction will be continued in further papers.

References

1. Chao M.T., Cheng S.W. (1996), Semicircle Control Chart for Variables Data, Quality Engineering 8: 441-446.
2. Cox M.A.A. (1996), Implementing the Circle Technique Using Spreadsheets, Quality Engineering 9: 65-76.
3. Dowson D.C., Landau B.V. (1982), The Fréchet distance between multivariate normal distributions, J. Mult. Anal. 12: 450-455.
4. Gibbs A., Su F. (2002), On choosing and bounding probability metrics, CiteSeer, NEC Research Institute.
5. Fréchet M. (1957), Sur la distance de deux lois de probabilité, C. R. Acad. Sci. Paris 244: 689-692.
6. Hadi A.S., Nyquist H. (1995), Fréchet distance as a tool for diagnosing multivariate data, Proceedings of the Third Umea-Wuerzburg Conference in Statistics, Umea.
7. He D., Grigoryan A. (2006), Joint statistical design of double sampling X̄ and s charts, Eur. J. Oper. Res. 168: 122-142.
8. Mittag H.J., Rinne H. (1993), Statistical Methods of Quality Assurance, Chapman and Hall.

9. Montgomery D. (1991), Statistical Quality Control, John Wiley & Sons, Inc., New York.
10. Reynolds M.R., Jr., Ghosh B.K. (1981), Designing Control Charts for Means and Variances, ASQC Annual Congress Transactions, San Francisco, pp. 400-407.
11. Sobczyk K., Spencer B.E. (1992), Random Fatigue: From Data to Theory, Academic Press.
12. Western Electric (1956), Statistical Quality Control Handbook, Western Electric Corporation, Indianapolis, Ind.

Reconsidering Control Charts in Japan

Ken Nishina¹, Kazuyoshi Kuzuya², Naru Ishii¹

¹ Nagoya Institute of Technology, Department of Techno-Business Administration, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan
[email protected], [email protected]
² SQC Consultant, Ohaza-Makihara, Nukat-cho, Aichi 444-3624, Japan
[email protected]

Summary. In this paper, the role of control charts is investigated by considering the practice of using Shewhart control charts from the following viewpoints:

1) Which rules should be used for detecting an assignable cause?
2) What should be considered as a chance cause?
3) What should be selected as control characteristics?

Woodall (2000) indicated that it is very important to distinguish between the use of control charts in Phase 1 and Phase 2. Considering the different aims of control charts in Phase 1 and Phase 2, we suggest how to use the set of eight pattern tests specified in ISO 8258: Shewhart control charts. The improved machine performance achieved by advanced production engineering results in a reduction of variability within a sub-group and a relatively high process capability. Despite this, alarms frequently occur when using Shewhart control charts, and this is one of the reasons why control charts are not used in Japanese industry. For decreasing the false alarm rate, we propose to include a part of the variability between sub-groups in the variability due to chance causes. The complexity of the quality characteristics to be assured has grown with the complexity of products. On the other hand, the advances in measurement technology make it possible to monitor almost every characteristic. Based on a case study we show that the characteristic to be assured need not necessarily be the same as the characteristic to be monitored.

1 Introduction

In Japan "must-be quality" has been regarded as being as important as "attractive quality", because especially during the last few years some quality issues regarding must-be quality have appeared in industry. Therefore, the time has come to reconsider the fundamental quality activities, although Japan is proud to be looked upon as a Quality Nation. While control charts are considered to be one of the main methodologies for assuring good must-be quality, they disappeared from Japanese industry long ago. Lately, however, some Japanese companies have reconsidered SPC activities using Shewhart control charts. About 80 years have passed since Dr. Shewhart proposed control charts, and since then the industrial environment has undergone vast changes. Production engineering and production systems have advanced remarkably and, hence, it is worthwhile to reconsider the use of control charts in the light of such changes. In this paper, the role of control charts in SPC activities is confirmed by considering the practical approach of using Shewhart control charts from the following viewpoints:
1) Which rules should be used for detecting an assignable cause?
2) What should be considered as a chance cause?
3) What should be selected as control characteristics?

2 Which rules should be used for detecting an assignable cause?

2.1 Background

ISO 8258: Shewhart control charts (1991) specifies a set of eight pattern tests for assignable causes (see Fig. 1), which originated from the Western Electric Statistical Quality Control Handbook (1956). Nelson (1984) presented some notes on the tests. But ISO 8258 does not explain how to use them depending on the process stage and situation. At the ISO voting stage, Japan stood against the ISO draft (DIS), arguing that the set of eight pattern tests should not be specified in the formal text because such specific rules, once published, might be arbitrarily interpreted. In the Japanese Industrial Standard (JIS), the content of the revised JIS 9021 of 1999 is consistent with ISO 8258; however, JIS 9021 explains in a Remarks column that the set of eight pattern tests must not be regarded as specified rules but rather as a kind of guideline.

Some practitioners are concerned about the question of which tests should be used, because if many tests are applied the probability of a Type I error may become too large. On the other hand, if only the 3-sigma rule is applied, the power of the test may be small. Before considering the right way to use the rules of control charts in process control, the statistical properties of control charts have to be analyzed. It is important to show how these tests affect the procedure of process control.

[NOTE] The chart is partitioned into six zones of equal width (one sigma) between the upper control limit (UCL) and the lower control limit (LCL), which are labeled A, B, C, C, B, A.
Test 1: One point beyond zone A
Test 2: Nine points in a row in zone C or beyond
Test 3: Six points in a row steadily increasing or decreasing
Test 4: Fourteen points in a row alternating up and down
Test 5: Two out of three points in a row in zone A or beyond
Test 6: Four out of five points in a row in zone B or beyond
Test 7: Fifteen points in a row in zone C above and below the central line
Test 8: Eight points in a row on both sides of the central line with none in zone C

Fig. 1: Tests for assignable causes (ISO 8258)

2.2 Role of control charts in each phase of process control

It is important to understand the procedure of process control by means of control charts. At the start (Phase 1 - see, for example, Woodall and Montgomery (1999)) the process has to be brought into a stable state with a high process capability. In Phase 1, the control charts are used without a given standard value for making a retrospective analysis. During the second phase (Phase 2) the process is monitored using control charts with a given standard value. The determination of the standard value is a key factor of Phase 2, as it is closely connected with the question of what should be considered as a chance cause. This question is discussed in the next chapter. Woodall (2000) indicated that it is very important to distinguish between the use of a control chart in Phase 1 and Phase 2. In Phase 1 the control chart should be used as a tool for exploratory data analysis, because in this phase the process has to be brought into a stable state with high process capability. On the other hand, in Phase 2, control charting can be considered as repeated hypothesis testing aiming at preserving the process capability. Woodall's (2000) paper is accompanied by various discussion contributions, and some of them deal with Phase 1 and Phase 2. We basically agree with Woodall's opinion; however, if we distinguish the procedure of process control from the viewpoint of whether the control chart is used as a tool for exploratory analysis or for hypothesis testing, we think that Phase 2 should be divided into two phases, Phase 2-1 and Phase 2-2. Phase 2-1 is similar to Phase 2 of Woodall (2000); the control chart is used for repeated hypothesis testing and runs from the start of process monitoring until an out-of-control state is detected. Phase 2-2 is defined as the subsequent period of searching for the assignable causes. In Phase 2-2 the control chart is used as a tool for exploratory data analysis for identifying the type of assignable cause. The change pattern can provide valuable information about the assignable cause and, additionally, it is necessary to estimate the change-point.

2.3 How to use the rules in each phase

Starting from the discussion in Section 2.2, we next investigate the different uses of the rules of control charts in Phase 1, Phase 2-1 and Phase 2-2. As mentioned in Section 2.2, in Phase 1 the control chart is a tool for exploratory data analysis aiming at achieving a stable state for the process and arriving at a sufficiently good process performance. Therefore, it is of utmost importance to quickly pinpoint improvement factors and, clearly, the set of eight tests in ISO 8258 is very helpful in doing so.

On the other hand, in Phase 2 the control chart may be regarded as repeated hypothesis testing and the focus lies on the probability of a Type I error. Phase 2 means that the process has entered mass production and the routine phase has started. Therefore, any alarm released by the control chart represents a serious matter and, hence, a very small probability of a Type I error is required. Test 1 in ISO 8258 marks the original Shewhart control chart and is called the 3-sigma rule. Any supplementary rule applied in addition to the 3-sigma rule inevitably increases the probability of a Type I error. The 3-sigma rule is an omnibus test. If it is desirable to detect a relatively small shift and/or a trend in the process mean, then the use of a supplementary rule is helpful and the question arises which tests should be selected as supplementary rules. Champ and Woodall (1987) and Davis and Woodall (1988) have analyzed the performances of Test 2, Test 5 and Test 3. In this paper we investigate the performances of Test 2, Test 3, Test 5 and Test 6 as supplementary rules. The alarm rate at time t, α(t), is defined as the probability of an alarm at time t given no alarm prior to time t, i.e.,

α(t) = P(T = t | T ≥ t),

where T stands for the (discrete) random time until the control chart triggers an out-of-control signal (see Margavio et al. (1995)). Fig. 2 shows the false alarm rates of Test 2, Test 3, Test 5, Test 6 and of Cusum charts, where each rule is used together with the 3-sigma rule. They are obtained under the assumption of standard normality. As mentioned earlier, in Phase 2 the probability of a Type I error is of vital importance because Phase 2 corresponds to routine mass production. Often, only the 3-sigma rule is applied, but if particular change patterns in a process are to be detected quickly, then supplementary rules are helpful. However, any too large increase of the probability of a Type I error would be counterproductive. The calculations for Test 2, Test 5 and Test 6 in Fig. 2 are performed using a Markov chain approach (see Champ and Woodall (1987)) and those for Test 3 are done by means of a Monte Carlo simulation. To this end 60,000 simulation runs were performed using the Numerical Technologies Random Generator for Excel (NtRand) Version 2.01. From Fig. 2 it can be seen that Test 2 performs very badly at time t = 9, and Test 6 performs almost everywhere worse than the other rules. On the other hand, the false alarm rate of Test 3 is nearly equal to that of the 3-sigma rule. This suggests that Test 3 is closely associated with the 3-sigma rule. Davis and Woodall (1988) did not recommend Test 3, because Test 3 does not significantly improve the detection power for a trend in the process mean. It is well known that the other rules,

except Test 3, improve the performance (see, for example, Champ and Woodall (1987)). Fig. 2 also contains the false alarm rate of a Cusum chart with parameters 5.0 and 0.5 as a supplementary rule. It can be seen that the pattern of Test 5 is similar to that of the Cusum chart. The results shown in Fig. 2 suggest selecting Test 5, i.e., the "2 out of 3" rule with 2-sigma warning limits, from the rules given in ISO 8258. If, for example with computer-aided SPC, it is possible to implement Cusum charts, then we recommend them as a supplementary rule for Shewhart control charts with 3-sigma control limits. In Phase 2-2 the control charts aim at searching for and identifying assignable causes. Thus, the following information is helpful:
1) The pattern characterizing the process change: shift or trend.
2) The time of the process change.
In Phase 2-2 a control chart should also be used as a tool for exploratory data analysis, and in fact the rules in ISO 8258 except Test 7 are helpful for detecting the pattern of change. Additionally, Cusum charts may be used for estimating the change-point (see Nishina (1996)). The above discussion suggests distinguishing three phases in process control: Phase 1, Phase 2-1 and Phase 2-2. A control chart is used as a tool for exploratory data analysis in Phase 1 and Phase 2-2, and for repeated hypothesis testing in Phase 2-1.

Fig. 2: False alarm rates of Test 2, Test 3, Test 5, Test 6 and of the Cusum chart, each used together with the 3-sigma rule
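The following sketch (not the NtRand/Excel computation used for Fig. 2) illustrates how such false alarm rates can be estimated by simulation for the 3-sigma rule alone and for the 3-sigma rule supplemented by Test 5, here implemented as two out of three consecutive points beyond the same 2-sigma warning limit; the run-length cap and the number of replications are arbitrary choices, and the Markov chain approach of Champ and Woodall (1987) would give exact values for these run rules.

```python
import numpy as np

rng = np.random.default_rng(8258)

def run_length(z, use_test5):
    """First period at which the 3-sigma rule (and optionally Test 5) signals
    on a sequence z of independent standardized subgroup statistics."""
    for t in range(len(z)):
        if abs(z[t]) > 3.0:                 # Test 1: one point beyond 3 sigma
            return t + 1
        if use_test5 and t >= 2:            # Test 5: 2 of 3 beyond 2 sigma, same side
            w = z[t - 2:t + 1]
            if (w > 2.0).sum() >= 2 or (w < -2.0).sum() >= 2:
                return t + 1
    return len(z) + 1                       # censored: no signal within the horizon

def false_alarm_rate(use_test5, tmax=20, reps=100_000):
    """alpha(t) = P(T = t | T >= t) estimated for an in-control process."""
    rl = np.array([run_length(rng.standard_normal(tmax), use_test5)
                   for _ in range(reps)])
    return [np.sum(rl == t) / np.sum(rl >= t) for t in range(1, tmax + 1)]

a_shewhart = false_alarm_rate(use_test5=False)
a_with_t5 = false_alarm_rate(use_test5=True)
print("t = 1..5, 3-sigma rule only    :", [round(a, 4) for a in a_shewhart[:5]])
print("t = 1..5, 3-sigma rule + Test 5:", [round(a, 4) for a in a_with_t5[:5]])
```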

3 What should be considered as a chance cause?

3.1 Background

One of the reasons for not using control charts in Japanese industry is the common understanding that false alarms are released too often by Shewhart control charts. As is generally known, control charts are used for visualizing the variability due to chance causes and for visualizing the change in variability due to the occurrence of an assignable cause. The problem is to identify the chance causes. Chance causes are related to the inherent variability exhibited by the "4M" (Machine, Material, Man, Method), which is uncontrollable from the technical, economic and organisational viewpoint. Dr. Shewhart proposed a suitable way to fix the amount of variability due to chance causes. The method is based on relatively small sub-groups, in which the 4M conditions are more or less equal, and identifies the variability due to chance causes with the within-sub-group variability. A rational sub-group is defined as a sub-group within which the variability is due only to random causes and between which the variability is due to special causes, selected to enable the detection of any special cause of variability among sub-groups (ISO/FDIS 3534-2). However, in practice it is difficult to specify a rational sub-group. In practice, sub-groups are made up in terms of a given day, an operator shift or a treatment lot, etc.

Consider X̄-R control charts. The variability within a sub-group is estimated using the mean range after confirming that the X̄-R control chart has not indicated an out-of-control state; the estimated variability within a sub-group is then regarded as the variability due to chance causes. This is a nearly standardized procedure. However, even when the process capability is adequate, the control chart can have many points outside the control lines and, consequently, process engineers distrust control charting.

3.2 Investigations and countermeasures

Because of the unfavorable behavior of control charts experienced by the European Centre for TQM at Bradford University, Caulcutt (1995) pointed out that the above standardized procedure is not necessarily appropriate, as it results in X̄ charts which often release alarms although no cause of instability can be detected. Bissell (1992) suggested that X̄-R control charts be supplemented by a delta chart to monitor the medium-term variability. The delta chart would be used to plot the successive differences of the group means.

Kuzuya (2000) investigated 63 cases of control charts used in the routine mass production stage for monitoring essential quality characteristics of processes exhibiting a capability index exceeding 1.33. He found that half of the X̄ or R charts triggered alarms, thus confirming Caulcutt's (1995) results. We propose a procedure in which the variability due to chance causes includes some of the variability between sub-groups. The essential point is that the process performance in the early-stage mass production is regarded as the standard value of the variability due to chance causes in routine mass production. We explain the procedure by illustrating a heat treatment process. Fig. 3 shows an ordinary X̄-R chart in the early-stage mass production, where the control characteristic is hardness. The R chart indicates an in-control state, but the X̄ chart has many out-of-control points. On the other hand, Fig. 4 shows another X̄-R chart, where the control limits of the X̄ chart are deduced from the overall process variability and not from the mean range R̄.


In this case, the sub-group is composed of a treatment lot; that is, the variability within a sub-group is the variability within a lot. The sub-group has a meaning from the viewpoints of both the physical aspect and quality assurance. Therefore, it is necessary to control the variability within a treatment lot using R charts. Fig. 4 indicates that the process is in control and, hence, the process may proceed from early-stage mass production to routine-stage mass production, where the control limits of the X̄ chart given in Fig. 4 are used as a standard control level. This means that the random variability due to some allowable causes between sub-groups in the early-stage mass production is included as variability due to chance causes; for example, the variability within the material, which is allowed by the company's material standard.
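A hedged numerical sketch of the proposal: for simulated heat-treatment-like data with an allowable lot-to-lot component, the X̄ chart limits based on the within-subgroup range (R̄/d₂) are compared with limits based on the overall variability of the subgroup means; the variance components, subgroup size and the d₂ constant for n = 5 are assumptions of the illustration, not values from the case study.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 40, 5            # preliminary lots (subgroups) and measurements per lot
d2 = 2.326              # Hartley's constant for the range, subgroup size n = 5

# within-lot and (allowable) between-lot standard deviations -- assumed values
sigma_within, sigma_between = 1.0, 0.8
lot_means = rng.normal(50.0, sigma_between, size=m)
data = rng.normal(lot_means[:, None], sigma_within, size=(m, n))

xbar = data.mean(axis=1)
rbar = (data.max(axis=1) - data.min(axis=1)).mean()
grand_mean = xbar.mean()

# conventional limits: chance-cause variability = within-subgroup variability only
sigma_hat_within = rbar / d2
limits_conventional = (grand_mean - 3 * sigma_hat_within / np.sqrt(n),
                       grand_mean + 3 * sigma_hat_within / np.sqrt(n))

# proposed limits: chance-cause variability = overall variability of the
# subgroup means observed in early-stage mass production
sd_xbar_overall = xbar.std(ddof=1)
limits_proposed = (grand_mean - 3 * sd_xbar_overall,
                   grand_mean + 3 * sd_xbar_overall)

out_conv = np.sum((xbar < limits_conventional[0]) | (xbar > limits_conventional[1]))
out_prop = np.sum((xbar < limits_proposed[0]) | (xbar > limits_proposed[1]))
print("R-bar based limits:            ", limits_conventional, "points out:", out_conv)
print("overall-variability limits:    ", limits_proposed, "points out:", out_prop)
```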


The procedure outlined above suggests the following standard procedure of process control activities from pilot production to routine mass production:

Stage 1: Understanding and improving machine performance in pilot production. The machine performance index P_M is given in terms of the tolerance W and the standard deviation s_M representing the machine performance. We decide whether or not to proceed to the early-stage production with reference to the machine performance index P_M.

Stage 2: Understanding and improving process performance in early-stage production. This stage corresponds to Phase 1. We decide whether or not to remove the early-stage production control system and proceed to routine mass production by means of the process performance index P_p, where the standard deviation s_p represents the process performance. In case of a decision to proceed to the next stage, the standard deviation s_p becomes the standard value for control during the routine mass production stage.

Stage 3: Monitoring the process in routine mass production.

This stage corresponds to Phase 2. Any change of the 4M in the production system should lead to a restart in Stage 1. An improvement of the quality level leads to a restart in Stage 2. Improving the machine performance by advanced production engineering results in a reduction of variability within a sub-group, and the chance causes then have to be reconsidered; note, however, that the variability within a sub-group is not necessarily considered as variability due to chance causes.

Fig. 3: Ordinary X̄-R chart in the early-stage mass production

Fig. 4: X̄-R chart, where the control lines of the X̄ chart are taken from the overall process variability instead of the mean range R̄

4 Selection of control characteristics

4.1 A case study (see Isaki, Kuzuya and Nishina (2002))

Consider the following parts production process by a transfer machine, which is a turntable system in the horizontal axis direction, the x-coordinate direction. In this process, drilling holes, cutting sections, tapping screws and so on are performed at each station in turn. The subject of this section is drilling holes. The characteristic to be assured is the location of a hole. We specify a target location (x₀, y₀) for the center of the hole. The coordinates (x₀, y₀) are measured from some specified point. Similarly, the coordinates (x, y) denote the center of a produced hole. Then the position deviation for the hole location is defined as

D = 2 √( (x − x₀)² + (y − y₀)² ).    (4.1)

Its specification region is a circle with a diameter of 0.1.

Fig. 5 shows the histogram of the value D. The sample size is 50. The control chart indicates that the process is in control. In addition, the value of the process capability index C_p is obtained as

The value of C_p indicates that there is a high level of process capability. However, after the process had left the early production stage and proceeded to mass production, a nonconformity issue appeared, although the process capability had been highly acceptable. Fig. 6 shows the two-dimensional plot of the centers of the drilled holes, where the origin of the coordinates is the target location (x0, y0). Fig. 6 indicates that the variability in the x-coordinate is much larger than that in the y-coordinate,

where s_x and s_y denote the standard deviations of the location in the x-coordinate and the y-coordinate, respectively.

Fig. 5: Histogram of the value D (Isaki et al. (2002))

Calculating the process capability for the two coordinates separately, Cpx and Cpy, we obtain: Cpx = 0.94, Cpy = 3.35.

The process capability index of the x-coordinate is much lower than that of the y-coordinate. The essential point in the above case study is that the selection of the quality characteristic has been inappropriate. This is a very instructive case showing the significance of selecting the control characteristic.
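The contrast can be reproduced with a short sketch. The standard deviations below are assumed values chosen only to mimic the reported imbalance (Cpx ≈ 0.94, Cpy ≈ 3.35), and the one-sided index used for D is one common convention, not necessarily the one applied in the case study.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = rng.normal(0.0, 0.0177, size=n)   # deviation from x0 (assumed sigma)
y = rng.normal(0.0, 0.0050, size=n)   # deviation from y0 (assumed sigma)

# Position deviation, Equation (4.1); specification D <= 0.1 (circle of diameter 0.1)
D = 2.0 * np.sqrt(x**2 + y**2)
Cpk_D = (0.1 - D.mean()) / (3.0 * D.std(ddof=1))   # one-sided index for D

# Capability per coordinate, treating the deviation tolerance as +/- 0.05
Cp_x = 0.1 / (6.0 * x.std(ddof=1))
Cp_y = 0.1 / (6.0 * y.std(ddof=1))

print(f"index for D: {Cpk_D:.2f}")   # looks acceptable
print(f"Cp for x   : {Cp_x:.2f}")    # low: the turntable stopping position dominates
print(f"Cp for y   : {Cp_y:.2f}")
```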

Fig. 6: Two-dimensional plots of the center of the produced hole (Isaki et al. (2002))

4.2 Characteristic to be assured and/or monitored

In the above case study the characteristic to be assured is D, shown in Equation (4.1). But if D is selected as the control characteristic, the performance evaluation is misleading and process control results in failure. There are many sources of the location variability, for instance the variability of the jig location, of the setting of the part, and so on. However, the variability observed above is caused by the turntable mechanism of the transfer machine (see Fig. 7). The low process capability with respect to the x-coordinate is due to the variability of the turntable stopping position. The location variability can be decomposed into the variability in the x-coordinate and that in the y-coordinate, which have different sources within the mechanism of the production process.

Fig. 7: Turntable mechanism of the transfer machine

The lesson from this case study can be extended to more general situations, especially when geometric characteristics are involved, for example perpendicularity, deviation from circularity, etc. These characteristics can be decomposed into more elementary characteristics corresponding to relevant process mechanisms. In the case study at hand the process capability study must be based on each of the deviations of the hole location along the x- and y-coordinates, respectively. It may be difficult to find the assignable cause when the control chart indicates an out-of-control state; an inappropriate selection of the control characteristic makes it even more difficult. The control characteristic must be selected in accordance with the relevant elements of the process. The complexity of quality characteristics of modern production processes has grown with the complexity of products, and such characteristics can be monitored only thanks to the great advances in measurement technology. However, the case study shows that a measurement characteristic specified by a related technical standard is not necessarily appropriate as a control characteristic.

5 Concluding remarks

Control charts disappeared long ago from Japanese industries. Some of the key reasons have been addressed in this paper, referring to the rules for indicating assignable causes, the treatment of chance causes, and the selection of control characteristics. As mentioned in the Introduction, the role of control charts has recently been reconsidered in Japanese industry. Although control charts are generally looked upon as fundamental tools in SPC, their unquestioned use yields unwanted results and leads to distrust in the control chart concept itself. When talking about control charts in Japan, we mean Shewhart control charts. However, where computer-aided SPC is available, an integration of SPC and APC and the use of more sophisticated control charts, for example CUSUM charts, could be beneficial. Additionally, Phase 1 and Phase 2 should be strictly distinguished in industry. However, the process industry has some features that make it more difficult to use control charts than in the parts industry; for example, the process mean is apt to drift owing to uncontrollable factors (see Box and Kramer (1992)). In such a situation, the main problem is to arrive at Phase 2.

References

1. ISO 8258 (1991): Shewhart control charts.
2. Western Electric (1956): Statistical Quality Control Handbook, American Telephone and Telegraph Company, Chicago, IL.
3. Nelson, L. S. (1984): "The Shewhart Control Chart - Tests for Special Causes", Journal of Quality Technology, Vol. 16, No. 4, 237-239.
4. Woodall, W. H. and Montgomery, D. C. (1999): "Research Issues and Ideas in Statistical Process Control", Journal of Quality Technology, Vol. 31, No. 4, 376-386.
5. Woodall, W. H. (2000): "Controversies and Contradictions in Statistical Process Control (with discussions)", Journal of Quality Technology, Vol. 32, No. 4, 341-378.
6. Champ, C. W. and Woodall, W. H. (1987): "Exact Results for Shewhart Control Charts With Supplementary Runs Rules", Technometrics, Vol. 29, No. 4, 393-399.
7. Davis, R. B. and Woodall, W. H. (1988): "Performance of the Control Chart Trend Rule Under Linear Shift", Journal of Quality Technology, Vol. 20, No. 4, 260-262.
8. Margavio, T. M., Conerly, M. D., Woodall, W. H. and Drake, L. G. (1995): "Alarm Rates for Quality Control Charts", Statistics & Probability Letters, Vol. 24, 219-224.
9. Nishina, K. (1996): "A study on estimating the change-point and the amount of shift using cusum charts", The Best on Quality, Chapter 14, edited by Hromi, J. D., ASQC.
10. Caulcutt, R. (1995): "The Rights and Wrongs of Control Charts", Applied Statistics, Vol. 44, No. 3, 279-288.
11. Kuzuya, K. (2000): "Control Charts are Renewed by Adaptation of New JIS", Proceedings of the 65th Conference of the Japanese Society of Quality Control, 72-75 (in Japanese).
12. Isaki, Y., Kuzuya, K. and Nishina, K. (2002): "A Review of Characteristics in Process Capability Study", Proceedings of the 32nd Annual Conference of the Japanese Society of Quality Control, 61-64 (in Japanese).
13. Box, G. and Kramer, T. (1992): "Statistical Process Monitoring and Feedback Adjustment - A Discussion", Technometrics, 251-267.

Control Charts for the Number of Children Injured in Traffic Accidents

Pokropp, F., Seidel, W., Begun, A., Heidenreich, M., Sever, K.
Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Holstenhofweg 85, D-22043 Hamburg, Germany
[email protected], [email protected]

1 Introduction and Summary

One of the major objectives of police work in Germany is to reduce the number Y of children injured in traffic accidents. For monitoring police activities, "target numbers" of injuries provided by police authorities were mainly used as control limits. In general, however, those limits do not reflect random variations of Y. Consequently, they either cause false alarms too often or do not give sufficient hints of unusual increases in injury numbers. Improved methods of monitoring the effectiveness of police measures are needed. (See Section 2.) To construct more reliable control limits, a stochastic model for Y should be used. We aimed at developing two "versions" of control limits: (a) for detecting significant deviations from injury incidences in the past, (b) for detecting significant deviations from pre-given target values. (See Section 5.) We had data for about five years in various regions. Doing some data analysis we observed several predominant patterns of Y for all years and all regions: due to weather conditions, for example, typical differences showed up between seasons (e.g. winter and summer); due to family and/or children's activities during holidays, vacation periods differed from school periods; weekends behaved differently from normal weekdays, and even particular weekdays showed up in the data. (See Section 3.) The structure of the patterns we had recognized in the data suggested looking for explanatory variables that could represent "seasonal" effects, particularly weather conditions, periods of high or low traffic activity, and different intensities of children's participation or presence in traffic. Weekday effects also had to be taken care of. Since one of our major objectives was to predict injury numbers for several regions on the basis of parameters estimated from data in the past and values of explanatory variables in the future which cannot be controlled, we had to look for variables prevailing in time and region. Thus we decided to base a model for the number Y_T of injured children within a time period T on daily numbers of injured children,

explained by the month of the day, the weekday, and the holiday or "bridge day" character of the day. (See Subsections 4.1, 4.2.) Parameter estimation was done by maximum likelihood methods. To get some insight into the performance of ML estimation for our model we computed the ML estimators on simulated populations. The parameters used for the simulation process were sufficiently well "reproduced" by the estimated parameters. Validation of the model was done heuristically as well as by well-established methods for Generalized Linear Models. Results were encouraging. (See Subsection 4.3.) The construction of control limits was done by simulation. Using the estimated parameters, possibly "corrected" by pre-given target values, a random sample from the estimated distribution of Y_T (the number of injured children during time period T) was drawn. Empirical quantiles from the sample were taken as control limits. Several examples of control limits are given and commented on. (See Section 5.) Finally, the power of the control scheme is discussed in Section 6.

2 Applied Practices

A first step to judge observed numbers of injuries, and to discover trends or changes of patterns, is to look at average numbers in the past. This was done by police personnel engaged in monitoring injuries caused by traffic accidents, but without any reference to the stochastic character of the involved events. In very detailed overviews local police personnel can register locations which turn out to be critical accident spots. Further, they can identify areas of special population structures. Numbers of injuries can then be related to the character of the location, the population structure, etc. A typical way to proceed in practice is to agree, for each particular local area and sometimes also for special population groups (e.g. nationalities), on an annual absolute target number TN of injuries. For better comparison between different areas, the corresponding relative number of injuries RN = TN × 100000 / (size of population ≤ 15 years) is also frequently used. As these target numbers should reflect the goal of cutting down accident numbers, they are taken as fractions of corresponding averages of observed numbers from the last years. Usually the size of the fraction varies between different areas or population groups; it is a matter of political bargaining. To ensure continuous controlling within a year, TN and RN (for a specified region) were percentually "distributed" over quarters and then uniformly broken down to monthly TNs and RNs. Percentages for the distribution are chosen to cope with "seasonal" effects. A real-life example is:

quarter (months)      1 (1-3)      2 (4-6)      3 (7-9)      4 (10-12)
percent (per month)   17.3 (5.77)  30.9 (10.3)  31.6 (10.5)  20.2 (6.73)
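The arithmetic of this breakdown is easy to reproduce. The sketch below uses the percentages of the example and, purely as an illustration, the annual TN = 54 of region A from Table 1 below.

```python
import numpy as np

# Quarterly percentages from the example; each quarter is split uniformly over its months.
quarter_pct = [17.3, 30.9, 31.6, 20.2]
monthly_pct = np.repeat(np.array(quarter_pct) / 3.0, 3)   # 12 monthly percentages

TN = 54                                    # annual target number (region A)
monthly_targets = TN * monthly_pct / 100.0
accumulated = np.cumsum(monthly_targets)

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
for month, acc in zip(months, accumulated):
    print(f"{month}: {acc:5.2f}")
# The December value is ~54, up to rounding of the published percentages.
```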

TN and RN have always been understood to be upper limits for injury numbers, although they were taken as fractions of averages of observed numbers. This method can cause serious problems, particularly if injury numbers are small. This will be demonstrated by the following real-life example for two "very small" regions A and B, where in Table 1 we list the accumulated target number of injuries and the accumulated observed number of injuries.

Table 1. Target = accumulated target number of injuries, Obs = accumulated observed number of injuries

region   size of population < 15 years   TN   RN
A        15 100                          54   357.50
B        17 700                          53   300.22

           Jan   Feb   Mar   Apr    May    Jun    Jul    Aug    Sept   Oct    Nov    Dec
A: Target  3.11  6.23  9.34  14.90  20.47  26.03  31.72  37.40  43.09  46.73  50.36  54
   Obs     10    13    18    22     23     27     36     39     50     52     53     54
B: Target  3.06  6.11  9.17  14.63  20.09  25.55  31.13  36.71  42.29  45.86  49.43  53
   Obs     1     4     6     12     20     25     30     39     49     57     60     61

In region A, the accumulated observed numbers (Obs) increase rapidly beyond the accumulated target numbers (Target) and later slow down, ending up with Obs = 54 = Target. In contrast, we find a very slow growth of the accumulated observed numbers at the beginning of the year in region B, but extreme growth towards the year's end, with Obs = 61 and Target = 53; in this context we might well say 61 >> 53. Naturally, we have to ask for an explanation of the peculiar difference between region A and region B. Can we find "structural" differences between A and B, or is it just stochastics which is "active"? There seem to be two major weak points in the described practitioners' procedure: (1) random effects in injury numbers are not taken into account; (2) the annual TN is "distributed" over quarters and months without any reference to a model of the "seasonal" behaviour of injury numbers, possibly with local specifications. We propose to base controlling on a stochastic model for the distribution of injury numbers. Targets may be defined in terms of expected values of numbers of injuries and indirectly monitored by control limits derived from the quantiles of the distribution of Y. Seasonal patterns are explained by the model. The model further allows targets to be specified individually for different time periods, even for an arbitrary collection of days, as long as the "values" of the explanatory variables in the model are known for those days.

3 Some Data Analysis

We had daily data for the number of injured children from 1997 until 2001. For a better overview let us look at monthly numbers for only three years: 1998, 2000, 2001. We do this for a "large" population of about 3 million children of age ≤ 15 (Figure 1) and a "small" population of about 40 thousand children (Figure 2).

Fig. 1. Injury Numbers in 1998, 2000 and 2001 in a Large Population of ≈ 3 Mio Children

Look particularly at July and August in Figure 1. In 1998 and 2000, all of July and the first third of August were within the summer holiday period. Summer holidays in 2001 began on July 5 and ended on August 18. The number of injured children in July 2001 is higher than in July 1998 and 2000, and lower in August 2001 than in August 1998 and 2000. If we look at April we find a remarkably low number of injuries in 2001. A closer look shows that particularly at the end of April the number of injuries is low in 2001. This might partly be attributed to the fact that May 1 was a Tuesday, making Monday, April 30, a "bridge day" with less traffic activity than on a "normal" Monday. Similar patterns can be recognized in Figure 2. Additionally we may conclude that in different regions the "contribution" of a month (here e.g. September) can vary considerably over several years. If we look at the calendar we notice that September 1998 has only 8 weekend days (i.e. Saturdays or Sundays), whereas September 2000 and 2001 have 9 and 10 weekend days. A closer look at daily data often reveals the importance of the "distribution" of weekends, holidays or "bridge days" within a month for the number of injuries.

Fig. 2. Injury Numbers in 1998, 2000 and 2001 in a Small Population of ≈ 40 000 Children

This leads us to look at the number of injured children per weekday. This is done for 1999 in Figure 3. A very similar picture shows up if we consider the time period 1997-2001. Apparently, the weekend days Saturday and Sunday are "good" for children; Friday is the "worst" day, and Monday through Thursday are all similar.

4 The Model

4.1 Explanatory Variables

The structure of the patterns we had recognized in the data suggested looking for explanatory variables that could represent "seasonal" effects, particularly weather conditions, and periods of high or low traffic activity, as well as different intensities of children's participation or presence in traffic. Daily effects (look at Figure 3) also had to be taken care of. (See also (3), the second version for log λ_t.) Since one of our major objectives was to predict injury numbers for several regions on the basis of parameters estimated from data in the past and values of explanatory variables in the future which cannot be controlled, we had to look for variables suitable for our purposes and prevailing in time and region. Most obvious was that the weekday played an essential role for the number of injuries. Periods of school holidays and "bridge" days showed up, too.

Fig. 3. Number of Injuries per Weekday (Mo-Su) in 1999

(Different from a day in a period of school holidays, a bridge day lies within a group of at most four days off school.) Since weather conditions can strongly influence traffic but cannot be foreseen a longer period ahead, we took months as indicators for weather and other conditions which influence traffic activities. Thus we decided to base our model on daily numbers of injured children, explained by the month of the day, the weekday, and the holiday or bridge-day character of the day. The number of injured children in traffic accidents in a given time interval T is modelled as the aggregated daily numbers of injuries for the days in T.

4.2 Model Assumptions

We introduce the following

Notation for day t:
Y_t = number of injured children at day t
M_m(t) = 1 if t belongs to month m, m = 1, ..., 12
W_w(t) = 1 if t is weekday w, w = 1, ..., 7 (see also (3))
H(t) = 1 if t is a day within "longer" school holidays ("longer" = more than 4 days)
B(t) = 1 if t is a "bridge day", i.e. a day within a short period (at most 4 days) of school vacancy.
The M's, W's, H and B are indicator variables with alternative value 0.

For each day t we model Y_t to be distributed as a Γ-mixture of Poisson-distributed variables. The mixing variable Z ("frailty") is introduced to take care of overdispersion. The Γ(a, b)-distribution is given by the Γ-density:

f_{Γ(a,b)}(x) = (b^a / Γ(a)) x^{a−1} exp{−bx},  x > 0,  with expectation a/b and variance a/b².    (2)

NBM, a Negative Binomial Model: the daily model, i.e. the model for Y_t, is given as a Γ-mixture of Poisson variables:

(a) Z ~ Γ(1/σ², 1/σ²); E(Z) = 1, Var(Z) = σ²;
(b) (Y_t | Z = z) ~ Pois(z · λ_t);
(c) for Y_t we then have a Negative Binomial Model (NBM): Y_t ~ NB with E(Y_t) = λ_t, Var(Y_t) = λ_t(1 + σ²λ_t); alternative notation: Y_t ~ NB(b, p) with b := 1/σ², p := 1/(1 + λ_t σ²);
(d) {Y_t : t ∈ T} are independent for each set T of days.

In the first version for log λ_t, Wednesday is day w = 7 (arbitrarily chosen) and is sufficiently taken care of by W_w(t) = 0 for w ≤ 6; thus, δ_7 is not in the model. Second version for log λ_t:

log λ_t = Σ_{m=1}^{12} β_m M_m(t) + Σ_{w=1}^{3} δ_w W_w(t) + γ_H H(t) + γ_B B(t).    (3)

After grouping weekdays with estimated δ's close to zero we obtained at most four weekday groups: the "Wednesday" group (also containing Monday, Tuesday and Thursday) with W_w = 0, Friday, Saturday and Sunday. Thus, at most three δ's are needed. This was done separately for each region under consideration. (Sometimes Saturday and Sunday form one group, and only two δ's are needed.) In short notation: log λ_t = x_t'β with β = (β_1, ..., β_12, δ_1, δ_2, δ_3, γ_H, γ_B)' and x_t = [M_1(t), ..., M_12(t), W_1(t), W_2(t), W_3(t), H(t), B(t)]'.

NBM for time periods T (e.g. months, quarters of a year): Y_T = number of injured children during time period T;

Y_T = Σ_{t∈T} Y_t  with Y_t according to (1), (3).    (4)

We also write Y_T = Σ_{t=1}^{n} Y_t if T = {1, ..., n}, without loss of generality.
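As a quick illustration of the daily model, the following sketch draws overdispersed daily counts as a Γ-mixture of Poisson variables and checks the negative binomial mean-variance relation Var(Y_t) = λ_t(1 + σ²λ_t); the values of λ_t and σ² are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.1          # overdispersion parameter (assumed value)
lam = 4.0             # daily mean lambda_t (assumed value)
n = 200_000

# Gamma-Poisson mixture: Z ~ Gamma(1/sigma2, 1/sigma2) with E(Z) = 1, Var(Z) = sigma2
Z = rng.gamma(shape=1.0 / sigma2, scale=sigma2, size=n)
Y = rng.poisson(Z * lam)

print("empirical mean    :", Y.mean())                  # ~ lambda
print("empirical variance:", Y.var())                   # ~ lambda * (1 + sigma2*lambda)
print("NB variance       :", lam * (1.0 + sigma2 * lam))
```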

4.3 Estimation and Validation

To estimate the parameters in (3) with (1), the marginal likelihood function f_MaL, resp. L = log f_MaL, was maximized. This was done on the basis of all data available from about five years. With (2), (3) and (4) we obtain the marginal likelihood; its maximization yields estimates σ̂² and β̂, and thus λ̂_t for all t.

To check the performance of the marginal ML method for our model, we estimated parameters from artificial data for the days of about five years, which had been generated according to (3) with known, given parameters. The results were encouraging: the β̂'s, δ̂'s, γ̂'s and σ̂² turned out to be very close to the given parameter set of β's, δ's, γ's and σ² used for the generation of the artificial data set. To intuitively verify a reasonably good performance of model (4) we compared the known monthly numbers of injured children of a specified year with the corresponding estimated expected numbers, where the estimation was based on the parameter estimates from the data of the four other years; i.e. we made an "ex-post prognosis" for already known monthly data of one year, although the data (daily injury numbers) might also well come from days later than the time period for which the injury numbers were to be "predicted". Again we judged the results to be sufficiently good to base the construction of control limits on model (4) with (3). A systematic way to look at the adequacy of a model is to apply deviance measures from the theory of Generalized Linear Models (GLMs). One way of modelling overdispersion (see Lee, Nelder (2000)) is to use a Poisson-Gamma model (NBM = Negative Binomial Model), as we did in (3), (4). We compared the performance of our NBM with the performances of a Poisson Generalized Linear Model (PGLM) and of a Quasi-Likelihood Model (QLM).
PGLM: the PGLM has overdispersion σ² = 1: Var(Y_t) = 1 · E(Y_t).
QLM: in a QLM we have Var(Y_t) = σ² · V(Y_t) with some suitable variance function V and without specifying the distribution of Y_t within the exponential family. (The lack of further model specification is somewhat unsatisfactory. We follow Christensen's comment on QLM: "Such extensions are widely accepted as providing valuable data analytic tools; however, many people have difficulty in understanding the theoretical basis for them." Christensen (1990), p. 364.)
To compare the NBM with the PGLM and the QLM we made model checking plots based on the deviance residuals for the three models. (Examples are given in Figures 4 and 5.) Because these models have different variance forms we used a variance checking plot, where the absolute values of the studentized deviance residuals were plotted against the fitted values. The deviance D in a GLM compares the log-likelihood function l(λ̂; y) with ML-estimated λ̂ and the log-likelihood function for a saturated model, where we have as many parameters as observations and thus λ̂ = y: D(y; λ̂) = 2σ̂² (l(y; y) − l(λ̂; y)) = Σ_{t=1}^{n} d_t. Deviance residuals are defined as r_t = sign(y_t − λ̂_t) √d_t. Pierce, Schaefer (1986) show that the deviance residuals may be considered as (asymptotically) normal with mean zero and common variance, irrespective of the distribution postulated for the y_t. For the model checking plots the usual studentized deviance residuals r*_t are used. Two model checking plots were made: 1. The r*_t were plotted against the fitted values transformed to the constant information scale of the assumed distribution. In all plots the residuals seem to fluctuate randomly around zero. This indicates that there is no misspecification of the link function or the linear predictor. 2. The absolute values |r*_t| were plotted against the fitted values transformed to the constant information scale. These plots may be used to check the variance function. Here one can see that the |r*_t| of the PGLM (Figure 5) reach much higher values than the |r*_t| of the NBM (Figure 4). That means that the residuals of the PGLM have a greater variance than the residuals of the NBM. Further, there is no systematic change of range (i.e. no increase or decrease of the variance) in the plot for the NBM. Plots for the QLM looked very much like the plots for the NBM. But in view of a certain lack of model structure, as mentioned above, the NBM should be preferred.
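The variance-checking idea can be sketched by hand for the Poisson (PGLM) view of overdispersed data. The deviance contributions below are the standard Poisson ones, the data are simulated placeholders, and the crude studentization merely stands in for the usual GLM studentized residuals.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder "fitted" daily means and overdispersed (NB-type) counts
lam_hat = rng.uniform(2.0, 15.0, size=365)
sigma2 = 0.1
y = rng.poisson(rng.gamma(1.0 / sigma2, sigma2, size=365) * lam_hat)

# Poisson deviance contributions and deviance residuals (PGLM view of the data)
with np.errstate(divide="ignore", invalid="ignore"):
    term = np.where(y > 0, y * np.log(y / lam_hat), 0.0)
d = 2.0 * (term - (y - lam_hat))
r = np.sign(y - lam_hat) * np.sqrt(np.clip(d, 0.0, None))

# Crude studentization for the |r*| variance-checking plot
r_star = r / r.std(ddof=1)
print("mean |r*| :", np.abs(r_star).mean())
print("max  |r*| :", np.abs(r_star).max())   # inflated when the Poisson variance is too small
```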

5 Control Limits

5.1 Construction for Predicted Numbers of Injuries

The prediction Ŷ_T := Y_T(Pred) for Y_T (4) is clearly given by

Ŷ_T = Σ_{t∈T} λ̂_t  with  log(λ̂_t) := x_t' β̂    (x_t, β̂: see (3)(f)).    (6)

For the construction of control limits for Y_T (4) we need the estimated distribution of Y_T. Since the analytic handling of that distribution is not obvious, simulation procedures were used: for each t ∈ T, generate realizations y_{t,s} (s = 1, ..., 2000) from

NB(b̂ := 1/σ̂², p̂_t := 1/[1 + λ̂_t σ̂²])    (see (3)(c)).

The realisations y_{T,s} = Σ_{t∈T} y_{t,s}, s = 1, ..., 2000, provide the simulated α/2-quantile y_low and (1 − α/2)-quantile y_upp as simulated lower and upper (1 − α) control limits for Y_T.    (7)
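A compact sketch of the simulation step (7); λ̂_t and σ̂² are placeholder estimates, and numpy's negative_binomial(n, p) parameterization coincides with NB(b̂, p̂_t) as used in (3)(c).

```python
import numpy as np

def control_limits(lam_hat, sigma2_hat, alpha=0.15, n_sim=2000, seed=0):
    """Simulated (1 - alpha) control limits for Y_T = sum of daily NB counts.

    lam_hat: array of estimated daily means for the days t in period T.
    """
    rng = np.random.default_rng(seed)
    b = 1.0 / sigma2_hat                          # NB 'size' parameter b-hat
    p = 1.0 / (1.0 + lam_hat * sigma2_hat)        # NB success probability per day
    # n_sim realizations of every daily count, summed over the period
    y_T = rng.negative_binomial(b, p, size=(n_sim, lam_hat.size)).sum(axis=1)
    lower = np.quantile(y_T, alpha / 2.0)
    upper = np.quantile(y_T, 1.0 - alpha / 2.0)
    return lower, upper, lam_hat.sum()

# Illustration with placeholder estimates for a 31-day month
lam_hat = np.full(31, 20.0)                       # assumed daily means
low, upp, pred = control_limits(lam_hat, sigma2_hat=0.05)
print(f"Pred = {pred:.0f}, 85% limits = [{low:.0f}, {upp:.0f}]")
```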

Fig. 4. |r*|-Plot for the NBM in a Large Population of ≈ 3 Mio Children (|r*| against scaled fitted values)

Fig. 5. |r*|-Plot for the PGLM in a Large Population of ≈ 3 Mio Children (|r*| against scaled fitted values)

Typical results with α = 0.15 and α = 0.01 for the months of 2003, the quarters of 2003 and the year 2003 are shown in Figures 7, 8 and 6 for a large population of about 3 million children, and in Figures 9, 10 and 11 for a small population of about 40 000 children.

Fig. 6. Control Limits for Injuries, Year 2003, in a Large Population of ≈ 3 Mio Children. (Exact values are in the Appendix, Table 4.)

Clearly, an observed number Y_T of injuries in time period T may well differ from the predicted number Ŷ_T. In Figure 12 we show a typical picture of Y_T in relation to the control limits, which are constructed "around" Ŷ_T.

5.2 Construction for Given Target Numbers of Injuries

Target numbers (in general as upper limits) for the number of injuries during some time period T are given by police authorities as percentage numbers q_T for the decrease of predicted (often derived from observed) numbers within some population. Percentages 1 − q_T are easily converted into given target numbers ȳ_T of injuries. For our purposes (and in view of our model) it is much more convenient to use ȳ_T rather than 1 − q_T. Assume that for time period T we have a pre-given target number ȳ_T of injuries (8), for which control limits are wanted without any reference to what might happen in sub-periods of T. In view of (8) we then can proceed as follows: introduce a correction factor c_T with

(a) c_T := 1 − q_T;
(b) ȳ_T = c_T · Ŷ_T = Σ_{t∈T} λ̃_t  with  λ̃_t := c_T · λ̂_t (see (6)); proceed as in (7) with λ̂_t replaced by λ̃_t to obtain control limits for ȳ_T.    (9)

Fig. 7. Control Limits for Injuries, Months 2003, in a Large Population of ≈ 3 Mio Children. (Exact values are in the Appendix, Table 2.)

Fig. 8. Control Limits for Injuries, Quarters 2003, in a Large Population of ≈ 3 Mio Children. (Exact values are in the Appendix, Table 3.)

Fig. 9. Control Limits for Injuries, Months 2003, in a Small Population of ≈ 40 000 Children. (Exact values are in the Appendix, Table 5.)

Fig. 10. Control Limits for Injuries, Quarters 2003, in a Small Population of ≈ 40 000 Children. (Exact values are in the Appendix, Table 6.)

Fig. 11. Control Limits for Injuries, Year 2003, in a Small Population of ≈ 40 000 Children. (Exact values are in the Appendix, Table 7.)

Fig. 12. Observed Number of Injuries and Control Limits in Year 2000 in a Medium Size Population of ≈ 500 000 Children

What can we do if control limits are wanted for a target number in time period T as well as for target numbers in sub-periods T(i) ⊂ T? Suppose that we have a disjoint decomposition:

(a) T = ∪_{i=1}^{k} T(i), T(i) ≠ ∅ for all i, T(i) ∩ T(j) = ∅ for i ≠ j;
(b) ȳ_T = pre-given target number in T (as in (8)); ȳ_{T(i)} = pre-given target number in T(i), i = 1, ..., k − 1;
(c) ȳ_{T(k)} cannot be pre-given (see (12) with (14)).    (10)

We do have the predicted injury numbers in all time periods of interest according to (6):

Ŷ_T = Σ_{i=1}^{k} Ŷ_{T(i)} = Ŷ_{T(*)} + Ŷ_{T(k)},  with  Ŷ_{T(*)} = Σ_{i=1}^{k−1} Ŷ_{T(i)}.    (11)

Let now, as in (9) and with (10),

c_{T(i)} := ȳ_{T(i)} / Ŷ_{T(i)},  i = 1, ..., k − 1.    (12)

Define now the "correction factor" for T(k):

c_{T(k)} := (ȳ_T − Σ_{i=1}^{k−1} ȳ_{T(i)}) / Ŷ_{T(k)}.    (13)

We then obviously obtain (also with (11))

Σ_{i=1}^{k} c_{T(i)} · Ŷ_{T(i)} = ȳ_T.    (14)

Equation (13) makes sense if 0 < c_{T(k)}.

Due to (11), (12), (13), (14) we have the corrected

λ̃_t = c_{T(i)} · λ̂_t  for t ∈ T(i),

and the construction of control limits can be done as in (9)(b). This procedure opens up a large variety of control limits according to (9). Note in particular that in (10)(a) the T(i) are not necessarily intervals. We thus might, for suitable special subsets of T (in view of Figure 3, e.g. for the subset of all Fridays!), construct control limits to pre-given target numbers of injuries.
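A sketch of the correction-factor bookkeeping for sub-periods, under placeholder predictions and targets; the control limits for each sub-period would then be simulated from the corrected rates exactly as in the sketch for (7) above.

```python
import numpy as np

def corrected_rates(lam_hat_by_period, target_total, targets_given):
    """Correction factors c_T(i); the last sub-period T(k) absorbs the remaining target.

    lam_hat_by_period: list of arrays of daily lambda-hat, one array per sub-period T(i).
    targets_given: pre-given targets for T(1), ..., T(k-1).
    """
    preds = np.array([lam.sum() for lam in lam_hat_by_period])   # Y-hat_T(i)
    c = np.empty(len(preds))
    c[:-1] = np.asarray(targets_given) / preds[:-1]              # (12)
    c[-1] = (target_total - sum(targets_given)) / preds[-1]      # (13)
    if c[-1] <= 0:
        raise ValueError("remaining target is not positive, c_T(k) <= 0")
    # corrected daily rates lambda-tilde_t = c_T(i) * lambda-hat_t
    return [ci * lam for ci, lam in zip(c, lam_hat_by_period)], c

# Placeholder example: a quarter split into its three months
lam_by_month = [np.full(31, 20.0), np.full(28, 20.0), np.full(31, 20.0)]
lam_tilde, c = corrected_rates(lam_by_month, target_total=1500, targets_given=[500, 450])
print("correction factors:", np.round(c, 3))
```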

6 Power of Control Limits

Suppose that the expected value of the number of injuries in some time period T differs from the target number ȳ_T by some factor a, i.e.

E(Y_T) = a · ȳ_T.    (15)

How likely is it that our procedure will detect such a deviation? Let us assume that control limits y_low and y_upp have been constructed for ȳ_T = Σ_{t∈T} λ̃_t according to (9). For a given factor a we assume that

Y_T = Σ_{t∈T} Y_t,  where now  Y_t ~ NB(b, p_t(a))  with  b = 1/σ²,  p_t(a) = 1/(1 + a · λ̃_t σ²),    (16)

and (15) holds. We are now interested in the power

β(a) = P_a(Y_T < y_low or Y_T > y_upp),    (17)

computed with a simulated distribution function F_a of Y_T under (16). We calculated the power β(a) (17), for a range of several a near 1, in a large population (≈ 3 million children) with 0.99-control limits for the year 2000 (see Figure 13) and with 0.85-control limits for the year 2000 (see Figure 14) and for May 2000 (see Figure 15); in a small population (≈ 40 thousand children) the power for the year 2000 (see Figure 16) and for May 2000 (see Figure 17) with 0.85-control limits was calculated.
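The power computation (17) is again a small simulation; the daily rates, σ² and α below are placeholders.

```python
import numpy as np

def power(lam_tilde, sigma2, a, alpha=0.15, n_sim=2000, seed=0):
    """Simulated power (17): P(Y_T outside the (1-alpha) limits) when daily means are a*lambda_tilde_t."""
    rng = np.random.default_rng(seed)
    b = 1.0 / sigma2
    # Limits are built under a = 1, as in (9)/(7)
    p0 = 1.0 / (1.0 + lam_tilde * sigma2)
    y0 = rng.negative_binomial(b, p0, size=(n_sim, lam_tilde.size)).sum(axis=1)
    low, upp = np.quantile(y0, [alpha / 2.0, 1.0 - alpha / 2.0])
    # Out-of-control distribution (16): daily means scaled by the factor a
    pa = 1.0 / (1.0 + a * lam_tilde * sigma2)
    ya = rng.negative_binomial(b, pa, size=(n_sim, lam_tilde.size)).sum(axis=1)
    return np.mean((ya < low) | (ya > upp))

lam_tilde = np.full(366, 30.0)          # assumed daily rates for year 2000
for a in (0.95, 1.0, 1.05, 1.10):
    print(f"a = {a:.2f}: power ~ {power(lam_tilde, 0.05, a):.2f}")
```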

Fig. 13. Power in 2000, α = 0.01; Large Population: ≈ 3 Mio Children (a = factor of deviation)

Let us first look at the large population. In Figure 13 we see that with significance 0.01 an increase resp. decrease of injury numbers of about 5% (i.e. |a − 1| = 0.05) within year 2000 will be detected with probability of around 0.60; this power becomes ≈ 0.95 if we have significance 0.15, as is seen in Figure 14. For a short period of time like a month (here May 2000), Figure 15 shows a power of hardly more than 0.30 for |a − 1| = 0.05. But a

Fig. 14. Power in 2000, α = 0.15; Large Population: ≈ 3 Mio Children

Fig. 15. Power in May 2000, α = 0.15; Large Population: ≈ 3 Mio Children

Fig. 16. Power in 2000, α = 0.15; Small Population: ≈ 40 000 Children

Fig. 17. Power in May 2000, α = 0.15; Small Population: ≈ 40 000 Children

larger increase resp. decrease of about 10% would be detected with probability 0.70. In a small population, the probability of detecting a 5% increase resp. decrease within one year is less than 0.20, as we see in Figure 16; only large increases resp. decreases of about 20% have a probability of about 0.70 of being detected. Even very large increases resp. decreases (say 20%) within one month will hardly be detected (power ≈ 0.20), as shown in Figure 17. Due to the character of our model, the results on the power of the control limits for injury numbers should not be surprising. Control limits and their power are useful for sufficiently large populations and sufficiently large time periods, corresponding to the (in time and space) "global" character of our model. We might additionally take the hint that "local" events require more individual treatment.


7 Appendix

Table 2. Large Population of ≈ 3 Mio Children: CL for Injuries, Months 2003

Month      Jan   Feb   Mar   Apr   May    Jun    Jul    Aug   Sept  Oct   Nov   Dec
Pred       606
Lower 85%  571   628   753   856   1242   1187   1119   902   946   834   702   626
Lower 99%  530   580   697   788   1159   1113   1045   839   888   771   652   583

Table 3. Large Population of ≈ 3 Mio Children: CL for Injuries, Quarters 2003

Quarter    1 (Jan-Mar)  2 (Apr-Jun)  3 (Jul-Sept)  4 (Oct-Dec)
Lower 85%  1999         3365         3037          2219
Lower 99%  1916         3228         2921          2130

Table 4. Large Population of ≈ 3 Mio Children: CL for Injuries, Year 2003

Year   Upper 99%  Upper 85%  Pred   Lower 85%  Lower 99%
2003   11362      11155      10984  10791      10575

Table 5. Small Population of ≈ 40 000 Children: CL for Injuries, Months 2003

Month      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sept  Oct  Nov  Dec
Upper 99%  14   13   18   19   25   26   31   17   25    17   22   19
Upper 85%  9    9    13   14   18   19   23   12   19    13   16   14
Pred       6    6    10   10   14   14   18   9    15    9    12   10

Table 6. Small Population of ≈ 40 000 Children: CL for Injuries, Quarters 2003

Quarter    1 (Jan-Mar)  2 (Apr-Jun)  3 (Jul-Sept)  4 (Oct-Dec)
Lower 85%  17           31           34            26

Table 7. Small Population of ≈ 40 000 Children: CL for Injuries, Year 2003

Year   Upper 99%  Upper 85%  Pred  Lower 85%  Lower 99%
2003   166        147        134   121        106

References

1. Christens, P. F. (2003): Statistical Modelling of Traffic Safety Development. PhD thesis, The Technical University of Denmark, IMM, Denmark.
2. Christensen, R. (1990): Log-Linear Models. Springer-Verlag, New York.
3. Dobson, A. J. (2002): An Introduction to Generalized Linear Models. Chapman and Hall.
4. Lee, Y., Nelder, J. A. (2000): Two Ways of Modelling Overdispersion in Non-Normal Data. Applied Statistics, Vol. 49, p. 591-598.
5. McCullagh, P., Nelder, J. A. (1989): Generalized Linear Models. Chapman and Hall.
6. Pierce, D. A., Schaefer, D. W. (1986): Residuals in Generalized Linear Models. JASA, Vol. 81, p. 977-986.
7. Toutenburg, H. (2003): Lineare Modelle. Physica-Verlag.

A New Perspective on the Fundamental Concept of Rational Subgroups

Marion R. Reynolds, Jr.1 and Zachary G. Stoumbos2

1 Virginia Polytechnic Institute & State University, Department of Statistics, Blacksburg, VA 24061-0439, USA, [email protected]
2 Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8054, USA, [email protected]

Summary. When control charts are used to monitor processes to detect special causes, it is usually assumed that a special cause will produce a sustained shift in a process parameter that lasts until the shift is detected and the cause is removed. However, some special causes may produce a transient shift in a process parameter that lasts only for a short period of time. Control charts are usually based on samples of n ≥ 1 observations taken with a sampling interval of fixed length, say d. The rational-subgroups concept for process sampling implies that sampling should be done so that any change in the process will occur between samples and affect a complete sample, rather than occur while a sample is being taken, so that only part of the sample is affected by the process change. When using n > 1, the rational-subgroups concept seems to imply that it is best to take a concentrated sample at one time point at the end of the sampling interval d, so that any process change will occur between samples. However, if the duration of a transient shift is less than d, then it appears that it might be beneficial to disperse the sample over the interval d, to increase the chance of sampling while this transient shift is present. We investigate the question of whether it is better to use n > 1 and either concentrated or dispersed sampling, or to simply use n = 1. The objective of monitoring is assumed to be the detection of special causes that may produce either a sustained or a transient shift in the process mean μ and/or process standard deviation σ. For fair comparisons, it is assumed that the sampling rate in terms of the number of observations per unit time is fixed, so that the ratio n/d is fixed. The best sampling strategy depends on the type of control chart being used, so Shewhart, exponentially weighted moving average (EWMA), and cumulative sum (CUSUM) charts are considered. For each type of control chart, a combination of two charts is investigated: one chart designed to monitor μ, and the other designed to monitor σ. The conclusion is that the best overall performance is obtained by taking samples of n = 1 observations and using an EWMA or CUSUM chart combination. The Shewhart-type chart combination with the best overall performance is based on n > 1, and the choice between concentrated and dispersed sampling for this control chart combination depends on the importance attached to detecting transient shifts of duration less than d.

1 Introduction

Consider the problem of using control charts to monitor the mean μ and standard deviation σ of a normal process variable, where the objective is to detect small as well as large changes in μ or σ. A change in μ or σ may correspond to a sustained shift that lasts until it is detected by a control chart and removed. Most evaluations of the performance of control charts consider sustained shifts. A change in μ or σ may also correspond to a transient shift that lasts for only a short period of time, even if not detected by a control chart. For example, a process may be affected by temporary changes in electrical voltage caused by problems external to the process being monitored. The general question being considered here is how to sample from the process to detect sustained and transient shifts. The traditional approach to deciding how to sample for control charts is based on the rational-subgroups concept. This concept was introduced by Shewhart (1931) and is discussed in standard texts such as Hawkins and Olwell (1998), Ryan (2000), and Montgomery (2005). The basic idea of this concept is that sampling should be done so that it is likely that any process change will occur between samples and thus affect a complete sample, rather than occur while a sample is being taken, so that only part of the sample is affected by the process change. There are a number of well-known applications of the rational-subgroups concept. For example, if there is a change in process personnel every eight hours, then a sample should not overlap two of these eight-hour periods, because a process change may correspond to the change in personnel. Any sample containing observations from the two periods would contain observations from both before and after the change. As another example, consider a process that consists of multiple streams, where a process change can affect only one stream. For example, components may be manufactured simultaneously on five machines and a special cause may affect only one of the five machines. In this situation, a sample should not consist of one observation from each machine, because a change in only one machine would tend to be masked by the observations from the other unchanged machines. In this paper, we assume that when a process change occurs, it is just as likely to occur in one place as another. Thus, we assume that we can avoid situations such as sampling from two different work periods. In addition, we will not investigate the situation where only part of the sequence of observations is affected by the process change, as would occur, for example, when sampling from multiple streams and only one stream is affected by the change. Thus, we assume that when a process change occurs, it affects all observations obtained while this change is in effect. To more clearly explain the sampling issues that we will investigate, consider a process that is currently being monitored by taking samples of size n, where n = 4,

using a sampling interval of d, where d = 4 hours. Assume that the four observations in a sample are taken at essentially the same time, so that the time between observations can be neglected. For example, the sample might consist of four consecutive items produced. We call this concentrated sampling, because the four observations are concentrated at one time point. Concentrated sampling seems to correspond to the rational-subgroups concept, because any process change is likely to occur during the four-hour time interval between samples. Consider a transient shift that lasts for a duration of l hours. For example, if a change in the electrical voltage to the process lasts for only two hours, then l = 2 hours. If a concentrated sample is taken every four hours, then a transient shift of duration l = 2 hours may be completely missed. To detect transient shifts of short duration, it appears that it would be better to spread out the sampling over the four-hour interval, with one observation every hour (see Reynolds and Stoumbos (2004a, 2004b) and Montgomery (2005)). In general, a sample of n > 1 observations spread out over the time interval d, with a time interval of d/n between each observation, will be called dispersed sampling, because the observations are dispersed over the interval. The objective of this research work is to investigate several questions connected with concentrated versus dispersed sampling. In particular, for detecting sustained and/or transient shifts, we investigate the question of whether it is better to use concentrated sampling and take all observations at the same time, or instead use dispersed sampling and spread the observations out over the interval d. If dispersed sampling is used with n = 4 and d = 4, then observations are taken individually every hour. In this case, we have the option of plotting a point after each observation, so this would correspond to n = 1 and d = 1. So, if observations are taken individually every hour, we investigate the question of whether it is better to plot a point after each observation, corresponding to n = 1, or instead to defer plotting until four observations have been obtained and thus plot a point every four hours, corresponding to dispersed sampling.

Specifically, in this paper we consider the above questions for Shewhart and exponentially weighted moving average (EWMA) control charts and refer the reader to Reynolds and Stoumbos (2004a, 2004b) for related, detailed investigations on the cumulative sum (CUSUM) control charts. We will show that the answers to these questions depend on the type of control chart being used.

2 Monitoring the Process

The process variable of interest, X, is assumed to have a normal distribution with mean μ and standard deviation σ. Assume that all observations from the process are independent. Let μ_0 be the in-control or target value for μ, and let σ_0 be the in-control value for σ.

Suppose that samples of size n ≥ 1 are taken every d hours and, without loss of generality, assume that the sampling rate is n/d = 1.0 observation per hour. So, this could correspond to one observation every hour, or four observations every four hours, or eight observations every eight hours, and so forth. We assume that the sampling cost depends only on the ratio n/d, so that all of these sampling patterns have the same cost. For monitoring μ and σ, it is customary to use two control charts in combination, one designed to detect changes in μ and the other to detect changes in σ. The traditional Shewhart control chart for monitoring μ is the X̄ chart based on plotting the sample means. Three-sigma control limits are usually used with this chart. For monitoring σ, consider the Shewhart S² chart based on the sample variance, with only an upper control limit determined by the chi-squared distribution. Here, we only consider the problem of detecting increases in σ, but the conclusions of the paper are similar if both increases and decreases are considered. When n > 1, we consider the performance of the Shewhart X̄ and S² charts when used in combination to detect shifts in μ or σ. When n = 1, the X̄ chart reduces to the X chart. In this case, the moving-range (MR) chart is frequently used to monitor σ, but recent research has shown that using the X chart alone is sufficient to monitor both μ and σ. Thus, when n = 1, we consider the X chart alone for monitoring μ and σ. The EWMA control statistic for detecting changes in μ is Y_t = (1 − λ) Y_{t−1} + λ X̄_t,

where the starting value is Y_0 = μ_0, as it is usually taken to be in practice. A signal is generated if Y_t falls outside a lower or an upper control limit. The smoothing parameter λ (0 < λ ≤ 1) serves as a tuning parameter that can be used to make the chart sensitive to small or large parameter shifts. The EWMA control chart was originally proposed by Roberts (1959), and a number of papers in the last 20 years have considered the design and implementation of EWMA charts (see, for example, Crowder (1987), Lucas and Saccucci (1990), Yashchin (1993), Gan (1995), Morais and Pacheco (2000), Stoumbos and Sullivan (2002), Stoumbos, Reynolds, and Woodall (2003), and Stoumbos and Reynolds (2005)). Very effective control charts for detecting changes in σ can be based on squared deviations from target (see, for example, Domangue and Patch (1991), MacGregor and Harris (1993), and Shamma and Amin (1993)). The one-sided EWMA control statistic for detecting increases in σ can be written as

where the starting value is Z_0 = σ_0². A signal is given if Z_t exceeds an upper control limit. A two-sided EWMA chart for σ can be developed for detecting both increases and decreases in σ, but we consider only the problem of detecting increases in σ. Extensive investigations of the properties of the EWMA charts based on Y_t and Z_t, when these two EWMA charts are used in combination to detect changes in μ or σ, have been conducted by Stoumbos and Reynolds (2000) and Reynolds and Stoumbos (2001a, 2001b, 2004a, 2004b, 2005, 2006). Here, we will also consider the performance of the EWMA charts based on Y_t and Z_t when these two EWMA charts are used in combination to detect changes in μ or σ. The performance of this EWMA chart combination depends on the choice of the tuning parameter λ. When n = 4, we use λ = 0.2. When n = 1, we use λ = 1 − (1 − 0.2)^(1/4) = 0.05426, which ensures that the combined weight of four observations is 0.2, the same weight used for the case of n = 4 (for additional discussion, see Reynolds and Stoumbos (2004a, 2005)).
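A minimal sketch of the two EWMA statistics for individual observations (n = 1). The control limits hY and hZ are placeholders rather than the values tuned to the 1481.6-hour in-control ATS, and the plain (unreflected) form of Z_t is an assumption about the exact one-sided statistic.

```python
import numpy as np

def ewma_charts(x, mu0=0.0, sigma0=1.0, lam=0.05426, hY=0.5, hZ=0.7):
    """EWMA for the mean (Y_t) and EWMA of squared deviations from target (Z_t).

    hY, hZ are placeholder limits; in the paper the limits are chosen so that
    the joint in-control ATS is about 1481.6 hours.
    """
    Y, Z = mu0, sigma0**2
    for t, xt in enumerate(x, start=1):
        Y = (1.0 - lam) * Y + lam * xt                    # chart for the mean
        Z = (1.0 - lam) * Z + lam * (xt - mu0) ** 2       # one-sided chart for sigma
        if abs(Y - mu0) > hY or Z > sigma0**2 + hZ:
            return t                                      # signal time in hours (d = 1)
    return None

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200),            # in control
                    rng.normal(1.0, 1.0, 800)])           # sustained mean shift of 1 sigma
print("signal at hour:", ewma_charts(x))
```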

3 Measures of Control Chart Performance

Define the average time to signal (ATS) to be the expected amount of time from the start of process monitoring until a signal is generated by a control chart. When the process is in control (μ = μ_0 and σ = σ_0), we want a large ATS, which corresponds to a low false-alarm rate. When there is a change in the process, we want the time from the change to the signal by the chart to be short. Define the steady-state ATS (SSATS) to be the expected time from the change to the signal. The SSATS is computed assuming that the control statistic is in steady state at the time that the process change occurs, and that the process change can occur at random between sampling points. Additional discussion of the SSATS can be found in Reynolds (1995), Stoumbos and Reynolds (1996, 1997, 2001), and Stoumbos, Mittenthal, and Runger (2001). Comparisons of control charts or combinations of control charts will be done for the case in which the in-control ATS is very close to 1481.6 hours. This is the in-control ATS value for a Shewhart X̄ chart based on samples of n = 4 observations every d = 4 hours and using the standard three-sigma limits. When two control charts are considered in combination, the control limits of each chart have been adjusted to give equal individual in-control ATS values (usually of about 2700) and a joint in-control ATS value very close to 1481.6. For all of the numerical results presented in the tables provided in this paper, we assume, without loss of generality, that μ_0 = 0 and σ_0 = 1.
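As a quick arithmetic check of this benchmark, the in-control ATS of the three-sigma Shewhart X̄ chart equals the sampling interval divided by the per-sample false-alarm probability:

```python
import math

d = 4.0                                   # hours between samples
p_signal = math.erfc(3.0 / math.sqrt(2))  # = P(|Z| > 3) for a standard normal
ats = d / p_signal
print(f"in-control ATS = {ats:.1f} hours")   # ~1481.6
```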

4 Evaluations for Shewhart Control Charts

Table 1 gives SSATS values for sustained shifts in μ or σ and concentrated sampling, for the Shewhart X chart with (n, d) = (1, 1) and the Shewhart X̄ and S² chart combination with (n, d) = (1, 1), (2, 2), (4, 4), (8, 8), and (16, 16). From Table 1, we see that using n = 1 is good for detecting very large shifts, but performance is very bad for detecting small shifts. Using a large value of n is good for detecting small shifts, but performance is bad for detecting very large shifts.

Table 1. SSATS Values for Shewhart Control Charts and Sustained Shifts Using Concentrated Sampling

(Rows: in-control ATS and sustained shifts μ = 0.5, 1.0, 2.0, 3.0, 5.0, 10.0 and σ = 1.4, 1.8, 3.0, 5.0, 10.0; columns: X chart with n = 1, d = 1 and combined X̄ & S² charts.)

Thus, the Shewhart control charts are very sensitive to the choice of n. A reasonable compromise for detecting both small and large shifts would be an intermediate value of n. For comparisons later with EWMA control charts, we will use n = 4 and d = 4 as a good compromise. This corresponds closely to the traditional practice of taking samples of around n = 4 observations (see, for example, Stoumbos et al. (2000) and references therein). Table 2 gives SSATS values for Shewhart control charts based on concentrated or dispersed sampling when there is a sustained shift in μ or σ. From Table 2, we see that concentrated sampling is always better than dispersed sampling for detecting sustained shifts. Thus, the rational-subgroups concept gives the correct answer in this case. Observations should be taken in groups so that a shift will occur between groups.

This makes intuitive sense. For a sustained shift, we want to wait until the end of the interval d to take a sample so that the sample will not include any in-control observations. Table 3 gives signal probabilities for Shewhart charts based on concentrated or dispersed sampling when there is a transient shift of duration l = 1 hour or l = 4 hours. The signal probabilities are the probability of a signal either while the transient shift is present or within 4 hours after the end of the transient shift. From Table 3 we see that for transient shifts of short duration (l = 1.0), the Shewhart X chart with n = 1 is best. But we have already established that n = 1 is not good for detecting sustained shifts. Thus, we should rule out using n = 1 unless transient shifts and large sustained shifts are the only process changes that are of concern. If n = 4 or 8 is being used and l = 1, then dispersed sampling is better than concentrated sampling (except for small shifts). Thus, we have a situation in which we want to use dispersed sampling, because using concentrated sampling when the transient shift is of short duration would mean that we might not sample when the shift is present. Note that the use of dispersed sampling violates the rational-subgroups concept because a shift will occur within a sample. If the transient shift is of longer duration (l = 4), then using n = 4 and concentrated sampling is best. So the rational-subgroups concept gives the correct answer for transient shifts of longer duration. When Shewhart control charts are being used, we see that the choice of n and the choice between dispersed or concentrated sampling depend on what we want to detect. If detecting large sustained shifts or transient shifts of short duration is the primary concern, then using n = 1 is reasonable. But if we want to have reasonable performance for detecting smaller shifts, then we need to use a larger value of n, such as n = 4. In this case, dispersed sampling is best for detecting transient shifts of short duration. In many applications, sustained shifts may be more common than transient shifts, and in this case, using concentrated sampling would be reasonable.


Table 2. SSATS Values for Shewhart Control Charts and Sustained Shifts Using Concentrated or Dispersed Sampling
(Rows: sustained shifts μ = 0.5, 1.0, 2.0, 3.0, 5.0, 10.0 and σ = 1.4, 1.8, 3.0, 5.0, 10.0; columns: combined X̄ & S² charts with n = 4, d = 4 and n = 8, d = 8, each with concentrated and dispersed sampling.)

Table 3. Signal Probabilities for Shewhart Control Charts and Transient Shifts Using Concentrated or Dispersed Sampling
(Rows: duration l and size of the transient shift in μ; columns: X chart with n = 1 and combined X̄ & S² charts with concentrated and dispersed sampling.)

5 Evaluations for EWMA Control Charts

Table 4 gives SSATS values for sustained shifts in μ or σ for EWMA control charts with (n, d) = (1, 1) and (4, 4). For purposes of comparison, SSATS values are also given for the Shewhart X̄ and S² chart combination with (n, d) = (4, 4). When (n, d) = (4, 4), SSATS values are given for both concentrated and dispersed sampling. From Table 4, we see that if n = 4 is used in the EWMA control charts, then concentrated sampling is better than dispersed sampling for sustained shifts. This is the same conclusion reached for n = 4 and sustained shifts for the Shewhart control charts. Now, consider the choice of n in the EWMA charts. For detecting large shifts, it is much better to use n = 1 in the EWMA charts, but for smaller shifts, n = 4 and concentrated sampling is a little better. Overall, it seems to be better to use n = 1 in the EWMA charts. If the EWMA charts use the best overall value of n (n = 1) and the Shewhart charts use the best overall value of n (an intermediate value such as n = 4), then the EWMA charts are much better than the Shewhart charts for both small and large shifts. The Shewhart charts are slightly better for some intermediate shifts. Note that this contradicts the conventional wisdom that Shewhart charts are best for detecting large parameter shifts. Shewhart charts are best only in the case when n = 1 is used, but n = 1 should not be used with Shewhart charts unless detecting large shifts is the only objective. Table 5 gives signal probabilities for EWMA and Shewhart control charts based on concentrated or dispersed sampling when there is a transient shift of duration l = 1 hour or l = 4 hours. From Table 5 we see that, if n = 4 is used in the EWMA charts, then dispersed sampling is much better than concentrated sampling for detecting large shifts of short duration (l = 1). For the case of n = 4 and l = 4, concentrated sampling is a little better than dispersed sampling. However, for the EWMA control charts, using n = 1 is better than using n = 4 in all cases of the transient shifts in Table 5. This corresponds to the value of n that is best for sustained shifts. Thus, for EWMA control charts, the best overall value of n does not depend on the type of shift. Using n = 1 is best for both sustained and transient shifts. If n = 1 is used in the EWMA charts, then the rational-subgroups concept applies in the sense that a shift must occur between samples, because the samples are of size n = 1.


Table 4. SSATS Values for Shewhart and EWMA Control Charts and Sustained Shifts Using Concentrated or Dispersed Sampling
(Rows: in-control ATS and sustained shifts μ = 0.5, 1.0, 2.0, 3.0, 5.0, 10.0 and σ = 1.4, 1.8, 3.0, 5.0, 10.0; columns: combined X̄ & S² charts with n = 4, d = 4, concentrated and dispersed, and combined EWMA charts, concentrated and dispersed.)

Table 5. Signal Probabilities for Shewhart and EWMA Control Charts and Transient Shifts Using Concentrated or Dispersed Sampling
(Rows: transient shifts μ = 2.0, 3.0, 4.0, 5.0, 7.0 of duration l = 1.0 and l = 4.0; columns: combined X̄ & S² charts and combined EWMA charts, concentrated and dispersed sampling.)

6 Conclusions and Discussion

Shewhart control charts are very sensitive to the choice of n. The best overall performance is achieved when n > 1 (for example, n = 4 is a reasonable compromise). If a Shewhart chart uses n = 4, then concentrated sampling is better for sustained shifts, but dispersed sampling is better for transient shifts of short duration. Using n = 4 and dispersed sampling violates the rational-subgroups concept. For the EWMA control charts, it is best overall to use n = 1, so the issue of concentrated versus dispersed sampling does not arise. For EWMA charts with n = 1, there is no violation of the rational-subgroups concept. Our recommendation for monitoring the process mean and standard deviation is to take small frequent samples (n = 1 and d = 1) and use an EWMA chart for the mean in combination with an EWMA chart for the standard deviation based on squared deviations from target. We have rigorously verified that the basic conclusions about EWMA control charts apply to CUSUM control charts (see Reynolds and Stoumbos (2004a, 2004b)). That is, CUSUM chart combinations based on sample means and squared deviations from target can be used in place of the respective EWMA chart combinations. The rational-subgroups concept, as usually formulated, can be useful in some contexts, but not in others. The rational-subgroups concept was originally formulated when Shewhart control charts were the only charts being used. EWMA and CUSUM control charts accumulate information over time, so what is being plotted is not just a function of the current process data. For example, an EWMA control statistic plotted after a process change will contain information from both the in-control and the out-of-control distribution, so there is no way to avoid violating the rational-subgroups concept that stipulates that what is plotted should be affected by just the out-of-control distribution. When n = 1 is used in EWMA or CUSUM control charts, we still have the issue of avoiding a sampling plan that samples from multiple streams. Perhaps the idea that a process change should affect a complete sample should be replaced with the idea that sampling should be done so that all observations taken during the time that the special cause is present are affected by this special cause.

References

1. Crowder, S. V. (1987), "A Simple Method for Studying Run-Length Distributions of Exponentially Weighted Moving Average Charts," Technometrics, 29, 401-407.
2. Domangue, R. and Patch, S. C. (1991), "Some Omnibus Exponentially Weighted Moving Average Statistical Process Monitoring Schemes," Technometrics, 33, 299-313.
3. Gan, F. F. (1995), "Joint Monitoring of Process Mean and Variance Using Exponentially Weighted Moving Average Control Charts," Technometrics, 37, 446-453.
4. Hawkins, D. M., and Olwell, D. H. (1998), Cumulative Sum Control Charts and Charting for Quality Improvement, New York: Springer-Verlag.
5. Lucas, J. M., and Saccucci, M. S. (1990), "Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements," Technometrics, 32, 1-12.
6. MacGregor, J. F. and Harris, T. J. (1993), "The Exponentially Weighted Moving Variance," Journal of Quality Technology, 25, 106-118.
7. Montgomery, D. C. (2005), Introduction to Statistical Quality Control, 5th Edition, New York: Wiley.
8. Morais, M. C., and Pacheco, A. (2000), "On the Performance of Combined EWMA Schemes for μ and σ: A Markovian Approach," Communications in Statistics Part B - Simulation and Computation, 29, 153-174.
9. Reynolds, M. R., Jr. (1995), "Evaluating Properties of Variable Sampling Interval Control Charts," Sequential Analysis, 14, 59-97.
10. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2001a), "Individuals Control Schemes for Monitoring the Mean and Variance of Processes Subject to Drifts," Stochastic Analysis and Applications, 19, 863-892.
11. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2001b), "Monitoring the Process Mean and Variance Using Individual Observations and Variable Sampling Intervals," Journal of Quality Technology, 33, 181-205.
12. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2004a), "Control Charts and the Efficient Allocation of Sampling Resources," Technometrics, 46, 200-214.
13. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2004b), "Should Observations be Grouped for Effective Process Monitoring?" Journal of Quality Technology, 36, 343-366.
14. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2005), "Should Exponentially Weighted Moving Average and Cumulative Sum Charts Be Used With Shewhart Limits?" Technometrics, 47, 409-424.
15. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2006), "An Evaluation of an Adaptive EWMA Control Chart," preprint.
16. Roberts, S. W. (1959), "Control Charts Based on Geometric Moving Averages," Technometrics, 1, 239-250.
17. Ryan, T. P. (2000), Statistical Methods for Quality Improvement, 2nd Edition, New York: Wiley.
18. Shamma, S. E. and Amin, R. W. (1993), "An EWMA Quality Control Procedure for Jointly Monitoring the Mean and Variance," International Journal of Quality and Reliability Management, 10, 58-67.
19. Shewhart, W. A. (1931), Economic Control of Quality of Manufactured Product, New York: Van Nostrand.
20. Stoumbos, Z. G., Mittenthal, J., and Runger, G. C. (2001), "Steady-State Optimal Adaptive Control Charts Based on Variable Sampling Intervals," Stochastic Analysis and Applications, 19, 1025-1057.
21. Stoumbos, Z. G., and Reynolds, M. R., Jr. (1996), "Control Charts Applying a General Sequential Test at Each Sampling Point," Sequential Analysis, 15, 159-183.
22. Stoumbos, Z. G., and Reynolds, M. R., Jr. (1997), "Control Charts Applying a Sequential Test at Fixed Sampling Intervals," Journal of Quality Technology, 29, 21-40.
23. Stoumbos, Z. G., and Reynolds, M. R., Jr. (2000), "Robustness to Nonnormality and Autocorrelation of Individuals Control Charts," Journal of Statistical Computation and Simulation, 66, 145-187.
24. Stoumbos, Z. G., and Reynolds, M. R., Jr. (2001), "The SPRT Control Chart for the Process Mean with Samples Starting at Fixed Times," Nonlinear Analysis: Real World Applications, 2, 1-34.
25. Stoumbos, Z. G., and Reynolds, M. R., Jr. (2005), "Economic Statistical Design of Adaptive Control Schemes for Monitoring the Mean and Variance: An Application to Analyzers," Nonlinear Analysis: Real World Applications, 6, 817-844.
26. Stoumbos, Z. G., Reynolds, M. R., Jr., Ryan, T. P., and Woodall, W. H. (2000), "The State of Statistical Process Control as We Proceed into the 21st Century," Journal of the American Statistical Association, 95, 992-998.
27. Stoumbos, Z. G., Reynolds, M. R., Jr., and Woodall, W. H. (2003), "Control Chart Schemes for Monitoring the Mean and Variance of Processes Subject to Sustained Shifts and Drifts," in Handbook of Statistics: Statistics in Industry, 22, eds. C. R. Rao and R. Khattree, Amsterdam, Netherlands: Elsevier Science, 553-571.
28. Stoumbos, Z. G., and Sullivan, J. H. (2002), "Robustness to Non-Normality of the Multivariate EWMA Control Chart," Journal of Quality Technology, 34, 260-276.
29. Yashchin, E. (1993), "Statistical Control Schemes: Methods, Applications and Generalizations," International Statistical Review, 61, 41-66.

Economic Advantages of CUSUM Control Charts for Variables

Erwin M. Saniga, Thomas P. McWilliams, Darwin J. Davis, and James M. Lucas

University of Delaware, Dept. of Business Administration, Newark, DE 19716, USA, [email protected]

Drexel University, Department of Decision Sciences, Philadelphia, PA 19104, USA, [email protected]

University of Delaware, Dept. of Business Administration, College of Business and Economics, Newark, DE 19716, USA, [email protected]

J.M. Lucas and Associates, 5120 New Kent Road, Wilmington, DE 19808, USA, [email protected]

Summary. CUSUM charts are usually recommended for monitoring the quality of a stable process when the expected shift is small. A number of authors have shown that the average run length (ARL) performance of the CUSUM chart is then better than that of the standard Shewhart chart. In this paper we address this question from an economic perspective. Specifically, we consider the case where one is monitoring a stable process where the quality measurement is a variable and the underlying distribution is normal. We compare the economic performance of CUSUM and X̄ charts for a wide range of cost and system parameters in a large experiment using examples from the literature. We find that there are several situations in which CUSUM control charts have an economic advantage over X̄ charts. These situations are: 1. when there are high costs of false alarms and high costs of repairing a process; 2. when there are restrictions on sample size and sampling interval; 3. when there are several components of variance; and 4. when there are statistical constraints on ARL.

1 Introduction

Taylor (1968), Goel (1968), Chiu (1974) and von Collani (1987) have addressed the problem of economic design of CUSUM charts for controlling a process where quality is measured by variables. All used simple models due to Duncan (1956), although Taylor's model does not include sampling costs. Virtually all other work on CUSUM design for variables data has been on statistical design. Lucas (1976), Woodall (1986a), Gan (1991), Hawkins (1992) and Prabhu, Runger and Montgomery (1997), among others, have addressed the problem. Goel's (1968) conclusions were that CUSUM charts perform better in terms of ARL than X̄ charts, but that the cost advantages were small unless smaller sample sizes than optimal were used. He also finds that smaller sample sizes are required for CUSUM charts than for X̄ charts when they are designed to achieve the same in-control and out-of-control ARL's. Goel's small experimental study indicates that, within the range of the study, economically optimal CUSUMs behave much like an X̄ chart; i.e. the reference value k is relatively large and the decision interval h is small. Chiu's (1974) conclusions are similar. His small experiment of fifteen runs indicates that economic CUSUM designs are also characterized by small h values and by an out-of-control ARL of less than 1.25. The latter finding indicates that a small out-of-control ARL may be economically optimal as a constraint when designing CUSUM control charts. Unfortunately, Chiu's economic CUSUM designs can have very small in-control ARL's; some examples have an in-control ARL of 21.2. Such small in-control ARL's can lead to a loss of management trust in the control procedure as well as unnecessary process adjustments, which can themselves increase process variability. Von Collani's (1987) conclusion is that Shewhart charts perform quite well in terms of costs when compared to more complex methods such as CUSUM charts, and he supports this conclusion with several examples. The problem of a small in-control ARL, as well as other problems associated with economically designed control charts, has been addressed by Woodall (1986b). In particular, Woodall (1986a) has criticized the economic design of a CUSUM chart by pointing out an example of Chiu's (1974) in which the out-of-control ARL's for a statistically designed CUSUM chart are better than the out-of-control ARL's for an economically designed CUSUM control chart, especially if a shift different from the expected shift occurs. His argument is that the small increase in cost resulting from the use of the CUSUM chart "appears to be a small price to pay for the increased sensitivity of the procedure" (Woodall (1986a), p. 101). Nonetheless, as Chiu (1974, p. 420) argues, "the use of control charts is basically an economic problem", even though control charts are usually designed statistically if a formal procedure of design is used at all. Moreover, one can easily achieve desired ARL's by placing statistical constraints on the economic model that ensure specific in-control and out-of-control ARL limits are achieved, a solution proposed by Saniga (1989). Interestingly, he has shown that tighter-than-desired statistical designs can be economically advantageous. In this paper we look at the problem of economic CUSUM design on a larger and broader scope than Goel (1968), Chiu (1974) or von Collani (1987).

Our purpose is to investigate CUSUM design using a general economic model due to Lorenzen and Vance (1986). We solve the economic design problem for a wide range of input parameters using a Nelder-Meade (1965) search procedure; ARL's are calculated using the Luceno and Puig-Pey (2002) algorithm. Specifically, we address a number of issues concerning CUSUM chart design and, in addition, the policy decision of choosing a CUSUM chart versus an X̄ chart. These issues are: 1. In what regions of cost and other input parameters are there cost advantages to CUSUM charts versus X̄ charts? 2. What form do economically designed CUSUM charts take in terms of the reference value, decision interval and ARL's? Are the conclusions of Goel (1968) and Chiu (1974) accurate over a wide range of examples? 3. What part do restrictions on sample size and the sampling interval play in these decisions? 4. Does the presence of several components of variance affect the conclusions in 1 above? 5. What is the effect of statistical constraints on the economic decision when using a CUSUM chart versus an X̄ chart? In the next section we present the economic model we employ to answer the above questions and discuss the algorithm used to find economic designs. In section 3 we present results that are used to answer the questions posed above. In section 4 we address the variance component problem. Section 5 addresses the issue of statistical constraints. Finally, in section 6 we draw some brief conclusions.

2 The economic model

Lorenzen and Vance (1986) developed a general form of the popular Duncan (1956) model for the economic design of a control chart. This model contains all costs associated with using a control chart to maintain current control of a process, including costs of nonconformities, costs of sampling and inspection, costs of false alarms and the costs of locating and repairing the process. The Lorenzen and Vance model is more general in the sense that production can continue or be stopped during a search for the assignable cause, and production can be either continued or stopped during repair. Otherwise, the standard assumptions in economic control chart design are made; these are that the time in control follows the negative exponential distribution, a single shift of known size can occur, and the other cost and system parameters are deterministic. The model is defined as follows, where C is the expected cost per hour:

C = { C0/λ + C1[ -t + nE + g(ARL2) + δ1T1 + δ2T2 ] + sY/ARL1 + W + [(a + bn)/g][ 1/λ - t + nE + g(ARL2) + δ1T1 + δ2T2 ] } / { 1/λ + (1 - δ1)sT0/ARL1 - t + nE + g(ARL2) + T1 + T2 }.

The terms are:
n = sample size.
g = intersample interval (hours between samples).
L = number of standard deviations from the center line to the control limits for the X̄ chart.
k = reference value for the CUSUM chart.
h = decision interval for the CUSUM chart.
t = [1 - (1 + λg)e^(-λg)] / [λ(1 - e^(-λg))].
s = e^(-λg)/(1 - e^(-λg)).
ARL1 = average run length while in control.
ARL2 = average run length while out of control.
λ = 1/(mean time the process is in control).
δ = number of standard deviations the process mean shifts when out of control.
E = time to sample and chart one item.
T0 = expected search time for a false alarm.
T1 = expected time to discover the assignable cause.
T2 = expected time to repair the process.
δ1 = 1 if production continues during searches, 0 if production ceases during searches.
δ2 = 1 if production continues during repair, 0 if production ceases during repair.
C0 = quality cost/hour while producing in control.
C1 = quality cost/hour while producing out of control (> C0).
Y = cost per false alarm.
W = cost to locate and repair the assignable cause.
a = fixed cost per sample.
b = cost per unit sampled.

We calculate ARL's for CUSUM charts using an algorithm developed by Luceno and Puig-Pey (2002). In our study we use zero-state ARL's, following the argument by Taylor (1968), who supports this choice with results from a full factorial experiment.

The program used to find optimal X̄- and CUSUM-chart design parameters searches over all possible sample size values in a user-specified range. Given the sample size n, the Nelder-Mead simplex algorithm is used to find the optimal control chart parameters (L for the X̄ chart, k and h for the CUSUM chart). We initially experimented with two methods for determining the optimal time between samples g. The first was to include g as a parameter in the simplex search; the second was to use an approximation given by McWilliams (1994). The approximation was seen to be quite accurate for virtually all cases considered, and its use led to a substantial reduction in computing time, so it is used to find an initial solution in the final version of the program. The few cases observed where the approximation procedure failed were cases where the optimal value of g was considerably larger than the mean time to the occurrence of an assignable cause, so if the initial approximate solution has this property then the program switches to the approach of including g in the simplex search. Finally, the approach of monitoring the process by simply searching for an assignable cause every g hours, without taking a sample, was always included as an option in the search for an optimal plan, as in some cases this is indeed the cost-minimizing strategy. The algorithm also incorporates the option of constraining the intersample interval g and/or adding constraints on in- and out-of-control average run lengths. This is done via a penalty term added to the calculated cost function value indicating the extent to which a constraint or series of constraints is violated.
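The search strategy just described can be sketched as follows. For brevity the sketch optimizes an X̄ chart, for which the in- and out-of-control ARL's have closed forms under a known shift δ; a CUSUM version would replace those two lines with the Luceno and Puig-Pey (2002) run-length computation. The cost function is a rendering of the Lorenzen-Vance expression in the notation above, the parameter values are illustrative rather than taken from the paper's tables, and an ARL constraint is handled with a penalty term, in the spirit of the authors' program.

import numpy as np
from math import exp
from scipy.stats import norm
from scipy.optimize import minimize

# Illustrative parameter values in the notation above; not a case from the paper's tables.
P = dict(lam=0.02, delta=1.0, E=0.0833, T0=0.0833, T1=0.0833, T2=0.75,
         d1=1, d2=0, C0=0.0, C1=835.0, Y=977.4, W=977.4, a=10.0, b=4.22)

def xbar_arls(n, L, delta):
    # In-control and out-of-control ARLs of an X-bar chart with +/- L sigma limits
    # and a shift of delta standard deviations in the mean.
    alpha = 2.0 * norm.sf(L)
    beta = norm.cdf(L - delta * np.sqrt(n)) - norm.cdf(-L - delta * np.sqrt(n))
    return 1.0 / alpha, 1.0 / (1.0 - beta)

def lv_cost(n, g, L, p=P):
    # Lorenzen-Vance expected cost per hour, written out from the expression above.
    ARL1, ARL2 = xbar_arls(n, L, p['delta'])
    lam = p['lam']
    t = (1.0 - (1.0 + lam * g) * exp(-lam * g)) / (lam * (1.0 - exp(-lam * g)))
    s = exp(-lam * g) / (1.0 - exp(-lam * g))
    out = -t + n * p['E'] + g * ARL2 + p['d1'] * p['T1'] + p['d2'] * p['T2']
    cycle = (1.0 / lam + (1.0 - p['d1']) * s * p['T0'] / ARL1
             - t + n * p['E'] + g * ARL2 + p['T1'] + p['T2'])
    num = (p['C0'] / lam + p['C1'] * out + s * p['Y'] / ARL1 + p['W']
           + (p['a'] + p['b'] * n) / g * (1.0 / lam + out))
    return num / cycle

def penalized(v, n, arl1_min=100.0):
    # Objective for the simplex search: cost plus a penalty for violating
    # an in-control ARL constraint (ARL1 >= arl1_min).
    g, L = v
    if g <= 0.01 or L <= 0.1:
        return 1e9
    ARL1, _ = xbar_arls(n, L, P['delta'])
    return lv_cost(n, g, L) + 1e4 * max(0.0, arl1_min - ARL1)

candidates = []
for n in range(1, 16):                     # enumerate candidate sample sizes
    res = minimize(penalized, x0=[1.0, 3.0], args=(n,), method='Nelder-Mead')
    candidates.append((res.fun, n, res.x))
cost_opt, n_opt, (g_opt, L_opt) = min(candidates, key=lambda c: c[0])
print(n_opt, round(g_opt, 3), round(L_opt, 3), round(cost_opt, 3))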

3 Results

We found economic designs for CUSUM charts and for X̄ charts for 216 configurations of cost and system parameters based upon Chiu's (1974) first example. Some of these results are given in Table 1. In this experiment we set C1 = 100, 200, 300, 500, 750, 1000, b = 0.1, 0.5, 1.0, λ = 0.01, 0.05, and δ = 0.5(0.5)3. All other parameters were the same as Chiu's (1974); note that here we set δ1 = δ2 = 0. For these examples the cost advantages of the economically designed CUSUM control chart were small when compared to the economically designed X̄ chart. The largest difference in cost was a cost increase of 0.151 percent from using an X̄ chart versus a CUSUM chart. This example was one in which the expected shift was very small, i.e. δ = 0.50. For both charts the in-control ARL's were very small, 23.1 and 23.9 respectively for the X̄ and CUSUM chart, while the out-of-control ARL's were 1.17 and 1.18. Small in-control ARL's have caused several researchers, in particular Woodall (1986b), to question the validity of economic designs in general. The actual designs were n = 38, L = 2.02, and g = 2.89 for the X̄ chart and n = 37, k = 1.52, h = 0.53, and g = 2.83 for the CUSUM chart.


In this experiment the smallest difference in cost occurred, not unexpectedly, when δ = 3.0; this cost difference was 0.003 percent. The fourth example in Table 1 shows a case where the optimal policy is to search for an assignable cause every 4.06 hours without sampling. Thus, sampling may not always be the optimal policy. Our 216 examples show that in all cases economic CUSUM designs behave similarly to an X̄ design in that k is relatively large when compared to h, a finding also noted by Goel (1968).

Further experiments were done in which we ran the same example as above except that we set Y = 5, 200 and W = 1, 10, 50 and δ = 1, 2, 3. We also tried the same example with δ1 = 0, 1, δ2 = 0, 1 and δ = 1, 2, 3. In these examples the maximum difference in cost between the X̄ chart and the CUSUM chart was 0.354 percent, which occurred when δ = 1.0 and the cost of a false alarm was relatively high; here Y = 200. For both charts the in-control average run lengths of the economically designed charts were about 760, a result due to the high cost of a false alarm. We also ran 72 experiments using Lorenzen and Vance's example problem, where we varied (C0, C1) = (0, 835) and (114.2, 949), (Y, W) = (200, 200), (977, 977), (1500, 1500), a = 0, 10, 50, 200 and δ = 0.5, 0.86, 1.5. We present several results from this experiment in Table 2. In 16 of the 72 cases the X̄ design had more than a 1% cost disadvantage. In all but one of these 16 cases the (Y, W) combinations were at the two highest levels. The most extreme cost disadvantage of the X̄ chart occurred when Y = W = 1500 and, again expectedly, δ = 0.5. This design is shown in the last row of Table 2. The cost advantage of the CUSUM chart was 9.4%. Here the X̄ design was n = 20, g = 2.99 and L = 2.03. The respective in- and out-of-control ARL's were 23 and 1.7. For the CUSUM chart the design parameters were n = 1, g = 0.10, k = 0.25, and h = 10.38. ARL's for the CUSUM chart were 1256 and 38. Note here the substantial cost savings of 9.4% coupled with a design that is more satisfying in terms of in-control ARL, albeit not as satisfying in the small intersample interval of g = 0.10.


In some applications there may be restrictions on sample size. Similarly, there may be a simultaneous bound on the intersample interval. To investigate the effect of these bounds we returned again to the experiment in which we used Chiu's (1974) first example and varied C1, b, λ and δ as above. In this set of 216 runs, though, we placed constraints on n and g of n ≤ 2 and g ≥ 1. The results from this experiment, several examples of which appear in Table 3, were that in 34 of the 216 runs there is at least an hourly savings of 1% from the use of a CUSUM chart. In 11 of these 34 runs there is at least a 3% cost savings. The most extreme savings of 10% occurred when δ = 1 rather than when δ = 0.5. (Here the design is to use n = 2, g = 1, k = 0.707 and h = 1.75. ARL's are 27.6 and 3.2, the in-control ARL making this design most likely not employable.)

4 The impact of variance components

In the examples we presented above we assumed X ~ N(μ, σ²) when the process is in control and X ~ N(μ + δσ, σ²) when the process is out of control. Now suppose we have several components of variance that cannot be eliminated (due to variations in materials, machines, workers, etc.). Here, if σB² is the between-sample variance and σW² is the within-sample variance, then X ~ N(μ, σB² + σW²). Then the in-control distribution of X̄ is X̄ ~ N(μ, σB² + σW²/n0), where n0 is the sample size used to estimate these components of variance. The out-of-control distribution is X̄ ~ N(μ + δ, σB² + σW²/n0), where δ = (σB² + σW²/n0)^(1/2). The standardized one-sigma shift, i.e. δ = 1, for both the X̄ and CUSUM chart is (σB² + σW²/n0)^(1/2) / (σB² + σW²/n)^(1/2).
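A short numerical check of this standardized shift, with made-up variance components (σB = σW = 1, n0 = 4), illustrates why increasing the chart sample size n buys relatively little once the between-sample component dominates:

import numpy as np

def standardized_shift(sigma_B, sigma_W, n0, n):
    # Standardized one-sigma shift with between- and within-sample variance
    # components, per the expression above.
    return np.sqrt(sigma_B**2 + sigma_W**2 / n0) / np.sqrt(sigma_B**2 + sigma_W**2 / n)

for n in (1, 2, 4, 16, 64):
    print(n, round(standardized_shift(1.0, 1.0, 4, n), 3))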

We present in Table 4 some examples of economically designed control charts for the case in which there are variance components. The other parameters were obtained from Chiu's (1974) Example 1. Several interesting conclusions can be drawn from these few runs. First, there are more substantial cost savings resulting from the use of CUSUM charts when there are variance components than when there is a single component of variance. This is because an increase in n does not affect σB², so the increased power of the CUSUM is relatively more advantageous here than in the situation examined in Table 1, where there is only a single component of variance. Note also the other advantages of the CUSUM in some of the examples. For example, consider the three examples in the second set of examples, where Y = W = 100.0. The cost advantage of the CUSUM ranges from 22 to 31%, and while the out-of-control ARL's are similar, the in-control ARL's of the CUSUM range from 531 to 723, compared to the optimal X̄ designs, whose in-control ARL's range from 98 to 123. It is also interesting to note that the optimal economic CUSUM designs are more in line with typical CUSUM designs in the relationship of k to h, unlike the other economic designs we have found, where the CUSUM performs much like an X̄ chart.

5 Economic statistical design of CUSUM and X̄ control charts

Saniga (1989) has shown that one can eliminate many of the problems with the applicability of economic designs by optimizing (1) above with the addition of constraints of the type:

ARL0_i ≥ ARLbound0_i, i = 1, 2, ..., m, and
ARL_j ≤ ARLbound_j, j = 1, 2, ..., q,

where the ARLbound's are bounds on the ARL's. (Note that there can be any number of constraints, represented here by m and q.) For example, one may wish to place a lower bound on the in-control ARL and an upper bound on the out-of-control ARL at the level of the expected shift. Note that a design solution that satisfies these bounds is a statistical design, and note also that there can be many feasible statistical designs. One interesting aspect of designs that are found by optimizing an economic model coupled with statistical constraints is that the optimal design may be tighter than the statistical design that just meets the constraints. For example, an optimal design may have an in-control ARL that is larger than the target value of the statistical design, and the out-of-control ARL may be smaller than the target ARL. We solved several economic statistical design problems for CUSUM and X̄ charts and present the results in Table 5. Once again, we explore these results in the context of a problem due to Lorenzen and Vance (1986). In this particular configuration of parameters the X̄ control chart has an economic disadvantage relative to the CUSUM chart of a little more than 2% (Case 1 in the table). It is interesting to note that neither design has an in-control ARL that would be acceptable in practice; these are respectively 14 and 28. In cases 2 through 9 we placed constraints on the problem such that the in-control ARL is at least 150. We also placed constraints for a small shift of little importance (δ = 0.1) such that either chart would not signal; this constraint was ARL(δ = 0.1) ≥ 80-100 for the 8 cases with constraints. We also placed constraints on the ARL for a small shift of δ = 0.5 of ARL ≤ 1.5-4 for various cases. Note from cases 2-4 that there is a substantial savings of from 15-17% per hour gained from using the CUSUM chart instead of the X̄ chart. Note also that this advantage increases as one places a higher ARL constraint on the small shift of little importance. The advantage of the CUSUM decreases from 17% to 2% as the constraint on the out-of-control ARL for δ = 0.5 becomes more restrictive. Note from the designs themselves that the economic statistical designs are characterized by nearly 3σ control limits but large n for the X̄ chart, while the CUSUM designs can result in nearly a 100% reduction in sample size over the X̄ design in some cases (2, 3, 4). It is interesting to compare the average time to signal a shift, or ATS = g·ARL, for the economically optimal designs. In case 4 the ATS's for the three levels of δ = 0.0, 0.1, 0.5 are respectively 425, 196 and 5.6 for the X̄ chart and 292, 119 and 6.5 for the CUSUM chart. So the CUSUM chart has considerably worse ATS's but is cheaper. In case 5, on the other hand, the X̄ ATS's are 425, 196 and 5.6, while for the CUSUM chart the ATS's are 475, 168 and 4.9. Thus, here the X̄ chart is almost 13% more costly and has ATS's not as good as those of the CUSUM except for the very small shift.

Generally, this small experiment shows that the addition of statistical constraints on ARL can result in a cost advantage for the CUSUM chart.

6 Summary

In this study we discuss some situations in which there are economic advantages to using CUSUM charts over X̄ charts to monitor a stable process. These situations are: 1. when there are high costs of false alarms and high costs to locate and repair the assignable cause of poor quality; 2. when there are restrictions on sample size and the intersample interval; 3. when there are several components of variance; and 4. when there are constraints on ARL. Some numerical results we present show the magnitude of the economic advantage of the CUSUM chart.


References

1. Chiu, W.K. (1974) The Economic Design of CUSUM Charts for Controlling Normal Means. Applied Statistics 23, 420-433
2. Duncan, A.J. (1956) The Economic Design of X bar Charts Used to Maintain Current Control of a Process. Journal of the American Statistical Association 51, 228-242
3. Gan, F.F. (1991) An Optimal Design of CUSUM Quality Control Charts. Journal of Quality Technology 23, 279-286
4. Goel, A.L. (1968) A Comparative and Economic Investigation of X bar and Cumulative Sum Control Charts. Unpublished PhD dissertation, University of Wisconsin.
5. Hawkins, D.M. (1992) A Fast Accurate Approximation for Average Run Lengths of CUSUM Control Charts. Journal of Quality Technology 24, 37-43
6. Lorenzen, T.J., Vance, L.C. (1986) The Economic Design of Control Charts: A Unified Approach. Technometrics 28, 3-10
7. Lucas, J.M. (1976) The Design and Use of V-Mask Control Schemes. Journal of Quality Technology 8, 1-12
8. Luceno, A., Puig-Pey, J. (2002) Computing the Run Length Probability Distribution for CUSUM Charts. Journal of Quality Technology 34, 209-215
9. McWilliams, T.P. (1994) Economic, Statistical, and Economic-Statistical X̄ Chart Designs. Journal of Quality Technology 26, 227-238
10. Nelder, J.A., Meade, R. (1965) A Simplex Method for Function Minimization. The Computer Journal 7
11. Prabhu, S.S., Runger, G.C., Montgomery, D.C. (1997) Selection of the Subgroup Size and Sampling Interval for a CUSUM Control Chart. I.I.E. Transactions 29, 451-457
12. Saniga, E.M. (1989) Economic Statistical Design of Control Charts with an Application to X bar and R Charts. Technometrics 31, 313-320
13. Taylor, H.M. (1968) The Economic Design of Cumulative Sum Control Charts. Technometrics 10, 479-488
14. Von Collani, E. (1987) Economic Process Control. Statistica Neerlandica 41, 89-97
15. Woodall, W.H. (1986a) The Design of CUSUM Quality Control Charts. Journal of Quality Technology 18, 99-102
16. Woodall, W.H. (1986b) Weaknesses of the Economic Design of Control Charts (Letter to the Editor). Technometrics 28, 408-410

Table 1 OPTIMAL CUSUM AND X̄ CONTROL CHART DESIGNS, CHIU (1974) EXAMPLE 1

TABLE 2 OPTIMAL ECONOMIC CUSUM AND X̄ CONTROL CHART DESIGNS, LORENZEN AND VANCE (1986) EXAMPLE

TABLE 3 OPTIMAL CUSUM AND X̄ CONTROL CHART DESIGNS, CHIU (1974) EXAMPLE 1, n ≤ 2, g ≥ 1

Table 4 OPTIMAL CUSUM AND X̄ CONTROL CHART DESIGNS WHEN THERE ARE TWO COMPONENTS OF VARIANCE (n0 = 4, σB = 1)

TABLE 5 OPTIMAL ECONOMIC-STATISTICAL DESIGNS FOR VARIOUS CONSTRAINTS

X̄ chart parameters    CUSUM parameters

*Binding constraint. Economic design inputs: λ = 0.02, δ = 0.50, E = T0 = T1 = 0.0833, T2 = 0.75, C0 = 0, C1 = 835.0, Y = W = 977.4, a = 10.0, b = 4.22, δ1 = 1, δ2 = 0

Choice of Control Interval for Controlling Assembly Processes

Tomomichi Suzuki, Taku Harada, and Yoshikazu Ojima

Tokyo University of Science, 2641 Yamazaki, Noda, Chiba, 278-8510, JAPAN, [email protected]

Summary. Many industrial products are produced by continuous processes. Time series analysis and control theory have been widely applied to such processes. There are also many industrial products that are produced by assembly processes. In those processes, time series analysis and control theory are usually not required. However, there exist assembly processes that do need time series analysis for effective process control. Problems are discussed that arise when time-series analysis methodology is applied to specific assembly processes for effective process control, especially if the number of products is high. Influential factors such as the control interval and the dead time of the process under consideration are considered.

1 Introduction

Time series analysis is applied in many fields and has been treated by many authors, such as Harvey (1993) and Akaike and Kitagawa (1994). In industrial fields it is used mainly as a tool for system identification and for controlling the process, as illustrated by Akaike and Nakagawa (1972). It is usually used in the process industry, and not commonly in the assembly industry. The assembly processes themselves usually do not depend on time, but the causes of disturbances, such as environmental factors, do depend on time. Therefore, appropriate use of time series analysis methods can lead to effective control of the relevant assembly process. To control a dynamic process effectively, two stages must be completed. The first stage is the identification stage, in which the system model of the process is built, identified, and diagnosed. This is best done by thorough examination of the process and of a designed experiment for identification. The second stage is the controller design stage, in which the control equation is calculated based on the result of the first stage. From the viewpoint of time series analysis, the most important aspect of assembly processes is that the number of products is often very large. If each product is analyzed separately, the total size of the data for time series analysis will be extremely large, the analysis will be tedious, and the statistical model of the process will be very complicated and its validity will be in doubt. Hence, how to handle the data in terms of time series analysis needs to be discussed. The proposed method considers such factors as the sampling interval, the control interval, and the dead time of the process. The proposal is presented through the analysis of a precision instrument manufacturing process.

2 Controlling Assembly Processes

2.1 The Process

The process we have analyzed is an assembly process manufacturing components of a precision instrument, originally presented by Ishii (1995). The main part of the component is the rotating part. The speed of the process is so high that it produces about fifty thousand components per day. The most important characteristic of the product is the number of rotations. This characteristic is measured automatically for each of the products. The number of rotations has a target value. Deviation from the target value can be compensated by adjusting the control variable, which is the depth, or position, of the rotating part. The process is controlled by feedback control; that is, the depth is adjusted according to the measured value of the characteristic. Because there is a pool for keeping the intermediate products, there is a delay between the input and the output of the process.

2.2 Practical Problems

When actually controlling the process, there are a number of difficulties to overcome. The process had been controlled by a control scheme developed heuristically by the operators. The control scheme considered the effects of the following factors:
a) the accuracy of measuring the number of rotations, which is the output characteristic of the product;
b) the speed of production, which is very fast;
c) the process dead time, that is, the number of waiting parts in the pool, whose size is usually below 100.
The control scheme had been to adjust the control variable according to the average value for every 100 products. The adjustment of the control variable is carried out automatically. The optimum sampling interval for system identification is based on the sampling theorem, cf. Goodwin and Sin (1984) and Astrom and Wittenmark (1995). To control this process, we need system identification and control. We need to obtain data, identify the process, formulate the controller, and then apply the controller. This series of analyses is done for the same time interval. Since our aim is to control the process, we investigate that time interval and call it the 'control interval'. The most important factor for controlling this process is how to select the control interval, since analyses for system identification and control simulation, as well as daily operations, will be performed based on the selected value of the control interval. The next section deals with the control interval and its effects.

3 Evaluating the Effect of the Control Interval

3.1 Obtaining the Noise Series

We analyzed the data for four consecutive days. The number of rotations is recorded for each product. The size of the data set is approximately 280,000. After checking for outliers, 241,000 data points are used for further analyses. The value of the control variable is not recorded, but it can be calculated by using the control scheme and the deviation from the target. The transfer function, that is, the relation between the control variable and the process output, had already been identified by experiment. We estimated the noise process. Since analyzing or controlling the process per individual observation is not practical at all, we calculated the average for every 10 observations. As a result, a noise series of length 24,100 is obtained. The plot of the noise series is shown in Fig. 1. From Fig. 1, we see that the noise varies gradually and shows large, slow waves.

Fig. 1. Plot of the Noise Series

3.2 Control Interval

We calculated noise series formed from the average value over each candidate interval. The control intervals we investigated are shown in Table 1.

Table 1. Control Intervals Investigated
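As an illustration of this aggregation step and of the AIC-based identification discussed in the next subsection, the sketch below forms block means of a noise series for several candidate control intervals and fits the two candidate models to each aggregated series. The noise series here is simulated and the candidate intervals are illustrative; the production data of the paper are not reproduced.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Toy stand-in for the noise series: a slow random walk plus observation noise.
noise = np.cumsum(rng.normal(0.0, 0.2, 24100)) + rng.normal(0.0, 1.0, 24100)

def block_means(x, k):
    # Average the series over non-overlapping blocks of k observations,
    # i.e. aggregate it to a control interval k times as long.
    m = len(x) // k
    return x[:m * k].reshape(m, k).mean(axis=1)

for k in (1, 5, 10, 50):
    y = block_means(noise, k)
    aic_ima = ARIMA(y, order=(0, 1, 1)).fit().aic   # Model 1: IMA(1,1)
    aic_ar2 = ARIMA(y, order=(2, 0, 0)).fit().aic   # Model 2: AR(2)
    print(k, round(aic_ima, 1), round(aic_ar2, 1))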

3.3 Identifying the Noise

Identification of the obtained noise series is performed for each of the time intervals considered in Section 3.2. We used ARIMA models as discussed by Box et al. [4]. Since identification is performed for many series, we need to evaluate and compare the results; we used AIC as a goodness-of-fit measure. System identification is performed for all the control intervals. After examining the results, the following two models are selected: Model 1, an IMA(1,1) process, and Model 2, an AR(2) process. Since our aim is not focused only on determining the best-fitting model, but also on evaluating the effect of the control interval, we discuss both models. The values of AIC for these two models were in fact close for most of the control intervals.

3.4 Effect of the Control Interval

In this section, we discuss the effect of the control interval on the MSE when MMSE controllers are used.

3.4.1 MMSE Controller

The model considered in this paper is given by equations (1) and (2). The system model is described as

where the noise N_t is a linear stochastic process expressed as equation (2).

Y_t: process characteristic (output), the variable to be controlled. The objective of control is to keep this variable as close as possible to the target value, which is set to zero without loss of generality. Each Y_t is the average of all the latest measured characteristics for each time interval.
X_t: manipulated variable (input). This variable is adjusted in order to control the process output.
N_t: process noise, an ARIMA process with order p for AR, d for I, and q for MA, whose values are considered as given.
a_t: white noise. The variance of the white noise is considered as given.
Dead time: the lag between the process input and the process output, which is considered as given.
B: backshift operator.
L2(B)/L1(B): transfer function. This function expresses the relation between input and output.
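The displayed forms of equations (1) and (2) did not survive extraction. Given the definitions above, a plausible rendering of this kind of transfer-function-plus-noise model is the sketch below; the symbol f for the dead time and the exact placement of the backshift operators are assumptions, not a reproduction of the original display.

% Assumed rendering of the system model (1) and the noise model (2);
% f = dead time, B = backshift operator, a_t = white noise.
\begin{align}
  Y_t &= \frac{L_2(B)}{L_1(B)}\, X_{t-f} + N_t, \tag{1} \\
  \phi_p(B)\,(1-B)^d\, N_t &= \theta_q(B)\, a_t. \tag{2}
\end{align}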

In equation (1), we consider the transfer function to be stable when we write

L2(B)/L1(B) = ν(B) = ν0 + ν1B + ν2B² + ···

, we assume f),l 1 in the GR2(m) rule. The Markov chain is homogeneous under an in-control process. Hence the transition probability is stationary, and the transition matrix does not depend on time. Each element of the transition matrix is the probability of moving from S_i to S_j. For m = 2 in GR1(m), the transition matrix P is

where p = Pr[X_t = 1 (the observed observation will exceed the control limit)]. The mean absorbing time is (I − Q)^(-1) J, where Q is the matrix of probabilities above, with the first row and column removed from P, and J is the column vector with all elements equal to 1. The in-control ARL can be calculated from the mean absorbing time. Let the absorbing time of a Markov chain starting from the initial transient state S_i be at_i. If the run length under the initial transient state S_i is rl_i, then at_i is equal to rl_i − m. Hence the mean absorbing time is E(at_i) = E(rl_i) − m. E(rl_i) is the average run length under the initial transient state S_i, and then the in-control ARL under the initial transient state S_i is

The in-control ARL for GR1(m) is

If the initial state is a transient state, E[rl | X1 = S_i] is given by formula (2). If the initial state is the absorbing state, E[rl | X1 = S_i] is m. The in-control ARL for GR2(m) is

ARL = E[rl | X1 ∈ {S0, S1, ..., Sm}] = Σ_{i=0}^{m} E[rl | X1 = S_i] Pr[X1 = S_i].   (4)

If the initial state is a transient state, E[rl | X1 = S_i] is given by formula (2). If the initial state is an absorbing state other than S0, E[rl | X1 = S_i] is m. For X1 = S0, E[rl | X1 = S0] is m − 1. The formulae (3) and (4) are functions of p. The probability p satisfying ARL = 370.4, where p is the probability that an individual observation exceeds the control limit, is calculated numerically in the range [0, 1] by the Newton-Raphson method. Hence we calculate k1 and k2 as Φ^(-1)(1 − p/2), where Φ is the cumulative distribution function of the standard normal distribution. These results are shown in Tables 1 and 2.

Table 1. Control limit coefficients k1 in GR1(m)

Table 2. Control limit coefficients k2 in GR2(m)
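To illustrate the calculation, the sketch below treats the rule "signal when m successive observations exceed the control limit", which is the flavour of GR1(m) suggested by the text; the exact state space and bookkeeping of the paper's rules are not reproduced, although the constants produced agree closely with the k1 values quoted in the tables. It computes the in-control ARL as the mean absorbing time of the corresponding Markov chain, solves ARL(p) = 370.4 for p (a bracketing root finder is used here instead of Newton-Raphson), and converts p to the limit coefficient k = Φ^(-1)(1 − p/2).

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def arl(p, m):
    # ARL of "signal when m successive observations exceed the control limit",
    # as the mean absorbing time of the Markov chain whose transient state i
    # counts the current number of successive exceedances.
    Q = np.zeros((m, m))
    for i in range(m):
        Q[i, 0] = 1.0 - p          # an observation inside the limits resets the count
        if i + 1 < m:
            Q[i, i + 1] = p        # a further exceedance increases the count
    M = np.linalg.inv(np.eye(m) - Q)
    return M.dot(np.ones(m))[0]    # expected steps to absorption from state 0

def limit_coefficient(m, arl0=370.4):
    p = brentq(lambda q: arl(q, m) - arl0, 1e-4, 0.999)
    return norm.ppf(1.0 - p / 2.0)

for m in range(1, 9):
    print(m, round(limit_coefficient(m), 4))

# For a step shift of delta, substituting the out-of-control exceedance
# probability p1 = 1 - (Phi(k - delta) - Phi(-k - delta)) for p in arl()
# gives the corresponding out-of-control ARL.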

3 Evaluation

In an out-of-control process the process mean can vary in several ways, depending on the cause. We consider the typical ones, i.e. step shift, trend, and between-group variation. These are sustained shifts. When Y_t is the observation in an out-of-control process, these are modeled as follows, respectively:

Y_t ~ N(μ0 + δσ, σ²), δ: the amount of the shift;
Y_t ~ N(μ0 + βtσ, σ²), β: the proportionality constant;
Y_t ~ N(μ0, (γ² + 1)σ²), γ: the ratio of variation.

We call these models the step shift situation, the trend situation and the between-group variation situation, respectively. The model of between-subgroup variation is Y_t = μ0 + a_t + e_t, with a_t ~ N(0, γ²σ²) and e_t ~ N(0, σ²). Therefore γ is the ratio of the between-group variation to the variation of an individual observation. The Shewhart control chart with the generalized rules is evaluated by the out-of-control ARL and the out-of-control standard deviation of the run length (SDRL). The out-of-control ARL and SDRL are defined as the ARL and SDRL until the variation of the process mean is detected, assuming that the shift has already occurred when the control chart is started.
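For reference, the three out-of-control models can be generated as follows; the parameter values in the defaults and in the example call are illustrative only, not those of the paper's tables.

import numpy as np

def out_of_control_series(kind, T, delta=1.0, beta=0.02, gamma=1.0,
                          mu0=0.0, sigma=1.0, seed=0):
    # Generate T observations from one of the three out-of-control models above.
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    if kind == "step":
        return rng.normal(mu0 + delta * sigma, sigma, T)
    if kind == "trend":
        return rng.normal(mu0 + beta * t * sigma, sigma)
    if kind == "between":
        a = rng.normal(0.0, gamma * sigma, T)        # between-group component
        e = rng.normal(0.0, sigma, T)                # individual variation
        return mu0 + a + e                           # i.e. N(mu0, (gamma^2 + 1) sigma^2)
    raise ValueError(kind)

print(out_of_control_series("trend", 5).round(2))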

3.1 The Step Shift Out-of-Control Situation

The out-of-control ARLs under the step shift situation for GR1(m) are shown in Table 3; m = 1 is the Shewhart three-sigma rule. These ARLs are obtained by substituting p with p1 in the transition matrix, where p1 = Pr[Y_t is out of control]. The ARLs for δ in [0.75, 2.50] are minimized for m = 2, and those in the other ranges are minimized for m = 1. The ARLs for δ in [0.75, 2.00] with m = 3 are less than those for m = 1. The ARLs for any δ increase as m increases (m = 2, ..., 8). Hence m = 2 of GR1(m) is effective for step shifts with δ in [0.75, 2.50]. The SDRLs for GR1(m) and GR2(m) in the step shift situation can be calculated mathematically by an absorbing Markov chain approach. The variance of the run length for GR1(m) is

π^T(2M − I)z − (π^T z)²,   (5)

where M = (I − Q)^(-1), z = MJ and π^T is the row vector with elements Pr[X1 is the transient state S_i]. The variance of the run length for GR2(m) is

π^T(2M − I)z − (π^T z)² + π0(1 + 2π^T z),   (6)

where π0 = Pr[X1 = S0]. Hence the SDRLs for GR1(m) and GR2(m) are the square roots of formulas (5) and (6), respectively. Table 4 shows the out-of-control SDRLs under the step shift situation for GR1(m). For δ ≥ 0.75, all the GR1(m) rules have a smaller standard deviation than the Shewhart three-sigma rule, and the variation of the run length for m = 8 is the smallest. The out-of-control ARLs under the step shift situation for GR2(m) are shown in Table 5. The first column, m = 1, is the Shewhart three-sigma rule. These results are obtained by the same procedure as in the GR1(m) case. The ARLs of m = 3 for δ = 0.25 and 0.50 are the smallest among all m. In the range 0.75 to 1.50, the ARLs of m = 4 are the smallest. In the range 1.75 to 2.50, those of m = 2 are the smallest again, and the Shewhart three-sigma rule (m = 1) has the smallest ones for δ ≥ 2.75. Most ARLs for δ = 1.0 to 2.0 and m ≥ 3 are smaller than those of m = 1. Hence our proposed rules GR2(m), especially m = 3, are effective for step shifts of δ = 1.0 to 2.0. The out-of-control SDRLs under the step shift situation for GR2(m) are shown in Table 6. The SDRLs for GR2(m), m = 3, ..., 8, are approximately the same; however, they are smaller than the SDRLs of the Shewhart three-sigma rule. Therefore, GR2(m = 3 or 4) rules can detect step shifts from small to large stably, in terms of both the average and the standard deviation of the run length.

Table 3. Out-of-control ARL under the step shift situation in GR1(m)

3.2 The Trend Shift Out-of-Control Situation

Tables 7 to 10 give the ARLs and SDRLs under the trend situation. The Markov chain is not homogeneous under the trend, since the probability that the observation is out of control at time t is a function of t. In addition, it is non-linear because the observations have the normal distribution. Therefore it is difficult

Table 4. Out-of-control SDRL under the step shift situation in GR1(m)

Table 5. Out-of-control ARL under the step shift situation in GR2(m)

to calculate the out-of-control ARLs and SDRLs mathematically by the absorbing Markov chain. We calculate them by the Monte Carlo method. The number of replications is 10,000. The out-of-control ARLs for GR1(m) under the trend shift situation are shown in Table 7. The ARLs of m = 2 and 3 in the β range 0.01 to 0.07 are slightly smaller than those of m = 1. In other words, m = 2 and 3 are better at detecting a slowly increasing shift than the Shewhart three-sigma rule.
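A sketch of such a Monte Carlo evaluation for the trend situation is given below, again using the m-successive-exceedances rule as a stand-in for GR1(m). The limit coefficients are the k1 values from the table headers, the slope β = 0.02 is one illustrative value within the range studied, and the run-length summaries produced are illustrative only.

import numpy as np

def run_length_trend(k, m, beta, rng, max_t=100000):
    # Run length of the "m successive exceedances of +/- k sigma" rule when
    # the mean drifts as mu0 + beta*t*sigma (mu0 = 0, sigma = 1 here).
    count = 0
    for t in range(1, max_t + 1):
        y = rng.normal(beta * t, 1.0)
        count = count + 1 if abs(y) > k else 0
        if count == m:
            return t
    return max_t

rng = np.random.default_rng(123)
for m, k in [(1, 3.0000), (2, 1.9322), (3, 1.4514)]:
    rl = [run_length_trend(k, m, beta=0.02, rng=rng) for _ in range(10000)]
    print(m, round(float(np.mean(rl)), 2), round(float(np.std(rl, ddof=1)), 2))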

Table 6. Out-of-control SDRL under the step shift situation in GR2(m)

The out-of-control SDRLs for GR1(m) under the trend shift situation are shown in Table 8. There is hardly any difference among the GR1(m) rules with respect to the standard deviation of the run length. On the other hand, the SDRLs for GR2(m), shown in Table 10, are smaller than those for the Shewhart three-sigma rule for β ≤ 0.05. This result means that GR2(m) is more stable in detecting the out-of-control process when a gradual trend shift occurs. Table 9 gives the out-of-control ARLs for GR2(m) under the trend. The ARLs of m = 8 are similar to those of m = 1. The ARLs of m = 3 and m = 4 are the smallest among all m through all β's. Hence the generalized run rules GR2(m) have higher performance than the Shewhart three-sigma rule.

3.3 The Between Group Variation Out-of-Control Situation

We show the performance of the generalized rules under the between-group variation situation in Tables 11 to 14. These results are calculated by the Monte Carlo method with 10,000 replications. In Tables 11 and 13, the out-of-control ARLs of GR1(m) and GR2(m) are approximately equal to or larger than those of the Shewhart three-sigma rule. In Tables 12 and 14, the SDRLs of GR1(m) and GR2(m) are also approximately equal to or larger than those of the Shewhart three-sigma rule. Hence the generalized run rules are not very good at detecting the between-group variation.

Table 7. Out-of-control ARL under the trend situation in GR1(m)
m: 1 2 3 4 5 6 7 8; k1: 3.0000 1.9322 1.4514 1.1644 0.9704 0.8296 0.7224 0.6381

Table 8. Out-of-control SDRL under the trend shift situation in GR1(m)

Table 9. Out-of-control ARL under the trend situation in GR2(m)

Table 10. Out-of-control SDRL under the trend shift situation in GR2(m)

Table 11. Out-of-control ARL under the between group variation situation in GR1(m)

Table 12. Out-of-control SDRL under the between group variation in GR1(m)
m: 1 2 3 4 5 6 7 8; k1: 3.0000 1.9322 1.4514 1.1644 0.9704 0.8296 0.7224 0.6381

Table 13. Out-of-control ARL under the between group variation in GR2(m)

Table 14. Out-of-control SDRL under the between group variation in GR2(m)

4 Conclusions

In the Shewhart control chart, an out-of-control signal occurs when the current observation exceeds the three-sigma control limit. We generalized this rule and proposed new rules based on successive runs. The false alarm rates of these rules are controlled to ARL = 370.4. The performance of our proposed rules is evaluated in terms of both the average and the standard deviation of the run length under the three out-of-control situations. The ARLs and SDRLs for our rules (GR1(m) and GR2(m)) are smaller than those of the Shewhart three-sigma rule when the amount of the step shift is moderate and the slope of the trend is gradual. However, GR1(m) and GR2(m) are not powerful for detecting large shifts, because of the effect of the lower limit m on the run length. The performance of Klein's (2000) rules is better in the step shift and trend shift situations than that of our rules. However, our rules are better in the between-subgroup variation situation. Klein's two-of-two rule is such that no out-of-control signal is given if one of two successive points is above an upper limit and the other is below a lower limit; the same holds for the two-of-three rule. This explains the foregoing results. Hence, some of the generalized run rules are effective for small shifts and slowly increasing (or decreasing) shifts. Especially effective are the rule that 2 of 3 successive observations exceed the 2.0698 sigma control limit and the rule that 2 successive observations exceed the 1.9322 sigma control limit.

5 Acknowledgments

The authors thank the referees for useful comments on an earlier draft.

References

1. Chiang, D. and Niu, S. C. (1981). "Reliability of Consecutive-k-out-of-n:F System", IEEE Trans. Reliability, Vol. R-30, No. 1, pp. 87-89.
2. Champ, C. W. and Woodall, W. H. (1987). "Exact Results for Shewhart Control Charts With Supplementary Runs Rules", Technometrics, Vol. 29, No. 4, pp. 393-399.
3. Derman, C., Lieberman, G. J. and Ross, S. M. (1981). "On the Consecutive-k-of-n:F System", IEEE Trans. Reliability, Vol. R-31, No. 1, pp. 57-63.
4. Klein, M. (2000). "Two Alternatives to the Shewhart X̄ Control Chart", Journal of Quality Technology, Vol. 32, No. 4, pp. 427-431.
5. Kolev, N. and Minkova, L. (1997). "Discrete Distributions Related to Success Runs of Length k in a Multi-State Markov Chain", Commun. Statist. Theory and Meth., Vol. 26(4), pp. 1031-1049.
6. Page, E. S. (1955). "Control Charts with Warning Lines", Biometrika, 42, pp. 243-257.

Part 2

On-line Control

2.3 Monitoring

Robust On-Line Turning Point Detection. The Influence of Turning Point Characteristics

E. Andersson

Goteborg University, Statistical Research Unit, PO Box 660, SE-405 30 Goteborg, Sweden, [email protected]

Summary: For cyclical processes, for example economic cycles or biological cycles, it is often of interest to detect the turning points by methods for on-line detection. This is the case in prediction of the turning point time in the business cycle, by detection of a turn in monthly or quarterly leading economic indicators. Another application is natural family planning, where we want to detect the peak in the human menstrual cycle in order to predict the days of the most fertile phase. We make continual observations of the process with the goal of detecting the turning point as soon as possible. At each time, an alarm statistic and an alarm limit are used to make a decision as to whether the time series has reached a turning point. Thus we have repeated decisions. An optimal alarm system is based on the likelihood ratio method; the full likelihood ratio method is optimal. Here we use a maximum likelihood ratio method, which does not require any parametric assumptions about the cycle. The alarm limit is set to control the false alarms. The influence on the maximum likelihood ratio method of some turning point characteristics is evaluated (shape at the turn, symmetry and smoothness of the curve). Results show that the smoothness has little effect, whereas a non-symmetric turn, where the post-turn slope is steeper, is easier to detect. If a parametric method is used, then the alarm limit is set in accordance with the specified model, and if the model is mis-specified then the false alarm property will be erroneous. By using the maximum likelihood ratio method, the false alarms are controlled at the nominal level. Another characteristic is the intensity, i.e. the frequency of the turns. From historical data we construct an empirical density for the turning point times. However, using this information only benefits the surveillance system if the time to the turn we want to detect agrees with the empirical density. If, on the other hand, the turn we want to detect occurs "earlier than expected", then the time until detection is long. A method that does not use this prior information works well for all turning point times.

1 Introduction

In many situations it is important to monitor a process in order to detect an important change in the underlying process. In statistical process control we often want to detect a change in the mean (or mean vector) or a change in the variance

(or covariance matrix); see Sepulveda and Nachlas (1997), Gob et al. (2001), and Wu et al. (2002). In this paper we consider cyclical processes where it is of interest to detect the turning points. One example is turning point detection in leading economic indicators, in order to predict the turning point time of the general business cycle (Neftci (1982)). Another example is natural family planning, where the most fertile phase is preceded by a peak in a hormone (Royston (1991)). In public health it can also be of interest to monitor the influenza activity, in order to detect an outbreak or a decline as soon as possible (Baron (2002), Sonesson and Bock (2003)). In the financial market the aim is to maximize wealth. An indicator is monitored with the aim of detecting the optimal time to trade a certain asset. Optimal times to trade are related to changes (regime shifts) in the stochastic properties of the indicator. Thus also here the aim is to detect the next turn (regime shift); see Bock et al. (2003), Dewachter (2001), Marsh (2000). The monitoring is made on-line, and an alarm system is used to make a decision as to whether the change has occurred or not. The alarm system consists of an alarm statistic and an alarm limit and can be presented in a chart. The alarm statistic can be based on only the last observation (the Shewhart method, Shewhart (1931)) or on all the observations since the start of the surveillance, as is done in the CUSUM method (Page (1954), Woodall and Ncube (1985)) and the EWMA method (Roberts (1959), Morais and Pacheco (2000)). The alarm limit can be constant or time dependent. When the statistic crosses the limit there is an alarm, which can be false or motivated. The alarm limit is often set to control the false alarms, for example by a fixed average run length to the first false alarm (ARL0). When evaluating a surveillance system it is important to consider the timeliness of the alarms, i.e. how soon after the change a motivated alarm is called. Ideally we want an alarm system which has few false alarms and a short delay for motivated alarms. It has been shown (Frisén and de Maré (1991)) that optimal methods are based on the likelihood ratio between the in-control and out-of-control processes. Likelihood ratio methods are based on full knowledge of the process in control and out of control; only the time of the change is unknown. When monitoring a cyclical process this means that the optimal method assumes knowledge of the parametric structure of the cycles. In a practical situation we estimate the parameters using previous data. Since the cycles often change over time, there is always a risk of mis-specification. Therefore we will use a nonparametric method, which is not based on any parametric assumptions. This nonparametric approach is compared to the optimal method. The effect of different data generating processes is investigated, as well as the effect of using prior information about the turning point time.

2 Statistical surveillance

In the business cycle application, the process under surveillance, X, is a monthly (or quarterly) leading economic indicator. Each month (or quarter) the alarm system is used to decide whether there is enough evidence that the change (here a turn) has occurred yet. Thus we have repeated decisions, and at each decision time we want to discriminate between the event that the process is in control (denoted D) and the event that the process is out of control (denoted C). The following model for X at time s is used

where ε is iid N(0; σ²). At an unknown time τ there is a turn in μ. The assumptions in (1) may be too simple for many applications, and extensions could be motivated by the features of the data or by theory. Model (1), however, is used here to emphasize the inferential issues. We want to control the risk of giving an alarm when there has been no turn. In hypothesis testing this is done by the size (α). In a situation with repeated decisions, if α should be limited (to e.g. 0.05, see Chu et al. (1996)), then the alarm limit would have to increase with time. The drawback of this is that the time until the change is signaled can be long for changes that occur late (see Pollak and Siegmund (1975), Bock (2004)). In much theoretical work on surveillance the false alarms are controlled by a fixed expected probability of a false alarm (see Shiryaev (1963), Frisén and de Maré (1991)). In quality control the ARL0 is often fixed (the average run length to the first alarm when the process is in control). Hawkins (1992), Gan (1993) and Anderson (2002) suggest that the control instead be made by the MRL0 (the median run length), which has easier interpretations for skewed distributions and requires much shorter computer time for calculations. The time of alarm, tA, is defined as the first time at which the alarm statistic crosses the alarm limit. Quick detection is important, and one evaluation measure that reflects this is the conditional expected delay (see Frisén (2003))

where τ = time of the turn. CED measures how long it takes before the system signals that a turn has occurred. Often the ARL1 is reported (the average time to an alarm, given that the change occurred at the same time as the surveillance was started); ARL1 = CED(1) + 1.
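The defining display of the conditional expected delay was lost in extraction; a form commonly used in this literature, and consistent with the relation ARL1 = CED(1) + 1 stated above, is:

% Conditional expected delay of a motivated alarm for a turn at time tau = t
\mathrm{CED}(t) = E\left[\, t_A - \tau \;\middle|\; t_A \geq \tau,\ \tau = t \,\right].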

2.1 The likelihood ratio method

The surveillance method that is based on the likelihood ratio, f(x | C)/f(x | D), is optimal in the sense of a minimal expected delay for a given false alarm probability (Frisén (2003)). The events C and D are formulated according to the application. For example, when it is of interest to detect whether the change (in μ) has occurred at the current time, then C = {τ = s} and D = {τ > s}, which implies μD: μ(1) = μ(2) = ... = μ(s) = μ0, and μC: μ(1) = ... = μ(s − 1) = μ0, μ(s) = μ1.


Thus for this situation it is optimal to use the Shewhart method, where the alarm statistic is based only on the last observation, x(s). In this paper we are interested in detecting whether there has been a change since the start, i.e. C = {τ ≤ s} = {{τ = 1}, {τ = 2}, ..., {τ = s}} and D = {τ > s}. Then the likelihood ratio is based on all s partial likelihood ratios, i.e.

where x,={x(l), x(2), ..., x(s)} and wj=P(z=j)/P(r$s). We want to detect a turn in p (hereafter exemplified with a peak), thus we know that p is such that z > s (3a) ( 1 ) ( 2 )(

j

- l),(j-1 .

s),

z=j.(3b)

If the correct parametric structure of μ is known (for example piecewise linear), then μ, conditional on the events D and C, can be specified in terms of parameters β0, β1 and β2, which are known; Cj denotes the event {τ = j}.

Given that the parametric model of μ is known (for example as above), the following surveillance method can be used (suggested by Shiryaev (1963), Roberts (1966)). The method is hereafter denoted the SR method and gives an alarm when

   Σ_{j=1}^{s} f(xs | Cj) / f(xs | D) > c,

where c is a constant alarm limit. Frisén and de Maré (1991) showed that using the SR method for surveillance is, under certain conditions, the same as using the posterior probability in a Hidden Markov Model approach (Hamilton (1989), Le Strat and Carrat (1999)).
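The SR statistic is easy to update recursively once the parametric model is fully specified. The following minimal Python sketch illustrates it for the simplest case of a known shift in the mean of iid normal observations; this is a simplified stand-in for the turning-point model of the paper, and the parameter values in the usage lines are illustrative only.

```python
import numpy as np

def sr_alarm(x, mu0, mu1, sigma, c):
    """Shiryaev-Roberts surveillance for a shift in mean from mu0 to mu1
    in iid normal data (a simplification of the turning-point model).
    Alarm when the sum of partial likelihood ratios exceeds c."""
    lrs = []   # partial likelihood ratios f(x_s | C_j) / f(x_s | D), j = 1..s
    for s, obs in enumerate(x, start=1):
        # per-observation likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2)
        step = np.exp((mu1 - mu0) * (obs - (mu0 + mu1) / 2.0) / sigma**2)
        lrs = [lr * step for lr in lrs] + [step]
        if sum(lrs) > c:
            return s          # time of alarm t_A
    return None

# illustrative use: shift from 0 to 1 at time 40
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 40), rng.normal(1, 1, 20)])
print(sr_alarm(x, mu0=0.0, mu1=1.0, sigma=1.0, c=1000.0))
```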

2.2 The maximum likelihood ratio method

In practice μ is seldom known but estimated from previous data. If the structure of μ changes over time (as it often does, see Figure 1), then an estimate based on previous data can result in a mis-specified μ (further discussed in Section 3.3).

Fig 1: Swedish industrial production index, monthly data for the period January 1976 to December 1993. Source: Statistics Sweden.

A surveillance method based on non-parametric estimation, without any parametric assumptions, was suggested by Frisén (1994). This method is based on the maximum likelihood ratio

   max f(xs | C) / max f(xs | D),

where the likelihoods are maximized under the order restrictions in (3). The aim is to detect a peak in μ, i.e. a change from monotonically increasing to unimodal. Thus, we have

where μD and μC are unknown and estimated, unlike the likelihood ratio in (4) where μD and μC are known. The estimation is made using non-parametric regression under order restrictions, see Barlow et al. (1972) and Frisén (1986). The solution technique is the pool adjacent violators algorithm, described e.g. in Robertson et al. (1988). For the monotonic case (μD), also called isotonic regression, the aim is to estimate μ(t) = E[X(t)] as a function of time, under order restrictions. This is a least squares estimate where the sum of squares is minimized under the monotonicity restriction in (3a): μ(1) ≤ μ(2) ≤ ...

Focus is on episodes occurring in specified time windows. A window of width win is a subsequence Et, ..., Et+win−1 of win successively observed events. For instance, E100 = A4, E101 = ∅, E102 = A3, E103 = A1, E104 = A2 is a window of length 5 starting at time t = 100. Successive windows Et, ..., Et+win−1 and Et+1, ..., Et+win of given width win overlap in the win − 1 observations Et+1, ..., Et+win−1. For given width win, only the T windows between E−win+1, ..., E0 and ET, ..., ET+win−1 are of interest; the remaining windows consist of the null event ∅ only. An episode α of events Ai1, ..., Aik occurs in a window Et, ..., Et+win−1 if there are k pairwise different times t ≤ t1 ≤ ... ≤ tk ≤ t + win − 1, in the time order prescribed by the episode, with Et1 = Ai1, ..., Etk = Aik. For instance, the window E100 = A4, E101 = ∅, E102 = A3, E103 = A1, E104 = A2 contains all episodes from figure 1. If a window contains an episode α, it also contains all subepisodes of α.
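The window/episode definitions translate directly into code. The following minimal Python sketch checks whether a serial episode occurs, in the prescribed order, within one window; the event labels and the example episodes are hypothetical illustrations, not taken from figure 1 of the paper.

```python
def episode_in_window(episode, window, null_event=None):
    """Check whether the serial episode (a sequence of event types) occurs
    in the window (a list of observed events, possibly containing nulls),
    with the events appearing at pairwise different times in the given order."""
    pos = 0
    for event in window:
        if event == null_event:
            continue
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                return True
    return False

# illustrative use with the window E100,...,E104 from the text
window = ["A4", None, "A3", "A1", "A2"]
print(episode_in_window(["A3", "A2"], window))   # True: A3 occurs before A2
print(episode_in_window(["A2", "A3"], window))   # False: wrong order
```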

t, e. g., to allow for preventive action on the system. The necessary distance is prescribed by the lead time length or warning time length w, i. e., any prediction made at time t refers to times s ≥ t + w. If the reference times are too far away from the prediction time t, the prediction is meaningless. The latter requirement is reflected by prescribing the monitoring period length m > w, where any prediction made at time t refers to times s ≤ t + m. Hence a prediction of a target event Z at prediction time t asserts that Z will occur at some time t + w ≤ s ≤ t + m, i. e., that there will be some time t + w ≤ s ≤ t + m with Es = Z. Hence the meaning of "correct prediction" ("hit") and "false prediction" is obvious.

The empirical evaluation of the prediction strategy reflects two opposing requirements which have to be balanced by the strategy. The first requirement is diversity in prediction attempts, i. e., the strategy should not concentrate on the most promising target events where successful prediction is most easily achieved. The set of target events should be covered as far as possible by correct predictions. This amounts to requiring a high value of the hit rate or recall

   recall = #target events correctly predicted / #target events occurring.        (3)

The second requirement is precision. The number of correct predictions should be large in comparison with the number of false predictions. This amounts to requiring a high value of the precision

   precision = #correct predictions / #predictions.        (4)

Both quantities recall and precision are simultaneously accounted for by the weighted harmonic mean

   Fτ = 1 / [ (1/(τ + 1)) · (1/precision + τ/recall) ]
      = (τ + 1) · recall · precision / (recall + τ · precision),        (5)

where a weight τ > 0 is attached to recall. Fτ is known as the F-measure in the theory of information retrieval, see van Rijsbergen (1979). The measures recall, precision, Fτ can be used to evaluate the entire prediction strategy or single patterns. For further diversity considerations a distance measure d(φ, ψ) between two prediction patterns is established as the ratio of the number of target events occurring where φ and ψ differ in prediction, divided by the total number of target events occurring. Then a measure N(φ) is defined which measures the similarity of a prediction pattern φ to the entire set of prediction patterns.
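A minimal Python sketch of the evaluation measures (3)-(5); the event counts used below are illustrative placeholders, not data from the paper.

```python
def recall(correctly_predicted_targets, occurring_targets):
    # hit rate (3): share of occurring target events that were correctly predicted
    return correctly_predicted_targets / occurring_targets

def precision(correct_predictions, predictions):
    # (4): share of issued predictions that were correct
    return correct_predictions / predictions

def f_measure(rec, prec, tau=1.0):
    # weighted harmonic mean (5); tau > 0 is the weight attached to recall
    return (tau + 1.0) * rec * prec / (rec + tau * prec)

r = recall(8, 10)        # e.g. 8 of 10 target events correctly predicted
p = precision(8, 20)     # e.g. 8 of 20 predictions were correct
print(r, p, f_measure(r, p, tau=1.0))   # tau = 1 gives the ordinary harmonic mean
```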

From a learning event sequence (Et) the sets ΦZ of prediction patterns are determined by a genetic algorithm. The algorithm is initialized with patterns consisting of single events. In each step, new patterns are created from existing ones by crossover and mutation operators, and worse patterns are discarded in favour of better ones, where the patterns are evaluated by the quotient Fτ(φ)/N(φ) of the F-measure and the similarity measure N(φ). For details, see Weiss (1999) and Weiss & Hirsh (1998).

12 Relations between Temporal Pattern Analysis and Stochastic Time Series Analysis

Paragraphs 10 and 11 describe two instances of KDD approaches to the analysis of patterns in time series of events, one concentrating on frequent patterns, the other concerned with the prediction of rare events. In recent years, the subject has received growing interest in research, see for instance the bibliography by Roddick & Spiliopoulou (1999). Time series analysis is a classical subject of stochastics. Over the last two decades, the analysis of categorical time series received considerable attention in the academic literature. Several models were suggested. Raftery (1985) introduced the mixture transition distribution (MTD) model for p-th order Markov chains. For methods of empirical inference under MTD models see Berchtold & Raftery (1999). Jacobs & Lewis (1978a, 1978b, 1978c) introduced DAR (discrete autoregressive) and DARMA (discrete autoregressive moving average) processes. Jacobs & Lewis (1983) generalize DARMA to NDARMA(p, q) (discrete autoregressive moving average) models. Several authors discuss regression models, see Fahrmeir & Kaufmann (1987), Green & Silverman (1994), Fahrmeir & Tutz (2001). Hidden Markov (HM) models are popular in speech recognition, see Rabiner (1989) or MacDonald & Zucchini (1997). Measures of categorical dispersion and association are discussed by Liebetrau (1990) or Agresti (1990).

In applications, stochastic modelling of categorical time series suffers in two related respects: communication and simplicity. Stochastic categorical time series models are poorly communicated to practitioners. They are discussed rather in the academic stochastic literature and are known only in specialized communities. There is no easily accessible literature for categorical prediction problems as it exists in cardinal time series analysis. Typically, Weiss & Hirsh (1998) refer to the monograph by Brockwell & Davis (1996) as their source of time series prediction, they state that "these statistical techniques are not applicable to the event prediction problem", and they conclude that learning by data mining should be used to solve their problem. In theoretical substance and in the way of presentation, stochastic categorical time series models are not accessible for practitioners. For the analysis of cardinal temporal data, a set of convenient and flexible concepts and methods is widely acknowledged and used, e. g., serial correlation, ARMA models, ARIMA models, state space models, Kalman filtering. A similar standard toolbox does not exist for the case of categorical temporal data. Since stochastic models are insufficiently communicated and not suitably tailored for application, the practice of categorical time series analysis has widely become a subject of the KDD community. The advantages of KDD are sparseness, simplicity, immediacy, flexibility, potential of customization, and adaptivity. A difficult problem like event analysis or prediction in a telecommunication network can be tackled immediately without preliminary modelling of the specific structure of the underlying phenomenon. Assumptions are few and elementary, e. g., assuming some probability to be constant over a time period. In particular, narrowing assumptions on probability distributions are avoided. Hence the method applies to a great variety of situations. Provided that new data are at hand, a KDD approach easily adapts to system dynamics, e. g., reconfiguration or expansion of a network. The drawbacks of KDD approaches are poverty in explanation and structural analysis, opaqueness and the heuristic character of methods, inability to account for prior incomplete knowledge on the phenomenon, and inability to distinguish between substantial and random effects. The source of the event series remains a black box, the laws of the underlying phenomenon (network, system) remain unknown. However, this kind of black box modelling governs large parts of cardinal time series analysis, too, e. g., consider the Box-Jenkins modelling approach. Mannila et al. (1997) mention a marked point process as a stochastic structure for categorical time series. Alas, sufficiently simple and customized tools for the analysis of such a structure are not on the market. A more promising approach are hidden Markov (HM) models. These models are clearly and transparently structured, and serve well to explain the relation among observed event patterns and the stipulated states of an underlying system. However, HM models are difficult in theory, and implementation requires expert knowledge in stochastics, time, and a thorough study of the subject matter. HM models do not serve for rapid customized solutions. Adaptation to reconfigurations in the systems may be difficult. Whereas KDD approaches are already widely used in network monitoring, the potential of HM modelling is still under study. Several contributions resulted from the MAGDA (Modélisation et Apprentissage pour une Gestion Distribuée des Alarmes) project run by France Telecom. Fabre et al. (1998) describe the network as a so-called partially stochastic Petri net which contains random and deterministic variables. The hidden states of the net are subject to an HM model. An extension of the Viterbi algorithm is used to calculate the most likely state history from the observed alarms, see also Aghasaryan et al. (1998), Benveniste et al. (2001, 2003). Practical experiences with these approaches are not reported on. For many applications, immediacy, flexibility, and adaptivity are predominant requirements. In these cases, a KDD approach is preferable. It is an interesting strategy for statisticians to refine and clarify KDD approaches by stochastic reasoning. An important topic is the definition of standard objective functions to measure the performance of prediction policies, see Hong & Weiss (2004). Measures mentioned in paragraph 11, like recall, precision and the F-measure, are common in the information retrieval and language processing communities, see Yang & Liu (1999). They are generally used as empirical measures without an underlying stochastic model. It is necessary that such measures are formulated and investigated in terms of a stochastic model. In particular, the relations to stochastic measures of prediction quality should be investigated.

13 Temporal Pattern Analysis and Statistical Process Control

Paragraph 8 introduced the topic of temporal pattern analysis from the point of view of telecommunication network monitoring. The topic is of similar importance for monitoring and control of industrial processes or systems.

13.1 Complex Monitoring and Control Tasks. Temporal pattern analysis is an important tool in monitoring and control of complex systems. Methods for detecting and interpreting patterns of events (status messages, alarms, adjustments etc.) are important for obtaining and processing information from the system. For instance, Milne et al. (1994) develop a feedback monitoring and maintenance policy for industrial gas turbines which have a quite sophisticated structure. An important part of the policy is a tool ("chronicle model") for representing temporal episodes of system events like external actions, internal processes, status messages, alarms, and for associating episodes with maintenance actions. An online recognition system detects episodes from the chronicle model during system operation.

13.2 Event Pattern Analysis and Control Charting. Temporal pattern analysis can help to extract more information from control charts. Kusiak (2000) interprets a control chart as a simple clustering mechanism which divides the sample space into two regions: an alarm region (event A) and a no-alarm region (event ∅). Thus control charting produces episodes of the type ∅ · ... · ∅ · A. The information contained in the corresponding specific sequence T1, T2, ... of sample statistics is not used. However, this information may be useful for two purposes: 1) For the specific purpose of the control chart to detect predefined out-of-control situations rapidly, e. g., a shift in the mean or in the variance. 2) For the purpose of learning about the process. A well-known approach to achieve purpose 1) in the classical two-sided Shewhart X̄ chart is the addition of warning limits and runs rules. In addition to the action limits (lower and upper 3 sigma limits), the warning limits define further regions in the sample space (range of X̄), usually bounded by the target, the positive and negative 1 sigma limits, and the positive and negative 2 sigma limits. Thus the range of X̄ is divided into 8 regions which correspond to 8 events. The runs rules prescribe an alarm if the temporal succession of the occurrences of these events corresponds to certain patterns (episodes). A popular set of three rules was originally used at Western Electric Company (1956). In various situations, charts with runs rules perform better than simple Shewhart charts, see Page (1955), Roberts (1958), Bissel (1978), Wheeler (1983), Champ & Woodall (1987, 1990), and Göb et al. (2001). In specific situations, further sets of rules may be useful. Methods for finding and administrating such rules can be adopted from alarm pattern correlation in network monitoring. Control charting intends to detect rare events. It should be investigated whether methods for predicting rare events as exemplified in paragraph 11 can be helpful to improve upon control chart performance. A formal view considers a control chart as an iterated statistical test of significance of a parametric hypothesis, e. g., on the mean or variance of a distribution. Though helpful for mathematical analysis, this view unnecessarily restricts the potential of control charting, see the discussion by Woodall (2000). A control chart can convey a lot of additional information on the process. Experienced operators study patterns on the chart to detect unwelcome variation and to learn about unknown factors or sources of variation. As a rule of thumb, Wheeler (1995), page 139, recommends to seek an explanation for any time pattern that repeats itself eight times in succession. Patterns are also used to track assignable causes after an alarm. Such intuitive reasoning can be improved by tools for detecting temporal patterns and for administrating associated rules. Patterns in control charts have been investigated with standard pattern recognition methods, e. g., with neural networks, see Hwarng & Hubele (1991) or Smith (1992). However, customary classification methods are not designed for temporal patterns, and they focus on previously defined pattern types like trend or cycle. The application of more recent methods designed specifically for temporal patterns seems more promising. Investigation of temporal patterns may also be used to tune a so-called pre-control chart. See Bhote (1988) for details on pre-control.
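To make the zone/runs-rule idea concrete, here is a minimal Python sketch of two classical Western-Electric-style rules applied to standardized sample statistics (values expressed in sigma units from the target); the particular rule set and thresholds below are common textbook choices, not rules taken from this paper.

```python
def runs_rule_alarms(z):
    """z: list of standardized sample means (in sigma units from target).
    Returns the indices (0-based) at which a rule signals."""
    alarms = []
    for i, value in enumerate(z):
        # Rule 1: one point beyond the 3 sigma action limits
        if abs(value) > 3:
            alarms.append(i)
            continue
        # Rule 2: two of three successive points beyond the same 2 sigma warning limit
        if i >= 2:
            window = z[i - 2:i + 1]
            if sum(v > 2 for v in window) >= 2 or sum(v < -2 for v in window) >= 2:
                alarms.append(i)
    return alarms

print(runs_rule_alarms([0.2, 2.3, 1.1, 2.4, -0.5, 3.2]))  # signals at index 3 (rule 2) and 5 (rule 1)
```

Additional zone rules (e.g. runs on one side of the target) fit the same pattern-matching scheme, which is exactly where the episode methods of paragraphs 10 and 11 can be reused.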

13.3 Event Pattern Analysis in Automation. Automation often intends to substitute the manual operations of skilled operators by automata. The analysis of successive patterns of operations can support this process. An approach of this type is described by Heierman & Cook (2003).

13.4 Event Pattern Analysis and Process Monitoring. Classical online monitoring (control charting) methodology considers the process as a black box and ignores information about explanatory factors. This approach was adequate in a time when online information from the process and its environment was difficult to obtain and process. Nowadays, a lot of information (occurrences of events) is available from measuring equipment, machines and automata in the production line and from sources in the environment of the process. Practitioners criticize the black box approach of classical monitoring tools which do not exploit further information. In a customary view, the investigation of process factors is completely delegated to off-line SPC in an experimental preproduction phase which implements techniques like quality function deployment (QFD), failure mode and effect analysis (FMEA), and design of experiments (DOE). However, designed experiments may produce designed results. The real operating conditions may be different and may introduce unexpectedly varying factors, e. g., production speed, material properties, operators' dispositions, computer problems. Event pattern analysis can help to learn from events in the process and its environment. In particular, a global SPC approach, see paragraph 4, can use event pattern analysis in the macro analysis of processes.

13.5 Event Management

Paragraph 8 motivates event management by problems of telecommunication network monitoring. Similar and related problems occur on several levels of industrial organizations, particularly in manufacturing and in supply chains. A manufacturing system or a supply chain can be considered as a network of nodes corresponding to departments, assembly lines, automata, computers, measuring devices, conveyances, freight routes, logistic nodes. Malfunctions or failures in a single component can propagate through the system. In a modern environment, the nodes permanently collect and communicate information in the form of status messages, signals, alarms, reports, e. g., on machine status, machine adjustments, machine malfunction or failure, control chart signals, throughput, quality deviations (product, material), environment parameters (temperature, humidity), shortage of materials, shortage of manpower, changes in operators, orders overdue, freight status, transit delays. These events have to be processed, categorized and analyzed, so as to take suitable action on the level of management, organization, engineering, operation. This is the task of manufacturing event management (MEM) or supply chain event management (SCEM). Similar problems occur in customer relationship event management (CREM). In industry, interest in event management is growing rapidly. Lots of event management software packages are on the market, e. g., ALTA Power, Aspen Operations Manager, SAP Event Management, Visiprise Event Manager, Movex, Categoric Software, Matrikon ProcessGuard. Yet, the topic has received little attention in the academic literature. As in telecom network monitoring, four aspects have to be distinguished: 1) Tools for notifying events. 2) Tools for the intelligent administration of events, event patterns and interpretations thereof, and rules. 3) Event correlation, i. e., determining and interpreting meaningful patterns of events. 4) Inference from event patterns (prediction). Aspect 1) is essentially a matter of information transmission and reporting. Aspect 2) requires technology of databases, rule based systems, knowledge bases. Aspects 3) and 4) include the topic of analyzing time series of events. Practitioners criticize, see Bartholomew (2002), that event management packages heavily concentrate on aspect 1) and, occasionally, aspect 2), but fall short completely in aspects 3) and 4). Statisticians should become aware of the opportunities for their discipline in the area of event management. Event management is an excellent framework for statistical control methodology on a macro level, see paragraph 4. The framework is described by Ming et al. (2002) in their architecture for anticipative event management and intelligent self-recovery consisting of four components: i) event monitoring and filtration, ii) intelligent event manager, iii) intelligent self-recovery engine, iv) supporting databases. Stochastic problems are inherent in components i), ii), iii). In particular, methods for analyzing categorical time series are required.

14 Conclusion

We have reviewed the relevance of database technology and KDD for process monitoring and control. In particular, we have studied data warehousing and methods for temporal pattern analysis. Both fields can contribute to enhance classical statistical control methodology, and to integrate statistical data analysis into higher order decision processes in industrial organizations.

References 1. Aghasaryan, A., Fabre, E., Benveniste, A,, Boubour, R., Jard, C. (1998) "A Hybrid Stochastic Petri Net Approach to Fault Diagnosis in Large Distributed Systcms". In : Mathematical Thcory of Network and Systems (MTNS), edited by A. Bcghi, L. Fincsso, and G. Picci, I1 Poligrafo, Padova, Italy, pp. 921-924. 2. Agrawal, R., Lawless, J. F., and Mackay, R. J. (1999) "Analysis of Variation Transmission in Manufacturing Processes Part 11". Journal of Quality Technology, Vol. 31, No. 2, pp. 143-154. 3. Agrcsti, A. (1990) Categorical Data Analysis. John Wiley and Sons Inc., New York. 4. Bartholomew, D. (2002) "Event Management: Hype or Hope?" Industry Week, May 2002. 5. Bcndcll, A., Disney, J., and McCollin, C. (1999) "The Future Role of Statistics in Quality Engineering and Management". The Statistician, 48, Part 3, pp. 299-326. -

6. Benveniste, A,, Le Gland, F . , Fabre, E., and Haar, S. (2001) "Distributed Hidden Markov Models". Pages 211-220 in: Optimal Control and PDE's - Innovations and Applications. In honor of Alain Bensoussan on the occasion of his 60th birthday. Editcd by J.-L. Menaldi, E. Rofman, and A. Sulem. IOS Press, Amsterdam. 7. Benveniste, A,, Fabre, E., and Haar, S. (2003) "Markov Nets: Probabilistic Models for Distributed and Concurrent Systems". IEEE Transactions on Automatic Control, 48, 11, pp. 1936-1950. 8. Berchthold, A., and Raftery, A. (1999) The mixture transition distribution (MTD) model for high-order Markov chains and non-Gaussian time series. Technical Report 360, Department of Statistics, University of Washington. 9. Bhote, K . R. (1988) World Class Quality: Design of Experiments Made Easier More Cost Effective than SPC. American Management Association, New York. 10. Bouloutas, A. T . , Calo, S., and Finkcl, A. (1994) "Alarm Correlation and Fault Identification in Communication Networks". IEEE Transactions on Communications, Vo1.42, No. 21314, pp. 523-533. 11. Braucr, B. (2001) "Data Quality - Spinning Straw Into Gold". Paper 117 in: Proccedings of thc 26th SAS Users Group International Conference, SAS Institutc Inc. 12. Brockwell, P. J . , and Davis, R. A. (1996) Introduction t o Time Series and Forecasting. Springer-Verlag, New York. 13. Brugnoni S., Bruno G., ManioncR., Montariolo E., Paschetta E., Sisto L. (1993) "An Expert System for Real Time Fault Diagnosis of the Italian Telecommunications Network". In: Proceedings of the IFIP TC6/WG 6.6 Third International Symposium on Integrated Network Management, pp. 617-628, ElsevierINorthHolland. 14. Davison, B., and Hirsh, H. (1998) "Probabilistic Online Action Prediction". In: Proccedings of the AAAI Spring Symposium on Intelligent Environmemnts. 15. Drusinsky, D., and Shing, M.-T. (2003) "Monitoring Temporal Logic Specifications Combincd with Timc Scrics Constraints". Journal of Universal Computer Science, vol. 9, no. 11, pp. 1261-1276. 16. Fa.bre, E., Aghasaryan, A., Benveniste, A., Boubour, R., and Jard, C. (1998) "Fault Dctection and Diagnosis in Distributed Systems: An Approach by Partially Stochastic Petri Nets". Discrete Event Dynamic Systems 8, 2 (Special issue on Hybrid Systems), pp. 203-231. 17. Faltin, F. W., Mastrangelo, C. M., Rungcr, G. C., and Ryan, T . P. (1997) "Considerations in the Monitoring of Autocorrelated and Independent Data". Journal of Quality Technology, Vol. 29, No. 2, pp. 131-133. 18. Fahrmeir, L., and Kaufmann, H. (1987) "Regression models for non-stationary categorical time series". Journal of Timc Series Analysis, Vol. 8, No. 2, pp. 147-160. 19. Fahrmcir, L., and Tutz, G. (2001) Multivariate Statistical Modelling Based on Gcncralixcd Linear Models. Springer-Verlag, New York. 20. Fong, D. Y. T . , and Lawless, J . F. (1998) "The Analysis of Process Variation Transmission with Multivariate Measurcmcnts". Statistica Sinica, 8, pp. 151164. 21. Friedman, J. H. (1997) "Data Mining and Statistics: What's t h e Connection?". In: Proccedings of t h e 29th Symposium on the Interface, edited by D. Scott. 22. Fricdman, J. H. (2001) "The Role of Statistics in the Data Revolution". International Statistical Review, 69, 5.

23. Frohlich, P., Nejdl, W., Jobmann, K., and Wietgrefe, H. (1997) "Model-Based Alarm Correlation in Cellular Phone Networks". In: Proceedings of t h e Fifth International Symposium on Modeling, Analysis, and Simulation of Computer and Telccommunicatiori Systems (MASCOTS). 24. Gale, W . A , , Hand, D. J . , and Kelly, A. E. (1993) "Artificial Intelligence and Statistics". Pages 535-576 in: Handbook of Statistics 9: Computational Statistics, edited by C. R. Rao, North-Holland, Amsterdam. 25. Gob, R., Dcl Ca.stilto, E., a.nd Ra.tz, M. (2001) "Run Length Comparisons of Shewhart Charts and Most Powerful Test Charts for the Detection of Trends and Shifts". Communications in Statistics, Simulation and Computation, 30, 2, pp. 355-376. 26. Green, P. J., and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, London. 27. Hand, D. J . (1998) "Data Mining: Statistics and More?" T h e American Statistician, Vol. 52, No. 2, pp. 112-118. 28. Hand, D. J. (1999) "Statistics and Data Mining: Intersecting Disciplines". SIGKDD Explorations, Volume 1, Issue 1, pp. 16-19. 29. Hcicrman, E., and Cook, D. J. (2003) "Improving Home Automation by Discovering Regularly Occurring Device Usage Patterns". In: Proceedings of the Intcrnational Confcrcncc on Data Mining, pp. 537-540. 30. Hong, S. J . , and Wciss, S. (2004) "Advances in Predictive Models for Data Mining". Pattern Recognition Letters Journal. To appear. 31. Hwarng, H. B., and Hubele, N. I?. (1991) "X-bar Chart Pattern Recognition Using Ncural Networks". ASQC Quality Congress Transactions, pp. 884-889. 32. Inrnon, W. H. (1996) Building the Data Warehouse. John Wiley and Sons Inc., New York. 33. Jacobs, P. A,, and Lewis, P. A. W. (1978a) "Discrete time series generated by mixturcs. I: Correlational and runs properties". Journal of t h e Royal Stat. Soc. B, Vol. 40, No. 1, pp. 94-105. 34. Jacobs, P. A., and Lewis, P. A. W . (197%) "Discrete time series generated by mixturcs. 11: Asymptotic properties". Journal of the Royal Stat. Soc. B, Vol. 40, No. 2, pp. 222-228. 35. Jacobs, P. A., and Lewis, P. A. W. (1978a) "Discrete time series generated by mixturcs. 111: Autorcgressive processes (DAR(p))". Naval Postgraduate School Technical Report NPS55-78-022. 36. Jacobs, P. A,, and Lewis, P. A. W . (1983) "Stationary discrete autoregressivemoving average time series generated by mixtures". Journal of Time Series Analysis, Vol. 4, No. 1, pp. 19-36. 37. Jakobson, G., and Weissman, M. D. (1993) "Alarm Correlation". IEEE Network, 7(6), pp. 52-59. 38. Ji, X., Zhou, S., Cao, J . , and Shao, J . (2001) "Data Warehousing Helps Enterprise Improve Quality Management". Paper 115 in: Proceedings of the 26th SAS Users Group Intcrnational Conference, SAS Institute Inc. 39. Kanji, G. K., and Arif, 0 . H. (1999) "Quality Improvement by Quantile Approach". Bulletin of the Intcrnational Statistical Institute, 52nd Session, Procecdings Tome LVIII. 40. Klcnz, B. W., and Fulcnwider, D. 0 . (1999) "The Quality Data Warehouse: Solving Problems for the Enterprise". Paper 142 in: Proceedings of the 24th SAS Users Group International Conference, SAS Institute Inc.

41. Kusiak, A. (2000) "Data Analysis: Models and Algorithms". Pages 1-9 in: Proceedings of thc SPEI Confcrcnce on Intelligent Systems and Advanced Manufacturing, editcd by P. E. Orban and G. K. Knopf, SPIE, Vol. 4191, Boston. 42. Lawlcss, J. F. Mackay, R. J . , and Robinson, J . A. (1999) "Analysis of Varia.tion Transmission in Manufacturing Processes - Part I". Journal of Quality Tcchnology, Vol. 31, No. 2, pp. 131-142. 43. Lcnz, H.-J. (1987) "Dcsign and Implementation of a Sampling Inspection System for Incoming Batches Based on Relational Databases". Pages 116-127 in: Frontiers in Statistical Quality Control 3, edited by H.-J. Lenz, G. B. Wetherill, P.-Th. Wilrich. Physica-Verlag, Heidelberg. 44. Liebetrau, A. M. (1990) Measures of Association. Fifth Edition. Sage Publications, Newbury Park, London, New Delhi. 45. MacDonald, I, L., and Zucchini, W. (1997) Hidden Markov and other models for discrete-valued time series. Chapman and Hall, London. 46. Mannila, H. (2000) "Theoretical Frameworks for Data Mining". SIGKDD Explorations, Volume 1, Issue 2, pp. 30-32. 47. Mannila, H., Toivonen, H., and Verkamo, A. I. (1997) "Discovery of Frequent Episodes in Event Sequences". Data Mining and Knowledge Discovery 1(3), pp. 259-289. 48. McClellan, M. (1997) Applying Manufacturing Execution Systems. St. Lucie Press, Boca Raton. 49. Megan, L., and Cooper, D. J . (1992) "Ncural Network Based Adaptive Control Via Temporal Pattcrn Recognition". Canadian Journal of Chemical Engineering, 70, p. 1208. 50. Milne, R., Nicol, C., Ghallab, M., Trave-Massuyes, L., Bousson, K., Dousson, C., Qucvedo, J., Aguilar, J., and Guasch, A. (1994) "TIGER: Real-Time Situation Assessmcnt of Dynamic Systems". Intelligent Systems Engineering, pp. 103-124. 51. Ming, L., Bing, Z. J . Zhi, Z. Y., and Hong, Z. D. (2002) "Anticipative Event Managcmcnt and Intelligcnt Self-Recovery for Manufacturing". Technical Report AT/02/020/MET of the Singapore Institute of Manufacturing Technology. 52. Moller, M., and Tretter, S. (1995) "Event correlation in network management systems". In: Proceedings of the 15th International Switching Symposium, volumc 2, Berlin. 53. Oates, T . (1999) "Identifying distinctive subsequences in multivariate time series by clustering". In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 322-326. 54. Padmanabhan, B., and Tuzhilin, A. (1996) "Pattern Discovery in Temporal Databases: A Temporal Logic Approach". Pages 351-354 in: Proceedings of the Sccond ACM SIGKDD Intcrnational Conference on Knowledge Discovery and Data Mining, Portland. 55. PricewaterhouscCoopers (2002) Global Data Managcmcnt Survey. PricewaterhouseCoopcrs. 56. Rabincr, L. R. (1989) "A Tutorial on Hidden Markov Models and Selectcd Application in Speech Recognition". Proceedings of the IEE, volume 77, number 2, pp. 257-286. 57. Raftcry, A. E. (1985) "A Model for High-Order Markov Chains". Journal of the Royal Statistical Society, Scries B, Vol. 47, No. 3, pp. 528-539, 1985. 58. Rijsbergcn, C. J . van (1979) Information Retrieval. Butterworths, London.

59. Roddick, J . F., and Spiliopoulou, M. (1999) "A Bibliography of Temporal, Spatial and Spatio-Temporal Data Mining Research". SIGKDD Exporations, volume 1, issue 1, pp. 34-38. 60. Rutledge, R. A. (2000) "Data Warehousing for Manufacturing Yield Improvement". Paper 134 in: Procecdings of the 25th SAS Users Group International Confcrcncc, SAS Institute Inc. 61. G. Shafcr (1976) A Mathcmatical Theory of Evidence. Princeton University Prcss, Princcton. 62. Silvcrston, L., Inmon, W . H., and Graziano, K. (1997) T h e Data Model Resource Book: A Library of Logical Data Models and Data Warehouse Designs. John Wiley and Sons Inc., New York. 63. Smith, A. E . (1992) "Control Chart Representation and Analysis via Backpropagation Neural Networks". Pagcs 275-282 in: Proceedings of t h c 1992 International Fuzzy Systems and Intelligent Control Conference. 64. Thornhill, N. F., Atia, M. R., and Hutchison, R. J . (1999) "Experiences of Statistical Quality Control with B P Chemicals". International Journal of COMADEM, 2(4), pp. 5-10. 65. Tukey, J . W . "The Future of Data Analysis". T h e Annals of Mathematical Statistics, Vol. 33, pp. 1-67. 66. Wciss, G. M. (1999) "Timewcaver: A Genetic Algorithm for Identifying Predictivc Pattcrns in Scqucnces of Events". Pages 719-725 in: Proceedings of the Genetic and Evolutionary Computation Conference, edited by W . Banzhaf, J. Daida, A. Eiben, M. Garzon, V. Honavar, M. Jakiela. Morgan Kaufmann, San Francisco. 67. Wciss, G. M. (2001) "Predicting Telecommunication Equipment Failures from Scqucnces of Nctwork Alarms". In: Handbook of Knowledge Discovery and Data Mining, edited by W . Kloesgen and J. Zytkow, Oxford University Press. 68. Wciss, G. M., and Hirsh, H. (1998) "Learning t o Predict Rare Events in Event Scqucnccs". Pagcs 359-363 in: Proceedings of the Fourth International Confercnce on Knowledge Discovery and Data Mining, AAAI Press. 69. Wheeler, D. J . (1995) Advanccd Topics in Statistical Process Control. S P C Prcss, Knoxville, Tennessee. 70. Woodall, W . H. (2000) "Controversies and Contradictions in Statistical Process Control". Journal of Quality Technology, Vol. 32, No. 4, pp. 341-350. 71. Wcstern Electric Company (1956). Statistical Quality Control Handbook. American Tclephonc and Tclegraph Company, Chicago. 72. Yang, Y., and Liu, X. (1999) " A Rc-Examination of Text Categorization Methods". In: Proceedings of the ACM SIGIR Conferencc on Research and Development in Information Retrieval.

Optimal Process Calibration under Nonsymmetric Loss Function

Przemysław Grzegorzewski¹,² and Edyta Mrówka¹

¹ Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
² Faculty of Math. and Inf. Sci., Warsaw University of Technology, Plac Politechniki 1, 00-661 Warsaw, Poland
e-mail: {pgrzeg, mrowka}@ibspan.waw.pl

Summary. A problem of process calibration for a nonsymmetric loss function is considered and an optimal calibration policy is suggested. The proposed calibration method might be used, e.g., when the losses caused by oversize and undersize are not equal.

1 Introduction

Statistical process control (SPC) is a collection of methods for achieving continuous improvement in quality. This objective is accomplished by a continuous monitoring of the process under study in order to quickly detect the occurrence of assignable causes and undertake the necessary corrective actions. Although many SPC procedures have been elaborated, Shewhart control charts are still the most popular and widely used SPC tools. A typical Shewhart control chart contains three lines: a center line corresponding to the process level and two other horizontal lines, called the upper control limit and the lower control limit, respectively. When applying this chart one draws samples at specified time moments and then plots sample results as points on the chart. As long as the points lie within the control limits the process is assumed to be in control. However, if a point plots outside the control limits we are forced to assume that the process is no longer under control. It should be emphasized that there is no connection between the control limits of the control chart and the specification limits of the process. The control limits are driven by the natural variability of the process, usually measured by the process standard deviation. On the other hand, the specification limits are determined externally. They may be set by management, the manufacturing engineers, by product developers and designers or by the customers. Of course, one should have knowledge of the inherent variability when setting specifications. But generally there is no mathematical or statistical relationship between the control limits and the specification limits (see [5], [4]).

It is evident that quality improvement makes sense if and only if the final products meet requirements and expectations. Therefore, the process should both be in control and meet the specification limits. It is important to note that we can have a process in control (i.e. stable with small variability), but producing defective items because of low capability. In this paper we consider the problem of process calibration, i.e. how to set up a manufacturing process in order to make it capable. In Sec. 2 we introduce basic notation and recall the traditional method of calibration which corresponds to the symmetric loss function. However, quite often the lower and upper specification limits cannot be treated in the same way. For example, it may happen that although one of the specification limits has been exceeded we obtain a nonconforming item that could be improved or corrected, which requires, of course, some costs. But exceeding the other, critical specification limit would lead to a complete fault that cannot be corrected. In the latter case the whole material used for manufacturing this item would be wasted and the cost would be much higher. Ladany in his papers [2], [3] considered a situation when the losses generated by passing the specification limits are constant, however different for each specification limit (see Sec. 3). However, it seems that a more adequate loss function should not be piecewise constant but the loss corresponding to a noncritical fault should increase gradually. The optimal calibration method for such a loss function is considered in Sec. 4.

2 Process calibration with symmetrical loss function

Let X denote a quality characteristic under study of a product item (length, diameter, weight, thickness, pressure strength, etc.). An item is called nonconforming if the measurement x of X lies outside a specified closed interval [LSL, USL] given by a certain lower specification limit LSL and upper specification limit USL. This interval is sometimes called a tolerance interval. Moreover, we generally assume that X is normally distributed, i.e. X ~ N(m, σ²). A common practice is to set up the mean of the variable at the midpoint between the specification limits, i.e.

   μ0 = (LSL + USL) / 2.                                  (1)

This approach is legitimate only if the incurred loss does not depend on which specification limit - the lower or the upper - is exceeded. It means that the loss function L(x) is symmetrical, e.g.

   L(x) = w   if x < LSL,
        = 0   if LSL ≤ x ≤ USL,
        = w   if x > USL,                                  (2)


where w > 0. The loss function (2) is given in Fig. 1. As can be seen, this method of calibration is very simple and natural. However, as was mentioned above, the assumption of a symmetrical loss function is very often not appropriate in practice. Let us consider the two following examples:

Example 1. Suppose that the quality characteristic under study is the inner diameter of a hole. Undersized holes can be rebored at extra cost. But the reduction of an oversized hole would either be impossible or would require much higher repair costs.

Example 2. Suppose now that the quality characteristic under study is the outside diameter of a bar or the thickness of a milled item. Oversized items can be reground at additional cost. But an undersized item could be sold only for scrap.

These examples show that in the situations described above the symmetrical loss function is useless and a nonsymmetric loss function should rather be applied.

3 A simple nonsymmetric loss function

To find the optimal calibration policy for situations as described in the examples given above, let us consider the following loss function:

   L(x) = w   if x < LSL,
        = 0   if LSL ≤ x ≤ USL,
        = z   if x > USL.                                  (3)

Fig. 2. Nonsymmetric loss function (3).

where, without loss of generality, we can assume that z < w (see Fig. 2). The expected loss for function (3) is given by the following formula:

   EL(x) = w·P(X < LSL) + z·P(X > USL).                    (4)

If we assume that the random variable X is normally distributed, i.e. X ~ N(μ, σ²), then:

   EL(x) = w·P( (X − μ)/σ < (LSL − μ)/σ ) + z·[ 1 − P( (X − μ)/σ ≤ (USL − μ)/σ ) ]
         = w·Φ((LSL − μ)/σ) + z·[ 1 − Φ((USL − μ)/σ) ],                    (5)

where Φ is the cumulative distribution function of the standard normal distribution N(0, 1). Suppose that the standard deviation σ is known and we consider the expected loss as a function of the process mean μ, i.e. EL = ELμ(x). More precisely, we are looking for such μ where the expected loss attains its minimum. The minimum can be attained for μ such that

   dELμ(x)/dμ = 0.

Thus we have

   w·φ((LSL − μ)/σ) = z·φ((USL − μ)/σ),

where φ denotes the density of the standard normal distribution N(0, 1). Hence

   w/z = φ((USL − μ)/σ) / φ((LSL − μ)/σ).

Solving this equation we obtain

   w/z = exp( [ (LSL − μ)² − (USL − μ)² ] / (2σ²) ),

which gives

   μ = (USL + LSL)/2 + σ²/(USL − LSL) · ln(w/z).

By (1) we get immediately

   μ = μ0 + σ²/(USL − LSL) · ln(w/z).                    (8)

It is easy to prove that function (5) has an extremum at the point (8) and that it is a minimum. This result corresponds to the discussion by Ladany in [2], [3]. It is worth noting that our formula (8) reduces to the classical result (1) if w = z. If the true variance σ² is not known, then we may estimate it by

   S² = (1/(n − 1)) · Σ_{i=1}^{n} (Xi − X̄)²,                    (9)

where X1, ..., Xn is a random sample from the process under study. Or, even better, we may estimate the variance using m random samples X1j, ..., Xnj, j = 1, ..., m, and then we get a pooled estimate S², where X̄j denotes the average obtained for the j-th sample. In this case our calibration policy leads to the following formula:

   μ̂ = μ0 + S²/(USL − LSL) · ln(w/z).
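A minimal Python sketch of this calibration rule (the optimal mean under the two-level loss, with the variance replaced by its estimate when unknown); the numerical values below are illustrative only.

```python
import math

def optimal_mean(lsl, usl, w, z, sigma2):
    """Optimal process set-up under the simple nonsymmetric loss:
    mu = mu0 + sigma^2 / (USL - LSL) * ln(w / z)."""
    mu0 = 0.5 * (lsl + usl)
    return mu0 + sigma2 / (usl - lsl) * math.log(w / z)

# illustrative example: exceeding LSL (loss w) is 4 times as costly as exceeding USL (loss z),
# so the optimal mean is shifted above the midpoint, away from LSL
print(optimal_mean(lsl=9.0, usl=11.0, w=4.0, z=1.0, sigma2=0.09))   # about 10.062
```

Note that the shift away from the midpoint grows with the process variance and with the asymmetry ratio w/z, and vanishes when w = z, in agreement with (1).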

4 A general model

As was mentioned in Sec. 1, it seems that a more natural loss function should be given by the formula

   L(x) = w                             if x < LSL,
        = 0                             if LSL ≤ x < USL,
        = z · (x − USL)/(ULR − USL)     if USL < x ≤ ULR,
        = z                             if x > ULR,        (13)

where, without loss of generality, z < w and the so-called upper limit for rework ULR > USL (see Fig. 3).

Fig. 3. Nonsymmetric loss function (13).

Function (13) attributes a high constant loss w to the critical fault, while in the case of the noncritical fault the loss increases gradually on the interval [USL, ULR] till it reaches the level z. Going back to Example 2, such a shape of the loss function reflects nicely that the additional costs for regrinding an oversized item depend on how much it has been oversized: a small oversize results in small costs while a relatively big oversize causes high costs. Moreover, starting from some oversize the costs of the corrective action remain constant. In this situation the expected loss is given as follows:

   EL(x) = w·P(X ≤ LSL) + z·P(X > ULR) + z·∫_{USL}^{ULR} (x − USL)/(ULR − USL) · f(x) dx,        (14)

where f(x) denotes the density of the normal distribution N(μ, σ²). After some transformations we get

   EL(x) = w·Φ((LSL − μ)/σ) + z·[ 1 − Φ((ULR − μ)/σ) ]
           + z/(ULR − USL) · { σ·[ φ((USL − μ)/σ) − φ((ULR − μ)/σ) ]
                               + (μ − USL)·[ Φ((ULR − μ)/σ) − Φ((USL − μ)/σ) ] }.

As in Sec. 3 we have to consider the expected loss as a function of the mean μ, i.e. EL = EL(μ), and to find its minimum. Thus we need dEL(μ)/dμ. After some simplifications we get

   dEL(μ)/dμ = −(w/σ)·φ((LSL − μ)/σ) + z/(ULR − USL) · [ Φ((ULR − μ)/σ) − Φ((USL − μ)/σ) ].

Hence we have to solve dEL(μ)/dμ = 0, which leads to the following equation

   (w/σ)·φ((LSL − μ)/σ) = z/(ULR − USL) · [ Φ((ULR − μ)/σ) − Φ((USL − μ)/σ) ],

and which is equivalent to

   ln(w/σ) − (LSL − μ)²/(2σ²) − ln( z/(ULR − USL) ) − ln( ∫_{(USL−μ)/σ}^{(ULR−μ)/σ} exp(−t²/2) dt ) = 0.        (16)

Unfortunately, this equation depends not only on the unknown variable μ but on 6 parameters: LSL, USL, ULR, w, z and σ, as well. In order to simplify it at least a little, let us introduce the following notation:

   a = (USL − LSL)/σ,   b = (ULR − USL)/σ,   K = w/z.        (17)

p = U S L - Aa, where the correction factor A is a solution of the following equation:

Equation (21) is more acceptable than (16) because it depends only on 3 parameters: a , b and K. But it is still very difficult to solve it analytically. It could be solved numerically but then a natural question immediately arises: how to present the result in a convenient way such that it could be recommended to practitioners? Three dimensional tables containing the correction factors A are, of course, are not the best solution! Moreover, it is still not clear which values these parameters should be considered. Therefore, we need another reparametrization. But it we look carefully on (17) we find easily that parameter a resembles one of the best known tools used in the capability analysis - the so called process capability index C p (see [I], [5], [4])

Cp =

U S L - LSL - 1 - -a. 6a 6

The process capability index is a measure of the ability of the process to manufacture products that meet specifications. If the quality characteristic under study is normally distributed and assuming the process mean is centered between the upper and lower specification limits, one can calculate probabilities of meeting specifications q = P(LSL

< X < USL).

(23)

These probabilities for a few values of the process capability index C p are given in Table 1: To illustrate the use of Table 1, notice that C p = 1 implies

Table 1. Probability of mccting specifications.

a fallout rate of 2700 parts per million, while C p = 1.5 implies a fallout rate of 7 parts per million. For more information on process capability indices and process capability analysis we refer the reader to ( [ I ] ) . Moreover, to avoid parameters a and b let us henceforth consider the ratio

Therefore, now the optimal calibration is given by (20), where the correction factor A is a solution of the fbllowing equation

Since the most common values of the process capability index applied in practice are 1, 1.33 and 1.5, we could also restrict our considerations to these values. Thus we can solve numerically (25) for each fixed Cp and then present the correction factor A in tables which depend on two parameters h and K (see Tables 2, 3 and 4). Thus finally, the calibration policy depend on three natural parameters: Process capability index C,, h which is the ratio of the length of the tolerance interval and the length of the interval where loss function (13) is increasing, and K which is the ratio of constant losses w and z . Hence, to sum up, our calibration method for loss function (13), represented by five following parameters LSL, USL, ULR, w , z , might be described by a following algorithm: -

-

fix desired value of the process capability index C,, compute the ratio h = compute the loss ratio K = y find in a Table the appropriate value of the correction factor A set-up the process mean a t p = USL - Ao.

If the standard deviation o is unknown it should be estimated by a sample standard deviation 2 = 3 (see (9) or (11)). In such a way instead of (20) we have to use the following equation:

Remark As it is known the process capability index Cp defined by (22) and listed in Table 1 with the probability (23) is only correct in case that the process mean is centered between the upper and lower specification limits, i.e. p = LSL:USL. However, in (26) we recommend a nonsymmetrical calibration. Thus one may think that there is some contradiction or confusion. Actually there is no contradiction here. All calculations and formulae hold even if we do not refer to capability index and make use of coefficient a given by (17) a,nd called standardized specification interval. However, we have decided to apply C, index which is just a / 6 because practitioners are accustomed just to this coefficient and they have considerable intuition connected witah this pammeter. Therefore, the process capability index Cp is ut,ilized ra,tller as a convenient reference parameter for choosing appropriate correction factor.

0.80 0.85 0.90 0.95 1.00

2.318 2.328 2.337 2.345 2.353

2.308 2.318 2.327 2.335 2.343

2.298 2.308 2.317 2.326 2.334

2.289 2.299 2.308 2.317 2.325

2.281 2.290 2.299 2.308 2.316

2.273 2.282 2.291 2.300 2.308

2.265 2.275 2.284 2.292 2.300

2.258 2.267 2.276 2.285 2.293

2.251 2.260 2.269 2.278 2.286

Table 2. The corrcction factor A for C, = 1.

2.244 2.254 2.263 2.271 2.279

2.238 2.247 2.256 2.265 2.273

0.90 3.416 3.408 3.401 3.394 3.387 3.381 3.375 3.369 3.364 3.359 3.354 0.95 3.422 3.415 3.407 3.400 3.394 3.367 3.382 3.376 3.371 3.365 3.36C 1.00 3.429 3.421 3.413 3.407 3.400 3.394 3.388 3.382 3.377 3.372 3.367

Table 3. The correction factor A for C, = 1.33.

0.90 3.963 3.956 3.949 3.943 3.937 3.932 3.926 3.921 3.917 0.95 3.969 3.962 3.955 3.949 3.943 3.938 3.932 3.927 3.922 1.00 3.974 3.967 3.961 3.955 3.949 3.943 3.938 3.933 3.928

Table 4. The correction factor A for C, = 1.5.

3.912 3.908 3.918 3.913 3.923 3.919

5 Conclusion We have considered the problem of process calibration with nonsymmetric loss function which should be used if e.g. the loss caused by oversize and undersize are not equal. It is worth noting that the suggested calibration policy depends on such natural and well-known parameters as t h e process capability index C,.

References 1. Kotz S., Johnson N.L., Process Capability Indices, Chapman and Hall, 1993. 2. Ladany S.P., Optimal joint set-up and calibration policy with unequal revenue from oversized and undersized items, Int. J. Operations and Production Management, 16 (1996), 67-88. 3. Ladany S.P., Optimal set-up of a manufacturing process with unequal revenue from oversized and undersized items, In: Frontiers in Statistical Quality Control, Lenz H. J., Wilrich P. Th. (Eds.), Springer, 2001, pp. 93-101. 4. Mittag H.J., Rinne H., Statistical Methods of Quality Assurance, Chapman & Hall, London, 1993. 5. Montgorncry D.C., Introduction t o Statistical Quality Control, Wiley, New York, 1991.

The Probability of the Occurrence of Negative Estimates in the Variance Components Estimation by Nested Precision Experiments

Yoshikazu Ojima, Seiichi Yasui, Feng Ling, Tomomichi Suzuki and Taku Harada Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chida, 278-85 10, JAPAN, [email protected]

Summary. Nested experiments are commonly used to estimate variance components especially for the precision experiments. The ANOVA (analysis of variance) estimators are expressed as linear combinations of the mean squares from the ANOVA. Negative estimates can occur, as the linear combinations are usually including negative coefficients. The probability of the occurrence of negative estimates depends on the degrees of freedom of the mean squares and the true values of the variance components themselves. Based on the probability, some practical recommendations concerning the number of laboratories can be derived for the precision experiments.

1 Introduction Precision is one of the most important characteristics to evaluate the performance of measurement methods. There are several precision measures; i.e. repeatability, intermediate precision measures, and reproducibility. Their importance and the ways of determination are described in I S 0 5725-3 (1994). Precision is corresponding to measurement errors which are associated to measurement processes and environments. Introducing some replications after a point in the measurement process the measurement errors before and after the point can be evaluated separately. For example, replication of days (i.e, carrying out an experiment on several days) enables to obtain an error component due to the difference between days, and replication of laboratories (i.e. participation of several laboratories) enables to obtain an error component due to the difference between laboratories. From the statistical viewpoint, a measurement result y can be expressed as

where p is a general mean or the true value, a is a random effect due to a laboratory,

p is a random effect due to a day, y is a random effect due to an operator, E is a random effect due to replication under repeatability conditions. We usually assume that p is an unknown constant, and a, P, y, E are random variables with their expectations are 0, and their variances are oA2,ciB2,cic2, and OE2, respectively. Nested experiments are commonly used to estimate variance components (oA2, ciB2,oC2,and oE2)especially for the precision experiments. The ANOVA (analysis of variance) estimators of variance components are widely used because they are unbiased estimators and can be obtained without any distributional assumptions. The estimators are expressed as linear combinations of the mean squares from the ANOVA. The mean squares are chi-squared random variables under the normality assumption. Negative estimates can occur as the linear combinations are usually including negative coefficients. For the case of balanced nested experiments, the mean squares are mutually independent chi-squared random variables, and then the probability of the occurrence of negative estimates can be evaluated by the F distribution. For the case of the generalized staggered nested experiments, the mean squares are chi-squared random variables with correlation. Applying the canonical forms of the generalized staggered nested designs the probability of the occurrence of negative estimates is evaluated in this paper. The probability of the occurrence of negative estimates depends on the degrees of freedom of the mean squares and the true values of the variance components themselves. Based on the probability and the precision of the estimators, some practical recommendation concerning the number of laboratories can be derived for the precision experiments.

2 Three-stage nested experiments 2.1 Statistical model and ANOVA Three-stage nested experiments include three random components; the statistical model of a measurement result can be generally expressed as Y I / L = ~ + ~ I + ~ I / + E , / ~ ,

a

where p is a general mean, a, is a random variable from N(0, oA2),effect of a laboratory p,, is a random variable from N(0, oB2),effect of a day E , , L is a random variable from N(0, o;), measurement error, and k = I , ... , r , , ; j = 1, ... , b,; i = 1, ... , m . We use the symbols, p~ and pBto express the ratios c i A 2 / oand ~ oB2/oE2.Sums of squares, SSA,SSB,and SSEare obtained as

where averages are defined as

Degrees of freedom,

VA,

vB,and v . are obtained as

v ~ = m -1, vB= C b , - m , and v.=Cr(/ - C b , 'J

I

I

2.2 Balanced nested experiments 2.2.1 General

The number of replications is constant in the balanced nested experiments. Hence, we denote b = 6,(for all i), and r = r, (for all i and j). Table 1 shows the ANOVA table for the balanced nested experiments. Table 1: The ANOVA table for the balanced nested experiments

Factor

A B E

Sum of Squares, SS SSA SSB SSE

Degrees of Freedom, v m-1 m(b - 1 ) mb(r-1)

Mean Square, MS SSA/V ,

E(MS)

ssB/VB

0E2+rOB2+rbOA2 O B2+rOB2

SSEIv~

02

From the ANOVA table, ANOVA estimators are derived as

&: n2

= (MSA-MSB)I(rb),

0, = MSE

(5) For the case of the balanced nested experiments, all sums of squares are mutually independent chi-square distributed. The distribution of SSA / E(MSA) is a X 2 distribution with vA degrees of freedom, and so on. From Eq. (5),the probabilities of the occurrences of negative estimates are;

where F(v, , v2 ) is an F distributed random variable with ( v , , v2 ) degrees of freedom. 2.2.2 Required number of laboratories for precision experiments From the viewpoint of the precision experiments, usual experiments are designed as r = 2 , and b = 2. Applying r = b = 2, and v~ = m 1 , v~ = m, and vc = 2m, we have [ - I , m ) < (1+2pB)/(1+2p~+4pA) ] ~ r [ c j : < o ] = ~ r F(m

< o ] = ~ r ~[ ( m2m) , < 1/(1+2pB)1.

@I (9)

Table 2 shows the minimum number of laboratories (m) on given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_B² < 0]). The calculations of the p-values for Eq. (9) are based on the programs in JSA (1972). Table 3 shows the minimum number of laboratories (m) on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]).
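Such table entries can also be reproduced directly with any routine for the F distribution. The following Python sketch (our own naming, using scipy rather than the JSA (1972) programs) searches for the smallest m that meets a required p under Eq. (9):

from scipy.stats import f

def min_labs_sigmaB(rho_B, p, m_max=2000):
    # smallest m with Pr[F(m, 2m) < 1/(1 + 2*rho_B)] <= p  (balanced design, b = r = 2)
    for m in range(2, m_max + 1):
        if f.cdf(1.0 / (1.0 + 2.0 * rho_B), m, 2 * m) <= p:
            return m
    return None

def min_labs_sigmaA(rho_A, rho_B, p, m_max=2000):
    # smallest m with Pr[F(m-1, m) < (1+2*rho_B)/(1+2*rho_B+4*rho_A)] <= p
    for m in range(2, m_max + 1):
        if f.cdf((1.0 + 2.0 * rho_B) / (1.0 + 2.0 * rho_B + 4.0 * rho_A), m - 1, m) <= p:
            return m
    return None

print(min_labs_sigmaB(rho_B=0.5, p=0.05), min_labs_sigmaA(rho_A=1.0, rho_B=0.5, p=0.05))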

Table 2: The minimum number of laboratories (m), on given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_B² < 0])

From Table 2, a large number of laboratories is required to estimate σ_B² positively for the case of smaller ρ_B.

Table 3 (a): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.01

Table 3 (b): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.02

Table 3 (c): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.05

Table 3 (d): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.10

Table 3 (e): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.20

On the contrary, Table 3 shows that a large number of laboratories is required to estimate σ_A² positively for the case of larger ρ_B. From the viewpoint of conducting precision experiments, the number of participating laboratories is usually less than 50, and the results of Tables 2 and 3 show that we may often find negative estimates of the variance components.

2.3 Staggered nested experiments

Based on the structure of the staggered nested experiments, we have b_i = 2, r_i1 = 2, and r_i2 = 1 (for all i). Table 4 shows the ANOVA table for the staggered nested experiments.

Table 4: The ANOVA table for the staggered nested experiments

Factor   Sum of Squares, SS   Degrees of Freedom, ν   Mean Square, MS   E(MS)
A        SS_A                 m - 1                   SS_A/ν_A          σ_E² + (5/3)σ_B² + 3σ_A²
B        SS_B                 m                       SS_B/ν_B          σ_E² + (4/3)σ_B²
E        SS_E                 m                       SS_E/ν_E          σ_E²
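These sums of squares can be written laboratory by laboratory through orthogonal contrasts; the following expressions are derived from the degrees of freedom and E(MS) given above rather than quoted from the paper (y_i1 and y_i2 denote the two results of the first day in laboratory i, and y_i3 the single result of the second day):

\[
SS_E=\sum_{i=1}^{m}\frac{(y_{i1}-y_{i2})^2}{2},\qquad
SS_B=\sum_{i=1}^{m}\frac{(y_{i1}+y_{i2}-2y_{i3})^2}{6},\qquad
SS_A=3\sum_{i=1}^{m}(\bar{y}_{i}-\bar{y})^2,
\]

where \(\bar{y}_i=(y_{i1}+y_{i2}+y_{i3})/3\) and \(\bar{y}\) is the grand mean; taking expectations reproduces the E(MS) column of Table 4.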

From the ANOVA table, the ANOVA estimators are derived as

σ̂_A² = MS_A/3 - (5/12) MS_B + (1/12) MS_E,    σ̂_B² = (3/4)(MS_B - MS_E),    σ̂_E² = MS_E.    (10)

For the case of the staggered nested experiments, all sums of squares are chi-square distributed; however, SS_A and SS_B are positively correlated. Concerning Pr[σ̂_B² < 0], SS_B and SS_E are mutually independent, and then the probability of negative estimates can be obtained similarly to the case of the balanced nested experiments. Table 5 shows the minimum number of laboratories (m) on given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_B² < 0]), based on Eq. (11).
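Given ν_B = ν_E = m and the expected mean squares of Table 4, Eq. (11) presumably takes the form

\[
\Pr[\hat{\sigma}_B^2<0]=\Pr[MS_B<MS_E]=\Pr\left[F(m,m)<\frac{1}{1+\tfrac{4}{3}\rho_B}\right].
\]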

Table 5: The minimum number of laboratories (m), on given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_B² < 0])

From Table 5, a large number of laboratories is required to estimate σ_B² positively for the case of smaller ρ_B. The tendency is similar to the results of the balanced nested experiments. Ojima (1998) derived the canonical form of the staggered nested experiments. Using the canonical form, Pr[σ̂_A² < 0] can be evaluated. Table 6 shows the minimum number of laboratories (m) on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]). The results of Table 6 have been obtained by Monte Carlo simulation. From Table 6, a large number of laboratories is required to estimate σ_A² positively for the case of larger ρ_B. The tendency is similar to the results of the balanced nested experiments.
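A minimal Monte Carlo sketch that could reproduce such entries (an assumed re-implementation in Python, not the author's program; it uses the per-laboratory sums of squares sketched after Table 4, the estimator of Eq. (10), and σ_E² = 1 without loss of generality) is:

import numpy as np

def prob_negative_sigmaA(m, rho_A, rho_B, n_sim=50_000, seed=1):
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, np.sqrt(rho_A), (n_sim, m))      # laboratory effects
    b1 = rng.normal(0.0, np.sqrt(rho_B), (n_sim, m))     # day effect shared by y1 and y2
    b2 = rng.normal(0.0, np.sqrt(rho_B), (n_sim, m))     # day effect of y3
    e = rng.normal(0.0, 1.0, (n_sim, m, 3))              # repeatability errors, sigma_E^2 = 1
    y1 = a + b1 + e[..., 0]
    y2 = a + b1 + e[..., 1]
    y3 = a + b2 + e[..., 2]
    lab_mean = (y1 + y2 + y3) / 3.0
    ms_a = 3.0 * ((lab_mean - lab_mean.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / (m - 1)
    ms_b = ((y1 + y2 - 2.0 * y3) ** 2).sum(axis=1) / (6.0 * m)
    ms_e = ((y1 - y2) ** 2).sum(axis=1) / (2.0 * m)
    sigmaA_hat = ms_a / 3.0 - 5.0 / 12.0 * ms_b + 1.0 / 12.0 * ms_e   # estimator of Eq. (10)
    return float((sigmaA_hat < 0.0).mean())

print(prob_negative_sigmaA(m=20, rho_A=1.0, rho_B=1.0))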

Table 6 (a): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.01

Table 6 (b): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.02

Table 6 (c): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.05

Table 6 (d): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.10

Table 6 (e): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.20

For the staggered nested experiments, fewer data are obtained in a laboratory than for the balanced experiments. Therefore the variances of MS_B and MS_E obtained from the staggered nested experiments are larger than those from the balanced experiments. Consequently, the required number of laboratories for the staggered experiments is almost twice that for the balanced experiments. This comparison is based only on the number of laboratories, not on the total number of data in an experiment. The number of data required in the staggered experiment is 3/4 of that in the balanced experiment with the same number of participating laboratories. Hence, it can be stated that the amount of information obtained from the staggered experiment is also 3/4 of that of the balanced experiment.

2.4 Practical considerations for precision experiments

As described in Eq. (1), the three-stage nested design includes three random components, i.e. α, β, ε. Applying the nested design to precision experiments, α is a random component due to a laboratory, β is a random effect due to a day, and ε is an error component under repeatability conditions. As σ_E² is the repeatability variance, a large value of ρ_B = σ_B²/σ_E², e.g. ρ_B > 5, means that σ_B² (the day-to-day variation) is relatively much larger than σ_E². In other words, the measurement process in the laboratory is not well controlled: the measurement process might not be standardized enough, or the calibration methods may not be sufficient. On the contrary, a relatively small value of ρ_B means that the measurement process in the laboratories is well controlled. However, there is a possibility that the repeatability value itself is too large in practice. If the laboratories are well controlled and the resulting situation is 0.2 < ρ_B < 0.5, it is not necessary to detect σ_B². For such a case, the occurrence of negative estimates of σ_B² is no longer a serious problem.

To evaluate the value of ρ_A, we should also consider the relative magnitude of ρ_B. The case of small ρ_A with large ρ_B is not a natural situation, because small ρ_A means there is a small difference between laboratories, while at the same time large ρ_B means a large variation between days within each laboratory. This can be caused by very poor calibration or by sample deterioration; such problems should be solved at the stage of standardization of the measurement method. Reviewing the several cases of ρ_A and ρ_B as above, it is enough to consider only the case of ρ_A > ρ_B, 0.5 < ρ_B < 2, and 1 < ρ_A < 2 for usual practical precision experiments. For balanced experiments the required number of participating laboratories should then be 30 for the 1% level (probability of occurrence of negative estimates), 20 for the 5% level, and 10 for the 10% level, from Table 3. Similarly, for the staggered experiments the required number of participating laboratories should be 60 for the 1% level, 40 for the 5% level, and 20 for the 10% level, from Table 6.

3 Four-stage nested experiments

3.1 Statistical model and ANOVA

Four-stage nested experiments include four random components; the statistical model of a measurement result can be generally expressed as

y_ijkl = μ + α_i + β_ij + γ_ijk + ε_ijkl,    (12)

where the symbols are defined similarly as in Eq. (2); however, l = 1, ..., r_ijk; k = 1, ..., c_ij; j = 1, ..., b_i; i = 1, ..., m. The sums of squares SS_A, SS_B, SS_C, and SS_E are obtained similarly as in Eq. (3). The degrees of freedom ν_A, ν_B, ν_C, and ν_E are also obtained similarly as in Eq. (4).

3.2 Occurrence of negative estimates

The number of replications is constant in the balanced nested experiments. Hence, we denote b = b_i (for all i), c = c_ij (for all i and j), and r = r_ijk (for all i, j and k). The probability of the occurrence of negative estimates can be evaluated similarly to Section 2.2. Based on the structure of the staggered nested experiments, we have b_i = 2, c_i1 = 2, c_i2 = 1, r_i11 = 2, r_i12 = 1, and r_i21 = 1 (for all i). The probability of the occurrence of negative estimates can be evaluated similarly to Section 2.3. Ojima (2000) proposed the generalized staggered nested experiments and derived their canonical form. Using the canonical form, the probability of the occurrence of negative estimates can be evaluated in a manner similar to that discussed in Section 2.3.

3.3 Practical considerations for precision experiments

If a precision experiment is conducted after a complete standardization of the measurement method, a three-stage nested experiment may be suitable. Four or more stage nested experiments may be useful to detect some large components of variation in the measurement process. The result of the experiment should be used to refine the process in order to achieve high precision. For such cases, the occurrence of negative estimates is not a serious problem, because the most important aim of the experiment is to detect the largest variance component in the process.

4 Conclusion

The probability of the occurrence of negative estimates depends on the degrees of freedom of the mean squares and the true values of the variance components themselves. Based on this probability and the precision of the estimators, some practical recommendations concerning the number of laboratories can be derived for precision experiments. For balanced experiments, the required number of participating laboratories should be 30 for the 1% level (probability of occurrence of negative estimates), 20 for the 5% level, and 10 for the 10% level. Similarly, for the staggered experiments the required number of participating laboratories should be 60 for the 1% level, 40 for the 5% level, and 20 for the 10% level.

References

1. ISO 5725-3: Accuracy (trueness and precision) of measurement methods and results - Part 3: Intermediate measures of the precision of a standard measurement method, ISO, 1994.
2. Ojima, Y. (1998): General formulae for expectations, variances and covariances of the mean squares for staggered nested designs, Journal of Applied Statistics, Vol. 25, pp. 785-799.
3. Ojima, Y. (2000): Generalized staggered nested designs for variance components estimation, Journal of Applied Statistics, Vol. 27, pp. 541-553.
4. JSA (1972): Statistical Tables and Formulas with Computer Applications, Japanese Standards Association, 1972.

Statistical Methods Applied to a Semiconductor Manufacturing Process

Takeshi Koyama

Tokushima Bunri University, Faculty of Engineering, Sanuki City, 769-2101, Japan
[email protected]

1 Introduction

Quality assurance of semiconductor devices requires the effective utilization of statistical methods. In order to put new products on the market at the soonest possible time, manufacturers are typically forced to apply advanced yet immature technologies. Under this difficult situation, the application of statistical methods designed to prevent post-shipment defects has become vital. The key factors of a quality assurance system including reliability assurance are the following: (1) quality design of new products and initial verification; (2) quality improvement in the manufacturing process; (3) customer support and quality information collection; and (4) study and development of fundamental technologies for quality assurance. In this paper, the author first gives an overview of the statistical methods applied to the semiconductor manufacturing process in Section 2. It is essential to improve the quality (yield, in general) from the beginning of the production of a newly developed product to the end of its life, as shown in Fig. 1. As the goal is to increase profit, the priority in developing an optimized statistical method is the speed of the quality improvement. Therefore, the method that produces positive results in the shortest time is the most valuable one. Next, Section 3 demonstrates an application example of the L16(2^15) orthogonal array used to remove certain defects. Section 4 then presents some new proposals for the prevention of quality deterioration caused by materials that degrade with time.

Fig. 1: Trend of yield, plotted by month

2 Overview of statistical methods

Presently available statistical methods [1]-[4] are classified as shown in Fig. 2, based on the possible quantity of available data and the unit cost of making samples, testing, evaluation, analysis, etc. The top left area of the figure corresponds to the development and design phases. At this stage, the high cost may prevent us from obtaining a sufficient quantity of data. What must be prepared here are therefore specific circuits and structures that enable effective evaluation despite the limited quantity of samples. Moreover, the development of some kind of accelerated life test is essential for short evaluation. The bottom right area, on the other hand, corresponds to the volume production phase. Here manufacturers can obtain a large quantity of data at a cheaper evaluation cost. In the semiconductor industry, as shown in Fig. 1, the more rapidly quality improvement is achieved, the more profitable the manufacturer is. Time is thus the most important factor when we detect defects and clear up the problems.

Fig. 2: Overview of statistical methods, classified by the available quantity of data and the unit cost (regions range from the development and design phase through the trial production phase to the volume production phase)

3 An experiment with L16(2^15)

It is shown that application of the L16(2^15) orthogonal array [5] cleared up the problem caused by a defect, namely a crack of the passivation layer of a semiconductor device. The results obtained from this experiment led to an amazingly successful improvement of the yield.

3.1 Design of experiment

Table 1 shows the six factors taken into the experiment, which were thought to be closely related to the defect, namely the crack of the passivation layer. Levels 0 and 1 were determined as shown in Table 1. As it was very difficult to change the conditions of the diffusion process for the factors F and D, trials 1-4, 5-8, 9-12, and 13-16 were grouped into 4 groups (shown by the dotted lines in Table 2) and the groups were randomized (split-unit design [4]). The other factors, except F and D, were randomized within each group. As a result, the sequence of the experiments is as follows: trials 10 (A0B1C1D1F0I1), 12, 9, 11, 14, 16, 13, 15, 4, 3, 1, 2, 6, 8, 7, 5, where A0 means level 0 of factor A, namely 2.0 μm, and so on. Again, every 4 trials, for example trials 10, 12, 9, 11, were experimented under the same conditions regarding factors F and D. The assignment of the factors is shown in Table 2. Columns 1-7 are assigned to the primary group [4] and columns 8-15 to the secondary group [4]. The first-order error variation Se1 is given by columns 5-7 (denoted by e1), and the second-order error variation Se2 by columns 9, 11, and 13-15 (denoted by e2). Then,

Se1 = S_5 + S_6 + S_7,    (1)
Se2 = S_9 + S_11 + S_13 + S_14 + S_15,    (2)

because the variation of column i, S_i, is given by

S_i = (T_(i)1 - T_(i)0)² / 16,    (3)

where T_(i)1 and T_(i)0 are the sums of the obtained data corresponding to levels 1 and 0 [4]. The factors were assigned to the columns as shown in Table 2. The factors F, D, F×D and A were placed in the primary group, and the factors C, B, and I in the secondary group.
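As an illustration of Eqs. (1)-(3), the column variations can be computed directly from the 16 responses and the 0/1 levels of each column. The following Python helper is hypothetical (the array construction is one standard way of building an OA(16, 2^15); its column numbering need not coincide with the L16 table used by the author):

import numpy as np

def l16_array():
    # a 16 x 15 two-level orthogonal array: the level of column j (j = 1..15)
    # in run i is the parity of the bitwise AND of i and j
    return np.array([[bin(i & j).count("1") % 2 for j in range(1, 16)]
                     for i in range(16)])

def column_variation(y, level):
    # S_i = (T_(i)1 - T_(i)0)^2 / 16, with T_(i)1 and T_(i)0 the sums of the
    # responses at levels 1 and 0 of the column, cf. Eq. (3)
    y = np.asarray(y, dtype=float)
    return (y[level == 1].sum() - y[level == 0].sum()) ** 2 / 16.0

oa = l16_array()
y = np.arange(16.0)                                   # placeholder responses
S = [column_variation(y, oa[:, c]) for c in range(15)]
print(S)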

Table 1: Assignment of factors

Factor                      Level 0              Level 1
A  SiO2 layer               2.0 μm               0.8 μm
B  pre-treatment            used three times     new
C  separation by boron      yes                  no
D  supporting boat          used                 new
F  diffusion tube           silica glass tube    polysilicon tube
I  handling tweezers        stainless steel      Teflon

The interaction [4] F×D between the factors F and D was assigned to column 3, because the effect of factor F differed according to the level of factor D.

3.2 Analysis of the results

The rightmost column of Table 2 shows the results. The datum of trial 10 was missing because the sample wafer broke. Now, let x be the missing value; then the error sum of squares Se is given by

Se = Se1 + Se2,    (4)

with the values shown in the bottom two rows of Table 2, where Se1 and Se2 are given by Equations (1) and (2). The missing value is estimated as x = 77.6 from ∂Se/∂x = 0, so as to minimize Se. The value (Se1/3)/(Se2/5) = 0.26 is smaller than the critical F-value, F(3, 5; 0.05) = 5.41 [6], where 3, 5 and 0.05 are the degrees of freedom of Se1 and Se2 and the percentage point of the F-distribution, so Se1 is pooled into Se2 for the analysis of variance. The result of the analysis of variance is shown in Table 3. Factors D, D×F, F and A are significant, because the calculated F-ratios are larger than the critical F-values, F(1, 7; 0.05) = 5.59 or F(1, 7; 0.01) = 12.2 [6], where 1 and 7 are the degrees of freedom and 0.05 and 0.01 are the percentage points of the F-distribution. In this analysis, the total degrees of freedom as well as the error degrees of freedom are reduced by one, because one missing value is estimated.
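The pooling decision and the significance thresholds quoted above can be reproduced with any F-distribution routine; a small Python sketch using the values reported in the text (scipy assumed):

from scipy.stats import f

ratio = 0.26                                  # (Se1/3) / (Se2/5), as reported in the text
print(f.ppf(0.95, 3, 5))                      # critical value F(3, 5; 0.05), about 5.41
print(ratio < f.ppf(0.95, 3, 5))              # True -> pool Se1 into Se2
print(f.ppf(0.95, 1, 7), f.ppf(0.99, 1, 7))   # 5.59 and 12.2, the critical values used with Table 3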

Table 3: Analysis of variance

Factor   Sum of squares   Degrees of freedom   Mean square   F-ratio
D        1975.8           1                    1975.8        8.53 *
F        3962.7           1                    3962.7        17.12 **
D×F      2155.3           1                    2155.3        9.31 *
A        4395.7           1                    4395.7        18.99 **
C        924.2            1                    924.2         3.99
B        6.3              1                    6.3           0.03
I        322.2            1                    322.1         1.39
error    1620.5           7                    231.5
sum      15362.7          14

Note: *: larger than F(1, 7; 0.05) = 5.59; **: larger than F(1, 7; 0.01) = 12.2

The action suggested by this experiment, replacing the silica glass tubes with polysilicon tubes, was able to reduce the defect rate from about 17% to almost 0%.

4 Control of processing time for quality assurance

4.1 Quality influenced by throughput time

Such materials as photo-resist, etching solution, etc. degrade, in general, as time passes. In order to assure quality, the staying time in the processes should be limited. The quality of these materials changes exponentially, as shown in Equation (5):

q = q_0 exp(-a t),    (5)

where q_0 and a are the initial quality and the coefficient of degradation, respectively. Let q_L be the allowable limit of quality; then the allowable time should be less than

t_L = (1/a) ln(q_0/q_L).    (6)

The total of the waiting time and the treatment time should be controlled within t_L.

4.2 Control of batch products

Fig. 3 shows a generalized process standing for, for example, the photolithography process using photo-resist, the etching process using etching solution, etc. in the semiconductor manufacturing factory. When a product to be processed arrives at the process, it joins the waiting queue unless the process is empty. Although the arrival distribution and the working (treatment) distribution are, in general, expressed by the exponential distribution or the Erlang distribution, the gamma distribution g(t; k, a) is considered as a more general distribution in this paper. If k is an integer, it becomes the Erlang distribution; furthermore, if k = 1, it becomes the exponential distribution. For the arrival distribution, k = k_a and a = a_a; for the working distribution, k = k_w and a = a_w.

Fig. 3: A generalized process: arrivals, with interarrival distribution g(t; k_a, a_a), join the waiting queue (buffer) in front of the process (work station) and depart after treatment

The unit time may be determined as 1/a_a. Although the staying time in processes has generally been analyzed on the basis of queuing theory [7][8], the condition that ρ = a_w/a_a > 1 is not taken into consideration there, because the number of products in a queue diverges under this condition. However, actual queues in the manufacturing process suggest that the condition ρ > 1 does hold during some periods. This necessitates a transient analysis. Transient characteristics are studied by computer simulation as shown in Fig. 4. In this simulation, n, m, t̄_f(i), and n̄_w(k) are the number of trials, the number of batch products produced continuously, the mean finish time of the i-th product, and the k-th mean number of waiting products, respectively. Figures 5 (a) and (b) show the computed mean numbers of waiting products and Fig. 6 shows the computed mean finish time under the conditions: k_a = 5, a_a = 2, k_w = 2; ρ = 1.2, 1.4, 1.6 and ρ = 0.2, 0.4, 0.6; m = 90 and n = 50. The numbers of waiting products increase almost linearly under the condition ρ ≥ 1.2. On the contrary, the numbers remain at almost the same level under the condition ρ ≤ 0.6. Fig. 6 shows the mean finish time up to a determined number of products. From Fig. 6, we can obtain

t_f = β x,    (8)

where t_f, β, and x are, respectively, the finish time, the coefficient β = t̄_f(m)/m, and the number of batch products produced continuously. Table 4 shows the obtained β.

Table 4: Obtained β

ρ                           0.2     0.4     0.6     0.8     1.0     1.2     1.4     1.6
Mean finish time t̄_f(m)    181.6   182     182.7   184.7   195.0   222.3   256.6   292.1

Conditions: k_a = 5, a_a = 2, k_w = 2, m = 90 and n = 50.

Fig. 4: Flow chart for computer simulation. For each of the n trials, m batch products are processed: uniform random variables are generated and converted into arrival and working times with the inverse gamma function provided in Excel/VBA, t_w(i) = g^(-1)(p_w(i), k_w, a_w); the start, finish and waiting times (t_s(i), t_f(i), t_wt(i)) and the numbers of waiting products (n_w(k), k = 1, ..., 2m) are accumulated over the trials to give the mean finish time t̄_f(i) = s_f(i, n)/n and the mean number of waiting products n̄_w(k).
100

50

150

Time

(4

Time Fig. 5: Mean number of waiting products: n=50

-

10

20

-

30

-

---

-

40

50

60

.4

70

80

90

Number of batch products Fig. 6: Mean finish time: n=50

From Equations (6) and (8), we obtain

Thus, we should control the number of batch products, that is manufactured in series, not to excess XL given by the Equation (9) in order to assure quality, considering the deterioration of material as well as the number of the waiting products.

5 Summary Statistical methods applied to semiconductor manufacturing process are normally viewed. An example with tI6(2l5) orthogonal array is presented including split-unit design and estimation of one missing value. The result of this experiment was extraordinarily effective to improve the yield beyond expectation. Dynamic and statistical characteristics about consuming time in the manufacturing process are studied by computer simulation, in particular, under the condition that the working time is larger than arrival time interval with gamma random variables. The study will be useful to assure quality in manufacturing process for preventing quality deterioration in accordance with time.

References [ l ] Wadsworth, H., K. S. Stephens and A. B. Godfrey: Modern Methods for Quality Control and Improvement, John Wiley & Sons, Inc., 1986. [2] Montgomery, D. C.: Introduction to Statistical Quality Control, John Wiley & Sons, Inc., 2004. [3] Beck, J. V. and I