PREFACE
In my first post as a graduate bacteriologist, my manager impressed upon me the need for proper use of statistical analysis in any work in applied microbiology. So insistent was he that I attended a part-time course in statistics for use in medical research at University College, London – unfortunately the course was presented by a mathematician who failed to recognize that detailed mathematical concepts often cause non-mathematicians to 'switch off'! In those days, before the ready availability of personal computers and electronic calculators, statistical calculations were done manually, sometimes aided by mechanical calculators. Even simple calculations that nowadays take only a few minutes would often take days and nights to complete. Nonetheless, my interest in the use of statistics has stayed with me throughout my subsequent career.

In the early 1970s, I was fortunate to work with the late Dr Eric Steiner, a chemist with considerable knowledge of, and experience in, statistics who opened my eyes to a wider appreciation of statistical methods. I was also privileged for a time to have guidance from Ms Stella Cunliffe, who was the first woman President of the Royal Statistical Society, from 1975 to 1979. Over several years I lectured on statistical aspects of applied microbiology at various courses, including the WHO-sponsored course in Food Microbiology that I set up at the University of Surrey. During that time I became aware of a lack of understanding of basic statistical concepts by many microbiologists and the total lack of any suitable publication to provide assistance. In the early 1990s I prepared a report on statistical aspects of microbiology for the then UK Ministry of Agriculture, Fisheries and Food (MAFF). It is from that background that the first edition of this book arose.

This book is intended as an aid to practising microbiologists and others in the food, beverage and associated industries, although it is relevant also to other aspects of applied microbiology. It is also addressed to students reading food microbiology at undergraduate and postgraduate levels. With greater emphasis now being placed on quantitative microbiology, including the use of legislative microbiological criteria for foods, and with the introduction of concepts such as Hazard Analysis Critical Control Point (HACCP), the need to understand relevant statistical matters assumes an even greater importance in food microbiology.
This book is written not by a professional statistician but by a microbiologist, in the sincere hope that it will help future applied microbiologists. In preparing, editing and reviewing the work, I have received helpful comment and advice from colleagues in many countries. I would particularly acknowledge numerous helpful discussions on various topics with members of ISO TC34/SC9/WG2, especially Dr Hilko van der Voet (Wageningen, NL) and Dr Bertrand Lombard (AFSSA, Paris, France), and with members of the AOAC Presidential Taskforce on 'Best Practices in Microbiological Methodology', especially Harry Marks (USDA FSIS, Washington). I am especially indebted to Dr Alan Hedges (University of Bristol Medical School) and Dr Janet E L Corry (University of Bristol Veterinary School, Langford) for commenting on the manuscript and to Dr Sharon Brunelle of AOAC for writing the chapter on Method Validation. But all errors of omission or commission are mine alone.

I would acknowledge the help received from the editors at Academic Press. I would also wish to thank the many authors and publishers who have kindly granted me the rights to republish tables and figures previously published elsewhere. Details of these permissions are quoted in the text. Finally, I would thank my wife, my family and my friends for their patience and understanding during the time that I have been preparing this revised edition.

Basil Jarvis
Upton Bishop, Herefordshire, UK
1 INTRODUCTION
One morning a professor sat alone in a bar at a conference. When his colleagues joined him at the lunch break they asked why he had not attended the lecture sessions. He replied by saying, 'If I attend one session I miss nine others; and if I stay in the bar I miss all ten sessions. The probability is that there will be no statistically significant difference in the benefit that I obtain!' Possibly a trite example, but statistics are relevant in most areas of life.

The word 'statistics' means different things to different people. According to Mark Twain, Benjamin Disraeli was the originator of the statement 'There are lies, damned lies and statistics!' from which one is supposed to conclude that the objective of much statistical work is to put a positive 'spin' onto 'bad news'. Whilst there may be some political truth in the statement, it is generally not true in science provided that correct statistical procedures are used. Herein lies the rub! To many people, the term 'statistics' implies the manipulation of data to draw conclusions that may not be immediately obvious. To others, especially many biologists, the need to use statistics implies a need to apply numeric concepts that they hoped they had left behind at school. But to a few, use of statistics offers a real opportunity to extend their understanding of bioscience data in order to increase the information available.

Microbiological testing is used in industrial process verification and sometimes to provide an index of quality for 'payment by quality' schemes. Examination of food, water, process plant swabs, etc. for microorganisms is used frequently in the retrospective verification of the microbiological 'safety' of foods and food process operations. Such examinations include assessments for levels and types of microorganisms, including tests for the presence of specific bacteria of public health significance, including pathogens, index and indicator organisms. During recent years, increased attention has focused, both nationally and internationally, on the establishment of numerical microbiological criteria for foods. All too often such criteria have been devised on the misguided belief that testing of foods for compliance with numerical, or other, microbiological criteria will enhance consumer protection by improving food quality and safety. I say 'misguided' because no amount of testing of finished products will improve the quality or safety of a product once it has been manufactured. There are various forms of microbiological criteria that are set for different purposes; it is not the
purpose of this book to review the advantages and disadvantages of microbiological criteria – although statistical matters relevant to criteria will be discussed (Chapter 14). Rather, the objective is to provide an introduction to statistical matters that are important in assessing and understanding the quality of microbiological data generated in practical situations. Examples, chosen from appropriate areas of food microbiology, are used to illustrate factors that affect the overall variability of microbiological data and to offer guidance on the selection of statistical procedures for specific purposes. In the area of microbiological methodology it is essential to recognize the diverse factors that affect the results obtained by both traditional methods and modern developments in rapid and automated methods. The book considers: the distribution of microbes in foods and other matrices; statistical aspects of sampling; factors that affect results obtained by both quantitative (e.g. colony count and most probable number (MPN) methods) and quantal methods; the meaning of, and ways to estimate, microbiological uncertainty; the validation of microbiological methods; and the implications of statistical variation in relation to microbiological criteria for foods. Consideration is given also to quality monitoring of microbiological practices and the use of Statistical Process Control for trend analysis of data both in the laboratory and in manufacturing industry. The book is intended as an aid for practising food microbiologists. It assumes a minimal knowledge of statistics and references to standard works on statistics are cited whenever appropriate.
2 SOME BASIC STATISTICAL CONCEPTS
POPULATIONS

The true population of a particular ecosystem can be determined only by carrying out a census of all living organisms within that ecosystem. This applies equally whether one is concerned with numbers of people in a town, state or country or with numbers of microbes in a batch of a food commodity or product. Whilst, in the former case, it is possible at least theoretically to determine the human population in a non-destructive manner, the same does not apply to estimates of microbial populations.

When a survey is carried out on people living, for instance, in a single town or village, it would not be unexpected that the number of residents differs between different houses; nor that there are differences in ethnicity, age, sex, health and well-being, personal likes and dislikes, etc. Similarly, there will be both quantitative and qualitative differences in population statistics between different towns and villages, different parts of a country and different countries. A similar situation pertains when one looks at the microbial populations of a food. The microbial association of foodstuffs differs according to diverse intrinsic and extrinsic factors, especially the acidity and water activity, and the extent of any processing effects. Thus the primary microbial population of acid foods will generally consist of yeasts and moulds, whereas the primary population of raw meat and other protein-rich foodstuffs will consist largely of Gram-negative non-fermentative bacteria, with smaller populations of other organisms (Mossel, 1982).

In enumerating microbes, it is essential first to define the population to be counted. For instance, does one need to assess the total population, that is living and dead organisms, or only the viable population; if the latter, is one concerned only with specific groups of organisms, for example aerobes, anaerobes, psychrotrophs and psychrophiles, mesophiles or thermophiles? Even when such questions have been answered, it would still be impossible to determine the true ecological population of a particular 'lot' of food, since to do so would require testing of all the food. Such a task would be both technically and economically impossible.
LOTS AND SAMPLES

An individual 'lot' or 'batch' consists of a bulk quantity of food that has been processed under essentially identical conditions on a single occasion. The food may be stored and distributed in bulk or as pre-packaged units each containing one or more individual units of product (e.g. a single meat pie or a pack of frozen peas). Assuming that the processing has been carried out under uniform conditions, then, theoretically, the microbial population of each unit should be typical of the population of the whole lot. In practice, this will not always be the case. For instance, high levels of microbial contamination may be associated only with specific parts of a lot due to some processing defect. In addition, estimates of microbial populations will be affected by the choice of test regime that is used. It is not feasible to determine the levels and types of aerobic and anaerobic organisms, or of acidophilic and non-acidophilic organisms, or other distinct classes of microorganism using a single test. Thus when a microbiological examination is carried out, the types of microorganisms that are detected will be defined in part by the test protocol. All such constraints therefore provide a biased estimate of the microbial population of the 'lot'. Hence, sampling of either bulk or pre-packaged units of product merely provides a sample of the types and numbers of microorganisms that make up the population of the 'lot', and those population samples will themselves be further sampled by our choice of examination protocol.

In order to ensure that a series of samples drawn from a 'lot' properly reflects the diversity of types and numbers of organisms associated with the product, it is essential that the primary samples should be drawn in a random manner, either from a bulk or as individual packaged units of the foodstuff. Analytical chemists frequently draw large primary samples that are blended and resampled before taking one or more analytical samples – the purpose is to minimize the between-sample variation in order to determine an 'average' analytical estimate for a particular analyte. It is not uncommon for several kilograms of material to be taken as a number of discrete samples that are then combined. Indeed, for some purposes, such multiple sampling procedures are commonplace. The sampling of foods for microbiological examination cannot generally be done in this way because of the risks of cross contamination during the mixing of primary samples. A 'population sample' (i.e. a unit of product) may itself be subdivided for analytical purposes and it is necessary, therefore, to consider the implications of determining microbial populations in terms of the number, size and nature of the samples taken.

In a few instances it is possible for the analytical sample to be truly representative of the 'lot' sampled. Liquids, such as milk, can be sufficiently well mixed that the number of organisms in the analytical sample is representative of the milk in a bulk storage tank. However, because of problems of mixing, samples withdrawn from a grain silo, or even from individual sacks of grain, may not necessarily be truly representative. In such circumstances, deliberate stratification (qv) may be the only practical way of taking samples. Similar situations obtain when one considers complex raw materials (e.g. animal carcases) or composite food products (e.g. ready-to-cook frozen meals containing slices of cooked meat, Yorkshire pudding, peas, potato and gravy).
It is necessary to consider also the actual sampling protocol to be used: for instance, in sampling from a meat or poultry carcase, is the sample to be taken by swabbing, rinsing or excision of
skin? Where on the carcase should the sample be taken? For instance, one area may be more likely to carry high numbers and types of organism than other areas. Hence, standardisation of sampling protocols is essential. In situations where a composite food consists of discrete components, a sampling protocol needs to be used that reflects the purpose of the test – is a composite analytical sample required (i.e. one made up from the various ingredients in appropriate proportions) or should each ingredient be tested separately? These matters are considered in more detail in Chapter 5.
AVERAGE SAMPLE POPULATIONS

If a single sample is analysed, the result provides a method-dependent single point estimate of the population numbers in that sample. Replicate tests on a single sample provide an improved estimate of population numbers, based on the average of the results, together with a measure of variability of the estimate for that sample. Similarly, if replicate samples are tested, the average result provides a better estimate of the number of organisms in the population based on the inter-sample average and an estimate of the variability between samples. Thus, we can have greater confidence that the 'average sample population' will reflect more closely the population in the 'lot'. The standard error of the mean (SEM) provides an estimate of the extent to which that mean (average) value is reliable.

If a sufficient number of replicate samples is tested then we can derive a frequency distribution for the counts, such as that shown in Fig. 2.1 (data from Blood, 1974). Note that the distribution curve has a long left-hand tail and that the curve is not symmetrical, probably because the data were compiled from results obtained in two different production plants. The statistical aspects of frequency distributions are discussed in Chapter 3. Adding the individual values and dividing by the number of replicate tests provides a simple arithmetic mean of the values, x̄ = (x₁ + x₂ + x₃ + … + xₙ)/n = Σxᵢ/n, where xᵢ is the value of the ith test and n is the number of tests done.
FIGURE 2.1 Frequency distribution of colony count data determined at 30°C on beef sausages manufactured in two factories (modified from Blood, 1974) (reproduced by permission of Leatherhead Food International). [Histogram: % frequency against colony count (log cfu/g).]
However, it is possible to derive other forms of average value. For instance, multiplying the individual counts on n samples and then taking the nth root of the product provides the geometric mean value: ⁿ√(x₁ × x₂ × x₃ × … × xₙ). It is simpler to determine the approximate geometric mean by taking logarithms of the original values (y = log₁₀ x), adding the log-transformed values and dividing the sum by n to obtain the mean log value (ȳ), which equals the mean of log x. This value is then back-transformed by taking the antilog to obtain an estimate of the geometric mean value:

ȳ = (Σyᵢ)/n = (Σlog xᵢ)/n
The geometric mean is appropriate for data that conform to a log-normal distribution and for titres obtained from n-fold dilution series. It is important to understand the difference between the geometric and the arithmetic mean values since both are used in handling microbiological data. In terms of microbial colony counts, the log mean count is the log₁₀ of the simple arithmetic mean; by contrast, the mean log-count is the arithmetic average of the log₁₀-transformed counts that, on back-transformation, gives the geometric mean count. The methods are illustrated in Example 2.1.

STATISTICS AND PARAMETERS

A population is described by its parameters: the mean (μ) and the variance (σ²). But we cannot know the values of these parameters except for a finite population (e.g. a set of pipettes). However, we can obtain estimates of these parameters from the statistics that describe the sample population in terms of its analytical mean value (x̄) and its variance (s²). We can also provide a measure of the likelihood that the same mean result would be attained if analyses were repeated on a further set of samples from the same 'lot'. Such estimated values are statistics that can be used as estimates of the true population parameters.

VARIANCE AND ERROR

Results from replicate analyses of a single sample, and analyses of replicate samples, will always show some variation that reflects the distribution of microbes in the samples tested, inadequacies of the sampling technique and technical inaccuracies of the method and the analyst. The variation can be expressed in several ways. The statistical range is the simplest way to describe the dispersion of values by deriving the difference between the lowest and the highest estimates; for example, in Example 2.1, the colony count range is 610 (i.e. 1970 − 1360). The statistical range is often used in Statistical Process Control (Chapter 12) but, since it depends solely on the values for the extreme
counts, its usefulness is severely limited since it takes no account of the distribution of values between the two extremes.

The population variance is derived from the mean of the squares of the deviations, viz. σ² = Σ(x − μ)²/n, where x is an individual result, μ the population mean value, n the number in the population and Σ indicates 'sum of'. Each individual result (x) differs from the population mean by a value (x − μ), which is referred to statistically as the deviation. But as the value of μ is unknown, the sample mean (x̄) is used as an estimate of the population mean. The 'sample variance' (s²) provides an estimate of the population variance (σ²) and is determined as a weighted mean of the squares of the deviations, weighting being introduced through the application of the concept of degrees of freedom, which assumes that of n observations, only (n − 1) are available since one observation has been used already in determining the mean value. The unbiased estimate (s²) of the population variance (σ²) is thus derived from:

s² = [n Σxᵢ² − (Σxᵢ)²] / [n(n − 1)]

The alternative form of this equation, s² = Σ(x − x̄)²/(n − 1), should not normally be used in practical calculation of the sample variance since it is based on the square of the deviations from the mean value. Such deviations are usually only an approximation for the absolute infinite decimal value; and since the squared deviations from the mean value are summed, any discrepancies are additive and the derived variance may be inaccurate.

The standard deviation (s) of the sample mean is the square root of the variance (s = √s²). The coefficient of variation (CV), often referred to as the relative standard deviation (RSD), is the standard deviation expressed as a percentage of the mean: %CV = %RSD = (s/x̄) × 100. The term 'standard error' is often used conventionally to mean the 'standard deviation' (described above) and is a statistical measure of the deviation that estimates would be expected to show in testing repeat samples from the same population. In other words, it shows how much variation might be expected to occur merely by chance in the characteristics of samples drawn equally randomly from a single population. However, the SEM is a measure of the deviation in the mean value which would be expected if repeated analyses were undertaken on the same 'lot' of product. The SEM is estimated from the square root of the variance divided by the number of observations used, that is, SEM = √(s²/n) = s/√n.
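These dispersion measures are straightforward to compute directly. The short Python sketch below (an illustration added here, not part of the original text) applies the 'machine' formula and the deviation formula for s², then derives s, the %CV and the SEM, using the five replicate colony counts that appear later in Example 2.1; the variable names are arbitrary.

import math

counts = [1540, 1360, 1620, 1970, 1420]   # replicate colony counts (cfu/g), as in Example 2.1
n = len(counts)
mean = sum(counts) / n

# 'Machine' formula: s^2 = [n*sum(x^2) - (sum(x))^2] / [n(n-1)]
s2_machine = (n * sum(x * x for x in counts) - sum(counts) ** 2) / (n * (n - 1))

# Deviation formula: s^2 = sum((x - mean)^2) / (n-1)
s2_deviation = sum((x - mean) ** 2 for x in counts) / (n - 1)

s = math.sqrt(s2_machine)          # standard deviation
cv = 100 * s / mean                # %CV (relative standard deviation)
sem = s / math.sqrt(n)             # standard error of the mean

print(mean, s2_machine, s2_deviation, round(s, 1), round(cv, 1), round(sem, 1))
# With these data both variance formulas give 57,320; s ~ 239.4, %CV ~ 15.1, SEM ~ 107.1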
THE CENTRAL LIMIT THEOREM

We should pause at this point to consider an important statistical theorem, which underlies many statistical procedures. The central limit theorem is a statement about the sampling distribution of the mean values from a defined population. It describes the characteristics of
the distribution of mean values that would be obtained from tests on an infinite number of independent random samples drawn from that population. The theorem states, ‘for a distribution with a population mean and a variance 2, the distribution of the average tends to be Normal, even when the distribution from which the average is computed is non-Normal. The limiting normal distribution has the same mean as the parent distribution and its variance is equal to the variance of the parent divided by the sample size (2/N)’. Individual results from a finite number of independent, randomly drawn samples from the same population are distributed around the average (mean) value so that the sum of the values greater than the average will equal the sum of the values lower than the average value. If sufficient independent random samples are tested then we can derive a statistical distribution that describes the occurrence of the population (Chapter 3). Now, no matter what form the actual distribution takes, the distribution of the average (mean) result in repeated tests always approaches a Normal distribution when sufficient trials are undertaken. In this situation, the number of trials relates not to the number of samples per se but to the number of replicate trials.
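The tendency described by the theorem is easy to see by simulation. The following Python sketch (an added illustration, not from the original text; the skewed 'parent' distribution and the sample size are arbitrary choices) draws repeated samples from a markedly non-Normal distribution and shows that the means of those samples cluster symmetrically around the parent mean with a spread close to σ/√n.

import random
import statistics

random.seed(1)
n = 30            # size of each sample (arbitrary)
trials = 5000     # number of repeated samples (arbitrary)

parent_mean = 10.0   # right-skewed parent population: exponential with mean 10

sample_means = []
for _ in range(trials):
    sample = [random.expovariate(1 / parent_mean) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

print(round(statistics.mean(sample_means), 2))    # close to the parent mean (10)
print(round(statistics.stdev(sample_means), 2))   # close to sigma/sqrt(n) = 10/sqrt(30) ~ 1.83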
EXAMPLE 2.1 DERIVATION OF SOME BASIC STATISTICS THAT DESCRIBE A DATA SET

Assume that we wish to determine the statistics that describe a series of replicate colony counts on n samples, represented by x₁, x₂, x₃, …, xₙ, for which the actual values are 1540, 1360, 1620, 1970, 1420, as colony forming units (cfu)/g.

The range of colony counts provides a measure of the extent of overall deviation between the largest and the smallest data values and is determined by subtracting the lowest count from the highest count; for the example data the range is 1970 − 1360 = 610.

The median colony count is the middle value (in an odd-numbered set of values) or the average of the two middle values in an even-numbered set of values; for this sequence of counts the median value of 1360, 1420, 1540, 1620, 1970 is 1540.

The arithmetic average (mean) colony count is the sum of the individual values divided by the number of values, that is

x̄ = (x₁ + x₂ + x₃ + … + xₙ)/n = Σxᵢ/n,

where x̄ = mean value and Σ means 'sum of'; for our data the mean count = Σx/n = (1540 + 1360 + 1620 + 1970 + 1420)/5 = 1582.

The geometric mean colony count is the nth root of the product obtained by multiplying together each value of x. Hence, the geometric mean count = ⁿ√(x₁ × x₂ × x₃ × … × xₙ). Alternatively, we can transform the x values by deriving their logarithms so that y = log₁₀ x; the geometric mean is then the antilog of the sum of the y values divided by n, i.e. antilog[(Σlog xᵢ)/n] = antilog[(Σyᵢ)/n].
For our data the geometric mean colony count = antilog(Σlog₁₀ x/n) = antilog[(log 1540 + log 1360 + log 1620 + log 1970 + log 1420)/5] = antilog[(3.1875 + 3.1335 + 3.2095 + 3.2945 + 3.1523)/5] = antilog(15.9773/5) = antilog(3.19456) = 1568.
The sample variance (s²) is the sum of the squares of the differences between the values for x and the mean value (x̄), divided by the degrees of freedom of the data set (i.e. n − 1). (One value of n was used in determining the mean value, hence there are only n − 1 degrees of freedom (df).) Thus

s² = [n Σx² − (Σx)²] / [n(n − 1)] = [Σx² − (Σx)²/n] / (n − 1)

Hence for our data,

s² = {[1540² + 1360² + 1620² + 1970² + 1420²] − [(1540 + 1360 + 1620 + 1970 + 1420)²/5]} / (5 − 1)
   = (12,742,900 − 12,513,620)/4 = 229,280/4 = 57,320
An alternative form of the equation is:

s² = Σ(xᵢ − x̄)² / (n − 1)

Hence, with mean (x̄) = 1582, the variance is given by:

s² = [(1540 − 1582)² + (1360 − 1582)² + (1620 − 1582)² + (1970 − 1582)² + (1420 − 1582)²] / (5 − 1)
   = [(−42)² + (−222)² + 38² + 388² + (−162)²] / 4
   = (1764 + 49,284 + 1444 + 150,544 + 26,244)/4 = 229,280/4 = 57,320

Note that in this example, where the mean value was finite, both methods gave the same result for the variance. However, where the mean value is not finite, rounding errors can cause serious inaccuracies in the variance calculation.
The standard deviation (s) around the mean is the square root of the variance and is given by s = √57,320 = 239.4. Thence the relative standard deviation (RSD), which is the ratio between the standard deviation and the mean value, is given by 100 × 239.4/1582 = 15.1%.

The variance of the log₁₀-transformed values is derived similarly using the transformed values, that is, y = log₁₀ x; then:

s² = [5 × (3.1875² + 3.1335² + 3.2095² + 3.2945² + 3.1523²) − (3.1875 + 3.1335 + 3.2095 + 3.2945 + 3.1523)²] / (5 × 4)
   = [(5 × 51.07077) − 255.274115]/20 = (255.35385 − 255.274115)/20 = 0.0039869 ≈ 0.0040

Using the alternative method with a mean log-count of 3.1946, the variance of y is:

s² = [(3.1875 − 3.1946)² + (3.1335 − 3.1946)² + (3.2095 − 3.1946)² + (3.2945 − 3.1946)² + (3.1523 − 3.1946)²]/4
   = 0.01577493/4 = 0.0039438 ≈ 0.0039

Note the small difference in the variance estimates determined by the two alternative methods. The SD of the mean log-count is √0.00399 = 0.0631665 ≈ 0.0632 and the RSD of the mean log-count is (0.0632 × 100)/3.19456 = 1.97%.

The reverse transformation of the mean log-count is done by taking the antilog of ȳ: x̄ = 10^ȳ = 10^3.1946 = 1565. But this is not an accurate estimate of the geometric mean. The relationship between the log mean count (log x̄) and the mean log-count (ȳ) is given by the formula:

log x̄ = ȳ + ln(10) × s²/10 = ȳ + 2.3025 × s²/10

where s² = variance of the log-count. Hence, for these data, where ȳ = 3.1946 and s² = 0.0040, the log mean colony count is given by log x̄ = 3.1946 + 2.3025 × 0.0040/10 = 3.1955. Hence x̄ = 10^3.1955 = 1568.6 ≈ 1569. Note that standard deviations of the mean log-count should not be directly back-transformed since the value obtained (10^0.0635 = 1.1574) would be misleading. Rather,
the approximate upper and lower 95% confidence intervals around the geometric mean would be determined as 10^(3.1946 + 2 × 0.0635) and 10^(3.1946 − 2 × 0.0635), that is, 10^3.3216 = 2097 and 10^3.0676 = 1168. Hence for these data the geometric mean is 1569 and the 95% upper and lower confidence limits are 2097 and 1168, respectively. A comparison with the arithmetic mean and its 95% confidence limits is shown below:

Method       Mean   Median   95% CL (lower)   95% CL (upper)
Arithmetic   1582   1540     1104             2060
Geometric    1569   –        1168             2097
For these data the difference between the arithmetic and geometric mean values is small since the individual counts are reasonably evenly distributed about the mean value and are not heavily skewed. Note that the median value is smaller than both mean values because of the small population of results that were examined. The standard deviation of the arithmetic mean value reflects the level of dispersion of values around the mean value. Note also that the upper and lower 95% confidence limits are distributed evenly about the arithmetic mean value (1582 ± 478) but are distributed unevenly around the geometric mean value (1565 − 397 and 1565 + 532).
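As a cross-check on the worked example, the short Python sketch below (added for illustration; it is not part of the original text) reproduces the arithmetic mean, the geometric mean via the mean log-count, and the approximate 95% confidence limits obtained by back-transforming ȳ ± 2 SD of the log-counts. Small differences from the printed values (1168 and 2097) reflect rounding in the hand calculation.

import math

counts = [1540, 1360, 1620, 1970, 1420]
n = len(counts)

arith_mean = sum(counts) / n                                  # 1582
logs = [math.log10(x) for x in counts]
mean_log = sum(logs) / n                                      # ~3.195 (mean log-count)
s2_log = sum((y - mean_log) ** 2 for y in logs) / (n - 1)     # variance of the log-counts
s_log = math.sqrt(s2_log)                                     # ~0.063

geom_mean = 10 ** mean_log                                    # ~1569 (geometric mean)
lower = 10 ** (mean_log - 2 * s_log)                          # ~1175
upper = 10 ** (mean_log + 2 * s_log)                          # ~2094

print(round(arith_mean), round(geom_mean), round(lower), round(upper))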
References

Blood, R. M. (1974) The Clearing House Scheme. Tech. Circular No. 558. Leatherhead Food Research Association.

Mossel, D. A. A. (1982) Microbiology of Foods: The Ecological Essentials of Assurance and Assessment of Safety and Quality, 3rd edition. University of Utrecht, NL.
Further Reading

Glantz, S. A. (1981) Primer of Biostatistics, 4th edition. McGraw-Hill, New York, USA.

Hawkins, D. M. (2005) Biomeasurement – Understanding, Analysing and Communicating Data in the Biosciences. Oxford University Press, Oxford, UK.

Hoffman, H. S. (2003) Statistics Explained: Internet Glossary of Statistical Terms. http://www.animatedsoftware.com/statglos/statglos.htm
3 FREQUENCY DISTRIBUTIONS
Replicate analyses on a single sample, or analyses of replicate samples, will always show a variation among results. A variable quantity can be either continuous, that is it can assume any value within a given range, or discrete, that is it assumes only whole number (integer) values. Continuous variables are normally measurements (e.g. height and weight, pH values, chemical composition data, time to obtain a particular change during incubation), although an effective discontinuity is introduced by the limitations of measurement (e.g. length to the nearest 0.1 mm, acidity to the nearest 0.01 pH unit, etc.). Discrete variables are typified by whole number counts, for example the number of bacteria in a sample unit, the number of insects in a sack of corn, etc.

When a large number of measurements or counts has been done, the observations can be organized into frequency classes to derive a frequency distribution. Since counts are discontinuous, each class in the frequency distribution will be an integer or a range of integers and the number of observations falling into that class will be the class frequency. When more than one integer is combined, the classes must not overlap and, although not essential, it is usual to take equal class intervals in simple frequency analyses. Although the form of a frequency distribution can be seen from a tabulation of numerical data, it is more readily recognized in a histogram (i.e. a bar chart), where the areas of the rectangles are proportional to the frequency. If the class intervals are equal, then the height of each column in a bar chart is proportional to the frequency. An example of the derivation of frequency distributions is given in Example 3.1. The histograms shown in Figs. 3.1 and 3.2 illustrate the effect of changing the relative positions of the class boundaries on the apparent shapes of the frequency distributions.
FIGURE 3.1 Frequency distribution histograms for data (from the section Variability in Delivery from a 1 ml (1 cm³) Pipette in Example 3.1) for the delivery of distilled water from a 1-cm³ pipette; the mean value is 0.984 g and the standard deviation is 0.028 g. [Histograms: frequency (f) against weight of water (g) delivered by pipette; panels (a)–(c) use the class arrangements of Table 3.1 (0.005 g intervals in (a) and 0.010 g intervals with differently placed class boundaries in (b) and (c)).]
FIGURE 3.2 Frequency distribution of microscopic counts of bacteria having a mean count of 12.77 bacteria/field and a variance of 24.02 (from the section Variability in Numbers of Bacterial Cells Counted Microscopically in Example 3.1). [Histograms: frequency (f) against number of bacteria/field; panel (a) uses a class interval of 1, panel (b) a class interval of 3.]
For a frequency distribution the arithmetic mean value (x̄) can be derived using the formula x̄ = ΣfX/n, where X is the mid-point value of the frequency class and f is the frequency of values within that class. Hence, for the distribution shown in Fig. 3.1(c),

x̄ = Σ[(0.950 × 3) + (0.960 × 10) + (0.970 × 9) + … + (1.090 × 1)]/50 = 49.21/50 = 0.9842

The unbiased estimate of the population variance (s²) is given by:

s² = [Σ(fX²) − x̄ΣfX] / (n − 1)

For x̄ = 0.9842, ΣfX = 49.21 and Σ(fX²) = 3(0.95)² + 10(0.96)² + 9(0.97)² + … + (1.09)² = 48.472. Hence, the estimate of variance is given by s² = [48.472 − 0.9842(49.21)]/49 = 0.0008065 and the standard deviation is given by s = √0.0008065 = 0.0284. Thus, for the data cited, the mean weight of water delivered from a 1-cm³ pipette was 0.984 g, and the standard deviation was 0.0284 g. (Using the method illustrated in Example 2.1, the calculated mean value was 0.983 g and the standard deviation was 0.0287 g.)
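The grouped-data formulas above are easy to apply directly. The following Python sketch (an added illustration, not part of the original text) uses the class mid-points and frequencies of the 0.010 g column of Table 3.1 (Fig. 3.1(c)) and reproduces the grouped mean and standard deviation quoted above to within rounding.

import math

# (class mid-point X, frequency f) for the 0.010 g intervals of Table 3.1 / Fig. 3.1(c)
classes = [(0.950, 3), (0.960, 10), (0.970, 9), (0.980, 9), (0.990, 7),
           (1.000, 4), (1.010, 3), (1.020, 1), (1.030, 1), (1.050, 1),
           (1.060, 1), (1.090, 1)]

n = sum(f for _, f in classes)                  # 50 observations
sum_fx = sum(f * x for x, f in classes)         # sum of f*X
sum_fx2 = sum(f * x * x for x, f in classes)    # sum of f*X^2

mean = sum_fx / n                               # 0.9842 g
s2 = (sum_fx2 - mean * sum_fx) / (n - 1)        # ~0.0008
s = math.sqrt(s2)                               # ~0.028 g

print(round(mean, 4), round(s2, 7), round(s, 4))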
EXAMPLE 3.1 DERIVATION OF THE MEAN, VARIANCE AND FREQUENCY DISTRIBUTION

Variability in delivery from a 1 ml (1 cm³) pipette

A pipette was used to transfer 1 cm³ of distilled water into a tared weighing boat. The weight of water delivered was determined on an analytical balance. The experiment was performed 50 times and gave the following results (g water): 0.948, 1.012, 1.085, 1.063, 1.010, 1.000, 0.994, 0.986, 0.995, 0.999, 0.969, 0.965, 0.945, 0.977, 0.957, 0.946, 0.960, 0.955, 1.010, 0.965, 0.975, 0.972, 0.957, 0.961, 0.975, 0.988, 0.989, 0.974, 0.980, 0.980, 1.001, 0.977, 1.021, 1.051, 0.965, 0.963, 0.971, 0.983, 0.962, 0.984, 0.978, 0.968, 0.960, 1.027, 0.959, 0.985, 0.985, 0.967, 0.960, 0.992. The data are organized in frequency classes as shown below (Table 3.1 and Fig. 3.1).
TABLE 3.1 Arrangement of Data in Frequency Classes

Class interval 0.005 g (Fig. 3.1(a)) – frequency class boundaries (g): frequency (f)
0.9400–0.9449: 0;  0.9450–0.9499: 3;  0.9500–0.9549: 0;  0.9550–0.9599: 4;  0.9600–0.9649: 6;  0.9650–0.9699: 6;  0.9700–0.9749: 3;  0.9750–0.9799: 5;  0.9800–0.9849: 4;  0.9850–0.9899: 5;  0.9900–0.9949: 2;  0.9950–0.9999: 2;  1.0000–1.0049: 2;  1.0050–1.0099: 0;  1.0100–1.0149: 3;  1.0150–1.0199: 0;  1.0200–1.0249: 1;  1.0250–1.0299: 1;  1.0300–1.0349: 0;  …;  1.0500–1.0549: 1;  1.0550–1.0599: 0;  1.0600–1.0649: 1;  …;  1.0850–1.0899: 1;  1.0900–1.0949: 0

Class interval 0.010 g, boundaries starting at 0.9400 (Fig. 3.1(b)) – frequency class boundaries (g): frequency (f)
0.9400–0.9499: 3;  0.9500–0.9599: 4;  0.9600–0.9699: 12;  0.9700–0.9799: 8;  0.9800–0.9899: 9;  0.9900–0.9999: 4;  1.0000–1.0099: 2;  1.0100–1.0199: 3;  1.0200–1.0299: 2;  …;  1.0500–1.0599: 1;  1.0600–1.0699: 1;  …;  1.0800–1.0899: 1

Class interval 0.010 g, boundaries starting at 0.9450 (Fig. 3.1(c)) – frequency class boundaries (g) with class mid-point (X): frequency (f)
0.9450–0.9549 (X = 0.950): 3;  0.9550–0.9649 (0.960): 10;  0.9650–0.9749 (0.970): 9;  0.9750–0.9849 (0.980): 9;  0.9850–0.9949 (0.990): 7;  0.9950–1.0049 (1.000): 4;  1.0050–1.0149 (1.010): 3;  1.0150–1.0249 (1.020): 1;  1.0250–1.0349 (1.030): 1;  1.0450–1.0549 (1.050): 1;  1.0550–1.0649 (1.060): 1;  …;  1.0850–1.0949 (1.090): 1
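The grouping in Table 3.1 can be reproduced directly from the raw weighings. The Python sketch below (an added illustration, not part of the original text) bins the 50 results into the 0.010 g classes of Fig. 3.1(c) and prints the class frequencies; integer arithmetic on thousandths of a gram is used to avoid floating-point binning errors.

from collections import Counter

weights = [0.948, 1.012, 1.085, 1.063, 1.010, 1.000, 0.994, 0.986, 0.995, 0.999,
           0.969, 0.965, 0.945, 0.977, 0.957, 0.946, 0.960, 0.955, 1.010, 0.965,
           0.975, 0.972, 0.957, 0.961, 0.975, 0.988, 0.989, 0.974, 0.980, 0.980,
           1.001, 0.977, 1.021, 1.051, 0.965, 0.963, 0.971, 0.983, 0.962, 0.984,
           0.978, 0.968, 0.960, 1.027, 0.959, 0.985, 0.985, 0.967, 0.960, 0.992]

# Classes run 0.9450-0.9549, 0.9550-0.9649, ... with mid-points 0.950, 0.960, ...
def class_midpoint(w):
    k = (round(w * 1000) - 945) // 10       # index of the 0.010 g class containing w
    return round(0.950 + 0.010 * k, 3)

freq = Counter(class_midpoint(w) for w in weights)
for mid in sorted(freq):
    print(mid, freq[mid])
# Expected frequencies (cf. Table 3.1): 0.950:3, 0.960:10, 0.970:9, 0.980:9, 0.990:7,
# 1.000:4, 1.010:3, 1.020:1, 1.030:1, 1.050:1, 1.060:1, 1.090:1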
Variability in numbers of bacterial cells counted microscopically

(Data of Ziegler and Halvorson (1935): their appendix I, slide 1.) One hundred microscope fields on a single slide were examined and the number of bacteria counted per field was determined. The frequency distribution of the data is derived from the following:

No. bacteria/field (x): frequency (f)
4: 2;   5: 4;   6: 3    (group total 9)
7: 8;   8: 5;   9: 7    (group total 20)
10: 11; 11: 7;  12: 4   (group total 22)
13: 8;  14: 6;  15: 5   (group total 19)
16: 4;  17: 7;  18: 7   (group total 18)
19: 4;  20: 1;  21: 2   (group total 7)
22: 0;  23: 3;  24: 1   (group total 4)
25: 0;  26: 1;  27: 0   (group total 1)

n = 100

x̄ = ΣfX/n = {(9 × 5) + (20 × 8) + (22 × 11) + (19 × 14) + (18 × 17) + (7 × 20) + (4 × 23) + (1 × 26)}/100 = 12.77

where x̄ = mean value, X = mid-value of the grouped frequency classes (i.e. 5, 8, 11, …) and f = the class frequency (here the grouped frequencies 9, 20, 22, 19, 18, 7, 4 and 1).

s² = [Σf(X²) − x̄ΣfX]/(n − 1) = [18,685 − 12.77(1277)]/99 = 24.0173

s = √24.0173 = 4.9007

The frequency distributions are shown in Fig. 3.2.
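The grouped calculation above can be checked against the full, ungrouped frequency table. The Python sketch below (added for illustration; not part of the original text) computes the mean and variance both from the individual counts per field and from the three-unit groups used above, showing that the two approaches agree closely.

# number of bacteria per field (x): number of fields (f), from Ziegler and Halvorson's data
freq = {4: 2, 5: 4, 6: 3, 7: 8, 8: 5, 9: 7, 10: 11, 11: 7, 12: 4, 13: 8, 14: 6,
        15: 5, 16: 4, 17: 7, 18: 7, 19: 4, 20: 1, 21: 2, 22: 0, 23: 3, 24: 1,
        25: 0, 26: 1, 27: 0}

n = sum(freq.values())                                        # 100 fields
sum_xf = sum(x * f for x, f in freq.items())
mean = sum_xf / n                                             # ungrouped mean (~12.7)
s2 = (sum(f * x * x for x, f in freq.items()) - mean * sum_xf) / (n - 1)

# Grouped version: classes of width 3 with mid-points 5, 8, 11, ...
groups = {}
for x, f in freq.items():
    mid = 5 + 3 * ((x - 4) // 3)                              # 4-6 -> 5, 7-9 -> 8, etc.
    groups[mid] = groups.get(mid, 0) + f

sum_mf = sum(m * f for m, f in groups.items())
g_mean = sum_mf / n                                           # 12.77
g_s2 = (sum(f * m * m for m, f in groups.items()) - g_mean * sum_mf) / (n - 1)

print(round(mean, 2), round(s2, 2))       # ungrouped estimates
print(round(g_mean, 2), round(g_s2, 2))   # grouped estimates: 12.77 and ~24.02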
TYPES OF FREQUENCY DISTRIBUTION

Mathematically defined frequency distributions can be used as models for experimental data obtained from any population. Assuming that the experimental data fit one of the models then, amongst other things:

1. The spatial (ecological) dispersion of the population can be described in mathematical terms.
2. The variance of population parameters can be estimated.
3. Temporal and spatial changes in density can be compared.
4. The effect of changes in environmental factors can be assessed.

The mathematical models used most commonly for analysis of microbiological data include the Normal (Gaussian), Binomial, Poisson and Negative Binomial distributions, which are described below. The parameters of the distributions are summarized in Table 3.2.
TABLE 3.2 Some Continuous and Discrete Distribution Functions

Name: Normal (Gaussian)ᵃ
  Domain: −∞ < x < ∞
  Probability density function fx: [1/(σ√(2π))] exp[−½((x − m)/σ)²]
  Restriction on parameters: −∞ < m < ∞; σ > 0
  Mean: m;  Variance: σ²

Name: Binomial
  Domain: x = s, for s = 0, 1, 2, …, n
  Probability density function fx: C(n, s) pˢ(1 − p)ⁿ⁻ˢ, where C(n, s) = n!/[s!(n − s)!]
  Restriction on parameters: 0 ≤ p ≤ 1; q = 1 − p
  Mean: np;  Variance: npq

Name: Poissonᵇ
  Domain: x = s, for s = 0, 1, 2, …, ∞
  Probability density function fx: e^−m mˢ/s!
  Restriction on parameters: 0 < m < ∞
  Mean: m;  Variance: m

Name: Negative binomial
  Domain: x = s, for s = 0, 1, 2, …, ∞
  Probability density function fx: C(n + s − 1, s) pⁿ(1 − p)ˢ
  Restriction on parameters: n > 0 and 0 < p ≤ 1 (p = 1/Q and 1 − p = P/Q)
  Mean: nP;  Variance: nPQ

ᵃ The Normal distribution is the limiting form of the Binomial distribution when n → ∞ and p → 0.
ᵇ Limiting form of the binomial, as p → 0 and q → 1.

STATISTICAL PROBABILITY

Probability is about chance and the likelihood that an event will, or will not, occur in any specific situation. For instance, the probability that a specific person will win the jackpot in the National Lottery is very low (about 1 in 14 million) because the odds against winning are so great – yet people do win the Lottery, showing that no matter how improbable an
event there is always a chance that it will occur. By contrast, the chance of getting snow in winter if you live in Norway, Russia or Canada is very high – one might say it is certain to occur – but if you live in England the chance is not very high.

If a normal coin is tossed once it will fall to show either a 'head' or a 'tail' and there is a 50% probability of obtaining a head (or a tail) in a single throw. This is written p = 0.50, q = 0.50, where (p + q) = 1; p is the probability that an event will occur (i.e. of obtaining a 'head') and q the probability of failure (i.e. that a head will not occur and conversely that a 'tail' will occur). It is important to recognize that since there are two mutually independent ways of obtaining either a head (H) or a tail (T), then for two or more coins we could get either H or T for each coin. Hence if the first coin falls as an H, the second or subsequent coins could also fall either H or T, that is the following sequences could occur: HH, HT, TH, TT. Therefore, by tossing two coins there are 4 possible outcomes: HH (1 in 4 chances or P = 0.25), HT and TH (which are the same; 2 in 4 chances or P = 0.50) or TT (1 in 4 chances or P = 0.25). If we toss three coins there are 8 possible outcomes: HHH (0.125), HHT (including HTH and THH; 0.375), HTT (including THT and TTH; 0.375) or TTT (0.125). We can determine the number of outcomes quite simply in this case because for a single toss of the coin the probability of obtaining one head is 0.5; with two coins the probability of obtaining HH = 0.5 × 0.5 = (0.5)² = 0.25 (1 in 4); with three coins the probability of HHH is (0.5)³, that is 0.125 or 1 in 8; and so on. We can generalize these probabilities by saying that the relative probability of a specified outcome for any given number of trials (n) is determined by Pⁿ, where P = the probability of that single independent event occurring in a single trial. Pascal's triangle (Fig. 3.3) provides a simple 'ready reckoner' for any number of independent trials where the event is either positive or negative.

Provided that the outcome of one event (A) does not affect the outcome of a second event (B), then the events are totally independent; however, if event A can affect the possible outcome of event B then the events are not independent.
FIGURE 3.3 Pascal's Triangle – a visual illustration of binomial outcomes from a series of trials, for instance the toss of a coin.

No. of trials    Possible outcomes
     1           1   1
     2           1   2   1
     3           1   3   3   1
     4           1   4   6   4   1

In a single trial there are 2 possible outcomes: a head or a tail, with equal probability (P = 0.5). In 2 trials there are 4 possible outcomes: 2 heads (P = 0.25), 1 head and 1 tail (P = 0.5) or 2 tails (P = 0.25). In 3 trials there are 8 possible outcomes and in 4 trials there are 16 possible outcomes, etc. Note that each value in a line is the sum of the values immediately above it. The figure can be expanded by adding data for additional trials.
Some events are mutually exclusive: for instance, a 'normal' coin has both a head and a tail, so when a spun coin falls it can show either a head or a tail – it cannot show both.

Suppose that we have a bag containing 20 balls: 4 red (R), 6 blue (B) and 10 green (G). The probabilities of randomly drawing either 1R, or 1B or 1G are 4/20 (0.2), 6/20 (0.3) or 10/20 (0.5), respectively. Assuming that after each draw the ball is returned to the bag, the events are totally independent, so the probability that we can draw sequentially 1R, 1B and 1G balls is: P(R)P(B)P(G) = P(R∩B∩G) = 0.2 × 0.3 × 0.5 = 0.03. Note that since the events are totally independent the overall probability is the product of the individual probabilities, as described in the 'Multiplication Rule'. Suppose further that half of the balls of each colour are marked with an odd number (O) and the remainder with an even (E) number, and that we wish to draw a ball that is either red or odd but not both. Then P(R) = 0.2 and P(O) = 0.5. These events are not mutually exclusive, so the combined probability is given by P(R∪O) = P(R) + P(O) = 0.2 + 0.5 = 0.7. Note that these events are not mutually exclusive, so the 'Addition Rule' applies. But if we want to draw either a red ball, or an odd ball, but not a red–odd ball, then the probabilities are again dependent upon the 'Addition Rule': P(R∪O) = P(R) + P(O) − P(R∩O), where P(R∩O) signifies the probability for both events to occur. Thus, P(R∪O) = 0.2 + 0.5 − 0.1 = 0.6.

Let us extend this concept. A pack of playing cards consists of 52 cards divided into 4 suits (spades, hearts, diamonds and clubs), each of which contains cards numbered from 1 to 10 plus a jack, a queen and a king. If we shuffle the pack and then draw the top card, the independent chance of drawing an Ace is 1 in 13 (because there are 4 aces in the 52 cards) but the chance of drawing the Ace of Spades is only 1 in 52. If we shuffle the cards and lay the top four cards on the table, what is the chance that any one of the cards laid down will be an Ace? We have already determined that the chance of picking any one specific value card is 4 in 52, so if the first card is not an ace, then the chance that the second card will be an ace is now 4 in 51 (because 1 card has already been drawn and the individual chances are independent of each other), so the cumulative probability is 4/52 × 4/51 = 16/2652 ≈ 1/166. Similarly, if card 2 is not an ace, then the probability for an ace as card 3 is 4/50 and for card 4 the chance of an ace is 4 in 49. So the overall chance of finding 1 ace amongst 4 cards is 4/52 × 4/51 × 4/50 × 4/49 = 256/6,497,400 ≈ 1/25,380. However, if the first card had been an ace, the chance that the second card would also be an ace is now reduced to 3/51. Hence, the chance that all 4 cards are aces will be 4/52 × 3/51 × 2/50 × 1/49 = 24/6,497,400 ≈ 1 in 270,000.
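The multiplication and addition rules, and the card-drawing chances quoted above, can be verified with exact fractions. The Python sketch below (an added illustration, not part of the original text) uses the standard library's Fraction type so that no rounding is involved.

from fractions import Fraction as F

# Bag of 20 balls: 4 red, 6 blue, 10 green; half of each colour carries an odd number
p_r, p_b, p_g, p_odd = F(4, 20), F(6, 20), F(10, 20), F(1, 2)

p_sequence = p_r * p_b * p_g                 # multiplication rule: draw R then B then G (with replacement)
p_red_and_odd = F(2, 20)                     # 2 of the 20 balls are both red and odd
p_red_or_odd = p_r + p_odd - p_red_and_odd   # addition rule for non-exclusive events

print(p_sequence, float(p_sequence))         # 3/100 = 0.03
print(p_red_or_odd, float(p_red_or_odd))     # 3/5 = 0.6

# Drawing four aces from a shuffled pack of 52
p_all_aces = F(4, 52) * F(3, 51) * F(2, 50) * F(1, 49)
print(p_all_aces, round(1 / p_all_aces))     # 1/270725, i.e. about 1 in 270,000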
THE BINOMIAL DISTRIBUTION (σ² < μ)

If we toss a number of coins the average probability of equal numbers of heads and tails is p = q = 0.5, but if all coins were 'double-headed', the probability of a 'head' occurring would be p = 1.0 (i.e. there would be no chance of obtaining a 'tail' and q = 0). We can therefore use the concept of probability to answer the general question: 'What is the chance
that a specific event will occur?' The probability scale ranges from P = 0 for impossible events to P = 1 for certain events. In the binomial distribution, p is the probability that an event will occur and q is the probability that the event will not occur. If p and q remain constant in each of a given number (n) of individual independent trials, then, since p + q = 1, the probability series is described by the general expression (p + q)ⁿ. The individual terms are given by the binomial expansion:

Px = C(n, x) q^(n−x) pˣ = [n!/(x!(n − x)!)] q^(n−x) pˣ

where Px is the probability of finding x individuals in a sample, n is the number of times the test is repeated and n! means factorial n (e.g. 5! = 5 × 4 × 3 × 2 × 1 = 120). The factorial term merely counts the number of mutually exclusive ways to obtain a particular outcome. The population parameters of mean (μ) and variance (σ²) are given by μ = np and σ² = npq. The binomial distribution is used as a model when a specific characteristic of an individual in a sample can be recognized (e.g. the prevalence of defective samples in a lot). The distribution is often used as the basis for drawing up sampling schemes in Acceptance Sampling (qv) of foods and other materials. In such schemes, q is defined as the probability that any one sample will not be defective (e.g. that it will not be contaminated), p is the probability that the sample will be defective (e.g. will be contaminated) and n is the number of samples tested. The expected probabilities, derived from the expansion of (p + q)ⁿ, can be calculated (Example 3.2) or obtained from Tables of Binomial Probability, such as those given by the National Bureau of Standards (1950), Fisher and Yates (1974) and Pearson and Hartley (1966).
EXAMPLE 3.2 CALCULATION OF EXPECTED FREQUENCIES FOR A POSITIVE BINOMIAL DISTRIBUTION

In a sample of 100 farmed trout, the mean level of Clostridium botulinum spores detected was 2 spores/fish. It is widely believed that the maximum likely prevalence of contamination is 10 spores/fish (n = 10). What is the frequency distribution and what are the chances of not detecting the organism?

We can use the binomial distribution to determine the probability (P) that no contamination is detectable (P(x=0)), or that 1, 2, …, 10 spores will be detected (P(x=1), P(x=2), …, P(x=10)). To do this, a sample estimate (p̂) of the overall population value, in relation to the maximum contamination level expected, is derived as follows. The probability of contamination is given by
p̂ = x̄/n = 2/10 = 0.2; hence q̂ = 1 − p̂ = 0.8
Expected probabilities are given by the expansion of (p̂ + q̂)ⁿ = (0.2 + 0.8)¹⁰, i.e.

P(x) = [n!/(x!(n − x)!)] q̂^(n−x) p̂ˣ

For the values given, the probability of detecting no Cl. botulinum spores in a fish is

P(x=0) = [10!/(0!(10 − 0)!)] × (0.8^(10−0) × 0.2⁰) = 0.8¹⁰ = 0.1074

The probability of detecting 1 spore/fish is

P(x=1) = [10!/(1!(10 − 1)!)] × (0.8^(10−1) × 0.2¹) = 10 × 0.8⁹ × 0.2¹ = 0.2684

The probability of detecting 2 spores/fish is

P(x=2) = [10!/(2!(10 − 2)!)] × (0.8^(10−2) × 0.2²) = 45 × 0.8⁸ × 0.2² = 0.3020

The probabilities for 3, 4, 5, 6, 7, 8, 9 and 10 spores/fish are derived similarly. The total probability ≈ 1.0. The expected frequencies of occurrence of Cl. botulinum spores in 100 fish are given by f = P(x)N.

x (spores/fish)   P(x)      f = NP(x)   f (as integer)*
0                 0.1074    10.74       11
1                 0.2684    26.84       27
2                 0.3020    30.20       30
3                 0.2013    20.13       20
4                 0.0881    8.81        9
5                 0.0264    2.64        3
6                 0.0055    0.55        1
7                 0.0008    0.08        0
8                 0.0001    0.01        0
9                 **        –           0
10                **        –           0
Total             1.0000    100.00      101

* That is, to the nearest whole number.
** <0.0001.
Hence with a mean contamination level of 2 spores/fish, and a probable maximum of 10 spores/fish, the probability of not detecting any Cl. botulinum spores would be 11/100, or slightly more than 1 in 10. This theme is developed further in Chapters 5 and 8 in relation to sampling schemes and the use of presence or absence tests for specific organisms.
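The whole of the expected-frequency table in Example 3.2 can be generated from the binomial formula in a few lines. The Python sketch below (added for illustration; not part of the original text) reproduces P(x), the expected frequencies in 100 fish and the probability of failing to detect the organism.

from math import comb

n, p, N = 10, 0.2, 100          # trials per fish, probability of contamination, number of fish
q = 1 - p

for x in range(n + 1):
    px = comb(n, x) * p ** x * q ** (n - x)       # binomial probability of x spores
    print(x, round(px, 4), round(N * px, 2))      # P(x) and expected frequency in 100 fish

p_none = q ** n                                   # probability of detecting no spores in a fish
print(round(p_none, 4))                           # ~0.1074, i.e. about 11 fish in 100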
Figure 3.4 shows the probability distribution for different values of the probability for a successful test (p) and different numbers of trials (n). As the values of n and p increase the shape of the binomial distribution approaches the bell-shaped curve, described as the Normal (or Gaussian) distribution. This is the distribution which often describes continuous variables (i.e. measurements rather than counts).
THE NORMAL DISTRIBUTION

The Normal distribution refers to a family of distributions that have the same generic shape: they are symmetrical curves with more values concentrated in the centre of the curve and fewer in the tails (Fig. 3.5). The shape of the curve is described by the general expression:

f(X) = [1/(σ√(2π))] e^(−(x−μ)²/2σ²)
where f(X) = the probable density of the derived variable X (the standard normal deviate), with X = (x − μ)/σ, where x = the random variable (i.e. observed value), μ = population mean and σ = standard deviation of the population mean. An estimate of the value of the standard normal deviate (X) can be determined from sample data (x) using Z = (x − x̄)/s. In other words, by subtracting the mean value (x̄) from each observed value (x) and dividing by the estimate of the standard deviation (s), a series of standardized deviates (Z) is derived. These can be obtained from Tables of the Standardized Normal Deviate (Pearson and Hartley, 1966).

The Central Limit Theorem states that if a large number of random variables (i.e. samples) is selected from (almost) any distribution, with mean μ and variance σ², then the means of these samples will themselves follow a normal distribution with a mean x̄ and a standard deviation σx̄, which is the standard error of the mean, with σx̄ = σ/√n. Hence, any distribution will approach the normal distribution as the sample size increases, provided that the mean and variance are independent of one another. As will be seen below, distributions such as the Poisson and the Binomial tend to approach normality when the number of samples tested tends to infinity. A unique property of the normal distribution is that the mean and variance are independent and the shape of the distribution is a function of the population variance parameter. From Fig. 3.5 it can be seen that 95.45% of observations occur within ±2 standard deviations (σ) of the mean (μ) and that 99.73% lie within ±3σ of the mean.

The normal distribution is rarely a suitable model for microbiological data as measured, but it is very important because many parametric statistical tests are based on the normal distribution. Such tests include analysis of variance (ANOVA) and tests for significance of differences. Thus, for microbiological data, steps have to be taken to transform the data such that they become 'normally' distributed.
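The standardized deviate and the coverage figures quoted above are easy to confirm numerically. The Python sketch below (an added illustration, not part of the original text) computes Z for a single observation and uses the error function to show that roughly 95.45% and 99.73% of a Normal population lie within ±2σ and ±3σ of the mean.

import math

def normal_within(k):
    """Proportion of a Normal population within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(round(normal_within(1) * 100, 2))   # ~68.27
print(round(normal_within(2) * 100, 2))   # ~95.45
print(round(normal_within(3) * 100, 2))   # ~99.73

# Standardized deviate for a single observation, e.g. a colony count of 1970
# against the sample mean 1582 and standard deviation 239.4 from Example 2.1
z = (1970 - 1582) / 239.4
print(round(z, 2))                        # ~1.62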
FIGURE 3.4 Binomial probability distributions for various values of p and n in the expansion (p + q)ⁿ. [Panels show frequency (%) against x for p = 0.1, 0.2, 0.3, 0.4 and 0.5 with n = 5; for p = 0.5 with n = 10; and for p = 0.1, 0.3 and 0.5 with n = 20.]
FIGURE 3.5 A Normal (Gaussian) distribution curve described by the equation f(x) = [1/(σ√(2π))] e^(−(x−μ)²/2σ²), where μ = population mean and σ² = variance. [Curve of % frequency (fx) against X; the intervals μ ± 2σ and μ ± 3σ contain 95.45% and 99.73% of observations, respectively.]
THE POISSON DISTRIBUTION (σ² = μ)

The arithmetic mean of the discrete binomial distribution is given by μ = np and the variance by σ² = npq = μq. Since q = 1 − p = 1 − (μ/n), then σ² = μq = μ(1 − (μ/n)) = μ − (μ²/n). Hence, if n is finite, the variance will always be less than the mean (σ² < μ). When the probability that an event will occur is low (p → 0) and n approaches infinity (n → ∞) with μ = np fixed and finite, then the binomial distribution approaches the discrete Poisson distribution. Since σ² = μ(1 − (μ/n)), as n → ∞, μ/n → 0 and σ² → μ. The Poisson distribution is described by the equation Px = e^−μ (μˣ/x!), where Px is the probability that x individuals occur in a sampling unit, μ is the Poisson parameter (μ = σ²) and e is the exponential value (2.7183). Unlike the binomial distribution, which is a function of two parameters (n and p), only one parameter (μ) is needed for the Poisson series, since μ = np. The Poisson parameter (μ) is estimated by the mean value (m), where m = s²; thus Px = e^−m (mˣ/x!).
The probabilities of 0, 1, 2, 3, etc. individuals per sampling unit are given by the individual terms of the expansion of this equation, thus:

P(x=0) = e^−m
P(x=1) = e^−m · m = P(x=0) · (m/1!)
P(x=2) = e^−m · (m²/2!) = P(x=1) · (m/2)
P(x=3) = e^−m · (m³/3!) = P(x=2) · (m/3), etc.

This can be generalized as P(x+1) = P(x) · m/(x + 1).

The individual terms can be calculated (Example 3.3) or may be obtained from standard Tables; for example, Pearson and Hartley (1966) give individual terms of the Poisson distribution for different values of m from 0.1 to 15.0. The expected frequency (nPx) is obtained by multiplying each term of the series (Px) by the number of sampling units (n).
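The recursive relationship P(x+1) = P(x)·m/(x + 1) is convenient for computing successive Poisson terms without evaluating factorials. The Python sketch below (an added illustration, not part of the original text; the mean m = 2 is an arbitrary choice) generates the first few terms this way and checks them against the direct formula.

import math

m = 2.0                      # Poisson parameter (illustrative value)
terms = [math.exp(-m)]       # P(x = 0) = e^(-m)
for x in range(0, 8):
    terms.append(terms[-1] * m / (x + 1))    # P(x+1) = P(x) * m / (x + 1)

for x, p in enumerate(terms):
    direct = math.exp(-m) * m ** x / math.factorial(x)   # e^(-m) m^x / x!
    print(x, round(p, 5), round(direct, 5))              # the two columns agree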
EXAMPLE 3.3 CALCULATION OF THE EXPECTED FREQUENCIES OF A POISSON DISTRIBUTION

It is intended to inoculate 1000 bottles of meat slurry with a spore suspension at an average level of 10 spores/bottle. What are the expected frequencies for (a) less than 1 spore, (b) less than 5 spores and (c) more than 15 spores/bottle?

For the intended mean inoculum level m = 10 with N = 1000, the probability of 0 spores/bottle (i.e. <1 per bottle) is P(x=0) = e^−m = e^−10 = 0.0000454. Hence the expected frequency = NP(x) = 0.0454. An alternative way to express this would be that only 1 in 22,000 bottles would be expected not to contain at least one spore. The succeeding terms of the Poisson series are used to calculate the remaining probabilities.
For simplicity, the generalized equation P(x+1) = Px · m/(x + 1) is used:

P(x=0) = e^−m = e^−10 = 0.0000454
P(x=1) = P(x=0) × m/(0 + 1) = 0.0000454 × 10/1 = 0.000454
P(x=2) = P(x=1) × 10/2 = 0.000454 × 10/2 = 0.00227
P(x=3) = P(x=2) × 10/3 = 0.00227 × 10/3 = 0.00757
TABLE 3.3 Individual terms (x 0 … 24) of a Poisson Distribution for m 10 and n 1000, Where f is the Expected Frequency of Occurrence of a Spore Inoculum (for Details See Example 3.3). The Values of Px and f are Shown to Four Significant Places X
Px
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
10
Total
P(x0) e P(x1) P(x0) 10/1 P(x2) P(x1) 10/2 P(x3) P(x2) 10/3 P(x4) P(x3) 10/4 P(x5) P(x4) 10/5 P(x6) P(x5) 10/6 P(x7) P(x6) 10/7 P(x8) P(x7) 10/8 Etc.
Px 0.0000454 0.0004540 0.002270 0.007566 0.01892 0.03783 0.06306 0.09008 0.1126 0.1251 0.1251 0.1137 0.09478 0.07291 0.05208 0.03472 0.02170 0.01276 0.007091 0.003732 0.001866 0.0008891 0.0004039 0.0001756 0.00007317
f nPx 0.04540 0.4540 2.270 7.567 18.92 37.83 63.06 90.08 112.6 125.1 125.1 113.7 94.78 72.91 52.08 34.72 21.70 12.76 7.091 3.732 1.866 0.8891 0.4039 0.1756 0.07317
f (as integer) 0 0 2 8 19 38 63 90 113 125 125 114 95 73 52 35 22 13 7 4 2 1 0 0 0
冧
29
冧
49
NP(x) 999.90717
From the data in Table 3.3, the cumulative probability of less than 5, and more than 15, spores/bottle would be P 0.291 and P 0.0484, respectively. Hence, in 1000 replicate inoculated bottles the expected frequencies would be 29 and 49, respectively; Thus 922 of the 1000 bottles would be expected to contain between 5 and 15 spores/bottle.
CH003-N53039.indd 28
5/26/2008 4:41:14 PM
FREQUENCY DISTRIBUTIONS
29
Since the Poisson distribution is associated with rare events which can be considered to occur randomly, for example the distribution of industrial accidents over a long time period or the distribution of small numbers of bacteria in a large quantity of food, then tests for agreement with a Poisson distribution will be tests for randomness of distribution. We have noted above that the Poisson distribution is a special version of the binomial distribution where p → 0 as n → and 2 → . Certain conditions must be met if the Poisson series is to be used as a mathematical model for bacterial counts in a food sample: 1. The number of individual organisms per sampling unit (k) must be well below the maximum possible number that could occur (k → ). 2. The probability that any given position in the sampling unit is occupied by an organism is both constant and very small (constant p → 0); consequently, the probability that that position is not occupied by a particular organism is high (q → 1). 3. The presence of an individual organism in any given position must neither increase nor decrease the probability that another organism occurs near by. 4. The sizes of the samples must be small relative to the whole population. The first condition implies that food samples showing high levels of contamination, or culture plates with large numbers of colonies, might be expected not to conform to a Poisson distribution but the Poisson tends to the normal distribution when is large. The second condition implies that there must be an equal chance that any one organism will occur at any one point in the food sample or the culture. This condition is fulfilled only if the individuals are distributed randomly. The third criterion implies that if bacterial cells have replicated then more than one organism will occur within a given location, hence randomness is unlikely once replication occurs. In a broth culture or in a well-mixed suspension of a liquid food, such as milk, the total volume occupied by 1,000,000 (106) bacterial cells is only about 1 part in 106 of the total volume of the liquid, hence it is reasonable to assume that the cells will occur randomly. In a solid food sample (e.g. a minced meat) contamination occurs randomly throughout and the condition would be met; but if the surface of a piece of meat is contaminated more than the deep tissues, randomness might apply only after total maceration of the sample in diluent. However, if contamination were relatively light, the distribution of organisms could be random as judged by use of some suitable surface sampling technique. When individual organisms are not well separated the variance will be less than the mean (s2 x) and the binomial might be a more suitable model for the distribution. Tests to determine whether the Poisson distribution provides a good description of a set of data are given in Chapter 4. If clumping of organisms occurs, the third condition will not be met and it is probable that variance will be greater than the mean (s2 x) . In theory, the removal of a sample from a finite population will affect the value for P in the next sample unit. However, if the sample forms only a minute proportion of the total ‘lot’ size, then this effect will be minimal and the value of P would not alter significantly from one examination to the next.
CH003-N53039.indd 29
5/26/2008 4:41:15 PM
30
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
THE NEGATIVE BINOMIAL DISTRIBUTION (2 ) If the second and third conditions for use of the Poisson distribution are not fulfilled, the variance of the population will usually be greater than the mean (2 ). This is particularly the case in microbiology where aggregates and cell clumps occur both in natural samples and in dilutions, slide preparations, etc. Of the various mathematical models available, the negative binomial is frequently the best model to describe the distribution frequencies obtained (Jones et al., 1948; Bliss and Fisher, 1953; Gurland, 1959; Takahashi et al., 1964; Dodd, 1969) but other complex distributions may be appropriate. The negative binomial, which describes the number of failures before the xth success when n is the integer, is the mathematical counterpart of the (positive) binomial, and is described by the expansion of (q p)k, where, q 1 p and p /k. The parameters of the equation are the mean and the exponent k. Unlike exponent n in the binomial series, the exponent k of the negative binomial is neither an integer nor is it the maximum possible number of individuals that could occur in a sample population. Instead it is related to the spatial or temporal distribution of the organisms in the sample and takes into account the effects of cell clumps and aggregates. The variance of the population is given by 2 kpq q (1 ( k)) 2 ( 2 k) . Therefore, the reciprocal of the constant k (i.e. 1/k) is a measure of the excess variance or clumping of individuals in the population. As 1/k → 0 and k → , the distribution converges to Poisson, with 2 . Conversely, if clumping is dominant, k → 0 and therefore 1/k → ; the distribution converges on the logarithmic distribution (Fisher et al., 1953). Applications of the negative binomial distribution within biological sciences are numerous. One such was the demonstration that the numbers of microbial colonies in soil follow a Poisson distribution and the numbers of organisms within colonies follows a logarithmic distribution, so that the distribution of all organisms in soil conform to a negative binomial (Jones et al., 1948). The individual terms of the expansion of (q p)k are given by: k ⎛ ⎞ ⎛ (k x 1)! ⎞⎟ ⎛⎜ x ⎞⎟ ⎟ ⎟⎜ Px ⎜⎜1 ⎟⎟⎟ ⎜⎜ ⎜⎝ k ⎠ ⎜⎝ x ! (k 1)! ⎟⎟⎠ ⎜⎝ k ⎟⎟⎠
where Px is the probability that x organisms occur in a sample unit. As in other distributions, the expected frequency of a particular count is NPx, where N is the number of sample units. A very simple method of deriving an approximate value for the k can be obtained by rearranging the equation for variance of a negative binomial 2 ( 2 / k), hence k 2 ( 2 ) . Since the statistical estimates of the parameters and 2 are x and s2, respectively, an estimate (kˆ) of the value of k can be derived from kˆ x 2 /(s2 x). Anscombe (1950) showed that this method is inefficient because it does not give a reliable estimate for values of kˆ below 4 unless x is also less than 4.
CH003-N53039.indd 30
5/26/2008 4:41:15 PM
FREQUENCY DISTRIBUTIONS
31
Estimations of the population parameters and k are obtained from the frequency distribution statistics x and kˆ , the arithmetic mean (x) being derived in the normal way (Example. 2.1). Determination of kˆ is much more complex. Several methods have been proposed (Anscombe, 1949, 1950; Bliss and Fisher, 1953; Debauche, 1962; Dodd, 1969) to obtain an approximate estimate, which can then be used in a maximum likelihood method to obtain an accurate estimate for kˆ (Example 3.4).
EXAMPLE 3.4 CALCULATION OF kˆ AND DERIVATION OF THE EXPECTED FREQUENCY DISTRIBUTION OF A NEGATIVE BINOMIAL FOR DATA FROM MICROSCOPIC COUNTS OF BACTERIAL CELLS IN MILK (DATA OF MORGAN ET AL., 1951) The observed frequency distribution of bacterial cells per field in a milk smear was:
Number of cells (x) 0 F fx Ax
1
56 104 0 104 344 240
2
3
4
5
80 160 160
62 186 98
42 168 56
27 135 29
6
7
9 9 54 63 20 11
8
9
10
10
Total
5 40 6
3 27 3
2 20 1
1 10 0
400 967
Where x the cell count, f observed frequency of that count and Ax is the cumulative frequency of counts exceeding x. The total number (N) of sample units (i.e. fields counted) is given by N f 400 and the total of Ax fx 967 . The arithmetic mean count is given by x { fx/ f } 967 / 400 2 . 4175 , and the variance by s2 { fx2 x fx}/(n 1) {3957 2 . 4175(967)}/ 399 4 . 0583
Estimation of kˆ (Method 1) This simple method is based on the equation for variance of a negative binomial. The population variance is given by: 2 ( 2 k ) and therefore k 2 ( 2 ) .Substituting the sample statistics for the population parameters we get kˆ x 2 (s2 x ). The method is not very efficient for values of k 4 but provides an approximation that can be used in other methods. For our data, kˆ1
CH003-N53039.indd 31
x2 (2 . 4175)2 3 . 5619 3 . 6 s2 x 4 . 0583 2 . 4175
5/26/2008 4:41:15 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
32
Test for efficiency
kˆ 3 . 5619 1 . 47 which is less than 6 and x 2 . 4175
(kˆ x )(kˆ 2) (3 . 5619 2 . 4175)(3 . 5619 2) 13 . 76 x 2 . 4175 which is less than 15. Hence the value for kˆ is not efficient.
Estimation of kˆ by the maximum likelihood method The maximum likelihood equation is ⎛ x⎞ N log e ⎜⎜1 ⎟⎟⎟ ⎜⎝ kˆ ⎟⎠
⎛ A
⎞
∑ ⎜⎜⎜⎝ kˆ x x ⎟⎟⎟⎟⎠.
where N the total number of sampling units (i.e. microscopic fields examined), Ax is the total number of counts exceeding x and loge is the natural logarithm. Different values of kˆ are tried until the equation is balanced by iteration. We solve each side of the equation, initially using the approximation of kˆ1 3 . 6 (derived by Method 1). Solving first the left hand side of the equation: ⎛ ⎛ x ⎞⎟ 2 . 4175 ⎟⎞ ⎜ N log e ⎜⎜1 ⎟⎟⎟ 400 log e ⎜⎜1 ⎟ 2 0 5 . 496 ⎜⎝ ˆ ⎜⎝ 3 . 6 ⎟⎠ k1 ⎟⎠ Solving the right hand side of the equation: ⎛ A
⎞
Ax0 A A A A x1 x2 x3 … x10 kˆ 10 kˆ kˆ 1 kˆ 2 kˆ 3
344 240 160 98 … 1 205 . 84 3.6 4.6 5.6 6.6 13 . 6
∑ ⎜⎜⎜⎝ kˆ x x ⎟⎟⎟⎟⎠
Using kˆ1 3.6, the 2 equations differ by 0.344 (i.e. the left hand side is less than the right hand side). We now select a larger value (e.g. 5.0) for kˆ3 and again solve the two sides of the equation: ⎛ ⎛ x ⎞⎟ 2 . 4175 ⎞⎟ ⎜ N log e ⎜⎜1 ⎟⎟⎟ 400 log e ⎜⎜1 ⎟ 1 5 7 . 76 ⎜⎝ ˆk ⎟ ⎜⎜⎝ 5 . 0 ⎟⎠ 3⎠ ⎛ A
⎞
∑ ⎜⎜⎜⎝ kˆ x x ⎟⎟⎟⎟⎠
CH003-N53039.indd 32
344 240 160 98 … 1 156 . 51 5 6 7 8 15
5/26/2008 4:41:16 PM
FREQUENCY DISTRIBUTIONS
33
For this trial value of kˆ3 , the difference between the two sides is 1.65. We then arrange the data as follows: K
Difference
3.6
0.34
kˆ1 kˆ
?
kˆ3
5.0
2
0.00 1.65
Then, kˆ2 3 . 6 0 (0 . 34) 0 . 34 0 . 171 5.0 3.6 1 . 65 (0 . 34) 1 . 99 Hence, kˆ2 3 . 6 1 . 4 0 . 171 0 . 239 and kˆ2 3 . 6 0 . 239 3 . 84 The distribution of these counts can therefore be described by the statistics: x 2 . 1475; s2 4 . 0583; and k 3 . 84
The negative binomial distribution curve The distribution curve can be derived from the probability function equation:
P(x)
kˆ x ⎛ x ⎞⎟ ⎛⎜⎜ (kˆ x 1)! ⎞⎟⎟ ⎛⎜ x ⎞⎟ ⎜ ⎟⎟ ⎜⎜1 ⎟⎟ ⎜ ⎟⎟ ⎜⎜ ⎝ kˆ ⎟⎠ ⎜⎜⎝ x!(kˆ 1)! ⎟⎠ ⎝ x kˆ ⎟⎠
The probability of 0 bacteria/field is given by: ⎛ P(x0) ⎜⎜1 ⎜⎝ ⎛ ⎜⎜1 ⎜⎝
3 .84 ⎛ (3 . 84 0 1) ! ⎞⎟ ⎛ ⎞⎟0 2 . 4175 ⎞⎟ 2 . 4175 ⎜⎜ ⎟⎟ ⎜⎜ ⎟⎟ ⎟ ⎜⎝ 0 ! (3 . 84 1) ! ⎠ ⎜⎝ 2 . 4175 3 . 84 ⎟⎠ 3 . 84 ⎠ 3 .84 2 . 4175 ⎞⎟ ⎟⎟ 3 . 84 ⎠
The calculation is simplified by taking logs of both sides, ⎛ 2 . 4175 ⎞⎟ log Px0 3 . 84 log ⎜⎜1 ⎟ 3 . 84 log (1 . 6296) ⎜⎝ 3 . 84 ⎟⎠ 0 . 8144 therefore
CH003-N53039.indd 33
Px0 antilog (0 . 8144) 0 . 153
5/26/2008 4:41:16 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
34
The expected frequency of zero cell counts is: NP(x0) 400 0.153 61.2 The probability of 1 bacterial cell/field is given by: k 1 ⎛ x ⎞ ⎛ (k 1 1)! ⎞⎟ ⎛⎜ x ⎞⎟ P(x1) ⎜⎜1 ⎟⎟⎟ ⎜⎜ ⎟⎟ ⎜ ⎟⎟ ⎜⎝ k ⎠ ⎜⎝ 1 ! (k 1)! ⎠ ⎝⎜ x k ⎠ ⎛ (k 1 1)! ⎞⎟ ⎛ x ⎞⎟1 Px0 ⎜⎜ ⎟ ⎜⎜ ⎟ ⎝⎜ 1 ! (k 1)! ⎟⎠ ⎝⎜ x k ⎟⎠
Px0
1 k ⎛⎜ x ⎞⎟ ⎟⎟ ⎜⎜ 1 ⎝ x k⎠
Hence, Px1 (0.153) (3.84) (0.3863)1 0.227. The expected frequency of a count of 1 bacterial cell/field is: NP(x1) 400 0.227 90.8 The probability of 2 bacterial cells/field is given by: ⎛ (k 2 1)! ⎞⎟ ⎛ x ⎞⎟2 P(x2) Px0 ⎜⎜ ⎟⎜ ⎟ ⎜⎝ 2 ! (k 1)! ⎟⎠ ⎜⎜⎝ x k ⎟⎠ ⎛ (k 1)(k) ⎞⎟ ⎛ x ⎞⎟2 Px0 ⎜⎜ ⎟⎟ ⎜⎜ ⎟ ⎜⎝ ⎠ ⎜⎝ x k ⎟⎠ 2 ⎛ 4 . 84 . 3 . 84 ⎞⎟ Hence, Px2 (0 . 153) ⎜⎜ ⎟⎟ (0 . 3863)2 0 . 212 ⎜⎝ ⎠ 2 The expected frequency of a count of 2 bacterial cells/field is: NP(x2) 400 0.212 84.8 This process is continued until P(x) 1 and f 400, as illustrated in Table 3.4. A plot of the observed and expected frequencies is given in Fig. 3.6. Note 1: In the equation P(x)
kˆ ⎛ x ⎞⎟ ⎛⎜⎜ (kˆ x 1)! ⎞⎟⎟ ⎛⎜ x ⎞⎟ ⎜ ⎟⎟ it is not necessary to derive ⎜1 ⎟⎟ ⎜ ⎟⎜ ⎜⎝ k ⎠ ⎜⎜⎝ x ! (kˆ 1)! ⎟⎟⎠ ⎜⎝ x kˆ ⎟⎠
⎛ (kˆ x 1) ! ⎞⎟ ⎟⎟ simplifies as follows: the factorials in the second component. The component ⎜⎜⎜ ⎜⎜⎝ x ! (kˆ 1) ! ⎟⎟⎠
CH003-N53039.indd 34
For x 0,
⎛ (kˆ 0 1)! ⎞⎟ ⎛ (kˆ 1)! ⎞⎟ ⎜⎜ ⎟⎟ 1 ⎟⎟ ⎜⎜ ⎜ ⎜⎜ ⎜⎝ 0 ! (kˆ 1)! ⎟⎟⎠ ⎜⎜⎝ 1(kˆ 1)! ⎟⎟⎠
For x 1,
⎛ (kˆ 1 1) ! ⎞⎟ ⎛ (kˆ 1) ! ⎞⎟ ⎜⎜ ⎟⎟ kˆ ⎟⎟ ⎜⎜ ⎜ ⎜⎜ ⎜⎝ 1 ! (kˆ 1) ! ⎟⎟⎠ ⎜⎜⎝ 1(kˆ 1) ! ⎟⎟⎠
5/26/2008 4:41:17 PM
FREQUENCY DISTRIBUTIONS
For x 2,
For x 3,
35
⎛ (kˆ 2 1) ! ⎞⎟ ⎛ (kˆ 1)(kˆ)(kˆ 1) ! ⎞⎟ (kˆ 1)kˆ ⎜⎜ ⎟⎟ ⎜⎜ ⎟⎟ ⎜⎜ ⎜ ⎟⎟ ˆ 2 ⎜⎝ 2 ! (k 1)! ⎟⎟⎠ ⎜⎜⎝ 2(kˆ 1)! ⎠
⎛ (kˆ 3 1)! ⎞⎟ ⎛ (kˆ 2)(kˆ 1)(kˆ)(kˆ 1)! ⎞⎟ ⎜⎜ ⎟⎟ ⎟⎟ ⎜⎜ ⎟⎟ ⎜⎜⎜⎝ 3 ! (kˆ 1)! ⎟⎟⎠ ⎜⎜⎜⎝ 6(kˆ 1)! ⎠ ˆ ˆ ˆ (k 2)(k 1)k 6
Note 2: Computer algorithms are now available for fitting negative binomial models to experimental data. The ‘goodness-of-fit’ between the observed and calculated distributions (e.g. Table 3.4) can be tested using 2 (see Chapter 4). It is essential that in all calculations to derive k, P(x), etc. at least seven significant figures are retained in the calculator to avoid ‘rounding’ errors.
TABLE 3.4 Individual Terms for Values of x from 0 to 10, for a Negative Binomial Distribution with x 2.4175, n 400 and k 3.84 (for Details See Example 3.4)
Value x 0 1 2 3 4 5 6 7 8 9 10 10
Probability of occurrence Px P(x0) (1(x/k))k Ra P(x1) R (k/1)(Yb) P(x2) R[(k 1)(k)/2!](Y) P(x3) R[(k 2)(k 1)(k)/ 3!](Y) etc. etc.
N
CH003-N53039.indd 35
a
R (1 x / k)k .
b
Y (x /(x k)) x .
Calculated frequency (nPx)
Frequency as integer (nPx)
Observed frequency (f )
0.1528 0.2269 0.2129 0.1603
61.20 90.79 84.88 63.84
61 91 85 64
56 104 80 62
0.1058 0.06395 0.03628 0.01962 0.01022 0.005165 0.0025471
42.17 25.55 14.54 7.90 4.13 2.10 1.04
42 26 15 8 4 2 1 1
42 27 9 9 5 3 2 1
398.16
400
400
P 0.9949
5/26/2008 4:41:17 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
36
Negative binomial Number of occurrences
120 100 80 60 40 20 0
0
1
2
3
4
5
6
7
8
9
10
Bacterial cells/field FIGURE 3.6 Negative binomial distribution showing the expected distribution (line) and the observed (histogram) frequency distribution of counts of bacterial cells/field.
The approximate value for kˆ can be substituted in a maximum likelihood equation: ⎛ x⎞ N log e ⎜⎜⎜1 ⎟⎟⎟ ⎝ kˆ ⎟⎠
⎛ Ax ⎞⎟ ⎟⎟ k x ⎟⎠
∑ ⎜⎜⎜⎝ ˆ
where N is the total number of sample units, loge is the natural logarithm and Ax is the accumulated frequency (i.e. the total number of counts) exceeding x. Different approximations for kˆ are tried and the equation is balanced by iteration. The method is illustrated in Example 3.4. Tables of expected probabilities for 1480 negative binomial distributions, covering values of k from 0.1 to 200, have been published (Williamson and Bretherton, 1963). The tables are arranged in order of increasing size of a parameter ‘p’ that is equivalent to 1/q (not p as used in the present discussion). The discrepancy arises because they used an alternative form [pk(1 q)k] of the negative binomial equation where p q 1 (whereas in this text the term q p 1 is used). An estimate of Williamson and Bretherton’s ‘p’ is given by: p 1(1 x / k) . An example of a frequency distribution for a negative binomial is given in Fig. 3.4 that shows a comparison of the observed and calculated frequency distributions.
CH003-N53039.indd 36
5/26/2008 4:41:17 PM
FREQUENCY DISTRIBUTIONS
37
RELATIONSHIP BETWEEN THE FREQUENCY DISTRIBUTIONS The parameters of the various distributions are summarized in Table 3.5. Elliott (1977) illustrated the general relationships among the binomial family distributions as:
Binomial family
Binomial (s 2 x) (p q)n
k→
‘Normal’ (s 2 and x are independent) ( p q 0.5)
Negative binomial (s 2 x ) (q p)k
k→
Poisson (s 2 x ) (p → 0, q → 1)
k →0
Logarithmic series
Reproduced from Elliott (1977), by permission of the Freshwater Biological Association
This effect can be seen by comparison of the curves in Fig. 3.4 and Figs. 3.7–3.9. The shapes of both the binomial (Fig. 3.4) and the negative binomial frequency distribution curves, with 10 and k 1000 (Fig. 3.9) are very similar to that of a Poisson distribution with 10 (Fig. 3.7). It can also be seen that the binomial is asymmetric for low values of p (or q) (Fig. 3.4), that negative binomial curves are asymmetric for low values of and k (Figs. 3.8 and 3.9), and that Poisson curves are asymmetric for low values of (Fig. 3.7).
TRANSFORMATIONS Whenever it is required to make comparisons of data (e.g. tests for the standard difference between mean values), the parametric test methods require that the data conform to a normal distribution, that is the variance of the sample should be independent of the mean and the components of the variance (i.e. the variances due to actual differences between the samples and those due to random error) should be additive. The binomial distribution approximates to a normal distribution when the number of sample units is large (n 20) and the variance is greater than 3. Since s2 npq, the normal approximation can be used when p 0.4–0.6 and n is greater than 12, or when p 0.1–0.9 and n 33. The normal approximation cannot be used if n 12.
CH003-N53039.indd 37
5/26/2008 4:41:18 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
38
The Poisson distribution is asymmetric for low values of its parameter (estimated by m x s2) but approaches the Binomial when is large and the Binomial itself approaches the normal distribution when n is large (Fig. 3.7). The normal approximation to Poisson can be used when is generally > 10. For small values of k, the negative binomial distribution is asymmetric but it approaches normality for large values of k when the mean () is also large (e.g. 10, k 1000 in Fig. 3.9). The first condition of normality (i.e. symmetrical distribution of values around the mean) can be attained by all three distributions in certain circumstances, so that some methods
14
k 3.0 m 1.0
10 Frequency (%)
Frequency (%)
50 40 30
8 6
20
4
10
2
0
k 3.0 m 4.9
12
0
2
4
6
4
6
0
0
2
4
6
8
10 x
12
14
16
18
20
12
14
16 x
18
20
22
24
26
28
30
32
x 14
k 3.0 m 9.5
12
Frequency (%)
10 8 6 4 2 0
0
2
8
10
FIGURE 3.7 Negative binomial distributions for k 3.0 and for various mean values () based on expansion of the formula (q p)k.
CH003-N53039.indd 38
5/26/2008 4:41:18 PM
FREQUENCY DISTRIBUTIONS
39
Frequency (%)
10
k 1.9 m 10
8 6 4 2 0
0
2
4
6
Frequency (%)
10
k5 m 10
8 6 4 2 0
0
2
4
6
10 Frequency (%)
8 10 12 14 16 18 20 22 24 26 28 30 x
8 10 12 14 16 18 20 22 24 26 x k 10 m 10
8 6 4 2 0
0
2
4
6
8 10 12 14 16 18 20 22 x
Frequency (%)
12 10
k 50 m 10
8 6 4 2 0
0
2
4
6
Frequency (%)
20
8 10 12 14 16 18 20 x k 1000 m 10
15 10 5 0
0
2
4
6
8 10 12 14 16 18 20 x
FIGURE 3.8 Negative binomial distributions for 10 and values of k from 1.9 to 1000 based on the expansion of the formula (q p)K.
CH003-N53039.indd 39
5/26/2008 4:41:18 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
40
Frequency (%)
40
20 λ5
λ1 20
0
10
0
2
4
0
0
2
4
6
8
10
Frequency (%)
12 λ 10
6
9
λ 20
6 3
0
0
2
4
6
8
0 10 12 14 16 18 20 22 24 26 28 30 32 10 12 14 16 18 Number of individuals (x) per sampling unit
FIGURE 3.9 Poisson distribution series for values of from 1 to 20. The frequency of each count is expressed as percentage of total number of counts.
associated with the normal distribution (e.g. standard error of the mean and confidence limits) may then be applied. However, as mean and variance increase together in all three distributions, the condition requiring independence of mean and variance can never be fulfilled. Consequently, standard methods intended to answer questions such as ‘Does the arithmetic mean value of one set of colony counts differ from that of a second set?’ and other parametric tests cannot be done without risk of introducing considerable errors. The problem can be overcome by transforming the data, using an appropriate mathematical model such that the distribution frequency is normalized (see Fig. 3.10) and the interdependence of mean and variance is removed. Plotting the mean against the variance on a log-log scale, can provide an assessment as to whether or not the mean and variance of the original and transformed data are independent. If, as in Figs 9.1–9.3, the log variance increases with increasing mean values, the mean and variance are not independent. Transformation also results in the components of variance becoming additive, thereby permitting application of analysis of variance. The choice of transformation to be used is governed by the frequency distribution of the original data. In many routine operations, the number of sample units may be too small to permit the data to be arranged in a frequency distribution. In such circumstances, the relationship between the mean value (x) and the variance (s2) of the data can be used as a guide in the choice of a suitable transformation (Table 3.5).
CH003-N53039.indd 40
5/26/2008 4:41:18 PM
FREQUENCY DISTRIBUTIONS
41
TABLE 3.5 Transformation Functions Original distribution Transformation – Replace ‘x’ with
Special conditions
Known
Not known
Poisson
s2 (x)
x
No counts 10
Poisson
s2 (x)
x 0.5
Some counts 10
Binomial
s2 (x)
Negative binomial
sin1 sinh1
Not known Not known
s2
( x 0 . 375)
k5
( k 2(0 . 375))
log (x (k/2))
5k2
(x)
log x
No zero counts
(x)
log (x 1)
Some zero counts
Negative binomial
s2
x
Source: Modified from Elliott (1977).
Although in some microbiological situations the distribution of microorganisms conforms to Poisson, in most circumstances, additional method-based components may affect the pure Poisson sampling variance such that the distribution conforms either to a lognormal or a negative binomial distribution. For routine purposes one can be reasonably confident that a logarithmic transformation will be appropriate. That is to say that the data value x is replaced by a value y, where y log x or y log (x 1) depending upon whether or not any zero counts are involved (Table 3.5). Occasionally it may be necessary to back transform the derived arithmetic mean value to the original scale (see calculation of geometric mean, Example. 2.1), although transformed values are frequently cited in microbiological texts as, for instance, log cfu/g. If back transformation is required, it is essential that the transformation is totally reversed, that is for the x 0 . 5 transformation of Poisson data, square the transformed value and subtract 0.5; for log (x 1), take the antilog and then subtract 1 (although this correction is usually insignificant). Transformation of data is an essential requirement for most parametric statistical analysis of quantitative data obtained in microbiological analysis. Non-parametric procedures offer alternative means of data analysis, since such methods are by definition distributionfree, but they may be unreliable since they make certain assumptions about the shape and dispersion of the distribution, which must be the same for all the groups compared.
CH003-N53039.indd 41
5/26/2008 4:41:19 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
42
(a) 100
50
0
0
2
4
6
8
0.4
0.6
0.8
10 12 Count (x )
14
16
18
20
Frequency (f) (%)
(b) (i) 100
50
0 0.2
1.0
1.2
1.4
Transformed count log x
1.6
k 2
(b) (ii) 100
50
0 0.0
0.2
0.3
0.4
0.5
1.0
1.2
1.4
Transformed count (log(x 1)) FIGURE 3.10 Frequency distributions of microscopic counts of bacterial cells/field (a) before and (b) after transformation of data. Transformation (b)(i) used the formula log(x k/2) and transformation (b)(ii) used log(x 1), where x is the actual number of bacteria/field.
CH003-N53039.indd 42
5/26/2008 4:41:19 PM
FREQUENCY DISTRIBUTIONS
43
EXAMPLE 3.5 TRANSFORMATION OF A NEGATIVE BINOMIAL DISTRIBUTION The data cited in Example 3.4 are transformed using (a) log (x (k/2)) using the best value calculated for k and (b) log (x 1), which assumes that the distribution is unknown but that s2 x . The original frequency distribution, with x 2 . 4175 and kˆ 3 . 915 , and the derived frequencies (a) and (b) are: Observed frequency (f)
Mean (x )
log (x (k/2)) (a)
log (x 1) (b)
56 104 80 62 42 27 9 9 5 3 2 1
0 1 2 3 4 5 6 7 8 9 10 19
0.292 0.471 0.597 0.695 0.775 0.842 0.901 0.952 0.998 1.040 1.108 1.321
0.000 0.301 0.477 0.602 0.699 0.778 0.845 0.903 0.954 1.000 1.104 1.301
A comparison of the frequency curves is given in Fig. 3.6; note that the asymmetry is markedly reduced by transformation.
References Anscombe, FJ (1949) The statistical analysis of insect counts based on the negative binomial distribution. Biometrika, 5, 165–173. Anscombe, FJ (1950) Sampling theory of the negative binomial and logarithmic series distributions. Biometrika, 37, 358–382. Bliss, CI and Fisher, RA (1953) Fitting the binomial distribution to biological data and a note on the efficient fitting of the negative binomial. Biometrics, 9, 176–200. Debauche, HR (1962) The structural analysis of animal communities in the soil. In Murphy, PW (ed.) Progress in Soil Zoology. Butterworth, London, pp. 10–25. Dodd, AH (1969) The theory of disinfectant testing with a mathematical and statistical section, 2nd edition. Swifts (P and D) Ltd, London. Elliott, JM (1977) Some methods for the statistical analysis of samples of benthic invertebrates. (2nd edition). Freshwater Biological Association Scientific Publication No. 25. Ambleside, Cumbria, UK.
CH003-N53039.indd 43
5/26/2008 4:41:20 PM
44
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
Fisher, RA, Corbett, AS and Williams, CB (1953) The relation between the number of species and the number of individuals in a random sample of an animal population. J. Animal Ecol., 12, 42–58. Fisher, RA and Yates, F (1974) Statistical tables for biological, agricultural and medical research, 6th edition. Longman Group, London. Gurland, J (1959) Some applications of the negative binomial and other contagious distributions. Amer. J. Pub. Health, 49, 1388–1399. Jones, PCT, Mollison, JE and Quenouille, MH (1948) A technique for the quantitative estimation of soil microorganisms. Statistical note. J. Gen. Microbiol., 2, 54–69. Morgan, MR, MacLeod, P, Anderson, EO and Bliss, CI (1951) A sequential procedure for grading milk by microscopic counts. Storrs Agricultural Experimental Station Bulletin No. 276. National Bureau of Standards (1950) Tables of Binomial Probability Distribution. Applied Mathematics Series No. 6, Washington, DC. Pearson, ES and Hartley, HO (1966) Biometrika Tables for Statisticians, 3rd edition. University Press, Cambridge. Takahashi, K, Ishida, S and Kurokawa, M (1964) Statistical consideration of sampling errors in total bacteria cell count. J. Med. Sci. Biol., 17, 73–86. Williamson, E and Bretherton, MH (1963) Tables of the Negative Binomial Probability Distribution. Wyman, New York. Ziegler, NR and Halvorson, HO (1935) Application of statistics to problems in bacteriology. IV Experimental comparison of the dilution method, the plate count and the direct count for the determination of bacterial populations. J. Bacteriol., 29, 609–634.
CH003-N53039.indd 44
5/26/2008 4:41:20 PM
4 THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
The results of a microbiological analysis will indicate the level and/or types of organisms in a sample matrix and may reflect the dispersion of organisms within, or on, that matrix. If one considers a series of replicate samples taken from a stored food, changes in numbers will occur with increasing storage time. Since not all organisms will grow at the same rate at any specific temperature and, indeed, some strains of organism may decrease in number, the relative proportions of organisms in the population will also change. In a localized ecological situation, the growth of an organism will result in development of microcolonies that may then affect the growth of other organisms by removing or providing essential nutrients, by antibiosis, etc. Hence knowledge of the temporal changes occurring over time in a population is of practical importance. Three types of spatial distribution of organisms may occur (Elliott, 1977): 1. Random distribution. 2. Regular, uniform or even distribution (sometimes called under-dispersion). 3. Contagious (aggregated or clumped) distribution, or over-dispersion. The three basic distribution types are illustrated diagrammatically in Fig. 4.1, but contagious distribution can also take other forms (Fig. 4.2). It is possible for two or more of the three basic types of distribution to occur simultaneously. For example, Jones et al. (1948) showed that although bacterial colonies were randomly distributed in soil (i.e. occurred in a Poisson series) individual cells within a colony conformed to a logarithmic series and the total bacterial population followed a negative binomial (contagious) distribution. The dispersion of a population determines the relations between variance (2) and mean (). Various mathematical distributions, such as those discussed in Chapter 3, can be used to Statistical Aspects of the Microbiological Examination of Foods Copyright © 2008 by Academic Press. All rights of reproduction in any form reserved.
CH004-N53039.indd 45
45
5/26/2008 4:46:39 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
46
s2 m
(a) Random
s2 m
s2 m
(b) Regular
(c) Contagious
FIGURE 4.1 Three types of spatial distribution: (a) Random; (b) Regular (upper-ideal form, lower-normal form); (c) Contagious (reproduced from Elliott, 1977 with permission of the Freshwater Biological Association).
(a)
(b)
(c)
FIGURE 4.2 Different types of contagious distribution: (a) Small clumps; (b) Large clumps with individuals randomly distributed in each clump; (c) Large clumps with individuals uniformly distributed in each clump (reproduced from Elliott, 1977 with permission of the Freshwater Biological Association).
CH004-N53039.indd 46
5/26/2008 4:46:39 PM
THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
47
model the relationships between variance and mean. A Poisson series (2 ) is a suitable model for a random distribution and the binomial (2 ) is an approximate model for a regular distribution. The negative binomial is one of several models that may be used to describe a contagious distribution (2 ). RANDOM DISTRIBUTION In a random distribution there is an equal chance for any one individual microbial cell to occupy any specific unoccupied position in a suspension and, therefore, the presence of one organism will not affect the position of adjacent organisms. Since randomness implies the lack of any systematic distribution, some cells occur in close proximity whereas others are more distantly spread (Fig. 4.1a). A bacterial cell measures about 0.5–1.5 m long by about 1 m in diameter so that the relative volume occupied by such a cell will be about 1 m3. Since a volume of 1 cm3 comprises 1012 m3, then 1 million (106) bacterial cells will occupy only about 0.0001% of the available volume. It is reasonable, therefore, to assume that individual cells, clumps and microcolonies in a suspension will be distributed randomly, at least at levels up to about 109 colony forming units (cfu)/ml. The accepted test for randomness is agreement with a Poisson distribution, which requires compliance with the four conditions given previously (Chapter 3). Assuming that a hypothesis of randomness is not rejected, there remains a possibility that non-randomness occurs but that it cannot be detected. The most plausible cause of non-randomness is that of chance effects. Such effects might include environmental factors that affect microbial distribution resulting in a tendency for organisms to form microcolonies or to migrate in or on a food. On many occasions, therefore, if a test of randomness is satisfied one must conclude that any non-randomness cannot be detected using the sampling, analytical and statistical techniques available. It is important to take into consideration the effect of sample size when considering randomness. If the sample unit is much larger than the average size of clumps of individuals, and these clumps are randomly distributed, then the apparent population dispersion will be random and non-randomness will not be detected. Maceration and dilution of a sample to estimate numbers of colony forming units will result in disruption of the cell clumps and aggregates, such that the results obtained may indicate a random distribution of organisms. Only for low-density populations can randomness be a true hypothesis and the implications must be considered carefully before the hypothesis is accepted. However, in practical food microbiology, the advantages associated with use of the random dispersion (i.e. the Poisson series, with simple methods of calculation of confidence limits, etc.) have often been considered to outweigh the disadvantages of rejecting the hypothesis for randomness. Such expediency can lead to inaccurate conclusions since sources of variation other than simple sampling will usually inflate the overall variance and lead to non-Poisson distribution!
CH004-N53039.indd 47
5/26/2008 4:46:41 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
48
43 40
it
35
Up p
Confidence limits
er
lim
30
25
20
15 it
er ow
10
lim
L
5
0 0
5
10
15 c or m
20
25
30
FIGURE 4.3 95% confidence limits for c from a Poisson series. c is a single count or the mean (x m) of a small sample (n) where nm 30 (reproduced from Elliott, 1977 with permission of the Freshwater Biological Association).
Tests for Agreement with a Poisson Series Confidence limits for a Poisson variable. An approximate test for agreement with a Poisson series (Fig. 4.3) is provided by confidence limits for the mean. For the data used in section ‘For large sample numbers’ in Example 4.1, the mean value is 2.68 and the counts range from 0 (i.e. 1) to 6. From Fig. 4.3, the 95% confidence limits for m 2.68 range from 1 to about 8. Consequently, it is not unreasonable to suppose that since all the counts lie within these limits, the counts come from a Poisson series; and therefore that the parent population could be distributed randomly. This test will not distinguish regular from random populations and the dispersion could still be random if only one or two counts lie outside the confidence limits. Although rapid, this test may be unreliable.
CH004-N53039.indd 48
5/26/2008 4:46:41 PM
THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
49
EXAMPLE 4.1 INDEX OF DISPERSION AND 2 TESTS FOR AGREEMENT WITH A POISSON SERIES For Small Sample Numbers (n 30) Apparent Under-dispersion In a series of replicate plate counts the following numbers of colonies were counted on the 108 dilution plates (data of Ziegler and Halvorson, 1935; their experiment 4a): 69, 69, 68, 79, 81, 90, 81, 80, 72, 78 x 76 . 7; s2 49 . 79; n 10 and ν 101 9 degrees of freedom First we determine the Index of Dispersion (I) value for the data using the equation: I
s2 49 . 79 0 . 65 x 76 . 7
A value of I 1, suggests the occurrence of under-dispersion and therefore the possibility of a ‘regular’ distribution. We use the 2 to test for compliance with a random distribution: 2
s2 (n 1) I ( n 1) 0 . 65 9 5 . 85 x
From tables (e.g. Pearson and Hartley, 1976) the observed probability value for 2 5.85 is slightly greater than P 0.75 with 9 degrees of freedom (ν 9). Hence although the index of dispersion is less than 1, agreement with a Poisson series is not disproved at the 5% significance level and the hypothesis of randomness cannot be rejected.
Apparent Over-dispersion Numbers of colonies per plate from Ziegler and Halvorson (1935), their Experiment 4b at 107 dilution: 287, 307, 340, 332, 421, 309, 327, 310, 320, 358, 302, 304 The sample statistics are: x 326 . 41 ; s2 1254.8; n 10; ν (n 1) 9. The index of dispersion ( I ) s2 / x 1254 . 8 / 326 . 4 4 . 84 >> 1 which is indicative of over distribution (i.e. contagious distribution). The 2 test is applied, as before: 2
s2 (n 1) I (n 1) 4 . 84 9 43 . 60 x
From tables, the observed value for 2 is P 0.001 with 9; therefore, we must reject the hypothesis that the distribution of colonies is random.
CH004-N53039.indd 49
5/26/2008 4:46:42 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
50
For Large Sample Numbers (n > 30) The frequency distribution of bacterial colonies in soil determined by microscopy (Table 4.1; Jones et al., 1948) had a mean count (x ) of 2.68 and a variance (s2) of 2.074 with n 80.
The Index of Dispersion Test The null hypothesis (H0) is that the numbers of colonies are distributed randomly; the alternative hypothesis (H1) is non-randomness. The index of dispersion (I ), using the equation used previously s2 / x 2 . 074 / 2 . 68 0 . 774 for which 2 I (n1) (0.774)79 61.15. Since the sample is large (n 80), the standardized normal deviate (d ) is calculated: d
2 2 (2y 1)
122 . 3 159
11 . 059 12 . 610 1 . 551 1 . 551 Since the absolute value (1.55) of d is less than 1.96 (i.e. 1.55 1.96) the null hypothesis for agreement with a Poisson series is not rejected at the 95% probability level.
The Goodness-of-fit Test The index of dispersion result is checked by a goodness-of-fit test. The observed and expected colony numbers in the frequency distribution (Table 4.1) are used to determine a 2 value for each frequency class based on the ratio (Observed Expected)2/Expected. These values are summed to give a cumulative 2 value of 4.83, for which P 0.30, with 4 degrees of freedom (i.e. ν 6 2 4). Hence, the null hypothesis of randomness can not be rejected. TABLE 4.1 2 Test for Goodness-of-Fit to a Poisson Distribution Number of Colonies
Observed (O)
Expecteda (E)
2 (O E)2/E
0 1 2 3 4 5 and over
7 8 24 20 11 10
5.5 14.7 19.7 17.6 11.8 10.7
0.409 3.054 0.939 0.327 0.054 0.046
Totals
80
80.0
4.829
a
CH004-N53039.indd 50
Frequency
Derived using: P(x)
em (mx/x `!) and m 2.68.
5/26/2008 4:46:42 PM
THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
51
Fisher’s Index of Dispersion (I) provides a measure of the equality of variance and mean in Poisson series, viz: sample variance s2 I theoretical variance x
∑ (x x)
2
x(n 1)
where s2 sample variance, x sample mean and n number of sample units. The significance of the extent to which this ratio departs from unity can be determined from a table of 2 (Chi2), since I(n1) is approximated by 2 with (n1) degrees of freedom, under the null hypothesis: 2 I(n 1)
s2 (n 1) x
∑ (x x)2 (n 1) ∑ (x x)2 (n 1)x
x
From tables of 2, or more approximately from Fig. 4.4, agreement with a Poisson series is not disproved at the 95% probability level (0.975 P 0.025) if the 2 value lies between the upper and lower significance levels for (n 1) degrees of freedom. When agreement is perfect I 1 and 2 (n1). If the sample is large (n 30), it can be assumed that the absolute value of 2 2 is distributed normally about 2 ν 1 with unit variance, where ν degrees of freedom. Agreement is then accepted (P 0.05) if the absolute value of d is less than 1.96, where d is the normal variate with zero mean and ν is the degrees of freedom. The value of d is derived from: d
2 2 (2 ν 1
Departure from a Poisson series (i.e. non-randomness) is shown by: (a) 2 less than expected (i.e. d 1.96 with a negative sign): suspect a regular distribution with s2 < x ; (b) 2 greater than expected (d 1.96 with positive sign): suspect contagious distribution. When the sample number is large, the result of the index of dispersion test should be checked by the 2 test for ‘goodness-of-fit’ since this is a more robust method of testing for compliance with a specific distribution. The Likelihood Ratio Index (G2 Test) as described by Anon (2005) provides an alternative method, which is slightly more complicated but more accurate. Firstly it is necessary to carry out an evaluation of colony numbers on replicate plates for evidence of under- or over-dispersion. To test the homogeneity of colony counts, the G2 test uses the equation: ⎡ n ⎛ O ⎞⎤ Gn21 2 ⎢⎢ ∑ Oi ln ⎜⎜⎜ i ⎟⎟⎟⎥⎥ ⎜⎝ Ei ⎟⎠⎥ ⎢⎣ i1 ⎦ where Oi is the ith observed colony count, Ei is the ith expected colony count, i is the number of colony counts (from 1, 2, …, n) and n is the actual number of colony counts.
CH004-N53039.indd 51
5/26/2008 4:46:42 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
52
50 45 40 35
Contagious
rl ev el
25
Up
pe
x2
30
20
om
nd
Ra
15 10
r we Lo
5
el lev
Regular
0 0
5
10 15 20 Degrees of freedom (y)
25
30
FIGURE 4.4 The 5% significance levels of 2. If 2 value between the upper and lower significance levels then agreement with a Poisson series is accepted at 95% probability level (P 0.05) (reproduced from Elliott, 1977 with permission of the Freshwater Biological Association).
The assumption is made that the observed counts should follow the volumes of sample tested. The expected value (Ei) is calculated as the expected fraction of the sum of all colonies that each of the test volumes should contain at each dilution level: Ei
Vi ∑ Vi
∑ Ci
where Vi the volume of dilution i tested in each replicate plate, Vi is the sum of volumes tested at that dilution and Ci is the sum of colonies counted at that dilution level. In practice it is not necessary to calculate Ei since the formula can be inserted in the general equation: ⎡ C Gn21 2 ⎢⎢ ∑ Ci ln i Vi ⎢⎣
CH004-N53039.indd 52
∑ Ci
ln
∑ Ci ⎤⎥ ∑ Vi ⎥⎥⎦
5/26/2008 4:46:43 PM
THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
53
where Ci is the number of colonies counted on the ith plate, Vi is the volume tested on the ith plate, i is the number of sets of plate counts (i 1, 2, …, n) and n is the total number of plates used. The value determined for G2 can be compared with values of the 2 distribution for n1 degrees of freedom (see section ‘General Homogeneity Test’ in Example 4.2). The ISO standard method (Anon., 2005) includes a BASIC computer programme for calculating the G2 index.
EXAMPLE 4.2 DETERMINATION OF THE OVERALL AGREEMENT BETWEEN COLONY COUNTS ON PARALLEL PLATES USING THE ISO STANDARD METHOD (ANON, 2005) General Homogeneity Test Assume that duplicate colony counts are made on two 10-fold dilutions of a bacterial suspension; the numbers of colonies counted, together with the relative volumes of suspension tested, are:
Dilution
Colony counts
103 104
211 26
Total
226 24
Total count 437 50
Relative volumes 10 1
10 1
487
Total volume 20 2 22
Are these colonies randomly dispersed? The null hypothesis Ho is that the colonies are derived from a randomly dispersed population of organisms; the alternative hypothesis H1 is that they are not derived from a random population. Randomness of the whole set is measured by the likelihood ratio index (G 2) using the equation: ⎡ ⎛ C ⎞ Gn21 2 ⎢⎢ ∑ ⎜⎜⎜ Ci ln i ⎟⎟⎟ ⎜⎝ Vi ⎟⎠ ⎢ ⎣
⎛ ⎞⎤ ⎜⎜ ∑ Ci ⎟⎟⎥ C ln ∑ i ⎜⎜⎜ V ⎟⎟⎟⎥⎥ ⎝ ∑ i ⎠⎦
Hence,
⎡⎛ ⎞ ⎛ ⎞⎤ 2 ⎢⎜ 211 ln 211 226 ln 226 26 ln 26 24 ln 24 ⎟⎟ ⎜⎜ 487 ln 487 ⎟⎟⎥ Gn 1 2 ⎢⎜ ⎟⎠ ⎜⎝ ⎟⎠⎥ ⎜⎝ 22 10 10 1 1 ⎦ ⎣ 2[(642 . 39 704 . 66 84 . 71 7 6 . 27) 1508 . 35] 2(1509 . 04 1508 . 35) 2 0 . 69 1 . 38
CH004-N53039.indd 53
5/26/2008 4:46:43 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
54
Since there are four terms in the sum, there are 41 3 degrees of freedom. The calculated value for G 2 (1.38) is compared with the 2 value for 3 degrees of freedom. This shows, with a probability level of 0.95 P 0.10, that the null hypothesis should not be rejected and therefore the colonies come from a randomly distributed population. The ratio of counts between the 2 dilution levels is given by 437/50 8.74:1. Whilst this is lower than the ideal ratio of 10:1, the difference could have arisen purely by chance and there is no reason to consider the data to lack homogeneity. We can check this using the Fisher Index of Dispersion (Example 4.1): the mean colony count at the lower level of dilution (103) is: [211 226 (26 10) (24 10)]/4 234.25 and the variance (s2) of the counts with 3 degrees of freedom 435.92. Then the Fisher Index of Dispersion (I ) 234.5/435.9 0.54 and the Fisher 2 value I (n1) 0.54 3 1.62. For 2 of 1.62 and ν 3, the probability is 0.70 P 0.50. This does not disprove a hypothesis of randomness – possibly because of the small number of values tested – so the null hypothesis is accepted.
General Test of Homogeneity Followed by an Analysis of Deviance This is a similar data set but with three parallel tests at each dilution: Dilution 4
10 105
Colony count 190 10
Total
220 21
Total 165 9
575 40
Relative volume 10 1
10 1
Total
10 1
615
30 3 33
G 2 2 [190 ln(190/10) 220 ln(220/10) … 21 ln(21/1) 9 ln(9/1) 615 ln(615/33)] 19.64 For ν 61 5 degrees of freedom, the 2 value of 19.64 shows with a probability of P 0.01 that the hypothesis of randomness should be rejected and suggests overdispersion of the counts. The G2 equation is then used repeatedly to evaluate the causes of the over-dispersion between the three components of the data set, that is, the replicate plates at dilution 104, the replicate plates at dilution 105 and the differences between the dilution levels. For the counts at 104: G 2 2 [190 ln(190/10) 220 ln(220/10) 165 ln(165/10) 575 ln(575/30)] 7.91, with ν 2 For the counts at 105: G 2 2[10 ln (10) 21 ln(21) 9 ln(9) 40 ln(40/3)] 6.25, with ν 2 For the dilution ratio: G 2 2 [575 ln(575/30) 40 ln(40/3) 615 ln(615/33)] 5.48, with ν 1
CH004-N53039.indd 54
5/26/2008 4:46:43 PM
THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
The derived values of G
2
55
can be entered into an Analysis of Deviance Table:
Source of variance
G2
ν
P
Between replicate counts at 104 Between replicate counts at 105 Between dilutions
7.91
2
0.02 P 0.01
6.25
2
0.05 P 0.025
5.48
1
0.02 P 0.01
19.64
5
0.005 P 0.001
Total
This analysis of deviance demonstrates that the causes of over-dispersion are associated largely with the replicate counts at the lower dilution (104) and the non-linear effect of the dilution from 104 to 105. Such differences in counts at successive dilutions are not unusual in microbiological practice and indicate a need for investigation of the practices used in the laboratory.
The G2 index can be used also to provide an overall index of homogeneity of parallel plating. Since all the volumes tested in a parallel series at a given dilution are nominally identical, the formula can be simplified and rewritten as: ⎡ Gn21 2 ⎢⎢ ∑ Ci ln Ci ⎢⎣
∑ Ci
ln
∑ Ci ⎤⎥ n
⎥ ⎥⎦
Although a solitary value is of limited use, if there are m sets of plates with equal numbers of parallel tests, then the additive property of G2 can be used to provide an overall measure of homogeneity across the series. This value G2p can be derived as: 2 Gp2 Gm (n1)
m
∑ G(2n1) j 1
The derivation of G2 and Gp2 is illustrated in section ‘General test of homogeneity followed by an analysis of deviance’ in Example 4.2. The 2 test for goodness-of-fit can be undertaken when sufficient samples have been tested to permit comparison of the observed frequency distribution with the expected frequency derived mathematically (see section ‘The goodness-of-fit test’ in Example 4.1). The
CH004-N53039.indd 55
5/26/2008 4:46:44 PM
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
56
model (Table 4.1) is a good fit to the experimental data when the observed and estimated frequencies agree, and is tested by 2 where: 2
∑
(Observed Expected)2 Expected
The 2 value is calculated for each frequency class and the cumulative 2 value is derived, with ν degrees of freedom, where: ν (number of frequency classes) (number of estimated parameeters) 1
Since in a Poisson series only one parameter () is estimated, ν (number of frequency classes) 2 It has been recommended that the number of expected values should not be less than 5, and that frequencies should be combined when necessary. Cochran (1954) considers that this weakens the test too far and recommends combinations of expected values such that none is less than 1. Agreement with a Poisson series is not rejected at the 95% probability level (P 0.05) if the 2 value is less than the 5% point for 2 with degrees of freedom. If is greater than 30, agreement is accepted (P 0.05) when the absolute value of d is less than 1.645, where: d
2 2 (2ν 1)
The ‘goodness-of-fit’ is an all-purpose test which can be applied to test agreement of experimental data with any specific statistical distribution. By contrast, the index of dispersion test (see above) is aimed directly at a property of the Poisson distribution, namely the expected equivalence of variance and mean. Consequently, one would normally expect that if the hypothesis of randomness is not rejected by the index of dispersion test, then the ‘goodness-of-fit’ test should also not reject the hypothesis. REGULAR DISTRIBUTION If high levels of individual cells in a population are crowded together, yet neither clumped nor aggregated, the dispersion of the population tends towards a regular (i.e. homogenous) distribution. In such circumstances, the number of individuals per sampling unit approaches the maximum possible and the variance is less than the mean ( 2 ). The binomial distribution provides an approximate mathematical model. The characteristic features of the distribution are illustrated diagrammatically in Fig. 4.1b. In terms of microbiological analysis, such a situation might be expected when considerable growth has occurred (e.g. on the cut surface of a piece of meat or when colonies are
CH004-N53039.indd 56
5/26/2008 4:46:44 PM
THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
57
crowded on a culture plate). Such distributions are rarely seen in microbiological analysis, because the normal laboratory procedures destroy any regular distribution that may have occurred, although they may occur as experimental artefacts. The Binomial Distribution as a Model for a Regular Dispersion The expected frequency distribution of a positive binomial is given by n(q p)k where n number of sampling units, k maximum possible number of individuals in a sampling unit, p probability for occurrence of a specific organism in a sample unit, and q (1 p) is the probability of any one place in a sampling unit not being occupied by a specific individual. Estimates of the parameters p, q and k are obtained from the sample. The highest count on the sample provides a rough estimate for k but is frequently low. A more accurate estimate can be derived from the mean and variance, where the estimate of k kˆ (x 2 )/(x s2 ) , to the nearest integer (whole number). Expected probabilities are given by the expansion of (p q)k. Estimates of p and q are given by: pˆ x/kˆ and qˆ 1 pˆ CONTAGIOUS (HETEROGENEOUS) DISTRIBUTIONS The spatial distribution of a population of organisms is rarely regular or truly random, but is usually contagious and has a variance significantly greater than the mean (2 ). In a contagious distribution, clumps and aggregates of cells always occur, but the overall pattern can vary considerably (Fig. 4.1c and 4.2). The frequent detection of contagious distributions in microbiology reflects the colonial growth of microbes, and indirectly reflects the consequence of environmental factors on growth; but it may be due also to artefacts associated with the counting procedure. The overall dispersion pattern is dependent on the sizes of the clumps and aggregates, the spatial distribution of, and distance between, the clumps themselves, and the spatial distribution of organisms within a clump. A pattern found frequently is that shown in Fig. 4.1c, where patches of high density (clumps, microcolonies) are dispersed on a background of low density. Several mathematical models have been proposed to describe contagious distributions. Of these, the negative binomial is the most useful. Other models, such as Taylor’s Power Law (Taylor, 1971) and, for asymmetrical (i.e. skew) distributions, the Thomas (1949), Neyman (1939): Type A and Pôlya-Aeppli (Pôlya, 1931) distributions may be of value. Elliott (1977) gives some examples of the application of such models. The Negative Binomial Distribution as a Model for Contagious Distribution This distribution, described fully in Chapter 3, has two parameters (the mean and an exponent k) and one mode (i.e. most frequent count). The negative binomial can be applied as a model to a wide variety of contagious distributions.
CH004-N53039.indd 57
5/26/2008 4:46:44 PM
58
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
1. True contagion. The presence of one individual increases the chances that another will occur in the same place. Since bacterial and yeast cells frequently adhere to one another, contamination may be by a single cell or by more than one cell. Furthermore, growth of a microcolony from a single cell will occur frequently. 2. Constant birth-death-immigration rates. This is likely to be of significance only in the context of growth and senescence, and/or in chemostat situations, which are outside the scope of this account. 3. Randomly distributed clumps. If clumps of cells are distributed randomly and the numbers of individuals within a clump are distributed in a logarithmic fashion, then a negative binomial distribution will result (Jones et al., 1948; Quenouille, 1949). The mean number of clumps (m1) per sampling unit and the mean number of individuals per clump (m2) are given by: ⎛ ⎛ ⎞⎞ m1 k ln ⎜⎜⎜1 ⎜⎜ ⎟⎟⎟⎟⎟⎟ ⎜⎝ k ⎠⎟⎠ ⎝ and ⎛ ⎞⎟ ⎜⎜ ⎟ ⎜⎝ k ⎟⎠ () m2 ⎛ ⎛ ⎛ ⎞⎞ ⎛ ⎞⎞ m1 ln ⎜⎜⎜1 ⎜⎜ ⎟⎟⎟⎟⎟⎟ k ln ⎜⎜1 ⎜⎜ ⎟⎟⎟⎟⎟⎟ ⎜⎝ k ⎠⎟⎠ ⎜⎝ k ⎠⎟⎠ ⎜⎝ ⎝ where m1 m2 and k is the exponent. The size of the sample unit will affect the values of and m1, but not of m2. If m2 is constant, the ratio /k will also be constant. Hence k is directly proportional to and the sample size will affect both k and . 4. Heterogeneous Poisson Distributions. Compound Poisson distributions, where the Poisson parameter () varies randomly, have a 2 distribution with 2 degrees of freedom (Arbous and Kerrich, 1951). Southwood (1966) considers that in some respects this is a special variant of (3) above, where ⎛ ⎞⎟ ⎛ 2k ⎞⎟ ⎟ ⎜⎜ ⎟ m1 ⎜⎜⎜ ⎜⎝ m2 ⎟⎟⎠ ⎜⎝ 2 ⎟⎟⎠ Since the value of m1 would be close to unity and would not be influenced by sample size, and m2 must always be slightly less than , it is difficult to see its relevance in microbiological distributions. Test for agreement of large samples (n 50) with a negative binomial. The simplest and most useful test is that for goodness-of-fit. The maximum likelihood method (Chapter 3 Example 3.4) provides the most accurate estimate of k and should always be used with large
CH004-N53039.indd 58
5/26/2008 4:46:44 PM
THE DISTRIBUTION OF MICROORGANISMS IN FOODS IN RELATION TO SAMPLING
59
samples. The arithmetic mean (x̄) of the sample provides an estimate of μ. As the two parameters are estimated from the sample data, the number of degrees of freedom (ν) = number of frequency classes after combination − 3.

Test for agreement of small samples (n < 50) with a negative binomial. If the data can be arranged in a frequency distribution, the χ² test can be applied as described above. When this cannot be done, other tests are used based upon observed and expected moments¹ (the first moment about the origin = the arithmetic mean (x̄); the second moment about the mean = the variance (s²); the third moment is a measure of skewness; and the fourth moment is a measure of kurtosis or flatness).

¹ Statistical moments describe the shape of a distribution through estimates of its mean, variance, skewness and kurtosis.

The statistic U is the difference between the sample estimate of variance (s²) and the expected variance of a negative binomial:

U = s² − [x̄ + (x̄²/k̂)]

where k̂ is estimated from the frequency of zero counts (see below). The statistic T is the difference between the sample estimate of the third moment and the expected third moment:

T = [(Σx³ − 3x̄Σx² + 2x̄²Σx)/n] − s²[(2s²/x̄) − 1]

The expected values of U and T are zero for perfect agreement with a negative binomial, but agreement is accepted if the values of U and T differ from zero by less than their standard errors. The standard errors of U and T can be calculated from the formulae of Anscombe (1950) and Evans (1953) or derived from the nomogram in Fig. 4.6. A large positive value for U or T indicates greater skewness than the negative binomial and suggests that the lognormal distribution might be a more suitable model. A large negative value indicates that other distributions (e.g. Neyman Type A) might be more appropriate because of the reduced skewness.

1. Moment estimate of k. The estimate of k is given by k̂ = x̄²/(s² − x̄). This can be derived, as described previously (Chapter 3), from the theoretical variance or from the statistics X and Y, where:

X = x̄² − (s²/n)

and

Y = s² − x̄
The expectations of X and Y are given by:

E(X) = μ²

and

E(Y) = μ²/k

Hence, k = μ²/E(Y) = E(X)/E(Y). For small samples, k̂ = X/Y = [x̄² − (s²/n)]/(s² − x̄), where n = number of sample units. As n increases, s²/n decreases and the equation for k̂ approaches x̄²/(s² − x̄). This method is more than 90% efficient for small values of x̄ when k̂/x̄ > 6, for large values of x̄ when k̂ > 13 and for medium values of x̄ when [(k̂ + x̄)(k̂ + 2)/x̄] ≥ 15 (Anscombe, 1949, 1950). Frequently k̂ < 4, so the use of this method is limited to small mean values (x̄ < 4), but it can be used to find an approximate value for k̂ which can then be applied in the other methods.

2. Estimate of k̂ from the proportion of zeros. The equation k̂ log[1 + (x̄/k̂)] = log(n/f₀) is solved by iteration for various values of k̂, where n is the number of sample units and f₀ is the frequency of occurrence of samples with no organisms. The method is over 90% efficient when at least one-third of the counts are zero (i.e. f₀ ≥ n/3), but more zero counts are needed if x̄ > 10. The use of this method is illustrated in Example 3.4.

3. Transformation method for estimation of k̂. After deriving an approximate value for k̂ by method (1), each count (x) is transformed (Table 3.4) to a value y, where y = log[x + (k̂/2)] if the approximate value of k̂ is 2 to 5 and x̄ > 15; or to y = sinh⁻¹√[(x + 0.375)/(k̂ − 0.75)] if the approximate value of k̂ < 2 and x̄ > 4. The function sinh⁻¹ is the inverse of the hyperbolic sine and is found in standard mathematical tables (Chambers Mathematical Tables, Vol. 2, table VIA; Comrie, 1949). The expected variance of the transformed counts is independent of the mean and equals (0.1886 trigamma k̂) for the first transformation and (0.25 trigamma k̂) for the second transformation. Selected values for the function 'trigamma k̂' are given in Table 4.2; other values can be obtained from tables 13 to 16 of Davis (1963). To estimate k, try different values of k̂ in the appropriate transformation until the variance of the transformed counts equals the expected variance (trigamma k̂; Table 4.2). These transformations can also be used to estimate the confidence limits for small samples from a negative binomial distribution. The method is over 90% efficient if x̄ > 4, and is
TABLE 4.2 Expected Variance of Transformed Counts from a Negative Binomial

(a) Expected variance = 0.1886 trigamma k̂ (for the transformation y = log[x + (k̂/2)], x̄ > 15):
k:                 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.2 5.4 5.6 5.8 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
Expected variance: 0.1216 0.1138 0.1081 0.1023 0.0972 0.0925 0.0882 0.0843 0.0808 0.0775 0.0745 0.0717 0.0691 0.0667 0.0644 0.0623 0.0603 0.0585 0.0567 0.0551 0.0535 0.0521 0.0507 0.0494 0.0481 0.0469 0.0458 0.0447 0.0437 0.0427 0.0417 0.0400 0.0384 0.0369 0.0355 0.0342 0.0314 0.0290 0.0273 0.0251 0.0235 0.0222 0.0209 0.0198

(b) Expected variance = 0.25 trigamma k̂ (for the transformation y = sinh⁻¹√[(x + 0.375)/(k̂ − 0.75)], x̄ > 4):
k:                 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4
Expected variance: 0.1612 0.1517 0.1432 0.1356 0.1288 0.1226 0.1170 0.1118 0.1071 0.1028 0.0987 0.0950 0.0916 0.0884 0.0854 0.0826 0.0800 0.0775 0.0752 0.0730 0.0710 0.0690 0.0672 0.0654 0.0638 0.0622 0.0607 0.0593 0.0579 0.0566 0.0553 0.0542 0.0530 0.0519 0.0509 0.0498 0.0489 0.0479 0.0470 0.0462 0.0453 0.0445 0.0438 0.0430 0.0423
k:                 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10.0 10.2 10.4 10.6 10.8 11.0 11.2 11.4 11.6 11.8
Expected variance: 0.0416 0.0409 0.0402 0.0396 0.0390 0.0384 0.0378 0.0373 0.0367 0.0362 0.0357 0.0352 0.0347 0.0342 0.0338 0.0333 0.0329 0.0324 0.0320 0.0316 0.0312 0.0308 0.0305 0.0301 0.0297 0.0294 0.0291 0.0287 0.0284 0.0281 0.0278 0.0275 0.0272 0.0269 0.0266 0.0263 0.0258 0.0252 0.0247 0.0243 0.0238 0.0234 0.0229 0.0225 0.0221
k:                 12.0 12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0 14.2 14.4 14.6 14.8 15.0 15.2 15.4 15.6 15.8 16.0 16.2 16.4 16.6 16.8 17.0 17.2 17.4 17.6 17.8 18.0 18.2 18.4 18.6 18.8 19.0 19.2 19.4 19.6 19.8 20.0
Expected variance: 0.0217 0.0214 0.0210 0.0207 0.0203 0.0200 0.0197 0.0194 0.0191 0.0188 0.0185 0.0183 0.0180 0.0177 0.0175 0.0172 0.0170 0.0168 0.0166 0.0163 0.0161 0.0159 0.0157 0.0155 0.0153 0.0152 0.0150 0.0148 0.0146 0.0145 0.0143 0.0141 0.0140 0.0138 0.0137 0.0135 0.0134 0.0132 0.0131 0.0130 0.0128

Source: Reproduced from Elliott (1977) by permission of The Freshwater Biological Association; with additional data from Davis (1963).
more efficient than method (1) (Anscombe, 1949, 1950). Because of the work involved, use of method (3) can rarely be justified in terms of increased efficiency of determination of k̂ unless the number of terms involved is few. Elliott (1977) provides a key to the choice of method for deriving k̂ (Table 4.3).

The Lognormal and Other Contagious Distribution Models

The lognormal distribution is a special form of contagious distribution that has only one mode, but is more skewed than the negative binomial. When logarithms of counts follow a normal frequency distribution, the original counts must follow a discrete lognormal distribution. The logarithmic transformation of counts (Chapter 3 and Table 3.4) often provides a useful approximate model to normalize data in a negative binomial distribution. However, the distribution of bacterial cells in colonies normally follows a logarithmic distribution (Jones et al., 1948; Quenouille, 1949), the individual terms of which are given by:

αx, αx²/2, αx³/3, …, αxⁿ/n

where α = −1/ln(1 − x) and x is the bacterial count (Fisher et al., 1943). An intermediate distribution, the Poisson lognormal, with a skewness intermediate between the negative binomial and the discrete lognormal, has been shown to be a special form of the heterogeneous Poisson distribution (Cassie, 1962). Other types of contagious distribution
TABLE 4.3 Key to Methods for Deriving k̂ (modified from Elliott, 1977)

(A) Counts can be arranged in a frequency distribution:
    Approximate method(a): Moment estimate (1). Accurate method(a): Maximum likelihood (see Chapter 3). Test by: χ².
(B) Counts cannot be arranged in a frequency distribution:
    (a) x̄ and x̄/k̂ meet in the U-half of Fig. 4.6 and f₀ ≥ n/3.
        Approximate method(a): Moment estimate (1). Accurate method(a): Proportion of zeros (2). Test by: U statistic.
    (b)(i) x̄ and x̄/k̂ meet in the T-half of Fig. 4.6 and x̄ < 4.
        Approximate method(a): Moment estimate (1). Accurate method(a): Moment estimate (1). Test by: T statistic.
    (b)(ii) x̄ and x̄/k̂ meet in the T-half of Fig. 4.6 and x̄ > 4.
        Approximate method(a): –. Accurate method(a): Transformation (3). Test by: T statistic.

(a) Figures in parentheses refer to methods described in the text.
have been described with various degrees of skewness and in some instances have more than one mode (for review see Elliott, 1977). The absolute nature of the population distributions in microbiological analysis will be masked by the effects of sampling (see below) and the dilution procedures used. The apparent distributions of counts will appear to follow Poisson or, more usually, negative binomial distributions; for routine purposes the latter can be approximately 'normalized' using the logarithmic transformation (see Fig. 3.9). The moments of the lognormal distribution are given by:

Median = exp(μ); Mean = exp(μ + σ²/2); and Variance = exp(2μ + σ²)·[exp(σ²) − 1]

where μ = mean and σ² = variance of the natural logarithms of the counts.

EFFECTS OF SAMPLE SIZE

Reference was made above to the effect of sample size on apparent randomness. If it is assumed that the population distribution is essentially contagious (Fig. 4.5), then for various sample quadrat sizes (A, B, C, D) the apparent distribution of counts would vary. If the smallest sampling quadrat (A) were moved across the figure, dispersion would appear to be random, or slightly contagious, with s² ≈ x̄; but for quadrat (B) it would appear contagious because each sample would contain either very few or very many organisms (hence s² > x̄). With sample quadrat (C), regularly distributed cell clumps would suggest that the overall dispersion is random (s² ≈ x̄); while for quadrat (D), dispersion might appear regular (s² < x̄), since the clumps would affect only the dispersion within the sample unit and each sample would contain about the same number of individuals. However, if the
FIGURE 4.5 Four sample templates (A, B, C and D) and a contagious distribution with regularly distributed clumps. The area of sample quadrat D is 16x greater than A, C is 9x greater than A, and B is 4x greater than A (reproduced from Elliott, 1977 with permission of the Freshwater Biological Association).
distribution of the clumps were random or contagious, such effects would not be seen with sample quadrats (C) and (D). These are purely hypothetical illustrations but the concept is of practical importance where microbiological samples are obtained, for instance, by swabbing the surface of a piece of meat or during hygiene tests on process equipment. The occurrence of bacterial cells in food materials sometimes appears to be random, but is usually contagious. In circumstances where the cells can be viewed in situ they are most often seen to be in a contagious association, due to replication, senescence and death of cells within the microenvironment. Increasing the sample size does not change the intrinsic distribution of the organisms, but the processes of sample maceration and dilution introduce artificial changes. Only in special circumstances is it possible to examine very small samples drawn directly from a food, for example, the smear method for total bacterial counts on milk (Breed, 1911) and, even then, spreading the sample across a microscope slide will affect the spatial distribution. In addition, the size of the sample (20 μl) will still be large compared with the size of individual bacterial cells and clumps. It is therefore pertinent to consider that the distribution of organisms in a food will follow a contagious rather than a random distribution. The numbers of discrete colonies on a culture plate derived from higher serial dilutions will normally appear to conform to a Poisson series (i.e. a random distribution of colony forming units will be observed), partly as a consequence of maceration and dilution. But it is important to remember that it is not possible to distinguish whether the colonies grew from individual cells or from cell clumps and clusters. Even when organisms have been subjected to some form of sublethal treatment (e.g. freezing, sublethal heating, treatment with disinfectants or food preservatives), the total number of viable microbial cells (and clumps) will often follow a Poisson distribution in the dilutions plated, but the numbers of apparently viable cells may not do so. Since some fraction of the organisms will have greater intrinsic resistance, or may have received a less severe treatment than other cells, the distribution of viable organisms detected in replicate colony counts will frequently follow a contagious distribution with s² > x̄ (see Example 4.4; Dodd, 1969; Abbiss and Jarvis, 1980).
EXAMPLE 4.3 USE OF METHOD (2) AND THE U STATISTIC TO ESTIMATE THE NEGATIVE BINOMIAL PARAMETER (k) AND DETERMINE THE GOODNESS-OF-FIT

Microscopic counts of bacterial spores in a suspension gave the following distribution:

Number of spores/field (x):   0    1    2    3    7    8    9    Total
Frequency (f):               14    6    8    4    2    4    2      40
fx:                           0    6   16   12   14   32   18      98
x²:                           0    1    4    9   49   64   81       –
f(x²):                        0    6   32   36   98  256  162     590
Does the negative binomial provide a good model for the distribution of these data? We test the null hypothesis that the data distribution conforms to the negative binomial; the alternative hypothesis is that it does not.

The mean value (x̄) is given by x̄ = Σ(fx)/Σf = 98/40 = 2.45, where fx is determined by multiplying each count x by its frequency of occurrence (f), and Σ(fx) is the sum of these values. The variance (s²) is derived as:

s² = [Σf(x²) − x̄·Σ(fx)]/(n − 1) = [590 − 2.45(98)]/39 = 8.97

Then the approximate value of k̂ (by method (1)) is given by:

k̂ = x̄²/(s² − x̄) = (2.45)²/(8.97 − 2.45) = 0.9206

The co-ordinates x̄ = 2.45 and x̄/k̂ = 2.45/0.9206 = 2.66 meet in the U-half of the standard error nomogram (Fig. 4.6), close to the line for S.E.(U) = 2, and f₀ = 14 > n/3. Then k̂ can be estimated by method (2), first trying k̂ = 0.92, using the equation:

log(n/f₀) = k̂ log[1 + (x̄/k̂)]

For the left-hand side of the equation:

log(n/f₀) = log(40/14) = log 2.857 = 0.45593        (1)

Inserting the value k̂ = 0.92 in the right-hand side of the equation:

k̂ log[1 + (x̄/k̂)] = (0.92) log[1 + (2.45/0.92)] = 0.5187        (2)

Since the value of (2) is greater than that of (1), calculation (2) is repeated using a lower value (k̂ = 0.60); this gives:

(0.60) log[1 + (2.45/0.60)] = 0.4237        (3)

Since value (3) is lower than both values (1) and (2), the value of k̂ lies between 0.60 and 0.92. By iteration we find that k̂ = 0.696 gives

k̂ log[1 + (x̄/k̂)] = 0.45598 ≈ log(n/f₀) = 0.45593
FIGURE 4.6 Standard errors of T and U for n = 100. For other values of n, multiply the standard error by 10/√n (after Evans, 1953) (reproduced from Elliott, 1977 with permission of the Freshwater Biological Association).
Hence k̂ = 0.696 provides a reliable estimate of k (if required, the process could be continued to further places of decimals). Now, for x̄ = 2.45, s² = 8.97, k̂ = 0.696 and n = 40, the value of U is determined:

U = s² − [x̄ + (x̄²/k̂)] = 8.97 − [2.45 + (2.45²/0.696)] = −2.104

From Fig. 4.6, the standard error of U (read for n = 100 and corrected for n = 40) is given by:

S.E.(U) = 2 × (10/√n) = 2 × (10/6.324) = 3.16

Since the absolute value of U (2.104) is less than its standard error (3.16), the hypothesis that the data conform to the negative binomial distribution is not disproved. It should be noted that for the U statistic it is not possible to state the actual probability level.
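For readers who wish to check such iterative calculations by computer, a minimal Python sketch of Example 4.3 is given below (an illustration only, using only the standard library); it reproduces the moment estimate of k̂, the zero-frequency estimate of about 0.696 and a U value of about −2.1.

# Sketch of the calculations in Example 4.3 (illustrative only; the standard
# error of U is read from the nomogram of Fig. 4.6, so only U itself is computed).
import math

counts = {0: 14, 1: 6, 2: 8, 3: 4, 7: 2, 8: 4, 9: 2}     # x: frequency
n = sum(counts.values())                                  # 40 sample units
sum_fx = sum(x * f for x, f in counts.items())            # 98
mean = sum_fx / n                                         # 2.45
var = (sum(f * x**2 for x, f in counts.items()) - mean * sum_fx) / (n - 1)   # 8.97

k_moment = mean**2 / (var - mean)                         # method (1): ~0.9206

# Method (2): solve k*log10(1 + mean/k) = log10(n/f0) by bisection.
f0 = counts[0]
target = math.log10(n / f0)
lo, hi = 0.01, 10.0
for _ in range(60):
    k = (lo + hi) / 2
    if k * math.log10(1 + mean / k) > target:
        hi = k            # right-hand side too large: try a smaller k
    else:
        lo = k
k_zero = (lo + hi) / 2                                    # ~0.696

U = var - (mean + mean**2 / k_zero)                       # ~ -2.1
print(round(mean, 2), round(var, 2), round(k_moment, 4), round(k_zero, 3), round(U, 2))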
EXAMPLE 4.4 USE OF METHOD (3) AND THE T STATISTIC TO ESTIMATE k̂ AND TEST AGREEMENT WITH A NEGATIVE BINOMIAL DISTRIBUTION (DATA OF ABBISS AND JARVIS, 1980)

After treatment with detergent and disinfectant, the colony counts (cfu/ml) on rinses from replicate miniaturized Lisboa tubes (Blood et al., 1979) were as follows: 21, 19, 21, 34, 38, 30, 9, 8, 12, 16.

Does the negative binomial provide a good model for the distribution of these counts? The parameters of the counts are the mean x̄ = 20.8, variance s² = 106.84 and n = 10. Then, by the method of matching moments (method (1)), the approximate value for k̂ is given by:

k̂ = [x̄² − (s²/n)]/(s² − x̄) = 4.90

Determine the square and cube of each value of x and sum them:
x:    21     19     21     34      38      30      9     8     12     16      Σx = 208
x²:   441    361    441    1156    1444    900     81    64    144    256     Σx² = 5288
x³:   9261   6859   9261   39,304  54,872  27,000  729   512   1728   4096    Σx³ = 153,622
Calculate the T statistic from the equation:

T = [(Σx³ − 3x̄Σx² + 2x̄²Σx)/n] − s²[(2s²/x̄) − 1]
  = [(153,622 − 329,971.2 + 179,978.24)/10] − 106.84[(2(106.84)/20.8) − 1]
  = 362.90 − 990.736 = −627.83 ≈ −627.8
For values of x̄ = 20.8, x̄/k̂ = 4.1 and n = 10, the standard error of T is determined from Fig. 4.6 as S.E.(T) = 320 × (10/√10) = 1012.
As the absolute value of T (627.8) is less than its standard error (1012), the hypothesis that the data conform to a negative binomial distribution is not rejected. The negative value of T indicates a tendency towards reduced skewness. Since x̄ > 4 and the approximate value of k̂ lies between 2 and 5, estimate k̂ from the transformation y = log[x + (k̂/2)]. First try the approximate estimate of k̂ = 4.9:
x:   21       19       21       34       38       30       9        8        12       16
y:   1.3701   1.3314   1.3701   1.5617   1.6069   1.5112   1.0588   1.0191   1.1599   1.2660
The mean transformed count is ȳ = 1.3255 and the variance of the transformed counts is sy² = 0.0411. The expected variance for k̂ = 4.9 is 0.0427 (Table 4.2), and the observed variance is slightly less even than the expected variance for k̂ = 5.0. Repeat the transformation using other values of k̂ and set out the data in a table:
k̂      ȳ        Observed variance sy² (O)   Expected variance (E)(a)   Difference (O − E)
4.9    1.326    0.0411                      0.0427                     −0.0016
5.0    1.327    0.0408                      0.0417                     −0.0009
5.1    1.328    0.0406                      0.0408                     −0.0002
5.2    1.329    0.0404                      0.0400                     +0.0004

(a) The expected variance is the 0.1886 trigamma k̂ value, from Table 4.2.
Hence, the best estimate of k is approximately k̂ = 5.1, since at that value the observed variance of the transformed data approximates closely to the expected variance. If the estimate of k is required to more decimal places, or for values not shown in Table 4.2, then it is necessary to refer to mathematical tables such as those of Davis (1963) for values of trigamma k̂, or to determine the value by iteration from the data obtained.
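A corresponding Python sketch of Example 4.4 (again an illustration only, using only the standard library) computes the T statistic and the observed variance of the transformed counts for trial values of k̂, which can then be compared with the 0.1886 trigamma k̂ values in Table 4.2.

# Sketch of the calculations in Example 4.4 (illustrative only).
import math
from statistics import mean, variance

x = [21, 19, 21, 34, 38, 30, 9, 8, 12, 16]
n = len(x)
xbar = mean(x)                       # 20.8
s2 = variance(x)                     # 106.84 (sample variance, n - 1 denominator)

T = ((sum(v**3 for v in x) - 3 * xbar * sum(v**2 for v in x) + 2 * xbar**2 * sum(x)) / n
     - s2 * (2 * s2 / xbar - 1))     # ~ -627.8

def observed_variance(k):
    """Variance of the transformed counts y = log10(x + k/2)."""
    y = [math.log10(v + k / 2) for v in x]
    return variance(y)

for k in (4.9, 5.0, 5.1, 5.2):
    print(k, round(observed_variance(k), 4))   # compare with 0.1886*trigamma(k) in Table 4.2
print(round(T, 1))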
References

Anon. (2005) Milk and milk products – Quality control in microbiological laboratories. Part 1: Analyst performance assessment for colony counts. ISO 14461-1:2005. International Organization for Standardisation, Geneva.
Abbiss, JS and Jarvis, B (1980) Validity of statistical interpretations of disinfectant test results. Research Report No. 347. Leatherhead Food Research Association.
Anscombe, FJ (1949) The statistical analysis of insect counts based on the negative binomial distribution. Biometrics, 5, 165–173.
Anscombe, FJ (1950) Sampling theory of the negative binomial and logarithmic series distributions. Biometrika, 37, 358–382.
Arbous, AG and Kerrich, JE (1951) Accident statistics and the concept of accident-proneness. Biometrics, 7, 340–432.
Blood, RM, Williams, AP, Abbiss, JS and Jarvis, B (1979) Evaluation of disinfectants and sanitizers for use in the meat processing industry. Part 1. Research Report No. 303. Leatherhead Food Research Association.
Breed, RS (1911) The determination of the number of bacteria in milk by direct microscopical examination. Zentralblatt für Bakteriologie. II. Abt. Bd., 30, 337–340.
Cassie, RM (1962) Frequency distribution models in the ecology of plankton and other organisms. J. Animal Ecol., 31, 65–92.
Cochran, WG (1954) Some methods for strengthening the common χ² tests. Biometrics, 10, 417–451.
Comrie, LJ (1949) Chambers Six-Figure Mathematical Tables, Vol. 2. Chambers, Edinburgh.
Davis, HT (1963) Tables of the Higher Mathematical Functions, Vol. 2. Principia Press, Bloomington, IN.
Dodd, AH (1969) The Theory of Disinfectant Testing with Mathematical and Statistical Section, 2nd revised edition. Swifts (P & D) Ltd., London.
Elliott, JM (1977) Some Methods for the Statistical Analysis of Samples of Benthic Invertebrates, 2nd edition. Scientific Publication No. 25. Freshwater Biological Association, Ambleside, Cumbria.
Evans, DA (1953) Experimental evidence concerning contagious distributions in ecology. Biometrika, 40, 186–211.
Fisher, RA, Corbet, AS and Williams, CB (1943) The relation between the number of species and the number of individuals in a random sample of an animal population. J. Animal Ecol., 12, 42–58.
Jones, PCT, Mollison, JE and Quenouille, MH (1948) A technique for the quantitative estimation of soil microorganisms. Statistical note. J. Gen. Microbiol., 2, 54–69.
Neyman, J (1939) On a new class of 'contagious' distributions, applicable in entomology and bacteriology. Annal. Mathemat. Stat., 10, 35–57.
Pearson, ES and Hartley, HO (1976) Biometrika Tables for Statisticians, 3rd edition. University Press, Cambridge.
Pôlya, G (1931) Sur quelques points de la théorie des probabilités. Ann. Inst. Henri Poincaré, 1, 117–161.
Quenouille, MH (1949) A relation between the logarithmic, Poisson and negative binomial series. Biometrics, 5, 162–164.
Southwood, TRE (1966) Ecological Methods. Methuen, London.
Taylor, LR (1971) Aggregation as a species characteristic. In: G.P. Patil et al. (Eds.), Vol. 1, pp. 357–372. Penn State University Press, Pennsylvania.
Thomas, M (1949) A generalization of Poisson's binomial limit for use in ecology. Biometrika, 36, 18–25.
Ziegler, NR and Halvorson, HO (1935) Application of statistics to problems in bacteriology. IV. Experimental comparison of the dilution method, the plate count and the direct count for the determination of bacterial populations. J. Bacteriol., 29, 609–634.
5 STATISTICAL ASPECTS OF SAMPLING FOR MICROBIOLOGICAL ANALYSIS
Whenever we wish to make a decision about the quality, safety or acceptability of a food material, it is necessary to carry out microbiological (and/or other) tests on representative samples drawn from the 'lot' in question. Strictly, the estimates reflect only the parameters of the samples tested, but decisions concerning the food 'lot' are often based on the results of such analyses. The precision of the estimate will depend on the test(s) employed (Chapters 6–11), whether the samples are truly representative and the number of samples tested. The 'correctness' of the decision will increase with increasing sample size and number, but the number of samples that can be tested will be limited inter alia by the practicalities and economics of testing. A large bulk of a relatively uniform material, such as milk, can be tested by drawing a representative random sample after thorough mixing of the bulk. Suppose, however, that a sample is taken from a total volume of 10,000 l, 1 ml of that sample is diluted for testing by a colony count procedure and colonies are counted on the 10⁻² dilution; then the amount used for the test would be only 1 in 10² × 10⁴ × 10³, that is, one thousand millionth (10⁻⁹) of the 'lot'. To ensure that the final dilution used is truly representative of the 'lot' under test requires very efficient and intimate mixing at all stages of sampling, both in the production plant and in the laboratory. Whilst such representative sampling may be theoretically feasible for a liquid, it can rarely be so for solids (e.g. grain, vegetables, meat, etc.) or for solid–liquid multiphase systems (e.g. meat and gravy in a pastry case). Leaving aside for the moment the technical efficiency of testing (see Chapters 6–9), the efficiency of the sampling operation per se will affect the results obtained (see Board and Lovelock, 1973 and Chapter 6).
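The scale of this dilution effect is easily checked; the following minimal Python sketch (an illustration only, using the volumes quoted above) reproduces the one-thousand-millionth figure.

# Sketch: fraction of a 10,000-litre 'lot' actually examined when colonies are
# counted on 1 ml plated from the 10^-2 dilution of a 1 ml sample.
lot_volume_ml = 10_000 * 1_000          # 10,000 l expressed in ml (10^7 ml)
volume_tested_ml = 1e-2                 # 1 ml of the 10^-2 dilution = 0.01 ml of sample
fraction = volume_tested_ml / lot_volume_ml
print(fraction)                         # 1e-09, i.e. one thousand millionth of the lot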
ATTRIBUTES AND VARIABLES SAMPLING

A product that has been produced to a set of process and product specifications is assumed to possess certain compositional and other characteristics, measurement of which can serve as a means to assess the 'manufactured quality' of the product through application of appropriate sampling and testing procedures. Although testing cannot of itself change the overall quality of the product, the assumption is made that test results can be used to draw conclusions about the overall quality of the product. This then leads to the concept that a batch or 'lot' of product can be accepted, or rejected, according to the results of tests on randomly drawn representative samples; this concept is known as Acceptance Sampling (q.v.). Two types of sampling scheme can be used in Acceptance Sampling: one is for Attributes and the other for Variables. In this context, the term 'variables' describes those attributes of a sample that can be estimated analytically; by contrast, the term 'attributes' describes inherent characteristics that may be assessed directly by inspection or indirectly by measurement (q.v.). A Variables Sampling scheme assumes that the measurements made on a series of samples from a given population follow a normal distribution, with approximately 95% of all test values lying within ±2 standard deviations of the mean. Unfortunately, neither the distribution of microorganisms in a food nor estimates of chemical or physical attributes of microbes (see Chapter 9) conform to a normal distribution. However, after logarithmic transformation, the log colony count values generally approximate to a normal distribution (Kilsby et al., 1979). Hence, Variables Sampling, which is the more efficient scheme, can be applied in microbiological analyses, although it had long been considered to be unsuitable (Kilsby, 1982). By contrast, Attributes Sampling assumes that each unit in a 'lot' is characterized as being defective or not; in some schemes a 'marginally defective' characterization is permitted (see below). In microbiological testing, the term 'defective' implies that the sample unit contains more than a specified number of organisms (when tested by an appropriate test) or, in the case of a presence/absence test, that the target organism sought is detected when a sample unit of specified size is tested by an appropriate method. In the first case (i.e. numbers of organisms, as colony counts) a variable is measured, but the counts are considered only in relation to a predetermined limit and the essential criterion is whether or not that limit is exceeded. Such an approach permits classification of the sample units in terms of the proportion defective and the proportion not defective in a given number of sample units. The frequency of occurrence of defective units is generally described by the binomial distribution, although other distribution functions (e.g. Poisson series) may be used in some instances. In the assessment of colony count data, a marginally defective grouping is often included in order to make some allowance for variation in the distribution of organisms in the food and for the imprecision associated with colony count procedures. The marginally defective sample is defined as one that contains a number of organisms lower than a specified upper limit (M) but greater than a lower (acceptable) specified limit (m).
Such a scheme, referred to as a 3-class sample plan, is described approximately by the trinomial distribution, which makes allowance not only for the proportions of defective and non-defective
items (p and q, respectively) but also for the proportion of marginally defective units, so that the three proportions sum to unity.
BINOMIAL AND TRINOMIAL DISTRIBUTIONS

The Binomial Distribution (2-Class Sample Plan)

It has already been shown (Chapter 3) that the probability for an event to occur x times out of n tests on a large number of occasions is given by:

P(x) = [n!/((n − x)!x!)] p^x q^(n−x)

where q = 1 − p. We can derive an estimate of p (p̂) from the expected prevalence of contamination of the 'lot' and, therefore, we can estimate the probability with which 0, 1, 2, …, n defective sample units will be found in samples of different size, provided that the total number of sample units tested (n) is small compared to the 'lot' size (N); otherwise, each time a sample unit is removed, the proportion of defectives will be changed significantly. Sample sizes of not more than 20% of the total are generally considered satisfactory. Thus, for a sample of 20 units (n = 20) from a consignment with expected 1% defectives (i.e. p̂ = 0.01), we can calculate the terms of the expansion (for detail of the procedure see Chapter 3) by substituting in the equation given above. Table 5.1 shows the percentage frequency with which 0, 1 or 2 defective units would occur on average in samples of various sizes taken from 'lots' of different average quality and size. If 20 sample units were tested for salmonellae, on average it is likely that, in 82 out of 100 tests, no salmonellae would be detected if 1% of the sample units contained in the 'lot' were contaminated (even assuming no technical errors in the analysis). If 20 sample units of a 'lot' containing 5% true defectives were tested, on average no salmonellae would be expected to be found in 36 out of 100 tests, but one or more positive tests would be found in 64 out of the 100 tests on average. However, had a sample of 100 units been tested, then the probability of detecting no salmonellae in 'lots' with 1% or 5% true defectives would be 0.37 or 0.006 (i.e. on average none would be detected in 37 or 0.6 out of 100 tests), respectively. It can be seen, therefore, that testing small numbers of samples can give little or no protection against accepting a 'lot' containing salmonellae (or another pathogen) unless the prevalence of contamination is very high. The same reasoning can be applied in relation to tests of processed cans for seam faults, loss of headspace vacuum or 'blowing' on incubation, or indeed to any other sampling and testing procedure where the prevalence of defectives is sought. This does not imply that sampling is useless since, if operated continuously on a quality control basis, it provides 'cumulative' evidence that a process is operating satisfactorily, or that the quality has suddenly deteriorated (see also Chapter 12).
TABLE 5.1 Percentage Frequency of Occurrence of 0, 1 or 2 Defective Units in Samples of Different Size

                        % Frequency for samples of size (n), with number of defective units of:
Defective items      n = 10                  n = 20                  n = 100
in lot (%)           0      1      2         0      1      2         0      1      2
0.01                99.9    0.1    0.1      99.8    0.2    0.1      99      1      0.1
0.1                 99      1      0.1      98      2      0.1      90      9      0.5
1                   90      9      0.4      82     17      2        37     37     18
2                   82     17      1        67     27      5        13     27     27
5                   60     32      7        36     38     19         0.6    3      8
10                  35     39     19        12     27     29         0.1    0.1    0.2

Source: Modified from Steiner, 1971; reproduced by permission of Leatherhead Food International.
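The entries in Table 5.1 are straightforward cumulative binomial probabilities; the short Python sketch below (an illustration only) reproduces representative values, which agree with the table to within the rounding used there.

# Sketch: binomial probabilities underlying Table 5.1 (illustrative only).
from math import comb

def pct_with_x_defectives(n, x, p):
    """% frequency of finding exactly x defective units in n sample units drawn
    from a lot with true proportion p defective."""
    return 100 * comb(n, x) * p**x * (1 - p)**(n - x)

for n in (10, 20, 100):
    for lot_pct in (1, 5):
        p = lot_pct / 100
        probs = [round(pct_with_x_defectives(n, x, p), 1) for x in (0, 1, 2)]
        print(f"n={n}, {lot_pct}% defective in lot: %(0,1,2 defectives) = {probs}")
# e.g. n=20, 1% defective -> [81.8, 16.5, 1.6]; n=20, 5% defective -> [35.8, 37.7, 18.9]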
Another way of looking at this is to say that if, say, 10 sample units are tested, then the probability (P) of finding no salmonellae in 'lots' with 0.1, 1, 5 or 10% overall true prevalence of salmonellae is 0.99, 0.90, 0.60 and 0.35, respectively (from column 2, Table 5.1). Then, putting P(x=0) = probability of finding no salmonellae, and P(x>0) = probability of obtaining one or more positive tests for salmonellae in n sample units, it can be shown that

P(x>0) = 1 − (1 − d)^n

where d = prevalence of defectives (as a proportion) and n = number of samples tested and shown to be satisfactory. Hence, d = 1 − (1 − P(x>0))^(1/n). For a given probability of detection and number of sample units tested, we can therefore define the probable prevalence of contamination of the 'lot' (see Example 5.1). Similarly, for a defined probability we can determine how many samples it is necessary to test to give an assurance that a specified prevalence of defectives (d) is not exceeded (see Example 5.2), using the equation:

n = log10(1 − P)/log10(1 − d), where P = P(x>0)

From the properties of the binomial distribution given previously (Chapter 3), the standard deviation of the distribution of the number of defective units is given by √[np(1 − p)] and the mean is np. The larger the sample size, the closer the binomial distribution approaches the normal distribution, which can be used as an approximation when n > 20 and the number of defective units is large. When the chance of detecting defectives is very small and the number of sample units tested is large, the binomial distribution approaches the Poisson series.
EXAMPLE 5.1 CALCULATION OF THE PROBABLE PREVALENCE OF DEFECTIVES

Assume 30 × 25 g samples are tested and all are found to be satisfactory (e.g. negative for salmonellae). Then the probable prevalence of defectives is given by:

d = 1 − (1 − P)^(1/n)

If n = 30 and P = 0.95, then:

d = 1 − (1 − 0.95)^(1/30) = 1 − (0.05)^(1/30) = 0.095

Hence, the upper 95% confidence limit (CL) for the prevalence of defective sample units = 9.5%, i.e. on 19 occasions out of 20 this result would indicate that the prevalence of salmonellae is between zero and 9.5%, but on 1 occasion out of 20 this level could be exceeded by an unknown amount. If we wished to be more precise we could determine the 99% upper CL: d = 1 − (1 − 0.99)^(1/30) = 1 − (0.01)^(1/30) = 0.142; hence the prevalence of salmonellae will be between 0 and 14.2% on 99 occasions out of 100, but could exceed this level on 1 occasion out of 100.

To determine the upper CL for the prevalence of salmonellae in a given weight of product, multiply d by 1000/W, where W = weight (g) of each sample unit tested. Hence, the upper 95% CL for the prevalence of salmonellae in the product = 0.095 × 1000/25 = 3.8 ≈ 4 salmonellae/kg, and the upper 99% CL ≈ 6 salmonellae/kg.
EXAMPLE 5.2 HOW MANY SAMPLES DO WE NEED TO TEST?

Can we calculate the number of samples that must be tested, with negative results, to ensure at a probability of P = 0.95 that not more than 0.1% of the lot is contaminated with salmonellae?

The expected prevalence of defectives (d) is: d = 0.1/100 = 0.001; let P = 0.95. Then

n = log10(1 − P)/log10(1 − d) = log10(0.05)/log10(0.999) = 2994.2

Hence, to ensure with 95% probability that not more than 0.1% of a lot is contaminated by salmonellae, it would be necessary to test 2995 sample units with negative results: clearly a technical impossibility!

If the expected prevalence of defectives is 5%, then d = 0.05. To be able to ensure with a 95% probability that salmonellae would contaminate not more than 5% of the lot would require at least 59 samples to be tested:

n = log10(0.05)/log10(0.95) = (−1.3010)/(−0.02228) = 58.4
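Both calculations are easily automated; a minimal Python sketch (an illustration only, using the equations given above) is:

# Sketch of the calculations in Examples 5.1 and 5.2 (illustrative only).
import math

def upper_cl_prevalence(n, P):
    """Upper confidence limit for the proportion of defectives when n sample
    units have all tested negative, at confidence level P."""
    return 1 - (1 - P) ** (1 / n)

def samples_required(d, P):
    """Number of all-negative sample units needed to be confident (probability P)
    that the prevalence of defectives does not exceed d."""
    return math.log10(1 - P) / math.log10(1 - d)

print(upper_cl_prevalence(30, 0.95))   # 0.095 -> 9.5% (Example 5.1)
print(upper_cl_prevalence(30, 0.99))   # 0.142 -> 14.2%
print(samples_required(0.001, 0.95))   # 2994.2 (Example 5.2)
print(samples_required(0.05, 0.95))    # 58.4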
The Trinomial Distribution (3-Class Sample Plans)

If a sample is classified into two groups (i.e. good or defective), as used in 2-class attributes plans, the proportion of occasions on which an event occurs is described by the binomial distribution, as described above. The 3-class sampling plan is defined by the trinomial distribution, which is an extension of the binomial; both are special cases of the multinomial distribution. Bray et al. (1973) showed that the trinomial distribution provides an acceptable approximation when the 'lot' size is large, even when p ≠ q. The probability that d0 good units, d1 marginal units and d2 = (n − d0 − d1) defective items occur is given by the expansion of the term:

[n!/(d0! d1! d2!)] (p0^d0 · p1^d1 · p2^d2)

where p0, p1 and p2 are the proportions of good, marginal and defective units. If the prevalence of marginally defective units (d1) = 0, then this simplifies to the binomial expansion:

P(x) = [n!/(d0! d2!)] (p0^d0)(p2^d2) = [n!/((n − x)!x!)] p^x (1 − p)^(n−x)

We can determine the probabilities of occurrence for various combinations of test results from the trinomial distribution; for instance, for a sample of 20 units (n) drawn from a 'lot' containing 1% defective (d2 = 0.01) and 10% marginally defective (d1 = 0.10) units, the first few terms of the expansion are shown in Table 5.2 as the probabilities of detecting different combinations of defective and marginally defective units. The cumulative frequency
TABLE 5.2 Percentage Frequency of Occurrence of 0, 1 or 2 Defective Units with 0, 1, 2, 3, 4 or 5 Marginally Defective Units in a Sample of 20 Units

                         % Frequency of detection for number of marginally defective units(a) in sample:
Number of defective       0      1      2      3      4      5     Cumulative for 0 to 5 (%)
units(a) in sample
0                         9.7   21.9   23.3   15.7    7.5    2.7   80.8
1                         2.2    4.7    4.7    3.0    1.4    0.5   16.5
2                         0.2    0.5    0.4    0.3    0.1    0.1    1.5

(a) Assumes sample size = 20 units; prevalence of defective units in lot = 1%; prevalence of marginally defective units = 10%.
values for each of 0, 1 and 2 defectives, summed over the first six marginally defective terms (0 to 5), are 80.8, 16.5 and 1.5%, respectively. These are similar to the values of 81.8, 16.5 and 1.6%, respectively, for the equivalent binomial distribution for 20 sample units containing 0, 1 and 2 defectives from a population with 1% defective units.
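The trinomial probabilities in Table 5.2 can be verified directly; a minimal Python sketch (an illustration only) is given below. It closely reproduces the tabulated rows and the cumulative values of 80.8, 16.5 and 1.5%.

# Sketch: trinomial probabilities behind Table 5.2 (illustrative only).
from math import factorial

def trinomial(n, d0, d1, d2, p0, p1, p2):
    """Probability of d0 good, d1 marginally defective and d2 defective units."""
    coeff = factorial(n) // (factorial(d0) * factorial(d1) * factorial(d2))
    return coeff * p0**d0 * p1**d1 * p2**d2

n, p1, p2 = 20, 0.10, 0.01        # 10% marginal, 1% defective
p0 = 1 - p1 - p2                  # 0.89 good

for d2 in (0, 1, 2):              # rows of Table 5.2: number of defective units
    row = [100 * trinomial(n, n - d1 - d2, d1, d2, p0, p1, p2) for d1 in range(6)]
    print(d2, [round(v, 1) for v in row], round(sum(row), 1))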
ACCURACY OF THE SAMPLE ESTIMATE

The accuracy of the estimate of defectives (or marginal defectives) is derived in statistical terms by allocating upper and lower CLs to the sample value. These correspond to a probability that the true ('lot') value may lie outside these limits. Commonly the 95% CL is used, although other limits can be applied. Suppose 3 defective units are found in a 2-class plan analysis of 20 sample units. Then, if the true proportion of defectives in the consignment is p, the probability (P) of obtaining 3 or fewer defective units in the sample is given by the sum of the last four terms of the binomial expansion of (p + q)^n, where n = 20:

[20!/(20!0!)] q^20 + [20!/(19!1!)] p q^19 + [20!/(18!2!)] p² q^18 + [20!/(17!3!)] p³ q^17

where q = 1 − p. Equating this to P = 0.025 (i.e. 2.5% probability) and solving for p sets an upper limit to the proportion of defectives, beyond which there is only a 2.5% chance of the true value occurring. Similarly, by solving the first 17 terms of (p + q)^20 and equating to P = 0.025, the lower limit for p can be determined for P = 2.5%. Taken together, these calculations define the upper and lower 95% CLs as p = 0.38 and p = 0.03, respectively. Because of the amount of calculation needed to derive these limits, the CL can be obtained more simply from standard tables of the binomial expansion. For instance, Pearson and Hartley's (1976) table 41 provides 95% and 99% CLs for values of n from 8 to 1000, although the accuracy of the limits is restricted at values of n greater than 20. Since in most instances the proportion of defectives is small (<20%), the limits of the Poisson distribution can be used. Pearson and Hartley's (1976) table 40 provides CLs ranging from 90% to 99.8% for observed frequency values of up to 50 defectives (c in their table). When the number of defectives is large (>10), the normal approximation may be used and the 95% CLs are given by ±2√[np(1 − p)] of the observed number. In this expression, the term p should ideally be the true proportion, but the observed value of p (i.e. p̂) can be used to provide an estimate. Table 5.3 compares the 95% CLs for two cases where the number of defectives is (a) 3 in 20 and (b) 30 in 200. The Poisson series provides a good approximation to the true (binomial) limits in both cases, but the normal approximation is of use only for the larger number of sample units. It should not be forgotten that even when 200 units are tested, there is still a 1 in 20 chance that the true proportion of defectives lies outside the 95% CLs.
TABLE 5.3 95% Confidence Limits (CL) to an Estimated Percentage of Defectives Derived from Different Distributions

95% CL for 15% defectives, assuming:   Binomial distribution   Poisson series approximation   Normal approximation
(a) 3 defectives in 20 units           3–38%                   3.1–44%                        0.03–31%
(b) 30 defectives in 200 units         10–21%                  10.1–21.4%                     13.2–16.8%
The CL for the trinomial distribution can be derived similarly to that for the binomial, but the calculations are much more involved.
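Where tables are not to hand, the exact binomial limits can be obtained numerically by the same reasoning; the following Python sketch (an illustration only, which solves the two binomial equations by bisection rather than by expansion of terms) reproduces the limits of about 3% and 38% for 3 defectives in 20 units.

# Sketch: exact binomial confidence limits by bisection (illustrative only).
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for a binomial(n, p) variable."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def exact_limits(x, n, alpha=0.05):
    # Upper limit: the p for which P(X <= x) = alpha/2
    lo, hi = x / n, 1.0
    for _ in range(60):
        p = (lo + hi) / 2
        lo, hi = (p, hi) if binom_cdf(x, n, p) > alpha / 2 else (lo, p)
    upper = (lo + hi) / 2
    # Lower limit: the p for which P(X >= x) = alpha/2, i.e. P(X <= x-1) = 1 - alpha/2
    lo, hi = 0.0, x / n
    for _ in range(60):
        p = (lo + hi) / 2
        lo, hi = (p, hi) if binom_cdf(x - 1, n, p) > 1 - alpha / 2 else (lo, p)
    lower = (lo + hi) / 2
    return lower, upper

print(exact_limits(3, 20))   # approximately (0.03, 0.38), as quoted in the text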
Variation in Sample Numbers

The limiting precision of the sampling procedure depends on the number of sample units tested rather than on the size of the 'lot'. In some sampling schemes the size of the sample has been related to the 'lot' size and/or the perception of risk associated with the product. For instance, many national control programmes are based on inspection of samples of fresh and processed foods for a range of defined defects (decomposition, unwholesomeness, etc.); the number of samples to be inspected increases with increasing lot size and is based on defined sampling plans. However, tests for microorganisms in such products are generally based on fewer samples, the actual number of samples tested being dependent upon the perceived risks of microbial contamination, the likely handling practices between production and consumption (e.g. whether or not the product will be cooked before consumption) and the potential susceptibility to pathogenic agents of the intended consumer group (infants, elderly, immunocompromised persons, etc.) (ICMSF, 1986, 2002). In most instances, the number of samples per lot is defined, for example, 5 units/lot for colony counts and up to 60 units for high-risk products examined for pathogens (Andrews and Hammack, 2003). Some older references to sampling of foods recommend that the sample size should be related to the square root of the 'lot' size. There is no scientific justification for this approach other than to increase the sample size as the 'lot' size increases. In practice, sampling schemes generally relate sample size to 'lot' size since the risks of a wrong decision become more serious as the 'lot' size increases. The most commonly used standard sampling tables include the ISO 2859 series (Anon., 2006a), other ISO standards (e.g. Anon., 2001a, 2003, 2004, 2006b), and those of the American National Standards Institute (Anon., 2001b). All specify plans based on the size of the 'lot' and a criterion termed the Acceptable Quality Level (see below). Examples of the relation between sample size and 'lot' size are illustrated in Tables 5.4–5.6 for various Acceptance Sampling Plans (see below).
TABLE 5.4 Acceptance Sampling – Single Sampling Plans for Proportion of Defectives by Attributes

MIL STD   Lot size       Sample     AQL     Acceptance    % Defective items in lots with chance of acceptance(a) of:
code                     size (n)   (%)     number (c)    0.99     0.95 (5% PR)    0.50     0.10 (10% CR)
B         9–15           3          4       0             0.3      1.7             21       54
C         16–25          5          2.5     0             0.2      1.0             13       37
D         26–50          8          1.5     0             0.1      0.6             8.3      25
                                    6.5     1             2.0      4.6             20       41
E         51–90          13         1.0     0             0.08     0.4             5.2      16
                                    4.0     1             1.2      2.8             12       27
F         91–150         20         0.65    0             0.05     0.3             3.4      11
                                    2.5     1             0.8      1.8             8.2      18
                                    4.0     2             2.2      4.2             13       24
G         151–280        32         0.4     0             0.03     0.2             2.1      6.9
                                    1.5     1             0.5      1.1             5.2      12
                                    2.5     2             1.4      2.6             8.3      16
                                    4.0     3             2.6      4.4             11       20
H         281–500        50         0.25    0             0.02     0.1             1.4      4.5
                                    1.0     1             0.3      0.7             3.3      7.6
                                    1.5     2             0.9      1.7             5.3      10
                                    2.5     3             1.7      2.8             7.3      13
                                    4.0     5             3.7      5.3             11       18
J         501–1200       80         0.65    1             0.2      0.4             2.1      4.8
                                    1.0     2             0.6      1.0             3.3      6.5
                                    1.5     3             1.0      1.7             4.6      8.2
                                    2.5     5             2.3      3.3             7.1      11
                                    4.0     7             3.7      5.1             9.6      14
K         1201–3200      125        0.4     1             0.1      0.3             1.3      3.1
                                    0.65    2             0.3      0.7             2.1      4.3
                                    1.0     3             0.7      1.1             2.9      5.3
                                    1.5     5             1.4      2.1             4.5      7.4
                                    2.5     7             2.3      3.2             6.1      9.4
L         3201–10,000    200        0.4     2             0.2      0.4             1.3      2.7
                                    0.65    3             0.4      0.7             1.8      3.3
                                    1.0     5             0.9      1.3             2.8      4.6
                                    1.5     7             1.5      2.0             3.8      5.9
                                    2.5     10            2.4      3.1             5.3      7.7

Source: Modified from Anon (2001b).
(a) Chance of accepting lots of different quality (expressed as % individual items lying outside specification) for different sample sizes and acceptance criteria. The lot is accepted if c or fewer defectives are found in a sample of size n.
TABLE 5.5 Standard Plans for Attribute Sampling – Single Sampling Plans for Normal Inspection (Master Table)

[Master table giving, for sample size code letters A to R (sample sizes 2 to 1250), the acceptance number (Ac) and rejection number (Re) for acceptable quality levels (normal inspection) from 0.010 to 1000. Where an arrow is shown, the first sampling plan below (or above) the arrow is used; if the sample size equals, or exceeds, the lot or batch size, 100 percent inspection is carried out. Ac = acceptance number; Re = rejection number.]

Source: Permission to reproduce extracts from BS 6001-1:1999 is granted by BSI. British Standards can be obtained in PDF format from the BSI online shop: http://www.bsi-global.com/en/Shop or by contacting BSI Customer Services for hardcopies: Tel +44 (0)20 8996 9001, E-mail: [email protected].
TABLE 5.6 Standard Plans for Attribute Sampling – Double Sampling Plans for Normal Inspection (Master Table)

[Master table giving, for sample size code letters A to R, the first and second sample sizes, the cumulative sample size, and the acceptance (Ac) and rejection (Re) numbers at each stage for acceptable quality levels (normal inspection) from 0.010 to 1000. Where an arrow is shown, the first sampling plan below (or above) the arrow is used; where no double sampling plan is available, the corresponding single sampling plan is used; if the sample size equals, or exceeds, the lot or batch size, 100 percent inspection is carried out. Ac = acceptance number; Re = rejection number.]

Source: Permission to reproduce extracts from BS 6001-1:1999 is granted by BSI. British Standards can be obtained in PDF format from the BSI online shop: http://www.bsi-global.com/en/Shop or by contacting BSI Customer Services for hardcopies: Tel +44 (0)20 8996 9001, E-mail: [email protected].
ACCEPTANCE SAMPLING BY ATTRIBUTES

The risk of accepting an unsatisfactory 'lot', or of rejecting a satisfactory 'lot', leads to another aspect of sampling theory known as Acceptance Sampling. Any manufacturer producing food (or other goods) to an acceptable average quality will have variations between 'lots'. The purchaser decides whether to accept or reject the 'lot' on the basis of tests on a random sample, or a series of random samples. Two types of error can result from decisions based on the results from the test sample(s):

1. a 'good lot' may be rejected (a Type I or 'α' error), or
2. a 'bad lot' may be accepted (a Type II or 'β' error).

Assume that the null hypothesis (H0) for acceptance of a batch of product is that it is not contaminated by specific pathogens; then, by default, the alternative hypothesis (H1) is that the product is contaminated. If there is evidence of contamination, then we must reject H0 and accept H1. However, although the result that led to a decision to accept or reject a lot may be genuine, there is a risk that it might be a false negative or false positive result. In setting up an attributes sampling plan it is normal to allow for a producer's risk that acceptable product may be wrongly rejected: this is a Type I error. Conversely, there is a consumer's risk that a product that is accepted is not in fact acceptable. The null hypothesis for rejection (H0) is that the product is contaminated; H1 is that the product is not contaminated. So, if a contaminated batch of product is accepted, the decision error is known as a Type II error. The α and β values for establishing the Type I and Type II limits are statistical probabilities of the normal distribution (Fig. 5.1). The aim of Acceptance Sampling is to reduce both risks to a minimum, but without the need for excessive sampling. Two expressions commonly used are Acceptance Quality Level (AQL) and Rejectance Quality Level (RQL) (or the Lot Tolerance Percent Defective
FIGURE 5.1 The normal distribution curve showing the upper (α = 0.05) and lower (β = 0.10) CLs used in setting Acceptable Quality and Rejectable Quality Limits. Note that 95% of the cumulative distribution lies to the left of the α value and that 90% of the cumulative distribution lies to the right of the β value.
(LTPD)). The AQL defines the quality of good 'lots' that the purchaser is prepared to accept most of the time (normally set at P = 0.95). Hence the Type I error (producer's risk) of rejecting good quality 'lots' is 5%. The LTPD denotes the quality of poor 'lots' that the purchaser wishes to reject as often as possible, normally set at P = 0.90; hence the Type II error (consumer's risk) is 10%. Consignments of intermediate quality will be accepted at a frequency somewhere between P = 0.10 and P = 0.95. The balancing of producer and consumer risk is of critical importance. For the producer, if a 'good' lot is wrongly rejected, he loses both the cost of production and the potential profit from sales; however, if a 'bad' lot is accepted, the producer's risks may include the costs of product recall, including the value of the recalled product, loss of reputation and legal costs arising from consumer and customer litigation, whilst the consumer will be dissatisfied with the product received.
Two-Class Plans

Tables based on the binomial and Poisson distributions showing the size of sample and the acceptable number of defectives for different values of AQL and LTPD have been published, for example, the ISO 2859 series (Anon., 2006a) and the US MIL-STD-105D series (Anon., 2001b). Tables such as these, of which examples are shown in Tables 5.4 and 5.5, were derived primarily for acceptance of goods by inspection, although the term 'inspection' includes examination and testing; it is important to note the large numbers of samples required for statistically viable sampling schemes. Generally, the AQL is designated in a purchase contract or some other documentation. The AQL value depends on the sample size, being higher for large samples than for small ones, but the AQL does not describe the level of protection given to the consumer for individual lots; it relates more directly to what might be expected from a series of lots. It is necessary to refer to an operating characteristics (OC) curve (e.g. Fig. 5.2) of the sampling plan in order to determine the relative risks attached to the plan in terms of both producer's risk and consumer's risk. Generally the sampling plan varies according to 'lot' size, so that more samples are taken from larger 'lots' in order to reduce the risk of accepting a 'lot' with a high proportion of poor quality products. For instance, it can be seen from Table 5.4, sampling plan code J, that for a 'lot' containing between 501 and 1200 units, 80 sample units should be examined. Assuming an AQL of 1.0%, the 'lot' would be accepted if not more than two units out of the 80 were found to be defective; this critical acceptance number is termed c. At such a level, there would still be a chance that defective items would occur in the accepted 'lot', to the extent that 1.03% defectives would be accepted with a probability of 0.95 (i.e. the 5% producer's risk) and that 6.52% defectives would be accepted with a probability of 0.10 (i.e. the 10% consumer's risk). Table 5.5 shows the critical (c) levels for acceptance and rejection by normal inspection for various sample sizes and different levels of AQL. Again, taking sample size code J as an example, with 80 samples examined (n), an AQL of 10% would permit acceptance with not more than 14 defective units (c) and would require rejection with 15 defective units.
FIGURE 5.2 Operating characteristics (OC) curves for typical attribute sample plans used in microbiological criteria for foods, plotting the probability of acceptance of the lot (Pa), and the complementary probability of rejection (Pr), against the percentage of defectives in the lot for the plans n = 5, c = 2; n = 10, c = 2; n = 10, c = 0; and n = 20, c = 0; n = number of samples tested and c = number of unsatisfactory test results that can be accepted. Note that the gradient of the lines increases as n increases and as c becomes smaller.
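Points on such OC curves are simple cumulative binomial probabilities; the Python sketch below (an illustration only) evaluates the plans of Fig. 5.2 and checks the producer's and consumer's risk points quoted above for plan J (n = 80, c = 2).

# Sketch: OC-curve points and a check on plan J of Table 5.4 (illustrative only).
from math import comb

def p_accept(n, c, p):
    """Probability of accepting the lot: c or fewer defectives in n sample units."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

for n, c in ((5, 2), (10, 2), (10, 0), (20, 0)):
    pa = [round(p_accept(n, c, d / 100), 2) for d in (5, 10, 20, 30, 50)]
    print(f"n={n}, c={c}: Pa at 5, 10, 20, 30, 50% defectives = {pa}")

print(round(p_accept(80, 2, 0.0103), 3))   # ~0.95 (5% producer's risk)
print(round(p_accept(80, 2, 0.0652), 3))   # ~0.10 (10% consumer's risk)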
However, for an AQL of 2.5%, c = 5 and for an AQL of 0.25%, c = 0. These single sampling plans could therefore be described as follows:

for AQL = 10%:    n = 80, c = 14
for AQL = 2.5%:   n = 80, c = 5
for AQL = 0.25%:  n = 80, c = 0
TABLE 5.7 Percentage of Defectives in Consignments Which Would Have a 95% Chance of Acceptance (AQL) and a 90% Chance of Rejection (LTPD) by Different Sampling Schemes

                          % Defectives in consignment when:
                          c(a) = 0                              c(a) = 1
Number of sample    95% Chance of     90% Chance of     95% Chance of     90% Chance of
units (n)           acceptance        rejection         acceptance        rejection
10                  0.5               25                3.5               39
20                  0.3               12                1.8               20
30                  0.2                8                1.2               13
50                  0.1                5                0.7                8
100                 0.05               2.5              0.35               4
150                 0.04               1.7              0.23               2.6
200                 0.03               1.2              0.18               2.0
500                 0.01               0.5              0.07               0.8

Source: Reproduced from Steiner, 1971 by permission of Leatherhead Food International.
(a) c = maximum acceptable number of defectives.
n1 50, c1 7, c2 11 n 100, c 18
Table 5.7 illustrates the 95% probability for acceptance (5% AQL) and 90% probability for rejection (10% LTPD) for different sample numbers and two levels of maximum acceptable number of defectives based on Poisson distribution. The figures can be used also to derive the upper boundaries describing the number of specific organisms in a given quantity of a product. For instance, if the sampling scheme required testing of 10 sample units (n 10) with no salmonellae detected (c 0) then the ‘lot’ would be accepted with a 95% probability that the prevalence of contamination in the ‘lot’ is less than 0.5% and rejected with a 90% probability that the prevalence of contamination is 25% or more. However, even if no salmonellae were detected, there would be a 5% probability that the prevalence of
contamination would be between 0.5% and 25%. If we assume that a positive test requires the presence of at least one organism per sample and that the sample unit is 25 g, then a positive sample corresponds to at least 1 organism/25 g = 0.04 organisms/g = 40 organisms/kg. Now if 25% of the 25 g sample units are positive, then on average the overall contamination level would be at least 10 organisms/kg (compare with Example 8.5, which assumes a binomial distribution and determines the maximum level as 10.4 salmonellae/kg). There is also a 10% probability that a rejected ‘lot’ may contain more than 25% prevalence of contamination.

Internationally agreed statistical sampling tables are available for Attributes Sampling schemes and for Variables Sampling schemes with either known or unknown standard deviations. They are intended for use in conjunction with continuous production and provide the facility to switch from normal to tightened or reduced inspection based on previous performance. International standards exist for other criteria, for instance Sequential Attribute Sampling Plans according to defined AQL (Anon., 2005a), Attribute Sampling Plans for skip-lot procedures (Anon., 2005b) and Attribute Sampling Plans for limiting AQL for isolated lots (Anon., 2005c). Other standard tables exist also for specific products, for example, milk and milk products (Anon., 2004).

However, the schemes are basically suitable only for inspection-type tests and, therefore, whilst of value to the microbiologist in field studies (e.g. in assessing the quality of sacks of stored grain with respect to overt mould or insect damage, or the incidence of bottles of a beverage showing evidence of microbial growth), the sheer quantity of testing required would totally swamp any laboratory facility. They serve, therefore, as ideals towards which we might aim if effective rapid automated microbiological testing becomes better established.
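The Poisson reasoning behind figures of this kind is easy to reproduce approximately. The sketch below is my own illustration (simple bisection on the Poisson acceptance probability, not the method used to compile Table 5.7); it finds, for c = 0 plans, the percentage defective giving a 95% chance of acceptance and that giving a 90% chance of rejection, and the results should agree roughly with the tabulated values after rounding.

```python
from math import exp

def p_accept_poisson(n, c, p):
    """Poisson approximation to the probability of acceptance for a 2-class plan:
    accept if at most c of the n sample units are positive (expected count n*p)."""
    lam = n * p
    term = total = exp(-lam)
    for i in range(1, c + 1):
        term *= lam / i
        total += term
    return total

def solve_p(n, c, target_pa, lo=1e-9, hi=1.0):
    """Bisect for the proportion defective at which the acceptance probability
    equals target_pa (Pa decreases as the proportion defective increases)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if p_accept_poisson(n, c, mid) > target_pa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (10, 20, 50, 100, 500):
    aql = 100 * solve_p(n, 0, 0.95)    # 95% chance of acceptance
    ltpd = 100 * solve_p(n, 0, 0.10)   # 90% chance of rejection
    print(n, round(aql, 2), round(ltpd, 1))
```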
Three-Class Plans

In precisely the same way that 2-class plans are constructed for various AQL and LTPD levels based on the binomial or Poisson distributions, 3-class plans can be derived from the trinomial distribution. Examples of such plans are cited by Bray et al. (1973) and by ICMSF (1986, 2002). Since 3-class plans are used only for microbiological colony count data (and not for presence/absence tests), the criteria for acceptable or defective samples are defined in terms of two colony count levels. The lower criterion (designated m) is the target level below which colony counts should fall most of the time; the upper criterion (M) is that level which must not be exceeded. Hence marginal defectives will have counts intermediate between m and M. Since no colony count should exceed M, the maximum acceptable number of defective units (c2) will be zero, but some marginal defective units (c1) will be permitted out of the n samples tested. Hence, the acceptable proportion of defectives is p2 = 0 and the proportion of marginal defectives is p1 = c1/n. The probability for acceptance of samples (Pa) can then be derived from a simplification of the trinomial expansion, for values of i from 0 to c1:

Pa = Σ (from i = 0 to c1) [n!/(i!(n − i)!)] p0^(n−i) p1^i
where p0 = proportion of good samples, p1 = proportion of marginal samples, n = number of sample units tested and c1 = number of marginal defectives permitted. This has the same form as a binomial expansion but is not a true binomial since p0 + p1 need not equal 1. Values of Pa can be calculated from the equation:

Pa = {Σ (from i = 0 to c1) [n!/(i!(n − i)!)] [p0/(p0 + p1)]^(n−i) [p1/(p0 + p1)]^i} × (p0 + p1)^n

The expression inside the braces { } is a cumulative binomial term which can be read directly from standard tables of the binomial distribution (Pearson and Hartley (1976), table 37). Example 5.3 illustrates the derivation of data for a 3-class sampling plan.
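A hedged sketch of the same calculation in code (my own, not from the text): with c2 = 0 the trinomial sum reduces to the truncated binomial-type sum above, and the final line checks the plan discussed later in this chapter (n = 10, c1 = 2, 90% good, 10% marginal, 0% defective), which should give Pa close to 0.93.

```python
from math import comb

def p_accept_three_class(n, c1, p0, p1):
    """Acceptance probability for a 3-class plan with c2 = 0: every one of the n
    units must be free of true defectives and at most c1 may be marginal, where
    p0 and p1 are the per-unit probabilities of 'good' and 'marginal' results."""
    return sum(comb(n, i) * p0**(n - i) * p1**i for i in range(c1 + 1))

print(round(p_accept_three_class(10, 2, 0.90, 0.10), 2))  # ~0.93
```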
EXAMPLE 5.3 DERIVATION OF PROBABILITIES FOR ACCEPTANCE (Pa)

Suppose that it is desirable to assess the probability for acceptance (Pa) of a batch of product expected to have up to 2% defective items and up to 30% marginal defectives. The proposed sampling plan permits acceptance of no defective units (c2 = 0) in 10 sample units (n = 10). What would be the probability of acceptance for sample plans that permit 0, 1 or 2 marginal defectives (c1 = 0, 1 or 2)?

Firstly, let us assume that the test methods can determine only acceptable or unacceptable items (i.e. a 2-class sample plan). The binomial probability that an event will occur x times out of n tests is given by:

P(x) = [n!/((n − x)!x!)] p^x (1 − p)^(n−x)

where p is the probability that the event will occur. An estimate of p (p̂) can be derived from the expected 2% incidence of defectives, so that p̂ = 0.02. Then, for c2 = 0:

Pa = [10!/((10 − 0)!0!)] × 0.02^0 × (1 − 0.02)^(10−0) = 0.98^10 = 0.82

Hence, on average, we would expect that for 10 replicate sample units, no defective units would be found in 82 tests out of every hundred undertaken (cf. Table 5.1).

Next, let us consider the occurrence of marginal defectives at levels up to 30% but with zero defectives. The proportion of defectives is taken as 0.00, and the proportion of marginal defectives (p̂1) as 0, 0.10, 0.20 or 0.30. From the binomial expansion used above, or from standard Tables of Binomial Frequencies (e.g. Pearson and Hartley (1976), table 37), we can derive the following values for the probability of occurrence of
0, 1 or 2 ‘defectives’ in a sample of 10 units, where ‘defectives’ are now defined as ‘marginal defectives’:

                                       Probability of occurrence of ‘defectives’ (Px) with a
                                       proportion of marginal defectives (p̂1) of:
Number of ‘defectives’
in sample (c1)                         0.0      0.10     0.20     0.30
P(x = 0)          0                    1.0      0.35     0.11     0.03
P(x = 1)          1                    0.0      0.39     0.27     0.12
P(x = 2)          2                    0.0      0.19     0.30     0.23
Cumulative probability
P(x ≤ 1)                               1.0      0.74     0.38     0.15
P(x ≤ 2)                               1.0      0.93     0.68     0.38
Hence, if n = 10 and c1 = 0, the probabilities of acceptance of a batch with 0% defectives and 0, 10, 20 or 30% marginal defectives would be 1.0, 0.35, 0.11 or 0.03, respectively. But if n = 10 and c1 = 2, the cumulative probabilities of acceptance (Pa) for these levels of marginal defectives would be 1.0, 0.93, 0.68 and 0.38, respectively. Now, it has already been shown above that if the batch contains 2% defectives and 0% marginal defectives the probability of acceptance, with c2 = 0, is Pa = 0.82. Hence, for a batch containing 2% defectives and 30% marginal defectives, the cumulative probability of acceptance is given by:

Pa = 0.82 × 0.03 = 0.025, if c1 = 0 and c2 = 0; and
Pa = 0.82 × 0.38 = 0.31, if c1 = 2 and c2 = 0
By extending this procedure for other values we can derive the required Pa values for the 3-class plans given below and in Table 5.8.
Proportion of marginal defectives     Probability of acceptance (Pa) with c2 = 0, and
(expected defectives 2%)              c1 = 0       c1 = 1       c1 = 2
0.0                                   0.82         0.82         0.82
0.1                                   0.29         0.61         0.76
0.2                                   0.09         0.31         0.56
0.3                                   0.025        0.12         0.31
Table 5.8 shows that, for n = 10, the probability of acceptance of a ‘lot’ containing 2% defectives and 20% marginal defectives is 0.55 when c1 = 2, but only 0.09 when c1 = 0. The sampling plan is more stringent when c1 = 0, but there is an equal chance of accepting a ‘lot’ with 5% defective units and 0% marginal defectives with either c1 = 0 or c1 = 2. In setting a 3-class plan for a specified AQL, it is necessary to decide on an arbitrarily defined probability of acceptance (Pa) of, say, 95% for ‘lots’ of specific quality (e.g. 90% good, 10% marginal, 0% bad) and then to choose a provisional sample size n (e.g. n = 10). From Table 5.9, for n = 10, p0 = 0.90, p1 = 0.10, p2 = 0, we find Pa = 0.93 for c1 = 2. Therefore, an appropriate sampling plan would be n = 10, c1 = 2. Such a plan would also accept ‘lots’ with 50% marginal defectives (or 45% marginal and 5% defectives) with Pa = 0.05, and would accept 60% marginal defectives (or 40% marginal defectives and 20% defectives) with Pa = 0.01. Table 5.10 provides an AQL chart for 3-class sample plans based on Bray et al. (1973). Probabilities marked * indicate suitable plans for an AQL with a probability of acceptance of about 95% and those marked ** indicate the equivalent cut-off point for an LTPD of 10%.
TABLE 5.8 Probability of Acceptance for a 3-Class Plan with N = ∞, n = 10, c1 = 0 or 2 and c2 = 0(a)

Marginal defectives                  Defectives in lot %
in lot %                 0      2      5      10     20     30     40     50
c1 = 2
70                       0.00   0.00   0.00   –      –      –      –      –
60                       0.01   0.01   0.01   0.00   0.00   –      –      –
50                       0.05   0.04   0.03   0.02   0.01   –      –      –
40                       0.17   0.14   0.10   0.06   0.02   0.00   –      –
30                       0.38   0.31   0.23   0.13   0.04   0.01   –      –
20                       0.68   0.55   0.41   0.24   0.07   0.02   0.00   –
10                       0.93   0.76   0.56   0.32   0.10   0.03   0.01   –
0                        1.00   0.82   0.60   0.35   0.11   0.03   0.01   0.00
c1 = 0
50                       0.00   –      –      –      –      –      –      –
40                       0.01   0.00   0.00   0.00   –      –      –      –
30                       0.03   0.02   0.02   0.01   0.00   –      –      –
20                       0.11   0.09   0.07   0.04   0.01   0.00   –      –
10                       0.35   0.29   0.21   0.12   0.04   0.01   0.00   –
0                        1.00   0.82   0.60   0.35   0.11   0.03   0.01   0.00

a N = lot size, n = sample size, c2 = number of defective units permitted and c1 = number of marginally defective units permitted.
TABLE 5.9 Probabilities of Acceptance for 3-Class Plans (N = ∞) Using c2 = 0 and c1 = Number of Marginally Defective Samples Acceptable (Sample size n = 10)
% of lot Good 99 99 97 97 95 95 90 90 90 80 80 80 80 80 70 70 70 70 60 60 60 60 60 50 50 50 50 50 50 40 40 40 40 30 30 30 30 20 20 20
Marga
Bad
0
1
2
3
4
5
6
7
8
9
10
1 0 3 1 5 3 10 8 5 20 18 15 10 5 30 25 20 10 40 35 30 20 10 50 45 40 30 20 10 60 50 40 30 70 60 50 40 80 70 60
0 1 0 2 0 2 0 2 5 0 2 5 10 15 0 5 10 20 0 5 10 20 30 0 5 10 20 30 40 0 10 20 30 0 10 20 30 0 10 20
0.90 0.90 0.74 0.74 0.60 0.60 0.35 0.35 0.35 0.11 0.11 0.11 0.11 0.11 0.03 0.03 0.03 0.03 0.01 0.01 0.01 0.01 0.01 0.00
1.00 … 0.97 0.81 0.91 0.79 0.74 0.66 0.54 0.38 0.35 0.31 0.24 0.17 0.15 0.13 0.11 0.07 0.05 0.04 0.04 0.03 0.02 0.01 0.01 0.01 0.01 0.00 0.00 0.00
… … 1.00 0.82 0.99 0.81 0.93 0.78 0.59 0.68 0.59 0.48 0.32 0.19 0.38 0.29 0.21 0.09 0.17 0.13 0.10 0.06 0.02 0.05 0.05 0.04 0.02 0.01 0.00 0.01 0.01 0.01 0.00 0.00
… … … … 1.00 0.82 0.99 0.81 0.60 0.88 0.74 0.56 0.34 0.20 0.65 0.45 0.29 0.10 0.38 0.28 0.20 0.08 0.03 0.17 0.13 0.10 0.05 0.02 0.01 0.05 0.03 0.02 0.01 0.01 0.01 0.00 0.00 0.00
… … … … … … 1.00 0.82 … 0.97 0.80 0.59 0.35 … 0.85 0.54 0.33 0.11 0.63 0.42 0.27 0.10 … 0.38 0.27 0.18 0.07 0.02 … 0.17 0.09 0.04 0.02 0.05 0.03 0.01 0.01 0.01 0.00 0.00
… … … … … … … … … 0.99 0.81 0.60 … … 0.95 0.58 0.34 … 0.83 0.53 0.32 0.11 … 0.62 0.41 0.26 0.09 0.03 … 0.37 0.17 0.07 0.02 0.15 0.07 0.03 0.01 0.03 0.02 0.01
… … … … … … … … … 1.00 0.82 … … … 0.99 0.60 0.35 … 0.95 0.58 0.34 … … 0.83 0.52 0.32 0.10 … … 0.62 0.25 0.09 0.03 0.35 0.15 0.06 0.02 0.12 0.06 0.02
… … … … … … … … … … … … … … 1.00 … … … 0.99 0.59 0.35 … … 0.95 0.58 0.34 0.11 … … 0.83 0.31 0.10 … 0.62 0.24 0.08 0.02 0.32 0.14 0.05
… … … … … … … … … … … … … … … … … … 1.00 0.60 … … … 0.99 0.59 0.35 … … … 0.95 0.34 0.11 … 0.85 0.31 0.10 0.03 0.62 0.24 0.08
… … … … … … … … … … … … … … … … … … … … … … … 1.00 0.60 ··· … … … 0.99 0.35 … … 0.97 0.34 0.11 ··· 0.89 0.32 0.10
1.00 0.90 1.00 0.82 1.00 0.82 1.00 0.82 0.60 1.00 0.82 0.60 0.35 0.20 1.00 0.60 0.35 0.11 1.00 0.60 0.35 0.11 0.03 1.00 0.60 0.35 0.11 0.03 0.01 1.00 0.35 0.11 0.03 1.00 0.35 0.11 0.03 1.00 0.35 0.11
Source: Reproduced with permission from Bray et al., 1973.
a Marg = marginal defectives.
TABLE 5.10 Probabilities of Acceptance for 3-Class Plans, Chosen So That AQL ≈ 95% (*), LTPD ≈ 10% (**) and c2 = 0

Percent of lot                 Sample size (n):                3      10     25     60
                               Maximum number of marginal
Good    Marginal    Bad        defectives (c1) acceptable:     1      2      5      9
99      1           0                                          1.00   1.00   1.00   1.00
95      5           0                                          0.99   0.99   1.00   1.00
90      10          0                                          0.97*  0.93*  0.97*  0.93*
80      20          0                                          0.90   0.68   0.62   0.21
70      30          0                                          0.78   0.38   0.19   0.01
60      40          0                                          0.65   0.17   0.03   0.00**
50      50          0                                          0.50   0.05*  0.00** –
40      60          0                                          0.35   0.01   –      –
30      70          0                                          0.22   0.00   –      –
99      0           1                                          0.97   0.90   0.78   0.55
90      9           1                                          0.95   0.85   0.75   0.52
80      19          1                                          0.88   0.63   0.52   0.14
60      30          1                                          0.64   0.16   0.02   0.00
40      59          1                                          0.35   0.01   0.00   –
98      0           2                                          0.94   0.82   0.60   0.30
90      8           2                                          0.92   0.78   0.60   0.29
80      18          2                                          0.86   0.59   0.42   0.09
Tables of the various probabilities for different proportions of good, marginal and defective sample units, and various sample plans (i.e. values of n and c) are given by Bray et al. (1973) and by ICMSF (1986, 2002).
Operating Characteristic Curves

A convenient way of looking at the relative efficiencies of various sampling plans is to construct a series of OC curves. An OC curve is a graphical representation of the cumulative probability of acceptance (Pa) against the percentage of defective sample units in the ‘lot’. This is illustrated in Fig. 5.2 for various 2-class sampling plans. The steepness of the OC curve increases with both an increase in sample numbers (n) and a reduction in acceptance numbers (c).
FIGURE 5.3 An operating characteristics (OC) surface plot for a 3-class attributes plan with n = 10, c1 = 2 and c2 = 0 (data from Table 5.8), where n = number of samples tested, c1 = number of acceptable marginal defectives and c2 = number of defectives permitted; the probability of acceptance (Pa) or rejection (Pr) is plotted against the percentage of defectives and the percentage of marginal defectives in the lot. The right-hand OC plot gives the values for a 2-class plan with n = 10, c2 = 0 and 0% marginal defectives. The left-hand OC plot is for a 2-class plan with n = 10, c1 = 2 and 0% defectives. The other curves are for different values of prevalence of defectives plotted against the prevalence of marginal defectives. The ‘surface’ of the plot shows the probability of acceptance (Pa) or rejection (Pr) for various combinations of prevalence of defective and marginally defective samples.
For instance, a sampling plan of n = 20, c = 0 would accept 5% defectives 36 times in 100 tests, whereas a plan with n = 10, c = 0 would accept the same level of defectives in 60 out of 100 tests. For these same sampling plans and percentage ‘lot’ defectives the probabilities of rejection (Pr) would be 64 and 40 times out of 100 tests, respectively. However, for plans that allow two defective units out of 5 or 10 sample units, there is little difference in Pa at 5% ‘lot’ defectives. The ideal OC curve would have a vertical ‘cut-off’ at the acceptable percentage of defectives in a ‘lot’. Such a curve would require the testing of so many samples per ‘lot’ that it would be both economically and technically impracticable. OC curves for 3-class plans can be considered to form a geometric surface on a 3-dimensional ‘plot’ of Pa (or Pr) against the percentage of ‘lot’ defectives and the percentage of ‘lot’ marginal defectives. This is illustrated in Fig. 5.3 for the sampling plan n = 10, c1 = 2, c2 = 0, the acceptance probability (Pa) values of which are presented in Table 5.8.
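Generating the points of a 2-class OC curve needs only the cumulative binomial; the sketch below (my own illustration, not from the text) prints a curve for n = 20, c = 0 and confirms the two acceptance figures quoted above for 5% lot defectives.

```python
from math import comb

def p_accept(n, c, p):
    """OC-curve ordinate for a 2-class plan: P(at most c defectives in n units)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c + 1))

# OC curve for n = 20, c = 0 at 0-70% defectives in the lot
print([(d, round(p_accept(20, 0, d / 100), 3)) for d in range(0, 71, 10)])

print(round(p_accept(20, 0, 0.05), 2))  # ~0.36: accepted about 36 times in 100
print(round(p_accept(10, 0, 0.05), 2))  # ~0.60: accepted about 60 times in 100
```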
ACCEPTANCE SAMPLING BY VARIABLES

As is the case for Attributes Sampling, international standards have been produced for Variables Sampling Plans (see, for instance, Anon., 2007) but they are rarely used in assessment of data from the microbiological examination of food. Sampling plans based on such tables are analogous to those in Tables 5.4–5.7, but relate acceptable quality levels to the mean and standard deviation of populations. A major disadvantage of Attributes Sampling schemes is that no assumptions are made about the distribution of measurable parameters within a ‘lot’. Whilst this is of no consequence in schemes based purely upon inspection for defects, the use of Attributes schemes based upon the acceptability, or otherwise, of estimates of variable parameters will result in excessive risk of rejection of otherwise acceptable materials. Indeed, Clark (1978) pointed out that in using an ICMSF sampling plan for n = 5, c = 1 with a ‘lot’ containing no defective material and only 10% marginal defective units (i.e. with 10% of counts between m and M) there is a 1 in 12 chance of rejection of the ‘lot’. Clearly, such a risk is unacceptably high and could result in the wastage of much valuable foodstuff. The primary advantage of a Variables Sampling scheme is that proper use of quantitative data should permit ‘better decision processes’ and lead to ‘economic benefits’ (Gascoigne and Hill, 1976). The primary disadvantages are claimed to be: (1) the need for analysis of laboratory data, rather than merely making a ‘go – no go’ decision; and (2) an absolute requirement that the data obtained for the parameter being estimated conform to a normal frequency distribution. Neither of these disadvantages is real! The widespread availability of low-cost computing facilities provides the means for data analysis as part of laboratory quality assurance procedures, and microbiological colony count data can be ‘normalized’ by transformation. If one assumes that a set of colony counts on a ‘lot’ of food follows a contagious distribution with variance greater than the mean (i.e. s² > x̄), as is generally the case, then
these data may be ‘normalized’ by a logarithmic transformation (Table 3.4). It is generally the case that microbiological colony counts conform at least approximately to a log-normal distribution (Fig. 2.1; also Kilsby et al., 1979). Hence, Variables Sampling schemes can be applied in microbiological control systems where colony counts are determined. However, presence–absence tests cannot be treated in this way unless sufficient replicates are tested to provide an estimate of Most Probable Numbers (see Chapter 8).

How, then, is a variables scheme formulated? A variables scheme depends on defining a percentage, d, of units in a ‘lot’ that exceeds a defined critical limit in order to assess whether the ‘lot’ is unsatisfactory. If the critical limit is defined in terms of a log colony count (C) then, for a set of n samples, we can determine whether the percentage of values exceeding C exceeds the permissible level (d). For a normal distribution, the percentage exceeding the value C depends solely on the population distribution and is determined by the standardized deviate K = (C − μ)/σ, where μ = the population mean and σ = the population standard deviation. By reference to tables of the standard normal deviate (Pearson and Hartley, 1976; table 1) the percentage beyond the value C can be determined for any value of K. For example, 16% of values greater than C would correspond to K = 1 and 5% of values above C corresponds to K = 1.65. Conversely, the critical value of K (Kd) corresponding to d may be determined; for example, for d = 5%, Kd = (C − μ)/σ = 1.65. Low values of K imply a high percentage of values above C, so that rejection may be warranted.

In most food situations the values of μ and σ will not be known, but they can be estimated from the sample data by the statistics for the mean (x̄) and standard deviation (s). Hence, an estimate of K (K̂) can be derived from K̂ = (C − x̄)/s. Since the estimate (K̂) of K is as likely to exceed Kd as it is to fall below it, some allowance must be made for the imprecision of K̂. This is achieved by determining a value of K such that the probability that K̂ ≤ K is at least P when the true K ≤ Kd, where P is the desired lowest probability for rejection. Values of K may be calculated for particular situations (Bowker, 1947; Kilsby et al., 1979) or may be obtained from standard tables (Bowker and Goode, 1952, table K). However, Malcolm (1984) showed that, for low values of n, the values of K derived by Kilsby et al. (1979) were not sufficiently precise, since they were derived by a mathematical approximation procedure. Use of the non-central t distribution for the calculation permitted derivation of more precise values for K (Table 5.11). The acceptance criterion with an upper critical level C is given by x̄ + Ks ≤ C for an acceptance probability of P, and the rejection criterion by x̄ + Ks > C, as illustrated in Fig. 5.4.

Kilsby et al. (1979) recommended that the scheme be used for control purposes in relation to both safety and quality specifications, in a manner analogous to the ICMSF attributes scheme, but with lower producer’s risk. It may be used also to ensure compliance with Good Manufacturing Practices (GMP) by the introduction of a lower critical limit (Cm) (Table 5.11). The use of such a scheme is illustrated in Example 5.4. Regrettably, this approach to sampling criteria has not been accepted widely, possibly since it involves replicate tests and calculation of statistical values rather than just mindless comparison of the results of tests with defined parameters.
TABLE 5.11 Values of K, Calculated Using the Non-Central t Distribution, for Use in Setting Specifications for Variables Sampling. (a) Safety/Quality Specification (reject if x̄ + Ks > C). Probability of rejection (Pr)
Proportion exceeding C (d)
Number of replicates (n) 3
4
5
6
7
8
9
10
4.4 3.3 2.5 1.9 1.4
3.9 2.9 2.2 1.6 1.2
3.5 2.6 2.0 1.5 1.1
3.2 2.4 1.8 1.4 1.0
3.0 2.2 1.7 1.3 0.9
0.99
0.1 0.2 0.3 0.4 0.5
2.3
5.4 4.0 3.0 2.3 1.7
0.95
0.05 0.1 0.3 0.5
7.7 6.2 3.3 1.7
5.1 4.2 2.3 1.2
4.2 3.4 1.9 0.95
3.7 3.0 1.6 0.82
3.4 2.8 1.5 0.73
3.2 2.6 1.4 0.67
3.0 2.4 1.3 0.62
2.9 2.4 1.3 0.58
0.90
0.1 0.25
4.3 2.6
3.2 2.0
2.7 1.7
2.5 1.5
2.3 1.4
2.2 1.4
2.1 1.3
2.1 1.3
10
(b) GMP(a) limit (accept if x̄ + Ks < Cm). Probability of acceptance (Pa)
Proportion exceeding Cm d
Number of replicates (n) 3
4
5
6
7
8
9
0.95
0.10 0.20 0.30
0.33 0.13 0.58
0.44 0.02 0.36
0.52 0.11 0.24
0.57 0.17 0.16
0.62 0.22 0.10
0.66 0.26 0.06
0.69 0.29 0.02
0.71 0.32 0.00
0.90
0.05 0.10 0.20 0.30 0.40 0.50
0.84 0.53 0.11 0.26 0.65 1.09
0.92 0.62 0.21 0.13 0.46 0.82
0.98 0.68 0.27 0.05 0.36 0.69
1.03 0.72 0.32 0.01 0.30 0.60
1.07 0.75 0.35 0.04 0.25 0.54
1.10 0.78 0.38 0.07 0.21 0.50
1.12 0.81 0.41 0.10 0.17 0.47
1.14 0.83 0.43 0.12 0.16 0.44
0.75
0.01 0.05 0.10 0.25 0.50
1.87 1.25 0.91 0.31 0.47
1.90 1.28 0.94 0.35 0.38
1.92 1.31 0.97 0.38 0.33
1.94 1.33 0.99 0.41 0.30
1.96 1.34 1.01 0.42 0.27
1.98 1.36 1.02 0.44 0.25
2.00 1.37 1.03 0.45 0.24
2.01 1.38 1.04 0.46 0.22
Source: Reproduced from Malcolm, 1984, by permission of Blackwell Publishing.
a GMP = Good Manufacturing Practice.
FIGURE 5.4 Variables sampling – rejection and acceptance criteria on a standards graph (reproduced from Kilsby et al., 1979, by permission of Blackwell Publishing). (a) Rejection: k is larger than K, such that the chance of rejection (response to the right-hand side of the line k) is increased (>50%). (b) Acceptance: k is smaller than K, such that the chance of acceptance (response on the left-hand side of the line k) is increased (>50%).
EXAMPLE 5.4 APPLICATION OF A VARIABLES SAMPLING SCHEME

Suppose: (1) that for a food product the absolute critical limit (C) that is not to be exceeded is 10^6 cfu/g (= M in ICMSF plans) and that under GMP a limit (Cm) of 10^5 cfu/g should be achievable; (2) that five samples are to be tested per lot; and (3) that the producer wishes to reject with a 95% probability any lots where the proportion (d) exceeding C is 10% or greater, and to accept with a 90% probability only those lots where the proportion which exceeds Cm is less than 20%.

The food safety specification requires rejection if x̄ + K1·s > C. From Table 5.11, for n = 5, P = 0.95 and d = 0.10: K1 = 3.4. For the safety criterion, reject the ‘lot’ if x̄ + 3.4s > 6.

The GMP specification requires acceptance if x̄ + K2·s < Cm and, from Table 5.11, for n = 5, P = 0.90 and d = 0.20: K2 = 0.27. Hence for the GMP criterion, accept if x̄ + 0.27s < 5.

Suppose that for two lots of product the following data are obtained:

Lot    Log10 cfu/g                        Mean (x̄)   SD (s)
A      4.52; 4.28; 4.79; 4.91; 4.50       4.60        0.25
B      4.98; 5.02; 5.28; 4.61; 5.11       5.00        0.25
Then for Lot A:
x̄ + 3.4s = 4.60 + 3.4(0.25) = 5.45 < 6.0 (food safety criterion), and
x̄ + 0.27s = 4.60 + 0.27(0.25) = 4.67 < 5.0 (GMP criterion).

Hence, the product from Lot A would be acceptable on both safety/quality and GMP criteria. For Lot B:
x̄ + 3.4s = 5.0 + 3.4(0.25) = 5.85 < 6.0 (food safety criterion), and
x̄ + 0.27s = 5.0 + 0.27(0.25) = 5.1 > 5.0 (GMP criterion).

Hence, the product of Lot B would be unacceptable in terms of the GMP specification but would not be rejected on safety criteria.
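The decision rules of Example 5.4 are easily automated; the sketch below is my own illustration, with the K values simply taken from Table 5.11 as used above (the function and variable names are mine).

```python
from statistics import mean, stdev

def variables_decision(log_counts, k_safety=3.4, c_log=6.0, k_gmp=0.27, cm_log=5.0):
    """Apply the Example 5.4 criteria to replicate log10 counts:
    reject on safety if xbar + k_safety*s > C (log10 scale),
    and accept on GMP only if xbar + k_gmp*s < Cm."""
    xbar, s = mean(log_counts), stdev(log_counts)
    return {
        "mean": round(xbar, 2),
        "sd": round(s, 2),
        "reject_on_safety": xbar + k_safety * s > c_log,
        "acceptable_on_GMP": xbar + k_gmp * s < cm_log,
    }

print(variables_decision([4.52, 4.28, 4.79, 4.91, 4.50]))  # Lot A: passes both criteria
print(variables_decision([4.98, 5.02, 5.28, 4.61, 5.11]))  # Lot B: fails only the GMP criterion
```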
TABLE 5.12 Random Numbers Column number Row number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1–4
5–8
9–12
13–16
17–20
21–24
25–28
29–32
33–36
37–40
87 08 88 33 22 50 48 70 93 45 50 76 91 64 33 20 90 59 05 10 92 85 08 50 59 36 05 85 13 46 56 27 54 15 83 01 00 28 52 29 64 43 80 68 28 72 23 48 04 41
83 09 78 20 09 11 56 57 40 93 72 02 01 34 63 71 65 46 93 57 63 26 36 55 77 09 86 43 99 31 09 33 62 11 86 58 17 33 68 59 09 80 13 48 57 80 49 96 27 70
40 14 40 40 00 37 16 24 12 80 39 19 67 13 95 94 78 82 80 32 69 69 82 11 83 87 25 50 30 29 66 79 22 33 89 77 81 42 32 69 68 29 80 84 54 05 00 17 10 49
39 15 24 73 36 51 21 74 88 63 40 69 11 00 13 77 16 45 86 65 81 54 26 54 81 77 76 70 71 70 32 29 39 39 68 87 24 33 40 30 86 65 25 33 80 92 88 90 13 76
99 24 77 70 55 95 91 53 26 93 57 23 32 09 12 44 97 85 26 90 70 56 76 88 93 77 36 32 91 10 50 54 58 30 29 71 55 75 89 12 60 27 70 89 82 65 63 67 99 38
21 85 00 31 83 97 18 05 85 05 09 33 39 76 12 94 57 75 27 54 17 62 85 67 48 44 26 68 99 84 76 94 73 43 49 50 42 70 11 07 87 70 76 61 25 01 02 64 64 14
00 45 84 59 13 75 59 61 19 87 20 70 21 64 91 04 79 96 34 94 43 17 82 21 88 30 54 92 55 31 27 01 59 32 46 53 73 65 18 53 77 45 03 41 74 58 71 12 90 60
54 19 26 06 46 22 74 97 84 37 86 45 29 85 41 83 79 08 46 33 86 78 65 00 37 21 84 90 95 20 45 87 26 43 56 53 16 96 27 13 31 69 57 89 89 39 21 02 69 75
36 18 50 30 77 50 31 82 59 76 13 94 65 14 79 72 16 83 65 35 99 62 83 89 74 02 02 38 90 28 29 66 76 12 41 53 47 17 46 54 12 31 87 07 25 05 29 86 10 97
03 88 95 96 11 72 77 68 16 65 98 39 51 74 44 08 43 99 56 84 34 15 06 09 93 10 77 40 49 78 23 15 99 10 52 20 42 69 85 40 21 79 56 12 57 66 88 54 16 60
Statistical Considerations About Drawing Representative Samples

Samples should be representative of the ‘lot’ from which they are drawn, such that the quality of each individual sample is neither better nor worse than that of the overall population from which the samples were taken. In drawing samples to conform to the requirements of a sampling plan it is of prime importance to avoid bias. Random sampling is universally recognized as the way to avoid bias and is safer than deliberately aiming to draw samples from specific parts of a ‘lot’. Whilst there is no guarantee that the quality of sample units drawn randomly will be typical of the ‘lot’, we can at least be confident that there was no deliberate bias in the choice of samples; statistical logic demands randomization. The simplest way of doing this is to use random numbers drawn from standard tables (e.g. Table 5.12) or derived using a computer program. Each unit within a ‘lot’ is allocated a sequential number from 1 to N in any convenient sequence. Then, using a page of random numbers, select randomly a particular digit or series of digits (e.g. by bringing a pencil point randomly onto the page). The digit, or series of digits, nearest to the pencil point is selected as the first random number. Then follow down the column(s), taking each sequential number that lies within the range 1 to N, until sufficient numbers have been drawn for n sample units. This is illustrated in Example 5.5.
EXAMPLE 5.5 USE OF RANDOM NUMBER TABLES AND SAMPLE PLANS TO DRAW SAMPLES

Suppose that sacks of flour in a warehouse are to be examined for evidence of insect infestation and mould. The lot consists of 1400 sacks, on pallets with 20 sacks per pallet. From sampling tables (Table 5.4, assuming ‘normal’ inspection) it is found necessary to draw 125 samples from the lot. Each pallet is sequentially numbered: 1–20, 21–40, 41–60, …, 1381–1400, each sack being allocated a number according to its position on the pallet. Using a random number table (e.g. Table 5.12), take four random digits from consecutive columns; using a pencil point to identify (at random) row 10 in column 1, the first random number to be taken will be 0510 (i.e. row 10, columns 1–4; sample unit 510). The next value in row 11 (9285) is too large and is ignored; the third, fifth and sixth values (0850, 0585 and 1346, respectively) are retained. This process is continued as necessary until the required number (125) of sets of random digits is obtained, moving from the bottom set of digits (row 25, columns 1–4) to the first set (row 1) in columns 5–8, and so on. If a set of digits occurs more than once it is recorded only once. The 125 sets of 4 digits obtained give the numbers of sacks to be examined (viz: 510, 850, 585, 1346, 28, 441, 911, 134, 933, 980, 1348, 37, 1280, 17, 1049, …, etc.). An alternative approach, which introduces some bias yet makes sampling easier, is to deliberately stratify and to draw 13 out of the 70 pallets as primary samples (e.g. numbers 69, 11, 50, 29, 33, 42, 5, 17, 49, 39, 24, 36, 21, starting from row 11, columns 11–12 in Table 5.12).
Then examine 5 sacks drawn at random from each identified pallet (e.g. for pallet number 69 examine sacks numbers 17 (starting at row 11, columns 21–22), 11, 2, 5 and 7; for pallet number 11 examine sacks 9 (row 16, columns 5–6), 17, 13, 20 and 11), and so on. Note that in this deliberately stratified sampling scheme the total number of sample units to be tested will be lower than if samples are drawn totally at random, since a 2-tier sample plan is used (i.e. for 70 pallets on normal inspection, plan E (Tables 5.4 and 5.5) requires 13 pallets to be tested, whilst for each pallet of 20 sacks plan C requires that only 5 sacks/pallet are to be tested – hence only 65 sacks are tested). One or more positive tests from the 5 sacks/pallet would provide evidence that that pallet is unacceptable. If the AQL required were, say, 4%, then the following probabilities of acceptance would result:

                                       Number of units                                 % Defectives
Number of units in lot    To test (n)    Acceptable defective (c)    AQL (%)    Accepted at Pa = 0.95    Rejected at Pa = 0.10
1400 sacks                125            10                          4          4.9                      12.3
70 pallets                13             1                           4          2.8                      26.8
20 sacks                  5              0                           2.5        2.1                      37.0
Note that the deliberately stratified test would be less stringent for a required AQL than would the non-stratified test. Only one of the 13 pallets tested could be found defective for the lot still to be accepted, and none of the five sacks examined per pallet could be defective; yet the risk of accepting poor material would be higher: at Pa = 0.1 the accepted lot might contain up to 37% defectives, compared with 12% for the examination of 125 individual sacks.
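With a computer to hand, the random draws of Example 5.5 can be made with a pseudo-random number generator instead of printed tables; the sketch below is my own illustration (the seed is arbitrary) of both the simple random draw of 125 sacks and the two-stage, deliberately stratified alternative.

```python
import random

rng = random.Random(2024)  # arbitrary seed so the selection can be reproduced

# Simple random sampling: 125 distinct sacks from a lot of 1400, numbered 1-1400
sack_sample = sorted(rng.sample(range(1, 1401), 125))
print(sack_sample[:10], "...")

# Two-stage alternative: 13 of the 70 pallets, then 5 of the 20 sacks on each
pallet_sample = sorted(rng.sample(range(1, 71), 13))
two_stage_plan = {pallet: sorted(rng.sample(range(1, 21), 5)) for pallet in pallet_sample}
print(two_stage_plan)
```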
Stratified Sampling

Where sub-populations may vary considerably, or where different ‘lots’ are combined to make up a consignment, it is necessary to identify the relative proportion of each ‘lot’ or portion in the consignment. The number of samples drawn should then reflect the overall composition of the consignment. For instance, if a consignment consists of 20% of Lot A, 50% of Lot B and 30% of Lot C, then the number of samples drawn from each lot should be representative of the whole; that is, a total of, say, 40 samples should comprise 8 samples from Lot A, 20 from Lot B and 12 from Lot C. In these circumstances, deliberate stratification is used to ensure that a proportionate number of sample units is taken from each of the different strata, giving random sample units from the different strata an equal opportunity of being included in the total sample. However, the results from the individual ‘lots’ can be pooled only if no evidence of heterogeneity is found in the results.
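A minimal sketch of the proportional allocation described above (my own code; the largest-remainder rounding is one convenient convention, not prescribed by the text):

```python
def proportional_allocation(total_samples, lot_sizes):
    """Allocate sample units across lots in proportion to lot size,
    using largest-remainder rounding so the allocations sum exactly."""
    grand_total = sum(lot_sizes.values())
    raw = {lot: total_samples * size / grand_total for lot, size in lot_sizes.items()}
    alloc = {lot: int(share) for lot, share in raw.items()}
    shortfall = total_samples - sum(alloc.values())
    for lot in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:shortfall]:
        alloc[lot] += 1
    return alloc

# Consignment of 20% Lot A, 50% Lot B and 30% Lot C, 40 samples in total
print(proportional_allocation(40, {"A": 20, "B": 50, "C": 30}))  # {'A': 8, 'B': 20, 'C': 12}
```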
Frames

Sometimes it is not possible to sample a whole ‘lot’ or consignment because of problems of accessibility. Such a situation might arise in sampling stacks of cartons within a cold store, warehouse or a ship’s hold. In such situations, sample units are drawn at random from those parts of the ‘lot’ that are accessible (referred to as ‘frames’). After analysis, the results are applicable only to the frame – not to the whole ‘lot’ – but if several distinct frames are tested and the results are homogeneous then it is a reasonable presumption that the results on the frames may be considered to be indicative of those that might have been obtained on the whole ‘lot’.
Single or Multiple Sampling Schemes

Normally, a given number of random samples is drawn for testing from a ‘lot’ but sometimes the numbers required to be tested may be too high. In such circumstances, a multiple sampling scheme may be used. Each bulk unit (e.g. a case of product) is numbered and the required number of bulk sample units is drawn using random numbers. An appropriate number of individual sample units is then drawn from each bulk unit, again using random numbers to identify the individual unit(s) to be tested. This is illustrated in Example 5.5.
References

Anon. (1999) Sampling Procedures for Inspection by Attributes. Sampling Schemes Indexed by Acceptance Quality Limit (AQL) for Lot-by-Lot Inspection. British Standards Institute, London. BS 6001-1:1999, ISO 2859-1:1999.
Anon. (2001a) Statistical Aspects of Sampling from Bulk Materials – Part 2: Sampling of Particulate Materials. International Organisation for Standardisation, Geneva. ISO 11648-2:2001.
Anon. (2001b) American National Standards Sampling Procedures and Tables for Inspection by Attributes. MIL-STD-105D, Revision E.
Anon. (2003) Statistical Aspects of Sampling from Bulk Materials – Part 1: General Principles. International Organisation for Standardisation, Geneva. ISO 11648-1:2003.
Anon. (2004) Milk and Milk Products – Sampling – Inspection by Attributes. International Organisation for Standardisation, Geneva. ISO 5538:2004.
Anon. (2005a) Sampling Procedures for Inspection by Attributes – Part 4: Procedures for Assessment of Declared Quality Levels. International Organisation for Standardisation, Geneva. ISO 2859-4:2005.
Anon. (2005b) Sampling Procedures for Inspection by Attributes – Part 5: System of Sequential Sampling Plans Indexed by Acceptance Quality Limit (AQL) for Lot-by-Lot Inspection. International Organisation for Standardisation, Geneva. ISO 2859-5:2005.
Anon. (2005c) Sampling Procedures for Inspection by Attributes – Part 2: Sampling Plans Indexed by Limiting Quality (LQ) for Isolated Lot Inspection. International Organisation for Standardisation, Geneva. ISO 2859-2:2005.
Anon. (2006a) Sampling Procedures for Inspection by Attributes – Part 10: Introduction to the ISO 2859 Series of Standards for Sampling for Inspection by Attributes. International Organisation for Standardisation, Geneva. ISO 2859-10:2006.
Anon. (2006b) Sequential Sampling Plans for Inspection by Attributes. International Organisation for Standardisation, Geneva. ISO 8422:2006.
Anon. (2007) Sampling Procedures for Inspection by Variables. Guide to Single Sampling Plans Indexed by Acceptance Quality Limit (AQL) for Lot-by-Lot Inspection for a Single Quality Characteristic and a Single AQL. British Standards Institute, London. BS 6002-1:2007 (ISO 3951-1:2005).
Andrews, WH and Hammack, TA (2003) Food Sampling and Preparation of Sample Homogenate, Chapter 1. In Bacteriological Analytical Manual Online, Edition 8, Revision A, 1998, updated 2003. US FDA, Washington DC. http://www.cfsan.fda.gov/~ebam/bam-1.html.
Board, RG and Lovelock, DW (eds) (1973) Sampling – Microbiological Monitoring of Environments. SAB Tech. Series 7. Academic Press, London.
Bowker, AH (1947) Tolerance limits for normal distributions. In Eisenhardt, C, Hastay, MW and Wallis, WA (eds) Selected Techniques of Statistical Analysis. McGraw-Hill, New York, pp. 95–109.
Bowker, AH and Goode, HP (1952) Sampling Inspection by Variables. McGraw-Hill, New York.
Bray, DF, Lyon, DA and Burr, IW (1973) Three class attributes plans in acceptance sampling. Technometrics, 15(3), 575–585.
Clark, DS (1978) The International Commission on Microbiological Specifications for Foods. Food Technol., 32(67), 51–54.
Gascoigne, JC and Hill, ID (1976) Draft British Standard 6002: ‘Sampling inspection by variables’ (with Discussion). J. Roy. Statistical Soc. Series A, 139, 299–317.
ICMSF (1986) Microorganisms in Foods. 2. Sampling for Microbiological Analysis: Principles and Specific Applications, 2nd edition. University of Toronto Press, Toronto.
ICMSF (2002) Microorganisms in Foods. 7. Microbiological Testing in Food Safety Management. Kluwer Academic/Plenum Publishers.
Kilsby, DC (1982) Sampling schemes and limits. In Brown, MH (ed.) Meat Microbiology. Academic Press, London, pp. 387–421.
Kilsby, DC, Aspinall, LJ and Baird-Parker, AC (1979) A system for setting numerical microbiological specifications for foods. J. Appl. Bacteriol., 46, 591–599.
Malcolm, S (1984) A note on the use of the non-central t distribution in setting numerical microbiological specifications for foods. J. Appl. Bacteriol., 57, 175–177.
Pearson, ES and Hartley, HO (1976) Biometrika Tables for Statisticians, 3rd edition. Cambridge University Press.
Steiner, EH (1971) Sampling schemes for controlling the average quality of consignments on the proportion lying outside specification. Sci. and Tech. Survey No. 65. Leatherhead Food International.
6 ERRORS IN THE PREPARATION OF LABORATORY SAMPLES FOR ANALYSIS
Estimation of levels of microorganisms in foods involves a series of operations each of which contributes in some measure to the overall ‘error’ or ‘lack of precision’ of the analytical result. Before considering the overall errors inherent to methods of enumeration, it is pertinent to consider the errors associated with the stages of laboratory sampling, maceration and dilution that are common to many quantitative microbiological techniques. It is not proposed to discuss the methodology per se since this is dealt with adequately in standard laboratory reference books (ICMSF, 1978; Harrigan, 1998; Anon., 2006). However, reference to certain aspects cannot be avoided completely when considering the effects of methodology on the precision of laboratory results.
LABORATORY SAMPLING ERRORS

The size of a sample drawn from a ‘lot’ of food will almost invariably be considerably larger than that required for analytical purposes. Processes such as ‘quartering’ that are used to obtain a representative laboratory sample for chemical analysis (and even for some mycological analyses; Jarvis, 1978) cannot usually be applied in microbiological analysis because of contamination risks. It is essential to ensure that test samples are drawn randomly from the primary samples provided for examination such that they are truly representative of the material under test. A detailed discussion of methods for taking and preparing laboratory samples for analysis is given by ICMSF (1978). In general, solid food materials will be macerated during preparation of a suspension for subsequent dilution and examination. The size of sample to be taken will depend upon the amount of primary sample available and its homogeneity, or otherwise. Ideally, the analytical sample should never be less than 10 g and it should be weighed to the nearest 0.1 g into a sterile container on a top-pan or other suitable balance. The precision (as reflected in the coefficient of variation) of the weight of sample taken increases with increasing sample size (Table 6.1).
TABLE 6.1 Effect of Sample Weight on the Coefficient of Variation (CV) of Weighed Samples

Sample size (g)   Number of samples (n)   Mean weight(a) (g)   SD (g)   SE mean   CV (%)
10                19                      9.99                 0.071    0.016     0.71
50                20                      50.01                0.093    0.021     0.19
100               20                      99.99                0.066    0.015     0.07

a Each sample was weighed to 0.1 g on a top-pan balance and the ‘weight’ of the sample was then determined on an analytical balance.
TABLE 6.2 Errors in Dispensed Nominal 9 ml Volumes of Diluent

Diluent*
Dispensed   Weighed   Number of replicates   Mean weight (g)   SD      CV (%)
(1) Automatic dispensing pipette
A           A         50                     8.516             0.124   1.46
A           B1        25                     8.063             0.201   2.49
A           B2        25                     8.010             0.209   2.61
B           B         50                     8.502             0.111   1.30
(2) Graduated pipette
A           A         50                     9.277             0.074   0.80
A           B         50                     9.008             0.198   2.20

*A: before autoclaving; B: after autoclaving; B1, B2: replicates stacked in top or bottom half of basket, respectively.
Provided that the sample is weighed carefully, the relative contribution of inaccuracies in weighing will be small when compared with the contributions from other sources (see below) and the errors that might arise through sample contamination. Differences in analytical results on replicate samples tested in different laboratories, or even between different workers in the same laboratory, may reflect differences in the sample handling procedure. Such differences could arise through inherent differences in the distribution of organisms between samples, but may also reflect the storage and subsequent handling of the primary samples and the way in which test samples are taken.

DILUENT VOLUME ERRORS

It is widely recognized that a major potential source of error in counts is that caused by variations in the volume of diluent used. When dilution blanks are prepared by dispensing before autoclaving, the potential errors are generally largest. The data in Table 6.2 illustrate this point.
Diluent volume errors will have a cumulative effect in any serial dilution series and can therefore have a major effect on the derived colony counts. Due attention is required not only to minimize the variation between replicate volumes but also to control the mean volume dispensed. For instance, whereas five 10-fold serial dilutions give a dilution factor of 10^5 (1 in 100,000), five serial dilutions in which the total volume at each step is only 9.8 ml (i.e. a dilution factor of 9.8 per step) give an overall factor of 9.8^5 (approximately 1 in 90,000). This is discussed in more detail below. Errors in the volumes of diluent used in the primary homogenization of samples will also affect the subsequent dilutions. It is therefore essential to ensure that the volume of diluent added to a primary sample is measured as accurately as possible.
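The cumulative effect is simple arithmetic; a brief sketch (my own) of the comparison quoted above:

```python
# Five nominal 10-fold steps versus five steps whose true factor is only 9.8
nominal_factor = 10 ** 5      # 1 in 100,000
actual_factor = 9.8 ** 5      # ~1 in 90,392
shortfall_pct = 100 * (nominal_factor - actual_factor) / nominal_factor
print(round(actual_factor), round(shortfall_pct, 1))  # ~90392, ~9.6% less dilution than intended
```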
EXAMPLE 6.1 DILUENT VOLUME ERRORS

What is the effect of autoclaving on the actual dispensed volumes of a diluent to be used in a serial dilution procedure?

A notional 9 ml of distilled water was dispensed into each of a number of tared universal bottles using an automatic dispensing pump or a graduated pipette. The weight of water dispensed was determined before and after autoclaving, or after aseptic dispensing of bulk-sterilized water. The results are shown in Table 6.2.

The coefficients of variation increased from 0.8–1.5% for samples weighed before autoclaving to 2.2–2.6% for samples weighed after autoclaving. Although calibrated to deliver 9 ml of diluent, the automatic dispensing pump delivered a mean of only 8.516 g of water, equivalent to about 8.49 ml; after autoclaving the mean was 8.037 g, equivalent to about 8.01 ml. In practice, some allowance can be made for evaporation of water during autoclaving, but the volume needs to be checked carefully before use. Aseptic dispensing of diluents after sterilization is the preferred method of use.
PIPETTE VOLUME ERRORS

Over the years, many studies have been made of the variation in volumes of liquid dispensed by bacteriological pipettes of various categories. Table 6.3 provides a summary of some typical data. The variation in the volume of liquid dispensed is dependent upon technical (e.g. operator) errors in the use of pipettes, calibration errors and the extent to which organisms adhere to the glass (or plastic) of the pipette; in addition, the multiple use of a single pipette to prepare a dilution series introduces covariance errors (Hedges, 1967, 2002, 2003; see below). It is worthy of note that manufacturers’ pipette calibration errors are of two kinds: manufacturing tolerance errors (or inaccuracy) and imprecision (repeatability errors), generally cited as a % coefficient of variation. Values for inaccuracy errors are not readily available from suppliers and reflect the extent of the manufacturer’s acceptable production QC tolerance (Hedges, 2002). It is highly desirable that all pipettes (including semi-automated pipette systems that use replaceable pipette tips) should be recalibrated within the laboratory before use (Hedges, 2003).
TABLE 6.3 Errors in the Volumes of Diluent Delivered Using Different Pipettes

Type of pipette                 Pipette   Sample   Number     Mean weight       SD      CV      Reference
                                                   of tests   delivered (mg)    (mg)    (%)
Capillary                       L         Water    60         30.0              0.60    2.00    Snyder (1947)
‘Breed’ capillary               L(a)      Water    10         20.0              0.95    4.75    Spencer (1970)
                                C(a)      Water    10         33.8              2.87    8.48
                                L(b)      Water    20         19.0              0.63    3.33
                                C(b)      Water    20         33.3              1.49    4.47
                                C1(b)     Milk     21         10.1              0.30    2.97    Brew (1914)
                                C2(b)     Milk     12         5.6               0.77    13.75
                                C3(a)     Milk     5          9.0               0.70    7.78
Serological (1 ml disposable)   C(a)      Water    50         1046.0            37.0    3.49    Jarvis (1989)
                                C(a)      Water    50         983.0             28.7    2.92
                                C(b)      Water    25         1009.8            45.7    4.53
Serological (1 ml)              C(a,c)    Water    59         908.7             9.0     0.99    Snyder (1947)
                                C(a,d)    Water    60         102.5             2.8     2.73
Semiautomatic(e)                C(a)      Water    50         97.04             2.6     2.68    Jarvis (1989)
                                C(b)      Water    50         98.2              10.7    10.90

L = laboratory made and calibrated; C = commercially prepared and calibrated; C1, C2, C3 = different makes of pipette; a = variation within pipette; b = variation between pipettes; c = 0.9 ml dispensed; d = 0.1 ml dispensed; e = 100 μl disposable tips.
EXAMPLE 6.2 CALCULATION OF DILUTION ERRORS USING A SINGLE PIPETTE FOR ALL TRANSFERS

How significant are the errors associated with preparation of a dilution series? Is it better to make six 10-fold dilutions, or a 100-fold dilution followed by four 10-fold dilutions?

Assume firstly that the dilution series is to be prepared from a liquid sample. One 1 ml volume of the sample is transferred by pipette to a 99 ml volume of diluent and, after mixing, a further pipette is used to prepare the four subsequent 10-fold serial dilutions to 10^−6. This gives us one 100-fold (n = 1) and four 10-fold (m = 4) sequential dilutions. The equation for estimation of the dilution error (based on Jennison and Wadsworth, 1940) is:

% Dilution error = 100 √[a²(m + n)²/x² + m²(a² + b²)/u² + n²(a² + c²)/v²]
Assume that the SD of the volume of liquid delivered from each 1 ml disposable pipette is a = 0.04 ml (see Table 6.3), that the SD of the 9 ml diluent volumes is b = 0.20 ml (Table 6.2) and that the SD of the 99 ml diluent volume is c = 2.0 ml. Then, for a = 0.04, b = 0.20, c = 2.0, m = 4, n = 1, x = 1.0, u = 10.0 and v = 100.0, the % error of the 10^−6 dilution is given by:

% dilution error = 100 √[0.04²(4 + 1)²/1² + 4²(0.04² + 0.2²)/10² + 1²(0.04² + 2²)/100²]
                 = 100 √[0.0016(25) + 16(0.0416)/100 + 1(4.0016)/10,000]
                 = 100 √(0.04 + 0.006656 + 0.00040016)
                 = 100 √0.047056 = 100(0.2169) = 21.7%

Assume now that the dilution series is produced using six 10-fold dilutions. For a = 0.04, b = 0.20, m = 6, n = 0, x = 1.0 and u = 10.0, the % error of the 10^−6 dilution is:

% dilution error = 100 √[0.04²(6)²/1² + 6²(0.04² + 0.2²)/10²]
                 = 100 √[0.0016(36) + 36(0.0416)/100]
                 = 100 √(0.0576 + 0.014976)
                 = 100 √0.072576 = 100(0.2694) = 26.9%
For the range of standard deviations and dilution volumes used in this calculation, the overall percentage dilution error of the 106 dilution would be higher (26.9%) when prepared using 6 10-fold dilutions than if 1 100-fold and 4 10-fold sequential dilution steps were used (21.7%). These values are derived using high levels of error for the pipette and diluent volumes in order to illustrate the importance of minimizing such errors.
EXAMPLE 6.3 CALCULATION OF DILUTION ERRORS USING A DIFFERENT PIPETTE FOR EACH TRANSFER

Would the error of the 10^−6 dilution be less if a separate pipette were used for each stage in the dilution process?

Assume that a dilution series to 10^−6 is prepared using six 10-fold sequential dilutions. As before (Example 6.2), assume the SD of the pipettes: a = 0.04 ml, the SD of the diluent volume: b = 0.20 ml, the number of dilution stages: m = 6 and the volume of the inoculated
diluent: u = 10. Then the % error of the 10^−6 dilution prepared using a different, randomly chosen pipette for each stage is:

% dilution error = 100 √[a²m/x² + m(a² + b²)/u²]
                 = 100 √[0.04²(6)/1² + 6(0.04² + 0.2²)/10²]
                 = 100 √(0.0096 + 0.002496)
                 = 100 √0.012096 = 11.0%
Hence the percentage dilution error for the 10^−6 dilution prepared using six 10-fold serial dilutions with a different, randomly drawn pipette for each dilution step is approximately half of that which would be obtained for an equivalent dilution series made using a single pipette for all stages (21.7%; Example 6.2). Comparative calculated errors for using a single pipette or different pipettes for various serial dilution series are shown in Table 6.4. Note that the standard deviations used for Examples 6.2 and 6.3 are different from those used in the table.
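The formulae used in Examples 6.2 and 6.3 are easily coded; the sketch below is my own implementation and reproduces the 21.7%, 26.9% and 11.0% figures (the fresh-pipette function also carries the n hundred-fold term, which is zero in the worked example).

```python
from math import sqrt

def pct_error_single_pipette(a, b, c, m, n, x=1.0, u=10.0, v=100.0):
    """% dilution error when the same pipette is used for every transfer:
    m ten-fold and n hundred-fold steps; a, b and c are the SDs of the
    pipette volume, the 9 ml blanks and the 99 ml blanks respectively."""
    variance = (a**2 * (m + n)**2) / x**2 \
             + (m**2 * (a**2 + b**2)) / u**2 \
             + (n**2 * (a**2 + c**2)) / v**2
    return 100 * sqrt(variance)

def pct_error_fresh_pipettes(a, b, c, m, n, x=1.0, u=10.0, v=100.0):
    """% dilution error when a fresh, randomly drawn pipette is used at each step,
    so the covariance terms disappear and each stage contributes independently."""
    variance = (a**2 * (m + n)) / x**2 \
             + (m * (a**2 + b**2)) / u**2 \
             + (n * (a**2 + c**2)) / v**2
    return 100 * sqrt(variance)

print(round(pct_error_single_pipette(0.04, 0.20, 2.0, m=4, n=1), 1))  # 21.7 (Example 6.2)
print(round(pct_error_single_pipette(0.04, 0.20, 2.0, m=6, n=0), 1))  # 26.9 (Example 6.2)
print(round(pct_error_fresh_pipettes(0.04, 0.20, 2.0, m=6, n=0), 1))  # 11.0 (Example 6.3)
```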
TABLE 6.4 Dilution Errors for Various Levels of Dilution* Prepared Either with the Same Pipette Throughout or Using a Different, Randomly Selected Pipette for Each Transfer

                    % Dilution error when dilutions prepared using
                    Single pipette            Different pipettes
Dilution level      A      B      C           A      B      C
10^−1               2.2    –      –           2.2    –      –
10^−2               4.5    2.2    –           3.2    2.2    –
10^−3               6.7    4.2    –           3.9    3.2    –
10^−4               9.0    6.4    4.5         4.5    3.9    3.2
10^−5               11.2   8.6    6.4         5.0    4.5    3.9
10^−6               13.5   10.8   8.5         5.5    5.0    4.5

Source: Based on Jennison and Wadsworth, 1940; Hedges, 1967.
A: only 10-fold dilutions prepared; B: one 100-fold dilution, remainder 10-fold; C: two 100-fold dilutions, remainder 10-fold.
* Assumes standard deviations of 1 ml pipettes, 9 ml diluent volumes and 99 ml diluent volumes are 0.02 ml, 0.1 ml and 1 ml, respectively.
OTHER SOURCES OF ERROR

Other inter-related major sources of potential error occur in making sample dilutions. Solid food samples require maceration and homogenization. Traditionally, laboratory maceration was done using top-drive or bottom-drive homogenizers, although this has now been largely superseded by techniques such as ‘stomaching’ and ‘pulsifying’. A potential source of error with any type of traditional homogenizer is related to inadequate homogenization of the sample, such that a heterogeneous suspension results. Extensive maceration with a bottom-drive homogenizer can result in a pronounced temperature rise that may affect the viability of some organisms through sublethal damage and may adversely affect colony counts. Studies by Barraud et al. (1967), Kitchell et al. (1973) and others have shown that results varying by about 7% can be obtained with different makes of homogenizer, and by 50% when a pestle, mortar and sand were used to grind meat samples.

The second related source of error is inadequate mixing of the inoculated diluent, such that the organisms are not thoroughly distributed in the suspension, both at the primary and subsequent stages of a serial dilution. The extent of maceration and mixing can affect the size and distribution of cell aggregates within the suspension, so that apparent differences are seen in the microbial population in replicate plates, both at a single dilution and between different dilution levels.

The introduction of the ‘Stomacher™’ (Sharpe and Jackson, 1972) provided a totally new approach to the preparation of primary food suspensions that avoids the need to maintain a large number of sterile macerators. In addition, the more gentle ‘massaging effect’ of the stomacher separates organisms from the food such that there is less physical breakdown of the food into particles that might interfere in later stages of the test. Comparative studies showed that colony counts on food suspensions prepared by ‘stomaching’ and by homogenization were generally comparable although, in some cases, counts after ‘stomaching’ may be higher than counts after maceration (Tuttlebee, 1975; Emswilea et al., 1977). A variation in the stomaching procedure uses ‘filter bags’ into which the food sample is placed before immersion in the diluent, so that the suspension contains the organisms without any of the ‘stomached’ food sample.

Another alternative procedure uses the ‘Pulsifier®’, an instrument that combines a high-speed shearing action with intense shock waves to liberate organisms with minimal disruption of the food sample matrix. Described originally by Fung et al. (1998), its use has been described by, among others, Sharpe et al. (2000), Kang and Dougherty (2001) and Wu et al. (2003). The lower level of both suspended and dissolved solids improves membrane filtration rates and reduces interference in polymerase chain reaction (PCR) and similar methods.

As pointed out by Mudge and Lawler (1928), among others, a further source of ‘dilution error’ relates to variations in time between preparation of dilutions and plating. Major differences in colony count can result from ignoring this effect; the time between preparation of dilutions and plating should be consistent and as short as possible.
CALCULATION OF THE RELATIVE DILUTION ERROR

It has long been recognized that in seeking to assess the precision of a colony count it is necessary to rely not just on the actual count of colonies but to take account also of the contributions to variance of the preceding steps, especially those of the dilution series. Jennison and Wadsworth (1940) derived a formula to estimate the magnitude of the dilution error knowing the standard deviations of the pipette volume(s), the diluent volumes and the number of dilutions prepared. For a dilution series consisting of both 1 in 100 and 1 in 10 dilutions, the dilution error, as the standard deviation of the final dilution, is given by:

σF = [x^(m+n)/(u^m v^n)] √[a²(m + n)²/x² + m²(a² + b²)/u² + n²(a² + c²)/v²]

where σF = standard deviation of the expected final dilution (F = x^(m+n)/(u^m v^n)); x = volume measured by pipette (e.g. 1 ml) with variance a²; u = volume of inoculum + diluent (e.g. 1 ml + 9 ml = 10 ml) with combined variance a² + b², where b² = variance of the 9 ml diluent volume; v = volume of inoculum + diluent (e.g. 1 ml + 99 ml = 100 ml) with combined variance a² + c², where c² = variance of the 99 ml diluent volume; m = number of 1 in 10 dilutions and n = number of 1 in 100 dilutions. Note that this and subsequent equations have been corrected from those given by Jennison and Wadsworth (1940). For simplicity, since x = 1.0, u = 10 and v = 100, σF can be expressed as a percentage error:

Percentage dilution error = 100 √[a²(m + n)² + m²(a² + b²)/10² + n²(a² + c²)/100²]

This generalized equation of the percentage error, as standard deviation of the volume, can be used for any series of dilutions with pipette and diluent volumes (a, b and c) for any combination (m and n) of 1 in 10 and 1 in 100 dilution steps. Since the absolute value of a² should be small compared with the values of b² and c², and with careful control of the diluent volumes such that b ≤ 10a and c ≤ 100a, this simplifies to:

Percentage dilution error ≈ 100a √[(m + n)² + m² + n²]

Jennison and Wadsworth (1940) provide a table of dilution errors for various combinations of pipette and diluent blank variances and various combinations of 1 in 10 and 1 in 100 dilutions to 10^−8. However, in their original calculation, they assumed that only one pipette would be used in the preparation of a dilution series; that is to say, that the same pipette would be used throughout. Hedges (1967) showed that the basic equation is incorrect if a fresh, randomly selected pipette is used for each stage in preparing a dilution series. Since this latter method is the one adopted traditionally by most microbiologists, it is important to recognize the difference in magnitude of the error that results from elimination of the
covariance terms from the equations derived above. Thus, when a fresh, randomly drawn pipette is used at each stage of preparation of a dilution series, the equation given above for the percentage dilution error simplifies to:

Percentage dilution error ≈ 100a√[(m + n) + m + n] = 100a√[2(m + n)]

Percentage dilution errors for dilutions to 10⁻⁶ are illustrated in Table 6.4 for various combinations of 1 in 100 and 1 in 10 dilutions, prepared with a single pipette or with a fresh pipette for each transfer. The magnitude of the dilution error can be limited to some extent by reducing the number of dilutions prepared; for example, in Table 6.4 the dilution error of the 10⁻⁶ dilution (using the same pipette throughout) is reduced from 13.4% to 8.5% by preparing two 100-fold and two 10-fold dilutions, rather than six 10-fold dilutions. The calculations for use of different pipettes at each stage of the dilution process show that the errors are lower than in a series for which a single pipette is used throughout. For instance, six 10-fold dilutions made with fresh, randomly drawn pipettes at each stage have a dilution error of 5.5% (cf. 13.4% when the same pipette is used throughout).

Hedges (2002) investigated the inherent inaccuracy of pipettes in terms of the pipette manufacturers' quoted calibration and repeatability errors (the latter being a measure of imprecision) and took account also of the impact of the Poisson distribution and the sampling component of variance, a consideration ignored by Jennison and Wadsworth (1940). As noted previously, he observed that pipette manufacturers rarely provide information on manufacturing inaccuracy and generally cite only the imprecision value, usually as a coefficient of variation (CV) = 100s/u, where s is the standard deviation and u is the nominal volume of the pipette.

In his treatment, Hedges (2002) assumed that a volume (u) of inoculum is pipetted into a volume (v) of diluent to prepare a dilution u/(u + v) and that the process is repeated n times to give a final dilution D = [u/(u + v)]ⁿ at the nth step. He further supposed two method scenarios: (a) the more traditional approach, where separate randomly selected pipettes are used for each of the u volumes and different randomly drawn pipettes are used to dispense each of the v volumes of diluent at each stage; and (b) a more modern approach, where the same randomly selected dispenser pipette is used to deliver all u volumes (with a fresh tip used at each step) and another randomly selected pipette is used to deliver all v volumes. He used Taylor's series to derive equations to determine the dilution components of variance for each serial dilution stage of both methods.

Let D = p/q = uⁿ/(u + v)ⁿ, where p = uⁿ, q = (u + v)ⁿ and w = (u + v). The volumes u and v are uncorrelated, and Var(uᵢ) = Var(uⱼ), where i refers to dilution i and j to dilution (i + 1); similarly for v and w. For method (b), the cumulative variances associated with the inoculum transfer volumes (uᵢ) and the dispensed diluent volumes (vᵢ) are determined for each stage in the dilution process:

Var(p) = Var{(u)ⁿ} = n·u^(2(n−1)){Var(u) + (n − 1)·Covar(uᵢ,uⱼ)}
Var(q) = Var{(w)ⁿ} = n·w^(2(n−1)){Var(w) + (n − 1)·Covar(wᵢ,wⱼ)}
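As a minimal illustration (not part of the original text), the percentage dilution error formulas above can be evaluated in Python. The values a = 0.02 ml and b = 0.1 ml are the Table 6.4 figures quoted later in this chapter, treated here as standard deviations; c = 1.0 ml for a 99 ml blank is purely an assumed value.

```python
import math

def pct_dilution_error(a, b, c, m, n, fresh_pipettes=False):
    """Percentage dilution error (as a standard deviation) for m ten-fold and
    n hundred-fold dilutions; a, b and c are the standard deviations (ml) of
    the 1 ml pipette volume, the 9 ml diluent blank and the 99 ml diluent
    blank.  With fresh_pipettes=True the squared multipliers (m+n)^2, m^2 and
    n^2 are replaced by (m+n), m and n (Hedges, 1967)."""
    k1, k2, k3 = ((m + n), m, n) if fresh_pipettes else ((m + n) ** 2, m ** 2, n ** 2)
    var = a**2 * k1 + k2 * (a**2 + b**2) / 10**2 + k3 * (a**2 + c**2) / 100**2
    return 100 * math.sqrt(var)

# Six 10-fold dilutions (m = 6, n = 0):
print(pct_dilution_error(0.02, 0.1, 1.0, 6, 0))                        # ~13.5 (same pipette throughout)
print(pct_dilution_error(0.02, 0.1, 1.0, 6, 0, fresh_pipettes=True))   # ~5.5  (fresh pipette each stage)
```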
TABLE 6.5 Dilution and Sampling Components of Variance for Serial Dilutions Made Using Separate Pipettes at Each Step (Method (a)) and the Same Pipette at Each Step (Method (b))

                    Dilution component (a)
Step (n)    Method (a)          Method (b)          Sampling component (b)
1           2.0479 × 10⁻⁷       2.0479 × 10⁻⁷       1.0 × 10⁻²
2           4.0958 × 10⁻⁹       5.3644 × 10⁻⁹       1.1 × 10⁻³
3           6.1436 × 10⁻¹¹      9.9532 × 10⁻¹¹      1.11 × 10⁻⁴
4           8.1915 × 10⁻¹³      1.5815 × 10⁻¹²      1.111 × 10⁻⁵
5           1.0239 × 10⁻¹⁴      2.2951 × 10⁻¹⁴      1.1111 × 10⁻⁶
6           1.2287 × 10⁻¹⁶      3.1361 × 10⁻¹⁶      1.11111 × 10⁻⁷

Source: Modified from Hedges (2002) and reproduced by permission of the author and Elsevier.
(a) Values to be multiplied by (N*)², where N* = estimate of the colony count.
(b) Values to be multiplied by N* and u², where u = volume delivered to the plate.
Whence, the variance of the dilution process is given by:

Var(D) = Var(p/q) = D²{Var(p)/p² + Var(q)/q² − 2Covar(p,q)/pq}

However, for method (a), the variances of p and q are not affected by covariances, so the equations for Var(p) and Var(q) are simplified by the exclusion of the covariance factors. He introduced, also, an estimate of the sampling component due to the random distribution of organisms at the sequential stages of the dilution series. Table 6.5 summarizes the calculated dilution and sampling components, and the correlation coefficients for derivation of covariances are shown in Table 6.6. The derivations of the %CVs of the colony count for dilution methods (a) and (b) are shown in Table 6.7. The overall % error of a count of 223 colonies at the 10⁻⁶ dilution (using the same pipette throughout), estimated by the method of Hedges (2002), was 7.3%. The procedures to be followed to derive the dilution components are illustrated in Example 6.4.

In a subsequent paper, Hedges (2003) assessed the impact on the precision of serial dilutions and colony counts following recalibration of pipettes in the laboratory, a requirement for accredited laboratories (ISO, 1999). He concluded that although recalibration improves the precision of the methods, owing to the dominant effect of the final sampling variance the overall improvement in the precision of colony counts is small.

Augustin and Carlier (2006) applied the methods of Hedges (2002, 2003) in their evaluation of repeatability variance for data from laboratory proficiency testing. For ease of calculation, they converted some of the equations into coefficients of variation (CV). For instance, the squared CV for the volume (u) of inoculum transferred between dilutions is given by CV²(p) = Var(p)/p² = Var(uⁿ)/u²ⁿ, where Var = variance and p = uⁿ = volume of inoculum
transferred at dilution n. They then applied the various equations to determine the overall CV of the derived colony count, which they showed to be related more closely to the number of colonies counted than to any other parameter examined. This again indicates the overwhelming importance of the statistical distribution of organisms both in the original foodstuff and in the final countable test plates.
EXAMPLE 6.4 DETERMINATION OF THE PRECISION OF SERIAL DILUTIONS AND THE COEFFICIENT OF VARIATION OF A COLONY COUNT USING THE PROCEDURE OF HEDGES (2002)

(Some of the text is based on Hedges (2002) and is reproduced by permission of the author and Elsevier)

Suppose that you need to compare the overall precision of a 10-fold serial dilution series to 10⁻⁶, assuming (a) use of different randomly selected 1 and 10 ml pipettes for each dilution stage and (b) use of the same 1 and 10 ml pipettes throughout, but with random selection of pipette tips. Although this procedure seems complex, it is relatively straightforward provided that the calculations are done in the following sequence:
1. Calculate the basic pipette variances and covariances.
2. Calculate the variance of the dilution series due to pipette 'errors' (by separating the numerator and denominator of the final dilution fraction).
3. Add in the pipette error due to the final delivery to the culture plate.
4. Calculate the Poisson sampling error due to 'sampling' (a) during dilution and (b) during the final delivery to the plate.
5. Add the two sources of error to obtain the final variance.

The following terms are used in this example: cal(·) and rep(·) = the pipette calibration and pipette repeatability variances for u and v; u = volume of inoculum transferred by pipette from dilution i to the next dilution j, and also the volume of inoculum transferred by pipette to the culture plate; v = pipetted volume of diluent to be inoculated at each step; w = (u + v) = total volume of inoculated diluent at each step; p = uⁿ and q = wⁿ = (u + v)ⁿ, where n = the number of dilutions; D = p/q = uⁿ/(u + v)ⁿ = dilution at step n; X = number of colonies counted on the plate at dilution D; Var(·) = variance, where (·) means u, v, w, D or X; Covar(·) = covariance between two factors (e.g. Covar(u,v) = covariance between u and v).
Pipette variances and covariances

Hedges (2002) shows that the variance of the inoculum volumes at step n of a dilution series is given by:

Var(p) = Var{(u)ⁿ} = n·u^(2(n−1)){Var(u) + (n − 1)·Covar(uᵢ,uⱼ)}

Similarly, the variance of the diluent volumes at step n is given by:

Var(q) = Var{(w)ⁿ} = n·w^(2(n−1)){Var(w) + (n − 1)·Covar(wᵢ,wⱼ)}

Whence, the variance of the dilution (D) = the variance of (p/q):

Var(D) = Var(p/q) = D²{Var(p)/p² + Var(q)/q² − 2Covar(p,q)/pq}

Now Var(p)/p² = the square of the coefficient of variation of p (i.e. (CVp)²); similarly Var(q)/q² = (CVq)². So Var(D) = D²{(CVp)² + (CVq)² − 2Covar(p,q)/pq}.

For both methods of dilution, the volumes u and v are uncorrelated, and the variance of the volume of inoculum delivered (u) is given by Var(u) = cal(u) + rep(u); similarly, the variance of the volume of diluent (v) is Var(v) = cal(v) + rep(v), so the total pipetting variance Var(w) is given by Var(w) = Var(u) + Var(v).

For our 10-fold dilution series we use 1 ml (u = 1) pipettes that have a calibration inaccuracy of 0.81% and an imprecision of 0.4%, and diluent is dispensed in 9 ml (v = 9) volumes with pipettes having a calibration inaccuracy of 0.3% and an imprecision of 0.1%. Then, using the equations from Hedges (2002), the calibration variance of the 1 ml pipette is cal(u) = {(u × inaccuracy)/(3 × 100)}² = {(1 × 0.81)/300}² = 7.29 × 10⁻⁶; similarly, the calibration variance of the 9 ml pipette is cal(v) = {(9 × 0.3)/300}² = 8.1 × 10⁻⁵. The repeatability variance of the 1 ml pipette is rep(u) = {(u × imprecision)/100}² = {(1 × 0.4)/100}² = 1.60 × 10⁻⁵; similarly, the repeatability variance of the 9 ml pipette is rep(v) = {(9 × 0.1)/100}² = 8.1 × 10⁻⁵. The variance of the volume transferred at each step is therefore Var(u) = {7.29 × 10⁻⁶ + 1.60 × 10⁻⁵} = 2.329 × 10⁻⁵; the variance of the diluent volumes is Var(v) = {8.1 × 10⁻⁵ + 8.1 × 10⁻⁵} = 1.62 × 10⁻⁴; so the total pipetting error is Var(w) = Var(u) + Var(v) = {2.329 × 10⁻⁵ + 1.62 × 10⁻⁴} = 1.853 × 10⁻⁴.

However, for any dilution step a covariance factor (Covar(u,w)) may be necessary to allow for compound errors in both the inoculum volume (u) and the diluent volume (v). To determine covariance, the relation Covar(s,r) = r(s,r)·√[Var(s)·Var(r)] is used, where r(s,r) = the Pearson correlation coefficient, for which r² represents the proportion of the total variance shared linearly by the two covariates. The correlation coefficients for the various possible covariates are given in Table 6.6.
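A minimal Python sketch (an illustration, not part of the original text) of this first step; the quoted calibration inaccuracy is treated as a tolerance equal to three standard deviations and the quoted imprecision as one standard deviation, matching the worked figures above.

```python
def cal_var(volume_ml, inaccuracy_pct):
    # Calibration variance: quoted inaccuracy taken as 3 standard deviations
    return (volume_ml * inaccuracy_pct / (3 * 100)) ** 2

def rep_var(volume_ml, imprecision_pct):
    # Repeatability variance: quoted imprecision (CV, %) taken as 1 standard deviation
    return (volume_ml * imprecision_pct / 100) ** 2

var_u = cal_var(1, 0.81) + rep_var(1, 0.4)   # 7.29e-6 + 1.60e-5 = 2.329e-5
var_v = cal_var(9, 0.3) + rep_var(9, 0.1)    # 8.10e-5 + 8.10e-5 = 1.62e-4
var_w = var_u + var_v                        # 1.853e-4

def covar(var_s, var_r, r):
    # Covar(s, r) = r * sqrt(Var(s) * Var(r)), with r taken from Table 6.6
    return r * (var_s * var_r) ** 0.5
```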
TABLE 6.6 Correlation Coefficients for Covariates in a Serial Dilution Scheme

                      Correlation coefficients (r) for
Covariates            Method (a)*                 Method (b)*
uᵢ, uⱼ                0.00                        √[cal(u)/Var(uᵢ)] × √[cal(u)/Var(uⱼ)] = 0.313
vᵢ, vⱼ                0.00                        √[cal(v)/Var(vᵢ)] × √[cal(v)/Var(vⱼ)] = 0.500
uᵢ, vᵢ                0.00                        0.00
wᵢ, wⱼ                0.00                        √[{cal(u) + cal(v)}/Var(wᵢ)] × √[{cal(u) + cal(v)}/Var(wⱼ)] = 0.476
uᵢ, wᵢ (≡ pₙ, qₙ)      √[Var(u)/Var(w)] = 0.355    √[Var(u)/Var(w)] = 0.355

Source: Modified from Hedges (2002) and reproduced by permission of the author and Elsevier.
* Methods (a) and (b) are described in Example 6.4.
Hedges (2002) shows that for both methods the value of the correlation coefficient for u,w = 0.355 (a value supported by simulation studies) so, for the pipette volumes referred to above, the covariance of u and w for method (a) is:

Covar(u,w) = Covar(p,q) = 0.355√[Var(p)·Var(q)] = 0.355√[(1.397 × 10⁻⁴)(1.118 × 10⁷)] = 14.030
Pipette variances

For method (a), where a different randomly drawn pipette is used at each step, the only covariate is Covar(u,w); then the total variance contribution to the 10⁻⁶ dilution for the 1 ml pipette volume is given by:

Var(p) = Var{(u)ⁿ} = n·u^(2(n−1))·Var(u) = 6 × 1¹⁰ × (2.329 × 10⁻⁵) = 6 × (2.329 × 10⁻⁵) = 1.397 × 10⁻⁴

Similarly, the total variance contribution to the 10⁻⁶ dilution for the 9 ml pipette volume is:

Var(q) = Var{(w)ⁿ} = n·w^(2(n−1))·Var(w) = 6 × 10¹⁰ × (1.853 × 10⁻⁴) = 1.1118 × 10⁷

The value of the correlation coefficient for u,w = 0.355 (Hedges, 2002) so, for the pipette volumes referred to above, the covariance of u and w for method (a) is:

Covar(u,w) = Covar(p,q) = 0.355√[Var(p)·Var(q)] = 0.355√[(1.397 × 10⁻⁴)(1.1118 × 10⁷)] = 14.00
So the total variance of the 10⁻⁶ dilution (D) is given by:

Var(D) = Var(p/q) = D²{Var(p)/p² + Var(q)/q² − 2Covar(p,q)/pq}
= 10⁻¹²{(1.397 × 10⁻⁴)/1 + (1.1118 × 10⁷)/10¹² − 2(14.00)/10⁶}
= 10⁻¹²{(1.397 × 10⁻⁴) + (1.118 × 10⁻⁵) − (28.010 × 10⁻⁶)}
= 1.229 × 10⁻¹⁶

For method (b), covariance factors are required at all stages since single pipettes are used to deliver each of the u and v volumes. Hence the variance of p is given by:

Var(p) = Var{(u)ⁿ} = n·u^(2(n−1)){Var(u) + (n − 1)·Covar(uᵢ,uⱼ)}
= 6 × 1^(2(6−1)){2.329 × 10⁻⁵ + (6 − 1)(7.290 × 10⁻⁶)}
= 6 × {2.329 × 10⁻⁵ + 3.645 × 10⁻⁵} = 6 × 5.974 × 10⁻⁵ = 3.584 × 10⁻⁴

Similarly, the variance of q is given by:

Var(q) = Var{(w)ⁿ} = n·w^(2(n−1)){Var(w) + (n − 1)·Covar(wᵢ,wⱼ)}
= 6 × 10^(2×5){(1.853 × 10⁻⁴) + 5(8.820 × 10⁻⁵)}
= 6 × 10¹⁰ × 6.263 × 10⁻⁴ = 3.758 × 10⁷

Now the covariates are given by:

Covar(u,w) = Covar(p,q) = 0.355√[Var(p)·Var(q)] = 0.355√[(3.584 × 10⁻⁴)(3.758 × 10⁷)] = 41.199
Covar(uᵢ,uⱼ) = 0.313√[(2.329 × 10⁻⁵)²] = 7.290 × 10⁻⁶, and
Covar(wᵢ,wⱼ) = 0.476√[(1.853 × 10⁻⁴)²] = 8.820 × 10⁻⁵

Hence, the total variance of the 10⁻⁶ dilution is given by:

Var(D) = Var(p/q) = D²{Var(p)/p² + Var(q)/q² − 2Covar(p,q)/pq}
= 10⁻¹²{(3.584 × 10⁻⁴)/1 + (3.758 × 10⁷)/10¹² − 2(41.199)/10⁶}
= 10⁻¹²{(3.584 × 10⁻⁴) + (3.758 × 10⁻⁵) − (8.2398 × 10⁻⁵)}
= 3.136 × 10⁻¹⁶
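These Var(D) calculations can be sketched in Python as follows (illustrative only; the variable names are my own, and the variances and correlation coefficients are those derived above).

```python
import math

n_steps, u, w = 6, 1.0, 10.0          # six 10-fold dilution steps
var_u, var_w = 2.329e-5, 1.853e-4     # pipetting variances from step 1
p, q, D = u**n_steps, w**n_steps, (u / w)**n_steps   # D = 1e-6

def var_power(x, var_x, n, covar_xx=0.0):
    # Var(x^n) = n * x^(2(n-1)) * {Var(x) + (n-1) * Covar(x_i, x_j)}
    return n * x ** (2 * (n - 1)) * (var_x + (n - 1) * covar_xx)

def var_dilution(var_p, var_q, r_uw=0.355):
    covar_pq = r_uw * math.sqrt(var_p * var_q)
    return D**2 * (var_p / p**2 + var_q / q**2 - 2 * covar_pq / (p * q))

# Method (a): separate pipettes, no covariance between successive volumes
print(var_dilution(var_power(u, var_u, n_steps),
                   var_power(w, var_w, n_steps)))            # ~1.23e-16

# Method (b): same pipettes throughout, so successive volumes are correlated
var_p_b = var_power(u, var_u, n_steps, covar_xx=0.313 * var_u)
var_q_b = var_power(w, var_w, n_steps, covar_xx=0.476 * var_w)
print(var_dilution(var_p_b, var_q_b))                        # ~3.14e-16
```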
Poisson sampling error

Next, we need to take account of the Poisson sampling variance (Var(P)) at each step throughout the series, up to and including the volume delivered to the culture plate. Again, Hedges (2002) provides an approximation for this variance, where u represents the volume delivered to the plate:

Var(P) = u²·Var(D) + D²·Var(u)

For method (a): Var(P) = {1 × 1.228 × 10⁻¹⁶} + {10⁻¹² × 2.329 × 10⁻⁵} = {1.228 × 10⁻¹⁶ + 2.329 × 10⁻¹⁷} = 1.461 × 10⁻¹⁶
For method (b): Var(P) = {1 × 3.136 × 10⁻¹⁶} + {2.329 × 10⁻¹⁷} = 3.369 × 10⁻¹⁶
The final pipetting error

For an observed count of X colonies per volume (u) at step n, the error is estimated by (N*)² × Var(P), where N* is the estimate of the original count (N): N* = X × (1/D) × (1/u). If X = 223, then at step n = 6, N* = 223 × 10⁶ and the final pipetting error is, for:
method (a): (N*)² × Var(P) = (223 × 10⁶)² × (1.4609 × 10⁻¹⁶) = 7.265;
method (b): (223 × 10⁶)² × (3.3689 × 10⁻¹⁶) = 16.7537.
The total sampling error

There now remains the estimate of the total (Poisson) sampling error, which is the same for both methods. There are two error components: the first is the delivery of the final diluted inoculum to the plate and the second is the cumulative sampling error during the preparation of the dilution series. The first is estimated by the number of colonies on the plate (X = 223); the second is derived as:

u² × N* × Σ(1/zᵢ), summed from i = n + 1 to i = 2n, = u² × N* × the sampling component (from Table 6.5)

For this example, where n = 6, the sampling component is 1.11111 × 10⁻⁷ and the total sampling error is (1² × 223 × 10⁶ × 1.11111 × 10⁻⁷) + 223 = 24.778 + 223 = 247.778.
Combined variance of the colony count

The variance of X = the sum of the pipetting and sampling errors; for:
method (a), the combined variance of X = 7.265 + 247.778 = 255.043;
method (b), the combined variance of X = 16.753 + 247.778 = 264.531.

The standard deviation and coefficient of variation of the colony count at the 10⁻⁶ dilution, using the two different methods of preparing the dilution series, are:
method (a): SD = √Var = √255.043 = 15.97; %CV = 100 × (15.97/223) = 7.161%;
method (b): SD = √264.531 = 16.264; %CV = 100 × (16.26/223) = 7.293%.

It is worthy of note that for method (a) the pipetting error is only 2.85% of the combined error of the colony count, whilst for method (b) it is 6.33%. These data are summarized in Table 6.7.

TABLE 6.7 Examples of the Calculated Variance for Colony Counts Done by the Standard Plate Count Method

Colonies counted: X = 223; dilution level: D = 10⁻⁶; volume of diluent plated: u = 1.0 ml; number of dilutions: n = 6; colony count (N*) = 2.23 × 10⁸ cfu/ml

                                                  (a) Separate pipettes    (b) Single pipette
Dilution error (a)                                1.2287 × 10⁻¹⁶           3.1361 × 10⁻¹⁶
Final pipetting error (b)                         7.2684                   16.7537
Total sampling error (c)                          247.7778                 247.7778
Var(X) = sum of pipetting and sampling errors     255.046                  264.532
Coefficient of variation of X (CV, %) (d)         7.162                    7.293
Pipetting error as a % of Var(X)                  2.85                     6.33
Distribution error as a % of Var(X)               97.15                    93.67

Source: Modified from Hedges (2002) and reproduced by permission of the author and Elsevier.
(a) From Table 6.5.
(b) Final pipetting error = (N*)² × Var(P), where Var(P) = u²·Var(D) + D²·Var(u).
(c) Total sampling error = {u² × N* × sampling component value (Table 6.5)} + X.
(d) CV (%) = 100 × √Var(X)/X (e.g. 100 × √255.046/223).
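Continuing the illustrative Python sketch (not part of the original text), the pipetting and sampling errors can be combined to reproduce the coefficients of variation in Table 6.7; the Var(D) values are those quoted above.

```python
import math

X, u, D = 223, 1.0, 1e-6        # colonies counted, volume plated (ml), dilution
var_u = 2.329e-5                # variance of the 1 ml pipette volume
sampling_component = 1.11111e-7 # Table 6.5, step n = 6
N_star = X / (D * u)            # estimated count, 2.23e8 cfu/ml

for method, var_D in (("(a) separate pipettes", 1.2287e-16),
                      ("(b) single pipette", 3.1361e-16)):
    var_P = u**2 * var_D + D**2 * var_u          # Poisson-adjusted pipetting variance
    pipetting_error = N_star**2 * var_P          # ~7.27 for (a), ~16.75 for (b)
    sampling_error = X + u**2 * N_star * sampling_component   # ~247.78
    var_X = pipetting_error + sampling_error
    cv_pct = 100 * math.sqrt(var_X) / X
    print(method, round(var_X, 2), round(cv_pct, 2))   # CV ~7.16 % and ~7.29 %
```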
EFFECTS OF GROSS DILUTION SERIES ERRORS ON THE DERIVED COLONY COUNT

Gross errors in the volumes of inoculum and diluent affect the overall dilution level and will therefore influence the apparent colony count. For instance, the sequential inoculation of a 1 ml volume into, for example, 8.5 ml volumes will give a dilution level of 1 in 9.5, not 1 in 10 as intended. Hence the % error associated with the 10⁻⁶ dilution, assuming use of different pipettes with a standard deviation of 0.02 ml and diluent blanks with a standard deviation of 0.1 ml (Table 6.4), would be 5.5% for six 10-fold dilutions, but would be 5.7% for six 9.5-fold dilutions. Possibly more important is the impact on the calculated number of organisms. For instance, an average colony count of 100 colonies from the 10⁻⁶ dilution would be recorded as 100 × 10⁶ cfu/unit of sample. But if we take due note of the volume errors, the derived count should be 100 × 9.5⁶ (i.e. 100 × 735,100 = 73.5 × 10⁶ cfu/unit of sample). The difference in these counts is highly significant, since a 'true' count of 74 million would have been recorded as 100 million. Such differences could affect the likelihood that a colony count result on a sample would, or would not, comply with a microbiological criterion, and this justifies the necessity for calibration of dispensing equipment and checking of dispensed volumes as part of laboratory quality monitoring (Anon., 1999).
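The arithmetic of this example can be checked with a trivial Python snippet (illustrative only):

```python
colonies, steps = 100, 6
apparent = colonies * 10 ** steps      # recorded as 100 x 10^6 cfu/unit
corrected = colonies * 9.5 ** steps    # ~73.5 x 10^6 cfu/unit if each step is really 1 in 9.5
print(apparent, round(corrected))
```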
References

Anon. (1999) General Requirements for the Competence of Testing and Calibration Laboratories, 1st edition. ISO/IEC 17025. International Organisation for Standardisation, Geneva.
Anon. (2006) Official Methods of Analysis, 18th edition. Association of Official Analytical Chemists Inc., Washington, DC.
Augustin, J-C and Carlier, V (2006) Lessons from the organization of a proficiency testing program in food microbiology by interlaboratory comparison: analytical methods in use, impact of methods on bacterial counts and measurement uncertainty of bacterial counts. Food Microbiol., 23, 1–38.
Barraud, C, Kitchell, AG, Labots, H, Reuter, G, and Simonsen, B (1967) Standardisation of the total aerobic count of bacteria in meat and meat products. Fleischwirtschaft, 12, 1313–1318.
Brew, JD (1914) A comparison of the microscopical method and the plate method of counting bacteria in milk. NY Agric. Expt. Sta. Bull., 373, 1–38.
Emswilea, BS, Pierson, LJ, and Kotula, AW (1977) Stomaching versus blending. Food Technol., October 1977, 40–42.
Fung, DYC, Sharpe, AN, Hart, BC, and Liu, Y (1998) The Pulsifier®: a new instrument for preparing food suspensions for microbiological analysis. J. Rapid Meth. Autom. Microbiol., 6, 43–49.
Harrigan, WF (1998) Laboratory Methods in Food Microbiology. Academic Press, London.
Hedges, AJ (1967) On the dilution errors involved in estimating bacterial numbers by the plating method. Biometrics, 23, 158–159.
Hedges, AJ (2002) Estimating the precision of serial dilutions and viable bacterial counts. Int. J. Food Microbiol., 76, 207–214.
Hedges, AJ (2003) Estimating the precision of serial dilutions and colony counts: contribution of laboratory re-calibration of pipettes. Int. J. Food Microbiol., 87, 181–185.
ICMSF (1978) Microorganisms in Foods. 1. Their Significance and Methods of Enumeration, 2nd edition. University of Toronto Press, Toronto.
Jarvis, B (1978) Methods for detecting fungi in foods and beverages. In Beuchat, L (ed.), Food & Beverage Mycology. AVI Publ. Co.
Jarvis, B (1989) Statistical aspects of the microbiological analysis of foods. In Progress in Industrial Microbiology 21. Elsevier, Amsterdam.
Jennison, MW and Wadsworth, GP (1940) Evaluation of the errors involved in estimating bacterial numbers by the plating method. J. Bacteriol., 39, 389–397.
Kang, DH and Dougherty, RH (2001) Comparison of Pulsifier® and Stomacher™ to detach microorganisms from lean meat tissues. J. Rapid Meth. Autom. Microbiol., 9, 27–32.
Kitchell, AG, Ingram, GC, and Hudson, WR (1973) Microbiological sampling in abattoirs. In Board, RG and Lovelock, DW (eds.), Sampling – Microbiological Monitoring of Environments. SAB Technical Series No. 7. Academic Press, London, pp. 43–61.
Mudge, CS and Lawler, BM (1928) Is the statistical method applicable to the bacterial plate count? J. Bacteriol., 15, 207–221.
Sharpe, AN and Jackson, AK (1972) Stomaching: a new concept in bacteriological sample preparation. Appl. Microbiol., 24, 175–178.
Sharpe, AN, Hearn, EM, and Kovacs-Nolan, J (2000) Comparison of membrane filtration rates and hydrophobic grid membrane filter coliform and Escherichia coli counts in food suspensions using paddle-type and Pulsifier sample preparation procedures. J. Food Prot., 62, 126–130.
Snyder, TL (1947) The relative errors of bacteriological plate counting methods. J. Bacteriol., 54, 641–654.
Spencer, R (1970) Variability in colony counts of food poisoning clostridia. Research Report No. 151. Leatherhead Food Research Association.
Tuttlebee, JW (1975) The Stomacher – its use for homogenisation in food microbiology. J. Food Technol., 10, 113–123.
Wu, VCH, Jitareerat, P, and Fung, DYC (2003) Comparison of the Pulsifier® and the Stomacher® for recovering microorganisms in vegetables. J. Rapid Meth. Autom. Microbiol., 11, 145–152.
7 ERRORS ASSOCIATED WITH COLONY COUNT PROCEDURES
In all colony count techniques, replicate volumes of each of several serial dilutions are dispersed in, or on, a nutrient medium. A count of colonies is made after incubation at an appropriate temperature and the level of organisms per unit of sample is derived from the mean number of colonies counted and the appropriate dilution factor. The assumption is frequently made that each colony arises from a single viable organism, but since a colony can arise also from a clump, or aggregate, of organisms the colony count procedure gives an estimate of numbers of colony-forming units (cfu), not of total viable organisms per se. Furthermore, colony count methods will only provide an estimate of those organisms able to grow in, or on, the specific culture medium in the conditions of incubation used in the test. No colony count procedure can be expected, therefore, to provide a true estimate of the total viable population of microorganisms. All colony count methods are subject to errors, some of which are common, whilst others are specific to a particular method or group of methods. Errors common to all procedures include: (a) the sampling and dilution errors, discussed previously (Chapter 6); (b) errors in pipetting volumes of diluted sample; (c) microbial distribution errors; (d) errors of counting and recording colony numbers; and (e) errors of calculation. Details of methodology are given in standard texts such as Harrigan (1998) and ICMSF (1978).
SPECIFIC TECHNICAL ERRORS

Pour Plate and Similar Methods

Any method in which the inoculum is mixed with molten agar may result in a heat-shock to the organisms, the extent of which is dependent on the thermal sensitivity of the organisms and the temperature of the molten agar. Careful tempering of the agar is obviously of paramount importance. In all methods of this type, thorough mixing of the inoculum with the medium is vital in order to obtain an even distribution of organisms. Errors can arise
through, for example, splashing agar onto the lid of a Petri dish, or partial setting of agar before adequate mixing has been undertaken. Colonies that grow in the medium may be obscured by others on the surface (this is particularly the case when ‘spreaders’ occur). In general, colonies in the depths of the agar will be exposed to a slightly lower oxygen tension than will colonies on the surface and marked differences in colony size may result after a finite incubation period. Consequently organisms that are particularly sensitive to thermal shock, lowered oxygen tension, or both, should not be counted by these methods. The Colworth Droplette method (Sharpe and Kilsby, 1971), in which dilutions are prepared in molten agar and replicate drops of the molten agar are dispensed into a Petri dish, is reported not to be affected by limitations on oxygen diffusion, but will be subject to problems of thermal shock and to the factors affecting drop counts (see below). Deep agar counts (i.e. tube counts, black-rod counts, agar in plastic bags, etc.) will provide an estimate of the facultative and obligate anaerobic organisms in a sample. Since the oxygen and redox potential (Eh) tolerance of such organisms will vary, different organisms may grow at different depths of medium. Clearly, this may not give an accurate estimate of the numbers of potential and facultative anaerobic organisms. Where pre-reduced media are used in deep tubes, one sometimes sees stratified growth where organisms in the bottom of a tube are inhibited by a very low Eh, and organisms in the upper layers of agar are inhibited by oxygen diffusion leading to elevated Eh levels. The latter problem can be overcome by pouring a plug of non-inoculated agar onto the top of the solidified, inoculated agar. A further problem sometimes experienced with deep agar counts is splitting of the agar due to gas production. Such effects often occur in deep agar tube counts of proteolytic clostridia and make accurate counting of colonies very difficult.
Surface Plating Methods

In surface plating, a volume of inoculum is pipetted onto the 'dried' surface of an agar medium. A problem common to all methods of this type relates to the extent of surface drying of the agar; inadequate drying will lead to delay in absorption of the inoculum, whilst excessive drying will cause 'case-hardening' and, possibly, a reduction in the water activity (aw) of the surface layer. Case-hardening will permit drops of inoculum to 'run' across the plate and will generally result in a smaller area of inoculum spread – this is of importance in the drop count methods (Miles et al., 1938). Restriction of aw may also affect the rate of growth of hydrophilic organisms.

When the inoculum is spread across the surface of the agar (as in the whole- or one-quarter-plate spread technique) a proportion of the inoculum may adhere to the surface of the spreader, thereby giving a falsely low colony count. However, the extent of this error is likely to be low when compared with the other intrinsic errors of colony count procedures.

The Spiral Plate method (Gilchrist et al., 1973) and similar semi-automated methods reduce the errors of dilution, since a continuously decreasing volume of inoculum is distributed across the surface of the agar plate in the form of an Archimedes' spiral to give the equivalent of a 3-log dilution of organisms on a single plate (Plate 7.1). Possible technical errors include
PLATE 7.1 Distribution of bacterial colonies from raw milk on agar plates using the spiral plating system, to show (a) and (b) low and medium numbers of colonies for counting either manually or using an automated counting system, (c) medium-high numbers of colonies suitable for counting using an automated counting system and (d) unacceptably high numbers of colonies for counting. Counts on plates (b) and (c) are typically done by scoring the number of colonies in each of several sectors of the plate. Images supplied by and reproduced with the kind permission of Don Whitley Scientific Ltd.
malfunction of the continuous pipette. A more frequent problem is that the agar surface is not completely level; problems of uneven or sloping surface cause an abnormal distribution pattern of organisms and computation errors will result.
PIPETTING AND DISTRIBUTION ERRORS

Pipette Errors

The accuracy of pipettes has been discussed in Chapter 6. Since volumes of each dilution used for preparing plate (or tube) counts are also pipetted, an error factor for the variance of the pipettes used must be included in the overall assessment of colony count accuracy.
Distribution Errors

It is normally assumed (Fisher et al., 1922; Wilson, 1922; Snyder, 1947; Reed and Reed, 1949; Badger and Pankhurst, 1960) that the distribution of cfu in a Petri dish, or in a 'drop' of inoculum, follows a Poisson series (see also Chapter 4). However, other workers (e.g. Eisenhart and Wilson, 1943) have demonstrated that although pure cultures of bacteria generally follow a Poisson series, mixed cultures may deviate from Poisson and demonstrate either regular or contagious distribution. This effect was illustrated in Example 4.4 for colony counts of organisms surviving disinfection. In such a situation, the organisms detected are those not sublethally damaged to an extent that precludes growth in the recovery conditions used. Hence, one would expect deviation of colony counts from Poisson whenever sublethal cell damage has occurred (e.g. after heating, freezing, treatment with chemicals, etc.) since the susceptibility of individual organisms and strains will vary, especially if they form part of a cell clump or aggregate. Contagious distribution can be seen also in many situations where the inoculum consists of a mixture of viable cells and cell aggregates.

The Index of Dispersion test of Fisher et al. (1922) has been suggested as a means of testing observed plate count results for agreement with a Poisson distribution (see also Chapter 4), using χ² as the test criterion:

χ² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / x̄

where x̄ is the mean colony count from n plates, xᵢ is the colony count on the ith plate and χ² has (n − 1) degrees of freedom. Fisher et al. (1922) demonstrated that abnormally large variations in colony numbers are associated with factors such as antibiosis between colonies growing on a plate and that subnormal variations (i.e. regular distributions) were associated with culture media defects. Eisenhart and Wilson (1943) proposed the use of a control chart to test whether counts
are in statistical control. Since the critical value of the Fisher χ² test is dependent on the number of replicate counts, this procedure cannot be used unless at least duplicate counts are available at the same dilution level. An example of a χ² control chart is shown in Fig. 7.1 for colony counts of clostridia (Spencer, 1970). It can be seen that one of the χ² values in the first 22 tests (drop counts) slightly exceeded the P = 0.025 level and none fell below the value for P = 0.975. Similarly, only one of the 10 tube counts gave a χ² value below the value for P = 0.975. If the counts are 'in control', on average not more than 1 in 20 tests should lie
outside the critical values for χ²; hence, Spencer's (1970) drop counts were in control and the distribution of colonies could not be shown to differ significantly from Poisson. For the tube counts, there are insufficient data to accept or reject the hypothesis that the counts were 'in control', but there is a reasonable presumption that they were. However, these counts were made on suspensions of pure cultures. Colony counts determined in parallel by the drop count and spiral plate maker (SPM) methods on a range of foods (Jarvis et al., 1977) were examined similarly (Figs. 7.2 and 7.3).

[FIGURE 7.1 Control chart for colony counts of Clostridium botulinum (○, □) and Clostridium perfringens (●, ■) by 'drop' count (○, ●) and tube count (□, ■) methods, with 20 replicate counts/test (data from Spencer, 1970); reference lines mark χ² at P = 0.025, 0.50 and 0.975.]
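The index of dispersion test described above is straightforward to compute; the sketch below (illustrative only, using hypothetical replicate counts rather than data from the text) returns Fisher's χ² together with the control-chart limits used in Figs 7.1–7.3.

```python
from scipy import stats

def index_of_dispersion(counts):
    """Fisher's index of dispersion: chi2 = sum((x_i - mean)^2) / mean, with
    n - 1 degrees of freedom under the hypothesis of a Poisson distribution."""
    n = len(counts)
    mean = sum(counts) / n
    chi2 = sum((x - mean) ** 2 for x in counts) / mean
    df = n - 1
    lower = stats.chi2.ppf(0.025, df)   # chart limit corresponding to P = 0.975
    upper = stats.chi2.ppf(0.975, df)   # chart limit corresponding to P = 0.025
    return chi2, df, lower, upper

# Hypothetical replicate plate counts at a single dilution
chi2, df, lo, hi = index_of_dispersion([52, 61, 48, 55, 67])
print(chi2, df, lo, hi)   # counts are 'in control' if lo < chi2 < hi
```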
[FIGURE 7.2 Control chart for colony counts on food samples using the drop count method; reference lines mark χ² at P = 0.025 and 0.975.]
The control chart for drop counts (Fig. 7.2) shows that χ² values for two of the 20 sets of counts exceeded 5.02 (P(1) = 0.025) and that χ² values for 10 data sets were lower than 0.45 (P = 0.975). In fact, for these data only 8 of the 20 sets appeared to conform to Poisson. However, for the 20 counts taken together, the cumulative χ² = 23.53, for which 0.50 > P > 0.25; hence, overall, the counts cannot be disproved to conform to a Poisson distribution. By contrast, the control chart for SPM counts (Fig. 7.3) indicates 13 tests with χ² values exceeding the limit for P = 0.025 and 1 test with a χ² value below the limit for P = 0.975; only 9 counts conformed to Poisson. The cumulative χ² value was 334.1 with 69 degrees of freedom; this value exceeds χ² for P = 0.001; hence it may be concluded that these counts made by SPM are subject to contagion and probably
[FIGURE 7.3 Control chart for colony counts on food samples using the SP method (3 degrees of freedom); reference lines mark χ² at P = 0.025, 0.50 and 0.975.]
conform to a Negative Binomial distribution rather than Poisson. It is important to recognize that these two methods were used in parallel to test the same dilutions of the test samples, although the SPM counts were done on lower dilutions than were the drop counts. This latter observation was unexpected, since SPM colony counts on whole plates and segments from SPM counts of pure cultures conform to Poisson (Jarvis, unpublished data). The contagious distribution detected may be related to cell aggregates associated with the presence of microcolonies in the food samples.

Further tests on other SPM data for food samples indicated that counts on higher dilutions, which gave colonies distributed across the whole plate, conformed well with Poisson (χ² = 19.983 for ν = 33, giving P > 0.95). However, colony counts for lower dilutions of the samples, derived from the numbers of colonies developing in specific sectors of a plate, demonstrated a contagious distribution. Of 37 individual sets of counts analysed, 18 demonstrated contagion, 1 a regular and 18 a Poisson distribution. The cumulative χ² of 633.8 for ν = 36 gave P < 0.001. Since many of these counts were replicates at lower dilutions of those not disproved to conform to a Poisson distribution at higher dilution, one must suppose either that the effect of close development of colonies on the SPM plates leads to interactive effects or that, at the low dilution, organisms are not randomly distributed in the diluent. Comparison of mean colony count data for different dilutions, however, indicates little practical difference in the results obtained.

LIMITING PRECISION AND CONFIDENCE LIMITS OF THE COLONY COUNT

For data distributed according to a Poisson series, the variance (s²) should equal the mean value (x̄) and therefore the mean count itself provides a measure of the precision of the test. When a large number of sample units is tested (n > 30) and the samples are drawn at random, the derived mean is one of many possible means distributed around the true population mean. Similarly, when a mean colony count is used as an estimate of numbers of cfu in different dilutions of the sample, the derived mean cfu will be normally distributed around the true population mean of that dilution. In a normal population distribution, 95% of the values will lie within ±1.96 standard deviations of the mean. Therefore, 95% of the sample means (x̄) will be expected to lie within ±1.96 standard errors of the population mean (μ). The lower 95% confidence limit of the mean is given approximately by x̄ − 1.96 standard errors and the upper limit by x̄ + 1.96 standard errors. More precisely, the 95% confidence limits are given by (x̄ − t√(s²/n)) and (x̄ + t√(s²/n)), where √(s²/n) is the standard error of the mean and t is the value for Student's t-distribution, which is dependent on the number of degrees of freedom (ν = n − 1). In all cases where σ² is not known, Student's t should be used, since the sample variance (s²) provides only an estimate of σ². Values of t decrease as n increases but, for the 95% confidence limits, since all values of t are close to 2, the latter value may be used as an approximation. In a like manner, confidence limits for P = 0.99, or any other probability value, may be derived (Table 7.1).

For small samples (n < 30) from a Poisson series, the normal approximation cannot be applied unless the product nm is greater than 30 (m is the estimate of the Poisson parameter λ).
TABLE 7.1 Values of Student’s ‘t’ for 95% and 99% Confidence Limits for Different Degrees of Freedom ( n 1) 95% Confidence limits
t
30 35 40 45–54 55–70 71–98 99–165 165
2.04 2.03 2.02 2.01 2.00 1.99 1.98 1.97 1.96
99% Confidence limits
30 35 40 45 50 55 58–64 65–73 74–85 86–103 104–129 130–174 175
t 2.75 2.72 2.70 2.69 2.68 2.67 2.66 2.65 2.64 2.63 2.62 2.61 2.60 2.58
As m = s² in the Poisson series (see Chapter 3), the analytical mean (x̄) provides an estimate of both m and s²; hence the estimate of m is given by x̄ ± √(x̄/n), where √(x̄/n) is the standard error of the mean. Confidence limits for the estimate of the population mean (μ = σ²) are given by:

x̄ − t√(x̄/n)  to  x̄ + t√(x̄/n)

where t is approximately 2 (for ν ≥ 10) or may be obtained from Pearson and Hartley's (1976) table 12 for 2Q = 0.05 (for the 95% limits) or 2Q = 0.01 (for the 99% limits). The calculation is illustrated in Example 7.1. When nm < 30, confidence limits for a Poisson variable can be obtained from standard tables (Pearson and Hartley (1976), table 12, for 1 − 2Q = 0.95, provides the 95% confidence limits for a value c, which can be either a single count or an estimate (x̄) of the statistic m).

For small samples (n < 30) from a Binomial distribution, the normal approximation can be used when P = 0.4–0.6 and n = 10–30, or when P = 0.1–0.9 and n > 30, where P is the probability and n is the maximum possible level of occurrence of individuals. When n < 10, or when n = 10–30 and P lies outside the range given above, it is difficult to calculate confidence limits, but approximate levels can be obtained from standard tables or charts of P (e.g. Pearson and Hartley (1976), table 41). In the latter, values of n are printed along the curves and values of P (as c/n) on the abscissa. Confidence limits are read from the ordinate. As P = x/n, the limits for x are derived from
EXAMPLE 7.1 CALCULATION OF 95% CONFIDENCE LIMITS FOR A SMALL SAMPLE (n < 30) FROM A POISSON SERIES

For the colony counts used in Example 4.1 the statistics are: m = x̄ = 76.7; s² = 49.8; n = 10. Agreement with a Poisson series was not rejected (P > 0.05); therefore dispersion of the population was accepted to be random. Since the product nm > 30, the best estimate of m (and hence of x̄) with its standard error is given by:

x̄ ± √(x̄/n) = 76.7 ± √(76.7/10) = 76.7 ± 2.77

The value of Student's t (Pearson and Hartley, 1976, table 12) for n − 1 = 9 degrees of freedom and 2Q = 0.05 (for 95% limits) is t = 2.262. So the 95% confidence limits of the population mean (μ) are given by:

(x̄ − t√(x̄/n)) to (x̄ + t√(x̄/n)) = (76.7 − 2.262 × 2.77) to (76.7 + 2.262 × 2.77) = 70.44 to 82.96, or 76.7 ± 6.26

Therefore, for these data, the best estimate of the population mean would be expected to lie between 70 and 83 cfu/g, with a 95% probability.
limits for P multiplied by n. Such limits are rarely needed in microbiology except in relation to sampling schemes (Chapter 5). Approximate limits can be obtained from the Poisson variable but will always be wider and more skewed than for the corresponding Binomial variable; for example, 95% limits for x = 14 and n = 20 range from 8 to 24 for a Poisson distribution and from 9.5 to 17.8 for a Binomial distribution.

For small samples (n < 30) from a contagious (e.g. a negative binomial) distribution with s² > x̄, it is not possible to use the normal distribution to derive confidence limits until the data have been transformed. The choice of transformation depends on the values of x̄ and k̂ (Table 3.4 and method (3) for estimating k̂). Each value of x is replaced by a transformed value y, where

y = log(x + k̂/2)   or   y = sinh⁻¹√[(x + 0.375)/(k̂ − 0.75)]

Taking the simple case, the mean transformed count (ȳ) is given by ȳ = Σ log₁₀(x + k̂/2)/n, where n = the number of counts. As the distribution of the transformed counts is approximately normal and the expected variance is 0.1886 trigamma(k̂) (Table 4.3), the 95% confidence limits are given by:

ȳ ± t√[0.1886 trigamma(k̂)/n]

where t is given by Student's distribution (Table 7.1; and
Pearson and Hartley (1976), table 12). These limits are transformed back to the original scale to give confidence limits for the population mean:

antilog{ȳ ± t√[0.1886 trigamma(k̂)/n]} − k̂/2

However, for small values of n (n < 10) the estimate of confidence limits will be only very approximate. If the more complex transformation is used, the 95% confidence limits are given by ȳ ± t√[0.25 trigamma(k̂)/n], but it is doubtful whether the greater accuracy of this complex transformation is justified. In many instances, the Poisson approximation can be used if x̄ is small (<5) and k̂ is large (>5). The logarithmic transformations (y = log x, or y = log(x + 1)) can be used if k̂ is close to 2; the addition of one is required if some of the results are zero, since it is not possible to determine the logarithmic value of zero. An example of a calculation of 95% confidence limits for a negative binomial is given in Example 7.2 and for a logarithmic transformation in Example 7.3.

Table 7.2 summarises the 95% confidence limits for various colony levels, assuming a Poisson distribution. From the data on limiting precision, it can be seen that the precision of the count increases as the number of colonies counted increases. For drop counts it is possible to count colonies until a predetermined minimum count is reached, by plating a greater number of drops of each dilution than might normally be plated (Badger and Pankhurst, 1960). This has the advantage that the precision of the colony count is kept constant in relation to the distribution of organisms in the inoculum. For
TABLE 7.2 Approximate 95% Confidence Limits for Numbers of Colonies Assuming Agreement with a Poisson Series

Number of colonies counted   Limiting precision (to nearest %)   95% Confidence limits
500                          9                                   455–545
400                          10                                  360–440
320                          11                                  284–356
200                          14                                  172–228
100                          20                                  80–120
80                           22                                  62–98
50                           28                                  36–64
30                           37                                  19–41
20                           47                                  11–29
16                           50                                  8–24
10                           60                                  4–16
6                            83                                  1–11
most purposes a count of about 100 colonies, providing an accuracy of ±20%, is adequate, although Gaudy et al. (1963) recommended lower and upper limits for acceptable colony counts by the drop count method of 100 and 300 colonies. For pour-plate, spread-plate and similar methods of enumeration, this procedure cannot readily be applied. It is frequently recommended (ICMSF, 1978; Harrigan, 1998; Anon., 2005a, b) that colony numbers are counted only on plates having between 30 and 300 colonies (or between 25 and 250), thereby giving counts with a precision of about ±37 to ±11%. It was noted by Cowell and Morisetti (1969) that use of other than a 10-fold serial dilution series could lead to improved precision in the plate count. For instance, they suggested that if a 3-fold dilution series were used and counts were made only on plates with 80–320 colonies, the precision of the count would be improved to within ±22 to ±11%. Furthermore, such a procedure would avoid anomalous situations in which, for instance, no plate in a 10-fold series has between 30 and 300 colonies, or sequential dilutions both have more than 30 but fewer than 300 colonies. When plates at more than one dilution level are counted, it is possible merely to derive an arithmetic mean count at, for example, two dilution levels (ICMSF, 1978, p. 116).
EXAMPLE 7.2 CALCULATION OF THE 95% CONFIDENCE LIMITS FOR A SMALL SAMPLE FROM A NEGATIVE BINOMIAL DISTRIBUTION

The counts (Example 4.4) are transformed using y = log(x + k̂/2) and the mean transformed count is ȳ = 1.3255, with k̂ = 5.0 and n = 10. The geometric mean count (x̄) = antilog(ȳ) = 21.16. From Table 4.3, the expected variance (0.1886 trigamma(k̂)) = 0.0417. The value of t = 2.262 for n − 1 = 9 and 2Q = 0.05 (Pearson and Hartley (1976), table 12). Therefore the 95% confidence limits of the sample mean (m) are given by:

antilog{ȳ ± t√[0.1886 trigamma(k̂)/n]} − k̂/2
= antilog{1.3255 ± 2.262√(0.0417/10)} − 2.5
= antilog(1.3255 ± 0.1461) − 2.5
= 12.61 to 27.12, for the derived mean of 21.16

It should be noted that the confidence limits are distributed asymmetrically around the geometric mean value, that is, the lower CL = 21.16 − 8.55 = 12.61 and the upper CL = 21.16 + 5.96 = 27.12. Asymmetry is always found when a reverse transformation is done on values calculated using log-transformed data.
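The same calculation can be sketched in Python (illustrative only); the summary statistics ȳ = 1.3255, k̂ = 5.0 and n = 10 are taken from the example, and scipy's polygamma(1, ·) supplies the trigamma function.

```python
import math
from scipy import stats, special

def negbin_transformed_ci(y_bar, k_hat, n, conf=0.95):
    """95% CL for counts transformed as y = log10(x + k_hat/2); the expected
    variance of y is 0.1886 * trigamma(k_hat), as in Table 4.3."""
    var_y = 0.1886 * float(special.polygamma(1, k_hat))
    t = stats.t.ppf(1 - (1 - conf) / 2, n - 1)
    half = t * math.sqrt(var_y / n)
    lower = 10 ** (y_bar - half) - k_hat / 2
    upper = 10 ** (y_bar + half) - k_hat / 2
    return lower, 10 ** y_bar, upper        # (lower CL, geometric mean, upper CL)

print(negbin_transformed_ci(1.3255, 5.0, 10))   # ~(12.6, 21.2, 27.1)
```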
EXAMPLE 7.3 CALCULATION OF 95% CONFIDENCE LIMITS FOLLOWING LOGARITHMIC TRANSFORMATION

From Example 2.1 the transformed counts (y = log x) are 4.2695, 4.9258, 4.8293, 4.8887 and 4.9165 log cfu/g. The mean transformed count (ȳ) = 4.7660 and the variance of the transformed counts (s²ᵧ) = 0.07844. For n − 1 = 4 degrees of freedom, and 2Q = 0.05, the value of t = 2.776. The 95% confidence limits for ȳ are given by:

ȳ ± t√(s²ᵧ/n) = 4.766 ± 2.776√(0.07844/5) = 4.766 ± 0.3477 = 4.4183 to 5.1137 ≈ 4.42 to 5.11 log cfu/g

Hence, the derived (geometric) mean = antilog 4.766 = 58,345 cfu/g and the derived 95% CL are 26,200 to 129,927 cfu/g. Again, note the asymmetry of the confidence limits.
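The corresponding sketch for the logarithmic transformation, using the five log-transformed counts quoted above (illustrative only):

```python
import statistics
from scipy import stats

logs = [4.2695, 4.9258, 4.8293, 4.8887, 4.9165]   # y = log10(x), from Example 2.1
n = len(logs)
y_bar, var_y = statistics.mean(logs), statistics.variance(logs)   # 4.766, 0.07844
half = stats.t.ppf(0.975, n - 1) * (var_y / n) ** 0.5             # 2.776 * SE = 0.348
print(10 ** y_bar)                                  # geometric mean ~58,300 cfu/g
print(10 ** (y_bar - half), 10 ** (y_bar + half))   # ~26,200 to ~130,000 cfu/g
```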
However, it has been recommended for many years that a weighted mean count be derived (Farmiloe et al., 1954); in this method, all colonies are counted at each dilution level providing countable plates and the mean count is weighted to take account of the differing levels of precision of the counts, using the following formula:

x̄ = (1/d₁) × (ΣC₁ + ΣC₂ + ΣC₃ + … + ΣC_z)/(n₁ + n₂/a + n₃/a² + … + n_z/a^(z−1))

where C₁ = the total colony count on all (n₁) plates at the lowest countable dilution (d₁); C₂ = the total colony count on all (n₂) plates at the next countable dilution (d₂), with dilution factor a; and so on. The use of this method is illustrated in Example 7.4. It is perhaps simpler to consider that, in a 10-fold series, the colonies (cfu) on a plate prepared at the first countable dilution level (d₁) are equivalent to 1 ml of the diluted sample; the cfu at the next dilution represent 0.1 ml and, at the next dilution, 0.01 ml of the diluted sample. Hence, the total number of cfu counted on a single plate at each of three sequential dilutions is divided by 1.11 to give the derived number of cfu at the first of those dilution levels. If two or three parallel plates are counted at each dilution, then the total count of colonies is divided by 2.22 or 3.33, respectively. Since a larger number of colonies is used to derive the weighted mean, the limiting precision of the mean is better than for either the mean count of colonies on the first countable plates or the arithmetic mean of colonies from two countable dilutions. Assuming a Poisson distribution, the weighted mean is also the maximum likelihood estimate of the colony count. For the colony counts used in Example 7.4, the 95% confidence limits of the mean counts of colonies, and their limiting precisions, are shown in Table 7.2. In the example, the weighted mean and confidence limits of colony counts at two or three dilution levels do
EXAMPLE 7.4 CALCULATION OF WEIGHTED AND SIMPLE MEAN COUNTS

Assume colony counts have been determined as follows on a series of dilution plates:

Dilution (d)   Number of colonies (C)   Total number of colonies (ΣC)   % Limiting precision*
10⁻⁴           320, 286, 291            897                             11.6
10⁻⁵           45, 38, 43               126                             30.9
10⁻⁶           3, 5, 4                  12                              100.0

* Limiting precision on mean count assuming Poisson distribution.
The arithmetic mean count of cfu at the 10⁻⁴ dilution is given by:

(1/d) × (ΣC/n) = 10⁴ × (897/3) = 2.99 × 10⁶ cfu/ml

The arithmetic mean count of cfu at the 10⁻⁵ dilution = 10⁵ × (126/3) = 42 × 10⁵. Hence, depending on the dilution level used, the simple mean colony count will vary from 3.0 to 4.2 × 10⁶. ICMSF (1978) recommended use of the arithmetic mean value, that is: (4.2 × 10⁶ + 2.99 × 10⁶)/2 = 3.6 × 10⁶ cfu per unit of sample. This value is, however, based on counts ranging in precision from ±11 to ±34%. If counts on the highest dilution were also included, the mean value would rise to 3.73 × 10⁶, but the precision range would widen to ±11 to ±73%. Using the formula of Farmiloe et al. (1954), the weighted mean count would be:

x̄ = (1/d₁) × (ΣC₁ + ΣC₂ + … + ΣC_z)/(n₁ + n₂/a + … + n_z/a^(z−1))
  = 10⁴ × (897 + 126 + 12)/3.33 = 10⁴ × 1035/3.33 = 3.1 × 10⁶

This calculation gives 1/10 of the weight to the colony count from the 10⁻⁵ dilution (precision ±34%) and 1/100 of the weight to the 10⁻⁶ dilution count (precision ±73%), whilst acknowledging that the overall precision of the mean count is increased by counting all countable colonies.
5/26/2008 7:55:14 PM
ERRORS ASSOCIATED WITH COLONY COUNT PROCEDURES
135
not differ from the arithmetic mean but the precision of the weighted mean count is slightly better (13.5%) than that using the arithmetic mean (the best is 14.4%). Where arithmetic means are derived from counts at two or more dilution levels, the confidence limits can be no better than the limits for the least accurate count; hence arithmetic averaging is not to be recommended . However, experience suggests that the precision is rarely improved (Jarvis et al., 2007).
GENERAL TECHNICAL ERRORS Incubation Errors The temperature profile within a stack of Petri dishes in an incubator will be dependent on: the temperature differential between the incubator air and the Petri dish contents; the number of Petri dishes in a stack; the degree to which stacks of dishes are crowded together; the extent to which the incubator is filled; and whether or not the incubator has an air circulator. Differences in count may occur in replicate dishes as a result of overcrowding of plates, poor air circulation, etc. since different dishes may effectively be incubated at different temperatures. Where a temperature gradient is suspected within an incubator duplicate plates should be incubated on different shelves in order to try to cancel out any temperature effects. These effects may be exacerbated by problems of inadequate oxygen transfer, resulting in oxygen starvation of strict aerobes, unless vented Petri dishes are used. Similarly, anaerobic cultures may be adversely affected by failure of anaerobe cultivation systems, etc.
Counting Errors

The ability to count accurately the number of colonies in a Petri dish (or other system) is dependent on many factors, not least of which is the ability of the worker to discriminate colonies of different sizes both from one another and from food debris or imperfections in the agar. This ability depends not only upon the individual worker, per se, but also upon the worker's state of mental and physical health, and whether distractions occur that may affect the accurate counting and recording of colony numbers. The efficiency of colony counting will generally be lower with increasing numbers of plates counted and will also be lower when very large numbers of colonies occur on a plate, such that only an estimated colony count can be derived (ICMSF, 1978). However, low accuracy may also occur where few colonies appear on a plate. In a study of the counting efficiency of six workers, Fruin et al. (1977) demonstrated that only 51% of colony counts (over the range 5–400 colonies/plate) lay within ±5% of the real count (determined from photographs of the plates; Fruin and Clark, 1977), and only 82% lay within ±10% of the 'photo count'. Their data, summarised in Table 7.3, show that the efficiency of counting increased slightly when not fewer than 20 colonies/plate were counted.
TABLE 7.3 Percentage Efficiency of Colony Counting by Six Workers

                                           Percentage of workers' counts
Colony count     Number of plates     Within ±5% of        Within ±10% of
range/plate      counted/worker       photo count          photo count
10–100           352                  52 (a)               78 (a)
20–200           339                  60 (b)               85 (b)
30–300           361                  60 (b)               88 (b,c)
40–400           361                  61 (b)               89 (c)

Source: Modified from Fruin et al. (1977).
a,b,c Mean values within a column followed by the same letter are not significantly different (P > 0.05).
Analysts tended to count fewer colonies than were actually on the plate; the mean deviation of all analysts from the photo count was 2.5%, with a range of 0.1–4.8%. However, as a matter of good laboratory practice it is generally better not to use low counts unless these occur on the first countable plates.

The number of colonies counted may also be affected by factors such as coalescence. Hedges et al. (1978), using the Miles and Misra (1938) technique, demonstrated that for Escherichia coli the average number of colonies counted after 9 h incubation was 55/drop (range 45–63), whereas the average number counted after 18 h was 45 (range 38–52; n = 16). Similar effects were not reported for counts of Staphylococcus aureus, which produces more compact colonies, or for E. coli when counted by the Colworth 'droplette' method (Sharpe and Kilsby, 1971) or the SPM (Gilchrist et al., 1973).

The use of instrumental methods to enumerate colonies on plates can reduce the errors due to operator bias and fatigue. Yet counts made by such methods may not always be comparable with 'manual' counts because of differences in the discriminatory power of the eye and of the electronic counting systems (Jarvis and Lach, 1975; Jarvis et al., 1978). For instance, the Fisher Bacterial Colony Counter can detect the presence of colonies that cannot readily be counted by eye, yet it cannot discriminate between food particles and bacterial colonies, or imperfections in the agar or the Petri dish. Large colonies may be counted more than once, whilst colonies at different depths of the agar may be ignored because colonies on the surface have already been 'scored'. These faults are common to many electronic counting systems. For counting spiral plates, a Laser Bacterial Colony Counter is available which has been shown to give good agreement with 'manual' counts in both laboratory and factory evaluations (Jarvis et al., 1978; Kramer et al., 1979).

Whereas all colonies are normally recorded only for 'countable' plates, tubes and 'drops', the counting system for the SPM assumes that only part of each plate will be counted, since the dilution is made on the plate, except when the colony count is low (Plate 7.1). In manual counting of SPM plates, colonies in two diametrically opposed sectors are counted and
the number is related to the area counted and hence to the amount of inoculum dispensed over those areas. The Laser counter displays the area occupied by a predetermined number of colonies or, if that number is not reached, the total number of colonies on the plate. By counting to a predetermined colony level, the distribution error is kept constant, thereby ensuring comparability in precision between replicate series of plates. Hedges et al. (1978) have shown that the precision of SPM plate counts is similar whether counts are made on the whole plate or only on a 'sector'.

Worker's Error

Courtney (1956), Anon. (2001) and ICMSF (1978) recommend that all workers should be able to repeat their own counts of colonies to within 5% and the counts of other workers to within 10%. Fowler et al. (1978) determined a coefficient of variation (CV) of 7.7% for individuals reproducing their own results and 18.2% for workers reproducing counts made by other persons; these errors include the preparation of dilutions and plates, as well as counting errors. Donnelly et al. (1960) had previously shown that 7 of 21 analysts produced colony count data for milk with a marked consistent bias.
COMPARABILITY OF COLONY COUNT METHODS

Because of the increasing interest in microbiological quality monitoring and the present-day interest in microbiological criteria for foods (Chapter 14), assessment of the comparability of various methods of analysis is necessary from time to time. In certain circumstances, specific techniques may be used because they favour the growth of particular groups of organisms (e.g. spread plates for psychrotrophs on meat and poultry; Barnes and Thornley, 1966). In a comparison of the pour-plate and spread-plate methods for mesophilic aerobes on meat, Nottingham et al. (1975) reported that higher counts were generally obtained by the spread-plate method, confirming the earlier observations of Barraud et al. (1967). In contrast, close comparability of counts was found by Donnelly et al. (1976), Jarvis et al. (1977) and Kramer et al. (1979) for various colony count methods applied to a range of food products. Jarvis et al. (1977) were unable to demonstrate any statistically significant differences (P > 0.05) between counts determined by four workers using the pour-plate, spread-plate, drop count and SP methods on sausages, minced beef, cream and coleslaw. Other studies on milk have shown a high degree of correlation between standard methods and the SP method (Donnelly et al., 1976). In a further investigation, Kramer and Gilbert (1978) compared the pour-plate, spread-plate, drop count (Miles et al., 1938), Colworth 'droplette' (Sharpe and Kilsby, 1971) and micro-dilution (Kramer, 1977) methods on a wide range of food products and were also unable to detect any significant differences. Correlation between all the methods was high (r = 0.979–0.994; P < 0.001).
It has rightly been stated by Hedges et al. (1978) that such studies have been concerned more with comparability than with precision. In their work, Hedges and co-workers demonstrated that the SP method provided colony counts of bacteria in pure suspension that were at least as precise as those obtained under comparable conditions by the Colworth 'droplette' and the Miles and Misra (MM) methods. The CVs were 1.50% (SP), 0.96% (MM) and 4.41% (droplette) for 25 replicate counts; where 5 replicates were tested, the CV for each method was about 2.1%. Earlier studies using the pour-plate method (Ziegler and Halvorson, 1935) had shown a CV for counts on pure cultures ranging from about 3.5% (100 replicates) up to 24%, with the majority at the lower end.

OVERALL ERROR OF COLONY COUNT METHODS

Table 7.4 summarizes the types of error that can occur in colony count procedures. Jennison and Wadsworth (1940) confined consideration of the overall error to (a) the distribution, or sampling, error and (b) the dilution error. They defined it mathematically as:

% Total error = √[(% Distribution error)² + (% Dilution error)²]

It is clear that these errors do not necessarily take into account the errors associated with sampling, weighing or plating (Hedges, 2002). Since the latter also comprise part of the overall worker's error (i.e. in counting, recording, calculating, etc.), which itself includes the distribution error, plating errors may be ignored. Hence, the overall error of colony count procedures should be derived as follows:

% Total error = √(A² + B² + C²)

where A = % sampling error, B = % distribution error and C = % dilution error.

TABLE 7.4 Sources of Error in Colony Count Procedures
Source of error        Includes errors due to
Sampling error         Weighing; maceration
Dilution error         Pipette imprecision; pipette calibration errors; dispensed diluent volumes
Plating error          Pipetting errors; culture medium faults; incubation faults
Distribution error     Non-randomness of propagules
Calculation error      Counting errors; 'recording' errors; mathematical errors
It is frequently stated that the overall results obtained by colony count procedures should be repeatable to within 10% of the value obtained by another worker using the same techniques and the same samples. We have already seen that dilution errors alone can exceed 10% in some circumstances, depending on the relative accuracy of pipette and dilution volumes and the number of dilutions prepared. Distribution errors can also exceed 10% when only small numbers of colonies (<400) are counted. If we assume for normal routine purposes that the following percentage errors occur, we can derive estimates of the typical overall error of the colony count method. Assuming a sampling error of 5% (A), a distribution error of 10% (100 colonies counted) (B) and a dilution error of 5.5% (C) (from Table 6.4, dilution to 10⁻⁶ with different pipettes), then the overall error is given by:

Overall error (%) = √(5² + 10² + 5.5²) = √155.25 = 12.46%

So, if it is assumed that the cultural and other conditions are satisfactory, and that the distribution of organisms follows a Poisson series (i.e. that a hypothesis of randomness is not rejected), then the approximate 95% confidence limits on the count of 100 × 10⁶ would be ±25%. Table 7.5 illustrates the confidence limits for a range of colony count levels (and of their logarithmic transformations).

Donnelly et al. (1960) recommended that the variance amongst analysts in state laboratories testing milk samples containing 5000–150,000 cfu/ml should not exceed 0.012 (in terms of log-count); hence, on 95% of occasions, the difference between two counts should not exceed 2√(2 × 0.012), i.e. 0.31 log units. This value is not dissimilar to that derived in Table 7.5 for a mean count of 200 colonies/plate. However, none of these data takes account of abnormal variation in the distribution of organisms within a food sample, since milk is easily mixed and the theoretical calculation assumes random distribution. Hall (1977) showed that the variance of log-transformed colony counts done by different workers in different laboratories varied from 0.015 to 0.39, with an overall variance of 0.033 (almost treble that shown by Donnelly et al., 1960), giving overall confidence ranges from duplicate analyses of ±0.51 log units. Variance between laboratories was 0.057 log units, giving reproducibility limits of ±0.68 log units for duplicate analyses. Data obtained in analyses of various types of frozen vegetables (Hall, 1977) showed 95% confidence limits for aerobic plate counts ranging from ±0.11 to ±1.05 log units (Table 7.6). These limits take account of the inter-sample variation, in addition to the dilution and distribution errors. In similar tests using differential colony counts for coliforms, E. coli and S. aureus, Hall (1977) reported 95% confidence limits ranging up to ±1.5 log units. Clearly, in a situation where sublethal damage causes variable recovery of organisms, especially on selective-diagnostic media, the degree of confidence that can be placed in the results will be limited. It has been stated elsewhere (Jarvis et al., 1977; Kramer and Gilbert, 1978) that the expected 95% confidence limits about the mean for aerobic colony counts are of the order of ±0.5 log cycles.
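The combination of errors can be checked with a short script. The following Python sketch (the function names are mine, for illustration only) repeats the calculation above: the component errors are combined in quadrature, the distribution error is taken as 100/√(colonies counted), and approximate 95% confidence limits are taken as ±2 × the overall error.

import math

def poisson_distribution_error(colonies_counted):
    """Approximate % distribution (random sampling) error for a Poisson count."""
    return 100.0 / math.sqrt(colonies_counted)

def overall_error(sampling_pct, distribution_pct, dilution_pct):
    """Combine independent percentage errors in quadrature."""
    return math.sqrt(sampling_pct**2 + distribution_pct**2 + dilution_pct**2)

# Worked example from the text: 5% sampling error, 100 colonies counted
# (10% distribution error) and 5.5% dilution error (six 10-fold dilutions).
total = overall_error(5.0, poisson_distribution_error(100), 5.5)
print(round(total, 2))              # -> 12.46 (%)

# Approximate 95% confidence limits taken as +/- 2 x the overall error.
print(round(2 * total, 1))          # -> 24.9, i.e. roughly +/-25% on the count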
TABLE 7.5 Approximate Confidence Limits for Counts of Colonies Based on the Poisson Distribution, Assuming a Sampling Error of 5% and a Dilution Error (to 10⁻⁶) of 5.5%

Total colonies    Mean colony      Overall %      Bounds of the 95% CL on    Log mean       Bounds of the 95% CL
counted^a         count (×10⁶)     error^b (±)    the colony count (×10⁶)    colony count   on log mean count^c
600               300              8.48           249–351                    8.48           8.44–8.51
400               200              8.96           164–236                    8.30           8.21–8.37
200               100              10.26          79–121                     8.00           7.95–8.05
100               50               12.46          38–62                      7.70           7.62–7.76
60                30               14.89          21–39                      7.48           7.34–7.58
40                20               17.47          13–27                      7.30           7.20–7.38
30                15               19.71          9–21                       7.18           6.95–7.32

a Assuming duplicate plates.
b Assuming six 10-fold dilutions, using different pipettes for each stage; the error increases disproportionately with reducing numbers of colonies counted.
c The 95% CL bounds around the log mean count are non-linear.
TABLE 7.6 95% Confidence Limits for Aerobic Colony Counts on Various Frozen Vegetables

Type of vegetable       Mean colony count (log₁₀ cfu/g)    95% Confidence limits (log₁₀ cfu/g)^a
Sliced green beans      4.52                               ±0.21
Peas                    4.27                               ±0.22
Diced swede             4.33                               ±0.11
Diced carrots           5.02                               ±0.13
Diced celery            4.95                               ±0.22
Spinach leaf I          5.71                               ±1.03
Spinach leaf II         5.16                               ±0.49
Spinach leaf III        5.28                               ±1.05

Source: Recalculated from Hall (1977).
a These limits differ in some cases from those shown in tables 1–8 of Hall (1977).
This is not dissimilar to some of the limits shown in Table 7.6 or to values derived from other experimental data. However, the data of Hall (1977) indicate that much lower precision can occur frequently. Some contribution to these wide confidence limits is related to worker error and to non-uniform experimental technique, including variable resuscitation. Hall's (1977) report is a useful source of data on inter- and intra-worker and laboratory error in food microbiology.
References Anon. (2001) Downes, FP and Ito, K (eds.) In Compendium of Methods for the Microbiological Examination of Foods, 4th edition. American Public Health Association, Washington, DC. Anon. (2005a) Milk and Milk Products – Quality Control in Microbiological Laboratories – Part 1: Analyst Performance Assessment for Colony Counts. International Organisation for Standardisation. ISO 14461-1:2005. Anon. (2005b) Milk and Milk Products – Quality Control in Microbiological Laboratories – Part 2: Determination of the Reliability of Colony Counts of Parallel Plates and Subsequent Dilution Steps. International Organisation for Standardisation, Geneva. ISO 14461-2:2005. Badger, EM and Pankhurst, ES (1960) Experiments on the accuracy of surface drop bacterial counts. J. Appl. Bacteriol., 23, 28–36. Barnes, EM and Thornley, MJ (1966) The spoilage flora of eviscerated chickens stored at different temperatures. J. Food. Technol., 1, 113–119. Barraud, C, Kitchell, AG, Labots, H, Reuter, G and Simonsen, B (1967) Standardisation of the total aerobic count of bacteria in meat and meat products. Fleischwirtschaft, 12, 1313–1318. Courtney, JL (1956) The relationship of average standard plate count and ratios to employee proficiency in plating dairy products. J. Milk Food Technol., 10, 336–344. Cowell, ND and Morisetti, MD (1969) Microbiological techniques – some statistical methods. J. Sci. Food Agric., 20, 573–579. Donnelly, CB, Harris, EK, Black, LA and Lenis, KH (1960) Statistical analysis of standard plate counts of milk samples split with state laboratories. J. Milk Food Technol., 23, 315–319. Donnelly, CB, Gilchrist, JE, Peeler, JT and Campbell, JE (1976) Spiral plate count method for examination of raw and pasteurised milk. Appl. Environ. Microbiol., 32, 21–27. Eisenhart, C and Wilson, PW (1943) Statistical methods and control in bacteriology. Bacteriol. Rev., 7, 57–137. Farmiloe, FJ, Cornford, SJ, Coppock, JBM and Ingram, M (1954) The survival of Bacillus subtilis spores in the baking of bread. J. Sci. Food Agric., 5, 292–304. Fowler, JL, Clark, WS, Foster, JF and Hopkins, A (1978) Analyst variation in doing the standard plate count as described in ‘Standard methods for the examination of dairy products’. J. Food Protect., 41, 4–7. Fruin, JT and Clark, WS (1977) Plate count accuracy: Analysts and automated colony counter versus a true count. J. Food Protect., 40, 552–554. Fruin, JT, Hill, TM, Clarke, JB, Fowler, JL and Guthertz, LS (1977) Accuracy and speed in counting agar plates. J. Food Protect., 40, 596–599. Fisher, RA, Thornton, HG and MacKenzie, WA (1922) The accuracy of the plating method of estimating the density of bacterial populations with particular reference to the use of Thornton’s agar medium with soil samples. Annal. Appl. Biol., 9, 325–359. Gaudy, AF, Jr., Abu-Niaaj, F and Gaudy, ET (1963) Statistical study of the spot-plate technique for viable-cell counts. Appl. Microbiol., 11, 305–309. Gilchrist, JE, Campbell, JE, Donnelly, CB, Peeler, JT and Delaney, JM (1973) Spiral plate method for bacterial determination. Appl. Microbiol., 25, 244–252. Hall, LP (1977) A study of the degree of variation occurring in results of microbio-logical analyses of frozen vegetables. Report No.182. Campden & Chorleywood Food Research Association, Campden UK. Harrigan, WF (1998) Laboratory Methods in Food Microbiology, 3rd edition. Academic Press, London.
Hedges, AJ (2002) Estimating the precision of serial dilutions and viable bacterial counts. Int. J Food Microbiol., 76, 207–214. Hedges, AJ, Shannon, R and Hobbs, RP (1978) Comparison of the precision obtained in counting viable bacteria by the ‘Spiral Plate Maker’, the ‘Droplette’, and the ‘Miles & Misra’ Methods. J. Appl. Baceriol., 45, 57–65. ICMSF (1978) Microorganisms in Foods 1. Their Significance and Methods of Enumeration, 2nd edition. (reprinted and revised 1988) University Press, Toronto. Jarvis, B, Hedges, A, and Corry (2007) Assessment of measurement uncertainty for quantitative methods of analysis: Comparative assessment of the precision (uncertainty) of bacterial colony counts. Int. J. Food Microbiol. 116, 44–51. Jarvis, B and Lach, V (1975) Evaluation of the Fisher Bacterial Colony Counter. Tech. Circ. No. 600. Leatherhead: Leatherhead Food Research Association. Jarvis, B, Lach, VH, and Wood, JM (1977) Evaluation of the Spiral Plate Maker for the enumeration of microorganisms in foods. J. Appl. Bacteriol., 43, 149–157. Jarvis, B, Lach, VH, and Wood, JM (1978) Evaluation of the Laser bacterial colony counter. Research Report No. 290. Leatherhead Food Research Association, Leatherhead. Jennison, MW and Wadsworth, GP (1940) Evaluation of the errors involved in estimating bacterial numbers by the plating method. J. Bacteriol., 39, 389–397. Kramer, J (1977) A rapid microdilution technique for counting viable bacteria in food. Lab. Pract., 26, 676. Kramer, JM, and Gilbert, RJ (1978). Enumeration of microorganisms in food: a comparative study of five methods. J. Hyg., Camb., 61, 151–159. Kramer, JM, Kendall, M and Gilbert, RJ (1979) Evaluation of the spiral plate and laser colony counting techniques for the enumeration of bacteria in foods. Eur. J. Appl. Microbiol. Biotechnol., 6, 289–299. Miles, AA, Misra, SS and Irwin, JO (1938) The estimation of the bacteriological power of blood. J. Hyg., Camb., 38, 732–749. Nottingham, PM, Rushbrock, AJ, and Jury, KE (1975) The effect of plating technique and incubation temperature on bacterial counts. J. Food Technol., 10, 273–279. Pearson, ES and Hartley, HO (1976) Biometrika tables for statisticians, 3rd edition. University Press, Cambridge. Reed, RW and Reed, GB (1949) “Drop plate” method for counting viable bacteria. Can. J. Res., E 26, 317–326. Sharpe, AN and Kilsby, DC (1971) A rapid inexpensive bacterial count technique using agar droplets. J. Appl. Bacteriol., 34, 435–440. Snyder, TL (1947) The relative errors of bacteriological plate counting methods. J. Bacteriol., 54, 641–654. Spencer, R (1970) Variability in colony counts of food poisoning clostridia. Res. Report No. 151. Leatherhead Food Research Association, Leatherhead. Wilson, GS (1922) The proportion of viable bacteria in young cultures with especial reference to the technique employed in counting. J. Bacteriol., 7, 405–446. Ziegler, NR and Halvorson, HO (1935) Application of statistics to problems in bacteriology. IV. Experimental comparison of the dilution method, the plate count and the direct count for the determination of bacterial populations. J. Bacteriol., 29, 609–634.
8 ERRORS ASSOCIATED WITH QUANTAL RESPONSE METHODS
A quantal test is one that gives an 'all or nothing' response, for example growth (+) or no growth (−) in a suitable culture medium. Microbiological quantal procedures are those used to detect the 'presence or absence' of a specific target organism (e.g. a pathogen such as Salmonella, Listeria or Escherichia coli in foods), or of specific spoilage organisms (e.g. Alicyclobacillus spp. in fruit concentrates). Procedures include the simple detection of microbial growth, following inoculation into a suitable culture medium, by the development of turbidity or by a biochemical/biophysical change in the medium (e.g. production of acid and gas in MacConkey broth), or a change in an electrical signal in an impedance culture system. In other cases, inoculation of pre-enrichment media followed by selective enrichment, diagnostic plating and identification of organisms may be required. Modern developments in quantal methodology include the use of immuno-magnetic beads (Patel, 1994; Shaw et al., 1998) to separate target organisms from a mixed flora following the pre-enrichment stage, and the use of genetic techniques such as the polymerase chain reaction (PCR) for detection of DNA from selected target species (see, for instance, ISO 2005a, b, 2006a, b). No matter which approach is used, all such procedures are based on the concept of quantal response.

DILUTION SERIES AND ITS MOST PROBABLE NUMBER COUNTS

In its simplest form, the dilution count method consists of serial dilution of the initial food sample homogenate followed by inoculation of an appropriate volume of each dilution into one or more tubes of culture medium (agar or broth). The number of tests that show evidence of growth is recorded after incubation. Cultures showing no evidence of growth are assumed not to have received a viable inoculum, that is, to have received <1 viable cell or cell aggregate – an assumption which may not always be correct. Similarly, any culture showing growth is assumed to have received at least one (≥1) viable cell or cell aggregate. When a culture of microorganisms is inoculated at each of several dilution levels, a graded response would be expected, with more tests positive at the lowest dilution level and none, or only a few, positive at the highest dilution tested. This is illustrated in Table 8.1 for dilutions from 10⁻³ to 10⁻⁸ of a culture initially containing 2 × 10⁵ viable cells/ml, with 1, 5 or 10 replicate tests at each dilution level.
TABLE 8.1 Expected Proportional Responses in a Multiple-Tube Dilution Series

                                 Expected proportion of tests          Expected number of positives for
Dilution    Mean inoculum                                              number of tests inoculated
tested      per test^a (m)       Positive (P(x>0))   Negative (P(x=0))    1       5       10
10⁻³        200                  0.999               0.001                1       5       10
10⁻⁴        20                   0.999               0.001                1       5       10
10⁻⁵        2                    0.865               0.135                1       4       9
10⁻⁶        0.2                  0.181               0.819                0       1       2
10⁻⁷        0.02                 0.020               0.980                0       0       0
10⁻⁸        0.002                0.002               0.998                0       0       0

Initial inoculum contains 2 × 10⁵ viable cells/ml; dilution factor = 10.
a Assuming 1 ml of sample dilution per test.
Statistically, such quantal responses are described by the binomial distribution, where the proportion of positive reactions is given by p and that of negative reactions by q, with q = 1 − p (see Chapter 3). In practice, the binomial distribution proves very unwieldy if many tests are used and the Poisson distribution provides a simpler approach. If it is assumed that each viable cell is capable of producing a growth response, then the probability that a culture will receive no inoculum (P(x=0)), after inoculation with a suitable volume of a dilution containing a mean inoculum level of m cells, is given by the first term of the Poisson expansion (see Table 3.2), that is:

P(x=0) = e^(−m) = q

Similarly, the probability of a positive (i.e. growth) response is given by:

P(x>0) = 1 − P(x=0) = 1 − e^(−m) = p

A plot of the expected dose–response curve, based on the Poisson distribution, is shown in Fig. 8.1. Eisenhart and Wilson (1943) showed that, in practice, the actual dose–response curve may be less steep, and Meynell and Meynell (1965) suggest that this could arise both through systematic technical error and as a result of contagious distribution, which is a consequence of cell clumping, sublethal damage, etc.
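The expected responses in Table 8.1 follow directly from the Poisson terms above. The short Python sketch below (illustrative only; the variable names are mine) reproduces the tabulated proportions and the expected numbers of positive tests.

import math

initial_density = 2e5            # viable cells/ml in the undiluted culture
dilutions = [3, 4, 5, 6, 7, 8]   # 10^-3 ... 10^-8, with 1 ml inoculated per test

for d in dilutions:
    m = initial_density * 10**(-d)           # mean inoculum per test
    p_negative = math.exp(-m)                # P(x = 0) = e^-m = q
    p_positive = 1 - p_negative              # P(x > 0) = 1 - e^-m = p
    expected = [round(n * p_positive) for n in (1, 5, 10)]
    print(f"10^-{d}: m={m:g}  p={p_positive:.3f}  q={p_negative:.3f}  "
          f"expected positives (1, 5, 10 tests) = {expected}")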
Single-Tube Dilution Tests

If only one test is done at a given dilution level, only two responses are possible, that is, growth (positive) or no growth (negative). When such single tests are done at several dilution levels, it is not uncommon for 'skips' to occur, for example:

Dilution     10⁻¹    10⁻²    10⁻³    10⁻⁴    10⁻⁵
Response     +       +       +       −       +
FIGURE 8.1 Dose–response curve: probability of growth (i.e. a positive response, P(x>0)) at various mean inoculum levels (m).
The probability of a negative response (i.e. fewer than one viable organism/test) depends on the level of organisms in the original sample. For instance, if the above series were derived from a sample containing, say, 5 × 10³ viable cells/ml then, assuming a 1 ml inoculum/test, the probabilities of occurrence of the observed negative result at the 10⁻⁴ dilution and of the positive result at the 10⁻⁵ dilution would be 0.62 and 0.05, respectively. Hence, the result obtained on the 10⁻⁴ dilution might be expected to occur on about 6 out of 10 occasions, whereas the result on the 10⁻⁵ dilution would be expected to occur on about 1 occasion in 20. Neither result would be improbable, but the result on the 10⁻⁵ dilution would be less probable than that on the 10⁻⁴ dilution. However, if the original sample contained only 1 × 10³ viable cells/ml, the frequencies of the observed results in the 10⁻⁴ and 10⁻⁵ dilutions would be 0.90 and 0.01, respectively. Hence, the result at the 10⁻⁴ dilution would be highly probable, since it might be expected to occur in 9 out of 10 tests, but the result at the 10⁻⁵ dilution would be highly improbable, since it might reasonably be expected to occur in fewer than 1 in 100 tests.

The probabilities of such observations are listed for various inoculum levels (m) in Table 8.2, from which it can be seen that a 'skip' could occur by chance at least once in every 20 tests for mean inoculum levels between 0.5 and 3 viable cells/test at the 10⁻ᶻ dilution. When such skips occur, the analyst needs to decide whether the observed result could have been expected to occur merely by chance (i.e. P ≥ 0.05) or whether the probability of its occurrence is so remote that the result should be ignored. Only an approximate indication of the initial contamination level can be derived from a single-test assay, as illustrated in Example 8.1, and the occurrence of a 'skip', which should always be recorded, can affect the confidence that can be placed in the result. It is doubtful whether it is generally worthwhile to derive confidence limits for probable contamination levels determined by single-tube dilution tests. It should be noted that the calculation of the mean and confidence limits assumes a Poisson distribution (which may not be correct) and ignores any effects of dilution and other technical errors. In single-tube, as in multiple-tube, series the confidence limits for the results can be improved by reducing the dilution interval (i.e. a 1 in 2 dilution is better than a 1 in 4, which is better than a 1 in 10), since the response curve obtained would be smoother with the lower dilution factor.
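These probabilities are simple to compute. The following Python sketch (an illustration; the function name is mine) gives the probability of a negative result at the dilution carrying a mean inoculum m and of a positive result at the next 10-fold dilution, reproducing the values quoted above.

import math

def skip_probabilities(mean_inoculum_at_z):
    """P(negative) at dilution 10^-z and P(positive) at the next (10-fold
    higher) dilution, assuming a Poisson distribution of organisms."""
    m = mean_inoculum_at_z
    p_negative_here = math.exp(-m)            # P(x = 0) at 10^-z
    p_positive_next = 1 - math.exp(-m / 10)   # P(x > 0) at 10^-(z+1)
    return p_negative_here, p_positive_next

# Sample containing 5 x 10^3 cells/ml: 1 ml of the 10^-4 dilution carries
# m = 0.5 cells on average.
print([round(p, 2) for p in skip_probabilities(0.5)])   # -> [0.61, 0.05]

# Sample containing 1 x 10^3 cells/ml (m = 0.1 at the 10^-4 dilution).
print([round(p, 2) for p in skip_probabilities(0.1)])   # -> [0.9, 0.01]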
TABLE 8.2 Probability of Occurrence of 'Skips' in a Single-Tube Dilution Series with Various Inoculum Levels

Mean inoculum level      Probability* of a negative result       Probability* of a positive result
in dilution 10⁻ᶻ (m)     (P(x=0)) at dilution 10⁻ᶻ               (P(x>0)) at dilution 10^−(z+1)
0.01                     0.99                                    0.01
0.05                     0.95                                    0.01
0.1                      0.90                                    0.01
0.2                      0.82                                    0.02
0.3                      0.74                                    0.03
0.4                      0.67                                    0.04
0.5                      0.61 (ca. 1 in 2)                       0.05 (1 in 20)**
0.6                      0.55                                    0.06
0.7                      0.50                                    0.07
0.8                      0.45                                    0.08
0.9                      0.41                                    0.09
1.0                      0.37                                    0.10
2.0                      0.14                                    0.18
3.0                      0.05 (1 in 20)**                        0.26 (1 in 4)
4.0                      0.02                                    0.33
5.0                      0.01                                    0.40
30.0                     0.01                                    0.95

* To 2 significant places.
** 95% CL at which both events might occur purely by chance.
10⁻ᶻ = dilution level (e.g. 10⁻¹); 10^−(z+1) = next dilution level (e.g. 10⁻²).
EXAMPLE 8.1 CALCULATION OF INITIAL CONTAMINATION LEVEL FROM A SINGLE-TEST DILUTION SERIES

One ml of inoculum, from each of several 10-fold serial dilutions, was inoculated into tubes of culture medium and, after incubation, the following results were obtained:

                      Result at dilution
             10⁻²    10⁻³    10⁻⁴    10⁻⁵    10⁻⁶
Sample A     +       +       −       −       −
Sample B     +       +       −       +       −
Sample A: The results indicate that the initial contamination level lies between 10³ and 10⁴ organisms/ml of original sample. From Table 8.2, the 5% probability that a positive result (P(x≥1) = 1 − P(x=0)) will occur at dilution 10⁻ᶻ
(i.e. 10⁻³ in this example) is associated with a mean inoculum level of not more than 0.5 organisms/unit volume of that dilution. The 95% probability of a negative result (P(x=0)) at the 10^−(z+1) dilution would be associated with a mean inoculum level of not more than 3 organisms/unit volume of dilution 10^−(z+1). Hence, the apparent contamination level lies between 1000 and 10,000 organisms per unit volume of the original sample, but the 95% confidence limits range from 0.5 × 10³ to 3 × 10⁴ (i.e. from 500 to 30,000) organisms/ml of original sample.

Sample B: The apparent contamination level could be either 10³–10⁴ or 10⁵–10⁶ organisms (i.e. between 10³ and 10⁶ organisms/unit of original sample). Applying the same approach as for Sample A, the 95% limits for the count of 10³–10⁴ are 500–30,000 organisms. If the true level of organisms is, say, 5 × 10³ organisms/ml, then the likelihood that a negative result will be seen at the 10⁻⁴ dilution is 0.61, and the likelihood of a positive result at the 10⁻⁵ dilution is 0.05, so this observed result would be expected to occur at least once in every 20 tests. These limits are shown (**) in Table 8.2. Similarly, the limits for the results on the 10⁻⁵–10⁻⁶ dilutions are 50,000–3,000,000 organisms/ml. But with a true level of 5 × 10³ organisms/ml, the probability of a positive result at the 10⁻⁵ dilution is only 0.01, although the probability of a negative result on the 10⁻⁶ dilution is 0.99. Hence, although the result on the 10⁻⁵ dilution should be recorded, it is highly improbable and should not be used in a calculation of cell numbers.
Multiple Tests at a Single Dilution Level

If more than one test is inoculated at a given dilution level, the growth response will range from all positive (at a high inoculum level) to all negative (with a very low inoculum), with a graded response in between, in which some tubes remain sterile whilst others exhibit growth (see the last two columns of Table 8.1). For a single dilution level, the probability that no viable organisms will occur in a test receiving a volume V, when the true density is m cells/unit volume, is given by P(x=0) = e^(−Vm). If n portions, each of volume V, are tested and, of these, S samples are sterile, then the proportion of sterile tests is S/n and the estimated probability is P̂ = S/n = e^(−Vm). This can be rearranged to give an estimate (m̂) of the mean number of organisms per unit volume:

m̂ = −(1/V) ln(S/n) = −(2.303/V) log(S/n)

where ln and log are logarithms to bases e and 10, respectively. The value m̂ is the probable number of organisms/ml.
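A minimal Python sketch of this estimate (the function name is mine) is given below; it reproduces the values quoted for Fig. 8.2.

import math

def estimate_density(sterile, total, volume_ml):
    """Estimate of the mean density (organisms/ml) from the proportion of
    sterile tests at a single dilution level: m = -(1/V) ln(S/n)."""
    if sterile == 0 or sterile == total:
        raise ValueError("all-positive or all-negative results give no finite estimate")
    return -math.log(sterile / total) / volume_ml

# 10 replicate tests of 1 ml each: 6 sterile -> ~0.51 organisms/ml,
# 4 sterile -> ~0.92 organisms/ml (the maximum likelihood values for Fig. 8.2).
print(round(estimate_density(6, 10, 1.0), 2))   # -> 0.51
print(round(estimate_density(4, 10, 1.0), 2))   # -> 0.92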
FIGURE 8.2 Probability distribution curves for two multiple-tube dilution tests at a single dilution level, with n = 10 replicates and 6 or 4 sterile tests, derived using the binomial expression [n!/(S!(n − S)!)]·(e^(−Vm))^S·(1 − e^(−Vm))^(n−S). The maximum likelihood values are ca. 0.5 organisms/ml for 6 sterile tests and ca. 0.9 organisms/ml for 4 sterile tests (x-axis: mean number of organisms/ml; y-axis: probability, %).
At a probability of P₀ that a sample is 'sterile', the probability that S of n samples are sterile is given by the binomial expansion of [n!/(S!(n − S)!)]·P₀^S·(1 − P₀)^(n−S) but, since P₀ = e^(−Vm), this expression may be rewritten as [n!/(S!(n − S)!)]·(e^(−Vm))^S·(1 − e^(−Vm))^(n−S) (Cochran, 1950).

The terms derived from this expansion permit the construction of a curve showing the probability (P) of occurrence against the density of organisms (m̂). The value of m̂ is the Poisson estimate of the mean that corresponds to the highest probability value and provides an estimate of the most probable number (MPN). Figure 8.2 presents curves for two values of S/n, assuming V = 1 ml. If S/n = 0.6, then the MPN of organisms inoculated is m̂ = 0.51 cells/ml, but if S/n = 0.4, the MPN (m̂) = 0.92 cells/ml. These values are the same as those that would be derived using the equation m̂ = −(1/V) ln(S/n). However, from the shapes of the curves it is clear that many other densities of organisms could also give the observed ratios of S/n (i.e. 0.4 and 0.6), though at a lower probability.

MULTIPLE TEST DILUTION SERIES

Multiple Tests at Several Dilution Levels

The precision of multiple dilution tests is very poor when the number of organisms inoculated is likely to give either all positive or all negative results. If all replicate tests are positive,
the estimated density is infinity and, if all results are negative, the estimated density is zero at maximum probability. Thus, a test using replicates at a single dilution is of value only if the number of organisms inoculated (mV) is chosen to give a graded response, with some tests positive and others negative.

When more than one dilution level is used, an estimate of population density (m̂) can be derived for each dilution level. If a series of multiple tests (nᵢ) is set up at several serial dilutions (i), each with an inoculum volume Vᵢ, and Sᵢ sterile tests occur, then the estimated density of organisms for the ith dilution is given by:

m̂ᵢ = −(2.303/Vᵢ) log(Sᵢ/nᵢ)

However, the best estimate at each dilution will vary in precision, so it is wrong to take the arithmetic mean of the estimates for each dilution level in order to derive an overall estimate. Two primary methods exist for deriving the best estimate of numbers.

Stevens' method for multiple dilution levels. Fisher and Yates (1954) published tables based on the mean fertile level (x) and the mean sterile level (y), where x = the number of positive cultures at all dilution levels divided by the number of cultures (n) at each level, and y = the number of levels (i) minus x. Using the same notation as before (i.e. Sᵢ is the number of negative results out of nᵢ tests at the ith dilution level):

x = {(n₁ − S₁) + (n₂ − S₂) + … + (nᵢ − Sᵢ)}/n

and

y = i − x

Using Table 8.3 (reproduced from Fisher and Yates, 1963), we determine the value of a factor K for values of x or y according to the number of dilution levels (i) and the dilution ratio (a). The estimated number of organisms (λ = m̂) is then given by:

log λ = x(log a) − K

For 10-fold dilutions this simplifies to log λ = x − K. The variance of the estimate of log λ is given by s² = (1/n)(log 2)(log a); for 10-fold dilutions, s²(log λ) = (1/n)(log 2)(log 10) = 0.3010/n. The method is applicable to 2-, 4- and 10-fold dilution ratios with not less than 4, 4 or 3 dilution levels, respectively. Unfortunately, the method does not discriminate in any way between valid and invalid series of results (see below).
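As a worked sketch of Stevens' method (in Python; the function name is mine and the K value must still be read from Table 8.3), the combination 0-0-10 from a 10-tube, 3-level, 10-fold series discussed later in this chapter gives λ ≈ 1.73:

import math

def stevens_estimate(positives, n_per_level, dilution_ratio, k_factor):
    """Stevens' method: x = mean fertile level; log10(lambda) = x*log10(a) - K.
    'positives' lists the number of positive tubes at each dilution level;
    K is taken from Fisher and Yates' table (Table 8.3) for the value of x."""
    x = sum(positives) / n_per_level
    log_lambda = x * math.log10(dilution_ratio) - k_factor
    variance = math.log10(2) * math.log10(dilution_ratio) / n_per_level
    return 10**log_lambda, variance

# 0, 0, 10 positives from 10 tubes/level, 10-fold ratio: x = 1.0, K = 0.763.
density, var_log = stevens_estimate([0, 0, 10], 10, 10, 0.763)
print(round(density, 2), round(var_log, 4))   # -> 1.73 organisms/ml, 0.0301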
TABLE 8.3 Determination of K Values for the Estimation of the Density of Organisms by the Dilution Method Value of K for 10-fold dilutions a at 3 or more levels
4-fold dilutions at levels Value x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.5 2.0 2.5
4
5
6
0.704
0.706
0.707
0.615
0.617
0.618
0.573
0.576
0.577
0.555 0.545 0.537
0.558 0.551 0.548 0.545
0.559 0.553 0.551 0.552
x1
x1
0.761 0.740 0.733 0.736 0.744 0.753 0.763
0.763 0.768 0.768 0.760 0.747 0.736 0.733 0.736 0.744 0.753 0.763
Source: Reproduced with permission of Pearson Education Ltd from Fisher and Yates (1963, table VIII.2), Statistical Tables for Biological, Agricultural and Medical Research, published by Oliver & Boyd, Edinburgh.
a When x > 1, enter the table with the decimal part of x only.
The MPN method for multiple dilution levels is an extension of the concept used above for multiple tubes at a single level. Suppose that for i dilution levels the proportions of sterile (Sᵢ) cultures are given by S₁/n₁, S₂/n₂, S₃/n₃, …, Sᵢ/nᵢ. Then the probability that these events should all happen at once can be derived from the product of the probabilities that each individual event would occur. As before, a graph of probability against m shows a single maximum likelihood value that is referred to as the MPN (m̂). As pointed out by Cochran (1950), the value of m̂ cannot be written down explicitly, but it can be derived from the equation:

S₁v₁ + S₂v₂ + … + Sᵢvᵢ = (n₁ − S₁)v₁e^(−v₁m)/(1 − e^(−v₁m)) + (n₂ − S₂)v₂e^(−v₂m)/(1 − e^(−v₂m)) + … + (nᵢ − Sᵢ)vᵢe^(−vᵢm)/(1 − e^(−vᵢm))

where vᵢ = the volume of culture inoculated and Sᵢ = the number of sterile tubes out of nᵢ tests prepared at dilution i, and m is the MPN. Methods for solving this equation by iteration have been given by Halvorson and Ziegler (1933), Finney (1947) and Hurley and Roscoe (1983).
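The equation can be solved numerically. The following Python sketch (a simple bisection on a log scale; an illustration, not a validated MPN routine) returns the maximum likelihood MPN for any combination of volumes, tube numbers and positive results; the example reproduces the value tabulated for the combination 3-1-0 in Table 8.6.

import math

def mpn(volumes, n_tests, n_positive, lo=1e-6, hi=1e6, iterations=200):
    """Most probable number per unit volume, from Cochran's maximum
    likelihood equation, solved by bisection (sketch only)."""
    def f(m):
        # f(m) = sum[(n_i - S_i) v_i e^(-v_i m)/(1 - e^(-v_i m))] - sum(S_i v_i)
        total = 0.0
        for v, n, g in zip(volumes, n_tests, n_positive):
            s = n - g                                   # sterile tubes
            total += g * v * math.exp(-v * m) / (1 - math.exp(-v * m))
            total -= s * v
        return total
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)                        # bisect on a log scale
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Three tubes each of 1, 0.1 and 0.01 g, with 3, 1 and 0 positive tubes:
# De Man's table (Table 8.6) lists this combination (3-1-0) as MPN = 4.3/g.
print(round(mpn([1.0, 0.1, 0.01], [3, 3, 3], [3, 1, 0]), 1))   # -> 4.3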
De Man (1975, 1983) derived composite probability distribution curves for various combinations of results, dilution levels and culture numbers, using computer techniques. Hurley and Roscoe (1983) provide a computer programme, written in BASIC, to calculate MPN values and the standard error of the log₁₀ MPN. For routine purposes, standard tables can be used to derive the MPN of organisms in a sample. Some MPN tables do not provide data on the confidence limits that can be applied to the MPN values and, in some instances, include many unacceptable and improbable results (i.e. nonsense combinations; De Man, 1975). However, reliable MPN tables based on De Man (1975) are given by, for instance, ICMSF (1978), Blodgett (2003), ISO (2005c) and AOAC (2006).

Woodward (1957), De Man (1975) and Hurley and Roscoe (1983) have all pointed out that the occurrence of improbable results indicates a malfunction in the performance of the MPN test. As an extreme example, De Man (1975) cites the case of a result of 0, 0, 10 positives from a 10-tube, 3-dilution series (10⁰–10⁻²). The MPN for such a result can be calculated as 0.9 organisms/ml, with 95% confidence limits of 0.5–1.7, but the probability of obtaining such a result is about 10⁻²⁴ for a mean inoculum level of 0.9 organisms/ml. Furthermore, it should be noted that the method of Stevens would give the same estimate of the number of organisms (λ = 1.73; 95% CL 0.78–3.85) for this combination (0-0-10) as for the combinations 0-10-0 and 10-0-0, even though only the last example would be acceptable, giving an MPN of 2.3 organisms/ml with 95% CL of 1.2–5.8 organisms/ml (De Man, 1975).

In order to assess whether an improbable result has been obtained, the χ² test can be used to assess the goodness-of-fit of observed and expected results. Moran (1954a, b, 1958) has proposed a more severe test, but it is applicable only when the extreme dilutions give approximately 0% and 100% positive results. Moran's test is performed by deriving a value T_obs, given by:

T_obs = Σ f_m(n − f_m), summed over the dilutions m = 1 … z

where f_m = the number of positive results at each dilution (m = 1 … z) and n is the total number of tests at each dilution. The expected value (T_exp) and the standard error (SE_T) can then be obtained from tables (Table 8.4); a value M is derived for the significance of the difference between T_obs and T_exp, that is, M = (T_obs − T_exp)/SE_T. The upper 95% and 99% limit values of M are 1.645 (P = 0.05) and 2.326 (P = 0.01), respectively. The use of this method is illustrated in Example 8.2.

Cochran (1950) pointed out that, in planning an MPN test, it is necessary to decide on: (1) the range of volumes to be tested; (2) the dilution factor to be used; and (3) the number of tests to be inoculated at each dilution level. In planning such a test, the aim is to obtain equal relative precision across a number of possible levels of cell density, that is, the ratio of the standard error to the true density should be the same at all densities. When the density of organisms is expected to be reasonably constant (e.g. in analysis of a water supply of known quality) it is possible to select inoculum levels such that the lowest dilution volume (i.e. the largest volume of inoculum) should contain at least one viable organism and that the
TABLE 8.4 Expected Values of T (T_exp) and Standard Error of T (SE_T) for Moran's Test

        T_exp at dilution ratio              SE_T at dilution ratio
n^a     2        4        10                 2         4         10
5       20       10       6.02               5.69      4.02      3.12
6       30       15       9.03               7.63      5.40      4.19
7       42       21       12.64              9.75      6.89      5.35
8       56       28       16.86              12.02     8.50      6.60
9       72       36       21.67              14.46     10.22     7.93
10      90       45       27.09              17.03     12.04     9.34
11      110      55       33.11              19.74     13.96     10.83
12      132      66       39.74              22.58     15.97     12.39
13      156      78       46.96              25.55     18.07     14.02
14      182      91       54.79              28.63     20.24     15.71
15      210      105      63.22              31.83     22.51     17.74
16      240      120      72.25              35.14     24.85     19.28
17      272      136      81.88              38.56     27.27     21.16
18      306      153      92.12              42.07     29.75     23.08
19      342      171      102.95             45.70     32.31     25.08
20      380      190      114.39             49.41     34.94     27.11
30      870      435      261.90             91.53     64.72     50.22
40      1560     780      469.61             141.49    100.05    77.63

Source: Modified from Meynell and Meynell (1970).
a n = number of tests inoculated at each dilution.
highest dilution (i.e. the smallest volume) should contain not more than two viable organisms. However, when the upper density of organisms nears the limit of the test for the chosen dilutions, the probability of obtaining all positive cultures is greater for a small number of replicates at each dilution level than for a large number of replicates. Hence, it is safer to use the rule that the maximum number of organisms should not exceed 1 per volume of the highest dilution. This can be done by estimating the upper and lower limits (m̂_H and m̂_L) between which the true density would be expected to lie. Then the largest volume (V_H, i.e. the volume at the lowest dilution level) and the smallest volume (V_L, i.e. the volume at the highest dilution level) can be derived from V_H = 1/m̂_L and V_L = 1/m̂_H. For example, if the probable true density (m) is expected to lie between 10 and 800 organisms/ml, the largest volume to be inoculated per test should not exceed 1/10 ml (1 ml of a 10⁻¹ dilution) and the smallest volume should not exceed 1/800 ml (1.25 ml of a 10⁻³ dilution). In practice, this range could be covered by testing 1 ml of each of three 10-fold dilutions (i.e. 1/10, 1/100 and 1/1000) or by testing 1 ml of, say, one 10-fold and four 4-fold dilutions (i.e. 1/10, 1/40, 1/160, 1/640, 1/2560).
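A trivial Python sketch of this rule (the function name is mine) reproduces the example above:

def inoculum_volumes(expected_min_density, expected_max_density):
    """Largest and smallest volumes to inoculate per test, using the rule
    V_H = 1/m_L and V_L = 1/m_H, for the expected range of true density."""
    v_high = 1.0 / expected_min_density   # largest volume (lowest dilution)
    v_low = 1.0 / expected_max_density    # smallest volume (highest dilution)
    return v_high, v_low

# Expected density between 10 and 800 organisms/ml, as in the text:
print(inoculum_volumes(10, 800))   # -> (0.1, 0.00125) ml, i.e. 1/10 and 1/800 ml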
EXAMPLE 8.2 USE OF MORAN'S TEST FOR THE VALIDITY OF MULTIPLE-TUBE DILUTION COUNTS

Assume the following results were obtained in a multiple-tube dilution assay on two samples, using 10 tubes at each dilution level and a 10-fold dilution ratio:

                 Number of tests positive at dilution
             10⁻¹    10⁻²    10⁻³    10⁻⁴    10⁻⁵
Sample A     10      10      8       4       0
Sample B     10      9       7       5       0

For Sample A:

T_obs = 10(10 − 10) + 10(10 − 10) + 8(10 − 8) + 4(10 − 4) + 0(10 − 0) = 0 + 0 + 16 + 24 + 0 = 40

Entering Table 8.4 for n = 10 and a 10-fold dilution factor, T_exp = 27.09 and SE_T = 9.34, so:

M_A = (40 − 27.09)/9.34 = 1.38

Since M_A < 1.645, the observed results for Sample A do not differ significantly (P > 0.05) from those expected and the series is valid for an MPN calculation.

For Sample B:

T_obs = 10(10 − 10) + 9(10 − 9) + 7(10 − 7) + 5(10 − 5) + 0(10 − 0) = 0 + 9 + 21 + 25 + 0 = 55

Again, from Table 8.4 for 10 tests at 10-fold dilutions, T_exp = 27.09 and SE_T = 9.34, so:

M_B = (55 − 27.09)/9.34 = 2.99

Since M_B > 2.326, the observed number of positive tubes differs significantly (P < 0.01) from the expected number; the results for Sample B are therefore improbable and should not be used to calculate an MPN.
FIGURE 8.3 Changes in the standard error (as % of the MPN) of MPN values for dilution ratios 2 and 10, plotted against the true density (organisms per ml) (reproduced from Cochran, 1950, by permission of the International Biometric Society).
The choice of dilution ratio affects the test precision significantly over the range of dilution ratios between two and ten only if the total number of tests is not kept constant. However, the precision is more constant throughout the range of densities tested (i.e. 1/V_H to 1/V_L) if the lower dilution ratio is used. This is illustrated in Fig. 8.3 for dilution ratios of two and ten. The standard error of the MPN can be derived (Cochran, 1950) from the following equation (assuming a cell density between 1/V_H and 1/V_L, a defined number of tests at each dilution and a dilution ratio of five or less):

SE(log m̂) = 0.55 √((log a)/n)

where a = the dilution ratio and n = the number of tests at each dilution. For 10-fold dilutions, where the standard error may peak at the value for m̂, a more conservative estimate is preferred:

SE(log m̂) = 0.58 √((log a)/n) = 0.58 √(1/n)

Values for various levels of a and n, together with factors for deriving standard errors and 95% confidence limits, are given in Table 8.5 (see also the paper by Hurley and Roscoe, 1983). The standard error increases (precision decreases) as the likelihood of all positive or all negative results increases; the precision is greatest when approximately equal numbers of positive and negative results are most likely to occur.
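The following Python sketch (function names mine) computes the standard error and the corresponding 95% confidence-limit factor; a factor of 10^(2 × SE) reproduces the values tabulated in Table 8.5.

import math

def se_log_mpn(dilution_ratio, n_per_level):
    """Cochran's approximate standard error of log10(MPN): 0.55*sqrt(log10(a)/n)
    for dilution ratios of five or less, 0.58*sqrt(1/n) for 10-fold series."""
    coeff = 0.58 if dilution_ratio >= 10 else 0.55
    return coeff * math.sqrt(math.log10(dilution_ratio) / n_per_level)

def cl_factor(dilution_ratio, n_per_level):
    """Factor by which the MPN is multiplied and divided for approximate 95% CL,
    assuming log10(MPN) is normally distributed."""
    return 10 ** (2 * se_log_mpn(dilution_ratio, n_per_level))

# Five tubes per level, 10-fold dilutions: SE ~ 0.259 and factor ~ 3.30 (Table 8.5),
# so an MPN of 22/ml has approximate 95% CL of 22/3.30 to 22 x 3.30 (6.7-72.5).
print(round(se_log_mpn(10, 5), 3), round(cl_factor(10, 5), 2))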
TABLE 8.5 Standard Error of log m̂ and Factor for CL

Number of samples     SE(log₁₀ m̂) for dilution ratio (a)      Factor^a for 95% CL with dilution ratio (a)
per dilution (n)      2        4        5        10            2        4        5        10
1                     0.301    0.427    0.460    0.580         4.00     7.14     8.32     14.45
2                     0.213    0.302    0.325    0.410         2.67     4.00     4.47     6.61
3                     0.174    0.246    0.265    0.335         2.23     3.10     3.39     4.68
4                     0.150    0.214    0.230    0.290         2.00     2.68     2.88     3.80
5                     0.135    0.191    0.206    0.259         1.86     2.41     2.58     3.30
6                     0.123    0.174    0.188    0.237         1.76     2.23     2.38     2.98
7                     0.114    0.161    0.174    0.219         1.69     2.10     2.23     2.74
8                     0.107    0.151    0.163    0.205         1.64     2.00     2.12     2.57
9                     0.100    0.142    0.153    0.193         1.58     1.92     2.02     2.43
10                    0.095    0.135    0.145    0.183         1.55     1.86     1.95     2.32

Source: Reproduced from Cochran (1950), by permission of the International Biometric Society.
a In deriving the 95% CL, the MPN is multiplied and divided by the appropriate factor. For example, if the MPN for a 5-test dilution series with a 10-fold dilution ratio (a) is 22 organisms/ml, the 95% CL are given by 22/3.30 and 22 × 3.30, that is 6.7 to 72.5.
The optimum mean inoculum (about 1.6 organisms/culture) is more likely to be missed by chance in a 10-fold than in a 2-fold dilution series; hence the graph for 10-fold dilutions fluctuates more than that for 2-fold dilutions (Fig. 8.3). These CLs assume that the logarithm of the MPN is normally distributed and differ from those cited by De Man (1975), who derived CLs from computer-printed histograms of the distribution functions for different combinations of test numbers and positive results. However, De Man's tables (Tables 8.6 and 8.7) cover only 10-fold dilution ratios; the data in Table 8.5 provide a way of assigning confidence limits to MPNs derived from other dilution series, as does the computer programme given by Hurley and Roscoe (1983).

Differences Between MPN Values

From time to time it may be necessary to establish the significance of a difference between two MPN estimates. This can be determined using the combined standard errors of the two MPNs and standard statistical tables of Student's 't' test:

t = (log m̂₁ − log m̂₂) / [0.55 √((log a₁)/n₁ + (log a₂)/n₂)]
TABLE 8.6 MPN Table for 3 1, 3 0.1 and 3 0.01 g/ml Inoculum Confidence limits 95% Number of positive results 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
0 0 1 1 2 3 0 0 0 1 1 2 2 3 0 0 0 1 1 1 2 2 2 3 3 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
0 1 0 1 0 0 0 1 2 0 1 0 1 0 0 1 2 0 1 2 0 1 2 0 1 0 1 2 0 1 2 3 0 1 2 3 0 1 2 3
MPN
Categorya
0.30 0.30 0.30 0.61 0.62 0.94 0.36 0.72 1.1 0.74 1.1 1.1 1.5 1.6 0.92 1.4 2.0 1.5 2.0 2.7 2.1 2.8 3.5 2.9 3.6 2.3 3.8 6.4 4.3 7.5 12 16 9.3 15 21 29 24 46 110 110
– 2 1 3 2 0 1 1 0 1 2 1 3 3 1 1 3 1 1 3 1 2 0 2 3 1 1 2 1 1 2 0 1 1 1 3 1 1 1 –
Lower 0.00 0.01 0.01 0.12 0.12 0.35 0.02 0.12 0.4 0.13 0.4 0.4 0.5 0.5 0.15 0.4 0.5 0.4 0.5 0.9 0.5 0.9 0.9 0.9 0.9 0.5 0.9 1.6 0.9 1.7 3 3 1.8 3 3 9 4 9 20
99% Upper
0.94 0.95 1.00 1.70 1.70 3.50 1.70 1.70 3.5 2.00 3.5 3.5 3.8 3.8 3.50 3.5 3.8 3.8 3.8 9.4 4.0 9.4 9.4 9.4 9.4 9.4 10.4 18.1 18.1 19.9 36 38 36.0 38 40 99 99 198 400
Lower
Upper
0.00 0.00 0.00 0.05 0.05 0.18 0.01 0.05 0.2 0.06 0.2 0.2 0.2 0.2 0.07 0.2 0.2 0.2 0.2 0.5 0.2 0.5 0.5 0.5 0.5 0.3 0.5 1.0 0.5 1.1 2 2 1.2 2 2 5 3 5 10
1.40 1.40 1.60 2.50 2.50 4.60 2.50 2.50 4.6 2.70 4.6 4.6 5.2 5.2 4.60 4.6 5.2 5.2 5.2 14.2 5.6 14.2 14.2 14.2 14.2 14.2 15.7 25.0 25.0 27.0 44 52 43.0 52 56 152 152 283 570
Source: Reproduced from De Man (1983), by kind permission of the author and Springer Science and Business Media. a
For explanation of the category system, see the footnote to Table 8.7.
TABLE 8.7 MPN Table for 5 1, 5 0.1 and 5 0.01 g/ml Inoculum Confidence limits 95% Number of positive results 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3
0 0 1 1 2 2 3 0 0 0 1 1 1 2 2 3 3 4 0 0 0 1 1 1 2 2 2 3 3 4 0 0 0 1 1 1 2 2 2 3 3 3
0 1 0 1 0 1 0 0 1 2 0 1 2 0 1 0 1 0 0 1 2 0 1 2 0 1 2 0 1 0 0 1 2 0 1 2 0 1 2 0 1 2
MPN organisms /g (ml) 0.18 0.18 0.18 0.36 0.37 0.55 0.56 0.20 0.40 0.60 0.40 0.61 0.81 0.61 0.82 0.83 1.0 1.1 0.45 0.68 0.91 0.68 0.92 1.2 0.93 1.2 1.4 1.2 1.4 1.5 0.78 1.1 1.3 1.1 1.4 1.7 1.4 1.7 2.0 1.7 2.1 2.4
99%
Categorya
Lower
Upper
Lower
Upper
– 1 1 2 2 3 3 1 1 3 1 2 0 1 3 3 0 0 1 1 3 1 1 3 1 2 0 2 3 3 1 1 2 1 1 3 1 1 3 1 2 3
0.00 0.00 0.01 0.07 0.07 0.17 0.17 0.02 0.07 0.17 0.07 0.17 0.33 0.18 0.35 0.33 0.3 0.3 0.08 0.18 0.33 0.19 0.33 0.4 0.34 0.4 0.6 0.4 0.6 0.6 0.21 0.4 0.6 0.4 0.6 0.6 0.6 0.7 0.7 0.7 0.7 1.0
0.65 0.65 0.65 0.99 0.99 1.40 1.40 0.99 1.00 1.40 1.10 1.40 2.20 1.40 2.20 2.20 2.2 2.2 1.40 1.50 2.20 1.70 2.20 2.5 2.20 2.5 3.4 2.5 3.4 3.4 2.20 2.2 3.4 2.5 3.4 3.4 3.4 3.9 3.9 3.9 3.9 6.6
0.00 0.00 0.00 0.02 0.02 0.09 0.09 0.01 0.02 0.09 0.03 0.09 0.20 0.09 0.20 0.20 0.2 0.2 0.04 0.09 0.20 0.10 0.20 0.2 0.20 0.2 0.4 0.2 0.4 0.4 0.12 0.2 0.4 0.2 0.4 0.4 0.4 0.5 0.5 0.5 0.5 0.7
0.93 0.93 0.93 1.40 1.40 2.10 2.10 1.40 1.40 2.10 1.40 2.10 2.80 2.10 2.80 2.80 2.8 2.8 2.10 2.10 2.80 2.30 2.80 3.4 2.80 3.4 4.4 3.4 4.4 4.4 2.80 2.9 4.4 3.4 4.4 4.4 4.4 5.1 5.2 5.2 5.2 9.4 (Continued)
TABLE 8.7 (Continued) Confidence limits 95% Number of positive results 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
4 4 5 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 4 4 4 5 5 0 0 0 0 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4
0 1 0 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 3 4 0
MPN organisms /g (ml) 2.1 2.4 2.5 1.3 1.7 2.1 2.5 1.7 2.1 2.6 3.1 2.2 2.6 3.2 3.8 2.7 3.3 3.9 3.4 4.0 4.7 4.1 4.8 2.3 3.1 4.3 5.8 3.3 4.6 6.3 8.4 4.9 7.0 9.4 12 15 7.9 11 14 17 21 13
99%
Categorya
Lower
Upper
Lower
Upper
2 3 3 1 1 2 0 1 1 2 0 1 1 2 0 1 1 3 1 2 3 3 3 1 1 2 3 1 1 1 3 1 1 1 2 0 1 1 1 2 3 1
0.7 1.0 1.0 0.4 0.6 0.7 1.0 0.6 0.7 1.0 1.0 0.7 1.0 1.0 1.3 1.0 1.0 1.3 1.3 1.3 1.4 1.3 1.4 0.7 1.0 1.3 2.1 1.0 1.4 2.1 3.4 1.5 2.2 3.4 3 6 2.3 3 5 7 7 3
4.0 6.6 6.6 3.4 3.4 3.9 6.6 3.9 4.1 6.6 6.6 4.8 6.6 6.6 10.0 6.6 6.6 10.0 10.0 10.0 11.3 10.0 11.3 6.6 6.6 10.0 14.9 10.0 11.3 14.9 22.0 14.9 16.8 22.0 24 35 22.0 24 35 39 39 35
0.5 0.7 0.7 0.3 0.4 0.5 0.7 0.4 0.5 0.7 0.7 0.5 0.7 0.7 0.9 0.7 0.7 0.9 0.9 0.9 0.9 0.9 0.9 0.5 0.7 0.9 1.4 0.7 0.9 1.4 2.1 0.9 1.4 2.1 2 4 1.5 2 3 4 4 3
5.2 9.4 9.4 4.4 4.4 5.2 9.4 5.1 5.3 9.4 9.4 6.1 9.4 9.4 14.7 9.4 9.4 14.7 14.7 14.7 14.7 14.7 14.7 9.4 9.4 14.7 20.0 14.7 14.7 20.0 27.0 20.0 23.0 28.0 32 45 27.0 32 45 51 51 45 (Continued)
TABLE 8.7 (Continued) Confidence limits 95% Number of positive results 5 5 5 5 5 5 5 5 5 5 5
4 4 4 4 4 5 5 5 5 5 5
MPN organisms /g (ml)
1 2 3 4 5 0 1 2 3 4 5
17 22 28 35 43 24 35 54 92 160 160
Categorya
Lower
1 1 1 1 3 1 1 1 1 1
6 7 10 10 15 7 10 15 23 40
Upper 39 44 70 70 106 70 106 166 253 460
99% Lower
Upper
4 4 6 6 9 4 6 10 15 20
51 57 92 92 150 92 150 223 338 620
Source: Reproduced from De Man (1983), by kind permission of the author and Springer Science and Business Media. a
Explanation of the category system: Category 1: A valid result, likely to occur on 95% of occasions. The number of organisms in the sample is equal to the MPN and the result has the greatest chance of being obtained. Category 2: A result less reliable than Category 1 that is likely to occur on only 1% of occasions; Category 3: A result even less likely to occur than Category 2, with 0.1% chance of occurring. Category 0: Results in this category are unlikely to occur, other than by chance. In the absence of purely technical errors, the likelihood of such a result is less than 0.01%. Before starting testing, it is essential to decide which categories will be acceptable, that is only Category 1, Categories 1 and 2, or even Categories 1, 2 and 3. When the decision to be taken on the basis of the result is important, only Category 1 or at most Category 1 and 2 results should be accepted. All Category 0 results should be considered with great suspicion. Caution: The CL given in the tables are meant only to provide an indication of the statistical limits of the MPN results. Other sources of variation may sometimes be more important.
where m̂₁ and m̂₂ are the MPN values for the two series of tests (1) and (2), with dilution ratios a₁ and a₂, respectively, and n₁ and n₂ are the numbers of tests for each series. If the dilution ratio (a) is 10-fold, the equation becomes:

t = (log m̂₁ − log m̂₂) / [0.58 √((log a₁)/n₁ + (log a₂)/n₂)] = (log m̂₁ − log m̂₂) / [0.58 √(1/n₁ + 1/n₂)]
The use of the method is illustrated in Example 8.3.
EXAMPLE 8.3 CALCULATION OF THE MPN FOR TWO DILUTION SERIES AND THE SIGNIFICANCE OF THE DIFFERENCES BETWEEN THE MPN ESTIMATES

In the first test (A), 5 tubes were each inoculated with 1 ml of a series of 10-fold dilutions and, in the second test (B), 3 tubes were each inoculated with 1 ml of a series of 10-fold dilutions. The results were:

        Number of tests        Number of positive tests at dilution
Test    at each dilution       10⁻¹    10⁻²    10⁻³    10⁻⁴
A       5                      5       2       1       0
B       3                      3       3       0       0
Then, from Table 8.7, using the sequence 5-2-1 for Sample A, the MPN = 7 × 10 = 70 organisms/ml, with 95% CL of 22–168 organisms/ml; and, from Table 8.6, using the sequence 3-3-0 for Sample B, the MPN = 240 organisms/ml, with 95% CL of 40–990 organisms/ml. Hence, these two test series provide MPNs of 70 and 240 organisms/ml, but the 95% CLs of the MPNs overlap. Is the difference in the MPN estimates significant?

We can define two alternative hypotheses: (1) the null hypothesis (H₀) that the results do not differ; and (2) the alternative hypothesis (H₁) that the results differ significantly. We can test the significance of the difference between the results using Student's 't' test:

t = (log m̂₁ − log m̂₂) / [0.58 √((log a₁)/n₁ + (log a₂)/n₂)]

where m̂₁ and m̂₂ = the MPNs given by tests A (70) and B (240), respectively; the dilution ratio a₁ = a₂ = 10; and n₁ and n₂ = the number of tubes at each dilution level for test A (n₁ = 5) and test B (n₂ = 3), with six degrees of freedom (ν = 5 + 3 − 2 = 6). Then:

t = (log 240 − log 70) / [0.58 √((log 10)/5 + (log 10)/3)] = (2.3802 − 1.8451) / [0.58 √(0.2 + 0.33)] = 0.5351/0.4236 = 1.2632
If we now look up the percentage points of the 't' distribution (e.g. table 12 in Pearson and Hartley, 1976) for 6 degrees of freedom, the observed value of t (1.2632) is slightly less than the tabulated value (1.440) for P = 0.10. Such a difference would therefore be expected to occur by chance on more than 1 occasion in 10 (P > 0.10). Hence, the difference between these observed MPN values could easily have occurred by chance: the null hypothesis (H₀) that the two results do not differ significantly is not rejected and the alternative hypothesis (H₁) is rejected.
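The calculation in Example 8.3 can be scripted as follows (a Python sketch; the function name is mine and the choice of coefficient where dilution ratios differ is my assumption):

import math

def mpn_difference_t(mpn_1, mpn_2, ratio_1, ratio_2, n_1, n_2):
    """Student's t for the difference between two MPN estimates; the 0.58
    coefficient is used for 10-fold series, 0.55 otherwise (assumption)."""
    coeff = 0.58 if max(ratio_1, ratio_2) >= 10 else 0.55
    se = coeff * math.sqrt(math.log10(ratio_1) / n_1 + math.log10(ratio_2) / n_2)
    return abs(math.log10(mpn_1) - math.log10(mpn_2)) / se

# Example 8.3: MPNs of 70 (5 tubes/level) and 240 (3 tubes/level), both 10-fold.
t = mpn_difference_t(70, 240, 10, 10, 5, 3)
print(round(t, 3))   # -> ~1.263; compare with t for 6 df, e.g. 1.440 at P = 0.10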
Special Applications of Multiple-Tube Dilution Tests

For control purposes it may be necessary to make a decision to accept or reject a consignment of food based on the result of an MPN test. For such purposes it is possible to construct an operating characteristics (OC) curve for acceptance or rejection of tube dilution counts at specific confidence levels. The best control system would be based on the use of a limit permitting acceptance of the 'lot' if fewer than c out of n tests at dilution i show evidence of growth (i.e. if more than n − c of the tests at dilution i are sterile). The shape of the OC curve will depend on the values of c and n for the dilution level giving the critical density of organisms. On the assumption that log m is normally distributed when n is large, the steepest part of the OC curve will occur at the 50% acceptance point (approximate value of c ≈ 0.2n); lower percentage acceptance would be associated with c < 0.2n, and vice versa. The appropriate choice of c will therefore lie in the range 0–0.5n, assuming that the cumulative binomial distribution is a valid model.

Aspinall and Kilsby (1979) proposed a quality control procedure based on this concept. In their scheme a critical value (c) is chosen such that (1) when the cell density exceeds c the probability of rejection is at least 0.95, and (2) when the cell density is less than 0.2c the probability of acceptance is 0.95. For a critical value (c) of 10^(z+1) cfu/g, the test sample is diluted 10^(z−1)-fold and 0.5 ml of this suspension is inoculated into each of two tubes of culture medium. A further 10-fold dilution of the suspension (i.e. a 10^z-fold dilution) is prepared and 0.5 ml is added to each of seven tubes of medium. After incubation, the MPN and confidence limits (CL) can be derived from the data in Table 8.8; it should be noted that the CL values shown are twice those cited in the original paper of Aspinall and Kilsby (1979) (Kilsby and Aspinall, 1983, personal communication). If all tubes give a positive response, the material is rejected; if a result is obtained that is not shown in Table 8.8, the test must be repeated. The scheme is illustrated in Example 8.4.
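The acceptance probability underlying such an OC curve can be sketched as below (Python; the tube numbers and densities in the example are hypothetical and not taken from the text), assuming each tube becomes positive independently with probability 1 − e^(−v·m):

import math

def acceptance_probability(density, volume_per_test, n_tests, c_limit):
    """Probability of accepting a lot ('fewer than c of n tubes positive')
    when the true density is 'density' organisms per unit volume."""
    p = 1 - math.exp(-volume_per_test * density)
    return sum(math.comb(n_tests, k) * p**k * (1 - p)**(n_tests - k)
               for k in range(c_limit))

# Illustrative OC curve: 10 tubes of 1 ml at a single dilution, accept the lot
# if fewer than 2 tubes are positive; acceptance falls as density rises.
for density in (0.01, 0.05, 0.1, 0.3, 0.5, 1.0):
    print(density, round(acceptance_probability(density, 1.0, 10, 2), 3))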
TABLE 8.8 Estimates of MPN for a Defined Quality Control Multi-Tube Test

Positive tests at dilution 10^−(z−1) | Positive tests at dilution 10^−z | MPN/g (×10^(z−1)) | Lower 95% CL (×10^(z−1)) | Upper 95% CL (×10^(z−1))
0 | 0 | 0.76 (0) | – | 2.8
0 | 1 | 0.76 | 0.014 | 2.8
1 | 0 | 0.92 | 0.038 | 4.8
1 | 1 | 1.9 | 0.32 | 6.4
2 | 0 | 2.7 | 0.48 | 10.8
1 | 2 | 3.0 | 1.34 | 6.8
2 | 1 | 4.6 | 0.90 | 16.2
2 | 2 | 7.4 | 1.92 | 22.0
2 | 3 | 11.0 | 2.8 | 30.0
2 | 4 | 17.0 | 4.0 | 40.0
2 | 5 | 25.0 | 6.8 | 58.0
2 | 6 | 39.0 | 11.2 | 98.0
2 | 7 | 39 | 19.6 | –

Source: Modified from Aspinall and Kilsby (1979).
QUANTIFICATION BASED ON RELATIVE PREVALENCE OF DEFECTIVES

The use of the binomial distribution to describe the OC curve for sample plans was described in Chapter 5. This relationship, describing the probability of occurrence of defective samples, can be used also to quantify the prevalence of defective units and the probable level of occurrence of target organisms in a food lot. On the assumption that the organisms are distributed randomly and that a number (n) of samples is tested, then the probability of no defective test samples (i.e. no positive tests) is given by P(d=0) = (1 − d/100)^n, where d = the actual % defectives in the lot. This will be valid provided that n is a small fraction (<20%) of the total lot. The equation may be rearranged to give an estimate (d̂) of the true prevalence of defectives for a given level of confidence (P) and number of samples: d̂ = 100[1 − ⁿ√(1 − P)]. Knowing the probable percentage prevalence (d̂) of defective samples and the size of each sample (W, in g), it is possible to derive an upper CL for the probable contamination level (C) for the lot:

C = (1000/W)(d̂/100) = (1000/W)[1 − ⁿ√(1 − P)]

organisms/kg at a probability of P. Table 8.9 provides derived values for the upper CL of the prevalence of contamination at P = 0.95 and P = 0.99, assuming a sample weight (W) of 25 g, for various numbers of samples (n). The calculation is illustrated in Example 8.5. Note that this expression is valid only if all tests are negative.
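A minimal sketch of these two formulae (function and variable names are mine):

```python
# Upper confidence limit for % defectives and for contamination level (organisms/kg)
# when all n tests, each of W g, are negative.
def upper_limits(n, W=25.0, P=0.95):
    d_hat = 100.0 * (1.0 - (1.0 - P) ** (1.0 / n))   # % defectives, upper bound
    C = (1000.0 / W) * (d_hat / 100.0)               # organisms/kg, upper bound
    return d_hat, C

for n in (1, 5, 10, 20):
    d, C = upper_limits(n)
    print(f"n = {n:>3}: d = {d:5.1f}%  C = {C:5.1f} organisms/kg (P = 0.95)")
# n = 10 gives d = 25.9% and C = 10.4 organisms/kg, as in Example 8.5.
```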
EXAMPLE 8.4 USE OF MULTIPLE-TUBE MPN FOR QUALITY ACCEPTANCE/REJECTION PURPOSES

Suppose, for a food product, that the manufacturer's quality specification requires rejection of product containing more than a critical limit (c) and that c = 1000 = 10^3 organisms/g. If we set 10^3 = 10^(z+1), then z = 2. A result that shows a level of organisms greater than c will result in rejection of the product, whilst a lower level will permit acceptance for quality testing purposes. An MPN scheme using different numbers of tests at 2 dilutions of the sample has been devised for acceptance or rejection of sample lots (Aspinall and Kilsby, 1979). Serial dilutions of product samples from each of three lots were prepared at 10^−(z−1) (i.e. 10^−1) and 10^−z (i.e. 10^−2). Two tubes of culture medium were each inoculated with 0.5 ml of the 10^−1 dilution and 7 tubes were each inoculated with 0.5 ml of the 10^−2 dilution. Following incubation, the cultures were examined for evidence of growth and the following results were recorded for the three samples of product:

Product lot | Positive tubes at 10^−1 | Positive tubes at 10^−2 | MPN/g^a | Decision
A | 2 | 7 | 39 × 10 = 390 | Reject lot
B | 1 | 5 | Not calculable | Repeat test
C | 0 | 1 | 0.76 × 10^1 = 7.6 | Accept lot

^a From Table 8.8.
A potential problem in the application of the method as described is that if all nine tubes give positive results (i.e. 2 + 7) the MPN cannot be calculated and the result requires that the product be rejected. However, if only eight positive tubes had occurred (i.e. 2 + 6) the product would be accepted with an MPN of 390/g (Table 8.8). If, as in Example 8.4, the critical level (c) were an MPN of 1000/g, the cut-off of 390/g would appear to be somewhat low. It should be noted that in Table 8.8 several potential results are not shown, since it is believed that such results would be highly improbable – hence the need to repeat the test on Sample B. This scheme of Aspinall and Kilsby (1979) is much less likely to reject satisfactory material than is a 3-level, 3-test scheme (3, 3, 3) such as is frequently used for control purposes (ICMSF, 1978).
TABLE 8.9 Upper CL for Percentage Defective Sample Units and Contamination Levels Assuming Random Distribution of Target Organism

Number of sample units tested (n) | % defectives^a at P = 0.95 | % defectives^a at P = 0.99 | Estimated mean number of organisms/kg^a,b at P = 0.95 | Estimated mean number of organisms/kg^a,b at P = 0.99
1 | 95 | 99.0 | 38 | 40
2 | 77.6 | 90.0 | 31 | 36
3 | 63.2 | 78.5 | 25 | 31
4 | 52.7 | 68.4 | 21 | 27
5 | 45.1 | 60.2 | 18 | 24
10 | 25.9 | 36.9 | 10 | 15
20 | 13.9 | 20.6 | 6 | 8
30 | 9.5 | 14.2 | 4 | 6
50 | 5.8 | 8.8 | 2 | 4
100 | 3.0 | 4.5 | 1 | 2

^a Assuming no positives detected.
^b Assuming sample size of 25 g; rounded to the nearest whole number.
EXAMPLE 8.5 CALCULATION OF THE UPPER BOUND OF THE 95% AND 99% CONFIDENCE INTERVALS FOR CONTAMINATION USING PRESENCE/ABSENCE TESTS

Assume 10 × 25 g samples are tested for salmonellae and all tests are negative. What are the 95% and 99% confidence intervals (CI) for the probable prevalence of contamination?

95% Confidence interval
Use the formula d = 100(1 − ⁿ√α), with α = 0.05. Then the upper bound for the 95% CI for the probable incidence of defective (i.e. contaminated) sample units is given by:

d = 100(1 − ¹⁰√0.05) = 100(1 − 0.741) = 25.9%

and the contamination of the lot at the upper bound of the 95% CI is given by:

(25.89/100) × (1000/25) = 0.2589 × 40 = 10.4 ≈ 11 salmonellae/kg.

99% Confidence interval
Similarly, the 99% upper bound of the CI for the prevalence of contaminated sample units is given by:

d = 100(1 − ¹⁰√0.01) = 36.9%

and the contamination of the lot at the upper bound of the 99% CI is given by:

(36.9/100) × (1000/25) = 0.369 × 40 = 14.76 ≈ 15 salmonellae/kg.
Then, by testing 10 × 25 g sample units with negative results, we can say that the prevalence of defective samples in the 'lot' lies between 0% and 26% at a 95% probability level, or between 0% and 37% at a 99% probability level; and that the CLs for the levels of contamination lie between 0 and 11 salmonellae/kg at a 95% probability and between 0 and 15 salmonellae/kg at a 99% probability.
From the data in Table 8.9, it can be seen that in testing for a specific target organism (e.g. Salmonella) using an appropriate method, a negative test result on a single sample (i.e. n = 1) of 25 g will merely indicate that at the upper 95% CL the lot will contain no more than 38 salmonellae/kg. However, this also means that there is a 5% chance that the true prevalence will exceed this value by an unknown amount. Increasing the number of samples tested increases the level of confidence that the prevalence of salmonellae is low, since if all test results were negative, the upper 95% CL on, for example, 10 or 20 tests would be not more than 11, nor more than 6, salmonellae/kg, respectively. The data also indicate that the 99% upper CLs for contamination would be 40, 15 or 8 salmonellae/kg, respectively, for 1, 10 or 20 samples tested with 'negative' results. Note that, no matter how many sample units are tested, one can never give a guarantee that the 'lot' from which the samples were drawn is totally free from salmonellae. For multiple tests at a single dilution, a mean contamination level can be estimated, but this value is not an MPN. Using the equation m̂ = −(1/V) ln(s/n), where s = the number of negative tests out of n tests each on a quantity V, a single positive result from 10 tests each of 25 g provides a mean value estimate of 4.2 salmonellae/kg, with a 95% CI ranging from 1 to 60 organisms/kg. It should be noted that this estimated value is less than the 95% upper CL value (10 salmonellae/kg) that would be expected if all tests had been negative.
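A sketch of the single-dilution estimator quoted above, assuming s negative tests out of n, each on a quantity V (here expressed in kg):

```python
# Mean contamination estimate from multiple presence/absence tests at a single level:
# m_hat = -(1/V) * ln(s/n), where s = number of negative tests out of n.
import math

def single_dilution_estimate(n_tests, n_negative, sample_size_kg):
    if n_negative == 0:
        raise ValueError("all tests positive: only a lower bound can be given")
    return -(1.0 / sample_size_kg) * math.log(n_negative / n_tests)

# One positive among 10 tests of 25 g (0.025 kg):
print(round(single_dilution_estimate(10, 9, 0.025), 1))   # about 4.2 organisms/kg
```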
SOME STATISTICAL ASPECTS OF MULTI-STAGE TESTS

The Test Procedure

When samples are tested for the presence or absence of specific organisms, the analytical procedures frequently involve four stages: pre-enrichment; selective enrichment; plating onto diagnostic agar; and identification of 'typical' colonies. Different methodological procedures are sometimes recommended in relation to the media to be used and the incubation times and temperatures during the pre-enrichment stage. For example, Hall (1975) recommended pre-enrichment incubation for 4 h at 37°C of a 1 in 2 dilution of a sample in peptone buffer, before addition of the entire pre-enrichment medium to an equal volume of double-strength selective enrichment broth. By contrast, the ISO (2002) standard method for detection of salmonellae in foods and feeds requires that the initial 25 g sample should be inoculated into 9 volumes of a liquid pre-enrichment medium (buffered peptone water) that is incubated for 18 ± 2 h at 37°C before transferring 1 ml into 10 ml
of each of two selective enrichment media (Rappaport Vassiliadis medium with Soya (RVS) and Muller-Kauffman tetrathionate/novobiocin broth (MK)). The RVS is incubated for 24 ± 3 h at 41.5°C and the MK for 24 ± 3 h at 37°C. Thereafter, one 5 mm loopful (about 0.02 ml) from each enrichment culture is streaked on to one plate of each of two selective diagnostic media that are incubated for 24 ± 3 h at 37°C, prior to selection of typical colonies for typing. Hence, three dilution stages occur. The first stage results in a 10-fold dilution of the microorganisms in the test sample. The numbers of both target and competitor organisms that grow during incubation will depend on (a) the initial contamination levels and dispersion of both the target organism and competitive organisms; (b) the condition of the organisms (i.e. whether cells have been 'stressed' by the environmental conditions); and (c) the population doubling time of the target organism in the conditions of the test (i.e. culture medium composition and the incubation temperature and time). Growth of target organisms during the second and third stages (selective enrichment and diagnostic culture) will also be dependent on the specific cultural conditions. The fourth stage (identification of discrete colonies) is considered below. The possibilities for isolation of specific organisms will depend not only on the initial prevalence of contamination but also on the manner in which the test is done. A period of pre-enrichment that is too short may not result in recovery of the stressed organisms; this reduces the probability of transferring resuscitated organisms to the enrichment broth and subsequently to the diagnostic culture. Too long a period in a non-selective pre-enrichment may result in overgrowth by competitive organisms, leading to suppression of the target organism. An illustration of the effects of incubation time is shown for a hypothetical example in Examples 8.6 and 8.7, and in Fig. 8.4.
EXAMPLE 8.6 EFFECT OF INCUBATION TIME IN PRE-ENRICHMENT MEDIUM ON NUMBERS OF SALMONELLAE FOR SUBCULTURE

Assume:
1. a 25 g dry food sample is inoculated into 225 ml of peptone broth for pre-enrichment;
2. the initial contamination level = 1 salmonella/10 g sample;
3. the length of the lag phase (L) for resuscitation = 6 h;
4. the population doubling time (Td) in the pre-enrichment broth = 30 min (0.5 h), at the appropriate incubation temperature; and, for simplicity,
5. synchronous growth will occur.

The relative numbers of organisms can be determined using the growth kinetics equation NT = N0 · 2^((T−L)/Td), where NT = the number of salmonellae/ml after time T (h), N0 = the initial number of salmonellae/ml, L = the length of the lag phase (h) and Td = the doubling time (h).
Since the initial number of salmonellae in 25 g of food sample = 2.5, the initial number in the pre-enrichment culture is N0 = 2.5/250 = 0.01 organisms/ml. If L = 6 h, then the number of viable salmonellae after incubation for 6 h will not have changed (provided that no cells die); this is derived from the kinetic equation, viz: NT = 0.01 × 2^((6−6)/0.5) = 0.01 × 2^0 = 0.01. Similarly, we can derive values for other incubation periods:

9 h:  NT = 0.01 × 2^((9−6)/0.5) = 0.01 × 2^6 = 0.64
12 h: NT = 0.01 × 2^((12−6)/0.5) = 0.01 × 2^12 = 41.0
18 h: NT = 0.01 × 2^((18−6)/0.5) = 0.01 × 2^24 = 1.68 × 10^5
24 h: NT = 0.01 × 2^((24−6)/0.5) = 0.01 × 2^36 = 6.87 × 10^8
Then, assuming random distribution, the expected number of organisms likely to be transferred in an inoculum of pre-enrichment medium and the probability of transferring at least one viable organism can be calculated for any of these, or other, time intervals:

Incubation time (h)^a | Estimated mean number of organisms in 1 ml medium^b | Probability (P) of ≥1 organism in 1 ml | Probability (P) of ≥1 organism in 10 ml
6 | 0.01 | 0.01 | 0.095
9 | 0.64 | 0.47 | >0.99
12 | 41 | >0.99 | >0.99
18 | 1.7 × 10^5 | >0.99 | >0.99
24 | 6.9 × 10^8 | >0.99 | >0.99

^a In pre-enrichment, assuming conditions as above.
^b Assuming an initial level of 1 salmonella/10 g food sample (N0 = 0.01/ml) and random distribution.
Once sufficient growth has occurred during pre-enrichment, a 95% probability of transferring at least one viable organism in 1 ml of culture requires at least three viable salmonellae/ml, assuming a Poisson distribution (for m̂ = 3, P(x=0) = 0.05 and hence P(x ≥ 1) = 0.95). The table (above) shows that, for the conditions and kinetic values used, the pre-enrichment culture would need to be incubated for at least 12 h to have a 99% probability of transferring a viable inoculum in 1 ml of the culture. At lower viable cell numbers, the chances of transferring at least one viable organism will be improved by inoculation of a larger volume of pre-enrichment culture (e.g. 10 ml) into the enrichment culture, or by using procedures such as immuno-magnetic beads to recover the target organism from a large volume of the pre-enrichment culture. In media where resuscitation, but no growth, occurs during pre-enrichment it is essential to use the entire pre-enrichment culture to inoculate double-strength enrichment medium (Hall, 1975), although the selectivity may then be affected by food constituents.
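The kinetic and probability calculations of Example 8.6 can be sketched as follows (assumptions as stated in the example; the Poisson form 1 − e^(−mv) is used for the chance of at least one cell in v ml):

```python
# Synchronous growth N_T = N_0 * 2^((T - L)/T_d), then the Poisson probability
# of transferring at least one viable cell in an inoculum of volume v ml.
import math

def cells_per_ml(N0, T, lag=6.0, doubling_time=0.5):
    return N0 if T <= lag else N0 * 2 ** ((T - lag) / doubling_time)

def p_at_least_one(mean_per_ml, volume_ml):
    return 1.0 - math.exp(-mean_per_ml * volume_ml)

N0 = 2.5 / 250.0                      # 0.01 salmonellae/ml at the start of pre-enrichment
for T in (6, 9, 12, 18, 24):
    m = cells_per_ml(N0, T)
    print(f"{T:>2} h: {m:10.3g}/ml  P(>=1 in 1 ml) = {p_at_least_one(m, 1):.2f}"
          f"  P(>=1 in 10 ml) = {p_at_least_one(m, 10):.2f}")
```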
EXAMPLE 8.7 PROBABILITY OF TRANSFERRING SALMONELLAE DURING ENRICHMENT AND ISOLATION

A question often asked about presence or absence tests for pathogens is: 'What is the probability of detecting salmonellae, if present in a food sample, using pre-enrichment and enrichment cultures prior to streaking onto diagnostic agars to isolate any organisms that may be present?'

Assume:
1. the initial mean contamination level = 1 salmonella/10 g sample;
2. pre-enrichment for 12 h, yielding about 41 salmonellae/ml (as in Example 8.6);
3. 1 ml of pre-enrichment medium is inoculated into 9 ml of selective enrichment medium;
4. the lag time (L) in the enrichment medium = 4 h and the doubling time = 60 min = 1 h;
5. the enrichment period = 18 h or 24 h; and
6. 0.02 ml (1 loopful) is inoculated onto the diagnostic medium.

From Example 8.6, the 12 h pre-enrichment culture would contain 41 salmonellae/ml. Since the pre-enrichment is diluted 1 in 10 into a selective enrichment medium, the initial level of salmonellae would be 41/10 = 4.1/ml = N0. After incubation for 18 h, and assuming synchronous growth and other factors as summarized above:

NT = 4.1 × 2^((18−4)/1) = 4.1 × 2^14 ≅ 6.7 × 10^4 salmonellae/ml of enrichment culture

Similarly, after 24 h:

NT = 4.1 × 2^((24−4)/1) = 4.1 × 2^20 ≅ 4.3 × 10^6 salmonellae/ml of enrichment culture

Hence, a standard 0.02-ml loop would be expected to transfer ca. 1340 salmonellae after 18 h incubation or 86,000 salmonellae after 24 h incubation, so the statistical probability of transferring at least one viable salmonella to the isolation medium would be >0.99. However, if the enrichment medium had been inoculated after only 6 h pre-enrichment (see Example 8.6), during which time resuscitation but no cell growth had occurred, then the level of inoculum transferred to the enrichment medium would have been only 0.1 cell/10 ml of inoculum. It would therefore have been extremely unlikely (P < 0.01) that salmonellae would have been detected in the enrichment culture, even after 24 h. These effects are illustrated diagrammatically in Fig. 8.4.
FIGURE 8.4 Hypothetical illustration of changes in numbers of viable salmonellae during incubation in pre-enrichment (---) and enrichment (—) cultures, and the probability of transfer of at least 1 salmonella cell in 1 ml of the culture after specific periods of time. It is assumed that the lag phase is 6 h and that the viable cells then grow with a doubling time of 30 min (see Example 8.7). (Axes: log10 number of salmonellae/ml, or per 0.02-ml loop, against total incubation time (h).)
Compositing of Samples

When multiple samples are to be tested, it is sometimes recommended that the samples be composited in order to reduce the workload on the laboratory. Suppose that a test procedure requires the testing of 10 × 25 g samples for the presence of salmonellae and suppose that only one of the 10 samples is contaminated. What are the options for compositing samples? Several alternative approaches have been recommended, but various critical issues must be resolved before choosing a procedure for use.

(1) An unacceptable approach. The 10 × 25 g samples are composited into a single 250 g quantity from which a single 25 g sub-sample is taken for testing. This method, which is used widely in chemical analyses to ensure greater homogeneity of the material sampled, is potentially prone to microbiological cross-contamination risks. More importantly, however, in statistical terms, only a single sample has been tested (not 10 samples), so that the results
will provide no more information about the lot from which the original 10 samples were taken than would testing of a single sample, albeit the combined test sample will be more representative of the lot than would a single original test sample.

(2) Bulk compositing and testing. The second approach combines all 10 samples at the pre-enrichment stage; that is, 250 g of combined sample is inoculated directly into 2250 ml of pre-enrichment medium. After incubation, the subsequent stages are done on the same basis as for a single sample (i.e. 10 ml of pre-enrichment medium is inoculated into 100 ml of enrichment medium, etc.). The key issue here is the level of sensitivity of the test method. What is the likelihood that a method capable of detecting, say, one viable cell in a 25 g sample of food material will detect one viable cell in 250 g of the same food? A positive result would indicate that at least one of the original samples was contaminated, but a negative result (i.e. failure to detect the target organism) might suggest wrongly that none of the test samples was contaminated. It may be that the sensitivity of the test method can be improved by, for example, extending the pre-enrichment incubation period, but this would work only if growth of competitive organisms can be adequately controlled. Only if it has been demonstrated that the level of sensitivity of the method is at least one organism in 250 g is it permissible to test such composite samples.

(3) Wet compositing. The only fully acceptable method of compositing is to inoculate 10 pre-enrichment cultures (i.e. one for each 25 g sample) and, after incubation, to inoculate equal volumes of each pre-enrichment culture (i.e. 10 × 1 ml) into 100 ml of enrichment broth. By retaining the original pre-enrichment samples, it is possible also to re-culture from these if the composite enrichment broth gives a positive result, so that the prevalence of positive samples can be determined. Example 8.8 illustrates the statistical aspects that must be taken into account in compositing of samples for microbiological analyses. A more detailed appraisal of compositing has been published by Jarvis (2007).
EXAMPLE 8.8 STATISTICAL ASPECTS OF SAMPLE COMPOSITING (based in part on Jarvis, 2007, with permission of Blackwell Publishing)

Suppose that a microbiological food safety criterion requires the testing of 30 samples, each of 25 g, for the presence of salmonellae using defined methods (e.g. EU (2005) for infant feeds). Then, provided that the test procedure is sufficiently sensitive to detect a single salmonella in a larger quantity of sample, from a statistical perspective there is no difference between testing 1 × 750 g, 3 × 250 g or 30 × 25 g samples (H. Marks, 2006, personal communication). But if there is any doubt as to the level of sensitivity of the test procedure, then compositing should not be done, except using a 'wet' compositing approach where each individual sample unit is pre-enriched separately (Jarvis, 2007).
Assume n sample units, each unit (i) being of size ki grams with a cell density of λi cells/g. Then, provided that the test procedure is sufficiently sensitive to detect the organism if present, the probability P(xi = 0) of finding a negative result on any sample i is given by:

P(xi = 0) = e^(−λi ki)

If λ1 = 1 cell/10 g (i.e. 0.1 cells/g) and k1 = 25 g, then the probability of not detecting any organism in the 25 g sample (assuming the test protocol is 100% effective) is given by:

P(x1 = 0) = e^(−0.1 × 25) = e^(−2.5) = 0.082

The probability of finding a negative result on all n units is the product of the individual probabilities for a negative result in each unit:

P(x1 = 0, x2 = 0, x3 = 0, …, xn = 0) = e^(−λ1k1) · e^(−λ2k2) · e^(−λ3k3) ··· e^(−λnkn) = e^(−Σλiki)

To illustrate this effect, suppose that we have 3 × 250 g sample units with levels of 0.01, 0.00 and 0.02 viable cells/g, respectively; then the total cell counts in the three sample units will be 2.5, 0.0 and 5.0 cells/250 g, respectively. The probability of obtaining a negative result (or, conversely, of not detecting a positive) in the individual sample units will be 0.082, 1.00 and 0.007, respectively. The combined probability of a negative result on the three sample units will be given by:

P(x1 = 0, x2 = 0, x3 = 0) = e^(−Σλiki) = e^(−(2.5+0+5.0)) = e^(−7.5) = 0.00055

This is the same probability that would be found by using the mean cell count (0.01 cells/g) and multiplying by a k value of 750 g.

Suppose then that 30 samples (each of 25 g) from a single lot are tested individually, with negative results. Then it is possible to assign a CI for the true prevalence of contamination in the lot from which the samples were taken. Such CIs will be bounded by 0 and Pu, where Pu is chosen so that the interval has a 100(1 − α)% probability of enclosing the true value. Pu can be derived from a rearrangement of the equation (1 − Pu)^n = α, where n is the number of samples tested. With α = 0.05 for a 95% CI, Pu is given by: Pu (%) = 100(1 − ⁿ√α).
For instance, if we test 30 × 25 g samples, with negative results, then the upper bound of the CI is given by:

Pu (%) = 100(1 − ³⁰√α) = 100(1 − ³⁰√0.05) = 100(1 − 0.905) = 9.5%
In other words, obtaining negative test results on 30 × 25 g individual samples indicates a possible true prevalence of contamination of up to 9.5% on 95% of occasions, but on 5% of occasions a higher true prevalence could exist. However, if we had tested 3 × 250 g composite samples with negative results, the value of the upper bound of the CI of Pu (with α = 0.05) would be 63.2%. In other words, testing a smaller number of large samples with negative results will indicate a higher possible prevalence of defective units. However, we can derive a possible level of contamination at the upper confidence limit by multiplying Pu/100 by 1000/W, where W is the weight (g) of the sample units tested. Hence, with a value of Pu = 9.5% for 30 × 25 g samples, the possible level of contamination at α = 0.05 is 3.8 cells/kg; but for 3 × 250 g samples, where Pu = 63%, the 95% upper bound for the true level of contamination is 2.5 cells/kg. Hence, testing larger quantities of composited sample will indicate a higher prevalence, but a lower true level, of contamination than will testing a greater number of smaller samples – provided that the negative results of the test are not due to the inability of the method to detect the target organism.
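A minimal sketch of the compositing arithmetic used in Example 8.8 (symbols as defined above; function names are mine):

```python
# Probability of an all-negative result, and the upper bound P_u of the prevalence CI
# when all n sample units test negative.
import math

def p_all_negative(densities_per_g, unit_sizes_g):
    """P(no positives) = exp(-sum(lambda_i * k_i)), assuming random distribution."""
    return math.exp(-sum(l * k for l, k in zip(densities_per_g, unit_sizes_g)))

def upper_prevalence(n_units, alpha=0.05):
    """Upper bound (%) of the CI for prevalence when all n_units tests are negative."""
    return 100.0 * (1.0 - alpha ** (1.0 / n_units))

print(p_all_negative([0.01, 0.00, 0.02], [250, 250, 250]))   # e^-7.5 = 0.00055
print(round(upper_prevalence(30), 1))                        # 9.5% for 30 x 25 g
print(round(upper_prevalence(3), 1))                         # 63.2% for 3 x 250 g
```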
Selection of Colonies for Identification

After development of isolated colonies on the diagnostic medium, it is necessary to select colonies showing typical morphology for further identification. The chance of isolation of specific organisms from an almost pure culture will of course be high. However, when mixed cultures are streaked onto agar plates, different types of organism may have apparently similar colonial morphologies. It is essential to test several apparently typical colonies from such plates. In some laboratory manuals (e.g. Hall, 1975) no reference is made to the number of colonies to pick; ICMSF (1978) recommends testing at least two typical colonies, ISO (1979) recommends selection of the square root of the number of 'typical' colonies and ISO (2002) recommends testing up to five typical colonies. The chance of picking any one colony of the type sought is dependent on the number (n) of the specific type of colonies relative to the total number (N) of colonies present. Since N is not large, the relationship is governed by the hypergeometric distribution. This is illustrated in Example 8.9. Table 8.10 shows the number of typical colonies of the target organism
TABLE 8.10 Probability of Picking One or More Specific Organisms from Amongst a Number of 'Typical' Colonies on a Diagnostic Plate

Number of 'typical' colonies (N) | Number of colonies picked^a (n) | Number (%) of target-organism colonies required on the plate^b for P = 0.50 | for P = 0.90 | for P = 0.95 | for P = 0.99
30 | 5 | 4 (13) | 11 (37) | 13 (43) | 17 (57)
40 | 6 | 5 (12) | 12 (30) | 15 (38) | 21 (53)
55 | 7 | 5 (9) | 15 (27) | 19 (35) | 26 (47)
70 | 8 | 6 (9) | 17 (24) | 21 (30) | 30 (43)
90 | 9 | 7 (8) | 20 (22) | 25 (28) | 35 (39)
110 | 10 | 8 (7) | 22 (20) | 28 (25) | 39 (35)
135 | 11 | 8 (6) | 25 (19) | 31 (23) | 45 (33)
150 | 12 | 9 (6) | 26 (17) | 32 (21) | 47 (31)

^a Based on sample size = integer of √N.
^b To provide the stated probability (P) of picking at least one specific organism from the N typical colonies on the plate.
that must be present in order to select at least one colony of the target organism from among a population of otherwise apparently typical colonies, with a given statistical probability. For instance, if the true prevalence of E. coli colonies is 50% of the colonies having 'typical' coliform morphology, then the probability of selecting at least one E. coli by testing 5 colonies out of a population of 30 colonies is better than 95% (i.e. the probability of failing to do so is <0.05). If, however, the true prevalence is only 10% (i.e. 3 E. coli in 30 colonies) then the chance that at least one E. coli colony is included in the sample of 5 colonies is less than 50% (P < 0.50). A similar situation would apply to the testing of individual colonies using PCR or other molecular methods of identification.
EXAMPLE 8.9 PROBABILITY FOR SELECTION OF A SPECIFIC COLONY FROM A DIAGNOSTIC PLATE CULTURE

What is the probability of picking at least one E. coli colony from a diagnostic plate containing 30 typical coliform colonies if the actual number of non-E. coli colonies is (a) 10 and (b) 20? Since the population is finite and (relatively) small, removal of any one colony will reduce the residual number from which to select another colony. Hence, the
hypergeometric distribution is the most appropriate. The probability of not selecting any E. coli colonies is described by the equation:

P(x=0) = [φ!(N − n)!] / [(φ − n)! N!]

where N = the number of typical coliform colonies on the plate, n = the number of colonies picked for identification (= integer of √N) and φ = the number of non-E. coli typical colonies on the plate. Since P(x≥1) = 1 − P(x=0), then:

P(x≥1) = 1 − [φ!(N − n)!] / [(φ − n)! N!]

which simplifies to:

P(x≥1) = 1 − [φ(φ − 1)(φ − 2)…(φ − n + 1)] / [N(N − 1)(N − 2)…(N − n + 1)]

Hence, for N = 30, n = 5 (i.e. the integer of √30) and φ = 10:

P(x≥1) = 1 − [(10)(9)(8)(7)(6)] / [(30)(29)(28)(27)(26)] = 1 − 0.001768 = 0.9982

Similarly, for φ = 20:

P(x≥1) = 1 − [(20)(19)(18)(17)(16)] / [(30)(29)(28)(27)(26)] = 1 − 0.108795 = 0.8912
Hence, provided that non-E. coli colonies constitute not more than one-third of the total number of typical colonies, there is a better than 99% chance, on average, that at least one E. coli colony will be picked in a sample of n = 5 from a population of 30 typical colonies. If two-thirds of the typical colonies are coliforms other than E. coli, then the chance of selecting at least one E. coli in a sample of n = 5 would be only about 90%. However, if 25 of the 30 colonies were non-E. coli there would be only about a 60% chance that one of the five colonies picked would be an E. coli.
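The hypergeometric calculation of Example 8.9 can be sketched as follows (math.comb is used in place of the factorial expression; the function name is mine):

```python
# Chance of picking at least one target colony when n of N 'typical' colonies are tested
# and phi of the N are NOT the target organism (sampling without replacement).
import math

def p_at_least_one_target(N, n, phi):
    p_none = math.comb(phi, n) / math.comb(N, n) if phi >= n else 0.0
    return 1.0 - p_none

print(round(p_at_least_one_target(30, 5, 10), 4))   # 0.9982
print(round(p_at_least_one_target(30, 5, 20), 4))   # 0.8912
print(round(p_at_least_one_target(30, 5, 25), 2))   # about 0.63
```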
Hedges et al. (1977) published a theoretical study of the expected distribution of E. coli serotypes amongst samples drawn randomly from populations containing different numbers of serotypes. From these studies it was possible to recommend guidelines for planning efficient sampling programmes to determine the serotypes present in various animal species. This is of particular importance in the testing of typical colonies from culture plates of mixed culture isolates of organisms from natural sources (e.g. salmonellae, Listeria, etc.).
References

Aspinall, LJ and Kilsby, DC (1979) A microbiological quality control procedure based on tube counts. J. Appl. Bacteriol., 46, 325–330.
Association of Official Analytical Chemists (AOAC) (2006) Official Methods of Analysis, 18th edition, Rev. 1. AOAC, Washington DC.
Blodgett, R (2003) Most Probable Number from serial dilutions. Appendix II. In Bacteriological Analytical Manual Online, Edition 8, Revision A, 1998, updated 2003. US FDA, Washington DC. http://www.cfsan.fda.gov/~ebam/bam-a2.html
Cochran, WG (1950) Estimation of bacterial densities by means of the 'most probable number'. Biometrics, 6, 105–116.
De Man, JC (1975) The probability of Most Probable Numbers. Eur. J. Appl. Microbiol., 1, 67–78.
De Man, JC (1983) MPN tables corrected. Eur. J. Appl. Microbiol., 17, 301–305.
Eisenhart, C and Wilson, PW (1943) Statistical methods and control in bacteriology. Bacteriol. Rev., 7, 57–137.
Finney, DJ (1947) The principles of biological assay. J. Roy. Stat. Soc., Ser. B, 9, 46–91.
Fisher, RA and Yates, F (1963) Statistical Tables for Biological, Agricultural and Medical Research, 6th edition. Longman Group, London.
Hall, LP (1975) A Manual of Methods for the Bacteriological Examination of Frozen Foods. Campden and Chorleywood Food Research Association, Chipping Campden.
Halvorson, HO and Ziegler, NA (1933) Application of statistics to problems in bacteriology. I. A means of determining bacterial population by the dilution method. J. Bacteriol., 25, 101–121.
Hedges, AJ, Howe, K and Linton, AH (1977) Statistical considerations in the sampling of Escherichia coli from intestinal sources for serotyping. J. Appl. Bacteriol., 43, 271–280.
Hurley, MA and Roscoe, ME (1983) Automated statistical analysis of microbial enumeration by dilution series. J. Appl. Bacteriol., 55, 159–164.
ICMSF (1978) Micro-organisms in Foods. Vol. 1. Their Significance and Methods of Enumeration, 2nd edition. University Press, Toronto.
ISO (1979) Meat and Meat Products – Detection and Enumeration of Enterobacteriaceae (Reference Methods). ISO 5552: 1979. International Organization for Standardization, Geneva.
ISO (2002) Microbiology of Food and Animal Feeding Stuffs – Horizontal Method for the Detection of Salmonella Species. ISO 6579: 2002. International Organization for Standardization, Geneva.
ISO (2005a) Microbiology of Food and Animal Feeding Stuffs – Polymerase Chain Reaction (PCR) for the Detection of Food-Borne Pathogens – Performance Testing for Thermal Cyclers. ISO/TS 20836: 2005. International Organization for Standardization, Geneva.
ISO (2005b) Microbiology of Food and Animal Feeding Stuffs – Polymerase Chain Reaction (PCR) for the Detection of Food-Borne Pathogens – General Requirements and Definitions. ISO 22174: 2005. International Organization for Standardization, Geneva.
ISO (2005c) Microbiology of Food and Animal Feeding Stuffs – Horizontal Method for the Detection and Enumeration of Presumptive Escherichia coli – Most Probable Number Technique. ISO 7251: 2005. International Organization for Standardization, Geneva.
ISO (2006a) Microbiology of Food and Animal Feeding Stuffs – Polymerase Chain Reaction (PCR) for the Detection of Food-Borne Pathogens – Requirements for Sample Preparation for Qualitative Detection. ISO 20837: 2006. International Organization for Standardization, Geneva.
ISO (2006b) Microbiology of Food and Animal Feeding Stuffs – Polymerase Chain Reaction (PCR) for the Detection of Food-Borne Pathogens – Requirements for Amplification and Detection for Qualitative Methods. ISO 20838: 2006. International Organization for Standardization, Geneva.
Jarvis, B (2007) On the compositing of samples for qualitative microbiological testing. Lett. Appl. Microbiol., 45, 592–598. doi: 10.1111/j.1472-765X.2007.02237.x
Meynell, GG and Meynell, E (1965) Theory and Practice in Experimental Bacteriology. Cambridge University Press, London.
Moran, PAP (1954a) The dilution assay of viruses. I. J. Hyg. (Camb.), 52, 189–193.
Moran, PAP (1954b) The dilution assay of viruses. II. J. Hyg. (Camb.), 52, 444–446.
Moran, PAP (1958) Another test for heterogeneity of host resistance in dilution assays. J. Hyg. (Camb.), 56, 319.
Patel, PD (1994) Microbiological applications of immunomagnetic techniques. In: Patel, PD (ed.) Rapid Analysis Techniques in Food Microbiology. Chapman & Hall, London.
Pearson, ES and Hartley, HO (1976) Biometrika Tables for Statisticians, 3rd edition. University Press, Cambridge.
Shaw, SJ, Blais, BW and Nundy, DC (1998) Performance of the Dynabeads anti-Salmonella system in the detection of Salmonella sp. in foods, animal feeds, and environmental samples. J. Food Prot., 61, 1507–1510.
Woodward, RL (1957) How probable is the most probable number? J. Am. Water Works Assoc., 49, 1060–1068.
9 STATISTICAL CONSIDERATIONS OF OTHER METHODS IN QUANTITATIVE MICROBIOLOGY
In addition to those methods considered previously, a range of other methods is used to estimate microbial numbers in foods. These fall into two major categories: direct microscopic methods of analysis and indirect methods of analysis, which are dependent upon some physical or chemical means of indirectly estimating microbial cell numbers. The development and application of ‘rapid methods’ has led to considerable interest in indirect methods to detect and estimate microorganisms in foods.
DIRECT MICROSCOPIC METHODS

Much has been written in the past about direct microscopic counts using techniques such as the haemocytometer, or other type of counting chamber, the Breed smear (Breed and Brew, 1925), the proportional (or ratio) count technique (Thornton and Gray, 1934) and the membrane filter technique. Technical details for these techniques are given in standard laboratory texts such as Meynell and Meynell (1965) and Harrigan (1998). The average number of microbial cells counted on each microscopic field will be related to the initial level present in the sample, or in a macerate of a solid food sample, and will be dependent upon the dilution factor, the area over which the sample is distributed and the area of the field of view of the microscope. The distribution of microbial cells is usually considered to follow either a negative binomial or a Poisson series (Chapters 3 and 4), although other more complex distributions may occur. For instance, reference has been made already to the work of Jones et al. (1948), who showed that the number of micro-colonies followed a Poisson distribution whilst the total number of cells followed the negative binomial. Similar observations have been made by other workers (e.g. Takahashi et al. (1964); see also Examples 3.4, 4.3 and 4.4).
TABLE 9.1 Bivariate Frequency Distribution of Reference and Test Cells per Unit Haemocytometer Cell Square

Number of test cells/square | Reference cells/square: 0–4 | 5–9 | 10–14 | 15–19 | 20–24 | 30–34 | Total
0–4 | 138 (146.2) | 114 (113.5) | 15 (10.9) | 1^a | – | – | 268
5–9 | 172 (171.5) | 286 (287.5) | 66 (60.6) | 4^a | – | – | 528
10–14 | 39 (27.9) | 98 (100) | 30 (44.2) | 7 (6.8) | 13^b (9.2) | – | 174
15–19 | 1^a | 8 (11.7) | 14 (10) | 2^a | – | – | 25
20–24 | – | 1^a | 3^a | – | – | – | 4
25–29 | – | – | – | – | – | 1^a | 1
Total | 350 | 507 | 128 | 14 | 1 | 1 | 1000

Source: Reproduced from Takahashi et al. (1964) by permission of the Japanese Journal of Infectious Diseases. Values in parentheses are expected frequencies based on the bivariate negative binomial distribution. Goodness-of-fit χ² = 15.86; 5% critical limit for χ²₉ = 16.92. Values marked ^a were combined to give the value marked ^b.
The ratio method is based on examination of a blend of the sample suspension with a suspension of particles of similar size to bacterial cells and having a known particle density. Numbers of microbial cells and particles are counted on each of several microscopic fields of view. A bivariate frequency distribution is obtained, since both the microbial cells and the reference cells (or particles) will be distributed in the counting chamber or on the microscope slide (Table 9.1). Takahashi et al. (1964) demonstrated that the distributions of both the reference particles and the test cells followed the negative binomial distribution. For a distribution conforming to Poisson, it has already been demonstrated (Table 7.2) that precision increases with an increase in the number of individuals counted, since σ² = μ. In the negative binomial series the variance and mean are related by the following expression (Chapter 3): σ² = μ + (μ²/k), and 1/k provides a measure of the excess variance due to clumping. Therefore, greatest precision will again result from counting a large number of cells, but for maximum precision these must be distributed over a large number of fields, since the variance is related to both the overall mean and the parameter k (Table 9.2). Hence, it is essential to count as many organisms as is technically feasible, over a large number of fields, in case the clumping effect is pronounced; where ratio counts are made, greatest precision results from use of approximately equal numbers of microbial cells and reference particles in each field. The standardized latex particles supplied for use with the Coulter electronic particle counter provide a suitable source of reference material. Apart from distribution errors, the errors of direct counts are related mainly to the inaccuracies of pipetting very small volumes of sample (Table 6.3; data of Brew, 1914), uneven distribution of the sample on the slide or in the counting chamber, inaccuracies in
TABLE 9.2 Limiting Precision for Selected Numbers of Fields Counted and Values of k̂ Where Contagion Is Apparent (for Details of Calculation see Example 7.2)

Mean number of cells/field | Number of fields counted^a | Limiting precision (to nearest %) for k̂ = 2 | k̂ = 3 | k̂ = 4 | k̂ = 5
2 | 500 | 100 / 136,000 | 100 / 28,290 | 99 / 11,894 | 99 / 6748
3 | 330 | 95 / 699 | 87 / 414 | 81 / 303 | 76 / 244
4 | 250 | 72 / 259 | 63 / 172 | 57 / 133 | 53 / 111
5 | 200 | 63 / 171 | 54 / 118 | 49 / 94 | 44 / 80
10 | 100 | 44 / 78 | 36 / 57 | 32 / 46 | 29 / 40
20 | 50 | 32 / 46 | 26 / 34 | 22 / 28 | 20 / 24

^a Assumes a total of 1000 cells counted.
the dimensions or use of counting chambers, the massive multiplication factor involved (Example 9.1) and inadequate mixing of the original sample.
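As a rough illustration of how clumping limits the precision of direct counts, the sketch below (my own, based on a normal approximation rather than the exact calculation of Example 7.2) uses the negative binomial variance μ + μ²/k to give an approximate 95% half-width, as a percentage of the mean, for a count spread over N fields:

```python
# Approximate 95% precision (%) of a mean field count under negative binomial counting:
# relative variance of the mean over N fields is (1/mu + 1/k)/N.
import math

def approx_precision_percent(mu, k, n_fields):
    return 196.0 * math.sqrt((1.0 / mu + 1.0 / k) / n_fields)

total_cells = 1000
for mu in (2, 5, 20):
    n_fields = total_cells // mu
    for k in (2, 5):
        print(f"mu = {mu:>2}, k = {k}: about +/-{approx_precision_percent(mu, k, n_fields):.0f}% "
              f"over {n_fields} fields")
```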
Howard Mould Count

A special form of microscopic count, developed to improve the quality of tomato-based products, is dependent for quantification on scoring fields as positive for fungal mycelium (i.e. containing at least a defined 'critical' amount) or negative (i.e. containing less than the critical level). Technical details of the methodology are given by AOAC (2006). The method is highly subjective: even skilled analysts obtain a variance of the order of 55% or more (Vas et al., 1959; Dakin et al., 1964; Jarvis, 1977). Since the analyst scores the number of fields with or without fungal mycelium, each series of replicate tests can be expected to follow the positive binomial distribution, but as the number of fields to be counted is large (not less than 100/sample) the normal approximation may be accepted. With a standard deviation of about 7.5%, we can put 95% confidence limits of ±15% on the observed value from a single Howard mould count (HMC) estimate. For a manufacturer to supply tomato-based products to a specification of (say) not greater than 50% positive fields, his control analyses must make due allowance for the variation in the method. If the standard deviation is 7.5%, then to avoid producing more than one in 20 lots with a HMC >50%, the manufacturer's control average must not exceed 39% when duplicate
EXAMPLE 9.1 DILUTION EFFECT IN MICROSCOPIC COUNTS ON A BACTERIAL SMEAR

Assume 0.01 ml of sample was distributed over an area of 1 cm². After fixing and staining, the slide was examined using microscope optics giving a field diameter of 16 μm. From 400 fields counted, the mean cell count/field was 2.42 with s² = 4.06 (data from Examples 3.4 and 4.2).

Calculate:
1. Area of field = πr² = π(16/2)² μm² = 201.06 μm² = A
2. The number of fields/cm² = (1/A) × 10^8 = 10^8/201 = 4.97 × 10^5
3. The mean number of cells/field = 2.42
4. The mean number of cells/cm² = 4.97 × 10^5 × 2.42 = 1.2 × 10^6
5. Since 0.01 ml of sample was tested, the direct cell count on the sample = 1.2 × 10^6 × 10^2 = 1.2 × 10^8 cells/ml.

Hence, the overall dilution factor was 4.96 × 10^7. This value is much greater than the mean cell count/field or the 95% CL of that count (2.3–2.55). An error of 10% in pipetting and spreading the sample would also affect the count relatively little; for example, if 0.009 ml had been pipetted instead of 0.01 ml, then the apparent count would have been 1.33 × 10^8, which does not differ practically from that determined above. It is clear that, with such large dilution factors involved, microscopic counts can be used only when large total cell numbers are anticipated, and that the precision of such counts is low.
samples are analysed.¹ For further consideration of lot tolerances the reader is referred to Davies (1954) and Duncan (1965). The HMC is affected by processes such as homogenization that disrupt and disperse mould mycelium (Eisenberg, 1968). Such processes totally negate the use of this method and it is, at best, a quality control method, although it is used by Regulatory Authorities in the United States of America and elsewhere.

¹ The 95% tolerance is given by x̄ ± 1.96σ/√n.
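A minimal sketch of the tolerance calculation referred to in footnote 1 (the numerical values are those quoted above):

```python
# Highest control mean HMC (%) that keeps fewer than 1 in 20 lots above the limit,
# using the normal approximation x_bar +/- 1.96*sigma/sqrt(n).
import math

def max_control_average(limit=50.0, sd=7.5, n_replicates=2):
    return limit - 1.96 * sd / math.sqrt(n_replicates)

print(round(max_control_average(), 1))   # about 39.6%, i.e. ~39% for duplicate counts
```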
INDIRECT METHODS

Although many indirect methods of enumeration have been developed, few are useful for food analysis without considerable modification. Indirect methods are based on measurements of physical (e.g. turbidity, packed cell volume, etc.) or chemical (e.g. protein, nucleic acid, ATP, chitin) properties of cell constituents or of cell metabolism (e.g.
electrical measurements, oxygen utilization, carbon dioxide production, acid production, etc.). Food constituents affect many of these techniques significantly, unless the microbial cells are separated from the food matrix before analysis (Wood et al., 1976). Since all indirect estimation systems are based on measurements other than counts, one may suppose that the results from any estimate will follow a normal distribution. Whilst this is true for specifically chemical measurements (e.g. pH, acidity, proteins, nucleic acids, ATP, chitin, oxygen uptake, etc.), measurement of such compounds in microbial cells rarely conforms to a normal distribution, even when pure cultures are tested. For instance, a plot of the mean values and the variance estimates of chitin, as a measure of fungal mycelium in tomato juice (Jarvis, 1977), shows a high degree of correlation (r_xy = 0.91, SE_r = 0.354), indicating a lack of independence (Fig. 9.1). Six of the nine sets of data, all associated with high mould count estimates, conform to a negative binomial distribution with individual values for k̂ ranging from 9 to 42. Consequently, although the results of estimations of pure chitin are normally distributed around the mean, the variance associated with estimation of chitin in food is affected by the contagious distribution of mould in the food. Estimation of microbial numbers by electrical impedance measurement is dependent upon electrical conductance and capacitance changes in culture media resulting from microbial growth. The time-to-detection of a specific impedance change (in μ-siemens) can be related to the initial number of microorganisms in the medium (Wood et al., 1977; Richards et al., 1978).
FIGURE 9.1 Plot of log10 variance against log10 mean level of fungal chitin (μg g⁻¹ solids) in tomato products (Jarvis, 1977). Regression equation: y = 2.7867x − 3.04; r = 0.91; standard error of r = 0.354.
For a pure culture growing synchronously, we should expect normal growth kinetics to apply (Richards et al., 1978), so that an inverse linear relationship would be found between the log initial cell density and the time-to-detection of a specific impedance change (Fig. 9.2). This is illustrated in Example 9.2. For pure cultures, the variance of the detection time is largely independent of the detection time so that, as expected, the detection times are distributed normally. However, when food samples are examined, the variance is related to the detection time, so the distribution is not 'normal' (Fig. 9.3). It may be supposed that this association reflects the distribution of organisms in the replicate test samples. Naturally contaminated samples will have a distribution not only of numbers of organisms capable of growth in the test system, but also of organisms with different specific growth rates, lag times, nutritional requirements, etc. Although physical and chemical analytical methods would be expected to yield normally distributed results, those obtained for estimation of microorganisms will be affected by the types, numbers, condition and distribution of the microorganisms in the sample. This discussion should not be taken to imply that such alternative methods are unsuitable for the purposes intended. Rather, its purpose is to make readers aware that misleading statements and claims about the precision of such methods may be made by those who are concerned with marketing new methodology concepts. It is for such reasons that all new methods need to be validated using approved statistical procedures (see Chapter 13).
EXAMPLE 9.2 IMPEDANCE MEASUREMENTS – RELATIONSHIP BETWEEN INITIAL NUMBERS AND TIME-TO-DETECTION OF A SPECIFIC IMPEDANCE CHANGE

How many organisms must be inoculated for detection to be achieved within 2, 4, 6 and 8 h incubation at a defined temperature?

For synchronous growth, N = N0 · 2^((T−L)/Td), where N = the number of organisms after time T, N0 = the number of organisms at time T0, Td = the doubling time and L = the lag period. Assume that the culture has an average population doubling time of 0.33 h (20 min), a lag period of 1 h, and that 10^8 organisms/ml are needed to show a specific impedance change. The initial contamination level (N0) can be derived from the equation N0 = N/2^((T−L)/Td), assuming synchronous growth and maintenance of the average population doubling time (Td). Since N = 10^8, Td = 0.333 and L = 1, then for a detection time (T) of 6 h, N0 = 10^8/2^15 ≈ 3.05 × 10^3. The results are shown below:

Detection time (T) (h) | Generation periods (T−L)/Td | Population increase 2^((T−L)/Td) | Initial level N0 (organisms/ml)
2 | 3 | 8 | 1.25 × 10^7
4 | 9 | 512 | 1.95 × 10^5
6 | 15 | 32,768 | 3.05 × 10^3
8 | 21 | 2,097,152 | 4.77 × 10^1

Hence, for detection after 2, 4, 6 or 8 h, the original samples would need to contain about 10^7, 2 × 10^5, 3 × 10^3 and 50 cells/ml, respectively.
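A sketch of the relationship derived in Example 9.2 (assumptions as stated there; the detection threshold, lag and doubling time below are the example's values):

```python
# Initial cell density needed for a given impedance detection time:
# N_0 = N_detect / 2^((T - L)/T_d), assuming synchronous growth.
def initial_density(T_hours, N_detect=1e8, lag=1.0, doubling_time=1.0/3.0):
    return N_detect / 2 ** ((T_hours - lag) / doubling_time)

for T in (2, 4, 6, 8):
    print(f"T = {T} h: N0 = {initial_density(T):.3g} cells/ml")
```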
FIGURE 9.2 Hypothetical plot of 'time-to-detection' (assuming this is related to final cell density) against initial cell numbers (data from Example 9.2). (Axes: detection time (h) against log N0 (viable cells/ml).)
FIGURE 9.3 Plot of log10 variance against log10 mean detection time (h) for food samples analysed by impediometry; regression equation y = 2.59x − 1.638; r = 0.72; standard error of r = 0.189.
References

Association of Official Analytical Chemists (2006) Official Methods of Analysis, 18th edition, Rev. 1. AOAC, Washington DC.
Breed, RS and Brew, JD (1925) Counting bacteria by means of the microscope. New York State Agricultural Experimental Station, Geneva, Technical Circular No. 58. 12 pp. Reprinted from Technical Bulletin No. 49. 31 pp. (1916).
Brew, JD (1914) A comparison of the microscopical method and the plate method of counting bacteria in milk. NY Agric. Expt. Sta. Bull., 373, 1–38.
Dakin, JCY, Smith, DJ and Taylor, AMcM (1964) Variance in the Howard mould count arising from observer and distribution differences. Technical Circular No. 259. Leatherhead Food Research Association.
Davies, OL (1954) The Design and Analysis of Industrial Experiments. Oliver & Boyd, London.
Duncan, AJ (1965) Quality Control and Industrial Statistics, 3rd edition. R.D. Irwin, Chicago.
Eisenberg, WV (1968) Mould counts of tomato products as influenced by different degrees of comminution. Q. Bull. Assoc. Food Drug Off. US, 32, 173–179.
Harrigan, WF (1998) Laboratory Methods in Food Microbiology, 3rd edition. Academic Press, London.
Jarvis, B (1977) A chemical method for the estimation of mould in tomato products. J. Food Technol., 12, 581–591.
Jones, PCT, Mollison, JE and Quenouille, MH (1948) A technique for the quantitative estimation of soil microorganisms. Statistical note. J. Gen. Microbiol., 2, 54–69.
Meynell, GG and Meynell, E (1965) Theory and Practice in Experimental Bacteriology. Cambridge University Press, London.
Richards, JCS, Jason, AC, Hobbs, G, Gibson, DM and Christie, RH (1978) Electronic measurement of bacterial growth. J. Phys. E., 11, 560–568.
Takahashi, K, Ishida, S and Kurokawa, M (1964) Statistical consideration on sampling errors in total bacteria cell count. J. Med. Sci. Biol., 17, 73–86.
Thornton, HG and Gray, PHH (1934) The numbers of bacterial cells in field soils as estimated by the ratio method. Proc. Roy. Soc. Lond. B. Biol. Sci., 115, 522.
Vas, K, Fabri, I, Kutz, N, Lang, A, Orbanyi, T and Szabo, G (1959) Factors involved in the interpretation of mold counts of tomato products. Food Technol. (Champaign), 13, 318–322.
Wood, JM, Jarvis, B and Wiseman, A (1976) The separation of microorganisms from food. Chem. Ind. (London), 27, 783–784.
Wood, JM, Lach, VH and Jarvis, B (1977) Detection of food-associated microbes using electrical impedance measurements. J. Appl. Bact., 43, xiv–xv.
10 MEASUREMENT UNCERTAINTY IN MICROBIOLOGICAL ANALYSIS
There are only two absolute certainties in life: death and taxes! Whatever task we undertake, no matter how menial or how sophisticated, we are faced with a lack of certainty in the outcome! It is therefore essential to have a basic understanding of what is meant by uncertainty in relation to microbiological data. In this chapter we shall consider briefly the definition of uncertainty, its causes and general aspects of uncertainty measurement. In the next chapter we shall consider details of the various approaches to its determination. Corry et al. (2006) have published a more detailed critical review of measurement uncertainty in quantitative microbiological analysis. In microbiological laboratory practice, we can identify many causes of variability. For instance: the ability of an isolate to give atypical reactions on a diagnostic medium; the use of the incorrect ingredients in a culture medium; the consequence of changing brands of commercial media; use of non-standard conditions in the preparation, sterilization and use of a culture medium or diluent; equipment errors; the tolerance applied to the shelf life of reagents; human errors in weighing, dispensing, pipetting and other laboratory activities; the relative skill levels of different analysts; the relative well-being of anyone who is doing the analyses; and so on, and so on … ad infinitum! These are but a few trite examples of biological, instrumental and personal bias that affect the accuracy, precision and hence the uncertainty of microbiological tests; a situation that constantly faces scientists involved in laboratory management. To interpret properly the results obtained using any analytical procedure, whether physical, chemical or biological, requires careful consideration of the diverse sources of actual or potential error associated with the results obtained. It is important to differentiate between the two meanings of the term 'error': in common parlance, error means mistakes, for instance failure properly to undertake a defined test protocol; in statistics 'error' means statistical variation. Any analytical result is influenced by a complex of three groups of statistical error: 1. Random errors, associated with the distribution of analytical targets in the primary sample matrix and in the analytical (test) sample, errors in the formulation of the culture media, etc.
2. Systematic errors associated with analytical equipment and procedures. 3. Modification of the systematic errors in a particular laboratory that carries out test procedures due to environmental, equipment and individual analysts’ personal traits.
ACCURACY AND PRECISION

'Accuracy' is a qualitative concept (VIM: ISO, 2007). In simple terms, accuracy can be defined as the correctness of a result, relative to an expected outcome; whilst 'precision' is a measure of the variability of test results. Accuracy is defined (ISO 2003, 2006a) as 'the closeness of agreement between a measurement result and the true value.' Accuracy is a combination of trueness and precision (a combination of random components and systematic error or bias components) although in the usual biostatistics terminology 'trueness' is generally called accuracy. This differs from the definition given by VIM (ISO, 2006b): 'the closeness of agreement between the result of a measurement and a true value of a measurand'. Accuracy is essentially the 'absence of error'; the more accurate a result the lower the associated error of the test. It is important to note that this definition of the term 'accuracy' applies only to results; in common parlance, people often use the term accuracy when they refer to 'the accuracy of a method' or 'the accuracy of a piece of equipment' such as a pipette. 'Trueness' is defined (ISO, 2003) as 'the closeness of agreement between the average value obtained from a large series of test results and an accepted reference value'. Trueness is equivalent to an absence of 'bias', which is the difference between the expectation of the test results and an accepted reference value and is a measure of systematic error. Trueness, sometimes referred to as accuracy, may correctly be contrasted with precision. 'Precision' is defined as the 'closeness of agreement between independent test results obtained under stipulated conditions'. Precision depends only on the distribution of random errors and does not relate to a true or specified value. The measure of precision is expressed usually in terms of imprecision and computed as a standard deviation of the test results. Lack of precision is reflected by a large standard deviation. Independent test results means results obtained in a manner not influenced by any previous results on the same or similar test material. Quantitative measures of precision depend critically on the stipulated conditions. Repeatability and reproducibility conditions are particular sets of extreme stipulated conditions (ISO, 2003). Relationships between trueness, accuracy, precision and uncertainty are illustrated schematically in Fig. 10.1 (Analytical Methods Committee, 2003). The concepts of accuracy and trueness must take account of error and precision. Uncertainty estimates (qv) provide a simple way to quantify such needs. However, since in a real-life situation we can never know the 'true' or 'correct' answer, trueness can be assessed only in a validation-type trial using an accepted reference material for which a 'true' concentration is known. This is more complex in microbiology than it is in physics and chemistry.
FIGURE 10.1 Relationships between trueness, accuracy, precision and uncertainty in analytical results (AMC, 2003). The schematic illustrates a series of shots at a target; in the original diagram the axes are labelled 'Improving trueness' and 'Improving precision', with a diagonal labelled 'Improving accuracy, decreasing uncertainty', and 'Error' marks the distance of a grouping from the bull's eye. At the bottom left the grouping of shots is neither true nor precise. Improving trueness without improving precision moves the grouping onto the target (top left), whilst improving precision, but not accuracy, results in a closer grouping of shots that are some distance (the error) from the 'bull's eye' (bottom right). Improving both trueness and precision (i.e. improving accuracy and decreasing uncertainty) gives a close grouping of shots around the bull's eye (reproduced by permission of the Royal Society of Chemistry, London).
MEASUREMENT UNCERTAINTY

The Eurachem (2000) definition of Uncertainty of a Measurement is: 'A parameter associated with the result of a measurement that characterises the dispersion of the values that could reasonably be attributed to the measurand'. The term 'measurand' is a bureaucratic way of saying 'analyte'. Translated into simple English, this definition can be rewritten as: 'Uncertainty is a measure of the likely range of values that is indicated by an analytical result.'

For quantitative data (e.g. colony counts or most probable number (MPN) values) a measure of uncertainty may be any appropriate statistical parameter associated with the test result. Such parameters include the standard deviation, the standard error of the mean or a confidence interval around that mean.

Measures of repeatability and reproducibility are the cornerstones of the estimation of analytical uncertainty. ISO (2004) defines Repeatability as 'a measure of variability derived under specified repeatability conditions', that is, independent test results obtained with the same method on identical test items in the same laboratory by the same analyst using the same equipment, batch of culture media and diluents, and tested within a short interval of time. By contrast, Reproducibility is 'a measure of variability derived under reproducibility conditions', that is, test results obtained with the same method on identical test items in different laboratories with different operators using different equipment and at different times. Valid statements of repeatability and reproducibility require specification of the conditions used.
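These summary statistics are readily computed once counts have been log-transformed. The short Python sketch below is purely illustrative: the replicate counts are invented, and the 95% confidence interval uses Student's t as one reasonable choice.

import numpy as np
from scipy import stats

# Hypothetical replicate colony counts (cfu/g) from one sample, for illustration only
counts = np.array([48000, 61000, 52000, 75000, 66000])
log_counts = np.log10(counts)            # log10 transformation to 'normalize' the data

n = len(log_counts)
mean = log_counts.mean()
sd = log_counts.std(ddof=1)              # sample standard deviation
sem = sd / np.sqrt(n)                    # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% Student's t value
ci = (mean - t_crit * sem, mean + t_crit * sem)

print(f"mean = {mean:.2f} log10 cfu/g, SD = {sd:.2f}, SEM = {sem:.2f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} log10 cfu/g")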
Intermediate reproducibility (ISO, 1994) is defined as 'a measure of reproducibility derived under reproducibility conditions within a single laboratory'.

Standard uncertainty (uy) of a measurement (y) is defined (ISO, 1995) as 'the result obtained from the values of a number of other quantities, equal to the positive square root of a sum of terms, the terms being the variances or covariances of these other quantities weighted according to how the measurement result varies with changes in these quantities'. More simply, this means that the standard uncertainty is the square root of the sum of those variances that are likely to have influenced the result.

Expanded uncertainty (U) is 'the quantity defining an interval about the result of a measurement expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand' (ISO, 1995). Expanded uncertainty values are derived by multiplying the standard uncertainty by a 'coverage factor' to provide confidence intervals for repeatability and reproducibility around the mean value. Routinely, a coverage factor of 2 is used to give approximate 95% distribution limits (95% confidence interval) around a 'normalized' mean value.

For quantal data (e.g. presence or absence tests) uncertainty measures cannot be derived in the same way. However, procedures that derive quantitative estimates of quantal responses can be used to provide a measure of variability, and hence of uncertainty, for example the standard error associated with derived values for an LOD50 (qv) or for the relative proportions of positive and negative results that conform to a binomial distribution in a comparative evaluation of methods (see below).
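The arithmetic behind an expanded uncertainty statement is straightforward. The following Python sketch uses the illustrative figures employed in the Reporting of Uncertainty section later in this chapter (mean 5.00 log10 cfu/g, reproducibility standard deviation 0.25 log10 cfu/g) and a coverage factor of 2; it also shows why back-transformed limits are asymmetrical on the arithmetic scale.

mean_log = 5.00      # mean aerobic colony count, log10 cfu/g
s_R = 0.25           # reproducibility standard deviation, log10 cfu/g
k = 2                # coverage factor for ~95% confidence

U = k * s_R                                   # expanded uncertainty = 0.50 log10 cfu/g
lo_log, hi_log = mean_log - U, mean_log + U   # 4.5 to 5.5 log10 cfu/g

# Back-transformation: the limits are not symmetrical around 100,000 cfu/g
lo_cfu, hi_cfu = 10 ** lo_log, 10 ** hi_log   # ~31,600 to ~316,200 cfu/g

print(f"{mean_log:.2f} +/- {U:.2f} log10 cfu/g")
print(f"equivalent to {lo_cfu:,.0f} to {hi_cfu:,.0f} cfu/g (95% probability)")

Note that the back-transformed limits (about 31,600 and 316,200 cfu/g) are far from symmetrical about 100,000 cfu/g, which is why results are best reported on the log scale.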
HOW IS UNCERTAINTY ESTIMATED?

There are two totally different approaches to the estimation of uncertainty (ISO, 1995):

The 'bottom-up' approach uses the estimates of all errors associated with all the relevant steps undertaken during an analysis to derive a value for a 'combined standard uncertainty' associated with a method (Eurachem, 2000; Niemelä, 2002, 2003; NMKL, 2002). Essentially this approach provides a broad indication of the possible level of uncertainty associated with a method rather than with a measurement. In practice this approach will always underestimate the extent of variation, since it cannot take into account either sample matrix-associated errors or the actual day-to-day variation seen in a laboratory. For these reasons, this approach is not considered to be appropriate for microbiological analyses (ISO, 2006b; Corry et al., 2006).

The 'top-down' approach is based on statistical analysis of data generated in intra- or inter-laboratory collaborative studies of a method used to analyse a diversity of matrices. It therefore provides an estimate of the uncertainty of a measurement associated with the result obtained using a specific method (see e.g. Corry et al., 2006, 2007; Jarvis et al., 2007a, b).

Quantitative tests: For quantitative data (e.g. colony counts and MPN estimates), measures of 'repeatability' and 'reproducibility' are derived as the standard deviations of repeatability (sr) and reproducibility (sR). However, microbiological data do not usually conform to a 'normal' distribution, and therefore require mathematical transformation prior to statistical analysis. For most purposes, a log10 transformation is used to 'normalize' the data
but in cases of significant over-dispersion the use of a negative binomial transformation may be necessary (Jarvis, 1989; Niemelä, 2002). If there is reason to believe that data conform to a Poisson distribution, then a square root transformation is required, since the population variance (σ²) is numerically equal to the mean (μ) value.

Statistical analyses of collaborative trial data are generally done by analysis of variance (ANOVA) after removing any outlying values, as described by Youden and Steiner (1975) and by Horwitz (1995). However, it has been argued (e.g. Analytical Methods Committee, 1989a, b, 2001) that it is wrong to eliminate outlier data and that the application of robust methods of analysis is preferable. One approach to robust analysis is a 'robusticized' ANOVA procedure based on Huber's H15 estimators for the robust mean and standard deviation of the data (AMC, 1989a, b; ISO, 1998; AMC, 2001). An alternative approach is that of the recursive median (REMEDIAN) procedure (ISO, 2000). A major drawback to the use of published robust techniques for the analysis of data from inter-laboratory trials is that they do not permit simple derivation of components of variance. A novel approach to overcome this disadvantage is the use of stepwise robust analysis for 'nested' trial data, as described by Hedges and Jarvis (2006). Examples of the use of traditional ANOVA, robust ANOVA and related procedures are given in Chapter 11.

Intermediate reproducibility: Similar procedures can be used to estimate the intermediate (i.e. within-laboratory) reproducibility associated with the use of an analytical procedure in a single laboratory. Even data obtained, for instance, in laboratory quality monitoring can be used to provide an estimate of intra-laboratory reproducibility. ISO (2006b) describes a statistical procedure for the analysis of paired data. A worked example is provided in Chapter 11.

Qualitative tests: Estimation of uncertainty associated with quantal test methods (e.g. presence or absence) is currently the subject of much discussion. Many of the potential errors that affect quantitative methods also affect qualitative methods, but there are also some additional potential errors that are inherent in the analytical procedure. By definition, the output from a series of quantal tests is a number of either positive or negative responses (see Chapter 7). There is an intrinsic need to ensure effective growth of the index organism to critical levels during all the cultural stages of such tests, so culture medium composition, incubation times and temperatures, etc. are critical to the success of the test. It is critical also to ensure that the confirmatory stages of a test protocol can actually identify the target organism. Knowledge of the potential effect of competitive organisms is also of major importance for both the cultural and confirmatory stages of a test protocol. One approach to quantifying such data was the derivation of the Accordance and Concordance concept (Langton et al., 2002), which sought to provide measures 'equivalent to the conceptual aspects of repeatability and reproducibility'. However, experience suggests that this approach is not sufficiently robust to be used in the manner proposed, since it merely reinterprets the original data.
Provided that a sufficient number of parallel tests has been undertaken at each of several levels of potential contamination, it is possible to quantify the test responses in terms of an estimated level of detection, for example the level at which 50% of tests are positive (LOD50) (Hitchins and Burney, 2004). Note that in this context the term LOD refers to the level of detection, not to the
more usual Limit of Detection. This statistical approach essentially estimates the MPN of organisms at each test level and then analyses the relative MPN values using the Spearman–Kärber approach. Current investigations have shown that alternative approaches using a generalized linear model with probit, logit or loglog analyses may be more appropriate (H van der Voet, 2007, personal communication). The common theme of these approaches is to transform purely qualitative data into a quantitative format to which error values can be assigned in order to derive an estimate of the uncertainty of the test result. An extrapolation of the approach is to determine the absolute limit of detection (LOD0) and a selected higher limit of detection (e.g. LOD90) such that a dose–response curve can be derived. This may be of importance in differentiating between methods capable of detecting specific organisms at a similar LOD50 level but for which the LOD0 and LOD90 values differ. An alternative approach is to estimate the uncertainty associated with the proportions of test samples giving a positive response, based on the binomial distribution. Examples of the way in which such approaches to the analysis of qualitative data can be used are illustrated in worked examples in Chapter 11.

REPORTING OF UNCERTAINTY

The expression of uncertainty is of some importance. Assuming a mean aerobic colony count of 5.00 log10 cfu/g and a reproducibility standard deviation of 0.25 log10 cfu/g, the 95% expanded uncertainty value is 0.50 log10 cfu/g. The results could be reported, for instance, as: the aerobic colony count on product X is 5.00 ± 0.50 log10 cfu/g, with a 95% probability; or 5.00 log10 cfu/g ± 10%, with a 95% probability; or between 10^4.5 and 10^5.5 cfu/g, with a 95% probability; or between 4.5 and 5.5 log10 cfu/g, with a 95% probability.

If log-transformed data are back-transformed (5.0 log10 cfu/g = 100,000 cfu/g), then the associated uncertainty estimate limits will not be linear. For instance, back-transformation of a log colony count of 5.0 ± 0.50 log10 cfu/g is equivalent to a count of 100,000 cfu/g with 95% uncertainty limits of approximately 31,600 and 316,200 cfu/g.

It is important not to refer to analytical methods as having a precision of, for example, ±10% based on uncertainty estimates. Uncertainty is a measure of variability in a test result and is therefore a measure of the lack of precision in the result, rather than an estimate of the precision of the method itself.

SAMPLING UNCERTAINTY

Estimates of measurement uncertainty do not take account of the distribution of organisms within a lot, within a primary sample or within a test sample. Individual samples of foodstuff may be drawn either from a bulk lot or from individual primary and secondary
packaged units within the lot, as discussed in Chapters 5 and 6. Estimates of measurement uncertainty are based on the preparation of an initial suspension of the analytical sample from which serial dilutions are prepared for testing. It is assumed that the analytical samples have been drawn randomly and with care to ensure that they are representative of the primary sample unit or, indeed, of the entire lot of product. In practice this may not be so, no matter how carefully the samples are drawn. It is also generally assumed that microbial cells are randomly distributed within the lot, the sample unit and the test sample, but there are many cases of heterogeneous dispersion of microbial cells within a food matrix. Consequently it is not unreasonable to suppose that any estimation of measurement uncertainty may underestimate the total level of uncertainty associated with an analysis. The food matrix may affect the estimate of measurement uncertainty because of its effect on the recovery of organisms and/or interactions between food particles and the test medium. Such effects will tend to increase the estimate of measurement uncertainty per se but do not provide estimates of sampling uncertainty. Investigation of the chemical composition of diverse foodstuffs (Ramsey et al., 2001; Lyn et al., 2002, 2003) has demonstrated that the sampling uncertainty is at least as great, and is often greater than, the estimate of measurement uncertainty. It can reasonably be assumed that the same situation will apply to the microbiological examinations of foods, but there is little published data. In an examination of imported prawns by a UK Health Protection Agency laboratory, sufficient replicate samples and analyses were taken to estimate the extent of both sampling and measurement uncertainty (Jarvis et al., 2007b). The estimate of total uncertainty was 18.6%. The contribution of sampling uncertainty to the total was 15.3%, whilst the estimated measurement uncertainty was 10.8%. It is probable that the extent of sampling uncertainty for microbiological examination of foods will be at least as great as the estimate of the actual measurement uncertainty. As is the case with chemical analyses, such underestimates may be critical in assessing the compliance of food materials with legislative and commercial microbiological criteria for foods.
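If the sampling and measurement components are assumed to be independent, so that they combine in quadrature, the split quoted above for the prawn survey can be checked in a line or two of Python. The quadrature assumption is mine, not a statement of how the published estimates were derived, and it only approximately reproduces the reported total.

import math

u_sampling = 15.3      # sampling uncertainty (%) quoted above for the prawn survey
u_measurement = 10.8   # measurement uncertainty (%)

# If the two components are independent, they combine in quadrature
u_total = math.sqrt(u_sampling**2 + u_measurement**2)
print(f"combined uncertainty ~ {u_total:.1f}%")   # ~18.7%, close to the reported 18.6%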
THE USE OF UNCERTAINTY MEASURES IN ASSESSING COMPLIANCE

The role of measurement uncertainty in relation to compliance of a test result with defined microbiological criteria for foods is of considerable importance and is discussed in Chapter 14.
References

Analytical Methods Committee (1989a) Robust statistics – How not to reject outliers. Part 1: Basic concepts. Analyst, 114, 1693–1697.
Analytical Methods Committee (1989b) Robust statistics – How not to reject outliers. Part 2: Interlaboratory trials. Analyst, 114, 1699–1702.
Analytical Methods Committee (2001) Robust Statistics: A Method of Coping with Outliers. AMC Brief No. 6, Analytical Methods Committee, Royal Society of Chemistry, London. http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/TechnicalBriefs.asp
Analytical Methods Committee (2003) Terminology – The Key to Understanding Analytical Science. Part 1: Accuracy, Precision and Uncertainty. AMC Brief No. 13, Analytical Methods Committee, Royal Society of Chemistry, London. http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/TechnicalBriefs.asp
Corry, JEL, Jarvis, B, Passmore, S, and Hedges, A (2006) A critical review of measurement uncertainty in the enumeration of food microorganisms. Food Microbiol., 24, 230–253.
Corry, JEL, Jarvis, B, and Hedges, A (2007) Measurement uncertainty of the EU methods for microbiological examination of red meat. Food Microbiol., 24, 652–657.
Eurachem (2000) Quantifying Uncertainty in Analytical Measurement, 2nd edition. Laboratory of the Government Chemist, London.
Hedges, A and Jarvis, B (2006) Application of robust methods to the analysis of collaborative trial data using bacterial colony counts. J. Microbiol. Meth., 66, 504–511.
Hitchins, AD and Burney, AA (2004) Determination of the Limits of Detection of AOAC Validated Qualitative Microbiology Methods. AOAC International 118th Annual Meeting Program, Poster Abstract # P-1021, p. 153.
Horwitz, W (1995) Protocol for the design, conduct and interpretation of method performance studies. Pure Appl. Chem., 67, 331–343.
ISO (1994) Accuracy (Trueness and Precision) of Measurement Methods and Results – Part 2: Basic Method for the Determination of Repeatability and Reproducibility of a Standard Measurement Method. International Organisation for Standardisation, Geneva. ISO 5725-2:1994.
ISO (1995) Guide to the Expression of Uncertainty in Measurement (GUM). International Organisation for Standardisation, Geneva. ISO/IEC Guide 98:1995.
ISO (1998) Accuracy (Trueness and Precision) of Measurement Methods and Results – Part 5: Alternative Methods for the Determination of the Precision of a Standard Measurement Method. International Organisation for Standardisation, Geneva. ISO 5725-5:1998.
ISO (2000) Microbiology of Food and Animal Feeding Stuffs – Protocol for the Validation of Alternative Methods. International Organisation for Standardisation, Geneva. ISO 16140:2000.
ISO (2003) Statistics – Vocabulary and Symbols. Part 1: General Statistical Terms and Terms used in Probability. International Organisation for Standardisation, Geneva. ISO 3534-1:2003.
ISO (2004) Guidance for the Use of Repeatability, Reproducibility and Trueness Estimates in Measurement Uncertainty Estimation. International Organisation for Standardisation, Geneva. ISO TS 21748:2004.
ISO (2006a) Statistics – Vocabulary and Symbols. Part 2: Applied Statistics. International Organisation for Standardisation, Geneva. ISO 3534-2:2006.
ISO (2006b) Microbiology of Food and Animal Feeding Stuffs – Guide on Estimation of Measurement Uncertainty for Quantitative Determinations. International Organisation for Standardisation, Geneva. ISO PTDS 19036:2006.
ISO/IEC (2007) International vocabulary of metrology – Basic and general concepts and associated terms (VIM). International Organisation for Standardisation, Geneva. ISO/IEC Guide 99:2007.
Jarvis, B (1989) Statistical aspects of the microbiological analysis of foods. Prog. Appl. Microbiol., 21. Elsevier, Amsterdam.
Jarvis, B, Hedges, A, and Corry, JEL (2007a) Assessment of measurement uncertainty for quantitative methods of analysis: Comparative assessment of the precision (uncertainty) of bacterial colony counts. Int. J. Food Microbiol., 116, 44–51.
Jarvis, B, Corry, JEL, and Hedges, A (2007b) Estimates of measurement uncertainty from proficiency testing schemes, internal laboratory quality monitoring and during routine enforcement examination of foods. J. Appl. Microbiol., 103, 462–467.
Langton, SD, Chevennement, R, Nagelkerke, N, and Lombard, B (2002) Analysing collaborative trials for qualitative microbiological methods: Accordance and concordance. Int. J. Food Microbiol., 79, 171–181.
Lyn, JA, Ramsey, MH, and Wood, R (2002) Optimised uncertainty in food analysis: application and comparison between four contrasting 'analyte–commodity' combinations. Analyst, 127, 1252–1260.
Lyn, JA, Ramsey, MH, and Wood, R (2003) Multi-analyte optimisation of uncertainty in infant food analysis. Analyst, 128, 379–388.
Niemelä, SI (2002) Uncertainty of Quantitative Determinations Derived by Cultivation of Microorganisms, 2nd edition. Centre for Metrology and Accreditation, Advisory Commission for Metrology, Chemistry Section, Expert Group for Microbiology, Helsinki, Finland. Publication J3/2002.
Niemelä, SI (2003) Measurement uncertainty of microbiological viable counts. Accredit. Qual. Assur., 8, 559–563.
NMKL (2002) Measurement of uncertainty in microbiological examination of foods. NMKL Procedure No. 8, 2nd edition. Nordic Committee on Food Analysis.
Ramsey, MH, Lyn, JA, and Wood, R (2001) Optimised uncertainty at minimum overall cost to achieve fitness-for-purpose in food analysis. Analyst, 126, 1777–1783.
Youden, WJ and Steiner, EH (1975) Statistical Manual of the AOAC. AOAC, Washington.
11 ESTIMATION OF MEASUREMENT UNCERTAINTY
In the previous chapter we considered the definitions of, and general approaches to, the estimation of measurement uncertainty. This chapter considers the procedures in more detail and provides worked examples of different methods for the estimation of uncertainty.

THE 'GENERALIZED UNCERTAINTY METHOD' (GUM) OR BOTTOM-UP PROCEDURE

The basis of the generalized uncertainty method (GUM) 'bottom-up' approach described in Eurachem (2000) is to identify and take into account the cumulative variance associated with all stages of an analytical method. In order to estimate a generic level of uncertainty for a method, the variance associated with each individual stage is combined with the variances and covariances of all the other stages that make up an analytical procedure. This is illustrated diagrammatically in the following schematic:
Sample Matrix → Sampling Procedure → Analytical Method → Analytical Result
Errors Associated with the Microbial Distribution in the Sample Matrix

The largest potential error sources in the sample matrix are: the spatial distribution of microorganisms (e.g. random, under- or over-dispersion); the condition of the microorganisms (viable, sublethally damaged, non-cultivatable); the presence of competitive organisms that might affect the recoverability of specific types; and the location of the organisms on or within
the matrix (i.e. primarily on the surface or distributed throughout the matrix). However, the intrinsic composition of the matrix may also affect the results of an analysis.
Errors Associated with the Sampling Process

It is assumed throughout that any sample has been drawn totally at random and without any deliberate bias. How representative of the lot is the analytical sample? Should the analytical sample be representative of the whole matrix, or should it relate only to a specific part, for example the surface of a meat carcass? If the former, should the matrix be homogenized prior to taking a sample? If the latter, will the technique used to sample the surface layer (excision, swabbing, rinsing or use of a replica plating technique) affect the results obtained? Is the microflora in the analytical sample representative, both numerically and typically, of the microorganisms in the original matrix? What size of sample should be tested? Increasing the size of an analytical sample frequently results in an apparent increase in the colony count whilst reducing the variance (Brown, 1977; Kilsby and Pugh, 1981).
Errors Associated with Use of a Microbiological Method

At its simplest, the analysis consists of:

● suspending and macerating the analytical sample in a defined volume of a suitable primary diluent;
● preparing serial dilutions;
● transferring measured volumes onto or into a culture medium;
● incubating the culture plates (or tubes);
● counting and recording the numbers of colonies;
● deriving a final estimate of colony forming units (cfu) in the original matrix.
Some errors, for example those associated with the accuracy of weighing, pipette volumes, colony counting, etc., can be quantified and measures of the variance can be derived. Other errors can be assessed, but not necessarily quantified: for instance, the extent to which a culture medium supports the growth of specific organisms. It has been suggested that errors in the ability of a particular culture medium to support growth of specific organisms should be quantified and used to provide a correction factor for the yield of such organisms. In most circumstances it is debatable whether correction factors should ever be used in microbiological practice, although this is commonplace in chemical analysis! Errors associated with individual technical performance on a given day cannot be quantified. Some analytical errors associated with microbiological practices are probably less significant than others, but how do you know which these are if the errors cannot be quantified?

To assess the uncertainty of an analytical microbiological procedure from the 'bottom-up' requires a full evaluation of all potential sources of error for each and every stage of an
analytical procedure.

FIGURE 11.1 A simple cause and effect (fishtail) diagram illustrating some of the many sources of variance in a standard aerobic colony count (reproduced from Jewell (2004), by permission of the Campden and Chorleywood Food Research Association). The branches of the original diagram include sample receipt and storage, sample preparation (sub-sampling, dilution, homogenisation, neutralisation), media preparation, inoculation, resuscitation, delay, incubation, counting and calculation of the final log(cfu/g).

Figure 11.1 is a 'cause and effect' diagram to illustrate some of the sources of errors likely to affect a colony count procedure. Once a reliable schedule of quantifiable errors has been produced, the combined variance is obtained by summing the individual contributory variances:

sR² = sa² + sb² + … + sx² + sy² + sz²

where sR² = the reproducibility variance of the method and sa², …, sz² = the variances of stages a, …, z within the overall method. By definition, the standard uncertainty is the reproducibility standard deviation (sR), derived from the square root of the combined variance:

sR = √(sa² + sb² + … + sx² + sy² + sz²)

However, it must be recognized that covariances may also need to be taken into account, and many of these are not immediately obvious. The expanded uncertainty is derived by multiplying the standard uncertainty by a coverage factor k, which has a value from 2 to 3. A value of 2 is normally used to give approximate 95% confidence limits; hence

U = k · sR = 2 · sR for a 95% confidence interval
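As a minimal illustration of this 'bottom-up' combination, the Python sketch below sums a set of invented stage variances and applies the coverage factor; covariance terms are ignored, as noted above, so the result is purely indicative.

import math

# Hypothetical component standard deviations (log10 cfu/g) for individual stages,
# e.g. weighing, dilution, plating, counting - the values are illustrative only
component_sds = {"weighing": 0.02, "dilution": 0.05, "plating": 0.08, "counting": 0.10}

combined_variance = sum(sd**2 for sd in component_sds.values())
s_R = math.sqrt(combined_variance)   # combined standard uncertainty
U = 2 * s_R                          # expanded uncertainty, coverage factor k = 2

print(f"standard uncertainty = {s_R:.3f}; expanded uncertainty (k=2) = {U:.3f} log10 cfu/g")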
Niemelä (2002) provides a more detailed explanation of this approach to the assessment of measurement uncertainty in microbiological analysis. In my opinion, this step-wise approach is not satisfactory for the microbiological examination of foods because of the difficulty of building a fully comprehensive model of the measurement process. It is difficult to quantify the variance contribution of many individual steps in the analytical process, not least because the analyte is a living organism whose physiological state can be variable, and because the analytical target includes different strains, species and genera of microbes. In other words, microbiological methods are generally unsuitable for a rigorous and statistically valid metrological procedure for estimation of measurement uncertainty (Anon., 2006a).
THE TOP-DOWN APPROACH TO ESTIMATION OF UNCERTAINTY

In this approach, the parameters used to derive uncertainty measures are estimated from the pooled results of valid inter-laboratory collaborative studies or, in the case of intermediate reproducibility, from an intra-laboratory study. Appropriate procedures to ensure that the study design is valid have been described by Youden and Steiner (1975), Anon. (1994, 1998) and Horwitz (1995).

Quantitative microbiological data (e.g. colony counts and most probable numbers (MPNs)) do not conform to a normal distribution and require transformation to 'normalize' the data before analysis. The choice of transformation is dependent upon the distribution of the original data. Routinely, most microbiologists transform their data by converting each data value (xi) into its log10 value (yi), where yi = log10 xi. For purely theoretical reasons it may be more correct to transform the data using the natural logarithm (i.e. yi = ln xi) rather than the logarithm to base 10. For low-level counts (typically <100 cfu/g) that conform to the Poisson distribution (mean value (m) = variance (s²)), the data should be transformed by taking the square root of each data value (i.e. yi = √xi). Even at low count levels the sampling variance will still be inflated by the variances of the counting method, so the log transformation is usually preferable. However, because of the problems of over-dispersion frequently associated with microbial distributions, it may be preferable to test for conformance with a negative binomial distribution. Some statistical packages (e.g. Genstat) include a facility to assess conformance with a negative binomial, using the maximum likelihood programme RNEGBINOMIAL, but such procedures are not universally available and the calculation is very time-consuming to do manually (see Chapter 3; and Niemelä, 2002).
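The transformations described above are easily applied before any further analysis. The Python sketch below uses invented counts; the variance-to-mean ratio is included only as a rough indicator of over-dispersion relative to the Poisson expectation, not as a formal test of conformance with the negative binomial.

import numpy as np

counts = np.array([120, 250, 310, 95, 180, 400, 220, 150])   # illustrative colony counts (cfu/g)

log_counts = np.log10(counts)       # usual transformation for counts well above ~100 cfu/g
sqrt_counts = np.sqrt(counts)       # appropriate if the counts are Poisson-distributed

# Rough over-dispersion check: for a Poisson distribution the variance equals the mean,
# so a ratio well above 1 suggests over-dispersion (possibly negative binomial)
dispersion_ratio = counts.var(ddof=1) / counts.mean()
print(f"variance/mean ratio = {dispersion_ratio:.1f}")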
ANALYSIS OF VARIANCE (ANOVA)

The transformed data from all participating laboratories are subjected to analysis of variance (ANOVA) after first checking for conformance to a 'normal' distribution (qv) and identification and removal of 'outliers' (qv) followed, if necessary, by repeating the tests for conformance to 'normality'.
Tests for 'Normality'

Before doing any tests for 'normality' it is essential to undertake a descriptive analysis of the distribution of the data by plotting histograms, box-plots, etc. in order to inspect the distribution. The key element is to obtain approximate symmetry (see e.g. Fig. 11.2). Various tests can be used to assess the 'normality' of data, including the Box–Cox normality plot (Box and Cox, 1964), the Shapiro–Wilk test (Shapiro and Wilk, 1965), the Anderson–Darling test (Stephens, 1974), the D'Agostino–Pearson test (D'Agostino, 1986)
and the Kolmogorov–Smirnov test (Chakravarti et al., 1967). Details of such tests are given in standard statistical texts (e.g. D'Agostino, 1986) and are included in many statistical software packages.

How useful are these tests? Normality tests have little power to tell whether or not a small sample of data comes from a normal distribution, so small data sets almost always pass a normality test. With large samples, minor deviations from normality may be flagged as statistically significant, even though such deviations would not affect the results of subsequent statistical tests. If you decide to use normality tests, first consider whether they provide useful information. D'Agostino (1986) says, 'The Kolmogorov–Smirnov test is only a historical curiosity. It should never be used'. It is, however, one of the standard tests included in many statistical software packages. The Shapiro–Wilk normality test is complex and does not work well when several values in a data set are the same. In contrast, the D'Agostino–Pearson omnibus test is easy to understand. It first analyses the data to determine skewness (i.e. to quantify the asymmetry of the distribution) and kurtosis (to quantify the shape of the distribution), calculates the extent to which each of these values differs from the value expected for a normal distribution, and then computes a single probability value from the sum of the squares of these discrepancies. Unlike the Shapiro–Wilk test, this test is not affected if the data contain identical values.

FIGURE 11.2 Descriptive statistics, together with frequency, box and normality plots, of one set of inter-laboratory study data. Although there is evidence of kurtosis and positive skewness, the log-transformed data conform reasonably to a 'normal' distribution. The box plot shows the presence of a potential low-level outlier and a significant high-level outlier.
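Where suitable software is available, the Shapiro–Wilk and D'Agostino–Pearson tests can be applied directly; the Python sketch below uses the scipy implementations on simulated log10 counts and is intended only to show the mechanics.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
log_counts = rng.normal(loc=5.7, scale=0.4, size=40)   # simulated log10 colony counts

w, p_sw = stats.shapiro(log_counts)        # Shapiro-Wilk test
k2, p_dp = stats.normaltest(log_counts)    # D'Agostino-Pearson omnibus test (skewness + kurtosis)

print(f"Shapiro-Wilk: W = {w:.4f}, p = {p_sw:.3f}")
print(f"D'Agostino-Pearson: K2 = {k2:.2f}, p = {p_dp:.3f}")
# Small p-values (<0.05) would suggest departure from normality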
Tests for Outliers

In a set of replicate measurements one or more values may differ considerably from the majority. In such a case there is always a strong motivation to eliminate the deviant values because of their effect on subsequent calculations, such as variance measurements. This is permissible only if the suspect values can be 'legitimately' characterized as outliers. An outlier is defined as 'a value that arises from a distribution that differs from the main body of data'. Although this definition implies that an outlier may be found anywhere within the range of observations, it is natural to suspect and examine as possible outliers only the extreme values. Before taking a decision, it is essential to check the raw data for evidence of calculation or transcription errors, or identifiable technical errors in the test undertaken. Only in the absence of any such valid explanation should suspect values be treated as statistical 'outliers'. The rejection of suspect observations should not be based exclusively on objective criteria if there is a risk that the data are not normally distributed, or that there is extensive kurtosis or skewness in the distribution. Statistically sound tests for the detection of outliers provide evidence that one or more data values lie outside the expected normal distribution of the data, but not that those values are intrinsically wrong.

Youden and Steiner (1975), Anon. (1994) and Horwitz (1995) provide details of the most commonly used statistical tests for outliers. Youden and Steiner (1975) describe procedures for deciding whether or not one or more values should be excluded from a data set, including tests, based on a ranking procedure, to assess whether sub-sets of data from individual laboratories conform to the general data set. They also discuss tests for ruggedness and the effect of missing values in an ANOVA procedure. Anon. (1994) and Horwitz (1995)
provide details of the Cochran and Grubbs tests that are used to identify which individual data values should be described as 'outliers'. Details of the methods, and illustrations of their use to examine data for outliers, are given in Examples 11.1 and 11.2.
EXAMPLE 11.1 THE YOUDEN AND STEINER (1975) PROCEDURE TO IDENTIFY OUTLIER LABORATORIES IN A COLLABORATIVE STUDY

Seven laboratories carried out tests for total aerobic and Enterobacteriaceae colony counts on a number of samples (Corry et al., 2007). A preliminary evaluation was made to assess whether the data from any laboratory might be statistically different to the others. The data are tabulated and ranked from 1 to 7 for each sample, as illustrated in Table 11.1.

TABLE 11.1 Ranking of Enterobacteriaceae Colony Counts from Sample C in an Inter-laboratory Study of Microbiological Methods for Analysis of Meat

Colony count (log10 cfu/g) and rank, by laboratory

Laboratory   A1S1 count (rank)   A1S2 count (rank)   A2S1 count (rank)   A2S2 count (rank)   Total rank score
L1           3.86 (3)            3.82 (3)            4.06 (3)            3.96 (4)            13
L2           4.07 (4)            3.99 (4)            3.77 (2)            4.11 (5)            15
L3           4.95 (7)            4.99 (7)            4.60 (6)            4.63 (6)            26
L4           4.40 (6)            4.40 (5)            4.09 (4)            3.86 (2)            17
L5           3.58 (1)            3.53 (1)            3.47 (1)            3.47 (1)            4
L6           4.29 (5)            4.41 (6)            4.95 (7)            4.79 (7)            25
L7           3.81 (2)            3.78 (2)            4.18 (5)            3.95 (3)            12

Source: Data from Corry et al. (2007). A1S1 = Analyst 1, Sample 1, etc.
Corresponding samples are ranked across laboratories, with the lowest count determined on each sample being scored as 1 and the highest as 7; in the event that two values are equal, each is given the rank x + ½ and the next laboratory is scored as x + 2. The total rank score for each laboratory is compared with the 95% critical limit values in table B of Youden and Steiner (1975). For 7 laboratories, each testing 4 samples, the upper and lower limits are 27 and 5, respectively. Hence, at a 95% probability, the data from Laboratory 5 (total rank score 4) are significantly lower than those from the other laboratories. Before any subsequent parametric analyses, the data should be investigated to ensure that there is no acceptable technical explanation for the low results,
such as a calculation error that can be corrected. In the absence of any such error the data should be eliminated. For instance, if the results from laboratory 5 had been calculated wrongly, such that the results should have been 10-fold higher (i.e. about 4.5 rather than 3.5 log cfu/g), then none of the laboratories would have been scored as an ‘outlier’. It is worthy of note that the total rank scores for laboratories 3 and 6 are close to the upper critical limits but are not significantly different to the other laboratory data sets.
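The ranking calculation in Example 11.1 is easily automated. The Python sketch below reproduces the total rank scores of Table 11.1; the 95% critical limits (5 and 27) are those quoted above from table B of Youden and Steiner (1975), not values computed by the code.

import numpy as np
from scipy.stats import rankdata

labs = ["L1", "L2", "L3", "L4", "L5", "L6", "L7"]
# Columns A1S1, A1S2, A2S1, A2S2 from Table 11.1 (log10 cfu/g)
counts = np.array([
    [3.86, 3.82, 4.06, 3.96],
    [4.07, 3.99, 3.77, 4.11],
    [4.95, 4.99, 4.60, 4.63],
    [4.40, 4.40, 4.09, 3.86],
    [3.58, 3.53, 3.47, 3.47],
    [4.29, 4.41, 4.95, 4.79],
    [3.81, 3.78, 4.18, 3.95],
])

# Rank each sample column across laboratories (1 = lowest count); ties receive average ranks
ranks = np.column_stack([rankdata(counts[:, j]) for j in range(counts.shape[1])])
totals = ranks.sum(axis=1)

lower, upper = 5, 27   # 95% limits for 7 laboratories x 4 samples (Youden and Steiner, table B)
for lab, total in zip(labs, totals):
    flag = " <- outside limits" if total < lower or total > upper else ""
    print(f"{lab}: total rank score = {total:.0f}{flag}")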
EXAMPLE 11.2 USE OF DIXON'S AND GRUBBS' TESTS TO IDENTIFY OUTLIER DATA

Dixon's test compares the difference between the highest two (or the lowest two) data values in a set with the difference between the highest and the lowest values in that set. In a set with n values, where n is 8 or less, the highest value (xn) should be rejected if

(xn − xn−1)/(xn − x1) > r

where r is the critical value for a 95% probability, given by Dixon (1953) and Youden and Steiner (1975). Similarly, the lowest value (x1) should be rejected if

(x2 − x1)/(xn − x1) > r

In Example 11.1 we eliminated laboratory 5, so we can now apply Dixon's test for individual outliers to the remaining data (Table 11.2).

TABLE 11.2 Ranking of Enterobacteriaceae Colony Counts on Sample C After Removal of Data for Laboratory 5

Colony count (log10 cfu/g) and rank, by laboratory

Laboratory   A1S1 count (rank)   A1S2 count (rank)   A2S1 count (rank)   A2S2 count (rank)
L1           3.86 (2)            3.82 (2)            4.06 (2)            3.96 (3)
L2           4.07 (3)            3.99 (3)            3.77 (1)            4.11 (4)
L3           4.95 (6)            4.99 (6)            4.60 (5)            4.63 (5)
L4           4.40 (5)            4.40 (4)            4.09 (3)            3.86 (1)
L6           4.29 (4)            4.41 (5)            4.95 (6)            4.79 (6)
L7           3.81 (1)            3.78 (1)            4.18 (4)            3.95 (2)
For a set of 6 measurements, Dixon's criterion for rejecting the highest or lowest value, with a 1 in 20 chance of being wrong, is that either

(x6 − x5)/(x6 − x1) > 0.56 or (x2 − x1)/(x6 − x1) > 0.56

(table C1, Youden and Steiner, 1975). Since all of the calculated values are smaller than Dixon's critical value, none of the data values is identified as an outlier (Table 11.3).

Grubbs' test is an alternative method for data that conform reasonably to a normal distribution (Grubbs, 1969). The test detects one outlier at a time; after that outlier has been removed from the data set the test is repeated iteratively, but it should not be used for sample sizes of less than 6. The null hypothesis (H0) is that there are no outliers in the data set; the alternative hypothesis (Ha) is that there is at least one outlier. The Grubbs' statistic for the two-sided test measures the largest absolute deviation from the sample mean in units of the sample standard deviation and is derived using the equation:

Z = max|yi − ȳ|/s

where ȳ and s denote the sample mean and standard deviation, respectively. Alternative versions to test for minimum (ymin) and maximum (ymax) value outliers are:

Zmin = (ȳ − ymin)/s and Zmax = (ymax − ȳ)/s, respectively.

For the data in Table 11.2, the mean colony count (ȳ) is 4.24 log10 cfu/g and s = 0.398. The maximum absolute difference from the mean is 4.99 − 4.24 = 0.75 log10 units, so the Z value = 0.75/0.398 = 1.88, which is less than the critical value of 2.80 for n = 24. So we accept the null hypothesis that none of the values is an outlier. Critical values for Z can be obtained from standard statistical tables of Student's t distribution; a simple calculator for determining the Grubbs' Z value is available on the Internet¹.
TABLE 11.3 Dixon's Test Values for the Data in Table 11.2

Sample set   Highest value                           Lowest value
A1S1         (4.95 − 4.40)/(4.95 − 3.81) = 0.48      (3.86 − 3.81)/(4.95 − 3.81) = 0.04
A1S2         (4.99 − 4.40)/(4.99 − 3.78) = 0.49      (3.82 − 3.78)/(4.99 − 3.78) = 0.03
A2S1         (4.95 − 4.60)/(4.95 − 3.77) = 0.30      (4.06 − 3.77)/(4.95 − 3.77) = 0.25
A2S2         (4.79 − 4.63)/(4.79 − 3.95) = 0.19      (3.96 − 3.95)/(4.79 − 3.95) = 0.01
1 The website www.graphpad.com/quickcalcs/Grubbs1.cfm provides a simple on-line calculator to determine the presence of an outlier in a data set of up to 2000 values.
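Both test statistics are simple to compute, although the critical values must still be taken from published tables. The Python sketch below applies Dixon's ratios to sample set A1S1 and the Grubbs' statistic to the pooled values of Table 11.2.

import numpy as np

def dixon_ratios(values):
    """Return Dixon's r ratios for the highest and lowest values (suitable for small n)."""
    x = np.sort(np.asarray(values, dtype=float))
    spread = x[-1] - x[0]
    return (x[-1] - x[-2]) / spread, (x[1] - x[0]) / spread

def grubbs_z(values):
    """Return Grubbs' statistic: largest absolute deviation from the mean in SD units."""
    y = np.asarray(values, dtype=float)
    return np.max(np.abs(y - y.mean())) / y.std(ddof=1)

# Sample set A1S1 from Table 11.2 (laboratory 5 already excluded)
a1s1 = [3.86, 4.07, 4.95, 4.40, 4.29, 3.81]
r_high, r_low = dixon_ratios(a1s1)
print(f"Dixon: high = {r_high:.2f}, low = {r_low:.2f} (critical 0.56 for n = 6)")

# All 24 values of Table 11.2 pooled, as in the Grubbs' example
all_counts = [3.86, 4.07, 4.95, 4.40, 4.29, 3.81,
              3.82, 3.99, 4.99, 4.40, 4.41, 3.78,
              4.06, 3.77, 4.60, 4.09, 4.95, 4.18,
              3.96, 4.11, 4.63, 3.86, 4.79, 3.95]
print(f"Grubbs: Z = {grubbs_z(all_counts):.2f} (critical ~2.80 for n = 24)")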
IUPAC recommends an iterative procedure (Horwitz, 1995). Cochran's test for within-laboratory variance (stage 1) is used first to eliminate any outlier laboratory; Grubbs' two-tailed test for between-laboratory variance is then applied (stage 2), again removing any laboratory that fails; if no laboratory fails, the Grubbs' pair-value test is applied (stage 3) to identify any failure. The procedure is repeated sequentially either until no more outliers are found or until the maximum permitted number of data values (2/9, i.e. about 22%) has been removed. Details of the methods and critical values for the tests are given in Horwitz (1995).
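Cochran's statistic for the homogeneity of within-laboratory variances (stage 1 of the IUPAC sequence) is equally simple to compute; in the Python sketch below the replicate variances are invented, and the critical value must still be looked up in the published tables (e.g. Horwitz, 1995).

import numpy as np

# Hypothetical within-laboratory variances of duplicate log10 counts, one per laboratory
variances = np.array([0.010, 0.014, 0.009, 0.055, 0.012, 0.011, 0.013, 0.010])

C = variances.max() / variances.sum()   # Cochran's C statistic
suspect_lab = int(variances.argmax()) + 1
print(f"Cochran C = {C:.2f} (laboratory {suspect_lab} has the largest variance)")
# Compare C with the tabulated critical value for the number of laboratories and replicates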
The Standard ANOVA Procedure

The standard ANOVA is a statistical model that is used to uncover the main and interacting effects of independent variables on a dependent variable. The procedure estimates the extent to which an observed overall variance can be partitioned into components, each of which causes a specific effect. In the case of an inter-laboratory collaborative trial, independent variables might include replication of samples, analysts, laboratories, test methods or other factors. ANOVA can be done on the assumption of fixed-effect, random-effect or mixed-effect models; the random-effects model generally applies in the context of the analysis of collaborative trial data. Depending upon the experimental design, ANOVA can be used either as a simple one-way design or as a nested design (e.g. 2 × 2, 3 × 3, etc.). The procedure provides an estimate of the significance of an effect through derivation of Fisher's F-ratio (see Example 11.4).

The estimate of variance is determined from the sum of squares of the differences between each data value and the overall mean value. In a simple model, the total sum of squares is equal to the sum of squares for the treatments plus that for the 'error':

SSTotal = SSTreatments + SSError

The contribution of each effect to the total degrees of freedom (df) of the estimated sum of squares can be partitioned similarly: dfTotal = dfTreatments + dfError.

Assuming a fully 'nested' experimental design (e.g. duplicate testing of 'S' samples by 'A' analysts in each of 'L' laboratories), the residual mean square of the ANOVA (i.e. the variance of replicated analyses on a single sample) provides an estimate of the repeatability variance (sr²). The estimate of the reproducibility variance (sR²) first requires computation of the contributions to variance of the samples, analysts and laboratories (see below). The repeatability standard deviation (sr) and the reproducibility standard deviation (sR), being the square roots of the respective variances, are the measures of standard uncertainty from which the expanded uncertainty estimates are derived.
EXAMPLE 11.3 USE OF THE ANOVA PROCEDURE TO DERIVE THE COMPONENT VARIANCES FROM A COLLABORATIVE ANALYSIS

Assume that an inter-laboratory trial has been done in 10 laboratories (p = 10), in each of which two analysts tested two replicate samples, making duplicate analyses of each sample. Hence, each laboratory carried out 8 replicate analyses and the total number of analyses = 8p = 80. Each data value (ypijk) is allocated to a cell in the data table below in the sequence laboratory (p), analyst (i), sample (j) and replicate (k), and the data are then analysed by nested ANOVA (Table 11.4).

TABLE 11.4 Layout of Data in a Cell Format(a) for ANOVA

              Analyst (i = 1)                         Analyst (i = 2)
              Sample (j = 1)      Sample (j = 2)      Sample (j = 1)      Sample (j = 2)
Laboratory    k = 1     k = 2     k = 1     k = 2     k = 1     k = 2     k = 1     k = 2
1             y1111     y1112     y1121     y1122     y1211     y1212     y1221     y1222
2             y2111     y2112     y2121     y2122     y2211     y2212     y2221     y2222
3             y3111     y3112     y3121     y3122     y3211     y3212     y3221     y3222
…             …         …         …         …         …         …         …         …
10            y10111    y10112    y10121    y10122    y10211    y10212    y10221    y10222

(a) Cells are labelled in the sequence laboratory, analyst, sample, replicate; e.g. y1121 = laboratory 1, analyst 1, sample 2, replicate 1.
The ANOVA table for a four-factor fully nested experiment is depicted in Table 11.5.

TABLE 11.5 ANOVA Table for a Four-Factor Fully Nested Experiment for 10 Laboratories, 2 Analysts and 2 Samples, Each Tested in Duplicate

Source of variation   Sum of squares   Degrees of freedom   Mean square          Expected mean square components(a)
Laboratories          SSlab            p − 1 = 9            SSlab/9 = MSlab      σr² + 2σsam² + 4σana² + 8σlab²
Analysts              SSana            p = 10               SSana/10 = MSana     σr² + 2σsam² + 4σana²
Samples               SSsam            2p = 20              SSsam/20 = MSsam     σr² + 2σsam²
Residual              SSres            4p = 40              SSres/40 = MSres     σr²
Total                 Total SS         8p − 1 = 79

(a) The components are shown as population variances (σ²) since this is an expectation table.
The residual mean square (MSres) provides the repeatability variance (sr²) between duplicate analyses done on the same replicate sample, and the repeatability standard deviation is sr = √(sr²).

The variance due to samples (ssam²) is given by (MSsam − sr²)/2.
The variance due to analysts (sana²) is given by (MSana − 2ssam² − sr²)/4.
The variance due to laboratories (slab²) is given by (MSlab − 2ssam² − 4sana² − sr²)/8.
The reproducibility variance (sR²) is given by ssam² + sana² + slab² + sr².
The reproducibility standard deviation is given by sR = √(ssam² + sana² + slab² + sr²).
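These expressions translate directly into code for this balanced design (duplicates within samples, two samples per analyst, two analysts per laboratory). In the Python sketch below, the function is a restatement of the formulae above; substituting the mean squares derived in Example 11.4 (below) reproduces the component variances calculated there. The divisors 2, 4 and 8 are specific to this design; an unbalanced trial would need the more general expectation formulae.

import math

def nested_components(ms_lab, ms_ana, ms_sam, ms_res):
    """Variance components for the fully nested design: duplicates within samples,
    2 samples per analyst and 2 analysts per laboratory (hence divisors 2, 4 and 8)."""
    s2_r = ms_res                                   # repeatability variance
    s2_sam = (ms_sam - s2_r) / 2
    s2_ana = (ms_ana - 2 * s2_sam - s2_r) / 4
    s2_lab = (ms_lab - 2 * s2_sam - 4 * s2_ana - s2_r) / 8
    s2_R = s2_r + s2_sam + s2_ana + s2_lab          # reproducibility variance
    return s2_r, s2_sam, s2_ana, s2_lab, s2_R

# Mean squares from Example 11.4 (laboratories, analysts, samples, residual)
s2_r, s2_sam, s2_ana, s2_lab, s2_R = nested_components(1.4040, 0.1491, 0.0673, 0.0139)
print(f"s2_sam = {s2_sam:.4f}, s2_ana = {s2_ana:.4f}, s2_lab = {s2_lab:.4f}")
print(f"s_r = {math.sqrt(s2_r):.4f}, s_R = {math.sqrt(s2_R):.4f}")   # ~0.118 and ~0.467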
EXAMPLE 11.4 DERIVATION OF THE COMPONENT VARIANCES FROM AN ANOVA OF A COLLABORATIVE TRIAL

Assume an experimental design of 10 laboratories, in each of which two analysts each test two samples in duplicate (i.e. two replicate analyses per sample) for aerobic colony counts. The log10-transformed colony counts (log10 cfu/g) are tabulated in Table 11.6.

TABLE 11.6 Example of Layout of Data for Aerobic Colony Counts (log10 cfu/g) on Replicate Samples Examined in 10 Laboratories

              Analyst 1                           Analyst 2
              Sample 1          Sample 2          Sample 1          Sample 2
Laboratory    k = 1    k = 2    k = 1    k = 2    k = 1    k = 2    k = 1    k = 2
1             5.56     5.73     5.76     5.59     6.08     5.96     6.07     5.99
2             6.02     5.88     5.87     5.80     5.54     5.63     5.92     5.79
3             6.26     6.30     6.46     6.54     6.42     6.49     6.11     6.42
4             5.07     5.11     4.90     4.61     4.63     4.81     4.42     4.56
5             5.39     5.25     5.28     5.52     5.34     5.46     5.47     5.49
6             5.98     5.88     6.02     5.64     5.96     6.06     5.70     5.57
7             5.43     5.18     5.16     5.08     6.15     5.76     5.44     5.43
8             5.94     5.73     5.28     5.47     5.99     6.01     5.92     6.13
9             5.45     5.35     5.49     5.42     5.68     5.57     5.74     5.69
10            5.51     5.74     6.18     6.13     5.83     5.91     5.76     5.60
A test for normality (e.g. the Shapiro–Wilk test: W = 0.9830, P = 0.0885) did not reject the hypothesis that the log10-transformed data conform reasonably (although not perfectly) to a normal distribution. The variance of the Laboratory 7 data was larger than that of the other laboratories, but the Cochran test (Horwitz, 1995) did not show the individual laboratory variances to differ significantly. Subsequent evaluation using the Grubbs' test
also failed to demonstrate that individual data sets or other laboratories were outliers. The results of the multivariate ANOVA are tabulated below:

TABLE 11.7 ANOVA Table for the Four-Factor Fully Nested Experiment (No Data Excluded)

Source of variation   Sum of squares   Degrees of freedom   Mean square (rounded to 4 places)   Component contributions to the mean square
Laboratories          12.636           9                    1.4040                              sr² + 2ssam² + 4sana² + 8slab²
Analysts              1.4906           10                   0.1491                              sr² + 2ssam² + 4sana²
Samples               1.346            20                   0.0673                              sr² + 2ssam²
Residual              0.5554           40                   0.0139                              sr²
Total                 16.0272          79
The component variances are derived as follows:

Repeatability variance (sr²) = 0.0139
Sample variance (ssam²) = (0.0673 − sr²)/2 = (0.0673 − 0.0139)/2 = 0.0267
Analyst variance (sana²) = [0.1491 − (sr² + 2ssam²)]/4 = (0.1491 − 0.0673)/4 = 0.0205
Laboratory variance (slab²) = [1.404 − (sr² + 2ssam² + 4sana²)]/8 = (1.404 − 0.1491)/8 = 0.1569
Reproducibility variance (sR²) = 0.0139 + 0.0267 + 0.0205 + 0.1569 = 0.2180
Reproducibility standard deviation sR = √0.2180 = 0.4669
Repeatability standard deviation sr = √0.0139 = 0.1179

The mean colony count = 5.6921 ≈ 5.69 log10 cfu/g. Hence, the relative standard deviation of reproducibility (RSDR) = 100 × 0.4669/5.69 = 8.21% and the relative standard deviation of repeatability (RSDr) = 100 × 0.1179/5.69 = 2.07%. From these values the 95% expanded uncertainty of reproducibility is given by:

U = 2sR = 2 × 0.4669 = 0.9338 ≈ 0.93 log10 cfu/g

The upper and lower bounds of the 95% confidence interval on the mean colony count are:

UCI = 5.69 + 0.93 = 6.62 log10 cfu/g
LCI = 5.69 − 0.93 = 4.76 log10 cfu/g

Hence, repeat tests in different laboratories on samples drawn from this particular lot could be expected to lie, with a 95% probability, within the range 4.76–6.62 log10 cfu/g.
TABLE 11.8 Effect of Outlier Values on Mean and Median Values

Series   Data values (x)                    Number of values (n)   Sum of values (Σx)   Mean value (x̄)   Median value(a)
A        1, 3, 3, 3, 4, 4, 5, 5, 6, 6       10                     40                   4                 4
B        1, 3, 3, 3, 4, 4, 5, 5, 6, 26      10                     60                   6                 4
C        1, 3, 3, 3, 4, 4, 5, 6, 15, 26     10                     70                   7                 4
D        1, 3, 3, 3, 4, 4, 5, 6, 15, 126    10                     170                  17                4
E        1, 3, 3, 3, 4, 4, 5, 6             8                      29                   3.6               3.5

(a) The median is the mid value of a series having an odd number of values, or the average of the two mid values for a series having an even number of values.
ROBUST METHODS OF ANOVA

Because of the problems caused by the occurrence of outlier data, several alternative approaches to the ANOVA have been developed, based on robust methods of statistical analysis. Robust procedures estimate variances around the median values and do not require identification and removal of outlying data (which could actually be valid results). Mean values are affected significantly by one or more outlier values within a data set, whereas the median value is less affected (Table 11.8). For series A in Table 11.8, the mean and median values are the same, but the presence of one or more high values (as in series B, C and D) significantly increases the value of the mean yet has no effect on the median. Removal of the high outliers (series E) reduces both the mean and the median values, but with the loss of original data that may be valid. An equivalent effect would be seen with low-value outliers, although the occurrence of both high and low outliers could balance out the effect on the mean, but not on the variances. Before deciding whether or not to accept data with obvious outlier values it is essential to check that the values are not due to a simple transcription, calculation or technical error.

There are two primary alternative techniques of robust analysis. One approach (RobStat) (Analytical Methods Committee, 1989a, b, 2001) calculates the median absolute difference (MAD) between the results and their median value and then applies Huber's method of winsorization, a technique for reducing the effect of outlying observations on data sets (for detail see Smith and Kokic, 1997). The procedure can be used with data that conform approximately to a normal distribution but have heavy tails and/or outliers. The procedure is illustrated in Example 11.5 for a set of descriptive statistical data shown in Fig. 11.2. However, the procedure is not suitable for multimodal or heavily skewed data sets. The Analytical Methods Committee (AMC) website² provides downloadable software for use either in Minitab or Excel (97 or later versions).
2 www.rsc.org/lap/rsccom/amc/amc_software.htm#robustmean
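For readers without access to the AMC software, the underlying winsorization idea can be sketched in a few lines of Python. This is a generic Huber-type estimator in the style of ISO 13528 'Algorithm A', not the AMC implementation itself, so the constants and convergence details should be treated as indicative only.

import numpy as np

def robust_mean_sd(values, tol=1e-4, max_iter=100):
    """Huber-type robust mean and SD by iterative winsorization (Algorithm A style)."""
    x = np.asarray(values, dtype=float)
    m = np.median(x)
    s = 1.483 * np.median(np.abs(x - m))      # MAD scaled to be consistent with a normal SD
    for _ in range(max_iter):
        clipped = np.clip(x, m - 1.5 * s, m + 1.5 * s)   # winsorize at m +/- 1.5s
        m_new = clipped.mean()
        s_new = 1.134 * clipped.std(ddof=1)              # 1.134 corrects for the clipping
        if abs(m_new - m) < tol and abs(s_new - s) < tol:
            return m_new, s_new
        m, s = m_new, s_new
    return m, s

# Illustrative laboratory means (log10 cfu/g) with one high outlier
lab_means = [5.6, 5.7, 5.8, 5.5, 5.9, 5.7, 5.6, 7.2]
m, s = robust_mean_sd(lab_means)
print(f"robust mean = {m:.2f}, robust SD = {s:.2f}")

With the single high value included, the robust mean stays close to the bulk of the results, whereas the ordinary arithmetic mean would be pulled upwards.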
EXAMPLE 11.5 ROBUST ANOVA USING THE ROBSTAT PROCEDURE (ANALYTICAL METHODS COMMITTEE, 2001)

The data examined previously by ANOVA (Example 11.4) were analysed using the robust procedure. The data were tabulated (Table 11.9) as replicate pairs in an Excel spreadsheet to estimate the repeatability between replicates and the overall reproducibility.

TABLE 11.9 Data Organised Sequentially for Analysis of Within- and Between-Sample Standard Deviation Using the RobStat Procedure

Laboratory   Analyst   Sample   Replicate 1   Replicate 2   Mean
1            1         1        5.56          5.73          5.65
1            1         2        5.76          5.59          5.68
1            2         1        6.08          5.96          6.02
1            2         2        6.07          5.99          6.03
2            1         1        6.02          5.88          5.95
2            1         2        5.87          5.80          5.84
2            2         1        5.54          5.63          5.59
2            2         2        5.92          5.79          5.86
3            1         1        6.26          6.30          6.28
3            1         2        6.46          6.54          6.50
…            …         …        …             …             …
10           1         1        5.51          5.74          5.63
10           1         2        6.18          6.13          6.16
10           2         1        5.83          5.91          5.87
10           2         2        5.76          5.60          5.68
The RobStat menu is set up to analyse the data in rows (i.e. as replicate values) by an iterative method, with a convergence criterion of 0.0001 and a constant value (c) of 1.5. The output (Table 11.10) gives a repeatability SD (i.e. between replicates) of 0.12 and a reproducibility SD of 0.42; the reproducibility SD is calculated as the square root of [(between-group SD)² + (within-group SD)²].

TABLE 11.10 RobStat Output Table Showing the Within-Group (Repeatability) SD, the Between-Group SD and the Reproducibility SD for the Data from Table 11.9

Parameter                          Value
Grand mean                         5.691514
Within-group/repeatability SD      0.119667
Between-group SD                   0.400970
Reproducibility SD                 0.418446

The RSD of repeatability is 100 × 0.12/5.69 = 2.11% and the RSD of reproducibility is 100 × 0.42/5.69 = 7.38%. The RobStat procedure can be repeated on the values for each pair of samples to determine the between- and within-analyst SDs, and similarly to determine the between- and within-laboratory SDs. For these data the results are given in Table 11.11.

TABLE 11.11 RobStat Robust Estimates of the Between- and Within-SD for Samples, Analysts and Laboratories

Parameter                          Samples     Analysts    Laboratories
Grand mean                         5.691514    5.69297     5.692971
Within-group/repeatability SD      0.119667    0.181677    0.215163
Between-group SD                   0.40097     0.341481    0.251358
Reproducibility SD                 0.418446    0.386802    0.330871
Unfortunately, these tabulated standard deviations do not directly provide estimates of the component variances from the collaborative trial. The next example demonstrates a way in which component variances can be derived from such data.
An alternative approach, known as the recursive median, is based on an extrapolation of the work of Rousseeuw and Croux (1993). One version of this approach (described in Anon., 2003) uses Rousseeuw's recursive median Sn; the procedure is illustrated in Example 11.7. Wilrich (2007) has proposed a further approach to the robust estimation of repeatability and reproducibility standard deviations. He demonstrated that estimates of sR based on the procedures described in Anon. (1994, 1998, 2003), including the identification and removal of outliers where appropriate, or on a Swiss standard method (Anon., 1989), fail to provide adequately efficient estimates (the maximum efficiency of the estimates ranges from 25% to 58%). Wilrich proposed the use of an alternative procedure based on the original approach of Rousseeuw and Croux (1993), who derived a statistic Qn that has an efficiency of 82.3%, rather than Rousseeuw's Sn procedure described in Anon. (2003) and illustrated in Example 11.7.

A problem with robust methods of analysis is that they do not directly provide a means of assessing component variances. Hedges and Jarvis (2006) have described a stepwise procedure for the determination of component variances: all results are first analysed to determine the between- and within-group variances associated with the duplicate tests, followed by sequential analysis of the mean values to determine the between- and within-group variances for samples, analysts and laboratories. The procedure is illustrated in Example 11.6. Hedges (2008) used the Wilrich (2007) approach to revise the procedure described by Hedges and Jarvis (2006).
EXAMPLE 11.6 DETERMINATION OF COMPONENT VARIANCES FROM ROBUST ANOVA USING THE METHOD OF HEDGES AND JARVIS (2006)

In the previous example we determined the between- and within-standard deviations for a set of nested data involving 10 laboratories, in each of which 2 analysts each examined 2 samples in duplicate – a total of 80 sets of aerobic colony counts. The robust analyses (Table 11.11) provide the starting point to determine the component variances for samples, analysts and laboratories. We use the notation that a variance is the square of the relevant group SD, that is, a 'within'-group variance is denoted W²(group) and a 'between'-group variance B²(group).

The residual variance (i.e. the error of the analysis) is given by:

s_r² = W²(sam) = (0.119667)² = 0.0143201 ≈ 0.0143

The component variance due to the samples is given by:

s²(sam) = W²(ana) − W²(sam)/2 = W²(ana) − s_r²/2 = (0.181677)² − (0.119667)²/2 = 0.025846 ≈ 0.0258

where W²(ana) is the variance within analysts and W²(sam) is the variance within samples, which equals the residual variance s_r². Similarly, the component variance due to analysts is given by:

s²(ana) = W²(lab) − W²(ana)/2 = (0.215163)² − (0.181677)²/2 = 0.029791 ≈ 0.0298

where W²(lab) is the within-laboratory variance; and the component variance due to laboratories is given by:

s²(lab) = B²(lab) = (0.251358)² = 0.0631808 ≈ 0.0632

where B²(lab) is the between-laboratory variance.

Then the reproducibility SD is:

S_R = √[s²(lab) + s²(ana) + s²(sam) + s_r²] = √(0.0632 + 0.0298 + 0.0258 + 0.0143) = √0.1331 = 0.3648

Hence, for a mean colony count of 5.69 log10 cfu/g, the RSDs of repeatability and reproducibility determined by the Hedges and Jarvis (2006) modification of the RobStat procedure are 2.10% and 6.41%, respectively. In Example 11.4, the traditional ANOVA of the same data gave RSDs of repeatability and reproducibility of 2.07% and 8.21%, respectively. As was noted in that Example, although the data set contained no outliers it did not conform well to a normal distribution, showing evidence of kurtosis and skewness. It is not surprising therefore that the traditional ANOVA gave higher values for inter-laboratory reproducibility.
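The arithmetic of Example 11.6 reduces to a few lines of code. The Python sketch below simply reproduces the calculation above from the robust SDs in Table 11.11; the variable names are mine, and the approach assumes duplicate tests at each level, exactly as in the worked example.

```python
# Robust SDs from Table 11.11
w_sam, w_ana, w_lab = 0.119667, 0.181677, 0.215163   # within-group SDs
b_lab = 0.251358                                      # between-laboratory SD

s2_r   = w_sam ** 2                   # residual (repeatability) variance   ~0.0143
s2_sam = w_ana ** 2 - s2_r / 2        # component variance due to samples   ~0.0258
s2_ana = w_lab ** 2 - w_ana ** 2 / 2  # component variance due to analysts  ~0.0298
s2_lab = b_lab ** 2                   # component variance due to laboratories ~0.0632

S_R = (s2_lab + s2_ana + s2_sam + s2_r) ** 0.5        # reproducibility SD  ~0.365
mean_count = 5.69
print(round(100 * w_sam / mean_count, 2), round(100 * S_R / mean_count, 2))  # ~2.10% and ~6.41%
```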
EXAMPLE 11.7 ROBUST ANOVA USING THE RECURSIVE MEDIAN PROCEDURE (ISO, 2003)

The original colony count data for each sample (Table 11.6) are rearranged into a Primary Data Table in which the mean of the duplicate counts is shown for each sample tested by each analyst (Table 11.12a). For instance, for Analyst 1 in Laboratory 1, the mean counts for samples 1 and 2 are 5.65 [= (5.56 + 5.73)/2] and 5.68 [= (5.76 + 5.59)/2], respectively, and the Analyst mean count (5.66) is determined. The mean count for each analyst is then copied to the Secondary Data Table (Table 11.12b). NB Although these data are shown to two decimal places, as in all statistical work the actual data used for the calculations should not be rounded.

The mean count (Mk) and the standard deviation (Sk) for each analyst are then determined, together with the median of the mean counts (MED{Mk}) and the median standard deviation (MED{Sk}). The standard deviation of repeatability is calculated as Sr = k2 × MED{Sk}, where k2 = Rousseeuw's constant = 1.4826 for two data values. Hence, for the data shown, Sr = 0.215.

Each of the values of Mk is now put into a two-way matrix (Table 11.12c) that is used to determine the absolute differences in the mean counts by laboratory, that is, the absolute differences between the mean count in each laboratory and the mean counts in each of the other laboratories 1–10. Note that this differs from the ISO procedure (Anon., 2003), in which the zero values for the comparison of like with like (e.g. Lab 1 with Lab 1) are removed(3). The MAD (Median Absolute Difference) is then determined for each laboratory; as demonstrated by Wilrich (2007), if the data set contains an even number of values the upper median value is used. Finally, the recursive median (MEDMED) of the MAD values is determined, using the lower median value for even-numbered data sets (Table 11.12d). The between-laboratory SD (SLab) is obtained by multiplying MEDMED by a constant k1, where k1 = 1.1926. The reproducibility SD(4) is then determined as:

S_R = √(S²Lab + S²r/2)

For the data shown, where SLab = 0.355 and Sr = 0.215, SR = 0.386. The relative SDs of repeatability and reproducibility are 3.70% and 6.65%, respectively. The component variances for the Remedian procedure can be derived in a similar manner to those used for the RobStat procedure (Example 11.6) but the working is somewhat more complex (for details see Hedges and Jarvis, 2006; a program to do the stepwise median in SAS is freely available at http://www.bristol.ac.uk/cellmolmed/sas.html). As can be seen from the worked example, the Remedian process is much more complicated and time-consuming than is the RobStat approach to robust ANOVA.

(3) Rousseeuw and Croux (1993) retained these values in the matrix.
(4) The equation for determination of the reproducibility standard deviation is given in the published form (Anon., 2003).
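Before working through Tables 11.12a–d, it may help to see the whole Remedian calculation condensed into code. The following Python sketch is an illustrative rendering of the steps just described, starting from the laboratory means (Mk) and analyst standard deviations (Sk) derived in Table 11.12b; the helper functions for the upper and lower medians are my own, and because the inputs are the rounded tabulated values the results differ slightly (in the third decimal place) from the book's figures, which use unrounded data.

```python
from statistics import median

def upper_median(v):
    s = sorted(v)
    return s[len(s) // 2]           # upper of the two middle values when n is even

def lower_median(v):
    s = sorted(v)
    return s[(len(s) - 1) // 2]     # lower of the two middle values when n is even

Mk = [5.85, 5.81, 6.38, 4.77, 5.40, 5.85, 5.46, 5.81, 5.55, 5.84]            # laboratory means
Sk = [0.262, 0.120, 0.021, 0.219, 0.057, 0.042, 0.346, 0.283, 0.170, 0.078]  # analyst SDs

S_r = 1.4826 * median(Sk)                                            # repeatability SD (k2 = 1.4826)
mads = [upper_median([abs(m - other) for other in Mk]) for m in Mk]   # MAD for each laboratory
S_lab = 1.1926 * lower_median(mads)                                   # between-laboratory SD (k1 = 1.1926)
S_R = (S_lab ** 2 + S_r ** 2 / 2) ** 0.5                              # reproducibility SD
print(round(S_r, 3), round(S_lab, 3), round(S_R, 3))
# ~0.215, ~0.358, ~0.389 (Table 11.12d, using unrounded data, gives 0.215, 0.355, 0.386)
```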
TABLE 11.12a,b Robust analysis of data by the Recursive Median (Remedian) Procedure (ISO, 2003)

a. Primary Data Table (mean log cfu/g)

Laboratory   Analyst   Sam 1   Sam 2   Mean
1            1         5.65    5.68    5.66
1            2         6.02    6.03    6.03
2            1         5.95    5.84    5.89
2            2         5.59    5.86    5.72
3            1         6.28    6.50    6.39
3            2         6.46    6.27    6.36
4            1         5.09    4.76    4.92
4            2         4.72    4.49    4.61
5            1         5.32    5.40    5.36
5            2         5.40    5.48    5.44
6            1         5.93    5.83    5.88
6            2         6.01    5.64    5.82
7            1         5.31    5.12    5.21
7            2         5.96    5.44    5.70
8            1         5.84    5.38    5.61
8            2         6.00    6.03    6.01
9            1         5.40    5.46    5.43
9            2         5.63    5.72    5.67
10           1         5.63    6.16    5.89
10           2         5.87    5.68    5.78

b. Secondary Data Table

Laboratory   Analyst(X)   Analyst(Y)   Mean {Mk}   Std Dev {Sk}
1            5.66         6.03         5.85        0.262
2            5.89         5.72         5.81        0.120
3            6.39         6.36         6.38        0.021
4            4.92         4.61         4.77        0.219
5            5.36         5.44         5.40        0.057
6            5.88         5.82         5.85        0.042
7            5.21         5.70         5.46        0.346
8            5.61         6.01         5.81        0.283
9            5.43         5.67         5.55        0.170
10           5.89         5.78         5.84        0.078
Median       5.64         5.75         5.81 (MED{Mk})   0.14 (MED{Sk})

Sr = k2 × MED{Sk} = standard deviation of repeatability = 0.215 (k2 = 1.4826)
TABLE 11.12c The Remedian Matrix to Derive the Absolute and the Median Differences

Mean log colony counts by laboratory (1–10): 5.85, 5.81, 6.38, 4.77, 5.40, 5.85, 5.46, 5.81, 5.55, 5.84

Absolute differences between the laboratory mean colony counts:

Lab       1      2      3      4      5      6      7      8      9      10
1       0.00   0.04   0.53   1.08   0.44   0.00   0.39   0.04   0.30   0.01
2       0.04   0.00   0.57   1.04   0.40   0.04   0.35   0.00   0.26   0.03
3       0.53   0.57   0.00   1.61   0.98   0.53   0.92   0.57   0.83   0.54
4       1.08   1.04   1.61   0.00   0.64   1.09   0.69   1.05   0.79   1.07
5       0.44   0.40   0.98   0.64   0.00   0.45   0.05   0.41   0.15   0.44
6       0.00   0.04   0.53   1.09   0.45   0.00   0.40   0.04   0.30   0.01
7       0.39   0.35   0.92   0.69   0.05   0.40   0.00   0.36   0.09   0.38
8       0.04   0.00   0.57   1.05   0.41   0.04   0.36   0.00   0.26   0.03
9       0.30   0.26   0.83   0.79   0.15   0.30   0.09   0.26   0.00   0.29
10      0.01   0.03   0.54   1.07   0.44   0.01   0.38   0.03   0.29   0.00
MAD(a)  0.30   0.26   0.57   1.05   0.44   0.30   0.38   0.26   0.29   0.29

MEDMED(b) = 0.30

(a) MAD = median absolute difference; (b) MEDMED = median of the MAD values.
TABLE 11.12d The derivation of statistical parameters

Median log colony count               MED{Mk}                      5.81
Median median difference in counts    MEDMED = RM                  0.30
Repeatability SD                      Sr                           0.215
Between-laboratory SD                 SL = 1.1926 × RM             0.355
Reproducibility SD                    SR = √(SL² + Sr²/2)          0.386
Repeatability limit                   r = 2√2 × Sr                 0.608
Relative SD of repeatability          RSDr = 100 × Sr/MED{Mk}      3.70%
Reproducibility limit                 R = 2√2 × SR                 3.088
Relative SD of reproducibility        RSDR = 100 × SR/MED{Mk}      6.65%
A comparison of the RSD values derived by the three procedures is:
Parameter                    ANOVA    RobStat ANOVA    Remedian
RSD repeatability (%)        2.07     2.10             3.70
RSD reproducibility (%)      8.21     6.41             6.65
The estimates of RSD reproducibility are similar for the two robust procedures, and both are smaller than the estimate obtained by the traditional ANOVA procedure. However, the Remedian gives a higher estimate of the RSD of repeatability than either the traditional ANOVA or the RobStat procedure, which are similar to one another. We have seen similar differences in many comparative analyses of the three methods.
MEASUREMENT OF INTERMEDIATE REPRODUCIBILITY

Intermediate reproducibility is the term used to describe estimates of reproducibility made within a single laboratory. Confusion sometimes arises between repeatability and intermediate reproducibility. Estimates of repeatability require all stages of the replicated testing to be done by a single analyst, carrying out repeat determinations on a single sample in a single laboratory, using identical culture media, diluents, etc., within a short time period (e.g. a few hours). If more than one analyst undertakes the analyses, and/or tests are done on different samples and/or on different days, then the calculation derives a measure of intermediate reproducibility. The procedure described below can be used to determine average repeatability estimates for individual analysts, provided all the repeatability criteria are met, or to derive estimates of intermediate reproducibility. Such estimates provide a source of data amenable to statistical process control (Chapter 12), which can be valuable in laboratory quality management.

Intra-laboratory uncertainty estimates can be made by carrying out an internal collaborative trial, with different analysts testing the same samples over a number of days, using different batches or even different brands of commercial culture media, etc. In such a case the statistical procedures of choice are those described under ANOVA or robust ANOVA. However, if a laboratory undertakes routine quality monitoring tests, it is also possible to estimate reproducibility from those data. A simple procedure, described by ISO (Anon., 2006a), can be used to determine the variance for each set of transformed replicate data values; the reproducibility standard deviation is then the square root of the sum of the duplicate variances divided by the number of data sets:

S_R = √[ Σ_{i=1}^{n} (y_i1 − y_i2)²/2 ÷ n ]

where y_i1 and y_i2 are the log-transformed values of the duplicate counts (x_i1 and x_i2) and n is the number of pairs of counts. Calculation of intermediate reproducibility values is shown in Example 11.8.
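As a companion to Example 11.8, the calculation can be coded directly from this equation. The short Python sketch below is illustrative only; it assumes the duplicate counts have already been log10-transformed and uses the ten pairs of Table 11.13.

```python
import math

def intermediate_reproducibility_sd(pairs):
    """S_R = sqrt( sum((y1 - y2)^2 / 2) / n ) for n pairs of log10 duplicate counts."""
    n = len(pairs)
    return math.sqrt(sum((y1 - y2) ** 2 / 2 for y1, y2 in pairs) / n)

pairs = [(4.83, 4.94), (6.85, 6.79), (5.54, 5.64), (7.00, 6.63), (7.28, 7.23),
         (5.36, 5.18), (8.72, 8.61), (4.00, 4.08), (4.48, 4.11), (8.04, 8.34)]

S_R = intermediate_reproducibility_sd(pairs)
mean_log = sum((a + b) / 2 for a, b in pairs) / len(pairs)
print(round(S_R, 4), round(100 * S_R / mean_log, 2))   # ~0.1488 and ~2.41 %RSD
```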
EXAMPLE 11.8 DERIVATION OF INTERMEDIATE REPRODUCIBILITY (MODIFIED FROM ANON., 2006a)

Suppose that we have a set of data consisting of duplicate results on a number of samples, all of which were tested by the same procedure. The data below were derived from enumeration of aerobic mesophilic flora in mixed poultry meat samples. The duplicate data values (x_iA and x_iB) are log10-transformed to give y_iA and y_iB, respectively. The mean log10 counts (ȳ_i) are determined as (y_iA + y_iB)/2 and the variances (S²_Ri) from (y_iA − y_iB)²/2. The RSD (%) for each pair of counts is given by 100 × √(S²_Ri)/ȳ_i.

TABLE 11.13 Procedure for Calculation of Intermediate Reproducibility Based on 10 Sets of Paired Colony Counts (Anon., 2006a)

Test   Count A    Count B    Log A    Log B    Mean log   |y_iA − y_iB|   Variance    RSD_Ri
(i)    (cfu/g)    (cfu/g)    y_iA     y_iB     count ȳ_i                  S²_Ri       (%)
1      6.70E+04   8.70E+04   4.83     4.94     4.89       0.11            0.00605     1.59
2      7.10E+06   6.20E+06   6.85     6.79     6.82       0.06            0.00180     0.62
3      3.50E+05   4.40E+05   5.54     5.64     5.59       0.10            0.00500     1.26
4      1.00E+07   4.30E+06   7.00     6.63     6.82       0.37            0.06845     3.84
5      1.90E+07   1.70E+07   7.28     7.23     7.26       0.05            0.00125     0.49
6      2.30E+05   1.50E+05   5.36     5.18     5.27       0.18            0.01620     2.42
7      5.30E+08   4.10E+08   8.72     8.61     8.67       0.11            0.00605     0.90
8      1.00E+04   1.20E+04   4.00     4.08     4.04       0.08            0.00320     1.40
9      3.00E+04   1.30E+04   4.48     4.11     4.30       0.37            0.06845     6.09
10     1.10E+08   2.20E+08   8.04     8.34     8.19       0.30            0.04500     2.59
Average                                        6.18                       0.022145
The reproducibility standard deviation is derived as the square root of the mean variance of the individual data sets:

S_R = √[ Σ_{i=1}^{n} (y_i1 − y_i2)²/2 ÷ n ] = √(0.22145/10) = √0.022145 = 0.1488

The overall intermediate %RSD is given by the ratio between the overall SD and the overall mean log count: 100 × S_R/ȳ = 100 × (0.1488/6.18) = 2.41%. The %RSD values for the individual duplicate analyses (i = 1, …, 10) range from 0.49% to 6.09%.

Note: The average value (2.12%) of the individual RSD values differs from the calculated overall RSD (2.41%) and should not be used.
ESTIMATION OF UNCERTAINTY ASSOCIATED WITH QUANTAL METHODS

By definition, a quantal method is non-quantitative and merely provides an empirical answer regarding the presence or absence of a specific index organism, or a group of related organisms, in a given quantity of a representative sample. Provided that multiple samples are analysed and, on the assumption that the test method is 'perfect', the number of tests giving a positive response provides an indication of the overall prevalence of defective samples (see Chapter 8). For instance, if a test on 10 parallel samples found 4 positive and 6 negative samples, then the apparent prevalence of defectives would be 40% (of the samples analysed). However, if no positive samples were found it would be common practice to refer, incorrectly, to an apparent prevalence of defectives of 0%, when in fact the apparent prevalence is below the detectable level of 10%, with 95% confidence limits of 0–31%.

Estimation of Variance Based on the Binomial Distribution

Determination of repeatability and reproducibility estimates for quantal test procedures can be based on the binomial probability of detection of positive and negative results. As described in Chapter 8, the probability of an event occurring (P) can be derived from the binomial response and an error estimate made, provided that a realistic number of samples has been analysed. For events that conform to the binomial distribution, the probability of obtaining k successes out of n trials is given by:

P(k out of n) = [n!/(k!(n − k)!)] p^k q^(n−k) = [n!/(k!(n − k)!)] p^k (1 − p)^(n−k)

where k is the observed number of positive events from n independent trials, p is the probability of success (i.e. that an event will occur) in a single trial and q (= 1 − p) is the probability of failure (i.e. that the event will not occur) in a single trial. The mean of a binomial distribution is given by np and the variance (σ²) by σ² = npq = np(1 − p).

Suppose that a series of 10 replicate trials, each of 10 tests, on a reference sample gives a total of 60 positive results (out of 100 tests). Then the overall proportion of positive results is p = 60/100 = 0.6 and that of negative results is q = 1 − p = 0.4. The variance (s²) around the mean (np = 60) is given by np(1 − p) = 60 × 0.4 = 24 and the standard deviation (s) = √24 = 4.9. Hence, the approximate expanded 95% uncertainty around the mean is 2 × 4.9 = 9.8. So, on 95% of occasions the expected proportion of positive results will be 0.6 ± 0.1. Clearly, the confidence limits applicable to such data are very wide, but the approach does derive an estimate of a parameter (s) that could be used as a measure of the standard uncertainty.

An alternative approach, suggested by Prof. Peter Wilrich (personal communication), uses the ISO procedure (Anon., 1994) for determining the variances of repeatability and reproducibility of continuous characteristics in inter-laboratory experiments in which a qualitative
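The worked binomial example translates directly into code. The following Python sketch simply evaluates p, the variance np(1 − p) and an approximate expanded (k = 2) uncertainty for a set of quantal results; the function name and the coverage factor of 2 are illustrative choices.

```python
def binomial_uncertainty(positives, trials):
    """Proportion positive, binomial variance about the expected number of
    positives, its standard deviation, and an expanded (k = 2) uncertainty."""
    p = positives / trials
    variance = trials * p * (1 - p)
    sd = variance ** 0.5
    return p, variance, sd, 2 * sd

print(binomial_uncertainty(60, 100))   # (0.6, 24.0, ~4.9, ~9.8), as in the worked example
```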
characteristic is investigated. A measurement series y_ij consists of 1's (positive results) and 0's (negative results), where i (i = 1, …, k) indexes the laboratory and j (j = 1, …, n) the measurement result. Its average,

ȳ_i = (1/n) Σ_{j=1}^{n} y_ij

is the fraction of positive results obtained in laboratory i, i.e. the estimated sensitivity p̂_i; and the average fraction of positive results over all laboratories,

ȳ = [1/(kn)] Σ_{i=1}^{k} Σ_{j=1}^{n} y_ij = (1/k) Σ_{i=1}^{k} ȳ_i

provides an estimated average sensitivity (p̄) in all laboratories. Applying these concepts to the ISO statistical model (Anon., 1994) permits the estimation of variance both within and between laboratories (i.e. the repeatability and reproducibility variances) by use of ANOVA. The between- and within-laboratory standard deviations can be obtained from the ANOVA and hence estimates of measurement uncertainty can be derived.
MPN Estimates

When some test results are positive, an estimate of population density can be derived for multiple tests even at a single dilution level, using the equation:

M = −(1/v) ln(s/n)

where M = Poisson estimate of the mean density, v = quantity of sample tested in each replicate, s = number of sterile (negative) tests and n = number of tests inoculated. Assuming 10 tests are set up on replicate 25 g samples of product, and three tests are positive and seven negative, then the value of M is:

M = −ln(7/10) × (1000/25) = 40 × 0.3567 = 14.27 ≈ 14 organisms/kg

Since, for a Poisson estimate, the mean equals the variance, the standard deviation of such a result will be √14 = 3.74; thus we can say that the result, with its 95% confidence limits, is 14 ± 7.5 organisms/kg, i.e. a range from 6 to 22 organisms/kg. As shown in Tables 8.6 and 8.7, if tests are done at several levels of inoculation it is possible to derive upper and lower confidence limits for the estimated MPN, either from tables or by calculation. The confidence limits define the relative uncertainty of the estimate.
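The single-dilution MPN calculation above is easily checked in code. The Python sketch below reproduces the worked example (10 × 25 g samples, 7 negative) and uses the Poisson 'variance equals mean' property for approximate 95% limits; the function name and the coverage factor of 2 are choices of this sketch.

```python
import math

def single_dilution_mpn_per_kg(negatives, total_tests, portion_g):
    """Poisson/MPN estimate M = -(1/v) ln(s/n), converted to organisms per kg."""
    per_gram = -math.log(negatives / total_tests) / portion_g
    return 1000.0 * per_gram

m = single_dilution_mpn_per_kg(negatives=7, total_tests=10, portion_g=25)
sd = math.sqrt(m)                     # Poisson: variance equals the mean
print(round(m, 1), round(m - 2 * sd, 1), round(m + 2 * sd, 1))
# ~14.3 organisms/kg, with approximate limits of ~6.7 to ~21.8 (rounded to 6-22 in the text)
```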
Level of Detection Estimates

The equation above is also the basis for deriving MPN values for use in the Spearman–Kärber procedure to estimate the LOD50 for a test. The procedure uses non-parametric
statistics to calculate the microbial concentration (and its confidence limits) that corresponds to a 50% probability of a positive result using a defined test method. Finney (1971) describes the statistical background to the test. A practical example of the calculation (taken from Hitchins, 2006) is shown in Example 11.9; Fig. 11.3 illustrates a typical sigmoid dose–response curve. The procedure is not applicable to routine test data, since it requires an estimate of the level of organisms present in the inoculum; it is, however, suitable for the validation of quantal methods (Chapter 13) using inoculated samples. The procedure requires a minimum of three different concentrations of organisms at log-spaced intervals: at least one giving a fractional response (i.e. some positive and some negative values) and one giving a 100% response. In addition, it is necessary to have one totally negative set of data (i.e. a negative control). More levels are desirable, even at the expense of the degree of replication; at least three, and preferably at least five, replicate tests per concentration level per laboratory or test occasion are desirable (the confidence level increases with increased replication), using a constant analytical portion size (i.e. the volume or weight of sample or dilution). This method of quantification enables the derivation of a value for the standard error, and hence of the uncertainty, of the LOD50 estimate. The procedure can be used to compare the performance of two or more methods where both have been evaluated under identical conditions in two or more laboratories. Alternatives to the Spearman–Kärber procedure, including probit, logit and other procedures, are currently being assessed for analysis of microbiological quantal data. Van der Voet (personal communication, 2007) considers that using a generalized linear model of the complementary log-log response may provide a better option.
FIGURE 11.3 Sigmoid response curve for a set of data determined in a Spearman–Kärber evaluation of a quantal method for detection of organisms (based on the method of Hitchins, 2006). The curve plots level of detection (%) against level of inoculum (cfu/g, log scale from 0.01 to 10).
EXAMPLE 11.9 USE OF THE SPEARMAN–KÄRBER METHOD TO DETERMINE THE LOD50 VALUE FOR A QUANTAL ASSAY

Suppose that 10 replicate tests for the presence or absence of a specific organism in a food product are done, using an appropriate method, at each of 4 logarithmically spaced levels of inoculation (100, 10, 1 and 0.1 cfu/25 g sample), and that the following numbers of positive results are obtained: 10, 8, 3 and 0, respectively.

The prevalence of the test organism can be estimated from MPN tables. For instance, the sample sets at the three highest inoculum levels gave results of 10-8-3 positives, so the MPN is 20 organisms/25 g (95% confidence limits 9–35 organisms/25 g). Similarly, the series based on the second highest inoculum level, with results of 8-3-0, gives an MPN of 1.9 organisms/25 g (95% limits 0.9–3.4). Whilst these MPN values demonstrate that the level of recovery of the inoculated organisms is only in the order of 20%, they do not give any information about the relative responsiveness of the procedure. Such problems have been common in toxicology and other sciences for many years and were resolved by the use of probit and other related statistical procedures. One such approach is the Spearman–Kärber method for analysis of quantal data.

The generalized Spearman–Kärber formula provides an estimate (μ̂) of the true mean value (μ), based on the observed responses at the differing inoculation levels:

μ̂ = Σ_{i=1}^{k−1} (p_{i+1} − p_i) × (x_i + x_{i+1})/2

where μ̂ = estimator of the mean (μ); x_i = inoculum level, with x_1 < x_2 < … < x_k; k = the number of inoculation levels; and p_i is the observed proportion of positive replicates in an experiment where n_i replicates are tested independently at inoculation level x_i, yielding r_i positive results, so that p_i = r_i/n_i, with i = 1, …, k. It is assumed that p_1 = 0 and p_k = 1. Provided that n_i ≥ 2 for i = 1, …, k, and writing q_i = 1 − p_i, the variance of the mean is derived from:

var(μ̂) = Σ [p_i q_i/(n_i − 1)] × [(x_{i+1} − x_{i−1})/2]²

the sum being taken over the intermediate inoculation levels.

Use of a spreadsheet(5) derived by Hitchins (2006) provides a simple way to determine the mean value for a level of detection of 50% positive results (LOD50). For the data shown above, the LOD50 is 2.5 cfu/25 g sample, with 95% confidence limits of 0.98–6.4 cfu/25 g sample (Table 11.14).
(5) The Excel spreadsheet can be obtained from [email protected]
TABLE 11.14 Example of the Spearman–Kärber Excel spreadsheet

Enter spike size          Enter spike size        Enter number of            Enter number of
(cfu or MPN/portion)      (cfu or MPN/g or ml)    replicates per spiking level    replicates grown
0.1                       0.004                   10                         0
1                         0.04                    10                         3
10                        0.4                     10                         8
100                       4                       10                         10

Total degrees of freedom: 36
From Table, enter t value for 95% confidence level: 2.03

Result: LOD 50% (cfu or MPN per g/ml) = 0.100; lower limit = 0.039; upper limit = 0.256

Source: Reproduced by permission of the author from Hitchins (2005).
Whilst the procedure may not be perfect, it does at least provide an estimate of the level of uncertainty of data from a test method, together with an indication of a specific parameter, the LOD50. The response curve for the data is shown in Fig. 11.3. This example serves also to demonstrate the low precision of quantal methods. Clearly, if an example had been used with more sets of data values, then the sigmoid response would have been more pronounced. Where tests are done in a number of laboratories, the overall response levels for each inoculum (or naturally occurring) level of prevalence of the organism can be used to obtain a more precise estimate of the LOD50.
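For readers without the spreadsheet, the Spearman–Kärber arithmetic of Example 11.9 can be reproduced as follows. This Python sketch assumes the calculation is carried out on the log10 dose scale (which is what reproduces the quoted LOD50 of 2.5 cfu/25 g) and uses the t value of 2.03 shown in Table 11.14; it is an illustration, not the Hitchins spreadsheet itself.

```python
import math

def spearman_karber(levels, positives, replicates):
    """Generalised Spearman-Karber estimate (on the log10 dose scale) and its
    variance, assuming 0% response at the lowest level and 100% at the highest."""
    x = [math.log10(d) for d in levels]
    p = [r / n for r, n in zip(positives, replicates)]
    k = len(x)
    mu = sum((p[i + 1] - p[i]) * (x[i] + x[i + 1]) / 2 for i in range(k - 1))
    var = sum(p[i] * (1 - p[i]) / (replicates[i] - 1) * ((x[i + 1] - x[i - 1]) / 2) ** 2
              for i in range(1, k - 1))
    return mu, var

mu, var = spearman_karber([0.1, 1, 10, 100], [0, 3, 8, 10], [10, 10, 10, 10])  # cfu/25 g spikes
se, t = math.sqrt(var), 2.03
print(round(10 ** mu, 1), round(10 ** (mu - t * se), 2), round(10 ** (mu + t * se), 1))
# LOD50 ~2.5 cfu/25 g with limits of roughly 0.97-6.5 (the spreadsheet gives 0.98-6.4)
```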
Use of Reference Materials in Quantal Testing

One of the inherent problems with presence-or-absence tests relates to the likelihood that a method may give either a false positive (Type I error) or a false negative (Type II error) result (Chapter 8). A false positive is a result that indicates the apparent presence of the target organism when it is not present in the sample; a false negative result is one where the presence of a target organism is not detected when it is present in a sample. Such errors create specific problems in the interpretation of quantal test results and in many cases may be related to lack of homogeneity in the test matrix. If tests are done on natural food matrices, it is impossible to estimate the likely occurrence of false results, other than by comparison of results in one laboratory with those in another. During development, evaluation and routine use of any method it is essential to ensure that the likelihood of such errors occurring
is minimized. A laboratory proficiency scheme provides a way to monitor the efficiency of a test procedure in any individual laboratory.

Minimizing the occurrence of false results requires the use of standardized reference materials that can be relied upon to contain the target organism at a defined level. For high-level contamination (e.g. 1000 cfu/ml) that is not a major problem; the issue arises primarily where the actual level of detection is intended to be close to the minimum level of detection. For instance, to detect 1 cfu of a specific organism in (say) 25 g of sample implies that the organism is evenly distributed throughout a lot of test material such that each 25 g sample unit is likely to contain the organism. Unfortunately, it is not possible to add a single test organism to each individual 25 g sample to guarantee with 100% probability that each sample would contain the organism. Even if it were possible, there is a distinct probability that the organism might not survive the preparation and storage process. There is little guarantee that low levels of inoculated organisms will be distributed throughout a batch of food matrix such that each of a number of randomly drawn samples will contain the test organism at the same concentration.

Standard reference materials for use in quantal microbiological tests are now available, either for direct use in a test system or for dilution into a batch of material. Certified reference materials, that is, those conforming to the requirements of ISO Guide 35 (Anon., 2006b), have defined levels of specific types of organism. However, no matter how well a commercial or laboratory-standardized preparation has been produced, the indicated level of target organism can never be absolute, not least because it has to be estimated by a viable counting method. Data provided by manufacturers typically show a range of microbial numbers in standardised preparations. One of the most precise reference materials, BioBalls™, is prepared using flow cytometry to dispense a precise number of cells that are immediately freeze-dried into a water-soluble sphere. Low-dose BioBalls™ typically contain 28–33 cfu of a target organism, with 95% uncertainty limits in the order of ±7 cfu. Whilst this level of precision is probably as good as can be achieved in practice, it is important to recognize that the inherent uncertainty limits can be as large as 20–25% of the mean level. If we assume a Poisson distribution, then if the mean number of viable cells is, say, 30 cfu, the SD = √30 = 5.48, and the lower and upper confidence limits on a mean count of 30 cells are 19 (2.5%) and 41 (97.5%). With such high inherent distribution variability it is not surprising that in microbiological testing there can be no 'absolute' reference values.

References

Analytical Methods Committee (1989a) Robust statistics – How not to reject outliers. Part 1: Basic concepts. Analyst, 114, 1693–1697.
Analytical Methods Committee (1989b) Robust statistics – How not to reject outliers. Part 2: Inter-laboratory trials. Analyst, 114, 1699–1702.
Analytical Methods Committee (2001) Robust statistics: A method of coping with outliers. AMC Brief No. 6. Royal Society of Chemistry, London. http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/TechnicalBriefs.asp.
Anon. (1994) Accuracy (trueness and precision) of measurement methods and results – Part 2: Basic methods for the determination of repeatability and reproducibility of a standard measurement method. International Organisation for Standardisation, Geneva. ISO 5725-2:1994.
Anon. (1989) Schweizerisches Lebensmittelbuch, Kapitel 60: Statistik und Ringversuche. EDMZ, Bern.
Anon. (1998) Accuracy (trueness and precision) of measurement methods and results – Part 5: Alternative methods for the determination of the precision of a standard measurement method. International Organisation for Standardisation, Geneva. ISO 5725-5:1998.
Anon. (2003) Microbiology of food and animal feeding stuffs – Protocol for the validation of alternative methods. International Organisation for Standardisation, Geneva. ISO 16140:2003.
Anon. (2006a) Microbiology of food and animal feeding stuffs – Guide on estimation of measurement uncertainty for quantitative determinations. International Organisation for Standardisation, Geneva. ISO TS 19036:2006.
Anon. (2006b) Reference materials – General and statistical principles for certification. International Organisation for Standardisation, Geneva. ISO Guide 35:2006.
Box, GEP and Cox, DR (1964) An analysis of transformations. J. Royal Stat. Soc., B26, 211–243; discussion 244–252.
Brown, MH (1977) Microbiology of the British Fresh Sausage. PhD Thesis, University of Bath.
Chakravarti, IM, Laha, RG and Roy, J (1967) Handbook of Methods of Applied Statistics, Vol. I: Techniques of Computation, Descriptive Methods, and Statistical Inference. John Wiley and Sons, New York, pp. 392–394.
Corry, JEL, Hedges, AJ and Jarvis, B (2007) Measurement uncertainty of the EU methods for microbiological examination of red meat. Food Microbiol., 24, 652–657.
Dixon, WJ (1953) Processing data for outliers. Biometrics, 9, 74–89.
D'Agostino, RB (1986) Tests for normal distribution. In D'Agostino, RB and Stephens, MA (eds) Goodness-of-Fit Techniques. Marcel Dekker, New York, pp. 36–41.
Eurachem (2000) Quantifying Uncertainty in Analytical Measurement, 2nd edition. Laboratory of the Government Chemist, London.
Finney, DJ (1971) Probit Analysis, 3rd edition. University Press, Cambridge, UK.
Grubbs, F (1969) Procedures for detecting outlying observations in samples. Technometrics, 11, 1–21.
Hedges, AJ and Jarvis, B (2006) Application of 'robust' methods to the analysis of collaborative trial data using bacterial colony counts. J. Microbiol. Meth., 66, 504–511.
Hedges, AJ (2008) A method to apply the robust estimator of dispersion, Qn, to fully-nested designs in the analysis of variance of microbiological count data. J. Microbiol. Meth., 72, 206–207.
Hitchins, AD (1998) Retrospective interpretation of qualitative collaborative study results: Listeria methods. AOAC International Annual Meeting 112: 102, Abstract J-710.
Hitchins, AD (2006) Proposed use of a 50% limit of detection value in defining uncertainty limits in the validation of presence–absence microbial detection methods. In Best Practices in Microbiological Methodology, AOAC Final BPMM Task Force Report, Statistics WG Reports 4a and 4b (Appendices K & L). http://www.cfsan.fda.gov/~comm/microbio.html.
Horwitz, W (1995) Protocol for the design, conduct and interpretation of method performance studies. Pure Appl. Chem., 67, 331–343.
Jewell, K (ed) (2004) Microbiological Measurement Uncertainty: A Practical Guide. Guideline No. 47. Campden & Chorleywood Food Research Association, Campden, UK.
Kilsby, DC and Pugh, ME (1981) The relevance of the distribution of microorganisms within batches of food to the control of microbiological hazards from foods. J. Appl. Microbiol., 51, 345–354.
Niemelä, SI (2002) Uncertainty of Quantitative Determinations Derived by Cultivation of Microorganisms, 2nd edition. Centre for Metrology and Accreditation, Advisory Commission for Metrology, Chemistry Section, Expert Group for Microbiology, Helsinki, Finland, Publication J3/2002.
Rousseeuw, PJ and Croux, C (1993) Alternatives to the median absolute deviation. J. Amer. Stat. Ass., 88, 1273–1283.
Shapiro, SS and Wilk, MB (1965) An analysis of variance test for normality (complete samples). Biometrika, 52, 591–611.
Smith, P and Kokic, P (1997) Winsorisation for numeric outliers in sample surveys. Bull. Int. Stat. Inst., 57, 609–610.
Stephens, MA (1974) EDF statistics for goodness of fit and some comparisons. J. Am. Stat. Ass., 69, 730–737.
Wilrich, P-Th (2007) Robust estimates of the theoretical standard deviation to be used in interlaboratory precision experiments. Accred. Qual. Assur., 12, 231–240.
Youden, WJ and Steiner, EH (1975) Statistical Manual of the AOAC. Association of Official Analytical Chemists, Washington, DC.
12 STATISTICAL PROCESS CONTROL USING MICROBIOLOGICAL DATA
Microbiological data are often filed away without much consideration of their further application. When one considers the costs of providing, staffing and running laboratories, and of obtaining and examining samples, it is at best short-sighted not to make effective use of the data generated. Procedures for statistical process control (SPC) have been around for more than 80 years and have been the subject of many books and papers (see, for instance, Shewhart, 1931; Anon., 1956; Juran, 1974; Duncan, 1986; Beauregard et al., 1992; Montgomery, 2000), although few have referred to microbiological applications. The benefits of, and approaches to, SPC for trend analysis of microbiological data were included in the report of a US FDA-funded programme undertaken by the AOAC Presidential Task Force on 'Best Practices in Microbiological Methods'. The report (Anon., 2006a,b) is freely available on the FDA website.
WHAT IS SPC?

SPC is a practical approach to continuous control of quality (and safety) that uses various tools to monitor, analyse, evaluate and control a process; it is based on the philosophical concept that it is essential to use the best available data in order to optimise process efficiency. In this context, the term 'process' is any activity that is subject to variability and includes not just production, storage and distribution within an industry, such as food processing, but also other operations where it is necessary to ensure that day-to-day operations are in control. It is therefore relevant also to laboratory monitoring and testing activities, where 'process' describes the production of analytical data, and to other forms of quantifiable activity. The concept was introduced in the late 1920s by Dr William Shewhart, whose work helped to improve efficiency in the United States manufacturing industry by moving from traditional quality control to one based on quality assurance. Hence, SPC is based on the concept of
using data obtained from process-line measurements and/or on samples of intermediate and/or finished products, rather than reliance on 100% final-product inspection. In this context, quality assurance implies that appropriate checks are 'built in' to ensure that a production process is in control (a concept not dissimilar to the HACCP approach to food safety management). Assurance of production control comes from the regular monitoring of process parameters and from tests to evaluate product, or other, samples taken during a process. The target of SPC is to achieve a stable process, the operational performance of which is predictable, through monitoring, analysis and evaluation. It is therefore one of the tools of Total Quality Management, a subject largely outside the scope of this book, but considered in the context of the definition and implementation of Food Safety Objectives in Chapter 14.

At its simplest, SPC in manufacturing production is done by operatives recording simple physical (time, temperature, pressure, etc.), chemical (pH values, acidity titrations, etc.) or other parameters critical to effective operation of a process plant. Such values are plotted on appropriately designed record forms or charts. An inherent problem in any manufacturing operation is that both management and operatives have a tendency to 'twiddle' knobs and adjust settings, probably with the best of intentions. If it is suspected that a process is not quite running properly, it is a natural human instinct to try to put matters right – but by so doing matters may get worse. Provided that control systems are operating properly, there is no necessity for human intervention – but if there is clear evidence of an 'out of control' situation developing, then action needs to be taken to identify and correct the cause. SPC provides an objective approach to aid decision-making for process control. Data obtained by analysis of critical product compositional parameters can also be charted. SPC provides a means to assess trends in the data over time in order to ensure that the products of a process conform to previously approved criteria. Evidence of significant trends away from the 'norm' requires remedial action, for example to halt production and/or to return the process operation to a controlled status. In a similar way, SPC of internal laboratory quality performance monitoring data can be used to provide assurance that analyses are being undertaken properly and to identify when potential problems have arisen.
TREND ANALYSIS

At its simplest, trend analysis involves plotting data values, for which a target value can be established, against time, production lot number or some other identifiable parameter. Note that a target value is not a product specification; rather, it is a value derived by analysis of product produced under conditions when the process is known to be 'in control'. Figure 12.1 provides a simple schematic of trend lines that lack any form of defined limits. Although the direction of the trend is obvious, in the absence of control limits there is no guidance on whether or not a process has 'gone out of control'.

One of the philosophical aspects of SPC is that once control has been achieved, the approach can be converted to Statistical Process Improvement, whereby process and, subsequently, product improvement is based on the principles of Total Quality Management (Beauregard et al., 1992). It is noteworthy that the European legislation on microbiological
FIGURE 12.1 Hypothetical illustration of trends in product temperature (°C) over time (h): (a) perfect control (never seen in real-life situations); (b) regular cycling within limits; (c) continuous reduction in temperature; (d) continuous increase in temperature.
criteria for foodstuffs recommends that manufacturers should use trend analysis to record and examine the results of routine testing (Anon., 2005).

TOOLS FOR SPC

Shewhart's original (1931) concept of a process is depicted in Fig. 12.2 as a 'cause and effect' (fishtail) diagram comprising materials, methods, manpower, equipment and measurement systems that together make up the process environment. The extent of variability (i.e. lack of control) associated with each of these factors impacts on the subsequent stages of the process, and the overall variability in a process can be determined from the individual factors, viz:

V_output = Σ V_K = V_materials + V_equipment + V_method + V_people + … + V_K

where V_K is the variance associated with an individual factor K.
FIGURE 12.2 Schematic 'cause and effect' description of a process (after Shewhart): materials, methods, people, machine and the measurement system interact within the process environment to determine the output.
The principle of SPC charts relies on the properties of the normal distribution curve (Chapter 3), which is symmetrical about the mean value, such that 95% of the population falls within the range mean ± 2 SDs and 99.7% falls within the range mean ± 3 SDs. However, as we have observed previously, the distribution of microbiological data is inherently asymmetrical and does not usually conform to the normal distribution. It is therefore necessary to use an appropriate transformation of the data, depending on its actual distribution characteristics. As explained earlier (Chapter 4), the appropriate transformation for microbiological data could be y_i = loge x_i or y_i = log10 x_i for data that are log-normally distributed (e.g. colony counts), y_i = √x_i for data conforming to a Poisson distribution (e.g. low cell or colony counts), or the inverse hyperbolic sine transformation for data conforming to a negative binomial distribution. Before any system of SPC can be used it is essential to define the process, the measurement system and the expected variability of the process characteristics to be measured. From a microbiological perspective, therefore, it is necessary to understand how a product is processed, the microbial associations of the ingredients and the product, and the procedures to be used to determine the microbial composition of the product.
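A transformation step of this kind is trivial to automate before charting. The Python sketch below illustrates the three transformations mentioned; note that the inverse hyperbolic sine form shown is a simplified version (the full negative binomial transformation discussed in Chapter 4 involves the dispersion parameter, which is omitted here).

```python
import math

def normalise(count, distribution):
    """Illustrative normalising transformations for microbial count data."""
    if distribution == "lognormal":           # typical colony counts
        return math.log10(count)
    if distribution == "poisson":             # low colony or cell counts
        return math.sqrt(count)
    if distribution == "negative_binomial":   # over-dispersed counts (simplified form)
        return math.asinh(math.sqrt(count))
    raise ValueError("unknown distribution")

print(normalise(12000, "lognormal"), normalise(9, "poisson"))
```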
SPC Performance Standards

It is necessary to introduce 'performance standards' for the implementation of SPC for microbiological data (Anon., 2006a). Such 'performance standards' are not meant to prescribe procedures that should be used for evaluating processes, but to provide guidance on the methodology to be used:

1. Charting of data is necessary to gain full benefit from SPC. Charts of output data provide a visual aid to detecting and identifying sources of unexpected variation, leading to their elimination.
2. Results to be plotted in a control chart when the process is known to be 'in control' should be normally, or nearly normally, distributed. Where this is not the case, the data must be 'normalised' by an appropriate mathematical transformation.
3. During the initial setting up of the charts it is essential that the process be operated in a stable and controlled manner, so that the data values used to establish the operational parameters of the SPC procedure are normally distributed. The 'rule of thumb' requires at least 20 sets of individual results to compute the mean, standard deviation and other summary statistics in order to estimate the distribution of the results and construct the control limits and target value.
4. Rules for evaluating process control require the assessment of both Type I and Type II errors: a Type I error declares the process to be 'out of control' when it is not, and a Type II error declares a process to be 'in control' when it is not. The probabilities to be used for this purpose are based on the α and β probabilities, respectively, of the normal distribution curve. In addition, the average run length (ARL), which is the expected number of results before an 'out of control' signal is seen, is also an important parameter.
5. When a process is believed to be 'in control', the limits for assessing individual results are set at some distance from the mean or process target value. In general, the default distance is 3 SDs, since the α (Type I error) should be kept low (i.e. <1%).
6. In using the SPC approach it is necessary to establish rules to assess whether a change in the process average has occurred. Such rules include the use of moving averages and the number of consecutive values moving in a single direction.
7. Process control limits used in SPC are not product criteria limits set by a customer or by regulation; rather, they are process-related limits. Product specifications should not be shown on control charts.
SETTING CONTROL LIMITS Assume, for a defined non-sterile food product, processed using a specific manufacturing system, that the microbial load is considered to be acceptable at the end of processing if the average aerobic colony count at 30°C is, say, 104 cfu/g (4.0 log10 cfu/g). Assume also that the log10-transformed counts conform reasonably to a normal distribution and that the standard uncertainty of the measurement method used to establish the levels of organisms from the product is 0.25 log10 cfu/g. Since we are interested only in high counts, a lower limit is not necessary. Then, for an one-sided process control limit, if the mean colony count level is 104 cfu/g (4.0 log10 cfu/g) then, for 0.01, the upper 99% confidence interval will be (4.0 2.326 0.25) 4.58 log10 cfu/g. It would then be expected that the probability of a data value exceeding 4.6 log10 cfu/g would occur on average only on 1 occasion in 100 tests. It is useful also to set a process control warning limit, e.g. for 0.05 the warning limit would be (4.0 1.645 0.25) 4.41 log10 cfu/g. These upper values, together with
CH012-N53039.indd 229
5/26/2008 8:47:11 PM
230
STATISTICAL ASPECTS OF THE MICROBIOLOGICAL EXAMINATION OF FOODS
the mean value of 4.0 log10 cfu/g, constitute the limit values drawn on the SPC control chart for average values. Data from tests on a large number of samples, taken during processing of a single batch of products, should conform (after transformation if necessary) to a normal distribution (Fig. 12.3a) and should lie within the upper and lower control limits. However, a data distribution that is skewed (Fig. 12.3b) shows the process to have been out of control even though all the data values lie within the limits. Data conforming to a normal distribution with an increased mean value (Fig. 12.3c) will include results that exceed the upper limit thus showing that some aspect of the process was not properly controlled (for instance, poor quality ingredients). Other variations could include data with larger variances, such that the distribution curve is spread more widely, data with smaller variances leading to a reduced spread of results, negatively skewed data where a majority of the values are below the control target, and multimodal distributions. A bimodal distribution (Fig. 12.3d) suggests that the process either used two different sources of a major ingredient or that a major change in process conditions occurred during manufacture. SHEWHART’S CONTROL CHARTS FOR VARIABLES DATA Control charts present process data over a period of time and compare these to limits devised using statistically based inferential techniques. The charts demonstrate when a process is in a state of control and quickly draw attention to even relatively small, but consistent, changes away from the ‘steady state’ condition, leading to a requirement to a stop or adjust a process before defects occur. Most control charts are based on the central limit theory of normal distribution using sample average and variance estimates. Of course, in the case of microbiological data, levels of colony forming units need to be normalised by an appropriate mathematical transformation (e.g. log transformation). There are four major types of control chart for variables data: 1. 2. 3. 4.
1. the x̄ chart, based on the average result;
2. the R chart, based on the range of the results;
3. the s chart, based on the sample standard deviation; and
4. the x chart, based on individual results.
The R and s charts monitor variation in the process, whereas the x̄ and x charts monitor the location of results relative to predetermined limits. Often charts are combined, for example into an x̄ and R chart or an x̄ and s chart; in addition, special versions of control charts may be used for particular purposes, such as batch operations.

x̄ and R Charts

These use the average and range of values determined on a set of samples to monitor the process location and process variation. Use of a typical x̄ and R chart is illustrated in Example 12.1. The primary data for setting up a chart are derived by analysis, using the procedure to be used in process verification, of at least 5 sample units from each of at least 20 sets
FIGURE 12.3 In control or in specification? (a) In control and within limit; (b) in control but not within limit; (c) within limit but not in control, due to skewness; (d) bimodal distribution – some, but not all, data conform to the target and limit values.
of product from the specific process. After normalisation of the data, the average (x̄) and range (R) of values are derived for each set of samples and the data are plotted as a histogram to confirm that the pattern of distribution conforms reasonably to a normal distribution. Assuming that the distribution is acceptable, the process average (the grand mean, target value x̿) and the process average range (R̄) are calculated from the individual sample set averages and ranges, using the equations:

Target value x̿ = Σx̄/n    and    Target value R̄ = ΣR/n

where x̄ = sample average, R = sample range and n = number of sample sets.
Control limits need to be established for both the sample average and the sample range data sets; if both upper and lower confidence limits (CLs) are required, these are located symmetrically about the target (centre) line. In most cases, microbiological tests of quality/safety require only upper confidence limits (UCLs). Usually these will be set at UCL_x̄ = x̿ + A₂R̄ and UCL_R = D₄R̄ for the 99% UCLs, where A₂ and D₄ are factors that depend on the number of sample units tested (see Table 12.2). The values for the target line and UCLs are drawn on the charts and are used to test the original sets of sample units for conformity (Example 12.1). If all data sets conform, the chart can be used for routine testing; but if there is evidence of non-conformance, the cause must be investigated and corrected, the process re-sampled and new targets and UCLs established. The values for the control lines should be reviewed regularly; evidence that the limits are wider than previously determined is a cause for concern and the reasons must be investigated. If process improvements occur over time, the limits can be tightened accordingly. Procedures for assessing whether or not a process is going out of control are described below.
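The chart set-up just described amounts to two averages and two multiplications. The Python sketch below is an illustrative helper for the x̄ and R chart limits; the first four sample sets of Example 12.1 are used purely to show the call, whereas a real chart requires at least 20 sets, as stressed above.

```python
def xbar_r_limits(sample_means, sample_ranges, a2, d4):
    """Target lines and upper control limits for xbar and R charts:
    UCL_xbar = grand mean + A2 * mean range;  UCL_R = D4 * mean range."""
    grand_mean = sum(sample_means) / len(sample_means)
    mean_range = sum(sample_ranges) / len(sample_ranges)
    return grand_mean, mean_range, grand_mean + a2 * mean_range, d4 * mean_range

# A2 = 0.577 and D4 = 2.114 for 5 replicates per set (Table 12.2)
print(xbar_r_limits([5.05, 5.06, 4.81, 4.76], [0.72, 0.81, 0.47, 0.34], a2=0.577, d4=2.114))
```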
EXAMPLE 12.1 DATA HANDLING FOR PREPARATION OF SHEWHART CONTROL CHARTS FOR MEAN, STANDARD DEVIATION AND RANGE VALUES (x̄, s AND R CHARTS)

Suppose that 24 sets of aerobic colony counts were done on a product prepared in one production plant. Each data set consists of 5 replicate samples, each of which was examined in duplicate. How do we prepare an SPC chart?

The counts are log10-transformed and tested for conformance with normality, skewness and kurtosis – although not perfect, the fit is adequate for purpose. The mean, standard deviation and range of log counts for each sample set are given in Table 12.1. The overall mean values were then determined: average mean colony count (x̿) = 4.98, average standard deviation (s̄) = 0.18 and average range (R̄) = 0.43 (all as log10 cfu/g).

TABLE 12.1 The Mean, Standard Deviation and Range of Aerobic Colony Counts (log10 cfu/g) for 5 Samples of Each of 24 Sets of a Fresh Food Product

Occasion     Mean (x̄)   SD (s)   Range (R)
1            5.05        0.343    0.72
2            5.06        0.359    0.81
3            4.81        0.197    0.47
4            4.76        0.140    0.34
5            4.84        0.163    0.43
6            4.72        0.107    0.29
7            4.95        0.192    0.42
8            5.11        0.149    0.40
9            5.03        0.123    0.30
10           5.06        0.235    0.59
11           5.11        0.148    0.31
12           5.13        0.198    0.48
13           4.74        0.199    0.44
14           4.80        0.142    0.35
15           4.91        0.247    0.59
16           5.13        0.144    0.35
17           5.05        0.225    0.54
18           4.97        0.093    0.25
19           4.95        0.066    0.17
20           5.05        0.140    0.35
21           5.09        0.132    0.34
22           5.04        0.145    0.37
23           5.13        0.182    0.47
24           5.01        0.255    0.54
Overall mean 4.98        0.180    0.43
Setting Control Limits (CL)

Control limits are established for each chart at 3 SD above and below the target (mean value) line; additional control lines can also be inserted at ±1 and ±2 SDs.
A 'Mean value' (x̄) chart

The centre line is set at the overall mean value (for our data, 4.98 log10 cfu/g). The upper and lower control limits are determined as:

UCL_x̄ = x̿ + A₂R̄    and    LCL_x̄ = x̿ − A₂R̄

where x̿ is the overall mean value, A₂ is a constant (Table 12.2) and R̄ is the mean range. An alternative calculation uses the average standard deviation (s̄):

UCL_x̄ = x̿ + A₃s̄    and    LCL_x̄ = x̿ − A₃s̄

For our data, with x̿ = 4.98 and R̄ = 0.43, UCL_x̄ = 4.98 + 0.577 × 0.43 = 5.23; the alternative procedure using s̄ = 0.180 gives UCL_x̄ = 4.98 + 1.427 × 0.180 = 5.24. We can determine the LCL similarly: LCL_x̄ = 4.98 − 0.26 = 4.72. These control limits are drawn on the graph and the data plotted (Fig. 12.4). All data values lie within the range 2 SD to 3 SD. Since in most microbiological situations lower counts are generally perceived as being beneficial, the LCL is rarely used. This chart will be used further in Example 12.2.

A Range (R) chart

The mean value for the range (R̄) chart = 0.43; the UCL_R and LCL_R values are derived from the equations:

UCL_R = D₄R̄    and    LCL_R = D₃R̄

where D₃ and D₄ are constants (Table 12.2). For our data: UCL_R = 2.114 × 0.43 = 0.909 ≈ 0.91 and LCL_R = 0.076 × 0.43 = 0.033 ≈ 0.03. As before, these values are entered as control limits on a graph and the range data values plotted (Fig. 12.5).

A Standard Deviation (s) Chart

The mean value of the standard deviations (s̄) = 0.180. The upper and lower control limits are determined as UCL_s = B₄s̄ and LCL_s = B₃s̄, where B₃ and B₄ are constants from Table 12.2. For our data: UCL_s = 2.089 × 0.180 = 0.376 ≈ 0.38 and LCL_s = 0.030 × 0.180 = 0.0054 ≈ 0.01.
TABLE 12.2 Some Constants for Control Chart Formulae

                         x̄ & R charts                         x̄ & s charts
Number of      x̄-chart       R-chart limits(a)        x̄-chart       s-chart limits(a)
replicates     limits A₂     D₃          D₄           limits A₃     B₃          B₄
2              1.880         –           3.267        2.659         –           3.267
3              1.023         –           2.574        1.954         –           2.568
4              0.729         –           2.282        1.628         –           2.266
5              0.577         –           2.114        1.427         –           2.089
6              0.483         –           2.004        1.287         0.030       1.970
7              0.419         0.076       1.924        1.182         0.118       1.882
8              0.373         0.136       1.824        1.099         0.185       1.815
10             0.308         0.223       1.777        0.975         0.284       1.716
15             0.223         0.347       1.653        0.789         0.428       1.572
20             0.180         0.180       1.585        0.680         0.510       1.490

(a) If a value is not shown, use the next higher value.
These values are entered onto the chart and the standard deviation values plotted (Fig. 12.6).
Interpretation of the Standard Reference Charts

The data used to derive the control limits were obtained on product samples taken when the process was believed to be 'in control'. In order to confirm that both the process and the measurements were 'in control', it is essential to chart these data before using the control charts for routine purposes. Figures 12.5 and 12.6 show that the data values for the range and standard deviation charts are randomly distributed around the mid-line, with no sequential runs of data on either side, and that they all lie between the upper and lower CLs. However, the x̄ chart (Fig. 12.4) shows that several values lie outside the 2 SD LCL and one value is on that line. In a two-sided test this would indicate that the process was not absolutely 'in control' for these particular batches – but in this instance, for bacterial counts, we are interested only in the UCL, so these low values can be ignored and the data above the centre line are 'in control'. These charts will be used in Example 12.2.
[x̄ chart: centre line (Avg) = 4.98, UCL = 5.24, LCL = 4.72 log cfu/g; sample numbers 1–24.]
FIGURE 12.4 Statistical control chart for mean ($\bar{x}$) colony count (as log10 cfu/g) on 24 sets of replicate examinations of a fresh food product produced under controlled conditions. The heavy dotted lines (– – – – –) show the upper (UCL) and lower (LCL) 99.7% CL. The lighter dotted lines (……) are the upper and lower limits for 1 SD and 2 SDs around the centre line.
[R chart: centre line (Avg) = 0.43, UCL = 0.91 log cfu/g (s = 0.19); sample numbers 1–24.]
FIGURE 12.5 Statistical control chart for range (R ) of log10 cfu/g on 24 sets of replicate examinations of a fresh food product produced under controlled conditions.
[s chart: centre line (Avg) = 0.18, UCL = 0.38 (s = 0.192); sample numbers 1–24.]
FIGURE 12.6 Statistical control chart for standard deviation (s) of log10 cfu/g on 24 sets of replicate examinations of a fresh food product produced under controlled conditions.
$\bar{x}$ and s Charts

For sample unit sizes greater than 12, the standard deviation (s) should be used to monitor variation rather than the range (R) because it is more sensitive to changes in the process. However, it should be noted that the distribution of s tends to be symmetrical only when n > 25; otherwise it is positively skewed. Thus the s chart is less sensitive in identifying non-normal conditions that cause only a single value in a sample to be unusual. The target lines for the $\bar{x}$ and s charts are $\bar{\bar{x}}$ and $\bar{s}$, respectively. The calculation of $\bar{\bar{x}}$ was shown previously; $\bar{s}$ is determined by averaging the sample standard deviations of the individual sample sets:

Centre line $= \bar{s} = \dfrac{\sum_{i=1}^{k} s_i}{k} = \dfrac{s_1 + s_2 + \dots + s_k}{k}$

where k = number of sample sets and $s_i$ = standard deviation for sample sets i = 1, 2, 3, …, k. Note, however, that this is a biased average, since the true average standard deviation is determined from:

$\bar{s} = \sqrt{\dfrac{\sum_{i=1}^{k} s_i^2}{k}} = \sqrt{\dfrac{s_1^2 + s_2^2 + \dots + s_k^2}{k}}$

The UCLs are calculated from:

$\mathrm{UCL}(\bar{x}) = \bar{\bar{x}} + A_3\bar{s}$
$\mathrm{UCL}(s) = B_4\bar{s}$

where $A_3$ and $B_4$ are standard values (Table 12.2). The charts are prepared and interpreted in a similar manner to $\bar{x}$ and R charts.
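As an aside, the difference between the two averages can be seen with a few lines of Python; the standard-deviation values below are hypothetical, not taken from the text.

# Sketch contrasting the simple average of sample standard deviations with the
# root-mean-square form described above; the values here are illustrative only.
from math import sqrt
from statistics import mean

s_values = [0.31, 0.27, 0.35, 0.22, 0.29]   # hypothetical per-sample SDs

s_bar_simple = mean(s_values)                                   # (s1 + ... + sk) / k
s_bar_rms = sqrt(sum(s**2 for s in s_values) / len(s_values))   # sqrt(mean of si^2)

print(s_bar_simple, s_bar_rms)   # the RMS form is never smaller than the simple mean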
Interpretation of $\bar{x}$, s and R Charts

If a process is ‘in control’ the data from critical parameter tests should conform to the rules for a normal distribution, that is, symmetry about the centre distribution line and randomness; 95% of data should lie within ±2s and 99.7% should lie within ±3s of the centre line. A process complying with these rules is generally stable. If this is not the case, the process may be out of control and the cause(s) must be found and rectified. The key question is ‘How do you recognise out of control situations from the chart?’ At least 11 ‘out of control’ patterns have been identified (Anon., 1956; Wheeler and Chambers, 1984; Beauregard et al., 1992). As a general rule, 6 patterns cover the majority of situations
likely to be encountered (Beauregard et al., 1992). For a single-sided test, such as is used for microbiological data, these patterns can be summarised as:

1. A single point above the UCL that is set at +3s above the centre line. The probability for any point to occur above the UCL is less than 0.14% (about 1 in 700).
2. Seven consecutive points above the centre line (probability 0.78%; about 1 in 125).
3. Two consecutive points close to the UCL; the probability of 2 consecutive values occurring close to a limit in a normal distribution is about 0.05% (1 in 2000).
4. Ten out of 11 consecutive points above the centre line (probability 0.54%; about 1 in 200). If reversed (i.e. going downwards) this pattern may also indicate a process change that warrants investigation.
5. A trend of seven consecutive points in an upward (or downward) line, which indicates a lack of randomness.
6. Regular cycling around the centre line.

Beauregard et al. (1992) suggested some possible engineering causes for these patterns but others may be equally plausible when examining microbiological data. When using microbiological colony counts or other quantitative measures it is essential to ensure that the test is itself in control. Changes to the culture media, incubation conditions and other factors discussed previously (Chapter 10) must be taken into account when examining SPC and other forms of microbiological trend analysis, to ensure that it is the process rather than the method of analysis that is ‘out of control’! Examples 12.1 and 12.2 illustrate some ‘out of control’ patterns.
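A minimal sketch, assuming a known centre line and process standard deviation, of how patterns 1, 2 and 5 above might be flagged automatically; the function name and the example series are illustrative assumptions, not part of the original text.

# Sketch of three 'out of control' run rules (patterns 1, 2 and 5 above),
# applied one-sidedly to a series of sample means.
def flag_patterns(means, centre, sd):
    ucl = centre + 3 * sd
    flags = []
    for i, x in enumerate(means):
        # Pattern 1: a single point above the UCL (+3 SD).
        if x > ucl:
            flags.append((i + 1, "point above UCL"))
        # Pattern 2: seven consecutive points above the centre line.
        if i >= 6 and all(v > centre for v in means[i - 6:i + 1]):
            flags.append((i + 1, "7 consecutive points above centre line"))
        # Pattern 5: seven consecutive points forming a rising trend.
        if i >= 6 and all(means[j] < means[j + 1] for j in range(i - 6, i)):
            flags.append((i + 1, "rising run of 7 points"))
    return flags

# Hypothetical drifting series around a centre line of 4.98 with SD 0.08:
print(flag_patterns([4.95, 5.00, 5.02, 5.05, 5.08, 5.11, 5.14, 5.30],
                    centre=4.98, sd=0.08))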
EXAMPLE 12.2 USE OF THE $\bar{x}$ AND R SPC CHARTS

Having prepared ‘standard’ SPC charts, how do we now use them to examine data from routine production, and what should we look for to indicate that a process is not ‘in control’?

We now extend the database used in Example 12.1 by adding data for a further 57 samples, obtained when the plant was running normally. We plot the mean and range of the log-transformed counts for each sample (Table 12.3) on these charts, using the centre line and UCL values derived for the mean log counts in Example 12.1 and using both the UCL and LCL for the range of counts. What do the results tell us?

The $\bar{x}$ chart (Fig. 12.7) provides a ‘mountain range’ pattern with numerous peaks and troughs. Note firstly that the mean colony counts on samples 26–28, 32, 50–52, 64–69, 75 and 79–81 all exceed the UCL, in some instances by a considerable amount. These process batches are therefore ‘out of control’ and it would be necessary to investigate the causes. Are the colony count data recorded correctly, or was there some technical reason why high counts might have been obtained in these instances? Was one or more of the ingredients different? Had a large quantity of ‘reworked’ material been blended into the batch? And so on.
TABLE 12.3 Aerobic Colony Count Data on 81 Sets of Samples of a Fresh Food Product (replicate colony counts as log10 cfu/g; five replicates per sample set, with the mean and range of each set)

Samples 1–24
Sample no.:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Replicate 1: 4.73 4.81 4.93 4.83 4.77 4.72 5.16 5.07 5.02 4.93 4.93 5.22 4.48 4.71 5.20 5.38 5.26 5.06 4.98 5.02 5.00 5.12 5.29 5.06
Replicate 2: 5.37 5.23 4.47 4.61 4.80 4.84 5.12 5.36 4.91 5.34 4.96 5.20 4.87 4.95 4.78 5.09 4.72 4.82 4.90 4.94 5.29 5.08 5.21 4.78
Replicate 3: 4.65 4.82 4.90 4.79 4.92 4.75 4.93 5.07 4.92 4.75 5.16 4.78 4.57 4.89 4.61 5.04 5.00 4.97 5.04 4.96 5.13 5.17 5.13 4.72
Replicate 4: 5.17 4.84 4.94 4.95 4.65 4.55 4.74 5.09 5.07 5.06 5.23 5.26 4.92 4.60 5.13 5.04 5.26 4.96 4.87 5.02 4.94 5.06 5.19 5.25
Replicate 5: 5.35 5.62 4.80 4.64 5.08 4.76 4.78 4.95 5.21 5.23 5.24 5.18 4.85 4.84 4.85 5.11 5.01 5.01 4.95 5.29 5.10 4.80 4.82 5.26
Mean:        5.05 5.06 4.81 4.76 4.84 4.72 4.95 5.11 5.03 5.06 5.11 5.13 4.74 4.80 4.91 5.13 5.05 4.97 4.95 5.05 5.09 5.04 5.13 5.01
Range:       0.72 0.81 0.47 0.34 0.43 0.29 0.42 0.40 0.30 0.59 0.31 0.48 0.44 0.35 0.59 0.35 0.54 0.25 0.17 0.35 0.34 0.37 0.47 0.54

Samples 25–40
Sample no.:  25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Replicate 1: 5.66 5.15 4.78 5.48 4.70 4.86 5.45 5.00 5.08 4.80 4.68 4.78 5.01 4.28 4.73 4.85
Replicate 2: 5.00 5.08 6.17 5.26 4.66 4.99 5.70 5.60 4.90 4.30 4.60 4.97 4.80 4.50 4.82 4.80
Replicate 3: 4.80 5.78 5.20 5.00 4.58 4.65 6.03 5.20 5.41 4.60 4.63 4.62 4.85 4.74 4.91 5.10
Replicate 4: 5.01 5.15 5.80 5.20 4.80 4.95 5.97 4.78 4.90 5.08 4.86 4.80 4.85 4.70 5.10 4.95
Replicate 5: 5.20 5.60 5.97 5.26 5.02 5.02 6.28 5.48 5.58 4.90 4.97 4.85 4.90 4.80 5.05 5.35
Mean:        5.13 5.35 5.58 5.24 4.75 4.89 5.89 5.21 5.18 4.74 4.75 4.80 4.88 4.60 4.92 5.01
Range:       0.86 0.70 1.39 0.48 0.44 0.37 0.83 0.82 0.68 0.78 0.37 0.35 0.21 0.52 0.37 0.55

Samples 41–81
Sample no.:  41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
Replicate 1: 4.60 4.45 4.55 5.02 4.74 4.80 5.33 5.00 5.32 5.63 5.29 5.15 4.60 4.71 4.80 5.10 5.03 4.90 4.45 4.98 5.57 5.49 5.03 5.10 5.28 5.09 5.31 5.53 4.93 5.18 4.90 4.60 5.27 5.08 4.76 5.12 5.28 5.25 5.28 5.20 5.85
Replicate 2: 4.60 4.60 4.46 4.88 4.81 4.90 4.99 5.35 5.18 5.51 5.33 5.45 4.49 4.84 4.68 5.35 4.70 4.78 4.80 4.71 5.10 5.36 5.95 5.87 5.10 5.42 5.63 5.32 5.18 4.50 5.04 4.30 4.81 5.60 5.31 5.09 5.49 4.90 5.44 5.24 6.09
Replicate 3: 4.50 4.69 4.50 4.72 4.76 5.30 5.10 5.10 5.45 5.35 5.31 4.78 4.90 5.01 5.15 5.10 4.94 4.88 4.85 5.15 4.87 5.16 5.12 4.98 5.36 5.39 5.84 5.20 5.04 5.08 4.95 4.30 4.79 5.70 4.65 4.80 5.38 5.08 4.98 5.42 5.44
Replicate 4: 4.30 4.45 4.32 4.24 4.96 5.10 5.25 5.08 5.30 5.10 5.88 5.30 4.89 5.10 5.20 4.80 4.88 4.95 5.10 5.17 5.26 4.89 5.09 5.80 5.90 5.37 5.27 5.04 5.13 4.30 5.08 5.18 4.70 5.20 4.88 5.05 5.18 5.25 5.20 5.47 5.36
Replicate 5: 4.62 5.10 4.62 4.54 4.81 4.80 5.33 5.50 5.40 5.40 5.40 5.26 5.10 4.82 4.90 4.85 4.98 5.30 5.04 5.28 4.76 5.00 5.01 4.99 5.20 5.49 5.65 5.17 5.12 4.30 5.26 5.08 5.09 5.51 5.20 5.10 4.70 5.40 5.40 5.38 5.63
Mean:        4.53 4.66 4.49 4.68 4.82 4.98 5.20 5.21 5.33 5.40 5.44 5.19 4.80 4.90 4.95 5.04 4.91 4.96 4.85 5.06 5.11 5.18 5.24 5.35 5.37 5.35 5.54 5.25 5.08 4.67 5.05 4.69 4.93 5.42 4.96 5.03 5.21 5.18 5.26 5.34 5.67
Range:       0.32 0.65 0.30 0.78 0.22 0.50 0.34 0.50 0.27 0.53 0.59 0.67 0.61 0.39 0.52 0.55 0.33 0.52 0.65 0.57 0.81 0.60 0.94 0.89 0.80 0.40 0.57 0.49 0.25 0.88 0.36 0.88 0.57 0.62 0.66 0.32 0.79 0.50 0.46 0.27 0.73
[x̄ chart: Avg = 4.98, UCL = 5.23 log cfu/g (limits based on subgroups 1–24); mean log cfu/g plotted against sample numbers 1–82.]
FIGURE 12.7 Statistical control chart for mean ($\bar{x}$) colony count (as log10 cfu/g) on 82 sets of replicate examinations of a fresh food product produced under controlled conditions. The control lines were derived from the original 24 samples (Fig. 12.4).
Note secondly that the values for batches 32–33, 50–51 and 78–79 all lie within the bounds of +2 SD to +3 SD (the UCL), another prime indicator of an ‘out of control’ situation that requires investigation. In addition, several of these batches are linked to others where the colony counts exceed the UCL. Then note that several consecutive batches of product have results lying in the area above the average control line (e.g. batches 20–28, 51–55 and 62–70); in each case there is a sequential run of 7 or more points above the line. However, it should be noted that batches 20–24 were part of the ‘in control’ batches used to set up the control chart. If we include also some values below the average line (e.g. 47–49) we have an even longer sequential run of increasing mean colony count values.

The R chart (Fig. 12.8) is less dramatic, though it still provides valuable information. It shows successive runs of range values above the average line for batches 29–35 and 61–68. In addition, the ranges on samples from batches 27 and 63 exceed the UCL for ranges, indicating that one or more of the colony counts was considerably greater or smaller than the other data values in the set. If you look at the data in Table 12.3, you will note that for batch 27 the recorded colony counts were 4.78, 6.17, 5.20, 5.80 and 5.97 log cfu/g; the analyst should have been suspicious of these wildly disparate results. Were the results calculated correctly, or was this a heterogeneous batch of product? Similarly, for batch 63 we have counts of 5.03, 5.95, 5.12, 5.09 and 5.01 log cfu/g. The second count is significantly higher than the others; should it have been 4.95 rather than 5.95? Had a simple calculation or transcription error resulted in a set of data that indicates an out of control situation?

These are just examples of how SPC charts can aid both microbiological laboratory control and also process control. Further rules for describing out of control situations
[R chart: Avg = 0.43, UCL = 0.91, LCL = none (log cfu/g); range plotted against sample numbers 1–82.]
FIGURE 12.8 Statistical control chart for range (R) of colony counts (as log10 cfu/g) on 82 sets of replicate examinations of a fresh food product produced under controlled conditions. The control lines were derived from the original 24 samples (Fig. 12.6).
within SPC charts are given in the text of this chapter. Finally, a word of warning: if you use a computer program to generate control charts it is essential, once the control limits have been set, to ensure that they are not updated automatically whenever the chart is updated with new data!
CUSUM Charts

CUmulative SUM (CUSUM) charts provide a particularly useful means of process control. They are more sensitive than the $\bar{x}$ and R charts for detection of small changes (0.5–1.5 SD) in process output, but are less sensitive when large (>2 SD) process shifts occur. Nonetheless, because of their sensitivity to smaller shifts they signal changes in the process mean much faster than do conventional charts. CUSUM charts are constructed differently: the cumulative sum of the deviations of a normalised value from a target statistic is plotted (see Example 12.3) and the charts do not have control limits as such. Instead, they use decision criteria to determine when a process shift has occurred. For standard CUSUM charts a ‘mask’ is used to cover the most recent data point; if the mask covers any of the previously plotted points it indicates that a shift in process conditions occurred at that point. Construction of the mask is complex and it is difficult to use (Montgomery, 1990). Beauregard et al. (1992) recommend that it is not used and that a CUSUM signal chart is used instead.
EXAMPLE 12.3 USE OF A CUSUM CHART TO MONITOR A CRITICAL CONTROL POINT

ATP bioluminescence is used to provide a rapid and simple method to monitor the hygienic status of a process plant Critical Control Point (CCP). The data, in the form of Relative Light Units (RLU), measure the amount of microbial or other ATP at the CCP, and the hygienic status of the process plant at that point is assessed by comparison with previously determined maximum RLU levels on a Pass/Fail basis. If the test indicates that the plant has not been properly cleaned, then cleaning and testing are repeated until satisfactory results are obtained, before starting process operations. Can we use a CUSUM chart to examine performance in monitoring this CCP?

Trend analysis of RLU data provides a simple means to monitor the efficiency of cleaning operations over a period of time. Hayes et al. (1997) used the CUSUM approach to analyse data from a dairy company. They argued that since the data set consists of only single measurements (n = 1), other forms of SPC were generally not suitable and the raw data do not conform to a normal distribution. This is not strictly true, since alternative procedures do exist; however, their data provide a useful set for CUSUM analysis. The raw data and several alternative transformations are shown in Table 12.4. Although a square root transformation (as used by Hayes et al., 1997) improves the normality of the distribution, there is still considerable skewness and kurtosis; both forms of logarithmic transformation effectively normalise the data and remove both skewness and kurtosis (Table 12.5). Figure 12.9 shows the distributions of the raw and the Ln-transformed data, ‘box and whisker’ plots (which show outlier values in the raw data) and ‘normality plots’, and demonstrates the importance of choosing the correct transformation procedure to normalise data (Table 12.5).

The CUSUM plot is set up as follows using a spreadsheet (Table 12.6). In column A we list the data reference value (e.g. day numbers) and in column B the actual data values (for this example these are the actual RLUs measured on each day). In column C we transform the data values (column B) into natural logarithms, that is, the Ln-RLU values, and in column D we show the process ‘target’ value (Ln 50 = 3.912; the control limit of 110 RLU corresponds to Ln 110 = 4.70). We subtract the target value from the transformed data value in column E, and in column F we sequentially add the values from column E to generate the CUSUM values. These values (column F) are then plotted against the day reference value (column A).

An ‘individuals’ plot of the ATP measurements (as RLUs) for each day of testing (Fig. 12.10) shows a day-to-day lack of consistency amongst the readings, with values in excess of the arbitrary control limit value of 110 RLUs (set prior to the routine use of the system) on days 46–47, 58, 69, 70, 71, 74 and 87. The standard plot shows the individual values above the limit but fails to indicate any significant adverse trends. A similar situation is seen in a plot of the Ln-transformed data (Fig. 12.11). By comparison, the CUSUM plot (Fig. 12.12) shows several adverse trends, including a major adverse trend from day 65 onwards, thereby indicating that the process was going out of control, i.e. the development of a potential problem, before it actually occurred. This is where the CUSUM plot scores over other forms of trend analysis: it is predictive and indicates that a process is going ‘out of control’ before it actually happens.
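The spreadsheet logic of columns C–F can be expressed in a few lines of code. The sketch below is illustrative only (it is not the authors’ spreadsheet); the function name is an assumption and the target of Ln 50 = 3.912 is that used in column D of Table 12.6.

# Sketch of the CUSUM columns described above: Ln-transform each RLU reading,
# subtract the target value, and accumulate the differences.
from math import log

def cusum(rlu_values, target=log(50)):
    rows, running = [], 0.0
    for day, rlu in enumerate(rlu_values, start=1):
        ln_rlu = log(rlu)            # column C: Ln-transformed reading
        diff = ln_rlu - target       # column E: deviation from target
        running += diff              # column F: cumulative sum
        rows.append((day, rlu, round(ln_rlu, 3), round(diff, 3), round(running, 3)))
    return rows

# First few days of Table 12.4, for comparison with Table 12.6:
for row in cusum([23, 46, 39, 62, 33]):
    print(row)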
However, to use the CUSUM system effectively it is necessary to develop a CUSUM Mask, which is both difficult and very time consuming. An alternate procedure is shown in Example 12.4.
TABLE 12.4 ATP ‘Hygiene Test’ Measurements on a CCP in a Dairy Plant, Before and After Transformation Using the Square Root Function (√x), the Natural Logarithm (Ln x) and the Logarithm to Base 10 (log10 x)

Days 1–46
Day:       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
RLU (x):   23 46 39 62 33 21 20 56 69 26 27 24 22 69 42 97 56 33 33 69 24 34 29 29 19 57 81 99 43 80 68 112 25 52 24 35 15 47 104 53 81 34 17 62 23 123
√x:        4.80 6.78 6.24 7.87 5.74 4.58 4.47 7.48 8.31 5.10 5.20 4.90 4.69 8.31 6.48 9.85 7.48 5.74 5.74 8.31 4.90 5.83 5.39 5.39 4.36 7.55 9.00 9.95 6.56 8.94 8.25 10.58 5.00 7.21 4.90 5.92 3.87 6.86 10.20 7.28 9.00 5.83 4.12 7.87 4.80 11.09
Ln (x):    3.14 3.83 3.66 4.13 3.50 3.04 3.00 4.03 4.23 3.26 3.30 3.18 3.09 4.23 3.74 4.57 4.03 3.50 3.50 4.23 3.18 3.53 3.37 3.37 2.94 4.04 4.39 4.60 3.76 4.38 4.22 4.72 3.22 3.95 3.18 3.56 2.71 3.85 4.64 3.97 4.39 3.53 2.83 4.13 3.14 4.81
log10 (x): 1.36 1.66 1.59 1.79 1.52 1.32 1.30 1.75 1.84 1.41 1.43 1.38 1.34 1.84 1.62 1.99 1.75 1.52 1.52 1.84 1.38 1.53 1.46 1.46 1.28 1.76 1.91 2.00 1.63 1.90 1.83 2.05 1.40 1.72 1.38 1.54 1.18 1.67 2.02 1.72 1.91 1.53 1.23 1.79 1.36 2.09

Days 47–92
Day:       47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
RLU (x):   147 13 12 15 14 76 28 29 44 15 16 123 5 41 56 9 82 22 22 63 85 34 158 164 155 79 94 319 38 19 64 50 16 25 28 35 52 28 35 52 139 89 62 55 12 67
√x:        12.12 3.61 3.46 3.87 3.74 8.72 5.29 5.39 6.63 3.87 4.00 11.09 2.24 6.40 7.48 3.00 9.06 4.69 4.69 7.94 9.22 5.83 12.57 12.81 12.45 8.89 9.70 17.86 6.16 4.36 8.00 7.07 4.00 5.00 5.29 5.92 7.21 5.29 5.92 7.21 11.79 9.43 7.87 7.42 3.46 8.19
Ln (x):    4.99 2.56 2.48 2.71 2.64 4.33 3.33 3.37 3.78 2.71 2.77 4.81 1.61 3.71 4.03 2.20 4.41 3.09 3.09 4.14 4.44 3.53 5.06 5.10 5.04 4.37 4.54 5.77 3.64 2.94 4.16 3.91 2.77 3.22 3.33 3.56 3.95 3.33 3.56 3.95 4.93 4.49 4.13 4.01 2.48 4.20
log10 (x): 2.17 1.11 1.08 1.18 1.15 1.88 1.45 1.46 1.64 1.18 1.20 2.09 0.70 1.61 1.75 0.95 1.91 1.34 1.34 1.80 1.93 1.53 2.20 2.21 2.19 1.90 1.97 2.50 1.58 1.28 1.81 1.70 1.20 1.40 1.45 1.54 1.72 1.45 1.54 1.72 2.14 1.95 1.79 1.74 1.08 1.83

Source: Based on the data of Hayes et al. (1997).
[Panels: frequency histograms of RLU (0–350) and Ln RLU (1.5–6), each with the corresponding normal quantile plot.]
FIGURE 12.9 Data distributions, ‘box and whisker’ plots and ‘normality’ plots for untransformed and Ln-transformed data (as relative light units, RLU) determined by ATP analysis of the mandrel of a dairy plant bottling machine.
TABLE 12.5 Descriptive Statistics of Data from Table 12.4. Note that the Tests for Normality, Skewness and Kurtosis are Identical for the Two Alternative Logarithmic Transformations, Both of Which Demonstrate Conformance with Normality; Note Also that Neither the Original Data nor the Square Root Transformed Data Comply with a Normal Distribution and Both Show Marked Skewness and Kurtosis

Parameter             RLU (x)              √x                   Ln (x)               log10 (x)
Number of tests (n)   92                   92                   92                   92
Mean                  54.6                 6.90                 3.726                1.618
Median                41.5                 6.44                 3.726                1.618
Variance              2109.8               7.044                0.558                0.105
Standard deviation    45.93                2.654                0.747                0.324
Normality testᵃ       4.69 (P < 0.0001)    1.33 (P = 0.0019)    0.205 (P = 0.8723)   0.205 (P = 0.8723)
Skewness              2.67 (P < 0.0001)    1.13 (P < 0.0001)    0.004 (P = 0.9864)   0.004 (P = 0.9864)
Kurtosis              11.45 (P < 0.0001)   2.18 (P = 0.0055)    0.025 (P = 0.8904)   0.025 (P = 0.8904)

ᵃ Anderson–Darling test.
Source: Based on the data of Hayes et al. (1997).
TABLE 12.6 Spreadsheet Layout for Data Used to Derive a Cumulative Sum (CUSUM) Chart

Row   A (Day)   B (RLU)   C (Ln-RLU)   D (Target, Ln-RLU)   E (Difference, C − D)   F (CUSUM)
3     0                                                                             0
4     1         23        3.135        3.912                −0.777                  −0.777
5     2         46        3.829        3.912                −0.083                  −0.860
6     3         39        3.664        3.912                −0.248                  −1.108
7     4         62        4.127        3.912                0.215                   −0.893
8     5         33        3.497        3.912                −0.416                  −1.309
9     6         21        3.045        3.912                −0.868                  −2.176
10    7         20        2.996        3.912                −0.916                  −3.093
11    8         56        4.025        3.912                0.113                   −2.979
12    9         69        4.234        3.912                0.322                   −2.657
13    10        26        3.258        3.912                −0.654                  −3.311
…     …         …         …            …                    …                       …
92    90        55        4.007        3.912                0.095                   −12.090
93    91        12        2.485        3.912                −1.427                  −13.517
94    92        67        4.205        3.912                0.293                   −13.224
[Individuals chart of RLU values: Target = 50, UCL = 110, LCL = 0, s = 33.12, for subgroups 1–92; RLU plotted against sample number.]
FIGURE 12.10 Individuals plot of the ATP data from hygiene tests on a dairy bottling mandrel.
[Individuals chart of Ln RLU values: Target = 3.90, UCL = 4.70; Ln RLU plotted against sample number (1–92).]
FIGURE 12.11 Individuals plot of the ATP data (as Ln-RLU) from hygiene tests on a dairy bottling mandrel.
[CUSUM plot: cumulative sum of Ln RLU (0 to −25) plotted against day number (0–90).]
FIGURE 12.12 CUSUM plot of the ATP data (as Ln-RLU) from hygiene tests on a dairy bottling mandrel.
The CUSUM Signal Chart

This consists of two parallel graphs of modified cumulative sum values. The upper graph is a plot of cumulative sum upper signal values in relation to an upper signal alarm limit (USAL). The lower graph records the cumulative sum lower signal values against a lower signal alarm limit (LSAL). The cumulative signal values are calculated by adjusting the nominalised mean by a signal factor (SF), which is a derived value for the level of shift to be detected (Example 12.4):

Cumulative upper signal value $= \sum (x_v - \mathrm{SF})$
Cumulative lower signal value $= \sum (x_v + \mathrm{SF})$

where $x_v$ is the nominalised mean ($\bar{x}$ − a target value). Limits may be applied to both the upper and lower signal values. The upper signal value can never be negative, so that
if the calculation indicates a negative value then it is put equal to zero and is not plotted. Similarly, the lower signal value can never be positive so a positive value is put as zero and is not plotted. Example 12.4 illustrates the derivation of a CUSUM signal plot.
EXAMPLE 12.4 CUSUM SIGNAL PLOT

Can we modify the CUSUM chart to provide more definitive information on ‘out of control’ situations?

The signal chart consists of two parallel graphs of modified CUSUM values. The upper graph plots a ‘cumulative upper signal value’ that is monitored against an upper signal alarm limit (USAL). The lower graph monitors a ‘cumulative lower signal value’ against a lower signal alarm limit (LSAL). Each data value is ‘nominalised’ by subtraction of a target value, giving the nominalised mean $x_v = \bar{x} -$ target value. The upper and lower signal values are calculated by adjusting the ‘nominalised’ mean value by a ‘signal factor’ (SF), which is a function of the extent of change (level of shift) to be detected and the standard deviation of the data, and is calculated as SF = (change to be detected, in SDs) × (sample standard deviation/2). If a change of 1 standard deviation is to be signalled by the CUSUM chart, then SF = SD/2. The signal values are calculated as:

Cumulative upper signal value $= \sum (x_v - \mathrm{SF})$
Cumulative lower signal value $= \sum (x_v + \mathrm{SF})$

The USAL and LSAL, against which any change is monitored, are set at +10 SF and −10 SF, respectively.

To use the system, the data are ‘nominalised’, adjusted by the SF and then summed cumulatively. Table 12.7 shows how the data for analysis are set out in a spreadsheet; for this example, the data are the individual Ln-transformed values used in Example 12.3. Note that in columns E and G, respectively, cumulative US values < 0 and cumulative LS values > 0 are shown as [0]; zero values are not plotted for either signal value. In deriving the table, the SD used (s = 0.75) is that of the Ln-transformed group (see Table 12.5) and the ‘nominal value’ used to ‘nominalise’ the data is the approximate mean value of the entire group (3.7).

The cumulative Upper Signal (US) values (column E) and the cumulative Lower Signal (LS) values (column G) are plotted against sample number, to give a series of individual points and lines. Whilst it is simple to plot the data manually, it is difficult to achieve the effect seen in Fig. 12.13 using normal Excel graphics; these lines were produced using an add-on programme for SPC. Figure 12.13 shows a continuous rise in US values from day 69 to day 74, with days 74 to 78 all exceeding the USAL, which is set at 10 SF (i.e. 10 × (s/2) = 3.75).
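The signal-value arithmetic just described can be sketched as follows; the nominal value (3.7) and signal factor (0.75/2) are those used in Table 12.7, but the function itself is an illustrative assumption rather than the authors’ implementation.

# Sketch of the CUSUM signal values: each Ln-RLU value is 'nominalised' against
# the nominal value, adjusted by the signal factor SF, and accumulated with
# clipping at zero (upper signal never negative, lower signal never positive).
def cusum_signals(ln_values, nominal=3.7, sf=0.75 / 2):
    upper, lower = 0.0, 0.0
    out = []
    for x in ln_values:
        xv = x - nominal                      # nominalised value
        upper = max(0.0, upper + (xv - sf))   # cumulative upper signal value
        lower = min(0.0, lower + (xv + sf))   # cumulative lower signal value
        out.append((round(upper, 2), round(lower, 2)))
    return out

# First few Ln-RLU values from Table 12.7; alarm limits are set at +/-10*SF = +/-3.75.
print(cusum_signals([3.135, 3.829, 3.664, 4.127, 3.497]))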
TABLE 12.7 Spreadsheet Data Layout for Deriving CUSUM Signal Values – Only the First 18 Values Are Shown. The Nominal Value Is Taken as 3.7 (Mean Ln Value for All 92 Data Points) and s = 0.75 (from Table 12.5)

Columns: A = Day; B = Ln RLU ($\bar{x}$); C = nominalised value, $x_v = \bar{x} - 3.7$; D = $x_v - s/2$; E = cumulative US valueᵃ, $\sum(x_v - s/2)$; F = $x_v + s/2$; G = cumulative LS valueᵇ, $\sum(x_v + s/2)$.

Day   Ln RLU   xv      xv − s/2   Cum. US   xv + s/2   Cum. LS
1     3.135    −0.56   −0.94      [0]       −0.19      −0.19
2     3.829    0.13    −0.25      [0]       0.50       [0]
3     3.664    −0.04   −0.41      [0]       0.34       [0]
4     4.127    0.43    0.05       0.05      0.80       [0]
5     3.497    −0.20   −0.58      [0]       0.17       [0]
6     3.045    −0.66   −1.03      [0]       −0.28      −0.28
7     2.996    −0.70   −1.08      [0]       −0.33      −0.61
8     4.025    0.33    −0.05      [0]       0.70       [0]
9     4.234    0.53    0.16       0.16      0.91       [0]
10    3.258    −0.44   −0.82      [0]       −0.07      −0.07
11    3.296    −0.40   −0.78      [0]       −0.03      −0.10
12    3.178    −0.52   −0.90      [0]       −0.15      −0.24
13    3.091    −0.61   −0.98      [0]       −0.23      −0.48
14    4.234    0.53    0.16       0.16      0.91       [0]
15    3.738    0.04    −0.34      [0]       0.41       [0]
16    4.575    0.87    0.50       0.50      1.25       [0]
17    4.025    0.33    −0.05      0.45      0.70       [0]
18    3.497    −0.20   −0.58      [0]       0.17       [0]

Note: The value of Ln-RLU is shown as $\bar{x}$ since in most CUSUM signal charts this would be the mean value of two or more values. The cumulative US and cumulative LS values are plotted in Fig. 12.13.
ᵃ Values < 0 are shown as [0] and are not included in the chart.
ᵇ Values > 0 are shown as [0] and are not included in the chart.
[CUSUM signal chart: signal value plotted against sample number (1–92), with USAL = +3.75, Avg = 0.00 and LSAL = −3.75.]
FIGURE 12.13 CUSUM signal plot of the ATP data (as Ln-RLU) from hygiene tests on a dairy bottling mandrel. ■ upper signal values; ◆ lower signal values.
The Interpretation of CUSUM Signal Charts

Interpretation is relatively straightforward. Any value that exceeds the USAL (or, for some purposes, falls below the LSAL) is deemed to be ‘out of control’, since the signal value limits are set to enable the detection of a predetermined change, usually 10 times the SF. Smaller signals may be indicative of less extreme changes.

Control Charts for Attribute Data

Attribute control charts can be used for the results of quantal tests (e.g. tests for the presence of salmonellae in a given quantity of sample), but are probably worthwhile only if many sample units are tested. The charts can be set up for numbers or proportions of non-conforming sample units. For microbiological tests, a number chart for non-conforming units is probably the most useful, but the number of sample units tested must generally remain constant. Anon. (2006b) provides special measures for examining and plotting data when different numbers of samples are tested.

For an np chart the centre line is the average number of non-conforming units ($n\bar{p}$), where $\bar{p}$ is the average proportion of positive results, $(np)_i$ = number of non-conforming samples in sample set i = 1 to k, and k = number of sample sets:

$n\bar{p} = \dfrac{\sum_{i=1}^{k}(np)_i}{k} = \dfrac{(np)_1 + (np)_2 + \dots + (np)_k}{k}$

The upper and lower control limits are calculated from:

$\mathrm{UCL}_{np} = n\bar{p} + 3\sqrt{n\bar{p}\,(1 - \bar{p})}$
$\mathrm{LCL}_{np} = n\bar{p} - 3\sqrt{n\bar{p}\,(1 - \bar{p})}$

where n = the sample size (so that $\bar{p} = n\bar{p}/n$). The use of this system is shown in Examples 12.5 and 12.6.
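A brief sketch of this np-chart calculation (illustrative only; the function name is an assumption), using the campylobacter counts that appear in Example 12.5:

# Sketch (not from the text) of the np-chart centre line and control limits,
# assuming a constant number of sample units (n) per sample set.
from math import sqrt

def np_chart_limits(positives, n):
    """positives: number of non-conforming (positive) units found in each
    sample set; n: number of units tested per set."""
    np_bar = sum(positives) / len(positives)      # centre line: average positives per set
    p_bar = np_bar / n                            # average proportion positive
    half_width = 3 * sqrt(np_bar * (1 - p_bar))   # 3-sigma half width
    return np_bar, max(0.0, np_bar - half_width), np_bar + half_width

# Campylobacter data of Table 12.8 (all 20 flocks, n = 30 swabs per flock):
positives = [1, 0, 2, 1, 1, 2, 0, 1, 13, 20, 12, 12, 15, 16, 14, 23, 18, 11, 19, 25]
print(np_chart_limits(positives, n=30))   # approximately (10.3, 2.5, 18.1)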
EXAMPLE 12.5 SPC OF ATTRIBUTES DATA

How can I set up an SPC chart for data on the prevalence of organisms from quantal tests?

Special versions of SPC are available for analysis of attributes data. Attributes can include the occurrence of defective products (Example 12.6), the prevalence of detection of pathogenic or index organisms in food samples, or some other measurable attribute. Let us assume that tests were done for campylobacters on swabs of 30 randomly drawn poultry carcases per flock in a processing plant; let us assume further that several flocks of poultry had been slaughtered and processed sequentially and that the first eight flocks had previously been designated as being free from campylobacters (‘camp −ve’ flocks), whilst the campylobacter history of the other flocks was unknown. The results of the examinations are summarised in Table 12.8.

TABLE 12.8 Occurrence of Positive Campylobacter Samples in Swabs of Processed Chickens from Sequential Flocks Passing Through a Processing Plant (Tests for Campylobacter)

Flock number   Number of +ve (np)   Number of −ve   Proportion +ve (p)
1              1                    29              0.03
2              0                    30              0.00
3              2                    28              0.07
4              1                    29              0.03
5              1                    29              0.03
6              2                    28              0.07
7              0                    30              0.00
8              1                    29              0.03
9              13                   17              0.43
10             20                   10              0.67
11             12                   18              0.40
12             12                   18              0.40
13             15                   15              0.50
14             16                   14              0.53
15             14                   16              0.47
16             23                   7               0.77
17             18                   12              0.60
18             11                   19              0.37
19             19                   11              0.63
20             25                   5               0.83
Total 1–8      8                    232             0.03
Total 9–20     198                  162             0.55

We will examine the data using an np control chart, where p = the proportion of positive samples and n = the total number of replicate tests per flock (n = 30), to compare the number of positive samples for each set of tests (Fig. 12.14). The chart shows the low prevalence of campylobacter-positive results from the initial eight flocks and a high, but variable, prevalence of campylobacters in the remaining flocks. The average number of positive tests over all flocks is $n\bar{p} = 10.3$ out of 30 tests (n), for which the upper (UCL) and lower (LCL) control limits can be determined as UCL = 18.1 and LCL = 2.5, using the equations $\mathrm{UCL}_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1 - n\bar{p}/n)}$ and $\mathrm{LCL}_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1 - n\bar{p}/n)}$. These values are shown on the chart (Fig. 12.14).
[np control chart: Avg = 10.3, UCL = 18.1, LCL = 2.5, n = 30, for subgroups 1–20; number positive plotted against flock number.]
FIGURE 12.14 Attributes np-chart for prevalence of campylobacter in poultry flocks processed sequentially through a commercial plant. The first eight flocks were believed to be campylobacter-free but the status of the remainder was not known. Note that the overall average number of positive tests was 10.3 out of 30. The average (Avg), upper control limit (UCL) and lower control limit (LCL) are shown by dotted lines.
However, we need to set control limits using the data from the uncontaminated flocks (1–8), for which the value of $n\bar{p} = \sum np/k = 8/8 = 1$. Since this value of $n\bar{p}$ is < 5, we cannot use the standard equations to derive the values for the UCL and LCL, which will be asymmetrical about the mean value ($n\bar{p}$). We must derive these estimates from the binomial distribution. The probability with which we can determine the occurrence of defective samples is based on the equation:

$P_i = \dfrac{n!}{(n-i)!\,i!}\,p^i (1-p)^{n-i}$

where i = number of positive results, n = number of replicate tests = 30 and p = the average proportion of positive results = 1/30 = 0.03333. The cumulative binomial probability is the sum of the individual probabilities for i = 0 … x. The probability of zero positives (i.e. i = 0) is given by:

$P_{i=0} = \dfrac{30!}{(30-0)!\,0!} \times 0.03333^0 \times (1 - 0.03333)^{(30-0)} = 0.9667^{30} = 0.3617$

Similarly, the probability of 1 positive (i = 1) is given by:

$P_{i=1} = \dfrac{30!}{(30-1)!\,1!} \times 0.03333^1 \times 0.9667^{29} = 30 \times 0.0333 \times 0.9667^{29} = 0.3741$

Hence, the cumulative probability of finding up to, and including, 1 positive is given by:

$P_{i \le 1} = P_{i=0} + P_{i=1} = 0.3617 + 0.3741 = 0.7358$
The individual probabilities of occurrence for values of i from 2 to 6 are calculated similarly and the cumulative probabilities are determined:

Number of positive tests (i)   Probability of occurrence   Cumulative probability
0                              0.3617                      0.3617
1                              0.3741                      0.7358
2                              0.1871                      0.9229
3                              0.0602                      0.9831
4                              0.0140                      0.9971
5                              0.0025                      0.9996
6                              0.00036                     0.9999
For a single-sided limit, the UCL must be set at an α value of 0.0027, such that 1 − α = 0.9973. We must therefore use the value of x = 5 for the UCL, since this is the first value for which the cumulative probability (0.9996) exceeds the limit probability of 0.9973. By definition, the LCL is zero. We can now set the control limits based on the initial 8 data sets: the average number of positive tests = 1 and the UCL = 5. However, if the chart were to be based on two-sided limits, the UCL would be based on a probability of 0.9956, in which case we would use the value for x = 4.

The chart shows that all of the succeeding data sets exceed this UCL (Fig. 12.15) and therefore demonstrate that the prevalence of campylobacters in the ‘camp +ve’ flocks is ‘out of control’ by comparison with the ‘camp −ve’ flocks. The control charts used in this example were prepared using an Excel add-in program (SPC for Excel from Business Process Improvement, Cypress TX, USA), but the charts could have been derived manually using the procedures described above.
[np control chart: limits based on subgroups 1–8, n = 30; Avg = 1, UCL = 5; number positive plotted against flock number (k).]
FIGURE 12.15 Attributes np-chart for numbers of positive tests for campylobacter in a poultry processing plant. The average and UCL (dotted lines) were derived from the first eight flocks of ‘camp −ve’ birds, giving an average number of positive tests of 1 and an upper control limit of 5. Note that the prevalence of contamination in all of the unknown flocks was greater than the UCL.
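The binomial search for the UCL used in this example can be sketched as follows; the function name is an assumption and the code simply automates the cumulative-probability table given above.

# Sketch (not the authors' code): with n = 30 tests per flock and p = 1/30,
# find the smallest count x whose cumulative binomial probability reaches the
# chosen single-sided limit (1 - 0.0027 = 0.9973).
from math import comb

def binomial_ucl(n, p, limit=1 - 0.0027):
    cumulative = 0.0
    for x in range(n + 1):
        cumulative += comb(n, x) * p**x * (1 - p)**(n - x)
        if cumulative >= limit:
            return x, round(cumulative, 4)
    return n, 1.0

print(binomial_ucl(30, 1 / 30))   # -> (5, 0.9996), i.e. UCL = 5 positives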
EXAMPLE 12.6 USE OF ATTRIBUTE DATA CHARTS IN PROBLEM INVESTIGATION

During a spell of hot weather, a beverage manufacturer received a high number of complaints regarding spoilage in several lots of one particular brand of a flash-pasteurised beverage. Spoilage was evidenced by cloudiness and gas formation. Production was halted and the product was recalled from the marketplace. The frequency of overt spoilage in bottles of product from consecutive lots, and the presence of spoilage yeasts in apparently unspoiled product, were determined to try to identify the cause(s) of the problem. The results are summarised in Table 12.9.

It is clear from the results that the spoilage problem appeared to start suddenly, probably as a consequence of the increased ambient temperatures. The plot of the spoilage data using an np attributes chart without control limits (Fig. 12.16) shows a sudden steep rise in incidence from lot 4037 onwards, followed by a gradual fall-off in lots 4042–4047. However, there was evidence of significant contamination by spoilage yeasts in the majority of the unspoiled products in all lots from 4030 onwards. Examination of cool-stored reference stock from the earlier lots also showed evidence of yeast contamination, at an average level of slightly greater than 1 organism/750 ml bottle (Fig. 12.17).
[Prevalence of defective bottles (0–60) plotted against lot number 4030–4047.]
FIGURE 12.16 Multiple np-chart showing the prevalence of spoiled products (solid line) and of unspoiled products with evidence of yeast contamination (dashed line). The arrow indicates the onset of the heat wave.
TABLE 12.9 Incidence of Overt Spoilage and Yeast Contamination in a Bottled Beverage

Lot number   Spoiledᵃ   Contaminatedᵃ   Spoiled or contaminated (%)   Incidence of low-level contaminationᵇ (%)
4030         1          35              60                            40
4031         3          32              58                            50
4032         4          36              67                            70
4033         1          48              82                            50
4034         0          39              65                            80
4035         1          25              43                            60
4036         3          42              75                            90
4037         13         10              38                            60
4038         46         5               85                            100
4039         54         4               97                            100
4040         52         8               100                           100
4041         59         1               100                           100
4042         53         7               100                           100
4043         48         12              100                           100
4044         46         13              98                            100
4045         12         48              100                           100
4046         8          50              97                            100
4047         1          59              100                           100
4048         0          60              100                           100
4049         0          60              100                           100
4050         0          57              95                            100

ᵃ Number of products; 60 bottles tested/lot.
ᵇ 10 cool-stored reference products tested/lot.
A cause and effect study was undertaken prior to examination of the process plant; Fig. 12.18 shows some of the potential causal factors, to illustrate the way in which a ‘cause and effect’ diagram can be used as an aid to problem-solving. After much investigation it was concluded that the actual cause was a major breakdown in pasteurisation efficiency, although the process records (not shown) indicated that the pasteurisation plant had operated within defined time and temperature limits. A strip-down examination of the pasteurisation plant identified a defective plate seal in the pre-heat section that had permitted a low level contamination of pasteurised product by unpasteurised material. The presence of low levels of viable yeast cells in reference samples of earlier batches of product indicated that previous lots had also been contaminated but that the organisms had not grown sufficiently to cause overt spoilage at the ambient temperatures then occurring.
[Per cent defective (30–100%) plotted against lot number 4030–4047; target = 0.]
FIGURE 12.17 Comparison of prevalence of actual product spoilage (–––––––) and the prevalence of low-level yeast contamination in cool-stored reference samples of the product (- - - - - - - - - -). Measurement
People
Environment
Policies Process Management
Storage temperature Storage time
Temperature
Time
Practices Distribution system
Quality control Records Quality assurance Records
Process operators QC/QA staff Warehouse operatives Product spoilage
Production plant Maintenance Filling plant Pasteurisation plant
HACCP Scheme CCPs Monitoring procedures/Records Verification procedures/Records
Microbial load
Ingredients
Ingredient quality Process batch Packaging materials
Process operation Cleaning procedures Process records
Machines
Methods
Materials
FIGURE 12.18 Example of a simplified ‘Cause and Effect’ (Fishbone) diagram used in the investigation of product spoilage.
Why had these yeasts not been detected in the regular checks made on products? Simply because the level of contamination was only slightly more than 1 viable yeast cell per 750 ml bottle of product, and the methodology used for routine testing relied on membrane filtration of 100 ml of product from each bottle tested. Hence, the likelihood of detecting such a low prevalence of yeasts was not high. In this example the causal factor was a pasteurisation plant defect, but a major contributory factor was the elevated ambient temperature. Another major contributory factor, albeit not a causal factor, was the inadequacy of the microbiological test system, which was not fit for the purpose required since it could not detect the presence of such a low level of product contamination. This example serves also as a warning that it is not possible to ‘test quality into a production process’!
CONCLUSION

Control charts, as a means of simple trend analysis, provide visual aids to interpretation of data that are more readily understood than are tables of data values. They identify when a process went out of control so that appropriate remedial action can be taken before ‘out of control’ equates to ‘out of specification’, with a subsequent manufacturing crisis.

References

Anon. (1956) Statistical Quality Control Handbook, 2nd edition. Delco Remy, Anderson, IN.
Anon. (2005) Commission Regulation (EC) No 2073/2005 of 15 November 2005 on microbiological criteria for foodstuffs. Official J. Europ. Union, L338, 1–26, 22 December 2005.
Anon. (2006a) Final Report and Executive Summaries from the AOAC International Presidential Task Force on Best Practices in Microbiological Methodology: Part II F Statistical Process Control. USFDA. http://www.cfsan.fda.gov/~acrobat/bpmm-f.pdf
Anon. (2006b) Final Report and Executive Summaries from the AOAC International Presidential Task Force on Best Practices in Microbiological Methodology: Part II F Statistical Process Control – 1. Appendices for Statistical Process Control. USFDA. http://www.cfsan.fda.gov/~acrobat/bpmm-f1.pdf
Beauregard, MR, Mikulak, RJ and Olson, BA (1992) A Practical Guide to Statistical Quality Improvement. Van Nostrand Reinhold, New York.
Duncan, AJ (1986) Quality Control and Industrial Statistics, 5th edition. Richard D. Irwin, Homewood, IL.
Hayes, GD, Scallan, AJ and Wong, JHF (1997) Applying statistical process control to monitor and evaluate the hazard analysis critical control point hygiene data. Food Control, 8, 173–176.
Juran, JM (1974) Quality Control Handbook, 3rd edition. McGraw-Hill Book Co., New York.
Montgomery, DC (2000) Introduction to Statistical Quality Control, 4th edition. John Wiley & Sons, New York.
Shewhart, WA (1931) Economic Control of Quality of Manufactured Product. Van Nostrand Co. Inc., New York.
Wheeler, DJ and Chambers, D (1984) Understanding Statistical Process Control, 2nd edition. Statistical Process Control Inc., Knoxville, TN.
13
VALIDATION OF MICROBIOLOGICAL METHODS FOR FOOD
Dr Sharon Brunelle*
Microbiological methods, either qualitative (quantal; presence/absence) or quantitative, are composed of multiple steps, including sample preparation, sample analysis, data interpretation and confirmation (Fig. 13.1). The optimal integration of these steps defines the method and is critical to providing reliable results. Nowadays, microbiological methods range from fully manual methods, through partially automated methods, to fully automated methods.

Quantal methods for the detection of pathogens in food are generally aimed at detecting 1 colony-forming unit (cfu) per test portion (typically 25 g of food matrix). In order to achieve this level of detection, the organisms in the test portion of food must be enriched to a level detectable by the analytical assay. The detectable level for polymerase chain reaction (PCR) methods is in the range of 10³–10⁴ cfu/ml and for immunoassays is 10⁵–10⁶ cfu/ml. Both raw foods and processed foods set challenges at a detection level of 1 cfu/test portion. Raw foods carry a higher bacterial load than processed foods, so the challenge is to detect 1 cfu of target
[Qualitative detection: sample preparation → sample enrichment → analysis → isolation and confirmation. Quantitative estimation: sample preparation → analysis → data processing.]
FIGURE 13.1 Steps comprising qualitative and quantitative microbiological methods.

*Brunelle Biotech Consulting, Technical Consultant to AOAC International and AOAC Research Institute, Woodinville WA, USA.
organism among 10⁴–10⁶ cfu of background flora; as a result, the enrichment media used often contain selective agents to suppress non-target organisms. For processed foods, the challenge lies in the state of the organism. After processing, any surviving microbial cells are likely to be injured, so the method must allow for recovery of injured cells. In this case, a pre-enrichment broth without selective agents can be employed to allow the organism to repair itself and reproduce. Transfer to a selective broth, or addition of selective agents, can then be used to suppress non-target organisms once repair and recovery have taken place. Examples of qualitative methods include the use of chromogenic agars, enzyme-linked immunosorbent assays (ELISAs), lateral flow immunoassays, enzyme-linked gene probe assays and PCR techniques.

Quantitative methods are aimed at estimating the level of pathogens, coliforms and E. coli, yeast and mould, or total aerobic bacterial load. A test portion, typically 50 g of food matrix, is diluted into a broth or buffer at a ratio of 1:9. The diluted, homogenised test portion is analysed directly without enrichment. Quantitative methods include direct methods, such as enumeration by plate count, and indirect methods, such as the most probable number (MPN), impedance measurement and real-time PCR (RT-PCR). MPN procedures require replicate tubes at multiple dilutions with a qualitative readout of each tube, followed by calculation of the ‘most likely’ estimate of the contamination level based on the number of tubes positive at each dilution level. Impedance methods measure the time required, under controlled growth conditions, to cross a threshold impedance value; the time required to cross the threshold is related to the contamination level of the sample using a standard curve. RT-PCR measures the number of cycles required to cross a threshold value (the Ct value). The Ct value relates to the contamination level of the sample in that the higher the contamination level, the fewer PCR cycles required to reach the threshold. Ct values are converted to cfu/test portion using a standard curve, usually built into the RT-PCR software.
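As a hedged illustration of the Ct-to-count conversion just mentioned: the sketch below assumes a log-linear standard curve, and the slope and intercept are invented placeholders rather than values taken from any real assay or software.

# Illustrative sketch only: converting an RT-PCR Ct value to an estimated
# contamination level via a hypothetical log-linear standard curve.
def ct_to_log10_cfu(ct, slope=-3.32, intercept=38.0):
    """Standard curve of the form Ct = slope * log10(cfu) + intercept,
    so log10(cfu) = (Ct - intercept) / slope; slope and intercept are
    placeholder values for illustration."""
    return (ct - intercept) / slope

# A lower Ct implies a higher contamination level:
for ct in (30.0, 25.0, 20.0):
    print(ct, round(ct_to_log10_cfu(ct), 2), "log10 cfu/test portion")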
THE STAGES OF METHOD DEVELOPMENT

Method development begins with a concept of the intended use of the method. This defines the type of test, the target analyte, the applicable foods or matrices and the user of the method. The type of test may be qualitative detection or quantitative estimation of contamination. The target analyte, meaning the microorganism or toxin that the method detects, quantifies or identifies, can be a genus, a species or a serovar.

Matrices are generally foods or environmental surfaces. Foods are typically divided into categories, such as meat, poultry, seafood, fruits and vegetables, etc., and these categories are subdivided into raw, heat processed, frozen, fermented, etc. as appropriate. Examples of environmental surfaces are those that can be found in a food manufacturing facility, such as stainless steel, rubber, sealed concrete, ceramic or glass, plastic, wood, food-grade painted surfaces, air filter material and cast iron.

In microbiological food examination, the end user is typically a trained laboratory technician. However, there are instances where this may not be the case, such as detection
of biological threat agents, where the end user could be a trained ‘first responder’ in the field. As an example of an intended use statement, the method developer might state, ‘this method is intended for the detection of Salmonella species in raw poultry in a microbiological laboratory’.

Method development can begin once the concept of the method is defined by the intended use. First, the ‘assay critical’ reagents are identified. If the method is an immunoassay, antibodies are screened against pure culture target and non-target organisms to identify highly selective candidates. For molecular-based assays, DNA sequences are identified and targeted with primers and probes. These primers and probes are screened against DNA derived from target and non-target organisms. Performance parameters, such as sensitivity, specificity, Ct values and the like, are measured under various assay conditions to optimize performance and to determine the best candidate critical reagents.

Assays are then constructed around the ‘best candidate’ critical reagents, based on the initial pure-culture screening. The assays are challenged with pure cultures and a few selected foods representative of the breadth of the intended matrices. During this process, sample preparation methodology is also varied to ensure that the target analyte is presented to the critical reagent(s) in an optimal manner that facilitates detection. For immunoassays, this means ensuring that the antigenic target is accessible to the antibody.

In the final stages of method development, the optimized assay configuration is tested against matrix samples inoculated at various levels of target analyte. Final optimization of assay conditions (buffers, critical reagent concentrations, reaction times, reaction temperatures, etc.) occurs at this stage. Statistically designed experiments should be carried out to verify optimal assay conditions, to examine the ruggedness of the assay and to establish ‘guard bands’ on the critical assay parameters. Guard bands are defined in this context as the variations of assay conditions that are tolerated by the method without significantly affecting the method results.

Once method development is complete, the method is transferred to the manufacturing facility and process development occurs. The goal of process development is to devise a manufacturing scheme that produces assay components consistent with the final assay design in a reproducible manner. Some re-optimization of the assay design may be required upon scaled-up production in order to achieve the assay performance observed in the final method development stage. Additionally, new ‘guard bands’ may need to be established on full-scale manufactured assay components. The manufactured assay components and method instructions are now ready for validation.
WHAT IS VALIDATION?

Validation is the establishment of method performance in a single laboratory or in multiple laboratories under controlled conditions. A method can be validated to demonstrate that it performs as claimed; that it performs at least as well as a validated standard method; or that it performs to a set of established criteria. In general, current practices in food microbiology
(Anon, 2003; Feldsine et al., 2002) validate by comparison to a standard, typically an official or regulatory method, such as an AOAC International Official Method of Analysis℠ (AOAC, 2007a), an FDA Bacteriological Analytical Manual (BAM) method (FDA, 2007), a USDA Microbiological Laboratory Guidebook (MLG) method (USDA, 2007) or an International Standards Organization (ISO) method (Anon, 2007). Some organizations, such as ISO and AOAC, are re-visiting validation standard practices and considering a paradigm shift to establishing method performance independent of comparison to a standard (FDA, 2007b). This could result in accepting evidence to demonstrate that a method performs as claimed or, alternatively, accepting comparative method performance against established acceptance criteria. The former requires the end user to determine whether the method performance meets their needs for a particular use, that is, whether the method is ‘fit for purpose’. For example, a method might detect Salmonella reliably down to a level of 10 cfu/25 g. While this does not meet the regulatory standard of 1 cfu/25 g, there might be non-regulatory uses for such a method. The latter establishes whether the method meets the requirements for a particular intended use, such as regulatory testing. In the range of 1 cfu/test portion, it is also feasible to compare method performance to the theoretical Poisson distribution of target organisms in the food matrix.

Validation of a microbiological method for food generally includes a test of inclusivity and exclusivity to establish the analytical selectivity of the method for the analyte. In other words, is the method inclusive of the scope of the target analyte and is it exclusive of cross-reactivity to closely related non-target analytes? Such pure culture studies establish the analytical scope of the method. The method is then challenged with a range of artificially or naturally contaminated food matrices to establish: (1) that the food matrices do not interfere with either the growth of the organism or its detection; and (2) that background flora found naturally in foods do not suppress the enrichment or detection of the target organism. It may be discovered, for example, that a particular method well suited to detection of an analyte in processed foods, where background flora are generally very low, does not perform well with unprocessed or raw foods, which tend to have much higher levels of competing background flora. It is for these reasons that a method should be considered validated only for those foods or food types that have been tested successfully in the validation study.

While validation establishes the method performance in one or multiple laboratories, verification establishes proper implementation of a method in the end user’s laboratory. Using a specified verification protocol, the laboratory performs the method and ensures that results are comparable to the method performance established in the validation study.
Selectivity Testing

For inclusivity testing of qualitative methods, 50 or more strains are chosen for analysis based on the target scope of the method. For example, if the target group is a family, then representative strains from the various genera within that family are chosen. If the target is a species, then a range of strains within that species should be chosen to represent the antigenic or genetic
diversity. In the case of a genus-level test for Salmonella, 100 strains should be chosen due to the much greater size and variation of the genus compared to other pathogens. Exclusivity testing requires at least 30 organisms closely related to the target group. The test results (positive or negative) of each organism are reported and % inclusivity and % exclusivity are expressed as the ratio of the number of organisms correctly detected to the number of organisms tested. For instance, a method for which the inclusive strains resulted in 49 positive responses out of 50 strains tested, has an inclusivity ratio of 49/50. Selectivity testing is not applicable to quantitative methods that measure general categories of organisms, such as total aerobic counts or yeast and mould counts. For quantitative methods that target a genera or species, however, 30 strains of the target microorganism are tested. Exclusivity requires at least 20 closely related non-target organisms. As for qualitative methods, the % inclusivity and % exclusivity values are determined.
Method Comparison: Qualitative Methods
Qualitative method validation studies can be divided into two types – paired sample design and unpaired sample design (Fig. 13.2). A paired sample design is one in which a single test portion is enriched and the enriched test portion is analysed by both the alternative method (the new method being validated) and the reference method (the established official or regulatory method). This design results from a common enrichment scheme for the two methods. When the enrichment conditions differ between the two methods, an unpaired sample design is used. In the unpaired design, distinct test portions are enriched and analysed by the two methods or the enrichment schemes diverge after a common pre-enrichment. It must be noted that, regardless of presumptive results, the alternative method test portion enrichment cultures must be subjected to the confirmatory procedures outlined in the reference method in order to establish the true status (positive or negative) of the test portions. The distinction between paired and unpaired study designs is critical to the resulting statistical analyses. Comparison of two qualitative methods is best accomplished near the limit of detection of one or both of the methods. It is only at low levels of contamination (approximately 0.2–2 cfu/test portion) that differences between the methods can be observed. Thus, the goal of the method comparison study is to achieve an artificial or natural contamination level that yields fractional positive results across all replicates, preferably near 50% positive for one of the methods. The data can then be statistically analysed for a significant difference between the methods. If fractional positive results are not obtained for at least one of the methods, then the study is repeated at a higher or lower contamination level as needed to achieve fractional positive results. To avoid having to repeat a study, many analysts will inoculate at more than one level to increase the chances of obtaining fractional positive results. The inoculation of food matrices is carried out using a single strain of target organism. Food isolates are preferred and a different strain is used for each food type tested. The inoculating strain is cultured in an appropriate enrichment broth and the concentration is determined by plate count. Enough food matrix is inoculated at one level to carry out testing of replicate test portions by the alternative and reference methods as well as the test portions required for MPN analysis (Blodgett, 2003; Goulden, 1959). Typically, these amounts are approximately 900 g for paired samples and approximately 1400 g for unpaired samples.
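The link between the inoculation level and the proportion of positive replicates can be illustrated if the target cells are assumed to be randomly (Poisson) distributed in the bulk-inoculated matrix: a test portion then contains at least one viable cell with probability 1 − exp(−m), where m is the mean number of cfu per portion. The sketch below is illustrative only; it assumes a perfect detection method and a Poisson distribution of cells.

import math

# Probability that a test portion contains at least one viable cell,
# assuming target cells are Poisson-distributed with mean m cfu per portion
# and that any portion containing >= 1 cell gives a positive result.
def prob_positive(mean_cfu_per_portion):
    return 1.0 - math.exp(-mean_cfu_per_portion)

for m in (0.1, 0.2, 0.5, 0.7, 1.0, 2.0, 5.0):
    print(f"mean = {m:4.1f} cfu/portion -> expected {100 * prob_positive(m):5.1f}% positive")
# A mean of ~0.7 cfu/portion gives ~50% positives; above ~5 cfu/portion
# essentially every portion is positive and the methods cannot be distinguished.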
FIGURE 13.2 Examples of paired and unpaired study designs: (a) Paired samples; (b) Unpaired samples with different secondary enrichment; and (c) Unpaired samples with different primary enrichment. (Each panel shows 25-g test portions taken through enrichment, reference and alternative method detection, and confirmation of presumptive results.)
Statistical Analysis of Paired Sample Designs: Single Laboratory and Collaborative Laboratory Validation
Data are collected and analysed by food matrix and contamination level. The food matrix is inoculated in bulk, homogenized, and allowed to stabilize for an appropriate period of time. On the day of analysis, replicate 25-g test portions are randomly removed from the bulk and homogenized in the enrichment medium at a 10-fold dilution of the test portion (25-g food matrix with 225 ml enrichment medium). On the same day that replicate test portion enrichments are begun, the bulk contaminated matrix is also examined by MPN (Blodgett, 2003) to determine the contamination level at the initiation of the study. Typically, a three-tube MPN at three or four levels is carried out. For example, the technician might analyse three 100 g portions, three 10 g portions, three 1 g portions and three 0.1 g portions. Following analysis by the reference method, the number of positives at each level is compared to an MPN table to yield the probable level of contamination of the bulk matrix. Single laboratory studies involve the analysis of twenty replicate test portions at a single contamination level by both methods. Collaborative studies typically include 12–15 laboratories each analysing six replicate test portions at each contamination level. At least 10 valid data sets for each food type are required. Collaborative data are first reviewed for completeness and any laboratory yielding data that appear aberrant is questioned to determine whether there is cause to eliminate the data set. Sample integrity upon receipt, equipment malfunctions and protocol deviations would be reasons for eliminating aberrant data from further analysis. If no assignable cause can be determined, the data are considered valid and representative of interlaboratory variation. Such variation could be indicative of poorly written method instructions, poor training or even variation in the prevalence of contamination across the test portions, rather than the inherent variability of the method itself, and therefore aberrant data should always be investigated thoroughly to determine whether improvements should be made. Once the initial data review is complete, multi-laboratory collaborative data should be checked for inter-laboratory homogeneity using a Pearson Chi-square test (LaBudde, 2006). The Chi-square test can be carried out by constructing a 2 × L contingency table (Table 13.1), with one row indicating the number of positive results for the method across laboratories, and the other row indicating the negative results. The χ² value is calculated by the equation:
χ² = Σ{(ai − E1i)²/E1i + (bi − E2i)²/E2i},
where the expected values E1i and E2i are given by E1i = A(ai + bi)/(A + B) and E2i = B(ai + bi)/(A + B), and ai, bi, A and B are as defined in Table 13.1; a short computational sketch follows the table. The P-value of the calculated χ² is computed for L − 1 degrees of freedom. If the value of P is less than 0.05 (i.e. the calculated χ² exceeds the tabulated value at P = 0.05), the data are not homogeneous across laboratories and consequently the equations for between-method comparisons are not valid. If the data are homogeneous across laboratories, then the collaborative data can be compiled by food matrix and contamination level for further statistical analysis.
TABLE 13.1 Contingency Table for Inter-laboratory Homogeneity

Result      Lab #1   Lab #2   ...   Lab #L   Total
Positive    a1       a2       ...   aL       A
Negative    b1       b2       ...   bL       B

a1 through aL are the number of test portions yielding positive results and b1 through bL are the number of test portions yielding negative results by the candidate method, for laboratories 1 through L, respectively. A and B are the total number of test portions yielding positive or negative results, respectively, by the candidate method.
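The homogeneity calculation can be sketched in a few lines of Python. This is an illustration only: the laboratory counts below are invented for the example, and the formula is the one given above for the 2 × L table.

import numpy as np
from scipy.stats import chi2

# Hypothetical example: positives (a_i) and negatives (b_i) for the candidate
# method in each of L = 5 laboratories (invented data for illustration).
a = np.array([5, 4, 6, 3, 5])   # positive test portions per laboratory
b = np.array([1, 2, 0, 3, 1])   # negative test portions per laboratory
A, B = a.sum(), b.sum()

# Expected counts under homogeneity, as defined in the text / Table 13.1.
E1 = A * (a + b) / (A + B)
E2 = B * (a + b) / (A + B)
chi_sq = np.sum((a - E1) ** 2 / E1 + (b - E2) ** 2 / E2)

df = len(a) - 1                 # L - 1 degrees of freedom
p_value = chi2.sf(chi_sq, df)
print(f"chi-square = {chi_sq:.3f}, df = {df}, P = {p_value:.3f}")
# A small P value (< 0.05) indicates that the laboratories are not homogeneous
# and the data should not simply be pooled for the between-method comparison.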
Performance Indicators
The performance indicators for qualitative methods include sensitivity, specificity, false negative rate and false positive rate. Sensitivity is defined as the proportion of true positive samples that test positive by the alternative method at a single contamination level. Likewise, specificity is defined as the proportion of true negative test portions that test negative by the alternative method. The true status of each test portion is typically defined by the reference method cultural results. The false negative rate is the proportion of true positive test portions that test negative by the alternative method and is equal to 1 minus sensitivity. The false positive rate is likewise the proportion of true negative test portions that yield positive results by the alternative method and is equal to 1 minus specificity. Because of the simple relationships between sensitivity and false negative rate, and between specificity and false positive rate, it is not necessary to report both sets of performance indicators. Typically the false positive and false negative rates are more meaningful to the end user and are the preferred indicators to report. Calculation of the performance indicators is easily accomplished by tabulating the single laboratory or compiled collaborative laboratory data as shown in Table 13.2. The performance indicators are based on the values of a, b, c and d as defined in Table 13.2. They are calculated as follows:
Sensitivity = a/(a + b)
False negative rate = b/(a + b) = 1 − sensitivity
Specificity = d/(c + d)
False positive rate = c/(c + d) = 1 − specificity
Sensitivity, or relative sensitivity, and false negative rate vary with the level of contamination of the matrix and, therefore, these performance indicators are reported in conjunction with the contamination level.

Test for Significant Difference
McNemar's Chi-square (χ²) test is used to determine whether the two methods are significantly different (Siegel, 1956). This is not to say that the methods are equivalent if no significant
TABLE 13.2 Data Tabulation of Paired Sample Method Comparison Study

                             Alternative method positive   Alternative method negative
Reference method positive              a                             b
Reference method negative              c                             d

a = number of positive replicate tests by both the candidate and reference methods; b = number of negative replicate tests by the candidate method that are positive by the reference method; c = number of positive replicate tests by the candidate method that are negative by the reference method; d = number of negative replicate tests by both the candidate and reference methods.
difference is found, but rather that a significant difference was not detected. Tests for statistical equivalence require more statistical power and thus higher numbers of replicate test portions. Using the data from Table 13.2, χ² is calculated using the McNemar formula:
χ² = (b − c)²/(b + c)
The experimental χ² value is compared to the tabulated χ² value with ν = 1 degree of freedom at P = 0.05 (χ² = 3.84) (Pearson and Hartley, 1974). Experimental values greater than the tabulated value indicate a significant difference between the two methods.
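The performance indicators and the McNemar test can be implemented directly from a 2 × 2 table laid out as in Table 13.2. The counts in the sketch below are invented for illustration; the formulas are those given in the text (McNemar without continuity correction).

from scipy.stats import chi2

# Hypothetical paired-study counts laid out as in Table 13.2.
a, b, c, d = 18, 3, 1, 18   # a, d = agreements; b, c = discordant results

sensitivity = a / (a + b)
specificity = d / (c + d)
false_negative_rate = b / (a + b)   # = 1 - sensitivity
false_positive_rate = c / (c + d)   # = 1 - specificity

# McNemar's test, as given in the text.
mcnemar_chi_sq = (b - c) ** 2 / (b + c)
p_value = chi2.sf(mcnemar_chi_sq, df=1)

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"false negative rate = {false_negative_rate:.2f}, "
      f"false positive rate = {false_positive_rate:.2f}")
print(f"McNemar chi-square = {mcnemar_chi_sq:.2f}, P = {p_value:.3f} "
      f"(critical value 3.84 at P = 0.05)")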
Statistical Analysis of Unpaired Sample Design: Single Laboratory and Collaborative Laboratory Validation It has become common for commercial test kit manufacturers also to develop proprietary enrichment media, precluding the use of the paired sample design. As for the paired sample design, the bulk matrix is inoculated, homogenized and stabilized and test portions are taken from the bulk. Current practice is to randomly remove a 2x test portion (50 g), homogenize, and split this into two 25 g test portions, one to be analysed by the reference method and one to be analysed by the alternative method. All test portions analysed by the alternative method, regardless of presumptive result, are subjected to the reference method confirmation procedure to establish the true status of the test portions. Because there is no protocol for unpaired samples, statistical analyses of data from the test portions are performed in the same way as for paired samples. But if we assume Poisson distribution of target cells, at the low inoculation levels required for fractional positive results, then we cannot assume that the two test portions are equivalent with respect to the presence or absence of target organism. Hence, pairing of the test results is not justified but current validation guidelines (Anon, 2003; Feldsine et al., 2002) do not adequately address this situation. Alternative statistical methodologies for unpaired sample study designs are presented below.
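The point about low-level contamination can be put numerically. Assuming a Poisson distribution of cells (an assumption for illustration, not part of the validation protocols), the sketch below estimates how often two 25-g portions split from the same 50-g homogenate will disagree in the mere presence or absence of the target organism.

import math

# Assume the homogenate contains cells at a mean of m cfu per 25-g portion and
# that the cells are Poisson-distributed, so each portion independently
# contains zero cells with probability exp(-m).
def prob_discordant(m):
    p_absent = math.exp(-m)            # P(portion contains no target cells)
    p_present = 1.0 - p_absent
    return 2.0 * p_present * p_absent  # one portion 'occupied', the other not

for m in (0.2, 0.5, 0.7, 1.0, 2.0):
    print(f"mean = {m:3.1f} cfu/25 g -> "
          f"P(portions disagree in presence/absence) = {prob_discordant(m):.2f}")
# At the fractional-positive levels used in validation (~0.2-2 cfu/portion)
# the two 'paired' portions frequently differ, so treating the results as
# paired observations is not justified.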
TABLE 13.3 Data Tabulation of Unpaired Sample Method Comparison Study

                                           Confirmed positive   Confirmed negative
Alternative method   Presumptive positive          A                    B
                     Presumptive negative          C                    D
Reference method                                   E                    F

A = number of presumptive positive replicate results that were confirmed positive; B = number of presumptive positive replicate results that were confirmed negative; C = number of presumptive negative replicate results that were confirmed positive; D = number of presumptive negative replicate results that were confirmed negative; E = number of replicates that gave positive results by the reference method; F = number of replicates that gave negative results by the reference method.
Performance Indicators
A data table (Table 13.3) is constructed from either single laboratory data or collaborative data (after removal of any invalid data sets and testing for lab-to-lab homogeneity as discussed above) compiled by food matrix and contamination level. The following performance parameters can be defined:
Relative sensitivity – the proportion of presumptive positive results that confirmed positive for the alternative method relative to the proportion of positive results for the reference method = A/E
False positive rate – the proportion of confirmed negative test portions for the alternative method that yielded presumptive positive results = B/(B + D)
False negative rate – the proportion of confirmed positive test portions for the alternative method that yielded presumptive negative results = C/(A + C)
Note that presumptive results are not reported for the cultural reference methods. Note also the assumption that false negative results cannot be obtained by the reference method, notwithstanding the microbial distribution issues at low inoculum levels, since such false negatives cannot be detected in unpaired samples. Occasionally false negatives from the reference method are seen in paired sample studies, for instance where a PCR method is positive and the reference method is negative. Upon further examination (alternative confirmation procedures or sheer persistence) one may eventually find a target colony.

Test for Significant Difference
For comparison of methods in an unpaired sample design, the Mantel-Haenszel χ² test (Siegel, 1956) is used. The test statistic is:
χ² = (n − 1)(AF − (B + C + D)E)² / [(A + E)(B + C + D + F)(E + F)(A + B + C + D)],
where n = A + B + C + D = E + F.
The test statistic is compared to a tabulated χ² value, with ν = 1 df at the designated probability level, usually P = 0.05, for which the tabulated χ² value is 3.84.

Method Comparison for Quantitative Methods: Single Laboratory and Collaborative Laboratory Studies
Using either naturally contaminated or artificially contaminated food, three lots of the food matrix covering at least 3 log units of contamination are analysed by both the alternative and reference methods. If artificially contaminated food is tested, then one lot of uninoculated food matrix must also be included. The lowest contamination level should be close to, but not at, the limit of detection of the method. In the single laboratory study, five replicate test portions at each level are examined by each method. Collaborative studies require a minimum of eight valid data sets for each food type, so it is recommended that 10–14 laboratories participate in the study. Each laboratory must analyse two test portions per contamination level per food matrix.
Examination of Data for Outliers
The data are first subjected to visual inspection for obvious aberrant data. If aberrant data are observed, the laboratory is contacted to determine whether there is cause for removal of the data set, as mentioned previously. The data can also be examined for statistical outliers using the Grubbs, Cochran or Dixon tests (see Chapter 11), but removal of statistical outliers is now being discouraged in favour of investigating outlier data for assignable cause. Additionally, robust methods of statistical analysis are becoming more widely accepted.

Graphical Representation of Data
Quantitative data are first normalized by logarithmic transformation, and then the data for each food matrix are graphed with the alternative method results on the y-axis and the reference method results on the x-axis. Linear regression is performed to determine the slope and linear correlation coefficient of the line. This is most easily done using a program such as Microsoft Excel®. Figure 13.3 shows an example of pooled raw meat data from a total aerobic count method (Kodaka, 2004) graphed against the reference standard plate count method (AOAC, 2007b).

Performance Parameters
For quantitative methods, the performance parameters are repeatability, reproducibility and relative standard deviation (see Chapter 11). This allows comparison of the method variability at different concentrations. For each contamination level of each food in a single laboratory, the mean of the log-transformed values is calculated and the standard deviation, sr, determined.
FIGURE 13.3 Graph of pooled raw meat data from the validation of the Nissui Compact Dry TC (reproduced from Kodaka, 2004, by permission of AOAC International). Methods correlation at 35°C (n = 60): log cfu/ml Compact Dry (y-axis) plotted against log cfu/ml pour plate (x-axis); regression slope 0.9914, intercept 0.0061, R² = 0.9955.
From collaborative study data, the mean of the log-transformed data across all laboratories and the standard deviation, sR, are calculated for each contamination level of each food (see Table 13.5). In each case, the standard deviation values are divided by the mean log count of colonies to arrive at the relative standard deviations.
Comparison of Means
The mean log10 values for the alternative and reference methods are compared for each lot or contamination level of each matrix. For paired samples, the paired t test is most appropriate. For unpaired samples, the independent t test can be used for comparison of means from two methods and a one-way analysis of variance (ANOVA) can be used if two or more method means are to be compared. The paired t test (Goulden, 1959) is performed by calculating the mean difference between paired samples and thence the standard error of the mean difference. Dividing the mean difference by the standard error of the mean difference yields the t-statistic, which is t-distributed with n − 1 degrees of freedom. Thus,
t = (x̄1 − x̄2)/√(s²/n),
where x̄1 is the mean of the first method, x̄2 is the mean of the second method and s² is the variance of the paired differences, so that x̄1 − x̄2 is the mean difference and √(s²/n) is its standard error. The calculated t-value can be compared to a table of critical values of the t distribution (Pearson and Hartley, 1958) to determine whether the difference is significant. With α = 0.05, if the calculated value is less than the tabled value then the difference is not significant at the 5% level. Alternatively, the t test function of Microsoft Excel® can be used to obtain a P value. If P < 0.05 the difference is generally accepted to be significant.
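Both comparisons are also available in scipy: ttest_rel performs the paired t test and ttest_ind (with equal_var=True) the pooled-variance independent t test described below. The log10 counts in this sketch are invented for illustration.

import numpy as np
from scipy.stats import ttest_rel, ttest_ind

# Invented log10 counts for one contamination level of one food matrix.
alternative = np.array([4.65, 4.40, 4.83, 4.93, 4.66, 4.85, 4.78, 4.40])
reference   = np.array([4.72, 4.64, 4.83, 4.78, 4.60, 5.13, 4.76, 4.34])

# Paired design: the same test portions analysed by both methods.
t_paired, p_paired = ttest_rel(alternative, reference)

# Unpaired design: independent test portions, pooled-variance t test.
t_indep, p_indep = ttest_ind(alternative, reference, equal_var=True)

print(f"paired t = {t_paired:.3f}, P = {p_paired:.3f}")
print(f"independent t = {t_indep:.3f}, P = {p_indep:.3f}")
# In either case, P < 0.05 is taken to indicate a significant difference
# between the method means at the 5% level.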
TABLE 13.4 Validation Results of a Collaborative PCR Study with Unpaired Samples

                                           Confirmed positive   Confirmed negative
Alternative method   Presumptive positive         40                    1
                     Presumptive negative          1                   30
Reference method                                   7                   65
The independent t test (Goulden, 1959) uses the equation:
t = |x̄1 − x̄2| / √{[(n1 + n2)/(n1n2)] × [((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)]}
and, as for the paired t test, the resultant t value is compared to the critical value for t with (n1 + n2 − 2) degrees of freedom. Again, the t test function of Microsoft Excel® can be used to yield a P value for the independent t test. To perform the ANOVA (Wernimont, 1985), first organize the data as in Table 13.5. In this case, three methods are being compared in a single lab study, but a similar table can be constructed for collaborative data and for comparison of two methods. Begin by calculating the Sum of Squares for each method. This is done using the equation SS = Σxi² − (Σxi)²/n, where xi is an individual replicate value and n is the number of replicates tested by that method. The sum of the squares within all methods is obtained by summing the sum of squares for the individual methods, that is, SSwm = ΣSSi = (SSA + SSB + ... + SSn), where SSA, SSB to SSn are the individual sums of squares. Next, the mean values from each method are used to calculate the between-method sum of squares, SSbm, using the equation SSbm = Σni(x̄i − x̿)², where x̄i is the mean value for the ith method, x̿ is the overall mean value and ni is the number of replicates for the ith method. The sums of squares, SSwm and SSbm, are transformed into Mean Squares, MSwm and MSbm, by dividing by the appropriate degrees of freedom. For the within-method calculation, the degrees of freedom equal the sum of degrees of freedom for each method: νwm = Σνi = Σ(ni − 1). For the between-method calculation, the degrees of freedom are the number of methods minus one. We now arrive at MSwm = SSwm/νwm and MSbm = SSbm/νbm. The purpose is to determine whether the observed variability between the means of the methods can be attributed to the random variability between the replicates. To test this, we calculate the ratio F = MSbm/MSwm and compare the resultant F-value to the critical value of the F-distribution for α = 0.05 (Siegel, 1956) with νbm and νwm degrees of freedom, respectively. If the observed value of F is equal to or greater than the critical F table value, then the difference between method means is significant.
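The one-way ANOVA just described can be computed step by step and cross-checked against scipy's built-in routine. The replicate log10 counts in this sketch are invented for illustration; the sums of squares follow the definitions above.

import numpy as np
from scipy.stats import f, f_oneway

# Invented log10 counts: five replicates for each of three methods.
methods = [
    np.array([4.65, 4.72, 4.68, 4.75, 4.70]),     # Method 1
    np.array([4.78, 4.81, 4.74, 4.79, 4.83]),     # Method 2
    np.array([4.70, 4.66, 4.73, 4.69, 4.71]),     # Method 3
]

grand_mean = np.concatenate(methods).mean()
ss_within  = sum(((x - x.mean()) ** 2).sum() for x in methods)            # SSwm
ss_between = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in methods)  # SSbm

df_within  = sum(len(x) - 1 for x in methods)     # v_wm
df_between = len(methods) - 1                     # v_bm
ms_within  = ss_within / df_within                # MSwm
ms_between = ss_between / df_between              # MSbm

F_ratio = ms_between / ms_within
F_crit = f.ppf(0.95, df_between, df_within)       # critical F at alpha = 0.05
print(f"F = {F_ratio:.3f}, critical F = {F_crit:.3f}")

# Cross-check with scipy's one-way ANOVA (same F, plus a P value).
F_scipy, p_value = f_oneway(*methods)
print(f"scipy F = {F_scipy:.3f}, P = {p_value:.3f}")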
TABLE 13.5 Data Tabulation for Quantitative Collaborative Study – log10 Transformed Counts for One Contamination Level of One Food

Replicate   Method 1                     Method 2                     Method 3                     Total
1–5         x1, x2, x3, x4, x5           y1, y2, y3, y4, y5           z1, z2, z3, z4, z5
Mean        x̄ = Σxi/n1                   ȳ = Σyi/n2                   z̄ = Σzi/n3                   x̿ = (Σxi + Σyi + Σzi)/(n1 + n2 + n3)
sr          sr1 = √[Σ(xi − x̄)²/(n1 − 1)] sr2 = √[Σ(yi − ȳ)²/(n2 − 1)] sr3 = √[Σ(zi − z̄)²/(n3 − 1)]
RSDr        sr1/x̄                        sr2/ȳ                        sr3/z̄
SSwm        SS1a = Σxi² − (Σxi)²/n1      SS2a = Σyi² − (Σyi)²/n2      SS3a = Σzi² − (Σzi)²/n3      SSwm = SS1a + SS2a + SS3a
SSbm        SS1b = n1(x̄ − x̿)²            SS2b = n2(ȳ − x̿)²            SS3b = n3(z̄ − x̿)²            SSbm = SS1b + SS2b + SS3b
MSbm                                                                                              MSbm = SSbm/νbm
MSwm                                                                                              MSwm = SSwm/νwm
F ratio                                                                                           F = MSbm/MSwm

SSwm = within-method sum of squares; SSbm = between-method sum of squares; MSbm = between-method mean square; MSwm = within-method mean square; νbm = degrees of freedom between methods (number of methods minus one); νwm = degrees of freedom within methods = (n1 − 1) + (n2 − 1) + (n3 − 1).
FUTURE DIRECTIONS
As new pathogens emerge and biothreat agents are targeted for detection, we enter an arena in which reference cultural methods may not be established. This presents a new paradigm for validation of microbiological methods. The AOAC Presidential Task Force for Best Practices for Microbiological Method Validation (BPMM) has recommended that qualitative methods be validated to determine performance parameters that are independent of comparison to a reference method and, further, that the most important parameter to be determined is the 50% limit of detection or LOD50 (FDA, 2006). The LOD50 is the point on the dose–response curve that results in positive responses for 50% of replicates (Fig. 13.4). As can be seen in Fig. 13.4, the LOD50 is only part of the story. The three curves shown all intersect at the same LOD50, but have very different slopes. Determining the LOD90, for example, in addition to the LOD50 would better define the shape of the dose–response curve and provide a more accurate description of the method performance at low doses. The details of the study designs and statistical methodology to determine these parameters are still being debated. Suffice it to say that in the future, method validation will likely move away from significance testing in method comparison and move toward providing an independent assessment of method performance. This independent assessment can also be applied to reference methods in those cases where an appropriate reference method exists.
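One simple way to estimate an LOD50 (and an LOD90) from fractional-positive data is to fit a dose–response model to the proportion of positive replicates at each inoculation level. The sketch below is only one possible approach, fitting a logistic curve by least squares with invented data; other choices (for example a complementary log-log model consistent with Poisson-distributed cells, fitted by maximum likelihood) are equally defensible and are among the options still being debated.

import numpy as np
from scipy.optimize import curve_fit

# Invented fractional-positive data: log10(cfu/test portion) vs proportion positive.
log_dose = np.array([-1.0, -0.7, -0.4, -0.1, 0.2, 0.5])
prop_pos = np.array([0.05, 0.15, 0.40, 0.65, 0.90, 1.00])

def logistic(x, x50, slope):
    """Logistic dose-response: equals 0.5 at x = x50; steepness set by 'slope'."""
    return 1.0 / (1.0 + np.exp(-slope * (x - x50)))

(x50, slope), _ = curve_fit(logistic, log_dose, prop_pos, p0=[0.0, 5.0])

lod50 = 10 ** x50                            # cfu/test portion at 50% positive
lod90 = 10 ** (x50 + np.log(9.0) / slope)    # solve logistic(x) = 0.9
print(f"LOD50 ~ {lod50:.2f} cfu/test portion, LOD90 ~ {lod90:.2f} cfu/test portion")
# Reporting LOD90 as well as LOD50 pins down the slope of the curve, which is
# the point made in the discussion of Fig. 13.4.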
FIGURE 13.4 Limit of detection curves (% positive response plotted against log cfu/25 g test sample).
EXAMPLE 13.1 USE OF THE MANTEL-HAENSZEL χ² TEST TO ASSESS THE RELATIVE SENSITIVITY OF TWO METHODS DURING QUALITATIVE METHOD VALIDATION (DATA OF FELDSINE ET AL., 2005)
A collaborative validation of a PCR method for E. coli O157:H7 examined independent portions of inoculated food matrices by PCR and by a reference cultural method. Each of 15 collaborating laboratories examined 6 replicate test portions of ground beef at each of three levels of inoculum – uninoculated, 0.08 cfu/25 g (low) and 0.29 cfu/25 g (high). The reported valid data from 12 laboratories for the low level inoculation are shown in Table 13.4. Visual examination of the data suggests that the alternative PCR method is much more sensitive than the reference method but we need to assess the difference statistically. Since this validation used an unpaired sample design, we use the Mantel-Haenszel χ² test, with (2 − 1)(2 − 1) = 1 degree of freedom, to assess the significance of the difference between the results of the two methods. The equation is:
χ² = (n − 1)(AF − (B + C + D)E)² / [(A + E)(B + C + D + F)(E + F)(A + B + C + D)]
Then:
χ² = (72 − 1){(40 × 65) − (1 + 1 + 30) × 7}² / [(40 + 7)(1 + 1 + 30 + 65)(7 + 65)(40 + 1 + 1 + 30)]
   = 71 × (2600 − 224)² / (47 × 97 × 72 × 72) = (71 × 2376²)/23,633,856
   = 400,821,696/23,633,856 = 16.96
For 1 df, the observed χ² value of 16.96 would be expected to occur by chance with a frequency of less than 0.001; it is much higher than the critical χ² value of 3.84 at P = 0.05. This confirms statistically what was seen by observation – the PCR method is significantly more sensitive than the reference cultural method.
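The arithmetic of Example 13.1 can be checked in a few lines of code (a verification sketch only, using the counts of Table 13.4):

from scipy.stats import chi2

# Counts from Table 13.4 (Feldsine et al., 2005; low-level ground beef data).
A, B, C, D, E, F = 40, 1, 1, 30, 7, 65
n = A + B + C + D            # 72 test portions per method

numerator = (n - 1) * (A * F - (B + C + D) * E) ** 2
denominator = (A + E) * (B + C + D + F) * (E + F) * (A + B + C + D)
chi_sq = numerator / denominator

print(f"Mantel-Haenszel chi-square = {chi_sq:.2f}")   # ~16.96
print(f"P = {chi2.sf(chi_sq, df=1):.5f}")             # well below 0.001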
EXAMPLE 13.2 STATISTICAL VALIDATION OF A QUANTITATIVE METHOD (DATA OF KINNEBERG AND LINDBERG (2002))
A collaborative study to validate a rapid (Petrifilm™) method for total coliforms included comparison of the rapid method, after 14 h and 24 h incubation, to the standard method for the analysis of vanilla ice cream using violet red bile agar (VRBA; APHA, 1985). Contaminated ice cream, prepared with low, medium and high levels of inoculum, was examined. The log-transformed colony count data for the high level inoculation are presented in Table 13.6. The mean (x̄), reproducibility (sR) and the relative standard deviation (RSDR) are determined for each of the three methods as well as the overall mean (x̿); the results are shown in Table 13.6.

TABLE 13.6 Data Analysis for Validation of Petrifilm™ Coliform Method on Vanilla Ice Cream at the High Inoculation Level, After Removal of Outliers
               Petrifilm 14 h     Petrifilm 24 h     VRBA
Laboratory      A       B          A       B          A       B
1             4.653   4.716      4.748   4.785      4.903   4.978
2             4.398   4.643      4.415   4.653      4.204   4.342
3             4.833   4.833      4.845   4.929      4.826   4.875
4             4.934   4.778      4.934   4.778      4.968   4.778
5             4.663   4.602      4.681   4.613      4.785   4.672
6             4.851   5.127      4.869   5.130      5.207   5.152
7             4.778   4.763      4.778   4.763      4.954   4.919
8             4.398   4.342      4.623   4.591      4.669   4.748
9             4.763   4.531      5.029   4.863      4.940   4.908
10            4.845   4.820      4.857   4.820      4.505   4.690
11            4.690   4.806      4.708   4.813      4.771   4.949
Mean          4.717              4.783              4.808              Overall mean 4.769
sR            0.185              0.156              0.234
RSDR          0.039              0.033              0.049
SSwm          0.712              0.512              1.159              Total 2.383
SSbm          0.0595             0.00431            0.0335             Total 0.0973
MSbm                                                                   0.0487
MSwm                                                                   0.0378
F ratio                                                                1.287
F(0.05, 2, 63)                                                         3.150

Source: Data from Kinneberg and Lindberg (2002). A and B denote the duplicate test portions analysed by each laboratory. SSwm = within-method sum of squares; SSbm = between-method sum of squares; MSbm = between-method mean square; MSwm = within-method mean square.
Do the colony counts obtained using the two variants of the Petrifilm™ method and the reference method differ statistically? First we determine the SSwm for each method: SS = Σxi² − (Σxi)²/n. For the 14 h Petrifilm™ method: SS = 490.148 − (103.767)²/22 = 490.148 − 489.436 = 0.712. For the 24 h Petrifilm™ method, SS = 0.512 and for the reference method, SS = 1.159. Adding the method SS values together yields the sum of the squares within methods: SSwm = 0.712 + 0.512 + 1.159 = 2.383. Next, we determine the sum of squares between the methods (SSbm) by adding the square of the difference between each method mean (x̄) and the overall mean (x̿), multiplied by the number of test results (n), that is, SSbm = Σni(x̄i − x̿)². For these data, SSbm = 22(4.717 − 4.769)² + 22(4.783 − 4.769)² + 22(4.808 − 4.769)² = 0.0595 + 0.00431 + 0.0335 = 0.0973. Then the SSwm and SSbm are divided by their respective degrees of freedom to yield the mean squares within and between the methods: MSwm = 2.383/63 = 0.0378 and MSbm = 0.0973/2 = 0.0487. Finally, we determine the F ratio of the mean squares between and within methods: F = MSbm/MSwm = 0.0487/0.0378 = 1.287, for α = 0.05, with ν1 = 2 and ν2 = 63 degrees of freedom. This value is less than the tabulated value of 3.150; hence, no significant differences are detected between the mean values of the three methods.
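The final step of Example 13.2 can be verified in code using the summary values computed above; note that scipy's exact critical value for ν1 = 2, ν2 = 63 is marginally below the 3.150 read from printed tables (which tabulate 60 df), so small differences in the last figure are to be expected.

from scipy.stats import f

ss_wm, ss_bm = 2.383, 0.0973
df_wm, df_bm = 63, 2

ms_wm = ss_wm / df_wm        # ~0.0378
ms_bm = ss_bm / df_bm        # ~0.0487
F_ratio = ms_bm / ms_wm      # ~1.29

F_crit = f.ppf(0.95, df_bm, df_wm)
p_value = f.sf(F_ratio, df_bm, df_wm)
print(f"F = {F_ratio:.3f}, critical F(0.05; 2, 63) = {F_crit:.3f}, P = {p_value:.2f}")
# F is well below the critical value, so no significant difference is detected
# between the Petrifilm (14 h and 24 h) and VRBA means.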
References
Anon (2003) Microbiology of food and animal feeding stuffs – Protocol for the validation of alternative methods, ISO 16140:2003. Geneva: International Organisation for Standardisation.
Anon (2007) International Organization for Standardization (ISO) online, http://www.iso.org/iso/en/prods-services/ISOstore/store.html.
AOAC (2007a) Official Methods of Analysis online, http://eoma.aoac.org. AOAC International, Gaithersburg MD.
AOAC (2007b) Official Methods of Analysis online, Method 966.23, http://eoma.aoac.org. AOAC International, Gaithersburg MD.
APHA (1985) Standard Methods for the Examination of Dairy Products. American Public Health Association, Washington DC.
Blodgett, R (2003) Most Probable Number Determination from Serial Dilutions, Bacteriological Analytical Manual, Appendix 2, http://www.cfsan.fda.gov/~ebam/bam-a2.html.
FDA (2007) Bacteriological Analytical Manual online, http://www.cfsan.fda.gov/~ebam/bam-toc.html.
FDA (2006) Final Report and Executive Summaries from the AOAC International Presidential Task Force on Best Practices in Microbiological Methodology, http://www.cfsan.fda.gov/~comm/bpmmtoc.html.
Feldsine, P, Abeyta, C and Andrews, WH (2002) AOAC International Methods Committee Guidelines for validation of qualitative and quantitative food microbiological official methods of analysis. J. Assoc. Offic. Anal. Chem. Int., 85, 1187–1200.
Feldsine, PT, Green, ST, Lienau, AH, Stephens, J, Jucker, MT and Kerr, DE (2005) Evaluation of the Assurance GDS™ for E. coli O157:H7 method and Assurance GDS for Shiga toxin genes method in selected foods: Collaborative study. J. Assoc. Offic. Anal. Chem. Int., 88, 1334–1348.
Goulden, CH (1959) Methods of Statistical Analysis, 2nd edition. Wiley, New York, USA.
Kinneberg, KM and Lindberg, KG (2002) Dry rehydratable film method for rapid enumeration of coliforms in foods (3M™ Petrifilm™ Rapid Coliform Count Plate): Collaborative study. J. Assoc. Offic. Anal. Chem. Int., 85, 56–71.
Kodaka, H (2004) Nissui Pharmaceutical and Neogen kits granted PTM status. Inside Lab. Manage., 8(4), 19–22.
LaBudde, R (2006) Statistical analysis of interlaboratory validation studies. IV. Example analysis of matched binary data. Technical Report 233. Least Cost Formulations, Ltd, Virginia Beach, VA 23464, USA.
Pearson, ES and Hartley, HO (1958) Biometrika Tables for Statisticians, 6th edition, Vol. 1. Cambridge University Press, Cambridge, UK.
Siegel, S (1956) Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill Book Co, New York NY, USA.
USDA (2007) Microbiology Laboratory Guidebook online, http://www.fsis.usda.gov/Science/Microbiological_Lab_Guidebook/index.asp.
Wernimont, GT (1985) Use of Statistics to Develop and Evaluate Analytical Methods. Spendley, W (ed.). AOAC International, Gaithersburg, MD, USA.
14 RISK ASSESSMENT, FOOD SAFETY OBJECTIVES AND MICROBIOLOGICAL CRITERIA FOR FOODS
The objective of ensuring safe food for the World's constantly growing population has been a major preoccupation of governments, international organizations (e.g. WHO/FAO CODEX, ILSI, ISO, ICMSF, etc.) and professional and trade bodies over many years. Yet, in deprived areas of the world, there remains a basic need to ensure a reliable food supply. In all countries, especially in developed consumer-oriented countries, the need is to ensure that foods do not present an unacceptable risk to the health and well-being of the consumer. At times, therefore, there is a clash of priorities: foods exported from third world countries help to support their national economy, but the foods are required to comply with the rules imposed by international trade, particularly the import regulations of developed countries. Meanwhile, the indigenous population often consume foods that do not meet those criteria. Attempts to improve food quality and safety are important for all consumers – but if you are starving, the importance of quality and safety appears less important than having enough food for your family. People in developed countries who constantly demand ever-increasing quality and safety standards often overlook this paradigm. Following the publication by Accum (1820) of his 'Treatise on Adulteration of Food and Culinary Poisons' and subsequent work in the middle of the 19th century by the Lancet Analytical Sanitary Commission and other bodies (see Amos, 1960), the need to improve food safety in the United Kingdom led to the introduction of food legislation concerned with such diverse areas as food composition, food additives, chemical contaminants and, eventually, microbiological contamination. In recent times legislation has been concerned primarily with controlling those aspects of food production that are necessary to 'ensure' the safety of foods at all stages from 'farm to fork'. In the area of food microbiology, such legislative control has been aimed at improving the safety and quality of foods processed by the food industry, or supplied by catering outlets, since it is rightly perceived that industrial scale production impacts on many more consumers than does traditional domestic production. Yet it is often the small producer or caterer who presents the greatest risk to consumer well-being.
Throughout the world, the law imposes a duty of care and responsibility for the safety and quality of foods on those business organizations involved in the procurement, processing, distribution and retail sale of the products. For instance, in Europe, the basic premise of food law is enshrined in a General Regulation (Anon., 2002a) on food safety, with subsidiary legislation on key issues, including microbiological aspects of safety. One facet of modern food legislation is the requirement for risk assessment by governments in order to provide the legislative framework within which food producers, processors, caterers and all others concerned with food must operate.
FOOD SAFETY OBJECTIVES AND RISK ASSESSMENT
Modern approaches to food safety include the identification of actual, or potential, hazards from microbial contamination, assessing the risk that such contamination may cause disease in the consumer, and then seeking to employ processes that will control and minimize such risks. 'Hazard' can be defined as something that has the potential to cause harm, for instance the contamination of food by pathogenic bacteria. 'Risk' is defined as the likelihood of harm in a defined situation; for instance, consumption of food contaminated with specific pathogenic microorganisms and/or their toxins. Risk assessment of foods is therefore concerned with assessing the potential risk that consumption of a food may cause harm to consumers. As is amply demonstrated by ICMSF (2002), risk assessment requires an understanding of microbial contamination per se and also that both food process operations and domestic food handling practices may reduce or increase the risk from a defined hazard for a defined group of consumers (infants, children, the aged, the immuno-compromised, etc.). Rather than seeking to control food safety on an ad hoc basis, 'perceived wisdom' requires that specific objectives should be set to ensure, so far as is practicable, that food does not threaten the health and well-being of consumers. The Sanitary and Phytosanitary Measures (SPS) agreement of the World Trade Organisation (WTO) requires member states to ensure that their sanitary and phytosanitary requirements are based on scientific principles whilst not unnecessarily restricting international trade. This means that member countries must establish appropriate measures on the basis of the actual risks likely to be involved; originally, this requirement was primarily for those risks arising from chemical contaminants. The concept of Quantitative Microbiological Risk Assessment (QMRA) was introduced in the 1990s following development of predictive models for growth and survival of pathogenic, and other, microbial populations in foods based on fundamental studies of microbial growth and survival (see e.g. Haas, 1983, 2002; Roberts and Jarvis, 1983; Whiting, 1995). This approach is based on the application and extension of the concept of microbiological compositional analysis (Tuynenburg-Muys, 1975) and uses the wide availability and computational power of desktop computers, often using dedicated microbial modelling software. The WTO/SPS proposed the concept of an 'appropriate level of protection' (ALOP) defined as, 'the level of protection deemed appropriate by the member (country) to establish a sanitary or phytosanitary measure to protect human, animal and plant life or health within
its territory’. Subsequently, the Codex Committee on Food Hygiene (CCFH) developed consensus protocols for risk analysis of pathogens in food (see for instance Anon., 2000a, b, 2002a) and produced a report on the ‘Principles and Guidelines for the Conduct of Microbiological Risk Management’ (MRM) (Anon., 2002b, c). A key output was the redefinition of the WTO/SPS definition, as ‘ALOP refers to a level of protection of human health established for a food-borne pathogen’. However, there was little guidance on the nature of an ALOP or how it might be established. Several alternative approaches for setting an ALOP were debated including the concept of ‘as-low-as reasonably achievable’ (ALARA) based on the performance of the available risk management options. All aspects of control require the definition of criteria for the ‘disease burden’ that public health can accept for a population, for instance ‘the number of cases per year per 100,000 population for a specific hazard in a specific food commodity’. But even this leaves much room for debate. In 1979, Mossel and Drion had proposed the concept of a ‘lifetime tolerable risk’ for botulinum and other toxins, but other objectives were more constrained both in relation to the time span and the defined population. So what is the tolerable risk to which any consumer should be exposed and over what timeframe? Is the ‘population at risk’ the total population, the most susceptible group(s) or only that proportion of the population who actually consume a particular food? Does the ALOP include specific demographic groups of the population? Are related health concerns linked together (e.g. all cases of salmonella food poisoning) or considered only in relation to specific foods? What is the impact of alternative transmission routes, that is transmission by food handlers; cross-contamination between foods, due to poor storage and handling practices, and transmission of pathogens from food to consumers; or even, person-to-person transmission? Is the risk timeframe a definable period (e.g. a year) or a lifetime? Havelaar et al. (2004) propose a definition for an ALOP as ‘no more than x cases of acute gastroenteritis per 100,000 population per year associated with hazard Y and food Z’. This ALOP concept provides a useful target measure for public health policy but is of limited use in implementing food chain safety measures. The International Commission on Microbiological Specifications for Food (ICMSF) introduced the concept of ‘Food Safety Objectives’ (FSOs) that was adopted subsequently by CODEX (CCFH) as part of its MRM document. An FSO provides a means to convert public health goals into parameters that can be controlled by food producers and monitored by government agencies. It is defined (ICMSF, 2002) as, ‘the maximum frequency and/or concentration of a microbial hazard in a food considered tolerable for consumer protection’. ICMSF (2002) notes that FSOs are ‘typically expressions of concentrations of microorganisms or toxins at the moment of consumption’. Concentrations at earlier stages of the food chain are considered to be performance criteria. Hence, an FSO seeks to take account of hazards arising both during commercial processing and from unpredictable effects associated with retail and domestic food storage and handling. By contrast, performance criteria relate to the requirement to control hazards at earlier stages of the food chain. 
ICMSF (2002) provides the following simplistic equation to describe the performance criteria concept: H0 R I FSO. The terms in the equation (i.e. the performance criteria) are: the initial level of the specific hazard (H0) associated with raw materials and
ingredients, the cumulative decrease in hazard level due to all processing factors (R), and the cumulative increase in hazard as a consequence of post process microbial growth (I). The symbol implies that the cumulative effect should be less than, or at least not greater than, the FSO expressed in terms of log10 units for a specific organism. Suppose for instance that the FSO for a specific pathogen in a defined food is considered to be not more than 1 organism/10 g (i.e. 1 log10 organisms/g) at the time of consumption. Suppose further that the maximum initial contamination level is likely to be 100 organisms/g (2 log10 organisms/g) and the maximum reduction is likely to be obtained by a combination of thermal and other processes is 3 log10 units. Then, to ensure that the FSO is not exceeded, the risk of re-growth between processing and consumption (I) must be zero, that is substituting H0 2, R 3 and FSO 1, in the rewritten equation I FSO R H0 1 3 2 0. Thus conditions of storage and handling post-processing must be such as to prevent any growth of surviving organisms. However, if the initial level of contamination of the product (R) were only 10 organism/g (1 log10 organisms/g) then the re-growth allowance would be not greater than 1 log10 unit : I 1 3 1 1. Such calculations are, of course, based on ‘point’ values and make no allowance for variations in microbial distribution within or between batches of food ingredients or for variations in process efficiency. Nonetheless they do provide a simple way to demonstrate how an FSO can be used to assess risk for a specific products and processes. So, the FSO is defined as ‘the maximum likely level of hazard that is acceptable’ following the integration of several stages in food processing, based on knowledge of microbial associations of foods, processing hurdles (Leistner and Gould, 2001; ICMSF, 2005) which may result in death or inhibition of microorganisms and of the likelihood of re-contamination and/or re-growth of organisms during subsequent storage and handling. Thus the FSO concept relates also to the use of the Hazard Analysis and Critical Control Point (HACCP) concept for controlling the effectiveness of food processing operations. For any manufacturing process, HACCP requires analysis of potential hazards and the identification and monitoring of control points that are critical to elimination or reduction of each hazard (CCPs). Furthermore, the concept requires that each CCP will be monitored using simple, indirect methods to ensure the process is operated correctly and that the effectiveness of monitoring is verified also by appropriate microbiological examination. HACCP is now required by law in many countries and forms part of a wider Total Quality Management (TQM) procedure that is used within a business to ensure that all foods produced conform to criteria that define acceptable quality and safety standards. The underlying themes of ALOP, ALARA, FSOs, HACCP, TQM and GMP imply that all persons responsible for production, distribution, sale and preparation of foods work together to ensure that potential hazards are identified and controlled effectively in order to minimize ‘so far as is achievable’ the risks to consumer health. This requires knowledge and experience of potential microbiological hazards and the likely effects of both acceptable and unacceptable practices at all stages from ‘farm to fork’. 
An FSO differs from a microbiological criterion; an FSO is not applicable to individual ‘lots’ and does not specify sampling plans. Rather an FSO is a statement of the level of control expected for a food processing operation that can be met by the proper application
of GMP, HACCP systems, performance criteria, process/product criteria and/or acceptance criteria (ICMSF, 2002). Ideally, an FSO should be quantitative and verifiable, though not necessarily by microbiological examination of food. An FSO provides a means by which control authorities can communicate clearly to industry what is expected for foods produced in properly managed operations, for instance by specifying the frequency or concentration of a microbial hazard that should not be exceeded at the time of consumption. An FSO therefore provides a basis for the establishment of product criteria that can be used to assess whether an operation complies with a requirement to produce safe food. FSOs can be established using risk evaluation by an expert panel or by quantitative risk assessment. In all cases, the first step is the identification of hazards associated with specific foods by epidemiological or other means. Next an exposure assessment is required to estimate the probable prevalence and levels of microbial contamination at the time of consumption. Such assessment requires information about the amount of product consumed by different categories of consumers and use of mathematical models that take account of the prevalence of the organism(s), the nature of the food processing operations, the probability for growth of the organisms in the food both before and after processing and the impact that food-handling practices will have on the levels of organisms likely to be consumed. Characterization of the hazard (Anon., 2000c) requires assessment of the severity and duration of adverse effects resulting from exposure of individuals to a specific pathogen. A dose–response assessment provides a measure of the potential risk. The likelihood of exposure is dependent not only on the characteristics of specific strains of microbe, but also on the susceptibility of the host and the characteristics of the food that acts as the carrier for the organisms. Finally, risk characterization combines the information to produce a risk assessment that indicates the possible level of disease (usually as the number of cases per 100,000 people per year) likely to result from the given exposure. Risk characterization needs to be validated by comparison with epidemiological and other data and should reflect the distribution of risk associated with the many facets affecting contamination, survival and growth of a specific organism in food processed in a specific way. The overall process of establishing an FSO for any one specific food/pathogen combination is very challenging. It is clear from the FAO/WHO Expert Consultation (Anon., 2000b) that hazard and risk characterization even for a limited scenario (e.g. salmonellae in eggs and broiler chicken) requires data of the highest quality and use of effective mathematical models to interpret those data (cf. views of Roberts and Jarvis, 1983). This is not to suggest that the approach is invalid or unattainable – rather it shows how little we really understand about those microorganisms in foods that are responsible for many apparently commonplace causes of human food-borne disease. However, Szabo et al. (2003) have described the development of a system for achieving an FSO for control of Listeria monocytogenes in fresh lettuce. CODEX has now published a Guide for National Food Safety Authorities (Anon., 2006a) that explains the whole concept of Food Safety Risk Analysis.
This report covers all aspects of risk assessment for foods including providing guidance on the four stages of a risk management procedure (Table 14.1). Microbiological risk assessment is based on the
TABLE 14.1 The Stages of Risk Assessment, Control and Management (based on Anon., 2006a)

Step   Process
1      Preliminary Risk Management Activities
1.1    Identify and define the food safety issue
1.2    Develop a risk profile
1.3    Establish broad management goals
1.4    Decide whether a risk assessment is necessary
1.5    Establish a risk assessment policy
1.6    Commission the risk assessment
1.7    Review the results of the risk assessment
1.8    Rank the food safety issues and set priorities(a)
2      Select Risk Management Options
2.1    Identify available risk management options
2.2    Evaluate identified risk management options
2.3    Select risk management option(s)
2.4    Identify a desired level of consumer health protection
2.5    Decide on preferred risk management option(s)
2.6    Deal with 'uncertainty' in quantitative risk and policy
3      Implement Risk Management Decisions
4      Monitor and Review

(a) Only applicable if more than one risk is associated with a specific food.
use of ‘quantitative microbiological metrics’ as a risk management option. ‘Quantitative metrics’ is defined as the ‘quantitative expressions that indicate a level of control at a specific step in a food safety risk management system … the term “metrics” is used as a collective for the new risk management terms of food safety objective (FSO), performance objective (PO) and performance criteria (PC), but it also refers to existing microbiological criteria’. The report provides a case study for Listeria monocytogenes in ready-to-eat foods and recognizes the desirability of using FSOs, POs and PCs in the development of risk-based microbiological criteria; however, methods for achieving these objectives are still under development. Statistical methods play a major role in the use and interpretation of mathematical models of microbial contamination, growth and survival in relation to the epidemiology of food-borne disease, but it is not appropriate to consider such matters here. The reader is recommended to consult publications that consider this matter in more detail (Jouve, 1999; ICMSF, 2002; Szabo et al., 2003; Havelaar et al., 2004; Anon., 2002c, 2005a, 2006a; Rieu et al., 2007). However, we do need to consider the implications that arise in the application of quantitative and qualitative data in food control situations. First, it must be realized that no amount of quantitative or qualitative testing can ever control the safety and quality of manufactured foods (or other materials). Rather, such testing
indicates whether or not a production process, including all sources of contamination, is adequately controlled in terms of process conditions, process hygiene, pre- and post-process storage and distribution, etc. End-point testing in a factory environment provides data for feedback control of a process. More satisfactory is the use of Quality Assurance programmes, such as those which seek to identify and control potentially hazardous stages of a process, including HACCP, described for instance by Bauman (1974), Ito (1974), Peterson and Gunnerson (1974), ICMSF (1988), Wallace and Mortimer (1998) and Jouve (2000). In the HACCP system, end-point testing is used to validate process controls for raw material quality; time–temperature relationships for heating, cooling, freezing, etc.; process plant cleaning and disinfection; operator hygiene and the many other factors critical to the production of foods under GMP. A different aspect of food control relates to the assessment of actual or potential health risks associated with particular food commodities or products, whether imported or home produced; and the assessment of 'quality' of foods in retail trade in so far as this may be required by food legislation. For such purposes, it is not possible to 'control' the process or the post-process distribution and storage, although inspection of process plants, including those in exporting countries, is now the 'norm'. Testing by enforcement bodies is targeted to assess whether foods on sale have the necessary qualities expected of them and/or whether they constitute a (potential) risk to the health of the consumer. Consequently, various qualitative and quantitative microbiological criteria have been derived to provide guidance both for production personnel within industry and for enforcement authorities. Such criteria may not have legislative status but properly devised criteria can be of enormous value in ensuring compliance with GMP.
MICROBIOLOGICAL CRITERIA
Microbiological criteria can be defined as 'limits for specific or general groups of microorganisms that can be applied in order to ensure that foods do not present a potential health hazard to the consumer and/or that foods are of a satisfactory quality for use in commerce'. This definition is deliberately vague since it encompasses a wide range of types of criteria:
1. A Microbiological Guideline is used to provide manufacturers, and others, with an indication of the numbers of organisms that should not be exceeded if food is manufactured using Good Manufacturing Practices (GMP) and stored during its 'normal' shelf life under appropriate conditions.
2. A Microbiological Specification defines the limits that would be considered appropriate for a particular food in a particular situation and may be used in contractual commercial agreements or may be recommended by national or international agencies as a means of improving the quality and safety of foods.
3. A Microbiological Standard is that part of national, or supranational, legislation that aims to control the safety and, in some cases, the quality of foods manufactured in, or imported into, that country.
Microbiological standards therefore have mandatory effect whilst specifications and guidelines do not. It is not intended to consider here the arguments for and against the use of legislative microbiological standards for foods since these have been well argued elsewhere (Thatcher, 1955, 1958; Wilson, 1955; Ingram, 1961, 1971; Charles, 1979; Baird-Parker, 1980; Mossel, 1982). It is worthwhile, however, to summarize the principles for establishment of 'Microbiological Reference Values' (i.e. criteria) described in more detail by Mossel (1982):
1. The number of criteria should be strictly limited so that the maximum number of samples can be examined for a given laboratory capacity.
2. The choice of criteria must be based on ecological considerations and be related to organisms of importance in public health and/or quality.
3. Criteria should be carefully formulated in justifiable quantitative terms.
4. Species, genera or groups of organisms to which the criteria are applied should be described in appropriate taxonomic terms.
5. Test methods should be described in sufficient detail to permit their use in any reputable laboratory, and should be applied only after full validation including inter-laboratory trials.
6. Numerical values should be derived only as a result of adequate surveys to establish technological feasibility.
In 1979, Codex Alimentarius adopted the principles summarized above and agreed modified definitions of the various types of criteria to include reference to the use of sampling plans and standardized methodology as a prerequisite for setting criteria (Anon., 1997). Consideration of most of these points is outside the scope of this chapter and the reader is referred to the publications of Mossel (1982), ICMSF (1986, 2002), and the UK Health Protection Agency (Gilbert et al., 2000). However, principles (3) and (6) are clearly the raison d'être for this chapter, which is concerned with interpretation of laboratory data in relation to quantitative and qualitative microbiological criteria.
How Should Microbiological Criteria Be Set for Quantitative Data?

Collection of Data

Numerical criteria should be derived from surveys undertaken to determine what is technically feasible. Having decided upon the tests to be done and the methodology and culture media to be used, a survey should be made of products from several manufacturers who operate their process plants under GMP. As a prerequisite, an inspection should be made of the manufacturing processes, together with microbiological examination of products, so that any deficiencies in the process can be identified and corrected before the survey is done. Samples should be obtained at various times during the working day. They should be stored and transported to the laboratory under appropriate (e.g. ambient, refrigerated or frozen) conditions for examination using suitable microbiological reference methods. Ideally the laboratory examination should be undertaken within a few hours of sample procurement. Sampling is then repeated over a period of several days in the same and, ideally, also in other factories producing the same generic product, until sufficient data (not fewer than 100 data sets) have been collected. A distribution curve is prepared of the log10 colony counts against the frequency of occurrence of counts, for example, the data for a cooked meat product shown in Fig. 14.1; the 95th percentile (φ) value (i.e. the log count that is exceeded by the colony counts on only 5% of the samples) is also shown on the graph (see also Fig. 2.1 for colony counts on sausages manufactured in two different process plants).

FIGURE 14.1 Frequency distribution of colony counts on cooked meats, overlaid with a ‘normal’ distribution curve and showing the 95th percentile (φ) at 3.49 log10 cfu/g (based on data of Kilsby et al., 1979).

Such curves can also be drawn in an idealized manner (Fig. 14.2) to indicate the mean, mode and 95th percentile (φ) value. In the case of the cooked meat data (Fig. 14.1) the 95th percentile is 3.5 log10 cfu/g, indicating a well-controlled process, whereas for the data in Fig. 2.1 the 95th percentile is slightly below 7.0 log cfu/g, which is unacceptably high even for a raw meat product.

Setting Criteria Limits

For quantitative data, it is necessary to decide on the maximum number of organisms (the critical level: M) that could be permitted in any circumstances. This will normally be related to the minimum spoilage level (MSL) or, in the case of pathogens, to the minimum infective dose (MID). This critical level is the maximum count that is acceptable for foods manufactured under GMP and will usually be set at least one log cycle above the 95th percentile, provided that such a value does not approach too closely the MSL or MID. The data from the survey of sausage manufacture (Fig. 2.1) indicate that many of the colony count values
are unacceptable; no attempt should be made to use such data to set critical values; rather, the manufacturing processes must be improved before the survey is repeated. The lower critical limit (m) is set at some point above the 95th percentile (φ) and below the maximum (M), such that the value of m represents counts that can be achieved most of the time in product manufactured under GMP. The tolerance (λ) between φ and m should be set to take account of the imprecision in microbiological measurement, the variability likely to occur in good quality raw materials, the manufacturing process, etc. For industrial control purposes, m is frequently established at, or very close to, the 95th percentile value. Figure 14.2 illustrates the establishment of a three-class sampling plan based on the results of a survey.

In establishing criteria for ‘point of sale’ or import sampling (i.e. by enforcement authorities), due attention is required not only to what can be achieved in GMP but also to the effects of subsequent storage and distribution. This requires understanding of the microbiological consequences of storage time and temperature and of the effects of the intrinsic properties of the food (e.g. pH, aw, presence of antimicrobial agents, etc.). In addition, from a public health viewpoint, the potential for consumer mishandling during storage, cooking and serving needs to be considered, as does the vulnerability of particular consumer groups. Hence the use of rapid indirect methods to indicate that a product has been adequately processed may be of more value than quantitative microbiological methods per se (Charles, 1979).

For qualitative testing (e.g. for presence/absence of pathogens), the survey of product manufactured under conditions of GMP must be done using realistic sizes (e.g. at least 25 g) and numbers of samples in order to determine the prevalence of the target organism in the product. As explained in Chapter 5, the value of quantal tests is dependent not only on the relative sensitivity and specificity of the test procedure, but also on the number of samples examined. To obtain meaningful results, at least 100 samples should be examined from
each of a number of manufacturers. If the target organism is likely to be present in the test material, then the survey should be done using several quantities of each replicate sample, possibly using an MPN test, in order to ascertain both the prevalence and level of potential contamination.

FIGURE 14.2 Distribution plot of the results of microbiological surveys on a given type of food, showing the frequency of counts together with the mode, median, λ, φ, m, M and the minimum spoilage level (MSL), and the corresponding acceptance plan (accept; conditional release, n = 10, c = 2; reject) (modified from Mossel, 1982).
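The limit-setting procedure sketched above (derive the 95th percentile, φ, from GMP survey data, place m at or near φ and M about one log cycle higher, provided that M does not approach the MSL or MID) lends itself to a simple calculation. The following Python fragment is only an illustrative sketch of that procedure; the survey data, the half-log clearance below the ceiling and the helper name are my own assumptions, not values prescribed in the text.

```python
import statistics

def set_quantitative_limits(log_counts, ceiling, margin=0.5):
    """Illustrative derivation of criteria limits from GMP survey data.

    log_counts : survey results (log10 cfu/g) for product made under GMP
    ceiling    : MSL or MID (log10 cfu/g) that M must not approach too closely
    margin     : assumed clearance to keep between M and the ceiling
    """
    # 95th percentile (phi): the count exceeded by only 5% of GMP samples.
    phi = statistics.quantiles(log_counts, n=20, method='inclusive')[18]
    m = round(phi, 1)                               # lower limit at ~the 95th percentile
    M = min(round(phi + 1.0, 1), ceiling - margin)  # at least one log cycle higher
    return phi, m, M

# Hypothetical survey (at least 100 log10 colony counts on a cooked meat product):
survey = [2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5] * 10
phi, m, M = set_quantitative_limits(survey, ceiling=7.0)
print(f"phi = {phi:.2f}, m = {m}, M = {M}")
```

In practice, of course, the percentile should be taken from the full survey of not fewer than 100 data sets described above, after any deficient processes have been corrected.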
Establishing Sampling Plans

Having established critical values it is necessary to consider the sampling plan requirements. In the first instance, a decision concerning the choice of Attributes or Variables systems is required. As described previously (Chapter 5), an Attributes scheme merely requires a ‘go/no-go’ decision based upon the analytical findings and takes no note of microbial distribution or methodological imprecision except in so far as tolerance may have been built into the proposed values for m and M. By contrast, the Variables scheme assumes that the transformed data conform to a ‘normal’ distribution and incorporates into the specification a tolerance related to the number of samples to be examined and the standard deviation of data determined on the replicate samples examined. Traditionally, organizations such as ICMSF and Codex Alimentarius have recommended the adoption of Attributes systems based on three-class sampling plans for quantitative data and two-class plans for quantal data (see Chapter 5). ICMSF (1986, 2002) acknowledged the benefits of Variables sampling plans in those circumstances where the microbial distribution in foods is known, as, for instance, in food manufacture where the process is stable. However, they argued against the use of Variables plans for criteria that might be applied in food control situations where the microbial distribution is unknown (e.g. examination of samples taken at a port-of-entry). Whilst there is logic in this argument, increasingly, legislative criteria are being introduced for use by the food production industry where such knowledge does exist. In addition, the use of microbiological measurement uncertainty now provides a means of assessing the variability between replicate random samples of a food, although such variability may not reflect the true microbial distribution in the food.

European legislation on microbiological criteria for foods (Anon., 2005b) follows the Attributes sampling concept, albeit with some variations in the process hygiene criteria. Two types of criteria are published: for ‘food safety’ and ‘process hygiene’. The food safety criteria are mainly two-class sampling plans for particular pathogens, for which the criteria require ‘absence’ (i.e. non-detection by the specified method) of the target organism in a defined quantity of sample. The process hygiene criteria are mainly, but not exclusively, based on three-class sampling plans. An important variation introduced into the interpretation of the process hygiene sampling plans for Enterobacteriaceae and aerobic colony counts on carcasses of meat animals is that a product is defined as ‘satisfactory’ if the daily mean log10 colony count is ≤ m, ‘acceptable’ if the daily mean log10 count is between m and M, and ‘unacceptable’ if the daily mean is > M. All of the other three-class process hygiene sampling plans conform to the normal scheme, where ‘satisfactory’ requires all counts to be ≤ m, ‘acceptable’ requires that not more than c of the n counts are > m but ≤ M, and ‘unacceptable’ means that any count is > M.
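The two decision rules just described (the carcass criteria judged on the daily mean log count, and the conventional three-class rule applied to each individual count) can be written as small decision functions. This is a minimal sketch under assumed, placeholder limit values; the function names are illustrative and the actual limits in Anon. (2005b), which differ by commodity, are not reproduced here.

```python
def carcass_hygiene_verdict(daily_mean_log, m, M):
    """EU-style process hygiene verdict for carcass counts, judged on the
    daily mean log10 count against the criterion limits m and M."""
    if daily_mean_log <= m:
        return "satisfactory"
    if daily_mean_log <= M:
        return "acceptable"
    return "unacceptable"


def three_class_verdict(counts, m, M, c):
    """Conventional three-class rule: all counts <= m is satisfactory; up to
    c of the n counts may lie above m but not above M; any count > M fails."""
    above_M = sum(1 for x in counts if x > M)
    marginal = sum(1 for x in counts if m < x <= M)
    if above_M > 0 or marginal > c:
        return "unacceptable"
    return "satisfactory" if marginal == 0 else "acceptable"


# Placeholder limits for illustration only (not the legislated values):
print(carcass_hygiene_verdict(3.8, m=3.5, M=5.0))                          # acceptable
print(three_class_verdict([3.2, 3.4, 4.1, 3.0, 3.3], m=3.5, M=5.0, c=2))   # acceptable
```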
So far as I am aware, no national or international agency has adopted a Variables scheme for microbiological examinations, although such schemes have been adopted in legislation for quantitative determination of chemical contaminants (see, for instance, the European criteria for mycotoxins such as aflatoxin, ochratoxin and T2 toxin; Anon., 2006b). However, there is a difference in the type of sampling plan used for such micro-chemical analyses. A number of representative samples are drawn, in accordance with the ‘lot’ size; these samples are then blended and comminuted before drawing replicate analytical samples. The blending of the primary samples is intended to minimize ‘between sample’ variance, and compliance with the criteria is dependent upon the analytical variation between the analytical samples, which should be representative of the entire ‘lot’. In these analyses allowance is made for the measurement and sampling uncertainty (see also below). Because of the possible risks of environmental contamination, blending of samples prior to drawing a portion for microbiological examination is not generally attempted.

Sampling Plans for Attributes or Variables?

In setting any sampling plan, it is essential to recognize that the level of sampling that would be required to provide a high degree of protection to both the manufacturer and the consumer cannot be achieved in practice. From the data presented earlier (Chapter 5), for a ‘lot’ consisting of, say, 2000 units, an Attributes Single Sampling Plan would require 125 sample units to be examined (Table 5.4). For an Acceptable Quality Level (AQL) of 1%, not more than two defective samples can be permitted (i.e. n = 125, c = 2), yet ‘lots’ containing up to 1.1% defective items would still be accepted with 95% probability (a producer’s risk of 5%) and ‘lots’ containing 5.3% defectives would be accepted with 10% probability (the consumer’s risk). For microbiological examination, such a level of sampling is not viable from either a practical or an economical standpoint. However, if the level of sampling is fixed not in relation to the ‘lot’ size and the AQL but by the economic and practical constraints associated with testing, both the producer and consumer risks increase considerably. For quantal data, Attributes plans provide the only option, but this is not the case for quantitative data such as colony counts. Hence, one must question the wisdom of using Attributes schemes for microbiological criteria based on quantitative data. Of course, making a decision as to whether or not a specific colony count exceeds a predetermined limit is simple to understand and requires no statistical appraisal of the data. Calculation of the measurement uncertainty of counts on replicate samples is not required for Attributes plans. Yet, increasingly, those responsible for food examination are required by contract, or legislation, to derive and report values for measurement uncertainty to provide a measure of the precision of results (see Chapters 10 and 11). When only a limited number of representative samples have been examined, the risks of making wrong decisions are very high, so it is sensible to use all the available analytical data to assess whether or not a set of results is compliant with a defined limit. I would argue, therefore, that this justifies the adoption of Variables sampling schemes for quantitative data, provided that knowledge exists of the microbial distribution in the foods.
However, whichever scheme is adopted, it is essential to define the degree of ‘risk’ that is acceptable to the manufacturer and the consumer.
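The producer and consumer risks discussed above follow from the binomial distribution: the probability that no more than c of n randomly drawn sample units are defective when a true proportion p of units in the ‘lot’ is defective. A minimal sketch (standard library only; the plans shown are the lot-size-based plan and the economically constrained plan referred to in the text and in Example 14.1 below) is:

```python
from math import comb

def p_accept(n, c, p):
    """Probability that a two-class Attributes plan (n samples, acceptance
    number c) accepts a 'lot' whose true fraction of defective units is p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# Lot-size-based plan (n = 125, c = 2): scanning p traces the OC curve.
for p in (0.005, 0.01, 0.02, 0.05):
    print(f"n=125, c=2, p={p:.3f}: P(accept) = {p_accept(125, 2, p):.2f}")

# Economically constrained plan (n = 5, c = 0): the acceptance probabilities
# at 10%, 20% and 45% true defectives are those quoted in Example 14.1.
for p in (0.10, 0.20, 0.45):
    print(f"n=5, c=0, p={p:.2f}: P(accept) = {p_accept(5, 0, p):.2f}")
```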
Example 5.4 (based on Kilsby et al., 1979) illustrates a special version of a Variables sampling plan for which the food safety criterion was defined as ‘reject with a 95% probability any “lot” where the proportion exceeding C is ≥ 10%’. The criterion is set, for n = 5, such that the producer can expect to reject with 95% probability any ‘lot’ where 10% or more of the samples might have colony counts that exceed an upper limit of C. But, on average, the scheme recognizes that there is still a 5% risk that a ‘lot’ would be accepted if more than 10% of samples have counts that exceed C. The scheme also sets a GMP criterion that accepts with 90% probability those ‘lots’ where the proportion exceeding a lower limit (Cm) is less than 20%; conversely, there is a greater than 10% risk of accepting a ‘lot’ where the proportion exceeding Cm is greater than 20%.
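Example 5.4 itself is not reproduced here, but the decision in such a Variables scheme reduces to comparing the sample mean plus a multiple of the sample standard deviation with each limit; the multipliers depend on n and on the chosen probabilities and are tabulated in Chapter 5. The sketch below shows only that general form: the function name is illustrative and the values supplied for k1 and k2 in the demonstration call are invented placeholders, not the tabulated constants.

```python
def variables_verdict(mean_log, sd, C, Cm, k1, k2):
    """General form of a Kilsby-type Variables decision on log counts.

    C, Cm  : upper (safety/quality) and lower (GMP) limits, log10 cfu/g
    k1, k2 : constants for the chosen n and probabilities (placeholders here;
             the proper values are tabulated in Chapter 5 and not reproduced)."""
    if mean_log + k1 * sd > C:
        return "reject: safety/quality criterion not met"
    if mean_log + k2 * sd > Cm:
        return "reject: GMP criterion not met"
    return "accept"

# Purely illustrative call with invented k values:
print(variables_verdict(mean_log=5.00, sd=0.25, C=6.0, Cm=5.0, k1=2.74, k2=1.07))
```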
EXAMPLE 14.1 COMPARISON OF ATTRIBUTES AND VARIABLES SAMPLING PLANS

The level of sampling necessary to provide a high degree of protection to both the manufacturer and the consumer cannot be achieved in practice. In Chapter 5, it was shown that for a ‘lot’ consisting of, say, 2000 product units, an Attributes Single Sampling Plan would require 125 sample units to be examined (Table 5.4). If the level of sampling is fixed not in relation to the ‘lot’ size but by economic and practical constraints associated with testing, both the producer and consumer risks increase considerably. Suppose that the data on colony counts used in Example 5.4 were applied to an Attributes sampling plan, defined as n = 5, c = 2, m = 5 log10 cfu/g and M = 6 log10 cfu/g. The colony count data are:

Lot    Log10 cfu/g                          Mean (x̄)    SD (s)
A      4.52; 4.28; 4.79; 4.91; 4.50         4.60         0.25
B      4.98; 5.02; 5.28; 4.61; 5.11         5.00         0.25
For an Attributes plan, each individual result is required to comply with the criteria; thus for lot A, since each of the counts is below the marginally acceptable limit (m), the ‘lot’ would be deemed acceptable. For lot B, where the counts have the same distribution as lot A, two counts exceed m, two counts equal m (after ‘rounding’, 4.98 and 5.02 both equal 5.0) and one count is lower than m. This ‘lot’ would also be acceptable by the Attributes plan. But as we saw in Example 5.4, in the Variables scheme ‘lot B’ would have been acceptable on the basis of the safety/quality criterion but not in respect of the GMP requirement. The acceptance criteria for the Variables sampling plan were:

● Reject the ‘lot’ with 95% probability if 10% or more of the colony counts exceed M (CM in the scheme) (i.e. a producer’s risk of 5%).
● Accept the ‘lot’ with a 90% probability if less than 20% of the colony counts exceed m (Cm in the scheme) (i.e. a consumer’s risk of 10%).
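The Attributes check described above can be written out directly; the sketch below applies the plan (n = 5, c = 2, m = 5.0, M = 6.0) to the two lots tabulated earlier, rounding each count to one decimal place as in the text. The function name and the rounding rule are illustrative choices only.

```python
def attributes_verdict(log_counts, m, M, c):
    """Three-class Attributes decision: reject if any count exceeds M or if
    more than c counts fall in the marginal band (> m but <= M)."""
    rounded = [round(x, 1) for x in log_counts]   # 'rounding' as in the text
    defective = sum(1 for x in rounded if x > M)
    marginal = sum(1 for x in rounded if m < x <= M)
    return "accept" if defective == 0 and marginal <= c else "reject"

lot_A = [4.52, 4.28, 4.79, 4.91, 4.50]
lot_B = [4.98, 5.02, 5.28, 4.61, 5.11]
for name, lot in (("A", lot_A), ("B", lot_B)):
    print("lot", name, attributes_verdict(lot, m=5.0, M=6.0, c=2))  # both 'accept'
```

Both lots are accepted by the Attributes plan, even though, as noted above, lot B failed the GMP requirement of the Variables scheme.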
Now for a three-class Attributes sampling plan for which the criteria limits are m = 5.0 and M = 6.0, the AQL will be dependent upon the number of replicate samples examined (n) and the number of results (c) that are marginally acceptable (i.e. > 5.0 but ≤ 6.0). The proportion of defective or marginally defective samples that are likely to be accepted with a probability of 95% can be derived for various sampling plans using the method of Bray et al. (1973), as described in Chapter 5 and illustrated in Table 5.10. If we assume zero defectives (i.e. no colony counts > M), then for a sampling plan with n = 5, c2 = 0 and either c1 = 1 or 2, the AQL for acceptance of marginal defectives with a 90% probability would approximate to 10% or 25%, respectively. However, there is also a 90% probability that either plan could accept at least 1% actual defectives with 5% or 20% marginal defectives. It is not possible to determine with any certainty the proportion of marginal defectives that would be rejected on 90% of occasions, but it is much larger (> 40% marginal defectives). Hence, an Attributes plan for an identical number of samples (n = 5, c1 = 2) would be less discriminatory and would be more likely to accept a higher number of defective or marginally defective samples than would the Variables plan. For this reason alone, one must question the justification for basing microbiological criteria on Attributes schemes for legislative purposes. When only a limited number of samples can be analysed, best use of the analytical data can be achieved only by adoption of a Variables sampling scheme. However, whichever scheme is adopted, it is essential to define at an early stage the degree of ‘risk’ that would be acceptable.
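The acceptance probabilities quoted above can be checked from the trinomial expansion (cf. Bray et al., 1973), assuming independent sample results. The sketch below simply enumerates the admissible combinations of defective and marginally defective outcomes for a three-class plan; the particular proportions tried are those discussed in this example, and the function name is an illustrative choice.

```python
from math import factorial

def p_accept_3class(n, c1, c2, p_marginal, p_defective):
    """P(accept) for a three-class Attributes plan: no more than c1 marginal
    results (> m but <= M) and no more than c2 defectives (> M) among n
    independent samples (trinomial model)."""
    p_good = 1.0 - p_marginal - p_defective
    total = 0.0
    for d in range(c2 + 1):
        for j in range(c1 + 1):
            if d + j > n:
                continue
            ways = factorial(n) // (factorial(d) * factorial(j) * factorial(n - d - j))
            total += ways * p_defective**d * p_marginal**j * p_good**(n - d - j)
    return total

# n = 5, c2 = 0, assuming no true defectives: marginal-defective levels
# accepted with ~90% probability for c1 = 1 and c1 = 2.
print(round(p_accept_3class(5, c1=1, c2=0, p_marginal=0.10, p_defective=0.0), 2))
print(round(p_accept_3class(5, c1=2, c2=0, p_marginal=0.25, p_defective=0.0), 2))
# With 1% actual defectives plus 5% or 20% marginals, acceptance is still ~90%.
print(round(p_accept_3class(5, 1, 0, 0.05, 0.01), 2))
print(round(p_accept_3class(5, 2, 0, 0.20, 0.01), 2))
```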
Consider an Attributes sampling plan based on similar criterion limits (i.e. m = 5 log10 cfu/g and M = 6 log10 cfu/g) and assume that no ‘unacceptable’ results (i.e. greater than 6 log10 cfu/g) are obtained on examining five replicate samples. For such a plan there would be a 59% chance of accepting the ‘lot’ with a true incidence of 10% defectives, a 33% chance of accepting a ‘lot’ with 20% true defectives and even a 5% chance of accepting a ‘lot’ with 45% defectives. Hence, the Attributes plan, based on the same results but ignoring the variance of the test results, offers a lower standard of manufacturer and consumer protection than does the Variables plan.

THE RELEVANCE OF MICROBIAL MEASUREMENT UNCERTAINTY TO MICROBIOLOGICAL CRITERIA

The Codex Committee on analysis and sampling discussed an outline proposal (Anon., 2004) that included a recommendation that ‘Codex Commodity Committees concerned with commodity specifications and the relevant analytical methods should state, inter alia, what allowance is to be made for measurement uncertainty when deciding whether or not an analytical result falls within specification’. The proposal also noted that ‘this requirement may not apply in situations where a direct health hazard is concerned, such as for food pathogens’. Whilst the Committee indicated a need for a standardized approach to the use of uncertainty measures in interpreting microbiological data, no specific approach was proposed. Previously, the UK Accreditation Service (Anon., 2000c) had provided guidance on how laboratories might cite and interpret analytical data and the European DG SANCO
(Anon., 2003) assessed the problems associated with differing national interpretations of uncertainty in relation to compliance with defined limits. The DG SANCO report (Anon., 2003) recommended that ‘… measurement uncertainty (should) be taken into account when assessing compliance with a specification’. The report continued: ‘In practice, if we are considering a maximum value in legislation, the analyst will determine the analytical level and estimate the measurement uncertainty of that level, subtract the uncertainty from the reported concentration and use that value to assess compliance. Only if that value is greater than the legislation limit will the control analyst be sure “beyond reasonable doubt” that the sample concentration of the analyte is greater than that prescribed by legislation’. Note that this recommendation is dependent upon expressing uncertainty on an additive scale, where the expanded uncertainty (U) can be subtracted from the estimated value to give the lower 97.5% confidence boundary.

Samples drawn from a ‘lot’ should, but may not, be truly representative of the ‘lot’ (Anon., 2002d) and the results of any examination will provide only an estimate of the true microbial population. As discussed previously, even in a well-controlled laboratory, quantitative microbiological methods are subject to many sources of error that often lead to substantial estimates of measurement uncertainty. The expanded repeatability measurement uncertainty (i.e. the 95% confidence boundaries) of an analysis can frequently extend to ±5% of the mean colony count (i.e. up to ±0.25 log cfu) from replicate tests on a single sample, and inter-sample and inter-laboratory variation results in even wider limits (see e.g. Corry et al., 2007b; Jarvis et al., 2007a,b). Furthermore, the intrinsic errors associated with MPN and other tube dilution methods are even greater, and their 95% confidence limits (CL) are so wide that these methods should generally be restricted to use in industrial quality assessment, except where one is looking for specific changes in an otherwise stable situation (e.g. in potable water analysis). The level of uncertainty attributable to sampling is poorly documented. For chemical analyses on foods the uncertainty due to sampling is greater than the measurement uncertainty (Lyn et al., 2002, 2003), and limited data from Jarvis et al. (2007b) indicate that a similar situation may apply to microbiological analyses. Hence, the use of criteria limits that take no account of such variation cannot be considered to be scientifically sound. For instance, if a mean log count is (say) 5.8 and the 95% measurement uncertainty limit is ±0.5 log cycles then, on 19 occasions out of 20, the mean count indicates that the ‘true’ result lies within the log range 5.3–6.3, and a true result outside this range could be expected on one occasion in 20. An Attributes scheme makes no allowance for such variation except in so far as it is reflected in the positioning of the control limits.

There are different ways in which measurement uncertainty estimates could be applied in relation to criteria limits. For upper control limits, the suggestions of various bodies include:

1. Subtracting the expanded 95% uncertainty value from a mean result before comparing the ‘corrected’ result with the limit, to ensure ‘compliance without reasonable doubt’ (Anon., 2003).
2. Applying a ‘guard band’ to the limit by adding the uncertainty value to the limit before comparing the actual result (essentially the same effect as in (1) but avoiding the necessity to ‘correct’ each result) (Anon., 2007).
3. Applying the general statistical procedures for the CL of a mean value to permit comparison of the possible range of the true value with the limit.
4. Ignoring uncertainty estimates completely because wide estimates demonstrate poor laboratory technique; not necessarily true, but arguable.

Some of these options are illustrated in Example 14.2.

EXAMPLE 14.2 CAN MEASUREMENT UNCERTAINTY BE USED IN ASSESSING COMPLIANCE WITH A CRITERION?

Compliance of mean colony counts

EC legislation (Anon., 2005b) on microbiological criteria for swabs of meat carcases (other than pig, for which different limits are set) uses a two-class plan and sets criteria limits for Enterobacteriaceae (2.5 log10 cfu/cm2) and aerobic colony count (5.0 log10 cfu/cm2) that require compliance of the daily mean colony count with the absolute limit (M). Suppose that replicate swabs of carcases are examined daily for aerobic colony count, and that mean values and estimates of expanded measurement uncertainty for the data are determined. Suppose, also, that the expanded (95% probability) intra-laboratory reproducibility uncertainty is 0.5 log10 cfu/cm2 and that the mean colony counts (as log10 cfu/cm2) on four separate series of replicate examinations are A = 4.0, B = 4.6, C = 5.4 and D = 6.0. Figure 14.3 illustrates the upper and lower bounds of the 95% confidence intervals of these four mean colony counts in relation to the criterion limit value (M = 5.0 log10 cfu/cm2).
FIGURE 14.3 Implication of measurement uncertainty on mean values in relation to a criterion limit. Four scenarios show mean counts (log10 cfu/cm2) of A = 4.0, B = 4.6, C = 5.4 and D = 6.0 against a limit value of 5.0 log10 cfu/cm2, with upper and lower confidence intervals based on an expanded measurement uncertainty value of 0.5 log10 cfu/cm2.
Scenario A shows a mean colony count of 4.0 log10 cfu/cm2 with the upper boundary of the 95% CL at 4.5 log10 cfu/cm2; here there can be no dispute: the test result is compliant with the limit of 5.0 log10 cfu/cm2. Scenario B shows a mean colony count of 4.6 log10 cfu/cm2, but the upper boundary of the 95% CL (5.1 log10 cfu/cm2) exceeds the criterion limit. Scenario C shows a mean colony count (5.4 log10 cfu/cm2) that is above the criterion limit, but the lower boundary of the 95% CL (4.9 log10 cfu/cm2) is below the criterion limit; are these data compliant with the limit or not? Scenario D shows a mean colony count of 6.0 log10 cfu/cm2 with the lower boundary of the 95% CL at 5.5 log10 cfu/cm2, both above the criterion limit; here, again, there can be no dispute: the results do not comply with the criterion.

Whilst the decisions regarding Scenarios A and D are clear-cut, the situation is less clear for the data in Scenarios B and C. Some might consider Scenario B to be acceptable, since the mean value is below the criterion limit and, if one follows the proposal of DG SANCO (Anon., 2003) to deduct the expanded uncertainty value from the mean value, the ‘corrected’ mean value (i.e. the lower boundary of the 95% CL) is well below the limit value. Hence it can be argued that the colony count is compliant with the limit. However, others might deem this result to be unacceptable since the upper boundary of the 95% CL exceeds the criterion limit. For Scenario C, the mean value is above the criterion limit and might be interpreted as non-compliant; however, after deduction of the expanded uncertainty value the ‘corrected’ mean count is below the criterion limit, so the count is again compliant with the limit. In both cases, non-compliance could not be demonstrated ‘beyond reasonable doubt’, so the test data would be judged to comply with the criterion. However, this approach is not generally accepted at this time.
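A minimal sketch of the comparison of Scenarios A to D is given below. The limit (5.0 log10 cfu/cm2) and expanded uncertainty (0.5 log10 cfu/cm2) are those assumed in this example, and the three-way labelling is simply shorthand for the discussion above, not a regulatory decision rule.

```python
LIMIT = 5.0   # criterion limit M (log10 cfu/cm2)
U = 0.5       # expanded (95%) measurement uncertainty (log10 cfu/cm2)

def classify_mean(mean_log_count, limit=LIMIT, u=U):
    """Compare a mean log count with an upper limit, allowing for expanded
    measurement uncertainty; labels are shorthand for the discussion above."""
    upper, lower = mean_log_count + u, mean_log_count - u
    if upper <= limit:
        return "compliant: whole 95% interval below the limit"
    if lower > limit:
        return "non-compliant: whole 95% interval above the limit"
    return "disputable: 95% interval straddles the limit"

for name, mean in (("A", 4.0), ("B", 4.6), ("C", 5.4), ("D", 6.0)):
    print(f"Scenario {name} (mean {mean}): {classify_mean(mean)}")
```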
Compliance of individual colony counts

Where a criterion requires that each individual result does not exceed a criterion limit, as in a normal two-class or three-class Attributes sampling plan, the situation is different. Suppose that, for the data from Scenario C above (mean count 5.4 log10 cfu/cm2), the individual colony counts on five replicate samples were 5.6, 5.0, 5.1, 5.5 and 5.8 log10 cfu/cm2. If the two-class sampling plan were, for instance, n = 5, c = 1, M = 5.0, then four of the five counts exceed the criterion limit value (5 log10 cfu/cm2), so the data set would be deemed to be non-compliant. But if the expanded measurement uncertainty were to be subtracted from each individual value, the ‘corrected’ values would be 5.1, 4.5, 4.6, 5.0 and 5.3 log10 cfu/cm2, respectively. Hence, although the corrected mean value is below the limit value (and therefore compliant), the individual colony counts would still be deemed to be non-compliant, since two of the corrected values (i.e. more than c = 1) exceed the criterion limit. Guidelines have been published for the expression and use of measurement uncertainty in chemical analyses (see, for instance, Anon., 2007), but none exist for data on microbiological examination. The microbiology community needs to consider this matter on an international basis and to come forward with proposed Guidelines on compliance with criteria.
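The individual-count calculation above can be verified in the same way: subtract the expanded uncertainty from each replicate count (the DG SANCO-style ‘correction’) and apply the plan n = 5, c = 1, M = 5.0. The counts are those quoted for Scenario C; the variable names are illustrative.

```python
counts = [5.6, 5.0, 5.1, 5.5, 5.8]   # Scenario C replicates (log10 cfu/cm2)
U, M, c = 0.5, 5.0, 1                # expanded uncertainty, limit, acceptance number

raw_exceedances = sum(1 for x in counts if x > M)
corrected = [round(x - U, 1) for x in counts]
corrected_exceedances = sum(1 for x in corrected if x > M)

print("raw exceedances:", raw_exceedances)              # 4 of 5 -> non-compliant
print("corrected counts:", corrected)                   # [5.1, 4.5, 4.6, 5.0, 5.3]
print("corrected exceedances:", corrected_exceedances)  # 2, still more than c = 1
print("corrected mean:", round(sum(corrected) / len(corrected), 2))  # 4.9, below M
```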
Eurachem (Anon., 2007) recommends the approach in which ‘guard bands’ are set around a criterion limit to show the level of tolerance that can be given for a particular form of analysis. The width of the guard band is determined by knowledge of acceptable limits to the measurement uncertainty of the test method and forms part of the decision process in setting criteria. The benefit of this approach is that there is a clear statement of intent with regard to the measurement uncertainty that is deemed acceptable for a specific form of analysis. Such an approach provides greater transparency in setting and interpreting criteria. It can be argued that a three-class sampling plan for microbiological criteria already incorporates the guard band concept, since the lower limit (m) is the actual acceptance limit whilst the upper criterion limit (M) is an absolute that must never be exceeded. However, there is a significant difference in approach, since microbiological three-class plans are predicated on the interpretation of each of the n results against the criteria, rather than on the average of the n colony counts. This illustrates one of the many issues concerning measurement uncertainty that still need to be addressed internationally for microbiological examination of foods. It again raises the issue of whether it is justifiable to convert variables data into ‘simple’ go/no-go values for use in an Attributes sampling scheme, or whether the time is now right to change to a Variables sampling scheme that takes account of all the data relevant to a set of colony counts on replicate samples drawn from a ‘lot’. Failure to make a clear decision on this approach will hinder future development of microbiological criteria for foods.

Criteria based on presence or absence of a particular organism or group of organisms are affected not only by the distribution of the target organism(s) within the food but also by the adequacy, or otherwise, of the test procedures. Hence, it is necessary to use accredited methods with a level of sensitivity and specificity suitable for the intended purpose. Even then, tests giving negative results on a number of individual replicate samples do not guarantee freedom from the organism in question; they merely indicate that the probability of occurrence of the organism in the ‘lot’ is within certain tolerances. The tolerance will be dependent upon the number of samples tested and, of course, on the efficiency of the method used. However, in the same way that a scheme showing apparent absence of specific organisms cannot guarantee total absence of the organisms from the product, the detection of one or more positive samples could also arise by chance. Hence, product of equivalent quality could be rejected on one occasion yet be accepted on another if only a small number of samples is tested. For a lot having, say, 1% defectives (i.e. 1% of the 25 g sample units derived from the lot contain at least one detectable salmonella), if 10 samples are tested one would expect, on average, to detect salmonellae and therefore reject the lot (if c = 0) on 10 occasions out of 100, yet to fail to detect them and accept the lot on the other 90 occasions (Table 5.1). However, if the true prevalence of defectives were only 0.1%, tests on 10 samples would, on average, detect salmonellae only about once in every 100 such examinations.
The analyst is unable to differentiate between false negative and true negative results when carrying out quantal tests on real-life samples, whilst detection of confirmed positive results indicates that a presumption of a high prevalence of contamination is probably well founded.
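The detection probabilities quoted above are a straightforward binomial calculation, assuming independent sample units and a test that always detects the organism when it is present in a unit; the sketch below also shows, for comparison, how much the position improves if 100 units could be examined.

```python
def p_detect_at_least_one(prevalence, n_samples):
    """Probability that at least one of n independent sample units tests
    positive when a fraction 'prevalence' of units contains the organism
    (assumes a test that never misses the organism when present)."""
    return 1.0 - (1.0 - prevalence) ** n_samples

for prevalence in (0.01, 0.001):
    for n in (10, 100):
        print(f"prevalence {prevalence:.1%}, n = {n}: "
              f"P(detect at least one) = {p_detect_at_least_one(prevalence, n):.2f}")
```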
CONCLUSION

In establishing microbiological criteria, including sampling plans, due cognizance is required of the effects of sample numbers on the efficiency of the laboratory examination and on the overall work of the laboratory. The potential cost of intensifying testing schemes needs to be balanced against the costs of unnecessarily rejecting valuable food materials and/or of increasing the risk of spoilage or food-borne disease by accepting defective products. Those responsible for setting criteria must define the AQL, Producer’s Risk and Consumer’s Risk and must also recognise the imprecision of microbiological methods. It is important, therefore, that a ‘transparent’ approach to the setting of criteria is adopted, including a more generally accepted set of decision rules. It is also essential that internationally acceptable Guidelines be published for the use of measurement uncertainty in compliance assessment. Until such decisions have been taken it is doubtful whether meaningful criteria can be set for microbiological examination of foods.

Effective microbiological control comes from the use of GMP, HACCP and other control strategies at all stages of food production, distribution and storage, based on knowledge of the microbial ecology of particular foods under different process and storage conditions. Such control strategies also require effective inspection assessments of manufacturing processes. End-point testing on manufactured foods is effective only as a means of retrospective assessment of the process and storage conditions, although SPC provides trend analysis of microbiological data as an additional means of monitoring change in a manufacturing process. The distribution of organisms in foods and the statistical variation associated with methods of enumeration lead to the conclusion that, at present, microbiological criteria should not be used other than as guidelines and specifications. Except in the case of high levels of contamination by pathogens, there is little or no evidence that the imposition of legislative microbiological criteria for foods benefits food safety, other than in the retrospective investigation of food poisoning incidents.
References

Accum, F (1820) A Treatise on Adulteration of Food and Culinary Poisons. Longman, Hurst, Rees, Orme & Brown, London.
Amos, AJ (1960) Pure Food and Pure Food Legislation. Butterworths, London.
Anon. (1997) Principles for the establishment and application of microbiological criteria for foods. 3rd edition. Codex Alimentarius Guidelines, Report No CXG 21e. WHO/FAO, Rome.
Anon. (2000a) The interactions between assessors and managers of microbiological hazards in food. Report of a WHO Expert consultation, Kiel, Germany, 21–23 March 2000. WHO, Geneva.
Anon. (2000b) Joint FAO/WHO Expert Consultation on Risk Assessment of Microbiological Hazards in Foods. FAO Food and Nutrition Paper 71. WHO/FAO, Rome.
Anon. (2000c) The Expression of Uncertainty in Testing. UKAS Publication LAB 12. Laboratory of the Government Chemist, London.
Anon. (2002a) Regulation (EC) No. 178/2002 of the European Parliament and of the Council of 28 January 2002 laying down the general principles and requirements of food law, establishing the European Food Safety Authority and laying down procedures in matters of food safety. Offic. J. Eur. Comm., L31, 1–24, 1 February 2002.
Anon. (2002b) Principles and guidelines for incorporating microbiological risk assessment in the development of food safety standards, guidelines and related texts. Report of a joint FAO/WHO Consultation, Kiel, Germany, 18–22 March 2002. FAO/WHO/Institute for Hygiene and Food Safety of the Federal Dairy Research Centre, Kiel.
Anon. (2002c) Proposed draft principles and guidelines for the conduct of microbiological risk management. Joint FAO/WHO Food Standards Programme. Codex Committee on Food Hygiene 35th session. CX/FH 03/7. FAO and WHO, Rome and Geneva.
Anon. (2002d) Guide on Sampling for Analysis of Foods. NMKL Procedure No. 12. NMKL, National Veterinary Institute, Oslo, Norway.
Anon. (2003a) The relationship between analytical results, the measurement uncertainty, recovery factors and the provisions in EU food and feed legislation. Report to the EU Standing Committee on the Food Chain and Animal Health Working Group Draft, 5 June 2003.
Anon. (2004) The use of analytical results: sampling plans, relationship between the analytical results, the measurement uncertainty, recovery factors and provisions in Codex standards. CODEX Report ALINORM 04/27/23 Appendix VII. FAO/WHO, Geneva.
Anon. (2005a) Food safety management systems – requirements for any organisation in the food chain. ISO 22000, 2005.
Anon. (2005b) Commission Regulation No. 2073/2005 of 15th November 2005 on Microbiological Criteria for Foodstuffs. Offic. J. Eur. Union, L338, 1–26 of 22 December 2005.
Anon. (2006a) Food Safety Risk Analysis: A Guide for National Food Safety Authorities. FAO Food and Nutrition Paper 87. WHO/FAO, Rome.
Anon. (2006b) Commission Regulation (EC) No. 1881/2006 of 19 December 2006 setting maximum levels for certain contaminants in foodstuffs. Offic. J. Eur. Union, L364, 5–24 of 20 December 2006.
Anon. (2007) Eurachem/CITAC Guide: Use of Uncertainty Information in Compliance Assessment. Ellison, S L R and Williams, A (eds). Eurachem, LGC, Teddington (http://www.eurachem.org/guides/Interpretation_with_expanded%20uncertainty_2007_v1.pdf).
Baird-Parker, AC (1980) Food microbiology in the 1980s. Fd. Technol., Australia, 32, 70–77.
Bauman, HE (1974) The HACCP concept and microbiological hazard categories. Food Technol., 28(9), 30–34 and 74.
Bray, DF, Lyon, DA, and Burr, IW (1973) Three class attributes plans in acceptance sampling. Technometrics, 15(3), 575–585.
Charles, RHG (1979) Microbiological standards for foodstuffs. Health Trend, 11, 1–4.
Corry, JEL, Hedges, AJ, and Jarvis, B (2007b) Measurement uncertainty of the EU methods for microbiological examination of red meat. Food Microbiol., 24, 652–657.
Gilbert, RJ, de Louvois, J, Donovan, T, Little, C, Nye, K, Ribeiro, CD, Richards, J, Roberts, D, and Bolton, FJ (2000) Guidelines for the microbiological quality of some ready-to-eat foods sampled at the point of sale. Commun. Dis. Public Health, 3, 163–167.
Haas, CN (1983) Estimation of risk due to low doses of microorganisms, a comparison of alternative methodologies. Am. J. Epidemiol., 118, 573–582.
Haas, CN (2002) Conditional dose-response relationships for microorganisms: Development and application. Risk Anal., 22, 455–463.
Havelaar, AH, Nauta, MJ, and Jansen, JT (2004) Fine-tuning Food Safety Objectives and risk assessment. Int. J. Food Microbiol., 93, 11–29.
ICMSF (1986) Microorganisms in Foods. 2. Sampling for Microbiological Analysis: Principles and Specific Applications, 2nd edition. University of Toronto Press, Toronto.
ICMSF (1988) Microorganisms in Foods. 4. Application of the Hazard Analysis Critical Control Point (HACCP) System to Ensure Microbiological Safety and Quality. Blackwell, Oxford.
ICMSF (2002) Microorganisms in Foods. 7. Microbiological Testing in Food Safety Management. Kluwer Academic/Plenum Publishers, New York, USA.
ICMSF (2005) Microorganisms in Foods. 6. Microbial Ecology of Food Commodities. Kluwer Academic/Plenum Publishers, New York, USA.
Ingram, M (1961) Microbiological standards for foods. Food Technol., 15(2), 4–16.
Ingram, M (1971) Microbiological standards for foods. Food Industry South Af., 24(2), 8–13.
Ito, K (1974) Microbiological critical control points in canned foods. Food Technol., 28, 46–48.
Jarvis, B, Corry, JEL, and Hedges, AJ (2007) Estimates of measurement uncertainty from proficiency testing schemes, internal laboratory quality monitoring and during routine enforcement examination of foods. J. Appl. Microbiol., 103, 462–467.
Jarvis, B, Hedges, AJ, and Corry, JEL (2007) Assessment of measurement uncertainty for quantitative methods of analysis: Comparative assessment of the precision (uncertainty) of bacterial colony counts. Int. J. Food Microbiol., 16, 44–51.
Jouve, J-L (1999) Establishment of food safety objectives. Food Contr., 10, 303–305.
Jouve, J-L (2000) Good manufacturing practice, HACCP and quality systems. In Lund, BM, Baird-Parker, AC, and Gould, GW (eds.) The Microbiological Safety and Quality of Food, Vol. II, ch. 58. Aspen Publishers, Gaithersburg, Maryland, pp. 1627–1655.
Kilsby, DC, Aspinall, LJ, and Baird-Parker, AC (1979) A system for setting numerical microbiological specifications for foods. J. Appl. Bacteriol., 46, 591–599.
Leistner, L and Gould, GW (2001) Hurdle Technology: Combination Treatment for Food Stability, Safety and Quality. Kluwer Academic/Plenum Publishers, New York.
Lyn, JA, Ramsey, MH, and Wood, R (2002) Optimised uncertainty in food analysis: application and comparison between four contrasting ‘analyte–commodity’ combinations. Analyst, 127, 1252–1260.
Lyn, JA, Ramsey, MH, and Wood, R (2003) Multi-analyte optimisation of uncertainty in infant food analysis. Analyst, 128, 379–388.
Mossel, DAA (1982) Microbiology of Foods: The Ecological Essentials of Assurance and Assessment of Safety and Quality, 3rd edition. University of Utrecht, Utrecht.
Mossel, DA and Drion, EF (1979) Risk analysis. Its application to the protection of the consumer against food-transmitted diseases of microbial aetiology. Antonie van Leeuwenhoek, 45, 321–323.
Peterson, AC and Gunnerson, RE (1974) Microbiological critical control points in frozen foods. Food Technol., 28, 37–44.
Rieu, E, Duhem, K, Vindel, E, and Sanaa, M (2007) Food Safety Objectives should integrate the variability of the concentration of pathogen. Risk Anal., 27, 373–386.
Roberts, TA and Jarvis, B (1983) Predictive modelling of food safety with particular reference to Clostridium botulinum in model cured meat systems. In Roberts, TA and Skinner, FA (eds.) Food Microbiology: Advances and Perspectives. Academic Press, London.
Szabo, EA, Simons, L, Coventry, MJ, and Cole, MB (2003) Assessment of control measures to achieve a Food Safety Objective of less than 100 cfu of Listeria monocytogenes per gram at the point of consumption for fresh pre-cut iceberg lettuce. J. Food Protect., 66, 256–264.
Thatcher, FS (1955) Microbiological standards for foods: Their function and limitations. J. Appl. Bacteriol., 18, 449.
Thatcher, FS (1958) Microbiological standards for foods. II. What may microbiological standards mean? Food Technol., 12, 117–122.
Tuynenburg-Muys, G (1975) Microbial safety and stability of food products. Antonie van Leeuwenhoek, 41, 369–371.
Wallace, CA and Mortimer, S (1998) HACCP – A Practical Approach, 2nd edition. Aspen Pub, Gaithersburg, Maryland.
Whiting, RC (1995) Risk Analysis. A Quantitative Guide. Wiley, Chichester, UK.
Wilson, GS (1955) Symposium on food microbiology and public health: general conclusion. J. Appl. Bacteriol., 18, 629–630.