Probability and Statistics for Computer Science
JAMES L. JOHNSON Western Washington University
WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2003, 2008 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data is available. ISBN 978-0-470-38342-1
10 9 8 7 6 5 4 3 2 1
To my muses Shelley, Jessica, and Jimmy
Contents

Preface                                                          ix

1  Combinatorics and Probability                                  1
   1.1  Combinatorics                                             4
        1.1.1  Sampling without replacement                       6
        1.1.2  Sampling with replacement                         23
   1.2  Summations                                               34
   1.3  Probability spaces and random variables                  44
   1.4  Conditional probability                                  61
   1.5  Joint distributions                                      69
   1.6  Summary                                                  82

2  Discrete Distributions                                        91
   2.1  The Bernoulli and binomial distributions                 93
   2.2  Power series                                            108
   2.3  Geometric and negative binomial forms                   128
   2.4  The Poisson distribution                                150
   2.5  The hypergeometric distribution                         164
   2.6  Summary                                                 172

3  Simulation                                                   179
   3.1  Random number generation                                180
   3.2  Inverse transforms and rejection filters                189
   3.3  Client-server systems                                   205
   3.4  Markov chains                                           227
        3.4.1  Irreducible aperiodic Markov chains              238
        3.4.2  Convergence properties                           247
   3.5  Summary                                                 264

4  Discrete Decision Theory                                     271
   4.1  Decision methods without samples                        272
   4.2  Statistics and their properties                         285
   4.3  Sufficient statistics                                   314
   4.4  Hypothesis testing                                      333
        4.4.1  Simple hypothesis versus simple alternative      339
        4.4.2  Composite hypotheses                             354
   4.5  Summary                                                 363

5  Real Line-Probability                                        371
   5.1  One-dimensional real distributions                      374
   5.2  Joint random variables                                  396
   5.3  Differentiable distributions                            416
   5.4  Summary                                                 432

6  Continuous Distributions                                     437
   6.1  The normal distribution                                 437
        6.1.1  The univariate and bivariate normal distributions  437
        6.1.2  The multivariate normal distribution             455
   6.2  Limit theorems                                          468
        6.2.1  Convergence concepts                             471
        6.2.2  An inversion formula                             477
   6.3  Gamma and beta distributions                            490
   6.4  The χ² and related distributions                        504
   6.5  Computer simulations                                    527
   6.6  Summary                                                 540

7  Parameter Estimation                                         545
   7.1  Bias, consistency, and efficiency                       546
   7.2  Normal inference                                        557
   7.3  Sums of squares                                         568
   7.4  Analysis of variance                                    575
   7.5  Linear regression                                       608
   7.6  Summary                                                 635

A  Analytical Tools                                             641
   A.1  Sets and functions                                      641
   A.2  Limits                                                  647
   A.3  Structure of the real numbers                           659
   A.4  Riemann-Stieltjes integrals                             672
   A.5  Permutations and determinants                           688

B  Statistical Tables                                           713

Bibliography                                                    733

Index                                                           739
Preface

This text develops introductory topics in probability and statistics with particular emphasis on concepts that arise in computer science. It starts with the basic definitions of probability distributions and random variables and elaborates their properties and applications. Statistics is the major application domain for probability theory and consequently merits mention in the title. Unsurprisingly, then, the text treats the most common discrete and continuous distributions and shows how they find use in decision and estimation problems. It also constructs computer algorithms for generating observations from the various distributions.

However, the text has a major subtheme. It develops in a thorough and rigorous fashion all the necessary supporting mathematics. This approach contrasts with that adopted by most probability and statistics texts, which, for economy of space or for fear of mixing presentations of different mathematical sophistication, simply cite supporting results that cannot be proved in the context of the moment. With careful organization, however, it is possible to develop all the needed mathematics beyond differential and integral calculus and introductory matrix algebra, and this text purports to do just that. Of course, as the book lengthens to accommodate the supporting mathematics, some material from the typical introduction to probability theory must be omitted. I feel the omissions are minor and that all major introductory topics receive adequate attention. Moreover, engagement with the underlying mathematics provides an opportunity to understand probability and statistics at a much deeper level than that afforded by mechanical application of unproved theorems.

Although the presentation is as rigorous as a pure mathematics text, computer science students comprise the book's primary audience. Certain aspects of most computer science curriculums involve probabilistic reasoning, such as algorithm analysis and performance modeling, and frequently students are not sufficiently prepared for these courses. While it is true that most computer science curriculums do require a course in probability and statistics, these courses often fail to provide the necessary depth. This text certainly does not fail in presenting a thorough grounding in elementary probability and statistics. Moreover, it seizes the opportunity to extend the student's command of mathematical analysis. This approach is different from that taken by other probability and statistics texts currently aimed at computer science
curriculums. The more rigorous approach does require more work, both from the student and from the instructor, but the rewards are commensurate. The engineering sciences, like computer science, also tend to use texts that place more emphasis on mechanical application of results than on the mathematical derivation of such results. Consequently, engineering science students will also benefit from the deeper presentation afforded by this text. Nevertheless, the primary audience remains computer science students because many of the illustrative examples are computer science applications. Therefore, from this point forward, I assume that I am addressing a computer science student or instructor.

Computer science students typically follow a traditional curriculum that includes one or two terms of probability and statistics, which follow prerequisite courses in differential and integral calculus and linear algebra. Although these prerequisite courses do introduce limit processes and matrix transformations, they typically emphasize formulas that isolate applications from the underlying theory. For example, if we drain a swimming pool with a sinusoidal cross-section, we can calculate how fast the water level falls without invoking limit operations. We simply set up a standard differential ratio and equate it to the drain flow rate. Why this works is buried in the theory and receives less and less emphasis once a satisfactory collection of calculation templates is available. This text provides an opportunity to reconnect with the theoretical concepts of these prerequisite courses. As it probes deeper into the properties of probability distributions, the text puts these concepts to fruitful use in constructing rigorous proofs.

The book's ambient prose deals with the principal themes and applications of probability, and a sequence of mathematical support modules interrupts this prose at strategic junctures. With some exceptions, these modules appear as needed by the probability concepts under discussion. A reader can omit the modules and still obtain a good grounding in elementary probability and statistics, including philosophical interpretations of probability and ample exercise in the associated numerical techniques. Reading the support modules will, however, strengthen this understanding and will also arouse an appreciation for the mathematics itself.

The encapsulation is as follows. An appendix gathers selected topics from set theory, limit processes, the structure of the real numbers, Riemann-Stieltjes integrals, matrix transformations, and determinants. The treatment first reviews the material at an introductory level. The prepared reader will be familiar with these concepts from previous courses, but the results are nevertheless proved in detail. The less prepared reader will certainly find frequent recourse to the appendix, and the text provides pointers to the appropriate sections. However, even the prepared reader will benefit from the introductory presentations, which serve both as a review of proof technique and as an introduction to the argument style pursued in the main text. Upon completing an introductory review, the appendix then extends the topics as necessary to support the arguments that appear in the main body of the text. Therefore,
all chapters depend on the appendix for completeness. Even a reader well grounded in the aforementioned prerequisites can expect to spend some time mastering the specialized tools developed in the appendix.

The appendix, with its eclectic collection of review topics and specialized extensions, provides general mathematical background. There is need, however, for more specific supporting mathematics in connection with particular probabilistic and statistical concepts. Until perhaps halfway through the text, this supporting mathematics appears in mathematical interludes, which occur in each chapter. These interludes introduce particular results that are needed for the first time in that chapter. The first interlude deals with summation techniques, which are useful tools for the combinatoric problems associated with probability over equally likely outcomes. Others treat convergence issues in power series, stability features of Markov matrices, and sufficient statistics. Before taking up continuous distributions, however, it is appropriate to devote a full chapter to the mathematical issues that arise when one attempts to generalize discrete probability to uncountable sets and to the real line in particular. This chapter is actually a brief introduction to measure theory, and its logical place is just prior to the discussion of the common distributions on the real line. Two further interludes follow in subsequent chapters. They deal with limit theorems for continuous random variables and with decompositions of the sample variance.

In short, the text exploits opportunities to introduce the mathematical analysis necessary to establish the basic results of probability theory. Moreover, the presentation clearly considers the mathematical analysis and the probability theory to be of equal importance. The following sketch shows the dependencies among the chapters, with the understanding that portions of the appendix are prerequisite for any given path. The dashed boxes note the mathematical interludes within the chapters.
[Dependency sketch: boxes for Chapter 5 (Probability on the Real Line), Discrete Decision Theory, Chapter 6 (Continuous Distributions), and Chapter 7 (Parameter Estimation), with dashed boxes marking the "Limit theorems" interlude in Chapter 6 and the "Sums of squares" interlude in Chapter 7.]
The reader can study the appendix in detail to ensure familiarity with all the background mathematics needed in the text, or can start immediately
with the probability discussions of Chapter 1 and refer to the appendix as needed. Because of its breadth, the appendix is more difficult to master in its entirety than the mathematical interludes of the introductory chapters. A reader who prefers that the material increase monotonically in difficulty should start with the introductory chapters and digress into the appropriate appendix sections as needed. When using the text to support a course, an instructor should follow a similar path.

As noted earlier, the intended audience is computer science students. Once past colored balls in numbered urns, which constitute the traditional examples in combinatoric problems, the text uses examples that reflect this readership. Client-server performance evaluation, for instance, offers many opportunities for probabilistic analysis. These examples should provide no difficulty for other readers, such as students from the engineering sciences, because the examples make no profound references to advanced concepts, but rather use generally accessible quantities, such as terminal response time, server queue length, error count per 1000 programming statements, or operation count in an algorithm. These examples are no more difficult than those in a more general probability text that ventures beyond the traditional urns, colored balls, and dice.

The requirements of the Computer Science Accreditation Board and the Accreditation Board of Engineering Technology (CSAB/ABET) include a one-semester course in probability and statistics. This text satisfies that requirement. In truth, it is sufficient for a full-year course because it not only develops the traditional introductory probability concepts but also includes considerable material on mathematical reasoning. For a one-semester course, the following selection is appropriate. Note that the topics lie along an acceptable dependency chain in the earlier diagram.

Appendix. Sections as referenced in the items below
Chapter 1. Combinatorics and Probability
Chapter 2. Discrete Distributions
Chapters 3-4. Simulation, Sections 3.1 to 3.3, or Discrete Decision Theory, Sections 4.1 and 4.2
Chapter 6. Continuous Distributions, Sections 6.1, 6.3, and 6.4
Chapter 7. Parameter Estimation, Sections 7.1, 7.2, and 7.4

The one-semester abbreviation is possible because Chapters 3 and 4 present major applications of discrete probability and, in the interest of time, need only be sampled. Chapter 5 is advanced material that elaborates the difficulties in extending discrete probability to uncountable sample spaces. It is present for logical completeness and to answer the nagging question that occurs to many students: Is the introduction of a sigma-algebra really necessary in the general definition of a probability space? Consequently, the proposed
one-semester course omits Chapter 5 with minimal impact on subsequent material. Finally, Chapter 7 undertakes major applications of continuous probability and also admits partial coverage.

At the time of this writing, many computer science curriculums include only a first probability course. However, there is a recognized need for further study, at least in the form of an elective second course, if not in a required sequel to the introductory course. Anticipating that this increased attention will also expose the need for a more complete mathematical treatment of the material, I have provided unusually detailed excursions into supporting topics, such as estimation arguments with limits, properties of power series, and Markov processes. Buttressed by these mathematical excursions, the text provides a thorough introduction to probability and statistics—concepts, techniques, and applications.

Consequently, it offers a continuing discussion of the real-world meaning of probabilities, particularly when the frequency-of-occurrence interpretation becomes somewhat strained. Any science that uses probability must face the interpretation challenge. How can you apply a result that holds only in a probabilistic sense to a particular data set? The text also discusses competing interpretations, such as the credibility-of-belief interpretation, which might appear more appropriate to history or psychology. The goal is, of course, to remain continually in touch with the real-world meaning of the concepts.

Probability as frequency of occurrence over many trials provides the most compelling interpretation of the phenomenon. It is intuitively plausible, for example, that a symmetric coin should have equal chances of landing heads or tails. The text attempts to carry this interpretation as far as possible. Indeed, the first chapter treats the combinatorics arising from symmetric situations, and this treatment serves as a prelude to the formal definitions of discrete probability. As the theory accumulates layer upon layer of reasoning, however, this viewpoint becomes difficult to sustain in certain cases. When testing a hypothesis, for example, we attempt to infer the prevailing state of nature from sampled data. What does it mean to assign a priori probabilities to the possible states? This practice allows statisticians to incorporate expert judgment into the decision rules, but the assigned probabilities do not admit a frequency-of-occurrence interpretation. Rather, they reflect relative strength-of-belief statements about the possible states. As necessary, the text interrupts the technical development to comment on the precise real-world interpretation of the model. Although beautiful as abstract theory, probability and statistics are also rightly praised for their ability to deliver meaningful statements about the real world. Interpreting the precise intent of these statements should be a primary goal of any text.

A trend in modern textbooks, particularly those not addressed specifically to a mathematics curriculum, is to avoid the theorem-proof presentation style. This style can be sterile and detached, in the sense that it provides sparse context for the motivation or application of the theorems. Without
the theorem-proof style, on the other hand, arguments lose some precision, and there is a blurring of the line between the general result and its specific applications. I have adopted what I consider a middle ground. I maintain a running prose commentary on the material, but I punctuate the dialog with frequent theorems. Often, the theorem's proof is a simple statement: "See discussion above." This serves to set off the general results, and it also provides reference points for later developments. The ambient prose remains connected with applications and with the questions that motivate the search for new general results. Plentiful examples, displayed in a contrasting typographical style, play a major role in compensating for the perceived coldness of the theorems. Incidentally, I should say that I do not find the theorems cold, even in isolation. But I am responding to the spirit of the age, which suggests that a theorem wrapped in an example is more digestible than a naked theorem.

The theorems also further a second ambition, noted above, which is to involve the reader more extensively in precise mathematical argument. An aspect of proofs that attracts major criticism is the tendency to display, out of thin air, an expression that magically satisfies all the required constraints and invites the algebraic manipulations necessary to complete the proof. I have tried to avoid this practice by including some explanation of the mysterious expression's origin. I must admit, however, that I am not always successful in this ploy. Sometimes an explanation adds nothing to a careful contemplation of the expression. In such cases, I am tempted to suggest that the reader reflect on the beauty of the expression, note how one part attaches to the known information while another extends toward the desired result, and view the expression as a unifying link, growing naturally from a study of the context of the problem in question. Instead, however, I fall back on the age-old practice: "Consider the following expression — " The reader should take these words as an invitation to pause and ponder the situation.

In summary, the text develops a main theme of probability and statistics, together with the mathematical techniques needed to support it. Since it is not practical to start with the Peano axioms, there are, however, some prerequisites. Specifically, the text's mathematical level assumes that the reader has mastered differential and integral calculus and has some exposure to matrix algebra. Nevertheless, acknowledging the mechanical fashion in which these subjects are taught these days, the text provides considerable detail in all arguments. It does assume, though, that readers have some familiarity with limiting operations, even if they do not have significant experience with the concept. For example, readers should be comfortable with l'Hôpital's rule for evaluating limits that initially appear to produce indeterminate results. By contrast, the text does develop the theory of absolutely convergent series to the point of justifying the interchange of summation order in double summations. As another example, the readers should be generally conversant with power series, although the text develops this topic sufficiently to justify the term-by-term differentiation that is needed to recover parameters from a
random variable's moment generating function. The background appendix and the mathematical interludes should bridge the gap between prerequisite knowledge and that needed to establish all probability and statistical concepts encountered. They also serve to present a self-contained book, which is my preference when learning new material. A reader can always skip sections that are peripheral to the main point, but cannot as easily fill in omissions.

Some expositions consist of step-by-step procedures for solving a probabilistic or statistical problem. That is, they involve algorithms. For example, algorithms appear in sections concerned with computer simulations of probabilistic situations. In any case, algorithms in this text appear as mutilated C code, in the sense that I vary the standard syntax as necessary to describe the algorithm most clearly to a human reader. For instance, I use only two iterators, the while-loop and the for-loop, and in each case, the indentation serves to delineate the body of statements under repetition. The left fragment below, intentionally meaningless to focus attention on the code structure, must be reformulated as shown on the right to actually compile properly.

    while (X > Y)                       while (X > Y) {
        for (j = 1; j < 3; j++)             for (j = 1; j < 3; j++) {
            ...                                 ...
                                            }
                                        }

[...]

Chapter 1: Combinatorics and Probability

[Table: the 36 equally likely outcomes (i, j), i, j = 1, ..., 6, for a roll of two dice.]

Since six of the 36 outcomes sum to seven, a seven total has probability 6/36. The calculations for other results are equally straightforward. The probability of a three total is 2/36. The probability that the absolute difference between the two dice is greater than two is 12/36 because the favorable outcomes are the upper-right and lower-left triangles, each containing six entries.

Probabilistic quantities appear frequently in the computer and engineering sciences. To the examples noted earlier we can add the time required for a computer to respond to a command from an interactive terminal, the number of errors in a program module, the number of polls to find a receptive device, the time to access a transmission bus, the number of memory bits flipped by cosmic radiation, or the number of compare operations executed by a sort algorithm. In many cases, it is difficult to calculate meaningful probability weights. For example, suppose that you measure the system response time T to a terminal command by directly observing a number of such operations. You then have an empirical estimate of the fraction of cases for which T = 2 seconds. But is it possible to compute this fraction in advance from your knowledge of the system parameters? Such a computation is not likely; it depends on circumstances that are neither repeatable nor easily controlled, such as parallel activity from other system users. By contrast, the number of compare operations in a sort algorithm is subject to analysis. The possible inputs of size N are the permutations of the numbers 1, 2, ..., N, and it is reasonable to assume that all such inputs are equally likely. For a specific input, the algorithm determines exactly the number of compare operations, and we can therefore proceed by direct count to enumerate the fraction of cases for which the count is a particular n.

This chapter opens our study of probability by considering just such situations, where we can compute the relevant probabilities by exhaustive counting. Each observation of a nondeterministic phenomenon is called a trial.
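The compare-count analysis lends itself to direct machine verification. The sketch below is not from the text; it uses standard compilable C rather than the text's relaxed syntax, and the function names are illustrative. It enumerates all N! permutations of 1, 2, ..., N for a small N, counts the compares that insertion sort performs on each, and reports the fraction of inputs yielding each count.

/* Exhaustive compare-count census for insertion sort over all N! inputs.
   A sketch, not the book's code; names are illustrative. */
#include <stdio.h>

#define N 4
#define MAXCMP (N * (N - 1) / 2)

static int counts[MAXCMP + 1];   /* counts[c] = # permutations needing c compares */
static long total;               /* total permutations examined */

/* Sort a copy of p[] by insertion sort, returning the number of compares. */
static int compares(const int *p) {
    int a[N], c = 0;
    for (int i = 0; i < N; i++) a[i] = p[i];
    for (int i = 1; i < N; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0) {
            c++;                          /* one compare: a[j] versus key */
            if (a[j] <= key) break;
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
    return c;
}

/* Enumerate all permutations of p[0..n-1] by Heap's algorithm. */
static void permute(int *p, int n) {
    if (n == 1) { counts[compares(p)]++; total++; return; }
    for (int i = 0; i < n - 1; i++) {
        permute(p, n - 1);
        int k = (n % 2 == 0) ? i : 0;     /* swap rule for Heap's algorithm */
        int t = p[k]; p[k] = p[n - 1]; p[n - 1] = t;
    }
    permute(p, n - 1);
}

int main(void) {
    int p[N];
    for (int i = 0; i < N; i++) p[i] = i + 1;
    permute(p, N);
    for (int c = 0; c <= MAXCMP; c++)
        if (counts[c] > 0)
            printf("%d compares: %d of %ld inputs (%.4f)\n",
                   c, counts[c], total, (double)counts[c] / total);
    return 0;
}

For N = 4, the 24 permutations split across compare counts 3 through 6: the sorted input needs only 3 compares, the reversed input all 6.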
In symmetric situations, such as coin tosses or dice rolls, we can reasonably assume that each elementary outcome appears with equal frequency in a long sequence of trials. However, as we associate trials in various ways, the equal frequencies quickly disappear. For example, a tossed coin displays either a head or a tail, and we expect a sequence of tosses to exhibit both results with approximately equal frequencies. But if we change the definition of a trial to cover all tosses necessary to obtain a head, each observation is a number of tails followed by a head, and these observations are probably not all equally likely. Do you feel that 6 tails followed by a head will occur as often as 2
tails followed by a head? This chapter's early sections develop the tools to answer such questions. In the process, we obtain an intuitive appreciation for the probability of an event, such as 6 tails followed by a head, as its relative frequency of occurrence in an extended sequence of observations. In a particular application, the usual difficulty is the systematic counting of the total possible outcomes and of the outcomes favorable to a certain result. Unlike the simple examples cited above, most applications provide a very large field of possible outcomes, and it is not feasible to list them explicitly.

Before launching into the details, we take the time to elaborate three counting principles. All are immediately obvious, so you may feel that there is no need to state them explicitly. In complicated situations, however, when you cannot seem to find a starting point for a solution, you might well return to these simple principles.

The first principle suggests breaking the mass to be counted into disjoint pieces and then summing the subcounts from these pieces. That is, if you want to count rooms within a building, you first perform subcounts on each floor and then add the subcounts. In applying this principle, you need to ascertain that there is no overlap among the pieces, which could lead to overcounting some elements.

The second principle, known as the multiplicative principle, is actually a special case of the first principle. It again breaks the mass into disjoint pieces, but now such that each piece contains the same number of elements. The total count is then the number of pieces times the count of any one piece. If, for example, you want the number of houses on a rectangular site, you count the houses in one row and multiply by the number of rows.

The third principle, known as the pigeonhole principle, states that the allocation of more than n pigeons to n pigeonholes results in multiple occupancy for at least one pigeonhole. If we randomly allocate four balls among three containers, what is the probability that all containers receive at most one ball? By the pigeonhole principle, it is zero. Some container must receive at least two balls. Consider a related question: What is the probability that the leftmost container receives at most one ball? Now, that is a more complicated story, which we now begin.
1.1 Combinatorics
Many situations call for counting the number of ways to select, under specified constraints, objects from a given set. When we say that a set contains n objects, we understand, in keeping with the usual definition of a set, that the objects are distinct. We may wish, nevertheless, to disregard distinctions among certain elements. For example, in a set of 7 balls, we can have 4 reds and 3 blacks. In this case, we can adopt a notation that both captures the elements' distinct identities and emphasizes the two internal classes of red and black. The designation {r1, r2, r3, r4, b1, b2, b3} serves this purpose, but other schemes are equally valid. The best notation varies with the context of the problem at hand.

In some manner or other, a problem context always specifies three general constraints and perhaps further particular constraints. The general constraints answer the questions: How many objects are selected? Does the selection order make a difference? After drawing an object and recording the result, do we return the object to the set before drawing the next item? The responses to some questions affect the possible answers to others. For example, if you return objects to the common pool after each selection, then the total number of objects drawn can exceed the pool size.

1.1 Definition: An ordered selection of k objects is a k-sequence. An unordered selection of k objects is a k-combination. In the process of sampling with replacement, we note the identity of each object as it is drawn, but we return it to the common pool before the next draw. In the alternative process, sampling without replacement, we do not return the drawn object to the pool. |

From the set {a, b, c, d}, we generate four 3-combinations and twenty-four 3-sequences when sampling without replacement. The upper tabulation of Table 1.1 suggests a 6-to-1 relationship between the sequences and the combinations. Sampling with replacement, on the other hand, yields twenty 3-combinations and sixty-four 3-sequences, as illustrated in the lower tabulation. If there is a relationship between a combination and the sequences involving the same choices, it is less apparent. Duplications appear in both sequences and combinations when choosing with replacement, and this complicates the count.

(a) Sampling without replacement
Comb.   Sequences
abc     abc, acb, bac, bca, cab, cba
abd     abd, adb, bad, bda, dab, dba
acd     acd, adc, cad, cda, dac, dca
bcd     bcd, bdc, cbd, cdb, dbc, dcb

(b) Sampling with replacement
Comb.   Sequences                          Comb.   Sequences
abc     abc, acb, bac, bca, cab, cba       bbc     bbc, bcb, cbb
abd     abd, adb, bad, bda, dab, dba       bcc     bcc, cbc, ccb
acd     acd, adc, cad, cda, dac, dca       bbd     bbd, bdb, dbb
bcd     bcd, bdc, cbd, cdb, dbc, dcb       bdd     bdd, dbd, ddb
aab     aab, aba, baa                      ccd     ccd, cdc, dcc
abb     abb, bab, bba                      cdd     cdd, dcd, ddc
aac     aac, aca, caa                      aaa     aaa
acc     acc, cac, cca                      bbb     bbb
aad     aad, ada, daa                      ccc     ccc
add     add, dad, dda                      ddd     ddd

TABLE 1.1. 3-combinations and their related 3-sequences from a field of 4 symbols

As noted earlier, we use the term probability for relative frequency of occurrence. Suppose, for example, that we are generating sequences with-
out replacement in the context of Table 1.1. Note that 4 of the 24 possible sequences contain the ordered pair "ab." Consequently, we say that the probability of "ab" occurring as a subsequence is 4/24 = 1/6. This policy agrees with our intuition when all sequences are equally likely. When, in due course, we come to a formal definition of probability, we will find that it conforms with this frequency-of-occurrence notion.
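We can check these counts by brute force. The following sketch is illustrative, not the text's code; it generates the twenty-four 3-sequences drawn without replacement from {a, b, c, d} and counts those containing the adjacent ordered pair "ab", reporting 4 of 24 and thereby confirming the 1/6 probability above.

/* Brute-force check of Table 1.1's counts: 3-sequences without
   replacement from {a,b,c,d}, and how many contain "ab" as an
   ordered adjacent pair.  A sketch, not the book's code. */
#include <stdio.h>

int main(void) {
    const char sym[] = "abcd";
    int sequences = 0, with_ab = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++) {
                if (i == j || j == k || i == k) continue;  /* no replacement */
                sequences++;
                /* "ab" may occupy positions 1-2 or 2-3 */
                if ((sym[i] == 'a' && sym[j] == 'b') ||
                    (sym[j] == 'a' && sym[k] == 'b'))
                    with_ab++;
            }
    printf("3-sequences: %d, containing \"ab\": %d (%.4f)\n",
           sequences, with_ab, (double)with_ab / sequences);
    return 0;
}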
1.1.1 Sampling without replacement
Under sampling without replacement, neither a sequence nor a combination exhibits a duplicate entry. Sequences, however, have more structure because the insertion order is important. Our goal is to find systematic methods for counting sequences and combinations under sampling without replacement, but also under a variety of further constraints. We start with sequences.

Suppose that we have seven tiles, labeled with the letters: a b c d e f g. Suppose further that we consider a word to be any 3-sequence. How many such words can we compose with the seven tiles? Possible words are abc, cag, def, fed, and the like. We imagine a repetitive process. On each trial, the process selects three tiles from the initial seven and assembles them in the order selected to form a word. There is no replacement as the tiles are selected. However, all seven tiles are reinstated at the beginning of each trial. With this clarification, the desired count is the number of 3-sequences from a set of seven symbols, where the selection process is without replacement.

We apply the multiplicative counting principle that breaks the possibilities into groups of the same size. In particular, a systematic count arises from partitioning the words into seven groups—those beginning with a, those beginning with b, and so forth. Within the first group, where the first letter is a, we distinguish six subgroups: those with second letter b, those with second letter c, and so forth through those with second letter g. In the second group, where the first letter is b, we again distinguish six subgroups: those with second letter a, those with second letter c, and so forth through those with second letter g. An important observation is that each group divides into the same number of subgroups. Table 1.2 organizes these subdivisions. Within each subgroup, where the first two letters are now specified, there remains a choice of one of the five remaining letters for the last tile. Hence each of the 42 subgroups divides naturally into five further pieces, each distinguished by one of the remaining five letters. So, if we choose three tiles randomly from the group of seven and lay them out in the order chosen, we will form one of exactly 7 · 6 · 5 = 210 three-letter words. In terms of Definition 1.1, this demonstration shows that there are exactly 210 possible ways of choosing, without replacement, a sequence of length 3 from a group of size 7.

Now suppose that we want the probability that a three-letter sequence chosen without replacement from a field of seven letters has its first two letters in alphabetical order. In other words, we want the fraction of the 210 sequences that have the first two letters in order.
First letter   Second letter   Possible third letters
a              b               c d e f g
a              c               b d e f g
a              d               b c e f g
a              e               b c d f g
a              f               b c d e g
a              g               b c d e f
b              a               c d e f g
b              c               a d e f g
b              d               a c e f g
b              e               a c d f g
b              f               a c d e g
b              g               a c d e f
c              a               b d e f g
c              b               a d e f g
c              d               a b e f g
c              e               a b d f g
c              f               a b d e g
c              g               a b d e f

TABLE 1.2. Organizing the three-letter words from seven distinct tiles
Referring to the organization above, we see that the first group, containing 6 · 5 = 30 sequences, meets the criterion. Since the first letter is a, the first two letters must be in order, regardless of the remaining letters. Sequences in the second group, however, begin with b, so the first subgroup, where the second letter is a, must be excluded. The second group then contributes 5 · 5 = 25 sequences that meet the criterion. The third group starts with c, so we must exclude its first two subgroups, where the second letter is a or b. Hence, we obtain only 4 · 5 = 20 sequences here. The pattern is clear: Each group must exclude one more subgroup than its predecessor. The total number of sequences with the first two letters in order is then 6·5 + 5·5 + 4·5 + 3·5 + 2·5 + 1·5 + 0·5 = 105, which gives a probability of 105/210 = 1/2.

Is this result surprising? The number of sequences with the first two letters in order is exactly one-half of the total number of three-letter sequences. A so-called "sanity check" is always advisable. This means that we should try to confirm by some other method that the result is correct or at least reasonable. An obvious check in this case is that the result is a proper fraction, between zero and 1, because it represents the ratio of some number of selected possibilities to the total number of possibilities. So the result is reasonable in this sense. However, we can argue further that it should be 1/2 exactly. Suppose that we reorganize the three-letter sequences into two different groups: G1 contains those sequences with the first two letters in order and G2 contains those with the first two letters out of order. We can define a function f : G1 → G2 by f(xyz) = yxz, where xyz is a three-letter
sequence from G1. This means that x < y alphabetically, so yxz is indeed a sequence in G2. This function establishes a one-to-one correspondence between the two sets, so they must have the same number of elements. Using the notation |X| to indicate the number of elements in a set, we have that the probability of a three-letter sequence having its first two letters in order is |G1| / |G1 ∪ G2| = |G1| / (2|G1|) = 1/2.

With some additional notation, we can extend the example to a general result. Recall that n! means the product of the integers n · (n − 1) · (n − 2) · · · 1 and that we define 0! = 1. The notation P_{n,k} will mean the first k factors in the expansion for n!.

1.2 Definition: For integers n ≥ 0, k ≥ 0, P_{n,k} denotes the kth falling factorial of n. It is the product n · (n − 1) · (n − 2) · · · (n − k + 1). For the moment, we think of the "P" as an abbreviation for "product." As a special boundary case, we define P_{n,0} = 1. |

For k > n the expansion contains a zero factor, and therefore P_{n,k} = 0.
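The falling factorial is easy to compute, and the seven-tile example above can be checked exhaustively. The sketch below is illustrative, not the text's code (pfall is an invented name); it computes P_{7,3} and counts both the 210 three-letter words and the 105 words whose first two letters are in alphabetical order.

/* Falling factorial P(n,k) and a brute-force check of the seven-tile
   example: 210 three-letter words, 105 with the first two letters in
   alphabetical order.  A sketch; pfall is an illustrative name. */
#include <stdio.h>

/* P(n,k) = n(n-1)...(n-k+1); yields 1 for k = 0 and 0 once a factor hits 0. */
static long pfall(int n, int k) {
    long p = 1;
    for (int i = 0; i < k; i++) p *= (n - i);
    return p;
}

int main(void) {
    int words = 0, ordered = 0;
    for (int a = 0; a < 7; a++)
        for (int b = 0; b < 7; b++)
            for (int c = 0; c < 7; c++) {
                if (a == b || b == c || a == c) continue;  /* no replacement */
                words++;
                if (a < b) ordered++;   /* first two letters in order */
            }
    printf("P(7,3) = %ld, brute force = %d, first two in order = %d\n",
           pfall(7, 3), words, ordered);
    return 0;
}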
[...]

C_{n,0} · 0 + C_{n,1} · 1 + ⋯ + C_{n,n} · n = Σ_{k=0}^{n} k · C_{n,k} = n · 2^{n−1}
The last simplification involves summing k·C_{n,k}, which you may or may not know how to do at this time. Section 1.2 will review these matters, but for the moment, even if you are not able to perform the reduction here, you can easily verify that the general formula generates the correct result for n = 10. □
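The worked example leading up to this remark is garbled in the present copy. Assuming the formula in question is the identity Σ_{k=0}^{n} k · C_{n,k} = n · 2^{n−1} displayed above, the following sketch (illustrative, not the text's code) performs the suggested check for n = 10, where both sides equal 5120.

/* Check sum_{k=0}^{n} k*C(n,k) = n*2^(n-1) for n = 10.  A sketch;
   the identity is assumed from the surrounding discussion. */
#include <stdio.h>

/* C(n,k) by the exact iterative recurrence C(n,i) = C(n,i-1)*(n-i+1)/i */
static long choose(int n, int k) {
    long c = 1;
    for (int i = 1; i <= k; i++) c = c * (n - i + 1) / i;
    return c;
}

int main(void) {
    int n = 10;
    long sum = 0;
    for (int k = 0; k <= n; k++) sum += (long)k * choose(n, k);
    printf("sum = %ld, n*2^(n-1) = %ld\n", sum, (long)n << (n - 1));
    return 0;
}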
Exercises

1.1 The binary coded decimal (BCD) code assigns a four-bit binary pattern to each digit as follows.

    Digit:  0    1    2    3    4    5    6    7    8    9
    Code: 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001

This is just one of N possible codes that express the ten digits as distinct four-bit binary patterns. What is N? The BCD code leaves six patterns
unused: 1010, 1011, 1100, 1101, 1110, and 1111. Other codes involve different sets of unused patterns, although each such set will always contain six members. Assume that the first step in constructing a code specifies the set of unused patterns. How many choices are available at this step?
1.2 What is the probability of receiving a full-house poker hand? A full-house hand contains 2 cards of one value and 3 cards of a different value.

1.3 What is the probability of a straight-flush poker hand? A straight-flush hand contains 5 cards of consecutive value, all from the same suit.

1.4 In the game of bridge, point cards are aces, kings, queens, and jacks. What is the probability of receiving a 13-card bridge hand containing 3 or more point cards?

1.5 A 5-person committee is formed from a group of 6 men and 7 women. What is the probability that the committee has 3 or more men?

1.6 Seventeen 10-kilohm resistors are inadvertently mixed with a batch of eighty 100-kilohm resistors. Five resistors are drawn from the combined group and wired in series. What is the probability that the series resistance will be less than 500 kilohms?

1.7 If you randomly choose 3 days from the calendar, what is the probability that they all fall in June?

1.8 In a group of 1000 persons, 513 plan to vote Republican and 487 plan to vote Democrat. What is the probability that a random sample of 25 persons will erroneously indicate a Democratic victory?

*1.9 Suppose that 8 balls are tossed toward a linear arrangement of 10 open boxes. What is the probability that the leftmost box receives no balls?

*1.10 You arrive at the gym to find a single free locker, number 4 in a row of 10. You take that locker. When you finish your workout, you find 6 lockers, including yours, still occupied. What is the probability that the neighboring lockers on either side of yours are now free?

1.11 Calculate Σ_{k1+k2+k3+k4=7} 7!/[k1! · k2! · k3! · k4!]. The notation means the sum over all sequences of four nonnegative integers, chosen with replacement, that add to 7.

1.12 A football fan is sitting high in the stadium and can barely make out the cheerleaders far below on the field. These cheerleaders are arranged in a line and are waving colored fabrics. If 3 cheerleaders have red fabrics, 2 have green fabrics, and 2 have blue fabrics, how many color patterns are discernible by the faraway fan?
1.13 Bob and Alice are dinner guests at a party of eight, 4 male and 4 female. The hostess arranges the guests linearly along a table with the men on one side and the women on the other. What is the probability that Bob and Alice will be facing each other or be within one position of facing each other?

1.14 Six devices d1, d2, ..., d6 connect with a central processor, which polls them in order to identify one that is ready to communicate. Suppose that three are ready to communicate when the polling cycle starts. What is the probability that the central processor issues exactly two polling requests before communicating?

1.15 If you rearrange the letters of "Mississippi," how many distinguishable patterns are possible?

*1.16 In the context of Example 1.12, what fraction of paths from (0,0) to (10,10) never rise above the diagonal? That is, what fraction of paths pass only through nodes (x, y) with x ≥ y?
1.1.2 Sampling with replacement
Let us now consider the situation where repetitions are permitted in choosing from a set of n objects. That is, each chosen object, after being recorded in the result, rejoins the set before another object is drawn. This allows an object to appear multiple times in the k-sequence or k-combination under construction. It is an easy task to determine the number of k-sequences from a set of size n, but it is somewhat less obvious how to calculate the number of k-combinations. We start with the easy theorem.

1.19 Theorem: Let n > 0 and k ≥ 0. Under sampling with replacement, a set of n objects admits n^k k-sequences.
PROOF: Let S be the set in question, with |S| = n. Let s(n, k) denote the number of k-sequences that arise from S. There is just one empty sequence, and 1 = n^0. Also, there are n = n^1 1-sequences, each consisting of a single object from S. Therefore, s(n, 0) = 1 and s(n, 1) = n. We now proceed by induction on k. Assume that s(n, j) = n^j, for j = 1, 2, ..., k − 1. Now consider all k-sequences arising from S. Divide them into groups according to the first symbol. There are n such groups. Within each group, the sequence following the initial symbol is a (k − 1)-sequence from S because S reacquires the initial symbol before any subsequent choices. By the induction hypothesis, the number of such continuation sequences is s(n, k − 1) = n^{k−1}. Hence there are n groups of n^{k−1} sequences, giving n · n^{k−1} = n^k sequences in total. |

1.20 Example: A sequence of heads and tails results from tossing a fair coin 10 times. What is the probability that heads and tails alternate in the sequence?
The sequence is one of the 2^10 = 1024 10-sequences arising from the set {H, T} when sampling with replacement. Of these 1024 possibilities, only two provide a strict alternation of heads and tails—the one starting with a head and alternating
thereafter, and the one starting with a tail and alternating thereafter. So the probability is 2/1024 = 0.001953. The coin may be fair, but strict alternation of the two possibilities is not likely. □

1.21 Example: A sequence of heads and tails results from tossing a fair coin 10 times. What is the probability that the sequence has exactly five heads in it?
The sequence is again one of 2^10 = 1024 possible 10-sequences that arise from the set {H, T} when sampling with replacement. To determine the number of sequences with exactly 5 heads, we note that such a sequence is fixed once we determine the positions of the 5 heads. Hence, we must choose 5 positions, without replacement, from the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Theorem 1.7 states that such choices number C_{10,5} = 252. The desired probability is then 252/1024 = 0.246. Again, the coin may be fair, but the chance that such fairness will be manifest as an equal split over 10 tosses is rather smaller than 0.5. Indeed, if we were looking for an even split in 100 tosses, the probability would be C_{100,50}/2^{100} = 0.0796. This suggests, correctly it turns out, that the chances of an even split become less as the number of tosses increases. Figure 1.4 provides some credibility for this suggestion.

Since the probability of an even head-tail split is zero when the number of coin tosses is odd, all the graphs in Figure 1.4 provide probability points only for even numbers of coin tosses. The upper-left graph shows how the probability of an even split drops rapidly as the number of tosses grows larger, but it appears to level out. Switching to a logarithmic scale on the probability axis, we can discern a bit more detail about the transition, as shown in the graph in the upper right. However, the flattening of the decline rate persists in this representation. Using logarithmic scales on both axes provides the clearest indication of the overall trend, as shown in the lower-right graph. □

[Figure 1.4: Probability of an equal head-tail split in a sequence of coin tosses, plotted against the number of tosses on linear and logarithmic (base 2) axes; see Example 1.21.]
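The trend that Figure 1.4 plots is easy to regenerate. The probability of an even split in 2m tosses is C_{2m,m}/2^{2m}, which the sketch below (illustrative, not the text's code) evaluates as a running product to avoid overflowing an integer type.

/* Probability of an exactly even head-tail split in 2m tosses:
   C(2m,m)/2^(2m), computed as a running product to avoid overflow.
   A sketch reproducing the trend plotted in Figure 1.4. */
#include <stdio.h>

int main(void) {
    int ms[] = {1, 5, 50, 500};   /* i.e., 2, 10, 100, 1000 tosses */
    for (int t = 0; t < 4; t++) {
        int m = ms[t];
        double p = 1.0;
        /* C(2m,m)/4^m = product over i = 1..m of (m+i)/(4i) */
        for (int i = 1; i <= m; i++)
            p *= (m + i) / (4.0 * i);
        printf("tosses = %4d   P(even split) = %.6f\n", 2 * m, p);
    }
    return 0;
}

For 10 tosses it prints 0.246094, and for 100 tosses 0.079589, matching the example's figures.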
1.22 Theorem: Let n > 0 and k ≥ 0. Under sampling with replacement, C_{n+k−1,k} k-combinations arise from a set with n objects.
PROOF: Let the elements of the set be x_1, ..., x_n. In a given k-combination,
each x_i is present some integer number of times, say k_i ≥ 0. Furthermore, these repetition numbers, provided that their sum is k, completely specify the particular k-combination. So the set {(k_1, k_2, ..., k_n) : k_i ≥ 0, Σ k_i = k} is in one-to-one correspondence with the k-combinations. Therefore, an equivalent question is: How many such ordered sets of n nonnegative integers add to k? These are n-sequences chosen from {0, 1, 2, 3, ..., k} with replacement. Although we know from Theorem 1.19 that the total number of such n-sequences is (k + 1)^n, not all of these n-sequences sum to k. Instead, consider the following indirect approach to counting just those n-sequences that add to k.

Imagine k asterisks and n − 1 vertical bars, arranged linearly. Each such display contains n + k − 1 symbols. The possibilities for n = 4, k = 3 appear as Table 1.5. (The table depicts 3-combinations from the set {a, b, c, d} rather than from {x_1, x_2, x_3, x_4} in order to facilitate comparison with Table 1.1. That earlier table breaks out all the 3-combinations from a set of size 4 but does not offer a method for counting them systematically.)

Pattern    Repetitions   3-comb      Pattern    Repetitions   3-comb
*|*|*|     (1,1,1,0)     abc         |**|*|     (0,2,1,0)     bbc
*|*||*     (1,1,0,1)     abd         |*|**|     (0,1,2,0)     bcc
*||*|*     (1,0,1,1)     acd         |**||*     (0,2,0,1)     bbd
|*|*|*     (0,1,1,1)     bcd         |*||**     (0,1,0,2)     bdd
**|*||     (2,1,0,0)     aab         ||**|*     (0,0,2,1)     ccd
*|**||     (1,2,0,0)     abb         ||*|**     (0,0,1,2)     cdd
**||*|     (2,0,1,0)     aac         ***|||     (3,0,0,0)     aaa
*||**|     (1,0,2,0)     acc         |***||     (0,3,0,0)     bbb
**|||*     (2,0,0,1)     aad         ||***|     (0,0,3,0)     ccc
*|||**     (1,0,0,2)     add         |||***     (0,0,0,3)     ddd

TABLE 1.5. Star-bar patterns and 3-combinations from a field of 4 symbols (see Theorem 1.22)

The number of asterisks to the left of the first bar specifies the repetition count for the first symbol from the set. The number of asterisks between the first and second bar specifies the repetition count for the second symbol, and so forth. Because exactly k asterisks appear in each pattern, the sum of the repetition counts as separated by the bars is always k. Each distinct sequence of repetition counts corresponds to a k-combination, and vice versa. The total number of k-combinations is then the number of ways of choosing n − 1 positions for the bars out of the total of n + k − 1 possible positions. Theorem 1.7 gives that value as C_{n+k−1,n−1} = C_{n+k−1,k}, which proves the theorem. |

1.23 Theorem: The number of n-sequences (x_1, x_2, x_3, ..., x_n) with nonnegative integer components such that Σ_{i=1}^{n} x_i = k is C_{n+k−1,k}.
PROOF: The proof of Theorem 1.22 establishes a one-to-one correspondence between these n-sequences and the k-combinations from a set of size n. The theorem also asserts that the latter count is C_{n+k−1,k}. |

The preceding section noted three interpretations of C_{n,k}, and there are
three parallel interpretations for C_{n+k−1,k}. They again involve a selection process, an allocation process, and a count of constrained vectors.

1. C_{n+k−1,k} counts the number of ways to select k distinguishable objects from a pool of n, given that each object may be chosen zero, one, or more times.
2. C_{n+k−1,k} counts the number of ways to allocate k indistinguishable objects to n containers, given that a container may acquire zero, one, or more objects.
3. C_{n+k−1,k} counts the number of n-vectors whose components are nonnegative integers summing to k.

Theorem 1.23 notes the equivalence of the selection solution and the number of n-vectors with nonnegative integer components summing to k. The equivalence of the allocation solution follows because each such n-vector specifies an allocation pattern. The vector (k_1, k_2, ..., k_n) specifies k_1 objects in the first container, k_2 in the second, and so forth.

In discussing the pigeonhole principle, the chapter's introduction noted the impossibility of randomly allocating four balls among three containers such that all containers receive at most 1 ball. It then asked the probability of an allocation in which the leftmost container receives at most 1 ball. We can now answer that question. Total allocations number C_{3+4−1,4} = 15, while those favoring the leftmost container give it zero or 1 ball. If the leftmost container receives zero balls, the remaining two containers must accommodate four balls, which they can do in C_{2+4−1,4} = 5 ways. If the leftmost container receives 1 ball, the remaining two must receive 3 balls, which can happen in C_{2+3−1,3} = 4 ways. The probability is then (5 + 4)/15 = 0.6 that the leftmost container receives at most 1 ball.

In this small example, we can tabulate the possible allocations. In the scheme below, (k_1, k_2, k_3) means that the left container receives k_1 balls, the center container receives k_2, and the right container k_3. Of course, the k_i must sum to four. The asterisks mark the allocations in which the left container receives at most 1 ball.

    * (0,0,4)   * (0,3,1)   * (0,2,2)
    * (0,4,0)   * (1,3,0)     (2,2,0)
      (4,0,0)     (3,0,1)   * (1,1,2)
    * (0,1,3)     (3,1,0)   * (1,2,1)
    * (1,0,3)     (2,0,2)     (2,1,1)
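The same tabulation can be produced mechanically. The sketch below (illustrative, not the text's code) enumerates every allocation (k1, k2, k3) of the four balls and flags those giving the leftmost container at most 1 ball, confirming the count of 9 favorable allocations among 15.

/* Enumerate allocations (k1,k2,k3), ki >= 0, k1+k2+k3 = 4, and count
   those with k1 <= 1.  A sketch verifying the 9/15 = 0.6 result. */
#include <stdio.h>

int main(void) {
    int total = 0, favorable = 0;
    for (int k1 = 0; k1 <= 4; k1++)
        for (int k2 = 0; k1 + k2 <= 4; k2++) {
            int k3 = 4 - k1 - k2;        /* k3 is forced by the sum */
            total++;
            if (k1 <= 1) favorable++;
            printf("%s(%d,%d,%d)\n", k1 <= 1 ? "* " : "  ", k1, k2, k3);
        }
    printf("favorable/total = %d/%d = %.2f\n",
           favorable, total, (double)favorable / total);
    return 0;
}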
When choosing k-sequences or k-combinations with replacement, it is possible for k to be much larger than n, the size of the pool. The following example illustrates this possibility and also investigates the difference between k-sequences and k-combinations in probability calculations.

1.24 Example: Ten tosses of a fair coin produces an ordered pair (h, t), where h is the number of heads over the 10 tosses and t is the number of tails. Of course, h + t = 10. How many ordered pairs can result from the experiment? What is the probability of getting (3, 7)?
The elements h and t can assume any of the values from {0, 1, ..., 10}, provided that their sum is 10. According to Theorem 1.23, the number of such 2-sequences that sum to 10 is C_{2+10−1,10} = C_{11,10} = 11. Such a high-powered formula is not necessary in this simple case. We see that all possible (h, t) pairs must be of the form (h, 10 − h), with 0 ≤ h ≤ 10. Hence, there are 11 such pairs. Also, a given (h, t) pair corresponds to a 10-combination from {H, T}, so Theorem 1.22 also provides the correct answer: C_{10+2−1,10} = 11.
Now, since there are 11 possible 10-combinations and just 1 with 3 heads, it might appear that the probability of getting (3, 7) is 1/11. This is not correct. When calculating a probability as a fraction of some number of equally likely possibilities, we must check carefully that the denominator equitably represents the field of possibilities. Of the 11 possible 10-combinations, are all equally likely in a physical experiment with ten coins? Does (1, 9), in which just 1 head occurs, appear as often as (5, 5), in which 5 heads occur? No, these outcomes are not equally likely. What actually occurs are 10-sequences, such as HTTHHHTTHT. By Theorem 1.19, there are 2^10 = 1024 such sequences, and they are equally likely. Also, since a sequence with 3 heads is determined once the positions of the heads are known, we must have as many such sequences with 3 heads as there are 3-combinations, without replacement, from the positions {1, 2, ..., 10}. Using Theorem 1.7, we then compute
Pr(3, 7) = C_{10,3} / 2^{10} = 120/1024 = 0.117
Pr(5, 5) = C_{10,5} / 2^{10} = 252/1024 = 0.246
Pr(1, 9) = C_{10,1} / 2^{10} = 10/1024 = 0.0098. □

When calculating a probability as a fraction of cases favorable to some constraint out of all possible cases, we must be more careful when the process is sampling with replacement. In the simpler situation, sampling without replacement, we can normally use either sequences or combinations, as long as we are consistent in the numerator and denominator. As suggested by Table 1.1, each k-combination, without replacement, gives rise to k! k-sequences. Therefore, a solution using k-sequences in numerator and denominator is the same solution using k-combinations, after multiplying numerator and denominator by k!. The value of the fraction does not change. If you want, for example, the probability of receiving 4 hearts when dealt 4 cards from a well-shuffled deck, you can form the appropriate 4-sequence ratio, P_{13,4}/P_{52,4} = 0.00264, or the corresponding 4-combination ratio, C_{13,4}/C_{52,4} = 0.00264. The numerator and denominator of the first fraction are just 4! = 24 times those of the second. Because combinations are simpler and extend more easily to complicated situations, they are more frequently used in situations where the underlying process is sampling without replacement.

In sampling with replacement, each k-combination does not give rise to k! k-sequences; sometimes the number of sequences is much less, as illustrated in Table 1.1. Therefore, the ratio of k-combinations is not the same as the ratio of the corresponding k-sequences. In sampling with replacement, it is normally the ratio of the appropriate k-sequences that corresponds to the reality of the experiment. The following example provides another situation where a simple
sanity check shows that a proposed solution involving k-combinations cannot be correct.

1.25 Example: Suppose that 10 people are gathered in a room. What is the probability that two or more persons have the same birthday? By having the same birthday, we mean born on the same day of the year, not necessarily being the same age.
We imagine each person announcing a birthday from the 365 possible choices. The birthdays constitute a 10-combination from the set of days: {1, 2, 3, ..., 365}. The 10-combination arises from sampling with replacement because an announcement can duplicate a birthday already used. The number of such 10-combinations is D = C_{365+10−1,10}. The combinations corresponding to no two persons having the same birthday are just those that happen to have no duplicate elements. That is, they correspond to the 10-combinations that you could draw without replacement. Hence they number C_{365,10}. The rest of the combinations, N = C_{374,10} − C_{365,10}, then count the combinations for which at least two persons share a birthday.
It might appear that the probability of two or more persons having the same birthday is N/D. This is not correct. Consider the following sanity check. Suppose that the room contains only two persons. You then expect the probability of a shared birthday to be 1/365. (You ask one person for his birthday, and you have 1 chance in 365 that it will match that of the other person.) The reasoning above, however, gives
N/D = (C_{365+2−1,2} − C_{365,2}) / C_{365+2−1,2} = 365/66795 = 1/183,

which is not the expected 1/365.
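The example breaks off at this point in the present copy. For completeness, the sketch below (illustrative, not the text's code) evaluates both the combination ratio N/D derived above and the probability computed over equally likely k-sequences of birthdays, 1 − P_{365,k}/365^k (the usual birthday calculation), for k = 2 and k = 10; the two disagree, which is the sanity check's point.

/* Birthday problem: compare the faulty combination ratio N/D with the
   sequence-based probability 1 - P(365,k)/365^k.  A sketch. */
#include <stdio.h>
#include <math.h>

/* log C(n,k) as a running sum, to avoid integer overflow */
static double log_choose(int n, int k) {
    double s = 0.0;
    for (int i = 1; i <= k; i++)
        s += log((double)(n - k + i)) - log((double)i);
    return s;
}

int main(void) {
    int people[] = {2, 10};
    for (int t = 0; t < 2; t++) {
        int k = people[t];
        /* combination ratio: N/D = 1 - C(365,k)/C(365+k-1,k) */
        double ratio = 1.0 - exp(log_choose(365, k) - log_choose(365 + k - 1, k));
        /* over equally likely k-sequences: 1 - P(365,k)/365^k */
        double q = 1.0;
        for (int i = 0; i < k; i++) q *= (365.0 - i) / 365.0;
        printf("k = %2d: combination ratio = %.6f, sequence probability = %.6f\n",
               k, ratio, 1.0 - q);
    }
    return 0;
}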