Univariate Discrete Distributions
Univariate Discrete Distributions THIRD EDITION
NORMAN L. JOHNSON University of North Carolina Department of Statistics Chapel Hill, North Carolina
ADRIENNE W. KEMP University of St. Andrews Mathematical Institute North Haugh, St. Andrews United Kingdom
SAMUEL KOTZ George Washington University Department of Engineering Management and Systems Engineering Washington, D.C.
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright 2005 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. 
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com. Library of Congress Cataloging-in-Publication Data: ISBN 0-471-27246-9 Printed in the United States of America. 10 9 8 7 6 5 4 3 2 1
To the memory of Norman Lloyd Johnson (1917–2004)
Contents

Preface

1 Preliminary Information
  1.1 Mathematical Preliminaries
    1.1.1 Factorial and Combinatorial Conventions
    1.1.2 Gamma and Beta Functions
    1.1.3 Finite Difference Calculus
    1.1.4 Differential Calculus
    1.1.5 Incomplete Gamma and Beta Functions and Other Gamma-Related Functions
    1.1.6 Gaussian Hypergeometric Functions
    1.1.7 Confluent Hypergeometric Functions (Kummer’s Functions)
    1.1.8 Generalized Hypergeometric Functions
    1.1.9 Bernoulli and Euler Numbers and Polynomials
    1.1.10 Integral Transforms
    1.1.11 Orthogonal Polynomials
    1.1.12 Basic Hypergeometric Series
  1.2 Probability and Statistical Preliminaries
    1.2.1 Calculus of Probabilities
    1.2.2 Bayes’s Theorem
    1.2.3 Random Variables
    1.2.4 Survival Concepts
    1.2.5 Expected Values
    1.2.6 Inequalities
    1.2.7 Moments and Moment Generating Functions
    1.2.8 Cumulants and Cumulant Generating Functions
    1.2.9 Joint Moments and Cumulants
    1.2.10 Characteristic Functions
    1.2.11 Probability Generating Functions
    1.2.12 Order Statistics
    1.2.13 Truncation and Censoring
    1.2.14 Mixture Distributions
    1.2.15 Variance of a Function
    1.2.16 Estimation
    1.2.17 General Comments on the Computer Generation of Discrete Random Variables
    1.2.18 Computer Software

2 Families of Discrete Distributions
  2.1 Lattice Distributions
  2.2 Power Series Distributions
    2.2.1 Generalized Power Series Distributions
    2.2.2 Modified Power Series Distributions
  2.3 Difference-Equation Systems
    2.3.1 Katz and Extended Katz Families
    2.3.2 Sundt and Jewell Family
    2.3.3 Ord’s Family
  2.4 Kemp Families
    2.4.1 Generalized Hypergeometric Probability Distributions
    2.4.2 Generalized Hypergeometric Factorial Moment Distributions
  2.5 Distributions Based on Lagrangian Expansions
  2.6 Gould and Abel Distributions
  2.7 Factorial Series Distributions
  2.8 Distributions of Order-k
  2.9 q-Series Distributions

3 Binomial Distribution
  3.1 Definition
  3.2 Historical Remarks and Genesis
  3.3 Moments
  3.4 Properties
  3.5 Order Statistics
  3.6 Approximations, Bounds, and Transformations
    3.6.1 Approximations
    3.6.2 Bounds
    3.6.3 Transformations
  3.7 Computation, Tables, and Computer Generation
    3.7.1 Computation and Tables
    3.7.2 Computer Generation
  3.8 Estimation
    3.8.1 Model Selection
    3.8.2 Point Estimation
    3.8.3 Confidence Intervals
    3.8.4 Model Verification
  3.9 Characterizations
  3.10 Applications
  3.11 Truncated Binomial Distributions
  3.12 Other Related Distributions
    3.12.1 Limiting Forms
    3.12.2 Sums and Differences of Binomial-Type Variables
    3.12.3 Poissonian Binomial, Lexian, and Coolidge Schemes
    3.12.4 Weighted Binomial Distributions
    3.12.5 Chain Binomial Models
    3.12.6 Correlated Binomial Variables

4 Poisson Distribution
  4.1 Definition
  4.2 Historical Remarks and Genesis
    4.2.1 Genesis
    4.2.2 Poissonian Approximations
  4.3 Moments
  4.4 Properties
  4.5 Approximations, Bounds, and Transformations
  4.6 Computation, Tables, and Computer Generation
    4.6.1 Computation and Tables
    4.6.2 Computer Generation
  4.7 Estimation
    4.7.1 Model Selection
    4.7.2 Point Estimation
    4.7.3 Confidence Intervals
    4.7.4 Model Verification
  4.8 Characterizations
  4.9 Applications
  4.10 Truncated and Misrecorded Poisson Distributions
    4.10.1 Left Truncation
    4.10.2 Right Truncation and Double Truncation
    4.10.3 Misrecorded Poisson Distributions
  4.11 Poisson–Stopped Sum Distributions
  4.12 Other Related Distributions
    4.12.1 Normal Distribution
    4.12.2 Gamma Distribution
    4.12.3 Sums and Differences of Poisson Variates
    4.12.4 Hyper-Poisson Distributions
    4.12.5 Grouped Poisson Distributions
    4.12.6 Heine and Euler Distributions
    4.12.7 Intervened Poisson Distributions

5 Negative Binomial Distribution
  5.1 Definition
  5.2 Geometric Distribution
  5.3 Historical Remarks and Genesis of Negative Binomial Distribution
  5.4 Moments
  5.5 Properties
  5.6 Approximations and Transformations
  5.7 Computation and Tables
  5.8 Estimation
    5.8.1 Model Selection
    5.8.2 P Unknown
    5.8.3 Both Parameters Unknown
    5.8.4 Data Sets with a Common Parameter
    5.8.5 Recent Developments
  5.9 Characterizations
    5.9.1 Geometric Distribution
    5.9.2 Negative Binomial Distribution
  5.10 Applications
  5.11 Truncated Negative Binomial Distributions
  5.12 Related Distributions
    5.12.1 Limiting Forms
    5.12.2 Extended Negative Binomial Model
    5.12.3 Lagrangian Generalized Negative Binomial Distribution
    5.12.4 Weighted Negative Binomial Distributions
    5.12.5 Convolutions Involving Negative Binomial Variates
    5.12.6 Pascal–Poisson Distribution
    5.12.7 Minimum (Riff–Shuffle) and Maximum Negative Binomial Distributions
    5.12.8 Condensed Negative Binomial Distributions
    5.12.9 Other Related Distributions

6 Hypergeometric Distributions
  6.1 Definition
  6.2 Historical Remarks and Genesis
    6.2.1 Classical Hypergeometric Distribution
    6.2.2 Beta–Binomial Distribution, Negative (Inverse) Hypergeometric Distribution: Hypergeometric Waiting-Time Distribution
    6.2.3 Beta–Negative Binomial Distribution: Beta–Pascal Distribution, Generalized Waring Distribution
    6.2.4 Pólya Distributions
    6.2.5 Hypergeometric Distributions in General
  6.3 Moments
  6.4 Properties
  6.5 Approximations and Bounds
  6.6 Tables, Computation, and Computer Generation
  6.7 Estimation
    6.7.1 Classical Hypergeometric Distribution
    6.7.2 Negative (Inverse) Hypergeometric Distribution: Beta–Binomial Distribution
    6.7.3 Beta–Pascal Distribution
  6.8 Characterizations
  6.9 Applications
    6.9.1 Classical Hypergeometric Distribution
    6.9.2 Negative (Inverse) Hypergeometric Distribution: Beta–Binomial Distribution
    6.9.3 Beta–Negative Binomial Distribution: Beta–Pascal Distribution, Generalized Waring Distribution
  6.10 Special Cases
    6.10.1 Discrete Rectangular Distribution
    6.10.2 Distribution of Leads in Coin Tossing
    6.10.3 Yule Distribution
    6.10.4 Waring Distribution
    6.10.5 Narayana Distribution
  6.11 Related Distributions
    6.11.1 Extended Hypergeometric Distributions
    6.11.2 Generalized Hypergeometric Probability Distributions
    6.11.3 Generalized Hypergeometric Factorial Moment Distributions
    6.11.4 Other Related Distributions

7 Logarithmic and Lagrangian Distributions
  7.1 Logarithmic Distribution
    7.1.1 Definition
    7.1.2 Historical Remarks and Genesis
    7.1.3 Moments
    7.1.4 Properties
    7.1.5 Approximations and Bounds
    7.1.6 Computation, Tables, and Computer Generation
    7.1.7 Estimation
    7.1.8 Characterizations
    7.1.9 Applications
    7.1.10 Truncated and Modified Logarithmic Distributions
    7.1.11 Generalizations of the Logarithmic Distribution
    7.1.12 Other Related Distributions
  7.2 Lagrangian Distributions
    7.2.1 Otter’s Multiplicative Process
    7.2.2 Borel Distribution
    7.2.3 Consul Distribution
    7.2.4 Geeta Distribution
    7.2.5 General Lagrangian Distributions of the First Kind
    7.2.6 Lagrangian Poisson Distribution
    7.2.7 Lagrangian Negative Binomial Distribution
    7.2.8 Lagrangian Logarithmic Distribution
    7.2.9 Lagrangian Distributions of the Second Kind

8 Mixture Distributions
  8.1 Basic Ideas
    8.1.1 Introduction
    8.1.2 Finite Mixtures
    8.1.3 Varying Parameters
    8.1.4 Bayesian Interpretation
  8.2 Finite Mixtures of Discrete Distributions
    8.2.1 Parameters of Finite Mixtures
    8.2.2 Parameter Estimation
    8.2.3 Zero-Modified and Hurdle Distributions
    8.2.4 Examples of Zero-Modified Distributions
    8.2.5 Finite Poisson Mixtures
    8.2.6 Finite Binomial Mixtures
    8.2.7 Other Finite Mixtures of Discrete Distributions
  8.3 Continuous and Countable Mixtures of Discrete Distributions
    8.3.1 Properties of General Mixed Distributions
    8.3.2 Properties of Mixed Poisson Distributions
    8.3.3 Examples of Poisson Mixtures
    8.3.4 Mixtures of Binomial Distributions
    8.3.5 Examples of Binomial Mixtures
    8.3.6 Other Continuous and Countable Mixtures of Discrete Distributions
  8.4 Gamma and Beta Mixing Distributions

9 Stopped-Sum Distributions
  9.1 Generalized and Generalizing Distributions
  9.2 Damage Processes
  9.3 Poisson–Stopped Sum (Multiple Poisson) Distributions
  9.4 Hermite Distribution
  9.5 Poisson–Binomial Distribution
  9.6 Neyman Type A Distribution
    9.6.1 Definition
    9.6.2 Moment Properties
    9.6.3 Tables and Approximations
    9.6.4 Estimation
    9.6.5 Applications
  9.7 Pólya–Aeppli Distribution
  9.8 Generalized Pólya–Aeppli (Poisson–Negative Binomial) Distribution
  9.9 Generalizations of Neyman Type A Distribution
  9.10 Thomas Distribution
  9.11 Borel–Tanner Distribution: Lagrangian Poisson Distribution
  9.12 Other Poisson–Stopped Sum (multiple Poisson) Distributions
  9.13 Other Families of Stopped-Sum Distributions

10 Matching, Occupancy, Runs, and q-Series Distributions
  10.1 Introduction
  10.2 Probabilities of Combined Events
  10.3 Matching Distributions
  10.4 Occupancy Distributions
    10.4.1 Classical Occupancy and Coupon Collecting
    10.4.2 Maxwell–Boltzmann, Bose–Einstein, and Fermi–Dirac Statistics
    10.4.3 Specified Occupancy and Grassia–Binomial Distributions
  10.5 Record Value Distributions
  10.6 Runs Distributions
    10.6.1 Runs of Like Elements
    10.6.2 Runs Up and Down
  10.7 Distributions of Order k
    10.7.1 Early Work on Success Runs Distributions
    10.7.2 Geometric Distribution of Order k
    10.7.3 Negative Binomial Distributions of Order k
    10.7.4 Poisson and Logarithmic Distributions of Order k
    10.7.5 Binomial Distributions of Order k
    10.7.6 Further Distributions of Order k
  10.8 q-Series Distributions
    10.8.1 Terminating Distributions
    10.8.2 q-Series Distributions with Infinite Support
    10.8.3 Bilateral q-Series Distributions
    10.8.4 q-Series Related Distributions

11 Parametric Regression Models and Miscellanea
  11.1 Parametric Regression Models
    11.1.1 Introduction
    11.1.2 Tweedie–Poisson Family
    11.1.3 Negative Binomial Regression Models
    11.1.4 Poisson Lognormal Model
    11.1.5 Poisson–Inverse Gaussian (Sichel) Model
    11.1.6 Poisson Polynomial Distribution
    11.1.7 Weighted Poisson Distributions
    11.1.8 Double-Poisson and Double-Binomial Distributions
    11.1.9 Simplex–Binomial Mixture Model
  11.2 Miscellaneous Discrete Distributions
    11.2.1 Dandekar’s Modified Binomial and Poisson Models
    11.2.2 Digamma and Trigamma Distributions
    11.2.3 Discrete Adès Distribution
    11.2.4 Discrete Bessel Distribution
    11.2.5 Discrete Mittag–Leffler Distribution
    11.2.6 Discrete Student’s t Distribution
    11.2.7 Feller–Arley and Gegenbauer Distributions
    11.2.8 Gram–Charlier Type B Distributions
    11.2.9 “Interrupted” Distributions
    11.2.10 Lost-Games Distributions
    11.2.11 Luria–Delbrück Distribution
    11.2.12 Naor’s Distribution
    11.2.13 Partial-Sums Distributions
    11.2.14 Queueing Theory Distributions
    11.2.15 Reliability and Survival Distributions
    11.2.16 Skellam–Haldane Gene Frequency Distribution
    11.2.17 Steyn’s Two-Parameter Power Series Distributions
    11.2.18 Univariate Multinomial-Type Distributions
    11.2.19 Urn Models with Stochastic Replacements
    11.2.20 Zipf-Related Distributions
    11.2.21 Haight’s Zeta Distributions

Bibliography

Abbreviations

Index
Preface
This book is dedicated to the memory of Professor N. L. Johnson, who passed away during the production stages. He was my longtime friend and mentor; his assistance with this revision during his long illness is greatly appreciated. His passing is a sad loss to all who are interested in statistical distribution theory.

The preparation of the third edition gave Norman and me the opportunity to substantially revise and reorganize parts of the book. This enabled us to increase the coverage of certain areas and to highlight today’s better understanding of interrelationships between distributions. Also, a number of errors and inaccuracies in the two previous editions have been corrected and some explanations have been clarified.

The continuing interest in discrete distributions is evinced by the addition of over 400 new references, nearly all since 1992. Electronic databases, such as Statistical Theory and Methods Abstracts (published by the International Statistical Institute), the Current Index to Statistics: Applications, Methods and Theory (published by the American Statistical Association and the Institute of Mathematical Statistics), and the Thomson ISI Web of Science, have drawn to our attention papers and articles which might otherwise have escaped notice.

It is important to acknowledge the impact of scholarly, encyclopedic publications such as the Dictionary and Bibliography of Statistical Distributions in Scientific Work, Vol. 1: Discrete Models, by G. P. Patil, M. T. Boswell, S. W. Joshi, and M. V. Ratnaparkhi (1984) (published by the International Co-operative Publishing House, Fairland, MD), and the Thesaurus of Univariate Discrete Probability Distributions, by G. Wimmer and G. Altmann (1999) (published by Stamm Verlag, Essen). The new edition of Statistical Distributions, by M. Evans, N. Peacock, and B. Hastings (2000) (published by Wiley, New York), encouraged us to address the needs of occasional readers as distinct from researchers into the theoretical and applied aspects of the subject.

The objectives of this book are far wider. It aims, as before, to give an account of the properties and the uses of discrete distributions at the time of writing, while adhering to the same level and style as previous editions. The 1969 intention to exclude theoretical minutiae of no apparent practical importance has not
been forgotten. We have tried to give a balanced account of new developments, especially those in the more accessible statistical journals. There has also been relevant work in related fields, such as econometrics, combinatorics, probability theory, stochastic processes, actuarial studies, operational research, and social sciences. We have aimed to provide a framework within which future research findings can best be understood.

In trying to keep the book to a reasonable length, some material that should have been included was omitted or its coverage curtailed. Comments and criticisms are welcome; I would like to express our gratitude to friends and colleagues for pointing out faults in the last edition and for their input of ideas into the new edition.

The structure of the book is broadly similar to that of the previous edition. The organization of the increased amount of material into the same number of chapters has, however, created some unfamiliar bedfellows. An extra chapter would have had an untoward effect on the next two books in the series (Univariate Continuous Distributions, Vols. 1 and 2); these begin with Chapter 12.

Concerning numbering conventions, each chapter is divided into sections and within many sections there are subsections. Instead of a separate name index, the listed references end with section numbers enclosed in square brackets.

Chapter 1 has seen some reordering and the inclusion of a small amount of new, relevant material. Sections 1.1 and 1.2 contain mathematical preliminaries and statistical preliminaries, respectively. Material on the computer generation of specific types of random variables is shifted to appropriate sections in other chapters. We chose not to discuss software explicitly; we felt that this is precluded by shortage of space. Some of the major packages are listed at the end of Chapter 1, however. Many contain modules for tasks associated with specific distributions. Websites are given so that readers can obtain further information.

In Chapter 2, most of the material on distributions based on Lagrangian expansions is moved to Chapter 7, which is now entitled Logarithmic and Lagrangian Distributions. There are new short sections in Chapter 2 on order-k and q-series distributions, mentioning their new placement in the book and changes in customary notations since the last edition.

Chapters 3, 4, and 5 are structurally little changed, although new sections on chain binomial models (Chapter 3), the intervened Poisson distribution (Chapter 4), and the minimum and maximum negative binomial distributions and the condensed negative binomial distribution (Chapter 5) are added. It is hoped that the limited reordering and insertion of new material in Chapter 6 will improve understanding of hypergeometric-type distributions.

Chapter 7 now has a dual role. Logarithmic distributions occupy the first half. The new second part contains a coherent and updated treatment of the previously fragmented material on Lagrangian distributions. The typographical changes in Chapters 8 and 9 are meant to make them more reader friendly.

Chapter 10 is now much longer. It contains the section on record value distributions that was previously in Chapter 11. The treatment of order-k distributions
is augmented by accounts of recent researches. The chapter ends with a consolidated account of the absorption, Euler, and Heine distributions, as well as new q-series material, including new work on the null distribution of the Wilcoxon–Mann–Whitney test statistic.

Chapter 11 has seen most change; it is now in two parts. The ability of modern computers to gather and analyze very large data sets with many covariates has led to the construction of many regression-type models, both parametric and nonparametric. The first part of Chapter 11 gives an account of certain regression models for discrete data that are probabilistically fully specified, that is, fully parametric. These include the Tweedie–Poisson family, the Poisson lognormal, Poisson inverse Gaussian, and Poisson polynomial distributions. Efron’s double Poisson and double binomial and the simplex-binomial mixture model also receive attention.

The remainder of Chapter 11 is on miscellaneous discrete distributions, as before. Those distributions that have fitted better into earlier chapters are replaced with newer ones, such as the discrete Bessel, the discrete Mittag–Leffler, and the Luria–Delbrück distributions. There is a new section on survival distributions. The section on Zipf and zeta distributions is split into two; renewed interest in the literature in Zipf-type distributions is recognized by the inclusion of Hurwitz–zeta and Lerch distributions.

We have been particularly indebted to Professors David Kemp and “Bala” Balakrishnan, who have read the entire manuscript and have made many valuable recommendations (not always implemented). David was particularly helpful with his knowledge of AMS-LaTeX and his understanding of the Wiley stylefile. He has also been of immense help with the task of proofreading. It is a pleasure to record the facilities and moral support provided by the Mathematical Institute at the University of St Andrews, especially by Dr. Patricia Heggie.

Norman and I much regretted that Sam Kotz, with his wide-ranging knowledge of the farther reaches of the subject, felt unable to join us in preparing this new edition.

Adrienne W. Kemp
St Andrews, Scotland
November 2004
CHAPTER 1
Preliminary Information
Introduction

This work contains descriptions of many different distributions used in statistical theory and applications, each with its own peculiarities distinguishing it from others. The book is intended primarily for reference. We have included a large number of formulas and results. We have also tried to give adequate bibliographical notes and references to enable interested readers to pursue topics in greater depth.

The same general ideas will be used repeatedly, so it is convenient to collect the appropriate definitions and methods in one place. This chapter does just that. The collection serves the additional purpose of allowing us to explain the sense in which we use various terms throughout the work. Only those properties likely to be useful in the discussion of statistical distributions are described. Definitions of exponential, logarithmic, trigonometric, and hyperbolic functions are not given. Except where stated otherwise, we are using real (not complex) variables, and “log,” like “ln,” means natural logarithm (i.e., to base e).

A further feature of this chapter is material relating to formulas that will be used only occasionally; where appropriate, comparisons are made with other notations used elsewhere in the literature. In subsequent chapters the reader should refer back to this chapter when an unfamiliar and apparently undefined symbol is encountered.

1.1 MATHEMATICAL PRELIMINARIES

1.1.1 Factorial and Combinatorial Conventions

The number of different orderings of $n$ elements is the product of $n$ with all the positive integers less than $n$; it is denoted by the familiar symbol $n!$ (factorial $n$),

\[ n! = n(n-1)(n-2)\cdots 1 = \prod_{j=0}^{n-1} (n-j). \tag{1.1} \]
The less familiar semifactorial symbol $k!!$ means $(2n)!! = 2n(2n-2)\cdots 2$, where $k = 2n$.

The product of a positive integer $n$ with the next $k-1$ smaller positive integers is called a descending (falling) factorial; it will in places be denoted by

\[ n^{(k)} = n(n-1)\cdots(n-k+1) = \prod_{j=0}^{k-1} (n-j) = \frac{n!}{(n-k)!}, \tag{1.2} \]
in accordance with earlier editions of this book. Note that there are $k$ terms in the product and that $n^{(k)} = 0$ for $k > n$, where $n$ is a positive integer.

Readers are WARNED that there is no universal notation for descending factorials in the statistical literature. For example, Mood, Graybill, and Boes (1974) use the symbol $(n)_k$ in the sense $(n)_k = n(n-1)\cdots(n-k+1)$, while Stuart and Ord (1987) write $n^{[k]} = n(n-1)\cdots(n-k+1)$; Wimmer and Altmann (1999) use $x_{(n)} = x(x-1)(x-2)\cdots(x-n+1)$, $x \in \mathbb{R}$, $n \in \mathbb{N}$.

Similarly there is more than one notation in the statistical literature for ascending (rising) factorials; for instance, Wimmer and Altmann (1999) use $x^{(n)} = x(x+1)(x+2)\cdots(x+n-1)$, $x \in \mathbb{R}$, $n \in \mathbb{N}$. In the first edition of this book we used

\[ n^{[k]} = n(n+1)\cdots(n+k-1) = \prod_{j=0}^{k-1} (n+j) = \frac{(n+k-1)!}{(n-1)!}. \tag{1.3} \]
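Because the descending and ascending factorials are so easily confused, a small numerical sketch may help. The Python helpers below evaluate (1.2) and (1.3) directly from their product definitions; the function names `falling` and `rising` are illustrative choices, not notation used in this book.

```python
from math import factorial, prod

def falling(n, k):
    """Descending factorial n(n-1)...(n-k+1), as in (1.2)."""
    return prod(n - j for j in range(k))

def rising(n, k):
    """Ascending factorial n(n+1)...(n+k-1), as in (1.3)."""
    return prod(n + j for j in range(k))

print(falling(5, 2))   # 5*4 = 20
print(falling(3, 5))   # 0: for k > n the product contains a zero factor
print(rising(3, 4))    # 3*4*5*6 = 360

# The factorial-ratio forms in (1.2) and (1.3) agree with the products.
print(falling(7, 3) == factorial(7) // factorial(7 - 3))          # True
print(rising(3, 4) == factorial(3 + 4 - 1) // factorial(3 - 1))   # True
```

Both helpers return 1 for $k = 0$ (an empty product), which matches the usual convention.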
There is, however, a standard notation in the mathematical literature, where the symbol $(n)_k$ is known as Pochhammer’s symbol after the German mathematician L. A. Pochhammer [1841–1920]; it is used to denote

\[ (n)_k = n(n+1)\cdots(n+k-1) \tag{1.4} \]

[this definition of $(n)_k$ differs from that of Mood et al. (1974)]. We will use Pochhammer’s symbol, meaning (1.4), except where it conflicts with the use of (1.3) in earlier editions.

The binomial coefficient $\binom{n}{r}$ denotes the number of different possible combinations of $r$ items from $n$ different items. We have

\[ \binom{n}{r} = \frac{n!}{r!\,(n-r)!} = \binom{n}{n-r}; \tag{1.5} \]
also

\[ \binom{n}{0} = \binom{n}{n} = 1 \quad\text{and}\quad \binom{n+1}{r} = \binom{n}{r} + \binom{n}{r-1}. \tag{1.6} \]

It is usual to define $\binom{n}{r} = 0$ if $r < 0$ or $r > n$. However,

\[ \binom{-n}{r} = \frac{(-n)(-n-1)\cdots(-n-r+1)}{r!} = (-1)^r \binom{n+r-1}{r}. \tag{1.7} \]

The binomial theorem for a positive integer power $n$ is

\[ (a+b)^n = \sum_{j=0}^{n} \binom{n}{j} a^{n-j} b^{j}. \tag{1.8} \]

Putting $a = b = 1$ gives

\[ \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n \]

and putting $a = 1$, $b = -1$ gives

\[ \binom{n}{0} - \binom{n}{1} + \cdots + (-1)^n \binom{n}{n} = 0. \]

More generally, for any real power $k$

\[ (1+b)^k = \sum_{j=0}^{\infty} \binom{k}{j} b^{j}, \qquad -1 < b < 1. \tag{1.9} \]

By equating coefficients of $x^n$ in $(1+x)^{a+b} = (1+x)^a (1+x)^b$, we obtain the well-known and useful identity known as Vandermonde’s theorem (A. T. Vandermonde [1735–1796]):

\[ \binom{a+b}{n} = \sum_{j=0}^{n} \binom{a}{j} \binom{b}{n-j}. \]

Hence

\[ \binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}. \tag{1.10} \]
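The identities above are easy to spot-check numerically. The following Python sketch verifies (1.7), Vandermonde’s theorem, and (1.10) for a few small integer arguments; `gen_binom` is a hypothetical helper written for this illustration, and the particular test values are arbitrary.

```python
from math import comb, factorial, prod

def gen_binom(a, r):
    """Binomial coefficient a(a-1)...(a-r+1)/r! for any integer a and r >= 0."""
    # A product of r consecutive integers is divisible by r!, so // is exact.
    return prod(a - j for j in range(r)) // factorial(r)

# (1.7): binom(-n, r) = (-1)^r binom(n + r - 1, r)
n, r = 3, 5
print(gen_binom(-n, r), (-1) ** r * comb(n + r - 1, r))

# Vandermonde's theorem: binom(a + b, m) = sum_j binom(a, j) binom(b, m - j)
a, b, m = 5, 7, 4
print(comb(a + b, m), sum(comb(a, j) * comb(b, m - j) for j in range(m + 1)))

# (1.10): the squared binomial coefficients sum to binom(2n, n)
n = 6
print(sum(comb(n, j) ** 2 for j in range(n + 1)), comb(2 * n, n))
```

Each printed pair should contain two equal numbers. `math.comb` returns 0 when the lower index exceeds the upper one, which agrees with the convention stated after (1.6).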
The multinomial coefficient is

\[ \binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!}, \tag{1.11} \]

where $r_1 + r_2 + \cdots + r_k = n$. The multinomial theorem is a generalization of the binomial theorem:

\[ \Bigl( \sum_{j=1}^{k} a_j \Bigr)^{n} = \sum \frac{n! \prod_{i=1}^{k} a_i^{n_i}}{\prod_{i=1}^{k} n_i!}, \tag{1.12} \]
where summation is over all sets of nonnegative integers $n_1, n_2, \ldots, n_k$ that sum to $n$.

There are four ways in which a sample of $k$ elements can be selected from a set of $n$ distinguishable elements:

Order Important?   Repetitions Allowed?   Name of Sample                    Number of Ways to Select Sample
No                 No                     k-Combination                     $C(n, k)$
Yes                No                     k-Permutation                     $P(n, k)$
No                 Yes                    k-Combination with replacement    $C^R(n, k)$
Yes                Yes                    k-Permutation with replacement    $P^R(n, k)$

where

\[ C(n,k) = \frac{n!}{k!\,(n-k)!}, \qquad P(n,k) = \frac{n!}{(n-k)!}, \]
\[ C^R(n,k) = \frac{(n+k-1)!}{k!\,(n-1)!}, \qquad P^R(n,k) = n^k. \tag{1.13} \]
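The four counts in (1.13) can be checked by brute-force enumeration. This is a minimal sketch using Python’s standard `itertools` generators, which happen to correspond to the four sampling schemes; the values $n = 5$, $k = 3$ are arbitrary.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

n, k = 5, 3
items = range(n)

# Each entry pairs the enumerated count with the closed form from (1.13).
counts = {
    "k-combination": (len(list(combinations(items, k))), comb(n, k)),
    "k-permutation": (len(list(permutations(items, k))), perm(n, k)),
    "k-combination with replacement":
        (len(list(combinations_with_replacement(items, k))), comb(n + k - 1, k)),
    "k-permutation with replacement":
        (len(list(product(items, repeat=k))), n ** k),
}

for name, (enumerated, formula) in counts.items():
    print(f"{name}: enumerated={enumerated}, formula={formula}")
```

For these values the four counts are 10, 60, 35, and 125, respectively.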
The number of ways to arrange $n$ distinguishable items in a row is $P(n, n) = n!$ (the number of permutations of $n$ items). The number of ways to arrange $n$ items in a row, assuming that there are $k$ types of items with $n_i$ nondistinguishable items of type $i$, $i = 1, 2, \ldots, k$, is the multinomial coefficient $\binom{n}{n_1, n_2, \ldots, n_k}$.

The number of derangements of $n$ items (permutations of $n$ items in which item $i$ is not in the $i$th position) is

\[ D_n = n! \Bigl( 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!} \Bigr). \]
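Both counts in this paragraph can be confirmed by enumeration for small cases. In the sketch below, the function names are hypothetical; `derangements` evaluates the alternating sum exactly with rational arithmetic, and `derangements_brute` simply filters all permutations.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def arrangements(type_counts):
    """Multinomial coefficient n!/(n_1! ... n_k!) for the given type counts."""
    out = factorial(sum(type_counts))
    for c in type_counts:
        out //= factorial(c)
    return out

def derangements(n):
    """D_n = n! * sum_{j=0}^{n} (-1)^j / j!, evaluated exactly."""
    series = sum(Fraction((-1) ** j, factorial(j)) for j in range(n + 1))
    return int(factorial(n) * series)

def derangements_brute(n):
    """Count permutations of 0..n-1 that leave no item in its own position."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

# Two A's, one B, one C: 4!/(2! 1! 1!) = 12 distinct rows.
print(arrangements([2, 1, 1]), len(set(permutations("AABC"))))
print([derangements(n) for n in range(7)])
print(all(derangements(n) == derangements_brute(n) for n in range(7)))
```

The derangement numbers for $n = 0, 1, \ldots, 6$ are 1, 0, 1, 2, 9, 44, 265.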
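The signum function, $\operatorname{sgn}(\cdot)$, shows whether an argument is greater or less than zero:

\[ \operatorname{sgn}(x) = 1 \ \text{when}\ x > 0; \qquad \operatorname{sgn}(0) = 0; \qquad \operatorname{sgn}(x) = -1 \ \text{when}\ x < 0. \]

The ceiling function, $\lceil x \rceil$, is the least integer that is not smaller than $x$, for example,

\[ \lceil e \rceil = 3, \qquad \lceil 7 \rceil = 7, \qquad \lceil -2.4 \rceil = -2. \]

The floor function, $\lfloor x \rfloor$, is the greatest integer that is not greater than $x$, for example,

\[ \lfloor e \rfloor = 2, \qquad \lfloor 7 \rfloor = 7, \qquad \lfloor -2.4 \rfloor = -3. \]

The notation $[\cdot] = \lfloor \cdot \rfloor$ is called the integer part.

\[ \pi = 4 \sum_{j=0}^{\infty} \frac{(-1)^j}{2j+1} = 3.1415926536, \]

\[ e = \sum_{j=0}^{\infty} \frac{1}{j!} = 2.7182818285, \]

\[ \ln 2 = \sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j} = 0.6931471806. \]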
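For readers who compute with these functions, the following Python sketch reproduces the examples above. The helper `sgn` is written here for illustration, since Python’s `math` module provides `ceil` and `floor` but no signum function.

```python
import math

def sgn(x):
    """Signum: 1 for x > 0, 0 for x == 0, -1 for x < 0."""
    return (x > 0) - (x < 0)

print(sgn(2.5), sgn(0), sgn(-7))
print(math.ceil(math.e), math.ceil(7), math.ceil(-2.4))
print(math.floor(math.e), math.floor(7), math.floor(-2.4))

# Caution: int() truncates toward zero, so it differs from the floor
# (the "integer part" as defined here) for negative non-integers.
print(int(-2.4), math.floor(-2.4))
```

The last line prints `-2 -3`, which shows why truncation should not be substituted for the integer part when arguments may be negative.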
1.1.2 Gamma and Beta Functions

When $n$ is real but is not a positive integer, meaning can be given to $n!$, and hence to (1.2), (1.3), (1.5), (1.7), and (1.11), by defining

\[ (n-1)! = \Gamma(n), \qquad n \in \mathbb{R}^{+}, \tag{1.14} \]

where $\Gamma(n)$ is the gamma function. The binomial theorem can thereby be shown to hold for any real power.

There are three equivalent definitions of the gamma function, due to L. Euler [1707–1783], C. F. Gauss [1777–1855], and K. Weierstrass [1815–1897], respectively:

Definition 1 (Euler):

\[ \Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t} \, dt, \qquad x > 0. \tag{1.15} \]
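Definition 2 (Gauss):

\[ \Gamma(x) = \lim_{n \to \infty} \frac{n!\, n^{x}}{x(x+1)\cdots(x+n)}, \qquad x \ne 0, -1, -2, \ldots. \tag{1.16} \]

Definition 3 (Weierstrass):

\[ \frac{1}{\Gamma(x)} = x e^{\gamma x} \prod_{n=1}^{\infty} \Bigl[ \Bigl( 1 + \frac{x}{n} \Bigr) \exp\Bigl( -\frac{x}{n} \Bigr) \Bigr], \qquad x > 0, \tag{1.17} \]

where $\gamma$ is Euler’s constant

\[ \gamma = \lim_{n \to \infty} \Bigl( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \ln n \Bigr) \cong 0.5772156649\ldots. \tag{1.18} \]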
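As a rough numerical illustration of the limits in (1.16) and (1.18), the sketch below evaluates both at a finite $n$ and compares the first with Python’s built-in `math.gamma`. The function names are illustrative, and the Gauss product is computed on the log scale (an implementation choice made here to avoid overflow), which restricts this sketch to $x > 0$.

```python
import math

def gamma_gauss(x, n):
    """Finite-n value of Gauss's limit (1.16) for x > 0, computed via logs."""
    log_val = (math.lgamma(n + 1)            # ln n!
               + x * math.log(n)             # ln n^x
               - sum(math.log(x + j) for j in range(n + 1)))
    return math.exp(log_val)

def euler_gamma_partial(n):
    """Finite-n value of (1.18): 1 + 1/2 + ... + 1/n - ln n."""
    return sum(1.0 / j for j in range(1, n + 1)) - math.log(n)

x, n = 2.5, 100_000
print(gamma_gauss(x, n), math.gamma(x))
print(euler_gamma_partial(n))
```

Both limits converge slowly, with an error of order $1/n$, so they are useful as definitions rather than as practical algorithms.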
From Definition 1, $\Gamma(1) = 0! = 1$. Using integration by parts, Definition 1 gives the recurrence relation for $\Gamma(x)$:

\[ \Gamma(x+1) = x\,\Gamma(x) \tag{1.19} \]

[when $x$ is a positive integer, $\Gamma(x+1) = x!$]. This enables us to define $\Gamma(x)$ over the entire real line, except where $x$ is zero or a negative integer, as

\[ \Gamma(x) = \begin{cases} \displaystyle\int_{0}^{\infty} t^{x-1} e^{-t} \, dt, & x > 0, \\[1ex] x^{-1}\,\Gamma(x+1), & x < 0, \quad x \ne -1, -2, \ldots. \end{cases} \tag{1.20} \]
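The extension in (1.20) translates directly into a short recursive routine. The sketch below, with the hypothetical name `gamma_ext`, uses `math.gamma` only for positive arguments and applies the recurrence for negative ones, so that the result can be compared with a known closed-form value.

```python
import math

def gamma_ext(x):
    """Gamma function on the real line via (1.20), using the recurrence for x < 0."""
    if x > 0:
        return math.gamma(x)
    if x == math.floor(x):
        raise ValueError("gamma is undefined at zero and the negative integers")
    return gamma_ext(x + 1) / x

# Gamma(-3/2) = 4*sqrt(pi)/3, obtained by applying the recurrence twice to Gamma(1/2).
print(gamma_ext(-1.5), 4 * math.sqrt(math.pi) / 3)
```

Each step of the recursion moves the argument one unit to the right, so the function terminates after at most $\lceil -x \rceil$ steps for negative non-integer $x$.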
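From Definition 3 it can be shown that $\Gamma\bigl(\tfrac{1}{2}\bigr) = \pi^{1/2}$; this implies that

\[ \int_{0}^{\infty} \frac{e^{-t}}{t^{1/2}} \, dt = \sqrt{\pi}; \]

hence, by taking $t = u^2/2$, we obtain

\[ \int_{0}^{\infty} \exp\Bigl( \frac{-u^2}{2} \Bigr) du = \sqrt{\frac{\pi}{2}}. \tag{1.21} \]

Also, from $\Gamma\bigl(\tfrac{1}{2}\bigr) = \pi^{1/2}$, we have

\[ \Gamma\bigl( n + \tfrac{1}{2} \bigr) = \frac{(2n)!\, \pi^{1/2}}{n!\, 2^{2n}}. \tag{1.22} \]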
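Formula (1.22) can be compared directly with a library gamma function. In this small sketch `gamma_half` is an illustrative name, and the range of $n$ is arbitrary.

```python
import math

def gamma_half(n):
    """Gamma(n + 1/2) from (1.22): (2n)! sqrt(pi) / (n! 2^(2n))."""
    return math.factorial(2 * n) * math.sqrt(math.pi) / (math.factorial(n) * 2 ** (2 * n))

for n in range(6):
    print(n, gamma_half(n), math.gamma(n + 0.5))
```

The two columns should agree to within floating-point rounding; for $n = 0$ both equal $\sqrt{\pi}$.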
Definition 3 and the product formula sin(πx) = πx
∞
1−
n=1
x2 n2
(1.23)
together imply that (x)(1 − x) =
π , sin(πx)
x = 0, −1, −2, . . . .
(1.24)
Legendre’s duplication formula [A.-M. Legendre, 1752–1833] is √
π(2x) = 22x−1 (x) x + 12 ,
x = 0, − 12 , −1, − 32 , . . . .
(1.25)
Gauss's multiplication theorem is
$$\Gamma(mx) = (2\pi)^{(1-m)/2}\, m^{mx-1/2} \prod_{j=1}^{m} \Gamma\left(x + \frac{j-1}{m}\right), \qquad x \ne 0, -\frac{1}{m}, -\frac{2}{m}, -\frac{3}{m}, \ldots, \tag{1.26}$$
where $m = 1, 2, 3, \ldots$. This clearly reduces to Legendre's duplication formula when $m = 2$.

Many approximations for probabilities and cumulative probabilities have been obtained using various forms of Stirling's expansion [J. Stirling, 1692–1770] for the gamma function:
$$\Gamma(x+1) \sim (2\pi)^{1/2} (x+1)^{x+1/2} e^{-x-1} \exp\left[\frac{1}{12(x+1)} - \frac{1}{360(x+1)^3} + \frac{1}{1260(x+1)^5} - \cdots\right], \tag{1.27}$$
$$\Gamma(x+1) \sim (2\pi)^{1/2} x^{x+1/2} e^{-x} \exp\left[\frac{1}{12x} - \frac{1}{360x^3} + \frac{1}{1260x^5} - \frac{1}{1680x^7} + \cdots\right], \tag{1.28}$$
$$\Gamma(x+1) \sim (2\pi)^{1/2} (x+1)^{x+1/2} e^{-x-1} \left[1 + \frac{1}{12(x+1)} + \frac{1}{288(x+1)^2} - \cdots\right], \tag{1.29}$$
$$\Gamma(x+1) \sim (2\pi)^{1/2} x^{x+1/2} e^{-x} \left[1 + \frac{1}{12x} + \frac{1}{288x^2} - \frac{139}{51{,}840x^3} - \frac{571}{2{,}488{,}320x^4} + \cdots\right]. \tag{1.30}$$
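The quality of Stirling's expansion can be illustrated by evaluating (1.28) and (1.30) directly. This is a minimal sketch under the assumption that only the terms displayed above are used; the function names and the `terms` parameter are hypothetical.

```python
import math

def stirling_exp_form(x, terms=4):
    """Gamma(x + 1) via (1.28), keeping `terms` terms inside the exponential."""
    coeffs = [1 / 12, -1 / 360, 1 / 1260, -1 / 1680]
    correction = sum(c / x ** (2 * i + 1) for i, c in enumerate(coeffs[:terms]))
    return math.sqrt(2 * math.pi) * x ** (x + 0.5) * math.exp(-x + correction)

def stirling_series_form(x):
    """Gamma(x + 1) via (1.30), through the x^-4 term."""
    bracket = (1 + 1 / (12 * x) + 1 / (288 * x ** 2)
               - 139 / (51840 * x ** 3) - 571 / (2488320 * x ** 4))
    return math.sqrt(2 * math.pi) * x ** (x + 0.5) * math.exp(-x) * bracket

if __name__ == "__main__":
    x = 5.0  # Gamma(6) = 5! = 120
    for t in range(5):
        print(t, stirling_exp_form(x, terms=t))
    print(stirling_series_form(x), math.gamma(x + 1))
```

With `terms=0` the function reduces to the leading factor $(2\pi)^{1/2} x^{x+1/2} e^{-x}$, which is already within about 2% of $5! = 120$ at $x = 5$.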
Expansions (1.27)–(1.30) are divergent asymptotic expansions that nevertheless yield extremely good approximations. The remainder terms for (1.27) and (1.28) are each less in absolute value than the first term that is neglected, and they have the same sign as that term. Barnes's expansion [E. W. Barnes, 1874–1953] is less well known, but it is useful for half integers:
$$\Gamma\left(x + \tfrac{1}{2}\right) \sim (2\pi)^{1/2} x^{x} e^{-x} \exp\left[-\frac{1}{24x} + \frac{7}{2880x^3} - \frac{31}{40320x^5} + \cdots\right]. \tag{1.31}$$
Also
$$\frac{\Gamma(x+a)}{\Gamma(x+b)} \sim x^{a-b}\left[1 + \frac{(a-b)(a+b-1)}{2x} + \cdots\right]. \tag{1.32}$$
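A short numerical comparison of (1.31) and (1.32) against library values follows. It is a sketch only: the names `barnes` and `gamma_ratio` are hypothetical, and only the terms displayed above are retained.

```python
import math

def barnes(x):
    """Gamma(x + 1/2) via (1.31), keeping three terms in the exponential."""
    correction = -1 / (24 * x) + 7 / (2880 * x ** 3) - 31 / (40320 * x ** 5)
    return math.sqrt(2 * math.pi) * x ** x * math.exp(-x + correction)

def gamma_ratio(x, a, b):
    """Gamma(x + a) / Gamma(x + b) via the two displayed terms of (1.32)."""
    return x ** (a - b) * (1 + (a - b) * (a + b - 1) / (2 * x))

if __name__ == "__main__":
    print(barnes(5.0), math.gamma(5.5))
    exact = math.exp(math.lgamma(50.5) - math.lgamma(50.0))
    print(gamma_ratio(50.0, 0.5, 0.0), exact)
```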
Expansions (1.31) and (1.32) are also divergent asymptotic expansions. Series (1.31) has accuracy comparable to (1.27) and (1.28).

The beta function $B(a, b)$ is defined by the Eulerian integral of the first kind:
$$B(a, b) = \int_0^1 t^{a-1} (1-t)^{b-1}\,dt, \qquad a > 0,\ b > 0. \tag{1.33}$$
Clearly $B(a, b) = B(b, a)$. Putting $t = u/(1+u)$ gives
$$B(a, b) = \int_0^\infty \frac{u^{a-1}}{(1+u)^{a+b}}\,du, \qquad a > 0,\ b > 0. \tag{1.34}$$
The relationship between the beta and gamma functions is
$$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}, \qquad a, b \ne 0, -1, -2, \ldots. \tag{1.35}$$
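Relation (1.35) is the usual route for computing the beta function in practice. The sketch below compares it with a direct quadrature of (1.33); the function names are hypothetical, the log-scale form assumes $a, b > 0$, and the midpoint rule is intended only for $a, b \ge 1$, where the integrand is bounded.

```python
import math

def beta_via_gamma(a, b):
    """B(a, b) from (1.35) for a, b > 0, on the log scale to avoid overflow."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_via_integral(a, b, steps=100_000):
    """B(a, b) from (1.33) by the midpoint rule; intended for a, b >= 1."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(steps))

if __name__ == "__main__":
    print(beta_via_gamma(2, 3), beta_via_integral(2, 3), 1 / 12)
```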
The derivatives of the logarithm of $\Gamma(a)$ are also useful, though they are not needed as often as the gamma function itself. The function
$$\psi(x) = \frac{d}{dx}[\ln \Gamma(x)] = \frac{\Gamma'(x)}{\Gamma(x)} \tag{1.36}$$
is called the digamma function (with argument $x$) or the psi function. Similarly
$$\psi'(x) = \frac{d}{dx}[\psi(x)] = \frac{d^2}{dx^2}[\ln \Gamma(x)]$$
is called the trigamma function, and generally
$$\psi^{(s)}(x) = \frac{d^s}{dx^s}[\psi(x)] = \frac{d^{s+1}}{dx^{s+1}}[\ln \Gamma(x)] \tag{1.37}$$
is called the $(s+2)$-gamma function. Extensive tables of the digamma, trigamma, tetragamma, pentagamma, and hexagamma functions are contained in Davis (1933, 1935). Shorter tables are in Abramowitz and Stegun (1965).

The recurrence formula (1.19) for the gamma function yields the following recurrence formulas for the psi function:
$$\psi(x+1) = \psi(x) + x^{-1}$$
and
$$\psi(x+n) = \psi(x) + \sum_{j=1}^{n} (x+j-1)^{-1}, \qquad n = 1, 2, 3, \ldots. \tag{1.38}$$
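Also
\begin{align}
\psi(x) &= \lim_{n\to\infty}\left[\ln(n) - \sum_{j=0}^{n} (x+j)^{-1}\right] \notag \\
&= -\gamma - \frac{1}{x} + x \sum_{j=1}^{\infty} \frac{1}{j(x+j)} \tag{1.39} \\
&= -\gamma + (x-1) \sum_{j=0}^{\infty} [(j+1)(j+x)]^{-1} \tag{1.40}
\end{align}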
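and
$$\psi(mx) = \ln(m) + \frac{1}{m} \sum_{j=0}^{m-1} \psi\left(x + \frac{j}{m}\right), \qquad m = 1, 2, 3, \ldots, \tag{1.41}$$
where $\gamma$ is Euler's constant ($\cong 0.5772156649\ldots$). An asymptotic expansion for $\psi(x)$ is
$$\psi(x) \sim \ln x - \frac{1}{2x} - \frac{1}{12x^2} + \frac{1}{120x^4} - \frac{1}{252x^6} + \cdots, \tag{1.42}$$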
and hence a very good approximation for $\psi(x)$ is $\psi(x) \approx \ln(x - 0.5)$, provided that $x \ge 2$. Particular values of $\psi(x)$ are
$$\psi(1) = -\gamma, \qquad \psi\left(\tfrac{1}{2}\right) = -\gamma - 2\ln(2) \approx -1.963510\ldots.$$
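The standard library has no digamma function, so the formulas above can be used to build one. The sketch below is one possible arrangement, not the authors' method: it shifts the argument upward with recurrence (1.38) and then applies the asymptotic expansion (1.42); a truncated form of series (1.40) is included for comparison. The function names, the shift threshold of 10, and the truncation length are assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def digamma_asymptotic(x):
    """psi(x) for x > 0: shift x upward with (1.38), then apply (1.42)."""
    shift = 0.0
    while x < 10.0:
        shift -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = (math.log(x) - 0.5 / x - inv2 / 12
              + inv2 ** 2 / 120 - inv2 ** 3 / 252)
    return shift + series

def digamma_series(x, terms=200_000):
    """psi(x) from (1.40), truncated after `terms` terms (slowly convergent)."""
    return -EULER_GAMMA + (x - 1) * sum(
        1.0 / ((j + 1) * (j + x)) for j in range(terms))

if __name__ == "__main__":
    print(digamma_asymptotic(1.0), -EULER_GAMMA)
    print(digamma_asymptotic(0.5), -EULER_GAMMA - 2 * math.log(2))
    print(digamma_asymptotic(10.0), math.log(9.5))  # psi(x) vs ln(x - 0.5)
```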
1.1.3 Finite Difference Calculus

The displacement operator $E$ increases the argument of a function by unity:
$$E[f(x)] = f(x+1), \qquad E[E[f(x)]] = E[f(x+1)] = f(x+2).$$
More generally,
$$E^n[f(x)] = f(x+n) \tag{1.43}$$
for any positive integer $n$, and we interpret $E^h[f(x)]$ as $f(x+h)$ for any real $h$.

The forward-difference operator $\Delta$ is defined by
$$\Delta f(x) = f(x+1) - f(x). \tag{1.44}$$
Noting that $f(x+1) - f(x) = E[f(x)] - f(x) = (E-1)f(x)$, we have the symbolic (or operational) relation
$$\Delta \equiv E - 1. \tag{1.45}$$
If $n$ is an integer, then the $n$th forward difference of $f(x)$ is
\begin{align}
\Delta^n f(x) = (E-1)^n f(x) &= \sum_{j=0}^{n} \binom{n}{j} (-1)^j E^{n-j} f(x) \notag \\
&= \sum_{j=0}^{n} \binom{n}{j} (-1)^j f(x+n-j). \tag{1.46}
\end{align}
Also, rewriting (1.45) as $E = 1 + \Delta$, we have
$$f(x+n) = (1+\Delta)^n f(x) = \sum_{j=0}^{n} \binom{n}{j} \Delta^j f(x). \tag{1.47}$$
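Formulas (1.46) and (1.47) translate directly into code. The following is a minimal sketch; the function names are hypothetical.

```python
from math import comb

def forward_difference(f, x, n=1):
    """n-th forward difference of f at x, from the binomial expansion (1.46)."""
    return sum(comb(n, j) * (-1) ** j * f(x + n - j) for j in range(n + 1))

def shift_from_differences(f, x, n):
    """f(x + n) rebuilt from the differences of f at x, as in (1.47)."""
    return sum(comb(n, j) * forward_difference(f, x, j) for j in range(n + 1))

if __name__ == "__main__":
    cube = lambda t: t ** 3
    print([forward_difference(cube, 2, n) for n in range(5)])
    print(shift_from_differences(cube, 2, 5), cube(7))
```

For the cubic used here the differences of order four and higher vanish, which anticipates the statement that the expansion terminates for polynomials.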
Newton's forward-difference (interpolation) formula [I. Newton, 1642–1727] is obtained by replacing $n$ by $h$, where $h$ may be any real number, and using the interpretation of $E^h[f(x)]$ as $f(x+h)$:
$$f(x+h) = (1+\Delta)^h f(x) = f(x) + h\,\Delta f(x) + \frac{h(h-1)}{2!}\,\Delta^2 f(x) + \cdots. \tag{1.48}$$
The series on the right-hand side need not terminate. However, if $h$ is small and $\Delta^n f(x)$ decreases rapidly enough as $n$ increases, then a good approximation to $f(x+h)$ may be obtained with only a few terms of the expansion. This expansion may then be used to interpolate values of $f(x+h)$, given values $f(x), f(x+1), \ldots$, at unit intervals.

The backward-difference operator $\nabla$ is defined similarly, by the equation
$$\nabla f(x) = f(x) - f(x-1) = (1 - E^{-1}) f(x). \tag{1.49}$$
Note that $\nabla \equiv \Delta E^{-1} \equiv E^{-1}\Delta$. There is a backward-difference interpolation formula analogous to Newton's forward-difference formula.

The central-difference operator $\delta$ is defined by
$$\delta f(x) = f\left(x + \tfrac{1}{2}\right) - f\left(x - \tfrac{1}{2}\right) = (E^{1/2} - E^{-1/2}) f(x). \tag{1.50}$$
Note that $\delta \equiv \Delta E^{-1/2} \equiv E^{-1/2}\Delta$. Everett's central-difference interpolation formula [W. N. Everett, 1924– ]
$$f(x+h) = (1-h) f(x) + h f(x+1) - \tfrac{1}{6}(1-h)\left[1 - (1-h)^2\right] \delta^2 f(x) - \tfrac{1}{6} h (1 - h^2)\, \delta^2 f(x+1) + \cdots$$
is especially useful for computation.

Newton's forward-difference formula (1.48) can be rewritten as
$$f(x+h) = \sum_{j=0}^{\infty} \binom{h}{j} \Delta^j f(x). \tag{1.51}$$
If $f(x)$ is a polynomial of degree $N$, this expansion ends with the term containing $\Delta^N f(x)$.

Applying the difference operator $\Delta$ to the descending factorial $x^{(N)}$ gives
\begin{align}
\Delta x^{(N)} &= (x+1)^{(N)} - x^{(N)} \notag \\
&= (x+1)x(x-1)\cdots(x-N+2) - x(x-1)(x-2)\cdots(x-N+1) \notag \\
&= [(x+1) - (x-N+1)]\,x(x-1)\cdots(x-N+2) \notag \\
&= N x^{(N-1)}. \tag{1.52}
\end{align}
Repeating the operation, we have
$$\Delta^j x^{(N)} = N^{(j)} x^{(N-j)}, \qquad j \le N. \tag{1.53}$$
For $j > N$ we have $\Delta^j x^{(N)} = 0$.
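Newton's formula (1.51) and the difference rule (1.53) for descending factorials can both be checked on small examples. The sketch below is illustrative; `gen_binom` (a generalized binomial coefficient for real upper argument), `newton_forward`, and `falling` are hypothetical helper names.

```python
from math import comb

def forward_difference(f, x, n=1):
    """n-th forward difference of f at x, as in (1.46)."""
    return sum(comb(n, j) * (-1) ** j * f(x + n - j) for j in range(n + 1))

def gen_binom(h, j):
    """Generalized binomial coefficient h(h-1)...(h-j+1)/j! for real h."""
    result = 1.0
    for i in range(j):
        result *= (h - i) / (i + 1)
    return result

def newton_forward(f, x, h, terms):
    """Partial sum of (1.51) using differences of order 0 .. terms-1."""
    return sum(gen_binom(h, j) * forward_difference(f, x, j)
               for j in range(terms))

def falling(x, n):
    """Descending factorial x^(n) = x(x-1)...(x-n+1)."""
    result = 1
    for i in range(n):
        result *= x - i
    return result

if __name__ == "__main__":
    cube = lambda t: t ** 3
    print(newton_forward(cube, 1, 0.5, 4), 1.5 ** 3)   # exact for a cubic
    print(forward_difference(lambda t: falling(t, 4), 7, 2),
          falling(4, 2) * falling(7, 2))               # both sides of (1.53)
```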
Putting $x = 0$, $h = x$, and $f(x) = x^n$ in (1.51) gives
$$x^n = \sum_{k=0}^{n} \binom{x}{k} \Delta^k 0^n = \sum_{k=0}^{n} \frac{S(n,k)\,x!}{(x-k)!}, \tag{1.54}$$
where $\Delta^k 0^n / k!$ in (1.54) means $\Delta^k x^n / k!$ evaluated at $x = 0$ and is called a difference of zero. The multiplier $S(n,k) = \Delta^k 0^n / k!$ of the descending factorials in (1.54) is called a Stirling number of the second kind. Equation (1.54) can be inverted to give the descending factorials as polynomials in $x$ with coefficients called Stirling numbers of the first kind:
$$\frac{x!}{(x-n)!} = \sum_{j=0}^{n} s(n,j)\,x^j. \tag{1.55}$$
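Both kinds of Stirling numbers can be computed straight from these definitions. The sketch below is illustrative, with hypothetical function names: the second kind is obtained as a difference of zero via (1.46), and the first kind by multiplying out the descending factorial in (1.55).

```python
from math import comb, factorial, perm

def stirling2(n, k):
    """S(n, k) = Delta^k 0^n / k!, using (1.46) with f(x) = x^n at x = 0."""
    diff = sum(comb(k, j) * (-1) ** j * (k - j) ** n for j in range(k + 1))
    return diff // factorial(k)

def stirling1_row(n):
    """Coefficients [s(n,0), ..., s(n,n)] of x(x-1)...(x-n+1), as in (1.55)."""
    coeffs = [1]                                   # the empty product
    for i in range(n):                             # multiply by (x - i)
        shifted = [0] + coeffs                     # x * p(x)
        scaled = [-i * c for c in coeffs] + [0]    # -i * p(x)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs

if __name__ == "__main__":
    print([stirling2(4, k) for k in range(5)])
    print(stirling1_row(3))
    # (1.54) at x = 5, n = 4; perm(5, k) is the descending factorial 5!/(5-k)!
    print(sum(stirling2(4, k) * perm(5, k) for k in range(5)), 5 ** 4)
```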
These notations for the Stirling numbers of the first and second kinds have won wide acceptance in the statistical literature. However, there are no standard symbols in the mathematical literature. Other notations for the Stirling numbers are as follows:

First Kind                            Second Kind                        Reference
$s(n, j)$                             $S(n, k)$                          Riordan (1958)
$\binom{n-1}{j-1} B_{n-j}^{(n)}$      $\binom{n}{k} B_{n-k}^{(-k)}$      Milne-Thompson (1933)
                                      $\Delta^k 0^n / k!$                David and Barton (1962)
$S_n^{(j)}$                           $\mathcal{S}_n^{(m)}$              Abramowitz and Stegun (1965)
$S_n^{j}$                             $\mathfrak{S}_n^{k}$               Jordan (1950)
$S_n^{j}$                             $\sigma_n^{k}$                     Patil et al. (1984)
$S(n, j)$                             $Z(n, k)$                          Wimmer and Altmann (1999)
Both sets of numbers are nonzero only for $j = 0, 1, 2, \ldots, n$, $k = 0, 1, 2, \ldots, n$, $n > 0$. For given $n$ or given $k$, the Stirling numbers of the first kind alternate in sign. The Stirling numbers of the second kind are always positive. An extensive tabulation of the numbers and details of their properties appear in Abramowitz and Stegun (1965) and in Goldberg et al. (1976). The numbers increase very rapidly as their parameters increase. Useful properties are
$$[\ln(1+x)]^j = j! \sum_{n=j}^{\infty} \frac{s(n,j)\,x^n}{n!}, \tag{1.56}$$
$$(e^x - 1)^k = k! \sum_{n=k}^{\infty} \frac{S(n,k)\,x^n}{n!}. \tag{1.57}$$
Also
$$s(n+1, j) = s(n, j-1) - n\,s(n, j), \tag{1.58}$$
$$S(n+1, k) = k\,S(n, k) + S(n, k-1), \tag{1.59}$$
and
$$\sum_{j=m}^{n} S(n,j)\,s(j,m) = \sum_{j=m}^{n} s(n,j)\,S(j,m) = \delta_{m,n}, \tag{1.60}$$
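Recurrences (1.58) and (1.59) give a convenient way to tabulate both kinds of Stirling numbers, and the orthogonality relation (1.60) then serves as a check. In the sketch below the starting values $s(0,0) = S(0,0) = 1$ are an assumed convention, and the function name is hypothetical.

```python
def stirling_tables(n_max):
    """Tables s[n][j] and S[n][k], 0 <= j, k <= n <= n_max, via (1.58), (1.59)."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = S[0][0] = 1          # assumed starting convention
    for n in range(n_max):
        for j in range(n + 2):
            prev_s = s[n][j - 1] if j > 0 else 0
            prev_S = S[n][j - 1] if j > 0 else 0
            s[n + 1][j] = prev_s - n * s[n][j]     # (1.58)
            S[n + 1][j] = j * S[n][j] + prev_S     # (1.59)
    return s, S

if __name__ == "__main__":
    s, S = stirling_tables(6)
    print(s[4], S[4])
    # Orthogonality (1.60): the matrix product should be the identity.
    print([[sum(S[n][j] * s[j][m] for j in range(m, n + 1)) for m in range(4)]
           for n in range(4)])
```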
where $\delta_{m,n}$ is the Kronecker delta [L. Kronecker, 1823–1891]; that is, $\delta_{m,n} = 1$ for $m = n$ and zero otherwise.

Charalambides and Singh (1988) have written a useful review and bibliography concerning the Stirling numbers and their generalizations. Charalambides's (2002) book deals in depth with many types of special numbers that occur in combinatorics, including generalizations and modifications of the Stirling numbers and the Carlitz, Carlitz–Riordan, Eulerian, and Lah numbers.

The Bell numbers are partial sums of Stirling numbers of the second kind,
$$B_m = \sum_{j=0}^{m} S(m, j).$$
The Catalan numbers are
$$C_n = \frac{1}{n+1} \binom{2n}{n}.$$
The Fibonacci numbers are
$$F_0 = F_1 = 1, \quad F_2 = F_0 + F_1 = 2, \quad F_3 = F_1 + F_2 = 3, \quad F_4 = F_2 + F_3 = 5, \ \ldots.$$
Their generating function is $g(t) = 1/(1 - t - t^2)$. The Narayana numbers are
$$N(n, k) = \frac{1}{n} \binom{n}{k} \binom{n}{k-1}.$$
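The four families just defined can be generated in a few lines. The sketch below follows the definitions above, including the convention $F_0 = F_1 = 1$; the function names are hypothetical, and the explicit alternating-sum formula used for $S(n,k)$ is the difference of zero $\Delta^k 0^n / k!$.

```python
from math import comb, factorial

def stirling2(n, k):
    """S(n, k) as the difference of zero Delta^k 0^n / k!."""
    diff = sum(comb(k, j) * (-1) ** j * (k - j) ** n for j in range(k + 1))
    return diff // factorial(k)

def bell(m):
    """Bell number B_m as the sum of S(m, j) over j = 0..m."""
    return sum(stirling2(m, j) for j in range(m + 1))

def catalan(n):
    """Catalan number C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

def fibonacci(n):
    """Fibonacci number F_n with F_0 = F_1 = 1."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def narayana(n, k):
    """Narayana number N(n, k) = binom(n, k) binom(n, k-1) / n, 1 <= k <= n."""
    return comb(n, k) * comb(n, k - 1) // n

if __name__ == "__main__":
    print([bell(m) for m in range(6)])
    print([catalan(n) for n in range(6)])
    print([fibonacci(n) for n in range(6)])
    print([narayana(4, k) for k in range(1, 5)])
```

Two quick consistency checks are that the Narayana numbers for fixed $n$ sum to the Catalan number $C_n$, and that $\sum_n F_n t^n$ evaluated at a small $t$ agrees with $1/(1 - t - t^2)$.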
1.1.4 Differential Calculus

Next we introduce from the differential calculus the differential operator $D$, defined by
$$Df(x) = f'(x) = \frac{df(x)}{dx}. \tag{1.61}$$
More generally
$$D^j x^N = N^{(j)} x^{N-j}, \qquad j \le N. \tag{1.62}$$
Note the analogy between (1.53) and (1.62). If the function $f(x)$ can be expressed in terms of a Taylor series, then the Taylor series is
$$f(x+h) = \sum_{j=0}^{\infty} \frac{h^j}{j!} D^j f(x). \tag{1.63}$$
The operator $D$ acting on $f(x)$ formally satisfies
$$\sum_{j=0}^{\infty} \frac{(hD)^j}{j!} \equiv e^{hD}. \tag{1.64}$$
Comparing (1.48) with (1.63), we have (again formally)
$$e^{hD} \equiv (1+\Delta)^h \qquad \text{and} \qquad e^{D} \equiv 1 + \Delta. \tag{1.65}$$
Although this is only a formal relation between operators, it gives exact results when $f(x)$ is a polynomial of finite order; it gives useful approximations in many other cases, especially when $D^j f(x)$ and $\Delta^j f(x)$ decrease rapidly as $j$ increases. Rewriting $e^D \equiv 1 + \Delta$ as $D \equiv \ln(1+\Delta)$, we obtain a numerical differentiation formula
$$f'(x) = Df(x) = \Delta f(x) - \tfrac{1}{2}\Delta^2 f(x) + \tfrac{1}{3}\Delta^3 f(x) - \cdots. \tag{1.66}$$
(This is not the only numerical differentiation formula. There are others that are sometimes more accurate. This one is quoted as an example.)

Given a change of variable, $x = (1+t)$, we have
$$[D^k f(x)]_{x=1+t} = D^k f(1+t). \tag{1.67}$$
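The differentiation formula (1.66) can be tried on a polynomial, where it is exact, and on a smooth nonpolynomial function, where it is only approximate. This is a minimal sketch with hypothetical function names; the choice of $\ln x$ at $x = 10$ is an arbitrary example in which the differences decrease quickly.

```python
import math
from math import comb

def forward_difference(f, x, n=1):
    """n-th forward difference of f at x, as in (1.46)."""
    return sum(comb(n, j) * (-1) ** j * f(x + n - j) for j in range(n + 1))

def derivative_from_differences(f, x, terms):
    """Truncated (1.66): sum over j = 1..terms of (-1)^(j+1) Delta^j f(x) / j."""
    return sum((-1) ** (j + 1) * forward_difference(f, x, j) / j
               for j in range(1, terms + 1))

if __name__ == "__main__":
    cube = lambda t: t ** 3
    print(derivative_from_differences(cube, 2, 3))        # exact: 3 * 2^2
    print(derivative_from_differences(math.log, 10, 5))   # close to 1/10
```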
Consider now the differential operator $\theta$, defined by
$$\theta f(x) = x D f(x) = x f'(x) = x \frac{df(x)}{dx}. \tag{1.68}$$
This satisfies
$$\theta^k f(x) = \sum_{j=1}^{k} S(k, j)\, x^j D^j f(x) \tag{1.69}$$
and
$$x^k D^k f(x) = \theta(\theta - 1) \cdots (\theta - k + 1) f(x). \tag{1.70}$$
Also
$$[\theta^k f(x)]_{x=e^t} = D^k f(e^t), \tag{1.71}$$
$$e^{-ct} [\theta^k f(x)]_{x=e^t} = (D + c)^k [e^{-ct} f(e^t)], \tag{1.72}$$
and
\begin{align}
x^c \theta^k [x^{-c} f(x)] &= [e^{ct} D^k \{e^{-ct} f(e^t)\}]_{e^t = x} \notag \\
&= [(D - c)^k f(e^t)]_{e^t = x} = (\theta - c)^k f(x). \tag{1.73}
\end{align}
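For polynomials, the operators in (1.69) and (1.70) act coefficient by coefficient, which makes the identities easy to verify. In the sketch below a polynomial is represented by its coefficient list `[c0, c1, c2, ...]`; this representation and all function names are assumptions made for illustration.

```python
from math import comb, factorial, perm

def stirling2(k, j):
    """S(k, j) as the difference of zero Delta^j 0^k / j!."""
    diff = sum(comb(j, i) * (-1) ** i * (j - i) ** k for i in range(j + 1))
    return diff // factorial(j)

def theta(coeffs, shift=0):
    """(theta - shift) p(x): since theta x^m = m x^m, scale c_m by (m - shift)."""
    return [(m - shift) * c for m, c in enumerate(coeffs)]

def x_power_d_power(coeffs, j):
    """x^j D^j p(x): since D^j x^m = m^(j) x^(m-j), scale c_m by m(m-1)...(m-j+1)."""
    return [perm(m, j) * c for m, c in enumerate(coeffs)]

def theta_power(coeffs, k):
    """theta^k p(x), by applying theta k times."""
    for _ in range(k):
        coeffs = theta(coeffs)
    return coeffs

if __name__ == "__main__":
    p = [3, -1, 0, 2, 5]            # 3 - x + 2x^3 + 5x^4
    k = 3
    lhs = theta_power(p, k)
    rhs = [sum(stirling2(k, j) * x_power_d_power(p, j)[m]
               for j in range(1, k + 1)) for m in range(len(p))]
    print(lhs, rhs)                 # the two sides of (1.69)
```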
The $D$ and $\theta$ operators are useful for handling moment properties of distributions.

Lagrange's expansion [J. L. Lagrange, 1736–1813] for the reversal of a power series states that if (1) $y = f(x)$, where $f(x)$ is regular in the neighborhood of $x_0$, (2) $y_0 = f(x_0)$, and (3) $f'(x_0) \ne 0$, then
$$x = x_0 + \sum_{k=1}^{\infty} \frac{(y - y_0)^k}{k!} \left[\frac{d^{k-1}}{dx^{k-1}} \left(\frac{x - x_0}{f(x) - y_0}\right)^{k}\right]_{x = x_0}. \tag{1.74}$$
More generally
$$h(x) = h(x_0) + \sum_{k=1}^{\infty} \frac{(y - y_0)^k}{k!} \left[\frac{d^{k-1}}{dx^{k-1}} \left\{h'(x) \left(\frac{x - x_0}{f(x) - y_0}\right)^{k}\right\}\right]_{x = x_0}, \tag{1.75}$$
where $h(x)$ is infinitely differentiable. (This expansion plays an important role in the theory of Lagrangian distributions; see Section 2.5.)

L'Hôpital's rule [G. F. A. de L'Hôpital, 1661–1704] is useful for finding the limit of an indeterminate form. If $f(x)$ and $g(x)$ are functions of $x$ for which $\lim_{x\to b} f(x) = \lim_{x\to b} g(x) = 0$, and if $\lim_{x\to b} [f'(x)/g'(x)]$ exists, then
$$\lim_{x\to b} \frac{f(x)}{g(x)} = \lim_{x\to b} \frac{f'(x)}{g'(x)}. \tag{1.76}$$
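As an illustration of Lagrange's expansion (1.74), consider the example $y = f(x) = x e^{-x}$ with $x_0 = y_0 = 0$ (this choice of $f$ is ours, not the text's). Here $(x - x_0)/(f(x) - y_0) = e^{x}$, so the bracketed derivative is $d^{k-1} e^{kx}/dx^{k-1}$ at $x = 0$, namely $k^{k-1}$, and (1.74) becomes $x = \sum_{k \ge 1} k^{k-1} y^k / k!$, which converges for $|y| < 1/e$. The sketch below sums this series and checks that the result inverts $f$; the function name and truncation length are assumptions.

```python
import math

def invert_x_exp_minus_x(y, terms=60):
    """Solve y = x exp(-x) near x = 0 by the Lagrange series sum k^(k-1) y^k / k!."""
    return sum(k ** (k - 1) * y ** k / math.factorial(k)
               for k in range(1, terms + 1))

if __name__ == "__main__":
    y = 0.2
    x = invert_x_exp_minus_x(y)
    print(x, x * math.exp(-x))   # the second value should reproduce y
```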
The use of the $O$, $o$ notation (Landau's notation) [E. Landau, 1877–1938] is standard. We say that
$$f(x) = o(g(x)) \ \text{ as } x \to \infty \quad \text{if} \quad \lim_{x\to\infty} \frac{f(x)}{g(x)} = 0$$
and f (x) = O(g(x))
" " " f (x) " "