ADVANCED TEXTS IN ECONOMETRICS
General Editors C. W. J. Granger
G. E. Mizon
STOCHASTIC LIMIT THEORY An Introduction for Econometricians
JAMES DAVIDSON
Oxford University Press 1994
Oxford University Press, Walton Street, Oxford OX2 6DP
Oxford New York
Athens Auckland Bangkok Bombay Calcutta Cape Town Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madras Madrid Melbourne Mexico City Nairobi Paris Singapore Taipei Tokyo Toronto
and associated companies in Berlin Ibadan
Oxford is a trade mark of Oxford University Press
Published in the United States by Oxford University Press Inc., New York
© James Davidson, 1994
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press. Within the UK, exceptions are allowed in respect of any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, or in the case of reprographic reproduction in accordance with the terms of the licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside these terms and in other countries should be sent to the Rights Department, Oxford University Press, at the address above.
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out or otherwise circulated without the publisher's prior consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser.
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
ISBN 0-19-877402-8
ISBN 0-19-877403-6 (Pbk)
1 3 5 7 9 10 8 6 4 2
Printed in Great Britain on acid-free paper by Biddles Ltd., Guildford and King's Lynn
For Lynette, Julia, and Nicola.
' . . . what in me is dark Illumine, what is low raise and support, That, to the height of this great argument, I may assert Eternal Providence, And justify the ways of God to men.'
Paradise Lost, Book I, 16-20
Contents

Preface
Mathematical Symbols and Abbreviations

Part I: Mathematics

1. Sets and Numbers
    1.1 Basic Set Theory
    1.2 Countable Sets
    1.3 The Real Continuum
    1.4 Sequences of Sets
    1.5 Classes of Subsets
    1.6 Sigma Fields
2. Limits and Continuity
    2.1 The Topology of the Real Line
    2.2 Sequences and Limits
    2.3 Functions and Continuity
    2.4 Vector Sequences and Functions
    2.5 Sequences of Functions
    2.6 Summability and Order Relations
    2.7 Arrays
3. Measure
    3.1 Measure Spaces
    3.2 The Extension Theorem
    3.3 Non-measurability
    3.4 Product Spaces
    3.5 Measurable Transformations
    3.6 Borel Functions
4. Integration
    4.1 Construction of the Integral
    4.2 Properties of the Integral
    4.3 Product Measure and Multiple Integrals
    4.4 The Radon-Nikodym Theorem
5. Metric Spaces
    5.1 Distances and Metrics
    5.2 Separability and Completeness
    5.3 Examples
    5.4 Mappings on Metric Spaces
    5.5 Function Spaces
6. Topology
    6.1 Topological Spaces
    6.2 Countability and Compactness
    6.3 Separation Properties
    6.4 Weak Topologies
    6.5 The Topology of Product Spaces
    6.6 Embedding and Metrization
Part II: Probability

7. Probability Spaces
    7.1 Probability Measures
    7.2 Conditional Probability
    7.3 Independence
    7.4 Product Spaces
8. Random Variables
    8.1 Measures on the Line
    8.2 Distribution Functions
    8.3 Examples
    8.4 Multivariate Distributions
    8.5 Independent Random Variables
9. Expectations
    9.1 Averages and Integrals
    9.2 Expectations of Functions of X
    9.3 Theorems for the Probabilist's Toolbox
    9.4 Multivariate Distributions
    9.5 More Theorems for the Toolbox
    9.6 Random Variables Depending on a Parameter
10. Conditioning
    10.1 Conditioning in Product Measures
    10.2 Conditioning on a Sigma Field
    10.3 Conditional Expectations
    10.4 Some Theorems on Conditional Expectations
    10.5 Relationships between Subfields
    10.6 Conditional Distributions
11. Characteristic Functions
    11.1 The Distribution of Sums of Random Variables
    11.2 Complex Numbers
    11.3 The Theory of Characteristic Functions
    11.4 The Inversion Theorem
    11.5 The Conditional Characteristic Function
Part III: Theory of Stochastic Processes

12. Stochastic Processes
    12.1 Basic Ideas and Terminology
    12.2 Convergence of Stochastic Sequences
    12.3 The Probability Model
    12.4 The Consistency Theorem
    12.5 Uniform and Limiting Properties
    12.6 Uniform Integrability
13. Dependence
    13.1 Shift Transformations
    13.2 Independence and Stationarity
    13.3 Invariant Events
    13.4 Ergodicity and Mixing
    13.5 Subfields and Regularity
    13.6 Strong and Uniform Mixing
14. Mixing
    14.1 Mixing Sequences of Random Variables
    14.2 Mixing Inequalities
    14.3 Mixing in Linear Processes
    14.4 Sufficient Conditions for Strong and Uniform Mixing
15. Martingales
    15.1 Sequential Conditioning
    15.2 Extensions of the Martingale Concept
    15.3 Martingale Convergence
    15.4 Convergence and the Conditional Variances
    15.5 Martingale Inequalities
16. Mixingales
    16.1 Definition and Examples
    16.2 Telescoping Sum Representations
    16.3 Maximal Inequalities
    16.4 Uniform Square-integrability
17. Near-Epoch Dependence
    17.1 Definition and Examples
    17.2 Near-Epoch Dependence and Mixingales
    17.3 Near-Epoch Dependence and Transformations
    17.4 Approximability
Part IV: The Law of Large Numbers

18. Stochastic Convergence
    18.1 Almost Sure Convergence
    18.2 Convergence in Probability
    18.3 Transformations and Convergence
    18.4 Convergence in Lp Norm
    18.5 Examples
    18.6 Laws of Large Numbers
19. Convergence in Lp-Norm
    19.1 Weak Laws by Mean-Square Convergence
    19.2 Almost Sure Convergence by the Method of Subsequences
    19.3 A Martingale Weak Law
    19.4 A Mixingale Weak Law
    19.5 Approximable Processes
20. The Strong Law of Large Numbers
    20.1 Technical Tricks for Proving LLNs
    20.2 The Case of Independence
    20.3 Martingale Strong Laws
    20.4 Conditional Variances and Random Weighting
    20.5 Two Strong Laws for Mixingales
    20.6 Near-epoch Dependent and Mixing Processes
21. Uniform Stochastic Convergence
    21.1 Stochastic Functions on a Parameter Space
    21.2 Pointwise and Uniform Stochastic Convergence
    21.3 Stochastic Equicontinuity
    21.4 Generic Uniform Convergence
    21.5 Uniform Laws of Large Numbers
Part V: The Central Limit Theorem

22. Weak Convergence of Distributions
    22.1 Basic Concepts
    22.2 The Skorokhod Representation Theorem
    22.3 Weak Convergence and Transformations
    22.4 Convergence of Moments and Characteristic Functions
    22.5 Criteria for Weak Convergence
    22.6 Convergence of Random Sums
23. The Classical Central Limit Theorem
    23.1 The i.i.d. Case
    23.2 Independent Heterogeneous Sequences
    23.3 Feller's Theorem and Asymptotic Negligibility
    23.4 The Case of Trending Variances
24. CLTs for Dependent Processes
    24.1 A General Convergence Theorem
    24.2 The Martingale Case
    24.3 Stationary Ergodic Sequences
    24.4 The CLT for NED Functions of Mixing Processes
    24.5 Proving the CLT by the Bernstein Blocking Method
25. Some Extensions
    25.1 The CLT with Estimated Normalization
    25.2 The CLT with Random Norming
    25.3 The Multivariate CLT
    25.4 Error Estimation
Part VI: The Functional Central Limit Theorem

26. Weak Convergence in Metric Spaces
    26.1 Probability Measures on a Metric Space
    26.2 Measures and Expectations
    26.3 Weak Convergence
    26.4 Metrizing the Space of Measures
    26.5 Tightness and Convergence
    26.6 Skorokhod's Representation
27. Weak Convergence in a Function Space
    27.1 Measures on Function Spaces
    27.2 The Space C
    27.3 Measures on C
    27.4 Brownian Motion
    27.5 Weak Convergence on C
    27.6 The Functional Central Limit Theorem
    27.7 The Multivariate Case
28. Cadlag Functions
    28.1 The Space D
    28.2 Metrizing D
    28.3 Billingsley's Metric
    28.4 Measures on D
    28.5 Prokhorov's Metric
    28.6 Compactness and Tightness in D
29. FCLTs for Dependent Variables
    29.1 The Distribution of Continuous Functions on D
    29.2 Asymptotic Independence
    29.3 The FCLT for NED Functions of Mixing Processes
    29.4 Transformed Brownian Motion
    29.5 The Multivariate Case
30. Weak Convergence to Stochastic Integrals
    30.1 Weak Limit Results for Random Functionals
    30.2 Stochastic Processes in Continuous Time
    30.3 Stochastic Integrals
    30.4 Convergence to Stochastic Integrals

Notes
References
Index
Preface
Recent years have seen a marked increase in the mathematical sophistication of econometric research. While the theory of linear parametric models which forms the backbone of the subject makes an extensive and clever use of matrix algebra, the statistical prerequisites of this theory are comparatively simple. But now that these models are pretty thoroughly understood, research is concentrated increasingly on the less tractable questions, such as nonlinear and nonparametric estimation and nonstationary data generation processes. The standard econometrics texts are no longer an adequate guide to this new technical literature, and a sound understanding of the probabilistic foundations of the subject is becoming less and less of a luxury.

The asymptotic theory traditionally taught to students of econometrics is founded on a small body of classical limit theorems, such as Khinchine's weak law of large numbers and the Lindeberg-Lévy central limit theorem, relevant to the stationary and independent data case. To deal with linear stochastic difference equations, appeal can be made to the results of Mann and Wald (1943a), but even these are rooted in the assumption of independent and identically distributed disturbances. This foundation has become increasingly inadequate to sustain the expanding edifice of econometric inference techniques, and recent years have seen a systematic attempt to construct a less restrictive limit theory. Hall and Heyde's Martingale Limit Theory and its Application (1980) is an important landmark, as are a series of papers by econometricians including among others Halbert White, Ronald Gallant, Donald Andrews, and Herman Bierens. This work introduced to the econometrics profession pioneering research into limit theory under dependence, done in the preceding decades by probabilists such as J. L. Doob, I. A. Ibragimov, Patrick Billingsley, Robert Serfling, Murray Rosenblatt, and Donald McLeish.
These latter authors devised various concepts of limited dependence for general nonstationary time series. The concept of a martingale has a long history in probability, but it was primarily Doob's Stochastic Processes (1953) that brought it to prominence as a tool of limit theory. Martingale processes behave like the wealth of a gambler who undertakes a succession of fair bets; the differences of a martingale (the net winnings at each step) are unpredictable from lagged information. Powerful limit theorems are available for martingale difference sequences involving no further restrictions on the dependence of the process. Ibragimov and Rosenblatt respectively defined strong mixing and uniform mixing as characterizations of 'limited memory', or independence at long range. McLeish defined the notion of a mixingale, the asymptotic counterpart of a martingale difference, becoming unpredictable m steps ahead as m becomes large. This is a weaker property than mixing because it involves only low-order moments of the distribution, but mixingales possess most of those attributes of mixing processes needed to make limit theorems work. Very important from the econometrician's point of view is the property dubbed by Gallant and White (1988) near-epoch dependence from a phrase in one of McLeish's papers, although the idea itself goes back to Billingsley (1968) and Ibragimov (1962). The mixing property may not be preserved by transformations of sequences involving an infinite number of lags, but near-epoch dependence is a condition under which the outputs of a dynamic econometric model can be shown, given some further conditions, to be mixingales when the inputs are mixing. Applications of these results are increasingly in evidence in the econometric literature; Gallant and White's monograph provides an excellent survey of the possibilities.

Limit theorems impose restrictions on the amount of dependence between sequence coordinates, and on their marginal distributions. Typically, the probability of outliers must be controlled by requiring the existence of higher-order moments, but there are almost always trade-offs between dependence and moment restrictions, allowing one to buy more of one at the price of less of the other. The fun of proving limit theorems has been to see how far out the envelope of sufficient conditions can be stretched, in one direction or another. To complicate matters, one can get results both by putting limits on the rate of approach to independence (the rate of mixing), and by limiting the type of dependence (the martingale approach), as well as by combining both types of constraint (the mixingale approach). The results now available are remarkably powerful, judged by the yardstick of the classical theory. Proofs of necessity are elusive and the limits to the envelope are not yet known with certainty, but they probably lie not too far beyond the currently charted points.
Perhaps the major development in time-series econometrics in the 1980s has been the theory of cointegration, and dealing with the distributions of estimators when time series are generated by unit root processes also requires a new type of limit theory. The essential extra ingredient of this theory is the functional central limit theorem (FCLT). The proof of these weak convergence results calls for a limit theory for the space of functions, which throws up some interesting problems which have no counterpart in ordinary probability. These ideas were pioneered by Russian probabilists in the 1950s, notably A. V. Skorokhod and Yu. V. Prokhorov. It turns out that FCLTs hold under properties generally similar to those for the ordinary CLT (though with a crucial difference), and they can be analysed with the same kind of tools, imposing limitations on dependence and outliers.

The probabilistic literature which deals with issues of this kind has been seen as accessible to practising econometricians only with difficulty. Few concessions are made to the nonspecialist, and the concerns of probabilists, statisticians, and econometricians are frequently different. Textbooks on stochastic processes (Cox and Miller 1965 is a distinguished example) often give prominence to topics that econometricians would regard as fairly specialized (e.g. Markov chains, processes in continuous time), while the treatment of important issues like nonstationarity gets tucked away under the heading of advanced or optional material if not omitted altogether. Probability texts are written for students of mathematics and assume a familiarity with the folklore of the subject that econometricians may lack. The intellectual investment required is one that students and practitioners are often, quite reasonably, disinclined to make.

It is with issues of this sort in mind that the present book has been written. The first objective has been to provide a coherent and unified account of modern asymptotic theory, which can function as both a course text, and as a work of reference. The second has been to provide a grounding in the requisite mathematics and probability theory, making the treatment sufficiently self-contained that even readers with limited mathematical training might make use of it. This is not to say that the material is elementary. Even when the mathematics is mastered, the reasoning can be intricate and demand a degree of patience to absorb. Proofs for nearly all the results are provided, but readers should never hesitate to pass over these when they impede progress. The book is also intended to be useful as a reference for students and researchers who only wish to know basic things, like the meaning of technical terms, and the variety of limit results available. But, that said, it will not have succeeded in its aim unless the reader is sometimes stimulated to gain a deeper understanding of the material - if for no better reason, because this is a theory abounding in mathematical elegance, and technical ingenuity which is often dazzling.

Outline of the Work
Part I covers almost all the mathematics used subsequently. Calculus and matrix algebra are not treated, but in any case there is little of either in the book. Most readers should probably begin by reading Chapters 1 and 2, and perhaps the first sections only of Chapters 3 and 4, noting the definitions and examples but skipping all but the briefest proofs initially. These chapters contain some difficult material, which does not all need to be mastered immediately. Chapters 5 and 6 are strictly required only for Chapter 21 and Part VI, and should be passed over on first reading. Nearly everything needed to read the probability literature is covered in these chapters, with perhaps one notable exception: the theory of normed spaces. Some treatments in probability use a Hilbert space framework, but it can be avoided. The number of applications exploiting this approach seemed currently too small to justify the added technical overhead, although future developments may require this judgement to be revised.

Part II covers what for many readers will be more familiar territory. Chapters 7, 8, and 9 contain essential background to be skimmed or studied in more depth, as appropriate. It is the collections of inequalities in §9.3 and §9.5 that we will have the most call to refer to subsequently. The content of Chapter 10 is probably less familiar, and is very important. Most readers will want to study this chapter carefully sooner or later. Chapter 11 can be passed over initially, but is needed in conjunction with Part V.

In Part III the main business of the work begins. Chapter 12 gives an introduction to the main concepts arising in the study of stochastic sequences. Chapters 13 and 14 continue the discussion by reviewing concepts of dependence, and Chapters 15, 16, and 17 deal with specialized classes of sequence whose properties make them amenable to the application of limit theorems. Nearly all readers will want to study Chapters 12, 13, and the earlier sections of 14 and 15 before going further, whereas Chapters 16 and 17 are rather technical and should probably be avoided until the context of these results is understood.

In Parts IV and V we arrive at the study of the limit theorems themselves. The aim has been to contrast alternative ways of approaching these problems, and to present a general collection of results ranging from the elementary to the very general. Chapter 18 is devoted to fundamentals, and everyone should read this before going further. Chapter 19 compares classical techniques for proving laws of large numbers, depending on the existence of second moments, with more modern methods. Although the concept of convergence in probability is adequate in many econometric applications, proofs of strong consistency of estimators are increasingly popular in the econometrics literature, and techniques for dependent processes are considered in Chapter 20. Uniform stochastic convergence is an essential concept in the study of econometric estimators, although it has only recently been systematically researched. Chapter 21 contains a synthesis of results that have appeared in print in the last year or two. Part V contrasts the classical central limit theorems for independent processes with the modern results for martingale differences and general dependent processes. Chapter 22 contains the essentials of weak convergence theory for random variables. The treatment is reasonably complete, although one neglected topic, to which much space is devoted in the probability literature, is convergence to stable laws for sequences with infinite variance. This material has found few applications in econometrics to date, although its omission is another judgement that may need to be revised in the future.
Chapter 23 describes the classic CLTs for independent processes, and Chapter 24 treats modern techniques for dependent, heterogeneous processes.

Part VI deals with the functional central limit theorem and related convergence results, including convergence to limits that can be identified with stochastic integrals. A number of new mathematical challenges are presented by this theory, and readers who wish to tackle it seriously will probably want to go back and apply themselves to Chapters 5 and 6 first. Chapter 26 is both the hardest going and the least essential to subsequent developments. It deals with the theory of weak convergence on metric spaces at a greater level of generality than we strictly need, and is the one section where topological arguments seriously intrude. Almost certainly one should go first to Chapter 27, referring back as needed for definitions and statements of the prerequisite theorems, and pursue the rationale for these results further only as interest dictates. Chapter 28 is likewise a technical prologue to Chapters 29 and 30, and might be skipped over at first reading. The meat of this part of the book is in these last two chapters. Results are given on the multivariate invariance principle for heterogeneous dependent processes, paralleling the central limit theorems of Chapter 24.

A number of the results in the text are, to the author's knowledge, new. These include 14.13/14, 19.11, 20.18/19, 20.21, 24.6/7/14, 29.14/29.18, and 30.13/14, although some have now appeared in print elsewhere.
Further Reading
There is a huge range of texts in print covering the relevant mathematics and probability, but the following titles were, for one reason or another, the most frequently consulted in the course of writing this book. T. M. Apostol's Mathematical Analysis (2nd edition) hits just the right note for the basic bread-and-butter results. For more advanced material, Dieudonné's Foundations of Modern Analysis and Royden's Real Analysis are well-known references, the latter being the more user-friendly although the treatment is often fairly concise. Halmos's classic Measure Theory and Kingman and Taylor's Introduction to Measure and Probability are worth having access to. Willard's General Topology is a clear and well-organized text to put alongside Kelley's classic of the same name. Halmos's Naive Set Theory is a slim volume whose main content falls outside our sphere of interest, but is a good read in its own right. Strongly recommended is Borowski and Borwein's Collins Reference Dictionary of Mathematics; one can learn more about mathematics in less time by browsing in this little book, and following up the cross references, than by any other method I can think of. For a stimulating introduction to metric spaces see Michael Barnsley's popular Fractals Everywhere.

For further reading on probability, one might begin by browsing the slim volume that started the whole thing off, Kolmogorov's Foundations of the Theory of Probability. Then, Billingsley's Probability and Measure is an inspiration, both authoritative and highly readable. Breiman's Probability has a refreshingly informal style, and just the right emphasis. Chung's A Course in Probability Theory is idiosyncratic in parts, but strongly recommended.
The value of the classic texts, Loève's Probability Theory (4th edition) and Feller's An Introduction to Probability Theory and its Applications (3rd edition) is self-evident, although these are dense and detailed books that can take a little time and patience to get into, and are chiefly for reference. Cramér's Mathematical Methods of Statistics is now old-fashioned, but still useful. Two more recent titles are Shiryayev's Probability, and R. M. Dudley's tough but stimulating Real Analysis and Probability.

Of the more specialized monographs on stochastic convergence, the following titles (in order of publication date) are all important: Doob, Stochastic Processes; Révész, The Laws of Large Numbers; Parthasarathy, Probability Measures on Metric Spaces; Billingsley, Convergence of Probability Measures; Iosifescu and Theodorescu, Random Processes and Learning; Ibragimov and Linnik, Independent and Stationary Sequences of Random Variables; Stout, Almost Sure Convergence; Lukacs, Stochastic Convergence; Hall and Heyde, Martingale Limit Theory and its Application; Pollard, Convergence of Stochastic Processes; Eberlein and Taqqu (eds.), Dependence in Probability and Statistics. Doob is the founding father of the subject, and his book its Old Testament. Of the rest, Billingsley's is the most original and influential. Ibragimov and Linnik's essential monograph is now, alas, hard to obtain. The importance of Hall and Heyde was mentioned above. Pollard's book takes up the weak convergence story more or less where Billingsley leaves off, and much of the material complements the coverage of the present volume. The Eberlein-Taqqu collection contains up-to-date accounts of mixing theory, covering some related topics outside the range of the present work. The literature on Brownian motion and stochastic integration is extensive, but Karatzas and Shreve's Brownian Motion and Stochastic Calculus is a recent and comprehensive source for reference, and Kopp's Martingales and Stochastic Integrals was found useful at several points.

These items receive an individual mention by virtue of being between hard covers. References to the journal literature will be given in context, but it is worth mentioning that the four papers by Donald McLeish, appearing between 1974 and 1977, form possibly the most influential single contribution to our subject.

Finally, titles dealing with applications and related contributions include Serfling, Approximation Theorems of Mathematical Statistics; White, Asymptotic Theory for Econometricians; Gallant, Nonlinear Statistical Methods; Gallant and White, A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models; Amemiya, Advanced Econometrics. All of these are highly recommended for forming a view of what stochastic limit theory is for, and why it matters.

Acknowledgements
The idea for this book originated in 1987, in the course of writing a chapter of mathematical and statistical prerequisites for a projected textbook of econometric theory. The initial, very tentative draft was completed during a stay at the University of California (San Diego) Department of Economics in 1988, whose hospitality is gratefully acknowledged. It has grown a great deal since then, and getting it finished has involved a struggle with competing academic commitments as well as the demands of family life. My family deserve special thanks for their forbearance.

My colleague Peter Robinson has been a great source of encouragement and help, and has commented on various drafts. Other people who have read portions of the manuscript and provided invaluable feedback, not least in pointing out my errors, include Getullio Silveira, Robert de Jong, and especially Elizabeth Boardman, who took immense trouble to help me lick the chapters on mathematics into shape. I am also most grateful to Don Andrews, Graham Brightwell, Søren Johansen, Donald McLeish, Peter Phillips, Hal White, and a number of anonymous referees for helpful conversations, comments and advice. None of these people is responsible for the various flaws that doubtless remain.

The book was written using the ChiWriter 4 technical word processor, and after conversion to Postscript format was produced as camera-ready copy on a Hewlett Packard LaserJet 4M printer, direct from the original files. I must particularly thank Cay Horstmann, of Horstmann Software Design Corp., for his technical assistance with this task.

London, June 1994
Mathematical Symbols and Abbreviations
In the text, the symbol □ is used to terminate examples and definitions, and also theorems and lemmas unless the proof follows directly. The symbol ■ terminates proofs. References to numbered expressions are enclosed in parentheses. References to numbered theorems, examples etc. are given in bold face. References to chapter sections are preceded by §.

In statements of theorems, roman numbers (i), (ii), (iii), ... are used to indicate the parts of a multi-part result. Lower case letters (a), (b), (c), ... are used to itemize the assumptions or conditions specified in a theorem, and also the components of a definition.

The page numbers below refer to fuller definitions or examples of use, as appropriate.

$|.|$   absolute value   20
$\|.\|_p$   Lp-norm   132
$\|.\|$   Euclidean norm 23; also fineness (of a partition) 438
$\Rightarrow$   weak convergence (of measures) 347, 418; also implication 19
$\uparrow, \downarrow$   monotone convergence   23
$\to$   convergence   23
$\xrightarrow{a.s.}$   almost sure convergence   179
$\xrightarrow{D}$   convergence in distribution   347
$\xrightarrow{L_p}$   convergence in Lp norm   287
$\xrightarrow{pr}$   convergence in probability   284
$\mapsto$   mapping, function   6
$\circ$   composition of mappings   7
$\sim$   equivalence in order of magnitude (of sequences) 32; also equivalence in distribution (of r.v.s) 123
$\dotplus$   addition modulo 1   46
$-$   set difference   3
$\le, \ge$   partial ordering, inequality   5
$<, >$   strict ordering, strict inequality   5
$\ll$   order of magnitude inequality (sequences) 32; also absolutely continuous (measures) 69
$\perp$   mutually singular (of measures)   69
$1_A(.)$   indicator function   53
a.e.   almost everywhere   38
AR   autoregressive process   218
ARMA   autoregressive moving average process   215
a.s., a.s.[$\mu$]   almost surely (with resp. to p.m. $\mu$)   113
XX
Ac A, (A) Ao am No
v
B(n,p ) 13
CLT ch.f. c.d.f. Cro,IJ
C,::>
C,::>
X z(n) d(x,y )
D ro, IJ !D
�
dA E
ess sup
E(.) E(.lx) E(.i�) 3
f+, f-
FCLT F(.) x(.) m iff inf i.i.d. 1.0. m
pr. LIE LIL lim limsup, lim liminf, lim L(n) Lp-NED MA m(.)
Symbols complement of A closure of A interior of A strong mixing coefficient aleph-nought (cardinality of [N) 'for every' Binomial distribution with parameters n and p Borel field central limit theorem characteristic function cumulative distribution function continuous functions on the unit interval set containment strict containment chi-squared distribution with n degrees of freedom distance between x and y cadlag functions on the unit interval dyadic rationals symmetric difference boundary of A set membership essential supremum expectation conditional expectation (on variable x) conditional expectation (on a-field �) 'there exists' positive, negative parts of f functional central limit theorem cumulative distribution function characteristic function of X uniform mixing coefficient 'if and only if infimum independently and identically distributed infinitely often in probability law of iterated expectations law of the iterated logarithm limit (sets); also limit (numbers) superior limit (sets); also superior limit (numbers) inferior limit (sets); also inferior limit (numbers) slowly varying function near-epoch dependent in Lp-norrn moving average process Lebesgue measure
Symbols m.d. m.g.f. m.s.
[M
N(Jl,d) [N 1No
n, n
mAn 0(.) o(.) Op (.) Op (.) 0 p.d.f. p.m. P(.) P(. I A) P(. l §')
rr,n
1t 1tt(. ) (Q r.v. Rro,t] x Ry
IR IR+ [R
IR+ n IR
s.e. s.s.e. SLLN sup S(x, £) Sn s� a(r;) a(X)
ffdx ffdJl, ffdF ffdP
L, L
x.xi
martingale difference 23 0 moment-generating function 162 mean square 287 space of measures 418 Gaussian distribution with mean Jl and variance d 123 natural numbers 8 [N u {0} 8 intersection 3 minimum of m and n 258 'Big Oh' , order of magnitude relation 31 'Little Oh' , strict order of magnitude relation 31 stochastic order relation 1 87 strict stochastic order relation 1 87 null set 8 probability density function 122 probability measure 111 probability 111 conditional probability (on event A) 1 13 conditional probability (on a-field §' ) 1 14 product of numbers; 167 also partition of an interval 58 product measure 64 coordinate projection 434 rational numbers 9 random variable 1 17 real valued functions on [0, 1 ] 434 relation 5 real line 10 non-negative reals 11 extended real line, IR u { -oo ,+oo} 12 IR+u { +oo } 12 n-dimensional Euclidean space 11 stochastic equicontinuity 336 strong stochastic equicontinuity 336 strong law of large numbers 289 supremum 12 £-neighbourhood, sphere 20, 76 sum of random variables 290 variance of Sn 364 a-field generated by collection r; 16 a-field generated by r.v. X 146 Lebesgue integral 57 Lebesgue-Stieltjes integral 57 expected value (integral with resp. to p.m.) 128 sum 31
Symbols
xxii Tro
u, U
U[a,b]
v, v mv n
WLLN w.p. 1
Xn X,
® 7l.
X
{.} {.}j, {.r:"' { {.}}
[x] [a,b] (a,b) (Q,?f) (Q,?f ,J..l) (Q,?f ,ji) (Q, ?f,P) (S,d) (2:Z,'t)
shift transformation union uniform distribution on interval [a,b] union of a-fields maximum of m and n weak law of large numbers with probability 1 sample mean of sequence {X1} 1 Cartesian product a-field of product sets integers set designation; also sequence, array infinite sequences array largest integer :::; x closed interval bounded by a,b open interval bounded by a,b measurable space measure space complete measure space probability space metric space topological space
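Common usages

A, B, C, D, ...                          sets
X, Y, Z, ...                             random variables
X, Y, Z, ... (bold)                      random vectors
f, g, h, ...                             functions
$\varepsilon$, $\delta$, $\eta$          positive constants
B, M                                     bounding constants
$\mathcal{A}$, $\mathcal{C}$, $\mathcal{D}$, $\mathcal{V}$, ...   collections of subsets
$\mathcal{F}$, $\mathcal{G}$, $\mathcal{H}$, ...                  $\sigma$-fields
$\mathbb{S}$, $\mathbb{T}$, $\mathbb{X}$, ...                     spaces
$\mu$, $\nu$, ...                        measures
$d$, $\rho$                              metrics
$\tau$                                   topology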
I MATHEMATICS
1 Sets and Numbers
1.1 Basic Set Theory
A set is any specified collection of objects. In this book the objects in question are often numbers, but they may also be functions, or other sets, or indeed wholly arbitrary, to be determined by the context in which the theory is applied. In any analysis there is a set which defines the universe of discourse, containing all the objects under consideration, and in what follows, sets denoted $A$, $B$ etc., are subsets of a set $X$, with generic element $x$.

Set membership is denoted by the symbol '$\in$', $x \in A$ meaning '$x$ belongs to the set $A$'. To show sets $A$ and $B$ have the same elements, one writes $A = B$. The usual way to define a set is by a descriptive statement enclosed in braces, so that for example $A = \{x: x \in B\}$ defines membership of $A$ in terms of membership of $B$, and is an alternative way of writing $A = B$. Another way to denote set membership is by labels. If a set has $n$ elements one can write $A = \{x_i, i = 1,\ldots,n\}$, but any set of labels will do. The statement $A = \{x_\alpha, \alpha \in C\}$ says that $A$ is the set of elements bearing labels $\alpha$ contained in another set $C$, called the index set for $A$. The labels (indices) need not be numbers, and can be any convenient objects at all. Sets whose elements are sets (the word 'collection' tends to be preferred in this context) are denoted by upper-case script characters. $A \in \mathcal{C}$ denotes that the set $A$ is in the collection $\mathcal{C}$, or using indices one could write $\mathcal{C} = \{A_\alpha: \alpha \in C\}$.

$B$ is called a subset of $A$, written $B \subseteq A$, if all the elements of $B$ are also elements of $A$. If $B$ is a proper subset of $A$, ruling out $B = A$, the relation is written $B \subset A$. The union of $A$ and $B$ is the set whose elements belong to either or both sets, written $A \cup B$. The union of a collection $\mathcal{C}$, the set of elements belonging to one or more $A \in \mathcal{C}$, is denoted $\bigcup_{A \in \mathcal{C}} A$, or, alternatively, one can write $\bigcup_{\alpha \in C} A_\alpha$ for the union of the collection $\{A_\alpha: \alpha \in C\}$. The intersection of $A$ and $B$ is the set of elements belonging to both, written $A \cap B$.
The intersection of a collection $\mathcal{C}$ is the set of elements common to all the sets in $\mathcal{C}$, written $\bigcap_{A \in \mathcal{C}} A$ or $\bigcap_{\alpha \in C} A_\alpha$. In particular, the union and intersection of $\{A_1, A_2, \ldots, A_n\}$ are written $\bigcup_{i=1}^n A_i$ and $\bigcap_{i=1}^n A_i$. When the index set is implicit or unspecified we may write just $\bigcup_\alpha A_\alpha$, $\bigcap_i A_i$ or similar. The difference of sets $A$ and $B$, written $A - B$ or by some authors $A \setminus B$, is the set of elements belonging to $A$ but not to $B$. The symmetric difference of two sets is $A \,\triangle\, B = (A - B) \cup (B - A)$. $X - A$ is the complement of $A$ in $X$, also denoted $A^c$ when $X$ is understood, and we have the general result that $A - B = A \cap B^c$. The null set (or empty set) is $\emptyset = X^c$, the set with no elements. Sets with no elements in common (having empty intersection) are called disjoint. A partition of a set is a collection of disjoint subsets whose union is the set, such that each of its elements belongs to one and only one member of the collection.

Here are the basic rules of set algebra. Unions and intersections obey commutative, associative and distributive laws:

$A \cup B = B \cup A$,   (1.1)
$A \cap B = B \cap A$,   (1.2)
$(A \cup B) \cup C = A \cup (B \cup C)$,   (1.3)
$(A \cap B) \cap C = A \cap (B \cap C)$,   (1.4)
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$,   (1.5)
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.   (1.6)

There are also rules relating to complements known as de Morgan's laws:

$(A \cup B)^c = A^c \cap B^c$,   (1.7)
$(A \cap B)^c = A^c \cup B^c$.   (1.8)

Venn diagrams, illustrated in Fig. 1.1, are a useful device for clarifying relationships between subsets.
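The laws (1.5)-(1.8), together with the identity $A - B = A \cap B^c$, can be checked mechanically on a small universe. The following Python sketch is only an illustration, not a proof: it enumerates every subset of a three-element set and tests each identity. The function names `powerset` and `check_set_laws` are hypothetical, chosen for this example.

```python
from itertools import combinations


def powerset(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_set_laws(universe):
    """Check (1.5)-(1.8) and A - B = A n B^c for all subsets of `universe`."""
    X = frozenset(universe)
    subsets = powerset(X)
    for A in subsets:
        for B in subsets:
            # de Morgan's laws (1.7) and (1.8); complements are taken in X
            if X - (A | B) != (X - A) & (X - B):
                return False
            if X - (A & B) != (X - A) | (X - B):
                return False
            # difference expressed through the complement
            if A - B != A & (X - B):
                return False
            for C in subsets:
                # distributive laws (1.5) and (1.6)
                if A & (B | C) != (A & B) | (A & C):
                    return False
                if A | (B & C) != (A | B) & (A | C):
                    return False
    return True


laws_hold = check_set_laws({1, 2, 3})
```

An exhaustive check of this kind only confirms the identities for the chosen finite universe; the general statements follow from the definitions of union, intersection and complement.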
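Fig. 1.1

The distributive and de Morgan laws extend to general collections, as follows.

1.1 Theorem Let $\mathcal{C}$ be a collection of sets, and $B$ a set. Then
(i) $\left(\bigcup_{A \in \mathcal{C}} A\right) \cap B = \bigcup_{A \in \mathcal{C}} (A \cap B)$,
(ii) $\left(\bigcap_{A \in \mathcal{C}} A\right) \cup B = \bigcap_{A \in \mathcal{C}} (A \cup B)$,
(iii) $\left(\bigcap_{A \in \mathcal{C}} A\right)^c = \bigcup_{A \in \mathcal{C}} A^c$,
(iv) $\left(\bigcup_{A \in \mathcal{C}} A\right)^c = \bigcap_{A \in \mathcal{C}} A^c$. □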
The Cartesian product of two sets $A$ and $B$, written $A \times B$, is the set of all possible ordered pairs of elements, the first taken from $A$ and the second from $B$; we write $A \times B = \{(x,y): x \in A, y \in B\}$. For a collection of $n$ sets the Cartesian product is the set of all the $n$-tuples (ordered sets of $n$ elements, with the $i$th element drawn from $A_i$), and is written

$\times_{i=1}^{n} A_i = \{(x_1, x_2, \ldots, x_n): x_i \in A_i,\ i = 1,\ldots,n\}$.   (1.9)
If one of the factor sets $A_i$ is empty, $\times_{i=1}^{n} A_i$ is also empty. Product sets are important in a variety of different contexts in mathematics. Some of these are readily appreciated; for example, sets whose elements are $n$-vectors of real numbers are products of copies of the real line (see §1.3). But product sets are also central to the mathematical formalization of the notion of relationship between set elements. Thus: a relation $R$ on a set $A$ is any subset of $A \times A$. If $(x,y) \in R$, we usually write $x\,R\,y$. $R$ is said to be
reflexive iff $x\,R\,x$,
symmetric iff $x\,R\,y$ implies $y\,R\,x$,
antisymmetric iff $x\,R\,y$ and $y\,R\,x$ implies $x = y$,
transitive iff $x\,R\,y$ and $y\,R\,z$ implies $x\,R\,z$,
where in each case the indicated condition holds for every $x$, $y$, and $z \in A$, as the case may be. (Note: 'iff' means 'if and only if'.)

An equivalence relation is a relation that is reflexive, symmetric, and transitive. Given an equivalence relation $R$ on $A$, the equivalence class of an element $x \in A$ is the set $E_x = \{y \in A: x\,R\,y\}$. If $E_x$ and $E_y$ are the equivalence classes of elements $x$ and $y$, then either $E_x \cap E_y = \emptyset$, or $E_x = E_y$. The equality relation $x = y$ is the obvious example of an equivalence relation, but by no means the only one.

A partial ordering is any relation that is reflexive, antisymmetric, and transitive. Partial orderings are usually denoted by the symbols $\le$ or $\ge$, with the understanding that $x \ge y$ is the same as $y \le x$. To every partial ordering there corresponds a strict ordering, defined by the omission of the elements $(x,x)$ for all $x \in A$. Strict orderings, usually denoted by $<$ or $>$, are not reflexive or antisymmetric, but they are transitive. A set $A$ is said to be linearly ordered by a partial ordering $\le$ if one of the relations $x < y$, $x > y$, and $x = y$ holds for every pair $(x,y) \in A \times A$. If there exist elements $a \in A$ and $b \in A$ such that $a \le x$ for all $x \in A$, or $x \le b$ for all $x \in A$, $a$ and $b$ are called respectively the smallest and largest elements of $A$.
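As a concrete illustration of a relation as a subset of $A \times A$, the Python sketch below represents a relation by a set of ordered pairs, tests the three defining properties of an equivalence relation, and collects the equivalence classes $E_x$. The example relation (congruence modulo 3 on $\{0,\ldots,5\}$) and the function names are assumptions made for this illustration only.

```python
def is_equivalence(A, R):
    """True if R (a set of ordered pairs from A x A) is reflexive,
    symmetric and transitive on A."""
    reflexive = all((x, x) in R for x in A)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, w) in R
                     for (x, y) in R
                     for (z, w) in R
                     if y == z)
    return reflexive and symmetric and transitive


def equivalence_classes(A, R):
    """Return the distinct classes E_x = {y in A : x R y}."""
    return {frozenset(y for y in A if (x, y) in R) for x in A}


# Illustrative relation: x R y iff x - y is a multiple of 3.
A = set(range(6))
R = {(x, y) for x in A for y in A if (x - y) % 3 == 0}
classes = equivalence_classes(A, R)
```

For this relation the classes are pairwise disjoint and their union is $A$, in line with the statement that two equivalence classes are either disjoint or identical.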
A linearly ordered set A is called well-ordered if every subset of A contains a smallest element. It is of course in sets whose elements are numbers that the ordering concept is most familiar. Consider two sets X and Y, which can be thought of as representing the universal sets for a pair of related problems. The following bundle of definitions contains
the basic ideas about relationships between the elements of such sets. A mapping (or transformation or function) $T: X \mapsto Y$ is a rule that associates each element of $X$ with a unique element of $Y$; in other words, for each $x \in X$ there exists a specified element $y \in Y$, denoted $T(x)$. $X$ is called the domain of the mapping, and $Y$ the codomain. The set

$G_T = \{(x,y): x \in X,\ y = T(x)\} \subseteq X \times Y$   (1.10)

is called the graph of $T$. For $A \subseteq X$, the set $T(A) = \{T(x): x \in A\}$ ... $\aleph_0$.

1.13 Theorem $2^{\aleph_0} = c$.
Proof The proposition is proved if we can show that $2^{\mathbb{N}}$ is equipotent with $\mathbb{R}$ or, equivalently (in view of 1.8), with the unit interval $[0,1]$. For a set $A \in 2^{\mathbb{N}}$, construct the sequence of binary digits $\{b_1, b_2, b_3, \ldots\}$ according to the rule, '$b_n = 1$ if $n \in A$, $b_n = 0$ otherwise'. Using formula (1.15) with m = 1 and q = 0, let this sequence define an element $x_A$ of $[0,1]$ (the case where $b_n = 1$ for all $n$ defines 1). On the other hand, for any element $x \in [0,1]$, construct the set $A_x \in 2^{\mathbb{N}}$ according to the rule, 'include $n$ in $A_x$ if and only if the $n$th digit in the binary expansion of $x$ is a 1'. These constructions define a 1-1 correspondence between $2^{\mathbb{N}}$ and $[0,1]$. •

When studying the subsets of a given set, particularly their measure-theoretic properties, the power set is often too big for anything very interesting or useful to be said about it. The idea behind the following definitions is to specify subsets of $2^X$ that are large enough to be interesting, but whose characteristics may be more tractable. We typically do this by choosing a base collection of sets with known properties, and then specifying certain operations for creating new sets from existing ones. These operations permit an interesting diversity of class members to be generated, but important properties of the sets may be deduced from those of the base collection, as the following examples show.

1.14 Definition A ring $\mathcal{R}$ is a nonempty class of subsets of $X$ satisfying
(a) $\emptyset \in \mathcal{R}$.
(b) If $A$ and $B \in \mathcal{R}$ then $A \cup B \in \mathcal{R}$, $A \cap B \in \mathcal{R}$ and $A - B \in \mathcal{R}$. □

One generates a ring by specifying an arbitrary basic collection $\mathcal{C}$, which must include $\emptyset$, and then declaring that any sets that can be generated by the specified operations also belong to the class. A ring is said to be closed under the operations of union, intersection and difference.

Rings lack a crucial piece of structure, for there is no requirement for the set $X$ itself to be a member. If $X$ is included, a ring becomes a field, or synonymously an algebra. Since $X - A = A^c$, this amounts to including all complements, and, in view of the de Morgan laws, specifying the inclusion of intersections and differences becomes redundant.
1.15 Definition A field $\mathcal{F}$ is a class of subsets of $X$ satisfying
(a) $X \in \mathcal{F}$.
(b) If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.
(c) If $A$ and $B \in \mathcal{F}$ then $A \cup B \in \mathcal{F}$. □

A field is said to be closed under complementation and finite union, and hence under intersections and differences too; none of these operations can take one outside the class. These classes can be very complex, and also very trivial. The simplest case of a ring is $\{\emptyset\}$. The smallest possible field is $\{X, \emptyset\}$. Scarcely less trivial is the field $\{X, A, A^c, \emptyset\}$, where $A$ is any subset of $X$. What makes any class of sets interesting, or not, is the collection $\mathcal{C}$ of sets it is declared to contain, which we can think of as the 'seed' for the class. We speak of the smallest field containing $\mathcal{C}$ as 'the field generated by $\mathcal{C}$'. Rings and fields are natural classes in the sense of being defined in terms of the simple set operations, but their structure is rather restrictive for some of
the applications in probability. More inclusive definitions, carefully tailored to include some important cases, are as follows.

1.16 Definition A semi-ring $\mathcal{S}$ is a non-empty class of subsets of $X$ satisfying
(a) $\emptyset \in \mathcal{S}$.
(b) If $A, B \in \mathcal{S}$ then $A \cap B \in \mathcal{S}$.
(c) If $A, B \in \mathcal{S}$ and $A \subseteq B$, $\exists\, n < \infty$ such that $B - A = \bigcup_{j=1}^{n} C_j$, where $C_j \in \mathcal{S}$ and $C_j \cap C_{j'} = \emptyset$ for each $j \ne j'$. □

More succinctly, condition (c) says that the difference of two $\mathcal{S}$-sets has a finite partition into $\mathcal{S}$-sets.

1.17 Definition A semi-algebra $\mathcal{S}$ is a class of subsets of $X$ satisfying
(a) $X \in \mathcal{S}$.
(b) If $A, B \in \mathcal{S}$ then $A \cap B \in \mathcal{S}$.
(c) If $A \in \mathcal{S}$, $\exists\, n < \infty$ such that $A^c = \bigcup_{j=1}^{n} C_j$, where $C_j \in \mathcal{S}$ and $C_j \cap C_{j'} = \emptyset$ for each $j \ne j'$. □

A semi-ring containing $X$ is a semi-algebra.

1.18 Example Let $X = \mathbb{R}$, and consider the class of all the half-open intervals $I = (a,b]$ for $-\infty < a \le b < +\infty$, together with the empty set. If $I_1 = (a_1,b_1]$ and $I_2 = (a_2,b_2]$, then $I_1 \cap I_2$ is one of $I_1$, $I_2$, $(a_1,b_2]$, $(a_2,b_1]$, and $\emptyset$. And if $I_1 \subseteq I_2$ so that $a_2 \le a_1$ and $b_1 \le b_2$, then $I_2 - I_1$ is one of $\emptyset$, $(a_2,a_1]$, $(b_1,b_2]$, $(a_2,a_1] \cup (b_1,b_2]$, and $I_2$. The conditions defining a semi-ring are therefore satisfied, although not those defining a ring. If we now let $\mathbb{R}$ be a member of the class and follow 1.17, we find that the half-open intervals, plus the unbounded intervals of the form $(-\infty,b]$ and $(a,+\infty)$, plus $\emptyset$ and $\mathbb{R}$, constitute a semi-algebra. □

1.6 Sigma Fields
When we say that a field contains the complements and finite unions, the qualifier finite deserves explanation. It is clear that $A_1, \ldots, A_n \in \mathcal{F}$ implies that $\bigcup_{j=1}^{n} A_j \in \mathcal{F}$ by a simple $n$-fold iteration of pairwise union. But, given the constructive nature of the definition, it is not legitimate without a further stipulation to assume that such an operation can be taken to the limit. By making this additional stipulation, we obtain the concept of a $\sigma$-field.

1.19 Definition A $\sigma$-field ($\sigma$-algebra) $\mathcal{F}$ is a class of subsets of $X$ satisfying
(a) $X \in \mathcal{F}$.
(b) If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.
(c) If $\{A_n, n \in \mathbb{N}\}$ is a sequence of $\mathcal{F}$-sets, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$. □

A $\sigma$-field is closed under the operations of complementation and countable union, and hence, by the de Morgan laws, of countable intersection also. A $\sigma$-ring can be defined similarly, although this is not a concept we shall need in the sequel. Given a collection of sets $\mathcal{C}$, the intersection of all the $\sigma$-fields containing $\mathcal{C}$ is
called the $\sigma$-field generated by $\mathcal{C}$, customarily denoted $\sigma(\mathcal{C})$. The following theorem establishes a basic fact about $\sigma$-fields.

1.20 Theorem If $\mathcal{C}$ is a finite collection $\sigma(\mathcal{C})$ is finite, otherwise $\sigma(\mathcal{C})$ is always uncountable.
Proof Define the relation $R$ between elements of $X$ by '$x\,R\,y$ iff $x$ and $y$ are elements of the same sets of $\mathcal{C}$'. $R$ is an equivalence relation, and hence defines a collection $\mathcal{E}$ of disjoint equivalence classes. Each set of $\mathcal{E}$ is the intersection of all the $\mathcal{C}$-sets containing its elements and the complements of the remainder. (For example, see Fig. 1.1. For this collection of regions of $\mathbb{R}^2$, $\mathcal{E}$ is the partition defined by the complete network of set boundaries.) If $\mathcal{C}$ contains $n$ sets, $\mathcal{E}$ contains at most $2^n$ sets and $\sigma(\mathcal{C})$, in this case the collection of all unions of $\mathcal{E}$-sets, contains at most $2^{2^n}$ sets. This proves the first part of the theorem.

Let $\mathcal{C}$ be infinite. If it is uncountable then so is $\sigma(\mathcal{C})$ and there is nothing more to show, so assume $\mathcal{C}$ is countable. In this case every set in $\mathcal{E}$ is a countable intersection of $\mathcal{C}$-sets or the complements of $\mathcal{C}$-sets, hence $\mathcal{E} \subseteq \sigma(\mathcal{C})$, and hence also $\mathcal{U}(\mathcal{E}) \subseteq \sigma(\mathcal{C})$, where $\mathcal{U}(\mathcal{E})$ is the collection of all the countable unions of $\mathcal{E}$-sets. If we show $\mathcal{U}(\mathcal{E})$ is uncountable, the same will be true of $\sigma(\mathcal{C})$. We may assume that $\mathcal{E}$ is countable, since otherwise there is nothing more to show. So let the sets of $\mathcal{E}$ be indexed by $\mathbb{N}$. Then every union of $\mathcal{E}$-sets corresponds uniquely with a subset of $\mathbb{N}$, and every subset of $\mathbb{N}$ corresponds uniquely to a union of $\mathcal{E}$-sets. In other words, the elements of $\mathcal{U}(\mathcal{E})$ are equipotent with those of $2^{\mathbb{N}}$, which are uncountable by 1.13. This completes the proof. •

1.21 Example Let $X = \mathbb{R}$, and let $\mathcal{C} = \{(-\infty, r], r \in \mathbb{Q}\}$, the collection of closed half-lines with rational endpoints. $\sigma(\mathcal{C})$ is called the Borel field of $\mathbb{R}$, generally denoted $\mathcal{B}$. A number of different base collections generate $\mathcal{B}$.
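The finite case of Theorem 1.20 can be made concrete. The Python sketch below, which assumes a finite set $X$ and a finite list of subsets (so that fields and $\sigma$-fields coincide), groups the points of $X$ by which sets of the collection contain them, giving the partition used in the proof, and then forms every union of the resulting cells. The function names `atoms` and `generated_sigma_field` are hypothetical.

```python
from itertools import combinations


def atoms(X, collection):
    """Partition X into classes of points lying in exactly the same
    sets of `collection` (the partition used in the proof of 1.20)."""
    groups = {}
    for x in X:
        signature = tuple(x in A for A in collection)
        groups.setdefault(signature, set()).add(x)
    return [frozenset(g) for g in groups.values()]


def generated_sigma_field(X, collection):
    """For finite X: the sigma-field generated by `collection`,
    built as the set of all unions of atoms."""
    parts = atoms(X, collection)
    sigma = set()
    for r in range(len(parts) + 1):
        for chosen in combinations(parts, r):
            sigma.add(frozenset().union(*chosen))
    return sigma


# Illustrative data: n = 2 generating sets on a six-point space.
X = frozenset(range(6))
collection = [frozenset({0, 1, 2}), frozenset({2, 3})]
partition = atoms(X, collection)
sigma = generated_sigma_field(X, collection)
```

With these two generating sets the partition has $4 = 2^2$ cells and the generated class has $16 = 2^{2^2}$ members, so the bounds in the theorem are attained in this example.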
Since countable intersections of open intervals can be closed intervals, and countable unions of closed intervals can be open intervals (compare 1.12), the set of open half-lines, $\{(-\infty, r), r \in \mathbb{Q}\}$, will also serve. Or, letting $\{r_n\}$ be a decreasing sequence of rational numbers with $r_n \downarrow x$,

$(-\infty, x] = \bigcap_{n=1}^{\infty} (-\infty, r_n]$.   (1.21)

Such a sequence exists for any $x \in \mathbb{R}$ (see 2.15), and hence the same $\sigma$-field is generated by the (uncountable) collection of half-lines with real endpoints, $\{(-\infty, x], x \in \mathbb{R}\}$. It easily follows that various other collections generate $\mathcal{B}$, including the open intervals of $\mathbb{R}$, the closed intervals, and the half-open intervals. □
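1.22 Example Let $X = \overline{\mathbb{R}}$, the extended real line. The Borel field of $\overline{\mathbb{R}}$ is easily given. It is $\overline{\mathcal{B}} = \{B,\ B \cup \{+\infty\},\ B \cup \{-\infty\},\ B \cup \{+\infty\} \cup \{-\infty\}: B \in \mathcal{B}\}$, where $\mathcal{B}$ is the Borel field of $\mathbb{R}$. You can verify that $\overline{\mathcal{B}}$ is a $\sigma$-field, and is generated by the collection $\mathcal{C}$ of 1.21 augmented by the sets $\{-\infty\}$ and $\overline{\mathbb{R}}$. □

1.23 Example Given an interval $I$ of the line, the class $\mathcal{B}_I = \{B \cap I: B \in \mathcal{B}\}$ is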
called the restriction of $\mathcal{B}$ to $I$, or the Borel field on $I$. In fact, $\mathcal{B}_I$ is the $\sigma$-field generated from the collection $\mathcal{C} = \{(-\infty, r] \cap I: r \in \mathbb{Q}\}$. □

Notice how $\sigma(\mathcal{C})$ has been defined 'from the outside'. It might be thought that $\sigma(\mathcal{C})$ could be defined 'from the inside', in terms of a specified sequence of the operations of complementation and countable union applied to the elements of $\mathcal{C}$. But, despite the constructive nature of the definitions, 1.20 suggests how this may be impossible. Suppose we define $\mathcal{A}_1$ as the set that contains $\mathcal{C}$, together with the complement of every set in $\mathcal{C}$ and all the finite and countable unions of the sets of $\mathcal{C}$. Of course, $\mathcal{A}_1$ is not $\sigma(\mathcal{C})$ because it does not contain the complements of the unions. So let $\mathcal{A}_2$ be the set containing $\mathcal{A}_1$ together with all the complements and finite and countable unions of the sets in $\mathcal{A}_1$. Defining $\mathcal{A}_3, \mathcal{A}_4, \ldots$ in the same manner, it might be thought that the monotone sequence $\{\mathcal{A}_n\}$ would approach $\sigma(\mathcal{C})$ as $n \to \infty$; but in fact this is not so. In the case of the class $\mathcal{B}_{[0,1]}$, for example, it can be shown that $\mathcal{A}_\infty$ is strictly smaller than $\sigma(\mathcal{C})$ (see Billingsley 1986: 26). On the other hand, $\sigma(\mathcal{C})$ may be smaller than $2^X$. This fact is demonstrated, again for $\mathcal{B}_{[0,1]}$, in §3.4.

The union of two $\sigma$-fields (the set of elements contained in either or both of them) is not generally a $\sigma$-field, for the unions of the sets from one field with those from the other are not guaranteed to belong to it. The concept of union for $\sigma$-fields is therefore extended by adding in these sets. Given $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$, the smallest $\sigma$-field containing all the elements of $\mathcal{F}$ and all the elements of $\mathcal{G}$ is denoted $\mathcal{F} \vee \mathcal{G}$, called the union of $\mathcal{F}$ and $\mathcal{G}$. On the other hand, $\mathcal{F} \cap \mathcal{G} = \{A: A \in \mathcal{F} \text{ and } A \in \mathcal{G}\}$ is a $\sigma$-field, although for uniformity the notation $\mathcal{F} \wedge \mathcal{G}$ may be used for such intersections. Formally, $\mathcal{F} \wedge \mathcal{G}$ denotes the largest of the $\sigma$-fields whose elements belong to both $\mathcal{F}$ and $\mathcal{G}$.
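A three-point example shows why the set-theoretic union of two $\sigma$-fields can fail to be a $\sigma$-field, while the intersection always is one. The Python sketch below works on a finite space, where fields and $\sigma$-fields coincide; the helper names (`field_from_partition`, `is_field`, `generate_field`) are assumptions for this illustration, and the closure loop is a brute-force method suitable only for very small spaces.

```python
from itertools import combinations


def field_from_partition(parts):
    """All unions of cells of a finite partition (a field on their union)."""
    parts = [frozenset(p) for p in parts]
    return {frozenset().union(*chosen)
            for r in range(len(parts) + 1)
            for chosen in combinations(parts, r)}


def is_field(X, F):
    """Check the conditions of Definition 1.15 for a finite class F."""
    X = frozenset(X)
    return (X in F
            and all((X - A) in F for A in F)
            and all((A | B) in F for A in F for B in F))


def generate_field(X, collection):
    """Smallest field on finite X containing `collection`, found by
    closing repeatedly under complement and pairwise union."""
    X = frozenset(X)
    field = set(collection) | {X}
    while True:
        new = {X - A for A in field} | {A | B for A in field for B in field}
        if new <= field:
            return field
        field |= new


X = frozenset({1, 2, 3})
F = field_from_partition([{1}, {2, 3}])
G = field_from_partition([{2}, {1, 3}])

union_is_field = is_field(X, F | G)         # {1} | {2} = {1, 2} is missing
intersection_is_field = is_field(X, F & G)  # F & G is the trivial field
join = generate_field(X, F | G)             # plays the role of F v G
```

Here the plain union lacks the set $\{1,2\}$, whereas the generated class contains all eight subsets of $X$.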
Both of these operations generalize to the countable case, so that for a sequence of $\sigma$-fields $\mathcal{F}_n$, $n = 1,2,3,\ldots$ we may define $\bigvee_{n=1}^{\infty} \mathcal{F}_n$ and $\bigwedge_{n=1}^{\infty} \mathcal{F}_n$.

Without going prematurely into too many details, it can be said that a large part of the intellectual labour in probability and measure theory is devoted to proving that particular classes of sets are $\sigma$-fields. Problems of this kind will arise throughout this book. It is usually not too hard to show that $A^c \in \mathcal{F}$ whenever $A \in \mathcal{F}$, but the requirement to show that a class contains the countable unions can be tough to fulfil. The following material can be helpful in this connection.

A monotone class $\mathcal{M}$ is a class of sets such that, if $\{A_n\}$ is a monotone sequence with limit $A$, and $A_n \in \mathcal{M}$ for all $n$, then $A \in \mathcal{M}$. If $\{A_n\}$ is non-decreasing, then $A = \bigcup_{n=1}^{\infty} A_n$. If it is non-increasing, then $A = \bigcap_{n=1}^{\infty} A_n$. The next theorem shows that, to determine whether or not we are dealing with a $\sigma$-field, it is sufficient to consider whether the limits of monotone sequences belong to it, which should often be easier to establish than the general case.

1.24 Theorem $\mathcal{F}$ is a $\sigma$-field iff it is both a field and a monotone class.
Proof The 'only if' part of the theorem is immediate. For the 'if' part, define $A_n = \bigcup_{m=1}^{n} E_m$, for any sequence $\{E_m \in \mathcal{F}, m \in \mathbb{N}\}$. Since $\mathcal{F}$ is a field, $A_n \in \mathcal{F}$ for any $n$. But $\{A_n, n \in \mathbb{N}\}$ is a monotone sequence with limit $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$, by ...
0. An open set is a set $A \subseteq \mathbb{R}$ such that for each $x \in A$, there exists for some $\varepsilon > 0$ an $\varepsilon$-neighbourhood which is a subset of $A$. The open intervals defined in §1.3 are open sets since if $a < x < b$, $\varepsilon = \min\{|b - x|, |a - x|\} > 0$ satisfies the definition. $\mathbb{R}$ and $\emptyset$ are also open sets on the definition.

The concept of an open set is subtle and often gives beginners some difficulty. Naive intuition strongly favours the notion that in any bounded set of points there ought to be one that is 'next to' a point outside the set. But open sets are sets that do not have this property, and there is no shortage of them in $\mathbb{R}$. For a complete understanding of the issues involved we need the additional concepts of Cauchy sequence and limit, to appear in §2.2 below. Doubters are invited to suspend their disbelief for the moment and just take the definition at face value.

The collection of all the open sets of $\mathbb{R}$ is known as the topology of $\mathbb{R}$. More precisely, we ought to call this the usual topology on $\mathbb{R}$, since other ways of defining open sets of $\mathbb{R}$ can be devised, although these will not concern us. (See Chapter 6 for more information on these matters.) More generally, we can discuss subsets of $\mathbb{R}$ from a topological standpoint, although we would tend to use the term subspace rather than subset in this context. If $A \subseteq \mathbb{S} \subseteq \mathbb{R}$, we say that $A$ is open in $\mathbb{S}$ if for each $x \in A$ there exists $S(x,\varepsilon)$, $\varepsilon > 0$, such that $S(x,\varepsilon) \cap \mathbb{S}$ is a subset of $A$. Thus, the interval $[0,1)$ is not open in $\mathbb{R}$, but it is open in $[0,1]$. These sets define the relative topology on $\mathbb{S}$, that is, the topology on $\mathbb{S}$ relative to $\mathbb{R}$. The following result is an immediate consequence of the definition.

2.1 Theorem If $A$ is open in $\mathbb{R}$, $A \cap \mathbb{S}$ is open in the relative topology on $\mathbb{S}$. □

A closure point of a set $A$ is a point $x \in \mathbb{R}$ such that, for every $\varepsilon > 0$, the set
Limits and Continuity
$A \cap S(x,\varepsilon)$ is not empty. The closure points of $A$ are not necessarily elements of $A$, open sets being a case in point. The set of closure points of $A$ is called the closure of $A$, and will be denoted $\bar{A}$ or sometimes $(A)^-$ if the set is defined by an expression. On the other hand, an accumulation point of $A$ is a point $x \in \mathbb{R}$ which is a closure point of the set $A - \{x\}$. An accumulation point has other points of $A$ arbitrarily close to it, and if $x$ is a closure point of $A$ and $x \notin A$, it must also be an accumulation point. A closure point that is not an accumulation point (the former definition being satisfied because each $\varepsilon$-neighbourhood of $x$ contains $x$ itself) is an isolated point of $A$.

A boundary point of a set $A$ is a point $x \in \bar{A}$ such that the set $A^c \cap S(x,\varepsilon)$ is not empty for any $\varepsilon > 0$. The set of boundary points of $A$ is denoted $\partial A$, and $\bar{A} = A \cup \partial A$. The interior of $A$ is the set $A^o = A - \partial A$. A closed set is one containing all its closure points, i.e. a set $A$ such that $A = \bar{A}$. For an open interval $A = (a,b) \subset \mathbb{R}$, $\bar{A} = [a,b]$. Every point of $(a,b)$ is a closure point, and $a$ and $b$ are also closure points, not belonging to $(a,b)$. They are the boundary points of both $(a,b)$ and $[a,b]$.

2.2 Theorem The complement of an open set in $\mathbb{R}$ is a closed set. □

This gives an alternative definition of a closed set. According to the definitions, $\emptyset$ (the empty set) and $\mathbb{R}$ are both open and closed. The half-line $(-\infty,x]$ is the complement of the open set $(x,+\infty)$ and is hence closed. Extending this result to relative topologies, we have the following.

2.3 Theorem If $A$ is open in $\mathbb{S} \subset \mathbb{R}$, then $\mathbb{S} - A$ is closed in $\mathbb{S}$. □

In particular, a corollary to 2.1 is that if $B$ is closed in $\mathbb{R}$ then $\mathbb{S} \cap B$ is closed in $\mathbb{S}$. But, for example, the interval $[\tfrac{1}{2},1)$ is not closed in $\mathbb{R}$, although it is closed in the set $(0,1)$, since its complement $(0,\tfrac{1}{2})$ is open in $(0,1)$.

Some additional properties of open sets are given in the following theorems.

2.4 Theorem
(i) The union of a collection of open sets is open.
(ii) If $A$ and $B$ are open, then $A \cap B$ is open. □

This result is proved in a more general context below, as 5.4. Arbitrary intersections of open sets need not be open. See 1.12 for a counter-example.

2.5 Theorem Every open set $A \subseteq \mathbb{R}$ is the union of a countable collection of disjoint open intervals.
Proof Consider a collection $\{S(x,\varepsilon_x), x \in A\}$, where for each $x$, $\varepsilon_x > 0$ is chosen small enough that $S(x,\varepsilon_x) \subseteq A$. Then $\bigcup_{x \in A} S(x,\varepsilon_x) \subseteq A$, but, since necessarily $A \subseteq \bigcup_{x \in A} S(x,\varepsilon_x)$, it follows that $\bigcup_{x \in A} S(x,\varepsilon_x) = A$. This shows that $A$ is a union of open intervals. Now define a relation $R$ for elements of $A$, such that $x\,R\,y$ if there exists an open interval $I \subseteq A$ with $x \in I$ and $y \in I$. Every $x \in A$ is contained in some interval by the preceding argument, so that $x\,R\,x$ for all $x \in A$. The symmetry of $R$ is obvious. Lastly, if $x,y \in I \subseteq A$ and $y,z \in I' \subseteq A$, $I \cap I'$ is nonempty and hence $I \cup I'$ is also an open interval, so $R$ is transitive. Hence $R$ is an equivalence
relation, and its equivalence classes are open intervals that partition $A$. Thus, $A$ is a union of disjoint open intervals. The theorem now follows from 1.11. •

Recall from 1.21 that $\mathcal{B}$, the Borel field of $\mathbb{R}$, is the $\sigma$-field of sets generated by both the open and the closed half-lines. Since every interval is the intersection of a half-line (open or closed) with the complement of another half-line, 2.2 and 2.5 yield directly the following important fact.

2.6 Theorem $\mathcal{B}$ contains the open sets and the closed sets of $\mathbb{R}$. □

A collection $\mathcal{C}$ is called a covering for a set $A \subseteq \mathbb{R}$ if $A \subseteq \bigcup_{B \in \mathcal{C}} B$. If each $B$ is an open set, it is called an open covering.

2.7 Lindelöf's covering theorem If $\mathcal{C}$ is any collection of open subsets of $\mathbb{R}$, there is a countable subcollection $\{B_i \in \mathcal{C}, i \in \mathbb{N}\}$ such that

$\bigcup_{B \in \mathcal{C}} B = \bigcup_{i=1}^{\infty} B_i$.   (2.1)
Proof Consider the collection $\mathcal{S} = \{S_k = S(r_k,s_k),\ r_k \in \mathbb{Q},\ s_k \in \mathbb{Q}^+\}$; that is, the collection of all neighbourhoods of rational points of $\mathbb{R}$, having rational radii. The set $\mathbb{Q} \times \mathbb{Q}^+$ is countable by 1.5, and hence $\mathcal{S}$ is countable; in other words, indexing by $k \in \mathbb{N}$ exhausts the set. We show that, for any open set $B \subseteq \mathbb{R}$ and point $x \in B$, there is a set $S_k \in \mathcal{S}$ such that $x \in S_k \subseteq B$. Since $x$ has an $\varepsilon$-neighbourhood inside $B$ by definition, the desired $S_k$ is found by setting $s_k$ to any rational from the open interval $(0,\tfrac{1}{2}\varepsilon)$, for $\varepsilon > 0$ sufficiently small, and then choosing $r_k \in S(x,\tfrac{1}{2}\varepsilon)$ as is possible by 1.10. Now for each $x \in \bigcup_{B \in \mathcal{C}} B$ choose a member of $\mathcal{S}$, say $S_{k(x)}$, satisfying $x \in S_{k(x)} \subseteq B$ for some $B \in \mathcal{C}$. Letting $k(x)$ be the smallest index which satisfies the requirement gives an unambiguous choice. The distinct members of this collection form a set that covers $\bigcup_{B \in \mathcal{C}} B$, but is a subset of $\mathcal{S}$ and hence countable. Labelling the indices of this set as $k_1, k_2, \ldots$, choose $B_i$ as any member of $\mathcal{C}$ containing $S_{k_i}$. Clearly, $\bigcup_{i=1}^{\infty} B_i$ is a countable covering for $\bigcup_{i=1}^{\infty} S_{k_i}$, and hence also for $\bigcup_{B \in \mathcal{C}} B$. •

It follows that, if $\mathcal{C}$ is a covering for a set in $\mathbb{R}$, it contains a countable subcovering. This is sometimes called the Lindelöf property.

The concept of a covering leads on to the crucial notion of compactness. A set $A$ is said to be compact if every open covering of $A$ contains a finite subcovering. The words that matter in this definition are 'every' and 'open'. Any open covering that has $\mathbb{R}$ as a member obviously contains a finite subcovering. But for a set to be compact, there must be no way to construct an irreducible, infinite open covering. Moreover, every interval has an irreducible infinite cover, consisting of the singleton sets of its individual points; but these sets are not open.

2.8 Example Consider the half-open interval $(0,1]$. An open covering is the countable collection $\{(1/n, 1], n \in \mathbb{N}\}$. It is easy to see that there is no finite subcollection covering $(0,1]$ in this case, so $(0,1]$ is not compact. □

A set $A$ is bounded if $A \subseteq S(x,\varepsilon)$ for some $x \in A$ and $\varepsilon > 0$. The idea here is that $\varepsilon$ is a possibly large but finite number. In other words, a bounded set must be
containable within a finite interval.

2.9 Theorem A set in $\mathbb{R}$ is compact iff it is closed and bounded. □

This can be proved as a case of 5.12 below, and provides an alternative definition of compactness in $\mathbb{R}$. The sufficiency part is known as the Heine-Borel theorem.

A subset $B$ of $A$ is said to be dense in $A$ if $B \subseteq A \subseteq \bar{B}$. Readers may think they know what is implied here after studying the following theorem, but denseness is a slightly tricky notion. See also 2.15 and the remarks following before coming to any premature conclusions.

2.10 Theorem Let $A$ be an interval of $\mathbb{R}$, and $C \subseteq A$ be a countable set. Then $A - C$ is dense in $A$.
Proof By 1.7, each neighbourhood of a point in $A$ contains an uncountable number of points. Hence for each $x \in A$ (whether or not $x \in C$), the set $(A - C) \cap S(x,\varepsilon)$ is not empty for every $\varepsilon > 0$, so that $x$ is a closure point of $A - C$. Thus, $A - C \subseteq (A - C) \cup C = A \subseteq \overline{(A - C)}$. •
The $k$-fold Cartesian product of $\mathbb{R}$ with copies of itself generates what is called Euclidean $k$-space, $\mathbb{R}^k$. The points of $\mathbb{R}^k$ have the interpretation of $k$-vectors, or ordered $k$-tuples of real numbers, $x = (x_1, x_2, \ldots, x_k)'$. All the concepts defined above for sets in $\mathbb{R}$ generalize directly to $\mathbb{R}^k$. The only modification required is to replace the scalars $x$ and $y$ by vectors $x$ and $y$, and define an $\varepsilon$-neighbourhood in a new way. Let $\|x - y\|$ be the Euclidean distance between $x$ and $y$, where $\|a\| = \left[\sum_{i=1}^{k} a_i^2\right]^{1/2}$ is the length of the vector $a = (a_1, \ldots, a_k)'$, and then define $S(x,\varepsilon) = \{y: \|x - y\| < \varepsilon\}$, for some $\varepsilon > 0$. An open set $A$ of $\mathbb{R}^2$ is one in which every point $x \in A$ can be contained in an open disk with positive radius centred on $x$. In $\mathbb{R}^3$ the open disk becomes an open sphere, and so on.
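The definition of an $\varepsilon$-neighbourhood in $\mathbb{R}^k$ translates directly into code. The following Python sketch, with hypothetical function names and vectors represented as plain sequences of floats, computes the Euclidean length $\|a\|$ and tests membership of $S(x,\varepsilon)$ using the strict inequality of the definition.

```python
import math


def euclidean_norm(a):
    """Length of the vector a: the square root of the sum of squares."""
    return math.sqrt(sum(ai * ai for ai in a))


def in_neighbourhood(y, x, eps):
    """True if y lies in S(x, eps) = {y : ||x - y|| < eps}."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    if eps <= 0:
        raise ValueError("eps must be positive")
    return euclidean_norm([yi - xi for xi, yi in zip(x, y)]) < eps


centre = (0.0, 0.0)
inside = in_neighbourhood((0.5, 0.5), centre, 1.0)
on_boundary = in_neighbourhood((1.0, 0.0), centre, 1.0)
```

Because the inequality is strict, a point at distance exactly $\varepsilon$ from the centre is excluded, which is what makes the neighbourhood an open disk in $\mathbb{R}^2$.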
2.2 Sequences and Limits
A real sequence is a mapping from $\mathbb{N}$ into $\mathbb{R}$. The elements of the domain are called the indices and those of the range variously the terms, members, or coordinates of the sequence. We will denote a sequence either by $\{x_n, n \in \mathbb{N}\}$, or more briefly by $\{x_n\}_1^\infty$, or just by $\{x_n\}$ when the context is clear. $\{x_n\}_1^\infty$ is said to converge to a limit $x$, if for every $\varepsilon > 0$ there is an integer $N_\varepsilon$ for which

$|x_n - x| < \varepsilon$ for all $n > N_\varepsilon$.   (2.2)

Write $x_n \to x$, or $x = \lim_{n \to \infty} x_n$. When a sequence is tending to $+\infty$ or $-\infty$ it is often said to diverge, but it may also be said to converge in $\overline{\mathbb{R}}$, to distinguish these from the cases when it does not approach any fixed value, but is always wandering.

A sequence is monotone (non-decreasing, increasing, non-increasing, or decreasing) if one of the inequalities $x_n \le x_{n+1}$, $x_n < x_{n+1}$, $x_n \ge x_{n+1}$, or $x_n > x_{n+1}$ holds for every $n$. To indicate that a monotone sequence is converging, one may write for emphasis either $x_n \uparrow x$ or $x_n \downarrow x$, as appropriate, although $x_n \to x$ will
also do in both cases. The following result does not require elaboration.

2.11 Theorem Every monotone sequence in a compact set converges. □

A sequence that does not converge may none the less visit the same point an infinite number of times, so exhibiting a kind of convergent behaviour. If $\{x_n, n \in \mathbb{N}\}$ is a real sequence, a subsequence is $\{x_{n_k}, k \in \mathbb{N}\}$ where $\{n_k, k \in \mathbb{N}\}$ is any increasing sequence of positive integers. If there exists a subsequence $\{x_{n_k}, k \in \mathbb{N}\}$ and a constant $c$ such that $x_{n_k} \to c$, $c$ is called a cluster point of the sequence. For example, the sequence $\{(-1)^n, n = 1,2,3,\ldots\}$ does not converge, but the subsequence obtained by taking only even values of $n$ converges trivially. $c$ is usually a finite constant, but $+\infty$ and $-\infty$ may be cluster points of a sequence if we allow the notion of convergence in $\overline{\mathbb{R}}$. If a subsequence is convergent, then so is any subsequence of the subsequence, defined as $\{x_{m_k}, k \in \mathbb{N}\}$ where $\{m_k\}$ is an increasing sequence whose members are also members of $\{n_k\}$.

The concept of a subsequence is often useful in arguments concerning convergence. A typical line of reasoning employs a two-pronged attack; first one identifies a convergent subsequence (a monotone sequence, perhaps); then one uses other characteristics of the sequence to show that the cluster point is actually a limit. Especially useful in this connection is the knowledge that the members of the sequence are points in a compact set. Such sequences cannot diverge to infinity, since the set is bounded; and because the set is closed, any limit points or cluster points that exist must be in the set. Specifically, we have two useful results.

2.12 Theorem Every sequence in a compact set of $\mathbb{R}$ has at least one cluster point.

Proof A monotone sequence converges in a compact set by 2.11. We show that every sequence $\{x_n, n \in \mathbb{N}\}$ has a monotone subsequence. Define a subsequence $\{x_{n_k}\}$ as follows. Set $n_1 = 1$, and for $k = 1,2,3,\ldots$
let $x_{n_{k+1}} = \sup_{n > n_k} x_n$ if there exists a finite $n_{k+1}$ satisfying this condition; otherwise let the subsequence terminate at $n_k$. This subsequence is non-increasing. If it terminates, the subsequence $\{x_n, n \ge n_k\}$ must contain a non-decreasing subsequence. A monotone subsequence therefore exists in every case. ■

2.13 Theorem A sequence in a compact set either has two or more cluster points, or it converges.

Proof Suppose that $c$ is the unique cluster point of the sequence $\{x_n\}$, but that $x_n \not\to c$. Then there is an infinite set of integers $\{n_k, k \in \mathbb{N}\}$ such that $|x_{n_k} - c| \ge \varepsilon$ for some $\varepsilon > 0$. Define a sequence $\{y_k\}$ by setting $y_k = x_{n_k}$. Since $\{y_k\}$ is also a sequence on a compact set, it has a cluster point $c'$ which by construction is different from $c$. But $c'$ is also a cluster point of $\{x_n\}$, of which $\{y_k\}$ is a subsequence, which is a contradiction. Hence, $x_n \to c$. ■

2.14 Example Consider the sequence $\{1, x, x^2, x^3, \ldots, x^n, \ldots\}$, or more formally $\{x^n, n \in \mathbb{N}_0\}$, where $x$ is a real number. In the case $|x| < 1$, this sequence converges to zero, $\{|x|^n\}$ being monotone on the compact interval $[0,1]$. The condition specified
in (2.2) is satisfied for $N_\varepsilon = \log(\varepsilon)/\log|x|$ in this case. If $x = 1$ it converges to 1, trivially. If $x > 1$ it diverges in $\mathbb{R}$, but converges in $\overline{\mathbb{R}}$ to $+\infty$. If $x = -1$ it neither converges nor diverges, but oscillates between cluster points $+1$ and $-1$. Finally, if $x < -1$ the sequence diverges in $\mathbb{R}$, but does not converge in $\overline{\mathbb{R}}$. Ultimately, it oscillates between the cluster points $+\infty$ and $-\infty$. □

We may discuss the asymptotic behaviour of a real sequence even when it has no limit. The superior limit of a sequence $\{x_n\}$ is
\[ \limsup_n x_n = \inf_n \sup_{m \ge n} x_m. \tag{2.3} \]
[...] for every $\varepsilon > 0$ there exists $N_\varepsilon$ such that $|x_n - x_m| < \varepsilon$ whenever $n > N_\varepsilon$ and $m > N_\varepsilon$. A sequence satisfying this criterion is called a Cauchy sequence. Any sequence satisfying (2.2) is a Cauchy sequence, and conversely, a real Cauchy sequence must possess a limit in $\mathbb{R}$. The two definitions are therefore equivalent (in $\mathbb{R}$, at least), but the Cauchy condition may be easier to verify in practice.

The limit of a Cauchy sequence whose members all belong to a set $A$ is by definition a closure point of $A$, though it need not itself belong to $A$. Conversely, for every accumulation point $x$ of a set $A$ there must exist a Cauchy sequence in the set whose limit is $x$. Construct such a sequence by taking one point from each of the sequence of sets, $\{A \cap S(x, 1/n),\ n = 1,2,3,\ldots\}$,
none of which are empty by definition. The term limit point is sometimes used synonymously with accumulation point.

The following is a fundamental property of the reals.

2.15 Theorem Every real number is the limit of a Cauchy sequence of rationals.

Proof For finite $n$ let $x_n$ be a number whose decimal expansion consists only of zeros beyond the $n$th place in the sequence. If the decimal point appears at position $m$, with $m > n$, then $x_n$ is an integer. If $m \le n$, removing the decimal point produces a finite integer $a$, and $x_n = a/10^{n-m}$, so $x_n$ is rational. Given any real $x$, a sequence of rationals $\{x_n\}$ is obtained by replacing with a zero every digit in the decimal expansion of $x$ beyond the $n$th, for $n = 1, 2, \ldots$ Since $|x_{n+1} - x_n| < 10^{-n}$, $\{x_n\}$ is a Cauchy sequence and $x_n \to x$ as $n \to \infty$. ■

The sequence exhibited is increasing, but a decreasing sequence can also be constructed, as $\{-y_n\}$ where $\{y_n\}$ is an increasing sequence tending to $-x$. If $x$ is itself rational, this construction works by putting $x_n = x$ for every $n$, which trivially defines a Cauchy sequence, but certain arguments such as in 2.16 below depend on having $x_n \ne x$ for every $n$. To satisfy this requirement, choose the 'non-terminating' representation of the number; for example, instead of 1 take 0.9999999..., and consider the sequence $\{0.9, 0.99, 0.999, \ldots\}$. This does not work for the point 0, but then one can choose $\{0.1, 0.01, 0.001, \ldots\}$.

One interesting corollary of 2.15 is that, since every $\varepsilon$-neighbourhood of a real number must contain a rational, $\mathbb{Q}$ is dense in $\mathbb{R}$. We also showed in 2.10 that $\mathbb{R} - \mathbb{Q}$ is dense in $\mathbb{R}$, since $\mathbb{Q}$ is countable. We must be careful not to jump to the conclusion that because a set is dense, its complement must be 'sparse'.

Another version of this proof, at least for points of the interval $[0,1]$, is got by using the binary expansion of a real number. The dyadic rationals are the set
\[ \mathbb{D} = \{i/2^n,\ i = 1, \ldots, 2^n - 1,\ n \in \mathbb{N}\}. \]
(2.5)
The dyadic rationals corresponding to a finite $n$ define a covering of $[0,1]$ by intervals of width $1/2^n$, which are bisected each time $n$ is incremented. For any $x \in [0,1]$, a point of the set $\{i/2^n,\ i = 1, \ldots, 2^n - 1\}$ is contained in $S(x,\varepsilon)$ for any $\varepsilon > 1/2^n$, so the dyadic rationals are dense in $[0,1]$. $\mathbb{D}$ is a convenient analytic tool when we need to define a sequence of partitions of an interval that is becoming dense in the limit, and will often appear in the sequel.

Another set of useful applications concern set limits in $\mathbb{R}$.

2.16 Theorem Every open interval is the limit of a sequence of closed subintervals with rational endpoints.

Proof If $(a,b)$ is the interval, with $a < b$, choose Cauchy sequences of rationals $a_n \downarrow a$ and $b_n \uparrow b$, with $a_1 < b_1$ (always possible by 1.10). By definition, for every $x \in (a,b)$ there exists $N \ge 1$ such that $x \in [a_n,b_n]$ for all $n \ge N$, and hence $(a,b) \subseteq \liminf_n [a_n,b_n]$. On the other hand, since $a_n > a$ and $b > b_n$, $(a,b)^c \subseteq [a_n,b_n]^c$ for all $n \ge 1$, so that $(a,b)^c \subseteq \liminf_n [a_n,b_n]^c$. This is equivalent to $\limsup_n [a_n,b_n] \subseteq (a,b)$. Hence $\lim_n [a_n,b_n]$ exists and is equal to $(a,b)$. ■
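The density of the dyadic rationals in $[0,1]$ can be checked numerically. The following is a minimal sketch rather than anything taken from the text; the helper name `nearest_dyadic` is hypothetical. For a given level $n$ it returns the point $i/2^n$, $1 \le i \le 2^n - 1$, closest to $x$, and the gap never exceeds the mesh width $1/2^n$.

```python
from fractions import Fraction

def nearest_dyadic(x, n):
    """Return the point i/2**n, 1 <= i <= 2**n - 1, closest to x in [0, 1]."""
    scale = 2 ** n
    i = round(x * scale)              # nearest integer multiple of 1/2**n
    i = min(max(i, 1), scale - 1)     # clamp into the admissible index range
    return Fraction(i, scale)

x = Fraction(1, 3)                    # a non-dyadic point of [0, 1]
for n in (1, 2, 4, 8, 16):
    d = nearest_dyadic(x, n)
    # the distance to x is bounded by the mesh width 1/2**n
    print(n, d, float(abs(d - x)))
```

Exact rational arithmetic is used so that the bound can be compared without floating-point rounding.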
This shows that the limits of sequences of open sets need not be open, nor the limits of sequences of closed sets closed (take complements above). The only hard and fast rules we may lay down are the following corollaries of 2.4(i): the limit of a non-decreasing sequence of open sets is open, and (by complements) the limit of a non-increasing sequence of closed sets is closed.

2.3 Functions and Continuity
A function of a real variable is a mapping $f: \mathbb{S} \mapsto \mathbb{T}$, where $\mathbb{S} \subseteq \mathbb{R}$, and $\mathbb{T} \subseteq \mathbb{R}$. By specifying a subset of $\mathbb{R}$ as the codomain, we imply without loss of generality that $f(\mathbb{S}) = \mathbb{T}$, such that the mapping is onto $\mathbb{T}$.

Consider the image in $\mathbb{T}$, under $f$, of a Cauchy sequence $\{x_n\}$ in $\mathbb{S}$ converging to $x$. If the image of every such sequence converging to $x \in \mathbb{S}$ is a Cauchy sequence in $\mathbb{T}$ converging to $f(x)$, the function is said to be continuous at $x$. Continuity is formally defined, without invoking sequences explicitly, using the $\varepsilon$-$\delta$ approach. $f$ is continuous at the point $x \in \mathbb{S}$ if for any $\varepsilon > 0$ $\exists\ \delta > 0$ such that $|y - x| < \delta$ implies $|f(y) - f(x)| < \varepsilon$, whenever $y \in \mathbb{S}$. The choice of $\delta$ here may depend on $x$. If $f$ is continuous at every point of $\mathbb{S}$, it is simply said to be continuous on $\mathbb{S}$.

Perhaps the chief reason why continuity matters is the following result.

2.17 Theorem If $f: \mathbb{S} \mapsto \mathbb{T}$ is continuous at all points of $\mathbb{S}$, $f^{-1}(A)$ is open in $\mathbb{S}$ whenever $A$ is open in $\mathbb{T}$, and $f^{-1}(A)$ is closed in $\mathbb{S}$ whenever $A$ is closed in $\mathbb{T}$. □

This important result has several generalizations, of which one, the extension to vector functions, is given in the next section. A proof will be given in a still more general context below; see 5.19.

Continuity does not ensure that $f(A)$ is open when $A$ is open. A mapping with this property is called an open mapping, although, since $f(A^c) \ne f(A)^c$ in general, we cannot assume that an open mapping is also a closed mapping, taking closed sets to closed sets. However, a homeomorphism is a function which is 1-1 onto, continuous, and has a continuous inverse. If $f$ is a homeomorphism so is $f^{-1}$, and hence by 2.17 it is both an open mapping and a closed mapping. It therefore preserves the structure of neighbourhoods, so that, if two points are close in the domain, their images are always close in the range. Such a transformation amounts to a relabelling of axes.

If $f(x + h)$ has a limit as $h \downarrow 0$, this is denoted $f(x+)$.
Likewise, $f(x-)$ denotes the limit of $f(x - h)$. It is not necessary to have $x \in \mathbb{S}$ for these limits to exist, but if $f(x)$ exists, there is a weaker notion of continuity at $x$. $f$ is said to be right-continuous at the point $x \in \mathbb{S}$ if, for any $\varepsilon > 0$, $\exists\ \delta > 0$ such that whenever $0 \le h < \delta$ and $x + h \in \mathbb{S}$,
\[ |f(x + h) - f(x)| < \varepsilon. \tag{2.6} \]
It is said to be left-continuous at $x$ if, for any $\varepsilon > 0$, $\exists\ \delta > 0$ such that whenever $0 \le h < \delta$ and $x - h \in \mathbb{S}$,
\[ |f(x) - f(x - h)| < \varepsilon. \tag{2.7} \]
Right continuity at $x$ implies $f(x) = f(x+)$ and left continuity at $x$ implies $f(x) = f(x-)$. If $f(x) = f(x+) = f(x-)$, the function is continuous at $x$.

Continuity is the property of a point $x$, not of the function $f$ as a whole. Despite continuity holding pointwise on $\mathbb{S}$, the property may none the less break down as certain points are approached.

2.18 Example Consider $f(x) = 1/x$, with $\mathbb{S} = \mathbb{T} = (0,\infty)$. For $\varepsilon > 0$,
\[ |f(x + \delta) - f(x)| = \frac{\delta}{x(x + \delta)} < \varepsilon \quad\text{iff}\quad \delta < \frac{\varepsilon x^2}{1 - \varepsilon x}, \]
and hence the choice of $\delta$ depends on both $\varepsilon$ and $x$. $f(x)$ is continuous for all $x > 0$, but not in the limit as $x \to 0$. □

The function $f: \mathbb{S} \mapsto \mathbb{T}$ is uniformly continuous if for every $\varepsilon > 0$ $\exists\ \delta > 0$ such that
\[ |x - y| < \delta \ \Rightarrow\ |f(x) - f(y)| < \varepsilon \tag{2.8} \]
for every $x,y \in \mathbb{S}$. In 2.18 the function is not uniformly continuous, for whichever $\delta$ is chosen, we can pick $x$ small enough to invalidate the definition. The problem arises because the set on which the function is defined is open and the boundary point is a discontinuity. Another class of cases that gives difficulty is the one where the domain is unbounded, and continuity at $x$ is breaking down as $x \to \infty$. However, we have the following result.

2.19 Theorem If a function is continuous everywhere on a compact set $\mathbb{S}$, then it is bounded and uniformly continuous on $\mathbb{S}$. □

(For proof, see 5.20 and 5.21.)

Continuity is the weakest concept of smoothness of a function. So-called Lipschitz conditions provide a whole class of smoothness properties. A function $f$ is said to satisfy a Lipschitz condition at a point $x$ if, for any $y \in S(x,\delta)$ for some $\delta > 0$, $\exists\ M > 0$ such that
\[ |f(y) - f(x)| \le M h(|x - y|) \tag{2.9} \]
where $h: \mathbb{R}^+ \mapsto \mathbb{R}^+$ satisfies $h(d) \downarrow 0$ as $d \downarrow 0$. $f$ is said to satisfy a uniform Lipschitz condition if condition (2.9) holds, with fixed $M$, for all $x,y \in \mathbb{S}$. The type of smoothness imposed depends on the function $h$. Continuity (resp. uniform continuity) follows from the Lipschitz (resp. uniform Lipschitz) property for any choice of $h$.
Implicit in continuity is the idea that some function $\delta(\cdot): \mathbb{R}^+ \mapsto \mathbb{R}^+$ exists satisfying $\delta(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$. This is equivalent to the Lipschitz condition holding for some $h(\cdot)$, the case $h = \delta^{-1}$. By imposing some degree of smoothness on $h$ (making it a positive power of the argument, for example) we impose a degree of smoothness on the function, forbidding sharp 'corners'.

The next smoothness concept is undoubtedly well known to the reader, although differential calculus will play a fairly minor role here. Let a function $f: \mathbb{S} \mapsto \mathbb{T}$ be continuous at $x \in \mathbb{S}$. If
\[ f'_-(x) = \lim_{h \uparrow 0} \left( \frac{f(x + h) - f(x)}{h} \right) \tag{2.10} \]
exists, $f'_-(x)$ is called the left-hand derivative of $f$ at $x$. The right-hand derivative, $f'_+(x)$, is defined correspondingly for the case $h \downarrow 0$. If $f'_-(x) = f'_+(x)$, the common value is called the derivative of $f$ at $x$, denoted $f'(x)$ or $df/dx$, and $f$ is said to be differentiable at $x$. If $f': \mathbb{S} \mapsto \mathbb{R}$ is a continuous function, $f$ is said to be continuously differentiable on $\mathbb{S}$.

A function $f$ is said to be non-decreasing (resp. increasing) if $f(y) \ge f(x)$ (resp. $f(y) > f(x)$) whenever $y > x$. It is non-increasing (resp. decreasing) if $-f$ is non-decreasing (resp. increasing). A monotone function is either non-decreasing or non-increasing.

When the domain is an interval we have yet another smoothness condition. A function $f: [a,b] \mapsto \mathbb{R}$ is of bounded variation if $\exists\ M < \infty$ such that for every partition of $[a,b]$ by finite collections of points $a = x_0 < x_1 < \ldots < x_n = b$,
\[ \sum_{i=1}^n |f(x_i) - f(x_{i-1})| \le M. \tag{2.11} \]
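The sum in (2.11) is easy to evaluate for a given partition. The sketch below is illustrative only; the function names `variation` and `uniform_partition` and the two test functions are assumptions introduced here, not part of the text. For a monotone function the sum equals $|f(b) - f(a)|$ whatever the partition, and for a function that is Lipschitz with $h(d) = d$ and constant $M$ it cannot exceed $M(b - a)$.

```python
def variation(f, partition):
    """Sum of |f(x_i) - f(x_{i-1})| over consecutive points of the partition."""
    return sum(abs(f(b) - f(a)) for a, b in zip(partition, partition[1:]))

def uniform_partition(a, b, n):
    """n + 1 equally spaced points a = x_0 < ... < x_n = b."""
    return [a + (b - a) * i / n for i in range(n + 1)]

square = lambda x: x * x            # monotone on [0, 1]
wiggle = lambda x: abs(x - 0.5)     # Lipschitz with M = 1, not monotone

for n in (2, 10, 100):
    p = uniform_partition(0.0, 1.0, n)
    # both sums stay bounded by 1 however fine the partition
    print(n, variation(square, p), variation(wiggle, p))
```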
2.20 Theorem If and only if $f$ is of bounded variation, there exist non-decreasing functions $f_1$ and $f_2$ such that $f = f_2 - f_1$. □

(For proof see Apostol 1974: Ch. 6.) A function that satisfies the uniform Lipschitz condition on $[a,b]$ with $h(|x - y|) = |x - y|$ is of bounded variation on $[a,b]$.

2.4 Vector Sequences and Functions
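A sequence $\{\mathbf{x}_n\}$ of real $k$-vectors is said to converge to a limit $\mathbf{x}$ if for every $\varepsilon > 0$ there is an integer $N_\varepsilon$ for which
\[ \|\mathbf{x}_n - \mathbf{x}\| < \varepsilon \ \text{ for all } n > N_\varepsilon. \tag{2.12} \]
The sequence is called a Cauchy sequence in $\mathbb{R}^k$ iff $\|\mathbf{x}_n - \mathbf{x}_m\| < \varepsilon$ whenever $n > N_\varepsilon$ and $m > N_\varepsilon$.

A function $f: \mathbb{S} \mapsto \mathbb{T}$, where $\mathbb{S} \subseteq \mathbb{R}^k$, and $\mathbb{T} \subseteq \mathbb{R}$, associates each point of $\mathbb{S}$ with a unique point of $\mathbb{T}$. Its graph is the subset of $\mathbb{S} \times \mathbb{T}$ consisting of the $(k + 1)$-vectors $\{\mathbf{x}, f(\mathbf{x})\}$ for each $\mathbf{x} \in \mathbb{S}$. $f$ is continuous at $\mathbf{x} \in \mathbb{S}$ if for any $\varepsilon > 0$ $\exists\ \delta > 0$ such that
\[ \|\mathbf{b}\| < \delta \ \Rightarrow\ |f(\mathbf{x} + \mathbf{b}) - f(\mathbf{x})| < \varepsilon \tag{2.13} \]
whenever $\mathbf{x} + \mathbf{b} \in \mathbb{S}$. The choice of $\delta$ may here depend on $\mathbf{x}$. On the other hand, $f$ is uniformly continuous on $\mathbb{S}$ if for any $\varepsilon > 0$, $\exists\ \delta > 0$ such that
\[ \|\mathbf{b}\| < \delta \ \Rightarrow \sup_{\mathbf{x} \in \mathbb{S},\ \mathbf{x}+\mathbf{b} \in \mathbb{S}} |f(\mathbf{x} + \mathbf{b}) - f(\mathbf{x})| < \varepsilon. \tag{2.14} \]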
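A vector $\mathbf{f} = (f_1, \ldots, f_m)'$ of functions of $\mathbf{x}$ is called, simply enough, a vector function.² Continuity concepts apply element-wise to $\mathbf{f}$ in the obvious way. The function $\mathbf{f}: \mathbb{S} \mapsto \mathbb{S}$, $\mathbb{S} \subseteq \mathbb{R}^k$, is said to be one-to-one if there exists a vector function $\mathbf{f}^{-1}: \mathbb{S} \mapsto \mathbb{S}$, such that $\mathbf{f}^{-1}(\mathbf{f}(\mathbf{x})) = \mathbf{x}$ for each $\mathbf{x} \in \mathbb{S}$. An example of a 1-1 continuous function is the affine transformation³ $\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{b}$ for constants $\mathbf{b}$ ($k \times 1$) and $\mathbf{A}$ ($k \times k$) with $|\mathbf{A}| \ne 0$, having inverse $\mathbf{f}^{-1}(\mathbf{y}) = \mathbf{A}^{-1}(\mathbf{y} - \mathbf{b})$. In most other cases the function $\mathbf{f}^{-1}$ does not possess a closed form, but there is a generalization of 2.17, as follows.

2.21 Theorem If $\mathbf{f}: \mathbb{S} \mapsto \mathbb{T}$ is continuous, where $\mathbb{S} \subseteq \mathbb{R}^k$ and $\mathbb{T} \subseteq \mathbb{R}^m$, $\mathbf{f}^{-1}(A)$ is open in $\mathbb{S}$ when $A$ is open in $\mathbb{T}$, and $\mathbf{f}^{-1}(A)$ is closed in $\mathbb{S}$ when $A$ is closed in $\mathbb{T}$. □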
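The affine example can be made concrete for $k = 2$. This is a minimal sketch under stated assumptions: the helper names `affine` and `affine_inverse` and the particular matrix are hypothetical choices for illustration, and the $2 \times 2$ inverse is written out by hand to keep the example free of external libraries.

```python
def affine(A, b):
    """Return f(x) = A x + b for a 2 x 2 matrix A and a 2-vector b."""
    def f(x):
        return [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
                A[1][0] * x[0] + A[1][1] * x[1] + b[1]]
    return f

def affine_inverse(A, b):
    """Return f^{-1}(y) = A^{-1}(y - b); requires det(A) != 0."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("A is singular, so f is not one-to-one")
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    def g(y):
        z = [y[0] - b[0], y[1] - b[1]]
        return [inv[0][0] * z[0] + inv[0][1] * z[1],
                inv[1][0] * z[0] + inv[1][1] * z[1]]
    return g

A = [[2.0, 1.0], [1.0, 3.0]]    # determinant 5, so the inverse exists
b = [1.0, -1.0]
f, g = affine(A, b), affine_inverse(A, b)
x = [0.5, -2.0]
print(g(f(x)))                  # recovers x up to rounding error
```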
2.5 Sequences of Functions
Let $f_n: \Omega \mapsto \mathbb{T}$, $\mathbb{T} \subseteq \mathbb{R}$, be a function, where in this case $\Omega$ may be an arbitrary set, not necessarily a subset of $\mathbb{R}$. Let $\{f_n, n \in \mathbb{N}\}$ be a sequence of such functions. If there exists $f$ such that, for each $\omega \in \Omega$, and $\varepsilon > 0$, $\exists\ N_{\varepsilon\omega}$ such that $|f_n(\omega) - f(\omega)| < \varepsilon$ when $n > N_{\varepsilon\omega}$, then $f_n$ is said to converge to $f$, pointwise on $\Omega$. As for real sequences, we use the notations $f_n \to f$, $f_n \uparrow f$, or $f_n \downarrow f$, as appropriate, for general or monotone convergence, where in the latter case the monotonicity must apply for every $\omega \in \Omega$. This is a relatively weak notion of convergence, for it does not rule out the possibility that the convergence is breaking down at certain points of $\Omega$. The following example is related to 2.18 above.

2.22 Example Let $f_n(x) = n/(nx + 1)$, $x \in (0,\infty)$. The pointwise limit of $f_n(x)$ on $(0,\infty)$ is $1/x$. But
\[ \left| f_n(x) - \frac{1}{x} \right| = \frac{1}{x(nx + 1)}, \]
and $1/(x(N_{\varepsilon x} x + 1)) < \varepsilon$ only for $N_{\varepsilon x} > (1/(\varepsilon x) - 1)(1/x)$. Thus for given $\varepsilon$, $N_{\varepsilon x} \to \infty$ as $x \to 0$ and it is not possible to put an upper bound on $N_{\varepsilon x}$ such that $|f_n(x) - 1/x| < \varepsilon$, $n \ge N_{\varepsilon x}$, for every $x > 0$. □

To rule out cases of this type, we define the stronger notion of uniform convergence. If there exists a function $f$ such that, for each $\varepsilon > 0$, there exists $N$ such that
\[ \sup_{\omega \in \Omega} |f_n(\omega) - f(\omega)| < \varepsilon \ \text{ when } n > N, \]
$f_n$ is said to converge to $f$ uniformly on $\Omega$.
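The contrast between pointwise and uniform convergence in Example 2.22 can be seen numerically. The following sketch is illustrative, with the names `f_n` and `sup_gap` and the evaluation grid chosen here as assumptions. At a fixed point the gap $|f_n(x) - 1/x|$ shrinks, but a grid that always contains $x = 1/n$ shows the supremum over $(0,\infty)$ to be at least $n/2$.

```python
def f_n(n, x):
    return n / (n * x + 1.0)

def sup_gap(n, grid):
    """Largest |f_n(x) - 1/x| on the grid: a lower bound for the sup over (0, inf)."""
    return max(abs(f_n(n, x) - 1.0 / x) for x in grid)

x0 = 0.5
ns = (1, 10, 100, 1000)

# Pointwise: the gap at the fixed point x0 tends to zero.
pointwise = [abs(f_n(n, x0) - 1.0 / x0) for n in ns]

# Uniform: including x = 1/n in the grid gives a gap of n/2, which grows.
uniform = [sup_gap(n, [1.0 / n, 0.5, 1.0, 2.0]) for n in ns]

print(pointwise)
print(uniform)
```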
2.6 Summability and Order Relations
The sum of the terms of a real sequence $\{x_n\}_1^\infty$ is called a series, written $\sum_{n=1}^\infty x_n$ (or just $\sum x_n$). The terms of the real sequence $\{\sum_{m=1}^n x_m, n \in \mathbb{N}\}$ are called the partial sums of the series. We say that the series converges if the partial sums converge to a finite limit. A series is said to converge absolutely if the monotone sequence $\{\sum_{m=1}^n |x_m|, n \in \mathbb{N}\}$ converges.

2.23 Example Consider the geometric series, $\sum_{j=0}^\infty x^j$. This converges to $1/(1 - x)$ when $|x| < 1$, and also converges absolutely. It oscillates between cluster points 0 and 1 for $x = -1$, and for other values of $x$ it diverges. □

2.24 Theorem If a series converges absolutely, then it converges.

Proof The sequence $\{\sum_{m=1}^n |x_m|, n \in \mathbb{N}\}$ is monotone, and either diverges to $+\infty$ or converges to a finite limit. In the latter case the Cauchy criterion implies that $|x_n| + \ldots + |x_{n+m}| \to 0$ as $m$ and $n$ tend to⁴ infinity. Since $|x_n| + \ldots + |x_{n+m}| \ge |x_n + \ldots + x_{n+m}|$ by the triangle inequality, convergence of $\{\sum_{m=1}^n x_m, n \in \mathbb{N}\}$ follows by the same criterion. ■

An alternative terminology speaks of summability. A real sequence $\{x_n\}_1^\infty$ is said to be summable if the series $\sum x_n$ converges, and absolutely summable if $\{|x_n|\}_1^\infty$ is summable. Any absolutely summable sequence is summable by 2.24, and any summable sequence must be converging to zero. Convergence to zero does not imply summability (see 2.27 below, for example), but convergence of the tail sums to zero is necessary and sufficient.

2.25 Theorem Iff $\{x_n\}_1^\infty$ is summable, $\sum_{m=n}^\infty x_m \to 0$ as $n \to \infty$.

Proof For necessity, write $|\sum_{m=1}^\infty x_m| \le |\sum_{m=1}^{n-1} x_m| + |\sum_{m=n}^\infty x_m|$. Since for any $\varepsilon > 0$ there exists $N$ such that $|\sum_{m=n}^\infty x_m| < \varepsilon$ for $n \ge N$, it follows that $|\sum_{m=1}^\infty x_m| \le |\sum_{m=1}^{N-1} x_m| + \varepsilon < \infty$. Conversely, assume summability and let $A = \sum_{n=1}^\infty x_n$. Then $\sum_{m=n}^\infty x_m = A - \sum_{m=1}^{n-1} x_m \to 0$ as $n \to \infty$. ■

A sequence $\{x_n\}_1^\infty$ is Cesàro-summable if the sequence $\{n^{-1}\sum_{m=1}^n x_m\}_1^\infty$ converges.
This is weaker than ordinary convergence.

2.26 Theorem If $\{x_n\}_1^\infty$ converges to $x$, its Cesàro sum also converges to $x$. □

But a sequence can be Cesàro-summable in spite of not converging. The sequence $\{(-1)^n\}_0^\infty$ converges in Cesàro sum to zero, whereas the partial sum sequence $\{\sum_{m=0}^n (-1)^m\}_0^\infty$ converges in Cesàro sum to $\tfrac{1}{2}$ (compare 2.14).

Various notations are used to indicate the relationships between rates of divergence or convergence of different sequences. If $\{x_n\}_1^\infty$ is any real sequence, $\{a_n\}_1^\infty$ is a sequence of positive real numbers, and there exists a constant $B < \infty$ such that $|x_n|/a_n \le B$ for all $n$, we say that $x_n$ is (at most) of the order of magnitude of $a_n$, and write $x_n = O(a_n)$. If $\{x_n/a_n\}$ converges to zero, we write $x_n = o(a_n)$, and say that $x_n$ is of smaller order of magnitude than $a_n$. $a_n$ can be increasing or decreasing, so this notation can be used to express an upper bound either on the rate of growth of a divergent sequence, or on the rate of convergence of a
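sequence to zero. Here are some rules for manipulation of $O(\cdot)$, whose proof follows from the definition. If $x_n = O(n^\alpha)$ and $y_n = O(n^\beta)$, then
\[ x_n + y_n = O(n^{\max\{\alpha,\beta\}}), \tag{2.15} \]
\[ x_n y_n = O(n^{\alpha+\beta}), \tag{2.16} \]
\[ x_n^\beta = O(n^{\alpha\beta}), \ \text{ whenever } x_n^\beta \text{ is defined.} \tag{2.17} \]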
An alternative notation for the case $x_n \ge 0$ is $x_n \ll a_n$, which means that there is a constant, $0 < B < \infty$, such that $x_n \le B a_n$ for all $n$. This may be more convenient in algebraic manipulations. The notation $x_n \sim a_n$ will be used to indicate that there exist $N \ge 0$, and finite constants $A > 0$ and $B \ge A$, such that $\inf_{n \ge N}(x_n/a_n) \ge A$ and $\sup_{n \ge N}(x_n/a_n) \le B$. This says that $\{x_n\}$ and $\{a_n\}$ grow ultimately at the same rate, and is different from the relation $x_n = O(a_n)$, since the latter does not exclude $x_n/a_n \to 0$. Some authors use $x_n \sim a_n$ in the stronger sense of $x_n/a_n \to 1$.

2.27 Theorem If $\{x_n\}$ is a real positive sequence, and $x_n \sim n^\alpha$,
(i) if $\alpha > -1$ then $\sum_{m=1}^n x_m \sim n^{1+\alpha}$;
(ii) if $\alpha = -1$ then $\sum_{m=1}^n x_m \sim \log n$;
(iii) if $\alpha < -1$ then $\sum_{m=1}^\infty x_m < \infty$ and $\sum_{m=n}^\infty x_m = O(n^{1+\alpha})$.

Proof By assumption there exist $N \ge 1$ and constants $A > 0$ and $B \ge A$ such that $A n^\alpha \le x_n \le B n^\alpha$ for $n \ge N$, and hence $A \sum_{m=N}^n m^\alpha \le \sum_{m=N}^n x_m \le B \sum_{m=N}^n m^\alpha$. The limit of $\sum_{m=1}^n m^\alpha$ as $n \to \infty$ for different values of $\alpha$ defines the Riemann zeta function for $\alpha < -1$, and its rates of divergence for $\alpha \ge -1$ are standard results; see e.g. Apostol (1974: Sects. 8.12-8.13). Since the sum of terms from 1 to $N-1$ is finite, their omission cannot change the conclusions. ■

It is common practice to express the rate of convergence to zero of a positive real sequence in terms of the summability of the coordinates raised to a given power. The following device allows some further refinement of summability conditions. Let $U(v)$ be a positive function of $v$. If $U(vx)/U(v) \to x^\rho$ as $v \to \infty$ (0) for $x > 0$ and $-\infty < \rho < +\infty$, $U$ is said to be regularly varying at infinity (zero). If a positive function $L(v)$ has the property $L(vx)/L(v) \to 1$ for $x > 0$ as $v \to \infty$ (0), it is said to be slowly varying at infinity (zero). Evidently, any regularly varying function can be expressed in the form $U(v) = v^\rho L(v)$, where $L(v)$ is slowly varying.
While the definition allows $v$ to be a real variable, in the cases of interest we will have $v = n$ for $n \in \mathbb{N}$, with $U$ and $L$ having the interpretation of positive sequences.

2.28 Example $(\log v)^\alpha$ is slowly varying at infinity, for any $\alpha$. □

On the theory of regular variation see Feller (1971), or Loève (1977). The important property is the following.

2.29 Theorem If $L$ is slowly varying at infinity, then for any $\delta > 0$ there exists $N \ge 1$ such that
\[ v^{-\delta} < L(v) < v^{\delta}, \ \text{ all } v > N. \]
(2.18)
Hence we have the following corollary of 2.27, which shows how the notion of a convergent power series can be refined by allowing for the presence of a slowly varying function.

2.30 Corollary If $x_n = O(n^\alpha L(n))$ then $\sum_{n=1}^\infty x_n < \infty$ for all $\alpha < -1$ and all functions $L(n)$ which are slowly varying at infinity. □

On the other hand, the presence of a slowly varying component can affect the summability of a sequence. The following result can be proved using the integral test for series convergence (Apostol 1974: Sect. 8.12).

2.31 Theorem If $x_n \sim 1/[n(\log n)^{1+\delta}]$ with $\delta > 0$, then $\sum_{n=1}^\infty x_n < \infty$. If $\delta = 0$, then $\sum_{m=1}^n x_m \sim \log\log n$. □

2.32 Theorem (Feller 1971: 275) If a positive monotone function $U(v)$ satisfies
\[ \frac{U(vx)}{U(v)} \to \psi(x), \ \text{ all } x \in D, \tag{2.19} \]
where $D$ is dense in $\mathbb{R}^+$, and $0 < \psi(x) < \infty$, then $\psi(x) = x^\rho$ for $-\infty < \rho < \infty$. □

To the extent that (2.19) is a fairly general property, we can conclude that monotone functions are as a rule regularly varying.

2.33 Theorem The derivative of a monotone regularly varying function is regularly varying at $\infty$.

Proof Given $U(v) = v^\rho L(v)$, write
\[ U'(v) = \rho v^{\rho-1} L(v) + v^\rho L'(v) = v^{\rho-1}(\rho L(v) + v L'(v)). \tag{2.20} \]
If $L'(v) \to 0$ there is no more to show, so assume $\liminf_v L'(v) > 0$. Then
\[ \frac{d}{dv}\left( \frac{L(vx)}{L(v)} \right) = \frac{L'(v)}{L(v)} \left( \frac{L'(vx)}{L'(v)} - \frac{L(vx)}{L(v)} \right) \to 0, \tag{2.21} \]
which implies $L'(vx)/L'(v) \to 1$. Thus,
\[ \frac{U'(vx)}{U'(v)} = x^{\rho-1}\, \frac{\rho L(vx) + vx L'(vx)}{\rho L(v) + v L'(v)} \to x^{\rho}. \tag{2.22} \]
■
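The definitions of slow and regular variation used in this section can be explored numerically. The sketch below is illustrative only: the functions `L` and `U` and the parameter values $\alpha = 2$, $\rho = 0.5$, $x = 3$ are assumptions chosen here. It takes $L(v) = (\log v)^\alpha$ from Example 2.28 and $U(v) = v^\rho L(v)$, and prints the ratios $L(vx)/L(v)$ and $U(vx)/U(v)$ for increasing $v$; the approach to the limits 1 and $x^\rho$ is visibly slow, as the name suggests.

```python
import math

def L(v, alpha=2.0):
    """A slowly varying function: a power of the logarithm."""
    return math.log(v) ** alpha

def U(v, rho=0.5, alpha=2.0):
    """A regularly varying function with index rho."""
    return v ** rho * L(v, alpha)

x = 3.0
for v in (1e2, 1e4, 1e8, 1e16, 1e64, 1e256):
    # first ratio drifts towards 1, second towards x ** 0.5
    print(v, L(v * x) / L(v), U(v * x) / U(v), x ** 0.5)
```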
2.7 Arrays
Arguments concerning stochastic convergence often involve a double-indexing of elements. An array is a mapping whose domain is the Cartesian product of countable, linearly ordered sets, such as $\mathbb{N} \times \mathbb{N}$ or $\mathbb{Z} \times \mathbb{N}$, or a subset thereof. A real double array, in particular, is a double-indexed collection of numbers, or, alternatively, a sequence whose members are real sequences. We will use notation such as $\{\{x_{nt}, t \in \mathbb{Z}\}, n \in \mathbb{N}\}$, or just $\{x_{nt}\}$ when the context is clear.
A collection of finite sequences $\{\{x_{nt}, t = 1, \ldots, k_n\}, n \in \mathbb{N}\}$, where $k_n \uparrow \infty$ as $n \to \infty$, is called a triangular array. As an example, consider array elements of the form $x_{nt} = y_t/n$, where $\{y_t, t = 1, \ldots, n\}$ is a real sequence. The question of whether the series $\{\sum_{t=1}^n x_{nt}, n \in \mathbb{N}\}$ converges is equivalent to that of the Cesàro convergence of the original sequence; however, the array formulation is frequently the more convenient.

2.34 Toeplitz's lemma Suppose $\{y_n\}$ is a real sequence and $y_n \to y$. If $\{\{x_{nt}, t = 1, \ldots, k_n\}, n \in \mathbb{N}\}$ is a triangular array such that
(a) $x_{nt} \to 0$ as $n \to \infty$ for each fixed $t$,
(b) $\lim_{n\to\infty} \sum_{t=1}^{k_n} |x_{nt}| \le C < \infty$,
(c) $\lim_{n\to\infty} \sum_{t=1}^{k_n} x_{nt} = 1$,
then $\sum_{t=1}^{k_n} x_{nt} y_t \to y$. For $y = 0$, (c) can be omitted.

Proof By assumption on $\{y_n\}$, for any $\varepsilon > 0$ $\exists\ N_\varepsilon \ge 1$ such that for $n > N_\varepsilon$, $|y_n - y| < \varepsilon/C$. Hence by (c), and then (b) and the triangle inequality,
\[ \lim_{n\to\infty} \left| \sum_{t=1}^{k_n} x_{nt} y_t - y \right| = \lim_{n\to\infty} \left| \sum_{t=1}^{k_n} x_{nt}(y_t - y) \right| \le \lim_{n\to\infty} \left| \sum_{t=1}^{N_\varepsilon} x_{nt}(y_t - y) \right| + \varepsilon = \varepsilon, \tag{2.23} \]
in view of (a). This completes the proof, since $\varepsilon$ is arbitrary. ■

A particular case of an array $\{x_{nt}\}$ satisfying the conditions of the lemma is $x_{nt} = (\sum_{s=1}^n y_s)^{-1} y_t$, where $\{y_t\}$ is a positive sequence and $\sum_{s=1}^n y_s \to \infty$. A leading application of this result is to prove the following theorem, a fundamental tool of limit theory.

2.35 Kronecker's lemma Consider sequences $\{a_t\}_1^\infty$ and $\{x_t\}_1^\infty$ of positive real numbers, with $a_t \uparrow \infty$. If $\sum_{t=1}^n x_t/a_t \to C < \infty$ as $n \to \infty$,
\[ \frac{1}{a_n} \sum_{t=1}^n x_t \to 0. \tag{2.24} \]

Proof Defining $c_0 = 0$ and $c_n = \sum_{t=1}^n x_t/a_t$ for $n \in \mathbb{N}$, note that $x_t = a_t(c_t - c_{t-1})$, $t = 1, \ldots, n$. Also define $a_0 = 0$ and $b_t = a_t - a_{t-1}$ for $t = 1, \ldots, n$, so that $a_n = \sum_{t=1}^n b_t$. Now apply the identity for arbitrary sequences $a_0, \ldots, a_n$ and $c_0, \ldots, c_n$,
\[ \sum_{t=1}^n a_t(c_t - c_{t-1}) = \sum_{t=1}^n (a_{t-1} - a_t)\, c_{t-1} + a_n c_n - a_0 c_0. \tag{2.25} \]
(This is known as Abel's partial summation formula.) We obtain
\[ \frac{1}{a_n} \sum_{t=1}^n x_t = c_n - \frac{1}{a_n} \sum_{t=1}^n b_t c_{t-1} \to C - C = 0, \]
(2.26)
where the convergence is by the Toeplitz lemma, setting $x_{nt} = b_t/a_n$. ■

The notion of array convergence extends the familiar sequence concept. Consider for full generality an array of subsequences, a collection $\{\{x_{m n_k}, k \in \mathbb{N}\}, m \in \mathbb{N}\}$, where $\{n_k, k \in \mathbb{N}\}$ is an increasing sequence of positive integers. If the limit $x_m = \lim_{k\to\infty} x_{m n_k}$ exists for each $m \in \mathbb{N}$, we would say that the array is convergent; and its limit is the infinite sequence $\{x_m, m \in \mathbb{N}\}$. Whether this sequence converges is a separate question from whether it exists at all.

Suppose the array is bounded, in the sense that $\sup_{k,m} |x_{m n_k}| \le B < \infty$. We know by 2.12 that for each $m$ there exists at least one cluster point, say $x_m$, of the inner sequence $\{x_{m n_k}, k \in \mathbb{N}\}$. An important question in several contexts is this: is it valid to say that the array as a whole has a cluster point?

2.36 Theorem Corresponding to any bounded array $\{\{x_{m n_k}, k \in \mathbb{N}\}, m \in \mathbb{N}\}$, there exists a sequence $\{x_m\}$, the limit of the array $\{\{x_{m n_k^*}, k \in \mathbb{N}\}, m \in \mathbb{N}\}$ as $k \to \infty$, where $\{n_k^*\}$ is the same subsequence of $\{n_k\}$ for each $m$.

Proof This is by construction of the required subsequence. Begin with a convergent subsequence for $m = 1$; let $\{n_k^1\}$ be a subsequence of $\{n_k\}$ such that $x_{1,n_k^1} \to x_1$. Next, consider the sequence $\{x_{2,n_k^1}\}$. Like $\{x_{2,n_k}\}$, this is on the bounded interval $[-B,B]$, and so contains a convergent subsequence. Let the indices of this latter subsequence, drawn from the members of $\{n_k^1\}$, be denoted $\{n_k^2\}$, and note that $x_{1,n_k^2} \to x_1$ as well as $x_{2,n_k^2} \to x_2$. Proceeding in the same way for each $m$ generates an array $\{\{n_k^m, k \in \mathbb{N}\}, m \in \mathbb{N}\}$, having the property that $\{x_{i,n_k^m}, k \in \mathbb{N}\}$ is a convergent sequence for $1 \le i \le m$.

Now consider the sequence $\{n_k^k, k \in \mathbb{N}\}$; in other words, take the first member of $\{n_k^1\}$, the second member of $\{n_k^2\}$, and so on. For each $m$, this sequence is a subsequence of $\{n_k^m\}$ from the $m$th point of the sequence onwards, and hence the sequence $\{x_{m,n_k^k}, k \ge m\}$ is convergent.
This means that the sequence $\{x_{m,n_k^k}, k \in \mathbb{N}\}$ is convergent, so setting $\{n_k^*\} = \{n_k^k\}$ satisfies the requirement of the theorem. ■
This is called the 'diagonal method'. The elements $n_k^k$ may be thought of as the diagonal elements of the square matrix (of infinite order) whose rows contain the sequences $\{n_k^m\}$, each a subsequence of the row above it. This theorem holds independently of the nature of the elements $\{x_{mn}\}$. Any space of points on which convergent sequences are defined could be substituted for $\mathbb{R}$. We shall need a generalization on these lines in Chapter 26, for example.
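Kronecker's lemma (2.35) from earlier in this section lends itself to a quick numerical check. The sketch below is an illustration under assumed inputs, not an example from the text: it takes $a_t = t$ and $x_t = t^{-1/2}$, so that $\sum x_t/a_t = \sum t^{-3/2}$ converges, and the function name `kronecker_check` is hypothetical. The lemma then predicts that $a_n^{-1}\sum_{t=1}^n x_t \to 0$.

```python
def kronecker_check(n):
    """Return (sum_{t<=n} x_t/a_t, a_n**-1 * sum_{t<=n} x_t) for a_t = t, x_t = t**-0.5."""
    weighted = 0.0   # partial sum of x_t / a_t = t**-1.5, which converges
    plain = 0.0      # partial sum of x_t, which diverges like 2 * sqrt(n)
    for t in range(1, n + 1):
        x_t = t ** -0.5
        weighted += x_t / t
        plain += x_t
    return weighted, plain / n

for n in (10, 1000, 100000):
    # first entry settles down, second entry shrinks towards zero
    print(n, kronecker_check(n))
```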
3 Measure
3.1 Measure Spaces
A measure is a set function, a mapping which associates a (possibly extended) real number with a set. Commonplace examples of measures include the lengths, areas, and volumes of geometrical figures, but wholly abstract sets can be 'measured' in an analogous way. Formally, we have the following definition.

3.1 Definition Given a class $\mathcal{F}$ of subsets of a set $\Omega$, a measure $\mu: \mathcal{F} \mapsto \overline{\mathbb{R}}$ is a function having the following properties:
(a) $\mu(A) \ge 0$, all $A \in \mathcal{F}$.
(b) $\mu(\emptyset) = 0$.
(c) For a countable collection $\{A_j \in \mathcal{F}, j \in \mathbb{N}\}$ with $A_j \cap A_{j'} = \emptyset$ for $j \ne j'$ and $\bigcup_j A_j \in \mathcal{F}$,
\[ \mu\Big( \bigcup_j A_j \Big) = \sum_j \mu(A_j). \tag{3.1} \]
□
The particular cases at issue in this book are of course the probabilities of random events in a sample space $\Omega$; more of this in Chapter 7. Condition (a) is optional and set functions taking either sign may be referred to as measures (see e.g. §4.4), but non-negativity is desirable for present purposes.

A measurable space is a pair $(\Omega,\mathcal{F})$ where $\Omega$ is any collection of objects, and $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$. When $(\Omega,\mathcal{F})$ is a measurable space, the triple $(\Omega,\mathcal{F},\mu)$ is called a measure space. More than one measure can be associated with the measurable space $(\Omega,\mathcal{F})$, hence the distinction between measure space and measurable space is important.

Condition 3.1(c) is called countable additivity. If a set function has the property
\[ \mu(A \cup B) = \mu(A) + \mu(B) \tag{3.2} \]
for each disjoint pair $A,B$, a property that extends by iteration to finite collections $A_1, \ldots, A_n$, it is said to be finitely additive. In 3.1 $\mathcal{F}$ could be a field, but the possibility of extending the properties of $\mu$ to the corresponding $\sigma$-field, by allowing additivity over countable collections, is an essential feature of a measure.

If $\mu(\Omega) < \infty$ the measure is said to be finite. And if $\Omega = \bigcup_j \Omega_j$ where $\{\Omega_j\}$ is a countable collection of $\mathcal{F}$-sets, and $\mu(\Omega_j) < \infty$ for each $j$, $\mu$ is said to be $\sigma$-finite. In particular, if there is a collection $\mathcal{S}$ such that $\mathcal{F} = \sigma(\mathcal{S})$ and $\Omega_j \in \mathcal{S}$
{An B: B AA
for each j, j.l is said to be a-finite on !f (rather than on r:J). If r:JA = E ) r:J } for some E r:J, A is a measurable space and ( ,r:JA,j.l) is a measure space called the restriction of (Q,r:J,j.l) to If in this case = 0 (equivalent to = j.!(Q) when j.!(Q) < =)A is called a support of the measure. When supports n, the sets of r:JA have the same measures as the corresponding ones of r:J. point ffi ffi}) > 0 is called an atom of the measure. E Q with the property 3.2 Example The case closest to everyday intuition is Lebesgue measure, m, on the measurable space (IR,13), where 13 is the Borel field on IR . Generalizing the notion of length in geometry, Lebesgue measure assigns m((a,b]) = b - a to an interval (a,b]. Additivity is an intuitively plausible property if we think of measuring the total length of a collection of disjoint intervals. Lebesgue measure is atomless (see 3.15 below), every point of the line taking measure 0, but m(IR) = =. Letting ((a,b], 13ca,bl • m) denote the restriction of (IR,13,m) to a finite interval, m is a finite measure on (a,b]. Since IR can be partitioned into a countable collection of finite intervals, m is a-finite. o Some additional properties may be deduced from the definition: 3.3 Theorem For arbitrary r:J-sets and j E IN } , (i) c =:::> (monotonicity). (ii) = + u (iii) (countable subadditivity). Proof To show (i) note that and are disjoint sets whose union is by hypothesis, and use 3.1(a) and 3.1(c). To show (ii), use and in each union are disjoint. The result where again the sets = follows on application of 3.1(c). To show (iii), define = and = Note that the sets are disjoint, that and that = Hence, Uj
j...L(A)
A (A ,r:J
Aj.!(Ac)
A.
j.!( {
A, B, {Aj, Aj.!(A BB) + j.!(j.!A(A)n B)::;; j.!(B)I!(A) j.!(B). j...L(UAj) ::;; Ljll(Aj) A B-A B, Au B =Au (B-A) B (A n B) u (B-A), B A En 1 1 An-Uj:} En En s An, U}=tBj =IAj. Aj. (3.3) � (ffi) � (QB;) t �(Bj) ,; t�Ai)· =
•
=
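The three properties in 3.3, and the disjointification step used for part (iii), can be checked on a small finite measure space. The following is only an illustrative sketch: the four-point space, the weights, and the names `WEIGHTS`, `mu` and `disjointify` are assumptions made for the example and do not come from the text.

```python
from fractions import Fraction

# Hypothetical point masses on a four-point space (illustration only).
WEIGHTS = {"a": Fraction(1, 2), "b": Fraction(1, 4),
           "c": Fraction(1, 8), "d": Fraction(1, 8)}

def mu(s):
    """Measure of a subset: the sum of the weights of its points."""
    return sum((WEIGHTS[w] for w in s), Fraction(0))

def disjointify(sets):
    """B_1 = A_1, B_n = A_n minus the union of A_1, ..., A_{n-1}."""
    seen, out = set(), []
    for a in sets:
        out.append(set(a) - seen)  # remove everything already covered
        seen |= set(a)
    return out

A = [{"a", "b"}, {"b", "c"}, {"a", "c", "d"}]
B = disjointify(A)
union = set().union(*A)

# Same union, disjoint pieces, and the subadditivity bound of 3.3(iii).
lhs = mu(union)
additive_sum = sum(mu(b) for b in B)
subadditive_bound = sum(mu(a) for a in A)
```

Exact rational arithmetic is used so that the equalities hold without rounding error.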
This proof illustrates a standard technique of measure theory, converting a sequence of sets into a disjoint sequence having the same union by taking differences. This trick will become familiar in numerous later applications.

The idea behind 3.3(ii) can be extended to give an expression for the measure of any finite union. This is the inclusion-exclusion formula:
$$\mu\left(\bigcup_{j=1}^{n} A_j\right) = \sum_{j} \mu(A_j) - \sum_{j<k} \mu(A_j \cap A_k) + \sum_{j<k<l} \mu(A_j \cap A_k \cap A_l) - \cdots \pm \mu(A_1 \cap A_2 \cap \cdots \cap A_n), \tag{3.4}$$
where the sign of the last term is negative if $n$ is even and positive if $n$ is odd, and there are $2^n - 1$ terms in the sum in total. The proof of (3.4) is by induction from 3.3(ii), substituting for the second term on the right-hand side of
$$\mu\left(\bigcup_{j=1}^{n} A_j\right) = \mu(A_n) + \mu\left(\bigcup_{j=1}^{n-1} A_j\right) - \mu\left(\bigcup_{j=1}^{n-1} (A_j \cap A_n)\right) \tag{3.5}$$
repeatedly, for $n-1, n-2, \ldots, 1$.

Let $\{A_n, n \in \mathbb{N}\}$ be a monotone sequence of $\mathcal{F}$-sets with limit $A \in \mathcal{F}$. A set function $\mu$ is said to be continuous if $\mu(A_n) \to \mu(A)$.

3.4 Theorem A finite measure is continuous.

Proof First let $\{A_n\}$ be increasing, with $A_{n-1} \subseteq A_n$, and $A = \bigcup_{n=1}^{\infty} A_n$. The sequence $\{B_j, j \in \mathbb{N}\}$, where $B_1 = A_1$ and $B_j = A_j - A_{j-1}$ for $j > 1$, is disjoint by construction, with $B_j \in \mathcal{F}$, $A_n = \bigcup_{j=1}^{n} B_j$, and
$$\mu(A_n) = \sum_{j=1}^{n} \mu(B_j). \tag{3.6}$$
The real sequence $\{\mu(A_n)\}$ is therefore monotone, and converges since it is bounded above by $\mu(\Omega) < \infty$. Countable additivity implies $\sum_{j=1}^{\infty} \mu(B_j) = \mu\left(\bigcup_{j=1}^{\infty} B_j\right) = \mu(A)$. Alternatively, let $\{A_n\}$ be decreasing, with $A_{n-1} \supseteq A_n$ and $A = \bigcap_{n=1}^{\infty} A_n$. Consider the increasing sequence $\{A_n^c\}$, determine $\mu(A^c)$ by the same argument, and use finite additivity to conclude that $\mu(A) = \mu(\Omega) - \mu(A^c)$ is the limit of $\mu(A_n) = \mu(\Omega) - \mu(A_n^c)$. ∎

The finiteness of the measure is needed for the second part of the argument, but the result that $\mu(A_n) \to \mu(A)$ when $A_n \uparrow A$ actually holds generally, not excluding the case $\mu(A) = \infty$. This theorem has a partial converse:

3.5 Theorem A non-negative set function $\mu$ which is finitely additive and continuous is countably additive.

Proof Let $\{B_n\}$ be a countable, disjoint sequence. If $A_n = \bigcup_{j=1}^{n} B_j$, the sequence $\{A_n\}$ is increasing, $B_n \cap A_{n-1} = \emptyset$, and so $\mu(A_n) = \mu(B_n) + \mu(A_{n-1})$ for every $n$, by finite additivity. Given non-negativity, it follows by induction that $\{\mu(A_n)\}$ is monotone, with $\mu(A_n) = \sum_{j=1}^{n} \mu(B_j)$. If $A = \bigcup_{j=1}^{\infty} B_j$, then $\mu(A_n) \to \sum_{j=1}^{\infty} \mu(B_j)$, whereas continuity implies that $\mu(A_n) \to \mu(A) = \mu\left(\bigcup_{j=1}^{\infty} B_j\right)$. ∎
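The inclusion-exclusion formula (3.4) can be verified by brute force on a finite example. The sketch below assumes a hypothetical weighted five-point space; the names `WEIGHTS`, `mu` and `inclusion_exclusion` are illustrative only. It sums over every non-empty subfamily of the sets, attaching a positive sign to intersections of an odd number of sets and a negative sign otherwise, and counts the terms.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point masses on {0, ..., 4} (illustration only).
WEIGHTS = {k: Fraction(1, k + 1) for k in range(5)}

def mu(s):
    """Measure of a subset: the sum of the weights of its points."""
    return sum((WEIGHTS[w] for w in s), Fraction(0))

def inclusion_exclusion(sets):
    """Right-hand side of (3.4), plus the number of terms summed."""
    total, terms = Fraction(0), 0
    for r in range(1, len(sets) + 1):
        sign = 1 if r % 2 == 1 else -1  # odd-sized families enter positively
        for family in combinations(sets, r):
            total += sign * mu(set.intersection(*family))
            terms += 1
    return total, terms

SETS = [{0, 1, 2}, {2, 3}, {1, 3, 4}]
rhs, n_terms = inclusion_exclusion(SETS)
lhs = mu(set().union(*SETS))
```

With $n = 3$ sets there are $2^3 - 1 = 7$ terms, matching the count stated after (3.4).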
Arguments in the theory of integration often turn on the notion of a 'negligible' set. In a measure space $(\Omega,\mathcal{F},\mu)$, a set of measure zero is (simply enough) a set $M \in \mathcal{F}$ with $\mu(M) = 0$. A condition or restriction on the elements of $\Omega$ is said to occur almost everywhere (a.e.) if it holds on a set $E$ and $\Omega - E$ has measure zero. If more than one measure is assigned to the same space, it may be necessary to indicate which measure the statement applies to, by writing a.e.$[\mu]$ or a.e.$[\nu]$ as the case may be.

3.6 Theorem
(i) If $M$ and $N$ are $\mathcal{F}$-sets, $M$ has measure 0 and $N \subseteq M$, then $N$ has measure 0.
(ii) If $\{M_j\}$ is a countable sequence with $\mu(M_j) = 0$, $\forall j$, then $\mu\left(\bigcup_j M_j\right) = 0$.
(iii) If $\{E_j\}$ is a countable sequence with $\mu(E_j^c) = 0$, $\forall j$, then $\mu\left(\left(\bigcap_j E_j\right)^c\right) = 0$.

Proof (i) is an application of monotonicity; (ii) is a consequence of countable additivity; and (iii) follows likewise, using the second de Morgan law. ∎

In §3.2 and §3.3 we will be concerned about the measurability of the sets in a given space. We show that, if the sets of a given collection are measurable, the sets of the $\sigma$-field generated by that collection are also measurable (the Extension Theorem). For many purposes this fact is sufficient, but there may be sets outside the $\sigma$-field which can be shown in other ways to be measurable, and it might be desirable to include these in the measure space. In particular, if $\mu(A) = \mu(B)$ it would seem reasonable to assign $\mu(E) = \mu(A)$ whenever $A \subseteq E \subseteq B$. This is equivalent to assigning measure 0 to any subset of a set of measure 0. The measure space $(\Omega,\mathcal{F},\mu)$ is said to be complete if, for any set $E \in \mathcal{F}$ with $\mu(E) = 0$, all subsets of $E$ are also in $\mathcal{F}$. According to the following result, every measure space can be completed without changing any of our conclusions except in respect of these negligible sets.

3.7 Theorem Given any measure space $(\Omega,\mathcal{F},\mu)$, there exists a complete measure space $(\Omega,\mathcal{F}^\mu,\bar{\mu})$, called the completion of $(\Omega,\mathcal{F},\mu)$, such that $\mathcal{F} \subseteq \mathcal{F}^\mu$, and $\bar{\mu}(E) = \mu(E)$ for all $E \in \mathcal{F}$. □

Notice that the completion of a space is defined with respect to a particular measure. The measurable space $(\Omega,\mathcal{F})$ has a different completion for each measure that can be defined on it.

Proof Let $\mathcal{N}^\mu$ denote the collection of all subsets of $\mathcal{F}$-sets of $\mu$-measure 0, and
$$\mathcal{F}^\mu = \{F \subseteq \Omega: E \,\triangle\, F \in \mathcal{N}^\mu \text{ for some } E \in \mathcal{F}\}. \tag{3.7}$$
If $\mu(E) = 0$, any set $F \subseteq E$ satisfies the criterion of (3.7) and so is in $\mathcal{F}^\mu$ as the definition requires. For $F \in \mathcal{F}^\mu$, let $\bar{\mu}(F) = \mu(E)$, where $E$ is any $\mathcal{F}$-set satisfying $E \,\triangle\, F \in \mathcal{N}^\mu$. To show that the choice of $E$ is immaterial, let $E_1$ and $E_2$ be two such sets, and note that
$$\mu(E_1 \,\triangle\, E_2) = \mu((F \,\triangle\, E_1) \,\triangle\, (F \,\triangle\, E_2)) = 0. \tag{3.8}$$
Since $\mu(E_1 \cup E_2) = \mu(E_1 \cap E_2) + \mu(E_1 \,\triangle\, E_2)$, we must conclude that
$$\mu(E_1 \cap E_2) \le \mu(E_i) \le \mu(E_1 \cup E_2) = \mu(E_1 \cap E_2) \tag{3.9}$$
for $i = 1$ and 2, or, $\mu(E_1) = \mu(E_2)$. Hence, the measure is unique. When $F \in \mathcal{F}$, we can choose $E = F$, since $F \,\triangle\, F = \emptyset \in \mathcal{N}^\mu$, confirming that the measures agree on $\mathcal{F}$.

It remains to show that $\mathcal{F}^\mu$ is a $\sigma$-field containing $\mathcal{F}$. Choosing $E = F$ in (3.7) for $F \in \mathcal{F}$ shows $\mathcal{F} \subseteq \mathcal{F}^\mu$. If $F \in \mathcal{F}^\mu$, then $E \,\triangle\, F \in \mathcal{N}^\mu$ for $E \in \mathcal{F}$ and hence $E^c \,\triangle\, F^c \in \mathcal{N}^\mu$ where $E^c \in \mathcal{F}$, and so $F^c \in \mathcal{F}^\mu$. And finally if $F_j \in \mathcal{F}^\mu$ for $j \in \mathbb{N}$, there exist $E_j \in \mathcal{F}$ for $j \in \mathbb{N}$, such that $E_j \,\triangle\, F_j \in \mathcal{N}^\mu$. Hence
$$\left(\bigcup_j E_j\right) \triangle \left(\bigcup_j F_j\right) \subseteq \bigcup_j (E_j \,\triangle\, F_j) \in \mathcal{N}^\mu \tag{3.10}$$
by 3.6(ii). This means that $\bigcup_j F_j \in \mathcal{F}^\mu$, and completes the proof. ∎

3.2 The Extension Theorem
You may wonder why, in the definition of a measurable space, $\mathcal{F}$ could not simply be the set of all subsets; the power set of $\Omega$. The problem is to find a consistent method of assigning a measure to every set. This is straightforward when the space has a finite number of elements, but not in an infinite space where there is no way, even conceptually, to assign a specific measure to each set. It is necessary to specify a rule which generates a measure for any designated set. The problem of measurability is basically the problem of going beyond constructive methods without running into inconsistencies. We now show how this problem can be solved for $\sigma$-fields. These are a sufficiently general class of sets to cope with most situations arising in probability.

One must begin by assigning a measure, to be denoted $\mu_0$, to the members of some basic collection $\mathcal{C}$ for which this can feasibly be done. For example, to construct Lebesgue measure we started by assigning to each interval $(a,b]$ the measure $b - a$. We then reason from the properties of $\mu_0$ to extend it from this basic collection to all the sets of interest. $\mathcal{C}$ must be rich enough to allow $\mu_0$ to be uniquely defined by it. A collection $\mathcal{C} \subseteq \mathcal{F}$ is called a determining class for $(\Omega,\mathcal{F})$ if, whenever $\mu$ and $\nu$ are measures on $\mathcal{F}$, $\mu(A) = \nu(A)$ for all $A \in \mathcal{C}$ implies that $\mu = \nu$.

Given $\mathcal{C}$, we must also know how to assign $\mu_0$-values to any sets derived from $\mathcal{C}$ by operations such as union, intersection, complementation, and difference. For disjoint sets $A$ and $B$ we have $\mu_0(A \cup B) = \mu_0(A) + \mu_0(B)$ by finite additivity, and when $B \subseteq A$, $\mu_0(A - B) = \mu_0(A) - \mu_0(B)$. We also need to be able to determine $\mu_0(A \cap B)$, which will require specific knowledge of the relationship between the sets. When such assignments are possible for any pair of sets whose measures are themselves known, the measure is thereby extended to a wider class of sets, to be denoted $\mathcal{S}$.

Often $\mathcal{S}$ and $\mathcal{C}$ are the same collection, but in any event $\mathcal{S}$ is closed under various finite set operations, and must at least be a semi-ring. In the applications $\mathcal{S}$ is typically either a field (algebra) or a semi-algebra. Example 1.18 is a good case to keep in mind. However, $\mathcal{S}$ cannot be a $\sigma$-field since at most a finite number of operations are permitted to determine $\mu_0(A)$ for any $A \in \mathcal{S}$. At this point we might pose the opposite question to the one we started with, and ask why $\mathcal{S}$ might not be a rich enough collection for our needs. In fact, events of interest frequently arise which $\mathcal{S}$ cannot contain. 3.15 below illustrates the necessity of being able to go to the limit, and consider events that are expressible only as countably infinite unions or intersections of $\mathcal{C}$-sets. Extending to the events $\mathcal{F} = \sigma(\mathcal{S})$ proves indispensable.

We have two results, establishing existence and uniqueness respectively.

3.8 Extension theorem (existence) Let $\mathcal{S}$ be a semi-ring, and let $\mu_0: \mathcal{S} \mapsto \bar{\mathbb{R}}^+$ be a measure on $\mathcal{S}$. If $\mathcal{F} = \sigma(\mathcal{S})$, there exists a measure $\mu$ on $(\Omega,\mathcal{F})$, such that $\mu(E) = \mu_0(E)$ for each $E \in \mathcal{S}$. □

Although the proof of the theorem is rather lengthy and some of the details are fiddly, the basic idea is simple. Take an event $A \subseteq \Omega$ to which we wish to assign a measure $\mu(A)$. If $A \in \mathcal{S}$, we have $\mu(A) = \mu_0(A)$. If $A \notin \mathcal{S}$, consider choosing a finite or countable covering for $A$ from members of $\mathcal{S}$; that is, a selection of sets $E_j \in \mathcal{S}$, $j = 1,2,3,\ldots$ such that $A \subseteq \bigcup_j E_j$. The object is to find as 'economical' a covering as possible, in the sense that $\sum_j \mu_0(E_j)$ is as small as possible. The outer measure of $A$ is
$$\mu^*(A) = \inf \sum_j \mu_0(E_j), \tag{3.11}$$
where the infimum is taken over all finite and countable coverings of $A$ by $\mathcal{S}$-sets. If no such covering exists, set $\mu^*(A) = \infty$. Clearly, $\mu^*(A) = \mu_0(A)$ for each $A \in \mathcal{S}$. $\mu^*$ is called the outer measure because, for any eligible definition of $\mu(A)$,
$$\mu^*(A) \ge \inf \sum_j \mu(E_j) \ge \inf \mu\left(\bigcup_j E_j\right) \ge \mu(A), \quad \text{for } E_j \in \mathcal{S}. \tag{3.12}$$
The first inequality here is by the stipulation that $\mu(E_j) = \mu_0(E_j)$ for $E_j \in \mathcal{S}$ in the case where a covering exists, or else the majorant side is infinite. The second and third follow by countable subadditivity and monotonicity respectively, because $\mu$ is a measure.

We could also construct a minimal covering for $A^c$ and, at least if the relevant outer measures are finite, define the inner measure of $A$ as $\mu_*(A) = \mu^*(\Omega) - \mu^*(A^c)$. Note that since $\mu(A) = \mu(\Omega) - \mu(A^c)$ and $\mu^*(A^c) \ge \mu(A^c)$ by (3.12),
$$\mu_*(A) \le \mu(A) \le \mu^*(A). \tag{3.13}$$
If $\mu^*(A) = \mu_*(A)$, it would make sense to call this common value the measure of $A$, and say that $A$ is measurable. In fact, we employ a more stringent criterion. A set $A \subseteq \Omega$ is said to be measurable if, for any $B \subseteq \Omega$,
$$\mu^*(A \cap B) + \mu^*(A^c \cap B) = \mu^*(B). \tag{3.14}$$
This yields $\mu^*(A) = \mu_*(A)$ as a special case on putting $B = \Omega$, but remains valid even if $\mu(\Omega) = \infty$.

Let $\mathcal{M}$ denote the collection of all measurable sets, those subsets of $\Omega$ satisfying (3.14). Since $\mu^*(A) = \mu_0(A)$ for $A \in \mathcal{S}$ and $\mu_0(\emptyset) = 0$, putting $A = \emptyset$ in (3.14) gives the trivial equality $\mu^*(B) = \mu^*(B)$. Hence $\emptyset \in \mathcal{M}$, and since the definition implies that $A^c \in \mathcal{M}$ if $A \in \mathcal{M}$, $\Omega \in \mathcal{M}$ too.

The next steps are to determine what properties the set function $\mu^*: \mathcal{M} \mapsto \bar{\mathbb{R}}$ shares with a measure. Clearly,
$$\mu^*(A) \ge 0 \text{ for all } A \subseteq \Omega. \tag{3.15}$$
Another property which follows directly from the definition of $\mu^*$ is monotonicity:
$$A_1 \subseteq A_2 \Rightarrow \mu^*(A_1) \le \mu^*(A_2). \tag{3.16}$$
Our goal is to show that countable additivity also holds for $\mu^*$ in respect of $\mathcal{M}$-sets, but it proves convenient to begin by establishing countable subadditivity.

3.9 Lemma If $\{A_j, j \in \mathbb{N}\}$ is any sequence of subsets of $\Omega$, then
$$\mu^*\left(\bigcup_j A_j\right) \le \sum_j \mu^*(A_j). \tag{3.17}$$
Proof Assume $\mu^*(A_j) < \infty$ for each $j$. (If not, the result is trivial.) For each $j$, let $\{E_{jk}\}$ denote a countable covering of $A_j$ by $\mathcal{S}$-sets, which satisfies
$$\sum_k \mu_0(E_{jk}) < \mu^*(A_j) + 2^{-j}\varepsilon$$
for any $\varepsilon > 0$. Such a collection always exists, by the definition of $\mu^*$. Since $\bigcup_j A_j \subseteq \bigcup_{j,k} E_{jk}$, it follows by definition that
$$\mu^*\left(\bigcup_j A_j\right) \le \sum_{j,k} \mu_0(E_{jk}) < \sum_j \mu^*(A_j) + \varepsilon, \tag{3.18}$$
noting $\sum_{j=1}^{\infty} 2^{-j} = 1$. (3.17) now follows since $\varepsilon$ is arbitrary and the last inequality is strict. ∎

The following is an immediate consequence of the lemma, since subadditivity supplies the reverse inequality to give (3.14).

3.10 Corollary $A$ is measurable if, for any $B \subseteq \Omega$,
$$\mu^*(A \cap B) + \mu^*(A^c \cap B) \le \mu^*(B). \tag{3.19}$$

The following lemma is central to the proof of the extension theorem. It yields countable additivity as a corollary, but also has a wider purpose.

3.11 Lemma $\mathcal{M}$ is a monotone class.

Proof Letting $\{A_j, j \in \mathbb{N}\}$ be an increasing sequence of $\mathcal{M}$-sets converging to $A = \bigcup_j A_j$, we show $A \in \mathcal{M}$. For $n > 1$ and $E \subseteq \Omega$, the definition of an $\mathcal{M}$-set gives
$$\mu^*(A_n \cap E) = \mu^*(A_{n-1} \cap (A_n \cap E)) + \mu^*(A_{n-1}^c \cap (A_n \cap E)) = \mu^*(A_{n-1} \cap E) + \mu^*(B_n \cap E), \tag{3.20}$$
where $B_n = A_n - A_{n-1}$, and the sequence $\{B_j\}$ is disjoint. Put $A_0 = \emptyset$ so that $\mu^*(A_0 \cap E) = 0$; then by induction,
$$\mu^*(A_n \cap E) = \sum_{j=1}^{n} \mu^*(B_j \cap E) \tag{3.21}$$
holds for every $n$. The right-hand side of (3.21) for $n \in \mathbb{N}$ is a monotone real sequence, and $\mu^*(A_n \cap E) \to \mu^*(A \cap E)$ as $n \to \infty$. Now, since $A_n \in \mathcal{M}$,
$$\mu^*(E) = \mu^*(A_n \cap E) + \mu^*(A_n^c \cap E) \ge \mu^*(A_n \cap E) + \mu^*(A^c \cap E), \tag{3.22}$$
using the monotonicity of $\mu^*$ and the fact that $A^c \subseteq A_n^c$. Taking the limit, we have from the foregoing argument that
$$\mu^*(E) \ge \mu^*(A \cap E) + \mu^*(A^c \cap E), \tag{3.23}$$
so that $A \in \mathcal{M}$ by 3.10. For the case of a decreasing sequence, simply move to the complements and argue as above. ∎

Since $\{B_j\}$ is a disjoint sequence, countable additivity emerges as a by-product of the lemma, as the following corollary shows.

3.12 Corollary If $\{B_j\}$ is a disjoint sequence of $\mathcal{M}$-sets,
$$\mu^*\left(\bigcup_j B_j\right) = \sum_j \mu^*(B_j). \tag{3.24}$$

Proof Immediate on putting $E = \Omega$ in (3.21) and letting $n \to \infty$, noting $\bigcup_j B_j = A$. ∎
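On a finite space the outer measure (3.11) and the measurability criterion (3.14) can be computed by exhaustive search, which makes the construction concrete. The sketch below is an illustration under stated assumptions: the four-point space, the semi-ring $\{\emptyset, \{0,1\}, \{2\}, \{3\}\}$, the values in `MU0`, and the function names are all hypothetical and chosen only for the example.

```python
from itertools import combinations

OMEGA = frozenset({0, 1, 2, 3})
# Hypothetical semi-ring with its set function mu_0 (illustration only).
MU0 = {frozenset(): 0, frozenset({0, 1}): 2,
       frozenset({2}): 1, frozenset({3}): 1}

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer(a):
    """mu*(a): cost of the cheapest covering of a by semi-ring members."""
    best = float("inf")  # value assigned if no covering exists
    members = list(MU0)
    for r in range(1, len(members) + 1):
        for cover in combinations(members, r):
            if a <= frozenset().union(*cover):
                best = min(best, sum(MU0[e] for e in cover))
    return best

def is_measurable(a):
    """Criterion (3.14): additivity of mu* across a, for every test set b."""
    return all(outer(a & b) + outer(b - a) == outer(b)
               for b in subsets(OMEGA))

MEASURABLE = [a for a in subsets(OMEGA) if is_measurable(a)]
```

In this example the measurable sets turn out to be exactly the unions of the blocks $\{0,1\}$, $\{2\}$, $\{3\}$. The singleton $\{0\}$ fails the criterion, since the cheapest covering of either $\{0\}$ or $\{1\}$ costs as much as covering $\{0,1\}$.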
Notice how we needed 3.10 in the proof of 3.11, which is why additivity has been derived from subadditivity rather than the other way about.

Proof of 3.8 We have established in (3.15) and (3.24) that $\mu^*$ is a measure for the elements of $\mathcal{M}$. If it can be shown that $\mathcal{F} \subseteq \mathcal{M}$, setting $\mu(A) = \mu^*(A)$ for all $A \in \mathcal{F}$ will satisfy the existence criteria of the theorem. The first step is to show that $\mathcal{S} \subseteq \mathcal{M}$ or, by 3.10, that $A \in \mathcal{S}$ implies
$$\mu^*(E \cap A) + \mu^*(E \cap A^c) \le \mu^*(E) \tag{3.25}$$
for any $E \subseteq \Omega$. Let $\{A_j \in \mathcal{S}\}$ denote a finite or countable covering of $E$ such that $\sum_j \mu_0(A_j) < \mu^*(E) + \varepsilon$, for $\varepsilon > 0$. If no such covering exists, $\mu^*(E) = \infty$ by definition and (3.25) holds trivially. Note that $E \cap A \subseteq \bigcup_j (A_j \cap A)$, and since $\mathcal{S}$ is a semi-ring the sets $A_j \cap A$ are in $\mathcal{S}$. Similarly, $E \cap A^c \subseteq \bigcup_j (A_j \cap A^c)$, and by simple set algebra and the definition of a semi-ring,
$$A_j \cap A^c = A_j - (A_j \cap A) = \bigcup_k C_{jk} \tag{3.26}$$
where the $C_{jk}$ are a finite collection of $\mathcal{S}$-sets, disjoint with each other and also with $A_j \cap A$. Now, applying 3.9 and the fact that $\mu^*(B) = \mu_0(B)$ for $B \in \mathcal{S}$, we find
$$\mu^*(E \cap A) + \mu^*(E \cap A^c) \le \sum_j \mu_0(A_j \cap A) + \sum_j \sum_k \mu_0(C_{jk}) = \sum_j \mu_0(A_j)$$
$$\{\omega: cf(\omega) \le x\} = \begin{cases} \{\omega: f(\omega) \le x/c\}, & c > 0 \\ \{\omega: f(\omega) < x/c\}^c, & c < 0 \\ \Omega, & c = 0 \text{ and } x \ge 0 \\ \emptyset, & c = 0 \text{ and } x < 0 \end{cases} \tag{3.55}$$
where for each of the cases on the right-hand side and each $x/c \in \mathbb{R}$ the sets are in $\mathcal{F}$, proving part (i). If and only if $f + g < x$, there exist $r \in \mathbb{Q}$ such that $f < r < x - g$ (see 1.10). It follows that
$$\{\omega: f(\omega) + g(\omega) < x\} = \bigcup_{r \in \mathbb{Q}} \left(\{\omega: f(\omega) < r\} \cap \{\omega: g(\omega) < x - r\}\right). \tag{3.56}$$
The countable union of $\mathcal{F}$-sets on the right-hand side is an $\mathcal{F}$-set, and since this holds for every $x$, part (ii) also follows by 3.24(i), where in this case it is convenient to generate $\mathcal{B}$ from the open half-lines. ∎

Combining parts (i) and (ii) shows that if $f_1,\ldots,f_n$ are measurable functions so is $\sum_{j=1}^{n} c_j f_j$, where the $c_j$ are constant coefficients.

The measurability of suprema, infima, and limits of sequences of measurable functions is important in many applications, especially the derivation of integrals in Chapter 4. These are the main cases involving the extended line, because of the possibility that sequences in $\mathbb{R}$ are diverging. Such limits lying in $\bar{\mathbb{R}}$ are called extended functions.

3.26 Theorem Let $\{f_n\}$ be a sequence of $\mathcal{F}/\mathcal{B}$-measurable functions. Then $\inf_n f_n$, $\sup_n f_n$, $\liminf_n f_n$, and $\limsup_n f_n$ are $\mathcal{F}/\mathcal{B}$-measurable.

Proof For any $x \in \mathbb{R}$, $\{\omega: f_n(\omega) \le x\} \in \mathcal{F}$ for each $n$ by assumption. Hence
$$\{\omega: \sup_n f_n(\omega) \le x\} = \bigcap_{n=1}^{\infty} \{\omega: f_n(\omega) \le x\} \in \mathcal{F}, \tag{3.57}$$
so that $\sup_n f_n$ is measurable by 3.24(ii). Since $\inf_n f_n = -\sup_n(-f_n)$, we also obtain
$$\{\omega: \inf_n f_n(\omega) < x\} = \{\omega: \sup_n(-f_n(\omega)) > -x\} = \{\omega: \sup_n(-f_n(\omega)) \le -x\}^c = \bigcup_{n=1}^{\infty} \{\omega: f_n(\omega) < x\} \in \mathcal{F}. \tag{3.58}$$
To extend this result from strong to weak inequalities, write
$$\{\omega: \inf_n f_n(\omega) \le x\} = \bigcap_{m=1}^{\infty} \{\omega: \inf_n f_n(\omega) < x + 1/m\} \in \mathcal{F}. \tag{3.59}$$
Similarly to (3.57), we may show
$$\{\omega: \sup_{k \ge n} f_k(\omega) \le x\} = \bigcap_{k \ge n} \{\omega: f_k(\omega) \le x\} \in \mathcal{F}, \tag{3.60}$$
and applying (3.59) to the sequence of functions $g_n = \sup_{k \ge n} f_k$ yields
$$\{\omega: \limsup_n f_n(\omega) \le x\} \in \mathcal{F}. \tag{3.61}$$
In much the same way, we can also show
$$\{\omega: \liminf_n f_n(\omega) \le x\} \in \mathcal{F}. \tag{3.62}$$
The measurability condition of 3.24 is therefore satisfied in each case. ∎
We could add that $\lim_n f_n(\omega)$ exists and is measurable whenever $\limsup_n f_n(\omega) = \liminf_n f_n(\omega)$. This equality may hold only on a subset of $\Omega$, but we say $f_n$ converges a.e. when the complement of this set has measure zero.

The indicator function $1_E(\omega)$ of a set $E \in \mathcal{F}$ takes the value $1_E(\omega) = 1$ when $\omega \in E$, and $1_E(\omega) = 0$ otherwise. Some authors call $1_E$ the characteristic function of $E$. It may also be written as $I_E$ or as $\chi_E$. We now give some useful facts about indicator functions.

3.27 Theorem
(i) $1_E(\omega)$ is $\mathcal{F}/\mathcal{B}$-measurable if and only if $E \in \mathcal{F}$.
(ii) $1_{E^c}(\omega) = 1 - 1_E(\omega)$.
(iii) $1_{\bigcup_i E_i}(\omega) = \sup_i 1_{E_i}(\omega)$.
(iv) $1_{\bigcap_i E_i}(\omega) = \inf_i 1_{E_i}(\omega) = \prod_i 1_{E_i}(\omega)$.

Proof To show (i) note that, for each $B \in \mathcal{B}$,
$$1_E^{-1}(B) = \begin{cases} \Omega, & \text{if } 0 \in B \text{ and } 1 \in B \\ E, & \text{if } 1 \in B,\ 0 \notin B \\ E^c, & \text{if } 0 \in B,\ 1 \notin B \\ \emptyset, & \text{otherwise.} \end{cases} \tag{3.63}$$
These sets are in $\mathcal{F}$ if and only if $E \in \mathcal{F}$. The other parts of the theorem are immediate from the definition. ∎
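Parts (ii) to (iv) of 3.27 are pointwise identities and are easy to confirm by enumeration. The following sketch assumes a hypothetical ten-point space and two arbitrary subsets; the names `indicator`, `OMEGA`, `E1` and `E2` are illustrative only.

```python
def indicator(e):
    """Return the indicator function of the set e."""
    return lambda w: 1 if w in e else 0

OMEGA = set(range(10))          # hypothetical finite sample space
E1, E2 = {1, 2, 3, 4}, {3, 4, 5}

def identities_hold():
    """Check 3.27(ii)-(iv) at every point of OMEGA."""
    i1, i2 = indicator(E1), indicator(E2)
    for w in OMEGA:
        if indicator(OMEGA - E1)(w) != 1 - i1(w):          # (ii)
            return False
        if indicator(E1 | E2)(w) != max(i1(w), i2(w)):     # (iii)
            return False
        if not (indicator(E1 & E2)(w) == min(i1(w), i2(w))
                == i1(w) * i2(w)):                         # (iv)
            return False
    return True
```

For a finite collection the supremum and infimum reduce to `max` and `min`, which is why those built-ins suffice here.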
Indicator functions are the building blocks for more elaborate functions, constructed so as to ensure measurability. A simple function is a $\mathcal{F}/\mathcal{B}$-measurable function $f: \Omega \mapsto \mathbb{R}$ having finite range; that is, it has the form
$$f(\omega) = \sum_{i=1}^{n} \alpha_i 1_{E_i}(\omega) = \alpha_i, \quad \omega \in E_i, \tag{3.64}$$
where $\alpha_1,\ldots,\alpha_n$ are constants and the collection of $\mathcal{F}$-sets $E_1,\ldots,E_n$ is a finite partition of $\Omega$. $\mathcal{F}/\mathcal{B}$-measurability holds because, for any $B \in \mathcal{B}$,
$$f^{-1}(B) = \bigcup_{\alpha_i \in B} E_i \in \mathcal{F}. \tag{3.65}$$
Simple functions are ubiquitous devices in measure and probability theory, because many problems can be solved for such functions rather easily, and then generalized to arbitrary functions by a limiting approximation argument such as the following.

[Fig. 3.1]

3.28 Theorem If $f$ is $\mathcal{F}/\mathcal{B}$-measurable and non-negative, there exists a monotone sequence of $\mathcal{F}/\mathcal{B}$-measurable simple functions $\{f_{(n)}, n \in \mathbb{N}\}$ such that $f_{(n)}(\omega) \uparrow f(\omega)$ for every $\omega \in \Omega$.
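The dyadic construction behind this theorem (spelled out in the proof) can be sketched numerically: $f_{(n)}$ takes the value $(i-1)/2^n$ wherever $(i-1)/2^n \le f < i/2^n$, and the value $n$ wherever $f \ge n$. The function name `simple_approx` and the test function $f(\omega) = \omega^2$ are assumptions made for this illustration.

```python
import math

def simple_approx(f, n):
    """Return f_(n), the n-th dyadic simple approximation to a non-negative f."""
    def f_n(w):
        y = f(w)
        if y >= n:
            return float(n)              # the extra set {f >= n}
        # floor(y * 2^n) equals i - 1 on the set {(i-1)/2^n <= f < i/2^n}
        return math.floor(y * 2 ** n) / 2 ** n
    return f_n

def f(w):
    """Hypothetical non-negative function used for the demonstration."""
    return w * w

GRID = [k / 16 for k in range(33)]       # points of [0, 2]
```

Evaluating `simple_approx(f, n)` on `GRID` for successive `n` shows the values increasing towards `f`, with error below $2^{-n}$ at points where $f < n$.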
Proof For $i = 1,\ldots,n2^n$, consider the sets $E_i = \{\omega: (i-1)/2^n \le f(\omega) < i/2^n\}$. Augment these with the set $E_{n2^n+1} = \{\omega: f(\omega) \ge n\}$. This collection corresponds to an $(n2^n + 1)$-fold partition of $[0,\infty)$ into $\mathcal{B}$-sets, and since $f$ is a function, each $\omega$ maps into one and only one $f(\omega)$, and hence belongs to one and only one $E_i$. The $E_i$ therefore constitute a partition of $\Omega$. Since $f$ is measurable, $E_i \in \mathcal{F}$ for each $i$. Define a simple function $f_{(n)}$ on the $E_i$ by letting $\alpha_i = (i-1)/2^n$, for $i = 1,\ldots,n2^n + 1$. Then $f_{(n)} \le f$, but $f_{(n+1)}(\omega) \ge f_{(n)}(\omega)$ for every $\omega$; incrementing $n$ bisects each interval, and if $f_{(n)}(\omega) = (i-1)/2^n$, $f_{(n+1)}(\omega)$ is equal to either $2(i-1)/2^{n+1} = f_{(n)}(\omega)$, or $(2i-1)/2^{n+1} > f_{(n)}(\omega)$. It follows that the sequence is monotone, and $\lim_{n \to \infty} f_{(n)}(\omega) = f(\omega)$. This holds for each $\omega \in \Omega$. ∎

To extend from non-negative to general functions, one takes the positive and negative parts. Define $f^+ = \max\{f,0\}$ and $f^- = f^+ - f$, so that both $f^+$ and $f^-$ are non-negative functions. Then if $f^+_{(n)}$ and $f^-_{(n)}$ are the non-negative simple approximations to $f^+$ and $f^-$ defined in 3.28, and $f_{(n)} = f^+_{(n)} - f^-_{(n)}$, it is clear that
$$|f - f_{(n)}| \le |f^+ - f^+_{(n)}| + |f^- - f^-_{(n)}| \to 0. \tag{3.66}$$
Fig. 3.1 illustrates the construction for $n = 2$ and the case $\Omega = \mathbb{R}$, so that $f(\omega)$ is a function on the real line.

3.6 Borel Functions
If $f$ is a measurable function, and $g: \mathbb{S} \mapsto \mathbb{T}$; $\mathbb{S} \subseteq \mathbb{R}$, $\mathbb{T} \subseteq \mathbb{R}$ is a function of a real variable, is the composite function $g \circ f$ measurable? The answer to this question is yes if and only if $g$ is a Borel function. Let $\mathcal{B}_{\mathbb{S}} = \{B \cap \mathbb{S}: B \in \mathcal{B}\}$, where $\mathcal{B}$ is the Borel field of $\mathbb{R}$. $\mathcal{B}_{\mathbb{S}}$ is a $\sigma$-field of subsets of $\mathbb{S}$, and $B \cap \mathbb{S}$ is open (closed) in the relative topology on $\mathbb{S}$ whenever $B$ is open (closed) in $\mathbb{R}$ (see 2.1 and 2.3). $\mathcal{B}_{\mathbb{S}}$ is called the Borel field on $\mathbb{S}$. Define $\mathcal{B}_{\mathbb{T}}$ similarly with respect to $\mathbb{T}$. Then $g$ is called a Borel function (i.e., is Borel-measurable) if $g^{-1}(B) \in \mathcal{B}_{\mathbb{S}}$ for all sets $B \in \mathcal{B}_{\mathbb{T}}$.

3.29 Example Consider $g(x) = |x|$. $g^{-1}$ takes each point $x$ of $\mathbb{R}^+$ into the points $x$ and $-x$. For any $B \in \mathcal{B}^+$ (the restriction of $\mathcal{B}$ to $\mathbb{R}^+$) the image under $g^{-1}$ is the set containing the points $x$ and $-x$ for each $x \in B$, which is an element of $\mathcal{B}$. □

3.30 Example Let $g(x) = 1$ if $x$ is rational, 0 otherwise. Note that $\mathbb{Q} \in \mathcal{B}$ (see 3.15), and $g^{-1}$ is defined according to (3.63) with $E = \mathbb{Q}$, so $g$ is Borel-measurable. □

In fact, to construct a 'plausible' non-measurable function is quite difficult, but the obvious case is the following.

3.31 Example Take a set $A \notin \mathcal{B}$; for example, let $A$ be the set $H$ defined in 3.17. Now construct the indicator function $1_A(x): \mathbb{R} \mapsto \{0,1\}$. Since $1_A^{-1}(\{1\}) = A \notin \mathcal{B}$, this function is not measurable. □

Necessary conditions for Borel measurability are hard to pin down, but the following sufficient conditions are convenient.

3.32 Theorem If $g: \mathbb{S} \mapsto \mathbb{T}$ is either (i) continuous or (ii) of bounded variation, it is Borel-measurable.

Proof (i) follows immediately from 3.22 and the definition of a Borel field, since continuity implies that $g^{-1}(B)$ is open (closed) in $\mathbb{S}$ whenever $B$ is open (closed) in $\mathbb{T}$, by 2.17. To prove (ii), consider first a non-decreasing function $h: \mathbb{R} \mapsto \mathbb{R}$, having the property $h(y) \le h(x)$ when $y < x$; if $A = \{y: h(y) \le h(x)\}$, $\sup A = x$ and $A$ is one of $(-\infty,x)$ and $(-\infty,x]$, so the condition of 3.24 is satisfied. So suppose $g$ is non-decreasing on $\mathbb{S}$; applying the last result to any non-decreasing $h$ with the property $h(x) = g(x)$, $x \in \mathbb{S}$, we have also shown that $g$ is Borel-measurable because $g^{-1}(B \cap \mathbb{T}) = h^{-1}(B) \cap \mathbb{S} \in \mathcal{B}_{\mathbb{S}}$, for each $B \cap \mathbb{T} \in \mathcal{B}_{\mathbb{T}}$. Since a function of bounded variation is the difference of two non-decreasing functions by 2.20, the theorem now follows easily by 3.25. ∎

This result lets us add a further case to those of 3.25.

3.33 Theorem If $f$ and $g$ are measurable, so is $fg$.

Proof $fg = \tfrac{1}{2}\left((f + g)^2 - f^2 - g^2\right)$, and the result follows on combining 3.32(i) with 3.25(ii). ∎

The concept of a Borel function extends naturally to Euclidean $n$-spaces, and indeed, to mappings between spaces of different dimension. A vector function $g: \mathbb{S} \mapsto \mathbb{T}$; $\mathbb{S} \subseteq \mathbb{R}^k$, $\mathbb{T} \subseteq \mathbb{R}^m$ is Borel-measurable if $g^{-1}(B) \in \mathcal{B}_{\mathbb{S}}$ for all $B \in \mathcal{B}_{\mathbb{T}}$, where $\mathcal{B}_{\mathbb{S}} = \{B \cap \mathbb{S}: B \in \mathcal{B}^k\}$ and $\mathcal{B}_{\mathbb{T}} = \{B \cap \mathbb{T}: B \in \mathcal{B}^m\}$.

3.34 Theorem If $g$ is continuous, it is Borel-measurable.

Proof By 2.21. ∎

Finally, note the application of 3.21 to these cases.

3.35 Theorem If $\mu$ is a measure on $(\mathbb{R}^k,\mathcal{B}^k)$ and $g: \mathbb{S} \mapsto \mathbb{T}$ is Borel-measurable where $\mathbb{S} \subseteq \mathbb{R}^k$ and $\mathbb{T} \subseteq \mathbb{R}^m$, $\mu g^{-1}$ is a measure on $(\mathbb{T},\mathcal{B}_{\mathbb{T}})$ where
$$\mu g^{-1}(B) = \mu(g^{-1}(B)) \tag{3.67}$$
for each $B \in \mathcal{B}_{\mathbb{T}}$. □

A simple example is where $g$ is the projection of $\mathbb{R}^k$ onto $\mathbb{R}^m$ for $m < k$. If $X$ is $k \times 1$ with partition $X' = (X_*', X_{**}')$, where $X_*$ is $m \times 1$ and $X_{**}$ is $(k - m) \times 1$, let $g: \mathbb{R}^k \mapsto \mathbb{R}^m$ be defined by $g(X) = X_*$. In this case,
$$\mu g^{-1}(B) = \mu(B \times \mathbb{R}^{k-m}) \text{ for } B \in \mathcal{B}^m. \tag{3.68}$$
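The induced measure in (3.67) and its projection form (3.68) can be illustrated with a discrete measure on a few points of the plane. Everything named below (`MU`, `mu`, `g`, `pushforward`, and the point masses) is a hypothetical choice for this sketch; sets are represented by predicates so that cylinders such as $B \times \mathbb{R}$ need no special encoding.

```python
from fractions import Fraction

# Hypothetical discrete measure on four points of R^2 (illustration only).
MU = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
      (1, 0): Fraction(1, 3), (1, 2): Fraction(1, 6)}

def mu(event):
    """Measure of {x: event(x)}, with event a predicate on points of R^2."""
    return sum((m for x, m in MU.items() if event(x)), Fraction(0))

def g(x):
    """Projection of R^2 onto its first coordinate."""
    return x[0]

def pushforward(b):
    """mu g^{-1}(B) = mu({x: g(x) in B}), with b a predicate on R."""
    return mu(lambda x: b(g(x)))

first_is_zero = pushforward(lambda t: t == 0)
cylinder = mu(lambda x: x[0] == 0)   # measure of {0} x R
```

Here `first_is_zero` and `cylinder` coincide, which is the content of (3.68) with $k = 2$, $m = 1$ and $B = \{0\}$.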
4 Integration
4.1 Construction of the Integral

The reader may be familiar with the Riemann integral of a bounded non-negative function $f$ on a bounded interval of the line $[a,b]$, usually written $\int_a^b f\,dx$. The objects to be studied in this chapter represent a heroic generalization of the same idea. Instead of intervals of the line, the integral is defined on an arbitrary measure space.

Suppose $(\Omega,\mathcal{F},\mu)$ is a measure space and $f: \Omega \mapsto \bar{\mathbb{R}}^+$ is a $\mathcal{F}/\bar{\mathcal{B}}$-measurable function into the non-negative, extended real line. The integral of $f$ is defined to be the real-valued functional
$$\int f\,d\mu = \sup \left\{ \sum_i \left( \inf_{\omega \in E_i} f(\omega) \right) \mu(E_i) \right\} \tag{4.1}$$
where the supremum is taken over all finite partitions of $\Omega$ into sets $E_i \in \mathcal{F}$, and the supremum exists. If no supremum exists, the integral is assigned the value $+\infty$. The integral of the function $1_A f$, where $1_A(\omega)$ is the indicator of the set $A \in \mathcal{F}$, is called the integral of $f$ over $A$, and written $\int_A f\,d\mu$.

The expression in (4.1) is sometimes called the lower integral, and denoted $\int_* f\,d\mu$. Likewise defining the upper integral of $f$,
$$\int^* f\,d\mu = \inf \left\{ \sum_i \left( \sup_{\omega \in E_i} f(\omega) \right) \mu(E_i) \right\}, \tag{4.2}$$
we should like these two constructions, approximating $f$ from below and from above, to agree. And indeed, it is possible to show that $\int_* f\,d\mu = \int^* f\,d\mu$ whenever $f$ is bounded and $\mu(\Omega) < \infty$. However, $\int^* f\,d\mu = \infty$ if either the set $\{\omega: f(\omega) > 0\}$ has infinite measure, or $f$ is unbounded on sets of positive measure. Definition (4.1) is preferred because it can yield a finite value in these cases.

4.1 Example A familiar case is the measure space $(\mathbb{R},\mathcal{B},m)$, where $m$ is Lebesgue measure. The integral $\int f\,dm$ where $f$ is a Borel function is the Lebesgue integral of $f$. This is customarily written $\int f\,dx$, reflecting the fact that $m((x, x + dx]) = dx$, even though the sets $\{E_i\}$ in (4.1) need not be intervals. □

4.2 Example Consider a measure space $(\mathbb{R},\mathcal{B},\mu)$ where $\mu$ differs from $m$. The integral $\int f\,d\mu$, where $f$ is a Borel function, is the Lebesgue-Stieltjes integral. The monotone function
$$F(x) = \mu((-\infty, x]) \tag{4.3}$$
has the property $\mu((a,b]) = F(b) - F(a)$, and the measure of the interval $(x, x + dx]$ can be written $dF(x)$. The notation $\int f\,dF$ means exactly the same as $\int f\,d\mu$, the choice between the $\mu$ and $F$ representations being a matter of taste. See §8.2 and §9.1 for details. □

For a contrast with these cases, consider the Riemann-Stieltjes integral. For an interval $[a,b]$, let a partition into subintervals be defined by a set of points $\Pi = \{x_1,\ldots,x_n\}$, with $a = x_0 < x_1 < \ldots < x_n = b$. Another set $\Pi'$ is called a refinement of $\Pi$ if $\Pi \subseteq \Pi'$. Given functions $f$ and $\alpha: \mathbb{R} \mapsto \mathbb{R}$, let
$$S(\Pi,\alpha,f) = \sum_{i=1}^{n} f(t_i)\left(\alpha(x_i) - \alpha(x_{i-1})\right), \tag{4.4}$$
where $t_i \in [x_{i-1}, x_i]$. If there exists a number $\int_a^b f\,d\alpha$, such that for every $\varepsilon > 0$ there is a partition $\Pi_\varepsilon$ with
$$\left| S(\Pi,\alpha,f) - \int_a^b f\,d\alpha \right| < \varepsilon$$
for all $\Pi \supseteq \Pi_\varepsilon$ and every choice of $\{t_i\}$, this is called the Riemann-Stieltjes integral of $f$ with respect to $\alpha$. Recall in this connection the well-known formula for integration by parts, which states that when both integrals exist,
$$f(b)\alpha(b) = f(a)\alpha(a) + \int_a^b f\,d\alpha + \int_a^b \alpha\,df. \tag{4.5}$$
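The sum (4.4) and the integration by parts formula (4.5) lend themselves to a quick numerical check. The sketch below assumes the hypothetical choices $f(x) = x$ and $\alpha(x) = x^2$ on $[0,1]$, a uniform partition, and midpoint tags; the helper names are illustrative and not from the text. For these choices $\int_0^1 f\,d\alpha = 2/3$ and $\int_0^1 \alpha\,df = 1/3$.

```python
def rs_sum(f, alpha, points, tags):
    """S(Pi, alpha, f) of (4.4): points are x_0 < ... < x_n, tags are t_1..t_n."""
    return sum(f(t) * (alpha(x1) - alpha(x0))
               for x0, x1, t in zip(points, points[1:], tags))

def uniform_partition(a, b, n):
    """Equally spaced partition points of [a, b]."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def f(x):
    return x

def alpha(x):
    return x * x

PTS = uniform_partition(0.0, 1.0, 2000)
TAGS = [(x0 + x1) / 2 for x0, x1 in zip(PTS, PTS[1:])]  # midpoint tags

int_f_dalpha = rs_sum(f, alpha, PTS, TAGS)   # approximates 2/3
int_alpha_df = rs_sum(alpha, f, PTS, TAGS)   # approximates 1/3
```

Adding the two sums to $f(0)\alpha(0) = 0$ recovers $f(1)\alpha(1) = 1$ up to discretization error, as (4.5) requires.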
When $\alpha = x$ and $f$ is bounded this definition yields the ordinary Riemann integral, and when it exists, this always agrees with the Lebesgue integral of $f$ over $[a,b]$. Moreover, if $\alpha$ is an increasing function of the form in (4.3), this integral is equal to the Lebesgue-Stieltjes integral whenever it is defined. There do exist bounded, measurable functions which are not Riemann-integrable (consider 3.30 for example) so that even for bounded intervals the Lebesgue integral is the more inclusive concept. However, the Riemann-Stieltjes integral is defined for more general classes of integrator function. In particular, if $f$ is continuous it exists for $\alpha$ of bounded variation on $[a,b]$, not necessarily monotone. These integrals therefore fall outside the class defined by (4.1), although note that when $\alpha$ is of bounded variation, having a representation as the difference of two increasing functions, the Riemann-Stieltjes integral is the difference between a pair of Lebesgue-Stieltjes integrals on $[a,b]$.

The best way to understand the general integral is not to study a particular measure space, such as the line, but to restrict attention initially to particular classes of function. The simplest possible case is the indicator of a set. Then, every partition $\{E_i\}$ yields the same value for the sum of terms in (4.1), which is
$$\int_A d\mu = \int 1_A\,d\mu = \mu(A), \tag{4.6}$$
for any $A \in \mathcal{F}$. Note that if $A \notin \mathcal{F}$, the integral is undefined. Another case of much importance is the following.

4.3 Theorem If $f = 0$ a.e.$[\mu]$, then $\int f\,d\mu = 0$.

Proof The theorem says there exists $C \subseteq \Omega$ with $\mu(C^c) = 0$, such that $f(\omega) = 0$ for $\omega \in C$. For any partition $\{E_1,\ldots,E_n\}$ let $E_i' = E_i \cap C$, and $E_i'' = E_i - E_i'$. By additivity of $\mu$,
$$\sum_i \left( \inf_{\omega \in E_i} f(\omega) \right) \mu(E_i) = \sum_i \left( \inf_{\omega \in E_i} f(\omega) \right) \mu(E_i') + \sum_i \left( \inf_{\omega \in E_i} f(\omega) \right) \mu(E_i'') = 0, \tag{4.7}$$
the first sum of terms disappearing because $f(\omega) = 0$, and the second disappearing by 3.6(i) since $\mu(E_i'') \le \mu(C^c) = 0$ for each $i$. ∎

A class of functions for which evaluation of the integral is simple, as their name suggests, is the non-negative simple functions.

4.4 Theorem Let [...] and $\int g\,d\mu = \sum_i \int g_i\,d\mu$ by (4.27).

4.3 Product Measure and Multiple Integrals
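Let $(\Omega,\mathcal{F},\mu)$ and $(\Xi,\mathcal{G},\nu)$ be measure spaces. In general, $(\Omega \times \Xi, \mathcal{F} \otimes \mathcal{G}, \pi)$ might also be a measure space, with $\pi$ a measure on the sets of $\mathcal{F} \otimes \mathcal{G}$. In this case measures $\mu$ and $\nu$, defined by $\mu(F) = \pi(F \times \Xi)$ and $\nu(G) = \pi(\Omega \times G)$ respectively, are called the marginal measures corresponding to $\pi$. Alternatively, suppose that $\mu$ and $\nu$ are given, and define the set function $\pi: \mathcal{R}_{\mathcal{F}\mathcal{G}} \mapsto \bar{\mathbb{R}}^+$, where $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ denotes the measurable rectangles of the space $\Omega \times \Xi$, by
$$\pi(F \times G) = \mu(F)\nu(G). \tag{4.28}$$
We will show that $\pi$ is a measure on $\mathcal{R}_{\mathcal{F}\mathcal{G}}$, called the product measure, and has an extension to $\mathcal{F} \otimes \mathcal{G}$, so that $(\Omega \times \Xi, \mathcal{F} \otimes \mathcal{G}, \pi)$ is indeed a measure space.

The first step in this demonstration is to define the mapping $T_\omega: \Xi \mapsto \Omega \times \Xi$ by $T_\omega(\xi) = (\omega,\xi)$, so that, for $G \in \mathcal{G}$, $T_\omega(G) = \{\omega\} \times G$. For $E \in \mathcal{F} \otimes \mathcal{G}$, let
$$E_\omega = T_\omega^{-1}(E) = \{\xi: (\omega,\xi) \in E\} \subseteq \Xi. \tag{4.29}$$
The set $E_\omega$ can be thought of as the cross-section through $E$ at the element $\omega$. For any countable collection of $\mathcal{F} \otimes \mathcal{G}$-sets $\{E_j, j \in \mathbb{N}\}$,
$$\left(\bigcup_j E_j\right)_\omega = \left\{\xi: (\omega,\xi) \in \bigcup_j E_j\right\} = \bigcup_j \{\xi: (\omega,\xi) \in E_j\} = \bigcup_j (E_j)_\omega. \tag{4.30}$$
For future reference, note the following.

4.15 Lemma $T_\omega$ is a $\mathcal{G}/(\mathcal{F} \otimes \mathcal{G})$-measurable mapping for each $\omega \in \Omega$.

Proof We must show that $E_\omega \in \mathcal{G}$ whenever $E \in \mathcal{F} \otimes \mathcal{G}$. If $E = F \times G$ for $F \in \mathcal{F}$ and $G \in \mathcal{G}$, it is obvious that
$$E_\omega = \begin{cases} G, & \omega \in F \\ \emptyset, & \omega \notin F \end{cases} \in \mathcal{G}. \tag{4.31}$$
Since $\mathcal{F} \otimes \mathcal{G} = \sigma(\mathcal{R}_{\mathcal{F}\mathcal{G}})$, the lemma follows by 3.22. ∎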
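Cross-sections are simple to compute when a subset of the product space is stored as a set of pairs. The sketch below uses hypothetical finite sets and an illustrative function name `section`; it checks the rectangle case (4.31) and the union rule (4.30).

```python
def section(e, w):
    """E_omega = {xi: (omega, xi) in E}, for e a set of (omega, xi) pairs."""
    return {xi for (o, xi) in e if o == w}

# Hypothetical rectangle F x G and two further sets of pairs.
F, G = {1, 2}, {"a", "b"}
RECT = {(o, xi) for o in F for xi in G}
E1 = {(1, "a"), (1, "b"), (2, "a")}
E2 = {(2, "c")}

# Section of a union equals the union of the sections.
union_section = section(E1 | E2, 2)
sections_union = section(E1, 2) | section(E2, 2)
```

For the rectangle, the section is $G$ when $\omega \in F$ and empty otherwise, exactly as in (4.31).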
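The second step is to show the following.

4.16 Theorem $\pi$ is a measure on $\mathcal{R}_{\mathcal{F}\mathcal{G}}$.

Proof Clearly $\pi$ is non-negative, and $\pi(\emptyset) = 0$, recalling that $F \times \emptyset = \emptyset \times G = \emptyset$ for any $F \in \mathcal{F}$ or $G \in \mathcal{G}$, and applying (4.28). It remains to show countable additivity. Let $\{E_j \in \mathcal{R}_{\mathcal{F}\mathcal{G}}, j \in \mathbb{N}\}$ be a disjoint collection, such that there exist sets $F_j \in \mathcal{F}$ and $G_j \in \mathcal{G}$ with $E_j = F_j \times G_j$; and also suppose $E = \bigcup_j E_j \in \mathcal{R}_{\mathcal{F}\mathcal{G}}$, such that there exist sets $F$ and $G$ with $E = F \times G$. Any point $(\omega,\xi) \in F \times G$ belongs to one and only one of the sets $F_j \times G_j$, so that for any $\omega \in F$, the sets of the subcollection $\{G_j\}$ for which $\omega \in F_j$ must constitute a partition of $G$. Hence, applying (4.30) and (4.31),
$$\nu(E_\omega) = \nu\left(\bigcup_j (E_j)_\omega\right) = \sum_j \nu((E_j)_\omega) = \sum_j \nu(G_j) 1_{F_j}(\omega), \tag{4.32}$$
where the additivity of $\nu$ can be applied since the sets $G_j$ appearing in this decomposition are disjoint. Since we can also write $\nu(E_\omega) = \nu(G) 1_F(\omega)$, we find
$$\pi(E) = \mu(F)\nu(G) = \int \nu(E_\omega)\,d\mu(\omega) = \int \left(\sum_j \nu(G_j) 1_{F_j}(\omega)\right) d\mu(\omega) = \sum_j \nu(G_j)\mu(F_j) = \sum_j \pi(E_j) \tag{4.33}$$
as required, where the penultimate equality is by 4.14. ∎

It is now straightforward to extend the measure from $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ to $\mathcal{F} \otimes \mathcal{G}$.

4.17 Theorem $(\Omega \times \Xi, \mathcal{F} \otimes \mathcal{G}, \pi)$ is a measure space.

Proof $\mathcal{F}$ and $\mathcal{G}$ are $\sigma$-fields and hence semi-rings; hence $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ is a semi-ring by 3.19. The theorem follows from 4.16 and 3.8. ∎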
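The identity $\pi(E) = \int \nu(E_\omega)\,d\mu(\omega)$ used in the proof of 4.16 can be checked directly when both factor spaces are finite, since integrals then reduce to weighted sums. The weights and the names `MU`, `NU`, `pi` and `iterated` below are hypothetical choices for this sketch.

```python
from fractions import Fraction

# Hypothetical point masses on two finite spaces (illustration only).
MU = {"w1": Fraction(1, 3), "w2": Fraction(2, 3)}
NU = {"x1": Fraction(1, 2), "x2": Fraction(1, 4), "x3": Fraction(1, 4)}

def pi(e):
    """Product measure of a set of pairs: sum of mu({w}) * nu({xi})."""
    return sum((MU[w] * NU[xi] for (w, xi) in e), Fraction(0))

def iterated(e):
    """Sum over w of mu({w}) * nu(E_w): the integral of nu(E_omega) d mu."""
    total = Fraction(0)
    for w, mass in MU.items():
        sec = {xi for (o, xi) in e if o == w}       # cross-section E_w
        total += mass * sum((NU[xi] for xi in sec), Fraction(0))
    return total

RECT = {("w1", "x1"), ("w1", "x3")}                 # {w1} x {x1, x3}
E = {("w1", "x1"), ("w2", "x2"), ("w2", "x3")}      # not a rectangle
```

For the rectangle, both functions return $\mu(F)\nu(G)$ as in (4.28); for the non-rectangular set `E` they still agree, in line with the extension of $\pi$ beyond the measurable rectangles.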
Iterating the preceding arguments (i.e. letting (.Q,r:f) and/or (3,§') be product spaces) allows the concept to be extended to products of higher order. In later chapters, product probability measures will embody the intuitive notion of statis tical independence, although this is by no means the only application we shall meet. The following case has a familiar geometrical interpretation. 4.18 Example Lebesgue measure in the plane, IR 2 = IR x IR , is defined for intervals by
(4.34) Here the measurable rectangles include the actual geometrical rectangles (products of intervals), and 'B2 , the Borel sets of the plane, is generated from these as a consequence of 3.20. By the foregoing reasoning, (IR 2 ,:lf,m) is a measure space in which the measure of a set is given by its area. o We now construct integrals of functions f( co,�) on the product space. The follow ing lemma is a natural extension of 4.15, for it considers what we might think of as a cross-section through the mapping at a point co E .Q, yielding a function with domain 3. 4.19 Lemma Let j: .Q x 3 r-7 1R be r::F ® §'/'B-measurable. Define fw(�) = f(co,�) for fixed co E .Q. Then fw: 3 f--7 IR is §'/'B-measurable. Proof We can write
$$f_\omega(\xi) = f(\omega,\xi) = (f \circ T_\omega)(\xi). \qquad (4.35)$$

By 4.15 and the remarks following 3.22, the composite function $f \circ T_\omega$ is $\mathcal{G}/\mathcal{B}$-measurable. ∎

Suppose we are able to integrate $f_\omega$ with respect to $\nu$ over $\Xi$. There are two questions of interest that arise here. First, is the resulting function $g(\omega) = \int_\Xi f_\omega\,d\nu$ $\mathcal{F}/\mathcal{B}$-measurable? And second, if $g$ is now integrated over $\Omega$, what is the relationship between this integral and the integral $\int_{\Omega \times \Xi} f\,d\pi$ over $\Omega \times \Xi$? The affirmative answer to the first of these questions, and the fact that the 'iterated' integral is identical with the 'double' integral where these exist, are the most important results for product spaces, known jointly as the Fubini theorem. Since iterated integration is an operation we tend to take for granted with multiple Riemann integrals, perhaps the main point needing to be stressed here is that this convenient property of product measures (and multivariate Lebesgue measure in particular) does not generalize to arbitrary measures on product spaces.

The first step is to let $f$ be the indicator of a set $E \in \mathcal{F} \otimes \mathcal{G}$. In this case $f_\omega$ is the indicator of the set $E_\omega$ defined in (4.29), and
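$$\int_\Xi f_\omega\,d\nu = \nu(E_\omega) = g_E(\omega), \qquad (4.36)$$

say. In view of 4.15, $E_\omega \in \mathcal{G}$ and the function $g_E: \Omega \mapsto \overline{\mathbb{R}}^+$ is well-defined, although, unless $\nu$ is a finite measure, it may take its values in the extended half line, as shown.

4.20 Lemma Let $\mu$ and $\nu$ be $\sigma$-finite. For all $E \in \mathcal{F} \otimes \mathcal{G}$, $g_E$ is $\mathcal{F}/\mathcal{B}$-measurable and

$$\int_\Omega g_E\,d\mu = \pi(E). \qquad (4.37)$$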
By implication, the two sides of the equality in (4.37) are either both infinite, or finite and equal.

Proof Assume first that the measures are finite. The theorem is proved for this case using the $\pi$-$\lambda$ theorem. Let $\mathcal{A}$ denote the collection of sets $E$ such that $g_E$ satisfies (4.37). $\mathcal{R}_{\mathcal{FG}} \subseteq \mathcal{A}$, since if $E = F \times G$ then, by (4.31),

$$g_E(\omega) = \nu(G)\,1_F(\omega), \quad F \in \mathcal{F}, \qquad (4.38)$$

and $\int_\Omega g_E\,d\mu = \mu(F)\nu(G) = \pi(E)$ as required. We now show $\mathcal{A}$ is a $\lambda$-system. Clearly $\Omega \times \Xi \in \mathcal{A}$, so 1.25(a) holds. If $E_1, E_2 \in \mathcal{A}$ and $E_1 \subset E_2$, then, since $1_{E_2 - E_1} = 1_{E_2} - 1_{E_1}$,

$$g_{E_2 - E_1}(\omega) = g_{E_2}(\omega) - g_{E_1}(\omega), \qquad (4.39)$$

an $\mathcal{F}/\mathcal{B}$-measurable function by 3.25, and so, by additivity of $\pi$,

$$\int_\Omega g_{E_2 - E_1}\,d\mu(\omega) = \pi(E_2) - \pi(E_1) = \pi(E_2 - E_1), \qquad (4.40)$$

showing that $\mathcal{A}$ satisfies 1.25(b). Finally, if $A_1$ and $A_2$ are disjoint so are $(A_1)_\omega$ and $(A_2)_\omega$, and $g_{A_1 \cup A_2}(\omega) = g_{A_1}(\omega) + g_{A_2}(\omega)$. To establish 1.25(c), let $\{E_j \in \mathcal{A},\ j \in \mathbb{N}\}$ be a monotone sequence, with $E_j \uparrow E$. Define the disjoint collection $\{A_j\}$ with $A_1 = E_1$ and $A_j = E_j - E_{j-1}$, $j > 1$, so that $E = \bigcup_{j=1}^\infty A_j$ and $A_j \in \mathcal{A}$ by (4.39). By countable additivity of $\nu$, $g_E(\omega) = \sum_j g_{A_j}(\omega)$. This is $\mathcal{F}/\mathcal{B}$-measurable by 3.26,
[...] $> 0$, and if $x \in B$, $\exists\ S(x,\varepsilon_B) \subseteq B$ similarly, with $\varepsilon_B > 0$. If $x \in A \cap B$, $S(x,\varepsilon) \subseteq A \cap B$ for $\varepsilon = \min\{\varepsilon_A,\varepsilon_B\} > 0$. ∎

The important thing to bear in mind is that openness is not preserved under arbitrary intersections.

A closure point of a set $A$ is a point $x \in \mathbb{S}$ (not necessarily belonging to $A$) such that for all $\delta > 0$ $\exists\ y \in A$ with $d(x,y) < \delta$. The set of closure points of $A$, denoted $\bar{A}$, is called the closure of $A$. Closure points are also called adherent points, 'sticking to' a set though not necessarily belonging to it. If for some $\delta > 0$ the definition of a closure point is satisfied only for $y = x$, so that $S(x,\delta) \cap A = \{x\}$, $x$ is said to be an isolated point of $A$. A boundary point of $A$ is a point $x \in \bar{A}$, such that for all $\delta > 0$ $\exists\ z \in A^c$ with $d(x,z) < \delta$. The set of boundary points of $A$ is denoted $\partial A$, and $\bar{A} = A \cup \partial A$. The interior of $A$ is $A^o = A - \partial A$. A closed set is one containing all its closure points, such that $\bar{A} = A$. An open set does not contain all of its closure points, since the boundary points do not belong to the set. The empty set $\emptyset$ and the space $\mathbb{S}$ are both open and closed. A subset $B$ of $A$ is said to be dense in $A$ if $B \subset A \subseteq \bar{B}$.

A collection of sets $\mathcal{C}$ is called a covering for $A$ if $A \subseteq \bigcup_{B \in \mathcal{C}} B$. If each $B$ is open, it is called an open covering. A set $A$ is called compact if every open covering of $A$ contains a finite subcovering. $A$ is said to be relatively compact if $\bar{A}$ is compact. If $\mathbb{S}$ is itself compact, $(\mathbb{S},d)$ is said to be a compact space. The remarks in §2.1 about compactness in $\mathbb{R}$ are equally relevant to the general case. $A$ is said to be bounded if $\exists\ x \in A$ and $0 < r < \infty$ such that $A \subseteq S(x,r)$; and also
totally bounded (or precompact) if for every $\varepsilon > 0$ there exists a finite collection of points $x_1,\ldots,x_m$ (called an $\varepsilon$-net) such that the spheres $S(x_i,\varepsilon)$, $i = 1,\ldots,m$, form a covering for $A$. The $S(x_i,\varepsilon)$ can be replaced in this definition by their closures $\bar{S}(x_i,\varepsilon)$, noting that $\bar{S}(x_i,\varepsilon)$ is contained in $S(x_i,\varepsilon + \delta)$ for all $\delta > 0$. The points of the $\varepsilon$-net need not be elements of $A$. An attractive mental image is a region of $\mathbb{R}^2$ covered with little cocktail umbrellas of radius $\varepsilon$ (Fig. 5.2). Any set that is totally bounded is also bounded. In certain cases such as $(\mathbb{R}^n,d_E)$ the converse is also true, but this is not true in general.

Fig. 5.2

5.5 Theorem If a set is relatively compact, it is totally bounded.

Proof Let $A$ be relatively compact, and consider the covering of $\bar{A}$ consisting of the $\varepsilon$-balls $S(x,\varepsilon)$ for all $x \in \bar{A}$. By the definition this contains a finite subcover $S(x_i,\varepsilon)$, $i = 1,\ldots,m$, which also covers $A$. Then $\{x_1,\ldots,x_m\}$ is an $\varepsilon$-net for $A$, and the theorem follows since $\varepsilon$ is arbitrary. ∎

The converse is true only when the space is complete; see 5.13.
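The definition of an $\varepsilon$-net can be tried out numerically. The sketch below builds a net greedily for a finite sample of points from the unit square under the Euclidean metric; the sample, the function name `greedy_eps_net` and the chosen radii are illustrative assumptions, and the greedy rule is only one of many ways to obtain a net.

```python
import math
import random


def greedy_eps_net(points, eps):
    """Pick centres so that every point lies within eps of some centre.

    A point becomes a new centre only if it is at distance >= eps from
    all centres chosen so far; any point that is skipped is therefore
    already within eps of an existing centre.
    """
    net = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in net):
            net.append(p)
    return net


random.seed(0)
# A finite sample standing in for a bounded region of the plane.
A = [(random.random(), random.random()) for _ in range(2000)]

for eps in (0.5, 0.25, 0.1):
    net = greedy_eps_net(A, eps)
    covered = all(any(math.dist(p, c) < eps for c in net) for p in A)
    print(eps, len(net), covered)
```

The number of umbrellas needed grows as $\varepsilon$ shrinks, but it stays finite for each fixed $\varepsilon$, which is the content of total boundedness.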
5.2 Separability and Completeness
In thinking about metric spaces, it is sometimes helpful to visualize the analogue problem for $\mathbb{R}$, or at most for $\mathbb{R}^n$ with $n \le 3$, and use one's intuitive knowledge of those cases. But this trick can be misleading if the space in question is too alien to geometrical intuition.

A metric space is said to be separable if it contains a countable, dense subset. Separability is one of the properties that might be considered to characterize an '$\mathbb{R}$-like' space. The rational numbers $\mathbb{Q}$ are countable and dense in $\mathbb{R}$, so $\mathbb{R}$ is separable, as is $\mathbb{R}^n$. An alternative definition of a separable metric space is a metric space for which the Lindelöf property holds (see 2.7). This result can be given in the following form.

5.6 Theorem In a metric space $\mathbb{S}$ the following three properties are equivalent:
(a) $\mathbb{S}$ is separable.
(b) Every open set $A \subseteq \mathbb{S}$ has the representation
$$A = \bigcup_{i=1}^{\infty} B_i, \quad B_i \in \mathcal{V}, \qquad (5.9)$$
where $\mathcal{V}$ is a countable collection of open spheres in $\mathbb{S}$.
(c) Every open cover of a set in $\mathbb{S}$ has a countable subcover. □

A collection $\mathcal{V}$ with property (b) is called a base of $\mathbb{S}$, so that separability is equated in this theorem with the existence of a countable base for the space. In topology this property is called second-countability (see §6.2). (c) is the Lindelöf property.

Proof We first show that (a) implies (b). Let $\mathcal{V}$ be the countable collection of spheres $\{S(x,r): x \in D,\ r \in \mathbb{Q}^+\}$, where $D$ is a countable, dense subset of $\mathbb{S}$, and $\mathbb{Q}^+$ is the set of positive rationals. If $A$ is an open subset of $\mathbb{S}$, then for each $x \in A$, $\exists\ \delta > 0$ such that $S(x,\delta) \subseteq A$. For any such $x$, choose $x_i \in D$ such that $d(x_i,x) < \delta/2$ (possible since $D$ is dense) and then choose rational $r_i$ to satisfy $d(x_i,x) < r_i < \delta/2$. Define $B_i = S(x_i,r_i) \in \mathcal{V}$, and observe that

$$x \in B_i \subseteq S(x,\delta) \subseteq A. \qquad (5.10)$$

Since $\mathcal{V}$ as a whole is countable, the subcollection $\{B_i\}$ of all the sets that satisfy this condition for at least one $x \in A$ is also countable, and clearly $A \subseteq \bigcup_i B_i \subseteq A$, so $A = \bigcup_i B_i$.

Next we show that (b) implies (c). Since $\mathcal{V}$ is countable we may index its elements as $\{V_j,\ j \in \mathbb{N}\}$. If $\mathcal{C}$ is any collection of open sets covering $A$, choose a subcollection $\{C_j,\ j \in \mathbb{N}\}$, where $C_j$ is a set from $\mathcal{C}$ which contains $V_j$ if such exists, otherwise let $C_j = \emptyset$. There exists a covering of $A$ by $\mathcal{V}$-sets, as just shown, and each $V_j$ can itself be covered by other elements of $\mathcal{V}$ with smaller radii, so that by taking small enough spheres we may always find an element of $\mathcal{C}$ to contain them. Thus $A \subseteq \bigcup_j C_j$, and the Lindelöf property holds.

Finally, to show that (c) implies (a), consider the open cover of $\mathbb{S}$ by the sets $\{S(x,1/n),\ x \in \mathbb{S}\}$. If there exists for each $n$ a countable subcover $\{S(x_{nk},1/n),\ k \in \mathbb{N}\}$, then for every $x \in \mathbb{S}$ there is an index $k$ such that $d(x,x_{nk}) < 1/n$. Since this is true for every $n$, the countable set $\{x_{nk},\ k \in \mathbb{N},\ n \in \mathbb{N}\}$ must be dense in $\mathbb{S}$. This completes the proof.
∎

The theorem has a useful corollary.

5.7 Corollary A totally bounded space is separable. □

Another important property is that subspaces of separable spaces are separable, which we show as follows.

5.8 Theorem If $(\mathbb{S},d)$ is a separable space and $A \subset \mathbb{S}$, then $(A,d)$ is separable.

Proof Suppose $D$ is countable and dense in $\mathbb{S}$. Construct the countable set $E$ by taking one point from each nonempty set

$$A \cap S(x,r), \quad x \in D,\ r \in \mathbb{Q}^+. \qquad (5.11)$$
For any $x \in A$ and $\delta > 0$, we may choose $y \in D$ such that $d(x,y) < \delta/2$. For every such $y$, $\exists\ z \in E$ satisfying $z \in A \cap S(y,r)$ for a rational $r$ with $d(x,y) < r < \delta/2$ (the set is nonempty, since it contains $x$), so that $d(y,z) < \delta/2$. Thus

$$d(x,z) \le d(x,y) + d(y,z) < \delta, \qquad (5.12)$$

and since $x$ and $\delta$ are arbitrary it follows that $E$ is dense in $A$. ∎

This argument does not rule out the possibility that $A$ and $D$ are disjoint. The separability of the irrational numbers, $\mathbb{R} - \mathbb{Q}$, is a case in point.

On the other hand, certain conditions are incompatible with separability. A subset $A$ of a metric space $(\mathbb{S},d)$ is discrete if for each $x \in A$, $\exists\ \delta > 0$ such that $(S(x,\delta) - \{x\}) \cap A$ is empty. In other words, each element is an isolated point. The integers $\mathbb{Z}$ are a discrete set of $(\mathbb{R},d_E)$, for example. If $\mathbb{S}$ is itself discrete, the discrete metric $d_D$ is equivalent to $d$.

5.9 Theorem If a metric space contains an uncountable discrete subset, it is not separable.

Proof This is immediate from 5.6. Let $A$ be discrete, and consider the open set $\bigcup_{x \in A} S(x,\varepsilon_x)$, where $\varepsilon_x$ is chosen small enough that the specified spheres form a disjoint collection. This is an open cover of $A$, and if $A$ is uncountable it has no countable subcover. ∎

The separability question arises when we come to define measures on metric spaces (see Chapter 26). Unless a space is separable, we cannot be sure that all of its Borel sets are measurable. The space $D[a,b]$ discussed below (5.27) is an important example of this difficulty.

The concepts of sequence, limit, subsequence, and cluster point all extend from $\mathbb{R}$ to general metric spaces. A sequence $\{x_n\}$ of points in $(\mathbb{S},d)$ is said to converge to a limit $x$ if for all $\varepsilon > 0$ there exists $N_\varepsilon \ge 1$ such that

$$d(x_n,x) < \varepsilon \ \text{ for all } n > N_\varepsilon. \qquad (5.13)$$

Theorems 2.12 and 2.13 extend in an obvious way, as follows.

5.10 Theorem Every sequence on a compact subset of $\mathbb{S}$ has one or more cluster points. □

5.11 Theorem If a sequence on a compact subset of $\mathbb{S}$ has a unique cluster point, then it converges. □

The notion of a Cauchy sequence also remains fundamental.
A sequence $\{x_n\}$ of points in a metric space $(\mathbb{S},d)$ is a Cauchy sequence if for all $\varepsilon > 0$, $\exists\ N_\varepsilon$ such that $d(x_n,x_m) < \varepsilon$ whenever $n > N_\varepsilon$ and $m > N_\varepsilon$. The novelty is that Cauchy sequences in a metric space do not always possess limits. It is possible that the point on which the sequence is converging lies outside the space. Consider the space $(\mathbb{Q},d_E)$. The sequence $\{x_n\}$, where $x_n = 1 + 1 + 1/2 + 1/6 + \ldots + 1/n! \in \mathbb{Q}$, is a Cauchy sequence since $|x_{n+m} - x_n| \le 2/(n+1)! \to 0$ for every $m \ge 1$; but of course, $x_n \to e$ (the base of the natural logarithms), an irrational number. A metric space $(\mathbb{S},d)$ is said to be complete if it contains the limits of all Cauchy sequences defined on it. $(\mathbb{R},d_E)$ is a complete space, while $(\mathbb{Q},d_E)$ is not.

Although compactness is a primitive notion which does not require the concept of a Cauchy sequence, we can nevertheless define it, following the idea in 2.12, in terms of the properties of sequences. This is often convenient from a practical point of view.

5.12 Theorem The following statements about a metric space $(\mathbb{S},d)$ are equivalent:
(a) $\mathbb{S}$ is compact.
(b) Every sequence in $\mathbb{S}$ has a cluster point in $\mathbb{S}$.
(c) $\mathbb{S}$ is totally bounded and complete. □

Notice the distinction between completeness and compactness. In a complete space all Cauchy sequences converge, which says nothing about the behaviour of non-Cauchy sequences. But in a compact space, which is also totally bounded, all sequences contain Cauchy subsequences which converge in the space.

Proof We show in turn that (a) implies (b), (b) implies (c), and (c) implies (a). Suppose $\mathbb{S}$ is compact. Let $\{x_n,\ n \in \mathbb{N}\}$ be a sequence in $\mathbb{S}$, and define a decreasing sequence of subsets of $\mathbb{S}$ by $B_n = \overline{\{x_k: k \ge n\}}$. The sets $B_n$ are closed, and the cluster points of the sequence, if any, compose the set $C = \bigcap_{n=1}^\infty B_n = \big(\bigcup_{n=1}^\infty B_n^c\big)^c$. If $C = \emptyset$, $\mathbb{S} = \bigcup_{n=1}^\infty B_n^c$, so that the open sets $B_n^c$ are a cover for $\mathbb{S}$, and by assumption these contain a finite subcover. This means that, for some $m < \infty$, $\mathbb{S} \subseteq \bigcup_{n=1}^m B_n^c = \big(\bigcap_{n=1}^m B_n\big)^c = B_m^c$. This leads to the contradiction $B_m = \emptyset$, so that $C$ must be nonempty. Hence, (a) implies (b).

Now suppose that every sequence has a cluster point in $\mathbb{S}$. Considering the case of Cauchy sequences, it is clear that the space is complete; it remains to show that it is totally bounded. Suppose not: then there must exist an $\varepsilon > 0$ for which no $\varepsilon$-net exists; in other words, for every finite $n$ and points $\{x_1,\ldots,x_n\}$ there is a further point $x_{n+1} \in \mathbb{S}$ with $d(x_{n+1},x_j) \ge \varepsilon$ for all $j \le n$. Proceeding in this way and letting $n \to \infty$, we obtain a sequence with $d(x_j,x_k) \ge \varepsilon$ for all $j \ne k$. We have found a sequence with no cluster point, which is again a contradiction. Hence, (b) implies (c).

Finally, let $\mathcal{C}$ be an arbitrary open cover of $\mathbb{S}$. We assume that $\mathcal{C}$ contains no finite subcover of $\mathbb{S}$, and obtain a contradiction. Since $\mathbb{S}$ is totally bounded it must possess for each $n \ge 1$ a finite cover of the form

$$B_{ni} = S(x_{ni},1/2^n), \quad i = 1,\ldots,k_n. \qquad (5.14)$$

Fixing $n$, choose an $i$ for which $B_{ni}$ has no finite cover by $\mathcal{C}$-sets (at least one such exists by hypothesis) and call this set $D_n$. For $n > 1$, $\{B_{ni}\}_{i=1}^{k_n}$ is also a covering for $D_{n-1}$, and we can choose $D_n$ so that $D_n \cap D_{n-1}$ has no finite subcover by $\mathcal{C}$-sets, and accordingly is nonempty. Thus, choose a sequence of points $\{x_n \in D_n,\ n \in \mathbb{N}\}$. Since $D_n$ is a ball of radius $1/2^n$ and contains $x_n$, $D_{n+1}$ is of radius $1/2^{n+1}$ and contains $x_{n+1}$, and the two balls intersect, the triangle inequality implies that $d(x_n,x_{n+1}) < 3/2^n$. Hence

$$d(x_n,x_{n+m}) \le 3\sum_{j=0}^{m-1} 2^{-(n+j)} < 6/2^n \to 0 \ \text{ as } n \to \infty,$$

so $\{x_n\}$ is a Cauchy sequence and, $\mathbb{S}$ being complete, converges to a limit $x \in \mathbb{S}$. Choose a set $A \in \mathcal{C}$ containing $x$; since $A$ is open, $S(x,\varepsilon) \subseteq A$ for some $\varepsilon > 0$. For any $n$, $D_n$ has radius $1/2^n$ and contains $x_n$, and $d(x_n,x) \le 6/2^n$, so every point of $D_n$ lies within $8/2^n$ of $x$. Choosing $n$ large enough that $9/2^n < \varepsilon$ therefore ensures that $D_n \subseteq S(x,\varepsilon)$. But this means $D_n \subseteq A$, which is a contradiction since $D_n$ has no finite cover by $\mathcal{C}$-sets. Hence $\mathcal{C}$ contains a finite subcover, and (c) implies (a). ∎
In complete spaces, the set properties of relative compactness and precompactness are identical. The following is the converse of 5.5.

5.13 Corollary In a complete metric space, a totally bounded set is relatively compact.

Proof If $\mathbb{S}$ is complete, every Cauchy sequence in $A$ has a limit in $\mathbb{S}$, and all such points are closure points of $A$. The subspace $(\bar{A},d)$ is therefore a complete space. It follows from 5.12 that if $A$ is totally bounded, $\bar{A}$ is compact. ∎

5.3 Examples
The following cases are somewhat more remote from ordinary geometric intuition than the ones we looked at above.

5.14 Example In §12.3 and subsequently we shall encounter $\mathbb{R}^\infty$, that is, infinite-dimensional Euclidean space. If $x = (x_1,x_2,\ldots) \in \mathbb{R}^\infty$, and $y = (y_1,y_2,\ldots) \in \mathbb{R}^\infty$ similarly, a metric for $\mathbb{R}^\infty$ is given by

$$d_\infty(x,y) = \sum_{k=1}^{\infty} 2^{-k} d_0(x_k,y_k), \qquad (5.15)$$

where $d_0$ is defined in (5.1). Like $d_0$, $d_\infty$ is a bounded metric with $d_\infty(x,y) \le 1$ for all $x$ and $y$. □

5.15 Theorem $(\mathbb{R}^\infty,d_\infty)$ is separable and complete.
Proof To show separability, consider the collection

$$A_m = \{x = (x_1,x_2,\ldots): x_k \text{ rational if } k \le m,\ x_k = 0 \text{ otherwise}\} \subset \mathbb{R}^\infty. \qquad (5.16)$$

$A_m$ is countable, and by 1.5 the union $A = \bigcup_{m=1}^\infty A_m$ is also countable. For any $y \in \mathbb{R}^\infty$ and $\varepsilon > 0$, $\exists\ x \in A_m$ such that

$$d_\infty(x,y) \le \sum_{k=1}^{m} 2^{-k}\varepsilon + \sum_{k=m+1}^{\infty} 2^{-k} d_0(0,y_k) \le \varepsilon + 2^{-m}. \qquad (5.17)$$

Since the right-hand side can be made as small as desired by choice of $\varepsilon$ and $m$, $y$ is a closure point of $A$. Hence, $A$ is dense in $\mathbb{R}^\infty$.

To show completeness, suppose $\{x_n = (x_{1n},x_{2n},\ldots),\ n \in \mathbb{N}\}$ is a Cauchy sequence in $\mathbb{R}^\infty$. Since $d_0(x_{kn},x_{km}) \le 2^k d_\infty(x_n,x_m)$ for any $k$, $\{x_{kn},\ n \in \mathbb{N}\}$ must be a Cauchy sequence in $\mathbb{R}$. Since

$$d_\infty(x,x_n) \le \sum_{k=1}^{m} 2^{-k} d_0(x_k,x_{kn}) + 2^{-m} \qquad (5.18)$$

for all $m$, we can say that $x_n \to x = (x_1,x_2,\ldots) \in \mathbb{R}^\infty$ iff $x_{kn} \to x_k$ for each $k = 1,2,\ldots$; the completeness of $\mathbb{R}$ implies that $\{x_n\}$ has a limit in $\mathbb{R}^\infty$. ∎

5.16 Example Consider the 'infinite-dimensional cube', $[0,1]^\infty$, the Cartesian product of an infinite collection of unit intervals. The space $([0,1]^\infty,d_\infty)$ is separable by 5.8. We can also endow $[0,1]^\infty$ with the equivalent and in this case bounded metric,

$$\rho_\infty(x,y) = \sum_{k=1}^{\infty} 2^{-k}\,|x_k - y_k|. \qquad (5.19)$$ □
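The tail bound used in (5.17) and (5.18) also says how to approximate $d_\infty$ in practice: truncating the sum at $m$ terms costs at most $2^{-m}$. The sketch below illustrates this. It assumes, since (5.1) is not reproduced here, that $d_0(a,b) = |a-b|/(1+|a-b|)$; the function names and the two example sequences are likewise hypothetical.

```python
def d0(a, b):
    """Bounded metric on the line; ASSUMED form of (5.1):
    |a - b| / (1 + |a - b|), which never exceeds 1."""
    t = abs(a - b)
    return t / (1.0 + t)


def d_inf_truncated(x, y, m):
    """First m terms of the sum in (5.15).

    x and y are functions mapping the index k = 1, 2, ... to the k-th
    coordinate. Because d0 <= 1, the neglected tail is at most 2**-m.
    """
    return sum(2.0 ** -k * d0(x(k), y(k)) for k in range(1, m + 1))


x = lambda k: 1.0 / k      # x = (1, 1/2, 1/3, ...)
y = lambda k: float(k)     # y = (1, 2, 3, ...), with unbounded coordinates

for m in (5, 10, 20, 40):
    print(m, d_inf_truncated(x, y, m), "tail bound:", 2.0 ** -m)
```

Even though the coordinates of `y` are unbounded, the partial sums settle quickly, reflecting the fact that $d_\infty$ is dominated by the leading coordinates.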
In a metric space $(\mathbb{S},d)$, where $d$ can be assumed bounded without loss of generality, define the distance between a point $x \in \mathbb{S}$ and a subset $A \subseteq \mathbb{S}$ as $d(x,A) = \inf_{y \in A} d(x,y)$. Then for a pair of subsets $A,B$ of $(\mathbb{S},d)$ define the function $d_H: 2^{\mathbb{S}} \times 2^{\mathbb{S}} \mapsto \mathbb{R}^+$, where $2^{\mathbb{S}}$ is the power set of $\mathbb{S}$, by

$$d_H(A,B) = \max\Big\{\sup_{x \in B} d(x,A),\ \sup_{y \in A} d(y,B)\Big\}. \qquad (5.20)$$

$d_H(A,B)$ is called the Hausdorff distance between sets $A$ and $B$.
5.17 Theorem Letting $\mathcal{H}_{\mathbb{S}}$ denote the compact, nonempty subsets of $\mathbb{S}$, $(\mathcal{H}_{\mathbb{S}},d_H)$ is a metric space.

Proof Clearly $d_H$ satisfies 5.1(a). It satisfies 5.1(b) since the sets of $\mathcal{H}_{\mathbb{S}}$ are closed, although note that $d_H(A,\bar{A}) = 0$, so that $d_H$ is only a pseudo-metric for general subsets of $\mathbb{S}$. To show 5.1(c), for any $x \in A$ and any $z \in C$ we have, by definition of $d(x,B)$ and the fact that $d$ is a metric,

$$\sup_{x \in A} d(x,B) \le \sup_{x \in A}\{d(x,z) + d(z,B)\}. \qquad (5.21)$$

Since $C$ is compact, the infimum over $C$ of the expression in braces on the right-hand side above is attained at a point $z \in C$. We can therefore write

$$\sup_{x \in A} d(x,B) \le \sup_{x \in A}\Big\{\inf_{z \in C}\big(d(x,z) + d(z,B)\big)\Big\} \le \sup_{x \in A} d(x,C) + \sup_{z \in C} d(z,B). \qquad (5.22)$$

Similarly, $\sup_{y \in B} d(y,A) \le \sup_{z \in C} d(z,A) + \sup_{y \in B} d(y,C)$, and hence,

$$d_H(A,B) \le \max\Big\{\sup_{x \in A} d(x,C) + \sup_{z \in C} d(z,B),\ \sup_{z \in C} d(z,A) + \sup_{y \in B} d(y,C)\Big\} \le d_H(A,C) + d_H(C,B). \qquad (5.23)$$ ∎

When $(\mathbb{S},d)$ is complete, it can [...]
5.18 Example Let $\mathbb{S} = \mathbb{R}$. The compact intervals with the Hausdorff metric define a complete metric space. Thus, $\{[0,1 - 1/n],\ n \in \mathbb{N}\}$ is a Cauchy sequence which converges in the Hausdorff metric to $[0,1]$. This is the closure of the set $[0,1)$ which we usually regard as the limit of this sequence (compare 2.16), but although $[0,1) \notin \mathcal{H}_{\mathbb{S}}$, $d_H([0,1),[0,1]) = 0$. □

Another case is where $\mathbb{S} = (\mathbb{R}^2,d_E)$ and $\mathcal{H}_{\mathbb{S}}$ contains the closed and bounded subsets of the Euclidean plane. To cultivate intuition about metric spaces, a useful exercise is to draw some figures on a sheet of paper and measure the Hausdorff distances between them, as in Fig. 5.3. For compact $A$ and $B$, $d_H(A,B) = 0$ if and only if $A = B$; compare this with another intuitive concept of the 'distance between two sets', $\inf_{x \in A,\, y \in B} d_E(x,y)$, which is zero if the sets touch or intersect.
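The paper-and-pencil exercise can also be tried on finite point sets, which are compact. The sketch below implements (5.20) directly for finite subsets of the plane and compares the result with the infimum-type distance between the sets; the three example sets and the function names are illustrative assumptions.

```python
import math


def dist_point_set(x, A):
    """d(x, A): the smallest distance from x to a point of the finite set A."""
    return min(math.dist(x, y) for y in A)


def hausdorff(A, B):
    """d_H(A, B) as in (5.20), for finite nonempty point sets."""
    return max(max(dist_point_set(x, A) for x in B),
               max(dist_point_set(y, B) for y in A))


def gap(A, B):
    """The other notion of distance: the infimum of d(x, y) over pairs."""
    return min(math.dist(x, y) for x in A for y in B)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]
C = [(1.0, 0.0), (5.0, 0.0)]

print(hausdorff(A, B), gap(A, B))   # 3.0 0.0
print(hausdorff(A, B) <= hausdorff(A, C) + hausdorff(C, B))
```

Here $A$ and $B$ share a point, so the infimum-type distance is zero, while the Hausdorff distance is driven by the point of $B$ that is far from every point of $A$.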
Fig. 5.3

5.4 Mappings on Metric Spaces
We have defined a function as a mapping which takes set elements to unique points of $\mathbb{R}$, but the term is also used where the codomain is a general metric space. Where the domain is another metric space, the results of §2.3 arise as special cases of the theory. Some of the following properties are generalizations of those given previously, while others are new. The terms mapping, transformation, etc., are again synonyms for function, but an extra usage is functional, which refers to the case where the domain is a space whose elements are themselves functions, with (usually) $\mathbb{R}$ as codomain. An example is the integral defined in §4.1.

The function $f: (\mathbb{S},d) \mapsto (\mathbb{T},\rho)$ is said to be continuous at $x$ if for all $\varepsilon > 0$ $\exists\ \delta > 0$ such that

$$\sup_{y \in S_d(x,\delta)} \rho(f(y),f(x)) < \varepsilon. \qquad (5.24)$$

Here, $\delta$ may depend on $x$. Another way to state the condition is that for $\varepsilon > 0$ $\exists\ \delta > 0$ such that

$$f(S_d(x,\delta)) \subseteq S_\rho(f(x),\varepsilon), \qquad (5.25)$$
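where $S_d$ and $S_\rho$ are respectively balls in $(\mathbb{S},d)$ and $(\mathbb{T},\rho)$. Similarly, $f$ is said to be uniformly continuous on a set $A \subseteq \mathbb{S}$ if for all $\varepsilon > 0$, $\exists\ \delta > 0$ such that

$$\sup_{x \in A}\ \sup_{y \in S_d(x,\delta) \cap A} \rho(f(y),f(x)) < \varepsilon. \qquad (5.26)$$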
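Theorem 2.17 was a special case of the following important result.

5.19 Theorem For $A \subseteq \mathbb{T}$, $f^{-1}(A)$ is open (closed) in $\mathbb{S}$ whenever $A$ is open (closed) in $\mathbb{T}$, iff $f$ is continuous at all points of $\mathbb{S}$.

Proof Assume $A$ is open, and let $f(x) \in A$ for $x \in f^{-1}(A)$. We have $S_\rho(f(x),\varepsilon) \subseteq A$ for some $\varepsilon > 0$. By 1.2(iv) and continuity at $x$, there is a $\delta > 0$ such that

$$S_d(x,\delta) \subseteq f^{-1}\big(f(S_d(x,\delta))\big) \subseteq f^{-1}\big(S_\rho(f(x),\varepsilon)\big) \subseteq f^{-1}(A), \qquad (5.27)$$

so that $f^{-1}(A)$ is open. If $A$ is open then $\mathbb{T} - A$ is closed and $f^{-1}(\mathbb{T} - A) = \mathbb{S} - f^{-1}(A)$ by 1.2(iii), which is closed if $f^{-1}(A)$ is open. This proves sufficiency. To prove necessity, suppose $f^{-1}(A)$ is open in $\mathbb{S}$ whenever $A$ is open in $\mathbb{T}$, and in particular, $f^{-1}(S_\rho(f(x),\varepsilon))$ for $\varepsilon > 0$ is open in $\mathbb{S}$. Since $x \in f^{-1}(S_\rho(f(x),\varepsilon))$, there is a $\delta > 0$ such that (5.25) holds. Use complements again for the case of closed sets. ∎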
This property of inverse images under $f$ provides an alternative characterization of continuity, and in topological spaces provides the primary definition of continuity. The notion of Borel measurability discussed in §3.6 extends naturally to mappings between pairs of metric spaces, and the theorem establishes that continuous transformations are Borel-measurable.

The properties of functions on compact sets are of interest in a number of contexts. The essential results are as follows.

5.20 Theorem The continuous image of a compact set is compact.

Proof We show that, if $A \subseteq \mathbb{S}$ is compact and $f$ is continuous, then $f(A)$ is compact. Let $\mathcal{C}$ be an open covering of $f(A)$. Continuity of $f$ means that the sets $f^{-1}(B)$, $B \in \mathcal{C}$, are open by 5.19, and their union covers $A$ by 1.2(ii). Since $A$ is compact, these sets contain a finite subcover, say, $f^{-1}(B_1),\ldots,f^{-1}(B_m)$. It follows that

$$f(A) \subseteq f\Big(\bigcup_{j=1}^{m} f^{-1}(B_j)\Big) = \bigcup_{j=1}^{m} f\big(f^{-1}(B_j)\big) \subseteq \bigcup_{j=1}^{m} B_j, \qquad (5.28)$$
where the equality is by 1.2(i) and the second inclusion by 1.2(v). Hence, $B_1,\ldots,B_m$ is a finite subcover of $f(A)$ by $\mathcal{C}$-sets. Since $\mathcal{C}$ is arbitrary, it follows that $f(A)$ is compact. ∎

5.21 Theorem If $f$ is continuous on a compact set, it is uniformly continuous on the set.

Proof Let $A \subseteq \mathbb{S}$ be compact. Choose $\varepsilon > 0$, and for each $x \in A$, continuity at $x$ means that there exists a sphere $S_d(x,2r_x)$, $r_x > 0$, such that $\rho(f(x),f(y)) < \tfrac{1}{2}\varepsilon$ for all $y \in S_d(x,2r_x) \cap A$. The spheres $S_d(x,r_x)$, $x \in A$, form an open cover of $A$, which by compactness contains a finite subcover $S_d(x_k,r_k)$, $k = 1,\ldots,m$. Let $\delta = \min_{1 \le k \le m} r_k$, and consider a pair of points $x,y \in A$ such that $d(x,y) < \delta$. Now, $y \in S_d(x_k,r_k)$ for some $k$, so that $\rho(f(x_k),f(y)) < \tfrac{1}{2}\varepsilon$, and also

$$d(x_k,x) \le d(x_k,y) + d(x,y) < r_k + \delta \le 2r_k, \qquad (5.29)$$

using the triangle inequality. Hence $\rho(f(x_k),f(x)) < \tfrac{1}{2}\varepsilon$, and

$$\rho(f(x),f(y)) \le \rho(f(x),f(x_k)) + \rho(f(x_k),f(y)) < \varepsilon. \qquad (5.30)$$

Since $\delta$ is independent of $x$ and $y$, $f$ is uniformly continuous on $A$. ∎

If $f: \mathbb{S} \mapsto \mathbb{T}$ is 1-1 onto, and $f$ and $f^{-1}$ are continuous, $f$ is called a homeomorphism, and $\mathbb{S}$ and $\mathbb{T}$ are said to be homeomorphic if such a function exists. If $\mathbb{S}$ is homeomorphic with a subset of $\mathbb{T}$, it is said to be embedded in $\mathbb{T}$ by $f$. If $f$ also preserves distances so that $\rho(f(x),f(y)) = d(x,y)$ for each $x,y \in \mathbb{S}$, it is called an isometry. Metrics $d_1$ and $d_2$ in a space $\mathbb{S}$ are equivalent if and only if the identity mapping from $(\mathbb{S},d_1)$ to $(\mathbb{S},d_2)$ (the mapping which takes each point of $\mathbb{S}$ into itself) is a homeomorphism.

5.22 Example If $d_\infty$ and $\rho_\infty$ are the metrics defined in (5.15) and (5.19) respectively, the mapping $g: (\mathbb{R}^\infty,d_\infty) \mapsto ([0,1]^\infty,\rho_\infty)$, where $g = (g_1,g_2,\ldots)$ and

[...] (5.31)

is a homeomorphism. □

Right and left continuity are not well defined notions for general metric spaces, but there is a concept of continuity which is 'one-sided' with respect to the range of the function. A function $f: (\mathbb{S},d) \mapsto \mathbb{R}$ is said to be upper semicontinuous at $x$ if for each $\varepsilon > 0$ $\exists\ \delta > 0$ such that, for $y \in \mathbb{S}$,

$$d(x,y) < \delta \ \Rightarrow\ f(y) < f(x) + \varepsilon. \qquad (5.32)$$

If $\{x_n\}$ is a sequence of points in $\mathbb{S}$ and $d(x_n,x) \to 0$, upper semicontinuity implies $\limsup_n f(x_n) \le f(x)$. The level sets of the form $\{x: f(x) < \alpha\}$ are open for all $\alpha \in \mathbb{R}$ iff $f$ is upper semicontinuous everywhere on $\mathbb{S}$. $f$ is lower semicontinuous iff $-f$ is upper semicontinuous, and $f$ is continuous at $x$ iff it is both upper and lower semicontinuous at $x$. A function of a real variable is upper semicontinuous at $x$ if it jumps at $x$ with $f(x) \ge \max\{f(x-),f(x+)\}$; isolated discontinuities such as point A in Fig. 5.4 are not ruled out if this inequality is satisfied. On the other hand, upper semicontinuity fails at point B. Semicontinuity is not the same thing as right/left continuity except in the case of monotone functions; if $f$ is increasing, right (left) continuity is equivalent to upper (lower) semicontinuity, and the reverse holds for decreasing functions.

The concept of a Lipschitz condition generalizes to metric spaces. A function $f$ on $(\mathbb{S},d)$ satisfies a Lipschitz condition at $x \in \mathbb{S}$ if for $\delta > 0$ $\exists\ M > 0$ such that, for any $y \in S_d(x,\delta)$,

$$\rho(f(y),f(x)) \le M h(d(x,y)) \qquad (5.33)$$

where $h(\cdot): \mathbb{R}^+ \mapsto \mathbb{R}^+$ satisfies $h(d) \downarrow 0$ as $d \downarrow 0$. It satisfies a uniform Lipschitz condition if condition (5.33) holds uniformly, with fixed $M$, for all $x \in \mathbb{S}$. The remarks following (2.9) apply equally here. Continuity is enforced by this condition with arbitrary $h$, and stronger smoothness conditions are obtained for special cases of $h$.
Fig. 5.4

5.5 Function Spaces
The non-Euclidean metric spaces met in later chapters are mostly spaces of real functions on an interval of $\mathbb{R}$. The elements of such spaces are graphs, subsets of $\mathbb{R}^2$. However, most of the relevant theory holds for functions whose domain is any metric space $(\mathbb{S},d)$, and accordingly, it is this more general case that we will study. Let $C_{\mathbb{S}}$ denote the set of all bounded continuous functions $f: \mathbb{S} \mapsto \mathbb{R}$, and define

$$d_U(f,g) = \sup_{x \in \mathbb{S}} |f(x) - g(x)|. \qquad (5.34)$$

5.23 Theorem $d_U$ is a metric.

Proof Conditions 5.1(a) and (b) are immediate. To prove the triangle inequality write, given functions $f$, $g$ and $h \in C_{\mathbb{S}}$,

$$d_U(f,h) = \sup_{x \in \mathbb{S}} |f(x) - g(x) + g(x) - h(x)| \le \sup_{x \in \mathbb{S}} \big(|f(x) - g(x)| + |g(x) - h(x)|\big) \le d_U(f,g) + d_U(g,h). \qquad (5.35)$$ ∎

$d_U$ is called the uniform metric, and $(C_{\mathbb{S}},d_U)$ is a metric space. An important subset of $C_{\mathbb{S}}$ is $U_{\mathbb{S}}$, the space of uniformly continuous functions. If $\mathbb{S}$ is compact, $C_{\mathbb{S}} = U_{\mathbb{S}}$ by 5.21. If $\mathbb{S}$ is relatively compact, every $f \in C_{\bar{\mathbb{S}}}$ has
a uniformly continuous restriction to $\mathbb{S}$, and every $f \in U_{\mathbb{S}}$ has a continuous extension to $\bar{\mathbb{S}}$, say $\bar{f}$, constructed by setting $\bar{f}(x) = f(x)$ for $x \in \mathbb{S}$ and $\bar{f}(x) = \lim_n f(x_n)$ for a sequence $\{x_n \in \mathbb{S}\}$ converging to $x$, for each $x \in \bar{\mathbb{S}} - \mathbb{S}$. Note that for any pair $f,f' \in U_{\mathbb{S}}$, $d_U(\bar{f},\bar{f}') = d_U(f,f')$, so that the spaces $C_{\bar{\mathbb{S}}}$ and $U_{\mathbb{S}}$ are isometric. There are functions that are continuous on $\mathbb{S}$ and not on $\bar{\mathbb{S}}$, but these cannot be uniformly continuous.

The following is a basic property of $C_{\mathbb{S}}$ which holds independently of the nature of the domain $\mathbb{S}$.

5.24 Theorem $(C_{\mathbb{S}},d_U)$ is complete.

Proof Let $\{f_n\}$ be a Cauchy sequence in $C_{\mathbb{S}}$; in other words, for $\varepsilon > 0$ $\exists\ N_\varepsilon \ge 1$ such that $d_U(f_n,f_m) \le \varepsilon$ for $n,m > N_\varepsilon$. Then for each $x \in \mathbb{S}$, the sequences $\{f_n(x)\}$ satisfy $|f_n(x) - f_m(x)| \le d_U(f_n,f_m)$; these are Cauchy sequences in $\mathbb{R}$, and so have limits $f(x)$. In view of the definition of $d_U$, we may say that $f_n \to f$ uniformly in $\mathbb{S}$, and $f$ is bounded, being a uniform limit of bounded functions. For any $x,y \in \mathbb{S}$, the triangle inequality gives

$$|f(x) - f(y)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)|. \qquad (5.36)$$

Fix $\varepsilon > 0$. Since $f_n \in C_{\mathbb{S}}$, for each $n$ $\exists\ \delta > 0$ such that $|f_n(x) - f_n(y)| < \tfrac{1}{3}\varepsilon$ if $d(x,y) < \delta$. Also, by uniform convergence $\exists\ n$ large enough that

$$\max\{|f(x) - f_n(x)|,\ |f_n(y) - f(y)|\} < \tfrac{1}{3}\varepsilon, \qquad (5.37)$$

so that $|f(x) - f(y)| < \varepsilon$. Hence $f \in C_{\mathbb{S}}$, which establishes that $C_{\mathbb{S}}$ is complete. ∎
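Completeness in 5.24 concerns sequences that are Cauchy in $d_U$, and a small numerical sketch shows why this qualification matters. On $[0,1]$ the continuous functions $f_n(x) = x^n$ converge pointwise to a discontinuous limit, yet $d_U(f_n,f_{2n}) = \sup_x (x^n - x^{2n}) = 1/4$ for every $n$, so the sequence is not Cauchy in the uniform metric. The grid approximation and the function names below are illustrative assumptions.

```python
def d_U(f, g, grid):
    """Grid approximation to sup_x |f(x) - g(x)|; a lower bound on d_U."""
    return max(abs(f(x) - g(x)) for x in grid)


grid = [i / 10000 for i in range(10001)]   # points of [0, 1]


def power(n):
    """The function x -> x**n on [0, 1]."""
    return lambda x: x ** n


for n in (5, 50, 500):
    # The distance stays near 1/4 instead of shrinking with n.
    print(n, d_U(power(n), power(2 * n), grid))
```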
Notice how this property holds by virtue of the uniform metric. It is easy to devise sequences of continuous functions converging to discontinuous limits, but none of these are Cauchy sequences. It is not possible for a continuous function to be arbitrarily close to a discontinuous function at every point of the domain.

A number of the results to follow call for us to exhibit a continuous function which lies uniformly close to a function in $U_{\mathbb{S}}$, but is fully specified by a finite collection of numbers. This is possible when the domain is totally bounded.

5.25 Theorem Let $(\mathbb{S},d)$ be a totally bounded metric space. For any $f \in U_{\mathbb{S}}$, there exists for any $\varepsilon > 0$ a function $g \in U_{\mathbb{S}}$, completely specified by points of the domain $x_1,\ldots,x_m$ and rational numbers $a_1,\ldots,a_m$, such that $d_U(f,g) < \varepsilon$. □

We specify rational numbers here, because this will allow us to assert in applications that the set of all possible $g$ is countable.
Proof By total boundedness of $\mathbb{S}$, $\exists$ for $\delta > 0$ a finite $\delta$-net $\{x_1,\ldots,x_m\}$. For each $x_i$, let $A_i = \{x: d(x,x_i) \ge 2\delta\}$ and $B_i = \{x: d(x,x_i) \le \tfrac{1}{2}\delta\}$, and define functions $g_i: \mathbb{S} \mapsto [0,1]$ by

$$g_i(x) = \frac{d(x,A_i)}{d(x,A_i) + d(x,B_i)}, \qquad (5.38)$$

where $d(x,A) = \inf_{y \in A} d(x,y)$. $d(x,A)$ is a uniformly continuous function of $x$ by construction, and $g_i(x)$ is also uniformly continuous, for the denominator is never less than $\tfrac{3}{2}\delta$. Then define

$$g(x) = \frac{\sum_{i=1}^{m} g_i(x)\,a_i}{\sum_{i=1}^{m} g_i(x)}. \qquad (5.39)$$

Being a weighted average of the numbers $\{a_i\}$, $g(x)$ is bounded. Also, since $\{x_i\}$ is a $\delta$-net for $\mathbb{S}$, there exists for every $x \in \mathbb{S}$ some $i$ such that $d(x,A_i) \ge \delta$, as well as $d(x,B_i) \le d(x,x_i) \le \delta$, and hence such that $g_i(x) \ge \tfrac{1}{2}$. Therefore, $\sum_{i=1}^{m} g_i(x) \ge \tfrac{1}{2}$ and uniform continuity extends from the $g_i$ to $g$.

For arbitrary $f \in U_{\mathbb{S}}$, fix $\varepsilon > 0$ and choose $\delta$ small enough that $|f(x) - f(y)| < \tfrac{1}{2}\varepsilon$ when $d(x,y) < 2\delta$, for any $x,y \in \mathbb{S}$. Then fix $m$ large enough and choose $x_i$ and $a_i$ for $i = 1,\ldots,m$, so that the $S(x_i,\delta)$ cover $\mathbb{S}$, and $|f(x_i) - a_i| < \tfrac{1}{2}\varepsilon$ for each $i$. Note that if $d(x,x_i) \ge 2\delta$ then $x \in A_i$ and $g_i(x) = 0$, so that in all cases

$$g_i(x)\,|f(x) - f(x_i)| \le \tfrac{1}{2}\varepsilon\,g_i(x). \qquad (5.40)$$

Hence

$$g_i(x)\,|f(x) - a_i| \le g_i(x)\,|f(x) - f(x_i)| + g_i(x)\,|f(x_i) - a_i| \le g_i(x)\,\varepsilon \qquad (5.41)$$

for each $x \in \mathbb{S}$ and each $i$. We may conclude that

$$d_U(f,g) = \sup_{x \in \mathbb{S}} |f(x) - g(x)| \le \sup_{x \in \mathbb{S}} \frac{\sum_{i=1}^{m} g_i(x)\,|f(x) - a_i|}{\sum_{i=1}^{m} g_i(x)} \le \varepsilon. \qquad (5.42)$$

Since $\varepsilon$ is arbitrary, the strict inequality of the theorem follows. ∎

The next result makes use of this approximation theorem, and is fundamental. It tells us (recalling the earlier discussion of separability) that spaces of continuous functions are not such alien objects from an analytic point of view as they might at first appear, at least when the domain is totally bounded.

5.26 Theorem
(i) If $(\mathbb{S},d)$ is totally bounded then $(U_{\mathbb{S}},d_U)$ is separable.
(ii) If $(\mathbb{S},d)$ is compact then $(C_{\mathbb{S}},d_U)$ is separable.

Proof We need only prove part (i), since for part (ii), $C_{\mathbb{S}} = U_{\mathbb{S}}$ by 5.21 and the same conclusion follows. Fix $m$ and suitable points $\{x_1,\ldots,x_m\}$ of $\mathbb{S}$ so as to define a countable family of functions $A_m = \{g_{mk},\ k \in \mathbb{N}\}$, where the $g_{mk}$ are defined as in 5.25, and the index $k$ enumerates the countable collection of $m$-vectors $(a_1,\ldots,a_m)$ of rationals. For each $\varepsilon > 0$, there exists $m$ large enough that, for each $f \in U_{\mathbb{S}}$, $d_U(f,g_{mk}) < \varepsilon$ for some $k$. By 1.5, $A = \bigcup_{m=1}^\infty A_m$ is countable, and for every $\varepsilon > 0$ there exists $g \in A$ such that $d_U(f,g) < \varepsilon$. This means that $A$ is dense in $U_{\mathbb{S}}$. ∎
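The construction in the proof of 5.25 can be sketched numerically for $\mathbb{S} = [0,1]$ with the usual distance. The code below is an illustration under stated assumptions rather than part of the text: the target function, the values of `eps` and `delta`, the equally spaced net and the rounding rule for the rationals $a_i$ are all choices made for the example, and the supremum is only approximated on a grid.

```python
import math
from fractions import Fraction


def f(x):
    """Example target: uniformly continuous on [0, 1], Lipschitz constant 2*pi."""
    return math.sin(2 * math.pi * x)


eps = 0.5
delta = 0.01     # |f(x) - f(y)| <= 2*pi*|x - y| < eps/2 whenever |x - y| < 2*delta
net = [i * delta for i in range(101)]                   # a delta-net for [0, 1]
a = [Fraction(round(100 * f(xi)), 100) for xi in net]   # rationals close to f(x_i)


def dist_to_A(x, xi):
    """Distance from x to A_i = {y in [0,1]: |y - x_i| >= 2*delta}."""
    lo, hi = xi - 2 * delta, xi + 2 * delta
    left = max(0.0, x - lo) if lo >= 0.0 else math.inf    # piece [0, lo]
    right = max(0.0, hi - x) if hi <= 1.0 else math.inf   # piece [hi, 1]
    return min(left, right)


def dist_to_B(x, xi):
    """Distance from x to B_i = {y in [0,1]: |y - x_i| <= delta/2}."""
    return max(0.0, abs(x - xi) - delta / 2)


def g_i(x, xi):
    """The weight function (5.38); equal to 1 if A_i happens to be empty."""
    dA, dB = dist_to_A(x, xi), dist_to_B(x, xi)
    return 1.0 if math.isinf(dA) else dA / (dA + dB)


def g(x):
    """The weighted average (5.39) of the rationals a_i."""
    w = [g_i(x, xi) for xi in net]
    return sum(wi * float(ai) for wi, ai in zip(w, a)) / sum(w)


grid = [k / 5000 for k in range(5001)]
sup_err = max(abs(f(x) - g(x)) for x in grid)
print(sup_err, sup_err < eps)
```

The approximating function is pinned down entirely by the finite list `net` and the rationals `a`, which is what makes the countability argument in 5.26 work.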
To show that we cannot rely on these properties holding under more general circumstances, we exhibit a nonseparable function space.

5.27 Example For $\mathbb{S} = [a,b]$, an interval of the real line, consider the metric space $(D[a,b],d_U)$ of real, bounded cadlag functions of a real variable. Cadlag is a colourful French acronym (continue à droite, limites à gauche) to describe functions of a real variable which may have discontinuities, but are right-continuous at every point and possess a limit from the left at every point. Of course, $C[a,b] \subseteq D[a,b]$. Functions with completely arbitrary discontinuities form a larger class still, but one that for most purposes is too unstructured to permit a useful theory. To show that $(D[a,b],d_U)$ is not separable, consider the subset with elements

$$f_\theta(t) = \begin{cases} 0, & t < \theta \\ 1, & t \ge \theta \end{cases}, \quad \theta \in [a,b]. \qquad (5.43)$$
This set is uncountable, containing as many elements as there are points in $[a,b]$. But $d_U(f_\theta,f_{\theta'}) = 1$ for $\theta \ne \theta'$, so it is also discrete. Hence $(D_{[a,b]},d_U)$ is not separable by 5.9. □
Let $A$ denote a collection of functions $f: (\mathbb{S},d) \mapsto (\mathbb{T},\rho)$. $A$ is said to be equicontinuous at $x \in \mathbb{S}$ if $\forall\, \varepsilon > 0$ $\exists\, \delta > 0$ such that
$$\sup_{f \in A}\ \sup_{y \in S_d(x,\delta)} \rho(f(y),f(x)) < \varepsilon. \tag{5.44}$$
$A$ is also said to be uniformly equicontinuous if $\forall\, \varepsilon > 0$ $\exists\, \delta > 0$ such that
$$\sup_{f \in A}\ \sup_{x \in \mathbb{S}}\ \sup_{y \in S_d(x,\delta)} \rho(f(y),f(x)) < \varepsilon. \tag{5.45}$$
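The double supremum in (5.44) can be explored numerically. The following Python sketch is an illustration under stated assumptions, not part of the text: the domain is taken to be $[0,1]$ with the usual distance, the codomain is the real line with $\rho(a,b) = |a-b|$, suprema are approximated on a finite grid, and the helper names `local_oscillation` and `family_oscillation` are hypothetical. For a fixed $\delta$ it compares a few members of the family $f_n(t) = t^n$ with the same members of $f_n(t) = t/n$ at the point $x = 1$.

```python
def local_oscillation(f, x, delta, grid):
    """Approximate sup over y in S_d(x, delta) of |f(y) - f(x)| on a finite grid."""
    near = [y for y in grid if abs(y - x) < delta]
    return max(abs(f(y) - f(x)) for y in near)

def family_oscillation(family, x, delta, grid):
    """Approximate the double supremum in (5.44) for a finite family of functions."""
    return max(local_oscillation(f, x, delta, grid) for f in family)

grid = [i / 10000 for i in range(10001)]   # finite stand-in for the domain [0, 1]
delta = 0.01
indices = [1, 10, 100, 1000, 10000]

powers = [lambda t, n=n: t ** n for n in indices]   # members of f_n(t) = t^n
scaled = [lambda t, n=n: t / n for n in indices]    # members of f_n(t) = t / n

osc_powers = family_oscillation(powers, 1.0, delta, grid)
osc_scaled = family_oscillation(scaled, 1.0, delta, grid)
print(osc_powers, osc_scaled)
```

In this sketch the oscillation of the powers near $x = 1$ stays close to 1 once large $n$ is included, however small the fixed $\delta$, which is the behaviour that equicontinuity at $x$ rules out; the scaled family stays below $\delta$.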
Equicontinuity is the property of a set of continuous functions (or uniformly continuous functions, as the case may be) which prevents limit points of the set from failing to be (uniformly) continuous. In the case when $A \subset C_{\mathbb{S}}$ $(U_{\mathbb{S}})$ but $A$ is not (uniformly) equicontinuous, we cannot rule out the possibility that $\bar{A} \not\subseteq C_{\mathbb{S}}$ $(U_{\mathbb{S}})$.
An important class of applications is to countable $A$, and if we restrict attention to the case $A = \{f_n,\ n \in \mathbb{N}\}$, $A \subset C_{\mathbb{S}}$ (or $U_{\mathbb{S}}$) may not be essential. If we are willing to tolerate discontinuity in at most a finite number of the cases, the following concept is the relevant one. A sequence of functions $\{f_n,\ n \in \mathbb{N}\}$ will be said to be asymptotically equicontinuous at $x$ if $\forall\, \varepsilon > 0$ $\exists\, \delta > 0$ such that
$$\limsup_{n\to\infty} \Big\{ \sup_{y \in S_d(x,\delta)} \rho(f_n(y),f_n(x)) \Big\} < \varepsilon, \tag{5.46}$$
and asymptotically uniformly equicontinuous if $\forall\, \varepsilon > 0$ $\exists\, \delta > 0$ such that
$$\limsup_{n\to\infty} \Big\{ \sup_{x \in \mathbb{S}}\ \sup_{y \in S_d(x,\delta)} \rho(f_n(y),f_n(x)) \Big\} < \varepsilon. \tag{5.47}$$
If the functions $f_n$ are continuous for all $n$, $\limsup_{n\to\infty}$ can be replaced by $\sup_n$ in (5.46), and similarly for (5.47) when all the $f_n$ are uniformly continuous. In these circumstances, the qualifier asymptotic can be dropped.
The main result on equicontinuous sets is the Arzelà-Ascoli theorem. This designation covers a number of closely related results, but the following version, which is the one appropriate to our subsequent needs, identifies equicontinuity as the property of a set of bounded real-valued functions on a totally bounded domain which converts boundedness into total boundedness.
5.28 Arzelà-Ascoli theorem Let $(\mathbb{S},d)$ be a totally bounded metric space. A set $A \subset C_{\mathbb{S}}$ is relatively compact under $d_U$ iff it is bounded and uniformly equicontinuous.
Proof Since $C_{\mathbb{S}}$ is complete, total boundedness of $A$ is equivalent to relative compactness by 5.13. So to prove 'if', we assume boundedness and equicontinuity, and construct a finite $\varepsilon$-net for $A$. It is convenient to define the modulus of continuity of $f$, that is, the function $w: C_{\mathbb{S}} \times \mathbb{R}^+ \mapsto \mathbb{R}^+$ where
$$w(f,\delta) = \sup_{x \in \mathbb{S}}\ \sup_{y \in S_d(x,\delta)} |f(y) - f(x)|. \tag{5.48}$$
Fix $\varepsilon > 0$, and choose $\delta$ (as is possible by uniform equicontinuity) such that
$$\sup_{f \in A} w(f,\delta) < \varepsilon. \tag{5.49}$$
Boundedness of $A$ under the uniform metric means that there exist finite real numbers $U$ and $L$ such that
$$L \le \inf_{f \in A,\, x \in \mathbb{S}} f(x) \le \sup_{f \in A,\, x \in \mathbb{S}} f(x) \le U. \tag{5.50}$$
Let $\{x_1,\dots,x_m\}$ be a $\delta$-net for $\mathbb{S}$, and construct the finite family
$$D_m = \{g_k \in A,\ k = 1,\dots,(v+1)^m\} \tag{5.51}$$
according to the recipe of 5.25, with the constants $a_i$ taken from the finite set $\{L + (U - L)u/v\}$, where $u$ and $v$ are integers with $v$ exceeding $(U - L)/\varepsilon$ and $u = 0,\dots,v$. This set contains $v + 1$ real values between $L$ and $U$ which are less than $\varepsilon$ apart, so that $D_m$ has $(v+1)^m$ members, as indicated. Since the assumptions imply $A \subseteq U_{\mathbb{S}}$, it follows by 5.25 that for every $f \in A$ there exists $g_k \in D_m$ with $d_U(f,g_k) < \varepsilon$. This shows that $D_m$ is an $\varepsilon$-net for $A$, and $A$ is totally bounded.
To prove 'only if', suppose $A$ is relatively compact, and hence totally bounded. Trivially, total boundedness implies boundedness, and it remains to show uniform equicontinuity. Consider for $\varepsilon > 0$ the set
$$B_k(\varepsilon) = \{f: w(f,1/k) < \varepsilon\}. \tag{5.52}$$
Uniform equicontinuity of $A$ is the condition that, for any $\varepsilon > 0$, there exists $k$ large enough that $A \subseteq B_k(\varepsilon)$. It is easily verified that
$$|w(f,\delta) - w(g,\delta)| \le 2d_U(f,g), \tag{5.53}$$
so that the function $w(\cdot,\delta): (C_{\mathbb{S}},d_U) \mapsto (\mathbb{R}^+,d_E)$ is continuous. $B_k(\varepsilon)$ is the inverse image under $w(\cdot,1/k)$ of the half-line $[0,\varepsilon)$, which is open in $\mathbb{R}^+$, and hence $B_k(\varepsilon)$ is open by 5.19. By definition of $C_{\mathbb{S}}$, $w(f,1/k) \to 0$ as $k \to \infty$ for each $f \in C_{\mathbb{S}}$. In other words, $w$ converges to 0 pointwise on $C_{\mathbb{S}}$, which implies that the collection $\{B_k(\varepsilon),\ k \in \mathbb{N}\}$ must be an open covering for $C_{\mathbb{S}}$, and hence for $\bar{A}$. But by hypothesis $\bar{A}$ is compact, every such covering of $\bar{A}$ has a finite subcover, and so $A \subseteq B_k(\varepsilon)$ for finite $k$, as required. ∎
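The inequality (5.53) is stated as easily verified. One way to check it is sketched below, assuming that $d_U(f,g) = \sup_{x \in \mathbb{S}} |f(x) - g(x)|$ is the uniform metric of §5.5; the steps are a worked illustration and not the author's own argument.

```latex
% For x in S and y in S_d(x, delta), apply the triangle inequality twice:
\begin{align*}
|f(y) - f(x)| &\le |f(y) - g(y)| + |g(y) - g(x)| + |g(x) - f(x)| \\
              &\le 2\, d_U(f,g) + w(g,\delta).
\end{align*}
% Taking the supremum over x and y on the left-hand side gives
\[
  w(f,\delta) \le w(g,\delta) + 2\, d_U(f,g),
\]
% and exchanging the roles of f and g gives the same bound for
% w(g,\delta) - w(f,\delta). Together these yield
\[
  |w(f,\delta) - w(g,\delta)| \le 2\, d_U(f,g).
\]
```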
6 Topology

6.1 Topological Spaces
Metric spaces form a subclass of a larger class of mathematical objects called topological spaces. These do not have a distance defined upon them, but the concepts of open set, neighbourhood, and continuous mapping are still well defined. Even though only metric spaces are encountered in the sequel (Part VI), much of the reasoning is essentially topological in character. An appreciation of the topological underpinnings is essential for getting to grips with the theory of weak convergence.
6.1 Definition A topological space $(\mathbb{X},\tau)$ is a set $\mathbb{X}$ on which is defined a topology, a class of subsets $\tau$ called open sets having the following properties:
(a) $\mathbb{X} \in \tau$, $\emptyset \in \tau$.
(b) If $\mathcal{C} \subseteq \tau$, then $\bigcup_{O \in \mathcal{C}} O \in \tau$.
(c) If $O_1 \in \tau$, $O_2 \in \tau$, then $O_1 \cap O_2 \in \tau$. □
These three conditions define an open set, so that openness becomes a primitive concept of which the notion of $\varepsilon$-spheres around points is only one characterization. A metric induces a topology on a space because it is one way (though not the only way) of defining what an open set is, and all metric spaces are also topological spaces. On the other hand, some topological spaces may be made into metric spaces by defining a metric on them under which sets of $\tau$ are open in the sense defined in §5.1. Such spaces are called metrizable.
A subset of a topological space $(\mathbb{X},\tau)$ has a topology naturally induced on it by the parent space. If $A \subset \mathbb{X}$, the collection $\tau_A = \{A \cap O: O \in \tau\}$ is called the relative topology for $A$. $(A,\tau_A)$ would normally be referred to as a subspace of $\mathbb{X}$. If two topologies $\tau_1$ and $\tau_2$ are defined on a space and $\tau_1 \subseteq \tau_2$, then $\tau_1$ is said to be coarser, or weaker, than $\tau_2$, whereas $\tau_2$ is finer (stronger) than $\tau_1$. In particular, the power set of $\mathbb{X}$ is a topology, called the discrete topology, whereas $\{\emptyset,\mathbb{X}\}$ is called the trivial topology. Two metrics define the same topology on a space if and only if they are equivalent. If two points are close in one space, their images in the other space must be correspondingly close.
If a set $O$ is open, its complement $O^c$ in $\mathbb{X}$ is said to be closed. The closure $\bar{A}$ of an arbitrary set $A \subseteq \mathbb{X}$ is the intersection of all the closed sets containing $A$. As for metric spaces, a set $A \subseteq B$, for $B \subseteq \mathbb{X}$, is said to be dense in $B$ if $B \subseteq \bar{A}$.
6.2 Theorem The intersection of any collection of closed sets is closed. $\mathbb{X}$ and $\emptyset$ are both open and closed. □
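For a finite set, conditions (a)-(c) of Definition 6.1 and the closure operation can be checked mechanically. The Python sketch below is an illustration only: the names `is_topology` and `closure` are hypothetical, and it relies on the fact that, for a finite collection of sets, being closed under pairwise unions is the same as being closed under arbitrary unions.

```python
from itertools import combinations

def is_topology(space, tau):
    """Check conditions (a)-(c) of Definition 6.1 for a finite collection tau."""
    space = frozenset(space)
    tau = {frozenset(o) for o in tau}
    if space not in tau or frozenset() not in tau:
        return False                                   # condition (a) fails
    for o1, o2 in combinations(tau, 2):
        if o1 | o2 not in tau or o1 & o2 not in tau:   # conditions (b) and (c)
            return False
    return True

def closure(space, tau, a):
    """Intersection of all closed sets (complements of open sets) containing a."""
    space, a = frozenset(space), frozenset(a)
    result = space
    for o in tau:
        closed = space - frozenset(o)
        if a <= closed:
            result &= closed
    return result

X = {1, 2, 3}
tau = [set(), {1}, {1, 2}, {1, 2, 3}]      # a topology on X
not_tau = [set(), {1}, {2}, {1, 2, 3}]     # misses the union {1, 2}

print(is_topology(X, tau), is_topology(X, not_tau))
print(closure(X, tau, {2}), closure(X, tau, {1}))
```

In this example the closure of $\{1\}$ is the whole space, so $\{1\}$ is dense in $\mathbb{X}$ under the chosen topology.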
However, an arbitrary union of closed sets need not be closed, just as an arbitrary intersection of open sets need not be open.
For given $x \in \mathbb{X}$, a collection $\mathcal{V}_x$ of open sets is called a base for the point $x$ if for every open $O$ containing $x$ there is a set $B \in \mathcal{V}_x$ such that $x \in B$ and $B \subseteq O$. This is the generalization to topological spaces of the idea of a system of neighbourhoods or spheres in a metric space. A base for the topology $\tau$ on $\mathbb{X}$ is a collection $\mathcal{V}$ of sets such that, for every $O \in \tau$, and every $x \in O$, there exists $B \in \mathcal{V}$ such that $x \in B \subseteq O$. The definition implies that any open set can be expressed as the union of sets from the base of the topology; a topology may be defined for a space by specifying a base collection, and letting the open sets be defined as the unions and finite intersections of the base sets. In the case of $\mathbb{R}$, for example, the open intervals form a base.
6.3 Theorem A collection $\mathcal{V}$ is a base for a topology $\tau$ on $\mathbb{X}$ iff
(a) $\bigcup_{B \in \mathcal{V}} B = \mathbb{X}$.
(b) $\forall\, B_1,B_2 \in \mathcal{V}$ and $x \in B_1 \cap B_2$, $\exists\, B_3 \in \mathcal{V}$ such that $x \in B_3 \subseteq B_1 \cap B_2$.
Proof Necessity of these conditions follows from the definitions of base and open set. For sufficiency, define a collection $\tau$ in terms of the base $\mathcal{V}$, as follows:
$$O \in \tau \text{ iff, for each } x \in O,\ \exists\, B \in \mathcal{V} \text{ such that } x \in B \subseteq O. \tag{6.1}$$
$\emptyset$ satisfies the condition in (6.1), and $\mathbb{X}$ satisfies it given condition (a) of the theorem. If $\mathcal{C}$ is a collection of $\tau$-sets, $\bigcup_{O \in \mathcal{C}} O \in \tau$ since (6.1) holds in this case in respect of a base set $B$ corresponding to any set in $\mathcal{C}$ which contains $x$. And if $O_1,O_2 \in \tau$, and $x \in O_1 \cap O_2$, then, letting $B_1$ and $B_2$ be the base sets specified in (6.1) in respect of $x$ and $O_1$ and $O_2$ respectively, condition (b) implies that there exists $B_3 \in \mathcal{V}$ with $x \in B_3 \subseteq B_1 \cap B_2 \subseteq O_1 \cap O_2$, which shows that $\tau$ is closed under finite intersections. Hence, $\tau$ is a topology for $\mathbb{X}$. ∎
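Conditions (a) and (b) of Theorem 6.3, and the topology defined through (6.1), can be checked directly on a finite set. The sketch below is a hedged illustration: the function names `is_base` and `generated_topology` are hypothetical, and the topology is built by brute force as the collection of all unions of base sets, which is only practical for very small examples.

```python
from itertools import chain, combinations

def is_base(space, base):
    """Conditions (a) and (b) of Theorem 6.3 for a finite collection of sets."""
    space = frozenset(space)
    base = [frozenset(b) for b in base]
    if frozenset().union(*base) != space:               # condition (a)
        return False
    for b1 in base:
        for b2 in base:
            for x in b1 & b2:                           # condition (b)
                if not any(x in b3 and b3 <= b1 & b2 for b3 in base):
                    return False
    return True

def generated_topology(base):
    """All unions of subcollections of the base (the empty union is the empty set)."""
    base = [frozenset(b) for b in base]
    subcollections = chain.from_iterable(
        combinations(base, k) for k in range(len(base) + 1))
    return {frozenset().union(*sub) for sub in subcollections}

X = {1, 2, 3}
good = [{1}, {2}, {1, 2, 3}]
bad = [{1, 2}, {2, 3}]          # {1,2} and {2,3} meet in {2}, which contains no base set

tau = generated_topology(good)
print(is_base(X, good), is_base(X, bad))
print(sorted(sorted(o) for o in tau))
```

For the collection `good`, the generated sets are closed under pairwise intersection, as the sufficiency argument of the proof predicts.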
The concept of base sets allows us to generalize two further notions familiar from metric spaces. The closure points (accumulation points) of a set $A$ in a topological space $(\mathbb{X},\tau)$ are the points $x \in \mathbb{X}$ such that every set in the base of $x$ contains a point of $A$ (a point of $A$ other than $x$). An important exercise is to show that $x$ is a closure point of $A$ if and only if $x$ is in the closure of $A$.
We have generalizations of two other familiar concepts. A sequence $\{x_n\}$ of points in a topological space is said to converge to $x$ if, for every open set $O$ containing $x$, $\exists\, N \ge 1$ such that $x_n \in O$ for all $n \ge N$. And $x$ is called a cluster point of $\{x_n\}$ if, for every open $O$ containing $x$ and every $N \ge 1$, $x_n \in O$ for some $n \ge N$. In general topological spaces the notion of a convergent sequence is inadequate for characterizing basic properties such as the continuity of mappings, and is augmented by the concepts of net and filter. Because we deal mainly with metric spaces, we do not require these extensions (see e.g. Willard 1970: Ch. 4).

6.2 Countability and Compactness
The countability axioms provide one classification of topological spaces according, roughly speaking, to their degree of structure and amenability to the methods of analysis. A topological space is said to satisfy the first axiom of countability (to be first-countable) if every point of the space has a countable base. It satisfies the second axiom of countability (is second-countable) if the space as a whole has a countable base. Every metric space is first-countable in view of the existence of the countable base composed of open spheres, $S(x,1/n)$ for each $x$. More generally, sequences in first-countable spaces tend to behave in a similar manner to those in metric spaces, as the following theorem illustrates.
6.4 Theorem In a first-countable space, $x$ is a cluster point of a sequence $\{x_n,\ n \in \mathbb{N}\}$ iff there is a subsequence $\{x_{n_k},\ k \in \mathbb{N}\}$ converging to $x$.
Proof Sufficiency is immediate. For necessity, the definition of a cluster point implies that $\exists\, n \ge N$ such that $x_n \in O$, for every open $O$ containing $x$ and every $N \ge 1$. Let the countable base of $x$ be the collection $\mathcal{V}_x = \{B_i,\ i \in \mathbb{N}\}$, and choose a monotone sequence of base sets $\{A_k,\ k \in \mathbb{N}\}$ containing $x$ (and hence nonempty) with $A_1 = B_1$, and $A_k \subseteq A_{k-1} \cap B_k$ for $k = 2,3,\dots$; this is always possible by 6.3. Since $x$ is a cluster point, we may construct an infinite subsequence by taking $x_{n_k}$ as the next member of the sequence contained in $A_k$, for $k = 1,2,\dots$ For every open set $O$ containing $x$, $\exists\, N \ge 1$ such that $x_{n_k} \in A_k \subseteq O$, for all $k \ge N$, and hence $x_{n_k} \to x$ as $k \to \infty$, as required. ∎
The point of quoting a result such as this has less to do with demonstrating a new property than with reminding us of the need for caution in assuming properties we take for granted in metric spaces. While the intuition derived from $\mathbb{R}$-like situations might lead us to suppose that the existence of a cluster point and a convergent subsequence amount to the same thing, this need not be true unless we can establish first-countability.
A topological space is said to be separable if it contains a countable dense subset. Second-countable spaces are separable. This fact follows directly on taking a point from each set in a countable base, and verifying that these points are dense in the space. The converse is not generally true, but it is true for metric spaces, where separability, second countability and the Lindelöf property (that every open cover of $\mathbb{X}$ has a countable subcover) are all equivalent to one another. This is just what we showed in 5.6. More generally, we can say the following.
6.5 Theorem A second-countable space is both separable and Lindelöf.
Proof The proof of separability is in the text above. To prove the Lindelöf property, let $\mathcal{C}$ be an open cover of $\mathbb{X}$, such that $\bigcup_{A \in \mathcal{C}} A = \mathbb{X}$. For each $A \in \mathcal{C}$ and $x \in A$, we can find a base set $B_i$ such that $x \in B_i \subseteq A$. Since $\bigcup_{i=1}^{\infty} B_i = \mathbb{X}$, we may choose a countable subcollection $A_i$, $i = 1,2,\dots$ such that $B_i \subseteq A_i$ for each $i$, and hence $\bigcup_{i=1}^{\infty} A_i = \mathbb{X}$. ∎
A topological space is said to be compact if every covering of the space by open sets has a finite subcover. It is said to be countably compact if each countable covering has a finite subcovering. And it is said to be sequentially compact if each sequence on the space has a convergent subsequence. Sometimes, compactness is more conveniently characterized in terms of the complements. The complements of an open cover of $\mathbb{X}$ are a collection of closed sets whose intersection is empty; if and only if $\mathbb{X}$ is compact, every such collection must have a finite subcollection with empty intersection. An equivalent way to state this proposition is in terms of the contrapositive implication. A collection of closed sets is said to have the finite intersection property if no finite subcollection has an empty intersection. Thus:
6.6 Theorem $\mathbb{X}$ is compact (countably compact) if and only if no collection (countable collection) of closed sets having the finite intersection property has an empty intersection. □
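A small worked case may help fix the idea of Theorem 6.6. The example below is an illustration added under the assumption that the open interval $(0,1)$ carries the relative topology inherited from the real line; it is not taken from the text.

```latex
% Take X = (0,1) and, for n = 2, 3, ..., the sets
\[
  C_n = (0, 1/n] = [-1, 1/n] \cap (0,1),
\]
% each of which is closed in the relative topology of (0,1).
% Any finite subcollection C_{n_1}, ..., C_{n_k} has intersection
\[
  \bigcap_{j=1}^{k} C_{n_j} = \bigl(0,\ 1/\max_j n_j\bigr] \neq \emptyset ,
\]
% so the collection has the finite intersection property. However
\[
  \bigcap_{n=2}^{\infty} C_n = \emptyset ,
\]
% since no x > 0 satisfies x <= 1/n for every n. By 6.6, (0,1) is
% therefore not countably compact, and hence not compact.
```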
The following pair of theorems summarize important relationships between the different varieties of compactness.
6.7 Theorem A first-countable space $\mathbb{X}$ is countably compact iff it is sequentially compact.
Proof Let the space be countably compact. Let $\{x_n,\ n \in \mathbb{N}\}$ be a sequence in $\mathbb{X}$, and define the sets $B_n = \{x_n,x_{n+1},\dots\}$, $n = 1,2,\dots$ The collection of closed sets $\{\bar{B}_n,\ n \in \mathbb{N}\}$ clearly possesses the finite intersection property, and hence $\bigcap_n \bar{B}_n$ is nonempty by 6.6, which is another way of saying that $\{x_n\}$ has a cluster point. Since the sequence is arbitrary, sequential compactness follows by 6.4. This proves necessity.
For sufficiency, 6.4 implies that under sequential compactness, all sequences in $\mathbb{X}$ have a cluster point. Let $\{C_i,\ i \in \mathbb{N}\}$ be a countable collection of closed sets having the finite intersection property such that $A_n = \bigcap_{i=1}^{n} C_i \ne \emptyset$, for every finite $n$. Consider a sequence $\{x_n\}$ chosen such that $x_n \in A_n$, and note since $\{A_n\}$ is monotone that $x_n \in A_m$ for all $n \ge m$; or in other words, $A_m$ contains the sequence $\{x_n,\ n \ge m\}$. Since $\{x_n\}$ has a cluster point $x$ and $A_m$ is closed, $x \in A_m$. This is true for every $m \in \mathbb{N}$, so that $\bigcap_{i=1}^{\infty} C_i$ is nonempty, and $\mathbb{X}$ is countably compact by 6.6. ∎
6.8 Theorem A metric space $(\mathbb{S},d)$ is countably compact iff it is compact.
Proof Sufficiency is immediate. For necessity, we show first that if $\mathbb{S}$ is countably compact, it is separable. A metric space is first-countable, hence countable compactness implies sequential compactness (6.7), which in turn implies that every sequence in $\mathbb{S}$ has a cluster point (6.4). This must mean that for any $\varepsilon > 0$ there exists a finite $\varepsilon$-net $\{x_1,\dots,x_m\}$ such that, for all $x \in \mathbb{S}$, $d(x,x_k) < \varepsilon$ for some $k \in \{1,\dots,m\}$; for otherwise, we can construct an infinite sequence $\{x_n\}$ with $d(x_n,x_{n'}) \ge \varepsilon$ for $n \ne n'$, contradicting the existence of a cluster point. Thus, for each $n \in \mathbb{N}$ there is a finite collection of points $A_n$ such that, for every $x \in \mathbb{S}$, $d(x,y) < 2^{-n}$ for some $y \in A_n$. The set $D = \bigcup_{n=1}^{\infty} A_n$ is countable and dense in $\mathbb{S}$, and $\mathbb{S}$ is separable.
Separability in a metric space is equivalent by 5.6 to the Lindelöf property, that every open cover of $\mathbb{S}$ has a countable subcover; but countable compactness implies that this countable subcover has a finite subcover in its turn, so that compactness is proved. ∎
Like separability and compactness, the notion of a continuous mapping may be defined in terms of a distance measure, but is really topological in character. In a pair of topological spaces $\mathbb{X}$ and $\mathbb{Y}$, the mapping $f: \mathbb{X} \mapsto \mathbb{Y}$ is said to be continuous if $f^{-1}(B)$ is open in $\mathbb{X}$ when $B$ is open in $\mathbb{Y}$, and closed in $\mathbb{X}$ when $B$ is closed in $\mathbb{Y}$. That in metric spaces this definition is equivalent to the more familiar one in terms of $\varepsilon$- and $\delta$-neighbourhoods follows from 5.19. The concepts of homeomorphism and embedding, as mappings that are respectively onto or into, and 1-1 continuous with continuous inverse, remain well defined. The following theorem gives two important properties of continuous maps.
6.9 Theorem Suppose there exists a continuous mapping $f$ from a topological space $\mathbb{X}$ onto another space $\mathbb{Y}$.
(i) If $\mathbb{X}$ is separable, $\mathbb{Y}$ is separable.
(ii) If $\mathbb{X}$ is compact, $\mathbb{Y}$ is compact.
Proof (i) The problem is to exhibit a countable, dense subset of $\mathbb{Y}$. Consider $f(D)$ where $D$ is dense in $\mathbb{X}$. If $\overline{f(D)}$ is the closure of $f(D)$, the inverse image $f^{-1}(\overline{f(D)})$ is closed by continuity of $f$, and contains $f^{-1}(f(D))$, and hence also contains $D$ by 1.2(iv). Since $\bar{D}$ is the smallest closed set containing $D$, and $\mathbb{X} \subseteq \bar{D}$, it follows that $\mathbb{X} \subseteq f^{-1}(\overline{f(D)})$. But since the mapping is onto, $\mathbb{Y} = f(\mathbb{X}) \subseteq f(f^{-1}(\overline{f(D)})) \subseteq \overline{f(D)}$, where the last inclusion is by 1.2(v). $f(D)$ is therefore dense in $\mathbb{Y}$ as required. $f(D)$ is countable if $D$ is countable, and the conclusion follows directly.
(ii) Let $\mathcal{C}$ be an open cover of $\mathbb{Y}$. Then $\{f^{-1}(B): B \in \mathcal{C}\}$ must be an open cover of $\mathbb{X}$ by the definition. The compactness of $\mathbb{X}$ means that it contains a finite subcover, say $f^{-1}(B_1),\dots,f^{-1}(B_n)$ such that
$$\mathbb{Y} = f(\mathbb{X}) = f\Big(\bigcup_{j=1}^{n} f^{-1}(B_j)\Big) = f\Big(f^{-1}\Big(\bigcup_{j=1}^{n} B_j\Big)\Big) \subseteq \bigcup_{j=1}^{n} B_j, \tag{6.2}$$
where the third equality uses 1.2(ii) and the inclusion, 1.2(v). Hence $\mathcal{C}$ contains a finite subcover. ∎
Note the importance of the stipulation 'onto' in both these results. The extension of (ii) to the case of compact subsets of $\mathbb{X}$ and $\mathbb{Y}$ is obvious, and can be supplied by the reader.
Completeness, unlike separability, compactness, and continuity, is not a topological property. To define a Cauchy sequence it is necessary to have the concept of a distance between points. One of the advantages of defining a metric on a space is that the relatively weak notion of completeness provides some of the essential features of compactness in a wider class than the compact spaces.

6.3 Separation Properties
Another classification of topological spaces is provided by the separation axioms, which in one sense are more primitive than the countability axioms. They are indicators of the richness of a topology, in the sense of our ability to distinguish between different points of the space. From one point of view, they could be said to define a hierarchy of resemblances between topological spaces and metric spaces. Don't confuse separation with separability, which is a different concept altogether. A topological space $\mathbb{X}$ is said to be:
- a $T_1$-space, iff $\forall\, x,y \in \mathbb{X}$ with $x \ne y$ $\exists$ an open set containing $x$ but not $y$ and also an open set containing $y$ but not $x$;
- a Hausdorff (or $T_2$-) space, iff $\forall\, x,y \in \mathbb{X}$ with $x \ne y$ $\exists$ disjoint open sets $O_1$ and $O_2$ in $\mathbb{X}$ with $x \in O_1$ and $y \in O_2$;
- a regular space iff for each closed set $C$ and $x \notin C$ $\exists$ disjoint open sets $O_1$ and $O_2$ with $x \in O_1$ and $C \subseteq O_2$;
- a normal space iff, given disjoint closed sets $C_1$ and $C_2$, $\exists$ disjoint open sets $O_1$ and $O_2$ such that $C_1 \subseteq O_1$ and $C_2 \subseteq O_2$.
A regular $T_1$-space is called a $T_3$-space, and a normal $T_1$-space is called a $T_4$-space. In a $T_1$-space, the singleton sets $\{x\}$ are always closed, and this property characterizes $T_1$-spaces. If $\{x\}$ is closed, $y \in \{x\}^c$ whenever $y \ne x$, where $\{x\}^c$ is the complement of a closed set, and hence open. Conversely, if the $T_1$ property holds, every $y \ne x$ is contained in an open set not containing $x$, and the union of all these sets, also open by 6.1(b), is $\{x\}^c$. It is easy to see that $T_4$ implies $T_3$ implies $T_2$ implies $T_1$, although the reverse implications do not hold, and without the $T_1$ property, normality need not imply regularity. Metric spaces are always $T_4$, for there is no difficulty in constructing the sets specified in the definition out of unions of $\varepsilon$-spheres.
We have the following important links between separation, compactness, countability, and metrizability. The first two results are of general interest but will not be exploited directly in this book, so we forgo the proofs. The proof of 6.12 needs some as yet undefined concepts, and is postponed to §6.6 below.
6.10 Theorem A regular Lindelöf space is normal. □
6.11 Theorem A compact Hausdorff space is $T_4$. □
6.12 Urysohn's metrization theorem A second-countable $T_4$-space is metrizable. □
In fact, the conditions of the last theorem can be weakened, with $T_4$ replaced by $T_3$ in view of 6.10, since we have already shown that a second-countable space is Lindelöf (6.5).
The properties of functions from $\mathbb{X}$ to the real line play an important role in defining the separation properties of a space. The key to these results is the remarkable Urysohn's lemma.
6.13 Urysohn's lemma A topological space $\mathbb{X}$ is normal iff for any pair $A$ and $B$ of disjoint closed subsets there exists a continuous function $f: \mathbb{X} \mapsto [0,1]$ such that $f(A) = 0$ and $f(B) = 1$. □
The function $f$ is called a separating function.
Proof This is by construction of the required function. Recall that the dyadic rationals $\mathbb{D}$ are dense in $[0,1]$. We demonstrate the existence of a system of open sets $\{U_r,\ r \in \mathbb{D}\}$ with the properties
(6.3) (6.4) (6.5)
Normality implies the existence of an open set $U_{1/2}$ (say) such that $U_{1/2}$ contains $A$ and $(\bar{U}_{1/2})^c$ contains $B$. The same story can be told with $U_{1/2}^c$ replacing $B$ in the role of $C_2$ to define $U_{1/4}$, and then again with $\bar{U}_{1/2}$ replacing $A$ in the role of $C_1$ to define $U_{3/4}$. The argument extends by induction to generate sets $\{U_{m/2^n},\ m = 1,\dots,2^n - 1\}$ for any $n \in \mathbb{N}$, and the collection $\{U_r,\ r \in \mathbb{D}\}$ is obtained on letting $n \to \infty$. It is easy to verify conditions (6.3)-(6.5) for this collection. Fig. 6.1 illustrates the construction for $n = 3$ when $A$ and $B$ are regions of the plane. One must imagine countably many more 'layers of the onion' in the limiting case.
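The displays numbered (6.3)-(6.5) do not survive in the text above. The block below gives one reading that is consistent with how the properties are used in the rest of the proof. It is an assumption, not the author's wording: only the monotone property, cited later as (6.5), is pinned down by the surrounding argument, and the assignment of the first two conditions to the numbers (6.3) and (6.4) is a guess.

```latex
% Assumed reconstruction of the three properties of the system {U_r, r in D}:
\begin{align*}
  A &\subseteq U_r              && \text{for all } r \in \mathbb{D}, \\
  \bar{U}_r &\subseteq B^c      && \text{for all } r \in \mathbb{D}, \\
  \bar{U}_s &\subseteq U_r      && \text{for } s, r \in \mathbb{D} \text{ with } s < r .
\end{align*}
% The first condition gives f(A) = 0, the second gives f(B) = 1, and the third
% is the monotone property invoked in the continuity argument.
```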
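Fig. 6.1
Now define $f: \mathbb{X} \mapsto [0,1]$ by
$$f(x) = \begin{cases} \inf\{r \in \mathbb{D}: x \in U_r\}, & x \in \bigcup_{r \in \mathbb{D}} U_r \\ 1, & \text{otherwise} \end{cases} \tag{6.6}$$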
where, in particular, $f(x) = 1$ for $x \in B$. Because of the monotone property (6.5), we have for any $\alpha \in (0,1]$
$$\{x: f(x) < \alpha\} = \{x: \inf\{r \in \mathbb{D}: x \in U_r\} < \alpha\} = \{x: \exists\, r < \alpha \text{ such that } x \in U_r\} = \bigcup_{r < \alpha} U_r \tag{6.7}$$
which is open. On the other hand, because $\mathbb{D}$ is dense in $[0,1]$ we can deduce that, for any $\beta \in [0,1)$,
$$\{x: f(x) \le \beta\} = \{x: \inf\{r \in \mathbb{D}: x \in U_r\} \le \beta\} = \{x: x \in U_r\ \forall\, r > \beta\} = \bigcap_{r > \beta} U_r = \bigcap_{r > \beta} \bar{U}_r \tag{6.8}$$
which is closed. Here, the final equality must hold to reconcile the following two facts: first that $U_r \subseteq \bar{U}_r$, and second that, for all $r > \beta$, there exists (since $\mathbb{D}$ is dense) $s \in \mathbb{D}$ with $r > s > \beta$ and $\bar{U}_s \subseteq U_r$ by (6.5). We have therefore shown that, for $0 \le \beta < \alpha \le 1$,
$$\{x: \beta < f(x) < \alpha\} = \{x: f(x) < \alpha\} \cap \{x: f(x) \le \beta\}^c \tag{6.9}$$
is open, being the intersection of open sets. Since every open set of $[0,1]$ is a union of open intervals (see 2.5), it follows that $f^{-1}(A)$ is open in $\mathbb{X}$ whenever $A$ is open in $[0,1]$, and accordingly $f$ is continuous. It is immediate that $f(A) = 0$ and $f(B) = 1$ as required, and necessity is proved.
Sufficiency is simply a matter, given the existence of $f$ with the indicated properties, of citing the two sets $f^{-1}([0,\tfrac{1}{2}))$ and $f^{-1}((\tfrac{1}{2},1])$, which are open in $\mathbb{X}$, disjoint, and contain $A$ and $B$ respectively, so that $\mathbb{X}$ is normal. ∎
It is delightful the way this theorem conjures a continuous function out of thin air! It shows that the properties of real-valued functions provide a legitimate means of classifying the separation properties of the space.
In metric spaces, separating functions are obtained by a simple direct construction. If $A$ and $B$ are closed and disjoint subsets of a metric space $(\mathbb{S},d)$, the normality property implies the existence of $\delta > 0$ such that $\inf_{x \in A,\, y \in B} d(x,y) \ge \delta$. The required function is
$$f(x) = \frac{d(x,A)}{d(x,A) + d(x,B)} \tag{6.10}$$
where $d(x,A) = \inf_{y \in A} d(x,y)$, and $d(x,B)$ is defined similarly. The continuity of $f$ follows since $d(x,A)$ and $d(x,B)$ are continuous in $x$, and the denominator in (6.10) is bounded below by $\delta$. A similar construction was used in the proof of 5.25.
The regularity property can be strengthened by requiring the existence of separating functions for closed sets $C$ and points $x$. A topological space $\mathbb{X}$ is said to be completely regular if, for all closed $C \subseteq \mathbb{X}$ and points $x \notin C$, $\exists$ a continuous function $f: \mathbb{X} \mapsto [0,1]$ with $f(C) = 0$ and $f(x) = 1$. A completely regular $T_1$-space is called a Tychonoff space or $T_{3\frac{1}{2}}$-space. As the tongue-in-cheek terminology suggests, a $T_4$-space is $T_{3\frac{1}{2}}$ (this is immediate from Urysohn's lemma) and a $T_{3\frac{1}{2}}$-space is clearly $T_3$, although the reverse implications do not hold. Being $T_4$, metric spaces are always $T_{3\frac{1}{2}}$.
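The metric-space construction in (6.10) is easy to try out. The following Python sketch assumes, purely for illustration, that the space is the real line with $d(x,y) = |x - y|$ and that $A$ and $B$ are finite (hence closed) disjoint sets; the function names `dist_to_set` and `separating_function` are hypothetical.

```python
def dist_to_set(x, points):
    """d(x, A) = inf over y in A of |x - y|, for a finite set A of reals."""
    return min(abs(x - y) for y in points)

def separating_function(x, a, b):
    """f(x) = d(x, A) / (d(x, A) + d(x, B)), as in (6.10)."""
    da, db = dist_to_set(x, a), dist_to_set(x, b)
    return da / (da + db)

A = [0.0, 1.0]
B = [3.0, 4.5]

for x in (0.0, 1.0, 2.0, 3.0, 4.5):
    print(x, separating_function(x, A, B))
```

The function takes the value 0 on $A$ and 1 on $B$, and values strictly between 0 and 1 elsewhere; since $A$ and $B$ are disjoint, the denominator never vanishes.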

6.4 Weak Topologies
Now let's go the other way and, instead of using a topology to define a class of real functions, use a class of functions to define a topology. Let $\mathbb{X}$ be a space and $\mathbb{F}$ a class of functions $f: \mathbb{X} \mapsto \mathbb{Y}_f$, where the codomains $\mathbb{Y}_f$ are topological spaces. The weak topology induced by $\mathbb{F}$ on $\mathbb{X}$ is the weakest topology under which every $f \in \mathbb{F}$ is continuous. Recall, continuity means that $f^{-1}(A)$ is open in $\mathbb{X}$ whenever $A$ is open in $\mathbb{Y}_f$. We can also call the weak topology the topology generated by the base of sets $\mathcal{V}$ consisting of the inverse images of the open sets of the $\mathbb{Y}_f$, under $f \in \mathbb{F}$, together with the finite intersections of these sets. The inverse images themselves are called a sub-base for the topology, meaning that the sets of the topology can be generated from them by operations of union and finite intersection.
If we enlarge $\mathbb{F}$ we (potentially) increase the number of sets in this base and get a stronger topology, and if we contract $\mathbb{F}$ we likewise get a weaker topology. With a given $\mathbb{F}$, any topology stronger than the weak topology contains a richer collection of open sets, so the elements of $\mathbb{F}$ must retain their continuity in this case, but weakening the topology further must by definition force some $f \in \mathbb{F}$ to be discontinuous.
The class of cases in which $\mathbb{Y}_f = \mathbb{R}$ for each $f$ suggests using the concept of a weak topology to investigate the structure of a space. One way to represent the richness of a given topology $\tau$ on $\mathbb{X}$ is to ask whether $\tau$ contains, or is contained in, the weak topology generated by a particular collection of bounded real-valued functions on $\mathbb{X}$. For example, complete regularity is the minimal condition which makes the sort of construction in 6.13 feasible. According to the next result, this is sufficient to allow the topology to be completely characterized in terms of bounded, continuous real-valued functions on the space.
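The passage from sub-base to base to topology can be traced on a finite set. The sketch below is an illustration under assumptions of its own: the domain is a four-point set, each function maps into the two-point set $\{0,1\}$ with the discrete topology, and the names `powerset` and `weak_topology` are hypothetical. It builds the topology by brute force, which is feasible only for very small examples.

```python
from itertools import chain, combinations

def powerset(items):
    """All finite subcollections of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def weak_topology(space, functions):
    """functions: list of (f, open_sets_of_codomain) pairs, with f a dict on space."""
    space = frozenset(space)
    # Sub-base: inverse images of the open sets of each codomain.
    sub_base = {frozenset(x for x in space if f[x] in o)
                for f, opens in functions for o in opens}
    # Base: finite intersections of sub-base sets (empty intersection = whole space).
    base = set()
    for sub in powerset(sub_base):
        inter = space
        for s in sub:
            inter &= s
        base.add(inter)
    # Topology: all unions of base sets (empty union = empty set).
    return {frozenset().union(*sub) for sub in powerset(base)}

space = {1, 2, 3, 4}
discrete2 = [set(), {0}, {1}, {0, 1}]          # discrete topology on {0, 1}
parity = {1: 1, 2: 0, 3: 1, 4: 0}
upper = {1: 0, 2: 0, 3: 1, 4: 1}

t1 = weak_topology(space, [(parity, discrete2)])
t2 = weak_topology(space, [(parity, discrete2), (upper, discrete2)])
print(len(t1), len(t2))
```

With one function the weak topology here has four open sets; adding the second function enlarges it to the discrete topology on the four points, in line with the remark that enlarging $\mathbb{F}$ gives a stronger topology.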
6.14 Theorem If a topological space $(\mathbb{X},\tau)$ is completely regular, the topology $\tau$ is the weak topology induced by the set $\mathbb{F}$ of the separating functions.
Proof Let $\mathcal{V}$ denote the collection of inverse images of open sets under the functions of $\mathbb{F}$. And let $T$ denote the weak topology induced by $\mathbb{F}$, such that the $\mathcal{V}$-sets, together with their finite intersections, form a base for $T$. We show that $T = \tau$.
For any $x \in \mathbb{X}$, let $O \in \tau$ be an open set containing $x$. Then $O^c$ is closed, and by complete regularity there exists $f \in \mathbb{F}$ taking values in $[0,1]$ with $f(x) = 1$ and $f(O^c) = 0$. The set $(\tfrac{1}{2},1]$ is open in $[0,1]$, and $B = f^{-1}((\tfrac{1}{2},1])$ is therefore an open set, containing $x$ and disjoint with $O^c$ so that $B \subseteq O$. Since this holds for every such $O$, $x$ has a base $\mathcal{V}_x$ consisting of inverse images of open sets under functions from $\mathbb{F}$. Since $x$ is arbitrary the collection $\mathcal{V} = \{\mathcal{V}_x,\ x \in \mathbb{X}\}$ forms a base for $\tau$. It follows that $\tau \subseteq T$.
On the other hand, $T$ is by definition the weakest topology under which every $f \in \mathbb{F}$ is continuous. Since $f \in \mathbb{F}$ is a separating function and continuous under $\tau$, it also follows that $T \subseteq \tau$. ∎

6.5 The Topology of Product Spaces
Let $\mathbb{X}$ and $\mathbb{Y}$ be a pair of topological spaces, and consider the product space $\mathbb{X} \times \mathbb{Y}$. The plane $\mathbb{R} \times \mathbb{R}$ and subsets thereof are the natural examples for appreciating the properties of product spaces, although it is a useful exercise to think up more exotic cases. An example always given in textbooks of topology is $\mathbb{C} \times \mathbb{C}$ where $\mathbb{C}$ is the unit circle; this space has the topology of the torus (doughnut).
Let the ordered pair $(x,y)$ be the generic element of $\mathbb{X} \times \mathbb{Y}$. The coordinate projections are the mappings $\pi_{\mathbb{X}}: \mathbb{X} \times \mathbb{Y} \mapsto \mathbb{X}$ and $\pi_{\mathbb{Y}}: \mathbb{X} \times \mathbb{Y} \mapsto \mathbb{Y}$, defined by
$$\pi_{\mathbb{X}}(x,y) = x \tag{6.11}$$
$$\pi_{\mathbb{Y}}(x,y) = y. \tag{6.12}$$
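If $\mathbb{X}$ and $\mathbb{Y}$ are topological spaces, the coordinate projections can be used to generate a new topology on the product space. The product topology on $\mathbb{X} \times \mathbb{Y}$ is the weak topology induced by the coordinate projections.
The underlying idea here is very simple. If $A \subseteq \mathbb{X}$ and $B \subseteq \mathbb{Y}$ are open sets, the set $A \times B = (A \times \mathbb{Y}) \cap (\mathbb{X} \times B)$, where $A \times \mathbb{Y} = \pi_{\mathbb{X}}^{-1}(A)$ and $\mathbb{X} \times B = \pi_{\mathbb{Y}}^{-1}(B)$, will be regarded as open in $\mathbb{X} \times \mathbb{Y}$, and is called an open rectangle of $\mathbb{X} \times \mathbb{Y}$. The product topology on $\mathbb{X} \times \mathbb{Y}$ is the one having the open rectangles as a base. This means that two points $(x_1,y_1)$ and $(x_2,y_2)$ are close in $\mathbb{X} \times \mathbb{Y}$ provided $x_1$ is close to $x_2$ in $\mathbb{X}$, and $y_1$ to $y_2$ in $\mathbb{Y}$. Equivalently, it is the weakest topology under which the coordinate projections are continuous.
If the factors are metric spaces $(\mathbb{X},d_{\mathbb{X}})$ and $(\mathbb{Y},d_{\mathbb{Y}})$, several metrics can be constructed to induce the product topology on $\mathbb{X} \times \mathbb{Y}$, including
$$\rho((x_1,y_1),(x_2,y_2)) = \max\{d_{\mathbb{X}}(x_1,x_2),\, d_{\mathbb{Y}}(y_1,y_2)\} \tag{6.13}$$
and
(6.14)
An open sphere in the space $(\mathbb{X} \times \mathbb{Y},\rho)$, where $\rho$ is the metric in (6.13), also happens to be an open rectangle, for
$$S_\rho((x,y),\delta) = S_{d_{\mathbb{X}}}(x,\delta) \times S_{d_{\mathbb{Y}}}(y,\delta); \tag{6.15}$$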
but of course, this is not true for every metric.
Since either $\mathbb{X}$ or $\mathbb{Y}$ may be a product space, the generalization of these results from two to any finite number of factors is straightforward. The generic element of the space $\times_{i=1}^{n} \mathbb{X}_i$ is the $n$-tuple $(x_1,\dots,x_n: x_i \in \mathbb{X}_i)$, and so on. But to deal with infinite collections of factor spaces, as we shall wish to do below, it is necessary to approach the product from a slightly different viewpoint. Let $A$ denote an arbitrary index set, and $\{\mathbb{X}_\alpha,\ \alpha \in A\}$ a collection of spaces indexed by $A$. The Cartesian product $\mathbb{X}^A = \times_{\alpha \in A} \mathbb{X}_\alpha$ is the collection of all the mappings $x: A \mapsto \bigcup_{\alpha \in A} \mathbb{X}_\alpha$ such that $x(\alpha) \in \mathbb{X}_\alpha$ for each $\alpha \in A$. This definition contains that given in §1.1 as a special case, but is fundamentally more general in character. The coordinate projections are the mappings $\pi_\alpha: \mathbb{X}^A \mapsto \mathbb{X}_\alpha$ with
$$\pi_\alpha(x) = x(\alpha), \tag{6.16}$$
but can also be defined as the images under $x$ of the points $\alpha \in A$. Thus, a point in the product space is a mapping, the one which generates the coordinate projections when it is evaluated at points of the domain $A$. In the case of a finite product, $A$ can be the integers $1,\dots,n$. In a countable case $A = \mathbb{N}$, or a set equipotent with $\mathbb{N}$, and we should call $x$ an infinite sequence, an element of $\mathbb{X}^\infty$ (say). A familiar uncountable example is provided by a class of real-valued functions $x: \mathbb{R} \mapsto \mathbb{R}$, so that $A = \mathbb{R}$. In this case, $x$ associates each point $\alpha \in \mathbb{R}$ with a real number $x(\alpha)$, and defines an element of the product $\mathbb{R}^{\mathbb{R}}$.
The product topology is now generalized as follows. Let $\{\mathbb{X}_\alpha,\ \alpha \in A\}$ be an arbitrary collection of topological spaces. The Tychonoff topology (product topology) on the space $\mathbb{X}^A$ has as base the finite-dimensional open rectangles, sets of the form $\times_{\alpha \in A} O_\alpha$, where the $O_\alpha \subseteq \mathbb{X}_\alpha$ are open sets and $O_\alpha = \mathbb{X}_\alpha$ except for at most a finite number of coordinates. These basic sets can be written as the intersections of finite collections of cylinders, say
$$B = \bigcap_{j=1}^{m} \pi_{\alpha_j}^{-1}(O_{\alpha_j}) \tag{6.17}$$
for indices $\alpha_1,\dots,\alpha_m \in A$. Let $\tau$ be a topology under which the coordinate projections are continuous. If $O_\alpha$ is open in $\mathbb{X}_\alpha$, $\pi_\alpha^{-1}(O_\alpha) \in \tau$, and hence $\tau$ contains the Tychonoff topology. Since this is true for any such $\tau$, we can characterize the Tychonoff topology as the weak topology generated by the coordinate projections. The sets $\pi_\alpha^{-1}(O_\alpha)$ form the sub-base for the topology, whose finite intersections yield the base sets.
Something to keep in mind in these infinite product spaces is that, if any of the sets $\mathbb{X}_\alpha$ are empty, $\mathbb{X}^A$ is empty. Some of our results are true only for non-empty spaces, so for full rigour the stipulation that elements exist is desirable.
6.15 Example The space $(C,d_U)$ examined in §5.5 is an uncountable product space having the Tychonoff topology; the uniform metric is the generalization of the maximum metric $\rho$ of (6.13).
Continuous functions are regarded as close to one another under du only if they are close at every point of the domain. The sub sequent usefulness of this characterization of ( C,du) stems mainly from the fact that the coordinate projections are known to be continuous. o The two essential theorems on productspaces extend separability and compactness from the factor spaces to the product. The following theorem has a generalization to uncountable products, which we shall not pursue since this is harder to prove, and the countable case is sufficient for our purposes. 6.16 Theorem Finite or countable product spaces are separable under the product topology iff the factor sp aces are separable. Proof The proof for finite products is an easy implication of the countable case, hence consider �oo = Xt=!�i· Let Di = { di!,di2···· } � �i be a countable dense set for each i, and construct a set D � �oo by defining

$F_m = \times_{i=1}^{m} D_i \times \times_{i=m+1}^{\infty}\{d_{i1}\}$ (6.18)

for $m = 1,2,\ldots$, and then letting $D=\bigcup_{m=1}^{\infty} F_m$. $F_m$ is equipotent with the set of $m$-tuples formed from the elements of the countable $D_1,\ldots,D_m$, and is countable by induction from 1.4. Hence $D$ is countable, as a countable union of countable sets. We will show that $D$ is dense in $\mathbb{X}^\infty$. Let $B = \times_{i=1}^{\infty} O_i$ be a non-empty basic set, with $O_i$ open in $\mathbb{X}_i$ and $O_i=\mathbb{X}_i$ except for a finite number of coordinates. Choose $m$ such that $O_i=\mathbb{X}_i$ for $i>m$, and then

$B\cap F_m = \times_{i=1}^{m} (O_i\cap D_i)\times\times_{i=m+1}^{\infty}\{d_{i1}\}\neq\emptyset,$ (6.19)

recalling that the dense property implies $O_i\cap D_i\neq\emptyset$, for $i=1,\ldots,m$. Since $B\cap F_m\subseteq B\cap D$, it follows that $B$ contains a point of $D$; and since $B$ is an arbitrary basic set, $D$ is dense in $\mathbb{X}^\infty$, as required. ∎
One of the most powerful and important results in topology is Tychonoff's theorem, which states that arbitrary products of compact topological spaces are also compact, under the product topology. It will suffice here to prove the result for countable products of metric spaces, and this case can be dealt with using a more elementary and familiar line of argument. It is not necessary to specify the metrics involved, for we need the spaces to be metric solely to exploit the equivalence of compactness and sequential compactness.

6.17 Theorem A finite or countable product of separable metric spaces $(\mathbb{X}_i,d_i)$ is compact under the product topology iff the factor spaces are compact.

Proof As before, the finite case follows easily from the countable case, so assume $\mathbb{X}^\infty = \times_{i=1}^{\infty}\mathbb{X}_i$, where the $\mathbb{X}_i$ are separable spaces. In a metric space, which is first countable, compactness implies separability and is equivalent to sequential compactness by 6.8 and 6.7. Since $\mathbb{X}_i$ is sequentially compact and first-countable, every sequence $\{x_{in}, n\in\mathbb{N}\}$ on $\mathbb{X}_i$ has a cluster point $x_i$ on the space (6.4). Applying the diagonal argument of 2.36, there exists a single subsequence of integers, $\{n_k, k\in\mathbb{N}\}$, such that $x_{in_k}\to x_i$, for every $i$. Consider the subsequence in $\mathbb{X}^\infty$, $\{x_{n_k}, k\in\mathbb{N}\}$ where $x_{n_k} = (x_{1n_k}, x_{2n_k},\ldots)$. In the product topology, $x_{n_k}\to x = (x_1,x_2,\ldots)$ iff $x_{in_k}\to x_i$ for every $i$, which proves that $\mathbb{X}^\infty$ is sequentially compact. $\mathbb{X}^\infty$ can be endowed with the metric $\rho_\infty = \sum_{i=1}^{\infty} d_i/2^i$, which induces the product topology. $\mathbb{X}^\infty$ is separable by 6.16, and sequential compactness is equivalent to compactness by 6.8 and 6.7, as above. This proves sufficiency. Necessity follows from 6.9(ii), by continuity of the projections as before. ∎

6.18 Example The space $\mathbb{R}^\infty$ (see 5.14) is endowed with the Tychonoff topology
if we take as the base sets of a point $x$ the collection

$N(x;k,\varepsilon) = \{y: |x_i - y_i| < \varepsilon,\ i = 1,\ldots,k\};\quad k\in\mathbb{N},\ \varepsilon > 0.$ (6.20)

A point in $\mathbb{R}^\infty$ is close to $x$ in this topology if many of its coordinates are close to those of $x$; another point is closer if either more coordinates are within $\varepsilon$ of each other, or the same coordinates are closer than $\varepsilon$, or both. The metric $d_\infty$ defined in (5.15) induces the topology of (6.20). If $\{x_n\}$ is a sequence in $\mathbb{R}^\infty$, $d_\infty(x_n,x)\to 0$ iff $\forall\ \varepsilon,k\ \exists\ N\geq 1$ such that $x_n\in N(x;k,\varepsilon)$ for all $n\geq N$. We already know that $\mathbb{R}^\infty$ is separable under $d_\infty$ (5.15), but now we can deduce this as a purely topological property, since $\mathbb{R}^\infty$ inherits separability from $\mathbb{R}$ by 6.16. □

The infinite cube $[0,1]^\infty$ shares the topology (6.20) with $\mathbb{R}^\infty$ and is a compact space by 6.17; to show this we can assign the Euclidean metric to the factor spaces $[0,1]$. The trick of metrizing a space to establish a topological property is frequently useful, and is one we shall exploit again below.
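The link between the base sets (6.20) and a metric of the form $\sum_i |p_i - q_i|/2^i$ on the infinite cube can be explored numerically. The following is only a sketch under stated assumptions: points are represented as functions from the coordinate index to $[0,1]$, the infinite sum is truncated after `n_terms` coordinates (which changes the value by at most $2^{-n\_terms}$), and the names `rho_inf`, `in_base_set` and `make_y` are hypothetical helpers introduced for illustration.

```python
def rho_inf(p, q, n_terms=40):
    """Truncated rho_inf(p, q) = sum_i |p_i - q_i| / 2**i on the infinite cube.

    p and q map a coordinate index i = 1, 2, ... to a number in [0, 1].
    Dropping the coordinates beyond n_terms changes the sum by at most 2**(-n_terms).
    """
    return sum(abs(p(i) - q(i)) / 2**i for i in range(1, n_terms + 1))


def in_base_set(x, y, k, eps):
    """True if y lies in N(x; k, eps): the first k coordinates differ by less than eps."""
    return all(abs(x(i) - y(i)) < eps for i in range(1, k + 1))


# Hypothetical test points: x is the zero sequence, and y_k agrees with x on the
# first k coordinates but is as far away as possible (equal to 1) on all the others.
x = lambda i: 0.0


def make_y(k):
    return lambda i: 0.0 if i <= k else 1.0


for k in (5, 10, 20):
    y = make_y(k)
    # y_k belongs to N(x; k, eps) for every eps > 0, and its distance from x
    # is bounded by the tail sum 2**(-k).
    print(k, in_base_set(x, y, k, 1e-6), rho_inf(x, y))
```

Agreement on more coordinates forces a smaller distance, even when the remaining coordinates are as far apart as the cube allows, which is the sense in which the metric and the base sets describe the same notion of closeness.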
6.6 Embedding and Metrization
Let $\mathbb{X}$ be a topological space, and $\mathbb{F}$ a class of functions $f: \mathbb{X}\mapsto\mathbb{Y}_f$. The evaluation map $e: \mathbb{X}\mapsto\times_{f\in\mathbb{F}}\mathbb{Y}_f$ is the mapping defined by

$e(x)_f = f(x).$ (6.21)

The class $\mathbb{F}$ may be quite general, but if it were finite we would think of $e(x)$ as the vector whose elements are the $f(x)$, $f\in\mathbb{F}$. (6.21) could also be written $\pi_f\circ e = f$ where $\pi_f$ is the coordinate projection. A minor complication arises because $f$ need not be onto $\mathbb{Y}_f$, and $e(\mathbb{X})\subset\times_{f\in\mathbb{F}}\mathbb{Y}_f$ is possible. If $A$ is a set of points in $\mathbb{Y}_f$, the inverse projection $\pi_f^{-1}(A)$ may contain points not in $e(\mathbb{X})$. We therefore need to express the inverse image of $A$ under $f$, in terms of $e$, as

$f^{-1}(A) = e^{-1}\big(\pi_f^{-1}(A)\cap e(\mathbb{X})\big).$ (6.22)

The importance of this concept stems from the fact that, under the right conditions, the evaluation map embeds $\mathbb{X}$ in the product space generated by it. It would be homeomorphic to it in the case $e(\mathbb{X}) = \times_{f\in\mathbb{F}}\mathbb{Y}_f$.

6.19 Theorem Suppose the class $\mathbb{F}$ separates points of $\mathbb{X}$, meaning that $f(x)\neq f(y)$ for some $f\in\mathbb{F}$ whenever $x$ and $y$ are distinct points of $\mathbb{X}$. If $\mathbb{X}$ is endowed with the weak topology induced by $\mathbb{F}$, the evaluation map defines an embedding of $\mathbb{X}$ into $\times_{f\in\mathbb{F}}\mathbb{Y}_f$.
Proof It has to be shown that $e$ is a 1-1 mapping from $\mathbb{X}$ onto a subset of $\times_{f}\mathbb{Y}_f$, which is continuous with continuous inverse. Since $\mathbb{F}$ separates points of $\mathbb{X}$, $e$ is 1-1, since $e(x)\neq e(y)$ whenever $f(x)\neq f(y)$ for some $f\in\mathbb{F}$. To show continuity of $e$, note first that $f^{-1}(A)$ is open in $\mathbb{X}$ whenever $A$ is open in $\mathbb{Y}_f$ under the weak topology, and sets of the form $\pi_f^{-1}(A)$ are likewise open in $\times_f\mathbb{Y}_f$ with the product topology, since the projections are continuous. But $e^{-1}(\pi_f^{-1}(A)) = (\pi_f\circ e)^{-1}(A) = f^{-1}(A)$, so we can conclude that the inverse images under $e$ of sets of the form $\pi_f^{-1}(A)$, $A\subseteq\mathbb{Y}_f$, are open. Since inverse images preserve unions and intersections (see 1.2), the same property extends first to the base sets of $\times_f\mathbb{Y}_f$, which are finite intersections of these inverse projections, and thence to all the open sets of the product topology. $e^{-1}$ is continuous if $e(B)$ is open in $e(\mathbb{X})$ whenever $B$ is open in $\mathbb{X}$. Let $B$ be a
set of the form $f^{-1}(A)$, where $A$ is open in $\mathbb{Y}_f$. Since $\mathbb{F}$ defines the topology on $\mathbb{X}$ we know this set to be open, and the finite intersections of such sets form a base for $\mathbb{X}$ by assumption. Since $e$ is 1-1 and $e^{-1}$ a mapping, it will suffice to verify that their images under $e$ are open in $e(\mathbb{X})$. Noting that $B$ is a set of the type shown in (6.22), $e(B) = \pi_f^{-1}(A)\cap e(\mathbb{X})$, but since $\pi_f^{-1}(A)$ is open, $e(B)$ is open in $e(\mathbb{X})$ as required. ∎

The following is the (for us) most important case of Urysohn's embedding theorem.

6.20 Theorem A second-countable $T_4$-space $(\mathbb{X},\tau)$ can be embedded in $[0,1]^\infty$. □

The proof requires the sufficiency part of the following lemma.

6.21 Lemma Let $x\in\mathbb{X}$, and let $O\subseteq\mathbb{X}$ be any open set containing $x$. Iff $\mathbb{X}$ is a regular space, there exists an open set $U$ with $x\in U\subset\bar U\subset O$.

Proof Let $\mathbb{X}$ be regular. If $O$ is open and $x\in O$, there exist disjoint open sets $U$ and $C$ such that $x\in U$ and $O^c\subset C$, and hence $C^c\subset O$. Since $U\subseteq C^c$ by the disjointness and $C^c$ is closed, we have $\bar U\subseteq C^c\subset O$, and sufficiency is proved. To prove necessity, suppose $x\in U$ and $\bar U\subset O$. $O^c$ is a closed set not containing $x$, and $O^c\subset\bar U^c$, where $U$ and $\bar U^c$ are disjoint open sets. Hence $\mathbb{X}$ is regular. ∎

Proof of 6.20 Let $\mathcal{V}$ be a countable base for $\tau$. Since the space is $T_4$ it is $T_3$ and hence regular. For any $x\in\mathbb{X}$ and $B\in\mathcal{V}$ containing $x$, we have by 6.21 a $U\in\tau$ such that $x\in U\subset\bar U\subset B$, and also by definition of a base $\exists\ A\in\mathcal{V}$ with $x\in A\subset U\subset\bar U$, and hence $x\in A\subset\bar A\subset B$. ($\bar A$ is the smallest closed set containing $A$, note.) Since $\mathcal{V}$ is countable, the collection of all such pairs, say

$\mathcal{A} = \{(A,B): A\in\mathcal{V},\ B\in\mathcal{V};\ \bar A\subset B\},$ (6.23)

is countable, and so we can label its elements $(A,B)_i = (A_i,B_i)$, $i = 1,2,\ldots$ Every $x\in\mathbb{X}$ lies in $A_i$ for some $i\in\mathbb{N}$. Since the space is normal, we have by Urysohn's lemma a separating function $f_i: \mathbb{X}\mapsto[0,1]$ for each element of $\mathcal{A}$, such that $f_i(\bar A_i) = 1$ and $f_i(B_i^c) = 0$. For each $x\in\mathbb{X}$ and closed set $C$ such that $x\notin C$, choose $(A_i,B_i)$ such that $x\in\bar A_i\subset B_i\subset C^c$, and then $f_i(x) = 1$ and $f_i(C) = 0$.
These separating functions form a countable class $\mathbb{F}$, a subset of $U(\mathbb{X})$. Since the space is $T_1$, $C$ can be a point $\{y\}$ so that $\mathbb{F}$ separates points. And since the space is $T_{3\frac12}$ and hence completely regular, $\tau$ is the weak topology induced by $\mathbb{F}$, by 6.14. It follows by 6.19 that the evaluation map for $\mathbb{F}$ embeds $\mathbb{X}$ into $[0,1]^\infty$. ∎

Recall that $[0,1]^\infty$ endowed with the metric $\rho_\infty$ defined in (5.19) is a compact metric space. It follows that $e(\mathbb{X})$, which is homeomorphic to $\mathbb{X}$ under the evaluation mapping by $\mathbb{F}$, is a totally bounded metric space. It further follows that $(\mathbb{X},\tau)$ is metrizable, since, among other possibilities, it can be endowed with the metric under which the distance between points $x$ and $y$ of $\mathbb{X}$ is taken to be $\rho_\infty(e(x),e(y))$. We have therefore proved the Urysohn metrization theorem, 6.12.

The topology induced by this metric on $[0,1]^\infty$ is the Tychonoff topology. A base for a point $p = (p_1,p_2,\ldots)\in[0,1]^\infty$ in this topology is provided by sets of the form

$N(p;k,\varepsilon) = \{q\in[0,1]^\infty: |p_i - q_i| < \varepsilon,\ i = 1,\ldots,k\},$ (6.24)

for some finite $k$, and $0 < \varepsilon < 1$, which is the same as (6.20). The topology induced on $\mathbb{X}$ by the embedding is accordingly generated by the base sets

$N(x;k,\varepsilon) = \{y\in\mathbb{X}: |f_i(x) - f_i(y)| < \varepsilon,\ i = 1,\ldots,k\},$ (6.25)

which can be recognized as finite intersections of the inverse images, under functions from $\mathbb{F}$, of $\varepsilon$-neighbourhoods of $\mathbb{R}$; this is indeed the weak topology induced by $\mathbb{F}$. This further serves to remind us of the close link between product topologies and weak topologies.

Since metric spaces are $T_4$, separable metric spaces can be embedded in $[0,1]^\infty$ by 6.20. In this case the motivation is not metrization, but usually compactification; that is, to show that separable spaces can be topologized as totally bounded spaces. Both metrization and compactification are techniques with important applications in the theory of weak convergence, which we study in Chapter 26. Although the following theorem is a straightforward corollary of 6.20, the result is of sufficient interest to deserve its own proof; the main interest is to see how in metric spaces there always exists a ready-made collection of functions to define the weak topology.

6.22 Theorem A separable metric space $(\mathbb{S},d)$ is homeomorphic to a subset of $[0,1]^\infty$.

Proof Let $d_0 = d/(1+d)$, which satisfies $0\leq d_0\leq 1$ and is equivalent to $d$, so that $(\mathbb{S},d_0)$ is homeomorphic to $(\mathbb{S},d)$. By separability there exists a countable set of points $\{z_i, i\in\mathbb{N}\}$ which is dense in $\mathbb{S}$. Let a countable family of functions be defined by $f_i(x) = d_0(x,z_i)$, $i = 1,2,\ldots$, and define an evaluation map $h: \mathbb{S}\mapsto[0,1]^\infty$ by

$h(x) = (d_0(x,z_1),\ d_0(x,z_2),\ \ldots).$ (6.26)

We show that $h$ is an embedding in $([0,1]^\infty,\rho_\infty)$ where $\rho_\infty(h,g) = \sum_{k=1}^{\infty}|h_k - g_k|/2^k$. If $\{x_n\}$ is a sequence in $\mathbb{S}$ converging to $x$, then for each $k$, $d_0(x_n,z_k)\to d_0(x,z_k)$. Accordingly, $\forall\ k,\varepsilon\ \exists\ N\geq 1$ such that $x_n\in N(x;k,\varepsilon)$ for all $n\geq N$, $\rho_\infty(h(x_n),h(x))\to 0$, and $h$ is continuous at $x$.
On the other hand, if $x_n\not\to x$, there exists $\varepsilon > 0$ such that $\forall\ N\geq 1$, $d_0(x_n,x)\geq\varepsilon$ for some $n\geq N$. Since $\{z_k\}$ is dense in $\mathbb{S}$, there is a $k$ for which $d_0(x_n,z_k)\geq\tfrac34\varepsilon$ and $d_0(x,z_k) < \tfrac14\varepsilon$, so that $|d_0(x_n,z_k) - d_0(x,z_k)|\geq\tfrac12\varepsilon$ and hence

$\rho_\infty(h(x_n),h(x))\geq 2^{-k}|d_0(x_n,z_k) - d_0(x,z_k)|\geq 2^{-k-1}\varepsilon.$ (6.27)

Since this holds for some $n\geq N$ for every $N\geq 1$, it holds for infinitely many $n$, and $h(x_n)\not\to h(x)$. We have therefore shown that $h(x_n)\to h(x)$ if and only if $x_n\to x$. This is the property of a 1-1 continuous function with continuous inverse. ∎

But note too the alternative approach of transforming the distance functions into separating functions as in (6.10), and applying 6.20.
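The construction in the proof of 6.22 can be tried out on the real line. The sketch below is illustrative only and rests on assumptions not in the text: the space is $\mathbb{R}$ with the usual distance, the countable dense set is replaced by a short hypothetical list `Z` of rationals, and the map `h` therefore keeps only finitely many coordinates. It checks the continuity half of the argument, using the bound $\rho_\infty(h(x),h(y))\leq d_0(x,y)$ that follows from the triangle inequality and $\sum_k 2^{-k}\leq 1$.

```python
def d0(x, y):
    """Bounded metric d0 = d / (1 + d), built from the usual distance on the real line."""
    d = abs(x - y)
    return d / (1 + d)


# Hypothetical finite stand-in for the countable dense set {z_k}.
Z = [0.0, 1.0, -1.0, 0.5, -0.5, 2.0, -2.0, 1.5]


def h(x):
    """Truncated evaluation map h(x) = (d0(x, z_1), d0(x, z_2), ...)."""
    return [d0(x, z) for z in Z]


def rho_inf(u, v):
    """rho_inf(u, v) = sum_k |u_k - v_k| / 2**k over the retained coordinates."""
    return sum(abs(a - b) / 2**k for k, (a, b) in enumerate(zip(u, v), start=1))


x = 0.3
for n in (1, 10, 100, 1000):
    xn = x + 1.0 / n  # a sequence converging to x
    # The image distance is dominated by d0(xn, x), so it shrinks with it.
    print(n, d0(xn, x), rho_inf(h(xn), h(x)))
```

With only finitely many $z_k$ the converse direction of the proof is not reproduced in general; it relies on the full dense set.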
II PROBABILITY
7 Probability Spaces

7.1 Probability Measures
A random experiment is an action or observation whose outcome is uncertain in advance of its occurrence. Tosses of a coin, spins of a roulette wheel, and observations of the price of a stock are familiar examples. A probability space, the triple $(\Omega,\mathcal{F},P)$, is to be thought of as a mathematical model of a random experiment. $\Omega$ is the sample space, the set of all the possible outcomes of the experiment, called the random elements, individually denoted $\omega$. The collection $\mathcal{F}$ of random events is a $\sigma$-field of subsets of $\Omega$, the event $A\in\mathcal{F}$ being said to have occurred if the outcome of the experiment is an element of $A$. A measure $P$ is assigned to the elements of $\mathcal{F}$, $P(A)$ being the probability of $A$. Formally, we have the following.

7.1 Definition A probability measure (p.m.) on a measurable space $(\Omega,\mathcal{F})$ is a set function $P: \mathcal{F}\mapsto[0,1]$ satisfying the axioms of probability:
(a) $P(A)\geq 0$, for all $A\in\mathcal{F}$.
(b) $P(\Omega) = 1$.
(c) Countable additivity: for a disjoint collection $\{A_j\in\mathcal{F}, j\in\mathbb{N}\}$,

$P\Big(\bigcup_j A_j\Big) = \sum_j P(A_j).$ (7.1) □
Under the frequentist interpretation of probability, $P(A)$ is the limiting case of the proportion of a long run of repeated experiments in which the outcome is in $A$. Alternatively, probability may be viewed as a subjective notion with $P(A)$ said to represent an observer's degree of belief that $A$ will occur. For present purposes, the interpretation given to the probabilities has no relevance. The theory stands or falls by its mathematical consistency alone, although it is then up to us to decide whether the results accord with our intuition and are useful in the analysis of real-world problems.

Additional properties of $P$ follow from the axioms.

7.2 Theorem If $A$, $B$, and $\{A_j, j\in\mathbb{N}\}$ are arbitrary $\mathcal{F}$-sets, then
(i) $P(A)\leq 1$.
(ii) $P(A^c) = 1 - P(A)$.
(iii) $P(\emptyset) = 0$.
(iv) $A\subseteq B\Rightarrow P(A)\leq P(B)$ (monotonicity).
(v) $P(A\cup B) = P(A) + P(B) - P(A\cap B)$.
(vi) $P(\bigcup_j A_j)\leq\sum_j P(A_j)$ (countable subadditivity).
(vii) $A_j\uparrow A$ or $A_j\downarrow A\Rightarrow P(A_j)\to P(A)$ (continuity). □

Most of these are properties of measures in general. The complementation property (ii) is special to $P$, although an analogous condition holds for any finite measure, with $P(\Omega)$ replacing 1 in the formula. (iii) confirms $P$ is a measure, on the definition.
Proof Applying 7.1(a), (b), and (c),

$P(A) + P(A^c) = P(A\cup A^c) = P(\Omega) = 1,$ (7.2)

from which follow (i) and (ii), and also (iii) on setting $A = \Omega$. (iv)-(vi) follow by 3.3, and (vii) by 3.4. ∎
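On a finite sample space the properties in 7.2 can be checked exhaustively. The following sketch is an illustration rather than part of the theory: it assumes a fair six-sided die, takes the power set as the $\sigma$-field, and uses exact rational arithmetic. The names `omega`, `P`, `events` and `check_theorem_7_2` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))            # outcomes of one roll of a die
p = {w: Fraction(1, 6) for w in omega}    # fair-die hypothesis (an assumption)


def P(event):
    """Probability of an event (a subset of omega), by additivity over outcomes."""
    return sum((p[w] for w in event), Fraction(0))


def events(s):
    """All subsets of s; the power set plays the role of the sigma-field."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]


def check_theorem_7_2():
    """Verify 7.2(i)-(v) for every event, or pair of events, in the power set."""
    F = events(omega)
    assert P(frozenset()) == 0                             # (iii)
    for A in F:
        assert P(A) <= 1                                   # (i)
        assert P(omega - A) == 1 - P(A)                    # (ii)
        for B in F:
            if A <= B:
                assert P(A) <= P(B)                        # (iv) monotonicity
            assert P(A | B) == P(A) + P(B) - P(A & B)      # (v)
    return True


print(check_theorem_7_2())
```

Properties (vi) and (vii) concern countable collections and are not exercised by a finite check of this kind.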
To create a probability space, probabilities are assigned to a basic class of events $\mathcal{C}$, according to a hypothesis about the mechanisms underlying the random outcome. For example, in coin or die tossing experiments we have the usual hypothesis of a fair coin or die, and hence of equally likely outcomes. Then, provided $\mathcal{C}$ is rich enough to be a determining class for the space, $(\Omega,\mathcal{F},P)$ exists by 3.8 (extension theorem) where $\mathcal{F} = \sigma(\mathcal{C})$.

7.3 Example Let $\mathcal{B}_{[0,1]} = \{B\cap[0,1], B\in\mathcal{B}\}$, where $\mathcal{B}$ is the Borel field on $\mathbb{R}$. Then $([0,1],\mathcal{B}_{[0,1]},m)$, where $m$ is Lebesgue measure, is a probability space, since $m([0,1]) = 1$. The random elements of this space are real numbers between 0 and 1, and a drawing from the distribution is called a random variable. It is said to be distributed uniformly on the unit interval. The inclusion or exclusion of the endpoints is optional, remembering that $m([0,1]) = m((0,1)) = 1$. □
The atoms of a p.m. are the outcomes (singleton sets of $\Omega$) that have positive probability. The following is true for finite measures generally but has special importance in the theory of distributions.

7.4 Theorem The atoms of a p.m. are at most countable.

Proof Let $\omega_1$ be an atom satisfying $P(\{\omega_1\})\geq P(\{\omega\})$ for all $\omega\in\Omega$, let $\omega_2$ satisfy $P(\{\omega_2\})\geq P(\{\omega\})$ for all $\omega\in\Omega - \{\omega_1\}$, and so forth, to generate a sequence with

$P(\{\omega_1\})\geq P(\{\omega_2\})\geq P(\{\omega_3\})\geq\cdots.$ (7.3)

The partial sums $\sum_{i=1}^{n} P(\{\omega_i\})$ form a monotone sequence which cannot exceed $P(\Omega) = 1$, and therefore converges by 2.11, implying by 2.25 that $\lim_{n\to\infty}P(\{\omega_n\}) = 0$. All points with positive probability are therefore in the countable set $\{\omega_i, i\in\mathbb{N}\}$. ∎
Suppose a random experiment represented by the space $(\Omega,\mathcal{F},P)$ is modified so as to confine the possible outcomes to a subset of the sample space, say $A\subset\Omega$. For example, suppose we switch from playing roulette with a wheel having a zero slot to one without. The restricted probability space is derived as follows. Let $\mathcal{F}_A$ denote the collection $\{E\cap A, E\in\mathcal{F}\}$. $\mathcal{F}_A$ is a $\sigma$-field (compare 1.23), and is called the trace of $\mathcal{F}$ on $A$. Defining $P_A(E) = P(E)/P(A)$ for $E\in\mathcal{F}_A$, where $P(A) > 0$, $P_A$ can be verified to be a p.m. The triple $(A,\mathcal{F}_A,P_A)$ is called the trace of $(\Omega,\mathcal{F},P)$ on $A$.
This is similar to the restriction of a measure space to a subspace, except that the measure is renormalized so that it remains a p.m.

In everyday language, we are inclined to say that events may be 'impossible' or 'certain'. If such events are none the less elements of $\mathcal{F}$, and hence technically random, we convey the idea that they will occur or not occur with 'certainty' by assigning them probabilities of zero or one. The usage of the term 'certain' here is deliberately loose, as the quotation marks suggest. To say an event cannot occur because it has probability zero is different from saying it cannot occur because the outcomes it contains are not elements of $\Omega$. Similarly, to say an event has probability 1 is different from saying it is the event $\Omega$. In technical discussion we therefore make the nice distinction between sure, which means the latter, and almost sure, which means the former. An event $E$ is said to occur almost surely (a.s.), or equivalently, with probability one (w.p.1) if $M = \Omega - E$ has probability measure zero. This terminology is synonymous with almost everywhere (a.e.) in the measure-theoretic context. When there is ambiguity about the p.m. being considered, the notation 'a.s.[P]' may be used.

7.2 Conditional Probability
A central issue of probability is the treatment of relationships. When random experiments generate a multi-dimensioned outcome (e.g. a poker deal generates several different hands) questions always arise about relationships between the different aspects of the experiment. The natural way to pose such questions is: 'if I observe only one facet of the outcome, does this change the probabilities I should assign to what is unobserved?' (Skilled poker players know the answer to this question, of course.)

The idea underlying conditional probability is that some but not all aspects of a random experiment have been observed. By eliminating some of the possible outcomes (those incompatible with our partial knowledge), we have to consider only a part of the sample space. In $(\Omega,\mathcal{F},P)$, suppose we have partial information about the outcome to the effect that 'the event $A$ has occurred', where $A\in\mathcal{F}$. How should this knowledge change the probabilities we attach to other events? Since the outcomes in $A^c$ are ruled out, the sample space is reduced from $\Omega$ to $A$. To generate probabilities on this restricted space, define the conditional probability of an event $B$ as $P(B|A) = P(A\cap B)/P(A)$, for $A,B\in\mathcal{F}$, $P(A) > 0$. $P(\cdot|A)$ satisfies the probability axioms as long as $P$ does and $P(A) > 0$. In particular, $P(A|A) = 1$, and $P(B^c|A) = 1 - P(B|A)$, since $B\cap A$ and $B^c\cap A$ are disjoint, and their union is $A$. The space $(A,\mathcal{F}_A,P_A)$, the trace of $(\Omega,\mathcal{F},P)$ on the set $A$, models the random experiment from the point of view of an observer who knows that $\omega\in A$. Events $A$ and $B$ are said to be dependent when $P(B|A)\neq P(B)$.

In certain respects the conditioning concept seems a little improper. A context in which the components of the random outcome are revealed sequentially to an observer might appear relevant only to a subjective interpretation of probability, and lead a sceptical reader to call the neutrality of the mathematical theory into question.
We might also protest that a random event is random, and has no business defining a probability space. In practice, the applications of conditional probability in limit theory are usually quite remote from any considerations of subjectivity, but there is a serious point here, which is the difficulty of constructing a rigorous theory once we depart from the restricted goal of predicting random outcomes a priori.

The way we can overcome improprieties of this kind, and obtain a much more powerful theory into the bargain, is to condition on a class of events, a $\sigma$-subfield of $\mathcal{F}$. Given an event $B\in\mathcal{F}$, let the set function $P(B|\mathcal{G}): \mathcal{G}\mapsto[0,1]$ represent the contingent probability to be assigned to $B$ after drawing an event $A$ from $\mathcal{G}$, where $\mathcal{G}\subset\mathcal{F}$. We can think of $\mathcal{G}$ as an information set in the sense that, for each $A\in\mathcal{G}$, an observer knows whether or not the outcome is in $A$. Since the elements of the domain are random events, we must think of $P(B|\mathcal{G})$ as itself a random outcome (a random variable, in the terminology of Chapter 8) derived from the restricted probability space $(\Omega,\mathcal{G},P)$. We may think of this space as a model of the action of an observer possessing information $\mathcal{G}$, who assigns the conditional probability $P(B|A)$ to $B$ when he observes the occurrence of $A$, viewed from the standpoint of another observer who has no prior information. $\mathcal{G}$ is a $\sigma$-field, because if we know an outcome is in $A$ we also know it is not in $A^c$, and if we know whether or not it is in $A_j$ for each $j = 1,2,3,\ldots$, we know whether or not it is in $\bigcup_j A_j$. The more sets there are in $\mathcal{G}$ the larger the volume of information, all the way from the trivial set $\mathcal{T} = \{\Omega,\emptyset\}$ (complete ignorance, with $P(B|\mathcal{T}) = P(B)$ a.s.) to the set $\mathcal{F}$ itself, which corresponds to almost sure knowledge of the outcome. In the latter case, $P(B|\mathcal{F}) = 1$ a.s. if $\omega\in B$, and 0 otherwise. If you know whether or not $\omega\in A$ for every $A\in\mathcal{F}$, you effectively know $\omega$.$^8$

7.3 Independence
A pair of events $A, B\in\mathcal{F}$ is said to be independent if $P(A\cap B) = P(A)P(B)$, or, equivalently, if

$P(B|A) = P(B).$ (7.4)

If, in a collection of events $\mathcal{C}$, (7.4) holds for every pair of distinct sets $A$ and $B$ from the collection, $\mathcal{C}$ is said to be pairwise independent. In addition, $\mathcal{C}$ is said to be totally independent if

$P\Big(\bigcap_{A\in\mathcal{J}}A\Big) = \prod_{A\in\mathcal{J}}P(A)$ (7.5)

for every subset $\mathcal{J}\subseteq\mathcal{C}$ containing two or more events. This is a stronger condition than pairwise independence. Suppose $\mathcal{C}$ consists of sets $A$, $B$, and $C$. Knowing that $B$ has occurred may not influence the probability we attach to $A$, and similarly for $C$; but the joint occurrence of $B$ and $C$ may none the less imply something about $A$. Pairwise independence implies that $P(A\cap B) = P(A)P(B)$, $P(A\cap C) = P(A)P(C)$, and $P(B\cap C) = P(B)P(C)$, but total independence would also require $P(A\cap B\cap C) = P(A)P(B)P(C)$.
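The gap between pairwise and total independence can be seen in a small example. The sketch below assumes two tosses of a fair coin with equally likely outcomes; the events `A`, `B`, `C` and the helper `cond` are hypothetical names chosen for illustration, with `cond` implementing $P(E|G) = P(E\cap G)/P(G)$.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))    # sample space for two tosses of a coin


def P(event):
    """Equally likely outcomes (assumed fair coin): P(E) = |E| / |omega|."""
    return Fraction(len(event), len(omega))


A = {w for w in omega if w[0] == "H"}    # first toss is a head
B = {w for w in omega if w[1] == "H"}    # second toss is a head
C = {w for w in omega if w[0] == w[1]}   # the two tosses agree


def cond(E, given):
    """Conditional probability P(E | given), defined when P(given) > 0."""
    return P(E & given) / P(given)


# (7.4) holds for each pair, but the product rule (7.5) fails for the triple.
pairwise = all(P(E & G) == P(E) * P(G) for E, G in [(A, B), (A, C), (B, C)])
total = pairwise and P(A & B & C) == P(A) * P(B) * P(C)

print(pairwise, total)
print(cond(A, B), cond(A, C), cond(A, B & C))
```

Here neither $B$ nor $C$ alone changes the probability attached to $A$, but their joint occurrence determines it, which is the situation described in the paragraph above.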
Here are two useful facts about independent events. In each theorem let $\mathcal{C}$ be a totally independent collection, satisfying $P(\bigcap_{A\in\mathcal{J}}A) = \prod_{A\in\mathcal{J}}P(A)$ for each subset $\mathcal{J}\subseteq\mathcal{C}$.

7.5 Theorem The collection which contains $A$ and $A^c$ for each $A\in\mathcal{C}$ is totally independent.

Proof It is sufficient to prove that the independence of $A$ and $B$ implies that of $A^c$ and $B$, for $B$ can denote any arbitrary intersection of sets from the collection and (7.5) will be satisfied, for either $A$ or $A^c$. This is certainly true, since if $P(A\cap B) = P(A)P(B)$, then

$P(A^c\cap B) = P(A^c\cap B) + P(A\cap B) - P(A)P(B) = P(B) - P(A)P(B) = P(A^c)P(B).$ (7.6) ∎
7.6 Theorem Let $\{B_j\}$ be a countable disjoint collection, and let the collections consisting of $B_j$ and the sets of $\mathcal{C}$ be totally independent for each $j$. Then, if $B = \bigcup_j B_j$, the collection consisting of $B$ and $\mathcal{C}$ is also independent.

Proof Let $\mathcal{J}$ be any subset of $\mathcal{C}$. Using the disjointness of the sets $B_j$, and countable additivity,

$P\Big(B\cap\bigcap_{A\in\mathcal{J}}A\Big) = P\Big(\bigcup_j\Big(B_j\cap\bigcap_{A\in\mathcal{J}}A\Big)\Big) = \sum_j P\Big(B_j\cap\bigcap_{A\in\mathcal{J}}A\Big) = \sum_j P(B_j)P\Big(\bigcap_{A\in\mathcal{J}}A\Big) = P(B)\prod_{A\in\mathcal{J}}P(A).$ (7.7) ∎
7.4 Product Spaces
Questions of dependence and independence arise when multiple random experiments run in parallel, and product spaces play a natural role in the analysis of these issues. Let $(\Omega\times\Xi,\ \mathcal{F}\otimes\mathcal{G},\ P)$ be a probability space where $\mathcal{F}\otimes\mathcal{G}$ is the $\sigma$-field generated by the measurable rectangles of $\Omega\times\Xi$, and $P(\Omega\times\Xi) = 1$. The random outcome is a pair $(\omega,\xi)$. This is no more than a case of the general theory of §7.1 (where the nature of $\omega$ is unspecified) except that it becomes possible to ask questions about the part of the outcome represented by $\omega$ or $\xi$ alone. $P_\Omega(F) = P(F\times\Xi)$ for $F\in\mathcal{F}$, and $P_\Xi(G) = P(\Omega\times G)$ for $G\in\mathcal{G}$, are called the marginal probabilities. $(\Omega,\mathcal{F},P_\Omega)$ and $(\Xi,\mathcal{G},P_\Xi)$ are probability spaces representing an incompletely observed random experiment, with $\omega$ or $\xi$, respectively, being the only things observed in an experiment generating $(\omega,\xi)$.

On the other hand, suppose we observe $\xi$ and subsequently consider the 'experiment' of observing $\omega$. Knowing $\xi$ means that for each $\Omega\times G$ we know whether or not $(\omega,\xi)$ is in it. The conditional probabilities generated by this two-stage experiment can be written by a slight abuse of notation as $P(F|\mathcal{G})$, although strictly speaking the relevant events are the cylinders $F\times\Xi$, and the elements of the conditioning $\sigma$-field are $\Omega\times G$ for $G\in\mathcal{G}$. So we ought to write something like $P(F\times\Xi\,|\,\Omega\times\mathcal{G})$.

In this context, product measure assumes a special role as the model of independence. In $(\Omega\times\Xi,\ \mathcal{F}\otimes\mathcal{G},\ P)$, the coordinate spaces are said to be independent when

$P(F\times G) = P_\Omega(F)P_\Xi(G)$ (7.8)

for each $F\in\mathcal{F}$ and $G\in\mathcal{G}$. Unity of the notation is preserved since $F\times G = (F\times\Xi)\cap(\Omega\times G)$. We can also write $P(F\times\Xi\,|\,\Omega\times G) = P_\Omega(F)$, or with a further slight abuse of notation $P(F|G) = P_\Omega(F)$, for any pair $F\in\mathcal{F}$ and $G\in\mathcal{G}$. Independence means that knowing $\xi$ does not affect the probabilities assigned to sets of $\mathcal{F}$. Since the measurable rectangles are a determining class for the space, the p.m. $P$ is entirely determined by the marginal measures.
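For finite coordinate spaces the relation (7.8) can be checked directly. The sketch below is an illustration under assumptions not in the text: the two marginal distributions `P_omega` and `P_xi` are hypothetical, the joint measure is built as their product, and events are represented as sets of pairs. The helper names (`rectangle`, `marginal_omega`, `marginal_xi`) are invented for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal distributions on the two coordinate spaces.
P_omega = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
P_xi = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}

# Product measure: each point (w, x) of the product space receives P_omega[w] * P_xi[x].
P_joint = {(w, x): P_omega[w] * P_xi[x] for w, x in product(P_omega, P_xi)}


def P(event):
    """Probability of a set of pairs (w, x) under the joint measure."""
    return sum((P_joint[pt] for pt in event), Fraction(0))


def rectangle(F, G):
    """The measurable rectangle F x G as a set of pairs."""
    return {(w, x) for w in F for x in G}


def marginal_omega(F):
    """P(F x Xi): probability of the cylinder over F."""
    return P(rectangle(F, P_xi.keys()))


def marginal_xi(G):
    """P(Omega x G): probability of the cylinder over G."""
    return P(rectangle(P_omega.keys(), G))


F, G = {"H"}, {1, 3}
print(P(rectangle(F, G)), marginal_omega(F) * marginal_xi(G))
```

Under a joint measure that is not of product form, the two printed values would in general differ for some rectangle.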
8 Random Variables
8.1 Measures on the Line
Let $(\Omega,\mathcal{F},P)$ be a probability space. A real random variable (r.v.) is an $\mathcal{F}/\mathcal{B}$-measurable function $X: \Omega\mapsto\mathbb{R}$.$^9$ That is to say, $X(\omega)$ induces an inverse mapping from $\mathcal{B}$ to $\mathcal{F}$ such that $X^{-1}(B)\in\mathcal{F}$ for every $B\in\mathcal{B}$, where $\mathcal{B}$ is the linear Borel field. The term '$\mathcal{F}$-measurable r.v.' may be used when the role of $\mathcal{B}$ is understood. The symbol $\mu$ will be generally used to denote a p.m. on the line, reserving $P$ for the p.m. on the underlying space. Random variables therefore live in the space $(\mathbb{R},\mathcal{B},\mu)$, where $\mu$ is the derived measure such that $\mu(B) = P(X^{-1}(B)) = P(X\in B)$. The term distribution is synonymous with measure in this context. The properties of r.v.s are special cases of the results in Chapter 3; in particular, the contents of §3.6 should be reviewed in conjunction with this chapter. If $g: \mathbb{R}\mapsto\mathbb{R}$ is a Borel function, then $g\circ X(\omega) = g(X(\omega))$ is also a r.v., having derived p.m. $\mu g^{-1}$ according to 3.21.

If there is a set $S\in\mathcal{B}$ having the property $\mu(S) = 1$, the trace of $(\mathbb{R},\mathcal{B},\mu)$ on $S$ is equivalent to the original space in the sense that the same measure is assigned to $B$ and to $B\cap S$, for each $B\in\mathcal{B}$. Which space to work with is basically a matter of technical convenience. If $X$ is a r.v., it may be more satisfactory to say that the Borel function $X^2$ is a r.v. distributed on $\mathbb{R}^+$, than that it is distributed on $\mathbb{R}$ but takes values in $\mathbb{R}^+$ almost surely. One could substitute for $(\mathbb{R},\mathcal{B},\mu)$ the extended space $(\bar{\mathbb{R}},\bar{\mathcal{B}},\mu)$ (see 1.22), but note that assigning a positive probability to infinity does not lead to meaningful results. Random variables must be finite with probability 1. Thus $(\mathbb{R},\mathcal{B},\mu)$, the trace of $(\bar{\mathbb{R}},\bar{\mathcal{B}},\mu)$ on $\mathbb{R}$, is equivalent to it for nearly all purposes.

However, while it is always finite a.s., a r.v. is not necessarily bounded a.s.; there may exist no constant $B$ such that $|X(\omega)|\leq B$ for all $\omega\in C$, with $P(\Omega - C) = 0$. The essential supremum of $X$ is

$\operatorname{ess\,sup} X = \inf\{x: P(|X| > x) = 0\},$ (8.1)

and this may be either a finite number, or $+\infty$.
8.2 Distribution Functions

The cumulative distribution function (c.d.f.) of $X$ is the function $F: \bar{\mathbb{R}}\mapsto[0,1]$, where

$F(x) = \mu((-\infty,x]) = P(X\leq x),\quad x\in\bar{\mathbb{R}}.$ (8.2)

We take the domain to be $\bar{\mathbb{R}}$ since it is natural to assign the values 0 and 1 to $F(-\infty)$ and $F(+\infty)$ respectively. No other values are possible so there is no contradiction in confining attention to just the points of $\mathbb{R}$. To specify a distribution for $X$ it is sufficient to assign a functional form for $F$; $\mu$ and $F$ are equivalent representations of the distribution, each useful for different purposes. To represent $\mu(A)$ in terms of $F$ for a set $A$ much more complicated than an interval would be cumbersome, but on the other hand, the graph of $F$ is an appealing way to display the characteristics of the distribution.

To see how probabilities are assigned to sets using $F$, start with the half-open interval $(x,y]$ for $x < y$. This is the intersection of the half-lines $(-\infty,y]$ and $(-\infty,x]^c = (x,+\infty)$. Let $A = (-\infty,x]$ and $B = (-\infty,y]$, so that $\mu(A) = F(x)$ and $\mu(B) = F(y)$; then

$\mu((x,y]) = \mu(A^c\cap B) = 1 - \mu(A\cup B^c) = 1 - (\mu(A) + 1 - \mu(B)) = \mu(B) - \mu(A) = F(y) - F(x),$ (8.3)

$A$ and $B^c$ being disjoint. The half-open intervals form a semi-ring (see 1.18), and
from the results of §3.2 the measure extends uniquely to the sets of $\mathcal{B}$.

As an example of the extension, we determine $\mu(\{x\}) = P(X = x)$ for $x\in\mathbb{R}$ (compare 3.15). Putting $x = y$ in (8.3) will not yield this result, since $A\cap A^c = \emptyset$, not $\{x\}$. We could obtain $\{x\}$ as the intersection of $(-\infty,x]$ and $[x,+\infty) = (-\infty,x)^c$, but then there is no obvious way to find the probability for the open interval $(-\infty,x) = (-\infty,x] - \{x\}$. The solution to the problem is to consider the monotone sequence of half-lines $(-\infty,x - 1/n]$ for $n\in\mathbb{N}$. Since $(x - 1/n,x] = (-\infty,x - 1/n]^c\cap(-\infty,x]$, we have $\mu((x - 1/n,x]) = F(x) - F(x - 1/n)$, according to (8.3). Since $\{x\} = \bigcap_{n=1}^{\infty}(x - 1/n,x]$, $\{x\}\in\mathcal{B}$ and $\mu(\{x\}) = F(x) - F(x-)$, where $F(x-)$ is the left limit of $F$ at $x$. $F(x)$ exceeds $F(x-)$ (i.e. $F$ jumps) at the atoms of the distribution, points $x$ with $\mu(\{x\}) > 0$. We can deduce by the same kind of reasoning that $\mu((x,y)) = F(y-) - F(x)$, $\mu([x,y)) = F(y-) - F(x-)$, and that, generally, measures of open intervals are the same as those of closed intervals unless the endpoints are atoms of the distribution.

Certain characteristics imposed on the c.d.f. by its definition in terms of a measure were implicit in the above conclusions. The next three theorems establish these properties.

8.1 Theorem $F$ is non-negative and non-decreasing, with $F(-\infty) = 0$ and $F(+\infty) = 1$, and is increasing at $x\in\mathbb{R}$ iff every open neighbourhood of $x$ has positive measure.

Proof These are all direct consequences of the definition. Non-negativity is from (8.2), and monotonicity from 7.2(iv). $F$ is increasing at $x$ if $F(x+\varepsilon) > F(x-\varepsilon)$ for each $\varepsilon > 0$. To show the asserted sufficiency, we have for each such $\varepsilon$,

$F(x+\varepsilon) - F(x-\varepsilon)\geq F((x+\varepsilon)-) - F(x-\varepsilon) = \mu(S(x,\varepsilon)).$ (8.4)

For the necessity, suppose $\mu(S(x,\varepsilon)) = 0$ and note that, by monotonicity of $F$,

$\mu(S(x,\varepsilon)) = F((x+\varepsilon)-) - F(x-\varepsilon)\geq F(x+\varepsilon/2) - F(x-\varepsilon/2).$ (8.5) ∎
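The interval formulas above can be illustrated with a c.d.f. that has one atom. The sketch below is only an illustration and rests on assumptions: the distribution is a hypothetical mixture placing mass 1/2 at the point 0 and spreading mass 1/2 uniformly over $(0,1]$, and the left limit $F(x-)$ is approximated numerically by evaluating $F$ slightly to the left of $x$. The function names are invented for the example.

```python
def F(x):
    """C.d.f. of a hypothetical mixed distribution: an atom of mass 1/2 at 0,
    plus mass 1/2 spread uniformly over (0, 1]."""
    if x < 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 0.5 + 0.5 * x


def left_limit(cdf, x, h=1e-9):
    """Numerical stand-in for F(x-): evaluate the c.d.f. just to the left of x."""
    return cdf(x - h)


def mu_half_open(x, y):
    """mu((x, y]) = F(y) - F(x), as in (8.3)."""
    return F(y) - F(x)


def mu_point(x):
    """mu({x}) = F(x) - F(x-), which is positive only at atoms."""
    return F(x) - left_limit(F, x)


print(mu_point(0.0))             # the atom at 0 carries mass 1/2
print(mu_point(0.5))             # negligible: 0.5 is not an atom
print(mu_half_open(0.25, 0.75))  # mass of (0.25, 0.75]
```

The measure of $(-1, 0]$ includes the atom while that of $(-1, 0)$ does not, matching the remark that open and closed intervals differ in measure only when an endpoint is an atom.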
The collection of points on which $F$ increases is known as the support of $\mu$. Its complement in $\mathbb{R}$, the largest set of zero measure, consists of points that must all lie in open neighbourhoods of zero measure, and hence must be open. The support of $\mu$ is accordingly a closed set.

8.2 Theorem $F$ is right-continuous everywhere.

Proof For $x\in\mathbb{R}$ and $n\geq 1$, additivity of the p.m. implies

$\mu((-\infty,x+1/n]) = \mu((-\infty,x]) + \mu((x,x+1/n]).$ (8.6)

As $n\to\infty$, $\mu((-\infty,x+1/n])\downarrow\mu((-\infty,x])$ by continuity of the measure, and hence $\lim_{n\to\infty}\mu((x,x+1/n]) = 0$. It follows that for $\varepsilon > 0$ there exists $N_\varepsilon$ such that $\mu((x,x+1/n]) < \varepsilon$, and, accordingly,

$\mu((-\infty,x])\leq\mu((-\infty,x+1/n]) < \mu((-\infty,x]) + \varepsilon,$ (8.7)

for $n\geq N_\varepsilon$. Hence $F(x+) = F(x)$, proving the theorem since $x$ was arbitrary. ∎

If $F(x)$ had been defined as $\mu((-\infty,x))$, similar arguments would show that it was left continuous in that case.
[Fig. 8.1: decomposition of $F$ into the step function $F'$ and the continuous function $F''$.]

8.3 Theorem $F$ has the decomposition
$$F(x) = F'(x) + F''(x) \tag{8.8}$$
where $F'(x)$ is a right-continuous step function with at most a countable number of jumps, and $F''(x)$ is everywhere continuous.

Proof By 7.4, the jump points of $F$ are at most countable. Letting $\{x_1, x_2, \ldots\}$ denote these points,
$$F'(x) = \sum_{x_i \le x} (F(x_i) - F(x_i-)) \tag{8.9}$$
is a step function with jumps at the points $x_i$, and $F''(x) = F(x) - F'(x)$ has $F''(x_i-) = F''(x_i)$ at each $x_i$ and is continuous everywhere. ■

Fig. 8.1 illustrates the decomposition. This is not the only decomposition of $F$. The Lebesgue decomposition of $\mu$ with respect to Lebesgue measure $m$ on $\mathbb{R}$ (see 4.28) is $\mu = \mu_1 + \mu_2$, where $\mu_1$ is singular with respect to $m$ (is positive only on a set of Lebesgue measure 0) and $\mu_2$ is absolutely continuous with respect to Lebesgue measure. Recall that $\mu_2(A) = \int_A f(x)\,dx$ for $A \in \mathcal{B}$, where $f$ is the associated Radon-Nikodym derivative (density function). If we decompose $F$ in the same way, such that $F_i(x) = \mu_i((-\infty, x])$ for $i = 1$ and 2, we may write $F_2(x) = \int_{-\infty}^{x} f(s)\,ds$, implying that $f(x) = dF_2/ds\,|_{s=x}$. This must hold for almost all $x$ (Lebesgue measure), and we call $F_2$ an absolutely continuous function; in particular, it is differentiable almost everywhere on its domain. $F' \le F_1$, since $F_1$ may increase on a set of Lebesgue measure 0, and such sets can be uncountable, and hence larger than the set of atoms. It is customary to summarize these relations by decomposing $F''$ into two additive components, the absolutely continuous part $F_2$, and a component $F_3 = F'' - F_2$ which is continuous and also singular, constant except on a set of zero Lebesgue measure. This component can in most cases be neglected.

The collection of half-lines with rational endpoints generates $\mathcal{B}$ (1.21) and should be a determining class for measures on $(\mathbb{R}, \mathcal{B})$. The following theorem establishes the fact that a c.d.f. defined on a dense subset of $\mathbb{R}$ is a unique representation of $\mu$.

8.4 Theorem Let $\mu$ be a finite measure on $(\mathbb{R}, \mathcal{B})$ and $D$ a dense subset of $\mathbb{R}$. The function $G$ defined by
$$G(x) = \begin{cases} F(x) = \mu((-\infty, x]), & x \in D \\ F(x+), & x \in \mathbb{R} - D \end{cases} \tag{8.10}$$
is identical with $F$.

Proof By definition, $\mathbb{R} \subseteq \bar{D}$ and the points of $\mathbb{R} - D$ are all closure points of $D$. For each $x \in \mathbb{R}$, not excluding points in $\mathbb{R} - D$, there is a sequence of points in $D$ converging to $x$ (e.g. choose a point from $S(x, 1/n) \cap D$ for $n \in \mathbb{N}$). Since $F$ is right-continuous everywhere on $\mathbb{R}$, $\mu((-\infty, x]) = F(x+)$ for each $x \in \mathbb{R} - D$. ■
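To see why (8.10) uses the right limit, the following sketch takes $D$ to be the dyadic rationals and evaluates $F$ only at points of $D$ on either side of a point outside $D$. The distribution is an assumption made for illustration (mass 1/2 at the hypothetical atom $1/3$, which is not dyadic, and mass 1/2 uniform on $[0, 1]$), as are the helper names. Exact rational arithmetic is used so that membership of $D$ is meaningful.

```python
from fractions import Fraction
from math import floor

ATOM = Fraction(1, 3)  # hypothetical atom, deliberately not a dyadic rational


def F(x):
    """c.d.f. of the assumed mixture: mass 1/2 at ATOM, mass 1/2 uniform on [0, 1]."""
    x = Fraction(x)
    uniform_part = min(max(x, Fraction(0)), Fraction(1))
    atom_part = Fraction(1) if x >= ATOM else Fraction(0)
    return (uniform_part + atom_part) / 2


def dyadic_above(x, n):
    """Smallest k / 2**n strictly greater than x."""
    return Fraction(floor(x * 2**n) + 1, 2**n)


def dyadic_below(x, n):
    """Largest k / 2**n not exceeding x (strictly below x when x is not dyadic)."""
    return Fraction(floor(x * 2**n), 2**n)


for n in (1, 5, 10, 20, 40):
    from_right = F(dyadic_above(ATOM, n))  # values of F on D, approaching ATOM from above
    from_left = F(dyadic_below(ATOM, n))   # values of F on D, approaching ATOM from below
    print(n, float(from_right), float(from_left))

print("F(ATOM) =", F(ATOM))
```

The values taken through $D$ from the right converge to $F(1/3) = 2/3$, so $G$ recovers $F$ at the omitted point; the values taken from the left converge to $F(1/3-) = 1/6$ and would miss the jump.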
Finally, we show that every $F$ corresponds to some $\mu$, as well as every $\mu$ to an $F$.

8.5 Theorem Let $F: \overline{\mathbb{R}} \mapsto [0,1]$ be a non-negative, non-decreasing, right-continuous function, with $F(-\infty) = 0$ and $F(+\infty) = 1$. There exists a unique p.m. $\mu$ on $(\mathbb{R}, \mathcal{B})$ such that $F(x) = \mu((-\infty, x])$ for all $x \in \mathbb{R}$. □

Right continuity, as noted above, corresponds to the convention of defining $F$ by (8.2). If instead we defined $F(x) = \mu((-\infty, x))$, a left-continuous non-decreasing $F$ would represent a p.m.

Proof Consider the function $\phi: [0,1] \mapsto \overline{\mathbb{R}}$, defined by
$$\phi(u) = \inf\{x: u \le F(x)\}. \tag{8.11}$$
$\phi$ can be thought of as the inverse of $F$; $\phi(0) = -\infty$, $\phi(1) = +\infty$, and since $F$ is non-decreasing and right-continuous, $\phi$ is non-decreasing and left-continuous; $\phi$ is therefore Borel-measurable by 3.32(ii). According to 3.21, we may define a measure on $(\mathbb{R}, \mathcal{B})$ by $m\phi^{-1}(B)$ for each $B \in \mathcal{B}$, where $m$ is Lebesgue measure on the Borel sets of $[0,1]$. In particular, consider the class $\mathcal{C}$ of the half-open intervals $(a,b]$ for all $a, b \in \mathbb{R}$ with $a < b$. This is a semi-ring by 1.18, and $\sigma(\mathcal{C}) = \mathcal{B}$ by 1.21. Note that
$$\phi^{-1}((a,b]) = \{u: \inf\{x: u \le F(x)\} \in (a,b]\} = (F(a), F(b)]. \tag{8.12}$$
For each of these sets define the measure
$$\mu((a,b]) = m(\phi^{-1}((a,b])) = F(b) - F(a). \tag{8.13}$$
The fact that this is a measure follows from the argument of the preceding paragraph. $\mathcal{C}$ is a determining class for $(\mathbb{R}, \mathcal{B})$, and the measure has an extension by 3.8. It is a p.m. since $\mu(\mathbb{R}) = 1$, and is unique by 3.13. ■

The neat construction used in this proof has other applications in the theory of random variables, and will reappear in more elaborate form in §22.2. The graph of $\phi$ is found by rotating and reflecting the graph of $F$, sketched in Fig. 8.2; to see the former with the usual coordinates, turn the page on its side and view in a mirror.

[Fig. 8.2: graph of $F$, with $F(x-)$, $F(x)$ and 1 marked on the vertical axis and $a = \phi(c)$, $b = \phi(c+)$ and $x$ marked on the horizontal axis.]

If $F$ has a discontinuity at $x$, then $\phi = x$ on the interval $(F(x-), F(x)]$, and $\phi^{-1}(\{x\}) = (F(x-), F(x)]$. Thus, $\mu(\{x\}) = m((F(x-), F(x)]) = F(x) - F(x-)$, as required. On the other hand, if an interval $(a,b]$ has measure 0 under $F$, $F$ is constant on this interval and $\phi$ has a discontinuity at $F(a) = F(b) = c$ (say). $\phi$ takes the value $a$ at this point, by left continuity. Note that $\phi^{-1}(c) = (a,b]$, so that $\mu((a,b]) = m(c) = 0$, as required.

8.3 Examples
Most of the distributions met with in practice are either discrete or continuous. A discrete distribution assigns zero probability to all but a countable set of points, with $F'' = 0$ in the decomposition of 8.3.

8.6 Example The Bernoulli (or binary) r.v. takes values 1 and 0 with fixed probabilities $p$ and $1 - p$. Think of it as a mapping from any probability space containing two elements, such as 'Success' and 'Failure', 'Yes' and 'No', etc. □

8.7 Example The binomial distribution with parameters $n$ and $p$ (denoted $B(n,p)$) is the distribution of the number of 1s obtained in $n$ independent drawings from the Bernoulli distribution, having the probability function
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n-x}, \quad x = 0, \ldots, n. \tag{8.14}$$
□
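As a sketch connecting (8.14) with the construction in the proof of 8.5, the code below builds the $B(n,p)$ c.d.f. from the probability function and evaluates the generalized inverse $\phi$ of (8.11) on it. The parameter values $n = 4$, $p = 1/3$ and the function names are assumptions made for illustration; `phi` is restricted to $0 < u \le 1$, where the infimum is attained on the support.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Probability function (8.14), as a list indexed by x = 0, ..., n."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def cdf_values(pmf):
    """Cumulative sums F(0), F(1), ..., F(n)."""
    out, total = [], Fraction(0)
    for q in pmf:
        total += q
        out.append(total)
    return out


def phi(u, cdf):
    """Generalized inverse (8.11) on the support: the smallest x with u <= F(x).
    Assumes 0 < u <= 1, so that the infimum is attained on {0, ..., n}."""
    for x, Fx in enumerate(cdf):
        if u <= Fx:
            return x
    raise ValueError("u must lie in (0, 1]")


n, p = 4, Fraction(1, 3)  # illustrative parameter values
pmf = binomial_pmf(n, p)
cdf = cdf_values(pmf)

for x in range(n + 1):
    lower = cdf[x] - pmf[x]  # F(x-)
    midpoint = lower + pmf[x] / 2
    # phi is constant at x on the interval (F(x-), F(x)].
    print(x, pmf[x], cdf[x], phi(midpoint, cdf), phi(cdf[x], cdf))
```

Each atom $x$ is the image under `phi` of the whole interval $(F(x-), F(x)]$, whose Lebesgue measure is the jump $P(X = x)$. Feeding uniform draws on $(0, 1]$ into `phi` would accordingly produce draws from $B(n,p)$.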
8.8 Example The limiting case of (8.14) with $p = \lambda/n$, as $n \to \infty$, is the Poisson distribution, having probability function
$$P(X = x) = \frac{1}{x!} e^{-\lambda} \lambda^x, \quad x = 0, 1, 2, \ldots. \tag{8.15}$$
This is a discrete distribution with a countably infinite set of outcomes. □
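The limit in 8.8 can be observed numerically by comparing (8.14), with $p = \lambda/n$, against (8.15) as $n$ grows. This is an informal check rather than a proof; the value $\lambda = 2$, the cut-off `x_max` and the helper names are assumptions made for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Probability function (8.14); zero outside 0, ..., n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Probability function (8.15)."""
    return lam**x * exp(-lam) / factorial(x)


def max_gap(n, lam, x_max=20):
    """Largest pointwise gap between B(n, lam/n) and Poisson(lam) over x = 0, ..., x_max."""
    return max(
        abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam))
        for x in range(x_max + 1)
    )


lam = 2.0  # illustrative value of lambda
for n in (10, 100, 1000, 10000):
    print(n, max_gap(n, lam))
```

The printed gaps shrink as $n$ increases, consistent with the Poisson probabilities being the limit of the binomial ones.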
In a continuous distribution, $F$ is absolutely continuous with $F_1 = 0$ in the Lebesgue decomposition of the c.d.f. The derivative $f = dF/dx$ exists a.e. $[m]$ on $\mathbb{R}$, and is called the probability density function (p.d.f.) of the p.m. According to the Radon-Nikodym theorem, the p.d.f. has the property that for each $E \in \mathcal{B}$,
$$\mu(E) = \int_E f(x)\,dx. \tag{8.16}$$

8.9 Example For the uniform distribution on $[0, 1]$ (see 7.3),
$$F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$$
0. The family of p.d.f.s with location parameter $\nu$ and scale parameter $\delta$ takes the form
$$f(x; \nu, \delta) = \frac{1}{\pi\delta}\,\frac{1}{1 + [(x - \nu)/\delta]^2}, \quad -\infty < x$$