Course 221: Michaelmas Term 2006
Section 1: Sets, Functions and Countability
David R. Wilkins
Copyright © David R. Wilkins 2006
Contents
1 Sets, Functions and Countability
1.1 Sets
1.2 Cartesian Products of Sets
1.3 Relations
1.4 Equivalence Relations
1.5 Functions
1.6 The Graph of a Function
1.7 Functions and the Empty Set
1.8 Injective, Surjective and Bijective Functions
1.9 Inverse Functions
1.10 Preimages
1.11 Finite and Infinite Sets
1.12 Countability
1.13 Cartesian Products and Unions of Countable Sets
1.14 Uncountable Sets
1.15 Power Sets
1.16 The Cantor Set
1
Sets, Functions and Countability
1.1
Sets
A set is a collection of objects; these objects are known as elements of the set. If an element x belongs to a set X then we denote this fact by writing x ∈ X. Sets with small numbers of elements can be specified by listing the elements of the set enclosed within braces. For example {a, b, c, d} is the set consisting of the elements a, b, c and d. Two sets are equal if and only if they have the same elements. The empty set ∅ is the set with no elements.
Standard notations N, Z, Q, R and C are adopted for the following sets:
• the set N of positive integers;
• the set Z of integers;
• the set Q of rational numbers;
• the set R of real numbers;
• the set C of complex numbers.
A set A is said to be a subset of a set B if every element of A is also an element of B. If A is a subset of B but is not equal to B, then we say that A is a proper subset of B. If A is a subset of a set B then we denote this fact by writing A ⊂ B. Note that A = B if and only if A ⊂ B and B ⊂ A.
Given a set X and a condition that may or may not be satisfied by elements of X, the subset of X consisting of all elements of X that satisfy the stated condition is represented using the notation {x ∈ X : condition}. Thus for example {n ∈ Z : n > 0} is the subset of the set Z of integers which consists of all strictly positive integers. (In certain contexts it is possible to simplify the above notation to {x : condition} if it is clear from the context what the set is to which the elements x in question belong.)
Let a and b be real numbers satisfying a ≤ b. Then intervals in the set of real numbers are denoted using the following standard notation:
• [a, b] denotes the set {x ∈ R : a ≤ x ≤ b};
• (a, b) denotes the set {x ∈ R : a < x < b};
• [a, b) denotes the set {x ∈ R : a ≤ x < b};
• (a, b] denotes the set {x ∈ R : a < x ≤ b};
• [a, +∞) denotes the set {x ∈ R : x ≥ a};
• (a, +∞) denotes the set {x ∈ R : x > a};
• (−∞, a] denotes the set {x ∈ R : x ≤ a};
• (−∞, a) denotes the set {x ∈ R : x < a}.
The union, intersection and difference of two sets are defined as follows:
• the union X ∪ Y of two sets X and Y is the set consisting of all elements that belong to X or to Y (or to both);
• the intersection X ∩ Y of two sets X and Y is the set consisting of all elements that belong to both X and Y;
• the difference X \ Y of two sets X and Y is the set consisting of all elements that belong to X but not to Y.
The sets X and Y are said to be disjoint if no element belongs to both X and Y (i.e., X ∩ Y = ∅). Note that X ∪ Y is the union of the three sets X ∩ Y, X \ Y and Y \ X. Moreover these three sets are pairwise disjoint (i.e., each pair is disjoint).
We can also consider unions and intersections of more than two sets. The union of a given collection of sets is the set consisting of all elements that belong to at least one of the given sets. The intersection of a given collection of sets is the set consisting of all elements that belong to every one of the given sets.
Let $X_1, X_2, X_3, \ldots, X_n$ be sets. We denote the union and intersection of these sets by $X_1 \cup X_2 \cup X_3 \cup \cdots \cup X_n$ and $X_1 \cap X_2 \cap X_3 \cap \cdots \cap X_n$ respectively. The union and intersection of an infinite sequence $X_1, X_2, X_3, \ldots$ of sets are denoted by $\bigcup_{i=1}^{\infty} X_i$ and by $\bigcap_{i=1}^{\infty} X_i$ respectively. More generally, given any collection C of sets, the union and intersection of the sets in the collection are denoted by $\bigcup_{X \in C} X$ and $\bigcap_{X \in C} X$ respectively.
It is handy to introduce the notion of a collection $(X_i : i \in I)$ of sets indexed by some set I. This associates to each element i of the indexing set I a corresponding set $X_i$. We can form the union or intersection of the sets in such an indexed collection. The union $\bigcup_{i \in I} X_i$ consists of everything that belongs to at least one of the sets $X_i$ in the indexed collection; the intersection $\bigcap_{i \in I} X_i$ consists of everything that belongs to every single set in the indexed collection. Thus for example if I = {1, 2, . . . , n} then a collection
of sets indexed by I is just a finite collection of sets $X_1, X_2, \ldots, X_n$; and in this case
\[ \bigcup_{i \in I} X_i = X_1 \cup X_2 \cup \cdots \cup X_n, \qquad \bigcap_{i \in I} X_i = X_1 \cap X_2 \cap \cdots \cap X_n. \]
Similarly if I = N then a collection $(X_i : i \in I)$ of sets indexed by I is just an infinite sequence $X_1, X_2, X_3, \ldots$ of sets, and in this case $\bigcup_{i \in I} X_i = \bigcup_{i=1}^{\infty} X_i$ and $\bigcap_{i \in I} X_i = \bigcap_{i=1}^{\infty} X_i$.
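The operations defined above can be tried out on small finite sets. The following is a minimal sketch in Python, using the built-in set type; the particular sets X and Y and the dictionary `indexed` (modelling an indexed collection) are chosen purely for illustration.

```python
X = {1, 2, 3, 4}
Y = {3, 4, 5}

union = X | Y          # X ∪ Y
intersection = X & Y   # X ∩ Y
difference = X - Y     # X \ Y

# X ∪ Y is the union of the three pairwise disjoint sets X ∩ Y, X \ Y and Y \ X.
pieces = [X & Y, X - Y, Y - X]
assert set().union(*pieces) == X | Y
assert all(a.isdisjoint(b) for i, a in enumerate(pieces) for b in pieces[i + 1:])

# An indexed collection (X_i : i ∈ I) with I = {1, 2, 3}, modelled as a dict.
indexed = {1: {1, 2}, 2: {2, 3}, 3: {2, 4}}
big_union = set().union(*indexed.values())
big_intersection = set.intersection(*indexed.values())
print(big_union, big_intersection)
```

Note that `set.intersection(*...)` needs at least one set, so this sketch only handles non-empty indexing sets.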
Let X be a set, and let A be a subset of X. The complement of A (in X) is the set X \ A of all elements of X that do not belong to A. For each subset A of a given set X, let $A^c$ denote the complement of A in X. Then $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$ for all subsets A and B of X. These identities generalize to situations where the number of subsets of X involved is greater than two. This basic result is stated formally in the following lemma.
Lemma 1.1 Let X be a set, and let C be a collection of sets. Then
\[ \bigcup_{Y \in C} (X \setminus Y) = X \setminus \bigcap_{Y \in C} Y \quad\text{and}\quad \bigcap_{Y \in C} (X \setminus Y) = X \setminus \bigcup_{Y \in C} Y. \]
Proof Let x be an element of X. Then
\[
\begin{aligned}
x \in \bigcup_{Y \in C} (X \setminus Y)
&\iff \text{there exists } Y \in C \text{ such that } x \in X \setminus Y \\
&\iff \text{there exists } Y \in C \text{ such that } x \notin Y \\
&\iff x \notin \bigcap_{Y \in C} Y \\
&\iff x \in X \setminus \bigcap_{Y \in C} Y.
\end{aligned}
\]
It follows from this that the subsets $\bigcup_{Y \in C} (X \setminus Y)$ and $X \setminus \bigcap_{Y \in C} Y$ of X have the same elements, and are therefore the same set. Similarly
\[
\begin{aligned}
x \in \bigcap_{Y \in C} (X \setminus Y)
&\iff \text{for all } Y \in C,\ x \in X \setminus Y \\
&\iff \text{for all } Y \in C,\ x \notin Y \\
&\iff x \notin \bigcup_{Y \in C} Y \\
&\iff x \in X \setminus \bigcup_{Y \in C} Y,
\end{aligned}
\]
and therefore $\bigcap_{Y \in C} (X \setminus Y) = X \setminus \bigcup_{Y \in C} Y$, as required.
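The two identities of Lemma 1.1 can be checked on a concrete example. The sketch below does this in Python for one hypothetical set X and one hypothetical non-empty collection C; the helper names `union_all` and `intersection_all` are ours. A check of this kind illustrates the lemma but is of course no substitute for the proof.

```python
X = set(range(10))
C = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6, 8}]  # a non-empty collection of sets

def union_all(sets):
    """Union of a collection of sets."""
    return set().union(*sets)

def intersection_all(sets):
    """Intersection of a non-empty collection of sets."""
    return set.intersection(*sets)

# First identity: the union of the complements is the complement of the intersection.
lhs1 = union_all([X - Y for Y in C])
rhs1 = X - intersection_all(C)

# Second identity: the intersection of the complements is the complement of the union.
lhs2 = intersection_all([X - Y for Y in C])
rhs2 = X - union_all(C)

print(lhs1 == rhs1, lhs2 == rhs2)
```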
Lemma 1.1 thus ensures that the complement of the intersection of any collection of subsets of a given set is the union of the complements of those subsets; and the complement of the union of any collection of subsets of a given set is the intersection of the complements of those subsets. In particular, if X is a set, and if $(Y_i : i \in I)$ is any indexed collection of sets, then
\[ \bigcup_{i \in I} (X \setminus Y_i) = X \setminus \bigcap_{i \in I} Y_i \quad\text{and}\quad \bigcap_{i \in I} (X \setminus Y_i) = X \setminus \bigcup_{i \in I} Y_i. \]
1.2
Cartesian Products of Sets
Let X and Y be sets. An element x of X and an element y of Y together specify an ordered pair (x, y). Ordered pairs (x, y) are characterized by the following property: (x, y) = (u, v) if and only if x = u and y = v. The set of all ordered pairs (x, y) with x ∈ X and y ∈ Y is referred to as the Cartesian product of the sets X and Y, and is denoted by X × Y.
Example The Cartesian product R × R consists of all ordered pairs (x, y) where x and y are real numbers. This set is denoted by $R^2$.
Example Let X = {1, 2, 3} and Y = {2, 4}. Then X × Y = {(1, 2), (1, 4), (2, 2), (2, 4), (3, 2), (3, 4)}.
The Cartesian product $X_1 \times X_2 \times X_3 \times \cdots \times X_n$ of n sets $X_1, X_2, X_3, \ldots, X_n$ is the set consisting of all ordered n-tuples $(x_1, x_2, \ldots, x_n)$, where $x_i \in X_i$ for i = 1, 2, 3, . . . , n.
Example Points of 3-dimensional space are represented with respect to a Cartesian co-ordinate system as ordered triples (x, y, z), where x, y and z are real numbers. The set of all such ordered triples is the Cartesian product R × R × R (denoted by $R^3$).
Note that if $X_i$ is a finite set with $m_i$ elements for i = 1, 2, . . . , n, then the Cartesian product $X_1 \times X_2 \times X_3 \times \cdots \times X_n$ has $m_1 m_2 m_3 \cdots m_n$ elements.
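For finite sets, Cartesian products can be enumerated directly. The following sketch uses `itertools.product` from the Python standard library to reproduce the example X × Y above and to check the counting formula on three illustrative sets (the list `factors` is our own choice).

```python
from itertools import product
from math import prod

X = {1, 2, 3}
Y = {2, 4}
XY = set(product(X, Y))  # all ordered pairs (x, y) with x ∈ X and y ∈ Y
print(sorted(XY))

# A product of n sets has m_1 m_2 ... m_n elements.
factors = [{0, 1}, {"a", "b", "c"}, {10, 20}]
tuples = set(product(*factors))
expected_size = prod(len(s) for s in factors)
print(len(tuples), expected_size)
```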
1.3
Relations
Let X be a set. A binary relation on X determines, for elements u and v of X, whether or not u is related to v. For example, there is a binary relation on the set of real numbers, where two real numbers x and y are related if and only if x is less than y. It is traditional to denote binary relations by inserting the symbol for the relation between any two elements that are related. Thus if ∼ is a relation on a set X then u ∼ v is true for elements u and v of X if and only if u and v are related. Familiar examples of this notation are provided by the relations = (‘equals’), < (‘less than’) and ≤ (‘less than or equal to’) on sets of numbers. Any binary relation ∼ on a set X determines a corresponding subset {(u, v) ∈ X × X : u ∼ v} of the Cartesian product X × X. Conversely any subset R of X × X determines a corresponding relation ∼ on X, where elements u and v of X satisfy u ∼ v if and only if (u, v) ∈ R. There is thus a one-to-one correspondence between binary relations on a set X and subsets of X × X.
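The correspondence between relations on X and subsets of X × X is easy to make concrete when X is finite. In the sketch below the relation < on a small illustrative set X is stored as a set of ordered pairs, and the helper `related` (a name of our own) recovers the relation from that subset.

```python
X = {1, 2, 3, 4}

# The relation "<" on X, stored as the subset {(u, v) ∈ X × X : u < v}.
R = {(u, v) for u in X for v in X if u < v}

def related(u, v, subset=R):
    """Recover the relation from the subset: u ~ v if and only if (u, v) ∈ subset."""
    return (u, v) in subset

print(sorted(R))
print(related(1, 2), related(2, 1))
```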
1.4
Equivalence Relations
Let ∼ be a binary relation on a set S.
• The relation ∼ is reflexive on S if the following is true: x ∼ x for all elements x of S.
• The relation ∼ is symmetric on S if the following is true: if x and y are elements of S and if x ∼ y then y ∼ x.
• The relation ∼ is transitive on S if the following is true: if x, y and z are elements of S and if x ∼ y and y ∼ z then x ∼ z.
Example The relation = (i.e., ‘is equal to’) is reflexive, symmetric and transitive on any set.
Example The relation < (i.e., ‘is less than’) is transitive on the set of real numbers but is neither reflexive nor symmetric.
Example The relation ≤ (i.e., ‘is less than or equal to’) is reflexive and transitive on the set of real numbers but is not symmetric.
Example The relation ≠ (i.e., ‘is not equal to’) is symmetric on the set of real numbers but is neither reflexive nor transitive.
Example The relation ‘has the same number of elements as’ is reflexive, symmetric and transitive on any collection of finite sets.
Definition An equivalence relation on a given set is a binary relation on that set which is reflexive, symmetric and transitive.
The relation of equality is an equivalence relation on any set. The relation < (i.e., ‘is less than’) is not an equivalence relation on the set of real numbers because it is neither reflexive nor symmetric.
Definition Let ∼ be an equivalence relation on a set X. The equivalence class of x in X (with respect to the equivalence relation ∼) is the set $C_x$ consisting of all elements of X that are related to x. Thus $C_x = \{y \in X : x \sim y\}$.
Lemma 1.2 Let ∼ be an equivalence relation on a set X, and, for each x ∈ X, let $C_x$ denote the equivalence class of x, defined by $C_x = \{y \in X : x \sim y\}$. Then the following are true:
(i) $x \in C_x$ for all x ∈ X;
(ii) $y \in C_x$ if and only if $C_x = C_y$;
(iii) if x and y are elements of X and if $C_x \cap C_y$ is non-empty, then $C_x = C_y$;
(iv) an element x of X belongs to exactly one equivalence class.
Proof The fact that $x \in C_x$ for all x ∈ X follows immediately from the fact that any equivalence relation is required to be reflexive. This proves (i).
Suppose that $y \in C_x$. Then x ∼ y. Also y ∼ x, since any equivalence relation is symmetric. If $z \in C_y$ then x ∼ y and y ∼ z, and hence x ∼ z, since any equivalence relation is transitive. It follows that if $z \in C_y$ then $z \in C_x$, and thus $C_y \subset C_x$. Similarly, since y ∼ x, $C_x \subset C_y$. Thus if $y \in C_x$ then $C_x = C_y$. Conversely if $C_x = C_y$ then $y \in C_x$, since $y \in C_y$. This proves (ii).
Next note that if x and y are elements of X and if $C_x \cap C_y$ is non-empty, then there exists some element z of X such that $z \in C_x$ and $z \in C_y$. It follows from (ii) that $C_x = C_z$ and $C_y = C_z$, and therefore $C_x = C_y$. This proves (iii).
Finally (iv) is a consequence of (i) and (iii).
Definition Let X be a set. A partition of X is a collection of subsets of X with the property that every element of X belongs to exactly one of these subsets. Let an equivalence relation be given on a set X. Then the collection of equivalence classes constitutes a partition of X. Conversely any partition of a set X determines an equivalence relation, where two elements of X are related if and only if they belong to the same subset in the partition.
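To illustrate how equivalence classes partition a set, the following sketch takes X = {0, 1, . . . , 8} with the relation 'x − y is divisible by 3'. This example is our own choice and does not appear in the notes above. The code checks the three defining properties by brute force, computes the equivalence classes, and verifies that each element lies in exactly one class.

```python
X = set(range(9))

def related(x, y):
    """x ~ y if and only if x - y is divisible by 3."""
    return (x - y) % 3 == 0

def is_equivalence(X, rel):
    reflexive = all(rel(x, x) for x in X)
    symmetric = all(rel(y, x) for x in X for y in X if rel(x, y))
    transitive = all(
        rel(x, z) for x in X for y in X for z in X if rel(x, y) and rel(y, z)
    )
    return reflexive and symmetric and transitive

def equivalence_classes(X, rel):
    """The set of classes C_x = {y ∈ X : x ~ y}, one frozenset per distinct class."""
    return {frozenset(y for y in X if rel(x, y)) for x in X}

classes = equivalence_classes(X, related)
is_partition = all(sum(x in c for c in classes) == 1 for x in X)
print(sorted(sorted(c) for c in classes), is_partition)
```

The same `is_equivalence` test rejects the relation < on X, which is neither reflexive nor symmetric.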
1.5
Functions
Let X and Y be sets. A function f: X → Y from X to Y assigns to each element x of the set X exactly one element f(x) of the set Y. The set X is the domain of the function, and the set Y is the co-domain of the function. The notation f: X → Y is used to specify a function f whose domain is the set X and whose co-domain is the set Y. A function is not fully specified unless its domain and co-domain are specified.
Example Let us consider ‘the function that sends x to $1/x^2$’. Note that $1/x^2$ is not defined when x = 0. Therefore we cannot view this ‘function’ as a function on the set of real numbers. We can however take as the domain of the function the set R \ {0} of all non-zero real numbers. We thus obtain a function f: R \ {0} → R from the set R \ {0} of non-zero real numbers to the set R of real numbers, where $f(x) = 1/x^2$ for all non-zero real numbers x. There is also a function g: C \ {0} → C from the set C \ {0} of non-zero complex numbers to the set C of complex numbers, where $g(z) = 1/z^2$ for all non-zero complex numbers z. The functions f and g have different domains, and are therefore considered to be different functions.
Note that there is no element x of the domain R \ {0} of f: R \ {0} → R for which f(x) = 0. Also f(x) = f(−x) for all non-zero real numbers x. Thus, given an element y of the co-domain R of the function f, there need not be exactly one element x of the domain satisfying f(x) = y. There may not be any such elements x, as is the case when y ≤ 0, or there may be more than one such element x, as is the case when y > 0.
Let X be a set. There is a function i: X → X from X to itself, where i(x) = x for all x ∈ X. This function is referred to as the identity map of X.
Let f: X → Y be a function from a set X to a set Y. The range f(X) of the function is defined to be the set {f(x) : x ∈ X} of all elements of the co-domain Y that are of the form f(x) for some element x of the domain.
The image f(A) of a subset A of X is defined to be the set {f(x) : x ∈ A} of all elements of the co-domain Y that are of the form f(x) for some element x of A. Note that the range of a function f: X → Y is the image f(X) of the domain X of the function. Also $f^{-1}(Y) = X$, where $f^{-1}(Y)$ denotes the preimage of Y under f (preimages are defined in Section 1.10).
Example Let f: R → R be the function defined by $f(x) = x^2$ for all x ∈ R. The range of f is the set [0, +∞) of non-negative real numbers. The image f([1, 2]) of the interval [1, 2] is the interval [1, 4].
Let X, Y and Z be sets, and let f: X → Y and g: Y → Z be functions, where the domain Y of g: Y → Z is the co-domain of f: X → Y. The composition function g ◦ f: X → Z is defined by (g ◦ f)(x) = g(f(x)) for all x ∈ X. Note that g ◦ f denotes the function ‘f followed by g’.
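Images and compositions can be computed directly when the domain is finite. The sketch below is only an illustration: it replaces the real line by a small set of integers, and the helper names `image` and `compose` are ours.

```python
def image(f, A):
    """The image f(A) = {f(x) : x ∈ A} of a finite set A."""
    return {f(x) for x in A}

def compose(g, f):
    """The composition g ◦ f, that is, 'f followed by g'."""
    return lambda x: g(f(x))

def square(x):
    return x * x

def add_one(y):
    return y + 1

X = set(range(-3, 4))              # a finite stand-in for the domain
range_of_square = image(square, X)
image_of_subset = image(square, {1, 2})
add_one_after_square = compose(add_one, square)
print(range_of_square, image_of_subset, add_one_after_square(3))
```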
1.6
The Graph of a Function
Let X and Y be sets. To every function f: X → Y from X to Y there corresponds a subset Γ(f) of the Cartesian product X × Y, where Γ(f) = {(x, y) ∈ X × Y : y = f(x)}. Mathematicians often refer to the subset of X × Y corresponding to a function f: X → Y as the graph of the function. The following example suggests the reason for this terminology.
Example Let q: R → R be the function from the set R of real numbers to itself defined such that $q(x) = x^2$ for all real numbers x. The graph of this function is the subset of R × R given by $\{(x, y) \in R \times R : y = x^2\}$. Note that this subset consists of the Cartesian coordinates of the points of the plane that lie on the curve that represents the graph of the given function.
Whilst every function from X to Y determines a corresponding subset Γ(f) of X × Y, it is not possible to obtain every subset of X × Y in this fashion. Indeed it is easy to see that a subset R of X × Y is the graph of some function f: X → Y if and only if, for every element x of X, there exists exactly one element y of Y for which (x, y) ∈ R. If the subset R of X × Y has this property, then the corresponding function f: X → Y is characterized by the property that, for each element x of X, f(x) is the unique element of Y for which (x, f(x)) ∈ R.
Remark Some mathematicians choose to define a function from a set X to a set Y to be a subset G of the Cartesian product X × Y with the property that, for each element x of X, there exists exactly one element y of Y for which the ordered pair (x, y) belongs to the subset G. This amounts to identifying the function with its graph.
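The criterion for a subset of X × Y to be the graph of a function can be tested mechanically for finite sets. In the following sketch the function name `is_graph` and the sample sets are our own; a subset that passes the test is converted into a Python dictionary, which plays the role of the corresponding function.

```python
def is_graph(R, X, Y):
    """True if R ⊂ X × Y and each x ∈ X has exactly one y ∈ Y with (x, y) ∈ R."""
    if not all(x in X and y in Y for (x, y) in R):
        return False
    return all(sum(1 for y in Y if (x, y) in R) == 1 for x in X)

X = {1, 2, 3}
Y = {"a", "b"}

G = {(1, "a"), (2, "a"), (3, "b")}   # a graph: one pair for each x
not_total = {(1, "a"), (2, "b")}     # fails: no pair for x = 3
two_valued = G | {(1, "b")}          # fails: two pairs for x = 1

f = dict(G)                          # the function whose graph is G
print(is_graph(G, X, Y), is_graph(not_total, X, Y), is_graph(two_valued, X, Y))
```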
1.7
Functions and the Empty Set
We shall adopt the convention that, given any set Y, there exists exactly one function from the empty set ∅ to the set Y. This convention may at first seem a bit strange. Nevertheless experience shows it is in practice a natural convention to adopt, and it tends to simplify the statements and proofs of theorems (which would otherwise be hedged about with all sorts of qualifications and subsidiary arguments to cover the special cases where one or more of the sets involved happens to be the empty set).
This convention can also be justified on the grounds that functions from a set X to a set Y correspond to subsets G of the Cartesian product X × Y that have the property that, given any element x of X, the set {y ∈ Y : (x, y) ∈ G} has exactly one element. (The subset G of X × Y is the graph of the function f.) In the case where X is the empty set, the Cartesian product X × Y is also the empty set. The empty set has exactly one subset. This subset is of course the empty set. Moreover it has, vacuously, the property required in order to be the graph of the function, for if the set X is empty, then it is not possible to find any element x of X for which the number of elements belonging to the subset {y ∈ Y : (x, y) ∈ G} of Y differs from one. Thus, if X is the empty set, then the Cartesian product X × Y has exactly one subset which has the properties required of the graph of any function.
Note that if there exists a function f: X → Y from a set X to a set Y, and if the set Y is the empty set, then the set X must also be the empty set. (For if x were an element of X, then f(x) would be an element of Y, and therefore the set Y would be non-empty.) We see therefore that the number of functions from a set X to the empty set is zero if X is non-empty, but is one if X is the empty set.
When defining properties that sets may or may not have, it is sometimes necessary to decide whether or not the empty set has the given property.
There are standard conventions and forms of reasoning that mathematicians regularly adopt to settle such questions. In mathematics, one often meets definitions of properties, applicable to sets, where a set is said to have some property P if and only if all the elements of the set have some property Q. (For example suppose that P represents the property of being a subset of a given set Y. Then the corresponding property
Q is that of being an element of the set Y . For a set X is a subset of a given set Y if and only if all the elements of X are elements of Y .) In such cases the question arises as to whether or not the empty set has the property P . The standard convention adopted is that in such cases the empty set does indeed have the property P . Note that if a non-empty set X fails to have the property P , then there must exist at least one element of X which fails to have the property Q. It is natural to extend this basic logical principle to the case where the set X is empty. Clearly the empty set cannot have any elements that fail to have this property Q. So it makes sense to say that all elements of the empty set have the property Q, and therefore the empty set has property P . In effect, we are saying that, if Q is some property that elements of sets may or may not have, then the empty set is considered to be an example of a set whose elements all have the property Q. As a result, given any set X, empty or non-empty, and given any property Q, the statement “all elements of X have the property Q” is the logical negation of the statement “there exists an element of X which does not have the property Q”. Example To give a concrete example, consider the definition of a subset of a given set Y . We say that a set X is a subset of Y if and only if every element of X is an element of Y . Thus if a set X fails to be a subset of Y , there must exist at least one element of X that is not an element of Y . According to the convention we have described, the empty set is to be regarded as a subset of Y , since the empty set clearly does not have any elements that are not elements of Y . (Of course, it does not have any elements at all.) Example Let u be a real number. We say that a subset X of the set of real numbers is bounded above by u if every element x of X satisfies the inequality x ≤ u. Accordingly the empty set is bounded above by u. 
Moreover if X is a subset of the set of real numbers (empty or non-empty), and if X is not bounded above by the real number u, then there exists at least one element x of the set X which satisfies the inequality x > u.
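The conventions of this section agree with the way finite enumeration behaves in practice. The sketch below, in Python, lists all functions between two finite sets as dictionaries (the helper `functions` is our own) and confirms the counts discussed above. It also shows that Python's built-in `all` returns True on an empty collection, which matches the convention that every element of the empty set has any given property Q.

```python
from itertools import product

def functions(X, Y):
    """All functions X → Y between finite sets, each represented as a dict."""
    X = list(X)
    return [dict(zip(X, values)) for values in product(Y, repeat=len(X))]

from_empty = functions([], ["a", "b"])   # exactly one function: the empty one
to_empty = functions([1, 2], [])         # no functions at all
empty_to_empty = functions([], [])       # again exactly one function

def is_subset(X, Y):
    """Every element of X is an element of Y."""
    return all(x in Y for x in X)

def bounded_above(X, u):
    """Every element x of X satisfies x <= u."""
    return all(x <= u for x in X)

print(len(from_empty), len(to_empty), len(empty_to_empty))
print(is_subset(set(), {1, 2}), bounded_above(set(), -5))
```

For a non-empty example, `functions([1, 2], ["a", "b", "c"])` has 3 × 3 = 9 elements.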
1.8
Injective, Surjective and Bijective Functions
We now define injective, surjective and bijective functions:
• a function f: X → Y is said to be injective (or one-to-one) if f(u) ≠ f(v) whenever u and v are elements of the domain X with u ≠ v;
• a function f: X → Y is said to be surjective (or onto) if each element of the codomain of the function is the image f(x) of at least one element x of the domain X;
• a function f: X → Y is said to be bijective (or is said to be a one-to-one correspondence) if it is both injective and surjective.
Injective, surjective and bijective functions are also referred to as injections, surjections and bijections respectively. Note that a function f: X → Y is bijective if and only if, given any element y of the co-domain Y of the function, there exists exactly one element x of the domain X satisfying f(x) = y.
Example Let N denote the set {1, 2, 3, 4, . . .} of positive integers. Let f: N → N be the function defined by $f(n) = n^2$ for all positive integers n. This function is injective, for if m and n are positive integers and if m ≠ n then $m^2 \neq n^2$. The function is not surjective, since there is no positive integer n satisfying f(n) = 3.
Example Let g: R → [0, +∞) be the function from the set R of real numbers to the set [0, +∞) of non-negative real numbers that sends each real number x to $x^2$. This function is not injective, since g(2) = g(−2) = 4. It is surjective: for any non-negative real number y, there is a real number $\sqrt{y}$ satisfying $g(\sqrt{y}) = y$.
Example Let h: N → N be the function from the set N of positive integers to itself defined by
\[ h(n) = \begin{cases} n + 1 & \text{if } n \text{ is odd;} \\ n - 1 & \text{if } n \text{ is even.} \end{cases} \]
Thus h(1) = 2, h(2) = 1, h(3) = 4, h(4) = 3, etc. The function is injective. Indeed let m and n be positive integers with m ≠ n. If m is odd and n is even then h(m) ≠ h(n), since h(m) is even and h(n) is odd. If m is even and n is odd then h(m) ≠ h(n), since h(m) is odd and h(n) is even. If m and n are both odd then h(m) ≠ h(n) since h(m) = m + 1, h(n) = n + 1 and m + 1 ≠ n + 1. If m and n are both even then h(m) ≠ h(n) since h(m) = m − 1, h(n) = n − 1 and m − 1 ≠ n − 1. We have thus verified that h(m) ≠ h(n) for all positive integers m and n satisfying m ≠ n. Thus the function is injective. Let n be a positive integer. If n is odd then n = h(n + 1). If n is even then n = h(n − 1). Thus the function is surjective.
The function h: N → N is therefore bijective.
Example Let Y be a set. We have adopted the convention that there is exactly one function e: ∅ → Y from the empty set ∅ to the set Y. Clearly there do not exist distinct elements of the empty set that get mapped to the same element of Y. Therefore this function e: ∅ → Y is an injective function. The function e is surjective if and only if Y = ∅.
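For functions on finite sets, the three properties can be tested by exhaustive checking. The sketch below is only an illustration under a stated assumption: the infinite set N is replaced by the finite set {1, . . . , 10}, which the swap function h happens to map onto itself. The helper names are ours.

```python
def is_injective(f, X):
    """Distinct elements of the finite set X have distinct values under f."""
    values = [f(x) for x in X]
    return len(set(values)) == len(values)

def is_surjective(f, X, Y):
    """Every element of Y is f(x) for some x ∈ X (and f maps X into Y)."""
    return {f(x) for x in X} == set(Y)

def is_bijective(f, X, Y):
    return is_injective(f, X) and is_surjective(f, X, Y)

def h(n):
    """Swap each odd positive integer with the even integer that follows it."""
    return n + 1 if n % 2 == 1 else n - 1

def square(x):
    return x * x

N10 = range(1, 11)  # finite truncation {1, ..., 10} of N, closed under h

print(is_bijective(h, N10, N10))
print(is_injective(square, range(-3, 4)), is_surjective(square, range(-3, 4), {0, 1, 4, 9}))
```

With an empty domain, `is_injective(h, [])` is true, and `is_surjective(h, [], Y)` holds only when Y is empty, in line with the last example above.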
Lemma 1.3 Let X, Y and Z be sets, and let f: X → Y and g: Y → Z be functions.
(i) If f: X → Y and g: Y → Z are injective, then so is g ◦ f: X → Z.
(ii) If f: X → Y and g: Y → Z are surjective, then so is g ◦ f: X → Z.
(iii) If f: X → Y and g: Y → Z are bijective, then so is g ◦ f: X → Z.
Proof First suppose that f: X → Y and g: Y → Z are injective. We must prove that g ◦ f: X → Z is injective. Let u and v be elements of X with u ≠ v. Then f(u) ≠ f(v), since f: X → Y is injective. But then g(f(u)) ≠ g(f(v)), since g: Y → Z is injective. It follows that g ◦ f: X → Z is injective. This proves (i).
Next suppose that f: X → Y and g: Y → Z are surjective. We must prove that g ◦ f: X → Z is surjective. Let z be an element of Z. Then there exists y ∈ Y satisfying g(y) = z, since g: Y → Z is surjective. Then there exists x ∈ X satisfying f(x) = y, since f: X → Y is surjective. But then g(f(x)) = z. It follows that g ◦ f: X → Z is surjective. This proves (ii).
Clearly (iii) follows from (i) and (ii).
1.9
Inverse Functions
Definition Let X and Y be sets, and let f: X → Y be a function from X to Y. A function g: Y → X from Y to X is said to be the inverse of f: X → Y if g(f(x)) = x for all x ∈ X and f(g(y)) = y for all y ∈ Y.
We denote by $f^{-1}: Y \to X$ the inverse of a function f: X → Y, provided that such an inverse exists.
Example Consider the function f: [1, 2] → [1, 4], where $f(x) = x^2$ for all x ∈ [1, 2]. The inverse of this function is the function g: [1, 4] → [1, 2], where $g(y) = \sqrt{y}$ for all y ∈ [1, 4].
Example Consider the function h: R → R, where $h(x) = x^2$ for all real numbers x. This function does not have a well-defined inverse. Indeed no function k: R → R has the property that y = h(k(y)) for all real numbers y, since this identity clearly cannot be satisfied when y < 0.
Lemma 1.4 Let X and Y be sets. A function f: X → Y has a well-defined inverse if and only if it is a bijection. Moreover the inverse of a bijection is itself a bijection.
Proof Let f: X → Y be a function which has a well-defined inverse $f^{-1}: Y \to X$. Let u and v be elements of X. Then $u = f^{-1}(f(u))$ and $v = f^{-1}(f(v))$. Thus if u ≠ v then f(u) ≠ f(v). The function f: X → Y is therefore injective. The function f: X → Y is also surjective, since $y = f(f^{-1}(y))$ for all y ∈ Y. We have thus shown that if a function f: X → Y has a well-defined inverse then it is both injective and surjective, and is thus a bijection.
Conversely suppose that f: X → Y is a bijection. Then, given any element y of Y, there exists exactly one element x of X satisfying f(x) = y. We therefore define $f^{-1}(y)$ to be the unique element x of X satisfying f(x) = y. Clearly $f(f^{-1}(y)) = y$ for all y ∈ Y. Thus $f \circ f^{-1}$ is the identity map of Y. We must also show that $f^{-1} \circ f$ is the identity map of X. Let x be an element of X. Then $f(f^{-1}(f(x))) = f(x)$, since $f \circ f^{-1}$ is the identity map of Y. But f: X → Y is injective. It follows that $f^{-1}(f(x)) = x$, since the elements x and $f^{-1}(f(x))$ are mapped by f to the same element of the set Y. We have thus shown that if the function f: X → Y is a bijection then it has a well-defined inverse.
If g: Y → X is the inverse of a bijection f: X → Y then f is the inverse of g, and therefore g: Y → X must be a bijection.
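The construction of the inverse in the proof of Lemma 1.4 can be imitated for a function on a finite set. In the sketch below a function is represented as a dictionary, and the co-domain Y is passed explicitly; this representation and the name `inverse` are our own. The helper refuses to build an inverse unless the function is a bijection onto Y.

```python
def inverse(f, Y):
    """Inverse of a finite function f (a dict) with co-domain Y.

    Raises ValueError unless f is a bijection onto Y.
    """
    g = {}
    for x, y in f.items():
        if y in g:
            raise ValueError("f is not injective")
        g[y] = x  # g(y) is the unique x with f(x) = y
    if set(g) != set(Y):
        raise ValueError("the range of f is not the whole of Y")
    return g

f = {1: "a", 2: "b", 3: "c"}
g = inverse(f, {"a", "b", "c"})

# g ◦ f is the identity map of X and f ◦ g is the identity map of Y.
assert all(g[f[x]] == x for x in f)
assert all(f[g[y]] == y for y in g)
print(g)
```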
1.10
Preimages
Let f: X → Y be a function from a set X to a set Y, and let B be a subset of Y. The preimage of the set B under the function f is the set $f^{-1}(B)$ defined such that $f^{-1}(B) = \{x \in X : f(x) \in B\}$.
Remark The preimage $f^{-1}(B)$ of a subset B of Y is defined in this fashion for all functions f: X → Y from X to Y, irrespective of whether or not that function has a well-defined inverse function. The standard notation $f^{-1}(B)$ adopted for the preimage of the set B reflects the fact that any function f: X → Y from a set X to a set Y induces a corresponding function from subsets of Y to subsets of X that obviously goes in the reverse direction to the function f. In cases where the function f: X → Y does have a well-defined inverse $f^{-1}: Y \to X$, the preimage of a subset B of Y under the function f coincides with the image of B under the inverse of the function f, so that, in this case, the notation $f^{-1}(B)$ is unambiguous, and can be taken to represent either the preimage of the set B under the function f, or else the image of B under the function $f^{-1}$.
Example Let f: R → R be the function defined by $f(x) = x^2$ for all x ∈ R.
The preimage $f^{-1}([1, 4])$ of the interval [1, 4] is the union [−2, −1] ∪ [1, 2] of the intervals [1, 2] and [−2, −1].
Lemma 1.5 Let f: X → Y be a function between sets X and Y and let B be a subset of Y. Then $f^{-1}(Y \setminus B) = X \setminus f^{-1}(B)$.
Thus the preimage of the complement of any subset of Y is the complement of the preimage of that subset.
Proof Let x be an element of X. Then
\[
\begin{aligned}
x \in f^{-1}(Y \setminus B)
&\iff f(x) \in Y \setminus B \\
&\iff f(x) \notin B \\
&\iff x \notin f^{-1}(B) \\
&\iff x \in X \setminus f^{-1}(B).
\end{aligned}
\]
It follows that the subsets $f^{-1}(Y \setminus B)$ and $X \setminus f^{-1}(B)$ of X contain the same elements, and must therefore be the same subset of X.
Lemma 1.6 Let f: X → Y be a function from a set X to a set Y, and let C be any collection of subsets of Y. Then
\[ f^{-1}\Big( \bigcup_{B \in C} B \Big) = \bigcup_{B \in C} f^{-1}(B), \qquad f^{-1}\Big( \bigcap_{B \in C} B \Big) = \bigcap_{B \in C} f^{-1}(B). \]
Thus the preimage of any union of subsets of Y is the union of the preimages of those subsets, and the preimage of any intersection of subsets of Y is the intersection of the preimages of those subsets.
Proof Let x be an element of X. Then
\[
\begin{aligned}
x \in f^{-1}\Big( \bigcup_{B \in C} B \Big)
&\iff f(x) \in \bigcup_{B \in C} B \\
&\iff \text{there exists } B \in C \text{ such that } f(x) \in B \\
&\iff \text{there exists } B \in C \text{ such that } x \in f^{-1}(B) \\
&\iff x \in \bigcup_{B \in C} f^{-1}(B).
\end{aligned}
\]
It follows that the subsets $f^{-1}\big( \bigcup_{B \in C} B \big)$ and $\bigcup_{B \in C} f^{-1}(B)$ of X contain the same elements, and must therefore be the same subset of X. Also
\[
\begin{aligned}
x \in f^{-1}\Big( \bigcap_{B \in C} B \Big)
&\iff f(x) \in \bigcap_{B \in C} B \\
&\iff \text{for all } B \in C,\ f(x) \in B \\
&\iff \text{for all } B \in C,\ x \in f^{-1}(B) \\
&\iff x \in \bigcap_{B \in C} f^{-1}(B).
\end{aligned}
\]
It follows that the subsets $f^{-1}\big( \bigcap_{B \in C} B \big)$ and $\bigcap_{B \in C} f^{-1}(B)$ of X contain the same elements, and must therefore be the same subset of X.
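Preimages are straightforward to compute when the domain is finite, and Lemmas 1.5 and 1.6 can then be checked on examples. The following sketch uses the squaring function on a small set of integers as a stand-in for the function on R considered above; the sets X, Y, B and the collection C are illustrative choices of our own.

```python
def preimage(f, X, B):
    """The preimage {x ∈ X : f(x) ∈ B} of B under f, for a finite domain X."""
    return {x for x in X if f(x) in B}

def f(x):
    return x * x

X = set(range(-4, 5))
Y = set(range(0, 17))

# Integer analogue of the example: the preimage of {1, 2, 3, 4}.
example = preimage(f, X, {1, 2, 3, 4})

# Lemma 1.5: the preimage of a complement is the complement of the preimage.
B = {0, 1, 4}
lemma_1_5 = preimage(f, X, Y - B) == X - preimage(f, X, B)

# Lemma 1.6: preimages commute with unions and intersections.
C = [{0, 1, 4}, {4, 9}, {4, 5, 16}]
union_ok = preimage(f, X, set().union(*C)) == set().union(
    *(preimage(f, X, B) for B in C)
)
intersection_ok = preimage(f, X, set.intersection(*C)) == set.intersection(
    *(preimage(f, X, B) for B in C)
)
print(example, lemma_1_5, union_ok, intersection_ok)
```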
1.11
Finite and Infinite Sets
A set is said to be finite if the number of elements it contains is finite. A basic result states that if X is a finite set then any injection f: X → X from the set X to itself is a bijection. Infinite sets do not have this property. Although the above result seems fairly obvious, we shall give a fairly formal proof.
Let n be a positive integer. We say that a set X has n elements if there exists a bijection f: {1, 2, . . . , n} → X defined on the set {1, 2, . . . , n} of natural numbers not exceeding n, and mapping this set onto X. We say that a set X has zero elements if it is the empty set. We say that a set X is finite if there exists some non-negative integer n such that X has n elements.
If X is a set with n elements, where n ≥ 1, and if x is some element of X, then the set X \ {x} has n − 1 elements. This fact is readily verified, and is an easy consequence of the fact that, for each integer j between 1 and n there exists a bijection from the set {m ∈ N : 1 ≤ m ≤ n − 1} to the set {m ∈ N : 1 ≤ m ≤ n and m ≠ j}. This observation enables us to set up a proof of the required result for finite sets by induction on the number of elements in the set.
Proposition 1.7 Let X be a finite set. Then any injection f: X → X is a bijection.
Proof The result is easily seen to be true when the number of elements in X is zero or one. Suppose that the result is true for all sets with k elements, where k is some natural number. We show that the result is then true for all sets with k + 1 elements.
Let X be a set with k + 1 elements, and let f: X → X be a function from X to X which is an injection. Suppose that there were to exist some element x of X that was not in the range of f. Let Y = X \ {x}, and let g: Y → Y be the function defined such that g(y) = f(y) for all y ∈ Y. The function g would then be an injection from the set Y to itself. Moreover the set Y has k elements.
The inductive hypothesis therefore ensures that every injection from the set Y to itself is a surjection. Therefore the function g would be a surjection. In particular there would exist some element y of Y such that g(y) = f (x). But then x and y would be distinct elements of X with the property that f (x) = f (y). But the function f is an injection, and therefore this situation cannot arise. We see therefore that a contradiction would arise were there to exist any element x of the set X that was not in the range of the injection f . It follows that the range of the function f must be the whole of the set X. Thus f is a surjection, and is 16
therefore a bijection. We have thus shown that if every injection from a set with k elements to itself is a bijection, then every injection from a set with k + 1 elements to itself is a bijection. It now follows by induction on the number of elements in the set that every injection from a finite set to itself is a bijection. Example Consider the function f : N → N from the set of natural numbers to itself defined such that f (n) = n + 1 for all natural numbers n. This function is an injection. However it is not a surjection, because the number 1 is not in the range of the function. The function f is thus an example of a function from a set to itself that is an injection but is not a bijection. A set is said to be infinite if it is not finite. Lemma 1.8 Let X be an infinite set. Then there exists an injection f : X → X that is not a bijection. Proof No finite list of elements of X can include all elements of X, and therefore there exists an infinite sequence x1 , x2 , x3 , . . . of elements of X which are distinct (so that xj 6= xk whenever j 6= k). Let f : X → X be the function defined such that f (xn ) = xn+1 for all natural numbers n, and f (x) = x for all elements of X not included in the sequence x1 , x2 , x3 , . . .. Then the function f is an injective function whose range is X \ {x1 }. This injection is not a bijection. The following result follows immediately on combining the results of Proposition 1.7 and Lemma 1.8. Proposition 1.9 A set X is infinite if and only if there exists an injection f : X → X that is not a bijection.
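As an informal check of Proposition 1.7, the following Python sketch enumerates every function from a four-element set to itself and confirms that each injection found is also a surjection. The helper names (`is_injective`, `is_surjective`, `all_self_maps`) are illustrative choices and do not come from the notes; a check on one small set is of course no substitute for the inductive proof.

```python
from itertools import product


def is_injective(mapping):
    """A mapping (dict) is injective when no two keys share a value."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def is_surjective(mapping, codomain):
    """A mapping is surjective onto `codomain` when every element is hit."""
    return set(mapping.values()) == set(codomain)


def all_self_maps(elements):
    """Yield every function from `elements` to itself, represented as a dict."""
    elements = list(elements)
    for images in product(elements, repeat=len(elements)):
        yield dict(zip(elements, images))


X = {1, 2, 3, 4}
injections = [f for f in all_self_maps(X) if is_injective(f)]
every_injection_is_bijection = all(is_surjective(f, X) for f in injections)
```

No enumeration of this kind is available for an infinite set such as N, where the map sending n to n + 1 is injective but never takes the value 1.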
1.12
Countability
Definition A set X is said to be countable if there exists an injection f: X → N mapping X into the set N of natural numbers.
Example The set Z of integers is countable. For there is a well-defined bijection f: Z → N defined such that f(n) = 2n + 1 when n ≥ 0 and f(n) = −2n when n < 0. This bijection maps the set of non-negative integers onto the set of odd natural numbers, and maps the set of negative integers onto the set of even natural numbers.
Lemma 1.10 Any subset of a countable set is countable.
Proof Let Y be a subset of a countable set X. Then there exists an injection f: X → N from X to the set N of natural numbers. The restriction of this injection to the set Y gives an injection from Y to N.
Proposition 1.11 A non-empty set X is countable if and only if there exists a surjective function g: N → X mapping the set N of natural numbers onto X.
Proof Suppose that X is a countable non-empty set. Then there exists an injection f: X → N from X to N. Let x_0 be some chosen element of the set X. Then there is a well-defined function g: N → X defined such that g(f(x)) = x for all x ∈ X, and g(n) = x_0 for natural numbers n that do not belong to the range f(X) of the function f. (The definition of the function g relies on the fact that, given an element n of the range f(X) of the injection f, there exists exactly one element x of the set X for which f(x) = n.) The function g is clearly a surjection, in view of the fact that x = g(f(x)) for all x ∈ X.
Conversely let X be a non-empty set, and let g: N → X be a surjection from N to X. Given an element x of X, there exists at least one natural number n for which g(n) = x. It follows that there is a well-defined function f: X → N such that, given any element x of X, f(x) is the smallest natural number n for which g(n) = x. Then g(f(x)) = x for all x ∈ X. It follows from this that if x_1 and x_2 are elements of X (not necessarily distinct), and if f(x_1) = f(x_2), then x_1 = g(f(x_1)) = g(f(x_2)) = x_2. We conclude that distinct elements of the set X get mapped to distinct natural numbers. Thus the function f: X → N is an injection, and therefore the set X is countable, as required.
Corollary 1.12 Let h: X → Y be a surjection. Suppose that the set X is countable. Then the set Y is countable.
Proof There is nothing to prove if the set X is the empty set, since in that case the set Y must also be the empty set. Suppose therefore that the set X is non-empty and countable.
It follows from Proposition 1.11 that there exists a surjection g: N → X from N to X. The composition h ◦ g: N → Y of g and h is then a surjection from N to Y (since the composition of two surjections is always a surjection). It then follows from Proposition 1.11 that the set Y is countable, as required.
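The bijection between Z and N used in the Example of this subsection can be written out directly. The sketch below is only an illustration: the function names `z_to_n` and `n_to_z` are hypothetical, and N is taken to be the set of positive integers, as in these notes.

```python
def z_to_n(n: int) -> int:
    """Send an integer n to 2n + 1 when n >= 0 and to -2n when n < 0."""
    return 2 * n + 1 if n >= 0 else -2 * n


def n_to_z(m: int) -> int:
    """Invert z_to_n: odd m come from non-negative integers, even m from negative ones."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return (m - 1) // 2 if m % 2 == 1 else -(m // 2)


# The integers -3, ..., 3 are sent to distinct natural numbers.
images = [z_to_n(n) for n in range(-3, 4)]
```

Here the non-negative integers land on the odd natural numbers and the negative integers on the even ones, so every natural number is reached exactly once.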
1.13
Cartesian Products and Unions of Countable Sets
Lemma 1.13 There exists a bijection between the sets N × N and N.
Proof Let f: N × N → N be the function defined such that $f(j, k) = \tfrac{1}{2}(j + k - 1)(j + k - 2) + k$. One can check that this function f is a bijection. Note that, for each natural number m greater than one, this function f maps the set D_m into the set I_m, where D_m = {(j, k) ∈ N × N : j + k = m} and $I_m = \{n \in N : \tfrac{1}{2}(m - 1)(m - 2) < n \le \tfrac{1}{2}m(m - 1)\}$. Now, given any natural number n, there exists a unique natural number m greater than one such that $\tfrac{1}{2}(m - 1)(m - 2) < n \le \tfrac{1}{2}m(m - 1)$. It follows that each natural number belongs to exactly one of the sets I_2, I_3, I_4, .... Moreover if n is a natural number, and if n ∈ I_m, where m is a natural number greater than one, then n = f(m − k, k) where $k = n - \tfrac{1}{2}(m - 1)(m - 2)$. Moreover (m − k, k) is the unique element of D_m satisfying f(m − k, k) = n. These facts ensure that, given any natural number n, there exists exactly one pair (j, k) of natural numbers satisfying f(j, k) = n. (These natural numbers j and k satisfy j + k = m, where m is the unique natural number greater than one that satisfies the inequalities $\tfrac{1}{2}(m - 1)(m - 2) < n \le \tfrac{1}{2}m(m - 1)$.) Therefore the function f is both injective and surjective, and is thus a bijection, as required.
Remark The function f: N × N → N used in the proof of Lemma 1.13 is constructed so that
f(1, 1) = 1, f(1, 2) = 3, f(1, 3) = 6, f(1, 4) = 10,
f(2, 1) = 2, f(2, 2) = 5, f(2, 3) = 9,
f(3, 1) = 4, f(3, 2) = 8,
f(4, 1) = 7,
etc.
These examples giving the value of f(j, k) for small values of j and k should convey the basic scheme used to construct this function f.
Proposition 1.14 Let X and Y be countable sets. Then the Cartesian product X × Y of X and Y is a countable set.
Proof There exist injective functions g: X → N and h: Y → N, because the sets X and Y are countable. Also there exists a bijection f: N × N → N from N × N to N (Lemma 1.13). Let p: X × Y → N be the function defined such that p(x, y) = f(g(x), h(y)) for all x ∈ X and y ∈ Y. We claim that the function p is an injection. Let x_1 and x_2 be elements of X (not necessarily distinct), and let y_1 and y_2 be elements of Y. Suppose that p(x_1, y_1) = p(x_2, y_2). Then (g(x_1), h(y_1)) = (g(x_2), h(y_2)), because the function f: N × N → N is an injection, and therefore g(x_1) = g(x_2) and h(y_1) = h(y_2). But the functions g and h are injections. It follows that x_1 = x_2 and y_1 = y_2, and thus (x_1, y_1) = (x_2, y_2). We have therefore shown that if the elements (x_1, y_1) and (x_2, y_2) of X × Y are such that p(x_1, y_1) = p(x_2, y_2) then (x_1, y_1) = (x_2, y_2). This shows that the function p: X × Y → N is an injection. The existence of such an injection guarantees that the set X × Y is countable, as required.
Corollary 1.15 Let X_1, X_2, ..., X_n be countable sets. Then the Cartesian product X_1 × X_2 × ··· × X_n of these sets is a countable set.
Proof The result follows by induction on the number of sets forming the Cartesian product, because the set X_1 × X_2 × ··· × X_n may be regarded as the Cartesian product of the sets X_1 × X_2 × ··· × X_{n−1} and X_n whenever n > 1, and the Cartesian product of any two countable sets is countable (Proposition 1.14).
Lemma 1.16 The set Q of rational numbers is countable.
Proof The set Z of integers and the set N of natural numbers are countable sets, and therefore the Cartesian product Z × N is a countable set (Proposition 1.14). There is an obvious surjection g: Z × N → Q, where g(z, n) = z/n for all integers z and natural numbers n. The result therefore follows immediately on applying Corollary 1.12.
Proposition 1.17 Let X_1, X_2, X_3, ... be a sequence of countable sets. Then the union $\bigcup_{n=1}^{\infty} X_n$ of these countable sets is itself a countable set.
Proof For each natural number n let g_n: X_n → N be an injective function from X_n to the set N of natural numbers. (Such injective functions exist because each set X_n is countable.) We shall construct an injective function h: X → N × N from X to N × N, where $X = \bigcup_{n=1}^{\infty} X_n$. Given any element x of X, let h(x) = (n(x), g_{n(x)}(x)), where n(x) is the smallest natural number with the property that x ∈ X_{n(x)}.
(Note that x belongs to at least one of the sets X_n, and therefore this natural number n(x) is well-defined.) Let x and y be elements of X satisfying h(x) = h(y). We claim that x = y. Now if h(x) = h(y) then n(x) = n(y). It follows that x ∈ X_n and y ∈ X_n, where n = n(x) = n(y). Moreover g_n(x) = g_n(y). But g_n: X_n → N is an injective function. It follows that x = y. We conclude therefore that the function h: X → N × N is injective.
Now Lemma 1.13 ensures that there exists a bijective function f: N × N → N from N × N to N. The composition function f ◦ h: X → N is then an injective function from X to N. We conclude therefore that the set X is countable, as required.
Corollary 1.18 Let (X_i : i ∈ I) be a collection of countable sets, indexed by a countable set I. Then the union $\bigcup_{i \in I} X_i$ of the sets in this countable collection is a countable set.
Proof The indexing set I is a countable set. Therefore there exists an injective function g: I → N. It follows that, for each natural number n, there exists at most one element i of the indexing set such that g(i) = n. If there exists some element i of I such that g(i) = n, let Y_n = X_i; otherwise let Y_n = ∅. Then Y_1, Y_2, Y_3, ... is an infinite sequence of countable sets, and clearly $\bigcup_{i \in I} X_i = \bigcup_{n=1}^{\infty} Y_n$. It follows immediately from Proposition 1.17 that $\bigcup_{i \in I} X_i$ is a countable set, as required.
We define a countable union of sets to be a union of sets where the sets making up the collection can be indexed by some countable set. Thus the union of a finite number of sets is a countable union of sets. Also the union of an infinite sequence X_1, X_2, X_3, ... of sets is a countable union. The result of Corollary 1.18 may be summed up in the statement that any countable union of countable sets is itself a countable set.
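The pairing function of Lemma 1.13, which underlies the results of this subsection, can be computed together with its inverse. The following Python sketch follows the construction in the proof of that lemma; the names `pair` and `unpair` are illustrative and do not appear in the notes.

```python
def pair(j: int, k: int) -> int:
    """Return f(j, k) = (j + k - 1)(j + k - 2)/2 + k for natural numbers j, k >= 1."""
    return (j + k - 1) * (j + k - 2) // 2 + k


def unpair(n: int) -> tuple:
    """Return the unique pair (j, k) of natural numbers with pair(j, k) == n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Find the unique m > 1 with (m - 1)(m - 2)/2 < n <= m(m - 1)/2.
    m = 2
    while m * (m - 1) // 2 < n:
        m += 1
    # Within the diagonal j + k = m, k is the offset of n past the lower bound.
    k = n - (m - 1) * (m - 2) // 2
    return (m - k, k)


# Values along the first few diagonals j + k = 2, 3, 4, 5.
first_values = [pair(1, 1), pair(2, 1), pair(1, 2), pair(3, 1), pair(2, 2), pair(1, 3)]
```

The linear search for m is chosen to mirror the proof rather than for speed; a closed form using an integer square root would serve equally well.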
1.14
Uncountable Sets
A set that is not countable is said to be uncountable. Many sets occurring in mathematics are uncountable. These include the set of real numbers (see Proposition 1.21). It follows directly from Lemma 1.10 that if a set X has an uncountable subset, then X must itself be uncountable. It also follows directly from Corollary 1.12 that if h: X → Y is a surjection from a set X to a set Y , and if the set Y is uncountable, then the set X is uncountable.
1.15
Power Sets
Definition Let X be a set. The power set P(X) is defined to be the set of subsets of X.
Proposition 1.19 Let X be a set. Then there does not exist any surjection from X to the power set P(X) of X.
Proof Let f: X → P(X) be any function from X to P(X), and let Z_f = {x ∈ X : x ∉ f(x)}. Let x ∈ X. Then x belongs to exactly one of the sets Z_f and f(x). It follows that Z_f ≠ f(x) for all x ∈ X. We have thus shown that any function f: X → P(X) determines an element Z_f of P(X) that does not belong to the range of the function f. Thus no function f: X → P(X) can be a surjection.
Corollary 1.20 The power set P(N) of the set N of natural numbers is uncountable.
Proof It follows from Proposition 1.11 that if the set P(N) were countable then there would exist a surjection f: N → P(N) from N to P(N). But it follows from Proposition 1.19 that there are no surjections from N to P(N). Therefore P(N) cannot be countable.
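The diagonal construction in the proof of Proposition 1.19 can be tried out on a small finite set. The sketch below is an illustration only: the names `power_set` and `diagonal_set`, and the particular set X and function f, are made up for the example. It builds the set of those x that do not belong to f(x) and checks, for every function from a three-element set to its power set, that this set is missed.

```python
from itertools import combinations, product


def power_set(X):
    """Return all subsets of the finite set X as a list of frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(X, f):
    """Return Z_f = {x in X : x not in f(x)} for a function f given as a dict."""
    return frozenset(x for x in X if x not in f[x])


X = {1, 2, 3}
subsets = power_set(X)

# One particular function f: X -> P(X), chosen arbitrarily.
f = {1: frozenset({1, 2}), 2: frozenset(), 3: frozenset({1, 3})}
Z = diagonal_set(X, f)

# Exhaustive check over all 8**3 functions from X to P(X).
elements = sorted(X)
never_in_range = all(
    diagonal_set(X, dict(zip(elements, images))) not in images
    for images in product(subsets, repeat=len(elements))
)
```

For the function f above only the element 2 fails to lie in its own image, so the diagonal set is {2}, which is not a value of f.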
1.16
The Cantor Set
Definition The Cantor set is the set consisting of all real numbers that can be expressed as the sums of infinite series of the form $\sum_{n=1}^{\infty} \frac{2a_n}{3^n}$, where, for each natural number n, either a_n = 0 or a_n = 1.
We shall show that the Cantor set is an uncountable set. Let P(N) denote the set of all subsets of the set N of natural numbers, and let f: P(N) → R be the function defined such that $f(I) = \sum_{n \in I} \frac{2}{3^n}$ for all subsets I of N. The range of this function f is the Cantor set. We claim that the function f is injective. Let I and J be subsets of N, and let a_1, a_2, a_3, ... and b_1, b_2, b_3, ... be the infinite sequences defined such that a_n = 1 if n ∈ I, a_n = 0 if n ∉ I, b_n = 1 if n ∈ J and b_n = 0 if n ∉ J. Then
$$f(I) = \sum_{n=1}^{\infty} \frac{2a_n}{3^n}, \qquad f(J) = \sum_{n=1}^{\infty} \frac{2b_n}{3^n}.$$
Suppose that I ≠ J. Let m be the smallest value of n for which a_n ≠ b_n. Then
$$f(J) - f(I) = \frac{2(b_m - a_m)}{3^m} + \sum_{n=m+1}^{\infty} \frac{2(b_n - a_n)}{3^n}.$$
Now |b_n − a_n| ≤ 1 for all natural numbers n, and therefore
$$\left| \sum_{n=m+1}^{\infty} \frac{2(b_n - a_n)}{3^n} \right| \le \sum_{n=m+1}^{\infty} \frac{2|b_n - a_n|}{3^n} \le \sum_{n=m+1}^{\infty} \frac{2}{3^n} = \frac{2}{3^{m+1}} \times \frac{1}{1 - \frac{1}{3}} = \frac{1}{3^m}.$$
It follows that
$$\left| f(J) - f(I) - \frac{2(b_m - a_m)}{3^m} \right| \le \frac{1}{3^m},$$
where |b_m − a_m| = 1, and therefore
$$|f(J) - f(I)| \ge \left| \frac{2(b_m - a_m)}{3^m} \right| - \frac{1}{3^m} = \frac{1}{3^m}.$$
We conclude from this that f(I) ≠ f(J) when I ≠ J. It follows from this that the function f: P(N) → R is injective, and therefore defines a bijection between the set P(N) and the Cantor set. Now P(N) is uncountable (Corollary 1.20). It follows that the Cantor set is uncountable.
Proposition 1.21 The set R of real numbers is uncountable.
Proof Every subset of a countable set is countable (Lemma 1.10). The Cantor set is an uncountable set that is a subset of the set R of real numbers. Therefore R cannot be countable.
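The estimate used to prove injectivity of f can be checked exactly for finite subsets of N, where the series reduces to a finite sum. The Python sketch below uses exact rational arithmetic; the names `cantor_value`, `finite_subsets` and `gap_bound_holds` are illustrative, and restricting to subsets of {1, ..., 6} is an assumption made purely to keep the enumeration small.

```python
from fractions import Fraction
from itertools import combinations


def cantor_value(I):
    """Return f(I), the sum of 2/3**n over n in the finite set I, as an exact fraction."""
    return sum((Fraction(2, 3 ** n) for n in I), Fraction(0))


def finite_subsets(N):
    """Return every subset of {1, ..., N} as a frozenset."""
    items = range(1, N + 1)
    return [frozenset(c) for r in range(N + 1) for c in combinations(items, r)]


subsets = finite_subsets(6)
values = {I: cantor_value(I) for I in subsets}


def gap_bound_holds(I, J):
    """Check |f(J) - f(I)| >= 3**(-m), where m is the least n lying in exactly one of I, J."""
    m = min(I ^ J)
    return abs(values[J] - values[I]) >= Fraction(1, 3 ** m)


all_gaps_ok = all(gap_bound_holds(I, J) for I, J in combinations(subsets, 2))
```

Since distinct subsets are always separated by at least 3^(-m), the 64 subsets considered here yield 64 distinct values.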
Course 221: Michaelmas Term 2006 Section 2: Metric Spaces David R. Wilkins Copyright © David R. Wilkins 1997–2006
Contents
2 Metric Spaces
2.1 Euclidean Spaces
2.2 Metric Spaces
2.3 Convergence of Sequences in a Metric Space
2.4 Continuity of Functions between Metric Spaces
2.5 Continuity of Functions with Values in Euclidean Spaces
2.6 Open Sets in Euclidean Spaces
2.7 Open Sets in Metric Spaces
2.8 Closed Sets in a Metric Space
2.9 Continuous Functions and Open and Closed Sets
2.10 Homeomorphisms
2
Metric Spaces
2.1
Euclidean Spaces
We denote by R^n the set consisting of all n-tuples (x_1, x_2, ..., x_n) of real numbers. The set R^n represents n-dimensional Euclidean space (with respect to the standard Cartesian coordinate system). Let x and y be elements of R^n, where x = (x_1, x_2, ..., x_n), y = (y_1, y_2, ..., y_n), and let λ be a real number. We define
$$\begin{aligned}
x + y &= (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n), \\
x - y &= (x_1 - y_1, x_2 - y_2, \ldots, x_n - y_n), \\
\lambda x &= (\lambda x_1, \lambda x_2, \ldots, \lambda x_n), \\
x \cdot y &= x_1 y_1 + x_2 y_2 + \cdots + x_n y_n, \\
|x| &= \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.
\end{aligned}$$
The quantity x · y is the scalar product (or inner product) of x and y, and the quantity |x| is the Euclidean norm of x. Note that |x|^2 = x · x. The Euclidean distance between two points x and y of R^n is defined to be the Euclidean norm |y − x| of the vector y − x.
Lemma 2.1 (Schwarz’ Inequality) Let x and y be elements of R^n. Then |x · y| ≤ |x||y|.
Proof We note that |λx + µy|^2 ≥ 0 for all real numbers λ and µ. But |λx + µy|^2 = (λx + µy) · (λx + µy) = λ^2|x|^2 + 2λµ(x · y) + µ^2|y|^2. Therefore λ^2|x|^2 + 2λµ(x · y) + µ^2|y|^2 ≥ 0 for all real numbers λ and µ. In particular, suppose that λ = |y|^2 and µ = −x · y. We conclude that |y|^4|x|^2 − 2|y|^2(x · y)^2 + (x · y)^2|y|^2 ≥ 0, so that (|x|^2|y|^2 − (x · y)^2)|y|^2 ≥ 0. Thus if y ≠ 0 then |y| > 0, and hence |x|^2|y|^2 − (x · y)^2 ≥ 0. But this inequality is trivially satisfied when y = 0. Thus |x · y| ≤ |x||y|, as required.
It follows easily from Schwarz’ Inequality that |x + y| ≤ |x| + |y| for all x, y ∈ R^n. For |x + y|^2 = (x + y) · (x + y) = |x|^2 + |y|^2 + 2(x · y) ≤ |x|^2 + |y|^2 + 2|x||y| = (|x| + |y|)^2. It follows, on replacing x and y by z − y and y − x respectively, that |z − x| ≤ |z − y| + |y − x| for all points x, y and z of R^n. This important inequality is known as the Triangle Inequality. It expresses the geometric fact that the length of any side of a triangle in a Euclidean space is less than or equal to the sum of the lengths of the other two sides.
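Schwarz’ Inequality and the inequality |x + y| ≤ |x| + |y| can be checked numerically on a concrete pair of vectors. The following Python sketch is a minimal illustration; the helper names `dot`, `norm` and `add` and the two sample vectors are assumptions made for the example, and vectors are represented as tuples of floats.

```python
import math


def dot(x, y):
    """Scalar product x . y of two vectors of the same length."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm |x|, the square root of x . x."""
    return math.sqrt(dot(x, x))


def add(x, y):
    """Componentwise sum x + y."""
    return tuple(a + b for a, b in zip(x, y))


x = (1.0, 2.0, 3.0)
y = (4.0, -5.0, 6.0)

schwarz_ok = abs(dot(x, y)) <= norm(x) * norm(y)
triangle_ok = norm(add(x, y)) <= norm(x) + norm(y)
```

For these vectors x · y = 12, while |x||y| is the square root of 14 × 77 = 1078, which is a little under 33, so the inequality holds with room to spare.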
2.2
Metric Spaces
Definition A metric space (X, d) consists of a set X together with a distance function d: X × X → [0, +∞) on X satisfying the following axioms:
(i) d(x, y) ≥ 0 for all x, y ∈ X,
(ii) d(x, y) = d(y, x) for all x, y ∈ X,
(iii) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X,
(iv) d(x, y) = 0 if and only if x = y.
The quantity d(x, y) should be thought of as measuring the distance between the points x and y. The inequality d(x, z) ≤ d(x, y) + d(y, z) is referred to as the Triangle Inequality. The elements of a metric space are usually referred to as points of that metric space.
Note that if X is a metric space with distance function d and if A is a subset of X then the restriction d|A × A of d to pairs of points of A defines a distance function on A satisfying the axioms for a metric space.
The set R of real numbers becomes a metric space with distance function d given by d(x, y) = |x − y| for all x, y ∈ R. Similarly the set C of complex numbers becomes a metric space with distance function d given by d(z, w) = |z − w| for all z, w ∈ C, and n-dimensional Euclidean space R^n is a metric space with respect to the Euclidean distance function d, given by
$$d(x, y) = |x - y| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
for all x, y ∈ R^n. Any subset X of R, C or R^n may be regarded as a metric space whose distance function is the restriction to X of the distance function on R, C or R^n defined above.
Example The n-sphere S^n is defined to be the subset of (n + 1)-dimensional Euclidean space R^{n+1} consisting of all elements x of R^{n+1} for which |x| = 1. Thus S^n = {(x_1, x_2, ..., x_{n+1}) ∈ R^{n+1} : x_1^2 + x_2^2 + ··· + x_{n+1}^2 = 1}. (Note that S^2 is the standard (2-dimensional) unit sphere in 3-dimensional Euclidean space.) The chordal distance between two points x and y of S^n is defined to be the length |x − y| of the line segment joining x and y. The n-sphere S^n is a metric space with respect to the chordal distance function.
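The four axioms can be tested mechanically on any finite sample of points. The Python sketch below does this for the chordal distance on a few points of S^2 and, for contrast, for the squared distance on the real line, which fails the Triangle Inequality. The function name `satisfies_metric_axioms`, the tolerance `tol` and the sample points are assumptions made for illustration; passing such a test on a finite sample does not prove that a function is a metric.

```python
import math
from itertools import product


def euclidean(x, y):
    """Euclidean (chordal) distance |x - y| between two tuples of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def satisfies_metric_axioms(points, d, tol=1e-12):
    """Check axioms (i)-(iv) for the distance function d on a finite list of points."""
    for x, y in product(points, repeat=2):
        # Axioms (i) and (ii): non-negativity and symmetry.
        if d(x, y) < 0 or abs(d(x, y) - d(y, x)) > tol:
            return False
        # Axiom (iv): zero distance exactly when the points coincide.
        if (d(x, y) <= tol) != (x == y):
            return False
    for x, y, z in product(points, repeat=3):
        # Axiom (iii): the Triangle Inequality.
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True


sphere_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
chordal_ok = satisfies_metric_axioms(sphere_points, euclidean)

line_points = [(0.0,), (1.0,), (2.0,)]
squared_ok = satisfies_metric_axioms(line_points, lambda x, y: euclidean(x, y) ** 2)
```

The squared distance fails because the points 0, 1 and 2 give 4 on the left of the Triangle Inequality but only 1 + 1 on the right.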
2.3
Convergence of Sequences in a Metric Space
Definition Let X be a metric space with distance function d. A sequence x_1, x_2, x_3, ... of points in X is said to converge to a point p in X if, given any strictly positive real number ε, there exists some natural number N such that d(x_j, p) < ε whenever j ≥ N. We refer to p as the limit $\lim_{j \to +\infty} x_j$ of the sequence x_1, x_2, x_3, ....
Example The set R of real numbers is considered to be a metric space, whose distance function d is defined such that d(u, v) = |u − v| for all real numbers u and v. An infinite sequence x_1, x_2, x_3, ... of real numbers converges in this metric space to some real number p if and only if, given any strictly positive real number ε, there exists some positive integer N such that |x_j − p| < ε whenever j ≥ N. This criterion reproduces the standard definition of convergence for an infinite sequence of real numbers. We conclude therefore that the definition of convergence for sequences of points in metric spaces generalizes the standard definition of convergence for infinite sequences of real numbers.
Example Let z_1, z_2, z_3, ... be an infinite sequence of complex numbers. The set C of complex numbers is considered to be a metric space, whose distance function d is defined such that d(z, w) = |z − w| for all complex numbers z and w. It follows that the infinite sequence z_1, z_2, z_3, ... of complex numbers converges in this metric space to the complex number w if and only if, given any strictly positive real number ε, there exists some positive integer N such that |z_j − w| < ε whenever j ≥ N. This is the standard criterion for the convergence of an infinite sequence of complex numbers.
Example Let n be a positive integer, and let p_1, p_2, p_3, ... be an infinite sequence of points in n-dimensional Euclidean space R^n. This sequence of points converges to some point r of R^n if and only if, given any strictly positive real number ε, there exists some positive integer N such that |p_j − r| < ε whenever j ≥ N.
If a sequence of points in a metric space is convergent then the limit of that sequence is unique. Indeed let x_1, x_2, x_3, ... be a sequence of points in a metric space (X, d) which converges to points p and p′ of X. We show that p = p′. Now, given any ε > 0, there exist natural numbers N_1 and N_2 such that d(x_j, p) < ε whenever j ≥ N_1 and d(x_j, p′) < ε whenever j ≥ N_2. On choosing j so that j ≥ N_1 and j ≥ N_2 we see that 0 ≤ d(p, p′) ≤ d(p, x_j) + d(x_j, p′) < 2ε by a straightforward application of the metric space axioms (i)–(iii). Thus 0 ≤ d(p, p′) < 2ε for every ε > 0, and hence d(p, p′) = 0, so that p = p′ by Axiom (iv).
Lemma 2.2 Let (X, d) be a metric space, and let x_1, x_2, x_3, ... be a sequence of points of X which converges to some point p of X. Then, for any point y of X, d(x_j, y) → d(p, y) as j → +∞.
Proof Let ε > 0 be given. We must show that there exists some natural number N such that |d(x_j, y) − d(p, y)| < ε whenever j ≥ N. However N can be chosen such that d(x_j, p) < ε whenever j ≥ N. But d(x_j, y) ≤ d(x_j, p) + d(p, y) and d(p, y) ≤ d(p, x_j) + d(x_j, y) for all j, hence −d(x_j, p) ≤ d(x_j, y) − d(p, y) ≤ d(x_j, p) for all j, and hence |d(x_j, y) − d(p, y)| < ε whenever j ≥ N, as required.
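The key estimate in the proof of Lemma 2.2, namely |d(x_j, y) − d(p, y)| ≤ d(x_j, p), can be observed numerically. The Python sketch below uses the Euclidean plane; the particular sequence x_j = (1 + 1/j, 2 − 1/j), the points p and y, and the small rounding tolerance are all assumptions chosen for illustration.

```python
import math


def d(u, v):
    """Euclidean distance between two points of the plane."""
    return math.sqrt((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2)


p = (1.0, 2.0)
y = (4.0, 6.0)


def x(j):
    """The j-th term of a sequence converging to p."""
    return (1.0 + 1.0 / j, 2.0 - 1.0 / j)


# |d(x_j, y) - d(p, y)| is bounded by d(x_j, p), up to floating-point rounding.
estimate_holds = all(
    abs(d(x(j), y) - d(p, y)) <= d(x(j), p) + 1e-12
    for j in range(1, 2001)
)

# Given epsilon, any N with sqrt(2)/N < epsilon works, since d(x_j, p) = sqrt(2)/j.
epsilon = 1e-3
N = math.floor(math.sqrt(2) / epsilon) + 1
close_from_N = all(abs(d(x(j), y) - d(p, y)) < epsilon for j in range(N, N + 500))
```

Here d(p, y) = 5, and the distances d(x_j, y) approach 5 at least as fast as d(x_j, p) approaches zero.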
2.4
Continuity of Functions between Metric Spaces
Definition Let X and Y be metric spaces with distance functions d_X and d_Y respectively. A function f: X → Y from X to Y is said to be continuous at a point p of X if and only if the following criterion is satisfied:
• given any strictly positive real number ε, there exists some strictly positive real number δ such that d_Y(f(x), f(p)) < ε for all points x of X satisfying d_X(x, p) < δ.
The function f: X → Y is said to be continuous on X if and only if it is continuous at p for every point p of X.
Example Let X be a subset of the set R of real numbers. We can regard X and R as metric spaces whose distance function d is defined such that d(u, v) = |u − v| for all real numbers u and v belonging to the relevant set. A real-valued function f: X → R satisfies the above definition of continuity at an element p of X if and only if, given any strictly positive real number ε, there exists some strictly positive real number δ such that |f(x) − f(p)| < ε for all elements x of X satisfying |x − p| < δ. We see from this that the definition of continuity for functions between metric spaces generalizes the standard definition of continuity for functions of a real variable.
Example Let D be a subset of the set C of complex numbers. We can regard D and C as metric spaces whose distance function d is defined such that d(z, w) = |z − w| for all complex numbers z and w belonging to the relevant set. A complex-valued function f: D → C satisfies the above definition of continuity at an element w of D if and only if, given any strictly positive real number ε, there exists some strictly positive real number δ such that |f(z) − f(w)| < ε for all elements z of D satisfying |z − w| < δ. This is the standard definition of continuity for functions of a complex variable.
Example Let X and Y be subsets of R^m and R^n respectively. A function f: X → Y from X to Y is continuous at a point p of X if and only if the following criterion is satisfied: given any strictly positive real number ε, there exists some strictly positive real number δ such that |f(x) − f(p)| < ε for all points x of X satisfying |x − p| < δ.
Lemma 2.3 Let X, Y and Z be metric spaces, and let f: X → Y and g: Y → Z be continuous functions. Then the composition function g ◦ f: X → Z is continuous.
Proof We denote by d_X, d_Y and d_Z the distance functions on X, Y and Z respectively. Let p be any point of X. We show that g ◦ f is continuous at p. Let ε > 0 be given. Now the function g is continuous at f(p). Hence there exists some η > 0 such that d_Z(g(y), g(f(p))) < ε for all y ∈ Y satisfying d_Y(y, f(p)) < η. But then, since the function f is continuous at p, there exists some δ > 0 such that d_Y(f(x), f(p)) < η for all x ∈ X satisfying d_X(x, p) < δ. Thus d_Z(g(f(x)), g(f(p))) < ε for all x ∈ X satisfying d_X(x, p) < δ, showing that g ◦ f is continuous at p, as required.
Lemma 2.4 Let f: X → Y be a continuous function between metric spaces X and Y, and let x_1, x_2, x_3, ... be a sequence of points in X which converges to some point p of X. Then the sequence f(x_1), f(x_2), f(x_3), ... converges to f(p).
Proof We denote by d_X and d_Y the distance functions on X and Y respectively. Let ε > 0 be given. We must show that there exists some natural number N such that d_Y(f(x_n), f(p)) < ε whenever n ≥ N. However there exists some δ > 0 such that d_Y(f(x), f(p)) < ε for all x ∈ X satisfying d_X(x, p) < δ, since the function f is continuous at p. Also there exists some natural number N such that d_X(x_n, p) < δ whenever n ≥ N, since the sequence x_1, x_2, x_3, ... converges to p. Thus if n ≥ N then d_Y(f(x_n), f(p)) < ε, as required.
2.5
Continuity of Functions with Values in Euclidean Spaces
Let f: X → R^n be a function mapping a set X into n-dimensional Euclidean space R^n. Then f(x) = (f_1(x), f_2(x), ..., f_n(x)) for all x ∈ X, where f_1, f_2, ..., f_n are functions from X to R, referred to as the components of the function f.
Proposition 2.5 Let X be a metric space, and let p be a point of X. A function f: X → R^n mapping X into the Euclidean space R^n is continuous at p if and only if its components are continuous at p.
Proof Note that the ith component f_i of f is given by f_i = p_i ◦ f, where p_i: R^n → R is the continuous function which maps (y_1, y_2, ..., y_n) ∈ R^n onto its ith coordinate y_i. It therefore follows immediately from Lemma 2.3 that if f is continuous at the point p, then so are the components of f.
Conversely suppose that the components of f are continuous at p ∈ X. Let ε > 0 be given. Then there exist positive real numbers δ_1, δ_2, ..., δ_n such that $|f_i(x) - f_i(p)| < \varepsilon/\sqrt{n}$ for x ∈ X satisfying d(x, p) < δ_i, where d denotes the distance function on the metric space X. Let δ be the minimum of δ_1, δ_2, ..., δ_n. If x ∈ X satisfies d(x, p) < δ then
$$|f(x) - f(p)|^2 = \sum_{i=1}^{n} |f_i(x) - f_i(p)|^2 < \varepsilon^2,$$
and hence |f(x) − f(p)| < ε. Thus the function f is continuous at p, as required.
2.6
Open Sets in Euclidean Spaces
Let X be a subset of R^n. Given a point p of X and a non-negative real number r, the open ball B_X(p, r) in X of radius r about p is defined to be the subset of X given by B_X(p, r) ≡ {x ∈ X : |x − p| < r}. (Thus B_X(p, r) is the set consisting of all points of X that lie within a sphere of radius r centred on the point p.)
Definition Let X be a subset of R^n. A subset V of X is said to be open in X if and only if, given any point p of V, there exists some δ > 0 such that B_X(p, δ) ⊂ V.
By convention, we regard the empty set ∅ as being an open subset of X. (The criterion given above is satisfied vacuously in the case when V is the empty set.) In particular, a subset V of R^n is said to be an open set (in R^n) if and only if, given any point p of V, there exists some δ > 0 such that B(p, δ) ⊂ V, where B(p, r) = {x ∈ R^n : |x − p| < r}.
Example Let H = {(x, y, z) ∈ R^3 : z > c}, where c is some real number. Then H is an open set in R^3. Indeed let p be a point of H. Then p = (u, v, w), where w > c. Let δ = w − c. If the distance from a point (x, y, z) to the point (u, v, w) is less than δ then |z − w| < δ, and hence z > c, so that (x, y, z) ∈ H. Thus B(p, δ) ⊂ H, and therefore H is an open set.
Lemma 2.6 Let X be a subset of R^n, and let p be a point of X. Then, for any positive real number r, the open ball B_X(p, r) in X of radius r about p is open in X.
Proof Let x be an element of B_X(p, r). We must show that there exists some δ > 0 such that B_X(x, δ) ⊂ B_X(p, r). Let δ = r − |x − p|. Then δ > 0, since |x − p| < r. Moreover if y ∈ B_X(x, δ) then |y − p| ≤ |y − x| + |x − p| < δ + |x − p| = r, by the Triangle Inequality, and hence y ∈ B_X(p, r). Thus B_X(x, δ) ⊂ B_X(p, r). This shows that B_X(p, r) is an open set, as required.
Lemma 2.7 Let X be a subset of R^n, and let p be a point of X. Then, for any non-negative real number r, the set {x ∈ X : |x − p| > r} is an open set in X.
Proof Let x be a point of X satisfying |x − p| > r, and let y be any point of X satisfying |y − x| < δ, where δ = |x − p| − r. Then |x − p| ≤ |x − y| + |y − p|, by the Triangle Inequality, and therefore |y − p| ≥ |x − p| − |y − x| > |x − p| − δ = r. Thus B_X(x, δ) is contained in the given set. The result follows.
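The choice δ = r − |x − p| made in the proof of Lemma 2.6 can be illustrated by sampling. The Python sketch below works in the plane with X taken to be the whole of R^2; the function name `inner_radius`, the sample centre and radius, and the fixed random seed are assumptions for the example, and a finite random sample illustrates the inclusion without proving it.

```python
import math
import random


def dist(u, v):
    """Euclidean distance between two points of the plane."""
    return math.sqrt((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2)


def inner_radius(x, p, r):
    """Return delta = r - |x - p|, the radius used in Lemma 2.6; x must lie in B(p, r)."""
    gap = r - dist(x, p)
    if gap <= 0:
        raise ValueError("x must lie in the open ball B(p, r)")
    return gap


p = (0.0, 0.0)
r = 1.0
x = (0.6, 0.0)
delta = inner_radius(x, p, r)

# Draw points y of the ball B(x, delta) by rejection sampling from a square.
rng = random.Random(0)
samples = []
while len(samples) < 1000:
    y = (x[0] + rng.uniform(-delta, delta), x[1] + rng.uniform(-delta, delta))
    if dist(y, x) < delta:
        samples.append(y)

# Every sampled point of B(x, delta) should also lie in B(p, r).
all_inside = all(dist(y, p) < r for y in samples)
```

With these values δ is 0.4, so the smaller ball about x reaches the boundary of the unit ball only at the single point (1, 0), which belongs to neither ball.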
2.7
Open Sets in Metric Spaces
Definition Let (X, d) be a metric space. Given a point p of X and r ≥ 0, the open ball B_X(p, r) of radius r about p in X is defined by B_X(p, r) = {x ∈ X : d(x, p) < r}.
Definition Let (X, d) be a metric space. A subset V of X is said to be an open set if and only if the following condition is satisfied:
• given any point v of V there exists some δ > 0 such that B_X(v, δ) ⊂ V.
By convention, we regard the empty set ∅ as being an open subset of X. (The criterion given above is satisfied vacuously in this case.)
Lemma 2.8 Let X be a metric space with distance function d, and let p be a point of X. Then, for any r > 0, the open ball B_X(p, r) of radius r about p is an open set in X.
Proof Let q ∈ B_X(p, r). We must show that there exists some δ > 0 such that B_X(q, δ) ⊂ B_X(p, r). Now d(q, p) < r, and hence δ > 0, where δ = r − d(q, p). Moreover if x ∈ B_X(q, δ) then d(x, p) ≤ d(x, q) + d(q, p) < δ + d(q, p) = r, by the Triangle Inequality, hence x ∈ B_X(p, r). Thus B_X(q, δ) ⊂ B_X(p, r), showing that B_X(p, r) is an open set, as required.
Lemma 2.9 Let X be a metric space with distance function d, and let p be a point of X. Then, for any r ≥ 0, the set {x ∈ X : d(x, p) > r} is an open set in X.
Proof Let q be a point of X satisfying d(q, p) > r, and let x be any point of X satisfying d(x, q) < δ, where δ = d(q, p) − r. Then d(q, p) ≤ d(q, x) + d(x, p), by the Triangle Inequality, and therefore d(x, p) ≥ d(q, p) − d(x, q) > d(q, p) − δ = r. Thus B_X(q, δ) ⊂ {x ∈ X : d(x, p) > r}, as required.
Proposition 2.10 Let X be a metric space. The collection of open sets in X has the following properties:
(i) the empty set ∅ and the whole set X are both open sets;
(ii) the union of any collection of open sets is itself an open set;
(iii) the intersection of any finite collection of open sets is itself an open set.
Proof The empty set ∅ is an open set by convention. Moreover the definition of an open set is satisfied trivially by the whole set X. Thus (i) is satisfied.
Let A be any collection of open sets in X, and let U denote the union of all the open sets belonging to A. We must show that U is itself an open set. Let x ∈ U. Then x ∈ V for some open set V belonging to the collection A. Therefore there exists some δ > 0 such that B_X(x, δ) ⊂ V. But V ⊂ U, and thus B_X(x, δ) ⊂ U. This shows that U is open. Thus (ii) is satisfied.
Finally let V_1, V_2, V_3, ..., V_k be a finite collection of open sets in X, and let V = V_1 ∩ V_2 ∩ ··· ∩ V_k. Let x ∈ V. Now x ∈ V_j for all j, and therefore there exist strictly positive real numbers δ_1, δ_2, ..., δ_k such that B_X(x, δ_j) ⊂ V_j for j = 1, 2, ..., k. Let δ be the minimum of δ_1, δ_2, ..., δ_k. Then δ > 0. (This is where we need the fact that we are dealing with a finite collection of open sets.) Moreover B_X(x, δ) ⊂ B_X(x, δ_j) ⊂ V_j for j = 1, 2, ..., k, and thus B_X(x, δ) ⊂ V. This shows that the intersection V of the open sets V_1, V_2, ..., V_k is itself open. Thus (iii) is satisfied.
Example The set {(x, y, z) ∈ R³ : (x − 1)² + (y − 2)² + z² < 9 and z > 1} is an open set in R³, since it is the intersection of the open ball of radius 3 about the point (1, 2, 0) with the open set {(x, y, z) ∈ R³ : z > 1}.
Example The set {(x, y, z) ∈ R³ : (x − 1)² + (y − 2)² + z² < 4 or z > 1} is an open set in R³, since it is the union of the open ball of radius 2 about the point (1, 2, 0) with the open set {(x, y, z) ∈ R³ : z > 1}.
Example For each natural number k, let Vk = {(x, y, z) ∈ R³ : k²(x² + y² + z²) < 1}. Now each set Vk is an open ball of radius 1/k about the origin, and is therefore an open set in R³. However the intersection of the sets Vk for all natural numbers k is the set {(0, 0, 0)}, and thus the intersection of the sets Vk for all natural numbers k is not itself an open set in R³. This example demonstrates that infinite intersections of open sets need not be open.
Lemma 2.11 A sequence x1, x2, x3, . . . of points in Rⁿ converges to a point p if and only if, given any open set U which contains p, there exists some natural number N such that xj ∈ U for all j satisfying j ≥ N.
Proof Suppose that the sequence x1, x2, x3, . . . has the property that, given any open set U which contains p, there exists some natural number N such that xj ∈ U whenever j ≥ N. Let ε > 0 be given. The open ball B(p, ε) of radius ε about p is an open set by Lemma 2.6. Therefore there exists some natural number N such that xj ∈ B(p, ε) whenever j ≥ N. Thus |xj − p| < ε whenever j ≥ N. This shows that the sequence converges to p.
Conversely, suppose that the sequence x1, x2, x3, . . . converges to p. Let U be an open set which contains p. Then there exists some ε > 0 such that the open ball B(p, ε) of radius ε about p is a subset of U. Thus there exists some ε > 0 such that U contains all points x of Rⁿ that satisfy |x − p| < ε. But there exists some natural number N with the property that |xj − p| < ε whenever j ≥ N, since the sequence converges to p. Therefore xj ∈ U whenever j ≥ N, as required.
Lemma 2.12 Let X be a metric space. A sequence x1, x2, x3, . . . of points in X converges to a point p if and only if, given any open set U which contains p, there exists some natural number N such that xj ∈ U for all j ≥ N.
Proof Let x1, x2, x3, . . . be a sequence satisfying the given criterion, and let ε > 0 be given.
The open ball BX(p, ε) of radius ε about p is an open set (see Lemma 2.8). Therefore there exists some natural number N such that, if j ≥ N, then xj ∈ BX(p, ε), and thus d(xj, p) < ε. Hence the sequence (xj) converges to p.
Conversely, suppose that the sequence (xj) converges to p. Let U be an open set which contains p. Then there exists some ε > 0 such that BX(p, ε) ⊂ U. But xj → p as j → +∞, and therefore there exists some natural number N such that d(xj, p) < ε for all j ≥ N. If j ≥ N then xj ∈ BX(p, ε) and thus xj ∈ U, as required.
Definition Let (X, d) be a metric space, and let x be a point of X. A subset N of X is said to be a neighbourhood of x (in X) if and only if there exists some δ > 0 such that BX (x, δ) ⊂ N , where BX (x, δ) is the open ball of radius δ about x. It follows directly from the relevant definitions that a subset V of a metric space X is an open set if and only if V is a neighbourhood of v for all v ∈ V .
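The earlier example of the nested balls Vk can be checked by direct computation. The following Python sketch is illustrative only: the names in_V and first_k_excluding are hypothetical, and exact rational coordinates (fractions.Fraction) are assumed so that the strict inequality is tested without rounding. For any point other than the origin it finds the smallest k for which the point lies outside Vk, which is why only the origin survives in the intersection.

```python
from fractions import Fraction
from itertools import count


def in_V(k, point):
    """Return True if point lies in V_k = {k^2 (x^2 + y^2 + z^2) < 1}."""
    x, y, z = point
    return k * k * (x * x + y * y + z * z) < 1


def first_k_excluding(point):
    """Smallest natural number k with point not in V_k, or None for the origin.

    The origin lies in every V_k, so a search would never terminate there.
    """
    if all(c == 0 for c in point):
        return None
    return next(k for k in count(1) if not in_V(k, point))


p = (Fraction(1, 10), Fraction(0), Fraction(0))
print(first_k_excluding(p))  # 10
```

Since every point other than the origin is excluded by some Vk, no open ball about the origin fits inside the intersection.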
2.8
Closed Sets in a Metric Space
A subset F of a metric space X is said to be a closed set in X if and only if its complement X \ F is open. (Recall that the complement X \ F of F in X is, by definition, the set of all points of the metric space X that do not belong to F.)
Example The sets {(x, y, z) ∈ R³ : z ≥ c}, {(x, y, z) ∈ R³ : z ≤ c}, and {(x, y, z) ∈ R³ : z = c} are closed sets in R³ for each real number c, since the complements of these sets are open in R³.
The following result follows immediately from Lemma 2.8 and Lemma 2.9.
Lemma 2.13 Let X be a metric space with distance function d, and let x0 ∈ X. Given any r ≥ 0, the sets {x ∈ X : d(x, x0) ≤ r} and {x ∈ X : d(x, x0) ≥ r} are closed. In particular, the set {x0} consisting of the single point x0 is a closed set in X.
Let A be some collection of subsets of a set X. Then
\[ X \setminus \bigcup_{S \in A} S = \bigcap_{S \in A} (X \setminus S), \qquad X \setminus \bigcap_{S \in A} S = \bigcup_{S \in A} (X \setminus S) \]
(i.e., the complement of the union of some collection of subsets of X is the intersection of the complements of those sets, and the complement of the intersection of some collection of subsets of X is the union of the complements of those sets, so that the operation of taking complements converts unions into intersections and intersections into unions). The following result therefore follows directly from Proposition 2.10.
Proposition 2.14 Let X be a metric space. The collection of closed sets in X has the following properties:
(i) the empty set ∅ and the whole set X are both closed sets;
(ii) the intersection of any collection of closed sets in X is itself a closed set;
(iii) the union of any finite collection of closed sets in X is itself a closed set.
Lemma 2.15 Let F be a closed set in a metric space X and let (xj : j ∈ N) be a sequence of points of F. Suppose that xj → p as j → +∞. Then p also belongs to F.
Proof Suppose that the limit p of the sequence were to belong to the complement X \ F of the closed set F. Now X \ F is open, and thus it would follow from Lemma 2.12 that there would exist some natural number N such that xj ∈ X \ F for all j ≥ N, contradicting the fact that xj ∈ F for all j. This contradiction shows that p must belong to F, as required.
Definition Let A be a subset of a metric space X. The closure Ā of A is the intersection of all closed subsets of X containing A.
Let A be a subset of the metric space X. Note that the closure Ā of A is itself a closed set in X, since the intersection of any collection of closed subsets of X is itself a closed subset of X (see Proposition 2.14). Moreover if F is any closed subset of X, and if A ⊂ F, then Ā ⊂ F. Thus the closure Ā of A is the smallest closed subset of X containing A.
Lemma 2.16 Let X be a metric space with distance function d, let A be a subset of X, and let x be a point of X. Then x belongs to the closure Ā of A if and only if, given any ε > 0, there exists some point a of A such that d(x, a) < ε.
Proof Let x be a point of X with the property that, given any ε > 0, there exists some a ∈ A satisfying d(x, a) < ε. Let F be any closed subset of X containing A. If x did not belong to F then, since X \ F is open, there would exist some ε > 0 with the property that BX(x, ε) ∩ F = ∅, where BX(x, ε) denotes the open ball of radius ε about x. But this would contradict the fact that BX(x, ε) ∩ A is non-empty for all ε > 0. Thus the point x belongs to every closed subset F of X that contains A, and therefore x ∈ Ā, by definition of the closure Ā of A.
Conversely let x ∈ Ā, and let ε > 0 be given. Let F be the complement X \ BX(x, ε) of BX(x, ε). Then F is a closed subset of X, and the point x does not belong to F. If BX(x, ε) ∩ A = ∅ then A would be contained in F, so that Ā ⊂ F and hence x ∈ F, which is impossible. Therefore there exists a ∈ A satisfying d(x, a) < ε, as required.
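Lemma 2.16 can be illustrated with the set A = {1/n : n ∈ N} in R, an example that is not taken from the text. The point 0 does not belong to A, but it does belong to the closure of A: given ε > 0, the sketch below produces a point a of A with |0 − a| < ε. The helper name witness is hypothetical, and ε is assumed to be an exact positive rational.

```python
from fractions import Fraction
from math import floor


def witness(eps):
    """Return a point a = 1/n of A = {1/n : n >= 1} with |0 - a| < eps."""
    n = floor(1 / eps) + 1  # n > 1/eps, hence 1/n < eps
    return Fraction(1, n)


for eps in (Fraction(1, 2), Fraction(1, 100), Fraction(3, 1000)):
    a = witness(eps)
    assert 0 < a < eps
```

Because such a witness exists for every ε > 0, the criterion of the lemma places 0 in the closure of A.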
2.9
Continuous Functions and Open and Closed Sets
Let X and Y be metric spaces, and let f : X → Y be a function from X to Y. We recall that the function f is continuous at a point p of X if and only if, given any ε > 0, there exists some δ > 0 such that dY(f(x), f(p)) < ε for all points x of X satisfying dX(x, p) < δ, where dX and dY denote the distance functions on X and Y respectively. Expressed in terms of open balls, this means that the function f : X → Y is continuous at p if and only if, given any ε > 0, there exists some δ > 0 such that f maps BX(p, δ) into BY(f(p), ε) (where BX(p, δ) and BY(f(p), ε) denote the open balls of radius δ and ε about p and f(p) respectively).
Let f : X → Y be a function from a set X to a set Y. Given any subset V of Y, we denote by f⁻¹(V) the preimage of V under the map f, defined by f⁻¹(V) = {x ∈ X : f(x) ∈ V}.
Proposition 2.17 Let X and Y be metric spaces, and let f : X → Y be a function from X to Y. The function f is continuous if and only if f⁻¹(V) is an open set in X for every open set V of Y.
Proof Suppose that f : X → Y is continuous. Let V be an open set in Y. We must show that f⁻¹(V) is open in X. Let p be a point belonging to f⁻¹(V). We must show that there exists some δ > 0 with the property that BX(p, δ) ⊂ f⁻¹(V). Now f(p) belongs to V. But V is open, hence there exists some ε > 0 with the property that BY(f(p), ε) ⊂ V. But f is continuous at p. Therefore there exists some δ > 0 such that f maps the open ball BX(p, δ) into BY(f(p), ε) (see the remarks above). Thus f(x) ∈ V for all x ∈ BX(p, δ), showing that BX(p, δ) ⊂ f⁻¹(V). We have thus shown that if f : X → Y is continuous then f⁻¹(V) is open in X for every open set V in Y.
Conversely suppose that f : X → Y has the property that f⁻¹(V) is open in X for every open set V in Y. Let p be any point of X. We must show that f is continuous at p. Let ε > 0 be given. The open ball BY(f(p), ε) is an open set in Y, by Lemma 2.8, hence f⁻¹(BY(f(p), ε)) is an open set in X which contains p. It follows that there exists some δ > 0 such that BX(p, δ) ⊂ f⁻¹(BY(f(p), ε)). We have thus shown that, given any ε > 0, there exists some δ > 0 such that f maps the open ball BX(p, δ) into BY(f(p), ε). We conclude that f is continuous at p, as required.
Let f : X → Y be a function between metric spaces X and Y. Then the preimage f⁻¹(Y \ G) of the complement Y \ G of any subset G of Y is equal to the complement X \ f⁻¹(G) of the preimage f⁻¹(G) of G. Indeed x ∈ f⁻¹(Y \ G) ⇐⇒ f(x) ∈ Y \ G ⇐⇒ f(x) ∉ G ⇐⇒ x ∉ f⁻¹(G).
Also a subset of a metric space is closed if and only if its complement is open. The following result therefore follows directly from Proposition 2.17.
Corollary 2.18 Let X and Y be metric spaces, and let f : X → Y be a function from X to Y. The function f is continuous if and only if f⁻¹(G) is a closed set in X for every closed set G in Y.
Let f : X → Y be a continuous function from a metric space X to a metric space Y. Then, for any point y of Y, the set {x ∈ X : f(x) = y} is a closed subset of X. This follows from Corollary 2.18, together with the fact that the set {y} consisting of the single point y is a closed subset of the metric space Y.
Let X be a metric space, and let f : X → R be a continuous function from X to R. Then, given any real number c, the sets {x ∈ X : f(x) > c} and {x ∈ X : f(x) < c} are open subsets of X, and the sets {x ∈ X : f(x) ≥ c}, {x ∈ X : f(x) ≤ c} and {x ∈ X : f(x) = c} are closed subsets of X. Also, given real numbers a and b satisfying a < b, the set {x ∈ X : a < f(x) < b} is an open subset of X, and the set {x ∈ X : a ≤ f(x) ≤ b} is a closed subset of X.
Similar results hold for continuous functions f : X → C from X to C. Thus, for example, {x ∈ X : |f(x)| < R} and {x ∈ X : |f(x)| > R} are open subsets of X, and {x ∈ X : |f(x)| ≤ R}, {x ∈ X : |f(x)| ≥ R} and {x ∈ X : |f(x)| = R} are closed subsets of X, for any non-negative real number R.
2.10
Homeomorphisms
Let X and Y be metric spaces. A function h : X → Y from X to Y is said to be a homeomorphism if it is a bijection and both h : X → Y and its inverse h⁻¹ : Y → X are continuous. If there exists a homeomorphism h : X → Y from a metric space X to a metric space Y, then the metric spaces X and Y are said to be homeomorphic.
The following result follows directly on applying Proposition 2.17 to h : X → Y and to h⁻¹ : Y → X.
Lemma 2.19 Any homeomorphism h : X → Y between metric spaces X and Y induces a one-to-one correspondence between the open sets of X and the open sets of Y: a subset V of Y is open in Y if and only if h⁻¹(V) is open in X.
Let X and Y be metric spaces, and let h : X → Y be a homeomorphism. A sequence x1, x2, x3, . . . of points in X is convergent in X if and only if the corresponding sequence h(x1), h(x2), h(x3), . . . is convergent in Y. (This follows directly on applying Lemma 2.4 to h : X → Y and its inverse h⁻¹ : Y → X.) Let Z and W be metric spaces. A function f : Z → X is continuous if and only if h ◦ f : Z → Y is continuous, and a function g : Y → W is continuous if and only if g ◦ h : X → W is continuous.
Course 221: Michaelmas Term 2006
Section 3: Complete Metric Spaces, Normed Vector Spaces and Banach Spaces
David R. Wilkins
Copyright © David R. Wilkins 1997–2006
Contents
3 Complete Metric Spaces, Normed Vector Spaces and Banach Spaces
3.1 The Least Upper Bound Principle
3.2 Monotonic Sequences of Real Numbers
3.3 Upper and Lower Limits of Bounded Sequences of Real Numbers
3.4 Convergence of Sequences in Euclidean Space
3.5 Cauchy’s Criterion for Convergence
3.6 The Bolzano-Weierstrass Theorem
3.7 Complete Metric Spaces
3.8 Normed Vector Spaces
3.9 Bounded Linear Transformations
3.10 Spaces of Bounded Continuous Functions on a Metric Space
3.11 The Contraction Mapping Theorem and Picard’s Theorem
3.12 The Completion of a Metric Space
3
Complete Metric Spaces, Normed Vector Spaces and Banach Spaces
3.1
The Least Upper Bound Principle
A set S of real numbers is said to be bounded above if there exists some real number B such that x ≤ B for all x ∈ S. Similarly a set S of real numbers is said to be bounded below if there exists some real number A such that x ≥ A for all x ∈ S. A set S of real numbers is said to be bounded if it is bounded above and below. Thus a set S of real numbers is bounded if and only if there exist real numbers A and B such that A ≤ x ≤ B for all x ∈ S.
Any bounded non-decreasing sequence of real numbers is convergent. This result can be proved using the Least Upper Bound Principle. The Least Upper Bound Principle expresses a basic property of the real number system. It states that, given any non-empty set S of real numbers that is bounded above, there exists a least upper bound (or supremum) for the set S. We shall denote the least upper bound of such a set S by sup S. It is the least real number with the property that s ≤ sup S for all s ∈ S.
The Least Upper Bound Principle also guarantees that, given any non-empty set S of real numbers that is bounded below, there exists a greatest lower bound (or infimum) for the set S. We shall denote the greatest lower bound of such a set S by inf S. It is the greatest real number with the property that s ≥ inf S for all s ∈ S. One can readily verify that inf S = − sup{x ∈ R : −x ∈ S} for any non-empty set S of real numbers that is bounded below.
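The identity inf S = − sup{x ∈ R : −x ∈ S} can be checked on a small example. The sketch below is only an illustration under a strong simplifying assumption: S is a non-empty finite set, so that its supremum is simply its maximum. The function names sup and inf_via_sup are hypothetical.

```python
def sup(values):
    """Least upper bound of a non-empty finite set (its maximum)."""
    return max(values)


def inf_via_sup(values):
    """Greatest lower bound computed as -sup{x : -x in S}."""
    return -sup({-x for x in values})


S = {3, -7, 2.5, 0}
assert inf_via_sup(S) == min(S)
```

For infinite sets the supremum need not be attained, so this check says nothing about that case beyond illustrating the sign reversal.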
3.2
Monotonic Sequences of Real Numbers
An infinite sequence a1, a2, a3, . . . of real numbers is said to be bounded above if the corresponding set {a1, a2, a3, . . .} of values of the sequence is bounded above. Similarly an infinite sequence a1, a2, a3, . . . of real numbers is said to be bounded below if the set {a1, a2, a3, . . .} is bounded below. An infinite sequence is said to be bounded if it is bounded above and below. Thus an infinite sequence a1, a2, a3, . . . of real numbers is bounded if and only if there exist real numbers A and B such that A ≤ aj ≤ B for all positive integers j.
An infinite sequence a1, a2, a3, . . . is said to be non-decreasing if aj+1 ≥ aj for all positive integers j. Similarly an infinite sequence a1, a2, a3, . . . is said to be non-increasing if aj+1 ≤ aj for all positive integers j. A sequence is said to be monotonic if it is non-increasing, or it is non-decreasing.
Theorem 3.1 Any non-decreasing sequence of real numbers that is bounded above is convergent. Similarly any non-increasing sequence of real numbers that is bounded below is convergent. Proof Let a1 , a2 , a3 , . . . be a non-decreasing sequence of real numbers that is bounded above. It follows from the Least Upper Bound Principle that there exists a least upper bound l for the set {aj : j ∈ N}. We claim that the sequence converges to l. Let ε > 0 be given. We must show that there exists some positive integer N such that |aj − l| < ε whenever j ≥ N . Now l − ε is not an upper bound for the set {aj : j ∈ N} (since l is the least upper bound), and therefore there must exist some positive integer N such that aN > l − ε. But then l − ε < aj ≤ l whenever j ≥ N , since the sequence is non-decreasing and bounded above by l. Thus |aj − l| < ε whenever j ≥ N . Therefore aj → l as j → +∞, as required. If the sequence a1 , a2 , a3 , . . . is non-increasing and bounded below then the sequence −a1 , −a2 , −a3 , . . . is non-decreasing and bounded above, and is therefore convergent. It follows that the sequence a1 , a2 , a3 , . . . is also convergent.
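The key step in the proof of Theorem 3.1 is that l − ε fails to be an upper bound, so some term aN exceeds it. The sketch below illustrates this for the sequence aj = 1 − 1/j, which is an example chosen here and not taken from the text; its least upper bound is 1. The function name first_index_within is hypothetical, and exact rationals are assumed.

```python
from fractions import Fraction
from itertools import count


def a(j):
    """Non-decreasing sequence a_j = 1 - 1/j, bounded above with supremum 1."""
    return 1 - Fraction(1, j)


def first_index_within(eps, limit=Fraction(1)):
    """Smallest N with a_N > limit - eps, as in the proof of Theorem 3.1."""
    return next(N for N in count(1) if a(N) > limit - eps)


N = first_index_within(Fraction(1, 10))
print(N)  # 11
```

Because the sequence is non-decreasing, every later term also lies within ε of the limit once this N has been found.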
3.3
Upper and Lower Limits of Bounded Sequences of Real Numbers
Let a1, a2, a3, . . . be a bounded infinite sequence of real numbers, and, for each positive integer j, let Sj = {aj, aj+1, aj+2, . . .} = {ak : k ≥ j}. The sets S1, S2, S3, . . . are all bounded. It follows that there exist well-defined infinite sequences u1, u2, u3, . . . and l1, l2, l3, . . . of real numbers, where uj = sup Sj and lj = inf Sj for all positive integers j. Now Sj+1 is a subset of Sj for each positive integer j, and therefore uj+1 ≤ uj and lj+1 ≥ lj for each positive integer j. It follows that the bounded infinite sequence (uj : j ∈ N) is a non-increasing sequence, and is therefore convergent (Theorem 3.1). Similarly the bounded infinite sequence (lj : j ∈ N) is a non-decreasing sequence, and is therefore convergent. We define
\[ \limsup_{j \to +\infty} a_j = \lim_{j \to +\infty} u_j = \lim_{j \to +\infty} \sup\{a_j, a_{j+1}, a_{j+2}, \ldots\}, \]
\[ \liminf_{j \to +\infty} a_j = \lim_{j \to +\infty} l_j = \lim_{j \to +\infty} \inf\{a_j, a_{j+1}, a_{j+2}, \ldots\}. \]
The quantity lim sup_{j→+∞} aj is referred to as the upper limit of the sequence a1, a2, a3, . . .. The quantity lim inf_{j→+∞} aj is referred to as the lower limit of the sequence a1, a2, a3, . . .. Note that every bounded infinite sequence a1, a2, a3, . . . of real numbers has a well-defined upper limit lim sup_{j→+∞} aj and a well-defined lower limit lim inf_{j→+∞} aj.
Proposition 3.2 A bounded infinite sequence a1, a2, a3, . . . of real numbers is convergent if and only if lim inf_{j→+∞} aj = lim sup_{j→+∞} aj, in which case the limit of the sequence is equal to the common value of its upper and lower limits.
Proof For each positive integer j, let uj = sup Sj and lj = inf Sj, where Sj = {aj, aj+1, aj+2, . . .} = {ak : k ≥ j}. Then lim inf_{j→+∞} aj = lim_{j→+∞} lj and lim sup_{j→+∞} aj = lim_{j→+∞} uj.
Suppose that lim inf_{j→+∞} aj = lim sup_{j→+∞} aj = c for some real number c. Then, given any positive real number ε, there exist natural numbers N1 and N2 such that c − ε < lj ≤ c whenever j ≥ N1, and c ≤ uj < c + ε whenever j ≥ N2. Let N be the maximum of N1 and N2. If j ≥ N then aj ∈ SN, and therefore c − ε < lN ≤ aj ≤ uN < c + ε. Thus |aj − c| < ε whenever j ≥ N. This proves that the infinite sequence a1, a2, a3, . . . converges to the limit c.
Conversely let a1, a2, a3, . . . be a bounded sequence of real numbers that converges to some value c. Let ε > 0 be given. Then there exists some natural number N such that c − ε/2 < aj < c + ε/2 whenever j ≥ N. It follows that Sj ⊂ (c − ε/2, c + ε/2) whenever j ≥ N. But then c − ε/2 ≤ lj ≤ uj ≤ c + ε/2 whenever j ≥ N, where uj = sup Sj and lj = inf Sj. We see from this that, given any positive real number ε, there exists some natural number N such that |lj − c| < ε and |uj − c| < ε whenever j ≥ N. It follows from this that lim sup_{j→+∞} aj = lim_{j→+∞} uj = c and lim inf_{j→+∞} aj = lim_{j→+∞} lj = c, as required.
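The tail suprema uj and tail infima lj used above can be tabulated for a concrete sequence. The sketch below uses aj = (−1)^j (1 + 1/j), an example chosen here and not taken from the text. It rests on an assumption that should be stated plainly: the maximum and minimum over a finite window aj, . . . , a_horizon stand in for the supremum and infimum of the infinite tail. For this particular sequence the substitution is exact whenever the window contains at least two terms, because |ak| decreases with k. The function name tail_sup_inf is hypothetical.

```python
from fractions import Fraction


def a(j):
    """a_j = (-1)^j (1 + 1/j): bounded, but not convergent."""
    return (-1) ** j * (1 + Fraction(1, j))


def tail_sup_inf(j, horizon):
    """Return (max, min) of a_j, ..., a_horizon, standing in for (u_j, l_j)."""
    tail = [a(k) for k in range(j, horizon + 1)]
    return max(tail), min(tail)


for j in (1, 10, 100):
    u, l = tail_sup_inf(j, horizon=1000)
    print(j, u, l)
```

The values of uj decrease towards 1 and those of lj increase towards −1, so the upper and lower limits differ and, by Proposition 3.2, the sequence does not converge.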
3.4
Convergence of Sequences in Euclidean Space
Lemma 3.3 Let p be a point of Rⁿ, where p = (p1, p2, . . . , pn). Then a sequence x1, x2, x3, . . . of points in Rⁿ converges to p if and only if the ith components of the elements of this sequence converge to pi for i = 1, 2, . . . , n.
Proof Let xji and pi denote the ith components of xj and p. Then |xji − pi| ≤ |xj − p| for all j. It follows directly from the definition of convergence that if xj → p as j → +∞ then xji → pi as j → +∞.
Conversely suppose that, for each i, xji → pi as j → +∞. Let ε > 0 be given. Then there exist natural numbers N1, N2, . . . , Nn such that |xji − pi| < ε/√n whenever j ≥ Ni. Let N be the maximum of N1, N2, . . . , Nn. If j ≥ N then
\[ |x_j - p|^2 = \sum_{i=1}^{n} (x_{ji} - p_i)^2 < n(\varepsilon/\sqrt{n})^2 = \varepsilon^2, \]
so that xj → p as j → +∞.
3.5
Cauchy’s Criterion for Convergence
Definition An infinite sequence a1, a2, a3, . . . of real numbers is said to be a Cauchy sequence if, given any positive real number ε, there exists some positive integer N such that |aj − ak| < ε for all j and k satisfying j ≥ N and k ≥ N.
Theorem 3.4 (Cauchy’s Criterion for Convergence) A sequence of real numbers is convergent if and only if it is a Cauchy sequence.
Proof Let a1, a2, a3, . . . be a sequence of real numbers. Suppose that this sequence converges to some limit c. Let some positive real number ε be given. Then there exists some natural number N such that |aj − c| < ε/2 whenever j ≥ N. If j and k are positive integers satisfying j ≥ N and k ≥ N then |aj − ak| ≤ |aj − c| + |c − ak| < ε/2 + ε/2 = ε. This shows that any convergent sequence of real numbers is a Cauchy sequence.
Next let a1, a2, a3, . . . be a Cauchy sequence of real numbers. We must prove that this sequence is convergent. First we show that it is bounded. Now there exists some natural number M such that |aj − ak| < 1 for all positive integers j and k satisfying j ≥ M and k ≥ M. Let R be the maximum of the real numbers |a1|, |a2|, . . . , |aM−1|, |aM| + 1.
It is clear that |aj| ≤ R when j < M. If j ≥ M then |aj − aM| < 1, and therefore |aj| < |aM| + 1 ≤ R. Thus |aj| ≤ R for all positive integers j. This proves that the Cauchy sequence is bounded.
For each positive integer j, let uj = sup{ak : k ≥ j} and lj = inf{ak : k ≥ j}. Then u1, u2, u3, . . . is a non-increasing sequence which converges to lim sup_{j→+∞} aj, and l1, l2, l3, . . . is a non-decreasing sequence which converges to lim inf_{j→+∞} aj.
Let ε be some given positive real number. Then there exists some natural number N such that |aj − ak| < ε for all positive integers j and k satisfying j ≥ N and k ≥ N. It follows from this that aN − ε < aj < aN + ε for all positive integers j satisfying j ≥ N. It then follows from the definitions of uN and lN that aN − ε ≤ lN ≤ uN ≤ aN + ε. Now 0 ≤ uj − lj ≤ uN − lN whenever j ≥ N. It follows that
\[ \limsup_{j \to +\infty} a_j - \liminf_{j \to +\infty} a_j = \lim_{j \to +\infty} (u_j - l_j) \leq u_N - l_N \leq 2\varepsilon. \]
Thus if d = lim sup_{j→+∞} aj − lim inf_{j→+∞} aj then 0 ≤ d ≤ 2ε for all positive real numbers ε. It must therefore be the case that d = 0. Thus lim sup_{j→+∞} aj = lim inf_{j→+∞} aj. It now follows from Proposition 3.2 that the Cauchy sequence a1, a2, a3, . . . is convergent, as required.
An infinite sequence x1, x2, x3, . . . of points of n-dimensional Euclidean space Rⁿ is said to be a Cauchy sequence if, given any positive real number ε, there exists some positive integer N such that |xj − xk| < ε for all j and k satisfying j ≥ N and k ≥ N.
Corollary 3.5 Every Cauchy sequence of points of n-dimensional Euclidean space Rⁿ is convergent.
Proof If an infinite sequence x1, x2, x3, . . . of points in Rⁿ is a Cauchy sequence, then, for each integer i between 1 and n, the ith components of those points constitute a Cauchy sequence of real numbers. But every Cauchy sequence of real numbers is convergent (Theorem 3.4). Therefore the ith components of the sequence x1, x2, x3, . . . converge. It then follows from Lemma 3.3 that the Cauchy sequence x1, x2, x3, . . . converges to some point of Rⁿ, as required.
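The Cauchy condition can be checked directly for a concrete sequence. The sketch below uses the partial sums aj = 1/2 + 1/4 + · · · + 1/2^j, an example chosen here and not taken from the text. For j, k ≥ N one has |aj − ak| < 2^(−N), so any N with 2^(−N) ≤ ε works in the definition. The function name cauchy_index is hypothetical, and only a finite range of indices is tested.

```python
from fractions import Fraction


def a(j):
    """Partial sum a_j = 1/2 + 1/4 + ... + 1/2^j."""
    return sum(Fraction(1, 2 ** i) for i in range(1, j + 1))


def cauchy_index(eps):
    """Smallest N with 2^(-N) <= eps; then |a_j - a_k| < eps for j, k >= N."""
    N = 1
    while Fraction(1, 2 ** N) > eps:
        N += 1
    return N


eps = Fraction(1, 1000)
N = cauchy_index(eps)
assert all(abs(a(j) - a(k)) < eps
           for j in range(N, N + 20) for k in range(N, N + 20))
```

Note that the check never refers to the limit of the sequence, which is the practical advantage of Cauchy's criterion.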
3.6
The Bolzano-Weierstrass Theorem
Let a1, a2, a3, . . . be an infinite sequence of real numbers. A subsequence of this sequence is a sequence that is of the form a_{m_1}, a_{m_2}, a_{m_3}, . . ., where m_1, m_2, m_3, . . . are positive integers satisfying m_1 < m_2 < m_3 < · · · . Thus, for example, a2, a4, a6, . . . and a1, a4, a9, . . . are subsequences of the given sequence.
Lemma 3.6 Let a1, a2, a3, . . . be a bounded infinite sequence of real numbers, and let c be a real number satisfying c < lim sup_{j→+∞} aj. Then there exist infinitely many positive integers j such that aj > c.
Proof Let N be a positive integer. Then c < lim sup_{j→+∞} aj ≤ sup{aj : j ≥ N}. It follows that c is not an upper bound for the set {aj : j ≥ N}, and therefore there exists some positive integer j satisfying j ≥ N for which aj > c. We conclude from this that there does not exist any positive integer N with the property that aj ≤ c whenever j ≥ N. Therefore {j ∈ N : aj > c} is not a finite set. The result follows.
Proposition 3.7 Any bounded infinite sequence a1, a2, a3, . . . of real numbers has a subsequence which converges to the upper limit lim sup_{j→+∞} aj of the given sequence.
Proof Let s = lim sup_{j→+∞} aj, and let uN = sup{aN, aN+1, aN+2, . . .} = sup{aj : j ≥ N} for all positive integers N. The upper limit s of the sequence a1, a2, a3, . . . is then the limit of the non-increasing sequence u1, u2, u3, . . ..
Let ε be a positive real number. The convergence of the infinite sequence u1, u2, u3, . . . to s ensures that there exists some positive integer N such that uN < s + ε. But then aj < s + ε whenever j ≥ N. It follows that the number of positive integers j for which aj ≥ s + ε is finite. Also it follows from Lemma 3.6 that the number of positive integers j for which aj > s − ε is infinite. Putting these two facts together, we see that the number of positive integers j for which s − ε < aj < s + ε is infinite. (Indeed let S1 = {j ∈ N : aj > s − ε} and S2 = {j ∈ N : aj ≥ s + ε}. Then S1 is an
infinite set, S2 is a finite set, and therefore S1 \ S2 is an infinite set. Moreover s − ε < aj < s + ε for all j ∈ S1 \ S2.)
Now given any positive integer j, and given any positive integer m_j such that |a_{m_j} − s| < j⁻¹, there exists some positive integer m_{j+1} such that m_{j+1} > m_j and |a_{m_{j+1}} − s| < (j + 1)⁻¹ (apply the preceding observation with ε = (j + 1)⁻¹). It follows from this that there exists a subsequence a_{m_1}, a_{m_2}, a_{m_3}, . . . of the infinite sequence a1, a2, a3, . . ., where m_1 < m_2 < m_3 < · · ·, which has the property that |a_{m_j} − s| < j⁻¹ for all positive integers j. This subsequence converges to s as required.
The following theorem, known as the Bolzano-Weierstrass Theorem, is an immediate consequence of Proposition 3.7.
Theorem 3.8 (Bolzano-Weierstrass) Every bounded sequence of real numbers has a convergent subsequence.
The following result is the analogue of the Bolzano-Weierstrass Theorem for sequences in n-dimensional Euclidean space.
Corollary 3.9 Every bounded sequence of points in Rⁿ has a convergent subsequence.
Proof Let x_1, x_2, x_3, . . . be a bounded sequence of points in Rⁿ. Let us denote by x_j^{(i)} the ith component of the point x_j, so that x_j = (x_j^{(1)}, x_j^{(2)}, . . . , x_j^{(n)}) for all positive integers j. Suppose that, for some integer s between 1 and n − 1, the sequence x_1, x_2, x_3, . . . has a subsequence x_{p_1}, x_{p_2}, x_{p_3}, . . . with the property that, for each integer i satisfying 1 ≤ i ≤ s, the ith components of the members of this subsequence constitute a convergent sequence x_{p_1}^{(i)}, x_{p_2}^{(i)}, x_{p_3}^{(i)}, . . . of real numbers. Let a_j = x_{p_j}^{(s+1)} for each positive integer j. Then a_1, a_2, a_3, . . . is a bounded sequence of real numbers. It follows from the Bolzano-Weierstrass Theorem (Theorem 3.8) that this sequence has a convergent subsequence a_{m_1}, a_{m_2}, a_{m_3}, . . ., where m_1 < m_2 < m_3 < · · ·. Let q_j = p_{m_j} for each positive integer j. Then x_{q_1}, x_{q_2}, x_{q_3}, . . . is a subsequence of the original bounded sequence x_1, x_2, x_3, . . . which has the property that, for each integer i satisfying 1 ≤ i ≤ s + 1, the ith components of the members of the subsequence constitute a convergent sequence x_{q_1}^{(i)}, x_{q_2}^{(i)}, x_{q_3}^{(i)}, . . . of real numbers.
Repeated applications of this result show that the bounded sequence x_1, x_2, x_3, . . . has a subsequence x_{r_1}, x_{r_2}, x_{r_3}, . . . with the property that, for each integer i satisfying 1 ≤ i ≤ n, the ith components of the members of the subsequence constitute a convergent sequence of real numbers. Let z = (z_1, z_2, . . . , z_n) where, for each value of i between 1 and n, the ith component z_i of z is the limit of the sequence x_{r_1}^{(i)}, x_{r_2}^{(i)}, x_{r_3}^{(i)}, . . . of ith components of the members of the subsequence x_{r_1}, x_{r_2}, x_{r_3}, . . . . Then this subsequence converges to the point z (Lemma 3.3), as required.
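The index selection in the proof of Proposition 3.7 can be carried out mechanically. The sketch below does this for aj = (−1)^j (1 + 1/j), an example chosen here and not taken from the text, whose upper limit is s = 1. The function name subsequence_indices is hypothetical. The search for each index is unbounded, so it is guaranteed to stop only because s really is the upper limit of the sequence; for any other value of s it might run forever.

```python
from fractions import Fraction
from itertools import count


def a(j):
    """a_j = (-1)^j (1 + 1/j), whose upper limit is s = 1."""
    return (-1) ** j * (1 + Fraction(1, j))


def subsequence_indices(s, how_many):
    """Choose m_1 < m_2 < ... with |a_{m_j} - s| < 1/j, as in Proposition 3.7."""
    indices, previous = [], 0
    for j in range(1, how_many + 1):
        m = next(m for m in count(previous + 1)
                 if abs(a(m) - s) < Fraction(1, j))
        indices.append(m)
        previous = m
    return indices


print(subsequence_indices(Fraction(1), 5))  # [2, 4, 6, 8, 10]
```

Here the construction picks out the even-indexed terms, which form a subsequence converging to 1.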
3.7
Complete Metric Spaces
Definition Let X be a metric space with distance function d. A sequence x1, x2, x3, . . . of points of X is said to be a Cauchy sequence in X if and only if, given any ε > 0, there exists some positive integer N such that d(xj, xk) < ε for all j and k satisfying j ≥ N and k ≥ N.
Every convergent sequence in a metric space is a Cauchy sequence. Indeed let X be a metric space with distance function d, and let x1, x2, x3, . . . be a sequence of points in X which converges to some point p of X. Given any positive real number ε, there exists some positive integer N such that d(xn, p) < ε/2 whenever n ≥ N. But then it follows from the Triangle Inequality that d(xj, xk) ≤ d(xj, p) + d(p, xk) < ε/2 + ε/2 = ε whenever j ≥ N and k ≥ N.
Definition A metric space (X, d) is said to be complete if every Cauchy sequence in X converges to some point of X.
The spaces R and C are complete metric spaces with respect to the distance function given by d(z, w) = |z − w|. Indeed this result is Cauchy’s Criterion for Convergence. However the space Q of rational numbers (with distance function d(q, r) = |q − r|) is not complete. Indeed one can construct an infinite sequence q1, q2, q3, . . . of rational numbers which converges (in R) to √2. Such a sequence of rational numbers is a Cauchy sequence in both R and Q. However this Cauchy sequence does not converge to a point of the metric space Q (since √2 is an irrational number). Thus the metric space Q is not complete.
It follows immediately from Corollary 3.5 that n-dimensional Euclidean space Rⁿ is a complete metric space.
Lemma 3.10 Let X be a complete metric space, and let A be a subset of X. Then A is complete if and only if A is closed in X.
Proof Suppose that $A$ is closed in $X$. Let $a_1, a_2, a_3, \ldots$ be a Cauchy sequence in $A$. This Cauchy sequence must converge to some point $p$ of $X$, since $X$ is complete. But the limit of every sequence of points of $A$ must belong to $A$, since $A$ is closed. In particular $p \in A$. We deduce that $A$ is complete.

Conversely, suppose that $A$ is complete. Suppose that $A$ were not closed. Then the complement $X \setminus A$ of $A$ would not be open, and therefore there would exist a point $p$ of $X \setminus A$ with the property that $B_X(p, \delta) \cap A$ is non-empty for all $\delta > 0$, where $B_X(p, \delta)$ denotes the open ball in $X$ of radius $\delta$ centred at $p$. We could then find a sequence $a_1, a_2, a_3, \ldots$ of points of $A$ satisfying $d(a_j, p) < 1/j$ for all positive integers $j$. This sequence would be a Cauchy sequence in $A$ which did not converge to a point of $A$, contradicting the completeness of $A$. Thus $A$ must be closed, as required.

Theorem 3.11 The metric space $\mathbb{R}^n$ (with the Euclidean distance function) is a complete metric space.

Proof Let $p_1, p_2, p_3, \ldots$ be a Cauchy sequence in $\mathbb{R}^n$. Then for each integer $m$ between $1$ and $n$, the sequence $(p_1)_m, (p_2)_m, (p_3)_m, \ldots$ is a Cauchy sequence of real numbers, where $(p_j)_m$ denotes the $m$th component of $p_j$. But every Cauchy sequence of real numbers is convergent (Cauchy's Criterion for Convergence). Let $q_m = \lim_{j \to +\infty} (p_j)_m$ for $m = 1, 2, \ldots, n$, and let $q = (q_1, q_2, \ldots, q_n)$. We claim that $p_j \to q$ as $j \to +\infty$.

Let $\varepsilon > 0$ be given. Then there exist positive integers $N_1, N_2, \ldots, N_n$ such that $|(p_j)_m - q_m| < \varepsilon/\sqrt{n}$ whenever $j \geq N_m$ (where $m = 1, 2, \ldots, n$). Let $N$ be the maximum of $N_1, N_2, \ldots, N_n$. If $j \geq N$ then
$$|p_j - q|^2 = \sum_{m=1}^{n} \bigl((p_j)_m - q_m\bigr)^2 < \varepsilon^2.$$
Thus $p_j \to q$ as $j \to +\infty$. Thus every Cauchy sequence in $\mathbb{R}^n$ is convergent, as required.

The following result follows directly from Lemma 3.10 and Theorem 3.11.

Corollary 3.12 A subset $X$ of $\mathbb{R}^n$ is complete if and only if it is closed.

Example The $n$-sphere $S^n$ (with the chordal distance function given by $d(x, y) = |x - y|$) is a complete metric space, where
$$S^n = \{(x_1, x_2, \ldots, x_{n+1}) \in \mathbb{R}^{n+1} : x_1^2 + x_2^2 + \cdots + x_{n+1}^2 = 1\}.$$
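The failure of completeness for the rational numbers, noted earlier in this section, can be observed with exact rational arithmetic. The following Python sketch is an illustration only: the choice of Newton's iteration q ↦ (q + 2/q)/2 and the function name `newton_sqrt2` are assumptions made here, not part of the notes. Every iterate is a rational number, consecutive iterates draw rapidly closer together, and the squares approach 2 without ever equalling it.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Return the first `steps` rational iterates of q -> (q + 2/q) / 2, starting from 1."""
    q = Fraction(1)
    terms = []
    for _ in range(steps):
        q = (q + 2 / q) / 2  # exact arithmetic: every iterate is rational
        terms.append(q)
    return terms


terms = newton_sqrt2(6)

# Gaps between consecutive terms shrink rapidly, as one expects of a Cauchy sequence.
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]

# |q^2 - 2| tends to 0 but is never 0, because no rational number has square 2.
errors = [abs(q * q - 2) for q in terms]

for q, e in zip(terms, errors):
    print(f"q = {float(q):.15f}   |q^2 - 2| = {float(e):.3e}")
```

A finite computation of this kind cannot prove that the sequence is Cauchy; it merely displays the behaviour that the argument in the text establishes.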
3.8 Normed Vector Spaces
A set $X$ is a vector space over some field $\mathbb{F}$ if
• given any $x, y \in X$ and $\lambda \in \mathbb{F}$, there are well-defined elements $x + y$ and $\lambda x$ of $X$,
• $X$ is an Abelian group with respect to the operation $+$ of addition,
• the identities
$$\lambda(x + y) = \lambda x + \lambda y, \qquad (\lambda + \mu)x = \lambda x + \mu x, \qquad (\lambda\mu)x = \lambda(\mu x), \qquad 1x = x$$
are satisfied for all $x, y \in X$ and $\lambda, \mu \in \mathbb{F}$.

Elements of the field $\mathbb{F}$ are referred to as scalars. We consider here only real vector spaces and complex vector spaces: these are vector spaces over the fields of real numbers and complex numbers respectively.

Definition A norm $\|\cdot\|$ on a real or complex vector space $X$ is a function, associating to each element $x$ of $X$ a corresponding real number $\|x\|$, such that the following conditions are satisfied:
(i) $\|x\| \geq 0$ for all $x \in X$,
(ii) $\|x + y\| \leq \|x\| + \|y\|$ for all $x, y \in X$,
(iii) $\|\lambda x\| = |\lambda|\,\|x\|$ for all $x \in X$ and for all scalars $\lambda$,
(iv) $\|x\| = 0$ if and only if $x = 0$.

A normed vector space $(X, \|\cdot\|)$ consists of a real or complex vector space $X$, together with a norm $\|\cdot\|$ on $X$.

Note that any normed complex vector space can also be regarded as a normed real vector space.

Example The field $\mathbb{R}$ is a one-dimensional normed vector space over itself: the norm $|t|$ of $t \in \mathbb{R}$ is the absolute value of $t$.

Example The field $\mathbb{C}$ is a one-dimensional normed vector space over itself: the norm $|z|$ of $z \in \mathbb{C}$ is the modulus of $z$. The field $\mathbb{C}$ is also a two-dimensional normed vector space over $\mathbb{R}$.
Example Let $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ be the real-valued functions on $\mathbb{C}^n$ defined by
$$\|z\|_1 = \sum_{j=1}^{n} |z_j|, \qquad \|z\|_2 = \Bigl(\sum_{j=1}^{n} |z_j|^2\Bigr)^{1/2}, \qquad \|z\|_\infty = \max(|z_1|, |z_2|, \ldots, |z_n|),$$
for each $z \in \mathbb{C}^n$, where $z = (z_1, z_2, \ldots, z_n)$. Then $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are norms on $\mathbb{C}^n$. In particular, if we regard $\mathbb{C}^n$ as a $2n$-dimensional real vector space naturally isomorphic to $\mathbb{R}^{2n}$ (via the isomorphism $(z_1, z_2, \ldots, z_n) \mapsto (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)$, where $x_j$ and $y_j$ are the real and imaginary parts of $z_j$ for $j = 1, 2, \ldots, n$) then $\|\cdot\|_2$ represents the Euclidean norm on this space. The inequality $\|z + w\|_2 \leq \|z\|_2 + \|w\|_2$ satisfied for all $z, w \in \mathbb{C}^n$ is therefore just the standard Triangle Inequality for the Euclidean norm.

Example The space $\mathbb{R}^n$ is also an $n$-dimensional real normed vector space with respect to the norms $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ defined above. Note that $\|\cdot\|_2$ is the standard Euclidean norm on $\mathbb{R}^n$.

Example Let
$$\ell_1 = \{(z_1, z_2, z_3, \ldots) \in \mathbb{C}^\infty : |z_1| + |z_2| + |z_3| + \cdots \text{ converges}\},$$
$$\ell_2 = \{(z_1, z_2, z_3, \ldots) \in \mathbb{C}^\infty : |z_1|^2 + |z_2|^2 + |z_3|^2 + \cdots \text{ converges}\},$$
$$\ell_\infty = \{(z_1, z_2, z_3, \ldots) \in \mathbb{C}^\infty : \text{the sequence } |z_1|, |z_2|, |z_3|, \ldots \text{ is bounded}\},$$
where $\mathbb{C}^\infty$ denotes the set of all sequences $(z_1, z_2, z_3, \ldots)$ of complex numbers. Then $\ell_1$, $\ell_2$ and $\ell_\infty$ are infinite-dimensional normed vector spaces, with norms $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ respectively, where
$$\|z\|_1 = \sum_{j=1}^{+\infty} |z_j|, \qquad \|z\|_2 = \Bigl(\sum_{j=1}^{+\infty} |z_j|^2\Bigr)^{1/2}, \qquad \|z\|_\infty = \sup\{|z_1|, |z_2|, |z_3|, \ldots\}.$$
(For example, to show that $\|z + w\|_2 \leq \|z\|_2 + \|w\|_2$ for all $z, w \in \ell_2$, we note that
$$\Bigl(\sum_{j=1}^{n} |z_j + w_j|^2\Bigr)^{1/2} \leq \Bigl(\sum_{j=1}^{n} |z_j|^2\Bigr)^{1/2} + \Bigl(\sum_{j=1}^{n} |w_j|^2\Bigr)^{1/2} \leq \|z\|_2 + \|w\|_2$$
for all positive integers $n$, by the Triangle Inequality in $\mathbb{C}^n$. Taking limits as $n \to +\infty$, we deduce that $\|z + w\|_2 \leq \|z\|_2 + \|w\|_2$, as required.)

If $x_1, x_2, \ldots, x_m$ are elements of a normed vector space $X$ then
$$\Bigl\|\sum_{k=1}^{m} x_k\Bigr\| \leq \sum_{k=1}^{m} \|x_k\|,$$
where $\|\cdot\|$ denotes the norm on $X$. (This can be verified by induction on $m$, using the inequality $\|x + y\| \leq \|x\| + \|y\|$.)

A norm $\|\cdot\|$ on a vector space $X$ induces a corresponding distance function on $X$: the distance $d(x, y)$ between elements $x$ and $y$ of $X$ is defined by $d(x, y) = \|x - y\|$. This distance function satisfies the metric space axioms. Thus any vector space with a given norm can be regarded as a metric space.

Lemma 3.13 Let $X$ be a normed vector space over the field $\mathbb{F}$, where $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$. Let $(x_j)$ and $(y_j)$ be convergent sequences in $X$, and let $(\lambda_j)$ be a convergent sequence in $\mathbb{F}$. Then the sequences $(x_j + y_j)$ and $(\lambda_j x_j)$ are convergent in $X$, and
$$\lim_{j \to +\infty} (x_j + y_j) = \lim_{j \to +\infty} x_j + \lim_{j \to +\infty} y_j, \qquad \lim_{j \to +\infty} (\lambda_j x_j) = \Bigl(\lim_{j \to +\infty} \lambda_j\Bigr)\Bigl(\lim_{j \to +\infty} x_j\Bigr).$$
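Proof First we prove that $\lim_{j \to +\infty} (x_j + y_j) = x + y$, where $x = \lim_{j \to +\infty} x_j$ and $y = \lim_{j \to +\infty} y_j$. Let $\varepsilon > 0$ be given. Then there exist natural numbers $N_1$ and $N_2$ such that $\|x_j - x\| < \tfrac{1}{2}\varepsilon$ whenever $j \geq N_1$ and $\|y_j - y\| < \tfrac{1}{2}\varepsilon$ whenever $j \geq N_2$. Let $N$ be the maximum of $N_1$ and $N_2$. If $j \geq N$ then
$$\|(x_j + y_j) - (x + y)\| \leq \|x_j - x\| + \|y_j - y\| < \varepsilon.$$
It follows from this that $\lim_{j \to +\infty} (x_j + y_j) = x + y$.

Next we prove that $\lim_{j \to +\infty} (\lambda_j x_j) = \lambda x$, where $\lambda = \lim_{j \to +\infty} \lambda_j$. Let $\varepsilon > 0$ be given. Then there exist natural numbers $N_3$ and $N_4$ such that
$$\|x_j - x\| < \frac{\varepsilon}{2(|\lambda| + 1)}$$
whenever $j \geq N_3$, and
$$|\lambda_j - \lambda| < \frac{\varepsilon}{2(\|x\| + 1)} \quad\text{and}\quad |\lambda_j - \lambda| < 1$$
whenever $j \geq N_4$. Let $N$ be the maximum of $N_3$ and $N_4$. If $j \geq N$ then $|\lambda_j| < |\lambda| + 1$, and therefore
$$\|\lambda_j x_j - \lambda x\| = \|\lambda_j (x_j - x) + (\lambda_j - \lambda) x\| \leq (|\lambda| + 1)\,\|x_j - x\| + |\lambda_j - \lambda|\,\|x\| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon.$$
It follows from this that $\lim_{j \to +\infty} (\lambda_j x_j) = \lambda x$, as required.

A normed vector space $X$ is said to be a Banach space if it is a complete metric space with respect to the distance function $d(x, y) = \|x - y\|$ induced by its norm.

Lemma 3.14 Let $X$ be a Banach space, and let $x_1, x_2, x_3, \ldots$ be elements of $X$. Suppose that $\sum_{n=1}^{+\infty} \|x_n\|$ is convergent. Then $\sum_{n=1}^{+\infty} x_n$ is convergent, and
$$\Bigl\|\sum_{n=1}^{+\infty} x_n\Bigr\| \leq \sum_{n=1}^{+\infty} \|x_n\|.$$

Proof Let $\varepsilon > 0$ be given. We can find $N$ such that $\sum_{n=N}^{+\infty} \|x_n\| < \varepsilon$, since $\sum_{n=1}^{+\infty} \|x_n\|$ is convergent. Let $s_n = x_1 + x_2 + \cdots + x_n$. If $j \geq N$, $k \geq N$ and $j < k$ then
$$\|s_k - s_j\| = \Bigl\|\sum_{n=j+1}^{k} x_n\Bigr\| \leq \sum_{n=j+1}^{k} \|x_n\| \leq \sum_{n=N}^{+\infty} \|x_n\| < \varepsilon.$$
Thus $s_1, s_2, s_3, \ldots$ is a Cauchy sequence in $X$, and therefore converges to some element $s$ of $X$, since $X$ is complete. But then $s = \sum_{j=1}^{+\infty} x_j$. Moreover, on choosing $m$ large enough to ensure that $\|s - s_m\| < \varepsilon$, we deduce that
$$\|s\| \leq \Bigl\|\sum_{n=1}^{m} x_n\Bigr\| + \Bigl\|s - \sum_{n=1}^{m} x_n\Bigr\| \leq \sum_{n=1}^{m} \|x_n\| + \Bigl\|s - \sum_{n=1}^{m} x_n\Bigr\| < \sum_{n=1}^{+\infty} \|x_n\| + \varepsilon.$$
Since this inequality holds for all $\varepsilon > 0$, we conclude that
$$\|s\| \leq \sum_{n=1}^{+\infty} \|x_n\|,$$
as required.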
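The three norms on Cⁿ introduced in this section, together with the inequality ‖x₁ + ⋯ + xₘ‖ ≤ ‖x₁‖ + ⋯ + ‖xₘ‖, can be checked numerically on particular vectors. The sketch below is illustrative only: the function names and the sample vectors are choices made here, and a small tolerance is included to allow for floating-point rounding.

```python
import math


def norm1(z):
    return sum(abs(c) for c in z)


def norm2(z):
    return math.sqrt(sum(abs(c) ** 2 for c in z))


def norm_inf(z):
    return max(abs(c) for c in z)


def add(z, w):
    return [a + b for a, b in zip(z, w)]


z = [3 + 4j, 1 - 2j, -2j]
w = [-1 + 1j, 2 + 0j, 0.5 - 0.5j]
vectors = [z, w, [1j, 1j, 1j]]

for name, norm in [("1", norm1), ("2", norm2), ("inf", norm_inf)]:
    # Triangle inequality ||z + w|| <= ||z|| + ||w||.
    assert norm(add(z, w)) <= norm(z) + norm(w) + 1e-12
    # Generalised form ||x_1 + ... + x_m|| <= ||x_1|| + ... + ||x_m||.
    total = [sum(cs) for cs in zip(*vectors)]
    assert norm(total) <= sum(norm(v) for v in vectors) + 1e-12
    print(name, norm(z))
```

Checks of this kind on sample vectors do not establish the norm axioms; they only show the inequalities holding in particular cases.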
3.9 Bounded Linear Transformations
Let $X$ and $Y$ be real or complex vector spaces. A function $T\colon X \to Y$ is said to be a linear transformation if $T(x + y) = Tx + Ty$ and $T(\lambda x) = \lambda Tx$ for all elements $x$ and $y$ of $X$ and scalars $\lambda$. A linear transformation mapping $X$ into itself is referred to as a linear operator on $X$.

Definition Let $X$ and $Y$ be normed vector spaces. A linear transformation $T\colon X \to Y$ is said to be bounded if there exists some non-negative real number $C$ with the property that $\|Tx\| \leq C\|x\|$ for all $x \in X$. If $T$ is bounded, then the smallest non-negative real number $C$ with this property is referred to as the operator norm of $T$, and is denoted by $\|T\|$.

Lemma 3.15 Let $X$ and $Y$ be normed vector spaces, and let $S\colon X \to Y$ and $T\colon X \to Y$ be bounded linear transformations. Then $S + T$ and $\lambda S$ are bounded linear transformations for all scalars $\lambda$, and
$$\|S + T\| \leq \|S\| + \|T\|, \qquad \|\lambda S\| = |\lambda|\,\|S\|.$$
Moreover $\|S\| = 0$ if and only if $S = 0$. Thus the vector space $B(X, Y)$ of bounded linear transformations from $X$ to $Y$ is a normed vector space (with respect to the operator norm).

Proof $\|(S + T)x\| \leq \|Sx\| + \|Tx\| \leq (\|S\| + \|T\|)\|x\|$ for all $x \in X$. Therefore $S + T$ is bounded, and $\|S + T\| \leq \|S\| + \|T\|$. Using the fact that $\|(\lambda S)x\| = |\lambda|\,\|Sx\|$ for all $x \in X$, we see that $\lambda S$ is bounded, and $\|\lambda S\| = |\lambda|\,\|S\|$. If $S = 0$ then $\|S\| = 0$. Conversely if $\|S\| = 0$ then $\|Sx\| \leq \|S\|\,\|x\| = 0$ for all $x \in X$, and hence $S = 0$. The result follows.
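For a concrete instance of the operator norm, take X = Y = Rⁿ with the norm ‖·‖∞ and let the linear transformation be given by a matrix. The sketch below assumes the standard fact, not proved in these notes, that the operator norm is then the largest absolute row sum of the matrix; the function names and sample matrices are illustrative choices.

```python
def sup_norm(x):
    return max(abs(c) for c in x)


def apply(T, x):
    """Matrix-vector product T x, with T given as a list of rows."""
    return [sum(a * c for a, c in zip(row, x)) for row in T]


def op_norm(T):
    """Largest absolute row sum; assumed here to be the operator norm for the sup norm."""
    return max(sum(abs(a) for a in row) for row in T)


def mat_add(S, T):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(S, T)]


def scale(lam, S):
    return [[lam * a for a in row] for row in S]


S = [[1.0, -2.0], [0.5, 3.0]]
T = [[0.0, 1.0], [-4.0, 2.0]]

# ||Sx|| <= ||S|| ||x|| on a few sample vectors.
for x in ([1.0, 0.0], [1.0, 1.0], [-2.0, 0.5]):
    assert sup_norm(apply(S, x)) <= op_norm(S) * sup_norm(x)

# The bound is attained at x = (1, 1), so no smaller constant C works for S.
assert sup_norm(apply(S, [1.0, 1.0])) == op_norm(S)

# The two relations of Lemma 3.15.
assert op_norm(mat_add(S, T)) <= op_norm(S) + op_norm(T)
assert op_norm(scale(-2.0, S)) == 2.0 * op_norm(S)
```

The second assertion illustrates why the operator norm is described as the smallest admissible constant: the vector (1, 1) has norm 1 and its image has norm exactly ‖S‖.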
Lemma 3.16 Let $X$, $Y$ and $Z$ be normed vector spaces, and let $S\colon X \to Y$ and $T\colon Y \to Z$ be bounded linear transformations. Then the composition $TS$ of $S$ and $T$ is also bounded, and $\|TS\| \leq \|T\|\,\|S\|$.

Proof $\|TSx\| \leq \|T\|\,\|Sx\| \leq \|T\|\,\|S\|\,\|x\|$ for all $x \in X$. The result follows.

Proposition 3.17 Let $X$ and $Y$ be normed vector spaces, and let $T\colon X \to Y$ be a linear transformation from $X$ to $Y$. Then the following conditions are equivalent:
(i) $T\colon X \to Y$ is continuous,
(ii) $T\colon X \to Y$ is continuous at $0$,
(iii) $T\colon X \to Y$ is bounded.

Proof Obviously (i) implies (ii). We show that (ii) implies (iii) and (iii) implies (i). The equivalence of the three conditions then follows immediately.

Suppose that $T\colon X \to Y$ is continuous at $0$. Then there exists $\delta > 0$ such that $\|Tx\| < 1$ for all $x \in X$ satisfying $\|x\| < \delta$. Let $C$ be any positive real number satisfying $C > 1/\delta$. If $x$ is any non-zero element of $X$ then $\|\lambda x\| < \delta$, where $\lambda = 1/(C\|x\|)$, and hence
$$\|Tx\| = C\|x\|\,\|\lambda Tx\| = C\|x\|\,\|T(\lambda x)\| < C\|x\|.$$
Thus $\|Tx\| \leq C\|x\|$ for all $x \in X$, and hence $T\colon X \to Y$ is bounded. Thus (ii) implies (iii).

Finally suppose that $T\colon X \to Y$ is bounded. Let $x$ be a point of $X$, and let $\varepsilon > 0$ be given. Choose $\delta > 0$ satisfying $\|T\|\delta < \varepsilon$. If $x' \in X$ satisfies $\|x' - x\| < \delta$ then
$$\|Tx' - Tx\| = \|T(x' - x)\| \leq \|T\|\,\|x' - x\| < \|T\|\delta < \varepsilon.$$
Thus $T\colon X \to Y$ is continuous. Thus (iii) implies (i), as required.

Proposition 3.18 Let $X$ be a normed vector space and let $Y$ be a Banach space. Then the space $B(X, Y)$ of bounded linear transformations from $X$ to $Y$ is also a Banach space.

Proof We have already shown that $B(X, Y)$ is a normed vector space (see Lemma 3.15). Thus it only remains to show that $B(X, Y)$ is complete.

Let $S_1, S_2, S_3, \ldots$ be a Cauchy sequence in $B(X, Y)$. Let $x \in X$. We claim that $S_1 x, S_2 x, S_3 x, \ldots$ is a Cauchy sequence in $Y$. This result is trivial
if $x = 0$. If $x \neq 0$, and if $\varepsilon > 0$ is given then there exists some positive integer $N$ such that $\|S_j - S_k\| < \varepsilon/\|x\|$ whenever $j \geq N$ and $k \geq N$. But then $\|S_j x - S_k x\| \leq \|S_j - S_k\|\,\|x\| < \varepsilon$ whenever $j \geq N$ and $k \geq N$. This shows that $S_1 x, S_2 x, S_3 x, \ldots$ is indeed a Cauchy sequence. It therefore converges to some element of $Y$, since $Y$ is a Banach space.

Let the function $S\colon X \to Y$ be defined by $Sx = \lim_{n \to +\infty} S_n x$. Then
$$S(x + y) = \lim_{n \to +\infty} (S_n x + S_n y) = \lim_{n \to +\infty} S_n x + \lim_{n \to +\infty} S_n y = Sx + Sy$$
(see Lemma 3.13), and
$$S(\lambda x) = \lim_{n \to +\infty} S_n(\lambda x) = \lambda \lim_{n \to +\infty} S_n x = \lambda Sx.$$
Thus $S\colon X \to Y$ is a linear transformation.

Next we show that $S_n \to S$ in $B(X, Y)$ as $n \to +\infty$. Let $\varepsilon > 0$ be given. Then there exists some positive integer $N$ such that $\|S_j - S_n\| < \tfrac{1}{2}\varepsilon$ whenever $j \geq N$ and $n \geq N$, since the sequence $S_1, S_2, S_3, \ldots$ is a Cauchy sequence in $B(X, Y)$. But then $\|S_j x - S_n x\| \leq \tfrac{1}{2}\varepsilon\|x\|$ for all $j \geq N$ and $n \geq N$, and thus
$$\|Sx - S_n x\| = \Bigl\|\lim_{j \to +\infty} (S_j x - S_n x)\Bigr\| \leq \lim_{j \to +\infty} \|S_j x - S_n x\| \leq \lim_{j \to +\infty} \|S_j - S_n\|\,\|x\| \leq \tfrac{1}{2}\varepsilon\|x\|$$
for all $n \geq N$ (since the norm is a continuous function on $Y$). But then
$$\|Sx\| \leq \|S_n x\| + \|Sx - S_n x\| \leq \bigl(\|S_n\| + \tfrac{1}{2}\varepsilon\bigr)\|x\|$$
for any $n \geq N$, showing that $S\colon X \to Y$ is a bounded linear transformation, and $\|S - S_n\| \leq \tfrac{1}{2}\varepsilon < \varepsilon$ for all $n \geq N$, showing that $S_n \to S$ in $B(X, Y)$ as $n \to +\infty$. Thus the Cauchy sequence $S_1, S_2, S_3, \ldots$ is convergent in $B(X, Y)$, as required.

Corollary 3.19 Let $X$ and $Y$ be Banach spaces, and let $T_0, T_1, T_2, \ldots$ be bounded linear transformations from $X$ to $Y$. Suppose that $\sum_{n=0}^{+\infty} \|T_n\|$ is convergent. Then $\sum_{n=0}^{+\infty} T_n$ is convergent, and
$$\Bigl\|\sum_{n=0}^{+\infty} T_n\Bigr\| \leq \sum_{n=0}^{+\infty} \|T_n\|.$$
Proof The space $B(X, Y)$ of bounded linear maps from $X$ to $Y$ is a Banach space by Proposition 3.18. The result therefore follows immediately on applying Lemma 3.14.

Example Let $T$ be a bounded linear operator on a Banach space $X$ (i.e., a bounded linear transformation from $X$ to itself). The infinite series
$$\sum_{n=0}^{+\infty} \frac{\|T\|^n}{n!}$$
converges to $\exp(\|T\|)$. It follows immediately from Lemma 3.16 (using induction on $n$) that $\|T^n\| \leq \|T\|^n$ for all $n \geq 0$ (where $T^0$ is the identity operator on $X$). It therefore follows from Corollary 3.19 that there is a well-defined bounded linear operator $\exp T$ on $X$, defined by
$$\exp T = \sum_{n=0}^{+\infty} \frac{1}{n!} T^n$$
(where $T^0$ is the identity operator $I$ on $X$).

Proposition 3.20 Let $T$ be a bounded linear operator on a Banach space $X$. Suppose that $\|T\| < 1$. Then the operator $I - T$ has a bounded inverse $(I - T)^{-1}$ (where $I$ denotes the identity operator on $X$). Moreover
$$(I - T)^{-1} = I + T + T^2 + T^3 + \cdots.$$

Proof $\|T^n\| \leq \|T\|^n$ for all $n$, and the geometric series
$$1 + \|T\| + \|T\|^2 + \|T\|^3 + \cdots$$
is convergent (since $\|T\| < 1$). It follows from Corollary 3.19 that the infinite series
$$I + T + T^2 + T^3 + \cdots$$
converges to some bounded linear operator $S$ on $X$. Now
$$(I - T)S = \lim_{n \to +\infty} (I - T)(I + T + T^2 + \cdots + T^n) = \lim_{n \to +\infty} (I - T^{n+1}) = I - \lim_{n \to +\infty} T^{n+1} = I,$$
since $\|T\|^{n+1} \to 0$ and therefore $T^{n+1} \to 0$ as $n \to +\infty$. Similarly $S(I - T) = I$. This shows that $I - T$ is invertible, with inverse $S$, as required.
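The series in Proposition 3.20 can be summed numerically for a small matrix. The sketch below is an illustration under stated assumptions: the operator acts on R² with the norm ‖·‖∞, its operator norm is taken to be the largest absolute row sum (a standard fact not proved in these notes), and the matrix T and all function names are choices made here.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(A, B):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def mat_sub(A, B):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(A, B)]


def op_norm(T):
    """Largest absolute row sum (assumed operator norm for the sup norm)."""
    return max(sum(abs(a) for a in row) for row in T)


def neumann_partial_sum(T, terms):
    """Return I + T + T^2 + ... + T^(terms - 1)."""
    total = identity(len(T))
    power = identity(len(T))
    for _ in range(terms - 1):
        power = mat_mul(power, T)
        total = mat_add(total, power)
    return total


T = [[0.2, -0.3], [0.1, 0.4]]
assert op_norm(T) < 1  # hypothesis of Proposition 3.20 (here the norm is 0.5)

S = neumann_partial_sum(T, 60)
I = identity(2)
residual = mat_sub(mat_mul(mat_sub(I, T), S), I)  # (I - T) S - I, which equals -T^60
print(op_norm(residual))
```

With 60 terms the residual is −T⁶⁰, whose norm is at most 0.5⁶⁰, so the printed value should be at the level of floating-point rounding.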
3.10 Spaces of Bounded Continuous Functions on a Metric Space
Let $X$ be a metric space. We say that a function $f\colon X \to \mathbb{R}^n$ from $X$ to $\mathbb{R}^n$ is bounded if there exists some non-negative constant $K$ such that $|f(x)| \leq K$ for all $x \in X$. If $f$ and $g$ are bounded continuous functions from $X$ to $\mathbb{R}^n$, then so is $f + g$. Also $\lambda f$ is bounded and continuous for any real number $\lambda$. It follows from this that the space $C(X, \mathbb{R}^n)$ of bounded continuous functions from $X$ to $\mathbb{R}^n$ is a vector space over $\mathbb{R}$. Given $f \in C(X, \mathbb{R}^n)$, we define the supremum norm $\|f\|$ of $f$ by the formula
$$\|f\| = \sup_{x \in X} |f(x)|.$$
One can readily verify that $\|\cdot\|$ is a norm on the vector space $C(X, \mathbb{R}^n)$. We shall show that $C(X, \mathbb{R}^n)$, with the supremum norm, is a Banach space (i.e., the supremum norm on $C(X, \mathbb{R}^n)$ is complete). The proof of this result will make use of the following characterization of continuity for functions whose range is $\mathbb{R}^n$.

Theorem 3.21 The normed vector space $C(X, \mathbb{R}^n)$ of all bounded continuous functions from some metric space $X$ to $\mathbb{R}^n$, with the supremum norm, is a Banach space.

Proof Let $f_1, f_2, f_3, \ldots$ be a Cauchy sequence in $C(X, \mathbb{R}^n)$. Then, for each $x \in X$, the sequence $f_1(x), f_2(x), f_3(x), \ldots$ is a Cauchy sequence in $\mathbb{R}^n$ (since $|f_j(x) - f_k(x)| \leq \|f_j - f_k\|$ for all positive integers $j$ and $k$), and $\mathbb{R}^n$ is a complete metric space. Thus, for each $x \in X$, the sequence $f_1(x), f_2(x), f_3(x), \ldots$ converges to some point $f(x)$ of $\mathbb{R}^n$. We must show that the limit function $f$ defined in this way is bounded and continuous.

Let $\varepsilon > 0$ be given. Then there exists some positive integer $N$ with the property that $\|f_j - f_k\| < \tfrac{1}{3}\varepsilon$ for all $j \geq N$ and $k \geq N$, since $f_1, f_2, f_3, \ldots$ is a Cauchy sequence in $C(X, \mathbb{R}^n)$. But then, on taking the limit of the left hand side of the inequality $|f_j(x) - f_k(x)| < \tfrac{1}{3}\varepsilon$ as $k \to +\infty$, we deduce that $|f_j(x) - f(x)| \leq \tfrac{1}{3}\varepsilon$ for all $x \in X$ and $j \geq N$. In particular $|f_N(x) - f(x)| \leq \tfrac{1}{3}\varepsilon$ for all $x \in X$. It follows that $|f(x)| \leq \|f_N\| + \tfrac{1}{3}\varepsilon$ for all $x \in X$, showing that the limit function $f$ is bounded.

Next we show that the limit function $f$ is continuous. Let $p \in X$ and $\varepsilon > 0$ be given. Let $N$ be chosen large enough to ensure that $|f_N(x) - f(x)| \leq \tfrac{1}{3}\varepsilon$ for all $x \in X$. Now $f_N$ is continuous. It follows from the definition of continuity for functions between metric spaces that there exists some real number $\delta$ satisfying $\delta > 0$ such that $|f_N(x) - f_N(p)| < \tfrac{1}{3}\varepsilon$ for all elements $x$ of $X$ satisfying $d_X(x, p) < \delta$, where $d_X$ denotes the distance function on $X$. Thus if $x \in X$ satisfies $d_X(x, p) < \delta$ then
$$|f(x) - f(p)| \leq |f(x) - f_N(x)| + |f_N(x) - f_N(p)| + |f_N(p) - f(p)| < \tfrac{1}{3}\varepsilon + \tfrac{1}{3}\varepsilon + \tfrac{1}{3}\varepsilon = \varepsilon.$$
Therefore the limit function $f$ is continuous. Thus $f \in C(X, \mathbb{R}^n)$.

Finally we observe that $f_j \to f$ in $C(X, \mathbb{R}^n)$ as $j \to +\infty$. Indeed we have already seen that, given $\varepsilon > 0$ there exists some positive integer $N$ such that $|f_j(x) - f(x)| \leq \tfrac{1}{3}\varepsilon$ for all $x \in X$ and for all $j \geq N$. Thus $\|f_j - f\| \leq \tfrac{1}{3}\varepsilon < \varepsilon$ for all $j \geq N$, showing that $f_j \to f$ in $C(X, \mathbb{R}^n)$ as $j \to +\infty$. This shows that $C(X, \mathbb{R}^n)$ is a complete metric space, as required.

Corollary 3.22 Let $X$ be a metric space and let $F$ be a closed subset of $\mathbb{R}^n$. Then the space $C(X, F)$ of bounded continuous functions from $X$ to $F$ is a complete metric space with respect to the distance function $\rho$, where
$$\rho(f, g) = \|f - g\| = \sup_{x \in X} |f(x) - g(x)|$$
for all f, g ∈ C(X, F ). Proof Let f1 , f2 , f3 , . . . be a Cauchy sequence in C(X, F ). Then f1 , f2 , f3 , . . . is a Cauchy sequence in C(X, Rn ) and therefore converges in C(X, Rn ) to some function f : X → Rn . Let x be some point of X. Then fj (x) → f (x) as j → +∞. But then f (x) ∈ F , since fj (x) ∈ F for all j, and F is closed in Rn . This shows that f ∈ C(X, F ), and thus the Cauchy sequence f1 , f2 , f3 , . . . converges in C(X, F ). We conclude that C(X, F ) is a complete metric space, as required.
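As a concrete Cauchy sequence in a space of bounded continuous functions, one may take X = [0, 2π] and the partial sums fₙ(x) = Σₖ₌₁ⁿ sin(kx)/k². Since |sin(kx)| ≤ 1, the supremum norm of f_j − f_k is at most Σₙ₌ⱼ₊₁ᵏ 1/n², which tends to zero. The sketch below compares this bound with a grid estimate; the example, the grid size and the function names are assumptions made for illustration, and a maximum over a grid only bounds the true supremum from below.

```python
import math


def f(n, x):
    """Partial sum f_n(x) = sum_{k=1}^{n} sin(kx) / k^2."""
    return sum(math.sin(k * x) / k**2 for k in range(1, n + 1))


def grid_sup_distance(j, k, points=401):
    """Maximum of |f_j - f_k| over a grid on [0, 2*pi]: a lower estimate of the sup norm."""
    xs = [2 * math.pi * i / (points - 1) for i in range(points)]
    return max(abs(f(j, x) - f(k, x)) for x in xs)


def tail_bound(j, k):
    """sum_{n=j+1}^{k} 1/n^2, an upper bound for the sup norm of f_j - f_k."""
    return sum(1 / n**2 for n in range(j + 1, k + 1))


for j, k in [(5, 10), (20, 40), (80, 160)]:
    estimate, bound = grid_sup_distance(j, k), tail_bound(j, k)
    assert estimate <= bound + 1e-12
    print(j, k, estimate, bound)
```

Theorem 3.21 then guarantees that the limit function is itself bounded and continuous on [0, 2π].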
3.11 The Contraction Mapping Theorem and Picard's Theorem
Let $X$ be a metric space with distance function $d$. A function $T\colon X \to X$ mapping $X$ to itself is said to be a contraction mapping if there exists some constant $\lambda$ satisfying $0 \leq \lambda < 1$ with the property that $d(T(x), T(x')) \leq \lambda\, d(x, x')$ for all $x, x' \in X$.

One can readily check that any contraction map $T\colon X \to X$ on a metric space $(X, d)$ is continuous. Indeed let $x$ be a point of $X$, and let $\varepsilon > 0$ be given. Then $d(T(x), T(x')) < \varepsilon$ for all points $x'$ of $X$ satisfying $d(x, x') < \varepsilon$.

Theorem 3.23 (Contraction Mapping Theorem) Let $X$ be a complete metric space, and let $T\colon X \to X$ be a contraction mapping defined on $X$. Then $T$ has a unique fixed point in $X$ (i.e., there exists a unique point $x$ of $X$ for which $T(x) = x$).
Proof Let $\lambda$ be chosen such that $0 \leq \lambda < 1$ and $d(T(u), T(u')) \leq \lambda\, d(u, u')$ for all $u, u' \in X$, where $d$ is the distance function on $X$. First we show the existence of the fixed point $x$. Let $x_0$ be any point of $X$, and define a sequence $x_0, x_1, x_2, x_3, x_4, \ldots$ of points of $X$ by the condition that $x_n = T(x_{n-1})$ for all positive integers $n$. It follows by induction on $n$ that $d(x_{n+1}, x_n) \leq \lambda^n d(x_1, x_0)$. Using the Triangle Inequality, we deduce that if $j$ and $k$ are positive integers satisfying $k > j$ then
$$d(x_k, x_j) \leq \sum_{n=j}^{k-1} d(x_{n+1}, x_n) \leq \frac{\lambda^j - \lambda^k}{1 - \lambda}\, d(x_1, x_0) \leq \frac{\lambda^j}{1 - \lambda}\, d(x_1, x_0).$$
(Here we have used the identity
$$\lambda^j + \lambda^{j+1} + \cdots + \lambda^{k-1} = \frac{\lambda^j - \lambda^k}{1 - \lambda}.)$$
Using the fact that $0 \leq \lambda < 1$, we deduce that the sequence $(x_n)$ is a Cauchy sequence in $X$. This Cauchy sequence must converge to some point $x$ of $X$, since $X$ is complete. But then we see that
$$T(x) = T\Bigl(\lim_{n \to +\infty} x_n\Bigr) = \lim_{n \to +\infty} T(x_n) = \lim_{n \to +\infty} x_{n+1} = x,$$
since $T\colon X \to X$ is a continuous function, and thus $x$ is a fixed point of $T$.

If $x'$ were another fixed point of $T$ then we would have
$$d(x', x) = d(T(x'), T(x)) \leq \lambda\, d(x', x).$$
But this is impossible unless $x' = x$, since $\lambda < 1$. Thus the fixed point $x$ of the contraction map $T$ is unique.

We use the Contraction Mapping Theorem in order to prove the following existence theorem for solutions of ordinary differential equations.

Theorem 3.24 (Picard's Theorem) Let $F\colon U \to \mathbb{R}$ be a continuous function defined over some open set $U$ in the plane $\mathbb{R}^2$, and let $(x_0, t_0)$ be an element of $U$. Suppose that there exists some non-negative constant $M$ such that
$$|F(u, t) - F(v, t)| \leq M|u - v| \quad\text{for all } (u, t) \in U \text{ and } (v, t) \in U.$$
Then there exists a continuous function $\varphi\colon [t_0 - \delta, t_0 + \delta] \to \mathbb{R}$ defined on the interval $[t_0 - \delta, t_0 + \delta]$ for some $\delta > 0$ such that $x = \varphi(t)$ is a solution to the differential equation
$$\frac{dx(t)}{dt} = F(x(t), t)$$
with initial condition $x(t_0) = x_0$.
Proof Solving the differential equation with the initial condition $x(t_0) = x_0$ is equivalent to finding a continuous function $\varphi\colon I \to \mathbb{R}$ satisfying the integral equation
$$\varphi(t) = x_0 + \int_{t_0}^{t} F(\varphi(s), s)\, ds,$$
where $I$ denotes the closed interval $[t_0 - \delta, t_0 + \delta]$. (Note that any continuous function $\varphi$ satisfying this integral equation is automatically differentiable, since the indefinite integral of a continuous function is always differentiable.)

Let $K = |F(x_0, t_0)| + 1$. Using the continuity of the function $F$, together with the fact that $U$ is open in $\mathbb{R}^2$, one can find some $\delta_0 > 0$ such that the open disk of radius $\delta_0$ about $(x_0, t_0)$ is contained in $U$ and $|F(x, t)| \leq K$ for all points $(x, t)$ in this open disk. Now choose $\delta > 0$ such that
$$\delta\sqrt{1 + K^2} < \delta_0 \quad\text{and}\quad M\delta < 1.$$
Note that if $|t - t_0| \leq \delta$ and $|x - x_0| \leq K\delta$ then $(x, t)$ belongs to the open disk of radius $\delta_0$ about $(x_0, t_0)$, and hence $(x, t) \in U$ and $|F(x, t)| \leq K$.

Let $J$ denote the closed interval $[x_0 - K\delta, x_0 + K\delta]$. The space $C(I, J)$ of continuous functions from the interval $I$ to the interval $J$ is a complete metric space, by Corollary 3.22. Define $T\colon C(I, J) \to C(I, J)$ by
$$T(\varphi)(t) = x_0 + \int_{t_0}^{t} F(\varphi(s), s)\, ds.$$
We claim that $T$ does indeed map $C(I, J)$ into itself and is a contraction mapping.

Let $\varphi\colon I \to J$ be an element of $C(I, J)$. Note that if $|t - t_0| \leq \delta$ then
$$|(\varphi(t), t) - (x_0, t_0)|^2 = (\varphi(t) - x_0)^2 + (t - t_0)^2 \leq \delta^2 + K^2\delta^2 < \delta_0^2,$$
hence $|F(\varphi(t), t)| \leq K$. It follows from this that
$$|T(\varphi)(t) - x_0| \leq K\delta$$
for all $t$ satisfying $|t - t_0| \leq \delta$. The function $T(\varphi)$ is continuous, and is therefore a well-defined element of $C(I, J)$ for all $\varphi \in C(I, J)$.

We now show that $T$ is a contraction mapping on $C(I, J)$. Let $\varphi$ and $\psi$ be elements of $C(I, J)$. The hypotheses of the theorem ensure that
$$|F(\varphi(t), t) - F(\psi(t), t)| \leq M|\varphi(t) - \psi(t)| \leq M\rho(\varphi, \psi)$$
for all $t \in I$, where $\rho(\varphi, \psi) = \sup_{t \in I} |\varphi(t) - \psi(t)|$. Therefore
$$|T(\varphi)(t) - T(\psi)(t)| = \Bigl|\int_{t_0}^{t} \bigl(F(\varphi(s), s) - F(\psi(s), s)\bigr)\, ds\Bigr| \leq M|t - t_0|\,\rho(\varphi, \psi)$$
for all $t$ satisfying $|t - t_0| \leq \delta$. Therefore $\rho(T(\varphi), T(\psi)) \leq M\delta\,\rho(\varphi, \psi)$ for all $\varphi, \psi \in C(I, J)$. But $\delta$ has been chosen such that $M\delta < 1$. This shows that $T\colon C(I, J) \to C(I, J)$ is a contraction mapping on $C(I, J)$. It follows from the Contraction Mapping Theorem (Theorem 3.23) that there exists a unique element $\varphi$ of $C(I, J)$ satisfying $T(\varphi) = \varphi$. This function $\varphi$ is the required solution to the differential equation.

A straightforward, but somewhat technical, least upper bound argument can be used to show that if $x = \psi(t)$ is any other continuous solution to the differential equation
$$\frac{dx}{dt} = F(x, t)$$
on the interval $[t_0 - \delta, t_0 + \delta]$ satisfying the initial condition $\psi(t_0) = x_0$, then $|\psi(t) - x_0| \leq K\delta$ for all $t$ satisfying $|t - t_0| \leq \delta$. Thus such a solution to the differential equation must belong to the space $C(I, J)$ defined in the proof of Theorem 3.24. The uniqueness of the fixed point of the contraction mapping $T\colon C(I, J) \to C(I, J)$ then shows that $\psi = \varphi$, where $\varphi\colon [t_0 - \delta, t_0 + \delta] \to \mathbb{R}$ is the solution to the differential equation whose existence was proved in Theorem 3.24. This shows that the solution to the differential equation is in fact unique on the interval $[t_0 - \delta, t_0 + \delta]$.
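The fixed point in the proof of Picard's Theorem is the limit of the iterates φ, T(φ), T(T(φ)), … produced by the Contraction Mapping Theorem. The sketch below carries out this iteration exactly for one illustrative choice that is not taken from the notes: F(x, t) = x, t₀ = 0 and x₀ = 1, so that M = 1 and the solution is the exponential function. Polynomials are stored as coefficient lists, and the names `picard_step` and `evaluate` are hypothetical helpers.

```python
from fractions import Fraction
import math


def picard_step(coeffs, x0=Fraction(1)):
    """Apply T(phi)(t) = x0 + integral from 0 to t of phi(s) ds to a polynomial phi.

    coeffs[k] is the coefficient of t**k; this models F(x, t) = x with t0 = 0.
    """
    integrated = [c / (k + 1) for k, c in enumerate(coeffs)]  # term-by-term integration
    return [x0] + integrated


def evaluate(coeffs, t):
    return sum(c * t**k for k, c in enumerate(coeffs))


phi = [Fraction(1)]  # start from the constant function phi_0(t) = x0 = 1
for _ in range(5):
    phi = picard_step(phi)

print(phi)
print(float(evaluate(phi, Fraction(1, 4))), math.exp(0.25))
```

In this example each application of T adds the next term of the Taylor series of the exponential function, so the iterates are its Taylor polynomials. The evaluation point t = 1/4 was chosen so that Mδ < 1 holds with δ = 1/4.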
3.12 The Completion of a Metric Space
We describe below a construction whereby any metric space can be embedded in a complete metric space.

Lemma 3.25 Let $X$ be a metric space with distance function $d$, let $(x_j)$ and $(y_j)$ be Cauchy sequences of points in $X$, and let $d_j = d(x_j, y_j)$ for all positive integers $j$. Then $(d_j)$ is a Cauchy sequence of real numbers.

Proof It follows from the Triangle Inequality that
$$d_j \leq d(x_j, x_k) + d_k + d(y_k, y_j)$$
and thus $d_j - d_k \leq d(x_j, x_k) + d(y_j, y_k)$ for all integers $j$ and $k$. Similarly $d_k - d_j \leq d(x_j, x_k) + d(y_j, y_k)$. It follows that
$$|d_j - d_k| \leq d(x_j, x_k) + d(y_j, y_k)$$
for all integers $j$ and $k$.

Let $\varepsilon > 0$ be given. Then there exists some positive integer $N$ such that $d(x_j, x_k) < \tfrac{1}{2}\varepsilon$ and $d(y_j, y_k) < \tfrac{1}{2}\varepsilon$ whenever $j \geq N$ and $k \geq N$, since the sequences $(x_j)$ and $(y_j)$ are Cauchy sequences in $X$. But then $|d_j - d_k| < \varepsilon$ whenever $j \geq N$ and $k \geq N$. Thus the sequence $(d_j)$ is a Cauchy sequence of real numbers, as required.
Let $X$ be a metric space with distance function $d$. It follows from Cauchy's Criterion for Convergence and Lemma 3.25 that $\lim_{j \to +\infty} d(x_j, y_j)$ exists for all Cauchy sequences $(x_j)$ and $(y_j)$ in $X$.

Lemma 3.26 Let $X$ be a metric space with distance function $d$, and let $(x_j)$, $(y_j)$ and $(z_j)$ be Cauchy sequences of points in $X$. Then
$$0 \leq \lim_{j \to +\infty} d(x_j, z_j) \leq \lim_{j \to +\infty} d(x_j, y_j) + \lim_{j \to +\infty} d(y_j, z_j).$$

Proof This follows immediately on taking limits of both sides of the Triangle Inequality.

Lemma 3.27 Let $X$ be a metric space with distance function $d$, and let $(x_j)$, $(y_j)$ and $(z_j)$ be Cauchy sequences of points in $X$. Suppose that
$$\lim_{j \to +\infty} d(x_j, y_j) = 0 \quad\text{and}\quad \lim_{j \to +\infty} d(y_j, z_j) = 0.$$
Then $\lim_{j \to +\infty} d(x_j, z_j) = 0$.

Proof This is an immediate consequence of Lemma 3.26.

Lemma 3.28 Let $X$ be a metric space with distance function $d$, and let $(x_j)$, $(x'_j)$, $(y_j)$ and $(y'_j)$ be Cauchy sequences of points in $X$. Suppose that
$$\lim_{j \to +\infty} d(x_j, x'_j) = 0 \quad\text{and}\quad \lim_{j \to +\infty} d(y_j, y'_j) = 0.$$
Then $\lim_{j \to +\infty} d(x_j, y_j) = \lim_{j \to +\infty} d(x'_j, y'_j)$.
Proof It follows from Lemma 3.26 that
$$\lim_{j \to +\infty} d(x_j, y_j) \leq \lim_{j \to +\infty} d(x_j, x'_j) + \lim_{j \to +\infty} d(x'_j, y'_j) + \lim_{j \to +\infty} d(y'_j, y_j) = \lim_{j \to +\infty} d(x'_j, y'_j).$$
Similarly $\lim_{j \to +\infty} d(x'_j, y'_j) \leq \lim_{j \to +\infty} d(x_j, y_j)$. It follows that $\lim_{j \to +\infty} d(x_j, y_j) = \lim_{j \to +\infty} d(x'_j, y'_j)$, as required.
Let $X$ be a metric space with distance function $d$. Then there is an equivalence relation on the set of Cauchy sequences of points in $X$, where two Cauchy sequences $(x_j)$ and $(x'_j)$ in $X$ are equivalent if and only if $\lim_{j \to +\infty} d(x_j, x'_j) = 0$. Let $\tilde{X}$ denote the set of equivalence classes of Cauchy sequences in $X$ with respect to this equivalence relation. Let $\tilde{x}$ and $\tilde{y}$ be elements of $\tilde{X}$, and let $(x_j)$ and $(y_j)$ be Cauchy sequences belonging to the equivalence classes represented by $\tilde{x}$ and $\tilde{y}$. We define
$$d(\tilde{x}, \tilde{y}) = \lim_{j \to +\infty} d(x_j, y_j).$$
It follows from Lemma 3.28 that the value of $d(\tilde{x}, \tilde{y})$ does not depend on the choice of Cauchy sequences $(x_j)$ and $(y_j)$ representing $\tilde{x}$ and $\tilde{y}$. We obtain in this way a distance function on the set $\tilde{X}$. This distance function satisfies the Triangle Inequality (Lemma 3.26) and the other metric space axioms. Therefore $\tilde{X}$ with this distance function is a metric space. We refer to the space $\tilde{X}$ as the completion of the metric space $X$.

We can regard the metric space $X$ as being embedded in its completion $\tilde{X}$, where a point $x$ of $X$ is represented in $\tilde{X}$ by the equivalence class of the constant sequence $x, x, x, \ldots$.

Example The completion of the space $\mathbb{Q}$ of rational numbers is the space $\mathbb{R}$ of real numbers.

Theorem 3.29 The completion $\tilde{X}$ of a metric space $X$ is a complete metric space.

Proof Let $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \ldots$ be a Cauchy sequence in the completion $\tilde{X}$ of $X$. For each positive integer $m$ let $x_{m,1}, x_{m,2}, x_{m,3}, \ldots$ be a Cauchy sequence in $X$ belonging to the equivalence class that represents the element $\tilde{x}_m$ of $\tilde{X}$. Then, for each positive integer $m$ there exists a positive integer $N(m)$ such that $d(x_{m,j}, x_{m,k}) < 1/m$ whenever $j \geq N(m)$ and $k \geq N(m)$. Let $y_m = x_{m,N(m)}$. We claim that the sequence $y_1, y_2, y_3, \ldots$ is a Cauchy sequence in $X$, and that the element $\tilde{y}$ of $\tilde{X}$ corresponding to this Cauchy sequence is the limit in $\tilde{X}$ of the sequence $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \ldots$.

Let $\varepsilon > 0$ be given. Then there exists some positive integer $M$ such that $M > 3/\varepsilon$ and $d(\tilde{x}_p, \tilde{x}_q) < \tfrac{1}{3}\varepsilon$ whenever $p \geq M$ and $q \geq M$. It follows from the definition of the distance function on $\tilde{X}$ that if $p \geq M$ and $q \geq M$ then $d(x_{p,k}, x_{q,k}) < \tfrac{1}{3}\varepsilon$ for all sufficiently large positive integers $k$. If $p \geq M$ and $k \geq N(p)$ then
$$d(y_p, x_{p,k}) = d(x_{p,N(p)}, x_{p,k}) < 1/p \leq 1/M < \tfrac{1}{3}\varepsilon.$$
It follows that if $p \geq M$ and $q \geq M$, and if $k$ is sufficiently large, then $d(y_p, x_{p,k}) < \tfrac{1}{3}\varepsilon$, $d(y_q, x_{q,k}) < \tfrac{1}{3}\varepsilon$, and $d(x_{p,k}, x_{q,k}) < \tfrac{1}{3}\varepsilon$, and hence $d(y_p, y_q) < \varepsilon$. We conclude that the sequence $y_1, y_2, y_3, \ldots$ of points of $X$ is indeed a Cauchy sequence.

Let $\tilde{y}$ be the element of $\tilde{X}$ which is represented by the Cauchy sequence $y_1, y_2, y_3, \ldots$ of points of $X$, and, for each positive integer $m$, let $\tilde{y}_m$ be the element of $\tilde{X}$ represented by the constant sequence $y_m, y_m, y_m, \ldots$ in $X$. Now
$$d(\tilde{y}, \tilde{y}_m) = \lim_{p \to +\infty} d(y_p, y_m),$$
and therefore $d(\tilde{y}, \tilde{y}_m) \to 0$ as $m \to +\infty$. Also
$$d(\tilde{y}_m, \tilde{x}_m) = \lim_{j \to +\infty} d(x_{m,N(m)}, x_{m,j}) \leq \frac{1}{m},$$
and hence $d(\tilde{y}_m, \tilde{x}_m) \to 0$ as $m \to +\infty$. It follows from this that $d(\tilde{y}, \tilde{x}_m) \to 0$ as $m \to +\infty$, and therefore the Cauchy sequence $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \ldots$ in $\tilde{X}$ converges to the point $\tilde{y}$ of $\tilde{X}$. We conclude that $\tilde{X}$ is a complete metric space, since we have shown that every Cauchy sequence in $\tilde{X}$ is convergent.

Remark In a paper published in 1872, Cantor gave a construction of the real number system in which real numbers are represented as Cauchy sequences of rational numbers. The real numbers represented by two Cauchy sequences of rational numbers are equal if and only if the difference of the Cauchy sequences converges to zero. Thus the construction of the completion of a metric space, described above, generalizes Cantor's construction of the system of real numbers from the system of rational numbers.
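The independence of the distance between two points of the completion from the choice of representing Cauchy sequences (Lemma 3.28) can be seen in a small exact computation with rational numbers. In the sketch below the four sequences and their names are illustrative choices: two sequences represent the class of 1/3 and two represent the class of 1/2, and either pairing yields distances that approach 1/6.

```python
from fractions import Fraction


def d(x, y):
    return abs(x - y)


# Two Cauchy sequences of rationals representing 1/3.
def third_const(j):
    return Fraction(1, 3)


def third_decimal(j):
    return Fraction(10**j // 3, 10**j)  # 0.3, 0.33, 0.333, ...


# Two Cauchy sequences of rationals representing 1/2.
def half_const(j):
    return Fraction(1, 2)


def half_shifted(j):
    return Fraction(1, 2) + Fraction(1, j)


for j in (1, 10, 100):
    equivalent_gap = d(third_const(j), third_decimal(j))  # equals 1 / (3 * 10**j)
    dist_a = d(third_const(j), half_const(j))  # always 1/6
    dist_b = d(third_decimal(j), half_shifted(j))  # approaches 1/6
    print(j, float(equivalent_gap), float(dist_a), float(dist_b))
```

The first column of distances tends to zero, which is the condition for the two sequences for 1/3 to be equivalent; the last two columns share the limit 1/6.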
Course 221: Michaelmas Term 2006
Section 4: Topological Spaces
David R. Wilkins
Copyright © David R. Wilkins 1997–2006
Contents
4 Topological Spaces
4.1 Topological Spaces: Definitions and Examples
4.2 Hausdorff Spaces
4.3 Subspace Topologies
4.4 Continuous Functions between Topological Spaces
4.5 Homeomorphisms
4.6 Sequences and Convergence
4.7 Neighbourhoods, Closures and Interiors
4.8 Product Topologies
4.9 Cut and Paste Constructions
4.10 Identification Maps and Quotient Topologies
4.11 Connected Topological Spaces
4 Topological Spaces
The theory of topological spaces provides a setting for the notions of continuity and convergence which is more general than that provided by the theory of metric spaces. In the theory of metric spaces one can find necessary and sufficient conditions for convergence and continuity that do not refer explicitly to the distance function on a metric space but instead are expressed in terms of open sets. Thus a sequence of points in a metric space X converges to a point p of X if and only if every open set which contains the point p also contains all but finitely many members of the sequence. Also a function f : X → Y between metric spaces X and Y is continuous if and only if the preimage f −1 (V ) of every open set V in Y is an open set in X. It follows from this that we can generalize the notions of convergence and continuity by introducing the concept of a topological space: a topological space consists of a set together with a collection of subsets termed open sets that satisfy appropriate axioms. The axioms for open sets in a topological space are satisfied by the open sets in any metric space.
4.1 Topological Spaces: Definitions and Examples
Definition A topological space $X$ consists of a set $X$ together with a collection of subsets, referred to as open sets, such that the following conditions are satisfied:
(i) the empty set $\emptyset$ and the whole set $X$ are open sets,
(ii) the union of any collection of open sets is itself an open set,
(iii) the intersection of any finite collection of open sets is itself an open set.

The collection consisting of all the open sets in a topological space $X$ is referred to as a topology on the set $X$.

Remark If it is necessary to specify explicitly the topology on a topological space then one denotes by $(X, \tau)$ the topological space whose underlying set is $X$ and whose topology is $\tau$. However if no confusion will arise then it is customary to denote this topological space simply by $X$.

Any metric space may be regarded as a topological space. Indeed let $X$ be a metric space with distance function $d$. We recall that a subset $V$ of $X$ is an open set if and only if, given any point $v$ of $V$, there exists some $\delta > 0$ such that $\{x \in X : d(x, v) < \delta\} \subset V$. The empty set $\emptyset$ and the whole space $X$ are open sets. Also any union of open sets in a metric space
is an open set, and any finite intersection of open sets in a metric space is an open set. Thus the topological space axioms are satisfied by the collection of open sets in any metric space. We refer to this collection of open sets as the topology generated by the distance function d on X. Any subset X of n-dimensional Euclidean space Rn is a topological space: a subset V of X is open in X if and only if, given any point v of V , there exists some δ > 0 such that {x ∈ X : |x − v| < δ} ⊂ V. In particular Rn is itself a topological space whose topology is generated by the Euclidean distance function on Rn . This topology on Rn is referred to as the usual topology on Rn . One defines the usual topologies on R and C in an analogous fashion. Example Given any set X, one can define a topology on X where every subset of X is an open set. This topology is referred to as the discrete topology on X. Example Given any set X, one can define a topology on X in which the only open sets are the empty set ∅ and the whole set X. Definition Let X be a topological space. A subset F of X is said to be a closed set if and only if its complement X \ F is an open set. We recall that the complement of the union of some collection of subsets of some set X is the intersection of the complements of those sets, and the complement of the intersection of some collection of subsets of X is the union of the complements of those sets. The following result therefore follows directly from the definition of a topological space. Proposition 4.1 Let X be a topological space. Then the collection of closed sets of X has the following properties:— (i) the empty set ∅ and the whole set X are closed sets, (ii) the intersection of any collection of closed sets is itself a closed set, (iii) the union of any finite collection of closed sets is itself a closed set.
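On a finite set the topological space axioms can be checked mechanically. The sketch below is an illustration only: the function names `is_topology`, `closed_sets` and `powerset` and the sample collections are choices made here. Because the collection of open sets is finite, closure under pairwise unions and pairwise intersections is enough to give closure under all unions and all finite intersections.

```python
from itertools import combinations


def powerset(X):
    items = list(X)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}


def is_topology(X, tau):
    """Check the open-set axioms for a collection tau of subsets of a finite set X."""
    X = frozenset(X)
    tau = {frozenset(U) for U in tau}
    if not all(U <= X for U in tau):
        return False
    if frozenset() not in tau or X not in tau:
        return False
    # tau is finite, so pairwise closure implies closure under all unions
    # and all finite intersections.
    return all((U | V) in tau and (U & V) in tau for U, V in combinations(tau, 2))


def closed_sets(X, tau):
    """Complements of the open sets."""
    X = frozenset(X)
    return {X - frozenset(U) for U in tau}


X = {1, 2, 3}
discrete = powerset(X)             # every subset is open
indiscrete = [set(), X]            # only the empty set and X are open
nested = [set(), {1}, {1, 2}, X]
broken = [set(), {1}, {2}, X]      # the union {1, 2} of two members is missing

for name, tau in [("discrete", discrete), ("indiscrete", indiscrete),
                  ("nested", nested), ("broken", broken)]:
    print(name, is_topology(X, tau))
```

The closed sets of the `nested` topology are the complements ∅, {3}, {2, 3} and X, which can be seen to satisfy the three properties listed in Proposition 4.1.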
4.2 Hausdorff Spaces
Definition A topological space X is said to be a Hausdorff space if and only if it satisfies the following Hausdorff Axiom:
• if x and y are distinct points of X then there exist open sets U and V such that x ∈ U, y ∈ V and U ∩ V = ∅.
Lemma 4.2 All metric spaces are Hausdorff spaces.
Proof Let X be a metric space with distance function d, and let x and y be points of X, where x ≠ y. Let ε = ½ d(x, y). Then the open balls BX(x, ε) and BX(y, ε) of radius ε centred on the points x and y are open sets. If BX(x, ε) ∩ BX(y, ε) were non-empty then there would exist z ∈ X satisfying d(x, z) < ε and d(z, y) < ε. But this is impossible, since it would then follow from the Triangle Inequality that d(x, y) < 2ε, contrary to the choice of ε. Thus x ∈ BX(x, ε), y ∈ BX(y, ε), BX(x, ε) ∩ BX(y, ε) = ∅. This shows that the metric space X is a Hausdorff space.
We now give an example of a topological space which is not a Hausdorff space.
Example The Zariski topology on the set R of real numbers is defined as follows: a subset U of R is open (with respect to the Zariski topology) if and only if either U = ∅ or else R \ U is finite. It is a straightforward exercise to verify that the topological space axioms are satisfied, so that the set R of real numbers is a topological space with respect to this Zariski topology. Now the intersection of any two non-empty open sets in this topology is always non-empty. (Indeed if U and V are non-empty open sets then U = R \ F1 and V = R \ F2, where F1 and F2 are finite sets of real numbers. But then U ∩ V = R \ (F1 ∪ F2), which is non-empty, since F1 ∪ F2 is finite and R is infinite.) It follows immediately from this that R, with the Zariski topology, is not a Hausdorff space.
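The Hausdorff Axiom can be tested by brute force on a finite topological space. This is an illustrative sketch only: `is_hausdorff` is a hypothetical helper, and it assumes that `opens` already satisfies the topological space axioms.

```python
from itertools import combinations

def is_hausdorff(points, opens):
    """Return True if every pair of distinct points can be separated
    by disjoint open sets (assumes `opens` is a topology on `points`)."""
    tau = [frozenset(U) for U in opens]
    for x, y in combinations(list(points), 2):
        separated = any(x in U and y in V and not (U & V)
                        for U in tau for V in tau)
        if not separated:
            return False
    return True

# Discrete topology on a two-point set: the singletons separate the points.
print(is_hausdorff({0, 1}, [set(), {0}, {1}, {0, 1}]))
# Only ∅ and the whole set are open: no two points can be separated.
print(is_hausdorff({0, 1}, [set(), {0, 1}]))
```

As with the Zariski example above, the second space fails because any two non-empty open sets intersect.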
4.3 Subspace Topologies
Let X be a topological space with topology τ, and let A be a subset of X. Let τA be the collection of all subsets of A that are of the form V ∩ A for V ∈ τ. Then τA is a topology on the set A. (It is a straightforward exercise to verify that the topological space axioms are satisfied.) The topology τA on A is referred to as the subspace topology on A. Any subset of a Hausdorff space is itself a Hausdorff space (with respect to the subspace topology).
Lemma 4.3 Let X be a metric space with distance function d, and let A be a subset of X. A subset W of A is open with respect to the subspace topology on A if and only if, given any point w of W, there exists some δ > 0 such that {a ∈ A : d(a, w) < δ} ⊂ W. Thus the subspace topology on A coincides with the topology on A obtained on regarding A as a metric space (with respect to the distance function d).
Proof Suppose that W is open with respect to the subspace topology on A. Then there exists some open set U in X such that W = U ∩ A. Let w be a point of W. Then there exists some δ > 0 such that {x ∈ X : d(x, w) < δ} ⊂ U. But then {a ∈ A : d(a, w) < δ} ⊂ U ∩ A = W.
Conversely, suppose that W is a subset of A with the property that, for any w ∈ W, there exists some δw > 0 such that {a ∈ A : d(a, w) < δw} ⊂ W. Define U to be the union of the open balls BX(w, δw) as w ranges over all points of W, where BX(w, δw) = {x ∈ X : d(x, w) < δw}. The set U is an open set in X, since each open ball BX(w, δw) is an open set in X, and any union of open sets is itself an open set. Moreover BX(w, δw) ∩ A = {a ∈ A : d(a, w) < δw} ⊂ W for any w ∈ W. Therefore U ∩ A ⊂ W. However W ⊂ U ∩ A, since W ⊂ A and {w} ⊂ BX(w, δw) ⊂ U for any w ∈ W. Thus W = U ∩ A, where U is an open set in X. We deduce that W is open with respect to the subspace topology on A.
Example Let X be any subset of n-dimensional Euclidean space Rⁿ. Then the subspace topology on X coincides with the topology on X generated by the Euclidean distance function on X. We refer to this topology as the usual topology on X.
Let X be a topological space, and let A be a subset of X. One can readily verify the following:
• a subset B of A is closed in A (relative to the subspace topology on A) if and only if B = A ∩ F for some closed subset F of X; • if A is itself open in X then a subset B of A is open in A if and only if it is open in X; • if A is itself closed in X then a subset B of A is closed in A if and only if it is closed in X.
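For finite spaces the subspace topology τA can be computed directly from its definition as the collection of sets V ∩ A. The sketch below is illustrative; the name `subspace_topology` is an assumption, not notation from the notes.

```python
def subspace_topology(opens, A):
    """Return the collection {V ∩ A : V open}, i.e. the subspace topology
    on A induced by the topology `opens` on the ambient space."""
    A = frozenset(A)
    return {frozenset(V) & A for V in opens}

tau = [set(), {1}, {1, 2}, {1, 2, 3}]
tau_A = subspace_topology(tau, {2, 3})
# The open sets of A = {2, 3} are the traces of the open sets of X on A.
print(sorted(sorted(U) for U in tau_A))
```

Here A = {2, 3} is closed but not open in X, and {2} is open in A without being open in X, which is consistent with the second bullet point above requiring A itself to be open.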
4.4 Continuous Functions between Topological Spaces
Definition A function f : X → Y from a topological space X to a topological space Y is said to be continuous if f −1 (V ) is an open set in X for every open set V in Y , where f −1 (V ) ≡ {x ∈ X : f (x) ∈ V }. A continuous function from X to Y is often referred to as a map from X to Y . Lemma 4.4 Let X, Y and Z be topological spaces, and let f : X → Y and g: Y → Z be continuous functions. Then the composition g ◦ f : X → Z of the functions f and g is continuous. Proof Let V be an open set in Z. Then g −1 (V ) is open in Y (since g is continuous), and hence f −1 (g −1 (V )) is open in X (since f is continuous). But f −1 (g −1 (V )) = (g ◦ f )−1 (V ). Thus the composition function g ◦ f is continuous. Lemma 4.5 Let X and Y be topological spaces, and let f : X → Y be a function from X to Y . The function f is continuous if and only if f −1 (G) is closed in X for every closed subset G of Y . Proof If G is any subset of Y then X \ f −1 (G) = f −1 (Y \ G) (i.e., the complement of the preimage of G is the preimage of the complement of G). The result therefore follows immediately from the definitions of continuity and closed sets. We now show that, if a topological space X is the union of a finite collection of closed sets, and if a function from X to some topological space is continuous on each of these closed sets, then that function is continuous on X.
Lemma 4.6 Let X and Y be topological spaces, let f: X → Y be a function from X to Y, and let X = A1 ∪ A2 ∪ · · · ∪ Ak, where A1, A2, . . . , Ak are closed sets in X. Suppose that the restriction of f to the closed set Ai is continuous for i = 1, 2, . . . , k. Then f: X → Y is continuous.
Proof Let V be an open set in Y. We must show that f⁻¹(V) is open in X. Now the preimage of the open set V under the restriction f|Ai of f to Ai is f⁻¹(V) ∩ Ai. It follows from the continuity of f|Ai that f⁻¹(V) ∩ Ai is relatively open in Ai for each i, and hence there exist open sets U1, U2, . . . , Uk in X such that f⁻¹(V) ∩ Ai = Ui ∩ Ai for i = 1, 2, . . . , k. Let Wi = Ui ∪ (X \ Ai) for i = 1, 2, . . . , k. Then Wi is an open set in X (as it is the union of the open sets Ui and X \ Ai), and Wi ∩ Ai = Ui ∩ Ai = f⁻¹(V) ∩ Ai for each i. We claim that f⁻¹(V) = W1 ∩ W2 ∩ · · · ∩ Wk. Let W = W1 ∩ W2 ∩ · · · ∩ Wk. Then f⁻¹(V) ⊂ W, since f⁻¹(V) ⊂ Wi for each i. Also
\[ W = \bigcup_{i=1}^{k} (W \cap A_i) \subset \bigcup_{i=1}^{k} (W_i \cap A_i) = \bigcup_{i=1}^{k} (f^{-1}(V) \cap A_i) \subset f^{-1}(V), \]
since X = A1 ∪ A2 ∪ · · · ∪ Ak and Wi ∩ Ai = f⁻¹(V) ∩ Ai for each i. Therefore f⁻¹(V) = W. But W is open in X, since it is the intersection of a finite collection of open sets. We have thus shown that f⁻¹(V) is open in X for any open set V in Y. Thus f: X → Y is continuous, as required.
Alternative Proof A function f: X → Y is continuous if and only if f⁻¹(G) is closed in X for every closed set G in Y (Lemma 4.5). Let G be a closed set in Y. Then f⁻¹(G) ∩ Ai is relatively closed in Ai for i = 1, 2, . . . , k, since the restriction of f to Ai is continuous for each i. But Ai is closed in X, and therefore a subset of Ai is relatively closed in Ai if and only if it is closed in X. Therefore f⁻¹(G) ∩ Ai is closed in X for i = 1, 2, . . . , k. Now f⁻¹(G) is the union of the sets f⁻¹(G) ∩ Ai for i = 1, 2, . . . , k. It follows that f⁻¹(G), being a finite union of closed sets, is itself closed in X. It now follows from Lemma 4.5 that f: X → Y is continuous.
Example Let Y be a topological space, and let α: [0, 1] → Y and β: [0, 1] → Y be continuous functions defined on the interval [0, 1], where α(1) = β(0). Let γ: [0, 1] → Y be defined by
\[ \gamma(t) = \begin{cases} \alpha(2t) & \text{if } 0 \le t \le \tfrac{1}{2}; \\ \beta(2t - 1) & \text{if } \tfrac{1}{2} \le t \le 1. \end{cases} \]
Now γ|[0, ½] = α ◦ ρ where ρ: [0, ½] → [0, 1] is the continuous function defined by ρ(t) = 2t for all t ∈ [0, ½]. Thus γ|[0, ½] is continuous, being a composition of two continuous functions. Similarly γ|[½, 1] is continuous. The subintervals [0, ½] and [½, 1] are closed in [0, 1], and [0, 1] is the union of these two subintervals. It follows from Lemma 4.6 that γ: [0, 1] → Y is continuous.
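The construction of γ from α and β in the example above can be sketched in code. The function `concatenate` and the two sample paths in the plane are illustrative assumptions; the notes only require that α(1) = β(0).

```python
def concatenate(alpha, beta):
    """Return the path gamma that follows alpha on [0, 1/2] and beta on
    [1/2, 1], each traversed at double speed.  The two definitions agree
    at t = 1/2 provided alpha(1) == beta(0)."""
    def gamma(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        return alpha(2 * t) if t <= 0.5 else beta(2 * t - 1)
    return gamma

# Two hypothetical paths in the plane with alpha(1) = beta(0) = (1, 0).
alpha = lambda t: (t, 0.0)      # along the bottom edge of the unit square
beta = lambda t: (1.0, t)       # up the right-hand edge
gamma = concatenate(alpha, beta)
print(gamma(0.25), gamma(0.5), gamma(0.75))
```

The condition α(1) = β(0) is exactly what makes γ well defined at t = ½, where the two closed subintervals overlap.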
4.5 Homeomorphisms
Definition Let X and Y be topological spaces. A function h: X → Y is said to be a homeomorphism if and only if the following conditions are satisfied: • the function h: X → Y is both injective and surjective (so that the function h: X → Y has a well-defined inverse h−1 : Y → X), • the function h: X → Y and its inverse h−1 : Y → X are both continuous. Two topological spaces X and Y are said to be homeomorphic if there exists a homeomorphism h: X → Y from X to Y . If h: X → Y is a homeomorphism between topological spaces X and Y then h induces a one-to-one correspondence between the open sets of X and the open sets of Y . Thus the topological spaces X and Y can be regarded as being identical as topological spaces.
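On finite spaces both continuity and the homeomorphism conditions can be checked by enumerating preimages. The sketch below is illustrative: the helper names are hypothetical and functions are represented as Python dictionaries.

```python
def preimage(f, domain, V):
    """Return f^{-1}(V) = {x in domain : f(x) in V} for a function given as a dict."""
    return frozenset(x for x in domain if f[x] in V)

def is_continuous(f, X, tau_X, tau_Y):
    """f is continuous iff the preimage of every open set of Y is open in X."""
    open_X = {frozenset(U) for U in tau_X}
    return all(preimage(f, X, V) in open_X for V in tau_Y)

def is_homeomorphism(h, X, tau_X, Y, tau_Y):
    """h must be a bijection, and both h and its inverse must be continuous."""
    image = [h[x] for x in X]
    if len(set(image)) != len(set(X)) or set(image) != set(Y):
        return False
    inverse = {h[x]: x for x in X}
    return (is_continuous(h, X, tau_X, tau_Y)
            and is_continuous(inverse, Y, tau_Y, tau_X))

X, tau_X = {1, 2}, [set(), {1}, {2}, {1, 2}]        # discrete topology
Y, tau_Y = {"a", "b"}, [set(), {"a"}, {"a", "b"}]   # a coarser topology
h = {1: "a", 2: "b"}
print(is_continuous(h, X, tau_X, tau_Y))
print(is_homeomorphism(h, X, tau_X, Y, tau_Y))
```

The map h in this example is a continuous bijection whose inverse is not continuous, which shows why the definition asks for continuity of the inverse separately.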
4.6 Sequences and Convergence
Definition A sequence x1 , x2 , x3 , . . . of points in a topological space X is said to converge to a point p of X if, given any open set U containing the point p, there exists some natural number N such that xj ∈ U for all j ≥ N . If the sequence (xj ) converges to p then we refer to p as a limit of the sequence. This definition of convergence generalizes the definition of convergence for a sequence of points in a metric space. It can happen that a sequence of points in a topological space can have more than one limit. For example, consider the set R of real numbers with the Zariski topology. (The open sets of R in the Zariski topology are the empty set and those subsets of R whose complements are finite.) Let x1 , x2 , x3 , . . . be the sequence in R defined by xj = j for all natural numbers j. One can readily check that this sequence converges to every real number p (with respect to the Zariski topology on R). Lemma 4.7 A sequence x1 , x2 , x3 , . . . of points in a Hausdorff space X converges to at most one limit.
Proof Suppose that p and q were limits of the sequence (xj), where p ≠ q. Then there would exist open sets U and V such that p ∈ U, q ∈ V and U ∩ V = ∅, since X is a Hausdorff space. But then there would exist natural numbers N1 and N2 such that xj ∈ U for all j satisfying j ≥ N1 and xj ∈ V for all j satisfying j ≥ N2. But then xj ∈ U ∩ V for all j satisfying j ≥ N1 and j ≥ N2, which is impossible, since U ∩ V = ∅. This contradiction shows that the sequence (xj) has at most one limit.
Lemma 4.8 Let X be a topological space, and let F be a closed set in X. Let (xj : j ∈ N) be a sequence of points in F. Suppose that the sequence (xj) converges to some point p of X. Then p ∈ F.
Proof Suppose that p were a point belonging to the complement X \ F of F. Now X \ F is open (since F is closed). Therefore there would exist some natural number N such that xj ∈ X \ F for all values of j satisfying j ≥ N, contradicting the fact that xj ∈ F for all j. This contradiction shows that p must belong to F, as required.
Lemma 4.9 Let f: X → Y be a continuous function between topological spaces X and Y, and let x1, x2, x3, . . . be a sequence of points in X which converges to some point p of X. Then the sequence f(x1), f(x2), f(x3), . . . converges to f(p).
Proof Let V be an open set in Y which contains the point f(p). Then f⁻¹(V) is an open set in X which contains the point p. It follows that there exists some natural number N such that xj ∈ f⁻¹(V) whenever j ≥ N. But then f(xj) ∈ V whenever j ≥ N. We deduce that the sequence f(x1), f(x2), f(x3), . . . converges to f(p), as required.
4.7 Neighbourhoods, Closures and Interiors
Definition Let X be a topological space, and let x be a point of X. Let N be a subset of X which contains the point x. Then N is said to be a neighbourhood of the point x if and only if there exists an open set U for which x ∈ U and U ⊂ N.
One can readily verify that this definition of neighbourhoods in topological spaces is consistent with that for neighbourhoods in metric spaces.
Lemma 4.10 Let X be a topological space. A subset V of X is open in X if and only if V is a neighbourhood of each point belonging to V.
Proof It follows directly from the definition of neighbourhoods that an open set V is a neighbourhood of any point belonging to V. Conversely, suppose that V is a subset of X which is a neighbourhood of each v ∈ V. Then, given any point v of V, there exists an open set Uv such that v ∈ Uv and Uv ⊂ V. Thus V is an open set, since it is the union of the open sets Uv as v ranges over all points of V.
Definition Let X be a topological space and let A be a subset of X. The closure Ā of A in X is defined to be the intersection of all of the closed subsets of X that contain A. The interior A⁰ of A in X is defined to be the union of all of the open subsets of X that are contained in A.
Let X be a topological space and let A be a subset of X. It follows directly from the definition of Ā that the closure Ā of A is uniquely characterized by the following two properties:
(i) the closure Ā of A is a closed set containing A,
(ii) if F is any closed set containing A then F contains Ā.
Similarly the interior A⁰ of A is uniquely characterized by the following two properties:
(i) the interior A⁰ of A is an open set contained in A,
(ii) if U is any open set contained in A then U is contained in A⁰.
Moreover a point x of A belongs to the interior A⁰ of A if and only if A is a neighbourhood of x.
Lemma 4.11 Let X be a topological space, and let A be a subset of X. Suppose that a sequence x1, x2, x3, . . . of points of A converges to some point p of X. Then p belongs to the closure Ā of A.
Proof If F is any closed set containing A then xj ∈ F for all j, and therefore p ∈ F, by Lemma 4.8. Therefore p ∈ Ā by definition of Ā.
Definition Let X be a topological space, and let A be a subset of X. We say that A is dense in X if Ā = X.
Example The set of all rational numbers is dense in R.
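The definitions of closure and interior translate directly into set operations on a finite topological space. The following sketch is illustrative only; `closure` and `interior` are hypothetical helper names, and `opens` is assumed to be a topology on `points`.

```python
def interior(opens, A):
    """Union of all open sets contained in A."""
    A = frozenset(A)
    result = frozenset()
    for U in opens:
        U = frozenset(U)
        if U <= A:
            result |= U
    return result

def closure(points, opens, A):
    """Intersection of all closed sets (complements of open sets) containing A."""
    X = frozenset(points)
    A = frozenset(A)
    result = X
    for U in opens:
        F = X - frozenset(U)     # F is closed, being the complement of an open set
        if A <= F:
            result &= F
    return result

X = {1, 2, 3}
tau = [set(), {1}, {1, 2}, X]
print(sorted(closure(X, tau, {2})))     # smallest closed set containing 2
print(sorted(interior(tau, {1, 3})))    # largest open set inside {1, 3}
print(closure(X, tau, {1}) == X)        # {1} is dense in this space
```

In this example the closed sets are ∅, {3}, {2, 3} and X, so the only closed set containing the point 1 is X itself.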
4.8 Product Topologies
The Cartesian product X1 × X2 × · · · × Xn of sets X1 , X2 , . . . , Xn is defined to be the set of all ordered n-tuples (x1 , x2 , . . . , xn ), where xi ∈ Xi for i = 1, 2, . . . , n. The sets R2 and R3 are the Cartesian products R × R and R × R × R respectively. Cartesian products of sets are employed as the domains of functions of several variables. For example, if X, Y and Z are sets, and if an element f (x, y) of Z is determined for each choice of an element x of X and an element y of Y , then we have a function f : X × Y → Z whose domain is the Cartesian product X × Y of X and Y : this function sends the ordered pair (x, y) to f (x, y) for all x ∈ X and y ∈ Y . Definition Let X1 , X2 , . . . , Xn be topological spaces. A subset U of the Cartesian product X1 × X2 × · · · × Xn is said to be open (with respect to the product topology) if, given any point p of U , there exist open sets Vi in Xi for i = 1, 2, . . . , n such that {p} ⊂ V1 × V2 × · · · × Vn ⊂ U . Lemma 4.12 Let X1 , X2 , . . . , Xn be topological spaces. Then the collection of open sets in X1 × X2 × · · · × Xn is a topology on X1 × X2 × · · · × Xn . Proof Let X = X1 × X2 × · · · × Xn . The definition of open sets ensures that the empty set and the whole set X are open in X. We must prove that any union or finite intersection of open sets in X is an open set. Let E be a union of a collection of open sets in X and let p be a point of E. Then p ∈ D for some open set D in the collection. It follows from this that there exist open sets Vi in Xi for i = 1, 2, . . . , n such that {p} ⊂ V1 × V2 × · · · × Vn ⊂ D ⊂ E. Thus E is open in X. Let U = U1 ∩U2 ∩· · ·∩Um , where U1 , U2 , . . . , Um are open sets in X, and let p be a point of U . Then there exist open sets Vki in Xi for k = 1, 2, . . . , m and i = 1, 2, . . . , n such that {p} ⊂ Vk1 × Vk2 × · · · × Vkn ⊂ Uk for k = 1, 2, . . . , m. Let Vi = V1i ∩ V2i ∩ · · · ∩ Vmi for i = 1, 2, . . . , n. 
Then {p} ⊂ V1 × V2 × · · · × Vn ⊂ Vk1 × Vk2 × · · · × Vkn ⊂ Uk for k = 1, 2, . . . , m, and hence {p} ⊂ V1 × V2 × · · · × Vn ⊂ U . It follows that U is open in X, as required.
Lemma 4.13 Let X1, X2, . . . , Xn and Z be topological spaces. Then a function f: X1 × X2 × · · · × Xn → Z is continuous if and only if, given any point p of X1 × X2 × · · · × Xn, and given any open set U in Z containing f(p), there exist open sets Vi in Xi for i = 1, 2, . . . , n such that p ∈ V1 × V2 × · · · × Vn and f(V1 × V2 × · · · × Vn) ⊂ U.
Proof Let Vi be an open set in Xi for i = 1, 2, . . . , n, and let U be an open set in Z. Then V1 × V2 × · · · × Vn ⊂ f⁻¹(U) if and only if f(V1 × V2 × · · · × Vn) ⊂ U. It follows that f⁻¹(U) is open in the product topology on X1 × X2 × · · · × Xn if and only if, given any point p of X1 × X2 × · · · × Xn satisfying f(p) ∈ U, there exist open sets Vi in Xi for i = 1, 2, . . . , n such that p ∈ V1 × V2 × · · · × Vn and f(V1 × V2 × · · · × Vn) ⊂ U. The required result now follows from the definition of continuity.
Let X1, X2, . . . , Xn be topological spaces, and let Vi be an open set in Xi for i = 1, 2, . . . , n. It follows directly from the definition of the product topology that V1 × V2 × · · · × Vn is open in X1 × X2 × · · · × Xn.
Theorem 4.14 Let X = X1 × X2 × · · · × Xn, where X1, X2, . . . , Xn are topological spaces and X is given the product topology, and for each i, let pi: X → Xi denote the projection function which sends (x1, x2, . . . , xn) ∈ X to xi. Then the functions p1, p2, . . . , pn are continuous. Moreover a function f: Z → X mapping a topological space Z into X is continuous if and only if pi ◦ f: Z → Xi is continuous for i = 1, 2, . . . , n.
Proof Let V be an open set in Xi. Then pi⁻¹(V) = X1 × · · · × Xi−1 × V × Xi+1 × · · · × Xn, and therefore pi⁻¹(V) is open in X. Thus pi: X → Xi is continuous for all i. Let f: Z → X be continuous. Then, for each i, pi ◦ f: Z → Xi is a composition of continuous functions, and is thus itself continuous. Conversely suppose that f: Z → X is a function with the property that pi ◦ f is continuous for all i. Let U be an open set in X.
We must show that f⁻¹(U) is open in Z. Let z be a point of f⁻¹(U), and let f(z) = (u1, u2, . . . , un). Now U is open in X, and therefore there exist open sets V1, V2, . . . , Vn in X1, X2, . . . , Xn respectively such that ui ∈ Vi for all i and V1 × V2 × · · · × Vn ⊂ U. Let Nz = f1⁻¹(V1) ∩ f2⁻¹(V2) ∩ · · · ∩ fn⁻¹(Vn), where fi = pi ◦ f for i = 1, 2, . . . , n. Now fi⁻¹(Vi) is an open subset of Z for i = 1, 2, . . . , n, since Vi is open in Xi and fi: Z → Xi is continuous. Thus Nz, being a finite intersection of open sets, is itself open in Z. Moreover f(Nz) ⊂ V1 × V2 × · · · × Vn ⊂ U,
so that Nz ⊂ f⁻¹(U). It follows that f⁻¹(U) is the union of the open sets Nz as z ranges over all points of f⁻¹(U). Therefore f⁻¹(U) is open in Z. This shows that f: Z → X is continuous, as required.
Proposition 4.15 The usual topology on Rⁿ coincides with the product topology on Rⁿ obtained on regarding Rⁿ as the Cartesian product R × R × · · · × R of n copies of the real line R.
Proof We must show that a subset U of Rⁿ is open with respect to the usual topology if and only if it is open with respect to the product topology. Let U be a subset of Rⁿ that is open with respect to the usual topology, and let u ∈ U. Then there exists some δ > 0 such that B(u, δ) ⊂ U, where B(u, δ) = {x ∈ Rⁿ : |x − u| < δ}. Let I1, I2, . . . , In be the open intervals in R defined by
\[ I_i = \Bigl\{ t \in \mathbb{R} : u_i - \frac{\delta}{\sqrt{n}} < t < u_i + \frac{\delta}{\sqrt{n}} \Bigr\} \]
for i = 1, 2, . . . , n. Then I1, I2, . . . , In are open sets in R. Moreover {u} ⊂ I1 × I2 × · · · × In ⊂ B(u, δ) ⊂ U, since
\[ |x - u|^2 = \sum_{i=1}^{n} (x_i - u_i)^2 < n \Bigl( \frac{\delta}{\sqrt{n}} \Bigr)^2 = \delta^2 \]
for all x ∈ I1 × I2 × · · · × In. This shows that any subset U of Rⁿ that is open with respect to the usual topology on Rⁿ is also open with respect to the product topology on Rⁿ.
Conversely suppose that U is a subset of Rⁿ that is open with respect to the product topology on Rⁿ, and let u ∈ U. Then there exist open sets V1, V2, . . . , Vn in R containing u1, u2, . . . , un respectively such that V1 × V2 × · · · × Vn ⊂ U. Now we can find δ1, δ2, . . . , δn such that δi > 0 and (ui − δi, ui + δi) ⊂ Vi for all i. Let δ > 0 be the minimum of δ1, δ2, . . . , δn. Then B(u, δ) ⊂ V1 × V2 × · · · × Vn ⊂ U, for if x ∈ B(u, δ) then |xi − ui| < δi for i = 1, 2, . . . , n. This shows that any subset U of Rⁿ that is open with respect to the product topology on Rⁿ is also open with respect to the usual topology on Rⁿ.
The following result is now an immediate corollary of Proposition 4.15 and Theorem 4.14.
Corollary 4.16 Let X be a topological space and let f: X → Rⁿ be a function from X to Rⁿ. Let us write f(x) = (f1(x), f2(x), . . . , fn(x)) for all x ∈ X, where the components f1, f2, . . . , fn of f are functions from X to R. The function f is continuous if and only if its components f1, f2, . . . , fn are all continuous.
Let f: X → R and g: X → R be continuous real-valued functions on some topological space X. We claim that f + g, f − g and f.g are continuous. Now it is a straightforward exercise to verify that the sum and product functions s: R² → R and p: R² → R defined by s(x, y) = x + y and p(x, y) = xy are continuous, and f + g = s ◦ h and f.g = p ◦ h, where h: X → R² is defined by h(x) = (f(x), g(x)). Moreover it follows from Corollary 4.16 that the function h is continuous, and compositions of continuous functions are continuous. Therefore f + g and f.g are continuous, as claimed. Also −g is continuous, and f − g = f + (−g), and therefore f − g is continuous. If in addition the continuous function g is non-zero everywhere on X then 1/g is continuous (since 1/g is the composition of g with the reciprocal function t ↦ 1/t), and therefore f/g is continuous.
Lemma 4.17 The Cartesian product X1 × X2 × · · · × Xn of Hausdorff spaces X1, X2, . . . , Xn is Hausdorff.
Proof Let X = X1 × X2 × · · · × Xn, and let u and v be distinct points of X, where u = (x1, x2, . . . , xn) and v = (y1, y2, . . . , yn). Then xi ≠ yi for some integer i between 1 and n. But then there exist open sets U and V in Xi such that xi ∈ U, yi ∈ V and U ∩ V = ∅ (since Xi is a Hausdorff space). Let pi: X → Xi denote the projection function. Then pi⁻¹(U) and pi⁻¹(V) are open sets in X, since pi is continuous. Moreover u ∈ pi⁻¹(U), v ∈ pi⁻¹(V), and pi⁻¹(U) ∩ pi⁻¹(V) = ∅. Thus X is Hausdorff, as required.
4.9 Cut and Paste Constructions
Suppose we start out with a square of paper. If we join together two opposite edges of this square we obtain a cylinder. The boundary of the cylinder consists of two circles. If we join together the two boundary circles we obtain a torus (which corresponds to the surface of a doughnut). Let the square be represented by the set [0, 1] × [0, 1] consisting of all ordered pairs (s, t) where s and t are real numbers between 0 and 1. There is an equivalence relation on the square [0, 1] × [0, 1], where points (s, t) and
(u, v) of the square are related if and only if at least one of the following conditions is satisfied: • s = u and t = v; • s = 0, u = 1 and t = v; • s = 1, u = 0 and t = v; • t = 0, v = 1 and s = u; • t = 1, v = 0 and s = u; • (s, t) and (u, v) both belong to {(0, 0), (0, 1), (1, 0), (1, 1)}. Note that if 0 < s < 1 and 0 < t < 1 then the equivalence class of the point (s, t) is the set {(s, t)} consisting of that point. If s = 0 or 1 and if 0 < t < 1 then the equivalence class of (s, t) is the set {(0, t), (1, t)}. Similarly if t = 0 or 1 and if 0 < s < 1 then the equivalence class of (s, t) is the set {(s, 0), (s, 1)}. The equivalence class of each corner of the square is the set {(0, 0), (1, 0), (0, 1), (1, 1)} consisting of all four corners. Thus each equivalence class contains either one point in the interior of the square, or two points on opposite edges of the square, or four points at the four corners of the square. Let T 2 denote the set of these equivalence classes. We have a map q: [0, 1] × [0, 1] → T 2 which sends each point (s, t) of the square to its equivalence class. Each element of the set T 2 is the image of one, two or four points of the square. The elements of T 2 represent points on the torus obtained from the square by first joining together two opposite sides of the square to form a cylinder and then joining together the boundary circles of this cylinder as described above. We say that the torus T 2 is obtained from the square [0, 1] × [0, 1] by identifying the points (0, t) and (1, t) for all t ∈ [0, 1] and identifying the points (s, 0) and (s, 1) for all s ∈ [0, 1]. The topology on the square [0, 1]×[0, 1] induces a corresponding topology on the set T 2 , where a subset U of T 2 is open in T 2 if and only if q −1 (U ) is open in the square [0, 1] × [0, 1]. (The fact that these open sets in T 2 constitute a topology on the set T 2 is a consequence of Lemma 4.18.) The function q: [0, 1] × [0, 1] → T 2 is then a continuous surjection. 
We say that the topological space T² is the identification space obtained from the square [0, 1] × [0, 1] by identifying points on the sides of the square as described above. The continuous map q from the square to the torus is an example of an identification map, and the topology on the torus T² is referred to as the quotient topology on T² induced by the identification map q: [0, 1] × [0, 1] → T².
Another well-known identification space obtained from the square is the Klein bottle (Kleinsche Flasche). The Klein bottle K 2 is obtained from the square [0, 1] × [0, 1] by identifying (0, t) with (1, 1 − t) for all t ∈ [0, 1] and identifying (s, 0) with (s, 1) for all s ∈ [0, 1]. These identifications correspond to an equivalence relation on the square, where points (s, t) and (u, v) of the square are equivalent if and only if one of the following conditions is satisfied: • s = u and t = v; • s = 0, u = 1 and t = 1 − v; • s = 1, u = 0 and t = 1 − v; • t = 0, v = 1 and s = u; • t = 1, v = 0 and s = u; • (s, t) and (u, v) both belong to {(0, 0), (0, 1), (1, 0), (1, 1)}. The corresponding set of equivalence classes is the Klein bottle K 2 . Thus each point of the Klein bottle K 2 represents an equivalence class consisting of either one point in the interior of the square, or two points (0, t) and (1, 1 − t) with 0 < t < 1 on opposite edges of the square, or two points (s, 0) and (s, 1) with 0 < s < 1 on opposite edges of the square, or the four corners of the square. There is a surjection r: [0, 1] × [0, 1] → K 2 from the square to the Klein bottle that sends each point of the square to its equivalence class. The identifications used to construct the Klein bottle ensure that r(0, t) = r(1, 1 − t) for all t ∈ [0, 1] and r(s, 0) = r(s, 1) for all s ∈ [0, 1]. One can construct a quotient topology on the Klein bottle K 2 , where a subset U of K 2 is open in K 2 if and only if its preimage r−1 (U ) is open in the square [0, 1] × [0, 1].
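The equivalence classes described for the torus and the Klein bottle can be written out explicitly. The sketch below is illustrative: the function names are hypothetical, and points of the square are represented as pairs of Python numbers.

```python
CORNERS = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})

def torus_class(s, t):
    """Equivalence class of (s, t) when (0, t) ~ (1, t) and (s, 0) ~ (s, 1)."""
    s_values = {0, 1} if s in (0, 1) else {s}
    t_values = {0, 1} if t in (0, 1) else {t}
    return frozenset((u, v) for u in s_values for v in t_values)

def klein_class(s, t):
    """Equivalence class of (s, t) when (0, t) ~ (1, 1 - t) and (s, 0) ~ (s, 1)."""
    if (s, t) in CORNERS:
        return CORNERS                       # all four corners are identified
    if s in (0, 1):
        return frozenset({(s, t), (1 - s, 1 - t)})   # left/right edges, with a flip
    if t in (0, 1):
        return frozenset({(s, t), (s, 1 - t)})       # bottom/top edges, no flip
    return frozenset({(s, t)})               # interior points stand alone

print(sorted(torus_class(0, 0.25)))   # two points at the same height
print(sorted(klein_class(0, 0.25)))   # the partner on the right edge is flipped
print(len(torus_class(1, 1)), len(klein_class(1, 1)))
```

The only difference between the two constructions is the flip t ↦ 1 − t applied when the left-hand and right-hand edges are glued.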
4.10 Identification Maps and Quotient Topologies
Definition Let X and Y be topological spaces and let q: X → Y be a function from X to Y . The function q is said to be an identification map if and only if the following conditions are satisfied: • the function q: X → Y is surjective, • a subset U of Y is open in Y if and only if q −1 (U ) is open in X.
It follows directly from the definition that any identification map is continuous. Moreover, in order to show that a continuous surjection q: X → Y is an identification map, it suffices to prove that if V is a subset of Y with the property that q⁻¹(V) is open in X then V is open in Y.
Example Let S¹ denote the unit circle {(x, y) ∈ R² : x² + y² = 1} in R², and let q: [0, 1] → S¹ be the continuous map defined by q(t) = (cos 2πt, sin 2πt) for all t ∈ [0, 1]. We show that q: [0, 1] → S¹ is an identification map. This map is continuous and surjective. It remains to show that if V is a subset of S¹ with the property that q⁻¹(V) is open in [0, 1] then V is open in S¹. Note that |q(s) − q(t)| = 2|sin π(s − t)| for all s, t ∈ [0, 1] satisfying |s − t| ≤ ½. Let V be a subset of S¹ with the property that q⁻¹(V) is open in [0, 1], and let v be an element of V. We show that there exists ε > 0 such that all points u of S¹ satisfying |u − v| < ε belong to V. We consider separately the cases when v = (1, 0) and when v ≠ (1, 0).
Suppose that v = (1, 0). Then (1, 0) ∈ V, and hence 0 ∈ q⁻¹(V) and 1 ∈ q⁻¹(V). But q⁻¹(V) is open in [0, 1]. It follows that there exists a real number δ satisfying 0 < δ < ½ such that [0, δ) ⊂ q⁻¹(V) and (1 − δ, 1] ⊂ q⁻¹(V). Let ε = 2 sin πδ. Now if −π ≤ θ ≤ π then the Euclidean distance between the points (1, 0) and (cos θ, sin θ) is 2 sin(½|θ|). Moreover, this distance increases monotonically as |θ| increases from 0 to π. Thus any point on the unit circle S¹ whose distance from (1, 0) is less than ε must be of the form (cos θ, sin θ), where |θ| < 2πδ. Thus if u ∈ S¹ satisfies |u − v| < ε then u = q(s) for some s ∈ [0, 1] satisfying either 0 ≤ s < δ or 1 − δ < s ≤ 1. But then s ∈ q⁻¹(V), and hence u ∈ V.
Next suppose that v ≠ (1, 0). Then v = q(t) for some real number t satisfying 0 < t < 1. But q⁻¹(V) is open in [0, 1], and t ∈ q⁻¹(V).
It follows that (t − δ, t + δ) ⊂ q⁻¹(V) for some real number δ satisfying δ > 0. Let ε = 2 sin πδ. If u ∈ S¹ satisfies |u − v| < ε then u = q(s) for some s ∈ (t − δ, t + δ). But then s ∈ q⁻¹(V), and hence u ∈ V.
We have thus shown that if V is a subset of S¹ with the property that q⁻¹(V) is open in [0, 1] then there exists ε > 0 such that u ∈ V for all elements u of S¹ satisfying |u − v| < ε. It follows from this that V is open in S¹. Thus the continuous surjection q: [0, 1] → S¹ is an identification map.
Lemma 4.18 Let X be a topological space, let Y be a set, and let q: X → Y be a surjection. Then there is a unique topology on Y for which the function q: X → Y is an identification map.
Proof Let τ be the collection consisting of all subsets U of Y for which q⁻¹(U) is open in X. Now q⁻¹(∅) = ∅, and q⁻¹(Y) = X, so that ∅ ∈ τ and
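Y ∈ τ. If {Vα : α ∈ A} is any collection of subsets of Y indexed by a set A, then it is a straightforward exercise to verify that
\[ \bigcup_{\alpha \in A} q^{-1}(V_\alpha) = q^{-1}\Bigl( \bigcup_{\alpha \in A} V_\alpha \Bigr), \qquad \bigcap_{\alpha \in A} q^{-1}(V_\alpha) = q^{-1}\Bigl( \bigcap_{\alpha \in A} V_\alpha \Bigr) \]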
(i.e., given any collection of subsets of Y , the union of the preimages of the sets is the preimage of the union of those sets, and the intersection of the preimages of the sets is the preimage of the intersection of those sets). It follows easily from this that unions and finite intersections of sets belonging to τ must themselves belong to τ . Thus τ is a topology on Y , and the function q: X → Y is an identification map with respect to the topology τ . Clearly τ is the unique topology on Y for which the function q: X → Y is an identification map. Let X be a topological space, let Y be a set, and let q: X → Y be a surjection. The unique topology on Y for which the function q is an identification map is referred to as the quotient topology (or identification topology) on Y . Let ∼ be an equivalence relation on a topological space X. If Y is the corresponding set of equivalence classes of elements of X then there is a surjection q: X → Y that sends each element of X to its equivalence class. Lemma 4.18 ensures that there is a well-defined quotient topology on Y , where a subset U of Y is open in Y if and only if q −1 (U ) is open in X. (Appropriate equivalence relations on the square yield the torus and the Klein bottle, as discussed above.) Lemma 4.19 Let X and Y be topological spaces and let q: X → Y be an identification map. Let Z be a topological space, and let f : Y → Z be a function from Y to Z. Then the function f is continuous if and only if the composition function f ◦ q: X → Z is continuous. Proof Suppose that f is continuous. Then the composition function f ◦ q is a composition of continuous functions and hence is itself continuous. Conversely suppose that f ◦ q is continuous. Let U be an open set in Z. Then q −1 (f −1 (U )) is open in X (since f ◦ q is continuous), and hence f −1 (U ) is open in Y (since the function q is an identification map). Therefore the function f is continuous, as required. 
Example Let S^1 be the unit circle in R^2, and let q: [0, 1] → S^1 be the map that sends t ∈ [0, 1] to (cos 2πt, sin 2πt). Then q: [0, 1] → S^1 is an identification map, and therefore a function f: S^1 → Z from S^1 to some topological space Z is continuous if and only if f ◦ q: [0, 1] → Z is continuous.
Example The Klein bottle K^2 is the identification space obtained from the square [0, 1] × [0, 1] by identifying (0, t) with (1, 1 − t) for all t ∈ [0, 1] and identifying (s, 0) with (s, 1) for all s ∈ [0, 1]. Let q: [0, 1] × [0, 1] → K^2 be the identification map determined by these identifications. Let Z be a topological space. A function g: [0, 1] × [0, 1] → Z mapping the square into Z which satisfies g(0, t) = g(1, 1 − t) for all t ∈ [0, 1] and g(s, 0) = g(s, 1) for all s ∈ [0, 1] determines a corresponding function f: K^2 → Z, where g = f ◦ q. It follows from Lemma 4.19 that the function f: K^2 → Z is continuous if and only if g: [0, 1] × [0, 1] → Z is continuous.

Example Let S^n be the n-sphere, consisting of all points x in R^{n+1} satisfying |x| = 1. Let RP^n be the set of all lines in R^{n+1} passing through the origin (i.e., RP^n is the set of all one-dimensional vector subspaces of R^{n+1}). Let q: S^n → RP^n denote the function which sends a point x of S^n to the element of RP^n represented by the line in R^{n+1} that passes through both x and the origin. Note that each element of RP^n is the image (under q) of exactly two antipodal points x and −x of S^n. The function q induces a corresponding quotient topology on RP^n such that q: S^n → RP^n is an identification map. The set RP^n, with this topology, is referred to as real projective n-space. In particular RP^2 is referred to as the real projective plane. It follows from Lemma 4.19 that a function f: RP^n → Z from RP^n to any topological space Z is continuous if and only if the composition function f ◦ q: S^n → Z is continuous.
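As a small illustration of how Lemma 4.19 is applied to real projective space, the following sketch checks the continuity of one particular real-valued function on RP^n. The functions g and f below are chosen here purely for illustration and do not appear in the notes themselves.

```latex
% Illustrative example (an assumption of this sketch, not part of the notes):
% g is a function on the sphere, f is the function it induces on RP^n.
Let $g\colon S^n \to \mathbb{R}$ be defined by $g(x_1,\ldots,x_{n+1}) = x_1^2$.
Then
\[
  g(-x) = (-x_1)^2 = x_1^2 = g(x) \quad \text{for all } x \in S^n,
\]
so $g$ takes the same value at the two antipodal points $x$ and $-x$ that
$q$ sends to the same element of $\mathbb{R}P^n$. Hence there is a
well-defined function $f\colon \mathbb{R}P^n \to \mathbb{R}$ satisfying
$g = f \circ q$. The function $g$ is continuous, since it is the restriction
to $S^n$ of a polynomial function on $\mathbb{R}^{n+1}$, and therefore $f$
is continuous by Lemma 4.19.
```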
4.11 Connected Topological Spaces
Definition A topological space X is said to be connected if the empty set ∅ and the whole space X are the only subsets of X that are both open and closed.

Lemma 4.20 A topological space X is connected if and only if it has the following property: if U and V are non-empty open sets in X such that X = U ∪ V, then U ∩ V is non-empty.

Proof If U is a subset of X that is both open and closed, and if V = X \ U, then U and V are both open, U ∪ V = X and U ∩ V = ∅. Conversely if U and V are open subsets of X satisfying U ∪ V = X and U ∩ V = ∅, then U = X \ V, and hence U is both open and closed. Thus a topological space X is connected if and only if there do not exist non-empty open sets U and V such that U ∪ V = X and U ∩ V = ∅. The result follows.
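The criterion in Lemma 4.20 can be checked by hand on a two-point set. The sketch below is an illustration added here, with the set {a, b} and the two topologies chosen as assumptions; it compares a connected topology with a disconnected one on the same set.

```latex
% Illustrative example (assumed for this sketch): two topologies on X = {a, b}.
Let $X = \{a, b\}$ with $a \neq b$.

\textbf{(1)} Let $\tau_1 = \{\emptyset, \{a\}, X\}$. The non-empty open sets
are $\{a\}$ and $X$. If $U$ and $V$ are non-empty open sets with
$U \cup V = X$, then at least one of them equals $X$, and so
\[
  U \cap V \in \bigl\{\{a\},\, X\bigr\}, \qquad\text{hence } U \cap V \neq \emptyset .
\]
Thus $(X, \tau_1)$ is connected by Lemma 4.20.

\textbf{(2)} Let $\tau_2 = \{\emptyset, \{a\}, \{b\}, X\}$ (the discrete
topology). Then $U = \{a\}$ and $V = \{b\}$ are non-empty open sets with
\[
  U \cup V = X, \qquad U \cap V = \emptyset ,
\]
so $(X, \tau_2)$ is not connected.
```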
Let Z be the set of integers with the usual topology (i.e., the subspace topology on Z induced by the usual topology on R). Then {n} is open for all n ∈ Z, since {n} = Z ∩ {t ∈ R : |t − n| < 1/2}. It follows that every subset of Z is open (since it is a union of sets consisting of a single element, and any union of open sets is open). It follows that a function f: X → Z on a topological space X is continuous if and only if f^{-1}(V) is open in X for any subset V of Z. We use this fact in the proof of the next theorem.

Proposition 4.21 A topological space X is connected if and only if every continuous function f: X → Z from X to the set Z of integers is constant.

Proof Suppose that X is connected. Let f: X → Z be a continuous function. Choose n ∈ f(X), and let
\[ U = \{x \in X : f(x) = n\}, \qquad V = \{x \in X : f(x) \neq n\}. \]
Then U and V are the preimages of the open subsets {n} and Z \ {n} of Z, and therefore both U and V are open in X. Moreover U ∩ V = ∅, and X = U ∪ V. It follows that V = X \ U, and thus U is both open and closed. Moreover U is non-empty, since n ∈ f(X). It follows from the connectedness of X that U = X, so that f: X → Z is constant, with value n.

Conversely suppose that every continuous function f: X → Z is constant. Let S be a subset of X which is both open and closed. Let f: X → Z be defined by
\[ f(x) = \begin{cases} 1 & \text{if } x \in S; \\ 0 & \text{if } x \notin S. \end{cases} \]
Now the preimage of any subset of Z under f is one of the open sets ∅, S, X \ S and X. Therefore the function f is continuous. But then the function f is constant, so that either S = ∅ or S = X. This shows that X is connected.

Lemma 4.22 The closed interval [a, b] is connected, for all real numbers a and b satisfying a ≤ b.

Proof Let f: [a, b] → Z be a continuous integer-valued function on [a, b]. We show that f is constant on [a, b]. Indeed suppose that f were not constant. Then f(τ) ≠ f(a) for some τ ∈ [a, b]. But the Intermediate Value Theorem would then ensure that, given any real number c between f(a) and f(τ), there would exist some t ∈ [a, τ] for which f(t) = c, and this is clearly impossible, since f is integer-valued. Thus f must be constant on [a, b]. We now deduce from Proposition 4.21 that [a, b] is connected.
Example Let X = {(x, y) ∈ R^2 : x ≠ 0}. The topological space X is not connected. Indeed if f: X → Z is defined by
\[ f(x, y) = \begin{cases} 1 & \text{if } x > 0, \\ -1 & \text{if } x < 0, \end{cases} \]
then f is continuous on X but is not constant.

A concept closely related to that of connectedness is path-connectedness. Let x_0 and x_1 be points in a topological space X. A path in X from x_0 to x_1 is defined to be a continuous function γ: [0, 1] → X such that γ(0) = x_0 and γ(1) = x_1. A topological space X is said to be path-connected if and only if, given any two points x_0 and x_1 of X, there exists a path in X from x_0 to x_1.

Proposition 4.23 Every path-connected topological space is connected.

Proof Let X be a path-connected topological space, and let f: X → Z be a continuous integer-valued function on X. If x_0 and x_1 are any two points of X then there exists a path γ: [0, 1] → X such that γ(0) = x_0 and γ(1) = x_1. But then f ◦ γ: [0, 1] → Z is a continuous integer-valued function on [0, 1]. But [0, 1] is connected (Lemma 4.22), therefore f ◦ γ is constant (Proposition 4.21). It follows that f(x_0) = f(x_1). Thus every continuous integer-valued function on X is constant. Therefore X is connected, by Proposition 4.21.

The topological spaces R, C and R^n are all path-connected. Indeed, given any two points of one of these spaces, the straight line segment joining these two points is a continuous path from one point to the other. Also the n-sphere S^n is path-connected for all n > 0. We conclude that these topological spaces are connected.

Let A be a subset of a topological space X. Using Lemma 4.20 and the definition of the subspace topology, we see that A is connected if and only if the following condition is satisfied:
• if U and V are open sets in X such that A ∩ U and A ∩ V are non-empty and A ⊂ U ∪ V then A ∩ U ∩ V is also non-empty.

Lemma 4.24 Let X be a topological space and let A be a connected subset of X. Then the closure Ā of A is connected.
Proof It follows from the definition of the closure of A that Ā ⊂ F for any closed subset F of X for which A ⊂ F. On taking F to be the complement of some open set U, we deduce that Ā ∩ U = ∅ for any open set U for which A ∩ U = ∅. Thus if U is an open set in X and if Ā ∩ U is non-empty then A ∩ U must also be non-empty.

Now let U and V be open sets in X such that Ā ∩ U and Ā ∩ V are non-empty and Ā ⊂ U ∪ V. Then A ∩ U and A ∩ V are non-empty, and A ⊂ U ∪ V. But A is connected. Therefore A ∩ U ∩ V is non-empty, and thus Ā ∩ U ∩ V is non-empty. This shows that Ā is connected.

Lemma 4.25 Let f: X → Y be a continuous function between topological spaces X and Y, and let A be a connected subset of X. Then f(A) is connected.

Proof Let g: f(A) → Z be any continuous integer-valued function on f(A). Then g ◦ f: A → Z is a continuous integer-valued function on A. It follows from Proposition 4.21 that g ◦ f is constant on A. Therefore g is constant on f(A). We deduce from Proposition 4.21 that f(A) is connected.

Lemma 4.26 The Cartesian product X × Y of connected topological spaces X and Y is itself connected.

Proof Let f: X × Y → Z be a continuous integer-valued function from X × Y to Z. Choose x_0 ∈ X and y_0 ∈ Y. The function x ↦ f(x, y_0) is continuous on X, and is thus constant. Therefore f(x, y_0) = f(x_0, y_0) for all x ∈ X. Now fix x. The function y ↦ f(x, y) is continuous on Y, and is thus constant. Therefore f(x, y) = f(x, y_0) = f(x_0, y_0) for all x ∈ X and y ∈ Y. We deduce from Proposition 4.21 that X × Y is connected.

We deduce immediately that a finite Cartesian product of connected topological spaces is connected.

Proposition 4.27 Let X be a topological space. For each x ∈ X, let S_x be the union of all connected subsets of X that contain x. Then
(i) S_x is connected,
(ii) S_x is closed,
(iii) if x, y ∈ X, then either S_x = S_y, or else S_x ∩ S_y = ∅.
Proof Let f: S_x → Z be a continuous integer-valued function on S_x, for some x ∈ X. Let y be any point of S_x. Then, by definition of S_x, there exists some connected set A containing both x and y. But then f is constant on A, and thus f(x) = f(y). This shows that the function f is constant on S_x. We deduce that S_x is connected. This proves (i). Moreover the closure S̄_x of S_x is connected, by Lemma 4.24. Therefore S̄_x ⊂ S_x. This shows that S_x is closed, proving (ii).

Finally, suppose that x and y are points of X for which S_x ∩ S_y ≠ ∅. Let f: S_x ∪ S_y → Z be any continuous integer-valued function on S_x ∪ S_y. Then f is constant on both S_x and S_y. Moreover the value of f on S_x must agree with that on S_y, since S_x ∩ S_y is non-empty. We deduce that f is constant on S_x ∪ S_y. Thus S_x ∪ S_y is a connected set containing both x and y, and thus S_x ∪ S_y ⊂ S_x and S_x ∪ S_y ⊂ S_y, by definition of S_x and S_y. We conclude that S_x = S_y. This proves (iii).

Given any topological space X, the connected subsets S_x of X defined as in the statement of Proposition 4.27 are referred to as the connected components of X. We see from Proposition 4.27, part (iii) that the topological space X is the disjoint union of its connected components.

Example The connected components of {(x, y) ∈ R^2 : x ≠ 0} are {(x, y) ∈ R^2 : x > 0} and {(x, y) ∈ R^2 : x < 0}.

Example The connected components of {t ∈ R : |t − n|
0 such that (s − δ, s + δ) ⊂ W. Moreover s − δ is not an upper bound for the set S, hence there exists some τ ∈ S satisfying τ > s − δ. It follows from the definition of S that [a, τ] is covered by some finite collection V_1, V_2, ..., V_r of open sets belonging to U.
Let t ∈ [a, b] satisfy τ ≤ t < s + δ. Then [a, t] ⊂ [a, τ] ∪ (s − δ, s + δ) ⊂ V_1 ∪ V_2 ∪ ⋯ ∪ V_r ∪ W, and thus t ∈ S. In particular s ∈ S, and moreover s = b, since otherwise s would not be an upper bound of the set S. Thus b ∈ S, and therefore [a, b] is covered by a finite collection of open sets belonging to U, as required.

Lemma 5.3 Let A be a closed subset of some compact topological space X. Then A is compact.

Proof Let U be any collection of open sets in X covering A. On adjoining the open set X \ A to U, we obtain an open cover of X. This open cover of X possesses a finite subcover, since X is compact. Moreover A is covered by the open sets in the collection U that belong to this finite subcover. It follows from Lemma 5.1 that A is compact, as required.

Lemma 5.4 Let f: X → Y be a continuous function between topological spaces X and Y, and let A be a compact subset of X. Then f(A) is a compact subset of Y.

Proof Let V be a collection of open sets in Y which covers f(A). Then A is covered by the collection of all open sets of the form f^{-1}(V) for some V ∈ V. It follows from the compactness of A that there exists a finite collection V_1, V_2, ..., V_k of open sets belonging to V such that A ⊂ f^{-1}(V_1) ∪ f^{-1}(V_2) ∪ ⋯ ∪ f^{-1}(V_k). But then f(A) ⊂ V_1 ∪ V_2 ∪ ⋯ ∪ V_k. This shows that f(A) is compact.

Lemma 5.5 Let f: X → R be a continuous real-valued function on a compact topological space X. Then f is bounded above and below on X.

Proof The range f(X) of the function f is covered by some finite collection I_1, I_2, ..., I_k of open intervals of the form (−m, m), where m ∈ N, since f(X) is compact (Lemma 5.4) and R is covered by the collection of all intervals of this form. It follows that f(X) ⊂ (−M, M), where (−M, M) is the largest of the intervals I_1, I_2, ..., I_k. Thus the function f is bounded above and below on X, as required.
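The compactness hypothesis in Lemma 5.5 cannot simply be dropped. The following sketch, an illustration added here rather than part of the notes, uses the half-open interval (0, 1] (with the subspace topology from R) and the function f(x) = 1/x, both chosen as assumptions for the example.

```latex
% Illustrative example (assumed for this sketch): the space (0,1] and f(x) = 1/x.
Let $X = (0, 1]$ and let $f\colon X \to \mathbb{R}$ be given by $f(x) = 1/x$.
Then $f$ is continuous on $X$, but
\[
  f\!\left(\tfrac{1}{n}\right) = n \quad \text{for all } n \in \mathbb{N},
\]
so $f$ is not bounded above on $X$. By Lemma 5.5, $X$ cannot be compact.

This can also be seen directly: the sets $(1/n, 1]$, $n \in \mathbb{N}$,
are open in $X$ and cover $X$, but any finite subcollection
$(1/n_1, 1], \ldots, (1/n_k, 1]$ has union
\[
  (1/N, 1], \qquad N = \max(n_1, \ldots, n_k),
\]
which does not contain the point $1/N$ of $X$.
```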
Proposition 5.6 Let f: X → R be a continuous real-valued function on a compact topological space X. Then there exist points u and v of X such that f(u) ≤ f(x) ≤ f(v) for all x ∈ X.
Proof Let m = inf{f(x) : x ∈ X} and M = sup{f(x) : x ∈ X}. There must exist v ∈ X satisfying f(v) = M, for if f(x) < M for all x ∈ X then the function x ↦ 1/(M − f(x)) would be a continuous real-valued function on X that was not bounded above, contradicting Lemma 5.5. Similarly there must exist u ∈ X satisfying f(u) = m, since otherwise the function x ↦ 1/(f(x) − m) would be a continuous function on X that was not bounded above, again contradicting Lemma 5.5. But then f(u) ≤ f(x) ≤ f(v) for all x ∈ X, as required.

Proposition 5.7 Let A be a compact subset of a metric space X. Then A is closed in X.

Proof Let p be a point of X that does not belong to A, and let f(x) = d(x, p), where d is the distance function on X. It follows from Proposition 5.6 that there is a point q of A such that f(a) ≥ f(q) for all a ∈ A, since A is compact. Now f(q) > 0, since q ≠ p. Let δ satisfy 0 < δ ≤ f(q). Then the open ball of radius δ about the point p is contained in the complement of A, since f(x) < f(q) for all points x of this open ball. It follows that the complement of A is an open set in X, and thus A itself is closed in X.

Proposition 5.8 Let X be a Hausdorff topological space, and let K be a compact subset of X. Let x be a point of X \ K. Then there exist open sets V and W in X such that x ∈ V, K ⊂ W and V ∩ W = ∅.

Proof For each point y ∈ K there exist open sets V_{x,y} and W_{x,y} such that x ∈ V_{x,y}, y ∈ W_{x,y} and V_{x,y} ∩ W_{x,y} = ∅ (since X is a Hausdorff space). But then there exists a finite set {y_1, y_2, ..., y_r} of points of K such that K is contained in W_{x,y_1} ∪ W_{x,y_2} ∪ ⋯ ∪ W_{x,y_r}, since K is compact. Define
\[ V = V_{x,y_1} \cap V_{x,y_2} \cap \cdots \cap V_{x,y_r}, \qquad W = W_{x,y_1} \cup W_{x,y_2} \cup \cdots \cup W_{x,y_r}. \]
Then V and W are open sets, x ∈ V, K ⊂ W and V ∩ W = ∅, as required.

Corollary 5.9 A compact subset of a Hausdorff topological space is closed.

Proof Let K be a compact subset of a Hausdorff topological space X. It follows immediately from Proposition 5.8 that, for each x ∈ X \ K, there exists an open set V_x such that x ∈ V_x and V_x ∩ K = ∅. But then X \ K is equal to the union of the open sets V_x as x ranges over all points of X \ K, and any set that is a union of open sets is itself an open set. We conclude that X \ K is open, and thus K is closed.
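The Hausdorff hypothesis in Corollary 5.9 is genuinely needed. The sketch below is an illustration added here; the two-point space and its topology are assumptions chosen for the example and are not taken from the notes.

```latex
% Illustrative example (assumed for this sketch): a two-point non-Hausdorff space.
Let $X = \{a, b\}$ with $a \neq b$, and let $\tau = \{\emptyset, \{a\}, X\}$.

\begin{itemize}
  \item $X$ is not Hausdorff: the only open set containing $b$ is $X$
        itself, which also contains $a$, so $a$ and $b$ cannot be
        separated by disjoint open sets.
  \item $K = \{a\}$ is compact, since it is a finite set: any open cover of
        $K$ has a subcover consisting of a single open set containing $a$.
  \item $K$ is not closed, since its complement
        \[ X \setminus K = \{b\} \]
        does not belong to $\tau$.
\end{itemize}
Thus a compact subset of a non-Hausdorff space need not be closed.
```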
Proposition 5.10 Let X be a Hausdorff topological space, and let K_1 and K_2 be compact subsets of X, where K_1 ∩ K_2 = ∅. Then there exist open sets U_1 and U_2 such that K_1 ⊂ U_1, K_2 ⊂ U_2 and U_1 ∩ U_2 = ∅.

Proof It follows from Proposition 5.8 that, for each point x of K_1, there exist open sets V_x and W_x such that x ∈ V_x, K_2 ⊂ W_x and V_x ∩ W_x = ∅. But then there exists a finite set {x_1, x_2, ..., x_r} of points of K_1 such that K_1 ⊂ V_{x_1} ∪ V_{x_2} ∪ ⋯ ∪ V_{x_r}, since K_1 is compact. Define
\[ U_1 = V_{x_1} \cup V_{x_2} \cup \cdots \cup V_{x_r}, \qquad U_2 = W_{x_1} \cap W_{x_2} \cap \cdots \cap W_{x_r}. \]
Then U_1 and U_2 are open sets, K_1 ⊂ U_1, K_2 ⊂ U_2 and U_1 ∩ U_2 = ∅, as required.

Lemma 5.11 Let f: X → Y be a continuous function from a compact topological space X to a Hausdorff space Y. Then f(K) is closed in Y for every closed set K in X.

Proof If K is a closed set in X, then K is compact (Lemma 5.3), and therefore f(K) is compact (Lemma 5.4). But any compact subset of a Hausdorff space is closed (Corollary 5.9). Thus f(K) is closed in Y, as required.

Remark If the Hausdorff space Y in Lemma 5.11 is a metric space, then Proposition 5.7 may be used in place of Corollary 5.9 in the proof of the lemma.

Theorem 5.12 A continuous bijection f: X → Y from a compact topological space X to a Hausdorff space Y is a homeomorphism.

Proof Let g: Y → X be the inverse of the bijection f: X → Y. If U is open in X then X \ U is closed in X, and hence f(X \ U) is closed in Y, by Lemma 5.11. But f(X \ U) = g^{-1}(X \ U) = Y \ g^{-1}(U). It follows that g^{-1}(U) is open in Y for every open set U in X. Therefore g: Y → X is continuous, and thus f: X → Y is a homeomorphism.

We recall that a function f: X → Y from a topological space X to a topological space Y is said to be an identification map if it is surjective and satisfies the following condition: a subset U of Y is open in Y if and only if f^{-1}(U) is open in X.
Proposition 5.13 A continuous surjection f: X → Y from a compact topological space X to a Hausdorff space Y is an identification map.

Proof Let U be a subset of Y. We claim that Y \ U = f(K), where K = X \ f^{-1}(U). Clearly f(K) ⊂ Y \ U. Also, given any y ∈ Y \ U, there exists x ∈ X satisfying y = f(x), since f: X → Y is surjective. Moreover x ∈ K, since f(x) ∉ U. Thus Y \ U ⊂ f(K), and hence Y \ U = f(K), as claimed.

We must show that the set U is open in Y if and only if f^{-1}(U) is open in X. First suppose that f^{-1}(U) is open in X. Then K is closed in X, and hence f(K) is closed in Y, by Lemma 5.11. It follows that U is open in Y. Conversely if U is open in Y then f^{-1}(U) is open in X, since f: X → Y is continuous. Thus the surjection f: X → Y is an identification map.

Example Let S^1 be the unit circle in R^2, defined by S^1 = {(x, y) ∈ R^2 : x^2 + y^2 = 1}, and let q: [0, 1] → S^1 be defined by q(t) = (cos 2πt, sin 2πt) for all t ∈ [0, 1]. It has been shown that the map q is an identification map. This also follows directly from the fact that q: [0, 1] → S^1 is a continuous surjection from the compact space [0, 1] to the Hausdorff space S^1.

We shall show that a finite Cartesian product of compact spaces is compact. To prove this, we apply the following result, known as the Tube Lemma.

Lemma 5.14 Let X and Y be topological spaces, let K be a compact subset of Y, and let U be an open set in X × Y. Let V = {x ∈ X : {x} × K ⊂ U}. Then V is an open set in X.

Proof Let x ∈ V. For each y ∈ K there exist open subsets D_y and E_y of X and Y respectively such that (x, y) ∈ D_y × E_y and D_y × E_y ⊂ U. Now there exists a finite set {y_1, y_2, ..., y_k} of points of K such that K ⊂ E_{y_1} ∪ E_{y_2} ∪ ⋯ ∪ E_{y_k}, since K is compact. Set N_x = D_{y_1} ∩ D_{y_2} ∩ ⋯ ∩ D_{y_k}. Then N_x is an open set in X. Moreover
\[ N_x \times K \subset \bigcup_{i=1}^{k} (N_x \times E_{y_i}) \subset \bigcup_{i=1}^{k} (D_{y_i} \times E_{y_i}) \subset U, \]
so that N_x ⊂ V. It follows that V is the union of the open sets N_x for all x ∈ V. Thus V is itself an open set in X, as required.

Theorem 5.15 A Cartesian product of a finite number of compact spaces is itself compact.
Proof It suffices to prove that the product of two compact topological spaces X and Y is compact, since the general result then follows easily by induction on the number of compact spaces in the product.

Let U be an open cover of X × Y. We must show that this open cover possesses a finite subcover.

Let x be a point of X. The set {x} × Y is a compact subset of X × Y, since it is the image of the compact space Y under the continuous map from Y to X × Y which sends y ∈ Y to (x, y), and the image of any compact set under a continuous map is itself compact (Lemma 5.4). Therefore there exists a finite collection U_1, U_2, ..., U_r of open sets belonging to the open cover U such that {x} × Y is contained in U_1 ∪ U_2 ∪ ⋯ ∪ U_r. Let V_x denote the set of all points x′ of X for which {x′} × Y is contained in U_1 ∪ U_2 ∪ ⋯ ∪ U_r. Then x ∈ V_x, and Lemma 5.14 ensures that V_x is an open set in X. Note that V_x × Y is covered by finitely many of the open sets belonging to the open cover U.

Now {V_x : x ∈ X} is an open cover of the space X. It follows from the compactness of X that there exists a finite set {x_1, x_2, ..., x_r} of points of X such that X = V_{x_1} ∪ V_{x_2} ∪ ⋯ ∪ V_{x_r}. Now X × Y is the union of the sets V_{x_j} × Y for j = 1, 2, ..., r, and each of these sets can be covered by a finite collection of open sets belonging to the open cover U. On combining these finite collections, we obtain a finite collection of open sets belonging to U which covers X × Y. This shows that X × Y is compact.

Theorem 5.16 Let K be a subset of R^n. Then K is compact if and only if K is both closed and bounded.

Proof Suppose that K is compact. Then K is closed, since R^n is Hausdorff, and a compact subset of a Hausdorff space is closed (by Corollary 5.9). For each natural number m, let B_m be the open ball of radius m about the origin, given by B_m = {x ∈ R^n : |x| < m}. Then {B_m : m ∈ N} is an open cover of R^n. It follows from the compactness of K that there exist natural numbers m_1, m_2, ..., m_k such that K ⊂ B_{m_1} ∪ B_{m_2} ∪ ⋯ ∪ B_{m_k}. But then K ⊂ B_M, where M is the maximum of m_1, m_2, ..., m_k, and thus K is bounded.

Conversely suppose that K is both closed and bounded. Then there exists some real number L such that K is contained within the closed cube C given by C = {(x_1, x_2, ..., x_n) ∈ R^n : −L ≤ x_j ≤ L for j = 1, 2, ..., n}. Now the closed interval [−L, L] is compact by the Heine-Borel Theorem (Theorem 5.2), and C is the Cartesian product of n copies of the compact set [−L, L]. It follows from Theorem 5.15 that C is compact. But K is a closed subset of C, and a closed subset of a compact topological space is itself compact, by Lemma 5.3. Thus K is compact, as required.
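To see how the criterion of Theorem 5.16 is used in practice, the following sketch classifies three subsets of R^n. The particular sets are assumptions chosen here for illustration; they are not examples from the notes.

```latex
% Illustrative examples (assumed for this sketch): three subsets of R^n, n >= 1.
\begin{itemize}
  \item The unit sphere $\{x \in \mathbb{R}^n : |x| = 1\}$ is closed (it is
        the preimage of the closed set $\{1\}$ under the continuous function
        $x \mapsto |x|$) and bounded (it is contained in the ball $B_2$).
        It is therefore compact.
  \item The open ball $B_1 = \{x \in \mathbb{R}^n : |x| < 1\}$ is bounded but
        not closed: the point $(1, 0, \ldots, 0)$ lies in the closure of
        $B_1$ but not in $B_1$. It is therefore not compact.
  \item The half-space $\{x \in \mathbb{R}^n : x_1 \geq 0\}$ is closed (it is
        the preimage of $[0, +\infty)$ under the continuous function
        $x \mapsto x_1$) but not bounded, since it contains
        $(m, 0, \ldots, 0)$ for every $m \in \mathbb{N}$. It is therefore
        not compact.
\end{itemize}
```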
5.2 Compact Metric Spaces
We recall that a metric or topological space is said to be compact if every open cover of the space has a finite subcover. We shall obtain some equivalent characterizations of compactness for metric spaces (Theorem 5.22); these characterizations do not generalize to arbitrary topological spaces.

Proposition 5.17 Every sequence of points in a compact metric space has a convergent subsequence.

Proof Let X be a compact metric space, and let x_1, x_2, x_3, ... be a sequence of points of X. We must show that this sequence has a convergent subsequence. Let F_n denote the closure of {x_n, x_{n+1}, x_{n+2}, ...}. We claim that the intersection of the sets F_1, F_2, F_3, ... is non-empty. For suppose that this intersection were the empty set. Then X would be the union of the sets V_1, V_2, V_3, ..., where V_n = X \ F_n for all n. But V_1 ⊂ V_2 ⊂ V_3 ⊂ ⋯, and each set V_n is open. It would therefore follow from the compactness of X that X would be covered by finitely many of the sets V_1, V_2, V_3, ..., and therefore X = V_n for some sufficiently large n. But this is impossible, since F_n is non-empty for all natural numbers n. Thus the intersection of the sets F_1, F_2, F_3, ... is non-empty, as claimed, and therefore there exists a point p of X which belongs to F_n for all natural numbers n.

We now obtain, by induction on n, a subsequence x_{n_1}, x_{n_2}, x_{n_3}, ... which satisfies d(x_{n_j}, p) < 1/j for all natural numbers j. Now p belongs to the closure F_1 of the set {x_1, x_2, x_3, ...}. Therefore there exists some natural number n_1 such that d(x_{n_1}, p) < 1. Suppose that x_{n_j} has been chosen so that d(x_{n_j}, p) < 1/j. The point p belongs to the closure F_{n_j + 1} of the set {x_n : n > n_j}. Therefore there exists some natural number n_{j+1} such that n_{j+1} > n_j and d(x_{n_{j+1}}, p) < 1/(j + 1). The subsequence x_{n_1}, x_{n_2}, x_{n_3}, ... constructed in this manner converges to the point p, as required.
We shall also prove the converse of Proposition 5.17: if X is a metric space, and if every sequence of points of X has a convergent subsequence, then X is compact (see Theorem 5.22 below).

Let X be a metric space with distance function d. A Cauchy sequence in X is a sequence x_1, x_2, x_3, ... of points of X with the property that, given any ε > 0, there exists some natural number N such that d(x_j, x_k) < ε for all j and k satisfying j ≥ N and k ≥ N. A metric space (X, d) is said to be complete if every Cauchy sequence in X converges to some point of X.

Proposition 5.18 Let X be a metric space with the property that every sequence of points of X has a convergent subsequence. Then X is complete.
Proof Let x_1, x_2, x_3, ... be a Cauchy sequence in X. This sequence then has a subsequence x_{n_1}, x_{n_2}, x_{n_3}, ... which converges to some point p of X. We claim that the given Cauchy sequence also converges to p. Let ε > 0 be given. Then there exists some natural number N such that d(x_m, x_n) < ε/2 whenever m ≥ N and n ≥ N, since x_1, x_2, x_3, ... is a Cauchy sequence. Moreover n_j can be chosen large enough to ensure that n_j ≥ N and d(x_{n_j}, p) < ε/2. If n ≥ N then d(x_n, p) ≤ d(x_n, x_{n_j}) + d(x_{n_j}, p) < ε/2 + ε/2 = ε. This shows that the Cauchy sequence x_1, x_2, x_3, ... converges to the point p. Thus X is complete, as required.

Definition Let X be a metric space with distance function d. A subset A of X is said to be bounded if there exists a non-negative real number K such that d(x, y) ≤ K for all x, y ∈ A. The smallest real number K with this property is referred to as the diameter of A, and is denoted by diam A. (Note that diam A is the supremum of the values of d(x, y) as x and y range over all points of A.)

Let X be a metric space with distance function d, and let A be a subset of X. The closure Ā of A is the intersection of all closed sets in X that contain the set A: it can be regarded as the smallest closed set in X containing A. Let x be a point of the closure Ā of A. Given any ε > 0, there exists some point x′ of A such that d(x, x′) < ε. (Indeed the open ball in X of radius ε about the point x must intersect the set A, since otherwise the complement of this open ball would be a closed set in X containing the set A but not including the point x, which is not possible if x belongs to the closure of A.)

Lemma 5.19 Let X be a metric space, and let A be a subset of X. Then diam A = diam Ā, where Ā is the closure of A.

Proof Clearly diam A ≤ diam Ā. Let x and y be points of Ā. Then, given any ε > 0, there exist points x′ and y′ of A satisfying d(x, x′) < ε and d(y, y′) < ε. It follows from the Triangle Inequality that d(x, y) ≤ d(x, x′) + d(x′, y′) + d(y′, y) < diam A + 2ε. Thus d(x, y) < diam A + 2ε for all ε > 0, and hence d(x, y) ≤ diam A. This shows that diam Ā ≤ diam A, as required.

Definition A metric space X is said to be totally bounded if, given any ε > 0, the set X can be expressed as a finite union of subsets of X, each of which has diameter less than ε.
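The definition of total boundedness can be verified directly for a closed bounded interval. The sketch below is an illustration added here, using the interval [0, 1] with the usual metric as an assumed example; the integer N is a name introduced for the illustration.

```latex
% Illustrative example (assumed for this sketch): [0,1] is totally bounded.
Let $X = [0, 1]$ with $d(x, y) = |x - y|$, and let $\varepsilon > 0$ be
given. Choose a natural number $N$ satisfying $1/N < \varepsilon$. Then
\[
  [0, 1] = \bigcup_{k=1}^{N} \left[ \frac{k-1}{N}, \frac{k}{N} \right],
  \qquad
  \operatorname{diam} \left[ \frac{k-1}{N}, \frac{k}{N} \right]
  = \frac{1}{N} < \varepsilon
  \quad (k = 1, 2, \ldots, N).
\]
Thus $[0, 1]$ is a finite union of subsets of diameter less than
$\varepsilon$, and so $[0, 1]$ is totally bounded.
```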
A subset A of a totally bounded metric space X is itself totally bounded. For if X is the union of the subsets B_1, B_2, ..., B_k, where diam B_n < ε for n = 1, 2, ..., k, then A is the union of A ∩ B_n for n = 1, 2, ..., k, and diam(A ∩ B_n) < ε.

Proposition 5.20 Let X be a metric space. Suppose that every sequence of points of X has a convergent subsequence. Then X is totally bounded.

Proof Suppose that X were not totally bounded. Then there would exist some ε > 0 with the property that no finite collection of subsets of X of diameter less than 3ε covers the set X. There would then exist an infinite sequence x_1, x_2, x_3, ... of points of X with the property that d(x_m, x_n) ≥ ε whenever m ≠ n. Indeed suppose that points x_1, x_2, ..., x_{k−1} of X have already been chosen satisfying d(x_m, x_n) ≥ ε whenever m < k, n < k and m ≠ n. The diameter of each open ball B_X(x_m, ε) is less than or equal to 2ε. Therefore X could not be covered by the sets B_X(x_m, ε) for m < k, and thus there would exist a point x_k of X which does not belong to B_X(x_m, ε) for any m < k. Then d(x_m, x_k) ≥ ε for all m < k. In this way we can successively choose points x_1, x_2, x_3, ... to form an infinite sequence with the required property. However such an infinite sequence would have no convergent subsequence, which is impossible. This shows that X must be totally bounded, as required.

Proposition 5.21 Every complete totally bounded metric space is compact.

Proof Let X be some totally bounded metric space. Suppose that there exists an open cover V of X which has no finite subcover. We shall prove the existence of a Cauchy sequence x_1, x_2, x_3, ... in X which cannot converge to any point of X. (Thus if X is not compact, then X cannot be complete.)

Let ε > 0 be given. Then X can be covered by finitely many closed sets whose diameter is less than ε, since X is totally bounded and every subset of X has the same diameter as its closure (Lemma 5.19). At least one of these closed sets cannot be covered by a finite collection of open sets belonging to V (since if every one of these closed sets could be covered by such a finite collection of open sets, then we could combine these collections to obtain a finite subcover of V). We conclude that, given any ε > 0, there exists a closed subset of X of diameter less than ε which cannot be covered by any finite collection of open sets belonging to V.

We claim that there exists a sequence F_1, F_2, F_3, ... of closed sets in X satisfying F_1 ⊃ F_2 ⊃ F_3 ⊃ ⋯ such that each closed set F_n has the following properties: diam F_n < 1/2^n, and no finite collection of open sets belonging to V covers F_n. For if F_n is a closed set with these properties then F_n is itself totally bounded, and thus the above remarks (applied with F_n in place of X) guarantee the existence of a closed subset F_{n+1} of F_n with the required properties. Thus the existence of the required sequence of closed sets follows by induction on n.

Choose x_n ∈ F_n for each natural number n. Then d(x_m, x_n) < 1/2^n for any m > n, since x_m and x_n belong to F_n and diam F_n < 1/2^n. Therefore the sequence x_1, x_2, x_3, ... is a Cauchy sequence. Suppose that this Cauchy sequence were to converge to some point p of X. Then p ∈ F_n for each natural number n, since F_n is closed and x_m ∈ F_n for all m ≥ n. (If a sequence of points belonging to a closed subset of a metric or topological space is convergent then the limit of that sequence belongs to the closed set.) Moreover p ∈ V for some open set V belonging to V, since V is an open cover of X. But then there would exist δ > 0 such that B_X(p, δ) ⊂ V, where B_X(p, δ) denotes the open ball of radius δ in X centred on p. Thus if n were large enough to ensure that 1/2^n < δ, then p ∈ F_n and diam F_n < δ, and hence F_n ⊂ B_X(p, δ) ⊂ V, contradicting the fact that no finite collection of open sets belonging to V covers the set F_n. This contradiction shows that the Cauchy sequence x_1, x_2, x_3, ... is not convergent.

We have thus shown that if X is a totally bounded metric space which is not compact then X is not complete. Thus every complete totally bounded metric space must be compact, as required.

Theorem 5.22 Let X be a metric space with distance function d. The following are equivalent:
(i) X is compact,
(ii) every sequence of points of X has a convergent subsequence,
(iii) X is complete and totally bounded.

Proof Propositions 5.17, 5.18, 5.20 and 5.21 show that (i) implies (ii), (ii) implies (iii), and (iii) implies (i). It follows that (i), (ii) and (iii) are all equivalent to one another.
Remark A subset K of R^n is complete if and only if it is closed in R^n. Also it is easy to see that K is totally bounded if and only if K is a bounded subset of R^n. Thus Theorem 5.22 is a generalization of the theorem which states that a subset K of R^n is compact if and only if it is both closed and bounded (Theorem 5.16).
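The two conditions in part (iii) of Theorem 5.22 are independent of one another. The following sketch, added here as an illustration with subsets of R chosen as assumptions, shows one space failing each condition.

```latex
% Illustrative examples (assumed for this sketch): subsets of R with the usual metric.
\begin{itemize}
  \item $(0, 1)$ is totally bounded, since it is a bounded subset of
        $\mathbb{R}$, but it is not complete: the sequence
        $x_n = 1/(n+1)$ is a Cauchy sequence in $(0, 1)$ which converges
        in $\mathbb{R}$ to $0 \notin (0, 1)$, and so has no limit in
        $(0, 1)$. Hence $(0, 1)$ is not compact.
  \item $[0, +\infty)$ is complete, since it is closed in $\mathbb{R}$, but
        it is not bounded and hence not totally bounded. Hence
        $[0, +\infty)$ is not compact.
  \item $[0, 1]$ is closed and bounded, hence complete and totally bounded,
        and is therefore compact.
\end{itemize}
```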
5.3 The Lebesgue Lemma and Uniform Continuity
Lemma 5.23 (Lebesgue Lemma) Let (X, d) be a compact metric space. Let U be an open cover of X. Then there exists a positive real number δ such that every subset of X whose diameter is less than δ is contained wholly within one of the open sets belonging to the open cover U. Proof Every point of X is contained in at least one of the open sets belonging to the open cover U. It follows from this that, for each point x of X, there exists some δx > 0 such that the open ball B(x, 2δx ) of radius 2δx about the point x is contained wholly within one of the open sets belonging to the open cover U. But then the collection consisting of the open balls B(x, δx ) of radius δx about the points x of X forms an open cover of the compact space X. Therefore there exists a finite set x1 , x2 , . . . , xr of points of X such that B(x1 , δ1 ) ∪ B(x2 , δ2 ) ∪ · · · ∪ B(xr , δr ) = X, where δi = δxi for i = 1, 2, . . . , r. Let δ > 0 be given by δ = minimum(δ1 , δ2 , . . . , δr ). Suppose that A is a subset of X whose diameter is less than δ. Let u be a point of A. Then u belongs to B(xi , δi ) for some integer i between 1 and r. But then it follows that A ⊂ B(xi , 2δi ), since, for each point v of A, d(v, xi ) ≤ d(v, u) + d(u, xi ) < δ + δi ≤ 2δi . But B(xi , 2δi ) is contained wholly within one of the open sets belonging to the open cover U. Thus A is contained wholly within one of the open sets belonging to U, as required. Let U be an open cover of a compact metric space X. A Lebesgue number for the open cover U is a positive real number δ such that every subset of X whose diameter is less than δ is contained wholly within one of the open sets belonging to the open cover U. The Lebesgue Lemma thus states that there exists a Lebesgue number for every open cover of a compact metric space. Let X and Y be metric spaces with distance functions dX and dY respectively, and let f : X → Y be a function from X to Y . 
The function f is said to be uniformly continuous on X if and only if, given ε > 0, there exists some δ > 0 such that dY (f (x), f (x′)) < ε for all points x and x′ of X satisfying dX (x, x′) < δ. (The value of δ should be independent of both x and x′.)
Theorem 5.24 Let X and Y be metric spaces. Suppose that X is compact. Then every continuous function from X to Y is uniformly continuous.
Proof Let dX and dY denote the distance functions for the metric spaces X and Y respectively. Let f : X → Y be a continuous function from X to Y . We must show that f is uniformly continuous. Let ε > 0 be given. For each y ∈ Y , define Vy = {x ∈ X : dY (f (x), y) < ½ε}. Note that Vy = f⁻¹(BY (y, ½ε)), where BY (y, ½ε) denotes the open ball of radius ½ε about y in Y . Now the open ball BY (y, ½ε) is an open set in Y , and f is continuous. Therefore Vy is open in X for all y ∈ Y . Note that x ∈ Vf (x) for all x ∈ X. Now {Vy : y ∈ Y } is an open cover of the compact metric space X. It follows from the Lebesgue Lemma (Lemma 5.23) that there exists some δ > 0 such that every subset of X whose diameter is less than δ is a subset of some set Vy . Let x and x′ be points of X satisfying dX (x, x′) < δ. The diameter of the set {x, x′} is dX (x, x′), which is less than δ. Therefore there exists some y ∈ Y such that x ∈ Vy and x′ ∈ Vy . But then dY (f (x), y) < ½ε and dY (f (x′), y) < ½ε, and hence dY (f (x), f (x′)) ≤ dY (f (x), y) + dY (y, f (x′)) < ε. This shows that f : X → Y is uniformly continuous, as required.
Let K be a closed bounded subset of Rn . It follows from Theorem 5.16 and Theorem 5.24 that any continuous function f : K → Rk is uniformly continuous.
5.4
The Equivalence of Norms on a Finite-Dimensional Vector Space
Let ‖.‖ and ‖.‖∗ be norms on a real or complex vector space X. The norms ‖.‖ and ‖.‖∗ are said to be equivalent if and only if there exist constants c and C, where 0 < c ≤ C, such that c‖x‖ ≤ ‖x‖∗ ≤ C‖x‖ for all x ∈ X.
Lemma 5.25 Two norms ‖.‖ and ‖.‖∗ on a real or complex vector space X are equivalent if and only if they induce the same topology on X.
Proof Suppose that the norms ‖.‖ and ‖.‖∗ induce the same topology on X. Then there exists some δ > 0 such that {x ∈ X : ‖x‖ < δ} ⊂ {x ∈ X : ‖x‖∗ < 1}, since the set {x ∈ X : ‖x‖∗ < 1} is open with respect to the topology on X induced by both ‖.‖∗ and ‖.‖. Let C be any positive real number satisfying Cδ > 1. Then
\[ \left\| \frac{1}{C\|x\|}\, x \right\| = \frac{1}{C} < \delta, \]
and hence
\[ \|x\|_* = C\|x\| \left\| \frac{1}{C\|x\|}\, x \right\|_* < C\|x\| \]
for all non-zero elements x of X, and thus ‖x‖∗ ≤ C‖x‖ for all x ∈ X. On interchanging the roles of the two norms, we deduce also that there exists a positive real number c such that ‖x‖ ≤ (1/c)‖x‖∗ for all x ∈ X. But then c‖x‖ ≤ ‖x‖∗ ≤ C‖x‖ for all x ∈ X. We conclude that the norms ‖.‖ and ‖.‖∗ are equivalent.
Conversely suppose that the norms ‖.‖ and ‖.‖∗ are equivalent. Then there exist constants c and C, where 0 < c ≤ C, such that c‖x‖ ≤ ‖x‖∗ ≤ C‖x‖ for all x ∈ X. Let U be a subset of X that is open with respect to the topology on X induced by the norm ‖.‖∗ , and let u ∈ U . Then there exists some δ > 0 such that {x ∈ X : ‖x − u‖∗ < Cδ} ⊂ U. But then {x ∈ X : ‖x − u‖ < δ} ⊂ {x ∈ X : ‖x − u‖∗ < Cδ} ⊂ U, showing that U is open with respect to the topology induced by the norm ‖.‖. Similarly any subset of X that is open with respect to the topology induced by the norm ‖.‖ must also be open with respect to the topology induced by ‖.‖∗ . Thus equivalent norms induce the same topology on X.
It follows immediately from Lemma 5.25 that if ‖.‖, ‖.‖∗ and ‖.‖♯ are norms on a real (or complex) vector space X, if the norms ‖.‖ and ‖.‖∗ are equivalent, and if the norms ‖.‖∗ and ‖.‖♯ are equivalent, then the norms ‖.‖ and ‖.‖♯ are also equivalent. This fact can easily be verified directly from the definition of equivalence of norms.
We recall that the usual topology on Rn is that generated by the Euclidean norm on Rn .
Lemma 5.26 Let ‖.‖ be a norm on Rn . Then the function x ↦ ‖x‖ is continuous with respect to the usual topology on Rn .
Proof Let e1 , e2 , . . . , en denote the basis of Rn given by e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), · · · , en = (0, 0, 0, . . . , 1). Let x and y be points of Rn , given by x = (x1 , x2 , . . . , xn ), y = (y1 , y2 , . . . , yn ). Using Schwarz' Inequality, we see that
\[ \|x - y\| = \left\| \sum_{j=1}^{n} (x_j - y_j) e_j \right\| \le \sum_{j=1}^{n} |x_j - y_j| \, \|e_j\| \le \left( \sum_{j=1}^{n} (x_j - y_j)^2 \right)^{1/2} \left( \sum_{j=1}^{n} \|e_j\|^2 \right)^{1/2} = C \|x - y\|_2, \]
where C² = ‖e1‖² + ‖e2‖² + · · · + ‖en‖² and ‖x − y‖₂ denotes the Euclidean norm of x − y, defined by
\[ \|x - y\|_2 = \left( \sum_{j=1}^{n} (x_j - y_j)^2 \right)^{1/2}. \]
Also |‖x‖ − ‖y‖| ≤ ‖x − y‖, since ‖x‖ ≤ ‖x − y‖ + ‖y‖ and ‖y‖ ≤ ‖x − y‖ + ‖x‖. We conclude therefore that |‖x‖ − ‖y‖| ≤ C‖x − y‖₂ for all x, y ∈ Rn , and thus the function x ↦ ‖x‖ is continuous on Rn (with respect to the usual topology on Rn ).
Theorem 5.27 Any two norms on Rn are equivalent, and induce the usual topology on Rn .
Proof Let ‖.‖ be any norm on Rn . We show that ‖.‖ is equivalent to the Euclidean norm ‖.‖₂ . Let S n−1 denote the unit sphere in Rn , defined by S n−1 = {x ∈ Rn : ‖x‖₂ = 1}, and let f : S n−1 → R be the real-valued function on S n−1 defined such that f (x) = ‖x‖ for all x ∈ S n−1 . Now the function f is a continuous function on S n−1 (Lemma 5.26). Also the function f is non-zero at each point of S n−1 , and therefore the function sending x ∈ S n−1 to 1/f (x) is continuous. Now any continuous real-valued function on a closed bounded subset of Rn is bounded on that set (Proposition ). It follows that there exist positive real numbers C and D such that f (x) ≤ C and 1/f (x) ≤ D for all x ∈ S n−1 . Let c = D⁻¹ . Then c ≤ ‖x‖ ≤ C for all x ∈ S n−1 . Now ‖x‖ = f (‖x‖₂⁻¹ x) ‖x‖₂ for all x ∈ Rn \ {0}. (This is an immediate consequence of the fact that ‖λx‖ = |λ| ‖x‖ for all x ∈ Rn and λ ∈ R.) It follows that c‖x‖₂ ≤ ‖x‖ ≤ C‖x‖₂ for all x ∈ Rn \ {0}. These inequalities also hold when x = 0. The result follows.
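The constants c and C in this proof are the minimum and maximum of the norm over the Euclidean unit sphere. The following Python sketch estimates them by random sampling for two familiar norms on R³. The function names, the sampling scheme and the sample count are illustrative assumptions and not part of the notes. Sampling can only overestimate c and underestimate C, so the printed values are bounds on the true constants, not exact values.

```python
import math
import random


def euclidean_norm(x):
    return math.sqrt(sum(t * t for t in x))


def one_norm(x):
    return sum(abs(t) for t in x)


def max_norm(x):
    return max(abs(t) for t in x)


def estimate_constants(norm, n, samples=20000, seed=0):
    """Estimate (min, max) of `norm` over the Euclidean unit sphere in R^n.

    Points are drawn by normalising Gaussian vectors; this is only a
    numerical illustration of the constants c and C of Theorem 5.27.
    """
    rng = random.Random(seed)
    lo, hi = math.inf, 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r = euclidean_norm(x)
        if r == 0.0:
            continue  # cannot normalise the zero vector
        value = norm([t / r for t in x])
        lo = min(lo, value)
        hi = max(hi, value)
    return lo, hi


if __name__ == "__main__":
    # For the 1-norm on R^3 the exact constants are c = 1 and C = sqrt(3).
    print(estimate_constants(one_norm, 3))
    # For the max-norm on R^3 the exact constants are c = 1/sqrt(3) and C = 1.
    print(estimate_constants(max_norm, 3))
```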
Course 221: Hilary Term 2007 Section 6: The Extended Real Number System David R. Wilkins c David R. Wilkins 1997–2007 Copyright
Contents
6 The Extended Real Number System
6.1 The Extended Real Line
6.2 Summation over Countable Sets
6.3 Summable Functions
6
The Extended Real Number System
6.1
The Extended Real Line
It is often convenient to make use of the extended real line [−∞, +∞]. This is the set R ∪ {−∞, +∞} obtained on adjoining to the real line R two extra elements +∞ and −∞ that represent points at ‘positive infinity’ and ‘negative infinity’ respectively. We define c + (+∞) = (+∞) + c = +∞ and c + (−∞) = (−∞) + c = −∞ for all real numbers c. We also define products of non-zero real numbers with these extra elements ±∞ so that
c × (+∞) = (+∞) × c = +∞ when c > 0,
c × (−∞) = (−∞) × c = −∞ when c > 0,
c × (+∞) = (+∞) × c = −∞ when c < 0,
c × (−∞) = (−∞) × c = +∞ when c < 0.
We also define 0 × (+∞) = (+∞) × 0 = 0 × (−∞) = (−∞) × 0 = 0, and (+∞) × (+∞) = (−∞) × (−∞) = +∞, (+∞) × (−∞) = (−∞) × (+∞) = −∞. The sum of +∞ and −∞ is not defined. We define −(+∞) = −∞ and −(−∞) = +∞. The difference p − q of two extended real numbers is then defined by the formula p − q = p + (−q), unless p = q = +∞ or p = q = −∞, in which cases the difference of the extended real numbers p and q is not defined. We extend the definition of inequalities to the extended real line in the obvious fashion, so that c < +∞ and c > −∞ for all real numbers c, and −∞ < +∞. Given any real number c, we define
[c, +∞] = [c, +∞) ∪ {+∞} = {p ∈ [−∞, +∞] : p ≥ c},
(c, +∞] = (c, +∞) ∪ {+∞} = {p ∈ [−∞, +∞] : p > c},
[−∞, c] = (−∞, c] ∪ {−∞} = {p ∈ [−∞, +∞] : p ≤ c},
[−∞, c) = (−∞, c) ∪ {−∞} = {p ∈ [−∞, +∞] : p < c}.
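The arithmetic conventions above differ in one respect from IEEE floating-point arithmetic, where 0 × ∞ evaluates to NaN instead of 0. The following Python sketch models the conventions of this section on top of ordinary floats. The helper names `ext_add`, `ext_sub` and `ext_mul` are hypothetical and chosen only for this illustration.

```python
import math

INF = math.inf


def ext_add(p, q):
    """Sum of two extended real numbers; (+inf) + (-inf) is left undefined."""
    if math.isinf(p) and math.isinf(q) and p != q:
        raise ValueError("the sum of +inf and -inf is not defined")
    return p + q


def ext_sub(p, q):
    """Difference p - q, defined as p + (-q); undefined when p = q = +/-inf."""
    return ext_add(p, -q)


def ext_mul(p, q):
    """Product of two extended real numbers, with 0 * (+/-inf) defined to be 0."""
    if p == 0 or q == 0:
        return 0.0  # convention of this section; plain floats would give NaN
    return p * q


if __name__ == "__main__":
    print(ext_mul(0.0, INF))    # follows the convention 0 x (+inf) = 0
    print(ext_mul(-2.0, INF))   # a negative real times +inf
    print(ext_add(5.0, -INF))   # a real number plus -inf
```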
There is an order-preserving bijective function ϕ: [−∞, +∞] → [−1, 1] from the extended real line [−∞, +∞] to the closed interval [−1, 1] which is defined such that ϕ(+∞) = 1, ϕ(−∞) = −1, and ϕ(c) = c/(1 + |c|) for all real numbers c. Let us define ρ(p, q) = |ϕ(q) − ϕ(p)| for all extended real numbers p and q. Then the set [−∞, +∞] becomes a metric space with distance function ρ. Moreover the function ϕ: [−∞, +∞] → [−1, 1] is a homeomorphism from this metric space to the closed interval [−1, 1]. It follows directly from this that [−∞, +∞] is a compact metric space. Moreover an infinite sequence (pj : j ∈ N) of extended real numbers is convergent if and only if the corresponding sequence (ϕ(pj ) : j ∈ N) of real numbers is convergent.
Given any non-empty set S of extended real numbers, we can define sup S to be the least extended real number p with the property that s ≤ p for all s ∈ S. If the set S does not contain the extended real number +∞, and if there exists some real number B such that s ≤ B for all s ∈ S, then sup S < +∞; otherwise sup S = +∞. Similarly we define inf S to be the greatest extended real number p with the property that s ≥ p for all s ∈ S. If the set S does not contain the extended real number −∞, and if there exists some real number A such that s ≥ A for all s ∈ S, then inf S > −∞; otherwise inf S = −∞. Moreover ϕ(sup S) = sup ϕ(S) and ϕ(inf S) = inf ϕ(S), where ϕ: [−∞, +∞] → [−1, 1] is the homeomorphism defined such that ϕ(+∞) = 1, ϕ(−∞) = −1 and ϕ(c) = c(1 + |c|)⁻¹ for all real numbers c.
Given any sequence (pj : j ∈ N) of extended real numbers, we define the upper limit lim sup pj and the lower limit lim inf pj of the sequence so that
\[ \limsup_{j \to +\infty} p_j = \lim_{j \to +\infty} \sup\{p_k : k \ge j\}, \qquad \liminf_{j \to +\infty} p_j = \lim_{j \to +\infty} \inf\{p_k : k \ge j\}. \]
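The homeomorphism ϕ and the distance function ρ introduced above can be made concrete with a short Python sketch. Extended real numbers are represented here by floats, with `math.inf` standing for +∞; that representation and the function names are assumptions made for illustration.

```python
import math


def phi(p):
    """Order-preserving bijection from [-inf, +inf] onto [-1, 1]."""
    if math.isinf(p):
        return 1.0 if p > 0 else -1.0
    return p / (1.0 + abs(p))


def rho(p, q):
    """Distance between extended real numbers p and q induced by phi."""
    return abs(phi(q) - phi(p))


if __name__ == "__main__":
    # The sequence p_j = j converges to +inf in this metric, because
    # rho(j, +inf) = 1 - j/(1 + j) = 1/(1 + j) tends to zero.
    for j in (1, 10, 100, 1000):
        print(j, rho(float(j), math.inf))
```

In this metric the whole extended real line has diameter ρ(−∞, +∞) = 2, which is consistent with its being compact.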
Every sequence of extended real numbers has both an upper limit and a lower limit. Moreover an infinite sequence of extended real numbers converges to some extended real number if and only if the upper and lower limits of the sequence are equal. (These results follow easily from the corresponding results for bounded sequences of real numbers, on using the identities
\[ \varphi\Big(\limsup_{j \to +\infty} p_j\Big) = \limsup_{j \to +\infty} \varphi(p_j), \qquad \varphi\Big(\liminf_{j \to +\infty} p_j\Big) = \liminf_{j \to +\infty} \varphi(p_j), \]
where ϕ: [−∞, +∞] → [−1, 1] is the homeomorphism defined above.)
The function that sends a pair (p, q) of extended real numbers to the extended real number p + q is not defined when p = +∞ and q = −∞, or when p = −∞ and q = +∞, but is continuous elsewhere. The function that sends a pair (p, q) of extended real numbers to the extended real number pq is defined everywhere. This function is discontinuous when p = ±∞ and q = 0, and when p = 0 and q = ±∞. It is continuous for all other values of the extended real numbers p and q.
Let a1 , a2 , a3 , . . . be an infinite sequence of extended real numbers which does not include both the values +∞ and −∞, and let
\[ p_k = \sum_{j=1}^{k} a_j \]
for all natural numbers k. If the infinite sequence p1 , p2 , p3 , . . . of extended real numbers converges in the extended real line [−∞, +∞] to some extended real number p, then this value p is said to be the sum of the infinite series ∑_{j=1}^{+∞} aj , and we write ∑_{j=1}^{+∞} aj = p.
It follows easily from this definition that if +∞ is one of the values of the infinite sequence a1 , a2 , a3 , . . ., then ∑_{j=1}^{+∞} aj = +∞. Similarly if −∞ is one of the values of this infinite sequence then ∑_{j=1}^{+∞} aj = −∞. Suppose that the members of the sequence a1 , a2 , a3 , . . . are all real numbers. Then ∑_{j=1}^{+∞} aj = +∞ if and only if, given any real number B, there exists some real number N such that ∑_{j=1}^{k} aj > B whenever k ≥ N . Similarly ∑_{j=1}^{+∞} aj = −∞ if and only if, given any real number A, there exists some real number N such that ∑_{j=1}^{k} aj < A whenever k ≥ N .
6.2
Summation over Countable Sets
Let S be a countable set, and let λ: S → [0, +∞] be a function on the set S that takes values in the set [0, +∞] of non-negative extended real numbers. We define
\[ \sum_{s \in S} \lambda(s) = \sup\left\{ \sum_{s \in F} \lambda(s) : F \text{ is finite and } F \subset S \right\}. \]
We define ∑_{s∈S} λ(s) = 0 when S = ∅. If the set S is finite and non-empty, then there exist distinct elements s1 , s2 , . . . , sm of S such that S = {s1 , s2 , . . . , sm }. Then
\[ \sum_{s \in S} \lambda(s) = \lambda(s_1) + \lambda(s_2) + \cdots + \lambda(s_m). \]
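To see how the supremum over finite subsets behaves, consider the hypothetical example S = N with λ(n) = 2⁻ⁿ, which is not taken from the notes. Every finite subset sum is strictly less than 1, and the sums over the initial segments {1, . . . , N} come arbitrarily close to 1, so the supremum is 1. The Python sketch below evaluates such finite sums exactly using rational arithmetic.

```python
from fractions import Fraction


def lam(n):
    """Illustrative non-negative function on S = N: lam(n) = 2**(-n)."""
    return Fraction(1, 2 ** n)


def finite_sum(F):
    """Sum of lam over a finite collection F of distinct natural numbers."""
    return sum((lam(s) for s in F), Fraction(0))


if __name__ == "__main__":
    # Sums over initial segments {1, ..., N} equal 1 - 2**(-N).
    for N in (1, 5, 10, 20):
        print(N, finite_sum(range(1, N + 1)))
    # An arbitrary finite subset also has sum below the supremum 1.
    print(finite_sum({3, 7, 19}))
```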
Proposition 6.1 Let S be a countable infinite set, let λ: S → [0, +∞] be a function on S with values in the set [0, +∞] of non-negative extended real numbers, and let ϕ: N → S be a bijective function mapping the set N of natural numbers onto S. Then
\[ \sum_{s \in S} \lambda(s) = \sum_{j=1}^{+\infty} \lambda(\varphi(j)). \]
Thus if s1 , s2 , s3 , . . . is an infinite sequence of distinct elements of S that includes every element of S, then
\[ \sum_{s \in S} \lambda(s) = \sum_{j=1}^{+\infty} \lambda(s_j). \]
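Proof Given any finite subset F of S, there exists some natural number N such that F ⊂ {ϕ(j) : j ∈ N and j ≤ N }. Then
\[ \sum_{s \in F} \lambda(s) \le \sum_{j=1}^{N} \lambda(\varphi(j)) \le \sum_{j=1}^{+\infty} \lambda(\varphi(j)). \]
It follows that
\[ \sum_{s \in S} \lambda(s) = \sup\left\{ \sum_{s \in F} \lambda(s) : F \text{ is finite and } F \subset S \right\} \le \sum_{j=1}^{+\infty} \lambda(\varphi(j)). \]
But
\[ \sum_{j=1}^{N} \lambda(\varphi(j)) \le \sum_{s \in S} \lambda(s) \]
for all natural numbers N , and therefore
\[ \sum_{j=1}^{+\infty} \lambda(\varphi(j)) = \lim_{N \to +\infty} \sum_{j=1}^{N} \lambda(\varphi(j)) \le \sum_{s \in S} \lambda(s). \]
It follows that ∑_{s∈S} λ(s) = ∑_{j=1}^{+∞} λ(ϕ(j)), as required.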
Corollary 6.2 Let a1 , a2 , a3 , . . . be an infinite sequence of non-negative real numbers, and let ϕ: N → N be a permutation of the set of natural numbers. Then
\[ \sum_{j=1}^{+\infty} a_j = \sum_{j=1}^{+\infty} a_{\varphi(j)}. \]
Thus the sum of any infinite series of non-negative real numbers has a value which is independent of the order of summation.
Proof Let λ: N → R be the function defined such that λ(j) = aj for all natural numbers j. It follows immediately from Proposition 6.1 that
\[ \sum_{j=1}^{+\infty} a_{\varphi(j)} = \sum_{j=1}^{+\infty} \lambda(\varphi(j)) = \sum_{j \in \mathbb{N}} \lambda(j). \]
Thus the sum of the infinite series ∑_{j=1}^{+∞} aϕ(j) has a value that is the same for all permutations ϕ: N → N of the set N of natural numbers, and is therefore equal to ∑_{j=1}^{+∞} aj .
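Corollary 6.2 can be illustrated numerically. The Python sketch below uses the series with terms a_j = 1/j² and a rearrangement that lists two odd indices followed by one even index (1, 3, 2, 5, 7, 4, . . .). Both the series and the rearrangement are illustrative choices and not taken from the notes. The partial sums of the two orderings differ, but both approach the same value π²/6.

```python
import math


def a(j):
    """Term of the illustrative non-negative series: a_j = 1/j**2."""
    return 1.0 / (j * j)


def rearranged_indices(m):
    """First 3*m values of a permutation of N: two odd numbers, then one even."""
    out = []
    for k in range(1, m + 1):
        out.extend([4 * k - 3, 4 * k - 1, 2 * k])
    return out


def partial_sum_original(m):
    """Sum of the first 3*m terms in the natural order."""
    return sum(a(j) for j in range(1, 3 * m + 1))


def partial_sum_rearranged(m):
    """Sum of the first 3*m terms in the rearranged order."""
    return sum(a(j) for j in rearranged_indices(m))


if __name__ == "__main__":
    limit = math.pi ** 2 / 6
    for m in (10, 1000, 20000):
        print(m, limit - partial_sum_original(m), limit - partial_sum_rearranged(m))
```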
Let C be a collection of sets. We say that the sets in this collection are pairwise disjoint if and only if the intersection of any two distinct sets in the collection is the empty set.
Proposition 6.3 Let C be a countable collection of countable sets, let V be the union ⋃_{S∈C} S of the sets belonging to the collection C, and let λ: V → [0, +∞] be a function on V taking values in the set [0, +∞] of non-negative extended real numbers. Suppose that the sets in the collection C are pairwise disjoint. Then
\[ \sum_{s \in V} \lambda(s) = \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} \lambda(s) \right). \]
Proof The result follows immediately if C = ∅. It suffices therefore to prove the result when the collection C is non-empty. First we prove that
\[ \sum_{s \in V} \lambda(s) \le \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} \lambda(s) \right). \]
Let F be a finite subset of V . Then the number of sets in the collection C which have non-empty intersection with F is finite. (Indeed the number of such sets cannot exceed the number of elements in the finite set F .) Let the number of such sets be m, and let these sets be S1 , S2 , . . . , Sm . Then the sets S1 , S2 , . . . , Sm are pairwise disjoint (so that Si ∩ Sj = ∅ whenever i ≠ j), and F = ⋃_{j=1}^{m} (F ∩ Sj ), and therefore
\[ \sum_{s \in F} \lambda(s) = \sum_{j=1}^{m} \sum_{s \in F \cap S_j} \lambda(s) \le \sum_{j=1}^{m} \sum_{s \in S_j} \lambda(s) \le \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} \lambda(s) \right). \]
It follows from this that
\[ \sum_{s \in V} \lambda(s) = \sup\left\{ \sum_{s \in F} \lambda(s) : F \text{ is finite and } F \subset V \right\} \le \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} \lambda(s) \right). \]
We now prove the reverse inequality
\[ \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} \lambda(s) \right) \le \sum_{s \in V} \lambda(s). \]
If ∑_{s∈V} λ(s) = +∞ there is nothing to prove. Suppose that ∑_{s∈V} λ(s) < +∞. Then ∑_{s∈S} λ(s) ≤ ∑_{s∈V} λ(s) < +∞ for all S ∈ C. Let S1 , S2 , . . . , Sr be distinct sets belonging to the collection C, and let ε be some positive real number. Then, for each integer j between 1 and r, there exists a finite subset Fj of Sj such that
\[ \sum_{s \in S_j} \lambda(s) - \frac{\varepsilon}{r} < \sum_{s \in F_j} \lambda(s). \]
(Indeed ∑_{s∈Sj} λ(s) is by definition the least upper bound of the sums of the values of the function λ taken over finite subsets of Sj , and may therefore be approximated to within an error of ε/r by the sum taken over some sufficiently large finite subset of Sj .) Let F = F1 ∪ F2 ∪ . . . ∪ Fr . Now the sets S1 , S2 , . . . , Sr are pairwise disjoint, and therefore so are the sets F1 , F2 , . . . , Fr . It follows that
\[ \sum_{j=1}^{r} \sum_{s \in S_j} \lambda(s) - \varepsilon < \sum_{j=1}^{r} \sum_{s \in F_j} \lambda(s) = \sum_{s \in F} \lambda(s) \le \sum_{s \in V} \lambda(s). \]
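The inequality
\[ \sum_{j=1}^{r} \sum_{s \in S_j} \lambda(s) - \varepsilon < \cdots \]
…
\[ \sum_{s \in F_1} \lambda(s) > \sum_{s \in S} \lambda(s) - \tfrac{1}{2}\varepsilon \quad \text{and} \quad \sum_{s \in F_2} \mu(s) > \sum_{s \in S} \mu(s) - \tfrac{1}{2}\varepsilon. \]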
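Let F = F1 ∪ F2 . Then
\[ \sum_{s \in S} (\lambda(s) + \mu(s)) \ge \sum_{s \in F} (\lambda(s) + \mu(s)) = \sum_{s \in F} \lambda(s) + \sum_{s \in F} \mu(s) \ge \sum_{s \in F_1} \lambda(s) + \sum_{s \in F_2} \mu(s) > \sum_{s \in S} \lambda(s) + \sum_{s \in S} \mu(s) - \varepsilon. \]
The inequality
\[ \sum_{s \in S} (\lambda(s) + \mu(s)) > \sum_{s \in S} \lambda(s) + \sum_{s \in S} \mu(s) - \varepsilon \]
therefore holds, irrespective of the value of the positive real quantity ε. It follows that
\[ \sum_{s \in S} (\lambda(s) + \mu(s)) \ge \sum_{s \in S} \lambda(s) + \sum_{s \in S} \mu(s) \]
when ∑_{s∈S} (λ(s) + µ(s)) < +∞. This inequality obviously holds also when the sum on the left hand side has the value +∞. Therefore
\[ \sum_{s \in S} (\lambda(s) + \mu(s)) = \sum_{s \in S} \lambda(s) + \sum_{s \in S} \mu(s), \]
as required.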
6.3
Summable Functions
Definition Let f : S → C be a function defined on a countable set S and taking values in the field C of complex numbers. The function f is said to be summable over the set S provided that
\[ \sum_{s \in S} |f(s)| < +\infty. \]
We shall show that we can attach a well-defined value to the sum
\[ \sum_{s \in S} f(s) \]
of the values of a summable function defined on a countable set S. This result is equivalent to the well-known theorem of analysis which states that an infinite series of real or complex numbers has a sum which is independent of the order of summation, provided that the infinite series is absolutely convergent.
Let S be a countable set, and let f : S → C be a function on S with values in the field of complex numbers. We can write f (s) = x⁺(s) − x⁻(s) + iy⁺(s) − iy⁻(s), where
x⁺(s) = max(Re[f (s)], 0),
x⁻(s) = max(− Re[f (s)], 0),
y⁺(s) = max(Im[f (s)], 0),
y⁻(s) = max(− Im[f (s)], 0).
Then x⁺, x⁻, y⁺ and y⁻ are functions on S with values in the set of non-negative real numbers. We can therefore sum these functions over the set S to obtain well-defined extended real numbers
\[ \sum_{s \in S} x^+(s), \quad \sum_{s \in S} x^-(s), \quad \sum_{s \in S} y^+(s) \quad \text{and} \quad \sum_{s \in S} y^-(s) \]
(which may be finite or infinite). Now x⁺(s) ≤ |f (s)|, x⁻(s) ≤ |f (s)|, y⁺(s) ≤ |f (s)| and y⁻(s) ≤ |f (s)|, and |f (s)| ≤ x⁺(s) + x⁻(s) + y⁺(s) + y⁻(s) for all s ∈ S. It follows that ∑_{s∈S} |f (s)| < +∞ if and only if
\[ \sum_{s \in S} x^+(s) < +\infty, \quad \sum_{s \in S} x^-(s) < +\infty, \quad \sum_{s \in S} y^+(s) < +\infty \quad \text{and} \quad \sum_{s \in S} y^-(s) < +\infty. \]
Definition Let S be a countable set, and let f : S → C be a summable function on S. We define
\[ \sum_{s \in S} f(s) = \sum_{s \in S} x^+(s) - \sum_{s \in S} x^-(s) + i \sum_{s \in S} y^+(s) - i \sum_{s \in S} y^-(s), \]
where x+ (s) = max(Re[f (s)], 0),
x− (s) = max(− Re[f (s)], 0),
y + (s) = max(Im[f (s)], 0),
y − (s) = max(− Im[f (s)], 0).
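The decomposition used in this definition can be sketched in Python for a function on a finite set, where the four sums are ordinary finite sums. The helper names `parts` and `summed` are hypothetical. For an infinite countable set each of the four sums would be a supremum of such finite sums, which this sketch does not attempt to compute.

```python
def parts(z):
    """Return (x_plus, x_minus, y_plus, y_minus) for a complex number z."""
    z = complex(z)
    return (
        max(z.real, 0.0),
        max(-z.real, 0.0),
        max(z.imag, 0.0),
        max(-z.imag, 0.0),
    )


def summed(values):
    """Sum complex values via the four non-negative parts, as in the definition."""
    xp = xm = yp = ym = 0.0
    for z in values:
        a, b, c, d = parts(z)
        xp += a
        xm += b
        yp += c
        ym += d
    # sum f = sum x+ - sum x- + i sum y+ - i sum y-
    return complex(xp - xm, yp - ym)


if __name__ == "__main__":
    values = [1 + 2j, -3 + 0.5j, -1 - 1j]
    print(summed(values), sum(values))
```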
The following result follows immediately from this definition.
Lemma 6.6 Let S be a countable set, and let f : S → C be a summable function on S. Then
\[ \sum_{s \in S} f(s) = \sum_{s \in S} \mathrm{Re}[f(s)] + i \sum_{s \in S} \mathrm{Im}[f(s)]. \]
Lemma 6.7 Let S be a countable set, and let f : S → R be a summable real-valued function on S. Suppose that f (s) = p(s) − q(s) for all s ∈ S, where p(s) ≥ 0 and q(s) ≥ 0 for all s ∈ S. Suppose also that ∑_{s∈S} p(s) < +∞ and ∑_{s∈S} q(s) < +∞. Then
\[ \sum_{s \in S} f(s) = \sum_{s \in S} p(s) - \sum_{s \in S} q(s). \]
Proof It follows from the definition of ∑_{s∈S} f (s) that
\[ \sum_{s \in S} f(s) = \sum_{s \in S} x^+(s) - \sum_{s \in S} x^-(s), \]
where x⁺(s) = max(Re[f (s)], 0) and x⁻(s) = max(− Re[f (s)], 0). Then f (s) = x⁺(s) − x⁻(s) = p(s) − q(s), and therefore x⁺(s) + q(s) = x⁻(s) + p(s) for all s ∈ S. It follows from Lemma 6.5 that
\[ \sum_{s \in S} x^+(s) + \sum_{s \in S} q(s) = \sum_{s \in S} x^-(s) + \sum_{s \in S} p(s). \]
Therefore
\[ \sum_{s \in S} f(s) = \sum_{s \in S} x^+(s) - \sum_{s \in S} x^-(s) = \sum_{s \in S} p(s) - \sum_{s \in S} q(s), \]
as required.
Proposition 6.8 Let S be a countable set, and let f : S → C and g: S → C be summable functions on S. Then
\[ \sum_{s \in S} (f(s) + g(s)) = \sum_{s \in S} f(s) + \sum_{s \in S} g(s). \]
Proof Let
x⁺(s) = max(Re[f (s)], 0),
x⁻(s) = max(− Re[f (s)], 0),
y⁺(s) = max(Im[f (s)], 0),
y⁻(s) = max(− Im[f (s)], 0),
u⁺(s) = max(Re[g(s)], 0),
u⁻(s) = max(− Re[g(s)], 0),
v⁺(s) = max(Im[g(s)], 0),
v⁻(s) = max(− Im[g(s)], 0).
Then
\[ \sum_{s \in S} f(s) = \sum_{s \in S} x^+(s) - \sum_{s \in S} x^-(s) + i \sum_{s \in S} y^+(s) - i \sum_{s \in S} y^-(s) \]
and
\[ \sum_{s \in S} g(s) = \sum_{s \in S} u^+(s) - \sum_{s \in S} u^-(s) + i \sum_{s \in S} v^+(s) - i \sum_{s \in S} v^-(s). \]
Now Re[f (s) + g(s)] = x⁺(s) + u⁺(s) − (x⁻(s) + u⁻(s)) for all s ∈ S. It follows from Lemma 6.5, Lemma 6.6 and Lemma 6.7 that
\begin{align*}
\mathrm{Re}\left[ \sum_{s \in S} (f(s) + g(s)) \right]
&= \sum_{s \in S} \mathrm{Re}[f(s) + g(s)] \\
&= \sum_{s \in S} (x^+(s) + u^+(s)) - \sum_{s \in S} (x^-(s) + u^-(s)) \\
&= \sum_{s \in S} x^+(s) + \sum_{s \in S} u^+(s) - \sum_{s \in S} x^-(s) - \sum_{s \in S} u^-(s) \\
&= \sum_{s \in S} \mathrm{Re}[f(s)] + \sum_{s \in S} \mathrm{Re}[g(s)] \\
&= \mathrm{Re}\left[ \sum_{s \in S} f(s) + \sum_{s \in S} g(s) \right].
\end{align*}
Similarly
\[ \mathrm{Im}\left[ \sum_{s \in S} (f(s) + g(s)) \right] = \mathrm{Im}\left[ \sum_{s \in S} f(s) + \sum_{s \in S} g(s) \right]. \]
Therefore
\[ \sum_{s \in S} (f(s) + g(s)) = \sum_{s \in S} f(s) + \sum_{s \in S} g(s), \]
as required.
Corollary 6.9 Let S be a countable set, and let f : S → C be a summable function on S. Then
\[ \sum_{s \in S} c f(s) = c \sum_{s \in S} f(s) \]
for all complex numbers c.
Proof Let f (s) = x(s) + iy(s) for all s ∈ S, where x(s) ∈ R and y(s) ∈ R, and let c = a + ib, where a, b ∈ R. Then cf (s) = ax(s) − by(s) + i(ay(s) + bx(s)) for all s ∈ S. Now
\[ \sum_{s \in S} x(s) = \sum_{s \in S} x^+(s) - \sum_{s \in S} x^-(s) \]
and
\[ \sum_{s \in S} a x(s) = \sum_{s \in S} a x^+(s) - \sum_{s \in S} a x^-(s). \]
(This last identity follows directly from the definition of the sum of a real-valued function over a countable set, on considering separately the cases when a ≥ 0 and when a ≤ 0.) It follows that
\[ \sum_{s \in S} a x(s) = a \sum_{s \in S} x(s). \]
Similarly
\[ \sum_{s \in S} b x(s) = b \sum_{s \in S} x(s), \qquad \sum_{s \in S} a y(s) = a \sum_{s \in S} y(s), \qquad \sum_{s \in S} b y(s) = b \sum_{s \in S} y(s). \]
It follows that
\begin{align*}
\sum_{s \in S} c f(s)
&= \sum_{s \in S} a x(s) - \sum_{s \in S} b y(s) + i \left( \sum_{s \in S} a y(s) + \sum_{s \in S} b x(s) \right) \\
&= a \sum_{s \in S} x(s) - b \sum_{s \in S} y(s) + i \left( a \sum_{s \in S} y(s) + b \sum_{s \in S} x(s) \right) \\
&= (a + ib) \left( \sum_{s \in S} x(s) + i \sum_{s \in S} y(s) \right) = c \sum_{s \in S} f(s),
\end{align*}
as required.
Corollary 6.10 Let S be a countable set, and let f : S → C be a summable function on S. Then
\[ \left| \sum_{s \in S} f(s) \right| \le \sum_{s \in S} |f(s)|. \]
Proof There exists a complex number c satisfying |c| = 1 for which
\[ \left| \sum_{s \in S} f(s) \right| = c \sum_{s \in S} f(s). \]
Then
\[ \left| \sum_{s \in S} f(s) \right| = \sum_{s \in S} \mathrm{Re}[c f(s)] \le \sum_{s \in S} |f(s)|, \]
as required.
Proposition 6.11 Let C be a countable collection of countable sets, let V be the union ⋃_{S∈C} S of the sets belonging to the collection C, and let f be a complex-valued function on V . Suppose that the sets in the collection C are pairwise disjoint, and that either
\[ \sum_{s \in V} |f(s)| < +\infty \quad \text{or} \quad \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} |f(s)| \right) < +\infty. \]
Then
\[ \sum_{s \in V} f(s) = \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} f(s) \right). \]
Proof We set f (s) = x⁺(s) − x⁻(s) + iy⁺(s) − iy⁻(s) for all s ∈ V , where
x⁺(s) = max(Re[f (s)], 0),
x⁻(s) = max(− Re[f (s)], 0),
y⁺(s) = max(Im[f (s)], 0),
y⁻(s) = max(− Im[f (s)], 0).
It follows from Corollary 6.4 that
\[ \sum_{s \in V} \lambda(s) = \sum_{S \in \mathcal{C}} \left( \sum_{s \in S} \lambda(s) \right) \]
in the cases when λ(s) = |f (s)|, λ(s) = x⁺(s), λ(s) = x⁻(s), λ(s) = y⁺(s) and λ(s) = y⁻(s). Thus if at least one of the quantities ∑_{s∈V} |f (s)| and ∑_{S∈C} (∑_{s∈S} |f (s)|) is finite then
\[ \sum_{s \in V} x^+(s) < +\infty, \quad \sum_{s \in V} x^-(s) < +\infty, \quad \sum_{s \in V} y^+(s) < +\infty \quad \text{and} \quad \sum_{s \in V} y^-(s) < +\infty. \]
But then
\[ \sum_{s \in W} f(s) = \sum_{s \in W} x^+(s) - \sum_{s \in W} x^-(s) + i \sum_{s \in W} y^+(s) - i \sum_{s \in W} y^-(s) \]
for any subset W of V . The result therefore follows on applying Corollary 6.4 to the functions that send s ∈ V to x⁺(s), x⁻(s), y⁺(s) and y⁻(s).
The following result now follows directly from Proposition 6.11.
Corollary 6.12 Let S and T be countable sets, and let f : S × T → C be a complex-valued function on S × T . Suppose that at least one of the quantities
\[ \sum_{s \in S} \left( \sum_{t \in T} |f(s,t)| \right), \qquad \sum_{(s,t) \in S \times T} |f(s,t)|, \qquad \sum_{t \in T} \left( \sum_{s \in S} |f(s,t)| \right) \]
is finite. Then all these quantities are finite, and
\[ \sum_{s \in S} \left( \sum_{t \in T} f(s,t) \right) = \sum_{(s,t) \in S \times T} f(s,t) = \sum_{t \in T} \left( \sum_{s \in S} f(s,t) \right). \]
Example The exponential function of complex analysis is defined such that
\[ \exp z = \sum_{n=0}^{+\infty} \frac{z^n}{n!} \]
for all complex numbers z. Let z and w be complex numbers, let P be the set of non-negative integers, and let f : P × P → C be the function defined such that
\[ f(j,k) = \frac{z^j w^k}{j!\,k!} \]
for all non-negative integers j and k. Now
\[ \sum_{j \in P} \left( \sum_{k \in P} |f(j,k)| \right) = \sum_{j \in P} \left( \frac{|z|^j}{j!} \sum_{k \in P} \frac{|w|^k}{k!} \right) = \sum_{j \in P} \frac{|z|^j}{j!} \exp |w| = \exp |z| \exp |w|. \]
Thus f is a summable function on P × P . Moreover it follows from Corollary 6.12 that
\[ \sum_{(j,k) \in P \times P} f(j,k) = \sum_{j \in P} \left( \sum_{k \in P} f(j,k) \right) = \sum_{j \in P} \left( \frac{z^j}{j!} \sum_{k \in P} \frac{w^k}{k!} \right) = \sum_{j \in P} \frac{z^j}{j!} \exp(w) = \exp z \exp w. \]
Now P × P is the disjoint union of the sets D0 , D1 , D2 , D3 , . . ., where Dn = {(j, k) ∈ P × P : j + k = n} for each non-negative integer n. It follows from Proposition 6.11 that
\[ \sum_{(j,k) \in P \times P} f(j,k) = \sum_{n=0}^{+\infty} \left( \sum_{(j,k) \in D_n} f(j,k) \right). \]
But k = n − j for all (j, k) ∈ Dn , and therefore
\[ \sum_{(j,k) \in D_n} f(j,k) = \sum_{j=0}^{n} f(j, n-j) = \sum_{j=0}^{n} \frac{z^j w^{n-j}}{j!\,(n-j)!} = \frac{1}{n!} \sum_{j=0}^{n} \binom{n}{j} z^j w^{n-j} = \frac{(z+w)^n}{n!}. \]
Therefore
\[ \exp z \exp w = \sum_{(j,k) \in P \times P} f(j,k) = \sum_{n=0}^{+\infty} \left( \sum_{(j,k) \in D_n} f(j,k) \right) = \sum_{n=0}^{+\infty} \frac{(z+w)^n}{n!} = \exp(z + w). \]
Thus this standard identity for the exponential function is a consequence of the basic theory of summable functions on countable sets developed above, as is the more general result involving Cauchy products of absolutely convergent infinite series.
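The summation over the diagonals Dₙ in this example can be checked numerically. The Python sketch below sums f(j, k) over each diagonal and compares the truncated total with the library value of exp(z + w). The truncation at 40 diagonals and the function names are assumptions made for illustration. The comparison is a floating-point sanity check and does not replace the argument above.

```python
import cmath
from math import factorial


def diagonal_sum(z, w, n):
    """Sum of f(j, k) = z**j * w**k / (j! k!) over D_n = {(j, k) : j + k = n}."""
    return sum(
        z ** j * w ** (n - j) / (factorial(j) * factorial(n - j))
        for j in range(n + 1)
    )


def exp_product_by_diagonals(z, w, terms=40):
    """Truncated sum over n = 0, ..., terms - 1 of the diagonal sums."""
    return sum(diagonal_sum(z, w, n) for n in range(terms))


if __name__ == "__main__":
    z, w = 0.5 + 1j, -1 + 0.25j
    print(exp_product_by_diagonals(z, w))
    print(cmath.exp(z) * cmath.exp(w))
    print(cmath.exp(z + w))
```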
Course 221: Hilary Term 2007 Section 7: Measure Spaces David R. Wilkins c David R. Wilkins 1997–2007 Copyright
Contents
7 Measure Spaces
7.1 Bricks
7.2 Lebesgue Outer Measure
7.3 Outer Measures
7.4 Measure Spaces
7.5 Lebesgue Measure on Euclidean Spaces
7.6 Basic Properties of Measures
7
Measure Spaces
7.1
Bricks
Definition We define an n-dimensional brick to be a subset of Rn that is a Cartesian product of bounded intervals. Let B be an n-dimensional brick. Then there exist bounded intervals I1 , I2 , . . . , In such that B = I1 × I2 × · · · × In . Let ai and bi denote the endpoints of the interval Ii for i = 1, 2, . . . , n, where ai ≤ bi . Then the interval Ii must coincide with one of the intervals (ai , bi ), (ai , bi ], [ai , bi ) and [ai , bi ] determined by its endpoints, where (ai , bi ) = {x ∈ R : ai < x < bi },
(ai , bi ] = {x ∈ R : ai < x ≤ bi }
[ai , bi ) = {x ∈ R : ai ≤ x < bi },
[ai , bi ] = {x ∈ R : ai ≤ x ≤ bi }.
We say that the brick B is open if Ii = (ai , bi ) for i = 1, 2, . . . , n. Similarly we say that the brick B is closed if Ii = [ai , bi ] for i = 1, 2, . . . , n.
Definition Let B be an n-dimensional brick that is the Cartesian product I1 × I2 × · · · × In of bounded intervals I1 , I2 , . . . , In , and let ai and bi denote the endpoints of the interval Ii , where ai ≤ bi . The content m(B) of the brick B is then defined to be the product ∏_{i=1}^{n} (bi − ai ) of the lengths of the intervals I1 , I2 , . . . , In .
Note that a one-dimensional brick is a bounded interval in the real line, and the content of the brick is the length of the interval. A two-dimensional brick is a rectangle in R2 with sides parallel to the coordinate axes, and the content of the brick is the area of the rectangle. The content of a three-dimensional brick is the volume of that brick.
Let B be an n-dimensional brick, and let B1 , B2 , . . . , Bs be a finite collection of n-dimensional bricks. We shall show that if B ⊂ ⋃_{k=1}^{s} Bk then m(B) ≤ ∑_{k=1}^{s} m(Bk ). We shall also show that if the interiors of the bricks B1 , B2 , . . . , Bs are disjoint and are contained in B then m(B) ≥ ∑_{k=1}^{s} m(Bk ).
These results are of course fairly intuitive, and may at first sight seem to be obvious. Now the brick B is the Cartesian product I1 × I2 × · · · × In of bounded intervals I1 , I2 , . . . , In in the real line. Let ai and bi denote the endpoints of the interval Ii for i = 1, 2, . . . , n, where ai and bi are real numbers satisfying ai ≤ bi . Similarly each brick Bk is a Cartesian product Ik,1 × Ik,2 × · · · × Ik,n of bounded intervals Ik,1 , Ik,2 , . . . , Ik,n in the real line. Let ak,i and bk,i denote the endpoints of the interval Ik,i for i = 1, 2, . . . , n, where ak,i and bk,i are real numbers satisfying ak,i ≤ bk,i . Then m(B) = ∏_{i=1}^{n} (bi − ai ), and m(Bk ) = ∏_{i=1}^{n} (bk,i − ak,i ) for k = 1, 2, . . . , s.
Now there exist finite sets P1 , P2 , . . . , Pn such that ai ∈ Pi , bi ∈ Pi , ak,i ∈ Pi and bk,i ∈ Pi for i = 1, 2, . . . , n and k = 1, 2, . . . , s. Let Pi = {ti,0 , ti,1 , ti,2 , . . . , ti,mi } for i = 1, 2, . . . , n, where ti,0 < ti,1 < ti,2 < · · · < ti,mi . Also let J denote the set consisting of all n-tuples (j1 , j2 , j3 , . . . , jn ) with 1 ≤ ji ≤ mi for i = 1, 2, . . . , n, and, for each (j1 , j2 , . . . , jn ) ∈ J, let Vj1 ,j2 ,...,jn denote the open brick consisting of all points (x1 , x2 , . . . , xn ) of Rn that satisfy ti,ji −1 < xi < ti,ji for i = 1, 2, . . . , n. Then the content m(Vj1 ,j2 ,...,jn ) of the brick Vj1 ,j2 ,...,jn is the product ∏_{i=1}^{n} (ti,ji − ti,ji −1 ) of the lengths ti,ji − ti,ji −1 of the intervals (ti,ji −1 , ti,ji ).
Now, given any integer i between 1 and n, the endpoints ai and bi of the interval Ii belong to the set Pi , and therefore there exist integers pi and qi satisfying 0 ≤ pi ≤ qi ≤ mi such that ai = ti,pi and bi = ti,qi . Then
\[ b_i - a_i = \sum_{p_i < j_i \le q_i} (t_{i,j_i} - t_{i,j_i - 1}). \]
(The sum on the right hand side of the above equality has the value zero when pi = qi .) It follows from this that
\[ m(B) = \prod_{i=1}^{n} (b_i - a_i) = \sum_{(j_1, j_2, \ldots, j_n) \in J(B)} \prod_{i=1}^{n} (t_{i,j_i} - t_{i,j_i - 1}) = \sum_{(j_1, j_2, \ldots, j_n) \in J(B)} m(V_{j_1, j_2, \ldots, j_n}), \]
where J(B) = {(j1 , j2 , . . . , jn ) ∈ J : pi < ji ≤ qi for i = 1, 2, . . . , n} = {(j1 , j2 , . . . , jn ) ∈ J : Vj1 ,j2 ,...,jn ⊂ B}.
Now (j1 , j2 , . . . , jn ) ∈ J(B) if and only if Vj1 ,j2 ,...,jn ⊂ B. We conclude therefore that the content m(B) of the brick B is the sum of the contents of those open bricks Vj1 ,j2 ,...,jn for which (j1 , j2 , . . . , jn ) ∈ J and Vj1 ,j2 ,...,jn ⊂ B. Similarly
\[ m(B_k) = \sum_{(j_1, j_2, \ldots, j_n) \in J(B_k)} m(V_{j_1, j_2, \ldots, j_n}) \]
for k = 1, 2, . . . , s, where J(Bk ) = {(j1 , j2 , . . . , jn ) ∈ J : Vj1 ,j2 ,...,jn ⊂ Bk }.
Now suppose that B ⊂ ⋃_{k=1}^{s} Bk . Then J(B) ⊂ ⋃_{k=1}^{s} J(Bk ), and therefore
\[ m(B) = \sum_{(j_1, j_2, \ldots, j_n) \in J(B)} m(V_{j_1, j_2, \ldots, j_n}) \le \sum_{k=1}^{s} \sum_{(j_1, j_2, \ldots, j_n) \in J(B_k)} m(V_{j_1, j_2, \ldots, j_n}) = \sum_{k=1}^{s} m(B_k). \]
On the other hand, suppose that the interiors of the bricks B1 , B2 , . . . , Bs are disjoint and contained in B. Then ⋃_{k=1}^{s} J(Bk ) ⊂ J(B), and moreover each n-tuple (j1 , j2 , . . . , jn ) of integers in J(B) belongs to at most one of the sets J(B1 ), J(B2 ), . . . , J(Bs ). Therefore
\[ m(B) \ge \sum_{k=1}^{s} \sum_{(j_1, j_2, \ldots, j_n) \in J(B_k)} m(V_{j_1, j_2, \ldots, j_n}) = \sum_{k=1}^{s} m(B_k). \]
We have therefore proved the following two results.
Proposition 7.1 Let B be a brick in n-dimensional Euclidean space Rn , and let B1 , B2 , . . . , Bs be a finite collection of bricks in Rn . Suppose that B ⊂ ⋃_{k=1}^{s} Bk . Then m(B) ≤ ∑_{k=1}^{s} m(Bk ).
Proposition 7.2 Let B be a brick in n-dimensional Euclidean space Rn , and let B1 , B2 , . . . , Bs be a finite collection of bricks in Rn . Suppose that the interiors of the bricks B1 , B2 , . . . , Bs are disjoint and are contained in B. Then m(B) ≥ ∑_{k=1}^{s} m(Bk ).
The following corollary follows immediately from the inequalities proved above.
Corollary 7.3 Let B be a brick in n-dimensional Euclidean space Rn , and let B1 , B2 , . . . , Bs be a finite collection of bricks in Rn . Suppose that the interiors of the bricks B1 , B2 , . . . , Bs are disjoint and B = ⋃_{k=1}^{s} Bk . Then m(B) = ∑_{k=1}^{s} m(Bk ).
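The content of a brick and the additivity asserted by Corollary 7.3 can be illustrated with a short Python sketch. A brick is represented here by a list of endpoint pairs (a_i, b_i), and `grid_subdivision` cuts it along chosen points on each axis, in the spirit of the sets Pᵢ used above. This representation and the function names are assumptions made for illustration. Whether each endpoint is included does not affect the content, so it is not recorded.

```python
from itertools import product
from math import prod


def content(brick):
    """Content m(B) of a brick given as a list of (a_i, b_i) with a_i <= b_i."""
    return prod(b - a for a, b in brick)


def grid_subdivision(brick, cuts):
    """Split a brick into sub-bricks along the given cut points.

    cuts[i] is an increasing list of points strictly between a_i and b_i.
    The resulting sub-bricks have disjoint interiors and their union is
    the original brick.
    """
    axes = []
    for (a, b), cs in zip(brick, cuts):
        pts = [a, *cs, b]
        axes.append(list(zip(pts[:-1], pts[1:])))
    return [list(cell) for cell in product(*axes)]


if __name__ == "__main__":
    B = [(0, 4), (1, 3)]                      # a rectangle of area 8
    pieces = grid_subdivision(B, [[1, 3], [2]])
    print(len(pieces), content(B), sum(content(p) for p in pieces))
```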
Lemma 7.4 Let B be an brick in Rn , and let ε be any positive real number. Then there exist a closed brick F and and open brick V such that F ⊂ B ⊂ V , m(F ) > m(B) − ε and m(V ) < m(B) + ε. Proof Suppose that B = I1 × I2 × · · · × In , where I1 , I2 , . . . , In are bounded intervals. Now lim
h→0
n n Y Y (m(Ii ) + h) = m(Ii ) = m(B). i=1
i=1
It follows that, given any positive real number ε, we can choose the positive real number δ small enough to ensure that n Y (m(Ii ) − δ) > m(B) − ε,
n Y (m(Ii ) + δ) < m(B) + ε.
i=1
i=1
Let F = J1 × J2 × · · · × Jn and V = K1 × K2 × · · · × Kn , where J1 , J2 , . . . , Jn are closed bounded intervals chosen such that Ji ⊂ Ii and m(Ji ) > m(Ii ) − δ for i = 1, 2, . . . , n, and K1 , K2 , . . . , Kn are open bounded intervals chosen such that Ii ⊂ Ki and m(Ki ) < m(Ii ) + δ for i = 1, 2, . . . , n. Then F is a closed brick, V is an open brick, F ⊂ B ⊂ V , m(F ) > m(B) − ε and m(V ) < m(B) + ε, as required. Any closed n-dimensional brick F is a compact subset of Rn . This means that, given any collection V of open sets in Rn that covers F (so that each point of F belongs to at least one of the open sets in the collection), there exists some finite collection V1 , V2 , . . . , Vs of open sets belonging to the collection V such that F ⊂ V1 ∪ V2 ∪ · · · ∪ Vs . We shall use this property of closed bricks in order to generalize Proposition 7.1 to countable infinite unions of bricks. Proposition 7.5 Let A be a brick in n-dimensional Euclidean spaceS Rn , and n let C be a countable P collection of bricks in R . Suppose that A ⊂ B∈C B. Then m(A) ≤ m(B). B∈C
Proof There is nothing to prove if $\sum_{B \in \mathcal{C}} m(B) = +\infty$. We may therefore restrict our attention to the case where $\sum_{B \in \mathcal{C}} m(B) < +\infty$. Moreover the result is an immediate consequence of Proposition 7.1 if the collection $\mathcal{C}$ is finite. It therefore only remains to prove the result in the case where the collection $\mathcal{C}$ is infinite, but countable. In that case there exists an infinite sequence $B_1, B_2, B_3, \ldots$ of bricks with the property that each brick in the collection $\mathcal{C}$ occurs exactly once in the sequence.

Let some positive real number $\varepsilon$ be given. It follows from Lemma 7.4 that there exists a closed brick $F$ such that $F \subset A$ and $m(F) > m(A) - \varepsilon$. Also, for each $k \in \mathbb{N}$, there exists an open brick $V_k$ such that $B_k \subset V_k$ and $m(V_k) < m(B_k) + 2^{-k}\varepsilon$. Then $F \subset \bigcup_{k=1}^{+\infty} V_k$, and thus $\{V_1, V_2, V_3, \ldots\}$ is a collection of open sets in $\mathbb{R}^n$ which covers the closed bounded set $F$. It follows from the compactness of $F$ that there exists a finite collection $k_1, k_2, \ldots, k_s$ of distinct positive integers such that $F \subset V_{k_1} \cup V_{k_2} \cup \cdots \cup V_{k_s}$. It then follows from Proposition 7.1 that
$$m(F) \le m(V_{k_1}) + m(V_{k_2}) + \cdots + m(V_{k_s}).$$
Now
$$\frac{1}{2^{k_1}} + \frac{1}{2^{k_2}} + \cdots + \frac{1}{2^{k_s}} \le \sum_{k=1}^{+\infty} \frac{1}{2^k} = 1,$$
and therefore
$$m(F) \le m(V_{k_1}) + m(V_{k_2}) + \cdots + m(V_{k_s}) \le m(B_{k_1}) + m(B_{k_2}) + \cdots + m(B_{k_s}) + \varepsilon \le \sum_{k=1}^{+\infty} m(B_k) + \varepsilon.$$
Also $m(A) < m(F) + \varepsilon$. It follows that
$$m(A) \le \sum_{k=1}^{+\infty} m(B_k) + 2\varepsilon.$$
Moreover this inequality holds no matter how small the value of the positive real number $\varepsilon$. It follows that
$$m(A) \le \sum_{k=1}^{+\infty} m(B_k),$$
as required.
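The proof of Proposition 7.5 enlarges the $k$th brick by at most $2^{-k}\varepsilon$, so that the total enlargement over any finite selection of distinct indices stays below $\varepsilon$. The sketch below illustrates that bookkeeping only; the function name `enlargement_budget` and the sample index list are hypothetical.

```python
from fractions import Fraction


def enlargement_budget(eps, indices):
    """Total extra content eps * 2**(-k) allowed for the bricks V_k, k in indices."""
    return sum(eps * Fraction(1, 2**k) for k in indices)


eps = Fraction(1, 10)
chosen = [2, 5, 7, 11]                 # hypothetical distinct indices k_1, ..., k_s
spent = enlargement_budget(eps, chosen)
first_fifty = enlargement_budget(eps, range(1, 51))
```

Whatever distinct indices the compactness argument selects, `spent` cannot exceed `eps`, because even the sum over the first fifty indices equals `eps * (1 - 2**-50)`, which is still less than `eps`.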
7.2
Lebesgue Outer Measure
We say that a collection $\mathcal{C}$ of n-dimensional bricks covers a subset $E$ of $\mathbb{R}^n$ if $E \subset \bigcup_{B \in \mathcal{C}} B$ (where $\bigcup_{B \in \mathcal{C}} B$ denotes the union of all the bricks belonging to the collection $\mathcal{C}$). Given any subset $E$ of $\mathbb{R}^n$, we shall denote by $\mathrm{CCB}_n(E)$ the set of all countable collections of n-dimensional bricks that cover the set $E$.

Definition Let $E$ be a subset of $\mathbb{R}^n$. We define the Lebesgue outer measure $\mu^*(E)$ of $E$ to be the infimum, or greatest lower bound, of the quantities $\sum_{B \in \mathcal{C}} m(B)$, where this infimum is taken over all countable collections $\mathcal{C}$ of n-dimensional bricks that cover the set $E$. Thus
$$\mu^*(E) = \inf\left\{ \sum_{B \in \mathcal{C}} m(B) : \mathcal{C} \in \mathrm{CCB}_n(E) \right\}.$$

The Lebesgue outer measure $\mu^*(E)$ of a subset $E$ of $\mathbb{R}^n$ is thus the greatest extended real number $l$ with the property that $l \le \sum_{B \in \mathcal{C}} m(B)$ for any countable collection $\mathcal{C}$ of n-dimensional bricks that covers the set $E$. In particular, $\mu^*(E) = +\infty$ if and only if $\sum_{B \in \mathcal{C}} m(B) = +\infty$ for every countable collection $\mathcal{C}$ of n-dimensional bricks that covers the set $E$. Note that $\mu^*(E) \ge 0$ for all subsets $E$ of $\mathbb{R}^n$.

Lemma 7.6 Let $E$ be a brick in $\mathbb{R}^n$. Then $\mu^*(E) = m(E)$, where $m(E)$ is the content of the brick $E$.

Proof It follows from Proposition 7.5 that $m(E) \le \sum_{B \in \mathcal{C}} m(B)$ for any countable collection $\mathcal{C}$ of n-dimensional bricks that covers the brick $E$. Therefore $m(E) \le \mu^*(E)$. But the collection $\{E\}$ consisting of the single brick $E$ is itself a countable collection of bricks covering $E$, and therefore $\mu^*(E) \le m(E)$. It follows that $\mu^*(E) = m(E)$, as required.

Lemma 7.7 Let $E$ and $F$ be subsets of $\mathbb{R}^n$. Suppose that $E \subset F$. Then $\mu^*(E) \le \mu^*(F)$.

Proof Any countable collection of n-dimensional bricks that covers the set $F$ will also cover the set $E$, and therefore $\mathrm{CCB}_n(F) \subset \mathrm{CCB}_n(E)$. It follows that
$$\mu^*(F) = \inf\left\{ \sum_{B \in \mathcal{C}} m(B) : \mathcal{C} \in \mathrm{CCB}_n(F) \right\} \ge \inf\left\{ \sum_{B \in \mathcal{C}} m(B) : \mathcal{C} \in \mathrm{CCB}_n(E) \right\} = \mu^*(E),$$
as required.

Proposition 7.8 Let $\mathcal{E}$ be a countable collection of subsets of $\mathbb{R}^n$. Then
$$\mu^*\left( \bigcup_{E \in \mathcal{E}} E \right) \le \sum_{E \in \mathcal{E}} \mu^*(E).$$
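Proof Let $K = \mathbb{N}$ in the case where the countable collection $\mathcal{E}$ is infinite, and let $K = \{1, 2, \ldots, m\}$ in the case where the collection $\mathcal{E}$ is finite and has $m$ elements. Then there exists a bijective function $\varphi\colon K \to \mathcal{E}$. We define $E_k = \varphi(k)$ for all $k \in K$. Then $\mathcal{E} = \{E_k : k \in K\}$, and any subset of $\mathbb{R}^n$ belonging to the collection $\mathcal{E}$ is of the form $E_k$ for exactly one element $k$ of the indexing set $K$.

Let some positive real number $\varepsilon$ be given. Then corresponding to each element $k$ of $K$ there exists a countable collection $\mathcal{C}_k$ of n-dimensional bricks covering the set $E_k$ for which
$$\sum_{B \in \mathcal{C}_k} m(B) < \mu^*(E_k) + \frac{\varepsilon}{2^k}.$$
Let $\mathcal{C} = \bigcup_{k \in K} \mathcal{C}_k$. Then $\mathcal{C}$ is a collection of n-dimensional bricks that covers the union $\bigcup_{E \in \mathcal{E}} E$ of all the sets in the collection $\mathcal{E}$. Moreover every brick belonging to the collection $\mathcal{C}$ belongs to at least one of the collections $\mathcal{C}_k$, and therefore belongs to exactly one of the collections $\mathcal{D}_k$, where $\mathcal{D}_k = \mathcal{C}_k \setminus \bigcup_{j<k} \mathcal{C}_j$ [...] $m > 1$. It follows from the definition of measurable sets that
$$\lambda\left( A \cap \bigcup_{k=1}^{m} E_k \right) = \lambda\left( \left( A \cap \bigcup_{k=1}^{m} E_k \right) \setminus E_m \right) + \lambda\left( \left( A \cap \bigcup_{k=1}^{m} E_k \right) \cap E_m \right).$$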
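But
$$\left( A \cap \bigcup_{k=1}^{m} E_k \right) \setminus E_m = A \cap \bigcup_{k=1}^{m-1} E_k \quad\text{and}\quad \left( A \cap \bigcup_{k=1}^{m} E_k \right) \cap E_m = A \cap E_m,$$
because the sets $E_1, E_2, \ldots, E_m$ are pairwise disjoint. Therefore
$$\lambda\left( A \cap \bigcup_{k=1}^{m} E_k \right) = \lambda\left( A \cap \bigcup_{k=1}^{m-1} E_k \right) + \lambda(A \cap E_m).$$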
The required result therefore follows by induction on $m$.

Proposition 7.10 Let λ be an outer measure on a set $X$, and let $E$ and $F$ be λ-measurable subsets of $X$. Then the complement $X \setminus E$ of $E$, and the union $E \cup F$, intersection $E \cap F$ and difference $E \setminus F$ of $E$ and $F$ are λ-measurable.

Proof Let $E^c = X \setminus E$, $F^c = X \setminus F$ and $(E \cup F)^c = X \setminus (E \cup F)$. Then $A \cap E^c = A \setminus E$ and $A \setminus E^c = A \cap E$ for all subsets $A$ of $X$, and therefore
$$\lambda(A) = \lambda(A \setminus E) + \lambda(A \cap E) = \lambda(A \cap E^c) + \lambda(A \setminus E^c).$$
We conclude that the complement $X \setminus E$ of the λ-measurable subset $E$ of $X$ is itself a λ-measurable subset of $X$.

Next we show that $E \cup F$ is λ-measurable. Now
$$\lambda(A) = \lambda(A \cap E) + \lambda(A \setminus E) = \lambda(A \cap E) + \lambda(A \cap E^c)$$
for all subsets $A$ of $X$. Also
$$\lambda(B) = \lambda(B \cap F) + \lambda(B \setminus F) = \lambda(B \cap F) + \lambda(B \cap F^c)$$
for all subsets $B$ of $X$. Therefore
$$\lambda(A \cap E) = \lambda(A \cap E \cap F) + \lambda(A \cap E \cap F^c), \qquad \lambda(A \cap E^c) = \lambda(A \cap E^c \cap F) + \lambda(A \cap E^c \cap F^c),$$
and thus
$$\lambda(A) = \lambda(A \cap E) + \lambda(A \cap E^c) = \lambda(A \cap E \cap F) + \lambda(A \cap E \cap F^c) + \lambda(A \cap E^c \cap F) + \lambda(A \cap E^c \cap F^c)$$
for all subsets $A$ of $X$.

Let $A$ be a subset of $X$, and let $B = A \cap (E \cup F)$. Then $A \cap E \cap F \subset B$, $A \cap E \cap F^c \subset B$, $A \cap E^c \cap F \subset B$ and $A \cap E^c \cap F^c \subset X \setminus B$, and therefore
$$B \cap E \cap F = A \cap E \cap F, \quad B \cap E \cap F^c = A \cap E \cap F^c, \quad B \cap E^c \cap F = A \cap E^c \cap F, \quad B \cap E^c \cap F^c = \emptyset.$$
It follows, on applying the four-term identity above to the set $B$, that
$$\lambda(A \cap (E \cup F)) = \lambda(B) = \lambda(B \cap E \cap F) + \lambda(B \cap E \cap F^c) + \lambda(B \cap E^c \cap F) + \lambda(B \cap E^c \cap F^c) = \lambda(A \cap E \cap F) + \lambda(A \cap E \cap F^c) + \lambda(A \cap E^c \cap F).$$
Also $A \cap E^c \cap F^c = A \cap (E \cup F)^c$. We conclude therefore that
$$\lambda(A) = \lambda(A \cap E \cap F) + \lambda(A \cap E \cap F^c) + \lambda(A \cap E^c \cap F) + \lambda(A \cap E^c \cap F^c) = \lambda(A \cap (E \cup F)) + \lambda(A \cap (E \cup F)^c)$$
for all subsets $A$ of $X$. This shows that if $E$ and $F$ are λ-measurable subsets of $X$, then so is $E \cup F$.

Let $E$ and $F$ be λ-measurable subsets of $X$. Then $X \setminus E$ and $X \setminus F$ are λ-measurable sets, and therefore $(X \setminus E) \cup (X \setminus F)$ is a λ-measurable set. But $(X \setminus E) \cup (X \setminus F) = X \setminus (E \cap F)$. Thus the complement $X \setminus (E \cap F)$ of $E \cap F$ is a λ-measurable set, and therefore $E \cap F$ is itself a λ-measurable set. Thus the intersection of any two λ-measurable subsets of $X$ is a λ-measurable set. It follows from this that the intersection of any finite collection of λ-measurable subsets of $X$ is itself λ-measurable.

Let $E$ and $F$ be λ-measurable subsets of $X$. Then $E \setminus F = E \cap (X \setminus F)$, and $E$ and $X \setminus F$ are both λ-measurable sets. It follows that the difference $E \setminus F$ of any two λ-measurable subsets $E$ and $F$ of $X$ is itself λ-measurable. This completes the proof.

It follows from the above proposition that any finite union or intersection of measurable sets is measurable.

Proposition 7.11 Let λ be an outer measure on a set $X$. Then the union of any countable collection of λ-measurable subsets of $X$ is λ-measurable.
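Proof The union of any two λ-measurable sets is λ-measurable (Proposition 7.10). It follows from this that the union of any finite collection of λ-measurable sets is λ-measurable.

Let $E_1, E_2, E_3, \ldots$ be an infinite sequence of pairwise disjoint λ-measurable subsets of $X$. We shall prove that the union of these sets is λ-measurable. Let $A$ be a subset of $X$. Now $\bigcup_{k=1}^{m} E_k$ is a λ-measurable set for each positive integer $m$, because any finite union of λ-measurable sets is λ-measurable, and therefore
$$\lambda(A) = \lambda\left( A \cap \bigcup_{k=1}^{m} E_k \right) + \lambda\left( A \setminus \bigcup_{k=1}^{m} E_k \right)$$
for all positive integers $m$. Moreover it follows from Lemma 7.9 that
$$\lambda\left( A \cap \bigcup_{k=1}^{m} E_k \right) = \sum_{k=1}^{m} \lambda(A \cap E_k).$$
Also
$$A \setminus \bigcup_{k=1}^{+\infty} E_k \subset A \setminus \bigcup_{k=1}^{m} E_k,$$
and therefore
$$\lambda\left( A \setminus \bigcup_{k=1}^{m} E_k \right) \ge \lambda\left( A \setminus \bigcup_{k=1}^{+\infty} E_k \right).$$
It follows that
$$\lambda(A) \ge \sum_{k=1}^{m} \lambda(A \cap E_k) + \lambda\left( A \setminus \bigcup_{k=1}^{+\infty} E_k \right),$$
and therefore
$$\lambda(A) \ge \lim_{m \to +\infty} \sum_{k=1}^{m} \lambda(A \cap E_k) + \lambda\left( A \setminus \bigcup_{k=1}^{+\infty} E_k \right) = \sum_{k=1}^{+\infty} \lambda(A \cap E_k) + \lambda\left( A \setminus \bigcup_{k=1}^{+\infty} E_k \right).$$
However it follows from the definition of outer measures that
$$\lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right) = \lambda\left( \bigcup_{k=1}^{+\infty} (A \cap E_k) \right) \le \sum_{k=1}^{+\infty} \lambda(A \cap E_k).$$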
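Therefore
$$\lambda(A) \ge \lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right) + \lambda\left( A \setminus \bigcup_{k=1}^{+\infty} E_k \right).$$
But the set $A$ is the union of the sets $A \cap \bigcup_{k=1}^{+\infty} E_k$ and $A \setminus \bigcup_{k=1}^{+\infty} E_k$, and therefore
$$\lambda(A) \le \lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right) + \lambda\left( A \setminus \bigcup_{k=1}^{+\infty} E_k \right).$$
We conclude therefore that
$$\lambda(A) = \lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right) + \lambda\left( A \setminus \bigcup_{k=1}^{+\infty} E_k \right)$$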
for all subsets $A$ of $X$. We conclude from this that the union of any pairwise disjoint sequence of λ-measurable subsets of $X$ is itself λ-measurable.

Now let $E_1, E_2, E_3, \ldots$ be a countable sequence of (not necessarily pairwise disjoint) λ-measurable sets. Then $\bigcup_{k=1}^{+\infty} E_k = \bigcup_{k=1}^{+\infty} F_k$, where $F_1 = E_1$, and $F_k = E_k \setminus \bigcup_{j=1}^{k-1} E_j$ for all integers $k$ satisfying $k > 1$. Now we have proved that any finite union of λ-measurable sets is λ-measurable, and any difference of λ-measurable sets is λ-measurable. It follows that the sets $F_1, F_2, F_3, \ldots$ are all λ-measurable. These sets are also pairwise disjoint. We conclude that the union of the sets $F_1, F_2, F_3, \ldots$ is λ-measurable, and therefore the union of the sets $E_1, E_2, E_3, \ldots$ is λ-measurable.

We have now shown that the union of any finite collection of λ-measurable sets is λ-measurable, and the union of any infinite sequence of λ-measurable sets is λ-measurable. We conclude that the union of any countable collection of λ-measurable sets is λ-measurable, as required.

Corollary 7.12 Let λ be an outer measure on a set $X$. Then the intersection of any countable collection of λ-measurable subsets of $X$ is λ-measurable.

Proof Let $\mathcal{C}$ be a countable collection of λ-measurable subsets of $X$. Then $X \setminus \bigcap_{E \in \mathcal{C}} E = \bigcup_{E \in \mathcal{C}} (X \setminus E)$ (i.e., the complement of the intersection of the sets in the collection is the union of the complements of those sets). Now $X \setminus E$ is λ-measurable for every $E \in \mathcal{C}$. Therefore the complement $X \setminus \bigcap_{E \in \mathcal{C}} E$ of $\bigcap_{E \in \mathcal{C}} E$ is a countable union of λ-measurable sets, and is thus itself λ-measurable. It follows that the intersection $\bigcap_{E \in \mathcal{C}} E$ of the sets in the collection is λ-measurable, as required.
Proposition 7.13 Let λ be an outer measure on a set $X$, let $A$ be a subset of $X$, and let $\mathcal{C}$ be a countable collection of pairwise disjoint λ-measurable sets. Then
$$\lambda\left( A \cap \bigcup_{E \in \mathcal{C}} E \right) = \sum_{E \in \mathcal{C}} \lambda(A \cap E).$$

Proof It follows from Lemma 7.9 that the required identity holds for any finite collection of pairwise disjoint λ-measurable sets.

Let $E_1, E_2, E_3, \ldots$ be an infinite sequence of pairwise disjoint λ-measurable subsets of $X$. Then
$$\sum_{k=1}^{m} \lambda(A \cap E_k) = \lambda\left( A \cap \bigcup_{k=1}^{m} E_k \right) \le \lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right)$$
for all positive integers $m$. It follows that
$$\sum_{k=1}^{+\infty} \lambda(A \cap E_k) = \lim_{m \to +\infty} \sum_{k=1}^{m} \lambda(A \cap E_k) \le \lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right).$$
But the definition of outer measures ensures that
$$\lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right) = \lambda\left( \bigcup_{k=1}^{+\infty} (A \cap E_k) \right) \le \sum_{k=1}^{+\infty} \lambda(A \cap E_k).$$
We conclude therefore that
$$\lambda\left( A \cap \bigcup_{k=1}^{+\infty} E_k \right) = \sum_{k=1}^{+\infty} \lambda(A \cap E_k)$$
for any infinite sequence $E_1, E_2, E_3, \ldots$ of pairwise disjoint λ-measurable subsets of $X$. Thus the required identity holds for any countable collection of pairwise disjoint λ-measurable subsets of $X$, as required.
7.4
Measure Spaces
Definition Let $X$ be a set. A collection $\mathcal{A}$ of subsets of $X$ is said to be a σ-algebra (or sigma-algebra) of subsets of $X$ if it has the following properties:
(i) the empty set ∅ is a member of $\mathcal{A}$;
(ii) the complement $X \setminus E$ of any member $E$ of $\mathcal{A}$ is itself a member of $\mathcal{A}$;
(iii) the union of any countable collection of members of $\mathcal{A}$ is itself a member of $\mathcal{A}$.

Lemma 7.14 Let $X$ be a set, and let $\mathcal{A}$ be a σ-algebra of subsets of $X$. Then the intersection of any countable collection of members of the σ-algebra $\mathcal{A}$ is itself a member of $\mathcal{A}$.

Proof Let $\mathcal{C}$ be a countable collection of sets belonging to $\mathcal{A}$. Then $X \setminus E \in \mathcal{A}$ for all $E \in \mathcal{C}$, and therefore $\bigcup_{E \in \mathcal{C}} (X \setminus E) \in \mathcal{A}$. But $\bigcup_{E \in \mathcal{C}} (X \setminus E) = X \setminus \bigcap_{E \in \mathcal{C}} E$. It follows that the complement of the intersection $\bigcap_{E \in \mathcal{C}} E$ of the sets in the collection $\mathcal{C}$ is itself a member of $\mathcal{A}$, and therefore the intersection $\bigcap_{E \in \mathcal{C}} E$ of those sets is a member of the σ-algebra $\mathcal{A}$, as required.

Let $X$ be a set, and let $\mathcal{C}$ be a collection of subsets of $X$. The collection of all subsets of $X$ is a σ-algebra. Also the intersection of any collection of σ-algebras of subsets of $X$ is itself a σ-algebra. Let $\mathcal{A}$ be the intersection of all σ-algebras $\mathcal{B}$ of subsets of $X$ that have the property that $\mathcal{C} \subset \mathcal{B}$. Then $\mathcal{A}$ is a σ-algebra, and $\mathcal{C} \subset \mathcal{A}$. Moreover if $\mathcal{B}$ is a σ-algebra of subsets of $X$, and if $\mathcal{C} \subset \mathcal{B}$, then $\mathcal{A} \subset \mathcal{B}$. The σ-algebra $\mathcal{A}$ may therefore be regarded as the smallest σ-algebra of subsets of $X$ for which $\mathcal{C} \subset \mathcal{A}$. We shall refer to this σ-algebra $\mathcal{A}$ as the σ-algebra of subsets of $X$ generated by $\mathcal{C}$. We see therefore that any collection of subsets of a set $X$ generates a σ-algebra of subsets of $X$ which is the smallest σ-algebra of subsets of $X$ that contains the given collection of subsets.

Definition Let $X$ be a set, and let $\mathcal{A}$ be a σ-algebra of subsets of $X$. A measure on $\mathcal{A}$ is a function $\mu\colon \mathcal{A} \to [0, +\infty]$, taking values in the set $[0, +\infty]$ of non-negative extended real numbers, which has the property that
$$\mu\left( \bigcup_{E \in \mathcal{C}} E \right) = \sum_{E \in \mathcal{C}} \mu(E)$$
for any countable collection $\mathcal{C}$ of pairwise disjoint members of the σ-algebra $\mathcal{A}$.

Definition A measure space $(X, \mathcal{A}, \mu)$ consists of a set $X$, a σ-algebra $\mathcal{A}$ of subsets of $X$, and a measure $\mu\colon \mathcal{A} \to [0, +\infty]$ defined on this σ-algebra $\mathcal{A}$. A subset $E$ of a measure space $(X, \mathcal{A}, \mu)$ is said to be measurable (or µ-measurable) if it belongs to the σ-algebra $\mathcal{A}$.

Theorem 7.15 Let λ be an outer measure on a set $X$. Then the collection $\mathcal{A}_\lambda$ of all λ-measurable subsets of $X$ is a σ-algebra. The members of this σ-algebra are those subsets $E$ of $X$ with the property that $\lambda(A) = \lambda(A \cap E) + \lambda(A \setminus E)$ for every subset $A$ of $X$. Moreover the restriction of the outer measure λ to the λ-measurable sets defines a measure µ on the σ-algebra $\mathcal{A}_\lambda$. Thus $(X, \mathcal{A}_\lambda, \mu)$ is a measure space.
Proof Immediate from Propositions 7.10, 7.11 and 7.13. Definition A measure space (X, A, µ) is said to be complete if, given any measurable subset E of X satisfying µ(E) = 0, and given any subset F of E, the subset F is also measurable. The measure µ on A is then said to be complete. Lemma 7.16 Let λ be an outer measure on a set X, let A be the σ-algebra consisting of the λ-measurable subsets of X, and let µ be the measure on A obtained by restricting the outer measure λ to the members of A. Then (X, A, µ) is a complete measure space. Proof Let E be a measurable set in X satisfying µ(E) = 0, let F be a subset of E, and let A be a subset of X. Then A∩F ⊂ A∩E and A\E ⊂ A\F ⊂ A, and therefore 0 ≤ λ(A ∩ F ) ≤ λ(A ∩ E) and λ(A \ E) ≤ λ(A \ F ) ≤ λ(A). Now it follows from the definition of measurable sets in X that λ(A) = λ(A ∩ E) + λ(A \ E). Moreover 0 ≤ λ(A ∩ E) ≤ λ(E) = µ(E) = 0. It follows that λ(A ∩ E) = 0 and λ(A \ E) = λ(A). The inequalities above then ensure that λ(A∩F ) = 0 and λ(A\F ) = λ(A). But then λ(A) = λ(A∩F )+λ(A\F ), and thus F is λ-measurable, as required.
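On a finite set the measurability criterion λ(A) = λ(A ∩ E) + λ(A \ E) of Theorem 7.15 can be tested by brute force over all subsets A. The sketch below does this for two illustrative outer measures on X = {1, 2, 3}; the functions `coarse` and `counting` and the helper names are assumptions made for this example and do not appear in the notes.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite set, returned as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def measurable_sets(universe, outer):
    """Sets E with outer(A) == outer(A & E) + outer(A - E) for every subset A."""
    all_sets = subsets(universe)
    return {E for E in all_sets
            if all(outer(A) == outer(A & E) + outer(A - E) for A in all_sets)}


def is_sigma_algebra(universe, family):
    """Finite case: contains the empty set, closed under complements and unions."""
    X = frozenset(universe)
    return (frozenset() in family
            and all(X - E in family for E in family)
            and all(E | F in family for E in family for F in family))


X = {1, 2, 3}


def coarse(A):
    """Hypothetical outer measure: 0 on the empty set, 2 on X, 1 otherwise."""
    return 0 if not A else (2 if len(A) == 3 else 1)


def counting(A):
    """Counting measure, viewed as an outer measure on all subsets."""
    return len(A)


coarse_measurable = measurable_sets(X, coarse)
counting_measurable = measurable_sets(X, counting)
```

For `coarse` only the empty set and X pass the test (for instance E = {1} fails with A = {1, 2}, since 1 ≠ 1 + 1), whereas for `counting` every subset is measurable. In both cases the resulting family is a σ-algebra, in line with the theorem.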
7.5
Lebesgue Measure on Euclidean Spaces
We are now in a position to give the definition of Lebesgue measure on n-dimensional Euclidean space $\mathbb{R}^n$. We have already defined an outer measure $\mu^*$ on $\mathbb{R}^n$ known as Lebesgue outer measure. We defined a brick in $\mathbb{R}^n$ to be a subset of $\mathbb{R}^n$ that is a Cartesian product of n bounded intervals. The product of the lengths of those intervals is the content of the brick. Then, given any subset $E$ of $\mathbb{R}^n$, we defined the Lebesgue outer measure $\mu^*(E)$ of the set $E$ to be the infimum of the quantities $\sum_{B \in \mathcal{C}} m(B)$, where the infimum is taken over all countable collections $\mathcal{C}$ of bricks in $\mathbb{R}^n$ that cover the set $E$, and where $m(B)$ denotes the content of a brick $B$ in such a collection. Thus
$$\sum_{B \in \mathcal{C}} m(B) \ge \mu^*(E)$$
for every countable collection $\mathcal{C}$ of bricks in $\mathbb{R}^n$ that covers $E$; and, moreover, given any positive real number $\varepsilon$, there exists a countable collection $\mathcal{C}$ of bricks in $\mathbb{R}^n$ covering $E$ for which
$$\mu^*(E) \le \sum_{B \in \mathcal{C}} m(B) \le \mu^*(E) + \varepsilon.$$
These properties characterize the Lebesgue outer measure $\mu^*(E)$ of the set $E$.

We say that a subset $E$ of $\mathbb{R}^n$ is Lebesgue-measurable if and only if it is $\mu^*$-measurable, where $\mu^*$ denotes Lebesgue outer measure on $\mathbb{R}^n$. Thus a subset $E$ of $\mathbb{R}^n$ is Lebesgue-measurable if and only if $\mu^*(A) = \mu^*(A \cap E) + \mu^*(A \setminus E)$ for all subsets $A$ of $\mathbb{R}^n$. The collection $\mathcal{L}_n$ of all Lebesgue-measurable sets is a σ-algebra of subsets of $\mathbb{R}^n$, and therefore the difference of any two Lebesgue-measurable subsets of $\mathbb{R}^n$ is Lebesgue-measurable, and any countable union or intersection of Lebesgue-measurable sets is Lebesgue-measurable. The Lebesgue measure $\mu(E)$ of a Lebesgue-measurable subset $E$ of $\mathbb{R}^n$ is defined to be the Lebesgue outer measure $\mu^*(E)$ of that set. Thus Lebesgue measure µ is the restriction of Lebesgue outer measure $\mu^*$ to the σ-algebra $\mathcal{L}_n$ of Lebesgue-measurable subsets of $\mathbb{R}^n$. It follows from Lemma 7.16 that Lebesgue measure is a complete measure on $\mathbb{R}^n$.

Remark The Lebesgue measure $\mu(E)$ of a Lebesgue-measurable subset $E$ of $\mathbb{R}^2$ may be regarded as the area of that set. It is not possible to assign an area to every subset of $\mathbb{R}^2$ in such a way that the areas assigned to such subsets have all the properties that one would expect from a well-defined notion of area. One might at first sight expect that Lebesgue outer measure would provide a natural definition of area, applicable to all subsets of the plane, that would have the properties that one would expect of a well-defined notion of area. One would expect in particular that the area of a disjoint union of two subsets of the plane would be the sum of the areas of those sets. However it is possible to construct examples of disjoint subsets $E$ and $F$ in the plane which interpenetrate one another to such an extent as to ensure that $\mu^*(E \cup F) < \mu^*(E) + \mu^*(F)$, where $\mu^*$ denotes Lebesgue outer measure on $\mathbb{R}^2$.
The σ-algebra $\mathcal{L}_2$ consisting of the Lebesgue-measurable subsets of the plane $\mathbb{R}^2$ is in fact the largest collection of subsets of the plane for which the sets in the collection have a well-defined area; the Lebesgue measure of a Lebesgue-measurable subset of the plane can be regarded as the area of that set. Similarly the σ-algebra $\mathcal{L}_3$ of Lebesgue-measurable subsets of three-dimensional Euclidean space $\mathbb{R}^3$ is the largest collection of subsets of $\mathbb{R}^3$ for which the sets in the collection have a well-defined volume.

Proposition 7.17 Every open set in $\mathbb{R}^n$ is Lebesgue-measurable.

Proof Let $\mathcal{W}$ be the collection of all open bricks in $\mathbb{R}^n$ that are Cartesian products of intervals whose endpoints are rational numbers. Now the set $\mathcal{I}$ of all open intervals in $\mathbb{R}$ whose endpoints are rational numbers is a countable set, as the function that sends such an interval to its endpoints defines an injective function from $\mathcal{I}$ to the countable set $\mathbb{Q} \times \mathbb{Q}$. Moreover there is a bijection from the countable set $\mathcal{I}^n$ to $\mathcal{W}$ that sends each ordered n-tuple $(I_1, I_2, \ldots, I_n)$ of open intervals to the open brick $I_1 \times I_2 \times \cdots \times I_n$. It follows that the collection $\mathcal{W}$ is countable.

Let $V$ be an open set in $\mathbb{R}^n$, and let $v$ be a point of $V$. Then there exists some positive real number δ such that $B(v, \delta) \subset V$, where $B(v, \delta)$ denotes the open ball of radius δ centred on $v$. Moreover there exist open bricks $W$ belonging to $\mathcal{W}$ for which $v \in W$ and $W \subset B(v, \delta)$. It follows that the open set $V$ is the union of the countable collection $\{W \in \mathcal{W} : W \subset V\}$ of open bricks. Now each open brick is a Lebesgue-measurable set, and any countable union of Lebesgue-measurable sets is itself a Lebesgue-measurable set. Therefore the open set $V$ is a Lebesgue-measurable set, as required.

Corollary 7.18 Every closed set in $\mathbb{R}^n$ is Lebesgue-measurable.

Proof This follows immediately from Proposition 7.17, since the complement of any Lebesgue-measurable set is itself a Lebesgue-measurable set.

Definition A subset of $\mathbb{R}^n$ is said to be a Borel set if it belongs to the σ-algebra generated by the collection of open sets in $\mathbb{R}^n$.

All open sets and closed sets in $\mathbb{R}^n$ are Borel sets. The collection of all Borel sets is a σ-algebra in $\mathbb{R}^n$ and is the smallest such σ-algebra containing all open subsets of $\mathbb{R}^n$.

Definition A measure defined on a σ-algebra $\mathcal{A}$ of subsets of $\mathbb{R}^n$ is said to be a Borel measure if the σ-algebra $\mathcal{A}$ contains all the open sets in $\mathbb{R}^n$.

Corollary 7.19 Lebesgue measure on $\mathbb{R}^n$ is a Borel measure, and thus every Borel set in $\mathbb{R}^n$ is Lebesgue-measurable.

Remark The definitions of Borel sets and Borel measures generalize in the obvious fashion to arbitrary topological spaces. The collection of Borel sets in a topological space $X$ is the σ-algebra generated by the open subsets of $X$. A measure defined on a σ-algebra of subsets of $X$ is said to be a Borel measure if every Borel set is measurable.
7.6
Basic Properties of Measures
Let $(X, \mathcal{A}, \mu)$ be a measure space. Then the measure µ is defined on the σ-algebra $\mathcal{A}$ of measurable subsets of $X$, and takes values in the set $[0, +\infty]$, where $[0, +\infty] = [0, +\infty) \cup \{+\infty\}$. Thus $\mu(E)$ is defined for each measurable subset $E$ of $X$, and is either a non-negative real number, or else has the value $+\infty$. The measure µ is by definition countably additive, so that
$$\mu\left( \bigcup_{E \in \mathcal{C}} E \right) = \sum_{E \in \mathcal{C}} \mu(E)$$
for every countable collection $\mathcal{C}$ of pairwise disjoint measurable subsets of $X$. In particular µ is finitely additive, so that if $E_1, E_2, \ldots, E_r$ are measurable subsets of $X$ that are pairwise disjoint, then
$$\mu(E_1 \cup E_2 \cup \cdots \cup E_r) = \mu(E_1) + \mu(E_2) + \cdots + \mu(E_r).$$
Also
$$\mu\left( \bigcup_{j=1}^{+\infty} E_j \right) = \sum_{j=1}^{+\infty} \mu(E_j)$$
for any infinite sequence $E_1, E_2, E_3, \ldots$ of pairwise disjoint measurable subsets of $X$.

Let $E$ and $F$ be measurable subsets of $X$. Then $E = (E \cap F) \cup (E \setminus F)$, and the sets $E \cap F$ and $E \setminus F$ are measurable and disjoint. It therefore follows from the finite additivity of the measure µ that $\mu(E) = \mu(E \cap F) + \mu(E \setminus F)$. Also $E \cup F$ is the disjoint union of $E$ and $F \setminus E$, and therefore
$$\mu(E \cup F) = \mu(E) + \mu(F \setminus E) = \mu(E \cap F) + \mu(E \setminus F) + \mu(F \setminus E).$$
It follows that
$$\mu(E \cup F) + \mu(E \cap F) = (\mu(E \cap F) + \mu(E \setminus F)) + (\mu(E \cap F) + \mu(F \setminus E)) = \mu(E) + \mu(F).$$

Now let $E$ and $F$ be measurable subsets of $X$ that satisfy $F \subset E$. Then $\mu(E) = \mu(F) + \mu(E \setminus F)$, and $\mu(E \setminus F) \ge 0$. It follows that $\mu(F) \le \mu(E)$. Moreover $\mu(E \setminus F) = \mu(E) - \mu(F)$, provided that $\mu(E) < +\infty$.

Lemma 7.20 Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $E_1, E_2, E_3, \ldots$ be an infinite sequence of measurable subsets of $X$. Suppose that $E_j \subset E_{j+1}$ for all positive integers $j$. Then
$$\mu\left( \bigcup_{j=1}^{+\infty} E_j \right) = \lim_{j \to +\infty} \mu(E_j).$$
Proof Let $E = \bigcup_{j=1}^{+\infty} E_j$, let $F_1 = E_1$, and let $F_j = E_j \setminus \bigcup_{k=1}^{j-1} E_k$ for all integers $j$ satisfying $j > 1$. Then the sets $F_1, F_2, F_3, \ldots$ are pairwise disjoint, the set $E_j$ is the disjoint union of the sets $F_k$ for which $1 \le k \le j$, and the set $E$ is the disjoint union of all of the sets $F_k$. It therefore follows from the countable (and finite) additivity of the measure µ that
$$\mu(E) = \sum_{k=1}^{+\infty} \mu(F_k), \qquad \mu(E_j) = \sum_{k=1}^{j} \mu(F_k).$$
But then
$$\mu(E) = \sum_{k=1}^{+\infty} \mu(F_k) = \lim_{j \to +\infty} \sum_{k=1}^{j} \mu(F_k) = \lim_{j \to +\infty} \mu(E_j),$$
as required.

Lemma 7.21 Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $E_1, E_2, E_3, \ldots$ be an infinite sequence of measurable subsets of $X$. Suppose that $E_{j+1} \subset E_j$ for all positive integers $j$, and that $\mu(E_1) < +\infty$. Then
$$\mu\left( \bigcap_{j=1}^{+\infty} E_j \right) = \lim_{j \to +\infty} \mu(E_j).$$
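Proof Let $G_j = E_1 \setminus E_j$ for all positive integers $j$, let $E = \bigcap_{j=1}^{+\infty} E_j$, and let $G = \bigcup_{j=1}^{+\infty} G_j$. Then $G_j \subset G_{j+1}$ for all positive integers $j$, and it therefore follows from Lemma 7.20 that $\mu(G) = \lim_{j \to +\infty} \mu(G_j)$. Now $E_j = E_1 \setminus G_j$ for all positive integers $j$, and $\mu(E_1) < +\infty$. It follows that $\mu(E_j) = \mu(E_1) - \mu(G_j)$ for all positive integers $j$. Also $E = E_1 \setminus G$. Therefore
$$\mu(E) = \mu(E_1) - \mu(G) = \mu(E_1) - \lim_{j \to +\infty} \mu(G_j) = \lim_{j \to +\infty} \mu(E_j),$$
as required.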
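The hypothesis $\mu(E_1) < +\infty$ in Lemma 7.21 cannot simply be dropped. A standard illustration, offered here as a worked example that is not part of the original notes, takes Lebesgue measure µ on $\mathbb{R}$ and the closed half-lines $E_j = [j, +\infty)$:

```latex
% Decreasing sequence of closed (hence Lebesgue-measurable) sets
% with infinite measure and empty intersection.
\[
  E_j = [j, +\infty), \qquad E_{j+1} \subset E_j, \qquad
  \mu(E_j) \ge \mu([j, j+N]) = N \ \text{ for every } N \in \mathbb{N},
\]
\[
  \text{so that } \mu(E_j) = +\infty \text{ for all } j,
  \qquad\text{whereas}\qquad
  \mu\Bigl( \bigcap_{j=1}^{+\infty} E_j \Bigr) = \mu(\emptyset) = 0
  \ne \lim_{j \to +\infty} \mu(E_j) = +\infty .
\]
```

The step $\mu(E_j) = \mu(E_1) - \mu(G_j)$ in the proof is exactly where finiteness is used: when $\mu(E_1) = +\infty$ the subtraction is not defined.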
Course 221: Hilary Term 2007 Section 8: The Lebesgue Integral David R. Wilkins Copyright © David R. Wilkins 1997–2007

Contents
8 The Lebesgue Integral
8.1 Measurable Functions
8.2 Integrals of Measurable Simple Functions
8.3 Integrals of Non-Negative Measurable Functions
8.4 Integration of Functions with Positive and Negative Values
8.5 Lebesgue’s Dominated Convergence Theorem
8.6 Comparison with the Riemann integral
8
The Lebesgue Integral
8.1
Measurable Functions
Definition Let $X$ be a set, let $\mathcal{A}$ be a σ-algebra of subsets of $X$, and let $f\colon X \to [-\infty, +\infty]$ be a function on $X$ with values in the set $[-\infty, +\infty]$ of extended real numbers. The function $f$ is said to be measurable with respect to the σ-algebra $\mathcal{A}$ if $\{x \in X : f(x) < c\} \in \mathcal{A}$ for all real numbers $c$.

Definition Let $(X, \mathcal{A}, \mu)$ be a measure space. A function $f\colon X \to [-\infty, +\infty]$ defined on $X$ is said to be measurable if it is measurable with respect to the σ-algebra $\mathcal{A}$ of measurable subsets of $X$.

It follows from these definitions that a function $f\colon X \to [-\infty, +\infty]$ defined on a measure space $(X, \mathcal{A}, \mu)$ is measurable if and only if $\{x \in X : f(x) < c\}$ is a measurable set for all real numbers $c$.

Proposition 8.1 Let $X$ be a set, let $\mathcal{A}$ be a σ-algebra of subsets of $X$, let $f\colon X \to [-\infty, +\infty]$ be a function on $X$, with values in the set $[-\infty, +\infty]$ of extended real numbers, which is measurable with respect to the σ-algebra $\mathcal{A}$, and let $a$, $b$ and $c$ be real numbers, where $a \le b$. Then the following sets also belong to the σ-algebra $\mathcal{A}$:
(i) $\{x \in X : f(x) \ge c\}$;
(ii) $\{x \in X : f(x) \le c\}$;
(iii) $\{x \in X : f(x) > c\}$;
(iv) $\{x \in X : a \le f(x) \le b\}$;
(v) $\{x \in X : a < f(x) < b\}$;
(vi) $\{x \in X : a \le f(x) < b\}$;
(vii) $\{x \in X : a < f(x) \le b\}$;
(viii) $\{x \in X : f(x) = c\}$;
(ix) $\{x \in X : f(x) = -\infty\}$;
(x) $\{x \in X : f(x) = +\infty\}$;
(xi) $\{x \in X : f(x) < +\infty\}$;
(xii) $\{x \in X : f(x) > -\infty\}$;
(xiii) $\{x \in X : f(x) \in \mathbb{R}\}$.

Proof The set $\{x \in X : f(x) \ge c\}$ is the complement of a set $\{x \in X : f(x) < c\}$ belonging to the σ-algebra $\mathcal{A}$, and must therefore itself belong to this σ-algebra. This proves (i).

The set $\{x \in X : f(x) \le c\}$ may be represented as a countable intersection
$$\bigcap_{n=1}^{+\infty} \left\{ x \in X : f(x) < c + \frac{1}{n} \right\}$$
of sets that are of the form $\{x \in X : f(x) < c + n^{-1}\}$ for some natural number $n$. These sets belong to the σ-algebra $\mathcal{A}$, and any countable intersection of sets belonging to $\mathcal{A}$ must itself belong to this σ-algebra. Therefore $\{x \in X : f(x) \le c\}$ belongs to the σ-algebra. This proves (ii).

The set $\{x \in X : f(x) > c\}$ is the complement of a set $\{x \in X : f(x) \le c\}$ which belongs to the σ-algebra $\mathcal{A}$, and must therefore itself belong to $\mathcal{A}$. This proves (iii).

The set $\{x \in X : a \le f(x) \le b\}$ is the intersection of sets $\{x \in X : f(x) \ge a\}$ and $\{x \in X : f(x) \le b\}$ that belong to the σ-algebra $\mathcal{A}$. It follows that $\{x \in X : a \le f(x) \le b\}$ must itself belong to $\mathcal{A}$. Similarly $\{x \in X : a < f(x) < b\}$ is the intersection of sets $\{x \in X : f(x) > a\}$ and $\{x \in X : f(x) < b\}$, $\{x \in X : a \le f(x) < b\}$ is the intersection of sets $\{x \in X : f(x) \ge a\}$ and $\{x \in X : f(x) < b\}$, and $\{x \in X : a < f(x) \le b\}$ is the intersection of sets $\{x \in X : f(x) > a\}$ and $\{x \in X : f(x) \le b\}$, and therefore $\{x \in X : a < f(x) < b\}$, $\{x \in X : a \le f(x) < b\}$ and $\{x \in X : a < f(x) \le b\}$ belong to $\mathcal{A}$. This proves (iv), (v), (vi) and (vii). Moreover (viii) is a special case of (iv).

The set $\{x \in X : f(x) = -\infty\}$ may be represented as a countable intersection
$$\bigcap_{n=1}^{+\infty} \{x \in X : f(x) < -n\}$$
of sets belonging to $\mathcal{A}$, and must therefore itself belong to $\mathcal{A}$. This proves (ix). Similarly the set $\{x \in X : f(x) = +\infty\}$ may be represented as a countable intersection
$$\bigcap_{n=1}^{+\infty} \{x \in X : f(x) \ge n\}$$
of sets belonging to $\mathcal{A}$, and must therefore itself belong to $\mathcal{A}$. This proves (x).
The set $\{x \in X : f(x) < +\infty\}$ is the complement of the set specified in (x), and must therefore belong to $\mathcal{A}$. Similarly the set $\{x \in X : f(x) > -\infty\}$ is the complement of the set specified in (ix), and must therefore belong to $\mathcal{A}$. This proves (xi) and (xii). Finally we note that $\{x \in X : f(x) \in \mathbb{R}\}$ is the intersection of the sets $\{x \in X : f(x) < +\infty\}$ and $\{x \in X : f(x) > -\infty\}$ specified in (xi) and (xii), and must therefore belong to $\mathcal{A}$, as required.

Corollary 8.2 Let $X$ be a set, let $\mathcal{A}$ be a σ-algebra of subsets of $X$, let $f\colon X \to [-\infty, +\infty]$ be a function on $X$, with values in the set $[-\infty, +\infty]$ of extended real numbers, which is measurable with respect to the σ-algebra $\mathcal{A}$, and let $m$ be a real number. Then $mf$ is measurable with respect to $\mathcal{A}$.

Proof The result is immediate when $m = 0$. Let $c$ be a real number. If $m > 0$ then
$$\{x \in X : mf(x) < c\} = \{x \in X : f(x) < c/m\},$$
and if $m < 0$ then
$$\{x \in X : mf(x) < c\} = \{x \in X : f(x) > c/m\}.$$
It then follows immediately from Proposition 8.1 and the definition of measurable functions that $\{x \in X : mf(x) < c\} \in \mathcal{A}$. Therefore $mf$ is measurable, as required.

Proposition 8.3 Let $X$ be a set, let $\mathcal{A}$ be a σ-algebra of subsets of $X$, let $f\colon X \to [-\infty, +\infty]$ and $g\colon X \to [-\infty, +\infty]$ be functions on $X$, with values in the set $[-\infty, +\infty]$ of extended real numbers, which are measurable with respect to the σ-algebra $\mathcal{A}$. Suppose that $f(x) + g(x)$ is defined for all $x \in X$. Then $f + g$ is measurable with respect to $\mathcal{A}$.

Proof Let $c$ be a real number, and let $x$ be a point of $X$. Suppose that $f(x) + g(x) < c$. Then there exists some real number δ satisfying $\delta > 0$ for which $f(x) + g(x) < c - \delta$, and there exists a rational number $q$ satisfying $f(x) < q < f(x) + \delta$. Then $f(x) > q - \delta$, and $g(x) < (c - \delta) - (q - \delta) = c - q$. We conclude therefore that, given any point $x$ of $X$ for which $f(x) + g(x) < c$, there exists some rational number $q$ such that $f(x) < q$ and $g(x) < c - q$. It follows from this that
$$\{x \in X : f(x) + g(x) < c\} = \bigcup_{q \in \mathbb{Q}} \{x \in X : f(x) < q \text{ and } g(x) < c - q\}.$$
Now {x ∈ X : f (x) < q and g(x) < c − q} is the intersection of the sets {x ∈ X : f (x) < q} and {x ∈ X : g(x) < c−q}. Also {x ∈ X : f (x) < q} ∈ A and {x ∈ X : g(x) < c − q} ∈ A, because the functions f and g are measurable. It follows that {x ∈ X : f (x) < q and g(x) < c − q} ∈ A for each rational number q. Also the set Q of rational numbers is countable. We conclude therefore that the set {x ∈ X : f (x) + g(x) < c} can be represented as a countable union of sets belonging to the σ-algebra A, and must therefore itself belong to A. We conclude that the function f + g is measurable, as required. We recall that the sum of two extended real numbers is not defined when one has the value +∞ and the other has the value −∞. Corollary 8.4 Let X be a set, let A be a σ-algebra of subsets of X, let f : X → [−∞, ∞] and g: X → [−∞, ∞] be functions on X, with values in the set [−∞, ∞] of extended real numbers, which are measurable with respect to the σ-algebra A. Then {x ∈ X : f (x) + g(x) is defined and f (x) + g(x) < c} ∈ A for all real numbers c. Proof Let X0 = {x ∈ X : f (x) + g(x) is defined}. Then X \ X0 is the union of the sets {x ∈ X : f (x) = +∞} ∩ {x ∈ X : g(x) = −∞} and {x ∈ X : f (x) = −∞} ∩ {x ∈ X : g(x) = +∞}, and it follows from Proposition 8.1 that both these sets belong to A. It follows that X \ X0 ∈ A, and therefore X0 ∈ A. Now let A0 be the collection of subsets of X0 consisting of all such subsets that are of the form X0 ∩ E for some E ∈ A. It is a straightforward exercise to verify that A0 is a σ-algebra of subsets of X0 . The restrictions of the functions f and g to X0 are measurable with respect to the σ-algebra A0 . It follows from Proposition 8.3 that the restriction of f + g to X0 is measurable with respect to A0 , and therefore {x ∈ X0 : f (x) + g(x) < c} ∈ A0 for all real numbers c. But X0 ∈ A, and therefore every set belonging to A0 is the intersection of two sets belonging to A, and must therefore itself belong to A. Thus A0 ⊂ A. 
We conclude therefore that {x ∈ X : f (x) + g(x) is defined and f (x) + g(x) < c} ∈ A for all real numbers c, as required.
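The key step in the proof of Proposition 8.3 is the choice of a rational number q with f(x) < q and g(x) < c − q whenever f(x) + g(x) < c. The sketch below makes that choice explicit for finite values supplied as exact fractions; the function name `rational_witness` and the particular choice δ = (c − f(x) − g(x))/2 are assumptions made for illustration, not the notes' own construction.

```python
from fractions import Fraction
from math import floor


def rational_witness(fx, gx, c):
    """Return a rational q with fx < q and gx < c - q, assuming fx + gx < c."""
    if not fx + gx < c:
        raise ValueError("requires f(x) + g(x) < c")
    delta = (c - fx - gx) / 2            # then fx + gx < c - delta
    n = floor(1 / delta) + 1             # n > 1/delta, so 1/n < delta
    # The least multiple of 1/n strictly above fx lies within 1/n of fx,
    # hence fx < q < fx + delta.
    return Fraction(floor(fx * n) + 1, n)


fx, gx, c = Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)
q = rational_witness(fx, gx, c)
```

With these sample values `q` satisfies both `fx < q` and `gx < c - q`, so the point x lies in the set indexed by q in the countable union over the rationals.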
Corollary 8.5 Let $X$ be a set, let $\mathcal{A}$ be a σ-algebra of subsets of $X$, let $f\colon X \to [-\infty, +\infty]$ and $g\colon X \to [-\infty, +\infty]$ be functions on $X$, with values in the set $[-\infty, +\infty]$ of extended real numbers, which are measurable with respect to the σ-algebra $\mathcal{A}$. Then $f \cdot g$ is measurable with respect to $\mathcal{A}$, where $(f \cdot g)(x) = f(x)g(x)$ for all $x \in X$.

Proof Let $X_0 = \{x \in X : f(x) \in \mathbb{R} \text{ and } g(x) \in \mathbb{R}\}$. It follows from a straightforward application of Proposition 8.1 that $X_0 \in \mathcal{A}$. If we then define $\mathcal{A}_0 = \{X_0 \cap E : E \in \mathcal{A}\}$, then $\mathcal{A}_0$ is a σ-algebra of subsets of $X_0$, and $\mathcal{A}_0 \subset \mathcal{A}$. Now if $c > 0$ then
$$\{x \in X_0 : f(x)^2 < c\} = X_0 \cap \{x \in X : -\sqrt{c} < f(x) < \sqrt{c}\}.$$
It follows from Proposition 8.1 that $\{x \in X_0 : f(x)^2 < c\} \in \mathcal{A}_0$ for all positive real numbers $c$. Also $\{x \in X_0 : f(x)^2 < c\} = \emptyset$ when $c \le 0$. It follows from this that the restriction of the function $f^2$ to $X_0$ is measurable with respect to the σ-algebra $\mathcal{A}_0$. Similarly the restrictions of the functions $g^2$ and $(f + g)^2$ to $X_0$ are measurable with respect to $\mathcal{A}_0$. Now
$$f(x)g(x) = \tfrac{1}{2}\left( (f(x) + g(x))^2 - f(x)^2 - g(x)^2 \right)$$
for all $x \in X_0$. It therefore follows from straightforward applications of Corollary 8.2 and Proposition 8.3 that the restriction of $f \cdot g$ to $X_0$ is measurable with respect to the σ-algebra $\mathcal{A}_0$. Thus $\{x \in X_0 : f(x)g(x) < c\} \in \mathcal{A}$ for all real numbers $c$.

Now if $x \in X \setminus X_0$ then either $f(x) \in \{-\infty, +\infty\}$ or $g(x) \in \{-\infty, +\infty\}$. It follows easily from this that if $x \in X \setminus X_0$ then $f(x)g(x) \in \{-\infty, 0, +\infty\}$, and, by straightforward applications of the results in Proposition 8.1, it is easy to show that the sets $\{x \in X \setminus X_0 : f(x)g(x) = -\infty\}$, $\{x \in X \setminus X_0 : f(x)g(x) = 0\}$ and $\{x \in X \setminus X_0 : f(x)g(x) = +\infty\}$ all belong to $\mathcal{A}$. We conclude that the intersections of $\{x \in X : f(x)g(x) < c\}$ with the sets $X_0$ and $X \setminus X_0$ both belong to $\mathcal{A}$ for all real numbers $c$, and therefore $\{x \in X : f(x)g(x) < c\} \in \mathcal{A}$ for all real numbers $c$. Therefore $f \cdot g$ is measurable with respect to the σ-algebra $\mathcal{A}$, as required.

Lemma 8.6 Let $X$ be a set, let $\mathcal{A}$ be a σ-algebra of subsets of $X$, and let $f_1, f_2, \ldots, f_m$ be functions on $X$ with values in the set $[-\infty, +\infty]$ of extended real numbers. Suppose that each of the functions $f_1, f_2, \ldots, f_m$ is measurable with respect to the σ-algebra $\mathcal{A}$. Then so are $\max(f_1, f_2, \ldots, f_m)$ and $\min(f_1, f_2, \ldots, f_m)$.
Proof Let c be a real number. Then
\[ \{x \in X : \max(f_1, f_2, \ldots, f_m)(x) < c\} = \bigcap_{i=1}^{m} \{x \in X : f_i(x) < c\} \]
and
\[ \{x \in X : \min(f_1, f_2, \ldots, f_m)(x) < c\} = \bigcup_{i=1}^{m} \{x \in X : f_i(x) < c\}. \]
It follows that {x ∈ X : max(f_1, f_2, ..., f_m)(x) < c} is a finite intersection of sets belonging to A, and must therefore itself belong to A. Similarly {x ∈ X : min(f_1, f_2, ..., f_m)(x) < c} is a finite union of sets belonging to A, and must therefore itself belong to A. The result follows.

Proposition 8.7 Let X be a set, let A be a σ-algebra of subsets of X, and let f_1, f_2, f_3, ... be an infinite sequence of functions on X with values in the set [−∞, +∞] of extended real numbers. Suppose that each of the functions f_1, f_2, f_3, ... is measurable with respect to the σ-algebra A. Then so are g and h, where
\[ g(x) = \sup\{f_i(x) : i \in N\}, \qquad h(x) = \inf\{f_i(x) : i \in N\} \]
for all x ∈ X.

Proof Let c be a real number, and let x be a point of X. Then g(x) < c if and only if there exists some natural number n such that f_i(x) < c − 1/n for all natural numbers i. Therefore
\[ \{x \in X : g(x) < c\} = \bigcup_{n=1}^{+\infty} \bigcap_{i=1}^{+\infty} \left\{x \in X : f_i(x) < c - \frac{1}{n}\right\} \]
for all real numbers c. Now
\[ \bigcap_{i=1}^{+\infty} \left\{x \in X : f_i(x) < c - \frac{1}{n}\right\} \in A \]
for each natural number n. It follows that {x ∈ X : g(x) < c} ∈ A for all real numbers c. Thus the function g is measurable with respect to A. Also
\[ \{x \in X : h(x) < c\} = \bigcup_{i=1}^{+\infty} \{x \in X : f_i(x) < c\}. \]
It follows that {x ∈ X : h(x) < c} ∈ A for all real numbers c. Thus the function h is measurable with respect to A.
Corollary 8.8 Let X be a set, let A be a σ-algebra of subsets of X, and let f_1, f_2, f_3, ... be an infinite sequence of functions on X with values in the set [−∞, +∞] of extended real numbers. Suppose that each of the functions f_1, f_2, f_3, ... is measurable with respect to the σ-algebra A. Then so are f^* and f_*, where
\[ f^*(x) = \limsup_{n \to +\infty} f_n(x), \qquad f_*(x) = \liminf_{n \to +\infty} f_n(x) \]
for all x ∈ X.

Proof It follows from the definition of the upper and lower limits that
\[ f^*(x) = \lim_{n \to +\infty} g_n(x) = \inf\{g_n(x) : n \in N\}, \]
where g_n(x) = sup{f_i(x) : i ≥ n}. The measurability of f^* therefore follows directly on applying Proposition 8.7. Similarly
\[ f_*(x) = \lim_{n \to +\infty} h_n(x) = \sup\{h_n(x) : n \in N\}, \]
where h_n(x) = inf{f_i(x) : i ≥ n}, and therefore f_* is also measurable.

Let f_1, f_2, f_3, ... be an infinite sequence of functions defined on some set X with values in the set [−∞, +∞] of extended real numbers, and let x ∈ X. Then the limit of f_n(x) as n → +∞ is defined, and belongs to the set [−∞, +∞] of extended real numbers, if and only if
\[ \limsup_{n \to +\infty} f_n(x) = \liminf_{n \to +\infty} f_n(x). \]
Corollary 8.9 Let X be a set, let A be a σ-algebra of subsets of X, and let f_1, f_2, f_3, ... be an infinite sequence of functions on X with values in the set [−∞, +∞] of extended real numbers. Suppose that each of the functions f_1, f_2, f_3, ... is measurable with respect to the σ-algebra A. Let
\[ X_0 = \{x \in X : \lim_{n \to +\infty} f_n(x) \text{ is defined}\}. \]
Then X_0 ∈ A. Moreover if
\[ f(x) = \lim_{n \to +\infty} f_n(x) \]
for all x ∈ X_0, then f is a measurable function on X_0.

Proof Note that
\[ X_0 = \{x \in X : \limsup_{n \to +\infty} f_n(x) - \liminf_{n \to +\infty} f_n(x) = 0\}. \]
It follows from Proposition 8.1 that X_0 ∈ A. Moreover the function f coincides on X_0 with the measurable function f^*, where
\[ f^*(x) = \limsup_{n \to +\infty} f_n(x), \]
and must therefore be a measurable function on X_0, as required.

We see therefore that if (X, A, µ) is a measure space then the limit of any convergent sequence of measurable functions on X must itself be measurable.
8.2
Integrals of Measurable Simple Functions
Lemma 8.10 Let (X, A, µ) be a measure space, and let E_1, E_2, ..., E_m be a finite collection of measurable subsets of X. Then there exists a finite collection G_1, G_2, ..., G_r of pairwise disjoint measurable subsets of X such that
\[ E_j = \bigcup_{k \in K(j)} G_k \]
for j = 1, 2, ..., m, where K(j) denotes the set of all integers k between 1 and r for which G_k ⊂ E_j.

Proof For each subset S of {1, 2, ..., m} let
\[ F_S = \bigcap_{j \in S} E_j \setminus \bigcup_{j \notin S} E_j. \]
(Thus F_S is defined to be the set of all elements x of X that satisfy x ∈ E_j for all j ∈ S and x ∉ E_j for all j ∈ {1, 2, ..., m} \ S.) Then each set F_S is measurable. Given a point x ∈ X let S(x) denote the set of all integers j between 1 and m for which x ∈ E_j. Then S(x) is the unique subset of {1, 2, ..., m} for which x ∈ F_{S(x)}. It follows that if S′ and S″ are distinct subsets of {1, 2, ..., m} then F_{S′} ∩ F_{S″} = ∅. Let S_1, S_2, ..., S_r be a list of subsets of {1, 2, ..., m} with the property that every subset S of {1, 2, ..., m} for which F_S is non-empty occurs exactly once in the list, and let G_k = F_{S_k} for k = 1, 2, ..., r. Then the sets G_1, G_2, ..., G_r have the required properties.

Let X be a set, and let E be a subset of X. The characteristic function of E is defined to be the function χ_E: X → R defined so that
\[ \chi_E(x) = \begin{cases} 1 & \text{if } x \in E; \\ 0 & \text{if } x \notin E. \end{cases} \]

Lemma 8.11 Let (X, A, µ) be a measure space, and let f be a real-valued function on X. Suppose that there exist measurable sets E_1, E_2, ..., E_m and real numbers c_1, c_2, ..., c_m such that
\[ f(x) = \sum_{j=1}^{m} c_j \chi_{E_j}(x) \]
for all x ∈ X, where χ_{E_j} denotes the characteristic function of the set E_j. Then f is a measurable function on X, and {x ∈ X : f(x) = c} is a measurable set for all real numbers c.

Proof Lemma 8.10 ensures that there exists a finite collection G_1, G_2, ..., G_r of pairwise disjoint measurable subsets of X such that E_j is the union of the sets G_k with k ∈ K(j) for j = 1, 2, ..., m, where K(j) denotes the set of all integers k between 1 and r for which G_k ⊂ E_j. Then
\[ f = \sum_{k=1}^{r} d_k \chi_{G_k}, \]
where each real number d_k denotes the sum of those real numbers c_j for which G_k ⊂ E_j. Then, given any real number b, the set {x ∈ X : f(x) = b} is the union of those sets G_k for which d_k = b, and is therefore a measurable set. It follows that, for each real number c, the set {x ∈ X : f(x) < c} is a finite union of measurable sets, and is therefore a measurable set. Thus the function f is measurable, as required.

Remark The above result also follows immediately on noting that the characteristic function of a measurable set is measurable, any scalar multiple of a measurable function is measurable, and any finite sum of measurable functions is measurable. However the proof given above is more elementary than the proof of the result that finite sums of measurable functions are measurable.

Definition Let (X, A, µ) be a measure space. A function f: X → R is said to be a measurable simple function on X if there exist measurable sets E_1, E_2, ..., E_m and real numbers c_1, c_2, ..., c_m such that
\[ f(x) = \sum_{j=1}^{m} c_j \chi_{E_j}(x) \]
for all x ∈ X, where χ_{E_j} denotes the characteristic function of the set E_j.

Note that Lemma 8.11 guarantees that any real-valued function on X that satisfies the definition of a measurable simple function on X is measurable. It follows directly from the definition of measurable simple functions that any constant multiple of a measurable simple function is a measurable simple function, and the sum of any finite number of measurable simple functions is a measurable simple function.

Lemma 8.12 Let (X, A, µ) be a measure space, and let f: X → R be a measurable simple function on X. Suppose that
\[ f(x) = \sum_{j=1}^{m} c_j \chi_{E_j}(x) = \sum_{k=1}^{n} d_k \chi_{F_k}(x) \]
for all x ∈ X, where E_1, E_2, ..., E_m and F_1, F_2, ..., F_n are measurable sets, and c_1, c_2, ..., c_m and d_1, d_2, ..., d_n are real numbers. Then
\[ \sum_{j=1}^{m} c_j \mu(E_j) = \sum_{k=1}^{n} d_k \mu(F_k). \]
Proof It follows on applying Lemma 8.10 to the finite collection consisting of all the sets E_1, E_2, ..., E_m, F_1, F_2, ..., F_n that there exists a finite collection G_1, G_2, ..., G_r of pairwise disjoint measurable subsets of X such that
\[ E_j = \bigcup_{s \in S(j)} G_s \quad (j = 1, 2, \ldots, m) \qquad \text{and} \qquad F_k = \bigcup_{s \in T(k)} G_s \quad (k = 1, 2, \ldots, n), \]
where S(j) denotes the set of all integers s between 1 and r for which G_s ⊂ E_j and T(k) denotes the set of all integers s between 1 and r for which G_s ⊂ F_k. Then the additivity of the measure µ ensures that
\[ \mu(E_j) = \sum_{s \in S(j)} \mu(G_s). \]
It follows that
\[ \sum_{j=1}^{m} c_j \mu(E_j) = \sum_{j=1}^{m} \sum_{s \in S(j)} c_j \mu(G_s) = \sum_{s=1}^{r} \sum_{j \in U(s)} c_j \mu(G_s), \]
where U(s) denotes the set of all integers j between 1 and m for which G_s ⊂ E_j. But clearly
\[ \sum_{j \in U(s)} c_j = g_s, \]
where g_s denotes the value of the function f on G_s for s = 1, 2, ..., r. It follows that
\[ \sum_{j=1}^{m} c_j \mu(E_j) = \sum_{s=1}^{r} g_s \mu(G_s). \]
Similarly
\[ \sum_{k=1}^{n} d_k \mu(F_k) = \sum_{s=1}^{r} g_s \mu(G_s). \]
It follows that
\[ \sum_{j=1}^{m} c_j \mu(E_j) = \sum_{k=1}^{n} d_k \mu(F_k), \]
as required.

Definition Let (X, A, µ) be a measure space, and let f: X → R be a measurable simple function on X. Then there exist measurable sets E_1, E_2, ..., E_m and real numbers c_1, c_2, ..., c_m such that
\[ f(x) = \sum_{j=1}^{m} c_j \chi_{E_j}(x) \]
for all x ∈ X, where χ_{E_j} denotes the characteristic function of the set E_j. We define
\[ \int_X f \, d\mu = \sum_{j=1}^{m} c_j \mu(E_j). \]
This quantity is referred to as the integral of f on X with respect to the measure µ.

Lemma 8.12 ensures that the integral of a measurable simple function f over X is well-defined, and does not depend on the choice of measurable sets E_1, E_2, ..., E_m and real numbers c_1, c_2, ..., c_m for which
\[ f = \sum_{j=1}^{m} c_j \chi_{E_j}. \]
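The decomposition used in Lemma 8.10 and the independence result of Lemma 8.12 can be checked by hand on a finite example. The following is a minimal sketch in Python, under the assumption that X is a finite set on which each point carries a given mass; the names `atoms`, `measure`, `integral` and `evaluate`, and the particular masses and sets, are hypothetical choices made for illustration and do not come from the notes.

```python
def atoms(universe, sets):
    """Group the points of `universe` by S(x) = {j : x in E_j}.

    The groups are the non-empty sets F_S from the proof of Lemma 8.10.
    """
    groups = {}
    for x in universe:
        index_set = frozenset(j for j, E in enumerate(sets) if x in E)
        groups.setdefault(index_set, set()).add(x)
    return groups


def measure(weights, subset):
    """Measure of `subset` when each point x carries mass weights[x]."""
    return sum(weights[x] for x in subset)


def integral(weights, coeffs, sets):
    """Integral of sum_j c_j chi_{E_j}, computed as sum_j c_j mu(E_j)."""
    return sum(c * measure(weights, E) for c, E in zip(coeffs, sets))


def evaluate(coeffs, sets, x):
    """Value at x of the simple function sum_j c_j chi_{E_j}."""
    return sum(c for c, E in zip(coeffs, sets) if x in E)


# Hypothetical toy measure space: six points with integer masses.
X = range(6)
mu = {0: 1, 1: 2, 2: 1, 3: 3, 4: 2, 5: 1}

# Two representations of one simple function f.
rep1 = ([2, 3], [{0, 1, 2}, {2, 3}])      # overlapping sets
rep2 = ([2, 5, 3], [{0, 1}, {2}, {3}])    # pairwise disjoint sets

same_function = all(evaluate(*rep1, x) == evaluate(*rep2, x) for x in X)
integral1 = integral(mu, *rep1)
integral2 = integral(mu, *rep2)
pieces = atoms(X, rep1[1])
```

Here both representations define the same function and yield the same value of the sum of c_j µ(E_j), in agreement with Lemma 8.12, and `pieces` lists the pairwise disjoint sets into which the two overlapping sets are cut.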
Proposition 8.13 Let (X, A, µ) be a measure space, let f and g be measurable simple functions on X, and let c be a real number. Then
\[ \int_X cf \, d\mu = c \int_X f \, d\mu \]
and
\[ \int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu. \]

Proof The result follows immediately from the definition of the integral of a measurable simple function.

Lemma 8.14 Let (X, A, µ) be a measure space, and let f: X → R and g: X → R be measurable simple functions on X. Suppose that f(x) ≤ g(x) for all x ∈ X. Then
\[ \int_X f \, d\mu \le \int_X g \, d\mu. \]
Proof Let
\[ f(x) = \sum_{j=1}^{m} c_j \chi_{E_j}(x) \qquad \text{and} \qquad g(x) = \sum_{k=1}^{n} d_k \chi_{F_k}(x) \]
for all x ∈ X, where E_1, E_2, ..., E_m and F_1, F_2, ..., F_n are measurable sets, and c_1, c_2, ..., c_m and d_1, d_2, ..., d_n are real numbers. Then there exists a finite collection of pairwise disjoint measurable sets G_1, G_2, ..., G_r such that each of the sets E_1, E_2, ..., E_m, F_1, F_2, ..., F_n is a union of finitely many of the pairwise disjoint sets G_1, G_2, ..., G_r. Then
\[ f(x) = \sum_{j=1}^{r} a_j \chi_{G_j}(x) \qquad \text{and} \qquad g(x) = \sum_{j=1}^{r} b_j \chi_{G_j}(x) \]
for all x ∈ X, where a_j and b_j denote the values of the functions f and g on G_j for j = 1, 2, ..., r. Then a_j ≤ b_j for each j. It follows that
\[ \int_X f \, d\mu = \sum_{j=1}^{r} a_j \mu(G_j) \le \sum_{j=1}^{r} b_j \mu(G_j) = \int_X g \, d\mu, \]
as required.
Definition Let (X, A, µ) be a measure space, let E be a measurable subset of X, and let s: X → [0, +∞) be a non-negative measurable simple function on X. The integral of s over E is defined by the formula
\[ \int_E s \, d\mu = \int_X s \cdot \chi_E \, d\mu, \]
where χ_E denotes the characteristic function of E and s·χ_E is the product of the functions s and χ_E (so that (s·χ_E)(x) = s(x)χ_E(x) for all x ∈ X).

Proposition 8.15 Let (X, A, µ) be a measure space, let s: X → [0, +∞) be a non-negative measurable simple function on X, and let
\[ \nu(E) = \int_E s \, d\mu \]
for all measurable sets E. Then ν is a measure defined on the σ-algebra A of measurable subsets of X.

Proof The function s is a non-negative measurable simple function on X, and therefore there exist non-negative real numbers c_1, c_2, ..., c_m and measurable sets F_1, F_2, ..., F_m such that
\[ s(x) = \sum_{j=1}^{m} c_j \chi_{F_j}(x) \]
for all x ∈ X. Let E be a measurable set in X. Then
\[ s(x)\chi_E(x) = \sum_{j=1}^{m} c_j \chi_{F_j \cap E}(x) \]
for all x ∈ X, and therefore
\[ \nu(E) = \int_E s \, d\mu = \int_X s \cdot \chi_E \, d\mu = \sum_{j=1}^{m} c_j \mu(F_j \cap E). \]
Let ℰ be a countable collection of pairwise disjoint measurable sets. It follows from the countable additivity of the measure µ that
\[ \nu\Bigl( \bigcup_{E \in \mathcal{E}} E \Bigr) = \sum_{j=1}^{m} c_j \, \mu\Bigl( \bigcup_{E \in \mathcal{E}} (F_j \cap E) \Bigr) = \sum_{j=1}^{m} c_j \sum_{E \in \mathcal{E}} \mu(F_j \cap E) = \sum_{E \in \mathcal{E}} \nu(E). \]
Thus the function ν is countably additive, and is therefore a measure defined on A, as required.

Corollary 8.16 Let (X, A, µ) be a measure space, let s: X → [0, +∞) be a non-negative measurable simple function on X, and let E_1, E_2, E_3, ... be an infinite sequence of measurable subsets of X, where E_j ⊂ E_{j+1} for all positive integers j. Then
\[ \lim_{j \to +\infty} \int_{E_j} s \, d\mu = \int_E s \, d\mu, \qquad \text{where } E = \bigcup_{j=1}^{+\infty} E_j. \]

Proof Let ν(F) be the integral of s over F for all measurable sets F. Then ν is a measure on X (Proposition 8.15). It follows that
\[ \nu\Bigl( \bigcup_{j=1}^{+\infty} E_j \Bigr) = \lim_{j \to +\infty} \nu(E_j) \]
(Lemma 7.20). The result follows.

We shall extend the definition of the integral to non-negative measurable functions that are not necessarily simple. In developing the properties of this integral, we shall need the result that a non-negative measurable function on a measure space is the limit of a non-decreasing sequence of measurable simple functions. We now proceed to prove this result.

Proposition 8.17 Let (X, A, µ) be a measure space, and let f: X → [0, +∞] be a non-negative measurable function on X. Then there exists an infinite sequence s_1, s_2, s_3, ... of non-negative measurable simple functions with the following properties:
(i) 0 ≤ s_j(x) ≤ s_{j+1}(x) for all j ∈ N and x ∈ X;
(ii) s_j(x) → f(x) as j → +∞ for all x ∈ X.
Proof For each positive integer j let F_j = {x ∈ X : f(x) ≥ j}, and for each integer k satisfying 1 ≤ k ≤ 2^j j, let
\[ E_{j,k} = \left\{ x \in X : \frac{k-1}{2^j} \le f(x) < \frac{k}{2^j} \right\}. \]
Then the sets F_j and E_{j,k} are measurable sets. Let
\[ s_j(x) = j \chi_{F_j}(x) + \sum_{k=1}^{2^j j} \frac{k-1}{2^j} \chi_{E_{j,k}}(x) \]
for all j ∈ N and x ∈ X. Then s_j is a measurable simple function on X which takes the value 2^{−j}(k − 1) when 2^{−j}(k − 1) ≤ f(x) < 2^{−j}k for some integer k between 1 and 2^j j, and takes the value j when f(x) ≥ j. One can readily verify that 0 ≤ s_j(x) ≤ s_{j+1}(x) for all j ∈ N and x ∈ X. If f(x) < +∞ and j ≥ f(x) then 0 ≤ f(x) − s_j(x) < 2^{−j}. It follows that if f(x) < +∞ then s_j(x) → f(x) as j → +∞. If f(x) = +∞ then s_j(x) = j for all positive integers j, and therefore s_j(x) → f(x) as j → +∞ in this case as well. The result is thus established.
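The construction of the functions s_j is concrete enough to compute. The following Python sketch evaluates s_j at a point directly from the value f(x); the function name `dyadic_approx` and the sample values are assumptions made for illustration. It is intended only as a numerical check of the two properties used in the proof (monotonicity in j, and the error bound 2^{−j} once j ≥ f(x)).

```python
import math


def dyadic_approx(j, value):
    """Value of s_j at a point where f takes the given value in [0, +inf].

    Returns j when value >= j; otherwise returns (k - 1) / 2**j for the
    integer k with (k - 1) / 2**j <= value < k / 2**j.
    """
    if value >= j:
        return float(j)
    return math.floor(value * 2**j) / 2**j


# Hypothetical sample values of f(x), including +infinity.
sample_values = [0.0, 0.3, 1.0, math.pi, 7.25, math.inf]

table = {
    v: [dyadic_approx(j, v) for j in range(1, 12)]
    for v in sample_values
}
```

Each list in `table` is non-decreasing, and for finite values the entries approach f(x) from below, while for f(x) = +∞ the j-th entry is j.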
8.3
Integrals of Non-Negative Measurable Functions
Definition Let (X, A, µ) be a measure space, and let f: X → [0, +∞] be a measurable function on X taking values in the set [0, +∞] of non-negative extended real numbers. The integral of f over X, denoted by ∫_X f dµ, is defined to be the supremum of the integrals ∫_X s dµ as s ranges over all non-negative measurable simple functions on X that satisfy s(x) ≤ f(x) for all x ∈ X.

Let (X, A, µ) be a measure space, and let f: X → [0, +∞] be a measurable function on X taking values in the set [0, +∞] of non-negative extended real numbers. It follows from the above definition that ∫_X f dµ = C for some non-negative extended real number C if and only if the following two conditions are satisfied:
(i) ∫_X s dµ ≤ C for all non-negative measurable simple functions s on X that satisfy s(x) ≤ f(x) for all x ∈ X;
(ii) given any non-negative real number c satisfying c < C, there exists some non-negative measurable simple function s on X such that s(x) ≤ f(x) for all x ∈ X and ∫_X s dµ > c.

It follows directly from Lemma 8.14 that the definition of the integral for non-negative measurable functions is consistent with that previously given for measurable simple functions.

Lemma 8.18 Let (X, A, µ) be a measure space, and let f: X → [0, +∞] and g: X → [0, +∞] be measurable functions on X with values in the set [0, +∞] of non-negative extended real numbers. Suppose that f(x) ≤ g(x) for all x ∈ X. Then
\[ \int_X f \, d\mu \le \int_X g \, d\mu. \]

Proof This follows immediately from the definition of the integral, since any non-negative measurable simple function s on X satisfying s(x) ≤ f(x) for all x ∈ X will also satisfy s(x) ≤ g(x) for all x ∈ X.
The Monotone Convergence Theorem

We now prove an important theorem which states that the integral of the limit of a non-decreasing sequence of non-negative measurable functions is equal to the limit of the integrals of those functions. A number of other important results follow as consequences of this basic theorem.
Theorem 8.19 Let (X, A, µ) be a measure space, let f_1, f_2, f_3, ... be an infinite sequence of measurable functions on X with values in the set [0, +∞] of non-negative extended real numbers, and let f: X → [0, +∞] be defined such that f(x) is the limit of f_j(x) as j → +∞ for all x ∈ X. Suppose that 0 ≤ f_j(x) ≤ f_{j+1}(x) for all j ∈ N and x ∈ X. Then
\[ \lim_{j \to +\infty} \int_X f_j \, d\mu = \int_X f \, d\mu. \]
Proof It follows from Corollary 8.9 that the limit function f is measurable. Moreover ∫_X f_j dµ ≤ ∫_X f dµ for all j (Lemma 8.18), and therefore
\[ \lim_{j \to +\infty} \int_X f_j \, d\mu \le \int_X f \, d\mu. \]
Let s be a non-negative measurable simple function on X which satisfies s(x) ≤ f(x) for all x ∈ X, and let c be a real number satisfying 0 < c < 1. If f(x) > 0 then f(x) > cs(x) and therefore there exists some positive integer j such that f_j(x) ≥ cs(x). If f(x) = 0 then s(x) = 0, and therefore f_j(x) ≥ cs(x) for all positive integers j. It follows that the union of the sets E_1, E_2, E_3, ... is the whole of X, where
\[ E_j = \{x \in X : f_j(x) \ge c\,s(x)\}. \]
Now
\[ c \int_{E_j} s \, d\mu \le \int_{E_j} f_j \, d\mu \le \int_X f_j \, d\mu \le \lim_{j \to +\infty} \int_X f_j \, d\mu. \]
Also E_j ⊂ E_{j+1} for all positive integers j. It therefore follows from Corollary 8.16 that
\[ c \int_X s \, d\mu = \lim_{j \to +\infty} c \int_{E_j} s \, d\mu \le \lim_{j \to +\infty} \int_X f_j \, d\mu. \]
Moreover this inequality holds for all real numbers c satisfying 0 < c < 1, and therefore
\[ \int_X s \, d\mu \le \lim_{j \to +\infty} \int_X f_j \, d\mu. \]
This inequality holds for all non-negative measurable simple functions s satisfying s(x) ≤ f(x) for all x ∈ X. It now follows from the definition of the integral of a non-negative measurable function that
\[ \int_X f \, d\mu \le \lim_{j \to +\infty} \int_X f_j \, d\mu, \qquad \text{and therefore} \qquad \int_X f \, d\mu = \lim_{j \to +\infty} \int_X f_j \, d\mu, \]
as required.
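A numerical illustration of the Monotone Convergence Theorem may be helpful. The sketch below, in Python, takes X = (0, 1] with Lebesgue measure and the truncations f_j(x) = min(j, x^{−1/2}) of the unbounded function f(x) = x^{−1/2}; this choice of example, the helper names, and the use of a midpoint Riemann sum as a stand-in for the integral of the bounded continuous functions f_j are all assumptions made for illustration. A direct calculation gives ∫ f_j = j · j^{−2} + (2 − 2/j) = 2 − 1/j, which increases to 2, the integral of f.

```python
import math


def truncated(j, x):
    """f_j(x) = min(j, x**-0.5), a truncation of f(x) = x**-0.5 on (0, 1]."""
    return min(j, 1.0 / math.sqrt(x))


def midpoint_integral(g, n=20000):
    """Midpoint-rule approximation to the integral of g over (0, 1]."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def exact_integral(j):
    """Closed form 2 - 1/j for the integral of f_j over (0, 1], j >= 1."""
    return 2.0 - 1.0 / j


levels = (1, 2, 5, 10)
numeric = [midpoint_integral(lambda x, j=j: truncated(j, x)) for j in levels]
exact = [exact_integral(j) for j in levels]
```

The entries of `numeric` agree closely with those of `exact`, and both lists increase towards 2 as the truncation level grows.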
Proposition 8.20 Let (X, A, µ) be a measure space, and let f: X → [0, +∞] and g: X → [0, +∞] be non-negative measurable functions on X. Then
\[ \int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu. \]

Proof It follows from Proposition 8.17 that there exist infinite sequences s_1, s_2, s_3, ... and t_1, t_2, t_3, ... of non-negative measurable simple functions such that 0 ≤ s_j(x) ≤ s_{j+1}(x) and 0 ≤ t_j(x) ≤ t_{j+1}(x) for all j ∈ N and x ∈ X, and such that s_j(x) → f(x) and t_j(x) → g(x) as j → +∞. Then s_j(x) + t_j(x) → f(x) + g(x) as j → +∞. It therefore follows from Proposition 8.13 and the Monotone Convergence Theorem (Theorem 8.19) that
\begin{align*}
\int_X (f + g) \, d\mu &= \lim_{j \to +\infty} \int_X (s_j + t_j) \, d\mu = \lim_{j \to +\infty} \left( \int_X s_j \, d\mu + \int_X t_j \, d\mu \right) \\
&= \lim_{j \to +\infty} \int_X s_j \, d\mu + \lim_{j \to +\infty} \int_X t_j \, d\mu \\
&= \int_X f \, d\mu + \int_X g \, d\mu,
\end{align*}
as required.

Proposition 8.21 Let (X, A, µ) be a measure space, and let f_1, f_2, f_3, ... be an infinite sequence of non-negative measurable functions on X. Then
\[ \int_X \Bigl( \sum_{j=1}^{+\infty} f_j \Bigr) d\mu = \sum_{j=1}^{+\infty} \int_X f_j \, d\mu. \]
Proof It follows from Proposition 8.20 that
\[ \int_X \Bigl( \sum_{j=1}^{N} f_j \Bigr) d\mu = \sum_{j=1}^{N} \int_X f_j \, d\mu \]
for all positive integers N. It then follows from the Monotone Convergence Theorem (Theorem 8.19) that
\[ \int_X \Bigl( \sum_{j=1}^{+\infty} f_j \Bigr) d\mu = \lim_{N \to +\infty} \int_X \Bigl( \sum_{j=1}^{N} f_j \Bigr) d\mu = \sum_{j=1}^{+\infty} \int_X f_j \, d\mu, \]
as required.

Lemma 8.22 (Fatou's Lemma) Let (X, A, µ) be a measure space, let f_1, f_2, f_3, ... be an infinite sequence of non-negative measurable functions on X, and let f_*(x) be the lower limit of f_j(x) as j → +∞ for all x ∈ X. Then
\[ \int_X f_* \, d\mu \le \liminf_{j \to +\infty} \int_X f_j \, d\mu. \]
Proof Let g_j(x) = inf{f_k(x) : k ≥ j} for all j ∈ N and x ∈ X. Then the functions g_1, g_2, g_3, ... are measurable (Proposition 8.7), and f_*(x) is the limit of g_j(x) as j → +∞ for all x ∈ X. Also 0 ≤ g_j(x) ≤ g_{j+1}(x) for all j ∈ N and x ∈ X. It follows from the Monotone Convergence Theorem (Theorem 8.19) that
\[ \int_X f_* \, d\mu = \lim_{j \to +\infty} \int_X g_j \, d\mu. \]
Now g_j(x) ≤ f_k(x) for all x ∈ X and for all positive integers j and k satisfying j ≤ k. It follows that
\[ \int_X g_j \, d\mu \le \int_X f_k \, d\mu \quad \text{whenever } j \le k, \]
and therefore
\[ \int_X g_j \, d\mu \le \inf\left\{ \int_X f_k \, d\mu : k \ge j \right\}. \]
It follows that
\[ \int_X f_* \, d\mu = \lim_{j \to +\infty} \int_X g_j \, d\mu \le \lim_{j \to +\infty} \inf\left\{ \int_X f_k \, d\mu : k \ge j \right\} = \liminf_{j \to +\infty} \int_X f_j \, d\mu, \]
as required.
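The inequality in Fatou's Lemma can be strict. A standard illustration, not taken from these notes, is the sequence f_j = j·χ_{(0, 1/j)} on [0, 1] with Lebesgue measure: each f_j has integral j · (1/j) = 1, yet f_j(x) = 0 for every fixed x once j is large enough, so the lower limit is the zero function. The Python sketch below simply tabulates these facts; the function names and sample points are hypothetical.

```python
from fractions import Fraction


def spike(j, x):
    """Value at x of f_j = j * chi_{(0, 1/j)}."""
    return j if 0 < x < Fraction(1, j) else 0


def spike_integral(j):
    """Integral of f_j over [0, 1]: height j times the length 1/j of (0, 1/j)."""
    return j * Fraction(1, j)


sample_points = [Fraction(0), Fraction(1, 1000), Fraction(1, 3), Fraction(1)]

# Values f_j(x) for a block of large indices j at each sample point.
tail_values = {
    x: [spike(j, x) for j in range(1001, 1011)] for x in sample_points
}
integrals = [spike_integral(j) for j in (1, 10, 100, 1000)]
```

Thus the integral of the lower limit is 0 while the lower limit of the integrals is 1.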
8.4
Integration of Functions with Positive and Negative Values
Definition Let (X, A, µ) be a measure space, and let f: X → [−∞, +∞] be a measurable function on X. The function f is said to be integrable if
\[ \int_X |f| \, d\mu < +\infty. \]

Let (X, A, µ) be a measure space, and let f: X → [−∞, +∞] be a measurable function on X. Then f gives rise to non-negative measurable functions f_+ and f_− on X, where f_+(x) = max(f(x), 0) and f_−(x) = max(−f(x), 0) for all x ∈ X. Moreover f(x) = f_+(x) − f_−(x) and |f(x)| = f_+(x) + f_−(x) for all x ∈ X. Now
\[ \int_X f_+ \, d\mu \le \int_X |f| \, d\mu, \qquad \int_X f_- \, d\mu \le \int_X |f| \, d\mu \qquad \text{and} \qquad \int_X |f| \, d\mu = \int_X f_+ \, d\mu + \int_X f_- \, d\mu. \]
It follows that ∫_X |f| dµ < +∞ if and only if ∫_X f_+ dµ < +∞ and ∫_X f_− dµ < +∞.

Definition Let (X, A, µ) be a measure space, and let f: X → [−∞, +∞] be an integrable function on X. The integral of f on X is defined by the identity
\[ \int_X f \, d\mu = \int_X f_+ \, d\mu - \int_X f_- \, d\mu, \]
where f_+(x) = max(f(x), 0) and f_−(x) = max(−f(x), 0) for all x ∈ X.
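The decomposition of a function into positive and negative parts can be made concrete on a finite set. The Python sketch below assumes a three-point space with hypothetical point masses and function values, and computes the integral as the difference of the integrals of the positive and negative parts; none of the names or numbers come from the notes.

```python
def positive_part(v):
    """f_+ at a point: max(f(x), 0)."""
    return max(v, 0.0)


def negative_part(v):
    """f_- at a point: max(-f(x), 0)."""
    return max(-v, 0.0)


# Hypothetical point masses and function values on X = {a, b, c}.
mu = {"a": 0.5, "b": 2.0, "c": 1.0}
f = {"a": 3.0, "b": -1.5, "c": 0.0}

int_plus = sum(positive_part(f[x]) * mu[x] for x in mu)
int_minus = sum(negative_part(f[x]) * mu[x] for x in mu)
int_abs = sum(abs(f[x]) * mu[x] for x in mu)

# The integral of f is the difference of the two non-negative integrals.
int_f = int_plus - int_minus
```

In this example the integral of |f| equals the sum of the integrals of the positive and negative parts, so f is integrable precisely because both parts have finite integral.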
Lemma 8.23 Let (X, A, µ) be a measure space, let f: X → [−∞, +∞] be an integrable function on X, and let u: X → [0, +∞] and v: X → [0, +∞] be non-negative integrable functions on X such that f(x) = u(x) − v(x) for all x ∈ X. Then
\[ \int_X f \, d\mu = \int_X u \, d\mu - \int_X v \, d\mu. \]

Proof Let f_+(x) = max(f(x), 0) and f_−(x) = max(−f(x), 0) for all x ∈ X. Then f(x) = f_+(x) − f_−(x) = u(x) − v(x) for all x ∈ X, and therefore f_+(x) + v(x) = f_−(x) + u(x) for all x ∈ X. It follows from Proposition 8.20 that
\[ \int_X f_+ \, d\mu + \int_X v \, d\mu = \int_X f_- \, d\mu + \int_X u \, d\mu. \]
But then
\[ \int_X f \, d\mu = \int_X f_+ \, d\mu - \int_X f_- \, d\mu = \int_X u \, d\mu - \int_X v \, d\mu, \]
as required.

Lemma 8.24 Let (X, A, µ) be a measure space, and let f: X → [−∞, +∞] and g: X → [−∞, +∞] be integrable functions on X. Then
\[ \int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu \]
and
\[ \int_X cf \, d\mu = c \int_X f \, d\mu \]
for all real numbers c.

Proof Let
\[ f_+(x) = \max(f(x), 0), \quad f_-(x) = \max(-f(x), 0), \quad g_+(x) = \max(g(x), 0), \quad g_-(x) = \max(-g(x), 0), \]
u(x) = f_+(x) + g_+(x) and v(x) = f_−(x) + g_−(x) for all x ∈ X. Then the functions u and v are integrable, and f(x) + g(x) = u(x) − v(x) for all x ∈ X. It follows from Lemma 8.23 and Proposition 8.20 that
\begin{align*}
\int_X (f + g) \, d\mu &= \int_X u \, d\mu - \int_X v \, d\mu \\
&= \int_X f_+ \, d\mu + \int_X g_+ \, d\mu - \int_X f_- \, d\mu - \int_X g_- \, d\mu \\
&= \int_X f \, d\mu + \int_X g \, d\mu.
\end{align*}
The identity ∫_X cf dµ = c ∫_X f dµ follows directly from the definition of the integral, on considering separately the cases when c > 0, c = 0 and c < 0.
Integrals of Complex-Valued Functions

We can define the integral of a complex-valued function by splitting the integrand into its real and imaginary parts.

Definition Let (X, A, µ) be a measure space, let f: X → C be a function on X with values in the field C of complex numbers, and let u: X → R and v: X → R be the real-valued functions that determine the real and imaginary parts of the function f, so that f(x) = u(x) + iv(x) for all x ∈ X, where i = √−1. The function f is said to be measurable if the functions u and v are both measurable; and the function f is said to be integrable if the functions u and v are both integrable. If the function f is integrable, then the integral of f over X is defined by the formula
\[ \int_X f \, d\mu = \int_X u \, d\mu + i \int_X v \, d\mu. \]

Lemma 8.25 Let (X, A, µ) be a measure space, and let f: X → C and g: X → C be integrable functions on X with values in the field C of complex numbers. Then
\[ \int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu \]
and
\[ \int_X cf \, d\mu = c \int_X f \, d\mu \]
for all complex numbers c.

Proof The identity ∫_X (f + g) dµ = ∫_X f dµ + ∫_X g dµ follows directly on decomposing the functions f and g into their real and imaginary parts. Let f(x) = u(x) + iv(x), where u: X → R and v: X → R are integrable real-valued functions on X. Also let c = a + ib, where a and b are real numbers. Then cf(x) = au(x) − bv(x) + i(av(x) + bu(x)) for all x ∈ X. It follows from Lemma 8.24 that
\begin{align*}
\int_X cf \, d\mu &= \int_X (au - bv) \, d\mu + i \int_X (av + bu) \, d\mu \\
&= a \int_X u \, d\mu - b \int_X v \, d\mu + i \left( a \int_X v \, d\mu + b \int_X u \, d\mu \right) \\
&= (a + ib) \left( \int_X u \, d\mu + i \int_X v \, d\mu \right) \\
&= c \int_X f \, d\mu,
\end{align*}
as required.

Proposition 8.26 Let (X, A, µ) be a measure space, let f: X → C be a measurable function on X with values in the field C of complex numbers, and let |f| be the real-valued function on X defined such that |f|(x) = |f(x)| for all x ∈ X. Then the function |f| is measurable. Moreover the measurable function f is integrable if and only if ∫_X |f| dµ < +∞, in which case
\[ \left| \int_X f \, d\mu \right| \le \int_X |f| \, d\mu. \]
Proof Let f(x) = u(x) + iv(x) for all x ∈ X, where u and v are real-valued functions on X. Then the functions u and v are measurable. Now
\[ \{x \in X : |f|(x) < r\} = \{x \in X : u(x)^2 + v(x)^2 < r^2\} \]
for all positive real numbers r, and {x ∈ X : |f|(x) < r} is empty when r ≤ 0. Moreover the function that sends points x of X to u(x)^2 + v(x)^2 is a measurable function on X, as sums and products of measurable functions are measurable (Proposition 8.3 and Corollary 8.5). Therefore {x ∈ X : |f|(x) < r} is a measurable set for all real numbers r. This shows that |f| is a measurable function on X.

Now |u(x)| ≤ |f(x)|, |v(x)| ≤ |f(x)| and |f(x)| ≤ |u(x)| + |v(x)| for all x ∈ X, and therefore
\[ \int_X |u| \, d\mu \le \int_X |f| \, d\mu, \qquad \int_X |v| \, d\mu \le \int_X |f| \, d\mu \]
and
\[ \int_X |f| \, d\mu \le \int_X |u| \, d\mu + \int_X |v| \, d\mu. \]
Thus if ∫_X |f| dµ < +∞ then u and v are integrable functions on X, and thus f is an integrable function on X. Conversely, if f is an integrable function on X then ∫_X |u| dµ < +∞ and ∫_X |v| dµ < +∞, and therefore ∫_X |f| dµ < +∞. Thus the measurable function f is integrable if and only if ∫_X |f| dµ < +∞.

Now suppose that the function f is integrable. There exists a complex number w satisfying |w| = 1 for which w ∫_X f dµ is a non-negative real number. Let wf(x) = u_1(x) + iv_1(x) for all x ∈ X, where u_1 and v_1 are real-valued functions on X. Then the functions u_1 and v_1 are measurable functions on X, and u_1(x) ≤ |f(x)| for all x ∈ X. Also w ∫_X f dµ = ∫_X u_1 dµ, because w ∫_X f dµ is a non-negative real number. It follows that
\[ \left| \int_X f \, d\mu \right| = w \int_X f \, d\mu = \int_X u_1 \, d\mu \le \int_X |f| \, d\mu, \]
as required.
8.5
Lebesgue’s Dominated Convergence Theorem
Theorem 8.27 Let (X, A, µ) be a measure space, let f_1, f_2, f_3, ... be an infinite sequence of measurable complex-valued functions on X, and let f be a measurable complex-valued function on X, where f(x) is the limit of f_j(x) as j → +∞ for all x ∈ X. Suppose that there exists a non-negative integrable function g: X → [0, +∞] such that |f_j(x)| ≤ g(x) for all j ∈ N and x ∈ X. Then the function f is integrable,
\[ \lim_{j \to +\infty} \int_X |f_j - f| \, d\mu = 0 \]
and
\[ \lim_{j \to +\infty} \int_X f_j \, d\mu = \int_X f \, d\mu. \]
Proof The function |f| satisfies |f|(x) ≤ g(x) for all x ∈ X, and therefore ∫_X |f| dµ ≤ ∫_X g dµ < +∞. It follows from Proposition 8.26 that the function f is integrable on X. Now |f_j(x) − f(x)| ≤ 2g(x) for all j ∈ N and x ∈ X. Moreover |f_j(x) − f(x)| → 0 as j → +∞ for all x ∈ X, and therefore
\[ \lim_{j \to +\infty} \bigl( 2g(x) - |f_j(x) - f(x)| \bigr) = 2g(x) \]
for all x ∈ X. It follows from Fatou's Lemma (Lemma 8.22) that
\begin{align*}
2 \int_X g \, d\mu &= \int_X \lim_{j \to +\infty} (2g - |f_j - f|) \, d\mu \\
&\le \liminf_{j \to +\infty} \int_X (2g - |f_j - f|) \, d\mu \\
&= 2 \int_X g \, d\mu - \limsup_{j \to +\infty} \int_X |f_j - f| \, d\mu,
\end{align*}
and therefore
\[ \limsup_{j \to +\infty} \int_X |f_j - f| \, d\mu \le 0. \]
But ∫_X |f_j − f| dµ ≥ 0 for all positive integers j, as the integrand is non-negative everywhere on X. It follows that
\[ \lim_{j \to +\infty} \int_X |f_j - f| \, d\mu = \limsup_{j \to +\infty} \int_X |f_j - f| \, d\mu = 0. \]
Now, on applying Proposition 8.26 and Lemma 8.25, we find that
\[ \left| \int_X f_j \, d\mu - \int_X f \, d\mu \right| = \left| \int_X (f_j - f) \, d\mu \right| \le \int_X |f_j - f| \, d\mu \]
for all positive integers j. It follows that
\[ \lim_{j \to +\infty} \left| \int_X f_j \, d\mu - \int_X f \, d\mu \right| = 0, \]
and therefore
\[ \lim_{j \to +\infty} \int_X f_j \, d\mu = \int_X f \, d\mu, \]
as required.

Corollary 8.28 Let (X, A, µ) be a measure space, let f: X → C be a function on X, let u be a positive real number, and, for each real number h satisfying 0 < h < u, let f_h: X → C be a measurable function on X, where the functions f and f_h take values in the field of complex numbers. Suppose that f_h(x) → f(x) as h → 0 for all x ∈ X. Suppose also that there exists a non-negative integrable function g: X → [0, +∞] on X such that |f_h(x)| ≤ g(x) for all x ∈ X and h ∈ (0, u). Then the function f is integrable,
\[ \lim_{h \to 0} \int_X |f_h - f| \, d\mu = 0 \]
and
\[ \lim_{h \to 0} \int_X f_h \, d\mu = \int_X f \, d\mu. \]
Proof Let F(h) = ∫_X f_h dµ for all h ∈ (0, u), and let l = ∫_X f dµ. It follows from Lebesgue's Dominated Convergence Theorem that the function f is integrable, and that F(a_j) → l as j → +∞ for all infinite sequences a_1, a_2, a_3, ... of real numbers in (0, u) for which a_j → 0 as j → +∞. A standard result of analysis then ensures that F(h) → l as h → 0, and thus
\[ \lim_{h \to 0} \int_X f_h \, d\mu = \int_X f \, d\mu. \]
Indeed, suppose it were the case that the limit of F(h) as h → 0 did not exist, or were not equal to the value l. Then there would exist a positive real number ε_0 with the property that, given any positive real number δ, there would exist some h ∈ (0, u) satisfying 0 < h < δ for which |F(h) − l| ≥ ε_0. It follows that there would exist an infinite sequence a_1, a_2, a_3, ... of elements of (0, u) for which 0 < a_j < 1/j and |F(a_j) − l| ≥ ε_0, and thus the sequence F(a_1), F(a_2), F(a_3), ... would not converge to the limit l. Similar reasoning shows that
\[ \lim_{h \to 0} \int_X |f_h - f| \, d\mu = 0, \]
as required.

The theory of integration provided by the theory of Lebesgue is both more general and more powerful than that of the Riemann integral. Consider bounded real-valued functions defined on a bounded interval in the real line. Any such interval may be regarded as a measure space, the measure being one-dimensional Lebesgue measure. On examining the definition of the Riemann integral, one can establish that those bounded real-valued functions on the interval with well-defined Riemann integrals are also integrable with respect to Lebesgue measure, and moreover the value of the Lebesgue integral coincides with that of the Riemann integral. In particular the Lebesgue integrals of standard functions are those that can be computed by the usual techniques of Calculus. Indeed one can easily see that the standard proof of the Fundamental Theorem of Calculus is valid when the theory of integration is that of Lebesgue and the measure is Lebesgue measure on the real line.

Corollary 8.29 Let I be an interval in the real line R, let J be an open interval in R, and let f: I × J → C be a continuously differentiable function on I × J with values in the field C of complex numbers. Suppose that there exists some non-negative integrable function g: I → [0, +∞] such that
\[ \left| \frac{\partial f(x, y)}{\partial y} \right| \le g(x) \]
for all x ∈ I and y ∈ J. Then
\[ \frac{d}{dy} \int_I f(x, y) \, dx = \int_I \frac{\partial f(x, y)}{\partial y} \, dx. \]

Proof We apply the theory of the Lebesgue integral, where the relevant measure is Lebesgue measure on the real line. Continuous functions are measurable. Moreover it follows directly from the Mean Value Theorem that
\[ \left| \frac{f(x, y + h) - f(x, y)}{h} \right| \le g(x) \]
whenever x ∈ I and y, y + h ∈ J. It now follows from Corollary 8.28 that
\begin{align*}
\frac{d}{dy} \int_I f(x, y) \, dx &= \lim_{h \to 0} \int_I \frac{f(x, y + h) - f(x, y)}{h} \, dx \\
&= \int_I \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h} \, dx \\
&= \int_I \frac{\partial f(x, y)}{\partial y} \, dx,
\end{align*}
as required.

We now give an example to demonstrate that it is not always possible to interchange integrals and limits when conditions such as those in the statement of Lebesgue's Dominated Convergence Theorem are not satisfied.

Example Let f_1, f_2, f_3, ... be the sequence of continuous functions on the interval [0, 1] defined by f_n(x) = n(x^n − x^{2n}). Now
\[ \lim_{n \to +\infty} \int_0^1 f_n(x) \, dx = \lim_{n \to +\infty} \left( \frac{n}{n+1} - \frac{n}{2n+1} \right) = \frac{1}{2}. \]
On the other hand, we shall show that f_n(x) → 0 as n → +∞ for all x ∈ [0, 1]. Thus one cannot interchange limits and integrals in this case.

Suppose that 0 ≤ x < 1. We claim that nx^n → 0 as n → +∞. To verify this, choose u satisfying x < u < 1. Then 0 ≤ (n + 1)u^{n+1} ≤ nu^n for all n satisfying n > u/(1 − u). Therefore there exists some constant B with the property that 0 ≤ nu^n ≤ B for all n. But then 0 ≤ nx^n ≤ B(x/u)^n for all n, and (x/u)^n → 0 as n → +∞. Therefore nx^n → 0 as n → +∞, as claimed. It follows that
\[ \lim_{n \to +\infty} f_n(x) = \lim_{n \to +\infty} n x^n \cdot \lim_{n \to +\infty} (1 - x^n) = 0 \]
for all x satisfying 0 ≤ x < 1. Also f_n(1) = 0 for all n. We conclude that f_n(x) → 0 as n → +∞ for all x ∈ [0, 1], which is what we set out to show.
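The behaviour of the functions f_n(x) = n(x^n − x^{2n}) in the above example can be tabulated numerically. The Python sketch below uses the closed form n/(n + 1) − n/(2n + 1) for the integrals and evaluates f_n at a fixed point and at its maximum; the function names are hypothetical. Writing t = x^n gives f_n = n(t − t²), which is greatest when t = 1/2, so f_n attains the value n/4 at x = 2^{−1/n}. The peak therefore grows without bound as it moves towards 1, which is consistent with the hypotheses of the Dominated Convergence Theorem failing here.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n * (x**n - x**(2n)) on [0, 1]."""
    return n * (x**n - x**(2 * n))


def integral(n):
    """Exact integral of f_n over [0, 1]: n/(n+1) - n/(2n+1)."""
    return Fraction(n, n + 1) - Fraction(n, 2 * n + 1)


integrals = [float(integral(n)) for n in (1, 10, 100, 1000, 10**6)]

# Pointwise values at the fixed point x = 0.99 tend to 0 ...
pointwise = [f(n, 0.99) for n in (10, 100, 1000, 2000)]

# ... while the maximum value n/4, attained at x = 2**(-1/n), grows.
peaks = [f(n, 0.5 ** (1.0 / n)) for n in (4, 40, 400)]
```

The list `integrals` approaches 1/2, whereas the values in `pointwise` eventually decrease to 0 and those in `peaks` are approximately 1, 10 and 100.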
8.6
Comparison with the Riemann integral
The theory of integration developed by Lebesgue is both more general and more powerful than that developed earlier in the nineteenth century by mathematicians such as Cauchy and Riemann. In order to compare the two theories of integration, we must first review the basic principles of the earlier theory of integration, that gives rise to the concept of the Riemann integral of a bounded function on an interval in the real line.

A partition P of an interval [a, b] is a set $\{x_0, x_1, x_2, \ldots, x_n\}$ of real numbers satisfying $a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b$. Given any bounded real-valued function f on [a, b], the lower sum L(P, f) and the upper sum U(P, f) of f for the partition P of [a, b] are defined by
\[
L(P, f) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}), \qquad U(P, f) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}),
\]
where $m_i = \inf\{f(x) : x_{i-1} \le x \le x_i\}$ and $M_i = \sup\{f(x) : x_{i-1} \le x \le x_i\}$. Clearly L(P, f) ≤ U(P, f). Moreover $\sum_{i=1}^{n} (x_i - x_{i-1}) = b - a$, and therefore
\[
m(b - a) \le L(P, f) \le U(P, f) \le M(b - a)
\]
for any real numbers m and M satisfying m ≤ f(x) ≤ M for all x ∈ [a, b].

Definition Let f be a bounded real-valued function on the interval [a, b], where a < b. The upper Riemann integral $\mathcal{U}\int_a^b f(x)\,dx$ and the lower Riemann integral $\mathcal{L}\int_a^b f(x)\,dx$ of the function f on [a, b] are defined by
\[
\mathcal{U}\int_a^b f(x)\,dx \equiv \inf\{U(P, f) : P \text{ is a partition of } [a, b]\},
\]
\[
\mathcal{L}\int_a^b f(x)\,dx \equiv \sup\{L(P, f) : P \text{ is a partition of } [a, b]\}
\]
(i.e., $\mathcal{U}\int_a^b f(x)\,dx$ is the infimum of the values of U(P, f) and $\mathcal{L}\int_a^b f(x)\,dx$ is the supremum of the values of L(P, f) as P ranges over all possible partitions of the interval [a, b]). If
\[
\mathcal{U}\int_a^b f(x)\,dx = \mathcal{L}\int_a^b f(x)\,dx
\]
then the function f is said to be Riemann-integrable on [a, b], and the Riemann integral of f on [a, b] is defined to be the common value of $\mathcal{U}\int_a^b f(x)\,dx$ and $\mathcal{L}\int_a^b f(x)\,dx$.
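As a worked illustration of these definitions (an added example, not taken from the original notes), consider f(x) = x on [0, 1] with the uniform partition $P_n = \{0, 1/n, 2/n, \ldots, 1\}$. The last line of the computation assumes the standard fact that the lower Riemann integral never exceeds the upper Riemann integral.

```latex
\[
  m_i = \frac{i-1}{n}, \qquad M_i = \frac{i}{n}, \qquad x_i - x_{i-1} = \frac{1}{n},
\]
\[
  L(P_n, f) = \sum_{i=1}^{n} \frac{i-1}{n^2} = \frac{n-1}{2n}, \qquad
  U(P_n, f) = \sum_{i=1}^{n} \frac{i}{n^2} = \frac{n+1}{2n},
\]
\[
  \frac{n-1}{2n} \le \mathcal{L}\!\int_0^1 x\,dx \le \mathcal{U}\!\int_0^1 x\,dx \le \frac{n+1}{2n}
  \quad \text{for all positive integers } n.
\]
```

Letting n → +∞ shows that the lower and upper Riemann integrals are both equal to 1/2, so that f is Riemann-integrable on [0, 1] with Riemann integral 1/2.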
In developing the theory of the Riemann integral, one makes use of the notion of refinements of partitions. Let P and R be partitions of [a, b], given by $P = \{x_0, x_1, \ldots, x_n\}$ and $R = \{u_0, u_1, \ldots, u_m\}$. We say that the partition R is a refinement of P if P ⊂ R, so that, for each $x_i$ in P, there is some $u_j$ in R with $x_i = u_j$.

Let R be a refinement of some partition P of [a, b]. It is not difficult to show that L(R, f) ≥ L(P, f) and U(R, f) ≤ U(P, f) for any bounded function f: [a, b] → ℝ.

Given any two partitions P and Q of [a, b] there exists a partition R of [a, b] which is a refinement of both P and Q. For example, we can take R = P ∪ Q. Such a partition is said to be a common refinement of the partitions P and Q.

Let P and Q be partitions of [a, b], and let R be a common refinement of P and Q. Then L(P, f) ≤ L(R, f) ≤ U(R, f) ≤ U(Q, f). Thus, on taking the supremum of the left hand side of the inequality L(P, f) ≤ U(Q, f) as P ranges over all possible partitions of the interval [a, b], we see that $\mathcal{L}\int_a^b f(x)\,dx \le U(Q, f)$ for all partitions Q of [a, b]. But then, taking the infimum of the right hand side of this inequality as Q ranges over all possible partitions of [a, b], we see that $\mathcal{L}\int_a^b f(x)\,dx \le \mathcal{U}\int_a^b f(x)\,dx$.

We now show that, if a bounded measurable function on a bounded interval is Riemann-integrable, then the value of the Riemann integral coincides with the value obtained on integrating the function with respect to Lebesgue measure on the real line, in accordance with the theory developed by Lebesgue.

Let f: [a, b] → ℝ be a bounded measurable function on a closed bounded interval [a, b], and let P be a partition of [a, b]. Then the values of the lower sum L(P, f) and the upper sum U(P, f) are given by the formulae
\[
L(P, f) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}), \qquad U(P, f) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}),
\]
where $m_i = \inf\{f(x) : x_{i-1} \le x \le x_i\}$ and $M_i = \sup\{f(x) : x_{i-1} \le x \le x_i\}$. Let s: [a, b] → ℝ be the function defined such that $s(a) = m_1$, $s(b) = m_n$, $s(x) = m_i$ when $x_{i-1} < x < x_i$ for some integer i between 1 and n, and $s(x_i) = \max(m_i, m_{i+1})$ for i = 1, 2, . . . , n − 1. Similarly let t: [a, b] → ℝ be the function defined such that $t(a) = M_1$, $t(b) = M_n$, $t(x) = M_i$ when $x_{i-1} < x < x_i$ for some integer i between 1 and n, and $t(x_i) = \min(M_i, M_{i+1})$ for i = 1, 2, . . . , n − 1.

We regard the interval [a, b] as a measure space, where the measurable sets are the Lebesgue-measurable subsets of [a, b], and the measure on [a, b] is Lebesgue measure µ. Then the functions s and t are measurable simple functions on [a, b]. Moreover the integral of the functions s and t over the one-point set $\{x_i\}$ is zero for i = 0, 1, . . . , n, and therefore the Lebesgue integrals of the functions s and t satisfy
\[
\int_{[a,b]} s\,d\mu = \sum_{i=1}^{n} \int_{(x_{i-1},x_i)} s\,d\mu = \sum_{i=1}^{n} m_i (x_i - x_{i-1}) = L(P, f),
\]
\[
\int_{[a,b]} t\,d\mu = \sum_{i=1}^{n} \int_{(x_{i-1},x_i)} t\,d\mu = \sum_{i=1}^{n} M_i (x_i - x_{i-1}) = U(P, f).
\]
But s(x) ≤ f(x) ≤ t(x) for all x ∈ [a, b], and therefore
\[
\int_{[a,b]} s\,d\mu \le \int_{[a,b]} f\,d\mu \le \int_{[a,b]} t\,d\mu,
\]
where $\int_{[a,b]} f\,d\mu$ denotes the integral of the function f over the interval [a, b], taken with respect to Lebesgue measure on [a, b] in accordance with the theory of Lebesgue. Thus $L(P, f) \le \int_{[a,b]} f\,d\mu \le U(P, f)$ for all partitions P of [a, b]. It follows from the definitions of the lower and upper Riemann integrals that
\[
\mathcal{L}\int_a^b f(x)\,dx \le \int_{[a,b]} f\,d\mu \le \mathcal{U}\int_a^b f(x)\,dx.
\]
Thus, if the function f is Riemann-integrable, the common value of its lower and upper Riemann integrals on the interval [a, b] must coincide with the value of the Lebesgue integral. Thus every measurable Riemann-integrable function on [a, b] is also Lebesgue-integrable, and the two theories of integration give the same value for the integral of such a function over a bounded interval in the real line.

One can also show that any bounded Riemann-integrable function on a bounded interval [a, b] is guaranteed to be measurable on [a, b]. Moreover it can also be shown that a bounded real-valued function f on [a, b] is Riemann-integrable if and only if there exists a subset E of [a, b] such that E has Lebesgue measure zero and f is continuous at each point of [a, b] \ E.
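The chain of inequalities used in this section can be illustrated on a small exact example. The sketch below is an added illustration under stated assumptions: the integrand f(x) = x² on [0, 1] and the partitions P and Q are arbitrary choices, and the helper names are hypothetical. Because this f is increasing, its infimum and supremum on each subinterval are attained at the endpoints, which is what makes the sums computable exactly; this shortcut would not apply to a general bounded function.

```python
from fractions import Fraction


def f(x):
    """Illustrative integrand f(x) = x^2, which is increasing on [0, 1]."""
    return x * x


def lower_sum(partition):
    """L(P, f): for increasing f the infimum on [x_{i-1}, x_i] is f(x_{i-1})."""
    return sum(f(lo) * (hi - lo) for lo, hi in zip(partition, partition[1:]))


def upper_sum(partition):
    """U(P, f): for increasing f the supremum on [x_{i-1}, x_i] is f(x_i)."""
    return sum(f(hi) * (hi - lo) for lo, hi in zip(partition, partition[1:]))


def common_refinement(p, q):
    """The common refinement R = P ∪ Q, listed in increasing order."""
    return sorted(set(p) | set(q))


P = [Fraction(0), Fraction(1, 2), Fraction(1)]
Q = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
R = common_refinement(P, Q)
INTEGRAL = Fraction(1, 3)  # value of the integral of x^2 over [0, 1]

if __name__ == "__main__":
    print(lower_sum(P), lower_sum(R), INTEGRAL, upper_sum(R), upper_sum(Q))
```

Exact rational arithmetic is used so that the comparisons L(P, f) ≤ L(R, f) ≤ U(R, f) ≤ U(Q, f) are not blurred by rounding, and the value 1/3 of the integral is seen to lie between L(R, f) and U(R, f).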
Course 221: Hilary Term 2007
Section 9: Signed Measures and the Radon-Nikodym Theorem
David R. Wilkins
Copyright © David R. Wilkins 1997–2007
Contents
9 Signed Measures and the Radon-Nikodym Theorem
9.1 Signed Measures
9.2 The Hahn Decomposition Theorem
9.3 The Jordan Decomposition of a Signed Measure
9.4 Absolute Continuity
9.5 The Radon-Nikodym Theorem
9
Signed Measures and the Radon-Nikodym Theorem
9.1
Signed Measures
Definition Let X be a set, and let A be a σ-algebra of subsets of X. A signed measure is a function ν: A → ℝ that maps elements of A to real numbers and is countably additive, so that
\[
\nu\Bigl(\bigcup_{E \in \mathcal{E}} E\Bigr) = \sum_{E \in \mathcal{E}} \nu(E)
\]
for any pairwise disjoint countable collection $\mathcal{E}$ of subsets of X satisfying $\mathcal{E}$ ⊂ A.

Thus (non-negative) measures and signed measures are countably additive functions defined on a σ-algebra A of subsets of some set X. Non-negative measures take values in the set [0, +∞] of non-negative extended real numbers, whereas signed measures take values in the field ℝ of real numbers. In particular, if ν is a signed measure on a σ-algebra A of subsets of some set X, and if E ∈ A, then ν(E) never takes on either of the values +∞ or −∞. We shall prove that any signed measure may be represented as the difference of two non-negative measures.

Definition Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on the σ-algebra A. A subset P of X is said to be a positive set if P ∈ A and ν(S) ≥ 0 for all subsets S of P that belong to A. A subset N of X is said to be a negative set if N ∈ A and ν(S) ≤ 0 for all subsets S of N that belong to A.
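The following added example, which is not part of the original notes, may help to separate the condition ν(E) ≥ 0 from the stronger condition that E be a positive set. It assumes that X = [−1, 2], that A consists of the Lebesgue-measurable subsets of X, and that ν is obtained by integrating the function x with respect to Lebesgue measure µ.

```latex
\[
  \nu(E) = \int_E x \, d\mu(x) \qquad (E \subset [-1, 2] \text{ Lebesgue-measurable}),
\]
\[
  \nu([-1, 2]) = \int_{-1}^{2} x \, dx = \tfrac{3}{2} > 0, \qquad
  \nu([-1, 0]) = \int_{-1}^{0} x \, dx = -\tfrac{1}{2} < 0 .
\]
```

Thus ν([−1, 2]) > 0, yet [−1, 2] is not a positive set, because it has the measurable subset [−1, 0] on which ν is negative. In this example [0, 2] is a positive set and [−1, 0] is a negative set, since the integrand has a fixed sign on each.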
9.2
The Hahn Decomposition Theorem
Proposition 9.1 Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on A. Suppose that ν(N) = 0 for all negative subsets N of X. Then ν(S) ≥ 0 for all S ∈ A.

Proof We define the measurable subsets of X to be those that belong to the σ-algebra A. Suppose that there existed a measurable subset S of X for which ν(S) < 0. Then S could not be a negative set, and therefore there would exist some measurable subset F of S for which ν(F) > 0. There would then exist some positive integer $k_1$ and a measurable subset $F_1$ of S such that $\nu(F_1) \ge 1/k_1$, and such that $k_1$ is the smallest positive integer for which there exists a measurable subset $F_1$ of S satisfying $\nu(F_1) \ge 1/k_1$. Let $S_2 = S \setminus F_1$. Then $\nu(S_2) = \nu(S) - \nu(F_1) < 0$. It would therefore follow that there would exist some positive integer $k_2$ and a measurable subset $F_2$ of $S_2$ such that $\nu(F_2) \ge 1/k_2$, and such that $k_2$ is the smallest positive integer for which there exists a measurable subset $F_2$ of $S_2$ satisfying $\nu(F_2) \ge 1/k_2$. Then $F_1$ and $F_2$ would be disjoint subsets of S.

We could continue in this fashion to obtain an infinite sequence of positive integers $k_1, k_2, k_3, \ldots$ and an infinite sequence $F_1, F_2, F_3, \ldots$ of pairwise disjoint measurable subsets of S such that $\nu(F_m) \ge 1/k_m$ for all positive integers m, and such that, for each positive integer m, the positive integer $k_m$ is the smallest positive integer for which there exists a corresponding measurable subset $F_m$ of $S \setminus \bigcup_{j=1}^{m-1} F_j$ satisfying $\nu(F_m) \ge 1/k_m$.
Indeed suppose that positive integers $k_1, k_2, \ldots, k_{m-1}$ and pairwise disjoint measurable subsets $F_1, F_2, \ldots, F_{m-1}$ have been found with the required properties. Let $S_m = S \setminus \bigcup_{j=1}^{m-1} F_j$. Then $\nu(S_m) = \nu(S) - \sum_{j=1}^{m-1} \nu(F_j) < 0$. But then $S_m$ would not be a negative set, as X contains no negative sets N satisfying ν(N) < 0. Therefore there would exist a measurable subset G of $S_m$ satisfying ν(G) > 0, and therefore there would exist a positive integer $k_m$ and a measurable subset $F_m$ of $S_m$ such that $\nu(F_m) \ge 1/k_m$, and such that $k_m$ is the smallest positive integer for which there exists a measurable subset $F_m$ of $S_m$ satisfying $\nu(F_m) \ge 1/k_m$.

We see therefore that if S were a measurable subset of X satisfying ν(S) < 0 then there would exist an infinite sequence $k_1, k_2, k_3, \ldots$ of positive integers and an infinite sequence $F_1, F_2, F_3, \ldots$ of pairwise disjoint measurable subsets of S such that $\nu(F_m) \ge 1/k_m$ for all positive integers m, and such that, for each positive integer m, the positive integer $k_m$ is the smallest positive integer for which there exists a measurable subset $F_m$ of $S \setminus \bigcup_{j=1}^{m-1} F_j$ satisfying $\nu(F_m) \ge 1/k_m$. Clearly
$k_m \le k_{m+1}$ for all positive integers m. Let $F = \bigcup_{m=1}^{+\infty} F_m$. The countable additivity of the signed measure ν would ensure that
\[
\sum_{m=1}^{+\infty} 1/k_m \le \sum_{m=1}^{+\infty} \nu(F_m) = \nu(F) < +\infty.
\]
Therefore $\lim_{m\to+\infty} k_m^{-1} = 0$, and thus $\lim_{m\to+\infty} k_m = +\infty$. Also ν(S \ F) = ν(S) − ν(F) ≤ ν(S) < 0, and moreover if G were a measurable subset of S \ F then $\nu(G) < 1/(k_m - 1)$ for all positive integers m for which $k_m > 1$, and therefore ν(G) ≤ 0. Thus S \ F would be a negative subset of X, and ν(S \ F) < 0. But X contains no negative set N satisfying ν(N) < 0. Thus
the existence of a measurable subset S of X satisfying ν(S) < 0 would lead to a contradiction. We conclude therefore that ν(S) ≥ 0 for all measurable subsets S of X, as required.

Theorem 9.2 (Hahn Decomposition Theorem) Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on the σ-algebra A. Then there exist subsets N and P of X such that N is a negative set, P is a positive set, X = N ∪ P and N ∩ P = ∅.

Proof We define the measurable subsets of X to be those that belong to the σ-algebra A. The empty set is a negative set, and therefore the set X has negative subsets. Let the extended real number α be the infimum (or greatest lower bound) of the values of ν(Z) for all negative subsets Z of X, let $N_1, N_2, N_3, \ldots$ be an infinite sequence of negative sets in X with the property that $\nu(N_j) \to \alpha$ in [−∞, 0] as j → +∞, and let $N = \bigcup_{j=1}^{+\infty} N_j$. Let S be a measurable subset of N, let $S_1 = S \cap N_1$ and let $S_m = (S \cap N_m) \setminus \bigcup_{j=1}^{m-1} N_j$ for all integers m satisfying m > 1. Then the sets $S_1, S_2, S_3, \ldots$ are pairwise disjoint, $S_m \subset N_m$ for all positive integers m, and $\bigcup_{m=1}^{+\infty} S_m = S$. It follows from the countable additivity of the signed measure ν that $\nu(S) = \sum_{m=1}^{+\infty} \nu(S_m)$. Also $\nu(S_m) \le 0$ for all positive integers m, as $S_m$ is a measurable subset of the negative set $N_m$. It follows that ν(S) ≤ 0. Thus the set N is a negative set. It follows from this that $\nu(N \setminus N_m) \le 0$, and therefore that $\nu(N) = \nu(N \setminus N_m) + \nu(N_m) \le \nu(N_m)$ for all positive integers m. But then $\nu(N) \le \lim_{m\to+\infty} \nu(N_m) = \alpha$, and therefore ν(N) = α, because N is itself a negative set and so ν(N) ≥ α. In particular α > −∞,
as the values of the signed measure ν are by definition real numbers.

Let P = X \ N. If the set P were to contain a negative set S satisfying ν(S) < 0 then N ∪ S would be a negative set in X satisfying ν(N ∪ S) < α. But this is impossible, as α is by definition the infimum of the values ν(Z) as Z ranges over all negative subsets of X. Therefore P cannot contain any negative set S satisfying ν(S) < 0. We may therefore apply Proposition 9.1 to the restriction of the signed measure ν to the measurable subsets of P, concluding that ν(S) ≥ 0 for all measurable subsets S of P. Thus P is a positive set. We see therefore that N is a negative set, P is a positive set, X = N ∪ P and N ∩ P = ∅, as required.

Definition Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on the σ-algebra A. A Hahn decomposition of X with respect to the signed measure ν is a pair (N, P) of subsets of X such that N is a negative set for the signed measure ν, P is a positive set for ν, X = N ∪ P and N ∩ P = ∅.

The Hahn Decomposition Theorem thus guarantees the existence of a Hahn decomposition for any signed measure.

Lemma 9.3 Let X be a set, let A be a σ-algebra of subsets of X, let ν be a signed measure defined on the σ-algebra A, and let $(N_1, P_1)$ and $(N_2, P_2)$ be Hahn decompositions of X with respect to the signed measure ν, where the sets $N_1$ and $N_2$ are negative, and the sets $P_1$ and $P_2$ are positive. Then $\nu(S \cap N_1) = \nu(S \cap N_2)$ and $\nu(S \cap P_1) = \nu(S \cap P_2)$ for all S ∈ A.

Proof Let S ∈ A. Then $S \cap N_1 = (S \cap N_1 \cap N_2) \cup (S \cap N_1 \cap P_2)$. Now $S \cap N_1 \cap P_2 \subset N_1$, and therefore $\nu(S \cap N_1 \cap P_2) \le 0$. Also $S \cap N_1 \cap P_2 \subset P_2$, and therefore $\nu(S \cap N_1 \cap P_2) \ge 0$. It follows that $\nu(S \cap N_1 \cap P_2) = 0$, and therefore
\[
\nu(S \cap N_1) = \nu(S \cap N_1 \cap N_2) + \nu(S \cap N_1 \cap P_2) = \nu(S \cap N_1 \cap N_2).
\]
Similarly $\nu(S \cap N_2) = \nu(S \cap N_1 \cap N_2)$. It follows that $\nu(S \cap N_1) = \nu(S \cap N_2)$. Moreover
\[
\nu(S \cap P_1) = \nu(S) - \nu(S \cap N_1) = \nu(S) - \nu(S \cap N_2) = \nu(S \cap P_2),
\]
as required.
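On a finite set, where every subset is measurable, the definitions and Lemma 9.3 can be checked by brute force. The following sketch is an added illustration only: the point weights in `WEIGHTS` are hypothetical, and the signed measure is assumed to be the sum of the weights of the points of a set. The point of weight zero may be placed in either half, which gives two different Hahn decompositions.

```python
from itertools import chain, combinations

# Hypothetical point weights: nu(E) is the sum of the weights of the points of E.
WEIGHTS = {"a": 2, "b": -3, "c": 0, "d": 1}
X = frozenset(WEIGHTS)


def nu(subset):
    """Signed measure of a subset of X."""
    return sum(WEIGHTS[x] for x in subset)


def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    combos = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in combos]


def is_positive_set(p):
    """True if nu(S) >= 0 for every subset S of p."""
    return all(nu(s) >= 0 for s in subsets(p))


def is_negative_set(n):
    """True if nu(S) <= 0 for every subset S of n."""
    return all(nu(s) <= 0 for s in subsets(n))


# Two Hahn decompositions, differing only in where the weight-zero point "c" goes.
P1 = frozenset(x for x in X if WEIGHTS[x] >= 0)
N1 = X - P1
P2 = frozenset(x for x in X if WEIGHTS[x] > 0)
N2 = X - P2


def decompositions_agree():
    """Check the conclusion of Lemma 9.3 for every subset S of X."""
    return all(nu(s & N1) == nu(s & N2) and nu(s & P1) == nu(s & P2)
               for s in subsets(X))
```

Although the pairs (N1, P1) and (N2, P2) differ as sets, `decompositions_agree()` confirms that they assign the same values to ν(S ∩ N) and ν(S ∩ P) for every subset S in this example.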
9.3
The Jordan Decomposition of a Signed Measure
Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on the σ-algebra A. Then there exists a Hahn decomposition of X as the disjoint union of a negative set N and a positive set P. Let $\nu_+(S) = \nu(S \cap P)$ and $\nu_-(S) = -\nu(S \cap N)$ for all S ∈ A. Then $\nu_+$ and $\nu_-$ are non-negative countably additive functions defined on A and are thus (non-negative) measures on X. Moreover
\[
\nu(S) = \nu(S \cap P) + \nu(S \cap N) = \nu_+(S) - \nu_-(S)
\]
for all subsets S of X that belong to A. Moreover it follows from Lemma 9.3 that the values of $\nu_+(S)$ and $\nu_-(S)$ are determined by the signed measure ν and the measurable set S, and do not depend on the choice of the Hahn decomposition (N, P). It follows that every signed measure ν defined on A may be expressed as the difference $\nu_+ - \nu_-$ of two (non-negative) measures defined on A.
Definition Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on the σ-algebra A. The Jordan decomposition of ν on X is the representation of ν as the difference $\nu_+ - \nu_-$ of two (non-negative) measures $\nu_+$ and $\nu_-$ defined on A, where, for each S ∈ A, the values $\nu_+(S)$ and $\nu_-(S)$ of the measures $\nu_+$ and $\nu_-$ on S are characterized by the property that $\nu_+(S) = \nu(S \cap P)$ and $\nu_-(S) = -\nu(S \cap N)$ for any Hahn decomposition of X as the disjoint union of a negative set N and a positive set P.

Definition Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on the σ-algebra A. The total variation of ν is the (non-negative) measure |ν| defined on A such that $|\nu|(E) = \nu_+(E) + \nu_-(E)$ for all E ∈ A.

Let X be a set, let A be a σ-algebra of subsets of X, and let ν be a signed measure defined on the σ-algebra A. Then $\nu_+ = \tfrac{1}{2}(|\nu| + \nu)$ and $\nu_- = \tfrac{1}{2}(|\nu| - \nu)$.
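The Jordan decomposition and the total variation can be computed directly on a finite set. The sketch below is an added illustration under the assumption that the signed measure is given by hypothetical point weights (`WEIGHTS`), with the positive set taken to be the points of non-negative weight; the function names are illustrative.

```python
from itertools import chain, combinations

# Hypothetical point weights: nu(E) is the sum of the weights of the points of E.
WEIGHTS = {"a": 2, "b": -3, "c": 0, "d": 1}
X = frozenset(WEIGHTS)
P = frozenset(x for x in X if WEIGHTS[x] >= 0)  # a positive set
N = X - P                                       # the complementary negative set


def nu(e):
    """Signed measure of a subset e of X."""
    return sum(WEIGHTS[x] for x in e)


def nu_plus(e):
    """Positive part: nu_+(E) = nu(E ∩ P)."""
    return nu(e & P)


def nu_minus(e):
    """Negative part: nu_-(E) = -nu(E ∩ N)."""
    return -nu(e & N)


def total_variation(e):
    """|nu|(E) = nu_+(E) + nu_-(E)."""
    return nu_plus(e) + nu_minus(e)


def all_subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    combos = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in combos]


if __name__ == "__main__":
    E = frozenset({"a", "b"})
    print(nu(E), nu_plus(E), nu_minus(E), total_variation(E))
```

The identities for the positive and negative parts are checked here in the integer form 2ν₊(E) = |ν|(E) + ν(E) and 2ν₋(E) = |ν|(E) − ν(E), which avoids any division.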
9.4
Absolute Continuity
Definition Let X be a set, let A be a σ-algebra of subsets of X, let µ be a (non-negative) measure defined on A, and let ν be a measure or signed measure defined on A. The measure ν is said to be absolutely continuous with respect to the measure µ if ν(E) = 0 for all E ∈ A satisfying µ(E) = 0. If ν is absolutely continuous with respect to the measure µ then we denote this fact by writing ν