… $\{f_\lambda : \lambda \in \Lambda\}$ be a collection of real-valued functions on a set $X$. Then a gauge can be defined by $D = \{d_\lambda : \lambda \in \Lambda\}$, where $d_\lambda(x, y) = |f_\lambda(x) - f_\lambda(y)|$.
This gauge is separating if and only if the collection separates points of $X$ (in the sense of 2.6).

b. A gauge $D$ on a set $X$ is separating in the sense of 2.11 if and only if the collection of functions $\{f_{x,d} : x \in X,\ d \in D\}$ defined by $f_{x,d}(y) = d(x, y)$ is a separating collection in the sense of 2.6.
CARDINALITY

2.16. The cardinality of a finite set is the number of distinct elements in that set; thus it is a nonnegative integer. The "cardinality" of a not-necessarily-finite set is a bit harder to define; we shall postpone that concept until 6.23. However, it is much easier to define the comparison of cardinalities of two sets. This notion is due to Georg Cantor and is the foundation of modern set theory.

We say that two sets $X$ and $Y$ have the same cardinality, written $\operatorname{card}(X) = \operatorname{card}(Y)$, if there exists a bijection between $X$ and $Y$. More generally, we write $\operatorname{card}(X) \le \operatorname{card}(Y)$ if $X$ has the same cardinality as some subset of $Y$, i.e., if there exists an injection from $X$ into $Y$. Similarly, we write $\operatorname{card}(X) < \operatorname{card}(Y)$ if $X$ and $Y$ satisfy $\operatorname{card}(X) \le \operatorname{card}(Y)$ but do not satisfy $\operatorname{card}(X) = \operatorname{card}(Y)$, i.e., if there exists an injection from $X$ into $Y$ but there does not exist a bijection from $X$ onto $Y$. (Cantor invented these ideas while investigating Fourier series; see 26.48.)

With this convention, we can now restate some of the definitions given in 1.20 and add a few more definitions. A set $S$ is

finite if $\operatorname{card}(S) = \operatorname{card}(\{1, 2, \ldots, n\})$ for some nonnegative integer $n$ (in which case we call $n$ the cardinality of the set and write $\operatorname{card}(S) = n$);
infinite if it is not finite;
cofinite if it is being considered as a subset of some set $X$ and its complement $X \setminus S$ is finite;
countable (or denumerable) if $\operatorname{card}(S) \le \operatorname{card}(\mathbb{N})$;
uncountable if it is not countable;
countably infinite if $\operatorname{card}(S) = \operatorname{card}(\mathbb{N})$.
Caution: Some mathematicians apply the term "countable" or the term "denumerable" only to the sets that have the same cardinality as $\mathbb{N}$. Also, some mathematicians use a slightly different definition of "infinite"; see the remark in 6.27. The cardinality of a set $X$ is sometimes abbreviated $|X|$. Much of our presentation of cardinality is based on Dalen, Doets, and Swart [1978] and Kaplansky [1977].
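For finite sets, the definitions of 2.16 can be checked mechanically. The following is a minimal sketch, not part of the original text: a function is modelled as a Python dict, and the helper names `is_function`, `is_injection`, and `is_bijection` are illustrative assumptions.

```python
def is_function(f, X, Y):
    """True if the dict f assigns to every element of X an element of Y."""
    return set(f) == set(X) and all(f[x] in Y for x in X)


def is_injection(f, X, Y):
    """True if f : X -> Y sends distinct points to distinct points."""
    return is_function(f, X, Y) and len(set(f.values())) == len(set(X))


def is_bijection(f, X, Y):
    """True if f : X -> Y is injective and its range is all of Y."""
    return is_injection(f, X, Y) and set(f.values()) == set(Y)


X = {"a", "b", "c"}
Y = {1, 2, 3, 4}
f = {"a": 1, "b": 2, "c": 4}   # an injection: witnesses card(X) <= card(Y)
g = {"a": 1, "b": 2, "c": 3}   # a bijection onto {1, 2, 3}: card(X) = 3
```

Here `f` shows that card(X) ≤ card(Y), while `g` exhibits X as a finite set of cardinality 3. The check only applies to finite sets; for infinite sets an injection or bijection has to be argued, not enumerated.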
Chapter 2: Functions
2.17. Further remarks. Throughout the mathematical literature, the letter $\sigma$ (a Greek lowercase sigma) is often used to indicate countable sums or unions, e.g., in $\sigma$-ideals, $\sigma$-algebras, $\sigma$-additive measures, $\sigma$-convex sets, $F_\sigma$ sets. Similarly, $\delta$ (delta) is often used to indicate countable products or intersections, e.g., in $G_\delta$ sets. We shall define these terms separately in their appropriate contexts.
2.18. Remarks. It is customary to use the familiar symbol $\le$ for comparison of cardinalities. Do not assume too much on the basis of this notation, however; the comparison of cardinalities is not quite like the comparison of real numbers. Some familiar properties of real numbers are also valid for cardinalities, and some are not. For instance, it is quite easy to prove that for any sets $X, Y, Z$ we have

\[
\operatorname{card}(X) \le \operatorname{card}(Y) \ \text{ and } \ \operatorname{card}(Y) \le \operatorname{card}(Z) \quad\text{imply}\quad \operatorname{card}(X) \le \operatorname{card}(Z).
\]

(The reader should show this now, as an exercise.) It is rather harder to prove that

\[
\operatorname{card}(X) \le \operatorname{card}(Y) \ \text{ and } \ \operatorname{card}(Y) \le \operatorname{card}(X) \quad\text{imply}\quad \operatorname{card}(X) = \operatorname{card}(Y);
\]

that is the content of the Schröder-Bernstein Theorem in 2.19. Thus, comparison of cardinalities is a preordering; comparison of distinct cardinalities is a partial ordering. Still stronger properties about the comparison of cardinalities will be proved in 6.22, but the proof is deeper and also requires that we assume the Axiom of Choice.

2.19. Schröder-Bernstein Theorem. Let $X$ and $Y$ be sets. If there exist injections $e : Y \to X$ and $f : X \to Y$, then there exists a bijection from $X$ onto $Y$. In other words, if $\operatorname{card}(Y) \le \operatorname{card}(X)$ and $\operatorname{card}(X) \le \operatorname{card}(Y)$, then $\operatorname{card}(X) = \operatorname{card}(Y)$.
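The theorem can be illustrated on a concrete pair of sets. The sketch below is an assumed example, not taken from the text: X = {0, 1, 2, ...}, Y = {1, 2, 3, ...} ⊆ X (so the injection e : Y → X is the inclusion), and f(x) = 2x + 1. With C = X \ Y = {0}, the set S is the union of the iterates of f applied to C, and the candidate bijection h applies f on S and the identity elsewhere. The names `in_S` and `h` are hypothetical.

```python
def f(x):
    """An assumed injection from X = {0, 1, 2, ...} into Y = {1, 2, 3, ...}."""
    return 2 * x + 1


def in_S(z):
    """Membership in S = C ∪ f(C) ∪ f(f(C)) ∪ ..., where C = X minus Y = {0}.

    The orbit of 0 under f is 0, 1, 3, 7, 15, ..., i.e. the numbers 2**n - 1,
    so z lies in S exactly when z + 1 is a power of two.
    """
    return ((z + 1) & z) == 0


def h(z):
    """Apply f on S and the identity on the rest of X."""
    return f(z) if in_S(z) else z


# Inspect h on a finite window of X.
window = range(1000)
values = [h(z) for z in window]
```

On this window the values of `h` are pairwise distinct, 0 is never attained, and every element of Y below 1000 is attained, which is what a bijection from X onto Y requires. A finite window is only a sanity check, not a proof.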
Proof. This presentation follows Cox [1968]. We may assume that $Y \subseteq X$ and that we are given an injection $f : X \to Y$. (More precisely, since we have an injection $e : Y \to X$, by relabeling we may identify each point of $Y$ with its image under $e$; see 1.10.) In the following diagram, the big box represents the set $X$.

[Diagram: the set $X$ drawn as a box divided into the pieces $C = X \setminus Y$, $f(C)$, $f^2(C)$, $f^3(C)$, $f^4(C)$, ..., and $X \setminus S$; the map $f$ carries each piece $f^n(C)$ onto the next piece $f^{n+1}(C)$.]

Let $C = X \setminus Y$. Since $f$ is injective and has range contained in $Y$, the sets $C, f(C), f^2(C), f^3(C), \ldots$ are disjoint. (Here $f^n$ is the $n$th iterate of $f$.) Let $S = \bigcup_{n=0}^{\infty} f^n(C)$; note that $f(S) = S \setminus C \subseteq S$. (See the diagram above.) Define a function $h : X \to Y$ by taking $h(z) = f(z)$ when $z \in S$ and $h(z) = z$ when $z \in X \setminus S$. Verify that the function $h$ takes each $f^n(C)$ bijectively to $f^{n+1}(C)$, and hence $h$ is a bijection from $X$ onto $Y$.

2.20. Exercises and examples.
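a. $\operatorname{card}(\emptyset) = 0$ and $\operatorname{card}(\{\emptyset\}) = 1$.

b. $\operatorname{card}(\emptyset) < \operatorname{card}(\{1\}) < \operatorname{card}(\{1, 2\}) < \operatorname{card}(\{1, 2, 3\}) < \cdots < \operatorname{card}(\mathbb{N})$.

c. In $\mathbb{N}$, the subset $\{n : n > 4\}$ is cofinite.

d. The sets $\mathbb{N}$, $\mathbb{N} \cup \{0\}$, $\mathbb{Z}$, and {even positive integers} all have the same cardinality. Hint: See the diagram below, in which entries in the same column correspond to each other.

\[
\begin{array}{lcrrrrrrrl}
\mathbb{N} & = \{ & 1, & 2, & 3, & 4, & 5, & 6, & 7, & \ldots\} \\
\mathbb{N} \cup \{0\} & = \{ & 0, & 1, & 2, & 3, & 4, & 5, & 6, & \ldots\} \\
\mathbb{Z} & = \{ & 0, & 1, & -1, & 2, & -2, & 3, & -3, & \ldots\} \\
\{\text{even positive integers}\} & = \{ & 2, & 4, & 6, & 8, & 10, & 12, & 14, & \ldots\}
\end{array}
\]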
e. Cantor's Theorem on pairs. $\mathbb{N} \times \mathbb{N}$ is countably infinite. Hint: By tracing along the diagonals in the diagram below (each diagonal running from lower left to upper right), we obtain the sequence $(1,1), (2,1), (1,2), (3,1), (2,2), (1,3), (4,1), \ldots$, which is an enumeration of $\mathbb{N} \times \mathbb{N}$.

\[
\begin{array}{ccccc}
(1,1) & (1,2) & (1,3) & (1,4) & (1,5) \\
(2,1) & (2,2) & (2,3) & (2,4) & (2,5) \\
(3,1) & (3,2) & (3,3) & (3,4) & (3,5) \\
(4,1) & (4,2) & (4,3) & (4,4) & (4,5) \\
(5,1) & (5,2) & (5,3) & (5,4) & (5,5)
\end{array}
\]
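f. $\operatorname{card}(\mathbb{Q}) = \operatorname{card}(\mathbb{N})$. Hint: Use the preceding result, together with the Schröder-Bernstein Theorem. Remark. Later we will show $\operatorname{card}(\mathbb{R}) = \operatorname{card}(\mathbb{N}^{\mathbb{N}}) = \operatorname{card}(2^{\mathbb{N}}) > \operatorname{card}(\mathbb{N})$. See 10.44.f.

g. If $\operatorname{card}(X) \ge \operatorname{card}(\mathbb{N})$ and $u$ is any object, then $\operatorname{card}(X \cup \{u\}) = \operatorname{card}(X)$. Hints: This is trivial if $u \in X$. If $u \notin X$, use 2.20.d.

h. If $X$ and $Y$ are finite sets, then $\operatorname{card}(X \times Y) = \operatorname{card}(X)\operatorname{card}(Y)$ and $\operatorname{card}(X^Y) = \operatorname{card}(X)^{\operatorname{card}(Y)}$ (with the conventions $r^0 = 1$ for $r \ge 0$ and $0^r = 0$ for $r > 0$).

i. For any set $X$, we have $\operatorname{card}(X \times X) \ge \operatorname{card}(X)$. Hint: Treat separately the cases of $X = \emptyset$ and $X \ne \emptyset$. Remarks. We have $\operatorname{card}(X \times X) > \operatorname{card}(X)$ when $X$ is a finite set containing more than one element. We have $\operatorname{card}(X \times X) = \operatorname{card}(X)$ when $X$ is empty or a singleton. In 6.22 we shall see that $\operatorname{card}(X \times X) = \operatorname{card}(X)$ when $X$ is an infinite set; however, the proof of that result will require the Axiom of Choice.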
j. If $X$, $Y$, and $Z$ are any sets, then there is a natural bijection between $Z^{X \times Y}$ and $(Z^Y)^X$. Indeed, $f \in Z^{X \times Y}$ means that $(x, y) \mapsto f(x, y)$ is a map from $X \times Y$ into $Z$, while $f \in (Z^Y)^X$ means that $x \mapsto f(x, \cdot)$ is a map from $X$ into $Z^Y$. It is easy to see that this correspondence between $Z^{X \times Y}$ and $(Z^Y)^X$ is no more than a change of notation.

k. The set $\{0, 1\}$ is often called "2." Let $X$ be a set; we can identify each subset $S \subseteq X$ with its characteristic function $1_S : X \to \{0, 1\}$, defined in 2.2.b. Thus there is a natural bijection between the power set of $X$, $\mathcal{P}(X) = \{\text{subsets of } X\}$, and the $X$th power of the set 2, $2^X = \{\text{functions from } X \text{ into } 2\}$. These two objects are often used interchangeably. If $X$ is a finite set, then $\operatorname{card}(\mathcal{P}(X)) = 2^{\operatorname{card}(X)}$ is the number of subsets of $X$. Exercise for beginners. List the eight subsets of $X = \{0, 1, 2\}$. Hint: Don't forget $\emptyset$ and $X$.

l. Theorem (Cantor). $\operatorname{card}(\mathcal{P}(X)) > \operatorname{card}(X)$ for every set $X$. Hints: Easily $\operatorname{card}(\mathcal{P}(X)) \ge \operatorname{card}(X)$. Now suppose that there exists a bijection $f : X \to \mathcal{P}(X)$. Define $R = \{x \in X : x \notin f(x)\}$, and let $r = f^{-1}(R)$. Show that $r \in R \iff r \notin R$, a contradiction. Note: This contradiction is not paradoxical, but it is similar to Russell's Paradox (1.43).

m. Example. $\operatorname{card}(2^{\mathbb{N}}) > \operatorname{card}(\mathbb{N})$.

n. Example. $\operatorname{card}(2^{\mathbb{N}}) = \operatorname{card}(\mathbb{N}^{\mathbb{N}})$. Hints: $\operatorname{card}(\mathbb{N}^{\mathbb{N}}) \le \operatorname{card}((2^{\mathbb{N}})^{\mathbb{N}}) = \operatorname{card}(2^{\mathbb{N} \times \mathbb{N}}) = \operatorname{card}(2^{\mathbb{N}}) \le \operatorname{card}(\mathbb{N}^{\mathbb{N}})$.

2.21. How many kinds of infinity are there? By Cantor's Theorem (in 2.20.l),

\[
\operatorname{card}(\mathbb{N}) < \operatorname{card}(\mathcal{P}(\mathbb{N})) < \operatorname{card}(\mathcal{P}(\mathcal{P}(\mathbb{N}))) < \operatorname{card}(\mathcal{P}(\mathcal{P}(\mathcal{P}(\mathbb{N})))) < \cdots
\]

(where $\mathcal{P}$ denotes the power set). Thus there are infinitely many different kinds of infinity. We can get still more infinities, as follows: Let $S$ be the union of all the sets $\mathbb{N}, \mathcal{P}(\mathbb{N}), \mathcal{P}(\mathcal{P}(\mathbb{N})), \mathcal{P}(\mathcal{P}(\mathcal{P}(\mathbb{N}))), \ldots$. Then $S$ is bigger than any one of those sets. We can go further: We have

\[
\operatorname{card}(S) < \operatorname{card}(\mathcal{P}(S)) < \operatorname{card}(\mathcal{P}(\mathcal{P}(S))) < \operatorname{card}(\mathcal{P}(\mathcal{P}(\mathcal{P}(S)))) < \cdots
\]

and we can continue this process again and again, infinitely many times.

Are there still more infinities? Perhaps there are some even bigger than anything obtained in the "list" suggested above; or perhaps there are some lying between two consecutive elements of that list.

An inaccessible cardinal (also known as a strongly inaccessible cardinal) is, roughly, a set too big to be in the list given above; i.e., it is an uncountable set that is bigger than anything obtainable from smaller sets via power sets and unions. We shall not make this precise; refer to books on set theory and logic (e.g., Shoenfield [1967]) for details. It is not intuitively obvious whether such enormous cardinals exist. Their existence or nonexistence is taken as a hypothesis in some studies in set theory. Surprisingly, such assumptions about enormous sets lead to important conclusions about "ordinary" sets such as $\mathbb{R}$; see 14.75.

In applicable analysis one seldom has any need for infinite cardinalities other than $\operatorname{card}(\mathbb{N})$ or $\operatorname{card}(2^{\mathbb{N}}) = \operatorname{card}(\mathbb{R})$. The Continuum Hypothesis (CH) asserts that there are no other cardinalities between those two. The Generalized Continuum Hypothesis (GCH) asserts that for any infinite set $X$, there are no other cardinalities between $\operatorname{card}(X)$ and $\operatorname{card}(2^X)$. Cantor spent a large part of his last years trying to prove that CH was true or false. The question remained open for decades. Finally, Gödel and Cohen developed new methods to show that neither the truth nor the falsehood of CH can be proved from the usual axioms of set theory; thus CH is independent of those axioms. This is explained briefly in 14.7, 14.8, 14.53, 14.73, and 14.74.
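Cantor's Theorem, on which the chain of inequalities in 2.21 rests, can be checked exhaustively for a small finite set. The sketch below is illustrative only and assumes X = {0, 1, 2}; it runs through every map f from X into its power set and confirms that the diagonal set R = {x ∈ X : x ∉ f(x)} is never a value of f, so no such f is onto. The helper names `power_set` and `diagonal_set` are assumptions, not notation from the text.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def diagonal_set(f, X):
    """Cantor's set R = {x in X : x not in f(x)} for a map f : X -> P(X)."""
    return frozenset(x for x in X if x not in f[x])


X = {0, 1, 2}
subsets = power_set(X)

# Try all 8**3 = 512 maps f : X -> P(X); each is given by its tuple of images.
missed_every_time = all(
    diagonal_set(dict(zip(sorted(X), images)), X) not in images
    for images in product(subsets, repeat=len(X))
)
```

The same diagonal set works for infinite X, but there the argument has to be carried out by proof, since the maps cannot be enumerated.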
INDUCTION AND RECURSION ON THE INTEGERS

2.22. We assume the reader is familiar with the basic properties of the natural numbers $\mathbb{N} = \{1, 2, 3, \ldots\}$. (Caution: Some mathematicians use the symbol $\mathbb{N}$ for the set $\{0, 1, 2, 3, \ldots\}$, but in this book $0 \notin \mathbb{N}$.) Following are two basic principles about the natural numbers. Induction is a method for proving statements about objects that have already been defined; recursion is a method for defining new objects.

Principle of Countable Induction. Suppose $1 \in T \subseteq \mathbb{N}$ and $T$ has the property that whenever $n \in T$ then also $n + 1 \in T$. Then $T = \mathbb{N}$.

This principle can also be formulated as a method for proving that a statement $P(x)$ is true for every $x \in \mathbb{N}$: just take $T = \{x \in \mathbb{N} : P(x)\}$. (Note: To logicians, this reformulation is not quite equivalent. It is usually understood that the statement $P(x)$ must be expressed using finitely many symbols from a language with only countably many symbols, so there are only countably many possible $P$'s, but there are uncountably many sets $T \subseteq \mathbb{N}$.)

For our second principle, we shall agree that the empty sequence (the sequence with no components, or the sequence of length 0) is a finite sequence and hence a member of the domain of $p$.

Principle of Countable Recursion. Let $T$ be a set, and let $p$ be some mapping from {finite sequences in $T$} into $T$. Then there exists a unique sequence $(t_1, t_2, t_3, \ldots)$ in $T$ that satisfies $t_n = p(t_1, t_2, \ldots, t_{n-1})$ for all $n$.

In other words, our definition of $t_n$ may depend on all the preceding definitions. Both of these principles are generalized to sets other than $\mathbb{N}$ in 1.50, 3.39.f, 3.40, and 5.51; they are then referred to as transfinite induction and recursion. For now, we note a few elementary applications of the countable case.
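The Principle of Countable Recursion can be pictured as a generator that feeds all earlier terms to the rule p. The following is a minimal sketch under assumptions of our own: the names `recursive_sequence` and `fib_rule` are hypothetical, and the Fibonacci-style rule is just one possible choice of p.

```python
from itertools import islice


def recursive_sequence(p):
    """Yield t_1, t_2, ... where t_n = p((t_1, ..., t_{n-1})).

    The rule p receives the tuple of all earlier terms; for n = 1 this is
    the empty tuple, matching the convention about the empty sequence.
    """
    terms = []
    while True:
        t = p(tuple(terms))
        terms.append(t)
        yield t


def fib_rule(prev):
    """Assumed example rule: two initial 1's, then the sum of the last two terms."""
    return 1 if len(prev) < 2 else prev[-1] + prev[-2]


first_terms = list(islice(recursive_sequence(fib_rule), 8))
```

Because p sees the whole tuple of earlier terms, the same generator also covers rules that depend on every preceding term, for example `lambda prev: 1 + sum(prev)`.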
2.23. Examples in countable induction and recursion.

a. Factorials are defined recursively: $0! = 1$, and $(n + 1)! = (n + 1) \cdot (n!)$ for $n = 0, 1, 2, 3, \ldots$. (We read "$n!$" as "$n$ factorial.") The first few factorials are $0! = 1$, $1! = 1$, $2! = 2$, $3! = 6$, and $4! = 24$.

b. The binomial coefficient $\binom{n}{k}$ (read "$n$ choose $k$") can be defined directly by a formula:

\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \qquad (n = 0, 1, 2, 3, \ldots;\ k = 0, 1, 2, \ldots, n)
\]

or it can be defined by recursion on $n$: we take $\binom{n}{0} = \binom{n}{n} = 1$ for $n = 0, 1, 2, 3, \ldots$, and then

\[
\binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1} \qquad (0 \le k < n).
\]

Show by induction that the two methods of defining $\binom{n}{k}$ yield the same values. Also, using the second method, show that the $\binom{n}{k}$'s are the numbers in Pascal's Triangle.

Pascal's Triangle: Each number is the sum of the two numbers above it.

\[
\begin{array}{ccccccccc}
 & & & & 1 & & & & \\
 & & & 1 & & 1 & & & \\
 & & 1 & & 2 & & 1 & & \\
 & 1 & & 3 & & 3 & & 1 & \\
1 & & 4 & & 6 & & 4 & & 1
\end{array}
\]

By convention, we define $\binom{n}{k} = 0$ when $n \ge 0$ and $k \in \mathbb{Z} \setminus \{0, 1, 2, \ldots, n\}$.

c. By induction on $n$, prove the Binomial Theorem:

\[
(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \qquad (n = 1, 2, 3, \ldots).
\]

An example is $(x + y)^4 = y^4 + 4xy^3 + 6x^2y^2 + 4x^3y + x^4$.

d. A prime number is an integer greater than 1 that is not divisible by any positive integer except itself and 1. The first few prime numbers are $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7$, and $p_5 = 11$. The following induction argument proves that there are infinitely many prime numbers and also gives us a crude but easy upper bound on $p_n$. Assume that $p_1, p_2, \ldots, p_n$ have already been found, for some positive integer $n$. Then $q = p_1 p_2 \cdots p_n + 1$ is greater than $p_n$, and it is not divisible by any of $p_1, p_2, \ldots, p_n$. Hence either $q$ is a new prime, or it is divisible by a new prime. In any case, $p_{n+1} \le q \le 2 p_1 p_2 \cdots p_n$. Use induction to show that $p_n \le 2^{2^n}$.

e. Joke. Every positive integer has some remarkably interesting property. "Proof." If not, let $n_0$ be the first uninteresting number. Then $n_0$ has the property that it is the first uninteresting number; but isn't that an interesting property? Exercise. Carefully explain what has gone wrong here. Hint: See 1.11.
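Two of the exercises in 2.23 lend themselves to a numerical sanity check. The sketch below is illustrative and proves nothing: it compares the factorial formula for the binomial coefficients with the rows produced by the recursion on n, and it tests the bound on the nth prime for the first ten primes. The function names are our own assumptions.

```python
from math import factorial


def binom_formula(n, k):
    """Binomial coefficient from the factorial formula n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's Triangle, built by recursion on n."""
    rows = [[1]]
    for n in range(n_max):
        prev = rows[-1]
        # The ends are 1; entry k+1 of the new row is prev[k] + prev[k+1], 0 <= k < n.
        rows.append([1] + [prev[k] + prev[k + 1] for k in range(n)] + [1])
    return rows


def first_primes(count):
    """The first `count` primes, found by trial division by earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


rows = pascal_rows(10)
definitions_agree = all(
    rows[n][k] == binom_formula(n, k) for n in range(11) for k in range(n + 1)
)

primes = first_primes(10)
bound_holds = all(p <= 2 ** (2 ** n) for n, p in enumerate(primes, start=1))
```

The bound 2^(2^n) is extremely generous (the tenth prime is 29, while 2^(2^10) has over 300 digits), which is why the text calls it crude.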
Chapter 3
Relations and Orderings
3.1. Preview. The chart below shows the connections between some kinds of preorders that we shall study in this and later chapters. Lattices and order completeness are studied in greater detail in Chapter 4; directed orderings are studied further in Chapter 7; Boolean algebras and Heyting algebras are covered in Chapter 13.

[Chart: connections among kinds of preordered sets. Legible labels include preordered set, directed set, lattice, Dedekind complete poset, complete lattice, distributive lattice, infinitely distributive lattice, and Heyting algebra.]
… $\ge$, $>$, respectively. Note that $(R^{-1})^{-1} = R$. If $\preccurlyeq$ and $\succcurlyeq$ are relations that are inverses of each other, then there exists a duality between $\preccurlyeq$ and $\succcurlyeq$; each statement about either of these relations can be converted to a statement about the other relation. See 1.7.

e. The composition of any two relations $Q$ and $R$ on a set $X$ is the relation defined by

\[
Q \circ R = \{(x, y) \in X \times X : x R u \text{ and } u Q y \text{ for at least one } u \in X\}.
\]

This definition generalizes that in 2.3; i.e., if $Q$ and $R$ are in fact functions, then the composition defined in this fashion is the same as the composition defined in 2.3. Exercise. Verify that the compositions of relations satisfy $(P \circ Q) \circ R = P \circ (Q \circ R)$.

f. If $R$ is a relation on $X$ and $Y \subseteq X$, then the restriction of $R$ to $Y$ (or trace of $R$ on $Y$) is the relation $R|_Y$ defined by

\[
u \; R|_Y \; v \quad\text{if and only if}\quad u, v \in Y \text{ and } u R v.
\]

In other words, $\operatorname{Graph}(R|_Y) = \operatorname{Graph}(R) \cap (Y \times Y)$. By a slight abuse of notation, we often denote $R|_Y$ simply by the same symbol $R$; for example, a restriction of any of the relations $=$, $\ne$, $\le$, …
Let $\approx$ be an equivalence relation on a set $X$; let $Q$ be the resulting quotient set and let $\pi : X \to Q$ be the quotient mapping. A function $f$ defined on $X$ is said to respect the equivalence $\approx$ if the value of $f(x)$ is unchanged when $x$ is replaced by an equivalent element of $X$; that is, if $x_1 \approx x_2 \implies f(x_1) = f(x_2)$. Another way to say this is that each set of the form $f^{-1}(z)$ is a union of equivalence classes. Similarly, a relation $R$ on $X$ is said to respect the equivalence $\approx$ if the validity of the statement $u R v$ is unaffected when $u, v$ are replaced by equivalent elements of $X$; that is, if

\[
u \approx u', \quad v \approx v', \quad u R v \quad\implies\quad u' R v'.
\]

Show:

a. Let $f : X \to Y$ be some function. We can define a corresponding function $\hat{f} : Q \to Y$ by the rule $\hat{f}(\pi(x)) = f(x)$ if and only if $f$ respects $\approx$. We then say that the function $\hat{f}$ is well defined. The hat over the $f$ is sometimes omitted; if no confusion will result, we sometimes use the same symbol $f$ again for the new function defined on $Q$.

b. Let $R$ be some relation on $X$. We can define a corresponding relation $\hat{R}$ on $Q$ by the rule $u R v \iff \pi(u) \, \hat{R} \, \pi(v)$ if and only if $R$ respects the equivalence relation $\approx$. We then say that the relation $\hat{R}$ is well defined. The hat over the $R$ is sometimes omitted: If no confusion will result, we sometimes use the same symbol $R$ again for the new relation defined on $Q$.

c. Example. Let $\preccurlyeq$ be a preordering on a set $X$, and define $x \approx y$ to mean that $x \preccurlyeq y$ and $y \preccurlyeq x$. Show that $\approx$ is an equivalence relation on $X$, and that $\preccurlyeq$ respects this equivalence relation. Show that the resulting relation $\hat{\preccurlyeq}$ is a partial ordering on the quotient set $Q$.

d. Let $(X, d)$ be a pseudometric space (defined in 2.11). An equivalence relation $\approx$ on $X$ can be defined by: $x \approx y$ if and only if $d(x, y) = 0$. Then $d$ acts as a metric on the quotient space $X/{\approx}$. More precisely, let $\pi : X \to X/{\approx}$ be the quotient map; then a metric $D$ on $X/{\approx}$ can be defined by $D(\pi(x), \pi(y)) = d(x, y)$. More generally, let $(X, D)$ be a gauge space. Define an equivalence relation on $X$ by: $x \approx y$ if and only if $d(x, y) = 0$ for all pseudometrics $d \in D$. Then $D$ acts as a separating gauge on the quotient space $X/{\approx}$.
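The passage from a preordering to a partial ordering on the quotient set (exercise c above) can be traced on a small finite example. The sketch below is an assumed illustration, not part of the text: X is the set of integers from -3 to 3, preordered by comparing absolute values, so that x and -x become equivalent. The names `preceq`, `pi`, `respects`, and `preceq_hat` are hypothetical.

```python
X = range(-3, 4)


def preceq(x, y):
    """An assumed preorder on X: compare absolute values."""
    return abs(x) <= abs(y)


def equivalent(x, y):
    """x ~ y means x precedes y and y precedes x."""
    return preceq(x, y) and preceq(y, x)


def pi(x):
    """Quotient map: send x to its equivalence class."""
    return frozenset(y for y in X if equivalent(x, y))


Q = {pi(x) for x in X}


def respects(rel):
    """Check that u ~ u', v ~ v', u rel v together imply u' rel v'."""
    return all(
        rel(u2, v2)
        for u in X for v in X if rel(u, v)
        for u2 in pi(u) for v2 in pi(v)
    )


def preceq_hat(A, B):
    """Induced relation on Q, evaluated on arbitrary representatives."""
    return preceq(next(iter(A)), next(iter(B)))


antisymmetric = all(
    A == B for A in Q for B in Q if preceq_hat(A, B) and preceq_hat(B, A)
)
```

Evaluating `preceq_hat` on arbitrary representatives is legitimate only because `respects(preceq)` holds; the usual ordering of the integers, by contrast, does not respect this equivalence (for instance -1 ≤ 0 but not 1 ≤ 0), so it does not pass to the quotient.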
3.13. The term "equivalent" also has some common uses that are implicit in our mathematical language: Two words, phrases, or definitions are equivalent if they have the same meaning. This is an equivalence relation on the set of all words, phrases, or definitions in our vocabulary. Similarly, two statements are equivalent if each implies the other via some set of rules of inference. This is an equivalence relation on the set of all statements that can be expressed in our mathematical language. Since different rules of inference may be used, there are actually several meanings for "equivalent statements." Here are two main interpretations:

• Many mathematicians call two statements "equivalent" if each implies the other easily, i.e., by a fairly short and elementary proof. Of course, "elementary" is a subjective term here; what is elementary for one mathematician may not be elementary for another. Most mathematicians do not make any restriction on the use of the Axiom of Choice; it may be used freely as a "rule of inference." An example: The mathematical literature sometimes refers to Caristi's Fixed Point Theorem 19.45 and Brøndsted's Maximal Principle ((DC4) in 19.51) as "equivalent" because each implies the other easily; see 19.51. Strictly speaking, the relation "each implies the other easily" is not really an equivalence relation, for it is not transitive: If

\[
(A1) \Leftrightarrow (A2), \quad (A2) \Leftrightarrow (A3), \quad \ldots, \quad (A99) \Leftrightarrow (A100)
\]

by 99 easy proofs, then $(A1) \Leftrightarrow (A100)$ by a proof that is not necessarily easy.

• Logicians sometimes give the Axiom of Choice special status and treat it as a statement rather than as a rule of inference. When this system is followed, then the Axiom of Choice or its consequences can only be used when stated explicitly as hypotheses. This system, which will be followed in parts of this book, enables us to trace the effects of the Axiom of Choice. For emphasis, statements equivalent in this sense are sometimes called effectively equivalent. See 6.18. With this interpretation, Caristi's Fixed Point Theorem and Brøndsted's Maximal Principle are not equivalent; see the discussion in 19.51.
MORE ABOUT POSETS

3.14. Definitions. Recall that a partial order is a relation $\preccurlyeq$ that is reflexive ($x \preccurlyeq x$ for all $x$), transitive ($x \preccurlyeq y$, $y \preccurlyeq z \implies x \preccurlyeq z$), and antisymmetric ($x \preccurlyeq y$ and $y \preccurlyeq x$ imply $x = y$). A set equipped with such an ordering is a partially ordered set, or poset.

Let $(X, \preccurlyeq)$ be a partially ordered set. An order interval in $X$ is a subset of the form

\[
[a, b] = \{x \in X : a \preccurlyeq x \preccurlyeq b\}
\]

for some $a, b \in X$. In $\mathbb{R}$ or $[-\infty, +\infty]$, slightly different terminology is commonly used. An interval is any set of one of the following types:

\[
\begin{aligned}
{}[a, b] &= \{x \in [-\infty, +\infty] : a \le x \le b\}, \\
{}[a, b) &= \{x \in [-\infty, +\infty] : a \le x < b\}, \\
{}(a, b] &= \{x \in [-\infty, +\infty] : a < x \le b\}, \\
{}(a, b) &= \{x \in [-\infty, +\infty] : a < x < b\},
\end{aligned}
\]

for any extended real numbers $a, b$. In particular, the extended real line is the interval $[-\infty, +\infty]$ (thus justifying our notation), and the real line $\mathbb{R}$ is the interval $(-\infty, +\infty)$. Two other important sets are $[0, +\infty) = \{x \in \mathbb{R} : x \ge 0\}$ and $[0, +\infty] = \{x \in \mathbb{R} : x \ge 0\} \cup \{+\infty\}$. An interval of the form $[a, b]$ is sometimes called a closed interval; an interval of the form $(a, b)$ is an open interval. This terminology reflects the topological structure of $\mathbb{R}$ or $[-\infty, +\infty]$, introduced in 5.15.f.

3.15. Let $X$ be a poset. A set $S \subseteq X$ is order bounded if it is contained in an order interval. It is simply called "bounded" if the context is clear, but be aware that the term "bounded" has other, possibly inequivalent meanings; see 4.40, 23.1, 27.2, and 27.4. Fortunately, all the usual meanings of "bounded" coincide at least for subsets of $\mathbb{R}^n$. Note that any subset of an order bounded set is order bounded.

Although the statement "$S$ is bounded" does not mention the set $X$ explicitly, boundedness of a set $S \subseteq X$ depends very much on the choice of $X$. For instance, $\mathbb{Z}$ is unbounded when considered as a subset of $\mathbb{R}$ (with its usual ordering), but $\mathbb{Z}$ is bounded when considered as a subset of the extended real line $[-\infty, +\infty]$ (introduced in 1.17). In fact, every subset of $[-\infty, +\infty]$ is bounded, since $[-\infty, +\infty]$ itself is bounded.

3.16. Let $(X, \preccurlyeq)$ be a poset. A lower set in $X$ is a set $S \subseteq X$ with the property that

\[
x \preccurlyeq s, \quad x \in X, \quad s \in S \quad\implies\quad x \in S.
\]
Some older books refer to lower sets as initial segments or order ideals.

Special examples and properties.

a. Clearly, $X$ is a lower set in itself. Any other lower set is called a proper lower set.

b. One lower set is the set of predecessors of $w$, defined by $\operatorname{Pre}(w) = \{x \in X : x \prec w\}$. It is proper. It is empty if and only if $w = \min(X)$.

c. The principal lower set determined by any $w \in X$ is the set $\{x \in X : x \preccurlyeq w\}$. It is sometimes denoted by $\downarrow\! w$. It is nonempty. It is improper if and only if $w = \max(X)$. Exercise. A lower set is equal to the union of all the principal lower sets that it contains.

d. The mapping $w \mapsto {\downarrow\! w}$, sending each element to its principal lower set, is an order isomorphism from $(X, \preccurlyeq)$ onto a subset of the poset $(\mathcal{P}(X), \subseteq)$. Thus any poset can be represented isomorphically in the form $(\mathcal{C}, \subseteq)$ for some collection $\mathcal{C}$ of sets.

Lower sets are discussed further in 4.4.b.

3.17. Let $(X, \preccurlyeq)$ and $(Y, \sqsubseteq)$ be partially ordered sets. A mapping $p : X \to Y$ is

increasing (isotone, order-preserving) if $x_1 \preccurlyeq x_2 \implies p(x_1) \sqsubseteq p(x_2)$;
decreasing (antitone, order-reversing) if $x_1 \preccurlyeq x_2 \implies p(x_1) \sqsupseteq p(x_2)$;
monotone if it is increasing or decreasing;
strictly increasing or decreasing or monotone if it is injective and (respectively) increasing or decreasing or monotone;
an order isomorphism if it is a bijection from $X$ onto $Y$ such that both $p$ and $p^{-1}$ are increasing.

(The terms "isotone" and "antitone" are used especially if $X$ and $Y$ are collections of sets, ordered by inclusion.) The relationships between these kinds of mappings are explored in the next few exercises; a chart below summarizes the results. The chart also includes sup-preserving and inf-preserving, as a preview of notions that will be introduced in 3.22.

[Chart: implications among the properties monotone, increasing, decreasing, strictly monotone, strictly increasing, strictly decreasing, sup-preserving, inf-preserving, and order isomorphism.]

Caution: Some mathematicians use the terms nondecreasing or weakly increasing where we have used the term "increasing"; some of these mathematicians use the term "increasing" where we have used the term "strictly increasing." Analogous terminology is used for decreasing.

3.18. Basic properties and examples.
a. A sequence of real numbers $(r_1, r_2, r_3, \ldots)$ is increasing if $r_1 \le r_2 \le r_3 \le \cdots$.

b. $S \mapsto \complement S$ is an antitone mapping from $(\mathcal{P}(X), \subseteq)$ into itself, for any set $X$.

c. The inverse of an increasing bijection need not be increasing. For instance, let $\le$ be the usual ordering on $\mathbb{Z}$, and let $\preccurlyeq$ be the partial ordering on $\mathbb{Z}$ defined by

\[
x \preccurlyeq y \quad\text{if}\quad y - x \in \{0, 5, 10, 15, 20, 25, \ldots\}.
\]

Then the identity map $x \mapsto x$ is increasing from $(\mathbb{Z}, \preccurlyeq)$ into $(\mathbb{Z}, \le)$, but not from $(\mathbb{Z}, \le)$ into $(\mathbb{Z}, \preccurlyeq)$.

d. Let $f : X \to Y$ be any function. Then the forward image map $f : \mathcal{P}(X) \to \mathcal{P}(Y)$ and the inverse image map $f^{-1} : \mathcal{P}(Y) \to \mathcal{P}(X)$, defined in 2.7 and 2.8, are both order-preserving; that is, …
MAX, SUP, AND OTHER SPECIAL ELEMENTS

Let $(X, \preccurlyeq)$ be a partially ordered set, and let $y, z \in X$ and $S$ …

3.22. …
CHAINS

3.23. Definition. Let $(X, \preccurlyeq)$ be a poset. Then the following conditions are equivalent. If any, hence all, are satisfied, we say that $(X, \preccurlyeq)$ is a chain (or $\preccurlyeq$ is a total order or linear order or chain order).

(A) Any two elements of $X$ are comparable (defined in 3.9.a).
(B) Each two-element subset of $X$ has a first element.
(C) Each two-element subset of $X$ has a last element.
(D) Each nonempty finite subset of $X$ has a first element.
(E) Each nonempty finite subset of $X$ has a last element.
(F) $(X, \preccurlyeq)$ satisfies the Trichotomy Law: for each $x, y \in X$, exactly one of the three conditions

\[
x \prec y, \qquad y \prec x, \qquad x = y
\]

holds. In other words, the sets $\operatorname{Graph}({\prec})$ …
Let any function f � � and any number > 0 be given. Then there exists a set S t:;; � that acts as an approximate fixed point of f, in the following sense: :
•
•
•
----+
E
diam(S) � For each i 1, 2, . . . , n, there is some u E S such that ui - E � f (u ) i . There exists some point E S satisfying 2:::: 7= 1 Vi + 2 2:::: :� 1 f ( k =
v
E
v
Remarks. We emphasize that $f$ is not assumed to be continuous or even measurable. Aside from the domain and codomain, we make no assumption at all about $f$. Thus, these theorems are not really "about" $f$; they are theorems about the combinatorial structure of $\mathbb{R}^n$. An analogous theorem about infinite dimensional vector spaces will be given in 27.19. A similar argument in two dimensions, more geometrical and elementary in presentation, was given by Shashkin [1991]. Theorem 2 of Baillon and Simons [1992] is also very similar. The First Approximate Fixed Point Theorem can be proved by methods similar to those below, using Wolsey's [1977] Cubical Sperner Lemma instead of our 3.36. It would be interesting to know if the First Approximate Fixed Point Theorem can also be proved by some short argument using 3.36; no such argument is presently known to this author.

Proof of the Second Approximate Fixed Point Theorem. By writing $u_{n+1} = 1 - \sum_{j=1}^{n} u_j$, we may rewrite […]
The (n + l)st coordinate will be treated just like the other coordinates in the following argument. Let M be an integer large enough so that 2(n + 1 ) /M < Let X consist of the collection of all points u E � for which all of Mu 1 , A1u2, . . . , A1un, A1un+ 1 are integers. Let P {1, 2, 3, . . . , n, n + 1 }. For 1 � j � n + 1 define a preordering of X by taking u �j when u.i � Let a be a crystal, and let S be its range. If n+l L a ( i) i < 1 - M ' iE Dom ( a) then there exists a member x E X that satisfies a (i ) i < Xi for all i E Dom( a ) , contradicting (CR-2) in 3.32. Thus the inequality above does not hold. E.
=
v
v1 .
-
For any i, j E Dom(a), we have a(i ); :::; a(j);, and therefore n+1 1-� < L a(j); < 1 L a(i); < i EDom(a) iEDom( a) from which it follows that 0 :::; a(j); - a(i); :::; ni[l for each i,j E Dom(a). Therefore ia(j); - a(k);l :::; 2 (n + 1 ) /M whenever i,j, k E Dom(a). Hence i v; l :::; c for all v E S and i E Dom(a). On the other hand, since 2::�:/ a(j); 1, we must have l::i$ Dom( a) a(j); :::; ni[l for j E Dom(a) and, in particular, a(j); :::; nifl for each i t/:. Dom(a). Thus 0 :::; :::; c/2 for all E S and i tJ. Dom(a). Therefore diam(S) :::; c. Define a labeling £ � { 1, 2, . . . , n, n + 1} as follows: let £( ) = i if i is the first coordinate that satisfies :::; f( ) By 3.36 there exists a complete crystal a with respect to that labeling. When i E Dom(a) = C(Ran(a)), then i C( ) for some E S, so :::; f ( );. On the other hand, we noted earlier in this proof that when i tJ. Dom(a) and E S, then :::; c; hence f ( ); 2:: - c. This completes the proof. u; -
=
u
u; u
:
u
u;
u;
--""*
u ;.
u
u,
u;
u
=
u
u
u;
WELL ORDERED SETS

3.38. Definition. Let $(X, \preccurlyeq)$ be a poset. We say $\preccurlyeq$ is a well ordering if each nonempty subset of $X$ has a first element. Then $X$ is a well ordered set, or a woset. Examples. The set $\mathbb{N}$ is well ordered. Also see 3.43 and 5.44. Remark. Well ordered sets are only used infrequently in analysis. This subchapter may be postponed or omitted if the reader is concerned only with the usual topics of analysis.

3.39. Basic properties of wosets.

a. Any woset is a chain.

b. Any subset of a woset is a woset.

c. Let $S$ be a subset of a woset $X$. Then $S$ is a proper lower set in $X$ if and only if $S = \operatorname{Pre}(b)$ for some $b \in X$, with notation as in 3.16.b. Hint for the "only if" part: Let $b$ be the first element of $X \setminus S$.

d. Let $X$ be a woset. Then the lower sets of $X$ form a woset when ordered by $\subseteq$. The last element of that woset is $X$. If $X$ is not empty, then the first element of that woset is $\emptyset = \operatorname{Pre}(\min(X))$, where $\min(X)$ is the first member of $X$.

e. Any woset $X$ is a proper lower set of some larger woset $Y$. Indeed, one way to form such a larger set is by adjoining some new element that is not already present in $X$ and defining it to be larger than all the elements of $X$.

f. Induction on Wosets. Let $(X, \preccurlyeq)$ be a woset, and let $S$ be a subset of $X$ with the property that $\operatorname{Pre}(b) \subseteq S \implies b \in S$. Then in fact $S = X$. Hint: If not, let $b$ be the first element of $X \setminus S$.

3.40. Notation. For the result below, if $(X, \preccurlyeq)$ is a well ordered set and $T$ is a nonempty set, then an $X$-based sequence in $T$ will mean a function whose domain is some proper lower set of $X$ and whose range is contained in $T$. As a degenerate case, we may view the empty function (with graph equal to the empty set) as an $X$-based sequence in $T$.

Theorem of Recursion on Wosets. Let $(X, \preccurlyeq)$ be a woset and let $T$ be a nonempty set. Let any function $p : \{X\text{-based sequences in } T\} \to T$ be given. Then there exists a unique function $F : X \to T$ satisfying

\[
F(x) = p\bigl(F|_{\operatorname{Pre}(x)}\bigr) \qquad \text{for each } x \in X.
\]
Here $F|_{\operatorname{Pre}(x)}$ denotes the restriction of $F$ to the set $\operatorname{Pre}(x) = \{w \in X : w \prec x\}$. Thus, the value of $F$ at any $x$ is determined, via the rule $p$, by the values of $F$ at all the predecessors of $x$. Remark. Compare this result with 2.22.

Proof of theorem. First we prove uniqueness. Suppose $F_1, F_2$ are two such functions, and $F_1 \ne F_2$. Let $x$ be the first member of $X$ that satisfies $F_1(x) \ne F_2(x)$. Then $F_1(w) = F_2(w)$ for all $w \in \operatorname{Pre}(x)$; that is, the restrictions $F_1|_{\operatorname{Pre}(x)}$ and $F_2|_{\operatorname{Pre}(x)}$ are the same function $\varphi$. But then $F_1(x) = p(\varphi) = F_2(x)$, a contradiction. This proves uniqueness.

We now turn to the existence proof. It will be convenient to replace $X$ with a slightly larger set. Let $Y = X \cup \{q\}$, where $q$ is some object not belonging to $X$. Extend the ordering of $X$ to an ordering on $Y$ by setting $x \prec q$ for all $x \in X$; then $Y$ is also a woset. Note that for each $y \in Y$, the set $\operatorname{Pre}(y)$ is a lower set in $X$; in particular, $\operatorname{Pre}(q) = X$. We shall prove that for each $y \in Y$ there is a function $F_y : \operatorname{Pre}(y) \to T$ satisfying

\[
F_y(x) = p\bigl(F_y|_{\operatorname{Pre}(x)}\bigr) \qquad \text{for each } x \in \operatorname{Pre}(y).
\]

(Once this is established, we simply take $F = F_q$ to prove the theorem.) First note that for each $y$, we may unambiguously use the notation "$F_y$" because there is at most one such function $F_y$; that is clear by a uniqueness argument similar to the one at the beginning of the proof of the theorem.

The proof of the existence of $F_y$'s is by induction on $y$. Assume, then, that some $\eta \in Y$ is given and that $F_y$'s exist for all $y \prec \eta$; we are to demonstrate the existence of $F_\eta$. We demonstrate that in two different ways, according to the nature of $\eta$:

First, suppose $\eta$ has an immediate predecessor $\xi$; that is, suppose $\eta$ is the first member of $Y$ after some member $\xi$. Then $\operatorname{Pre}(\eta) = \{\xi\} \cup \operatorname{Pre}(\xi)$, and $F_\xi : \operatorname{Pre}(\xi) \to T$ is a function of the sort described above. Define a function $F_\eta : \operatorname{Pre}(\eta) \to T$ by

\[
F_\eta(x) = \begin{cases} F_\xi(x) & \text{when } x \in \operatorname{Pre}(\xi), \\ p(F_\xi) & \text{when } x = \xi. \end{cases}
\]

It is easy to verify that $F_\eta$ has the required properties.
On the other hand, suppose $\eta$ has no immediate predecessor in $Y$. Then $\mathrm{Pre}(\eta) = \bigcup_{y \prec \eta}$ ...

... $\le)$ is well ordered and $\Lambda$ is a finite set, then the lexicographical ordering is a well ordering on $P$.

(iii) In general, the lexicographical ordering on an infinite product is not a well ordering. Indeed, if $\Lambda$ is an infinite woset with no last element and each $X_\lambda$ is a woset containing at least two elements, then $P$ is not well ordered. To see this, let $\xi$ be the function whose value at $\lambda$ is the smallest member of $X_\lambda$; show that $P \setminus \{\xi\}$ has no smallest element. For a more concrete special case, show that $\{x \in \{0,1\}^{\mathbb{N}} : x \ne (0, 0, 0, \ldots)\}$ has no first element in $\{0,1\}^{\mathbb{N}}$.

b. (This construction will be used in 3.45.) Let $(X, \le)$ be a well ordered set. Define an ordering $\sqsubseteq$ on $X \times X$, as follows: For $(x, y)$ and $(u, v)$ in $X \times X$, say $(u, v) \sqsubset (x, y)$ means that

$\max\{u, v\} < \max\{x, y\}$, or
$\max\{u, v\} = \max\{x, y\}$ and $u < x$, or
$\max\{u, v\} = \max\{x, y\}$ and $u = x$ and $v < y$.

Verify that this is a well ordering on $X \times X$.

3.45. Theorem on card($X^2$). Let $X$ be an infinite set, and suppose that $X$ can be well ordered. Then $\mathrm{card}(X \times X) = \mathrm{card}(X)$.

Remarks. The present result does not require the Axiom of Choice, which tells us that every set can be well ordered; see 6.20 and 6.22.

Proof of theorem. Let $|\cdot|$ denote cardinality. Clearly, $|X| \le |X \times X|$. Suppose $|X| < |X \times X|$ for some infinite woset $(X, \le)$; we shall obtain a contradiction. Clearly, we can replace $X$ by any other woset with the same cardinality; by 3.39.d we may replace $X$ with the first lower set in $X$ that is infinite and satisfies $|X| < |X \times X|$. Observe that if $K$ is any proper lower set in $X$, then either $K$ is finite or $|K| = |K \times K|$; hence $|K| \ne |X|$, hence (since $K \subseteq X$) we have $|K| < |X|$. In particular, $X$ does not have a last element - for, if $X$ were an infinite woset with last element $\xi$, then $X \setminus \{\xi\}$ would be a proper lower set with the same cardinality as $X$, a contradiction. Define a well ordering $\sqsubseteq$ on $X \times X$ as in 3.44.b.
Since $X$ and $X \times X$ are well ordered, one of these sets is uniquely order isomorphic to a lower set of the other. Since $|X| < |X \times X|$, the order isomorphism must be from $X$ onto a set $L$ that is a proper lower set of $X \times X$. Then $|L| = |X|$ is an infinite cardinal. Let $(u_0, v_0)$ be the $\sqsubseteq$-first member of $(X \times X) \setminus L$. Let $w_0$ be the maximum of $u_0$ and $v_0$ in $(X, \le)$. Let $M = \{x \in X : x \le w_0\}$. Then $M$ is a lower set in $X$. Since $X$ does not have a last element, $M$ is a proper lower set in $X$, and therefore $|M| < |X|$. Observe that

$$(u, v) \in L \;\Rightarrow\; (u, v) \sqsubset (u_0, v_0) \;\Rightarrow\; \max\{u, v\} \le w_0 \;\Rightarrow\; u, v \in M,$$

and thus $L \subseteq M \times M$. Hence $|L| \le |M \times M|$. Since $L$ is an infinite set, $M$ must be infinite, too. By our choice of $X$, then, $|M \times M| = |M|$. Now

$$|X| = |L| \le |M \times M| = |M| < |X|,$$

a contradiction. This completes the proof.
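The ordering of 3.44.b and the key step of the proof (every pair below $(u_0, v_0)$ lies in $M \times M$) can be tried out on a finite chain. The following is only a sketch: it takes $X = \{0, 1, \ldots, n-1\}$ with its usual order, and the names `square_key`, `ordered_square`, and `predecessors` are illustrative, not from the text.

```python
from itertools import product

def square_key(pair):
    """Sort key realizing the ordering of 3.44.b: compare max first, then u, then v."""
    u, v = pair
    return (max(u, v), u, v)

def ordered_square(n):
    """All pairs of range(n) x range(n), listed in increasing order."""
    return sorted(product(range(n), repeat=2), key=square_key)

def predecessors(pair, n):
    """All pairs strictly below `pair` in the ordering."""
    return [p for p in ordered_square(n) if square_key(p) < square_key(pair)]

if __name__ == "__main__":
    # The pairs with maximum 0 come first, then those with maximum 1, and so on.
    print(ordered_square(3))
    # Every predecessor of (u0, v0) has both coordinates <= max(u0, v0).
    print(predecessors((0, 2), 3))
```

Python compares tuples lexicographically, so the key `(max(u, v), u, v)` encodes the three clauses of the definition directly.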
3.46. Definition. A collection $\mathcal{F}$ of subsets of a set $X$ is said to have finite character if for each set $S$ ...

... $\succcurlyeq y$, $y \in S \Rightarrow x \in S$;
down-closed, or a lower set, if $x \preccurlyeq y$, $y \in S \Rightarrow x \in S$;
sup-closed if, whenever $A$ is a nonempty subset of $S$ and $\sigma = \sup(A)$ exists in $X$, then $\sigma$ is a member of $S$;
inf-closed if, whenever $A$ is a nonempty subset of $S$ and $\iota = \inf(A)$ exists in $X$, then $\iota$ is a member of $S$.

(Lower sets were defined in 3.16.) Show that the collections of such sets are Moore collections, with resulting Moore closures as follows:

up-cl$(S) = \{x \in X : x \succcurlyeq s \text{ for some } s \in S\}$,
down-cl$(S) = \{x \in X : x \preccurlyeq s \text{ for some } s \in S\}$,
sup-cl$(S) = \{x \in X : x = \sup(A) \text{ for some nonempty } A \subseteq S\}$,
inf-cl$(S) = \{x \in X : x = \inf(A) \text{ for some nonempty } A \subseteq S\}$.

Show also that
(i) Any up-closed set is sup-closed; any down-closed set is inf-closed.
(ii) A set $S$ is up-closed if and only if $X \setminus S$ is down-closed.
(iii) Any union of up-closed sets is up-closed; any union of down-closed sets is down-closed. (This property is not shared by most Moore collections.)
(iv) $\varnothing$ is up-closed and down-closed.

c. If $P = \{d_1, d_2, \ldots, d_n\}$ is a finite collection of pseudometrics on a set $X$ (defined in 2.11), then another pseudometric, $\bigvee P$, can be defined by

$$(\textstyle\bigvee P)(x, y) = \max\{d_1(x, y), d_2(x, y), \ldots, d_n(x, y)\}.$$

Clearly, $\bigvee P$ is the supremum of $P$ in the family of all pseudometrics - i.e., it is the smallest pseudometric that is larger than or equal to all the $d_j$'s. We may also denote it by $d_1 \vee d_2 \vee \cdots \vee d_n$. When $P$ contains just a single pseudometric, then $\bigvee P$ equals that pseudometric.

Let $D$ be a gauge on $X$ - that is, a collection of pseudometrics. We shall say that $D$ is max-closed if

$$d_1, d_2 \in D \;\Rightarrow\; d_1 \vee d_2 \in D$$

or, equivalently, if

$$d_1, d_2, \ldots, d_n \in D \;\Rightarrow\; d_1 \vee d_2 \vee \cdots \vee d_n \in D.$$

(In the wider literature, another name for max-closed is saturated.) Clearly, this determines a type of Moore closure on the collection of all pseudometrics on $X$; the max closure of a gauge $D$ is the gauge

$$\text{max-cl}(D) = \{\textstyle\bigvee P : P \text{ is a finite subset of } D\}.$$

Similarly, a gauge $D$ is sum-closed if $d_1, d_2 \in D \Rightarrow d_1 + d_2 \in D$. This also determines a Moore closure, which we shall call the sum closure. A gauge $D$ is directed if for each finite set $P \subseteq D$, there exists some pseudometric $d \in D$ such that ...

... are the same; we shall denote them both by $S^\perp$. Thus,

$$S^\perp = \{y \in X : x \perp y \text{ for all } x \in S\}.$$

This set is called the orthogonal complement of $S$. Let us first restate, in the present notation, some of the conclusions already reached in the preceding sections:
qop
f---7
=
f---7
JR.
l_
l_
l_
l_
l_
x
l_
=}
=
:
l_
f---7
(Thus the mapping S Sl_ is antitone.) S S1_1_ is a Moore closure operator on X. We shall denote it by cl, at least for the moment. Caution: This operator is not called a "closure" in most specialized contexts where it is applied. Instead it is given other names, such as "closed linear span." cl ( UAEA (st)) . ( UAEA S>, ) = n AE A ( Sf ) and (n>- EA SA ) cl( n>- EA s>-) = n >-EA cl(s>-) · We also have a few new conclusions, which do not apply to all polar pairs. Show: 0 1_ = {O}j_ = X and Xl_ = {0}. Hence cl(0) = {0}. Thus the empty set is not a closed set. (Hence this Moore closure is not a topological closure; see 5.19.) S ,:= has the same property:
lattice ordering; Dedekind complete; order complete.

b. Any well ordered set is Dedekind complete.
c. Every chain is a lattice.
d. Any lattice is both a poset and a directed set.
e. A poset $(X, \preccurlyeq)$ is order complete if and only if it is order bounded and Dedekind complete.
f. If $(X, \preccurlyeq)$ is a Dedekind complete poset and both $\preccurlyeq$ and $\succcurlyeq$ are directed, then $(X, \preccurlyeq)$ is a lattice.

4.16. Observations on products. With the product ordering (3.9.j), a product of lattices is a lattice; a product of complete lattices is a complete lattice; a product of Dedekind complete posets is a Dedekind complete poset. In each case the supremum or infimum in the product is defined pointwise; see 3.21.n. See also the corollary in 4.28.
MORE ABOUT LATTICES

4.17. If $(X, \preccurlyeq)$ is a lattice, then the binary operation $\wedge : X \times X \to X$ is both

commutative: $x_1 \wedge x_2 = x_2 \wedge x_1$, and
associative: $(x_1 \wedge x_2) \wedge x_3 = x_1 \wedge (x_2 \wedge x_3)$.

It follows that the operations in $x_1 \wedge x_2 \wedge x_3 \wedge \cdots \wedge x_n$ can be evaluated in any order - left to right, right to left, center to outside, etc. - and thus parentheses are not necessary. The value of the expression is the same as $\inf\{x_1, x_2, \ldots, x_n\}$. (Hint: 3.21.m.) Analogous conclusions apply for $\vee$'s and sups. An equivalent definition of lattice is: A poset in which every finite nonempty subset has a sup and an inf.
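As a small illustration of why parentheses are unnecessary, the sketch below works in the lattice of subsets of a set ordered by inclusion, where the meet is intersection. This choice of lattice, and the helper names `meet`, `meet_left_to_right`, and `meet_right_to_left`, are assumptions made only for the example.

```python
from functools import reduce
from itertools import permutations

def meet(a, b):
    """Meet in the lattice of subsets ordered by inclusion: the intersection."""
    return a & b

def meet_left_to_right(xs):
    """Evaluate ((x1 ^ x2) ^ x3) ^ ... ^ xn."""
    return reduce(meet, xs)

def meet_right_to_left(xs):
    """Evaluate x1 ^ (x2 ^ (... ^ xn))."""
    return reduce(lambda acc, x: meet(x, acc), reversed(xs))

if __name__ == "__main__":
    xs = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5}), frozenset({0, 2, 4, 6})]
    # Both groupings, and every reordering, give the infimum of the family.
    print(meet_left_to_right(xs), meet_right_to_left(xs))
    print({meet_left_to_right(list(p)) for p in permutations(xs)})
```

The same pattern applies to joins, with union in place of intersection.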
1 See "reflective subcategories," in books on category theory, for other examples besides order completions and uniform completions.
4.18. Lattices, particularly finite ones, can be illustrated with lattice diagrams. Elements of the lattice are indicated by vertices - i.e., dots. In these diagrams, we have $x \succcurlyeq y$ if there is a downward path from $x$ to $y$. Two examples are given below.

[Figure: Lattice diagrams. First diagram: the subsets $\varnothing$, $\{a\}$, $\{b\}$, $\{a, b\}$ of a two-element set. Second diagram: a five-element lattice with smallest member 0.]

The first diagram shows the inclusion relation between the subsets of a two-element set. This lattice is known (among some lattice theorists) as $2^2$. The second diagram shows a lattice containing five members; 0 is the smallest member and 1 is largest. This lattice is sometimes known as $M_3$.

4.19. Miscellaneous properties.
a. In a lattice, the union of two order bounded sets is order bounded. (Hence, in a lattice, the order bounded sets form an ideal of sets, in the sense of 5.2.)

b. Not every subset of a lattice is a lattice; not every subset of a directed set is directed. For instance, $\mathbb{Z}^2$ with the product ordering is a lattice but its subset $\{(x, y) \in \mathbb{Z}^2 : x + y = 0\}$ is not directed.

4.20. Meet-join characterization of lattices. We have defined "lattice" in terms of its ordering $\preccurlyeq$, but we shall now show that "lattice" can be defined instead in terms of the binary operations $\wedge$ and $\vee$. Show that these laws are satisfied, for all $x, y, z$ in a lattice:

L1 (commutative): $x \wedge y = y \wedge x$ and $x \vee y = y \vee x$,
L2 (associative): $x \wedge (y \wedge z) = (x \wedge y) \wedge z$ and $x \vee (y \vee z) = (x \vee y) \vee z$,
L3 (absorption): $x \wedge (x \vee y) = x$ and $x \vee (x \wedge y) = x$,

and

(*) $x \preccurlyeq y \iff x \wedge y = x \iff x \vee y = y$.

Conversely, suppose $X$ is a set equipped with two binary operations $\wedge$, $\vee$ that satisfy L1-L3. Show that $x \wedge y = x \iff x \vee y = y$. Define $\preccurlyeq$ by (*); then show that $(X, \preccurlyeq)$ is a lattice. (Hint: First use L3 to prove that $x \vee x = x \wedge x = x$.)

4.21. Let $(X, \preccurlyeq)$ be a lattice. Then a sublattice of $X$ is a subset $S$ that is closed under the lattice operations $\vee$, $\wedge$ - i.e., that satisfies

$$x, y \in S \;\Rightarrow\; x \vee y,\ x \wedge y \in S.$$
It then follows that $S$ is also a lattice, when equipped with the restrictions of $\vee$, $\wedge$.

If $(X, \preccurlyeq)$ is a lattice, $S$ is a subset, and $S$ is a lattice when equipped with the restriction of the ordering $\preccurlyeq$, it does not follow that $S$ is necessarily a sublattice of $X$.

The collection of all sublattices of a lattice $X$ is a Moore collection of subsets of $X$. The closure of any set $S \subseteq X$ is the smallest sublattice containing $S$; it is called the sublattice generated by $S$.

Example. Let $V = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 + x_2 = x_3\}$. Then $V$ is a subset (in fact, a linear subspace) of $\mathbb{R}^3$. We shall order $V$ by the restriction of the product ordering; that is, $x \preccurlyeq y$ means $x_j \le y_j$ for all $j$. Then $V$ is a lattice (in fact, a vector lattice), but the lattice operations $\vee$, $\wedge$ determined on $V$ by the ordering $\preccurlyeq$ are not simply the restrictions of the lattice operations on $\mathbb{R}^3$. Rather, the reader should verify that

$$(x \vee y)_1 = \max\{x_1, y_1\}, \qquad (x \vee y)_2 = \max\{x_2, y_2\}, \qquad (x \vee y)_3 = (x \vee y)_1 + (x \vee y)_2,$$

and $\wedge$ is computed analogously with minima. For instance, let $x = (1, 2, 3)$ and $y = (3, -1, 2)$. Then the lattice operations of $\mathbb{R}^3$ yield $x \vee y = (3, 2, 3)$, which is not a member of $V$; the lattice operations of $V$ (defined by the formulas above) yield $x \vee y = (3, 2, 5)$. This example may seem somewhat contrived, but it is actually quite typical of the behavior one sees in lattices of measures, which are discussed in later chapters.

4.22. Example. The set $\mathbb{N} = \{\text{positive integers}\}$ is a lattice when ordered by this rule: $x \preccurlyeq y$ if $x$ is a divisor of $y$ - that is, if $xu = y$ for some $u \in \mathbb{N}$. With this ordering, $u \vee v$ and $u \wedge v$ are the least common multiple and greatest common divisor of $u$ and $v$, respectively. A sublattice of $\mathbb{N}$ is given by $\{\text{divisors of } m\}$, for any positive integer $m$.

4.23. Definition. For a lattice $(X, \preccurlyeq)$, the following two conditions are equivalent:

(A) $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$ for all $x, y, z \in X$.
(B) $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$ for all $x, y, z \in X$.

If these conditions are satisfied, we say $(X, \preccurlyeq)$ is a distributive lattice.

4.24. Definition. We shall say that a lattice $(X, \preccurlyeq)$ is semi-infinitely distributive if it satisfies either of the following conditions:

(A') $x \wedge \sup(S) = \sup_{s \in S} (x \wedge s)$,
(B') $x \vee \inf(S) = \inf_{s \in S} (x \vee s)$;

where the equations are to be interpreted in this sense: If the left side of the equation exists, then so does the right side, and they are equal. If both of these two conditions are satisfied, the lattice $X$ is infinitely distributive. It is clear that any semi-infinitely distributive lattice is distributive. In 5.21 we shall give an example of a semi-infinitely distributive lattice that is not infinitely distributive; thus the two laws (A') and (B') are not equivalent to each other.

Exercise. Let $X$ be a lattice. Show that conditions (A') and (B') are respectively equivalent to the following two conditions:

(A'') $\sup(R) \wedge \sup(S) = \sup\{r \wedge s : r \in R,\ s \in S\}$,
(B'') $\inf(R) \vee \inf(S) = \inf\{r \vee s : r \in R,\ s \in S\}$,

for any nonempty sets $R, S \subseteq X$. Again, each equation is to be interpreted in this fashion: If the left side of the equation exists, then so does the right side, and they are equal. Hint: See 3.21.m.

4.25. Examples.
a. If $\Omega$ is any set, then $(\mathcal{P}(\Omega), \subseteq)$ is an infinitely distributive lattice. (See 1.29.b.)

b. The five-element lattice $M_3$ is not distributive. (See 4.18.)

c. Every chain is an infinitely distributive lattice.

Further examples of infinitely distributive lattices will be given in 8.43.

4.26. A lattice homomorphism is a mapping $f : X \to Y$, from one lattice into another, that satisfies

$$f(x_1 \vee x_2) = f(x_1) \vee f(x_2) \quad \text{and} \quad f(x_1 \wedge x_2) = f(x_1) \wedge f(x_2)$$

for all $x_1, x_2$ in $X$. Lattice homomorphisms will be studied further in 8.48 and thereafter.
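Distributive law (A) of 4.23 can be checked by brute force on a finite lattice. The sketch below does this for $M_3$ and, for contrast, for the divisors of 12 under divisibility (4.22). It assumes the usual description of $M_3$: a least element 0, a greatest element 1, and three pairwise incomparable elements in between, here called `a`, `b`, `c`. All function names are illustrative.

```python
from itertools import product
from math import gcd

def is_distributive(elements, meet, join):
    """Brute-force check of law (A): x ^ (y v z) == (x ^ y) v (x ^ z)."""
    return all(
        meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
        for x, y, z in product(elements, repeat=3)
    )

# M3 (assumed shape): "0" below everything, "1" above everything,
# and "a", "b", "c" pairwise incomparable.
M3 = ["0", "a", "b", "c", "1"]

def m3_leq(x, y):
    return x == y or x == "0" or y == "1"

def m3_meet(x, y):
    if m3_leq(x, y):
        return x
    if m3_leq(y, x):
        return y
    return "0"  # two distinct middle elements meet at the bottom

def m3_join(x, y):
    if m3_leq(x, y):
        return y
    if m3_leq(y, x):
        return x
    return "1"  # two distinct middle elements join at the top

# Divisors of 12, ordered by divisibility: meet is gcd, join is lcm.
DIVISORS_12 = [d for d in range(1, 13) if 12 % d == 0]

def lcm(u, v):
    return u * v // gcd(u, v)

if __name__ == "__main__":
    print("M3 distributive:", is_distributive(M3, m3_meet, m3_join))
    print("divisors of 12 distributive:", is_distributive(DIVISORS_12, gcd, lcm))
```

In $M_3$ the law already fails for $x = a$, $y = b$, $z = c$: the left side is $a \wedge 1 = a$, while the right side is $0 \vee 0 = 0$.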
MORE ABOUT COMPLETE LATTICES

4.27. Some important examples. In Chapter 10, in our formal development of $\mathbb{R}$, we shall show that $\mathbb{R}$ is Dedekind complete. (More precisely, we shall prove that there exists a unique Dedekind complete ordered field and then define $\mathbb{R}$ to be that field.) For now, however, we shall "borrow" that result from Chapter 10: We shall accept the fact that $\mathbb{R}$ is Dedekind complete and use that fact in some examples below.

The extended real line, $[-\infty, +\infty]$, was introduced in 1.17. Recall that it is obtained by adjoining two new objects, $-\infty$ and $+\infty$, to the real number system and defining $-\infty < r < +\infty$ for all real numbers $r$. It follows that $[-\infty, +\infty]$ is a chain that is order complete.

4.28. Observation. Let $\Lambda$ be any nonempty set. Then $\mathbb{R}^\Lambda = \{\text{functions from } \Lambda \text{ into } \mathbb{R}\}$ is Dedekind complete, and $[-\infty, +\infty]^\Lambda = \{\text{functions from } \Lambda \text{ into } [-\infty, +\infty]\}$ is a complete lattice, when these products are equipped with the product ordering.
4.29. Miscellaneous properties.

a. For any set $X$, the ordering ...

... $\succcurlyeq a \succcurlyeq b \succcurlyeq y$. Since $D$ is a chain, it follows that $a < b$. Hold $a$ fixed, and let $b$ vary over all of $U_y$; thus $a$ is a lower bound for $U_y$, so $a \in L_y$. This reasoning is applicable for every $a \in L_x$, so $L_x \subseteq L_y$. Since $x = \sup(L_x)$ and $y = \sup(L_y)$, it follows that $x \preccurlyeq y$.

Proof of extension of mappings. Define $L_\xi$, $U_\xi$ as in 4.31. For each $x \in X$, the set $L_x$ is nonempty and is bounded above in $D$ by any $d_1 \in U_x$. Therefore $f(d_1)$ is an upper bound for the set $f(L_x) = \{f(d) : d \in L_x\}$. Since $Q$ is Dedekind complete, $\sup f(L_x)$ exists in $Q$. Hence a function $F : X \to Q$ can be defined by (**). It is easy to see that this function is increasing and is an extension of $f$, since $f$ is sup-preserving on $D$.

If $f$ has a sup-preserving extension $F : X \to Q$, that extension must satisfy (**), since $x = \sup(L_x)$. It suffices to show the function $F$ defined by (**) is indeed sup-preserving. Let $S$ be a nonempty subset of $X$, and suppose $\sigma = \sup(S)$ in $X$; we are to show that $q = \sup\{F(s) : s \in S\}$ exists in $Q$ and equals $F(\sigma)$.

For simplicity of notation, we may replace $S$ with the set $\{x \in X : x \le s \text{ for some } s \in S\}$; this does not affect our hypotheses or desired conclusion. Thus we may assume $S$ is down-closed in $X$. Hence $S \cap D = \bigcup_{s \in S} L_s$. For each $s \in S$ we have $s = \sup(L_s)$, and therefore

$$\sigma = \sup(S) = \sup\{\sup(L_s) : s \in S\} = \sup\Bigl(\bigcup_{s \in S} L_s\Bigr) = \sup(S \cap D)$$

by 3.21.m. Also, from 3.25.d we see that $L_\sigma \subseteq S \cup \{\sigma\}$.
For each $s \in S$ we have $s \le \sigma$ and hence $F(s) \le F(\sigma)$; thus the set $\{F(s) : s \in S\}$ is bounded above by $F(\sigma)$. Since $Q$ is Dedekind complete, it follows that $q = \sup\{F(s) : s \in S\}$ exists in $Q$ and that $q \le F(\sigma)$. It remains to show the reverse of this inequality. If $\sigma \notin D$, then $L_\sigma \subseteq S$, and so

$$F(\sigma) = \sup\{f(d) : d \in L_\sigma\} \le \sup\{F(s) : s \in S\} = q.$$

On the other hand, if $\sigma \in D$, then (since $f$ is sup-preserving on $D$)

$$f(\sigma) = f(\sup(S \cap D)) = \sup(f(S \cap D)) \le \sup(F(S)) = q.$$

Proof of uniqueness of completions. For $k = 1, 2$, let $f_k : D \to X_k$ be the inclusion map. Using the Extension Property with $X = X_j$ (for $j = 1, 2$) and $Q = X_k$, we see that $f_k$ extends uniquely to a sup-preserving mapping $F_{jk} : X_j \to X_k$. Thus $F_{jk}$ is the only sup-preserving mapping from $X_j$ into $X_k$ that leaves elements of $D$ fixed. Since the identity map of $X_1$ is a sup-preserving map that leaves elements of $D$ fixed, it follows that $F_{11}$ is the identity map on $X_1$ and that this is the only sup-preserving map from $X_1$ into itself that leaves elements of $D$ fixed. Analogous statements are valid for $F_{22}$ and $X_2$. The compositions $F_{21} \circ F_{12} : X_1 \to X_1$ and $F_{12} \circ F_{21} : X_2 \to X_2$ are sup-preserving maps that leave elements of $D$ fixed. Hence these maps are the identity maps on $X_1$ and $X_2$, respectively. Therefore $F_{12} : X_1 \to X_2$ is an order isomorphism.

4.39. (Optional remarks.) Although the "Dedekind completion" defined in 4.33 is probably the simplest for the purposes of this book, some mathematicians may prefer a different sort of completion. Let $(X, \preccurlyeq)$ be a poset. Let $S$ ...

... $- g(\lambda)| : \lambda \in \Lambda\}$ is a metric on $B(\Lambda) = \{\text{bounded functions from } \Lambda \text{ into } \mathbb{R}\}$. Suppose $d$ is a metric on the given set $\Lambda$. Then we may embed the metric space $(\Lambda, d)$ in the metric space $(B(\Lambda), \rho)$, as follows: Fix any point in $\Lambda$; we shall denote it by "0" (although we do not assume any additive structure here). For each $\mu \in \Lambda$ define a function $f_\mu \in B(\Lambda)$ by

$$f_\mu(\lambda) = d(\lambda, \mu) - d(\lambda, 0) \qquad (\lambda \in \Lambda).$$

Verify that $\rho(f_\mu, f_\nu) = d(\mu, \nu)$. Thus $\mu \mapsto f_\mu$ is a distance-preserving map from $\Lambda$ into $B(\Lambda)$, and so we may view $\Lambda$ as a subset of $B(\Lambda)$.

The space $B(\Lambda)$ has certain special properties that will be of interest later: It is a Banach space. Thus the example above shows that every metric space can be embedded isometrically in a Banach space. See 19.11.f and 22.14.
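The embedding $\mu \mapsto f_\mu$ can be checked numerically on a finite metric space. The sketch below assumes that $\rho$ is the sup metric, $\rho(f, g) = \sup_\lambda |f(\lambda) - g(\lambda)|$, and uses four points of the plane with the taxicab distance; the point set and the names `embed`, `sup_distance`, and `taxicab` are chosen only for illustration.

```python
def taxicab(p, q):
    """A sample metric: the taxicab distance between two points of the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def embed(points, d, base):
    """Return {mu: f_mu}, where f_mu(lam) = d(lam, mu) - d(lam, base)."""
    return {mu: {lam: d(lam, mu) - d(lam, base) for lam in points} for mu in points}

def sup_distance(f, g):
    """Sup metric (assumed form of rho) between two functions given as dicts."""
    return max(abs(f[lam] - g[lam]) for lam in f)

if __name__ == "__main__":
    points = [(0, 0), (1, 2), (3, 1), (4, 4)]
    F = embed(points, taxicab, base=(0, 0))
    for mu in points:
        for nu in points:
            # The two numbers printed on each line should agree.
            print(mu, nu, taxicab(mu, nu), sup_distance(F[mu], F[nu]))
```

The inequality $\rho(f_\mu, f_\nu) \le d(\mu, \nu)$ comes from the triangle inequality, and equality is attained at $\lambda = \nu$.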
4.42. Let $X$ be a set, and let $f : X \times X \to [0, +\infty)$ be some function satisfying $f(x, y) = f(y, x)$. Then we can define a pseudometric $d$ on $X$ by

$$d(x, y) = \inf \Bigl\{ \sum_{j=1}^{m} f(a_{j-1}, a_j) \Bigr\};$$

here the infimum is over all nonnegative integers $m$ and all finite sequences $(a_j)_{j=0}^{m}$ in $X$ that go from $x$ to $y$. The existence of the infimum follows from the fact that $[0, +\infty]$ is order complete. We permit $m = 0$ in the case when $x = y$; then the sum is interpreted to be 0. This construction can be summarized informally as "the distance between two points is the shortest route connecting them." It is not hard to show that $d(x, y) \le f(x, y)$ and that in fact $d$ is the largest pseudometric that is less than or equal to $f$.

4.43. More generally, the formula above defines a pseudometric $d$ if the function $f$ is merely defined on a subset $D$ ...

... $0$. Choose $j$ as large as possible satisfying $\sum_{i=1}^{j} f(x_{i-1}, x_i) \le b/2$. Then $j < m$ and $\sum_{i=1}^{j+1} f(x_{i-1}, x_i) > b/2$; hence $\sum_{i=j+2}^{m} f(x_{i-1}, x_i) < b/2$. By two uses of the induction hypothesis we have $f(x_0, x_j) \le b$ and $f(x_{j+1}, x_m) \le b$. Also $f(x_j, x_{j+1}) \le b$ by our definition of $b$. By (1) we have $f(x_0, x_m) \le 2b$, completing the induction proof of (2).

Now define $d$ as in 4.42. Then $d$ is a pseudometric and $d \le f$. From (2) we have $f \le 2d$. The second inclusion in (***) is obvious, since $d \le f$. For the first inclusion in (***), suppose $d(x, y) < 2^{-n}$. By the definition of $d$, then, there exists a finite sequence $a_0, a_1, \ldots, a_m \in X$, with $a_0 = x$ and $a_m = y$ and $\sum_{j=1}^{m} f(a_{j-1}, a_j) < 2^{-n}$. By (2), then, $f(x, y) < 2^{-n+1}$. Since $f$ takes on only the values $2^0, 2^{-1}, 2^{-2}, \ldots$, and 0, we must have $f(x, y) \le 2^{-n}$, and hence $(x, y) \in V_n$.
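On a finite set, the "shortest route" pseudometric of 4.42 can be computed directly, since the infimum over chains is then a minimum. The sketch below uses the standard Floyd-Warshall relaxation (a technique not mentioned in the text) on a three-point set; the function name `chain_pseudometric` and the sample values of `f` are illustrative assumptions.

```python
from itertools import product

def chain_pseudometric(points, f):
    """d(x, y) = min over chains x = a0, ..., am = y of sum f(a_{j-1}, a_j).

    `f` is a symmetric nonnegative function of two points. The chain of
    length m = 0 gives d(x, x) = 0.
    """
    d = {(x, y): (0 if x == y else f(x, y)) for x, y in product(points, repeat=2)}
    for k in points:          # allow k as an intermediate point
        for x in points:
            for y in points:
                if d[x, k] + d[k, y] < d[x, y]:
                    d[x, y] = d[x, k] + d[k, y]
    return d

if __name__ == "__main__":
    points = ["a", "b", "c"]
    values = {frozenset("ab"): 1, frozenset("bc"): 1, frozenset("ac"): 5}

    def f(x, y):
        return 0 if x == y else values[frozenset((x, y))]

    d = chain_pseudometric(points, f)
    # The direct value f(a, c) = 5 is undercut by the route a -> b -> c.
    print(d["a", "c"], f("a", "c"))
```

The result satisfies $d \le f$ and the triangle inequality, in line with the claim that $d$ is the largest pseudometric below $f$.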
Chapter 5 Filters, Topologies, and Other Sets of Sets

FILTERS AND IDEALS
5.1. Let $\mathcal{F}$ be a nonempty collection of subsets of a set $X$. We say $\mathcal{F}$ is a filter on $X$ if (i) $S \in \mathcal{F}$ and $S$ ...

... $\succcurlyeq$ is another preorder.) Example. The upper set topology on $\mathbb{N}$ is

$$\{\, \{1, 2, 3, \ldots\},\ \{2, 3, 4, \ldots\},\ \{3, 4, 5, \ldots\},\ \ldots,\ \varnothing \,\}.$$
e. Let $X$ be a subset of a topological space $(Y, \mathcal{T})$. Verify that $\{X \cap T : T \in \mathcal{T}\}$ is a topology on $X$. It is called the relative topology, or subspace topology, induced on $X$ by $Y$. Any subset of a topological space will be understood to be equipped with its relative topology, unless some other arrangement is specified. Show that

(i) A subset of $X$ is open in the relative topology if and only if it is of the form $X \cap G$ for some set $G \subseteq Y$ that is $\mathcal{T}$-open.

(ii) A subset of $X$ is closed in the relative topology if and only if it is of the form $X \cap F$ for some set $F \subseteq Y$ that is $\mathcal{T}$-closed.

(iii) Suppose $X$ itself is $\mathcal{T}$-open. Then a subset of $X$ is open in the relative topology if and only if it is $\mathcal{T}$-open.

(iv) Suppose $X$ itself is $\mathcal{T}$-closed. Then a subset of $X$ is closed in the relative topology if and only if it is $\mathcal{T}$-closed.

(v) Suppose $W \subseteq X \subseteq Y$. Then the relative topology induced on $W$ by $Y$ is the same as the relative topology induced on $W$ by the relative topology induced on $X$ by $Y$.
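For a finite space, the relative topology and property (v) can be computed and checked mechanically. The sketch below represents open sets as `frozenset`s; the helper names `is_topology` and `relative_topology` and the sample space are illustrative assumptions. For a finite collection, closure under pairwise unions and intersections is enough to be a topology.

```python
def is_topology(space, opens):
    """Check the topology axioms for a finite collection of frozensets."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(space) not in opens:
        return False
    return all((a | b) in opens and (a & b) in opens for a in opens for b in opens)

def relative_topology(subset, opens):
    """The collection {X n T : T open}, for a subset X of the ambient space."""
    subset = frozenset(subset)
    return {subset & t for t in opens}

if __name__ == "__main__":
    Y = frozenset({1, 2, 3, 4})
    T = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3}), Y}
    X = frozenset({2, 3})
    W = frozenset({3})
    T_X = relative_topology(X, T)
    print(sorted(map(sorted, T_X)))
    # Property (v): restricting to W directly, or via X, gives the same topology.
    print(relative_topology(W, T) == relative_topology(W, T_X))
```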
f. Let $(X, \le)$ be a chain. Let $\mathcal{T}$ be the collection of all sets $T \subseteq X$ that satisfy the following condition:

For each $p \in T$, there exists some set $J$ of the form $\{x \in X : a < x\}$ or $\{x \in X : a < x < b\}$ or $\{x \in X : x < b\}$ such that $p \in J \subseteq T$.

Then $\mathcal{T}$ is a topology on $X$, called the order interval topology. The usual topologies on $\mathbb{R}$ and $[-\infty, +\infty]$ are their order interval topologies. These sets will always be understood to be equipped with these topologies, unless some other arrangement is specified. The topology of $\mathbb{R}$ is in many ways typical of topologies used in analysis. In fact, most topological spaces used in analysis are built from copies of $\mathbb{R}$, in one way or another.

Any subset of $\mathbb{R}$ is a chain, but such sets are not always equipped with their order interval topologies. Rather, they are usually equipped with the relative topology induced by $\mathbb{R}$ (as defined in 5.15.e). That topology does not always agree with the order interval topology; we shall compare the two topologies in 15.46.

g. Definitions. Let $(X, d)$ be a pseudometric space (defined as in 2.11). For any $z \in X$ and $r > 0$, we define the open ball centered at $z$ with radius $r$ to be the set

$$B_d(z, r) = \{x \in X : d(x, z) < r\}.$$

(We may omit the subscript $d$ when no confusion will result.) A set $T \subseteq X$ is said to be open if

for each $z \in T$, there exists some $r > 0$ such that $B_d(z, r) \subseteq T$.

The reader should verify that the collection of all such sets $T$ is a topology $\mathcal{T}_d$ on $X$; we call it the pseudometric topology (or the metric topology, if $d$ is known to be a metric). Any pseudometric space will be understood to be equipped with this topology, unless some other is specified. The reader should verify that $B_d(z, r)$ is an open set in the topological space $(X, \mathcal{T}_d)$, thus justifying our calling it the open ball. We also define the closed ball with center $z$ and radius $r$ to be the set

$$K_d(z, r) = \{x \in X : d(z, x) \le r\}.$$

The reader should verify that this is a $\mathcal{T}_d$-closed set.
The usual metric on $\mathbb{R}$ is that given by the absolute value function - that is, $d(x, y) = |x - y|$. The set $\mathbb{R}$ is always understood to be equipped with this metric, unless some other arrangement is specified. Exercise. Show that the resulting metric topology on $\mathbb{R}$ is the same as the order interval topology on $\mathbb{R}$. (This result will be easier to prove later; see 15.43.)

Two of the usual metrics on the extended real line $[-\infty, +\infty]$ are

$$d(x, y) = |\arctan(x) - \arctan(y)| \quad \text{and} \quad d(x, y) = \left| \frac{x}{1 + |x|} - \frac{y}{1 + |y|} \right|.$$

(It follows easily from 2.15.a that these are both metrics.) In fact, there are many usual metrics on $[-\infty, +\infty]$, all of them slightly more complicated than one might wish. Fortunately, they are interchangeable for most purposes: They all yield the same topology, and in fact we shall see in 18.24 that they all yield the same uniformity. Exercise. Show that the two metrics given above both yield the order interval topology on $[-\infty, +\infty]$. (This exercise may be postponed; it will be easier after 15.43.)

Further examples of pseudometric topologies are given in 5.34 and elsewhere.

h. For many applications we shall need a generalization of pseudometric topologies: Let $D$ be a gauge (i.e., a collection of pseudometrics) on a set $X$. For each $d \in D$ let $B_d$ be the corresponding open ball, as in 5.15.g. Let $\mathcal{T}_D$ be the collection of all sets $T \subseteq X$ having the property that

for each $x \in T$, there is some finite set $D_0 \subseteq D$ and some number $r > 0$ such that $\bigcap_{d \in D_0} B_d(x, r) \subseteq T$.

Then (exercise) $\mathcal{T}_D$ is a topology on $X$. We may call it the gauge topology determined by $D$. Any gauge space $(X, D)$ will be understood to be equipped with this topology unless some other arrangement is specified. We may write $\mathcal{T}$, omitting the subscript $D$, if the choice of $D$ is clear or does not need to be mentioned.

Exercises. If $D$ is a gauge and $E$ is its max closure or its sum closure (as defined in 4.4.c), then $D$ and $E$ determine the same topology. If $D$ is a directed gauge (as defined in 4.4.c), then we can always choose $D_0$ to be a singleton in the definition of $\mathcal{T}_D$ given above.

Remarks continued. Whenever convenient, we shall treat pseudometric spaces as a special case of gauge spaces, with gauge $D$ consisting of a singleton $\{d\}$. When no confusion will result, we may write $d$ and $\{d\}$ interchangeably and consider $d$ itself as a gauge. Conversely, in 5.23.c we shall see that any gauge topology $\mathcal{T}_D$ can be analyzed in terms of the simpler pseudometric topologies $\{\mathcal{T}_d : d \in D\}$.

Different gauges on a set $X$ may determine the same topology or different topologies. Two gauges $D$ and $E$ are called equivalent (or topologically equivalent) if they determine the same topology. This terminology is discussed further in 9.4.

A topological space $(X, \mathcal{T})$ is metrizable (or pseudometrizable) if there exists at least one metric $d$ on $X$ (respectively, at least one pseudometric $d$ on $X$) for which $\mathcal{T} = \mathcal{T}_d$; the topological space is gaugeable if there exists at least one gauge $D$ on $X$ for which $\mathcal{T} = \mathcal{T}_D$. This (pseudo)metric or gauge is not necessarily unique. When we state that a topological space is (pseudo)metrizable or gaugeable, we do not necessarily have some
particular (pseudo)metric or gauge in mind. We say that a topology $\mathcal{T}$ and a gauge $D$ are compatible if $\mathcal{T} = \mathcal{T}_D$; this term also applies to topologies and pseudometrics.

Most topologies used in analysis are gaugeable. In 16.18 we present some examples of topologies that are not gaugeable, but these examples are admittedly somewhat contrived. Actually, the term "gaugeable" is seldom used in practice. We shall see in 16.16 that a topology is gaugeable if and only if it is uniformizable and if and only if it is completely regular; the terms "uniformizable" and "completely regular" are commonly used in the literature.

Exercise. If $(X, D)$ is a gauge space and $S \subseteq X$, then the relative topology on $S$ is also gaugeable; it can be given by the restriction of $D$ to $S$.
i. (Optional.) We can generalize the notion of pseudometric topologies still further. Let $D$ be a quasigauge on $X$ - i.e., a collection of quasipseudometrics on $X$ (which are not necessarily symmetric; see 2.11). We can use $D$ to define a quasigauge topology $\mathcal{T}_D$ in a fashion analogous to that in 5.15.h. That is, $\mathcal{T}_D$ is the collection of all sets $T \subseteq X$ that have the property that

for each $x \in T$, there is some finite set $D_0 \subseteq D$ and some number $r > 0$ such that $\{u \in X : \max_{d \in D_0} d(x, u) < r\} \subseteq T$.

(This is the supremum of the topologies $\mathcal{T}_d$ determined by the individual quasipseudometrics $d \in D$; see 5.23.c.)

Reilly's Representation. Actually, every topology $\mathcal{T}$ on a set $X$ can be determined by a quasigauge $D$. Show this with $D = \{d_G : G \in \mathcal{T}\}$, where

$$d_G(x, x') = \begin{cases} 1 & \text{if } x \in G \text{ and } x' \notin G, \\ 0 & \text{otherwise.} \end{cases}$$
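Reilly's representation can be checked by brute force on a small finite space. The sketch below assumes the two values of $d_G$ are 1 and 0, as in the usual statement of the representation. Because the gauge here is finite and takes only these two values, the smallest available ball around $x$ is obtained with $D_0 = D$ and any $r \le 1$, so openness reduces to a single containment test per point. The names `d_G`, `quasigauge_topology`, and the sample topology are illustrative.

```python
from itertools import chain, combinations

def d_G(G):
    """Quasipseudometric attached to an open set G: 1 if x in G and y not in G, else 0."""
    return lambda x, y: 1 if (x in G and y not in G) else 0

def quasigauge_topology(space, gauge):
    """All sets T such that each x in T has a basic ball contained in T.

    With a finite {0, 1}-valued gauge, the smallest ball around x uses every
    member of the gauge at once, with radius 1.
    """
    def smallest_ball(x):
        return frozenset(u for u in space if all(d(x, u) < 1 for d in gauge))

    subsets = chain.from_iterable(
        combinations(sorted(space), k) for k in range(len(space) + 1)
    )
    return {
        frozenset(T) for T in subsets
        if all(smallest_ball(x) <= frozenset(T) for x in T)
    }

if __name__ == "__main__":
    space = {1, 2, 3}
    topology = {frozenset(), frozenset({1}), frozenset({2}),
                frozenset({1, 2}), frozenset({1, 2, 3})}
    gauge = [d_G(G) for G in topology]
    # The quasigauge topology should coincide with the original topology.
    print(quasigauge_topology(space, gauge) == topology)
```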
Consequently, many of the ideas that we commonly associate with gauge spaces - uniform continuity, equicontinuity, completeness, etc. - can be extended (in a weaker and more complicated form) to arbitrary topological spaces.

This presentation follows Reilly [1973]. Similar ideas have been discovered independently in other forms; for instance, see Kopperman [1988] and Pervin [1962].

5.16. Definitions. Let $(X, \mathcal{T})$ be a topological space, and let $S \subseteq X$.

a. We shall say that $S$ is a neighborhood of a point $z$ if $z \in G \subseteq S$ for some open set $G$. Then $\mathcal{N}(z) = \{\text{neighborhoods of } z\}$ is a proper filter on $X$, which we shall call the neighborhood filter at $z$ or the filter of neighborhoods of $z$.

Caution: Some mathematicians define neighborhood as we have done, but other mathematicians also require the set $S$ to be open, as part of their definition of a neighborhood of a point. With the latter approach, the neighborhoods of a point generally do not form a filter. The two definitions yield similar results for the main theorems of general topology, but the open-neighborhoods-only approach is not compatible with the pedagogical style with which general topology is developed in this book: We shall use filters frequently.
b. There exist some closed sets that contain $S$ (for instance, $X$ itself), and among all such sets there is a smallest (namely, the intersection of all the closed supersets of $S$). The smallest closed set containing $S$ is called the topological closure of $S$; we shall denote it by $\mathrm{cl}(S)$. It is a special case of the Moore closure. It is probably the type of closure that is most often used by analysts. It may be called simply the closure of $S$, if the context is clear.

c. There exist some open sets that are contained in $S$ (for instance, $\varnothing$), and among all such sets there is a largest (namely, the union of all the open subsets of $S$). The largest open set contained in $S$ is called the interior of $S$; we shall denote it by $\mathrm{int}(S)$.

d. Let $X$ and $Y$ be sets, and suppose some element of $Y$ is designated "0" - e.g., if $Y$ is a vector space, or if $Y \subseteq [-\infty, +\infty]$. Let $f : X \to Y$ be some function. If $X$ is a topological space, then the support of $f$ means the set

$$\mathrm{supp}(f) = \mathrm{cl}\bigl(\{x \in X : f(x) \ne 0\}\bigr).$$

If $X$ is not equipped with a topology, then the support of $f$ usually means the set $\{x \in X : f(x) \ne 0\}$. Note that these two definitions agree if $X$ has the discrete topology.

5.17. Elementary properties.

a. $\mathrm{int}(S) \subseteq S \subseteq \mathrm{cl}(S)$. A set $S$ is open if and only if $S = \mathrm{int}(S)$, and a set $S$ is closed if and only if $S = \mathrm{cl}(S)$.

b. The notions of closure and interior are dual to each other, in the sense of 1.7. Show that

$$\complement\, \mathrm{cl}(S) = \mathrm{int}(\complement S), \qquad \complement\, \mathrm{int}(S) = \mathrm{cl}(\complement S),$$

where $\complement A = X \setminus A$.

c. A set $S \subseteq X$ is open if and only if $S$ is a neighborhood of each of its points.

d. $z \in \mathrm{cl}(S)$ if and only if $S$ meets every neighborhood of $z$.

e. If $G$ is open and $\mathrm{cl}(S) \cap G$ is nonempty, then $S \cap G$ is nonempty.

5.18. Closures and distances. Let $(X, d)$ be a pseudometric space. The diameter of a set and the distance from a point to a set were defined in 4.40. Let $S$ be a nonempty subset of $X$, and let $z \in X$. Then:

a. $\mathrm{dist}(z, S) = \mathrm{dist}(z, \mathrm{cl}(S))$, and $\mathrm{dist}(z, S) = 0 \iff z \in \mathrm{cl}(S)$.

b. $\mathrm{diam}(\mathrm{cl}(S)) = \mathrm{diam}(S)$.

c. $\mathrm{cl}(B(z, r)) \subseteq K(z, r)$, where $B$ and $K$ are the open and closed balls, defined as in 5.15.g. Show that $\mathrm{cl}(B(z, r)) \subsetneq K(z, r)$ may sometimes occur, by taking $X = \mathbb{R}$ and $d(x, y) = \min\{1, |x - y|\}$. (See 26.4.a for a further related result.)
d. (Optional.) Assume $(X, d)$ is a metric space. Let $\mathcal{X} = \{\text{nonempty, closed, metrically bounded subsets of } X\}$. For $S, T \in \mathcal{X}$, let

$$h(S, T) = \max\Bigl\{ \sup_{s \in S} \mathrm{dist}(s, T),\ \sup_{t \in T} \mathrm{dist}(t, S) \Bigr\}.$$

(See example in the figure below.) Show that $h$ is a metric on $\mathcal{X}$; it is called the Hausdorff metric.

[Figure: Example. Hausdorff distance $h$ between a circle and a rectangle.]
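For finite nonempty sets, which are closed and bounded in any metric space, the suprema in the definition of $h$ are maxima and the distance from a point to a set is a minimum, so $h$ can be computed directly. The following is a minimal sketch on the real line with the usual metric; the function names are illustrative.

```python
def dist_point_set(z, S, d):
    """Distance from the point z to the finite nonempty set S."""
    return min(d(z, s) for s in S)

def hausdorff(S, T, d):
    """Hausdorff distance between finite nonempty sets S and T."""
    return max(
        max(dist_point_set(s, T, d) for s in S),
        max(dist_point_set(t, S, d) for t in T),
    )

if __name__ == "__main__":
    def usual(x, y):
        return abs(x - y)

    S, T = {0, 1}, {0, 5}
    # Every point of S is within 1 of T, but the point 5 of T is 4 away from S.
    print(hausdorff(S, T, usual))
```

Both one-sided terms are needed: dropping the second would give the value 1 here, and the resulting function would not be symmetric.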
5.19. Kuratowski's Closure Axioms. Let X be a set, and let $\mathrm{cl} : \mathcal{P}(X) \to \mathcal{P}(X)$ be some mapping. Show that cl is the closure operator for a topology on X if and only if cl satisfies these four conditions: $\mathrm{cl}(\varnothing) = \varnothing$,
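[...] ($\lambda \in \Lambda$), the smallest open set containing all the $G_\lambda$'s is
$$\bigcup_{\lambda \in \Lambda} G_\lambda,$$
while the largest open set contained in all the $G_\lambda$'s is
$$\mathrm{int}\Big(\bigcap_{\lambda \in \Lambda} G_\lambda\Big).$$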
Topologies
Note that when the index set $\Lambda$ is finite, then $\bigcap_{\lambda \in \Lambda} G_\lambda$ is an open set, and so $\bigwedge_{\lambda \in \Lambda} G_\lambda$ is equal to $\bigcap_{\lambda \in \Lambda} G_\lambda$. Hence the inclusion $\mathcal{T} \hookrightarrow \mathcal{P}(X)$ preserves finite sups and infs; thus it is a lattice homomorphism. However, when the index set $\Lambda$ is infinite, $\bigwedge_{\lambda \in \Lambda} G_\lambda$ generally is not equal to $\bigcap_{\lambda \in \Lambda} G_\lambda$. Thus the inclusion $\mathcal{T} \hookrightarrow \mathcal{P}(X)$ is sup-preserving, but it generally is not inf-preserving.

It is easy to verify that $(\mathcal{T}, \subseteq)$ satisfies one of the infinite distributive laws:
$$H \wedge \bigvee_{\lambda \in \Lambda} G_\lambda = \bigvee_{\lambda \in \Lambda} (H \wedge G_\lambda). \tag{1}$$
(See also the related results in 13.28.a.) However, $(\mathcal{T}, \subseteq)$ does not necessarily satisfy the other infinite distributive law,
$$H \vee \bigwedge_{\lambda \in \Lambda} G_\lambda = \bigwedge_{\lambda \in \Lambda} (H \vee G_\lambda). \tag{2}$$
For instance, that law is not satisfied in the following example, taken from Vulikh [1967]: Let $\mathcal{T}$ be the usual topology on the real line. Let $H = (0, 1)$, $\Lambda = \mathbb{N}$, and $G_n = (1 - \frac{1}{n}, 2)$ for n = 1, 2, 3, .... Verify that $\bigwedge_{n \in \mathbb{N}} G_n = (1, 2)$, hence the left side of equation (2) is $(0, 1) \cup (1, 2)$. On the other hand, $H \vee G_n = (0, 2)$, hence the right side of equation (2) is $(0, 2)$.

It is possible to study at least some properties of a topological space purely in terms of its lattice of open sets; one can disregard the individual points that make up those sets. (See the related result in 16.5.d and the related comments in 13.3.) An introduction to this "pointless topology" was given by Johnstone [1983]. However, this pointless topology is seldom useful in applied analysis, which is greatly concerned with points.
5.22. Neighborhood Axioms. Let X be a set. For each $x \in X$, suppose $\mathcal{N}(x)$ is some filter on X, such that x is a member of every member of $\mathcal{N}(x)$. Let
$$\mathcal{T} = \{ G \subseteq X : G \in \mathcal{N}(z) \text{ for every } z \in G \}.$$
(In particular, $\varnothing \in \mathcal{T}$, since there is no $z \in G$ in that case.) Then the following three conditions are equivalent:
(A) There exists a topology on X for which $\{\mathcal{N}(z) : z \in X\}$ is the system of neighborhood filters.
(B) For each $z \in X$, the collection of sets $\mathcal{T} \cap \mathcal{N}(z)$ is a base for the filter $\mathcal{N}(z)$. That is, every member of $\mathcal{N}(z)$ contains some member of $\mathcal{N}(z) \cap \mathcal{T}$.
(C) For each $z \in X$ and each $S \in \mathcal{N}(z)$, there is some $G \in \mathcal{N}(z)$ with the property that $u \in G \Rightarrow S \in \mathcal{N}(u)$.
Moreover, if (A), (B), (C) are satisfied, then the topology in (A) must be $\mathcal{T}$.

Hints: Let us first restate (B) as follows:
(B') For every $S \in \mathcal{N}(z)$, there is some $G \in \mathcal{N}(z) \cap \mathcal{T}$ such that $G \subseteq S$.
For (A) $\Rightarrow$ (B'), let int be the interior operator of the given topology; show that $G = \mathrm{int}(S)$ is a member of the collection of sets $\mathcal{T}$ described above. For (B') $\Rightarrow$ (C), note that $u \in G \Rightarrow G \in \mathcal{N}(u) \Rightarrow S \in \mathcal{N}(u)$ since $S \supseteq G$. For (C) $\Rightarrow$ (A), define an operator $\mathrm{int} : \mathcal{P}(X) \to \mathcal{P}(X)$ by $\mathrm{int}(S) = \{z \in X : S \in \mathcal{N}(z)\}$; then verify that this operator int satisfies the conditions of 5.20.
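On a small finite set the construction in 5.22 can be checked by brute force. The sketch below starts from a known topology, builds its neighborhood filters, and then recovers the topology from the filters using the formula for the collection of sets G that belong to N(z) for every z in G. The three-point space and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
# An illustrative topology on X (a nested chain of open sets).
topology = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def powerset(X):
    """All subsets of X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def neighborhood_filter(z, topology, X):
    """N(z): every S that contains some open set G with z in G."""
    return {S for S in powerset(X)
            if any(z in G and G <= S for G in topology)}

N = {z: neighborhood_filter(z, topology, X) for z in X}

# The collection from 5.22: G belongs to it iff G is in N(z) for every z in G.
recovered = {G for G in powerset(X) if all(G in N[z] for z in G)}

def condition_C(N, X):
    """(C): each S in N(z) admits some G in N(z) with S in N(u) for all u in G."""
    return all(any(all(S in N[u] for u in G) for G in N[z])
               for z in X for S in N[z])

print(recovered == topology, condition_C(N, X))
```

Here the filters come from a topology, so (A) holds and the recovered collection coincides with the original open sets, as the final sentence of 5.22 asserts.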
5.23. Here are a few more ways to make topologies:

a. If $\{\mathcal{T}_\lambda : \lambda \in \Lambda\}$ is a collection of topologies on a set X, then
$$\{S \subseteq X : S \in \mathcal{T}_\lambda \text{ for every } \lambda\}$$
is also a topology on X. It is sometimes called the infimum of the $\mathcal{T}_\lambda$'s, since it is their greatest lower bound - i.e., it is the largest topology that is contained in all the $\mathcal{T}_\lambda$'s.

b. From the preceding result we see that the collection of all topologies on X is a Moore collection of subsets of $\mathcal{P}(X)$. Thus, if $\mathcal{G}$ is any collection of subsets of X, then there exists a smallest topology $\mathcal{T}$ containing $\mathcal{G}$ - namely, the intersection of all the topologies that contain $\mathcal{G}$. The topology $\mathcal{T}$ obtained in this fashion is the topology generated by $\mathcal{G}$; the generating set $\mathcal{G}$ is also called a subbase for the topology $\mathcal{T}$. (The topology generated is a special case of the Moore closure, but the terms "closed" and "closure" generally are not used in this context.)

Example. The order interval topology on a chain X (defined in 5.15.f) is the topology generated by the sets that can be expressed in either of the forms
$$S_a = \{x \in X : a < x\} \quad \text{or} \quad S_b = \{x \in X : x < b\}$$
for points $a, b \in X$.

Exercise. Let $\mathcal{G}$ be a collection of subsets of a set X. A set $T \subseteq X$ is a neighborhood of a point $x \in X$ with respect to the topology generated by $\mathcal{G}$ if and only if T has the following property:

There is some finite set $\{G_1, G_2, \ldots, G_n\} \subseteq \mathcal{G}$ such that $x \in \bigcap_{j=1}^{n} G_j \subseteq T$.

(We permit n = 0, with the convention that the intersection of no subsets of X is all of X.)

c. If $\{\mathcal{T}_\lambda : \lambda \in \Lambda\}$ is a collection of topologies on a set X, then the topology generated by
$$\mathcal{G} = \{S \subseteq X : S \in \mathcal{T}_\lambda \text{ for some } \lambda\}$$
is called the supremum of the $\mathcal{T}_\lambda$'s, since it is their least upper bound - i.e., it is the smallest topology that contains all the $\mathcal{T}_\lambda$'s. The collection of all topologies on X is a complete lattice when ordered by $\subseteq$, since each subcollection has an inf (see 5.23.a) and a sup.

Important example. On any gauge space (X, D), the gauge topology $\mathcal{T}_D$ is the supremum of the pseudometric topologies $\{\mathcal{T}_d : d \in D\}$ (defined as in 5.15.g).
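For a finite set X, the topology generated by a subbase can be computed directly, which gives a concrete feel for 5.23.b and 5.23.c. The sketch below relies on the standard description of the generated topology as all unions of finite intersections of subbase members; it is consistent with the exercise in 5.23.b, but the function names and the sample subbase are assumptions made for illustration.

```python
from itertools import combinations

def subfamilies(collection):
    """Every finite subfamily of a finite collection (including the empty one)."""
    items = list(collection)
    return (c for r in range(len(items) + 1) for c in combinations(items, r))

def generated_topology(X, subbase):
    """Smallest topology on the finite set X that contains every member of subbase."""
    X = frozenset(X)
    base = set()
    for family in subfamilies(subbase):
        inter = X                      # the intersection of no sets is all of X
        for G in family:
            inter = inter & G
        base.add(inter)
    topology = set()
    for family in subfamilies(base):
        union = frozenset()            # the union of no sets is the empty set
        for B in family:
            union = union | B
        topology.add(union)
    return topology

def supremum(X, topologies):
    """Topology generated by the union of the given topologies (5.23.c)."""
    return generated_topology(X, set().union(*topologies))

def infimum(topologies):
    """Sets that are open in every one of the given topologies (5.23.a)."""
    return set.intersection(*[set(t) for t in topologies])

X = {1, 2, 3}
T = generated_topology(X, [frozenset({1, 2}), frozenset({2, 3})])
T1 = {frozenset(), frozenset({1, 2}), frozenset(X)}
T2 = {frozenset(), frozenset({2, 3}), frozenset(X)}
print(sorted(sorted(G) for G in T))
```

The set {2} belongs to the supremum of T1 and T2 although it is open in neither of them; this is why the supremum is described as a generated topology rather than as a plain union.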
5.24. Remarks. The theory of topological spaces will be developed a little further in Chapter 9. It will be continued in much greater detail in Chapter 15 and thereafter.

ALGEBRAS AND SIGMA-ALGEBRAS

5.25. Let X be a set, and let $\complement$ denote complementation in X. An algebra (or field) of subsets of X is a collection $\mathcal{S} \subseteq \mathcal{P}(X)$ with the following properties:
(i) $X \in \mathcal{S}$,
(ii) $S \in \mathcal{S} \Rightarrow \complement S \in \mathcal{S}$, and
(iii) $S, T \in \mathcal{S} \Rightarrow S \cup T \in \mathcal{S}$.
In the terminology of 1.30, that says: $\mathcal{S}$ contains X itself and $\mathcal{S}$ is closed under complementation and finite union. It follows that $\varnothing \in \mathcal{S}$ and that $\mathcal{S}$ is closed under finite intersection and relative complementation: $S, T \in \mathcal{S}$ implies $S \cap T,\ S \setminus T \in \mathcal{S}$.

Caution: The term "algebra" has many different meanings in mathematics; several meanings will be given in 8.47 and one more in 11.3. When we need to distinguish the algebra defined above from other kinds of algebras, the algebra defined in the preceding paragraph will be called an algebra of sets.

A $\sigma$-algebra (or $\sigma$-field) of subsets of X is an algebra that is closed under countable union:
(iii') $S_1, S_2, S_3, \ldots \in \mathcal{S} \Rightarrow \bigcup_{j=1}^{\infty} S_j \in \mathcal{S}$.
It follows immediately that any $\sigma$-algebra $\mathcal{S}$ is also closed under countable intersection: $S_1, S_2, S_3, \ldots \in \mathcal{S} \Rightarrow \bigcap_{j=1}^{\infty} S_j \in \mathcal{S}$.

A measurable space is a pair $(X, \mathcal{S})$, where X is a set and $\mathcal{S}$ is a $\sigma$-algebra of subsets of X. The elements of $\mathcal{S}$ are referred to as the measurable sets in X. We may refer to X itself as a measurable space if $\mathcal{S}$ does not need to be mentioned explicitly.

Measurable spaces $(X, \mathcal{S})$ should not be confused with measure spaces $(X, \mathcal{S}, \mu)$, introduced in 21.9, or with spaces of measures $\{\mu_\alpha\}$, introduced in 11.47, 11.48, and 29.29.f. Somewhat imprecisely, we may say that a measure is a device for measuring how big sets are; a measurable space is a space that is capable of being equipped with any of several different measures; a measure space is a space that has been equipped with a particular measure; and a space of measures is a collection of measures that is equipped with some additional structure (linear, topological, etc.) that leads us to call it a "space."
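The three defining properties in 5.25, and the closure properties derived from them, can be tested mechanically when X is finite (in which case every algebra is also a sigma-algebra, since only finitely many distinct unions exist). This is a small sketch; the function name `is_algebra` and the two sample collections are illustrative assumptions.

```python
def is_algebra(X, collection):
    """Check (i) X is a member, (ii) closure under complement, (iii) closure under union."""
    X = frozenset(X)
    S = set(collection)
    return (X in S
            and all((X - A) in S for A in S)
            and all((A | B) in S for A in S for B in S))

X = frozenset(range(4))
# A four-element algebra built from the partition {0, 1} | {2, 3}.
good = {frozenset(), frozenset({0, 1}), frozenset({2, 3}), X}
# Contains X and is closed under union, but the complement of {0} is missing.
bad = {frozenset(), frozenset({0}), X}

# Properties that 5.25 says follow from (i)-(iii).
closed_under_intersection = all((A & B) in good for A in good for B in good)
closed_under_difference = all((A - B) in good for A in good for B in good)

print(is_algebra(X, good), is_algebra(X, bad))
```

The second collection shows that condition (ii) is not implied by the other two.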
5.26. Examples of ($\sigma$-)algebras. In the following examples, any statement involving $\sigma$- in parentheses should be read once with the $\sigma$- omitted and once with it included. Let X be any set; then:

a. $\{\varnothing, X\}$ is the smallest ($\sigma$-)algebra on X; we shall call this the indiscrete ($\sigma$-)algebra.
b. $\mathcal{P}(X)$ = {subsets of X} is the largest ($\sigma$-)algebra on X. We shall call it the discrete ($\sigma$-)algebra.
c. Let $J \subseteq \mathbb{R}$ be an interval (possibly all of $\mathbb{R}$). Let $\mathcal{A}$ be the collection of all unions of finitely many subintervals of J (where a singleton is considered to be an interval and, by convention, $\varnothing \in \mathcal{A}$ also). Show that $\mathcal{A}$ is an algebra of subsets of J.
d. Let $\{\mathcal{S}_\alpha : \alpha \in A\}$ be a collection of ($\sigma$-)algebras on X. Then
$$\{T \subseteq X : T \in \mathcal{S}_\alpha \text{ for every } \alpha \in A\}$$
is also a ($\sigma$-)algebra on X.
e. In view of the preceding exercise, the collection of all ($\sigma$-)algebras on X is a Moore collection of subsets of $\mathcal{P}(X)$. Hence, given any collection $\mathcal{G}$ of subsets of X, there exists a smallest ($\sigma$-)algebra that contains $\mathcal{G}$ - namely, the intersection of all the ($\sigma$-)algebras that contain $\mathcal{G}$. We call it the ($\sigma$-)algebra generated by $\mathcal{G}$; we say that $\mathcal{G}$ is a generating set for that ($\sigma$-)algebra. (The ($\sigma$-)algebra thus generated is a special case of the Moore closure, introduced in 4.3. However, the terms "closed" and "closure" generally are not used in this context.) The $\sigma$-algebra generated by $\mathcal{G}$ is sometimes denoted by $\sigma(\mathcal{G})$.
f. $\{S \subseteq X : S \text{ or } \complement S \text{ is finite}\}$ is the algebra generated by the singletons of X.
g. $\{S \subseteq X : S \text{ or } \complement S \text{ is countable}\}$ is the $\sigma$-algebra generated by the singletons of X. (The proof of this result assumes some familiarity with the most basic properties of countable sets; see particularly 6.26.)
h. Let $\mathcal{G}$ be a collection of subsets of X. Then:
(i) The algebra generated by $\mathcal{G}$ is equal to the union of the algebras generated by finite subsets of $\mathcal{G}$.
(ii) The $\sigma$-algebra generated by $\mathcal{G}$ is equal to the union of the $\sigma$-algebras generated by countable subcollections of $\mathcal{G}$.
i. Some of the most important $\sigma$-algebras are determined in one way or another by topologies. Let $(X, \mathcal{T})$ be a topological space. The Borel $\sigma$-algebra is the $\sigma$-algebra generated by $\mathcal{T}$ - that is, the smallest $\sigma$-algebra containing all the open sets. Its members are called the Borel sets. (Remark. In 15.37.e we shall see that when X is any subinterval of the real line, equipped with its usual topology, then the Borel $\sigma$-algebra is generated by the algebra in 5.26.c.) Some other $\sigma$-algebras based on topologies are
• the almost open sets, also known as the sets with the Baire property, studied in 20.20 and thereafter;
• Baire sets, mentioned in 20.34; and
• in $\mathbb{R}^n$, the Lebesgue measurable sets, studied in Chapters 21 and 24.
Caution: The "Baire sets" are not the same as the "sets with the Baire property," and the "Lebesgue measurable sets" are not the same as the "Lebesgue sets" (introduced in 25.16).
j. The clopen subsets of a topological space X form an algebra of subsets of X.

5.27. More definitions (optional). Let $\Omega$ be a set. A ring of subsets of $\Omega$ (also known as a clan) is a collection $\mathcal{R}$ of subsets of $\Omega$ that satisfies $\varnothing \in \mathcal{R}$ and also
$$A, B \in \mathcal{R} \Rightarrow A \cup B,\ A \setminus B \in \mathcal{R}.$$
A $\sigma$-ring (also known as a tribe) is a ring $\mathcal{R}$ that also satisfies [...]
Clearly, a collection $\mathcal{R} \subseteq \mathcal{P}(\Omega)$ is an algebra (or $\sigma$-algebra) if and only if it is a ring (or [...] y
is not constructively provable; there is no algorithm that takes constructive descriptions of
x and y and yields the assertion of one of those three relations. We shall illustrate and
demonstrate this unprovability in two ways in 14.9 and 10.46. However, Bishop [1973 / 1985] points out that in most applications, the Trichotomy Law is not needed in its full strength; it can be replaced by the following weaker law.
Comparison Law. For any real numbers u, v, and y, if u < v then at least one of u < y or y < v must hold.
This law is constructively provable.

The alterations one makes while translating classical mathematics to constructive mathematics generally have little or no effect on the ultimate applications. For instance, one of the fundamental theorems of classical functional analysis is the Hahn-Banach Theorem; we shall study several versions of this theorem in later chapters. Some versions assert the existence of a certain type of linear functional on a normed space X. The theorem is inherently nonconstructive, but a constructive proof can be given for a variant involving normed spaces X that are separable - i.e., normed spaces that have a countable dense subset; see Bridges [1979]. Little is lost in restricting one's attention to separable spaces, for in applied math most or all normed spaces of interest are separable. The constructive version of the Hahn-Banach Theorem is more complicated, but it has the advantage that it actually finds the linear functional in question.
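The algorithmic content of the Comparison Law can be made concrete. In the sketch below a real number is modelled, purely as an assumption for illustration, by a function that returns a rational approximation within any requested tolerance, and the hypothesis u < v is supplied together with a rational witness eps satisfying v - u >= eps > 0. The names `const`, `sqrt2`, and `compare` are hypothetical.

```python
from fractions import Fraction

def const(q):
    """The rational number q, viewed as a real: every approximation is exact."""
    q = Fraction(q)
    return lambda delta: q

def sqrt2(delta):
    """A rational within delta of the square root of 2, found by bisection."""
    lo, hi = Fraction(1), Fraction(2)      # invariant: lo**2 <= 2 < hi**2
    while hi - lo > delta:
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

def compare(u, v, y, eps):
    """Given v - u >= eps > 0, return a relation among 'u < y', 'y < v' that holds.

    With delta = eps / 8 the approximations satisfy v1 - u1 >= 6 * delta, so the
    midpoint m of u1 and v1 is at least 3 * delta away from both of them.
    If y1 > m, then y > m - delta >= u1 + 2 * delta > u.
    Otherwise y <= m + delta <= v1 - 2 * delta < v.
    """
    delta = Fraction(eps) / 8
    u1, v1, y1 = u(delta), v(delta), y(delta)
    midpoint = (u1 + v1) / 2
    return "u < y" if y1 > midpoint else "y < v"

# u = 1, v = 3/2, y = sqrt(2); the witness for u < v is eps = 1/2.
result = compare(const(1), const(Fraction(3, 2)), sqrt2, Fraction(1, 2))
print(result)
```

The procedure never decides whether y equals u or v; it only needs finitely many digits of each number, which is the contrast with the Trichotomy Law drawn in the text.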
6.7. Constructivists and mainstream mathematicians use the same words in different ways; in fact, different schools of constructivists use the same words in different ways. A basic example is in the meaning of "real number." Mainstream mathematicians have several different equivalent definitions of real numbers (see Chapter 10). One way to define
Chapter 6 : Constructivism and Choice
a real number is as an equivalence class of Cauchy sequences of rational numbers (see 19.33.c). But constructivists prefer to indicate a real number by a Cauchy sequence that is accompanied by some estimate of the rate of convergence - e.g., a sequence $(r_n)$ of rational numbers that satisfies $|r_m - r_n| \le \max\{\frac{1}{m}, \frac{1}{n}\}$. Of course, in mainstream mathematics, every real number can be represented as the limit of such a sequence, but such sequences are not essential to our way of thinking about real numbers. In constructivist mathematics, all computations about real numbers are expressed, either directly or indirectly, in terms of such sequences. (Constructivist "real numbers" are discussed further in 10.46.)

Here is a more complicated example of the differences in language: In constructive analysis, the continuous functions that are of chief interest are the uniformly continuous ones. Indeed, it is hard to constructively establish that a function is continuous except by giving a modulus of uniform continuity - and thus establishing that the function is indeed uniformly continuous. Of course, in mainstream mathematics, any continuous function on a compact interval is uniformly continuous, but that fact is not provable in constructive mathematics. In the terminology of Bishop and Bridges [1985], a function on a compact interval is continuous if it has a modulus of uniform continuity - i.e., that book's definition of "continuity" is the usual definition of uniform continuity, but the context is one where the two notions are classically equivalent anyway.

Among some constructivists, the only functions that can really be called "functions" are the representable ones. Moreover, it is a theorem (in certain axiom systems of constructive mathematics) that every representable function is continuous. Thus, under certain uses of the language, the following is true:
Ceitin's Theorem. Every function is continuous. A proof of this startling result can be found on page 69 of Bridges and Richman [ 1987] . The result is slightly less startling when we consider that, even in mainstream mathematics, any function with certain "good" properties is continuous; theorems to this effect are given in 24.42, 27.28.c, and 27.45. The introduction to constructivism given by Bridges and Mines [ 1984] also discusses the importance of language.
6.8. Constructivism versus mainstream mathematics. This book, which is intended to introduce the reader to the literature, is frequently nonconstructive, since much of the literature is nonconstructive. Indeed, the constructivist viewpoint is foreign to most mathematicians today; we are so used to nonconstructive proofs that we tend to believe one cannot do much interesting mathematics constructively. And, until a few decades ago, we would have been right. Brouwer's intuitionism was more a matter of philosophy than mathematics, and Heyting extended the matter from philosophy to formal logic. But then, finally, Bishop [1967] showed how to develop a large portion of analysis constructively. (See also the revised version, Bishop and Bridges [1985].) Since then, several other mathematicians have extended Bishop's style of reasoning and written constructive versions of many other parts of mathematics. In particular, the reader may refer to Bridges [1979] for functional analysis, to Beeson [1985] for foundations (i.e., logic and set theory), and to Bridges and Richman [1987] for a recent survey of the several different schools of constructivism.
Further Comments on Constructivism
Despite its growing literature, constructivism remains separated from the mainstream of mathematics. This may be largely because constructivism's finer distinctions necessitate a use of language quite different from, and more complicated than, that of the mainstream mathematician. For instance, among some constructive analysts, $x \neq y$ simply means the negation of $x = y$, while $x \mathrel{\#} y$ means² the slightly stronger condition of apartness: We can find a positive lower bound for the distance between approximations to x and y. Thus, constructivists distinguish between notions that the classical mathematician is accustomed to viewing as identical. Consequently, a mainstream mathematician can only learn constructivism by relearning his or her entire language - a sizable undertaking.

Some philosophical questions deserve at least a brief mention here, although we shall not address them in any depth. Bishop [1973/1985] suggested that mainstream mathematicians, in pursuit of form, have lost track of content; Bishop exhorted mathematicians to return to a more meaningful mathematics. Perhaps the contentless mathematics that he condemned would include the intangibles studied elsewhere in this book (free ultrafilters, etc.), which lack examples and do not seem to be a direct reflection of anything in the "real world." However, an argument can be made for the conceptual usefulness of such objects. For instance, free ultrafilters provide a basis for nonstandard analysis, which yields new insights into calculus and other limit arguments. Moreover, we may be surprised by just what kinds of mathematics can reflect the real world; for instance, Augenstein [1994] suggests that the Banach-Tarski Decomposition may be a useful model of some interactions of subatomic particles.

Both constructive and nonconstructive thinking have their advantages. A constructive proof may be more informative (e.g., it tells us that $\sqrt{2}^{\sqrt{2}}$ is irrational - see 6.5), but a nonconstructive proof is often quicker and simpler. Extending a metaphor of Urabe: To feed one's family, it is not enough to prove that a certain pond contains a fish; ultimately one must catch the fish. On the other hand, it would be helpful to have an inexpensive device that quickly and easily determines which ponds contain fish.
² Caution: Some constructive analysts use $x \neq y$ to denote apartness and use $\neg(x = y)$ to denote inequality.

6.9. Much of this book is concerned with nonconstructive mathematics. Moreover, to better understand some of the nonconstructible objects studied in this book, we shall sometimes find it helpful to vary the amount and kind of nonconstructiveness that we are willing to accept. In particular, we may compare results requiring the Axiom of Choice with results that only require a weakened form of the Axiom of Choice. At first glance, that looks like a rather strange notion; after all, either we can find a certain mathematical object, or we can't. How can we say that one object is harder to find than another object, when in fact we can't find either of them?

The metaphor of "oracles" was introduced in recursion theory by Turing [1939] (see the discussion by Enderton [1977 recursion theory]); a similar metaphor may be helpful in the present context. Imagine we have access to an oracle, who has frequent conversations with some deity. We present the oracle with various questions that we have been unable to answer by merely mortal, human methods. The oracle is able and willing to answer some, but not all, of these questions. For instance, the oracle might tell us whether Goldbach's conjecture is true, but refuse to comment on the Riemann Hypothesis. In some of the literature, such an oracle is referred to as a "limited principle of omniscience."

Now, in some cases, if the oracle gives us an answer to question A, we may use that information to deduce an answer to question B - even if the oracle has not given us an answer to B. Thus, one answer may be stronger than another. Similarly, two answers may be considered equivalent to each other if each is stronger than the other - i.e., if either answer would enable us to deduce the other.

It must be emphasized that when we use the oracle's answer to A to deduce an answer to B, then we are using human, mortal reasoning - i.e., the oracle is not helping in such deductions. Thus, our relation - of one answer being stronger than another - is determined without the aid of the oracle; this relation does not depend on our actually having answers to either of the questions A or B. It is these relations between the answers, not the actual answers themselves, that will concern us later, when we compare different levels of nonconstructiveness. Since the oracle is not actually used to determine and compare those different levels, we may now dispense with the oracle altogether.
6.10. Proposition. The Axiom of Regularity implies the Law of the Excluded Middle, if interpreted in the language of constructivism. (Hence the Axiom of Regularity is nonconstructive.)

Proof. The following proof is modified from Beeson [1985]. Since most readers of this book probably are not familiar with constructivist language, we shall restate the proof in terms of the oracle metaphor of 6.9. Interpreted in constructivist terms, the Axiom of Regularity says that we have an oracle of the following type: We may describe to the oracle some nonempty set S, in terms that do not necessarily give a clear understanding of the set but that do at least uniquely determine the set. Then the oracle will specify to us some element $x \in S$ such that $x \cap S = \varnothing$.

Let P be a proposition (such as Goldbach's conjecture) that we can state precisely, but that we do not necessarily know to be true or false. Now define
$$S = \begin{cases} \{\varnothing, \{\varnothing\}\} & \text{if } P \text{ is true,} \\ \{\{\varnothing\}\} & \text{if } P \text{ is false.} \end{cases}$$
Then S is nonempty, since $\{\varnothing\} \in S$. The oracle will tell us either "$\varnothing$ is a member of S that does not meet S" - in which case P is obviously true - or "$\{\varnothing\}$ is a member of S that does not meet S" - in which case we can deduce that P is false. Thus, the oracle can be used to deduce the truth or falsehood of any proposition P.
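The set-theoretic bookkeeping in this proof can be replayed with nested frozensets. The sketch below is only an illustration of why the oracle's answer reveals P: the code itself has to be told the truth value of P in order to build S, whereas in the proof the oracle receives only a description of S. The names `build_S`, `regular_elements`, and `decide` are hypothetical.

```python
EMPTY = frozenset()            # the empty set
ONE = frozenset({EMPTY})       # the set whose only member is the empty set

def build_S(p_is_true):
    """The set S from the proof, built from a known truth value of P."""
    return frozenset({EMPTY, ONE}) if p_is_true else frozenset({ONE})

def regular_elements(S):
    """Members x of S that do not meet S, i.e. with empty intersection x & S."""
    return [x for x in S if not (x & S)]

def decide(S):
    """Read off the truth value of P from the oracle's only possible answer."""
    (x,) = regular_elements(S)  # in both cases exactly one member qualifies
    return x == EMPTY

print(decide(build_S(True)), decide(build_S(False)))
```

When P is true, the member ONE meets S (both contain the empty set), so the oracle is forced to name the empty set; when P is false, ONE is the only member and it does not meet S.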
Remark. The Axiom of Choice, if interpreted in constructivist terms, can also be shown to imply the Law of the Excluded Middle. The proof of this implication, though short, depends on a deeper understanding of constructivist language; it does not translate readily into the language of mainstream mathematicians. We omit it here; it is given by Beeson [1985] and Bridges and Richman [1987] .
6.11. Constructivism (in the sense of Errett Bishop) will be discussed further in 6.13, 10.46, and 15.48. Logicians have another notion that is similar to constructibility. An object $x_0$ is said to be definable if there exists a proposition P(x) in first-order logic for which $x = x_0$ is the unique element for which P(x) is true. See Levy [1965].

Constructibility in the sense of Bishop, constructibility in the sense of Gödel, and definability in the sense of Levy are far outside the mainstream of thinking of most analysts. In 14.76 we shall introduce "quasiconstructibility," which is (in this author's opinion) closer to the way that most analysts think.
THE MEANING OF CHOICE

6.12. Conventional set theory is Zermelo-Fraenkel set theory plus the Axiom of Choice, abbreviated ZF + AC. We described Zermelo-Fraenkel set theory in 1.47. The Axiom of Choice has many equivalent forms; we shall study several in this and later chapters. (A much longer list of equivalents is given by Rubin and Rubin [1985].) We shall denote our equivalents of Choice by (AC1), (AC2), (AC3), (AC4), etc.; collectively we shall refer to them as AC. Most of these equivalents are discussed in the next few pages. A few more equivalents are the Vector Basis Theorem in 11.29, and Tychonov's Theorem and similar results on product topologies in 15.29, 17.16, and 19.13. Here are three of the simplest forms of Choice:
(AC1) Choice Function for Subsets. Let X be a nonempty set. Then for each nonempty subset S [...]

(AC3) Nonempty Products. If $\{X_\lambda : \lambda \in \Lambda\}$ is a nonempty set of nonempty sets, then the Cartesian product $\prod_{\lambda \in \Lambda} X_\lambda$ is nonempty. That is, there exists a function $f : \Lambda \to \bigcup_{\lambda \in \Lambda} X_\lambda$ satisfying $f(\lambda) \in X_\lambda$ for each $\lambda$.

A function f that specifies choices, in this or similar contexts, is called a choice function. We postpone until 6.19 the proof of equivalence of these three principles.

The Axiom of Choice is "obviously true," in that it agrees with the intuition of most mathematicians. For instance, consider (AC1). Each nonempty set S [...]

[...] (i) $\Phi(\lambda) = \{f(\lambda) : f \in \Phi \text{ with } \lambda \in \mathrm{Dom}(f)\}$ is a finite subset of X, for each $\lambda \in \Lambda$; (ii) each finite set $S \subseteq \Lambda$ is the domain of at least one element of $\Phi$; and (iii) $\Phi$ has finite character; i.e., a function f from some subset of $\Lambda$ into X is a member of $\Phi$ if and only if each restriction of f to a finite subset of Dom(f) is a member of $\Phi$. Then $\Lambda$ is the domain of at least one element of $\Phi$.
Remarks. The proof of (UF2) $\Rightarrow$ (UF1) will be given via several other propositions in 13.22. Actually, (UF2) remains equivalent if we make the further stipulation that X = {0, 1}; that will be evident from the argument in 13.22.

The principle (UF2) is very similar to several principles that are known as Rado's Selection Lemma; the reader is cautioned that those principles are not all known to be equivalent to one another. For a few results on Rado's Lemma(s) see Howard [1984] and [1993], Jech [1977], Rav [1977], and Thomassen [1983].

The Cowen-Engeler Lemma, particularly with X = {0, 1}, is in many respects similar to the Compactness Principle of Propositional Logic, which is (UF16) in 14.61. In fact, as Rav [1977] points out, the Cowen-Engeler Lemma is a sort of combinatorial, non-logicians' version of (UF16); the Cowen-Engeler Lemma can often be used in place of (UF16) but does not require any knowledge of formal logic.

Proof of (UF1) $\Rightarrow$ (UF2). This proof is modified from arguments of Rav [1977] and Luxemburg [1962]. Let Fin($\Lambda$) = {finite subsets of $\Lambda$}. For each $S \in \mathrm{Fin}(\Lambda)$, let $\Phi_S = \{f \in \Phi : \mathrm{Dom}(f) \supseteq S\}$. Then $\Phi_S$ is nonempty, by hypothesis (ii). Since $\Phi_S \cap \Phi_T = \Phi_{S \cup T}$, the collection of sets $\{\Phi_S : S \in \mathrm{Fin}(\Lambda)\}$ has the finite intersection property. By (UF1), there exists a (not necessarily unique) ultrafilter $\mathcal{U}$ on $\Phi$ that includes $\{\Phi_S : S \in \mathrm{Fin}(\Lambda)\}$.

To define $\varphi : \Lambda \to X$, temporarily fix any $\lambda \in \Lambda$. Note that $\Phi(\lambda) = \{f(\lambda) : f \in \Phi_{\{\lambda\}}\}$. The sets $\{f \in \Phi_{\{\lambda\}} : f(\lambda) = x\}$ (for $x \in \Phi(\lambda)$) are disjoint and their union is $\Phi_{\{\lambda\}}$, which is a member of the ultrafilter $\mathcal{U}$. By 5.7.b and 5.8(E), precisely one of the x's in $\Phi(\lambda)$ satisfies $\{f \in \Phi_{\{\lambda\}} : f(\lambda) = x\} \in \mathcal{U}$. Let that x be denoted by $\varphi(\lambda)$. Thus, we define a function $\varphi : \Lambda \to X$, satisfying
$$\{f \in \Phi_{\{\lambda\}} : f(\lambda) = \varphi(\lambda)\} \in \mathcal{U} \quad \text{for all } \lambda \in \Lambda.$$
The Ultrafilter Principle
It suffices to show that $\varphi \in \Phi$. Let any $S \in \mathrm{Fin}(\Lambda)$ be given. Since $\Phi$ has finite character, it suffices to show that $\varphi$ agrees on S with some $f \in \Phi$. The set
$$\Psi = \bigcap_{\lambda \in S} \{f \in \Phi_{\{\lambda\}} : f(\lambda) = \varphi(\lambda)\}$$
is also an element of $\mathcal{U}$, hence a nonempty subset of $\Phi$. Now any $f \in \Psi$ will do.
6.36. Exercise. Show that (UF2) implies the Axiom of Choice for Finite Sets, which was stated in 6.15 as (ACF).
Hint: Use the Finite Axiom of Choice (6.14).

6.37. Marriage Theorems. Let $\{S_\gamma : \gamma \in \Gamma\}$ be a collection of sets. Assume either
(i) $\Gamma$ is finite (for P. Hall's Theorem), or
(ii) each $S_\gamma$ is finite (for M. Hall's Theorem).
Then the following two conditions are equivalent:
[...] $x \in \prod_{\gamma \in \Gamma} S_\gamma$. [...] $\mathrm{card}\big(\bigcup_{\gamma \in F} S_\gamma\big) \ge \mathrm{card}(F)$ for each finite set F [...]

[...] 0, there exists an integer $N = N(\varepsilon)$ such that $n \ge N \Rightarrow d(x, x_n) < \varepsilon$. We then write $x_n \to x$ or $x = \lim_{n \to \infty} x_n$.
7.2. Chapter overview. Much of analysis can be formulated in terms of convergence of sequences in metric spaces, but occasionally we need greater generality. Nets are a generalization of sequences. A sequence is a function whose domain is $\mathbb{N}$; a net (or "generalized sequence") is a function whose domain is any directed set D. Most of this chapter can be postponed; it will not be needed until much later in this book. A convergence space is a set X equipped with some rule that specifies which nets
Chapter 7: Nets and Convergences
- or equivalently,¹ which filters - converge to which "limits" in X. Analysts who are already familiar with convergent sequences in metric spaces should have little difficulty with convergent nets, for - as we shall see in this chapter - nets and convergence spaces are natural generalizations of sequences in metric spaces.

The chart at the beginning of this chapter shows the relations between some of the main types of convergences we shall consider in this book. In later chapters we shall be primarily concerned with topological convergences - and, to a much smaller degree, order convergences. The other kinds of convergences - Hausdorff, pretopological, first countable, etc. - are introduced here mainly to give a clearer understanding of the basic properties of topological and order convergences.

Nets are particularly helpful for understanding topologies that are known to be nonmetrizable - e.g., the weak topology of an infinite-dimensional normed vector space - or understanding topologies that are not known to be metrizable. But nets are also occasionally useful in metric spaces; two examples of this are the proof of Caristi's Theorem given in 19.45 and the explanation of Riemann integrals given in 24.7.

One very important order convergence that is not topological is the convergence almost everywhere of $[-\infty, +\infty]$-valued random variables over a positive measure; this topic is considered briefly in 21.43. Other nontopological order convergences are important in the study of vector lattices, but that subject is not studied in great depth in this book. We are more concerned with order convergences that are topological. For instance, the order convergence and the topological convergence in $\mathbb{R}$ are identical, but the order viewpoint and the topological viewpoint yield different kinds of information about that convergence.

Nets are an aid to the intuition and to the process of discovery, but they are not always essential; many proofs involving nets can be rewritten so that nets are not mentioned. Some researchers prefer to rewrite their proofs in that fashion: The original insight may thereby be obscured, but the result becomes readable by a wider audience since familiarity with nets is no longer required.

Although nets are used mainly for convergences, it is conceptually simpler to first study nets without regard to convergences - i.e., as devices for a modified sort of "counting," without any regard to limits. That is the subject of the first half of this chapter.
7.3. Review of directed sets. Before reading this chapter, it may be helpful to briefly review the introduction to filters in Sections 5.1 through 5.11. Also, recall from 3.8 the definition of directed set: It is a set X equipped with a relation [...]

[...] $\varphi(\beta) \succcurlyeq \alpha_0$.
(ii'') The $\mathbb{A}$-valued net $\varphi : \mathbb{B} \to \mathbb{A}$ is an Aarnes-Andenæs subnet of the identity map $i_{\mathbb{A}} : \mathbb{A} \to \mathbb{A}$.
c. Willard [1970] modified Kelley's definition slightly, adding a requirement of monotonicity; this may make the definition more palatable to many readers. We shall say $(y_\beta)$ is a Willard subnet of $(x_\alpha)$ if there exists a function $\varphi : \mathbb{B} \to \mathbb{A}$ such that
(i) $y = x \circ \varphi$ - that is, $y_\beta = x_{\varphi(\beta)}$ for all $\beta \in \mathbb{B}$;
(ii) $\varphi$ is monotone; that is, $\beta_1 \preccurlyeq \beta_2 \Rightarrow \varphi(\beta_1) \preccurlyeq \varphi(\beta_2)$; and
(iii) for each $\alpha_0 \in \mathbb{A}$ there is some $\beta_0 \in \mathbb{B}$ such that $\varphi(\beta_0) \succcurlyeq \alpha_0$.
7.16. Comparison of the definitions. a. Show that any Kelley subnet is also an Aarnes-Andenæs subnet.
The converse is not valid. For instance, each of the sequences (0, 5, 6, 7, 8, ...) and (1, 5, 6, 7, 8, ...) is an AA subnet of the other, but neither is a Kelley subnet of the other.
b. Show that any Willard subnet is also a Kelley subnet. The converse is not valid. For instance, each of the sequences (2, 1, 4, 3, 6, 5, ...) and (1, 2, 3, 4, 5, 6, ...) is a Kelley subnet of the other, but neither is a Willard subnet of the other.
c. Suppose $(x_\alpha : \alpha \in \mathbb{A})$ is a net in a set X and $\mathbb{F}$ is a frequent subset of the directed set $\mathbb{A}$. Then $\mathbb{F}$ is a directed set (see 7.10.f), and so $(x_\alpha : \alpha \in \mathbb{F})$ is a net. We shall say that it is a frequent subnet of the net $(x_\alpha : \alpha \in \mathbb{A})$. (In some of the literature, this is called a cofinal subnet.) Show that any frequent subnet is a Willard subnet (by using the inclusion map $i : \mathbb{F} \to \mathbb{A}$ for the map $\varphi$ in definition 7.15.c). The converse is not valid. For instance, show that (1, 1, 2, 2, 3, 3, ...) is a Willard subnet, but not a frequent subnet, of the sequence (1, 2, 3, ...). Frequent subnets cannot be used interchangeably with Willard, Kelley, or AA subnets; see 17.29.
d. Frequent subnets are a generalization of subsequences. Let $(x_m : m \in \mathbb{N})$ and $(y_n : n \in \mathbb{N})$ be two sequences. Show that $(y_n)$ is a subsequence of $(x_m)$ if and only if $(y_n)$ is a frequent subnet of $(x_m)$.
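The two counterexamples in 7.16.b and 7.16.c can be explored numerically. The following is only a sketch on a finite range of indices, so it illustrates the definitions rather than proving anything about the infinite sequences; the helper names `is_monotone` and `kelley_threshold` and the two index maps are hypothetical, chosen for this illustration.

```python
from math import ceil

def is_monotone(phi, indices):
    """Willard's condition (ii), checked only on a finite chain of indices."""
    return all(phi(b1) <= phi(b2) for b1, b2 in zip(indices, indices[1:]))

def kelley_threshold(phi, alpha0, indices):
    """Smallest beta0 in `indices` such that phi(beta) >= alpha0 for every
    later beta in `indices`; returns None if no such beta0 is found."""
    for i, beta0 in enumerate(indices):
        if all(phi(b) >= alpha0 for b in indices[i:]):
            return beta0
    return None

x = lambda n: n                                    # the sequence (1, 2, 3, ...)
phi_double = lambda n: ceil(n / 2)                 # index map giving (1, 1, 2, 2, ...)
phi_swap = lambda n: n + 1 if n % 2 else n - 1     # index map giving (2, 1, 4, 3, ...)

indices = list(range(1, 21))                       # finite window of indices
doubled = [x(phi_double(n)) for n in indices[:6]]
swapped = [x(phi_swap(n)) for n in indices[:6]]

print(doubled, is_monotone(phi_double, indices))   # monotone index map: Willard-style
print(swapped, is_monotone(phi_swap, indices))     # not monotone
print(kelley_threshold(phi_swap, 5, indices))      # yet phi_swap is eventually >= 5
```

On this window the swap map fails monotonicity but still eventually dominates each fixed index, which is the behaviour the Kelley condition asks for.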
7.17. Further elementary properties. a. Composition of subnets. If $(z_\gamma)$ is a subnet of $(y_\beta)$, and $(y_\beta)$ is a subnet of $(x_\alpha)$, then $(z_\gamma)$ is a subnet of $(x_\alpha)$. If the two given subnets are Kelley subnets, Willard subnets, or frequent subnets, then $(z_\gamma)$ is the same type of subnet of $(x_\alpha)$.
b. Suppose that $(x_\alpha : \alpha \in \mathbb{A})$ is a net in a set X and $(x_\alpha)$ is eventually in some set of the form $E = E_1 \cup E_2 \cup \cdots \cup E_n \subseteq X$. Then there is at least one j such that $(x_\alpha)$ is frequently in $E_j$. Thus $(x_\alpha)$ has a frequent subnet that takes all its values in $E_j$.
c. Definition. Two nets have the same eventuality filter if and only if each net is a subnet of the other. We shall then say the nets are AA-equivalent, or simply equivalent.
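7.18. Lemma on Common Subnets. Let $(u_\alpha : \alpha \in \mathbb{A})$, $(v_\beta : \beta \in \mathbb{B})$, and $(w_\gamma : \gamma \in \mathbb{C})$ be three nets taking values in a set X. Say the nets have eventuality filters $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{H}$, respectively. Then the following conditions are equivalent:
(A) $F \cap G \cap H$ is nonempty, for every $F \in \mathcal{F}$, $G \in \mathcal{G}$, $H \in \mathcal{H}$.
(B) $\mathcal{M} = \{S \subseteq X : S \supseteq F \cap G \cap H \text{ for some } F \in \mathcal{F},\ G \in \mathcal{G},\ H \in \mathcal{H}\}$ is a proper filter.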
(C) The three filters have a common proper superfilter - i.e., there exists a proper filter which contains all three given filters.
(D) The three nets have a common AA subnet - i.e., there exists a net $(p_\lambda)$ which is an AA subnet of each of the given nets.
(E) The three given nets have a common Willard subnet - i.e., there exists a net $(p_\lambda : \lambda \in \mathbb{L})$ which is a Willard subnet of each of the three given nets. (It is understood that three different functions are used for the monotone mappings $\varphi$ from $\mathbb{L}$ into $\mathbb{A}$, $\mathbb{B}$, and $\mathbb{C}$.)
Furthermore, that net can be chosen so that it is a maximal common AA subnet of the three given nets - i.e., so that if $(q_\mu)$ is any common AA subnet of the three given nets, then $(q_\mu)$ is also an AA subnet of $(p_\lambda)$.
Note. We have stated the lemma in terms of three nets and three filters to display a typical case. The number 3 may be replaced by any positive integer.
Proof of lemma. The equivalence of (C) and (D) is immediate from our correspondence between AA subnets and superfilters. The implications (C) $\Rightarrow$ (A) $\Rightarrow$ (B) $\Rightarrow$ (C) are easy; the implication (E) $\Rightarrow$ (D) is trivial. It suffices to show that (A)-(D) together imply (E). Note that the filter $\mathcal{M}$ in condition (B) is a minimum common superfilter - i.e., it is the smallest filter containing all of the given filters. Any net corresponding to it is a maximal common AA subnet of the three given nets. It suffices to exhibit a net $(p_\lambda : \lambda \in \mathbb{L})$ whose eventuality filter is $\mathcal{M}$, such that $(p_\lambda : \lambda \in \mathbb{L})$ is a Willard subnet of each of the three given nets. For each $(a, b, c) \in \mathbb{A} \times \mathbb{B} \times \mathbb{C}$, define
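$T_{a,b,c} = \{u_\alpha : \alpha \succcurlyeq a\} \cap \{v_\beta : \beta \succcurlyeq b\} \cap \{w_\gamma : \gamma \succcurlyeq c\} = \{x \in X : x = u_\alpha = v_\beta = w_\gamma \text{ for some } \alpha \succcurlyeq a,\ \beta \succcurlyeq b,\ \gamma \succcurlyeq c\}.$
Then $T_{a,b,c}$ is nonempty, by condition (A). Hence
$\mathbb{L} = \{(\alpha, \beta, \gamma) \in \mathbb{A} \times \mathbb{B} \times \mathbb{C} : u_\alpha = v_\beta = w_\gamma\}$
is a frequent subset of $\mathbb{A} \times \mathbb{B} \times \mathbb{C}$, when $\mathbb{A} \times \mathbb{B} \times \mathbb{C}$ is given the product ordering. For each $\lambda = (\alpha, \beta, \gamma)$ in $\mathbb{L}$, define $p_\lambda = u_\alpha = v_\beta = w_\gamma$; the remaining verifications are easy. For the monotone mappings $\varphi$ from $\mathbb{A} \times \mathbb{B} \times \mathbb{C}$ into $\mathbb{A}$, $\mathbb{B}$, $\mathbb{C}$, use the coordinate projections.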
7.19. Corollary on equivalent subnets. If $(y_\beta)$ is an AA subnet of $(x_\alpha)$, then $(y_\beta)$ is equivalent (in the sense of 7.17.c) to a Willard subnet of $(x_\alpha)$.
Hints: The two given nets have a common AA subnet - namely, $(y_\beta)$. As in 7.18(E), let $(p_\lambda)$ be a common Willard subnet and also a maximal common AA subnet of the two given nets. Since $(y_\beta)$ has the property for which $(p_\lambda)$ is maximal, $(y_\beta)$ is an AA subnet of $(p_\lambda)$. Thus $(y_\beta)$ and $(p_\lambda)$ are subnets of each other.
Remarks. A similar result is given by Gähler [1977]. We have seen that every Willard subnet is a Kelley subnet, every Kelley subnet is an AA subnet, and every AA subnet is equivalent to a Willard subnet. Consequently, the three types of subnets can be used interchangeably in many contexts. See especially 15.38.
7.20. Though AA subnets are simpler than Kelley subnets in most respects, Kelley subnets do have at least one advantage, which we now present in two formulations:
(1) Suppose that $f : X \to V$ is some function, $(x_\alpha : \alpha \in \mathbb{A})$ is some net in X, and $(y_\beta : \beta \in \mathbb{B})$ is some Kelley subnet of the net $(f(x_\alpha) : \alpha \in \mathbb{A})$ in V. Then $(x_\alpha : \alpha \in \mathbb{A})$ has a Kelley subnet $(s_\beta : \beta \in \mathbb{B})$ in X such that $f(s_\beta) = y_\beta$ for each $\beta$. (Indeed, if $y_\beta = f(x_{\varphi(\beta)})$, take $s_\beta = x_{\varphi(\beta)}$.)
(2) Suppose that $((u_\alpha, v_\alpha) : \alpha \in \mathbb{A})$ is a net in some product of sets $U \times V$; then $(v_\alpha : \alpha \in \mathbb{A})$ is some net in V. Suppose that $(y_\beta : \beta \in \mathbb{B})$ is some Kelley subnet of the net $(v_\alpha : \alpha \in \mathbb{A})$ in V. Then $((u_\alpha, v_\alpha) : \alpha \in \mathbb{A})$ has a Kelley subnet $((p_\beta, q_\beta) : \beta \in \mathbb{B})$ such that $q_\beta = y_\beta$ for each $\beta$. (Indeed, if $q_\beta = y_\beta = v_{\varphi(\beta)}$, take $p_\beta = u_{\varphi(\beta)}$.)
These are actually two formulations of the same principle. To see this, observe that if $f$, $(x_\alpha)$, $(y_\beta)$ are given as in (1), then we can reformulate the problem as in (2) by taking $X = U$ and $(u_\alpha, v_\alpha) = (x_\alpha, f(x_\alpha))$. Conversely, if we are given $(u_\alpha, v_\alpha)$ as in (2), then we can reformulate the problem as in (1) by taking $X = U \times V$ and $x_\alpha = (u_\alpha, v_\alpha)$, and letting $f : X \to V$ be the projection onto the second coordinate.
7.21. Some properties of nets are subnet hereditary, in the sense that if a net has the property, then so does every subnet. For instance, we shall see in later chapters that in a topological space, every subnet of a convergent net is convergent. Likewise, some properties are supernet hereditary, in the sense that if a net has the property, then so does every supernet. For instance, in a topological space, the property of not being convergent is supernet hereditary. Many proofs with nets involve such hereditary properties. Consequently, in many proofs it is possible to replace a given net with any convenient subnet, or with any convenient supernet.
Some proofs use the phrase "we may assume," particularly in connection with hereditary properties. In many cases, what this means is that by relabeling, we may replace the given net with some subnet or supernet that has an additional property of interest. See the related discussion in 1.10.
UNIVERSAL NETS

7.22. Definition. A universal net (also occasionally known as an ultranet) in a set X is a net $(x_\delta)$ with the property that for each set $S \subseteq X$, either (i) eventually $x_\delta \in S$ or (ii) eventually $x_\delta \in X \setminus S$.

7.23. Example. Let $(x_\delta)$ be a net in X. Assume $(x_\delta)$ is eventually constant; i.e., assume there exists some $z \in X$ such that eventually $x_\delta = z$. Then $(x_\delta)$ is a universal net. Although other universal nets exist, other explicit examples of universal nets do not exist! That is explained below.
7.24. Observations. A net $(x_\delta)$ is universal if and only if its eventuality filter is an ultrafilter. If a net is universal, then any AA-equivalent net is also universal; by 7.19, therefore, in the discussions below it does not matter whether we use Willard subnets, Kelley subnets, or AA subnets. A net $(x_\delta)$ is eventually equal to some constant x if and only if its eventuality filter is the fixed ultrafilter at x. Thus, the theory of universal nets is simply a reformulation of the theory of ultrafilters. The Ultrafilter Principle, introduced in 6.32, can be reformulated as
(UF3) Universal Subnet Theorem (Tukey, Kelley). Every net has a subnet that is universal.
Likewise, the Weak Ultrafilter Theorem, presented in 6.33, can be reformulated as
(WUF') Weak Universal Subnet Theorem. There exists a universal net in $\mathbb{N}$ that is not eventually constant.
As we remarked in 6.33, free ultrafilters are intangibles. The same is therefore also true of universal nets that are not eventually constant. Though we have no explicit examples of these peculiar nets, nevertheless they are useful conceptual tools for some kinds of reasoning.
7.25. Further properties of universal nets. a. If $(x_\alpha)$ is a universal net, then any subnet of $(x_\alpha)$ is AA-equivalent to $(x_\alpha)$ and is also universal.
b. If $(x_\alpha)$ is a universal net in X and $x_\alpha$ is frequently in some set S, then $x_\alpha$ is eventually in S.
a. No, if we use Kelley subnets. Indeed, take $\mathbb{A}$ ... $\succcurlyeq (m_{11} + 1, m_{22} + 1, m_{33} + 1, \ldots)$.
b. Yes, if we use Aarnes-Andenæs subnets and X is a finite set. Indeed, by 7.10.e there is at least one $x_0 \in X$ such that frequently $x_\alpha = x_0$. Then the constant sequence $(x_0, x_0, x_0, \ldots)$ is an AA subnet of $(x_\alpha)$.
c. No, if X is an infinite set, regardless of which type of subnets we use. Indeed, let $\mathcal{U}$ be any free ultrafilter on X. (The existence of such an ultrafilter was established in 6.33.) Let $(x_\alpha)$ be a corresponding net; thus $(x_\alpha)$ is a universal net that is not eventually constant. If some sequence $(y_m)$ is a subnet of $(x_\alpha)$, then $(y_m)$ has the same eventuality filter $\mathcal{U}$, hence $(y_m)$ is universal and not eventually constant - contradicting 7.25.f.
7.29. Theorem. Let X be a chain ordered set (e.g., the real line) . Then any sequence in X has a monotone subsequence.
Proof (Thurston [1994]). By a maximal element of a sequence we shall mean a maximal element of the range of that sequence. It is easy to see that if s is a sequence that has no maximal element, then s has an increasing subsequence. Now let $s = (x_1, x_2, x_3, \ldots)$ be a given sequence; we may assume that every subsequence of s has a maximal element. Let $x_{n(1)}$ be a maximal element of s. Let $x_{n(2)}$ be a maximal element of $(x_{n(1)+1}, x_{n(1)+2}, x_{n(1)+3}, \ldots)$. Let $x_{n(3)}$ be a maximal element of $(x_{n(2)+1}, x_{n(2)+2}, x_{n(2)+3}, \ldots)$. Continuing in this fashion, we obtain positive integers $n(1) < n(2) < n(3) < \cdots$ satisfying $x_{n(1)} \geq x_{n(2)} \geq x_{n(3)} \geq \cdots$.
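The selection of successive tail maxima in the proof can be sketched on a finite list, where every tail automatically has a maximal element. This is only an illustration of the second half of the argument, not of the theorem for infinite sequences; the function name `tail_maxima_indices` and the sample data are hypothetical.

```python
def tail_maxima_indices(xs):
    """Return indices n(1) < n(2) < ... where x[n(k+1)] is a maximal element
    of the tail that starts just after position n(k)."""
    indices = []
    start = 0
    while start < len(xs):
        # max() with a key returns the first index attaining the maximum
        n = max(range(start, len(xs)), key=lambda i: xs[i])
        indices.append(n)
        start = n + 1
    return indices

xs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
picked = tail_maxima_indices(xs)
values = [xs[i] for i in picked]
print(picked, values)  # the chosen values form a decreasing (non-increasing) subsequence
```

Each chosen value dominates everything after it in the list, so in particular it dominates the next chosen value.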
CONVERGENCE SPACES

7.30. By a convergence space (or limit space) we shall mean a set X equipped with a function
lim : {proper filters on X} $\to$ {subsets of X}.
Any function can be used for lim in this definition, but in most cases of interest the function is determined by some structure already given on X - a topology, an ordering, a measure, etc. We emphasize that the value of lim is a subset of X. In some convergence spaces (e.g., the one used in college calculus), the set $\lim \mathcal{F}$ contains at most one point of X; such convergence spaces are discussed further in 7.36.
7.31. Whenever (X, lim) is a convergence space, then we shall extend the function lim in the following ways:
(a) If $\mathcal{B}$ is a filterbase on X, then $\lim \mathcal{B} = \lim \mathcal{F}$, where $\mathcal{F}$ is the filter generated by $\mathcal{B}$.
(b) If $(x_\alpha)$ is a net in X, then $\lim (x_\alpha) = \lim \mathcal{F}$, where $\mathcal{F}$ is the eventuality filter of $(x_\alpha)$.
Note that the resulting "function" lim : {nets in X} $\to$ {subsets of X} is not a function strictly in the sense of 1.31, since {nets in X} is a proper class, not a set. Note, also, that this function satisfies the following condition:
(*) if $(x_\alpha)$ and $(y_\beta)$ are nets with the same eventuality filter (i.e., if $(x_\alpha)$ and $(y_\beta)$ are AA-equivalent), then the set of limits of $(x_\alpha)$ is equal to the set of limits of $(y_\beta)$.
Conversely, we have this alternate definition: A convergence space is a set X that is equipped with some function lim : {nets in X} $\to$ {subsets of X} that satisfies (*).
Indeed, if (*) is satisfied, then (b) defines a corresponding limit function on the collection of all proper filters on X. (In many applications, we verify (*) by verifying a stronger property described below in 7.34.b.) Thus, in convergence spaces we may use nets and their eventuality filters interchangeably (and use AA subnets and superfilters interchangeably, as well). Each type of object has its advantages, as we noted in 7.13.
Remarks. For more general theories of convergences than those considered in this book, see: Bentley, Herrlich, and Lowen-Colebunders [1970] for categories of convergence spaces; Dolecki and Greco [1986] for algebraic properties of collections of convergence structures; and Gähler [1984] for "convergence spaces" that are more general than "filter convergence spaces." In Kelley's [1955/1975] book, net convergences are considered in great generality, without condition (*) being imposed a priori; see the remarks in 15.10.
7.32. More notations. If $\mathcal{F}$ is a proper filter or a net, the expression $z \in \lim \mathcal{F}$ will be read as "z is a limit of $\mathcal{F}$." It may also be written as $\mathcal{F} \to z$ and read as "$\mathcal{F}$ converges to z." The statement "$\mathcal{F}$ does not converge to z" may be written as $z \notin \lim \mathcal{F}$, or as $\mathcal{F} \not\to z$. Many variants on these notations can be used for clarification. For instance, for a net $(x_\alpha : \alpha \in \mathbb{A})$, the expression $x_\alpha \to z$ may also be written as "$x_\alpha \to z$ in X as $\alpha$ increases in $\mathbb{A}$." When two or more convergences are being considered, we may use a prefix or subscript or superscript to distinguish them; for instance, we may write
$z \in \mathcal{T}\text{-}\lim x_\alpha$ or $z \in \lim_{\mathcal{T}} x_\alpha$ or $x_\alpha \xrightarrow{\mathcal{T}} z$
to indicate that z is a limit of the net $(x_\alpha)$ when we use the convergence function determined by some structure $\mathcal{T}$, rather than some other structure $\mathcal{S}$. Other variants on the notation should be clear from the context; we shall not attempt to list them all here.
7.33. Let $p : X \to Y$ be a mapping from one convergence space into another. We shall say p is convergence preserving if it has this property:
whenever $(x_\alpha)$ is a net converging to a limit x in X, then the net $(p(x_\alpha))$ converges to $p(x)$ in Y
or, equivalently,
whenever $\mathcal{F}$ is a filter converging to a limit x in X, then the filter $\{S \subseteq Y : p^{-1}(S) \in \mathcal{F}\}$ converges to $p(x)$ in Y.
(Exercise. Prove the equivalence.) Observe that the composition of two convergence preserving maps is convergence preserving; this is discussed further in 9.7.

7.34. Definitions. Most convergence spaces of interest satisfy both of the properties below; in fact, these properties are satisfied by all the convergence spaces that we shall consider in this book. (Some mathematicians make one or both of these properties a part of their definition of convergence space.)
a. A convergence space is centered if it has the property that
if $\mathcal{U}_z$ is the ultrafilter fixed at z, then $\mathcal{U}_z \to z$,
or, equivalently,
if $(x_\alpha)$ is a net such that eventually $x_\alpha = z$, then $x_\alpha \to z$.
b. A convergence space is isotone if it has this property:
if $\mathcal{G}$ is a superfilter of $\mathcal{F}$, and $\mathcal{F} \to z$, then $\mathcal{G} \to z$
or, equivalently,
if $(y_\beta)$ is a subnet of $(x_\alpha)$, and $x_\alpha \to z$, then $y_\beta \to z$.
In the last sentence, it does not matter which type of subnet we use - Willard, Kelley, or AA - since we have built condition (*) of 7.31 into our definition of convergence space. On the other hand, for AA subnets the isotonicity condition above implies condition (*) of 7.31.
7.35. Exercise. Let X be an isotone convergence space. If $(x_\alpha)$ is a universal net and some subnet of $(x_\alpha)$ converges to z, then $x_\alpha \to z$ also.

7.36. A convergence space (X, lim) is Hausdorff if each net or proper filter F has at most one limit - i.e., if each set of the form $\lim F$ contains at most one member. When (X, lim) is a Hausdorff convergence space, then $z \in \lim F$ may be rewritten as $z = \lim F$; we say that z is the limit of F. (Now the notation should begin to look more like that of college calculus.) In effect, our original limit function - which took values in {subsets of X} - is replaced by a new function, again denoted by "lim," which takes values in X. Thus, we are not asserting that $z = \{z\}$. The distinction between the two different lim functions should be clear in most contexts and should not cause any confusion. Most convergence spaces or topological spaces in applications are Hausdorff, and so some mathematicians incorporate the Hausdorff condition into other definitions - e.g., they make it a part of their definition of convergence space, compact space, gauge space, completely regular space, topological linear space, or locally convex space. We shall not
follow that practice, for many of the concepts in this book are revealed more clearly if Hausdorffness is treated as a separate property. It is often helpful to analyze Hausdorff spaces in terms of other, simpler spaces that are not Hausdorff (see 15.25.d). Throughout this text, Hausdorffness will be assumed only when stated explicitly.
More notation. If X and Y are convergence spaces and Y is Hausdorff, then the equation
$\lim_{x \to x_0} f(x) = y_0$
is a condition on $x_0$, $y_0$, and $f$, with the following meaning: Whenever $(x_\alpha)$ is a net in $X \setminus \{x_0\}$ that converges in X to $x_0$, then $f(x_\alpha) \to y_0$ in Y. Most limits in college calculus are of this form - in some cases with $x_0$ or $y_0$ equal to $\infty$. Making $\infty$ a member of our convergence space is not particularly difficult; see the discussions in 5.15.f, 5.15.g, 18.24.
CONVERGENCE IN POSETS

7.37. Remarks. The two most important kinds of convergences are the topological convergences, studied in Chapter 15, and the order convergences, studied in the remainder of this chapter. The most important type of order convergence needed by analysts is the order convergence in $\mathbb{R}$; that special case should be kept in mind by the reader at all times throughout the remainder of this chapter. However, many of the basic properties of order convergence in $\mathbb{R}$ generalize readily to other settings that are occasionally useful. Thus, we begin our study of order convergence in a setting that has as few hypotheses as possible: the setting of partially ordered sets.

7.38. The literature contains several different, inequivalent definitions of convergence in partially ordered sets. The following one works best for our purposes, despite its complexity. It can be restated in other ways that are sometimes more convenient; see 7.40.d and, in special contexts, 7.41 and 7.45.
Definition. Let $(X, \preccurlyeq)$ be a poset. Let $z \in X$, and let $(x_\alpha : \alpha \in \mathbb{A})$ be a net in X. We shall say that $(x_\alpha)$ is order convergent to z (sometimes written $x_\alpha \xrightarrow{o} z$) if there exist nonempty sets $S, T \subseteq X$ such that $(S, \preccurlyeq)$ and $(T, \succcurlyeq)$ are directed sets, sup(S) and inf(T) both exist in X and are equal to z, and for each fixed $s \in S$ and $t \in T$ we have eventually $s \preccurlyeq x_\alpha \preccurlyeq t$. (We emphasize that T is to be a directed set when we reverse the restriction of the given ordering. Thus, each finite subset of S must have an upper bound in S, and each finite subset of T must have a lower bound in T.)
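7.39. Definitions. Let $(x_\alpha : \alpha \in \mathbb{A})$ be a net taking values in a partially ordered set $(X, \preccurlyeq)$. We say that $(x_\alpha)$ is increasing if $\alpha \preccurlyeq \beta \implies x_\alpha \preccurlyeq x_\beta$. This may be abbreviated $x_\alpha \uparrow$. We say that $(x_\alpha)$ increases to a limit z, denoted $x_\alpha \uparrow z$, if in addition $z = \sup\{x_\alpha : \alpha \in \mathbb{A}\}$. Analogously, a net $(x_\alpha : \alpha \in \mathbb{A})$ is decreasing (written $x_\alpha \downarrow$) if $\alpha \preccurlyeq \beta \implies x_\alpha \succcurlyeq x_\beta$; the net decreases to a limit z (written $x_\alpha \downarrow z$) if in addition $z = \inf\{x_\alpha : \alpha \in \mathbb{A}\}$. A net is monotone if it is increasing or decreasing.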
7.40. Exercises. Let $(x_\alpha : \alpha \in \mathbb{A})$ be a net in a poset $(X, \preccurlyeq)$, and let $z \in X$. Then:
a. $x_\alpha \uparrow z$ if and only if $(x_\alpha)$ is increasing and $x_\alpha \xrightarrow{o} z$ (in the sense of 7.38).
b. $x_\alpha \downarrow z$ if and only if $(x_\alpha)$ is decreasing and $x_\alpha \xrightarrow{o} z$ (in the sense of 7.38).
c. In a complete lattice, any monotone net converges.
d. Order convergence in terms of monotone convergence: $x_\alpha \xrightarrow{o} z$ (defined as in 7.38) if and only if
there exist nets $(u_\beta : \beta \in \mathbb{B})$ and $(v_\gamma : \gamma \in \mathbb{C})$ such that $u_\beta \uparrow z$ and $v_\gamma \downarrow z$, and for each fixed $\beta$ and $\gamma$ we have $\alpha$-eventually $x_\alpha \in \{x : u_\beta \preccurlyeq x \preccurlyeq v_\gamma\}$.
Hints: For the "if" part, let S and T be the ranges of those nets $(u_\beta)$ and $(v_\gamma)$. For the "only if" part, let $(u_\beta)$ and $(v_\gamma)$ be given by the identity maps on the sets $\mathbb{B} = S$ and $\mathbb{C} = T$.
e. Order convergence is centered and isotone.
f. Convergence preserves inequalities. Suppose $(x_\alpha : \alpha \in \mathbb{A})$ and $(y_\alpha : \alpha \in \mathbb{A})$ are nets based on the same directed set, satisfying $x_\alpha \preccurlyeq y_\alpha$ for all $\alpha$. If $x_\alpha \xrightarrow{o} x_\infty$ and $y_\alpha \xrightarrow{o} y_\infty$, then $x_\infty \preccurlyeq y_\infty$. Hint: Let $S^x$ and $T^x$ be two sets that satisfy the conditions in 7.38 that define the convergence $x_\alpha \xrightarrow{o} x_\infty$. Also, let $S^y$ and $T^y$ be two sets that satisfy the conditions in 7.38 that define the convergence $y_\alpha \xrightarrow{o} y_\infty$. Fix any $s^x \in S^x$ and $t^y \in T^y$; then we have eventually $s^x \preccurlyeq x_\alpha \preccurlyeq y_\alpha \preccurlyeq t^y$, and thus $s^x \preccurlyeq t^y$. Use that fact to prove that $\sup(S^x) \preccurlyeq \inf(T^y)$.
g. Order convergence is Hausdorff. Thus, the statement $x_\alpha \xrightarrow{o} z$ may be rewritten as $z = o\text{-}\lim x_\alpha$. Hint: Apply the preceding result with $x_\alpha = y_\alpha$.
h. Let $(X, \preccurlyeq)$ and $(Y, \preccurlyeq)$ be posets. (Here we use the same symbol $\preccurlyeq$ for two different partial orderings.) Let $f : X \to Y$ be some function that is sup-preserving and inf-preserving (see 3.22). Then f is also convergence-preserving (see 7.33), if X and Y are equipped with their order convergences. Hint: First show that f preserves the convergence of monotone sequences - i.e., the convergences described in 7.39; then use 7.40.d. Remark: The assumptions cannot be weakened substantially; in 15.45 we give a partial converse.
i. The "squeeze theorem." Suppose $(x_\alpha : \alpha \in \mathbb{A})$, $(y_\alpha : \alpha \in \mathbb{A})$, $(z_\alpha : \alpha \in \mathbb{A})$ are nets based on the same directed set, satisfying $x_\alpha \preccurlyeq y_\alpha \preccurlyeq z_\alpha$ for all $\alpha$. If $x_\alpha \xrightarrow{o} w$ and $z_\alpha \xrightarrow{o} w$, then also $y_\alpha \xrightarrow{o} w$. (Remark. Compare with 26.52(E).)
7.41. Theorem on convergence in chains. Let $(X, \leq)$ be a chain. Let $(x_\alpha : \alpha \in \mathbb{A})$ be a net in X, and let $z \in X$. Then $x_\alpha \xrightarrow{o} z$ (that is, order convergence, as defined in 7.38 or characterized in 7.40.d) if and only if these two conditions are satisfied for all $\sigma$ and $\tau$ in X:
(i) if $z < \tau$, then eventually $x_\alpha < \tau$;
(ii) if $z > \sigma$, then eventually $x_\alpha > \sigma$.
Remarks. Note that condition (i) is satisfied vacuously (i.e., for free) if z happens to be the largest element of X, for then there is no element $\tau$ that satisfies $z < \tau$. Likewise, condition (ii) is satisfied vacuously if z happens to be the smallest element of X. Considering the examples of $[-\infty, +\infty]$, $[0, +\infty)$, $\mathbb{R}$, we see that some chains have both a largest and smallest element, some chains have one or the other, and some chains have neither.
Proof of equivalence. It is an easy exercise that order convergence implies conditions (i) and (ii); we omit the details. Conversely, assume that $(x_\alpha)$ and z satisfy condition (i) above; we shall find a set T satisfying the conditions of 7.38. (Forming S from (ii) is similar.) If the net $(x_\alpha)$ satisfies eventually $x_\alpha \leq z$, then the singleton $T = \{z\}$ satisfies the requirements for 7.38, and we are done. Assume, therefore, that the net $(x_\alpha)$ does not satisfy eventually $x_\alpha \leq z$. Let $T = \{t \in X : t > z\}$; we shall show that this set satisfies the requirements. We have frequently $x_\alpha \in T$, and so T is nonempty. From condition (i) we see that for each $t \in T$, eventually $x_\alpha < t$. It suffices to show that $z = \inf(T)$. Clearly z is a lower bound for T; we must show that it is larger than any other lower bound. Suppose, on the contrary, that $z'$ is a lower bound for T and $z' > z$. Then $z'$ is actually a member of T, and thus $z'$ is the smallest element of T. That is, z and $z'$ are adjacent in the ordering - i.e., there is no other element of X between z and $z'$. Since $z' \in T$, we have eventually $x_\alpha < z'$ and thus eventually $x_\alpha \leq z$, a contradiction. This completes the proof.
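7.42. Proposition (optional). Suppose X is an infinitely distributive lattice (as defined in 4.23). Then the lattice operations $\vee$, $\wedge$ are "jointly continuous," in the following sense: If $((x_\alpha, x'_\alpha) : \alpha \in \mathbb{A})$ is a net in $X \times X$ with $x_\alpha \xrightarrow{o} x$ and $x'_\alpha \xrightarrow{o} x'$, then $x_\alpha \vee x'_\alpha \xrightarrow{o} x \vee x'$ and $x_\alpha \wedge x'_\alpha \xrightarrow{o} x \wedge x'$.

Proof. This argument follows Vulikh [1967]. We shall show $x_\alpha \vee x'_\alpha \xrightarrow{o} x \vee x'$; the result for meets is proved analogously. By assumption, there exist nets $(u_\lambda : \lambda \in L)$, $(v_\mu : \mu \in M)$, $(u'_\sigma : \sigma \in S)$, $(v'_\tau : \tau \in T)$ such that
$u_\lambda \uparrow x$, $v_\mu \downarrow x$, $u'_\sigma \uparrow x'$, $v'_\tau \downarrow x'$,
and for each fixed $\lambda, \mu, \sigma, \tau$ we have $\alpha$-eventually $u_\lambda \preccurlyeq x_\alpha \preccurlyeq v_\mu$ and $u'_\sigma \preccurlyeq x'_\alpha \preccurlyeq v'_\tau$. Let $L \times S$ and $M \times T$ have the product orderings. Define
$\bar{u}_{\lambda,\sigma} = u_\lambda \vee u'_\sigma$, $\bar{v}_{\mu,\tau} = v_\mu \vee v'_\tau$.
Then for each fixed $\lambda, \mu, \sigma, \tau$ we have $\alpha$-eventually $\bar{u}_{\lambda,\sigma} \preccurlyeq x_\alpha \vee x'_\alpha \preccurlyeq \bar{v}_{\mu,\tau}$. By infinite distributivity,
$\inf_{(\mu,\tau)} \bar{v}_{\mu,\tau} = \left(\inf_{\mu \in M} v_\mu\right) \vee \left(\inf_{\tau \in T} v'_\tau\right) = x \vee x'.$
Hence $\bar{u}_{\lambda,\sigma} \uparrow (x \vee x')$ and $\bar{v}_{\mu,\tau} \downarrow (x \vee x')$, so $x_\alpha \vee x'_\alpha \xrightarrow{o} x \vee x'$.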
CONVERGENCE IN COMPLETE LATTICES

7.43. Remarks on applicability of the theory. When (X, ...

... $\{S_\lambda : \lambda \in \Lambda\}$ be a collection of subgroups of an additive group X. Their sum, $\sum_{\lambda \in \Lambda} S_\lambda$, is defined to be the set of all sums of finitely many elements of $\bigcup_{\lambda \in \Lambda} S_\lambda$. In other words, it is the set of all sums of the form
$s_1 + s_2 + \cdots + s_n$
where n is a nonnegative integer and each $s_j$ is a member of some $S_\lambda$. Show that
(i) $\sum_{\lambda \in \Lambda} S_\lambda$ is the union of sums of finitely many of the $S_\lambda$'s.
(ii) $\sum_{\lambda \in \Lambda} S_\lambda$ is the subgroup of X generated by the set $\bigcup_{\lambda \in \Lambda} S_\lambda$.
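Claim (ii) can be checked by brute force in a small finite group. The sketch below assumes the additive group $\mathbb{Z}/12$ and two hypothetical subgroups chosen only for illustration; in a finite group, closing under addition of generators already yields inverses, which is why the helper `generated_subgroup` needs no separate negation step.

```python
M = 12  # work in the additive group Z/12 (an assumption for this example)

def generated_subgroup(gens, m=M):
    """Subgroup of Z/m generated by `gens`, found by closing {0} under
    repeated addition of generators."""
    elems = {0}
    frontier = [0]
    while frontier:
        a = frontier.pop()
        for g in gens:
            b = (a + g) % m
            if b not in elems:
                elems.add(b)
                frontier.append(b)
    return elems

S1 = {0, 4, 8}   # multiples of 4
S2 = {0, 6}      # multiples of 6

# For two subgroups, the sum S1 + S2 is the set of all s1 + s2.
subgroup_sum = {(a + b) % M for a in S1 for b in S2}
generated = generated_subgroup(S1 | S2)

print(sorted(subgroup_sum))
print(subgroup_sum == generated)
```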
8.12. Let $S = S_1 + S_2 + \cdots + S_n$ be a sum of finitely many subgroups. The set S is called the internal direct sum of the $S_j$'s if it has this further property: Each $s \in S$ can be expressed in one and only one way as $s = s_1 + s_2 + \cdots + s_n$, where $s_j \in S_j$. We then write $S = S_1 \oplus S_2 \oplus \cdots \oplus S_n$ or $S = \bigoplus_{j=1}^{n} S_j$. Such a decomposition may be helpful, because it may express a complicated object S in terms of simpler $S_j$'s.
More generally, let $S = \sum_{\lambda \in \Lambda} S_\lambda$ be a sum of arbitrarily many subgroups. We say S is the internal direct sum of the $S_\lambda$'s, and write $S = \bigoplus_{\lambda \in \Lambda} S_\lambda$, if each $s \in S$ can be written in one and only one way as a sum $s = \sum_{\lambda \in \Lambda} s_\lambda$, where each $s_\lambda$ is a member of $S_\lambda$ and only finitely many of the $s_\lambda$'s are nonzero. (The internal direct sum is often called the "direct sum," but it should not be confused with the external direct sum described in 9.30.) If $S = \bigoplus_{\lambda \in \Lambda} S_\lambda$, then we can define mappings $\varphi_\lambda : S \to S_\lambda$ by the rule that $s = \sum_{\lambda \in \Lambda} \varphi_\lambda(s)$; we may call $\varphi_\lambda$ the projection onto $S_\lambda$. (The term "projection" also has other meanings; see 1.34 and 22.45.) Some basic properties of direct sum decompositions are
a. S is an internal direct sum of the subgroups $\{S_\lambda : \lambda \in \Lambda\}$ if and only if $S = \sum_{\lambda \in \Lambda} S_\lambda$ and $S_\mu \cap \sum_{\lambda \neq \mu} S_\lambda = \{0\}$ for each $\mu \in \Lambda$.
b. Each mapping $\varphi_\lambda$, considered as a map from S into itself, is idempotent (defined in 2.4); it has range $S_\lambda$.
c. Each $\varphi_\lambda$, considered as a map from S into either S or $S_\lambda$, is additive.
8.13. An important special case is that in which an additive group X itself is the internal direct sum of two subgroups - say S and T. Then we write $X = S \oplus T$. This means that
each $x \in X$ can be written in one and only one way in the form $s + t$, where $s \in S$ and $t \in T$,
or, equivalently, that
$S + T = X$ and $S \cap T = \{0\}$.
We shall then say that the subgroups S and T are additively complementary, or that they are additive complements of each other. (Some mathematicians would simply call these sets "complements" of each other, but in this book we have too many other uses for that term.)
Exercises. Suppose $X = S \oplus T$. Let $\varphi_S : X \to S$ and $\varphi_T : X \to T$ be the projections, defined as in 8.12 - that is, $x = \varphi_S(x) + \varphi_T(x)$ for each x. Show that $\varphi_S + \varphi_T = i_X$ (where $i_X$ is the identity map of X); that $\mathrm{Range}(\varphi_S) = \mathrm{Ker}(\varphi_T) = S$ and $\mathrm{Range}(\varphi_T) = \mathrm{Ker}(\varphi_S) = T$; and that $\varphi_S \varphi_T = \varphi_T \varphi_S = 0$.

Conversely, suppose X is an additive group and $p : X \to X$ is an idempotent homomorphism. Let $q = i_X - p$. Show that q is also idempotent, and $X = \mathrm{Range}(p) \oplus \mathrm{Range}(q)$.
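The converse statement can be tested on a small example. The sketch below assumes the additive group $\mathbb{Z}/6$ and the hypothetical idempotent homomorphism $p(x) = 3x$; both choices are made only for illustration, and the check is exhaustive over the six elements.

```python
M = 6
X = range(M)                      # the additive group Z/6 (an assumption)

p = lambda x: (3 * x) % M         # idempotent, since 9x = 3x (mod 6)
q = lambda x: (x - p(x)) % M      # q = i_X - p

p_idempotent = all(p(p(x)) == p(x) for x in X)
q_idempotent = all(q(q(x)) == q(x) for x in X)

range_p = {p(x) for x in X}
range_q = {q(x) for x in X}

def decompositions(x):
    """All ways to write x = s + t with s in Range(p), t in Range(q)."""
    return [(s, t) for s in range_p for t in range_q if (s + t) % M == x]

unique = all(len(decompositions(x)) == 1 for x in X)
print(sorted(range_p), sorted(range_q), p_idempotent, q_idempotent, unique)
```

Every element has exactly one decomposition, which is the defining property of the internal direct sum in 8.13.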
8.14. Let G be an additive group, and let H be a subgroup. Define sums of sets as in 8.3. The cosets of H are the sets $x + H = \{x + h : h \in H\}$. Note that any two cosets are either identical or disjoint; thus they form a partition of G. Show that
$(x + H) + (y + H) = (x + y) + H$, $-(x + H) = (-x) + H$.
Let $G/H$ be the set of all cosets of H; show that $G/H$ is an additive group with identity element $0 + H$ and with other operations defined as above. Since the cosets of H form a partition of G, they define an equivalence relation on G by: $g_1$ and $g_2$ are equivalent if and only if $g_1, g_2$ belong to the same coset. The cosets of H are the equivalence classes for this equivalence relation, and $G/H$ is the quotient set (as in 3.11). Consequently, the group $G/H$ is called the quotient group. The quotient map $\pi : G \to G/H$ (defined as in 3.11) is given by $\pi(g) = g + H$. It is a group homomorphism from G onto $G/H$. Note that it satisfies
$\pi(\pi^{-1}(B)) = B$ for any $B \subseteq G/H$,
whereas
$\pi^{-1}(\pi(A)) = A + H$ for any $A \subseteq G$.
Algebra books contain a more general theory of quotients, applicable to groups that are not necessarily commutative. However, that theory is more complicated and will not be needed for our purposes.
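The coset identities of 8.14 can be verified exhaustively in a small commutative group. The following sketch assumes $G = \mathbb{Z}/12$ and the subgroup $H = \{0, 4, 8\}$; these choices, and the helper names `coset`, `set_sum`, and `preimage_of_image`, are hypothetical and serve only as an illustration.

```python
M = 12
G = range(M)                       # the additive group Z/12 (an assumption)
H = frozenset({0, 4, 8})           # a subgroup of G

def coset(x):
    """The coset x + H, as a frozenset so it can be a set element."""
    return frozenset((x + h) % M for h in H)

def set_sum(A, B):
    """Sum of two subsets of G."""
    return frozenset((a + b) % M for a in A for b in B)

quotient = {coset(x) for x in G}   # the set G/H

# Cosets partition G: they are pairwise disjoint and cover G.
is_partition = (sum(len(c) for c in quotient) == M
                and frozenset().union(*quotient) == frozenset(G))

# (x + H) + (y + H) = (x + y) + H for all x, y.
sums_agree = all(set_sum(coset(x), coset(y)) == coset((x + y) % M)
                 for x in G for y in G)

def preimage_of_image(A):
    """pi^{-1}(pi(A)): the union of the cosets that meet A."""
    return frozenset().union(*(coset(a) for a in A))

A = frozenset({1, 2})
print(len(quotient), is_partition, sums_agree)
print(sorted(preimage_of_image(A)), sorted(set_sum(A, H)))
```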
8.15. Not every quotient group $G/H$ is isomorphic to a subgroup of G.

Example. The circle group [0, 1), introduced in 8.10.e, can also be described as the quotient of the additive group $\mathbb{R}$ by the subgroup $\mathbb{Z}$. The circle group is not isomorphic to a subgroup of $\mathbb{R}$. One easy way to show this is to note that 0 and $\frac{1}{2}$ are distinct solutions of $x + x = 0$ in [0, 1). In the group $\mathbb{R}$, the equation $x + x = 0$ has only one solution.

8.16. Not every subgroup of every group has an additive complement. (Contrast 11.30.f.)

Example. $\mathbb{Z}$ is a subgroup of $\mathbb{R}$, but there is no subgroup $G \subseteq \mathbb{R}$ satisfying $\mathbb{R} = \mathbb{Z} \oplus G$. Indeed, show that if G were such a group, it would be isomorphic to $\mathbb{R}/\mathbb{Z}$, and hence isomorphic to [0, 1), contradicting the result in 8.15.
8.17. Let $f : X \to Y$ be an additive mapping - i.e., a homomorphism of additive groups. Then the kernel of f is the set
$\mathrm{Ker}(f) = \{x \in X : f(x) = 0\}.$
A few of its basic properties are:
a. Ker(f) is a subgroup of X; hence $0 \in \mathrm{Ker}(f)$.
b. $\mathrm{Ker}(f) = \{0\}$ if and only if f is injective.
c. (Isomorphism Theorem.) Let $\pi : X \to X/\mathrm{Ker}(f)$ be the quotient map. Then $F(\pi(x)) = f(x)$ defines a group isomorphism $F : X/\mathrm{Ker}(f) \to \mathrm{Ran}(f)$.
d. Degenerate examples. Let X be any additive group. Then the identity map $i : X \to X$ has kernel $\{0\}$, and the constant map $x \mapsto 0$ (from X into any additive group) has kernel X.
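The Isomorphism Theorem in 8.17.c can be illustrated on a finite example. The sketch below assumes $X = Y = \mathbb{Z}/12$ and the hypothetical additive map $f(x) = 3x$; it builds the induced map F on cosets as a dictionary and checks that F is well defined, bijective onto the range, and additive.

```python
M = 12
X = range(M)                           # the additive group Z/12 (an assumption)
f = lambda x: (3 * x) % M              # an additive map chosen for illustration

kernel = frozenset(x for x in X if f(x) == 0)
image = {f(x) for x in X}

def coset(x):
    """pi(x) = x + Ker(f)."""
    return frozenset((x + k) % M for k in kernel)

# Build F on cosets via F(pi(x)) = f(x), checking that the value does not
# depend on the representative x.
F = {}
well_defined = True
for x in X:
    c = coset(x)
    if c in F and F[c] != f(x):
        well_defined = False
    F[c] = f(x)

bijective = len(set(F.values())) == len(F) and set(F.values()) == image
additive = all(F[coset((x + y) % M)] == (F[coset(x)] + F[coset(y)]) % M
               for x in X for y in X)
print(sorted(kernel), sorted(image), well_defined, bijective, additive)
```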
RINGS AND FIELDS

8.18. Definitions. A ring is an additive group (R, 0, +) equipped with another associative binary operation ($\cdot$), called multiplication, which distributes over addition on both the left and right:
$w \cdot (x + y) = (w \cdot x) + (w \cdot y)$ and $(x + y) \cdot w = (x \cdot w) + (y \cdot w)$
for all $w, x, y \in R$. A ring with unit also has a special element 1 (one), such that $(R, 1, \cdot)$ is a monoid. Caution: Some mathematicians work only with rings with unit, and then they may refer to those objects simply as "rings." For a trivial example of a ring without unit, consider the even integers, with the usual operations of addition and multiplication. For a less trivial example of considerable interest to analysts, see 11.4.e. (Most of the rings used by analysts have additional structure: They are linear algebras, as explained in 11.3. However, $\mathbb{Z}$ is an important commutative ring that is not a linear algebra.) By our definitions, the addition operation in any ring is commutative. A commutative ring is a ring in which the multiplication operation is also commutative. A field is a commutative ring with unit, in which $0 \neq 1$ and in which every nonzero element has a multiplicative inverse. Consequently, in fields we are able to perform "ordinary arithmetic" computations. For instance, the student should prove (and explain) that in a field,
w y + X Z
wz + xy xz
Examples. Some fields with which most readers are informally acquainted are $\mathbb{Q}$ and $\mathbb{R}$; these are introduced formally in 8.22, 10.10, 10.8, and 10.15. The fundamental operations of a ring with unit or a field are those of its additive group (the binary operation $+$, the unary operation $-$, and the nullary operation 0) and those of its multiplicative monoid (the binary operation $\cdot$ and the nullary operation 1). When we talk about fundamental operations and related concepts, then a field will simply be viewed as a particular type of ring with unit. (See the related remarks in 8.54.) A homomorphism of rings with unit is a mapping $f : R \to S$ from one ring into another, which preserves the fundamental operations - i.e., which satisfies
$$f(x_1 + x_2) = f(x_1) + f(x_2), \qquad f(-x) = -f(x), \qquad f(0) = 0,$$
$$f(x_1 x_2) = f(x_1) f(x_2), \qquad f(1) = 1$$
for all $x, x_1, x_2 \in R$. All of these conditions are conceptually relevant, but some of them are redundant, i.e., implied by some of the other conditions. A homomorphism of fields will simply mean a homomorphism $f : R \to S$ of rings with unit, where $R$ and $S$ happen to be fields; no additional requirement is imposed on $f$ for this case. However, (exercise) it follows from our definition that if $f : R \to S$ is a homomorphism of fields, then $f$ is injective and $f(x^{-1}) = f(x)^{-1}$ for all $x \neq 0$.
Chapter 8 : Elementary Algebraic Systems
8.19. Some elementary properties. For all $w, x, y, z$ in a ring $R$ with unit, we have
a. $0 \cdot x = 0 = x \cdot 0$.
b. $(-x) \cdot y = x \cdot (-y) = -(x \cdot y)$.
c. $(-1) \cdot x = -x$. That is, the additive inverse of 1 times any ring element $x$ is the additive inverse of $x$.
d. There is a unique homomorphism from $\mathbb{Z}$ into the ring $R$.
e. If $0 = 1$, then $R = \{0\}$. This is the smallest ring.
8.20. Example: finite rings and fields. Let $m$ be an integer greater than 1. For integers $x, y \in \mathbb{Z}$, write $x \equiv y \pmod{m}$ if $x - y$ is a multiple of $m$ - that is, if $x - y = km$ for some integer $k$. We then say that $x$ and $y$ are congruent modulo $m$. It is easy to verify that $\equiv$ is an equivalence relation on $\mathbb{Z}$. The arithmetic operations make sense on the equivalence classes, since
$$x \equiv x' \text{ and } y \equiv y' \pmod{m} \quad\Longrightarrow\quad x + y \equiv x' + y' \text{ and } xy \equiv x'y' \pmod{m}.$$
The equivalence classes are most often represented by their smallest nonnegative members - i.e., the numbers $0, 1, 2, \ldots, m-1$. Thus we obtain arithmetic operations on the set $\mathbb{Z}_m = \{0, 1, 2, 3, \ldots, m-1\}$, which can be described more directly as follows: to add or multiply two numbers $x, y$ in $\mathbb{Z}_m$, take their ordinary sum or product in $\mathbb{Z}$, and then subtract a suitable multiple of $m$ to obtain an element of $\{0, 1, 2, \ldots, m-1\}$. With these operations, $\mathbb{Z}_m$ is a commutative ring with unit, called the integers modulo $m$. As an illustrative example, below are the addition and multiplication tables for $\mathbb{Z}_6$. Note that, considered as an additive group, $\mathbb{Z}_m$ is the subgroup generated by $\{1\}$ in the group $[0, m)$ of reals modulo $m$, introduced in 8.10.e.
 +  |  0  1  2  3  4  5
----+-------------------
 0  |  0  1  2  3  4  5
 1  |  1  2  3  4  5  0
 2  |  2  3  4  5  0  1
 3  |  3  4  5  0  1  2
 4  |  4  5  0  1  2  3
 5  |  5  0  1  2  3  4

 ·  |  0  1  2  3  4  5
----+-------------------
 0  |  0  0  0  0  0  0
 1  |  0  1  2  3  4  5
 2  |  0  2  4  0  2  4
 3  |  0  3  0  3  0  3
 4  |  0  4  2  0  4  2
 5  |  0  5  4  3  2  1
Recall that a prime number is one of the numbers 2, 3, 5, 7, 11, etc. - that is, an integer greater than 1 that can only be written as a product of two positive integers if one of those factors is 1. It is an easy exercise to show that the finite ring $\mathbb{Z}_m$ is a field if and only if $m$ is a prime number. In particular, $\mathbb{Z}_2 = \{0, 1\}$ is the smallest field; it will be of some importance in the study of Boolean algebras.
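The reader who wishes to experiment may find the following short Python sketch helpful. It is only an illustration, not part of the formal development: the function names `add_table`, `mul_table`, `is_field`, and `is_prime` are our own, and the field test simply searches by brute force for multiplicative inverses.

```python
def add_table(m):
    """Addition table of Z_m: entry [x][y] is x + y reduced modulo m."""
    return [[(x + y) % m for y in range(m)] for x in range(m)]


def mul_table(m):
    """Multiplication table of Z_m: entry [x][y] is x * y reduced modulo m."""
    return [[(x * y) % m for y in range(m)] for x in range(m)]


def is_field(m):
    """Z_m (m > 1) is a field iff every nonzero x has some y with x*y = 1 (mod m)."""
    return all(any((x * y) % m == 1 for y in range(1, m)) for x in range(1, m))


def is_prime(m):
    """Trial division; adequate for the small moduli used here."""
    return m > 1 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))


if __name__ == "__main__":
    # Reproduce the multiplication table of Z_6 row by row.
    for row in mul_table(6):
        print(*row)
    # Compare the brute-force field test with primality for small m.
    for m in range(2, 30):
        assert is_field(m) == is_prime(m)
```

The final loop checks, for small moduli only, the exercise's claim that $\mathbb{Z}_m$ is a field exactly when $m$ is prime; it is evidence, not a proof.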
Related exercises.
a. In the ring $\mathbb{Z}_6$, we have $2 \cdot 3 = 0$; thus the product of two nonzero elements is zero. In the ring $\mathbb{Z}_4$, we have $2 \cdot 2 = 0$.
b. The ring $\mathbb{Z}_4$ is not a field, but there does exist a field $F_4$ containing exactly 4 elements; it is unique up to isomorphism. Find its addition and multiplication tables.
c. In the field $\mathbb{Z}_5$, the squares of the numbers 0, 1, 2, 3, 4 are the numbers 0, 1, 4, 4, 1. More generally, show that if $m$ is an odd prime number, then exactly half of the nonzero members of $\mathbb{Z}_m$ are squares of members of $\mathbb{Z}_m$.
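Exercise c can be explored numerically before it is proved. The sketch below is an illustration only; the helper name `nonzero_squares` is hypothetical. It lists the nonzero squares in $\mathbb{Z}_p$ and compares their number with $(p-1)/2$ for a few small odd primes.

```python
def nonzero_squares(p):
    """Set of squares of the nonzero members of Z_p."""
    return {(x * x) % p for x in range(1, p)}


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19):
        squares = nonzero_squares(p)
        # For an odd prime p, exactly half of the p - 1 nonzero members are squares.
        print(p, sorted(squares), len(squares) == (p - 1) // 2)
```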
Remarks. Finite fields are not often useful in analysis; we have mentioned them only because they offer very easily understood illustrations of the concept of "field." We shall now state without proof a few more results about finite fields; the proofs of these additional results are beyond the scope of this book, but can be found in more specialized books - see for instance, Lidl and Niederreiter [1983]. Let $q$ be an integer greater than 1. Then there exists a field $F_q$ containing exactly $q$ elements if and only if $q$ is of the form $q = p^n$ for some prime number $p$ and some positive integer $n$, in which case the field $F_q$ is unique (up to isomorphism). Considered as a linear space over $\mathbb{Z}_p$ (see Chapter 11), the field $F_q$ is isomorphic to $(\mathbb{Z}_p)^n$. The multiplicative group $F_q \setminus \{0\}$ is isomorphic (as a group) to the additive group $\mathbb{Z}_{q-1}$. The explicit formation of such finite fields - i.e., the computation of their addition and multiplication tables - is a somewhat complicated matter. However, when $p$ is an odd prime, then it is fairly easy to form a field with $p^2$ elements; a simple method is given in 10.23.b.
8.21. Example: products. Suppose that $(R_\lambda : \lambda \in \Lambda)$ is a collection of rings. Then we can make the Cartesian product $P = \prod_{\lambda \in \Lambda} R_\lambda$ into a ring, by defining operations coordinatewise:
$$(f + g)(\lambda) = f(\lambda) + g(\lambda), \qquad (fg)(\lambda) = f(\lambda)\, g(\lambda),$$
etc. The additive identity $0_P$ is the function that takes the value $0_\lambda$ at the $\lambda$th coordinate. If the $R_\lambda$'s are rings with unit, then so is $P$, with multiplicative identity $1_P$ equal to the function that takes the value $1_\lambda$ on the $\lambda$th coordinate. The product of two or more fields is not a field, when operations are defined in this fashion, since any element of $P$ with a 0 in at least one component has no multiplicative inverse. However, a different method can sometimes be used to make a product of fields into a field; see 10.22.
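To see concretely why a product of fields fails to be a field, consider the following Python sketch of the product $\mathbb{Z}_2 \times \mathbb{Z}_3$ with coordinatewise operations. The choice of these two factors and the names `MODULI`, `add`, `mul`, and `has_inverse` are assumptions made purely for illustration.

```python
from itertools import product

MODULI = (2, 3)  # illustrative choice: the product ring Z_2 x Z_3

ELEMENTS = list(product(*(range(m) for m in MODULI)))
ZERO = tuple(0 for _ in MODULI)  # additive identity 0_P
ONE = tuple(1 for _ in MODULI)   # multiplicative identity 1_P


def add(f, g):
    """Coordinatewise addition: (f + g)(k) = f(k) + g(k) in the k-th factor."""
    return tuple((a + b) % m for a, b, m in zip(f, g, MODULI))


def mul(f, g):
    """Coordinatewise multiplication: (fg)(k) = f(k) g(k) in the k-th factor."""
    return tuple((a * b) % m for a, b, m in zip(f, g, MODULI))


def has_inverse(f):
    """Brute-force search for g with f g = 1_P."""
    return any(mul(f, g) == ONE for g in ELEMENTS)


if __name__ == "__main__":
    for f in ELEMENTS:
        print(f, "invertible" if has_inverse(f) else "not invertible")
    # Two nonzero elements whose product is zero:
    print(mul((1, 0), (0, 1)) == ZERO)
```

An element such as $(1, 0)$ is nonzero, yet every product involving it has a 0 in the second coordinate and so can never equal $1_P$.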
8.22. The reader is undoubtedly quite familiar with the field of rational numbers, $\mathbb{Q} = \{ m/n : m, n \in \mathbb{Z},\ n \neq 0 \}$. Nevertheless, we shall give a formal construction of it; the same method of construction will subsequently be used to form another, less familiar field. An integral domain is a commutative ring $D$ with the property that whenever $x, y \in D$ with $xy = 0$, then at least one of $x, y$ is 0.
Of course, any field is an integral domain. The ring $\mathbb{Z} = \{\text{integers}\}$ is an example of an integral domain that is not a field; another example will be given in 8.24. The finite rings $\mathbb{Z}_4$ and $\mathbb{Z}_6$ are not integral domains. Let $D$ be an integral domain. For pairs $(x, m)$ and $(y, n)$ in $D \times (D \setminus \{0\})$, define $(x, m) \approx (y, n)$ to mean that $xn = ym$. Verify that this is an equivalence relation on $D \times (D \setminus \{0\})$. Define $\mathbb{F}$ to be the set of equivalence classes. Addition and multiplication in $\mathbb{F}$ are defined by
$$(x, m) + (y, n) = (xn + ym,\ mn), \qquad (x, m) \cdot (y, n) = (xy,\ mn).$$
The reader should verify that these operations are well-defined - i.e., that the definitions above do not depend on the particular choice of representations for the equivalence classes. That is, if $(x_1, m_1) \approx (x_2, m_2)$ and $(y_1, n_1) \approx (y_2, n_2)$, verify that
$$(x_1 n_1 + y_1 m_1,\ m_1 n_1) \approx (x_2 n_2 + y_2 m_2,\ m_2 n_2) \quad\text{and}\quad (x_1 y_1,\ m_1 n_1) \approx (x_2 y_2,\ m_2 n_2).$$
With operations so defined, verify that $\mathbb{F}$ is a field. It is called the field of fractions of $D$, or field of quotients. The mapping $x \mapsto (x, 1)$ is an embedding of $D$ in $\mathbb{F}$ - that is, an injective ring homomorphism - and so we may view the ring $D$ as a subset of the field $\mathbb{F}$.
Having completed our construction of $\mathbb{F}$, we now switch to conventional notation: The equivalence class containing $(x, m)$ is represented by $x/m$ or $\frac{x}{m}$. Of course, the representation is not unique, since any pair in the equivalence class can be used to form this expression. We urge the reader not to switch to this notation until after completing the construction of $\mathbb{F}$ and the verifications that it requires. The artificiality of the unfamiliar notation $(x, m)$ will make it less likely that we will inadvertently assume some familiar property of $\mathbb{F}$ that has not yet been proved. In the particular case where the integral domain $D$ is the ring $\mathbb{Z}$, the resulting field of quotients is the field of rational numbers; it is denoted by $\mathbb{Q}$.
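The construction can be tried out mechanically in the special case $D = \mathbb{Z}$. The following Python sketch works directly with pairs $(x, m)$, deliberately avoiding the fraction notation; the function names `equivalent`, `frac_add`, `frac_mul`, and `embed` are our own and are meant only as an illustration of the definitions above.

```python
def equivalent(a, b):
    """(x, m) and (y, n) are equivalent when x*n == y*m."""
    (x, m), (y, n) = a, b
    return x * n == y * m


def frac_add(a, b):
    """(x, m) + (y, n) = (x*n + y*m, m*n)."""
    (x, m), (y, n) = a, b
    return (x * n + y * m, m * n)


def frac_mul(a, b):
    """(x, m) * (y, n) = (x*y, m*n)."""
    (x, m), (y, n) = a, b
    return (x * y, m * n)


def embed(x):
    """The embedding of D into its field of fractions: x -> (x, 1)."""
    return (x, 1)


if __name__ == "__main__":
    # Two representations of each of two classes.
    a1, a2 = (1, 2), (2, 4)
    b1, b2 = (1, 3), (-2, -6)
    assert equivalent(a1, a2) and equivalent(b1, b2)
    # Well-definedness: equivalent inputs give equivalent outputs.
    print(frac_add(a1, b1), frac_add(a2, b2),
          equivalent(frac_add(a1, b1), frac_add(a2, b2)))
    print(frac_mul(a1, b1), frac_mul(a2, b2),
          equivalent(frac_mul(a1, b1), frac_mul(a2, b2)))
```

Checking a few pairs in this way does not replace the verification requested in the text, but it shows what "well-defined" demands: different representatives must lead to equivalent results.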
8.23. Exercises about $\mathbb{Q}$.
a. Show that $\operatorname{card}(\mathbb{Q}) = \operatorname{card}(\mathbb{N})$. Hint: 2.20.e.
b. There is no $x \in \mathbb{Q}$ satisfying $x^2 = 2$. Hint: If $x = p/q$, consider how many factors of 2 there are in $p$ or in $q$. (We assume familiarity with basic properties of the integers, e.g., the uniqueness of prime factorization.)
c. If $\mathbb{F}$ is a field, there is a unique ring homomorphism from $\mathbb{Q}$ into $\mathbb{F}$.
d. Example. There is a unique ring homomorphism $h : \mathbb{Q} \to \mathbb{Z}_5$. With that ring homomorphism, evaluate $h(\frac{2}{3})$. Explain. (Thus we obtain a member of $\{0, 1, 2, 3, 4\}$ which is in a sense "congruent modulo 5" to the fraction 2/3.)
8.24. Example: the ring of polynomials and the field of rational functions. Let $\mathbb{K}$ be any integral domain (for instance, the integers or the rationals or the reals). Let $S = \{s, t, u, \ldots\}$ be a nonempty (finite or infinite) set of distinct symbols not already used in our description of $\mathbb{K}$ or elsewhere in our language. We write $S$ as $\{s, t, u, \ldots\}$ to display a few typical elements, but we do not require that $S$ be countable or ordered in any fashion. (For the simplest examples, take $S$ to be just a singleton: $S = \{s\}$. However, later we shall have some uses for much larger collections $S$ as well.)
A monomial with variables in $S$ and coefficients in $\mathbb{K}$ is any expression such as $as^3t^2uv$, where $a \in \mathbb{K}$ and $s, t, u, v \in S$ - that is, an element of $\mathbb{K}$ multiplied by finitely many members of $S$. If the coefficient $a$ is not zero, then the degree of the monomial is the sum of the exponents of the variables - for instance, the monomial $as^3t^2uv = as^3t^2u^1v^1$ has degree $3 + 2 + 1 + 1 = 7$. A polynomial with variables in $S$ and coefficients in $\mathbb{K}$ is a sum of finitely many monomials - i.e., any expression such as $as^3 + bst^2 + ct^2 + duv + e$, where $a, b, c, d, e \in \mathbb{K}$ and $s, t, u, v \in S$. The degree of the polynomial is the highest degree of any of its monomials; for instance, $as^3 + bst^2 + ct^2 + duv + e$ has degree 3. A homogeneous polynomial of degree $k$ is a sum of several monomials of degree $k$; for instance, $as^3 + bs^2t + cstu + dtu^2 + ev^3$ is a homogeneous polynomial of degree 3.
Addition, multiplication, and equality of polynomials are defined by the usual algebraic rules; we omit the details. The set of all polynomials with variables in $S$ and coefficients in $\mathbb{K}$ is easily seen to form a commutative ring with unit, which we shall denote by $\mathbb{K}[S]$. Note that each $a \in \mathbb{K}$ may be viewed as a constant polynomial - i.e., a polynomial of degree 0; thus each member of $\mathbb{K}$ may be viewed as a member of $\mathbb{K}[S]$. This mapping from $\mathbb{K}$ into $\mathbb{K}[S]$ is an injective ring homomorphism; thus we may view $\mathbb{K}$ as a subset of $\mathbb{K}[S]$. When $S$ consists of just one variable - say $s$ - then the ring $\mathbb{K}[S] = \mathbb{K}[\{s\}]$ may be written more briefly as $\mathbb{K}[s]$. Then any polynomial may be written in the form
$$p(s) = a_n s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0$$
where the coefficients $a_j$ are members of $\mathbb{K}$. If $p(s)$ is not the constant function 0, then by dropping any leading zero terms we can choose the representation so that $a_n \neq 0$. Then $n$ is the degree of the polynomial, and $a_n$ is called the leading coefficient.
If the ring $\mathbb{K}$ is an integral domain, then so is the ring $\mathbb{K}[S]$. Hence we can form its field of quotients, as in 8.22. That field is called the field of rational functions with variables in $S$ and coefficients in $\mathbb{K}$; we shall denote it by $\mathbb{K}(S)$. A member of that field is a rational function with variables in $S$ and coefficients in $\mathbb{K}$ - i.e., a quotient of two polynomials. A typical rational function is
$$\frac{as^3 + bst^2 + ct^2 + duv + e}{bt^3 + dst + fuv^3 + g}.$$
Equality between such rational functions and arithmetic operations with such functions are defined in the usual fashion; we omit the details. If $S$ consists of just a single variable $s$, then the field $\mathbb{K}(S) = \mathbb{K}(\{s\})$ may be written more briefly as $\mathbb{K}(s)$.
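For the one-variable ring $\mathbb{Z}[s]$, the "usual algebraic rules" can be sketched in a few lines of Python. This is an illustration under our own conventions, not a definition taken from the text: a polynomial $a_0 + a_1 s + \cdots + a_n s^n$ is stored as the list `[a0, a1, ..., an]`, the zero polynomial is the empty list, and its degree is reported as `None`.

```python
def trim(p):
    """Drop leading zero terms so that the last stored coefficient is nonzero."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def degree(p):
    """Degree of a nonzero polynomial; None for the zero polynomial (our convention)."""
    p = trim(p)
    return len(p) - 1 if p else None


def poly_add(p, q):
    """Coefficientwise sum."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def poly_mul(p, q):
    """Product: the coefficient of s^(i+j) collects the terms a_i * b_j."""
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


if __name__ == "__main__":
    p = [1, 1]    # 1 + s
    q = [-1, 1]   # -1 + s
    print(poly_mul(p, q))          # coefficients of (1 + s)(-1 + s)
    print(degree(poly_mul(p, q)))  # degrees add, since Z has no zero divisors
```

Over an integral domain the leading coefficients of two nonzero polynomials multiply to a nonzero leading coefficient, which is why the product is nonzero and its degree is the sum of the degrees.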
8.25. Blass's Subfield (optional). Define $\mathbb{K}(S)$ as above. Let
$$\mathcal{B} = \{p/q \in \mathbb{K}(S) : p \text{ and } q \text{ are homogeneous polynomials of the same degree}\}.$$
(For instance, if $S = \{s, t, u\}$ and $\mathbb{K} = \mathbb{R}$, then
3s3 + V'is2 t + � stu + 1rsu2 17stu - V'sst 2 + 6.179t3
is a typical member of $\mathcal{B}$.) Show that $\mathcal{B}$ is a subfield of $\mathbb{K}(S)$. This field will be mentioned again in 11.29.
MATRICES

8.26. Matrix notation. Let $\mathbb{K}$ be a ring, and let $m$ and $n$ be positive integers. An m-by-n matrix over $\mathbb{K}$ is a rectangular array
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$
with $m$ rows and $n$ columns, where each $a_{ij}$ is an element of $\mathbb{K}$. We say $a_{ij}$ is the element (or component) in row $i$ and column $j$. The matrix $A$ given above may be represented more briefly as $A = (a_{ij} : 1 \le i \le m;\ 1 \le j \le n)$, or still more briefly as $(a_{ij})$ if no confusion will result. The transpose of an m-by-n matrix $A$ is the n-by-m matrix $A^T$ obtained by flipping $A$ over diagonally, so that the $k$th row becomes the $k$th column and vice versa. Obviously, $(A^T)^T = A$. An m-by-n matrix is called a column matrix if $n = 1$ (i.e., if it consists of just one column), a row matrix if $m = 1$ (i.e., if it consists of just one row), and a square matrix if $m = n$. Note that the transpose of a row matrix is a column matrix, and vice versa. For any positive integer $p$, it is customary¹ to consider elements of $\mathbb{K}^p$ as column matrices when matrices are to be used at all, but to save space on the printed page they are often represented as the transposes of row matrices. Thus the ordered $p$-tuple $(b_1, b_2, \ldots, b_p)$ can also be written as $[b_1\ b_2\ \cdots\ b_p]^T$; we emphasize that the representation with parentheses requires commas while the representation with brackets requires that the commas be omitted.
¹ Some older algebra books represent members of $\mathbb{K}^p$ as row matrices, but column matrices seem to be the prevailing convention since sometime around 1960.

8.27. Matrix multiplication has slightly complicated dimensional requirements. If $A$ is an m-by-n matrix and $B$ is an n-by-p matrix, then we can form their product $AB = R$, an m-by-p matrix:
$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}}_{\text{m-by-n}} \; \underbrace{\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{bmatrix}}_{\text{n-by-p}} = \underbrace{\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mp} \end{bmatrix}}_{\text{m-by-p}}$$
defined by this formula: $r_{ik} = a_{i1} b_{1k} + a_{i2} b_{2k} + \cdots + a_{in} b_{nk}$.
In general, multiplication of matrices is not commutative. In fact, when the product $AB$ is defined, the product $BA$ is not necessarily defined. For instance, for the matrices above, we can only define $BA$ if $m = p$, and in that case $AB$ is an m-by-m matrix while $BA$ is an n-by-n matrix. Thus, $AB = BA$ can only hold if $A$ and $B$ are square matrices of the same dimension. Even then, $AB = BA$ only holds in an occasional coincidence; it does not hold in general, even if the underlying ring $\mathbb{K}$ is commutative. For instance, there exist square matrices $A$ and $B$ with entries in $\{0, 1\}$ for which $AB \neq BA$, provided $\mathbb{K}$ is a ring with unit in which $0 \neq 1$. However, we do have $(AB)^T = B^T A^T$ if the ring $\mathbb{K}$ is commutative.
It is an easy exercise to show that multiplication of matrices is associative: $(AB)C = A(BC)$ whenever the dimensions of the matrices match up - i.e., $A$ is an m-by-n matrix, $B$ is an n-by-p matrix, and $C$ is a p-by-q matrix. Hence we may omit the parentheses and write the product simply as $ABC$. The element in row $h$, column $k$ of that product is
$$\sum_{i=1}^{n} \sum_{j=1}^{p} a_{hi}\, b_{ij}\, c_{jk}.$$
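The formula for $r_{ik}$ translates directly into code. In the Python sketch below, matrices are lists of rows; the function names `mat_mul` and `transpose` and the particular 2-by-2 matrices `A`, `B`, `C` are our own illustrative choices (they are not necessarily the example the text had in mind), with entries taken in the commutative ring $\mathbb{Z}$.

```python
def mat_mul(A, B):
    """Product of an m-by-n matrix A and an n-by-p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    p = len(B[0])
    # r[i][k] = a[i][0]*b[0][k] + a[i][1]*b[1][k] + ... + a[i][n-1]*b[n-1][k]
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(A))]


def transpose(A):
    """Flip A over its diagonal: rows become columns."""
    return [list(col) for col in zip(*A)]


# Hypothetical example matrices with entries 0 and 1.
A = [[0, 1],
     [0, 0]]
B = [[0, 0],
     [1, 0]]
C = [[1, 2],
     [3, 4]]

if __name__ == "__main__":
    print(mat_mul(A, B), mat_mul(B, A))  # the two products differ
    print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))
    print(mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C)))
```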
8.28. Matrices as functions on columns. An important special case of matrix multiplication is the following: Let $A$ be an m-by-n matrix. Represent elements of $\mathbb{K}^m$ and $\mathbb{K}^n$ by column matrices - i.e., by m-by-1 matrices and by n-by-1 matrices, respectively. Then the mapping $v \mapsto Av$ is an additive map from $\mathbb{K}^n$ into $\mathbb{K}^m$. This type of map plays an important role in the theory of finite-dimensional vector spaces, discussed further in Chapter 11. It is so important that we shall write it out more explicitly here: If
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \quad\text{then}\quad Av = \begin{bmatrix} a_{11} v_1 + a_{12} v_2 + \cdots + a_{1n} v_n \\ a_{21} v_1 + a_{22} v_2 + \cdots + a_{2n} v_n \\ \vdots \\ a_{m1} v_1 + a_{m2} v_2 + \cdots + a_{mn} v_n \end{bmatrix}.$$
In particular, the n-by-n matrices act as mappings from $\mathbb{K}^n$ into itself. If $\mathbb{K}$ is a ring with unit, then the n-by-n matrices form a monoid, under the operation of matrix multiplication. The invertible elements of that monoid form a group. An interesting subgroup consists of the permutation matrices of order $n$; these are the n-by-n matrices $A$ that have the following property: Each row contains $n - 1$ zeros and 1 one; each column also contains $n - 1$ zeros and 1 one. For example, there are six permutation matrices of order 3:
$$\begin{bmatrix} 1&0&0\\0&1&0\\0&0&1 \end{bmatrix},\ \begin{bmatrix} 1&0&0\\0&0&1\\0&1&0 \end{bmatrix},\ \begin{bmatrix} 0&1&0\\1&0&0\\0&0&1 \end{bmatrix},\ \begin{bmatrix} 0&1&0\\0&0&1\\1&0&0 \end{bmatrix},\ \begin{bmatrix} 0&0&1\\1&0&0\\0&1&0 \end{bmatrix},\ \begin{bmatrix} 0&0&1\\0&1&0\\1&0&0 \end{bmatrix}.$$
Such a matrix is called a permutation matrix for the following reason: If $n$ distinct members of the ring $\mathbb{K}$ are arranged in a column matrix $v$, then the mapping $v \mapsto Av$ permutes those $n$ members - i.e., the column matrix $Av$ consists of the same $n$ members, arranged in some other order (or in the same order, if $A = I$). The group of permutation matrices of order $n$ is isomorphic (as a group) to the symmetric group of order $n$, introduced in 8.10.i.
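The permutation matrices of a given order can be generated from the permutations themselves. The Python sketch below is an illustration with hypothetical helper names `permutation_matrices` and `apply`; it builds one matrix per permutation of $\{0, \ldots, n-1\}$ and applies each to a column of distinct integers.

```python
from itertools import permutations


def permutation_matrices(n):
    """All n-by-n matrices with exactly one 1 in each row and each column."""
    return [[[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]
            for perm in permutations(range(n))]


def apply(A, v):
    """The column Av, with v given as a plain list of entries."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


if __name__ == "__main__":
    v = [10, 20, 30]  # three distinct members of the ring Z
    for A in permutation_matrices(3):
        print(A, apply(A, v))  # Av lists the same members in a new order
```

For order 3 this produces six matrices, in agreement with the count stated above, and each product $Av$ is a rearrangement of the entries of $v$.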
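8.29. The ring of matrices. Addition of m-by-n matrices is defined componentwise:
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}.$$
Thus, we can only add two matrices if they have the same dimensions. With multiplication and addition defined as above, the set $\operatorname{Mat}(n; \mathbb{K}) = \{\text{n-by-n matrices over } \mathbb{K}\}$ is a ring; it has additive and multiplicative identities given by
$$0 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \quad\text{and}\quad I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.$$
Here $I_n$ is an n-by-n matrix that has 1s along its main diagonal and 0s elsewhere; it may be written more briefly as $(\delta_{ij})$, where $\delta$ is the Kronecker delta. In general the ring $\operatorname{Mat}(n; \mathbb{K})$ is not commutative.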
ORDERED GROUPS

8.30. In this book an ordered monoid will mean an additive monoid $X$ that is equipped with a partial ordering $\preceq$ that is translation-invariant - i.e., that satisfies
$$x \preceq y \quad\Longrightarrow\quad x + u \preceq y + u$$
for all $x, y, u \in X$. If $X$ is also a group, we shall call it an ordered group. Most of the ordered monoids used by analysts have a great deal more structure - in fact, most of them are Riesz spaces. However, $[0, +\infty]$ is an important ordered monoid that is not even a group.
8.31. Arithmetic in ordered monoids. Let $S$ and $T$ be nonempty subsets of an ordered monoid $(X, \preceq)$. For each of the following equations, show that if the left side exists, then the right side also exists and the two sides are equal:
$$\max(S) + \max(T) = \max(S + T), \qquad \min(S) + \min(T) = \min(S + T).$$
Also show that each of the following inequalities holds if both sides exist:
$$\sup(S) + \sup(T) \succeq \sup(S + T), \qquad \inf(S) + \inf(T) \preceq \inf(S + T).$$
(Compare with 8.33.a.)
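The max and min equations are easy to test in the most familiar ordered monoid, the integers with their usual addition and ordering. The following Python sketch is only a sanity check on finite sets; the helper name `sumset` and the sample sets are our own choices.

```python
def sumset(S, T):
    """S + T = { s + t : s in S, t in T }."""
    return {s + t for s in S for t in T}


if __name__ == "__main__":
    S = {1, 4, 7}
    T = {-2, 3}
    print(sorted(sumset(S, T)))
    print(max(S) + max(T) == max(sumset(S, T)))
    print(min(S) + min(T) == min(sumset(S, T)))
```

In a finite subset of a chain the maximum always exists, so this check cannot exhibit the more delicate situation in which a supremum exists but is not attained.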
8.32. Proposition. The sup of a directed family of additive maps is additive. More precisely: Let $M$ be an additive monoid, and let $(Y, \preceq)$ be an ordered monoid. Let $\Phi$ be a collection of additive maps from $M$ into $Y$. Assume $\Phi$ is directed by the product ordering on $Y^M$ - that is, for each $f_1, f_2 \in \Phi$ there exists $f \in \Phi$ such that
$$f(x) \succeq f_1(x) \quad\text{and}\quad f(x) \succeq f_2(x) \qquad\text{for all } x \in M.$$
Assume that $h(x) = \sup_{f \in \Phi} f(x)$ exists in $Y$ for each $x \in M$. Then the function $h : M \to Y$ is also additive. (This result will be used in 11.57.)
Hints: The proof of $h(x + x') \preceq h(x) + h(x')$ is easy - it does not require $\Phi$ to be directed; we leave the details as an exercise. For the reverse inequality, show that
$$h(x) + h(x') = \sup_{f_1, f_2 \in \Phi} \bigl[f_1(x) + f_2(x')\bigr] \preceq \sup_{f \in \Phi} \bigl[f(x) + f(x')\bigr] = h(x + x').$$
8.33. Arithmetic in ordered groups. Let $(X, \preceq)$ be an ordered group. Let $S$ and $T$ be nonempty subsets of $X$, and let $x, y \in X$. Show that
a. For each of the following equations, if the left side exists, then the right side exists and the two sides are equal:
$$\sup(S) + \sup(T) = \sup(S + T), \qquad \inf(S) + \inf(T) = \inf(S + T).$$
b. $x \preceq y$ [...] $X'$ that preserves the fundamental operations - i.e., that satisfies
$$f\bigl(\varphi_j(x_1, x_2, \ldots, x_{\tau(j)})\bigr) = \varphi'_j\bigl(f(x_1), f(x_2), \ldots, f(x_{\tau(j)})\bigr)$$
for all $j \in J$ and all $x_1, x_2, \ldots \in X$. We may call this a homomorphism of arity $\tau$ to emphasize the particular arity being used. This generalizes the definitions of lattice homomorphism, monoid homomorphism, group homomorphism, and ring homomorphism, given in 4.26, 8.1, 8.9, and 8.18. Note that this definition does not involve any additional properties that may be enjoyed by the algebraic systems $X$ and $X'$. For instance, $\varphi_1$ is commutative if it is a binary operation satisfying $\varphi_1(x_1, x_2) = \varphi_1(x_2, x_1)$, but this additional information is not relevant in determining whether $f$ is a homomorphism. A function $f : X \to Y$, from one monoid into another, is a monoid homomorphism if and only if it satisfies $f(x_1 \,\square_X\, x_2) = f(x_1) \,\square_Y\, f(x_2)$, regardless of whether one or both monoids are commutative.
8.49. Exercise. If $f : X \to Y$ is an isomorphism (i.e., a bijective homomorphism) from one algebraic system of arity $\tau$ onto another, then $f^{-1} : Y \to X$ is also a homomorphism.

8.50. Let $\tau$ be an arity function. Our main interest lies not in all algebraic systems of type $\tau$, but just in those algebraic systems that satisfy a given collection of identities, as explained below. Let $X$ be an algebraic system of arity $\tau$. A term in $X$ is an $n$-ary operation on $X$ that is formed by composing finitely many of the fundamental operations, finitely many times. For instance, if $\varphi_1$ is a 1-ary operation and $\varphi_2$ is a 2-ary operation, then a function such as
$$p(w, x, y, z) = \varphi_2\bigl(\varphi_1(x), \varphi_2(y, z)\bigr)$$
is a term in the algebraic system. Note that the right side does not depend on $w$; this illustrates that a term is not required to depend on all of its arguments. The identity map $x \mapsto x$ will be considered a term; it is the composition of no fundamental operations. Note that our method of specifying a term depends only on the arities of the $\varphi_j$'s (i.e., the values of $\tau(j)$) and on the order of composition of the $\varphi_j$'s, not on other information about $X$ or the $\varphi_j$'s. For instance, if $\tau(1) = 1$ and $\tau(2) = 2$, then we can define a term by $\varphi_2(x, \varphi_1(z))$ but not by $\varphi_2(x, \varphi_1(z), w)$ - regardless of other properties that may or may not be enjoyed by the functions $\varphi_1$ and $\varphi_2$. Hence corresponding compositions of fundamental operations can be used to define corresponding terms in different algebraic systems, provided they are of the same arity $\tau$. By a "term of arity $\tau$" we shall mean a method of specifying a term. The method does not refer to any particular algebraic system $X$; it specifies a corresponding term for each algebraic system of arity $\tau$.
An equational axiom, or identity, for algebras $X$ of arity $\tau$ is a condition on $X$ of the form
$$p(x_1, x_2, \ldots, x_n) = q(x_1, x_2, \ldots, x_n)$$
where $p$ and $q$ are terms of arity $\tau$. Such a condition is satisfied by some algebraic systems of arity $\tau$ and not by others. For instance, let $\tau$ be an arity function such that $\tau(1) = 2$ - i.e., such that $\varphi_1$ is a binary operation. Then the commutative law for $\varphi_1$ is the equational axiom $\varphi_1(x_1, x_2) = \varphi_1(x_2, x_1)$. This equational axiom is satisfied by the binary operation of a commutative group such as $(\mathbb{R}, +)$, but not by the binary operation of a noncommutative group such as $\operatorname{Perm}(X)$ - see 8.10.i. Let $\tau$ be an arity, and let $\mathcal{I}$ be a collection of identities compatible with $\tau$. By an algebraic system of variety $(\tau, \mathcal{I})$ we shall mean an algebraic system $X$ of arity $\tau$ that satisfies all the identities in $\mathcal{I}$. Examples will be given starting in 8.52.
8.51. Proposition (optional). The equational variety $(\tau, \mathcal{I})$ is a complete theory, in the following sense: Let $n$ be a nonnegative integer, and let $p$ and $q$ be any terms of arity $\tau$, each taking $n$ arguments. Consider the equation
$$p(x_1, x_2, \ldots, x_n) = q(x_1, x_2, \ldots, x_n) \qquad (*)$$
(not necessarily belonging to $\mathcal{I}$). Then

equation $(*)$ is a semantic theorem in $(\tau, \mathcal{I})$, in the sense that it is satisfied by every algebraic system of type $(\tau, \mathcal{I})$,

if and only if

equation $(*)$ is a syntactic theorem in $(\tau, \mathcal{I})$, in the sense that it can be deduced from the identities that belong to $\mathcal{I}$ by using finitely many substitutions.

Remarks. Thus, for any equation $(*)$, we can find either a proof (as in 8.7) or a counterexample (as in 8.27, which shows by example that not every ring is commutative).

Sketch of proof. We shall omit most of the proof, since it is not needed later in this book; it can be found in more detail in Johnstone [1987] and in other textbooks. Obviously, any syntactic theorem is also a semantic theorem. To prove any semantic theorem is a syntactic theorem, the main idea is this: Call two terms $\alpha$ and $\beta$ "equivalent" if the equation $\alpha = \beta$ is a syntactic theorem in $(\tau, \mathcal{I})$; this is an equivalence relation on terms. The quotient set - i.e., the set of all equivalence classes - can be made into an algebraic system of type $(\tau, \mathcal{I})$ in a natural way. Since $p = q$ is a semantic theorem, it is satisfied by this quotient system; hence $p = q$ is a syntactic theorem. Optional exercise. Carry out this argument in detail for some particularly simple variety - e.g., the variety of monoids, described in 8.53. Related discussion: see 14.58.
EXAMPLES OF EQUATIONAL VARIETIES

8.52. Let $\tau$ be defined on the set $J = \{1, 2\}$ by the values $\tau(1) = \tau(2) = 2$. Then an algebraic system of arity $\tau$ means a set $X$ equipped with two binary fundamental operations. Let $X$ and $Y$ be two such algebraic systems, with fundamental operations denoted by $\vee$ and $\wedge$. Then a mapping $f : X \to Y$ is a homomorphism (of arity $\tau$) if and only if it satisfies
$$f(x_1 \vee x_2) = f(x_1) \vee f(x_2) \quad\text{and}\quad f(x_1 \wedge x_2) = f(x_1) \wedge f(x_2)$$
for all $x_1, x_2 \in X$. A lattice is an algebraic system with this arity function $\tau$, which also satisfies the equational axioms L1-L3 of 4.20. Thus, lattices make up the equational variety $(\tau, \{\mathrm{L1}, \mathrm{L2}, \mathrm{L3}\})$. Some algebraic systems of arity $\tau$ are lattices, and some are not. A lattice homomorphism is a homomorphism of arity $\tau$ between two lattices. In many contexts we describe a lattice in terms of its ordering $\preceq$, but for purposes of this chapter we must instead describe a lattice in terms of its fundamental operations $\vee, \wedge$. Most other kinds of ordered sets - posets, chains, directed sets, etc. - cannot be described in an analogous fashion, and so they do not form equational varieties.
8.53. Let $\tau$ be the function defined on $J = \{0, 1\}$ by the values $\tau(0) = 0$ and $\tau(1) = 2$. Then an algebraic system of arity $\tau$ is a set $X$ equipped with one nullary operation $\varphi_0$ (i.e., a specially selected constant member of $X$) and one binary operation $\varphi_1$. A homomorphism from one algebraic system of this arity to another is a mapping $f : X \to X'$ that satisfies
$$f(\varphi_0) = \varphi'_0 \quad\text{and}\quad f\bigl(\varphi_1(x_1, x_2)\bigr) = \varphi'_1\bigl(f(x_1), f(x_2)\bigr)$$
for all $x_1, x_2 \in X$. A monoid is an algebraic system with this arity $\tau$ whose two fundamental operations $\varphi_0, \varphi_1$ satisfy these three axioms:
$$\varphi_1\bigl(x, \varphi_1(y, z)\bigr) = \varphi_1\bigl(\varphi_1(x, y), z\bigr) \qquad\text{(associative law)}$$
$$\varphi_1(\varphi_0, x) = x, \qquad \varphi_1(x, \varphi_0) = x. \qquad\text{(identity laws)}$$
A group is an algebraic system of arity $\sigma$ defined on $\{0, 1, 2\}$ by $\sigma(0) = 0$, $\sigma(1) = 2$, $\sigma(2) = 1$ - that is, with the same fundamental operations as monoids, plus a unary operation $\varphi_2$ - and that satisfies the equational axioms above and also these two equations:
$$\varphi_1\bigl(x, \varphi_2(x)\bigr) = \varphi_0, \qquad \varphi_1\bigl(\varphi_2(x), x\bigr) = \varphi_0. \qquad\text{(inverse laws)}$$
A homomorphism of arity $\sigma$ means a homomorphism $f : X \to X'$ of arity $\tau$ that also satisfies $f(\varphi_2(x)) = \varphi'_2(f(x))$, regardless of whether $X, X'$ satisfy the equational axioms for a group. A monoid or group is commutative if it satisfies the equational axioms listed above plus this axiom:
$$\varphi_1(x, y) = \varphi_1(y, x). \qquad\text{(commutative law)}$$
Of course, the various properties of monoids and groups take a simpler appearance if we use the notations introduced earlier in this chapter:
$$\varphi_0 = i, \qquad \varphi_1(x, y) = x \,\square\, y, \qquad \varphi_2(x) = x^{-1}.$$
Thus, we prefer those notations when we are working solely with monoids or groups. The notation $\varphi_0, \varphi_1, \varphi_2$ is advantageous only when we are trying to see how monoids and groups fit into a more general theory of algebraic systems.
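For finite algebraic systems, the monoid axioms and the homomorphism conditions of 8.53 can be checked exhaustively. The Python sketch below is an illustration only: the function names `is_monoid` and `is_homomorphism`, and the choice of $\mathbb{Z}_6$ and $\mathbb{Z}_3$ under addition as test systems, are our own assumptions.

```python
from itertools import product


def is_monoid(X, phi0, phi1):
    """Check the associative law and the two identity laws on a finite set X."""
    assoc = all(phi1(x, phi1(y, z)) == phi1(phi1(x, y), z)
                for x, y, z in product(X, repeat=3))
    ident = all(phi1(phi0, x) == x and phi1(x, phi0) == x for x in X)
    return assoc and ident


def is_homomorphism(f, X, phi0, phi1, phi0_prime, phi1_prime):
    """Check f(phi0) = phi0' and f(phi1(x1, x2)) = phi1'(f(x1), f(x2))."""
    return f(phi0) == phi0_prime and all(
        f(phi1(x1, x2)) == phi1_prime(f(x1), f(x2))
        for x1, x2 in product(X, repeat=2))


if __name__ == "__main__":
    Z6 = range(6)

    def add6(x, y):
        return (x + y) % 6

    def add3(x, y):
        return (x + y) % 3

    def add4(x, y):
        return (x + y) % 4

    print(is_monoid(Z6, 0, add6))
    # Reduction modulo 3 respects addition because 3 divides 6.
    print(is_homomorphism(lambda x: x % 3, Z6, 0, add6, 0, add3))
    # Reduction modulo 4 does not: 3 + 3 is 0 in Z_6, but 3 + 3 is 2 in Z_4.
    print(is_homomorphism(lambda x: x % 4, Z6, 0, add6, 0, add4))
```

Note that `is_homomorphism` never asks whether either system is a monoid, in keeping with the remark that the definition of homomorphism depends only on the arity.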
8.54. Rings with unit were introduced in 8.18. A ring with unit is an algebraic system with arity function given by the table below and satisfying certain identities that we shall not list here. Rings with unit form an equational variety. Attaching one more equational axiom, we obtain commutative rings with unit, another equational variety.

 $j$          |  0   1   2   3   4
 $\tau(j)$    |  0   2   1   0   2
 $\varphi_j$  |  0   +   −   1   ·

Boolean rings will be studied in 13.13 and thereafter. A Boolean ring is a ring with unit, in which each element satisfies $x^2 = x$. Thus, Boolean rings form an equational variety; we simply add one more identity to the list of identities for rings with unit. The fundamental operations of a Boolean ring are the fundamental operations of a ring with unit: $(0, 1, -, \cdot, +)$. Boolean lattices are another equational variety, described in 13.1, with rather different fundamental operations $0, 1, \complement, \vee, \wedge$, satisfying rather different equational axioms. However, in a certain sense, Boolean rings and Boolean lattices are different views of the same objects: Boolean rings and Boolean lattices can be transformed into each other, as described in 13.14. The terms "Boolean ring," "Boolean lattice," and "Boolean algebra" are used interchangeably in some of the literature, but in this book we distinguish between the ring and lattice viewpoints. A field $X$ has a multiplicative inverse operation $x \mapsto x^{-1}$, but that operation is only defined on $X \setminus \{0\}$, not on all of $X$. Consequently we cannot view fields as an equational variety (unless we replace our definitions with much more complicated definitions, as some mathematicians do). Instead we shall simply view a field as a particularly interesting member of the variety of commutative rings with unit.
8.55. Let $\mathbb{F}$ be a field. In Chapter 11 we shall introduce $\mathbb{F}$-linear spaces. These form an equational variety, but their arity is a little more complicated to describe. The operations of an $\mathbb{F}$-linear space $X$ are the operations of an additive group, together with the operation of scalar multiplication. In most contexts, scalar multiplication is thought of as a mapping $m : (c, x) \mapsto cx$, from $\mathbb{F} \times X$ into $X$. However, to fit scalar multiplication into our theory of universal algebras, we prefer to think of scalar multiplication as a collection of many unary operations $m_c : x \mapsto cx$. We have one mapping from $X$ into $X$ for each $c \in \mathbb{F}$. If $\mathbb{F}$ is an infinite field (such as the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$), then we have infinitely many of these unary operations. We obtain the equational variety of $\mathbb{F}$-linear algebras (defined as in 11.3) by adding one more fundamental operation (vector multiplication) governed by a few more identities.
8.56. Lattice groups were introduced in 8.38. They are an equational variety, with the fundamental operations of additive groups plus the fundamental operations of a lattice, and with appropriate equational axioms. Part of our definition of a lattice group was the translation-invariance of the ordering, $x \preceq y \Rightarrow x + z \preceq y + z$, introduced in 8.30. However, an implication is not an equational axiom, and a partial ordering is not a binary operator; the condition $x \preceq y \Rightarrow x + z \preceq y + z$ is not permitted as an ingredient in our theory of universal algebras. How can the condition be reformulated? We can dispense with $\preceq$, replacing statements of the form $x \preceq y$ with corresponding statements of the form $x \vee y = y$. Thus, the translation-invariance of the ordering can be restated as the equational axiom $(x + z) \vee (y + z) = (x \vee y) + z$ (introduced in 8.42.a). Vector lattices and lattice algebras will be introduced in 11.44. They are equational varieties, with the fundamental operations of vector spaces or algebras together with $\vee, \wedge$. Ordered monoids, ordered groups, and ordered vector spaces, introduced in 8.30 and 11.44, are not equational varieties, since their orderings cannot be described in terms of fundamental operations.
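The reformulation can be tested in the lattice group of integers, where $x \vee y$ is the larger of $x$ and $y$. The short Python sketch below is an illustration on a finite range of integers; the names `join`, `precedes`, and `SAMPLE` are hypothetical.

```python
from itertools import product


def join(x, y):
    """x v y in the integers with their usual ordering."""
    return max(x, y)


def precedes(x, y):
    """The ordering recovered from the lattice operation: x <= y iff x v y = y."""
    return join(x, y) == y


SAMPLE = range(-5, 6)

if __name__ == "__main__":
    # The equational axiom (x + z) v (y + z) = (x v y) + z.
    print(all(join(x + z, y + z) == join(x, y) + z
              for x, y, z in product(SAMPLE, repeat=3)))
    # Translation-invariance, stated through the recovered ordering.
    print(all(precedes(x + z, y + z)
              for x, y, z in product(SAMPLE, repeat=3) if precedes(x, y)))
```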
C hapter 9 Concrete Categories
[Chart of basic categories, with morphisms in parentheses: monoids (monoid homom.); lattices (lattice homom.); additive groups (additive maps); lattice groups (additive lattice homomorphisms); TAG (contin. additive); vector lattices (linear lattice homom.); metric spaces (unif. contin.); G-normed spaces (contin. additive); F-normed spaces (contin. linear).]
9.1. Preview. The chart above shows some of the most basic categories that we shall
consider in this book. (An additional chart at the beginning of Chapter 22 shows some more advanced categories. ) The components of a category are (i) its objects - sets with additional structure - and (ii) its morphisms - mappings between those sets, which (in most cases
of interest) preserve that additional structure in at least one direction. Morphisms are indicated in parentheses in the chart; for instance, "topological spaces (continuous)" is included in the chart to indicate the category whose objects are topological spaces and whose morphisms are continuous maps between those spaces. Precise definitions will be given in 9.3, and examples will be given in some detail starting in 9.6. Some of the categories mentioned in this chapter are not introduced formally until later; this chapter may be considered as a preview of those categories. The line segments in the chart indicate natural relations between categories via forgetful functors (discussed in 9.34 ) . Some, but not all, of these forgetful functors are given by the inclusion of a subcategory in a category (discussed in 9.5) . The category theory being introduced here is based loosely on the theory of Eilenberg and Mac Lane. It should not be confused with Baire category theory, an unrelated topic introduced elsewhere in this book. The Eilenberg-Mac Lane theory was originally developed mainly for applications in algebraic topology (discussed briefly in 9.33); recently it has also been useful in the abstract theory of computer programs. However, most theorems of Eilenberg-Mac Lane category theory are irrelevant to the purposes of this book and will be omitted. The language of the Eilenberg-Mac Lane theory is useful to us, but we shall take the liberty of modifying that language slightly to make it more useful for the purposes of analysts; thus some of our definitions differ slightly from the definitions to be found in books on category theory. Some other introductions to category theory can be found in Herrlich and Strecker [1979] , Mac Lane [1971], and Mac Lane and Birkhoff [1967] .
9.2. Introductory discussion. We say that two objects $X$ and $Y$ are isomorphic if there is a correspondence between them that preserves (in both directions) all the structure currently of interest. Such a mapping is then called an isomorphism. Different branches of mathematics, being concerned with different kinds of structures - order, algebraic, topological, uniform, etc. - have different meanings for the terms "isomorphic" and "isomorphism." (This multiplicity of meanings may confuse some beginners.) However, most meanings of isomorphic and isomorphism can be subsumed by one abstract meaning developed in this chapter; see particularly 9.14.
If two objects $A$ and $B$ are isomorphic, then they differ only in their labeling and are essentially two different representations of the same object. They can be used interchangeably, provided that we are willing to relabel everything else that they interact with. The "essence" of the objects is the part of them that does not depend on the particular choice of representation. This interchangeability is the heart of mathematics (and, indeed, of all abstract thinking); for instance, the "essence" of the number 4 does not depend on whether we are dealing with four apples or four airplanes.
When two objects $X$ and $Y$ are isomorphic, we may sometimes identify $X$ and $Y$, and treat them as equal, because for most practical purposes they are the "same" set. We may even write $X = Y$, if this will not cause confusion. More generally, suppose $X$ is isomorphic to a subset of a set $Y$; then we may identify $X$ with that subset and write $X \subseteq Y$ […]
[…] $\{\lambda \in \Lambda : f_j(\lambda) = g_j(\lambda)\}$ belongs to $\mathcal{F}$. In other words, $(f_1, f_2, \ldots, f_n)$ is equivalent to $(g_1, g_2, \ldots, g_n)$ if and only if $f_1$ is equivalent to $g_1$, $f_2$ is equivalent to $g_2$, ..., and $f_n$ is equivalent to $g_n$. Therefore, an equivalence class of $n$-tuples can be represented as an $n$-tuple of equivalence classes. It is easy to verify that
$${}^*(S_1 \times S_2 \times \cdots \times S_n) = {}^*S_1 \times {}^*S_2 \times \cdots \times {}^*S_n$$
for any $n$ sets $S_1, S_2, \ldots, S_n$.
9.48. What about an infinite product of sets? Not all of the reasoning in the preceding section generalizes readily. Let's see what goes wrong. Let $\mathcal{F}$ be a filter on $\Lambda$, and let $f_1, f_2, f_3, \ldots$ be infinitely many functions defined on $\Lambda$. Then a sequence of functions $(f_1, f_2, f_3, \ldots)$ may also be viewed as a sequence-valued function. We may use the two viewpoints interchangeably. Observe that two sequence-valued functions $f = (f_1, f_2, f_3, \ldots)$ and $g = (g_1, g_2, g_3, \ldots)$ are equivalent in the sense of 9.40 if and only if the set $\{\lambda \in \Lambda : f(\lambda) = g(\lambda)\}$ belongs to $\mathcal{F}$ - that is, if and only if the set $\bigcap_{j=1}^{\infty} \{\lambda \in \Lambda : f_j(\lambda) = g_j(\lambda)\}$ belongs to $\mathcal{F}$. Since $\mathcal{F}$ is a filter, this condition implies, but is not necessarily implied by, the condition that each of the sets $\{\lambda \in \Lambda : f_j(\lambda) = g_j(\lambda)\}$ belongs to $\mathcal{F}$. In other words, if $(f_1, f_2, f_3, \ldots)$ is equivalent to $(g_1, g_2, g_3, \ldots)$, then each $f_j$ is equivalent to $g_j$, but not necessarily conversely. An equivalence class of sequences is not the same thing as a sequence of equivalence classes.
For instance, let $\Lambda = \mathbb{N}$, and let $\mathcal{F}$ be the cofinite filter on $\mathbb{N}$. Let $f_j$ be the constant function 0, and let $g_j : \mathbb{N} \to \mathbb{N}$ be defined by $g_j(k) = \delta_{jk}$, where $\delta$ is the Kronecker delta (defined in 2.2.d). Then for each $j$, we see that $f_j$ is equivalent to $g_j$ since they agree everywhere on $\mathbb{N}$ except at one point. But $f = (f_1, f_2, f_3, \ldots)$ is not equivalent to $g = (g_1, g_2, g_3, \ldots)$ since they agree nowhere on $\mathbb{N}$ - indeed, $f(n)$ and $g(n)$ differ in their $n$th coordinate.
9.49. Reduced powers of functions. How do we extend functions? For instance, we would like to extend the function $\sin : \mathbb{R} \to \mathbb{R}$ to a function ${}^*\!\sin : {}^*\mathbb{R} \to {}^*\mathbb{R}$; how is this accomplished? Let $p : X \to Y$ be a function from one set to another. There are a couple of natural methods for defining a reduced power ${}^*p : {}^*X \to {}^*Y$; fortunately they yield the same result.
(A) One method is to identify a function with its graph. Then a function is a set of ordered pairs, no two of which have the same first element. Show that if $\mathrm{Gr}(p) \subseteq X \times Y$ is the graph of a function $p : X \to Y$, then ${}^*(\mathrm{Gr}(p)) \subseteq {}^*X \times {}^*Y$ is the graph of a function from ${}^*X$ into ${}^*Y$, which we shall denote by ${}^*p$. Thus ${}^*(\mathrm{Gr}(p)) = \mathrm{Gr}({}^*p)$. Note that, since $\mathrm{Gr}(p) \subseteq \mathrm{Gr}({}^*p)$, the function ${}^*p$ is an extension of the function $p$ - that is, we have $X \subseteq {}^*X$, and ${}^*p(x) = p(x)$ for every $x \in X$.
(B) Another method is by this rule: If $p : X \to Y$ is some function, we wish to define a function $({}^*p) : {}^*X \to {}^*Y$ by specifying its value on each $\xi \in {}^*X$. Any $\xi \in {}^*X$ may be written in the form $\pi(f)$ for some function $f : \Lambda \to \Omega$ (where $\Omega$ is some sufficiently large codomain), and $\pi : \Omega^\Lambda \to {}^*\Omega$ is the quotient map taking functions to their equivalence classes. Show that the mapping $f \mapsto p \circ f$ respects the equivalence relation on $X^\Lambda$ - that is, $\pi(f_1) = \pi(f_2) \Rightarrow \pi(p \circ f_1) = \pi(p \circ f_2)$ (see 3.12). Hence a function $({}^*p) : {}^*X \to {}^*Y$ is well defined by the rule
$$({}^*p)(\pi(f)) = \pi(p \circ f) \qquad \text{for } f \in X^\Lambda.$$
Show that these two definitions yield the same function ${}^*p$. When $p$ is some familiar function, then it is customary to write ${}^*p$ without the star. For instance, the extension of $\sin$, which would naturally be written ${}^*\!\sin$, is customarily written $\sin$ instead.
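Both the counterexample in 9.48 and the well-definedness claim in 9.49 (B) come down to comparing agreement sets. The sketch below checks them on a finite window of $\mathbb{N}$. The window size `N` and all function names are assumptions made for illustration; a finite window can only suggest, not prove, membership in the cofinite filter.

```python
N = 60
WINDOW = range(1, N + 1)  # finite stand-in for N = {1, 2, 3, ...}

def f_seq(k):
    """f(k) = (f_1(k), ..., f_N(k)) with every f_j the constant function 0."""
    return tuple(0 for _ in WINDOW)

def g_seq(k):
    """g(k) = (g_1(k), ..., g_N(k)) with g_j(k) the Kronecker delta of j and k."""
    return tuple(1 if j == k else 0 for j in WINDOW)

def disagreement(f, g):
    """Points of the window at which f and g take different values."""
    return {k for k in WINDOW if f(k) != g(k)}

def coordinate(h, j):
    """The j-th coordinate function of a sequence-valued function h."""
    return lambda k: h(k)[j - 1]

# 9.48: each pair (f_j, g_j) differs at exactly one point ...
coordinate_sets = [disagreement(coordinate(f_seq, j), coordinate(g_seq, j))
                   for j in WINDOW]
# ... but f and g, viewed as sequence-valued functions, differ everywhere.
whole_set = disagreement(f_seq, g_seq)

# 9.49 (B): composing with p cannot enlarge the disagreement set.
def f1(k):
    return k

def f2(k):
    return k if k > 5 else 0

def p(t):
    return t * t

composed_set = disagreement(lambda k: p(f1(k)), lambda k: p(f2(k)))
```

Here `coordinate_sets[j - 1]` is the singleton `{j}`, while `whole_set` is the entire window, matching the statement that coordinatewise equivalence does not imply equivalence of the sequences.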
9.50. Further properties of reduced powers of functions.
a. The taking of reduced powers preserves identity maps - i.e., if $i_X$ is the identity map of $X$, then ${}^*(i_X)$ is equal to the identity map of the set ${}^*X$. Also, the taking of reduced powers preserves composition of functions; that is, ${}^*(p \circ q) = ({}^*p) \circ ({}^*q)$ for any functions $q : W \to X$ and $p : X \to Y$. From these two facts it follows that the taking of reduced powers is a covariant functor from the category of sets to the category of sets; that term was introduced in 9.31.
b. The reduced power ${}^*p : {}^*X \to {}^*Y$ is injective or surjective if and only if the mapping $p : X \to Y$ has that property.
Let $f : X \to Y$ and let $S$ […]
[…] The following statements are equivalent:
$p \succcurlyeq u + z$ whenever $u, z \in D$ and $u \preccurlyeq \xi + \eta$ and $z \preccurlyeq \zeta$;
$p - z \succcurlyeq u$ whenever $u, z \in D$ and $u \preccurlyeq \sup\{x + y : x, y \in D,\ x \preccurlyeq \xi,\ y \preccurlyeq \eta\}$ and $z \preccurlyeq \zeta$;
$p - z \succcurlyeq \sup\{x + y : x, y \in D,\ x \preccurlyeq \xi,\ y \preccurlyeq \eta\}$ whenever $z \in D$ and $z \preccurlyeq \zeta$;
$p - z \succcurlyeq x + y$ whenever $x, y, z \in D$ and $x \preccurlyeq \xi$, $y \preccurlyeq \eta$, $z \preccurlyeq \zeta$;
$p \succcurlyeq x + y + z$ whenever $x, y, z \in D$ and $x \preccurlyeq \xi$, $y \preccurlyeq \eta$, $z \preccurlyeq \zeta$.
The last statement is symmetric in $\xi, \eta, \zeta$; hence the first statement is not affected by a permuting of those three terms. Thus addition in $X$ (defined as in (a1)) is associative. Since $D$ is sup-dense in $X$, the addition in $X$ also satisfies $\xi + 0 = 0 + \xi = \xi$. Thus we have established that $(X, 0, +)$ is an additive monoid.
Define $-\xi$ as in (b1); the mapping $x \mapsto -x$ from $D$ into $D$ is thus extended to a mapping from $X$ into $X$. To show that $(X, +, -, 0)$ is an additive group, fix any $\xi \in X$, and let $\gamma = \xi + (-\xi)$; we must show that $\gamma = 0$. Observe that […] for $v \in D$, and hence $\gamma = \xi + (-\xi) = \sup\{x - v : x, v \in D,\ x \preccurlyeq \xi \preccurlyeq v\}$. From this it follows immediately that $\gamma \preccurlyeq 0$. To show that $\gamma \succcurlyeq 0$, we shall apply 3.21.g. Fix any $u \in D$ with $u \succcurlyeq \gamma$; it suffices to show that $u \succcurlyeq 0$. From $u \succcurlyeq \gamma$ we conclude, successively, that
$x, v \in D$, $x \preccurlyeq \xi \preccurlyeq v$ implies $u \succcurlyeq x - v$;
for each $v \in D$ with $v \succcurlyeq \xi$ we have $x \in D$, $x \preccurlyeq \xi$ $\Rightarrow$ $x \preccurlyeq u + v$;
for each $v \in D$ with $v \succcurlyeq \xi$, we have $\xi \preccurlyeq u + v$;
$N(u)$ is bounded below; $N(-u)$ is bounded above.
By assumption $D$ is integrally closed; hence $u \succcurlyeq 0$. Thus $X$ is an additive group, when equipped with the operations $+$ and $-$ defined as in (a1) and (b1). The ordering is translation-invariant (as defined in 8.30); this follows trivially from our definition of addition in $X$. Hence $X$ is in fact an ordered group, with $D$ as a subgroup. Therefore $\inf(-S) = -\sup(S)$ for any set $S \subseteq X$; now (a2) and (b2) follow from (a1) and (b1).
10.6. Suppose the conditions of the preceding theorem are satisfied. Suppose, also, that $Q$ is another ordered group, and $f : D \to Q$ is a sup-preserving group homomorphism, and $F : X \to Q$ is a sup-preserving extension of $f$. Then $F$ is also a group homomorphism.
Proof. It suffices to show that $F$ preserves addition. Define sets $L_\xi$ as in 4.31; then $\xi = \sup(L_\xi)$. Then for any $\xi, \eta \in X$ we have
$$\{x + y : x, y \in D,\ x \preccurlyeq \xi,\ y \preccurlyeq \eta\} = \{x \in D : x \preccurlyeq \xi\} + \{y \in D : y \preccurlyeq \eta\} = L_\xi + L_\eta.$$
Hence
$$\begin{aligned}
F(\xi + \eta) &= F(\sup(L_\xi + L_\eta)) && \text{by (a1) in 10.5} \\
&= \sup(f(L_\xi + L_\eta)) && \text{since } F \text{ is sup-preserving} \\
&= \sup(f(L_\xi) + f(L_\eta)) && \text{since } f \text{ is additive on } D \\
&= \sup(f(L_\xi)) + \sup(f(L_\eta)) && \text{by 8.33} \\
&= F(\sup(L_\xi)) + F(\sup(L_\eta)) && \text{since } F \text{ is sup-preserving} \\
&= F(\xi) + F(\eta) && \text{since } D \text{ is sup-dense in } X.
\end{aligned}$$
ORDERED FIELDS AND THE REALS
10.7. Definitions. A chain ordered ring is a ring $R$ equipped with an ordering $\le$ such that
(i) $(R, \le)$ is a chain;
(ii) the ordering is translation-invariant - that is, $x \le y \Rightarrow x + u \le y + u$ for all $x, y, u \in R$; and
(iii) $x, y \ge 0 \Rightarrow xy \ge 0$.
If $R$ is also a field, we shall call it a chain ordered field. (Some mathematicians call these an ordered ring and an ordered field, respectively, but that is not specific enough for the purposes of this book.)
Some basic examples. The rational number system $\mathbb{Q}$ (with its usual ordering) is clearly a chain ordered field; we shall assume informal familiarity with that fact, but it also follows from a construction presented in 10.11. The real number system $\mathbb{R}$, introduced in the next paragraph, is a chain ordered field. Certain other subsets of $\mathbb{R}$ are also chain ordered fields - for instance, $\{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$, which is introduced in 10.23.a. In 10.12 we present a chain ordered field that is not contained in $\mathbb{R}$.
10.8. We now define the real number system $\mathbb{R}$ to be a Dedekind complete, chain ordered field (or, in the terminology of some mathematicians, a complete ordered field). To make sense of this definition, we shall show that (i) there exists a Dedekind complete, chain ordered field, and (ii) any two such fields are isomorphic; thus, there is only one real number system. We shall prove those facts in 10.15.
Discussion. Intuitively, we usually think of the real number system as a model for the set of all points on a Euclidean straight line. However, that description has certain drawbacks. It does not determine $\mathbb{R}$ uniquely, for it also fits ${}^*\mathbb{R}$ quite well. Also, the geometric description does not translate readily into usable algebraic axioms.
We also think of reals as "infinite decimal expansions" such as $3.14159265358979323\cdots$. In grade school we learn, informally, how to perform arithmetic operations with such expansions. A formal theory of such expansions is sketched in 10.44 and 10.45. Perhaps this view of the real number system is the most concrete and the most useful for purposes of real-world applications - in physics, engineering, etc.
However, in advanced analysis we usually consider the decimal expansions to be just representations for numbers, not the numbers themselves. Those numbers have other representations (in binary, in ternary, in hexadecimal, etc.). In the development of abstract theory, what we really need are not concrete representations such as $3.14159265358979323\ldots$, but the essential properties of the real numbers, which are used to prove theorems. That $\mathbb{R}$ is a field means that we can do ordinary arithmetic; that it is chain ordered means that inequalities work the way they should; that it is Dedekind complete means that we can take sups, infs, and limits. Analysts often take these ideas for granted and forget how complicated a structure the real number system is.
Actually, we shall prove the existence of $\mathbb{R}$ in several different ways. The proof in 10.15.d is fairly detailed; other proofs are sketched briefly in 10.45 and 19.33.c. All of the constructions are somewhat complicated and nonintuitive - they represent a real number as a set of rational numbers, or a pair of sets of rational numbers, or a set of pairs of rational numbers, etc. The theorem on the uniqueness of the reals, in 10.15.e, tells us that these constructions of $\mathbb{R}$ from $\mathbb{Q}$ all yield the same result. Any one of these constructions is sufficient, and it does not matter which one we use. After we have proved the existence of a Dedekind complete, chain ordered field by representing it in terms of rational numbers, we may discard that representation; we may return to thinking of real numbers as indivisible, primitive objects like the points on a line.
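The example $\{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$ from 10.7 can be modeled exactly, without floating point. The sketch below is one possible encoding, with the class name `QSqrt2` invented for illustration; the sign test relies only on the irrationality of $\sqrt{2}$.

```python
from fractions import Fraction

class QSqrt2:
    """An element a + b*sqrt(2) with rational a and b."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __neg__(self):
        return QSqrt2(-self.a, -self.b)

    def __sub__(self, other):
        return self + (-other)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, where r*r = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # a*a - 2*b*b is nonzero for a nonzero element, since sqrt(2) is irrational
        n = self.a ** 2 - 2 * self.b ** 2
        return QSqrt2(self.a / n, -self.b / n)

    def sign(self):
        """Return 1, 0, or -1 according to the usual ordering of the reals."""
        a, b = self.a, self.b
        if a >= 0 and b >= 0:
            return 0 if (a == 0 and b == 0) else 1
        if a <= 0 and b <= 0:
            return -1
        if a > 0:  # here b < 0: positive iff a > |b| sqrt(2) iff a^2 > 2 b^2
            return 1 if a * a > 2 * b * b else -1
        return 1 if 2 * b * b > a * a else -1  # here a < 0 < b

    def __lt__(self, other):
        return (other - self).sign() > 0

x = QSqrt2(1, 1)    # 1 + sqrt(2)
y = QSqrt2(-1, 1)   # -1 + sqrt(2)
```

With this encoding, translation-invariance (ii) is immediate because `__lt__` depends only on the difference, and (iii) can be tested on samples via `sign`.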
10.9. A few basic properties.
a. If $F$ is a chain ordered field, then $0 < x < y \Rightarrow 0 < \frac{1}{y} < \frac{1}{x}$.
b. If $F$ is a chain ordered field, then $F$ has no greatest or least element.
c. If $F$ is a chain ordered field, then $x^2 = -1$ has no solution $x$ in $F$.
d. If $R$ is a chain ordered ring with unit, other than $\{0\}$, then the unique homomorphism from $\mathbb{Z}$ to $R$ (noted in 8.19.d) is injective and order-preserving. Thus $R$ contains an isomorphic copy of $\mathbb{Z}$. Hint: Let $1_R$ denote the multiplicative identity of $R$. Show that $1_R > 0$, and hence $1_R + 1_R + \cdots + 1_R$ (the sum of finitely many such terms) is also positive.
e. If $F$ is a chain ordered field, then the unique ring homomorphism from $\mathbb{Q}$ into $F$ (noted in 8.23.c) is injective and order-preserving. Thus $F$ contains a uniquely determined isomorphic copy of $\mathbb{Q}$. Identifying various sets with their isomorphic copies when no confusion will result, we may write $\mathbb{N}$ […]
[…] isomorphism preserves sums. Applying 10.6 to the multiplicative groups of positive elements, we see that that isomorphism also preserves products.
f. (Optional.) Let $\mathbb{F}$ be an ordered field. Show that $\mathbb{F}$ is Archimedean if and only if (after relabeling by isomorphism) we have $\mathbb{Q} \subseteq \mathbb{F} \subseteq \mathbb{R}$, in which case $\mathbb{Q}$ is both sup-dense and inf-dense in $\mathbb{F}$, and $\mathbb{R}$ is the Dedekind completion of $\mathbb{F}$.
10.16. Remarks. Our proof of the uniqueness of $\mathbb{R}$ depends on our use of conventional language and logic. If we change our rules of inference - e.g., if we restrict ourselves to first order language and logic, as is common in nonstandard analysis - then there may be many different models of the real line (though they may be indistinguishable except through the use of higher-order language and logic). See 14.68.
10.17. (Optional.) Let $\mathbb{F}$ be a chain ordered field. Let $\mathbb{F}$ be equipped with the order interval topology (see 5.15.f) and the resulting convergence (see 7.41 and 15.41). Then it can be shown that the following conditions are equivalent.
(A) $\mathbb{F}$ is Dedekind complete and thus is the real line.
(B) If $(x_n)$ is any monotone, bounded sequence in $\mathbb{F}$, then $(x_n)$ has a limit in $\mathbb{F}$.
(C) $\mathbb{F}$ is Archimedean, and every Cauchy sequence in $\mathbb{F}$ (defined as in 10.14) has a limit in $\mathbb{F}$.
(D) $\mathbb{F}$ is connected (defined as in 5.12).
(E) For any $a, b \in \mathbb{F}$ with $a \le b$, the set $[a, b] = \{x \in \mathbb{F} : a \le x \le b\}$ is compact (defined as in 17.2).
(F) For any $a, b \in \mathbb{F}$ with $a \le b$, the set $[a, b] = \{x \in \mathbb{F} : a \le x \le b\}$ is pseudocompact (defined as in 17.26.a).
We shall not prove the equivalence. These conditions and others are proved equivalent by Artmann [1988]; that exposition is based in part on Steiner [1966]. Artmann's book also gives an example of a non-Archimedean field in which every Cauchy sequence converges. Thus Dedekind completeness is not the same thing as Cauchy completeness.
THE HYPERREAL NUMBERS
10.18. By the hyperreal line (or the hyperreal number system) we shall mean any non-Archimedean, chain ordered field $\mathbb{H}$ that contains $\mathbb{R}$ as a subfield; the members of $\mathbb{H}$ are called hyperreal numbers. Strictly speaking, there are many hyperreal lines. We gave one construction in 10.12; two more constructions are given in 10.19 and 10.20. However, usually we work with just one such field at a time, and so it is convenient to call that field "the" hyperreal number system while we are working with it. Let $\mathbb{H}$ be a hyperreal line; thus $\mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R} \subseteq \mathbb{H}$. Then:
a. Elements of $\mathbb{R}$ are called real numbers, or sometimes (for emphasis) standard real numbers.
b. A hyperreal number $\xi$ is called bounded if $-r < \xi < r$ for some real number $r$; otherwise $\xi$ is unbounded. (Other terms commonly used in place of "bounded" are limited and hyperfinite.) Clearly, any real number is bounded. Show that some hyperreal number $a$ is unbounded. Then $\pm a, \pm 2a, \pm 3a, \ldots$ are different unbounded hyperreal numbers, and $a \cdot a$ is yet another one. The set of positive unbounded hyperreal numbers has no largest or smallest member. The set of bounded hyperreals is a commutative ring with unit.
c. A hyperreal number $\xi$ is called infinitesimal if $-r < \xi < r$ for every positive real number $r$. Show that 0 is the only real infinitesimal. Show that a nonzero hyperreal number $\xi$ is infinitesimal if and only if $1/\xi$ is unbounded. The set of positive infinitesimal numbers has no largest or smallest member. Every positive real number is an upper bound for the set of infinitesimals. Show that the set of infinitesimals does not have a least upper bound in $\mathbb{H}$. This illustrates the fact (which we already knew) that $\mathbb{H}$ is not Dedekind complete. The set of all infinitesimals is an ordered ring (without unit). (Some mathematicians exclude 0 when they define infinitesimal, but that definition has the disadvantage that the resulting set of infinitesimals does not have such a nice algebraic structure.)
d. Two hyperreal numbers are said to be infinitely close (or infinitesimally close) if their difference is an infinitesimal. Let $\xi$ be a bounded hyperreal number. Show that there is one and only one real number $r$ that is infinitely close to $\xi$. That number $r$ is called the standard part of $\xi$; we may abbreviate it by $\mathrm{std}(\xi)$. Hint: To show that there is at least one such number, use the Dedekind completeness of $\mathbb{R}$ to prove that there is a real number $r = \inf\{s \in \mathbb{R} : s > \xi\}$; then show it has the required properties.
e. Show that $\{\text{bounded hyperreals}\} = \{\text{real numbers}\} \oplus \{\text{infinitesimals}\}$ is an internal direct sum decomposition of one additive group into two subgroups. Show that $\mathrm{std} : \{\text{bounded hyperreals}\} \to \{\text{real numbers}\}$ is an isotone map (for the ordering) and a ring homomorphism.
Thus, nestled around each real number $r$ there are infinitely many bounded hyperreal numbers, all infinitely close to that real number $r$. In some books, some of these hyperreal numbers are denoted by $r + \varepsilon$, $r - \varepsilon$, $r + \delta$, $r - \delta$, etc.; a picture of a microscope is sometimes used to suggest their closeness to $r$.
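The definitions in 10.18 can be tried out in one concrete hyperreal line: rational functions in a single variable $x$, ordered so that $x$ exceeds every real number. The sketch below uses rational coefficients so that it is exactly computable; the ordering rule (compare leading coefficients) and the names `RatFunc`, `trim`, `padd`, `pmul` are assumptions of this sketch, not definitions from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; p[i] is the coefficient of x**i."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def pmul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

class RatFunc:
    """A quotient num/den of polynomials, with x treated as infinitely large."""

    def __init__(self, num, den=(1,)):
        self.num = trim([Fraction(c) for c in num])
        self.den = trim([Fraction(c) for c in den])
        if not self.den:
            raise ZeroDivisionError("zero denominator")

    def __sub__(self, other):
        num = padd(pmul(self.num, other.den),
                   pmul([-c for c in other.num], self.den))
        return RatFunc(num, pmul(self.den, other.den))

    def sign(self):
        """For infinitely large x, the sign is decided by the leading coefficients."""
        if not self.num:
            return 0
        return 1 if self.num[-1] * self.den[-1] > 0 else -1

    def __lt__(self, other):
        return (other - self).sign() > 0

zero = RatFunc([])
x = RatFunc([0, 1])          # unbounded: larger than every constant
eps = RatFunc([1], [0, 1])   # 1/x, a positive infinitesimal
```

In this model `x` is unbounded in the sense of 10.18.b and `eps` is infinitesimal in the sense of 10.18.c, with `eps` equal to the reciprocal of `x`.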
It can be shown that the smallest hyperreal line is the field $\mathbb{R}(x)$, of rational functions in one variable with real coefficients, constructed as in 10.12. Indeed, if we take the real line and adjoin some element $x$ that is infinitely large, then $x$ acts as a transcendental over $\mathbb{R}$ and hence acts algebraically as an indeterminate - i.e., as a variable. The resulting field generated by $\mathbb{R} \cup \{x\}$ must then be $\mathbb{R}(x)$. This is discussed by Fleischer [1967]. Fleischer also points out that, although we may not be able to extend quite as many functions in the setting of this field as we did in 9.49, at least we can extend some functions constructively. See also the related remarks in 10.20.c.
10.19. Ultrapowers of the reals. Let $\mathcal{F}$ be a proper filter on a set $\Lambda$. Define the reduced power ${}^*\mathbb{R} = \mathbb{R}^\Lambda/\mathcal{F}$ and its arithmetical operations and ordering as in 9.41, 9.49, and 9.51. Then ${}^*\mathbb{R}$ is a ring with unit by 9.53 since $\mathbb{R}$ is a ring with unit. In fact, ${}^*\mathbb{R}$ is a commutative lattice algebra since $\mathbb{R}$ is. (Similarly, the hypernatural numbers ${}^*\mathbb{N}$ inherit some of the properties of $\mathbb{N}$.) Recall from 9.52.e and 9.52.f that ${}^*\mathbb{R}$ is chain ordered if and only if $\mathcal{F}$ is an ultrafilter.
a. Show that ${}^*\mathbb{R}$ is a field if and only if $\mathcal{F}$ is an ultrafilter. Hints: If $\mathcal{F}$ is not an ultrafilter, then $\Lambda$ can be partitioned into sets $\Lambda_1$ and $\Lambda_2$, neither of which is an element of $\mathcal{F}$. Let $a_1$ and $a_2$ be their characteristic functions. Show that neither $a_1$ nor $a_2$ is equivalent to 0, but their product is 0.
b. Suppose $\Omega$, $\Lambda$, $\mathcal{F}$ satisfy the conditions of the Enlargement Principle (9.54), and $\mathbb{R}$ […]
[…] then $p \le q$ and it suffices to show that $p \ge q$. We may assume that $\bar{q} > -\infty$ and $p < +\infty$. Since $(h_j)$ is a subsequence of $(g_j^{\,n} : j \in \mathbb{N})$, which is in turn a subsequence of $(g_j^{\,n-1} : j \in \mathbb{N})$, we have […]. Hence the numbers $q^{n-1}$ are bounded below by the number $\bar{q}$, which is not $-\infty$. Also, $p^n(x^n) < p^n < p < +\infty$. Since $p < +\infty$, we have $q^{n-1}(x^n) \le$ […] for all $n$ sufficiently large. For those $n$, our definition of $x^n$ tells us that $q^{n-1}$ cannot be $+\infty$. Thus for all $n$ sufficiently large we have $q^{n-1}$ finite, and therefore (by our definition of $x^n$) $-\infty < \bar{q} \le q^{n-1} < 1 + q^{n-1}(x^n) < 1 + p < +\infty$. Taking limits yields $\bar{q} \le p$.
10.39. Convergence of infinite series. Let $a_1, a_2, a_3, \ldots$ be complex numbers (or, in particular, real numbers). Then the expression
$$\sum_{k=1}^{\infty} a_k \qquad \text{or} \qquad a_1 + a_2 + a_3 + \cdots$$
is called a series (or an infinite series). That expression also represents the limit of the sequence $a_1,\ a_1 + a_2,\ a_1 + a_2 + a_3,\ \ldots$ - that is, the value of $\lim_{n \to \infty} \sum_{k=1}^{n} a_k$ - if that limit exists. (We say the series is the "limit of the partial sums.") When the limit exists, we say the series $\sum_{k=1}^{\infty} a_k$ converges. When the limit fails to exist, we say the series diverges.
The definitions above all generalize readily to the case where $a_1, a_2, a_3, \ldots$ are members of any monoid equipped with a Hausdorff convergence structure (see 7.36). In 22.20 we consider the case where $X$ is any Banach space; in Chapter 26 we consider the case where $X$ is a topological vector space.
For infinite series of real numbers, it is customary to extend the definition a little further: When $a_1, a_2, a_3, \ldots$ are real numbers, then $\sum_{k=1}^{\infty} a_k$ is understood to mean the limit of $a_1 + a_2 + \cdots + a_n$, not just in $\mathbb{R}$, but in the extended real line $[-\infty, +\infty]$. When the limit happens to be $+\infty$, some mathematicians say that the series diverges to infinity. (Some mathematicians say that the series converges to infinity, but we shall not follow that terminology in our discussion of infinite series.) Similar terminology applies for $-\infty$.
The sequence $(a_k) = (a_1, a_2, a_3, \ldots)$ should not be confused with the series $\sum_{k=1}^{\infty} a_k = a_1 + a_2 + a_3 + \cdots$. For instance, the sequence $(2^{-k}) = (\frac12, \frac14, \frac18, \ldots)$ converges to 0, while the series $\sum_{k=1}^{\infty} 2^{-k} = \frac12 + \frac14 + \frac18 + \cdots$ converges to 1 by the result in 10.41.d.
10.40. When all the $a_k$'s are nonnegative real numbers, then $\sum_{k=1}^{\infty} a_k$ always exists in $[0, +\infty]$ - that is, the series always converges to a finite number or diverges to $+\infty$. We may abbreviate these two cases by saying simply that $\sum_{k=1}^{\infty} a_k < \infty$ or that $\sum_{k=1}^{\infty} a_k = \infty$.
More generally, let $(a_\lambda : \lambda \in \Lambda)$ be any parametrized collection of members of $[0, +\infty]$. Then we define the sum $\sum_{\lambda \in \Lambda} a_\lambda$ to mean the supremum of all sums of the form $\sum_{\lambda \in L} a_\lambda$ for finite sets $L \subseteq \Lambda$. The supremum exists, since $[0, +\infty]$ is order complete. Again, the order of the terms does not affect the summation. Exercise. When $\Lambda = \mathbb{N}$, then this definition is equivalent to the one given earlier for $\sum_{k=1}^{\infty} a_k$.
Actually, we are mainly interested in the countable case, because (exercise) if $\sum_{\lambda \in \Lambda} a_\lambda$ is finite, then at most countably many of the $a_\lambda$'s are nonzero. Hint: Show that for each positive integer $m$, the set $\Lambda_m = \{\lambda \in \Lambda : a_\lambda > \frac{1}{m}\}$ is finite.
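The distinction in 10.39 between the sequence $(2^{-k})$ and the series $\sum 2^{-k}$ can be made concrete with exact arithmetic. This is a small illustrative sketch; the function names are hypothetical.

```python
from fractions import Fraction

def terms(n):
    """The first n terms a_k = 2**(-k) of the sequence."""
    return [Fraction(1, 2 ** k) for k in range(1, n + 1)]

def partial_sums(n):
    """The first n partial sums s_k = a_1 + ... + a_k of the series."""
    sums, total = [], Fraction(0)
    for a in terms(n):
        total += a
        sums.append(total)
    return sums

a = terms(20)
s = partial_sums(20)
gap_to_one = 1 - s[-1]  # distance from the 20th partial sum to the limit 1
```

The terms shrink toward 0 while the partial sums increase toward 1; the gap after $n$ terms is exactly the $n$th term.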
10.41. Some basic properties of convergent series.
a. Suppose $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ are convergent series of real or complex numbers with finite sums, and $k$ is any constant. Then $\sum_{j=1}^{\infty} (a_j + b_j)$ and $\sum_{j=1}^{\infty} (k a_j)$ are also convergent series, with sums equal to $\left(\sum_{j=1}^{\infty} a_j\right) + \left(\sum_{j=1}^{\infty} b_j\right)$ and $k \sum_{j=1}^{\infty} a_j$, respectively.
b. If $a_j \le b_j$ for all $j$, then $\sum_{j=1}^{\infty} a_j \le \sum_{j=1}^{\infty} b_j$. In particular, if $0 \le a_j \le b_j$ and $\sum b_j$ is convergent, then $\sum a_j$ is convergent.
c. If $\sum_{j=1}^{\infty} a_j$ is a convergent series of real or complex numbers, then $\lim_{j \to \infty} a_j = 0$. On the other hand, if $\lim_{j \to \infty} a_j = 0$, it does not follow that $\sum_{j=1}^{\infty} a_j$ is convergent - for example, consider the harmonic series in 10.41.f below.
d. Geometric series. Show that $1 + r + r^2 + \cdots + r^n = (1 - r^{n+1})/(1 - r)$ if $r \ne 1$, and hence
$$1 + r + r^2 + r^3 + \cdots = \begin{cases} 1/(1 - r) & \text{if } |r| < 1, \\ \text{divergent} & \text{if } |r| \ge 1. \end{cases}$$
e. Integral test. If $f : [1, +\infty) \to [0, +\infty)$ is a decreasing function, then $\sum_{j=1}^{\infty} f(j)$ and $\int_1^{\infty} f(x)\,dx$ are both finite or both infinite. In fact, $\sum_{j=2}^{n+1} f(j) \le \int_1^{n+1} f(x)\,dx \le \sum_{j=1}^{n} f(j)$.
f. Corollary. The series $\sum_{j=1}^{\infty} j^{-p}$ converges for real numbers $p > 1$ and diverges if $0 < p \le 1$. In particular, the harmonic series $\sum_{j=1}^{\infty} \frac{1}{j} = 1 + \frac12 + \frac13 + \frac14 + \cdots$ diverges. However, it diverges rather slowly - i.e., to make $\sum_{j=1}^{n} \frac{1}{j}$ moderately large, we must make $n$ incredibly enormous. In fact, the integral test tells us that $\sum_{j=1}^{n} \frac{1}{j}$ is approximately equal to $\ln n$. For instance, when $n$ is a trillion, then $\sum_{j=1}^{n} \frac{1}{j}$ is only approximately equal to $\ln(10^{12}) = 12 \ln(10) \approx 27.63$. When $n$ is a googol, or $10^{100}$, then $\sum_{j=1}^{n} \frac{1}{j}$ is still only about $\ln(10^{100}) = 100 \ln(10) \approx 230.26$.
The harmonic series is a sort of "borderline case" - it often takes delicate calculations to decide the convergence or divergence of series that are similar to the harmonic series. The series $\sum_{j=2}^{\infty} \frac{1}{j \ln j}$ and $\sum_{j=3}^{\infty} \frac{1}{j (\ln j)(\ln \ln j)}$ also diverge, even more slowly. On the other hand, the series $\sum_{j=2}^{\infty} \frac{1}{j (\ln j)^2}$ converges.
g. Alternating series test. If $a_1 \ge a_2 \ge a_3 \ge \cdots \ge 0$ and $\lim_{n \to \infty} a_n = 0$, then the series $a_1 - a_2 + a_3 - a_4 + \cdots$ converges. We omit the rather elementary proof (which can be found in most calculus books), since we shall prove a stronger result in 22.21.
h. Let $t_n = 1 + \frac12 + \frac13 + \cdots + \frac1n - \ln n$. Show that $t_n - t_{n+1} = \int_n^{n+1} \frac{1}{x}\,dx - \frac{1}{n+1} > 0$. Hence the sequence $t_1, t_2, t_3, \ldots$ is bounded and decreasing. It therefore converges to a limit, which is called Euler's constant. That limit is approximately $0.577215664901532\ldots$; it is commonly denoted by $\gamma$.
10.42. When we add up only finitely many numbers, or add up infinitely many nonnegative numbers, then it does not matter in what order we add them; the result is the same. However, when we add up infinitely many numbers, some positive and some negative, then changing the order of the terms may affect the answer. For instance, using 10.41.h, show that
$$1 - \frac12 + \frac13 - \frac14 + - \cdots = \ln 2$$
but
$$1 + \frac13 - \frac12 + \frac15 + \frac17 - \frac14 + \frac19 + \frac1{11} - \frac16 + \frac1{13} + \frac1{15} - \frac18 + \cdots = \frac32 \ln 2.$$
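The numerical claims in 10.41.f, 10.41.h, and the first part of 10.42 can be spot-checked in floating point. The sketch below is only a numerical illustration under that assumption (rounding errors are small but not zero), and the function names are hypothetical.

```python
import math

def harmonic(n):
    """The partial sum 1 + 1/2 + ... + 1/n, summed with compensated addition."""
    return math.fsum(1.0 / j for j in range(1, n + 1))

def t(n):
    """The quantity t_n = 1 + 1/2 + ... + 1/n - ln n from 10.41.h."""
    return harmonic(n) - math.log(n)

def rearranged_partial_sum(m):
    """Sum of the first 3m terms of 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ..."""
    terms = []
    for i in range(1, m + 1):
        terms.append(1.0 / (4 * i - 3))  # two positive (odd-denominator) terms
        terms.append(1.0 / (4 * i - 1))
        terms.append(-1.0 / (2 * i))     # then one negative term
    return math.fsum(terms)

slow_growth = harmonic(10 ** 6)             # compare with ln(10**6)
euler_estimates = [t(10), t(100), t(1000)]  # decreasing toward Euler's constant
rearranged = rearranged_partial_sum(100_000)  # compare with 1.5 * ln 2
```

Even after a million terms the harmonic partial sum stays within 1 of $\ln(10^6)$, and the rearranged partial sums settle near $\frac32 \ln 2$ rather than $\ln 2$.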
If we change the order of the terms a bit more, we obtain the series
$$1 - \frac12 + \underbrace{\frac13}_{1 \text{ odd term}} - \frac14 + \underbrace{\frac15 + \frac17}_{2 \text{ odd terms}} - \frac16 + \underbrace{\frac19 + \frac1{11} + \frac1{13} + \frac1{15}}_{4 \text{ odd terms}} - \frac18 + \underbrace{\frac1{17} + \frac1{19} + \frac1{21} + \frac1{23} + \frac1{25} + \frac1{27} + \frac1{29} + \frac1{31}}_{8 \text{ odd terms}} - \frac1{10} + \cdots$$
which diverges to $+\infty$. Hints: Observe that […] etc. Also, show that
$$-\frac14 > \frac{\ln(1) - \ln(2)}{2}, \qquad -\frac16 > \frac{\ln(2) - \ln(3)}{2}, \qquad -\frac18 > \frac{\ln(3) - \ln(4)}{2}, \qquad \ldots$$
Other rearrangements of this series yield other sums. In fact, it can be proved that any number $L$ in $[-\infty, +\infty]$ can be obtained as the sum of a suitable rearrangement of the series above. (Hints: Obtain $L = -\infty$ in a fashion analogous to the method used above for the sum $L = +\infty$. Now consider any finite number $L$. Take just enough positive terms to get a partial sum that is greater than $L$; then take just enough negative terms to get a partial sum that is less than $L$; then just enough positive terms ...; etc.)
Thus, it is erroneous and misleading to say that $\sum_{k=1}^{\infty} a_k = a_1 + a_2 + a_3 + \cdots$ is simply the "sum" of the $a_k$'s. To be more precise, we must say that $\sum_{k=1}^{\infty} a_k$ is the sum of the $a_k$'s in the specified order; this is reflected in our definition $\sum_{k=1}^{\infty} a_k = \lim_{n \to \infty} (a_1 + a_2 + \cdots + a_n)$. Different orderings of the $a_k$'s yield different partial sums $s_n = a_1 + a_2 + \cdots + a_n$ and thus different sequences $(s_n)$, which may have different limits. Intuitively, it may be helpful to view a series this way: The numbers in $a_1 + a_2 + a_3 + \cdots$ are not all added "simultaneously;" rather, the leftmost terms are added earlier than the terms occurring farther to the right. See the related results in 23.26.
10.43. Example. We shall now show that the series $\sum_{n=1}^{\infty} \frac1n |\sin(nx)|$ diverges, for any real number $x$ that is not a multiple of $\pi$. (However, in 22.22 we shall show that the series $\sum_{n=1}^{\infty} \frac1n \sin(nx)$ converges for any real number $x$.)
Proof. Since $|\sin(n(x + \pi))| = |\sin(nx)|$, we may translate $x$ by any multiple of $\pi$; thus we may assume that $-\pi/2 < x < \pi/2$ and $x \ne 0$. Since $|\sin(-nx)| = |\sin(nx)|$, we may assume that $0 < x < \pi/2$. Choose a positive integer $M$ large enough so that $(M - 1)x > 2\pi$. Consider any $M$ consecutive integers $k+1, k+2, \ldots, k+M$. Since $(k+M)x - (k+1)x > 2\pi$, the angles $(k+1)x, \ldots, (k+M)x$ go a bit more than once around a circle. Those angles cannot skip across the interval $(\frac14\pi, \frac34\pi)$ (modulo $2\pi$) without taking a value in that interval, since that interval has width $\pi/2$, which is larger than $x$. Thus at least one of those angles lies in the interval $(\frac14\pi, \frac34\pi)$ (modulo $2\pi$), and so at least one of the numbers $\sin((k+1)x), \ldots, \sin((k+M)x)$ is larger than $\frac12\sqrt2$. Hence for any nonnegative integer $j$ we have
$$\max\left\{ \frac{|\sin((Mj+1)x)|}{Mj+1},\ \frac{|\sin((Mj+2)x)|}{Mj+2},\ \ldots,\ \frac{|\sin((Mj+M)x)|}{Mj+M} \right\} > \frac{\frac12\sqrt2}{Mj+M}.$$
Therefore
$$\sum_{n=1}^{\infty} \frac{|\sin(nx)|}{n} = \sum_{j=0}^{\infty} \sum_{p=1}^{M} \frac{|\sin((Mj+p)x)|}{Mj+p} > \sum_{j=0}^{\infty} \frac{\frac12\sqrt2}{Mj+M},$$
which diverges to $\infty$ since the harmonic series does.
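The key step of the proof in 10.43, that every block of $M$ consecutive multiples of $x$ contains one with sine larger than $\frac12\sqrt2$, can be checked numerically for a sample value of $x$. The sketch assumes $x = 1$ and floating-point sines; the function names are hypothetical.

```python
import math

def min_block_length(x):
    """Smallest positive integer M with (M - 1) * x > 2*pi."""
    return math.floor(2 * math.pi / x) + 2

def block_has_large_sine(x, k, M):
    """True if some |sin((k+i)x)|, i = 1..M, exceeds sqrt(2)/2."""
    return any(abs(math.sin((k + i) * x)) > math.sqrt(2) / 2
               for i in range(1, M + 1))

def abs_partial_sum(x, N):
    """Partial sum of |sin(n x)| / n for n = 1..N."""
    return math.fsum(abs(math.sin(n * x)) / n for n in range(1, N + 1))

x = 1.0
M = min_block_length(x)
blocks_ok = all(block_has_large_sine(x, k, M) for k in range(5000))
growth = abs_partial_sum(x, 10 ** 5) - abs_partial_sum(x, 10 ** 3)
```

The partial sums keep growing as $N$ increases by factors of ten, which is the logarithmic divergence that the comparison with the harmonic series predicts.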
10.44. Decimals from real numbers. Let $D = \{0, 1, 2, \ldots, 9\}$. For each sequence $\sigma = (s_1, s_2, s_3, s_4, \ldots)$ in $D^{\mathbb{N}}$, let

$$h(\sigma) \ = \ \sum_{j=1}^{\infty} \frac{s_j}{10^j}.$$

Since the $s_j$'s are nonnegative, the partial sums are an increasing sequence, and it is easy to see that they are bounded above; hence the series converges to a finite real number $h(\sigma)$. Then the expression "$0.s_1 s_2 s_3 \cdots$" is called the decimal representation of the number $h(\sigma)$. Show that
a. $0 \le h(\sigma) \le 1$.

b. By a decimal rational we shall mean a number of the form $m/10^k$, where $m$ and $k$ are integers. Show that any decimal rational $m/10^k$ in $(0, 1)$ is equal to $h(\sigma)$ for exactly two different sequences $\sigma$: one that is all 0s after a certain point, and another that is all 9s after a certain point. (For instance, 3.279999... = 3.280000....)

c. Any other real number $r \in (0, 1)$ (i.e., not a decimal rational) is equal to $h(\sigma)$ for exactly one sequence $\sigma$.

d. Note that there are only countably many decimal rationals in $[0, 1]$. Use this to show that $\mathrm{card}([0, 1]) = \mathrm{card}(D^{\mathbb{N}})$.

e. We evolved the decimal representation system because we each have ten fingers. But mathematically, there is nothing special about the number ten. An analogous system, which might develop on a planet where the people have $b$ fingers for some integer $b > 1$, would use representations of the form

$$\sum_{j=1}^{\infty} \frac{s_j}{b^j}$$

with $s_j \in \{0, 1, 2, \ldots, b-1\}$. Thus $\mathrm{card}([0, 1]) = \mathrm{card}(\{0, 1, 2, \ldots, b-1\}^{\mathbb{N}})$. In particular, we could take $b = 2$. Thus $\mathrm{card}([0, 1]) = \mathrm{card}(2^{\mathbb{N}})$.

f. Cardinality of the reals. Conclude that $\mathrm{card}(\mathbb{R}) = \mathrm{card}(2^{\mathbb{N}})$. The number system $\mathbb{C}$, defined in 10.24, has a natural bijection to $\mathbb{R} \times \mathbb{R}$; conclude that also $\mathrm{card}(\mathbb{C}) = \mathrm{card}(2^{\mathbb{N}})$.
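The digit sequences in 10.44 can be explored with exact rational arithmetic. The sketch below is an illustration only; the function names `digits` and `h_partial` are ours, and the code handles only rational inputs in $[0, 1)$. The greedy rule used by `digits` always produces the expansion that does not end in all $(b-1)$s, so for a decimal rational it finds the representation ending in 0s; the one ending in 9s has to be written down by hand.

```python
from fractions import Fraction

def digits(r, b, n):
    """First n base-b digits s_1, ..., s_n of a rational r with 0 <= r < 1.

    Greedy rule: each step multiplies by b and peels off the integer part,
    which yields the expansion that does not end in all (b-1)s.
    """
    r = Fraction(r)
    out = []
    for _ in range(n):
        r *= b
        d = r.numerator // r.denominator  # integer part of b * r
        out.append(d)
        r -= d
    return out

def h_partial(s, b=10):
    """Exact partial sum s_1/b + s_2/b^2 + ... + s_n/b^n."""
    return sum((Fraction(d, b ** j) for j, d in enumerate(s, start=1)),
               Fraction(0))
```

For example, `digits(Fraction(1, 4), 10, 4)` gives the digits of 0.2500, while `h_partial([2, 4, 9, 9, 9, 9, 9, 9])` is a partial sum of the other representation 0.24999... and differs from 1/4 by exactly $10^{-8}$.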
10.45. Real numbers from decimals (optional). In the preceding section, we considered $\mathbb{R}$ as already known - i.e., as defined in 10.8 and constructed in 10.15.d - and we studied decimal expansions as infinite series in $\mathbb{R}$. Historically, decimal expansions predate the abstract ideas of a Dedekind complete, chain ordered field. We could actually construct a Dedekind complete, chain ordered field by using formal decimal expansions; these ideas were published by Stolz in 1886. We assume some familiarity with $\mathbb{Q}$, but not with $\mathbb{R}$. Define $\mathbb{R}$ to be the set of all infinite sequences of the form

$$(z, y_1, y_2, y_3, y_4, \ldots), \qquad z \text{ an integer}, \quad y_j \in \{0, 1, 2, \ldots, 9\},$$

where we identify a sequence ending in infinitely many 0s with the corresponding sequence ending in infinitely many 9s (as in 10.44.b). Such a sequence will be represented, as usual, by "$z + .y_1 y_2 y_3 y_4 \ldots$" Its rational truncations are the finite sequences of symbols

$$z, \qquad z + .y_1, \qquad z + .y_1 y_2, \qquad z + .y_1 y_2 y_3, \qquad \text{etc.},$$

where "$z + .y_1 y_2 \cdots y_k$" represents the rational number $z + \frac{y_1}{10} + \frac{y_2}{100} + \cdots + \frac{y_k}{10^k}$. Define the ordering of $\mathbb{R}$ in the usual lexicographical fashion. Then it is easy to show that $\mathbb{R}$ is chain ordered and Dedekind complete. Define the arithmetic operations $(+)$ and $(\cdot)$ first for rational truncations, in the obvious fashion. Then the sum of two numbers $a, b \in \mathbb{R}$ is defined to be the sup of the sums of the rational truncations of $a$ and $b$. The product of two positive numbers $a, b \in \mathbb{R}$ is the sup of the products of their rational truncations. The product of two not-necessarily-positive numbers is defined in terms of the products of positive numbers; we omit the details. Zealous readers can verify that $\mathbb{R}$, defined in this fashion, is a complete ordered field. This approach is developed in greater detail in various other sources - for instance, Abian [1981], Dienes [1957], and Ritt [1946].
10.46. Constructible numbers. The constructivists' notion of "number" is a bit different from the mainstream mathematicians' notion. For a constructivist, a number is acceptable if it can be approximated arbitrarily closely and some estimates can be given for how fast the approximations are converging. Thus, numbers such as $\sqrt{2}$ and $\pi$ are perfectly acceptable, for we have formulas (albeit complicated) for approximating these numbers to as many decimal places as we may wish. (See also 6.7.) However, the constructivists' notion of "number" has a few surprising consequences. For instance, recall from the footnote in 6.4 that Goldbach's Conjecture asserts that for each integer $k > 1$,

(*) the number $2k$ can be written as the sum of two prime numbers.

No proof or counterexample for this proposition has yet been found. Although we do not know whether (*) is true for every $k$, we can easily test it for any particular $k$, and thus we can evaluate the number $x_k$ defined by

$$x_k \ = \ \begin{cases} 0 & \text{if (*) is true for this } k, \\ 1 & \text{if (*) is false for this } k. \end{cases}$$
Let us also define $x_1 = 0$. We can evaluate $x_k$ for as many $k$'s as we wish. (So far, all known $x_k$'s are 0. Perhaps someday someone will find a $k$ for which $x_k = 1$, or will prove that all the $x_k$'s are 0.) Now define

$$r \ = \ -\frac{x_1}{10} + \frac{x_2}{100} - \frac{x_3}{1000} + \cdots + (-1)^k \frac{x_k}{10^k} + \cdots.$$

(To show that this series converges to a real number, either use results about Cauchy sequences in Chapter 19, or prove that the liminf and limsup of the partial sums differ by less than $10^{-n}$ for any $n$.) We shall refer to this number $r$ as the "Goldbach number." We don't know the "exact value" of $r$ yet. Nevertheless, constructivists would say that $r$ is indeed a "real number," since we have an algorithm that can "find" $r$ as accurately as we wish - i.e., given any $\varepsilon > 0$, we can compute an approximation $r'$ satisfying $|r - r'| < \varepsilon$. The sign of the Goldbach number is related to the Goldbach Conjecture:

• $r = 0$, if the conjecture is true;

• $r > 0$, if the conjecture is false and the first counterexample (i.e., the first contradiction to (*)) occurs when $k$ is even;

• $r < 0$, if the conjecture is false and the first counterexample occurs when $k$ is odd.
We don't yet know which of those three cases holds; it is possible that we will never know. This mysterious quality may make some classical mathematicians reluctant to accept r as a "real number." It leads constructivists to conclude that, even if we know two real numbers (0 and r) to arbitrarily high accuracy, we still may be unable to tell which of the numbers is larger. This makes plausible our assertion in 6.6 that the Trichotomy Law for real numbers is not constructively provable. We shall encounter the Goldbach number r again in 15.48.
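The claim that $r$ can be "found" as accurately as we wish can be made concrete. The Python sketch below is only an illustration under our own naming (`is_prime`, `x`, `goldbach_partial_sum`, `approximate_r` are not from the text). It uses the elementary tail estimate $\left|\sum_{k > n} (-1)^k x_k / 10^k\right| \le \sum_{k > n} 10^{-k} = 10^{-n}/9$, which holds because each $x_k$ is 0 or 1.

```python
from fractions import Fraction

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def x(k):
    """x_1 = 0 by convention; for k > 1, x_k = 0 if 2k is a sum of
    two primes, and x_k = 1 otherwise."""
    if k == 1:
        return 0
    found = any(is_prime(p) and is_prime(2 * k - p) for p in range(2, k + 1))
    return 0 if found else 1

def goldbach_partial_sum(n):
    """Exact value of the sum of (-1)^k x_k / 10^k for k = 1, ..., n."""
    return sum((Fraction((-1) ** k * x(k), 10 ** k) for k in range(1, n + 1)),
               Fraction(0))

def approximate_r(eps):
    """Return a rational r' with |r - r'| < eps, choosing n so that the
    tail bound 10^(-n) / 9 is smaller than eps."""
    n = 1
    while Fraction(1, 9 * 10 ** n) >= eps:
        n += 1
    return goldbach_partial_sum(n)
```

Note what the algorithm does and does not deliver: `approximate_r(eps)` returns a rational within `eps` of $r$, but no finite number of such calls decides whether $r$ is zero, positive, or negative.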
Chapter 11: Linearity
LINEAR SPACES AND LINEAR SUBSPACES

11.1. Definitions. Let $\mathbb{F}$ be any field. An $\mathbb{F}$-linear space is a set $V$ equipped with operations $0, -, +$, which make it an additive group, and also equipped with another mapping called scalar multiplication, from $\mathbb{F} \times V$ into $V$, satisfying certain rules noted below. The elements of $V$ are called vectors. The elements of $\mathbb{F}$ are then called the scalars; we refer to $\mathbb{F}$ as the scalar field. For any vector $v$ and scalar $c$, the result of the scalar multiplication of $c$ and $v$ is called their product. It is usually written as $c \cdot v$ or as $cv$; generally the raised dot is included only for clarification or emphasis. The rules satisfied by scalar multiplication are:

(i) $1 \cdot v = v$,

(ii) $\alpha \cdot (\beta \cdot v) = (\alpha\beta) \cdot v$,

(iii) $\alpha \cdot (u + v) = (\alpha \cdot u) + (\alpha \cdot v)$,

(iv) $(\alpha + \beta) \cdot v = (\alpha \cdot v) + (\beta \cdot v)$,

for all $\alpha, \beta \in \mathbb{F}$ and $u, v \in V$. The second rule is a sort of associativity of multiplication, although it should be noted that two different kinds of multiplication are involved: scalar times scalar and scalar times vector. The last two rules assert the distributivity of multiplication over addition; they can also be described as asserting the additivity of the mapping $v \mapsto \alpha \cdot v$ (for fixed scalar $\alpha$) and the mapping $\alpha \mapsto \alpha \cdot v$ (for fixed vector $v$).

The same symbol "0" will be used for the additive identities of the scalar field $\mathbb{F}$ and the various linear spaces; it should be clear from the context just which additive identity is meant by any "0." An $\mathbb{F}$-linear space may be called a linear space, or a vector space, if the choice of the scalar field $\mathbb{F}$ is clear or does not need to be mentioned explicitly. Whenever we work with several linear spaces at once, it will be understood that all the linear spaces are over the same scalar field $\mathbb{F}$ (unless some other arrangement is specified) - e.g., the discussion may apply to several vector spaces over $\mathbb{R}$ or to several vector spaces over $\mathbb{C}$, but we do not mix the two types unless that is mentioned explicitly. Whenever possible, we prefer not to specify what scalar field is being used, so that we can apply our results to many different scalar fields. See also the related discussion in 10.34.
11.2. Some basic properties.

a. $0 \cdot v = 0$ for any vector $v$; that is, the field's additive identity times any vector $v$ yields the linear space's additive identity, and

b. $(-1) \cdot v = -v$ for any vector $v$; that is, the field's $-1$ times any vector $v$ yields the additive inverse of $v$.

11.3. More definitions. A linear algebra over a field $\mathbb{F}$ is a set $X$ equipped with $0$, $+$, and two multiplication operations $\odot$ and $*$, such that

(i) $X$ with $0, +, *$ is a ring. (The operation $*$ may be called the ring multiplication; in some contexts it is referred to as the vector multiplication.)

(ii) $X$ with $0, +, \odot$ is a linear space over some field $\mathbb{F}$ (and $\odot$ is the multiplication of scalars times vectors, often called the scalar multiplication).

(iii) The two multiplication operations satisfy this compatibility rule: $c \odot (x * y) = (c \odot x) * y = x * (c \odot y)$ for all scalars $c$ and vectors $x, y$.
Such an object $X$ is simply called an "algebra" in some of the older literature; we might refer to it as an algebra in the classical sense. For clarification we might call $X$ an algebra over $\mathbb{F}$. (Perhaps a better term would be linear ring, or $\mathbb{F}$-linear ring.) If $(X, 0, +, *)$ is a ring with unit 1, then the resulting linear algebra is called an algebra with unit, or a unital algebra.

The linear algebra is said to be commutative if its ring multiplication is commutative - i.e., if $x * y = y * x$ for all $x, y \in X$. Of course, we have used the symbols $\odot$ and $*$ in this introductory discussion only for emphasis. Usually, the multiplication operations are both written as a raised dot $(\cdot)$ or indicated by juxtaposition - i.e., the product of $x$ and $y$ (with either type of multiplication) is usually denoted $x \cdot y$ or $xy$. Most of the rings used by analysts are linear algebras over the field $\mathbb{R}$ or $\mathbb{C}$. Boolean algebras, studied in Chapter 13 and thereafter, can be viewed as algebras over the finite field $\mathbb{Z}_2 = \{0, 1\}$.
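11.4. Examples.

a. Any field $\mathbb{F}$ is a commutative unital algebra over itself.

b. More generally, if $\mathbb{F}$ is a field and $n$ is a positive integer, then $\mathbb{F}^n = \{n\text{-tuples of elements of } \mathbb{F}\}$ is a commutative unital algebra over $\mathbb{F}$. Elements of $\mathbb{F}^n$ are customarily represented in the form $v = (v_1, v_2, \ldots, v_n)$ using parentheses and commas, or as $n$-by-1 column matrices, or as the transposes of 1-by-$n$ row matrices: $[v_1 \ v_2 \ \cdots \ v_n]^T$; see 8.26. The vector operations on $\mathbb{F}^n$ are defined coordinatewise:

$$x + y = (x_1 + y_1, \ldots, x_n + y_n), \qquad c \cdot x = (c x_1, \ldots, c x_n),$$

etc., for any vectors $x, y \in \mathbb{F}^n$ and scalar $c \in \mathbb{F}$.

c. Still more generally, any product $P = \prod_{\lambda \in \Lambda} X_\lambda$ of $\mathbb{F}$-linear spaces can be made into an $\mathbb{F}$-linear space, with operations defined coordinatewise:

$$(f + g)(\lambda) = (f(\lambda)) + (g(\lambda)), \qquad (c \cdot f)(\lambda) = c \cdot (f(\lambda))$$

for all $f, g \in P$, $\lambda \in \Lambda$, and scalars $c$. If the $X_\lambda$'s are $\mathbb{F}$-(unital) algebras, then $P$ is also an $\mathbb{F}$-(unital) algebra, with vector multiplication

$$(fg)(\lambda) = (f(\lambda))(g(\lambda)).$$

It is commutative if the $X_\lambda$'s are all commutative. In particular, when all the $X_\lambda$'s are equal to one space $X$, we see that $X^\Lambda = \{\text{functions from } \Lambda \text{ into } X\}$ is a linear space or a linear algebra.

The product vector space takes a more intuitively appealing form if we write $\Lambda = \{\alpha, \beta, \gamma, \ldots\}$. (Here we follow the convention of 1.32: it is not assumed that $\Lambda$ is ordered or countable.) Then we have

$$\begin{bmatrix} x_\alpha \\ x_\beta \\ x_\gamma \\ \vdots \end{bmatrix} + \begin{bmatrix} y_\alpha \\ y_\beta \\ y_\gamma \\ \vdots \end{bmatrix} = \begin{bmatrix} x_\alpha + y_\alpha \\ x_\beta + y_\beta \\ x_\gamma + y_\gamma \\ \vdots \end{bmatrix}$$

and for linear algebras

$$\begin{bmatrix} x_\alpha \\ x_\beta \\ x_\gamma \\ \vdots \end{bmatrix} \begin{bmatrix} y_\alpha \\ y_\beta \\ y_\gamma \\ \vdots \end{bmatrix} = \begin{bmatrix} x_\alpha y_\alpha \\ x_\beta y_\beta \\ x_\gamma y_\gamma \\ \vdots \end{bmatrix}.$$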
d. Let $\mathbb{F}$ be a field, let $n$ be a positive integer, and let $X$ be the set of all $n$-by-$n$ matrices over $\mathbb{F}$. Then $X$ is a unital algebra, with vector multiplication given by the multiplication of matrices (as defined in 8.27). This algebra is not commutative if $n > 1$.

Preview. More generally, if $X$ is a linear space, then the linear operators from $X$ into $X$ form a noncommutative unital algebra with ring multiplication given by composition of operators. If $X$ is a topological vector space, we may also consider the continuous linear operators; it is another unital algebra.
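A small computation illustrates why the matrix algebra of 11.4.d fails to be commutative when $n > 1$. The sketch below is ours (the helper names `matmul` and `identity` and the particular 2-by-2 integer matrices are chosen only for illustration); it also spot-checks associativity and the unit on the same matrices.

```python
def matmul(a, b):
    """Product of two n-by-n matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """The n-by-n identity matrix, the unit of the matrix algebra."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[0, 1],
     [0, 0]]
B = [[0, 0],
     [1, 0]]
C = [[1, 2],
     [3, 4]]

AB = matmul(A, B)   # [[1, 0], [0, 0]]
BA = matmul(B, A)   # [[0, 0], [0, 1]]
```

Here `AB` and `BA` differ, so the ring multiplication is not commutative, while `matmul(matmul(A, B), C)` and `matmul(A, matmul(B, C))` agree, as associativity requires.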
e. Let $G$ be a locally compact Abelian group equipped with its Haar measure, and let $L^1(G)$ be defined accordingly - see 26.45. It can be shown that $L^1(G)$ is a commutative algebra, generally not unital, with ring multiplication defined by the convolution operation $(f * g)(t) = \int_G f(t - s)\, g(s)\, ds$.

f. Another important algebraic system can be described as follows: Let $X = \mathbb{R}^3$ be equipped with the usual vector space operations, as in 11.4.b. The cross product of two vectors is defined by

$$x \times y \ = \ (x_2 y_3 - x_3 y_2, \ \ x_3 y_1 - x_1 y_3, \ \ x_1 y_2 - x_2 y_1).$$

This multiplication is anticommutative: it satisfies $x \times y = -y \times x$, and consequently $x \times x = 0$. In particular, we have

$$i \times j = k, \qquad j \times k = i, \qquad k \times i = j,$$
$$j \times i = -k, \qquad k \times j = -i, \qquad i \times k = -j,$$

where

$$i = (1, 0, 0), \qquad j = (0, 1, 0), \qquad k = (0, 0, 1).$$

The cross product is not associative; for instance, $i \times (i \times j) = -j \ne 0 = (i \times i) \times j$. Consequently, $\mathbb{R}^3$ is not a linear algebra when the cross product is used for vector multiplication.

Several more examples are given in 11.45 and 11.46, and in Chapter 22 and thereafter.
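The identities for the cross product are easy to verify mechanically. The sketch below assumes the standard coordinate formula $x \times y = (x_2 y_3 - x_3 y_2, \ x_3 y_1 - x_1 y_3, \ x_1 y_2 - x_2 y_1)$; the names `cross` and `neg` are ours.

```python
def cross(x, y):
    """Cross product of two vectors in R^3, given as 3-tuples."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def neg(x):
    """Additive inverse of a vector."""
    return tuple(-c for c in x)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
zero = (0, 0, 0)

left = cross(i, cross(i, j))    # i x (i x j)
right = cross(cross(i, i), j)   # (i x i) x j
```

With these definitions `left` equals `neg(j)` while `right` equals `zero`, which is the failure of associativity noted in 11.4.f.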
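11.5. Definitions. Let $X$ be an $\mathbb{F}$-linear space, and let S , .

i. Let $(Y_\lambda : \lambda \in \Lambda)$ be an indexed set of $\mathbb{F}$-linear spaces. The external direct sum of the $Y_\lambda$'s is the set

$$\coprod_{\lambda \in \Lambda} Y_\lambda \ = \ \Big\{ f \in \prod_{\lambda \in \Lambda} Y_\lambda \ : \ f(\lambda) \text{ is nonzero for at most finitely many } \lambda\text{'s} \Big\}.$$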
This is a linear subspace of the product $\prod_{\lambda \in \Lambda} Y_\lambda$. Of course, if $\Lambda$ is a finite set, then the external direct sum is equal to the product. The external direct sum described above is a special case of the external direct sum defined in 9.30. Caution: Some mathematicians call this the "direct sum;" see the remarks in 9.30.

An important special case is that in which all the $Y_\lambda$'s are equal to one vector space $Y$. Then the external direct sum $\coprod_{\lambda \in \Lambda} Y$ is equal to the set of all functions $f : \Lambda \to Y$ that vanish on all but finitely many $\lambda$'s. Specializing further: Let $\mathbb{F}$ be the scalar field; then $\coprod_{n \in \mathbb{N}} \mathbb{F}$ is the linear space consisting of all sequences of scalars that have only finitely many nonzero terms.

j. If $\mathbb{F}$ is any field, then $\mathbb{F}^{\mathbb{F}} = \{\text{functions from } \mathbb{F} \text{ into itself}\}$ is a linear space. (In fact, it is a commutative algebra; see 11.3.) For each positive integer $n$, let $P_n = \{\text{polynomials of degree at most } n, \text{ in one variable, with coefficients in } \mathbb{F}\}$; this is a linear subspace of $\mathbb{F}^{\mathbb{F}}$. The set $Q_n = \{\text{polynomials of degree exactly } n\}$ is not a linear space, since it is not closed under addition.

k. Preview. Let $\mathbb{F}$ be the scalar field (either or $X + iX$ and real linear maps $g_1 : V \to X$. Also, if $T$ and $X$ are real linear spaces, then any real linear map from $T$ into $X$ extends uniquely to a complex linear map from $T + iT$ into $X + iX$.

a. If $f$ is a complex linear functional on $V$, then $g_1(v)$ =
LINEAR DEPENDENCE

11.13. Definitions. A set S W is any function, then $f$ can be extended to a linear function from $V$ into $W$. Hint: 11.18.d.

We may view $\mathbb{R}$ as a linear space over the scalar field $\mathbb{Q}$; a basis for this linear space is called a Hamel basis. (Some mathematicians apply that term more widely, as noted in 11.17.) Using such a basis, show that there exists a function $f : \mathbb{R} \to \mathbb{R}$ that is additive - that is, satisfying $f(s + t) = f(s) + f(t)$ - but not continuous. Remark: Compare this with 24.42.

If $V$ is any linear space, its linear dual separates points of $V$.

Any complex linear space can be represented as the complexification of some real linear space (as defined in 11.11).

Let $S$ be a linear subspace of a linear space $X$. Then $S$ has an additive complement - that is, $X$ has a linear subspace $T$ satisfying

$$S + T = X \qquad \text{and} \qquad S \cap T = \{0\},$$

or, equivalently, satisfying the condition that each $x \in X$ can be written in one and only one way as $s + t$ with $s \in S$ and $t \in T$. It may be instructive to contrast this with 8.16.

g. Let $S$ be a linear subspace of a linear space $X$. Then $S$ is the range of a linear projection - i.e., there exists a linear map $f : X \to S$ that has range $S$ and satisfies $f(s) = s$ for each $s \in S$.
11.31. Theorem (Löwig, 1934). Let $V$ be an $\mathbb{F}$-linear space. Then any two vector bases for $V$ over $\mathbb{F}$ have the same cardinality. (That cardinality can therefore be called the dimension of the linear space.)

Proof. This proof is taken from Hall [1958]. Let $S$ and $T$ be vector bases for $V$. Each $s \in S$ can be expressed uniquely (except for the order of summation) in the form $s = a_1 t_1 + \cdots + a_n t_n$ for some positive integer $n$, some nonzero scalars $a_1, a_2, \ldots, a_n$, and some vectors $t_1, t_2, \ldots, t_n \in T$. Let $F(s)$ be the finite set $\{t_1, t_2, \ldots, t_n\}$ obtained in this fashion.

If $s_1, s_2, \ldots, s_k$ are distinct elements of $S$, then $F(s_1) \cup F(s_2) \cup \cdots \cup F(s_k)$ contains at least $k$ elements, for otherwise $s_1, s_2, \ldots, s_k$ would be linearly dependent (by 11.25). By M. Hall's Marriage Theorem 6.37(ii), there exist points $t(s) \in F(s)$ such that the mapping $s \mapsto t(s)$ is injective; thus $\mathrm{card}(S) \le \mathrm{card}(T)$. Similarly $\mathrm{card}(T) \le \mathrm{card}(S)$; now apply the Schröder-Bernstein Theorem 2.19.
DIMENSION OF THE LINEAR DUAL (OPTIONAL)

11.32. Assumptions, notations, and remarks. The results below make use of the fact (proved in 10.44.f) that $\mathrm{card}(\mathbb{R}) = \mathrm{card}(\mathbb{C}) = \mathrm{card}(2^{\mathbb{N}})$. Also, the results below assume the Axiom of Choice; these results should be contrasted with 27.47.a. Throughout the discussion below, let $F$ be the scalar field; assume $F$ is either $\mathbb{R}$ or $\mathbb{C}$. Let $X$ be a linear space over $F$, and let $X^*$ be its linear dual - i.e., the set of all linear maps from $X$ into $F$.

11.33. Observation. If $\dim(X) < \infty$, then $\dim(X^*) = \dim(X)$ also, and $\mathrm{card}(X) = \mathrm{card}(X^*) = \mathrm{card}(F)$. Hint: 11.22.

11.34. Proposition. $\mathrm{card}(X) = \max\{\mathrm{card}(F), \dim(X)\}$.

Hints: Let $B$ be any vector basis for $X$; then $\mathrm{card}(F \times B) = \max\{\mathrm{card}(F), \dim(X)\}$ by (AC13) in 6.22. Let $\coprod_{b \in B} F$ be the external direct sum (defined in 11.6.i) of $B$ copies of $F$. Use 11.18.e, 6.22, and 11.29 to explain

$$\mathrm{card}(X) \ = \ \mathrm{card}\Big(\coprod_{b \in B} F\Big) \ \le \ \mathrm{card}\Big(\bigcup_{n=1}^{\infty} (F \times B)^n\Big) \ = \ \mathrm{card}(F \times B) \ \le \ \mathrm{card}(X \times X) \ = \ \mathrm{card}(X).$$

Then use the Schröder-Bernstein Theorem.

11.35. Lemma. If $X$ is infinite-dimensional, then $\dim(X^*) \ge \mathrm{card}(2^{\mathbb{N}})$.

Hints: Let $\{e_0, e_1, e_2, \ldots\}$ be any linearly independent sequence in $X$. For each real number $r$, by 11.30.b there exists some $f_r \in X^*$ satisfying $f_r(e_n) = r^n$ for $n = 0, 1, 2, \ldots$. Now apply 11.15 to show that the $f_r$'s are linearly independent members of $X^*$.

11.36. Theorem. If $X$ is infinite-dimensional, then $\dim(X^*) > \dim(X)$.

Proof. By the preceding results, we have $\dim(X^*) \ge \mathrm{card}(2^{\mathbb{N}})$, hence $\dim(X^*) = \mathrm{card}(X^*)$. Let $B$ be a vector basis for $X$; then $B$ is an infinite set; hence $\mathrm{card}(\mathbb{N} \times B) = \mathrm{card}(B)$. Since $X^*$ is isomorphic to $F^B$, we have

$$\mathrm{card}(X^*) = \mathrm{card}(F^B) = \mathrm{card}((2^{\mathbb{N}})^B) = \mathrm{card}(2^{\mathbb{N} \times B}) = \mathrm{card}(2^B) > \mathrm{card}(B) = \dim(X).$$
PREVIEW OF MEASURE AND INTEGRATION

11.37. Definitions. Let $X$ be an additive monoid. (In most cases of interest $X$ is either $[0, +\infty]$ or a vector space.) Let $\mathcal{S}$ be a collection of subsets of a set $\Omega$ with $\varnothing \in \mathcal{S}$, and let $\tau : \mathcal{S} \to X$ be some mapping satisfying $\tau(\varnothing) = 0$. We say that $\tau$ is

finitely additive if $\tau\big(\bigcup_{j=1}^{n} S_j\big) = \sum_{j=1}^{n} \tau(S_j)$ whenever $S_1, S_2, \ldots, S_n$ are finitely many disjoint members of $\mathcal{S}$ whose union is also a member of $\mathcal{S}$;

countably additive (or $\sigma$-additive) if $X$ is equipped with a metric (or other notion of convergence) and $\tau\big(\bigcup_{j=1}^{\infty} S_j\big) = \sum_{j=1}^{\infty} \tau(S_j)$ whenever $S_1, S_2, S_3, \ldots$ is a sequence of disjoint members of $\mathcal{S}$ whose union is also a member of $\mathcal{S}$. The expression $\sum_{j=1}^{\infty} \tau(S_j)$ is defined as in 10.39.

Of course, every countably additive mapping is also finitely additive, since we may take $S_3, S_4, S_5, \ldots$ all equal to $\varnothing$. We emphasize that "finitely additive" means "at least finitely additive, and perhaps countably additive;" it does not mean "finitely additive but not countably additive."

Aside from the requirements $\varnothing \in \mathcal{S} \subseteq \mathcal{P}(\Omega)$, the collection $\mathcal{S}$ in the definition above is arbitrary. We now impose some additional restrictions. By a charge we shall mean a finitely additive mapping from an algebra of sets into an additive monoid. By a measure we shall mean a countably additive mapping from a $\sigma$-algebra of sets into an additive monoid equipped with some convergence structure.

Cautions: The terminology varies considerably throughout the literature. Some mathematicians apply the term "measures" to what we have called charges, or to countably additive charges, or to positive measures (defined below), etc. Unfortunately, the phrase "$\mu$ is a charge (or measure) on $W$" has two different meanings in the literature: It may mean $W$ is the ($\sigma$-)algebra $\mathcal{S}$ on which $\mu$ is defined, or it may mean that $W$ is the underlying set $\Omega$ on which $\mathcal{S}$ is defined. One must determine from context just which meaning is intended.

11.38. Remarks on the choice of the codomain $X$. In most applications of charges, the monoid $X$ is either $[0, +\infty]$ or some vector space; then $\mu$ may be called a positive charge or a vector charge, respectively. Though a wide variety of vector spaces are used in this fashion in spectral theory, in more elementary applications the vector spaces most often used for the monoid $X$ are the one-dimensional vector spaces $\mathbb{R}$ and $\mathbb{C}$. The resulting charge or measure is then called a real-valued charge or measure or a complex charge or measure, respectively. We shall study positive charges and measures in 21.9 and thereafter; real-valued charges and measures in 11.47 and thereafter; and other vector charges and measures in 29.3 and thereafter.

Positive charges and vector charges differ only slightly in their definition, but more substantially in their use. We are mainly interested in positive charges when they are in fact measures; moreover, it is commonplace to fix one particular positive measure $\mu$ and then use it for many different purposes. In contrast, vector charges are sometimes of interest without countable additivity or $\sigma$-algebras, but they are of interest mainly in large collections - i.e., we may study the relationships between many different vector charges, which are members of a "space of charges" as in 11.47. An important part of the theory of vector measures $\nu$ is the question of just when they can be represented in the form $\nu(S) = \int_S f(\omega)\, d\mu(\omega)$ for some vector-valued function $f$ and some positive measure $\mu$; see 29.20 and 29.21.

11.39. Remarks on the choice of the domain $\mathcal{S}$. In most of our elementary examples of charges or measures later in this book, the collection of sets $\mathcal{S}$ is actually equal to $\mathcal{P}(\Omega) = \{\text{subsets of } \Omega\}$. However, our most important measure is Lebesgue measure, which is not so elementary and which is not defined on a $\sigma$-algebra of the form $\mathcal{P}(\Omega)$; in 21.22 we prove it cannot be extended in a natural way to $\mathcal{P}(\Omega)$.

In many cases of interest, $\Omega$ is a topological space, and $\mathcal{S}$ is either the Borel $\sigma$-algebra or some $\sigma$-algebra containing the Borel $\sigma$-algebra. Recall that the Borel $\sigma$-algebra is the $\sigma$-algebra on $\Omega$ generated by the topology - i.e., the smallest $\sigma$-algebra containing all the open sets; the members of that $\sigma$-algebra are called Borel sets.

A measurable space is a pair $(\Omega, \mathcal{S})$ consisting of a set $\Omega$ and a $\sigma$-algebra $\mathcal{S}$ of subsets of $\Omega$; a measure space is a triple $(\Omega, \mathcal{S}, \mu)$ in which $\mu$ is a positive measure on $\mathcal{S}$. (It might be more descriptive to call $(\Omega, \mathcal{S}, \mu)$ a "positive measure space," but we shall not be concerned with a measure "space" in which $\mu$ is a vector measure.) Thus, a measurable space is a space that is capable of being equipped with a measure; a measure space is a space that has been equipped with a positive measure. These terms should not be confused with each other, or with a space of measures - i.e., a collection of measures equipped with some structure that makes the collection into a vector space, a topological space, or some other sort of "space," as in 11.48.

11.40. Several kinds of integrals will be introduced in this book; still more integrals can be found in the wider literature.
When necessary, we shall specify what kind of integral is being used. Fortunately, the several integrals generally agree in those cases where they are all defined. For instance, $\int_0^1 t^2\, dt$ makes sense as a Riemann integral or as a Lebesgue integral, but with either interpretation the expression has the value of 1/3.

We now informally sketch some of the main features shared by most types of integrals. Precise definitions will be given later. In general, an integral $\int_S f\, d\mu$ depends on a set $S$, a function $f$ (called the integrand), and a charge $\mu$. In some of our studies of integrals, we may hold one or two of the arguments $S$, $f$, $\mu$ fixed. When an argument is held fixed and/or its value is understood, then it may be suppressed from the notation; thus $\int_S f\, d\mu$ may be written as $\int f\, d\mu$ or $\int_S f$ or $\int f$. Usually, when $S$ is omitted from the notation, then $S$ is understood to be equal to $\Omega$. When $\Omega$ is a subset of $\mathbb{R}^n$ and $\mu$ is Lebesgue measure, then $d\mu(\omega)$ may be written simply as $d\omega$.

The integral $\int_S f\, d\mu$ may be written in greater detail as $\int_S f(\omega)\, d\mu(\omega)$. Here $\omega$ is a dummy variable, or placeholder. It is sometimes helpful in clarifying just what is the argument of $f$, particularly if the function $f$ is complicated. The integral is not altered in value if we replace $\omega$ with some other letter, or omit it altogether. Thus:

$$\int_S f\, d\mu \ = \ \int_S f(\cdot)\, d\mu(\cdot) \ = \ \int_S f(\lambda)\, d\mu(\lambda).$$
11.41. Using ($\sigma$-)algebras and charges, we shall consider integrals $\int f\, d\mu$ of three main types in this book:

(i) $\mu$ is a vector charge taking values in a complete normed vector space, and $f$ is a scalar-valued function taking values in the scalar field of that vector space. Then $\int f\, d\mu$ takes values in the vector space. We shall call this a Bartle integral (though the terminology varies in the literature); this type of integral is introduced in 29.30. The mapping $(f, \mu) \mapsto \int f\, d\mu$ is bilinear - i.e., linear in each variable when the other variable is held fixed. For $f$ and $\mu$ held fixed, the mapping $S \mapsto \int_S f\, d\mu$ is finitely additive; i.e., it is a vector charge. This is algebraically the simplest type of integral we shall consider. We modify this concept in a couple of ways, indicated below, to allow $+\infty$ in our computations.

(ii) $\mu$ is a positive measure (and thus may take the value $+\infty$), $f$ is a function taking values in some complete normed vector space, and some restriction is placed on $\|f(\cdot)\|$ so that it is "not too big." Then $\int f\, d\mu$ takes values in the vector space. We shall call this a Bochner integral; it is introduced in 23.16. It is a linear function of $f$, for fixed $\mu$. For fixed $f$, the mapping $\mu \mapsto \int f\, d\mu$ is like the "upper half" of a linear map: It preserves sums and multiplication by positive constants. For $f$ and $\mu$ fixed, the mapping $S \mapsto \int_S f\, d\mu$ is countably additive - i.e., it is a vector measure. A central result for Bochner integrals is Lebesgue's Dominated Convergence Theorem, 22.29.

(iii) $\mu$ and $f$ both take values in $[0, +\infty]$, and $\int f\, d\mu$ does, too. We shall call this a positive integral; it is introduced in 21.36. It behaves like the "positive quadrant" of a bilinear mapping: The maps $f \mapsto \int f\, d\mu$ and $\mu \mapsto \int f\, d\mu$ both preserve sums and multiplication by positive constants. For $f$ and $\mu$ fixed, the mapping $S \mapsto \int_S f\, d\mu$ is countably additive - i.e., it is a positive measure. A central result for positive integrals is Lebesgue's Monotone Convergence Theorem, 21.38(ii). We emphasize that for integrals of this type, $\int f\, d\mu$ may take the value $+\infty$. When $\int f\, d\mu$ exists and is finite, we say that $f$ is integrable.

Other types of integrals over charges are possible, of course. For instance, for any vector spaces $X, Y, Z$, we could integrate an $X$-valued function $f$ with respect to a $Y$-valued measure $\mu$, using some bilinear map $\langle\ ,\ \rangle : X \times Y \to Z$; then $\int f\, d\mu$ takes values in $Z$. However, such integrals will not be studied in this book.

A few other integrals will be defined in other fashions, not in terms of charges and algebras. The Riemann integral $\int_a^b f(t)\, dt$ is reviewed in Chapter 24; in that chapter we also introduce the Henstock integral $\int_a^b f(t)\, dt$ and the Henstock-Stieltjes integral $\int_a^b f(t)\, d\varphi(t)$, and show how these integrals are related to the Lebesgue integral. Here $f$ and $\varphi$ are functions defined on an interval $[a, b]$.

11.42. Integration of simple functions. Let $\mathcal{A}$ be an algebra of subsets of a set $\Omega$. A function $f : \Omega \to X$ is called a simple function if the range of $f$ is a finite subset of $X$, and $f^{-1}(x) \in \mathcal{A}$ for each $x \in X$ (or equivalently, for each $x \in \mathrm{Ran}(f)$). Equivalently, a simple function is one that can be written in the form

$$f(\cdot) \ = \ \sum_{j=1}^{n} x_j\, 1_{S_j}(\cdot) \qquad (*)$$
where $n$ is a positive integer, the $x_j$'s are members of $X$, and $1_{S_j}(\cdot)$ is the characteristic function of some set $S_j \in \mathcal{A}$. (The representation (*) is not unique, since we do not require the $x_j$'s to be nonzero or distinct and we do not require the $S_j$'s to be disjoint.)

If $X$ is a vector space, then it is easy to verify that the simple functions form a linear subspace of $X^\Omega$. If $X = [0, +\infty]$, the set of simple functions is not a linear space, but at least it acts like the "upper half" of a linear space: It is closed under addition and under multiplication by nonnegative constants.

Now let $\mu$ be a charge defined on $\mathcal{A}$, taking values in some monoid $K$, and let $f : \Omega \to X$ be a simple function. When it makes sense, we define

$$\int_\Omega f\, d\mu \ = \ \sum_{x \in X} \mu(f^{-1}(x))\, x.$$

The summation on the right is over all $x \in X$ or, equivalently, (since $\mu(\varnothing) = 0$) the summation is over all $x \in \mathrm{Ran}(f)$. Thus, the summation involves only finitely many terms. Equivalently, if $f$ is represented by (*), then

$$\int_\Omega f\, d\mu \ = \ \sum_{j=1}^{n} \mu(S_j)\, x_j.$$

For these summations to make sense, we must also make certain restrictions: We must have some notion of how to multiply $x$ times $\mu(f^{-1}(x))$ and how to add up the resulting products. This requirement is met by any simple function, in cases 11.41(i) and 11.41(iii). In case 11.41(ii), the requirement is met by any simple function $f$ that satisfies this additional hypothesis: $\mu(\{\omega \in \Omega : f(\omega) \ne 0\}) < \infty$. In this case we say that $f$ is an integrable simple function. If we use representation (*), then we must choose the $S_j$'s so that no nonzero vector $x_j$ is associated with a set $S_j$ that has infinite measure. (That is accomplished, for instance, if we require that $f$ be an integrable simple function and the $S_j$'s be disjoint.)
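The two formulas for the integral of a simple function can be compared on a small example. The sketch below makes several simplifying assumptions of our own: $\Omega$ is a finite set, the algebra consists of all subsets of $\Omega$, and the charge is real-valued and given by point weights, so that $\mu(S)$ is the sum of the weights of the points of $S$. The function names are ours as well.

```python
from fractions import Fraction

def charge(weights):
    """Real-valued charge on all subsets of a finite set:
    mu(S) = sum of the weights of the points of S."""
    def mu(S):
        return sum((weights[w] for w in S), Fraction(0))
    return mu

def integral_by_values(f, omega, mu):
    """Sum over x in Ran(f) of mu(f^{-1}(x)) * x."""
    total = Fraction(0)
    for x in {f(w) for w in omega}:
        preimage = frozenset(w for w in omega if f(w) == x)
        total += mu(preimage) * x
    return total

def integral_by_representation(terms, mu):
    """Sum of mu(S_j) * x_j for a representation given as pairs (x_j, S_j);
    the sets S_j need not be disjoint."""
    return sum((mu(S) * x for x, S in terms), Fraction(0))

omega = {1, 2, 3, 4}
mu = charge({1: Fraction(1, 2), 2: Fraction(1, 3),
             3: Fraction(-1, 4), 4: Fraction(2)})

# f = 3 * 1_{1,2} + 5 * 1_{2,3}, a representation with overlapping sets
terms = [(3, {1, 2}), (5, {2, 3})]

def f(w):
    return sum(x for x, S in terms if w in S)
```

Both `integral_by_values(f, omega, mu)` and `integral_by_representation(terms, mu)` give the same value here, even though the sets in the representation overlap; that agreement is exactly what finite additivity of the charge provides.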
11.43. Simple functions should not be confused with step functions, though the two notions are closely related. A step function is a mapping $f : [a, b] \to X$, from some subinterval of $\mathbb{R}$ into some vector space, with the property that there exists some partition $a = t_0 < t_1 < t_2 < \cdots < t_n = b$ such that $f$ is constant on each subinterval $(t_{j-1}, t_j)$. (Different partitions may be used for different step functions.) Step functions are a special case of simple functions, as follows: Let $\mathcal{A}$ be the collection of all finite unions of subintervals of $[a, b]$. (We interpret "subintervals" so that singletons and the empty set belong to $\mathcal{A}$.) Then $\mathcal{A}$ is an algebra of sets, and the resulting $X$-valued simple functions (defined as in 11.42) are precisely the step functions.
ORDERED VECTOR SPACES

11.44. Remarks. We shall only consider ordered vector spaces using $\mathbb{R}$ for the scalar field. It is possible to develop a theory of ordered vector spaces using other scalar fields - see, for instance, Schaefer [1971] - but such a theory is more complicated and less natural and intuitively appealing; it is not recommended for beginners.

Definitions. An ordered vector space is a real vector space $X$ equipped with a partial ordering $\preccurlyeq$ such that

(i) $x \preccurlyeq y \ \Rightarrow \ x + u \preccurlyeq y + u$ (i.e., $X$ is an ordered group); and

(ii) If $x \succcurlyeq 0$ in $X$ and $r \ge 0$ in $\mathbb{R}$, then $rx \succcurlyeq 0$ in $X$.

We say $X$ is a Riesz space, or vector lattice, if in addition

(iii) $(X, \preccurlyeq)$ is a lattice - i.e., each finite nonempty subset of $X$ has a supremum and an infimum.

Finally, $X$ is a lattice algebra (or algebra lattice) if $X$ is also an algebra (in the classical sense, as in 11.3) whose vector multiplication satisfies

(iv) $x, y \succcurlyeq 0 \ \Rightarrow \ xy \succcurlyeq 0$.

If $X$ is a Riesz space, then a Riesz subspace is a subset $S$ that is closed under the vector operations and the lattice operations - that is,

$$s, t \in S, \ c \in \mathbb{R} \quad \Rightarrow \quad s + t, \ cs, \ s \vee t, \ s \wedge t \in S.$$

Clearly, such a set is itself a Riesz space, when equipped with the restriction of the operations of $X$.

11.45. Example: real-valued functions. Let $\Lambda$ be any set. Then the product $\mathbb{R}^\Lambda = \{\text{functions from } \Lambda \text{ into } \mathbb{R}\}$ is a Dedekind complete lattice algebra, when given the product ordering - that is, when ordered by

$$x \preccurlyeq y \quad \Longleftrightarrow \quad x(\lambda) \le y(\lambda) \ \text{ for every } \lambda \in \Lambda.$$
The vector and lattice operations are defined pointwise:
$(x \cdot y)(\lambda) = x(\lambda) \cdot y(\lambda)$, $(x + y)(\lambda) = x(\lambda) + y(\lambda)$,
$(x \wedge y)(\lambda) = \min\{x(\lambda), y(\lambda)\}$, $(x \vee y)(\lambda) = \max\{x(\lambda), y(\lambda)\}$.
More generally, for any set $S \subseteq \mathbb{R}^\Lambda$ that is bounded above or below by some real-valued function, we have
$[\sup(S)](\lambda) = \sup\{s(\lambda) : s \in S\}$, $[\inf(S)](\lambda) = \inf\{s(\lambda) : s \in S\}$.
When this ordering is used, many mathematicians write $x \le y$ instead of $x \preccurlyeq y$. ... $x \succcurlyeq 0 \Rightarrow f(x) \succcurlyeq 0$.

11.53. Proposition. Let $X$ and $Y$ be Riesz spaces; assume that $Y$ is Archimedean (defined as in 10.3). Let $f : X \to Y$ be an additive, increasing map - that is,
$f(x_1 + x_2) = f(x_1) + f(x_2)$ and $x_1 \preccurlyeq x_2 \Rightarrow f(x_1) \preccurlyeq f(x_2)$.
Then $f$ is $\mathbb{R}$-linear.

Corollary. Every lattice group homomorphism from a Riesz space into an Archimedean Riesz space is actually a Riesz space homomorphism.

Proof of proposition. It suffices to show that $f(rx) = r f(x)$ for every real number $r$ and every vector $x \in X$. By additivity and the Jordan Decomposition, it suffices to prove that equation when $r > 0$ and $x \succcurlyeq 0$. By additivity, it is easy to see that $f(qx) = q f(x)$ for rational numbers $q$. Since $f$ is order-preserving, we can conclude that $q_1 f(x) \preccurlyeq f(rx) \preccurlyeq q_2 f(x)$ whenever $q_1, q_2$ are rational numbers with $q_1 \le r \le q_2$.
Now, for any integer $m \in \mathbb{N}$, we can find rational numbers $q_1, q_2 > 0$ such that $-\frac{1}{m} < q_1 - r < 0 < q_2 - r < \frac{1}{m}$. It follows that
$-\frac{1}{m} f(x) \preccurlyeq (q_1 - r) f(x) \preccurlyeq f(rx) - r f(x) \preccurlyeq (q_2 - r) f(x) \preccurlyeq \frac{1}{m} f(x)$.
Let $\gamma = f(rx) - r f(x)$; it follows that the subgroup $\mathbb{Z}\gamma = \{m\gamma : m \in \mathbb{Z}\}$ is bounded above by $f(x)$. Since $Y$ is Archimedean, it follows that $\gamma = 0$.

11.54. A pathological example. In the preceding theorem, we cannot omit the assumption that $Y$ be Archimedean. To see this, let $\mathbb{H}$ be the hyperreal line (see 10.18). We shall prove the existence of a mapping $f : \mathbb{R} \to \mathbb{H}$ that is a homomorphism for lattice groups but is not $\mathbb{R}$-linear.
First represent $\mathbb{R}$ as an internal direct sum, $\mathbb{R} = X \oplus Y$, where $X$ and $Y$ are some additive subgroups of $\mathbb{R}$ other than $\{0\}$ and $\mathbb{R}$ itself. (This can be accomplished using 11.30.a, since $\mathbb{R}$ may be viewed as a linear space over the scalar field $\mathbb{Q}$.) Let $\varepsilon$ be a nonzero infinitesimal in $\mathbb{H}$. Define $f : \mathbb{R} \to \mathbb{H}$ by taking $f(x + y) = x + (1 + \varepsilon) y$ for all $x \in X$ and $y \in Y$. Then $f$ is clearly additive. It is not linear, for if $x, y$ are nonzero real numbers with $x \in X$ and $y \in Y$ then $y f(x) = yx \ne (1 + \varepsilon) xy = x f(y)$.
It suffices to show that $f$ is order-preserving. Suppose $x_1 + y_1 < x_2 + y_2$, where $x_1, x_2 \in X$ and $y_1, y_2 \in Y$. Then $x_2 + y_2 - x_1 - y_1$ is a positive real number and $(y_1 - y_2)\varepsilon$ is an infinitesimal. Hence $(y_1 - y_2)\varepsilon < x_2 + y_2 - x_1 - y_1$. That is, $f(x_1 + y_1) < f(x_2 + y_2)$. (This example disproves an erroneous assertion of Birkhoff [1967, page 349].)

11.55. Proposition (Kantorovič). Let $X, Y$ be Riesz spaces, and assume $Y$ is Archimedean. Let $f : X_+ \to Y_+$ be any function. Then $f$ extends to a positive operator $F : X \to Y$ if and only if $f$ is additive - i.e., if and only if $f(x_1 + x_2) = f(x_1) + f(x_2)$ for all $x_1, x_2 \in X_+$. If that condition is satisfied, then the extension $F$ is uniquely determined: It satisfies
$F(x) = f(x^+) - f(x^-)$. $\qquad (**)$

Proof. This proof follows the presentation of Aliprantis and Burkinshaw [1985]. Obviously, if $f$ extends to a positive linear operator, then $f$ must be additive and the extension $F$ must satisfy the formula $(**)$. Conversely, assume $f$ is additive and define $F : X \to Y$ by $(**)$; we must show that $F$ is linear. The proof will be in several steps:
a. If $x = u - v$ with $u, v \in X_+$, then $F(x) = f(u) - f(v)$. Hint: $x^+ - x^- = x = u - v$, hence $x^+ + v = u + x^-$; now use our assumption that $f$ is additive on $X_+$.
b. $F(x_1 + x_2) = F(x_1) + F(x_2)$; that is, $F$ is additive on $X$. Hint: Apply the preceding result with $u = x_1^+ + x_2^+$ and $v = x_1^- + x_2^-$.
c. $F$ is an increasing function on $X$. Hint: $x \succcurlyeq 0 \Rightarrow F(x) = f(x) \succcurlyeq 0$.
Finally, apply 11.53 to complete the proof.

11.56. Observations. Let $X$ and $Y$ be Riesz spaces. Then:
a. $\mathcal{L}^b(X, Y) = \{$order bounded linear maps from $X$ into $Y\}$ is a linear subspace of $\mathcal{L}(X, Y) = \{$linear maps from $X$ into $Y\}$.
b. Let $f, g \in \mathcal{L}(X, Y)$. Then the following conditions are equivalent: (A) $f - g$ is an increasing operator - that is, $x_1 \preccurlyeq$ ... $\succcurlyeq 0$. (C) $f(x) \succcurlyeq 0$ also.

Let $t_x = x - s_x$; next we shall show that $t_x$ lies in $T = S^\perp$. Let any $\sigma \in S$ be given; we are to show that $|\sigma| \wedge |t_x| = 0$. Since $M$ is bounded above by $x$, we have $s_x \preccurlyeq x$; therefore $t_x \succcurlyeq 0$ and $|t_x| = t_x$. Let $u = |\sigma| \wedge t_x$; then $u \succcurlyeq 0$ and it suffices to show that $u \preccurlyeq 0$. Since $0 \preccurlyeq u \preccurlyeq |\sigma|$ and $\sigma \in S$ and $S$ is a sup-closed ideal, it follows that $u \in S$; hence also $u + s_x \in S$. Then $0 \preccurlyeq u + s_x \preccurlyeq t_x + s_x = x$, and so $u + s_x \in M$,
whence $u + s_x \preccurlyeq \sup(M) = s_x$, and thus $u \preccurlyeq 0$. This completes our proof of $(***)$.

Next we prove the conclusion of $(***)$ with the hypothesis weakened: We shall permit $x$ to be any element of $X$, not necessarily nonnegative. Applying the Jordan Decomposition, we have $x = p - n$, where $p, n \in X_+$. Then $p, n$ have Riesz decompositions
$p = s_p + t_p \in S + T$ and $n = s_n + t_n \in S + T$.
We obtain $x = s_x + t_x$, with $s_x = s_p - s_n \in S$ and $t_x = t_p - t_n \in T$.

To show every sup-closed ideal is an orthogonal complement, let $S$ be a sup-closed ideal. Clearly $S \subseteq S^{\perp\perp}$. For the reverse inclusion, let $x \in S^{\perp\perp}$ have decomposition $x = s + t \in S + T$. Then $t \in T$, but also $t = x - s \in S^{\perp\perp} = T^\perp$. Hence $t = 0$, and $x = s \in S$.

Whenever $S$ and $T$ are orthogonal complements, they satisfy $S \cap T = \{0\}$; see 8.13. In the present context, we have also shown that $S + T = X$. Hence $S \oplus T = X$ (see 8.11), and the projections $\pi_S, \pi_T$ are uniquely determined group homomorphisms. The arguments of the preceding paragraphs show that $\pi_S$ must satisfy the formula stated in the theorem.

Note that if $u \succcurlyeq 0$ then $0 \preccurlyeq \pi_S(u) \preccurlyeq u$. For any $x \in X$, both $x^+$ and $x^-$ are nonnegative, so $0 \preccurlyeq \pi_S(x^+) \preccurlyeq x^+$ and $0 \preccurlyeq \pi_S(x^-) \preccurlyeq x^-$. Since $x^+ \wedge x^- = 0$, it follows that $\pi_S(x^+) \wedge \pi_S(x^-) = 0$. From the Jordan Decomposition $x = x^+ - x^-$, we obtain $\pi_S(x) = \pi_S(x^+) - \pi_S(x^-)$, which is therefore the Jordan Decomposition of $\pi_S(x)$. Hence $[\pi_S(x)]^+ = \pi_S(x^+)$. By 8.45, $\pi_S$ is a homomorphism of lattice groups. If $X$ is a Riesz space, then $\pi_S$ is a homomorphism of Riesz spaces, by 11.53. The same conclusions can be drawn for $\pi_T$.
Chapter 12 Convexity

12.1. Preview. The diagram below shows examples of a star set, a nonconvex set, and a convex set, all of which will be defined soon. The distinction between convex and nonconvex may be easier to understand after 12.5.i.

[Figure: a typical star set in $\mathbb{R}^2$; a nonconvex set; a convex set.]
12.2. Notational convention. Throughout the remainder of this book (except where noted otherwise), the scalar field of a linear space will always be either $\mathbb{R}$ or $\mathbb{C}$. Usually the scalar field will be denoted by $\mathbb{F}$, and we shall not specify which field is intended; this intentional ambiguity will permit us to treat both the real and complex cases simultaneously. However, we shall make free use of certain properties and structures enjoyed by $\mathbb{R}$ and $\mathbb{C}$ that are not shared by all other fields - e.g., the real part, imaginary part, complex conjugate, and absolute value (see 10.31), and the completeness of the metric determined by that absolute value (see Chapter 19).
CONVEX SETS

12.3. Definitions. Several types of sets will now be introduced together; they have similar definitions and basic properties. Let $X$ be a linear space with scalar field $\mathbb{F}$ (equal to $\mathbb{R}$ or $\mathbb{C}$), and let $S$ ... $b_{\lambda,\mu}(x, y) = \lambda x + \mu y$, for all choices of $\lambda, \mu$ in the scalar field. Likewise, $S$ is a convex set if and only if $S$ is closed under all the binary operations $b_{\lambda, 1-\lambda}$ for $\lambda \in [0, 1]$. The other classes of sets can be characterized similarly - using not only binary operations, but also unary operations ($s \mapsto \lambda s$ for balanced sets and star sets, $s \mapsto -s$ for symmetric sets) and the nullary operation 0 (for balanced sets, absolutely convex sets, and star sets).

Since these classes are Moore collections, they are closed under intersection. Thus, any intersection of convex sets is a convex set, etc. In fact, all the fundamental operations involved are finitary, and so the resulting classes of sets are algebraic closure systems, in the sense of 4.8.

Since these classes of sets are Moore collections, they yield Moore closures (see 4.3) - in fact, they yield algebraic closures (see 4.8). However, in this context it is not customary to use the term "closure." Instead we use different terms for the different kinds of closures: The smallest linear subspace containing a set $T$ is the (linear) span of $T$. The smallest convex set containing a set $T$ is the convex hull of $T$. Analogously we define the affine hull of $T$, the symmetric hull of $T$, the balanced hull of $T$, the absolutely convex hull of $T$, and the star hull of $T$. Notations for these hulls vary throughout the literature. In this book the convex hull of $T$ and balanced hull of $T$ will be denoted by $\mathrm{co}(T)$ and $\mathrm{bal}(T)$, respectively.

12.4. Some relations between convexity and its relatives. These relationships are summarized in the following chart.
[Chart: relations among the classes of sets defined in 12.3; recoverable labels: "nonzero singleton", "absolutely convex = convex and balanced".]
a. A set is absolutely convex if and only if it is convex and balanced.
b. Every balanced set is a symmetric star set.
c. Every convex set that contains 0 is a star set.
d. Every affine set is convex.
e. A subset of $X$ is a linear subspace of $X$ if and only if it is affine and contains 0. Thus any linear subspace of $X$ (in particular, $X$ itself) is convex, affine, symmetric, balanced, absolutely convex, and a star set.
f. If $x \in X \setminus \{0\}$, then the singleton $\{x\}$ is an affine set, but it is not balanced.
Moreover, suppose that the scalar field $\mathbb{F}$ is $\mathbb{R}$. Then:
g. A set is balanced if and only if it is a symmetric star set.
h. A set is absolutely convex if and only if it is nonempty, symmetric, and convex.

12.5. Further elementary properties. Let $X$ be an $\mathbb{F}$-linear space. Then:
a. Any union of symmetric sets or balanced sets or star sets is, respectively, a symmetric or balanced or star set.
b. Suppose that $\mathcal{F}$ is a nonempty collection of subsets of $X$ that is directed by inclusion - i.e., such that for each $F_1, F_2 \in \mathcal{F}$ there exists some $F \in \mathcal{F}$ such that $F_1 \cup F_2 \subseteq F$. If every member of $\mathcal{F}$ is convex or affine or absolutely convex, then the union of the members of $\mathcal{F}$ also has that property, respectively. Hint: 4.8(B).
c. The convex hull of a set $T$ is equal to the set of all convex combinations of members of $T$ - i.e., all vectors of the form
$c_1 t_1 + c_2 t_2 + \cdots + c_n t_n$,
where $n$ is a positive integer, the $t_j$'s are members of $T$, and the $c_j$'s are positive numbers whose sum is 1.
d. The convex hull of a set $T$ is the union of the convex hulls of the finite subsets of $T$.
e. The convex hull of a balanced set is balanced.
f. The absolutely convex hull of any set $S \subseteq X$ is equal to $\mathrm{co}(\mathrm{bal}(S))$.
g. The balanced hull of a convex set is not necessarily convex. See the example in the following diagram.
Example. Let the scalar field be $\mathbb{R}$, let the vector space be $\mathbb{R}^2$, and let $S = [0, 1] \times [0, 1]$. Then $S$ is convex. However, $T = \mathrm{bal}(S) = ([0, 1] \times [0, 1]) \cup ([-1, 0] \times [-1, 0])$ is balanced but not convex.
h. If $x, y \in X$, then the straight line through $x$ and $y$ is the set $\{\alpha x + (1 - \alpha) y : \alpha \in \mathbb{R}\}$. It is the affine hull of the set $\{x, y\}$, if the scalar field is $\mathbb{R}$. In a real vector space, a set $S \subseteq X$ is affine if and only if it contains the straight line through each pair of its members.
i. If $x, y \in X$, then the straight line segment from $x$ to $y$ is the set $\{\alpha x + (1 - \alpha) y : \alpha \in [0, 1]\}$. It is the convex hull of the set $\{x, y\}$. The points $x$ and $y$ are its endpoints. A set $S \subseteq X$ is convex if and only if it contains the straight line segment connecting each pair of its members (regardless of whether the scalar field is $\mathbb{R}$ or $\mathbb{C}$).
j. A subset of $\mathbb{R}$ is convex if and only if it is an interval.
k. Let $X$ be a real linear space, and let $C \subseteq X$. Then there exists an ordering $\preccurlyeq$ on $X$ that makes $X$ into an ordered vector space with nonnegative cone $X_+$ equal to $C$ if and only if $C$ satisfies these conditions: (i) $C$ is convex, (ii) $C \cap (-C) = \{0\}$, and (iii) if $x \in C$ and $r > 0$ then $rx \in C$.
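The example following 12.5.g can be checked by direct computation. The sketch below is only an illustration: the helper names `in_S`, `in_T`, and `midpoint` are not from the text, and `in_T` assumes the description $T = S \cup (-S)$ given in the example (valid here because the scalars are real and $cS \subseteq S$ for $0 \le c \le 1$).

```python
from fractions import Fraction

def in_S(p):
    """Membership in the square S = [0,1] x [0,1]."""
    x, y = p
    return 0 <= x <= 1 and 0 <= y <= 1

def in_T(p):
    """Membership in T = bal(S), taken here as S united with -S (real scalars)."""
    x, y = p
    return in_S((x, y)) or in_S((-x, -y))

def midpoint(p, q):
    """The convex combination (1/2)p + (1/2)q, computed exactly."""
    return tuple(Fraction(a + b, 2) for a, b in zip(p, q))

a, b = (1, 0), (0, -1)      # a lies in S, b lies in -S
m = midpoint(a, b)          # (1/2, -1/2)

# T contains a and b but not the midpoint of the segment joining them,
# so T is not convex.
print(in_T(a), in_T(b), in_T(m))

# T is balanced: scaling sample points of T by c with |c| <= 1 stays in T.
scalars = [Fraction(k, 4) for k in range(-4, 5)]
samples = [(1, 1), (1, 0), (0, 1), (-1, -1), (0, -1)]
balanced_on_samples = all(in_T((c * x, c * y)) for c in scalars for (x, y) in samples)
```

The pair $a = (1, 0)$, $b = (0, -1)$ is one convenient witness; any point of $S$ and any point of $-S$ not on a common line through the origin would serve as well.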
12.6. Exercises: arithmetic operations on convex sets.
a. For each $\lambda$ in some index set $\Lambda$, suppose that $C_\lambda$ is a convex subset of some linear space $X_\lambda$. Then $\prod_{\lambda \in \Lambda} C_\lambda$ is a convex subset of the linear space $\prod_{\lambda \in \Lambda} X_\lambda$.
b. Let $f : X \to Y$ be a linear map. If $S \subseteq X$ is a convex set, then so is $f(S) \subseteq Y$. If $T \subseteq Y$ is a convex set, then so is $f^{-1}(T) \subseteq X$.
c. In particular, if $S \subseteq X$ is a convex set, then $cS = \{cs : s \in S\}$ is convex for any scalar $c$, and $x_0 + S = \{x_0 + s : s \in S\}$ is convex for any vector $x_0 \in X$. If $S$ and $T$ are convex subsets of $X$, then
$\mathrm{co}(S \cup T) = \bigcup_{0 \le \alpha \le 1} [\alpha S + (1 - \alpha) T]$.
d. For any sets $A_1, A_2, \ldots, A_k \subseteq X$, the convex hull of the sum is the sum of the convex hulls. That is, $\mathrm{co}\left(\sum_{i=1}^k A_i\right) = \sum_{i=1}^k \mathrm{co}(A_i)$.
e. If $C$ is a convex set, then $C + C = 2C$. That is, $\{x + y : x, y \in C\} = \{2u : u \in C\}$. Equivalently, $\frac{1}{2} C + \frac{1}{2} C = C$.

12.7. (Optional.) As we noted in 12.3, it is possible to consider convex sets as algebraic systems, with fundamental operations given by the binary operations
$c_r(x, y) = rx + (1 - r) y$ for $r \in (0, 1)$.
One may be tempted to try to view convex sets as an equational variety and thus apply to them all the theory of equational varieties. However, convex sets do not form a variety, for they are not closed under the taking of homomorphic images that respect the fundamental operations.

It can be proved (see Romanowska and Smith [1985]) that the smallest variety containing all convex sets is the variety of barycentric algebras. These are the algebraic systems that have fundamental operations given by some binary operations $c_r$ for $r \in (0, 1)$, where the binary operations satisfy these identities:
$c_r(x, x) = x$ and $c_r(x, y) = c_{1-r}(y, x)$ when $0 < r < 1$; and
$c_{s/(s+1)}(x, c_{t-s}(y, z)) = c_{t/(s+1)}(c_{s/t}(x, y), z)$ when $0 < s < t < s + 1$.
The convex sets are the barycentric algebras that can be embedded in vector spaces; not all barycentric algebras can be so embedded.

The following example, from [Romanowska and Smith], shows that the class of convex sets is not closed under the taking of homomorphic images that respect the fundamental operations. Let $n$ be an integer greater than 1. Let $\Omega = \{e_1, e_2, \ldots, e_n\}$ be the standard basis for $\mathbb{R}^n$ - that is, let $e_j = (0, 0, \ldots, 1, \ldots, 0, 0)$ be the vector with 1 in the $j$th place and 0s elsewhere. Let $\Delta$ be the convex hull of the set $\Omega$; it is a convex subset of $\mathbb{R}^n$ (called the standard simplex). We shall also consider the set $\mathcal{P}(\Omega) = \{$subsets of $\Omega\}$ as an algebraic system, with binary operations defined by
$c_r(A, B) = A \cup B$ for $r \in (0, 1)$.
(We emphasize that all the $c_r$'s, for different values of $r$, are the same binary operation.) The set $\mathcal{P}(\Omega)$ cannot possibly be isomorphic to a convex subset of a real vector space, for any convex set that contains more than one point must contain infinitely many points. However, the mapping $f : \Delta \to \mathcal{P}(\Omega)$ defined by ... is a homomorphism - i.e., it preserves the fundamental operations of the algebraic systems. Thus $\mathcal{P}(\Omega)$ is a homomorphic image of a convex set. Therefore it preserves any identities that could be used to define the variety of convex sets - but it is not a convex set. Thus convex sets do not form a variety. In fact, $\mathcal{P}(\Omega)$ is a barycentric algebra.

12.8. Definition. Let $X$ be a linear space, with scalar field $\mathbb{R}$ or $\mathbb{C}$. A set $S \subseteq X$ is absorbing (or radial) if for each $x \in X$ we have $cx \in S$ for all scalars $c$ sufficiently small (i.e., for all scalars $c$ satisfying $|c| \le r$, where $r$ is some positive number that may depend on $x$ and $S$). Show that the absorbing sets form a proper filter on $X$; thus they are sets that are "large" in the sense of 5.3.
Absorbing sets will be important in the theory of Minkowski functionals (see 12.29.c and 12.29.g) and topological vector spaces (see 26.26, 27.9.e, and 27.20).
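The homomorphism in 12.7 can be explored numerically. The formula for $f$ did not survive in the passage above, so the sketch below assumes one natural candidate, the support map $x \mapsto \{e_j : x_j > 0\}$; this choice, and the names `support`, `c`, `grid`, and `rs`, are assumptions made for illustration. Subsets of $\Omega$ are represented by sets of coordinate indices, and $n = 3$.

```python
from fractions import Fraction

def support(x):
    """Assumed candidate for f: indices j with x_j > 0 (standing for {e_j : x_j > 0})."""
    return frozenset(j for j, coord in enumerate(x) if coord > 0)

def c(r, x, y):
    """The barycentric operation c_r(x, y) = r x + (1 - r) y on the simplex."""
    return tuple(r * a + (1 - r) * b for a, b in zip(x, y))

# Rational sample points of the standard simplex in R^3 (coordinates >= 0, sum 1).
grid = [
    (Fraction(i, 4), Fraction(j, 4), Fraction(4 - i - j, 4))
    for i in range(5)
    for j in range(5 - i)
]
rs = [Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)]

# Homomorphism property on the samples: f(c_r(x, y)) = f(x) ∪ f(y).
preserves_operations = all(
    support(c(r, x, y)) == support(x) | support(y)
    for r in rs
    for x in grid
    for y in grid
)

# The image is finite even though the simplex is infinite.
image = {support(x) for x in grid}
print(preserves_operations, len(image))
```

The check works because the coordinates are nonnegative and $0 < r < 1$, so a coordinate of $rx + (1 - r)y$ is positive exactly when the corresponding coordinate of $x$ or of $y$ is positive. On these samples the image consists of the seven nonempty index sets.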
COMBINATORIAL CONVEXITY IN FINITE DIMENSIONS (OPTIONAL)
12.9. Radon's Affineness Lemma. Let $x_0, x_1, \ldots, x_k$ be vectors in $\mathbb{R}^n$, for some positive integers $k$ and $n$ with $k > n$. Then there exist real numbers $p_0, p_1, \ldots, p_k$, not all zero, such that $\sum_{j=0}^k p_j = 0$ and $\sum_{j=0}^k p_j x_j = 0$.
Hint: First show that the vectors $x_1 - x_0, x_2 - x_0, \ldots, x_k - x_0$ are linearly dependent - see 11.25.

12.10. Caratheodory's Theorem. Let $S \subseteq \mathbb{R}^n$. Then every point in $\mathrm{co}(S)$ can be expressed as a convex combination of $n + 1$ or fewer elements of $S$.
Proof. The proof is in several steps.
(i) Let $T_k$ be the set of all convex combinations of $k$ or fewer elements of $S$. It suffices to show that if $k > n$ then $T_{k+1} \subseteq T_k$. (Why?)
(ii) Let $x \in T_{k+1}$. Then $x = \alpha_0 x_0 + \cdots + \alpha_k x_k$ for some $x_0, x_1, \ldots, x_k \in S$ and $\alpha_0, \alpha_1, \ldots, \alpha_k \in (0, 1]$ with $\alpha_0 + \cdots + \alpha_k = 1$. (Explain.)
(iii) Choose real numbers $p_0, p_1, \ldots, p_k$ as in Radon's Lemma. For $j = 0, 1, \ldots, k$ and any real number $r$, let $\beta_j(r) = \alpha_j - r p_j$. Show that $x = \sum_{j=0}^k \beta_j(r) x_j$ and $1 = \sum_{j=0}^k \beta_j(r)$.
(iv) By a suitable choice of $r$, show that $x \in T_k$.

12.11. Radon's Intersection Theorem. Let $S$ be a subset of $\mathbb{R}^n$ consisting of at least $n + 2$ points. Then $S$ can be partitioned into disjoint subsets $Q$ and $R$ such that $\mathrm{co}(Q)$ meets $\mathrm{co}(R)$.
Hints: Let $S \supseteq \{x_0, x_1, \ldots, x_{n+1}\}$. Choose real numbers $p_0, p_1, \ldots, p_{n+1}$ as in Radon's Lemma. By relabeling and reordering, we may assume ... and ..., where $0 \le r < n + 1$ (explain). Now let ... and ... Then what?

12.12. Helly's Intersection Theorem. Let $S_0, S_1, \ldots, S_k$ be convex subsets of $\mathbb{R}^n$, where $k$ and $n$ are positive integers and $k > n$. Suppose that each $n + 1$ of these sets have nonempty intersection. Then $\bigcap_{i=0}^k S_i$ is nonempty.
Hints: By induction on $k$, we may assume that the intersection of any $k$ of the $S_j$'s is nonempty (explain). For each $j = 0, 1, \ldots, k$, pick some $x_j \in \bigcap_{i \ne j} S_i$. Apply Radon's Intersection Theorem to the points $x_j$. (How?)

12.13. The following result is interesting enough to deserve mention, though its proof is too difficult to include here:
Shapley-Folkman Theorem. Suppose $x \in \sum_{j=1}^m \mathrm{co}(A_j)$, in $\mathbb{R}^n$. Then $x$ can be expressed as $x = \sum_{j=1}^m x_j$, where each $x_j \in \mathrm{co}(A_j)$ and where $\{j : x_j \notin A_j\}$ has cardinality at most $n$.
Taking $m$ much larger than $n$, this shows that the sum of a large number of arbitrary sets is "almost convex." Proofs can be found in the appendices of Arrow and Hahn [1971] and Starr [1969]. Actually, those proofs assume the sets $A_j$ are compact, but the problem can easily be reduced to that case by using Caratheodory's Theorem and its consequences; see 26.23.g.
Other matters related to the theorems of Radon, Helly, and Caratheodory are considered by Danzer, Grünbaum, and Klee [1963]. Additional material on convexity, especially in finite dimensions, can be found in Roberts and Varberg [1973], Rockafellar [1970], and Stoer and Witzgall [1970].
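Steps (ii) through (iv) of the proof of Caratheodory's Theorem in 12.10 amount to a reduction procedure, sketched below with exact rational arithmetic. This is one possible reading of the hints rather than the text's own algorithm: the function names `null_vector` and `caratheodory` are illustrative, the coefficients $p_j$ of Radon's Lemma are obtained by Gaussian elimination, and the "suitable choice of $r$" is taken to be the smallest ratio $\alpha_j / p_j$ over those $j$ with $p_j > 0$.

```python
from fractions import Fraction

def null_vector(matrix):
    """Return a nonzero v with matrix * v = 0; assumes more columns than rows."""
    rows = [[Fraction(a) for a in row] for row in matrix]
    ncols = len(rows[0])
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for col in range(ncols):
        pr = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        piv = rows[r][col]
        rows[r] = [a / piv for a in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    free = next(col for col in range(ncols) if col not in pivots)
    v = [Fraction(0)] * ncols
    v[free] = Fraction(1)
    for i, col in enumerate(pivots):
        v[col] = -rows[i][free]
    return v

def caratheodory(points, weights):
    """Rewrite sum(w_j x_j) as a convex combination of at most n + 1 of the points.

    Assumes the weights are positive and sum to 1.
    """
    n = len(points[0])
    points = [tuple(Fraction(t) for t in pt) for pt in points]
    weights = [Fraction(w) for w in weights]
    while len(points) > n + 1:
        # Radon's Lemma: coefficients p with sum(p_j) = 0 and sum(p_j x_j) = 0.
        matrix = [[pt[i] for pt in points] for i in range(n)]
        matrix.append([1] * len(points))
        p = null_vector(matrix)
        # Largest r that keeps every beta_j(r) = alpha_j - r p_j nonnegative.
        r = min(w / pj for w, pj in zip(weights, p) if pj > 0)
        betas = [w - r * pj for w, pj in zip(weights, p)]
        keep = [j for j, b in enumerate(betas) if b > 0]
        points = [points[j] for j in keep]
        weights = [betas[j] for j in keep]
    return points, weights

pts, wts = caratheodory([(0, 0), (1, 0), (0, 1), (1, 1)], [Fraction(1, 4)] * 4)
combo = tuple(sum(w * pt[i] for w, pt in zip(wts, pts)) for i in range(2))
print(pts, wts, combo)
```

Because the $p_j$ sum to zero and are not all zero, some $p_j$ is positive, so the minimum is taken over a nonempty set; at that value of $r$ at least one $\beta_j(r)$ vanishes, which is why each pass discards at least one point.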
CONVEX FUNCTIONS
12.14. Remarks. For the definitions below, we consider functions $f$ taking values in $[-\infty, +\infty]$. The definitions can be simplified slightly when $f$ is known to be real-valued - i.e., when $-\infty, +\infty \notin \mathrm{Range}(f)$ - and certainly that restricted case still covers most of the applications. For these reasons, some mathematicians define "convex" only for real-valued functions. However, the greater generality of extended real-valued functions is occasionally useful, because $[-\infty, +\infty]$ is order complete - i.e., we can always take sups and infs in $[-\infty, +\infty]$.
Arithmetic in $[-\infty, +\infty]$ is defined as in 1.17. Note that a sum of finitely many terms, $r_1 + r_2 + \cdots + r_n$, is defined if and only if $-\infty$ and $+\infty$ are not both among $r_1, r_2, \ldots, r_n$.

12.15. Definition. Let $C$ be a convex subset of a linear space $X$, and let $f : C \to [-\infty, +\infty]$ be some function. Then the following conditions are equivalent; if they are satisfied we say $f$ is a convex function.
(A) The set $\{(x, r) \in C \times \mathbb{R} : f(x) \le r\}$ is a convex subset of $C \times \mathbb{R}$. (This set is called the epigraph of $f$.)
(B) The set $\{(x, r) \in C \times \mathbb{R} : f(x) < r\}$ is a convex subset of $C \times \mathbb{R}$.
(C) Whenever $x_0, x_1 \in C$ and $0 < \lambda < 1$, then
$f((1 - \lambda) x_0 + \lambda x_1) \le (1 - \lambda) f(x_0) + \lambda f(x_1)$
whenever the right side is defined (see remark in 12.14).
(D) Whenever $n$ is a positive integer and $\lambda_1, \lambda_2, \ldots, \lambda_n$ are positive numbers summing to 1 and $x_1, x_2, \ldots, x_n \in C$, then
$f(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \le \lambda_1 f(x_1) + \lambda_2 f(x_2) + \cdots + \lambda_n f(x_n)$
whenever the right side is defined (see remark in 12.14).
(E) Whenever $n$ is a positive integer and $\mu_1, \mu_2, \ldots, \mu_n$ are positive numbers and $x_1, x_2, \ldots, x_n \in C$, then
$f\left(\dfrac{\mu_1 x_1 + \mu_2 x_2 + \cdots + \mu_n x_n}{\mu_1 + \mu_2 + \cdots + \mu_n}\right) \le \dfrac{\mu_1 f(x_1) + \mu_2 f(x_2) + \cdots + \mu_n f(x_n)}{\mu_1 + \mu_2 + \cdots + \mu_n}$
whenever the right side is defined.
If $f$ is real-valued - i.e., if $\pm\infty \notin \mathrm{Range}(f)$ - then the following conditions are also equivalent.
(F) For each $v \in X$ and $\xi \in C$, the function
$p \mapsto h_{\xi,v}(p) = \dfrac{f(\xi + pv) - f(\xi)}{p}$
is increasing on the interval $\{p \in \mathbb{R} : p > 0,\ \xi + pv \in C\}$.
(G) For each $v \in X$ and $\xi \in C$, the function $p \mapsto h_{\xi,v}(p) = [f(\xi + pv) - f(\xi)]/p$ is increasing on the set where it is defined - i.e., on the set $\{p \in \mathbb{R} \setminus \{0\} : \xi + pv \in C\}$.
Hints: The equivalence of (A), (B), (C) follows by considering various cases, according to whether each number involved is $+\infty$, $-\infty$, or a finite real number. Obviously (D) implies (C) as a special case; conversely, (D) follows from (C) by induction. Condition (E) is just a reformulation of (D).
Now suppose $f$ is real-valued. To prove (C) $\Rightarrow$ (F), show $h_{\xi,v}(\lambda p) \le h_{\xi,v}(p)$ for $0 < \lambda < 1$ by taking $x_0 = \xi$ and $x_1 = \xi + pv$. To prove (F) $\Rightarrow$ (C), take $\xi = x_0$ and $v = x_1 - x_0$; use the fact that $h_{\xi,v}(\lambda) \le h_{\xi,v}(1)$. Obviously (G) implies (F). To prove that (C) and (F) together imply (G), note that $h_{\xi,-v}(-p) = -h_{\xi,v}(p)$; also, the inequality $h_{\xi,v}(-p) \le h_{\xi,v}(p)$ for $p > 0$ follows from the convexity of $f$.

12.16. Further definitions. A function $g : C \to [-\infty, +\infty]$ is concave if $-g$ is convex. A function $h : C \to [-\infty, +\infty]$ is affine if it is both concave and convex. An equivalent condition for $h$ to be affine is that whenever $x_0, x_1 \in C$ and $0 < \lambda < 1$, then
$h((1 - \lambda) x_0 + \lambda x_1) = (1 - \lambda) h(x_0) + \lambda h(x_1)$
whenever the right side is defined - i.e., whenever we do not have one of $h(x_0), h(x_1)$ equal to $-\infty$ and the other equal to $+\infty$.

12.17. Some elementary properties of convex functions. Let $X$ be a vector space, let $C$ be a convex subset of $X$, and let $f : C \to [-\infty, +\infty]$ be some function. Then:
a. $f$ is convex if and only if the restriction $f|_L$ is a convex function for each line segment $L$ whose endpoints are elements of $C$ - equivalently, if and only if for each $x_0, x_1 \in C$, the function $\lambda \mapsto f((1 - \lambda) x_0 + \lambda x_1)$ is a convex function from the interval $[0, 1]$ into $[-\infty, +\infty]$.
b. We say $f$ is quasiconvex if the set $\{x \in C : f(x) \le r\}$ is a convex set for each $r \in [-\infty, +\infty]$. Show that
(i) Every convex function is quasiconvex.
(ii) Every increasing function from $\mathbb{R}$ into $[-\infty, +\infty]$ is quasiconvex.
(iii) (Example.) The function $f(x) = x^3$ is increasing on $\mathbb{R}$, hence quasiconvex, but it is not convex. (Hint: Use 12.19(E).)
c. We say $f$ is strictly convex if it has this property: Whenever $x$ and $y$ are two distinct points in $C$ and $0 < \lambda < 1$, then $f(\lambda x + (1 - \lambda) y) < \lambda f(x) + (1 - \lambda) f(y)$. Show that if $C$ is an open interval in the real line, then any convex function from $C$ into $\mathbb{R}$ is either affine or strictly convex.
d. If $f$ is a real-valued function defined on a linear space, then $f$ is affine if and only if $f - f(0)$ is linear.
Caution: In some contexts the term "linear" is used for affine maps as well. Especially, a "piecewise-linear" map is a map that is defined separately on various parts of its domain and is affine on each of those parts. This terminology is especially common in numerical analysis.
e. If $f$ is real-valued and convex and its domain $C$ is the convex hull of a finite set, then $f$ is bounded. Hints: Say $C = \mathrm{co}\{x_1, x_2, \ldots, x_n\}$. First show $\sup_{x \in C} f(x) \le \max_j f(x_j)$. Then let $u = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$. For each $y \in C$, show there is some corresponding $y' \in C$ satisfying $u = \frac{1}{n+1} y + \frac{n}{n+1} y'$. Use this to obtain a lower bound on $f(y)$.
f. (Optional.) Let $V$ be a real linear space. Let ...

... $(X, \succcurlyeq, 1, 0, \complement, \vee, \wedge)$ is another Boolean lattice - i.e., we obtain a new Boolean lattice if we keep the same set $X$ and the same complementation operation, but swap 0 and 1, and swap meets and joins. (In fact, the mapping $x \mapsto \complement x$ is an isomorphism, in the sense of 13.7, from $B$ onto $B^{\mathrm{op}}$.) Any statement about Boolean lattices has a dual statement that follows as a consequence by this swapping. For instance, the two De Morgan's Laws in 13.5.d are dual to each other. When two statements are dual to each other in this fashion, for brevity we may state just one of them.
BOOLEAN HOMOMORPHISMS AND SUBALGEBRAS

13.7. Definitions. We may view Boolean lattices as an equational variety, in the sense of 8.50. The fundamental operations are $\vee, \wedge, \complement, 0, 1$. A Boolean lattice satisfies the axioms of a lattice (that is, L1-L3 in 4.20), together with these axioms:
$x \wedge 0 = x \wedge \complement x = 0$, $\qquad x \vee 1 = x \vee \complement x = 1$.
A Boolean lattice, viewed as an algebraic system in this fashion, is usually called a Boolean algebra. We may occasionally revert to the term "Boolean lattice" to emphasize the ordering structure. We emphasize that, in this book, the singleton $\{0\}$ is a Boolean algebra (albeit a degenerate one); see the remarks in 13.4.a.
A Boolean homomorphism is a homomorphism in this variety - i.e., a mapping that preserves the fundamental operations. Thus, a Boolean homomorphism means a mapping $f : X \to Y$ from one Boolean algebra into another that satisfies
$f(x_1 \vee x_2) = f(x_1) \vee f(x_2)$, $f(x_1 \wedge x_2) = f(x_1) \wedge f(x_2)$, $f(\complement x) = \complement f(x)$, $f(0) = 0$, $f(1) = 1$,
for all $x, x_1, x_2 \in X$. We may call this a Boolean algebra homomorphism for emphasis or clarification.
Exercise. It suffices to show that $f$ preserves $\vee$ and $\complement$; the other conditions then follow as consequences. Hint: $0 = 0 \wedge \complement 0$.
13.8. Definition and a concrete example. A two-valued homomorphism on a Boolean algebra $X$ is a Boolean homomorphism from $X$ into the Boolean algebra $2 = \{0, 1\}$.
If $X$ is an algebra of subsets of some set $\Omega$, and $\omega_0 \in \Omega$, then one two-valued homomorphism on $X$ is the probability concentrated at $\omega_0$:
$\mu(S) = 1$ if $\omega_0 \in S$, and $\mu(S) = 0$ if $\omega_0 \notin S$, for $S \in X$.
Exercise. If $Y$ is a nondegenerate Boolean algebra, then there does not exist any homomorphism from the degenerate Boolean algebra $\{0\}$ into $Y$. In particular, there is no two-valued homomorphism on the degenerate Boolean algebra $\{0\}$.
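The concrete example in 13.8 can be verified by brute force on a small algebra of sets. The sketch below takes $X$ to be the power set of a three-element set; the names `omega`, `algebra`, `mu`, `majority`, and `is_boolean_hom` are illustrative choices, not notation from the text.

```python
from itertools import combinations

omega = ("a", "b", "c")
full = frozenset(omega)
# The algebra X: all subsets of omega.
algebra = [frozenset(s) for k in range(len(omega) + 1) for s in combinations(omega, k)]

def mu(S, w0="b"):
    """Probability concentrated at w0: 1 if w0 belongs to S, else 0."""
    return 1 if w0 in S else 0

def majority(S):
    """A map into {0, 1} that is not a homomorphism, for contrast."""
    return 1 if len(S) >= 2 else 0

def is_boolean_hom(f):
    """Check that f : X -> {0, 1} preserves 0, 1, complement, join, and meet."""
    if f(frozenset()) != 0 or f(full) != 1:
        return False
    if any(f(full - S) != 1 - f(S) for S in algebra):
        return False
    return all(
        f(S | T) == max(f(S), f(T)) and f(S & T) == min(f(S), f(T))
        for S in algebra
        for T in algebra
    )

print(is_boolean_hom(mu), is_boolean_hom(majority))
```

In $2 = \{0, 1\}$ the join, meet, and complement are `max`, `min`, and `1 - t`, which is what the checker uses. The map `majority` respects complements on this algebra but fails on joins, for instance on the union of two singletons.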
13.9. More definitions. If $X$ is a Boolean algebra, then a Boolean subalgebra of $X$ is a subobject of $X$ in the variety of Boolean algebras - i.e., it is a set $S \subseteq X$ that is closed under the fundamental operations. Thus, a Boolean subalgebra of $X$ is a nonempty set $S \subseteq X$ that satisfies
$x, y \in S \ \Rightarrow\ x \vee y,\ x \wedge y,\ \complement x \in S$.
Note that $S$ itself is then a Boolean algebra, when equipped with the restrictions of the operations of $X$.
We can apply to Boolean subalgebras all the conclusions of Chapter 4 about Moore closed sets and all the conclusions of Chapter 9 about subalgebras in an equational variety. Thus, $X$ is a Boolean subalgebra of itself; the intersection of any collection of Boolean subalgebras is a Boolean subalgebra; any homomorphic image of a Boolean subalgebra is a Boolean subalgebra; etc. The Boolean subalgebra $S$ generated by a set $G \subseteq X$ is the smallest Boolean subalgebra that includes $G$; it is equal to the intersection of all the Boolean subalgebras that include $G$; the set $G$ is then called a generating set, or a set of generators, for the Boolean subalgebra $S$.
In the special case where $X = \mathcal{P}(\Omega)$ for some set $\Omega$ and $G$ is a collection of subsets of $\Omega$, we find that $S$ is the algebra of sets generated by $G$ (see 5.26.e).
13.10. Normal Form Theorem. Let $X$ be a Boolean algebra, and let $G \subseteq X$. Then the Boolean subalgebra $S$ generated by $G$ can be described more concretely in three stages, as follows: Let
$G^{c} = \{x \in X : x \in G$ or $\complement x \in G\}$,
$G^{c\wedge} = \{x \in X : x$ is the inf of finitely many members of $G^{c}\}$,
$G^{c\wedge\vee} = \{x \in X : x$ is the sup of finitely many members of $G^{c\wedge}\}$.
(We have $1 \in G^{c\wedge}$ and $0 \in G^{c\wedge\vee}$ since the inf of no members of $X$ is 1 and the sup of no members of $X$ is 0.) Then $G^{c\wedge\vee} = S$. In other words, $S$ consists of the elements of $X$ that can be written in the form
$$s = \bigvee_{i=1}^{I} \bigwedge_{j=1}^{J_i} \complement^{n(i,j)} g_{i,j} \qquad (*)$$
where $I, J_1, J_2, \ldots, J_I$ are nonnegative integers, the $n(i,j)$'s are nonnegative integers (or more simply, 0s and 1s), and the $g_{i,j}$'s are elements of $G$. An expression such as the right side of $(*)$ is said to be in normal form. In particular, the subalgebra generated by a finite set is also finite.
Hints: Obviously $G^{c\wedge\vee}$ is closed under finite sups. Use the Distributive Law and De Morgan's Laws to show that $G^{c\wedge\vee}$ is also closed under finite infs and under complementation.

Remarks. An important special case is that in which $X = \mathcal{P}(\Omega)$ for some set $\Omega$. Then $G, G^{c}, G^{c\wedge}, G^{c\wedge\vee}$ are collections of subsets of $\Omega$, and "inf" and "sup" mean "intersection" and "union," respectively. The theorem above shows that the algebra of sets $\mathcal{S}$ generated by a given collection of sets $\mathcal{G}$ can be obtained by a three-stage construction.
An analogous three-stage construction does not work for $\sigma$-algebras: If we start with a collection $\mathcal{G}$ of subsets of $\Omega$, and then close it under complementation, then under countable intersection, then under countable union, the resulting collection $\mathcal{A}_{\delta\sigma}$ is not necessarily equal to the $\sigma$-algebra generated by $\mathcal{G}$. Indeed, $\mathcal{A}_{\delta\sigma}$ is contained in the $\sigma$-algebra generated by $\mathcal{G}$, but $\mathcal{A}_{\delta\sigma}$ is not necessarily closed under complementation or countable intersection. An example is given by $\Omega = 2^{\mathbb{N}}$, with $\mathcal{A}$ equal to the collection of all sets of the form $P \times \prod_{j=m+1}^{\infty} \{0, 1\}$, where $m$ is any positive integer and $P$ is any subset of $\prod_{j=1}^{m} \{0, 1\}$. We omit the lengthy computation that shows that the resulting collection $\mathcal{A}_{\delta\sigma}$ is not closed under complementation or countable intersection.
It is easy to see where the analogy between algebras and $\sigma$-algebras breaks down: the product of finitely many finite sets is finite, but a product $\prod_{\gamma \in C} A_\gamma$ (as in 1.38) of countably many countable sets is not necessarily countable - see 2.20.k and 2.20.l.
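For a finite power set, the three-stage construction of 13.10 can be compared directly with the subalgebra obtained by brute-force closure. The sketch below is an illustration under that finiteness assumption; the function names `close_under`, `three_stage`, and `generated_subalgebra` are illustrative, and finite infs and sups are computed by iterating the binary operations until nothing new appears.

```python
def close_under(sets, op, identity):
    """Close a finite family under a binary operation; the identity is the empty inf or sup."""
    closed = set(sets) | {identity}
    while True:
        new = {op(a, b) for a in closed for b in closed} - closed
        if not new:
            return closed
        closed |= new

def three_stage(omega, gens):
    """G^c, then finite intersections, then finite unions, as in 13.10."""
    omega = frozenset(omega)
    gens = {frozenset(g) for g in gens}
    g_c = gens | {omega - g for g in gens}                       # stage 1
    g_c_inf = close_under(g_c, frozenset.intersection, omega)    # stage 2
    return close_under(g_c_inf, frozenset.union, frozenset())    # stage 3

def generated_subalgebra(omega, gens):
    """Brute force: close under complement and union until stable."""
    omega = frozenset(omega)
    alg = {frozenset(g) for g in gens} | {omega, frozenset()}
    while True:
        new = {omega - a for a in alg} | {a | b for a in alg for b in alg}
        if new <= alg:
            return alg
        alg |= new

omega = {1, 2, 3, 4}
gens = [{1, 2}, {2, 3}]
staged = three_stage(omega, gens)
brute = generated_subalgebra(omega, gens)
print(len(staged), staged == brute)
```

With the generators $\{1, 2\}$ and $\{2, 3\}$, the second stage already produces the four singletons, so the third stage yields all 16 subsets of $\Omega$.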
13.11. Sikorski's extension criterion. Let $G$ be a subset of a Boolean algebra $X$, which generates a Boolean subalgebra $S \subseteq X$. Let $Y$ be another Boolean algebra, and let $f : G \to Y$ be some mapping. Then $f$ can be extended to a Boolean homomorphism $F : S \to Y$ if and only if $f$ satisfies the following condition:
$\complement^{n_1} g_1 \wedge \complement^{n_2} g_2 \wedge \cdots \wedge \complement^{n_k} g_k = 0$ implies $\complement^{n_1} f(g_1) \wedge \complement^{n_2} f(g_2) \wedge \cdots \wedge \complement^{n_k} f(g_k) = 0$
for every nonnegative integer $k$ and every choice of $g_1, g_2, \ldots, g_k \in G$ and $n_1, n_2, \ldots, n_k \in \{0, 1\}$.
Moreover, if this condition is satisfied, then the extension $F : S \to Y$ is uniquely determined.

Remark. This theorem is similar in nature to 11.10, though a bit more complicated.
Proof of theorem. The uniqueness of $F$ is clear: If an extending homomorphism $F : S \to Y$ exists, then it must satisfy
$$F(s) = \bigvee_{i=1}^{I} \bigwedge_{j=1}^{J_i} \complement^{n(i,j)} f(g_{i,j}) \quad \text{whenever} \quad s = \bigvee_{i=1}^{I} \bigwedge_{j=1}^{J_i} \complement^{n(i,j)} g_{i,j}. \qquad (1)$$
Every $s \in S$ can be expressed as a combination of $g_{i,j}$'s in $G$ as above, by 13.10; hence there is at most one homomorphism $F : S \to Y$ that extends $f$.
It is not immediately clear that equation (1) determines a function, however. Some $s \in S$ may be representable in normal form in terms of $g_{i,j}$'s in more than one way, and so we must verify that the resulting value of $F(s)$ does not depend on the particular representation of $s$. After we establish that, we shall show that the function $F$ defined by (1) is indeed a homomorphism.
To show that (1) actually does define a function, suppose that
$$s = \bigvee_{i=1}^{I} \bigwedge_{j=1}^{J_i} \complement^{n(i,j)} g_{i,j} = \bigvee_{k=1}^{K} \bigwedge_{l=1}^{L_k} \complement^{m(k,l)} h_{k,l}$$
for some $g_{i,j}$'s and $h_{k,l}$'s in $G$, and let ...
... $\succcurlyeq u_0\}$ is a Boolean ultrafilter in $X$. Hence its characteristic function
$f(y) = 1$ if $y \in T$, and $f(y) = 0$ if $y \in X \setminus T$,
is a member of $X^*$ with $f(x_0) = 1$.
13.21. Lemma on Stone's Epimorphism. Let $X$ be any Boolean algebra. Assume that $X^*$ is nonempty. Then there exists a Boolean homomorphism from $X$ onto an algebra $\mathcal{S}$ of subsets of $X^*$. In fact, one such homomorphism may be defined as follows: For each $x \in X$, let $S(x) = \{f \in X^* : f(x) = 1\}$. Let $\mathcal{S}$ be the range of this mapping. Verify that

$$S(x \vee y) = S(x) \cup S(y), \quad S(x \wedge y) = S(x) \cap S(y), \quad S(\complement x) = X^* \setminus S(x), \quad S(0) = \varnothing, \quad S(1) = X^*.$$

These equations show that $\mathcal{S}$ is an algebra of subsets of $X^*$ and that the mapping $x \mapsto S(x)$ is a homomorphism from $X$ onto $\mathcal{S}$. These observations do not require the Axiom of Choice or any of its weakened forms. However, we can draw further conclusions about Stone's epimorphism if we assume some weakened form of the Axiom of Choice; see (UF6) in 13.22.
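The identities in 13.21 can be checked by brute force on a small case. The following is only a sketch under assumptions that are not part of the text: $X$ is taken to be the power set of a three-point set, $X^*$ is found by exhaustive search over all two-valued functions, and the names `X_star`, `S`, and `stone_identities_hold` are illustrative.

```python
from itertools import combinations, product

# Assumed toy example: X is the Boolean algebra of all subsets of {0, 1, 2}.
ATOMS = (0, 1, 2)
TOP = frozenset(ATOMS)
X = [frozenset(c) for r in range(len(ATOMS) + 1) for c in combinations(ATOMS, r)]

def is_homomorphism(f):
    """Check that f : X -> {0, 1} preserves join, meet and complement."""
    for x in X:
        if f[TOP - x] != 1 - f[x]:
            return False
        for y in X:
            if f[x | y] != max(f[x], f[y]) or f[x & y] != min(f[x], f[y]):
                return False
    return True

# X* = all two-valued Boolean homomorphisms on X, found by exhaustive search.
X_star = []
for values in product((0, 1), repeat=len(X)):
    f = dict(zip(X, values))
    if is_homomorphism(f):
        X_star.append(f)

ALL = frozenset(range(len(X_star)))

def S(x):
    """Stone's mapping: indices of the homomorphisms that send x to 1."""
    return frozenset(i for i, f in enumerate(X_star) if f[x] == 1)

def stone_identities_hold():
    """Verify the five identities listed in 13.21 for every x, y in X."""
    ok = S(frozenset()) == frozenset() and S(TOP) == ALL
    for x in X:
        ok = ok and S(TOP - x) == ALL - S(x)
        for y in X:
            ok = ok and S(x | y) == S(x) | S(y) and S(x & y) == S(x) & S(y)
    return ok
```

In this finite case no choice principle is needed: the search finds one homomorphism per point of the underlying set, and $S$ turns out to be injective as well.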
BOOLEAN EQUIVALENTS OF UF 13.22. We shall show that the several principles listed below are equivalent to the Ultrafilter Principle. In Chapter 6 we proved (UF1) ⇒ (UF2) (and in Chapters 7 and 9 we proved that (UF1) … can be described also as the set of all functions $f$ from subsets of $X$ into $\{0, 1\}$, such that $\{(x_0, 1)\} \cup \operatorname{Graph}(f)$ is the graph of a function that satisfies Sikorski's extension criterion (13.11). It is easy to verify that $\Phi$ has finite character, in the sense of (UF2)(iii). Also, $\Phi$ satisfies (UF2)(i) trivially, since the set $\{0, 1\}$ is finite. To verify (UF2)(ii), let $S$ be any finite subset of $X$. Then the Boolean subalgebra generated by $S \cup \{x_0\}$ is finite, and so we can apply 13.20 to it. Thus, (UF2) is applicable; this completes the proof. This argument is a modification of one by Rice [1968].
Proof of (UF5) ⇒ (UF6). $X^*$ is nonempty, so Stone's mapping $S : X \to \mathcal{S}$ in 13.21 is well-defined. Also, from (UF5) we see that $x \in X \setminus \{0\} \Rightarrow S(x) \neq \varnothing$. Thus the ring-with-unit homomorphism $S : X \to \mathcal{S}$ has kernel equal to $\{0\}$, so $S$ is injective. Therefore $S$ is an isomorphism from $X$ onto $\mathcal{S}$.
Proof of (UF6) ⇒ (UF7). Obvious.

Proof of (UF7) ⇒ (UF8). Immediate from the example in 13.8.

Proof of (UF8) ⇒ (UF9). Let $I$ be a given proper ideal in $X$. Let $\pi : X \to X/I$ be the quotient map, onto the quotient Boolean algebra. By (UF8), $X/I$ has a prime ideal $P$. Verify that $\pi^{-1}(P)$ is a prime ideal in $X$ that includes $I$.
Proof of (UF9) ⇒ (UF10). The "only if" part is obvious and does not require (UF9); any subset of an ultrafilter (or more generally, any subset of a proper filter) has the finite meet property. For the "if" part, conversely, suppose $S$ has the finite meet property. Then the set

$$\{x \in X : x \succcurlyeq s_1 \wedge \cdots \wedge s_n \text{ for some finite set } \{s_1, \ldots, s_n\} \subseteq S\}$$

is a proper filter containing $S$. (In fact, it is the smallest such filter; it is the filter generated by $S$.) By (UF9), this filter is contained in some ultrafilter.
Proof of (UF10) ⇒ (UF1). Immediate from 13.18(F).
HEYTING ALGEBRAS

13.23. In this subchapter we shall consider two types of algebraic systems that are slightly more general than Boolean algebras. They have most of the properties of Boolean algebras, but not quite all. In particular, they lack some of the symmetry or duality of Boolean algebras; thus they might be thought of as "one-sided Boolean algebras." That relatively pseudocomplemented lattices are more general than Heyting algebras and Heyting algebras are more general than Boolean algebras can be seen from the examples in 13.28.

13.24. Definition. Let $X$ be a lattice, and let $a, b \in X$. Then the set $\{x \in X : a \wedge x \preccurlyeq b\}$ is nonempty - for instance, $b$ is a member. The pseudocomplement of $a$ relative to $b$ is the element of $X$ denoted by $a \Rightarrow b$ and defined by this formula:

$$(a \Rightarrow b) = \max\{x \in X : a \wedge x \preccurlyeq b\}$$

if such a maximum exists. We shall also refer to $\Rightarrow$ as the Heyting implication. We say that $X$ is a relatively pseudocomplemented lattice if the Heyting implication is a binary operation - that is, $a \Rightarrow b$ exists in $X$ for all $a, b \in X$, and thus the Heyting implication is a mapping from $X \times X$ into $X$.
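13.25. Basic properties. Let $X$ be a relatively pseudocomplemented lattice. Then:
a. $x \preccurlyeq (a \Rightarrow b)$ if and only if $a \wedge x \preccurlyeq b$.
b. $x \preccurlyeq (a \Rightarrow (b \Rightarrow c))$ if and only if $a \wedge b \wedge x \preccurlyeq c$.
c. Interchange of Hypotheses. $(a \Rightarrow (b \Rightarrow c)) = (b \Rightarrow (a \Rightarrow c))$.
d. $(a \wedge (a \Rightarrow b)) \preccurlyeq b \preccurlyeq (a \Rightarrow b)$.
e. If $a \preccurlyeq b$, then $(c \Rightarrow a) \preccurlyeq (c \Rightarrow b)$ and $(a \Rightarrow c) \succcurlyeq (b \Rightarrow c)$. Hint: $\{x \in X : c \wedge x \preccurlyeq a\} \subseteq \{x \in X : c \wedge x \preccurlyeq b\}$ and $\{x \in X : a \wedge x \preccurlyeq c\} \supseteq \{x \in X : b \wedge x \preccurlyeq c\}$.
f. $(a \Rightarrow b) \preccurlyeq ((b \Rightarrow c) \Rightarrow (a \Rightarrow c))$.
g. $X$ has a largest element, hereafter denoted by 1. It is equal to $(a \Rightarrow a)$, for any $a \in X$.
h. $(1 \Rightarrow a) = a$ and $(a \Rightarrow 1) = 1$ for any $a \in X$.
i. $a \preccurlyeq b$ if and only if $(a \Rightarrow b) = 1$.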
j. X is a distributive lattice. In fact, it satisfies one of the infinite distributive laws: If a E X and S
X by
E X : a 1\ x = 0}.
C is called the pseudocomplement.
13.27. Basic properties. Let $X$ be a Heyting algebra. Prove the following properties. (Several of these are just specializations of results of 13.25, obtained by setting one of the variables to 0.)
a. $\complement 0 = 1$, $\complement 1 = 0$, $(0 \Rightarrow b) = 1$.
b. If $a \preccurlyeq b$, then $\complement b \preccurlyeq \complement a$.
c. Contrapositive Law. $(a \Rightarrow (\complement b)) = (b \Rightarrow (\complement a))$.
d. Double Negation Law. $a \preccurlyeq \complement\complement a$. Hint: Use the Contrapositive Law with $b = \complement a$, and use 13.25.i.
e. $(a \Rightarrow b) \preccurlyeq ((\complement b) \Rightarrow (\complement a))$.
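f. $a \wedge (\complement a) = 0$.
g. $(a \Rightarrow (\complement a)) = (\complement a)$.
h. Brouwer's Triple Negation Law. $\complement\complement\complement a = \complement a$. Hints: $(\complement a) \preccurlyeq \complement\complement(\complement a)$ by applying the Double Negation Law to $\complement a$. Also, apply $\complement$ to both sides of the Double Negation Law; by 13.27.b this yields $\complement(\complement\complement a) \preccurlyeq \complement(a)$.
i. $((\complement a) \wedge (\complement b)) = \complement(a \vee b)$.
j. $((\complement a) \vee (\complement b)) \preccurlyeq \complement(a \wedge b)$.
k. $((\complement a) \vee b) \preccurlyeq (a \Rightarrow b)$.
l. $\complement\complement(b \vee (\complement b)) = 1$. Hint: Use 13.27.f with $a = \complement b$; also use 13.27.i.
m. Although we omit the proof, it can be shown that Heyting algebras form an equational variety. See Rasiowa [1974].
n. Every Boolean lattice is a Heyting algebra.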
13.28. Topological examples. Let $X$ be a set, and let $\mathcal{D}$ be a collection of subsets of $X$ that is closed under finite intersections and arbitrary unions. Assume that the partially ordered set $(\mathcal{D}, \subseteq)$ is a lattice. Then $(\mathcal{D}, \subseteq)$ is a relatively pseudocomplemented lattice. To see this, let $S, T$ be any two given members of $\mathcal{D}$. Let $X_{S,T} = \{G \in \mathcal{D} : S \cap G \subseteq T\}$. Then the union of the members of $X_{S,T}$ is itself a member of $X_{S,T}$, and thus the largest member of $X_{S,T}$; hence it satisfies the requirements for a relative pseudocomplement. We note two particular instances of this when $X$ is a topological space; these examples are from Rasiowa and Sikorski [1963]:
a. The lattice of open sets, discussed in 5.21, is a Heyting algebra, since it also has a smallest member - namely, the empty set. (Although the proof is too long to present here, it can be shown that, conversely, any Heyting algebra is lattice isomorphic to the lattice of open sets of some topological space; a proof of this is given by Rasiowa and Sikorski [1963, page 128].) In 5.21 we verified directly that the lattice of open sets satisfies one of the infinite distributive laws; that fact also follows from 13.25.j. The lattice of open sets may or may not be a Boolean algebra; in 5.21 we gave an example in which the lattice of open sets does not satisfy the other infinite distributive law and thus is not a Boolean algebra.
b. The open dense subsets of a topological space $X$ form a lattice, with binary lattice operations $\cup$, $\cap$ (see 15.13.c). In fact, it is a relatively pseudocomplemented lattice, by the argument given at the beginning of this section. It may or may not have a smallest member, and thus it may or may not be a Heyting algebra. For instance, (exercise) it does not have a smallest member if $X$ is the real line with its usual topology.
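The construction at the start of 13.28 is easy to try on a finite topology. The sketch below assumes a particular three-point space whose open sets form a chain; that space and the names `OPEN`, `implies`, and `pseudocomplement` are illustrative choices, not taken from the text.

```python
from itertools import chain

# Assumed toy topology on the points {0, 1, 2}: the open sets form a chain.
POINTS = frozenset({0, 1, 2})
OPEN = [frozenset(), frozenset({0}), frozenset({0, 1}), POINTS]

def implies(s, t):
    """Relative pseudocomplement s => t: the union of all open G with s & G <= t."""
    candidates = [g for g in OPEN if s & g <= t]
    return frozenset(chain.from_iterable(candidates))

def pseudocomplement(s):
    """Pseudocomplement of s, i.e. s => (empty set)."""
    return implies(s, frozenset())

def adjunction_holds():
    """Property 13.25.a: x <= (a => b) if and only if a & x <= b."""
    return all(
        (x <= implies(a, b)) == (a & x <= b)
        for x in OPEN for a in OPEN for b in OPEN
    )
```

For the open set `{0}` the pseudocomplement is empty, so the join of `{0}` with its pseudocomplement is not the whole space. This lattice of open sets is therefore a Heyting algebra that is not Boolean.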
13.29. Which Heyting algebras are Boolean? Suppose $X$ is a Heyting algebra. Then the following conditions are equivalent:
(A) $a \vee (\complement a) = 1$ for all $a \in X$.
(B) $\complement\complement a \preccurlyeq a$ for all $a \in X$.
(C) $(a \Rightarrow b) \preccurlyeq ((\complement a) \vee b)$ for all $a, b \in X$.
(D) $((\complement a) \Rightarrow (\complement b)) \preccurlyeq (b \Rightarrow a)$ for all $a, b \in X$.
(E) $((\complement a) \Rightarrow b) \preccurlyeq ((\complement b) \Rightarrow a)$ for all $a, b \in X$.
(F) $X$ is a Boolean algebra.

Proof. If $X$ is a Boolean algebra, then it is easy to verify that all the other conditions listed above are satisfied. Conversely: If (A) holds, then $\complement$ is a complementation operation (as in 13.1), not just pseudocomplementation; since any Heyting algebra is a distributive lattice, (F) follows. For (B) implies (A), note that $\complement\complement(a \vee (\complement a)) = 1$ and $a \preccurlyeq \complement\complement a$ in any Heyting algebra. For (C) implies (A), let $b = a$. For (D) implies (B), let $b = \complement\complement a$ and simplify. For (E) implies (B), let $b = \complement a$.
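Conditions (A) through (E) can be tested mechanically on small Heyting algebras. The sketch below assumes finite chains $0 < 1 < \cdots < n-1$, where the meet is `min`, the join is `max`, and the Heyting implication is the top element when $a \preccurlyeq b$ and $b$ otherwise; the function names are illustrative.

```python
def make_chain(n):
    """Heyting operations on the chain 0 < 1 < ... < n-1 (an assumed example)."""
    top = n - 1
    elems = range(n)

    def imp(a, b):
        # max{x : min(a, x) <= b} on a chain
        return top if a <= b else b

    def neg(a):
        # pseudocomplement: a => 0
        return imp(a, 0)

    return elems, top, imp, neg

def conditions(n):
    """Evaluate conditions (A)-(E) of 13.29 on the n-element chain."""
    elems, top, imp, neg = make_chain(n)
    pairs = [(a, b) for a in elems for b in elems]
    return {
        "A": all(max(a, neg(a)) == top for a in elems),
        "B": all(neg(neg(a)) <= a for a in elems),
        "C": all(imp(a, b) <= max(neg(a), b) for a, b in pairs),
        "D": all(imp(neg(a), neg(b)) <= imp(b, a) for a, b in pairs),
        "E": all(imp(neg(a), b) <= imp(neg(b), a) for a, b in pairs),
    }
```

On the two-element chain, which is a Boolean algebra, all five conditions hold. On the three-element chain all five fail, with the middle element as a witness each time, consistent with the equivalence stated above.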
Chapter 14: Logic and Intangibles

14.1. Introduction. Contrary to the assumption of many nonmathematicians, the study of formal logic does not make us more "logical" in the usual sense of that word - i.e., the study of logic does not make us more precise or unemotional. Formal logic is not merely a more accurate or more detailed version of ordinary mathematics. Rather, it is a whole other subject, with its own methods and its own theorems, which are of a rather different nature than the theorems of other branches of mathematics. Because many of logic's most important applications are in set theory, those two subjects are often presented together, and they may be confused in the minds of some beginners. However, logic and set theory are really different subjects. It is possible to do some interesting things in set theory without any formal logic (see Chapter 6). Conversely, logic can be applied to other theories besides set theory - e.g., real analysis, ring theory, etc. We have already seen examples of this in 8.51 and 13.15.

14.2. Chapter overview. This chapter provides a brief introduction to formal logic. Our presentation is mostly conventional, but we follow the unconventional approach of Rasiowa and Sikorski [1963] in our definition of "free variables" and "bound variables"; this is discussed in 14.20. We shall cover the basics of logic, up to and including a proof of the Completeness and Compactness Principles, which show that the syntactic and semantic views of consistency are equivalent. An easy corollary of the Compactness Principle is the existence of nonstandard models of arithmetic and analysis in 14.63; this is one way to introduce the subject of nonstandard analysis. The Completeness and Compactness Principles are also interesting to us because they are equivalent to the Ultrafilter Principle, a weak form of Choice studied extensively in other chapters of this book.
After the Completeness and Compactness Principles we shall state a few more advanced results, with references in lieu of proofs. Our main goal is to develop some understanding of the notion of "consistency," so that we can understand Shelah's alternative to conventional set theory,

Con(ZF) ⇒ Con(ZF + DC + BP).

This result was proved by Shelah [1984], but the proof is too long and too advanced to be included in this book. Our goal is only to understand the statement of Shelah's result and some of its applications. At the end of this chapter, we shall use Shelah's consistency result to explain intangibles - i.e., objects that "exist" in conventional mathematics but that lack "examples." For a first reading, some may choose to skip ahead to the end of this chapter and just read the summary of consistency results and the explanation of intangibles; the rest of this chapter will not be needed elsewhere in the book.

SOME INFORMAL EXAMPLES OF MODELS

14.3. In logic we separate a language from its meanings. An interpretation of a language is a way of assigning meanings to its symbols. Formulas are not true or false in any absolute sense; they are only true or false when we give a particular interpretation to the language. For instance, the axioms of ZF set theory are usually regarded as true, but they become false if we interpret "set" and "member" in the peculiar fashion indicated in 1.48. In the view of some logicians - especially, formalists - mathematical objects such as sets do not really "exist"; all that really "exists" is the language we use to discuss sets and the reasoning we can perform in that language. When we change the language or its interpretation, then the nature of sets or other mathematical objects changes. Bertrand Russell took such a viewpoint when he said: "Mathematics is the only science where one never knows what one is talking about nor whether what is said is true." If we cannot establish absolute truth, the next best thing is syntactic consistency - i.e., knowing that our axioms do not lead by logical deduction to a contradiction. By the Completeness Theorem (proved in 14.57), syntactic consistency is equivalent to semantic consistency - i.e., knowing that our collection of axioms has at least one model. A model of a collection of formulas is an interpretation that makes those formulas true; it is a sort of "example" for that collection of formulas. An interpretation of a language may be highly unconventional, unwieldy, and not at all intuitive. It may be constructed just for a brief, one-time use - e.g., to prove the consistency of a given collection of axioms.
After a model has been used to establish consistency of some axioms, in some cases we may choose to discard the model and think solely in terms of the axioms, because they are conceptually simpler. (A good example of this is in 14.4.) Application-oriented mathematicians may choose to skip the modeling step altogether, and begin with the axioms, trusting that other mathematicians have already justified those axioms with a model. All of the terms introduced above - interpretation, consistent, model, etc. - will be given more specialized and precise meanings later in this chapter. But first, to introduce the basic ideas, in the present subchapter we shall present some informal examples of models, where "model" has the broad and slightly imprecise meaning indicated above. Most of these examples are mere sketches, intended only to indicate the flavor of the ideas. The omitted details are considerable, and are not intended as exercises; the reader who wishes to fill in the details should consult the references in the bibliography.

14.4. Models of the reals. The axioms for an ordered field were given in 10.7; those plus Dedekind completeness make up the axioms for the real number system. Many analysis books simply "define" $\mathbb{R}$ to be a Dedekind complete, ordered field. But how do we know that that list of axioms makes sense? We must show (or trust other mathematicians who say they have shown) that

(i) there is such a field, and
(ii) any two such fields are isomorphic.

Proof of (ii) is given in 10.15.e. Proofs of (i) by different constructions in terms of the rationals are given in 10.15.d, 10.45, and 19.33.c. Any one of these constructions is a model of the axioms of $\mathbb{R}$, and it therefore demonstrates the consistency of the axioms of $\mathbb{R}$. However, the constructions - involving Dedekind cuts, equivalence classes of Cauchy sequences, etc. - are rather complicated and generally have little to do with our intended applications of the reals. The axioms for $\mathbb{R}$ are usually much simpler conceptually and more convenient for applications. Thus, after we have demonstrated consistency we may discard the model and think of the real numbers in terms of their axioms: The real number system is a complete ordered field.
14.5. A non-Euclidean geometry modeled in Euclidean geometry. During the 18th and 19th centuries, mathematicians became concerned about Euclid's Parallel Postulate, which says, in one formulation: through a given point $p$ not on a given line $L$, there passes exactly one line that lies in the same plane as $L$ but does not meet $L$. The other postulates of geometry are concerned with objects of finite size, such as triangles. In contrast, the Parallel Postulate is concerned with behavior of points that are very far away - perhaps infinitely far away - and so the Parallel Postulate is less self-evident. Some mathematicians attempted to remove any doubts by proving this axiom as a consequence of the other axioms of Euclidean geometry. In these attempts, one approach was to replace the Parallel Postulate with some sort of alternative that negates the Parallel Postulate, and then try to derive a contradiction. Some alternatives to the Parallel Postulate did indeed lead to clear contradictions, but other alternatives merely led to very peculiar conclusions. The peculiar conclusions made up new, non-Euclidean geometries. For instance, a paper of Riemann (1854) developed a geometry, now called double elliptic geometry or Riemannian geometry, in which any two lines meet in two points, and the sum of the angles of a triangle is greater than 180 degrees. At first these geometries were not seen to have anything to do with the "real" world; they were merely viewed as imaginary mathematical constructs. However, in 1868 Eugenio Beltrami observed that the axioms of two-dimensional double elliptic geometry are satisfied by the surface of an ordinary sphere of Euclidean geometry, if we interpret "line" to mean "great circle" (i.e., a circle whose diameter is the diameter of the sphere).
Therefore, if a contradiction arises in our reasoning about double elliptic geometry, then by the same argument with a different interpretation of the words we can obtain a contradiction in Euclidean geometry. Thus, we have a model that establishes relative consistency: If the axioms of Euclidean geometry are noncontradictory and the
theorems we have proved about the sphere in Euclidean geometry are correct, then the axioms of double elliptic geometry are also consistent. Even if we find these bizarre geometries distasteful and prefer to concern ourselves only with Euclidean geometry, Beltrami's reasoning leads to this important conclusion: The Parallel Postulate of Euclidean geometry is not implied by the other axioms of Euclidean geometry. As Hirsch [1995] has put it so aptly, before the 19th century Euclidean geometry was "not merely an axiomatic study, but our best scientific description of physical space." In retrospect, we can now see that double elliptic geometry is every bit as "realistic" as Euclidean geometry. Ants on a very large sphere might think they were on a plane, if they thought at all. Indeed, many humans thought that way until Columbus sailed. In much the same fashion, our three-dimensional space may be very slightly curved in a fourth direction, but the curvature may be so small that we have not yet detected it. Perhaps a space ship that travels far enough in a seemingly straight line will eventually return to its home planet. We can only be certain of what is near at hand. For a more detailed discussion of the history of these ideas, see Kline [1980]. A similar approach to the Parallel Postulate, using the interior of a circle in the Euclidean plane, was developed by Cayley; it is discussed by Young [1911/1955].
14.6. Specifying a universe. Here is one way to construct models of set theory: Let $M$ be some given class of sets. Hereafter, interpret the term "set" to mean "member of $M$." Thus, the phrases "for each set" and "for some set" will be interpreted as "for each member of $M$" and "for some member of $M$." Then statements in the language of set theory can be interpreted in terms of $M$. For instance, the definition of equality of sets (given in 1.47) says that if $A$ and $B$ are two sets, then

$A = B$ holds if and only if for each set $T$, we have $T \in A \leftrightarrow T \in B$.

This condition is satisfied when "set" and "member" have their usual meanings, but not when those terms have certain unconventional meanings, as in 1.48. Is it satisfied in the model $M$? Yes, for some choices of $M$; no, for others. In the model $M$, the definition of equality has this interpretation:

Let $A, B \in M$; then $A = B$ if and only if for each $T \in M$, we have $T \in A \leftrightarrow T \in B$.

It is easy to see that "equality" of sets in the model $M$ coincides with the restriction of ordinary equality to the collection $M$ if and only if $M$ has this property:

(!) whenever $A, B \in M$ and $M \cap A = M \cap B$, then $A = B$.

Easy exercise (from Doets [1983]). If $M$ is a transitive set (defined as in 5.42), then $M$ satisfies the condition (!) above. A converse to this exercise is Mostowski's Collapsing Lemma, which states that any model that satisfies (!) is isomorphic to a transitive model. We shall not prove this lemma; it can be found in books on axiomatic set theory.
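Condition (!) can be tried out on hereditarily finite sets. The following sketch represents sets as nested `frozenset` values; the two candidate universes and the helper names are assumptions made only for illustration.

```python
EMPTY = frozenset()
ONE = frozenset({EMPTY})          # the set {∅}
TWO = frozenset({EMPTY, ONE})     # the set {∅, {∅}}

def satisfies_bang(M):
    """Condition (!): members A, B of M with M ∩ A == M ∩ B must be equal."""
    return all(A == B for A in M for B in M if M & A == M & B)

def is_transitive(M):
    """M is transitive if every member of a member of M is itself a member of M."""
    return all(A <= M for A in M)

# A transitive universe: each member's elements also belong to the universe.
transitive_M = frozenset({EMPTY, ONE, TWO})

# A non-transitive universe: its two members are different sets, but none of
# their elements lie in the universe, so from inside both look empty.
bad_M = frozenset({ONE, frozenset({TWO})})
```

In `bad_M` the two members cannot be told apart by members of the universe, so equality as interpreted in the model no longer agrees with ordinary equality. In `transitive_M` the condition holds, as the exercise predicts.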
If $M$ includes some, but not all, members of von Neumann's universe $V$ (described in 5.53), then sets $A, B \in M$ may have different properties when viewed in $M$ or in $V$. For instance, since $M$ has fewer sets than $V$, it also has fewer functions and fewer bijective functions. It is quite possible that there exists a bijective function in $V$ between $A$ and $B$, but there does not exist such a function in $M$. Thus $\operatorname{card}(A) = \operatorname{card}(B)$ in $V$, but $\operatorname{card}(A) \neq \operatorname{card}(B)$ in $M$. When we go from the smaller universe $M$ to the larger universe $V$, some distinct cardinalities coalesce; this phenomenon is called cardinal collapse.
14.7. Gödel's universe. A subclass of $V$ was used for an important model of set theory by Gödel around 1939. He interpreted "set" to mean "member of $L$," using the universe $L$ of sets that are "constructible relative to the ordinals," as described in 5.54. That universe is (perhaps) smaller than the usual universe $V$. With this interpretation, he was able to show that the axioms of ZF set theory plus AC (the Axiom of Choice) plus GCH (the Generalized Continuum Hypothesis) are all true. He constructed his model $L$ inside the conventional universe $V$, and his use of $V$ assumed the consistency of the ZF axioms. Thus, he concluded that

if ZF is consistent, then ZF + AC + GCH is consistent.

Though Gödel's proof involved constructible sets, this conclusion does not mention constructible sets, and is not restricted to any particular meaning for "sets." (In 1963 Cohen showed, by other methods, that ¬CH is also consistent with set theory; see 14.8. Thus the Continuum Hypothesis and the Generalized Continuum Hypothesis are independent of the axioms of conventional set theory.) Gödel's construction also shows that if ZF is consistent, then so is ZF + AC + GCH + ($V = L$). The axiom $V = L$ is called the Axiom of Constructibility, which says that all sets are constructible relative to the ordinals. Thus, we cannot be sure that Gödel's constructible universe $L$ really is smaller than von Neumann's universe $V$; we do not obtain a contradiction if we assume that those two universes are the same. On the other hand, it has been proved by other methods that $V \neq L$ is also consistent with set theory - see for instance Bell [1985] - so the Constructibility Axiom is independent of the axioms of conventional set theory.
14.8. Modeling the reals with random variables. The Continuum Hypothesis (CH) can be formulated as a statement about subsets of the reals: It says that no set $S$ satisfies $\operatorname{card}(\mathbb{N}) < \operatorname{card}(S) < \operatorname{card}(\mathbb{R})$. By now the literature contains several different variants of Cohen's proof that CH is independent of ZF + AC. One of the simplest to outline is the following: Let $(\Omega, \Sigma, \mu)$ be a probability space, where the set $\Omega$ has a very high cardinality. Let $\mathcal{R}$ be the space of all equivalence classes of real-valued random variables. For a suitable choice of $(\Omega, \Sigma, \mu)$, it is possible to show that

(i) For a suitably formulated axiomatization of $\mathbb{R}$, every axiom of $\mathbb{R}$ is satisfied with probability 1 by $\mathcal{R}$.
(ii) If statements that are true with probability 1 are used to generate new statements, via the rules of logic, then the new statements are also true with probability 1.
(iii) The Continuum Hypothesis, interpreted as a statement about $\mathcal{R}$, is not true with probability 1.

Here $\mathbb{R}$ is modeled by $\mathcal{R}$, and "truth" is replaced by "truth with probability 1." The axioms of set theory and of $\mathbb{R}$, though interpreted in a peculiar fashion, remain unchanged in superficial appearance, and the rules of logic remain unchanged insofar as they deal with strings of symbols. Therefore, regardless of what kind of "truth" and "sets" and "real numbers" we use, the axioms of sets and of $\mathbb{R}$ cannot be used, via the rules of logic, to deduce the Continuum Hypothesis. The explanation sketched above is only the merest outline; the omitted details are numerous and lengthy. Some of them are given by Manin [1977].
14.9. A topos model for constructivists. We mentioned in 6.6 that the Trichotomy Law for real numbers is not constructively provable. We now sketch part of a demonstration of that unprovability, due to Scedrov. Our treatment is modified from Bridges and Richman [1987]. By a "Scedrov-real number" we shall mean a continuous function from $[0, 1]$ into $\mathbb{R}$. Let $f$ be such a function, let $P$ be a statement about a real number, and let $x \in [0, 1]$; then we say $P$ is true for $f$ at $x$ if $P$ is a true statement about the real number $f(y)$ for all $y$ in some neighborhood of $x$ in $[0, 1]$. The collection of all such points $x$ is the truth value of $P$ for $f$; it is an open set. A statement is true if its truth value is the entire interval $[0, 1]$. It can be demonstrated (though we shall omit the details here) that the Scedrov-real numbers are a model of the real numbers with constructivist rules of inference. Now let $f(x) = x$ and $g(x) = 0$, for all $x \in [0, 1]$. Then the truth value of the statement $f \le g$ is the empty set (since the interior of a singleton is empty), while the truth value of $f > g$ is the interval $(0, 1]$. Hence the truth value of the statement "$f \le g$ or $f > g$" is the interval $(0, 1]$, and thus that statement is not true in this model.

14.10. A finite model. The following example (from Nagel and Newman [1958]) is a bit contrived, but it illustrates a point well. We consider a mathematical system consisting of two classes of objects, $K$ and $L$, which must satisfy these axioms:
1. Any two members of $K$ are contained in just one member of $L$.
2. No member of $K$ is contained in more than two members of $L$.
3. The members of $K$ are not all contained in a single member of $L$.
4. Any two members of $L$ intersect in just one member of $K$.
5. No member of $L$ contains more than two members of $K$.
The consistency of this axiom system can be established by the following model:

(*) Let $T$ be a triangle. Let $L$ be the set of edges of $T$, and let $K$ be the set of vertices of $T$.
We can verify that this model satisfies the preceding five axioms; thus they cannot be contradictory. We emphasize that those five axioms might also have other interpretations;
we are not restricted to (*) as the only possible interpretation. However, we do have at least one model, given in (*). This is sufficient to prove that the five axioms by themselves cannot lead to a contradiction. We have used Euclidean geometry to make (*) easy to visualize, but perhaps we do not feel certain about the reliability of Euclidean geometry. The use of geometry is not essential for our present axiom system. We can reformulate (*) without mentioning triangles:

(**) Let $a, b, c$ be distinct objects. Let $K = \{a, b, c\}$ and $L = \{\{a, b\}, \{b, c\}, \{c, a\}\}$.
The model (**) has only finitely many parts; thus, it leaves very little room for doubt. The importance of such models is discussed in 14.70.
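Because the model (**) is finite, the five axioms of 14.10 can be checked exhaustively. The sketch below is one possible encoding, reading "contained in" as set membership; the function name `axioms` and the four-point comparison structure `K4`, `L4` are illustrative assumptions.

```python
from itertools import combinations

def axioms(K, L):
    """Evaluate the five axioms of 14.10 for a finite K and a list L of subsets of K."""
    pts = sorted(K)
    return {
        # 1. Any two members of K are contained in just one member of L.
        1: all(sum(1 for l in L if {p, q} <= l) == 1
               for p, q in combinations(pts, 2)),
        # 2. No member of K is contained in more than two members of L.
        2: all(sum(1 for l in L if p in l) <= 2 for p in pts),
        # 3. The members of K are not all contained in a single member of L.
        3: not any(K <= l for l in L),
        # 4. Any two members of L intersect in just one member of K.
        4: all(len(l & m & K) == 1 for l, m in combinations(L, 2)),
        # 5. No member of L contains more than two members of K.
        5: all(len(l & K) <= 2 for l in L),
    }

# The model (**): three distinct objects and their three two-element subsets.
K = frozenset({"a", "b", "c"})
L = [frozenset(pair) for pair in combinations(sorted(K), 2)]

# A structure that is not a model: four objects with all two-element subsets.
K4 = frozenset({"a", "b", "c", "d"})
L4 = [frozenset(pair) for pair in combinations(sorted(K4), 2)]
```

All five checks succeed for `K`, `L`. For `K4`, `L4` axioms 2 and 4 fail, since each object lies in three pairs and some pairs are disjoint.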
LANGUAGES AND TRUTHS

14.11. A language is a collection $\mathcal{L}$ of symbols, together with rules of grammar that govern the ways in which those symbols may be put together into strings of symbols called "formulas." For instance, one of the most important languages we shall study is the language of set theory. This language includes symbols such as $\in$, $\subseteq$, $\cap$, etc. Its grammatical rules tell us that $A \cap B \in C$ is a formula, but

$A \in\in B$, $\quad A \cap =$, $\quad A \cap \cup = B$

are not formulas. In formal logic we separate a language from its meanings. For instance, in a formal language, "1 + 2" and "3" are different, unrelated strings of meaningless symbols. When we interpret that language in its usual fashion, then the strings "1 + 2" and "3" represent the same object. Although ultimately we shall be concerned with attaching meanings to the symbols in the language $\mathcal{L}$, at the outset it is best to disregard such meanings - even the meanings of familiar symbols such as $\in$, $\subseteq$, $\cap$, $+$, $=$. Conceptually, a good place to start is the monoid of meaningless strings of symbols, described in 8.4.g.
14.12. Formal versus informal systems. Ordinary mathematicians (i.e., those other than logicians) may study sentences, theorems, and proofs about (for instance) rings or differential equations. However, logicians study sentences, theorems, and proofs about sentences, theorems, and proofs. The ordinary mathematician uses a language that describes rings or differential equations; the logician uses a language that describes languages. The logician is related to the mathematician much as a linguist is related to a novelist. As Rosser [1939] pointed out, in works of logic we may commonly identify at least two distinct systems of reasoning:
a. The inner system of reasoning is the subject of the work. It is a sort of microcosm
of reasoning. It may be less powerful than the reasoning that we use in "ordinary" mathematics, but it is delimited more precisely. Just as a theorem about rings must have precise hypotheses ( "Let G be a commutative ring . . . " ) , so too a theorem in logic
351
Languages and Truths
must have precise hypotheses ( "Let £., be a language with infinitely many free variable symbols . . . " ) . The inner language is also called the object language. Formulas such as ('lit; P(�, x)) U Q (x, J(y, z ) ) will occur in the object languages studied in this chapter. b. The outer system of reasoning is ordinary reasoning. It is conducted in the language of ordinary discourse, also sometimes known as the metalanguage - a natural lan guage such as English or Japanese, modified slightly to suit the specialized needs of mathematicians. In logic, as in algebra or analysis, the outer language usually does not have to be formal - we can communicate effectively without first discussing in detail how we will communicate. When the inner system is mathematics, the outer system is often called "meta mathematics," which translates roughly to "beyond mathematics" or "above mathe matics" or "about mathematics." For instance, the Soundness Principle 14.55(iv) and the Godel-Mal ' cev Completeness Principle in 14.57 are results about formal systems; thus they are "metatheorems" which reside in the outer system. The inner and outer systems do not necessarily have the same truths; one of these systems may be stronger than the other. For instance, we must assume ZF plus the Ultrafilter Principle (UF) in our outer system when we want to prove the Godel-Mal'cev Completeness Principle. That principle can be applied to inner systems that are weaker (such as ZF) or stronger (ZF + AC) or perhaps not even directly comparable. Here is another example: Let "Con" denote consistency. Then "Con(ZF)" and "Con(ZF + AC + GCH)" are two statements about the consistency of certain axiom systems in formal set theory. Thus they are metamathematical statements, where the mathematics in this case is set theory. Then "Con(ZF) =? Con(ZF + AC + GCH)" is a metatheorem - or, if we prefer, a metametatheorem, as discussed below. c. 
Beginners may find it helpful to view the Ultrafilter Principle and the Completeness Principle as "true;" then they will only need to deal with the two levels of reasoning described above. However, more advanced readers can consider a third level: Through out many chapters of this book we study equivalents of AC and of UF, viewing them as principles which "might be true" or "might be false," depending on what kind of universe we decide to live in. The implication (UF8) =? (UF l l ) , proved in 14.57, is a metametatheorem it is a theorem about metatheorems such as the Completeness Principle. It is even more "outer" than the "outer system" - but to avoid confusion, hereafter we shall not discuss such results in this fashion. ·
There is some resemblance between the logician's inner and outer systems - both systems include "sentences," "implications," "theorems," and "proofs." This may cause some confusion for beginners. No such confusion arises in other subjects, such as ring theory or differential equations - e.g., a theorem about rings generally does not look like a ring.
14.13. The beginner is cautioned to carefully maintain in his or her mind the distinction between inner and outer systems. Throughout this chapter we shall use notations that support that distinction. First, however, we shall give an example of the kind of difficulties that arise when the distinction is not maintained carefully: Berry's Paradox. Call a positive integer succinct if it can be described in
sentences of the English language using less than 1000 characters (where a character means a letter, a space, or a punctuation symbol). There are only finitely many different characters, and so it is clear that there are only finitely many succinct numbers. Let $n_0$ be the first positive integer that is not succinct. We have described $n_0$ in this paragraph, which is shorter than 1000 characters. So $n_0$ is succinct after all, a contradiction.
Explanation of the flaw in the reasoning. The first sentence suggests that we are to use English for the formal language of our inner system. However, English is a very fluid language, which changes even while it is being used. Everyday, nonmathematical English is permitted to talk about itself, and this kind of self-referencing can lead to paradoxes, but they are not taken seriously because, after all, English is not mathematics. In attempting to make mathematically precise sense out of Berry's Paradox, we must use some frozen, unchanging "version" of English for the formal language of our inner system. We assume that some particular version of English has been selected and is understood by all parties participating in this endeavor. The term "English" hereafter is understood to refer to this frozen, formal language. This object language cannot discuss itself, and thus cannot discuss what is a "sentence of the English language." The notion of a "sentence" (as it is used here) is a metamathematical concept - i.e., a concept about the language, rather than a concept expressed in the language. Now we can give a more precise definition of a "succinct" positive integer: It is a positive integer that can be described in the object language in fewer than 1000 characters. Likewise, $n_0$ is the first positive integer that cannot be described in the object language in fewer than 1000 characters. These definitions are mathematically precise, quite brief, and not at all fallacious, but they are formulated in the metalanguage, not in the object language. Our definition of $n_0$ is formulated in the metalanguage; we have not given a definition of $n_0$ in the object language. We certainly have not given a definition of $n_0$ that is fewer than 1000 characters in the object language. We cannot conclude that $n_0$ is succinct, so no contradiction is reached.
14.14. When $\mathcal{F}$ and $\mathcal{G}$ are formulas in our object language, then $\mathcal{F} \to \mathcal{G}$ is also a formula in that language. It is most often read as "$\mathcal{F}$ implies $\mathcal{G}$." We now consider two types of implications in our metalanguage, which will be investigated in greater detail later in this chapter. Let $\Sigma$ be a set of formulas, and let $\mathcal{F}$ be a formula.
a. A derivation of $\mathcal{F}$ from $\Sigma$ is a finite sequence of formulas $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n$ such that $\mathcal{E}_n = \mathcal{F}$ and each $\mathcal{E}_j$ is either (i) an axiom, (ii) a member of $\Sigma$, or (iii) obtained from previous members of the sequence by rules of inference. When such a sequence exists, then we say $\mathcal{F}$ is a syntactic consequence of $\Sigma$; this is abbreviated as $\Sigma \vdash \mathcal{F}$. When $\Sigma$ is the empty set, the derivation is called a proof, and the consequence $\mathcal{F}$ is called a syntactic theorem; we may write $\varnothing \vdash \mathcal{F}$ or, more briefly, $\vdash \mathcal{F}$. Observe that this notation does not reflect the choice of the axioms, which are nevertheless available for use in the derivation. The statement "$\Sigma \vdash \mathcal{F}$" is equivalent to this statement, which may be preferred by some readers: "any set of axioms that makes all the members of $\Sigma$ into syntactic theorems, also makes $\mathcal{F}$ into a syntactic theorem."
A set of formulas $\Sigma$ is syntactically inconsistent if some formula and its negation are both syntactic consequences of $\Sigma$; otherwise the set of formulas is syntactically consistent. The study of syntactic consequences is sometimes called proof theory.
b. We write $\Sigma \models \mathcal{F}$ to say that $\mathcal{F}$ is a semantic consequence of $\Sigma$. This means that every model of $\Sigma$ is also a model of $\mathcal{F}$ - i.e., that every interpretation of the language that makes $\Sigma$ true (and makes all of our unmentioned axioms true), also makes $\mathcal{F}$ true. The axioms, if any, are understood from the context and are not mentioned in this notation. When $\Sigma$ is the empty set, the consequence $\mathcal{F}$ is called a semantic theorem. We may write $\varnothing \models \mathcal{F}$ or, more briefly, $\models \mathcal{F}$. This means that any interpretation that makes the axioms true also makes $\mathcal{F}$ true. A set of formulas $\Sigma$ is semantically inconsistent if it has no models or semantically consistent if it has at least one model. The study of semantic consequences is sometimes called model theory.
On the surface, proof theory and model theory seem rather different. A fundamental and nontrivial result of first-order logic is that proof theory and model theory are equivalent, in this sense: they actually yield the same notions of "theorem" and "consistency," provided that our system of reasoning is sound. This equivalence will be established in 14.57 and 14.59; thereafter, we can simply refer to a "theorem" and to "consistency." However, to establish the equivalence, we must first develop the syntactic and semantic views separately.
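The definition of a derivation in part (a) can be illustrated by a small checker. This is only a sketch under stated assumptions: formulas are encoded as Python strings (atoms) or tuples `("->", A, B)`, and modus ponens is taken as the only rule of inference. Both the encoding and the function names are hypothetical choices made for illustration; the rules of inference actually used in this chapter are the ones listed in 14.26.

```python
def follows_by_modus_ponens(formula, earlier):
    """True if earlier entries include some A together with ("->", A, formula)."""
    return any(("->", a, formula) in earlier for a in earlier)


def is_derivation(sequence, axioms, hypotheses):
    """Check that each entry is (i) an axiom, (ii) a hypothesis, or
    (iii) obtained from previous entries by the assumed rule of inference."""
    earlier = []
    for formula in sequence:
        justified = (
            formula in axioms
            or formula in hypotheses
            or follows_by_modus_ponens(formula, earlier)
        )
        if not justified:
            return False
        earlier.append(formula)
    return len(sequence) > 0


# Hypotheses Sigma = {P, P -> Q, Q -> R}; the sequence below derives R from Sigma.
sigma = {"P", ("->", "P", "Q"), ("->", "Q", "R")}
steps = ["P", ("->", "P", "Q"), "Q", ("->", "Q", "R"), "R"]
print(is_derivation(steps, axioms=set(), hypotheses=sigma))
```

The checker works purely on the shapes of the strings and never consults any meaning for them, which is the sense in which a syntactic consequence belongs to proof theory rather than model theory.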
A few cautionary remarks. (i) Terminology varies in the literature. Some mathematicians prefer either the viewpoint of proof theory or the viewpoint of model theory, and so they define the terms "theorem," "valid formula," "true formula," or "tautology" to be synonymous with what we have called a "syntactic theorem" or with what we have called a "semantic theorem." Which term is applied to which type of theorem varies from one paper or book to another. That may confuse beginners, but it does not affect the ultimate results since the two kinds of theorems will eventually be shown equivalent.
(ii) The symbol $\models$ has another meaning, which will not be used in this chapter. We mention it to prevent confusion when the reader runs across it in some other book. If $M$ is some particular model in which a formula $\mathcal{F}$ is true, some mathematicians may write $M \models \mathcal{F}$. This is read as: $M$ is a model of $\mathcal{F}$, or $M$ satisfies $\mathcal{F}$, or $\mathcal{F}$ holds in $M$.
(iii) The symbols $\vdash$ (syntactic implication) and $\models$ (semantic implication) should not be confused with these similar symbols: $\top$ (truth), $\bot$ (falsity), $\Vdash$ (forcing), which are used in some books (but not this one). Forcing is discussed briefly in 14.53.
(iv) Although $\mathcal{F} \vdash \mathcal{G}$ and $\mathcal{F} \models \mathcal{G}$ will ultimately be proved equivalent to each other, they are not equivalent to $\mathcal{F} \to \mathcal{G}$. In fact, no direct comparison is possible, since $\mathcal{F} \to \mathcal{G}$ is a statement in the object language. We can modify that statement slightly if we wish to make comparisons: Each of the four expressions
$$\mathcal{F} \vdash \mathcal{G}, \qquad \mathcal{F} \models \mathcal{G}, \qquad \vdash (\mathcal{F} \to \mathcal{G}), \qquad \models (\mathcal{F} \to \mathcal{G})$$
is a statement in the metalanguage. The first two are equivalent to each other, and the last two are equivalent to each other. In propositional logic, all four statements are equivalent to each other. In predicate logic, the last two statements are slightly stronger than the first two statements; this will be discussed further in 14.38 and thereafter.
14.15. The kind of logic used most often in the literature is first-order logic; it is also known as predicate logic or the predicate calculus. To be precise, we may subdivide a theory into these ingredients:
• A first-order language includes an alphabet of symbols - punctuation symbols, symbols for individuals, symbols for operations, and quantifiers - and grammatical rules for forming those symbols into formulas. All of the symbols are understood as meaningless characters of a meaningless alphabet; they will only take on meaning when we consider an interpretation or quasi-interpretation (as in 14.47). The specification of the symbols and rules is understood to include a specification of the arities of the operation symbols, as explained in 14.18 below. First-order language is discussed in further detail in the next subchapter.
• A first-order logic includes the language, plus rules of inference and logical axioms. It may also be viewed as including the resulting theorems - i.e., the syntactic and semantic consequences of those rules and axioms. The rules of inference and logical axioms are discussed in the subchapter which begins with 14.25, and the resulting theorems are discussed in the subchapters after that.
• A first-order theory includes the logic, plus extra-logical axioms. It may also be viewed as including the resulting theorems. Some examples of extra-logical axioms are given in 14.27.
INGREDIENTS OF FIRST-ORDER LANGUAGE
We shall now list the ingredients. Some readers may wish to glance ahead to 14.24, where we consider propositional logic, a special case that has fewer ingredients.
14.16. Punctuation symbols. These are parentheses for grouping - i.e., to avoid ambiguity - and commas for delimiting items in a list. It is possible to give precise rules for the use of parentheses and commas, but we shall omit the details.
14.17. Symbols for individuals. (These are omitted in propositional logic; see 14.24.) In predicate logic, there are three types of symbols for individuals:
• individual constant symbols, denoted in the discussions below by $a, b, c, \ldots$ or $a, a', a'', \ldots$ (Actually, constant symbols may be dispensed with as a separate class of symbols, since they may be viewed as function symbols of arity 0; see 14.18 below.)
• individual free variable symbols, denoted in the discussions below by $x, y, z, \ldots$ or $v, v', v'', \ldots$
• individual bound variable symbols, denoted in the discussions below by $\xi, \eta, \zeta, \ldots$ or $\xi, \xi', \xi'', \ldots$
In most texts on logic, the free and bound variables are taken from the same set of symbols; in some texts either the constant symbols or the variable symbols are omitted altogether. However, we prefer to use three separate sets of symbols; this is discussed further in 14.20. The sets of constants and variables are countably infinite in most applications, but these sets could be larger or smaller. Practical, everyday mathematics uses only countably many symbols - for instance, although there are uncountably many real numbers, we have no way of actually writing down distinct representations for most of those numbers. It is not even humanly possible to write down a countably infinite collection of symbols; that would require more time than any mere mortal has. Nevertheless, for some theoretical purposes it can be useful to conceptualize and investigate a language with uncountably many symbols - e.g., we could say "let $\mathcal{L}$ be a language that includes a constant symbol $c_r$ for each real number $r$." We can talk about the $c_r$'s in the abstract, even if we can't write them all down concretely.
Hereafter, we shall assume that the set of free variables and the set of bound variables are both empty (as in propositional logic) or both infinite (as in ordinary mathematics). The case of finitely many variables turns out to be technically different and difficult, and will not be considered in this chapter. That case is manageable for elementary results but becomes difficult starting in 14.41; for simplicity of exposition we shall exclude that case from the outset. One difficulty with that case can be explained roughly as follows: Reasoning in formal logic (or in other parts of mathematics, for that matter) involves substitutions, usually replacing all occurrences of one free variable with copies of some term whose free variables are not already in use. A single computation may involve many substitutions and thus many free variables. It will involve only a finite number of free variables, but in general we do not know in advance how large or small that finite number will be - the number may vary from one computation to another, and in general there is no finite upper bound for the number of variables needed for a computation. If our language has only finitely
many free variables - i.e., if some particular finite number is specified in advance - then we may run out of variables before some computations are completed. On the other hand, if we have infinitely many free variables, we can complete any computation and still have plenty of free variables left over.
14.18. Symbols for operations. Each operation symbol has an arity, or rank - i.e., an associated nonnegative integer that specifies how many arguments each of these symbols should be followed by. For instance, if $f$ has arity 4, then we may form expressions such as $f(w, x, y, z)$. The precise rules for forming such expressions are given in 14.22 and 14.23. It is convenient to write $x + y$ instead of $+(x, y)$. The abstract discussions of expressions $f(x, y)$ will apply to expressions $x + y$ with obvious modifications. Analogous modifications also apply for other commonly used binary operation symbols, such as $\cdot$, $\times$, $\wedge$, $\vee$, etc. We have three types of operation symbols, listed below. Although interpretations are not part of the formal language, a preview of typical interpretations may make the formal language easier to understand, so we include a few examples of interpretations here:
(i) Function symbols, here denoted $f$, $g$, $h$, etc. (These are omitted in propositional logic; see 14.24.)
Examples in arithmetic or analysis. We might use the function symbols $+$, $-$, $\cdot$, $/$, all with arity 2, and the function symbols $\cos$ and $-$ with arity 1. A function symbol with arity 0 is a symbol that gets interpreted as a constant - e.g., the symbol "3" or "$\sqrt{5}$." In ordinary mathematics, the character "$-$" represents both the binary operator of subtraction and the unary operator of additive inverse, but those are actually different operators, and for purposes of logic it would be best to represent them with two different characters. Interestingly, these two operators are represented by different keys on some recent handheld electronic calculators, a source of confusion for mathematicians who grew up using one character for the two operations.
Examples in set theory. We might use the function symbols $\cap$, $\cup$ with arity 2, and $\complement$ with arity 1; we might use $\varnothing$ for a symbol with arity 0.
Examples in group theory. A character such as $\circ$ or $\square$ might be used as a function symbol of arity 2.
(ii) Relation (or predicate) symbols, here denoted $P$, $Q$, $R$, etc. (In propositional logic, these occur only with arity 0, and are then called primitive proposition symbols; see 14.24 and 14.18(ii).)
Examples. Some common relations of arity 2 are $<$, $\le$, $\ge$, $=$, $\ne$, $\in$, $\notin$. Many other meanings are possible for relation symbols; for instance, in arithmetic, $R(x, y)$ might have the interpretation "$x$ is a divisor of $y$." An example of a relation with arity 1 is "$x$ is a prime number." A relation with arity 0 is just a statement that does not mention any variables.
Remark. It is actually possible to dispense with function symbols, by viewing each function of arity $n$ as a relation with arity $n + 1$. For instance, the equation $z = x + y$ determines a function $z = f(x, y)$ with arity 2, but it also determines a relation $R(x, y, z)$ with arity 3.
(iii) Logical connective symbols. The precise choice of logical connective symbols may vary slightly from one exposition to another. The ones we shall use are:
$\neg$    not, negation    (arity 1)
$\Cup$    or, disjunction    (arity 2)
$\Cap$    and, conjunction    (arity 2)
$\to$    implies, implication    (arity 2)
Some mathematicians use additional connectives - e.g., the connective $\leftrightarrow$ (iff) or the connective $|$ (the Sheffer stroke). Also, some mathematicians prefer to define some of the connectives in terms of others - e.g., the connective $\Cup$ may be defined by the equation $\mathcal{A} \Cup \mathcal{B} = (\neg\mathcal{A}) \to \mathcal{B}$. However, we prefer to begin with unrelated symbols and then find relationships as a consequence of axioms. The notations vary slightly. For instance, among some mathematicians,
"not"    may be written instead as    $\sim$
"or"    may be written instead as    $\vee$ or $\cup$
"and"    may be written instead as    $\wedge$ or $\cap$ or $\&$
"implies"    may be written instead as    $\Rightarrow$ or $\supset$
We have chosen our notation in this book so that different symbols are used in logics ($\Cup$, $\Cap$), in lattices ($\vee$, $\wedge$), and in algebras of sets ($\cup$, $\cap$). This may reduce some confusion when two of these different kinds of structures must interact - see especially 14.27.d, 14.32, and 14.38. It should be understood that meanings are not yet attached to the symbols - not even to familiar symbols such as $\neg$, $+$, $\complement$, $\circ$, $=$, $\in$, $\to$. We may call $\neg$ the "negation" or call $+$ the "plus sign" to make them easier to read aloud and to lend some intuition about what this is all leading up to, but we do not yet associate these symbols with their usual meanings or any other meanings. Meanings will be attached later, when we consider interpretations in 14.47. In the formal language, these symbols are merely viewed as meaningless symbols, with arities assigned to grammatically govern the joining of these meaningless symbols into meaningless strings. In fact, the symbols $\neg$ and $\Cup$ have slightly different meanings in intuitionist and classical logic, both of which are introduced in the following pages. In the formal theory, our meaningless symbols may also be accompanied by some axioms. The logical connective symbols are governed by the logical axioms (see 14.25); function symbols may be governed by extra-logical axioms as in 14.27.c; relation symbols may be governed by extra-logical axioms as in 14.27.a. No other meaning is attached to any of these symbols in the formal theory.
14.19. Quantifiers. There are two kinds of quantifiers:
$\forall\xi$    the universal quantifier, usually read "for each $\xi$."
$\exists\xi$    the existential quantifier, usually read "there exists $\xi$ such that."
Here $\xi$ is a bound variable; any other bound variable may be used in the same fashion. (In propositional logic there are no variables and thus no quantifiers; see 14.24.) We caution that the symbols $\forall$ and $\exists$ occasionally have meanings slightly different from "for each" and "there exists"; see 14.47.j. Until we study their interpretations in 14.47.j, the symbols $\forall$ and $\exists$ should be viewed as not having any meaning at all; they are simply meaningless symbols whose use is governed by grammatical rules and inference rules listed in 14.23(iii) and 14.26. The quantifier $\forall$ is commonly read as "for all" in the mathematical literature, but we prefer to read it as "for each." In common English, "for all" suggests that the objects are perhaps being treated all in the same fashion. The customary mathematical meaning of $\forall$ is closer to "for each," which emphasizes that the objects under consideration can all be treated separately, one by one, perhaps with each treated differently. We emphasize that in a first-order language, a quantifier is understood to act only on an individual variable. Thus, it is possible to say "for each individual $\xi$," but grammatically it is not permitted to say "for each formula $\mathcal{F}$" or "for each class $S$ of individuals." Those expressions are permitted in higher-order languages - i.e., languages of second or third order, etc.; we shall not investigate such languages in this book.
14.20. Discussion of bindings. Some popular expositions of logic are Mendelson [1964] and Hamilton [1978]; those textbooks have been used widely and their treatment can now be considered "conventional" or "customary." Our own treatment will follow Rasiowa and Sikorski [1963], which is unconventional in some minor respects. For instance, the Rasiowa-Sikorski treatment uses fewer definitions of symbols and more axioms governing the use of undefined symbols. A more important difference is in the use of bound and free variables:
• In conventional treatments such as Mendelson [1964] or Hamilton [1978], the rule for incorporating quantifiers into formulas is trivial: If $\mathcal{A}$ is any formula and $x$ is any variable, then $\forall x \mathcal{A}$ and $\exists x \mathcal{A}$ are formulas, regardless of how $x$ is already being used in the formula $\mathcal{A}$ or elsewhere. The same symbols are used for free variables and bound variables; one defines whether a variable symbol $x$ is bound or free according to how and where it appears in a formula. The definitions of bound and free variables and the rules for substitution are (in this author's opinion) rather complicated and nonintuitive. The definitions involve the "scope" of a quantifier and the rather convoluted notion of "a term $t$ that is free for the variable $x$ in the formula $\mathcal{A}$."
• In Rasiowa and Sikorski [1963], the rule for defining free and bound variables is trivial: Before we even begin to think about how to make formulas, we agree in advance which symbols ($x, y, z, \ldots$) will be free variables and which symbols ($\xi, \eta, \zeta, \ldots$) will be bound variables; two disjoint sets of symbols are used. The rules for incorporating those variables into formulas (described in 14.22 and 14.23) and for making substitutions (described in 14.26) are not trivial, but they are not particularly complicated.
To motivate either approach, we shall now discuss bindings in general.
In ordinary mathematics (i.e., outside of formal logic), bindings are generated by certain operators such as $\int$, $\sum$, $\prod$. For instance, the equation
$$f(x) = \int_0^x \xi^2 \, d\xi$$
makes sense whenever $x$ is a real number. In this equation, $x$ is a free variable, and $\xi$ is a bound or dummy variable (also sometimes known as an apparent variable). The function $f$ is a function of $x$; it does not really involve $\xi$. In some sense, $\xi$ is not really a "variable" at all - it is just a "placeholder," and the place can be held just as well by nearly any other letter. All of the expressions
$$\int_0^x \xi^2 \, d\xi, \qquad \int_0^x \eta^2 \, d\eta, \qquad \int_0^x w^2 \, dw$$
represent exactly the same function $f(x)$. In fact, that function can also be represented without any dummy variables: $f(x) = x^3/3$. However, dummy variables are unavoidable for certain other functions; for instance, it is well known (but not easy to prove) that the function $g(x) = \int_0^x \exp(\xi^2) \, d\xi$ cannot be represented in terms of the classical elementary functions (algebraic expressions, trigonometric functions, exponentiation, logarithms, and compositions of such).
In the paragraph above, we have followed the typographical convention of Rasiowa and Sikorski, using different sorts of letters for free variables ($x, y, z$, etc.) and bound variables ($\xi, \eta, \zeta$, etc.). But in the wider literature, that convention generally is not observed, and any letter can be used for either type of variable. For instance, the function described in the preceding paragraph could be defined as easily by the equation $f(\xi) = \int_0^\xi x^2 \, dx$. In this respect, the Mendelson/Hamilton approach follows the convention of "ordinary" mathematics (i.e., mathematics outside of logic). However, in other respects the Mendelson/Hamilton approach differs from the conventions of ordinary mathematics, as we shall now describe.
In the equation $f(x) = \int_0^x \xi^2 \, d\xi$, we can replace $\xi$ by nearly any other letter. There is one exception: We should not replace $\xi$ with $x$ itself. Polite mathematicians prefer not to write $f(x) = \int_0^x x^2 \, dx$, since that equation uses the same letter $x$ for two different purposes - as a free variable and a bound variable. Admittedly, that type of expression can be found in some physics or engineering books - it is interpreted to mean the same thing as $f(x) = \int_0^x \xi^2 \, d\xi$ - but mathematicians frown upon such constructions. Likewise, an expression such as
$$g(x, y) = (x + y)^2 + \int_0^y \exp(x^2) \, dx$$
will make any well-bred mathematician uncomfortable, but we know that what is probably meant is
$$g(x, y) = (x + y)^2 + \int_0^y \exp(\xi^2) \, d\xi.$$
Analogous beasts appear in conventional logic books, but with little or no stigma attached. In the formula
$$Q(x, y) \Cup (\forall x \, (R(x, z))), \qquad (*)$$
the variable $x$ has one free occurrence and two bound occurrences. (The "$x$" immediately after the $\forall$ is one of the bound occurrences.) Such a distasteful formula is not absolutely necessary for proofs, since $\forall x \, (R(x, z))$ is in most respects equivalent to $\forall w \, (R(w, z))$, as explained in 14.42. Thus we can replace $(*)$ with the formula $Q(x, y) \Cup (\forall w \, (R(w, z)))$, which does not mix free and bound occurrences of one symbol.
An integral formula such as $g(u) = \int_0^1 \int_0^1 x^3 u \, dx \, dx$ has no clear meaning in ordinary mathematics. Nevertheless, analogous formulas appear often in logic; one such formula is
$$\exists x \, (\forall x \, (P(x, u))), \qquad (**)$$
which has one variable bound twice. Such a formula may seem unnatural, since it has no analogue outside of formal logic. Again, such beasts are not really necessary: Since $\forall x \, P(x, u)$ is in most respects equivalent to $\forall w \, P(w, u)$ (see 14.42), it may be helpful to view $(**)$ as having the same meaning as $\exists x \, (\forall w \, (P(w, u)))$, an expression with no double bindings. (An analogous interpretation would make $\int_0^1 \int_0^1 x^3 u \, dx \, dx$ equal to $\int_0^1 \int_0^1 w^3 u \, dw \, dx$.)
Thus, in any explanation of logic, it is necessary to either (i) prohibit nonintuitive expressions such as $(*)$ and $(**)$, or (ii) provide rules for dealing with such expressions. Conventional books such as Mendelson [1964] and Hamilton [1978] have followed option (ii), but the rules are necessarily rather complicated. Rasiowa and Sikorski [1963] have taken option (i), and so shall we in this book. Since the nonintuitive expressions can always be replaced by more acceptable ones anyway, the difference between options (i) and (ii) has only a superficial or cosmetic effect; it has no effect on deeper results discussed later in this chapter, such as the Completeness Principle.
A word of caution: Even the Rasiowa-Sikorski approach is not entirely trivial. Among other things, it permits expressions such as $(\exists\xi \, P(\xi)) \Cup (\forall\xi \, Q(\xi))$. The $\xi$'s in the first half of this expression are unrelated to the $\xi$'s in the second half of this expression. Some confusion might be avoided if we replace this formula with the equivalent formula $(\exists\xi \, P(\xi)) \Cup (\forall\eta \, Q(\eta))$.
Actually, variables could be dispensed with altogether; books on combinatory logic such as Hindley, Lercher and Seldin [1972] show that everything can be expressed in terms of functions. That approach will not be followed in this book, however.
14.21. Substitution notation. Throughout the discussions in the next few pages, we shall frequently use this notation: Let $x$ be a free variable symbol. Let $\mathcal{A}(x)$ be a finite string of symbols in which $x$ may occur 0 or more times, and in which other free or bound variables may occur 0 or more times. Let $\sigma$ be any finite string of symbols. Then $\mathcal{A}(\sigma)$ will denote the string of symbols obtained from $\mathcal{A}(x)$ by replacing each occurrence of $x$ (if there are any) with a copy of the string $\sigma$. Of course, if $x$ does not appear in the string $\mathcal{A}(x)$, then $\mathcal{A}(\sigma)$ is identical to $\mathcal{A}(x)$.
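The substitution notation is purely a string operation, which a few lines of code can make concrete. The sketch below assumes a string is stored as a Python list of symbols (so that a symbol such as `x` is never confused with a letter inside a longer name); the function name `substitute` is a hypothetical choice for illustration.

```python
def substitute(string, x, sigma):
    """Replace each occurrence of the symbol x in `string` by a copy of `sigma`."""
    result = []
    for symbol in string:
        if symbol == x:
            result.extend(sigma)   # a fresh copy of sigma for every occurrence
        else:
            result.append(symbol)
    return result


# A(x) is the string P(x,f(x,y)); substituting the string g(z) for x:
A = ["P", "(", "x", ",", "f", "(", "x", ",", "y", ")", ")"]
print("".join(substitute(A, "x", ["g", "(", "z", ")"])))   # prints P(g(z),f(g(z),y))
```

If the chosen symbol does not occur in the string, the loop copies every symbol unchanged, matching the last sentence of 14.21.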
14.22. Grammatical rules, part 1. In a first-order language, terms are certain finite strings of symbols formed recursively by these two rules:
(i) Any constant symbol or free variable symbol is a term.
(ii) If $f$ is an $n$-ary function symbol and $t_1, t_2, \ldots, t_n$ are terms, then the expression $f(t_1, t_2, \ldots, t_n)$ is a term.
There are no other terms besides those formed via these rules. Condition (ii) is not self-referential - i.e., it does not involve any circular reasoning that leads to a contradiction. Indeed, we may classify strings of symbols according to their length (i.e., how many symbols appear in a string) or their depth (i.e., how many times we have functions nested within functions). Then the construction in condition (ii) always forms longer terms from shorter ones or forms deeper terms from shallower ones. To prove a statement about all the terms of a language, it is often possible to proceed by induction on the length or depth of the terms. Observation: By our definition, no bound variables appear in terms.
Example. Consider a language in which 2, 5, 6 are among the constant symbols, $x$ and $y$ are among the variable symbols, $\sqrt{\ }$ and $\cos$ are function symbols of arity 1, and $+$, $-$, $\cdot$ are function symbols of arity 2. Then the string of symbols $(5 \cdot x) + (\sqrt{\ }((6 \cdot y) - (\cos(2))))$ is a term. When it is given its usual interpretation involving real numbers, then that string of symbols is a real-valued function of two real variables, written more commonly as $5x + \sqrt{6y - \cos 2}$.
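The two rules for terms translate directly into a recursive check. The following sketch assumes a term is encoded as a string (a constant or free variable symbol) or as a tuple whose first entry is a function symbol; the names `ARITY`, `FREE_VARIABLES`, `is_term`, and `depth`, and the spellings `sqrt` and `*`, are hypothetical choices made for this illustration of the example above.

```python
# Arities for the symbols of the example; constants are function symbols of arity 0.
ARITY = {"2": 0, "5": 0, "6": 0, "sqrt": 1, "cos": 1, "+": 2, "-": 2, "*": 2}
FREE_VARIABLES = {"x", "y"}


def is_term(t):
    """Rule (i): constants and free variables. Rule (ii): f(t1, ..., tn) with n = arity of f."""
    if isinstance(t, str):
        return t in FREE_VARIABLES or ARITY.get(t) == 0
    if not isinstance(t, tuple) or not t:
        return False
    head, args = t[0], t[1:]
    return (
        isinstance(head, str)
        and len(args) > 0
        and ARITY.get(head) == len(args)
        and all(is_term(arg) for arg in args)
    )


def depth(t):
    """Number of levels of function nesting; constants and variables have depth 0."""
    return 0 if isinstance(t, str) else 1 + max(depth(arg) for arg in t[1:])


# (5*x) + sqrt((6*y) - cos(2))
example = ("+", ("*", "5", "x"), ("sqrt", ("-", ("*", "6", "y"), ("cos", "2"))))
print(is_term(example), depth(example))   # prints True 4
```

Because each recursive call acts on a strictly shallower tuple, the check terminates, which mirrors the remark that rule (ii) always builds deeper terms from shallower ones. A bound variable symbol is rejected simply because it is neither a constant nor a free variable.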
Remark. In the discussions below, terms will generally be represented by the letters $t, t_1, t_2, \ldots$ and $s, s_1, s_2, \ldots$, etc. However, it should be understood that these letters are not actually symbols making up a part of our formal language (the inner system). Rather, these letters are metavariables - i.e., they are part of the metalanguage; they are informal conventions adopted for our discussion in the outer system. The precise expression "let $t$ be a term" is an abbreviation for the imprecise and unwieldy expression "let us consider any term, such as $f(x_1, x_2, g(c_1, c_2), h(x_3, c_3, c_4))$."
14.23. Grammatical rules, part 2. Certain finite strings of symbols are known as formulas (or, in some books, well-formed formulas , or wff's) . The definitions are recursive: (i) An atomic formula (or atom) is an expression of the form P( t 1 , t 2 , . . . , t n ) , where P is an n-ary relation symbol and t 1 , t 2 , . . . , t n are terms. It is a formula. We permit n = 0. Thus a primitive proposition symbol (i.e. , relation sym bol of arity 0) is an atomic formula. (In propositional logic, this is the only type of atomic formula, since the only relation symbols we have in proposi tional logic are those' of arity 0; see 14.24.) (ii) If A 1 , A 2 , . . . , An are formulas and � is an n-ary logical connective symbol, then �(A 1 , A 2 , . . . , A n ) is a formula. Since we will only use a few connectives, this rule can be restated as: If A 1 and A 2 are formulas, then are formulas. We may omit the parentheses when no confusion is likely. (iii) Suppose A(x) is a formula in which the bound variable � does not occur. Apply the substitution notation of 14.21. Then Vt, A( O and ::lt, A( O are also
Chapter 1 4: Logic and Intangibles
362
formulas. (Of course, no formulas can be formed in this fashion if the sets of variable symbols are empty, as in propositional logic.)
There are no other formulas besides those formed recursively using the rules above.
Remarks. The beginner might be concerned that 14.23(ii) seems self-referential and thus might permit circular reasoning. However, there is no need for worry: no circularity is possible here. Each formula formed as in 14.23(ii) is longer (in number of symbols used) than the formulas from which it was formed. A statement about the set of all formulas can be proved by induction on the lengths of the formulas; this is a common method of proof.
In the discussions below, formulas will generally be represented by the letters $\mathcal{A}, \mathcal{B}, \mathcal{C}$, etc. However, it should be understood that these letters are not actually symbols making up a part of our formal language (the inner system). Rather, they are metavariables, adopted for our informal discussion in the outer system. The precise expression "let $\mathcal{F}$ be a formula" is an abbreviation for the imprecise and unwieldy expression "let us consider any formula, such as $(\neg P(x, f(y, z))) \cup (Q(x, z) \cap ((R \cap (S(x, y, z, g(z))))))$."
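The recursive definition in 14.23 and the remark about induction on length can be illustrated with a small sketch. The class names (`Atom`, `Not`, `Binary`, `Quantified`) and the helper functions are hypothetical choices made for this illustration, and terms are simplified to plain strings; none of this is part of the formal language itself.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """Atomic formula P(t1, ..., tn); terms are kept as plain strings here."""
    relation: str
    terms: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Not:
    body: "Formula"


@dataclass(frozen=True)
class Binary:
    """Binary connective; op is one of 'or', 'and', 'implies'."""
    op: str
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Quantified:
    """Quantified formula; kind is 'forall' or 'exists', var is the bound variable."""
    kind: str
    var: str
    body: "Formula"


Formula = Union[Atom, Not, Binary, Quantified]


def size(f: Formula) -> int:
    """Count the nodes of a formula; each formation rule strictly increases it."""
    if isinstance(f, Atom):
        return 1
    if isinstance(f, (Not, Quantified)):
        return 1 + size(f.body)
    if isinstance(f, Binary):
        return 1 + size(f.left) + size(f.right)
    raise TypeError(f"not a formula: {f!r}")


def relation_symbols(f: Formula) -> set:
    """Collect relation symbols by structural recursion (induction on size)."""
    if isinstance(f, Atom):
        return {f.relation}
    if isinstance(f, (Not, Quantified)):
        return relation_symbols(f.body)
    return relation_symbols(f.left) | relation_symbols(f.right)


# A sample propositional formula: (P -> (P and (not P))) -> (not P)
P = Atom("P")
example = Binary("implies", Binary("implies", P, Binary("and", P, Not(P))), Not(P))
```

Because `size` of a compound formula always exceeds the size of its parts, any function defined by cases on the outermost constructor, such as `relation_symbols`, terminates; this is the programming counterpart of proof by induction on length.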
14.24. An important special case of predicate logic is propositional logic (or propositional calculus, also known as sentential logic or sentential calculus). Historically, it developed before other kinds of logic. It is simpler than predicate logic, in that it has fewer ingredients. A typical formula in propositional logic is $(P \to (P \cap (\neg P))) \to (\neg P)$; a typical formula in predicate logic is $(\forall\xi\, P(\xi, x)) \cup Q(x, f(y, z))$.
In propositional logic, there are no symbols for individuals - i.e., no constant individuals, no bound variables, and no free variables - and there are no function symbols. Consequently, the only relation symbols have arity 0, and there are no quantifiers and no terms.
ASSUMPTIONS IN FIRST-ORDER LOGIC
In addition to its language (i.e., alphabet and grammatical rules, described above), a logical theory also involves certain assumptions. These are listed below.
14.25. Logical axioms. The literature contains many different axiomatizations of logic. We shall follow the development of Rasiowa and Sikorski [1963]. Our first nine axioms determine what is known as positive logic.
(i) $(\mathcal{A} \to \mathcal{B}) \to ((\mathcal{B} \to \mathcal{C}) \to (\mathcal{A} \to \mathcal{C}))$. This is called the Syllogism Law.
(ii) $\mathcal{A} \to (\mathcal{A} \cup \mathcal{B})$.
(iii) $\mathcal{B} \to (\mathcal{A} \cup \mathcal{B})$.
(iv) $(\mathcal{A} \to \mathcal{C}) \to ((\mathcal{B} \to \mathcal{C}) \to ((\mathcal{A} \cup \mathcal{B}) \to \mathcal{C}))$.
(v) $(\mathcal{A} \cap \mathcal{B}) \to \mathcal{A}$.
(vi) $(\mathcal{A} \cap \mathcal{B}) \to \mathcal{B}$.
(vii) $(\mathcal{C} \to \mathcal{A}) \to ((\mathcal{C} \to \mathcal{B}) \to (\mathcal{C} \to (\mathcal{A} \cap \mathcal{B})))$.
(viii) $(\mathcal{A} \to (\mathcal{B} \to \mathcal{C})) \to ((\mathcal{A} \cap \mathcal{B}) \to \mathcal{C})$. This is the Importation Law.
(ix) $((\mathcal{A} \cap \mathcal{B}) \to \mathcal{C}) \to (\mathcal{A} \to (\mathcal{B} \to \mathcal{C}))$. This is the Exportation Law.
The nine axioms above, plus the next two axioms below, determine what is commonly known as intuitionist logic. It was developed largely by Heyting and corresponds closely to intuitionist or constructivist thinking.
(x) $(\mathcal{A} \cap (\neg\mathcal{A})) \to \mathcal{B}$. This is the Duns Scotus Law.
(xi) $(\mathcal{A} \to (\mathcal{A} \cap (\neg\mathcal{A}))) \to (\neg\mathcal{A})$.
Finally, the eleven axioms above plus the twelfth axiom below determine what is known as classical logic, which is close to the way of thinking of most mathematicians.
(xii) $\mathcal{A} \cup (\neg\mathcal{A})$. This is the Law of the Excluded Middle, or tertium non datur.
The twelve rules listed above are actually axiom schemes - each of them represents infinitely many axioms. For instance, Axiom Scheme (ii) yields the axiom $P \to (P \cup Q)$, but it also yields the axiom $(P(x) \cap R(f(a, y))) \to ((P(x) \cap R(f(a, y))) \cup (Q \cap (\neg S)))$ by using different formulas for $\mathcal{A}$ and $\mathcal{B}$. (Recall that $\mathcal{A}$ and $\mathcal{B}$ only belong to the metalanguage. They are informal shorthand abbreviations for expressions such as $P(x) \cap R(f(a, y))$, which belong to the object language.)
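The distinction between an axiom scheme (metalanguage) and its instances (object language) can be sketched in code. In this illustration, which is only one possible encoding, a formula is either an atom string or a nested tuple whose first entry names the connective; the function `axiom_ii` plays the role of the scheme, and each call produces one axiom. The names `show` and `axiom_ii` are hypothetical.

```python
def show(f) -> str:
    """Render a formula given as an atom string or a tuple (connective, ...)."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        return "(¬" + show(f[1]) + ")"
    symbol = {"or": " ∪ ", "and": " ∩ ", "->": " → "}[f[0]]
    return "(" + show(f[1]) + symbol + show(f[2]) + ")"


def axiom_ii(a, b):
    """Instance of Axiom Scheme (ii): a -> (a or b), for any formulas a and b."""
    return ("->", a, ("or", a, b))


# Two different axioms obtained from the same scheme.
simple = axiom_ii("P", "Q")
compound = axiom_ii(("and", "P(x)", "R(f(a,y))"), ("and", "Q", ("not", "S")))

print(show(simple))
print(show(compound))
```

The Python parameters `a` and `b` correspond to the metavariables: they never appear in the rendered output, only the object-language formulas substituted for them do.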
14.26. The rules of inference of our logical system are rules by which, from a given set of formulas, we may deduce (or infer) another formula. Here, "deduce" and "infer" merely mean "obtain." We are not necessarily obtaining "true" formulas; we are merely collecting "obtainable" formulas, and the rules of inference tell us which formulas are obtainable. Admittedly, the rules of inference are most often applied to formulas that are in some sense "true," but this is not always the case. For instance, in a proof by contradiction, we may assume the negation of the desired conclusion, and then use the rules of inference to try to infer various consequences of that and other assumptions, until a contradiction is reached, thereby proving the desired conclusion.
The rules of inference and logical axioms vary slightly from one exposition to another. Indeed, what one book calls a rule of inference is what another book may call a logical axiom. Our own rules, listed below, follow those of Rasiowa and Sikorski [1963]. These rules will be "justified" in 14.55(ii).
(R1) Modus ponens, also known as the rule of detachment. Suppose $\mathcal{A}$ and $\mathcal{B}$ are formulas. Then from $\mathcal{A}$ and $(\mathcal{A} \to \mathcal{B})$ we can infer $\mathcal{B}$.
Our first rule, modus ponens, is present in all versions of logic. The remaining rules below involve variables, and so they can be skipped in considering any logic that does not involve variables (such as propositional logic).
(R2) Rule of substitution. Let $x_1, x_2, \ldots, x_n$ be distinct free individual variables, and let $t_1, t_2, \ldots, t_n$ be (not necessarily distinct) terms. Let $\mathcal{A}(x_1, x_2, \ldots, x_n)$ be a formula, in which each of the free individual variables $x_1, x_2, \ldots, x_n$ occurs 0 or more times. Let $\mathcal{A}(t_1, t_2, \ldots, t_n)$ be the formula obtained from $\mathcal{A}(x_1, x_2, \ldots, x_n)$ by simultaneously replacing all occurrences of the $x_j$'s with copies of the corresponding $t_j$'s. Then from $\mathcal{A}(x_1, x_2, \ldots, x_n)$ we can infer $\mathcal{A}(t_1, t_2, \ldots, t_n)$.
In the four rules below, let $\mathcal{A}(x)$ be a formula in which the bound variable $\xi$ does not occur; we follow the substitution notation of 14.21. Also, let $\mathcal{B}$ be any formula. Then:
(R3) Introduction of existential quantifiers. Suppose $\mathcal{B}$ contains no occurrence of $x$. Then from $\mathcal{A}(x) \to \mathcal{B}$ we can infer $(\exists\xi\, \mathcal{A}(\xi)) \to \mathcal{B}$.
(R4) Introduction of universal quantifiers. Suppose $\mathcal{B}$ contains no occurrence of $x$. Then from $\mathcal{B} \to \mathcal{A}(x)$ we can infer $\mathcal{B} \to (\forall\xi\, \mathcal{A}(\xi))$.
(R5) Elimination of existential quantifiers. (We make no assumption about whether $x$ appears in $\mathcal{B}$.) From $(\exists\xi\, \mathcal{A}(\xi)) \to \mathcal{B}$ we can infer $\mathcal{A}(x) \to \mathcal{B}$.
(R6) Elimination of universal quantifiers. (We make no assumption about whether $x$ appears in $\mathcal{B}$.) From $\mathcal{B} \to (\forall\xi\, \mathcal{A}(\xi))$ we can infer $\mathcal{B} \to \mathcal{A}(x)$.
The rules of inference form the basis for our syntactic implications, and in fact the rules of inference are our most basic examples of syntactic implications. The rule of modus ponens says that for certain formulas $\mathcal{F}, \mathcal{G}, \mathcal{H}$ we have $\mathcal{F}, \mathcal{G} \vdash \mathcal{H}$; the other five rules of inference are of the form $\mathcal{F} \vdash \mathcal{G}$.
The rules of inference listed above will be assumed - i.e., taken as hypotheses in our reasoning about reasoning. Some auxiliary rules of inference will be proved as consequences of modus ponens and the logical axioms in 14.30.
14.27. Examples of extra-logical axioms. Besides the logical axioms shared by essentially all first-order theories determining our reasoning methods, a particular first-order theory may have additional, specialized axioms, determining the mathematical objects that we wish to study with that reasoning process. We refer to these as extra-logical or nonlogical axioms. Below are some examples. It should be understood that these examples are not part of the general explanation of predicate logic developed in this chapter - i.e., we shall not assume these axioms later in this chapter.
a. In many first-order systems, a special role is played by a relation symbol of rank two, called equality or equals. It is equipped with several axioms; the precise list of axioms varies from one exposition to another. Most often this symbol is denoted by "$=$", though in some expositions it may instead be denoted by "$\approx$" or "$\equiv$" or some other symbol, to emphasize that it is a symbol with precisely specified properties under formal study rather than just ordinary informal equality. Here is a typical set of axioms for equality:
(i) $x = x$.
(ii) $(x = y) \to (y = x)$.
(iii) $((x = y) \cap (y = z)) \to (x = z)$.
(iv) Let $s_1, s_2, t$ be terms, and let $y$ be a free variable. For $j = 1, 2$, let $t_j$ be the term obtained from $t$ by replacing each occurrence of the variable $y$ with the term $s_j$. Then $(s_1 = s_2) \to (t_1 = t_2)$ is an axiom.
(v) With notation as in 14.21, if $s_1, s_2$ are terms, then $(s_1 = s_2) \to (\mathcal{A}(s_1) \to \mathcal{A}(s_2))$ is an axiom.
The first three of these axioms say that equality is an "equivalence relation," in a sense similar to that in 3.8 and 3.10. The last two axioms say that "equals can be substituted for equals." Actually, there is some redundancy in our formulation; our axioms (ii) and (iii) follow from axioms (i), (iv), and (v). (See Hamilton [1978], for instance.)
A logical system that includes such axioms is generally called predicate logic with equality. In this book we shall consider the axioms of equality to be extra-logical axioms, since they do not occur in all first-order systems. However, we caution that some mathematicians are concerned solely with first-order systems with equality, and some of these mathematicians find it convenient to designate the axioms of equality as "logical axioms", which means that those axioms may sometimes get used without being mentioned.
b. For the theory of preordered sets, two of the binary relation symbols are $=$ and $\preccurlyeq$. Axioms used are the axioms for equality (described above) plus these axioms for the ordering:
(reflexive) $x \preccurlyeq x$,
(transitive) $((x \preccurlyeq y) \cap (y \preccurlyeq z)) \to (x \preccurlyeq z)$.
For the theory of partially ordered sets, we add this axiom:
(antisymmetric) $((x \preccurlyeq y) \cap (y \preccurlyeq x)) \to (x = y)$.
All of the axioms above are of first order - i.e., they deal only with individual members of a preordered or partially ordered set $D$, not with subsets of that set. In contrast, the Dedekind completeness of a poset $D$ is a statement requiring a higher order language. Recall that $D$ is Dedekind complete if each nonempty subset that is bounded above has a least upper bound. In symbols, that condition is

for each $S \subseteq D$, $\quad \big((S \neq \varnothing) \cap (\exists x\, \forall y\, (y \in S \to y \preccurlyeq x))\big) \to \big(\exists u\, \forall x\, ((u \preccurlyeq x) \leftrightarrow (\forall y\, (y \in S \to y \preccurlyeq x)))\big)$,

where $\mathcal{A} \leftrightarrow \mathcal{B}$ is an abbreviation for $(\mathcal{A} \to \mathcal{B}) \cap (\mathcal{B} \to \mathcal{A})$. The condition begins with "for each set $S$ that is a subset of $D$;" thus it involves a quantifier that ranges over subsets of $D$. There are other, equivalent ways to formulate the condition of Dedekind completeness of $D$, but none of them can be expressed in first-order language over $D$. Contrast this with 14.27.d, below.
c. For the theory of monoids, one of the binary relations is $=$, one of the binary functions is $\circ$, and some nullary function (i.e., function of arity 0) is denoted by $i$. Axioms used are the axioms of equality (described above) plus these axioms:
(associative) $(x \circ y) \circ z = x \circ (y \circ z)$,
(right identity) $x \circ i = x$,
(left identity) $i \circ x = x$.
Additional algebraic axioms can be used to determine the theory of other types of algebraic systems - e.g., groups, rings, etc.
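For a finite structure, the three monoid axioms can be checked by brute force, which gives a concrete feel for what it means for a structure to satisfy a set of extra-logical axioms. The following is a minimal sketch; the function name `is_monoid` and the two sample operations on $\{0, 1, 2, 3\}$ are assumptions made for this illustration.

```python
from itertools import product


def is_monoid(elements, op, identity) -> bool:
    """Check the three monoid axioms by brute force over a finite carrier set."""
    elements = list(elements)
    associative = all(
        op(op(x, y), z) == op(x, op(y, z))
        for x, y, z in product(elements, repeat=3)
    )
    right_identity = all(op(x, identity) == x for x in elements)
    left_identity = all(op(identity, x) == x for x in elements)
    return associative and right_identity and left_identity


def add_mod_4(x, y):
    return (x + y) % 4


def sub_mod_4(x, y):
    return (x - y) % 4


carrier = range(4)
print(is_monoid(carrier, add_mod_4, 0))  # addition mod 4 satisfies all three axioms
print(is_monoid(carrier, sub_mod_4, 0))  # subtraction mod 4 is not associative
```

The check quantifies only over individual elements of the carrier, which is exactly what makes the monoid axioms first-order; a test of Dedekind completeness would instead have to range over subsets.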
d. In the language of set theory, the individual elements $a, b, c, x, y, z$, etc., that we discuss are intended to represent sets. In conventional (i.e., atomless) set theory, the only undefined constant is $\varnothing$; all other constants are defined in terms of it. Thus, 0 is an abbreviation for $\varnothing$, 1 is an abbreviation for $\{\varnothing\}$, 2 is an abbreviation for $\{\varnothing, \{\varnothing\}\}$, etc., as in 5.44. A basic binary relation is $\in$ (membership). Other relations can be defined in terms of membership. For instance, $u \subseteq v$ means $(x \in u) \to (x \in v)$, and $u = v$ means $(u \subseteq v) \cap (v \subseteq u)$.
The most commonly used axioms of set theory are the ZF axioms listed in 1.47. To make ZF into a first-order theory we must view the Axiom of Comprehension and the Axiom of Replacement not as single axioms, but as axiom schemes. Each of these two schemes represents infinitely many different axioms. We have one Axiom of Comprehension for each property $P$ that can be formulated in the first-order language, and one Axiom of Replacement for each function $f$ that can be formulated in the first-order language. (See also the reinterpretation of these axioms indicated in 14.67.)
The language of set theory is extremely powerful; it is more expressive than any of the other languages mentioned above. As we remarked in 1.46, all familiar objects of mathematics can be expressed in this language. The integers can be built up using the Axiom of Infinity; the rational numbers can be built up using equivalence classes of pairs of integers; the real numbers can be built up using Dedekind cuts of rationals. The language of set theory is sufficiently expressive for us to assert that a certain poset $X$ is Dedekind complete: We can describe the ordering as a subset of $X \times X$, and the subsets of $X$ are members of $\mathcal{P}(X)$. Contrast this with 14.27.b, above.
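The coding of 0, 1, 2 as sets built from the empty set, and the definition of inclusion and equality from membership, can be tried out with Python's immutable `frozenset`. This is a sketch under an assumption: it uses the usual successor rule $k + 1 = k \cup \{k\}$, which agrees with the three abbreviations quoted above but is otherwise not spelled out in this section. The function names are hypothetical.

```python
def von_neumann(n: int) -> frozenset:
    """Return the set coding n: 0 is the empty set, and k + 1 is k ∪ {k} (assumed rule)."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s


def subset(u: frozenset, v: frozenset) -> bool:
    """u ⊆ v, defined from membership alone: every x in u is also in v."""
    return all(x in v for x in u)


def set_equal(u: frozenset, v: frozenset) -> bool:
    """u = v, defined as (u ⊆ v) and (v ⊆ u)."""
    return subset(u, v) and subset(v, u)


zero, one, two = von_neumann(0), von_neumann(1), von_neumann(2)
print(one == frozenset([zero]))            # 1 is {∅}
print(set_equal(two, frozenset([zero, one])))  # 2 is {∅, {∅}}
```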
SOME SYNTACTIC RESULTS (PROPOSITIONAL LOGIC) 14.28. Remark. We now consider some consequences of the logical axioms and inference rules listed in 14.25 and 14.26. We begin with some results that do not mention variables or constants; these results will not require any rules of inference except modus ponens; these results will apply equally well to propositional logic or predicate logic. In 14.39 we shall begin to consider results that do involve variables and constants.
14.29. Some basic syntactic theorems of positive logic.
(i) $\mathcal{A} \to \mathcal{A}$.
(ii) $\mathcal{A} \to (\mathcal{B} \to \mathcal{A})$.
(iii) $\mathcal{B} \to (\mathcal{A} \to \mathcal{A})$.
(iv) $\mathcal{A} \to ((\mathcal{A} \to \mathcal{A}) \to \mathcal{A})$.
(v) $((\mathcal{A} \to \mathcal{B}) \cap \mathcal{A}) \to \mathcal{B}$.
(vi) $\mathcal{A} \to (\mathcal{B} \to (\mathcal{A} \cap \mathcal{B}))$.
These results will be proved for all formulas $\mathcal{A}$ and $\mathcal{B}$, using the axioms of positive logic (i.e., Axioms (i) through (ix) of 14.25). These results will be used later in proofs.
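Proof. The formula
$((((\mathcal{A} \cap \mathcal{A}) \to \mathcal{A}) \cap \mathcal{A}) \to \mathcal{A}) \to (((\mathcal{A} \cap \mathcal{A}) \to \mathcal{A}) \to (\mathcal{A} \to \mathcal{A}))$
is an instance of Axiom (ix), and
$(((\mathcal{A} \cap \mathcal{A}) \to \mathcal{A}) \cap \mathcal{A}) \to \mathcal{A}$
is an instance of Axiom (vi). Combine these, by modus ponens, to prove the formula $((\mathcal{A} \cap \mathcal{A}) \to \mathcal{A}) \to (\mathcal{A} \to \mathcal{A})$. Next, $(\mathcal{A} \cap \mathcal{A}) \to \mathcal{A}$ is another instance of Axiom (vi); combine that with the preceding formula to prove Theorem (i).
Theorem (ii) is immediate from Axioms (v) and (ix) via modus ponens (with the substitution $\mathcal{C} = \mathcal{A}$).
The formula $(\mathcal{A} \to \mathcal{A}) \to (\mathcal{B} \to (\mathcal{A} \to \mathcal{A}))$ is an instance of Theorem (ii). Combine it with Theorem (i), by modus ponens, to prove Theorem (iii).
Theorem (iv) is just an instance of Theorem (ii).
The formula $((\mathcal{A} \to \mathcal{B}) \to (\mathcal{A} \to \mathcal{B})) \to (((\mathcal{A} \to \mathcal{B}) \cap \mathcal{A}) \to \mathcal{B})$ is an instance of Axiom (viii). An instance of Theorem (i) is $(\mathcal{A} \to \mathcal{B}) \to (\mathcal{A} \to \mathcal{B})$. Combine those, by modus ponens, to prove Theorem (v).
The formula $((\mathcal{A} \cap \mathcal{B}) \to (\mathcal{A} \cap \mathcal{B})) \to (\mathcal{A} \to (\mathcal{B} \to (\mathcal{A} \cap \mathcal{B})))$ is an instance of Axiom (ix), and the formula $(\mathcal{A} \cap \mathcal{B}) \to (\mathcal{A} \cap \mathcal{B})$ is an instance of Theorem (i). Combine these to prove Theorem (vi).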
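The derivation of Theorem (i) uses only two axiom instances and two applications of modus ponens, so it can be replayed mechanically. The sketch below encodes formulas as nested tuples and checks each modus ponens step by exact syntactic matching; the encoding and the helper names (`imp`, `conj`, `axiom_vi`, `axiom_ix`, `modus_ponens`) are assumptions made for this illustration.

```python
def imp(x, y):
    """The formula x -> y."""
    return ("->", x, y)


def conj(x, y):
    """The formula x ∩ y."""
    return ("&", x, y)


def axiom_vi(a, b):
    """Instance of Axiom (vi): (a ∩ b) -> b."""
    return imp(conj(a, b), b)


def axiom_ix(a, b, c):
    """Instance of Axiom (ix): ((a ∩ b) -> c) -> (a -> (b -> c))."""
    return imp(imp(conj(a, b), c), imp(a, imp(b, c)))


def modus_ponens(premise, implication):
    """From premise and (premise -> conclusion), return conclusion."""
    if implication[0] != "->" or implication[1] != premise:
        raise ValueError("modus ponens does not apply")
    return implication[2]


A = "A"
X = imp(conj(A, A), A)               # the formula (A ∩ A) -> A

step1 = axiom_ix(X, A, A)            # ((X ∩ A) -> A) -> (X -> (A -> A))
step2 = axiom_vi(X, A)               # (X ∩ A) -> A
step3 = modus_ponens(step2, step1)   # X -> (A -> A)
step4 = axiom_vi(A, A)               # (A ∩ A) -> A, which is X itself
step5 = modus_ponens(step4, step3)   # A -> A

print(step5)
```

Nothing in the check refers to truth values: each step is accepted purely because its shape matches an axiom scheme or the pattern required by modus ponens, which is what "syntactic theorem" means here.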
14.30. Additional rules of inference. We shall use the axioms of positive logic to prove:
(i) If $\mathcal{A}, \mathcal{B}, \mathcal{C}$ are some formulas such that $\mathcal{A} \to \mathcal{B}$ and $\mathcal{B} \to \mathcal{C}$ are syntactic theorems, then $\mathcal{A} \to \mathcal{C}$ is also a syntactic theorem.
(ii) $\mathcal{A} \cap \mathcal{B}$ is a syntactic theorem if and only if both $\mathcal{A}$ and $\mathcal{B}$ are syntactic theorems.
(iii) $(\mathcal{A} \to \mathcal{A}) \to \mathcal{A}$ is a syntactic theorem if and only if $\mathcal{A}$ is a syntactic theorem.
(iv) If $\mathcal{A} \to (\mathcal{B} \to \mathcal{C})$ and $\mathcal{A} \to \mathcal{B}$ are syntactic theorems, then $\mathcal{A} \to \mathcal{C}$ is a syntactic theorem.
Proofs. We shall use not only Axioms (i) through (ix), but also some of the Theorems of the previous section. Rule (i) follows easily from Axiom (i) (the Syllogism Law) and two applications of modus ponens. For Rule (ii), observe that if $\mathcal{A} \cap \mathcal{B}$ is a syntactic theorem, then $\mathcal{A}$ and $\mathcal{B}$ are syntactic theorems by Axioms (v) and (vi) via modus ponens. Conversely, if $\mathcal{A}$ and $\mathcal{B}$ are syntactic theorems, then $\mathcal{A} \cap \mathcal{B}$ follows from Theorem (vi) and two applications of modus ponens.
To prove Rule (iii): If $\mathcal{A}$ is a syntactic theorem, then from Theorem (iv) by modus ponens we know $(\mathcal{A} \to \mathcal{A}) \to \mathcal{A}$ is a syntactic theorem. Conversely, if $(\mathcal{A} \to \mathcal{A}) \to \mathcal{A}$ is a syntactic theorem, then from Theorem (i) by modus ponens we can conclude that $\mathcal{A}$ is also a syntactic theorem.
To prove Rule (iv): Note that $(\mathcal{A} \to (\mathcal{B} \to \mathcal{C})) \to ((\mathcal{A} \to \mathcal{B}) \to (\mathcal{A} \to ((\mathcal{B} \to \mathcal{C}) \cap \mathcal{B})))$ is an instance of Axiom (vii). Combine it with the two given syntactic theorems, via modus ponens; thus we obtain $\mathcal{A} \to ((\mathcal{B} \to \mathcal{C}) \cap \mathcal{B})$ as a syntactic theorem. On the other hand, an instance of 14.29(v) gives us $((\mathcal{B} \to \mathcal{C}) \cap \mathcal{B}) \to \mathcal{C}$. Combine these two results, using 14.30(i); thus $\mathcal{A} \to \mathcal{C}$ is a syntactic theorem.
14.31. A set $\Sigma$ of formulas is syntactically inconsistent if we can use $\Sigma$ to deduce both $\mathcal{A}$ and $\neg\mathcal{A}$, for some formula $\mathcal{A}$. Note that we can then use $\Sigma$ to deduce any formula; that is clear from the Duns Scotus Law (Axiom (x) in 14.25). If $\Sigma$ is not syntactically inconsistent, then it is syntactically consistent.
A derivation is understood to involve only finitely many steps, and so it can only involve finitely many of the axioms. Therefore, a collection of formulas is syntactically consistent if and only if each finite subset of that collection is syntactically consistent. In other words, syntactic consistency of sets of formulas is a property with finite character, in the sense of 3.46.
14.32. Definition of the ordering of the language. Let $\mathbb{F}$ be the set of all formulas. We use the positive logic axioms from 14.25, plus whatever additional axioms we may choose. We now define two binary relations on $\mathbb{F}$ as follows: For formulas $\mathcal{A}$ and $\mathcal{B}$,
$\mathcal{A} \preccurlyeq \mathcal{B}$ will mean that the formula $\mathcal{A} \to \mathcal{B}$ is a syntactic theorem;
$\mathcal{A} \approx \mathcal{B}$ will mean that both $\mathcal{A} \to \mathcal{B}$ and $\mathcal{B} \to \mathcal{A}$ are syntactic theorems.
It should be emphasized that the relations $\preccurlyeq$ and $\approx$ are part of our metalanguage, not part of our object language (see 14.12.b). Thus, the expressions $\mathcal{A} \preccurlyeq \mathcal{B}$ and $\mathcal{A} \approx \mathcal{B}$ are not "formulas." It follows from Theorem (i) of 14.29 and Rule (i) of 14.30 that $\preccurlyeq$ is a preorder, and $\approx$ is an equivalence relation, on $\mathbb{F}$. Let $\mathbb{L} = (\mathbb{F}/{\approx})$ be the set of equivalence classes. The set $\mathbb{L}$, equipped with the operations discussed below, is commonly known as the Lindenbaum algebra. The preorder $\preccurlyeq$ on $\mathbb{F}$ determines a partial order on $\mathbb{L}$, which we shall also denote by $\preccurlyeq$. Let $[\ ] : \mathbb{F} \to \mathbb{L}$ be the quotient map; that is, $[\mathcal{A}]$ is the equivalence class containing the formula $\mathcal{A}$. Thus

$[\mathcal{A}] \preccurlyeq [\mathcal{B}]$ if and only if $(\mathcal{A} \to \mathcal{B})$ is a syntactic theorem.

We shall use equality ($=$) in its usual fashion as a relation between equivalence classes. Thus, $[\mathcal{A}] = [\mathcal{B}]$ means that $\mathcal{A}$ and $\mathcal{B}$ belong to the same equivalence class - i.e., it means that $\mathcal{A} \approx \mathcal{B}$.
It should be emphasized that any axioms whatsoever (or no axioms at all) may be used in addition to Axioms (i)-(ix). Different choices of additional axioms yield different relations $\preccurlyeq$, $\approx$ and thus yield different Lindenbaum algebras. Throughout most of our discussions in this and the next chapter, we assume that some particular choice is made regarding the additional axioms, and thus we may speak of the Lindenbaum algebra.
14.33. Theorem. The Lindenbaum algebra $(\mathbb{L}, \preccurlyeq)$ defined above is, in fact, a relatively pseudocomplemented lattice (as defined in 13.24), with operations given by

$[\mathcal{A}] \vee [\mathcal{B}] = [\mathcal{A} \cup \mathcal{B}]$, $\quad [\mathcal{A}] \wedge [\mathcal{B}] = [\mathcal{A} \cap \mathcal{B}]$, $\quad ([\mathcal{A}] \Rightarrow [\mathcal{B}]) = [\mathcal{A} \to \mathcal{B}]$, $\quad 1 = [\mathcal{A} \to \mathcal{A}]$

for any formulas $\mathcal{A}, \mathcal{B}$. A formula $\mathcal{A}$ is a syntactic theorem if and only if it satisfies $[\mathcal{A}] = 1$.
If we assume the axioms of intuitionist logic, then $\mathbb{L}$ is a Heyting algebra, with

$0 = [\mathcal{A} \cap (\neg\mathcal{A})]$, $\quad \complement[\mathcal{A}] = [\neg\mathcal{A}]$

for any formula $\mathcal{A}$. The Heyting algebra is degenerate (i.e., satisfying $0 = 1$) if and only if our set of axioms is syntactically inconsistent (i.e., there is some formula $\mathcal{A}$ such that $\mathcal{A} \cap (\neg\mathcal{A})$ is a syntactic theorem), in which case every formula is provable.
If we assume the axioms of classical logic, then $\mathbb{L}$ is a Boolean algebra. Its greatest member 1 is also equal to $[\mathcal{A} \cup (\neg\mathcal{A})]$ for any formula $\mathcal{A}$. The Boolean algebra $\mathbb{L}$ is more than just $\{0, 1\}$ if and only if at least one formula $\mathcal{F}$ is neither provable nor disprovable from the axioms.
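Proof of theorem. (This argument follows the exposition of Rasiowa and Sikorski [1963].) We first show that any two-element subset of $(\mathbb{L}, \preccurlyeq)$ has a supremum. Let any two elements of $\mathbb{L}$ be given; then those two elements can be represented as $[\mathcal{A}]$ and $[\mathcal{B}]$ for some formulas $\mathcal{A}$ and $\mathcal{B}$ (which are not uniquely determined by the given elements of $\mathbb{L}$). From Axioms (ii) and (iii) we see that $[\mathcal{A} \cup \mathcal{B}]$ is an upper bound for the set $\{[\mathcal{A}], [\mathcal{B}]\}$ in the poset $(\mathbb{L}, \preccurlyeq)$. Is it least among the upper bounds? Let any other upper bound for $\{[\mathcal{A}], [\mathcal{B}]\}$ be given; say that upper bound can be represented by $[\mathcal{C}]$ for some formula $\mathcal{C}$. Then $\mathcal{A} \to \mathcal{C}$ and $\mathcal{B} \to \mathcal{C}$ are syntactic theorems, by our definition of $\preccurlyeq$. By Axiom (iv) via modus ponens, it follows that $(\mathcal{A} \cup \mathcal{B}) \to \mathcal{C}$ is also a syntactic theorem; thus $[\mathcal{A} \cup \mathcal{B}] \preccurlyeq [\mathcal{C}]$. Thus $[\mathcal{A} \cup \mathcal{B}]$ is indeed least among the upper bounds of $\{[\mathcal{A}], [\mathcal{B}]\}$. An analogous argument works for lower bounds, using Axioms (v), (vi), and (vii). Thus $\mathbb{L}$ is a lattice, with $[\mathcal{A}] \vee [\mathcal{B}] = [\mathcal{A} \cup \mathcal{B}]$ and $[\mathcal{A}] \wedge [\mathcal{B}] = [\mathcal{A} \cap \mathcal{B}]$.
Next we shall show that $([\mathcal{A}] \Rightarrow [\mathcal{B}]) = [\mathcal{A} \to \mathcal{B}]$ defines a relative pseudocomplementation operator. Let any formulas $\mathcal{A}, \mathcal{B}$ be given; we are to show that $[\mathcal{A} \to \mathcal{B}]$ is the largest $\lambda$ in $\mathbb{L}$ that satisfies $\lambda \wedge [\mathcal{A}] \preccurlyeq [\mathcal{B}]$. By Theorem (v) of 14.29 we know that $[\mathcal{A} \to \mathcal{B}]$ is one of the $\lambda$'s with that property. Is it the largest? Let any formula $\mathcal{D}$ satisfying $[\mathcal{D}] \wedge [\mathcal{A}] \preccurlyeq [\mathcal{B}]$ be given. Then $(\mathcal{D} \cap \mathcal{A}) \to \mathcal{B}$ is a syntactic theorem. The formula
$((\mathcal{D} \cap \mathcal{A}) \to \mathcal{B}) \to (\mathcal{D} \to (\mathcal{A} \to \mathcal{B}))$
is also a syntactic theorem, as it is an instance of Axiom (ix). From those two syntactic theorems via modus ponens we deduce the syntactic theorem $\mathcal{D} \to (\mathcal{A} \to \mathcal{B})$. In other words, $\mathcal{D} \preccurlyeq (\mathcal{A} \to \mathcal{B})$, so $[\mathcal{A} \to \mathcal{B}]$ is indeed largest.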
That proves our claim about relative pseudocomplements. As in any relatively pseudocomplemented lattice, we now know that the largest element of $\mathbb{L}$ is $1 = [\mathcal{A} \to \mathcal{A}]$, for any formula $\mathcal{A}$. By 14.30(iii) we know that $\mathcal{A}$ is a syntactic theorem if and only if $1 \preccurlyeq [\mathcal{A}]$; that is, if and only if $1 = [\mathcal{A}]$.
Now suppose our axioms include the axioms of intuitionist logic. By Axiom (x) we see that $[\mathcal{A} \cap (\neg\mathcal{A})] \preccurlyeq [\mathcal{B}]$ for any formulas $\mathcal{A}, \mathcal{B}$. Thus $\mathbb{L}$ has a smallest element, given by the rule $0 = [\mathcal{A} \cap (\neg\mathcal{A})]$ for any formula $\mathcal{A}$. Hence $\mathbb{L}$ is a Heyting algebra. The conclusion about inconsistency and degeneracy is now obvious.
We have $0 = [\mathcal{A}] \wedge [\neg\mathcal{A}]$. By 13.25.a it follows that $[\neg\mathcal{A}] \preccurlyeq ([\mathcal{A}] \Rightarrow 0)$. On the other hand, we also have $([\mathcal{A}] \Rightarrow 0) = [\mathcal{A} \to (\mathcal{A} \cap (\neg\mathcal{A}))] \preccurlyeq [\neg\mathcal{A}]$ by Axiom (xi). Thus the pseudocomplement $([\mathcal{A}] \Rightarrow 0)$ is equal to $[\neg\mathcal{A}]$.
If we also assume Axiom (xii), then $[\mathcal{A}] \vee (\complement[\mathcal{A}]) = [\mathcal{A} \cup (\neg\mathcal{A})] = 1$, so $\mathbb{L}$ is a Boolean algebra.
14.34. Further consequences in intuitionistic logic. In any intuitionist logic, all the formulas given by the following schemes are syntactic theorems. This follows from the fact that the formulas correspond to identities that are satisfied in any Heyting algebra.
a. Interchange of Hypotheses. $(\mathcal{A} \to (\mathcal{B} \to \mathcal{C})) \to (\mathcal{B} \to (\mathcal{A} \to \mathcal{C}))$.
b. Contrapositive Law. $(\mathcal{A} \to (\neg\mathcal{B})) \to (\mathcal{B} \to (\neg\mathcal{A}))$.
c. Double Negation Law. $\mathcal{A} \to (\neg\neg\mathcal{A})$.
d. $(\mathcal{A} \to (\neg\mathcal{A})) \to (\neg\mathcal{A})$.
e. $(\mathcal{A} \to \mathcal{B}) \to ((\neg\mathcal{B}) \to (\neg\mathcal{A}))$.
f. Brouwer's Triple Negation Law. $(\neg\neg\neg\mathcal{A}) \to (\neg\mathcal{A})$ and $(\neg\mathcal{A}) \to (\neg\neg\neg\mathcal{A})$.
g. $((\neg\mathcal{A}) \cap (\neg\mathcal{B})) \to (\neg(\mathcal{A} \cup \mathcal{B}))$ and $(\neg(\mathcal{A} \cup \mathcal{B})) \to ((\neg\mathcal{A}) \cap (\neg\mathcal{B}))$.
h. $((\neg\mathcal{A}) \cup (\neg\mathcal{B})) \to (\neg(\mathcal{A} \cap \mathcal{B}))$.
i. $((\neg\mathcal{A}) \cup \mathcal{B}) \to (\mathcal{A} \to \mathcal{B})$.
In classical logic we have those formulas, plus the ones listed in the section below.
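Several of the schemes in 14.34 can also be checked in a proof assistant whose core logic is constructive. The following Lean 4 sketch is offered only as an illustration: it relies on the assumption that Lean's `Prop`, used without the `Classical` axioms, behaves like the intuitionist logic of this chapter, and it writes $\wedge$, $\vee$ for the connectives $\cap$, $\cup$.

```lean
-- a. Interchange of Hypotheses
example (A B C : Prop) : (A → (B → C)) → (B → (A → C)) :=
  fun h b a => h a b

-- b. Contrapositive Law
example (A B : Prop) : (A → ¬B) → (B → ¬A) :=
  fun h b a => h a b

-- c. Double Negation Law
example (A : Prop) : A → ¬¬A :=
  fun a na => na a

-- e. (A → B) → (¬B → ¬A)
example (A B : Prop) : (A → B) → (¬B → ¬A) :=
  fun h nb a => nb (h a)

-- f. Brouwer's Triple Negation Law (first half)
example (A : Prop) : ¬¬¬A → ¬A :=
  fun h a => h (fun na => na a)

-- g. (first half) (¬A ∧ ¬B) → ¬(A ∨ B)
example (A B : Prop) : (¬A ∧ ¬B) → ¬(A ∨ B) :=
  fun h hab => hab.elim h.1 h.2
```

None of these terms invokes excluded middle or proof by contradiction; each is a direct construction, in the spirit of the problem-solving reading of intuitionist logic.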
14.35. Some nonconstructive techniques of reasoning. In the setting of intuitionist logic, the following formula schemes are undecidable, in the sense that they can neither be proved nor disproved syntactically, using just the intuitionist logical axioms. Moreover, they are equivalent to each other, in the sense that any one of them can be deduced from any of the others. They are all derivable in classical logic; adding any one of them to intuitionist logic yields classical logic. Some of the formulas below could be viewed as symbolic representations of the principle of proof by contradiction.
(A) Law of the Excluded Middle: $\mathcal{A} \cup (\neg\mathcal{A})$
(B) Converse of the Double Negation Law: $(\neg\neg\mathcal{A}) \to \mathcal{A}$
(C) $(\mathcal{A} \to \mathcal{B}) \to ((\neg\mathcal{A}) \cup \mathcal{B})$
(D) $((\neg\mathcal{A}) \to (\neg\mathcal{B})) \to (\mathcal{B} \to \mathcal{A})$
(E) $((\neg\mathcal{A}) \to \mathcal{B}) \to ((\neg\mathcal{B}) \to \mathcal{A})$
for all formulas $\mathcal{A}, \mathcal{B}$.
Proof. The equivalence of these conditions is just a restatement of the result of 13.29. To say that these conditions cannot be disproved in intuitionist logic is just to say that the axioms of classical logic are syntactically consistent; that will be established in the next chapter.
To show that these conditions cannot be proved in intuitionist logic, let $(H, 0, 1, \vee, \wedge, \Rightarrow, \complement)$ be some particular Heyting algebra that is not a Boolean algebra. (An example of such is mentioned in 13.28.a.) Say the members of $H$ are $0, 1, a, b, c$, etc. Form a propositional calculus that has one primitive propositional symbol for each member of $H$; say the primitive propositional symbols are denoted $P_0, P_1, P_a, P_b, P_c$, etc. Now interpret each formula in the propositional logic as the corresponding member of the Heyting algebra. For example, one instance of the Duns Scotus Law is the formula $(P_c \cap (\neg P_c)) \to P_a$; interpret this as the member of the Heyting algebra represented by $(c \wedge (\complement c)) \Rightarrow a$. That expression simplifies to 1, in any Heyting algebra. It is now a tedious but straightforward matter to verify that (i) each of the eleven logical axiom schemes of intuitionist logic is represented by 1 in the Heyting algebra; and (ii) if $\mathcal{F}$ and $\mathcal{F} \to \mathcal{G}$ are formulas represented by 1 in the Heyting algebra, then $\mathcal{G}$ is also represented by 1 in the Heyting algebra - i.e., modus ponens preserves "truth;" but (iii) the Law of the Excluded Middle is not represented by 1 in this particular Heyting algebra. This completes the proof.
Remarks. In effect, we have used $H$ as a "quasimodel" for our propositional calculus. Models and quasimodels will be explored in greater detail later in this chapter. However, for brevity we shall only consider classical logics, and so all our formal models and quasimodels will be Boolean-valued. Thus the argument given in the preceding paragraph does not quite fit the formal framework developed later in this chapter.
14.36. Discussion of intuitionist logic. The axiom system of classical logic is slightly stronger than that of intuitionist logic. Hence, the set of syntactic theorems in classical logic is slightly larger than that of intuitionist logic.
The connective $\cup$ has rather different meanings in classical logic and intuitionist logic. In classical logic, if $P$ is a primitive proposition symbol about which nothing in particular is assumed, then neither $P$ nor $\neg P$ is a theorem; nevertheless $P \cup (\neg P)$ is a theorem - this is just the Law of the Excluded Middle. In contrast, in intuitionist propositional logic, if $\mathcal{A}$ and $\mathcal{B}$ are some formulas such that $\mathcal{A} \cup \mathcal{B}$ is a syntactic theorem, then at least one of $\mathcal{A}$ or $\mathcal{B}$ is a syntactic theorem. (The proof is too difficult to give here; a topological proof is given by Rasiowa and Sikorski [1963, page 394]. An analogous result for predicate logic can be found on page 430 of that book.) This result may surprise many readers, because it is so different from what we are familiar with in classical logic. It may also puzzle some readers, because it seems to give a stronger conclusion in intuitionist logic than in classical logic, even though intuitionist logic is the weaker logic. But read carefully! Since intuitionist logic has fewer axioms and fewer syntactic theorems than classical logic, the hypothesis that "$\mathcal{A} \cup \mathcal{B}$ is a syntactic theorem of intuitionist logic" is stronger than the hypothesis that "$\mathcal{A} \cup \mathcal{B}$ is a syntactic theorem of classical logic."
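The quasimodel argument in the proof of 14.35 is described there as tedious but straightforward, so it is a natural candidate for a brute-force check. The sketch below uses the three-element chain $0 < 1 < 2$ as the Heyting algebra; this particular choice is an assumption made for the illustration (it is a standard small example, not necessarily the one cited in 13.28.a), and the constant `TOP = 2` plays the role of the element called 1 in the text.

```python
from itertools import product

H = (0, 1, 2)        # the chain 0 < 1 < 2 (an assumed example of a Heyting algebra)
TOP, BOTTOM = 2, 0   # TOP plays the role of the element called 1 in the text


def meet(a, b):
    return min(a, b)


def join(a, b):
    return max(a, b)


def imp(a, b):
    """Relative pseudocomplement in a chain: the largest c with min(c, a) <= b."""
    return TOP if a <= b else b


def neg(a):
    """Pseudocomplement: a => 0."""
    return imp(a, BOTTOM)


# The eleven intuitionist axiom schemes of 14.25, read as operations on H.
INTUITIONIST_AXIOMS = {
    "i":    lambda A, B, C: imp(imp(A, B), imp(imp(B, C), imp(A, C))),
    "ii":   lambda A, B, C: imp(A, join(A, B)),
    "iii":  lambda A, B, C: imp(B, join(A, B)),
    "iv":   lambda A, B, C: imp(imp(A, C), imp(imp(B, C), imp(join(A, B), C))),
    "v":    lambda A, B, C: imp(meet(A, B), A),
    "vi":   lambda A, B, C: imp(meet(A, B), B),
    "vii":  lambda A, B, C: imp(imp(C, A), imp(imp(C, B), imp(C, meet(A, B)))),
    "viii": lambda A, B, C: imp(imp(A, imp(B, C)), imp(meet(A, B), C)),
    "ix":   lambda A, B, C: imp(imp(meet(A, B), C), imp(A, imp(B, C))),
    "x":    lambda A, B, C: imp(meet(A, neg(A)), B),
    "xi":   lambda A, B, C: imp(imp(A, meet(A, neg(A))), neg(A)),
}


def valid(scheme) -> bool:
    """True if the scheme takes the value TOP for every choice of A, B, C in H."""
    return all(scheme(a, b, c) == TOP for a, b, c in product(H, repeat=3))


def excluded_middle(A, B, C):
    return join(A, neg(A))


# (ii) of the proof: modus ponens preserves the value TOP.
mp_preserves_top = all(
    b == TOP for a, b in product(H, repeat=2) if a == TOP and imp(a, b) == TOP
)

print(all(valid(s) for s in INTUITIONIST_AXIOMS.values()))
print(mp_preserves_top)
print(valid(excluded_middle), excluded_middle(1, 0, 0))
```

The middle element is where the Law of the Excluded Middle fails: its pseudocomplement is the bottom element, so the join of the two is the middle element itself rather than the top.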
Heyting developed his algebraic approach to intuitionist logic in a paper in 1930. In 1932, Kolmogorov published some related results, including this intuitive (i.e., real-world) interpretation of Heyting's formalism: Let us use letters such as $\mathcal{A}, \mathcal{B}, \mathcal{C}$, etc., to denote problems that are to be solved. Interpret connectives as follows:
$\mathcal{A} \cap \mathcal{B}$ means the problem "to solve both $\mathcal{A}$ and $\mathcal{B}$,"
$\mathcal{A} \cup \mathcal{B}$ means the problem "to solve at least one of $\mathcal{A}$ or $\mathcal{B}$,"
$\mathcal{A} \to \mathcal{B}$ means the problem "to show how any solution of $\mathcal{A}$ would yield a solution of $\mathcal{B}$,"
$\neg\mathcal{A}$ means the problem "to show how any solution of $\mathcal{A}$ would yield a contradiction."
Then the properties of Kolmogorov's system of problem-solving coincide with the properties of Heyting's formal intuitionist propositional calculus. (For further references and discussion, see Kneebone [1963].)
The Law of the Excluded Middle (which we introduced in 6.4) is not taken as an axiom in the intuitionist system of Heyting or Kolmogorov - i.e., although for some particular problems $\mathcal{A}$ we may be able to solve at least one of $\mathcal{A}$ or $\neg\mathcal{A}$, we do not have a general method for doing that. The systems of Heyting and Kolmogorov reflect a somewhat constructive viewpoint, but we shall not try to make that description precise, for there are many different schools of constructivism.
SOME SYNTACTIC RESULTS (PREDICATE LOGIC)
14.37. Remark. Our preceding syntactic results did not make any direct use of variables; they would apply equally well to propositional logic or predicate logic. We now turn to syntactic results that do involve variables. Most of these results are only relevant to predicate logic. A few of them are also relevant to propositional logic, but take a simplified form in that case; see 14.40.b for instance.
14.38. We begin by considering the relation between these two kinds of implications:
(i) $\Sigma \vdash (\mathcal{F} \to \mathcal{G})$, $\qquad$ (ii) $\Sigma \cup \{\mathcal{F}\} \vdash \mathcal{G}$.
Here $\mathcal{F}$ and $\mathcal{G}$ are any formulas, and $\Sigma$ is any set of formulas. It is easy to see that (i) $\Rightarrow$ (ii) - i.e., that whenever $\mathcal{F}$ and $\mathcal{G}$ are some particular formulas that satisfy (i), then they also satisfy (ii). (Proof. Assume $\Sigma \vdash (\mathcal{F} \to \mathcal{G})$, and assume we are given the set of formulas $\Sigma \cup \{\mathcal{F}\}$. Since we are given $\Sigma$, we may deduce $\mathcal{F} \to \mathcal{G}$. Since we are also given $\mathcal{F}$, by modus ponens we may deduce $\mathcal{G}$.) Under certain additional assumptions we can show that (ii) $\Rightarrow$ (i), and therefore (i), (ii) are equivalent; that is the subject of 14.39 and 14.40. However, in general (i.e., without additional assumptions), (ii) does not imply (i); that is shown by an example in 14.60.
14.39. The Deduction Principle. Let $\mathcal{F}$ and $\mathcal{G}$ be formulas, and let $\Sigma$ be a set of formulas. Suppose that $\Sigma \cup \{\mathcal{F}\} \vdash \mathcal{G}$; that is, there exists a derivation of $\mathcal{G}$ from $\Sigma \cup \{\mathcal{F}\}$. Suppose, moreover, that the derivation can be chosen so that whenever any of the inference rules (R2), (R3), (R4) is used, then the free variables $x, x_1, x_2, x_3, \ldots$ being replaced are symbols that do not appear in $\mathcal{F}$. Then $\Sigma \vdash (\mathcal{F} \to \mathcal{G})$.
Proof. We view $\Sigma$ as a collection of extra-logical axioms. Let $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n$ be the given derivation. Thus, $\mathcal{E}_n = \mathcal{G}$, and each $\mathcal{E}_j$ is either a logical or extra-logical axiom, or $\mathcal{F}$, or a consequence of previous $\mathcal{E}_j$'s by the rules of inference. It suffices to prove, by induction on $k = 1, 2, \ldots, n$, that $\Sigma \vdash (\mathcal{F} \to \mathcal{E}_k)$. We prove this by considering cases according to the method by which $\mathcal{E}_k$ enters the given derivation.
If $\mathcal{E}_k$ is an axiom, then from $\mathcal{E}_k \to (\mathcal{F} \to \mathcal{E}_k)$ (in 14.29(ii)) and $\mathcal{E}_k$ we may deduce $\mathcal{F} \to \mathcal{E}_k$. If $\mathcal{E}_k$ is equal to $\mathcal{F}$, then $\mathcal{F} \to \mathcal{E}_k$ is the formula $\mathcal{F} \to \mathcal{F}$, which was proved in 14.29(i). Thus, in these cases we have $\Sigma \vdash (\mathcal{F} \to \mathcal{E}_k)$, without even referring to the induction hypothesis.
Next, consider the case in which $\mathcal{E}_k$ follows from previous formulas $\mathcal{E}_i$ and $\mathcal{E}_j$ via modus ponens. Then (with $i$ and $j$ switched if necessary) we may assume $\mathcal{E}_i$ is the formula $\mathcal{E}_j \to \mathcal{E}_k$. By our induction hypothesis we have $\Sigma \vdash (\mathcal{F} \to (\mathcal{E}_j \to \mathcal{E}_k))$ and $\Sigma \vdash (\mathcal{F} \to \mathcal{E}_j)$. By 14.30(iv) it follows that $\Sigma \vdash (\mathcal{F} \to \mathcal{E}_k)$.
Next, consider the case in which $\mathcal{E}_k$ follows from some previous formula $\mathcal{E}_j$ by the Rule of Substitution (R2) - i.e., by replacing some or all of the free variables with specified terms. Since none of those free variables appear in $\mathcal{F}$, the same substitution leaves $\mathcal{F}$ unaffected. Thus $\mathcal{F} \to \mathcal{E}_k$ follows from $\mathcal{F} \to \mathcal{E}_j$ by the Rule of Substitution.
Next, consider the cases in which $\mathcal{E}_k$ follows from some previous formula $\mathcal{E}_j$ by one of the remaining inference rules (R3), (R4), (R5), (R6). In these cases, $\mathcal{E}_j$ is a formula $\mathcal{C} \to \mathcal{D}$ of a certain type, and from it we can deduce $\mathcal{E}_k$, a formula $\mathcal{C}' \to \mathcal{D}'$ of a certain type. It suffices to show, by different reasonings in these four cases, that
(a) from $\mathcal{F} \to (\mathcal{C} \to \mathcal{D})$ we can deduce $\mathcal{F} \to (\mathcal{C}' \to \mathcal{D}')$.
For some of these cases it is helpful to use 14.34.a; thus (a) is established if we can just show that
(b) from $\mathcal{C} \to (\mathcal{F} \to \mathcal{D})$, we can deduce $\mathcal{C}' \to (\mathcal{F} \to \mathcal{D}')$.
For other cases, it is helpful to use Axioms (viii) and (ix) in 14.25; thus (a) is established if we can just show that
(c) from $(\mathcal{C} \cap \mathcal{F}) \to \mathcal{D}$ we can deduce $(\mathcal{C}' \cap \mathcal{F}) \to \mathcal{D}'$.
Applications of (R3) are of this form: $\mathcal{A}(x)$ contains no occurrence of $\xi$, $\mathcal{B}$ contains no occurrence of $x$, and from $\mathcal{A}(x) \to \mathcal{B}$ we infer $(\exists\xi\, \mathcal{A}(\xi)) \to \mathcal{B}$. By assumption, $x$ does not occur in $\mathcal{F}$, hence it does not occur in $(\mathcal{F} \to \mathcal{B})$, and so from $\mathcal{A}(x) \to (\mathcal{F} \to \mathcal{B})$ we infer $(\exists\xi\, \mathcal{A}(\xi)) \to (\mathcal{F} \to \mathcal{B})$, by the same rule of inference. That is just (b).
Applications of (R4) are of this form: $\mathcal{A}(x)$ contains no occurrence of $\xi$, $\mathcal{B}$ contains no occurrences of $x$, and from $\mathcal{B} \to \mathcal{A}(x)$ we infer $\mathcal{B} \to (\forall\xi\, \mathcal{A}(\xi))$. By assumption, $\mathcal{F}$ contains
$|\neg\mathcal{A}| = \complement|\mathcal{A}|, \qquad |\mathcal{A} \cup \mathcal{B}| = |\mathcal{A}| \vee |\mathcal{B}|, \qquad |\mathcal{A} \cap \mathcal{B}| = |\mathcal{A}| \wedge |\mathcal{B}|, \qquad |\mathcal{A} \to \mathcal{B}| = |\mathcal{A}| \Rightarrow |\mathcal{B}|.$

In each of these equations, the connective on the left side is the formal, logical symbol; the connective on the right side is a unary or binary operation in the Boolean algebra $B$. In general, a formula with $n$ distinct free variables is interpreted as a mapping from $D^n$ into $B$.

j. Quasi-interpretation of quantifiers. If the language has infinitely many free variable symbols, then quantifiers are interpreted as suprema and infima in the Boolean algebra $B$, as follows: For simplicity we explain quantifiers first in the case of a formula involving only one free variable. Suppose $x$ is the only free variable occurring in $\mathcal{A}(x)$, and $\xi$ does not occur in $\mathcal{A}(x)$; we follow the substitution notation of 14.21. Then $x \mapsto |\mathcal{A}(x)|$ is a function from $D$ into $B$, which takes some truth value whenever $x$ is replaced by some $d \in D$ (whether that $d$ has a name or not). Then we define the quasi-interpretations

$|\exists\xi\, \mathcal{A}(\xi)| = \sup_{d \in D} |\mathcal{A}(d)|, \qquad |\forall\xi\, \mathcal{A}(\xi)| = \inf_{d \in D} |\mathcal{A}(d)|.$
The sup and inf are with respect to the ordering of the Boolean lattice $B$. More generally, assume $\xi$ is a bound variable, $x_1, x_2, \ldots, x_n$ are distinct free variables, $\mathcal{A}(x_1, x_2, \ldots, x_n)$ is a formula whose only free variables are $x_1, x_2, \ldots, x_n$, and $\xi$ does not appear in $\mathcal{A}(x_1, x_2, \ldots, x_n)$. Then $(x_1, x_2, \ldots, x_n) \mapsto |\mathcal{A}(x_1, x_2, \ldots, x_n)|$ is a function from $D^n$ into $B$, taking a truth value whenever $x_1, x_2, \ldots, x_n$ are replaced by some $d_1, d_2, \ldots, d_n \in D$ (whether those $d_j$'s have names or not). Now we define the quasi-interpretations

$|\exists\xi\, \mathcal{A}(\xi, x_2, x_3, \ldots, x_n)| = \sup_{d \in D} |\mathcal{A}(d, x_2, x_3, \ldots, x_n)|, \qquad |\forall\xi\, \mathcal{A}(\xi, x_2, x_3, \ldots, x_n)| = \inf_{d \in D} |\mathcal{A}(d, x_2, x_3, \ldots, x_n)|;$

these are functions from $D^{n-1}$ into $B$. To make sense of this definition, we require that

(q) all sups and infs of the types indicated above must exist in $B$.

One simple way to satisfy (q) is by insisting that $B$ be a complete Boolean algebra - i.e., by requiring that every subset of $B$ have a sup and an inf. However, that simple requirement may be too strong in some cases - see 14.56 - so we merely keep it in mind for motivation; (q) is the one condition we shall actually impose as part of our definition of "quasi-interpretation."

Caution: When $B$ is the two-element Boolean algebra $\{0, 1\}$, then each of those sups or infs indicated above is actually a maximum or a minimum, and so $\forall$ and $\exists$ actually do have the meanings "for each . . . in $D$ . . ." and "there exists . . . in $D$ such that . . ." However, when $B$ is a larger Boolean algebra, then the sups and infs need not be maxima or minima. It is quite possible that $\sup_{d \in D} |\mathcal{A}(d, x_2, x_3, \ldots, x_n)|$ is equal to 1, and yet no one of the elements $d \in D$ actually satisfies $|\mathcal{A}(d, x_2, x_3, \ldots, x_n)| = 1$. (Thus, in the terminology of 14.46 and 14.48, an existential formula may be valid and yet not have a witness.) The possible lack of a suitable $d$ is the cause of some of the complications in the pages that follow - e.g., this is why 14.46 is needed as a step in the proof of the Completeness Principle.
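The caution about suprema that are not attained can be checked on a very small case. The following is a minimal sketch, assuming a toy Boolean algebra (the power set of a two-point set, ordered by inclusion) and a two-element domain; the names `ATOMS`, `truth`, `sup`, and `inf` are illustrative choices and do not come from the text.

```python
# Toy Boolean algebra (an assumption for illustration): the power set of {0, 1}.
ATOMS = frozenset({0, 1})
TOP = ATOMS            # plays the role of the element 1
BOTTOM = frozenset()   # plays the role of the element 0


def sup(values):
    """Least upper bound in a power-set algebra: the union."""
    result = BOTTOM
    for value in values:
        result = result | value
    return result


def inf(values):
    """Greatest lower bound in a power-set algebra: the intersection."""
    result = TOP
    for value in values:
        result = result & value
    return result


# A domain with two individuals, and hypothetical truth values |A(d)|.
D = ["d0", "d1"]
truth = {"d0": frozenset({0}), "d1": frozenset({1})}

exists_value = sup(truth[d] for d in D)   # value given to the existential formula
forall_value = inf(truth[d] for d in D)   # value given to the universal formula
witnesses = [d for d in D if truth[d] == TOP]
```

Here the existential formula receives the top value even though no individual in the domain receives it, so the list of witnesses is empty; in the two-element algebra this cannot happen, since there a supremum equal to 1 is a maximum.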
The Semantic View
14.48. More terminology. Let $(B, D, |\ |)$ be a quasi-interpretation of some language $\mathcal{L}$, and let $\mathcal{F}$ be some formula in that language with $n$ free variables. Then $|\mathcal{F}|$ is a function from $D^n$ into $B$. (It is a constant function (i.e., a member of $B$) if $n = 0$; it may or may not be a constant function otherwise.) We say that the formula $\mathcal{F}$ is valid in the quasi-interpretation if that function $|\mathcal{F}|$ is a constant function that takes only the value 1.

Observation. From the definition given in 14.47.j for the quasi-interpretation of quantifiers and the definition of closures given in 14.44, we see immediately that in any quasi-interpretation, the closure of any valid formula is valid.

Example. Consider quasi-interpretations with domain $D$ equal to the set $\mathbb{Z}$ of integers. Let $\mathcal{F}(x)$ be a formula that states (in some appropriate symbolism) that "$x$ is even;" then $\neg(\mathcal{F}(x))$ states that "$x$ is odd." When the language of arithmetic is given its usual interpretation, then the formula $(\mathcal{F}(x)) \cup (\neg(\mathcal{F}(x)))$ is valid, since every integer is either even or odd; this is an instance of the Law of the Excluded Middle. However, neither of the formulas $\mathcal{F}(x)$ or $\neg(\mathcal{F}(x))$ is valid, since neither of the statements "every integer is even" or "every integer is odd" is correct.

14.49. Definition. Let $V$ be the set of all free variable symbols in a language $\mathcal{L}$, and let $(B, D, |\ |)$ be a quasi-interpretation of $\mathcal{L}$. By a valuation (or assignment) on $(B, D, |\ |)$ we shall mean a map $\psi : V \to D$; thus it is just a member of $D^V$. We shall denote by $|\ |_\psi$ the effect of combining a quasi-interpretation $|\ |$ and a valuation $\psi$. Expressions are interpreted with values in $B$, as in 14.47-14.47.j, but in addition each free variable $v$ is replaced by its valuation $\psi(v) \in D$. Thus, all the functions get evaluated. For any formula $\mathcal{F}$ - even one involving free variables - the result $|\mathcal{F}|_\psi$ is a particular member of the Boolean algebra $B$, not just a function from $D^n$ into $B$. We emphasize that $|\mathcal{F}|_\psi$ is not necessarily 0 or 1; it may be some other member of $B$. Observe that a formula $\mathcal{F}$ is valid in $|\ |$, as defined in 14.48, if and only if it satisfies $|\mathcal{F}|_\psi = 1$ for every valuation $\psi$.

14.50. More terminology. Let $(B, D, |\ |)$ be a quasi-interpretation of the language, and let $\Sigma$ be a collection of formulas in that language. We say that $(B, D, |\ |)$ is a quasimodel of the theory $\Sigma$ if every formula in $\Sigma$ is valid in $(B, D, |\ |)$. A two-valued quasimodel of a theory $\Sigma$ (i.e., a quasimodel in which $B = \{0, 1\}$) will be called a model of $\Sigma$.

Alternate terminology. Instead of "quasi-interpretation" or "quasimodel," some mathematicians use the terms Boolean-valued interpretation or Boolean-valued model.
We prefer not to use those terms, for this reason: Following common nonmathematical English usage, those terms would appear to refer to notions that are less general than "interpretation" or "model." Our prefix of "quasi-" suggests greater generality, and is therefore more descriptive. Also, Rasiowa and Sikorski [1963] use the term "realization" for both quasi-interpretations and quasimodels, but perhaps that is less helpful to the beginner's intuition. A few mathematicians use the term "model" where we have used the term "valuation." This changes the nature of the theory, but not by very much if (as in some books) we do not distinguish between constants and variables.
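The relation between valuations and validity in 14.48 and 14.49 can be illustrated with the even/odd example. The sketch below is only an approximation under stated assumptions: the domain is a small finite range standing in for the integers (so it tests validity only on that range), the Boolean algebra is the two-element one, and the names `valuations`, `is_valid`, `is_even`, and `is_odd` are hypothetical.

```python
from itertools import product

# Assumption: a finite range stands in for the domain Z of the example.
D = range(-3, 4)
VARIABLES = ["x"]          # the free variable symbols in use


def valuations():
    """Enumerate every map psi from the variable symbols into D."""
    for values in product(D, repeat=len(VARIABLES)):
        yield dict(zip(VARIABLES, values))


def complement(a):
    return 1 - a           # complement in the two-element algebra {0, 1}


def join(a, b):
    return max(a, b)       # join in the two-element algebra {0, 1}


def is_even(psi):
    """Truth value of 'x is even' under the valuation psi."""
    return 1 if psi["x"] % 2 == 0 else 0


def is_odd(psi):
    """Truth value of the negation of 'x is even' under psi."""
    return complement(is_even(psi))


def excluded_middle(psi):
    """Truth value of 'x is even, or x is not even' under psi."""
    return join(is_even(psi), is_odd(psi))


def is_valid(formula):
    """A formula is valid when its value is 1 under every valuation."""
    return all(formula(psi) == 1 for psi in valuations())
```

With these definitions, `is_valid(excluded_middle)` holds while `is_valid(is_even)` and `is_valid(is_odd)` both fail, matching the observation that the disjunction is valid although neither disjunct is.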
14.51. Quasi-interpretations of propositional logic. The ingredients of a quasi-interpretation can be simplified substantially when we work with propositional logic - i.e., when there are no quantifiers, individual constants, individual variables, or relation symbols of arity greater than 0. In this special case we can take the domain $D$ to be empty. For a theory in propositional logic, a quasi-interpretation means an assignment of some truth value for each primitive proposition symbol $P$; we may denote that assignment by $|P|$. Thus, we only need to specify a mapping $|\ |$ as in 14.47.e, and only for $n = 0$. After that, the quasi-interpretation recursively assigns a truth value to compound propositions, as in 14.47.i. Note that if each primitive proposition is true or false - i.e., if $|P| \in \{0, 1\}$ for each primitive proposition symbol $P$ - then in fact $|\mathcal{F}| \in \{0, 1\}$ for each formula $\mathcal{F}$; this follows by induction on the lengths or depths of formulas. In this case, the resulting quasi-interpretation is in fact an interpretation.
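The recursive assignment of truth values to compound propositions can be sketched as a short evaluator. This is a minimal illustration, assuming the Boolean algebra is the power set of a three-point set, that formulas are encoded as nested tuples, and that the operation $\Rightarrow$ is computed as "complement of the left side, joined with the right side"; the encoding and the names `evaluate`, `axiom_ii`, and `B` are not from the text.

```python
from itertools import combinations

# Assumed Boolean algebra: all subsets of {0, 1, 2}, ordered by inclusion.
ATOMS = frozenset({0, 1, 2})
TOP, BOTTOM = ATOMS, frozenset()
B = [frozenset(c)
     for r in range(len(ATOMS) + 1)
     for c in combinations(sorted(ATOMS), r)]


def evaluate(formula, assignment):
    """Recursively compute the truth value of a propositional formula.

    A formula is either a primitive proposition symbol (a string) or a
    tuple ("not", f), ("or", f, g), ("and", f, g), ("implies", f, g).
    """
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == "not":
        return TOP - evaluate(formula[1], assignment)
    left = evaluate(formula[1], assignment)
    right = evaluate(formula[2], assignment)
    if op == "or":
        return left | right
    if op == "and":
        return left & right
    if op == "implies":
        return (TOP - left) | right
    raise ValueError(f"unknown connective: {op!r}")


# The formula P -> (P or Q) takes the value 1 under every assignment.
axiom_ii = ("implies", "P", ("or", "P", "Q"))
axiom_ii_always_one = all(
    evaluate(axiom_ii, {"P": p, "Q": q}) == TOP for p in B for q in B
)

# If every primitive value is 0 or 1, every compound value is 0 or 1 as well.
two_valued_stays_two_valued = all(
    evaluate(("and", "P", ("not", "Q")), {"P": p, "Q": q}) in (TOP, BOTTOM)
    for p in (TOP, BOTTOM) for q in (TOP, BOTTOM)
)
```

The second check illustrates, for one sample formula only, the closing remark of 14.51: starting from two-valued primitive propositions, the recursion never leaves the two-element subalgebra.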
14.52. Example. Peano arithmetic uses a constant symbol "$u$" (the "unit" or "urelement"), a unary function "$\sigma$" (the successor function), the binary relation "=" with the axioms for equality (listed in 14.27.a), plus these three further axioms:

(i) $\neg(\exists\xi\ (\sigma(\xi) = u))$. That is, $u$ is not the successor of any number.
(ii) $((\sigma(x) = \sigma(y)) \to (x = y))$. That is, $\sigma$ is injective.
(iii) The Induction Axiom. If $S$ is a subset of the domain that satisfies $u \in S$ and also satisfies $((x \in S) \to (\sigma(x) \in S))$, then $S = D$.
A few models of Peano's Axioms are given by:
• $D = \mathbb{N} = \{1, 2, 3, \ldots\}$, $u = 1$, $\sigma(x) = x + 1$;
• $D = \mathbb{N} \cup \{0\}$, $u = 0$, $\sigma(x) = x + 1$;
• $D = 2\mathbb{N} = \{2n : n \in \mathbb{N}\}$, $u = 2$, $\sigma(x) = x + 2$.
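The three models listed above can be compared on finite initial segments. The sketch below is illustrative only: a program can examine just finitely many elements, so it checks axioms (i) and (ii) on a segment rather than proving them, and the translation maps under the key `to_N` (each sending the unit to 1 and turning the successor into "add 1") are supplied here as assumptions.

```python
# Each model: a unit, a successor function, and a candidate map onto {1, 2, 3, ...}.
MODELS = {
    "N":  {"unit": 1, "succ": lambda x: x + 1, "to_N": lambda x: x},
    "N0": {"unit": 0, "succ": lambda x: x + 1, "to_N": lambda x: x + 1},
    "2N": {"unit": 2, "succ": lambda x: x + 2, "to_N": lambda x: x // 2},
}


def initial_segment(model, length):
    """Return the first `length` elements reached from the unit by successors."""
    elements, current = [], model["unit"]
    for _ in range(length):
        elements.append(current)
        current = model["succ"](current)
    return elements


def check(model, length=50):
    """Check the first two axioms and the translation map on an initial segment."""
    segment = initial_segment(model, length)
    unit, succ, to_n = model["unit"], model["succ"], model["to_N"]
    successors = [succ(x) for x in segment]
    return (
        unit not in successors                              # axiom (i) on the segment
        and len(set(successors)) == len(successors)         # axiom (ii) on the segment
        and to_n(unit) == 1                                 # the unit goes to 1
        and all(to_n(succ(x)) == 1 + to_n(x) for x in segment)
        and [to_n(x) for x in segment] == list(range(1, length + 1))
    )


results = {name: check(model) for name, model in MODELS.items()}
```

All three entries of `results` come out true, which is consistent with the three structures being copies of one another under the translation maps.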
It is not difficult to manufacture more models of Peano's Axioms. However, all these models are isomorphic - i.e., it can be shown that if $(D, u, \sigma)$ is any model of Peano's Axioms, then there is a unique bijection $b : D \to \mathbb{N}$ such that $b(u) = 1$ and $b(\sigma(x)) = 1 + b(x)$. Thus, Peano's Axioms determine $\mathbb{N}$ uniquely up to isomorphism.

Peano's first two axioms fit into a first-order language, but the last axiom requires a higher-order language, since it quantifies over sets $S \subseteq D$. In a first-order language, we have no precise representation of the notion of "a subset of $D$." For most purposes, we can replace axiom (iii) with a scheme of infinitely many first-order axioms: For each property $P(x)$ that can be expressed in first-order language, we have an axiom

(iii)$_P$ $\quad ((P(u) \cap (P(x) \to P(\sigma(x)))) \to (\forall\eta\ P(\eta)))$.
This axiom scheme is slightly weaker than Peano's Axiom (iii). One way to see that fact is to note that if we only have finitely or countably many symbols in our language, then there are only countably many properties $P$ that can be expressed in the language, but $\mathbb{N}$ has uncountably many subsets $S$ for us to consider. For another demonstration that the first-order axiom scheme (iii)$_P$ is weaker than (iii), see 14.63, where we shall show that no system of first-order properties of $\mathbb{N}$ can uniquely determine $\mathbb{N}$ up to isomorphism.

14.53. A brief introduction to forcing (optional). Cohen's method of forcing is a technique for creating models and quasimodels, particularly of set theory. Our presentation below is based on Bell [1985]. Let $B$ be a complete Boolean algebra. We shall describe classes $V^{(B)}$ and $V^{(\Gamma)}$, which can be used for the domains for quasimodels of set theory, taking truth values in $B$.

The Boolean-valued universe $V^{(B)}$ will be defined recursively, in a fashion somewhat analogous to the construction of the von Neumann universe $V$ in 5.53, but with this difference: When we ask whether $x \in y$ and whether $x = y$, the answers are not necessarily members of the Boolean algebra $2 = \{0, 1\}$ = {"no," "yes"}; rather, the answers may be members of the Boolean algebra $B$. More precisely, for each ordinal $\alpha$, let $V_\alpha^{(B)}$ be the set of all $B$-valued functions $x$ that have $\mathrm{Dom}(x) \subseteq V_\beta^{(B)}$ for some ordinal $\beta < \alpha$; then let $V^{(B)}$ be the union of all the $V_\alpha^{(B)}$'s.

Truth values in this quasi-interpretation are defined recursively too. The language of set theory expresses everything - ordered pairs, the integers, functions, etc. - in terms of set membership, so in our formal language we can dispense with function symbols and with most relation symbols; we only need the two relation symbols $\in$ and $=$. A term is a constant or a free variable; an atomic formula is an expression of the form $s \in t$ or $s = t$ where $s, t$ are terms.
Truth values of atomic formulas are defined thus:

$|u \in v| = \sup_{y \in \mathrm{Dom}(v)} \big( v(y) \wedge |u = y| \big),$

$|u = v| = \inf_{x \in \mathrm{Dom}(u)} \big( u(x) \Rightarrow |x \in v| \big) \ \wedge \ \inf_{y \in \mathrm{Dom}(v)} \big( v(y) \Rightarrow |y \in u| \big).$

Other formulas are built from atomic formulas and evaluated in a fashion similar to that in 14.47.i, 14.47.j. With these evaluations, $V^{(B)}$ is a quasimodel of conventional set theory, ZF + AC (if we assume ZF + AC in the outer system). Making different choices of the complete Boolean algebra $B$ yields different additional properties of $V^{(B)}$, and hence various semantic consistency results. For instance, with a suitable choice of $B$, $V^{(B)}$ does not satisfy the Continuum Hypothesis; therefore Con(ZF) $\Rightarrow$ Con(ZF + AC + $\neg$CH). However, all the quasi-interpretations constructed in the fashion above will satisfy ZF + AC. To get negations of AC, we need a more complicated construction, based on automorphisms of $B$.
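The mutual recursion between the truth values of $u \in v$ and $u = v$ can be tried out on hereditarily finite examples. The following is a minimal sketch under several assumptions: the Boolean algebra is the power set of a two-point set, a $B$-valued set is encoded as a frozenset of (member, value) pairs with each member listed once, membership is computed as a supremum and equality as a meet of two infima (the usual form of the definition), and $\Rightarrow$ is computed as "complement joined with the right side". The names `bset`, `member_value`, and `equal_value` are hypothetical.

```python
# Assumed Boolean algebra: the power set of {0, 1}, ordered by inclusion.
ATOMS = frozenset({0, 1})
TOP, BOTTOM = ATOMS, frozenset()


def implies(a, b):
    """The Boolean operation a => b, computed as (complement of a) join b."""
    return (TOP - a) | b


def bset(*pairs):
    """Encode a B-valued set as a frozenset of (member, value) pairs."""
    return frozenset(pairs)


def member_value(u, v):
    """Truth value of 'u is a member of v': a supremum over the domain of v."""
    result = BOTTOM
    for y, value_at_y in v:
        result = result | (value_at_y & equal_value(u, y))
    return result


def equal_value(u, v):
    """Truth value of 'u equals v': a meet of infima over both domains."""
    result = TOP
    for x, value_at_x in u:
        result = result & implies(value_at_x, member_value(x, v))
    for y, value_at_y in v:
        result = result & implies(value_at_y, member_value(y, u))
    return result


empty = bset()                              # the function with empty domain
one = bset((empty, TOP))                    # contains `empty` with value 1
half = bset((empty, frozenset({0})))        # contains `empty` with an intermediate value
```

The recursion terminates because each call descends to members of strictly lower rank. In this toy algebra the value of "`empty` is a member of `half`" is the intermediate element `frozenset({0})`, neither 0 nor 1, which is the kind of answer the two-element algebra cannot give.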
An automorphism of $B$ is a Boolean isomorphism $g : B \to B$ - i.e., a Boolean homomorphism that is also a permutation of $B$. The automorphisms of $B$ form a group, $\mathrm{Aut}(B)$, with group operation given by the composition of functions. An automorphism $g : B \to B$ can be extended naturally to a map $g^{(B)} : V^{(B)} \to V^{(B)}$ recursively by this rule: Whenever $u \in V^{(B)}$ with domain $\mathrm{Dom}(u)$, then $g^{(B)}u$ is the member of $V^{(B)}$ that has $\mathrm{Dom}(g^{(B)}u) = \{g^{(B)}x : x \in \mathrm{Dom}(u)\}$ and is defined on that domain by $(g^{(B)}u)(g^{(B)}x) = g(u(x))$. (That last $g$ is just the original mapping from $B$ into $B$.) It is not hard to verify that the map $g \mapsto g^{(B)}$ is a group homomorphism; that is, it preserves compositions: $(gh)^{(B)} = g^{(B)}h^{(B)}$.

Let $G$ be a subgroup of $\mathrm{Aut}(B)$. For each $x \in V^{(B)}$, define the stabilizer group $\mathrm{stab}_G(x) = \{g \in G : g(x) = x\}$; it is a subgroup of $G$. Now let $\Gamma$ be a collection of subgroups of $\mathrm{Aut}(B)$. We now recursively define the Boolean-valued universe $V^{(\Gamma)}$, a subclass of $V^{(B)}$, as follows:

For each ordinal $\alpha$, let $V_\alpha^{(\Gamma)}$ be the set of all $B$-valued functions $x$ that have $\mathrm{Dom}(x) \subseteq V_\beta^{(\Gamma)}$ for some ordinal $\beta < \alpha$ and satisfy $\mathrm{stab}_G(x) \in \Gamma$;

then let $V^{(\Gamma)}$ be the union of all the $V_\alpha^{(\Gamma)}$'s. Truth values can be defined on $V^{(\Gamma)}$ just as they were defined on $V^{(B)}$. Certain choices of $G$ and $\Gamma$ yield quasimodels of certain set theories. For instance, Bell [1985] shows a quasimodel of this sort in which a set is infinite but Dedekind finite (see 6.27); hence the axiom of Countable Choice is not satisfied. Thus Con(ZF) $\Rightarrow$ Con(ZF + $\neg$CC). The omitted details are long and numerous, and are not intended as an exercise. The interested reader should consult Bell [1985] and other books on forcing.

The main ideas of forcing can be reformulated in syntactic terms. Let $P$ be a suitable subset of $B \setminus \{0\}$. For $p \in P$ and formulas $\mathcal{A}$, let $p \Vdash \mathcal{A}$ be an abbreviation for $p \le |\mathcal{A}|$, where $|\ |$ is the truth-value mapping and $\le$ is the ordering of the Boolean algebra $B$; then $p$ is called a "forcing condition." The basic properties of the Boolean-valued universe $V^{(\Gamma)}$ can be reformulated as properties of the forcing relation $\Vdash$. In fact, it is possible to study $\Vdash$ without referring to Boolean-valued universes. This approach is more difficult for newcomers to logic and will not be explained here, but it seems to be preferred by logicians - they find it more intuitive than the Boolean-valued approach. This is the approach originally used by Cohen. The approach via Boolean-valued universes is a later reformulation, due largely to Scott and Solovay.

Historical note: The interested reader may search in vain for an important paper of Scott and Solovay, often referenced as "to appear." That work actually did not appear. It is subsumed by Bell [1985], as explained in Scott's foreword in that book.
SOUNDNESS, COMPLETENESS, AND COMPACTNESS

14.54. Observation. Any first-order language (as described in 14.15-14.23) has at least one interpretation (as described in 14.47-14.50).

Proof. Here is one trivial construction: Let $D = \{0\}$, where "0" is some object - i.e., let $D$ be a singleton. Interpret every constant symbol to have value 0; interpret all the relation symbols to only take the value "true."

14.55. Proposition. Every quasi-interpretation of the language $\mathcal{L}$ is also a quasimodel of the logic. That is, if $(B, D, |\ |)$ is a quasi-interpretation of a first-order language $\mathcal{L}$, then

(i) each of the twelve logical axioms listed in 14.25 is valid in $|\ |$; and

(ii) each of the six rules of inference listed in 14.26 is valid in the following sense: Whenever $\mathcal{E}$ and $\mathcal{F}$ are formulas that are valid in $|\ |$ and $\mathcal{G}$ is a formula that can be deduced from $\mathcal{E}$ and $\mathcal{F}$ using one of the rules of inference, then $\mathcal{G}$ is also valid in $|\ |$.

Using the two preceding results plus an induction argument, it follows that

(iii) If $\Sigma$ is any given set of extra-logical axioms and $\mathcal{A}$ is a formula that can be deduced from $\Sigma$ and the logical axioms via the rules of inference, then $\mathcal{A}$ is valid in every quasimodel of $\Sigma$.

Since every model is a quasimodel, as a corollary we obtain this slightly weaker result:

(iv) The Soundness Principle. If $\Sigma$ is any given set of extra-logical axioms and $\mathcal{A}$ is a formula that can be deduced from $\Sigma$ and the logical axioms via the rules of inference, then $\mathcal{A}$ is valid in every model of $\Sigma$. In other words, if $\Sigma \vdash \mathcal{A}$, then $\Sigma \vDash \mathcal{A}$. In other words, every syntactic theorem is a semantic theorem.

Remark. Some mathematicians use the term "theorem" only for syntactic theorems and call a formula "true" if it is valid in every model. With that terminology, the Soundness Principle takes this more memorable form: Every theorem is true.

Proof. We first consider the validity of the twelve logical axioms.
We shall demonstrate validity only for Axiom (ii); the other axioms can be verified in a similar fashion and are left as exercises. Let $|\ |$ be some quasi-interpretation of the language $\mathcal{L}$; we wish to prove that $|\mathcal{A} \to (\mathcal{A} \cup \mathcal{B})|_\psi = 1$ for every valuation $\psi$. By 14.47.i, that condition can be restated as $(|\mathcal{A}|_\psi \Rightarrow (|\mathcal{A}|_\psi \vee |\mathcal{B}|_\psi)) = 1$. But $(\alpha \Rightarrow (\alpha \vee \beta)) = 1$ is true for any elements $\alpha, \beta$ in any Boolean algebra $B$. This proves the validity of Axiom (ii).

Next, we consider the validity of the inference rules. We shall verify this only for (R3) and (R5); verification of the other inference rules is left as an exercise. We assume $\xi$ is a bound variable that does not occur in the formula $\mathcal{A}(x)$; we follow the substitution notation of 14.21. Let $\mathcal{B}$ be some formula; for (R3) we also assume that $x$ does not occur in $\mathcal{B}$. Let $|\ |$ be a given quasi-interpretation of the language. The conditions "$\mathcal{A}(x) \to \mathcal{B}$ is valid" and "$(\exists\xi\, \mathcal{A}(\xi)) \to \mathcal{B}$ is valid" can be restated, respectively, as

(1) $|\mathcal{A}(x)|_\varphi \le |\mathcal{B}|_\varphi$ for every valuation $\varphi$,

(2) $|\exists\xi\, \mathcal{A}(\xi)|_\psi \le |\mathcal{B}|_\psi$ for every valuation $\psi$.

If $\psi$ is any given valuation, for each $d \in D$ we may define an auxiliary valuation $\psi_d$ by $\psi_d(v) = \psi(v)$ when $v \ne x$, and $\psi_d(v) = d$ when $v = x$. From the definition in 14.47.j we see that $|\exists\xi\, \mathcal{A}(\xi)|_\psi = \sup_{d \in D} |\mathcal{A}(x)|_{\psi_d}$. Thus (2) can be restated

(2') $|\mathcal{A}(x)|_{\psi_d} \le |\mathcal{B}|_\psi$ for every $\psi$ and $d$.

To verify (R3), we need to show that (1) implies (2'). Since $x$ does not occur in the formula $\mathcal{B}$ we find that $|\mathcal{B}|_{\psi_d} = |\mathcal{B}|_\psi$ for every $d$; hence $|\mathcal{A}(x)|_{\psi_d} \le |\mathcal{B}|_{\psi_d} = |\mathcal{B}|_\psi$. To verify (R5), we need to show that (2') implies (1); just observe that when $\delta = \psi(x)$, then $\psi_\delta = \psi$.

14.56. Observation. Let $\Sigma$ be a syntactically consistent set of formulas. Then $\Sigma$ has a quasimodel. In fact, one can be specified as follows: For domain $D$ use the set of all terms in the language, with the interpretation mapping $|\ |$ defined on terms by the identity mapping. For the Boolean algebra of truth values use the Lindenbaum algebra $\mathbb{L}$, with the interpretation mapping $|\ |$ defined on formulas by the equivalence class mapping $[\ ]$ defined in 14.32.

Remark. We do not assert that the Lindenbaum algebra is necessarily complete. The fact that it satisfies condition 14.47.j (q) follows from 14.41.

14.57. We shall show that the following two principles (and two more covered in 14.59) are equivalent to the Ultrafilter Principle; we refer especially to other equivalents in 13.22.
(UF11) for Propositional Logic:
(UF12) for Predicate Logic:
Gödel-Mal'cev Completeness Principle (consistency version). If $\Sigma$ is any set of formulas, then these three conditions are equivalent:

(A) $\Sigma$ is syntactically consistent - i.e., $\Sigma$ cannot be used to deduce a contradiction.
(B) $\Sigma$ is semantically consistent - i.e., $\Sigma$ has at least one model.
(C) $\Sigma$ has at least one quasimodel.

As an intermediate step between (UF11) and (UF12), we shall also prove the equivalence of this more complicated principle:
(UF13) Let $\mathcal{L}$ be a language that has no variable symbols and no quantifiers (but may still have constants and functions). Let $\Sigma$ be a set of formulas in $\mathcal{L}$ that is syntactically consistent in $\mathcal{L}$. Then $\Sigma$ has at least one model $(\{0, 1\}, D, |\ |)$. Furthermore, the model can be chosen so that the mapping $|\ | : \{\text{terms of } \mathcal{L}\} \to D$ is surjective - i.e., so that for each individual $d \in D$ there is at least one term $t$ satisfying $|t| = d$.
Proof. The implication (C) $\Rightarrow$ (A) is proved as follows: Suppose $\Sigma$ is not syntactically consistent. Then there is some formula $\mathcal{A}$ such that both $\mathcal{A}$ and $\neg\mathcal{A}$ are syntactic theorems. If $|\ |$ is a quasimodel of $\Sigma$, then it makes both $\mathcal{A}$ and $\neg\mathcal{A}$ valid - that is, $|\mathcal{A}|$ and $|\neg\mathcal{A}|$ are both equal to the constant function 1. Then $1 = |\neg\mathcal{A}| = \complement|\mathcal{A}| = \complement 1 = 0$. Thus the Boolean algebra $B$ is degenerate, contrary to the requirement in 14.47.a. This shows (C) $\Rightarrow$ (A). The implication (B) $\Rightarrow$ (C) is trivial, since every model is a quasimodel. It only remains to prove (A) $\Rightarrow$ (B). (That implication by itself is sometimes known as the Completeness Principle.)

Proof of (UF8) $\Rightarrow$ (UF11). As we noted in 14.56, the Lindenbaum algebra $\mathbb{L}$ is a quasimodel of $\Sigma$. By (UF8), there exists a Boolean homomorphism from $\mathbb{L}$ into $\{0, 1\}$. Use that homomorphism to map the truth values in $\mathbb{L}$ to truth values in $\{0, 1\}$. The homomorphism preserves the action of $\vee, \wedge, \complement, \Rightarrow$. We need not concern ourselves with $\forall$ since we are considering only propositional logic, which has no individuals or quantifiers. Thus the resulting map into $\{0, 1\}$ is a model of $\Sigma$.

Proof of (UF11) $\Rightarrow$ (UF13). Let $\mathcal{C}$ be the given predicate calculus - i.e., the language $\mathcal{L}$ equipped with the logical axioms and rules of inference, the given extra-logical axioms $\Sigma$, and the resulting syntactic theorems. To construct a model, we shall first form a related language $\widetilde{\mathcal{L}}$ and propositional calculus $\widetilde{\mathcal{C}}$; a model for that propositional calculus will be used to form a model for $\mathcal{C}$.

Form a language $\widetilde{\mathcal{L}}$ by taking each atomic formula of $\mathcal{L}$ as a primitive propositional variable symbol of $\widetilde{\mathcal{L}}$. Thus, an expression such as $P(f(a, b), g(c))$ will be treated as a single symbol, grammatically on the same level as $Q$ or $R$. The terms $f(a, b)$ and $g(c)$ and the constants $a, b, c$ play no role in $\widetilde{\mathcal{L}}$, except as meaningless marks on paper that serve to make up parts of that single symbol. The language $\widetilde{\mathcal{L}}$ will have no individual variable symbols, individual constant symbols, or functions. It will have the same logical connective symbols $\neg, \cap, \cup, \to$ as the language $\mathcal{L}$.

Each formula in either of the languages $\mathcal{L}$ or $\widetilde{\mathcal{L}}$ can be reinterpreted as a formula in the other language by reading it in a different fashion. For instance, in the original language $\mathcal{L}$, the expression $P \cup Q(a, b, g(c))$ consists of seven symbols

$P \qquad \cup \qquad Q \qquad a \qquad b \qquad g \qquad c$

joined together with commas, parentheses, and juxtapositions; but in the new language $\widetilde{\mathcal{L}}$ the expression $P \cup Q(a, b, g(c))$ consists of just the three symbols

$P \qquad \cup \qquad Q(a, b, g(c))$

joined together with juxtaposition.
Form a new propositional calculus $\widetilde{\mathcal{C}}$ using the language $\widetilde{\mathcal{L}}$ and the same set $\Sigma$ of extra-logical axioms (but read in a different fashion, noted above). Since $\mathcal{L}$ has no variable symbols or quantifiers, rules of inference (R2) through (R6) are irrelevant; thus both $\mathcal{C}$ and $\widetilde{\mathcal{C}}$ have modus ponens as their only rule of inference. Therefore, proofs in the two systems are identical in appearance (though read differently). By assumption, $\mathcal{C}$ is syntactically consistent; therefore $\widetilde{\mathcal{C}}$ is, too. By (UF11), $\widetilde{\mathcal{C}}$ has a model, as described in 14.51. That is, there exists a mapping $\{\text{formulas of } \widetilde{\mathcal{L}}\} \to \{\text{"true," "false"}\}$ defined on the primitive proposition symbols and then defined recursively on other formulas, in such a way that all the axioms of $\widetilde{\mathcal{C}}$ become true.

Next, let $D$ be the set of all terms that can be formed in the language $\mathcal{L}$, as defined in 14.22 - i.e., expressions such as $f(a, b)$ and $g(c)$. We shall now construct a model for $\mathcal{C}$ whose domain is the set $D$. To do that, we must describe an interpretation mapping $|\ |$ that can be applied to constant symbols, to function symbols, and to relation symbols, as explained in 14.47. For terms, the mapping $|\ |$ will just be the identity mapping. In other words, any term $t$ in $\mathcal{L}$ is a string of symbols that is a single element $d$ of $D$; we interpret $|t| = d$. Next we shall interpret atomic formulas: If $P$ is an $n$-ary predicate symbol in the language $\mathcal{L}$, and $t_1, t_2, \ldots, t_n$ are terms, then the atomic formula $P(t_1, t_2, \ldots, t_n)$ in $\mathcal{L}$ will be given the same truth value (1 or 0) that it had when we viewed it as a primitive proposition symbol in the propositional calculus $\widetilde{\mathcal{C}}$ introduced a few paragraphs ago. Finally, we recursively assign truth values for compound propositions, as in 14.47.i. Thus we obtain an interpretation of $\mathcal{L}$, which is in fact a model of $\Sigma$.

Proof of (UF13) $\Rightarrow$ (UF12). Let $\Sigma$ be a syntactically consistent set of axioms; we wish to prove that $\Sigma$ has a model. By repeated use of the Rule of Generalization 14.43, we may replace all the members of $\Sigma$ with closed formulas - i.e., formulas with no free variables. Then, replacing members of $\Sigma$ by equivalent formulas (where equivalence is as in 14.32), by 14.42 we may assume that each axiom in $\Sigma$ is in prenex normal form - i.e., with all the quantifiers at the beginning of the formula.

Let $\mathcal{T}_0$ represent the given logical system - i.e., the given language and syntactically consistent set of axioms. Form new, syntactically consistent systems $\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3, \ldots$ recursively; obtain $\mathcal{T}_{n+1}$ from $\mathcal{T}_n$ by adding new axioms and new constant symbols, using the construction given in 14.46; the axioms added in this fashion are also without free variables. Let $\mathcal{T}_\infty = \bigcup_{n=0}^{\infty} \mathcal{T}_n$, in the obvious sense - i.e., let $\mathcal{T}_\infty$ be the original system $\mathcal{T}_0$ plus all the additional axioms and constant symbols of the $\mathcal{T}_n$'s. Clearly, $\mathcal{T}_\infty$ is also syntactically consistent, by 14.31.

Let $\mathcal{U}$ be the subset of $\mathcal{T}_\infty$ consisting of those statements that do not contain any quantifiers. Now let $\mathcal{M} = (\{0, 1\}, D, |\ |)$ be a model for $\mathcal{U}$, of the type described in (UF13) - i.e., with each individual named by at least one term. We shall show that $\mathcal{M}$ is also a model for $\mathcal{T}_\infty$ (and hence also for our original system $\mathcal{T}_0$). It suffices to show, by induction on integers $k \ge 0$, that if $\mathcal{F}$ is an axiom of $\mathcal{T}_\infty$ in which $k$ or fewer quantifiers appear, then $\mathcal{F}$ is valid in the interpretation $\mathcal{M}$.
This is clear for $k = 0$, since such axioms are just the axioms of $\mathcal{U}$. Suppose it is true for some $k$, and let $\mathcal{F}$ be an axiom of $\mathcal{T}_\infty$ involving $k + 1$ quantifiers; we shall show that this $\mathcal{F}$ is also valid in $\mathcal{M}$. There are two cases to consider: $\mathcal{F}$ is either of the form $\forall\xi\, \mathcal{A}(\xi)$ or of the form $\exists\xi\, \mathcal{A}(\xi)$, for some formula $\mathcal{A}$. For these two cases we refer again to the construction in 14.46.

Case (i). $\mathcal{F}$ is of the form $\forall\xi\, \mathcal{A}(\xi)$. Thus $\forall\xi\, \mathcal{A}(\xi)$ is an axiom in $\mathcal{T}_\infty$, hence in $\mathcal{T}_n$ for all integers $n$ sufficiently large - say for all $n \ge j$. If $t$ is any term in the language of $\mathcal{T}_\infty$, then $t$ is a term in $\mathcal{T}_n$ for some $n \ge j$, and so (by our construction of $\mathcal{T}_{n+1}$ from $\mathcal{T}_n$) we know that the statement $\mathcal{A}(t)$ is an axiom of $\mathcal{T}_{n+1}$, hence of $\mathcal{T}_\infty$. The statement $\mathcal{A}(t)$ involves only $k$ quantifiers, hence it is valid in $\mathcal{M}$. Thus, $1 = |\mathcal{A}(t)| = |\mathcal{A}|(|t|) = |\mathcal{A}|(d)$, where $d = |t|$. By our assumption about the model $\mathcal{M}$ in (UF13), the mapping $|\ |$ takes terms onto individuals; thus $|\mathcal{A}|(d) = 1$ for every individual $d$ in the domain $D$. By the definition in 14.47.j, therefore, $|\forall\xi\, \mathcal{A}(\xi)| = 1$, so $\mathcal{F}$ is valid.

Case (ii). $\mathcal{F}$ is of the form $\exists\xi\, \mathcal{A}(\xi)$. Then $\exists\xi\, \mathcal{A}(\xi)$ is an axiom of $\mathcal{T}_n$ for some $n$. By our construction of $\mathcal{T}_{n+1}$ from $\mathcal{T}_n$, the axiom has a witness - i.e., there is some constant symbol $c$ in $\mathcal{T}_{n+1}$ such that $\mathcal{A}(c)$ is an axiom of $\mathcal{T}_{n+1}$. Now $\mathcal{A}(c)$ is an axiom of $\mathcal{T}_\infty$ involving only $k$ quantifiers, so it is valid in $\mathcal{M}$. Therefore $\exists\xi\, \mathcal{A}(\xi)$ is valid in $\mathcal{M}$ by the definition of $|\exists\xi\, \mathcal{A}(\xi)|$ in 14.47.j. This completes the proof.
14.58. Remarks. The earliest version of the completeness principle was due to Gödel, so it is sometimes known as the Gödel Completeness Theorem. It should not be confused with Gödel's Incompleteness Theorems, introduced in 14.62 and 14.70. In mathematics, the term "complete" generally means "not missing any parts," or "not having any holes in it" - see 4.14. Predicate logic is complete in some respects, but incomplete in other respects.

The equivalence of the completeness principles with other forms of UF was proved by Rasiowa and Sikorski [1951], Łoś [1954], and Henkin [1954]. Our exposition is based on Cohen [1966] and several other works.

With a bit more work (not shown in detail here), our proof of (UF12) can be modified to show a slightly stronger principle: If $\Sigma$ is a syntactically consistent set of formulas, then $\Sigma$ has a model whose domain $D$ satisfies $\mathrm{card}(D) \le \max\{\mathrm{card}(\Sigma), \mathrm{card}(\mathbb{N})\}$. In particular, any syntactically consistent first-order theory with a countable language has a countable model. Most languages used in practice are countable (i.e., have only countably many symbols).

As we remarked in 14.6, it is possible to form a model of set theory by replacing von Neumann's universe $V$ with some other class $\mathcal{M}$ of sets, which may be smaller. The class $\mathcal{M}$ need not be a proper class - it may be a set. In fact, it may even be a countable set, since set theory can be described with a countable language. Thus arises a situation which, at first, seems paradoxical: Set theory describes and proves the existence of various uncountable objects, and yet some of the sets that can be used as domains for models of set theory are countable! This is known as Skolem's Paradox. To understand this, we must distinguish between the "inner" and "outer" systems, described in 14.12. The set $D$ may be countable in the outer system, but not in the inner system - i.e., there may be a bijection between $D$ and $\mathbb{N}$ in the informal, outer system that we use to analyze the model, but there may be no such function in the formal, inner system.
14.59. Here is another form of the Completeness Principle:

(UF14) for Propositional Logic:
(UF15) for Predicate Logic:
Completeness Principle (theorems version). Let $\Sigma$ be a collection of axioms, and let $\mathcal{A}$ be a formula. Then the following are equivalent:

(A) $\mathcal{A}$ is a syntactic theorem. That is, $\Sigma \vdash \mathcal{A}$.
(B) $\mathcal{A}$ is a semantic theorem. That is, $\Sigma \vDash \mathcal{A}$. That is, in every model of $\Sigma$, the formula $\mathcal{A}$ is also valid.
(C) In every quasimodel of $\Sigma$, the formula $\mathcal{A}$ is also valid.
Proof that the "consistency versions" (UF l l ) and (UF12) imply the "theorem versions" (UF14) and (UF15), respectively. The implication (A) '* (C) was given in 14.55(iii) . The
implication (C) '* (B) is trivial, since every model of I: is a quasimodel of I:. It suffices to prove (B) '* (A) . Let X be the closure of A (defined as in 14.44). In every model of I:, since A is valid, X is also valid, by 14.48. If I: U { •X} is syntactically consistent, then it has a model by by (UF l l ) or (UF12), but that model would make X and •X both valid, a contradiction. Thus I: U {·X} is syntactically inconsistent. The formula •X has no free variables, so by 14.40.c we obtain I: f- ••X. Since we are using classical logic, that simplifies to I: f- X. By the result in 14.44, then, we have I: f- A . This completes the proof.
Proof that the "theorem versions" (UF14) and (UF15) imply the "consistency versions" (UF11) and (UF12), respectively. The only part that requires proof is (A) $\Rightarrow$ (B). Let $A$ be any formula. Suppose that $\Sigma$ has no model. Then it is vacuously true that every model of $\Sigma$ makes $A \wedge (\neg A)$ valid. Thus $A \wedge (\neg A)$ is a semantic theorem, and therefore a syntactic theorem. Thus from $\Sigma$ we can deduce $A \wedge (\neg A)$; that is, $\Sigma$ is syntactically inconsistent.
14.60. A pathological example. To complete the discussion in 14.38, we shall now present an example in which $\mathcal{F} \vdash \mathcal{G}$ but not $\vdash (\mathcal{F} \to \mathcal{G})$.
It is clear from 14.40.a that in such an example, $\mathcal{F}$ must have at least one free variable. Actually, what we shall prove is that $\mathcal{F} \vDash \mathcal{G}$ but not $\vdash (\mathcal{F} \to \mathcal{G})$; the desired conclusion then follows from the Completeness Theorem. Assume that our language includes (among other things) the constant symbols 0 and 1, the binary relation symbols $=$ and $\neq$, and at least one free variable symbol $x$. Assume that our axiom system includes at least the usual axioms for equality (these are listed in 14.27.a) and the axiom $0 \neq 1$. Let $\mathcal{G}$ be the formula "$0 = 1$"; thus $\neg\mathcal{G}$ is one of our axioms.
Soundness, Completeness, and Compactness
The formulas $x = 1$ and $x \neq 1$ are negations of each other, but neither of these formulas is a valid formula in any interpretation of the language, since each can be falsified by at least one valuation - i.e., by at least one choice of the value of $x$. Let $\mathcal{F}$ be either one of these two formulas (it doesn't matter which). Then neither $\mathcal{F}$ nor $\neg\mathcal{F}$ has a model, hence neither is a semantic theorem, hence neither is a syntactic theorem. Since there are no models of $\mathcal{F}$, we can say (vacuously) that every model of $\mathcal{F}$ is also a model of $\mathcal{G}$. That is, $\mathcal{F} \vDash \mathcal{G}$. If $\mathcal{F} \to \mathcal{G}$ were a syntactic theorem, then its contrapositive, $(\neg\mathcal{G}) \to (\neg\mathcal{F})$, would also be a theorem. Then $(\neg\mathcal{F})$ would also be a theorem, by modus ponens, since $(\neg\mathcal{G})$ is one of our axioms. But we already know that $(\neg\mathcal{F})$ is not a theorem. Thus, $\mathcal{F} \to \mathcal{G}$ is not a syntactic theorem - i.e., we do not have $\vdash (\mathcal{F} \to \mathcal{G})$.
14.61. Following are two more equivalents of UF:
(UF16) for Propositional Logic; (UF17) for Predicate Logic: Compactness Principle. If $\Sigma$ is a set of formulas, every finite subset of which has a model, then $\Sigma$ has a model.
Remarks. The name "Compactness Principle" stems from some topological considerations described in 17.25. A nonlogicians' variant of the Compactness Principle is given by (UF2) - see the remarks in 6.35.
Proof of (UF11) $\Rightarrow$ (UF16) and proof of (UF12) $\Rightarrow$ (UF17). Immediate from the observation about finite character, in 14.31.
Proof of (UF17) $\Rightarrow$ (UF16). Propositional logic is a special case of predicate logic.
Proof of (UF16) $\Rightarrow$ (UF8). Let $X$ be a nondegenerate Boolean algebra; it suffices to show that the dual of $X$ is nonempty - i.e., we are to show the existence of a function $f : X \to \{0, 1\}$ that satisfies
$$f(x \vee y) = f(x) \vee f(y), \qquad f(\complement x) = \complement f(x), \qquad f(1) = 1 \qquad (!)$$
for all $x, y \in X$. Let $\mathcal{B}$ ... $\{0, 1\}$. ... interpret $P_x$ as true if $x \notin X_0$ or $f(x) = 1$, and as false if $x \in X_0$ and $f(x) = 0$. ...
14.62. Let us emphasize the difference between a model and a quasimodel. A model "answers every question" that can be expressed in the formal language by assigning a truth value of "true" or "false" (1 or 0) to every closed formula. A quasimodel does not give such a definite answer, since its truth values may range through a Boolean algebra. The Lindenbaum algebra, which yields a quasimodel as in 14.56, plays this special role: It tells us which formulas are provable or disprovable, by assigning them the truth values of 1 or 0. Some formulas may be neither provable nor disprovable - the Lindenbaum algebra may have other values besides 1 and 0. We can answer some of the unanswered questions by adding more axioms, but we would have to keep adding more axioms; that will be evident from Gödel's Incompleteness Theorem, described below.
If we really want to have an answer to every question, the Completeness Theorem gives us one way to accomplish that. Any consistent theory has a (not necessarily unique) model and thus a (not necessarily unique) method for assigning the value "true" or "false" to every closed formula. We can even make each closed formula provable or disprovable, in this rather contrived fashion: Form a model, and then use that model's valid formulas as the axioms for a new theory. The new theory extends the old one, is consistent, and has Lindenbaum algebra equal to $\{0, 1\}$. However, this formulation is not constructive, since the Completeness Theorem is not constructively provable. The resulting axiom system is extremely large and not recursive.
Gödel's First Incompleteness Theorem, published in 1931, says that for sufficiently complicated theories we cannot answer all the questions. Somewhat more precisely:
Let $\mathcal{T}$ be a formal theory that includes arithmetic, and assume that the axioms of $\mathcal{T}$ can be described in a mechanical fashion (i.e., recursively - we shall not give a precise definition of this term). Assume the language of $\mathcal{T}$ includes only countably many symbols. Then:
Gödel's First Incompleteness Theorem. If $\mathcal{T}$ is consistent, then there exist formulas that can be formulated in the language of $\mathcal{T}$, but cannot be proved or disproved within the formal system from the axioms of $\mathcal{T}$.
We shall not prove this theorem, but we shall sketch some of the ideas of the proof. The remainder of this section is optional; it will not be needed later in the book.
Let $\mathcal{T}$ be a theory that contains arithmetic; say $\mathcal{L}$ is the language of $\mathcal{T}$. Some properties of numbers can be expressed in $\mathcal{L}$ - for instance, a number $x$ is composite if it satisfies ...
Now assume that $\mathcal{T}$'s language $\mathcal{L}$ has only countably many symbols, and let $\mathcal{S}$ be the set of all finite strings of symbols. Then it is possible to number these strings - i.e., to define a canonical injective mapping $\# : \mathcal{S} \to \mathbb{N}$. Statements about strings can be transformed to statements about numbers - for instance, define a relation $\triangleright$ on the positive integers by saying that $m \triangleright n$ if $m = \#(S)$ and $n = \#(T)$ for strings $S, T$ such that $S$ is a proof of $T$.
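The idea of numbering strings can be made concrete. The following is only a minimal sketch, assuming a small hypothetical finite alphabet `ALPHABET` (the text allows countably many symbols) and using bijective base-$k$ numeration; the names `godel_number` and `string_of` are illustrative and are not taken from the text.

```python
# Hypothetical finite alphabet; any fixed ordering of the symbols would do.
ALPHABET = "0S+*=()v"


def godel_number(s: str) -> int:
    """Map a string over ALPHABET to a natural number, injectively.

    Bijective base-k numeration: each symbol is a digit in 1..k, so
    distinct strings (even of different lengths) get distinct numbers.
    """
    k = len(ALPHABET)
    n = 0
    for ch in s:
        n = n * k + (ALPHABET.index(ch) + 1)
    return n


def string_of(n: int) -> str:
    """Inverse of godel_number: recover the string numbered n."""
    k = len(ALPHABET)
    chars = []
    while n > 0:
        n, r = divmod(n - 1, k)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))
```

With such a numbering in hand, a statement about strings (for example, "this string is a proof of that string") becomes a statement about the corresponding numbers.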
The relation $\triangleright$ is a purely numerical relation - i.e., it is a relation whose graph is a subset of $\mathbb{N} \times \mathbb{N}$. Although we defined $\triangleright$ in terms of the language $\mathcal{L}$ and the correspondence $\#$, it is possible to describe this same relation $\triangleright$ in purely numerical terms, without mentioning $\mathcal{L}$ or $\#$. When $A$ is a formula, we shall call $\#(A)$ the Gödel number of $A$. Let $G$ be the set of all Gödel numbers; it is a subset of $\mathbb{N}$. We shall now outline a proof of:
Lemma. Let $Q$ be a property of some natural numbers - i.e., assume that $Q(x)$ is true for some natural numbers $x$ and false for others. Assume, moreover, that the statement "$x$ has the property $Q$" is expressible in the formal language $\mathcal{L}$. Then there exist a particular number $n$ and a formula $A$ such that (i) the Gödel number of $A$ is $n$, and (ii) $A$ expresses the statement that "the number $n$ has the property $Q$."
Sketch of proof of lemma. Let $v$ be some particular free variable, which will not change for the remainder of this discussion. Define a special function $\varphi : G \times \mathbb{N} \to G$ as follows: To evaluate $\varphi(m, n)$, let $S_m$ be the string of symbols with Gödel number $m$. The number $n$ can be expressed in the language $\mathcal{L}$; let $T_n$ be the string of symbols that expresses the number $n$. Let $U_{m,n}$ be the string obtained from $S_m$ by replacing each occurrence of $v$ in it with a copy of the string $T_n$; finally, let $\varphi(m, n)$ be the Gödel number of $U_{m,n}$. The equation $z = \varphi(x, x)$ is a statement about numbers. Gödel proved that it can be expressed in the formal language $\mathcal{L}$. The map $x \mapsto \varphi(x, x)$ is an operation somewhat analogous to the operation of quining, which was introduced in 1.12. Now, let $\mathcal{G}$ be the formula that expresses, in the language $\mathcal{L}$, the statement "$\varphi(v, v)$ has the property $Q$." Say the Gödel number of $\mathcal{G}$ is $p$. The number $p$ can be expressed in the formal language. Obtain $A$ from $\mathcal{G}$ by replacing each occurrence of $v$ with the string that expresses $p$, and let $n = \varphi(p, p)$; this proves the lemma.
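The bookkeeping in this sketch of proof can be imitated with ordinary strings. The following is a toy illustration only, under loud assumptions: Python strings stand in for formulas, decimal numerals stand in for the strings $T_n$, the text `Q(phi(v, v))` is a hypothetical stand-in for the formula $\mathcal{G}$ (it is never evaluated), and `godel`/`ungodel` are an arbitrary injective numbering chosen for convenience, not Gödel's.

```python
V = "v"  # the distinguished free variable of the sketch


def godel(s: str) -> int:
    """An injective numbering of strings (a stand-in for #).

    The leading 0x01 byte keeps strings that start with zero bytes distinct.
    """
    return int.from_bytes(b"\x01" + s.encode("utf-8"), "big")


def ungodel(m: int) -> str:
    """Inverse of godel on its range: recover the string numbered m."""
    raw = m.to_bytes((m.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")


def phi(m: int, n: int) -> int:
    """Number of the string S_m with every occurrence of v replaced by the numeral for n."""
    return godel(ungodel(m).replace(V, str(n)))


# Stand-in for the formula saying "phi(v, v) has the property Q".
G_FORMULA = "Q(phi(v, v))"
p = godel(G_FORMULA)

# A is obtained by substituting the numeral for p; n is its own number.
A = G_FORMULA.replace(V, str(p))
n = phi(p, p)
```

Here `godel(A) == n`, and the text of `A` reads "Q(phi(p, p))" with the numeral for `p` written out, so `A` refers to the number `n` that happens to be its own code. This mirrors the indirect self-reference of the lemma without claiming anything about provability.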
Proof of theorem, continued. For property $Q(x)$, Gödel uses a property such as "$x$ is the Gödel number of a formula that is not provable in $\mathcal{T}$." Of course, it is a nontrivial matter to establish that the lemma is applicable to this property. The lemma then yields an effectively constructible formula that says, roughly, "I am not provable." (However, it uses indirect self-referencing as in Quine's Paradox (1.12), rather than direct self-referencing as in Epimenides's Paradox (1.11).) Such a statement cannot be provable and therefore must be true and therefore cannot be disprovable either. In this fashion we obtain Gödel's First Incompleteness Theorem.
Remarks. Longer informal expositions of this subject can be found in Rosser [1939], Nagel and Newman [1958], Hofstadter [1979], and Mac Lane [1986]. More technical and detailed expositions can be found in books on logic.
NONSTANDARD ANALYSIS
14.63. No system of first-order properties of $\mathbb{N}$ or $\mathbb{R}$ can uniquely determine $\mathbb{N}$ or $\mathbb{R}$. Any first-order theory that can be modeled by $\mathbb{N}$ or $\mathbb{R}$ can also be modeled by some system of "numbers" that includes infinitely large members. More precisely, we have this proposition:
Skolem's example (1934). Let $\mathcal{L}$ be a first-order language that includes infinitely many free variables, the relation symbol "$<$", and the constant symbols "1," "2," "3," ... (and possibly other symbols as well). Let $\Sigma$ be a set of axioms in that language. Suppose that $\mathbb{N}$ (respectively, $\mathbb{R}$) is the domain for some model of $\Sigma$, giving the symbols "$<$" and "1," "2," "3," ... their usual meanings. Then there exist other models for $\Sigma$, which are not isomorphic to $\mathbb{N}$ (respectively, $\mathbb{R}$). In fact, there exists a model that contains an "infinitely large number" - i.e., a number that is greater than all the numbers $1, 2, 3, \ldots$.
$y$ and $\mathcal{F} \to z$ - that is, such that $\mathcal{F} \supseteq \mathcal{N}(y)$ and $\mathcal{F} \supseteq \mathcal{N}(z)$. In view of 7.18(E), that can happen if and only if every member of $\mathcal{N}(y)$ meets every member of $\mathcal{N}(z)$.
Chapter 15: Topological Spaces
b. $X$ has the "star property": If $z \in X$ and $(x_\alpha)$ is a net that does not converge to $z$, then $(x_\alpha)$ has a subnet $(y_\beta)$ that stays out of some neighborhood of $z$; hence no subnet of $(y_\beta)$ converges to $z$. Also, $X$ has the "sequential star property": If $z \in X$ and $(x_n)$ is a sequence that does not converge to $z$, then $(x_n)$ has a subsequence $(y_n)$ that stays out of some neighborhood of $z$; hence no subsequence of $(y_n)$ converges to $z$ (and in fact, no subnet of $(y_n)$ converges to $z$).
Hints: Assume the net $(x_\alpha : \alpha \in A)$ does not converge to $z$. Then there is some neighborhood $N$ of $z$ such that $x_\alpha$ is not eventually in $N$. Thus, $\mathbb{B} = \{\alpha \in A : x_\alpha \notin N\}$ is a frequent subset of $A$, and so the frequent subnet $(x_\alpha : \alpha \in \mathbb{B})$ has the desired properties. For the sequential result, recall from 7.16.d that any frequent subnet of a sequence is actually a subsequence.
Remark. We shall see in 21.43 that some complete lattices have order convergences that lack the sequential star property and therefore are not pretopological convergences.
c. If $(x_\alpha)$ and $(y_\beta)$ are nets in a set $S \subseteq X$, both converging to some limit $z \in X$, then $(x_\alpha)$ and $(y_\beta)$ are subnets of a single net in $S$ which also converges to $z$. Hint: Let the given nets have eventuality filters $\mathcal{F}$ and $\mathcal{G}$. Then $\mathcal{F} \cap \mathcal{G}$ is a proper filter that contains $\{S\} \cup \mathcal{N}(z)$.
d. Let $p : X \to Y$ be a mapping from one pretopological space into another. Then $p$ is convergence-preserving (defined as in 7.33) if and only if $p$ has this property: Whenever $N$ is a neighborhood of $p(x)$ in $Y$, then $p^{-1}(N)$ is a neighborhood of $x$ in $X$.
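15.4. Definitions. Let $(X, \lim)$ be a pretopological convergence space, with neighborhood filters $\mathcal{N}(x)$. We define two maps from $\mathcal{P}(X)$ into itself, the convergence closure operator and the convergence interior operator, by
$$\begin{aligned}
\operatorname{cl}(S) &= \{z \in X : S \text{ is a member of some filter that converges to } z\} \\
&= \{z \in X : \text{some net that converges to } z \text{ is eventually in } S\} \\
&= \{z \in X : S \text{ meets every neighborhood of } z\},
\end{aligned}$$
and
$$\begin{aligned}
\operatorname{int}(S) &= \{z \in X : S \text{ is a member of every filter that converges to } z\} \\
&= \{z \in X : \text{every net that converges to } z \text{ is eventually in } S\} \\
&= \{z \in X : S \text{ is a neighborhood of } z\}.
\end{aligned}$$
Then the closure and interior are related by:
$$\operatorname{int}(X \setminus S) = X \setminus \operatorname{cl}(S).$$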
Thus, closures and interiors are dual notions, in the sense of 1.7. In practice, however, closures and interiors are commonly used in different ways. Typically, the closure of a set $S$ is used if $S$ itself does not contain enough points for some purpose - e.g., if $S$ is not closed under some sort of operation of taking limits. The interior of a set $S$ may be used as part of an argument to show that $S$ or some other related set is nonempty, and thus to prove the existence of certain mathematical objects.
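15.5. Further properties of pretopological closures.
a. $\operatorname{cl}(\varnothing) = \varnothing$, $S \subseteq \operatorname{cl}(S)$, and $S \subseteq T \Rightarrow \operatorname{cl}(S) \subseteq \operatorname{cl}(T)$.
b. $\operatorname{cl}(S \cup T) = \operatorname{cl}(S) \cup \operatorname{cl}(T)$. More generally, $\operatorname{cl}\left(\bigcup_{j=1}^n S_j\right) = \bigcup_{j=1}^n \operatorname{cl}(S_j)$ for any finite $n$. (For a slight generalization, see 16.23.c.)
c. $\operatorname{cl}(S) \setminus \operatorname{cl}(T) = \operatorname{cl}(S \setminus T) \setminus \operatorname{cl}(T) \subseteq \operatorname{cl}(S \setminus T)$. Hint: From $S = (S \cap T) \cup (S \setminus T)$, we obtain $\operatorname{cl}(S) = \operatorname{cl}(S \cap T) \cup \operatorname{cl}(S \setminus T)$. Intersect both sides of this equation with the complement of $\operatorname{cl}(T)$, to obtain $\operatorname{cl}(S) \setminus \operatorname{cl}(T) = \operatorname{cl}(S \setminus T) \setminus \operatorname{cl}(T)$. This argument is taken from Kuratowski [1948].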
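The duality of 15.4 and the properties listed in 15.5 can be checked by brute force on a small finite example. The sketch below is only an illustration under stated assumptions: the three-point set `X` and the smallest neighborhoods `K` are hypothetical choices, and on a finite set every neighborhood filter consists of all supersets of one smallest neighborhood, so testing `K[z]` suffices.

```python
from itertools import combinations

X = frozenset({0, 1, 2})
# Hypothetical smallest neighborhoods K(x); N(x) = all supersets of K(x).
# Each K(x) contains x, as a neighborhood filter requires.
K = {0: frozenset({0, 1}), 1: frozenset({1, 2}), 2: frozenset({2})}


def cl(S):
    """Points z such that S meets every neighborhood of z."""
    return frozenset(z for z in X if K[z] & S)


def interior(S):
    """Points z such that S is a neighborhood of z, i.e. K(z) is a subset of S."""
    return frozenset(z for z in X if K[z] <= S)


def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_properties():
    """Verify the duality of 15.4 and items 15.5.a-c for every S, T."""
    assert cl(frozenset()) == frozenset()
    for S in subsets(X):
        assert interior(X - S) == X - cl(S)      # duality
        assert S <= cl(S)                        # 15.5.a
        for T in subsets(X):
            if S <= T:
                assert cl(S) <= cl(T)            # 15.5.a, isotone
            assert cl(S | T) == cl(S) | cl(T)    # 15.5.b
            assert cl(S) - cl(T) == cl(S - T) - cl(T)  # 15.5.c
    return True
```

In this particular example `cl(frozenset({2}))` is `{1, 2}` while applying `cl` once more gives all of `X`, so the closure is not idempotent; a finite space of this kind is pretopological without being topological.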
Remarks. The closure of a pretopological space does not necessarily satisfy the idempotence condition $\operatorname{cl}(\operatorname{cl}(S)) = \operatorname{cl}(S)$. That is the one condition it still needs to be a Moore closure (see 4.5.a) or a topological closure (see 5.19, 15.6, 15.7, and 15.10(E)).
15.6. Example: a pretopological closure that is not idempotent. We exhibit a space in which $\operatorname{cl}(\operatorname{cl}(S))$ may differ from $\operatorname{cl}(S)$. The underlying set will be $\mathbb{R}^2$. For each $(x, y) \in \mathbb{R}^2$ and each number $\varepsilon > 0$, let
$$K_\varepsilon(x, y) = \{(x, y') \in \mathbb{R}^2 : |y - y'| \le \varepsilon\} \cup \{(x', y) \in \mathbb{R}^2 : |x - x'| \le \varepsilon\}.$$
(This is a plus-shaped set centered at $(x, y)$; each of its four arms has length $\varepsilon$.) Now define the neighborhood filter $\mathcal{N}(x, y)$ to be the filter $\{S \subseteq \mathbb{R}^2 : S \supseteq K_\varepsilon(x, y) \text{ for some } \varepsilon > 0\}$. The resulting convergence is as follows: A proper filter $\mathcal{F}$ on $\mathbb{R}^2$ converges to a limit $(x, y)$ if and only if $K_\varepsilon(x, y) \in \mathcal{F}$ for every $\varepsilon > 0$. Equivalently, a net $(x_\alpha, y_\alpha)$ in $\mathbb{R}^2$ converges to a limit $(x, y)$ if and only if for each $\varepsilon > 0$, we have eventually $(x_\alpha, y_\alpha) \in K_\varepsilon(x, y)$. Finally, let $S = \{(x, y) \in \mathbb{R}^2 : x > 0 \text{ and } y > 0\}$. Verify that
$$\operatorname{cl}(S) = \{(x, y) \in \mathbb{R}^2 : x \ge 0 \text{ and } y \ge 0 \text{ and } (x, y) \ne (0, 0)\},$$
but
$$\operatorname{cl}(\operatorname{cl}(S)) = \{(x, y) \in \mathbb{R}^2 : x \ge 0 \text{ and } y \ge 0\}$$
is strictly larger.
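The behavior at the origin in Example 15.6 can be explored numerically. The sketch below is an approximation, not a proof: it tests "every plus-shaped neighborhood meets the set" only for finitely many arm lengths (`EPSILONS`) and only at a few sample points on each arm (`plus_samples`). Both are assumptions made for illustration, and the helper names are hypothetical.

```python
def in_S(p):
    """Membership test for the open first quadrant S."""
    x, y = p
    return x > 0 and y > 0


def plus_samples(p, eps):
    """The center of the plus-shaped set with arm length eps, plus a few points on its arms."""
    x, y = p
    pts = [(x, y)]
    for t in (eps, eps / 2):
        pts += [(x + t, y), (x - t, y), (x, y + t), (x, y - t)]
    return pts


# Finitely many arm lengths stand in for "every eps > 0".
EPSILONS = [10.0 ** (-k) for k in range(0, 7)]


def closure(member):
    """Return an approximate membership test for the closure of `member`."""
    def in_closure(p):
        return all(any(member(q) for q in plus_samples(p, eps))
                   for eps in EPSILONS)
    return in_closure


in_cl_S = closure(in_S)
in_cl_cl_S = closure(in_cl_S)
```

With these definitions `in_cl_S((0.0, 0.0))` is false, because the arms through the origin lie on the coordinate axes and never enter the open quadrant, while `in_cl_cl_S((0.0, 0.0))` is true, because points such as `(eps, 0.0)` on the arms already belong to the closure. That is the failure of idempotence described in the example.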
TOPOLOGICAL SPACES AND THEIR CONVERGENCES
15.7. Definitions. Let $(X, \mathcal{T})$ be a topological space, as defined in 5.12. Recall from 5.16.a that a set $N \subseteq X$ is a neighborhood of a point $p$ if $p \in G \subseteq N$ for some open set $G$. Then $\mathcal{N}(p) = \{N : N \text{ is a neighborhood of } p\}$ is a filter, called the neighborhood filter at $p$. The $\mathcal{N}(p)$'s form a system of neighborhood filters for a pretopology, as defined in 15.2. Thus, a topological space is a special type of pretopological space. Convergence is defined as in 15.2. However, in a topological space $(X, \mathcal{T})$, the definition of convergence can be reformulated in terms of open sets. It is easy to show that for filters,
$$\mathcal{F} \to z \quad \text{if and only if every open set containing } z \text{ is a member of } \mathcal{F}.$$
The equivalent condition for nets is:
$$x_\alpha \to z \quad \text{if and only if for each open set } G \text{ containing } z, \text{ eventually } x_\alpha \in G.$$
The convergence given by either of these rules is the convergence determined by the topology. Every topological space is understood to be equipped with this convergence, unless some other arrangement is specified. A convergence rule that can be determined by some topology is called a topological convergence. Of course, any sequence is also a net, and so all our results for nets will apply to sequences. We shall work with sequences rather than with nets or filters whenever possible, since sequences are conceptually simpler.
15.8. A few basic properties of topological convergences. Let $(X, \mathcal{T})$ be a topological space. Show that
a. A set is open if and only if it is a neighborhood of each of its points.
b. A set $S$ ...
... $> 0$ we have eventually $|r_\alpha - z| < \varepsilon$. Observe that the convergence in any pseudometric space $(X, d)$ can be characterized in terms of convergence of distances, which are real numbers: $x_\alpha \to x$ in $(X, d)$ ...
15.10. Theorem characterizing topological convergences (optional). Let $X$ be a convergence space whose convergence is centered and isotone (as defined in 7.34). Then the following conditions are equivalent.
(A) The convergence on $X$ is topological - i.e., given by a topology.
(B) (Iterated Net Condition.) Let $(y_\delta : \delta \in D)$ be a net in $X$ converging to a limit $z$. For each $\delta \in D$, let $(x^\delta_\varepsilon : \varepsilon \in E_\delta)$ be a net in $X$ converging to $y_\delta$. Let $F = \prod_{\delta \in D} E_\delta$ have the product ordering, and let $D \times F$ have the product ordering. Then the net $(x^\delta_{f(\delta)} : (\delta, f) \in D \times F)$ converges to $z$.
(C) (Cook-Fischer Iterated Filter Condition.) Let $\mathcal{G}$ be a filter on a set $I$, and let $v : I \to X$ be some function. Assume the filterbase $v(\mathcal{G}) = \{v(G) : G \in \mathcal{G}\}$ converges to some point $z$ in $X$. For each $i \in I$, suppose $s(i)$ is a filter on $X$ converging to $v(i)$. Then the filter $\mathcal{X} = \bigcup_{G \in \mathcal{G}} \bigcap_{i \in G} s(i)$ converges to $z$.
(D) (Kowalsky's Conditions.) The convergence is pretopological (as defined in 15.2). Furthermore, suppose $\mathcal{G}$ is a filter on $X$ converging to some point $z$ in $X$. For each $x \in X$, assume that $s(x)$ is a filter on $X$ converging to $x$. Then the filter $\mathcal{X} = \bigcup_{G \in \mathcal{G}} \bigcap_{i \in G} s(i)$ converges to $z$.
(E) (Gherman's Conditions.) The convergence is pretopological. Moreover, the closure operator defined in 15.4 is idempotent - i.e., it satisfies $\operatorname{cl}(\operatorname{cl}(S)) = \operatorname{cl}(S)$.
Bibliographical remarks. Earlier, more complicated versions of parts of this theorem were given by Kelley [1955/1975] and Cook and Fischer [1967]; those versions assumed a "star property" like that in 15.3.b. The star property assumption was removed independently, in different fashions, by Aarnes and Andenæs [1972] and Gherman [1980]. It should be emphasized that our definition of "isotone" is based on Aarnes-Andenæs subnets - i.e., we assume condition 7.31(*). Kelley studied nets without assuming that condition and without considering filters. With his formulation the star property cannot be omitted; that was shown by Aarnes and Andenæs [1972].
Hints for (A) $\Rightarrow$ (B). Let $N$ be an open neighborhood of $z$; it suffices to show that eventually $x^\delta_{f(\delta)} \in N$.
Outline of (B) $\Rightarrow$ (C). We shall begin by constructing a net that is somewhat like the canonical net of $v(\mathcal{G})$, but is also parametrized by elements of $I$. Let $\mathcal{G}$ be ordered by reverse inclusion (see 7.4), let $X$ and $I$ have the universal ordering (see 3.9.g), and let products have product ordering. Let
$$D = \{(i, G) \in I \times \mathcal{G} : i \in G\}.$$
Then $D$ is a frequent subset of $I \times \mathcal{G}$, hence directed. For $\delta = (i, G)$ in $D$, let $y_\delta = v(i)$. The net $(y_\delta : \delta \in D)$ converges to $z$ since its eventuality filter includes $v(\mathcal{G})$. For each $\delta = (i, G)$ in $D$, let $S_\delta = s(i)$; thus $S_\delta$ is a filter on $X$ that converges to $v(i) = y_\delta$. The canonical net of $S_\delta$ is
$$(x^\delta_\varepsilon : \varepsilon \in E_\delta), \quad \text{where} \quad E_\delta = \{(w, S) \in X \times S_\delta : w \in S\} \quad \text{and} \quad x^\delta_{(w, S)} = w;$$
this net also converges to $v(i) = y_\delta$. Define $F = \prod_{\delta \in D} E_\delta$ as in the statement of (B). Then the net $(x^\delta_{f(\delta)} : (\delta, f) \in D \times F)$ converges to $z$ by the assumed condition (B). Hence its eventuality filter $\mathcal{L}$ also converges to $z$. We wish to show that $\mathcal{X} \to z$; it suffices to show that $\mathcal{X} \supseteq \mathcal{L}$. Let $A \in \mathcal{L}$; we are to show that $A \in \mathcal{X}$. Since $A$ is an eventual set of $(x^\delta_{f(\delta)} : (\delta, f) \in D \times F)$, there is some $\delta_A \in D$ and $f_A \in F$ such that
$$\delta \succcurlyeq \delta_A, \ f \succcurlyeq f_A \ \Rightarrow \ x^\delta_{f(\delta)} \in A.$$
Say $\delta_A = (i_A, G_A)$. Temporarily fix any $i \in G_A$, and let $\delta = (i, G_A)$. Then $\delta$ is a member of $D$ that satisfies $\delta \succcurlyeq \delta_A$ and therefore $x^\delta_{f(\delta)} \in A$ for all $f \succcurlyeq f_A$. Thus $x^\delta_\varepsilon \in A$ for all $\varepsilon \in E_\delta$ such that $\varepsilon \succcurlyeq f_A(\delta)$. Thus the net $(x^\delta_\varepsilon : \varepsilon \in E_\delta)$ is eventually in $A$, so $A$ is a member of that net's eventuality filter, which is $S_\delta = s(i)$. Thus $A \in \bigcap_{i \in G_A} s(i) \subseteq \mathcal{X}$.
Outline of (C) $\Rightarrow$ (D). Easily, condition (C) implies the iterated filter condition in (D). It suffices to show (C) implies the convergence is pretopological. Fix any $z \in X$; let $\mathcal{N}(z)$ be the intersection of all the filters that converge to $z$; we wish to show $\mathcal{N}(z) \to z$. Let $I = \{\text{filters on } X \text{ that converge to } z\}$ and $\mathcal{G} = \{I\}$. Define $s(i) = i$ and $v(i) = z$ for all $i \in I$. The hypotheses of the Cook-Fischer condition are satisfied, and therefore $\mathcal{X} \to z$. Unwinding the notation, we find that $\mathcal{X} = \mathcal{N}(z)$.
Outline of (D) $\Rightarrow$ (E). Let $I = \operatorname{cl}(S)$; we wish to show $\operatorname{cl}(I) = I$. Clearly, $I \subseteq \operatorname{cl}(I)$. Let $z \in \operatorname{cl}(I)$; we wish to show $z \in I$. Define $s : X \to \{\text{filters on } X\}$ as follows:
$$s(x) = \begin{cases} \mathcal{N}(x) & \text{if } x \notin I, \\ \mathcal{N}(x) \vee \{S\} & \text{if } x \in I. \end{cases}$$
In either case, $s(x)$ is a filter that converges to $x$. Since $z \in \operatorname{cl}(I)$, some filter $\mathcal{G}$ converges to $z$ and contains $I$. By the assumed condition (D), Kowalsky's iterated filter $\mathcal{X} = \bigcup_{G \in \mathcal{G}} \bigcap_{i \in G} s(i)$ converges to $z$. Since $I \in \mathcal{G}$, we have $S \in \bigcap_{i \in I} s(i) \subseteq \mathcal{X}$. Since $\mathcal{X} \to z$, we have $z \in \operatorname{cl}(S) = I$.
Hint for (E) $\Rightarrow$ (A): Let $\to$ denote the convergence originally given on $X$; we are to prove that $\to$ is a topological convergence. Condition (E) tells us that the convergence closure operator defined in 15.4 satisfies Kuratowski's axioms 5.19 and thus is the closure operator for some topology $\mathcal{T}$ on $X$. Since $\operatorname{int}(\complement S) = \complement(\operatorname{cl}(S))$, the convergence interior operator defined in 15.4 is the interior operator for that topology $\mathcal{T}$. Let $\xrightarrow{\mathcal{T}}$ be the convergence of that topology. Now, both $\to$ and $\xrightarrow{\mathcal{T}}$ are pretopological, and they have the same interior operator, hence the same neighborhood filters. Being pretopological, they satisfy
$$\mathcal{F} \to z \iff \mathcal{F} \supseteq \mathcal{N}(z) \iff \mathcal{F} \xrightarrow{\mathcal{T}} z.$$
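That is, the two convergences are the same. Hence $\to$ is a topological convergence.
MORE ABOUT TOPOLOGICAL CLOSURES
15.11. The closure operator is isotone - i.e., it satisfies $S \subseteq T \Rightarrow \operatorname{cl}(S) \subseteq \operatorname{cl}(T)$. Hence
$$\bigcap_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda) \supseteq \operatorname{cl}\Big(\bigcap_{\lambda \in \Lambda} S_\lambda\Big) \qquad \text{and} \qquad \bigcup_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda) \subseteq \operatorname{cl}\Big(\bigcup_{\lambda \in \Lambda} S_\lambda\Big),$$
as we noted in 4.29.c. Neither of these inclusions is necessarily reversible, as we shall now show with simple examples. For both examples, take $X = \mathbb{R}$ with its usual topology; let $\mathbb{Q} = \{\text{rational numbers}\}$. Then
$$\bigcap_{\lambda \in \mathbb{R}} \operatorname{cl}(S_\lambda) \supsetneq \operatorname{cl}\Big(\bigcap_{\lambda \in \mathbb{R}} S_\lambda\Big) \quad \text{if } S_\lambda = \mathbb{R} \setminus \{\lambda\};$$
$$\bigcup_{\lambda \in \mathbb{Q}} \operatorname{cl}(S_\lambda) \subsetneq \operatorname{cl}\Big(\bigcup_{\lambda \in \mathbb{Q}} S_\lambda\Big) \quad \text{if } S_\lambda = \{\lambda\}.$$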
15.12. Relativization of closures. Let $Y$ be a topological space, let $X \subseteq Y$, and let $X$ be equipped with the relative topology. Let $\operatorname{cl}_Y$ and $\operatorname{cl}_X$ denote closures in the topology of $Y$ and the topology of $X$. Then for any set $S \subseteq X$, we have
$$\operatorname{cl}_X(S) = X \cap \operatorname{cl}_Y(S).$$
15.13. A subset $S$ is dense in a topological space $X$ if $\operatorname{cl}(S) = X$. A topological space $X$ is separable if it has a countable dense subset. Show that
a. A set $S \subseteq X$ is dense if and only if it meets every nonempty open subset of $X$.
b. If $G$ is an open subset of a topological space $X$, and $Y$ is a dense subset of $X$, then $G \subseteq \operatorname{cl}(G \cap Y)$.
c. The intersection of finitely many open dense sets is open and dense.
d. Any subset of a separable metric space is separable.
e. Any open subset of any separable space is separable.
f. However, separability is not a hereditary property - i.e., not every subspace of a separable space is necessarily separable. Example. Let $\xi$ be some particular member of an uncountable set $X$ (for instance, take $0 \in \mathbb{R}$). Let $X$ be given the topology $\mathcal{T} = \{S \subseteq X : \xi \in S \text{ or } S = \varnothing\}$. Show that $X$ is separable, but the relative topology on $X \setminus \{\xi\}$ is not separable.
g. Let $(X, d)$ be a separable metric space. Then there is a sequence $(x_n)$ in $X$ with the property that every point in $X$ is the limit of some subsequence of $(x_n)$. In fact, we can choose the subsequence canonically (i.e., without any arbitrary choices). Hints: Repetitions are permitted. If $(u_k)$ is a countable dense set, let $(x_n)$ be the sequence
$$u_1, \quad u_1, u_2, \quad u_1, u_2, u_3, \quad \ldots$$
Given any point $p \in X$, we can choose a subsequence $(x_{n_j})$ of $(x_n)$ canonically as follows: Take $n_1 = 1$. Thereafter, let $n_j$ be the first integer greater than $n_{j-1}$ that satisfies $d(x_{n_j}, p) < 1/j$.
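The canonical choice in 15.13.g can be carried out mechanically. The sketch below is a minimal illustration under stated assumptions: the metric space is $[0, 1]$ with the usual distance, the dense set $(u_k)$ is a hypothetical enumeration of the rationals in $[0, 1]$ (with repeats), and $(x_n)$ lists $u_1;\ u_1, u_2;\ u_1, u_2, u_3;\ \ldots$, one arrangement in which every $u_k$ recurs infinitely often. The function names are illustrative.

```python
from fractions import Fraction
from itertools import count


def dense_points():
    """Enumerate u_1, u_2, ...: the rationals p/q in [0, 1] (repeats allowed)."""
    for q in count(1):
        for p in range(q + 1):
            yield Fraction(p, q)


def x_sequence():
    """Yield u_1; u_1, u_2; u_1, u_2, u_3; ... so every u_k recurs infinitely often."""
    seen = []
    source = dense_points()
    for _ in count(1):
        seen.append(next(source))  # seen now holds u_1, ..., u_m
        yield from seen


def canonical_subsequence(p, steps):
    """Return [(n_1, x_{n_1}), ..., (n_steps, x_{n_steps})] chosen as in the hint."""
    chosen = []
    for n, x in enumerate(x_sequence(), start=1):
        j = len(chosen) + 1
        # n_1 = 1 unconditionally; afterwards take the first later index
        # whose term lies within 1/j of p.
        if j == 1 or abs(x - p) < Fraction(1, j):
            chosen.append((n, x))
            if len(chosen) == steps:
                break
    return chosen


target = Fraction(2, 7)
picks = canonical_subsequence(target, 8)
```

No arbitrary choice is made at any stage: each index is the first one that works. The search always terminates because every $u_k$ reappears beyond any given position and some $u_k$ lies within $1/j$ of the target.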
CONTINUITY
15.14. Definition. Let $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ be any topological spaces, let $x_0 \in X$, and let $f : X \to Y$ be a function. Then the following conditions are equivalent; if any (hence all) are satisfied, we say $f$ is continuous at the point $x_0$.
(A) $f$ is "convergence-preserving" at $x_0$. That is, whenever $(x_\alpha)$ is a net converging to a limit $x_0$ in $X$, then also $f(x_\alpha) \to f(x_0)$ in $Y$. (Compare this with condition 15.14(E), which is intuitively similar but removes the concept of "time" from our convergence.)
(B) Whenever $\mathcal{B}$ is a filterbase converging to a limit $x_0$ in $X$, then the filterbase $f(\mathcal{B}) = \{f(B) : B \in \mathcal{B}\}$ converges to the limit $f(x_0)$ in $Y$.
(C) The inverse image of each neighborhood of $f(x_0)$ is a neighborhood of $x_0$.
If the topologies on $X$ and $Y$ are given by gauges $D$ and $E$, then an equivalent condition is:
(D) For each pseudometric $e \in E$ and each number $\varepsilon > 0$, there exists some finite set $D' \subseteq D$ and some number $\delta > 0$ such that, for all $x \in X$,
$$\max_{d \in D'} d(x_0, x) < \delta \ \Rightarrow \ e(f(x_0), f(x)) < \varepsilon.$$
We emphasize that the choice of $\delta$ and $D'$ may depend on all of $\varepsilon$, $e$, and $x_0$, but not on $x$; this should be contrasted with the definition of uniform continuity in 18.8(C). Of course, the preceding condition simplifies slightly if $X$ is a pseudometric space with singleton gauge $D = \{d\}$ - or, more generally, if $D$ is a gauge that is directed (as defined in 4.4.c).
If $X = Y = \mathbb{R}$, and $\mathbb{H} = {}^*\mathbb{R}$ is the hyperreal line constructed as in 10.20.a, then the conditions above are also equivalent to this condition:
(E) Whenever $\xi$ is a hyperreal number that is infinitely close to $x_0$ (defined in 10.18.c), then ${}^*f(\xi)$ is infinitely close to $f(x_0)$. (Here ${}^*f : \mathbb{H} \to \mathbb{H}$ is defined as in 9.49. Compare this condition with 15.14(A).)
15.15. Let $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ be any topological spaces, and let $f : X \to Y$ be a function. Then the following conditions are equivalent; if any (hence all) are satisfied, we say $f$ is continuous.
(A) Inverse images of open sets are open; that is, $T \in \mathcal{T} \Rightarrow f^{-1}(T) \in \mathcal{S}$. (This definition of continuity was used in 9.8.)
(B) The inverse image of each closed set is closed.
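On finite spaces the two criteria (A) and (B) can be compared directly. The following sketch is illustrative only: the spaces `X`, `Y`, their topologies `S_X`, `T_Y`, and the maps `f`, `g` are hypothetical examples chosen for this purpose, with functions represented as dictionaries.

```python
X = frozenset({0, 1, 2})
S_X = {frozenset(), frozenset({0}), frozenset({0, 1}), X}  # a topology on X
Y = frozenset({"a", "b"})
T_Y = {frozenset(), frozenset({"a"}), Y}                    # a topology on Y


def preimage(f, T):
    """Inverse image of T under the map f (a dict from X to Y)."""
    return frozenset(x for x in X if f[x] in T)


def open_criterion(f):
    """Condition (A): inverse images of open sets are open."""
    return all(preimage(f, T) in S_X for T in T_Y)


def closed_criterion(f):
    """Condition (B): inverse images of closed sets are closed."""
    closed_X = {X - G for G in S_X}
    closed_Y = {Y - G for G in T_Y}
    return all(preimage(f, C) in closed_X for C in closed_Y)


f = {0: "a", 1: "a", 2: "b"}  # the preimage of {"a"} is {0, 1}, which is open
g = {0: "b", 1: "a", 2: "a"}  # the preimage of {"a"} is {1, 2}, which is not open
```

Both criteria accept `f` and both reject `g`. They must agree, since taking complements commutes with taking inverse images.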
MORE ABOUT INITIAL AND PRODUCT TOPOLOGIES
15.24. Convergence in initial topologies. Let $(X, \mathcal{S})$ have the initial topology determined by some mappings $\varphi_\lambda : X \to (Y_\lambda, \mathcal{T}_\lambda)$ - that is, suppose $\mathcal{S}$ is the weakest topology that makes all the $\varphi_\lambda$'s continuous (see 9.15 and 9.16). (It is also sometimes known as the weak topology.) Show that
a. A set $N$ ...
... $= \lambda \circ \varphi : X \to S$ for any ...
Show that $S^\varphi$ is continuous, when $S^Y$ and $S^X$ are equipped with their product topologies. Hint: 15.25.b.
e. Let $\Omega$ be a set. Identify each set $S \subseteq \Omega$ with its characteristic function $1_S : \Omega \to \{0, 1\}$; then the $1_S$'s are members of $2^\Omega$. Show that the order convergence of sets $S_\alpha$ described in 7.48 is the same as the convergence of the $1_{S_\alpha}$'s given by the product topology on $2^\Omega$ (where $2 = \{0, 1\}$ has the discrete topology, as usual).
15.27. Theorem. If $\{X_\alpha : \alpha \in A\}$ is a collection of separable topological spaces with $\operatorname{card}(A) \le \operatorname{card}(\mathbb{R})$, then $\prod_{\alpha \in A} X_\alpha$ (with the product topology) is separable.
Proof. The product topology is not affected if we replace the index set $A$ with another index set of the same cardinality; hence we may assume $A \subseteq \mathbb{R}$. Let $P = \prod_{\alpha \in A} X_\alpha$. For each $\alpha \in A$, let $x^\alpha_1, x^\alpha_2, x^\alpha_3, \ldots$ be a dense sequence in $X_\alpha$. Let $\mathcal{J}$ be the collection of all closed subintervals of $\mathbb{R}$ that have rational endpoints and positive, finite length. For each positive integer $m$, each finite sequence $J_1, J_2, \ldots, J_m$ of disjoint members of $\mathcal{J}$, each finite sequence $n_1, n_2, \ldots, n_m$ of positive integers, and each $\alpha \in A$, define
$$p_{J_1, \ldots, J_m, n_1, \ldots, n_m}(\alpha) = \begin{cases} x^\alpha_{n_1} & \text{if } \alpha \in J_1 \\ x^\alpha_{n_2} & \text{if } \alpha \in J_2 \\ \ \vdots \\ x^\alpha_{n_m} & \text{if } \alpha \in J_m \\ x^\alpha_1 & \text{if } \alpha \in A \setminus \bigcup_{j=1}^m J_j. \end{cases}$$
The function $p_{J_1, \ldots, J_m, n_1, \ldots, n_m}$ maps each $\alpha$ to some member of $X_\alpha$, so $p_{J_1, \ldots, J_m, n_1, \ldots, n_m}$ is actually a member of $P$. There are only countably many such functions $p$, since $\mathcal{J}$ and $\mathbb{N}$ are countable. It suffices to show the functions $p$ are dense in $P$. Let any nonempty open set $G \subseteq P$ be given; it suffices to show $G$ contains one of the functions $p$. For each $\beta \in A$, let $\pi_\beta : P \to X_\beta$ be the $\beta$th coordinate projection; by 15.26.a we know that $G \supseteq \bigcap_{j=1}^m \pi_{\alpha_j}^{-1}(V_j)$ for some distinct numbers $\alpha_1, \ldots, \alpha_m \in A$ and some nonempty open sets $V_j \subseteq X_{\alpha_j}$. Choose disjoint sets $J_1, J_2, \ldots, J_m \in \mathcal{J}$ such that $\alpha_j \in J_j$, and choose some numbers $n_j$ such that $x^{\alpha_j}_{n_j} \in V_j$. Unwinding all the notation, verify that $p_{J_1, \ldots, J_m, n_1, \ldots, n_m}$ is a member of $G$. This proof follows Willard [1970]; further references are also given by Willard.
15.28. Let $X = \prod_{\lambda \in \Lambda} Y_\lambda$ be a product of topological spaces, equipped with the product topology. A point in $X$ may be seen as an "ordered $\Lambda$-tuple" $(x_\alpha, x_\beta, x_\gamma, \ldots)$ (see 1.32). Hence a mapping $h : X \to Z$, from $X$ into some other topological space $Z$, may be written as $z = h(x_\alpha, x_\beta, x_\gamma, \ldots)$. Ordinary continuity from $X$ (with the product topology) to $Z$ is sometimes called joint continuity, to emphasize that the variables $x_\alpha, x_\beta, x_\gamma, \ldots$ are being considered together, not separately. A slightly weaker condition is separate continuity: we say that the mapping $z = h(x_\alpha, x_\beta, x_\gamma, \ldots)$ is separately continuous if $z$ is continuous as a function of each one of the arguments $x_\lambda$ whenever all the other arguments are held fixed.
Examples.
a. Let $(X, \mathcal{T})$ be a topological space, and let $d : X \times X \to [0, +\infty)$ be a pseudometric on $X$ (not necessarily associated with the topology $\mathcal{T}$). Let $X \times X$ have the product topology, and let $\mathbb{R}$ have its usual topology. Then $d$ is separately continuous from $X \times X$ into $\mathbb{R}$ if and only if $d$ is jointly continuous. (Hint: 2.12.e.) Thus, the phrase "a continuous pseudometric" is not really ambiguous.
b. Define $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ by $f(x, y) = \ldots$ when $(x, y) \neq (0, 0)$, and $f(x, y) = \ldots$ when $(x, y) = (0, 0)$. Show $f$ is separately continuous but not jointly continuous. Hints: 15.14(A) and 15.25.b.
c. In 1.17 we defined the extended real numbers and their arithmetic operations. For many purposes - particularly in the theory of measure and integration - it is convenient to define the product of 0 and $\pm\infty$ to be 0. That causes confusion for some students, because it seems to be contrary to what they would expect from their experience with calculus. We shall now take a closer look at this.
Most of the multiplication rules would make multiplication a jointly continuous operation from $[-\infty, +\infty] \times [-\infty, +\infty]$ into $[-\infty, +\infty]$. That is, if $(x_\alpha)$ and $(y_\alpha)$ are nets converging to some limits $x$ and $y$, then $x_\alpha y_\alpha \to xy$. For instance, if $x_\alpha \to 3$ and $y_\alpha \to +\infty$, then $x_\alpha y_\alpha \to 3 \cdot (+\infty) = +\infty$. This behavior is very reassuring: it tells us that $\pm\infty$ are very much like ordinary real numbers. The only exceptions are when we multiply 0 times $\pm\infty$. If $x_\alpha \to 0$ and $y_\alpha \to \pm\infty$, then the product $x_\alpha y_\alpha$ could converge to anything or not converge at all. For instance, take the directed set to be $\mathbb{N}$, so that our nets are sequences. Then
$$\frac{1}{n} \cdot n^2 \to +\infty, \qquad \frac{1}{n} \cdot n \to 1, \qquad \frac{1}{n^2} \cdot n \to 0,$$
and $\left(\frac{1}{n} \sin n\right) \cdot n$ does not converge at all. This state of affairs can be summarized as follows: Multiplication, considered as a binary operation on $[-\infty, +\infty]$, is jointly continuous everywhere except at the ordered pairs $(0, \pm\infty)$ and $(\pm\infty, 0)$. It cannot be made jointly continuous at those ordered pairs, no matter how we define the products at those pairs.
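Still, for some purposes in the theory of measure and integration (not involving limits of this sort), it is convenient to define $0 \cdot \infty = 0$ and accept multiplication as an operation that is discontinuous at that ordered pair.
15.29. A topological equivalent of choice (optional). We shall show that AC (introduced in 6.12, 6.20, and 6.22) is equivalent to the following assertion, from Schechter [1992]:
(AC19) Product of Closures. For each $\lambda$ in some index set $\Lambda$, let $S_\lambda$ be a subset of some topological space $X_\lambda$. Then $\operatorname{cl}\left(\prod_{\lambda \in \Lambda} S_\lambda\right) = \prod_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda)$. In this equation, the first "cl" denotes closure in the product topology on $\prod_{\lambda \in \Lambda} X_\lambda$, while the second "cl" denotes closure in $X_\lambda$.
Actually, the inclusion $\operatorname{cl}\left(\prod_{\lambda \in \Lambda} S_\lambda\right) \subseteq \prod_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda)$ is provable in ZF (i.e., without AC). To see this, just note that $\prod_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda) = \bigcap_{\lambda \in \Lambda} \pi_\lambda^{-1}(\operatorname{cl}(S_\lambda))$ is closed (where $\pi_\lambda$ is the $\lambda$th coordinate projection). Thus, it remains to show that the inclusion
(AC20) $\quad \operatorname{cl}\left(\prod_{\lambda \in \Lambda} S_\lambda\right) \supseteq \prod_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda)$
is equivalent to AC. Refer to 6.12.
To prove (AC3) $\Rightarrow$ (AC20), let any $f \in \prod_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda)$ be given; we wish to show that $f \in \operatorname{cl}\left(\prod_{\lambda \in \Lambda} S_\lambda\right)$. It suffices to show that $\prod_{\lambda \in \Lambda} S_\lambda$ meets every neighborhood of $f$. Let $G$ be any neighborhood of $f$; then $f \in \prod_{\lambda \in \Lambda} G_\lambda \subseteq G$ where $G_\lambda$ is some open subset of $X_\lambda$. Since $f(\lambda) \in \operatorname{cl}(S_\lambda)$, the set $S_\lambda$ meets every neighborhood of $f(\lambda)$ in $X_\lambda$. Thus the set $G_\lambda \cap S_\lambda$ is nonempty. Choose any element $z_\lambda \in G_\lambda \cap S_\lambda$. Now the function $z$, defined by $z(\lambda) = z_\lambda$, is an element of $G \cap \prod_{\lambda \in \Lambda} S_\lambda$.
To prove (AC20) $\Rightarrow$ (AC3), let $\Lambda, S_\lambda, \xi_\lambda, X_\lambda, X, \xi$ be as in 6.24. Let $X_\lambda$ be equipped with the indiscrete topology, i.e., in which the only open sets are $\varnothing$ and $X_\lambda$. Then $\xi_\lambda \in X_\lambda = \operatorname{cl}(S_\lambda)$. Hence the function $\xi$ is an element of $\prod_{\lambda \in \Lambda} \operatorname{cl}(S_\lambda)$. Now apply (AC20); this tells us $\operatorname{cl}\left(\prod_{\lambda \in \Lambda} S_\lambda\right)$ is nonempty, and therefore the set $\prod_{\lambda \in \Lambda} S_\lambda$ is nonempty.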
QUOTIENT TOPOLOGIES

15.30. Definition. Let $(X, \mathcal{S})$ be a topological space, let $Q$ be a set, and let $\pi : X \to Q$ be a surjective mapping. The resulting quotient topology (or identification topology) on $Q$ is defined to be

$\mathcal{T} = \{T \subseteq Q : \pi^{-1}(T) \in \mathcal{S}\}.$

We saw in 5.40.b that this collection $\mathcal{T}$ is a topology on $Q$. (In fact, 5.40.b shows that $\mathcal{T}$ is a topology regardless of whether $\pi$ is surjective, but surjectivity of $\pi$ is part of the definition of a quotient topology.) When $Q$ is equipped with the quotient topology, then $\pi$ will be called a topological quotient map (or topological identification map). The terminology stems from the fact that $Q$ is the quotient set of $X$, determined by the mapping $\pi$ (see 3.11). Alternatively, points of $Q$ are obtained by identifying with each other (i.e., merging) those points of $X$ that have the same image under $\pi$.

In general, convergence of nets and filters in the quotient topology does not have a simple characterization analogous to that of 15.24.b. A partial result in that direction is given in 22.13.e. Our treatment of quotients is based partly on Dugundji [1966].

15.31. Basic properties of the quotient topology.

a. Let $\pi : X \to Y$ be a topological quotient map. Then a set $T$ is open in $Y$ if and only if $\pi^{-1}(T)$ is open in $X$. (This is just a restatement of the definition.)

b. Let $\pi : X \to Y$ be a topological quotient map. Then a set $T$ is closed in $Y$ if and only if $\pi^{-1}(T)$ is closed in $X$.
c. (Composition property.) If $\pi : X \to Q$ is a topological quotient map and $g : Q \to Z$ is some mapping such that the composition $g \circ \pi : X \to Z$ is continuous, then $g$ is continuous. In fact, a continuous surjective map $\pi : X \to Q$ is a topological quotient map if and only if it has that composition property. For this reason the quotient topology is sometimes called the final topology: it has some properties analogous to the initial topology (introduced in 9.15 and 9.16), but with the arrows reversed.

d. Let $X$ be a topological space and let $\pi : X \to Q$ be a surjective mapping. Then the quotient topology on $Q$ makes $\pi$ continuous. In fact, the quotient topology is the strongest (i.e., largest) topology on $Q$ that makes $\pi$ continuous.

e. Recall that a mapping is open if the forward image of each open set is open, or closed if the forward image of each closed set is closed. Show that if $\pi : X \to Y$ is a continuous surjective map that is either open or closed, then $\pi$ is a topological quotient map. Several of the most important topological quotient maps are open maps (see 16.5 and 22.13.e), but this is not a property of all topological quotient maps.

f. Let $\pi : X \to Q$ be a topological quotient map. Recall from 4.4.e that the $\pi$-saturation of a set $S$ [...]

g. [...] $\pi : X \to Q$ be a surjective mapping that is distance-preserving, i.e., that satisfies $e(\pi(x_1), \pi(x_2)) = d(x_1, x_2)$. Then the mapping $\pi$ is open, closed, and a topological quotient map. More generally, let $(X, D)$ and $(Q, E)$ be gauge spaces, with gauges $D = \{d_\lambda : \lambda \in \Lambda\}$ and $E = \{e_\lambda : \lambda \in \Lambda\}$ parametrized by the same index set $\Lambda$. Suppose $\pi : X \to Q$ is a surjective mapping that is "distance-preserving" in the following sense: $e_\lambda(\pi(x_1), \pi(x_2)) = d_\lambda(x_1, x_2)$ for all $x_1, x_2 \in X$ and $\lambda \in \Lambda$. Then $\pi$ is open, closed, and a topological quotient map. A slight specialization of this result is given in 16.21.
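For finite spaces the quotient topology of 15.30 can be computed by brute force: a subset of $Q$ is declared open exactly when its preimage under $\pi$ is open in $X$. The following is a minimal sketch under stated assumptions; the function name `quotient_topology` and the four-point example space are hypothetical and are not taken from the text.

```python
from itertools import chain, combinations

def quotient_topology(open_sets, pi, Q):
    """Return every subset T of Q whose preimage under pi is open in X.

    open_sets: iterable of subsets of X forming a topology on X
    pi: dict mapping each point of X to a point of Q (assumed surjective)
    """
    opens = {frozenset(s) for s in open_sets}

    def preimage(T):
        return frozenset(x for x in pi if pi[x] in T)

    # Enumerate all subsets of Q (feasible only for small finite Q).
    subsets = chain.from_iterable(
        combinations(sorted(Q), k) for k in range(len(Q) + 1)
    )
    return {frozenset(T) for T in subsets if preimage(T) in opens}

# Hypothetical example: X = {0, 1, 2, 3} with a nested family of open sets;
# pi merges 0 with 1 and 2 with 3.
X_opens = [set(), {0}, {0, 1}, {0, 1, 2}, {0, 1, 2, 3}]
pi = {0: "a", 1: "a", 2: "b", 3: "b"}
T = quotient_topology(X_opens, pi, {"a", "b"})
print(sorted(sorted(t) for t in T))
```

In this example $\{b\}$ is not open in the quotient, because its preimage $\{2, 3\}$ is not open in $X$.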
NEIGHBORHOOD BASES AND TOPOLOGY BASES

15.32. Let $X$ be a topological space (or, more generally, a pretopological space), and let $x \in X$. A base of neighborhoods at $x$, or a neighborhood base at $x$, is any filterbase $\mathcal{B}$ that generates the neighborhood filter $\mathcal{N}(x)$. In other words, it is any collection $\mathcal{B} \subseteq \mathcal{N}(x)$ with the property that every member of $\mathcal{N}(x)$ is a superset of some member of $\mathcal{B}$.

15.33. Examples of neighborhood bases. Let $(X, \mathcal{T})$ be a topological space, and let $x \in X$. Let $\mathcal{N}(x)$ be the neighborhood filter at $x$.

a. Trivially, $\mathcal{N}(x)$ itself is a neighborhood base at $x$.

b. Another neighborhood base at $x$ is given by $\mathcal{N}(x) \cap \mathcal{T}$, the collection of open neighborhoods of $x$. More generally, an open neighborhood base means any neighborhood base, all of whose members are open sets. Thus, it is a neighborhood base $\mathcal{B} \subseteq \mathcal{N}(x) \cap \mathcal{T}$.

c. A closed neighborhood base means a neighborhood base, all of whose members are closed sets. A topological space is called regular if every point has a closed neighborhood base. Regular spaces will be investigated further in 16.13.

Exercise. Every gauge space $(X, D)$ is regular. Hint: Let $B_d$ and $K_d$ denote the open and closed $d$-balls, as in 5.15.g. If $N$ is a neighborhood of $x$, then $N \supseteq \bigcap_{d \in C} B_d(x, r) \supseteq \bigcap_{d \in C} K_d(x, r/2)$ for some finite set $C \subseteq D$ and some $r > 0$.

Other examples of neighborhood bases will be given in Chapters 26 through 28.

15.34. A topological space is first countable if each neighborhood filter $\mathcal{N}(x)$ is generated by some countable filterbase $\mathcal{B}(x)$, i.e., if for each $x$ there is some countable collection $\mathcal{B}(x) \subseteq \mathcal{N}(x)$ such that each member of $\mathcal{N}(x)$ contains some member of $\mathcal{B}(x)$.

As shown by some of the exercises below, in a first countable space, sequential arguments are sufficient for many purposes; nets are very seldom needed. However, the Principle of Countable Choice is needed for many of these sequential arguments, i.e., the proofs may require a sequence of arbitrary choices, since there is no "canonical sequence" analogous to the canonical nets developed in 7.11.

Sequential arguments are also sufficient in a few special situations in spaces that are not first countable: that is the content of the deep theorems 17.50 and 28.36. For a more elementary example of sequences sufficing in a space that is not first countable, consider the characterizations of closures and continuity when $X$ is an infinite set equipped with the cofinite topology (see 5.15.c and 15.9.c).

Exercises.

a. Any pseudometric space $(X, d)$ is a first countable space. A countable, open neighborhood base at $x$ is given by the open balls $B(x, \frac{1}{n}) = \{u \in X : d(u, x) < \frac{1}{n}\}$ for $n \in \mathbb{N}$. In particular, $\mathbb{R}$ is first countable.

Remark. Actually, first countable is only a very slight generalization of pseudometrizable. Most spaces of interest to analysts are subsets of topological vector spaces; among such spaces, first countable is the same as pseudometrizable (see 26.32).
b. In any first countable space $X$, $\mathrm{cl}(S)$ is equal to the sequential closure of $S$, that is, the set $\{x \in X : x$ is a limit of some sequence in $S\}$.

c. In a first countable space, if some subnet of a sequence $(x_m)$ converges to a limit $z$, then some subsequence $(x_{k_n} : n = 1, 2, 3, \ldots)$ also converges to $z$. Hints: Let $\{B_1, B_2, B_3, \ldots\}$ be a neighborhood base at $z$; we may assume $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$. (Why?) Let $k_0 = 0$. Thereafter, show that there exists an integer $k_n$ that satisfies both $k_n > k_{n-1}$ and $x_{k_n} \in B_n$.

d. Let $X$ and $Y$ be topological spaces; assume $X$ is first countable. Then a mapping $p : X \to Y$ is continuous if and only if it preserves sequential convergences, i.e., if and only if it satisfies: $x_n \to x$ in $X$ $\Rightarrow$ $p(x_n) \to p(x)$ in $Y$. This holds regardless of whether $Y$ is first countable. Hints: Assume $p$ preserves sequential convergences. Let $N$ be a neighborhood of $p(x_0)$ in $Y$; we wish to show that $p^{-1}(N)$ is a neighborhood of $x_0$ in $X$. Let $\{B_1, B_2, B_3, \ldots\}$ be a neighborhood base at $x_0$ in $X$; we may assume $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$; we wish to show that $p^{-1}(N)$ contains some $B_j$. Suppose not. Then there exist points $x_j \in B_j \setminus p^{-1}(N)$. The sequence $(x_j)$ converges to $x_0$, hence $p(x_j) \to p(x_0)$, hence for $j$ sufficiently large we have $p(x_j) \in N$, a contradiction.

15.35. Another way to describe topologies is in terms of bases for topologies. Let $(X, \mathcal{T})$ be a topological space, and let $\mathcal{B}$ [...]

[...] existence is asserted by that theorem. Indeed, for a weak ("Brouwerian") counterexample, consider the mysterious "Goldbach number" $r$ described in 10.46. It is a number that is known to be quite close to zero, and in fact it can be approximated as accurately as one may wish, but we do not yet know whether this number is positive, negative, or zero. Use it to define a piecewise-affine function $f$ as in the following diagram. This function is well-defined and continuous, and we can evaluate it with as much accuracy as we may wish. It satisfies $f(0) < 0 < f(3)$.
Finding an exact solution $c \in [0, 3]$ of $f(c) = 0$ would tell us much about $r$: if $c < 1$, then $r > 0$; if $1$ [...]

[Diagram: graph of the piecewise-affine function $f(t)$ defined in terms of $r$.]
$2^{-n}(4 - 1 - 1) = 2^{-n+1} \ge 2^{-m-1+2}.$

This contradicts the fact that $p, q$ both lie in $G = B(\xi, 2^{-m-1})$ and completes the proof of the theorem.
HEREDITARY AND PRODUCTIVE PROPERTIES

16.32. Definitions. A property P is hereditary if, whenever $Y$ is a topological space with property P and $X$ is a subset of $Y$ equipped with the relative topology, then $X$ also has property P. For instance, Hausdorff is a hereditary property, since any subspace of a Hausdorff space is also Hausdorff when equipped with the relative topology.

A property P is productive if, whenever $X = \prod_{\lambda \in \Lambda} Y_\lambda$ is a product of topological spaces and the $Y_\lambda$'s all have property P, then $X$ (equipped with the product topology) also has property P.

A property P is an initial property if, whenever $X$ has the initial topology determined by a collection of mappings $f_\lambda : X \to Y_\lambda$, and the $Y_\lambda$'s are topological spaces with property P, then $X$ also has property P. Note that any initial property is also a hereditary property and a productive property, since the relative and product topologies are special cases of initial topologies.

16.33. Exercises and remarks.

a. All the following separation axioms are initial properties: symmetric, preregular, regular, completely regular. The verification of these facts is a fairly straightforward exercise; we shall omit the details.

b. All the separation axioms $T_n$, for $n = 0, 1, 2, 3, 3.5$, are hereditary and productive properties. In fact, if $X$ has the initial topology determined by a collection of mappings $f_\lambda : X \to Y_\lambda$, and that collection of mappings separates points of $X$, and the $Y_\lambda$'s have one of the properties $T_0$, $T_1$, $T_2$, $T_3$, $T_{3.5}$, then $X$ also has that property.

c. Normalcy and paracompactness are not hereditary; we shall prove that via an example in 17.40.a.

d. Normalcy is not productive; we shall prove that by an example in 17.40.b.

e. Paracompactness is not productive. Indeed, let $X$ be the real line equipped with the topology generated by all sets of the form $\{x \in \mathbb{R} : a \le x < b\}$, for $a, b \in \mathbb{R}$. (This is called the right half-open interval topology, or the lower limit topology.) It can be shown that $X$ is a paracompact Hausdorff space, but $X \times X$ (with the product topology) is not paracompact. (In fact, $X \times X$ is not normal; this gives another proof that normalcy is not productive.) We omit the details of the proof, which can be found in topology books.
Chapter 17
Compactness

17.1. Preview. In $\mathbb{R}^n$, a set is compact if and only if it is closed and bounded. That notion is generalized in this and the next few chapters. The following chart shows relations between some of the main relatives of compactness.
CHARACTERIZATIONS IN TERMS OF CONVERGENCES

17.2. Definition and exercise. Let $(X, \mathcal{T})$ be a topological space. We say that $X$ is compact if any of the following equivalent conditions are satisfied. (Examples will be given later in the chapter.)

(A) Every open cover of $X$ has a finite subcover. That is, if $\mathcal{G} = \{G_\lambda : \lambda \in \Lambda\}$ is a cover of $X$ consisting of open sets, then some finite subcollection $\{G_{\lambda_1}, G_{\lambda_2}, \ldots, G_{\lambda_n}\}$ is also a cover of $X$. (This is the most common definition of compactness in the mathematical literature.)

(B) Every filter subbase consisting of closed subsets of $X$ is fixed. That is, whenever $\mathcal{S} = \{S_\lambda : \lambda \in \Lambda\}$ is a collection of closed subsets of $X$ that has the finite intersection property, then $\bigcap_{\lambda \in \Lambda} S_\lambda$ is nonempty.

(C) Every net in $X$ has a cluster point, i.e., every net has a convergent subnet.

(D) If $\mathcal{F}$ is a proper filter on $X$, then the set $\bigcap_{F \in \mathcal{F}} \mathrm{cl}(F) = \{$cluster points of $\mathcal{F}\}$ is nonempty.

Remarks. In view of 7.19 and 15.38, it does not matter which kind of subnet we use in (C). Also, the equation $\bigcap_{F \in \mathcal{F}} \mathrm{cl}(F) = \{$cluster points of $\mathcal{F}\}$ was noted in 15.38 (regardless of compactness).

Proof of equivalence. The equivalence of (A) and (B) follows from taking complements, i.e., take $G_\lambda = X \setminus S_\lambda$. The equivalence of (C) and (D) is just the correspondence between nets and proper filters (see 7.9, 7.11, and 7.31). For (D) $\Rightarrow$ (B), let $\mathcal{F}$ be the filter generated by $\mathcal{S}$. For (B) $\Rightarrow$ (D), let $\mathcal{S} = \{\mathrm{cl}(F) : F \in \mathcal{F}\}$.
[Chart of relations among compactness properties; recoverable labels: compact metric, topologically complete pseudometric, sequentially compact, locally compact $T_2$, pseudocompact, locally compact regular.]
17.3. More definitions. A subset $K$ of a topological space $Y$ is said to be a compact set if $K$ is a compact space when equipped with the relative topology induced by $Y$. This notion is so important that we shall now reformulate all four of the conditions stated in 17.2; the formulations below are occasionally more convenient than those given in 17.2. A set $K \subseteq Y$ is a compact set if one (hence all) of the following conditions is satisfied.

(A) Whenever $\{G_\lambda : \lambda \in \Lambda\}$ is a collection of open subsets of $Y$ with union containing $K$, then $\bigcup_{\lambda \in L} G_\lambda \supseteq K$ for some finite set $L \subseteq \Lambda$.

(B) Whenever $\{F_\lambda : \lambda \in \Lambda\}$ is a collection of closed subsets of $Y$ such that the collection $\{K \cap F_\lambda : \lambda \in \Lambda\}$ has the finite intersection property, then $\bigcap_{\lambda \in \Lambda} F_\lambda$ meets $K$.

(C) Every net in $K$ has a cluster point in $K$, i.e., every net in $K$ has a subnet that converges to a point in $K$.

(D) If $\mathcal{F}$ is a proper filter on $Y$ and $K \in \mathcal{F}$, then $\mathcal{F}$ has a superfilter that converges to some point in $K$.

Although these conditions refer to the topology of $Y$, they do not actually depend on $Y$, except insofar as it determines the relative topology on $K$. Thus, if $K \subseteq Y \cap Z$, where $Y$ and $Z$ are two topological spaces that determine the same relative topology on $K$, then

$K$ is compact $\iff$ $K$ is a compact subset of $Y$ $\iff$ $K$ is a compact subset of $Z$.
17.4. The preceding definitions of compactness and their proof of equivalence did not require the Axiom of Choice or any weakened form of Choice. Following is another characterization of compactness; this statement is equivalent to the Ultrafilter Principle.
(UF18) Let $X$ be a topological space. Then $X$ is compact if (and only if) every ultrafilter on $X$ converges to some limit, or equivalently, if (and only if) every universal net in $X$ converges to some limit.

In fact, this statement is equivalent to the Ultrafilter Principle with or without the parenthesized "and only if" part. It follows from the definitions of "ultrafilter" and "compact" that if $X$ is compact, then every ultrafilter on $X$ converges; this implication does not require any arbitrary choices and thus is valid in ZF. We assert that our earlier versions of UF are equivalent to the remaining statement that

(*) $X$ is compact if every ultrafilter on $X$ converges.

Indeed, (*) follows easily from (UF3) in 7.24, together with the definition of "compact." A proof that (*) implies (UF1) will be given in 17.22.

17.5. Let $f : \Lambda \to X$ be a mapping from a set $\Lambda$ (without any topology necessarily specified) into a compact Hausdorff space $X$.

(i) Suppose $(\lambda_\alpha)$ is a universal net in $\Lambda$. Then $(f(\lambda_\alpha))$ is a universal net in $X$, which converges to a unique limit. We say that $f$ converges along the universal net $(\lambda_\alpha)$ to that limit.

(ii) Equivalently, let $\mathcal{U}$ be an ultrafilter on $\Lambda$. Then $\{f(U) : U \in \mathcal{U}\}$ is a filterbase on $X$, and the filter it generates is an ultrafilter. That ultrafilter converges to a unique limit in $X$. We say that $f$ converges along the ultrafilter $\mathcal{U}$ to that limit. Let us denote that limit by $\lim_{\mathcal{U}} f$. We may restate its definition: $\lim_{\mathcal{U}} f$ is the unique point in $X$ with the property that each neighborhood $N$ of $\lim_{\mathcal{U}} f$ contains some set $f(U)$ with $U \in \mathcal{U}$. (This notion is discussed also by Bourbaki [1966].) We have $\lim_{\mathcal{U}} f = \lim_{\mathcal{U}} g$ whenever $f$ and $g$ are $\mathcal{U}$-equivalent in the sense of 9.41, and so $\lim_{\mathcal{U}}$ is in fact well defined on the quotient space $X^\Lambda/\mathcal{U}$, i.e., on the ultrapower ${}^*X$.

17.6. (Optional.) The Ultrafilter Principle implies the Hahn-Banach Theorem. We shall show that (UF1) $\Rightarrow$ (HB1). Let $(\Delta, \preccurlyeq)$ be a directed set. Let $\mathcal{D}$ be the filter of eventual subsets of $\Delta$, that is, let

$\mathcal{D} = \{S \subseteq \Delta : S \supseteq \{\delta \in \Delta : \delta \succcurlyeq \delta_0\}$ for some $\delta_0 \in \Delta\}.$

By (UF1), let $\mathcal{U}$ be an ultrafilter on $\Delta$ that extends $\mathcal{D}$. If $f : \Delta \to \mathbb{R}$ is a bounded function, then $f$ may be viewed as a map into the compact Hausdorff space $[a, b]$ for some $a, b \in \mathbb{R}$. Hence we may define $\lim_{\mathcal{U}} f \in \mathbb{R}$ as in 17.5. Obviously the map $f \mapsto \lim_{\mathcal{U}} f$ is positive and linear. It is easy to verify that if $f : \Delta \to \mathbb{R}$ is a bounded function such that the net $(f(\delta) : \delta \in \Delta)$ converges to a limit $L$, then that limit is equal to $\lim_{\mathcal{U}} f$. Thus $\lim_{\mathcal{U}}$ is a Banach limit.

Remark. Actually, Pincus [1972] showed that the Hahn-Banach Theorem is strictly weaker than the Ultrafilter Principle, but the proof of that fact is beyond the scope of this book.
BASIC PROPERTIES OF COMPACTNESS

17.7. Elementary examples and properties.

a. Any finite subset of any topological space is compact. In particular, $\emptyset$ is compact.

b. The union of finitely many compact sets is compact.

c. Let $X$ be a topological space. Then

$\mathcal{J} = \{S \subseteq X : S \subseteq K$ for some compact set $K\}$

is an ideal on $X$. Thus, for some purposes, we may view the members of $\mathcal{J}$ as "small" subsets of $X$, in the sense of 5.3.

d. Let $\mathcal{S}$ and $\mathcal{T}$ be two topologies on a set $X$. Then the weaker topology has more compact sets, or at least as many. That is, if $\mathcal{S} \subseteq \mathcal{T}$, then every $\mathcal{T}$-compact set is also $\mathcal{S}$-compact. It is possible for $\mathcal{S}$ and $\mathcal{T}$ to yield the same collections of compact sets even if $\mathcal{S} \subsetneq \mathcal{T}$; see the second and third examples below.

(i) The discrete topology on $X$ is the strongest topology, so it should have the fewest compact sets. Show that a subset of $X$ is compact for the discrete topology if and only if that subset is finite.

(ii) The indiscrete topology on $X$ is the weakest topology, so it has the most compact sets. In fact, with the indiscrete topology, every subset of $X$ is compact.

(iii) The cofinite topology is strictly stronger than the indiscrete topology (unless $\mathrm{card}(X) < 2$), but the cofinite topology also makes every subset of $X$ compact.

e. If $(x_1, x_2, x_3, \ldots)$ is a sequence converging to a limit $x_0$ in a topological space, then the set $\{x_0, x_1, x_2, x_3, \ldots\}$ is compact. (This result does not generalize to nets.)

f. In any topological space, the intersection of a closed set and a compact set is compact. In a compact topological space, any closed set is compact. In a Hausdorff topological space, any compact set is closed.
g. Any compact preregular space is paracompact (hence normal and completely regular).

Proof. Given an open cover, any finite subcover is a locally finite refinement.

h. The continuous image of a compact set is compact. That is, if $f : X \to Y$ is a continuous map from one topological space into another, and $K \subseteq X$ is compact, then $f(K)$ is compact.

i. Any upper semicontinuous function from a compact set into $[-\infty, +\infty]$ assumes a maximum.

j. Dini's Monotone Convergence Theorem. Let $(g_\alpha : \alpha \in A)$ be a net of continuous functions (or more generally, upper semicontinuous functions) from a compact topological space $X$ into $\mathbb{R}$. Assume that $g_\alpha \downarrow 0$ pointwise, i.e., assume that for each $x \in X$ the net $(g_\alpha(x))$ is decreasing and converges to $0$. Then the convergence is uniform, i.e., $\lim_{\alpha \in A} \sup_{x \in X} g_\alpha(x) = 0$. Hint: Let $\varepsilon > 0$ be given. If none of the closed sets $F_\alpha = \{x \in X : g_\alpha(x) \ge \varepsilon\}$ is empty, show that the collection of $F_\alpha$'s has the finite intersection property.
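Dini's theorem can be illustrated numerically. The following is a minimal sketch, not part of the original text: the sequence $g_n(x) = x^n(1 - x)$ on the compact interval $[0, 1]$ is an illustrative choice (it is continuous, decreasing in $n$, and tends to $0$ pointwise), and the maximum over a finite grid only approximates the supremum. The names `g`, `sup_on_grid`, `grid`, and `sups` are hypothetical.

```python
def g(n, x):
    """Illustrative sequence: continuous on [0, 1], decreasing in n, pointwise limit 0."""
    return x**n * (1 - x)

def sup_on_grid(n, points):
    """Approximate sup over [0, 1] of g(n, .) by the maximum over a finite grid."""
    return max(g(n, x) for x in points)

# A finite sample of the compact interval [0, 1].
grid = [k / 1000 for k in range(1001)]

# Approximate suprema for increasing n; Dini's theorem predicts they tend to 0.
sups = [sup_on_grid(n, grid) for n in (1, 10, 100, 1000)]
print(sups)
```

The printed values decrease toward $0$, consistent with uniform convergence on $[0, 1]$.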
17.8. Proposition. Let $(X, \le)$ be a chain ordered set (for instance, a subset of $[-\infty, +\infty]$), and let $\mathcal{T}$ be the interval topology on $X$ (defined in 5.15.f). Then $(X, \mathcal{T})$ is compact if and only if $(X, \le)$ is order complete. Furthermore, if $(x_\alpha : \alpha \in A)$ is a net in an order complete chain, then $\liminf x_\alpha$ is the smallest cluster point of the net, and $\limsup x_\alpha$ is the largest cluster point of the net.

Proof. First, suppose that $X$ is order complete. It follows easily from 15.42 that $\liminf x_\alpha$ and $\limsup x_\alpha$ are cluster points of $(x_\alpha)$. It follows from 7.47.a that any other cluster points must lie between those two.

On the other hand, suppose $X$ is not order complete; we shall show $X$ is not compact. Assume $D$ is a nonempty subset of $X$ such that $\sup(D)$ does not exist in $(X, \le)$. Consider $D$ itself as a directed set; we shall show that the inclusion map $i : D \to X$ is a net with no cluster point. To put our notation in a more familiar form, we shall write the net as $(i_\delta : \delta \in D)$, where in fact $i_\delta = \delta$. Consider any $z \in X$; we shall show $z$ cannot be a cluster point of the net. We consider two cases:

(i) $z$ is not an upper bound of $D$. In this case there is some $\delta_0 \in D$ with $\delta_0 > z$. The set $\{x \in X : x < \delta_0\}$ contains $z$ but is not a frequent set for the net $(i_\delta)$, so $z$ is not a cluster point.

(ii) $z$ is an upper bound of $D$, but is not the least upper bound. Thus $D$ has some upper bound $b < z$. Then the set $\{x \in X : x > b\}$ contains $z$ but is not a frequent set for the net $(i_\delta)$, so $z$ is not a cluster point.

17.9. Corollaries.

a. The extended real line $[-\infty, +\infty]$ is compact when equipped with its usual topology. (That topology will be discussed further in 18.24.)

b. A subset of $\mathbb{R}$ is compact if and only if it is closed and bounded. In particular, any interval $[a, b] \subseteq \mathbb{R}$ (where $-\infty < a < b < +\infty$) is compact.

17.10. Compactness and Hausdorff spaces.
a. Let S be a subset of a Hausdorff topological space. Then S is compact if and only if S is closed and S is contained in a compact set.
b. Let S be a subset of a compact Hausdorff space. Then S is compact if and only if S is closed.
c. If $X$ is a compact space, $Y$ is a Hausdorff space, and $f : X \to Y$ is continuous, then $f$ is a closed mapping, i.e., the image of a closed subset of $X$ is a closed subset of $Y$. If, furthermore, $f$ is a bijection, then $f^{-1}$ is also continuous; that is, $f$ is a homeomorphism.

d. No Hausdorff topology on a set can be strictly weaker than a compact topology on that set. In other words, it is not possible for a set to have two topologies $\mathcal{S} \subsetneq \mathcal{T}$ where $\mathcal{S}$ is Hausdorff and $\mathcal{T}$ is compact.
17.11. We shall say that a topological space $X$ is locally compact if each point has a compact neighborhood. Following are some examples.

a. Any compact space is locally compact.

b. Any set with the discrete topology is a locally compact Hausdorff space.

c. $\mathbb{R}$ is locally compact.

Preview of further results. In 17.17 we shall see that $\mathbb{R}^n$ is a locally compact Hausdorff space, when equipped with the product topology. In 27.17 we shall see that no infinite-dimensional Hausdorff topological vector space is locally compact. In 17.14.d we shall see that any locally compact preregular space is completely regular.
REGULARITY AND COMPACTNESS

17.12. If $X$ is a symmetric space and $x \in X$, then $\mathrm{cl}(\{x\})$ is compact.

Proof. Any open cover of $\mathrm{cl}(\{x\})$ has a finite subcover; that is immediate from 16.6(B).

17.13. Let $X$ be a preregular space, and let $K$ be a compact subset of $X$. Then:

a. If $p \in X$ with $\mathrm{cl}(\{p\})$ disjoint from $K$, then $K$ and $\mathrm{cl}(\{p\})$ are contained in disjoint open sets.

Proof. For each $x \in K$, we have $x \notin \mathrm{cl}(\{p\})$. By 16.10, $p$ and $x$ are contained in disjoint open sets $A_x$ and $B_x$, respectively. Then $\{B_x : x \in K\}$ is an open cover of the compact set $K$, so it has a finite subcover; we have $K \subseteq B = \bigcup_{x \in I} B_x$ for some finite set $I \subseteq K$. Then $p$ is in the open set $A = \bigcap_{x \in I} A_x$.

b. If $L$ is closed and compact, and $K$ and $L$ are disjoint, then $K$ and $L$ are contained in disjoint open sets.

Proof. For each $p \in L$, the sets $\mathrm{cl}(\{p\})$ and $K$ are contained in disjoint open sets $A_p$ and $B_p$, respectively. The sets $A_p$ form an open cover of $L$. Choose a finite subcover of the $A_p$'s, and take their union for an open set containing $L$. The intersection of the corresponding $B_p$'s is an open set containing $K$.

c. $\mathrm{cl}(K) = \bigcup_{x \in K} \mathrm{cl}(\{x\})$.

Proof. We have $\mathrm{cl}(K) \supseteq \bigcup_{x \in K} \mathrm{cl}(\{x\})$ if $K$ is any subset of any topological space. For the reverse inclusion, let $p \in \mathrm{cl}(K)$; we wish to show that $p \in \bigcup_{x \in K} \mathrm{cl}(\{x\})$. Since $p \in \mathrm{cl}(K)$, we know that $K$ meets every neighborhood of $p$. Hence by the preceding exercise, $\mathrm{cl}(\{p\})$ is not disjoint from $K$. Say $x \in \mathrm{cl}(\{p\}) \cap K$. By 16.10, then, also $p \in \mathrm{cl}(\{x\})$, as required.

d. If $K$ is contained in an open set $G$, then $\mathrm{cl}(K) \subseteq G$ also.

Proof. Use the preceding exercise and 16.6(B).

e. $\mathrm{cl}(K)$ is compact. More generally, if $K \subseteq T \subseteq \mathrm{cl}(K)$, then $T$ is compact. Hints: Any open cover of $T$ is also an open cover of $K$; use the preceding exercise.

f. If $S \subseteq K$, then $\mathrm{cl}(S)$ is compact.

Proof. $\mathrm{cl}(S) = \mathrm{cl}(S) \cap \mathrm{cl}(K)$ is the intersection of a closed set and a compact set; apply 17.7.f.

17.14. Let $X$ be a locally compact preregular space. Then:

a. (Neighborhoods of points.) Each point has a neighborhood base consisting of closed compact sets (and hence $X$ is regular).

Proof. Any $x \in X$ has a compact neighborhood, hence (by 17.13.e) has a closed compact neighborhood $K$. Then $K$ is a compact preregular space, hence (by 17.7.g) $K$ is a regular space. Hence $x$ has a neighborhood basis in $K$ consisting of closed sets. Since $K$ is a neighborhood of $x$ in $X$, those same sets also form a neighborhood basis for $x$ in $X$. Those sets are also closed and compact in $X$.
b. (Neighborhoods of compact sets.) Let $K \subseteq G$ with $K$ compact and $G$ open in $X$. Then there exists some open set $H$ whose closure is compact, such that $K \subseteq H \subseteq \mathrm{cl}(H) \subseteq G$.

Proof. By 17.14.a, each $x \in K$ is contained in some open set $A_x$ whose closure is a compact set contained in $G$. Then the compact set $K$ has open cover $\{A_x : x \in K\}$, hence some finite subcover $\{A_x : x \in F\}$. Let $H = \bigcup_{x \in F} A_x$. Then $\mathrm{cl}(H) = \bigcup_{x \in F} \mathrm{cl}(A_x)$ by 15.5.b or 16.23.c; hence $\mathrm{cl}(H)$ is a compact subset of $G$.

c. (Local partitions of unity.) Suppose $K \subseteq \bigcup_{j=1}^n G_j$, where $K$ is compact and the $G_j$'s are open. Then there exist continuous functions $\varphi_j : X \to [0, 1]$ such that $\sum_{j=1}^n \varphi_j = 1$ on $K$, and each $\varphi_j$ vanishes outside some compact subset of $G_j$.

Proof. Let $G = \bigcup_{j=1}^n G_j$. By two applications of 17.14.b, we may find open sets $G'$, $G''$ such that $K \subseteq G'' \subseteq \mathrm{cl}(G'') \subseteq G' \subseteq \mathrm{cl}(G') \subseteq G$ and $\mathrm{cl}(G')$, $\mathrm{cl}(G'')$ are compact sets. The set $\mathrm{cl}(G')$, equipped with the relative topology, is compact and a preregular space, hence paracompact (see 17.7.g). Let $T_j = G'' \cap G_j$ for $j = 1, 2, \ldots, n$, and let $T_0 = \mathrm{cl}(G') \setminus K$. These sets are relatively open in $\mathrm{cl}(G')$, and they form a cover of $\mathrm{cl}(G')$. Let $(S_j : j = 0, 1, 2, \ldots, n)$ be a shrinking of $(T_j)$; that is, let the $S_j$'s be another cover of $\mathrm{cl}(G')$ consisting of relatively open sets, such that $\mathrm{cl}(S_j) \subseteq T_j$. Form a partition of unity on $\mathrm{cl}(G')$ that is precisely subordinated to $(S_j)$; say $\varphi_0, \varphi_1, \varphi_2, \ldots, \varphi_n$ are continuous functions from $\mathrm{cl}(G')$ into $[0, 1]$ such that $\sum_{j=0}^n \varphi_j = 1$ and $\varphi_j$ vanishes outside $S_j$. Since $K \subseteq \mathrm{cl}(G') \setminus S_0$, we must have $\sum_{j=1}^n \varphi_j = 1$ on $K$. For $1 \le j \le n$, extend $\varphi_j$ to all of $X$ by defining $\varphi_j = 0$ outside of $\mathrm{cl}(G')$. Note that $\varphi_j$ vanishes outside $\mathrm{cl}(S_j)$, which is a compact subset of $G_j$. It suffices to show that $\varphi_j$ is continuous on $X$. Note that $X$ is the union of the open sets $G'$ and $X \setminus \mathrm{cl}(G'')$, and $\varphi_j$ is continuous on each of those sets, since $\varphi_j$ vanishes on the latter set.
d. Corollary. Any locally compact preregular space is completely regular.

Proof. Let any open set $G$ and any point $x \in G$ be given. Then $K = \mathrm{cl}(\{x\})$ is a compact subset of $G$, by 16.6(B) and 17.12. Apply the preceding exercise with $n = 1$.
17.15. Definition and proposition. Let $S$ be a subset of a topological space $X$. Following are three closely related conditions on $S$:

(A) $\mathrm{cl}(S)$ is compact. (It is then customary to say that $S$ is relatively compact.)

(B) $S$ is a subset of a compact set. (As we noted in 17.7.c, the sets satisfying this condition form an ideal.)

(C) Every net in $S$ (or every proper filter on $X$ that contains $S$) has a cluster point in $X$.

We have the following implications:

In any topological space, (A) $\Rightarrow$ (B) $\Rightarrow$ (C). (Obvious.)

In any preregular space, (B) $\Rightarrow$ (A), and so those two conditions are equivalent. (Proved in 17.13.f.)

In any regular space, (C) $\Rightarrow$ (A), and so all three conditions are equivalent. (Proved in the paragraphs below.)

Proof. Assume regular and (C). Let $\mathcal{G}$ be any proper filter on $X$ with $\mathrm{cl}(S) \in \mathcal{G}$; we must show $\mathcal{G}$ has a cluster point. Assume not; we shall obtain a contradiction. For each $x \in X$, since $x$ is not a cluster point of $\mathcal{G}$, there is some neighborhood $N_x$ of $x$ that is disjoint from some member of $\mathcal{G}$, and hence $X \setminus N_x \in \mathcal{G}$. These conditions are preserved if we replace $N_x$ with a smaller neighborhood of $x$; since $X$ is regular, we may assume $N_x$ is closed. Then the set $G_x = X \setminus N_x$ is open and a member of $\mathcal{G}$. Since $\mathcal{G}$ is closed under finite intersection, for any finite set $A \subseteq X$ the set $\mathrm{cl}(S) \cap \bigcap_{a \in A} G_a$ is a member of $\mathcal{G}$ and hence nonempty. By 5.17.e, the set $S \cap \bigcap_{a \in A} G_a$ is nonempty. Thus the collection of sets $\mathcal{S} = \{S\} \cup \{G_x : x \in X\}$ has the finite intersection property and therefore is contained in a proper filter $\mathcal{F}$. By our assumption on $S$, that filter $\mathcal{F}$ has some cluster point $\xi \in X$. Now $N_\xi$ is a neighborhood of $\xi$, hence $N_\xi$ meets every member of $\mathcal{F}$, hence $N_\xi$ meets $G_\xi$, a contradiction.
TYCHONOV'S THEOREM

17.16. Recall that the Axiom of Choice, in one form, asserts that a product $\prod_{\lambda \in \Lambda} S_\lambda$ of nonempty sets is nonempty (see (AC3) in 6.12). That result bears some resemblance to:

(AC21) Tychonov Product Theorem. Any product $\prod_{\lambda \in \Lambda} Y_\lambda$ of compact topological spaces is compact.

Here it is understood that the product space is equipped with the product topology. In contrast with (AC3), however, the Tychonov Product Theorem does not assert that the product is nonempty. (An empty set is a perfectly acceptable compact topological space!) Thus, it may be surprising that the Tychonov Product Theorem is equivalent to the Axiom of Choice.

Proof of (AC3) $\Rightarrow$ (AC21). We shall make use of (UF18), which we already know to be a consequence of the Axiom of Choice. Let $(x_\alpha : \alpha \in A)$ be a universal net in $X = \prod_{\lambda \in \Lambda} Y_\lambda$; we must show that $(x_\alpha : \alpha \in A)$ is convergent. Let $\pi_\lambda : X \to Y_\lambda$ be the $\lambda$th coordinate projection. The net $(\pi_\lambda(x_\alpha) : \alpha \in A)$ is universal in $Y_\lambda$. Let $S_\lambda = \{y \in Y_\lambda : \pi_\lambda(x_\alpha) \to y\}$. Then each $S_\lambda$ is nonempty. Hence $\prod_{\lambda \in \Lambda} S_\lambda$ is nonempty, by (AC3). If $z \in \prod_{\lambda \in \Lambda} S_\lambda$, then $x_\alpha \to z$.

The proof of (AC21) $\Rightarrow$ (AC3) will be given in 17.20.
17.17. Corollary. Let $n$ be a positive integer, and let $\mathbb{R}^n$ have its product topology. A subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded, where "bounded" has its usual meaning (see 3.15 and 2.12.a). Hence $\mathbb{R}^n$ is locally compact.

17.18. An auxiliary construction. This observation will be used occasionally, e.g., in 26.9 and in 26.10. Let $\Omega$ be an open subset of $\mathbb{R}^n$. Then there exists a sequence $(G_n)$ of open sets whose union is $\Omega$, such that each $G_n$ is contained in a compact subset of $\Omega$. We remark:

a. One way to construct such a sequence $(G_n)$ is as follows: The rational numbers are countable (see 2.20.f). Consider all the open balls $G = B(x, r)$ with the property that $r$ and all the coordinates of $x$ are rational numbers, and the closure of $B(x, r)$ is contained in $\Omega$. There are only countably many such balls; let them be the $G_n$'s.

b. In most applications of such a sequence, the particular choice of the $G_n$'s is not important. Any other such sequence $(H_n)$ will do just as well, because (exercise) each $G_n$ is contained in the union of finitely many of the $H_n$'s, and vice versa.
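The enumeration of rational balls in 17.18 can be made concrete in a small case. The following is only a sketch under stated assumptions: it takes the one-dimensional open set $\Omega = (0, 1)$, and the helper names `reduced_fractions` and `rational_balls` are hypothetical, chosen for this illustration. It lists pairs $(x, r)$ of rationals with $[x - r, x + r] \subseteq (0, 1)$ in stages, so that every admissible ball is reached after finitely many steps.

```python
from fractions import Fraction
from itertools import count, islice

def reduced_fractions(max_den):
    """All rationals in (0, 1) whose reduced denominator is at most max_den."""
    return sorted({Fraction(p, q) for q in range(2, max_den + 1) for p in range(1, q)})

def rational_balls():
    """Enumerate pairs (x, r) of rationals with [x - r, x + r] inside (0, 1).

    Stage n yields exactly the pairs whose larger denominator is n,
    so every admissible pair appears once, after finitely many steps.
    """
    for n in count(2):
        candidates = reduced_fractions(n)
        for x in candidates:
            for r in candidates:
                if max(x.denominator, r.denominator) == n and 0 < x - r and x + r < 1:
                    yield (x, r)

# Take an initial segment of the sequence and look for a ball containing 1/10.
balls = list(islice(rational_balls(), 500))
point = Fraction(1, 10)
covering = [(x, r) for (x, r) in balls if abs(point - x) < r]
print(covering[0])
```

Exact `Fraction` arithmetic is used so that the closure condition is tested without rounding error; the same staging idea extends to $\mathbb{R}^n$ by enumerating rational centers coordinate by coordinate.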
COMPACTNESS AND CHOICE (OPTIONAL)

17.19. Remarks. This subchapter is optional. It is concerned with showing that certain propositions imply either the Axiom of Choice or weakened forms of Choice. Many mathematicians take the viewpoint that the Axiom of Choice is simply "true"; with that viewpoint, this subchapter is of no interest.
17.20. Compactness equivalents of AC. We shall prove that the Axiom of Choice is equivalent, not only to Tychonov's Theorem, but also to several other principles that are seemingly weaker:

(AC22) Any product of compact gauge spaces is compact.

(AC23) Any product of knob spaces is compact.

(AC24) Any product of $T_1$ compact topological spaces is compact.

(AC25) Any product of topological spaces, each equipped with the cofinite topology, is compact.

(Gauge topologies and knob topologies were introduced in 5.15.h and 5.34.c, respectively.)

Intermediate proofs. Any knob space is a compact gauge space, and any space with the cofinite topology is a compact $T_1$ space. Thus, the proofs of (AC21) $\Rightarrow$ (AC22) $\Rightarrow$ (AC23) and (AC21) $\Rightarrow$ (AC24) $\Rightarrow$ (AC25) are obvious.

Proof of (AC23) $\Rightarrow$ (AC3) and (AC25) $\Rightarrow$ (AC3). This argument is from Kelley [1950]. Define $S_\lambda$, etc., as in 6.24. Equip each $Y_\lambda$ with either the knob topology or the cofinite topology. In either case, the set $S_\lambda$ is closed. Hence $\mathcal{F}$ is a filterbase consisting of closed subsets of $X$. By assumption, $X$ is compact; hence the intersection of the members of $\mathcal{F}$ is nonempty, completing the proof indicated in 6.24.
Further remarks. The Axiom of Choice is equivalent to Tychonov's Theorem, if we use any of the usual definitions of compactness, given in 17.2. An alternate approach is taken by Comfort [1968] . Comfort suggests a different definition of compactness, which is more complicated than the usual definitions but has this interesting property: we can prove that the product of Comfort-compact spaces is Comfort-compact without using the Axiom of Choice. But we haven't really eliminated AC; it turns out that AC is equivalent to the statement that a space is compact (in the usual sense) if and only if it is Comfort-compact. 17.21. We have established that the Axiom of Choice is needed to prove Tychonov's Theorem - i.e., that the product of arbitrarily many arbitrary compact sets is compact. But it is not needed for certain weakened forms of Tychonov's Theorem. For instance, arbitrary choices are not needed for:
Tychonov Theorem (finite version). Let $Y_1, Y_2, \ldots, Y_n$ be compact topological spaces. Then $X = Y_1 \times Y_2 \times \cdots \times Y_n$, with the product topology, is also compact.

Proof. It suffices to prove this for $n = 2$, and then apply induction. Let $((u_\alpha, v_\alpha) : \alpha \in A)$ be any net in $Y_1 \times Y_2$. Then $(u_\alpha : \alpha \in A)$ is a net in the compact space $Y_1$; hence it has a convergent subnet. By 7.19, $(u_\alpha : \alpha \in A)$ also has a convergent Kelley subnet $(u_{\alpha(\beta)} : \beta \in \mathbb{B})$. Now, $(v_{\alpha(\beta)} : \beta \in \mathbb{B})$ is a net in the compact space $Y_2$; hence it has a convergent Kelley subnet $(v_{\alpha(\beta(\gamma))} : \gamma \in C)$. Then $((u_{\alpha(\beta(\gamma))}, v_{\alpha(\beta(\gamma))}) : \gamma \in C)$ is a convergent net in $Y_1 \times Y_2$, and it is a subnet of the given net. (Optional exercise: Shorten this proof, using the notational convention of 7.21.)
17.22. Compactness equivalents of UF. The Ultrafilter Principle was introduced in 6.32. We shall now show that it is equivalent to (UF18) (introduced in 17.4) and the following principles.

(UF19) Any product of compact Hausdorff spaces is compact.

(UF20) (Stone-Čech Compactification Theorem.) Let $X$ be a completely regular Hausdorff space. Then there exists a topological space $\beta(X)$ (called the Stone-Čech compactification of $X$) with these properties: (i) $\beta(X)$ is a compact Hausdorff space, (ii) $X$ is a dense subset of $\beta(X)$, and (iii) if $K$ is another compact Hausdorff space and $f : X \to K$ is a continuous map, then $f$ extends uniquely to a continuous map $F : \beta(X) \to K$.

(UF21) Let $2 = \{0, 1\}$ be equipped with the discrete topology. Then for any set $X$, the set $2^X$ (with the product topology) is compact.

Remarks. (UF19) and (UF21) are just (AC21) and (AC23) specialized to the case of Hausdorff spaces. Most topological spaces of interest in applications are Hausdorff, hence most applications of (AC21) or (AC23) actually follow from (UF19) or (UF21).

A set of the form $2^X$, with the product topology, is sometimes known as a Cantor space. However, that name is more often used for the "middle thirds" set $C = C_0 \cap C_1 \cap C_2 \cap \cdots$, where $C_0 = [0, 1]$, $C_1 = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$, $C_2 = [0, \frac{1}{9}] \cup [\frac{2}{9}, \frac{1}{3}] \cup [\frac{2}{3}, \frac{7}{9}] \cup [\frac{8}{9}, 1]$, etc. Actually, it can be proved that the middle thirds set is homeomorphic to $2^{\mathbb{N}}$, but we shall omit the proof.

From property (UF20)(iii) it follows easily that the Stone-Čech compactification is unique up to homeomorphism.
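The stages $C_0, C_1, C_2, \ldots$ of the middle thirds construction mentioned in the remarks of 17.22 are easy to generate mechanically. The sketch below is an illustration only; the function names `next_stage` and `cantor_stage` are hypothetical. It represents each stage as a list of closed intervals with exact rational endpoints and removes the open middle third of every interval to pass to the next stage.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))   # left closed third
        out.append((b - third, b))   # right closed third
    return out

def cantor_stage(n):
    """Return C_n as a sorted list of closed intervals (a, b)."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals

for a, b in cantor_stage(2):
    print(f"[{a}, {b}]")
```

Stage $C_n$ consists of $2^n$ intervals of length $3^{-n}$ each, which is one way to see that the total length of the stages tends to $0$.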
Proof of (UF18) $\Rightarrow$ (UF19). Just modify the proof of (AC3) $\Rightarrow$ (AC21) given in 17.16. If each $Y_\lambda$ is a compact Hausdorff space, then each $S_\lambda$ is a singleton, and so the Axiom of Choice is not needed to prove $\prod_{\lambda \in \Lambda} S_\lambda$ is nonempty.
Proof of (UF19) $\Rightarrow$ (UF20). Let $I = [0, 1]$ and let $C(X, I) = \{$continuous functions from $X$ into $I\}$. Any $x \in X$ determines an evaluation mapping $T_x : C(X, I) \to I$, defined by $T_x(f) = f(x)$ for each $f \in C(X, I)$. Use the fact that $X$ is a completely regular Hausdorff space to show that the mapping $x \mapsto T_x$ from $X$ into $I^{C(X,I)}$ is injective and is a homeomorphism onto its range. Identifying $X$ with its image, we may view $X$ as a subset of $I^{C(X,I)}$. By (UF19), $I^{C(X,I)}$ is a compact Hausdorff space. Let $\beta(X)$ be the closure of $X$ in $I^{C(X,I)}$; then $\beta(X)$ is compact and $X$ is a dense subset. The uniqueness of the extension $F$ follows from the fact that $X$ is dense in $\beta(X)$. To prove the existence of the extension $F$, let any continuous $f : X \to K$ be given. Whenever $\varphi \in C(K, I)$, then $\varphi \circ f \in C(X, I)$. Hence, if $\lambda$ is any mapping from $C(X, I)$ into $I$, then $\varphi \mapsto \lambda(\varphi \circ f)$ is a mapping from $C(K, I)$ into $I$, which we shall denote by $F(\lambda)$. The mapping $\lambda \mapsto F(\lambda)$, from $I^{C(X,I)}$ into $I^{C(K,I)}$, is easily seen to be a continuous extension of $f$. Moreover, $F(\beta(X)) = F(\mathrm{cl}(X)) \subseteq \mathrm{cl}(F(X)) = \mathrm{cl}(f(X)) \subseteq K$.

The equivalence of (UF19) with other forms of UF apparently was first proved by Łoś and Ryll-Nardzewski [1954]. However, the proof given above is based on Mycielski [1964].

Proof of (UF20) $\Rightarrow$ (UF21). Let $\Lambda$ be any set, and let $X = 2^\Lambda$ be equipped with the product topology. Let $\beta(X)$ be its Stone-Čech compactification. Then each coordinate projection $\pi_\lambda : X \to 2$ (for $\lambda \in \Lambda$) extends uniquely to a continuous function $P_\lambda : \beta(X) \to 2$. Define a mapping $P : \beta(X) \to 2^\Lambda$ coordinatewise, so that $\pi_\lambda \circ P = P_\lambda$. Then $P$ is a continuous function from the compact space $\beta(X)$ onto $X$. By 17.7.h, $X$ is compact.

The equivalence of (UF20) with other forms of UF was apparently first announced by Rubin and Scott [1954]; the proof given here is based on Gillman and Jerison [1960].

Proof of (UF21) $\Rightarrow$ (UF1) (based on Mycielski [1964]). Let $X$ be any given set, and let $\mathcal{E}_0$ be a given proper filter on $X$; we wish to show that $\mathcal{E}_0$ is included in an ultrafilter. Let $\Sigma = \{$subsets of $X\}$. Then $\mathcal{P}(\Sigma) = \{$subsets of $\Sigma\}$ may be identified with $2^\Sigma = \{$mappings from $\Sigma$ into $\{0, 1\}\}$, as usual. Any $\mathcal{F} \in \mathcal{P}(\Sigma) = 2^\Sigma$ is a collection of subsets of $X$, and any $S \in \Sigma$ is a subset of $X$. Let $2^\Sigma$ have the product topology; then the $S$th coordinate projection $\pi_S : 2^\Sigma \to 2$, defined by

$$\pi_S(\mathcal{F}) = \begin{cases} 1 & \text{if } S \in \mathcal{F} \\ 0 & \text{if } S \notin \mathcal{F}, \end{cases}$$

is continuous (as with any product topology). Now define the sets

$$D = \{\mathcal{F} \in 2^\Sigma : \mathcal{F} \text{ is a proper filter on } X\},$$
$$E = \{\mathcal{F} \in 2^\Sigma : \mathcal{F} \supseteq \mathcal{E}_0\}, \quad \text{and}$$
$$F_S = \{\mathcal{F} \in 2^\Sigma : S \in \mathcal{F} \text{ or } X \setminus S \in \mathcal{F}\}$$

for each $S \subseteq X$. Show that

$$D = \bigcap_{A, B} \{\mathcal{F} \in 2^\Sigma : \pi_A(\mathcal{F})\,\pi_B(\mathcal{F}) - \pi_{A \cap B}(\mathcal{F}) = \pi_\varnothing(\mathcal{F}) = 0\},$$
$\iff$ $a \in E_{n_j}$ [...] $n_j = n_{2k}$ for some $k$ $\iff$ $j$ is even,

since the functions $\varphi$ and $j \mapsto n_j$ are injective. Thus the sequence $(f_{n_j}(u) : j \in \mathbb{N})$ does not converge. This proof follows the presentation of Wilansky [1970].
17.29. Example of the inadequacy of frequent subnets. In 7.19 and 15.38 we saw that Willard subnets, Kelley subnets, and AA subnets can be used interchangeably for most purposes in topology. We now show that frequent subnets (defined in 7.16.c) cannot be used interchangeably with those other types of subnets.

Let $X$ be any topological space that is compact but not sequentially compact (e.g., the space in 17.28). Let $(x_n)$ be a sequence in $X$ that has no convergent subsequence. Then $(x_n)$ has a convergent subnet $(u_\beta)$. Any frequent subnet of $(x_n)$ is a subsequence (see 7.16.d). Any net that is equivalent to $(u_\beta)$ must have the same limit(s) as $(u_\beta)$. Thus, $(u_\beta)$ is a subnet of $(x_n)$, but it is not equivalent (in the sense of 7. 1'l.c) to a frequent subnet of $(x_n)$.

Other examples of the inadequacy of frequent subnets have been given by Wolk [1982] and other papers cited by Wolk.

17.30. Relations between different kinds of compactness.

a. Any countably compact, first countable space is sequentially compact. (Recall that a topological space is first countable if the neighborhood filter at each point has a countable filter base.) Hint: 15.34.c.

b. Any countably compact space is pseudocompact. Hint: Let $f : X \to \mathbb{R}$ be continuous; consider whether the set $\bigcap_{n=1}^{\infty} \{x \in X : f(x) \ge n\}$ is nonempty.

c. (Optional.) Any paracompact, pseudocompact space is compact.

Proof. Suppose $X$ is paracompact but not compact. Let $\mathcal{G} = \{G_\alpha : \alpha \in A\}$ be an open cover with no finite subcover. Let $\{f_\alpha : \alpha \in A\}$ be a partition of unity that is precisely subordinated to the given cover. Then the sets $H_\alpha = \{x \in X : f_\alpha(x) \ne 0\} \subseteq G_\alpha$ also form an open cover with no finite subcover. Recursively choose a sequence $(x_n)$ in $X$ and a sequence $(\alpha(n))$ in $A$ such that $x_n \notin H_{\alpha(1)} \cup \cdots \cup H_{\alpha(n-1)}$ and $x_n \in H_{\alpha(n)}$. Define

$$g(x) = \sum_{n=1}^{\infty} \frac{n \, f_{\alpha(n)}(x)}{f_{\alpha(n)}(x_n)}.$$

Then $g : X \to \mathbb{R}$ is continuous but $g(x_n) \ge n$, so $X$ is not pseudocompact.
17.31. Remarks. We have considered three types of compactness that can be described in terms of convergences. A topological space is

compact if every net has a convergent subnet;
countably compact if every sequence has a convergent subnet;
sequentially compact if every sequence has a convergent subsequence.

It is easy to see that any compact space is countably compact, and any sequentially compact space is countably compact. In general, no other implications hold between these three kinds of compactness: the example in 17.38 shows that sequential compactness does not imply compactness, and the example in 17.28 shows that compactness does not imply sequential compactness. However, under certain additional hypotheses, all three kinds of compactness are equivalent, as we shall see in 17.33 and 17.51.

Optional. It is interesting to consider a fourth type of compactness. Say that a topological space is supersequentially compact if every net has a convergent subnet that is a sequence. Clearly, this implies the other three kinds of compactness. Supersequential compactness is not necessarily a useful notion; we introduce it only to illustrate certain ideas about subnets. It turns out that supersequential compactness depends, not on the topology of $X$, but only on the cardinality of $X$ and on which definition of "subnet" we use. Let $X$ be a nonempty set. Then:
a. If we use Aarnes-Anden n G17 •
$n \in \mathbb{N}$ and $\sigma = (s_1, s_2, \ldots, s_n) \in S^n$.

For any $\tau \in S^n$, since $\psi$ and $\varphi_\tau$ are continuous, the set $H_\tau$ is an open neighborhood of $\tau$ in $S^n$. Since $S^n$ is compact, it is contained in the union of the sets $H_\tau$ ($\tau \in A_n$), for some finite set $A_n \subseteq S^n$. Let $\Phi_n = \{\varphi_\tau : \tau \in A_n\}$. Note that for each positive integer $n$ and each $\sigma \in S^n$, the set $G_\sigma$ meets $\Phi_n$. In fact, $G_\sigma$ meets $\Phi_m$ for every $m \ge n$, since $G_{\sigma'} \subseteq G_\sigma$ whenever $\sigma'$ is an extension of $\sigma$.

Each $\Phi_n$ is a finite set, which we now arrange in any order. Form a sequence $(g_k)$ by taking the elements of $\Phi_1$, then the elements of $\Phi_2$, then the elements of $\Phi_3$, etc. Then each $G_\sigma$ is a frequent set for the sequence $(g_k)$. Since the $G_\sigma$'s form a neighborhood base $\mathcal{B}$ for $\psi$, the sequence $(g_k)$ has $\psi$ as a cluster point.
17.50. Eberlein-Šmulian Theorem (nonlinear version). Let $S$ be a compact topological space and let $(M, d)$ be a compact metric space. The product $M^S$ will be topologized with the product topology; subsets of $M^S$ will be topologized with the relative topology thereby determined. In particular, this applies to

$$C(S, M) = \{\text{continuous functions from } S \text{ into } M\} \subseteq M^S.$$
Let $\Phi$ [...] $< 1/j$ for all positive integers $m \le j$. This completes the recursive definition. (We do not assert that the sequences $(\varphi_{\alpha(m)})$ and $(s_{\beta(n)})$ are subnets of the given nets $(\varphi_\alpha)$ and $(s_\beta)$.) Now apply (B) to the sequences $(\varphi_{\alpha(m)} : m \in \mathbb{N})$ and $(s_{\beta(n)} : n \in \mathbb{N})$; this proves $p_\infty = q_\infty$.

Proof of (A) $\Rightarrow$ (F). Suppose $\psi \in \mathrm{cl}(\Phi)$, and $\psi$ is discontinuous at some point $s_0 \in S$. Then some net $(\varphi_\alpha)$ in $\Phi$ converges to $\psi$, and some net $(s_\beta)$ in $S$ converges to $s_0$ and satisfies $\psi(s_\beta) \not\to \psi(s_0)$. Since $M$ is compact, by replacing $(s_\beta)$ with a subnet we may assume that $(\psi(s_\beta))$ converges to some limit $z \ne \psi(s_0)$. This contradicts (A). Thus $\mathrm{cl}(\Phi) \subseteq C(S, M)$.

Proof that (F) and (A) together imply (G). For any $s \in S$ and $\varphi \in V$, let $[\varepsilon(s)](\varphi) = \varphi(s)$; thus we define $\varepsilon(s) \in M^V$. Observe that the "evaluation map" $\varepsilon : S \to M^V$ defined in this fashion is continuous. Since $M^V$ is a compact metric space, it is separable, and hence there is some countable set $T \subseteq S$ such that $\varepsilon(T)$ is dense in $\varepsilon(S)$. We now claim that if $g, h \in \mathrm{cl}(V)$ and $g = h$ on $T$, then $g = h$ on $S$. To see this, fix any $s_0 \in S$. There is some net $(t_\beta)$ in $T$ such that $\varepsilon(t_\beta) \to \varepsilon(s_0)$. Replacing $(t_\beta)$ with a subnet, we may also assume $(g(t_\beta))$ and $(h(t_\beta))$ are convergent. By (A), show that $g(s_0) = \lim_\beta g(t_\beta) = \lim_\beta h(t_\beta) = h(s_0)$. This proves our claim. Now, the restriction map $\mathfrak{R} : g \mapsto g|_T$ is continuous from $M^S$ onto $M^T$. By 17.10.c, that map gives a homeomorphism of $\mathrm{cl}(V)$ onto its image, $\mathfrak{R}(\mathrm{cl}(V))$, which is a subset of the metrizable space $M^T$.

Proof of (G) $\Rightarrow$ (E). $M^S$ is compact, and any compact metric space is sequentially compact.

17.51. Corollary. Assume the conditions of the preceding theorem. Then

$\Phi$ is compact $\iff$ $\Phi$ is sequentially compact $\iff$ $\Phi$ is countably compact.

Outline of proof. Countable compactness is always implied by either of the other two kinds of compactness, so we may assume $\Phi$ is countably compact. Then condition (D) of the Eberlein-Šmulian Theorem is satisfied, hence all the conditions of that theorem are satisfied. It now suffices to show $\Phi$ is closed, for then compactness and sequential compactness follow from parts (C) and (E) of the Eberlein-Šmulian Theorem. Let $\psi \in \mathrm{cl}(\Phi)$; we wish to show $\psi \in \Phi$. By condition (F) of the theorem, we have $\psi \in C(S, M)$. By 17.49, $\psi$ is a cluster point of some sequence $(\varphi_n)$ in $\Phi$. We know $\mathrm{cl}(\{\varphi_n\})$ is metrizable, by condition (G) of the Eberlein-Šmulian Theorem; hence by 15.34.c, $\psi$ is the limit of some subsequence of $(\varphi_n)$. Since $\Phi$ is Hausdorff and countably compact, any sequence in $\Phi$ that converges must have its unique limit in $\Phi$; hence $\psi \in \Phi$.
Chapter 18
Uniform Spaces

18.1. Preview. We now resume the study of uniform spaces, which we began in Chapters 5 and 9. Our study will also make use of material from Chapters 7 through 17; see especially 16.16. As shown in the following chart, uniform structure fits between topological structure and the structure provided by distances, in its degree of detail of information about objects. Movement from right to left in this table is given by forgetful functors (discussed in 9.34).

    less  <-----  details about the object  ----->  more

    structure:          topological                    uniform                       distances

    typical questions:  Is f continuous?               Is f uniformly continuous?    In metric spaces:
                        Is S compact?                  Is S complete?                Is f nonexpansive?
                        Is S topologically complete?                                 Is S bounded?

                        subset of 2^X                  subset of 2^(X x X)           subset of [0, +oo)^(X x X)

    class of objects, from broader (top) to narrower (bottom):
                        topology                       quasi-uniformity              quasigauge
                        completely regular topology    uniformity                    gauge (a set of pseudometrics)
                        pseudometrizable topology      pseudometrizable uniformity   pseudometric
                        metrizable topology            metrizable uniformity         metric
These functors are not inclusions of subcategories in categories, because the maps are not injective. For instance, each gauge uniquely determines a uniformity. We may "forget" which gauge determined the uniformity: different gauges on a set may determine the same or different uniformities. Similarly, each uniformity uniquely determines a completely regular topology. We may "forget" which uniformity determined the topology: different uniformities on a set may determine the same or different completely regular topologies on that set.

Each category in the table is a full subcategory of the category above it. (Full subcategories were introduced in 9.5.) For instance, completely regular topological spaces are a full subcategory of topological spaces; in either of these two categories the morphisms are the continuous maps.

LIPSCHITZ MAPPINGS

18.2. Definitions. Let $(X, d)$ and $(Y, e)$ be metric spaces. A mapping $p : X \to Y$ is said to be Lipschitz, or Lipschitzian, if $e(p(x_1), p(x_2)) \le \kappa \, d(x_1, x_2)$ for some finite constant $\kappa$ and all $x_1, x_2 \in X$. The smallest such $\kappa$ is then called the Lipschitz constant of $p$; it is equal to

$$\langle p \rangle_{\mathrm{Lip}} = \sup \left\{ \frac{e(p(x_1), p(x_2))}{d(x_1, x_2)} : x_1, x_2 \in X, \ x_1 \ne x_2 \right\}.$$

The set of all Lipschitz mappings from $(X, d)$ into $(Y, e)$ will be denoted $\mathrm{Lip}(X, Y)$. We say $p$ is nonexpansive if $\langle p \rangle_{\mathrm{Lip}} \le 1$. The mapping is a strict contraction if $\langle p \rangle_{\mathrm{Lip}} < 1$. Caution: This book will not use the term "contraction" by itself. Some mathematicians use that term for nonexpansive mappings; others use it for strict contractions.

18.3. Examples.
a. Let $S$ be a nonempty subset of a metric space $(X, d)$. Then the map $x \mapsto \mathrm{dist}(x, S)$, defined in 4.40, is nonexpansive from $X$ into $\mathbb{R}$. (See 4.41.b.)

b. (This example assumes more knowledge of calculus.) Let $p : \mathbb{R} \to \mathbb{R}$ be continuously differentiable. Then $p$ is Lipschitz if and only if $p'$ is bounded, in which case $\langle p \rangle_{\mathrm{Lip}} = \sup_{x \in \mathbb{R}} |p'(x)|$. (We shall generalize this result in 25.24.)

c. $x \mapsto |x|$ is Lipschitz on $\mathbb{R}$, but not continuously differentiable.

d. (Preview.) Let $p : X \to Y$ be a linear map from one normed vector space to another. Then $p$ is continuous if and only if $p$ is Lipschitzian. (See 23.1.)

e. Suppose $(X, \rho)$ is a metric space, and $f : X \to X$ is a mapping with the property that for each $x \in X$, the set $\{f^n(x) : n = 0, 1, 2, 3, \ldots\}$ is metrically bounded. Then

$$\beta(x, y) = \sup \{\rho(f^n(x), f^n(y)) : n = 0, 1, 2, 3, \ldots\}$$

is a metric on $X$ that is larger than or equal to $\rho$ and makes the mapping $f$ nonexpansive from $(X, \beta)$ into itself. In fact, $\beta$ is the smallest metric on $X$ that has those two properties. (In 19.47.c we shall consider some conditions under which $\beta$ is topologically equivalent to $\rho$.)

18.4. Definitions. Let $(X, d)$ and $(Y, e)$ be metric spaces. For $\alpha > 0$ and mappings $p : X \to Y$, let

$$\langle p \rangle_\alpha = \sup \left\{ \frac{e(p(x_1), p(x_2))}{d(x_1, x_2)^\alpha} : x_1, x_2 \in X, \ x_1 \ne x_2 \right\}.$$

We say $p$ is Hölder continuous with exponent $\alpha$ if $\langle p \rangle_\alpha < \infty$. We shall denote the class of such functions by $\mathrm{Höl}^\alpha(X, Y)$. Note that $\mathrm{Höl}^1(X, Y) = \mathrm{Lip}(X, Y)$, with $\langle p \rangle_1 = \langle p \rangle_{\mathrm{Lip}}$.
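The Hölder quotient of 18.4 can be explored numerically for real-valued functions of a real variable. The sketch below is an illustration under stated assumptions: the function name `holder_quotient` and the sample grid are hypothetical, and a maximum over a finite sample gives only a lower bound for the supremum defining $\langle p \rangle_\alpha$. For $p(x) = \sqrt{x}$ and $\alpha = 1/2$ the bound is attained (up to rounding) by pairs with one point at $0$, since $|\sqrt{x} - \sqrt{y}| \le \sqrt{|x - y|}$.

```python
import math
from itertools import combinations

def holder_quotient(p, points, alpha):
    """Largest value of |p(x) - p(y)| / |x - y|**alpha over distinct sample points.

    This is a lower bound for the Holder seminorm of p with exponent alpha,
    because the true value is a supremum over all pairs, not just sampled ones.
    """
    return max(abs(p(x) - p(y)) / abs(x - y) ** alpha
               for x, y in combinations(points, 2))

grid = [k / 100 for k in range(0, 401)]           # sample points in [0, 4]

sqrt_half = holder_quotient(math.sqrt, grid, 0.5)  # exponent 1/2
linear_one = holder_quotient(lambda x: 3 * x, grid, 1.0)  # Lipschitz case, alpha = 1

print(sqrt_half, linear_one)
```

With $\alpha = 1$ the same function estimates the Lipschitz constant, in agreement with $\langle p \rangle_1 = \langle p \rangle_{\mathrm{Lip}}$.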
18.5. Examples and exercises.

a. A function $p : X \to Y$ is constant if and only if $\langle p \rangle_\alpha = 0$.

b. Let $X$ and $Y$ both be equal to the set $[0, +\infty)$ equipped with the usual metric $d(x, y) = |x - y|$. Let $\alpha, \beta \in (0, 1]$. Let $p(x) = x^\beta$. Then

$$\langle p \rangle_\alpha = \begin{cases} 1 & \text{if } \beta = \alpha \\ \infty & \text{if } \beta \ne \alpha. \end{cases}$$

Hint: First show that $((u + h)^\beta - u^\beta)/h^\alpha$ is a nonincreasing function of $u$, for $u, h > 0$.

c. Let $p : X \to Y$ and $q : Y \to Z$ be Hölder continuous with exponents $\alpha$ and $\beta$ respectively. Then the composition $q \circ p : X \to Z$ is Hölder continuous with exponent $\alpha\beta$, and in fact $\langle q \circ p \rangle_{\alpha\beta} \le \langle p \rangle_\alpha^\beta \langle q \rangle_\beta$. In particular, the composition of Lipschitzian functions is a Lipschitzian function; we have $\langle q \circ p \rangle_{\mathrm{Lip}} \le \langle p \rangle_{\mathrm{Lip}} \langle q \rangle_{\mathrm{Lip}}$.

d. Let $X$ and $Y$ be metric spaces, with $X$ metrically bounded. For $\alpha > \beta$ and $p : X \to Y$, show that $\langle p \rangle_\beta \le \langle p \rangle_\alpha (\mathrm{diam}(X))^{\alpha - \beta}$ and hence $\mathrm{Höl}^\alpha(X, Y) \subseteq \mathrm{Höl}^\beta(X, Y)$.

e. For $\alpha > 1$, the spaces $\mathrm{Höl}^\alpha(\mathbb{R}, Y)$ are not very interesting, for they contain only constant functions. A hint will be given when we generalize this result slightly in 22.18.e.

18.6. Let $X$ and $Y$ be metric spaces. A function $f : X \to Y$ is called locally Lipschitz if any (hence all) of the following equivalent conditions are satisfied:

(A) $f$ is Lipschitz on a neighborhood of each point.

(B) $f$ is Lipschitz on each compact set.

(C) $f$ is Lipschitz on a neighborhood of each compact set. More precisely, if $K$ is a compact subset of $X$, then there is some number $r > 0$ such that the restriction of $f$ to the open set $\{x \in X : \mathrm{dist}_d(x, K) < r\}$ is Lipschitz.

Exercises.

a. Prove the equivalence. Hints: It suffices to show (A) $\Rightarrow$ (C). Suppose not. Show that there exist sequences $(x_n)$, $(x'_n)$ in $X$ such that $x_n \ne x'_n$, $\mathrm{dist}_d(x_n, K) \to 0$, $\mathrm{dist}_d(x'_n, K) \to 0$, and $e(f(x_n), f(x'_n))/d(x_n, x'_n) \to \infty$. Passing to subsequences, we may assume $(x_n)$ and $(x'_n)$ converge to limits $x_\infty$ and $x'_\infty$ in $K$. Show that $x_\infty = x'_\infty$. Then what?

b. For any open cover of a metric space $X$, there exists a partition of unity subordinated to that cover, consisting of locally Lipschitzian functions. Hints: Let $\mathcal{T} = \{T_\alpha : \alpha \in A\}$ be a locally finite open cover that is subordinated to the given cover (see 16.31 and 16.29). For each $\alpha$, define $f_\alpha : X \to [0, +\infty)$ by $f_\alpha(x) = \mathrm{dist}(x, X \setminus T_\alpha)$. Then define a partition of unity $\{g_\alpha\}$ as in 16.25.c.
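The hint in exercise 18.6.b can be tried out on the real line. The sketch below rests on assumptions that should be checked against 16.25.c: it takes a finite cover of $(-1, 4)$ by open intervals $T_\alpha = (a, b)$, uses $f_\alpha(x) = \mathrm{dist}(x, \mathbb{R} \setminus T_\alpha)$, and normalizes by the sum, $g_\alpha = f_\alpha / \sum_\beta f_\beta$, which is one common way to build a partition of unity and may differ in detail from the construction cited in the text. The names `dist_to_complement` and `partition_of_unity` are hypothetical.

```python
def dist_to_complement(x, interval):
    """dist(x, R minus (a, b)) for an open interval (a, b): zero outside, tent-shaped inside."""
    a, b = interval
    return max(0.0, min(x - a, b - x))

def partition_of_unity(cover):
    """Return functions g_i = f_i / sum_j f_j, where f_i is the distance to the complement of cover[i].

    The quotient is only defined at points of the union of the cover,
    where at least one f_j is strictly positive.
    """
    def make(i):
        def g(x):
            total = sum(dist_to_complement(x, T) for T in cover)
            return dist_to_complement(x, cover[i]) / total
        return g
    return [make(i) for i in range(len(cover))]

cover = [(-1.0, 1.0), (0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]   # open cover of (-1, 4)
gs = partition_of_unity(cover)

for x in (-0.5, 0.3, 1.0, 2.5, 3.9):
    print(x, [round(g(x), 3) for g in gs])
```

Each $f_\alpha$ is nonexpansive (compare 18.3.a), and the denominator is bounded away from zero near any point of the union, which is the reason the resulting $g_\alpha$ are locally Lipschitz.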
c. Let $X$ be a metric space, and let $f : X \to \mathbb{R}$ be continuous. Then $f$ can be approximated uniformly by locally Lipschitz functions, i.e., for any $\varepsilon > 0$ there exists a locally Lipschitz function $g : X \to \mathbb{R}$ satisfying $\sup_{x \in X} |f(x) - g(x)| < \varepsilon$.

Proof. Each $x \in X$ has a neighborhood $N_x$ such that $|f(x) - f(y)| < \varepsilon$ for all $y \in N_x$. Choosing smaller $N_x$'s if necessary, we may assume that the $N_x$'s are all open; then they form an open cover of $X$. Let $\{p_\alpha : \alpha \in A\}$ be a locally Lipschitzian partition of unity that is subordinated to the cover $\{N_x\}$. For each $\alpha \in A$, let $v(\alpha)$ be some member of $X$ such that $\{u \in X : p_\alpha(u) \ne 0\}$ [...]

[...] $\varphi$ $\mathcal{V}$-uniformly also.

Proof. Clearly (A) $\Rightarrow$ (B). To show (B) $\Rightarrow$ (A), we shall take $S = Y \times Y$; that is, we shall consider functions $\varphi : Y \times Y \to Y$. Let $((x_\alpha, x'_\alpha) : \alpha \in A)$ be a net in $Y \times Y$ with eventuality filter equal to $\mathcal{U}$. (For instance, we could use the filter's canonical net; see 7.11.) Let $D$ and $E$ be gauges that determine the uniformities $\mathcal{U}$ and $\mathcal{V}$. Then $D(x_\alpha, x'_\alpha) \to 0$ in $Y$, and (in view of observations in 18.7) it suffices to show that $E(x_\alpha, x'_\alpha) \to 0$ in $Y$. Denote $S = Y \times Y$. Define $\varphi(x, x') = x'$ for all $(x, x') \in S$. Define functions $\varphi_\alpha : S \to Y$ by

$$\varphi_\alpha(x, x') = \begin{cases} x_\alpha & \text{if } x = x_\alpha \text{ and } x' = x'_\alpha \\ x' & \text{otherwise.} \end{cases}$$

For each $d \in D$ and $s \in S$ we have $d(\varphi_\alpha(s), \varphi(s)) \le d(x_\alpha, x'_\alpha)$, which tends to 0 uniformly for all choices of $s \in S$; thus $\varphi_\alpha \to \varphi$ $D$-uniformly on $S$. By assumption (B), then, $\varphi_\alpha \to \varphi$ $E$-uniformly on $S$ as well. Fix any $e \in E$. Then

$$e(x_\alpha, x'_\alpha) = e\bigl(\varphi_\alpha(x_\alpha, x'_\alpha), \varphi(x_\alpha, x'_\alpha)\bigr) \le \sup_{s \in S} e(\varphi_\alpha(s), \varphi(s)),$$

which tends to 0 as $\alpha$ increases. This proves $E(x_\alpha, x'_\alpha) \to 0$.
EQUICONTINUITY

18.28. Let $X$ be a topological space, let $(Y, e)$ be a pseudometric space, and let $\varphi : X \to Y$ be some mapping. Then the oscillation of $\varphi$ at a point $x_0 \in X$ with respect to the pseudometric $e$ is defined to be the number

$$\mathrm{osc}_e(\varphi, x_0) = \inf_{N \in \mathcal{N}(x_0)} \mathrm{diam}_e(\varphi(N)),$$

where $\mathcal{N}(x_0)$ is the neighborhood filter at $x_0$. (We may omit the subscript $e$ if the choice of $e$ is understood.) Thus, the oscillation is a number in $[0, +\infty]$. More generally, if $\Phi$ is a collection of functions from $X$ into $Y$, the oscillation of $\Phi$ at $x_0$ is defined to be the number

$$\mathrm{osc}_e(\Phi, x_0) = \inf_{N \in \mathcal{N}(x_0)} \sup_{\varphi \in \Phi} \mathrm{diam}_e(\varphi(N)).$$

We observe that [...]

(A) Whenever [...] $\to 0$ (in the sense of 18.7(C)), then $E(\varphi_\alpha(x_\alpha), \varphi_\alpha(x'_\alpha)) \to 0$.

(B) For each $V \in \mathcal{V}$, there is some $U \in \mathcal{U}$ such that $(x, x') \in U$, $\varphi \in \Phi$ [...]

(C) For each number $\varepsilon > 0$ and each pseudometric $e \in E$, there exists some number $\delta > 0$ and some finite set $D' \subseteq D$ such that

$$\max_{d \in D'} d(x, x') < \delta \quad \Longrightarrow \quad \sup_{\varphi \in \Phi} e(\varphi(x), \varphi(x')) < \varepsilon.$$

(We emphasize that the choice of $\delta$ and $D'$ depends on $\varepsilon$ and $e$ but not on $x$, $x'$, $\varphi$.)

(D) For each $e \in E$, there exists a finite set $D_e \subseteq D$ and a function $\Gamma_e : [0, +\infty) \to [0, +\infty)$ that is continuous and increasing, and satisfies [...]
[...] is a singleton. In this special case, the argument above establishes the statement (AC26) using just ZF, i.e., set theory without the Axiom of Choice. In particular, AC is not needed to prove that $2^\Lambda$ is complete, where $2 = \{0, 1\}$ has the discrete uniform structure.
Proof of (AC26) $\Rightarrow$ (AC27). As we noted in 19.11.h, every knob space is complete.

Proof of (AC27) $\Rightarrow$ (AC3). Let $\Lambda$, $S_\lambda$, $Y_\lambda$, $X$, $\mathcal{F}$, and the other notation be as in 6.24, and equip $X$ with topology and uniform structure as the product of knob spaces. For each $\lambda \in \Lambda$, the filterbase $\pi_\lambda(\mathcal{F})$ is Cauchy on $Y_\lambda$, since it includes the set $\pi_\lambda(T_{\{\lambda\}}) = S_\lambda$. Since $S_\lambda$ is closed in $Y_\lambda$, any limit of $\pi_\lambda(\mathcal{F})$ must lie in $S_\lambda$. Since each $\pi_\lambda(\mathcal{F})$ is Cauchy, the filterbase $\mathcal{F}$ is Cauchy on $X$. By hypothesis, $X$ is complete, so $\mathcal{F}$ has at least one limit $\zeta$ in $X$. We have $\pi_\lambda(\zeta) \in S_\lambda$ for each $\lambda$; thus $\zeta \in \prod_{\lambda \in \Lambda} S_\lambda$.
TOTAL BOUNDEDNESS AND PRECOMPACTNESS

19.14. Definition. Let $(X, \mathcal{U})$ be a uniform space, and let $D$ be any gauge that determines the uniformity $\mathcal{U}$. A set $S \subseteq X$ is totally bounded if

for each $U \in \mathcal{U}$, there is some finite set $F \subseteq X$ such that $S \subseteq \bigcup_{x \in F} U[x]$

or, equivalently, if

for each number $\varepsilon > 0$ and each $d \in D$, there exists some finite set $F \subseteq X$ such that $S \subseteq \bigcup_{x \in F} B_d(x, \varepsilon)$.

The proof of equivalence of these two definitions is left as an exercise. A substantially different characterization of totally bounded sets will be given in 19.17.
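For a single metric, the second form of Definition 19.14 asks for a finite set of centers whose $\varepsilon$-balls cover $S$. The sketch below illustrates this on a finite sample of $[0, 1]$ with the usual metric; it is an illustration only, the name `greedy_net` is hypothetical, and a greedy selection is just one convenient way to produce such centers, not a method taken from the text.

```python
def greedy_net(points, dist, eps):
    """Pick centers from `points` so that every point is within eps of some center.

    A point becomes a new center only if it is at distance >= eps from all
    centers chosen so far; hence distinct centers are at least eps apart.
    """
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

sample = [k / 1000 for k in range(1001)]              # finite sample of [0, 1]
centers = greedy_net(sample, lambda x, y: abs(x - y), 0.1)

print(len(centers), centers[:4])
```

Because the centers are chosen from the sampled set itself, the example also reflects the variant of the definition in which the finite set $F$ is required to lie inside $S$.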
19.15. Basic properties of total boundedness.

a. We obtain an equivalent definition if we replace "finite set $F \subseteq X$" with "finite set $F \subseteq S$" in either of the conditions above.

b. In a pseudometric space, the definition above simplifies slightly. A set is totally bounded if and only if, for each $\varepsilon > 0$, the set can be covered by finitely many balls of radius $\varepsilon$.

c. If $X$ is a uniform space and $S \subseteq X$, then $S$ is also a uniform space (see 9.20). Show that $S$ is totally bounded, as a subset of $X$, if and only if $S$ is totally bounded as a subset of itself.

d. Let $D$ be a gauge that determines the uniformity $\mathcal{U}$. Then a set $S \subseteq X$ is totally bounded in the uniform space $(X, \mathcal{U}) = (X, D)$ if and only if $S$ is totally bounded in each of the pseudometric spaces $(X, d)$ for $d \in D$. (Hence many questions about total boundedness of uniform spaces can be reduced to questions about total boundedness of pseudometric spaces.)

e. Any finite subset of a uniform space is totally bounded.

f. The totally bounded subsets of a uniform space form an ideal. That is: any subset of a totally bounded set is totally bounded, and the union of finitely many totally bounded sets is totally bounded.

g. If $\{Y_\lambda : \lambda \in \Lambda\}$ is a collection of totally bounded uniform spaces, and $X$ is equipped with an initial uniformity determined by the $Y_\lambda$'s, then $X$ is also totally bounded. In particular, any product of totally bounded uniform spaces is totally bounded, when equipped with the product uniform structure.

More particularly, $2^\Lambda$ is totally bounded for any set $\Lambda$. This argument does not require the use of the Axiom of Choice or any weakened form of Choice; we shall use that observation in a proof in 19.17.

h. Let $S$ [...] $> 0$ and some pseudometric $d \in D$ such that $X$ cannot be covered by finitely many balls $B_d(x, \varepsilon)$, with centers $x \in X$. Let [...]
(B) $p$ extends to a Cauchy continuous function $\bar{p} : X \to Y$;

(C) $p$ is Cauchy continuous from $S$ into $Y$.

Furthermore, if $p$ is uniformly continuous, then so is any continuous extension $\bar{p}$; in fact, any modulus of uniform continuity for $p$ will also be a modulus of uniform continuity for $\bar{p}$.

Proof. The conclusion about uniform continuity follows from 18.10. The implication (B) $\Rightarrow$ (C) is trivial. The implication (A) $\Rightarrow$ (B) follows easily from 19.26. Now assume (C); it suffices to prove (A). Fix any $x \in X$; let $\mathcal{N}(x)$ be its neighborhood filter in $X$. Then $S \cap \mathcal{N}(x) = \{S \cap N : N \in \mathcal{N}(x)\}$ is the neighborhood filter of $x$ in $S$. That filter converges to $x$ in $S$, and therefore that filter is Cauchy. Since $p$ is Cauchy continuous, the filterbase $p(S \cap \mathcal{N}(x)) = \{p(S \cap N) : N \in \mathcal{N}(x)\}$ is Cauchy in $Y$; that is, the filter it generates is Cauchy. Since $Y$ is complete, that filter is convergent. Now we may apply 16.15; this completes the proof.
19.28. Definition. Let $X$ be a complete uniform space, and let $f : [a, b] \to X$ be some function. We say $f$ is piecewise continuous if it satisfies any of these equivalent conditions. (The proof of equivalence uses 19.27.)

(A) $f$ is continuous except at finitely many points and has left- and right-hand limits at those points.

(B) We can form a partition $a = t_0 < t_1 < t_2 < \cdots < t_n = b$ such that $f$ is uniformly continuous on each open interval $(t_{j-1}, t_j)$.

(C) We can form a partition $a = t_0 < t_1 < t_2 < \cdots < t_n = b$ such that $f$ agrees on each open interval $(t_{j-1}, t_j)$ with some $X$-valued function $f_j$ that is continuous on the closed subinterval $[t_{j-1}, t_j]$.
CAUCHY SPACES (OPTIONAL)

19.29. Remarks. Some of the ideas covered in this chapter can be extended to a setting slightly more general than uniform spaces. A Cauchy space is a set $X$ equipped with a collection $\mathcal{C}$ of proper filters on $X$, called the Cauchy filters, which satisfy these axioms:

(i) For each $x \in X$, the ultrafilter fixed at $x$ is Cauchy.

(ii) If $\mathcal{F}$, $\mathcal{G}$ are proper filters, $\mathcal{F}$ is Cauchy, and $\mathcal{F}$ [...]

[...] $> 0$ there exists some $N$ such that $m, n \ge N \Rightarrow |x_m - x_n| < \varepsilon$. Now verify lots of things; the quotient space constructed in the preceding paragraph is a Dedekind complete, chain ordered field, and thus it is $\mathbb{R}$.
19.34. Preliminaries on Kolmogorov quotients. Before continuing to the next two sections, the reader may find it helpful to briefly review sections 16.5 and 16.21, on Kolmogorov quotients. The quotient is formed from a space by "collapsing together" (i.e., identifying) those points that are indistinguishable from one another. It is easy to see that a gauge space is complete if and only if its Kolmogorov quotient is complete, provided that the Kolmogorov quotient is equipped with the gauge determined as in 16.21.
19.35. Lemma. Every pseudometric space has a (distance-preserving) completion. Proof Let (S, d) be a pseudometric space. Let Q be its Kolmogorov quotient; then Q
is a metric space when metrized as in 16.21. The quotient map 1r : S ---+ Q is distance preserving and surjective (but not injective unless S is Hausdorff) . Let C be a distancepreserving Hausdorff completion of Q, formed as in 19.33.a or 19.33.b, and let i : Q --S C be the inclusion map. The composition S � Q __i:__. C is a distance-preserving map into a complete metric space, but in general this map is not injective.
To overcome that drawback, we shall form a new space $X$ that has $C$ as its Kolmogorov quotient - i.e., we shall reverse the process of forming a Kolmogorov quotient. By relabeling if necessary, we may assume $C$ is disjoint from $S$. We define the set $X$ to be $(C \setminus i(Q)) \cup S$. To define the pseudometric of $X$, view $X$ as a modification of $C$, formed by "uncollapsing" the points that were collapsed together by $\pi$. For each $q \in Q$, replace the single point $i(q) \in C$ with a relabeled copy of the set $\pi^{-1}(q) \subseteq S$, all the members of which were separated by distance 0 in $S$ and will be separated by distance 0 in the new space $X$. Points in $C \setminus i(Q)$ are left unaltered in forming the new space $X$. The inclusion $S \subseteq X$ is distance-preserving and injective, with $X$ complete.
19.36. Theorem: Existence of completions of uniform spaces. Every uniform space has a completion. Furthermore, the completion can be given by a distance-preserving inclusion, in the following sense: Let $S$ be a uniform space whose uniform structure is given by a gauge $D$. Then there exists a complete uniform space $X$ with gauge $E$, such that $S$ is a dense subset of $X$ and the members of $D$ are just the restrictions of the members of $E$. If $S$ is Hausdorff, then we may choose $X$ Hausdorff as well.
Proof. The proof may seem long because it involves a great deal of notation; but it is conceptually simple and actually involves very little computation. For each pseudometric $d \in D$, let $(T_d, d)$ be a completion of the pseudometric space $(S, d)$. Here we use the same letter $d$ for the given pseudometric on $S$ and its extension to the larger space. The letter $D$ will be used to represent not only the original gauge, but also the collection of these extensions. Let $Y = \prod_{d \in D} T_d$ be equipped with the product uniform structure; then $Y$ is complete, by (AC26) in 19.13. There may be many gauges on $Y$ that give that product uniform structure; one particularly convenient gauge is formed as follows: For each pseudometric $d$, we define a corresponding pseudometric on $Y$, which we shall denote by $\tilde{d}$, as follows: $\tilde{d}(y, y') = d(\pi_d(y), \pi_d(y'))$, where $\pi_d : Y \to T_d$ is the $d$th coordinate projection. It follows trivially from 18.11(C) that the product uniform structure on $Y$ is given by the gauge $\tilde{D}$ consisting of all such pseudometrics $\tilde{d}$. Define an inclusion $i : S \to Y$ by taking $i(s) = (s, s, s, \ldots)$ - that is, each member of $S$ is mapped to the corresponding constant function. Clearly this map is distance-preserving: $d(s, s') = \tilde{d}(i(s), i(s'))$. The closure of $i(S)$ in $Y$ is a distance-preserving completion of $S$.
If the original uniform space $(S, D)$ is Hausdorff, then the construction above may be modified to yield a Hausdorff distance-preserving completion, as follows: Let $Q$ be the Kolmogorov quotient of $Y$. Then the gauge space $(Q, D)$ is complete and Hausdorff. The quotient map $\pi : Y \to Q$ is not necessarily injective, but its restriction to $i(S)$ is injective. Thus, the closure of $\pi(i(S))$ in $Q$ is a distance-preserving Hausdorff completion of $S$.
19.37. Theorem: Uniqueness of Hausdorff completions. Both of the results below follow easily from 19.27, by an argument similar to the uniqueness proof in 4.38; for the metric space result, use a suitable modulus of uniform continuity.
a. Let $X$ be a Hausdorff uniform space. Then the Hausdorff completion of $X$ is unique
up to isomorphism. In other words, if $i_1 : X \to Y_1$ and $i_2 : X \to Y_2$ are two such completions, then the bijection $i_2 \circ i_1^{-1} : \operatorname{Range}(i_1) \to \operatorname{Range}(i_2)$ extends uniquely to a bijection $\iota : Y_1 \to Y_2$ that is uniformly continuous in both directions.
b. Let $X$ be a metric space. Then the metric completion of $X$ is unique up to isomorphism. In other words, if $i_1 : X \to Y_1$ and $i_2 : X \to Y_2$ are two such completions, then the bijection $i_2 \circ i_1^{-1} : \operatorname{Range}(i_1) \to \operatorname{Range}(i_2)$ extends uniquely to a distance-preserving bijection $\iota : Y_1 \to Y_2$.
19.38. Example and remarks. The Lebesgue space $L^1[0, 1]$, defined in 22.28, is a complete metric space in which $C[0, 1]$ = {continuous scalar-valued functions on $[0, 1]$} is dense; those properties will be proved in 22.30.d and 22.31(i). Thus $L^1[0, 1]$ is the completion of $C[0, 1]$, where the metric used is $d(f, g) = \int_0^1 |f(t) - g(t)|\,dt$. Although we shall prove that fact as a theorem, it could instead be used as a definition of $L^1[0, 1]$. It is perhaps the most elementary definition of $L^1[0, 1]$; it does not require any measure theory. However, that definition has several drawbacks. It depends heavily on the topological structure of the interval $[0, 1]$, and thus it does not generalize readily to the Lebesgue spaces $L^p(\mu)$. Also, it does not give us easy access to the important theorems that sometimes make $L^1[0, 1]$ more useful than $C[0, 1]$ - e.g., theorems such as the Monotone and Dominated Convergence Theorems 21.38(ii) and 22.29. Moreover, viewing $L^1[0, 1]$ as the completion of $C[0, 1]$ does not offer us much insight into the structure of $L^1[0, 1]$: It describes members of that space as equivalence classes of Cauchy sequences of members of $C[0, 1]$, where the definition of "equivalence" is somewhat complicated; or it identifies $L^1[0, 1]$ as a subset of the collection of bounded maps from $C[0, 1]$ into $\mathbb{R}$. We would prefer to view the members of $L^1[0, 1]$ as maps from $[0, 1]$ into $\mathbb{R}$.
We shall follow the usual development of integration theory: We begin with measures and measurable functions (in 9.8, 11.37, and Chapter 21). We use linearity to define the integrals of simple functions; then we take limits to obtain the integrals of other measurable functions. The measure $\mu$ can be defined on any measurable space $\Omega$; the particular topological properties of $[0, 1]$ are not especially relevant in this construction. Two measurable functions from $\Omega$ into the scalars are equivalent if they differ only on a set of measure 0.
The members of $L^p(\mu)$ are equivalence classes of measurable functions whose integrals are not too big - see 22.28. This approach requires an explanation of "measurable function" and "measure 0," but it does not involve Cauchy sequences and ultimately it is more insightful. For most purposes, we can work with any member of an equivalence class, and so we obtain members of $L^1[0, 1]$ as maps from $[0, 1]$ into $\mathbb{R}$.
BANACH'S FIXED POINT THEOREM
19.39. Theorem (Banach, Caccioppoli). If $X$ is a nonempty complete metric space and $f : X \to X$ is a strict contraction, then $f$ has a unique fixed point $\xi$. Moreover, $\xi = \lim_{k \to \infty} f^k(x)$ for every $x \in X$. In fact, we have this estimate of the rate of convergence:
$$d(f^k(x), \xi) \le \frac{\langle f \rangle_{\mathrm{Lip}}^{k}\, d(x, f(x))}{1 - \langle f \rangle_{\mathrm{Lip}}}.$$
Hints: Show $d(f^j(x), f^{j+1}(x)) \le \langle f \rangle_{\mathrm{Lip}}^{j}\, d(x, f(x))$ by induction on $j$. Also,
$$d(f^m(x), f^n(x)) \le \sum_{j=m}^{n-1} d(f^j(x), f^{j+1}(x)) \qquad (m < n)$$
by repeated use of the triangle inequality.
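The convergence estimate can be checked numerically on a concrete contraction. The sketch below is an illustration under stated assumptions, not part of the text: it takes $X = [0, 1]$ with the usual metric and $f = \cos$, which maps $[0, 1]$ into itself and has Lipschitz constant at most $\sin 1 < 1$ there. The helper names `iterate` and `a_priori_bound` are hypothetical.

```python
import math


def iterate(f, x, k):
    """Return f^k(x), the k-th iterate of f starting from x."""
    for _ in range(k):
        x = f(x)
    return x


def a_priori_bound(lip, x, k, f):
    """Right-hand side of the estimate: lip**k * d(x, f(x)) / (1 - lip)."""
    return lip ** k * abs(x - f(x)) / (1.0 - lip)


# Assumed example: cos maps [0, 1] into itself and |cos'| <= sin(1) < 1 there.
f = math.cos
LIP = math.sin(1.0)
x0 = 0.5
xi = iterate(f, x0, 200)   # numerical stand-in for the fixed point

for k in (0, 5, 10, 20):
    err = abs(iterate(f, x0, k) - xi)
    print(k, err, a_priori_bound(LIP, x0, k, f))
```

In each printed row the actual error should not exceed the bound; the bound is usually far from sharp, since $\sin 1$ overestimates the contraction rate near the fixed point.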
Remarks. The Contraction Mapping Theorem is remarkable: It has a short and simple proof, and yet it has many applications; see for instance 19.40.c and 30.9. In some respects it cannot be improved upon; this is made clear by the two converses given in 19.47 and 19.50.
19.40. Exercises.
a. Let $(X, d)$ be a metric space. Let $f : X \to X$ be a strict contraction - or, more generally, let $f$ be a self-mapping of $X$ satisfying
(
d f(x) , f(y) Then
)
0, either $\gamma(t) = 0$ for some $t > 0$ or $\{\gamma(t) : t > 0\}$ contains arbitrarily small positive numbers. In either case, since $\gamma$ is nondecreasing, it follows from $d(f(x), f(y)) \le \gamma(d(x, y))$ that $f$ is continuous. Next we shall show that any orbit $x, f(x), f^2(x), f^3(x), \ldots$ converges. Fix any $x = x_0 \in X$; let $x_n = f^n(x)$ and $c_n = d(x_n, x_{n+1})$. Then $c_n \le \gamma^n[d(x_0, x_1)]$, so $c_n \to 0$. Suppose $(x_n)$ is not Cauchy. Then there exist $\varepsilon > 0$ and integers $m(k)$ and $n(k)$ such that $d(x_{m(k)}, x_{n(k)}) \ge \varepsilon$ and $k \le m(k) < n(k)$ for $k = 1, 2, 3, \ldots$. For each $k$, we may assume that $n(k)$ is chosen as small as possible; hence $d(x_{m(k)}, x_{n(k)-1}) < \varepsilon$. Since 0
,:= cE: , then v(c) 2 T and d(cE:, c) :::; v(cc) - v(c) < E. T
T
b. If A is the limit of that net, then A is a �-upper bound for C. Proof. For any fixed c E C and for all c' >,:= c in C, we have d(c, c') :::; v(c) - v(c') . Take limits as
c' increases; use the fact that v is lower semicontinuous.
Now let e be the collection of all nonempty �-chains. Then e is nonempty since each singleton is a member of e. Use 0 as n ---> for some neighborhood of �-
19.47. Meyers's Converse to the Contraction Mapping Theorem. Let $f$ be a continuous self-mapping of a nonempty, complete metric space $(X, \rho)$. Suppose that $\xi$ is a fixed point of $f$, and $f^n(x) \to \xi$ as $n \to \infty$ for each $x \in X$. Also assume that $f^n \to \xi$ uniformly on some neighborhood of $\xi$ - i.e., assume $\xi$ has some neighborhood $V$ such that
$$\lim_{n \to \infty} \sup_{v \in V} \rho(f^n(v), \xi) = 0.$$
Then there exists a topologically equivalent, complete metric $d$ on $X$ that makes $f$ a strict contraction.
Remarks. This proof is taken from Meyers [1967]. A similar result was discovered independently in Leader [1977]. Both proofs were inspired by the treatment of the compact case given in Janos [1967].
Proof of theorem. The proof is in several steps.
a. By replacing $V$ with a smaller neighborhood of $\xi$, we may assume also that $V$ is open and that $f(V) \subseteq V$. Hints: Certainly the theorem's hypotheses on $V$ remain satisfied if we replace $V$ with any smaller neighborhood of $\xi$. Replacing $V$ with such a neighborhood, we may assume $V$ is open. Now choose $k$ large enough so that $f^k(V) \subseteq V$; then let $W = \bigcap_{j=0}^{k-1} f^{-j}(V)$. The set $W$ has the required properties; we shall relabel it as $V$.
b. Some easy observations: $\bigcup_{n=0}^{\infty} f^{-n}(V) = X$. For integers $n$ (not necessarily positive), let $K_n = \operatorname{cl} f^n(V)$. Show that $f(K_n) \subseteq K_{n+1}$. Also, $K_n \to \xi$ as $n \to \infty$ - that is, any neighborhood of $\xi$ contains $K_n$ for all $n$ sufficiently large. Hence $\operatorname{diam}(K_n) \to 0$ as $n \to \infty$ - if we use the metric $\rho$ or any other metric that is equivalent to $\rho$ - and $\bigcap_{n \in \mathbb{Z}} K_n = \{\xi\}$.
c. By replacing $\rho$ with an equivalent metric that is also topologically complete, we may assume that $f$ is a nonexpansive mapping - i.e., that $\rho(f(x), f(y)) \le \rho(x, y)$. Hints: For any $x, y \in X$, the sequence $(\rho(f^n(x), f^n(y)) : n = 1, 2, 3, \ldots)$ consists of nonnegative numbers converging to 0; hence a maximum exists:
$$\beta(x, y) = \max\{\rho(f^n(x), f^n(y)) : n = 0, 1, 2, \ldots\}.$$
As we noted in 18.3.e, $\beta$ is a metric, uniformly stronger than $\rho$, and $\beta$ makes $f$ nonexpansive. In view of 19.11.i, it suffices to show that $\rho$ is topologically stronger than $\beta$. Let any $x \in X$ and $\varepsilon > 0$ be given; we must find a $\delta > 0$ such that $\rho(x, y) < \delta \Rightarrow \beta(x, y) < \varepsilon$. Choose $N$ large enough so that $\operatorname{diam}_\rho(f^N(V)) < \varepsilon$ and $f^N(x) \in V$. Using the continuity of $f$ in $(X, \rho)$, show that there is some $\delta > 0$ satisfying
$$\rho(x, y) < \delta \quad \Rightarrow \quad \max_{0 \le j \le N} \rho(f^j(x), f^j(y)) < \varepsilon.$$
$\lambda : X \to \mathbb{Z}$ have the property that $\lambda(f(x)) = \lambda(x) - 1$ whenever $f(x) \ne x$ (i.e., whenever $x \ne \xi$).
To show that there exists such a function $\lambda$, define equivalence classes and sketch trees as in 19.48. Choose some representative element $z_S$ from each equivalence class $S$. (This requires some form of the Axiom of Choice, if there are infinitely many equivalence classes.) Define $\lambda(z_S) = 0$ for each equivalence class $S$. After that, $\lambda$ is uniquely determined: add 1 when moving up in the tree, and subtract 1 when moving down in the tree. As in 19.48, we define $f^\infty(x) = \xi$ for all $x \in X$ and define $p, q, z, d$ as in 19.48. In the present application, that yields
$$d(x, y) = \sum_{j=0}^{p-1} 2^{\lambda(x) - j} + \sum_{j=0}^{q-1} 2^{\lambda(y) - j}.$$
These sums converge even if $p$ or $q$ is infinite, so $d$ is a metric. From $\lambda(f(x)) = \lambda(x) - 1$ it follows that $\langle f \rangle_{\mathrm{Lip}} \le \frac{1}{2}$. For arguments below, we note that $q(x, \xi) = 0$, and hence $d(x, \xi) \le \sum_{j=0}^{\infty} 2^{\lambda(x) - j} = 2^{\lambda(x)+1}$. To show that the metric is complete, let $(x_n)$ be a Cauchy sequence; we wish to show that $(x_n)$ converges. If the numbers $\lambda(x_n)$ are not bounded below, then some subsequence $(x_{n_k})$ satisfies $\lambda(x_{n_k}) \to -\infty$ and hence $d(x_{n_k}, \xi) \to 0$; hence $x_n \to \xi$ since $(x_n)$ is Cauchy. Thus we may assume that $\lambda(x_n)$ is bounded below by some finite constant $C$. Whenever $x_m$ and $x_n$ are distinct members of $X$, then at least one of the numbers $p(x_m, x_n)$, $q(x_m, x_n)$ is positive, and so $d(x_m, x_n) \ge 2^C$.
However, $(x_n)$ is Cauchy, so $d(x_m, x_n) < 2^C$ for all $m, n$ sufficiently large. Thus, for all $m, n$ sufficiently large, the points $x_m$ and $x_n$ are not distinct - i.e., the sequence $(x_n)$ is eventually constant and therefore convergent.
19.51. We shall show that the Principle of Dependent Choice, introduced in 6.28, is equivalent to the following principles about complete metric spaces:
(DC3) Dancs-Hegedus-Medvegyev Principle. Let $(X, d)$ be a nonempty, complete metric space. Let $\preccurlyeq$ be a partial ordering on $X$, which is semicontinuous in the following sense: For each $x \in X$, the set $F(x) = \{y \in X : y \succcurlyeq x\}$ is closed in the metric space $(X, d)$. Assume also that $d$ and $\preccurlyeq$ satisfy the Picard condition:
whenever $(x_n)$ is a sequence in $X$ with $x_1 \preccurlyeq x_2 \preccurlyeq x_3 \preccurlyeq \cdots$, then $d(x_n, x_{n+1}) \to 0$.
Then $(X, \preccurlyeq)$ has a maximal element.
(DC4) Bronsted's Maximal Principle. Let $(X, d)$ be a nonempty, complete metric space, and suppose $r : X \to [0, +\infty)$ is lower semicontinuous. Define a partial ordering $\preccurlyeq$ on $X$ by:
$x \preccurlyeq y$ if $d(x, y) \le r(x) - r(y)$.
Then $(X, \preccurlyeq)$ has a maximal element.
Remarks. Caristi's Theorem (19.45) follows from Bronsted's Theorem (DC4) by a one-line proof: The maximal point is a fixed point. Also, Bronsted's Theorem follows from Caristi's Theorem by a one-line proof, if we are permitted to use the Axiom of Choice: Just take $f$ to be a suitable choice function. Thus, the two theorems are "equivalent" in a sense used by some mathematicians: Each follows easily from the other, if we are permitted to use conventional set theory (including the Axiom of Choice). However, Brunner [1987 Zeitschr.] has pointed out that the two theorems are not equivalent in the sense of set theory, for Bronsted's Theorem is equivalent to DC (as we shall show), whereas Caristi's Theorem actually follows from just ZF, without DC or any other weakened version of Choice. Further discussion of this and related ideas is given by Manka [1988].
Proof of (DC2) $\Rightarrow$ (DC3). Note that $u \in F(v) \Rightarrow F(u)$
d(xn - l , xn )
Using (DC2), we construct a sequence $(x_n : n \in \mathbb{N})$ satisfying this inequality. By the Picard condition, then, $d(x_n, x_{n-1}) \to 0$; hence $\operatorname{diam}(F(x_n)) \to 0$. From $x_n \in F(x_{n-1})$ we obtain $F(x_n)$
(an )
By a choice sequence of length $n$ we shall mean a finite sequence
$(x(1), x(2), \ldots, x(n))$
that satisfies x(k + 1) ∈ (x(k)) for $k = 1, 2, \ldots, n - 1$. Such a sequence may be viewed as a function from the set $\{1, 2, \ldots, n\}$ into $A$. We shall also consider the empty sequence to be a choice sequence (of length 0); we shall denote it by $\varnothing$. By the "Axiom" of Finite Choice (in 6.14), any choice sequence can be extended to a longer choice sequence; thus there is no maximal choice sequence. Let $X$ be the set of all choice sequences. We observe that $X$ does not contain an infinite $\subseteq$-chain. Indeed, if $\mathcal{C}$ were such a chain, then $\bigcup_{c \in \mathcal{C}} \operatorname{Graph}(c)$ would be the graph of an infinite choice sequence. For each $x \in X$, let $\lambda(x)$ be the length of $x$. Also, define the "immediate truncation function" $f : X \to X$ by
$$f(x) = \begin{cases} (x(1), x(2), \ldots, x(n-1)) & \text{if } x = (x(1), x(2), \ldots, x(n)), \\ \varnothing & \text{if } x = \varnothing. \end{cases}$$
Then $\lambda(f(x)) = \lambda(x) - 1$ when $x$ is not the empty sequence. Let $\Lambda(x) = 2^{-\lambda(x)}$. Define equivalence and the functions $p, q, z, d$ as in 19.48. Note that every point in $X$ is equivalent to the empty sequence, and thus $S_\varnothing = X$ is the only equivalence class. Therefore $p$ and $q$ are always finite, the sums in 19.48(q) always converge, and $d$ is a metric on $X$. We restate its formula here:
$$d(x, y) = d(x, z) + d(y, z) = \sum_{j=0}^{p-1} 2^{-\lambda(x)+j} + \sum_{j=0}^{q-1} 2^{-\lambda(y)+j} = \left[2^{-\lambda(z)} - 2^{-\lambda(x)}\right] + \left[2^{-\lambda(z)} - 2^{-\lambda(y)}\right], \qquad (q)$$
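where $p, q$ are the smallest nonnegative integers satisfying $f^p(x) = f^q(y)$, and $z = z(x, y)$ is the common value of $f^p(x)$ and $f^q(y)$. The choice sequence $z = z(x, y)$ is the longest common restriction of $x$ and $y$; it is the empty sequence if $x$ and $y$ begin with different choices. Define the Bronsted ordering $\preccurlyeq$ as in (DC4), using the function $\Lambda(x) = 2^{-\lambda(x)}$. That is, define $x \preccurlyeq y$ to mean $d(x, y) \le 2^{-\lambda(x)} - 2^{-\lambda(y)}$. We claim that the following statements are equivalent:
(A) the sequence $y$ is an extension of $x$ - that is, $z(x, y) = x$;
(B) $d(x, y) = 2^{-\lambda(x)} - 2^{-\lambda(y)}$;
(C) $x \preccurlyeq y$.
Indeed, (A) $\Rightarrow$ (B) follows from (q), and (B) $\Rightarrow$ (C) is trivial. For (C) $\Rightarrow$ (A), suppose that $x \preccurlyeq y$. Then
$$\left[2^{-\lambda(z)} - 2^{-\lambda(x)}\right] + \left[2^{-\lambda(z)} - 2^{-\lambda(y)}\right] = d(x, y) \le 2^{-\lambda(x)} - 2^{-\lambda(y)},$$
so $2 \cdot 2^{-\lambda(z)} \le 2 \cdot 2^{-\lambda(x)}$, hence $\lambda(z) \ge \lambda(x)$. But $z$ is a restriction of $x$; hence in fact $z = x$.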
In particular, we note that
if $y$ is not the empty sequence. From the illustration in 19.48 it is clear that the nearest points to any choice sequence $x$ are the extensions $y$ obtained by adding one more term at the end of sequence $x$, and those sequences satisfy $x = f(y)$; thus their distance from $x$ is $2^{-\lambda(y)} = 2^{-\lambda(x)-1}$. Thus
:r E X,
d ( w , x)
j.
(2)
This property will be preserved if we replace $(x_n)$ by a further subsequence. First consider the case in which $(x_n)$ has some subsequence whose lengths are bounded. Such a subsequence is eventually constant, by (1) and (2); hence it is convergent. We now consider the remaining case, in which $(\lambda(x_n))$ has no bounded subsequence - i.e., the case in which $\lim_{n \to \infty} \lambda(x_n) = \infty$. Replacing $(x_n)$ with a subsequence, we may assume that
$$\lambda(x_n) \ge n + 1 \qquad (3)$$
for all $n$. Let $v_j = z(x_j, x_{j+1})$; that is, $v_j$ is the largest common restriction of $x_j$ and $x_{j+1}$. Then, by (2) and (3) and (q),
$$2^{-j} > d(x_j, x_{j+1}) = \left[2^{-\lambda(v_j)} - 2^{-\lambda(x_j)}\right] + \left[2^{-\lambda(v_j)} - 2^{-\lambda(x_{j+1})}\right] > 2\left[2^{-\lambda(v_j)} - 2^{-j-1}\right].$$
The inequality $2^{-j} > 2\left[2^{-\lambda(v_j)} - 2^{-j-1}\right]$ simplifies to $\lambda(v_j) > j$. Thus $v_j$, the common restriction of $x_j$ and $x_{j+1}$, has length greater than $j$. That is, the functions $v_j, x_j, x_{j+1}$ all have domains that include the set $\{1, 2, \ldots, j\}$, and those functions all agree on that set. Let $w_j$ be the function on $\{1, 2, \ldots, j\}$ obtained by restricting any of $v_j, x_j, x_{j+1}$ to that set. The function $w_j$ is a choice sequence, since it is a restriction of a choice sequence. Then $w_{j+1}$, defined analogously, is an extension of $w_j$, since both these functions are restrictions of $x_{j+1}$. The sequence $w_1, w_2, w_3, \ldots$ forms an infinite $\subseteq$-chain in $X$, a contradiction.
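The metric on finite sequences used in this argument is easy to experiment with. The sketch below is an illustration only, under these assumptions: sequences are modeled as Python tuples over the alphabet {0, 1}, every tuple is treated as a "choice sequence" (the constraint linking consecutive terms is ignored), and the helper names are hypothetical. It computes $d(x, y) = [2^{-\lambda(z)} - 2^{-\lambda(x)}] + [2^{-\lambda(z)} - 2^{-\lambda(y)}]$, where $z$ is the longest common restriction, and compares the ordering $d(x, y) \le 2^{-\lambda(x)} - 2^{-\lambda(y)}$ with the relation "$y$ extends $x$".

```python
from fractions import Fraction
from itertools import product


def common_prefix(x, y):
    """Longest common restriction z(x, y) of two finite sequences."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return x[:n]


def weight(x):
    """Lambda(x) = 2 ** (-length of x), kept exact with Fraction."""
    return Fraction(1, 2 ** len(x))


def dist(x, y):
    """d(x, y) = [L(z) - L(x)] + [L(z) - L(y)] with z the common prefix."""
    z = common_prefix(x, y)
    return (weight(z) - weight(x)) + (weight(z) - weight(y))


def ordering_leq(x, y):
    """The ordering x <= y, i.e. d(x, y) <= L(x) - L(y)."""
    return dist(x, y) <= weight(x) - weight(y)


def is_extension(x, y):
    """True when y extends x (x is a restriction of y)."""
    return y[:len(x)] == x


# All sequences over {0, 1} of length at most 3, including the empty one.
sequences = [s for n in range(4) for s in product((0, 1), repeat=n)]
agree = all(ordering_leq(x, y) == is_extension(x, y)
            for x in sequences for y in sequences)
print(len(sequences), agree)
```

On this small sample the two relations coincide, which is the equivalence of (A) and (C) in the text; exact rational arithmetic is used so that the comparison is not disturbed by rounding.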
Chapter 20 Baire Theory
20.1. Preview. The name "Baire" is, unfortunately, associated with four distinct notions, which can easily be confused:
• sets of the first or second category of Baire;
• Baire spaces;
• sets with the Baire property; and
• Baire sets.
All are introduced in this chapter. The first three of these notions are closely related and will be studied extensively in the following pages. The fourth notion is less important for the purposes of this book and will be introduced briefly in 20.34 mainly to prevent the beginner from confusing Baire sets with the other "Baire" notions. Much of the material in this chapter is taken from Kuratowski [1948], Bourbaki [1966] , Engelking [1977] , Oxtoby [1980] , and Vaughan [1988] .
G-DELTA SETS
20.2. Terminology. In some older topology books, the letters "F" and "G" are reserved for closed sets and open sets, respectively. That convention is no longer widely used. This text does not follow that convention in general, but gives those letters preference whenever convenient. The following related convention is still widely used: The union of countably many closed sets is called an "$F_\sigma$;" the intersection of countably many open sets is called a "$G_\delta$." Similarly, the union of countably many $G_\delta$'s is a $G_{\delta\sigma}$; the intersection of countably many $F_\sigma$'s is an $F_{\sigma\delta}$. The letters F and $\sigma$ come from fermé and somme, French for "closed" and "sum." The letters G and $\delta$ come from Gebiet and Durchschnitt, German for "open set" and "intersection;" see Hocking and Young [1961].
Exercises.
a. The complement of an $F_\sigma$ is a $G_\delta$, and conversely. (Thus any results about $F_\sigma$'s can be restated in terms of $G_\delta$'s, or conversely.)
b. The terms "$F_\delta$" and "$G_\sigma$" are not useful, since the intersection of countably many closed sets is a closed set, etc. Likewise, the terms "$F_{\sigma\sigma}$" and "$G_{\delta\delta}$" are not useful: the union of countably many $F_\sigma$'s is another $F_\sigma$, etc.
c. Any $F_\sigma$ is in fact the union of an increasing sequence of closed sets, and any $G_\delta$ is the intersection of a decreasing sequence of open sets. Hint: If $S = \bigcup_{n=1}^{\infty} K_n$ where the $K_n$'s are closed, then also $S = \bigcup_{n=1}^{\infty} (K_1 \cup K_2 \cup \cdots \cup K_n)$.
d. The intersection of finitely many $F_\sigma$'s is another $F_\sigma$; the union of finitely many $G_\delta$'s is another $G_\delta$. Hint: If $A_1$
x. Hence the set $K = \{x, x_1, x_2, x_3, \ldots\}$ is compact, and $f_n \to f$ uniformly on $K$. By the continuity of $f$ we have $f(x_n) \to f(x)$. Then
$$d(x, f(x)) \le d(x, x_n) + d(x_n, f_n(x_n)) + d(f_n(x_n), f(x_n)) + d(f(x_n), f(x)) \le d(x, x_n) + 0 + \sup_{v \in K} d(f_n(v), f(v)) + d(f(x_n), f(x)) \to 0.$$
This proves our claim. It suffices to exhibit a $G_\delta$ set $\Psi^*$ contained in $\Psi_1 \cap \Psi_u$, for on such a set we have T = T. For $f \in \Psi_1$, the set $F(f)$ = {fixed points of $f$} is nonempty; let $\delta(f) = \operatorname{diam}_d F(f)$. Observe that $\delta(\cdot) = 0$ on $\Psi_0$. Now consider $\Psi_1$ as a topological space, equipped with the relative topology. Next we claim that $\delta(\cdot)$ is continuous at each point of $\Psi_0$. In other words,
for each $\varepsilon > 0$, each $f \in \Psi_0$ has an open neighborhood $V_{f,\varepsilon}$ in $\Psi_1$ on which $\delta(\cdot) \le \varepsilon$.
Indeed, suppose not. Since the given topology on $\Psi$ and on $\Psi_1$ is metrizable, there exists a sequence $(f_n)$ in $\Psi_1$ converging to $f$ with $\delta(f_n) > \varepsilon$. Then there exist $x_n, y_n \in F(f_n)$ with $d(x_n, y_n) > \varepsilon$. By our definition of $\Psi_0$, both the sequences $(x_n)$ and $(y_n)$ must converge to $T(f)$. But then $d(x_n, y_n) \to 0$, a contradiction. This proves the claim. Now observe that $\Psi_\varepsilon = \bigcup_{f \in \Psi_0} V_{f,\varepsilon}$ is an open set in $\Psi_1$ that contains $\Psi_0$, and on which $\delta(\cdot) \le \varepsilon$. Hence $\Psi^* = \bigcap_{n=1}^{\infty} \Psi_{1/n}$ is a $G_\delta$-set in $\Psi_1$ that contains $\Psi_0$, and on which $\delta(\cdot) = 0$. Since $\Psi^*$ is a $G_\delta$-set in $\Psi_1$ and $\Psi_1$ is a $G_\delta$-set in $\Psi$, it follows (easy exercise) that $\Psi^*$ is a $G_\delta$-set in $\Psi$. Now consider any $f \in \Psi^*$. Then $F(f)$ is a nonempty set with diameter 0 (since $\Psi^* \subseteq \delta^{-1}(0) \cap \Psi_1$). Since $X$ is a metric space, $F(f)$ is a singleton; hence $f \in \Psi_u$. Thus $\Psi^* \subseteq \Psi_1 \cap \Psi_u$. This completes the proof.
20.11. Corollary on Nonexpansive Mappings. Let $(X, d)$ be a complete metric space, with the property that
(!) the identity map $i : X \to X$ can be approximated uniformly on $X$ by a sequence of strict contractions $c_n : X \to X$.
Let $\Psi$ be the set of all nonexpansive self-mappings of $X$, equipped with the topology of uniform convergence on $X$. Then there exists a set $\Psi^*$ that is comeager in $\Psi$, such that each $f \in \Psi^*$ has a unique fixed point $T(f) \in X$ and the mapping $T : \Psi^* \to X$ is continuous. (Thus, most nonexpansive self-mappings of $X$ have unique fixed points, which depend continuously on the mappings.)
Remark. Condition (!) is satisfied by bounded metric spaces that are not too irregularly shaped. For instance, it is satisfied if $X$ is a closed bounded subset of a Banach space such that $X - x_0$ is a star set (in the sense of 12.3) for some $x_0 \in X$. Indeed, in that case we can take $c_n(x) = (1 - \frac{1}{n})x + \frac{1}{n}x_0$.
Proof of corollary. We shall apply 20.10. It suffices to show that the set $\Psi_0$, defined as in that theorem, is dense in $\Psi$ under the hypotheses of the present corollary. If $f : X \to X$ is any nonexpansive mapping, then $f$ is uniformly approximated by the mappings $c_n \circ f$, which are strict contractions. By 19.41, every strict contraction is a member of $\Psi_0$.
TOPOLOGICAL COMPLETENESS
20.12. A topological space $(X, \mathcal{T})$ is topologically complete (or completely metrizable) if its topology is pseudometrizable, and at least one of the pseudometrics that yields the topology $\mathcal{T}$ is complete. In describing a topologically complete space, we do not necessarily specify a particular pseudometric. Caution: Some mathematicians apply the term "topologically complete" only to spaces that are metrizable - i.e., Hausdorff.
20.13. Alexandroff-Mazurkiewicz Theorem on Topological Completeness. Let $(X, d)$ be a topologically complete Hausdorff space, and let $S \subseteq X$ have the relative topology. Then $S$ is topologically complete if and only if $S$ is a $G_\delta$ set in $X$ - i.e., the intersection of countably many open subsets of $X$.
Proof. Let $d$ be a complete metric on $X$. We first show that any open set $G \subseteq X$ is topologically complete. Verify that
$$e(s, t) = d(s, t) + \left| \frac{1}{\operatorname{dist}(s, X \setminus G)} - \frac{1}{\operatorname{dist}(t, X \setminus G)} \right| \qquad (s, t \in G)$$
is a complete metric on $G$ that is topologically equivalent to the restriction of $d$. Now suppose $S = \bigcap_{n=1}^{\infty} G_n$ is the intersection of countably many open sets. Then the product $P = \prod_{n=1}^{\infty} G_n$ has a topology that can be given by a complete metric, by 19.13. Let $D$ be the diagonal set $\{(x_n) \in P : x_1 = x_2 = x_3 = \cdots\}$. Then $D$ is a closed subset of $P$ (why?), hence also complete. Finally, the mapping $s \mapsto (s, s, s, \ldots)$ is a homeomorphism from $S$ onto $D$.
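The effect of the extra term in the metric $e$ can be seen on a simple open set. The sketch below is an illustration under stated assumptions, not part of the text: it takes $X = \mathbb{R}$ with the usual metric and $G = (0, 1)$, so that $\operatorname{dist}(s, X \setminus G) = \min(s, 1 - s)$, and looks at the sequence $s_n = 1/n$, which is Cauchy for $d$ but runs off toward the boundary of $G$. The function names are hypothetical.

```python
def dist_to_complement(s):
    """dist(s, X \\ G) for the assumed case G = (0, 1) inside X = R."""
    return min(s, 1.0 - s)


def d(s, t):
    """The usual metric on the real line."""
    return abs(s - t)


def e(s, t):
    """The modified metric: d plus the difference of reciprocal boundary distances."""
    return d(s, t) + abs(1.0 / dist_to_complement(s) - 1.0 / dist_to_complement(t))


# s_n = 1/n for n = 2, ..., 11: Cauchy for d, but it leaves every compact part of G.
seq = [1.0 / n for n in range(2, 12)]
d_gaps = [d(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]
e_gaps = [e(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]
print(d_gaps[-1], e_gaps[-1])
```

Consecutive $d$-gaps shrink toward 0, while every consecutive $e$-gap stays close to 1 or above, so the sequence is not Cauchy for $e$. This is how $e$ can be complete on $G$ even though $d$ restricted to $G$ is not.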
For the converse, suppose that $S \subseteq X$ is topologically complete. Let $e$ be a complete metric on $S$ that is topologically equivalent to the restriction of $d$. The identity map $i : (S, d) \to (S, e)$ is continuous, so by 20.9 it extends to a continuous map $\bar{\imath} : C \to S$, where $C$ is a $G_\delta$-subset of $X$ that contains $S$ and is defined by
$C = \{x \in \operatorname{cl}_X(S) : \text{the filterbase } i(S \cap N(x)) \text{ converges in } S\}$.
It suffices to show that $C \subseteq S$. Let $x_0 \in C$; we wish to show that $x_0 \in S$. Each neighborhood of $x_0$ contains a member of $S \cap N(x_0)$, so $S \cap N(x_0)$ converges to $x_0$. By assumption, $X$ is Hausdorff, so $S \cap N(x_0)$ converges to no other limit. Since $x_0 \in C$, the filterbase $i(S \cap N(x_0))$ converges in $S$. But $i$ is just the identity map, so we have established that the filterbase $S \cap N(x_0)$ converges in $S$. Thus its limit, $x_0$, lies in $S$.
20.14. Example. Show that the set $\mathbb{R} \setminus \mathbb{Q}$ = {irrational numbers}, topologized as a subset of $\mathbb{R}$, is topologically complete.
BAIRE SPACES AND THE BAIRE CATEGORY THEOREM
20.15. Let $X$ be a nonempty topological space. Show that the following conditions on $X$ are equivalent. If $X$ possesses any one (hence all) of these properties, we say $X$ is a Baire space.
(A) If $G_1, G_2, G_3, \ldots$ is a sequence of open dense subsets of $X$, then the set $\bigcap_{n=1}^{\infty} G_n$ is dense in $X$.
(B) If $F_1, F_2, F_3, \ldots$ is a sequence of closed subsets of $X$ and $\bigcup_{n=1}^{\infty} F_n$ contains a nonempty open set, then at least one of the $F_n$'s contains a nonempty open set.
(C) Any comeager subset of $X$ is dense in $X$.
(D) Any meager subset of $X$ has empty interior.
(E) Any nonempty open subset of $X$ is nonmeager.
The last condition implies, in particular, that $X$ itself is nonmeager, and hence the meager sets form a proper $\sigma$-ideal on $X$.
20.16. For our purposes, the most important result about Baire spaces is
(DC5) Baire Category Theorem. Any complete pseudometric space is a Baire space.
For motivation the reader may wish to glance ahead to applications of this theorem, in 20.29, 23.13, 23.14, 23.15.b, 26.2, 27.18, and 27.25. We shall prove that the Baire Category
Theorem is an equivalent of the Principle of Dependent Choices, which was introduced in section 6.28.
Proof of (DC2) $\Rightarrow$ (DC5). Let $(X, d)$ be a complete pseudometric space, and let any open dense sets $V_1, V_2, V_3, \ldots$ is actually a homeomorphism from $\mathbb{N}^{\mathbb{N}}$ onto $X$.
20.28. Lemma. Let $X$ be a nonempty, separable, zero-dimensional, complete metric space. Let $Y \subseteq X$ be a $G_\delta$ set that is dense in $X$, such that $X \setminus Y$ is also dense in $X$. Then $Y$ is homeomorphic to the irrationals.
Proof. This result is from Mazurkiewicz [1917-1918]. The set $Y$ is a separable metric space. It is zero-dimensional, for if $\{B_\alpha : \alpha \in A\}$ is a clopen base for $X$, then $\{B_\alpha \cap Y : \alpha \in A\}$
is a clopen base for $Y$. That $Y$ is complete follows from Alexandroff's Theorem 20.13. We shall apply the Alexandroff-Urysohn Theorem 20.27; it suffices to show that no nonempty clopen subset of $Y$ is compact (when we use the relative topology of $Y$). Indeed, suppose $K$ is a nonempty clopen compact set in $Y$, where we use the relative topology of $Y$; we shall obtain a contradiction. Since $K$ is compact in $Y$, it is also compact in $X$; thus $\operatorname{cl}_X(K) = K$. Since $K$ is open in $Y$, we have $K = G \cap Y$ for some nonempty set $G$ which is open in $X$. Then $K = \operatorname{cl}_X(G \cap Y) \supseteq G$ by 15.13.b. Since $X \setminus Y$ is dense in $X$, the nonempty set $G$ must meet $X \setminus Y$ - contradicting $G \subseteq K \subseteq Y$.
20.29. Theorem. Let $X$ be a nonempty, complete, separable metric space, having no isolated points. Then there exists a meager set $M \subseteq X$ and a homeomorphism $f$ from $X \setminus M$ onto the irrational numbers (where the irrationals are topologized as a subset of $\mathbb{R}$).
Proof. This is from Schechter, Ciesielski, Norden [1993]. For later reference we note that the proof of this theorem does not require the Axiom of Choice; at most, it requires DC. Since $X$ is a separable metric space, it has a countable base $B_1, B_2, B_3, \ldots$. Let $D$ be the union of the boundaries of the $B_j$'s; then $D$ is meager. We easily verify that $X \setminus D$ is a nonempty, separable, zero-dimensional metric space. Moreover, it is a $G_\delta$ subset of a complete metric space; hence it is topologically complete by 20.13. Let $C$ be any countable dense subset of $X \setminus D$. Then any superset of $C$ is also dense in $X \setminus D$. The set $M = C \cup D$ is meager in $X \setminus D$; hence $X \setminus M$ is dense in $X \setminus D$, by the Baire Category Theorem. Also, $M$ is the union of countably many closed sets, so $X \setminus M$ is a $G_\delta$ set in $X \setminus D$. By the preceding lemma, $X \setminus M$ is homeomorphic to the irrationals.
20.30. Corollary. Let $X_1$ and $X_2$ be nonempty, complete, separable metric spaces that have no isolated points. Then there exist meager sets $M_j \subseteq X_j$ ($j = 1, 2$) such that $X_1 \setminus M_1$ is homeomorphic to $X_2 \setminus M_2$.
TAIL SETS
20.31. Definitions. We consider two different notions of "tail sets." We shall relate them in exercise 20.32.c, below. (However, these two notions are unrelated to a third meaning of the term, given in 7.7.)
a. We may sometimes write the set $2^{\mathbb{N}}$ as $\{0, 1\}^{\mathbb{N}}$, particularly if we want to emphasize that we are viewing it as a collection of sequences of 0s and 1s. A set $S \subseteq \{0, 1\}^{\mathbb{N}}$ is a tail set in $\{0, 1\}^{\mathbb{N}}$ if it has this property: Whenever $x = (x_1, x_2, x_3, \ldots)$ is a member of $S$, and $y = (y_1, y_2, y_3, \ldots)$ is another sequence of 0s and 1s that differs from $x$ in only finitely many components, then $y$ is also a member of $S$. (The idea is that $x$ and $y$ are eventually the same; they have the same "tails.")
b. A dyadic rational will mean a number of the form $m/2^n$, for integers $m$ and $n$. A set $S \subseteq [0, 1)$ is a tail set in the interval $[0, 1)$ if it has this property: Whenever $x$ is a member of $S$ and $y$ is another point in $[0, 1)$ that differs from $x$ by a dyadic rational, then $y$ is also a member of $S$.
20.32. Exercises. Show that
a. The two kinds of tail sets can also be described as follows: Say that two sequences of 0s and 1s are equivalent if they differ in only finitely many components; or say that two numbers in $[0, 1)$ are equivalent if they differ by a dyadic rational. These are equivalence relations on $\{0, 1\}^{\mathbb{N}}$ and on $[0, 1)$, respectively. In either setting, a set is a tail set if and only if it is a union of equivalence classes.
b. The tail sets in {0, 1 y E
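$\mu^*\{\omega \in \Omega : d(f_\alpha(\omega), f(\omega)) > \varepsilon\} \to 0$ for each $\varepsilon > 0$
or, equivalently,
for each $\varepsilon > 0$, eventually $\mu^*\{\omega \in \Omega : d(f_\alpha(\omega), f(\omega)) > \varepsilon\} < \varepsilon$.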
(Exercise. Prove that equivalence.) This is also called convergence in probability if $\mu(\Omega) = 1$.
• We say $f_\alpha \to f$ $\mu$-almost uniformly if
for each $\varepsilon > 0$, there exists a measurable set $S \subseteq \Omega$ such that $\mu(\Omega \setminus S) < \varepsilon$ and $f_\alpha \to f$ uniformly on $S$.
It is easy to verify that each of these is a centered, isotone convergence, as defined in 7.34. Convergence in measure actually has much better properties. We shall see in 21.34 that it is determined by a pseudometric - or by a metric, if we identify functions that are $\mu$-equivalent. Almost uniform convergence is not given by a metric. In fact, we shall show in 21.33.c that almost uniform convergence is not topological, or even pretopological.
Observations. Convergence in measure is preserved if we replace functions with equivalent functions. That is, if $f_\alpha$ is $\mu$-equivalent to $g_\alpha$ and $f$ is $\mu$-equivalent to $g$, then
\[ f_\alpha \to f \text{ in measure} \quad \Longleftrightarrow \quad g_\alpha \to g \text{ in measure.} \]
Thus, convergence in measure makes sense for equivalence classes of functions. Almost uniform convergence makes sense for sequences of equivalence classes of functions, just like almost everywhere convergence - see 21.18.
Preview. The following chart summarizes the relations that we shall establish between the three kinds of convergences.
Convergence in Measure
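    convergence almost uniformly  $\Longrightarrow$  convergence almost everywhere
    convergence almost uniformly  $\Longrightarrow$  convergence in measure
    convergence in measure  $\Longrightarrow$  convergence almost uniformly  (pass to a subsequence)
    convergence almost everywhere  $\Longrightarrow$  convergence almost uniformly  (assume $f_n$'s measurable, and either $\mu(\Omega) < \infty$ or dominated)

21.30. Proposition. If $f_\alpha \to f$ $\mu$-almost uniformly, then $f_\alpha \to f$ in measure and $\mu$-almost everywhere.
(Proof. Easy exercise.)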
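21.31. Theorem. If $g_n \to g$ in measure, then the sequence $(g_n)$ has a subsequence that converges to $g$ $\mu$-almost uniformly (and therefore also converges pointwise $\mu$-a.e.).

Hints: For each $\varepsilon > 0$, eventually $\mu^*\{\omega \in \Omega : d(g_n(\omega), g(\omega)) > \varepsilon\} < \varepsilon$ by assumption. Hence $(g_n)$ has a subsequence $(f_k)$ that satisfies, for some measurable sets $S_k$,
\[ S_k \supseteq \{\omega \in \Omega : d(f_k(\omega), g(\omega)) > 2^{-k-1}\} \quad \text{and} \quad \mu(S_k) < 2^{-k-1}. \]
Let $T_k = S_k \cup S_{k+1} \cup S_{k+2} \cup \cdots$; then $\mu(T_k) < 2^{-k}$. Show that $f_j \to g$ uniformly on $\Omega \setminus T_k$ as $j \to \infty$.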
21.32. Egorov's Theorem. Let $(f_j)$ be a sequence in $SM(\mathcal{S}, X)$, converging pointwise to a limit $f : \Omega \to X$. Assume also $\mu(\Omega) < \infty$. Then $f_j \to f$ $\mu$-almost uniformly (hence also $f_j \to f$ in measure).

Remarks. Note that we must assume the $f_j$'s are strongly measurable. See also the related result in 26.12.f.
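Hints: Let any $\varepsilon > 0$ be given. For positive integers $k$ and $m$, let
\[ B_{k,m} = \bigcup_{j=k}^{\infty} \{\omega \in \Omega : d(f_j(\omega), f(\omega)) > 1/m\}. \]
Use the strong measurability of the $f$'s (see 21.7) to show that $B_{k,m}$ is a measurable set. For fixed $m$, show that $\mu\bigl(\bigcap_{k=1}^{\infty} B_{k,m}\bigr) = 0$. Using the fact that $\mu(\Omega) < \infty$, show that $\mu(B_{k(m),m}) < 2^{-m}\varepsilon$ for some integer $k(m)$. Let $A_\varepsilon = \bigcup_{m=1}^{\infty} B_{k(m),m}$. Then $\mu(A_\varepsilon) < \varepsilon$, and $f_j \to f$ uniformly on $\Omega \setminus A_\varepsilon$.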
21.33. Examples.
Chapter 21 : Positive Measure and Integration
a. Let $(\Omega, \mathcal{S}, \mu)$ be the real line with Lebesgue subsets and Lebesgue measure. Let $f_n$ be the characteristic function of the interval $[n, n + \tfrac{1}{n}]$. Then $f_n \to 0$ pointwise and in measure, but not $\mu$-almost uniformly.
b. Let $(\Omega, \mathcal{S}, \mu)$ be the unit interval $[0, 1]$, with Lebesgue subsets and Lebesgue measure; thus the measure of an interval is the length of that interval. Let $f_1, f_2, f_3, \ldots$ be the characteristic functions of the intervals … etc., in that order. Show that $f_n \to 0$ in measure, but not $\mu$-almost uniformly or pointwise $\mu$-a.e.
c. Example of non-pretopological convergence. Use the preceding example and the last few theorems to show that, in general, almost uniform convergence and almost everywhere convergence both lack the sequential star property introduced in 15.3.b. Hence, in general, those two convergences are not pretopological.
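The behavior described in example 21.33.b can be explored numerically. The sketch below assumes one common choice of intervals, the "typewriter" ordering $[0,1], [0,\tfrac12], [\tfrac12,1], [0,\tfrac13], [\tfrac13,\tfrac23], [\tfrac23,1], \ldots$; that ordering and the helper names `typewriter_interval` and `f` are assumptions made for illustration, not necessarily the intervals intended in the example.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Endpoints of the n-th interval (n = 1, 2, 3, ...) in the assumed ordering.

    Block m consists of the m intervals [i/m, (i+1)/m], i = 0, ..., m-1.
    """
    m = 1
    while n > m:          # skip whole blocks until n falls inside block m
        n -= m
        m += 1
    return Fraction(n - 1, m), Fraction(n, m)

def f(n, w):
    """Characteristic function of the n-th interval, evaluated at the point w."""
    left, right = typewriter_interval(n)
    return 1 if left <= w <= right else 0

# For 0 < eps < 1, the set {w : f_n(w) > eps} is the n-th interval itself,
# so its measure is the interval length, which tends to 0.
lengths = [typewriter_interval(n)[1] - typewriter_interval(n)[0]
           for n in range(1, 56)]

# At a fixed point w, f_n(w) = 1 at least once in every block,
# so f_n(w) does not converge to 0.
w = Fraction(1, 3)
hits = [n for n in range(1, 56) if f(n, w) == 1]
print(float(lengths[-1]), hits)
```

The first 55 indices cover blocks 1 through 10; the lengths shrink to 1/10 while the point 1/3 is hit in every block, which is the mechanism behind "in measure but not almost everywhere."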
21.34. Let $(\Omega, \mathcal{S}, \mu)$ be a measure space, and let $(X, d)$ be a pseudometric space. For $f, g \in X^\Omega$, define
\[ D_\mu(f, g) = \inf_{\alpha > 0} \arctan\bigl[\alpha + \mu^*\{\omega \in \Omega : d(f(\omega), g(\omega)) > \alpha\}\bigr]. \]
(The arctan function can be replaced by any other bounded remetrization function; see 18.14.) Admittedly, this formula is rather complicated. After we use it below to prove a few simple, basic properties, we will generally refer to those simple, basic properties, rather than the complicated formula for $D_\mu$; we will very seldom want to make direct use of that formula. Still, the reader will probably find it conceptually helpful to see that there is some explicit formula for the pseudometric. Show that

a. $D_\mu$ is a pseudometric on $X^\Omega$.

b. $D_\mu(f, g) = 0$ if and only if $d(f(\cdot), g(\cdot)) = 0$ $\mu$-almost everywhere. Thus, if $d$ is a metric on $X$, then $D_\mu$ is a metric on the quotient space ${}^*X = X^\Omega/\mu$ - i.e., on the set of all $\mu$-equivalence classes of functions.

c. The convergence determined by the pseudometric $D_\mu$ is the same as convergence in measure.

d. $SM(\mathcal{S}, X)$ (defined in 21.4) is a closed subset of the pseudometric space $X^\Omega$; hence $SM(\mu, X)$ (defined in 21.17) is a closed subset of the metric space ${}^*X$. Hint: 21.3 and 21.31.
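To get a feel for the pseudometric of 21.34, the following sketch evaluates the expression $\inf_{\alpha>0}\arctan[\alpha + \mu\{d(f,g) > \alpha\}]$ on a finite measure space. The setting is an assumption made purely for illustration: $\Omega = \{0, \ldots, N-1\}$ with every subset measurable (so outer measure equals measure), $X = \mathbb{R}$ with $d(s,t) = |s-t|$, and the function name `D_mu` is hypothetical.

```python
import math

def D_mu(f, g, weights):
    """inf over a > 0 of arctan(a + mu{w : |f(w) - g(w)| > a}) on a finite space.

    f, g, weights are equal-length lists; weights[i] = mu({i}) >= 0.
    """
    dists = [abs(s - t) for s, t in zip(f, g)]

    def tail(a):
        # mu{w : d(f(w), g(w)) > a}
        return sum(w for d, w in zip(dists, weights) if d > a)

    # a -> a + tail(a) is increasing between the jumps of tail, so the
    # infimum is approached either as a -> 0+ (value tail(0)) or at one of
    # the positive distance values a = d(f(w), g(w)).
    candidates = [tail(0.0)] + [d + tail(d) for d in dists if d > 0]
    return math.atan(min(candidates))

# Two functions that differ a lot, but only on a set of small measure:
print(D_mu([0.0, 0.0], [100.0, 0.0], [0.01, 5.0]))
# Two functions that differ a little, on a set of large measure:
print(D_mu([0.0, 0.0], [0.01, 0.01], [5.0, 5.0]))
```

Both calls return a small value, reflecting the two ways in which functions can be close in measure: a large discrepancy on a small set, or a small discrepancy on a large set.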
21.35. Theorem. If the pseudometric space $(X, d)$ is complete, then the pseudometric space $(X^\Omega, D_\mu)$ is complete.
Integration of Positive Functions
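Proof. Any $D_\mu$-Cauchy sequence has a subsequence $(f_k)$ satisfying
\[ D_\mu(f_k, f_{k+1}) < \arctan\bigl(2^{-k-1}\bigr) \quad \text{for } k = 1, 2, 3, \ldots; \]
it suffices to show that that subsequence is convergent in measure. By assumption, there is a measurable set $S_k \supseteq \{\omega \in \Omega : d(f_k(\omega), f_{k+1}(\omega)) > 2^{-k-1}\}$ with $\mu(S_k) < 2^{-k-1}$. Let $T_k = S_k \cup S_{k+1} \cup S_{k+2} \cup \cdots$; then $\mu(T_k) < 2^{-k}$. Since the $T_k$'s form a decreasing sequence, we have
\[ (**) \qquad d(f_i(\omega), f_j(\omega)) \le \sum_{p=j}^{i-1} d(f_p(\omega), f_{p+1}(\omega)) \le \sum_{p=j}^{i-1} 2^{-p-1} < 2^{-j} \]
for $\omega \in \Omega \setminus T_j$ and $i \ge j$. Fix any $k$ and any $\omega \in \Omega \setminus T_k$; consider $i \ge j \ge k$; the preceding estimate shows that the sequence $(f_k(\omega), f_{k+1}(\omega), f_{k+2}(\omega), \ldots)$ is Cauchy in $(X, d)$. Since that space is complete, the sequence $(f_i(\omega) : i \in \mathbb{N})$ is convergent. Let $f(\omega)$ be any of its limits. (This is unique if $d$ is a metric on $X$.) Take limits in $(**)$ as $i \to \infty$, to establish
\[ d(f(\omega), f_j(\omega)) \le 2^{-j}. \]
Thus for $j$ …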
$(0 < p < \infty)$
and $\|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_n|\}$. Show that the functions $\|\cdot\|_p^{\min\{1,p\}}$, for $0 < p \le \infty$, are G-norms on $X$, all equivalent to one another, and the topology they determine on $X$ is the product topology. When $1 \le p \le \infty$, then $\|\cdot\|_p$ is a norm on $X$.

22.13. Quotient norms. Let $X$ be an Abelian group (written as an additive group); let $K$ be a subgroup; let $Q = X/K$ be the quotient group; let $\pi : X \to Q$ be the quotient map. For each G-seminorm $\rho : X \to [0, +\infty)$, we may define an associated function $\tilde\rho : Q \to [0, +\infty)$ by
\[ \tilde\rho(q) = \inf\{\rho(x) : x \in \pi^{-1}(q)\}. \]
Show that

a. $\tilde\rho$ is a G-seminorm on $Q$. In fact, it is the largest G-seminorm on $Q$ that satisfies $\tilde\rho(\pi(x)) \le \rho(x)$ for all $x \in X$. Hint: 4.42.

b. If $X$ is a vector space, $\rho$ is a seminorm, and $K$ is a linear subspace of $X$, then $\tilde\rho$ is a seminorm on the quotient space $X/K$. In some cases it is a norm; then it is called the quotient norm.

c. $\pi$ preserves open balls:
\[ \pi\bigl(\{x \in X : \rho(x) < \varepsilon\}\bigr) = \{q \in Q : \tilde\rho(q) < \varepsilon\}. \]

d. If $\rho^{-1}(0) \supseteq K$, then $\rho$ is constant on each set of the form $\pi^{-1}(q)$, and so our definition of $\tilde\rho$ simplifies to $\tilde\rho(\pi(x)) = \rho(x)$.

e. The map between pseudometric spaces, $\pi : (X, \rho) \to (Q, \tilde\rho)$, is a topological quotient map (defined as in 15.30). More generally, let $X$ be topologized by a gauge $D$ consisting of G-seminorms, and let $Q$ be topologized by the corresponding gauge $\tilde D = \{\tilde\rho : \rho \in D\}$. Suppose that $D$ is directed, in the sense of 4.4.c. Then $\pi : X \to Q$ is an open mapping (by 22.13.c), hence it is a topological quotient map (by 15.31.e).
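The infimum that defines the quotient seminorm in 22.13 can be approximated numerically in a small case. The sketch below assumes $X = \mathbb{R}^2$, $K = \{(t,t) : t \in \mathbb{R}\}$, and three familiar norms in the role of $\rho$; these choices, the search bracket, and the helper names are assumptions made for illustration. It relies on the fact that $t \mapsto \rho(x - tk)$ is convex when $\rho$ is a seminorm, so a ternary search finds the infimum over the coset $x + K$.

```python
import math

def l1(v):
    return sum(abs(t) for t in v)

def l2(v):
    return math.sqrt(sum(t * t for t in v))

def sup(v):
    return max(abs(t) for t in v)

def quotient_seminorm(rho, x, k, lo=-1e3, hi=1e3, iters=200):
    """Approximate inf{rho(x - t*k) : t real}, the seminorm of the coset x + span{k}.

    Ternary search is valid because t -> rho(x - t*k) is convex; the bracket
    [lo, hi] is assumed to contain a minimizer.
    """
    def phi(t):
        return rho([xi - t * ki for xi, ki in zip(x, k)])

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) <= phi(m2):
            hi = m2      # a minimizer lies in [lo, m2]
        else:
            lo = m1      # a minimizer lies in [m1, hi]
    return phi((lo + hi) / 2)

x, k = [3.0, 1.0], [1.0, 1.0]
for name, rho in (("l1", l1), ("l2", l2), ("sup", sup)):
    print(name, rho(x), quotient_seminorm(rho, x, k))
```

For this coset the three quotient values are $|x_1 - x_2|$, $|x_1 - x_2|/\sqrt{2}$, and $|x_1 - x_2|/2$ respectively, each no larger than the norm of the representative $x$, in line with 22.13.a.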
SUP NORMS
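22.14. As usual, let $\mathbb{F}$ be either $\mathbb{R}$ or $\mathbb{C}$. For any set $\Lambda$, the linear space $B(\Lambda) = \{\text{bounded functions from } \Lambda \text{ into } \mathbb{F}\}$ is normed by $\|f\|_\infty = \sup\{|f(\lambda)| : \lambda \in \Lambda\}$; this is sometimes called the sup norm. It is complete. We have already seen one example of sup norms in 22.11. The metric on $B(\Lambda)$ obtained from this norm is the same as the metric given in 4.41.f. The results in 4.41.f show that

every metric space $(\Lambda, d)$ may be viewed as a subset of a Banach space.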
Thus, in principle, metric spaces are not really "more general" than subsets of normed spaces. However, this embedding is seldom used in applications. The additional linear structure of $B(\Lambda)$ may be merely distracting and not particularly relevant to the properties of the metric space $(\Lambda, d)$ that one may be studying. For instance, we may gain some understanding of the "numbers" $\pm\infty$ by viewing them as elements of the metric space $[-\infty, +\infty]$ introduced in 18.24, but that understanding is not necessarily increased if we study the larger and more complicated space $B([-\infty, +\infty])$.

22.15. More sup-normed spaces. Let $\Omega$ be a topological space; then $B(\Omega) = \{\text{bounded functions from } \Omega \text{ into } \mathbb{F}\}$ is a Banach space when equipped with the sup norm. We now consider some interesting subspaces. First of all,
Chapter 22: Norms
\[ BC(\Omega) = \{\text{bounded, continuous functions from } \Omega \text{ into } \mathbb{F}\} \]
is a closed linear subspace of $B(\Omega)$ - hence a Banach space, when equipped with the sup norm. If $\Omega$ is a uniform space, show that
\[ BUC(\Omega) = \{\text{bounded, uniformly continuous functions from } \Omega \text{ into } \mathbb{F}\} \]
is a closed subspace of $BC(\Omega)$.

Suppose $\Omega$ is a locally compact Hausdorff space (such as $\mathbb{R}^n$, for instance). Then a function $f : \Omega \to \mathbb{F}$ is said to vanish at infinity if for each $\varepsilon > 0$ the set $\{x \in \Omega : |f(x)| > \varepsilon\}$ is relatively compact. In this setting,
\[ C_0(\Omega) = \{\text{continuous functions from } \Omega \text{ into } \mathbb{F} \text{ that vanish at infinity}\} \]
is a closed linear subspace of $BC(\Omega)$, hence another Banach space. If $\Omega$ is also equipped with a uniform structure, show that $C_0(\Omega) \subseteq BUC(\Omega) \subseteq BC(\Omega)$. Of course, all these spaces are the same if $\Omega$ is a compact Hausdorff space.

A generalization. For any Banach space $(X, |\cdot|_X)$, we can define $BC(\Omega, X)$, $C_0(\Omega, X)$, $BUC(\Omega, X)$ in an analogous fashion; they are closed linear subspaces of the Banach space $B(\Omega, X) = \{\text{bounded functions from } \Omega \text{ into } X\}$ with sup norm $\|f\|_\infty = \sup\{|f(\omega)|_X : \omega \in \Omega\}$.

A specialization. Let $\Omega$ be the set $\mathbb{N} = \{\text{positive integers}\}$, equipped with this metric: $d(m,n) = |\arctan(m) - \arctan(n)|$. This gives $\mathbb{N}$ its usual topology (i.e., the discrete topology), but gives $\mathbb{N}$ the uniform structure of a subset of the compact space $[0, +\infty]$. Then $B(\mathbb{N}) = BC(\mathbb{N})$ (since the topology on $\mathbb{N}$ is discrete), and the three Banach spaces $BC(\mathbb{N})$, $BUC(\mathbb{N})$, $C_0(\mathbb{N})$ can be rewritten respectively as
\[ \ell_\infty = \{\text{bounded sequences of scalars}\}, \quad c = \{\text{convergent sequences of scalars}\}, \quad c_0 = \{\text{sequences of scalars that converge to } 0\}, \]
all equipped with the sup norm.
22.16. Exercises.

a. The sup-normed spaces $C[0, 1]$ and $C_0(\mathbb{R})$ are separable. Hint: We prove this for real scalars; the proof for complex scalars is similar. By a "rational piecewise affine function" we shall mean a continuous function whose graph consists of finitely many line segments, each of which has endpoints with rational coordinates; in the case of $C_0(\mathbb{R})$ we extend such a function by making it equal to 0 for all sufficiently large or small arguments. Show that there are only countably many rational piecewise affine functions. Show that members of $C[0, 1]$ or $C_0(\mathbb{R})$ are uniformly continuous; use that fact to show that the rational piecewise affine functions are dense.

b. The sup-normed space $BUC(\mathbb{R})$ is not separable. Hints: This is easiest in the case where the scalar field is … $> 0$ for at least one $\varphi \in \Phi$ (since the product topology on $X^\Gamma$ is a Hausdorff topology), so $\|f\| > 0$.
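It remains to show that $(V, \|\cdot\|)$ is complete. Let $(f_n)$ be a $\|\cdot\|$-Cauchy sequence in $V$. Thus, for each number $\varepsilon > 0$ there is some integer $N_\varepsilon$ such that $m, n \ge N_\varepsilon \Rightarrow \|f_m - f_n\| \le \varepsilon$. Therefore, for each $\varphi \in \Phi$ we have
\[ (*) \qquad m, n \ge N_\varepsilon \quad \Rightarrow \quad \varphi(f_m - f_n) \le \varepsilon. \]
Thus the net $(f_m - f_n : (m, n) \in \mathbb{N} \times \mathbb{N})$ converges to 0 pointwise on $\Gamma$. Since $X$ is complete, for each $\gamma \in \Gamma$ there exists $f(\gamma) = \lim_{n \to \infty} f_n(\gamma)$. Since $f_n \to f$ pointwise, we have $\varphi(f_n - f) \to 0$ for each $\varphi \in \Phi$. Hold $\varphi$ and $m$ fixed and take limits in $(*)$ as $n \to \infty$; thus $\varphi(f_m - f) \le \varepsilon$. In other words, $m \ge N_\varepsilon \Rightarrow \|f_m - f\| \le \varepsilon$. This proves that $f \in V$ and that $(f_m)$ converges to $f$ in $(V, \|\cdot\|)$.

22.18. The space of Hölder continuous functions. The definition of Hölder continuity, given in 18.4, simplifies slightly when the metric space $Y$ is a normed space, with norm $|\cdot|_Y$. Let $(X, d)$ be any metric space, and let $\alpha > 0$. For functions $f : X \to Y$, we obtain
\[ \langle f \rangle_\alpha = \sup\left\{ \frac{|f(x_1) - f(x_2)|_Y}{d(x_1, x_2)^\alpha} : x_1, x_2 \in X,\ x_1 \ne x_2 \right\}. \]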
Exercises. Show that

a. $\mathrm{H\ddot{o}l}^\alpha(X, Y) = \{f \in Y^X : \langle f \rangle_\alpha < \infty\}$ is a linear space. The function $\langle \cdot \rangle_\alpha$ is not a norm, but rather a seminorm on $\mathrm{H\ddot{o}l}^\alpha(X, Y)$. Indeed, we have $\langle f \rangle_\alpha = 0$ if and only if $f$ is a constant function.

b. To get a norm, select any point in $X$; let us call that point "0" (although we shall not use any additive structure in $X$). Then $\|f\|_\alpha = \langle f \rangle_\alpha + |f(0)|_Y$ defines a norm $\|\cdot\|_\alpha$ on $\mathrm{H\ddot{o}l}^\alpha(X, Y)$.

c. If $Y$ is complete, then $\mathrm{H\ddot{o}l}^\alpha(X, Y)$ (normed as above) is complete, regardless of whether $X$ is complete. Hint: This is a special case of 22.17.

d. If $0 < \alpha < \beta \le 1$, show that $\mathrm{H\ddot{o}l}^\beta([0,1], Y) \subseteq \mathrm{H\ddot{o}l}^\alpha([0,1], Y) \subseteq C([0,1], Y)$, where the last space is the space of continuous functions from $[0, 1]$ into $Y$, equipped with the sup norm. The inclusions are continuous. If $Y = \mathbb{R}^n$ for some positive integer $n$, then the inclusions are compact - i.e., a bounded subset of one normed space is a relatively compact subset of the next space. Hint: Use the Arzelà-Ascoli Theorem 18.35.

e. A related exercise. This time we take the domain, rather than the codomain, to be a subset of a normed space. Let $C$ be a convex subset of a normed space $(X, |\cdot|_X)$, and let $(Y, e)$ be any metric space. Show that if $\alpha > 1$, then $\mathrm{H\ddot{o}l}^\alpha(C, Y)$ contains only constant functions.
Hint: Suppose $\langle p \rangle_\alpha = k$, and let any $u, v \in C$ be given. Let $n$ be a large positive integer. Define $x_j = (1 - \tfrac{j}{n})u + \tfrac{j}{n}v$ for $j = 0, 1, 2, \ldots, n$. Then
\[ e(p(u), p(v)) \le \sum_{j=1}^{n} e(p(x_j), p(x_{j-1})) \le \sum_{j=1}^{n} k\,|x_j - x_{j-1}|_X^\alpha = k\,|u - v|_X^\alpha\, n^{1-\alpha}. \]
22.19. Let $(X, |\cdot|)$ be a Banach space. Let $BV([a, b], X)$ be the set of all functions from $[a, b]$ into $X$ that have bounded variation (as defined in 19.21). Show that

a. $BV([a, b], X)$ is a linear space, and $\mathrm{Var}(\cdot, [a, b])$ is a seminorm on $BV([a, b], X)$.

b. $\|\varphi\|_{BV} = |\varphi(a)|_X + \mathrm{Var}(\varphi, [a, b])$ is a norm on $BV([a, b], X)$. Moreover, $\|\varphi\|_\infty \le \|\varphi\|_{BV}$.

c. $BV([a, b], X)$, normed as above, is complete. Hint: This is a special case of 22.17.

d. We say $f : [a, b] \to X$ is a normalized function of bounded variation on $[a, b]$ if $f$ has bounded variation on $[a, b]$, $f$ is right continuous on $(a, b)$, and $f(a) = 0$. The collection of such functions will be denoted by $NBV([a, b], X)$; it will play an important role in 29.34. Note that $NBV([a, b], X)$ is a linear subspace of $BV([a, b], X)$, and $\mathrm{Var}(\cdot, [a, b])$ acts as a norm on $NBV([a, b], X)$.
CONVERGENT SERIES
22.20. By a series in a normed space $(X, \|\cdot\|)$ we mean an expression of the form $\sum_{j=1}^{\infty} x_j = x_1 + x_2 + x_3 + \cdots$, where the $x_j$'s are members of $X$. The sum of the series is the vector $v = \lim_{N \to \infty} \sum_{j=1}^{N} x_j$, if this limit exists. If it exists, we say the series is convergent; we may also write $v = \sum_{j=1}^{\infty} x_j$.

A series $\sum_{j=1}^{\infty} x_j$ is absolutely convergent if $\sum_{j=1}^{\infty} \|x_j\| < \infty$. Any absolutely convergent series in a Banach space is convergent; that follows from the completeness of $X$. In fact, (exercise) a normed space is complete if and only if every absolutely convergent series in the space is convergent. See also related results in 10.41, 23.26, and 23.27.

22.21. Dirichlet's test. Let $V$ be a Banach space. Let $\sum_{k=1}^{\infty} v_k$ be a series in $V$ whose partial sums $s_n = \sum_{k=1}^{n} v_k$ form a bounded sequence. Let $(b_k)$ be a sequence of real numbers decreasing to 0. Then the series $\sum_{k=1}^{\infty} b_k v_k$ is convergent. (A corollary is the Alternating Series Test, given in 10.41.g.)

Proof of Dirichlet's test. For any positive integers $m, n$ with $n \ge m$, verify that
\[ \sum_{k=m}^{n} b_k v_k = b_{n+1} s_n - b_m s_{m-1} - \sum_{k=m}^{n} (b_{k+1} - b_k) s_k. \]
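The summation-by-parts identity used in this proof, $\sum_{k=m}^{n} b_k v_k = b_{n+1}s_n - b_m s_{m-1} - \sum_{k=m}^{n}(b_{k+1}-b_k)s_k$, can be checked numerically for scalar terms. The sketch below assumes $V = \mathbb{R}$, $b_k = 1/k$, and $v_k = \sin k$; these choices, the convention $s_0 = 0$, and the helper names are illustrative assumptions.

```python
import math

def partial_sums(v):
    """Return s with s[n] = v[1] + ... + v[n] and s[0] = 0 (v[0] is unused)."""
    s = [0.0]
    for k in range(1, len(v)):
        s.append(s[-1] + v[k])
    return s

def both_sides(b, v, m, n):
    """Left and right sides of the summation-by-parts identity, 1 <= m <= n."""
    s = partial_sums(v)
    lhs = sum(b[k] * v[k] for k in range(m, n + 1))
    rhs = (b[n + 1] * s[n] - b[m] * s[m - 1]
           - sum((b[k + 1] - b[k]) * s[k] for k in range(m, n + 1)))
    return lhs, rhs

N = 50
b = [None] + [1.0 / k for k in range(1, N + 2)]            # decreasing to 0
v = [None] + [math.sin(float(k)) for k in range(1, N + 1)]  # bounded partial sums
lhs, rhs = both_sides(b, v, 5, 40)
S = max(abs(t) for t in partial_sums(v))                    # bound on |s_n|
print(lhs, rhs, 2 * b[5] * S)
```

The two sides agree up to rounding, and the block sum is bounded by $2 b_m S$, which is the estimate that makes the partial sums of $\sum b_k v_k$ a Cauchy sequence.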
By assumption, $S = \sup_n \|s_n\|$ is finite. Hence
\[ \Bigl\| \sum_{k=m}^{n} b_k v_k \Bigr\| \le b_{n+1} S + b_m S + \sum_{k=m}^{n} (b_k - b_{k+1}) S = 2 b_m S. \]
It follows that the partial sums of the series $\sum_{k=1}^{\infty} b_k v_k$ form a Cauchy sequence.

22.22. Example. If $(b_k)$ is any sequence of positive numbers decreasing to 0, then $\sum_{k=1}^{\infty} b_k \sin(kx)$ converges for each real number $x$. In particular, the series $\sum_{k=1}^{\infty} \frac{\sin(kx)}{k}$ and $\sum_{k=1}^{\infty} \frac{\sin(kx)}{\ln(k+1)}$ both converge. (Contrast this result with 10.43.)

Proof. First show that
\[ 2 \sin\bigl(\tfrac{x}{2}\bigr) \bigl( \sin x + \sin 2x + \cdots + \sin nx \bigr) = \cos\bigl(\tfrac{x}{2}\bigr) - \cos\bigl((n + \tfrac{1}{2})x\bigr), \]
either directly (using trigonometric identities), or by using the formulas $\sin\theta = (e^{i\theta} - e^{-i\theta})/2i$, $\cos\theta = (e^{i\theta} + e^{-i\theta})/2$. Use that formula to show that the partial sums of the series $\sum_{k=1}^{\infty} \sin kx$ form a bounded sequence, for each fixed $x$. Now apply Dirichlet's Test.

22.23. Let $(X, \|\cdot\|)$ be a complex Banach space. Let $c_0, c_1, c_2, \ldots$ be some sequence in $X$, and let $a$ be a complex number. (In the simplest case we take $a = 0$.) Then the expression $\sum_{n=0}^{\infty} c_n (\lambda - a)^n$ is called a power series centered at $a$; the $c_n$'s are called its coefficients. Associated with the power series is a number $R \in [0, +\infty]$ defined by
\[ R = \frac{1}{\limsup_{n \to \infty} \sqrt[n]{\|c_n\|}}. \]
This number $R$ is called the radius of convergence of the power series, the set $\{\lambda \in \mathbb{C} : |\lambda - a| < R\}$ is called the disk of convergence, and the set $\{\lambda \in \mathbb{C} : |\lambda - a| = R\}$ is called the circle of convergence. (The following results are also valid with real scalars, with intervals for "disks," but for simplicity of notation we shall only consider complex scalars.) The series, radius, and disk have these properties:

a. If only finitely many of the $c_n$'s are zero, and $\lim_{n \to \infty} \|c_n\| / \|c_{n+1}\|$ exists in $[0, +\infty]$, then that limit is equal to $R$.

Remark. The expression $\lim_{n \to \infty} \|c_n\| / \|c_{n+1}\|$ is simpler, and thus is preferable in those cases where it is applicable. On the other hand, the more complicated expression $1 / \limsup_{n \to \infty} \sqrt[n]{\|c_n\|}$ has the advantage that it is always applicable.

b. For each complex number $\lambda$ with $|\lambda - a| < R$, the series $\sum_{n=0}^{\infty} c_n (\lambda - a)^n$ converges to a limit - that is, $\lim_{N \to \infty} \sum_{n=0}^{N} c_n (\lambda - a)^n$ exists in $X$. The series is absolutely convergent, and the convergence is uniform on compact subsets of the disk of convergence. Thus the power series defines a function on that disk; we summarize this by writing
\[ f(\lambda) = \sum_{n=0}^{\infty} c_n (\lambda - a)^n \qquad (|\lambda - a| < R). \]
Hints: Any compact subset is contained in a set of the form $\{\lambda \in \mathbb{C} : |\lambda - a| \le r\}$ for some number $r < R$. See 10.41.d and 22.20.

c. The series $\sum_n c_n (\lambda - a)^n$ is divergent (i.e., nonconvergent) for every $\lambda \in \mathbb{C}$ with $|\lambda - a| > R$. Hint: If the series is convergent for some value of $\lambda$, then $c_n (\lambda - a)^n \to 0$ as $n \to \infty$. Then for all $n$ sufficiently large, we have $\|c_n (\lambda - a)^n\| < 1$. If $\lambda \ne a$, then $\sqrt[n]{\|c_n\|} < 1/|\lambda - a|$.

Further properties of power series are described in 23.29(iii) and 25.27.

22.24. Elementary examples. A power series $\sum_{n=0}^{\infty} c_n (\lambda - a)^n$ converges inside the circle of convergence, and diverges outside that circle. The behavior is more complicated on the circle of convergence - i.e., at points $\lambda$ satisfying $|\lambda - a| = R$. A series may converge at all, some, or none of these points. Following are a few simple examples with center $a = 0$ and with coefficients in $X = \mathbb{C}$.

a. Any polynomial (of a complex variable, with complex coefficients) is a power series with infinite radius of convergence. It has only finitely many nonzero coefficients.

b. The power series $f(\lambda) = \sum_{n=0}^{\infty} \lambda^n = 1 + \lambda + \lambda^2 + \lambda^3 + \cdots$ has radius of convergence equal to 1. Since
\[ 1 + \lambda + \lambda^2 + \cdots + \lambda^N = \frac{1 - \lambda^{N+1}}{1 - \lambda} \qquad \text{when } \lambda \ne 1, \]
we easily see that the power series $\sum_{n=0}^{\infty} \lambda^n$ converges to $\frac{1}{1 - \lambda}$ when $|\lambda| < 1$ and diverges for every $\lambda$ such that $|\lambda| \ge 1$.

c. The power series $f(\lambda) = \sum_{n=1}^{\infty} n^{-2} \lambda^n = \lambda + \frac{\lambda^2}{4} + \frac{\lambda^3}{9} + \cdots$ has radius of convergence equal to 1. Show that this series converges absolutely at every point on the circle of convergence.

d. Hardy gave an example of a power series that converges uniformly, but not absolutely, on its circle of convergence. Lusin gave an example of a series $\sum_{n=1}^{\infty} a_n \lambda^n$ such that $a_n \to 0$, but such that the series diverges at every point of the circle of convergence. These examples are much more complicated and will not be given here; they can be found in Landau [1929, pages 68-71].

22.25. Sequence spaces. Let $\mathbb{F}$ be the scalar field ($\mathbb{R}$ or $\mathbb{C}$). For any sequence of scalars $x = (x_1, x_2, x_3, \ldots)$, define $\|x\|_\infty = \sup\{|x_1|, |x_2|, |x_3|, \ldots\}$ and
\[ \|x\|_p = \Bigl( \sum_{j=1}^{\infty} |x_j|^p \Bigr)^{1/p} \qquad (0 < p < \infty). \]
Then define
\[ \ell_p = \{x \in \mathbb{F}^{\mathbb{N}} : \|x\|_p < \infty\} \qquad (0 < p \le \infty). \]
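Then $\ell_p$ is a linear subspace of $\mathbb{F}^{\mathbb{N}}$. If $1 \le p \le \infty$, then $\|\cdot\|_p$ is a norm on $\ell_p$ (hint: 12.29.g); hence sequences of scalars satisfy Minkowski's Inequality:
\[ \Bigl( \sum_{j=1}^{\infty} |x_j + y_j|^p \Bigr)^{1/p} \le \Bigl( \sum_{j=1}^{\infty} |x_j|^p \Bigr)^{1/p} + \Bigl( \sum_{j=1}^{\infty} |y_j|^p \Bigr)^{1/p} \qquad (1 \le p < \infty). \]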
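Minkowski's Inequality can be tried out on finitely supported sequences. The sketch below uses the two unit vectors $(1,0)$ and $(0,1)$; the helper names `p_norm` and `add` are hypothetical. It also evaluates the same expression for $p = \tfrac12$, where the triangle inequality fails for $\|\cdot\|_p$ but holds for $\|\cdot\|_p^p$.

```python
def p_norm(x, p):
    """(sum |x_j|^p)^(1/p) for a finite list x and 0 < p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def add(x, y):
    """Componentwise sum of two lists of equal length."""
    return [s + t for s, t in zip(x, y)]

x, y = [1.0, 0.0], [0.0, 1.0]

# For p >= 1 the triangle inequality holds: 2^(1/p) <= 1 + 1.
for p in (1, 1.5, 2, 3):
    print(p, p_norm(add(x, y), p), p_norm(x, p) + p_norm(y, p))

# For p = 1/2 it fails for the p-"norm" itself ...
half = 0.5
print(p_norm(add(x, y), half), p_norm(x, half) + p_norm(y, half))
# ... but the p-th power of the p-"norm" is still subadditive.
print(p_norm(add(x, y), half) ** half, p_norm(x, half) ** half + p_norm(y, half) ** half)
```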
If $0 < p < 1$, then $\|\cdot\|_p$ generally is not a norm, but $\|\cdot\|_p^p$ is a G-norm on $\ell_p$ (hint: 12.25.e); in fact, we shall see in Chapter 26 that it is a special kind of G-norm that we call an F-norm.

The spaces $\ell_p$ are a simple but important special case of the spaces $L^p(\mu, X)$, introduced in 22.28. The completeness of the spaces $\ell_p$ and $L^p(\mu, X)$ will be proved in 22.31(i).

The Cauchy-Schwarz Inequality states that $\|xy\|_1 \le \|x\|_2 \|y\|_2$, where $xy$ is the sequence whose $n$th term is $x_n y_n$. For a proof, take limits in 2.10.

Exercise. Let $0 < p < \infty$. Show that a subset $S$ is relatively compact in $\ell_p$ if and only if it is metrically bounded and satisfies $\lim_{N \to \infty} \sup_{x \in S} \sum_{k=N}^{\infty} |x_k|^p = 0$.

A generalization. Let $\mathbb{J}$ be any nonempty set. For any function $x : \mathbb{J} \to \mathbb{F}$, define $\|x\|_\infty = \sup_{j \in \mathbb{J}} |x_j|$ and
\[ \|x\|_p = \Bigl( \sum_{j \in \mathbb{J}} |x_j|^p \Bigr)^{1/p} \qquad (0 < p < \infty). \]
(Positive sums over arbitrary index sets are defined as in 10.40.) Then define
\[ \ell_p(\mathbb{J}) = \{x \in \mathbb{F}^{\mathbb{J}} : \|x\|_p < \infty\} \qquad (0 < p \le \infty). \]
For $1 \le p \le \infty$, $\ell_p(\mathbb{J})$ is a linear subspace of $\mathbb{F}^{\mathbb{J}}$ and $\|\cdot\|_p$ is a norm on that space. This generalization will be particularly useful in 22.56.

22.26. The James space $J$ (optional). For sequences $x = (x_1, x_2, x_3, \ldots)$ of scalars, let
\[ \|x\|_J = \sup \Bigl\{ |x_{k(1)} - x_{k(2)}|^2 + |x_{k(2)} - x_{k(3)}|^2 + \cdots + |x_{k(n-1)} - x_{k(n)}|^2 + |x_{k(n)} - x_{k(1)}|^2 \Bigr\}^{1/2}, \]
where the supremum is over all positive integers $n$ and all finite increasing sequences $k(1) < k(2) < \cdots < k(n)$ of positive integers. Let $J = \{x \in c_0 : \|x\|_J < \infty\}$. This space was devised by James [1951] to answer several questions about normed spaces; one of those questions will be mentioned in the remarks in 28.41. The space $J$ is discussed further by James [1982]. Show

a. $(J, \|\cdot\|_J)$ is a Banach space.

b. $\ell_2 \subseteq J$, and $\|\cdot\|_2$ is strictly stronger than $\|\cdot\|_J$ on $\ell_2$. Hint: Use the Cauchy-Schwarz inequality.

c. $J \subseteq c_0$, and $\|\cdot\|_J$ is strictly stronger than the sup norm on $J$.

d. For $2 < p < \infty$, neither of $\ell_p$ or $J$ includes the other, and neither of $\|\cdot\|_p$ or $\|\cdot\|_J$ is stronger than the other on $\ell_p \cap J$. Hints: To show $\|x\|_J / \|x\|_p$ is unbounded, consider $(1^{-r}, 0, 2^{-r}, 0, 3^{-r}, \ldots, 0, n^{-r}, 0, 0, 0, 0, 0, \ldots)$ with $r \in (\tfrac{1}{p}, \tfrac{1}{2})$. To show $\|x\|_p / \|x\|_J$ is unbounded, consider a sequence of $n$ 1s followed by infinitely many 0s.

Bochner-Lebesgue Spaces
Show that $\ell_\infty$ is such a space.) It is not presently known whether, whenever $X$ is such a space and $(\Omega, \mathcal{S})$ is a measurable space, then $M(\mathcal{S}, X)$ is necessarily a linear space. Some related questions are considered by Stone [1976].

22.28. Definitions. Let $(\Omega, \mathcal{S}, \mu)$ be a measure space, and let $(X, |\cdot|)$ be a Banach space. For each $f \in SM(\mathcal{S}, X)$, the function $|f(\cdot)| : \Omega \to [0, +\infty]$ is measurable. Hence, using the type of integral defined in 21.36, we can define the quantities
\[ \|f\|_p = \Bigl\{ \int_\Omega |f(\omega)|^p \, d\mu(\omega) \Bigr\}^{1/p} \quad (0 < p < \infty), \qquad \|f\|_\infty = \inf\{r > 0 : |f(\cdot)| \le r \ \mu\text{-a.e.}\} \]
- they are numbers in $[0, +\infty]$. (The case of $p = 1$ is particularly simple and important, so we shall restate it separately: $\|f\|_1 = \int_\Omega |f(\cdot)| \, d\mu$.) We can also define the set of functions
\[ \mathcal{L}^p(\mu, X) = \{f \in SM(\mathcal{S}, X) : \|f\|_p < \infty\} \qquad (0 < p \le \infty). \]
Then $\mathcal{L}^p(\mu, X)$ is a linear subspace of $SM(\mathcal{S}, X)$, for each $p \in (0, \infty]$. When $1 \le p < \infty$, then $\|\cdot\|_p$ is a seminorm on that space. (Hint: 12.29.g.) When $0 < p \le 1$, then $\|\cdot\|_p^p$ is a G-seminorm on that space (hint: 12.25.e); in fact, we shall see in Chapter 26 that it is an F-seminorm.

Note that $\mathcal{L}^\infty(\mathcal{S}, X)$ (defined in 22.27.b) includes only functions that are bounded, but $\mathcal{L}^\infty(\mu, X)$ consists of functions that are bounded almost everywhere. In fact, a function belongs to $\mathcal{L}^\infty(\mu, X)$ if and only if it agrees almost everywhere with some member of $\mathcal{L}^\infty(\mathcal{S}, X)$.

Remarks on membership in the Lebesgue spaces. Some mathematicians define the spaces $\mathcal{L}^p(\mu, X)$ a little differently, but in most cases their definitions are equivalent to the one given above. Note that $f$ belongs to $\mathcal{L}^p(\mu, X)$ if and only if

1. $f$ is "regular," in the sense that $f$ belongs to $SM(\mathcal{S}, X)$, and

2. $f$ is "not too big," in the sense that there exists some function $g \in \mathcal{L}^p(\mu, \mathbb{R})$ such that $|f(\cdot)| \le g(\cdot)$.
These two conditions are entirely different in nature and can be studied separately from one another.

Associated metric spaces. For $0 < p \le \infty$, in general the spaces $\mathcal{L}^p(\mu, X)$ are merely pseudometric spaces; we can make them into metric spaces by taking quotients in the usual fashion: Observe that $\|f - g\|_p = 0$ if and only if $f = g$ $\mu$-a.e. This defines an equivalence relation $f \approx g$ on the pseudometric space $\mathcal{L}^p(\mu, X)$. The resulting metric space is denoted $L^p(\mu, X)$; we may call it the Bochner-Lebesgue space of order $p$. The seminorm $\|\cdot\|_p$ or G-seminorm $\|\cdot\|_p^p$ on $\mathcal{L}^p(\mu, X)$ (for $1 \le p \le \infty$ or $0 < p < 1$, respectively) acts as a norm or G-norm on $L^p(\mu, X)$.

In general, the spaces $\mathcal{L}^p(\mu, X)$ and $L^p(\mu, X)$ are different. Members of $\mathcal{L}^p(\mu, X)$ are functions, whereas members of $L^p(\mu, X)$ are equivalence classes of functions. In some contexts, members of $L^p(\mu, X)$ are discussed as if they were functions - i.e., the distinction between a function and its equivalence class is ignored. In certain contexts this abuse of language is convenient and does not cause confusion.

Although the spaces $\mathcal{L}^p(\mu, X)$ and $L^p(\mu, X)$ are different in general, they are the same in some special cases - for instance, when $\mu$ is counting measure, for then each equivalence class in $\mathcal{L}^p(\mu, X)$ contains only one function.

Notation for scalar-valued functions. When $X$ is the scalar field $\mathbb{F}$, then we abbreviate $\mathcal{L}^p(\mu, X)$ as $\mathcal{L}^p(\mu)$ and abbreviate $L^p(\mu, X)$ as $L^p(\mu)$. The spaces $L^p(\mu)$ are called Lebesgue spaces.

When $\mu$ is counting measure on the finite set $\{1, 2, \ldots, n\}$, then $L^p(\mu) = \mathcal{L}^p(\mu)$ is just the finite dimensional space $\mathbb{F}^n$, normed as in 22.11. When $\mu$ is counting measure on $\mathbb{N}$, then $L^p(\mu) = \mathcal{L}^p(\mu)$ is just the sequence space $\ell_p$; thus all the results proved below for integrals have corollaries about sums. More generally, when $\mu$ is counting measure on some set $\mathbb{J}$, then $L^p(\mu) = \mathcal{L}^p(\mu)$ is the generalized sequence space $\ell_p(\mathbb{J})$ introduced in 22.25.

When $\Omega$ is a subset of $\mathbb{R}^n$ and $\mu$ is $n$-dimensional Lebesgue measure, then $L^p(\mu)$ is usually written as $L^p(\Omega)$. For instance, if $\mu$ is one-dimensional Lebesgue measure on the interval $[0, 1]$, then $L^p(\mu)$ is usually written as $L^p(0, 1)$ or $L^p[0, 1]$. There is no substantial difference between $L^p(0, 1)$ and $L^p[0, 1]$ since a single point has Lebesgue measure 0.

Further notation. The number $\|f\|_\infty$ is sometimes called the essential supremum of the function $f$. Caution: That term has another meaning; see 21.42.

An integrable function is a member of $\mathcal{L}^1(\mu, X)$ or $L^1(\mu, X)$; this terminology is explained in 23.16.

22.29. Lebesgue's Dominated Convergence Theorem. Let $0 < p < \infty$. Let $(f_n)$ be a sequence in $\mathcal{L}^p(\mu; X)$, converging pointwise to a limit $f$. Assume that the $f_n$'s are dominated by some member of $\mathcal{L}^p(\mu; \mathbb{R})$ - i.e., assume that $|f_n(\omega)| \le g(\omega)$ for some function $g \in \mathcal{L}^p(\mu; \mathbb{R})$. Then $f \in \mathcal{L}^p(\mu; X)$ and $\|f_n - f\|_p \to 0$.

Remarks.
This theorem can be proved for Riemann integrals by more elementary methods - i.e., not involving $\sigma$-algebras and abstract measure theory. See Luxemburg [1971] and Simons [1995], and other papers cited therein.

Proof of theorem. We first prove this in the case of $p = 1$. Observe that $|f(\omega)| \le g(\omega)$. Apply Fatou's Lemma (see 21.39.c) to the functions …

(D), let $x_n = u_n / \|u_n\|$ and $y_n = v_n / \|v_n\|$.
For (A)
22.41. Examples.

a. Let $1 < p < \infty$. When $\|f\| = \|g\| = 1$, then Clarkson's Inequality (proved in 22.35) yields modulus of convexity less than or equal to the function $\delta(\varepsilon) = 1 - [1 - (\tfrac{\varepsilon}{2})^\beta]^{1/\beta}$; thus $L^p(\mu)$ is uniformly convex.

Optional remarks. When $p > 2 > q$ then this estimate is the best possible, and so the function $\delta$ defined above (with $\beta = p$) is actually equal to the modulus of convexity of $L^p(\mu)$; this is shown by Hanner [1955]. However, when $p < 2 < q$, then the estimate can be improved slightly; Hanner shows that the modulus of convexity $\delta(\varepsilon)$ is the slightly smaller function defined implicitly by the equation
\[ \bigl(1 - \delta + \tfrac{\varepsilon}{2}\bigr)^p + \bigl|1 - \delta - \tfrac{\varepsilon}{2}\bigr|^p = 2. \]

b. In general, norms of type $\|\cdot\|_1$ are not strictly convex. For instance, when $\mathbb{R}^2$ is equipped with the norm $\|(x_1, x_2)\|_1 = |x_1| + |x_2|$, then the unit sphere contains the line segment $\{(x_1, x_2) : x_1, x_2 \ge 0,\ x_1 + x_2 = 1\}$.

c. In general, norms of type $\|\cdot\|_\infty$ are not strictly convex. For instance, when $\mathbb{R}^2$ is equipped with the norm $\|(x_1, x_2)\|_\infty = \max\{|x_1|, |x_2|\}$, then the unit sphere contains the line segment $\{(x_1, x_2) : x_1 = 1 \text{ and } -1 \le x_2 \le 1\}$.

d. (A renorming example due to Clarkson.) Let $\mathbb{F}$ be the scalar field, and let $C[0, 1] = \{\text{continuous functions from } [0, 1] \text{ into } \mathbb{F}\}$. Let $(t_n : n = 1, 2, 3, \ldots)$ be a dense sequence in $(0, 1)$ - e.g., the rationals in $(0, 1)$ or the dyadic rationals. For continuous $f : [0, 1] \to \mathbb{F}$, let $\|f\|_C = $ …

Show that $\|\cdot\|_C$ is a strictly convex norm on $C[0, 1]$ that is equivalent to $\|\cdot\|_\infty$. Hint: Use the strict convexity of $\ell_2$.

e. (Lovaglia's example.) Show that Clarkson's norm $\|\cdot\|_C$, given in 22.41.d, is not locally uniformly convex, by letting $x(t)$ be the constant $(3/4)^{1/2}$ and $y_n(t) = x(t) \min\{1, nt\}$.
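Examples 22.41.b and 22.41.c can be checked by direct computation: in a strictly convex space the midpoint of two distinct unit vectors has norm strictly less than 1, whereas for $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on $\mathbb{R}^2$ the midpoint can stay on the unit sphere. The sketch below compares these with the Euclidean norm; the helper names are hypothetical.

```python
import math

def norm1(v):
    return sum(abs(t) for t in v)

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def norm_inf(v):
    return max(abs(t) for t in v)

def midpoint_norm(norm, u, v):
    """Norm of the midpoint (u + v)/2 of two vectors."""
    return norm([(a + b) / 2 for a, b in zip(u, v)])

# Endpoints of the segment in 22.41.b: both have 1-norm equal to 1.
print(midpoint_norm(norm1, [1.0, 0.0], [0.0, 1.0]))
# Endpoints of the segment in 22.41.c: both have sup norm equal to 1.
print(midpoint_norm(norm_inf, [1.0, 1.0], [1.0, -1.0]))
# For the Euclidean norm the midpoint of two distinct unit vectors is shorter.
print(midpoint_norm(norm2, [1.0, 0.0], [0.0, 1.0]))
```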
Strict Convexity and Uniform Convexity
22.42. If $X$ is a uniformly convex normed space, then the completion of $X$ is also uniformly convex; it has the same modulus of convexity.

(Optional.) The completion of a strictly convex space need not be strictly convex, as the following example shows. For sequences of scalars $y = (y_0, y_1, y_2, \ldots)$, let $\|y\| = |y_0| + \sqrt{\sum_{j=1}^{\infty} 4^{-j} |y_j|^2}$. Let $Y$ be the set of all sequences for which $\|y\| < \infty$. Let $X$ be the subspace consisting of those sequences $y$ that also satisfy $\lim_{j \to \infty} y_j = y_0$. Show that

a. $(Y, \|\cdot\|)$ is a Banach space, and $X$ is a dense linear subspace.

b. $X$ is strictly convex. Hint: Use the fact that $\ell_2$ is complete and strictly convex.

c. $Y$ is not strictly convex. Hint: 22.41.b.

22.43. Clarkson's Renorming Theorem. Let $(X, \|\cdot\|)$ be a separable normed space. Then $\|\cdot\|$ is equivalent to a strictly convex norm.

Proof. This short proof is from Riley [1981]. Let $\mathbb{F}$ be the scalar field. Let $(x_n)$ be a sequence in $X$ with the property that every point in $X$ is a limit of some subsequence of $(x_n)$ (see 15.13.g). For $n = 1, 2, 3, \ldots$, let
\[ f_n(y) = \mathrm{dist}(y, \mathbb{F} x_n) = \inf_{\lambda \in \mathbb{F}} \|y - \lambda x_n\|. \]
Define $\gamma(y) = \|y\| + \sum_{n=1}^{\infty} 2^{-n} f_n(y)$. Then show

a. Each $f_n$ is a seminorm on $X$, with $f_n(\cdot) \le \|\cdot\|$.

b. $\gamma$ is a norm on $X$ that is equivalent to $\|\cdot\|$.

c. Now let $y$ and $z$ be nonzero vectors in $X$, with $\gamma(y + z) = \gamma(y) + \gamma(z)$. It suffices to show that $y = tz$ for some $t > 0$. Show, first of all, that $\|y + z\| = \|y\| + \|z\|$, and $f_n(y + z) = f_n(y) + f_n(z)$ for all $n$.

d. Since $(x_n)$ is dense in $X$, there is some subsequence $(x_{n(j)})$ that is $\|\cdot\|$-convergent to $y + z$. That is, $\|y + z - x_{n(j)}\| \to 0$ as $j \to \infty$. Hence $f_{n(j)}(y + z) \to 0$, and therefore $f_{n(j)}(y) \to 0$. Thus there exist scalars $\lambda_j$ with $\|y - \lambda_j x_{n(j)}\| \to 0$.

e. We consider two cases now: First, suppose the sequence $(\lambda_j)$ is unbounded. Replacing it with a subsequence (explain), we may assume that $1/\lambda_j \to 0$. Using the joint continuity of multiplication (noted in 22.7), show that $\|x_{n(j)}\| \to 0$, hence $\|y + z\| = 0$, hence $y = z = 0$, a contradiction.

f. Thus, the sequence $(\lambda_j)$ is bounded. Replacing it with a subsequence (explain), we may assume that $(\lambda_j)$ converges to some finite scalar $\lambda$. In that case, again using the joint continuity of multiplication, show $y = \lambda(y + z)$; hence $\lambda \ne 0$.

g. Similarly, $z = \mu(y + z)$ for some nonzero scalar $\mu$, so $y = tz$ for some nonzero scalar $t$.

h. Since also $\|y + z\| = \|y\| + \|z\|$, show that $|1 + t| = 1 + |t|$, and therefore $t > 0$.

22.44. Remarks. The theorem above was originally proved for norms by Clarkson. The proof given above can also be applied to F-norms, if interpreted appropriately. Still more is true, at least for norms. We have in fact
Kadec's Renorming Theorem. Every separable normed space has an equivalent norm that is locally uniformly convex.

The proof of Kadec's theorem is longer and deeper, and will not be given here. Some other, related results: Any separable normed space has an equivalent norm that makes both $X$ and its dual strictly convex (Klee, 1959). If both $X$ and its dual are separable (Kadec, Klee, Asplund) or if $X$ is reflexive (Troyanski), then $X$ has an equivalent norm that makes both $X$ and its dual locally uniformly convex. For further, related reading and references, see Diestel [1975], Istratescu [1984], and Lindenstrauss [1988].

22.45. Theorem on closest points. Let $Q$ be a convex subset of a Banach space $X$. Assume either
(i) $X$ is strictly convex and $Q$ is compact, or
(ii) $X$ is uniformly convex and $Q$ is closed.
Then for each point $x \in X$ there is a unique point $\pi(x) \in Q$ that is closest to $x$, i.e., that satisfies $\|x - \pi(x)\| = \mathrm{dist}(x, Q)$. Furthermore, this function $\pi : X \to Q$ is continuous. It is called the closest point projection onto $Q$.

Proof. Uniqueness follows from 22.39(D). For any $x \in X$, there exists a sequence $(q_n)$ in $Q$ that satisfies $\|x - q_n\| \to \mathrm{dist}(x, Q)$; any such sequence will be called a minimizing sequence for $x$ in this proof. Note that any subsequence of a minimizing sequence is a minimizing sequence. To prove the existence of $\pi(x)$ it suffices to show that

(!) any minimizing sequence for $x$ has a convergent subsequence.

That is easy in case (i), since any sequence in a compact metric space has a convergent subsequence. The proof of (!) will take slightly longer for case (ii). Let $(q_n)$ be a minimizing sequence, and let $r = \mathrm{dist}(x, Q)$. The result is trivial if $r = 0$; we shall assume $r > 0$. By rescaling, we may assume $r = 1$. Thus $\|x - q_m\| \to 1$ and $\|x - q_n\| \to 1$ as $m, n \to \infty$. On the other hand, $\frac12(q_m + q_n) \in Q$ since $Q$ is convex; thus $\|x - \frac12(q_m + q_n)\| \ge \mathrm{dist}(x, Q) = 1$. Therefore $\|(x - q_m) + (x - q_n)\| \to 2$. By 22.40(C) the sequence $(q_n)$ is Cauchy. This completes the proof of (!). Thus $\pi$ is defined everywhere on $X$.

To show $\pi$ is continuous, suppose $(x_n)$ is a sequence converging in $X$ to some limit $x_\infty$; we must show that $\pi(x_n)$ converges to $\pi(x_\infty)$. Suppose not. Replacing $(x_n)$ with a subsequence, we may assume $\|\pi(x_n) - \pi(x_\infty)\| > \varepsilon$ for some constant $\varepsilon > 0$. We know that $\mathrm{dist}(x_n, Q) \to \mathrm{dist}(x_\infty, Q)$ by 4.41.b; hence $(\pi(x_n))$ is a minimizing sequence for $x_\infty$. Replacing $(x_n)$ with a subsequence, by (!) we know that $(\pi(x_n))$ converges to some limit $q \in Q$. Then $\|q - \pi(x_\infty)\| \ge \varepsilon > 0$, so $q \ne \pi(x_\infty)$. Thus, $q$ is not the member of $Q$ closest to $x_\infty$, so $\|q - x_\infty\| > \mathrm{dist}(x_\infty, Q)$. Hence $\|q - x_\infty\| > r > \mathrm{dist}(x_\infty, Q)$ for some real number $r$. Then for all sufficiently large $n$ we have $\|\pi(x_n) - x_n\| > r > \mathrm{dist}(x_n, Q)$, a contradiction. Thus $\pi$ is continuous.
HILBERT SPACES
22.46. Definition. Let $X$ be a linear space over $\mathbb{F}$. An inner product on $X$ is a mapping $\langle\ ,\ \rangle : X \times X \to \mathbb{F}$ that satisfies:
(linear in first component) $\langle \cdot, y\rangle : X \to \mathbb{F}$ is linear, for each $y \in X$;
(positive-definiteness) $x \ne 0 \Rightarrow \langle x, x\rangle > 0$;
(conjugate symmetry) $\langle x, y\rangle = \overline{\langle y, x\rangle}$,
where the bar denotes complex conjugation. The conjugate symmetry condition is sometimes called "antisymmetry." If the scalar field is $\mathbb{R}$, then the complex conjugate of a scalar is equal to that scalar, and so the conjugate symmetry condition becomes
(symmetry) $\langle x, y\rangle = \langle y, x\rangle$,
and it also implies that $\langle\ ,\ \rangle$ is bilinear, i.e., linear in each of its two arguments.

An inner product space is a linear space equipped with an inner product. As we shall see in an exercise below, if $\langle\ ,\ \rangle$ is an inner product then $\|x\| = \langle x, x\rangle^{1/2}$ is a norm on $X$. An inner product space will always be understood to be equipped with this norm, unless some other arrangement is specified. If the norm is complete, then the inner product space is called a Hilbert space.

22.47. Examples. If $(\Omega, \mathcal{S}, \mu)$ is any measure space, then $L^2(\mu)$ is a Hilbert space, with inner product defined by
$$\langle f, g\rangle = \int_\Omega f(\omega)\,\overline{g(\omega)}\,d\mu(\omega).$$
The convergence of the integral is guaranteed by Hölder's inequality. If the scalar field is $\mathbb{R}$, then the bar over the $g(\omega)$ may be omitted. We note some important special cases.
a. Let $(\Omega, \mathcal{S}, \mu)$ be some set $\mathbb{J}$ equipped with counting measure. Then we obtain the normed space $\ell_2(\mathbb{J})$ introduced in 22.25. It has inner product
$$\langle f, g\rangle = \sum_{j \in \mathbb{J}} f(j)\,\overline{g(j)}.$$
In 22.56 we shall prove that every Hilbert space can be expressed in this form, i.e., every Hilbert space is isomorphic to some $\ell_2(\mathbb{J})$. However, other representations of Hilbert spaces are often useful.
b. When $\mathbb{J}$ is a finite set containing $n$ elements, we find that $\mathbb{F}^n$ is a Hilbert space when equipped with the inner product
$$\langle x, y\rangle = x_1\overline{y_1} + x_2\overline{y_2} + \cdots + x_n\overline{y_n}.$$
If the scalar field is $\mathbb{R}$, then the bars over the $y_j$'s may be omitted. In $\mathbb{R}^n$, the inner product is also known as the dot product; it is used in analytic geometry to give algebraic formulas for much of Euclidean geometry.
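The inner product on $\mathbb{F}^n$ from 22.47.b can be checked numerically. The following is a minimal sketch for $\mathbb{F} = \mathbb{C}$, $n = 2$; the helper name `inner` and the sample vectors are illustrative choices, not taken from the text. It tests conjugate symmetry, linearity in the first argument, and positivity of $\langle x, x\rangle$ on one example (a spot check, not a proof).

```python
def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
z = [0.5j, 4 + 0j]
c = 2 - 3j

# Conjugate symmetry: <x, y> equals the complex conjugate of <y, x>.
conj_sym_gap = abs(inner(x, y) - inner(y, x).conjugate())

# Linearity in the first argument: <c*x + z, y> = c*<x, y> + <z, y>.
cx_plus_z = [c * a + b for a, b in zip(x, z)]
linearity_gap = abs(inner(cx_plus_z, y) - (c * inner(x, y) + inner(z, y)))

# Positive-definiteness on this sample: <x, x> = |1+2i|^2 + |3-i|^2 = 15.
norm_sq = inner(x, x)

print(conj_sym_gap, linearity_gap, norm_sq)
```

Dropping the `.conjugate()` call would give a bilinear form that fails positive-definiteness on $\mathbb{C}^n$, which is why the bar is needed in the complex case.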
22.48. Some elementary properties. Let $\langle\ ,\ \rangle$ be an inner product on some vector space $X$, and let $\|x\| = \langle x, x\rangle^{1/2}$. (We do not yet assert that $\|\ \|$ is a norm; that fact is shown below.) Show that
a. $\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\langle x, y\rangle + \|y\|^2$. (Hence $\mathrm{Re}\langle x, y\rangle$ is uniquely determined by $\|\ \|$. It follows easily that $\langle x, y\rangle$ is uniquely determined by $\|\ \|$.)
b. Schwarz Inequality. $|\langle x, y\rangle| \le \|x\|\,\|y\|$. Hint: Substitute $c = \langle x, y\rangle/\|y\|^2$, and use $0 \le \|x - cy\|^2$.
c. $\|\ \|$ is a norm on $X$.
d. The mapping $(x, y) \mapsto \langle x, y\rangle$ is a continuous map from $X \times X$ (with the product topology) into $\mathbb{F}$.
e. Parallelogram Equation. $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$.
Remark. Clarkson's Inequality may be viewed as a generalization of the Parallelogram Equation. Clarkson's Inequality tells us that the $\|\ \|_p$ norms, for $1 < p < \infty$, are "almost as good as" the norms of inner product spaces.
f. Any inner product space is uniformly convex.

22.49. Converse results (optional). Let $(X, \|\ \|)$ be a normed space whose norm satisfies the Parallelogram Equation. Then $\|\ \|$ arises from an inner product $\langle\ ,\ \rangle$, which is uniquely determined by $\|\ \|$:
(i) If $\mathbb{F} = \mathbb{R}$, then $\frac14\left[\|x + y\|^2 - \|x - y\|^2\right] = \langle x, y\rangle$.
(ii) If $\mathbb{F} = \mathbb{C}$, then $\frac14\left[\|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2\right] = \langle x, y\rangle$.
Hint: 22.48.a.

22.50. Let $X$ be a linear space, and let $\langle\ ,\ \rangle$ be an inner product on $X$. In this context, we say that two elements $x, y \in X$ are orthogonal to each other, denoted $x \perp y$, if $\langle x, y\rangle = 0$. For any set $S \subseteq X$, the orthogonal complement of $S$ is the set
$$S^\perp = \{x \in X : x \perp s \text{ for all } s \in S\}.$$
This definition is a special case of 4.12, with the relation $\{(x, y) : x \perp y\} = \{(x, y) : \langle x, y\rangle = 0\}$, and so the conclusions of 4.12 are applicable. The mapping $S \mapsto S^{\perp\perp}$ is then a Moore closure on $X$. (That closure is characterized further in 22.52.)

22.51. Theorem on closest points. Let $C$ be a nonempty closed convex subset of a Hilbert space $X$. Then for each $u \in X$, there is among the members of $C$ a unique point $\pi(u)$ that is closest to $u$. It can be characterized as follows: It is the only point $\xi \in C$ that satisfies
$$\mathrm{Re}\langle \xi - u, x - \xi\rangle \ge 0 \quad\text{for all } x \in C.$$
(In terms of Euclidean geometry, this inequality says that the directed line segment from $\xi$ to $u$ and the directed line segment from $\xi$ to $x$ are separated by an obtuse angle, i.e., an angle greater than a right angle.) The mapping $\pi : X \to C$ is also nonexpansive, i.e., it satisfies $\langle\pi\rangle_{\mathrm{Lip}} \le 1$.

If $C$ is a closed linear subspace of $X$, then $\pi(u)$ can also be characterized as follows: It is the unique point $\xi \in C$ that satisfies $u - \xi \in C^\perp$.

Proof. Let $C$ be closed, convex, and nonempty, and let $u \in X$. It follows from 22.45 that there is a closest point and that it is unique. Now let $\xi$ be a point in $C$. Then

$\xi$ is the point in $C$ that is closest to $u$
$\iff \|\xi - u\| < \|x - u\|$ for all $x \in C \setminus \{\xi\}$
$\iff \|\xi - u\| < \|\lambda x + (1 - \lambda)\xi - u\|$ for all $x \in C \setminus \{\xi\}$ and $\lambda \in (0, 1]$
$\iff \|\xi - u\|^2 < \|(\xi - u) + \lambda(x - \xi)\|^2$ for all $x \in C \setminus \{\xi\}$ and $\lambda \in (0, 1]$
$\iff \|\xi - u\|^2 < \|\xi - u\|^2 + 2\lambda\,\mathrm{Re}\langle \xi - u, x - \xi\rangle + \lambda^2\|x - \xi\|^2$ for all $x \in C \setminus \{\xi\}$ and $\lambda \in (0, 1]$
$\iff 0 < 2\lambda\,\mathrm{Re}\langle \xi - u, x - \xi\rangle + \lambda^2\|x - \xi\|^2$ for all $x \in C \setminus \{\xi\}$ and $\lambda \in (0, 1]$
$\iff 0 < 2\,\mathrm{Re}\langle \xi - u, x - \xi\rangle + \lambda\|x - \xi\|^2$ for all $x \in C \setminus \{\xi\}$ and $\lambda \in (0, 1]$
$\iff 0 \le \mathrm{Re}\langle \xi - u, x - \xi\rangle$ for all $x \in C \setminus \{\xi\}$
$\iff 0 \le \mathrm{Re}\langle \xi - u, x - \xi\rangle$ for all $x \in C$.

This proves the first characterization. Thus $0 \le \mathrm{Re}\langle \pi(u) - u, x - \pi(u)\rangle$ for all $u \in X$ and $x \in C$. Apply that result with $x = \pi(v)$ to obtain

(1) $0 \le \mathrm{Re}\langle \pi(u) - u, \pi(v) - \pi(u)\rangle$ for any $u, v \in X$.

Reversing the roles of $u$ and $v$ yields $0 \le \mathrm{Re}\langle \pi(v) - v, \pi(u) - \pi(v)\rangle$. Combine that inequality with (1) and rearrange the results to obtain
$$\|\pi(v) - \pi(u)\|^2 = \mathrm{Re}\langle \pi(v) - \pi(u), \pi(v) - \pi(u)\rangle \le \mathrm{Re}\langle v - u, \pi(v) - \pi(u)\rangle \le |\langle v - u, \pi(v) - \pi(u)\rangle| \le \|v - u\|\,\|\pi(v) - \pi(u)\|$$
and therefore $\|\pi(v) - \pi(u)\| \le \|v - u\|$. Thus $\pi$ is nonexpansive.

Now suppose $C$ is a linear subspace of $X$, and $\xi \in C$. Then as $x$ varies over all members of $C$, $x - \xi$ also varies over all members of $C$. Hence

$\xi$ is the point in $C$ that is closest to $u$
$\iff 0 \le \mathrm{Re}\langle \xi - u, x - \xi\rangle$ for all $x \in C$
$\iff 0 \le \mathrm{Re}\langle \xi - u, y\rangle$ for all $y \in C$
$\iff 0 \le \mathrm{Re}\langle \xi - u, cy\rangle$ for all $y \in C$ and all scalars $c$
$\iff 0 = \langle \xi - u, y\rangle$ for all $y \in C$
$\iff \xi - u \in C^\perp$.
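The two conclusions of 22.51 can be illustrated in the Hilbert space $\mathbb{R}^3$. The sketch below assumes $C$ is the box $[-1, 1]^3$ (a closed convex set chosen only for illustration), for which the closest point is obtained by clipping each coordinate; the function names and the random sampling are hypothetical scaffolding, not part of the text. It checks the inequality $\langle \xi - u, x - \xi\rangle \ge 0$ and the nonexpansive property on random samples.

```python
import random

def project_box(u, lo=-1.0, hi=1.0):
    """Closest point of the box [lo, hi]^n to u: clip each coordinate."""
    return [min(max(t, lo), hi) for t in u]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def norm(x):
    return dot(x, x) ** 0.5

rng = random.Random(0)
n = 3
min_inner = float("inf")   # smallest observed <xi - u, x - xi>
max_ratio = 0.0            # largest observed ||pi(v) - pi(u)|| / ||v - u||
for _ in range(1000):
    u = [rng.uniform(-3, 3) for _ in range(n)]
    v = [rng.uniform(-3, 3) for _ in range(n)]
    x = [rng.uniform(-1, 1) for _ in range(n)]   # an arbitrary point of C
    xi, eta = project_box(u), project_box(v)
    min_inner = min(min_inner, dot(sub(xi, u), sub(x, xi)))
    if norm(sub(v, u)) > 0:
        max_ratio = max(max_ratio, norm(sub(eta, xi)) / norm(sub(v, u)))

print(min_inner, max_ratio)
```

On these samples the inner product never goes negative and the ratio never exceeds 1, in agreement with the theorem; the sampling is of course only evidence, not a proof.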
22.52. Theorem on orthogonal complements. Let $X$ be a Hilbert space and $S \subseteq X$. Then $S^{\perp\perp}$ is the closed linear span of $S$. Thus $S$ is an orthogonal complement if and only if $S$ is a closed linear subspace of $X$.

Furthermore, if $S, T \subseteq X$ with $S^\perp = T$ and $T^\perp = S$, then $S$ and $T$ form an internal direct sum decomposition of $X$; that is, $S + T = X$ and $S \cap T = \{0\}$. The projections of $X$ onto $S$ and $T$ are the closest point mappings; i.e., for any $x \in X$ the unique decomposition $x = s + t$ with $s \in S$, $t \in T$ is given by $s$ and $t$ being the points in $S$ and $T$ that are closest to $x$. These are continuous linear maps.

Remark. Compare this theorem with 11.61.

Proof of theorem. It is easy to see that any orthogonal complement is a closed linear subspace of $X$. Let $\mathrm{clsp}(S)$ denote the closed linear span of $S$; then $S^{\perp\perp} \subseteq \mathrm{clsp}(S)$. (Indeed, if $x \notin \mathrm{clsp}(S)$, let $s$ be the point of $\mathrm{clsp}(S)$ closest to $x$; by 22.51 the nonzero vector $y = x - s$ is orthogonal to $\mathrm{clsp}(S)$, so $y \in S^\perp$ and $\langle x, y\rangle = \langle s, y\rangle + \|y\|^2 = \|y\|^2 \ne 0$, hence $x \notin S^{\perp\perp}$.) We wish to show equality here. Suppose that $x \in \mathrm{clsp}(S) \setminus S^{\perp\perp}$. Since $x \notin S^{\perp\perp}$, there is some $y \in S^\perp$ such that $\langle x, y\rangle \ne 0$. Since $y \in S^\perp$, we have $\langle s, y\rangle = 0$ for every $s \in S$, hence (by the linearity of $\langle \cdot, y\rangle$) also for every $s \in \mathrm{span}(S)$, hence (by the continuity of $\langle \cdot, y\rangle$) also for every $s \in \mathrm{cl}(\mathrm{span}(S)) = \mathrm{clsp}(S)$. But this contradicts $\langle x, y\rangle \ne 0$. Thus, we must have $S^{\perp\perp} = \mathrm{clsp}(S)$.

Now suppose that $S^\perp = T$ and $T^\perp = S$. Let $s$ be the point in $S$ that is closest to $x$. By 22.51, $x - s$ is a member of $S^\perp = T$. This shows that $x$ can be represented as the sum of an element of $S$ and an element of $T$. Since $S$ and $T$ are linear subspaces of $X$ and $S \cap T = \{0\}$, the representation is necessarily unique (see 8.13). Thus, in such a representation, the $S$ component must be the member of $S$ closest to $x$. By symmetric reasoning, the $T$ component must be the member of $T$ closest to $x$.

22.53. Remarks. The preceding theorem has a converse: If $X$ is a normed space in which every closed linear subspace has an additive complement that is also a closed linear subspace, then $X$ is isomorphic to a Hilbert space. This was proved in Lindenstrauss and Tzafriri [1971]; the proof is too long to give here.

22.54. Definitions. Let $X$ be a Hilbert space. An orthonormal set in $X$ is a set $S \subseteq X$ with the property that $\langle s, t\rangle = \delta_{st}$, where $\delta$ is the Kronecker delta, i.e.,
$$\langle s, t\rangle = \begin{cases} 0 & \text{if } s \ne t, \\ 1 & \text{if } s = t. \end{cases}$$
Some easy observations. Suppose $\{e_1, e_2, \ldots, e_n\}$ is an orthonormal set. Then:
a. $\|e_j\| = 1$ for each $j$.
b. If $x = r_1e_1 + r_2e_2 + \cdots + r_ne_n$ and $y = s_1e_1 + s_2e_2 + \cdots + s_ne_n$ for some scalars $r_j$ and $s_j$, then $\langle x, y\rangle = r_1\overline{s_1} + r_2\overline{s_2} + \cdots + r_n\overline{s_n}$ and $\|x\|^2 = |r_1|^2 + |r_2|^2 + \cdots + |r_n|^2$.
c. The $e_j$'s are linearly independent, i.e., if $r_1e_1 + r_2e_2 + \cdots + r_ne_n = 0$ then $r_1 = r_2 = \cdots = r_n = 0$.
d. If $x = r_1e_1 + r_2e_2 + \cdots + r_ne_n$ and $u \in X$, then
$$\|u - x\|^2 = \|u\|^2 + \sum_{j=1}^{n} |r_j - \langle u, e_j\rangle|^2 - \sum_{j=1}^{n} |\langle u, e_j\rangle|^2.$$
Hence the member of $\mathrm{span}\{e_1, e_2, \ldots, e_n\}$ that is closest to $u$ is the vector $x = r_1e_1 + r_2e_2 + \cdots + r_ne_n$ obtained by taking $r_j = \langle u, e_j\rangle$ for all $j$. Its distance from $u$ is
$$\sqrt{\|u\|^2 - \sum_{j=1}^{n} |\langle u, e_j\rangle|^2}.$$
e. $\|u\|^2 \ge \sum_{j=1}^{n} |\langle u, e_j\rangle|^2$ for any $u \in X$.

22.55. Theorem. Let $X$ be a Hilbert space, and let $\{e_j : j \in \mathbb{J}\}$ be an orthonormal subset of $X$. Then the following conditions are equivalent. If one (hence all three) of them is satisfied, we say $\{e_j : j \in \mathbb{J}\}$ is an orthonormal basis for $X$.
(A) $\{e_j : j \in \mathbb{J}\}$ is a maximal orthonormal set, i.e., an orthonormal set that is not contained in any other orthonormal set.
(B) The span of $\{e_j : j \in \mathbb{J}\}$ is dense in $X$.
(C) Parseval's Identity. $\|u\|^2 = \sum_{j \in \mathbb{J}} |\langle u, e_j\rangle|^2$ for every $u \in X$.

Remark. By Zorn's Lemma, any orthonormal set can be extended to a maximal orthonormal set. However, some Hilbert spaces have natural orthonormal bases that can be constructed without the Axiom of Choice. For instance, the space $\ell_2$ has an orthonormal basis consisting of the vectors $\xi_1, \xi_2, \xi_3, \ldots$, where $\xi_j = (0, 0, \ldots, 0, 1, 0, \ldots)$ has a 1 in the $j$th place and 0s elsewhere.

Proof of (A) $\Rightarrow$ (B). Suppose the span of $\{e_j\}$ is not dense. Then the closed span of $\{e_j\}$, which we shall denote by $Y$, is not equal to $X$. Let $u \in X \setminus Y$. Let $y$ be the point in $Y$ that is closest to $u$. Then $z = y - u$ is nonzero, and $z$ is orthogonal to all of $Y$, hence to all of $\{e_j\}$. Let $\xi = z/\|z\|$. Then $\{e_j : j \in \mathbb{J}\} \cup \{\xi\}$ is an orthonormal set; thus $\{e_j : j \in \mathbb{J}\}$ is not maximal.

Proof of (B) $\Rightarrow$ (C). Let any $u \in X$ and $\varepsilon > 0$ be given. Since the span of the $e_j$'s is dense in $X$, there is some finite set ...
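The closest-point formula in 22.54.d and the inequality in 22.54.e can be checked on a small example. The sketch below works in $\mathbb{R}^3$ with the orthonormal pair $e_1 = (1, 0, 0)$, $e_2 = (0, 0.6, 0.8)$ and the vector $u = (2, 1, -1)$; these values and all helper names are illustrative assumptions. It compares the distance computed directly with the distance given by the formula, and confirms that perturbing the coefficients $r_j = \langle u, e_j\rangle$ moves the approximation farther from $u$.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def combine(coeffs, vectors):
    """Linear combination sum_j coeffs[j] * vectors[j]."""
    dim = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(dim)]

def distance(x, y):
    d = [a - b for a, b in zip(x, y)]
    return dot(d, d) ** 0.5

e = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]   # an orthonormal set in R^3
u = [2.0, 1.0, -1.0]

coeffs = [dot(u, ej) for ej in e]          # r_j = <u, e_j>
best = combine(coeffs, e)                  # closest point in span{e_1, e_2}

dist_direct = distance(u, best)
dist_formula = (dot(u, u) - sum(c * c for c in coeffs)) ** 0.5

# Any other choice of coefficients gives a strictly larger distance.
perturbed = combine([coeffs[0] + 0.1, coeffs[1] - 0.2], e)
dist_perturbed = distance(u, perturbed)

# Inequality 22.54.e for this example.
bessel_holds = sum(c * c for c in coeffs) <= dot(u, u)

print(dist_direct, dist_formula, dist_perturbed, bessel_holds)
```

Here $\|u\|^2 = 6$ and $\sum_j |\langle u, e_j\rangle|^2 = 4.04$, so both distance computations give $\sqrt{1.96} = 1.4$ up to rounding.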
Chapter 23 Normed Operators
NORMS OF OPERATORS

23.1. Let $(X, \|\ \|_X)$ and $(Y, \|\ \|_Y)$ be normed vector spaces. Let $f : X \to Y$ be a linear map. Show that the following conditions are equivalent:
(A) $f$ is continuous.
(B) $f$ is uniformly continuous.
(C) $f$ is Lipschitzian; i.e., the Lipschitz constant
$$|||f||| = \sup\left\{ \frac{\|f(x) - f(x')\|_Y}{\|x - x'\|_X} : x, x' \in X,\ x \ne x' \right\}$$
is finite.
(D) $f$ is a bounded linear operator, i.e., whenever $S$ ...
(I) If $(u_n)$ is a sequence in $X$ with $\|u_n\|_X \to 0$, then $\sup_n \|f(u_n)\|_Y < \infty$.
Moreover, if these conditions are satisfied, then all the numbers $|||f|||$ defined above are equal to each other.
Hint for 23.1(I) $\Rightarrow$ 23.1(H): Let $u_n = x_n/\sqrt{\|x_n\|_X}$.

Further notations. The set of all bounded linear operators from $X$ into $Y$ is a linear subspace of $Y^X = \{\text{maps from } X \text{ into } Y\}$, which we shall often denote by $BL(X, Y)$. It is a normed space, with $|||f|||$ (defined as above) for the norm of $f$. A norm obtained in this fashion is the operator norm determined by $\|\ \|_X$ and $\|\ \|_Y$. A bounded linear operator will generally be given this norm, unless some other norm is specified. In most of the literature the operator norm is denoted by $\|\ \|$, but in this textbook we shall frequently denote it by $|||\ |||$ to aid the beginner in distinguishing this norm from the "lower-level" norms of $X$ and $Y$.

23.2. Exercises and examples.
a. If $f : X \to Y$ and $g : Y \to Z$ are bounded linear maps, then the composition $g \circ f : X \to Z$ is also a bounded linear map, with $|||g \circ f||| \le |||f|||\,|||g|||$. Also, the identity map $i_X : X \to X$ has operator norm equal to 1. Thus, we could take the bounded linear maps as the morphisms of a category, with normed linear spaces for the objects.
b. Let $\Delta$ be a set, and let $B(\Delta) = \{\text{bounded functions from } \Delta \text{ into } \mathbb{R}\}$; then $B(\Delta)$ is a Banach space when equipped with the sup norm. Let $T : B(\Delta) \to \mathbb{R}$ be a positive linear map, i.e., assume $f \ge 0 \Rightarrow T(f) \ge 0$. Then $T$ is also a bounded linear map; in fact, $|||T||| \le |T(1)|$. In particular, any Banach limit (defined as in 12.33) is a bounded linear operator. Similarly, if $\Delta$ is a topological space, then $BC(\Delta) = \{\text{bounded continuous functions from } \Delta \text{ into } \mathbb{R}\}$ is a Banach space when equipped with the sup norm, and any positive linear map from $BC(\Delta)$ into $\mathbb{R}$ is a bounded linear map.
c. The definitions of the vector space $BL(X, Y)$ and its operator norm $|||\ |||$ depend on the norms $\|\ \|_X$ and $\|\ \|_Y$ of the spaces $X$ and $Y$. Show that if $\|\ \|_X$ and $\|\ \|_Y$ are replaced with equivalent norms $\|\ \|'_X$ and $\|\ \|'_Y$, then the vector space $BL(X, Y)$ remains the same, and its norm $|||\ |||$ is replaced with an equivalent norm $|||\ |||'$. See also the related result in 23.29(iv).
d. If $Y$ is complete, then the normed space $BL(X, Y)$ is complete, regardless of whether $X$ is complete. In particular, $BL(X, \mathbb{F})$ is complete, since the only scalar fields $\mathbb{F}$ that we are considering for normed spaces in this book are $\mathbb{R}$ and $\mathbb{C}$, both of which are complete.
e. Elementary Extension Theorem. Let $X_0$ be a dense linear subspace of a normed space $X$; let $X_0$ be normed with the restriction of the norm of $X$. Let $Y$ be a Banach space. If $f_0 : X_0 \to Y$ is a continuous linear map, then $f_0$ extends uniquely to a continuous linear map $f : X \to Y$. Furthermore, $f_0$ and $f$ have the same operator norm.
Proof. This is a special case of 19.27. (However, some readers may prefer to prove it directly.)

23.3. Example: matrix norms. Let $T$ be an $m$-by-$n$ matrix, with scalar $t_{ij}$ in row $i$, column $j$. Consider elements $x \in \mathbb{F}^n$ as $n$-by-1 column vectors and elements $y \in \mathbb{F}^m$ as $m$-by-1 column vectors. Then $T$ acts as a linear map from $\mathbb{F}^n$ into $\mathbb{F}^m$, with $y = Tx$ given as usual by $y_i = \sum_{j=1}^{n} t_{ij}x_j$ $(1 \le i \le m)$. The choice of the norms on $\mathbb{F}^m$ and $\mathbb{F}^n$ will affect the value of the operator norm $|||T|||$. For most choices, the value of $|||T|||$ is complicated and difficult to compute. But for the two following choices, the value of $|||T|||$ is fairly simple.
a. Let $\mathbb{F}^m$ and $\mathbb{F}^n$ both be normed by their respective $\|\ \|_1$-norms, as defined in 22.11.
Then $|||T||| = \max_{1 \le j \le n} \sum_{i=1}^{m} |t_{ij}|$.
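For the $\|\ \|_1$-norms of 23.3.a, the operator norm of a matrix is its largest absolute column sum; this is a standard fact, and the sketch below only spot-checks it. The matrices `T` and `S`, the helper names, and the random sampling are illustrative assumptions. The code also checks the composition inequality of 23.2.a on one product.

```python
import random

def one_norm(x):
    return sum(abs(t) for t in x)

def apply(T, x):
    """Matrix-vector product y = T x, with T given as a list of rows."""
    return [sum(t * xj for t, xj in zip(row, x)) for row in T]

def matmul(S, T):
    """Matrix product S T, both given as lists of rows."""
    return [[sum(S[i][k] * T[k][j] for k in range(len(T)))
             for j in range(len(T[0]))] for i in range(len(S))]

def op_norm_1(T):
    """Operator norm induced by the 1-norms: largest absolute column sum."""
    return max(sum(abs(row[j]) for row in T) for j in range(len(T[0])))

T = [[1, -2, 0], [3, 4, -1]]      # 2-by-3, maps F^3 into F^2
S = [[2, 1], [0, -1], [1, 1]]     # 3-by-2, maps F^2 into F^3

norm_T = op_norm_1(T)             # column sums are 4, 6, 1
attained = one_norm(apply(T, [0, 1, 0]))   # the second unit vector attains it

# No sampled vector should exceed the claimed norm.
rng = random.Random(1)
max_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    max_ratio = max(max_ratio, one_norm(apply(T, x)) / one_norm(x))

norm_ST = op_norm_1(matmul(S, T))
print(norm_T, attained, max_ratio, norm_ST, op_norm_1(S) * norm_T)
```

The supremum in the definition of $|||T|||$ is attained at a standard unit vector, which is why a maximum over columns suffices for this particular choice of norms.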
... Let $\Phi$ be a collection of continuous linear maps from $X$ into $Y$. Then these conditions are equivalent:
(A) $\Phi$ is bounded pointwise; that is, $\Phi(x) = \{\varphi(x) : \varphi \in \Phi\}$ is a bounded subset of $Y$ for each $x \in X$.
(B) $\Phi$ is uniformly bounded, i.e., equicontinuous; that is, $\sup_{\varphi \in \Phi} |||\varphi||| < \infty$.

... $k$. For each $k$, we can choose some $u_k \in X$ with $\|u_k\|_X = 1$ and $\|\varphi_k(u_k)\|_Y > k$. (Remark. These choices do not require the Axiom of Choice, but only the Axiom of Countable Choice, discussed in 6.25.) We offer two different methods for finishing the proof.

The first method is shorter but relies on earlier results that are rather nonelementary: By the Baire Category Theorem (a form of Dependent Choice), the complete metric space $X$ is a Baire space. Since the functions $\varphi_k$ are bounded pointwise, the functions $k^{-1/2}\varphi_k$ converge pointwise to 0, and therefore are equicontinuous by 23.13. Since the vectors $k^{-1/2}u_k$ converge to 0 in $X$, it follows that $k^{-1/2}\varphi_k(k^{-1/2}u_k) \to 0$ in $Y$. But $\|k^{-1/2}\varphi_k(k^{-1/2}u_k)\|_Y = k^{-1}\|\varphi_k(u_k)\|_Y > 1$, a contradiction.

The second proof, though longer, may be preferable to some readers, because it is self-contained and does not rely on the Baire Category Theorem or other deep topological theorems. (In fact, it uses Countable Choice but not Dependent Choice.) It is based on Hennefeld [1980]. Recursively define a sequence $(x_n)$ in $X$ and a sequence $(f_n)$ in $\Phi$ as follows: Let $x_0 = 0$ and choose any $f_0 \in \Phi$. Having chosen $x_0, x_1, \ldots, x_{n-1} \in X$ and $f_0, f_1, \ldots, f_{n-1} \in \Phi$ (clear for $n = 1$), define the numbers ...
... $\delta \succcurlyeq \delta_0\}$. By assumption, there exists a probability charge $\mu$ on $\mathcal{P}(\Delta)$ that takes the value 1 on elements of $\mathcal{F}$. Define $\mathrm{LIM}(u) = \int_\Delta u(\delta)\,d\mu(\delta)$ in the obvious fashion for simple functions $u$. Since simple functions are dense, we can extend this definition to $u \in B(\Delta)$ by taking limits. (This construction is a special case of the Bartle integral construction described in 29.30.)

Note that if $F \in \mathcal{F}$, then $\mathrm{LIM}(1_F) = \mu(F) = 1$ (where $1_F : \Delta \to \{0, 1\}$ is the characteristic function of $F$), and $\mu(\Delta \setminus F) = \mu(\Delta) - \mu(F) = 1 - 1 = 0$. If $g$ is a bounded real-valued function on $\Delta$ that vanishes on $F$, then $-\|g\|_\infty 1_{\Delta \setminus F} \le g \le \|g\|_\infty 1_{\Delta \setminus F}$; hence $0 \le \mathrm{LIM}(g) \le 0$ and thus $\mathrm{LIM}(g) = 0$. If $h$ is any bounded real-valued function on $\Delta$, then $h - h1_F$ vanishes on $F$, so $\mathrm{LIM}(h) = \mathrm{LIM}(h1_F)$.
We shall show that LIM is a Banach limit with the desired properties. Clearly LIM is a positive linear functional; it suffices to show that $\mathrm{LIM}(u) \le \limsup_{\delta \in \Delta} u(\delta)$ for each $u \in B(\Delta)$. Fix any number $r > \limsup_{\delta \in \Delta} u(\delta)$; it suffices to show that $r \ge \mathrm{LIM}(u)$. By our choice of $r$, we have $r > u(\delta)$ for all $\delta$ sufficiently large, say for all $\delta \succcurlyeq \delta_0$. The set $F = \{\delta \in \Delta : \delta \succcurlyeq \delta_0\}$ belongs to $\mathcal{F}$, and $r1_F \ge u1_F$; hence $\mathrm{LIM}(u) = \mathrm{LIM}(u1_F) \le \mathrm{LIM}(r1_F) = \mathrm{LIM}(r) = r$.

23.19. Luxemburg's Boolean equivalents of HB (optional). The principle (HB12) involved algebras of sets. We shall generalize that principle to Boolean algebras. Admittedly, Boolean algebras don't seem to be much more general; indeed, (UF7) in 13.22 tells us that every Boolean algebra is isomorphic to an algebra of sets. However, UF is stronger than HB, so we are not permitted to use UF in the next few paragraphs when we prove that certain principles are equivalent to HB. First, we must generalize some notions of charges and measures to Boolean algebras.
a. In a Boolean lattice $X$, we say that two elements $x, y$ are disjoint if $x \wedge y = 0$. Note that 0 is disjoint from any element; 0 is even disjoint from itself. A subset of $X$ is disjoint (or, for emphasis, pairwise disjoint) if each pair of elements of that set is disjoint.
b. A probability (or probability charge) on $X$ is a function $\mu : X \to [0, 1]$ such that $\mu(1) = 1$ and $\mu(x \vee y) = \mu(x) + \mu(y)$ for disjoint $x, y \in X$. Of course, if $\mu$ is any probability on a Boolean lattice $X$, then $\mu(0)$ is equal to 0, since 0 is disjoint from itself. Thus, we have $\{0, 1\}$ ...

23.20. The dual functor in normed spaces. ... $Y$, the dual map $f^* : Y^* \to X^*$ is defined by $f^*(\lambda) = \lambda \circ f$.

By the Hahn-Banach Theorem (HB9), $X^*$ separates points of $X$. Therefore, points in $X$ may be viewed as distinct functions acting on $X^*$. Moreover, the embedding of $X$ in $X^{**}$ is norm-preserving, as we noted in (HB8) in 23.18. For any morphism $f : X \to Y$, the bidual function $f^{**} : X^{**} \to Y^{**}$ is an extension of the function $f : X \to Y$. All of these statements and all of hypotheses (H1) through (H5) in 9.55 through 9.57 are now easy to verify.

A Banach space $X$ is reflexive if $X^{**} = X$. Some Banach spaces are reflexive, but others are not. For instance, $\ell_p$ is reflexive for $1 < p < \infty$, but $\ell_1$, $\ell_\infty$, and $c_0$ are not. Reflexivity of Banach spaces will be investigated further in 28.41(A).

A slightly more subtle result: $\|f\| = \|f^*\| = \|f^{**}\|$ for any continuous linear map $f : X \to Y$ between normed spaces. Hints: From the definition $f^*(\lambda) = \lambda \circ f$, prove that $\|f^*\| \le \|f\|$. Similarly, $\|f^{**}\| \le \|f^*\|$. On the other hand, use the fact that $f^{**} : X^{**} \to Y^{**}$ is an extension of $f : X \to Y$ to show that $\|f\| \le \|f^{**}\|$. Finally, combine these results: $\|f^*\| \le \|f\| \le \|f^{**}\| \le \|f^*\|$.

23.21. Taylor-Foguel Theorem (optional). Let $(X, \|\ \|)$ be a Banach space, and let $(X^*, \|\ \|)$ be its dual. Then $X^*$ is strictly convex if and only if every bounded linear functional on a subspace of $X$ has a unique norm-preserving linear extension.

Proof. First, suppose there exists some linear subspace $X_0 \subseteq X$ and some $f_0 \in (X_0)^*$ that has distinct extensions $f_1, f_2 \in X^*$ with $\|f_0\| = \|f_1\| = \|f_2\|$. Let $f = (f_1 + f_2)/2$. Then $f$ is also an extension of $f_0$, so $\|f\| \ge \|f_0\|$. Now $f_1, f_2, f$ are collinear, so $X^*$ is not strictly convex.

Conversely, suppose $X^*$ is not strictly convex; we shall show that $X$ does not have unique norm-preserving extensions. By assumption, there exist distinct $f, g \in X^*$ with $\|f\|_{X^*} = \|g\|_{X^*} = \|\frac12(f + g)\|_{X^*} = 1$. Let $M = \{x \in X : f(x) = g(x)\}$, and let $\varphi$ be the restriction of $f$ or $g$ to the linear subspace $M$. It suffices to show that $\|\varphi\|_{M^*} = 1$, for then $f$ and $g$ are distinct norm-preserving extensions. We know that $\|\varphi\|_{M^*} \le 1$ by the definition of the operator norm; thus it suffices to show that $\|\varphi\|_{M^*} \ge 1$.

Since $f \ne g$, we may choose some $\xi \in X$ with $f(\xi) - g(\xi) = 1$. Then each $x \in X$ may be expressed in one and only one way in the form $x = y + a\xi$, where $y \in M$ and $a$ is a scalar. Since $\|\frac12(f + g)\|_{X^*} = 1$, we may choose a sequence $(x_n)$ in $X$ with $\|x_n\|_X = 1$ and $\frac12(f + g)(x_n) \to 1$. Since $\|f\|_{X^*} = \|g\|_{X^*} = 1$, it follows that $f(x_n) \to 1$ and $g(x_n) \to 1$. Write $x_n = y_n + a_n\xi$ with $y_n \in M$ and scalar $a_n$. Then
$$a_n = (f - g)(a_n\xi) = (f - g)(x_n - y_n) = (f - g)(x_n) \to 1 - 1 = 0,$$
hence $\|y_n\|_M = \|y_n\|_X \to 1$. At the same time, $\varphi(y_n) = f(y_n) = f(x_n) - a_nf(\xi) \to 1$. Thus $\|\varphi\|_{M^*} \ge 1$, completing the proof.

23.22. Kottman's Theorem. Let $X$ be an infinite-dimensional normed space. Then $X^*$ is infinite-dimensional. Furthermore, there exists a sequence $(x_n)$ in $X$ such that $\|x_n\| = 1$ for each $n$ and $\|x_m - x_n\| > 1$ whenever $m \ne n$.

Remark. It follows easily from compactness considerations (see 27.17) that such a sequence cannot exist in a finite-dimensional normed space.

Outline of proof. This theorem was first proved by Kottman, but the proof given here is due to T. Starbird and was published by Diestel [1984].
a. Show there exist $x_1 \in X$ and $\lambda_1 \in X^*$ with $\|x_1\| = \|\lambda_1\| = \lambda_1(x_1) = 1$.
b. We now proceed by induction. Assume $x_1, x_2, \ldots, x_k \in X$ and $\lambda_1, \lambda_2, \ldots, \lambda_k \in X^*$ have been chosen, all with norm 1, and with $\lambda_1, \lambda_2, \ldots, \lambda_k$ linearly independent. Show there exists $y \in X$ such that $\lambda_1(y), \lambda_2(y), \ldots, \lambda_k(y) < 0$.
c. Show there exists a nonzero $x$ in $\bigcap_{i=1}^{k} \mathrm{Ker}(\lambda_i)$; here "Ker" denotes kernel.
d. Show that for any sufficiently large positive number $K$, we have $\|y\| < \|y + Kx\|$. Fix some such $K$.
e. Using the linear independence of the $\lambda_i$'s, show that if $a_1, a_2, \ldots, a_k$ are scalars, not all 0, then $\left|\sum_{i=1}^{k} a_i\lambda_i(y + Kx)\right| < \left\|\sum_{i=1}^{k} a_i\lambda_i\right\|\,\|y + Kx\|$.
f. Let $x_{k+1} = (y + Kx)/\|y + Kx\|$, and then by (HB8) choose some $\lambda_{k+1} \in X^*$ with $\|\lambda_{k+1}\| = \lambda_{k+1}(x_{k+1}) = 1$. Using 23.22.e, show that $\lambda_{k+1}$ is not a linear combination of $\lambda_1, \lambda_2, \ldots, \lambda_k$, completing the induction.
g. Show that if $1 \le i \le k$, then $\lambda_i(x_{k+1}) < 0$, and hence $\|x_{k+1} - x_i\| \ge |\lambda_i(x_{k+1} - x_i)| > 1$.
DUALITY AND SEPARABILITY
23.23. If $(X, \|\ \|)$ is a Banach space and $X^*$ is separable, then $X$ is separable.

Proof (following the exposition of M. Schechter [1971]). Let $(\varphi_n)$ be a dense sequence in $X^*$. For each $n$, choose some $v_n \in X$ that satisfies $\|v_n\| = 1$ and $\langle \varphi_n, v_n\rangle \ge \frac12\|\varphi_n\|$. Let $V$ be the closed linear span of the $v_n$'s. Then $V$ is a separable, closed linear subspace of $X$; it suffices to show $V = X$. Suppose, on the contrary, that $w \in X \setminus V$. By (HB11) in 23.18, there is some $\psi \in X^*$ that vanishes on $V$ but not on $w$. By rescaling we may assume $\|\psi\| = 1$. For each $n$, since $\psi(v_n) = 0$, we have
$$\tfrac12\|\varphi_n\| \le \langle \varphi_n, v_n\rangle = \langle \varphi_n - \psi, v_n\rangle \le \|\varphi_n - \psi\|.$$
Hence $1 = \|\psi\| \le \|\varphi_n - \psi\| + \|\varphi_n\| \le 3\|\varphi_n - \psi\|$. But the $\varphi_n$'s are dense in $X^*$, so they should come arbitrarily close to $\psi$, a contradiction.

Remark. It is possible to have $X$ separable and $X^*$ not separable. For instance, $\ell_1$ is separable, but (see 23.10) its dual is $\ell_\infty$, which is not separable.

23.24. Proposition. Let $(X, \|\ \|)$ and $(Y, \|\ \|)$ be real Banach spaces. Assume one is the dual of the other (i.e., assume either $X = Y^*$ or $Y = X^*$). Let $S$ be a separable subset of $X$. Then there exists a sequence $(y_n)$ in $Y$ satisfying $\|y_n\| = 1$ for all $n$, and such that $\|s\| = \sup_n \langle y_n, s\rangle$ for each $s \in S$.
Proof. Note that if $\|y\| \le 1$ then $\langle y, s\rangle \le \|s\|$. Let $(s_k)$ be a dense sequence in $S$. We proceed by two different arguments:
(i) If $Y = X^*$, we may apply (HB8). For each $k$, there exists some $y_k \in Y$ satisfying $\|y_k\| = 1$ and $\langle y_k, s_k\rangle = \|s_k\|$.
(ii) If $X = Y^*$, we may apply the definition of the operator norm, i.e., the norm of $X$. For each $k$, we have $\|s_k\| = \sup\{\langle y, s_k\rangle : y \in Y, \|y\| = 1\}$. Hence we may choose a sequence $(y_{k,j} : j = 1, 2, 3, \ldots)$ in $Y$ satisfying $\|y_{k,j}\| = 1$ and $\lim_{j\to\infty} \langle y_{k,j}, s_k\rangle = \|s_k\|$. Now arrange the doubly indexed set $\{y_{k,j} : j, k \in \mathbb{N}\}$ into a sequence $(y_n)$ (see 2.20.e).
In either case we obtain a sequence $(y_n)$ in $Y$, satisfying $\|y_n\| = 1$ for each $n$, and satisfying $\sup_n \langle y_n, s_k\rangle = \|s_k\|$ for each $k$. Now let any $s \in S$ be given and any number $\varepsilon > 0$. Since $(s_k)$ is dense in $S$, we have $\|s - s_k\| < \varepsilon$ for some $k$. For each $n$ we have $\langle y_n, s\rangle > \langle y_n, s_k\rangle - \varepsilon$, and therefore $\sup_n \langle y_n, s\rangle \ge \|s_k\| - \varepsilon > \|s\| - 2\varepsilon$. Now let $\varepsilon \downarrow 0$.

23.25. Definitions. Let $X$ be a Banach space, and let $(\Omega, \mathcal{S})$ be a measurable space. A function $f : \Omega \to X$ is weakly measurable if the scalar-valued function $\langle \varphi, f(\cdot)\rangle$ is measurable for each fixed $\varphi \in X^*$. A function $\psi : \Omega \to X^*$ is weak-star measurable if the scalar-valued function $\langle \psi(\cdot), x\rangle$ is measurable for each fixed $x \in X$. A function satisfying either of these conditions will be called scalarly measurable.

Proposition. Any scalarly measurable, separably valued function is strongly measurable (defined as in 21.4).

Proof (modified slightly from Hille and Phillips [1957]). Here we assume $X$ and $Y$ are Banach spaces, one is the dual of the other, $f : \Omega \to X$ is separably valued, and $\langle y, f(\cdot)\rangle$ is measurable for each fixed $y \in Y$. Replacing each $y$ with $\mathrm{Re}\,y$, we may assume the scalar field is $\mathbb{R}$ (see 11.12); this simplifies our notation slightly. Let $S$ be the closed span of the range of $f$; then $S$ is separable. As in 23.24, choose a sequence $(y_n)$ in $Y$ with $\|y_n\| = 1$ for all $n$ and $\|s\| = \sup_n \langle y_n, s\rangle$ for each $s \in S$.

Temporarily fix any $v \in S$. The function $f(\omega) - v$ takes its values in $S$. Moreover, for each $n \in \mathbb{N}$, the real-valued function $\omega \mapsto \langle y_n, f(\omega) - v\rangle = \langle y_n, f(\omega)\rangle - \langle y_n, v\rangle$ is measurable. Hence, for each $v \in S$ the real-valued function $\omega \mapsto \|f(\omega) - v\| = \sup_n \langle y_n, f(\omega) - v\rangle$ is measurable. In particular, if $(x_k)$ is a dense sequence in $S$, then each of the functions $\omega \mapsto \|f(\omega) - x_k\|$ (for $k = 1, 2, 3, \ldots$) is measurable.

Now, for each $\omega \in \Omega$ and $j \in \mathbb{N}$, let $f_j(\omega)$ be the first term in the sequence $x_1, x_2, x_3, \ldots$ whose distance from $f(\omega)$ is less than $1/j$. Show that $f_j : \Omega \to X$ is a countably valued, measurable function. Since $(f_j)$ converges uniformly to $f$ as $j \to \infty$, it follows that $f$ is strongly measurable.
UNCONDITIONALLY CONVERGENT SERIES

23.26. In 10.42 we gave an elementary example of a series whose sum is affected by a reordering of its terms. We now investigate that phenomenon further.

Definition and proposition. Let $\sum_{j=1}^{\infty} x_j$ be a series in a Banach space $(X, \|\ \|)$. Then the following conditions are equivalent; if any (hence all) are satisfied we say the series is unconditionally convergent. Furthermore, when those conditions are satisfied, then all the series in (A) have the same sum, and that sum is equal to the limit in (B).

(A) $\sum_{k=1}^{\infty} x_{\pi(k)}$ is convergent for every permutation $\pi$ of the positive integers. (This is the most commonly used definition of unconditionally convergent.)

(B) The net $\left(\sum_{j \in F} x_j : F \in \mathcal{F}\right)$ is convergent, where $\mathcal{F}$ = {finite subsets of $\mathbb{N}$} is directed by inclusion.

(C) The series $\sum_{j=1}^{\infty} |u(x_j)|$ converges uniformly for all $u$ in the closed unit ball of $X^*$. That is, let $U = \{u \in X^* : \|u\| \le 1\}$; then

$$\lim_{N \to \infty} \sup_{u \in U} \sum_{j=N}^{\infty} |u(x_j)| = 0.$$

In other words, the set of sequences of scalars $\{(ux_1, ux_2, ux_3, \ldots) : u \in U\}$ is a relatively compact subset of $\ell_1$ (see the characterization of compactness in 22.25).

(D) For each sequence $(\beta_j)$ of scalars with $|\beta_j| \le 1$, the series $\sum_{j=1}^{\infty} \beta_j x_j$ converges.

(E) For each sequence $(\epsilon_j)$ with $\epsilon_j = \pm 1$, the series $\sum_{j=1}^{\infty} \epsilon_j x_j$ converges.

(F) Each subseries $\sum_{k=1}^{\infty} x_{j_k}$ is convergent - i.e., the series $\sum_{k=1}^{\infty} x_{j_k}$ is convergent for each choice of positive integers $j_1 < j_2 < j_3 < \cdots$.
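As a numerical illustration of why condition (A) is restrictive, the following sketch looks at the alternating harmonic series in the Banach space $\mathbb{R}$, a standard example of a convergent series that is not unconditionally convergent. The function names and the particular rearrangement (two positive terms, then one negative term) are choices made here for illustration and are not taken from the text.

```python
import math

def natural_order_sum(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in its usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged_sum(n_blocks):
    """Partial sum of a rearrangement of the same terms:
    1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ...
    (two positive terms followed by one negative term)."""
    total = 0.0
    for m in range(1, n_blocks + 1):
        total += 1.0 / (4 * m - 3) + 1.0 / (4 * m - 1) - 1.0 / (2 * m)
    return total

if __name__ == "__main__":
    # The usual order approaches ln 2; this rearrangement approaches (3/2) ln 2.
    print(natural_order_sum(200_000), math.log(2))
    print(rearranged_sum(100_000), 1.5 * math.log(2))
```

Both orderings use exactly the same terms, yet the partial sums settle near different values, so the net in (B) cannot converge for this series.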
Remarks. A seventh characterization of unconditional convergence will be given in 28.31. This result and proof are taken from Singer [1970]. Before plowing through the proof of equivalence, some readers may find it helpful to glance ahead to the examples in 23.27. Also, this concept should be compared with the one in 10.40.

Proof that (A) implies (B), and that the sums in (A) all equal the limit in (B). Fix some particular permutation $\gamma$ of $\mathbb{N}$, and let $x = \sum_{k=1}^{\infty} x_{\gamma(k)}$; suppose that $\sum_{j \in F} x_j$ does not converge to $x$; we shall obtain a contradiction. By our assumption, there exists some $\varepsilon > 0$ such that every finite set $F \subseteq \mathbb{N}$ is contained in some finite set $G$ such that $\|x - \sum_{j \in G} x_j\| > \varepsilon$. Since $x = \sum_{k=1}^{\infty} x_{\gamma(k)}$, there is some positive integer $N_1$ such that

$$\Big\| x - \sum_{k=1}^{n} x_{\gamma(k)} \Big\| < \frac{\varepsilon}{2} \qquad \text{for all } n \ge N_1.$$

Recursively choose finite sets $F_1 \subseteq G_1 \subseteq F_2 \subseteq G_2 \subseteq F_3 \subseteq G_3 \subseteq \cdots \subseteq \mathbb{N}$ as follows: Let $F_1 = \{\gamma(1), \gamma(2), \ldots, \gamma(N_1)\}$. Given $F_m$, choose $G_m \supseteq F_m$ such that $\|x - \sum_{j \in G_m} x_j\| > \varepsilon$. Given $G_m$, choose $F_{m+1} = \{\gamma(1), \gamma(2), \ldots, \gamma(N_{m+1})\}$ with $N_{m+1}$ large enough so that $N_{m+1} \ge m + 1$ and $F_{m+1} \supseteq G_m$. This completes the recursion. Since $N_m \ge m$, the union of the $F_m$'s is equal to $\mathbb{N}$. Now define a sequence $\pi(1), \pi(2), \pi(3), \ldots$ by listing first the elements of $F_1$ in any order, then the elements of $G_1 \setminus F_1$ in any order, then the elements of $F_2 \setminus G_1$, then the elements of $G_2 \setminus F_2$, etc. The resulting series $\sum_{k=1}^{\infty} x_{\pi(k)}$ is not convergent, since

$$\Big\| \sum_{j \in G_m \setminus F_m} x_j \Big\| \;\ge\; \Big\| x - \sum_{j \in G_m} x_j \Big\| - \Big\| x - \sum_{j \in F_m} x_j \Big\| \;>\; \varepsilon - \frac{\varepsilon}{2} \;=\; \frac{\varepsilon}{2}.$$
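Proof of (B) $\Rightarrow$ (C). Let $x = \lim_{F \in \mathcal{F}} \sum_{j \in F} x_j$. Let any $\varepsilon > 0$ be given. By (B), there is some positive integer $N$ such that if $G$ is any finite subset of $\mathbb{N}$ with $G \supseteq \{1, 2, \ldots, N\}$, then $\|x - \sum_{j \in G} x_j\| < \frac{\varepsilon}{8}$. Fix any $u \in U$; it suffices to show that $\sum_{j=N+1}^{\infty} |u(x_j)| \le \varepsilon$. Temporarily fix any integer $p \ge 1$. Define the sets

$$A_1 = \{ j \in \{N+1, N+2, \ldots, N+p\} : \operatorname{Re} u(x_j) \ge 0 \}, \qquad A_2 = \{ j \in \{N+1, N+2, \ldots, N+p\} : \operatorname{Re} u(x_j) < 0 \}.$$

Also let $B = \{1, 2, \ldots, N\}$. Then

$$\sum_{j=N+1}^{N+p} |\operatorname{Re} u(x_j)| \;=\; \sum_{k=1}^{2} \Big| \operatorname{Re} u\Big( \sum_{j \in A_k} x_j \Big) \Big| \;\le\; \sum_{k=1}^{2} \Big\| \sum_{j \in A_k} x_j \Big\| \;\le\; \sum_{k=1}^{2} \Big\{ \Big\| x - \sum_{j \in A_k \cup B} x_j \Big\| + \Big\| x - \sum_{j \in B} x_j \Big\| \Big\} \;<\; \frac{\varepsilon}{2}.$$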
Chapter 23: Normed Operators
Similarly, $\sum_{j=N+1}^{N+p} |\operatorname{Im} u(x_j)| < \frac{\varepsilon}{2}$. Hence $\sum_{j=N+1}^{N+p} |u(x_j)| < \varepsilon$. Now let $p \to \infty$.

Proof of (C) $\Rightarrow$ (D). We must show that the partial sums of $\sum_{j=1}^{\infty} \beta_j x_j$ form a Cauchy sequence. Given any $\varepsilon > 0$, choose $N$ by (C), so that $\sup_{u \in U} \sum_{j=N}^{\infty} |u(x_j)| < \varepsilon$. Now for any integers $n, p$ with $p \ge n \ge N$, use the Hahn-Banach Theorem (HB8) in 23.18 to choose some $u \in U$ (which may depend on $n, p$) to satisfy the first equation below:
0 there exists some number $\delta > 0$ such that if $n \in \mathbb{N}$ and $a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b$ and $\tau_j \in [t_{j-1}, t_j]$ with $t_j - t_{j-1} < \delta$ for all $j$, then $\left\| v - \sum_{j=1}^{n} (t_j - t_{j-1}) f(\tau_j) \right\| < \varepsilon$.
Chapter 24: Generalized Riemann Integrals
When such an integral exists, we say $f$ is Riemann integrable. The Riemann integrability of certain kinds of functions will be established later in this chapter; a characterization of Riemann integrability among real-valued functions will be given in 24.46.

Any function $f$ has at most one Riemann integral $v$; this can be proved directly by ad hoc methods now (easy exercise) or proved via a broader insight given in 24.7.a. Hence we are justified in calling this vector the Riemann integral of $f$; we shall write it as $v = \int_a^b f(t)\,dt$. For emphasis it may sometimes be called the proper Riemann integral, to distinguish it from "improper Riemann integrals" such as $\int_0^1 t^{-1/2}\,dt = \lim_{c \downarrow 0} \int_c^1 t^{-1/2}\,dt$, which will not be considered here.

Remarks. The definition of "Riemann integral" given above is essentially the same as the definition published by Riemann in 1868 - at least, for real-valued functions $f$. Some calculus books use a different definition, which is equivalent for real-valued functions $f$ but does not generalize readily to Banach-space-valued $f$: Any bounded function $f : [a, b] \to \mathbb{R}$ can be approximated both above and below by step functions (defined in 24.22), and those step functions can be integrated in an obvious fashion. When the infimum of the upper integrals equals the supremum of the lower integrals, the common value is called the Darboux integral or the Riemann-Darboux integral. It was used by Darboux in 1875.

24.4. We now generalize slightly. Let $f : [a, b] \to X$ be some function, and let $v \in X$. We say $v$ is a Henstock integral of $f$ over $[a, b]$ if for each number $\varepsilon > 0$ there exists some function $\delta : [a, b] \to (0, +\infty)$ such that if $n \in \mathbb{N}$ and $a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b$ and $\tau_j \in [t_{j-1}, t_j]$ with $t_j - t_{j-1} < \delta(\tau_j)$ for all $j$, then $\left\| v - \sum_{j=1}^{n} (t_j - t_{j-1}) f(\tau_j) \right\| < \varepsilon$. When such an integral exists, we say $f$ is Henstock integrable. The Henstock integrability of certain kinds of functions will be established later in this chapter.

Any function $f$ has at most one Henstock integral $v$; this can be proved directly by ad hoc methods now (easy exercise) or proved via a broader insight given in 24.7.a. Hence we are justified in calling this vector the Henstock integral of $f$; we shall write it as $v = \int_a^b f(t)\,dt$. Clearly, any Riemann integral of $f$ is also a Henstock integral of $f$. The Henstock integral is more general. For instance, $\int_0^1 t^{-1/2}\,dt$ is a Henstock integral with value 2 (if the integrand is defined arbitrarily at $t = 0$), but it is not a proper Riemann integral.

The Henstock integral is sometimes known as the generalized Riemann integral. It is also known as the Kurzweil integral or the Henstock-Kurzweil integral, although that last term also has another meaning - see 24.9. It was introduced independently at about the same time by Kurzweil and Henstock. Kurzweil used it briefly as a tool in the study of certain kinds of differential equations; see particularly Kurzweil [1957]. Henstock developed it in greater detail as part of a wider study of integration theory. The Henstock integral is sometimes known as the gauge integral, but that term has also been applied to some other integrals.

The integral studied in this chapter is also known by other names - e.g., the special Denjoy integral or the Denjoy-Perron integral, since it is equivalent to a more complicated integral worked out earlier by Denjoy and Perron. Research continues on related integrals; some recent references are Bullen et al. [1990], Henstock [1991], and Gordon [1994].
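The role of a non-constant gauge can be seen numerically for the integrand $t^{-1/2}$ on $[0, 1]$. The sketch below is only an illustration under assumptions made here: the gauge $\delta(0) = 10^{-6}$, $\delta(t) = 0.01\,t$ for $t > 0$, the value $f(0) = 0$, and the bisection routine with left-endpoint tags are all hypothetical choices, not constructions from the text. Bisection with left-endpoint tags happens to terminate for this gauge; it is not claimed to work for every gauge.

```python
def fine_tagged_division(a, b, gauge):
    """Build a gauge-fine tagged division of [a, b] by repeated bisection.

    Each piece is (t_prev, t_next, tag) with the left endpoint as tag,
    accepted as soon as t_next - t_prev < gauge(tag).
    """
    pieces = []
    stack = [(a, b)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < gauge(lo):
            pieces.append((lo, hi, lo))
        else:
            mid = 0.5 * (lo + hi)
            stack.append((mid, hi))   # right half is processed later
            stack.append((lo, mid))   # left half is processed first
    return pieces

def riemann_sum(f, pieces):
    """Approximating Riemann sum: sum of (t_j - t_{j-1}) * f(tau_j)."""
    return sum((hi - lo) * f(tag) for lo, hi, tag in pieces)

def f(t):
    # The integrand t^(-1/2), defined arbitrarily (as 0) at t = 0.
    return 0.0 if t == 0 else t ** -0.5

def gauge(t):
    # Hypothetical gauge: very small at 0, proportional to t elsewhere.
    return 1e-6 if t == 0 else 0.01 * t

if __name__ == "__main__":
    pieces = fine_tagged_division(0.0, 1.0, gauge)
    print(len(pieces), riemann_sum(f, pieces))  # the sum is close to 2
```

The gauge forces short subintervals near $t = 0$, where the integrand is large, and the only subinterval tagged at $0$ contributes nothing because $f(0) = 0$. A constant $\delta$ cannot achieve this, since a tag may then sit arbitrarily close to $0$ inside a subinterval of fixed length.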
24.5. An equivalent definition (optional). One of the chief advantages of the Henstock integral is that it so greatly resembles the Riemann integral with which we are already somewhat familiar. Thus our intuition about the Riemann integral can be carried over to this new, more general integral. Our definition in 24.4, which follows Henstock [1988], emphasizes this resemblance. However, we note that certain other books (such as McLeod [1980] and DePree and Swartz [1988]) use a slightly different definition for the Henstock integral. In those books, $v = \int_a^b f(t)\,dt$ means that

(*) for each number $\varepsilon > 0$, there exists some function $U : [a, b] \to$ {open subintervals of $\mathbb{R}$} satisfying $t \in U(t)$ for each $t$ and such that if $n \in \mathbb{N}$ and $a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b$ and $\tau_j \in [t_{j-1}, t_j] \subseteq U(\tau_j)$ for all $j$, then $\left\| v - \sum_{j=1}^{n} (t_j - t_{j-1}) f(\tau_j) \right\| < \varepsilon$.

It is easy to show that this definition (*) is equivalent to our definition of the Henstock integral in 24.4. Indeed, let any $\varepsilon > 0$ be given. If $v$ and $f$ satisfy (*) with some $U$, then we can satisfy the definition in 24.4 by taking $\delta(\tau) > 0$ small enough so that $(\tau - \delta(\tau), \tau + \delta(\tau)) \subseteq U(\tau)$. Conversely, if $v$ and $f$ satisfy the definition in 24.4 with some $\delta$, then we can satisfy (*) by taking $U(\tau) = (\tau - \frac{1}{2}\delta(\tau), \tau + \frac{1}{2}\delta(\tau))$. (Exercise. Fill in the details of this argument.) Hereafter we shall only use the definition in 24.4.

24.6. Definitions. We now introduce several auxiliary notations that will be helpful in our study of the Riemann and Henstock integrals.

By a gauge we shall mean any function $\delta : [a, b] \to (0, +\infty)$; a positive constant may be viewed as a constant function and thus as a particularly simple gauge. (Caution: This kind of "gauge" is unrelated to the other "gauge," a collection of pseudometrics, defined in 2.11.)

By a tagged division of the interval $[a, b]$ we shall mean a system of numbers $n$, $t_j$, $\tau_j$ satisfying

$$a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b, \qquad \tau_j \in [t_{j-1}, t_j] \quad (j = 1, 2, \ldots, n),$$
where $n$ is some positive integer; we may sometimes abbreviate this as $T = (n, t_j, \tau_j)$. Some mathematicians impose the further restriction that $t_{j-1} < t_j$ for each $j$, to exclude degenerate intervals of length 0. Although that restriction is satisfied in most interesting cases, it has no real effect on the development of the theory, and omitting that restriction simplifies the notation in some of our proofs - for instance, see 24.12.

A tagged division $T = (n, t_j, \tau_j)$ is called $\delta$-fine for some positive constant $\delta$ if $t_j - t_{j-1} < \delta$ for all $j$. More generally, a tagged division $T = (n, t_j, \tau_j)$ is $\delta$-fine for a gauge $\delta$ if $t_j - t_{j-1} < \delta(\tau_j)$ for all $j$.

For any function $f : [a, b] \to X$, the approximating Riemann sum corresponding to a tagged division $T = (n, t_j, \tau_j)$ is defined to be the sum

$$\Sigma[f, T] = \sum_{j=1}^{n} (t_j - t_{j-1}) f(\tau_j).$$
It is an element of the normed space $X$.

We can now restate our definitions of the integrals. A vector $v \in X$ is a Riemann integral (respectively, a Henstock integral) of a function $f : [a, b] \to X$ if for each number $\varepsilon > 0$ there exists a number $\delta > 0$ (respectively, a gauge $\delta > 0$) such that whenever $T$ is a $\delta$-fine tagged division of $[a, b]$, then $\|v - \Sigma[f, T]\| < \varepsilon$.

24.7. The definitions given above are admittedly complicated: "For each $\varepsilon$ there exists a $\delta$ such that for each $T$ we have . . . ." That grammatical construction contains more quantifiers than are commonly used in a nonmathematical sentence. It takes some getting used to.

The Riemann or Henstock integral may be viewed very naturally as the limit of a certain net. Let us define the sets

$$\mathcal{D} = \{ (T, \delta) : \delta \in (0, +\infty) \text{ and } T \text{ is a } \delta\text{-fine tagged division of } [a, b] \},$$
$$\mathcal{E} = \{ (T, \delta) : \delta \text{ is a gauge on } [a, b] \text{ and } T \text{ is a } \delta\text{-fine tagged division of } [a, b] \}.$$

Then $\mathcal{D} \subseteq \mathcal{E}$, since every positive constant is a gauge. Both $\mathcal{D}$ and $\mathcal{E}$ will be viewed as directed sets, with this ordering: $(T_1, \delta_1) \preccurlyeq (T_2, \delta_2)$ if $\delta_1 \ge \delta_2$. Unwinding the notation, verify that

"$v$ is a Riemann integral of $f$" means that the net $(\Sigma[f, T] : (T, \delta) \in \mathcal{D})$ converges in $X$ to $v$, and

"$v$ is a Henstock integral of $f$" means that the net $(\Sigma[f, T] : (T, \delta) \in \mathcal{E})$ converges in $X$ to $v$.

Here are two immediate applications of this viewpoint:

a. Since the normed space $(X, \|\ \|)$ is a Hausdorff topological space, each net in $X$ has at most one limit. Thus we have an immediate proof that each $X$-valued function has at most one Riemann integral or Henstock integral.

b. Assume the normed space $X$ is complete. Then a function $f : [a, b] \to X$ is Riemann or Henstock-integrable, respectively, if and only if the net $(\Sigma[f, T] : (T, \delta) \in \mathcal{D})$ or the net $(\Sigma[f, T] : (T, \delta) \in \mathcal{E})$ is Cauchy in $X$, where Cauchy nets are defined as in 19.2. In other words, $f$ is Riemann integrable (respectively, Henstock integrable) if and only if for each $\varepsilon > 0$ there exists some number $\delta > 0$ (respectively, some gauge $\delta$ on $[a, b]$) such that whenever $T, T'$ are $\delta$-fine tagged divisions of $[a, b]$, then $\|\Sigma[f, T] - \Sigma[f, T']\| < \varepsilon$.

24.8. Definitions. We generalize still further. Let $X$ be a normed space over the scalar field $\mathbb{F}$. (In the simplest case we may take $X = \mathbb{F}$, but greater generality is sometimes useful.) Let $f$ and $\varphi$ be two functions defined on $[a, b]$ - one of them $X$-valued, the other scalar-valued.
(Throughout most of this chapter, whenever possible, we shall be intentionally ambiguous about which of $f$, $\varphi$ is scalar-valued and which is vector-valued, in order to cover both cases at once.) Define the approximating Riemann-Stieltjes sum

$$\Sigma[f, T, \varphi] = \sum_{j=1}^{n} f(\tau_j) \left[ \varphi(t_j) - \varphi(t_{j-1}) \right].$$

The Riemann-Stieltjes integral and the Henstock-Stieltjes integral are, respectively, the limits of the nets

$$(\Sigma[f, T, \varphi] : (T, \delta) \in \mathcal{D}) \qquad \text{and} \qquad (\Sigma[f, T, \varphi] : (T, \delta) \in \mathcal{E}),$$

where $\mathcal{D}$, $\mathcal{E}$ are defined as in 24.7. The resulting integrals are denoted $\int_a^b f\,d\varphi$ or $\int_a^b f(t)\,d\varphi(t)$. In other words, the integral is a vector $v$ with the property that for each number $\varepsilon > 0$ there exists a number $\delta > 0$, respectively a gauge $\delta > 0$, such that whenever $T$ is a $\delta$-fine tagged division, then $|v - \Sigma[f, T, \varphi]| < \varepsilon$.

Since most of this chapter concerns itself with Henstock-Stieltjes integrals, when the Henstock-Stieltjes integral $\int_a^b f\,d\varphi$ exists we shall simply say that $f$ is $\varphi$-integrable.

The theory of Stieltjes integrals generalizes that of Riemann and Henstock integrals, since we can take $\varphi(t) = t$. Readers who are entirely unfamiliar with Stieltjes integrals may wish to glance ahead to 24.18, 24.35, 25.17, and 25.26 for motivation.

24.9. Remarks on generalizations and variants (optional). We mention some other integrals that will not be studied in this book.

In defining $\int f\,d\varphi$, we could let $f$ and $\varphi$ both be vector-valued. Say they take values in vector spaces $X$ and $Y$, respectively; then form a product using some bilinear mapping $\langle\ ,\ \rangle : X \times Y \to Z$. The resulting integral would take values in $Z$.

For any mapping $U : [a, b] \times [a, b] \to X$, we may define the generalized Perron integral of $U$ as the limit (if it exists) of sums of the form

$$\Sigma[U, T] = \sum_{j=1}^{n} \left[ U(\tau_j, t_j) - U(\tau_j, t_{j-1}) \right]$$

for tagged divisions $T = (n, t_j, \tau_j)$. This generalizes the Henstock-Stieltjes integral $\int_a^b f\,d\varphi$ since we can take $U(\tau, t) = f(\tau)\varphi(t)$. For an introduction to this generalized integral and its applications to generalized differential equations, see Schwabik [1992].

Still more generally, let $h = h(\tau, J)$ be a Banach-space-valued function defined for real numbers $\tau$ and compact intervals $J$. The limit of the sums

$$\Sigma[h, T] = \sum_{j=1}^{n} h(\tau_j, [t_{j-1}, t_j]),$$

when it exists, is sometimes called the Henstock-Kurzweil integral of $h$. For details the reader may refer to papers and books by Henstock and Kurzweil.
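The approximating Riemann-Stieltjes sum of 24.8 is straightforward to compute. The sketch below is a minimal illustration under assumptions made here: scalar-valued $f$ and $\varphi$, a uniform division with midpoint tags, and the test case $f = \cos$, $\varphi(t) = t^2$ on $[0, 1]$; the function names are hypothetical.

```python
import math

def stieltjes_sum(f, phi, points, tags):
    """Approximating Riemann-Stieltjes sum:
    sum over j of f(tau_j) * (phi(t_j) - phi(t_{j-1}))."""
    return sum(f(tau) * (phi(t1) - phi(t0))
               for t0, t1, tau in zip(points, points[1:], tags))

def uniform_division(a, b, n):
    """Division points t_0..t_n of [a, b] with midpoint tags tau_1..tau_n."""
    points = [a + (b - a) * j / n for j in range(n + 1)]
    tags = [0.5 * (t0 + t1) for t0, t1 in zip(points, points[1:])]
    return points, tags

if __name__ == "__main__":
    points, tags = uniform_division(0.0, 1.0, 2000)
    # With phi(t) = t the sum reduces to an ordinary Riemann sum for cos.
    print(stieltjes_sum(math.cos, lambda t: t, points, tags), math.sin(1.0))
    # With phi(t) = t^2 the limit is the integral of 2 t cos(t) over [0, 1].
    print(stieltjes_sum(math.cos, lambda t: t * t, points, tags),
          2.0 * (math.sin(1.0) + math.cos(1.0) - 1.0))
```

Taking $\varphi(t) = t$ recovers the sum $\Sigma[f, T]$ of 24.6, which is the sense in which the Stieltjes theory generalizes the Riemann and Henstock integrals.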
The Lebesgue integral is an absolute integral - i.e., if $f$ is Lebesgue integrable, then so is $|f(\cdot)|$; this fact is built into our definition of the Lebesgue integral. The Henstock and Henstock-Stieltjes integrals are not absolute integrals; the Henstock integral is slightly more general than the Lebesgue integral. McShane [1983] studies a gauge integral which is defined slightly differently from 24.4; McShane's integral turns out to be exactly equivalent to the Lebesgue integral. Further information on McShane's integral can be found in R. Vyborny [1994/95] and in the appendices of McLeod [1980].

Another integral, due to Frechet, is particularly simple and noteworthy: Forget about gauges. Let $\mathcal{T}$ be the set of all tagged divisions of $[a, b]$, with this ordering (which ignores the placement of the tags): $T_1 \succcurlyeq T_2$ if the partition of $T_1$ is a refinement of the partition of $T_2$ - i.e., if {division points of $T_2$} $\subseteq$ {division points of $T_1$}. The limit of the resulting net $(\Sigma[f, T])$ is sometimes called the refinement integral. It has the advantage that, although it resembles the Riemann integral, it does not depend as heavily on the specialized nature of subintervals of $\mathbb{R}$ - it generalizes very easily to integrals over any measure space. It is discussed further by Hildebrandt [1963, pages 320-325]. The refinement integral is slightly simpler than the gauge integrals studied in this chapter, and perhaps it is a better approach in some respects; that question deserves further study. We prefer the gauge integral chiefly because at present it is more compatible with the wider body of mathematical literature.

24.10. Proposition. If $\delta : [a, b] \to (0, +\infty)$ is any gauge, then there exists a $\delta$-fine tagged division of $[a, b]$.

Proof. Let $S = \{ s \in [a, b] :$ there exists a $\delta$-fine tagged division of $[a, s] \}$. We are to show that $b \in S$. Trivially, $a \in S$. Let $\sigma = \sup(S)$. There is some $s \in S$ such that $s > \sigma - \delta(\sigma)$. Any $\delta$-fine tagged division of $[a, s]$ can be extended to a $\delta$-fine tagged division of $[a, \sigma]$ by tacking on the additional interval $[s, \sigma]$ with tag $\sigma$. This proves $\sigma \in S$. If $\sigma < b$, then any $\delta$-fine tagged division of $[a, \sigma]$ can be extended to a larger interval $[a, \sigma']$, for some $\sigma' \in (\sigma, b]$ with $\sigma' - \sigma < \delta(\sigma)$, by tacking on an additional subinterval $[\sigma, \sigma']$ with tag $\sigma$ - thereby contradicting the maximality of $\sigma$. Thus $b = \sigma$, so $b \in S$.

24.11. A useful gauge. The following construction will be used in a few proofs later in this chapter. Let any finite, nonempty set $Q \subseteq [a, b]$ be given. Let $\rho = \min\{ |q - q'| : q, q' \in Q,\ q \ne q' \}$, or let $\rho = 1$ if $Q$ consists of just one point. Define a gauge $\gamma : [a, b] \to (0, +\infty)$ by:

$$\gamma(t) = \begin{cases} \min\{\rho, \operatorname{dist}(t, Q)\} & \text{when } t \notin Q \\ \rho & \text{when } t \in Q. \end{cases}$$

Then it is easy to see that any $\gamma$-fine tagged division $T = (n, t_j, \tau_j)$ will have the following properties:

(i) No subinterval $[t_{j-1}, t_j]$ contains more than one member of $Q$.

(ii) If $q \in Q \cap [t_{j-1}, t_j]$, then $q$ is equal to $\tau_j$ (i.e., the tag of the subinterval).

24.12. A useful tagged division. The following construction will be useful in a few proofs later in this chapter. Let $S = (m, s_i, \sigma_i)$ be any tagged division of an interval $[a, b]$. We can form a related, new tagged division $T = (2m, t_j, \tau_j)$ by the following rule: We have $s_{i-1} \le \sigma_i \le s_i$, so we may subdivide each interval $[s_{i-1}, s_i]$ into the two subintervals

$$[t_{2i-2}, t_{2i-1}] = [s_{i-1}, \sigma_i] \qquad \text{and} \qquad [t_{2i-1}, t_{2i}] = [\sigma_i, s_i],$$

with both new tags $\tau_{2i-1}$ and $\tau_{2i}$ equal to the old tag $\sigma_i$. (Of course, some of the new subintervals may have length 0, but that is not a difficulty - see the remarks in 24.6.) This tagged division $T$ has the following important properties.

(i) For any gauge $\delta$, if $S$ is $\delta$-fine, then $T$ is also $\delta$-fine.

(ii) For any function $g : [a, b] \to X$, we have $\Sigma[g, S] = \Sigma[g, T]$.

(iii) Each subinterval $[t_{j-1}, t_j]$ in the tagged division $T = (2m, t_j, \tau_j)$ has for its tag one of the subinterval's endpoints, $t_{j-1}$ or $t_j$.
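The construction in 24.12 can be sketched in a few lines. In this illustration a tagged division is stored as a list of division points together with a list of tags; that representation and the function names are assumptions made here, not notation from the text.

```python
def split_at_tags(points, tags):
    """Given S = (m, s_i, sigma_i) as points s_0..s_m and tags sigma_1..sigma_m,
    return T = (2m, t_j, tau_j): each [s_{i-1}, s_i] is cut at its tag sigma_i,
    and both halves keep sigma_i as their tag."""
    new_points = [points[0]]
    new_tags = []
    for s_next, sigma in zip(points[1:], tags):
        new_points.extend([sigma, s_next])   # t_{2i-1} = sigma_i, t_{2i} = s_i
        new_tags.extend([sigma, sigma])      # tau_{2i-1} = tau_{2i} = sigma_i
    return new_points, new_tags

def riemann_sum(g, points, tags):
    """Approximating Riemann sum: sum of (t_j - t_{j-1}) * g(tau_j)."""
    return sum((t1 - t0) * g(tau)
               for t0, t1, tau in zip(points, points[1:], tags))

if __name__ == "__main__":
    s_points = [0.0, 0.3, 0.7, 1.0]
    s_tags = [0.1, 0.5, 1.0]
    t_points, t_tags = split_at_tags(s_points, s_tags)
    g = lambda t: t * t + 1.0
    # Property (ii): the two Riemann sums agree (up to rounding).
    print(riemann_sum(g, s_points, s_tags), riemann_sum(g, t_points, t_tags))
```

The third tag in this example coincides with the right endpoint, so the last new subinterval has length 0; as noted in 24.6, degenerate subintervals are allowed and contribute nothing to the sum.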
BASIC PROPERTIES OF GAUGE INTEGRALS

24.13. Some trivial integrals.

a. In our definitions of the integrals, we permit $a = b$. Trivially, $\int_a^a f(t)\,dt$ and $\int_a^a f(t)\,d\varphi(t)$ always exist and are equal to 0.

b. Let $f$ be a constant function: $f(t) = x$ for all $t \in [a, b]$. Then we have the Riemann integral $\int_a^b x\,dt = (b - a)x$ or, more generally, the Riemann-Stieltjes integral $\int_a^b x\,d\varphi = [\varphi(b) - \varphi(a)]x$ for any function $\varphi$.

24.14. Integrals as linear maps. If $T$ is any tagged division of $[a, b]$, then $f \mapsto \Sigma[f, T]$ is a linear map from $X^{[a,b]}$ into $X$. The Riemann integrable functions form a linear subspace of $X^{[a,b]}$; it is the set of all $f$ for which the net $(\Sigma[f, T] : (T, \delta) \in \mathcal{D})$ is convergent. The Riemann integral is a linear map from that linear subspace into $X$; it is the pointwise limit of the net of functions $\Sigma[\,\cdot\,, T]$. Analogous remarks apply for the Henstock integral, with $\mathcal{D}$ replaced by $\mathcal{E}$.

Analogous remarks apply for the Stieltjes integral $\int_a^b f\,d\varphi$, as a function of $f$ (with $\varphi$ fixed) or as a function of $\varphi$ (with $f$ fixed).
24.15. Negligibility of small sets.

a. If $p : [a, b] \to X$ is a function that is only nonzero on a finite subset of $[a, b]$, then the Riemann integral $\int_a^b p(t)\,dt$ exists and equals 0.

If $f, g : [a, b] \to X$ are functions that only differ on a finite subset of $[a, b]$, then the Riemann integral $\int_a^b f(t)\,dt$ exists if and only if the Riemann integral $\int_a^b g(t)\,dt$ exists, in which case they are equal. Thus, if we change the value of a function at finitely many points, its Riemann integral is not affected.

In the preceding statements, we cannot replace "finite" with "countable." For example, show that $1_{\mathbb{Q}}$, the characteristic function of the rational numbers, is not Riemann integrable on any interval of positive length.

b. If $p : [a, b] \to X$ is a function that is only nonzero on a countable subset $C = \{c_j\}$ of $[a, b]$, then the Henstock integral $\int_a^b p(t)\,dt$ exists and equals 0. (Hint: Given any number $\varepsilon > 0$, choose a gauge $\delta$ so that $\|p(c_j)\|\,\delta(c_j) < 2^{-j}\varepsilon$ for all $j$; choose $\delta$ arbitrarily outside $C$.) For instance, $\int_a^b 1_{\mathbb{Q}}(t)\,dt = 0$.

c. If $f, g : [a, b] \to X$ are functions that only differ on a countable subset of $[a, b]$, then the Henstock integral $\int_a^b f(t)\,dt$ exists if and only if the Henstock integral $\int_a^b g(t)\,dt$ exists, in which case they are equal. Thus, if we change the value of a function at countably many points, its Henstock integral is not affected.

Remarks. The value of a Henstock integral is not affected if we change the integrand on a set of Lebesgue measure 0; that fact will follow from 21.37.i and 24.36. However, for some purposes involving Henstock integrals, we cannot ignore uncountable sets, even if they have measure 0; for instance, see 25.19 and 25.25.

d. Let $f$ and $\varphi$ be two functions on $[a, b]$, at least one of them scalar-valued. If $\varphi$ vanishes at $a$ and $b$ and at all but finitely many points of $(a, b)$, then the Henstock-Stieltjes integral $\int_a^b f\,d\varphi$ exists and equals 0. (Hint: Use 24.11 and 24.12.)

If $\psi_1, \psi_2$ agree at $a$ and $b$, and differ only on a finite subset of $(a, b)$, then $\int_a^b f\,d\psi_1$ exists if and only if $\int_a^b f\,d\psi_2$ exists, in which case they are equal.
24.16. Some elementary estimates.

a. If $f : [a, b] \to X$ and $h : [a, b] \to \mathbb{R}$ are Henstock integrable and $\|f(\cdot)\| \le h(\cdot)$ on $[a, b]$, then $\left\| \int_a^b f(t)\,dt \right\| \le \int_a^b h(t)\,dt$.

More generally, if $\varphi : [a, b] \to \mathbb{R}$ is an increasing function, $f : [a, b] \to X$ and $h : [a, b] \to \mathbb{R}$ are $\varphi$-integrable, and $\|f(\cdot)\| \le h(\cdot)$ on $[a, b]$, then $\left\| \int_a^b f\,d\varphi \right\| \le \int_a^b h\,d\varphi$. Hint: First show that $\|\Sigma[f, T, \varphi]\| \le \Sigma[h, T, \varphi]$.

b. A mean value theorem. If $f : [a, b] \to X$ is Henstock integrable, then $\frac{1}{b-a} \int_a^b f(t)\,dt$ is in the closed convex hull of the range of $f$.

More generally, if $\varphi : [a, b] \to \mathbb{R}$ is an increasing function and $f : [a, b] \to X$ is $\varphi$-integrable, then $[\varphi(b) - \varphi(a)]^{-1} \int_a^b f\,d\varphi$ is a member of the closed convex hull of the range of $f$. Hint: First show that $[\varphi(b) - \varphi(a)]^{-1}\, \Sigma[f, T, \varphi] \in \operatorname{co}(\operatorname{Ran}(f))$.

c. Let $f, \varphi$ be functions defined on $[a, b]$, at least one of them scalar-valued. Suppose $\varphi$ has bounded variation and $f$ is bounded and $\varphi$-integrable. Then $\left| \int_a^b f\,d\varphi \right| \le \|f\|_\infty \operatorname{Var}(\varphi, [a, b])$. Hint: First show that $|\Sigma[f, T, \varphi]| \le \|f\|_\infty \operatorname{Var}(\varphi, [a, b])$.

24.17. Theorem on uniform limits. Assume the normed space $X$ is complete. Suppose that $f_1, f_2, f_3, \ldots : [a, b] \to X$ are functions converging uniformly on $[a, b]$ to a function $f : [a, b] \to X$.
(i) If the $f_n$'s are Riemann integrable or Henstock integrable, then $f$ is integrable in the same sense, and $\int_a^b f_n(t)\,dt \to \int_a^b f(t)\,dt$.

(ii) More generally, suppose $\varphi : [a, b] \to \mathbb{R}$ is an increasing function. If the $f_n$'s have Riemann-Stieltjes or Henstock-Stieltjes integrals with respect to $\varphi$, then $f$ is integrable in the same sense and $\int_a^b f_n(t)\,d\varphi(t) \to \int_a^b f(t)\,d\varphi(t)$.

Hints: It suffices to prove (ii). By assumption, $\epsilon_j = \|f - f_j\|_\infty = \sup_t \|f(t) - f_j(t)\|$ tends to 0 as $j \to \infty$. We have $\|f_j(t) - f_k(t)\| \le \epsilon_j + \epsilon_k$ for all $t$. Hence

$$\Big\| \int_a^b f_j\,d\varphi - \int_a^b f_k\,d\varphi \Big\| = \Big\| \int_a^b (f_j - f_k)\,d\varphi \Big\| \le \int_a^b (\epsilon_j + \epsilon_k)\,d\varphi = (\varphi(b) - \varphi(a))(\epsilon_j + \epsilon_k),$$

which tends to 0 as $j, k \to \infty$. Thus the sequence $\left( \int_a^b f_j\,d\varphi \right)$ is Cauchy and converges to some limit $v$. Now estimate $\|v - \Sigma[f, T, \varphi]\|$.
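24.18. Reparametrization Theorem. Let $\sigma : [\hat{a}, \hat{b}] \to [a, b]$ be an increasing bijection. Let $f$ and $\varphi$ be functions defined on $[a, b]$, at least one of them scalar-valued. Then the Henstock-Stieltjes integral $\int_a^b f\,d\varphi$ exists if and only if the Henstock-Stieltjes integral $\int_{\hat{a}}^{\hat{b}} (f \circ \sigma)\,d(\varphi \circ \sigma)$ exists, in which case they are equal.

Proof. Let $\hat{f} = f \circ \sigma$ and $\hat{\varphi} = \varphi \circ \sigma$. We are to prove that $\int_a^b f\,d\varphi$ exists if and only if $\int_{\hat{a}}^{\hat{b}} \hat{f}\,d\hat{\varphi}$ exists, in which case they are equal. There is a symmetry between the "hat" quantities and the "no-hat" quantities, since $\sigma^{-1} : [a, b] \to [\hat{a}, \hat{b}]$ is an increasing bijection and we have $f = \hat{f} \circ \sigma^{-1}$ and $\varphi = \hat{\varphi} \circ \sigma^{-1}$.

Corresponding to each tagged division $T = (n, t_j, \tau_j)$ of $[a, b]$ is another tagged division $\hat{T} = (n, \hat{t}_j, \hat{\tau}_j)$ of $[\hat{a}, \hat{b}]$ defined by $\hat{t}_j = \sigma^{-1}(t_j)$ and $\hat{\tau}_j = \sigma^{-1}(\tau_j)$. It is easy to verify that $\Sigma[f, T, \varphi] = \Sigma[\hat{f}, \hat{T}, \hat{\varphi}]$. We are to prove that $\lim_T \Sigma[f, T, \varphi]$ exists if and only if $\lim_{\hat{T}} \Sigma[\hat{f}, \hat{T}, \hat{\varphi}]$ exists, in which case the limits are equal. Thus, it suffices to prove that the tagged divisions $T$ become fine when and only when the tagged divisions $\hat{T}$ become fine. By symmetry, it suffices to prove half of this implication.

Thus, let any gauge $\hat{\delta}$ on $[\hat{a}, \hat{b}]$ be given; it suffices to prove the existence of a gauge $\delta$ on $[a, b]$ with the property that whenever $T$ is $\delta$-fine, then $\hat{T}$ is $\hat{\delta}$-fine. (Caution: The most obvious choice is $\delta = \hat{\delta} \circ \sigma^{-1}$, but that choice doesn't work; we need something slightly more sophisticated.)

Since $\sigma^{-1} : [a, b] \to [\hat{a}, \hat{b}]$ is an increasing bijection, it is continuous; in particular it is continuous at $\tau$. For each number $\tau$ in $[a, b)$, we can choose $\delta(\tau)$ to be a positive number small enough so that $\tau + \delta(\tau) \in [a, b]$ and

$$\sigma^{-1}(\tau + \delta(\tau)) < \sigma^{-1}(\tau) + \tfrac{1}{2}\hat{\delta}(\sigma^{-1}(\tau)). \qquad (*1)$$

Also, for each number $\tau$ in $(a, b]$, we can choose $\delta(\tau)$ to be a positive number small enough so that $\tau - \delta(\tau) \in [a, b]$ and

$$\sigma^{-1}(\tau - \delta(\tau)) > \sigma^{-1}(\tau) - \tfrac{1}{2}\hat{\delta}(\sigma^{-1}(\tau)). \qquad (*2)$$

These conditions can be satisfied simultaneously since $\frac{1}{2}\hat{\delta}(\sigma^{-1}(\tau))$ is a positive number.

Now suppose $T$ is $\delta$-fine. Then for each $j$, we have $\tau_j \in [t_{j-1}, t_j]$ and $t_j - t_{j-1} < \delta(\tau_j)$; hence

$$t_j < \tau_j + \delta(\tau_j) \qquad \text{and} \qquad t_{j-1} > \tau_j - \delta(\tau_j).$$

We can now prove $\sigma^{-1}(t_j) < \sigma^{-1}(\tau_j) + \frac{1}{2}\hat{\delta}(\sigma^{-1}(\tau_j))$ by two different arguments. If $\tau_j \in [a, b)$, then this inequality follows from (*1); if $\tau_j = b$, then we deduce that $t_j = \tau_j$. Similarly, we obtain $\sigma^{-1}(t_{j-1}) > \sigma^{-1}(\tau_j) - \frac{1}{2}\hat{\delta}(\sigma^{-1}(\tau_j))$. Hence

$$\hat{t}_j - \hat{t}_{j-1} = \sigma^{-1}(t_j) - \sigma^{-1}(t_{j-1}) < \hat{\delta}(\sigma^{-1}(\tau_j)) = \hat{\delta}(\hat{\tau}_j),$$

so $\hat{T}$ is $\hat{\delta}$-fine.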
X of the form $F(t) = x + \int_c^t f\,d\varphi$, for any constants $c \in [a, b]$ and $x \in X$. Note that any two indefinite integrals of $f$ differ by a constant. For simplicity, the most common choice of $F$ is with $c = a$ and $x = 0$. For most applications, the particular choice of $x$ and $c$ does not matter, so we may refer to $F$ as "the indefinite integral of $f$." We may write it as

$$F(t) = \int^t f\,d\varphi + \text{constant}.$$

This is actually a whole collection of functions - one for each choice of the constant - but any one of those functions will work equally well in most applications.

24.25. Continuity Theorem. If $f$ is Henstock integrable on $[a, b]$, then the indefinite integral $F(t) = \int_a^t f(s)\,ds$ is continuous.

More generally, if $f$ is $\varphi$-integrable, then the indefinite integral $F(t) = \int_a^t f\,d\varphi$ is right continuous (respectively, left continuous) at each point where $\varphi$ is.

Proof. Fix any $p \in [a, b)$ where $\varphi$ is right continuous; we shall show that $F$ is right continuous at $p$. (A similar argument works for left continuity.) Let any $\varepsilon > 0$ be given; choose some corresponding gauge $\delta$ as in the definition of the Henstock integral or in the Henstock-Saks Lemma. Replacing $\delta$ with a smaller gauge if necessary, we may assume $p + \delta(p) < b$.

Now consider any $q \in (p, p + \delta(p))$. By applying 24.10 on the intervals $[a, p]$ and $[q, b]$ we can obtain a $\delta$-fine tagged division $T$ that has as one of its subintervals $[t_{j-1}, t_j] = [p, q]$ with tag $\tau_j = p$. Apply the Henstock-Saks Lemma 24.23 with $J$ equal to the singleton $\{j\}$; thus $\left| f(p)[\varphi(q) - \varphi(p)] - \int_p^q f\,d\varphi \right| < \varepsilon$. This proves that

$$|F(q) - F(p)| < \varepsilon + |\varphi(q) - \varphi(p)|\,|f(p)| \qquad \text{for } q \in (p, p + \delta(p)).$$

Hence $\limsup_{q \downarrow p} |F(q) - F(p)| \le \varepsilon$. Since $\varepsilon$ was chosen arbitrarily, we have $\limsup_{q \downarrow p} |F(q) - F(p)| \le 0$, hence $\lim_{q \downarrow p} F(q) = F(p)$.
INTEGRALS OF CONTINUOUS FUNCTIONS

24.26. Advanced calculus theorem: existence of the integral. Assume the normed space $X$ is complete, and let $f : [a, b] \to X$ be continuous - or more generally, piecewise continuous (defined in 19.28). Then:

(i) The Riemann integral $\int_a^b f(t)\,dt$ exists.

(ii) More generally, let $\varphi : [a, b] \to \mathbb{F}$ be any function of bounded variation. Then the Riemann-Stieltjes integral $\int_a^b f\,d\varphi$ exists.
Hints: $f$ is a uniform limit of step functions; we shall apply 24.22 and 24.17. It suffices to consider the case of real-valued $\varphi$, since $\varphi = \operatorname{Re}(\varphi) + i \operatorname{Im}(\varphi)$. It suffices to consider $\varphi$ increasing, since any real-valued function of bounded variation is the difference of two increasing functions.

Remark. Much weaker hypotheses imply the existence of integrals; see 24.45 and 29.33.b.

24.27. Converse proposition (optional). Let $(X, \|\ \|)$ be a normed vector space that is not complete. Then there exists a continuous function $f : [0, 1] \to X$ that is not Henstock integrable.

Proof. By assumption, there exists some sequence $(x_n)$ in $X$ that is Cauchy but does not converge. Replacing $(x_n)$ with a subsequence, we may assume $\|x_n - x_{n+1}\| < 4^{-n}$ for all $n \in \mathbb{N}$. Let $u_n = x_n - x_{n+1}$; then $\|u_n\| < 4^{-n}$ but $\sum_{n=1}^{\infty} u_n = \lim_{N \to \infty} \sum_{n=1}^{N} u_n$ does not exist in $X$. Note that $\sum_{n=1}^{\infty} u_n$ does exist in the completion of $X$, which we shall denote by $Y$.

Now let $\varphi : [0, 1] \to [0, +\infty)$ be some continuous function satisfying $\varphi(0) = \varphi(1) = 0$ and $c = \int_0^1 \varphi(t)\,dt > 0$. (The particular choice of $\varphi$ does not matter; three such functions are $\frac{1}{2} - |t - \frac{1}{2}|$, $\frac{1}{4} - (t - \frac{1}{2})^2$, and $\sin(\pi t)$.) Define $f : [0, 1] \to X$ as follows: Let $f(0) = 0$. For $n = 1, 2, 3, \ldots$, on the subinterval $[2^{-n}, 2^{-n+1}]$, let $f(t) = 2^n \varphi(2^n t - 1) u_n$. Then $f$ is continuous on that subinterval and vanishes at each end of that subinterval, and $\|f(t)\| \le 2^n \|\varphi\|_\infty \|u_n\| < 2^{-n} \|\varphi\|_\infty$ everywhere on that subinterval; hence $f$ is continuous everywhere on $[0, 1]$. An easy computation shows $\int_{2^{-n}}^{2^{-n+1}} f(t)\,dt = c\,u_n$.

We may view $f$ as a continuous function from $[0, 1]$ into the completion space $Y$. Then $f$ is Riemann integrable in $Y$, by 24.26(i). It is intuitively obvious (and an only moderately difficult exercise to prove) that $\int_0^1 f(t)\,dt = c \sum_{n=1}^{\infty} u_n$, which exists in $Y$ but not in $X$. If $f$ has a Riemann or Henstock integral in $X$, then that integral must coincide with the Riemann integral in $Y$; thus $f$ does not have a Henstock integral in $X$.

24.28. Proposition. Let $\mathbb{F}$ be the scalar field, and let $C[a, b]$ = {continuous functions from $[a, b]$ to $\mathbb{F}$}. Let $X$ be a Banach space. Define

$$C[a, b]^\perp = \Big\{ \psi \in BV([a, b], X) : \int_a^b f\,d\psi = 0 \text{ for every } f \in C[a, b] \Big\}.$$
Then a second, equivalent definition is

$$C[a, b]^\perp = \{ \psi \in BV([a, b], X) : \psi(a) = \psi(b) \text{ and the set } \{ t \in [a, b] : \psi(t) \ne \psi(a) \} \text{ is at most countable} \}.$$

The linear space $BV([a, b], X)$ can be expressed as a direct sum of two linear subspaces, as follows:

$$BV([a, b], X) = C[a, b]^\perp \oplus NBV([a, b], X),$$
where $NBV([a, b], X)$ is defined as in 22.19.d. That is, any $\psi \in BV([a, b], X)$ can be written in one and only one way as $\psi = \psi_1 + \psi_2$ with $\psi_1 \in C[a, b]^\perp$ and $\psi_2 \in NBV([a, b], X)$. Furthermore, $\operatorname{Var}(\psi_2) \le \operatorname{Var}(\psi)$.

Proof (following Limaye [1981]). To prove the equivalence of the two definitions of $C[a, b]^\perp$, let any $\psi \in BV([a, b], X)$ be given. For simplicity of notation, replace $\psi(\cdot)$ with the function $\psi(\cdot) - \psi(a)$; thus we may assume $\psi(a) = 0$.

First suppose $\psi \in C[a, b]^\perp$ using the first definition. If we take $f$ to be the constant function 1, we find that $\psi(b) = \psi(a)$. Now let $v(t) = \operatorname{Var}(\psi, [a, t])$; then $v$ is increasing and hence has at most countably many discontinuities. Fix any point $t_0$ where $v$ is continuous; it suffices to show that $\psi(t_0) = 0$. For large integers $n$, define the continuous function

$$f_n(t) = \begin{cases} 1 & \text{if } a \le t \le t_0 \\ 1 - (t - t_0)n & \text{if } t_0 \le t \le t_0 + \frac{1}{n} \\ 0 & \text{if } t_0 + \frac{1}{n} \le t \le b. \end{cases}$$

Using 24.19, we can compute

$$0 = \int_a^b f_n\,d\psi = \psi(t_0) + \int_{t_0}^{t_0 + \frac{1}{n}} f_n(t)\,d\psi(t) + 0.$$

Since $\|f_n\|_\infty \le 1$, 24.16.c shows that $|\psi(t_0)| \le \operatorname{Var}(\psi, [t_0, t_0 + \frac{1}{n}]) = v(t_0 + \frac{1}{n}) - v(t_0)$. Now take limits as $n \to \infty$.

On the other hand, suppose that $\psi \in C[a, b]^\perp$ using the second definition. We use the fact that $\int_a^b f\,d\psi$ is a Riemann-Stieltjes integral, not just a Henstock-Stieltjes integral. We have $\int_a^b f\,d\psi = \lim_T \Sigma[f, T, \psi]$ for any choice of tagged divisions $T$ that have subinterval lengths tending to 0. By our hypothesis on $\psi$, we can choose the tagged divisions $T$ so that the subintervals $[t_{j-1}, t_j]$ satisfy $\psi(t_{j-1}) = \psi(t_j) = 0$. Then $\Sigma[f, T, \psi] = 0$ for all such tagged divisions. This completes the proof of the equivalence of the two definitions.

From the second definition of $C[a, b]^\perp$ it is clear that $C[a, b]^\perp \cap NBV([a, b], X) = \{0\}$; hence any $\psi$ can be written in at most one way as $\psi_1 + \psi_2$. Let us show that it can be written in at least one way. Let any $\psi \in BV([a, b], X)$ be given. Since any constant function belongs to $C[a, b]^\perp$, we may replace $\psi(\cdot)$ with the function $\psi(\cdot) - \psi(a)$; thus we may assume $\psi(a) = 0$ to simplify our notation. Now define

$$\psi_2(t) = \begin{cases} 0 & \text{when } t = a \\ \psi(t+) & \text{when } t \in (a, b) \\ \psi(b) & \text{when } t = b. \end{cases}$$

Then $\psi_2$ is right continuous on $(a, b)$.

To show that $\operatorname{Var}(\psi_2) \le \operatorname{Var}(\psi)$, let any partition $a = t_0 < t_1 < t_2 < \cdots < t_n = b$ be given, and any number $\varepsilon > 0$. For $j = 1, 2, \ldots, n - 1$, choose some point $s_j \in (t_j, t_{j+1})$ with $|\psi_2(t_j) - \psi(s_j)| < \frac{\varepsilon}{2n}$. That inequality is also satisfied for $j = 0$ and $j = n$ by taking $s_0 = a$ and $s_n = b$. Hence

$$\sum_{j=1}^{n} |\psi_2(t_j) - \psi_2(t_{j-1})| \;\le\; \sum_{j=1}^{n} |\psi(s_j) - \psi(s_{j-1})| + \varepsilon \;\le\; \operatorname{Var}(\psi) + \varepsilon.$$
Monotone Convergence Theorem
Thus Var(l/!2) v - �c; this division will remain fixed throughout the remainder of the proof. Let Q = { q0, q1 , . . . , qP }. Define a gauge 1 as in 24.11. We shall show that the gauge 8 = min{/, rd has the required properties. Let S = (m, s ; , ) be any 8-fine tagged division of [a, b] ; we are to show that l v I: [lfi , S, tpJ I < E. Construct an auxiliary tagged division T = (2m, tj, Tj) as in 24.12. By 24.11 we have Q � { 1 , , . . . , am} = {71 , T2, . . . , T2m} � {to, t 1 , t2, . . . , t2m}; hence ;
a;
v,
· ·
·
a; a
a2
I
and therefore l v - 2::::�:"1 I I I1:'_1 f d'P II < �E . The tagged division T is 8-fine, hence 1 1 -fine, so we may apply the Henstock-Saks Lemma, which yields the first inequality in the following
string of inequalities: c -
2
>
>
� I t. f d
·
I:
m ->
I:
v.
I:
I:
->
->
I I:
I:
->
I:
I J:
Henstock and Lebesgue Integrals
c.
d.
Also, lf(t) + g(t) l bk] selected in this fashion, there may be more than one point that is suitable for use as akl but we choose one particular value for ak. Note that the resulting intervals (ak, bk] are disjoint; also note that f(ak) 2 1 . We claim that U;;= 1 (ak , bk]j :2 E. To see this, fix any z E E. For some integer j sufficiently large, we have 2- ( b - a) < 8 ( z). Since a tt E , we have z "/=- a, so 1one of the 1 intervals (p',t , pj] (for u = 1 , 2, . . . , Q(j)) must contain z, and that interval (pj - , pj] must have length less than 2 -j (b - a). That interval must be selected in the jth stage, if it is not contained in an interval that was already selected in an earlier stage. Thus z E u;;= l (ak, bk], proving our claim. Next we show that 2::: � 1 [rp (bk) - rp(ak)] :::; �c + I: f drp. To see that, fix any pos itive integer K; it suffices to show 2::: �= 1 [rp (bk) - rp (ak)] :::; �c + I: f drp. The intervals (a1 , b l ), (a2, b2), . . . , (aK , bK) are disjoint, and the complement of their union is equal to the union of finitely many subintervals of [a, b]. Apply 24. 10 to obtain a 8-fine tagged divi sion of each of those subintervals. Putting all the subintervals together, we obtain a 8-fine •
• •
:
;
I
(
tagged division T = (m, t11 , Tv ) of [a , b] , in which ( [ak, bk ] , ak) l
==?
=
2
==?
necessarily 0). Temporarily fix any number $\varepsilon > 0$. For n = 0, 1, 2, ..., define the functions

$$u_n(t) = \big(f(t) - n\varepsilon\big)^{+} \wedge \varepsilon = \begin{cases} 0 & \text{if } f(t) \le n\varepsilon \\ f(t) - n\varepsilon & \text{if } n\varepsilon \le f(t) \le (n+1)\varepsilon \\ \varepsilon & \text{if } (n+1)\varepsilon \le f(t). \end{cases}$$

Then the functions $u_n$ are all absolutely $\varphi$-integrable, since the absolutely $\varphi$-integrable functions form a vector lattice, as noted in 24.32.b. It is easy to verify that $\sum_{n=0}^{\infty} u_n(t) = f(t)$ for each t; hence $\sum_{n=0}^{\infty} \int_a^b u_n\,d\varphi = \int_a^b f\,d\varphi$ by 24.30.a.

By Lemma 24.34, for each $n \ge 0$ we may choose some open set $G_n \supseteq \{t \in [a,b] : \varepsilon^{-1}u_n(t) \ge 1\} = \{t \in [a,b] : u_n(t) = \varepsilon\}$, satisfying

$$\mu_\varphi(G_n) \le 2^{-n-1} + \frac{1}{\varepsilon}\int_a^b u_n\,d\varphi.$$

Then $g = \varepsilon \sum_{n=0}^{\infty} 1_{G_n}$ is Borel measurable, and $\int_{[a,b]} g\,d\mu_\varphi = \int_a^b g\,d\varphi \le \varepsilon + \int_a^b f\,d\varphi$ by the Levi Theorems 21.39.b and 24.30.a. Note that the sets $H_n = \{t \in [a,b] : n\varepsilon < f(t) < (n+1)\varepsilon\}$ (for n = 0, 1, 2, 3, ...) are disjoint, and $(1_{G_n} + 1_{H_n})\varepsilon \ge u_n$. Hence, summing over n, we obtain $g + \varepsilon \ge f$.

Our construction of g depended on the choice of $\varepsilon$. Now construct such a function $g = g_k$ for each of the values $\varepsilon = \frac1k$ (for k = 1, 2, 3, ...). Thus we obtain functions $g_k \in L^1(\mu_\varphi, \mathbb{R})$ with $g_k + \frac1k \ge f$ and $\int_{[a,b]} g_k\,d\mu_\varphi \le \frac1k + \int_a^b f\,d\varphi$. Let $h = \liminf_{k\to\infty} g_k$. Then h is Borel measurable, $h \ge f$, and by Fatou's Lemma (21.39.c) we have

$$\int_a^b f\,d\varphi \le \int_a^b h\,d\varphi = \int_{[a,b]} h\,d\mu_\varphi \le \liminf_{k\to\infty} \int_{[a,b]} g_k\,d\mu_\varphi \le \int_a^b f\,d\varphi.$$

Hence $h - f$ is nonnegative and $\int_a^b (h - f)\,d\varphi = 0$. By the special case discussed earlier in this proof, it follows that $h - f \in L^1(\mu_\varphi)$, and therefore $f \in L^1(\mu_\varphi)$. This completes the proof in the case where $X = \mathbb{R}$ and $f \ge 0$.

(iii) We next prove (B) $\Rightarrow$ (A) in the case where $X = \mathbb{R}$ (but f is not necessarily nonnegative). As we noted in 24.32.b, the absolutely $\varphi$-integrable functions form a vector lattice. Hence we may write the Jordan Decomposition $f = f^{+} - f^{-}$ (see 8.42.f). The functions $f^{+}$, $f^{-}$ are absolutely $\varphi$-integrable, so the problem is reduced to the previous case.
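The layer decomposition used in part (ii) can be checked pointwise with a few lines of arithmetic. The sketch below is only an illustration: the helper names `layer` and `layer_sum` are hypothetical, and the code works with a single nonnegative value of f rather than with a function on [a, b].

```python
def layer(f_value: float, n: int, eps: float) -> float:
    """Value of u_n = ((f - n*eps)^+) ∧ eps at a point where f equals f_value."""
    return min(max(f_value - n * eps, 0.0), eps)


def layer_sum(f_value: float, eps: float) -> float:
    """Sum of the nonzero layers u_0 + u_1 + ... at one point.

    Layers with n*eps >= f_value vanish, so the sum has finitely many terms.
    """
    total = 0.0
    n = 0
    while n * eps < f_value:
        total += layer(f_value, n, eps)
        n += 1
    return total
```

For a value such as f(t) = 2.7 with eps = 0.5, the first five layers each contribute 0.5 and the sixth contributes the remaining 0.2, which is the pointwise identity that the sum of the u_n equals f.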
(iv) Finally, we prove (B) $\Rightarrow$ (A) in general - i.e., where X is any Banach space. Any complex Banach space may be viewed as a real Banach space, by "forgetting" how to multiply vectors by members of $\mathbb{C} \setminus \mathbb{R}$. This has no effect on conditions (A) and (B); hence we may assume the scalar field is $\mathbb{R}$.

By assumption, f is almost separably valued, so by changing f on a set of $\mu_\varphi$-measure 0 we may assume f is separably valued. By assumption, the vector-valued function f is absolutely $\varphi$-integrable; hence the real-valued function $\|f(\cdot)\|$ is also absolutely $\varphi$-integrable. By the previous case of this theorem, we know that $\|f(\cdot)\| \in L^1(\mu_\varphi, \mathbb{R})$.

Temporarily fix any $\lambda \in X^*$. We claim that the function $\lambda \circ f : [a,b] \to \mathbb{R}$ is $\varphi$-integrable, with $\int_a^b \lambda(f(\cdot))\,d\varphi = \lambda\big(\int_a^b f\,d\varphi\big)$; indeed, this is clear from the estimate

$$\Big| \sum_i \lambda(f(\sigma_i))\,[\varphi(s_i) - \varphi(s_{i-1})] - \lambda\Big(\int_a^b f\,d\varphi\Big) \Big| \le \|\lambda\| \, \Big\| \sum_i f(\sigma_i)\,[\varphi(s_i) - \varphi(s_{i-1})] - \int_a^b f\,d\varphi \Big\|.$$

A similar estimate shows that $\int_p^q \lambda(f(\cdot))\,d\varphi = \lambda\big(\int_p^q f\,d\varphi\big)$ for any subinterval $[p,q] \subseteq [a,b]$. Hence for any partition $a = p_0 < p_1 < p_2 < \cdots < p_n = b$ we have

$$\sum_{i=1}^{n} \Big| \int_{p_{i-1}}^{p_i} \lambda(f(\cdot))\,d\varphi \Big| = \sum_{i=1}^{n} \Big| \lambda\Big(\int_{p_{i-1}}^{p_i} f\,d\varphi\Big) \Big| \le \|\lambda\| \sum_{i=1}^{n} \Big\| \int_{p_{i-1}}^{p_i} f\,d\varphi \Big\| \le \|\lambda\| \int_a^b \|f(\cdot)\|\,d\varphi.$$

By 24.31, therefore, $\lambda \circ f : [a,b] \to \mathbb{R}$ is absolutely $\varphi$-integrable. Apply the previous case (iii); thus $\lambda \circ f \in L^1(\mu_\varphi, \mathbb{R})$ with $\int_{[a,b]} \lambda \circ f\,d\mu_\varphi = \int_a^b \lambda(f(\cdot))\,d\varphi$. In particular, $\lambda \circ f$ is measurable from [a,b] to the Borel sets of $\mathbb{R}$. The function f is separably valued and weakly measurable, hence (see 23.25) strongly measurable. Since $\|f(\cdot)\| \in L^1(\mu_\varphi, \mathbb{R})$ (established earlier in this proof), it follows that $f \in L^1(\mu_\varphi, X)$. The equation $\int_{[a,b]} f\,d\mu_\varphi = \int_a^b f\,d\varphi$ was established when we proved (A) $\Rightarrow$ (B).
24.37. Corollary. Let X be a Banach space with scalar field $\mathbb{F}$ (equal to $\mathbb{R}$ or $\mathbb{C}$). If $\varphi : [a,b] \to \mathbb{F}$ has bounded variation and $f : [a,b] \to X$ is bounded and strongly measurable (from the Borel sets to the Borel sets), then the Henstock-Stieltjes integral $\int_a^b f\,d\varphi$ exists.
24.38. Remarks. The Henstock integral can be generalized, though only with some difficulty, to domains more general than an interval [a,b]. The Bochner/Lebesgue approach is more powerful, in that it applies easily to a very wide collection of measure spaces $(\Omega, \mathcal{S}, \mu)$. In certain other respects, however, the Henstock integral is actually more general. Built into the definition of the Bochner/Lebesgue integral are a separability condition and an absolute integrability condition (i.e., not only f but $\|f(\cdot)\|$ must be integrable). These restrictions are not imposed on the Henstock integral; hence we can devise functions that are Henstock integrable but not Bochner/Lebesgue integrable by violating either the separability condition or the absolute integrability condition. Violations of the separability condition are perhaps contrived and artificial, since all of applied mathematics (all of "the real world") happens in separable Banach spaces, or in separable subspaces of Banach spaces. Violations of the absolute integrability condition are not so contrived, however. A study of the continuous dependence on parameters and asymptotic behavior for solutions to differential equations with rapidly oscillating terms leads to functions very much like the pathological function in 25.20, which is Henstock integrable but not Lebesgue integrable. In fact, it was the study of such solutions to differential equations that led Kurzweil to his independent discovery of the Henstock integral (also known as the Kurzweil integral); for instance, see Kurzweil [1957].
MORE ABOUT LEBESGUE MEASURE

24.39. Example: meager but full. The sets with Lebesgue measure 0 and the meager sets form two $\sigma$-ideals on $\mathbb{R}$ and thus two different notions of "small" sets. These notions are not directly related; a set may be small in one sense while large in the other sense. That is evident from the following example. Let $(r_i)$ be an enumeration of the rationals. For $i, j \in \mathbb{N}$, define the open interval $H_{i,j} = (r_i - 2^{-i-j},\ r_i + 2^{-i-j})$. Then $G_j = \bigcup_{i=1}^{\infty} H_{i,j}$ is an open dense subset of $\mathbb{R}$, and so $C = \bigcap_{j=1}^{\infty} G_j$ is a comeager set with Lebesgue measure 0. Thus it is "small" with respect to Lebesgue measure, but "large" with respect to Baire category. Its complement has these properties reversed. Note also that C is uncountable since it is not meager.

24.40. Proposition on regularity of Lebesgue measure. Let $\mu$ denote Lebesgue measure on $\mathbb{R}$. If $S \subseteq \mathbb{R}$ is Lebesgue measurable, then

$$\mu(S) = \sup\{\mu(K) : K \subseteq S,\ K \text{ compact}\}.$$
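The measure-zero claim in Example 24.39 rests on a geometric series: the intervals making up $G_j$ have total length at most $\sum_i 2 \cdot 2^{-i-j} = 2^{1-j}$, which tends to 0 as j grows. The sketch below checks that arithmetic exactly with rational numbers. The function names are hypothetical, and the indexing i = 1, 2, 3, ... is an assumption about how $\mathbb{N}$ is enumerated.

```python
from fractions import Fraction


def interval_length(i: int, j: int) -> Fraction:
    """Length of H_{i,j} = (r_i - 2^(-i-j), r_i + 2^(-i-j))."""
    return Fraction(2, 2 ** (i + j))


def cover_length_bound(j: int, terms: int) -> Fraction:
    """Sum of the lengths of H_{1,j}, ..., H_{terms,j}.

    This is an upper bound for the measure of their union; overlaps only
    make the true measure smaller.
    """
    return sum((interval_length(i, j) for i in range(1, terms + 1)), Fraction(0))
```

Every partial sum stays below $2^{1-j}$, so the measure of $G_j$ is at most $2^{1-j}$ and the intersection C has measure 0.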
Proof. Let any E > 0 be given. By 24.35, for each integer n E .Z we can find some compact set Kn � S n [n, n + 1] such that J-L(Kn ) > J-L(S n [n, n + 1]) - 2- l n l -2c. Any overlap among the Kn 's or among the sets S n [n, n + 1] is contained in the set .Z, which has measure 0. Hence for any N E N, we have J-L(U i n i .C: N Kn ) = L l n i £2 that is discontinuous everywhere. For each positive integer m, let e rn be the sequence with 1 in the mth place and Os elsewhere. Let (rrn : m E N) be an enumeration of the rational numbers in [0, 1] . Define f(rm ) = ern for all m and f(t) = 0 when t is irrational. Then f is discontinuous everywhere. To prove that f is Riemann integrable, let T = ( n, t1, Tj ) be any tagged division with max1(t1 - t1_ 1 ) < E. We may merge two consecutive subintervals [t1_ 1 , t1] and [t.7 , t1 + d if they have the same tag - i.e. , if tJ = Tj - l = T1 , then we may replace the two subintervals with a single subinterval; this does not affect the value of the approximating Riemann sum I:[!, T] . The resulting new tagged division still satisfies max1 (t1 - t1_ 1 ) < 2E , and no two of its tags are identical. Hence (j(T1), j(Tk)) = 0 for j =/= k. Therefore
$$\big\|\Sigma[f,T]\big\|_2^2 = \Big\| \sum_{j=1}^{n} f(\tau_j)(t_j - t_{j-1}) \Big\|_2^2 = \sum_{j=1}^{n} \|f(\tau_j)\|_2^2\,(t_j - t_{j-1})^2 < 2\varepsilon.$$

Thus $\int_0^1 f(t)\,dt = 0$.
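The estimate above is the Pythagorean theorem applied to orthonormal vectors: when the tags are distinct rationals, the Riemann sum is a combination of distinct unit vectors $e_m$ with coefficients equal to the subinterval widths. A minimal numerical sketch, assuming every tag is rational (the worst case) and using hypothetical helper names:

```python
import math


def riemann_sum_norm(widths):
    """l2 norm of sum_j e_{m_j} * width_j when the indices m_j are all distinct.

    By orthonormality the squared norm is the sum of the squared widths.
    """
    return math.sqrt(sum(w * w for w in widths))


def uniform_widths(n):
    """Widths of the division of [0, 1] into n equal subintervals."""
    return [1.0 / n] * n
```

With n equal subintervals the norm is $1/\sqrt{n}$, so the Riemann sums shrink to 0 as the mesh does, even though f is discontinuous everywhere.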
Chapter 25 Fréchet Derivatives

DEFINITIONS AND BASIC PROPERTIES

25.1. Definitions. Let X and Y be normed spaces, and let $f : \Omega \to Y$ be some function with domain $\Omega \subseteq X$. We say that L is a derivative of f at a point $\xi \in \Omega$ if $L : X \to Y$ is a bounded linear map satisfying

$$\lim_{x \to \xi} \frac{\|f(x) - f(\xi) - L(x - \xi)\|}{\|x - \xi\|} = 0,$$

or, in greater detail,

$$\lim_{r \downarrow 0}\ \sup_{x \in \Omega,\ 0 < \|x - \xi\| < r} \frac{\|f(x) - f(\xi) - L(x - \xi)\|}{\|x - \xi\|} = 0.$$
Y be some mapping, and let $\xi \in \Omega$. Suppose that either (i) $\xi$ is in the interior of $\Omega$, or (ii) $\Omega$ is convex and has nonempty interior. Then the derivative of f at $\xi$, if it exists, is unique - i.e., there is at most one bounded linear operator $L : X \to Y$ satisfying condition 25.1(*).

Remarks. These hypotheses can be weakened, but apparently not without making them more complicated. Note that hypothesis (ii) is satisfied if $\Omega$ is an interval in the real line.

Proof of proposition. By replacing f with the function $u \mapsto f(\xi + u)$, we may assume $0 \in \Omega$ and $\xi = 0$; this will simplify our notation. By assumption,

$$\lim_{r \downarrow 0}\ \sup_{x \in \Omega,\ 0 < \|x\| < r} \frac{\|f(x) - f(0) - L(x)\|}{\|x\|} = 0.$$

Suppose that $L_1$ and $L_2$ are two bounded linear operators satisfying this condition; we must show that the bounded linear operator $M = L_1 - L_2$ is equal to 0. We know that M satisfies

$$\lim_{r \downarrow 0}\ \sup_{x \in \Omega,\ 0 < \|x\| < r} \frac{\|M(x)\|}{\|x\|} = 0.$$

To show that the linear mapping M equals 0, it suffices to show that M vanishes on some nonempty open subset of X; in particular, it suffices to show that M vanishes on $\mathrm{int}(\Omega)$. Consider any nonzero point $v \in \mathrm{int}(\Omega)$. If either $\Omega$ is convex or $0 \in \mathrm{int}(\Omega)$, then $tv \in \Omega$ for all t > 0 sufficiently small. Then $0 = \lim_{t \downarrow 0} \|M(tv)\|/\|tv\| = \|M(v)\|/\|v\|$, so $\|M(v)\| = 0$.
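The defining limit in 25.1 can be explored numerically in finite dimensions. The sketch below uses an illustrative function $f(x_1, x_2) = x_1^2 + 3x_1x_2$ on $\mathbb{R}^2$, which is not taken from the text, together with its gradient as the candidate derivative L. It samples only the sphere $\|x - \xi\| = r$ rather than the whole punctured ball, so it gives a rough indication, not a proof.

```python
import math


def f(x1, x2):
    """Illustrative scalar function on R^2 (an assumption, not from the text)."""
    return x1 * x1 + 3.0 * x1 * x2


def candidate_derivative(xi, h):
    """L(h) for the gradient of f at xi: (2*xi1 + 3*xi2)*h1 + 3*xi1*h2."""
    return (2.0 * xi[0] + 3.0 * xi[1]) * h[0] + 3.0 * xi[0] * h[1]


def worst_quotient(xi, r, directions=360):
    """Largest sampled value of |f(xi+h) - f(xi) - L(h)| / |h| with |h| = r."""
    worst = 0.0
    for k in range(directions):
        angle = 2.0 * math.pi * k / directions
        h = (r * math.cos(angle), r * math.sin(angle))
        remainder = f(xi[0] + h[0], xi[1] + h[1]) - f(xi[0], xi[1]) - candidate_derivative(xi, h)
        worst = max(worst, abs(remainder) / r)
    return worst
```

For this f the remainder is exactly $h_1^2 + 3h_1h_2$, so the quotient is bounded by $2.5\,r$ and tends to 0 with r, as the definition requires.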
25.4. Definitions. We say that f is differentiable on the set $\Omega$ if the Fréchet derivative $f'(\xi) = \frac{df}{dx}(\xi)$ exists for every point $\xi \in \Omega$. Thus we define a function $f' = \frac{df}{dx} : \Omega \to BL(X,Y)$, where BL(X,Y) is the normed space of bounded linear operators from X into Y (introduced in 23.1). We say that f is continuously differentiable on $\Omega$ if it is differentiable and the mapping $f' : \Omega \to BL(X,Y)$ is continuous. The linear space of all continuously differentiable maps from $\Omega$ into Y is sometimes denoted $C^1(\Omega, Y)$.

If $f : \Omega \to Y$ is continuously differentiable, then the continuous mapping $f' : \Omega \to BL(X,Y)$ might also be differentiable at some point $\xi$. Then its derivative is the second derivative of f, denoted $f''(\xi) = \frac{d^2 f}{dx^2}(\xi)$. That operator is a member of $BL(X, BL(X,Y))$; it may be viewed as a map from $X \times X$ into Y. Similarly, we may define $f'''(\xi) = \frac{d^3 f}{dx^3}(\xi)$, etc., and in general $f^{(n)}(\xi) = \frac{d^n f}{dx^n}(\xi)$. We denote by $C^n(\Omega, Y)$ the class of functions f for which $f^{(n)}$ exists and is continuous. When Y is the scalar field ($\mathbb{R}$ or $\mathbb{C}$), then $C^n(\Omega, Y)$ may be written more briefly as $C^n(\Omega)$. A function is called smooth if it has derivatives of all orders - i.e., if it belongs to $C^\infty(\Omega, Y) = \bigcap_{n=1}^{\infty} C^n(\Omega, Y)$. Note that $f^{(n)}(\xi) = [f^{(n)}(\xi)](\cdot)$ is a mapping from $X^n$ into Y; we may write it as $[f^{(n)}(\xi)](x_1, x_2, \ldots, x_n)$. It is linear in each $x_j$ if $\xi$ and all the other $x_i$'s are held fixed. In general it is not linear in $\xi$ if all the $x_i$'s are held fixed.
25.5. Elementary examples and properties.

a. If f is differentiable at $\xi$, then f is also continuous at $\xi$.

b. If f is a constant function, then $f'(\xi) = 0$ for all $\xi \in \Omega$.

c. Review (from a calculus text) the proof of the product rule:

$$\frac{d}{dt}\big(f(t)g(t)\big) = f'(t)g(t) + f(t)g'(t)$$

for scalar-valued functions f and g.

d. If $M : X \to Y$ is a bounded linear operator and $f(x) = M(x)$ for all $x \in \Omega$, then $f'(\xi) = M$ for each $\xi \in \Omega$. Thus the mapping $f' : \Omega \to BL(X,Y)$ is a constant mapping, since it takes the value M at each point of $\Omega$. Hence $f'' = 0$. However, f itself is not constant (unless M = 0). This is a subtle distinction that may confuse some beginners. We have $f(x) = M(x)$ and $f'(x) = M$, but these are two different things: $M(x)$ is a particular member of Y, whereas M is a mapping from X into Y.

e. If $f : X \to Y$ is some mapping and Y is the scalar field, then $f'(x)$ (if it exists) is a member of the dual space $X^*$. This situation should not be confused with the situation in 25.2.

f. When $X = \mathbb{R}$, then we can use one-sided limits (as in 15.21) to define the one-sided derivatives:

$$f^{-}(\xi) = \lim_{x \uparrow \xi} \frac{f(x) - f(\xi)}{x - \xi} \qquad\text{and}\qquad f^{+}(\xi) = \lim_{x \downarrow \xi} \frac{f(x) - f(\xi)}{x - \xi}.$$

When $\xi$ lies in the interior of the set $\Omega$, then the Fréchet derivative $f'(\xi)$ (defined as in 25.1) exists if and only if both the one-sided derivatives exist and are equal, in which case $f'(\xi)$ is equal to their common value. If $\Omega$ is an interval and $\xi$ is the left endpoint of that interval, then the limits and derivatives from the left at $\xi$ are meaningless and the Fréchet derivative $f'(\xi)$ is (by our definition in 25.1) the same as the right-handed
derivative $f^{+}(\xi)$. Similarly, when $\xi$ is the right endpoint of the interval, then the limits and derivatives from the right are meaningless and $f'(\xi) = f^{-}(\xi)$.

g. If we replace the norms of X and Y with equivalent norms, then the linear space BL(X,Y) = {bounded linear operators from X into Y} remains unchanged and its norm also gets replaced by an equivalent norm. The existence and value of $f'(\xi)$ are unaffected by these replacements. Thus, our calculations are actually being performed, not in a normed space, but in a normable space - i.e., in a topological vector space whose topology can be given by various norms but that does not have one of those norms specified in particular.

h. (Optional.) Let X, Y be Banach spaces. With notation as in 23.28, recall that Inv(X,Y) is an open subset of the Banach space BL(X,Y). Define a mapping $\mathcal{I} : \mathrm{Inv}(X,Y) \to BL(Y,X)$ by $\mathcal{I}(f) = f^{-1}$. Show that $\mathcal{I}$ is continuously differentiable, with $[\mathcal{I}'(f)](h) = -f^{-1} h f^{-1}$. Hint: Use the series in 23.28.b.

i. (Optional.) Let X be a complex Banach space, and let BL(X,X) be the Banach space of bounded linear operators from X into X. Let T be a member of BL(X,X), and let $\rho(T)$ be its resolvent set (defined as in 23.30). Show that the mapping $\lambda \mapsto (\lambda I - T)^{-1}$ is a differentiable mapping from $\rho(T)$ into BL(X,X). What is its derivative?

Another example of a derivative in infinite dimensions is given in 25.22.
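The formula in exercise h can be sanity-checked in the finite-dimensional case $X = Y = \mathbb{R}^2$, where invertible operators are invertible 2-by-2 matrices. The sketch below is only a numerical illustration with hypothetical helper names, not a solution of the exercise: it measures how far $(f+h)^{-1}$ is from the first-order prediction $f^{-1} - f^{-1} h f^{-1}$.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def scaled(a, t):
    """Entrywise multiple t*a of a 2x2 matrix."""
    return [[t * a[i][j] for j in range(2)] for i in range(2)]


def remainder_size(f, h):
    """Largest entry of (f+h)^(-1) - f^(-1) + f^(-1) h f^(-1) in absolute value."""
    f_inv = inverse(f)
    shifted = [[f[i][j] + h[i][j] for j in range(2)] for i in range(2)]
    shifted_inv = inverse(shifted)
    linear_part = matmul(matmul(f_inv, h), f_inv)
    return max(
        abs(shifted_inv[i][j] - f_inv[i][j] + linear_part[i][j])
        for i in range(2)
        for j in range(2)
    )
```

Shrinking h by a factor of 10 shrinks this remainder by roughly a factor of 100, which is the quadratic decay one expects if $-f^{-1} h f^{-1}$ is the derivative.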
25.6. Chain rule of differential calculus. Let X, Y, Z be normed spaces. Let $S \subseteq X$ and $T \subseteq Y$ be open sets. Let $f : S \to Y$ and $g : T \to Z$ be some functions. Suppose that $x_0 \in S$ and $y_0 = f(x_0) \in T$. Suppose that the derivatives at these points, $f'(x_0)$ and $g'(y_0)$, both exist. (We do not assume that f and g are differentiable anywhere else, though we do not prohibit that either.) Then the composition $g \circ f$ is differentiable at $x_0$, and we have

$$(g \circ f)'(x_0) = g'(y_0) \circ f'(x_0),$$

or, in other terms, $(g \circ f)'(\xi) = g'(f(\xi)) \circ f'(\xi)$. This formula is easier to remember in Leibniz notation: If z is a function of y and y is a function of x, then

$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}.$$

The dy's appear to "cancel out" in this formula. The proof of the Chain Rule is similar to that given in any calculus book for $X = Y = Z = \mathbb{R}$, with epsilons and deltas. We leave the details as an exercise. (It is interesting to compare this with 29.12.b.)

Cautionary remark. When $X = Y = Z = \mathbb{R}$, as in college calculus, the linear operators $\frac{dz}{dy}$ and $\frac{dy}{dx}$ are simply the operations of multiplication by a real number (see 25.2); hence the composition of those two operators is just the multiplication of those two real numbers. In that setting, it does not matter in what order we put the factors $\frac{dz}{dy}$ and $\frac{dy}{dx}$, since multiplication of real numbers is commutative. However, in the more general setting of three arbitrary normed spaces X, Y, Z, the order of the two factors is very important. The formula must be stated $\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$; it is incorrect if written $\frac{dz}{dx} = \frac{dy}{dx}\frac{dz}{dy}$. Indeed, the composition $\frac{dy}{dx}\frac{dz}{dy}$ may not even make sense, for $\frac{dy}{dx}$ is a bounded linear operator from X into Y while $\frac{dz}{dy}$ is a bounded linear operator from Y into Z. Even when X = Y = Z, the compositions $\frac{dz}{dy}\frac{dy}{dx}$ and $\frac{dy}{dx}\frac{dz}{dy}$ need not be equal, since the composition of bounded linear operators from X into itself generally is not commutative if $\dim(X) \ge 2$ - see 8.27.
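The order of the factors can be seen concretely with $X = Y = Z = \mathbb{R}^2$. The sketch below uses two illustrative maps f and g chosen for this example (they are assumptions, not taken from the text), computes their Jacobian matrices by hand, and compares both orderings of the product with a finite-difference Jacobian of $g \circ f$.

```python
def f(x):
    """Illustrative inner map R^2 -> R^2 (an assumption for this sketch)."""
    return (x[0] * x[1], x[0] + x[1])


def g(y):
    """Illustrative outer map R^2 -> R^2 (an assumption for this sketch)."""
    return (y[0] * y[0], y[0] * y[1])


def jac_f(x):
    """Jacobian matrix of f, computed by hand."""
    return [[x[1], x[0]], [1.0, 1.0]]


def jac_g(y):
    """Jacobian matrix of g, computed by hand."""
    return [[2.0 * y[0], 0.0], [y[1], y[0]]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def numeric_jacobian(func, x, step=1e-6):
    """Central-difference approximation to the Jacobian of func at x."""
    columns = []
    for j in range(2):
        plus, minus = list(x), list(x)
        plus[j] += step
        minus[j] -= step
        f_plus, f_minus = func(plus), func(minus)
        columns.append([(f_plus[i] - f_minus[i]) / (2.0 * step) for i in range(2)])
    return [[columns[j][i] for j in range(2)] for i in range(2)]


def composition(x):
    """The composite map g o f."""
    return g(f(x))
```

At x = (1, 2) the product $g'(f(x))\,f'(x)$ agrees with the finite-difference Jacobian of the composition, while the reversed product $f'(x)\,g'(f(x))$ gives a different matrix.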
PARTIAL DERIVATIVES

25.7. The matrix of partial derivatives. In some cases of interest, the normed spaces X and Y are products of finitely many normed spaces:

$$X = X_1 \times X_2 \times \cdots \times X_m, \qquad Y = Y_1 \times Y_2 \times \cdots \times Y_n$$

for some positive integers m, n. As we noted in 25.5.g, for our present purposes any norm can be replaced by any equivalent norm. Hence the product topology on X can be given by

$$\|(x_1, x_2, \ldots, x_m)\| = \|x_1\| + \|x_2\| + \cdots + \|x_m\| \qquad\text{or}\qquad \|(x_1, x_2, \ldots, x_m)\| = \max\{\|x_1\|, \|x_2\|, \ldots, \|x_m\|\}$$

or any other convenient product norm; similarly for Y. Let $\Omega$ be a subset of X; then a function $f : \Omega \to Y$ can be represented by a wide assortment of notations:

$$y = (y_1, y_2, \ldots, y_n) = f(x) = f(x_1, x_2, \ldots, x_m) = \big(f_1(x), f_2(x), \ldots, f_n(x)\big) = \big(f_1(x_1, \ldots, x_m), f_2(x_1, \ldots, x_m), \ldots, f_n(x_1, \ldots, x_m)\big).$$

We shall use these different expressions interchangeably, switching to whichever one is most convenient in any particular context - usually suppressing whatever information is not currently being used. The partial derivative $\partial y_i/\partial x_j$ is the derivative of the mapping $x_j \mapsto y_i$ that we obtain when we consider $x_j$ to be the only variable and view all the other $x_p$'s as constants - i.e., hold their values fixed. With j = 1, for instance, $\frac{\partial y_i}{\partial x_1}(\xi)$ is a bounded linear operator $L : X_1 \to Y_i$ that satisfies

$$\lim_{u \to \xi_1} \frac{\|f_i(u, \xi_2, \xi_3, \ldots, \xi_m) - f_i(\xi_1, \xi_2, \xi_3, \ldots, \xi_m) - L(u - \xi_1)\|}{\|u - \xi_1\|} = 0$$

(if such an operator L exists). We define $\partial y_i/\partial x_j$ for other j's analogously.

Exercise. Let us represent vectors $x \in X$ and $y \in Y$ as column matrices; that is,

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \qquad\text{and}\qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
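Suppose the Fréchet derivative $\frac{dy}{dx} = f'(x)$ (defined as in 25.1) exists. Then all the partial derivatives exist, and the Fréchet derivative is equal to the matrix of partial derivatives:

$$\frac{dy}{dx} = f'(\xi) = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_m} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial y_n}{\partial x_1} & \dfrac{\partial y_n}{\partial x_2} & \cdots & \dfrac{\partial y_n}{\partial x_m} \end{bmatrix}.$$

The expression $L(x - \xi)$ in 25.1(*) is then evaluated by the usual method for multiplying a matrix times a vector, as in 8.28. Of course, here we must extend the meaning of the term "matrix": the components $\partial y_i/\partial x_j$ of this matrix are not necessarily numbers, or even members of a single ring; rather, $\partial y_i/\partial x_j$ is a bounded linear operator from $X_j$ into $Y_i$. The components of the matrix are numbers (i.e., scalars) when $X_j$ and $Y_i$ are both one-dimensional, as in 25.2.

Example. Here is a typical example in two dimensions. If $\Omega = X = Y = \mathbb{R}^2$ and $(y_1, y_2) = (x_1 \cos x_2,\ x_1 \sin x_2)$, then

$$\frac{dy}{dx} = \begin{bmatrix} \cos x_2 & -x_1 \sin x_2 \\ \sin x_2 & x_1 \cos x_2 \end{bmatrix}.$$

Thus, for instance, $\partial y_1/\partial x_1$ is the function we obtain by viewing $x_2$ as a constant and $x_1$ as a variable, and differentiating the function $y_1 = x_1 \cos x_2$ with respect to that variable. The other partial derivatives $\partial y_i/\partial x_j$ are defined similarly. To say that the formula above gives the derivative is to say that this quotient

$$\frac{\left\| \begin{bmatrix} x_1 \cos x_2 - \xi_1 \cos \xi_2 \\ x_1 \sin x_2 - \xi_1 \sin \xi_2 \end{bmatrix} - \begin{bmatrix} \cos \xi_2 & -\xi_1 \sin \xi_2 \\ \sin \xi_2 & \xi_1 \cos \xi_2 \end{bmatrix} \begin{bmatrix} x_1 - \xi_1 \\ x_2 - \xi_2 \end{bmatrix} \right\|}{\left\| \begin{bmatrix} x_1 - \xi_1 \\ x_2 - \xi_2 \end{bmatrix} \right\|}$$

converges to 0 when $x_1 \to \xi_1$ and $x_2 \to \xi_2$.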
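The convergence of that quotient for the map $(x_1, x_2) \mapsto (x_1 \cos x_2,\ x_1 \sin x_2)$ can be observed numerically. The following is a minimal sketch with hypothetical function names, using the Euclidean norm on $\mathbb{R}^2$ (any equivalent norm would do, by 25.5.g).

```python
import math


def polar_map(x1, x2):
    """The map (x1, x2) -> (x1 cos x2, x1 sin x2) from the example."""
    return (x1 * math.cos(x2), x1 * math.sin(x2))


def jacobian(x1, x2):
    """Matrix of partial derivatives of polar_map at (x1, x2)."""
    return [
        [math.cos(x2), -x1 * math.sin(x2)],
        [math.sin(x2), x1 * math.cos(x2)],
    ]


def quotient(xi, x):
    """||f(x) - f(xi) - J(xi)(x - xi)|| / ||x - xi|| in the Euclidean norm."""
    h = (x[0] - xi[0], x[1] - xi[1])
    j = jacobian(xi[0], xi[1])
    y, y0 = polar_map(x[0], x[1]), polar_map(xi[0], xi[1])
    r0 = y[0] - y0[0] - (j[0][0] * h[0] + j[0][1] * h[1])
    r1 = y[1] - y0[1] - (j[1][0] * h[0] + j[1][1] * h[1])
    return math.hypot(r0, r1) / math.hypot(h[0], h[1])
```

Moving x ten times closer to $\xi$ makes the quotient roughly ten times smaller, consistent with a remainder that is quadratic in $x - \xi$.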
.j ' s is less than E / 2. This completes the proof.
.. Let us denote g(s, w) = 'i/;(s,w). By the Second Fundamental Theorem of Calculus 25. 18, we have f(b, w) - f(a,w) = I: g(t, w )dt. Hence L l f(b, w) - f(a, w) l x dJ-l(w) g(t , w)dt dM(w) < IIYII L ' (h ",X ) • x which is finite by hypothesis. It follows that f(s, ·) E L 1 (J-l, X), and / (s) exists for every s E R Then I(b) - I(a) = In [I: g(t,w)dt] dJ-l(w) . Let Y = U (J-l, X). By Fubini's Theorem ( 23. 17) , £ 1 (>. X) is isomorphic to £ 1 (>., Y ). 1 For each v E £ (>. J-l, X) let v be the corresponding member of £ 1 (.>.. , Y ) . The Bochner integrals B[a,b]u = I: u(t)dt and Bnv = In v(w )dw define continuous linear operators B[a,b] : £ 1 ([a , b], Y) Y and Bn £ 1 (J-l, X) X ; one of the conclusions of Fubini ' s Theorem 23. 1 7 is that BnB[a, bJV = In [ I: v(t, w )dt] df-l ( w). B y Lebesgue's Theorem on Differentiation ( 25. 16 ) applied t o members of U (>., Y ) , for s h almost every s E we have g( s) = limh * Is + g( t )dt; the limit and the integral are both in the Banach space Y. Fix any s for which that equation is valid. The equation can be restated g( s) = lim h ..... o * B[s,s+ h]g. Apply the operator Bn on both sides of that equation, keeping in mind that it is continuous and therefore preserves limits. We obtain Since
· · ·
0
X
�----�--�--�
X
X
io l l
x J-l,
x
-->
:
JR.
{ Jn
l
-->
..... o
1 1 lim - B[s,s+ hJ g = lim - BnB[s,s + h] g g(s, w)df-l(w) = Bng(s) = Bn h-->0 h h-->0 h s+ h 11 1 lim lim - [I(s + h ) - / (s)] . g(t,w)dt dJ-l ( = h-->0 w) = hn s h-->0 h
[1
l
This completes the proof.

25.22. Example: differentiation of an integral operator. Let X, $(\Omega, \mathcal{S}, \mu)$, f be as in 25.21, and in addition assume that $\mu(\Omega) < \infty$, $\Omega$ is a compact metric space, and $\frac{\partial f}{\partial s} : \mathbb{R} \times \Omega \to X$ is jointly continuous. Let $BC(\mathbb{R}, X)$ = {bounded, continuous functions from $\mathbb{R}$ to X}; this is a Banach space when equipped with the sup norm. Define

$$[F(\gamma)](t) = \int_\Omega f(\gamma(t), \omega)\,d\mu(\omega) \qquad\text{for } \gamma \in BC(\mathbb{R}, \mathbb{R}),\ t \in \mathbb{R}.$$

We shall show that $[F(\gamma)](\cdot)$ belongs to $BC(\mathbb{R}, X)$, and the mapping $F : BC(\mathbb{R}, \mathbb{R}) \to BC(\mathbb{R}, X)$ defined in this fashion is Fréchet differentiable. The derivative at any $\gamma \in BC(\mathbb{R}, \mathbb{R})$ is given by the continuous linear map $F'(\gamma) : BC(\mathbb{R}, \mathbb{R}) \to BC(\mathbb{R}, X)$ whose value at any $\psi \in BC(\mathbb{R}, \mathbb{R})$ is given by

$$[F'(\gamma)\psi](t) = \int_\Omega \frac{\partial f}{\partial s}(\gamma(t), \omega)\,\psi(t)\,d\mu(\omega).$$

Proof. Let $g = \frac{\partial f}{\partial s}$. Since $\mathrm{Range}(\gamma)$ is a bounded subset of $\mathbb{R}$, it is contained in a compact set K. Then $f, g : K \times \Omega \to X$ are continuous functions on a compact set, hence they are bounded and uniformly continuous there. For fixed t, the function $\omega \mapsto f(\gamma(t), \omega)$ is measurable and bounded, hence integrable on the finite measure space $\Omega$. For $\gamma(t) \in K$, the integrand is bounded; hence $[F(\gamma)](t)$ is a bounded function of t. That it is also a continuous function of t follows easily by Lebesgue's Dominated Convergence Theorem (22.29).

To show that $F'(\gamma)$ has the indicated value, replace K with a slightly larger compact subset of $\mathbb{R}$, so that $(\gamma(t) + \psi(t), \omega) \in K \times \Omega$ for all $\psi$ sufficiently small. Let any $\varepsilon > 0$ be given. Since g is uniformly continuous on $K \times \Omega$, there is some $\delta > 0$ such that

$$\|g(\gamma(t) + \alpha\psi(t), \omega) - g(\gamma(t), \omega)\|_X < \varepsilon \qquad\text{whenever } \|\psi\|_\infty < \delta,\ \alpha \in [0,1].$$

Let us denote $[E(\gamma)\psi](t) = \int_\Omega g(\gamma(t), \omega)\,\psi(t)\,d\mu(\omega)$. Observe that

$$[F(\gamma + \psi) - F(\gamma) - E(\gamma)\psi](t) = \int_\Omega \Big\{ f(\gamma(t) + \psi(t), \omega) - f(\gamma(t), \omega) - g(\gamma(t), \omega)\,\psi(t) \Big\}\,d\mu(\omega)$$
$$= \int_\Omega \Big\{ \int_0^1 g(\gamma(t) + \alpha\psi(t), \omega)\,\psi(t)\,d\alpha - g(\gamma(t), \omega)\,\psi(t) \Big\}\,d\mu(\omega)$$
Some Applications of the Second Fundamental Theorem of Calculus
and therefore $\|[F(\gamma + \psi) - F(\gamma) - E(\gamma)\psi](t)\|_X \le \varepsilon\,\|\psi\|_\infty\,\mu(\Omega)$. Finally, take the supremum over all choices of t; this yields $\|F(\gamma + \psi) - F(\gamma) - E(\gamma)\psi\|_\infty \le \varepsilon\,\|\psi\|_\infty\,\mu(\Omega)$. Since $\varepsilon$ is arbitrarily small, this proves $F'(\gamma) = E(\gamma)$.

25.23. Theorem relating continuous differentiability to strong differentiability.
Let X and Y be Banach spaces, let $\Omega \subseteq X$ be an open set, and let $f : \Omega \to Y$ be some differentiable function. Let $\xi \in \Omega$. Then f is strongly differentiable at $\xi$ (as defined in 25.10) if and only if $f'(\cdot) : \Omega \to BL(X,Y)$ is continuous at $\xi$.

Proof. First suppose f is strongly differentiable at $\xi$. Let $(x_n)$ be a sequence in $\Omega$ converging to $\xi$; we wish to show that $f'(x_n) \to f'(\xi)$. Since f is strongly differentiable at $\xi$, for any number $\varepsilon > 0$ there is some number $\delta > 0$ such that

$$\|u - \xi\|,\ \|v - \xi\| \le 2\delta \quad\Rightarrow\quad u, v \in \Omega \ \text{ and }\ \|f(u) - f(v) - f'(\xi)(u - v)\| \le \|u - v\|\,\varepsilon.$$

For n sufficiently large (say for $n \ge N_\varepsilon$) we have $\|x_n - \xi\| < \delta$; then

$$\|h\| \le \delta \quad\Rightarrow\quad \|f(x_n + h) - f(x_n) - f'(\xi)h\| \le \|h\|\,\varepsilon.$$

On the other hand, since f is differentiable at $x_n$, there is some $\delta_n > 0$ such that

$$\|h\| \le \delta_n \quad\Rightarrow\quad \|f(x_n + h) - f(x_n) - f'(x_n)h\| \le \|h\|\,\varepsilon.$$

Thus $\|h\| \le \min\{\delta, \delta_n\} \Rightarrow \|f'(\xi)h - f'(x_n)h\| \le 2\|h\|\,\varepsilon$. Therefore $n \ge N_\varepsilon \Rightarrow |||f'(\xi) - f'(x_n)||| \le 2\varepsilon$, so $f'(x_n) \to f'(\xi)$.

Conversely, suppose $f'(\cdot)$ is continuous at $\xi$. Temporarily fix any two points $x_0, x_1$ near $\xi$, and let $x_t = (1 - t)x_0 + t x_1$ for $0 \le t \le 1$. Then, applying the Chain Rule (25.6) and the Second Fundamental Theorem of Calculus (25.18),

$$f(x_1) - f(x_0) = \int_0^1 \Big[\frac{d}{dt} f(x_t)\Big]\,dt = \int_0^1 f'(x_t)(x_1 - x_0)\,dt,$$

and therefore

$$f(x_1) - f(x_0) - f'(\xi)(x_1 - x_0) = \int_0^1 \big[f'(x_t) - f'(\xi)\big](x_1 - x_0)\,dt.$$
When $x_0$ and $x_1$ are near $\xi$, then all the $x_t$'s are near $\xi$; hence $\|f'(x_t) - f'(\xi)\|$ stays small for all $t \in [0,1]$. This can be made precise with epsilons and deltas; we omit the details. It follows easily that $\|f(x_1) - f(x_0) - f'(\xi)(x_1 - x_0)\| / \|x_1 - x_0\| \to 0$ as $x_1, x_0 \to \xi$.

25.24. Theorem relating Lipschitzness to derivatives. Let $\Omega$ be a convex open subset of a Banach space X. Suppose that $f : \Omega \to Y$ is differentiable at every point of $\Omega$. (We do not require that the derivative of f be continuous.) Then $\langle f \rangle_{\mathrm{Lip}} = \|f'\|_\infty$. That is,

$$\sup_{x_1 \ne x_2} \frac{\|f(x_1) - f(x_2)\|_Y}{\|x_1 - x_2\|_X} = \sup_{x \in \Omega} \|f'(x)\|_{BL(X,Y)}$$

(with one side of this equation finite if and only if the other side is finite). Thus, f is Lipschitzian if and only if $f'$ is bounded. This generalizes a result in 18.3.b.
Proof. First suppose $\langle f \rangle_{\mathrm{Lip}} \le r$, and let $L = f'(\xi)$ and $h \in X \setminus \{0\}$; we wish to show $\|Lh\| \le r\|h\|$. Replacing h with ch for some small nonzero scalar c if necessary, we may assume $\xi + h \in \Omega$. Then also $\xi + th \in \Omega$ for all $t \in [0,1]$, by convexity. Let any $\varepsilon > 0$ be given. By the definition of derivative we have

$$\frac{\|f(\xi + th) - f(\xi) - tLh\|}{t\|h\|} < \varepsilon \qquad\text{for all } t > 0 \text{ sufficiently small.}$$

For those t, we have $\|f(\xi + th) - f(\xi) - tLh\| < \varepsilon t\|h\|$, and therefore $t\|Lh\| = \|tLh\| \le \|f(\xi + th) - f(\xi)\| + \varepsilon t\|h\| \le rt\|h\| + \varepsilon t\|h\|$. Divide by t to obtain $\|Lh\| \le (r + \varepsilon)\|h\|$; then let $\varepsilon \downarrow 0$.

Conversely, suppose $\|f'(\xi)\| < r$ for all $\xi \in \Omega$; we shall show $\langle f \rangle_{\mathrm{Lip}} \le r$. Let any $x_0, x_1 \in \Omega$ be given. Since $\Omega$ is convex, the points $x_t = t x_1 + (1 - t)x_0$ lie in $\Omega$ for all $t \in [0,1]$. By the Chain Rule (25.6) and the Second Fundamental Theorem of Calculus (25.18), we have

$$f(x_1) - f(x_0) = \int_0^1 \Big[\frac{d}{dt} f(x_t)\Big]\,dt = \int_0^1 f'(x_t)(x_1 - x_0)\,dt.$$

Therefore

$$\|f(x_1) - f(x_0)\| \le \int_0^1 r\,\|x_1 - x_0\|\,dt = r\,\|x_1 - x_0\|$$

by 24.16.b or 24.16.a.
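In the scalar case $X = Y = \mathbb{R}$ the theorem says that the best Lipschitz constant of f equals the supremum of $|f'|$. The sketch below illustrates this for $f = \sin$ on a grid of sample points. The helper names are hypothetical, and a finite grid can only approximate the two suprema from below.

```python
import math


def max_difference_quotient(func, points):
    """Largest value of |func(x1) - func(x2)| / |x1 - x2| over pairs of sample points."""
    best = 0.0
    for i, x1 in enumerate(points):
        for x2 in points[i + 1:]:
            best = max(best, abs(func(x1) - func(x2)) / abs(x1 - x2))
    return best


def max_abs_derivative(deriv, points):
    """Largest value of |deriv(x)| over the sample points."""
    return max(abs(deriv(x)) for x in points)
```

On the grid $-3, -2.95, \ldots, 3$ the sampled difference quotients of sin never exceed the sampled maximum of $|\cos|$, and they come within about $10^{-3}$ of it.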
25.25. Theorem characterizing convex functions on an interval. Let J s;; lR be an open interval, and let f : J (i) f is continuous,
---->
f'(t)
(ii) the derivative and
lR be some function. Then f is convex if and only if
exists except for at most countably many points
(iii) there exists some increasing function g : J ----> but at most countably many points t E J.
t
E J,
lR such that f'(t) = g(t) for all
Moreover, if f is convex, then both of the one-sided derivatives
f(s) - f(t) sTt S - t exist for all t E J, and either of these functions satisfies the requirements on g listed in (iii). Proof First assume f J lR is convex. Show that f(y l ) - f(x l ) < f(y l ) - f(x 2 ) < (!) Y1 - X1 Y1 - X2 whenever X 1 S:: x2 , Y 1 S:: Y2 , and Xj < YJ for j = 1 , 2. The function [f(y) - f(x2)]/(y - x2) is an increasing function of y for y > x 2 , and it is bounded below by [f(y l ) - f(x l ) JI ( Y 1 - xi). f(s) - f(t) slt S - t '
j + (t)
1.
1m
:
---->
r (t)
1.
1m
::____:__ _:___ _:.......:. ...:. .
683
Path Integrals and Analytic Functions (Optional)
Hence j+ ( x2 ) exists for every x2 E J, and therefore f is continuous from+the right at every + , to prove j (xi ) ::; j (x2 ) for XI < x2 ; and I 1 XI J. Take limits in (!) as E 1 x x2 Y2 2 Y thus j+ is an increasing function. Similarly, f - exists everywhere on J and is an increasing function, and f is continuous from the left. Combining these results, f is continuous on J. Since j + and f - are increasing functions, they are continuous except at at most countably many points (see 1 5 . 2 1 .c) . Take limits in (!) as X2 i Y2 and Y I 1 XI to prove r ( Y2 ) � j+ (x i ) when y2 > XI ; or take limits in (!) as Y2 1 x2 and XI i YI to prove j + (x 2 ) � f - (y i ) when + x2 > Y I · Thus, at any point t where f and j are both continuous, they must be equal, and there f' exists. Thus (i), (ii), (iii) are satisfied, with g j+ or g = r. Conversely, suppose (i), (ii), (iii) are satisfied. By the Second Fundamental Theorem of Calculus (25. 18), g is Henstock integrable on each closed interval [a, b] � J, with J: g(t)dt = f(b) - f(a). We have g(s) ::; g(b) ::; g(t) whenever a ::; s ::; b ::; t ::; c, hence =
\[ \frac{f(b) - f(a)}{b - a} = \frac{1}{b-a}\int_a^b g(s)\,ds \le \frac{1}{b-a}\int_a^b g(b)\,ds = g(b) = \frac{1}{c-b}\int_b^c g(b)\,dt \le \frac{1}{c-b}\int_b^c g(t)\,dt = \frac{f(c) - f(b)}{c - b} \]
when $a < b < c$. This inequality can be rearranged to yield
\[ f\!\left(\frac{c-b}{c-a}\,a + \frac{b-a}{c-a}\,c\right) = f(b) \le \frac{c-b}{c-a}\,f(a) + \frac{b-a}{c-a}\,f(c), \]
which proves $f$ is convex.

PATH INTEGRALS AND ANALYTIC FUNCTIONS (OPTIONAL)
25.26. Remark. This subchapter can be omitted. Its results will not be needed later in this book, except for a few brief examples. Definitions. By a path in C we shall mean a function
c in $\mathbb{F}$ and $x_n \to x$ in the metric space $(X, p)$, then $c_n x \to cx$ and $c x_n \to cx$ in $(X, p)$ - for in this context separate continuity implies joint continuity; that rather nontrivial fact follows (exercise) from 23.15.b and the fact that the scalar field $\mathbb{F}$ (always $\mathbb{R}$ or $\mathbb{C}$ in this book) is a complete metric space. An F-seminorm is a paranorm that is also balanced - i.e., satisfying
\[ x \in X,\quad c \in \mathbb{F},\quad |c| \le 1 \quad\Longrightarrow\quad p(cx) \le p(x). \]
If it is also positive-definite, then $p$ is called an F-norm, and $(X, p)$ is an F-normed space. We may refer to $X$ itself as the (F-)normed space, if no confusion will result. The definitions above are admittedly complicated, but their importance will become evident in 26.29. An F-space is a vector space topologized by an F-norm that is complete. (Equivalently, it is a complete metrizable topological vector space; we shall prove that equivalence in 26.29 and 26.32.) Caution: In the modern literature, an "F-normed space," an "F-space," a "Fréchet space," and a "topological space with a Fréchet topology" are four different things; see 26.14. Any seminorm is also an F-seminorm (easy exercise). Other examples of F-seminorms and paranorms will be given below.

Further remarks about terminology. This book's terminology conforms to the literature whenever possible, but it is not always possible; the literature varies greatly in its terminology for F-norms and related notions. Our definition of "F-norm" is equivalent to the definition used by Kalton, Peck, and Roberts [1984] and Köthe [1969]. Our definition of "paranorm" follows that of Wilansky [1978]; Swartz [1992] calls this object a quasinorm. If a paranorm is positive-definite, then Wilansky [1978] calls it a "total paranorm," Yosida [1964] calls it a "quasinorm," and Swartz [1992] calls it a "total quasinorm." Many other books and papers - including the classic work of Banach [1932/1987] - use positive-definite paranorms without attaching any name to them. A very extensive treatment of metric linear spaces is given by Rolewicz [1985].

26.3. Relations between G-seminorms, paranorms, and F-seminorms.

a. A function $p$ is an F-seminorm on a vector space $X$ if and only if $p$ is a balanced G-seminorm that satisfies this scalar continuity condition: For any $x \in X$, if $|c_n| \to 0$, then $p(c_n x) \to 0$. Hint: 12.25.f.

b. Any paranorm $p$ is equivalent to an F-seminorm $\sigma$. Hint (modified from Rolewicz [1985]): The set $\{t \in \mathbb{F} : |t| \le 1\}$ is compact; hence the number $\sigma(x) = \max\{p(tx) : |t| \le 1\}$ is finite.

c. Let $X$ be a linear space, and let $p : X \to [0, +\infty)$ be some mapping. Then the following are equivalent: (A) $p$ is a seminorm; (B) $p$ is a convex function and an F-seminorm; (C) $p$ is a convex, balanced G-seminorm. Hints: We shall prove (C) $\Rightarrow$ (A); the other implications are easy. Show that $p(sx) \le s\,p(x)$ for $x \in X$ - first for $s \in \mathbb{N}$, by the subadditivity of G-seminorms; then for $s \in (0, 1]$, by the assumed convexity of $p$; then for $s > 0$ by combining those two results. Then show $p(sx) \ge s\,p(x)$ for $s > 0$ by replacing $s$ with $1/s$. Then what?
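The construction in 26.3.b can be tried numerically. The following is only a rough sketch, not part of the original exercises: it assumes the scalar field is $\mathbb{R}$, borrows the non-balanced paranorm $|\sin(\pi x)| + \min\{2, |x|\}$ that appears in 26.4.c, and replaces the maximum over $|t| \le 1$ by a maximum over a finite grid. The function name `sup_over_unit_scalars` and the grid size are hypothetical choices made for illustration.

```python
import math

def p(x: float) -> float:
    """The paranorm of 26.4.c on R: |sin(pi x)| + min{2, |x|}."""
    return abs(math.sin(math.pi * x)) + min(2.0, abs(x))

def sup_over_unit_scalars(x: float, steps: int = 2000) -> float:
    """Grid approximation of max{ p(t x) : |t| <= 1 }, the construction in 26.3.b.

    The grid t = -1 + 2k/steps (k = 0..steps) contains t = 1, so the
    result is never smaller than p(x).
    """
    return max(p((-1.0 + 2.0 * k / steps) * x) for k in range(steps + 1))

# p is not balanced: halving x = 1 raises the value from about 1.0 to 1.5.
p_one, p_half = p(1.0), p(0.5)

# The maximised version does not increase under the same shrinking.
s_one, s_half = sup_over_unit_scalars(1.0), sup_over_unit_scalars(0.5)

print(f"p(1) = {p_one:.4f}, p(0.5) = {p_half:.4f}")
print(f"maximised version at 1: {s_one:.4f}, at 0.5: {s_half:.4f}")
```

The grid maximum only approximates the true maximum, so an experiment of this kind can suggest that the new function is balanced but cannot prove it.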
26.4. Basic examples.

a. Open and closed balls. Let $(X, d)$ be a metric space. As in 5.15.g, define the open ball and closed ball
\[ B_d(z, r) = \{x \in X : d(x, z) < r\}, \qquad K_d(z, r) = \{x \in X : d(x, z) \le r\}. \]
As we noted in 5.18.c, $\mathrm{cl}[B_d(z, r)] \subseteq K_d(z, r)$ in any metric space. Show that $\mathrm{cl}[B_d(z, r)] = K_d(z, r)$ in any normed space. That conclusion is false in some F-normed spaces; for instance, show that it is false in $\mathbb{R}$ equipped with the F-norm $p(x) = \min\{1, |x|\}$.

b. Pathological example. Consider $\mathbb{C}$ (the complex numbers) as a vector space with scalar field $\mathbb{R}$; then $p(z) = |\mathrm{Re}\,z| + |\mathrm{Im}\,z|$ is a norm. On the other hand, if we consider $\mathbb{C}$ as a vector space with scalar field $\mathbb{C}$, then $p$ is not a norm or an F-norm (since it is not balanced), but it is a paranorm on $\mathbb{C}$. Here we understand that the absolute value of a scalar is defined as usual: $|c| = \sqrt{(\mathrm{Re}\,c)^2 + (\mathrm{Im}\,c)^2}$.

c. Another pathological example. Using the identity $\sin(\alpha + \beta) = \sin\alpha\cos\beta + \sin\beta\cos\alpha$, show that the function $f(x) = |\sin(\pi x)|$ is subadditive on $\mathbb{R}$ - that is, $f(x + y) \le f(x) + f(y)$ for $x, y \in \mathbb{R}$. Then show that the function $p(x) = |\sin(\pi x)| + \min\{2, |x|\}$ is a paranorm on $\mathbb{R}$ that is equivalent to the usual norm on $\mathbb{R}$, but $p$ is not balanced.

d. For $0 < p < 1$, the functions $\|x\|_p^p = |x_1|^p + |x_2|^p + \cdots + |x_n|^p$ are F-norms on $\mathbb{R}^n$ or $\mathbb{C}^n$ that yield the product topology. This follows easily from 12.25.e. (Here $\|\ \|_p$ is defined as in 22.11.) Similarly, the functions $\|x\|_p^p = |x_1|^p + |x_2|^p + |x_3|^p + \cdots$ are F-norms on $\ell_p$, defined as in 22.25. We emphasize that the functions $\|\ \|_p$ are not necessarily norms when $p < 1$.
Chapter 26: Metrization of Groups and Vector Spaces
e. The sum of finitely many paranorms or (G-)(F-)(semi)norms is another object of the same type.

f. The pointwise maximum of finitely many paranorms or (G-)(F-)(semi)norms is another object of the same type.

g. The restriction of any paranorm or (F-)(semi)norm to a linear subspace is a paranorm or (F-)(semi)norm.

h. Bounded equivalents. Let $\beta$ be a bounded remetrization function; in 18.14 we saw that if $d$ is a pseudometric on a set $X$, then $\beta \circ d$ is a bounded, uniformly equivalent pseudometric. Show that if $p$ is a G-(semi)norm or an F-(semi)norm, then $\beta \circ p$ is a bounded, uniformly equivalent G-(semi)norm or F-(semi)norm. We cannot make an analogous assertion for seminorms, however. If $p$ is a seminorm, then $\beta \circ p$ is an equivalent F-seminorm, but in general it is not a seminorm. Indeed, the only bounded seminorm is the constant function 0.

i. If $X$ is a vector space, $p$ is an F-seminorm on $X$, $K$ is a linear subspace, and the quotient G-seminorm is defined as in 22.13, then that quotient G-seminorm is an F-seminorm too.

26.5. Change of scalar field. Let $X$ be a complex vector space. Then $X$ may also be viewed as a real vector space, if we "forget" how to multiply members of $X$ by members of $\mathbb{C} \setminus \mathbb{R}$. Show that

a. If $p$ is a paranorm, F-seminorm, or seminorm on the complex vector space $X$, then $p$ is a paranorm, F-seminorm, or seminorm (respectively) on the real vector space $X$.

b. If $p$ is a seminorm or F-seminorm on the real vector space $X$, then $r(x) = \sup\{p(tx) : t \in \mathbb{C},\ |t| \le 1\}$ defines a seminorm or F-seminorm $r$ on the complex vector space $X$, which is "semi-equivalent" to $p$ in this sense: $p(x) \le r(x) \le p(x) + p(ix)$ for all $x \in X$. (Hint: $p(tx) \le p(\mathrm{Re}(t)x) + p(\mathrm{Im}(t)ix)$.) Moreover, if $X$ is equipped with some topology and $p : X \to [0, +\infty)$ is lower semicontinuous, then $r : X \to [0, +\infty)$ is lower semicontinuous. (Hint: It is the sup of the lower semicontinuous functions $x \mapsto p(tx)$.) (This construction is based on Rolewicz [1985].)

26.6. Fréchet combinations.
Let $(p_j : j \in \mathbb{N})$ be a sequence of F-seminorms on a vector space $X$. Then
0 for each j.

b. $\varphi$ is a bounded F-seminorm on $X$. (Hint: Use (*) for an easy proof that $\varphi$ is scalar continuous.)

c. $\varphi$ is uniformly equivalent to the gauge $\{p_1, p_2, p_3, \ldots\}$. Thus, the topology that $\varphi$ determines on $X$ is the supremum of the topologies of the pseudometric spaces $(X, p_j)$, and the uniform structure of $(X, \varphi)$ is the supremum of the uniform structures of the $(X, p_j)$'s. (Hint: Again, use (*).)

d. If we replace the bounded remetrization function with some other bounded remetrization function and/or replace $(a_j)$ with some other sequence of positive numbers with finite sum, then condition (*) remains valid; hence the resulting Fréchet combination is equivalent to $\varphi$.

e. $\varphi$ is an F-norm if and only if the $p_j$'s separate points of $X$ - i.e., if and only if they have the further property that whenever $p_j(x) = 0$ for all $j$, then $x = 0$.

Remarks. In most applications of this formula, the F-seminorms $p_j$ are actually seminorms - i.e., they are homogeneous. In that case $X$ is locally convex; that will follow from 26.20.d. (Hence, if the metric determined by $\varphi$ is complete, then $X$ is a Fréchet space.) However, $\varphi$ cannot be a seminorm, since $\varphi$ is bounded. In many applications, $\varphi$ is not even equivalent to a seminorm; see the examples in 27.7.c and 27.8.

26.7. Example: the space of all sequences. Let $\mathbb{F}$ be the scalar field - that is, $\mathbb{R}$ or $\mathbb{C}$; then $\mathbb{F}^{\mathbb{N}} = \{\text{sequences of scalars}\}$. The product topology and product uniform structure on $\mathbb{F}^{\mathbb{N}}$ are given by any of the following F-norms:

[displayed formulas missing; the surviving fragment is a sum over $j = 1, 2, 3, \ldots$ involving $\min\{1, |x_j|\}$]
applied to any sequence $x = (x_1, x_2, x_3, \ldots)$. Indeed, these formulas are all special cases of 26.6, with $p_j(x) = |x_j|$. The resulting F-norms are complete; any one of them may be referred to as the usual F-norm on $\mathbb{F}^{\mathbb{N}}$. Although all the $p_j$'s are seminorms, none of the resulting F-norms is a norm or even equivalent to a norm - i.e., the product topology on $\mathbb{F}^{\mathbb{N}}$ cannot be given by a norm; that will be proved in 27.8.

26.8. Example: the space of all continuous functions. Let $C(\mathbb{R})$ be the set of all continuous functions from $\mathbb{R}$ into the scalar field $\mathbb{F}$ (which may be $\mathbb{R}$ or $\mathbb{C}$). For $f \in C(\mathbb{R})$, let

[displayed formula for $\varphi(f)$ missing]
Show that $\varphi$ is an F-norm on $C(\mathbb{R})$ that is complete and that gives the topology of uniform convergence on compact sets (introduced in 18.26). This F-norm (or any other F-norm equivalent to it) is the usual F-norm on $C(\mathbb{R})$. It is not equivalent to a norm; see 27.8.
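To make the behaviour described in 26.6 and 26.7 concrete, here is a small numerical sketch. It assumes one common choice of Fréchet combination on sequences, $\varphi(x) = \sum_j 2^{-j} \min\{1, |x_j|\}$, with weights $2^{-j}$ and remetrization function $\min\{1, \cdot\}$; these choices, the function name `frechet_combination`, and the restriction to finitely supported sequences are assumptions made for illustration, not formulas taken from the text.

```python
def frechet_combination(x):
    """phi(x) = sum over j >= 1 of 2^(-j) * min{1, |x_j|} for a finite list x.

    Entries beyond the end of the list are treated as zero.
    """
    return sum(2.0 ** (-j) * min(1.0, abs(xj)) for j, xj in enumerate(x, start=1))

x = [3.0, -0.5, 0.25]
y = [1.0, 2.0, -4.0]
x_plus_y = [a + b for a, b in zip(x, y)]
two_x = [2.0 * a for a in x]

phi_x = frechet_combination(x)
phi_y = frechet_combination(y)

# Bounded: the value stays below the sum of the weights, however large x is.
phi_huge = frechet_combination([1e9] * 40)

# Subadditive, but not homogeneous: doubling x does not double phi.
phi_sum = frechet_combination(x_plus_y)
phi_two_x = frechet_combination(two_x)

print(phi_x, phi_y, phi_sum, phi_two_x, phi_huge)
```

The failure of homogeneity seen here is the reason such a combination can be an F-norm without being a norm, even when every $p_j$ is a seminorm.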
26.9. Example: the space of locally integrable functions. Let $\mathbb{F}$ be the scalar field ($\mathbb{R}$ or $\mathbb{C}$). A function $f : \mathbb{R} \to \mathbb{F}$ is called locally integrable if its restriction to each compact interval $[a, b]$ is integrable. More generally, let $\Omega$ be an open subset of $\mathbb{R}^n$, equipped with Lebesgue measure; a function $f : \Omega \to \mathbb{F}$ is called locally integrable if its restriction to some open neighborhood of each point in $\Omega$ is integrable. The set of all (equivalence classes of) locally integrable functions on $\Omega$ is denoted by $L^1_{loc}(\Omega)$. Show that

a. $L^p(\Omega) \subseteq L^1_{loc}(\Omega)$ for every $p$ in $[1, +\infty]$. (Hint: Hölder's Inequality.)

b. If $f : \Omega \to \mathbb{F}$ is locally bounded (i.e., bounded on compact sets) and $f$ is measurable, then $f \in L^1_{loc}(\Omega)$. In particular, any continuous function from $\Omega$ to $\mathbb{F}$ is locally integrable. Thus, the function $f(t) = \exp(t^2)$ is locally integrable on $\mathbb{R}$, even though it does not belong to $L^p(\mathbb{R})$ for any $p \in [1, +\infty]$.

c. $L^1_{loc}(\Omega)$ can be made into a Fréchet space in a natural fashion, using the sequence of seminorms $p_n(f) = \int_{G_n} |f(t)|\,dt$, where the $G_n$'s form an open cover of $\Omega$ and each $G_n$ is contained in some compact subset of $\Omega$ (see 17.18.a). The resulting F-norm is
\[ p(f) = \sum_{n=1}^{\infty} 2^{-n} \min\left\{1, \int_{G_n} |f(t)|\,dt\right\}. \]
(In particular, $L^1_{loc}(\mathbb{R})$ can be made into a Fréchet space using the sequence of seminorms $p_n(f) = \int_{-n}^{n} |f(t)|\,dt$.)

Exercise. Different choices of the sequence $(G_n)$ of relatively compact sets may yield different F-norms $p$. Show that any two such F-norms are equivalent. (Hint: 17.18.b.) In fact, the topology can be described as follows: A sequence $(f_n)$ is $p$-convergent to a limit $f$ if and only if, for each open set $G$ that is contained in a compact subset of $\Omega$, we have $\lim_{n \to \infty} \int_G |f_n(t) - f(t)|\,dt = 0$.

Further exercise. Prove the completeness of $L^1_{loc}(\Omega)$.

26.10. Example: the space of holomorphic functions. Let $\Omega$ be an open subset of the complex plane, and let $\mathrm{Hol}(\Omega) = \{\text{holomorphic functions from } \Omega \text{ into } \mathbb{C}\}$ (defined as in 25.27). The usual topology for $\mathrm{Hol}(\Omega)$ is the topology of uniform convergence on compact subsets of $\Omega$, introduced in 18.26. That topology makes $\mathrm{Hol}(\Omega)$ a Fréchet space; it can be metrized as follows. Let $G_1, G_2, G_3, \ldots$ be an open cover of $\Omega$, where each $G_n$ is contained in some compact subset of $\Omega$. (See 17.18.a.) Define
\[ p_n(f) = \sup_{w \in G_n} |f(w)| \qquad (n = 1, 2, 3, \ldots). \]
Then each $p_n$ is a seminorm on $\mathrm{Hol}(\Omega)$, and the seminorms $p_1, p_2, p_3, \ldots$ determine the topology of uniform convergence on compact sets. The particular choice of the sequence $(G_n)$ does not matter - see 17.18.b. A further property of $\mathrm{Hol}(\Omega)$ is noted in 27.10.c.
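A quick numerical illustration of 26.10 may help. The sketch below makes several assumptions that are not in the text: $\Omega$ is the open unit disc, $G_n$ is the open disc of radius $1 - 1/(n+1)$, and $f(z) = 1/(1 - z)$. By the maximum modulus principle the supremum of $|f|$ over $G_n$ equals the maximum of $|f|$ on the bounding circle, so the hypothetical helper `seminorm` samples only that circle.

```python
import cmath
import math

def seminorm(f, radius: float, samples: int = 720) -> float:
    """Approximate sup{|f(w)| : |w| < radius} by sampling the circle |w| = radius."""
    return max(
        abs(f(radius * cmath.exp(2j * math.pi * k / samples)))
        for k in range(samples)
    )

def f(z: complex) -> complex:
    """A function holomorphic on the open unit disc but unbounded there."""
    return 1.0 / (1.0 - z)

# Seminorm values on the discs G_n of radius 1 - 1/(n + 1), n = 1..5.
values = [seminorm(f, 1.0 - 1.0 / (n + 1)) for n in range(1, 6)]
print(values)
```

Each value is finite, but the values grow without bound as $n$ increases. This is why the topology is described by the whole sequence of seminorms rather than by a single supremum over $\Omega$.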
26.11. Example: convergence in measure. Let $(\Omega, \mathcal{S}, \mu)$ be a measure space, and let $(X, |\ |)$ be a Banach space. Then the pseudometric $D_\mu$ defined in 21.34 on $SM(\mathcal{S}, X)$ takes a slightly simpler form: It can be rewritten $D_\mu(f, g) = \rho_\mu(f - g)$, where
\[ \rho_\mu(h) = \inf_{\alpha > 0} \arctan\left[\alpha + \mu\{\omega \in \Omega : |h(\omega)| > \alpha\}\right]. \]
Some basic properties:

a. The function $\rho_\mu$ is a G-seminorm on $SM(\mathcal{S}, X)$ or a G-norm on the quotient space $SM(\mathcal{S}, X)/\mu$, making those vector spaces into topological Abelian groups.

b. In general, $SM(\mathcal{S}, X)$ is not a topological vector space. That is shown by the example below. However, in 26.12.c we shall consider a smaller subspace on which $\rho_\mu$ is indeed an F-seminorm.

Example. Let $(\Omega, \mathcal{S}, \mu) = (\mathbb{N}, \mathcal{P}(\mathbb{N}), \text{counting measure})$, and let $X = \mathbb{R}$. Define $f_n(j) = f(j) = j$ for all $n, j \in \mathbb{N}$, and let $c_n = 1/n$ and $c = 0$. Then $f_n \to f$ in measure and $c_n \to c$, but $c_n f_n \not\to cf$ in measure. Thus, multiplication of scalars is not jointly continuous for this topology.

26.12. (Optional.) Let $(\Omega, \mathcal{S}, \mu)$ be a measure space, and let $(X, |\ |)$ be a Banach space. A function $f : \Omega \to X$ is totally measurable if it is a strongly measurable function (as defined in 21.4) that also satisfies: $\mu(\{\omega \in \Omega : |f(\omega)| > \varepsilon\})$ is finite for each $\varepsilon > 0$. (Of course, if $\mu(\Omega) < \infty$, then every strongly measurable function is totally measurable.) Let $TM(\mu; X)$ denote the set of all $\mu$-equivalence classes of totally measurable functions.
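Returning to the example in 26.11, the failure of joint continuity can be checked by direct computation. The sketch below is illustrative only. It takes the formula $\rho_\mu(h) = \inf_{\alpha>0} \arctan[\alpha + \mu\{|h| > \alpha\}]$ as read from 26.11, approximates the infimum over a finite grid of values of $\alpha$, and describes each function $h$ on $\mathbb{N}$ only through a callable giving the counting measure of $\{|h| > \alpha\}$. The helper names are hypothetical.

```python
import math

def rho(count_above, alphas) -> float:
    """Grid approximation of inf over alpha > 0 of arctan(alpha + mu{|h| > alpha}).

    count_above(alpha) must return the counting measure of {j : |h(j)| > alpha},
    which may be math.inf.
    """
    return min(math.atan(a + count_above(a)) for a in alphas)

alphas = [10.0 ** e for e in range(-6, 7)]

def count_above_for_scaled_identity(n: int):
    """h(j) = j/n on N: the set {j : j/n > alpha} is infinite for every alpha."""
    return lambda alpha: math.inf

def count_above_for_zero(alpha: float) -> float:
    """h = 0: the set {j : 0 > alpha} is empty."""
    return 0.0

# c_n f_n - c f is the function j -> j/n; its rho-value never shrinks.
rho_products = [rho(count_above_for_scaled_identity(n), alphas) for n in (1, 10, 1000)]

# f_n - f is identically zero, so f_n -> f in measure trivially.
rho_difference = rho(count_above_for_zero, alphas)

print(rho_products, rho_difference)
```

Every entry of `rho_products` equals $\pi/2$, the largest value $\rho_\mu$ can take, while $|c_n - c| \to 0$ and the value for $f_n - f$ is as small as the grid allows.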
Exercises.

a. $TM(\mu; X)$ is a closed linear subspace of the G-normed space $(SM(\mu, X), \rho_\mu)$ (which is complete).

b. The finitely valued members of $TM(\mu; X)$ are dense in $TM(\mu; X)$.

c. The G-norm $\rho_\mu$ of 26.11 is an F-norm when restricted to $TM(\mu; X)$. The space $TM(\mu; X)$, equipped with this F-norm or any other equivalent F-norm, is sometimes denoted $L^0(\mu; X)$ (especially when $\mu(\Omega) < \infty$).

d. When $\mu(\Omega) < \infty$, then $\rho_\mu$ is also equivalent to this F-norm:
\[ \Gamma(f) = \int_\Omega \gamma\big[|f(\omega)|\big]\,d\mu(\omega), \]
where $\gamma$ is any bounded remetrization function (defined as in 18.14). Hint: To prove $\Gamma$ is scalar-continuous (as in 26.3.a), use the Dominated Convergence Theorem 22.29.

e. For any $p \in (0, \infty)$, the vector space $L^p(\mu; X)$ is a linear subspace of the vector space $TM(\mu; X)$, and the F-seminorm $\|\ \|_p^{\min\{1, p\}}$ is stronger than $\rho_\mu$ on $L^p(\mu; X)$.

f. Dominated Convergence Theorem for TM. Let $(f_j)$ be a sequence in $TM(\mu; X)$ that converges pointwise $\mu$-almost everywhere to a limit $f$. Assume that the sequence
$(f_j)$ is dominated by a totally measurable function - i.e., assume $g(\omega) = \sup_j |f_j(\omega)|$ is totally measurable. Then $f_j \to f$ $\mu$-almost uniformly (hence also $f_j \to f$ in measure).

Hints: Let any $\varepsilon > 0$ be given. For each positive integer $n$, let $\Omega_n = \{\omega \in \Omega : g(\omega) > 1/n\}$; then $\mu(\Omega_n)$ is finite. Hence $f_j \to f$ $\mu$-almost uniformly on $\Omega_n$, by Egorov's Theorem. Thus there exists some set $A_n \subseteq \Omega_n$ such that $\mu(A_n) < 2^{-n}\varepsilon$, and $f_j \to f$ uniformly on $\Omega_n \setminus A_n$. Now let $A = \bigcup_{n=1}^{\infty} A_n$. Then $\mu(A) < \varepsilon$, and we shall show that $f_j \to f$ uniformly on $\Omega \setminus A$. To see this, let any $\delta > 0$ be given; we must show that for all $j$ sufficiently large, we have $\sup_\omega |f_j(\omega) - f(\omega)| < \delta$, the supremum being taken over $\omega \in \Omega \setminus A$. Choose some integer $n > 2/\delta$. Then $\Omega \setminus A = \complement A$ can be partitioned into the sets $(\complement A) \cap \Omega_n$ and $(\complement A) \cap (\complement \Omega_n)$. We have uniform convergence on $\Omega_n \setminus A_n$, hence on the smaller set $\Omega_n \setminus A = (\complement A) \cap \Omega_n$. The remaining piece, $(\complement A) \cap (\complement \Omega_n)$, is a subset of $\complement \Omega_n$, and at every $\omega$ in $\complement \Omega_n$ we have $\sup_j |f_j(\omega) - f(\omega)| \le 2g(\omega) \le 2/n < \delta$.

For a slightly more general treatment, see Dunford and Schwartz [1957], which permits $\mu$ to be a charge, not necessarily a measure.

26.13. (Optional.) (We omit the proofs of the results below; they constitute exercises that are difficult but may be within the reach of some particularly ambitious readers.) By an Orlicz function we shall mean an increasing, continuous function $\varphi : [0, +\infty] \to [0, +\infty]$ that satisfies $\varphi^{-1}(0) = \{0\}$. (Caution: The terms "Orlicz function" and "Orlicz space" have slightly different meanings in different books and papers.) A few examples of Orlicz functions are
\[ t^p \ (\text{for constant } p > 0), \qquad t^p \ln(1 + t), \qquad \min\{1, t\}. \]
As the last example shows, we permit an Orlicz function to be bounded. Let $\varphi$ be an Orlicz function, let $(\Omega, \mathcal{S}, \mu)$ be a measure space, and let $(X, |\ |)$ be a Banach space. For strongly measurable functions $f : \Omega \to X$, define
\[ \rho_\varphi(f) = \inf\left\{ r \in (0, +\infty] : \int_\Omega \varphi\!\left(\frac{|f(\omega)|}{r}\right) d\mu(\omega) < r \right\}. \]
Show that

a. The set $\mathcal{L}^\varphi(\mu; X) = \{f \in SM(\mathcal{S}; X) : \rho_\varphi(f) < \infty\}$ is a linear space, on which $\rho_\varphi$ is a complete F-seminorm. If we take the quotient with respect to $\mu$-equivalence classes of functions, we obtain an F-space $L^\varphi(\mu; X)$.

b. When $\varphi(t) = t^p$ for some number $p \in (0, +\infty)$, then $L^p(\mu; X)$ (defined as in 22.28) is equal to $L^\varphi(\mu; X)$ and $\rho_\varphi$ is equivalent to $\|\ \|_p$.

c. The space $TM(\mu; X)$, defined in 26.12, is equal to the union of all the spaces $L^\varphi(\mu; X)$, as $\varphi$ varies over all Orlicz functions (defined as above).

For a different treatment and references, see Rao and Ren [1991].
TAG's AND TVS's
26.14. Definitions. Let $X$ be an Abelian (i.e., commutative) group, with group operation $+$ and identity element 0. Let $\mathcal{T}$ be a topology on the set $X$. We say $(X, \mathcal{T})$ (or more simply, $X$) is a topological Abelian group - hereafter abbreviated TAG - if the group operations are continuous - i.e., if $v \mapsto -v$ is continuous from $X$ into $X$, and $(u, v) \mapsto u + v$ is jointly continuous from $X \times X$ into $X$.

Along with the theory of TAG's, we shall also develop the slightly more specialized theory of TVS's. Let $X$ be a vector space over the scalar field $\mathbb{F}$, and let $\mathcal{T}$ be a topology on the set $X$. We say $(X, \mathcal{T})$ is a topological vector space (or topological linear space) - hereafter abbreviated TVS - if the vector operations are jointly continuous; i.e., if $(c, v) \mapsto cv$ (from $\mathbb{F} \times X$ into $X$) and $(u, v) \mapsto u + v$ (from $X \times X$ into $X$) are both jointly continuous. Of course, every TVS is also a TAG.

We shall specialize further: A locally convex space - hereafter abbreviated LCS - is a topological vector space with the further property that 0 has a neighborhood basis consisting of convex sets. Finally, a Fréchet space is an F-space that is also locally convex. (This should not be confused with a very different meaning given for "Fréchet space" in 16.7.)

Remarks. Clearly, any Banach space is also a Fréchet space. Some other examples are noted in 26.20.e. It is immediate from 22.7 that any G-seminormed group (when equipped with the pseudometric topology) is a TAG. Similarly, any F-seminormed vector space is a TVS, and any seminormed vector space is an LCS. We shall see in 26.29 that TAG's, TVS's, and LCS's are not much more general than this.

In our study of TVS's in this and later chapters we shall distinguish between those theorems (such as 27.6) that require local convexity and those theorems (such as 27.26) that do not. However, this distinction is made chiefly for theoretical and pedagogical reasons - i.e., to make the basic concepts easier for the beginner. Although we do give a few examples of non-locally-convex TVS's in 26.16 and 26.17, we remark that most TVS's used in applications are in fact locally convex. Thus, it would be feasible to skip TVS's altogether and simply study LCS's, equipping some theorems with hypotheses that are slightly stronger than necessary; that approach is followed by some introductory textbooks on functional analysis.

Caution: Since most TVS's used in applications have Hausdorff topologies, some mathematicians incorporate the $T_2$ condition into their definition of TVS or LCS. In the present book, however, a topological space will be assumed Hausdorff only if that assumption is stated explicitly.
26.15. Degenerate (but important) examples.
a. The discrete topology. The topology consisting of all subsets of an Abelian group $X$ is a TAG topology. However, if $X$ is a vector space (other than the degenerate space $\{0\}$), then the discrete topology on $X$ does not make it a TVS, because (exercise) multiplication of scalars times vectors is not jointly continuous. In fact, for fixed $x \ne 0$, the mapping $c \mapsto cx$ is not continuous at $c = 0$.

b. The indiscrete topology. The topology $\{\emptyset, X\}$ makes any Abelian group $X$ into a TAG and any vector space $X$ into an LCS. Of course, it is not Hausdorff (unless $X = \{0\}$).

26.16. Example. For $0 \le p < 1$, the F-spaces $L^p[0, 1]$ (defined in 26.12.c and 22.28, with $\mu$ equal to Lebesgue measure on $[0, 1]$) are not locally convex. In fact, $L^p[0, 1]$ has no open convex subsets other than $\emptyset$ and the entire space, and the space $L^p[0, 1]^* = \{\text{continuous linear functionals on } L^p[0, 1]\}$ is just $\{0\}$.

Proof. The F-space $L^p[0, 1]$ is topologized by the F-norm $\rho(f) = \int_0^1 \gamma(|f(t)|)\,dt$, where $\gamma(s) = s^p$ in the cases of $0 < p < 1$, and $\gamma$ is any bounded remetrization function in the case of $p = 0$ (see 26.12.d). In particular, for $p = 0$, we may take $\gamma(s) = s/(1 + s)$; thus $\gamma(s) \le 1$ for all $s$ in that case.

Suppose $V$ is a nonempty open convex subset of $L^p[0, 1]$. By translation, we may assume $0 \in V$. Since $V$ is a neighborhood of 0, we have $V \supseteq \{f : \rho(f) < r\}$ for some number $r > 0$. Let $g$ be any element of $L^p[0, 1]$; we shall show that $g \in V$. Choose some integer $n$ large enough so that $\rho(g) < r n^{1-p}$. Since the function $t \mapsto \int_0^t \gamma(|g(s)|)\,ds$ is continuous, we can choose a partition $0 = t_0 < t_1 < t_2 < \cdots < t_n = 1$ such that $\int_{t_{j-1}}^{t_j} \gamma(|g(s)|)\,ds = \frac{1}{n}\rho(g)$ for all $j$. Let $1_{(t_{j-1}, t_j]}$ be the characteristic function of the interval $(t_{j-1}, t_j]$, and let $g_j = n\,1_{(t_{j-1}, t_j]}\,g$. An easy computation shows that
$\rho(g_j) < r$, and thus $g_j \in V$. Since $g = \frac{1}{n}(g_1 + g_2 + \cdots + g_n)$ and $V$ is convex, $g \in V$ also.

If $\Lambda$ is a continuous linear functional on $L^p[0, 1]$, then $\Lambda^{-1}(\{c : |c| < 1\})$ is an open convex set containing 0. Hence it is all of $L^p[0, 1]$; hence $\Lambda = 0$.

26.17. Example. If $0 < p < 1$, then the sequence space $\ell_p$ is not locally convex. In particular, $\{x : \|x\|_p < 1\}$ does not contain a convex neighborhood of 0. However, the set $(\ell_p)^* = \{\text{continuous linear functionals on } \ell_p\}$ is equal to $\ell_\infty$; this space is large enough to separate the points of $\ell_p$.

Hints: For the first assertion, suppose $\{x : \|x\|_p < 1\}$ contains some convex neighborhood of 0, which we label $V$. Show that $V \supseteq \{x : \|x\|_p \le s\}$ for some $s > 0$. Let $e_j$ be the sequence that has a 1 in the $j$th place and 0s elsewhere. Then $s e_j \in V$. By convexity, $v_n = \frac{1}{n}(s e_1 + s e_2 + \cdots + s e_n) \in V$ for any positive integer $n$. However, show that $\|v_n\|_p > 1$ for $n$ sufficiently large.

Any $y \in \ell_\infty$ acts as a continuous linear functional on $\ell_p$, by the action $\langle x, y \rangle = \sum_{j=1}^{\infty} x_j y_j$; in fact, we have $\sum_j |x_j y_j| \le \|x\|_1 \|y\|_\infty \le \|x\|_p \|y\|_\infty$. Conversely, if $\varphi \in (\ell_p)^*$,
let $e_j$ be the sequence with 1 in the $j$th place and 0 elsewhere. Define a sequence $y = (y_j)$ by taking $y_j = \varphi(e_j)$. Then $|y_j| \le |||\varphi|||\,\|e_j\|_p = |||\varphi|||$; thus $y$ is bounded. The functionals $\langle \cdot, y \rangle$ and $\varphi$ are continuous on $\ell_p$, and they act the same on sequences with only finitely many nonzero terms. Such sequences are dense in $\ell_p$, so $\langle \cdot, y \rangle = \varphi$ on $\ell_p$.

26.18. Net characterizations of TAG's and TVS's. Let $X$ be an Abelian group, equipped with some topology. Then $X$ is a TAG if and only if its topology satisfies these two conditions:

(1) Whenever $((x_\alpha, y_\alpha))$ is a net in $X \times X$ satisfying $x_\alpha \to x$ and $y_\alpha \to y$, then $x_\alpha + y_\alpha \to x + y$.
(2) Whenever $(x_\alpha)$ is a net in $X$ satisfying $x_\alpha \to x$, then $-x_\alpha \to -x$.

More specifically, let $X$ be a vector space, equipped with some topology. Then $X$ is a TVS if and only if its topology satisfies conditions (1) and

(2') Whenever $((c_\alpha, x_\alpha))$ is a net in $\mathbb{F} \times X$ satisfying $c_\alpha \to c$ and $x_\alpha \to x$, then $c_\alpha x_\alpha \to cx$.

26.19. Initial object constructions of TAG's, TVS's, and LCS's. Let $X$ be a set, let $\{(Y_\lambda, \mathcal{T}_\lambda) : \lambda \in \Lambda\}$ be a collection of topological spaces, let $\varphi_\lambda : X \to Y_\lambda$ be some mappings, and let $\mathcal{S}$ be the initial topology determined on $X$ by the $\varphi_\lambda$'s and $\mathcal{T}_\lambda$'s - i.e., the weakest topology on $X$ that makes all the $\varphi_\lambda$'s continuous (see 9.15). Show that

(i) If $X$ is a group, the $(Y_\lambda, \mathcal{T}_\lambda)$'s are TAG's, and the $\varphi_\lambda$'s are additive maps, then $(X, \mathcal{S})$ is a TAG.
(ii) If $X$ is a vector space, the $(Y_\lambda, \mathcal{T}_\lambda)$'s are TVS's, and the $\varphi_\lambda$'s are linear maps, then $(X, \mathcal{S})$ is a TVS.
(iii) If $X$ is a vector space, the $(Y_\lambda, \mathcal{T}_\lambda)$'s are LCS's, and the $\varphi_\lambda$'s are linear maps, then $(X, \mathcal{S})$ is an LCS.

Hints: 15.14(A), 15.24, and 26.18.
26.20. Some important special cases of initial objects.

a. The product of any collection of TAG's or TVS's or LCS's, with the product topology and product algebraic structure, is a TAG or TVS or LCS.

b. Subspace topologies are initial topologies determined by inclusion maps (see 5.15.e and 9.20). Thus, any subgroup of a TAG is also a TAG; and a linear subspace of a TVS or LCS is another TVS or LCS.

c. The supremum, or least upper bound, of a collection of topologies is the weakest topology that includes all the given topologies (see 5.23.c); it is the initial topology given by identity maps. Thus, the sup of a collection of TAG or TVS or LCS topologies is another TAG or TVS or LCS topology.

d. Let $D$ be a collection of G-seminorms on an Abelian group $X$, or a collection of F-seminorms or seminorms on a vector space $X$. Then the gauge topology determined on
$X$ by $D$ is a TAG, TVS, or LCS topology, respectively. (Hints: As we noted in 5.23.c, the topology determined by a gauge is the supremum of the individual pseudometric topologies. It follows easily from 15.25.c and 26.18 that any sup of TAG or TVS topologies is a TAG or TVS topology.) (A converse to this result will be given in 26.29.)

e. Let $\varphi$ be a Fréchet combination of $p_j$'s on $X$ (as in 26.6), and suppose that each $p_j$ is actually a seminorm (i.e., it is homogeneous). Then the F-normed space $(X, \varphi)$ is locally convex. If it is complete, then it is a Fréchet space. These conditions are satisfied by the examples in the next few sections after 26.6.

26.21. Change of scalar field. Let $X$ be a complex vector space. Then $X$ may also be viewed as a real vector space (if we "forget" how to multiply members of $X$ by members of $\mathbb{C} \setminus \mathbb{R}$). Let $\mathcal{T}$ be some topology on the set $X$. It is easy to show that if the complex vector space $X$ is a TVS, then the real vector space $X$ is also a TVS. The converse of that implication is false, however, as we now show:

A pathological example. Let $X = \{\text{bounded functions from } [1, +\infty) \text{ into } \mathbb{C}\}$. That is a complex vector space, with vector addition and scalar multiplication both defined pointwise on $[1, +\infty)$. For $f \in X$, define
\[ \|f\| = \sup_{1 \le t < \infty} \left\{ |\mathrm{Re}\,f(t)| + \frac{1}{t}\,|\mathrm{Im}\,f(t)| \right\}. \]
Verify that $(X, \|\ \|)$ is a Banach space, when we use the real numbers for the scalar field. However, let $f_n$ be the characteristic function of the interval $[n, n + 1]$. Verify that $\|f_n\| = 1$ while $\|i f_n\| = \frac{1}{n}$. Conclude that the topology of the Banach space $(X, \|\ \|)$ does not make scalar multiplication jointly continuous from $\mathbb{C} \times X$ into $X$; hence (i) that topology does not make $X$ into a complex topological vector space, and (ii) $\|\ \|$ is not a norm on the complex vector space.
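The two norm values asked for in 26.21 can be checked numerically. The sketch below is an illustration under stated assumptions rather than a proof: it takes the norm to be $\sup_{t \ge 1}\{|\mathrm{Re}\,f(t)| + \frac{1}{t}|\mathrm{Im}\,f(t)|\}$ as read from the example, and it replaces the supremum over $[1, +\infty)$ by a maximum over a finite grid on $[1, 50]$, which is adequate only because the test functions vanish outside a short interval. The names `pathological_norm` and `indicator` are hypothetical.

```python
def pathological_norm(f, t_max: int = 50, per_unit: int = 100) -> float:
    """Grid approximation of sup over t >= 1 of |Re f(t)| + |Im f(t)| / t.

    Grid points are t = (per_unit + k) / per_unit, so every integer in
    [1, t_max] is hit exactly.
    """
    best = 0.0
    for k in range((t_max - 1) * per_unit + 1):
        t = (per_unit + k) / per_unit
        z = complex(f(t))
        best = max(best, abs(z.real) + abs(z.imag) / t)
    return best

def indicator(n: int):
    """Characteristic function of the interval [n, n + 1]."""
    return lambda t: 1.0 if n <= t <= n + 1 else 0.0

# Map n -> (norm of f_n, norm of i * f_n).
results = {}
for n in (1, 5, 20):
    f_n = indicator(n)
    i_f_n = lambda t, f=f_n: 1j * f(t)
    results[n] = (pathological_norm(f_n), pathological_norm(i_f_n))

print(results)
```

The norms of $i f_n$ tend to 0 while the norms of $f_n = (-i)(i f_n)$ stay at 1, so multiplication by the fixed complex scalar $-i$ cannot be continuous for this topology.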
ARITHMETIC IN TAG's AND TVS's

26.22. Arithmetic in TAG's. Let $X$ be a TAG, and let $S$, $T$ be nonempty subsets of $X$. Show that

a. If $S$ is symmetric (i.e., if $-S = S$), then $\mathrm{cl}(S)$ and $\mathrm{int}(S)$ are symmetric.

b. If $S$ is open, then $S + T$ is open regardless of what $T$ is.

c. If $S$ and $T$ are closed and $S$ is compact, then $S + T$ is closed.

d. int(S) + int(T)
$c_j$'s small enough so that $p(c_j x_j) < 2^{-j}$, then $s$ does indeed exist, by the completeness of $p$. Let "conv" denote convex hull; we shall choose the $c_j$'s so that $s \notin \mathrm{conv}(K)$. For each positive integer $n$, since $c_n$ is positive, the vector $s_n$ does not lie in the linear span of $x_0, x_1, x_2, \ldots, x_{n-1}$. Hence it does not lie in their convex hull, which is a compact set. Thus $r_n = \mathrm{dist}(s_n, \mathrm{conv}\{x_0, x_1, x_2, \ldots, x_{n-1}\})$ is a positive number. Note that the definition of $r_n$ depends only on the choices of $c_1, c_2, \ldots, c_{n-1}, c_n$. Thus we may choose the $c_n$'s and $r_n$'s recursively: choose each $c_{n+1}$ small enough to satisfy

[displayed formula missing]

It follows that $p(s - s_n) = p(c_{n+1} x_{n+1} + c_{n+2} x_{n+2} + \cdots)$ [remainder of displayed formula missing]. Hence $\mathrm{dist}(s, \mathrm{conv}\{x_0, x_1, x_2, \ldots, x_{n-1}\}) > 0$. By 12.5.d, we have conv(K) =
X be any selection of $\pi^{-1}$ - that is, let $a$ be any function that satisfies $a(q) \in \pi^{-1}(q)$ for all $q \in X/\mathrm{cl}(\{0\})$. Then $a$ is continuous. Proof. Let $(q_\alpha)$ be a net converging to a limit $q$ in $X/\mathrm{cl}(\{0\})$. Show that $p(a(q_\alpha) - a(q)) = p(q_\alpha - q) \to 0$.

e. The quotient map $\pi : X \to Q$ is open, closed, and continuous.
UNIFORM STRUCTURE OF TAG's
26.36. Preliminary lemmas. Let $X$ and $Y$ be TAG's. By 26.29, the topologies of $X$ and $Y$ can be determined (not necessarily in a unique fashion) by gauges $D$ and $E$ consisting of G-seminorms - i.e., consisting of translation-invariant pseudometrics. Fix any such gauges $D$ and $E$, and let $X$ and $Y$ be equipped with the uniform structure determined by those gauges. Then:

a. Let $((x_\alpha, x'_\alpha) : \alpha \in A)$ be a net in $X \times X$. Then $D(x_\alpha, x'_\alpha) \to 0$ in $X$ in the sense of 18.7 if and only if $x_\alpha - x'_\alpha \to 0$ in $X$.

b. A function $f : X \to Y$ is uniformly continuous if and only if it has this property: Whenever $((x_\alpha, x'_\alpha) : \alpha \in A)$ is a net in $X \times X$ that satisfies $x_\alpha - x'_\alpha \to 0$ in $X$, then also $f(x_\alpha) - f(x'_\alpha) \to 0$ in $Y$; or, equivalently, this property: For each neighborhood $H$ of 0 in $Y$, there is some neighborhood $G$ of 0 in $X$ such that $x - x' \in G \Rightarrow f(x) - f(x') \in H$.

c. Any additive continuous map $f : X \to Y$ is uniformly continuous.

26.37. Discussion: uniqueness of the uniformity. Let $(X, \mathcal{T})$ be a topological Abelian group (TAG). As we have noted above, the topology $\mathcal{T}$ can be determined by some gauge consisting of G-seminorms. Such a collection also determines a uniformity on $X$. The gauge is not necessarily unique, but we can now see that the uniformity is unique; any two such gauges must determine the same uniformity. (Proof. Apply 26.36.c to the identity map.) That unique uniformity will be called the usual uniformity for the topological group. It will always be understood to be in use whenever a topological Abelian group is viewed as a uniform space (unless some other arrangement is specified). It will also be in use for special kinds of TAG's - e.g., for TVS's and LCS's. Note that
on an Abelian group, a TAG topology and its associated usual uniformity determine each other uniquely. Consequently, we may refer to them interchangeably in discussions.
For instance, we might say something like "the set S is a totally bounded subset of X, when X is equipped with the topology of uniform convergence on members of S." Here we are really referring to the uniformity, not the topology, of uniform convergence on members of
S, but in certain parts of the literature it seems to be customary to call this a "topology." No harm is done by this abuse of terminology, since the topology and uniformity determine each other uniquely.

26.38. Remarks: nonuniqueness of the topology corresponding to the group structure. The result developed above, on the uniqueness of the usual uniformity for a TAG, must be read carefully. It does not say that there is only one uniformity compatible with the given topology, nor that there is only one translation-invariant uniformity. Even if we restrict our attention to the topologies and uniformities given by translation-invariant gauges, an Abelian group $X$ may be made into a TAG in more than one way - i.e., there may be several different pairs

$$(\mathcal{T}_1, \mathcal{U}_1),\ (\mathcal{T}_2, \mathcal{U}_2),\ (\mathcal{T}_3, \mathcal{U}_3),\ \ldots$$

consisting of a topology $\mathcal{T}_j$ that makes $X$ into a TAG and the associated uniformity $\mathcal{U}_j$. We illustrate this by mentioning three different uniformities on $\mathbb{R}$.

• The translation-invariant metric $d(x, y) = |x - y|$ yields a translation-invariant (and complete) uniformity on $\mathbb{R}$ and the usual topology.

• The metric $d(x, y) = |\arctan(x) - \arctan(y)|$ is not translation-invariant. It yields the usual (translation-invariant) topology on $\mathbb{R}$, but it yields a uniformity that is not translation-invariant or complete. (With this uniformity, the completion of $\mathbb{R}$ is the extended real line $[-\infty, +\infty]$.)

• The discrete metric on $\mathbb{R}$ is translation-invariant and yields a translation-invariant and complete uniformity. It yields, not the usual topology on $\mathbb{R}$, but rather the discrete topology.

It is also possible to develop a theory of topological groups that are not necessarily Abelian, but that theory is more complicated. A topological group that is not Abelian does not necessarily have one "preferred" uniformity, analogous to that discussed in 26.37. Examples can be found in Wilansky [1970], and in books on uniform spaces. We shall not pursue that topic here.

26.39. Remarks: irrelevance of scalars. A uniformity is unchanged if we replace the gauge with any uniformly equivalent gauge. In a TAG we can choose the gauge to consist of G-seminorms. In a TVS or an LCS we can do better: We can choose the gauge to consist of F-seminorms or seminorms. However, these "better" gauges do not give us more insight into the uniform structure. In the basic theory developed below, we can forget about multiplication by scalars, for it has no effect on the uniform structure; we can view our TVS's and LCS's as TAG's. (Nevertheless, the uniform structure and the operation of scalar multiplication do interact in some interesting ways; see 27.2.)

26.40. Further properties of the usual uniformity. Let $X$ be a TAG, let $\mathcal{N}$ be the neighborhood filter at 0, and let $\mathcal{U}$ be the usual uniformity on $X$.
Then:

a. The usual uniformity can be described directly in terms of the topology, as follows:

$$\mathcal{U} = \{S \subseteq X \times X : S \supseteq E_N \text{ for some } N \in \mathcal{N}\},$$

where

$$E_N = \{(x, y) \in X \times X : x - y \in N\}$$

for each neighborhood $N \in \mathcal{N}$. The sets $E_N$ then form a filterbase for the uniformity.

b. A net $(x_\alpha : \alpha \in A)$ in $X$ is Cauchy if and only if, for each neighborhood $N$ of 0, there is some $\alpha_0 \in A$ such that $\alpha, \alpha' \succcurlyeq \alpha_0 \ \Rightarrow\ x_\alpha - x_{\alpha'} \in N$. A filter $\mathcal{F}$ on $X$ is Cauchy if and only if, for each neighborhood $N$ of 0, there is some $F \in \mathcal{F}$ satisfying $F - F \subseteq N$.

c. Let $(f_\alpha : \alpha \in A)$ be a net of functions from some set $S$ into $X$, and let $f \in X^S$ also. Then $f_\alpha \to f$ uniformly on $S$ if and only if for each neighborhood $N$ of 0 there is some $\alpha_0 \in A$ such that $\{f_\alpha(s) - f(s) : \alpha \succcurlyeq \alpha_0,\ s \in S\} \subseteq N$.

d. Let $\Omega$ be a topological space, and let $\{f_\lambda : \lambda \in \Lambda\}$ be a collection of functions from $\Omega$ into $X$. Then $\{f_\lambda\}$ is equicontinuous at a point $\omega_0 \in \Omega$ if and only if $\{f_\lambda\}$ has this property: For each neighborhood $N$ of 0 in $X$, there is some neighborhood $G$ of $\omega_0$ in $\Omega$ such that $\{f_\lambda(\omega_0) - f_\lambda(\omega) : \lambda \in \Lambda,\ \omega \in G\} \subseteq N$.

e. Let $S \subseteq X$. Then $S$ is totally bounded if and only if $S$ has this property: For each neighborhood $N$ of 0, there is some finite set $F \subseteq X$ (or, equivalently, some finite set $F \subseteq S$) such that $F + N \supseteq S$.

26.41. (Optional.) Most of the results on Riemann and Henstock integrals in Chapter 24 require norms, but the definitions and a few basic properties do not actually require norms. The definitions would make as much sense in any topological vector space $X$, if we replace

for each number $\varepsilon > 0$, there exists . . . such that . . . $\|v - \Sigma[f, T]\| < \varepsilon$

with

for each neighborhood $N$ of 0, there exists . . . such that . . . $v - \Sigma[f, T] \in N$.

As an exercise, readers may wish to prove the following result.

Theorem. Suppose the topological vector space $X$ is locally convex, and assume it is complete - i.e., every Cauchy net in $X$ converges. Then any continuous function $f : [a, b] \to X$ is Riemann integrable. (Hint: Any continuous function on $[a, b]$ is uniformly continuous.)
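The neighborhood formulation in 26.41 can be tried out numerically in the simplest complete locally convex space, $X = \mathbb{R}^2$. The following is only a sketch under assumptions that are not in the text: the function `curve`, the use of equal subintervals with left endpoints as tags, and the choice of sup-norm balls as the neighborhoods $N$ of 0 are all illustrative.

```python
import math

def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of an R^2-valued f over n equal subintervals."""
    h = (b - a) / n
    total = [0.0, 0.0]
    for k in range(n):
        v = f(a + k * h)          # value of f at the tag of the k-th subinterval
        total[0] += v[0] * h
        total[1] += v[1] * h
    return tuple(total)

def curve(t):
    """Illustrative continuous function [0, 1] -> R^2 (an assumption, not from the text)."""
    return (math.cos(t), math.sin(t))

# Componentwise integral of curve over [0, 1], known in closed form.
EXACT = (math.sin(1.0), 1.0 - math.cos(1.0))

def sup_distance(n):
    """Sup-norm size of v - (Riemann sum); small means the difference lies in a small ball N."""
    s = riemann_sum(curve, 0.0, 1.0, n)
    return max(abs(s[0] - EXACT[0]), abs(s[1] - EXACT[1]))

for n in (10, 100, 1000):
    print(n, sup_distance(n))
```

As the partition is refined, the difference between the candidate integral and the Riemann sum falls inside any prescribed sup-norm ball around 0, which is the condition "$v - \Sigma[f, T] \in N$" specialized to this space.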
PONTRYAGIN DUALITY AND HAAR MEASURE (OPTIONAL; PROOFS OMITTED)

26.42. Remarks. We now state a few further results about topological Abelian groups. We shall omit the proofs, which are not short or elementary, since these results will not be needed later in this book except in some other material marked "optional."
26.43. Definitions. By a Pontryagin group we shall mean a locally compact, Hausdorff, topological Abelian group. Some examples of Pontryagin groups are:

• $\mathbb{R}$ or $\mathbb{C}$, with the usual topology and with addition for the group operation.

• $(0, +\infty)$, with multiplication for the group operation. (This is isomorphic to $\mathbb{R}$.)

• $\mathbb{Z}$, with the usual (i.e., discrete) topology and with addition for the group operation.

• $\mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$, with multiplication for the group operation. This group will play a special role in the theory developed below. (Of course, it is isomorphic to the group $[0, r)$ with the operation of addition modulo $r$, for any positive number $r$.)

• Any product of finitely many Pontryagin groups, with group operation defined componentwise and with the product topology. (In particular, $\mathbb{R}^n$ and $\mathbb{C}^n$.)

We may form a category by taking Pontryagin groups for the objects, with continuous group homomorphisms for the morphisms of the category. It is easy to see that this satisfies the definitions in 9.3. For each Pontryagin group $G$, we now define the dual

$$G^* = \{\varphi : \varphi \text{ is a morphism from } G \text{ into } \mathbb{T}\}.$$

This is a special case of the notion of "dual" introduced in 9.55; in this context the fixed object of 9.55 is the circle group $\mathbb{T}$. The set $G^*$ can be made into a multiplicative group by defining products pointwise - that is, $(\varphi\psi)(g) = \varphi(g)\,\psi(g)$ for any $\varphi, \psi \in G^*$ and $g \in G$. The identity element of the group $G^*$ is the constant function 1. The inverse of any element $\varphi(\cdot) \in G^*$ is the function $1/\varphi(\cdot)$. Note that $1/\varphi(g) = \overline{\varphi(g)}$, since $\varphi(g)$ takes its values in $\mathbb{T}$. Also, since $\varphi$ is a group homomorphism, note that $1/\varphi(g) = \varphi(-g)$ if $G$ is written as an additive group, or $1/\varphi(g) = \varphi(g^{-1})$ if $G$ is written multiplicatively. The group $G^*$ is called the character group of $G$; the members of $G^*$ are called the characters of $G$.

Examples. The groups $\mathbb{R}^n$ and $\mathbb{C}^n$ are isomorphic to their own character groups; the groups $\mathbb{T}$ and $\mathbb{Z}$ are isomorphic to each other's character groups.

The preceding assertions are easy to verify; the remaining ones below are not.

26.44. Pontryagin Duality Theorem. Let $G$ be a Pontryagin group, and let $G^*$ be its character group. Let $G^*$ be topologized by the topology of uniform convergence on compact subsets of $G$. Then $G^*$ is also a Pontryagin group; thus the mapping $G \mapsto G^*$ goes from the category of Pontryagin groups into itself. With respect to this mapping, every Pontryagin group is "reflexive" - that is, $G^{**} = G$. If $f : G_1 \to G_2$ is a morphism, then the dual map $f^* : G_2^* \to G_1^*$ (defined as in 9.3) is also a morphism. If $G^* = H$ and $H^* = G$, then $G$ is compact if and only if $H$ is discrete.

26.45. Theorem: existence and uniqueness of Haar measure. Let $G$ be a Pontryagin group (as defined in 26.43). Then there exists a regular Borel measure $\mu$ on $G$ that is
translation-invariant on $G$ - i.e., that satisfies $\mu(x + S) = \mu(S)$ for all $x \in G$ and all Borel sets $S \subseteq G$. It is unique, up to multiplication by a positive constant - i.e., if $\mu_1$ and $\mu_2$ are two such measures, then $\mu_1 = k\mu_2$ for some positive constant $k$. Any such measure is called the Haar measure of the group. It is bounded if and only if $G$ is compact.

Notations. The spaces $L^p(\mu)$ may be written instead as $L^p(G)$. When the choice of the measure is clear, integration with respect to Haar measure may be written as $\int_G f(x)\,dx$ instead of $\int_G f(x)\,d\mu(x)$.

Remarks. For simplicity we have only considered commutative groups, but the notion of Haar measure generalizes to all locally compact Hausdorff groups; commutativity is not actually required. The literature contains an assortment of proofs of the existence and uniqueness of the Haar integral. They are based mainly on two proofs. One, due to Cartan, is based on an argument involving Cauchy nets and proves uniqueness while it proves existence. The other, due to Weil, is slightly shorter, uses a compactness argument, and does not prove uniqueness; it is usually supplemented by a brief proof of uniqueness due to von Neumann. Both proofs apply to noncommutative groups; both are given by Nachbin [1965]. Simpler proofs are possible if one restricts one's attention to compact groups or to commutative groups; for instance, see Izzo [1992] and references cited therein.

26.46. Examples. Haar measure on $\mathbb{Z}^n$ (or on any discrete group) is counting measure. Haar measure on $\mathbb{R}^n$ is $n$-dimensional Lebesgue measure. The circle group $[0, r)$ (with addition modulo $r$, as defined in 8.10.e) has Haar measure equal to the restriction of one-dimensional Lebesgue measure to subsets of $[0, r)$. Haar measure on the circle group $\mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$ (with the operation of multiplication) can be described in terms of $[0, r)$ since those two groups are isomorphic; equivalently, Haar measure on $\mathbb{T}$ is arclength times any convenient positive constant. Haar measure on the multiplicative group $(0, +\infty)$ can be described in terms of Lebesgue measure on the additive group $\mathbb{R}$, since those two groups are isomorphic by the mapping $(0, +\infty) \ni x \mapsto \ln x \in \mathbb{R}$. That isomorphism yields this formula for Haar measure in $(0, +\infty)$:

$$\mu(S) = \int_S \frac{1}{t}\,dt \qquad \text{for Borel sets } S \subseteq (0, +\infty).$$
Here the $dt$ is integration with respect to Lebesgue measure.

26.47. Let $G$ and $G^*$ be a Pontryagin group and its dual group (as defined in 26.43). Let Haar measure on both groups be denoted by $dx$. Of course, Haar measure is only determined up to multiplication by a positive constant; fix some particular version of Haar measure on each group. The Fourier transform of a function $f : G \to \mathbb{F}$ is a corresponding function $\hat{f} : G^* \to \mathbb{F}$. The transform is defined for $f \in L^1(G)$, for $f \in L^2(G)$, and for $f$ in various other classes of functions by an assortment of different methods, but the different definitions agree wherever the classes of functions overlap.
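Before fixing a version of Haar measure, it may help to see translation invariance at work in the one example from 26.46 where the measure is not simply Lebesgue or counting measure. The sketch below approximates $\mu([a, b]) = \int_a^b dt/t$ on the multiplicative group $(0, +\infty)$ and compares an interval with its image under the group operation (multiplication by a constant). The function name, the midpoint rule, the step count, and the particular intervals are illustrative assumptions.

```python
import math

def haar_measure_interval(a, b, steps=100000):
    """Midpoint-rule approximation of the integral of dt/t over [a, b], with 0 < a < b."""
    h = (b - a) / steps
    return sum(h / (a + (k + 0.5) * h) for k in range(steps))

m1 = haar_measure_interval(1.0, 2.0)    # measure of S = [1, 2]
m2 = haar_measure_interval(5.0, 10.0)   # measure of 5 * S = [5, 10]

print(m1, m2, math.log(2.0))
```

Both intervals receive (up to quadrature error) the same measure $\ln 2$, whereas their Lebesgue measures differ by a factor of 5; multiplying the result by any positive constant would give another equally valid Haar measure.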
The most basic of these definitions is the following:

$$\hat{f}(\gamma) = \int_G f(x)\,\overline{\gamma(x)}\,dx \qquad \text{if } f \in L^1(G) \text{ and } \gamma \in G^*.$$

This makes sense for $f \in L^1(G)$, since $\gamma(x)$ has absolute value 1 for all $x$.

Abstract Riemann-Lebesgue Lemma. If $f \in L^1(G)$, then $\hat{f} \in C_0(G^*)$, with $\|\hat{f}\|_\infty \le \|f\|_1$. Here $C_0(G^*)$ is the set of all continuous functions from $G^*$ into $\mathbb{C}$ that vanish at infinity, as defined in 22.15. (This result generalizes 24.41.b; explain how.)

Plancherel Theorem. The Fourier transform, restricted to $L^1(G) \cap L^2(G)$, is a linear map from that set onto a dense subset of $L^2(G^*)$, which is distance-preserving - i.e.,

$$\|\hat{f}\|_{L^2(G^*)} = c\,\|f\|_{L^2(G)} \qquad (**)$$

(for some positive constant $c$ that depends on the normalizations of the Haar measures; the Haar measures can be chosen so that $c = 1$). Hence that restriction extends uniquely to a linear map $f \mapsto \hat{f}$, from $L^2(G)$ onto $L^2(G^*)$, satisfying $(**)$. This map is sometimes called the Plancherel transform. It also satisfies Parseval's Identity:

$$\int_{G^*} \hat{f}(\gamma)\,\overline{\hat{g}(\gamma)}\,d\gamma = c^2 \int_G f(x)\,\overline{g(x)}\,dx$$

and the Fourier Inversion Formula: $c^2 f(x) = \hat{\hat{f}}(-x)$. When $\hat{f} \in L^1(G^*)$, then the Fourier Inversion Formula can be written in this form:

$$c^2 f(x) = \int_{G^*} \hat{f}(\gamma)\,\gamma(x)\,d\gamma.$$

Examples.

a. When $G = \mathbb{R}^n$, then $G^* = \mathbb{R}^n$ also. It is convenient to define

$$\hat{f}(\xi) = (2\pi)^{-n/2} \int_{\mathbb{R}^n} f(x)\,\exp(-i x \cdot \xi)\,dx;$$

other constants can be used, but this constant yields $f(x) = \hat{\hat{f}}(-x)$. The term "Fourier transform" most often refers to this example.

b. The group $G = \mathbb{T}$ can be conveniently viewed as the additive group $[0, 2\pi)$, with addition modulo $2\pi$ (see 8.10.e). (Functions on $\mathbb{T}$ are also often viewed as functions on $\mathbb{R}$ that are periodic with period $2\pi$. Intervals with length other than $2\pi$ can also be used, but the formulas are simplest for intervals of length $2\pi$, so that is the only case we shall describe here.) The dual group is $G^* = \mathbb{Z}$, and a function on $\mathbb{Z}$ is just a sequence of numbers indexed by the integers. Thus, the transform of a function $f \in L^1(\mathbb{T})$ is the sequence of Fourier coefficients

$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,e^{-inx}\,dx \qquad (n \in \mathbb{Z}).$$
Going in the other direction, an "integral" of a function in $L^1(\mathbb{Z})$ is just a sum of real numbers. Thus we obtain

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx} \qquad (x \in \mathbb{T}).$$

This series is to be interpreted not as a pointwise summation but as a summation in $L^2(\mathbb{T})$. That is, if $f \in L^2(\mathbb{T})$, then the partial sums $S_N(x) = \sum_{n=-N}^{N} c_n e^{inx}$ converge to $f$ in the sense that $\lim_{N \to \infty} \|f - S_N\|_2 = 0$.

26.48. Remarks on pointwise convergence. Let $f \in L^2[-\pi, \pi]$, and define the Fourier coefficients $c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,e^{-inx}\,dx$. Then $\sum_{n=-\infty}^{\infty} c_n e^{inx}$, the Fourier series of $f$, converges to $f$ in the norm topology of $L^2[-\pi, \pi]$. Although the convergence in $L^p$ spaces is more important for applications, it is of some historical interest to know when the Fourier series converges pointwise to $f$. For instance, a theorem of Jordan shows that if $f(-\pi) = f(\pi)$ and $f$ has bounded variation on $[-\pi, \pi]$, then $\sum_{n=-\infty}^{\infty} c_n e^{inx} = \frac{1}{2}[f(x+) + f(x-)]$ for all $x$; here $f(x+)$ and $f(x-)$ are the right- and left-hand limits of $f$ at $x$. If $f$ is also continuous, then the Fourier series converges everywhere to $f$. Georg Cantor tried to investigate the sets of points where certain Fourier series converge; this led him to invent cardinalities and set theory.

What about functions that are not necessarily of bounded variation? It turns out that "most" continuous functions are ill-behaved at "most" points, in the following sense: Let $C_{2\pi}$ be the collection of all continuous functions $f : [-\pi, \pi] \to \mathbb{C}$ that satisfy $f(-\pi) = f(\pi)$; this is a Banach space when equipped with the sup norm. There exists a comeager subset of $C_{2\pi}$ such that for each $f$ in that subset, there exists a comeager set $E_f \subseteq [-\pi, \pi]$ such that the Fourier series of $f$ diverges at every point of $E_f$.

It also turns out that "most" functions in $L^1[-\pi, \pi]$ are ill-behaved at "most" points, in a different sense: The functions whose Fourier series diverge almost everywhere in $[-\pi, \pi]$ form a comeager subset of $L^1[-\pi, \pi]$. (Kolmogorov first proved in 1926 that there exists a function in $L^1[-\pi, \pi]$ whose Fourier series diverges almost everywhere.) However, for $p > 1$ the spaces $L^p[-\pi, \pi]$ exhibit much better behavior. If $f \in L^p[-\pi, \pi]$ for some $p > 1$, then the Fourier series for $f$ converges almost everywhere to $f$. This was proved by Hunt [1968], extending methods developed earlier by Carleson for the case of $p = 2$.

For proofs or references for most of these results, see Edwards [1967]. The abstract approach to Fourier analysis is also introduced by Rudin [1960].
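The distinction between convergence in $L^2$ and pointwise convergence can be seen in a small numerical experiment. The sketch below takes $f(x) = x$ on $[-\pi, \pi]$ (an illustrative choice, not from the text; its periodic extension has a jump at $\pm\pi$), approximates the Fourier coefficients $c_n$ and the norm $\|f - S_N\|_2$ by a midpoint rule, and prints the $L^2$ error for a few values of $N$. The grid size `M` and all function names are assumptions made for the illustration.

```python
import cmath
import math

M = 2000                                   # number of quadrature points (assumption)
H = 2 * math.pi / M                        # width of each quadrature cell
XS = [-math.pi + (k + 0.5) * H for k in range(M)]   # midpoints of the cells

def f(x):
    """Illustrative function on [-pi, pi]."""
    return x

def coeff(n):
    """Midpoint-rule approximation of c_n = (1/2pi) * integral of f(x) e^{-inx} dx."""
    return sum(f(x) * cmath.exp(-1j * n * x) for x in XS) * H / (2 * math.pi)

def l2_error(N):
    """Midpoint-rule approximation of the L^2 norm of f - S_N on [-pi, pi]."""
    cs = {n: coeff(n) for n in range(-N, N + 1)}
    err2 = 0.0
    for x in XS:
        s = sum(cs[n] * cmath.exp(1j * n * x) for n in range(-N, N + 1))
        err2 += abs(f(x) - s) ** 2 * H
    return math.sqrt(err2)

for N in (1, 4, 16):
    print(N, l2_error(N))
```

The printed errors decrease as $N$ grows, in line with $\|f - S_N\|_2 \to 0$, even though every partial sum vanishes at $x = \pm\pi$ and so does not approach $f(\pm\pi) = \pm\pi$ there.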
ORDERED TOPOLOGICAL VECTOR SPACES

26.49. Definition and remarks. An ordered topological vector space is a real vector space $X$ that is equipped with both

• a topology, making $X$ into a topological vector space, and

• an ordering, making $X$ into an ordered vector space (as defined in 11.44).
Many different types of ordered TVS's can be defined by assuming various relations between the topology and the ordering. We shall concentrate on just a few basic types of ordered TVS's. In order of increasing specialization, these are:

{locally full spaces} $\supseteq$ {locally solid spaces} $\supseteq$ {F-lattices} $\supseteq$ {Banach lattices}.

Our treatment is based largely on Fremlin [1974], Peressini [1967], and Wong and Ng [1973].

26.50. Exercise. Let $X$ be an ordered TVS whose positive cone $X_+$ is closed. Then:

a. The sets $\{x \in X : x \preccurlyeq u\}$ and $\{x \in X : x \succcurlyeq u\}$ are closed, for each $u \in X$.

b. $X$ is Hausdorff.

c. $X$ is Archimedean. Hint: If $\mathbb{N}y$ is bounded above by some $x$, then for all $n \in \mathbb{N}$ we have $x \succcurlyeq ny$, hence $\frac{1}{n}x - y \in X_+$, which is a closed set. Since $X$ is a TVS, we have $\frac{1}{n}x - y \to -y$ and thus $-y \succcurlyeq 0$.

d. If $(v_\delta : \delta \in \Delta)$ is an increasing net that converges to some limit $v_\infty$ in the topological space $X$, then $v_\infty = \sup_{\delta \in \Delta} v_\delta$. Hints: For each $\delta_0 \in \Delta$, the set $\{x \in X : x \succcurlyeq v_{\delta_0}\}$ is closed, hence contains $v_\infty$. Also, if $w$ is an upper bound for the set $\{v_\delta : \delta \in \Delta\}$, then $v_\infty$ is in the closed set $\{x \in X : x \preccurlyeq w\}$.
26.51. Recall that a subset $S$ of a preordered set $(X, \preccurlyeq)$ is full (or order convex) if $a \preccurlyeq x \preccurlyeq b$ with $a, b \in S$ implies $x \in S$ (see 4.4.a). The full hull of a set $S$ is the set $\bigcup_{a, b \in S} [a, b]$; it is the smallest full set that contains $S$.

Exercises.

a. The full hull of any balanced subset of $X$ is balanced.

b. The full hull of any convex subset of $X$ is convex.
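The definition of the full hull in 26.51 translates directly into a membership test when $S$ is finite. The sketch below works in $\mathbb{R}^2$ with the componentwise ordering; that choice of ordered vector space, the sample set `S`, and the function names are assumptions made for illustration.

```python
from itertools import product

def leq(a, b):
    """Componentwise order on tuples of equal length: a <= b in every coordinate."""
    return all(ai <= bi for ai, bi in zip(a, b))

def in_full_hull(x, S):
    """True when a <= x <= b for some a, b in S, i.e. x lies in the union of the order intervals [a, b]."""
    return any(leq(a, x) and leq(x, b) for a, b in product(S, repeat=2))

S = [(0, 0), (2, 3)]

print(in_full_hull((1, 1), S))   # (0,0) <= (1,1) <= (2,3)
print(in_full_hull((2, 3), S))   # every point of S lies in its own full hull
print(in_full_hull((3, 1), S))   # first coordinate exceeds every upper candidate
```

Here the full hull of `S` is the rectangle $[0, 2] \times [0, 3]$, which is visibly full and contains `S`.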
26.52. Proposition and definition. Let $X$ be an ordered topological vector space. Then the following conditions are equivalent; if they are satisfied we say $X$ is locally full (or ordered by a normal cone).

(A) $(X, \mathcal{T})$ has a neighborhood base at 0 consisting of balanced, full sets.

(B) $(X, \mathcal{T})$ has a neighborhood base at 0 consisting of full sets.

(C) $(X, \mathcal{T})$ has a neighborhood base at 0 consisting of sets $V$ with this property: If $v \in V \cap X_+$, then $[0, v] \subseteq V$.

(D) If $(x_\alpha : \alpha \in A)$ and $(y_\alpha : \alpha \in A)$ are nets in $X$ based on the same directed set $A$ and satisfying $0 \preccurlyeq x_\alpha \preccurlyeq y_\alpha$ and $y_\alpha \xrightarrow{\mathcal{T}} 0$, then $x_\alpha \xrightarrow{\mathcal{T}} 0$.

(E) (The Squeeze Property.) If $(u_\alpha)$, $(v_\alpha)$, $(w_\alpha)$ are nets in $X$ based on the same directed set $A$ and satisfying $u_\alpha \preccurlyeq v_\alpha \preccurlyeq w_\alpha$ for all $\alpha \in A$ and also satisfying $u_\alpha \xrightarrow{\mathcal{T}} p$ and $w_\alpha \xrightarrow{\mathcal{T}} p$ with the same limit $p$, then $v_\alpha \xrightarrow{\mathcal{T}} p$ also.
(F) If $V$ is any neighborhood of 0, then there exists a neighborhood $W$ of 0 with this property: If $w \in W \cap X_+$, then $[0, w] \subseteq V$.

Remark. Note the similarity between 26.52(E) and 7.40.i. Those two properties are the same in $\mathbb{R}$, since in that setting the order and topological convergences are the same.

Proof of equivalence. The implications (A) $\Rightarrow$ (B) $\Rightarrow$ (C) $\Rightarrow$ (D) are obvious. The implications (D) $\Leftrightarrow$ (E) are an easy exercise. It suffices to show (D) $\Rightarrow$ (F) $\Rightarrow$ (A). Let $\mathcal{N}$ be the neighborhood filter at 0.

Proof of (D) $\Rightarrow$ (F). Suppose (F) fails. Then there is some $V$ that is a neighborhood of 0, for which there is no corresponding neighborhood $W$. Then for each $N \in \mathcal{N}$, there is some $w_N \in N \cap X_+$ such that $[0, w_N]$ is not contained in $V$, and hence there is some $x_N \in [0, w_N] \setminus V$. Then the net $(w_N : N \in \mathcal{N})$ converges to 0. By (D), the net $(x_N : N \in \mathcal{N})$ converges to 0, but then eventually $x_N \in V$, a contradiction.

Proof of (F) $\Rightarrow$ (A). Let $G$ be any neighborhood of 0; we wish to show that $G$ contains a balanced, full neighborhood of 0. Let $G'$ be a balanced neighborhood of 0 satisfying $G' + G' \subseteq G$; such a set is available since $X$ is a TVS. By (F), there is some neighborhood $W$ of 0 with this property: $\bigcup_{w \in W \cap X_+} [0, w] \subseteq G'$. Replacing $W$ with a smaller set, we may assume $W \subseteq G'$. Now let $W'$ be some balanced neighborhood of 0 satisfying $W' + W' \subseteq W$. Let $K = G' \cap W'$; it is a balanced neighborhood of 0. Let $F$ be its full hull - that is, $F = \bigcup_{a, b \in K} [a, b]$. The full hull of any balanced set is balanced; thus $F$ is a balanced, full neighborhood of 0. It suffices to show $F \subseteq G$. Let $x \in F$. Then $a \preccurlyeq x \preccurlyeq b$ for some $a, b \in K = G' \cap W'$. Then $0 \preccurlyeq x - a \preccurlyeq b - a \in W' - W' = W' + W' \subseteq W$. Hence $b - a \in W \cap X_+$, and thus $x - a \in [0, b - a] \subseteq G'$. Finally, $x = (x - a) + a \in G' + G' \subseteq G$.

26.53. A degenerate example. Any topological vector space $(X, \mathcal{T})$ can be turned into a locally full space by equipping it with the degenerate ordering $x \preccurlyeq y \iff x = y$ (so that the positive cone is $\{0\}$). Indeed, with that ordering, every subset of $X$ is full, so any neighborhood base at 0 consists of full sets. Despite its triviality (or because of it!), this example is useful. It shows that any affine operator between topological vector spaces can be turned into a convex operator from a topological vector space into a locally full space. Thus, the results proved in this chapter for convex operators are applicable to affine operators as well.

26.54. If $X$ is a locally convex, locally full space, then $X$ has a neighborhood base at 0 consisting of balanced, full, convex sets. Hint: Let $N$ be any given neighborhood of 0 in $X$. Since $X$ is locally full, we have $N \supseteq B$ where $B$ is a balanced, full neighborhood of 0. Since $X$ is locally convex, we have $B \supseteq C$ where $C$ is a balanced, convex neighborhood of 0. Show that the full hull of $C$ is a balanced, full, convex neighborhood of 0 that is contained in $N$.

26.55. Definitions. Let $(X, \preccurlyeq)$ be a Riesz space, i.e., a vector lattice. By a Riesz F-seminorm we shall mean an F-seminorm $\rho : X \to [0, +\infty)$ (defined as in 26.2) that also
has this property:

$$|x| \preccurlyeq |y| \ \Rightarrow\ \rho(x) \le \rho(y).$$

If $\rho$ is also homogeneous (i.e., if $\rho(cx) = |c|\,\rho(x)$), then it is a Riesz seminorm. If $\rho$ is positive-definite (i.e., if $x \ne 0 \Rightarrow \rho(x) > 0$), then it is, respectively, a Riesz F-norm or Riesz norm.

Examples. On any of the Banach spaces $L^p(\mu)$ for $1 \le p \le \infty$ (with {scalars} $= \mathbb{R}$), the norm $\|\cdot\|_p$ is a Riesz norm. On the F-spaces $L^p(\mu)$ for $0 < p < 1$, the F-norm $\|\cdot\|_p^p$ is a Riesz F-norm.

26.56. The Hahn-Banach Theorem was introduced in 12.30. Two more of its equivalents are given by the following principles:

(HB15) Riesz Seminorms and (HB16) Positive Functionals. Let $X$ be a Riesz space, let $S$ be a Riesz subspace, and suppose either $q$ is a Riesz seminorm on $X$, or $q : X \to \mathbb{R}$ is a positive linear functional. Let $\lambda : S \to \mathbb{R}$ be a positive linear functional, satisfying $\lambda \le q$ on $S_+$. Then $\lambda$ extends to a positive linear functional $\Lambda : X \to \mathbb{R}$, satisfying $\Lambda \le q$ on $X_+$.

Proof that (HB2) implies both (HB15) and (HB16). In either case, the restriction of $q$ to $X_+$ is a convex, isotone function $q : X_+ \to \mathbb{R}$ that satisfies $q(0) = 0$. Define $p(x) = q(x^+)$. Then $p$ is convex; this follows from the convexity and isotonicity of $q$ and the fact that

$$(\alpha x + (1 - \alpha) y)^+ \preccurlyeq (\alpha x)^+ + ((1 - \alpha) y)^+ = \alpha (x^+) + (1 - \alpha)(y^+)$$

if $x, y \in X$ and $\alpha \in [0, 1]$. For any $x \in S$ we have $x \preccurlyeq x^+$, hence $\lambda(x) \le \lambda(x^+) \le q(x^+) = p(x)$. Thus (HB2) is applicable, and $\lambda$ extends to a linear functional $\Lambda : X \to \mathbb{R}$ satisfying $\Lambda \le p$ on $X$; hence $\Lambda \le p = q$ on $X_+$. To see that $\Lambda$ is positive, note that if $x \succcurlyeq 0$, then $(-x)^+ = 0$; hence $-\Lambda(x) = \Lambda(-x) \le p(-x) = q((-x)^+) = q(0) = 0$.

Proof that either (HB15) or (HB16) implies (HB1). In either case we take $X$ to be the Riesz space $B(\Delta)$ and let $S$ be the subspace consisting of those nets that are convergent in the ordinary sense. For a proof with (HB15), use the Riesz seminorm $q(x) = \|x\|_\infty = \sup\{|x(\delta)| : \delta \in \Delta\}$. For a proof with (HB16), use the positive linear functional $q(x) = \limsup_{\delta \in \Delta} x(\delta)$.

26.57. Recall from 8.42.q that, in a vector lattice, a set $S$ is solid if $|x| \preccurlyeq |y|$ and $y \in S$ imply $x \in S$. Note that any nonempty solid set is balanced.

Proposition and definition. Let $(X, \preccurlyeq)$ be a vector lattice and let $(X, \mathcal{T})$ be a topological vector space, both with the same underlying vector space $X$. Then the following conditions are equivalent. If one, hence all, of these conditions are satisfied, we say $X$ is locally solid; some mathematicians call it a topological Riesz space.

(A) $X$ has a neighborhood base at 0 consisting of solid sets.
(B) The topology $\mathcal{T}$ is the gauge topology determined by a collection of Riesz F-seminorms.

(C) The mapping $x \mapsto |x|$ is uniformly continuous from $X$ to $X$ (equipped with the uniform structure resulting from the topology $\mathcal{T}$).

(D) The mapping $(x, y) \mapsto x \vee y$ is uniformly continuous from $X \times X$ (with the product uniform structure) to $X$.

(E) The mapping $x \mapsto x^+$ is uniformly continuous from $(X, \mathcal{T})$ into $(X, \mathcal{T})$.

(F) $X$ is locally full and the mapping $x \mapsto x^+$ is continuous at 0.

(G) For any two nets $(x_\alpha)$ and $(y_\alpha)$ in $X$ (with the same index set), if $|x_\alpha| \preccurlyeq |y_\alpha|$ for all $\alpha$ and $y_\alpha \xrightarrow{\mathcal{T}} 0$, then $x_\alpha \xrightarrow{\mathcal{T}} 0$.

Proof of (A) $\Rightarrow$ (B). The proof is similar to that of 26.29, case (ii), but we may choose the sets $B_n$ to be solid sets. Construct an F-seminorm $\rho$ as in 26.29; we shall now show that that function is actually a Riesz F-seminorm. We know that $|x| \preccurlyeq |y| \Rightarrow \varphi(x) \le \varphi(y)$ since each $B_n$ is solid; we are to show that $|x| \preccurlyeq |y| \Rightarrow \rho(x) \le \rho(y)$. Consider any decomposition $y = y_1 + y_2 + \cdots + y_m$. Then $|x| \preccurlyeq |y| \preccurlyeq |y_1| + |y_2| + \cdots + |y_m|$, hence

$$x \in \bigl[-|y_1| - |y_2| - \cdots - |y_m|,\ |y_1| + |y_2| + \cdots + |y_m|\bigr] = [-|y_1|, |y_1|] + [-|y_2|, |y_2|] + \cdots + [-|y_m|, |y_m|]$$

by 8.36(C). Thus we can write $x = x_1 + x_2 + \cdots + x_m$ with $|x_i| \preccurlyeq |y_i|$. Hence $\rho(x) \le \sum_{i=1}^m \varphi(x_i) \le \sum_{i=1}^m \varphi(y_i)$. Since $\rho(y)$ is the infimum of all such summations $\sum_{i=1}^m \varphi(y_i)$, it follows that $\rho(x) \le \rho(y)$.

Proof of (B) $\Rightarrow$ (C). Recall from 8.42.o that $\bigl||x| - |x'|\bigr| \preccurlyeq |x - x'|$. Hence for any Riesz F-seminorm $\rho$, we have $\rho(|x| - |x'|) \le \rho(x - x')$. If $x_\alpha - x'_\alpha \xrightarrow{\mathcal{T}} 0$, then $\rho(x_\alpha - x'_\alpha) \to 0$ for every $\rho$ in the determining family of Riesz F-seminorms; hence $\rho(|x_\alpha| - |x'_\alpha|) \to 0$ for each $\rho$; hence $|x_\alpha| - |x'_\alpha| \xrightarrow{\mathcal{T}} 0$.

Proof of (C) $\Rightarrow$ (D). Immediate from 8.42.l.

Proof of (D) $\Rightarrow$ (E). Obvious.

Proof of (E) $\Rightarrow$ (F). Obviously the mapping $x \mapsto x^+$ is continuous at 0. To show that $X$ is locally full, we shall verify condition 26.52(F). Let $V$ be any neighborhood of 0. By the uniform continuity of the mapping $x \mapsto x^+$, there is some neighborhood $W$ of 0 such that $x - y \in W \Rightarrow x^+ - y^+ \in V$. We are to show that if $0 \preccurlyeq w \in W$, then $[0, w]$ [...] Then $|y_\alpha| \xrightarrow{\mathcal{T}} 0$, and hence by 26.52(D) we have $x_\alpha^+ \xrightarrow{\mathcal{T}} 0$ and $x_\alpha^- \xrightarrow{\mathcal{T}} 0$. Hence $x_\alpha = x_\alpha^+ - x_\alpha^- \xrightarrow{\mathcal{T}} 0$.

Proof of (G) $\Rightarrow$ (A). Let $W$ be a neighborhood of 0; we wish to show that $W$ contains some solid neighborhood of 0. Recall from 8.42.q that the solid kernel of $W$ is the set $\mathrm{sk}(W) = \bigcup \{[-u, u] : [-u, u] \subseteq W\}$ [...]

Replacing $(x_n)$ with a subsequence, we may assume $f(x_n)$ stays out of some neighborhood $G$ of 0 in $Y$. Choosing a smaller neighborhood, we may assume $G$ is full. Replacing $(x_n)$ with a subsequence, we may assume $\rho(x_n) < 4^{-n}$, where $\rho$ is some complete Riesz F-norm that determines the topology of $X$. Let $u_n = 2^n x_n$; then $f(u_n) \notin 2^n G$. By subadditivity of $\rho$, we have $\rho(u_n) < 2^{-n}$. Hence the series $\sum_n u_n$ converges to some limit $v$ in $X$. Since the $u_n$'s are in $X_+$, we have $0 \preccurlyeq u_n \preccurlyeq v$ in $X$, so $0 \preccurlyeq f(u_n) \preccurlyeq f(v)$ in $Y$. For all $n$ sufficiently large, we have $f(v) \in 2^n G$, since $G$ is a neighborhood of 0. But $G$ is also full, so $f(u_n) \in 2^n G$, a contradiction.
26.60. Corollaries.

a. Any two complete Riesz F-norms on a vector lattice are equivalent. Hint: The identity map is a positive operator.

b. Let $X$ be an F-lattice, and let $f : X \to \mathbb{R}$ be a linear functional. Then $f$ is order bounded (i.e., the image under $f$ of any order bounded set is an order bounded set) if and only if $f$ is continuous. Hints: Order-bounded implies continuity, by 11.57 and 26.59. For the converse, suppose $f$ is not order bounded. Then there is some set $B \subseteq X$ that is order bounded, such that $f(B)$ is not bounded in $\mathbb{R}$. Choose a sequence $(x_n)$ in $B$ with $|f(x_n)| > n^2$. Let $y_n = \frac{1}{n} x_n$; then $|f(y_n)| > n$. However, $y_n \to 0$ in $X$, by 27.11 and 27.2(D). Thus $f$ is not continuous.

c. Example of a continuous operator that is not order bounded. Let $C[0, 1]$ = {continuous functions from $[0, 1]$ into $\mathbb{R}$} and $c_0$ = {sequences of reals converging to 0} be equipped with their sup norms; then both are Banach lattices and $c_0$ is Dedekind complete. Define $f : C[0, 1] \to c_0$ as follows: For any $x \in C[0, 1]$, let $f(x)$ be the sequence whose $n$th term is $\int_0^1 x(t) \sin(2\pi n t)\,dt$. Hints: The sequence $f(x)$ tends to 0 by the Riemann-Lebesgue Lemma (24.41.b). It is an easy exercise to show that the operator $f$ is continuous. The set $B = \{x \in C[0, 1] : -1 \le x \le 1\}$ is order bounded. However, $B$ contains all the functions $x_n(t) = \sin(2\pi n t)$. Observe that the $n$th term of the sequence $f(x_n)$ is $\frac{1}{2}$; show that $f(B)$ is not order bounded.
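The key computation in 26.60.c, that the $n$th term of $f(x_n)$ equals $\frac{1}{2}$ for every $n$, can be checked numerically. The sketch below approximates $\int_0^1 x(t)\sin(2\pi n t)\,dt$ by a midpoint rule; the function names, the step count, and the range of $n$ tested are assumptions made for illustration.

```python
import math

def nth_term(x, n, steps=20000):
    """Midpoint-rule approximation of the integral of x(t) * sin(2*pi*n*t) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += x(t) * math.sin(2 * math.pi * n * t)
    return total * h

def x_n(n):
    """The function x_n(t) = sin(2*pi*n*t), a member of the order bounded set B."""
    return lambda t: math.sin(2 * math.pi * n * t)

values = [nth_term(x_n(n), n) for n in range(1, 6)]
print(values)
```

Since each $x_n$ lies in $B$ while the $n$th coordinate of $f(x_n)$ stays at $\frac{1}{2}$, any sequence dominating $f(B)$ coordinatewise would have all its terms at least $\frac{1}{2}$, and so could not belong to $c_0$.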
Chapter 27
Barrels and Other Features of TVS's

BOUNDED SUBSETS OF TVS's

27.1. Motivating exercises.

a. Two equivalent seminorms on the same vector space yield the same collection of metrically bounded sets.

b. Show by example that equivalent F-seminorms on a vector space may yield different collections of metrically bounded sets.

27.2. Definition. Let $X$ be a TVS, with scalar field $\mathbb{F}$ equal to $\mathbb{R}$ or $\mathbb{C}$. Let $S \subseteq X$. Show that the following conditions on $S$ are equivalent. If any, hence all, of these conditions are satisfied, we say that $S$ is toplinearly bounded, or bounded in the sense of topological linear spaces, or bounded with respect to the TVS topology on $X$.

(A) The set $\{m_s : s \in S\}$ of mappings $m_s : \mathbb{F} \to X$ defined by $m_s(c) = cs$ is equicontinuous.

(B) For each neighborhood $G$ of 0, there is some scalar $c$ such that $S \subseteq cG$.

(C) For each neighborhood $G$ of 0, there is some $r > 0$ such that $S \subseteq cG$ for all scalars $c$ with $|c| > r$.

(D) Whenever $(c_n, x_n)$ is a sequence in $\mathbb{F} \times S$ with $c_n \to 0$, then $c_n x_n \to 0$.

(E) Whenever $(c_\alpha, x_\alpha)$ is a net in $\mathbb{F} \times S$ with $c_\alpha \to 0$, then $c_\alpha x_\alpha \to 0$.

(Hint for the proof of equivalence: 26.27.c.)

A collection of functions $\Phi = \{\varphi_\gamma : \gamma \in \Gamma\}$, from some set $\Omega$ into $X$, will be called toplinearly bounded pointwise if for each $\omega \in \Omega$ the set $\Phi(\omega) = \{\varphi_\gamma(\omega) : \gamma \in \Gamma\}$ is toplinearly bounded in $X$.

Caution: Toplinear boundedness is not the same thing as either metric boundedness or order boundedness. In 27.5, 27.6, and 27.11 we investigate some of the relations between toplinear boundedness and the other two kinds of boundedness. It is unfortunate that the term "bounded set" has these three meanings that are sometimes quite different; the reader must strive to determine from context which meaning is intended. In the next few
paragraphs, of course, "bounded" means toplinearly bounded unless some other meaning is specified.

27.3. Basic properties of bounded sets. Let $X$ be a TVS, with topology $\mathcal{T}$, and let $S \subseteq X$. Show that

a. $S$ is bounded if and only if every countable subset of $S$ is bounded.

b. The bounded sets form an ideal: the union of finitely many bounded sets is bounded; any subset of a bounded set is bounded. In fact, it is a proper ideal, provided that the space $X$ does not have the indiscrete topology.

c. Any compact set is bounded.

d. The closure of a bounded set is bounded.

e. If $X$ is locally convex, then the convex hull of any bounded subset of $X$ is bounded.

f. A topological vector space is quasicomplete if each bounded, closed set is complete. Prove this more general version of Mazur's Theorem: In a quasicomplete, locally convex space, the closed convex hull of a compact set is compact. Hint: Refer to the result on totally bounded sets in 26.23.i.

g. Suppose that the topology $\mathcal{T}$ on $X$ is the initial topology determined by a collection of linear mappings into topological vector spaces, $\varphi_\lambda : X \to (Y_\lambda, \mathcal{U}_\lambda)$. Show that $S \subseteq X$ is $\mathcal{T}$-bounded if and only if $\varphi_\lambda(S)$ is $\mathcal{U}_\lambda$-bounded for each $\lambda$.

h. Let $X = \prod_{\lambda \in \Lambda} X_\lambda$ be a product of TVS's; then (as we have noted in 26.20.a) $X$ is also a TVS. Show that a set $B \subseteq X$ is bounded if and only if $B \subseteq \prod_{\lambda \in \Lambda} B_\lambda$, where each $B_\lambda$ is a bounded subset of $X_\lambda$.

i. Change of scalar field. Let $X$ be a complex TVS. Then $X$ may also be viewed as a real TVS, with the same topology, if we "forget" how to multiply vectors by nonreal scalars. However, the bounded subsets of the real TVS are the same as the bounded subsets of the complex TVS. Proof. This may be easiest to see by considering condition 27.2(D). Any bounded subset of the complex TVS is also a bounded subset of the real TVS, since $\mathbb{R} \subseteq \mathbb{C}$. Conversely, suppose $S$ is real-bounded, and suppose $(c_n, x_n)$ is a sequence in $\mathbb{C} \times S$ with $c_n \to 0$. Then $c_n = a_n + i b_n$ with $a_n, b_n \to 0$ in $\mathbb{R}$. Then $a_n x_n \to 0$ and $b_n x_n \to 0$, since $S$ is real-bounded; hence $(a_n + i b_n) x_n \to 0$.

27.4. Let $X$ and $Y$ be topological vector spaces. Let $f : X \to Y$ be a linear map. Suppose $f$ is continuous (i.e., preserves convergent nets) - or, more generally, suppose $f$ is sequentially continuous (i.e., preserves convergent sequences). Let $S \subseteq X$ be bounded. Then $f(S)$ is a bounded subset of $Y$. In some contexts, a linear map is called bounded if it takes bounded sets to bounded sets. (This generalizes the terminology of 23.1.) With this terminology, we have just shown that

$f$ is continuous $\Rightarrow$ $f$ is sequentially continuous $\Rightarrow$ $f$ is bounded.

A partial converse is as follows:
723
Bounded Subsets of TVS's
Proposition. If $X$ is a pseudometrizable TVS, $Y$ is a TVS, and $f : X \to Y$ is a bounded linear map, then $f$ is continuous.
Proof. By 26.32, the topology of $X$ is given by an F-seminorm, $p$. By 15.34.d, it suffices to show $f$ is sequentially continuous. Suppose not - say $(x_n)$ is a sequence that converges to 0 in $X$, while $(f(x_n))$ does not converge to 0 in $Y$. Passing to a subsequence, we may assume $(f(x_n))$ stays out of some neighborhood $G$ of 0 in $Y$. We have $p(x_n) \to 0$; passing to a subsequence, we may assume $p(x_n) < 1/n^2$. Then $p(n x_n) \le 1/n$, by the subadditivity of any F-seminorm $p$. The sequence $(n x_n)$ converges to 0, hence is bounded. Since $f$ is bounded, the sequence $(f(n x_n)) = (n f(x_n))$ is bounded in $Y$. Then the sequence whose $n$th term is $\frac{1}{n}\, n f(x_n)$ must converge to 0, a contradiction.
Remark. A partial extension to nonmetrizable spaces is given in 27.41.m.

27.5. Let $X$ be a topological vector space. Then any toplinearly bounded set $B$ is metrically bounded, in the following sense: If $p$ is any continuous F-seminorm (or, more generally, any continuous G-seminorm) on $X$, then $\sup_{x \in B} p(x) < \infty$.
Proof. Suppose not. Say there exists a sequence $(x_n)$ in $B$ with $p(x_n) > n$. Let $y_n = n^{-1} x_n$. Since $B$ is bounded, we have $y_n \to 0$, hence $p(y_n) \to 0$, hence $p(y_n) < 1$ for $n$ sufficiently large. But then by the subadditivity of $p$ we have $p(x_n) = p(n y_n) \le n\, p(y_n) < n$, a contradiction.

27.6. Let $X$ be a locally convex space over scalar field $\mathbb{F}$, and let $R$ be any family of seminorms that determines the topology of $X$. Let $S$
$g$ in $L^\beta(\mu, X)$; we are to show that $f = g$. By 22.31(ii) we may pass to subsequences such that $f_n \to f$ and $f_n \to g$ pointwise $\mu$-almost everywhere.
Remarks. This example is taken from Villani [1985]. That paper also shows the following interesting result: Let $X$ be a Banach space, let $(\Omega, \mathcal{S}, \mu)$ be a measure space, and let $\alpha, \beta \in (0, +\infty)$ with $\alpha < \beta$. Then
$L^\alpha(\mu, X) \subseteq L^\beta(\mu, X)$ if and only if $\inf\{\mu(S) : S \in \mathcal{S},\ \mu(S) > 0\} > 0$;
$L^\alpha(\mu, X) \supseteq L^\beta(\mu, X)$ if and only if $\sup\{\mu(S) : S \in \mathcal{S},\ \mu(S) < \infty\} < \infty$.
Special cases of this were given in 22.34. (Villani's paper only shows this for $X = \mathbb{R}$, but that case easily yields the general case since all the functions in $L^\alpha(\mu, X)$ or $L^\beta(\mu, X)$ are measurable, and we can separate the "regular" condition from the "not too big" condition - see the remarks in 22.28.)

27.30. Change of scalar field. Let $X$ be a complex topological vector space (respectively, a complex locally convex space). Then $X$, with the same topology, may also be viewed as a real topological vector space (respectively, a real locally convex space) if we "forget" how
to multiply members of $X$ by nonreal scalars. Let us denote these two TVS's by $X_{\mathbb{C}}$ and $X_{\mathbb{R}}$. Note that the choice of scalars affects the definitions of "balanced" and "absorbing;" hence it affects the definitions of "barrel" and "ultrabarrel." Show that
a. If $B$ is an (ultra)barrel in $X_{\mathbb{C}}$, then $B$ is also an (ultra)barrel in $X_{\mathbb{R}}$. Likewise, any (F-)seminorm on $X_{\mathbb{C}}$ is also an (F-)seminorm on $X_{\mathbb{R}}$. Hence, if $X_{\mathbb{R}}$ is (ultra)barrelled, then $X_{\mathbb{C}}$ is (ultra)barrelled, too.
b. It is possible for $X_{\mathbb{R}}$ to have more (ultra)barrels than $X_{\mathbb{C}}$. For instance, show that the set $B = \{z \in \mathbb{C} : |\mathrm{Re}(z)| \le 1 \text{ and } |\mathrm{Im}(z)| \le 2\}$ is both a barrel and an ultrabarrel in $X_{\mathbb{R}}$, but is neither in $X_{\mathbb{C}}$ since $B$ is not balanced in $X_{\mathbb{C}}$.
c. Nevertheless, $X_{\mathbb{R}}$ is (ultra)barrelled if and only if $X_{\mathbb{C}}$ is (ultra)barrelled. For the moment, we shall prove this equivalence using only definitions (U2) and (B2); proofs with the other definitions in 27.26 and 27.27 will follow from the arguments given in the next subchapter.
We have already established half of this "if and only if" result. Now assume $X_{\mathbb{C}}$ is (ultra)barrelled - i.e., it satisfies condition (U2) or (B2). To show the same for $X_{\mathbb{R}}$, let $p$ be any lower semicontinuous (F-)seminorm on $X_{\mathbb{R}}$; we wish to show that $p$ is continuous on $X_{\mathbb{R}}$. Note that $X_{\mathbb{R}}$ and $X_{\mathbb{C}}$ differ only in their algebraic operations; they are the same set, and they have the same topology, so a function is continuous on $X_{\mathbb{R}}$ if and only if it is continuous on $X_{\mathbb{C}}$. Define $\gamma : X \to [0, +\infty)$ as in 26.5.b. As we noted in 26.5.b, this function $\gamma$ is also lower semicontinuous on $X$, and $\gamma$ is an (F-)seminorm on $X_{\mathbb{C}}$. Hence, by our assumption, $\gamma$ is continuous. From the inequality $p \le \gamma$, we see that $p$ is continuous at 0. Since $p$ is a G-seminorm, we have $|p(u) - p(v)| \le p(u - v)$, and therefore $p$ is continuous.
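The failure of balancedness in 27.30.b can be checked with a single rotation. The following is only an illustrative sketch; the test point $z_0 = 2i$ and the scalar $c = -i$ are choices made here for illustration and are not taken from the text.

```latex
\begin{align*}
B &= \{ z \in \mathbb{C} : |\operatorname{Re} z| \le 1,\ |\operatorname{Im} z| \le 2 \}, \\
z_0 &= 2i \in B, \qquad c = -i, \qquad |c| = 1, \\
c\,z_0 &= (-i)(2i) = 2, \qquad |\operatorname{Re}(c\,z_0)| = 2 > 1
  \ \Longrightarrow\ c\,z_0 \notin B .
\end{align*}
```

So $cB \not\subseteq B$ for a complex scalar of modulus 1, and $B$ is not balanced over $\mathbb{C}$. For a real scalar $c$ with $|c| \le 1$ we have $\operatorname{Re}(cz) = c \operatorname{Re} z$ and $\operatorname{Im}(cz) = c \operatorname{Im} z$, so $cB \subseteq B$; thus $B$ is balanced over $\mathbb{R}$.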
PROOFS OF BARREL THEOREMS

27.31. We now begin the somewhat lengthy proof of 27.26 and 27.27. We remark that shorter proofs of equivalence can be found in the literature (for instance, in Waelbroeck [1971]) if one omits the nonlinear conditions (U4), (U6), (B4), and (B6).
The order of proof will not be the same as the order in which the results were stated. We shall cover the barrels and ultrabarrels cases simultaneously. In the discussions below, phrases in brackets should be read or omitted for the two cases - e.g., an [F-]seminorm means a seminorm for the argument with barrels or an F-seminorm for the argument with ultrabarrels. Also, (1) will refer to either (U1) or (B1), and (2) will refer to either (U2) or (B2), etc. We shall prove the equivalence in this order:
• (1) $\Leftrightarrow$ (2),
• (4) $\Rightarrow$ (3) $\Rightarrow$ (2),
• (6) $\Rightarrow$ (5) $\Rightarrow$ (1), and
• (1) implies both (4) and (6).
In each argument the implication will be proved with either choice of scalar field ($\mathbb{R}$ or $\mathbb{C}$); in fact, the choice of scalar field will not enter into most of the arguments. Our first few proofs are along the lines of Waelbroeck [1971]. Some researchers may also find Adasch, Ernst, and Keim [1978] helpful for further reading on this topic.

27.32. Proof of (1) $\Rightarrow$ (2). Let $p$ be a lower semicontinuous [F-]seminorm; then the sets $S_n = \{x \in X : p(x) \le 2^{-n}\}$ are closed. Each $S_n$ is also absorbing, since $p$ is scalarly continuous (see 26.3.a). It follows easily that the sequence $(S_n)$ is a closed [convex] string. By (1), then, it is a neighborhood string; thus each $S_n$ is a neighborhood of 0. It follows easily that $p$ is continuous at 0. Since $|p(u) - p(v)| \le p(u - v)$, it follows that $p$ is continuous everywhere.

27.33. Proof of (2) $\Rightarrow$ (1). Let $V_0$ be an [ultra]barrel; we wish to show that $V_0$ is a neighborhood of 0. It suffices to produce a lower semicontinuous [F-]seminorm $p$ with the property that $\{x \in X : p(x) < 1\} \subseteq V_0$.
For the locally convex case, let $p$ be the Minkowski functional of $V_0$; that is, $p(x) = \inf\{k \in (0, +\infty) : k^{-1} x \in V_0\}$. Then $p$ is a seminorm satisfying this property, as noted in 12.28 and 12.29.g. To show that $p$ is lower semicontinuous, suppose $p(x_0) > r > s > 0$. Then $x_0 \notin r V_0$. Since $r V_0$ is closed, its complement is a neighborhood of $x_0$, and for $x$ in that neighborhood we have $r^{-1} x \notin V_0$, hence $p(x) \ge r > s$. Thus for any $s$ the set $\{x \in X : p(x) > s\}$ is open.
For the non-locally-convex case, let $(V_j : j = 0, 1, 2, 3, \ldots)$ be a closed string in $X$. By a dyadic rational in $[0, 1)$ we mean a number of the form
$$\alpha = \frac{t_1}{2} + \frac{t_2}{4} + \frac{t_3}{8} + \cdots + \frac{t_n}{2^n}$$
for some positive integer $n$, where each $t_j$ is either 0 or 1. For each number of this type, define the set
$$W_\alpha = \sum_{\{j \in \mathbb{N} \,:\, t_j = 1\}} V_j \subseteq V_0 .$$
Verify that the $W_\alpha$'s are balanced and absorbing. Also, for any dyadic rationals $\alpha, \beta$ with $\alpha + \beta < 1$ we have and $W_\alpha + W_\beta$
$\alpha > c$ for some dyadic rational $\alpha$, and therefore $x_0 \notin \mathrm{cl}(W_\alpha)$. The complement of $\mathrm{cl}(W_\alpha)$ is an open set on which $p(\cdot) \ge \alpha > c$. Thus the set $\{x \in X : p(x) > c\}$ is open for any $c$.

27.34. Proof of (3) $\Rightarrow$ (2). Let $\sigma$ be a lower semicontinuous [F-]seminorm on $X$. The linear subspace $K = \sigma^{-1}(0) = \{x \in X : \sigma(x) \le 0\}$ is closed. Let $Q = X/K$ be the quotient space, and let $\pi : X \to Q$ be the canonical map. Then an [F-]norm $\bar{\sigma}$ is defined on $Q$ by $\bar{\sigma}(\pi(x)) = \sigma(x)$. We topologize $Q$ with this [F-]norm. (We do not claim that the
resulting topology is the quotient topology.) Let $C$ be the completion of the [F-]normed space $(Q, \bar{\sigma})$. Let its [F-]norm, an extension of $\bar{\sigma}$, again be denoted by $\bar{\sigma}$; then $(C, \bar{\sigma})$ is a complete [F-]normed space. Let $i : Q \to C$ be the inclusion.
We claim that the composition $i \circ \pi : X \to Q \to C$ has closed graph. To see this, let $((x_\alpha, q_\alpha))$ be any net in the graph of $i \circ \pi$, converging in $X \times C$ to some point $(x, q)$; we shall show that $(x, q)$ actually lies in the graph of $i \circ \pi$. Let any number $\varepsilon > 0$ be given; it suffices to show that $\bar{\sigma}(q - (i \circ \pi)(x)) < 2\varepsilon$. Since $C$ is the completion of $Q$, there is some $q' \in Q$ with $\bar{\sigma}(q' - q) < \varepsilon$. Since $Q = \pi(X)$, we may choose some $x' \in \pi^{-1}(q')$. Now compute
\begin{align*}
\bar{\sigma}\big(q - (i \circ \pi)(x)\big) - \varepsilon
&\le \bar{\sigma}(q - q') + \bar{\sigma}(q' - \pi(x)) - \varepsilon
< \bar{\sigma}(q' - \pi(x)) \\
&= \bar{\sigma}(\pi(x') - \pi(x)) = \sigma(x' - x)
\le \liminf_\alpha \sigma(x' - x_\alpha) \\
&= \liminf_\alpha \bar{\sigma}(\pi(x' - x_\alpha))
= \liminf_\alpha \bar{\sigma}(q' - q_\alpha)
= \bar{\sigma}(q' - q) < \varepsilon .
\end{align*}
Hence the linear map $i \circ \pi$ does indeed have closed graph. By (3), then, $i \circ \pi$ is continuous. Hence $\pi : X \to (Q, \bar{\sigma})$ is continuous; hence $\sigma$ is continuous on $X$.

27.35. Proof of (5) $\Rightarrow$ (1). We shall make the proof slightly longer but easier to understand by splitting it into two parts. We first prove (5) $\Rightarrow$ (1) under the additional assumption that $X$ is a Hausdorff space. Let $(V_k)$ be a closed [convex] string in $X$; we wish to show that the $V_k$'s are neighborhoods of 0. We shall construct (i) a [locally convex] topological vector space 3, and (ii) a sequence (fJk) of neighborhoods of 0 in 3, and set (iii) a family of continuous linear maps
$0\} \cup \{(x, 0) \in \mathbb{R}^2 : x \ge 0\}$ is a convex set in $\mathbb{R}^2$ that is not equal to an intersection of half-spaces (easy exercise).

BILINEAR PAIRINGS

28.6. Definitions. A bilinear pairing will mean a triple $(X, Y, \langle\ ,\ \rangle)$ where $X$ and $Y$ are vector spaces over the scalar field $\mathbb{F}$ (without any topologies necessarily specified) and $\langle\ ,\ \rangle$ is a bilinear map from $X \times Y$ into $\mathbb{F}$ (defined as in 11.7). We may abbreviate this arrangement by $\langle X, Y \rangle$.
When $\langle X, Y \rangle$ is a bilinear pairing, then an associated bilinear pairing $[Y, X]$ can be defined by $[y, x] = \langle x, y \rangle$. However, we shall usually use the same symbol $\langle\ ,\ \rangle$ for both of these functions. Thus, $\langle\ ,\ \rangle$ represents two functions, one from $X \times Y$ into $\mathbb{F}$ and the other from $Y \times X$ into $\mathbb{F}$, related by $\langle x, y \rangle = \langle y, x \rangle$. This ambiguity in our notation should not cause any difficulty.
Let $\langle X, Y \rangle$ be a bilinear pairing. Then each $y \in Y$ acts as a linear map $\langle\ , y \rangle : X \to \mathbb{F}$; thus $Y$ acts as a collection of functions on $X$. Observe that this collection of functions separates points of $X$ (in the sense of 2.6) if for each pair of distinct points $x_1, x_2$ in $X$, there exists at least one $y \in Y$ such that $\langle x_1, y \rangle \ne \langle x_2, y \rangle$ or, equivalently (since the $y$'s act as linear maps), if for each point $x \ne 0$ in $X$, there exists at least one $y \in Y$ such that $\langle x, y \rangle \ne 0$. This condition may or may not be satisfied. If it is satisfied, then the elements of $X$ act as different members of $\mathrm{Lin}(Y, \mathbb{F}) = \{\text{linear functionals on } Y\}$, and so we may view $X$ as a linear subspace of $\mathrm{Lin}(Y, \mathbb{F})$.
Similarly, the points of $X$ may or may not separate the points of $Y$; if they do, then we may view $Y$ as a linear subspace of $\mathrm{Lin}(X, \mathbb{F})$. We shall say that $\langle X, Y \rangle$ is a separated pairing if each of the sets $X$, $Y$ separates the points of the other set.
Remarks. What we have called a "separated pairing" is called a "dual pairing" in many other texts, which assume the separation property throughout the entire development of duality theory. We have deviated from that conventional terminology to clarify just where the separation property is or is not needed. Admittedly, most pairings arising naturally in applications are separated, but a few are not; see 28.7.b.
28.7. Examples.
a. Let $X = Y = \{\text{continuous functions from } [0, 1] \text{ into } \mathbb{F}\}$, and let $\langle x, y \rangle = \int_0^1 x(t) y(t)\, dt$. Then $\langle X, Y \rangle$ is a separated pairing.
b. Let $X = Y = \{\text{piecewise continuous functions from } [0, 1] \text{ into } \mathbb{F}\}$ (defined as in 19.28), and let $\langle x, y \rangle = \int_0^1 x(t) y(t)\, dt$. Then $\langle X, Y \rangle$ is a bilinear pairing, but it is not separated. For instance, if $x$ is the characteristic function of a nonempty finite set, then elements of $Y$ do not distinguish $x$ from 0.
c. If $X$ is any linear space, and $Y$ is any linear subspace of $\mathrm{Lin}(X, \mathbb{F}) = \{\text{linear functionals on } X\}$, then the evaluation map $\langle x, y \rangle = y(x)$ defines a bilinear pairing. With this pairing, $X$ separates points of $Y$, but $Y$ does not necessarily separate points of $X$, so $\langle\ ,\ \rangle$ is not necessarily a separated pairing.
d. The preceding case arises, in particular, if $X$ is a topological vector space and $Y = X^*$ is its topological dual. We note several subcases:
(i) If $X$ is not Hausdorff, then $X^*$ does not separate points of $X$.
(ii) If $X$ is a Hausdorff locally convex space, then $X^*$ does separate points of $X$ by (HB22) in 28.4, and so $\langle X, X^* \rangle$ is a separated pairing.
(iii) If $X$ is a Hausdorff topological vector space that is not locally convex, then $X^*$ may or may not separate points of $X$. We saw examples of those two cases in 26.17 and 26.16.
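The claims in 28.7.a and 28.7.b can be checked by short computations. The sketch below is an illustration only; the choice $y = \bar{x}$ in the first display and the single point $t_0 \in [0, 1]$ in the second are assumptions made here, not part of the text.

```latex
% 28.7.a: if x is continuous and x \ne 0, pair it with y = \bar{x},
% which is also continuous (and equals x when \mathbb{F} = \mathbb{R}).
\[
  \langle x, \bar{x} \rangle \;=\; \int_0^1 |x(t)|^2 \, dt \;>\; 0 ,
\]
% because |x|^2 is continuous, nonnegative, and strictly positive
% on some subinterval of [0, 1].

% 28.7.b: let x = 1_{\{t_0\}} be the characteristic function of one point.
\[
  \langle 1_{\{t_0\}}, y \rangle \;=\; \int_0^1 1_{\{t_0\}}(t)\, y(t) \, dt \;=\; 0
  \qquad \text{for every piecewise continuous } y ,
\]
% because the integrand vanishes except at the single point t_0.
```

In the first case no nonzero $x$ is annihilated by all of $Y$; by symmetry the same holds with the roles of $X$ and $Y$ exchanged, so the pairing is separated. In the second case the nonzero element $1_{\{t_0\}}$ pairs to 0 with every $y$, so $Y$ does not separate points of $X$.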
28.8. Definition. Let $\langle X, Y \rangle$ be a bilinear pairing. Let $\mathcal{S}$ be a collection of subsets of $Y$. Each element of $X$ may be viewed as a mapping from $Y$ into the scalar field $\mathbb{F}$, and so we may topologize $X$ with the topology of uniform convergence on elements of $\mathcal{S}$, as defined in 18.26. We may refer to that as the $\mathcal{S}$-topology.
Proposition. Suppose that $\mathcal{S}$ satisfies these conditions:
(i) Each set $S \in \mathcal{S}$ is pointwise bounded, in the following sense: For each $x \in X$, the set of scalars $\langle x, S \rangle = \{\langle x, s \rangle : s \in S\}$ is bounded. (This condition is reformulated in 28.12.b.)
(ii) $\mathcal{S}$ is directed by inclusion - i.e., the union of any two members of $\mathcal{S}$ is contained in some member of $\mathcal{S}$.
(iii) If $S \in \mathcal{S}$ and $r$ is a nonzero scalar, then $rS \in \mathcal{S}$.
Then the $\mathcal{S}$-topology makes $X$ into a locally convex topological vector space. Furthermore, $\{p_S : S \in \mathcal{S}\}$ is a gauge that determines that topology, and $\{S^{\triangleright} : S \in \mathcal{S}\}$ is a neighborhood base at 0 for that topology, where
$$p_S(x) = \sup_{s \in S} |\langle x, s \rangle|, \qquad S^{\triangleright} = \bigcap_{s \in S} \{x \in X : |\langle x, s \rangle| \le 1\}.$$
(The "polar" sets $S^{\triangleright}$ will be studied further in 28.25 and thereafter.) Hints: See 27.9.f, 27.9.h, and 27.9.c.
Remarks. Condition (i) is essential for a TVS topology, as we saw in 27.9.f. Conditions (ii) and (iii) are not so essential, but they are quite convenient; they yield the characterization of neighborhoods in terms of polars. Moreover, conditions (ii) and (iii) are not really restrictive: If (i) is satisfied, then we can replace $\mathcal{S}$ with a larger collection yielding the same $\mathcal{S}$-topology and also satisfying (ii) and (iii). We saw this for (ii) in 27.9.b; for (iii), replace $\mathcal{S}$ with the collection $\mathcal{T} = \{rS : r > 0,\ S \in \mathcal{S}\}$.

28.9. Preview and definitions. Following are four important cases of collections $\mathcal{S}$ satisfying the conditions of 28.8.
a. If $\mathcal{S} = \{\text{finite subsets of } Y\}$, then the $\mathcal{S}$-topology is denoted by $\sigma(X, Y)$ or, more briefly, $\sigma$ or $w$. (The $\sigma$ and $w$ stand for "simple" and "weak.") It has many names - it is called the weak topology, the $Y$-topology, the $Y$-weak topology, the topology of simple convergence, or the topology of pointwise convergence.
In an analogous fashion, we define the $\sigma(Y, X)$ topology on $Y$ - that is, the topology on $Y$ given by convergence on points of $X$. It makes $Y$ into a topological vector space, so it can be used to specify certain kinds of subsets of $Y$ - for instance, the $\sigma(Y, X)$-compact sets. These are used to define some other topologies on $X$, described below:
b. If $\mathcal{S} = \{\text{pointwise bounded subsets of } Y\}$, where "pointwise bounded" is defined as in 28.8(i), then the resulting $\mathcal{S}$-topology is called the strong topology on $X$; it is denoted by $\beta(X, Y)$. (The $\beta$ stands for "bounded.") Clearly, this collection $\mathcal{S}$ is the
largest collection of subsets of $Y$ that satisfies the conditions of 28.8, so $\beta(X, Y)$ is the strongest topology that can be constructed as in 28.8.
c. Let $\mathcal{S}_1 = \{\sigma(Y, X)\text{-compact, convex subsets of } Y\}$ and $\mathcal{S}_2 = \{\sigma(Y, X)\text{-compact, convex, balanced subsets of } Y\}$. It can be shown that these two collections satisfy the requirements of 28.8 and that furthermore they yield the same $\mathcal{S}$-topology. (We shall not prove those assertions since they are not needed later in this book.) This topology is called the Mackey topology and is denoted by $\tau(X, Y)$.
d. Actually, every locally convex topology can be viewed as an $\mathcal{S}$-topology, by taking $\mathcal{S}$ to be the equicontinuous subsets of $X^*$; see 28.28.
The set $X$ equipped with the weak, strong, or Mackey topology may be denoted $X_\sigma$, $X_\beta$, or $X_\tau$ respectively. The topological vector space $X_\sigma$ may also be denoted $X_w$ in some contexts. It is easy to see that $\sigma(X, Y) \subseteq \tau(X, Y) \subseteq \beta(X, Y)$, since a larger collection $\mathcal{S}$ yields a stronger (i.e., larger) $\mathcal{S}$-topology.
This chapter is concerned primarily with the weak topology. The Mackey and strong topologies are important in distribution theory, but they will not be considered in great depth in this book; we introduce them mainly for the sake of some information they yield about the weak topology. (See also 28.17.a.)

28.10. Retopologizations. We now describe one of the most important ways to form $\mathcal{S}$-topologies. Let $X$ be a vector space. Let $\gamma$ be a topology that makes $X$ into a topological vector space; let $X_\gamma$ denote the vector space $X$ equipped with that given topology. (The $\gamma$ stands for "given," if you like.) Let $(X_\gamma)^*$ be its dual - i.e., the set of all continuous linear maps from $X_\gamma$ into $\mathbb{F}$. Then $\langle X, (X_\gamma)^* \rangle$ is a bilinear pairing (not necessarily separated). It can be used to define more topologies on $X$, most notably
the weak topology $\sigma = \sigma(X, (X_\gamma)^*)$,
the strong topology $\beta = \beta(X, (X_\gamma)^*)$, and
the Mackey topology $\tau = \tau(X, (X_\gamma)^*)$.
In this context, we may call $\gamma$ the given topology or the original topology. (Some mathematicians also call it the initial topology, but we prefer to reserve that term for the kind of topology introduced in 9.16.)
Caution: Because the two topologies used most often are $\gamma$ and $\sigma$, the beginner who studies only these two topologies may be tempted to call $\gamma$ the "strong" topology, to contrast it with the "weak" topology $\sigma$. However, the term "strong" customarily refers to the topology $\beta(X, (X_\gamma)^*)$. The strong topology $\beta(X, (X_\gamma)^*)$ is at least as strong as the given topology $\gamma$, and in some cases it is strictly stronger.
We now summarize the relations between these important topologies: If $X_\gamma$ is a TVS with dual $(X_\gamma)^*$, then
$$\sigma(X, (X_\gamma)^*) \ \subseteq\ \gamma \ \subseteq\ \tau(X, (X_\gamma)^*) \ \subseteq\ \beta(X, (X_\gamma)^*).$$
These inclusions are justified by conclusions in 28.13.b and 28.30.
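The weakest of these comparisons, that the weak topology is no stronger than the given topology, follows directly from the definitions. The sketch below spells out that step for illustration; the notation $S_y(\varepsilon)$ is introduced here only for the computation.

```latex
% Basic weak neighborhoods of 0, for y \in (X_\gamma)^* and \varepsilon > 0:
\[
  S_y(\varepsilon) \;=\; \{ x \in X : |y(x)| \le \varepsilon \}
  \;\supseteq\; y^{-1}\bigl( \{ c \in \mathbb{F} : |c| < \varepsilon \} \bigr) \;\ni\; 0 .
\]
% Each y is \gamma-continuous, so the preimage on the right is \gamma-open.
% Hence every S_y(\varepsilon), and every finite intersection
\[
  S_{y_1}(\varepsilon_1) \cap \cdots \cap S_{y_n}(\varepsilon_n),
\]
% is a \gamma-neighborhood of 0.
```

Since both topologies are translation-invariant, every $\sigma(X, (X_\gamma)^*)$-open set is $\gamma$-open; that is, $\sigma(X, (X_\gamma)^*) \subseteq \gamma$.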
WEAK TOPOLOGIES

28.11. Characterizations of the weak topology. Let $\langle X, Y \rangle$ be a bilinear pairing. As we stated in 28.9.a, the $\sigma(X, Y)$ topology on $X$ is the topology of pointwise convergence on members of $Y$. Thus, a net $(x_\alpha)$ converges to a limit $x$ in the topological space $(X, \sigma(X, Y))$ if and only if $\langle x_\alpha, y \rangle \to \langle x, y \rangle$ for each $y \in Y$. This topology can also be characterized in other ways:
a. $\sigma(X, Y)$ is the initial topology (in the sense of 9.15, 9.16, and 15.24) generated by elements of $Y$. In other words, it is the weakest topology that makes all the mappings $\langle\ , y \rangle : X \to \mathbb{F}$ continuous.
b. One gauge that determines the topology $\sigma(X, Y)$ is the collection of seminorms $\{p_y : y \in Y\}$, where $p_y(x) = |\langle x, y \rangle|$.
c. A neighborhood subbasis at 0 for this topology is given by the sets
$$S_y(\varepsilon) = \{x \in X : |\langle x, y \rangle| \le \varepsilon\}, \qquad \text{for } y \in Y,\ \varepsilon > 0.$$
That is, a set is a neighborhood of 0 in this topology if and only if that set contains the intersection of finitely many sets of the form $S_y(\varepsilon)$, for various $y$'s and $\varepsilon$'s.

28.12. Basic properties of the weak topology. Let $\langle X, Y \rangle$ be a bilinear pairing, and let $\sigma = \sigma(X, Y)$ be the resulting weak topology. Show that
a. $X_\sigma$ is a locally convex topological vector space.
b. A set $B \subseteq X$ is weakly bounded (i.e., bounded in the topological vector space $X_\sigma$, in the sense of 27.2) if and only if each $y$ in $Y$ is a bounded function on $B$ - that is, if and only if $\sup_{b \in B} |\langle b, y \rangle| < \infty$ for each $y \in Y$. (Thus, the "bounded pointwise" requirement introduced in 28.8(i) is the requirement that each $S \in \mathcal{S}$ be $\sigma(Y, X)$-bounded.)
c. Every member of $Y$ is a continuous linear map from $X_\sigma$ into $\mathbb{F}$. Thus, $y \mapsto \langle\ , y \rangle$ is a linear mapping from $Y$ into $(X_\sigma)^*$.
d. That mapping $y \mapsto \langle \cdot, y \rangle$, from $Y$ into $(X_\sigma)^*$, is surjective. That is, every continuous linear map $\Lambda : X_\sigma \to \mathbb{F}$ is represented by at least one member of $Y$. Hints: $\{x \in X : |\Lambda(x)| \le 1\}$ is a $\sigma$-neighborhood of 0. Use 28.11.c to show that there exists a finite set $F$
$\infty$. Hints: We know $X^*$ is infinite-dimensional, by Kottman's Theorem (23.22). Let $H$ be a vector basis for $X^*$; then $H$ is an infinite set. Let $\mathcal{F} = \{\text{finite subsets of } H\}$, directed by inclusion. For each $F \in \mathcal{F}$, choose some $v \in H \setminus F$; then use the Common Kernel Lemma (11.16)(Qk) to find some vector in $X$ that vanishes on $F$ but not on $v$. Let $x_F$ be a suitable scalar multiple of that vector, chosen so that $\|x_F\| \ge \mathrm{card}(F)$.
Corollaries. Suppose $(X, \|\ \|)$ is an infinite dimensional normed space. Then:
a. The weak closure of the unit sphere $S = \{x \in X : \|x\| = 1\}$ is the unit ball $B = \{x \in X : \|x\| \le 1\}$.
Hints: Suppose $\|z\| < 1$. Choose a net $(x_F)$ as in the preceding proposition. Let $z_F = z + r_F x_F$ for some real number $r_F$; show that a suitable choice of $r_F$ yields $z_F \in S$ and $|r_F| \le (1 + \|z\|)/\|x_F\| \to 0$. Hence $(z_F)$ is a net in $S$ converging weakly to $z$.
b. If $S \subseteq X$ is bounded, then $\sigma\text{-int}(S) = \varnothing$. That is, in the weak topology, any bounded set has empty interior. Hints: Suppose $p$ is in the weak interior of $S$. Replacing $S$ with $S - p$, we may assume $p = 0$. There is a net that converges weakly to 0 but stays out of the bounded set $S$.
c. The weak topology of $X$ is not metrizable. (Of course, certain small subsets of $X$, equipped with their relative topologies, may be metrizable.) Hint: If $d$ is a metric for that topology, use the preceding proposition to find a sequence $(x_n)$ satisfying $d(0, x_n) < 1/n$ and $\|x_n\| > n$. This contradicts 28.14.e.

28.19. Proposition. Let $X$ be a locally uniformly convex Banach space - see 22.38. Let $(x_\alpha)$ be a net in $X$, and also let $x_\infty \in X$. Then the following are equivalent:
(A) $\|x_\alpha - x_\infty\| \to 0$.
(B) $x_\alpha \to x_\infty$ weakly and $\limsup \|x_\alpha\| \le \|x_\infty\|$.
Proof. The proof of (A) $\Rightarrow$ (B) is trivial. Assume (B); we shall prove (A). Note that $x_\alpha \to x_\infty$ weakly implies $\|x_\infty\| \le \liminf_\alpha \|x_\alpha\|$, and thus $\|x_\alpha\| \to \|x_\infty\|$. We may assume $x_\infty \ne 0$ (why?), and that all the $x_\alpha$'s are also nonzero (why?). Replacing $x_\alpha$ with $x_\alpha / \|x_\alpha\|$, we may assume $\|x_\alpha\| = 1$ for all $\alpha$, and $\|x_\infty\| = 1$ (explain). By the Hahn-Banach Theorem (HB8) in 23.18, there exists some $\lambda \in X^*$ such that $\|\lambda\| = \lambda(x_\infty) = 1$. Then $2 \ge \|x_\alpha + x_\infty\| \ge |\lambda(x_\alpha + x_\infty)| \to 2$. Thus $\|x_\alpha + x_\infty\| \to 2$. By local uniform convexity, $x_\alpha \to x_\infty$.

28.20. Recall from 23.10 that the dual of the Banach space $\ell^1$ is $\ell^\infty$. Hence a net $(x_\alpha)$ converges weakly in $\ell^1$ to a limit $x_\infty$ if and only if $\sum_{j=1}^\infty x_{\alpha,j} z_j \to \sum_{j=1}^\infty x_{\infty,j} z_j$ for each $z = (z_1, z_2, z_3, \ldots)$ in $\ell^\infty$. The space $\ell^1$ has an unusual property, not shared by most Banach spaces.
Schur's Theorem. Let $(x_n)$ be a sequence converging to a limit $x_\infty$ in the weak topology of $\ell^1$. Then also $x_n \to x_\infty$ in the norm topology.
Remarks. We emphasize that Schur's Theorem applies only to sequences, not to nets. That is clear from 28.18, for instance. The proof below is direct. Some mathematicians may prefer a proof using Baire category, such as that given by Conway [1969].
Outline of proof of theorem. Assume that $x_n \to x_\infty$ weakly but not in norm; we shall obtain a contradiction. Say $x_n = (x_{n,1}, x_{n,2}, x_{n,3}, \ldots)$. Then:
a. We may assume $x_\infty = 0$. (Replace each $x_n$ with $x_n - x_\infty$.)
b. We may assume $\|x_n\| = 1$ for all $n$. Hints: The sequence $(\|x_n\| : n \in \mathbb{N})$ is bounded and does not converge to 0. Replacing $(x_n)$ with a subsequence, we may assume that
the numbers $\|x_n\|$ are all positive and converge to some positive number $c$. Since the weak topology makes $\ell^1$ a topological vector space, we may replace each $x_n$ with $x_n / \|x_n\|$ (explain).
c. For each finite set $S$
$2/3$.
e. There exists some $z \in \ell^\infty$ such that $\|z\|_\infty = 1$ and $z_j x_{n,j} = |x_{n,j}|$ whenever $j \in S(n)$.
f. $\left| \sum_{j=1}^\infty x_{n,j} z_j \right| > 1/3$ for all $n$, contradicting the fact that $x_n \to x_\infty$ weakly.

28.21. Let $[a, b]$ be a compact interval in $\mathbb{R}$, and let $C[a, b] = \{\text{continuous scalar-valued functions on } [a, b]\}$; this is a Banach space with the sup norm. We emphasize that the following result is for sequences, not for nets.
Proposition. In $C[a, b]$, a sequence $(f_n)$ converges weakly to a limit $f$ if and only if the sequence $(f_n)$ is uniformly bounded and $f_n \to f$ pointwise on $[a, b]$.
Proof. If $f_n \to f$ weakly, then $(f_n)$ is bounded by 28.14.e and each pointwise evaluation mapping $g \mapsto g(t)$ is a continuous linear functional on $C[a, b]$, so $f_n \to f$ pointwise. Conversely, suppose $f_n \to f$ pointwise and boundedly. By 29.34, each continuous linear functional on $C[a, b]$ is represented by a scalar-valued measure $\mu$ on the Borel sets; it suffices to show that $\int f_n \, d\mu \to \int f \, d\mu$. If the scalar field is $\mathbb{C}$, we may consider the real and imaginary parts of $\mu$; thus it suffices to consider real-valued $\mu$. By the Jordan Decomposition, it suffices to consider finite positive measures $\mu$. Then $\int f_n \, d\mu \to \int f \, d\mu$ by the Dominated Convergence Theorem (22.29).

28.22. When no topology is specified for $X^*$, then $X^*$ is generally understood to be equipped with its norm topology, using the operator norm as in 23.7. That topology on $X^*$ is usually used to define the second dual - i.e., the vector space $X^{**}$. In addition to the norm topology, two other topologies on $X^*$ that are occasionally useful are the weak topology $\sigma(X^*, X^{**})$ and the weak-star topology $\sigma(X^*, X)$.
Exercises.
a. The weak topology $\sigma(X^*, X^{**})$ and the weak-star topology $\sigma(X^*, X)$ are Hausdorff, locally convex topologies on $X^*$. The weak-star topology is weaker than (or equal to) the weak topology. (In 28.41(B), we shall consider the conditions under which these two topologies are equal.)
b. The norm-closed unit ball, $\{v \in X^* : \|v\| \le 1\}$, is closed in both the weak and weak-star topologies.

28.23. Example. Let $X = c_0 = \{\text{sequences of scalars converging to } 0\}$; we have seen in 23.10 that $X^* = \ell^1$ and $X^{**} = \ell^\infty$. Recall from 21.11.b that a probability measure on $\mathbb{N}$ is a sequence $(p_n)$ with $p_n \ge 0$ for all $n$ and $\sum_{n=1}^\infty p_n = 1$. Let $P$ be the set of all such probability measures. Show that $P$ is a closed convex subset of $\ell^1$, when that space is given its norm topology. Hence $P$ is also
weakly closed - i.e., closed in the $\sigma(\ell^1, \ell^\infty)$ topology - by 28.14.a. However, $P$ is not weak-star closed - that is, $P$ is not closed in the $\sigma(\ell^1, c_0)$ topology. Indeed, we have $0 \notin P$, but the sequence
$$(1, 0, 0, 0, 0, \ldots),\ (0, 1, 0, 0, 0, \ldots),\ (0, 0, 1, 0, 0, \ldots),\ (0, 0, 0, 1, 0, \ldots),\ \ldots$$
is easily shown to be weak-star convergent to 0.

28.24. As we have shown above, a normed space, equipped with a weak or weak-star topology, generally is not metrizable. Nevertheless, certain subsets of that nonmetrizable TVS may be metrizable, when equipped with the relative topology. Here are two particularly important special cases. Let $V$ be a normed space.
a. If $V$ is separable, then on any norm-bounded subset of $V^*$, the relative topology determined by the weak-star topology is metrizable.
b. If $V^*$ is separable, then on any norm-bounded subset of $V$, the relative topology determined by the weak topology is metrizable.
Hints: Show that convergence in the bounded set, pointwise on the separable space mentioned, is equivalent to convergence pointwise on some dense subset of that separable space, since the set is bounded. Convergence on a countable set can be determined by a countable collection of seminorms.
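The weak-star convergence claimed in Example 28.23 can be verified in one line, and the same computation shows why the sequence does not converge weakly. This is an illustrative sketch; the names $e_n$ for the $n$th sequence displayed there and $\mathbf{1}$ for the constant sequence are notation introduced here.

```latex
\begin{align*}
e_n &= (0, \ldots, 0, 1, 0, \ldots) \in P
      && \text{(the 1 in position } n\text{)}, \\
\langle e_n, z \rangle &= z_n \to 0
      && \text{for each } z = (z_1, z_2, \ldots) \in c_0, \\
\langle e_n, \mathbf{1} \rangle &= 1 \ \text{ for all } n
      && \text{where } \mathbf{1} = (1, 1, 1, \ldots) \in \ell^\infty .
\end{align*}
```

The second line says $e_n \to 0$ in $\sigma(\ell^1, c_0)$, so $0$ lies in the weak-star closure of $P$. The third line says $(e_n)$ does not converge to $0$ in $\sigma(\ell^1, \ell^\infty)$, which is consistent with $P$ being weakly closed.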
POLAR ARITHMETIC AND EQUICONTINUOUS SETS

28.25. Definition. Let $\langle X, Y \rangle$ be a bilinear pairing. For each set $R \subseteq X$, we define the polar of $R$ to be the set
$$R^{\triangleleft} = \{y \in Y : |\langle x, y \rangle| \le 1 \text{ for all } x \in R\}.$$
Similarly, we may define the polar of any set $S \subseteq Y$ to be the set
$$S^{\triangleright} = \{x \in X : |\langle x, y \rangle| \le 1 \text{ for all } y \in S\}.$$
These operations are a special case of 4.10(D).
Caution: Notations differ. For instance, some mathematicians call the objects above the absolute polars of $R$ and $S$, and use $\mathrm{Re}\langle x, y \rangle$ instead of $|\langle x, y \rangle|$ to define "polar." Moreover, among many mathematicians, $R^{\triangleleft}$ and $S^{\triangleright}$ are denoted by $R^\circ$ and $S^\circ$; we have introduced separate notations to reduce confusion among beginners.

28.26. Elementary properties. We state results mainly for $\triangleleft$; analogous results obviously hold for $\triangleright$.
a. $\varnothing^{\triangleleft} = Y$, $X$
$v_\alpha(x)$. In particular, for each $S \in \mathcal{P}(\Omega)$ we have $\mu(S) = v_0(1_S) = \lim_{\alpha \in A} 1_S(\omega(\alpha))$. Then $\mu$ is a charge on the measurable space $(\Omega, \mathcal{P}(\Omega))$, and its range is contained in $\{0, 1\}$ since a net of 0s and 1s can only have limit 0 or 1. Moreover, $\mu(F) = \lim_{\alpha \in A} 1_F(\omega(\alpha)) = 1$. The preceding conclusions about $\mu$ are valid for each $F \in \mathcal{F}$; in particular $\mu(\Omega) = 1$. Thus $\mu$ is a two-valued probability charge that takes the value 1 on $\mathcal{F}$, so $\mu$ is the characteristic function of an ultrafilter that contains $\mathcal{F}$. (This proof is modified from Luxemburg [1969].)
Remark. The Ultrafilter Principle and its consequences are needed frequently in duality theory and will be used heavily throughout the remainder of this chapter. Hereafter we shall use the Ultrafilter Principle freely; we shall discontinue our past practice of keeping track of its uses and its equivalents.

28.30. Let $X$ be a topological vector space with topology $\gamma$ and with dual $X^*$. Consider polars with respect to the bilinear pairing $\langle X, X^* \rangle$. Let $S$
is a neighborhood of 0.
(B) $S$ is contained in some $\sigma(X^*, X)$-compact, convex, balanced subset of $X^*$.
(C) $S$ is contained in some $\sigma(X^*, X)$-compact subset of $X^*$.
(D) $S$ is $\sigma(X^*, X)$-bounded; that is, $S^{\triangleright}$ is absorbing.
Proposition. In any TVS $X_\gamma$ we have (A) $\Rightarrow$ (B) $\Rightarrow$ (C) $\Rightarrow$ (D). Moreover, suppose $X_\gamma$ is a locally convex space. Then $X_\gamma$ is barrelled if and only if (D) $\Rightarrow$ (A). In other words, a locally convex space $X$ is barrelled if and only if it satisfies this condition (compare with 27.27(B5)):
(B5') Another Uniform Boundedness Property. Let a collection of continuous linear maps from $X$ into the scalar field be bounded pointwise. Then that collection is equicontinuous.
Proof. For (A) $\Rightarrow$ (B), note that $S \subseteq (S^{\triangleright})^{\triangleleft}$, and use 28.26.d and (UF27) in 28.29. The implication (B) $\Rightarrow$ (C) is trivial. The implication (C) $\Rightarrow$ (D) is just 27.3.c.
To show that (D) $\Rightarrow$ (A) in any barrelled space, note that $S^{\triangleright}$ is a $\sigma(X, X^*)$-closed, convex, balanced subset of $X$; hence it is also $X_\gamma$-closed (see 28.13.b). If $S^{\triangleright}$ is absorbing, then it is a $\gamma$-barrel, hence a $\gamma$-neighborhood of 0.
On the other hand, suppose that (D) $\Rightarrow$ (A); let us show $X_\gamma$ is barrelled. Let $R$
(F) Any nonempty, closed, convex subset of $X$ contains at least one point of minimum norm. (Compare with 22.39(E).)

(G) For each $f \in X^*$, we have $\|f\| = \max\{|f(x)| : x \in B\}$.

(H) $B$ is weakly complete - i.e., complete when equipped with the uniformity of the weak topology.

(I) The weak topology on $X$ is quasicomplete (as defined in 27.3.f).

Remarks. In condition (G), we emphasize that a maximum is given, not just a supremum. Contrast this with 23.7 and (HB8) in 23.18. Also, we note that some of the conditions are purely topological - i.e., they are unaffected if we replace the given norm on $X$ with some equivalent norm. Therefore all the conditions above are purely topological. Thus, reflexivity should not be viewed as a property of certain normed vector spaces; rather, it is a property of certain topological vector spaces whose topologies are normable.

In condition (A), we emphasize that the isomorphism between $X$ and $X^{**}$ cannot be just any isomorphism; it must be given by the canonical embedding of $X$ into $X^{**}$, which was described in 9.57 and 23.20. It can be shown that the space $J$ introduced in 22.26 is isomorphic to $J^{**}$ - i.e., there exists a linear homeomorphism between $J$ and $J^{**}$ - but that isomorphism is not given by the canonical embedding, and in fact $J$ does not satisfy any of the equivalent conditions listed above. This was proved by James [1951]; additional discussion on this subject can be found in James [1982].

Proof of (A) $\Rightarrow$ (B). Obvious.

Proof of (B) $\Rightarrow$ (C). By the Alaoglu Theorem (UF28) in 28.29.

Proof of (C) $\Rightarrow$ (D). Any closed, convex set is weakly closed and contained in the compact set $rB$ for some $r > 0$.

Proof of (D) $\Rightarrow$ (E). Let $r = \mathrm{dist}(x, Q)$. Then the sets $S_n = \{q \in Q : \|x - q\| \le r + \tfrac{1}{n}\}$ are closed, convex, and bounded, hence weakly compact. Since $S_1 \supseteq S_2 \supseteq S_3 \supseteq \cdots$ and each $S_n$ is nonempty, the intersection of the $S_n$'s is nonempty. Any point $q$ in that intersection satisfies $\|x - q\| = r$.

Proof of (E) $\Rightarrow$ (F). Take $x = 0$.

Proof of (F) $\Rightarrow$ (G). We know that $\|f\| = \sup\{\mathrm{Re}\, f(x) : x \in B\}$, and we wish to show that that supremum is actually a maximum. We may assume $f \ne 0$. Let $Q = \{x \in X : \mathrm{Re}\, f(x) \ge \|f\|\}$. Let $q$ be a member of $Q$ with smallest norm. It suffices to show that $\|q\| \le 1$. (Some readers may wish to take that assertion as an exercise before reading further.) Choose a sequence $(b_n)$ with $\|b_n\| = 1$ and $\mathrm{Re}\, f(b_n) \to \|f\|$. Let $q_n = (\mathrm{Re}\, f(b_n))^{-1} \|f\|\, b_n$; then $q_n \in Q$. Since $q$ is the member of $Q$ with smallest norm, $\|q_n\| \ge \|q\|$ for all $n$. We have $\|q_n\| \to 1$, hence $1 \ge \|q\|$.

Proof of (G) $\Rightarrow$ (C). Immediate from James's Theorem 28.37.

Proof of (C) $\Rightarrow$ (H). Obvious.
Some Consequences in Banach Spaces
Proof of (H) $\Rightarrow$ (I). The weak and norm topologies have the same bounded sets. Any TVS topology is invariant under multiplication by a positive scalar. Hence it suffices to consider subsets of $B$. Any weakly closed subset of $B$ is (for the weak topology) a closed subset of a complete set, hence complete.

Proof of (I) $\Rightarrow$ (A). Let $\xi \in X^{**}$ be given. By the Goldstine-Weston Theorem, there is some bounded net $(x_\lambda : \lambda \in \Lambda)$ that is $\sigma(X^{**}, X^*)$-convergent to $\xi$. Then $(x_\lambda)$ is $\sigma(X^{**}, X^*)$-Cauchy, hence $\sigma(X, X^*)$-Cauchy, hence $\sigma(X, X^*)$-convergent to some $x \in X$. It follows that $\xi(f) = f(x)$ for all $f \in X^*$.

28.42. Exercise. Let $X$ be a Banach space, with dual $X^*$. Then $X$ is reflexive if and only if $X^*$ is reflexive.

Hints: Let $X^{**}$ and $X^{***}$ be the second and third dual spaces. Let $T : X \to X^{**}$ and $U : X^* \to X^{***}$ be the canonical embeddings; these maps are linear and distance-preserving. We are to show that (i) $T$ is surjective if and only if (ii) $U$ is surjective.

The proof of (i) $\Rightarrow$ (ii) involves little more than unwinding the notation and "chasing some arrows around a diagram." Let $\xi$ be any member of $X^{***}$. The composition $\lambda : X \xrightarrow{\;T\;} X^{**} \xrightarrow{\;\xi\;} \mathbb{F}$ is a member of $X^*$. Using the definitions of $T$ and $U$, verify that $U(\lambda) = \xi$.

The proof of (ii) $\Rightarrow$ (i) is a bit more substantial - it uses the Hahn-Banach Theorem. Suppose $T$ is not surjective. Then $T(X)$ is a proper closed subspace of $X^{**}$; say $\varphi \in X^{**} \setminus T(X)$. By (HB11) in 23.18, there exists some continuous linear functional $\xi \in X^{***}$ that vanishes on $T(X)$ but not on $\varphi$. Unwind the notation to arrive at a contradiction.

28.43. Exercise. Show that the weak topology on $\ell_1$ is sequentially complete but it is not quasicomplete.

28.44. Let $(X, \|\ \|)$ be a real Banach space, with dual space $X^*$. (For simplicity we consider only real scalars.) The normalized duality map of $X$ is the map $J : X \to \{\text{subsets of } X^*\}$ defined by

$$J(x) = \{\lambda \in X^* : \|\lambda\| = \|x\| \text{ and } \lambda(x) = \|x\|^2\}.$$

Such a map will be used in 30.20 and thereafter. Here are some of its basic properties:

a. The set $J(x)$ is nonempty, by (HB8) in 23.18.

b. The set $J(x)$ is convex and weak-star compact (hence also norm-closed). Hint: Show that it is the intersection of the two sets $\{\lambda \in X^* : \|\lambda\| \le \|x\|\}$ and $\{\lambda \in X^* : \lambda(x) \ge \|x\|^2\}$, both of which are convex and weak-star closed. Refer to (UF28) in 28.29.

c. $J(cx) = cJ(x)$ for any real number $c$.

d. If $X^*$ is strictly convex, then $J$ is single-valued - i.e., $J(x)$ is a singleton for each $x \in X$. Hint: $J(x)$ is a convex subset of the sphere $\{\lambda \in X^* : \|\lambda\| = \|x\|\}$; see 22.39(C).

e. Example. If $X$ is a Hilbert space with inner product $\langle\ ,\ \rangle$, then $X^* = X$ (see 28.50), and $J(x)$ is the singleton $\{\lambda_x\}$, where $\lambda_x(u) = \langle x, u \rangle$.
f. Example. Let $X = L^p(\mu)$ for some measure space $(\Omega, \mathcal{S}, \mu)$ and $1 < p < \infty$. Then $X^* = L^q(\mu)$ (see 28.50), and $J(x)$ is a singleton $\{y\}$, where $y \in L^q(\mu)$ is given by

$$y(\omega) = \begin{cases} x(\omega)\,|x(\omega)|^{p-2}\,\|x\|_p^{2-p} & \text{wherever } x(\omega) \ne 0, \\ 0 & \text{wherever } x(\omega) = 0. \end{cases}$$

g. Example. Let $\mu$ be $\sigma$-finite, and let $X = L^1(\mu)$; then $X^* = L^\infty(\mu)$ by 28.51. For any $x \in X$, the set $J(x)$ consists of all measurable real-valued functions $y$ that satisfy these two conditions:

$$y(\omega) = \|x\|_1\, \mathrm{sign}(x(\omega)) \text{ whenever } x(\omega) \ne 0, \qquad |y(\omega)| \le \|x\|_1 \text{ whenever } x(\omega) = 0.$$
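The formula for the duality map on $L^p(\mu)$ can be checked numerically in the simplest setting. The following is only a sketch under an assumption not made in the text: it takes $\mu$ to be counting measure on a finite set, so that $L^p(\mu)$ is $\mathbb{R}^n$ with the $p$-norm. The helper names `p_norm` and `duality_map_lp` are hypothetical.

```python
import math

def p_norm(v, p):
    """The p-norm of a finite sequence (counting measure on a finite set)."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def duality_map_lp(x, p):
    """Return the single element y of J(x), following the displayed formula."""
    nx = p_norm(x, p)
    if nx == 0.0:
        return [0.0] * len(x)
    scale = nx ** (2.0 - p)
    # y(w) = x(w) |x(w)|^(p-2) ||x||_p^(2-p) where x(w) != 0, and y(w) = 0 elsewhere
    return [t * abs(t) ** (p - 2.0) * scale if t != 0 else 0.0 for t in x]

x = [3.0, -1.0, 0.0, 2.0]
p = 3.0
q = p / (p - 1.0)                     # conjugate exponent: 1/p + 1/q = 1
y = duality_map_lp(x, p)
pairing = sum(a * b for a, b in zip(x, y))

print("||y||_q =", p_norm(y, q), " ||x||_p =", p_norm(x, p))
print("y(x)    =", pairing, " ||x||^2 =", p_norm(x, p) ** 2)
```

The two printed lines compare the quantities appearing in the definition of $J(x)$: the norm of $y$ in the dual space against the norm of $x$, and the pairing $y(x)$ against $\|x\|^2$.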
... $> 0$ be given. By the definition of $f$, there is some $N(\varepsilon)$ such that ... whenever $n \ge N(\varepsilon)$. Therefore ... whenever $n \ge N(\varepsilon)$, where $\delta$ is the modulus of convexity. Taking the limsup, we obtain ...
contradicting the fact that I is the minimum value of f. This proves the uniqueness of the asymptotic center.
28.48. Browder-Göhde-Kirk Fixed Point Theorem. Let $C$ be a closed, convex, bounded subset of a uniformly convex Banach space. Let $g : C \to C$ be nonexpansive. Then $g$ has at least one fixed point. In fact, if $x_0$ is any point in $C$, and a sequence $(x_n)$ is defined by $x_{n+1} = g(x_n)$, then the asymptotic center of the sequence $(x_n)$ with respect to $C$ is a fixed point of $g$.

Proof. Let $u$ be that asymptotic center. Since $g$ is nonexpansive, $|x_{n+1} - g(u)| \le |x_n - u|$ for all $n$, and thus $\limsup_n |x_n - g(u)| \le \limsup_n |x_n - u|$. Since $\limsup_n |x_n - (\cdot)|$ achieves its unique minimum at $u$, we have $g(u) = u$.
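A small planar illustration of the theorem follows. It is a sketch under assumptions that are not part of the text: the space is the Euclidean plane (a Hilbert space, hence uniformly convex), $C$ is the closed unit disk, and $g$ is rotation by 90 degrees. The iterates $x_{n+1} = g(x_n)$ then cycle with period 4 and do not converge, yet their asymptotic center, found here by a crude grid search, is the fixed point of $g$. All function names are hypothetical.

```python
import math

def g(point):
    """Rotate a point of the plane by 90 degrees about the origin.

    Rotations are isometries, so g is nonexpansive on the closed unit disk C.
    """
    x, y = point
    return (-y, x)

def orbit(x0, length):
    """Return [x_0, x_1, ..., x_{length-1}] with x_{n+1} = g(x_n)."""
    points = [x0]
    while len(points) < length:
        points.append(g(points[-1]))
    return points

def limsup_distance(u, period):
    """limsup_n |x_n - u| for a periodic orbit: the largest distance over one period."""
    return max(math.dist(u, p) for p in period)

def asymptotic_center(period, radius=1.0, steps=41):
    """Grid search over the disk of the given radius for the minimizer of limsup_distance."""
    best_point, best_value = None, math.inf
    for i in range(steps):
        for j in range(steps):
            u = (-radius + 2.0 * radius * i / (steps - 1),
                 -radius + 2.0 * radius * j / (steps - 1))
            if math.hypot(u[0], u[1]) > radius:
                continue  # stay inside C
            value = limsup_distance(u, period)
            if value < best_value:
                best_point, best_value = u, value
    return best_point, best_value

period = orbit((0.6, 0.0), 4)   # this orbit has period 4 and does not converge
center, value = asymptotic_center(period)
print("asymptotic center (grid approximation):", center)
print("g(center):", g(center))
```

The grid search only approximates the minimizer in general; in this symmetric example the origin happens to lie on the grid.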
28.49. Optional example. We cannot replace "uniformly convex" with "strictly convex"
in the preceding theorem.
Let $C[0,1] = \{\text{continuous scalar-valued functions on } [0,1]\}$; this is a Banach space when equipped with the sup norm. Show that $\|f\|_s = \|f\|_\infty + \|f\|_2$ is a strictly convex norm on $C[0,1]$ that is equivalent to the usual sup norm $\|\ \|_\infty$. Also show that $F = \{f \in C[0,1] : f(0) = 0,\ f(1) = 1,$ and $\mathrm{Range}(f)$ ... $F$ that is nonexpansive when the norm $\|\ \|_s$ is used, but $\varphi$ has no fixed point.
DUALS OF THE LEBESGUE SPACES

28.50. Theorem. Let $p, q \in (1, \infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$. Let $(\Omega, \mathcal{S}, \mu)$ be a measure space (not necessarily finite or $\sigma$-finite). Then the Banach spaces $L^p(\mu)$ and $L^q(\mu)$ are the norm duals of each other, with elements of one space representing elements of the other space's dual by the bilinear pairing

$$[T(y)](x) = \langle x, y \rangle = \int_\Omega x(\omega)\, y(\omega)\, d\mu(\omega).$$

In particular, $L^2(\mu)$ is its own dual. In view of 22.56, any Hilbert space is its own dual.
Remark. Compare this result with the remark at the end of 29.21.

Proof of theorem. We shall show that $(L^p(\mu))^* = L^q(\mu)$. More precisely, for each $y \in L^q(\mu)$, let $T(y)$ be the mapping $\langle \cdot, y \rangle : L^p(\mu) \to \{\text{scalars}\}$ defined above; we shall show that $T$ is an isomorphism from $L^q(\mu)$ onto $(L^p(\mu))^*$.

It follows from Hölder's Inequality that if $y \in L^q(\mu)$, then $T(y) = \langle \cdot, y \rangle$ (defined as above) is a continuous linear functional on $L^p(\mu)$, with operator norm $|||T(y)||| \le \|y\|_q$. To show that we have equality here, define a function $x \in L^p(\mu)$ by choosing $x(\omega)$ to satisfy $|x(\omega)|^p = |y(\omega)|^q$ and $x(\omega) y(\omega) \ge 0$ for all $\omega$. Then we can apply Hölder's Equality (see 22.33); it is easy to verify that $\langle x, y \rangle = \|x\|_p \|y\|_q$; from this it follows that $|||T(y)||| \ge \|y\|_q$.

Thus the mapping $y \mapsto T(y)$ is a norm-preserving linear map from $L^q(\mu)$ into $(L^p(\mu))^*$; it suffices to prove that this map is surjective. Suppose not. Then $T(L^q(\mu))$ is a proper, closed, linear subspace of $(L^p(\mu))^*$. By the Hahn-Banach Theorem (HB23) in 28.4, there is some $\xi_0$ in $(L^p(\mu))^{**}$ that vanishes on $T(L^q(\mu))$ but does not vanish on some $v_0 \in (L^p(\mu))^*$. By 22.41.a and 28.46, $L^p(\mu)$ is reflexive. Thus there is some $x_0 \in L^p(\mu)$ that represents $\xi_0$, in this sense: We have $u(x_0) = \xi_0(u)$ for every $u \in (L^p(\mu))^*$. Define a function $y_0 \in L^q(\mu)$ by taking $|y_0(\omega)|^q = |x_0(\omega)|^p$ for all $\omega$, with $x_0(\omega) y_0(\omega) \ge 0$. Since $\xi_0$ vanishes on $T(L^q(\mu))$, we have

$$0 = \xi_0(T(y_0)) = [T(y_0)](x_0) = \langle x_0, y_0 \rangle = \|x_0\|_p^p = \|\xi_0\|^p.$$

Thus $\xi_0 = 0$, contradicting our assertion that $\xi_0$ does not vanish on some $v_0$.
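The choice of $x$ used to obtain $|||T(y)||| \ge \|y\|_q$ can be tried out on finite sequences. This is a minimal sketch, assuming counting measure on a finite set so that integrals become finite sums; the names `p_norm` and `extremal_partner` are hypothetical and are not taken from the text.

```python
import math

def p_norm(v, p):
    """The p-norm of a finite sequence (counting measure on a finite set)."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def extremal_partner(y, p):
    """Given y and the exponent p, return x with |x|^p = |y|^q and x*y >= 0."""
    q = p / (p - 1.0)
    # |x(w)| = |y(w)|^(q/p), with the sign of y(w) so that x(w) y(w) >= 0
    return [math.copysign(abs(t) ** (q / p), t) for t in y]

y = [2.0, -0.5, 0.0, 1.5]
p = 4.0
q = p / (p - 1.0)
x = extremal_partner(y, p)
pairing = sum(a * b for a, b in zip(x, y))

print("<x, y>          =", pairing)
print("||x||_p ||y||_q =", p_norm(x, p) * p_norm(y, q))
```

With this $x$, the pairing and the product of norms both reduce to $\sum_\omega |y(\omega)|^q$, which is the case of equality in Hölder's inequality.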
28.51. Theorem. Let $(\Omega, \mathcal{S}, \mu)$ be a $\sigma$-finite measure space. Then the dual of $L^1(\mu)$ is $L^\infty(\mu)$.

Proof. Let $\mathbb{F}$ be the scalar field. For any $y \in L^\infty(\mu)$, we may define a mapping $T_y : L^1(\mu) \to \mathbb{F}$ by the rule

$$T_y(x) = \int_\Omega x(\omega)\, y(\omega)\, d\mu(\omega),$$

and obviously $|||T_y||| \le \|y\|_\infty$. Now let any $T \in L^1(\mu)^*$ be given. It suffices to show that (i) $T = T_y$ for some $y \in L^\infty(\mu)$, and (ii) $\|y\|_\infty \le |||T|||$.

Let us first fix our attention on any measurable set $\Omega_0 \subseteq \Omega$ that has finite measure. Let $\mu_0$ denote the restriction of $\mu$ to $\Omega_0$ and its measurable subsets. If $f \in L^1(\mu_0)$, then we may extend $f$ to a member of $L^1(\mu)$ by defining $f = 0$ on $\Omega \setminus \Omega_0$. We have $L^2(\mu_0) \subseteq L^1(\mu_0)$ with continuous inclusion, by Hölder's inequality:

$$\|g\|_1 = \|1 \cdot g\|_1 \le \|g\|_2 \|1\|_2 = \sqrt{\mu(\Omega_0)}\, \|g\|_2.$$

Hence the composition $\varphi : L^2(\mu_0) \xrightarrow{\;\subseteq\;} L^1(\mu_0) \xrightarrow{\;\subseteq\;} L^1(\mu) \xrightarrow{\;T\;} \mathbb{F}$ is continuous, and thus $\varphi$ is a member of $L^2(\mu_0)^*$. However, $L^2(\mu_0)^* = L^2(\mu_0)$ by 28.50. Unwinding the notation, we see that there is a uniquely determined function $y_0 \in L^2(\mu_0)$ that satisfies $T(x) = \int_{\Omega_0} x y_0 \, d\mu_0$ for all $x \in L^2(\mu_0)$.

We claim that $|y_0(\cdot)| \le |||T|||$ almost everywhere on $\Omega_0$. Indeed, suppose on the contrary that $\{\omega \in \Omega_0 : |y_0(\omega)| > |||T|||\}$ has positive measure. Then the set

$$S = \{\omega \in \Omega_0 : |y_0(\omega)| > r\}$$

has positive measure, for some number $r > |||T|||$. Define

$$x(\omega) = \begin{cases} |y_0(\omega)| / y_0(\omega) & \text{when } \omega \in S, \\ 0 & \text{when } \omega \in \Omega_0 \setminus S. \end{cases}$$

Then the function $x$ belongs to $L^\infty(\mu_0) \subseteq L^2(\mu_0)$. Then

$$r\mu(S) \le \int_S |y_0(\cdot)| \, d\mu_0 = \int_{\Omega_0} x y_0 \, d\mu_0 = T(x) \le |||T|||\, \|x\|_1 = |||T|||\, \mu(S),$$

a contradiction. This proves our claim; we have $\|y_0\|_\infty \le |||T|||$.

By our choice of $y_0$, we have $\int_{\Omega_0} x y_0 \, d\mu = T(x)$ for all $x \in L^2(\mu_0)$. However, both sides of that equation are continuous functions of $x \in L^1(\mu_0)$, and $L^2(\mu_0)$ is dense in $L^1(\mu_0)$. Thus, that equation is valid for all $x \in L^1(\mu_0)$.

The function $y_0$ is uniquely determined on each set $\Omega_0$ of finite measure. By covering $\Omega$ with an increasing sequence of sets of finite measure, we see that there is a measurable function $y : \Omega \to \mathbb{F}$, with $\|y\|_\infty \le |||T|||$, such that $\int_\Omega x y \, d\mu = T(x)$ whenever $x$ is a member of $L^1(\mu)$ that vanishes outside some set of finite measure. Such functions are dense in $L^1(\mu)$, and both sides of the equation $\int_\Omega x y \, d\mu = T(x)$ are continuous in $x$. Hence the equation holds for all $x \in L^1(\mu)$.
28.52. If $(\Omega, \mathcal{S}, \mu)$ is not $\sigma$-finite, then $(L^1(\mu))^*$ is not necessarily equal to $L^\infty(\mu)$.

Example (from Holmes [1975]). Let $(\Omega, \mathcal{S}, \mu)$ be the interval $[0,1]$ equipped with counting measure. Here $\mathcal{S} = \mathcal{P}(\Omega)$, so every function $f : [0,1] \to \mathbb{F}$ is measurable (where $\mathbb{F}$ is the scalar field $\mathbb{R}$ or $\mathbb{C}$). The integral of a function $f : [0,1] \to \mathbb{F}$ is the sum $\sum_{t \in [0,1]} f(t) \in [0, +\infty]$, provided that $\|f\|_1 = \sum_{t \in [0,1]} |f(t)|$ is finite. Now let $\mathcal{S}_0$ be the $\sigma$-algebra of countable or cocountable subsets of $[0,1]$, and let $\mu_0$ be the restriction of $\mu$ to $\mathcal{S}_0$. Thus a function $f : [0,1] \to \mathbb{F}$ is measurable with respect to $\mathcal{S}_0$ if and only if, for each Borel set $B$ ...

... $\sup\{|\lambda(A)| : A \in \mathcal{S},\ A \subseteq S\} = \infty$; then our assumption is that $\Omega$ is fat. We now recursively choose fat sets $S_0 \supseteq S_1 \supseteq S_2 \supseteq \cdots$ with $|\lambda(S_{n+1})| > |\lambda(S_n)| + 1$, by the following procedure: Let $S_0 = \Omega$; this is fat. Given a fat set $S_n$, choose some measurable set $B \subseteq S_n$ such that $|\lambda(B)| > 2|\lambda(S_n)| + 1$. Since $S_n$ is fat, at least one of the sets $B$, $S_n \setminus B$ must be fat. Call that set $S_{n+1}$; we easily verify that $|\lambda(S_{n+1})| > |\lambda(S_n)| + 1$. This completes the recursion. Now, for $n = 1, 2, 3, \ldots$, let $T_n = S_{n-1} \setminus S_n$. Then the $T_n$'s are disjoint and have union equal to $\Omega$, hence $\sum_{n=1}^\infty \lambda(T_n) = \lambda(\Omega)$. However, ...
Chapter 29: Vector Measures
so the series $\sum_{n=1}^\infty \lambda(T_n)$ diverges. This contradiction proves $\sup_{S \in \mathcal{S}} |\lambda(S)| < \infty$.
29.4. Theorem and definition. Let $(\Omega, \mathcal{S}, \mu)$ be a measure space; assume $\mu$ is finite. Let $(X, |\ |)$ be a Banach space, and let $\lambda : \mathcal{S} \to X$ be a vector charge. Then the following two conditions are equivalent.

(A) $\lim_{\mu(S) \to 0} \lambda(S) = 0$. That is, for each number $\varepsilon > 0$, there exists some number $\delta > 0$ such that if $S \in \mathcal{S}$ with $\mu(S) < \delta$, then $|\lambda(S)| < \varepsilon$.

(B) $\lambda$ is a measure, and $\lambda$ vanishes on sets of $\mu$-measure 0 - that is, $\mu(S) = 0 \Rightarrow \lambda(S) = 0$.

If either (hence both) of these conditions is satisfied, we say that $\lambda$ is absolutely continuous with respect to $\mu$ or that it is $\mu$-continuous; this is abbreviated $\lambda \ll \mu$. Some examples will be given in 29.7 and 29.10.
Proof of equivalence. First assume (A). Obviously $\mu(S) = 0 \Rightarrow \lambda(S) = 0$. Let the sets $E_1, E_2, E_3, \ldots$ be disjoint with union $E$; we want to show that $\sum_{j=1}^\infty \lambda(E_j) = \lambda(E)$. Let $F_j = E \setminus (E_1 \cup E_2 \cup \cdots \cup E_j)$; we want to show that $\lambda(F_j) \to 0$ as $j \to \infty$. We know that $F_1 \supseteq F_2 \supseteq F_3 \supseteq \cdots$ and the $F_j$'s have empty intersection. Hence $\mu(F_j) \downarrow 0$. By $\mu$-continuity, $|\lambda(F_j)| \to 0$. Thus (A) $\Rightarrow$ (B).

We shall prove (B) $\Rightarrow$ (A) first in the case where $\lambda$ is real-valued. Note that if $\mu(S) = 0$, then $\mu$ vanishes on every measurable subset of $S$; hence so does $\lambda$; hence so does $/\lambda/$; hence so do $\lambda^+$ and $\lambda^-$. By the Jordan Decomposition, it suffices to consider $\lambda^+$ and $\lambda^-$; thus we may assume $\lambda \ge 0$. Suppose that $\lambda$ does not satisfy the condition in (A). Then there exist a number $\varepsilon > 0$ and measurable sets $S_n$ such that $\lambda(S_n) > \varepsilon$ and $\mu(S_n) \to 0$. Passing to a subsequence, we may assume $\mu(S_n) < 2^{-n}$. Let $T = \limsup_{n \to \infty} S_n = \bigcap_{j=1}^\infty \bigcup_{n=j}^\infty S_n$. Then for each $j$ we have $T \subseteq \bigcup_{n \ge j} S_n$, hence $\mu(T) \le \sum_{n=j}^\infty \mu(S_n) < 2^{-j+1}$, hence $\mu(T) = 0$. On the other hand, by 21.25.c we have $\varepsilon \le \limsup_{n \to \infty} \lambda(S_n) \le \lambda(\limsup_{n \to \infty} S_n) = \lambda(T)$. This is a contradiction, proving (B) $\Rightarrow$ (A) in the case of real-valued $\lambda$.

Now we prove (B) $\Rightarrow$ (A) for an arbitrary Banach space $X$. We may assume the scalar field is $\mathbb{R}$. Suppose (B) holds but not (A); thus there exist measurable sets $S_k$ with $|\lambda(S_k)| > \varepsilon$ and $\mu(S_k) < 2^{-k}$. We have $\lim_{m \to \infty} \mu\left(\bigcup_{k \ge m} S_k\right) = 0$. For each $u \in X^*$, the real-valued measure $u\lambda$ is $\mu$-continuous and therefore satisfies

$$\lim_{m \to \infty} \sup\Big\{ |u\lambda(S)| : S \in \mathcal{S},\ S \subseteq \bigcup_{k \ge m} S_k \Big\} = 0.$$

Recursively choose positive integers $m(p)$ as follows: Let $m(0) = 1$. Given $m(p-1)$, use the Hahn-Banach Theorem (HB8) to choose some $u_p \in X^*$ with $|u_p| = 1$ and $u_p\lambda(S_{m(p-1)}) > \varepsilon$. Then choose $m(p) > m(p-1)$ large enough so that $\sup\big\{ |u_p\lambda(S)| : S \in \mathcal{S},\ S \subseteq \bigcup_{k \ge m(p)} S_k \big\}$ ... $|\lambda(F_p)| \ge |u_p\lambda(F_p)| > \tfrac{1}{2}\varepsilon$. Since $\lambda$ is a measure, we have $\lambda\left(\bigcup_{p=1}^\infty F_p\right) = \sum_{p=1}^\infty \lambda(F_p)$, and therefore the series $\sum_{p=1}^\infty \lambda(F_p)$ is convergent, and therefore $\lim_{p \to \infty} |\lambda(F_p)| = 0$, a contradiction.
THE VARIATION OF A CHARGE

29.5. Definition. Let $\mathcal{A}$ be an algebra of subsets of $\Omega$, let $(X, |\ |)$ be a Banach space, and let $\lambda : \mathcal{A} \to X$ be a charge. The variation of $\lambda$ is the function $/\lambda/ : \mathcal{A} \to [0, +\infty]$ defined by

$$/\lambda/(A) = \sup \sum_{j=1}^n |\lambda(S_j)|,$$

where the supremum is over all positive integers $n$ and partitions of $A$ into disjoint subsets $S_1, S_2, \ldots, S_n \in \mathcal{A}$. (Much of the literature denotes the variation by $|\lambda|$, but we prefer $/\lambda/$ for reasons indicated in 8.39.) For further clarification, we may refer to $/\lambda/$ as the variation in the sense of charges or measures to distinguish it from another type of "variation" introduced in 19.21. The relation between the two notions of "variation" will be considered in 29.33 and 29.34.

If $/\lambda/(\Omega) < \infty$, we say $\lambda$ has bounded variation. The number $/\lambda/(\Omega)$ may also be written as $\mathrm{Var}(\lambda)$ or as $\mathrm{Var}(\lambda, \mathcal{A})$ if several different algebras of sets are being considered.
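For a charge on the algebra of all subsets of a small finite set, the supremum in the definition can be computed by brute force. The sketch below makes assumptions that are not in the text: $\Omega$ has four points, $X$ is the Euclidean plane, and $\lambda(S)$ is the sum of fixed vector weights over the points of $S$. The names `partitions`, `charge`, and `variation` are hypothetical.

```python
import math
from itertools import combinations

def partitions(items):
    """Yield every partition of a list into nonempty disjoint blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # `first` forms its own block
        for i in range(len(part)):                  # or `first` joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def charge(weights, block):
    """lambda(S): the sum of the vector weights of the points of S."""
    return tuple(sum(weights[a][k] for a in block) for k in range(2))

def variation(weights, A):
    """Supremum over finite partitions of A of the sum of the norms |lambda(S_j)|."""
    return max(sum(math.hypot(*charge(weights, block)) for block in part)
               for part in partitions(list(A)))

weights = {0: (1.0, 0.0), 1: (-1.0, 0.0), 2: (0.0, 2.0), 3: (0.5, 0.5)}
omega = list(weights)
total_variation = variation(weights, omega)
atom_sum = sum(math.hypot(*w) for w in weights.values())
largest_norm = max(math.hypot(*charge(weights, S))
                   for r in range(len(omega) + 1)
                   for S in combinations(omega, r))

print("variation of Omega:", total_variation)
print("sum over singletons:", atom_sum)
print("sup |lambda(S)|:", largest_norm)
```

In this finite setting the finest partition attains the supremum, by the triangle inequality, and the last printed value stays below the variation.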
29.6. Basic properties of the variation of a charge.

a. $/\lambda/$ is a positive charge.

b. $\sup\{|\lambda(S)| : S \in \mathcal{A}\} \le /\lambda/(\Omega)$. Thus, any charge with bounded variation is a bounded charge.

c. The space of all $X$-valued charges on $\mathcal{A}$ with bounded variation is a Banach space, with the variation for a norm. We may denote this space by $BV(\mathcal{A}, X)$.
Proof of completeness. Apply 22.17 with $\Gamma = \mathcal{A}$, with $\Phi$ consisting of all functions of the form $\varphi(\lambda) = |\lambda(S_1)| + |\lambda(S_2)| + \cdots + |\lambda(S_n)|$ for disjoint sets $S_j \in \mathcal{A}$.
Remark. For a still larger space, see the space of bounded charges in 29.29.f.

d. If $X = \mathbb{R}$, then $/\lambda/(A) = \sup\{|\lambda(S)| + |\lambda(A \setminus S)| : S \in \mathcal{S},\ S \subseteq A\}$. Thus the definition of "variation" given in this chapter agrees with the definition of "variation" given in 11.47. As we noted in 11.47, any bounded real charge has bounded variation; hence any real-valued measure has bounded variation.

e. If $\lambda$ is countably additive, then $/\lambda/(A)$ is also equal to $\sup \sum_{j=1}^\infty |\lambda(S_j)|$, where the supremum is over all partitions of $A$ into countably many disjoint sets $S_j \in \mathcal{S}$.
Hints: Any finite partition can be written as a countable partition, by taking all but finitely many of the $S_j$'s to be empty; thus $/\lambda/(A) \le \sup \sum_{j=1}^\infty |\lambda(S_j)|$. On the other hand, if $/\lambda/(A) < r < \sum_{j=1}^\infty |\lambda(S_j)|$ for some countable partition $(S_j)$ and some real number $r$, then choose $N$ large enough to satisfy $r < \sum_{j=1}^N |\lambda(S_j)|$; then we have $/\lambda/(A) < \big|\lambda\big(A \setminus \bigcup_{j=1}^N S_j\big)\big| + \sum_{j=1}^N |\lambda(S_j)|$, a contradiction.

f. If $\lambda$ is countably additive, then $/\lambda/$ is too. Thus, if $\lambda$ is a measure, then $/\lambda/$ is too.
Hints: Let $A_1, A_2, A_3, \ldots$ be disjoint members of $\mathcal{A}$ with union $A \in \mathcal{A}$. Since $/\lambda/$ is a positive charge, for any positive integer $N$ we have

$$/\lambda/(A) \ge /\lambda/(A_1 \cup A_2 \cup \cdots \cup A_N) = \sum_{j=1}^N /\lambda/(A_j),$$

and taking limits we obtain $\sum_{j=1}^\infty /\lambda/(A_j) \le /\lambda/(A)$. For the reverse inequality, let $(B_k : k \in \mathbb{N})$ be any partition of $A$ into countably many disjoint members of $\mathcal{A}$. Then $(A_j \cap B_k : k \in \mathbb{N})$ is a partition of $A_j$, and $(A_j \cap B_k : j \in \mathbb{N})$ is a partition of $B_k$. Hence

$$\sum_k |\lambda(B_k)| = \sum_k \Big| \sum_j \lambda(A_j \cap B_k) \Big| \le \sum_{j,k} |\lambda(A_j \cap B_k)| \le \sum_j /\lambda/(A_j).$$

Taking the supremum over all choices of the sequence $(B_k)$ yields $/\lambda/(A) \le \sum_{j=1}^\infty /\lambda/(A_j)$.

g. If $\lambda$ is a vector measure with bounded variation, then $\lambda \ll /\lambda/$.

h. Any real-valued measure - or, more generally, any measure taking values in a finite-dimensional Banach space - has bounded variation. (Proof: 29.3 and 29.6.d.)
29.7. Example: a pathological vector measure. We exhibit a bounded vector measure that has infinite variation on every nontrivial set. (This example is taken from Diestel and Uhl [1977].)

Let $(\Omega, \mathcal{S}, \mu)$ be the measure space $[0,1]$ with Lebesgue measure on the Lebesgue measurable sets. Let $X = L^2[0,1]$. Define $\lambda : \mathcal{S} \to X$ by $\lambda(S) = 1_S$ (i.e., the characteristic function of $S$). Then $\lambda$ is $\mu$-continuous - i.e., $\lambda$ vanishes on sets that have $\mu$-measure 0.

To show that $\lambda$ is countably additive, verify that if $(E_n)$ is a sequence of disjoint measurable subsets of $[0,1]$, then

$$\Big\| \lambda\Big(\bigcup_{n=1}^\infty E_n\Big) - \sum_{n=1}^N \lambda(E_n) \Big\|_2 = \Big\| 1_{\bigcup_{n > N} E_n} \Big\|_2 = \sqrt{\mu\Big(\bigcup_{n > N} E_n\Big)},$$

which tends to 0 as $N \to \infty$.

On the other hand, if $E$ is a measurable set with $\mu(E) > 0$, we shall show that $/\lambda/(E) = \infty$. Indeed, let $N$ be any positive integer. By 24.25, $\int_0^u 1_E(t)\,dt$ is a continuous function of $u$, so it must pass through each number between 0 and $\mu(E)$. Hence we can partition $E$ into disjoint measurable sets $E_1, E_2, \ldots, E_N$ that have equal Lebesgue measure - i.e., they all have $\mu(E_j) = \frac{1}{N}\mu(E)$. Then

$$/\lambda/(E) \ge \sum_{j=1}^N |\lambda(E_j)|_X = \sum_{j=1}^N \sqrt{\tfrac{1}{N}\mu(E)} = \sqrt{N\mu(E)},$$

which can be made arbitrarily large.
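The growth of these partition sums is easy to tabulate. The sketch below uses only the fact, taken from the example, that the $L^2$ norm of a characteristic function $1_E$ is $\sqrt{\mu(E)}$; the function names are hypothetical.

```python
import math

def norm_of_indicator(measure):
    """The L^2 norm of the characteristic function of a set of the given measure."""
    return math.sqrt(measure)

def partition_sum(total_measure, n):
    """Sum of |lambda(E_j)| over a partition of E into n pieces of equal measure."""
    return sum(norm_of_indicator(total_measure / n) for _ in range(n))

for n in (1, 4, 100, 10000):
    # The sum equals sqrt(n * mu(E)), which is unbounded as n grows.
    print(n, partition_sum(1.0, n), math.sqrt(n * 1.0))
```

Each printed row compares the partition sum with $\sqrt{N\mu(E)}$ for $\mu(E) = 1$.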
29.8. Nikodym Convergence Theorem (optional). Let $(X, |\ |)$ be a Banach space, and let $\{\lambda_n : n \in \mathbb{N}\}$ be a sequence of $X$-valued measures on a measurable space $(\Omega, \mathcal{S})$. Assume $\lambda(S) = \lim_{n \to \infty} \lambda_n(S)$ exists in $X$ for each $S \in \mathcal{S}$. Then $\lambda$ is a measure. Furthermore:

The $\lambda_n$'s are uniformly countably additive. That is, if $S_k \downarrow \emptyset$ in $\mathcal{S}$, then $\lambda_n(S_k) \to 0$ (as $k \to \infty$) uniformly in $n$. In other words, the sequence-valued function

$$S \mapsto (\lambda_1(S), \lambda_2(S), \lambda_3(S), \ldots)$$

is a $c(X)$-valued measure.

Explanation of notations. The expression $S_k \downarrow \emptyset$ means that $S_1 \supseteq S_2 \supseteq S_3 \supseteq \cdots$ and $\bigcap_{k=1}^\infty S_k = \emptyset$. Also, $c(X)$ means the Banach space of all convergent sequences in $X$; it is normed by $\|(x_1, x_2, x_3, \ldots)\| = \sup_{n \in \mathbb{N}} |x_n|$.

Proof of theorem. We first prove this theorem in the classical case, where $X$ is the scalar field $\mathbb{F}$. In that case, each variation $/\lambda_n/$ is a positive, finite measure. Hence $\mu(S) = \cdots$ defines a probability $\mu$ on $(\Omega, \mathcal{S})$ with the further property that $\lambda_n \ll \mu$ for all $n$. Define a pseudometric $d$ on $\mathcal{S}$, by

$$d(S, T) = \mu(S \,\triangle\, T) = \int_\Omega |1_S(\omega) - 1_T(\omega)| \, d\mu(\omega) = \|1_S - 1_T\|_1;$$

the last expression is a norm in the Lebesgue space $L^1(\mu)$. The space $L^1(\mu)$ is complete, by 22.31(i); and $\{1_S : S \in \mathcal{S}\} = \{f \in L^1(\mu) : \mathrm{Range}(f) \subseteq \{0, 1\}\}$ is a closed subset of $L^1(\mu)$, by 22.31(ii); hence the pseudometric space $(\mathcal{S}, d)$ is complete. By the Baire Category Theorem (20.16), $(\mathcal{S}, d)$ is a Baire space, and so any comeager subset of $\mathcal{S}$ is dense and thus nonempty.

From $\lambda_n \ll \mu$ it follows easily that each $/\lambda_n/$ is a continuous map from the pseudometric space $(\mathcal{S}, \mu)$ into $\mathbb{R}$, and each $\lambda_n$ is a continuous map from $(\mathcal{S}, \mu)$ into $X$. By the Baire-Osgood Theorem (20.8), $(\lambda_n)$ is equicontinuous on a subset of $\mathcal{S}$ that is comeager, hence nonempty. Say $(\lambda_n)$ is equicontinuous at some particular $T \in \mathcal{S}$.

Let $S_k \downarrow \emptyset$. We verify that the sequences $(T \cup S_k)$ and $(T \setminus S_k)$ both converge to $T$ in the metric space $(\mathcal{S}, \mu)$. Since the sequence $(\lambda_n)$ is equicontinuous at $T$, the sequences $\lambda_n(T \cup S_k)$ and $\lambda_n(T \setminus S_k)$ both converge to $\lambda_n(T)$ uniformly in $n$ as $k \to \infty$. Then $\lambda_n(S_k) = \lambda_n(T \cup S_k) - \lambda_n(T \setminus S_k)$ converges to 0 uniformly in $n$ as $k \to \infty$. Thus $(\lambda_n)$ is uniformly countably additive. It follows easily that the limit $\lambda$ is a measure. This completes the proof in the case where $X$ is the scalar field.

We now turn to the general case. For each $u \in X^*$, we know that $u \circ \lambda$ is a measure, by the scalar case. Let us next show that $\lambda$ itself is a measure: If $(T_n)$ is any sequence of disjoint measurable sets, we know that $\sum_{n=1}^\infty \lambda(T_n)$ converges weakly to $\lambda\left(\bigcup_{n=1}^\infty T_n\right)$. Since the same type of conclusion holds when we replace $(T_n)$ with a subsequence, we know that any subseries of $\sum_{n=1}^\infty \lambda(T_n)$ converges weakly. By the Orlicz-Pettis Theorem (28.31), it follows that $\sum_{n=1}^\infty \lambda(T_n)$ converges in $X$ to a limit. That limit can only be $\lambda\left(\bigcup_{n=1}^\infty T_n\right)$, since $X^*$ separates points of $X$. Thus $\lambda$ is a measure.
Replacing $(\lambda_n)$ with the sequence $(\lambda_n - \lambda)$, we may assume $\lambda = 0$. That is, $\lambda_n(S) \to 0$ for each $S \in \mathcal{S}$. Suppose the sequence $(\lambda_n)$ is not uniformly countably additive. Thus there exists a sequence $S_k \downarrow \emptyset$ in $\mathcal{S}$, which does not satisfy $\sup_n |\lambda_n(S_k)| \to 0$ as $k \to \infty$. That is, some constant $\varepsilon > 0$ satisfies $\limsup_{k \to \infty} \sup_{n \in \mathbb{N}} |\lambda_n(S_k)| > \varepsilon$. Replacing $(S_k)$ with a subsequence, we may assume $\sup_n |\lambda_n(S_k)| > \varepsilon$ for all $k$. Thus, for each $k$ there is some $n(k)$ satisfying $|\lambda_{n(k)}(S_k)| > \varepsilon$.

Since $S_k \downarrow \emptyset$, we also have $\max_{1 \le n \le N} |\lambda_n(S_k)| \to 0$ as $k \to \infty$ for each fixed $N$. Thus, the sequence $(n(k))$ cannot take all its values in some finite set $\{1, 2, \ldots, N\}$. That is, the sequence $(n(k))$ is unbounded. Replacing $(S_k)$ and $(n(k))$ with subsequences, we may assume $n(1) < n(2) < n(3) < \cdots$. Replacing $(\lambda_n)$ with the subsequence $(\lambda_{n(k)})$, we may assume $|\lambda_k(S_k)| > \varepsilon$ for all $k$.

Choose some $u_k \in X^*$ satisfying $|u_k\lambda_k(S_k)| > \varepsilon$ and $|u_k| = 1$. For each fixed $S \in \mathcal{S}$, we have $|u_k\lambda_k(S)| \le |\lambda_k(S)| \to 0$. By the scalar case that we have already proved,

$$\Lambda(S) = (u_1\lambda_1(S),\ u_2\lambda_2(S),\ u_3\lambda_3(S),\ \ldots)$$

is a $c$-valued measure. Since $S_k \downarrow \emptyset$, it follows that $\Lambda(S_k) \to 0$. However, $|\Lambda(S_k)| \ge |u_k\lambda_k(S_k)| > \varepsilon$, a contradiction.
29.9. Corollary: Nikodym Boundedness Theorem. Let $(X, |\ |)$ be a Banach space, and let $\Lambda$ be a collection of $X$-valued measures on a measurable space $(\Omega, \mathcal{S})$. If $\sup_{\lambda \in \Lambda} |\lambda(S)| < \infty$ for each $S \in \mathcal{S}$, then in fact $\sup_{\lambda \in \Lambda} \sup_{S \in \mathcal{S}} |\lambda(S)| < \infty$.

Proof. Suppose not. Then we may choose sequences $(\lambda_n)$ in $\Lambda$ and $(S_n)$ in $\mathcal{S}$, with $|\lambda_n(S_n)| > n^2$. The measures $\gamma_n = \frac{1}{n}\lambda_n$ satisfy $|\gamma_n(S_n)| > n$ and $\lim_{n \to \infty} \gamma_n(S) = 0$ for each $S$. By the Nikodym Convergence Theorem (29.8), $\Gamma(S) = (\gamma_1(S), \gamma_2(S), \gamma_3(S), \ldots)$ defines a $c(X)$-valued measure. By 29.3, any Banach-space-valued measure is bounded; but that contradicts $|\Gamma(S_n)| > n$.

Remark. With a longer proof, a slightly weaker hypothesis suffices; see Diestel and Uhl [1977].
INDEFINITE BOCHNER INTEGRALS AND RADON-NIKODYM DERIVATIVES

29.10. Example: the Bochner integral as a vector measure. Let $(\Omega, \mathcal{S}, \mu)$ be a measure space, let $(X, |\ |)$ be a Banach space, and let $h \in L^1(\mu, X)$. We shall show that the function $\lambda : \mathcal{S} \to X$ defined by the Bochner integral

$$\lambda(S) = \int_\Omega 1_S(\omega)\, h(\omega)\, d\mu(\omega)$$

is an $X$-valued measure on $\mathcal{S}$. Obviously it is $\mu$-continuous (as defined in 29.4). It is sometimes called the indefinite integral of $h$. Also, we say that $h$ is the Radon-Nikodym derivative of $\lambda$ with respect to $\mu$. We say "the derivative" rather than "a derivative," because - as we shall show below - there is at most one $h \in L^1(\mu, X)$ satisfying this relation with measures $\lambda$ and $\mu$. Thus we may write $h = \dfrac{d\lambda}{d\mu}$.

Proofs. Let $h \in L^1(\mu, X)$. Obviously $\lambda$ is a vector charge. To show that it is a vector measure (i.e., countably additive) and is absolutely continuous, we may reason as follows: The function $|h(\cdot)|$ is measurable and is a member of $L^1(\mu, \mathbb{R})$ (see 22.28). By 21.38(i), the function $\gamma$ defined by $\gamma(S) = \int_S |h(\cdot)| \, d\mu$ is a finite positive measure. Since $|\lambda(S)| \le \gamma(S)$ for every measurable set $S$, it is easy to see that the vector charge $\lambda$ satisfies $\lambda \ll \gamma$ by criterion 29.4(A). Therefore $\lambda \ll \gamma$ by criterion 29.4(B), and so $\lambda$ is a vector measure. We have $\lambda$ ...

... $\dfrac{d\lambda}{d\nu} = \dfrac{d\lambda}{d\mu}\,\dfrac{d\mu}{d\nu}$ whenever the right side exists. More precisely, if $h = d\lambda/d\mu$ and $g = d\mu/d\nu$, where $\mu, \nu$ are positive finite measures and $\lambda$ is a vector measure, then the Radon-Nikodym derivative $d\lambda/d\nu$ also exists; it is equal to $hg$. (Compare with 25.6.)
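The chain rule for Radon-Nikodym derivatives can be checked directly when everything is discrete. The following is a sketch under assumptions not made in the text: $\Omega$ is a four-point set, $\nu$ is given by positive point weights, $g = d\mu/d\nu$ is a nonnegative function, and $h = d\lambda/d\mu$ takes values in the plane. All names and numerical values are hypothetical.

```python
import math
from itertools import combinations

points = range(4)
nu = [0.1, 0.2, 0.3, 0.4]                                   # point weights of nu
g = [2.0, 0.5, 1.0, 0.0]                                    # g = d(mu)/d(nu)
h = [(1.0, 0.0), (0.0, -2.0), (3.0, 1.0), (5.0, 5.0)]       # h = d(lambda)/d(mu)

def mu(S):
    """mu(S) = integral over S of g with respect to nu."""
    return sum(g[w] * nu[w] for w in S)

def lam(S):
    """lambda(S) = integral over S of h with respect to mu."""
    return tuple(sum(h[w][k] * mu([w]) for w in S) for k in range(2))

def lam_via_nu(S):
    """Integral over S of the product h*g with respect to nu."""
    return tuple(sum(h[w][k] * g[w] * nu[w] for w in S) for k in range(2))

subsets = [S for r in range(5) for S in combinations(points, r)]
agree = all(math.isclose(a, b, abs_tol=1e-12)
            for S in subsets
            for a, b in zip(lam(S), lam_via_nu(S)))
print("lambda(S) equals the integral of h*g d(nu) on every subset:", agree)
```

Note that the last point has $\mu$-measure 0 (since $g = 0$ there), so the value of $h$ at that point does not affect $\lambda$.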
CONDITIONAL EXPECTATIONS AND MARTINGALES

29.13. Notations/assumptions. Throughout this subchapter, we assume $(X, |\ |)$ is a Banach space and $(\Omega, \mathcal{S}, \mu)$ is a probability space - i.e., a set equipped with a $\sigma$-algebra of subsets and a probability measure. We consider various sub-$\sigma$-algebras $\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots \subseteq \mathcal{S}$. The restriction of $\mu$ to those sub-$\sigma$-algebras will be denoted $\mu_\mathcal{A}$, $\mu_\mathcal{B}$, $\mu_\mathcal{C}$, etc. Thus $L^p(\mu_\mathcal{A}, X)$ consists of those members of $L^p(\mu, X)$ that are equivalence classes of functions measurable from $(\Omega, \mathcal{A})$ to the Borel subsets of $X$. Note that $L^p(\mu_\mathcal{A}, X)$ is a closed linear subspace of $L^p(\mu, X)$; this follows from 21.3 and 22.31(ii).

Conditional expectations and martingales are used extensively in probability theory; this book will use them to prove Theorem 29.26.
29.14. Proposition and definition. Let $f \in L^1(\mu, X)$, and let $\mathcal{A} \subseteq \mathcal{S}$ be a sub-$\sigma$-algebra. Then there exists a unique (up to $\mu$-equivalence) function $g \in L^1(\mu_\mathcal{A}, X)$ with this property: $\int_A g \, d\mu = \int_A f \, d\mu$ for every $A \in \mathcal{A}$. Such a function will be denoted by $E(f|\mathcal{A})$; it is called the conditional expectation of $f$ with respect to $\mathcal{A}$. In this fashion we define the conditional expectation operator

$$E(\cdot|\mathcal{A}) : L^1(\mu, X) \to L^1(\mu_\mathcal{A}, X).$$

It has these further properties:

(i) It is linear.

(ii) It is nonexpansive from $L^1(\mu, X)$ to $L^1(\mu_\mathcal{A}, X)$ - that is, $\|E(f|\mathcal{A})\|_1 \le \|f\|_1$.

(iii) It is idempotent - that is, $E(\cdot|\mathcal{A}) \circ E(\cdot|\mathcal{A}) = E(\cdot|\mathcal{A})$.

(iv) Let $\mathbb{F}$ be the scalar field. If $f \in L^2(\mu, \mathbb{F})$, then $E(f|\mathcal{A})$ is the closest vector to $f$ in the closed linear subspace $L^2(\mu_\mathcal{A}, \mathbb{F})$.
Proof. We first prove uniqueness. Suppose that $g_1, g_2 \in L^1(\mu_\mathcal{A}, X)$ satisfy $\int_A g_1 \, d\mu = \int_A g_2 \, d\mu$ for every $A \in \mathcal{A}$. Then $h = g_1 - g_2$ is a member of $L^1(\mu_\mathcal{A}, X)$ that satisfies $\int_A h \, d\mu = 0$ for every $A \in \mathcal{A}$. It suffices to show that $h = 0$ almost everywhere. Suppose, on the contrary, that $\{\omega \in \Omega : h(\omega) \ne 0\}$ has positive measure. Since the range of $h$ is separable, its points are separated by the functions $\mathrm{Re}\, \psi_n$ for some sequence $(\psi_n)$ in $X^*$ - see 23.24. Then $\{\omega \in \Omega : \mathrm{Re}\, \psi_n h(\omega) \ne 0\}$ has positive measure for at least one $n$. Replacing $\psi_n$ with $-\psi_n$ if necessary, we may assume that the set $A = \{\omega \in \Omega : \mathrm{Re}\, \psi_n h(\omega) > 0\}$ has positive measure. But $\int_A \mathrm{Re}\, \psi_n h \, d\mu = \mathrm{Re}\, \psi_n \int_A h \, d\mu = 0$, a contradiction. This proves uniqueness.

Let us define a linear map $E(\cdot|\mathcal{A})$ from some linear subspace of $L^1(\mu, X)$ into $L^1(\mu_\mathcal{A}, X)$, by writing $g = E(f|\mathcal{A})$ whenever $g \in L^1(\mu_\mathcal{A}, X)$ satisfies $\int_A g \, d\mu = \int_A f \, d\mu$ for all $A \in \mathcal{A}$. (The fact that the operator's domain is all of $L^1(\mu, X)$ will not be established until the end of this proof.) Clearly, $E(f|\mathcal{A})$ exists and equals $f$ whenever $f \in L^1(\mu_\mathcal{A}, X)$; thus the domain of the conditional expectation operator contains $L^1(\mu_\mathcal{A}, X)$ and the operator is idempotent.

We next show that $E(\cdot|\mathcal{A})$ is nonexpansive. Say $g = E(f|\mathcal{A})$. Define measures $\lambda_\mathcal{S}$, $\lambda_\mathcal{A}$ on $\mathcal{S}$, $\mathcal{A}$, respectively, by

$$\lambda_\mathcal{S}(S) = \int_S f \, d\mu \ \text{ for } S \in \mathcal{S}, \qquad \lambda_\mathcal{A}(A) = \int_A g \, d\mu \ \text{ for } A \in \mathcal{A}.$$

Then $\|f\|_1 = \mathrm{Var}(\lambda_\mathcal{S}) = /\lambda_\mathcal{S}/(\Omega)$ and $\|g\|_1 = \mathrm{Var}(\lambda_\mathcal{A}) = /\lambda_\mathcal{A}/(\Omega)$, by 29.11. However, $\lambda_\mathcal{A}$ is just the restriction of $\lambda_\mathcal{S}$ to the smaller $\sigma$-algebra $\mathcal{A}$, so $\mathrm{Var}(\lambda_\mathcal{A}) \le \mathrm{Var}(\lambda_\mathcal{S})$; thus $\|E(f|\mathcal{A})\|_1 \le \|f\|_1$.

Next we consider $f \in L^2(\mu, \mathbb{F})$. Let $g$ be the closest point to $f$ in the closed linear subspace $L^2(\mu_\mathcal{A}, \mathbb{F})$. Then $g - f$ is orthogonal to $L^2(\mu_\mathcal{A}, \mathbb{F})$, as we noted in 22.51. That is, $\int_\Omega (g - f) h \, d\mu = 0$ for every $h \in L^2(\mu_\mathcal{A}, \mathbb{F})$. In particular, taking $h = 1_A$ shows that $\int_A g \, d\mu = \int_A f \, d\mu$ for every $A \in \mathcal{A}$. (We can take $h = 1_A$ because $\mu$ is a probability measure, hence $1_A \in L^2(\mu, \mathbb{F}) \subseteq L^1(\mu, \mathbb{F})$.) Thus, for scalar-valued functions, the domain of $E(\cdot|\mathcal{A})$ contains $L^2(\mu, \mathbb{F})$; note that that linear space is dense in $L^1(\mu, \mathbb{F})$.

Since $E(\cdot|\mathcal{A}) : L^2(\mu, \mathbb{F}) \to L^1(\mu_\mathcal{A}, \mathbb{F})$ is nonexpansive for the $\|\ \|_1$ norms (as noted in the preceding paragraph), the operator extends uniquely to a nonexpansive linear operator $\Gamma : L^1(\mu, \mathbb{F}) \to L^1(\mu_\mathcal{A}, \mathbb{F})$ (which, however, we have not yet established to be a conditional expectation operator). For fixed $A \in \mathcal{A}$, the mappings $f \mapsto \int_A f \, d\mu$ and $f \mapsto \int_A \Gamma f \, d\mu$ are continuous linear maps from $L^1(\mu, \mathbb{F})$ into $\mathbb{F}$, and they agree on the dense set $L^2(\mu, \mathbb{F})$, hence they agree on $L^1(\mu, \mathbb{F})$. This proves $\Gamma$ is indeed a conditional expectation operator, defined everywhere on $L^1(\mu, \mathbb{F})$.

For $f \in L^1(\mu, X)$ with a general Banach space $X$, we construct $E(f|\mathcal{A})$ first in the case where $f$ is a simple function. Say $f = \sum_{j=1}^n 1_{S_j}(\cdot)\, x_j$ where the $1_{S_j}(\cdot)$'s are characteristic functions of disjoint sets $S_j \in \mathcal{S}$, and the $x_j$'s are members of $X$. Define $E(f|\mathcal{A}) = \sum_{j=1}^n E(1_{S_j}(\cdot)|\mathcal{A})\, x_j$. It is easy to verify that this defines a linear, nonexpansive mapping $E(\cdot|\mathcal{A})$ from the simple functions in $L^1(\mu, X)$ into $L^1(\mu_\mathcal{A}, X)$, satisfying $\int_A E(f|\mathcal{A}) \, d\mu = \int_A f \, d\mu$ for all $A \in \mathcal{A}$. Those properties are preserved when we take limits, and the simple functions are dense in $L^1(\mu, X)$; thus the conditional expectation operator extends to a mapping with those properties on all of $L^1(\mu, X)$.
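29.15. Corollaries.

a. $\int_K |E(f|\mathcal{A})(\omega)| \, d\mu(\omega) \le \int_K |f(\omega)| \, d\mu(\omega)$ for any $K \in \mathcal{A}$. That is, the conditional expectation operator is nonexpansive from $L^1(\mu_{\mathcal{S} \cap K}, X)$ to $L^1(\mu_{\mathcal{A} \cap K}, X)$, where $\mathcal{S} \cap K$ and $\mathcal{A} \cap K$ denote the traces of the $\sigma$-algebras $\mathcal{S}$ and $\mathcal{A}$ on the set $K$.
Proof. Replace $\mu$ with its restriction to $K$, and apply the preceding results.

b. If $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{S}$, then $E(E(f|\mathcal{B})|\mathcal{A}) = E(f|\mathcal{A})$.

c. Example. Let $S_1, S_2, S_3, \ldots$ be disjoint members of $\mathcal{S}$ with union equal to $\Omega$. Let $\mathcal{A}$ be the $\sigma$-algebra generated by the $S_j$'s; thus $\mathcal{A} = \{\text{unions of } S_j\text{'s}\}$. Verify that $g = E(f|\mathcal{A})$ can be represented as follows:

$$g(\omega) = \begin{cases} \dfrac{1}{\mu(S_j)} \displaystyle\int_{S_j} f \, d\mu & \text{if } \omega \in S_j \text{ and } \mu(S_j) > 0, \\[2ex] 0 & \text{if } \omega \in S_j \text{ and } \mu(S_j) = 0. \end{cases}$$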
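The representation in example c is simple to compute on a finite probability space. The sketch below assumes a five-point $\Omega$ with point masses `mu`, a real-valued `f`, and a partition given as lists of indices; these names and numbers are illustrative assumptions, not part of the text. It averages $f$ over each block and then checks the defining property together with properties (ii) and (iii) of 29.14 and the tower property 29.15.b.

```python
import math

def conditional_expectation(f, mu, blocks):
    """E(f|A) for the sigma-algebra A generated by a partition of a finite set.

    f and mu are lists indexed by the points of Omega; blocks is a list of
    disjoint index lists whose union is Omega.
    """
    g = [0.0] * len(f)
    for block in blocks:
        mass = sum(mu[w] for w in block)
        # average of f over the block; the value 0 is used on blocks of measure 0
        average = sum(f[w] * mu[w] for w in block) / mass if mass > 0 else 0.0
        for w in block:
            g[w] = average
    return g

def integral(func, mu, S):
    """Integral of func over the set S with respect to mu."""
    return sum(func[w] * mu[w] for w in S)

mu = [0.1, 0.2, 0.3, 0.4, 0.0]
f = [1.0, 4.0, -2.0, 3.0, 7.0]
fine = [[0, 1], [2, 3], [4]]          # partition generating a sigma-algebra B
coarse = [[0, 1, 2, 3], [4]]          # partition generating a smaller sigma-algebra A

g_fine = conditional_expectation(f, mu, fine)
g_coarse = conditional_expectation(f, mu, coarse)
tower = conditional_expectation(g_fine, mu, coarse)

print("E(f|B)        =", g_fine)
print("E(f|A)        =", g_coarse)
print("E(E(f|B)|A)   =", tower)
```

The last two printed lists agree up to rounding, which is the tower property for these two partitions.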
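29.16. Definition. Let $\Gamma$ be a collection of sub-$\sigma$-algebras of $\mathcal{S}$ that is directed by inclusion - i.e., assume that for any $\mathcal{A}, \mathcal{B} \in \Gamma$ there exists some $\mathcal{C} \in \Gamma$ with $\mathcal{C} \supseteq \mathcal{A} \cup \mathcal{B}$. An $X$-valued martingale indexed by $\Gamma$ will mean a net $(g_\mathcal{A} : \mathcal{A} \in \Gamma)$ in $L^1(\mu, X)$ satisfying

$$g_\mathcal{A} = E(g_\mathcal{B}|\mathcal{A}) \quad \text{whenever } \mathcal{A} \subseteq \mathcal{B}$$

- that is, satisfying

$$\int_A g_\mathcal{A} \, d\mu = \int_A g_\mathcal{B} \, d\mu \quad \text{whenever } \mathcal{A} \subseteq \mathcal{B} \text{ and } A \in \mathcal{A}.$$

An important special case is that in which $\Gamma$ consists of an increasing sequence of sub-$\sigma$-algebras $\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \mathcal{A}_3 \subseteq \cdots \subseteq \mathcal{S}$, with conditional expectation operators $E_j = E(\cdot|\mathcal{A}_j)$. Then the corresponding functions form a sequence $g_1, g_2, g_3, \ldots$ that satisfies $g_j = E_j(g_k)$ whenever $j \le k$ - that is, $\int_A g_j \, d\mu = \int_A g_k \, d\mu$ whenever $j \le k$ and $A \in \mathcal{A}_j$. We may refer to such martingales as sequential martingales. Two examples of methods for constructing martingales are given in 29.17 and 29.24.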
29.17. Mean Convergence Theorem for martingales. Let $(g_{\mathcal{A}} : \mathcal{A} \in \Gamma)$ be a net in $L^1(\mu, X)$, where $\Gamma$ is a collection of sub-$\sigma$-algebras directed by inclusion. Then these two conditions are equivalent:
(A) $(g_{\mathcal{A}} : \mathcal{A} \in \Gamma)$ is a martingale that converges in $L^1(\mu, X)$ to some limit $g_\infty$.
(B) There is some function $g \in L^1(\mu, X)$ such that $g_{\mathcal{A}} = E(g\,|\,\mathcal{A})$ for each $\mathcal{A} \in \Gamma$.
Moreover, if those two conditions are satisfied, then
$$(*) \qquad g_\infty = E(g\,|\,\mathcal{S}_\infty),$$
where $\mathcal{S}_\infty$ is the $\sigma$-algebra generated by $\bigcup_{\mathcal{A} \in \Gamma} \mathcal{A}$. (Remark. The function $g_\infty$ is determined uniquely almost everywhere by these conditions, but the function $g$ might not be.)
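Proof of (A) $\Rightarrow$ (B). Fix any $\mathcal{A} \in \Gamma$. Since $E(\cdot\,|\,\mathcal{A}) : L^1(\mu, X) \to L^1(\mu_{\mathcal{A}}, X)$ is a nonexpansive mapping and $0 = \lim_{\mathcal{B}} \|g_{\mathcal{B}} - g_\infty\|_1$, it follows that $0 = \lim_{\mathcal{B}} \|E(g_{\mathcal{B}}\,|\,\mathcal{A}) - E(g_\infty\,|\,\mathcal{A})\|_1$.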
Conditional Expectations and Martingales
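For $\mathcal{B}$ sufficiently large, we have $\mathcal{B} \supseteq \mathcal{A}$, hence $E(g_{\mathcal{B}}\,|\,\mathcal{A}) = g_{\mathcal{A}}$; thus $0 = \lim_{\mathcal{B}} \|g_{\mathcal{A}} - E(g_\infty\,|\,\mathcal{A})\|_1$. But $\|g_{\mathcal{A}} - E(g_\infty\,|\,\mathcal{A})\|_1$ does not depend on $\mathcal{B}$, so $g_{\mathcal{A}} = E(g_\infty\,|\,\mathcal{A})$. Thus (B) holds with $g = g_\infty$.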
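Proof of (B) $\Rightarrow$ (A) and $(*)$. Let $g_{\mathcal{A}} = E(g\,|\,\mathcal{A})$. That $(g_{\mathcal{A}} : \mathcal{A} \in \Gamma)$ is a martingale follows immediately from 29.15.b. Let $g_\infty = E(g\,|\,\mathcal{S}_\infty)$; it remains to show that $0 = \lim_{\mathcal{A}} \|g_{\mathcal{A}} - g_\infty\|_1$. Note that if $E \in \mathcal{B}$ then $E \in \mathcal{S}_\infty$, hence
$$\int_E g_\infty\,d\mu = \int_E g\,d\mu = \int_E g_{\mathcal{B}}\,d\mu;$$
this proves $E(g_\infty\,|\,\mathcal{B}) = g_{\mathcal{B}}$.

The function $g_\infty$ is $\mathcal{S}_\infty$-measurable, hence by 22.30 we can approximate $g_\infty$ arbitrarily closely in $L^1(\mu, X)$ by an $\mathcal{S}_\infty$-measurable simple function. The algebra of sets $\mathcal{U} = \bigcup_{\mathcal{A} \in \Gamma} \mathcal{A}$ generates the $\sigma$-algebra $\mathcal{S}_\infty$, so by 21.26 each member of $\mathcal{S}_\infty$ can be approximated arbitrarily closely in measure by some member of $\mathcal{U}$. Combining these two results, we can approximate $g_\infty$ arbitrarily closely in $L^1(\mu, X)$ by a simple function of the form $h = \sum_{j=1}^n 1_{U_j}(\cdot)\,x_j$, where the $U_j$'s belong to $\mathcal{U}$; here $1_{U_j}$ is the characteristic function of $U_j$. Thus we can satisfy $\|g_\infty - h\|_1 < \varepsilon$ for any given $\varepsilon$. Temporarily hold $\varepsilon$ fixed; then we may fix $n$, $h$, and some particular $\mathcal{A}$ that contains all of $U_1, U_2, \ldots, U_n$. For all $\mathcal{B} \in \Gamma$ sufficiently large, we have $\mathcal{B} \supseteq \mathcal{A}$, and therefore $E(h\,|\,\mathcal{B}) = h$. Since $E(\cdot\,|\,\mathcal{B})$ is nonexpansive and $g_{\mathcal{B}} = E(g_\infty\,|\,\mathcal{B})$, we obtain
$$\|g_{\mathcal{B}} - g_\infty\|_1 \le \|E(g_\infty\,|\,\mathcal{B}) - E(h\,|\,\mathcal{B})\|_1 + \|h - g_\infty\|_1 \le 2\,\|g_\infty - h\|_1 < 2\varepsilon.$$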
Proof. Let $E = \{\omega \in \Omega : \sup_n |g_n(\omega)| > \varepsilon\}$; that is, $\omega \in E$ if and only if $|g_n(\omega)| > \varepsilon$ for some $n \in \mathbb{N}$. Let
$$E_n = \Big\{\omega \in \Omega : \max_{j<n} |g_j(\omega)| \le \varepsilon < |g_n(\omega)|\Big\}.$$
Then the $E_n$'s form a partition of $E$, so $\mu(E) = \sum_{n=1}^\infty \mu(E_n)$. Since $g_n$ is $\mathcal{A}_n$-measurable and the $\mathcal{A}_n$'s are increasing, it follows that $E_n \in \mathcal{A}_n$. Observe that [...] for any $m \ge n$; the last inequality follows from 29.15.a since $g_n = E(g_m\,|\,\mathcal{A}_n)$. Therefore, for any integer $m \ge 1$ we have [...]. Finally, take limits as $m \to \infty$.
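The two displayed estimates of this proof did not survive in the text above. What follows is a hedged reconstruction of the standard maximal-inequality argument, written under the assumptions that $E_n \in \mathcal{A}_n$, that $|g_n| > \varepsilon$ on $E_n$, and that $g_n = E(g_m\,|\,\mathcal{A}_n)$ for $m \ge n$; it is a sketch consistent with the surrounding sentences, not necessarily the author's exact display.

```latex
% Sketch only. First step: |g_n| > \varepsilon on E_n; second step: 29.15.a with K = E_n.
\[
\varepsilon\,\mu(E_n) \;\le\; \int_{E_n} |g_n|\,d\mu \;\le\; \int_{E_n} |g_m|\,d\mu
\qquad (m \ge n).
\]
% Summing over the disjoint sets E_1, ..., E_m:
\[
\varepsilon \sum_{n=1}^{m} \mu(E_n) \;\le\; \sum_{n=1}^{m} \int_{E_n} |g_m|\,d\mu
\;\le\; \|g_m\|_1 .
\]
```

Letting $m \to \infty$ then bounds $\varepsilon\,\mu(E)$ by $\sup_m \|g_m\|_1$, which is the form in which the lemma is applied in 29.19.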
29.19. Pointwise Convergence Theorem. Let $((f_n, \mathcal{A}_n) : n = 1, 2, 3, \ldots)$ be a sequential $X$-valued martingale, which converges in $L^1(\mu, X)$ to a limit $f$. Then also $f_n \to f$ pointwise $\mu$-almost everywhere.

Proof. Say the conditional expectation operators are $E_n = E(\cdot\,|\,\mathcal{A}_n)$; thus the functions are $f_n = E_n(f)$. Let any numbers $\delta, \varepsilon > 0$ be given. By 22.30 and 21.26, we have $\|f - g\|_1 < \delta\varepsilon/2$ for some simple function $g$ that is $\mathcal{A}_k$-measurable for some positive integer $k$. Then $E_n g = g$ for all $n \ge k$. For all $m, n \ge k$, we have
$$|f_n - f_m| \le |E_n g - E_m g| + |E_n(f - g) - E_m(f - g)| \le 2 \sup_{j \ge 1} |E_j(f - g)| \overset{\text{def}}{=} h.$$
Taking limits, we have $\limsup_{m,n \to \infty} |f_n(\omega) - f_m(\omega)| \le h(\omega)$ for each $\omega$. Therefore
$$\mu\Big(\Big\{\omega \in \Omega : \limsup_{m,n\to\infty} |f_n(\omega) - f_m(\omega)| > \varepsilon\Big\}\Big) \le \mu(\{\omega \in \Omega : h(\omega) > \varepsilon\}) \le \frac{2}{\varepsilon}\,\|f - g\|_1 < \delta,$$
where the next-to-last inequality follows from Lemma 29.18 with $p = f - g$. Letting $\delta \downarrow 0$ shows that $\mu(\{\omega \in \Omega : \limsup_{m,n\to\infty} |f_n(\omega) - f_m(\omega)| > \varepsilon\}) = 0$. Since $\varepsilon$ is arbitrary, this shows that $\limsup_{m,n\to\infty} |f_n(\omega) - f_m(\omega)| = 0$ almost everywhere, and thus $\lim_{n\to\infty} f_n(\omega)$ exists almost everywhere. By hypothesis, $f_n$ converges to $f$ in $L^1(\mu, X)$, so some subsequence of $(f_n)$ converges to $f$ almost everywhere; hence $f_n \to f$ pointwise almost everywhere.
EXISTENCE OF RADON-NIKODYM DERIVATIVES

29.20. Classical Radon-Nikodym Theorem. Let $(\Omega, \mathcal{S}, \mu)$ be a finite measure space. Let $\lambda$ be a scalar-valued measure on $\mathcal{S}$. Assume that $\lambda$ is $\mu$-continuous (as defined in 29.4). Then there exists a Radon-Nikodym derivative $h = d\lambda/d\mu$, as defined in 29.10.
Remark. An analogous result can be proved for [...]

[...] Let $\lambda : \mathcal{S} \to \ell^1$ be a $\mu$-continuous vector measure with bounded variation. We are to exhibit a function $g \in L^1(\mu, \ell^1)$ satisfying $\lambda(S) = \int_S g\,d\mu$ for all $S \in \mathcal{S}$.

We may write $\lambda(S) = (\lambda_1(S), \lambda_2(S), \lambda_3(S), \ldots)$; then each $\lambda_j : \mathcal{S} \to \mathbb{F}$ is a $\mu$-continuous, scalar-valued measure with bounded variation. It follows easily from the definition of the norm of $\ell^1$ and the definition of the variation of a charge, that
$$|\lambda|(S) = |\lambda_1|(S) + |\lambda_2|(S) + |\lambda_3|(S) + \cdots.$$
(Hint: To prove that $\sum_{j=1}^\infty |\lambda_j|(S) \le |\lambda|(S)$, it suffices to show that $\sum_{j=1}^N |\lambda_j|(S) \le |\lambda|(S)$ for any positive integer $N$.) Since the scalar field has the RNP, for each $j$ we have $\lambda_j(S) = \int_S g_j\,d\mu$ for some $g_j \in L^1(\mu; \mathbb{F})$. Define $g(\omega) = (g_1(\omega), g_2(\omega), g_3(\omega), \ldots)$. Observe that
$$\int_\Omega \|g(\omega)\|_1\,d\mu(\omega) = \sum_{j=1}^\infty \int_\Omega |g_j|\,d\mu = \sum_{j=1}^\infty |\lambda_j|(\Omega) = |\lambda|(\Omega),$$
which is finite; this proves that $g \in L^1(\mu, \ell^1)$. Define a "truncation map" $T_N : \ell^1 \to \ell^1$ by
$$T_N(x_1, x_2, x_3, \ldots) = (x_1, x_2, \ldots, x_N, 0, 0, 0, \ldots).$$
Then for any $x \in \ell^1$, we have $\lim_{N\to\infty} T_N(x) = x$ - or, more precisely,
$$\lim_{N\to\infty} \|T_N(x) - x\|_1 = 0;$$
this follows from the definition of $\ell^1$ and its norm. Now, for any $S \in \mathcal{S}$ and $N \in \mathbb{N}$ it is easy to verify that $T_N \lambda(S) = T_N\big(\int_S g\,d\mu\big)$. Taking limits yields $\lambda(S) = \int_S g\,d\mu$.
29.23. Example. The space $c_0$ lacks the RNP. Let $(\Omega, \mathcal{S}, \mu)$ be the interval $[0, 2\pi]$ equipped with Lebesgue measure. For $S \in \mathcal{S}$, let $\lambda(S)$ be the sequence whose $n$th term is
$$\lambda_n(S) = \int_0^{2\pi} 1_S(t) \sin(nt)\,d\mu(t) \qquad (n = 1, 2, 3, \ldots).$$
By the Riemann-Lebesgue Lemma (24.41.b), $\lambda_n(S) \to 0$ as $n \to \infty$. Thus $\lambda$ is a map from $\mathcal{S}$ into the Banach space $c_0 = \{\text{sequences of real numbers converging to } 0\}$, which we equip with the sup norm as usual. Obviously $\lambda$ is finitely additive - i.e., a vector charge. Also, $\|\lambda(S)\| \le \mu(S)$, so $\lambda$ is a $\mu$-continuous vector measure with bounded variation. However, we shall show that there does not exist an integrable function $g : [0, 2\pi] \to c_0$ with the property that
$$\lambda(S) = \int_S g(t)\,d\mu(t) \qquad \text{for every } S \in \mathcal{S}.$$
Indeed, suppose there were such a function. Then $g(t)$ is a member of $c_0$ - i.e., a sequence $(g_1(t), g_2(t), g_3(t), \ldots)$. Applying the $n$th coordinate projection to the equation above, we obtain
$$\int_S \sin(nt)\,d\mu(t) = \int_S g_n(t)\,d\mu(t) \qquad \text{for every } S \in \mathcal{S},$$
and therefore $g_n(t) = \sin(nt)$ for almost every $t$. However, we shall show that the function $g(t) = (\sin(t), \sin(2t), \sin(3t), \ldots)$ defined in this fashion generally does not take values in $c_0$ - in fact, we shall show that $g(t) \in c_0$ only for $t$ in a set of measure 0. Fix any small number $\varepsilon > 0$, and let $E_n = \{t \in [0, 2\pi] : |g_n(t)| \ge \varepsilon\}$. It is easy to show that $\mu(E_n) = 2\pi - 4\arcsin(\varepsilon)$ for each $n$. Define limsups as in 7.48 and 21.25.c; then
$$\{t : g(t) \notin c_0\} \supseteq \{t : |g_n(t)| \ge \varepsilon \text{ for infinitely many } n\} = \limsup_{n\to\infty} E_n.$$
Hence
$$\mu(\{t \in [0, 2\pi] : g(t) \notin c_0\}) \ge \mu\Big(\limsup_{n\to\infty} E_n\Big) \ge \limsup_{n\to\infty} \mu(E_n) = 2\pi - 4\arcsin(\varepsilon).$$
Now take limits as $\varepsilon \downarrow 0$.
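The claim that $\mu(E_n) = 2\pi - 4\arcsin(\varepsilon)$ for every $n$ is left as an easy exercise above; it can also be checked numerically. The sketch below is an illustration only: the helper name `measure_En`, the grid size, and the sample values of $n$ and $\varepsilon$ are assumptions. It approximates the Lebesgue measure of $E_n$ by counting midpoints of a uniform grid.

```python
import math

def measure_En(n, eps, samples=200_000):
    """Approximate the Lebesgue measure of {t in [0, 2*pi] : |sin(n t)| >= eps}.

    Uses a uniform midpoint grid; the error is of the order of
    (number of boundary points of the set) * (grid spacing).
    """
    h = 2 * math.pi / samples
    hits = sum(1 for k in range(samples)
               if abs(math.sin(n * (k + 0.5) * h)) >= eps)
    return hits * h

eps = 0.1
exact = 2 * math.pi - 4 * math.asin(eps)          # value claimed in 29.23
approx = [measure_En(n, eps) for n in (1, 2, 5, 17)]
```

The approximations agree with the closed form independently of $n$, which is the fact that makes $\limsup_n \mu(E_n)$ easy to evaluate in the argument above.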
29.24. Definition/example. The rather complicated definitions in this section and the related lemma in the next section are in preparation for the major theorem given in 29.26; some readers may find it helpful to glance ahead to that result.

The collection $\mathcal{F} = \{\text{finite subalgebras of } \mathcal{S}\}$ is directed by inclusion. Let $\mu$ be a probability measure on $(\Omega, \mathcal{S})$, let $\lambda$ be an $X$-valued measure on $(\Omega, \mathcal{S})$, and suppose that $\lambda$ is $\mu$-continuous - that is, $\lambda$ vanishes on $\mu$-null sets. We use $\lambda$ to define a martingale $(g_{\mathcal{A}} : \mathcal{A} \in \mathcal{F})$ as follows. For each finite subalgebra $\mathcal{A} \subseteq \mathcal{S}$, observe that $\pi(\mathcal{A}) = \{\text{minimal nonempty members of } \mathcal{A}\}$ is a finite partition of $\Omega$, and $\mathcal{A} = \{\text{unions of members of } \pi(\mathcal{A})\}$. Define a simple (i.e., finitely valued) integrable function $g_{\mathcal{A}} : \Omega \to X$ by
$$g_{\mathcal{A}}(\omega) = \begin{cases} \dfrac{\lambda(T)}{\mu(T)} & \text{if } \omega \in T \in \pi(\mathcal{A}) \text{ and } \mu(T) > 0, \\[1ex] 0 & \text{if } \omega \in T \in \pi(\mathcal{A}) \text{ and } \mu(T) = 0. \end{cases}$$
Some observations:
a. The function $g_{\mathcal{A}}$ is defined uniquely everywhere on $\Omega$, not just $\mu$-almost everywhere.
b. The restriction of $\lambda$ to $\mathcal{A}$ has bounded variation and has Radon-Nikodym derivative equal to $g_{\mathcal{A}}$; thus we obtain $\mathrm{Var}(\lambda, \mathcal{A}) = \|g_{\mathcal{A}}\|_1$.
c. $\int_A g_{\mathcal{A}}\,d\mu = \lambda(A)$ when $A \in \mathcal{A}$.
d. $(g_{\mathcal{A}} : \mathcal{A} \in \mathcal{F})$ is an $X$-valued martingale. For purposes of the discussion in the next few sections, we shall call this the full sieve martingale associated with $\lambda$.
e. If $\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \mathcal{A}_3 \subseteq \cdots$ is an increasing sequence of finite subalgebras of $\mathcal{S}$, then the sequence $g_n = g_{\mathcal{A}_n}$ $(n = 1, 2, 3, \ldots)$, with $\sigma$-algebras $\mathcal{A}_n$, is a martingale. We shall call it a sequential sieve martingale associated with $\lambda$. Different sequential sieve martingales are obtained from different sequences $(\mathcal{A}_n)$.
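To see a sequential sieve martingale in a concrete scalar case, one may take $\Omega = [0,1)$ with Lebesgue measure and let $\mathcal{A}_n$ be generated by the dyadic intervals of length $2^{-n}$. The sketch below is an illustration under assumptions not made in the text: the density $h(t) = t^2$, the resulting measure $\lambda([a,b)) = (b^3 - a^3)/3$, and the function names are all hypothetical choices. It evaluates $g_n = \lambda(T)/\mu(T)$ and estimates $\|g_n - h\|_1$ by a midpoint rule.

```python
def lam(a, b):
    """lambda([a, b)) for the assumed density h(t) = t**2 on [0, 1)."""
    return (b**3 - a**3) / 3.0

def sieve_value(n, t):
    """g_n(t) = lambda(T)/mu(T), where T is the dyadic interval of length 2**-n containing t."""
    k = int(t * 2**n)
    a, b = k / 2**n, (k + 1) / 2**n
    return lam(a, b) / (b - a)

def l1_error(n, grid=4096):
    """Midpoint-rule approximation of the L^1 distance between g_n and h on [0, 1)."""
    pts = [(i + 0.5) / grid for i in range(grid)]
    return sum(abs(sieve_value(n, t) - t**2) for t in pts) / grid

# L^1 distances for n = 1, ..., 6; these should shrink as the partitions refine.
errors = [l1_error(n) for n in range(1, 7)]
```

In this example the averages over the two halves of a dyadic interval recombine to the average over the whole interval, which is the martingale property of 29.24.e, and the computed errors decrease toward 0, in line with condition (C) of the lemma that follows.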
29.25. Sieve Convergence Lemma. Let $(\Omega, \mathcal{S}, \mu)$ be a probability space. Let $\lambda$ be an $X$-valued measure on $(\Omega, \mathcal{S})$ that is $\mu$-continuous. Define sieve martingales as in 29.24. Then the following conditions are equivalent:
(A) There exists a Radon-Nikodym derivative $h = d\lambda/d\mu$. That is, there exists some $h \in L^1(\mu, X)$ that satisfies $\lambda(S) = \int_S h\,d\mu$ for every $S \in \mathcal{S}$. Hence $g_{\mathcal{A}} = E(h\,|\,\mathcal{A})$ for each $\mathcal{A} \in \mathcal{F}$.
(B) The full sieve martingale $(g_{\mathcal{A}} : \mathcal{A} \in \mathcal{F})$ associated with $\lambda$ converges in $L^1(\mu, X)$ to some limit $h$.
(C) For each increasing sequence of algebras $\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \mathcal{A}_3 \subseteq \cdots$ contained in $\mathcal{S}$, the sequential sieve martingale $(g_n)$ associated with $\lambda$ converges in $L^1(\mu, X)$ to some limit $g_\infty$.
(D) For each increasing sequence of algebras $\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \mathcal{A}_3 \subseteq \cdots$ contained in $\mathcal{S}$, there exists some function $g_\infty \in L^1(\mu, X)$ with the following property: For each $\varphi \in X^*$, the scalar-valued sequential sieve martingale $(\varphi \circ g_n : n = 1, 2, 3, \ldots)$ converges in $L^1(\mu, \mathbb{F})$ to $\varphi \circ g_\infty$. (Here $\mathbb{F}$ is the scalar field.)
Furthermore, when these conditions are satisfied, then the functions $h$ in (A), (B) are the same, and the limits $g_\infty$ in (C), (D) are equal to $E(h\,|\,\mathcal{A}_\infty)$, where $\mathcal{A}_\infty$ is the $\sigma$-algebra generated by $\bigcup_{n \in \mathbb{N}} \mathcal{A}_n$.

Proof. The implication (A) $\Rightarrow$ (C) follows from 29.17; the implication (C) $\Rightarrow$ (B) follows from 19.7. To prove (B) $\Rightarrow$ (A), fix any set $S \in \mathcal{S}$. For all $\mathcal{A} \in \mathcal{F}$ sufficiently large, we have $S \in \mathcal{A}$, hence $\int_S g_{\mathcal{A}}\,d\mu = \lambda(S)$. Take limits as $g_{\mathcal{A}} \to h$ in $L^1(\mu, X)$ to obtain $\int_S h\,d\mu = \lambda(S)$.

The implication (C) $\Rightarrow$ (D) is obvious. For (D) $\Rightarrow$ (C), let any sequence $\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \mathcal{A}_3 \subseteq \cdots \subseteq \mathcal{S}$ be given, and suppose the conclusion of (D) holds. Temporarily fix any $\varphi \in X^*$. It is easy (exercise) to verify that the sequence $(\varphi \circ g_n)$ is a martingale in $L^1(\mu, \mathbb{F})$. Since that martingale converges in $L^1(\mu, \mathbb{F})$ to $\varphi \circ g_\infty$, we know from the implication (A) $\Rightarrow$ (B) in 29.17 that $\varphi \circ g_n = E(\varphi \circ g_\infty\,|\,\mathcal{A}_n)$ for each $n$. That is, $\varphi \circ g_n$ is $\mathcal{A}_n$-measurable, and $\int_S \varphi \circ g_n\,d\mu = \int_S \varphi \circ g_\infty\,d\mu$ for each set $S \in \mathcal{A}_n$. Since $\varphi$ is continuous and linear, it commutes with the Bochner integral; thus we obtain $\varphi\big(\int_S g_n\,d\mu\big) = \varphi\big(\int_S g_\infty\,d\mu\big)$ for each $\varphi \in X^*$. Since $X^*$ separates the points of $X$, it follows that $\int_S g_n\,d\mu = \int_S g_\infty\,d\mu$ for each $S \in \mathcal{A}_n$. That is, $g_n = E(g_\infty\,|\,\mathcal{A}_n)$. By the implication (B) $\Rightarrow$ (A) in 29.17, it follows that $g_n \to g_\infty$ in $L^1(\mu, X)$.
29.26. Theorem (Phillips). Every reflexive Banach space has the RNP.
Proof (following Rønnow [1967] and Chatterji [1968]). Let $(X, |\ |)$ be a reflexive Banach space. Let $(\Omega, \mathcal{S}, \mu)$ be a positive measure space, and let $\lambda$ be an $X$-valued $\mu$-continuous vector measure that has bounded variation. We are to show that the Radon-Nikodym derivative $d\lambda/d\mu$ exists.

If $\mathcal{S}$ is not complete, we may extend $\lambda$ and $\mu$ to the completion of $\mathcal{S}$, by taking them both to be 0 on any $\mu$-null set. Thus we may replace $\mathcal{S}$ with its completion - we may assume $\mathcal{S}$ is complete (i.e., every null set is measurable).

Since $\lambda \ll \mu$, we know that $|\lambda| \ll \mu$. By the classical Radon-Nikodym Theorem, we know that the Radon-Nikodym derivative $d|\lambda|/d\mu$ exists. It suffices to show that the derivative $d\lambda/d|\lambda|$ exists, for then we may apply the Chain Rule 29.12.b to obtain
$$\frac{d\lambda}{d\mu} = \frac{d\lambda}{d|\lambda|} \cdot \frac{d|\lambda|}{d\mu}.$$
Hence we may replace $\mu$ with $|\lambda|$; thus we may assume hereafter that $\mu = |\lambda|$. By rescaling, we may also assume that $\mu$ is a probability measure.

Let $\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \mathcal{A}_3 \subseteq \cdots$ be an increasing sequence of finite algebras contained in $\mathcal{S}$. Define the $X$-valued martingale $(g_n)$ as in 29.24.e. It suffices to verify condition 29.25(D).

Let $X_0$ be the closed linear span of the union of the ranges of the $g_n$'s; then $X_0$ is separable and weakly closed. Since $|\lambda(S)| \le |\lambda|(S) = \mu(S)$, from our definition of the $g_n$'s in 29.24 we see that $|g_n(\omega)| \le 1$ for all $n, \omega$. Let $B$ be the closed unit ball of $X$; the $g_n$'s take their values in $B \cap X_0$. The set $B$ is weakly compact by 28.41(C); hence $B \cap X_0$ is weakly compact. Therefore $B \cap X_0$ is weakly sequentially compact, by 28.36(E).

Temporarily fix any $\omega \in \Omega$. The sequence $(g_n(\omega))$ has a subsequence $(g_{n_k}(\omega))$ that converges weakly to some limit $g_\infty(\omega) \in B \cap X_0$. Many choices of $g_\infty(\omega)$ may be possible, using different subsequences; we use the Axiom of Choice to select some particular $g_\infty(\omega)$. Making a choice for each $\omega$, we define a function $g_\infty : \Omega \to B$. (Different $\omega$'s may use different subsequences; we do not yet assert anything about measurability of $g_\infty$.)

Let $\mathbb{F}$ be the scalar field. Temporarily fix any
[...] Theorem, the scalar-valued measure [...] has a Radon-Nikodym derivative [...]. By the implication (A) $\Rightarrow$ (C) in 29.25 (with the scalar field used as a Banach space), the sequence [...]

Semivariation and Bartle Integrals

[...] $u\lambda = u \circ \lambda$ is a scalar-valued charge; its variation is the positive charge $|u\lambda|$. The semivariation of $\lambda$ is the function $/\lambda/ : \mathcal{A} \to [0, +\infty]$ defined by
$$/\lambda/(A) = \sup_{u \in U} |u\lambda|(A).$$
Caution: A more commonly used notation is $\|\lambda\|$; see for instance Diestel and Uhl [1977]. Our notation $/\lambda/$ is unconventional, but perhaps more suggestive.
29.29. Basic properties of semivariations.
a. $/\lambda/$ is monotone - that is, [...]

[...] is an isomorphism (i.e., norm-preserving linear bijection) from $ba(\mathcal{A}, X)$ to the operator-normed space $BL(\mathrm{Unif}(\mathcal{A}), X) = \{\text{bounded linear maps from } \mathrm{Unif}(\mathcal{A}) \text{ to } X\}$ (defined as in 23.1). In particular, $ba(\mathcal{A}, \mathbb{F}) = (\mathrm{Unif}(\mathcal{A}))^*$.
c. The dual of $L^\infty(\mu)$. Let $(\Omega, \mathcal{S}, \mu)$ be a measure space (i.e., assume $\mathcal{S}$ is a $\sigma$-algebra and $\mu$ is countably additive). Show that the mapping $\lambda \mapsto \beta_\lambda$ gives an isomorphism from $ba(\mu, X)$ onto $BL(L^\infty(\mu), X)$, where $ba(\mu, X)$ is the subspace of $ba(\mathcal{S}, X)$ consisting of those charges that vanish on $\mu$-null sets. (Hints: Since $\mathcal{S}$ is a $\sigma$-algebra, $\mathrm{Unif}(\mathcal{S})$ is the space of bounded measurable functions. Then $L^\infty(\mu)$ is a quotient space of $\mathrm{Unif}(\mathcal{S})$, obtained by identifying those functions that are $\mu$-equivalent.) In particular, $ba(\mu, \mathbb{F}) = (L^\infty(\mu))^*$.

29.32. The following principles are equivalent to the Hahn-Banach Theorems, which were presented in 12.31, 23.18, 23.19, 26.56, 28.4, and 28.14.a. Notation is as in 29.30.

(HB25) Banach's Generalized Integral. Let $\Omega$, $\mathcal{A}$, $B(\Omega)$, $\mathbb{F}$, etc., be as in 29.30, with $X = \mathbb{F} = \mathbb{R}$. Let $\lambda$ be a bounded charge on $\mathcal{A}$. Then the Bartle integral $f \mapsto \int f\,d\lambda$, already defined on $\mathrm{Unif}(\mathcal{A})$ in 29.30, can be extended (not necessarily uniquely) to a continuous linear map § $: B(\Omega) \to \mathbb{R}$, satisfying $|$§$f| \le \|f\|_\infty\, /\lambda/$. If $\lambda$ is a positive charge, then § can be chosen so that it is also a positive linear functional.

(HB26) Banach's Charge. Let $\mathcal{A}$ be an algebra of subsets of a set $\Omega$, and let $\lambda$ be a bounded real-valued charge on $\mathcal{A}$. Then $\lambda$ can be extended to a real-valued charge $\Lambda$ on all of $\mathcal{P}(\Omega)$. If $\lambda$ is a positive charge, then we can also choose $\Lambda$ to be a positive charge. (See also the related remarks in 21.23.)

Proof that (HB2) and (HB7) imply (HB25). The first statement is obvious from (HB7); the result about positive charges will take a little more work. Observe that the mapping $f \mapsto \|f^+\|_\infty$ is convex, since $\|f^+\|_\infty$ is the supremum of the linear functionals $0$ and $f \mapsto f(\omega)$ (for $\omega \in \Omega$). For $f \in \mathrm{Unif}(\mathcal{A})$, we have $\int f\,d\lambda \le \int f^+\,d\lambda \le \|f^+\|_\infty\, /\lambda/$, hence by (HB2) we can extend the Bartle integral to a linear map § that satisfies §$f \le \|f^+\|_\infty\, /\lambda/$. When $f \le 0$, then $\|f^+\|_\infty = 0$, hence §$f \le 0$; this proves § is a positive linear functional. However, we still need to show that a functional chosen in this fashion will also satisfy the inequality $|$§$f| \le \|f\|_\infty\, /\lambda/$. Any $f \in B(\Omega)$ can be expressed in terms of its Jordan Decomposition: $f = f^+ - f^-$ with
$f^+(\omega) = f(\omega)$ and $f^-(\omega) = 0$ wherever $f(\omega) \ge 0$,
$f^-(\omega) = -f(\omega)$ and $f^+(\omega) = 0$ wherever $f(\omega) \le 0$.
Then §$(f^+)$ and §$(f^-)$ are both nonnegative, so
$$|\text{§}f| = |\text{§}(f^+) - \text{§}(f^-)| \le \max\{\text{§}(f^+), \text{§}(f^-)\} \le \max\{\|f^+\|_\infty, \|f^-\|_\infty\}\, /\lambda/ = \|f\|_\infty\, /\lambda/.$$

Proof of (HB25) $\Rightarrow$ (HB26). Define $\Lambda(S) = $ §$(1_S)$.

Proof of (HB26) $\Rightarrow$ (HB12). Let $\mathcal{I} = \{\Omega \setminus F : F \in \mathcal{F}\}$; this is the proper ideal that is dual to $\mathcal{F}$. Let $\mathcal{A} = \mathcal{F} \cup \mathcal{I} = \{S \subseteq \Omega : S \in \mathcal{F} \text{ or } S \in \mathcal{I}\}$; verify that $\mathcal{A}$ is an algebra of sets. Define $\lambda : \mathcal{A} \to \{0, 1\}$ by taking $\lambda(F) = 1$ for each $F \in \mathcal{F}$ and $\lambda(I) = 0$ for each $I \in \mathcal{I}$; verify that this is a positive charge on $\mathcal{A}$. Now extend $\lambda$ to [...]
MEASURES ON INTERVALS

29.33. Theorem: scalar-valued measures on an interval. Let $\mathbb{F}$ be the scalar field ($\mathbb{R}$ or $\mathbb{C}$). Let $\varphi : [a, b] \to \mathbb{F}$ be a function that has bounded variation (in the sense of intervals - i.e., as in 19.21). Then:
a. The Henstock-Stieltjes integral $\mu_\varphi(S) = \int_a^b 1_S(t)\,d\varphi(t)$ exists for every Borel set $S \subseteq [a, b]$, and thus defines a scalar-valued measure $\mu_\varphi$ on those sets. (That measure has bounded variation in the sense of measures, as in 29.5 - see 29.6.h.)
b. Let $f : [a, b] \to \mathbb{F}$ be bounded and measurable (from the Borel sets to the Borel sets). Then the Henstock-Stieltjes integral $\int_a^b f\,d\varphi$ exists and is equal to the Bartle integral $\int_{[a,b]} f\,d\mu_\varphi$.

Proof. If $\varphi$ is complex-valued, we may write it as $\mathrm{Re}\,\varphi + i\,\mathrm{Im}\,\varphi$; thus it suffices to consider real-valued $\varphi$. Any real-valued function of bounded variation can be written as the difference of two increasing functions. Thus we can apply 24.35; $\mu_\varphi$ is a linear combination of positive finite measures on the Borel sets, and thus it is a scalar-valued measure.

By the definition of $\mu_\varphi$, the equation $\int_a^b f\,d\varphi = \int_{[a,b]} f\,d\mu_\varphi$ is clear when $f$ is the characteristic function of a measurable set - hence also when $f$ is a simple function, by linearity. The simple functions are dense in $\mathcal{L}^\infty(\mathcal{S})$, and the Henstock-Stieltjes and Bartle integrals are continuous on $\mathcal{L}^\infty(\mathcal{S})$ (see 24.17 and 29.30). Take limits to prove $\int_a^b f\,d\varphi = \int_{[a,b]} f\,d\mu_\varphi$ for all $f \in \mathcal{L}^\infty(\mathcal{S})$.

29.34. Riesz Representation Theorem for intervals. Let $\Omega = [a, b]$ be a compact interval in $\mathbb{R}$, and let $\mathcal{B}$ be its $\sigma$-algebra of Borel subsets. Let $\mathbb{F}$ be the scalar field ($\mathbb{R}$ or $\mathbb{C}$), and let $C[a, b] = \{\text{continuous functions from } [a, b] \text{ into } \mathbb{F}\}$. Then these three Banach spaces are isomorphic:
the space $C[a, b]^*$ of continuous linear functionals $\Lambda : C[a, b] \to \mathbb{F}$, equipped with the operator norm;
the space $ca(\mathcal{B}, \mathbb{F})$ of scalar-valued measures $\mu$, normed by the variation $|\mu|$ as in 29.29.g;
the space $NBV([a, b], \mathbb{F})$ of normalized functions of bounded variation, equipped with the norm $\|\varphi\| = \mathrm{Var}(\varphi, [a, b])$ as in 22.19.d.
In fact, the following maps are linear norm-preserving bijections:
the mapping $M : \varphi \mapsto \mu_\varphi$ from $NBV([a, b], \mathbb{F})$ to $ca(\mathcal{B}, \mathbb{F})$ defined by the Henstock-Stieltjes integral $\mu_\varphi(S) = \int_a^b 1_S\,d\varphi$ as in 29.33;
the mapping $B : \mu \mapsto \beta_\mu$ from $ca(\mathcal{B}, \mathbb{F})$ to $C[a, b]^*$ defined by the Bartle integral $\beta_\mu(f) = \int_\Omega f\,d\mu$, as in 29.30;
the mapping $A : \varphi \mapsto \Lambda_\varphi$ from $NBV([a, b], \mathbb{F})$ onto $C[a, b]^*$ given by the Riemann-Stieltjes integral $\Lambda_\varphi(f) = \int_a^b f\,d\varphi$, as in 24.26(ii). (In fact, the mapping $A$ is actually equal to $B \circ M$.)
Thus, the two kinds of "variations" defined in 19.21 and 29.5 are equal, for $\varphi$ and $\mu$ corresponding as above.

Proof (based on Limaye [1981]). The equation $A = B \circ M$ follows from 29.33. It follows from 24.28 that the mapping $A$ is injective when considered only from $NBV([a, b], \mathbb{F})$ to $C[a, b]^*$. We saw in 29.30 and 24.16.c that [...]

[...] the function
$$(**) \qquad u = \sum_{j=1}^n k_j\,1_{(t_{j-1}, t_j]}(\cdot) = \sum_{j=1}^n k_j \big(1_{(a, t_j]}(\cdot) - 1_{(a, t_{j-1}]}(\cdot)\big)$$
satisfies $\overline{\Lambda}(u) = \sum_{j=1}^n k_j\,[\psi(t_j) - \psi(t_{j-1})]$. We claim next that $\psi \in BV([a, b], \mathbb{F})$, with $\mathrm{Var}(\psi) \le \|\overline{\Lambda}\|$. To see this, let any partition of $[a, b]$ be given; define $u$ as in $(**)$ with constants
$$k_j = \begin{cases} \dfrac{|\psi(t_j) - \psi(t_{j-1})|}{\psi(t_j) - \psi(t_{j-1})} & \text{if } \psi(t_j) \ne \psi(t_{j-1}), \\[1ex] 0 & \text{otherwise.} \end{cases}$$
We may assume that the $k_j$'s are not all zero; hence $\|u\|_\infty = 1$ and
$$\sum_{j=1}^n |\psi(t_j) - \psi(t_{j-1})| = \sum_{j=1}^n k_j\,[\psi(t_j) - \psi(t_{j-1})] = \overline{\Lambda}(u) \le \|\overline{\Lambda}\|.$$
This proves our claim.

Next we claim that $\int_a^b f\,d\psi = \Lambda(f)$ for every $f \in C[a, b]$. Indeed, since $f$ is continuous and $\psi$ has bounded variation, the integral $\int_a^b f\,d\psi$ is a Riemann-Stieltjes integral, not just a Henstock-Stieltjes integral. Thus, in the approximating Riemann sums $\Sigma[f, \tau, \psi]$, we may take the tags $\tau_j$ to be any points in the subintervals $[t_{j-1}, t_j]$. In particular, we may take $\tau_j = t_j$. Let $k_j = f(t_j)$, and define $u$ as in $(**)$; then $\Sigma[f, \tau, \psi] = \sum_{j=1}^n k_j\,[\psi(t_j) - \psi(t_{j-1})] = \overline{\Lambda}(u)$. The Riemann sums of this type converge to the Riemann-Stieltjes integral $\int_a^b f\,d\psi$. Meanwhile, the approximating functions $u$ defined in $(**)$ converge uniformly to the continuous function $f$, and so $\overline{\Lambda}(u)$ converges to $\overline{\Lambda}(f) = \Lambda(f)$. This proves our claim.

By 24.28 we may write $\psi = \psi_1 + \psi_2$ where $\psi_1 \in C[a, b]^\perp$, $\psi_2 \in NBV([a, b], \mathbb{F})$, and $\mathrm{Var}(\psi_2) \le \mathrm{Var}(\psi)$. Take $\varphi = \psi_2$; this completes the proof of the present theorem.

29.35. We now state a more general theorem. We omit the proof, which is quite long; it can be found in books on measure and integration.

Riesz Representation Theorem (general version; proof omitted). Let $\Omega$ be a locally compact, Hausdorff topological space. Let $C_c(\Omega)$ be the ordered vector space of all real-valued, continuous functions on $\Omega$ that have compact support. Then each positive linear functional $\Lambda$ on $C_c(\Omega)$ is of the form $\Lambda(f) = \int_\Omega f\,d\mu$, where $\mu$ is a positive measure on the Borel subsets of $\Omega$. There may be more than one measure satisfying this requirement, but there is only one satisfying the following further conditions: Each compact subset of $\Omega$ has finite measure; $\mu$ is outer regular, in the sense that $\mu(B) = \inf\{\mu(G) : G \text{ is an open superset of } B\}$ for each Borel set $B$; and $\mu$ is inner regular, in the sense that $\mu(G) = \sup\{\mu(K) : K \text{ is a compact subset of } G\}$ for each open set $G$.

29.36. Theorem.
Let $\varphi$ be some mapping from an interval $[a, b]$ into the scalar field ($\mathbb{R}$ or $\mathbb{C}$). Then the following conditions are equivalent:
(A) $\varphi(x) = \varphi(a) + \int_a^x g(t)\,dt$ for some function $g \in L^1[a, b]$.
(B) $\varphi$ is absolutely continuous in the classical sense; that is, for each number $\varepsilon > 0$ there exists some number $\delta > 0$ such that whenever $a \le s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n \le b$ with $\sum_{j=1}^n |s_j - t_j| < \delta$, then $\sum_{j=1}^n |\varphi(s_j) - \varphi(t_j)| < \varepsilon$. In this case we might also describe $\varphi$ as absolutely continuous in the sense of intervals.
(C) $\varphi$ is continuous, and $\varphi$ has bounded variation in the sense of intervals (see [...]
Moreover, if we define a measure $\mu_\varphi$ [...] $\delta > 0$ such that $\lambda(A) < \delta$ [...]

Proof of (B) $\Rightarrow$ (C). That $\varphi$ is continuous and has bounded variation in the sense of intervals is an easy exercise. To prove $\mu_\varphi$ [...] $> 0$ be given; choose $\delta > 0$ as in (B). Let $S$ be any Lebesgue-measurable subset of $[a, b]$ with $\lambda(S) < \delta$; we shall show that $|\mu_\varphi$ [...]
[...] $v(0)$ as $n \to \infty$, and $f_n \to g$ uniformly on $[0, T] \times G$ as $n \to \infty$. Then $u_n \to v$ uniformly on $[0, T]$ as $n \to \infty$.

Proof. Since $\mathrm{Range}(v)$ is a compact set, by 22.36 there exist some number $r > 0$ and some function $\varphi \in L^1[0, T]$ with this property: Whenever $p_1$ and $p_2$ are continuous functions from $[0, T]$ into $G$ such that
$$\max_{j=1,2}\ \max_{0 \le t \le T} \mathrm{dist}\big(p_j(t), \mathrm{Range}(v)\big) < r,$$
[...]

[...] $v(t_{j-1})$ as $n \to \infty$; it suffices to show that $u_n \to v$ uniformly on $[t_{j-1}, t_j]$. Let any $\varepsilon$ in $(0, r/2)$ be given; choose $n$ large enough so that $\|u_n(t_{j-1}) - v(t_{j-1})\| \le \varepsilon$ and $\|f_n - g\|_\infty \le \varepsilon$; it suffices to show that $\|u_n(t) - v(t)\| \le 2\varepsilon$ for all $t \in [t_{j-1}, t_j]$. Suppose the contrary; let $\tau$ be the first point in $[t_{j-1}, t_j]$ satisfying $\|u_n(\tau) - v(\tau)\| \ge 2\varepsilon$. Then for all $s$ in $[t_{j-1}, \tau)$, we have $\|u_n(s) - v(s)\| < 2\varepsilon < r$. For such $s$ we have
$$\begin{aligned} \|f_n(s, u_n(s)) - g(s, v(s))\| &\le \|f_n(s, u_n(s)) - g(s, u_n(s))\| + \|g(s, u_n(s)) - g(s, v(s))\| \\ &\le \|f_n - g\|_\infty + \varphi(s)\,\|u_n(s) - v(s)\| \\ &\le \varepsilon + 2\varepsilon\varphi(s). \end{aligned}$$
Therefore
$$\begin{aligned} 2\varepsilon = \|u_n(\tau) - v(\tau)\| &= \Big\| u_n(t_{j-1}) - v(t_{j-1}) + \int_{t_{j-1}}^{\tau} [f_n(s, u_n(s)) - g(s, v(s))]\,ds \Big\| \\ &\le \|u_n(t_{j-1}) - v(t_{j-1})\| + \int_{t_{j-1}}^{\tau} [\varepsilon + 2\varepsilon\varphi(s)]\,ds < \varepsilon + \frac{\varepsilon}{2}, \end{aligned}$$
a contradiction.

30.11. [...] Notation. [...] $\Phi_f : BC([0, T], X) \to BC([0, T], X)$ by the rule
$$(\Phi_f u)(t) = x_0 + \int_0^t f(s, u(s))\,ds.$$
A solution of (IVP) is the same thing as a fixed point of $\Phi_f$. It suffices to verify the hypotheses of 20.10.

Note that different $f$'s yield different $\Phi_f$'s. Indeed, if $f_1(\tau, \xi) \ne f_2(\tau, \xi)$, then any continuous function $u$ with $u(\tau) = \xi$ will yield $\Phi_{f_1}(u) \ne \Phi_{f_2}(u)$; verifying this is an easy exercise. Let $\Psi$ be the set of all such mappings $\Phi_f$. Then $\Psi$ is a bijective copy of $BC([0, T] \times X, X)$, and so we may topologize $\Psi$ by copying the topology of $BC([0, T] \times X, X)$. Then $\Psi$ is a complete metric space. It is easy to verify that the topology on $\Psi$ is stronger than the topology of uniform convergence on $BC([0, T], X)$.

When $f$ is locally Lipschitz, then (IVP) has a unique solution - i.e., $\Phi_f$ has a unique fixed point. Define $\Psi_0$ as in 20.10; then $\Psi_0$ contains the locally Lipschitz functions, by 30.10. The locally Lipschitz maps from $[0, T] \times X$ into $X$ are dense in $BC([0, T] \times X, X)$, by 18.6.c. This completes the proof.
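When $f$ is Lipschitz, the unique fixed point of the integral operator $u \mapsto x_0 + \int_0^t f(s, u(s))\,ds$ can be approximated by iterating the operator (Picard iteration). The sketch below is offered only as an illustration and is not taken from the text: it works in the scalar case $X = \mathbb{R}$, replaces the integral by a trapezoid rule on a fixed grid, and the names `picard_step` and `picard`, as well as the test problem $u' = u$, $u(0) = 1$, are assumptions.

```python
import math

def picard_step(f, x0, u, ts):
    """Apply (Phi u)(t) = x0 + integral_0^t f(s, u(s)) ds on the grid ts (trapezoid rule)."""
    out = [x0]
    for i in range(1, len(ts)):
        h = ts[i] - ts[i - 1]
        out.append(out[-1] + 0.5 * h * (f(ts[i - 1], u[i - 1]) + f(ts[i], u[i])))
    return out

def picard(f, x0, T, steps=200, iterations=30):
    """Iterate the discretized operator, starting from the constant function x0."""
    ts = [T * i / steps for i in range(steps + 1)]
    u = [x0] * len(ts)
    for _ in range(iterations):
        u = picard_step(f, x0, u, ts)
    return ts, u

# Hypothetical test problem: u' = u, u(0) = 1 on [0, 1], whose solution is exp(t).
ts, u = picard(lambda t, x: x, 1.0, 1.0)
```

After a few dozen iterations the grid function is, up to rounding, a fixed point of the discretized operator, and its value at $t = 1$ is close to $e$; the remaining discrepancy comes from the trapezoid rule rather than from the iteration.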
COMPACTNESS CONDITIONS

30.12. Peano's Existence Theorem. Let $\Omega$ be an open subset of a Banach space $X$. Let $f : [0, +\infty) \times \Omega \to X$ be jointly continuous (or, more generally, assume $f$ is jointly measurable and $f(t, \cdot) : \Omega \to X$ is continuous for each fixed $t$). Suppose that
(CM) $f$ is a compact mapping - i.e., $f$ maps bounded sets to relatively compact sets.
Let $x_0 \in \Omega$. Then the initial value problem (IVP) in 30.1 has at least one solution for some $T > 0$.

Remarks. We can weaken (CM) slightly: it suffices to assume that $f$ maps closed, bounded sets to relatively compact sets. This condition can be omitted altogether if $X$ is finite dimensional and $f$ is jointly continuous, since this condition follows from those assumptions - see 17.7.h and 17.17. The finite-dimensional case can be found in most books on ordinary differential equations. Hypothesis (CM) cannot be removed when $X$ is infinite-dimensional; see 30.4.

Proof of theorem. Let $R$ be some positive number small enough so that the closed ball $B$ centered at $x_0$ with radius $R$ is contained in $\Omega$. Then $f([0, 1] \times B)$ is relatively compact, hence bounded; say $\|u\| \le M$ for all $u \in f([0, 1] \times B)$. Choose some positive number $T \le \min\{1, R/M\}$. Let $C([0, T], X)$ and $C([0, T], B)$ be the sets of continuous functions from $[0, T]$ into $X$ and into $B$, respectively; then $C([0, T], X)$ is a Banach space (with the sup norm) and $C([0, T], B)$ is a closed convex subset of that Banach space. Define a mapping $\Phi : C([0, T], B) \to C([0, T], X)$ by
$$\Phi(v)(t) = x_0 + \int_0^t f(s, v(s))\,ds \qquad (0 \le t \le T).$$
It is easy to show that this mapping is continuous. By our choice of $T$, it follows easily (exercise) that $\Phi$ maps $C([0, T], B)$ into itself. Furthermore, $\|\Phi(v)(t) - \Phi(v)(r)\| = \big\|\int_r^t f(s, v(s))\,ds\big\| \le |t - r|\,M$, and thus the range of $\Phi$ is equicontinuous.

Recall from 26.23.i that, in a Banach space, the closed convex hull of a compact set is compact. The set $f([0, T] \times B)$ is relatively compact; hence its closed convex hull is a compact set $K_1 \subseteq X$. Any function $\Phi(v)$ has range contained in $x_0 + TK_1$, which is also a compact subset of $X$. By the Arzela-Ascoli Theorem (18.35), the range of $\Phi$ is contained in a compact set $\mathcal{X}_1 \subseteq C([0, T], X)$. Let $\mathcal{X}_2$ be the closed convex hull of $\mathcal{X}_1$; then $\Phi$ has range contained in $\mathcal{X}_3 = \mathcal{X}_2 \cap C([0, T], B)$, which is a compact convex subset of $C([0, T], X)$. The restriction of $\Phi$ to $\mathcal{X}_3$ is a continuous self-mapping of the compact convex set $\mathcal{X}_3$. By Schauder's Fixed Point Theorem (27.19), $\Phi$ has at least one fixed point in $\mathcal{X}_3$; that fixed point is a solution of (IVP).

30.13. Remarks on generalizations. Instead of assuming that $f$ maps bounded sets to relatively compact sets, we could make the weaker assumption that $\gamma(f(t, S)) \le \omega(t, \gamma(S))$ for all $t$ and all bounded sets $S$; here $\gamma$ is one of the measures of noncompactness $\alpha$ or $\beta$ (defined in 19.19) and $\omega$ is some suitable function. Some results in this direction are given by Mönch and von Harten [1982], Heinz [1983], Banas [1985], Song [1987], and other papers cited by those.
Chapter 30: Initial Value Problems
ISOTONICITY CONDITIONS

30.14. Biles-Schechter Theorem. Let $(X, \|\ \|, \preccurlyeq)$ be a Dedekind complete Banach lattice. Let $[0, T]$ and $X$ be equipped with their $\sigma$-algebras of Lebesgue-measurable sets and Borel sets, respectively. Let $f : [0, T] \times X \to X$ be a mapping with the following properties:
(i) $f$ is jointly measurable and maps separable sets to separable sets (or, more generally, $f$ has the property that whenever $x : [0, T] \to X$ is continuous, then the mapping $t \mapsto f(t, x(t))$ is measurable and separably valued).
(ii) For each fixed $t$, the function $f(t, \cdot) : X \to X$ is increasing.
(iii) There exist functions $b, c \in L^1([0, T], X)$ such that $b(t) \preccurlyeq f(t, x) \preccurlyeq c(t)$ for all $(t, x) \in [0, T] \times X$.
Then for each $x_0 \in X$, there exists a Carathéodory solution to (IVP). Among the solutions there is a largest; it is the pointwise supremum of all the solutions. We may refer to it as the maximal solution. In fact, it is also equal to the pointwise supremum of all the solutions of the integral inequality
$$u(t) \preccurlyeq x_0 + \int_0^t f(s, u(s))\,ds \qquad (0 \le t \le T).$$
If we also assume (�) there exists a function m E L 1 [0, T] such that ll f(t, x) ll :::; m (t) for all (t, x) E
Remarks.
[O, T] X X,
then we do not need X to be a lattice; it suffices to assume that X is a Dedekind complete ordered Banach space whose positive cone is closed and whose topology and ordering make X a locally full space (defined in 26 . 52) . Condition (�) can be replaced by still other, weaker, more complicated conditions, but we shall not pursue those here. Isotonicity conditions have not yet been used extensively in applications in the literature. We include this theorem not so much for its usefulness, but for its theoretical interest. The present argument was first given in finite dimensions by Biles [1995]; it was subsequently extended to Banach lattices by Schechter [1996] . Proof of theorem. We shall first use the fact that X is a Banach lattice to prove condition (�); in fact we shall prove it with m (t) ll b(t) ll + ll c(t) ll · The proof is just an application of ordinary lattice arithmetic. For any t, x we have and c( t) � /c(t)/ � /b(t)/ + /c(t)/ f(t, x ) =
-f(t, x)
-b(t) � /b(t)/ � /b(t)/ + /c(t)/,
hence / f(t, x)/
f(t, x) V (-f(t, x)) � /b(t)/ + /c(t)/
//b(t)/ + /c(t)/ j.
825
Isotonicity Conditions
Then
$$\|f(t,x)\| \le \bigl\|\,|b(t)| + |c(t)|\,\bigr\| \le \bigl\|\,|b(t)|\,\bigr\| + \bigl\|\,|c(t)|\,\bigr\| = \|b(t)\| + \|c(t)\|.$$
This proves (�).

Let $C([0,T],X) = \{\text{continuous functions from } [0,T] \text{ into } X\}$. Let $\sqsubseteq$ denote the pointwise ordering on $C([0,T],X)$ - that is, $u \sqsubseteq v$ if $u(t) \preccurlyeq v(t)$ for all $t \in [0,T]$. Observe that if $v \in C([0,T],X)$, then the mapping $s \mapsto f(s,v(s))$ is measurable and $\|\ \|$-dominated by the integrable function $m$; hence it is integrable. Therefore we can define the function
$$(\Phi v)(t) = x_0 + \int_0^t f(s,v(s))\,ds.$$
Then $\Phi$ maps $C([0,T],X)$ into itself. It is clear that a solution of the initial value problem is the same as a fixed point of $\Phi$, and a solution of the integral inequality is the same as a solution of $u \sqsubseteq \Phi u$. Since each mapping $f(s,\cdot) : X \to X$ is $\preccurlyeq$-increasing, it follows that $\Phi : C([0,T],X) \to C([0,T],X)$ is $\sqsubseteq$-increasing. Let
$$\beta(t) = x_0 + \int_0^t b(s)\,ds, \qquad \gamma(t) = x_0 + \int_0^t c(s)\,ds;$$
those are continuous functions of t. Define the set
$$V = \Bigl\{ v \in C([0,T],X) \;:\; \beta(t) \preccurlyeq v(t) \preccurlyeq \gamma(t) \ \text{and}\ \int_r^t b(s)\,ds \preccurlyeq v(t) - v(r) \preccurlyeq \int_r^t c(s)\,ds \ \text{for all}\ [r,t] \subseteq [0,T] \Bigr\}.$$
Clearly, $\Phi$ maps $C([0,T],X)$ into $V$; hence $V$ is nonempty and $\Phi$ maps $V$ into $V$.

We next show $(V, \sqsubseteq)$ is a complete lattice. (We note that $C([0,T],X)$ is not Dedekind complete, in general; thus $\beta$ and $\gamma$ are essential for the following argument.) Let $W$ be any nonempty subset of $V$, and define $\sigma(t) = \sup\{v(t) : v \in W\}$ and $\tau(t) = \inf\{v(t) : v \in W\}$; these functions are well defined since X is Dedekind complete. It suffices to show that $\sigma \in V$ (for then $\tau \in V$ by similar reasoning). Clearly, $\beta(t) \preccurlyeq \sigma(t) \preccurlyeq \gamma(t)$. Fix any $[r,t] \subseteq [0,T]$. For each $v \in W$ we have
$$v(r) + \int_r^t b(s)\,ds \preccurlyeq v(t) \quad\text{and}\quad v(t) \preccurlyeq v(r) + \int_r^t c(s)\,ds,$$
hence
$$v(r) + \int_r^t b(s)\,ds \preccurlyeq \sigma(t) \quad\text{and}\quad v(t) \preccurlyeq \sigma(r) + \int_r^t c(s)\,ds,$$
and hence (taking the supremum on the left side)
$$\sigma(r) + \int_r^t b(s)\,ds \preccurlyeq \sigma(t) \quad\text{and}\quad \sigma(t) \preccurlyeq \sigma(r) + \int_r^t c(s)\,ds,$$
Chapter 30: Initial Value Problems
and therefore $\int_r^t b(s)\,ds \preccurlyeq \sigma(t) - \sigma(r) \preccurlyeq \int_r^t c(s)\,ds$. We shall use that inequality, finally, to prove that $\sigma$ is continuous. To show that $\sigma$ is continuous from the right, let $t_n \downarrow r$; then $\int_r^{t_n} b(s)\,ds \to 0$ and $\int_r^{t_n} c(s)\,ds \to 0$ (see 26.52(E)). Since X is a Banach lattice (or, more generally, since X is locally full), it follows that $\sigma(t_n) - \sigma(r) \to 0$. Similarly, $\sigma$ is left continuous. Thus $V$ is order complete. We can now apply Tarski's Fixed Point Theorem in the form of 4.30; this completes the proof of the present theorem.

30.15. Corollary on comparison of solutions. Let $f_1, f_2$ be two functions satisfying the conditions of the preceding theorem, let $x_1, x_2 \in X$, and let $u_1, u_2$ be the maximal solutions of the initial value problems
$$u_j'(t) = f_j(t, u_j(t)) \quad (0 \le t \le T), \qquad u_j(0) = x_j$$
for j = 1, 2. Suppose that $x_1 \preccurlyeq x_2$, and $f_1(t,x) \preccurlyeq f_2(t,x)$ for all $(t,x) \in [0,T] \times X$. Then $u_1(t) \preccurlyeq u_2(t)$ for all $t \in [0,T]$.

Proof. We have $u_1(t) = x_1 + \int_0^t f_1(s,u_1(s))\,ds \preccurlyeq x_2 + \int_0^t f_2(s,u_1(s))\,ds$. Thus $u_1$ is a solution of the integral inequality given by $f_2$ and $x_2$. However, $u_2$ is the largest solution of that integral inequality.
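The comparison corollary can be illustrated numerically in the simplest case X = ℝ. The sketch below is only an illustration under stated assumptions: the right-hand sides `f1` and `f2` are hypothetical choices (bounded, increasing in x, with `f1 <= f2`), and the explicit Euler scheme is an approximation, not the maximal solution itself. For these Lipschitzian choices the solution of each problem is unique, so it coincides with the maximal one.

```python
import math

def euler(f, x0, T=1.0, steps=1000):
    """Explicit Euler approximation of u' = f(t, u), u(0) = x0 on [0, T]."""
    h = T / steps
    u = [x0]
    for k in range(steps):
        u.append(u[-1] + h * f(k * h, u[-1]))
    return u

# Hypothetical right-hand sides (not from the text): both are bounded and
# increasing in x, and f1 <= f2 everywhere.
def f1(t, x):
    return math.atan(x)

def f2(t, x):
    return math.atan(x) + 0.5

u1 = euler(f1, x0=0.0)  # smaller initial value, smaller right-hand side
u2 = euler(f2, x0=0.2)  # larger initial value, larger right-hand side
ordered = all(a <= b for a, b in zip(u1, u2))
print("u1 <= u2 at every grid point:", ordered)
```

The ordering is preserved step by step for the same reason as in the proof: if `u1[k] <= u2[k]`, then `f1(t, u1[k]) <= f1(t, u2[k]) <= f2(t, u2[k])`.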
GENERALIZED SOLUTIONS
30.16. The preceding subchapters were concerned mainly with Caratheodory solutions of differential equations. Such solutions are differentiable almost everywhere, as we noted in 30.5. In the remainder of this chapter we consider "generalized solutions" - i.e., functions that are not necessarily differentiable, but nevertheless "solve" the differential equation in some natural sense. We shall briefly discuss why generalized solutions are sometimes needed; then we discuss some of the main types of generalized solutions.

Let us begin with the world's simplest partial differential equation:
$$\frac{\partial u}{\partial t} = \frac{\partial u}{\partial x} \qquad\text{or, more briefly,}\qquad u_t = u_x.$$
We seek real-valued solutions $u = u(t,x)$, defined for real t, x. It is easy to verify that a solution is given by $u(t,x) = p(t+x)$, if p is any real-valued differentiable function. We could refer to $u_t = u_x$ as a very simple wave equation, because the function $u(t,x) = p(t+x)$ behaves much like a wave at the seashore: it retains its shape while moving horizontally. (Caution: The term "wave equation" is commonly applied to several other, more complicated equations that model water waves more accurately.)

Classically, a solution $u = u(t,x)$ is viewed as a real-valued function - i.e., a mapping $u : \mathbb{R}^2 \to \mathbb{R}$. A different viewpoint, closer to the ideas at the beginning of this chapter, views $u(t,x)$ as a continuous function of x for each fixed t. Thus, for each t, $u(t,\cdot)$ is a
member of some space of continuous functions - e.g., the Banach space $BC(\mathbb{R})$ of bounded, continuous functions from $\mathbb{R}$ into $\mathbb{R}$, equipped with the sup norm. Then we may suppress the x variable from our notation and write $u(t,\cdot)$ instead as $u(t) \in BC(\mathbb{R})$. We may view u as a Banach-space-valued function $u : [a,b] \to BC(\mathbb{R})$. With this viewpoint, we may attempt to apply theorems like the ones developed earlier in this chapter.

However, it is important to understand that the Frechet derivative $v = u'(t_0)$ of the Banach-space-valued function $u : [a,b] \to BC(\mathbb{R})$ is a much stronger derivative than the classical, pointwise derivative $v(x) = u_t(t_0,x)$ of the real-valued function $u : \mathbb{R}^2 \to \mathbb{R}$. In both cases we have
$$\frac{u(t_0+h,x) - u(t_0,x)}{h} \to v(x) \qquad\text{as } h \to 0.$$
For the pointwise derivative the convergence is pointwise in x; for the Frechet derivative the convergence is uniform in x. If p is a differentiable function but $p' \notin BC(\mathbb{R})$, then $u(t,x) = p(t+x)$ only satisfies $u_t = u_x$ in the classical (pointwise) sense, not in the sense of Frechet derivatives. Thus theorems of the type developed earlier in this chapter are not directly applicable.

It is natural to view $u(t,x) = p(t+x)$ as the "solution" of the initial value problem
$$u_t(t,x) = u_x(t,x) \quad (t \ge 0), \qquad u(0,x) = p(x) \qquad\text{(WIVP)}$$
for any differentiable function p. By taking limits, we may extend this definition; it is natural to view $u(t,x) = p(t+x)$ as the "solution" of the wave initial value problem (WIVP) for any function p - even one that is not differentiable. Thus, some differential equations have natural "solutions" that are not differentiable in any sense. Such nondifferentiable solutions turn out to be the correct answers to many physical problems.

30.17. The need for nondifferentiable solutions becomes even more evident when we turn to nonlinear problems, such as Burgers's Equation:
$$u_t = u\,u_x.$$
Even if the initial data $u(0,\cdot)$ is continuously differentiable, the solution $u(t,\cdot)$ may develop discontinuities at some later time t. We shall demonstrate this with some simple examples.

Let $q : \mathbb{R} \to \mathbb{R}$ be some continuously differentiable function that satisfies $q(z) = z$ for all z in $[-1,1]$, and $q'(z) \ge 1$ for all $z \in \mathbb{R}$. The particular choice of q will not affect our main reasoning below, but we mention a couple of examples for concreteness. A trivial example is given by $q(z) = z$; more complicated examples can be devised by the reader.

For any fixed t in $[0,1)$, the function $\psi(t,z) = q(z) - tz$ satisfies $\frac{\partial}{\partial z}\psi(t,z) \ge 1 - t > 0$. Hence $\psi(t,\cdot)$ is strictly increasing and is a bijection from $\mathbb{R}$ onto $\mathbb{R}$. Let $u(t,\cdot)$ be its inverse; thus
$$u(t,\, q(z) - tz) = z. \qquad\text{(qq)}$$
(In the example of $q(z) = z$ we obtain $\psi(t,z) = (1-t)z$ and $u(t,x) = (1-t)^{-1}x$.) Now differentiate both sides of (qq) with respect to t to obtain
$$0 = \frac{d}{dt}\bigl[u(t,\, q(z) - tz) - z\bigr] = u_t(t,\, q(z) - tz) - u_x(t,\, q(z) - tz)\,z.$$
Then substitute $q(z) - tz = x$, so that $z = u(t,x)$, to obtain $0 = u_t(t,x) - u_x(t,x)\,u(t,x)$. Thus $u(t,x)$ is a solution of Burgers's Equation $u_t = u\,u_x$, at least for $0 \le t < 1$. The initial data is $u(0,x) = q^{-1}(x)$, which is continuously differentiable since q is continuously differentiable with $q' \ge 1$.

Observe that $u(t,x) \le -1$ for all $x \le -1+t$, and $u(t,x) \ge 1$ for all $x \ge 1-t$. If u extends to a continuous function on $0 \le t \le 1$, it must satisfy $u(1,x) \le -1$ for all $x \le 0$ and $u(1,x) \ge 1$ for all $x \ge 0$, a contradiction. Thus the solution $u(t,x)$, which is continuously differentiable for all $(t,x)$ in $[0,1) \times \mathbb{R}$, becomes discontinuous at time t = 1; we say that it develops shocks.

After time t = 1, the solution may still be physically meaningful, but its mathematical theory becomes more complicated. We shall not pursue that theory here, other than to mention that one must deal with "generalized solutions" - i.e., discontinuous functions $u(t,x)$ that correspond somehow to the equation $u_t = u\,u_x$ but do not satisfy it in a classical sense. The development of shocks is quite typical of nonlinear partial differential equations and is explained further in books on that subject - see Lax [1973], for instance.
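The construction in 30.17 can be explored numerically. The following is a minimal sketch, not part of the original argument: the particular function `q` is a hypothetical nontrivial choice satisfying the stated conditions, the inverse of $z \mapsto q(z) - tz$ is computed by bisection, and the equation $u_t = u\,u_x$ is checked by central differences at points away from $x = \pm(1-t)$, where this particular u is only once differentiable.

```python
import math

def q(z):
    """A hypothetical q: equals z on [-1, 1], is C^1, and has q'(z) >= 1."""
    a = abs(z)
    if a <= 1.0:
        return z
    return math.copysign(a + (a - 1.0) ** 2, z)

def u(t, x, lo=-1.0e3, hi=1.0e3, iters=200):
    """Invert z -> q(z) - t*z by bisection (valid for 0 <= t < 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q(mid) - t * mid < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def residual(t, x, d=1.0e-4):
    """Central-difference estimate of u_t - u * u_x at (t, x)."""
    u_t = (u(t + d, x) - u(t - d, x)) / (2 * d)
    u_x = (u(t, x + d) - u(t, x - d)) / (2 * d)
    return u_t - u(t, x) * u_x

def slope_at_origin(t, d=1.0e-4):
    """Central-difference estimate of u_x(t, 0)."""
    return (u(t, d) - u(t, -d)) / (2 * d)

# Near x = 0 we have u(t, x) = x / (1 - t), so the slope grows like 1 / (1 - t).
for t in (0.0, 0.5, 0.9, 0.99):
    print(t, slope_at_origin(t))
```

The printed slopes grow without bound as t approaches 1, which is the numerical counterpart of the shock forming at time t = 1.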
SEMIGROUPS AND DISSIPATIVE OPERATORS
30.18. Motivation for the Crandall-Liggett Theorem. Let A be an operator for which the differential equation $u'(t) = A(u(t))$ has "solutions" of some sort. More precisely, suppose that M is a subset of a Banach space, and for each $x_0 \in M$ there is a unique solution $u : [0,+\infty) \to M$ of the initial value problem
$$\left\{ \begin{array}{ll} u'(t) = A(u(t)) & (t \ge 0), \\ u(0) = x_0. & \end{array} \right.$$
We may denote that solution by $u(t) = S(t)x_0$ to display its dependence on both the time t and the initial value $x_0$. In this fashion we define a family of mappings $S(t) : M \to M$ for $t \ge 0$, with $S(0)x = x$. For most reasonable notions of "solution," the solutions of the two initial value problems
$$\left\{ \begin{array}{l} u'(t) = A(u(t)) \quad (0 \le t), \\ u(0) = x_0 \end{array} \right\} \qquad\text{and}\qquad \left\{ \begin{array}{l} v'(t) = A(v(t)) \quad (0 \le t), \\ v(0) = u(t_1) \end{array} \right\}$$
are related by $u(t_1 + s) = v(s)$. From this it follows that the mappings S(t) satisfy the semigroup property: $S(t)S(s)z = S(t+s)z$.

In some of the most interesting cases, the semigroup of mappings satisfies an exponential growth condition: $\langle S(t)\rangle_{\mathrm{Lip}} \le \exp(\omega t)$ for some constant $\omega$. For example, if $A : X \to X$ is a Lipschitzian mapping, then A generates a semigroup satisfying an exponential growth condition; that follows from 30.9. However, a semigroup arising from differential equations may satisfy an exponential growth condition even if the operator A is not Lipschitzian - in fact, even if the operator A is not continuous. In 30.24 we shall show that if the semigroup S(t) is differentiable at
t = 0, then the operator A must satisfy a dissipativeness condition; this is a generalization of Lipschitzness. Conversely, even for semigroups that are not differentiable, an operator A that satisfies a dissipativeness condition plus a mild "range condition" must generate a semigroup that satisfies an exponential growth condition; this is established in 30.28.

We might denote the semigroup S(t) instead by $S_A(t)$, to display its dependence also on the choice of the operator A. A still more suggestive notation is $S(t)x = e^{tA}x$, where A is the operator appearing in the differential equation. If A is a continuous linear operator, then $e^{tA}$ can be defined in several different equivalent fashions:
$$e^{tA} = \sum_{n=0}^{\infty} \frac{(tA)^n}{n!} = \lim_{n\to\infty} \Bigl(I + \frac{t}{n}A\Bigr)^{n} = \lim_{n\to\infty} \Bigl(I - \frac{t}{n}A\Bigr)^{-n}.$$
If the operator A is discontinuous and/or nonlinear, then most of these formulas become meaningless or incorrect, but the limit of $\bigl(I - \frac{t}{n}A\bigr)^{-n}$ may still be meaningful and useful. Even if A is a badly behaved operator - e.g., a differential operator, which is discontinuous in most of the usual Banach spaces of functions - the operator $(I - \lambda A)^{-1}$ may be quite well behaved when $\lambda$ is a small positive number - e.g., it may be an integral operator,
which is continuous or even compact on many of the usual Banach spaces.

30.19. Although the abstract theory applies to both linear and nonlinear operators, for illustrative purposes we shall give just one very elementary linear example. (For more advanced examples, the reader should consult books devoted specifically to partial differential equations and evolution equations.) Let us use the Banach space $C_0(\mathbb{R})$ of continuous functions from $\mathbb{R}$ to $\mathbb{R}$ that vanish at infinity (as explained in 22.15), with the sup norm. Let A be the operator $\frac{d}{dx}$, with domain D(A) equal to the set of all functions $f \in C_0(\mathbb{R})$ such that f is differentiable and $f' \in C_0(\mathbb{R})$. Then the differential equation $u'(t) = A(u(t))$ is a reformulation of "the world's simplest partial differential equation," discussed in 30.16. We shall now show that $(I - \lambda A)^{-1}$ is very well behaved for any positive number $\lambda$.

Let $g \in C_0(\mathbb{R})$ be given; what is the value of $(I - \lambda A)^{-1}g$? Assuming that it exists, it is a function $f \in D(A)$ that satisfies $f - \lambda A f = g$. Let us find that function f. Rewrite its equation as $f(x) - \lambda f'(x) = g(x)$. Multiply both sides of this equation by $-\lambda^{-1}\exp(-x/\lambda)$, to obtain
$$\frac{d}{dx}\Bigl[f(x)\exp\Bigl(\frac{-x}{\lambda}\Bigr)\Bigr] = -\frac{1}{\lambda}\,g(x)\exp\Bigl(\frac{-x}{\lambda}\Bigr).$$
Integrate both sides - starting from x = 0, say - to obtain
$$f(x)\exp\Bigl(\frac{-x}{\lambda}\Bigr) = C - \frac{1}{\lambda}\int_0^x g(t)\exp\Bigl(\frac{-t}{\lambda}\Bigr)dt$$
for some constant C. To find the value of C, take limits on both sides of this equation as $x \to \infty$. We have $f(x) \to 0$ since f vanishes at $\infty$, and thus $C = \frac{1}{\lambda}\int_0^\infty g(t)\exp(-t/\lambda)\,dt$. This integral converges, since g vanishes at infinity and $\exp(-t/\lambda)$ vanishes exponentially fast. Therefore the last displayed equation can be rewritten
$$f(x) = \frac{1}{\lambda}\exp\Bigl(\frac{x}{\lambda}\Bigr)\int_x^\infty g(t)\exp\Bigl(\frac{-t}{\lambda}\Bigr)dt.$$
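The resolvent formula for $A = d/dx$ just derived can be checked numerically. The sketch below is illustrative only: the test function `g`, the value of `lam`, the truncation length `span`, and the step counts are assumptions chosen for convenience. It evaluates $f(x) = \frac{1}{\lambda}\int_x^\infty g(t)\exp(-(t-x)/\lambda)\,dt$, which is the same formula with the factor $\exp(x/\lambda)$ moved inside the integral to avoid overflow, and then tests $f - \lambda f' = g$ and $\sup|f| \le \sup|g|$ on a grid.

```python
import math

def resolvent(g, lam, x, span=40.0, steps=4000):
    """Trapezoidal approximation of
    f(x) = (1/lam) * integral_x^inf g(t) * exp(-(t - x)/lam) dt,
    truncated to the interval [x, x + span*lam]."""
    h = span * lam / steps
    total = 0.0
    for k in range(steps + 1):
        t = x + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * g(t) * math.exp(-(t - x) / lam)
    return total * h / lam

def g(t):
    """Hypothetical test function in C_0(R), with sup |g| = 1."""
    return math.exp(-t * t)

lam = 0.5
xs = [-3.0 + 0.5 * i for i in range(13)]
f_vals = [resolvent(g, lam, x) for x in xs]

# Check f - lam * f' = g using a central difference for f'.
d = 1.0e-3
worst = 0.0
for x, fx in zip(xs, f_vals):
    f_prime = (resolvent(g, lam, x + d) - resolvent(g, lam, x - d)) / (2 * d)
    worst = max(worst, abs(fx - lam * f_prime - g(x)))

print("max |f - lam*f' - g| on the grid:", worst)
print("sup|f| <= sup|g| on the grid:", max(abs(v) for v in f_vals) <= 1.0)
```

The bound on the sup norm reflects the fact that f(x) is a weighted average of the values of g on $[x, \infty)$, with weights that integrate to 1.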
It is easy to verify that f, defined by the last equation, is indeed a member of D(A) that solves $(I - \lambda A)f = g$; and the preceding computations show that there is no other solution. A further computation shows that $\|f\|_{\sup} \le \|g\|_{\sup}$. Thus $(I - \lambda A)^{-1}$ is a nonexpansive linear operator defined everywhere on $C_0(\mathbb{R})$. This is typical of the kind of operator to which the Crandall-Liggett Theorem is applicable - but we emphasize that that theorem applies to much more complicated operators as well.

Exercise. Modifying the computations above, show that $(I + \lambda A)^{-1}$ is also a nonexpansive linear operator defined everywhere on $C_0(\mathbb{R})$, for each $\lambda > 0$.

30.20. Let X be a Banach space, and let $J : X \to \mathcal{P}(X^*)$ be its duality mapping (defined as in 28.44). Let A be some mapping from a subset of X into X. Then the following two conditions are equivalent; if either (hence both) is satisfied, we say A is dissipative (or $-A$ is accretive):

(A) Whenever $\lambda > 0$, then the mapping $(I - \lambda A) : \mathrm{Dom}(A) \to X$ is injective, and its inverse mapping $(I - \lambda A)^{-1} : \mathrm{Ran}(I - \lambda A) \to \mathrm{Dom}(A)$ is nonexpansive.

(B) Whenever $x_1, x_2 \in \mathrm{Dom}(A)$, then there is some $\varphi \in J(x_1 - x_2)$ such that $\varphi[A(x_1) - A(x_2)] \le 0$.

Proof (following Cioranescu [1990]). Let $y_1 = A(x_1)$ and $y_2 = A(x_2)$. Let $x = x_1 - x_2$ and $y = y_1 - y_2$; then we are to show that

(A') $\|x - \lambda y\| \ge \|x\|$ for all $\lambda > 0$

if and only if

(B') there is some $\varphi \in J(x)$ such that $\varphi(y) \le 0$.

For (B') $\Rightarrow$ (A') we simply compute
$$\|x\|^2 = \varphi(x) \le \varphi(x) - \lambda\varphi(y) = \varphi(x - \lambda y) \le \|x\|\,\|x - \lambda y\|.$$
The proof of (A') $\Rightarrow$ (B') is longer. We may assume x and y are both nonzero (explain), hence $x - \lambda y$ is also nonzero for each $\lambda > 0$. For each $\lambda > 0$ choose some $\xi_\lambda \in J(x - \lambda y)$; this vector is also nonzero. Form the unit vector $\eta_\lambda = \xi_\lambda / \|\xi_\lambda\|$. Then
$$\|x\| \le \|x - \lambda y\| = \eta_\lambda(x - \lambda y) = \eta_\lambda(x) - \lambda\eta_\lambda(y) \le \|x\| - \lambda\eta_\lambda(y),$$
from which we conclude both
$$\eta_\lambda(y) \le 0 \qquad\text{and}\qquad \|x\| \le \eta_\lambda(x) + \lambda\|y\|. \qquad (**)$$
Since the vectors $\eta_\lambda$ all lie in the unit ball of $X^*$, which is weak-star compact by (UF28) in 28.29, the net $(\eta_\lambda : \lambda \downarrow 0)$ has a subnet converging in the weak-star topology to some limit $\eta_0$ in that unit ball. Then $\|\eta_0\| \le 1$. Now we may take limits in (**); we obtain $\|x\| \le \eta_0(x)$ and $\eta_0(y) \le 0$.
Since $\eta_0$ is in the unit ball, we can conclude $\|x\| = \eta_0(x)$ and $\|\eta_0\| = 1$. Then $\varphi = \|x\|\,\eta_0$ is a member of J(x), satisfying $\varphi(y) \le 0$.

30.21. A generalization. Let X be a Banach space, and let $J : X \to \mathcal{P}(X^*)$ be its duality mapping. Let A be a mapping from some subset of X into X, and let $\omega$ be a nonnegative number. Then the following three conditions are equivalent (exercise); if they are satisfied we say A is $\omega$-dissipative:

(A) Whenever $\lambda \in (0, \frac{1}{\omega})$, then the mapping $(I - \lambda A) : \mathrm{Dom}(A) \to X$ is injective, and its inverse mapping $R(\lambda) = (I - \lambda A)^{-1} : \mathrm{Ran}(I - \lambda A) \to \mathrm{Dom}(A)$ is Lipschitzian with $\langle R(\lambda)\rangle_{\mathrm{Lip}} \le (1 - \lambda\omega)^{-1}$.

(B) Whenever $x_1, x_2 \in \mathrm{Dom}(A)$, then there is some $\varphi \in J(x_1 - x_2)$ such that $\varphi[A(x_1) - A(x_2)] \le \omega\|x_1 - x_2\|^2$.

(C) $A - \omega I$ is dissipative.

30.22. Remarks. If A is a Lipschitzian mapping, with $\langle A\rangle_{\mathrm{Lip}} \le \omega$, then A and $-A$ are both $\omega$-dissipative. For this reason, dissipativeness conditions are sometimes called one-sided Lipschitz conditions. However, that terminology may be misleading. For instance, define A as in 30.19. Then A and $-A$ are both dissipative, but A is not Lipschitzian; in fact, A is not even continuous.

30.23. Example. Let X be a real Hilbert space with inner product $\langle\ ,\ \rangle$. Then an operator A is dissipative if and only if it has this property: Whenever $x_1, x_2 \in \mathrm{Dom}(A)$, then $\langle x_1 - x_2,\, A(x_1) - A(x_2)\rangle \le 0$. If X is one-dimensional - i.e., if X is just the real line - then A is dissipative if and only if $(x_1 - x_2)(A(x_1) - A(x_2)) \le 0$; that inequality is satisfied if and only if A is a decreasing function.

30.24. Proposition. Let C be a subset of a Banach space X, and let S be a semigroup of self-mappings of C. Assume that $\langle S(t)\rangle_{\mathrm{Lip}} \le e^{\omega t}$ for some constant $\omega \ge 0$ and all $t \ge 0$. Define a mapping from a subset of C into X by
$$A(x) = \lim_{h \downarrow 0} \frac{S(h)x - x}{h},$$
where the domain of the operator A is the set of all $x \in C$ for which the limit exists. Then A is $\omega$-dissipative.
Proof. Fix any $x_1, x_2 \in \mathrm{Dom}(A)$ and $\lambda \in (0, \frac{1}{\omega})$; let $h > 0$. Then
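$$\begin{aligned}
\Bigl\| \Bigl(x_1 - \lambda\,\frac{S(h)x_1 - x_1}{h}\Bigr) - \Bigl(x_2 - \lambda\,\frac{S(h)x_2 - x_2}{h}\Bigr) \Bigr\|
&= \Bigl\| \Bigl(1 + \frac{\lambda}{h}\Bigr)(x_1 - x_2) - \frac{\lambda}{h}\,[S(h)x_1 - S(h)x_2] \Bigr\| \\
&\ge \Bigl\| \Bigl(1 + \frac{\lambda}{h}\Bigr)(x_1 - x_2) \Bigr\| - \Bigl\| \frac{\lambda}{h}\,[S(h)x_1 - S(h)x_2] \Bigr\| \\
&\ge \Bigl(1 + \frac{\lambda}{h}\Bigr)\|x_1 - x_2\| - \frac{\lambda}{h}\,e^{\omega h}\,\|x_1 - x_2\| \\
&= \Bigl[1 - \lambda\,\frac{e^{\omega h} - 1}{h}\Bigr]\,\|x_1 - x_2\|.
\end{aligned}$$
Take limits as $h \downarrow 0$, to prove
$$\|(I - \lambda A)x_1 - (I - \lambda A)x_2\| \ge (1 - \lambda\omega)\,\|x_1 - x_2\|,$$
which is condition (A) of 30.21.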
30.25. Lemma. Let A be an $\omega$-dissipative mapping, and let $R(\lambda) = (I - \lambda A)^{-1}$. Then for any numbers $\alpha, \beta \in (0, \frac{1}{\omega})$ and any vectors $u \in \mathrm{Ran}(I - \alpha A)$ and $v \in \mathrm{Ran}(I - \beta A)$, we have
$$(\alpha + \beta - \omega\alpha\beta)\,\|R(\alpha)u - R(\beta)v\| \le \alpha\,\|R(\alpha)u - v\| + \beta\,\|u - R(\beta)v\|.$$
Proof. Let $x = R(\alpha)u$ and $y = R(\beta)v$; thus $u = x - \alpha A(x)$ and $v = y - \beta A(y)$.
t iT) is Cauchy for fixed $t \ge 0$ and $x \in D$; denote its limit by $S(t)x$. Hold j fixed and let $k \to \infty$ to prove the convergence rate (a). Since $\langle R(s)\rangle_{\mathrm{Lip}} \le (1 - \omega s)^{-1}$ on all of M, it follows that $S(t) = \lim_{j\to\infty} R(t/j)^j$ exists and is Lipschitzian on all of M, with $\langle S(t)\rangle_{\mathrm{Lip}} \le \lim_{j\to\infty}(1 - \omega t/j)^{-j} = e^{\omega t}$. Now apply (4) with $\alpha = t/j$ and $\beta = s/k$, and take limits to prove (b). Since S(t) is Lipschitzian on M, it follows easily that $(t,x) \mapsto S(t)x$ is jointly continuous. Finally, by induction on n show that $S(s/n)^n x = \lim_{k\to\infty} R(s/kn)^{kn}x = S(s)x$. Use this to prove (c) when $t/s$ is rational; then use continuity to prove (c) for all s and t in $[0,+\infty)$.
30.28. Crandall-Liggett Theorem. Let X be a Banach space, and let A be a mapping from some set $\mathrm{Dom}(A) \subseteq X$ into X. Assume A is $\omega$-dissipative for some $\omega \ge 0$. Also assume this range condition: $\mathrm{Ran}(I - \lambda A) \supseteq \mathrm{cl}(\mathrm{Dom}(A))$ for all sufficiently small $\lambda > 0$. Then the limit
$$S(t)x = \lim_{n\to\infty} \Bigl(I - \frac{t}{n}A\Bigr)^{-n} x$$
exists for each $x \in \mathrm{cl}(\mathrm{Dom}(A))$ and each $t \ge 0$. In fact, the functions $R(\lambda) = (I - \lambda A)^{-1}$ satisfy the hypotheses and hence the conclusions of 30.27, with $M = \mathrm{cl}(\mathrm{Dom}(A))$.

Proof. Choose $T > 0$ small enough so that the range condition is satisfied for all $\lambda \in (0,T)$, and also so that $\omega T < 1$. It is easy to verify that $f(x) \le \|A(x)\|$ for $x \in \mathrm{Dom}(A)$; hence $\mathrm{Dom}(A) \subseteq D$; hence D is indeed dense in M. The other hypotheses of 30.27 follow from properties of $\omega$-dissipative operators developed in the last few pages - see 30.21 and 30.25.

30.29. Remarks on extensions and generalizations. For simplicity we have only considered $\omega \ge 0$; with some effort it is possible to generalize so that $\omega$ may also take negative values. Actually, much of the literature concerns itself only with the case of $\omega = 0$, because the most interesting ideas are already present in that case and the computations are tidier. We
751 binary operation, 24 binary relation, see relation binding, 358 binomial coefficient, 48 Binomial Theorem, 48 Bipolar Theorem, 762 Bochner integral, 613 Bochner-Lebesgue space, 588 Bohnenblust-Sobczyk Correspondence, 280 Boolean algebra or ring, 329, 334 homomorphism, 329, 336 lattice, 326 space, 472 subalgebra, 330 subring, 335 -valued interpretation, model, universe,
381, 383
Borel sets and a-algebra, 1 16, 289, 555 Borel-Lebesgue measure, 555 bound, bounded above or below, 59 function, 97, 293, 579 greatest lower, see infimum hyperreal, 251 least upper, see infimum or supremum linear map ( normed) , 605 linear map (TVS), 719 locally, see locally bounded lower or upper, 59 metrically, 97, 1 1 1 order, 57 order-bounded operator, 296 remetrization function, 486 sets form an ideal, 104 subset of a normed space, 580, 718
Index and Symbol List subset of a TVS, 718 totally, see totally bounded variable (not free) , 355, 358 variation, 507, 784 boundary, 530 Bourbaki-Alaoglu Theorem, 762 BP, 401 Bronsted ordering, 5 19 Brouwer's Fixed Point Theorem, 727 Brouwer's Triple Negation Law, 342, 370 Brouwerian lattice, 341 Browder's Fixed Point Theorem, 778 Bunyakovski1 Inequality, 39 Burali-Forti Paradox, 127 Burgers's Equation, 824 Caccioppoli Fixed Point Theorem, 515 canonical choices, see choice embedding in the bidual, 240, 775 isomorphism, 240 net, 159, 160 shoe, 140 well ordering, 74 Cantor construction of the reals, 513 founder of set theory, 43, 71 1 function, 674 space, 462 theorem on card(2 x ) , 46 theorem on card(N x N), 45 Caratheodory Convexity Theorem, 307 Caratheodory solution of ODE, 814 card, cardinality, 14, 43 and AC, 145 and compactness, 468 and dimension, 282, 286 and metric spaces, 429 and O"-algebras, 549 and ultrafilters, 1 51 collapse, 348 numbers (cardinals) , 126 of the rationals, 190 of wosets, 74 also, see countable, Hartogs number
859 Caristi ' s Fixed Point Theorem, 518 Cartan's Ultrafilter Principle, 151 category Baire (first or second) , 531 concrete, 210 inverse image, 212 nonconcrete, 2 16 objects and morphisms, 208 of sets, 212 Cauchy completeness, 501 continuity, 510 derivatives notation, 659 filters, nets, sequences in metric spaces, 502 in TAG's and TVS's, 706 in uniform or gauge spaces, 498 Intersection Theorem, 502 -Lipschitz Theorem, 816 -Riemann Equations, 666 -Schwarz Inequality, 39, 586, 591 space, 5 1 1 CC, see choice Ceitin ' s Theorem, 136 centered convergence, 170 CH, see Continuum Hypothesis chain (ordering) , 62 Chain Rule for Frechet derivatives, 662 for Radon-Nikodym derivatives, 789 change of variables, 788 chaotic topology, see indiscrete topology character group, characters, 708 character, finite, see finite character characteristic function, 34 charge, 288, 618 charts, tables, diagrams A 6. :J (algebra plus ideal) , 1 17 Argand diagram, 255 arithmetic in [ - oo, +oo] , 13 arithmetic in z6 , 188 arity function of a ring, 206 bal( co(S)) not necessarily convex, 305 Banach spaces, 573 Bessaga-Brunner metric, 522
860 Boolean and set algebras, 327 Cantor's function, 675 categories (dual), 238 categories (elementary) , 208 Choice and its relatives, 131 compactness and its relatives, 452 Condorcet's Paradox, 63 convergence spaces, 1 55 convex and nonconvex sets, 302 convexity and its relatives, 303 Dieudonne-Schwartz Lemma, 701 distances, F-norms, etc. , 686 dual concepts, 6 functions that agree, 37 Hausdorff metric, 1 12 injective, surjective, etc., 37 Intermediate Value Theorem, 433 lattice diagrams, 89 measure convergences, 561 monotone maps, 58 Moore closures, 79 numbers, common sets of, 12 preorders, 49 regularity and separation, 434 Schroder-Bernstein Theorem, 44 "sets" that violate ZF, 32 topological vector spaces, 685 typographical conventions, 3 uniformity, distances, etc., 480 Venn diagram, 1 8 zigzag line, 671 Chebyshev's Inequality, 565 choice AC (Axiom of) , equivalents, 139, 144-146, 285, 424, 425, 460, 461 , 503 arbitrary or canonical, 74, 77, 140, also see canonical countable (CC ) , 148, 466, 502 dependent (DC) , 149, 403, 442, 446, 525, 536 finitely many times (FAC) , 141 for finite sets (ACF ) , 141 for the reals (ACR), 140, 152 function, 139 Kelley's, 147
Index and Symbol List multiple (MC) , 141 pathological consequences, 142 Russell's socks, 140 circle group, 183, 238, 260 circle of convergence, 584 circled (same as balanced) , see convex cis (cos + i sin) , 256 clan, 1 1 7 Clarkson's Inequality, 262, 592 Clarkson's Renorming Theorem, 597 class, 26 classical (in IST) , 398 classical logic, 363 clopen, 106, 107, 328, 472 closed, closure algebraic, 84 ball, 108, 688 convergence, 410 convex hull, 698 down- or up-, 80 formula, 375 Graph Theorem, 731 , 732, 745 half-space, 750 Hausdorff metric, 1 12 interval, 57 Kuratowski's Axioms, 1 12 mapping, 422 Moore, 78, 225, 303, 4 1 1 neighborhood base, 427 path, 681 relativization, 416 string, 730 topological, 106, 1 1 1 , 4 1 1 under operations, 19, 83, 179, 330 closest point, 470, 598, 602 cluster point, 430, 452, 453, 456, 466 coarser (weaker) , see stronger or weaker codomain, 19, 210, 2 16, 230 coefficients, 584 cofinal, see frequent cofinal subnet, 163 cofinite cardinality, 43, 158 filter, see filter topology, 107, 461
collapse, 348 column matrix, 20, 192, 606 combination convex, 305 Frechet, 487, 689 linear, 275 combinatory logic, 360 comeager, 531 Common Kernel Lemma, 281 commutative, 24 algebra, 273 composition isn't, 35 fundamental operation, 203 group, 182 matrix multiplication isn't, 192 monoid, 1 79 ring, 187 compact, compactness Cauchy structure, 504 mapping, 820 principle of logic, 391 , 464 spaces or sets, 452 uniform continuity, 489 comparability of wosets, 74 comparable, 52 Comparison Law, 135 compatible topology with distances, 1 10, 703 uniformity with distances, 1 19 uniformity with topology, 1 19 complement additive, 185 in a lattice, 326 orthogonal, 86, 300, 600 sets, 16 complete, 253 assignment, 65 Boolean lattice, 326 Dedekind, 87 measure space, 553 metrics and uniformities, 501 ordered field, 246 ordered group, 242 ordering (lattice) , 87 theory, 204
861 completely metrizable, 535 regular, 441 Completeness Principle of Logic, 386, 390 completion of a measure space, 553 of a normed space, 577 of an ordered group, 243 order (Dedekind) , 93 order (MacNeille) , 94 uniform or metric, 512 complex charge or measure, 288 conjugate, 255 derivative, 666 differentiable, 682 linear functional, 280 linear map, 277 linear space, 279 numbers ( q , 255 complexification of real linear space, 279 component, componentwise, 20, 192, 422, also see pointwise, product composition of functions, 35 of morphisms, 216 of relations, 50 Comprehension, Axiom of, 30 concatenation, 181 concave, 310 concrete category, 210 condition of Baire, 538 conditional expectation, 789 Condorcet ' s Paradox, 63 cone, 712 congruent modulo m , 183, 188 conjugate complex, 255 exponents, 591 symmetry, 599 conjunction, 4, 357 connected, 106 connective, 357 consequence, 352 conservative, 395
862 consistent, consistency, 368, 399, 401 , 402 constructive, constructible Axiom of, 130, 348 example with irrationals, 134 in the sense of Bishop, 133, 403, 576 in the sense of Godel, 1 29, 348 Intermediate Value Theorem isn't, 432 numbers, 270 relative to the ordinals, 1 30 Trichotomy Law isn't, 135, 271 , 349 contains, 1 2 continuous absolutely, 783, 806 at a point, 417 from the left or right, 420 function (on a set) , 212, 417 indefinite integral is, 640, 654 scalar, 687 continuously differentiable, 660 Continuum Hypothesis (CH), 47 contraction, 481 Contraction Mapping Theorem, 515 contradiction, 377, also see proof by contrapositive, 6, 341 , 370 contravariant functor, 227 convergent, convergence almost uniformly, 561 along a universal net or ultrafilter, 454 centered, 170 closure, see closed, closure Hausdorff, 170 in a limit space, 168 in a metric space, 155 in complete lattices, 1 74 in measure, 561 in posets, 171 in probability, 561 interior, see interior isotone, 1 70 martingales, 791 , 793 monotone, see monotone of a net or filter, 1 69 order, 171 preserving, 169 pretopological, 409
Index and Symbol List series, 266, 583 space, 168 topological, 412 uniform, 490 converse, 5 convex, and similar algebraic notions (affine, balanced, star, symmetric, absolutely convex) combination, 305 derivatives, 3 1 1 , 680 function, 309, 313 hull, 303 infimum, 313 order convex, 80, 712 set, 302 convolution, 275 Cook-Fischer filter condition, 413 coordinate projection, 22, 236, 422 coordinatewise, see pointwise, product coproduct, 227 countable, countably, 15, 43 8 (products or intersections) , 43 a (sums or unions) , 43 additive, 288 boundedness in TVS's, 719 choice (CC), see choice compact, 466 Fa and G15, 529 gauge, 486 infinite, 15, 43 model, 378 N x N is, 45 products or intersections (8), 43 pseudometrizability criteria, 703 recursion, 47, 148 sums or unions (a) , 43 union of countable sets, 149 valued, 547 counting measure, 551 covariant functor, 227 cover, covering, 17, 504 Lemma of Lebesgue, 468 Cowen-Engeler Lemma, 152 Crandall-Liggett Theorem, 832 cross product, 275
Index and Symbol List crystal, 66 cubic polynomial equation, 257 cumulative hierarchy, 1 29 cut, 93 6, see countable products, Kronecker 6-fine, 629 Darboux integral, 628 DC, see choice De Moivre's formula, 256 De Morgan's Laws for Boolean algebras, 329 for logic, 6 for sets, 1 6 decimal representation, 269 decomposition Banach-Tarski, 142 direct sum in Hilbert space, 602 direct sum of groups, 185 Jordan, 199 Riesz, 300 sums in lattice groups, 199 decreasing, see increasing or decreasing Dedekind complete, see complete Dedekind finite or infinite, 149 deduce, 363 Deduction Principle, 373 definable, 139 defined on, 19 degenerate Boolean lattice, 327 degree of a polynomial, 191 Denjoy-Perron integral, 628 dense, 416 Density Property of Fields, 248 denumerable, 43 dependence, linear, 280 Dependent Choice (DC), see choice derivation, 352 derivative, 659 detachment, 363 devil's staircase, 674 diagonal set, 50 diagrams, see charts diameter, 97 dictionary order, see lexicographical
differ, 36 differentiable, 659 dimension, 282, 284, 286 Dini's Convergence Theorem, 456 direct product, 219 direct sum external, 227 internal, 184 directed order, directed set, 52, 156 disconnected, 106 discrete or indiscrete absolute value, 261 G-norm, 577 measure, 551 metric, 41 (a--)algebra, 115 topology, 107 TVS topology, 695 uniformity, 120 disjoint, 16, 618 disjunction, 4, 357 disk of convergence, 584 dissipative, 827 distance between closed sets, 1 1 2 between two points, 40 from a point to a set, 97 distance-preserving, 40 distribution, 744 distributive for functions, 24 for sets, 18 in a ring, 187 lattice, 90, 326 divergent series, 266 Dom, domain in a model, 377 in a nonconcrete category, 216 of a function, 19 of a morphism, 210 Dominated Convergence Theorem for co , 581 for Lebesgue spaces, 589 for totally measurable functions, 692 dot product, 599
Index and Symbol List
864 double elliptic geometry, 346 Double Negation Law, 342, 370 Dowker's Sandwich, 449 down-closed, see lower or upper dual, duality, 6 Boolean algebras and spaces, 337, 474 closed sets and open sets, 1 06 closures and interiors, 410 covering and free collection, 1 7 distributive laws, 18, 90 Euclidean space is its own, 283 eventual sets and infrequent sets, 1 59 exponential functor, 238 filters and ideals, 1 0 1 , 336 functor, 238 map of a normed space, 776 of a linear map, 283 of a linear space, 277 of a normed space, 608 of a Pontryagin group, 708 of a TVS, 749 of L 00 , 802 of ordered vector space, 299 of the Lebesgue spaces, 779 order and its inverse, 50 in a Boolean algebra, 329 in an ordered group, 195 l.s.c. , u.s.c., 421 pairing of vector spaces, 751 sets and their complements, 1 6 two families o f functions, 23, 751 Duns Scotus Law, 363 dyadic rational, 542 E-induction and E-recursion, 33 earlier, 5 1 Eberlein-Smulian Theorem, 4 77, 768 effective domain, 3 1 1 effectively equivalent or proved, 56, 144 Egorov's Theorem, 562 Eisenstein function, 259 element, 1 1 , 20, 27, 192 embedding, 209 empirical consistency, 401 empty
function, 22 relation, 50 set, 14, 30 endpoints, 305 Engeler-Cowen Lemma, 152 enlarging a filter, 103 entities, 396 entourage, 118 entry, 20 epigraph, 309 Epimenides, 9 equality, equals axioms for, 364 ordered pairs, 20 sets, 11 equational axioms and varieties, 204 equiconsistent, 402 equicontinuous, 493 equivalence defined by a filter or ideal, 230 a linear subspace, 278 a subgroup, 186 meager differences, 538 relation or classes, 52, 54 used to define field of fractions, 190 operations on quotient, 223 also, see equivalent equivalent consistency assertions, 402 definitions, phrases, statements, 5, 55, 138, 328 (F-)(G-)(semi)norms, 575 gauges or pseudometrics topologically, 109 uniformly, 119 homotopy-, 216 nets, 163 of choice, see choice structure-determining devices, 211 topologically or uniformly, 211 also, see equivalence essential infimum, 568 essential supremum, 568, 589
Euclidean norm, 578 Euler's constant, 267 evaluation map, 240, 757 eventual, eventually, 158 eventuality filter, see filter, eventuality eventually constant, 165 examples (or lack of), see intangible Excluded Middle, see Law of the E.M. existence of atoms, 28 Banach limits, 318, 321 bijection (Schröder-Bernstein), 44 Boolean prime ideal, 339 canonical net, 159, 160 cardinalities between ℕ and ℝ, 47 closest point, 470, 598, 774 cluster point, 452, 768 common superfilter, 103 common supernet, 410 completion of a uniform space, 514 completion of ordered group, 243 completion of poset, 93 explicit examples, xvi, 133, 404 free ultrafilters, 151 hyperreal numbers, 250 inaccessible cardinal, 46, 401 infinitesimals, 398 initial structures, 218 integrals, 630, 640, 656 intermediate value, 432 Lebesgue measure, 649 liminf and limsup, 175 locally finite cover, 448 maximum value, 456, 465 measurable cardinal, 254 model, 386 Moore closure, 79 nonconstructive proof, 8, 133 nowhere-differentiable functions, 670 objects proved by showing int(S) ≠ ∅, 411 set is comeager, hence nonempty, 531 partition of unity, 445, 448 quotient of algebraic systems, 223 Radon-Nikodym derivative, 793
real numbers, 249, 270, 513 set lacking Baire property, 132, 808 sets, 30 shrinking of a cover, 445 solutions to polynomial equations, 257, 470 sup-completion of poset, 96 uniformity generated (not), 120 universal subnet, 166 unmeasurable set or map, 549, 557, 587 Urysohn function, 445 Weil's pseudometric, 98 well ordering, 74, 144 witness for a formula, 380 also, see AC, DC, HB, UF, fixed point existential quantifier, 357 expectation, 613 explicit example, 404 exponential functors, 238 exponential growth condition, 825 exportation law, 363 extended real line, 13 extension of a function, 36 Extensionality, Axiom of, 29 external direct sum, 227, 276 external object, 397 extra-logical axioms, 364 F-lattice, 716 F-(semi)norm, F-space, 686, also see norm, seminorm, G-(semi)norm F_σ, G_δ, 529 FAC, see choice factorial, 48 false, falsehood, see truth Fatou's Lemma, 566, 647 Fermat's Last Theorem, 134 field, 187 field of sets, see algebra of sets figures, see charts filter, 100, 336 base or subbase, 104 cofinite (Fréchet), 103, 105 correspondence with nets, 158 enlarge, 103
eventuality (tails), 159 iterated, 103, 413 maximal, 105 neighborhood, 110, 409 proper or improper, 100, 336 ultra-, see ultrafilter final topology, 426 locally convex, 741 δ-fine tagged division, 629 finer (stronger), see stronger or weaker finest locally convex topology, 742 finitary, 25, 202 finite, 15, 43 character, 77, 144 charge, 551 choice (ACF, FAC), see choice dimensional, 282, 284 intersection property, 104 sequence, 20 finitely additive, 288 subadditive, 800 valued, 547 F.I.P., see finite intersection property first, see minimum first category of Baire, see meager first countable, 427, 703 first-order language, logic, theory, 354 Fischer-Cook filter condition, 413 fixed or free collection of sets, 16, also see ultrafilter fixed point, 36, 70, 92, 128, 515-519, 524, 533, 534, 668, 727, 778, 815 Foguel-Taylor Theorem, 619 Folkman-Shapley Theorem, 308 forcing, 383 forgetful functors, 228 formulas, 361 forward image, see image Foundation, Axiom of, see Regularity Fourier transform, 709 fraction, 190 Fréchet combination, 487, 689 derivative, 659
filter, see filter space, 694 topology, 437 free, see fixed or free free variable, 355, 358 frequent subnet, 163 frequent, frequently, 158 frontier, see boundary Fubini's Theorem, 613 full (order convex), 80, 712 components, 80 full subcategory, 212 function, 19, 22 of classes, 27 functional, 277 functor, 227 fundamental group, 228 fundamental operations algebraic system, 202 barycentric algebra, 306 Boolean algebra, 329 group, 182 lattice group, 225 monoid, 179 ring or field, 187 variety with ideals, 221 Fundamental Theorem of Algebra, 470 Fundamental Theorem(s) of Calculus, 671, 674 G-(semi)norm, 573, also see norm, seminorm, F-(semi)norm G_δ, F_σ, 529 Garnir's Closed Graph Theorem, 745 gauge (collection of pseudometrics), 42 equivalent topologically, 109 Hausdorff or separating, 42, 43 topology, 109 uniformity, 119 gauge (Henstock) integral, 628 gaugeable topology, 109, 441 uniformity, 119 Gaussian probability measure, 552 generalized
Continuum Hypothesis (GCH), 47 functions, 744 Perron integral, 631 Riemann integral, 628 sequences, see nets generated, generating Boolean subalgebra, 330 by operations, 83 filter or ideal, 102, 226 Moore closure, 79 preuniformity, uniformity, 121 (σ-)algebra, 116 subalgebra, 220 subgroup, 182 topology, 114 generative, 814 generic, 101, 531, also see comeager, large Gherman's conditions, 414 given topology, 754 g.l.b. (inf), see infimum, supremum Gödel Completeness Principle, 386, 390 consistency of AC and GCH, 348 constructible, 130 Incompleteness Theorems, 392, 400 number, 393 operations, 129 Göhde's Fixed Point Theorem, 778 Goldbach's Conjecture, 134, 270 Goldstine-Weston Theorem, 774 googol, 267 Gr, graph of a function, 22 of a relation, 50 grammar, 360 greatest, see maximum greatest lower bound, see infimum Gronwall's inequality, 816 Gross-Hausdorff Theorem, 469 Grothendieck et al. Theorem, 768 group, 181 Haar measure, 708 Hahn-Banach Theorem
equivalents, 318, 319, 615, 616, 618, 714, 750, 756, 802 nonconstructive (discussion), 135, 143 half-space, 750 Hall's Marriage Theorems, 153 Halpern's vector bases, 285 ham sandwich, 14 Hamel basis, 281, 286 harmonic series, 267 Hartogs number, 127 Hausdorff compact metric space theorem, 469 convergence space, 170 Maximal Chain Principle, 144 measure of noncompactness, 506 metric for closed sets, 112 topological space, 439 HB, see Hahn-Banach Theorem Heine-Borel Property, 723 Helly's Intersection Theorem, 308 Henstock integral, integrable, 628 -Kurzweil integral, 631 -Saks Lemma, 638 -Stieltjes integral, 631 hereditary, 450 Heyting algebra, 341 Heyting implication, 340 highest, see maximum Hilbert space, 599 Hilbert's program, 399 Hölder continuity, 482, 582 Hölder Inequality, 591 holomorphic, 682 homeomorphism, 418 homogeneous function, 313 homogeneous polynomial, 191 homomorphism algebraic systems, 203 barycentric algebras, 307 from ℚ into any field, 190, 247 from ℤ into any ring, 188, 247 groups and monoids, 179 ideals, kernels, quotients, 222 lattices, 91, 205
rings or fields, 187 homotopy-equivalent, 216 hull, 79 affine, balanced, convex, star, symmetric, absolutely convex, 303 closed convex, 698 hyperfinite, see bounded hyperreal hypernatural numbers, 252 hyperreal line, hyperreal numbers, 14, 250 ideal (homomorphism kernel), 222 generated, 226 maximal, 336 prime, 336 -supporting variety, 221 also, see homomorphism (ideal of sets), 100 generated by a collection, 102 of bounded sets, 104 of equicontinuous sets, 493 of finite sets, 103 of infrequent sets, 159 of meager sets, 531 of nowhere-dense sets, 531 of subsets of compact sets, 455 of totally bounded sets, 504 also, see σ-ideal, small point(s) adjoined, 13 proper or improper, 100, 224, 336 also, see lower set idempotent, 36, 82, 411, 414 identification of isomorphic objects, 209 identification topology, 425 identity (axiom) in an algebra, 204 element of a monoid, 179 function or map, 36 morphism, 216 if, 4 iff, 5 image, 37, 122 imaginary part, 255 Implicit Function Theorem, 669 implies, 4
importation law, 363 inaccessible cardinal, 46, 402 includes, 12 inclusion map, 36 inconsistent, 368 increasing or decreasing function, 57 net, 171 to a limit, 171 indefinite integral, 640, 787 independence, linear, 280 index set, 11, 230 indicator function, 35, 311, also see characteristic function indiscrete, see discrete or indiscrete indistinguishable, 435 individuals, 355, 377, 396, also see atom induction, 33, 47, 72, 99, 127 inductive locally convex topology, 741 inequality Bunyakovskiĭ, 39 Cauchy-Schwarz, 39, 586, 600 Clarkson, 262, 592 Gronwall, 816 Hölder, 591 Minkowski, 586 reverse Minkowski, 591 triangle (in a lattice group), 200 triangle (in metric space), 40 ultrametric, 42, 261 infer, inference, 363 infimum (inf) or supremum (sup), 59 ∧ (meet, inf, g.l.b.), 59 ∨ (join, sup, l.u.b.), 59 associative and commutative, 88 complete lattice has ∧(S), ∨(S), 87 coordinatewise, pointwise, 61 dense, 92 depends on the larger set, 60 inf of infs, sup of sups, 61 inf sum is pseudometric, 98 inf- or sup-closed, 80 lattice has x ∧ y, x ∨ y, 87 of structures (in category theory), 218 preserving, 62
sup completion, 96 topology, 114 using sup to define norms, 579 infinitary, 202 infinite, 13, 15, 43, 46, 149 Axiom of the, 31 dimensional, 282 distributivity, 18, 90 regress, 150 sequence, 20 series, 266 infinitely close, 251 infinitesimal, 251, 398 infrequent, 158 initial end of a path, 681 gauge, 484 ordinal, 126 property, 450 segment, see lower set structure (topology, uniformity, etc.), 217, 696 injective, injection, 37 inner product, 599, also see product: dot, scalar intangible, xvi, 105, 133, 137, 140, 142, 151, 166, 404, 538, 610, 807 integers modulo m, 188 integrable, 290, 565, 589, 631, 691 absolutely, 645 simple function, 291 integrably (locally) Lipschitz, 593 integral, 564, 613, 627 integral domain, 189 integrally closed, 242 integrand, 289 intentional ambiguity, 261 Interchange of Hypotheses, 341, 370 interior, 111, 410 Intermediate Value Theorem, 432, 433 internal direct sum, 184 internal object, 397 interpolating polynomial, 35 interpretation, 134, 143, 377 intersection, 15
interval, 56 intuitionist logic, 363, 370, 371 inverse, 181 function, 37 image, 39, 122 left or right, 283 relation, 50 Inverse Function Theorem, 668 inverse image categories, 212 involution, 36 irrationals homeomorphic to ℕ^ℕ, 540 irreflexive, 51 isolated, 41, 106 isometric, 40 isomorphism Dom(f)/Ker(f) ≅ Ran(f) for groups, 186 for linear spaces, 278 for varieties with ideals, 225 in a category, 216 informal definition, 209 of monoids, 179 of normed spaces, 575 uniqueness of ℝ, 249 X and subgroup of Perm(X), 184 X and submonoid of X^X, 181 isotone convergence, 170 also, see increasing iterates, iterated filter, 103 fixed points of, 524 function, 36 limits, 413, 477, 768 James space, 586 James's Sup Theorem, 769 join (sup, ∨), see infimum, supremum joint continuity, 423 joke, 14, 48, 145 Jordan Decomposition, 199 Kadec's Renorming Theorem, 598 Kantorovic-Riesz Theorem, 298 Kelley subnet, 162
Kelley's Choice, 147 kernel, 186, 200, 281 Kirk's Fixed Point Theorem, 778 knob space, 120 knots, 730 Kolmogorov Normability Theorem, 721 Kolmogorov quotient space, 437 Kolmogorov space, 436 Kottman's Theorem, 620 Kowalsky's iterated filter, 103, 414 Krein-Smulian Theorem, 773 Kronecker delta, 35, 194, 602 absolute value, 261 G-norm, 577 metric, 41, 120, 488, 502 Kuratowski axioms for closed sets, 112 Continuity Lemma, 539 inclusion, 411 measure of noncompactness, 506 Kurzweil integral, 628 Kurzweil-Henstock integral, 631 labeling, 65 Lagrange notation, 659 Lagrange polynomials, 35 language, 9, 55, 134, 242, 328, 350 large, 101, 231 larger, later, 51 largest or last, see maximum lattice, 87 algebra, 292 Boolean, 326 complemented, 326 complete, 87 diagrams, 89 distributive, 90, 326 group, 197 homomorphism, 91, 205 meet-join characterization, 89 relatively pseudocomplemented, 340 vector, 292 Law of the Excluded Middle, 134, 142, 363, 370, 371, 400 LCS (locally convex space), 694
leading coefficient, 191 least, see minimum least upper bound (sup), see infimum or supremum Lebesgue -Bochner space, 588 Covering Lemma, 468 Differentiation Theorem, 672 Dominated Convergence Theorem, 589 integral, 564, 613 measurable sets, 555 measure, 555 Monotone Convergence Theorem, 565 number, 468 point and set, 672 space, 589 left inverse, 181, 283 left-hand limit, left continuous, 420 Leibniz notation, 659 Leibniz's Principle, 394 L.E.M., see Law of the E.M. length of a sequence, 20 less, 51 Levi's Theorem, 566 lexicographical order, 75 LF space, 742 liar, 9 Liggett-Crandall Theorem, 832 liminf, limsup, 174 limit, limit space, 168 limit from the left or right, 419 limit ordinal, 126 limited, see bounded hyperreal Lindenbaum algebra, 368 line, line segment, 305 linear combination, subspace, 275 dependence, independence, 280 isomorphism, 278 map, functional, dual, 277, 310 order, see chain space, algebra, 272 span, 276, 303 Lipschitz conditions, 481, 816, 828
little, see small littler, 51 littlest, see minimum LM, 401 locally bounded space, 721 compact space, 457 continuous mapping, 418 convex space, 694 finite collection of sets, 444 full space, 712 generative, 814 integrable, 691 Lipschitz mapping, 482 solid space, 714 uniformly convex norm, 594 logical axioms, 362 Lovaglia's example, 596 love, 14 lower or upper bound, 59 limit, 174 lower limit topology, 451 lower set topology, 107 lower set, down-closed set, 57, 80 semicontinuous (l.s.c. or u.s.c.), 420 upper set, up-closed set, 80 lowest, see minimum Löwig's Theorem, 286 l.s.c., 420 l.u.b. (sup), see infimum, supremum Luxemburg et al. Theorem, 322, 333, 616, 618 Mackey topology, 754 MacNeille completion, 94 magnitude, see absolute value Mal'cev-Gödel Theorem, 386, 390 map or mapping, see function maps to, 23 Marriage Theorems (Hall), 153 martingale, 791 material implication, 5 matrix, 192 matrix norms, 606
max, maximum, 59 max-closed, max-closure, 81 maximal, 59 chain, 144 common AA subnet, 164 filter, 105 function for Lebesgue measure, 655 ideal, 224, 336 lemma for martingales, 792 linearly independent set, 281 orthonormal set, 603 principles equivalent to AC, 144 principles equivalent to DC, 525 also, see minimal Mazur et al. Theorem, 575, 699, 719 Mazurkiewicz-Alexandroff Theorem, 535 MC, see choice meager, 531 mean, 553 measurable cardinal, 253 mapping, 212, 546 sets, 115, 546 space, 115, 289, 546 measure, 288 measure algebra, 551 measure of noncompactness, 506 measure space, 289, 551 meet have nonempty intersection, 16, 103 ∧, see infimum, supremum member, 11 membership (∈) induction or recursion, 33 metalanguage, metatheory, 351 metavariables, 361 metric, metrizable completion, 512 defined, 40 metrically bounded, see bounded subset of Banach space, 579 topology, 108 also, see pseudometric Meyers' Contraction Theorem, 519 midpoint convex, 698 Milman-Pettis Theorem, 777
min, minimum, 59 minimal, 31, 59 spanning set, 281 also, see maximal Minkowski functional, 316 inequality, 586 reverse inequality, 591 model, 381 model theory, 353 models of set theory, 347 modulo, 183, 188 modulus absolute value, 260 of convexity, 596 of uniform continuity, 484 modus ponens, 363 monoid, 179 monomial, 191 monotone class, 117 convergence, 171 Convergence Theorem Dini, 456 for Henstock-Stieltjes integrals, 643 Lebesgue, 565 function, 57 net, 172 Montel's Theorem, 683 Moore closure or collection, see closed Moore-Smith sequences, see nets more, 51 morphism concrete category, 210 general category, 216 Mostowski's Collapsing Lemma, 347 Multiple Choice Axiom (MC), see choice multiplication or multiplicative, 24 identity (one), 180, 187 in a group, 182 in a linear space, 272 in a monoid, 180 in a ring, 187 of matrices, 192
n-ary operation, 24 name, 378 NBP, 808 Nedoma's Pathology, 549, 587 negation, 4, 357 negative part, 198 negative variation, 294 negligible set, 101, 553 neighborhood; neighborhood filter base, 426 finite, see locally finite in a pretopological space, 409 in a topological space, 110, 411 string, 730 net, 157 Neumann series, 625 Niemytzki-Tychonov Theorem, 506 Nikodym et al. Theorem, 785, 787, 793 noncompactness, 506 nonconcrete category, 216 nondecreasing, 58 nondegenerate Boolean lattice, 327 nondense, see nowhere-dense non-Euclidean geometry, 346 nonexpansive, 481 nonlogical axioms, 364 nonmeager, 531 nonprincipal ultrafilter, see ultrafilter nonstandard analysis, 394 enlargement, 231 object, 397 norm, 314, 574 equivalent norms, 575 operators, 605 the "usual norm" is complete, 576 Normability Theorem, 721 normal cone, 712 Form Theorem, 330 probability measure, 552 sublattice, 300 topological space, 445 normalized duality map, 776
normalized function of bounded variation, 583 nothing, 14 nowhere-dense, 530 nowhere-differentiable, 556, 670 null set, 14, 101, 553 nullary operation, 25 numbers, 12 object concrete category, 210 language, 351 nonconcrete category, 216 obtuse angle, 601 one, see multiplicative identity, 187 one-sided Boolean algebras, 340 derivatives, 661 limits, 420 Lipschitz conditions, 828 one-to-one, see injective one-to-one correspondence, see bijection onto, see surjective open almost, see almost open ball, 108, 688 interval, 57 mapping, 422 neighborhood base, 427 sets, 106 operator norm, 606 operator or operation, see function oracle, 137 order, ordered bounded, see bounded bounded operator, 296 by a normal cone, 712 by reverse inclusion, 157 complete, see complete convergent, 171 convex, 80, 712 dual, 299 equivalents of AC, 144 group, 194 ideal, see lower set
interval, 56 interval topology, 108 isomorphism, 58 monoid, 194 n-tuple, 20 pair, 20 preserving or reversing, see increasing ring or field, 245 topological vector space, 711 vector space, 292 ordinal, ordinal type, 124, 125 original topology, 754 Orlicz function, 693 Orlicz-Pettis Theorem, 764 orthogonal, 86, 300, 600 orthonormal set or basis, 602 Orwell, G., 3 oscillation, 492 Osgood-Baire Theorem, 532 outer measure, 560 Oxtoby's Zero-One Law, 543 p-adic absolute value, 261 pairing, 30, 751 pairwise disjoint, 16, 618 paracompact, 447 paradox, 142 Banach-Tarski's, 142 Berry's, 351 Burali-Forti's, 127 Condorcet's, 63 Epimenides's, 9 existence without examples, see intangible liar, 9 Quine's, 10 Russell's, 25 Skolem's, 389 Parallel Postulate, 346 Parallelogram Equation, 600 parameter, 11, 21 paranorm, 686 Parseval's Identity, 603, 710 partial derivative, 663 partial sum, 266 partially ordered set (poset), 52, 56
partition, 16 partition of unity, 444 Pascal's Triangle, 48 patching together, 445 path, path integral, 681 pathological, 142 Peano arithmetic, 382 permutation, 37, 184, 194 perpendicular, 86 Perron integral, 628, 631 Picard condition, 525 piecewise continuous, 511 piecewise-linear, 310 Plancherel transform, 710 Poincaré fundamental group, 228 pathological functions remark, 670 point finite, 444 pointed topological space, 214 points, 27 pointwise almost everywhere, 554 convergence, 422 inf, sup, max, min, 61 also, see product polar, 761 polynomial, 191 Fundamental Theorem of Algebra, 470 Lagrange interpolation, 35 leading coefficient, 191 ring of, 190, 247 solution of quadratic, cubic, etc., 257 Pontryagin Duality Theorem, 708 Pontryagin group, 707 poset, see partially ordered set positive charge, 288 cone, 196 definite, 40, 260 homogeneity, 313 integral, 564 logic, 362 operator, 296 part, 198 variation, 294
potential infinity, 13 power of a set, 22, 46 power series, 584 power set, 15, 30, 46 power set functor, 228 p.p. (presque partout), 553, 554 precedes, 51 precise refinement, 17 precisely subordinated, 444 precompact, 505 predecessors, 57 predicate calculus or logic, 354 predicate logic with equality, 365 predicate symbols, 356 preimage, see inverse image prenex normal form, 375 preorder, preordered set, 52 preregular space, 438 prerequisites, xx presque partout, 553, 554 pretopological, 409 preuniformity, 118 prevalent, 556 prime ideal, 336 prime number, 48, 188 primitive objects, see atom primitive proposition symbols, 356 principal lower and upper sets, 57 principal ultrafilter, see ultrafilter probability, 330, 551, 618 product and Cauchy nets, 499 and equicontinuity, 495 Eberlein-Smulian Theorem, 477 inner, 599 nonempty by AC, 139 of bounded sets in TVS, 719 of closures, 424 of compact sets, 461 of complete spaces, 503 of complex numbers, 256 of convex functions, 312 of gauges, 485 of ideals, 226 of linear spaces, 273
of matrices, 192 of measures, 566 of morphisms, 219 of numbers, 34 of orderings, 53, 88, 292 of pseudometrics, 487 of rings, 189 of scalar and vector, 272 of sets, 21 of σ-algebras, 548 of structures (in a category), 218 of subalgebras, 221 of TAG's, TVS's, LCS's, 696 of topologies, 422 of totally bounded spaces, 504 of ultrapowers, 233 of wosets, 75 uncountable ⇒ nonmetrizable, 488 also, see coordinate projection, pointwise productive, 450 projection coordinate, 22, 218, 236, 422, 426 for internal direct sum, 185 idempotent morphism closest point, 598, 602 linear, 286, 300 quotient, 54 proof, 352 proof by contradiction, 7, 134, 370, 400 proof theory, 353 proper or improper class, 26, 398 filter, 100, 336 ideal, 100, 224, 336 lower or upper set, 57 Riemann integral, 628 subset, superset, 12 propositional calculus or logic, 362 Pryce sequence, 264 pseudo-Boolean algebra, 341 pseudocompact, 465, 469 pseudocomplement, 340, 341 pseudometric, pseudometrizable Baire Category Theorem, 536 Cauchyness, 500
compactness, 469 completeness, 501 completion, 512 defined, 40 defined by inf Σ, 98 equivalent (topologically), 109 equivalent (uniformly), 486 first countable, 427 Niemytzki-Tychonov Theorem, 506 product, 487 TAG or TVS, 703 topology, 108 totally bounded, 504 translation-invariant, 574 uniformity, 119 Weil Lemma, 98 also, see metric, norm, seminorm quadratic, cubic, quartic formulas, 257 quantifiers, 357 quartic equation, solution, 258 quasicomplete, 719 quasiconstructive, 404 quasiconvex, 310 quasigauge, 110 quasi-interpretation, 377 quasimodel, 381 quasinorm, 687 quasipseudometric, 40 Quine's Paradox, 10 quining, 10 quintic, 258 quotient group, 186 map or projection, 54 norms, 579, 608 object, 223 set, 54 topology, 425 radial, see absorbing radius of convergence, 584 Rado's Selection Lemma, 152 Radon Affineness Lemma, 307
integral, 801 Intersection Theorem, 307 -Nikodym derivative, 787 -Nikodym Theorem and Property, 793 random variables, 232, 554 range, 19, 38 range condition, 832 rank, 129, 356 rare, see nowhere-dense rational functions, 191, 247 rational numbers, 190, 247 real derivative, 666 -linear functional, 280 -linear map or operator, 277 linear space, 279 numbers modulo r, 183 part, 255 random variables, 232, 554 -valued charge or measure, 288 real number system (ℝ) Cantor's construction, 513 cardinality of, 269 Dedekind's construction, 249 defined, 246 uniqueness, 249 usual metric and topology, 109 realization, 381 recursion, 33, 47, 73, 128 reduced power, 229 nonstandard analysis, 394 of algebraic system, 236 refinement integral, 632 refinement of a cover, 17 reflective subcategories, 229 reflexive Banach space, 619, 774 LCS, 757 object in a category, 240 relation (xRx for all x), 51 regress, 150 regular open, 328 regular topological space, 427, 440 Regularity, Axiom of, 31, 138, 150 relabeling, 9, 165
relation, 50, 356 relative compactness, 459 complement (of a set), see complement consistency, 401 pseudocomplement, 340 topology, 107 remetrization function, 486 renorming, 596 Reparametrization Theorem, 635 Replacement, Axiom of, 30 residual, 101, 158, 531 also, see eventual, generic, large, comeager resolvent, resolvent set, 626 respect an equivalence, 55 restriction Axiom of, see Regularity of a function, 36 of a relation, 50 also, see trace reverse inclusion, 157 Reverse Minkowski Inequality, 591 Riemann -Cauchy Equations, 666 -Darboux integral, 628 geometry, 346 integral, integrable, 627 -Lebesgue Lemma, 654, 709 -Stieltjes integral, 631 sum, 629 Riesz Decomposition Property, 196 Decomposition Theorem, 300 (F-)(semi)norm, 713 -Kantorovic Theorem, 298 Representation Theorem, 804, 805 space or subspace, 292 Theorem on Locally Compact TVS's, 726 right half-open interval topology, 451 right inverse, 181 right-hand limit, right continuous, 419 ring, 187 of sets, 117 RNP (Radon-Nikodym Property), 795 row matrix, 192
rule, 19 rule of detachment, 363 rule of generalization, 375 rules of inference, 363 Russell, Bertrand Paradox, 25 quotation about truth, 345 socks and shoes, see choice σ, countable sums or unions, 43 σ-additive, 288 σ-algebra, 115 σ-field, 115 σ-finite charge or measure, 558 σ-ideal, 101 σ-ring, 117 sandwich, 14, 319, 449 satisfy, 353 saturated, saturation, 79, 81, 82 scalar continuity, 687 scalarly measurable, 621 scalars, scalar multiplication, 272, 283 Scedrov-real number, 349 Schauder's Fixed Point Theorem, 727 schemes for axioms, 363 Schröder-Bernstein Theorem, 44 Schur's Theorem, 759 Schwarz Inequality, 39, 600 Scott et al. Epimorphism Theorem, 333 second category of Baire, see nonmeager second derivative, 661 segment initial, see lower set line, 305 self-mapping, 35 semantic implication, consequence, theorem, consistency, 204, 353 semigroup of operators, 825, 831 semi-infinitely distributive, 90 seminorm, 314, 574, also see norm, (F-)(G-)(semi)norm semireflexive, 757 semivariation, 800 sentence, 375 sentential calculus or logic, 362
separable (i.e., has countable dense set), 416 separably valued, 547 separated pairing, 752 separated spaces, 439 separately continuous, 423 separation of points, using: (F-)(G-)pseudonorms, 704 a collection of functions, 37 convergences, 170 gauge or uniformity, 42, 43, 442 sets and/or functions, 434 Separation, Axiom of, see Comprehension sequences, sequential, 20, 157 Banach limit, 320 closure, 427 cluster point, 430 compactness, 466 completeness, 501 continuity, 719 generalized, 157 martingales, 791 series, 266, 583 set, 11, 26 set theory with atoms, 28 Shapley-Folkman Theorem, 308 Shelah's alternative, 344, 402, 405, 745 shrinking, 445 shy, 556 sign function, 35 Sikorski's Extension Criterion, 331 simple function, 291 simplex, 71, 306, 727 singleton, 14 Skolem's example, 394 Skolem's Paradox, 389 Slow Contraction Theorem, 517 small, 101, 654, also see ideal smaller, 51 smallest, see minimum smooth, 661 Smulian et al. Theorem, 477, 768, 773 Sobczyk-Bohnenblust Correspondence, 280 socks and shoes, 140 solid, solid kernel, 200, 714 Soundness Principle, 385
space, see linear, measurable, topological, uniform span, 276 special Denjoy integral, 628 spectrum and spectral radius, 625 square matrix, 192 stabilizer group, 384 stage, 129 staircase, 674 standard basis for 𝔽^n, 282 deviation (of Gaussian probability), 553 in Internal Set Theory, 397 object (in nonstandard analysis), 397 part (of a hyperreal), 251 real numbers, 251 star property, 410, 414 star set, see convex step function, 292, 637 Stieltjes integrals, 631 Stone -Čech compactification, 462 mapping, 338 Paracompactness Theorem, 449 Representation Theorem, 327 space, 472 straight line, straight line segment, 305 strict contraction, 481 strict inductive limit, 742 strictly convex function, 310 norm, 594 strictly larger, stronger, etc., 5, 51, 58 string, 730 strong topology, 753 stronger or weaker, 5, 109, 211, 575 strongest locally convex topology, 742 strongly inaccessible cardinal, 46 strongly measurable, 548 subadditivity, 260, 314, 573 subalgebra, 220 subbase or subbasis for a filter, see filter subbase for a topology, 114 for a uniformity, 118
subcategory, 212 subcover, 17 subgroup, 182 sublattice, 89 sublinear, 314 submonoid, 179 subnet, 162 Aarnes and Andenæs, 162 cofinal, 163 frequent, 163 hereditary property, 165 introduction, 161 Kelley, 162 Willard, 162 subobject, 220 subordinated, 444 subsequence, 20, 161 subseries, 622 subset, 12 subspace linear, 275 topology, 107 succeeds, 51 successor function, 382 ordinal, 126 sufficiently large, 158 sum, 34, 184, 583 sup, supremum, see infimum or supremum superfilter, 102 supersequentially compact, 468 superset, 12 superstructure, 396 support, 111 surjective, surjection, 19 surprise, xvi, 13, 105, 145, 270, 317, 403, 460 syllogism law, 362 symmetry, symmetric difference, 17, 326 entourage, 120 G-seminorms are, 573 group of order n, 184 pseudometrics are, 40 relation, 51
set, see convex topological space, 437 syntactic implication, consequence, theorem, consistency, 204, 352 T_0, T_1, T_2, ... (separation axioms), 434 T_0 quotient space, 437 tables, see charts TAG (topological Abelian group), 694 tagged division of an interval, 629 tail set in [0, 1), 542 in 2^ℕ, 542 of a net, 158 Tarski et al. Theorem, 92, 142, 151, 333 tautology, 353, 377 Taylor-Foguel Theorem, 619 Teichmüller-Tukey Principle, 144 term (in a first-order language), 360 term (in an algebraic system), 203 terminal end of a path, 681 tertium non datur, 363 then, 4 theorem, 353 Tonelli's Theorem, 566 toplinearly bounded, 718 topological Abelian group (TAG), 694 closure, 111 convergence, see convergence linear space (TVS), 694 quotient map, 425 Riesz space, 714 space, 106 vector space (TVS), 694 topologically complete, 535 equivalent, 109, 211 indistinguishable, 435 stronger, 109, 211 topology, 106 gauge, gaugeable, 109, 441 generated by a collection of sets, 114 of pointwise convergence, 753 of simple convergence, 753
of uniform convergence, 491 (pseudo)metrizable, 109 uniform, uniformizable, 119 total order, see chain total paranorm, 687 total preorder, 63 total quasinorm, 687 total variation, see variation totally bounded, 504, 707, 726 totally measurable, 692 trace, 50, 103, 220 Transfer Principle, 395 transfinite, 47 transitive closure, 123 relation, 51 set, 122 translation-invariant neighborhood filter, 699 ordering, inf, sup, 194, 199 pseudometric, 574 topology, 699 uniformity, 705 transpose (of a matrix), 192 triangle inequality, see inequality tribe, 117 trichotomy not constructive for ℝ, 135, 271, 349 of cardinals, 145 satisfied by chains, 62 trivial ordering, 197 trivially true, 6 true love, 14 truth, 9, 101, 134, 139, 143, 145, 349, 353, 377, 461 truth table, 5 Tukey-Teichmüller Principle, 144 TVS (topological vector space), 694 two-valued homomorphism, 330 two-valued probability, 551, 618 Tychonov Fixed Point Theorem, 727 -Niemytzki Theorem, 506 product of compacts, 460, 461 Theorem: Finite Dimensional TVS's, 725
topological space, 442 type (for algebraic systems, etc.), 202 typographical conventions, 3 UF, see ultrafilter equivalents Ulam-Mazur Theorem, 575 ultrabarrelled TVS, 731 ultrabarrels, 730 ultrafilter, 104 and compactness, 454 and total boundedness, 505 and universal net, 166 Boolean, 336 equivalents of UF, 151, 152, 166, 237, 338, 339, 386, 387, 390, 391, 454, 462, 473, 505, 762, 763 fixed (principal), 103, 105 free (nonprincipal), 105 existence, 151 intangible, 133 ultrametric, 42 ultranet, see universal net ultrapower, see reduced power unary operation, 25 unconditionally convergent, 622 uncountable, 15, 43 underlying set, 179, 210 Uniform Boundedness Theorem, 731, 732, 764 uniformity, uniform space, 118, 441, 483 uniformly bounded, 611 continuous, 213, 483 convergent, 490 convex, 594 equicontinuous, 494 equivalent or stronger, 211 union, 15, 30 uniqueness of choices if canonical, 148 closest point projection, 594 complete (F-)norm on a vector space, 576, 748 completions, 95, 514 continuous extension from dense set, 439
direct sum decomposition, 185 Hahn-Banach extension, 619 identity element in a monoid, 179 Jordan Decomposition, 199 limit in Hausdorff space, 409 linear extension to span, 279 natural uniformity for a TAG, 705 preimage by an injective function, 37 real number system, 249 topology having a given base, 429 topology having a given closure, 112 topology having a given convergence, 412 uniformity for a compact space, 489 value given by a function, 19 unit circle, 578 unit mass at a point, 552 unital algebra, 273 universal algebra, 202 net, 165 and compactness, 454 and completeness, 499 and convergence, 170 and total boundedness, 505 subnet theorem, 166 ordering, relation, 50 used to construct canonical net, 159 quantifier, 357 set, 17, 26 universe, 17, 26 unordered set, see set up-, upper, see lower or upper urelements, see atom Urysohn's Lemma, 445 Urysohn-Alexandroff Theorem, 540 u.s.c., 421 usual absolute values on ℝ and ℂ, 260 metric and topology on ℝ, 40, 109 metric on [−∞, +∞], 41 norms are complete, 576 norms on ℝ^n and ℂ^n, 578 uniformity on a TAG, 705 V (von Neumann's universe), 129
vacuously true, 6
valid, 353, 381
valuation (in logic), 381
value, valuation, see function, component, absolute value
vanishes, 37
  at infinity, 580
variation, 294, 507, 784
variety (algebraic), 204
vector, 272
  basis, see basis
  charge, 288
  lattice, 292
  space, see linear
vel, 4
Venn diagram, 17
vicinity, see entourage
Vitali's Theorem, 557
w-dissipative, 828
wave equation, 823
we may assume, 8, 165
weak
  -star measurable, 621
  -star topology, 757
  structure, 217
  topology, 421, 753
  Ultrafilter Principle, 151
  Universal Subnet Theorem, 166
weaker, see stronger or weaker
weakly increasing, 58
weakly measurable, 621
Weil's Pseudometrization Lemma, 98
well defined, 23, 55
well-formed formulas, 361
well ordering, 72, 74, 144
Weston-Goldstine Theorem, 774
wff, 361
Wiener measure, 555
Willard subnet, 162
with probability 1, 553
witness, 376, 512
woset, 72
Wright's Closed Graph Theorem, 745
WUF, see Weak Ultrafilter Principle
Zermelo
  Fixed Point Theorem, 128
  -Fraenkel Set Theory, 29
  Well Ordering Principle, 144
zero, see additive identity, 187
zero-dimensional, 472
Zero-One Law, 543
ZF, see Zermelo-Fraenkel Set Theory
Zorn's Lemma, 144
LIST OF SYMBOLS
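The Greek alphabet: $A, \alpha$ alpha; $B, \beta$ beta; $\Gamma, \gamma$ gamma; $\Delta, \delta$ delta; $E, \epsilon$ epsilon; $Z, \zeta$ zeta; $H, \eta$ eta; $\Theta, \theta$ theta; $I, \iota$ iota; $K, \kappa$ kappa; $\Lambda, \lambda$ lambda; $M, \mu$ mu; $N, \nu$ nu; $\Xi, \xi$ xi; $O, o$ omicron; $\Pi, \pi$ pi; $P, \rho$ rho; $\Sigma, \sigma$ sigma; $T, \tau$ tau; $\Upsilon, \upsilon$ upsilon; $\Phi, \varphi$ phi; $X, \chi$ chi; $\Psi, \psi$ psi; $\Omega, \omega$ omega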
Sets of Numbers:
$\mathbb{A}, \mathbb{B}, \mathbb{D}$  directed sets, 157
$\mathbb{C}$  complex numbers, 255
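$\mathbb{F}$  a field (usually $\mathbb{R}$ or $\mathbb{C}$), 261
$\mathbb{H}$  hyperreal numbers, 250
$\mathbb{N}$  natural numbers, 12, 180
$\mathbb{Q}$  rational numbers, 189
$\mathbb{R}$  real numbers, 246
$\mathbb{T}$  circle group, 260
$\mathbb{Z}$  integers, 12, 183
${}^*\mathbb{N}$  hypernatural numbers, 252
${}^*\mathbb{R}$  hyperreal numbers, 252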
Special objects and sets:
$\{\alpha, \beta, \ldots\}$  set, 11
$[a, b]$, $[a, b)$  intervals, 56
$[-\infty, +\infty]$  extended real line, 13, 91, 109
$\mathrm{Pre}(a)$  predecessors, 57
$\sigma, w, \beta, \tau, \gamma$  dual topologies, 753, 754
$\varnothing$  empty set, 14
$\infty$  infinity, 13
$\aleph_n$  cardinals, 14, 126
$\omega$  infinite ordinal, 14, 124
$(y_j)$  sequence, 20, 157
$(y_\alpha)$  net, 157
$\mathcal{P}(X)$  power set, 15, 46
X/ 8  quotient set, 54
$B_d(\cdot)$  open ball, 108
$K_d(\cdot)$  closed ball, 108
$\sigma(\mathcal{G})$  $\sigma$-algebra generated, 116
$\mathcal{N}(x)$  neighborhood filter, 110, 409, 412
$F_\sigma$, $G_\delta$  some types of sets, 529
Unary symbols:
$\complement$  complement, 16
$\neg$  negation, 4, 357
$\mathrm{Gr}(\cdot)$  graph, 22, 50
sgn  sign function, 35
cis  cos + i sin, 256
$1_S$  characteristic function, 34
$i_S$  identity function, 36
$I_S$  indicator function, 311
$x^{-1}$  inverse, 182
$-x$  inverse (additive), 182
$f(x)$  value of function, 19
$f|_S$  restriction, 36
$f : X \to Y$  function, 19
$\mathrm{Dom}(f)$  domain, 19
$\mathrm{Ran}(f)$  range, 19
$\mathrm{Ker}(f)$  kernel, 186
$f(S)$  forward image, 37
$f^{-1}(S)$  inverse image, 39
$f^{-1}(x)$  inverse function, 37
${}^*X$, ${}^*f$  reduced power, 231
$X^*$, $f^*$  dual, 238
Con( )  consistency of, 401
$X_+$  positive cone, 196
$x^+$  positive part, 198
$x^-$  negative part, 198
$|x|$  absolute value (lattice group), 198
$|x|$  absolute value (real-valued), 260
:'J.l:'  semivariation, 800
$|S|$  cardinality, 43, 145
$|x|$  norm, 574
$\|x\|$  norm, 574
$|||x|||$  (operator) norm, 574, 605
$J(x)$  normalized duality map, 776
$\mathrm{Re}(a)$  real part, 255
$\mathrm{Im}(a)$  imaginary part, 255
$\overline{a}$  complex conjugate, 255
${}^{T}$  transpose, 192
$\perp$  orthogonal, 86, 600
lim  limit, 168, 169, 174
LIM  Banach limit, 320
cl  closure, 78, 111, 410, 412
int  interior, 111, 410, 412
co  convex hull, 303
bal  balanced hull, 303
Binary symbols:
$\mapsto$  maps to, 23
$\downarrow$  decreases (to), 172
$\uparrow$  increases (to), 171
$\to$  converges to, 169
$\to$  implies, 4
$\Rightarrow$  implies, 4, 340
$\Longleftrightarrow$  iff (if and only if), 5
$\vdash$  syntactic implication, 352
$\vDash$  semantic implication, 353
$\forall$  universal quantifier, 357
$\exists$  existential quantifier, 357
$\in$, $\notin$  element, member, 11
$\underline{\in}$  member or equal, 124
$\subseteq$  subset, 11
$\setminus$  relative complement, 16
$\times$  product, 21
$\xrightarrow{\subset}$  inclusion map, 36
$R^{-1}$  inverse relation, 50
$\approx$, $\equiv$  symmetric relations, 51
$\prec$, $\sqsubset$  irreflexive orders, 51
$\preccurlyeq$, $\sqsubseteq$  reflexive orders, 51
$\delta_{xy}$  Kronecker delta, 35
$x \,\square\, y$  binary operation, 24
$S \triangle T$  symmetric difference, 17
xOy  (used briefly in Ch. 16), 438
$f \circ g$  composition, 35, 50
$x \cdot y$  product, 87, 180, 599
$d(\ ,\ )$  distance, 40
$\mathrm{osc}(\cdot)$  oscillation, 492
$\mathrm{Var}(\cdot)$  variation, 507
$\binom{n}{k}$  binomial coefficient, 48
$\langle\ ,\ \rangle$  bilinear pairing, 23, 751
$\int f \, d\mu$  integral, 564, 613, 627
$\int f \, d\mu$  integral, 289
$df/dx$  derivative (Leibniz notation), 659
$d\lambda/d\mu$  Radon-Nikodym derivative, 788
n-ary symbols:
$\cup$  union, disjunction, 15
$\cup$  union, disjunction, 4, 357
$\cap$  intersection, conjunction, 15
$\cap$  intersection, conjunction, 4, 357
$\prod$  product, 21, 218, 274, 421
$\sum$  sum, 34, 184, 266, 583, 629
$\vee$  sup, l.u.b., join, vel, 4, 59
$\wedge$  inf, g.l.b., meet, and, 4, 59
$\otimes$  product $\sigma$-algebra, 34, 549
$\oplus$  internal direct sum, 185
u  external direct sum, 226, 276

Spaces of Functions:
$X^Y$  power of a set, 22
$2^Y$  power set, 15, 46
ba, ca  spaces of charges, 293
ba, ca  spaces of charges, 800
$B(X, Y)$  bounded, 97, 277, 579, 801
$C(X, Y)$  continuous, 690
$C(X, Y)$  continuous, 495
$BC(X, Y)$  bounded continuous, 277, 580
$BUC(X, Y)$  bdd. unif. contin., 277, 580
$\mathrm{Lip}(X, Y)$  Lipschitz, 277, 481
$BV(\cdot)$  bounded variation, 583, 784
$\mathrm{H\ddot{o}l}^{\alpha}(X, Y)$  Hölder continuous, 482, 582
$\mathrm{Hol}(\Omega)$  holomorphic, 691
$C_0(X, Y)$  contin. vanish at ends, 580
C0 (X, Y)  smooth vanish at ends, 277
$C_c$, $C_K$  contin. compact support, 743
$\mathcal{D}(\mathbb{R}^M)$  smooth, compact support, 744
$SM(\cdot)$  strongly measurable, 548, 554
$TM(\cdot)$  totally measurable, 692
$c$, $c_0$, $\ell_p$, $\mathbb{F}^{\mathbb{N}}$  sequence spaces, 580, 585, 690
$\mathcal{L}^p$, $L^p$  Lebesgue spaces, 588
$\mathcal{L}^\varphi$, $L^\varphi$  Orlicz spaces, 693
$L^1_{\mathrm{loc}}(\Omega)$  locally integrable, 691
$\mathrm{Lin}(X, Y)$  linear, 277
$BL(X, Y)$  bounded linear, 605
$\mathrm{Inv}(X, Y)$  invertible linear, 625