Mathematics
Published nearly forty years after the first edition, this long-awaited Second Edition also:
Measure and I...

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Mathematics

Published nearly forty years after the first edition, this long-awaited Second Edition also:

Measure and Integral An Introduction to Real Analysis Second Edition

Richard L. Wheeden Antoni Zygmund

Wheeden • Zygmund

This widely used and highly respected text for upper-division undergraduate and first-year graduate students of mathematics, statistics, probability, or engineering is revised for a new generation of students and instructors. The book also serves as a handy reference for professional mathematicians.

Second Edition

• Studies the Fourier transform of functions in the spaces L1, L2, and Lp, 1 0, let x k = xk /|x|, y k = yk /|y|, x = (x 1 , . . . , x n ) = x/|x|, and y = (y 1 ,n. . . , y n ) = y/|y|. Then |x | = |y | = 1, so that by the case already proved, k = 1 xk yk ≤ 1; that is, n x y ≤ |x||y|, as claimed. k=1 k k If we now define the distance between two points x and y by d(x, y) = |x − y|, we immediately obtain the characteristic metric space properties: (i) d(x, y) = d(y, x) (ii) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y (iii) d(x, y) ≤ d(x, z) + d(z, y) We have used the symbol xk to denote the kth coordinate of x. When no confusion should arise, we will also use {xk } to denote a sequence of points of Rn . If x ∈ Rn , we say that a sequence {xk } converges to x, or that x is the limit point of {xk }, if |x − xk | → 0 as k → ∞. We denote this by writing either

4

Measure and Integral: An Introduction to Real Analysis

x = limk→∞ xk or xk → x as k → ∞. A point x ∈ Rn is called a limit point of a set E if it is the limit point of a sequence of distinct points of E. A point x ∈ E is called an isolated point of E if it is not the limit of any sequence in E (excluding the trivial sequence {xk } where xk = x for all k). It follows that x is isolated if and only if there is a δ > 0 such that |x – y| > δ for every y ∈ E, y = x. For sequences {xk } in R1 , we will write limk→∞ xk = + ∞, or xk → +∞ as k → ∞, if given M > 0 there is an integer K such that xk ≥ M whenever k ≥ K. A similar definition holds for limk→∞ xk =− ∞. A sequence {xk } in Rn is called a Cauchy sequence if given ε > 0 there is an integer K such that |xk − xj | < ε for all k, j ≥ K. We leave it as an exercise to prove that Rn is a complete metric space, that is, that every Cauchy sequence in Rn converges to a point of Rn . A set E ⊂ E1 is said to be dense in E1 if for every x1 ∈ E1 and ε > 0 there is a point x ∈ E such that 0 < |x − x1 | < ε. Thus, E is dense in E1 if every point of E1 is a limit point of E. If E = E1 , we say E is dense in itself. As an example, the set of points of Rn each of whose coordinates is a rational number is dense in Rn . Since this set is also countable, it follows that Rn is separable, by which we mean that Rn has a countable dense subset. For nonempty subsets E of R1 , we use the standard notations sup E and inf E for the supremum (least upper bound) and infimum (greatest lower bound) of E. In case sup E belongs to E, it will be called max E; similarly, inf E will be called min E if it belongs to E. 1 If {ak }∞ k = 1 is a sequence of points in R , let bj = supk≥j ak and cj = inf k≥j ak , j = 1, 2, . . . . Then −∞ ≤ cj ≤ bj ≤ +∞, and {bj } and {cj } are monotone decreasing and increasing, respectively; that is, bj ≥ bj+1 and cj ≤ cj+1 . Define lim supk→∞ ak and lim inf k→∞ ak by sup ak , j→∞ j → ∞ k≥j k→∞ lim inf ak = lim cj = lim inf ak .

lim sup ak = lim bj = lim k→∞

j→∞

j→∞

(1.3)

k≥j

We leave it as an exercise to show that −∞ ≤ lim inf k→∞ ak ≤ lim supk→∞ ak ≤ +∞ and that the following characterizations hold.

Theorem 1.4 (a) L = lim supk→∞ ak if and only if (i) there is that converges to L and (ii) if L > L, there ak < L for k ≥ K. (b) l = lim inf k→∞ ak if and only if (i) there is that converges to l and (ii) if l < l, there ak > l for k ≥ K.

a subsequence {akj } of {ak } is an integer K such that a subsequence {akj } of {ak } is an integer K such that

5

Preliminaries

Thus, when they are finite, lim supk→∞ ak and lim inf k→∞ ak are the largest and smallest limit points of {ak }, respectively. We leave it as a problem to show that {ak } converges to a, −∞ ≤ a ≤ + ∞, if and only if lim supk→∞ ak = lim inf k→∞ ak = a. We can also use the metric on Rn to define the diameter of a set E by letting δ(E) = diam E = sup x − y : x, y ∈ E . If the diameter of E is finite, E is said to be bounded. Equivalently, E is bounded if there is a finite constant M such that |x| ≤ M for all x ∈ E. If E1 and E2 are two sets, the distance between E1 and E2 is defined by d(E1 , E2 ) = inf{|x − y| : x ∈ E1 , y ∈ E2 }.

1.3 Open and Closed Sets in Rn , and Special Sets For x ∈ Rn and δ > 0, the set B(x; δ) = {y : |x − y| < δ} is called the open ball with center x and radius δ. A point x of a set E is called an interior point of E if there exists δ > 0 such that B(x; δ) ⊂ E. The collection of all interior points of E is called the interior of E and denoted E◦ . A set E is said to be open if E = E◦ ; that is, E is open if for each x ∈ E there exists δ > 0 such that B(x; δ) ⊂ E. The empty set ∅ is open by convention. The whole space Rn is clearly open, and we leave it as an exercise to prove that B(x; δ) is open. We will generally denote open sets by the letter G. A set E is called closed if CE is open. Note that ∅ and Rn are closed. Closed sets will generally be denoted by the letter F. The union of a set E and all its ¯ By the boundary of E, we limit points is called the closure of E and written E. ◦ ¯ mean the set E − E . We leave it to the reader to prove the following facts.

Theorem 1.5 (i) B (x; δ) = y : x − y ≤ δ . ¯ that is, E is closed if and only if it contains all (ii) E is closed if and only if E = E; its limit points. (iii) E¯ is closed, and E¯ is the smallest closed set containing E; that is, if F is closed and E ⊂ F, then E¯ ⊂ F. The open subsets of Rn satisfy the properties listed in the next theorem.

6

Measure and Integral: An Introduction to Real Analysis

Theorem 1.6 (i) The union of any number of open sets is open. (ii) The intersection of a finite number of open sets is open. Verification is left to the reader. Using the De Morgan laws, we obtain the following equivalent statements.

Theorem 1.7 (i) The intersection of any number of closed sets is closed. (ii) The union of a finite number of closed sets is closed. A subset E1 of E is said to be relatively open with respect to E if it can be written E1 = E ∩ G for some open set G. Similarly, E1 is relatively closed with respect to E if E1 = E ∩ F for some closed F. Note that the relative complement of a relatively open set is relatively closed. A useful alternate characterization of relatively closed is as follows. Theorem 1.8 A set E1 ⊂ E is relatively closed with respect to E if and only if E1 = E ∩ E¯ 1 , that is, if and only if every limit point of E1 that lies in E is in E1 . The proof is left as an exercise. Consider a collection {A} of sets A. Then a set is said to be of type Aδ if it can be written as a countable intersection of sets A and to be of type Aσ if it can be written as a countable union of sets A. Thus, “δ” stands for intersection and “σ” for union. The most common uses of this notation are Gδ and Fσ , where {G} denotes the open sets in Rn and {F} the closed sets. Hence, H is of type Gδ if H=

Gk , Gk open,

k

and H is of type Fσ if H=

Fk , Fk closed.

k

The complement of a Gδ set is an Fσ set, and vice versa. A Gδ (Fσ ) set is of course not generally open (closed); in fact, any closed (open) set in Rn is of

7

Preliminaries

type Gδ (Fσ ): see Exercise l(j). These two special types of sets will be very useful later in the measure approximation of general sets. Another special type of set that we will have occasion to use is a perfect set, by that we mean a closed set C each of whose points is a limit point of C. Thus, a perfect set is a closed set that is dense in itself. One particular property of perfect sets we will use is stated in the following theorem. The proof is postponed until Section 1.4.

Theorem 1.9

A perfect set is uncountable.

Other special sets that will be important are n-dimensional intervals. When n = 1 and a < b, we will use the usual notations [a, b] = {x : a ≤ x ≤ b}, (a, b) = {x : a < x < b}, [a, b) = {x : a ≤ x < b}, and (a, b] = {x : a < x ≤ b} for closed, open, and partly open intervals. Whenever we use just the word interval, we generally mean closed interval. An n-dimensional interval I is a subset of Rn of the form I = {x = (x1 , . . . , xn ) : ak ≤ xk ≤ bk , k = 1, . . . , n}, where ak < bk , k = 1, . . . , n. An interval is thus closed, and we say it has edges parallel to the coordinate axes. If the edge lengths bk − ak are all equal, I will be called an n-dimensional cube with edges parallel to the coordinate axes. Cubes will usually be denoted by the letter Q. Two intervals I1 and I2 are said to be nonoverlapping if their interiors are disjoint, that is, if the most they have in common is some part of their boundaries. A set equal to an interval minus some part of its boundary will be called a partly open interval. By definition, the volume v(I) of the interval I = {(x1 , . . . , xn ) : ak ≤ xk ≤ bk , k = 1, . . . , n} is v(I) =

n

(bk − ak ).

k=1

Somewhat more generally, if {ek }nk = 1 is any given set of n vectors emanating from a point in Rn , we will consider the closed parallelepiped P = {x : x =

n

tk ek , 0 ≤ tk ≤ 1}.

k=1

Note that the edges of P are parallel translates of the ek . Thus, P is an interval if the ek are parallel to the coordinate axes. The volume v(P) of P is by definition the absolute value of the n × n determinant having e1 , . . . , en as rows.∗ In case P is an interval, this definition agrees with the one given earlier. A linear transformation T of Rn transforms a parallelpiped P into a parallelpiped ∗ See, for example, G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra, 3rd ed., Macmillan,

New York, 1965, Theorem 8, p. 290.

8

Measure and Integral: An Introduction to Real Analysis

P with volume v(P ) = |det T|v(P).∗ In particular, a rotation of axes in Rn (which is an orthogonal linear transformation) does not change the volume of a parallelepiped. We will assume basic facts about volume: for example, if N N is finite and P is a parallelepiped with P ⊂ N 1 Ik , then v (P) ≤ 1 v(Ik ), and if {Ik }N are nonoverlapping intervals contained in a parallelepiped P, then N 1 v(I ) ≤ v(P). k 1 We shall use the notion of interval to obtain a basic decomposition of open sets in Rn . We consider first the case n = 1, which is somewhat simpler than n > 1.

Theorem 1.10 open intervals.

Every open set in R1 can be written as a countable union of disjoint

Proof. Let G be an open set in R1 . For x ∈ G, let Ix denote the maximal open interval containing x which is in G; that is, Ix is the union of all open intervals that contain x and that lie in G. If x, x ∈ G and x = x , then Ix and Ix must either be disjoint or identical, since if they intersect, their union is an open interval containing x and x . Clearly, G = x∈G Ix . Since each Ix contains a rational number, the number of distinct Ix must be countable, and the theorem follows. The construction used in this proof fails in Rn if n > 1, since the union of (overlapping) intervals is not generally an interval. The theorem itself fails when n > 1, as is easily seen by considering any open ball. As a substitute, we have the following useful result. Theorem 1.11 Every open set in Rn , n ≥ 1, can be written as a countable union of nonoverlapping (closed) cubes. It can also be written as a countable union of disjoint partly open cubes. Proof. Consider the lattice of points of Rn with integral coordinates and the corresponding net K0 of cubes with edge length 1 and vertices at these lattice points. Bisecting each edge of a cube in K0 , we obtain from it 2n subcubes of edge length 12 . The total collection of these subcubes for every cube in K0 forms a net K1 of cubes. If we continue bisecting, we obtain finer and finer nets Kj of cubes such that each cube in Kj has edge length 2−j and is the union of 2n nonoverlapping cubes in Kj+1 . Now let G be any open set in Rn . Let S0 be the collection of all cubes in K0 that lie entirely in G. Let S1 be those cubes in K1 that lie in G but that are not ∗ See, for example, G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra, 3rd ed., Macmillan,

New York, 1965, Theorem 9, p. 290.

Preliminaries

9

subcubes of any cube in S0 . More generally, for j ≥ 1, let Sj be the cubes in Kj that lie in G but that are not subcubes of any cube in S0 , . . . , Sj−1 . If S denotes the total collection of cubes from all the Sj , then S is countable since each Kj is countable, and the cubes in S are nonoverlapping by construction. Moreover, ∞, each since G is open and the cubes in Kj become arbitrarily small as j → point of G will eventually be caught in a cube in some Sj . Hence, G = Q∈S Q, which proves the first statement. The proof of the second statement is left to the reader. The collection {Q: Q ∈ Kj , j = 1, 2, . . .} constructed above is called a family of dyadic cubes. In general, by dyadic cubes, we mean the family of cubes obtained from repeated bisection of any initial net of cubes in Rn . Note that the family of dyadic cubes used in the proof of Theorem 1.11 could be replaced by one in which the initial net consists of cubes of any fixed edge length. It follows from Theorem 1.10 that any closed set in R1 can be constructed by deleting a countable number of open disjoint intervals from R1 . A perfect set results by removing the intervals in such a way as to create no isolated points; thus, we would not remove any two open intervals with a common endpoint.

1.4 Compact Sets and the Heine–Borel Theorem

By a cover of a set E, we mean a family F of sets A such that E ⊂ A∈F A. A subcover F1 of a cover F is a cover with the property that A1 ∈ F whenever A1 ∈ F1 . A cover F is called an open cover if each set in F is open. We say E is compact if every open cover of E has a finite subcover. Two equivalent statements, whose proofs are left as exercises, are as follows.

Theorem 1.12 (i) (The Heine–Borel theorem) A set E ⊂ Rn is compact if and only if it is closed and bounded. (ii) A set E ⊂ Rn is compact if and only if every sequence of points of E has a subsequence that converges to a point of E. We leave it as an exercise to show that the distance between two nonempty, compact, disjoint sets is positive and that the intersection of a countable sequence of decreasing, nonempty, compact sets is nonempty. Thus, a nested sequence of closed intervals has a nonempty intersection. See also Exercise 12. With these facts, we can now prove Theorem 1.9.

10

Measure and Integral: An Introduction to Real Analysis

Proof of Theorem 1.9. Let C be a perfect set in Rn , and suppose that C is countable: C = {ck }∞ k = 1 . Let Ck = C − {ck }, k ≥ 1. Given x1 ∈ C1 , let Q1 be a (closed) / Q1 . Then Q1 ∩ C is compact (closed and cube with center x1 such that c1 ∈ bounded) and not empty. Since x1 ∈ C and C is perfect, x1 is a limit point of C and so also of C2 . It follows that C2 ∩ Q◦1 is not empty. Let x2 ∈ C2 ∩ Q◦1 and choose a cube Q2 with center x2 such that Q2 ⊂ Q1 and c2 ∈ / Q2 . Then Q2 ∩ C is a compact, nonempty subset of Q1 ∩ C. Continuing in this way, we obtain a decreasing sequence Qk ∩ C of compact, nonempty sets such that / Qk . It follows that k (Qk ∩ C) is a nonempty subset of C that contains no ck ∈ ck . This contradiction proves that C must be uncountable and establishes the theorem.

1.5 Functions By a function f = f (x) defined for x in a set E ⊂ Rn , we will always mean a real-valued function, unless explicitly stated otherwise. By real-valued, we generally mean extended real-valued, that is, f may take the values ±∞; if | f (x)| < + ∞ for all x ∈ E, we say f is finite (or finite-valued) on E. A finite function f is said to be bounded on E if there is a finite number M such that | f (x)| ≤ M for x ∈ E; that is, f is bounded on E if supx∈E |f (x)| is finite. A sequence {fk } of functions is said to be uniformly bounded on E if there is a finite M such that |fk (x)| ≤ M for x ∈ E and all k. By the support of f , we mean the closure of the set where f is not zero. Thus, the support of a function is always closed. It follows that a function defined in Rn has compact support if and only if it vanishes outside some bounded set. A function f defined on an interval I in R1 is called monotone increasing (decreasing) if f (x) ≤ f (y) [f (x) ≥ f (y)] whenever x < y and x, y ∈ I. By strictly monotone increasing (decreasing), we mean that f (x) < f (y) [f (x) > f (y)] if x < y and x, y ∈ I. Let f be defined on E ⊂ Rn and let x0 be a limit point of E. Let B (x0 ; δ) = B(x0 ; δ) − {x0 } denote the punctured ball with center x0 and radius δ, and let M(x0 ; δ) =

sup

x∈B (x0 ;δ)∩E

f (x),

m(x0 ; δ) =

inf

x∈B (x0 ;δ)∩E

f (x).

As δ 0, M(x0 ; δ) decreases and m(x0 ; δ) increases, and we define lim sup f (x) = lim M(x0 ; δ),

x→x0 ;x∈E

δ→0

x→x0 ;x∈E

δ→0

lim inf f (x) = lim m(x0 ; δ).

(1.13)

11

Preliminaries

We leave it as an exercise to show that the following characterizations are valid.

Theorem 1.14 (a) M = lim supx→x0 ;x∈E f (x) if and only if (i) there exists {xk } in E−{x0 } such that xk → x0 and f (xk ) → M and (ii) if M > M, there exists δ > 0 such that f (x) < M for x ∈ B (x0 ; δ) ∩ E. (b) m = lim inf x→x0 ;x∈E f (x) if and only if (i) there exists {xk } in E−{x0 } such that xk → x0 and f (xk ) → m and (ii) if m < m, there exists δ > 0 such that f (x) > m for x ∈ B (x0 ; δ) ∩ E. We also define lim sup|x|→∞;x∈E f (x) and lim inf |x|→∞;x∈E f (x). For example, M = lim sup|x|→∞;x∈E f (x) means (i) there exist {xk } in E such that |xk | → ∞ and f (xk ) → M and (ii) if M > M, there exists N such that f (x) < M if |x| > N and x ∈ E. These notions should not be confused with lim supk→∞ fk (x) and lim inf k→∞ fk (x), which denote the lim sup and lim inf of the sequence {fk (x)}.

1.6 Continuous Functions and Transformations A function f defined in a neighborhood of x0 is said to be continuous at x0 if f (x0 ) is finite and limx→x0 f (x) = f (x0 ). If f is not continuous at x0 , it follows that unless f (x0 ) is infinite, either limx→x0 f (x) does not exist or is different from f (x0 ). For functions on R1 , we will use the notation f (x0 +) =

lim

x→x0 ;x>x0

f (x)

and

f (x0 −) =

lim

x→x0 ;x<x0

f (x)

for the right- and left-hand limits of f at x0 , when they exist. If f (x0 +), f (x0 −), and f (x0 ) exist and are finite, but f is not continuous at x0 , then either f (x0 +) = f (x0 −) or f (x0 +) = f (x0 −) = f (x0 ). In the first case, x0 is called a jump discontinuity of f and in the second, a removable discontinuity of f (since by changing the value of f at x0 , we can make it continuous there). Such discontinuities are said to be of the first kind, as distinguished from those of the second kind, for which either f (x0 +) or f (x0 −) does not exist or for which f (x0 +), f (x0 −) or f (x0 ) is infinite. If f is defined only in a set E containing x0 , E ⊂ Rn , then f is said to be continuous at x0 relative to E if f (x0 ) is finite and either x0 is an isolated point of E or x0 is a limit point of E and limx→x0 ;x∈E f (x) = f (x0 ). If E1 ⊂ E, a function

12

Measure and Integral: An Introduction to Real Analysis

is said to be continuous in E1 relative to E if it is continuous relative to E at every point of E1 . The proofs of the following basic facts are left as exercises. Theorem 1.15 Let E be a compact set in Rn and f be continuous in E relative to E. Then the following are true: (i) f is bounded on E; that is, supx∈E |f (x)| < ∞. (ii) f attains its supremum and infimum on E; that is, there exist x1 , x2 ∈ E such that f (x1 ) = supx∈E f (x), f (x2 ) = inf x∈E f (x). (iii) f is uniformly continuous on E relative to E; that is, given ε > 0, there exists δ > 0 such that |f (x) − f (y)| < ε if |x − y| < δ and x, y ∈ E. A sequence of functions {fk } defined on E is said to converge uniformly on E to a finite f if given ε > 0, there exists K such that |fk (x) − f (x)| < ε for k ≥ K and x ∈ E. We will use the following fact, whose proof is again left to the reader. Theorem 1.16 Let {fk } be a sequence of functions defined on E that are continuous in E relative to E and that converge uniformly on E to a finite f. Then f is continuous in E relative to E. A transformation T of a set E ⊂ Rn into Rn is a mapping y = Tx that carries points x ∈ E into points y ∈ Rn . If y = (y1 , . . . , yn ), then T can be identified with the collection of coordinate functions yk = fk (x), k = 1, . . . , n, which are induced by T. The image of E under T is the set {y: y = Tx for some x ∈ E}. T is continuous at x0 ∈ E relative to E (by which we mean limx→x0 ;x∈E Tx = Tx0 ) if and only if each fk is continuous at x0 relative to E. We will use the following result in Chapter 3. Theorem 1.17 Let y = Tx be a transformation of Rn that is continuous in E relative to E. If E is compact, then so is its image TE.

1.7 The Riemann Integral We shall see that the Lebesgue integral is more general than the Riemann integral, in the sense that whenever the Riemann integral of a function exists, then so does its Lebesgue integral, and the two are equal (Theorem 5.52).

13

Preliminaries

The Riemann integral is nonetheless useful, its significance being simplicity and computability. If f is defined and bounded on an interval I = {x : x = (x1 , . . . , xn ), ak ≤ xk ≤ bk , k = 1, . . . , n} in Rn , its Riemann integral will be denoted by b1 (R)

bn ...

a1

f (x1 , . . . , xn ) dx1 · · · dxn or (R)

an

f (x) dx

(1.18)

I

and is defined as follows. Partition I into a finite collection of nonoverlapping intervals, = {Ik }N k = 1 , and define the norm || of by || = maxk (diam Ik ). Select a point ξk in Ik for k ≥ 1, and let R = R (ξ1 , . . . , ξN ) =

N

f (ξk )v(Ik ),

k=1

N [sup f (x)]v(Ik ), U =

L =

k=1 x∈Ik

N

(1.19) [ inf f (x)]v(Ik ).

k=1

x∈Ik

We then define the Riemann integral by saying that A = (R) I f (x) dx if lim||→0 R exists and equals A; that is, if given ε > 0, there exists δ > 0 such that |A − R | < ε for any and any chosen {ξk }, provided only that || < δ. This definition is actually equivalent to the statement that inf U = sup L = A.

(1.20)

The integral of course exists if f is continuous on I. Proofs of these facts are left as exercises; the treatment given in Chapter 2 for the Riemann–Stieltjes integrals should serve as a review for many facts about Riemann integrals. See also Theorem 5.54.

Exercises 1. Prove the following facts, which were left earlier as exercises. (a) For a sequence of sets {Ek }, lim sup Ek consists of those points that belong to infinitely many Ek , and lim inf Ek consists of those points that belong to all Ek from some k on. (b) The De Morgan laws. (c) Every Cauchy sequence in Rn converges to a point of Rn . (This can be deduced from its analogue in R1 by noting that the entries in a

14

Measure and Integral: An Introduction to Real Analysis

given coordinate position of the points in a Cauchy sequence in Rn form a Cauchy sequence in R1 .) (d) Theorem 1.4. (e) A sequence {ak } in R1 converges to a, −∞ ≤ a ≤ +∞, if and only if lim supk→∞ ak = lim inf k→∞ ak = a. (f) B(x; δ) is open. (g) Theorem 1.5. (h) Theorems 1.6 and 1.7. (i) Theorem 1.8. (j) Any closed (open) set in Rn is of type Gδ (Fσ ). (If F is closed, consider the sets {x : dist(x, F) < (1/k)}, k = 1, 2, . . . .) (k) Theorem 1.12. (l) The distance between two nonempty, compact, disjoint sets in Rn is positive. See also Exercise 12. (m) The intersection of a countable sequence of decreasing, nonempty, compact sets is nonempty. (n) Theorem 1.14. (o) Theorem 1.15. (p) Theorem 1.16. (q) Theorem 1.17. (r) The Riemann integral A = (R) I f (x) dx of a bounded f over an interval I exists if and only if inf U = sup L = A. (s) If f is continuous on an interval I, then (R) I f (x) dx exists. 2. Find lim sup Ek and lim inf Ek if Ek = [−(1/k), 1] for k odd and Ek = [−1, (1/k)] for k even. 3. (a) Show that C(lim sup Ek ) = lim inf CEk . (b) Show that if Ek E or Ek E, then lim sup Ek = lim inf Ek = E. 4. (a) Show that lim supk→∞ (−ak ) = − lim inf k→∞ ak . (b) Show that lim supk→∞ (ak + bk ) ≤ lim supk→∞ ak + lim supk→∞ bk , provided that the expression on the right does not have the form ∞ + (−∞) or −∞ + ∞. (c) If {ak } and {bk } are nonnegative, bounded sequences, show that lim supk→∞ (ak bk ) ≤ (lim supk→∞ ak )(lim supk→∞ bk ). (d) Give examples for which the inequalities in parts (b) and (c) are not equalities. Show that if either {ak } or {bk } converges, equality holds in (b) and (c). 5. Find analogues of the statements in Exercise 4 for lim supx→x0 ;x∈E f (x). 6. Compare lim supk→∞ ak and lim sup(−∞, ak ).

15

Preliminaries

7. Show that E◦1 ∩ E◦2 = (E1 ∩ E2 )◦ and E◦1 ∪ E◦2 ⊂ (E1 ∪ E2 )◦ . Give an example when E◦1 ∪ E◦2 = (E1 ∪ E2 )◦ . 8. Let E be a set in Rn that is relatively open with respect to an interval I. Show that E can be written as a countable union of nonoverlapping intervals. 9. Prove that any closed subset of a compact set is compact. 10. Let {xk } be a bounded infinite sequence in Rn . Show that {xk } has a limit point. (This is the Bolzano–Weierstrass theorem in Rn .) 11. Give an example of a decreasing sequence of nonempty closed sets in Rn whose intersection is empty. 12. (a) Give an example of two disjoint, nonempty, closed sets E1 and E2 in Rn for which d(E1 , E2 ) = 0. (b) Let E1 , E2 be nonempty sets in Rn with E1 closed and E2 compact. Show that there are points x1 ∈ E1 and x2 ∈ E2 such that d(E1 , E2 ) = |x1 − x2 |. Deduce that d(E1 , E2 ) is positive if such E1 , E2 are disjoint. 13. If f is defined and uniformly continuous on E, show there is a function f¯ defined and continuous on E¯ such that f¯ = f on E. 14. If f is defined and uniformly continuous on a bounded set E, show that f is bounded on E. 15. Show that a bounded f is Riemann integrable on I if and only if given ε > 0, there is a partition of I such that 0 ≤ U − L < ε. (Exercise 1(r) may be helpful.) 16. If {fk } is a sequence of bounded, Riemann integrable functions on an interval I that converges uniformly on I to f , show that f is Riemann integrable on I and that (R) fk (x) dx → (R) f (x) dx. I

I

17. Let f be a finite function on Rn and define ω(δ) = sup {f (x) − f (y) : x − y ≤ δ}, δ > 0, to be the modulus of continuity of f . Show that ω(δ) decreases as δ decreases to 0 and that f is uniformly continuous if and only if ω(δ) → 0 as δ → 0. 18. Let F be a closed subset of (−∞, +∞), and let f be continuous relative to F. Show that there is a continuous function g on (−∞, +∞) which equals f in F. If | f (x)| ≤ M for x ∈ F, show that g can be chosen so that |g(x)| ≤ M for −∞ < x < +∞. (This is the Tietze extension theorem for the real line.)

16

Measure and Integral: An Introduction to Real Analysis

19. Prove the following special case of the Baire category theorem: the intersection of a countable number of open dense sets in R1 is dense in R1 . 20. Show that the irrational numbers form a set of type Gδ , but that the rational numbers do not. (For the second part, it is possible to argue by contradiction, using the first part and the result in Exercise 19.) 21. Construct a set in R1 that is neither of type Gδ nor of type Fσ . (Consider the union of the negative rationals and the positive irrationals, and use facts from Exercise 20.) 22. For an integer k = 1, . . . , n and a real number α, consider the hyperplane H = {x = (x1 , . . . , xn ) : xk = α}. Show that for every ε > 0, there is a collecof cubes in Rn with edges parallel to the coordinate axes such tion {Qj }∞ j= 1 that H ⊂ Qj and v(Qj ) < ε. (Using the terminology of Chapter 3, it follows that H has outermeasure 0 in Rn .)

2 Functions of Bounded Variation and the Riemann–Stieltjes Integral In the chapters ahead, we will study the Lebesgue integral. In this chapter, we introduce the Riemann–Stieltjes integral and, as a natural preliminary step, study functions of bounded variation. The justification for doing so is that Lebesgue integration is intimately connected with Riemann–Stieltjes integration, although this is not apparent from the definitions. We shall see in Theorem 5.43 that Lebesgue integrals can be represented as Riemann–Stieltjes integrals.

2.1 Functions of Bounded Variation Let f (x) be a real-valued function that is defined and finite for all x in a closed bounded interval a ≤ x ≤ b. Let = {x0 , x1 , . . . , xm } be a partition of [a, b]; that is, is a collection of points xi , i = 0, 1, . . . , m, satisfying x0 = a, xm = b, and xi−1 < xi for i = 1, . . . , m. With each partition , we associate the sum S = S [ f ; a, b] =

m

| f (xi ) − f (xi−1 )|.

i=1

The variation of f over [a, b] is defined as V = V [ f ; a, b] = sup S ,

where the supremum is taken over all partitions of [a, b]. The variation V[ f ; a, b] will sometimes also be denoted by V[a, b] or V(f ). Since 0 ≤ S < + ∞, we have 0 ≤ V ≤ +∞. If V < +∞, f is said to be of bounded variation on [a, b]; if V = +∞, f is of unbounded variation on [a, b].

17

18

Measure and Integral: An Introduction to Real Analysis

We list several simple examples. Example 1 Suppose f is monotone in [a, b]. Then, clearly, each S equals | f (b) − f (a)|, and therefore V = | f (b) − f (a)|. Example 2 Suppose the graph of f can be split into a finite number k of monotone arcs; that is, suppose [a, b] = i=1 [ai , ai+1 ] and f is monok tone in each [ai , ai+1 ]. Then V = i=1 | f (ai+1 ) − f (ai )|. To see this, we use the result of Example 1 and the fact, to be proved in Theorem 2.2, that V = V[a, b] = ki=1 V[ai , ai+1 ]. Example 3 Let f be defined by f (x) = 0 when x = 0 and f (0) = 1, and let [a, b] be any interval containing 0 in its interior. Then S is either 2 or 0, depending on whether or not x = 0 is a partitioning point of . Thus, V[a, b] = 2. If = {x0 , x1 , . . . , xm } is a partition of [a, b], let ||, called the norm of , be defined as the length of a longest subinterval of : || = max xi − xi−1 . i

If f is continuous on [a, b] and {j } is a sequence of partitions of [a, b] with |j | → 0, we shall see in Theorem 2.9 that V = limj→∞ Sj . Example 3 shows that this equality may fail for functions that are discontinuous even at a single point: if we take f and [a, b] as in Example 3 and choose the j such that x = 0 is never a partitioning point, then lim Sj = 0, while if we choose the j such that x = 0 alternately is and is not a partitioning point, then lim Sj does not exist. See also Exercise 20. Example 4 Let f be the Dirichlet function, defined by f (x) = 1 for rational x and f (x) = 0 for irrational x. Then, clearly, V[a, b] = +∞ for any interval [a, b]. Example 5 A function that is continuous on an interval is not necessarily of bounded variation on the interval. To see this, let {aj } and {dj }, j = 1, 2, . . . , be two monotonedecreasing sequences in (0, 1] with a1 = 1, limj→∞ aj = limj→∞ dj = 0 and dj = +∞. Construct a continuous f as follows. On each subinterval [aj+1 , aj ], the graph of f consists of the sides of the isosceles tri angle with base aj+1 , aj and height dj . Thus, f (aj ) = 0, and if mj denotes the midpoint of aj+1 , aj , then f (mj ) = dj . If we further define f (0) = 0, then f is continuous on [0,1]. Taking k to be the partition defined by the points 0, k+1 k aj j=1 , and mj j=1 , we see that Sk = 2 kj=1 dj . Hence, V[ f ; 0, 1] = +∞. See also Exercise 1.

Functions of Bounded Variation and the Riemann–Stieltjes Integral

19

We mention here that there exist functions that are continuous on an interval but that are not of bounded variation on any subinterval. See Exercise 26 of Chapter 3. Example 6 A function f defined on [a, b] is said to satisfy a Lipschitz condition on [a, b], or to be a Lipschitz function on [a, b], if there is a constant C such that | f (x) − f (y)| ≤ C|x − y| for all x, y ∈ [a, b]. Such a function is clearly of bounded variation, with V[ f ; a, b] ≤ C(b − a). For example, if f has a continuous derivative on [a, b], or even just a bounded derivative, then (by the mean-value theorem) f satisfies a Lipschitz condition on [a, b]. For more examples of functions of bounded variation, see the exercises at the end of the chapter. In the next two theorems, we summarize some of the simplest properties of functions of bounded variation. The proof of the first theorem is left as an exercise. Theorem 2.1 (i) If f is of bounded variation on [a, b], then f is bounded on [a, b]. (ii) Let f and g be of bounded variation on [a, b]. Then cf (for any real constant c), f + g, and fg are of bounded variation on [a, b]. Moreover, f/g is of bounded variation on [a, b] if there exists an ε > 0 such that |g(x)| ≥ ε for x ε [a, b]. Before stating the second result, we note that if ¯ is a refinement of , that is, ¯ if contains all the partitioning points of plus some additional points, then S ≤ S¯ . This follows from the triangle inequality and is most easily seen in the case when ¯ consists of all the points of plus one additional point. The case of general ¯ can be reduced to this simple case by adding one point at a time to . Theorem 2.2 (i) If [a , b ] is a subinterval of [a, b], then V[a , b ] ≤ V[a, b]; that is, variation increases with interval. (ii) If a < c < b, then V[a, b] = V[a, c] + V[c, b]; that is, variation is additive on adjacent intervals.

20

Measure and Integral: An Introduction to Real Analysis

Proof. (i) This follows easily from (ii), as the reader can check. A simple direct proof based on adjoining the points a, b to generic partitions of [a , b ] can also be given. (ii) Let I = [a, b], I1 = [a, c], I2 = [c, b], V = V[a, b], V1 = V[a, c], and V2 = V[c, b]. If 1 and 2 are any partitions of I1 and I2 , respectively, then = 1 ∪2 is one of I, and S [I] = S1 [I1 ]+S2 [I2 ]. Thus, S1 [I1 ]+S2 [I2 ] ≤ V. Therefore, taking the supremum over 1 and 2 separately, we obtain V1 + V2 ≤ V. To show the opposite inequality, let be any partition of I, and let ¯ be with c adjoined. Then S [I] ≤ S¯ [I], and ¯ splits into partitions 1 of I1 and 2 of I2 . Thus, we have S [I] ≤ S¯ [I] = S1 [I1 ] + S2 [I2 ] ≤ V1 + V2 . Therefore, V ≤ V1 + V2 , which completes the proof of (ii). For any real number x, define

x x+ = 0

if x > 0 if x ≤ 0,

x− =

0 −x

if x > 0 if x ≤ 0.

These are called the positive and negative parts of x, respectively, and satisfy the relations x+ , x− ≥ 0;

|x| = x+ + x− ;

x = x+ − x− .

(2.3)

Given a finite function f on [a, b] and a partition = {xi }m i=0 of [a, b], define

P = P [ f ; a, b] =

m [ f (xi ) − f (xi−1 )]+ , i=1

m N = N [ f ; a, b] = [ f (xi ) − f (xi−1 )]− . i=1

Thus, P is the sum of the positive terms of S , and −N is the sum of the negative terms of S . In particular, by (2.3), P ≥ 0, N ≥ 0, P + N = S ,

(2.4)

P − N = f (b) − f (a).

(2.5)

Functions of Bounded Variation and the Riemann–Stieltjes Integral

21

The positive variation P and the negative variation N of f are defined by P = P[ f ; a, b] = sup P ,

N = N[ f ; a, b] = sup N .

Thus, 0 ≤ P, N ≤ +∞.

If any one of P, N, or V is finite, then all three are finite. Moreover,

Theorem 2.6 we then have

P + N = V,

P − N = f (b) − f (a),

or equivalently P=

1 [V + f (b) − f (a)] , 2

N=

1 [V − f (b) + f (a)] . 2

Proof. By (2.4), P +N ≤ V, and therefore, since P and N are nonnegative, P ≤ V and N ≤ V. In particular, P and N are finite if V is. By (2.4) again, S ≤ P + N and therefore V ≤ P + N. If either P or N is finite, so is the other by (2.5), and therefore so is V. This gives the first part of the theorem. Now choose a sequence of partitions k so that Pk → P. Let us show that Nk → N and P − [ f (b) − f (a)] = N. By (2.5), Nk = Pk − [ f (b) − f (a)] → P − [ f (b) − f (a)], and since Nk ≤ N, it follows that P − [ f (b) − f (a)] ≤ N. If P − [ f (b) − f (a)] < N, there is, by definition of N, a partition with N > P−[ f (b)−f (a)]. Then P = N + [ f (b)−f (a)] > P, which is impossible. Hence, P − [ f (b) − f (a)] = N and Nk → N. If N is finite, it follows that P − N = f (b) − f (a). Letting k → ∞ in the inequality Pk + Nk ≤ V gives P + N ≤ V. Since V ≤ P + N was shown earlier, we have V = P + N, and the theorem follows. Corollary 2.7 (Jordan’s Theorem) A function f is of bounded variation on [a, b] if and only if it can be written as the difference of two bounded increasing functions on [a, b]. Proof. Suppose f = f1 − f2 , where f1 and f2 are bounded and increasing on [a, b]. Then f1 and f2 are of bounded variation on [a, b], and therefore, by Theorem 2.1(ii), so is f . Conversely, suppose f is of bounded variation on [a, b]. By Theorem 2.2(i), f is of bounded variation on every interval [a, x], a ≤ x ≤ b. Let P(x) and N(x) denote the positive and negative variations of f on [a, x], respectively.

22

Measure and Integral: An Introduction to Real Analysis

By the analogue of Theorem 2.2(i) for P and N (see Exercise 3), it follows that P(x) and N(x) are bounded and increasing on [a, b]. Moreover, by Theorem 2.6 applied to [a, x], f (x) = [P(x) + f (a)] − N(x) when a ≤ x ≤ b. Since P(x) is bounded and increasing, so is P(x) + f (a), and the corollary follows. Note that since the negative of an increasing function is decreasing, Corollary 2.7 may be rephrased to say that f is of bounded variation if and only if it is the sum of a bounded increasing function and a bounded decreasing function. We remark here that there exist continuous functions of bounded variation that are not monotone in any subinterval. See Exercise 27 of Chapter 3. In the next theorem, we consider a continuity property of functions of bounded variation. We recall from Chapter 1 that a discontinuity is said to be of the first kind if it is either a jump or a removable discontinuity.

Theorem 2.8 Every function of bounded variation has at most a countable number of discontinuities, and they are all of the first kind. Here, if f is of bounded variation on [a, b], we can clarify what it means to say that f has a discontinuity of the first kind at the endpoints a, b by extending the definition of f outside [a, b] by setting f (x) = f (a) if x < a and f (x) = f (b) if x > b and then using the usual notion. Proof. Let f be of bounded variation on [a, b]. Suppose first that f is bounded and increasing on [a, b]. Then the only discontinuities of f are of the first kind; in fact, they are all jumpdiscontinuities. If D denotes the set of all discontinuities of f , then D = ∞ k=1 {x : f (x+) − f (x−) ≥ 1/k}. Since f is bounded, each set on the right is finite (or empty); therefore, D is countable. The general case follows from this by using Corollary 2.7. Note that removable discontinuities may arise by subtracting monotone functions; for example, consider the function f in Example 3 and the corresponding monotone functions P(x) and N(x). See also Exercise 25. We now discuss a property of the variation of a continuous function. See also Exercise 20. Theorem 2.9 If f is continuous on [a, b], then V = lim||→0 S ; that is, given M satisfying M < V, there exists δ > 0 such that S > M for any partition of [a, b] with || < δ. Proof. We remind the reader of the discussion following Example 3. Given M with M < V, we must find δ > 0 so that S > M if || < δ. Select μ > 0 such

Functions of Bounded Variation and the Riemann–Stieltjes Integral

23

that M+μ < V, and choose a fixed partition ¯ = {¯xj }kj=0 such that S¯ > M+μ. Using the uniform continuity of f on [a, b], pick η > 0 such that (i) |f (x) − f (x )| < μ/[2(k + 1)] if |x − x | < η. Now let be any partition that satisfies (ii) || < η, (iii) || < minj x¯ j − x¯ j−1 . We claim that S > M, from which the theorem will follow by choosing δ to be the smaller of η and minj (¯xj − x¯ j−1 ). Write = {xi }m i=0 and S =

m

| f (xi ) − f (xi−1 )| =

+

,

i=1

where is extended over all i such that (xi−1 , xi ) contains some x¯ j . By (iii), any (xi−1 , xi ) can contain at most one x¯ j , and therefore the number of terms of is at most k + 1. Let ∪ ¯ denote the partition formed by the union of ¯ ¯ ¯ the points of and . Then ∪ is a refinement of both and . Moreover, S∪¯ = + , where is obtained from by replacing each term by | f (xi ) − f (¯xj )| + | f (¯xj ) − f (xi−1 )|, x¯ j being the point of ¯ in (xi−1 , xi ). By (i) and (ii), each of these two terms is less than μ/[2(k + 1)], and therefore

< 2(k + 1)

μ = μ. 2(k + 1)

Hence,

= S∪¯ −

> S∪¯ − μ,

¯ S∪¯ ≥ S¯ . This gives so that S > S∪¯ − μ. Since ∪ ¯ is a refinement of , S > S¯ − μ > M and completes the proof. Corollary 2.10

If f has a continuous derivative f on [a, b], then

V=

b a

| f | dx,

P=

b a

+

{ f } dx,

N=

b a

{ f }− dx.

24

Measure and Integral: An Introduction to Real Analysis

Proof. By the mean-value theorem,

S =

m

| f (xi ) − f (xi−1 )| =

i=1

m

|f (ξi )|(xi − xi−1 )

i=1

for appropriate ξi ∈ (xi−1 , xi ), i = 1, . . . , m. Hence, by Theorem 2.9,

V = lim S = lim ||→0

||→0

m

|f (ξi )|(xi − xi−1 ) =

i=1

b

|f (x)| dx,

a

by definition of the Riemann integral. Moreover, by Theorem 2.6, ⎤ ⎡ b b 1 1 ⎣ P = [V + f (b) − f (a)] = |f (x)| dx + f (x) dx⎦ 2 2 a a =

b b 1 |f (x)| + f (x) dx = [f (x)]+ dx. 2a a

The formula for N follows similarly from the fact that

N=

1 [V − f (b) + f (a)]. 2

For an extension of Corollary 2.10, see Theorem 7.31. See also Exercise 22 of Chapter 7. In passing, we note that there are notions of bounded variation for open or partly open intervals, as well as for infinite intervals. Suppose, for example, that (a, b) is a bounded open interval. Let [a , b ] ⊂ (a, b), and define V ◦ (a, b) = lim V[a , b ] as a → a and b → b. If V ◦ (a, b) < + ∞, we say f is of bounded variation on (a, b). Similarly, if f is defined on (−∞, +∞), let V(−∞, +∞) = lim V[a, b] as a → −∞ and b → +∞. Analogous definitions hold for [a, b), (a, +∞), [a, +∞), etc. See Exercise 8. We may also consider the notion of bounded variation for complex-valued f defined on an interval. The definition is the same as for the real-valued case, and we leave it to the reader to show that a complex-valued f is of bounded variation if and only if both its real and imaginary parts are as well.

Functions of Bounded Variation and the Riemann–Stieltjes Integral

25

2.2 Rectifiable Curves As an application of the notion of bounded variation, we shall discuss its relation to rectifiable curves (initially, those in the plane). A curve C in the plane is two finite real-valued parametric equations C:

x = ϕ(t) , y = ψ(t)

a ≤ t ≤ b.

(2.11)

The graph of C is {(x, y) : x = φ(t), y = ψ(t), a ≤ t ≤ b}. The graph may have self-intersections and is not necessarily continuous or bounded. We think of the curve itself as the mapping of [a, b] onto the graph. Let = {a = t0 < t1 < · · · < tm = b} be a partition of [a, b], and consider the corresponding points Pi = (φ(ti ), ψ(ti )), i = 0, 1, . . . , m, on the graph of C. Draw the polygonal (broken) line connecting P0 to P1 , P1 to P2 , . . . , Pm−1 to Pm in order, and let l() =

m 2 2 1/2 φ (ti ) − φ ti−1 + ψ (ti ) − ψ ti−1 i=1

denote its length. The length L of C is defined by the equation L = L(C) = sup l().

(2.12)

Thus, 0 ≤ L ≤ +∞. If the graph of C is discontinuous, then as we move along the graph, the length of every missing segment will contribute to L. Moreover, the possibility that the graph may be traversed more than once, that is, that the mapping t → (φ(t), ψ(t)), a ≤ t ≤ b, may not be one-to-one, will add to L. We say C is rectifiable if L < +∞.

Theorem 2.13 Let C be a curve defined by (2.11). Then C is rectifiable if and only if both φ and ψ are of bounded variation. Moreover, V(φ), V(ψ) ≤ L ≤ V(φ) + V(ψ). Proof. We will use the simple inequalities 1/2 ≤ |a| + |b| |a|, |b| ≤ a2 + b2

26

Measure and Integral: An Introduction to Real Analysis

for real a and b. Thus, if C is rectifiable and = {ti } is any partition of [a, b], the inequality l() =

2 2 1/2 φ (ti ) − φ ti−1 + ψ (ti ) − ψ ti−1 ≤L

implies |φ (ti ) − φ ti−1 | ≤ L and |ψ (ti ) − ψ ti−1 | ≤ L. Hence, V(φ), V(ψ) ≤ L. On the other hand, for any C, l() ≤

|φ (ti ) − φ ti−1 | + |ψ (ti ) − ψ ti−1 | ≤ V(φ) + V(ψ).

Hence, L ≤ V(φ) + V(ψ), which completes the proof. It follows that if φ(t) is any bounded function that is not of bounded variation on [a, b] (see Example 5 and Exercise 1), then the curve given by x = y = φ(t), a ≤ t ≤ b, is not rectifiable, even though its graph lies in a finite segment of the line y = x. Thus, the length of the graph of a curve is not necessarily the same as the length of the curve. In the special case that C is given by a function y = f (x), Theorem 2.13 reduces to the simple statement that C is rectifiable if and only if f is of bounded variation. Curves in Rn can be treated similarly, and we shall be brief. By a curve C in n R , we mean a system x1 = φ1 (t), . . . , xn = φn (t), for t in some [a, b]. We consider a partition = {ti }m i=0 of [a, b] and the length l() of the corresponding polygonal line:

l() =

m

⎛ ⎞1/2 m n 2 ⎝ φj (ti ) − φj ti−1 ⎠ . Pi−1 Pi =

i=1

i=1

j=1

The quantity L = sup l() is called the length of C, and if L < +∞, C is said to be rectifiable. As seen from the definition of l(), exactly as in the case n = 2, C is rectifiable if and only if each φj is of bounded variation.

2.3 The Riemann–Stieltjes Integral Let f and φ be two functions that are defined and finite on a finite interval [a, b]. If = {a = x0 < x1 < · · · < xm = b} is a partition of [a, b], we arbitrarily select intermediate points {ξi }m i=1 satisfying xi−1 ≤ ξi ≤ xi and write R =

m i=1

f (ξi ) φ (xi ) − φ xi−1 .

(2.14)

Functions of Bounded Variation and the Riemann–Stieltjes Integral

27

R is called a Riemann–Stieltjes sum for and of course depends on the points ξi , the functions f and φ, and the interval [a, b], although we shall usually not display this dependence in our notation. If I = lim R

(2.15)

||→0

exists and is finite, that is, if given ε > 0 there is a δ > 0 such that |I – R | < ε for any satisfying || < δ and for any choice of intermediate points, then I is called the Riemann–Stieltjes integral of f with respect to φ on [a, b] and denoted I=

b

b f (x) dφ(x) = f dφ.

a

a

b A necessary and sufficient condition for the existence of a f dφ is the following Cauchy criterion: given ε > 0, there exists δ > 0 such that |R −R | < ε if ||, | | < δ. See Exercise 11. We list four preliminary remarks about this integral. b b 1. If φ(x) = x, a f dφ is clearly just the Riemann integral a f dx. In this case, Theorem 5.54 in Chapter 5 gives a necessary and sufficient condition on f for the existence of the integral. 2. If f is continuous on [a, b] and φ is continuously differentiable on [a, b], then b b a f dφ = a f φ dx. (See also Theorem 7.32.) In fact, by the mean-value theorem, R =

f (ξi ) φ (ηi ) xi − xi−1 , f (ξi ) φ (xi ) − φ xi−1 =

with xi−1 ≤ ξi , ηi ≤ xi . Using the uniform continuity of φ , we obtain b lim||→0 R = a f φ dx.

3. Let φ(x) be a step function; that is, suppose there are points a = α0 < α1 < · · · < αm = b such that φ is constant on each interval (αi−1 , αi ). Let φ(αi +) = lim φ(x), x→αi +

i = 0, 1, . . . , m − 1,

and φ(αi −) = lim φ(x), x→αi −

i = 1, . . . , m,

28

Measure and Integral: An Introduction to Real Analysis

denote the limits from the right and left at αi , and let di = φ(αi +) − φ(αi −), i = 1, . . . , m − 1, d0 = φ(α0 +) − φ(α0 ), and dm = φ(αm ) − φ(αm −) denote the jumps of φ. Then, for continuous f , b

f dφ =

a

m

f (αi )di .

i=0

The existence of the integral can be verified directly or by appealing to Theorem 2.24. See also Exercise 29. For example, if φ(x) is chosen to be the Heaviside function H(x) defined by H(x) = 0 when x < 0 and H(x) = 1 when x ≥ 0, then 1

f dH = f (0)

−1

if f is continuous at x = 0. 4. In definition (2.15), no condition other than finiteness is imposed on f or φ, but we shall see later that the most important applications occur when φ is monotone or, more generally, of bounded variation. We note now that b if a f dφ exists, then f and φ have no common points of discontinuity. To prove this, suppose that both f and φ are discontinuous at x¯ , a < x¯ < b. Suppose first that the discontinuity of φ is not removable. Then there is a fixed η > 0 such that, for every ε > 0, there exist points x¯ 1 and x¯ 2 with x¯ − ε2 < x¯ 1 < x¯ < x¯ 2 < x¯ + ε2 and |φ(¯x2 ) − φ(¯x1 )| > η. Let = {xi } be a partition of [a, b] with || < ε such that xi0 −1 = x¯ 1 and xi0 = x¯ 2 for some i0 . Choose a point ξi ∈ [xi−1 , xi ] for i = i0 and two different points ξi0 and ξi in 0 [xi0 −1 , xi0 ]. Let R be the Riemann–Stieltjes sum using ξi in each [xi−1 , xi ], and let R be the sum using ξi in [xi−1 , xi ] for i = i0 and ξi in [xi0 −1 , xi0 ]. 0 Then, clearly, |R − R | = | f ξi0 − f ξi0 | |φ xi0 − φ xi0 −1 | > η| f ξi0 − f ξi0 |. Since f is discontinuous at x¯ , we can choose ξi0 and ξi subject to the restric0 tions earlier and such that | f (ξi0 ) − f (ξi )| > μ for some μ > 0 independent 0 of ε. It follows that |R − R | exceeds a positive constant independent of ε, contradicting the assumption that R − R → 0 as ||, | | → 0. If the discontinuity of φ at x¯ is removable, a similar argument can be given. The main difference is that we consider with x¯ as a partitioning

29

Functions of Bounded Variation and the Riemann–Stieltjes Integral

point xi0 and argue for either [xi0 −1 , x¯ ] or [¯x, xi0 +1 ], depending on the nature of the discontinuity of f at x¯ . The arguments in the case where x¯ is either a or b are similar. In the theorem that follows, we list some simple properties of the Riemann–Stieltjes integral. The proofs are left as an exercise.

Theorem 2.16 (i) If

b a

f dφ exists, then so do b

b a

cf dφ and

cf dφ =

b

a

(ii) If

b

a f1 dφ

and

b

a f2 dφ

b

b a

f dφ1 and

b a

f d(cφ) for any constant c, and

a

a

both exist, so does

(f1 + f2 ) dφ =

b

b a

b a

(f1 + f2 ) dφ, and

f1 dφ +

a

f dφ2 exist, so does b

a

b f d(cφ) = c f dφ.

a

(iii) If

b

b

f2 dφ.

a

f d(φ1 + φ2 ), and

b b f d(φ1 + φ2 ) = f dφ1 + f dφ2 .

a

a

a

The additivity of the integral with respect to intervals is given by the following result. See also Exercise 14.

Theorem 2.17 exist and

If

b a

f dφ exists and a < c < b, then

b a

f dφ =

c a

f dφ +

b

c a

f dφ and

b c

f dφ both

f dφ.

c

Proof. In the proof, R [a, b] will denote a Riemann–Stieltjes sum correspondc ing to a partition of [a, b]. To show that a f dφ exists, it is enough to show

30

Measure and Integral: An Introduction to Real Analysis

that given ε > 0, there exists δ > 0 so that if 1 and 2 are partitions of [a, c] with |1 |, |2 | < δ, then |R1 [a, c] − R2 [a, c]| < ε.

(2.18)

b Since a f dφ exists, there is a δ > 0 so that for any partitions 1 and 2 of [a, b] with |1 |, |2 | < δ, we have |R [a, b] − R [a, b]| < ε. 1

(2.19)

2

Let 1 and 2 be partitions of [a, c] with given sets of intermediate points. Complete 1 and 2 to partitions 1 and 2 of [a, b] by adjoining the same points of [c, b]; that is, let be a partition of [c, b] and let 1 = 1 ∪ , 2 = 2 ∪ . Select a set of intermediate points in [c, b] for , and let the intermediate points of 1 and 2 consist of these together with the sets for 1 and 2 , respectively. Then R [a, b] = R1 [a, c] + R [c, b] 1

(2.20)

R [a, b] = R2 [a, c] + R [c, b]. 2

If we now assume that |1 |, |2 | < δ and choose so that | | < δ, then |1 |, |2 | < δ, and (2.18) follows from (2.19) by subtracting the equations in (2.20). b The proof of the existence of c f dφ is similar. The fact that b a

f dφ =

c

f dφ +

a

b

f dφ

c

follows from (2.20). This completes the proof. See also Exercise 19. The next result is the very useful formula for integration by parts.

Theorem 2.21

If

b a

b a

f dφ exists, then so does

b a

φ df , and

f dφ = [ f (b)φ(b) − f (a)φ(a)] −

b a

φ df .

31

Functions of Bounded Variation and the Riemann–Stieltjes Integral

Proof. Let = {a = x0 < x1 < · · · < xm = b} and xi−1 ≤ ξi ≤ xi . Then R =

m

m m f (ξi ) φ (xi ) − φ xi−1 = f (ξi ) φ (xi ) − f (ξi ) φ xi−1

i=1

=

m

i=1 m−1

f (ξi ) φ (xi ) −

i=1 m−1

=−

i=1

f ξi+1 φ (xi )

i=0

φ (xi ) f ξi+1 − f (ξi ) + f (ξm ) φ(b) − f (ξ1 ) φ(a)

i=1

since xm = b and x0 = a. If we subtract and add φ(a)[ f (ξ1 )−f (a)]+φ(b)[ f (b)− f (ξm )] on the right side of the last equality and cancel like terms, we obtain R = −T + [ f (b)φ(b) − f (a)φ(a)], where

T =

m−1

φ (xi ) f ξi+1 − f (ξi ) + φ(a) [ f (ξ1 ) − f (a)] + φ(b) [ f (b) − f (ξm )] .

i=1

Since the ξi straddle the xi (successive ξi s may be equal), T is a Riemann– b Stieltjes sum for a φ df . Observing that the roles of φ and f can be interb changed, and taking the limit as || → 0, we see that a f dφ exists if and b b b only if a φ df exists and that a f dφ = [ f (b)φ(b) − f (a)φ(a)] − a φ df . This completes the proof. Now let f be bounded and φ be monotone increasing on [a, b]. If = {xi }m i=0 , let mi = L =

inf

xi−1 ≤ x ≤ xi m

f (x),

Mi =

sup

xi−1 ≤ x ≤ xi

f (x),

mi φ (xi ) − φ xi−1 ,

i=1

U =

m

Mi φ (xi ) − φ xi−1 .

i=1

Since −∞ < mi ≤ Mi < +∞ and φ(xi ) − φ(xi−1 ) ≥ 0, we see that L ≤ R ≤ U .

(2.22)

32

Measure and Integral: An Introduction to Real Analysis

L and U are called the lower and upper Riemann–Stieltjes sums for , respectively. The behavior of L and U is somewhat more predictable than that of R , as we now show.

Lemma 2.23

Let f be bounded and φ be increasing on [a, b].

(i) If is a refinement of , then L ≥ L and U ≤ U . (ii) If 1 and 2 are any two partitions, then L1 ≤ U2 . Proof. To see (i) for upper sums, suppose that has only one point x not in . If x lies between xi−1 and xi of , then sup[xi−1 ,x ] f (x), sup[x ,xi ] f (x) ≤ Mi , so that sup f (x) φ(x ) − φ xi−1

[xi−1 ,x ]

+ sup f (x) φ (xi ) − φ x ≤ Mi φ (xi ) − φ xi−1 . [x ,xi ]

Hence, U ≤ U . Since can be obtained by adding one point at a time to , an extension of this reasoning proves (i) for upper sums. The argument for lower sums is similar. To show (ii), note that 1 ∪ 2 is a refinement of both 1 and 2 . Hence, by part (i) and (2.22), we obtain L1 ≤ L1 ∪2 ≤ U1 ∪2 ≤ U2 , which completes the proof. We now come to an important result that gives sufficient conditions for the b existence of a f dφ. See also Exercise 23. Theorem 2.24 If f is continuous on [a, b] and φ is of bounded variation on [a, b], b then a f dφ exists. Moreover, b f dφ ≤ sup| f | V[φ; a, b]. a

[a,b]

Proof. To prove the existence, we may suppose by Corollary 2.7 and Theorem 2.16(iii) that φ is monotone increasing. Then, by (2.22), L ≤ R ≤ U , and it is enough to show that lim||→0 L and lim||→0 U exist and are equal. This is clear if φ is constant on [a, b]. If φ is not constant, let = {xi } and note that

Functions of Bounded Variation and the Riemann–Stieltjes Integral

33

given ε > 0, the uniform continuity of f implies there exists δ > 0 such that if || < δ, then Mi − mi < ε/[φ(b) − φ(a)]. Hence, if || < δ, 0 ≤ U − L =

(Mi − mi ) φ (xi ) − φ xi−1 < ε.

(2.25)

Therefore, lim (U − L ) = 0,

||→0

(2.26)

and it is enough to show that lim||→0 U exists. This is immediate since otherwise there would exist an ε > 0 and two sequences of partitions, {k } and {k }, with norms tending to zero such that U − U > ε. In view of (2.26), k k we would then have, for k large enough, L − U > ε/2 > 0, contradicting k k the fact that L ≤ U for any and (Lemma 2.23). b To complete the proof, note that the inequality | a f dφ| ≤ (sup(a,b) | f |) V[φ; a, b] follows from a similar one for R by taking the limit. b Combining Theorems 2.21 and 2.24, we see that a f dφ exists if either f or φ is continuous and the other is of bounded variation. Theorem 2.27 (Mean-Value Theorem) If f is continuous on [a, b] and φ is bounded and increasing on [a, b], there exists ξ ∈ [a, b] such that b

f dφ = f (ξ)[φ(b) − φ(a)].

a

Proof. Since φ is increasing, we have min f [φ(b) − φ(a)] ≤ R ≤ max f [φ(b) − φ(a)] [a,b]

for any R . Since

[a,b]

b a

f dφ exists (see Theorem 2.24), it must satisfy

b min f [φ(b) − φ(a)] ≤ f dφ ≤ max f [φ(b) − φ(a)]. [a,b]

a

[a,b]

The result now follows immediately from the intermediate value theorem for continuous functions.

34

Measure and Integral: An Introduction to Real Analysis

In passing, we note that Riemann–Stieltjes integrals can also be defined, in the improper sense, on open or partly open bounded intervals and on infinite intervals. If f and φ are defined on (a, b), for example, let a < a < b < b and define b a

f dφ = lim

b

a →a b →b a

f dφ,

provided the limit exists in the sense that it is independent of how a → a and b → b. Similarly, let +∞

f dφ = lim

b

a→−∞ b→+∞ a

−∞

f dφ

if the limit exists. Analogous definitions can be given for [a, b), (a, +∞), [a, +∞), etc. See Exercise 24.

2.4 Further Results about Riemann–Stieltjes Integrals

b We will discuss a variant of the definition of a f dφ in the case where f is bounded and φ is increasing. Note that it then follows from part (ii) of Lemma 2.23 that −∞ < sup L ≤ inf U < +∞. It is natural to ask if the b existence of a f dφ in this case is equivalent to the statement that sup L = inf U ,

(2.28)

which we know to be an equivalent definition in the case of Riemann integrals (see (1.20)). Unfortunately, the answer in general is no, as the following example shows. Let [a, b] = [−1,1] and f (x) = φ(x) =

0 1

if − 1 ≤ x < 0 if 0 ≤ x ≤ 1,

0 1

if − 1 ≤ x ≤ 0 if 0 < x ≤ 1.

1 Since f and φ have a common discontinuity, −1 f dφ does not exist. In fact, if straddles 0, that is, if xi0 −1 < 0 < xi0 for some i0 , then R = f (ξi0 ) for xi0 −1 ≤ ξi0 ≤ xi0 . Hence, R may be 0 or 1 and thus cannot have a limit. On the other

35

Functions of Bounded Variation and the Riemann–Stieltjes Integral

hand, it is easy to check that U = 1 for any and that L = 0 if straddles 0 and L = 1 otherwise. Hence, neither lim||→0 R nor lim||→0 L exists, but sup L = inf U = 1. In the following two theorems, we explore relations between (2.15) and (2.28).

Theorem 2.29 Let f be bounded and φ be monotone increasing on [a, b]. If exists, then lim||→0 L and lim||→0 U exist, and

lim L = lim U = sup L = inf U =

||→0

||→0

b

b a

f dφ

f dφ.

a

Proof. We may assume that φ is not constant on [a, b] since the result is obvib ous otherwise. Let I = a f dφ. By hypothesis, given ε > 0, there is a δ > 0 such that |I – R | < ε for any R with || < δ. Given = {xi }m i=0 with || < δ, choose ξi and ηi in [xi−1 , xi ], i = 1, . . . , m, such that 0 ≤ Mi − f (ξi )

0 such that U < U + ε if || < δ. Choose ¯ = {¯xj }kj=0 such that U¯ < U + ε2 , and let M = sup[a,b] | f |. By the uniform continuity of φ, there exists η > 0 such that |φ(x) − φ(x )|

0, with V[ f ; a+ε, b] ≤ M < +∞. Show that V[ f ; a, b] < +∞. Is V[ f ; a, b] ≤ M? If not, what additional assumption will make it so? 6. Let f (x) = x2 sin (1/x) for 0 < x ≤ 1 and f (0) = 0. Show that V[ f ; 0, 1] < +∞. (Examine the graph of f , or use Exercise 5 and Corollary 2.10.) 7. Suppose f is of bounded variation on [a, b]. If f is continuous at a point x¯ , show that V(x), P(x), and N(x) are also continuous at x¯ . In particular, if f is continuous on [a, b], then so are V(x), P(x), and N(x). (If = {xi }, note

38

Measure and Integral: An Introduction to Real Analysis

that V[xi−1 , xi ] − |f (xi−1 ) − f (xi )| ≤ V[a, b] − S . Recall that S increases when is refined.) 8. The main results about functions of bounded variation on a closed bounded interval remain true for open or partly open intervals and for infinite intervals. Prove, for example, that if f is of bounded variation on (−∞, +∞), then f is the difference of two increasing bounded functions. 9. Let C be a curve with parametric equations x = φ(t) and y = ψ(t), a ≤ t ≤ b. (a) If φ and ψ are of bounded variation and continuous, show that L = lim||→0 l(). b (b) If φ and ψ are continuously differentiable, show that L = a ([φ (t)]2 + [ψ (t)]2 )1/2 dt. 10. If λ1 < λ2 < · · · < λm is a finite sequence and −∞ < s < +∞, write −sλk as a Riemann–Stieltjes integral. (Take f (x) = e−sx , φ to be an k ak e appropriate step function and [a, b] to contain all the λk in its interior.) b 11. Show that a f dφ exists if and only if given ε > 0, there exists δ > 0 such that |R − R | < ε if ||, | | < δ. 12. Prove that the conclusion of Theorem 2.30 is valid if the assumption that φ is continuous is replaced by the assumption that f and φ have no common discontinuities. (Instead of the uniform continuity of φ, use the fact ¯ that either f or φ is continuous at each point x¯ j of .) 13. Prove Theorem 2.16. c b 14. Give an example that shows that for a < c < b, a f dφ and c f dφ may b both exist but a f dφ may not. Compare Theorem 2.17. (Take [a, b] = [−1,1], c = 0, and f and φ as in the example following (2.28).) 15. Suppose f is continuous and x φ is of bounded variation on [a, b]. Show that the function ψ(x) = a f dφ is of bounded variation on [a, b]. If g is b b continuous on [a, b], show that a g dψ = a gf dφ.

16. Suppose that φ is of bounded variation on [a, b] and that f is bounded and continuous except for a finite number of jump discontinuities in [a, b]. If b φ is continuous at each discontinuity of f , show that a f dφ exists. 17. If φ is of bounded variation on (−∞, +∞), f is continuous on (−∞, +∞), +∞ and lim|x|→+∞ f (x) = 0, show that −∞ f dφ exists. k 18. Let f (z) = ∞ |ak | < +∞, then k=0 ak z be a power series. Show that if f (z) is of bounded variation on every radius of the circle 1. (If, e.g., |z| = k− k x a− the radius is 0 ≤ x ≤ 1 and the ak are real, then f (x) = a+ k k x .) b 19. Let f and φ be finite functions on [a, b]. If [a , b ] ⊂ [a, b] and a f dφ exists, b show that a f dφ exists.

Functions of Bounded Variation and the Riemann–Stieltjes Integral

39

20. Let f be finite on [a, b]. If lim||→0 S [ f ; a, b] exists, show that it equals V[ f ; a, b]. 21. If V[φ; a, b] = +∞, show that there is a point x0 ∈ [a, b] such that either V[φ; I] = + ∞ for every subinterval I of [a, b] having x0 as left-hand endpoint or V[φ; I] = +∞ for every subinterval I of [a, b] having x0 as right-hand endpoint. (Note that for any x0 , a , b such that a ≤ a < x0 < b ≤ b, V[φ; a , b ] < +∞ if both V[φ; a , x0 ], V[φ; x0 , b ] < +∞.) 22. If V[φ; a, b] = +∞, show that there exist x0 ∈ [a, b] and a monotone ∞ in [a, b] such that x → x and sequence {xk }∞ 0 k k=1 |φ(xk+1 ) − φ(xk )| = 1 +∞. (Use the results in Exercises 21 and 5.) 23. If V[φ; a, b] = +∞, show that there is a continuous f on [a, b] such that b a f dφ does not exist. (Use the result in Exercise 22 together with the αk = following fact: if {αk } is a sequence of positive numbers with } of positive numbers with ε → 0 and +∞, then there is a sequence {ε k k εk αk = +∞.) 24. Let f be continuous and φ be of bounded variation on [a, b], and recall b that the Riemann–Stieltjes integral a f dφ then exists by Theorem 2.24. a+ε Show that limε→0+ a f dφ = 0 if and only if either f (a) = 0 or φ is b b continuous at a. Deduce that the formula a f dφ = limε→0+ a+ε f dφ may not hold. 25. Construct a bounded nondecreasing function on (−∞, ∞) which is continuous at every irrational number and discontinuous at every rational enumeration of the rational numbers and number. (Let {rk }∞ k=1 be an consider the function f (x) = k:rk ≤x 2−k .)

26. Let f (x) = sin x. Sketch the graphs of its variations P(x), N(x), V(x) for 0 ≤ x ≤ 2π, and use Corollary 2.10 to find explicit formulas for these variations on [0, 2π].

27. If f is an even function on [−1, 1], verify the formulas V[ f ; −1, 1] = 2P[ f ; −1, 1], V[ f ; 0, 1] = P[ f ; −1, 1] and P[ f ; 0, 1] = N[ f ; −1, 0]. 28. Let f be Lipschitz continuous on an interval [a, b]. Show that V[ f ; a, b] is strictly less than the length of the graph of f over [a, b]. b 29. Use Theorem 2.17 to verify the formula for a f dφ given in remark 3 in Section 2.3. 30. Let f and φ be real-valued functions on [a, b]. b (a) If a f dφ exists and φ is not constant on any subinterval of [a, b], show that f is bounded on [a, b]. b b (b) If a f dφ exists and φ is increasing, show that L ≤ a f dφ ≤ U for every partition of [a, b], where L and U are the corresponding lower and upper sums.

40

Measure and Integral: An Introduction to Real Analysis

31. Show that for any real-valued function f on [a, b], the Riemann– b Stieltjes integral a df exists and equals f (b) − f (a). If f exists and is b b Riemann integrable on [a, b], show that a f (x) dx = a df , and conseb quently a f (x) dx = f (b) − f (a). 32. Let f be a function of bounded variation on an interval [a, b]. Show that f is Riemann integrable on [a, b]. (This follows from Theorems 5.54 and 2.8, but it can be derived solely from the results in this chapter.)

3 Lebesgue Measure and Outer Measure In this chapter, we will define and study the Lebesgue measure of sets in Rn . This will be the foundation for the theory of integration to be developed later. We will base the presentation on the notion of the outer measure of a set.

3.1 Lebesgue Outer Measure and the Cantor Set We consider closed n-dimensional intervals I = {x: aj ≤ xj ≤ bj , j = 1, . . . , n} and their volumes v(I) = nj=1 (bj − aj ). (See p. 7 in Section 1.3.) To define the outer measure of an arbitrary subset E of Rn , cover E by a countable collection S of intervals Ik , and let σ(S) = v(Ik ). Ik ∈S

The Lebesgue outer measure (or exterior measure) of E, denoted |E|e , is defined by |E|e = inf σ(S),

(3.1)

where the infimum is taken over all such covers S of E. Thus, 0 ≤ |E|e ≤ +∞.

Theorem 3.2

For an interval I, |I|e = v(I).

Proof. Since I covers itself, we have |I|e ≤ v(I). To show the opposite inequal∗ interval ity, suppose that S = {Ik }∞ k=1 is a cover of I. Given ε > 0, let Ik be an ∗ containing Ik in its interior such that v(Ik ) ≤ (1 + ε)v(Ik ). Then I ⊂ k (Ik∗ )◦ , and since I is closed and bounded, the Heine–Borel Theorem 1.9 implies there N N ∗ ∗ is an integer N such that I ⊂ k=1 Ik . Therefore, v(I) ≤ k=1 v(Ik ) by a basic property of the volume of intervals (see p. 8 in Section 1.3). Hence, v(I) ≤ (1 + ε) N k=1 v(Ik ) ≤ (1 + ε)σ(S). Since ε can be chosen arbitrarily small, it follows that v(I) ≤ σ(S) and, therefore, that v(I) ≤ |I|e . This completes the proof. Note that the boundary of any interval has outer measure zero. Compare Exercise 22 of Chapter 1. 41

42

Measure and Integral: An Introduction to Real Analysis

The following two theorems state simple but basic properties of outer measure.

Theorem 3.3

If E1 ⊂ E2 , then |E1 |e ≤ |E2 |e .

The proof follows immediately from the fact that any cover of E2 is also a cover of E1 . Theorem 3.4

If E =

Ek is a countable union of sets, then |E|e ≤

|Ek |e .

Proof. We may assume that |Ek |e < +∞ for each k = 1, 2, . . . , since otherwise (k) the conclusion is obvious. Fix ε > 0. Given k, choose intervals Ij such that Ek ⊂ j Ij(k) and j v(Ij(k) ) < |Ek |e + ε2−k . Since E ⊂ j,k Ij(k) , we have |E|e ≤ (k) (k) j,k v(Ij ) = k j v(Ij ). Therefore, |E|e ≤

(|Ek |e + ε2−k ) = |Ek |e + ε, k

k

and the result follows by letting ε → 0. We see in particular that any subset of a set with outer measure zero has outer measure zero and that the countable union of sets with outer measure zero has outer measure zero. Since any set consisting of a single point clearly has outer measure zero, it follows that any countable subset of Rn has outer measure zero. For example, the set consisting of all points each of whose coordinates is rational has outer measure zero, even though it is dense in Rn . There are sets with outer measure zero that are not countable. As an illustration, we will construct a subset of the real line with outer measure zero that is perfect, and therefore uncountable, by Theorem 1.9. Variants of the construction and analogues for Rn , n > 1, are given in the exercises. Consider the closed interval [0, 1]. The first stage of the construction is to subdivide [0, 1] into thirds and the interior of the middle third; that remove 1 2 is, remove the open interval 3 , 3 . Each successive step of the construction is essentially the same. Thus, stage, we subdivide each of the at the second remaining two intervals 0, 13 and 23 , 1 into thirds and remove the interiors, 1 2 7 8 and , , 9 9 9 9 , of their middle thirds. We continue the construction for each of the remaining intervals. The sets removed in the first three successive stages are indicated in the following illustration by darkened intervals:

43

Lebesgue Measure and Outer Measure

0

0

1 9

2 9

1 3

2 3

1 3

2 3

1

7 9

8 9

1

The subset of [0, 1] that remains after infinitely many such operations is called the Cantor set C: thus, if Ck denotes the union of the intervals left at the kth stage, then C=

∞

Ck .

(3.5)

k =1

Since each Ck is closed, it follows from Theorem 1.7 that C is closed. Note that Ck consists of 2k closed disjoint intervals, each of length 3−k , and that C contains the endpoints of all these intervals. Any point of C belongs to an interval in Ck for every k and is therefore a limit point of the endpoints of the intervals. This proves that C is perfect. Finally, since C is covered by the intervals in any Ck , we have |C|e ≤ 2k 3−k for each k. Therefore, |C|e = 0. We now introduce a function associated with the Cantor set that will be useful later. If Dk = [0, 1] − Ck , then Dk consists of the 2k − 1 open intervals Ijk (ordered from left to right as j proceeds from j = 1 to j = 2k − 1) removed in the first k stages of construction of the Cantor set. Let fk be the continuous function on [0, 1] which satisfies fk (0) = 0, fk (1) = 1, fk (x) = j2−k on Ijk , j = 1, . . . , 2k − 1, and which is linear on each interval of Ck . The graphs of f1 and f2 are shown in the following illustration:

1 3 4

f2 f1

1 2

1 4

0

1 9

2 9

1 3

2 3

7 9

8 9

1

44

Measure and Integral: An Introduction to Real Analysis

By construction, each fk is monotone increasing, fk+1 = fk on Ijk , j = 1, . . . , 2k − 1, and |fk − fk+1 | < 2−k . Hence, ( fk − fk+1 ) converges uniformly on [0, 1], and therefore { fk } converges uniformly on [0, 1]. Let f = limk→∞ fk . Then f (0) = 0, f (1) = 1, f is monotone increasing and continuous on [0, 1], and f is constant on every interval removed in constructing C. This f is called the Cantor–Lebesgue function. Its graph is sometimes called the Devil’s staircase. The next two theorems give useful relations between the outer measure of a set and the outer measures of open sets and Gδ sets (see p. 6 in Section 1.3) that contain it. Theorem 3.6 Let E ⊂ Rn . Then given ε > 0, there exists an open set G such that E ⊂ G and |G|e ≤ |E|e + ε. Hence, |E|e = inf |G|e ,

(3.7)

where the infimum is taken over all open sets G containing E. ∞ Proof. Given ε > 0, choose intervals Ik with E ⊂ ∞ k=1 Ik and k=1 v(Ik ) ≤ |E|e + 12 ε. Let Ik∗ be an interval containing Ik in its interior (Ik∗ )◦ such that v(Ik∗ ) ≤ v(Ik ) + ε2−k−1 . If G = (Ik∗ )◦ , then G is open and contains E. Furthermore, |G|e ≤

∞ k=1

v(Ik∗ ) ≤

∞ k=1

v(Ik ) + ε

∞ k=1

1 1 2−k−1 ≤ |E|e + ε + ε = |E|e + ε. 2 2

This completes the proof.

Theorem 3.8 |E|e = |H|e .

If E ⊂ Rn , there exists a set H of type Gδ such that E ⊂ H and

Proof. By Theorem 3.6, there is for every positive integer k an open set Gk ⊃ E such that |Gk |e ≤ |E|e + 1/k. If H = ∞ k=1 Gk , then H is of type Gδ and H ⊃ E. Moreover, for every k, |E|e ≤ |H|e ≤ |Gk |e ≤ |E|e + 1/k. Thus, |E|e = |H|e . Note that each |Gk |e < ∞ if |E|e < ∞. The significance of Theorem 3.8 is that the most general set in Rn can be included in a set of relatively simple type, namely, Gδ , with the same outer measure. In defining the notion of outer measure, we used intervals I with edges parallel to the coordinate axes. The question arises whether the outer measure of a set depends on the position of the (orthogonal) coordinate axes.

Lebesgue Measure and Outer Measure

45

The answer is no, and to prove this, we will simultaneously consider the usual coordinate system in Rn and a fixed rotation of this system. Notions pertaining to the rotated system will be denoted by primes. Thus, I denotes an interval with edges parallel to the rotated coordinate axes, and |E| e denotes the outer measure of a subset E relative to these rotated intervals: v(Ik ), (3.9) |E| e = inf where the infimum is taken over all coverings of E by rotated intervals Ik . The volume of an interval is clearly unchanged by rotation. (See p. 8 in Section 1.3.) Theorem 3.10

|E|e = |E|e for every E ⊂ Rn .

Proof. We first claim that given I and ε > 0, there exist {Il } such that I ⊂ v(Il ) ≤ v(I ) + ε. To see this, let I1 be an interval containing I

Il , and in its interior such that v(I1 ) ≤ v(I ) + ε. By Theorem 1.11, the interior of I1

can be written as a union of nonoverlapping intervals Il . Hence, I ⊂ Il . N

Moreover, since the Il are nonoverlapping and l=1 Il ⊂ I1 for every positive

integer N, we have N l ) ≤ v(I1 ) by a property of volume listed on p. 8 in l=1 v(I ∞ Section 1.3. Therefore, l=1 v(Il ) ≤ v(I1 ) ≤ v(I ) + ε, which proves the claim.

Il A parallel result is that given I and ε > 0, there exist {Il } such that I ⊂ and v(Il ) ≤ v(I) + ε. Let E be any subset of Rn . Given ε, choose {Ik : k = 1, 2, . . .} such that

} such that I ⊂ E⊂ Ik and v(Ik ) ≤ |E|e + 12 ε. For each k, choose {Ik,l k l Ik,l

−k−1 and l v(Ik,l ) ≤ v(Ik ) + ε2 . Thus, k,l

v(Ik,l ) ≤

k

1 v(Ik ) + ε ≤ |E|e + ε. 2

Since E ⊂ k,l Ik,l , we obtain |E| e ≤ |E|e + ε. Hence, |E| e ≤ |E|e , and by symme

try, |E|e = |E|e . For related results, see Exercise 22.

3.2 Lebesgue Measurable Sets A subset E of Rn is said to be Lebesgue measurable, or simply measurable, if given ε > 0, there exists an open set G such that E ⊂ G and |G − E|e < ε.

46

Measure and Integral: An Introduction to Real Analysis

If E is measurable, its outer measure is called its Lebesgue measure, or simply its measure, and denoted |E|: |E| = |E|e , for measurable E.

(3.11)

The condition that E be measurable should not be confused with the conclusion of Theorem 3.6, which states that there exists an open G containing E such that |G|e ≤ |E|e + ε. In general, since G = E (G − E) when E ⊂ G, we only have |G|e ≤ |E|e + |G − E|e , and we cannot conclude from |G|e ≤ |E|e + ε that |G − E|e < ε. We now list two simple examples of measurable sets. A nonmeasurable set will be constructed in Theorem 3.38. Example 1 Every open set is measurable. This is immediate from the definition. Example 2 Every set of outer measure zero is measurable. Suppose that |E|e = 0. Then given ε > 0, there is by Theorem 3.6 an open G containing E with |G| < |E|e + ε = ε. Hence, |G − E|e ≤ |G| < ε. Theorem 3.12 The union E = measurable, and

Ek of a countable number of measurable sets is

|E| ≤

|Ek |.

Proof. Let ε > 0. For each k = 1, 2, . . . , choose an open set Gk such that Ek ⊂ Gk −k . Then G = and |Gk − E | < ε2 Gk is open and E ⊂ G. Moreover, since ke G − E ⊂ (Gk − Ek ), we have |Gk − Ek |e < ε. |G − E|e ≤ (Gk − Ek ) ≤ e

This proves that E is measurable. The fact that |E| ≤ Theorem 3.4.

Corollary 3.13

An interval I is measurable, and |I| = v(I).

|Ek | follows from

47

Lebesgue Measure and Outer Measure

Proof. I is the union of its interior and its boundary. Since its boundary has measure zero, the fact that I is measurable follows from Theorem 3.12 and the results of Examples 1 and 2. Theorem 3.2 shows that |I| = v(I).

Theorem 3.14

Every closed set F is measurable.

In order to prove this, we will use Theorem 1.11 and the next two lemmas. N Lemma 3.15 If {I finite collection of nonoverlapping intervals, then k }k=1 is a is measurable and | Ik | = |Ik |.

Ik

Proof. The fact that Ik is measurable follows from Corollary 3.13. The rest of the lemma is a minor extension of Theorem 3.2, and its proof is left as an exercise. The reader should note the important role played by the Heine– Borel theorem in the proof. We recall from Chapter 1 that the distance between two sets E1 and E2 is defined as d(E1 , E2 ) = inf{|x1 − x2 | : x1 ∈ E1 , x2 ∈ E2 }. Lemma 3.16

If d(E1 , E2 ) > 0, then |E1 ∪ E2 |e = |E1 |e + |E2 |e .

Proof. By Theorem 3.4, |E1 ∪ E2 |e ≤ |E1 |e + |E2 |e . To prove the opposite Ik inequality, suppose ε > 0, and choose intervals {Ik } such that E1 ∪ E2 ⊂ and |Ik | ≤ |E1 ∪ E2 |e + ε. We may assume that the diameter of each Ik is less than d(E1 , E2 ). (Otherwise, we divide each Ik into a finite number of nonoverlapping subintervals with this property and apply Lemma 3.15.) Hence, {Ik } splits into two subsequences {Ik } and {Ik

}, the first of which covers E1 and the second, E2 . Clearly, |Ik | + |Ik

| = |Ik | ≤ |E1 ∪ E2 |e + ε. |E1 |e + |E2 |e ≤ Therefore, |E1 |e + |E2 |e ≤ |E1 ∪ E2 |e , which completes the proof. A special case of this result will be used in the proof of Theorem 3.14. If E1 and E2 are compact and disjoint, then d(E1 , E2 ) > 0 by Exercise 1(l) of Chapter 1; therefore, |E1 ∪ E2 |e = |E1 |e + |E2 |e if E1 and E2 are compact and disjoint. Proof of Theorem 3.14. Suppose first that F is compact. Given ε > 0, choose an open set G such that F ⊂ G and |G| < |F|e +ε. Since G−F is open, Theorem 1.11

48

Measure and Integral: An Introduction to Real Analysis

implies there are nonoverlapping closed intervals Ik , k = 1, 2, . . . , such that G−F = Ik . Therefore, |G−F|e ≤ |Ik |, and it suffices to show that |Ik | ≤ N ε. We have G = F ∪ ( Ik ) ⊃ F ∪ ( N k=1 Ik ) for every finite number {Ik }k=1 of the Ik . Therefore, N N

|G| ≥ F ∪ Ik = |F|e + Ik k=1

k=1

e

e

N

by Lemma 3.16, F and k=1 Ik being disjoint and compact. Since | N k=1 Ik |e = N N | by Lemma 3.15, we obtain k=1 |Ik | ≤ |G| − |F|e < ε for every N, k=1 |Ik so that |Ik | ≤ ε, as desired. This proves the result in the case when F is compact. Fk , To complete the proof, let F be any closed subset of Rn and write F = where Fk = F ∩ {x : |x| ≤ k}, k = 1, 2, . . . . Each Fk is compact and, therefore, measurable. Hence, F is measurable by Theorem 3.12.

Theorem 3.17

The complement of a measurable set is measurable.

Proof. Let E be measurable. For each positive integer k, choose an open set Gk such that E ⊂ Gk and |G k − E|e < 1/k. Since CGk is closed, it is measurable by Theorem 3.14. Let H = k CGk . Then H is measurable and H ⊂ CE. Write CE = H ∪ Z, where Z = CE − H. Then Z ⊂ CE − CGk = Gk − E, and therefore |Z|e < 1/k for every k. Hence, |Z|e = 0 and, in particular, Z is measurable. Thus, CE is measurable since it is the union of two measurable sets. The following two theorems are corollaries of Theorems 3.12 and 3.17. Theorem 3.18 The intersection E = sets is measurable.

k Ek

of a countable number of measurable

Proof. Since CEk is measurable by Theorem 3.17. However, Ek is measurable, CE = C( k Ek ) = k CEk , and hence CE is measurable. Therefore, by another application of Theorem 3.17, E is measurable.

Theorem 3.19

If E1 and E2 are measurable, then E1 − E2 is measurable.

Proof. Since E1 − E2 = E1 ∩ CE2 , the result follows from Theorems 3.17 and 3.18. As a consequence of Theorems 3.12, 3.17, 3.18, and 3.19 it follows that the class of measurable subsets of Rn is closed under the set-theoretic operations

49

Lebesgue Measure and Outer Measure

of taking complements, countable unions, and countable intersections. Such a class of sets is called a σ-algebra; that is, a nonempty collection of subsets E of some universal set U is called a σ-algebra of sets if it satisfies the following two conditions: (i) CE ∈ if E ∈ . k Ek ∈ if Ek ∈ , k = 1, 2, . . . .

(ii)

Note that it follows from the relation C( k Ek ) = k CEk that k Ek ∈ if is a σ-algebra and Ek ∈ , k = 1, 2, . . . . Moreover, it is easy to see that if is a σ-algebra, then the universal set U and the empty set ∅ belong to . The following result is just a reformulation of Theorems 3.12 and 3.17.

Theorem 3.20

The collection of measurable subsets of Rn is a σ-algebra.

Note, for example, that if {Ek }∞ k=1 are measurable, then lim sup Ek and lim inf Ek are measurable since lim sup Ek =

∞ ∞

Ek

and

lim inf Ek =

j=1 k=j

∞

∞

Ek .

j=1 k=j

If C1 and C2 are two collections of sets, we say that C1 is contained in C2 if every set in C1 is also in C2 . If F is a family of σ-algebras , we define the collection of all sets E that belong to every in F . It is easy ∈F to be to check that ∈F is itself a σ-algebra and is contained in every in F . , consider the family F of all Given a collection C of subsets of Rn σ-algebras that contain C , and let E = ∈F . Then E is the smallest σ-algebra containing C ; that is, E is a σ-algebra containing C , and if is any other σ-algebra containing C , then contains E . The smallest σ-algebra of subsets of Rn containing all the open subsets of Rn is called the Borel σ-algebra B of Rn , and the sets in B are called Borel subsets of Rn . Sets of type Fσ , Gδ , Fσδ , Gδσ (see p. 6 in Section 1.3), etc., are Borel sets.

Theorem 3.21

Every Borel set is measurable.

Proof. Let M be the collection of measurable subsets of Rn . By Theorem 3.20, M is a σ-algebra. Since every open set belongs to M , and B is the smallest σ-algebra containing the open sets, B is contained in M . The converse of Theorem 3.21 is false: see Exercise 31.

50

Measure and Integral: An Introduction to Real Analysis

3.3 Two Properties of Lebesgue Measure The next two theorems give properties of Lebesgue measure that are of fundamental importance. To prove them, we first need the following characterization of measurability in terms of closed sets. Lemma 3.22 A set E in Rn is measurable if and only if given ε > 0, there exists a closed set F ⊂ E such that |E − F|e < ε. Proof. E is measurable if and only if CE is measurable, that is, if and only if given ε > 0, there exists an open G such that CE ⊂ G and |G − CE|e < ε. Such G exists if and only if the set F = CG is closed, F ⊂ E, and |E − F|e < ε (since G − CE = E − F). If {Ek } is a countable collection of disjoint measurable sets, then

Theorem 3.23

Ek = |Ek |. k

k

Proof. First, suppose each Ek is bounded. Given ε > 0 and k = 1, 2, . . . , use Lemma 3.22 to choose a closed Fk ⊂ Ek with |Ek − Fk | < ε2−k . Then |Ek | ≤ |Fk | + ε2−k by Theorem 3.12. Since the Ek are bounded and disjoint, m the F | = Fk are compact and disjoint. Therefore, by Lemma 3.16, | m k k=1 k=1 |Fk | m m of the F . The fact that } F ⊂ for every finite number {F k k k k Ek then k=1 k=1 |F | ≤ | E |. Hence, implies that m k k k=1 k Ek ≥ |Fk | ≥ (|Ek | − ε2−k ) = |Ek | − ε, k

k

k

k

so that | k Ek | ≥ k |Ek |. Since the opposite inequality is always true, the theorem follows in this case. For the general case, let Ij , j = 1, 2, . . . , be a sequence of intervals increasing to Rn , and define S1 = I1 and Sj = Ij − Ij−1 for j ≥ 2. Then the sets Sj , k, j = 1, 2, . . ., are bounded, disjoint, and measurable; Ek = Ek,j = Ek ∩ E and E = j k,j k k k,j Ek,j . Therefore, by the case already established, we have Ek = Ek,j = |Ek,j | = k

k,j

k,j

k

j

|Ek,j | =

k

|Ek |.

51

Lebesgue Measure and Outer Measure

Corollary 3.24 |Ik |.

If {Ik } is a sequence of nonoverlapping intervals, then |

Ik | =

◦ Proof. It is clearthat | I |Ik |. other k| ≤ On the hand, the (I k ) being dis◦ ◦ |(Ik ) | = |Ik |. Thus, | Ik | = |Ik |. joint, we have | Ik | ≥ | (Ik ) | = An alternate proof not using Theorem 3.23 can be obtained from Lemma 3.15. Corollary 3.25 Suppose E1 and E2 are measurable, E2 ⊂ E1 , and |E2 | < +∞. Then |E1 − E2 | = |E1 | − |E2 |. Proof. Since E1 = E2 ∪ (E1 − E2 ), Theorem 3.23 gives |E1 | = |E2 | + |E1 − E2 |. The corollary now follows from the assumption that |E2 | < +∞. The second basic property of Lebesgue measure concerns its behavior for a monotone sequence of sets.

Theorem 3.26

Let {Ek }∞ k=1 be a sequence of measurable sets.

(i) If Ek E, then limk→∞ |Ek | = |E|. (ii) If Ek E and |Ek | < +∞ for some k, then limk→∞ |Ek | = |E|. Proof. (i) We may assume that |Ek | < +∞ for all k; otherwise, both limk→∞ |Ek | and |E| are infinite. Write E = E1 ∪ (E2 − E1 ) ∪ · · · ∪ (Ek − Ek−1 ) ∪ · · · . Since the terms in this union are measurable and disjoint, we have by Theorem 3.23 that |E| = |E1 | + |E2 − E1 | + · · · + |Ek − Ek−1 | + · · · . By Corollary 3.25, |E| = |E1 | + (|E2 | − |E1 |) + · · · + (|Ek | − |Ek−1 |) + · · · = lim |Ek |. k→∞

(ii) We may clearly assume that |E1 | < +∞. Write E1 = E ∪ (E1 − E2 ) ∪ · · · ∪ (Ek − Ek+1 ) ∪ · · · .

52

Measure and Integral: An Introduction to Real Analysis

Since the terms on the right are disjoint measurable sets, and since each Ek has finite measure, we have |E1 | = |E| + (|E1 | − |E2 |) + · · · + (|Ek | − |Ek+1 |) + · · · = |E| + |E1 | − lim |Ek |. k→∞

Therefore, |E| = limk→∞ |Ek |, which completes the proof. The restriction in (ii) that |Ek | < +∞ for some k is necessary, as the following example shows. Let Ek be the complement of the ball with center 0 and radius k. Then |Ek | = +∞ for all k and Ek ∅, the empty set. Therefore, limk→∞ |Ek | = +∞, while |∅| = 0. Although we are interested almost exclusively in measurable sets, proving the measurability of a given set is occasionally difficult in practice, and it may be desirable to apply theorems about outer measure. A particularly useful result is the following modification of part (i) of Theorem 3.26. The corresponding modification of part (ii) of Theorem 3.26 does not generally hold; see Exercise 21.

Theorem 3.27

If Ek E, then limk→∞ |Ek |e = |E|e .

Proof. For each k, let Hk be a measurable set (e.g., a set type Gδ ) such that of ∞ 2, . . . , let V = Ek ⊂ Hk and |Hk | = |Ek |e . For m = 1, m k=m Hk . Since the Vm are measurable and increase to V = Vm , it follows from Theorem 3.26 that limm→∞ |Vm | = |V|. Since Em ⊂ Vm ⊂ Hm , we have |Em |e ≤ |Vm | ≤|Hm | = Vm ⊃ |E m |e . Hence, |Vm | = |Em |e and limm→∞ |Em |e = |V|. However, V = Em = E, and therefore, limm→∞ |Em |e ≥ |E|e . The opposite inequality is obvious since Em ⊂ E, and the theorem follows.

3.4 Characterizations of Measurability Lemma 3.22 characterizes measurability in terms of closed subsets of a set. The next three theorems give some other characterizations. The first one states that the most general measurable set differs from a Borel set by a set of measure zero.

Theorem 3.28 (i) E is measurable if and only if E = H − Z, where H is of type Gδ and |Z| = 0. (ii) E is measurable if and only if E = H ∪ Z, where H is of type Fσ and |Z| = 0.

Lebesgue Measure and Outer Measure

53

Proof. If E has the representation expressed in either (i) or (ii), it is measurable since H and Z are. Conversely, to prove the necessity in (i), suppose that E is measurable. For each k = 1, 2, . . . , choose an open set Gk such that E ⊂ Gk and |Gk − E| < 1/k. Set H = k Gk . Then H is of type Gδ , E ⊂ H, and H − E ⊂ Gk − E for every k. Hence, |H − E| = 0, and (i) is proved. The necessity of (ii) follows from that of (i) by taking complements: if E is measurable, so is CE, andtherefore CE = Gk − Z, where the Gk are open and |Z| = 0. Hence, E = ( CGk ) ∪ Z, which completes the proof. Theorem 3.29 Suppose that |E|e < +∞. Then E is measurable if and only if given ε > 0, E = (S ∪ N1 )−N2 , where S is a finite union of nonoverlapping intervals and |N1 |e , |N2 |e < ε. The proof is left as an exercise. Our final characterization of measurability states that the measurable sets are those that split every set (measurable or not) into pieces that are additive with respect to outer measure. This characterization will be used in Chapter 11 to construct measures in abstract spaces.

Theorem 3.30 (Carathéodory) A set E is measurable if and only if for every set A |A|e = |A ∩ E|e + |A − E|e . Proof. Suppose that E is measurable. Given A, let H be a set of type Gδ such that A ⊂ H and |A|e = |H|. Since H = (H ∩ E) ∪ (H − E), and since H ∩ E and H − E are measurable and disjoint, |H| = |H ∩ E| + |H − E|. Therefore, |A|e = |H ∩ E| + |H − E| ≥ |A ∩ E|e + |A − E|e . Since the opposite inequality is clearly true, we obtain |A|e = |A ∩ E|e + |A − E|e . Conversely, suppose that E satisfies the stated condition for every A. In case |E|e < +∞, choose a Gδ set H such that E ⊂ H and |H| = |E|e . Then H = E ∪ (H − E), and by hypothesis, |H| = |E|e + |H − E|e . Therefore, |H − E|e = 0; so the set Z = H − E is measurable, and consequently, E is measurable. 2, . . ., In case |E|e = +∞, let Bk be the ball with center 0 and radius k, k = 1, Ek . and let Ek = E ∩ Bk . Then each Ek has finite outer measure and E = Let Hk be a set of type Gδ containing Ek with |Hk | = |Ek |e . By hypothesis, |Hk − E| = 0. |Hk | = |H k ∩ E|e + |Hk − E|e ≥ |Ek |e + |Hk − E|e . Therefore, Let H = Hk . Then H is measurable, E ⊂ H, and H − E = (Hk − E). In particular, H−E has measure zero, and since E = H−(H−E), E is measurable. This completes the proof.

54

Measure and Integral: An Introduction to Real Analysis

As a special case of Carathéodory’s theorem, we obtain the following result. Corollary 3.31 If E is a measurable subset of a set A, then |A|e = |E| + |A − E|e . Hence, if |E| < +∞, |A − E|e = |A|e − |E|. We can now prove a stronger version of Theorem 3.8. Theorem 3.32 Let E be a subset of Rn . There exists a set H of type Gδ such that E ⊂ H and for any measurable set M, |E ∩ M|e = |H ∩ M| . If M = Rn , this reduces to Theorem 3.8. Proof. Consider first the case when |E|e < +∞. Let H be a set of type Gδ such that E ⊂ H and |E|e = |H|. If M is measurable, then by Carathéodory’s theorem |E|e = |E ∩ M|e + |E − M|e and |H| = |H ∩ M| + |H − M|. Therefore, |E ∩ M|e + |E − M|e = |H ∩ M| + |H − M|. Since all these terms are finite, and since the inclusion E − M ⊂ H − M implies that |E − M|e ≤ |H − M|, we have |E ∩ M|e ≥ |H ∩ M|. The opposite inequality is also true since E ∩ M ⊂ H ∩ M, and the theorem follows in this case. Ek with |Ek |e < +∞ and Ek E. For In case |E|e = +∞, we write E = example, Ek could be the intersection of E with the ball of radius k centered at the origin. By the case already considered, there is a set Uk of type Gδ such Hk = that ∞ Ek ⊂ Uk and |Ek ∩ M|e = |Uk ∩ M| for any measurable M. Let U . Then H is measurable (in fact, H is of type G ), H H = Hk , m δ k k k m=k and Ek ⊂ Hk ⊂ Uk . Hence, Ek ∩ M ⊂ Hk ∩ M ⊂ Uk ∩ M, and therefore, |Ek ∩ M|e = |Hk ∩ M| for measurable M. Since Ek E and Hk H, we have Ek ∩ M E ∩ M and Hk ∩ M H ∩ M. By Theorem 3.27, |E ∩ M|e = |H ∩ M| for measurable M. Note that our set H is of type Gδσ . To obtain a set of type Gδ , use Theorem 3.28 to write H = H1 − Z, where H1 is of type Gδ and |Z| = 0. Then E ⊂ H1 . Moreover, H1 ∩ M = (H ∩ M) ∪ (Z ∩ M), and since |Z| = 0, we have |H1 ∩ M| = |H ∩ M|, so that |E ∩ M|e = |H1 ∩ M|. This completes the proof.

3.5 Lipschitz Transformations of Rn Theorem 3.10 shows that the notion of outer measure is independent of the orientation of the coordinate axes. Since measurability and measure are

55

Lebesgue Measure and Outer Measure

defined in terms of outer measure, it follows that these too are independent of rotation of the axes. We wish to study the effect of other transformations of Rn on the class of measurable sets; that is, we seek a condition on a transformation T: Rn → Rn such that the image TE = {y : y = Tx, x ∈ E} of every measurable set E is measurable. We note that a continuous transformation may not preserve measurability: see Exercise 17 of this chapter, as well as Chapter 7, Exercise 10. A transformation y = Tx of Rn into itself is called a Lipschitz transformation if there is a constant c such that |Tx − Tx | ≤ c|x − x |. The smallest such constant c, namely, the number c = sup

x,x ;x=x

|Tx − Tx | , |x − x |

is called the Lipschitz constant of T. If yj = fj (x), j = 1, . . . , n, are the coordinate functions representing T (see p. 12 in Section 1.7), it follows that T is Lipschitz if and only if each fj satisfies a Lipschitz condition |fj (x)−fj (x )| ≤ cj |x−x |. For example, a linear transformation of Rn is clearly a Lipschitz transformation; see Exercise 29. More generally, a mapping yj = fj (x), j = 1, . . . , n, for which each fj has bounded first partial derivatives in Rn is a Lipschitz mapping. Theorem 3.33 If y = Tx is a Lipschitz transformation of Rn , then T maps measurable sets into measurable sets. Proof. We will first show that a continuous transformation sends sets of type Fσ into sets of type Fσ . A continuous T maps compact sets into compact sets by Theorem 1.17; therefore, since any closed set can be written as a countable union of compact sets, closed sets into sets of type Fσ . Here, we T maps TEk , which holds for any T and {Ek }. It have used the relation T Ek = follows that a continuous T preserves the class of Fσ sets. We will next show that a Lipschitz transformation T sends sets of measure zero into sets of measure zero. Since |Tx − Tx | ≤ c|x − x |, the image of a set with diameter d has diameter at most cd. It follows (see Exercise 28) by covering with cubes that there is a constant c such that |TI| ≤ c |I| for any interval I; note that TI is measurable since I is closed. If |Z| = 0 and ε > 0, } covering Z such that |I | < ε. Since TZ ⊂ TIk , we choose intervals {I k k have |TZ|e ≤ |TIk | ≤ c |Ik | < c ε. Hence, |TZ| = 0. If E is a measurable set, we use Theorem 3.28 to write E = H ∪ Z, where H is of type Fσ and |Z| = 0. Since TE = TH ∪ TZ, TE is measurable as the union of measurable sets. This completes the proof.

56

Measure and Integral: An Introduction to Real Analysis

For an extension of Theorem 3.33 in case n = 1, see Exercise 10 of Chapter 7. In the special case that T is a linear transformation of Rn , we will derive a formula for the measure of TE. By elementary linear algebra, such T has a unique n × n matrix representation with respect to every given basis of Rn , and the value of the determinant of every matrix representation of T is the same. The common value is denoted det T and called the determinant of T. For the sake of definiteness, we may think of T as identified with its matrix representation with respect to the standard orthonormal basis of unit vectors along the coordinate axes. Then Tx is the vector resulting from the matrix action of T on x. If P is a parallelepiped (see p. 7 in Section 1.3), arguments like those used in proving Theorems 3.2 and 3.10 show that |P| = v(P)

(3.34)

(see Exercise 16). Hence, by p. 7 in Section 1.3, |P| is the absolute value of the n × n determinant whose rows are the edges of P. Theorem 3.35 Let T be a linear transformation of Rn , and let E be measurable. Then |TE| = δ|E|, where δ is the absolute value of the determinant of T.∗ Proof. Let δ = |det T|. Then for any interval I, |TI| = δ|I| by p. 8 in Section 1.3. n intervals {Ik } covering If E is any ε > 0, choose measurable subset of R and E with |Ik | ≤ |E| + ε. Then |TE| ≤ |TIk | = δ |Ik | ≤ δ(|E| + ε). Note here that if δ= 0, then |TIk | = 0 for every k and consequently the first inequality |TE| ≤ |TIk | yields |TE| = 0 even when |E| = ∞. Therefore, |TE| ≤ δ|E|,

(3.36)

where we interpret 0 · ∞ as 0. To see that |TE| = δ|E|, we may assume that δ > 0 by (3.36). Then T has an inverse T−1 that is also linear, and E = T−1 (TE). Therefore, by (3.36) applied to T−1 and the set TE, |E| ≤ |det (T−1 )| |TE| = δ−1 |TE|, or |TE| ≥ δ|E|. Hence, by (3.36), δ|E| = |TE|. We leave it as an exercise to show that |TE|e = δ|E|e for any set E.

∗ Here 0 · ∞ should be interpreted as 0.

Lebesgue Measure and Outer Measure

57

The following special case of Theorem 3.35 deserves mention: if E is a measurable set in Rn and λ is a real number, then the set λE defined by λE = {λx : x ∈ E} is measurable with measure |λE| = |λ|n |E|. In fact, λE is the image of E under the matrix λI, where I is the identity matrix on Rn .

3.6 A Nonmeasurable Set We will now construct a nonmeasurable subset of R1 ; the construction in Rn , n > 1, is similar. We will need the axiom of choice in the following form. Zermelo’s Axiom: Consider a family of arbitrary nonempty disjoint sets indexed by a set A, {Eα : α ∈ A}. Then there exists a set consisting of exactly one element from each Eα , α ∈ A. We also need the following lemma. Lemma 3.37 Let E be a measurable subset of R1 with |E| > 0. Then the set of differences {d : d = x − y, x ∈ E, y ∈ E} contains an interval centered at the origin. Proof. Given ε > 0, since |E| > 0, there exists an open set G such that E ⊂ G and |G| ≤ (1 + ε)|E|. By Theorem 1.11, G can be written as a union of . Letting Ek = E ∩ Ik , it follows that E = nonoverlapping intervals, G = I k that two different Ek , that the Ek are measurable, and Ek ’s have at most one |Ek |. Since |G| ≤ (1 + point in common. Therefore, |G| = |Ik | and |E| = ε)|E|, we must have |Ik0 | ≤ (1+ε)|Ek0 | for some k0 . Choosing ε = 13 and letting I and E denote the sets Ik0 and Ek0 , respectively, we have E ⊂ I and |E | ≥ 34 |I|. We claim that if E is translated by any number d satisfying |d| < 12 |I|, the translated set Ed has points in common with E . Otherwise, since E ∪ Ed is contained in an interval of length |I| + |d|, we would have |E | + |Ed | ≤ |I| + |d|, or 2|E | ≤ |I| + |d|. [Here, we have used the fact that Ed is measurable and |Ed | = |E | (see Exercise 18).] However, the last inequality is false if |d| < 12 |I| since |E | ≥ 34 |I|. This proves the claim and thus the lemma. See Exercise 30 in Chapter 7 for a related fact.

Theorem 3.38 (Vitali) There exist nonmeasurable sets. Proof. We define an equivalence relation on the real line by saying that x and y are equivalent if x−y is rational. The equivalence classes then have the form Ex = {x+r : r is rational}. Two classes Ex and Ey are either identical or disjoint;

58

Measure and Integral: An Introduction to Real Analysis

therefore, one equivalence class consists of all the rational numbers, and the other distinct classes are disjoint sets of irrational numbers. The number of distinct equivalence classes is uncountable since each class is countable but the union of all the classes is uncountable (this union being the entire line). Using Zermelo’s axiom, let E be a set consisting of exactly one element from each distinct equivalence class. Since any two points of E must differ by an irrational number, the set {d : d = x − y, x ∈ E, y ∈ E} cannot contain an interval. By Lemma 3.37, it follows that either E is not measurable or |E| = 0. Since the union of the translates of E by every rational number is all of R1 , R1 would have measure zero if E did. We conclude that E is not measurable.

Corollary 3.39 measurable set.

Any set in R1 with positive outer measure contains a non-

Proof. Let A satisfy |A|e > 0, and let E be the nonmeasurable set of Theoof E by r. Thenthe Er are rem 3.38. Forrational r, let Er denote the translate disjoint and Er = (−∞, +∞). Thus, A = (A ∩ Er ) and |A|e ≤ |A ∩ Er |e . If A ∩ Er is measurable, then |A ∩ Er | = 0 by Lemma 3.37, using the fact that for every r, {x − y : x, y ∈ A ∩ Er } is a subset of {x − y : x, y ∈ E} and so contains no interval. Since |A|e > 0, it follows that there is some r such that A ∩ Er is not measurable. This completes the proof.

Exercises 1. There is an analogue for bases different from 10 of the usual decimal expansion of a number. If b is an integer larger than 1 and 0 ≤ x ≤ 1, show that there exist integral coefficients ck , 0 ≤ ck < b, such that x = ∞ −k −k k=1 ck b . Show that this expansion is unique unless x = cb , c = k 1, . . . , b − 1, in which case there are two expansions. 2. When b = 3 in Exercise 1, the expansion is called the triadic or ternary expansion of x. (a) Show that the Cantor set C consists of all x such that x has some triadic expansion for which every ck is either 0 or 2. (b) Let f (x) be the Cantor–Lebesgue see p. 43 in Section 3.1. function: Show that if x ∈ C and x = ck 3−k , where each ck is either 0 or 2, then f (x) = ( 12 ck )2−k . 3. Construct a two-dimensional Cantor set in the unit square {(x, y) : 0 ≤ x, y ≤ 1} as follows. Subdivide the square into nine equal parts and keep only the four closed corner squares, removing the remaining region (which forms a cross). Then repeat this process in a suitably scaled

Lebesgue Measure and Outer Measure

59

version for the remaining squares, ad infinitum. Show that the resulting set is perfect, has plane measure zero, and equals C × C. 4. Construct a subset of [0, 1] in the same manner as the Cantor set by removing from each remaining interval a subinterval of relative length θ, 0 < θ < 1. Show that the resulting set is perfect and has measure zero. 5. Construct a subset of [0, 1] in the same manner as the Cantor set, except that at the kth stage, each interval removed has length δ3−k , 0 < δ < 1. Show that the resulting set is perfect, has measure 1 − δ, and contains no intervals. 6. Construct a Cantor-type subset of [0, 1] by removing from each interval < θk < 1. remaining at the kth stage a subinterval of relative length θk , 0 Show that the remainder has measure zero if and only if θk = ∞ > 0, a converges, in the sense that +∞. (Use the fact that for a k k=1 k ∞ a is finite and not zero, if and only if log a conlimN→∞ N k k=1 k=1 k verges.) 7. Prove Lemma 3.15. 8. Show that the Borel σ-algebra B in Rn is the smallest σ-algebra containing the closed sets in Rn . 9. If {Ek }∞ |Ek |e < +∞, show that lim sup Ek k=1 is a sequence of sets with (and so also lim inf Ek ) has measure zero. 10. If E1 and E2 are measurable, show that |E1 ∪ E2 | + |E1 ∩ E2 | = |E1 | + |E2 |. 11. Prove Theorem 3.29. (For the sufficiency, pick open sets G and G1 with S ⊂ G, N1 ⊂ G1 , |G − S| < ε, and |G1 | < |N1 |e + ε < 2ε. Estimate |(G ∪ G1 ) − E|e .) 12. If E1 and E2 are measurable subsets of R1 , show that E1 × E2 is a measurable subset of R2 and |E1 × E2 | = |E1 ||E2 |. (Interpret 0 · ∞ as 0.) (Hint: Use a characterization of measurability.) 13. Motivated by (3.7), define the inner measure of E by |E|i = sup |F|, where the supremum is taken over all closed subsets F of E. Show that (i) |E|i ≤ |E|e , and (ii) if |E|e < +∞, then E is measurable if and only if |E|i = |E|e . (Use Lemma 3.22.) 14. Show that the conclusion of part (ii) of Exercise 13 is false if |E|e = +∞. 15. If E is measurable and A is any subset of E, show that |E| = |A|i + |E − A|e . (See Exercise 13 for the definition of |A|i .) As a consequence, using Exercise 13, show that if A ⊂ [0, 1] and |A|e + |[0, 1] − A|e = 1, then A is measurable. 16. Prove (3.34). 17. Give an example that shows that the image of a measurable set under a continuous transformation may not be measurable. (Consider the Cantor–Lebesgue function and the pre-image of an appropriate nonmeasurable subset of its range.) See also Exercise 10 of Chapter 7.

60

Measure and Integral: An Introduction to Real Analysis

18. Prove that outer measure is translation invariant; that is, if Eh = {x + h : x ∈ E} is the translate of E by h, h ∈ Rn , show that |Eh |e = |E|e . If E is measurable, show that Eh is also measurable. (This fact was used in proving Lemma 3.37.) 19. Carry out the details of the construction of a nonmeasurable subset of Rn , n > 1. 20. Show that there exist disjoint E1 , E2 , . . . , Ek , . . . such that | Ek |e < |Ek |e with strict inequality. (Let E be a nonmeasurable subset of [0, 1] whose rational translates are disjoint. Consider the translates of E by all rational numbers r, 0 < r < 1, and use Exercise 18.) 21. Show that there exist sets E1 , E2 , . . . , Ek , . . . such that Ek E, |Ek |e < +∞, and limk→∞ |Ek |e > |E|e with strict inequality. 22. (a) Show that the outer measure of a set is unchanged if in the definition of outer measure we use coverings of the set by cubes with edges parallel to the coordinate axes instead of coverings by intervals. (b) Show that outer measure is also unchanged if coverings by parallelepipeds with a fixed orientation (i.e., with edges parallel to a fixed set of n linearly independent vectors) are used rather than coverings by intervals. 23. Let Z be a subset of R1 with measure zero. Show that the set {x2 : x ∈ Z) also has measure zero. 24. Let 0.α1 α2 . . . be the dyadic development of any x in [0, 1], that is, x = α1 2−1 + α2 2−2 + · · · with αi = 0, 1. Let k1 , k2 , . . . be a fixed permutation of the positive integers 1, 2, . . . , and consider the transformation T that sends x = 0.α1 α2 · · · to Tx = 0.αk1 αk2 . . .. If E is a measurable subset of [0, 1], show that its image TE is also measurable and that |TE| = |E|. (Consider first the special cases of E a dyadic interval [s2−k , (s + 1)2−k ], s = 0, 1, . . . , 2k −1, and then of E an open set [which is a countable union of nonoverlapping dyadic intervals]. Also show that if E has small measure, then so has TE.) 25. Construct a measurable subset E of [0, 1] such that for every subinterval I, both E ∩ I and I − E have positive measure. (Take a Cantor-type subset of [0, 1] with positive measure [see Exercise 5], and on each subinterval of the complement of this set, construct another such set, and so on. The measures can be arranged so that the union of all the sets has the desired property.) See also Exercise 21 of Chapter 4. 26. Construct a continuous function on [0, 1], which is not of bounded variation on any subinterval. (The construction follows the pattern of the Cantor–Lebesgue function with some modifications. At the first stage, for example, make the corresponding function increase to 2/3 (rather than 1/2) in (0, 1/3), then make it decrease by 1/3 in (1/3, 2/3), and then increase again 2/3 in (2/3, 1). The construction at other stages is

Lebesgue Measure and Outer Measure

61

similar, depending on whether the preceding function was increasing or decreasing in the subinterval under consideration.) 27. Construct a continuous function of bounded variation on [0, 1] which is not monotone in any subinterval. (The construction is like that in the preceding exercise, except that the approximating functions are less steep. For example, at the first stage, let the function increase to 1/2 + ε, then decrease by 2ε, and then increase again by 1/2 + ε. Choose the ε’s at each stage so that their sum converges.) 28. Prove the following assertion that is made in the proof of Theorem 3.33: If T : Rn → Rn is a Lipschitz transformation, then there is a constant c > 0 such that |TI| ≤ c |I| for every interval I. (Consider first the case when I is a cube Q, noting that TQ is contained in a cube with edge length c diam Q where c is the Lipschitz constant of T. The case of general I can then be deduced by applying Theorem 1.11 to the interiors J◦ of intervals J with I ⊂ J◦ .) 29. Let T : Rn → Rn be a linear transformation. 2 1/2 , show that (a) If T has matrix representation (tij ) and t = i,j tij n |Tx − Ty| ≤ t|x − y| for all x, y ∈ R . (Use (1.2).) The number t is called the Hilbert–Schmidt norm of (tij ). (b) Prove the fact mentioned at the end of the proof of Theorem 3.35 that |TE|e = δ|E|e for every E ⊂ Rn , where δ = | det T|. 30. Let f : Rn → R1 be continuous. Show that the inverse image f −1 (B) of a Borel set B is a Borel set; see p. 64 in Section 4.1 for the definition of the inverse image of a set. (The collection of sets {E : f −1 (E) is a Borel set} is a σ-algebra and contains all open sets in R1 ; cf. Exercise 10 of Chapter 4 and Corollary 4.15.) See also Exercise 22 of Chapter 4. 31. Construct a Lebesgue measurable set that is not Borel measurable. (If f is the Cantor–Lebesgue function, then the function g(x) = x + f (x) is strictly increasing and continuous on [0, 1]. Consider the set g−1 (E) for an appropriate E ⊂ g(C), where C is the Cantor set.) 32. Let E be a set in Rn with |E|e > 0 and let θ satisfy 0 < θ < 1. Show that there is a set Eθ ⊂ E with |Eθ |e = θ|E|e and that Eθ can be chosen to be measurable if E is measurable. (If Q(r) denotes the cube with edge length r centered at the origin, 0 < r < ∞, then |E ∩ Q(r)|e is a continuous monotone function of r.) 33. Let E be a measurable set with 0 < |E| ≤ ∞. Show that there are infinite collections {Aj }, {Bj } of measurable subsets of E with the following properties: 0 < |Aj |, |Bj | < ∞, Aj ∩ Al = ∅ if j = l, Bj+1 ⊂ Bj , and |Bj | → 0. See the end of Section 8.5 for an application. (This can be proved in many ways, for example, by using the Exercise 32.) 34. Let E and Z be sets in Rn and |Z| = 0. If E ∪ Z is measurable, show that E is measurable and that |E| = |E ∪ Z|. See Exercise 2 of Chapter 10.

4 Lebesgue Measurable Functions We will use the concept of Lebesgue measure to introduce a rich class of functions and a method of integrating these functions. In this chapter, we describe the class of functions. Let E be a measurable set in Rn . Let f be a real-valued (in the usual extended sense) function defined on E, that is, −∞ ≤ f (x) ≤ +∞, x ∈ E. Then, f is called a Lebesgue measurable function on E, or simply a measurable function, if for every finite a, the set {x ∈ E : f (x) > a} is a measurable subset of Rn . In what follows, we shall often use the abbreviation {f > a} for {x ∈ E : f (x) > a}. Note that the definition of measurability of a function on a set E presupposes that E is measurable. Since E = { f = −∞} ∪

∞

{ f > −k} ,

k=1

the measurability of E implies that of { f = − ∞} if we assume that f is measurable. As a varies, the behavior of the sets { f > a} describes how the values of f are distributed. Intuitively, it is clear that the smoother f is, the smaller the variety of such sets will be. For example, if E = Rn and f is continuous in Rn , then { f > a} is always open. A function f defined on a Borel set E is said to be Borel measurable if { f > a} is a Borel set for every a. Thus, every Borel measurable function is measurable. See also Exercise 24. One further comment will be helpful later. Let M denote the class of measurable subsets of Rn . Much of the development of measurable functions given in this chapter depends only on the σ-algebra structure of M and the properties of Lebesgue measure. Thus, a measurable function inherits its elementary properties from those of measurable sets. It is reasonable to expect, therefore, that many of the methods of this chapter can be used to develop results in more general settings, for spaces other than Rn and σ-algebras other than M . This will be done in Chapter 10. To save too much repetition there, it will be helpful if the reader notices which properties of M and Lebesgue measure are used in the proofs here. These will be discussed at the end of the chapter. 63

64

Measure and Integral: An Introduction to Real Analysis

4.1 Elementary Properties of Measurable Functions Theorem 4.1 Let f be a real-valued function defined on a measurable set E. Then f is measurable if and only if any of the following statements holds for every finite a: (i) { f ≥ a} is measurable. (ii) { f < a} is measurable. (iii) { f ≤ a} is measurable. Proof. Since { f ≥ a} = ∞ k=1 { f > a − 1/k}, the measurability of f implies (i). Since { f a} is the complement of { f ≤ a}, it follows that f is measurable if (iii) holds. The proof of the following is left as an exercise.

Corollary 4.2 Let f be defined on a measurable set E. If f is measurable, then { f > −∞}, { f < +∞}, { f = +∞}, {a ≤ f ≤ b}, { f = a}, etc., are all measurable. Moreover, for any f, if either { f = +∞} or { f = −∞} is measurable, then f is measurable if {a < f < +∞} is measurable for every finite a. Also, observe that {a < f ≤ b} = { f > a} − { f > b}. Let f be defined in E. If S is a subset of R1 , the inverse image of S under f is defined by f −1 (S) = {x ∈ E : f (x) ∈ S}.

Theorem 4.3 Let f be defined on a measurable set E. If f is measurable, then for every open set G in R1 , the inverse image f −l (G) is a measurable subset of Rn . Conversely, f is measurable if f −1 (G) is measurable for every open set G in Rn and either { f = +∞} or { f = −∞} is measurable. 1 Proof. Suppose that f is measurable and let G be any open−1subset of R . By Theorem 1.10, G can be written as G = k (ak , bk ). But f ((a k , bk )) equals {ak < f < bk } and is therefore measurable. Since f −1 (G) = k f −1 ((ak , bk )), it follows that f −1 (G) is measurable too. To prove the converse, note that if G = (a, +∞), then f −1 (G) = {a < f < +∞} and apply the second part of Corollary 4.2.

Lebesgue Measurable Functions

65

This result shows that a finite function f defined on a measurable set is measurable if and only if f −1 (G) is measurable for every open G ⊂ R1 . Similarly, a finite f defined on a Borel set is Borel measurable if and only if f −1 (G) is Borel measurable for every open G ⊂ R1 . Theorem 4.4 Let A be a dense subset of R1 . Then f is measurable if { f > a} is measurable for all a ∈ A. Proof. Given any real a, choose a sequence {ak } in A that converges to a from above: ak ∈ A, ak ≥ a, limk→∞ ak = a. Then { f > a} = { f > ak }, and the theorem follows. A property is said to hold almost everywhere in E or, in abbreviated form, a.e., if it holds in E except in some subset of E with measure zero. For example, the statement “f = 0 a.e. in E” means that f (x) = 0 in E, with the possible exception of those x in some subset Z of E with |Z| = 0. The next few theorems give some simple properties of the class of measurable functions. Theorem 4.5 If f is measurable and if g = f a.e., then g is measurable and |{g > a}| = |{ f > a}|. Proof. If Z = { f = g}, then |Z| = 0 and {g > a} ∪ Z = { f > a} ∪ Z. Therefore, f being measurable, {g > a}∪Z is measurable, and since this differs from {g > a} by a set of measure zero, g is measurable (cf. Exercise 34 of Chapter 3 and Exercise 2 of Chapter 10). Finally, |{g > a}| = |{g > a} ∪ Z| = |{ f > a} ∪ Z| = |{ f > a}|. In view of the previous theorem, it is natural to extend the definition of measurability to include functions that are defined only a.e. in E, by saying that such an f is measurable on E if it is measurable on the subset of E where it is defined. Note also that if f is measurable on E, then it is measurable on any measurable E1 ⊂ E since {x ∈ E1 : f (x) > a} = {x ∈ E : f (x) > a} ∩ E1 . If φ and f are finite measurable functions defined on R1 and Rn , respectively, their composition φ(f (x)) may not be measurable (see Exercise 5). If φ is continuous, however, we have the following result. Theorem 4.6 Let φ be continuous on R1 and let f be finite a.e. in E, so that, in particular, φ(f ) is defined a.e. in E. Then φ(f ) is measurable if f is.

66

Measure and Integral: An Introduction to Real Analysis

Proof. We may assume that f is finite everywhere in E. We will use the familiar fact that since φ is continuous, the inverse image φ−1 (G) of an open set G is open (cf. Corollary 4.15 and Exercise 10). By Theorem 4.3, it is enough to show that for every open G in R1 , {x : φ(f (x)) ∈ G} is measurable. However, {x : φ(f (x)) ∈ G} = f −1 (φ−1 (G)), and since φ−1 (G) is open and f is measurable, f −1 (φ−1 (G)) is measurable by Theorem 4.3. See also Exercise 22(b). Remark: The cases that arise most frequently are φ(t) = |t|, |t|p (p > 0), ect , etc. Thus, |f |, |f |p (p > 0), ecf are measurable if f is measurable (even if we do not assume that f is finite a.e., as is easily seen). Another special case worth mentioning is that of f + = max{ f , 0},

f − = −min{ f , 0}.

It is enough to observe that the functions x+ and x− are continuous.

Theorem 4.7

If f and g are measurable, then { f > g} is measurable.

Proof. Let {rk } be the rational numbers. Then, { f > g} =

k

{ f > rk > g} =

({f > rk } ∩ {g < rk }),

k

and the theorem follows.

Theorem 4.8 measurable.

If f is measurable and λ is any real number, then f + λ and λf are

The proof is left as an exercise. We interpret 0 · ±∞ to be 0. Note that the sum f + g of two functions f and g is well defined wherever it is not of the form +∞ + (−∞) or −∞ + ∞. In the next theorem, we assume for simplicity that f + g is well defined everywhere. See Exercise 6 for extensions.

Theorem 4.9

If f and g are measurable, so is f + g.

67

Lebesgue Measurable Functions

Proof. Since g is measurable, so is a − g for any real a, by Theorem 4.8. Since { f + g > a} = { f > a − g}, the result follows from Theorem 4.7. A corollary of Theorems 4.8 and 4.9 is that a finite linear combination λ1 f1 + · · · + λN fN of measurable functions f1 , . . . , fN is measurable provided it is well defined. Thus, the class of measurable functions on a set E that are finite a.e. in E forms a vector space; here, we identify measurable functions that are equal a.e. In the theorem that follows, we consider products of functions. In addition to the familiar conventions about products of infinities, we adopt as usual the convention that 0 · ±∞ = ±∞ · 0 = 0. Also, if −∞ ≤ α ≤ +∞, we interpret α/(±∞) = α · 0 = 0.

Theorem 4.10 measurable.

If f and g are measurable, so is fg. If g = 0 a.e., then f /g is

Proof. By Theorem 4.6 and the remark following it, f 2 (= |f |2 ) is measurable if f is. Hence, if f and g are measurable and finite, the formula fg = [(f + g)2 − (f −g)2 ]/4 implies that fg is measurable. The proof when f and g can be infinite and the proof of the second statement of the theorem are left as exercises. Theorem 4.11 If { fk (x)}∞ k=1 is a sequence of measurable functions, then supk fk (x) and inf k fk (x) are measurable. Proof. Since inf k fk = −supk (−fk ), it is enough to prove the result for supk fk . But this follows easily from the fact that {supk fk > a} = k { fk > a}. As a special case of the preceding theorem, we see that if f1 , . . . , fN are measurable, then so are maxk fk and mink fk . In particular, if f is measurable, then so are f + = max{ f , 0} and f − = −min{ f , 0}, a fact we have already observed in the remark following Theorem 4.6.

Theorem 4.12

If { fk } is a sequence of measurable functions, then lim sup fk and k→∞

lim inf fk are measurable. In particular, if lim fk exists a.e., it is measurable. k→∞

k→∞

Proof. Since lim sup fk = inf {sup fk }, k→∞

j

k≥j

lim inf fk = sup{inf fk }, k→∞

j

k≥j

68

Measure and Integral: An Introduction to Real Analysis

the first statement is a corollary of Theorem 4.11. The second statement then follows by Theorem 4.5 since wherever limk→∞ fk exists, it equals lim supk→∞ fk . The characteristic function, or indicator function, χA (x), of a set A is defined by χA (x) =

1 if x ∈ A 0 if x ∈ / A.

Clearly, χA is measurable if and only if A is measurable. χA is an example of what is called a simple function on Rn : a simple function on a set E ⊂ Rn is one that is defined on E and assumes only a finite number of values on E, all of which are finite. If f is a simple function on E taking (distinct) values a1 , . . . , aN on (disjoint) subsets E1 , . . . , EN of E, E = N k=1 Ek , then f (x) =

N

ak χEk (x),

x ∈ E.

k=1

We leave it as an exercise to show that such an f is measurable if and only if E1 , . . . , EN are measurable. Theorem 4.13 (i) Every function f can be written as the limit of a sequence { fk } of simple functions. (ii) If f ≥ 0, the sequence can be chosen to increase to f , that is, chosen such that fk ≤ fk+1 for every k. (iii) If the function f in either (i) or (ii) is measurable, then the fk can be chosen to be measurable. Proof. We will prove (ii) first. Thus, suppose that f ≥ 0. For each k, k = 1, 2, . . ., subdivide the values of f that fall in [0, k] by partitioning [0, k] into subintervals [(j − 1)2−k , j2−k ], j = 1, . . . , k2k . Let ⎧ j−1 j ⎨j − 1 if ≤ f (x) < k , j = 1, . . . , k2k fk (x) = 2k 2k 2 ⎩k if f (x) ≥ k. Each fk is a simple function defined everywhere in the domain of f . Clearly, fk ≤ fk+1 since in passing from fk to fk+1 , each subinterval [(j − 1)2−k , j2−k ] is divided in half. Moreover, fk → f since 0 ≤ f − fk ≤ 2−k for sufficiently large k wherever f is finite, and fk = k → +∞ wherever f = +∞. This proves (ii).

69

Lebesgue Measurable Functions

To prove (i), apply the result of (ii) to each of the nonnegative functions f + and f − , obtaining increasing sequences { fk } and { fk } of simple functions such that fk → f + and fk → f − . Then fk − fk is simple and fk − fk → f + − f − = f . Finally, it is enough to prove (iii) for f ≥ 0 since otherwise we may consider f + and f − . In this case, however, k

fk =

k2 j−1 j=1

2k

χ {( j−1)/2k ≤ f < j/2k } + kχ { f ≥ k} .

This is measurable if f is since all the sets involved are measurable. Note that if f is bounded, the simple functions earlier will converge uniformly to f .

4.2 Semicontinuous Functions We now study classes of functions f whose continuity properties on a set can be characterized by the topological nature of { f > a} or { f < a}. Measurability of such functions will consequently be easy to establish. We will encounter a particularly important example when we study the Hardy–Littlewood maximal function in Chapter 7. Let f be defined on E, and let x0 be a limit point of E that lies in E. Then f is said to be upper semicontinuous at x0 if lim sup f (x) ≤ f (x0 ).

x→x0 ;x∈E

We will usually abbreviate this by saying that f is usc at x0 . Note that if f (x0 ) = +∞, then f is automatically usc at x0 ; otherwise, the statement that f is usc at x0 means that given M > f (x0 ), there exists δ > 0 such that f (x) < M for all x ∈ E that lie in the ball |x − x0 | < δ. Intuitively, this means that near x0 , the values of f do not exceed f (x0 ) by a fixed amount. Similarly, f is said to be lower semicontinuous at x0 , or lsc at x0 , if lim inf f (x) ≥ f (x0 ).

x→x0 ;x∈E

Thus, if f (x0 ) = −∞, f is lsc at x0 , while if f (x0 ) > −∞, the definition amounts to saying that given m < f (x0 ), there exists δ > 0 such that f (x) > m if x ∈ E and |x − x0 | < δ. Equivalently, f is lsc at x0 if and only if −f is usc at x0 . It follows that f is continuous at x0 if and only if |f (x0 )| < +∞ and f is both usc and lsc at x0 . As simple examples of functions that are usc everywhere in R1 but not continuous at some x0 , we have

70

Measure and Integral: An Introduction to Real Analysis

0 if x < x0 u1 (x) = 1 if x ≥ x0 ,

u2 (x) =

0 1

if x = x0 if x = x0 .

Hence, −u1 and −u2 are lsc everywhere in R1 . The Dirichlet function of Example 4 in Chapter 2 is usc at the rational numbers and lsc at the irrationals. A function defined on E is called usc (lsc, continuous) relative to E if it is usc (lsc, continuous) at every limit point of E that is in E. The next theorem characterizes functions that are semicontinuous relative to a set.

Theorem 4.14 (i) A function f is usc relative to E if and only if {x ∈ E : f (x) ≥ a} is relatively closed [equivalently, {x ∈ E : f (x) < a} is relatively open] for all finite a. (ii) A function f is lsc relative to E if and only if {x ∈ E : f (x) ≤ a} is relatively closed [equivalently, {x ∈ E : f (x) > a} is relatively open] for all finite a. Proof. Statements (i) and (ii) are equivalent since f is usc if and only if −f is lsc. It is therefore enough to prove (i). Suppose first that f is usc relative to E. Given a, let x0 be a limit point of {x ∈ E : f (x) ≥ a} that is in E. Then there exist xk ∈ E such that xk → x0 and f (xk ) ≥ a. Since f is usc at x0 , we have f (x0 ) ≥ lim supk→∞ f (xk ). Therefore, f (x0 ) ≥ a, so that x0 ∈ {x ∈ E : f (x) ≥ a}. This shows that {x ∈ E : f (x) ≥ a} is relatively closed. Conversely, let x0 be a limit point of E that is in E. If f is not usc at x0 , then f (x0 ) < +∞, and there exist M and {xk } such that f (x0 ) < M, xk ∈ E, xk → x0 , and f (xk ) ≥ M. Hence, {x ∈ E : f (x) ≥ M} is not relatively closed since it does not contain all its limit points that are in E.

Corollary 4.15 A finite function f is continuous relative to E if and only if all sets of the form {x ∈ E : f (x) ≥ a} and {x ∈ E : f (x) ≤ a} are relatively closed [or, equivalently, all {x ∈ E : f (x) > a} and {x ∈ E : f (x) < a} are relatively open] for finite a.

Corollary 4.16 Let E be measurable, and let f be defined on E. If f is usc (lsc, continuous) relative to E, then f is measurable. Proof. Let f be usc relative to E. Since {x ∈ E : f (x) ≥ a} is relatively closed, it is the intersection of E with a closed set. Hence, it is measurable, and the result follows from Theorem 4.1. The previous three results deserve special attention in certain cases. Suppose, for example, that E = Rn and f is usc everywhere in Rn . ∞ Since { f > a} = k=1 { f ≥ a + 1/k}, it follows from Theorem 4.14 that { f > a}

Lebesgue Measurable Functions

71

is of type Fσ . Since an Fσ set is a Borel set, we see that a function that is usc (similarly, lsc or continuous) at every point of Rn is Borel measurable.

4.3 Properties of Measurable Functions and Theorems of Egorov and Lusin Our next theorem states in effect that if a sequence of measurable functions converges at each point of a set E, then, with the exception of a subset of E with arbitrarily small measure, the sequence actually converges uniformly. This remarkable result cannot hold, at least in the form just stated, without some further restrictions. For example, if E = Rn and fk = χ{x:|x| 0, there is a closed subset F of E such that |E − F| < ε and { fk } converge uniformly to f on F. In order to prove this, we need a preliminary result that is interesting in its own right.

Lemma 4.18 Under the same hypothesis as in Egorov’s theorem, given ε, η > 0, there is a closed subset F of E and an integer K such that |E − F| < η and |f (x) − fk (x)| < ε for x ∈ F and k > K. Proof. Fix ε, η > 0. For each m, let Em = {|f −fk | < ε for all k > m}. Thus, Em = {|f − fk | < ε}, so that Em is measurable. Clearly, Em ⊂ Em+1 . Moreover, k>m since fk → f a.e. in E and f is finite, Em E − Z, |Z| = 0. Hence, by Theorem 3.26, |Em | → |E − Z| = |E|. Since |E| < +∞, it follows that |E − Em | → 0. Choose m0 so that |E − Em0 | < 12 η, and let F be a closed subset of Em0 with |Em0 − F| < 12 η. Then |E − F| < η, and |f − fk | < ε in F if k > m0 . Proof of Egorov’s theorem. Given ε > 0, use Lemma 4.18 to select closed Fm ⊂ E, m ≥ 1, and integers Km,ε such that |E − Fm | < ε2−m and |f − fk | < 1/m

72

Measure and Integral: An Introduction to Real Analysis

in Fm if k > Km,ε . The set F = m Fm is closed, and since F ⊂Fm for all m, fk converges uniformly to f on F. Finally, E − F = E − Fm = (E − Fm ) and, therefore, |E − F| ≤ |E − Fm | < ε. This completes the proof. See Exercises 13 and 14 for an analogue of Egorov’s theorem in the continuous parameter case, that is, in the case when fy (x) → f (x) as y → y0 . We have observed that a continuous function is measurable. Our next result, Lusin’s theorem, gives a continuity property that characterizes measurable functions. In order to state the result, we first make the following definition. A function f defined on a measurable set E has property C on E if given ε > 0, there is a closed set F ⊂ E such that (i) |E − F| < ε (ii) f is continuous relative to F We recall that condition (ii) means that if x0 and {xk } belong to F and xk → x0 , then f (x0 ) is finite and f (xk ) → f (x0 ). In case F is bounded (and, therefore, compact), (ii) implies that the restriction of f to F is uniformly continuous (Theorem 1.15).

Lemma 4.19

A simple measurable function has property C .

Proof. Suppose that f is a simple measurable function on E, taking distinct values a1 , . . . , aN on measurable subsets E1 , . . . , EN . Given ε > 0, choose F closed Fj ⊂ Ej with |Ej − Fj | < ε/N. Then the set F = N j is closed, and j=1 since E − F = Ej − Fj ⊂ (Ej − Fj ), we have |E − F| ≤ |Ej − Fj | < ε. It remains only to show that f is continuous on F. Note that each Fj is rela

tively open in F, in fact, Fj = F ∩ C k =j Fk , so the only points of F in a small neighborhood of any point of Fj are points of Fj itself. The continuity on F of f follows from this since f is constant on each Fj . Property C is actually equivalent to measurability, as we now show.

Theorem 4.20 (Lusin’s Theorem) Let f be defined and finite on a measurable set E. Then f is measurable if and only if it has property C on E. Proof. If f is measurable, then by Theorem 4.13, there exist simple measurable fk , k = 1, 2, . . ., which converge to f . By Lemma 4.19, each fk has property C , so given ε > 0, there exist closed Fk ⊂ E such that |E − Fk | < ε2−k−1 and fk is continuous relative to Fk . Assuming for the moment that |E| < +∞, we see by Egorov’s theorem that there is a closed F0 ⊂ E with |E − F0 | < 12 ε such that

73

Lebesgue Measurable Functions

{ fk } converges uniformly to f on F0 . If F = F0 ∩ ( ∞ k=1 Fk ), then F is closed, each fk is continuous relative to F, and { fk } converges uniformly to f on F. Hence, f is continuous relative to F by Theorem 1.16. Since |E − F| ≤ |E − F0 | +

∞

|E − Fk |

a} = {x ∈ H : f (x) > a} ∪ {x ∈ Z : f (x) > a} =

∞

{x ∈ Fk : f (x) > a} ∪ {x ∈ Z : f (x) > a}.

k=1

Since {x ∈ Z : f (x) > a} has measure zero, the measurability of f will follow from that of each {x ∈ Fk : f (x) > a}. However, since f is continuous relative to Fk and Fk is measurable, {x ∈ Fk : f (x) > a} is measurable by Corollary 4.16. This completes the proof of Lusin’s theorem.

4.4 Convergence in Measure Let f and { fk } be measurable functions that are defined and finite a.e. in a set E. Then { fk } is said to converge in measure on E to f if for every ε > 0, lim |{x ∈ E : |f (x) − fk (x)| > ε}| = 0.

k→∞

We will indicate convergence in measure by writing m

fk −→ f .

74

Measure and Integral: An Introduction to Real Analysis

This concept has many useful applications in analysis. Here, we will discuss its relation to ordinary pointwise convergence; the first result is basically a reformulation of Lemma 4.18. Theorem 4.21 Let f and fk , k = 1, 2, . . ., be measurable and finite a.e. in E. If m fk → f a.e. on E and |E| < +∞, then fk −→ f on E. Proof. Given ε, η > 0, let F and K be as defined in Lemma 4.18. Then if k > K, {x ∈ E : |f (x) − fk (x)| > ε} ⊂ E − F, and since |E − F| < η, the result follows. We recall that this conclusion may not hold if |E| = +∞, as shown by the example E = Rn , fk = χ{x:|x| 1 < 1 j 2j for k ≥ kj . We may assume that kj . Let Ej = {|f − fkj | > 1/j}, and Hm = ∞ ∞ −j −j −m+1 and |f − f | ≤ 1/j in kj j=m Ej . Then |Ej | < 2 , |Hm | ≤ j=m 2 = 2 E − Ej . Thus, if j ≥ m, |f − fkj | ≤ 1/j in E − Hm , so that fkj → f in E − Hm . Then fkj → f in (E − Hm ) = E − Hm . Since |Hm | → 0, it follows that | Hm | = 0 and fkj → f a.e. in E. This completes the proof. The next theorem gives a Cauchy criterion for convergence in measure.

75

Lebesgue Measurable Functions

Theorem 4.23 A necessary and sufficient condition that { fk } converge in measure on E is that for each ε > 0, lim |{x ∈ E : |fk (x) − fl (x)| > ε}| = 0.

k,l→∞

Proof. The necessity follows from the formula 1 1 | fk − f l | > ε ⊂ | f k − f | > ε ∪ | f l − f | > ε 2 2 and the fact that the measures of the sets on the right tend to zero as k, l → ∞ m if fk −→ f . To prove the converse, choose Nj , j = 1, 2, . . . , so that if k, l ≥ Nj , then |{|fk − fl | > 2−j }| < 2−j . We may assume that Nj . Then |fNj+1 − fNj | ≤ 2−j except for a set Ej , |Ej | < 2−j . Let Hi = ∞ j=i Ej , i = 1, 2, . . .. Then |fNj+1 (x) − fNj (x)| ≤ 2−j for j ≥ i and x ∈ / Hi . It follows that (fNj+1 − fNj ) converges uniformly outside Hi for every i and, therefore, that { fNj } converges uniformly outside every Hi . Since |Hi | ≤ −j = 2−i+1 , we obtain that { f } converges a.e. in E and, letting f = Nj j≥i 2 m

m

lim fNj , that fNj −→ f on E. In order to show that fk −→ f on E, note that

1 1 | fk − f | > ε ⊂ | f k − f N j | > ε ∪ | f N j − f | > ε 2 2

for any Nj . To show that the measure of the set on the left is less than a prescribed η > 0 for all sufficiently large k, select Nj so that the first term on the right has measure less than 12 η for all large k (here, we use the Cauchy condition) and so that the measure of the second term on the right is also less than 1 2 η. This completes the proof. As pointed out at the beginning of the chapter, many of the results of the chapter depend on only a few basic properties of Lebesgue measurable sets and Lebesgue measure. This is especially true for the elementary properties of measurable functions (Corollary 4.2, Theorems 4.1 and 4.3 through 4.12) and the section about convergence in measure, which use only the fact that the class of measurable subsets of Rn is a σ-algebra and, in Theorem 4.5, that subsets of a set of measure zero are measurable. Egorov’s theorem uses two additional facts: the fundamental result Theorem 3.26 concerning monotone sequences of sets and Lemma 3.22 about the approximability of measurable

76

Measure and Integral: An Introduction to Real Analysis

sets by closed sets. Actually, even Lemma 3.22, which is a topological property of Lebesgue measure, is not needed in the proof of Egorov’s theorem if instead of requiring that F be closed, we merely require that it be measurable. The rest of the chapter, namely, the material on semicontinuous functions and Lusin’s theorem, uses somewhat more restrictive topological properties of Rn and Lebesgue measure. For example, about Rn , we have used metric properties, and about Lebesgue measure, we have used Lemma 3.22 (e.g., in Lemma 4.19) and the fact that Borel sets are measurable (e.g., in Corollary 4.16).

Exercises 1. Prove Corollary 4.2 and Theorem 4.8. 2. Let f be a simple function, taking its distinct values on disjoint sets E1 , . . . , EN . Show that f is measurable if and only if E1 , . . . , EN are measurable. 3. Theorem 4.3 can be used to define measurability for vector-valued (e.g., complex-valued) functions. Suppose, for example, that f and g are realvalued and finite in Rn , and let F(x) = (f (x), g(x)). Then F is said to be measurable if F−1 (G) is measurable for every open G ⊂ R2 . Prove that F is measurable if and only if both f and g are measurable in Rn . 4. Let f be defined and measurable in Rn . If T is a nonsingular linear transformation of Rn , show that f (Tx) is measurable. (If E1 = {x : f (x) > a} and E2 = {x : f (Tx) > a}, show that E2 = T−1 E1 .) 5. Give an example to show that φ(f (x)) may not be measurable if φ and f are measurable and finite. (Let F be the Cantor–Lebesgue function and let f be its inverse, suitably defined. Let φ be the characteristic function of a set of measure zero whose image under F is not measurable.) Show that the same may be true even if f is continuous. (Let g(x) = x + F(x), where F is the Cantor–Lebesgue function, and consider f = g−1 .) Cf. Exercise 22. 6. Let f and g be measurable functions on E: (a) If f and g are finite a.e. in E, show that f + g is measurable no matter how we define it at the points when it has the form +∞ + (−∞) or −∞ + ∞. (b) Show that fg is measurable without restriction on the finiteness of f and g. Show that f + g is measurable if it is defined to have the same value at every point where it has the form +∞ + (−∞) or −∞ + ∞. (Note that a function h defined on E is measurable if and only if both {h = +∞} and {h = −∞} are measurable and the restriction of h to the subset of E where h is finite is measurable.)

Lebesgue Measurable Functions

77

7. Let f be usc and less than +∞ on a compact set E. Show that f is bounded above on E. Show also that f assumes its maximum on E, that is, that there exists x0 ∈ E such that f (x0 ) ≥ f (x) for all x ∈ E. 8. (a) Let f and g be two functions that are usc at x0 . Show that f + g is usc at x0 . Is f − g usc at x0 ? When is fg usc at x0 ? (b) If { fk } is a sequence of functions that are usc at x0 , show that inf k fk (x) is usc at x0 . (c) If { fk } is a sequence of functions that are usc at x0 and converge uniformly near x0 , show that lim fk is usc at x0 . 9. (a) Show that the limit of a decreasing (increasing) sequence of functions usc (lsc) at x0 is usc (lsc) at x0 . In particular, the limit of a decreasing (increasing) sequence of functions continuous at x0 is usc (lsc) at x0 . (b) Let f be usc and less than +∞ on [a, b]. Show that there exist continuous fk on [a, b] such that fk f . (First show that there are usc step functions fk f .) 10. (a) If f is defined and continuous on E, show that {a < f < b} is relatively open and that {a ≤ f ≤ b} and { f = a} are relatively closed. (b) Let f be a finite function on Rn . Show that f is continuous on Rn if and only if f −1 (G) is open for every open G in R1 , or if and only if f −1 (F) is closed for every closed F in R1 . 11. Let f be defined on Rn and let B(x) denote the open ball {y : |x − y| < r} with center x and fixed radius r. Show that the function g(x) = sup{ f (y) : y ∈ B(x)} is lsc and that the function h(x) = inf{ f (y) : y ∈ B(x)} is usc on Rn . Is the same true for the closed ball {y : |x − y| ≤ r}? 12. If f (x), x ∈ R1 , is continuous at almost every point of an interval [a, b], show that f is measurable on [a, b]. Generalize this to functions defined in Rn . (For a constructive proof, use the subintervals of a sequence of partitions to define a sequence of simple measurable functions converging to f a.e. in [a, b]. Use Theorem 4.12. See also the proof of Theorem 5.54.) 13. One difficulty encountered in trying to extend the proof of Egorov’s theorem to the continuous parameter case fy (x) → f (x) as y → y0 is showing that the analogues of the sets Em in Lemma 4.18 are measurable. This difficulty can often be overcome in individual cases. Suppose, for example, that f (x, y) is defined and continuous in the square 0 ≤ x ≤ 1, 0 < y ≤ 1 and that f (x) = limy→0 f (x, y) exists and is finite for x in a measurable subset E of [0,1]. Show that if ε and δ satisfy 0 < ε, δ < 1, the set Eε,δ = {x ∈ E : |f (x, y)−f (x)| ≤ ε for all y < δ} is measurable. (If yk , k = 1, 2, . . ., is a dense subset of (0, δ), show that Eε,δ = k {x ∈ E : |f (x, yk ) − f (x)| ≤ ε}.) 14. Let f (x, y) be as in Exercise 13. Show that given ε > 0, there exists a closed F ⊂ E with |E − F| < ε such that f (x, y) converges uniformly for x ∈ F to f (x) as y → 0. (Follow the proof of Egorov’s theorem, using the

78

Measure and Integral: An Introduction to Real Analysis

sets Eε,1/m defined in Exercise 13 in place of the sets Em in the proof of Lemma 4.18.) 15. Let { fk } be a sequence of measurable functions defined on a measurable E with |E| < +∞. If |fk (x)| ≤ Mx < +∞ for all k for each x ∈ E, show that given ε > 0, there is a closed F ⊂ E and a finite M such that |E − F| < ε and |fk (x)| ≤ M for all k and all x ∈ F. m

16. Prove that fk −→ f on E if and only if given ε > 0, there exists K such that |{|f − f k | > ε}| < ε if k > K. Give an analogous Cauchy criterion. m

m

m

17. Suppose that fk −→ f and gk −→ g on E. Show that fk + gk −→ f + g on m E and, if |E| < +∞, that fk gk −→ fg on E. If, in addition, gk → g on E, g = m 0 a.e., and |E| < +∞, show that fk /gk −→ f /g on E. (For the product fk gk , write fk gk −fg = (fk − f )(gk − g)+f (gk − g)+g(fk − f ). Consider each term separately, using the fact that a function that is finite on E, |E| < +∞ is bounded outside a subset of E with small measure.) 18. If f is measurable on E, define ωf (a) = |{ f > a}| for −∞ < a < +∞. If m

fk f , show that ωfk ωf . If fk −→ f , show that ωfk → ωf at each m

point of continuity of ωf . (For the second part, show that if fk −→ f , then lim supk→∞ ωfk (a) ≤ ωf (a − ε) and lim inf k→∞ ωfk (a) ≥ ωf (a + ε) for every ε > 0.) 19. Let f (x, y) be a function defined on the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 which is continuous in each variable separately. Show that f is a measurable function of (x, y). Is the same true if f is only assumed to be continuous in x for each y? 20. If f is measurable and finite a.e. on [a, b], show that given ε > 0, there is a continuous g on [a, b] such that |{x : f (x) = g(x)}| < ε. (See Exercise 18 of Chapter 1.) Formulate and prove a similar result in Rn by combining Lusin’s theorem with the Tietze extension theorem. 21. Show that the necessity part of Lusin’s theorem is not true for ε = 0, that is, find a measurable set E and a finite measurable function f on E such that f is not continuous relative to E−Z for any Z with |Z| = 0. (Consider, e.g., χE for the set E in Exercise 25 of Chapter 3.) 22. (a) Show that if f is measurable and B is a Borel set in R1 , then f −1 (B) is measurable. (Recall that the Borel sets form the smallest σ-algebra that contains the open sets. Consider the collection of sets {E : f −1 (E) is measurable}.) Cf. Exercise 30 of Chapter 3. (b) If φ is a Borel measurable function on R1 and f is finite and measurable on Rn , show that φ(f (x)) is measurable on Rn . Cf. Theorem 4.6 and Exercise 5. 23. Let { fk }∞ k=1 be a sequence of measurable functions defined on a measurable set E. Show that the sets {x : lim fk (x) exists and is finite}, {x : lim fk (x) = +∞}, {x : lim fk (x) = −∞} are measurable.

Lebesgue Measurable Functions

79

24. Let f be a (Lebesgue) measurable function defined on a (Lebesgue) measurable set E. Show that there is a Borel set H ⊂ E and a Borel measurable function h on H such that |E| = |H| and f = h on H. (This can be deduced from Lusin’s theorem.) In case E is a Borel set, and consequently the exceptional set Z = E − H is also a Borel set, show that f can be redefined on Z so that the resulting function is Borel measurable on E.

5 The Lebesgue Integral

5.1 Definition of the Integral of a Nonnegative Function There are several equivalent ways to define the Lebesgue integral and develop its main properties. The approach we have chosen is based on the notion that the integral of a nonnegative f should represent the volume of the region under the graph of f . We start then with a nonnegative function f , 0 ≤ f ≤ +∞, defined on a measurable subset E of Rn . Let ( f , E) = x, f (x) ∈ Rn+1 : x ∈ E, f (x) < +∞ , R( f , E) = x, y ∈ Rn+1 : x ∈ E, 0 ≤ y ≤ f (x) if f (x) < +∞, and 0 ≤ y < +∞ if f (x) = +∞ . ( f , E) is called the graph of f over E and R( f , E) the region under f over E. If R( f , E) is measurable (as a subset of Rn+1 ), its measure |R( f , E)|(n+1) is called the Lebesgue integral of f over E, and we write |R( f , E)|(n+1) = f (x) dx. E

Usually, one of the abbreviations f dx E

or

f

E

is used, and at times the lengthy notation · · · f (x1 , . . . , xn ) dx1 . . . dxn E

is convenient. We stress that the definition applies only to nonnegative f ; a definition for functions that are not nonnegative will be given in Section 5.3. 81

82

Measure and Integral: An Introduction to Real Analysis

We also note that the existence of the integral is equivalent to the measurability of R( f , E) and does not require the finiteness of |R( f , E)|(n+1) . The next theorem is of basic importance.

Theorem 5.1 Let f be a nonnegative function defined on a measurable set E. Then f exists if and only if f is measurable. E We will show here only that the integral exists if f is measurable, postponing a proof of the converse until Theorem 6.11. We need several lemmas, the first of which proves the theorem for functions that are constant on E. In this case, R( f , E) is a cylinder set; that is, it has one of the forms {(x, y) : x ∈ E, 0 ≤ y ≤ a}, 0 ≤ a < +∞, or {(x, y) : x ∈ E, 0 ≤ y < +∞}. Lemma 5.2 Let E be a subset of Rn , 0 ≤ a ≤ +∞, and define Ea = {(x, y) : x ∈ E, 0 ≤ y ≤ a} for finite a and E∞ = {(x, y) : x ∈ E, 0 ≤ y < +∞}. If E is measurable (as a subset of Rn ), then Ea is measurable (as a subset of Rn+1 ) and |Ea |(n+1) = a|E|(n) .∗ Proof. The result follows from a series of simple observations. First, assume that a is finite. If |E| = 0 or if E is an interval that is either closed, partly open, or open, the result is clear. Next, if E is an open set, then by Theorem 1.11, it can be written as a disjoint union of partly open intervals, E = Ik . Therefore, Ea = Ik,a , and since Ik,a are measurable and disjoint, Ea is measurable and |Ea | = Ik,a = a |Ik | = a|E|. Let E be of type Gδ , E = ∞ k=1 Gk , with |G1 | < +∞. We may assume that Gk E by writing E = G1 ∩ (G1 ∩ G2 ) ∩ (G1 ∩ G2 ∩ G3 ) ∩ · · · . Therefore, by Theorem 3.26, |Gk | → |E| as k → ∞. Moreover, Gk,a is measurable, Gk,a = a |Gk |, and Gk,a Ea . Therefore, Ea is measurable and |Ea | = limk→∞ Gk,a = a limk→∞ |Gk | = a|E|. If E is any measurable set with |E| < +∞, then by Theorem 3.28, E = H − Z, where |Z| = 0, H is a set of type Gδ , H = ∞ k=1 Gk , and |G1 | < +∞. Since Ea = Ha − Za , we see that Ea is measurable and |Ea | = |Ha | = a|H| = a|E|. Finally, if |E| = +∞, the result follows by writing E as the countable union of disjoint measurable sets with finite measure. This completes the proof in case a is finite. If a = +∞, choose a sequence {ak } of finite numbers increasing to +∞. The conclusion then follows easily from the fact that Eak E∞ . As is easily seen (e.g., by using the same proof as above), the conclusion of Lemma 5.2 holds with Ea replaced by {(x, y) : x ∈ E, 0 ≤ y < a}, 0 ≤ a < +∞. ∗ Here and in the following text, 0 · ∞ and ∞ · 0 should be interpreted as 0.

83

The Lebesgue Integral

Lemma 5.3 If f is a nonnegative measurable function on E, 0 ≤ |E| ≤ +∞, then ( f , E) has measure zero. Proof. Given ε > 0 and k = 0, 1, . . . , let Ek = {kε ≤ f < (k + 1)ε}. The Ek are disjoint and measurable, their union is the subset of E where f is finite. and Hence, ( f , E) = f , Ek . Since ( f , Ek )e ≤ ε |Ek | by Lemma 5.2, we obtain

|( f , E)|e ≤

f , Ek ≤ ε |Ek | ≤ ε|E|. e

If |E| < +∞, this implies that ( f , E) has measure zero. If |E| = +∞, write E as the countable union of disjoint measurable sets with finite measure. Then ( f , E) is the countable union of sets of measure zero, and the lemma follows.

Proof of the sufficiency in Theorem 5.1. Let f be nonnegative and measurable on E. We must show that R( f , E) is measurable. By Theorem 4.13, there exist simple measurable fk f . Therefore, R fk , E ∪ ( f , E) R( f , E), and since (f , E) is measurable (with measure zero), it is enough to show that each R fk , E is measurable. Fix k and suppose that the distinct values of fk are a1 , . . . , aN , taken on measurable sets E1 , . . . , EN , respectively. Then R fk , E = N j=1 Ej,aj . Therefore, R fk , E is measurable by Lemma 5.2, and the proof is complete.

Corollary 5.4 If f is a nonnegative measurable function, taking constant values a1 , a2 , . . . (possibly +∞) on disjoint sets E1 , E2 , . . ., respectively, and if E = Ej , then E

f =

aj Ej .

j

Proof. Clearly R( f , E) = j Ej,aj . Since the Ej are measurable and disjoint, so are the Ej,aj . Therefore, E f = Ej,aj , and the corollary follows from the fact that Ej,aj = aj Ej . Note that Corollary 5.4 applies in particular to nonnegative simple measurable functions.

84

Measure and Integral: An Introduction to Real Analysis

5.2 Properties of the Integral Theorem 5.5 (i) If f and g are measurable and if 0 ≤ g ≤ f on E, then E g ≤ E f . In particular, E (inf f ) ≤ E f . (ii) If f is nonnegative and measurable on E and if E f is finite, then f < +∞ a.e. in E. (iii) Let E1 and E2 be measurable and E1 ⊂ E2 . If f is nonnegative and measurable on E2 , then E1 f ≤ E2 f . Proof. Parts (i) and (iii) follow from the relations R(g, E) ⊂ R( f , E) and R f , E1 ⊂ R f , E2 , respectively. To prove (ii), we may assume that |E| > 0. If f =+∞ in a subset E1 of E with positive measure, then by (iii) and (i), we have E f ≥ E1 f ≥ E1 a = a |E1 |, no matter how large a is. This contradicts the finiteness of E f . Theorem 5.6 (Monotone Convergence Theorem for Nonnegative Functions) If { fk } is a sequence of nonnegative measurable functions such that fk f on E, then fk → f . E

E

Proof. By Theorem 4.12, f is measurable. Since R fk , E ∪ ( f , E) R( f , E) and ( f , E) has measure zero, the result follows from Theorem 3.26.

Theorem 5.7 Suppose that f is nonnegative and measurable on E and that E is the countable union of disjoint measurable sets Ej , E = Ej . Then E

f =

f.

Ej

Proof. The sets R f , Ej are disjoint and measurable. Since R( f , E) = R f , Ej , the result follows from Theorem 3.23. The next four theorems are corollaries of the results just proved. The first one provides an alternate definition of the integral that will be useful in

85

The Lebesgue Integral

Chapter 10 as a motivation for defining integration with respect to abstract measures.

Theorem 5.8

Let f be nonnegative and measurable on E. Then

f = sup

inf f (x) Ej ,

x∈Ej

j

E

where the supremum is taken over all decompositions E = of a finite number of disjoint measurable sets Ej .

j Ej

of E into the union

The reader will observe that the formula resembles the definition of the Riemann integral if the Ej are taken to be subintervals. We note however that the roles of sup and inf cannot be interchanged; see Exercise 25. Proof. If E = N j=1 Ej is such a decomposition, consider the measurable function g taking values aj = inf y∈Ej f (y) on Ej , j = 1, . . . , N. Since 0 ≤ g ≤ f , we have by Corollary 5.4 and Theorem 5.5 that N aj Ej ≤ f . Therefore, j=1

sup

E

inf f |Ej | ≤ f . Ej

j

E

To prove the opposite inequality, consider for k = 1, 2, . . . , the sets (k) Ej , j = 0, 1, . . . , k2k , defined by

(k) Ej

=

j−1 j ≤ f < k , j = 1, . . . , k2k ; 2k 2

(k)

E0 = { f ≥ k},

and the corresponding measurable functions fk =

inf f χE(k) . j

(k)

Ej

j

(Compare the simple functions in Theorem 4.13(ii).) Then 0 ≤ fk f , and by the monotone convergence theorem, E fk → E f . Since E fk = (k) j inf E(k) f Ej by Corollary 5.4, it follows that j

86

Measure and Integral: An Introduction to Real Analysis

sup

j

(inf f ) |Ej | ≥ Ej

f,

E

which completes the proof. Theorem 5.9

Let f be nonnegative on E. If |E| = 0, then

Ef

= 0.

This can be proved in many ways; for example, it follows immediately from the last result. Measurability of f is automatic since |E| = 0. We can now slightly strengthen the statement of part (i) of Theorem 5.5. Theorem 5.10 If f and g are nonnegative and measurable on E and if g ≤ f a.e. in E, then E g ≤ E f . In particular, if f and g are nonnegative and measurable on E and if f = g a.e. in E, then E f = E g. Proof. Write E = A ∪ Z, where A and Z are disjoint and Z= {g > f }. Then |Z| = 0. Therefore, by Theorems 5.7 and 5.9, E f = A f + Z f = A f . Since the same is true for g, and since f ≥ g everywhere on A, the result follows. In defining E f , we assumed that f was defined everywhere in E. In view of Theorem 5.10, E f is unchangedif we modify f in a set of measure zero. Hence, we may consider integrals E f where f is defined only a.e. in E, by completing the definition of f arbitrarily in the set Z of measure zero where it is undefined. As a result of Theorem 5.9, this amounts to defining E f to be f . Similarly, we may extend the definition of the integral and the results E−Z earlier to measurable functions that are nonnegative only a.e. in E. Theorem 5.11 Let f be nonnegative and measurable on E. Then only if f = 0 a.e. in E.

Ef

= 0 if and

Proof. If f = 0 a.e. in E, then E f = 0 by Theorem 5.10. Conversely, suppose that f is nonnegative and measurable on E and that E f = 0. For α > 0, we have by Corollary 5.4 and Theorem 5.5 that α|{x ∈ E : f (x) > α}| ≤ f ≤ f = 0. {x∈E:f (x)>α}

E

Therefore, {x ∈ E : f (x) > α} has measure zero for every α > 0. Since the set where f > 0 is the union of those where f > 1/k, it follows that f = 0 a.e. in E. This proof also establishes the following useful result.

87

The Lebesgue Integral

Corollary 5.12 (Tchebyshev’s Inequality) Let f be nonnegative and measurable on E. If α > 0, then |{x ∈ E : f (x) > α}| ≤

1 f. α E

The significance of Tchebyshev’s inequality is that it estimates the size of f in terms of the integral of f . The next two theorems establish the linear properties of the integral for nonnegative functions.

If f is nonnegative and measurable, and if c is any nonnegative

Theorem 5.13 constant, then

cf = c

E

f.

E

Proof. If f is simple, then so is cf, and the theorem follows in this case from the formula for integrating simple functions (see Corollary 5.4). For arbitrary measurable f ≥ 0, choose simple measurable fk with 0 ≤ fk f . Then 0 ≤ cfk cf and

cf = lim

E

k→∞

cfk = lim c k→∞

E

fk = c

E

f.

E

If f and g are nonnegative and measurable, then

Theorem 5.14

( f + g) =

E

f+

E

g.

E

First, suppose that f and g are simple: f = N i=1 ai χAi and g = b χ , where E = A = B and all A and B are measurable. Then, i j j=1 j Bj i i j j f + g is also simple, taking values ai + bj on Ai ∩ Bj : f + g = i,j ai + bj χAi ∩Bj . Thus,

Ai ∩ Bj + Ai ∩ Bj ai + bj Ai ∩ Bj = ( f + g) = ai bj

Proof. M

i,j

E

=

ai |Ai | +

i

j

E

E

bj Bj = f + g.

j

i

88

Measure and Integral: An Introduction to Real Analysis

For general nonnegative measurable f and g, choose simple measurable fk and gk such that 0 ≤ fk f and 0 ≤ gk g. Then fk + gk is simple and 0 ≤ fk + gk f + g. Therefore, fk + gk = lim ( f + g) = lim fk + gk = f + g, k→∞

E

k→∞

E

E

E

E

E

which completes the proof. Corollary 5.15 Suppose that f and φ are measurable on E, 0 ≤ f ≤ φ, and finite. Then (φ − f ) = φ − f . E

E

Proof. By Theorem 5.14, we have the result follows by subtraction.

Theorem 5.16

Ef

+

Ef

is

E

− f) =

E (φ

E φ.

Since

Ef

is finite,

If fk , k = 1, 2, . . . , are nonnegative and measurable, then ∞ ∞

fk = fk . E

k=1

k=1 E

Proof. The functions FNdefined by FN = N k=1 fk are nonnegative and mea∞ surable and increase to k=1 fk . Hence, ∞ N ∞

fk = lim FN = lim fk = fk . E

k=1

N→∞

E

N→∞

k=1 E

k=1 E

Note that the preceding theorem is essentially a corollary of the Monotone Convergence Theorem 5.6. Conversely, Theorem 5.6 can be deduced from this result. Verification is left to the reader. The monotone convergence theorem gives a sufficient conditionfor interchanging the operations of integration and passage to the limit: E lim fk = lim E fk . It is an important problem to find other conditions under which this is true. First, we show that some restriction other than the mere convergence of fk to f is necessary. Let E be the interval [0, 1], and for k = 1, 2, . . . , let fk be defined as follows: when 0 ≤ x ≤ 1/k, the graph of fk consists of the sides of the isosceles triangle with altitude k and base [0, 1/k]; when 1/k ≤ x ≤ 1, fk (x) = 0.

89

The Lebesgue Integral

k

1

1/k

1 1 Clearly, fk → 0 on [0, 1], but 0 fk = 12 (1/k)(k) = 12 for all k. Hence, 0 lim fk < 1 lim 0 fk . For any nonnegative measurable fk such that fk → f and fk converges, the fact that f ≤ lim fk is a consequence of the next theorem. Theorem 5.17 (Fatou’s Lemma) If fk is a sequence of nonnegative measurable functions on E, then lim inf fk ≤ lim inf fk . k→∞

k→∞

E

E

Proof. First, note that the integral on the leftexists since its integrand is nonnegative and measurable. Next, let gk = inf fk , fk+1 , . . . for each k. Then, gk lim inf fk and 0 ≤ gk ≤ fk . Therefore, by Theorems 5.6 and 5.10, E

gk →

(lim inf fk ),

E

gk ≤

E

fk ,

E

so that E

(lim inf fk ) = lim

E

gk ≤ lim inf

fk .

E

Corollary 5.18 Let and measurable on E, and let fk , k = 1, 2, . . . , be nonnegative fk → f a.e. in E. If E fk ≤ M for all k, then E f ≤ M. Proof. By Fatou’s lemma, E lim inf fk ≤ M. Since lim inf fk = lim fk = f a.e. in E, the conclusion follows. We now prove a basic result about term-by-term integration of convergent sequences.

90

Measure and Integral: An Introduction to Real Analysis

Theorem 5.19 (Lebesgue’s Dominated Convergence Theorem for Nonnegative Functions) Let fk be a sequence of nonnegative measurable functions on E such that fk → f a.e. in E. If there exists a measurable function φ such that fk ≤ φ a.e. for all k and if E φ is finite, then fk → f . E

E

Proof. By Fatou’s lemma, f = lim inf fk ≤ lim inf fk , E

E

E

and the theorem will follow if we show that f ≥ lim sup fk . E

E

To prove this inequality, apply Fatou’s lemma to the nonnegative functions φ − fk , obtaining lim inf (φ − fk ) ≤ lim inf (φ − fk ). E

E

Since fk → f a.e., theintegrand on the left equals φ−f a.e., so that the integral on the left is E φ − E f by Corollary 5.15. The right-hand side equals

lim inf

φ−

E

fk

=

E

φ − lim sup

E

Combining formulas and cancelling lim sup E fk , as desired.

E φ,

fk .

E

we obtain the inequality

Ef

≥

5.3 The Integral of an Arbitrary Measurable f Let f be any measurable function defined on a set E. Then f = f + − f − and, by + − the comments following Theorem − 4.11, f and f are measurable. Therefore, + the integrals E f (x) dx and E f (x) dx exist and are nonnegative, possibly having value +∞. Provided at least one of these integrals is finite, we define E

f (x) dx =

E

f + (x) dx −

E

f − (x) dx

91

The Lebesgue Integral

and say that the integral E f (x) dx exists. If f ≥ 0, then f = f + , and this definition agrees with one. As in the case when f ≥ 0, we will use the the previous abbreviations E f dx and E f . The definition clearly applies if f is defined only a.e. in E, as in the case when f ≥ 0 (see p. 86 in Section 5.2). For the sake of simplicity, we shall usually in E. assume that f is defined everywhere If E f exists then, of course, −∞ ≤ E f ≤ +∞. If E f exists and is finite, we say that f is Lebesgue integrable, or simply integrable, on E and write f ∈ L(E). Thus, L(E) =

f :

f is finite .

E

If

Ef

exists, then E

f+ + f− , f ≤ f+ + f− = E

E

E

by Theorem 5.14. Since f + + f − = |f |, we obtain the inequality f dx ≤ |f | dx. E

Theorem 5.21 if |f | is.

(5.20)

E

Let f be measurable on E. Then f is integrable over E if and only

Proof. If |f | ∈ L(E) then f + , f − ∈ L(E), and consequently E f exists and is + − − is finite, and therefore, f f finite. If f ∈ L(E), then the difference E E must be finite. Hence, their since at least one of E f + or Ef − is finite, both sum is finite. Since this sum is E f + + f − = E |f |, it follows that |f | ∈ L(E). The simple properties of E f for general f follow from the results already established for f ≥ 0. As a first example, we have the following theorem.

Theorem 5.22

If f ∈ L(E), then f is finite a.e. in E.

Proof. If f ∈ L(E), then |f | ∈ L(E), and the result follows from Theorem 5.5(ii).

92

Measure and Integral: An Introduction to Real Analysis

Theorem 5.23 (i) If both E f and E g exist and if f ≤ g a.e. in E, then E f ≤ E g. Also, if f and functions with f = g a.e. in E and if E f exists, then E g exists and g are f = E E g. (ii) If E2 f exists and E1 is a measurable subset of E2 , then E1 f exists. + + − Proof. (i) The fact that f ≤ g a.e. implies that 0 ≤ f +≤ g and− 0 ≤g −≤ − + f a.e. in E. By Theorem 5.10, we then have E f ≤ E g and E f ≥ E g , and the first part of (i) follows by subtracting these inequalities. The proof of the second part of (i) is similar; note that measurability of f is equivalent to that of gwhen f = g a.e. (ii) If E2 f exists, at least one of E2 f + or E2 f − is finite. If E1 ⊂ E2 , then by + − Theorem 5.5(iii), at least one of E1 f or E1 f is finite. Therefore, E1 f exists.

Theorem 5.24 If E f exists and E = k Ek is the countable union of disjoint measurable sets Ek , then

f =

Proof. Each

Ek

f.

k Ek

E

f exists by Theorem 5.23(ii). We have

f =

E

E

f+ −

f− =

E

f+ −

Ek

f−

Ek

by Theorem 5.7. Since at least one of these sums is finite, we obtain E

⎞ ⎛

⎝ f + − f −⎠ = f = f. Ek

Ek

Ek

If |E| = 0 or if f = 0 a.e. in E, then

Theorem 5.25

Ef

= 0.

Proof. The theorem follows by applying Theorem 5.9 or 5.11 to f + and f − . The next few results deal with linearity properties of the integral.

Lemma 5.26

If

Ef

is defined, then so is

E (−f ),

and

E (−f )

=−

E f.

93

The Lebesgue Integral

Proof. Since (−f )+ = f − and (−f)− = f + , and at least one of finite, we have E (−f ) = E f − − E f + = − E f . Theorem 5.27

If

Ef

exists and c is any real constant, then

(cf ) = c

E

Ef

E (cf )

−

or

Ef

+

is

exists and

f.

E

− − Proof. If c ≥ 0, (cf )+ = cf + and by Theorem (cf− ) = cf . Therefore, 5.13, + + − (cf ) = c E f and E (cf ) = c E f . It follows that E (cf ) exists and E (cf ) = E c E f + − E f − = c E f . If c = −1, the theorem reduces to Lemma 5.26. For any c ≤ 0, we have c = (−1)(|c|), and the result follows from the cases c = −1 and c ≥ 0.

Theorem 5.28

If f , g ∈ L(E), then f + g ∈ L(E) and E

( f + g) =

f+

E

g.

E

Proof. Since |f + g| ≤ |f | + |g|, we have from Theorems 5.23(i) and 5.14 that E

|f + g| ≤

E

(|f | + |g|) =

E

|f | +

|g| < +∞.

E

Hence, f + g ∈ L(E). To prove the rest of the theorem, first note that if g = 0 everywhere in E, then the formula E ( f + g) = E f + E g is obvious by Theorem 5.25. Thus, by writing E = (E ∩ {g = 0}) ∪ (E ∩ {g = 0}) and using Theorem 5.24 for each of the three integrals in the formula, it is enough to prove the formula with E replaced by E ∩ {g = 0}. Hence, because of similar considerations for f , it suffices to prove the formula under the extra assumption that f and g never vanish on E. To do so, we begin by considering the following six cases, in which each inequality is assumed to hold everywhere in E: (1) f > 0, g > 0 (so that f + g > 0); (2) f > 0, g < 0, f + g ≥ 0; (3) f > 0, g < 0, f + g < 0; (4) f < 0, g > 0, f + g ≥ 0; (5) f < 0, g > 0, f + g < 0; (6) f < 0, g < 0 (so that f + g < 0). Note that these possibilities are mutually exclusive. The result in case 1 is just Theorem 5.14. Cases 2–6 are all similar, and we shall consider only case 2. Then f > 0, −g > 0, f + g ≥ 0, and since by Theorem 5.22 each function is finite = f a.e. Hence, by Theorems 5.23(i) and a.e., ( f + g) + (−g) 5.14, we have E ( f + g) + E (−g) = E f . The result in case 2 now follows from Lemma 5.26 and the fact that all the integrals involved are finite.

94

Measure and Integral: An Introduction to Real Analysis

For arbitrary f and g in L(E) that never vanish in E, we subdivide E into at most six measurable sets, E1 , . . . , E6 , where possibilities (1), . . . , (6) hold, respectively. Since Ei and Ej are disjoint for i = j, we have ⎛ ⎞ 6 6

⎝ f + g⎠ = f + g. ( f + g) = ( f + g) = E

j=1 Ej

j=1

Ej

Ej

E

E

This completes the proof. It follows that if fk ∈ L(E), k = 1, . . . , N, and if ak are real constants, then N k=1 ak fk ∈ L(E) and N N

ak fk = ak fk . k=1

E

k=1

E

Corollary 5.29 Let f and φ be measurable on E, f ≥ φ a.e., and φ ∈ L(E). Then, ( f − φ) = f − φ. E

E

E

Proof. First, note that E f exists since f − ≤ φ− a.e. implies that E f − is finite. Next, E ( f −φ) exists since f −φ ≥ 0 a.e. If f ∈ L(E), the corollary follows from have Theorem 5.28. If f ∈ L(E), then since f − ∈ L(E), we must E f = +∞. The fact that φ ∈ L(E) implies that f − φ ∈ L(E), so that E ( f − φ) = +∞ since f − φ ≥ 0 a.e. This completes the proof. In Chapter 8, we will study conditions on f and g that imply that fg ∈ L(E). For now, we have the following simple result. Theorem 5.30 If f ∈ L(E), g is measurable on E, and there exists a finite constant M such that |g| ≤ M a.e. in E, then fg ∈ L(E) and E |fg| ≤ M E |f |. |fg| ≤ M|f Proof. Since | a.e. in E, we have by Theorems 5.10 and 5.27 that |fg| ≤ M|f | = M E E E |f |. Hence, fg ∈ L(E). Corollary 5.31 If f ∈ L(E), f ≥ 0 a.e., and there exist finite constants α and β such that α ≤ g ≤ β a.e. in E, then α f ≤ fg ≤ β f . E

E

E

95

The Lebesgue Integral

Proof. By Theorem 5.30, fg ∈ L(E). Since f ≥ 0 a.e., we have αf ≤ fg ≤ βf a.e. in E, and the conclusion follows by integrating. We now study conditions that imply that E fk → E f if fk → f in E. Most of the results are extensions of those we derived for nonnegative functions. Theorem 5.32 (Monotone Convergence Theorem) Let { fk } be a sequence of measurable functions on E: (i) If fk E and there exists φ ∈ L(E) such that fk ≥ φ a.e. on E for all k, f a.e. on then E fk → E f . (ii) If fk E and there exists φ ∈ L(E) such that fk ≤ φ a.e. on E for all k, f a.e. on then E fk → E f . Proof. To prove (i), we may assume by Theorem 5.25 that fk f and fk ≥ φ that by Theorem 5.6, everywhere on E. Then 0 ≤ fk − φ f − φ on E, so f − φ → ( f −φ). Therefore, by Corollary 5.29, f − φ → f − E k E E k E E E φ, and since φ ∈ L(E), the result follows. We can deduce (ii) from (i) by considering the functions −fk . Details are left to the reader. Theorem 5.33 (Uniform Convergence Theorem) Let fk ∈ L(E) fork = 1, 2,. . ., and let {fk } converge uniformly to f on E, |E| < +∞. Then f ∈ L(E) and E fk → E f . Proof. Since |f | ≤ fk + f − fk and fk converge uniformly to f on E, we have |f | ≤ fk + 1 on E if k is sufficiently large. Since |E| < +∞, it follows that f ∈ L(E). From Theorem 5.28 and (5.20), we obtain f − f k = ( f − f k ) ≤ f − f k . E

E

E

E

The last integral is bounded by supx∈E f (x) − fk (x) |E|, which by hypothesis tends to 0 as k → ∞. This proves the theorem. Theorem 5.34 (Fatou’s Lemma) Let fk be a sequence of measurable functions on E. If there exists φ ∈ L(E) such that fk ≥ φ a.e. on E for all k, then lim inf fk ≤ lim inf fk . E

k→∞

k−∞

E

96

Measure and Integral: An Introduction to Real Analysis

Proof. The result follows by first applying Theorem 5.17 to the sequence fk − φ of nonnegative functions, and then using Corollary 5.29. Details are left to the reader. Corollary 5.35 Let fk be a sequence of measurable functions on E. If there exists φ ∈ L(E) such that fk ≤ φ a.e. on E for all k, then lim sup fk ≥ lim sup fk . E

k−∞

k→∞

E

Proof. This follows from Fatou’s lemma since −fk ≥ − φ a.e. and lim inf −fk = − lim sup fk . Theorem 5.36 (Lebesgue’s Dominated Convergence Theorem) Let fk be a sequence of measurable functions on E such that fk → f a.e. in E. If there exists φ ∈ L(E) such that fk ≤ φ a.e. in E for all k, then E fk → E f . Proof. By hypothesis, −φ ≤ fk ≤ φ a.e. in E. Therefore, 0 ≤ fk + φ ≤ 2φ a.e. in E. Since 2φ ∈ L(E), we conclude from Theorem 5.19 that E fk + φ → E ( f + φ). Since φ, f , and all the fk are integrable on E, the result follows from Theorem 5.28. See Exercises 23 and 26 for two useful variants of Theorem 5.36, one about weakening the assumption that fk ≤ φ and the other about replacing the hypothesis of pointwise convergence of fk by convergence in measure. The following special case of the dominated convergence theorem is often useful if E has finite measure. Corollary 5.37 (Bounded Convergence Theorem) Let fk be a sequence of measurable functions on E such that fk → f a.e. inE. If |E| < +∞ and there is a finite constant M such that |fk | ≤ M a.e. in E, then E fk → E f . In later chapters, we will consider the integrals of complex-valued functions. Here, we mention only the definition. (See p. 183 in Section 8.1 for some further remarks.) If f = f1 + if2 with f1 and f2 real-valued, we define E

f =

E

f1 + i

E

f2 ,

97

The Lebesgue Integral

provided the integrals on the right exist and are finite. (For the measurability of such f , see Exercise 3 Chapter 4.) Many basic properties of the ordinary integral are valid in this case.

5.4 Relation between Riemann–Stieltjes and Lebesgue Integrals, and the Lp Spaces, 0 < p < ∞ It turns out that there is a remarkably simple and useful representation of Lebesgue integrals over subsets of Rn in terms of Riemann–Stieltjes integrals (over subsets of R1 , of course). In order to establish this relation, we must first study the function ω(α) = ωf ,E (α) = |{x ∈ E : f (x) > α}|, where f is a measurable function on E and −∞ < α < +∞. We call ωf ,E the distribution function of f on E. Some properties of ω were given in Exercise 18 of Chapter 4. Clearly, it is not affected by changing f in a set of measure zero, and it is decreasing. As α +∞, {x ∈ E : f (x) > α} {x ∈ E : f (x) = +∞}; hence, assuming that f is finite a.e. in E, by Theorem 3.26(ii), lim ω(α) = 0,

α→+∞

unless ω(α) ≡ +∞. Similarly, lim ω(α) = |E|.

α→−∞

We will assume from now on that |E| < +∞. This insures that ω is bounded, that limα→+∞ ω(α) = 0, and that ω is of bounded variation on (−∞, +∞) with variation equal to |E|. The assumption is made only to simplify the properties of ω, and is not entirely necessary (see, e.g., Exercise 16); in fact, the case |E| = +∞ is often important. In the following results, we assume that f is a measurable function that is finite a.e. in E, |E| < +∞, and we write ω(α) = ωf ,E (α), Lemma 5.38

{ f > a} = {x ∈ E : f (x) > α}, etc.

If α < β, then |{α < f ≤ β}| = ω(α) − ω(β).

98

Measure and Integral: An Introduction to Real Analysis

Proof. We have { f > β} ⊂ { f > α} and {a < f ≤ β} = { f > α} − { f > β}. Since |{ f > β}| < +∞, the lemma follows from Corollary 3.25. Given α, let ω(α+) = lim ω(α + ε), ε0

ω(α−) = lim ω(α − ε) ε0

denote the limits of ω from the right and left at α.

Lemma 5.39 (a) ω(α+) = ω(α); that is, ω is continuous from the right. (b) ω(α−) = |{ f ≥ α}|. Proof. If εk 0, then { f > α + εk } { f > α} and { f > α − εk } { f ≥ α}. Since these sets have finite measures, it follows from Theorem 3.26 that ω(α + εk ) → ω(α) and ω(α − εk ) → |{ f ≥ α}|. This completes the proof. We now know that ω is a decreasing function that is continuous from the right. It may have jump discontinuities, with jumps ω(α−)−ω(α), and intervals of constancy. These possibilities are characterized by the behavior of f stated in the following result.

Corollary 5.40 (a) ω(α−) − ω(α) = |{ f = α}|; in particular, ω is continuous at α if and only if |{ f = α}| = 0. (b) ω is constant in an open interval (α, β) if and only if |{α < f < β}| = 0, that is, if and only if f takes almost no values between α and β. Proof. Since |{ f ≥ α}| = |{ f > α}| + |{ f = α}|, part (a) follows immediately from Lemma 5.39(b). To prove part (b), note that |{α < f < β}| = |{ f > α}| − |{ f ≥ β}| = ω(α) − ω(β−). This is zero if and only if ω is constant in the half-open interval [α, β). However, since ω is continuous from the right, it is constant in (α, β) if and only if it is constant in [α, β). The rest of the theorems in this section give relations between Lebesgue and Riemann–Stieltjes integrals. As always, f is measurable and finite a.e. in E, |E| < + ∞ and ω = ωf ,E·

99

The Lebesgue Integral

Theorem 5.41

If a < f (x) ≤ b (a and b finite) for all x ∈ E, then

f =−

b

α dω(α).

a

E

Proof. The Lebesgue integral on the left exists and is finite since f is bounded and |E| < +∞. The Riemann–Stieltjes integral on the right exists by Theorem 2.24. To show that they are equal, partition the interval [a, b] by a = α0 < α1 < · · · < αk = b and let Ej = αj−1 < f ≤ αj . The Ej are disjoint and E = kj=1 Ej . Hence, E f = kj=1 Ej f and, therefore, k

k

αj−1 Ej ≤ f ≤ αj Ej .

j=1

E

j=1

By Lemma 5.38, Ej = ω αj−1 − ω αj = − ω αj − ω αj−1 . Hence, b these sums are Riemann–Stieltjes sums for − a α d ω(α). Since these sums b must converge to − a α d ω(α) as the norm of the partition tends to zero, the conclusion follows. Theorem 5.41 can be extended to the case when f is not bounded on E as follows. Theorem 5.42 Let f be any measurable function on E, and let Eab = {x ∈ E : a < f (x) ≤ b} (a and b finite). Then, Eab

f =−

b

α dω(α).

a

Proof. Let ωab (α) = {x ∈ Eab : f (x) > α}. Then ωab is the distribution func b tion of f on Eab . By Theorem 5.41, we have Eab f = − a α dωab (α). We claim b that the last expression equals − a α dω(α). By taking limits of Riemann– Stieltjes sums that approximate the integrals, we only need to show that for a ≤ α < β ≤ b. By Lemma 5.38, this is ωab (α) − ωab (β) = ω(α) − ω(β) equivalent to showing that x ∈ Eab : α < f (x) ≤ β = |{x ∈ E : α < f (x) ≤ β}| for such the definition of Eab and the restrictions on α and β. However, by α and β, x ∈ Eab : α < f (x) ≤ β = {x ∈ E : α < f (x) ≤ β}. This proves the claim and the theorem too.

100

Measure and Integral: An Introduction to Real Analysis

In both Theorems 5.41 and 5.42, the integrals of f are extended over sets where f is bounded. This restriction is removed in the next theorem, where we define (see p. 34 in Section 2.4) +∞

α dω(α) = lim

b

a→−∞ b→+∞ a

−∞

α dω(α),

if the limit exists.

Theorem 5.43 If either exists and is finite, and

Ef

or

+∞ −∞

f =−

α dω(α) exists and is finite, then the other +∞

α dω(α).

−∞

E

b Proof. By Theorem 5.42, Eab f = − a α dω(α). If f ∈ L(E), then Eab f → f + and f − . Therefore, E f as a → −∞, b → +∞ since this holds for both b lima→−∞, b→+∞ − a α dω(α) exists and equals E f , which proves half of the theorem. ∞ +∞ Now suppose that −∞ α dω(α) exists and is finite. Then 0 α dω(α) ∞ is finite, and we claim that E f + = − 0 α dω(α). By Theorem 5.42, for b ∞ b > 0, E0b f = − 0 α dω(α). Therefore, as b → +∞, E0b f → − 0 α dω(α). On the other hand, as b increases to +∞, E0b {0 < f < +∞}. Therefore, E0b

f =

E0b

f+ →

{0 0.

(5.49)

{ f >α}

The proof is left as an exercise. Hence, if f is in Lp (E), then αp ω(α) remains bounded as α → + ∞. A stronger result is actually true.

105

The Lebesgue Integral

If 0 < p < ∞ and f ∈ Lp (E), then

Lemma 5.50

lim αp ω(α) = 0.

α→+∞

Proof. This will be a corollary of (5.49) if we show that { f >α} f p → 0 as α → +∞. We may suppose that α runs through a sequence αk → +∞. Let p fk = f wherever f > αk and fk = 0 elsewhere. Then { f >αk } f p = E fk . Since f is p

finite a.e., fk → 0 a.e. Moreover, 0 ≤ fk ≤ |f |p ∈ L(E), and the result follows from the dominated convergence theorem. In the next theorem, we use Lemma 5.50 to integrate the Riemann–Stieltjes integral in (5.48) by parts.

Theorem 5.51

If 0 < p < ∞, f ≥ 0, and f ∈ Lp (E), then E

fp = −

∞

αp dω(α) = p

0

∞

αp−1 ω(α)dα,

0

where the last integral may be interpreted as either a Lebesgue or an improper Riemann integral. Proof. The first equality is just (5.48). For the second, if 0 < a < b < +∞, we have −

b

αp dω(α) = −bp ω(b) + ap ω(a) + p

a

b

αp−1 ω(α) dα,

a

by Theorem 2.21 and the fact that αp is continuously differentiable on [a, b]. Here, together with Theorem 2.21, we use the fact that for partitions = {αk } of [a, b] and intermediate points {βk } satisfying αk < βk < αk+1 and p p p−1 αk+1 − αk , we have αk+1 − αk = pβk b a

p−1 ω(α) d αp = lim ω (βk ) pβk αk+1 − αk ||→0

=p

b a

k

αp−1 ω(α)dα.

106

Measure and Integral: An Introduction to Real Analysis

The last integral exists in the Riemann sense by Theorem 5.54 since the number of discontinuities of ω is at most countable. By Theorem 5.52, the last integral may also be interpreted in the Lebesgue sense. Now let a → 0 and b → +∞. Then bp ω(b) → 0 by Lemma 5.50, ap ω(a) → 0 since |E| < +∞ (see also Exercise 14), and the theorem follows. Note that in case 1 ≤ p < ∞, the proof works with a = 0 and no other changes. For an extension of Theorem 5.51, see Exercise 16. See also Exercise 5 of Chapter 6. In practice, the representation of E |f |p as a Riemann–Stieltjes integral provides a powerful tool for determining whether or not f ∈ Lp (E).

5.5 Riemann and Lebesgue Integrals We now study a relation between Lebesgue and Riemann integrals over finite bounded functions intervals [a, b] in R1 and give a characterization of those that are Riemann integrable. The Lebesgue integral [a,b] f will be denoted by b b a f and the Riemann integral by (R) a f . Theorem 5.52 Let f be a bounded function that is Riemann integrable on [a, b]. Then f ∈ L[a, b] and b

f = (R)

a

b

f.

a

Proof. Let {k } be a sequence of partitions of [a, b] with norms tending to (k) (k) zero. For each k, define two simple functions as follows: if x1 < x2 < · · · are the partitioning points of k , let lk (x) and uk (x) be defined in each semiopen

(k) (k) (k) interval [x(k) i , xi+1 ) as the inf and sup of f on xi , xi+1 , respectively. Then lk and uk are uniformly bounded and measurable in [a, b), and if Lk and Uk denote the lower and upper Riemann sums of f corresponding to k , we have

b a

lk = Lk ,

b

uk = Uk .

a

Note also that lk ≤ f ≤ uk on the half-open interval [a, b) and, if we assume that k+1 is a refinement of k , that lk and uk . Let l = limk→∞ lk and u = limk→∞ uk . Then l and u are measurable, l ≤ f ≤ u on [a, b), and, by b b the bounded convergence theorem, Lk → a l and Uk → a u. But since

107

The Lebesgue Integral

f is Riemann integrable, Lk and Uk both converge to (R) Therefore, b (R)

f =

a

b

l=

b

a

b a

f by Theorem 2.29.

u.

a

Since u − l ≥ 0, Theorem 5.11 implies that l = f = u a.e. in [a, b]. Therefore, f b b is measurable and (R) a f = a f , which completes the proof. Theorem 5.52 says that any function that is Riemann integrable is also Lebesgue integrable and that the two integrals are equal. There are, of course, bounded functions that are Lebesgue integrable but not Riemann integrable. One such is the Dirichlet function defined for 0 ≤ x ≤ 1 by letting f (x) = 1 if x is rational and f (x) = 0 if x is irrational. Since f = 0 except for a subset of [0,1] of measure zero, its Lebesgue integral is 0. On the other hand, its Riemann integral does not exist since every upper Riemann sum is 1 and every lower Riemann sum is 0. The practical value of Theorem 5.52 is that it allows us to compute the Lebesgue integral of Riemann integrable (e.g., continuous) functions. Using the monotone convergence theorem, we can easily extend Theorem 5.52 to include improper Riemann integrals of nonnegative functions. Special as it is, the following result is useful in applications. Theorem 5.53 Let f be nonnegative on a finite interval [a, b] and Riemann integrable (so, in particular, bounded) over every subinterval [a + ε, b], ε > 0. Define the improper Riemann integral b

I = lim (R) ε→0

f,

a+ε

0 ≤ I ≤ +∞. Then f is measurable on [a, b] and b

f = I.

a

Proof. Observe that by Theorem 5.52, for every ε > 0, f is measurable on b b [a + ε, b] and a+ε f = (R) a+ε f . Measurability of f on [a, b] follows easily from b Theorem 4.12 by letting ε → 0. The formula a f = I follows similarly from the monotone convergence theorem.

108

Measure and Integral: An Introduction to Real Analysis

Similar results hold for improper Riemann integrals over infinite intervals. For example, if a ∈ (−∞, ∞) and f is nonnegative on [a, ∞) and Riemann integrable on every [a, N], a < N < ∞, then f is measurable on [a, ∞) and N ∞ f = limN→∞ (R) a f . Verification is left to the reader; note that since a N (R) a f increases with N, the limit exists but may be +∞. We note in passing that the finiteness of the improper Riemann integral of an f that is not nonnegative does not in general imply that f is integrable (see Exercise 7). Our final result is a characterization of those bounded functions that are Riemann integrable. Theorem 5.54 A bounded function is Riemann integrable on [a, b] if and only if it is continuous a.e. in [a, b]. Proof. Suppose that f is bounded and Riemann integrable. Let k , lk , uk , etc., be as in the proof of Theorem 5.52. Let Z be the set of measure zero outside which l = f = u. We claim that if x is not a partitioning point of any k and if x∈ / Z, then f is continuous at x. In fact, if f is not continuous at x and x is never a partitioning point, there exists ε > 0, depending on x but not on k, such that / Z. uk (x)−lk (x) ≥ ε. This implies that u(x)−l(x) ≥ ε, which is impossible if x ∈ Therefore, f is continuous a.e. in [a, b]. To prove the converse, let f be a bounded function that is continuous a.e. in [a, b]. Let k be any sequence of partitions with norms tending to zero, and define the corresponding lk , uk , Lk , and Uk as in Theorem 5.52. Note that lk and uk may not be monotone since k+1 may not be a refinement of k . However, by the continuity of f , both lk and uk converge a.e. to f . Hence, b b b by the bounded convergence theorem, a lk and a uk both converge to a f . b b Since Lk = a lk and Uk = a uk , it follows that the upper and lower Riemann sums converge to the same limit. Therefore, f is Riemann integrable.

Exercises 1. If f is a simple measurable function (not necessarily nonnegative) tak N ing values aj on Ej , j = 1, 2, . . . , N, show that E f = j=1 aj Ej . (Use Theorem 5.24.) 2. Show that the conclusions of Theorem 5.32 are not generally true without the assumption that φ ∈ L(E). (In part (ii), for example, take fk = χ(k,∞) .) Show that Theorem 5.33 fails without the assumption that |E| < ∞.

109

The Lebesgue Integral

3. Let fk be a sequence of nonnegative measurable functions defined on E. If fk → f and fk ≤ f a.e. on E, show that E fk → E f . 4. If f ∈ L(0, 1), show that xk f (x) ∈ L(0, 1) for k = 1, 2, . . . , and that 1 k 0 x f (x) dx → 0. 5. Use Egorov’s theorem to prove the bounded convergence theorem. 6. Let f (x, y), 0 ≤ x, y ≤ 1, satisfy the following conditions: for each x, f (x, y) is an integrable function of y, and (∂f (x, y)/∂x) is a bounded function of (x, y). Show that (∂f (x, y)/∂x) is a measurable function of y for each x and ∂ d f (x, y) dy. f (x, y) dy = dx ∂x 1

1

0

0

7. Give an example of an f that is not integrable, but whose improper Riemann integral exists and is finite. 8. Prove (5.49).

p m 9. If p > 0 and E f − fk → 0 as k → ∞, show that fk −→f on E (and thus that there is a subsequence fkj → f a.e. in E). p p 10. If p > 0, E f − fk → 0, and E fk ≤ M for all k, show that E |f |p ≤ M.

11. For which p > 0 does 1/x ∈ Lp (0, 1)? Lp (1, ∞)? Lp (0, ∞)? 12. Give an example of a bounded continuous f on (0, ∞) such that limx→∞ f (x) = 0 but f ∈ / Lp (0, ∞) for any p > 0. 13. (a) Let fk be a sequence of measurable on E. Show that fk functions converges absolutely a.e. in E if E fk < +∞. (Use Theorems 5.16 and 5.22.) (b) If {rk } denotes the in [0,1] and {ak } satisfies |ak | < rational numbers +∞, show that ak |x − rk |−1/2 converges absolutely a.e. in [0,1]. 14. Prove the following result (which is obvious if |E| < +∞), describing the behavior of ap ω(a) as a → 0+. If f ∈ Lp (E), then lima→0+ ap ω(a) = 0. (If f ≥ 0, ε > 0, choose δ > 0 so that { f ≤δ} f p < ε. Thus, ap [ω(a) − ω(δ)] ≤ p {a 0, show that f ∈ Lr , 0 < r < p. k kp 18. If f ≥ 0, show that f ∈ Lp if and only if +∞ < +∞. (Use k=−∞ 2 ω 2 Exercise 16.) 0

19. Derive analogues of Theorems 5.52 and 5.54 for integrals over intervals in Rn , n > 1. 20. Let y = Tx be a nonsingular linear transformation of Rn . If E f (y) dy exists, show that E

f (y) dy = | det T|

f (Tx) dx.

T−1 E

(The case when f = χE1 , E1 ⊂ E, follows from integrating the formula χE1 (Tx) = χT−1 E1 (x) over T−1 E and then applying Theorem 3.35.) 21. If A f = 0 for every measurable subset A of a measurable set E, show that f = 0 a.e. in E. 22. Show that the conclusion ofthe dominated convergence theo Lebesgue rem can be strengthened to E fk − f → 0. 23. Prove the following fact, sometimes referred to as the Sequential (or Gener alized) Version of the Lebesgue Dominated Convergence Theorem. Let fk and {φk } be sequences of measurable functions on E satisfying fk → f a.e. in E, φk → φ a.e. in E, and fk ≤ φk a.e. in E. If φ ∈ L(E) and E φk → E φ, then f → 0. (In case f = 0 and all fk ≥ 0, apply Fatou’s lemma E fk − to φk − fk .) An application is given in Exercise 12of Chapter 8; for ≥ 0, f → f a.e. in E, f ∈ L(E), and f → example, if f k k k E E f , then E fk − f → 0. 24. A measurable function f on E is said to belong to weak Lp (E), 0 < p < ∞, if there is a constant A ≥ 0 such that ω|f | (α) ≤ Aα−p for all α > 0 (cf. (7.8) in case p = 1): (a) Show that if f ∈ Lp (E), then f belongs to weak Lp (E), but that the converse is generally false.

(b) Show that if 1 < p < r < ∞ and f belongs to both weak L1 (E) and weak Lr (E), then f ∈ Lp (E). (c) Show that if f belongs to weak L1 (E) and f is bounded on E, then f ∈ Lp (E) for all 1 < p < ∞. 25. Give an example to show that the analogue of Theorem 5.8 with the roles of sup and inf interchanged is false. 26. Prove the following variant of Lebesgue’s dominated convergence theo m rem: if fk satisfies fk −→f on E and fk ≤ φ ∈ L(E), then f ∈ L(E) and

111

The Lebesgue Integral

that every subsequence of fk has a subsequence E fk → E f . (Show fkj such that E fkj → E f .) 27. The notion of equimeasurability of functions can be extended to different sets E1 and E2 , even in different dimensions, by saying that two measurable functions f1 , f2 defined on E1 , E2 , respectively, are equimeasurable if x ∈ E1 : f1 (x) > α = y ∈ E2 : f2 (y) > α

for all α.

(a) Show that if f is measurable and finite a.e. in E and ωf is strictly decreasing and continuous, then f and the inverse function of ωf are equimeasurable (on E and (0, |E|), respectively). (b) Let f be measurable and finite a.e. in E, and suppose that ωf is finite. Define f ∗ (t) = inf α > 0 : ωf (α) ≤ t , t > 0. Show that f and f ∗ are equimeasurable on E and (0, ∞), respectively. (The function f ∗ is called the nonincreasing rearrangement of f .) 28. Let E be a measurable set in Rn with |E| < ∞. Suppose that f > 0 a.e. in E and f , log f ∈ L1 (E). Prove that ⎛ ⎞ ⎞1/p 1 1 lim ⎝ f p ⎠ = exp ⎝ log f ⎠ . |E| p→0+ |E| ⎛

E

E

(Start by using Theorem 5.36 to show that E f p → |E| as p → 0+. Note that f p − 1 /p → log f .) 29. Let f be measurable, nonnegative, and finite a.e. in a set E. Prove that for any nonnegative constant c, E

ecf (x) dx = |E| + c

∞

ecα ωf (α) dα.

0

Deduce that ecf ∈ L(E) if |E| < ∞ and there exist constants C1 and c1 such that c1 > c and ωf (α) ≤ C1 e−c1 α for all α > 0. We will study such an exponential integrability property in Section 14.5.

6 Repeated Integration Let f (x, y) be defined in a rectangle I=

x, y : a ≤ x ≤ b, c ≤ y ≤ d .

If f is continuous, we have the classical formula ⎡ ⎤ b d f x, y dxdy = ⎣ f x, y dy⎦ dx, I

a

c

and there is an analogous formula for functions of n variables. Sections 6.1 and 6.2 extend this and related results on repeated integration to the case of Lebesgue integrable functions. Section 6.3 contains some applications.

6.1 Fubini’s Theorem We shall use the following notation. Let x = (x1 , . . ., xn ) be a point of an n-dimensional interval I1 , I1 = {x = (x1 , . . . , xn ) : ai ≤ xi ≤ bi , i = 1, . . . , n} , and let y be a point of an m-dimensional interval I2 , I2 = y = y1 , . . . , ym : cj ≤ yj ≤ dj , j = 1, . . . , m . Here, I1 and I2 may also be partly open or unbounded, such as all of Rn and Rm , respectively. The Cartesian product I = I1 × I2 is contained in Rn+m and consists of points (x1 , . . ., xn , y1 , . . ., ym ). We shall denote such points by (x,y). A function m ) defined in I will be written f (x,y), and its f (x1 , . . ., xn , y1 , . . ., y integral I f will be denoted by I f (x, y) dxdy.

113

114

Measure and Integral: An Introduction to Real Analysis

Theorem 6.1 (Fubini’s Theorem) Let f (x, y) ∈ L(I), I = I1 × I2 . Then (i) For almost every x ∈ I1 , f (x, y) is measurable and integrable on I2 as a function of y; (ii) As a function of x, I2 f (x, y) dy is measurable and integrable on I1 , and ⎡ ⎤ f (x, y) dxdy = ⎣ f (x, y) dy⎦ dx. I

I1

I2

Setting f = 0 outside I, we see that it is enough to prove the theorem when we then drop I1 , I2 , and I from I1 = Rn , I2 = Rm , and I = Rn+m . For simplicity, the notation and write f (x, y) dx for I1 f (x, y) dx, L(dx) for L(I1 ), L(dxdy) for L(I), etc. We will prove the theorem by considering a series of special cases. The first two lemmas below will help in passing from one case to the next. In these lemmas, we say that a function f in L(dxdy) for which Fubini’s theorem is true has property F .

Lemma 6.2 property F .

A finite linear combination of functions with property F has

This follows immediately from Theorems 4.9 and 5.28. Lemma 6.3 Let f1 , f2 , . . ., fk , . . . have property F . If fk f or fk f , and if f ∈ L(dxdy), then f has property F . Proof. We will concentrate on integrability properties, leaving questions of measurability to the reader. Changing signs if necessary, we may assume that a set Zk in Rn with measure zero fk f . For each k, there exists by hypothesis

/ Zk . Let Z = k Zk , so that Z has Rn -measure such that fk (x, y) ∈ L(dy) if x ∈ zero. If x ∈ / Z, then fk (x, y) ∈ L(dy) for all k, and therefore, by the monotone convergence theorem applied to {fk (x, y)} as functions of y, hk (x) = fk (x, y) dy h(x) = f (x, y) dy (x ∈ / Z). By assumption, we have hk (x) ∈ L(dx), fk ∈ L(dxdy), and fk (x, y) dxdy = another application of the monotone convergence thehk (x) dx. Therefore, orem gives f (x, y) dxdy = h(x) dx. Since f ∈ L(dxdy), it follows that h ∈ L(dx), which implies that h is finite a.e. This completes the proof. The next three lemmas prove special cases of Fubini’s theorem.

115

Repeated Integration

Lemma 6.4 If E is a set of type Gδ , namely, E = measure, then χE has property F .

∞

k=1

Gk , and if G1 has finite

Proof. Case 1. Suppose that E is a bounded open interval in Rn+m : E = J1 × J2 , where J1 and J2 are bounded open intervals in Rn and Rm , respectively. Then |E| = |J1 ||J2 |, where |J1 | and |J2 | denote the measures of J1 and J2 in Rn and m R . For every x, χE (x, y) is clearly measurable as a function of y. If h(x) = χE (x, y) dy, then h(x) = |J2| for x ∈ J1 , and h(x) = 0 otherwise. Therefore, h(x) dx = |J1 ||J2 |. But also, χE (x, y) dxdy = |E| = |J1 ||J2 |, and the lemma is proved in this case. Case 2. Suppose that E is any set (of type Gδ or not) on the boundary of m an interval in Rn+m . Then for almost every x, the set {y : (x, y) ∈ E} has R measure zero. Therefore, if h(x) = χE (x, y) dy, it follows that h(x) = 0 a.e. Hence, h(x) dx = 0. But also, χE (x, y) dxdy = |E| = 0. Case 3. Suppose next that E is a partly open interval in Rn+m . Then E is the union of its interior and a subset of its boundary. It follows from cases 1 and 2 and Lemma 6.2 that χE has property F .

Case 4. Let E be an open set in Rn+m with finite measure. Write E = Ij ,

k where the Ij are disjoint, partly open intervals. If Ek = j=1 Ij , then χEk = k j=1 χIj , so that χEk has property F by case 3 and Lemma 6.2. Since χEk χE , χE has property F by Lemma 6.3. Case 5. Let E satisfy the hypothesis of Lemma 6.4. We may assume that Gk E by considering the open sets G1 , G1 ∩ G2 , G1 ∩ G2 ∩ G3 , etc. Then χGk χE , and the lemma follows from case 4 and Lemma 6.3. Lemma 6.5 If Z is a subset of Rn+m with measure zero, then χZ has property F . Hence, for almost every x ∈ Rn , the set {y : (x, y) ∈ Z} has Rm -measure zero. Proof. Using Theorem 3.8, select a set H of type Gδ such that Z ⊂ H and |H| = 0. If H = Gk , we may assume that G1 has finite measure, so that by Lemma 6.4,

χH (x, y) dxdy = 0. χH (x, y) dy dx =

Therefore, by Theorem 5.11, |{y : (x, y) ∈ H}| = χH (x, y) dy = 0 for almost every x. If {y : (x, y) ∈ H) has Rm -measure zero, so does {y : (x, y) ∈ Z} since every x, χZ (x, y) is measurable in y and Z ⊂ H. It follows that for almost y) dy = 0. Hence, [ χ (x, y) dy] dx = 0, which proves the lemma χZ (x, Z since χZ (x, y) dxdy = |Z| = 0.

116

Lemma 6.6 property F .

Measure and Integral: An Introduction to Real Analysis

Let E ⊂ Rn+m . If E is measurable with finite measure, then χE has

Proof. Using Theorem 3.28, write E = H − Z, where H is of type Gδ and Z has measure zero. If H = Gk , choose G1 with finite measure (see the proof of Theorem 3.28). Since χE = χH − χZ , the result follows from Lemmas 6.2, 6.4, and 6.5. Proof of Fubini’s theorem. We must show that every f ∈ L(dxdy) has property F . Since f = f + − f − , we may assume by Lemma 6.2 that f ≥ 0. Then, by Theorem 4.13, there are simple measurable fk f , fk ≥ 0. Each fk ∈ L(dxdy), and by Lemma 6.3, it is enough to show that these have property F . Hence, we may assume that f is simple and integrable, say f = N j=1 vj χEj . Since each Ej for which vj = 0 must have finite measure, the result follows from Lemmas 6.2 and 6.6. If f ∈ L(Rn+m ), then by Fubini’s theorem, f (x,y) is a measurable function of y for almost every x ∈ Rn . We now show that the same conclusion holds if f is merely measurable. Theorem 6.7 Let f (x, y) be a measurable function on Rn+m . Then for almost every x ∈ Rn , f (x, y) is a measurable function of y ∈ Rm . In particular, if E is a measurable subset of Rn+m , then the set Ex = {y : (x, y) ∈ E} is measurable in Rm for almost every x ∈ Rn . Proof. Note that if f is the characteristic function χE of a measurable E ⊂ Rn+m , then the two statements of the theorem are equivalent. To prove the result in this case, write E = H ∪ Z, where H is of type Fσ in Rn+m and |Z|n+m = 0. Then Ex = Hx ∪ Zx , Hx is of type Fσ in Rm , and for almost every x ∈ Rn , |Zx |m = 0 by Lemma 6.5. Therefore, Ex is measurable for almost every x. If f is any measurable function on Rn+m , consider the set E(a) = {(x, y) : f (x, y) > a}. Since E(a) is measurable in Rn+m , the set E(a)x = {y : (x, y) ∈ E(a)} is measurable in Rm for almost every x ∈ Rn . The exceptional set of Rn -measure zero depends on a. The union Z of these exceptional sets for all / Z, then {y : f (x, y) > a} is mearational a still has Rn -measure zero. If x ∈ surable for all rational a and so for all a by Theorem 4.4. This completes the proof. We will now extend Fubini’s theorem to functions defined on measurable subsets of Rn+m .

117

Repeated Integration

Theorem 6.8 Let f (x, y) be a measurable function defined on a measurable subset E of Rn+m , and let Ex = {y : (x, y) ∈ E}.

(i) For almost every x ∈ Rn , f (x, y) is a measurable function of y on Ex . (ii) If f (x, y) ∈ L(E), then for almost every x ∈ Rn , f (x, y) is integrable on Ex with respect to y; moreover, Ex f (x, y) dy is an integrable function of x and

f (x, y) dxdy =

Rn

E

⎡

⎤

⎣ f (x, y) dy⎦ dx. Ex

Proof. Let f¯ be the function equal to f in E and to zero elsewhere in Rn+m . Since f is measurable on E, f¯ is measurable on Rn+m . Therefore, by Theorem 6.7, f¯ (x, y) is a measurable function of y for almost every x ∈ Rn . Since Ex is measurable for almost every x ∈ Rn , it follows that f (x, y) is measurable on almost every Ex . This proves (i). If f ∈ L(E), then f¯ ∈ L(Rn+m ) and

f (x, y) dxdy =

E

Rn+m

f¯ (x, y) dxdy =

Rn

f¯ (x, y) dy dx.

Rm

Since Ex is measurable for almost every x, we obtain by Theorem 5.24 that Rm f¯ (x, y) dy = Ex f (x, y) dy for almost every x ∈ Rn . Part (ii) follows by combining equalities.

6.2 Tonelli’s Theorem By Fubini’s theorem, the finiteness of a multiple integral implies that of the corresponding iterated integrals. The converse is not true, even if all the iterated integrals are equal, as shown by the following example. Example 6.9 Let n = m = 1 and let I be the unit square and {Ik } be the infinite sequence of subsquares shown in the following illustration:

118

Measure and Integral: An Introduction to Real Analysis

1

(1,1) I3 I2

1 2

I1

1 2

1

Subdivide each Ik into four equal subsquares by lines parallel to the x- and y-axes.

Ik(4)

Ik(3)

Ik(1)

Ik(2)

Ik (1)

(3)

For each k, let f = 1/|Ik | on the interiors of Ik and Ik and let f = −1/|Ik |

on the interiors of Ik(2) and Ik(4) . Let f = 0 on the rest of I, that is, outside Ik 1 and on the boundaries of all the subsquares. Clearly, 0 f (x, y) dy = 0 for all 1 x, and 0 f (x, y) dx = 0 for all y. Therefore, ⎡ ⎤ ⎡ ⎤ 1 1 1 1 ⎣ f (x, y) dy⎦ dx = ⎣ f (x, y) dx⎦ dy = 0. 0

0

0

0

However, I

f + (x, y) dxdy =

k

Ik

f + (x, y) dxdy =

1 k

2

= +∞.

119

Repeated Integration

Similarly, I f − dxdy = +∞. Hence, finiteness of the iterated integrals of f does not in general imply either the existence of the multiple integral of f or the finiteness of the multiple integral of | f |. However, for nonnegative f , we have the following basic result.

Theorem 6.10 (Tonelli’s Theorem) Let f (x, y) be nonnegative and measurable on an interval I = I1 ×I2 of Rn+m . Then, for almost every x ∈ I1 , f (x, y) is a measurable function of y on I2 . Moreover, as a function of x, I2 f (x, y) dy is measurable on I1 , and

f (x, y) dxdy =

I

⎤ ⎡ ⎣ f (x, y) dy⎦ dx.

I1

I2

Proof. This is actually a corollary of Fubini’s theorem. For k = 1, 2, . . ., let fk (x, y) = 0 if |(x, y)| > k and fk (x, y) = min {k, f (x, y)} if |(x, y)| ≤ k. Then fk ≥ 0, fk f on I, and fk ∈ L(I) (fk is bounded and vanishes outside a compact set). Hence, Fubini’s theorem applies to each fk . The statement concerning the measurability of I2 f (x, y) dy then follows from its analogue for fk ; in fact, by the monotone convergence theorem, I2 fk (x, y) dy I2 f (x, y) dy. (The measurability of f (x,y) as a function of y was proved in Theorem 6.8.) By the monotone convergence theorem again,

fk (x, y) dxdy →

I

I1

f (x, y) dxdy,

and

I

⎤ ⎤ ⎡ ⎡ ⎣ fk (x, y) dy⎦ dx → ⎣ f (x, y) dy⎦ dx. I2

I1

I2

Since fk ∈ L(I), the left-hand sides in the last two limits are equal. Therefore, so are the right-hand sides, and the theorem follows. An extension of Tonelli’s theorem to functions defined over arbitrary measurable sets E is straightforward. Since the roles of x and y can be interchanged above, it follows that if f is nonnegative and measurable, then I

f (x, y) dxdy =

I1

⎤ ⎤ ⎡ ⎡ ⎣ f (x, y) dy⎦ dx = ⎣ f (x, y) dx⎦ dy. I2

I2

I1

In particular, we obtain the important fact that for f ≥ 0, the finiteness of any one of Fubini’s three integrals implies that of the other two. Hence, for any measurable

120

Measure and Integral: An Introduction to Real Analysis

f , the finiteness of one of these integrals for |f | implies that f is integrable and that all three Fubini integrals of f are equal. An easy consequence of Tonelli’s theorem is that the conclusion

f (x, y) dxdy =

I

I1

⎡ ⎤ ⎣ f (x, y) dy⎦dx I2

of Fubini’s theorem (including the existence and measurability of the inner integral on theright-hand side) holds for measurable f even if I f = ±∞ (i.e., it holds if I f merely exists). In fact, if I f = +∞, we have I f + = +∞ and f − ∈ L(I). By Tonelli’s theorem, I

Since

I

f+ =

I1

⎛ ⎝

⎞ f + dy⎠dx,

I2

I

f− =

I1

⎛ ⎝

⎞ f − dy⎠dx.

I2

f − is finite, the desired formula follows by subtraction.

6.3 Applications of Fubini’s Theorem We shall derive several important results as corollaries of Fubini’s and Tonelli’s theorems. The first one is the necessity of the condition in Theorem 5.1. Using the notation of Chapter 5, we will prove the following result. Theorem 6.11 Let f be a nonnegative function defined on a measurable set E ⊂ Rn . If R(f , E), the region under f over E, is a measurable subset of Rn+1 , then f is measurable. Proof. For 0 ≤ y < +∞, we have {x ∈ E : f (x) ≥ y} = {x : (x, y) ∈ R(f , E)}. Since R( f ,E) is measurable, it follows from Theorem 6.7 that {x ∈ E : f (x) ≥ y} is measurable (in Rn ) for almost all (linear measure) such y. In particular, {x ∈ E : f (x) ≥ y} is measurable for all y in a dense subset of (0, ∞). If y is negative, then {x ∈ E : f (x) ≥ y} = E, which is measurable. We conclude that f is measurable (cf. Theorem 4.4). As a second application of Fubini’s theorem, we will prove a result about the convolution of two functions. More general results of this kind will be proved

121

Repeated Integration

in Chapter 9. If f and g are measurable in Rn , their convolution (f ∗ g)(x) is defined by (f ∗ g)(x) =

f (x − t)g(t) dt,

Rn

provided the integral exists. We first claim that f ∗ g = g ∗ f , that is, that

f (x − t)g(t) dt =

Rn

f (t)g(x − t) dt.

(6.12)

Rn

This is actually a special case of results dealing with changes of variable. In this simple case, however, it amounts to the statement that if x ∈ Rn , then

r(t) dt =

Rn

r(x − t) dt

(6.13)

Rn

when r(t) = f (x − t)g(t). For fixed x, x – t ranges over Rn as t does. Therefore, for any measurable r ≥ 0, (6.13) follows from the geometric interpretation of the integral. (See Theorem 5.1 and the definition of the integral of a nonnegative function.) For any measurable r, it follows by writing r = r+ − r− . (For the effect of a linear change of variables, see Exercise 20 of Chapter 5, and see Exercise 18 of Chapter 3 for the effect of translations. Measurability of r(x − t) as a function of t can then be deduced from that of r(t).) The result we wish to prove for convolutions is the following. Theorem 6.14 If f ∈ L(Rn ) and g ∈ L(Rn ), then (f ∗ g)(x) exists for almost every x ∈ Rn and is measurable. Moreover, f ∗ g ∈ L(Rn ) and

| f ∗ g| dx ≤

Rn

Rn

|f | dx

Rn

(f ∗ g) dx =

f dx

Rn

Rn

|g| dx , g dx .

Rn

In order to prove this, we need a lemma. Lemma 6.15 If f (x) is measurable in Rn , then the function F(x, t) = f (x − t) is measurable in Rn × Rn = R2n .

122

Measure and Integral: An Introduction to Real Analysis

Proof. Let F1 (x, t) = f (x). Since f is measurable, it follows as in Lemma 5.2 (or by n-fold iteration of the conclusion of Lemma 5.2) that F1 (x, t) is measurable in R2n : in fact, the set {(x, t) : F1 (x, t) > a}, which equals {(x, t) : f (x) > a, t ∈ Rn }, is a cylinder type set with measurable base {x : f (x) > a) in Rn . For (ξ, η) ∈ R2n , consider the transformation x = ξ − η, t = ξ + η. This is a nonsingular linear transformation of R2n , and therefore, by Theorem 3.33 (see Exercise 4 of Chapter 4), the function F2 defined by F2 (ξ, η) = F1 (ξ −η, ξ +η) is measurable in R2n . Since F2 (ξ, η) = f (ξ − η), the lemma follows. Proof of Theorem 6.14 Suppose first that both f and g are nonnegative on Rn . By Lemma 6.15, f (x − t)g(t) is measurable on Rn × Rn since it is the product of two such functions. Hence, the integral I=

f (x − t)g(t) dtdx

is well defined. By Tonelli’s theorem and (6.13), f (x − t)g(t) dt dx

= g(t) f (x − t) dx dt = f (x) dx g(t) dt .

I=

The first of these equations can be written I = (f ∗ g)(x) dx, where measurability of f ∗ g is guaranteed by Tonelli’s theorem, so that

f (x) dx g(x) dx . f ∗ g (x) dx = This proves the theorem for f ≥ 0 and g ≥ 0. For general f , g ∈ L(Rn ), it follows that

| f (x − t)| |g(t)| dtdx =

g dx < ∞. f dx f ∗ g dx =

Hence, f (x − t)g(t) ∈ L(dtdx). By Fubini’s theorem, a.e. x and is measurable and integrable; also,

f (x − t)g(t) dtdx =

f (x − t)g(t) dt exists for

f (x − t)g(t) dt dx = f (x) dx g(x) dx .

In particular, f ∗ g is well-defined a.e. and measurable (even integrable), and

(f ∗ g) =

f g .

123

Repeated Integration

Finally, since | f ∗g| ≤ | f |∗|g| wherever f ∗g exists, we obtain by integration that

| f ∗ g| ≤

(| f | ∗ |g|) =

|f|

|g| ,

and the proof is complete. See Exercise 21 of Chapter 9 for a criterion which guarantees measurability of f ∗ g. In the proof of Theorem 6.14, we showed the following useful fact. Corollary 6.16 If f and g are nonnegative and measurable on Rn , then f ∗ g is measurable on Rn and Rn

f ∗ g dx =

Rn

f dx

g dx .

Rn

Our final application of Fubini’s theorem is an important result due to Marcinkiewicz concerning the structure of closed sets. For simplicity, we will restrict our attention to the one-dimensional case. Given a closed subset F of R1 and a point x, let δ (x) = δ (x, F) = min x − y : y ∈ F denote the distance of x from F. Thus, δ(x)

= 0 if and only if x ∈ F. By Theorem 1.10, the complement of F is a union k (ak , bk ) of disjoint open intervals. At most, two of these intervals can be infinite. The graph of δ(x) is thus an irregular sawtooth curve: over any finite interval [ak , bk ], the graph is the sides of the isosceles triangle with base [ak , bk ] and altitude 12 (bk − ak ) ; outside the terminal points of F, the graph is linear. If we move from a point x to a point y, the distance from F cannot increase by more than |x − y|. Hence, δ (x) − δ y ≤ x − y ; that is, δ satisfies a Lipschitz condition. We shall prove the following theorem. (See also Exercises 7 through 9 and Theorem 9.19.) The result will be used in the proof of Theorem 12.67.

Theorem 6.17 (Marcinkiewicz) Let F be a closed subset of a bounded open interval (a, b), and let δ(x) = δ(x, F) be the corresponding distance function. Then, given λ > 0, the integral

124

Measure and Integral: An Introduction to Real Analysis

Mλ (x) = Mλ (x; F) =

b a

δλ y dy x − y1+λ

is finite a.e. in F. Moreover, Mλ ∈ L(F) and

Mλ ≤ 2λ−1 |G| ,

F

where G = (a, b) − F. Before the proof, we note that what makes the finiteness of Mλ (x) remarkable is the singular behavior of δλ y /|x − y|1+λ as y → x. Since δ(y) → δ(x) / F (see Exercise 9). If x ∈ F, then as y → x, it follows that Mλ (x) = +∞ if x ∈ δ(x) = 0, but the mere Lipschitz character of δ in the estimate δ y − δ (x)λ x − yλ δλ y 1 1+λ = 1+λ ≤ 1+λ = x − y x − y x − y x − y b is not enough to imply that Mλ (x) is finite since a dy/|x − y| = +∞. The key to the convergence of Mλ at a point x is the fact that δ(y) vanishes not only at x but also at every y ∈ F. Thus, roughly speaking, the finiteness of Mλ (x) means that F is very dense near x. In this regard, see also Exercise 9(b). Proof. Measurability of Mλ follows from Corollary 6.16. Since δ = 0 in F, integration in the integral defining Mλ can be restricted to the set G = (a, b)−F without changing Mλ . Thus,

Mλ (x) dx =

F

G

dx δ y dy, x − y1+λ λ

F

the change in the order of integration being justified by Tonelli’s theorem. To estimate the inner integral, fix y ∈ G and note that for any x ∈ F, we have |x − y| ≥ δ(y) > 0. Thus, F

dx ≤ x − y1+λ

∞ dt −λ dx −1 = 2 = 2λ δ y . 1+λ t1+λ x − y δ(y) |x−y|≥δ(y)

125

Repeated Integration

In particular,

Mλ (x) dx ≤

F

δ

λ

−1 −λ dy = 2λ−1 |G| < +∞. y 2λ δ y

G

Exercises 1. (a) Let E be a measurable subset of R2 such that for almost every x ∈ R1 , {y: (x, y) ∈ E) has R1 -measure zero. Show that E has measure zero and that for almost every y ∈ R1 , {x: (x, y) ∈ E} has measure zero. (b) Let f (x, y) be nonnegative and measurable in R2 . Suppose that for almost every x ∈ Rl , f (x, y) is finite for almost every y. Show that for almost every y ∈ Rl , f (x, y) is finite for almost every x. 2. If f and g are measurable in Rn , show that the function h(x, y) = f (x)g(y) is measurable in Rn ×Rn . Deduce that if E1 and E2 are measurable subsets of Rn , then their Cartesian product E1 × E2 = {(x, y) : x ∈ E1 , y ∈ E2 } is measurable in Rn × Rn , and |E1 × E2 | = |E1 ||E2 |. As usual in measure theory, 0 · ∞ and ∞ · 0 are interpreted as 0. 3. Let f be measurable and finite a.e. on [0,1]. If f (x) − f (y) is integrable over the square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, show that f ∈ L[0, 1]. 4. Let f be measurable and periodic with period 1: f (t + 1) = f (t). Suppose that there is a finite c such that 1 f (a + t) − f (b + t) dt ≤ c 0

for all a and b. Show that f ∈ L[0,1]. (Set a = x, b = −x, integrate with respect to x, and make the change of variables ξ = x + t , η = −x + t.) 5. (a) If f is nonnegative and measurable on E and ω(y) = |{x ∈∞E : f(x) > y}|, y > 0, use Tonelli’s theorem to prove that E f = 0 ω y dy. (By definition of the integral, E f = R f , E = R(f ,E) dxdy. Use the observation in the proof of Theorem 6.11 that {x ∈ E : f (x) ≥ y} = {x : (x, y) ∈ R(f , E)}, and recall that ω(y) = |{x ∈ E : f (x) ≥ y}| unless y is a point of discontinuity of ω.) (b) Deduce from this special case the general formula E

fp = p

∞ 0

yp−1 ω y dy

f ≥ 0, 0 < p < ∞ .

126

Measure and Integral: An Introduction to Real Analysis

6. For f ∈ L(R1 ), define the Fourier transform f of f by x ∈ R1 .

+∞ 1 f (x) = f (t)e−ixt dt 2π −∞

(For a complex-valued function F = F0 + iF1 whose real and imaginary parts F0 and F1 are integrable, we define F = F0 + i F1 .) Show that if f and g belong to L(R1 ), then (f ∗ g)(x) = 2π f (x) g(x). 7. Let F be a closed subset of R1 and let δ(x) = δ(x, F) be the corresponding distance function. If λ > 0 and f is nonnegative and integrable over the complement of F, prove that the function δλ y f y dy x − y1+λ 1 R is integrable over F and so is finite a.e. in F. (In case f = χ(a,b) , this reduces to Theorem 6.17.) 8. Under the hypotheses of Theorem 6.17 and assuming that b−a < 1, prove that the function M0 (x) =

b log a

1 δ(y)

−1

|x − y|−1 dy

is finite a.e. in F. 9. (a) Show that Mλ (x; F) = +∞ if x ∈ / F, λ > 0. (b) Let F = [c, d] be a closed subinterval of a bounded open interval (a, b) ⊂ R1 , and let Mλ be the corresponding Marcinkiewicz integral, λ > 0. Show that Mλ is finite for every x ∈ (c, d) and that Mλ (c) = Mλ (d) = ∞. Show also that F Mλ ≤ λ−1 |G|, where G = (a, b) − [c, d]. 10. Let vn be the volume of the unit ball in Rn . Show by using Fubini’s theorem that vn = 2vn−1

1

1 − t2

(n−1)/2

dt.

0

(We also observe that by setting w = t2 , the integral is a multiple of a classical β-function and so can be expressed in terms of the -function: ∞ (s) = 0 e−t ts−1 dt, s > 0.)

127

Repeated Integration

11. Use Fubini’s theorem to prove that

2

e−|x| dx = πn/2 .

Rn

(For n = 1, write

+ ∞ −∞

e−x dx 2

2

=

+∞ + ∞

−x2 −y2 dxdy and use polar −∞ e 2 2 2 e−|x| = e−x1 . . . e−xn and Fubini’s

−∞

coordinates. For n > 1, use the formula theorem to reduce to the case n = 1.) 12. (a) Give an example that shows that the projection onto the x-axis of a measurable subset of the plane may not be linearly measurable. (b) Show that if E is either an open or closed set in the plane, then its projection onto the x-axis is linearly measurable. (For (b), the projection of an open set is open, and the projection of a compact set is compact. Any closed set is a countable union of compact sets.) 13. Let f ∈ L(−∞, ∞), and let h > 0 be fixed. Prove that ⎞ x+h ∞ 1 ⎠ ⎝ f (y) dy dx = f (x) dx. 2h −∞ −∞ ∞

⎛

x−h

7 Differentiation The main results in this chapter deal with questions of differentiability. A variety of topics is considered, but for the most part, the results are related to the analogue for Lebesgue integrals of the fundamental theorem of calculus and to the differentiability a.e. of functions that are Lipschitz continuous.

7.1 The Indefinite Integral If f is a Riemann integrable function on an interval [a, b] in R1 , then the familiar definition of its indefinite integral is

F(x) =

x

f (y) dy,

a ≤ x ≤ b.

a

The fundamental theorem of calculus asserts that F = f if f is continuous. We will study an analogue of this result for Lebesgue integrable f and higher dimensions. We must first find an appropriate definition of the indefinite integral. In two dimensions, for example, we might choose

F (x1 , x2 ) =

x1 x2 f y1 , y2 dy1 dy2 . a1 a2

It turns out, however, to be better to abandon the notion that the indefinite integral be a function of point and adopt the idea that it be a function of set. Thus, given f ∈ L(A), where A is a measurable subset of Rn , we define the indefinite integral of f to be the function F(E) =

f,

E

where E is any measurable subset of A. 129

130

Measure and Integral: An Introduction to Real Analysis

F is an example of a set function, by which we mean any real-valued function F defined on a σ-algebra of measurable sets such that (i) F(E) is finite for every E ∈ . (ii) F is countably additive; that is, if E = k Ek is a union of disjoint Ek ∈ , then F(E) =

F (Ek ).

k

By Theorems 5.5 and 5.24, the indefinite integral of an f ∈ L(A) satisfies (i) and (ii) for the σ-algebra of measurable subsets of A. We shall systematically study set functions in Chapter 10. We now discuss a continuity property of the indefinite integral. Recall (from p. 5 in Section 1.3) that the diameter of a set E is the value sup{|x − y| : x, y ∈ E}. A set function F(E) is called continuous if F(E) tends to zero as the diameter of E tends to zero; that is, F(E) is continuous if, given ε > 0, there exists δ > 0 such that |F(E)| < ε whenever the diameter of E is less than δ. An example of a set function that is not continuous can be obtained by setting F(E) = 1 for any measurable E that contains the origin, and F(E) = 0 otherwise. A set function F is called absolutely continuous with respect to Lebesgue measure, or simply absolutely continuous, if F(E) tends to zero as the measure of E tends to zero. Thus, F is absolutely continuous if given ε > 0, there exists δ > 0 such that |F(E)| < ε whenever the measure of E is less than δ. A set function that is absolutely continuous is clearly continuous. The converse, however, is false, as shown by the following example. Let A be the unit square in R2 , let D be a diagonal of A, and consider the σ-algebra of measurable subsets E of A for which E ∩ D is linearly measurable. For such E, let F(E) be the linear measure of E ∩ D. Then F is a continuous set function. However, it is not absolutely continuous since there are sets E containing a fixed segment of D whose R2 -measures are arbitrarily small.

Theorem 7.1

If f ∈ L(A), its indefinite integral is absolutely continuous.

Proof. We may assume that f ≥ 0 by considering f + and f − . Fix k and write f = g + h, where g = f whenever f ≤ k and g = k otherwise. Given ε > 0, we may choose k so large that 0 ≤ A h < 12 ε and, a fortiori, 0 ≤ E h < 12 ε for every measurable E ⊂ A. On the other hand, since 0 ≤ g ≤ k, we have 0 ≤ E g ≤ k|E| < 12 ε if |E| is small enough. Thus,

131

Differentiation

0≤

f =

E

g+

E

h

0, choose k0 so that f − fk0 0, choose an open G such that E ⊂ G and |G − E| < ε. Then |χG − χE | = |G − E| < ε, so we may assume that f = χG for an open G with |G| < +∞. Using Theorem 1.11, write G = Ik , where the Ik are disjoint, partly open intervals. If we let fN be the characteristic function of N k=1 Ik , we obtain ∞ f − fN = |Ik | → 0 k=N+1

since ∞ k=1 |Ik | = |G| < +∞. By (2), it is thus enough to show that each fN has property A . But fN is the sum of χIk , k = 1, . . . , N, so it suffices by (1) to show

134

Measure and Integral: An Introduction to Real Analysis

that the characteristic function of any partly open interval I has property A . This is practically self-evident: if I∗ denotes an interval that contains the closure of I in its interior and that satisfies |I∗ − I| < ε, then for any continuous C, 0 ≤ C ≤ 1, which is 1 in I and 0 outside I∗ , we have |χI − C| ≤ I∗ − I < ε. This completes the proof of Lemma 7.3. The proof also shows that the functions {Ck } can be chosen to be finite linear combinations of characteristic functions of intervals (i.e., step functions) instead of continuous functions. The lemma that follows is a preliminary version of a covering lemma due to Vitali (Theorem 7.17) and has many applications. Lemma 7.4 (Simple Vitali Lemma) Let E be a subset of Rn with |E|e < +∞, and let K be a collection of cubes Q covering E. Then there exist a positive constant β, depending only on n, and a finite number of disjoint cubes Q1 , . . . , QN in K such that N Qj ≥ β|E|e . j=1

Proof. We will index the size of a cube Q ∈ K by writing Q = Q(t), where t is the edge length of Q. Let K1 = K and t∗1 = sup {t : Q = Q(t) ∈ K1 } . If t∗1 = +∞, then K1 contains a sequence of cubes Q with |Q| → +∞. In this case, given β > 0, we simply choose one Q ∈ K1 with |Q| ≥ β|E|e . If t∗1 < +∞, the idea is still to pick a relatively large cube: choose Q1 = Q1 (t1 ) ∈ K1 such that t1 > 12 t∗1 . Now split K1 = K2 ∪ K2 , where K2 consists of those cubes in K1 that are disjoint from Q1 , and K2 of those that intersect Q1 . Let Q∗1 denote the cube concentric with Q1 whose edge length is 5t1 . Thus, Q∗1 = 5n |Q1 |, and since 2t1 > t∗1 , every cube in K2 is contained in Q∗1 . Starting with j = 2, continue this selection process for j = 2, 3, . . . , by letting t∗j = sup t : Q = Q(t) ∈ Kj , , choosing a cube Qj = Qj tj ∈ Kj with tj > 12 t∗j , and splitting Kj = Kj+1 ∪ Kj+1 where Kj+1 consists of all those cubes of Kj that are disjoint from Qj . If Kj+1 is empty, the process ends. We have t∗j ≥ t∗j+1 ; moreover, for each j, the Q1 , . . . , Qj

135

Differentiation

are disjoint from one another and from every cube in Kj+1 , and every cube in is contained in the cube Q∗j concentric with Qj whose edge length is 5tj . Kj+1 Note that Q∗j = 5n Qj . Consider the sequence t∗1 ≥ t∗2 ≥ · · · . If some KN+1 is empty (i.e., if t∗j = 0 for j ≥ N + 1), then since K1 = K2 ∪ K2 = · · · = KN+1 ∪ KN + 1 ∪ · · · ∪ K2 ,

and E is covered by the cubes in K1 , it follows that E is covered by the cubes ∗ ∪ · · · ∪ K2 . Hence, E ⊂ N in KN+1 j=1 Qj , so that |E|e ≤

N N ∗ Qj . Qj = 5n j=1

j=1

This proves the lemma with β = 5−n . On the other hand, if no t∗j is zero, then either there exists a δ > 0 such that

t∗j ≥ δ for all j, or t∗j → 0. In the first case, tj ≥ 12 δ for all j and, therefore, N j=1 Qj → +∞ as N → ∞. Given any β > 0, the lemma follows in this case by choosing N sufficiently large. Finally, if t∗j → 0, we claim that every cube in K1 is contained in j Q∗j . Otherwise, there would be a cube Q = Q(t) not intersecting any Qj . Since this cube would belong to every Kj , t would satisfy t ≤ t∗j for every j and, therefore, t = 0. This contradiction establishes the claim. Since E is covered by the cubes in K1 , it follows that |E|e ≤

Qj . Q∗j = 5n j

j

Hence, given β with 0 < β < 5−n , there exists an N such that This completes the proof.

N j=1 Qj ≥ β|E|e .

We stress that Lemma 7.4 does not presuppose the measurability of E and that the proof can be shortened if E is measurable. In fact, if E is measurable, we can suppose it is closed and bounded (see, e.g., Lemma 3.22). Hence, assuming as we may that the cubes in K are open (by slightly enlarging each cube concentrically), it follows from the Heine–Borel Theorem 1.12 that E can be covered by a finite number of cubes. For Q1 , we then choose the largest cube; similarly, in subsequent steps, we take Qj to be the largest cube disjoint from Q1 , . . . , Qj−1 . Thus, E ⊂ Q∗j , and the lemma follows. See Exercise 18 for a more set-theoretic version of Lemma 7.4.

136

Measure and Integral: An Introduction to Real Analysis

Before stating the final lemma, we make a definition and a few remarks. If f is defined on Rn and integrable over every cube Q, let f ∗ (x) = sup

1 |f (y)| dy, |Q|

(7.5)

Q

where the supremum is taken over all Q with edges parallel to the coordinate axes and center x. The function f ∗ , called the Hardy–Littlewood maximal function of f, is a gauge of the size of the averages of | f | around x. It clearly satisfies the following: (i)

0 ≤ f ∗ (x) ≤ +∞,

(ii)

(f + g)∗ (x) ≤ f ∗ (x) + g∗ (x),

(iii)

∗

(7.6)

∗

(cf ) (x) = |c| f (x).

If f ∗ (x0 ) > α for some x0 ∈ Rn and α > 0, it follows from the absolute continuity of the indefinite integral that f ∗ (x) > α for all x near x0 . Hence, according to Theorem 4.14, f ∗ is lower semicontinuous in Rn . In particular, it is measurable. We now investigate the size of f ∗ . For any measurable E, χ∗E (x)

|E ∩ Q| : Q has center x . = sup |Q|

If E is bounded and Qx denotes the smallest cube with center x containing E, then |E ∩ Qx | |E| = x . |Qx | |Q | It follows that there are positive constants c1 and c2 such that c1

|E| |E| ≤ χ∗E (x) ≤ c2 n n |x| |x|

for large |x|.

(7.7)

In particular, if |E| > 0, χ∗E is not integrable over Rn . We leave it as an exercise to show that for any measurable f that is different from zero on a set of positive measure, there is a positive constant c such that f ∗ (x) ≥

c for |x| ≥ 1. |x|n

Therefore, f ∗ is never integrable over {x : |x| ≥ 1} unless f = 0 a.e. This failure of integrability of f ∗ is due to its size when |x| is large. On the other hand,

137

Differentiation

f ∗ is not integrable over bounded sets for some f ∈ L (Rn ). For example, in case n = 1, the function f (x) =

1 |x| (log |x|)2

χ{x:|x| 1, or even if |f | 1 + log+ |f | ∈ L1 (Rn ); see Theorem 9.16 and Exercise 22 of Chapter 9. To find a way to estimate the size of f ∗ when f ∈ L (Rn ), recall that by Tchebyshev’s inequality, x ∈ Rn : f (x) > α ≤ 1 |f (x)| dx, α n

α > 0.

R

Hence, if f ∈ L (Rn ), there is a constant c independent of α such that

c |x ∈ Rn : |f (x)| > α | ≤ , α

α > 0.

(7.8)

Any measurable f , integrable or not, for which (7.8) is valid is said to belong to weak L(Rn ). Thus, any function in L (Rn ) is also in weak L (Rn ). The function |x|−n is an example of a function in weak L (Rn ), which is not in L (Rn ) (see also Exercise 24 in Chapter 5). Lemma 7.9 (Hardy–Littlewood) If f ∈ L (Rn ), then f ∗ belongs to weak L (Rn ). Moreover, there is a constant c independent of f and α such that x ∈ Rn : f ∗ (x) > α ≤ c |f |, α n R

Proof. Fix α > 0 and let E = f∗ > α .

α > 0.

138

Measure and Integral: An Introduction to Real Analysis

If x ∈ E, then by the definitions of E and f ∗ , there is a cube Qx with center x such that |Qx |−1 Qx |f | > α. Equivalently, |Qx |

0 (depending only on n) and a finite number (k)

⊂ E such that the cubes Qx(k) are disjoint in j (for each k) and j j −1 |Ek | < β j Qx(k) . Therefore, j

of points xj

|Ek |

0, let Eε be the set on which the left side of (7.10) exceeds ε. By (7.10), ε ∗ ε ∪ x : f (x) − Ck (x) > . Eε ⊂ x : f − Ck (x) > 2 2 Applying Lemma 7.9 to the first set on the right and Tchebyshev’s inequality to the second, we obtain |Eε |e ≤ c

ε −1 −1 f − Ck + ε f − Ck . 2 2 n n R

R

Since c is independent of k, it follows by letting k → ∞ that |E ε |e = 0. Let E be the set where the left side of (7.10) is positive. Then E = k Eεk for any sequence εk → 0, and therefore |E| = 0. This means that limQx F(Q)/|Q| exists and equals f (x) for almost every x, which completes the proof. We now list several extensions and corollaries of Lebesgue’s theorem. (I) A measurable function f defined on Rn is said to be locally integrable on if it is integrable over every bounded measurable subset of Rn . This is easily seen to be equivalent to the integrability of f over every compact set.

Rn

Theorem 7.11 The conclusion of Lebesgue’s theorem is valid if, instead of being integrable, f is locally integrable on Rn . Proof. It is enough to show that the conclusion holds a.e. in every open ball. Fix a ball and replace f by zero outside it. This new function is integrable over Rn , its integral is differentiable a.e., and since differentiability is a local property, the initial function f is differentiable a.e. in the ball. This completes the proof. (II) For any measurable E, note that |E ∩ Q| 1 . χE = |Q| |Q| Q

140

Measure and Integral: An Introduction to Real Analysis

By Theorem 7.11, the left-hand side tends to χE (x) a.e. as Q x; that is, |E ∩ Q| = χE (x) |Q| Qx lim

a.e.

(7.12)

A point x for which this limit is 1 is called a point of density of E, and a point for which it is zero is called a point of dispersion of E. Since |Q ∩ E| |Q ∩ CE| |Q| + = = 1, |Q| |Q| |Q| every point of density of E is a point of dispersion of CE, and vice versa. Formula (7.12) can be restated as follows.

Theorem 7.13 of density of E.

Let E be a measurable set. Then almost every point of E is a point

Thus, roughly speaking, a set clusters around almost all of its points. (III) The formula limQx (1/|Q|) Q f (y)dy = f (x) can be written 1 [f (y) − f (x)] dy = 0 Qx |Q| lim

Q

and is valid for almost every x if f is locally integrable. A point x at which the stronger statement 1 | f (y) − f (x)| dy = 0 Qx |Q| lim

(7.14)

Q

is valid is called a Lebesgue point of f, and the collection of all such points is called the Lebesgue set of f. Theorem 7.15 Let f be locally integrable in Rn . Then almost every point of Rn is a Lebesgue point of f; that is, there exists a set Z (depending on f) of measure zero such that (7.14) holds for x ∈ / Z. Proof. Let {rk } be the rational numbers, and let Zk be the set where the formula 1 f (y) − rk dy = f (x) − rk Qx |Q| lim

Q

141

Differentiation

is not valid. Since f (y) − rk is locally integrable, we have |Zk | = 0. Let Z = Zk ; then |Z| = 0. For any Q, x, and rk , 1 1 1 f (y) − rk dy + f (x) − rk dy |f (y) − f (x)| dy ≤ |Q| |Q| |Q| Q

Q

Q

1 f (y) − rk dy + f (x) − rk . = |Q| Q

Therefore, if x ∈ / Z, lim sup Qx

1 |f (y) − f (x)| dy ≤ 2|f (x) − rk | |Q| Q

for every rk . For an x at which (in particular, almost everywhere), f (x) is finite we can choose rk such that f (x) − rk is arbitrarily small. This shows that the left side of the last formula is zero a.e. and completes the proof. (IV) So far, the sets contracting to x have been cubes centered at x with edges parallel to the coordinate axes. Many other sets can be used. A family {S} of measurable sets is said to shrink regularly to x provided (i) The diameters of the sets S tend to zero. (ii) If Q is the smallest cube with center x containing S, there is a constant k independent of S such that |Q| ≤ k|S|. The sets S need not contain x. Theorem 7.16 Let f be locally integrable in Rn . Then at every point x of the Lebesgue set of f (in particular, almost everywhere), 1 |f (y) − f (x)| dy → 0 |S| S

for any family {S} that shrinks regularly to x. Thus, also 1 f (y) dy → f (x) |S| S

a.e.

142

Measure and Integral: An Introduction to Real Analysis

Proof. If S ⊂ Q, we have |f (y) − f (x)| dy ≤ |f (y) − f (x)| dy. S

Q

Hence, if {S} shrinks regularly to x and Q is the least cube with center x containing S, then 1 |Q| 1 |f (y) − f (x)| dy ≤ |f (y) − f (x)| dy |S| |S| |Q| S

Q

1 |f (y) − f (x)| dy. ≤k |Q| Q

If x is a Lebesgue point of f , the last expression tends to zero, and the theorem follows. In particular, for functions of a single variable, we obtain (cf. p. 131 in Section 7.2) x+h 1 f (y) dy = f (x) lim h→0 h x

a.e.

7.3 Vitali Covering Lemma The theorem that follows is a refinement of the simple Vitali Lemma 7.4. Given a set and a collection of cubes, we will now assume that each point of the set is covered not just by a single cube in the collection but by a sequence of cubes in the collection with diameters tending to zero. In this case, it turns out that we can cover almost all points of the set by a sequence of disjoint cubes in the collection. The result will be essentially a corollary of Lemma 7.4. A family K of cubes is said to cover a set E in the Vitali sense if for every x ∈ E and η > 0, there is a cube in K containing x whose diameter is less than η.

Theorem 7.17 (Vitali Covering Lemma) Suppose that E is covered in the Vitali sense by afamily K of cubes and that 0 < |E|e < +∞. Then, given ε > 0, there is a sequence Qj of disjoint cubes in K such that E − = 0 and Qj < (1 + ε)|E|e . Q j j j

143

Differentiation

Proof. The second relation is automatically satisfied if we choose an open set G containing E with |G| < (1 + ε)|E|e and consider only those Q in K which lie in G. By Lemma 7.4, there exist a constant β, 0 < β < 1, depending only on N 1 |Qj | > β|E|e . the dimension, and disjoint Q1 , . . . , QN1 in K such that j=1 Therefore, N1 N1 N1 E − Qj < |E|e (1 + ε − β). G − = |G| − Q ≤ Q j j j=1 j=1 j=1 e

Hence, by considering from the start only those ε with 0 < ε < β/2, we have N1 β E − . Qj < |E|e 1 − 2 j=1 e

Thus, the part of E not covered by the cubes obtained from the simple Vitali lemma has outer measure less than |E|e (1 − β/2). We now repeat the proN1 cess for the set E1 = E − j=1 Qj , which is still covered in the Vitali sense by those cubes in K that are disjoint from Q1 , . . . , QN1 . We obtain QN1 +1 , . . . , QN2 , disjoint from each other and from Q1 , . . . , QN1 , such that N2 N2 β β 2 E − < |E|e 1 − Qj = E1 − Qj < |E1 |e 1 − . 2 2 j=1 j=N1 +1 e

e

Continuing in this way, we obtain at the mth stage disjoint Q1 , . . . , QNm in K such that Nm β m E − 1 − Q < |E| . e j 2 j=1 e

Since (1 − β/2)m → 0 as m → ∞, there is a sequence of disjoint cubes in K with the desired properties.

Corollary 7.18 Suppose that E is covered in the Vitali sense by a family K of cubes and 0 < |E|e < +∞. Then, given ε > 0, there is a finite collection Q1 , . . . , QN of disjoint cubes in K such that

144

Measure and Integral: An Introduction to Real Analysis N E − Qj < ε j=1

N Qj < (1 + ε)|E|e .

and

j=1

e

This is part of the proof of Vitali’s lemma. Note by Carathéodory’s theorem that N N |E|e = E − Qj + E ∩ Qj , j=1 j=1 e

so that for E and Qj

N j=1

e

as in Corollary 7.18, we have N Qj . |E|e − ε < E ∩ j=1

(7.19)

e

In particular, N Qj . |E|e − ε

D4 f (x) has measure zero. A similar argument will apply to any two Dini numbers of f . It is enough to show that each set Ar,s = x ∈ (a, b) : D1 f (x) > r > s > D4 f (x) has measure zero since the original set is the union of these over rational r and s. We may assume r, s > 0 since f is increasing. Fix r and s with r > s > 0, write A = Ar,s , and suppose that |A|e > 0. If x ∈ A, the fact that D4 f (x) < s implies the existence of arbitrarily small h > 0 such that f (x − h) − f (x) < s. −h 7.18 and (7.19), given ε > 0, there exist disjoint intervals By Corollary xj − hj , xj , j = 1, . . . , N, such that (i) f xj − f xj − hj < shj , j = 1, . . . , N, (ii) A ∩ N x − h , x > |A|e − ε, j j j j=1 e N h < (1 + ε)|A| . (iii) e j=1 j Combining (i) and (iii), we obtain (iv)

N j=1 f xj − f xj − hj < s(1 + ε)|A|e .

Let B = A ∩ N j=1 xj −hj , xj . By(ii), |B|e > |A|e − ε. For every y ∈ B that is not an endpoint of some xj − hj , xj , the fact that D1 f (y) > r implies that there

146

Measure and Integral: An Introduction to Real Analysis

exist arbitrarily small k > 0 such that [y, y + k] lies in some xj − hj , xj and f (y + k) − f (y) > r. k Hence, by Corollary 7.18 and (7.20), there exist disjoint [yi , yi + ki ], i = 1, . . . , M, such that (v) Each [yi , yi + ki ] lies in some xj − hj , xj , (vi) f yi + ki − f yi > rki , i = 1, . . . , M, M (vii) i=1 ki > |B|e − ε > |A|e − 2ε [by (ii)]. Therefore, (viii)

M i = 1 f yi + ki − f yi > r (|A|e − 2ε).

Since f is increasing, it follows from (v) that M N f yi + ki − f yi ≤ f xj − f xj − hj . i=1

j=1

Combining this inequality with (iv) and (viii), we obtain r (|A|e − 2ε) < s (1 + ε)|A|e . Since ε > 0 is arbitrary, this gives r ≤ s, which is a contradiction. Hence, |A|e = 0. Since an analogous argument applies to any two Dini numbers, it follows that f (x) exists for almost every x in (a, b). Extend the definition of f to (a, ∞) by setting f (x) = f (b−) for x ≥ b, and let f (x + h) − f (x) , h

fk (x) =

h=

1 , k

for x ∈ (a, b) and k = 1, 2, . . . . Each fk is nonnegative and measurable, and fk → f a.e. in (a, b). Hence, f is measurable on (a, b), and by Fatou’s lemma, b a

f ≤ lim inf k→∞

b

fk .

a

If f (b−) is finite (otherwise, (7.22) is obvious), we have b

fk =

a

b+h b 1 1 f− f h ha a+h

=

b+h a+h a+h 1 1 1 f− f = f (b−) − f. h h a h a b

147

Differentiation

Since f (a+) ≤ (1/h) b a

a+h a

f ≤ f (a + h), we obtain

f ≤ lim inf k→∞

b

fk = lim

a

k→∞

b

fk = f (b−) − f (a+),

a

which proves (7.22). It remains to show that f is finite a.e. in (a, b). This follows from (7.22) if f (b−) − f (a+) < ∞ since then f ∈ L(a, b). Since f is finite on (a, b), it follows in any case by applying (7.22) to a sequence of intervals (ak , bk ) increasing to (a, b), a < ak < bk < b, and the proof is complete. The inequality in formula (7.22) cannot in general be replaced by equality, even if f is continuous on [a, b]. To see this, let f be the Cantor–Lebesgue function on [0, 1] (p. 43 in Section 3.1). Then f is continuous and monotone increasing on [0, 1], f (0) = 0, f (1) = 1, and since f is constant on every inter1 val removed in constructing the Cantor set, f = 0 a.e. Thus, 0 f = 0, while f (1) − f (0) = 1. We shall return in Theorem 7.29 to the question of equality in (7.22). By Corollary 2.7, any function of bounded variation can be written as the difference of two bounded monotone increasing functions. Hence, we obtain the following result (see also Exercise 9). Corollary 7.23 and f ∈ L[a, b].

If f is of bounded variation on [a, b], then f exists a.e. in [a, b],

If f is of bounded variation on [a, b] and V(x) denotes its (total) variation on [a, x], a ≤ x ≤ b, then V is monotone increasing by Theorem 2.2. The next result gives an important relation between V and f . Theorem 7.24 If f is of bounded variation on [a, b] and V(x) is the variation of f on [a, x], a ≤ x ≤ b, then V (x) = |f (x)| for a.e. x ∈ [a, b]. We will prove this with the help of the next lemma, which is of independent interest. Lemma 7.25 (Fubini) Let fk be a sequence of monotone increasing functions on [a, b]. If the series s(x) = fk (x) converges on [a, b] (equivalently, if s(a) and s(b) are finite), then fk (x) a.e. in [a, b]. s (x) = In particular, fk → 0 a.e. in [a, b].

148

Measure and Integral: An Introduction to Real Analysis

∞ Proof. Let sm = m k=1 fk and rm = k=m+1 fk . Then sm and rm are monotone increasing functions and s = sm + rm . With the exception of a set Zm , |Zm | = 0, these three functions together f1 , . . . , fm are differentiable and s = sm + mwith rm . In particular, s ≥ sm = k=1 fk except in Zm . It follows that ∞

fk ≤ s

k=1

except in Z = ∞ m=1 Zm , |Z| = 0. To prove that in the last inequality we actually have equality a.e., it is enough to show that rm→0 a.e. for m running through a sequence of values rm (x) converges at m1 < m2 < · · · . Select mj increasing so rapidly that j both x = a and x = b. This implies the convergence of rmj (b) − rmj (a) and also, in view of the monotonicity of rmj , of rmj (b−) − rmj (a+) . By (7.22), we have

0≤

b

rmj

=

b

a

rmj ≤

rmj (b−) − rmj (a+) .

a

Thus, rmj is integrable over (a, b), and therefore, it is finite a.e. in (a, b). Thus, rmj → 0 a.e., and the proof is complete. Proof of Theorem 7.24. Let f be of bounded variation on [a, b] and let V(x) = V[a, x] be its variation on [a, x], a ≤ x ≤ b. Then V(a)

= 0 and V(b) is the variation of f on [a, b]. Select a sequence k : k = xkj of partitions of [a, b] such that 0 ≤ V(b) − Sk < 2−k , where Sk =

f xkj − f xkj−1 . j

For each k, define a function fk on [a, b] as follows: if x ∈ xkj−1 , xkj , let ⎧ ⎨f (x) + ckj if f xkj ≥ f xkj−1 , fk (x) = ⎩−f (x) + ck if f xk < f xk , j j j−1

149

Differentiation

where the ckj are constants chosen so that fk (a) = 0 and fk is well-defined (i.e., single-valued) at xkj for every j. Then, for all k and j, fk xkj − fk xkj−1 = f xkj − f xkj−1 , so that Sk =

fk xkj − fk xkj−1 = fk (b). j

Hence, for any k, we obtain 0 ≤ V(b) − fk (b) < 2−k . We will show that each V(x) − fk (x) is an increasing function of x. This amounts to showing that if x < y, then fk (y)−fk (x) ≤ V(y)−V(x). If x and y both belong to the same partitioning interval of k , then fk (y)−fk (x) ≤ | f (y)−f (x)|, and therefore, fk (y) − fk (x) ≤ V[x, y] = V(y) − V(x). In the general case, if xkl , xkl+1 , . . . , xkm are the points of k between x and follows by adding the inequalities for the intervals y, the result k k k x, xl , xl , xl+1 , . . . , xkm , y . Since V(a) = fk (a) = 0 and V(b) − fk (b) < 2−k , it follows that 0 ≤ V(x) − fk (x) < 2−k for all x ∈ [a, b]. Hence,

[V(x) − fk (x)]

0, there exists δ > 0 such that for any collection [ai , bi ] (finite or not) of nonoverlapping subintervals of [a, b], f (bi ) − f (ai ) < ε if (bi − ai ) < δ. For example, if g is integrable on [a, b] and f (x) = f (bi ) − f (ai ) ≤

x a

g for a ≤ x ≤ b, then

|g|

[ai ,bi ]

for any nonoverlapping [ai , bi ]. By Theorem 7.1, E |g| is an absolutely continuous set function, and therefore, f is an absolutely continuous function on [a, b]. One of the main results proved below (Theorem 7.29) x is that every absolutely continuous function f has the form f (x) = f (a) + a g for an integrable g. Another example of an absolutely continuous function is any f that satisfies a Lipschitz condition: |f (x) − f (y)| ≤ C|x − y|

for all x, y ∈ [a, b]

(7.26)

and some constant C ≥ 0. On the other hand, the Cantor–Lebesgue function f is an example of a con tinuous function that is not absolutely continuous, since if Ck = j [akj , bkj ] denotes the intervals remaining at the kth stage of construction of the Cantor set, then |Ck | → 0, while f bkj − f akj = 1 j

for every k. It is simple to see that a linear combination of absolutely continuous functions is absolutely continuous and that an absolutely continuous function is continuous. Moreover, if f is absolutely continuous on [a, b], then f exists a.e. in [a, b] and f ∈ L[a, b]. This follows immediately from Corollary 7.23 and the next theorem. Theorem 7.27 If f is absolutely continuous on [a, b], then it is of bounded variation on [a, b]. Proof. Choose δ so that f (bi ) − f (ai ) ≤ 1 for any collection of nonoverlapping intervals with (bi − ai ) ≤ δ. Then the variation of f over any

151

Differentiation

subinterval of [a, b] with length less than δ is at most 1. Hence, if we split [a, b] into N intervals each with length less than δ, then V[a, b] ≤ N. A function f for which f is zero a.e. in [a, b] is said to be singular on [a, b]. The Cantor–Lebesgue function is an example of a nonconstant, singular function on [0, 1]. Theorem 7.28 If f is both absolutely continuous and singular on [a, b], then it is constant on [a, b]. Proof. It is enough to show that f (a) = f (b) since this result applied to any subinterval proves that f is constant. Let E be the subset of (a, b) where f = 0, so that |E| = b − a. Given ε > 0 and x ∈ E, we have [x, x + h] ⊂ (a, b) and | f (x + h) − f (x)| < εh for all sufficiently small h > 0. Let δ be the number corresponding to ε in the definition of the absolute continuity of f . By Corol lary 7.18 and (7.20), there exist disjoint Qj = xj , xj + hj , j = 1, . . . , N, in (a, b) such that (i) f xj + hj − f xj < εhj , N (ii) j=1 Qj > (b − a) − δ. By (i), N N f xj + hj − f xj < ε Qj ≤ ε(b − a). j=1

j=1

Moreover, since by (ii) the total length of the complementary intervals is less than δ, the sum of the absolute values of the increments of f over them is less than ε. Thus, the sum of the absolute values of the increments of f over the Qj and the complementary intervals is less than ε(b−a)+ε. Hence, | f (b)−f (a)| < ε(b − a) + ε, so that f (b) = f (a). This completes the proof. Theorem 7.29 A function f is absolutely continuous on [a, b] if and only if f exists a.e. in [a, b], f is integrable on [a, b], and

f (x) − f (a) =

x a

f ,

a ≤ x ≤ b.

152

Measure and Integral: An Introduction to Real Analysis

Proof. We have already x observed (see p. 150 in Section 7.5) that any function of the form G(x) = a g, g ∈ L[a, b], is absolutely continuous on [a, b]. Hence, the sufficiency part of the theorem follows. x Conversely, if f is absolutely continuous, let F(x) = a f . F is well-defined by virtue of Theorem 7.27 and Corollary 7.23. Moreover, by Theorem 7.16, F = f a.e. in [a, b]. It follows that F(x)−f (x) is both absolutely continuous and singular on [a, b]. Hence, by Theorem 7.28, we obtain F(x) − f (x) = F(a) − f (a) for x ∈ [a, b]. Since F(a) = 0, the proof is complete. Theorem 7.30 If f is of bounded variation on [a, b], then f can be written f = g+h, where g is absolutely continuous on [a, b] and h is singular on [a, b]. Moreover, g and h are unique up to additive constants. x Proof. Let g(x) = a f and h = f − g. Then h = f − g = f − f = 0 a.e. in [a, b], so that h is singular, and the formula f = g + h gives the desired decomposition. If f = g1 + h1 is another such decomposition, then g − g1 = h1 − h. Since g − g1 is absolutely continuous and h1 − h is singular, it follows from Theorem 7.28 that g − g1 = h1 − h = constant, which completes the proof. In view of Theorems 7.30 and 7.27, it is natural to ask if every bounded singular function is of bounded variation, as is the case, for example, with the Cantor–Lebesgue function. The answer is no, and an example is the characteristic function χC of the Cantor set C. In fact, since χC is zero on every (open) interval in the complement of C, then (χC ) = 0 there, and so (χC ) = 0 a.e. in [0, 1]. However, χC does not belong to BV[0, 1] since it equals 1 at the endpoints of every interval removed during the construction of C. The next theorem, which is an extension of Corollary 2.10, gives formulas for the variations of an absolutely continuous function. Theorem 7.31 Let f be absolutely continuous on [a, b] and let V(x), P(x) and N(x) denote its total, positive, and negative variations on [a, x], a ≤ x ≤ b. Then V, P, and N are absolutely continuous on [a, b], and

V(x) =

x a

|f |,

P(x) =

x + f , a

and

N(x) =

x − f . a

Proof. We will first show that V is absolutely continuous. If [α, β] is a subinterval of [a, b] and = {xk } is a partition of [α, β], then

153

Differentiation

V(β) − V(α) = V[α, β] = sup

f (xk ) − f xk−1

xk = sup xk−1

β f ≤ | f |. α

Hence, if [αi , βi ] is a collection of nonoverlapping subintervals of [a, b], then

[V (βi ) − V (αi )] ≤

|f |.

[αi ,βi ]

From this inequality and Theorem 7.1, it follows that V is absolutely continuous on [a, x b]. Therefore, by Theorem 7.29 and the fact that V(a) = 0, we x have V(x) = a V . Since V = | f | a.e. by Theorem 7.24, we obtain V(x) = a f . The fact that P and N are absolutely continuous for P x x and the formulas and N now follow from the relations V(x) = a f , f (x) = f (a) + a f , P(x) = 1 1 2 [V(x) + f (x) − f (a)], and N(x) = 2 [V(x) − f (x) + f (a)]. (See Theorem 2.6.) This completes the proof. On p. 27 in Section 2.3, we proved that if g is continuous on [a, b] and f is continuously differentiable on [a, b], then b

g df =

a

b

gf dx.

a

This is a special case of the first part of the following theorem.

Theorem 7.32 (Integration by Parts) (i) If g is continuous on [a, b] and f is absolutely continuous on [a, b], then b a

g df =

b

gf dx.

a

(ii) If both f and g are absolutely continuous on [a, b], then b a

gf dx = g(b)f (b) − g(a)f (a) −

b a

g f dx.

154

Measure and Integral: An Introduction to Real Analysis

Proof. To prove (i), first note that the integrals in the conclusion exist and are finite by Theorems 2.24 and 5.30. Let = {xi } be a partition of [a, b] with norm ||. Then b

gf dx =

xi

a

gf dx =

x i g xi−1 f (x) dx

xi−1

xi−1

+

xi g(x) − g xi−1 f (x) dx. xi−1

The first term on the right equals g xi−1 f (xi ) − f xi−1 , which conb verges to a g df as || → 0. The second term on the right is majorized in absolute value by sup

|x−y|≤||

x i g(x) − g(y) f dx = sup xi−1

|x−y|≤||

b g(x) − g(y) |f | dx. a

Since g is uniformly continuous on [a, b], the last expression tends to 0 as || → 0. This proves (i). We can easily deduce (ii) from (i) by using Theorem 2.21. In fact, if f and g are absolutely continuous, then b

gf dx =

a

b

g df = g(b)f (b) − g(a)f (a) −

a

= g(b)f (b) − g(a)f (a) −

b

f dg

a

b

fg dx.

a

This proves (ii). For a generalization of part (i) of the theorem, see Lemma 9.14.

7.6 Convex Functions Let φ be defined and finite on an open interval (a, b), possibly of infinite length. Then φ is said to be convex in (a, b) if for every [x1 , x2 ] in (a, b), the graph of φ on [x1 , x2 ] lies on or below the line segment connecting the points (x1 , φ (x1 )) and (x2 , φ (x2 )) of the graph of φ.

155

Differentiation

f(x)

(x2,f(x2))

L(x)

(x1,f(x1))

a

x1

x2

b

Let y = L(x) be the equation of this line segment. Then since every x ∈ [x1 , x2 ] can be written x = θx1 + (1 − θ)x2 for appropriate θ, 0 ≤ θ ≤ 1, the condition for convexity is φ (θx1 + (1 − θ)x2 ) ≤ L (θx1 + (1 − θ)x2 ) for [x1 , x2 ] ⊂ (a, b) and 0 ≤ θ ≤ 1. Since L (θx1 + (1 − θ) x2 ) = θL (x1 ) + (1 − θ)L (x2 ) = θφ (x1 ) + (1 − θ)φ (x2 ) , it follows that φ is convex in (a, b) if and only if φ (θx1 + (1 − θ)x2 ) ≤ θφ (x1 ) + (1 − θ)φ (x2 )

(7.33)

for a < x1 < x2 < b, 0 ≤ θ ≤ 1. Equivalently, φ is convex in (a, b) if and only if φ

p1 x1 + p2 x2 p1 + p2

≤

p1 φ (x1 ) + p2 φ (x2 ) p1 + p2

(7.34)

for a < x1 < x2 < b, p1 ≥ 0, p2 ≥ 0, p1 + p2 > 0. N Theorem 7.35 (Jensen’s Inequality) Let φ be convex in (a, b). Let xj j=1 be N points of (a, b) and pj j=1 satisfy pj ≥ 0 and pj > 0. Then φ

pj xj pj

≤

pj φ xj . pj

The proof follows by repeated application of (7.34).

156

Measure and Integral: An Introduction to Real Analysis

The slope of the chord connecting two points (α, φ(α)) and (β, φ(β)) of the graph of φ is [φ(β) − φ(α)]/(β − α). We leave it as an exercise to verify the following simple relations between the convexity of φ and the slopes of its chords: if φ is convex in (a, b), then for all x, x1 , x2 with a < x1 < x < x2 < b, φ(x) − φ (x1 ) φ (x2 ) − φ (x1 ) φ (x2 ) − φ(x) ; ≤ ≤ x − x1 x2 − x1 x2 − x conversely, if for every such x, x1 , x2 , either one of these two inequalities holds, then φ is convex in (a, b).

Theorem 7.36 (i) If φ1 and φ2 are convex in (a, b), then φ1 + φ2 is convex in (a, b). (ii) If φ is convex in (a, b) and c is a positive constant, then cφ is convex in (a, b). (iii) If φk , k = 1, 2, . . . , are convex in (a, b) and φk → φ in (a, b), then φ is convex in (a, b). The proof is left as an exercise. Theorem 7.37 If φ exists and is monotone increasing in (a, b), then φ is convex in (a, b). In particular, if φ exists and is nonnegative in (a, b), then φ is convex in (a, b). Proof. We shall use the following inequality: If b1 , b2 > 0, then

min

a1 a2 , b1 b2

≤

a1 + a2 a1 a2 . ≤ max , b1 + b2 b1 b2

(7.38)

To prove the first inequality in (7.38), let m = min {a1 /b1 , a2 /b2 }. Then mb1 ≤ a1 and mb2 ≤ a2 , so that m (b1 + b2 ) ≤ a1 + a2 . This is the desired result. The second inequality is proved similarly. To prove that φ is convex, we will show that if a < x1 < x < x2 < b, then φ (x2 ) − φ (x1 ) φ(x) − φ (x1 ) ≥ . x2 − x1 x − x1 Write [φ (x2 ) − φ(x)] + [φ(x) − φ (x1 )] φ (x2 ) − φ (x1 ) . = x2 − x1 [x2 − x] + [x − x1 ]

157

Differentiation

Since φ exists in (a, b), the mean-value theorem implies that there are ξ1 ∈ (x1 , x) and ξ2 ∈ (x, x2 ) such that φ(x) − φ (x1 ) φ (x2 ) − φ(x) = φ (ξ2 ) . = φ (ξ1 ) , x − x1 x2 − x Since φ is increasing, φ (ξ1 ) ≤ φ (ξ2 ), and it follows from the first inequality in (7.38) that φ (x2 ) − φ (x1 ) φ(x) − φ (x1 ) ≥ . x2 − x1 x − x1 This completes the proof. As a corollary of Theorem 7.37, we see that (i)

xp is convex in (0, ∞) if p ≥ 1 or if p ≤ 0

(ii)

eax is convex in (−∞, ∞)

(iii)

log(1/x) = − log x is convex in (0, ∞)

(7.39)

Theorem 7.40 If φ is convex in (a, b), then φ is continuous in (a, b). Moreover, φ exists except at most in a countable set and is monotone increasing. Proof. Since φ is convex, the slope [φ(x+h)−φ(x)]/h, h > 0, decreases with h. Hence, the derivative on the right, φ(x + h) − φ(x) , h h→0+

D+ φ(x) = lim

exists and is distinct from +∞ in (a, b). Similarly, the derivative on the left, φ(x) − φ(x − h) , h h→0+

D− φ(x) = lim

exists and is distinct from −∞ in (a, b). Since [φ(x) − φ(x − h)]/h ≤ [φ(x + h) − φ(x)]/h, h > 0, we obtain −∞ < D− φ(x) ≤ D+ φ(x) < + ∞.

(7.41)

This shows in particular that φ is continuous in (a, b). We next claim that D+ φ(y) ≤ D− φ(x) if a < y < x < b.

(7.42)

158

Measure and Integral: An Introduction to Real Analysis

In fact, if y < x, then as seen from the discussion earlier in the proof, we have D+ φ(y) ≤

φ(x) − φ(y) ≤ D− φ(x). x−y

This proves the claim. We therefore obtain D+ φ(y) ≤ D− φ(x) ≤ D+ φ(x) if y < x, which shows that D+ φ is monotone increasing. Similarly, D− φ is monotone increasing. To complete the proof of the theorem, note that (cf. Theorem 2.8) D+ φ can have at most a countable number of discontinuities since it is monotone and finite on (a, b). If x is a point of continuity of D+ φ, then letting y → x− in the last inequalities, we obtain D+ φ(x) = D− φ(x). Therefore, φ exists at every point of continuity of D+ φ, and the theorem follows.

Theorem 7.43 If φ is convex in (a, b), then it satisfies a Lipschitz condition on every closed subinterval of (a, b). In particular, if a < x1 < x2 < b, we have φ (x2 ) − φ (x1 ) =

x2

φ .

x1

Proof. Let [x1 , x2 ] be a closed subinterval of (a, b) and let x1 ≤ y < x ≤ x2 . Then as before D+ φ(y) ≤

φ(x) − φ(y) ≤ D− φ(x), x−y

so that, since D+ φ and D− φ are monotone increasing, D+ φ (x1 ) ≤

φ(x) − φ(y) ≤ D− φ (x2 ) . x−y

Hence, |φ(x) − φ(y)| ≤ C|x − y|, where C is the larger of D+ φ(x1 ) and − D φ (x2 ). This shows that φ satisfies a Lipschitz condition on [x1 , x2 ], and the rest of the theorem follows since Lipschitz functions of a single variable are absolutely continuous. The next result is a useful version of Jensen’s inequality for integrals. We shall need the notion of a supporting line: If φ is convex on (a, b) and x0 ∈ (a, b), a supporting line at x0 is a line through (x0 , φ (x0 )) that lies on or below the graph of φ on (a, b). It follows from the discussion preceding (7.41) that any line through (x0 , φ (x0 )) whose slope m satisfies D− φ (x0 ) ≤ m ≤ D+ φ (x0 ) is a supporting line at x0 .

159

Differentiation

Theorem 7.44 (Jensen’s Integral Inequality) Let f and p be measurable functions finite a.e. on a measurable set A ⊂ Rn . Suppose that fp and p are integrable on A, that p ≥ 0, and that A p > 0. If φ is convex in an interval containing the range of f , then φ(f ) p A fp ≤ A . φ p A Ap Proof. By hypothesis, f is finite a.e. in A. Choose (a, b), −∞ ≤ a < b ≤ +∞, such that φ is convex in (a, b) and a < f (x) < b for every x at which f (x) is finite. The number γ defined by fp γ = A Ap is finite and satisfies a < γ < b. If m is the slope of a supporting line at γ and a < t < b, then φ(γ) + m(t − γ) ≤ φ(t). Hence, for almost every x ∈ A, φ(γ)+m[f (x) − γ] ≤ φ(f (x)). Multiplying both sides of this inequality by p(x) and integrating the result with respect to x, we obtain φ(γ)

⎛

A

A

p + m ⎝ fp − γ

A

⎞ p⎠ ≤

φ(f ) p.

A

Here the existence of A φ(f ) p follows from the integrability of pand fp. (The continuity of φ implies that φ(f ) is measurable.) Since A fp − γ A p = 0, the last inequality reduces to φ(γ) A

p≤

φ(f ) p,

A

which is the desired result. In passing, we mention that a function f is called concave in (a, b) if −f is convex on (a, b). Properties of concave functions are easily deduced from those of convex functions.

160

Measure and Integral: An Introduction to Real Analysis

7.7 The Differential in Rn The principal fact to be proved in this section is the Rademacher–Stepanov theorem stating that a function that is locally Lipschitz continuous in an open set in Rn , n > 1, has a first differential (or tangent plane) almost everywhere in the set. The result is somewhat surprising in view of the fact that the graph of a Lipschitz function may have corners and edges. The analogous result in case n = 1 follows by combining Theorem 7.27 and Corollary 7.23. We will also derive a result in which the assumption of Lipschitz continuity is replaced by a weaker condition involving the second difference of a function, and we will study a simple way to extend Lipschitz continuous functions on subsets of Rn to all of Rn . We begin by defining the notion of the first differential of a function. A finite real-valued function f defined in a neighborhood of a point x ∈ Rn , n ≥ 1, is said to have a total first differential at x if there exists A = (a1 , . . . , an ) ∈ Rn such that f (x + h) − f (x) − A · h = o(|h|)

as h → 0.

(7.45a)

Here, A · h denotes the dot product of A and h, that is, A·h=

n

ai hi ,

h = (h1 , . . . hn ) ,

i=1

and the notation “o(|h|) as h → 0” in (7.45a) means that f (x + h) − f (x) − A · h = 0. |h| h→0 lim

(In general, for any finite real-valued function F(h) defined in a deleted neighborhood of the origin, the notation F(h) = o(|h|) as |h| → 0 means that limh→0 F(h)/|h| = 0.) Sometimes, we will just say that such an f has a first differential at x, or simply a differential at x. When n = 1, (7.45a) means only that f has a first derivative at x. If f satisfies (7.45a), then f is clearly continuous at x. By choosing h = (0, . . . , 0, hi , 0, . . . , 0), i = 1, . . . , n, in (7.45a), we also have f x1 , . . . , xi−1 , xi + hi , xi+1 , . . . , xn − f (x1 , . . . , xn ) = ai . lim hi hi →0

161

Differentiation

Thus, A is unique if it exists, and a function f that has a differential at x has (first) partial derivatives ∂f /∂xi at x, i = 1, . . . , n, and ∂f ∂f (x), . . . , (x) . A= ∂x1 ∂xn Consequently, (7.45a) is the same as f (x + h) = f (x) +

n ∂f (x)hi + o(|h|) ∂xi

as h = (h1 , . . . , hn ) → 0.

(7.45b)

i=1

If f has a differential at x, then the graph of the linear function Lx (y) defined by n ∂f Lx (y) = f (x) + (x) yi − xi , ∂xi

y = y1 , . . . , yn ∈ Rn ,

(7.46)

i=1

is called the tangent plane (or tangent line if n = 1) to f at x. By choosing h = y − x in (7.45b), we obtain f (y) = Lx (y) + o(|y − x|) as y → x.

(7.47)

When n > 1, a standard alternate terminology for (7.45b) is to say that f has a tangent plane at x. If f is any function whose partial derivatives ∂f /∂xi all exist at x, we write ∇f (x) =

∂f ∂f (x), . . . , (x) ∂x1 ∂xn

and call ∇f (x) the gradient of f at x. When n > 1, the mere existence of ∇f at a point x does not imply that f has a tangent plane at x even if f is continuous at x. For example, in case n = 2, the function " 2 2 if (x1 , x2 ) = (0, 0) x1 x2 / x1 + x22 (7.48) f (x1 , x2 ) = 0 if (x1 , x2 ) = (0, 0) is continuous and has first partial derivatives everywhere in R2 , but f does not have a tangent plane at (0, 0) since f (0, 0) = 0, ∇f (0, 0) = (0, 0), and f (h1 , h1 ) /h1 = 1/2 if h1 = 0. We leave it as an exercise to show that f satisfies the Lipschitz condition |f (x) − f (y)| ≤ C |x − y|,

x, y ∈ R2 ,

for some constant C that is independent of x and y.

162

Measure and Integral: An Introduction to Real Analysis

We now define a weaker notion of differentiability that will play a role in the proof of the Rademacher–Stepanov theorem. Let x ∈ Rn , n ≥ 1, be a finite real-valued function defined in a measurable set that contains x and has x as a point of density. Equivalently, suppose that f (x + h) is defined and finite for all h in a measurable set H ⊂ Rn containing 0 and having 0 as a point of density. We say that f has an approximate first differential at x, or simply an approximate differential at x, if there exists A ∈ Rn , depending on x, such that f (x + h) = f (x) + A · h + o(|h|),

h ∈ H,

h → 0.

(7.49)

When n = 1, the standard terminology for (7.49) is that f has an approximate derivative at the point in question. The vector A in (7.49) is unique in the following sense. If f satisfies (7.49) as well as f x + h = f (x) + A · h + o h ,

h ∈ H ,

h → 0

for some A and some measurable set H that has 0 as a point of density, then A = A (see Exercise 27). A function that has a total differential at a point clearly has an approximate differential there, but the converse is false even if the function is continuous. For example, let f (x1 , x2 ) be a function on R2 with the properties (i)

f (x1 , x2 ) = 0 if |x2 | ≥ x21 ,

(ii)

f (x1 , 0) = x1 for all x1 ∈ (−∞, ∞),

(iii)

f is continuous on R2 .

Then f has an approximate differential at (0, 0) since it satisfies (7.49) at x = (0, 0) with f (0, 0) = 0 and A = (0, 0) for the set H = (x1 , x2 ) : |x2 | ≥ x21 . However, f does not have a total differential at (0, 0) since f (0, 0) = 0, ∇f (0, 0) = (1, 0), and, for example, the estimate

1/2 f x1 , x21 = x1 + o x21 + x41

as x1 → 0

is false due to the fact that f x1 , x21 = 0 for all x1 . We leave it as an exercise to show that the function f in the example above cannot be redefined in any set of measure zero so that the resulting function has a total differential at (0, 0). On the other hand, the next theorem shows that a Lipschitz continuous function (see the following definition) has a total differential at every point where it has an approximate differential.

163

Differentiation

A finite real-valued function f defined in an open set G ⊂ Rn is said to be Lipschitz continuous in G if there is a constant C such that |f (x) − f (y)| ≤ C|x − y|

for all x, y ∈ G.

We then write f ∈ Lip(G). Similarly, for any finite f defined on G, we write f ∈ Liploc (G) and say that f is locally Lipschitz continuous in G if for every compact set K ⊂ G, there is a constant CK such that |f (x) − f (y)| ≤ CK |x − y|

for all x, y ∈ K.

If f ∈ Lip(G), then f ∈ Liploc (G), but the converse is false. For example, in R2 , the function f (x1 , x2 ) = (1 − x1 )−1 is locally Lipschitz continuous on the open unit ball centered at (0, 0), but it is not Lipschitz continuous there. In fact, for all (x1 , x2 ) , y1 , y2 in the ball, we have f (x1 , x2 ) − f y1 , y2 =

x1 − y1 . (1 − x1 ) 1 − y1

We leave it as an exercise to construct a function that is bounded, uniformly continuous and locally Lipschitz continuous on the open unit ball in R2 but not Lipschitz continuous there. Theorem 7.50 Let x0 ∈ Rn and f be a function that is Lipschitz continuous in a neighborhood of x0 . If f has an approximate differential at x0 , then f has a total differential at x0 . Proof. Suppose that f is Lipschitz continuous near x0 and that f (x0 + h) = f (x0 ) + A · h + o(|h|),

h → 0, h ∈ H,

for some A and some measurable set H ⊂ Rn that has 0 as a point of density. We will prove the theorem by showing that the same asymptotic formula holds for all h → 0 without the restriction that h ∈ H. By considering the function g(x) = f (x0 + x)−f (x0 )−A·x, we may assume that x0 = 0, f (0) = 0, and A = 0, that is, we may assume that f is Lipschitz continuous near 0 and lim

h→0,h∈H

f (h) = 0. |h|

Our goal is then to prove that f (h)/|h| → 0 as |h| → 0.

(7.51)

164

Measure and Integral: An Introduction to Real Analysis

For any y ∈ Rn and t > 0, let B(y; t) = x ∈ Rn : |x − y| < t denote the open ball with center y and radius t. Choose C, r > 0 such that |f (x) − f (y)| ≤ C|x − y|

if x, y ∈ B(0; r).

Fix ε > 0 and let β, γ satisfy 0 < β < 1,

0 < γ < 1 − β,

γ+1−β

0 such that if 0 < |h| < δ then α|B(0; |h|)| ≤ |B(0; |h|) ∩ H| ≤ |Dh ∩ H| + |B(0; |h|) − Dh | = |Dh ∩ H| + |B(0; |h|)| − |Dh | = |Dh ∩ H| + 1 − γn |B(0; |h|)|. Since α > 1 − γn , it follows that |Dh ∩ H| > 0 and in particular that Dh ∩ H is not empty. Thus, for every h with 0 < |h| < δ, there is a point h1 ∈ H such that |h1 | < |h| and |h − h1 | < {ε/(2C)}|h|. ˜ < We may also choose δ so small that δ < r and, by (7.51), so that |f (h)| ˜ ˜ ˜ (ε/2)|h| if h ∈ H and |h| < δ. Note that δ depends ultimately only on ε, H, f , and n. If 0 < |h| < δ and h1 is chosen as above, we obtain

165

Differentiation |f (h)| ≤ f (h) − f (h1 ) + f (h1 ) ε ≤ C |h − h1 | + |h1 | 2 ε ε ≤ C |h| + |h| = ε|h|, 2C 2 which completes the proof. The next theorem is the main result of this section.

Theorem 7.53 (Rademacher–Stepanov) Let G be an open set in Rn , n > 1, and let f ∈ Liploc (G). Then f has a total first differential a.e. in G. The partial derivatives ∂f /∂xi , i = 1, . . . , n, are measurable in G and bounded a.e. in every compact subset of G. Furthermore, if f ∈ Lip(G), then all ∂f /∂xi are bounded a.e. in G. The proof will use Theorem 7.50 together with Lusin’s theorem and the next four lemmas. It will also use the one-dimensional version of Theorem 7.53, which is included in Corollary 7.23. We begin by showing that the derivative (from the right) of a Lipschitz function of one variable can be defined in terms of a certain sequential (as opposed to ordinary) limit of difference quotients, a fact that will be useful in proving measurability of the partial derivatives in Theorem 7.53. Lemma 7.54 Let f be Lipschitz continuous in a half-open interval [a, b) in R1 and let k = 1, 2, . . .. If f (a + 1/k) − f (a) 1/k k→∞ lim

exists, then so does lim

h→0+

f (a + h) − f (a) , h

and the two limits are the same. Proof. Denote L = lim

k→∞

f (a + 1/k) − f (a) . 1/k

166

Measure and Integral: An Introduction to Real Analysis

Note that L is finite since f is Lipschitz continuous in [a, b). Given ε > 0, choose a positive integer Kε such that f (a + 1/k) − f (a) 1. If f ∈ Liploc (G), then f has measurable first partial derivatives ∂f /∂xi a.e. in G, i = 1, . . . , n.

167

Differentiation

Proof. Fix G and f as in the hypothesis. Consider, for example, the case i = 1 and denote points of Rn by (x, y),

with x ∈ (−∞, ∞) and y ∈ Rn−1 .

(7.56)

Also, denote the difference quotient of f in the first variable by Dh f (x, y) =

f (x + h, y) − f (x, y) , h

h = 0,

(7.57)

provided (x, y), (x + h, y) ∈ G. Let E be the set defined by

∂f E = (x, y) ∈ G : (x, y) = lim Dh f (x, y) exists . ∂x h→0 The derivative ∂f /∂x is automatically finite in E since f is locally Lipschitz continuous in G. Our goal is to show that E is measurable in Rn , that |G−E| = 0, and that (∂f /∂x)(x, y) is a measurable function on E. Let D+ f (x, y) = lim Dh f (x, y) h→0+

and

D− f (x, y) = lim Dh f (x, y) h→0−

at any (x, y) ∈ G where these limits exist, and define sets E+ and E− by E+ = (x, y) ∈ G : D+ f (x, y) exists , E− = (x, y) ∈ G : D− f (x, y) exists . Note that E = (x, y) ∈ E+ ∩ E− : D+ f (x, y) = D− f (x, y) .

(7.58)

Moreover, by Lemma 7.54, E+ can be expressed in terms of a sequential limit:

E = (x, y) ∈ G : lim D1/k f (x, y) exists , +

k→∞

and by similar reasoning,

E− = (x, y) ∈ G : lim D−1/k f (x, y) exists . k→∞

168

Measure and Integral: An Introduction to Real Analysis

In order to have a difference quotient that is well-defined for all h = 0 when (x, y) ∈ G, we extend f to be zero outside G. Thus, let "

f in G f¯ = 0 in Rn − G. Then f¯ is measurable in Rn , and consequently, Dh f¯ is defined and measurable in Rn for all h = 0. In particular, both D1/k f¯ and D−1/k f¯ are measurable in Rn for all k = 1, 2, . . .. Since G is open, for every (x, y) ∈ G, we have D1/k f¯ (x, y) = D1/k f (x, y) for all large k. Therefore,

¯ E = (x, y) ∈ G : lim D1/k f (x, y) exists , +

k→∞

and a similar representation holds for E− with D1/k replaced by D−1/k . It follows that E+ and E− are measurable in Rn (cf. Exercise 23 in Chapter 4) and that D+ f is measurable on E+ and D− f is measurable on E− . Then (7.58) implies that E is a measurable set in Rn and ∂f /∂x is a measurable function on E. Since G is open, for every y ∈ Rn−1 , the one-dimensional set Gy defined by Gy = x ∈ (−∞, ∞) : x, y ∈ G is open (possibly empty) in R1 . Also, f (x, y) considered as a function of x is locally Lipschitz continuous in Gy for every y. Hence, by Corollary 7.23, (∂f /∂x)(x, y) exists for a.e. (linear measure) x ∈ Gy . Defining Ey in the same of E manner as Gy , we have from Tonelli’s theorem and the measurability n−1 that for a.e. y ∈ R , Ey is linearly measurable, Ey = Gy (linear measure again), and |E| =

Ey dy = Gy dy = |G|. Rn−1

Rn−1

In case G has finite measure, it follows that |G − E| = 0, and we have accomplished our goal. The same is true if |G| = ∞ by intersecting G and E with a sequence of open balls increasing to Rn . Finally, a similar argument holds for every coordinate xi , i = 1, . . . , n, and the proof of Lemma 7.55 is complete. Before proceeding, we remark that if f is any measurable function in an open set G and if ∂f /∂xi exists a.e. in a measurable set E ⊂ G for some i, then ∂f /∂xi is measurable in E. If i = 1, for example, this follows by considering the extension f¯ in the proof of Lemma 7.55 and using the fact that Dhk f¯ is measurable in Rn and converges a.e. in E to ∂f /∂x1 for any fixed

169

Differentiation

sequence hk → 0. While this fact is generally useful, we did not use it in the proof of Lemma 7.55 because the existence of the first partial derivatives of f a.e. in G was not guaranteed in advance. The next lemma establishes a fact about uniform convergence of difference quotients. Its proof is similar in spirit to the proof of Egorov’s theorem (see also Exercise 13 of Chapter 4). We will continue to use the notations in (7.56) and (7.57) for points in Rn and the difference quotient in the first variable. Lemma 7.59 Let G be an open set in Rn , n > 1, and E be a measurable subset of G with |E| < ∞. Let f be a continuous function on G such that ∂f /∂x exists and is finite in E. Then given ε > 0, there exist a closed set F ⊂ E and δ > 0 such that |E − F| < ε, the set (x + h, y) : (x, y) ∈ F, |h| ≤ δ is contained in G, and Dh f (x, y) converges uniformly to (∂f /∂x)(x, y) in F as h → 0. A similar result holds for the difference quotient of f in each of the other coordinate variables. Proof. Fix G, E, and f as in the hypothesis. For m, k = 1, 2, . . ., let Em,k

= (x, y) ∈ E : (x + h, y) ∈ G and

Dh f (x, y) − ∂f (x, y) ≤ 1 if 0 < |h| ≤ 1 . m ∂x k

To see why each Em,k is measurable, first define G(r) = {(x, y) ∈ G : (x + s, y) ∈ G if |s| ≤ |r|},

r ∈ R1 − {0}.

Thus, G(r) consists of all points (x, y) of G such that the closed line segment of length 2|r| centered at (x, y) and parallel to the x-axis lies in G. Since the distance from any compact line segment in G to the complement of G is positive (cf. Exercise 12(b) of Chapter 1), it follows that G(r) is open and therefore that the set E(r) = G(r) ∩ E is measurable. Next, using the continuity of f on G, and letting Q denote the collection of all rational numbers, we can represent Em,k as a countable intersection as follows: Em,k =

# r∈Q 0 0 ε ⊂ E and an index k = kε and m = 1, 2, . . ., there is a measurable set Hm = Hm m m −m−1 such that |E − Hm | < ε2 and Dh f (x, y) − ∂f (x, y) ≤ 1 m ∂x

if (x, y) ∈ Hm and 0 < |h| ≤

1 . km

$ Let H = ∞ 1 Hm and δ = 1/k1 . Note that if (x, y) ∈ H and |h| ≤ δ then (x + h, y) ∈ G. Also, Dh f converges uniformly to ∂f /∂x in H as h → 0, and |E − H| ≤

∞ 1

ε 2m+1

=

ε . 2

Now choose a closed set F = Fε ⊂ H with |H − F| < ε/2. Then |E − F| < ε, (x + h, y) ∈ G if (x, y) ∈ H and |h| ≤ δ, and Dh f converges uniformly in F as h → 0. This proves Lemma 7.59. The final fact that we will use to prove Theorem 7.53 is given in the next lemma. Lemma 7.60 Let G be an open set in Rn , n > 1, and E be a measurable subset of G. Let f be a continuous function on G all of whose first partial derivatives exist and are finite in E. Then f has an approximate differential a.e. in E. Proof. The proof is by induction on n. Let G, E, and f satisfy the hypothesis. Fix n ≥ 2 and assume that the result is true for dimension n − 1. In case n = 2, this inductive assumption is true since a function of a single variable has an approximate derivative at every point where it has a derivative. We will again use the notation (x, y) and Dh f (x, y) in (7.56) and (7.57). Without loss of generality, we may assume that |E| < ∞. Let ε > 0 and choose δ and F as in Lemma 7.59. Then (x + h, y) ∈ G if (x, y) ∈ F and |h| ≤ δ, |E − F| < ε, and Dh f converges uniformly to ∂f /∂x in F. We may also assume that |F| > 0 and, by Lusin’s theorem, that ∂f /∂x is continuous on F relative to F. Fix any x0 ∈ (−∞, ∞) for which the set Fx0 defined by

Fx0 = y ∈ Rn−1 : x0 , y ∈ F

171

Differentiation

has positive (n − 1)-dimensional measure. Note that a.e. (linear measure) x0 such that Fx0 is not empty has this property by Fubini’s theorem. Similarly, define

Gx0 = y ∈ Rn−1 : x0 , y ∈ G , Ex0 = y ∈ Rn−1 : x0 , y ∈ E . Then G x0 is open in Rn−1 , and Ex0 is measurable in Rn−1 for a.e. x0 . Also, f x0 , y considered as a function of y is continuous in Gx0 , and since f has finite first partial derivatives a.e. in E (by hypothesis), we may assume that f x0 , y has finite first partial derivatives with respect to y for a.e. y ∈ Ex0 . Thus, by our inductive hypothesis, f x0 , y has an approximate differential in y a.e. ((n − 1)-dimensional measure) in Ex0 and so a.e. in Fx0 . Now fix any y0 ∈ Fx0 such that y0 is an (n − 1)-dimensional point of density of Fx0 and such that f x0 , y has a differential in y at y = y0 , that is, so that ˜ x0 , y0 · y − y0 + o |y − y0 | f x0 , y = f x0 , y0 + A

as y → y0 , (7.61)

˜ x0 , y0 is a vector in Rn−1 . Almost every point of Fx has these where A 0 properties. We claim that f has an approximate differential (relative to Rn ) at x0 , y0 . With δ as above, let H ⊂ Rn be the Cartesian product [x0 − δ, x0 + δ] × Fx0 : H = (x, y) ∈ Rn : |x − x0 | ≤ δ, y ∈ Fx0 . Then H ⊂ G, and x0 , y0 is an n-dimensional point of density of H since y0 is an (n − 1)-dimensional point of density of Fx0 . Let (x, y) ∈ H and write

A x0 , y0 =

∂f ˜ x0 , y0 , A x0 , y0 . ∂x

Then A(x0 , y0 ) is a vector in Rn and f (x, y) − f x0 , y0 − A x0 , y0 · x − x0 , y − y0 = f (x, y) − f x0 , y + f x0 , y − f x0 , y0 − A x0 , y0 · x − x0 , y − y0 & % ∂f x0 , y (x − x0 ) = f (x, y) − f x0 , y − ∂x & % ∂f ∂f x0 , y − x0 , y0 (x − x0 ) + ∂x ∂x ˜ x0 , y0 · y − y0 + f x0 , y − f x0 , y0 − A = I + II + III,

172

Measure and Integral: An Introduction to Real Analysis

say. Our by showing that each be proved of |I|, |II|, and |III| has claim will size o |x − x0 | + y − y0 if (x, y) ∈ H and (x, y) → x0 , y0 . Let η > 0. uniformly to ∂f /∂x in F, there exists ν > Since Dh f converges 0 such that Dh f − ∂f /∂x < η in F if 0 < |h| < ν. We may assume that ν < δ. Hence, since I=

% & ∂f Dx−x0 f x0 , y − x0 , y (x − x0 ) , ∂x

we obtain |I| < η |x − x0 |

if |x − x0 | < ν and (x, y) ∈ H.

Also, if (x, y) ∈ H, then x0 , y ∈ F, and consequently by using the continuity of ∂f /∂x on F relative to F, we see that there exists ν > 0 independent of (x, y) such that |II| < η |x − x0 |

if y − y0 < ν and (x, y) ∈ H.

Finally, by (7.61), we have |III| < η y − y0 when y − y0 is sufficiently small. Our claim follows by combining estimates. Since a.e. point in F shares the properties of the point x0 , y0 above, we conclude that f has an approximate differential relative to Rn a.e. in F. Recall that the value of ε > 0 can be arbitrarily small and that the set F = Fε satisfies F ⊂ E and |E − F| < ε. Now let ε → 0 through a sequence {εm }. Then f has an approximate differential a.e. in the set m Fεm , which is a subset of E of full measure. This completes the proof of the Lemma 7.60. Proof of Theorem 7.53. Let G be an open set in Rn , n > 1, and f ∈ Liploc (G). By Lemma 7.55, f has measurable first partial derivatives a.e. in G. The fact that these derivatives are bounded a.e. on every compact subset of G follows from their definitionsas limits of difference quotients since f ∈ Liploc (G). In fact, if Gj : j = 1, 2, . . . is a sequence of bounded open sets with closures contained in G such that Gj G, for example, Gj = x ∈ G : |x| < j and d(x, CG) > 1/j ,

j = 1, 2, . . . ,

then f ∈ Lip Gj for each j, and every compact set in G lies in some Gj . Clearly, the first partial derivatives of a function that is Lipschitz continuous in an open set are bounded a.e. in that set.

173

Differentiation

Thus, it remains only to show that f has a total first differential a.e. in G. However, this is an immediate consequence of Lemma 7.60 and Theorem 7.50, which completes the proof of the Rademacher–Stepanov theorem. An examination of the proof of Theorem 7.53 shows that the assumption of local Lipschitz continuity of f is used in two key ways, first in order to show that f has measurable partial derivatives a.e. in G and again in order to conclude that f has a total differential wherever it has an approximate one (Theorem 7.50). Note that the part of the proof showing that f has an approximate differential a.e. in G (Lemmas 7.59 and 7.60) does not require Lipschitz continuity, although it uses continuity of f in G and relies on the existence of the first partial derivatives of f a.e. in G. In order to transfer some of the proof technique to other situations, we can bypass the first way that Lipschitz continuity is used by simply assuming that all ∂f /∂xi exist and are finite. With this assumption, the next result gives a condition different from Lipschitz continuity that implies total differentiability. The condition is stated as follows in terms of the size of the second difference of f . We say that a function f that is defined and finite in a neighborhood of a point x ∈ Rn is smooth at x if f (x + h) + f (x − h) − 2f (x) = o(|h|)

as |h| → 0.

(7.62)

The intuitive reason for calling such an f smooth at x arises from the one-dimensional situation, noting that f (x + h) + f (x − h) − 2f (x) f (x + h) − f (x) f (x − h) − f (x) = − , h h −h

h = 0,

and consequently, if f is smooth and has a finite derivative from either side at x, then it has a derivative from both sides at x, and the two are the same. Thus, the graph of a function that is smooth at a point cannot have a corner there. Theorem 7.63 Let G be an open set in Rn , n > 1, and E be a measurable subset of G. Let f be a continuous function on G such that (i) f has finite partial derivatives ∂f /∂xi in E, i = 1, . . . , n, (ii) f satisfies the smoothness condition (7.62) at every x ∈ E. Then each ∂f /∂xi is measurable on E, and f has a total first differential a.e. in E. Proof. We will be brief. Fix G, E, and f as in the hypothesis. The measurability on E of the derivatives ∂f /∂xi follows from the remark after the proof of

174

Measure and Integral: An Introduction to Real Analysis

Lemma 7.55. It remains to show that f has a total first differential a.e. in E. For m, k = 1, 2, . . . , define Em,k

|f (x + h) + f (x − h) − 2f (x)| 1 1 . = x∈E: ≤ if 0 < |h| ≤ |h| m k

By (7.62), for every m, Em,k E as k ∞. Every Em,k is measurable since the continuity of f in G gives #

Em,k =

r∈Qn 0 0, it follows (cf. the proof of Lemma 7.59) that there is a measurable set H ⊂ E with |E − H| < ε such that |f (x + h) + f (x − h) − 2f (x)| → 0 uniformly in H as |h| → 0. |h| To complete the proof of the theorem, it is enough to show that f has a total differential a.e. in H. By Lemma 7.60, f has an approximate differential a.e. in E and therefore also a.e. in H. Let x0 ∈ H be a point of density of H where f has an approximate differential. It suffices to prove that f has a total differential at x0 . By the definition of an approximate differential, there is a vector A0 ∈ Rn and a measurable set, which we may assume is the same as the set H, having x0 as a point of density such that f (x0 + h) − f (x0 ) − A0 · h = o(|h|),

h ∈ H, as |h| → 0.

Without loss of generality, we may assume that x0 = 0, f (x0 ) = 0, and A0 = 0. Thus, given η > 0, there exists ρ0 > 0 such that both sup x,h x∈H,|h|≤ρ

|f (x + h) + f (x − h) − 2f (x)| ≤ ηρ if 0 ≤ ρ ≤ ρ0

and |f (h)| ≤ η|h|

if h ∈ H and 0 ≤ |h| ≤ ρ0 .

If ρ is sufficiently small, then every u ∈ Rn satisfying |u| < ρ can be expressed in the form u = x + h with x − h, x ∈ H and |x|, |h| < ρ (see Exercise 30).

175

Differentiation

Combining estimates, we then obtain that for all sufficiently small ρ > 0 and all u with |u| < ρ, |f (u)| = |f (x + h)| ≤ |f (x + h) + f (x − h) − 2f (x)| + |f (x − h)| + 2|f (x)| ≤ ηρ + η 2ρ + 2ηρ = 5ηρ, and the theorem follows. In passing, we will derive an interesting result about extending a Lipschitz function from a set to the entire space in such a way that the extended function remains Lipschitz continuous. The result is not limited to functions defined in open sets. If E is any set in Rn and f is a finite real-valued function defined on E, then f is said to be Lipschitz continuous on E if there is a constant C such that |f (x) − f (y)| ≤ C|x − y|,

x, y ∈ E.

The smallest such C, namely, the constant |f (x) − f (y)| , |x − y| x,y∈E

Cf ,E = sup x=y

is called the Lipschitz constant of f on E. We have the following extension result for such functions. Theorem 7.64 Let f be Lipschitz continuous on a set E ⊂ Rn . Then f can be extended to Rn as a Lipschitz function with the same Lipschitz constant, that is, there is a function f1 ∈ Lip (Rn ) such that f1 = f on E and Cf1 ,Rn = Cf ,E . Proof. Part of the proof will be left as an exercise. Let f and E be as n in the hypothesis if y, y0 ∈ E and and denote C = Cf ,E . Note that x ∈ R , then f y0 −C x −y0 ≤ f (y) + C|x − y| since f y0 − f (y) ≤ C y0 − y ≤ C x − y0 + x − y . Therefore, the conical functions γy (x) defined for y ∈ E and x ∈ Rn by γy (x) = γy,f (x) = f (y) + C|x − y| satisfy inf γy (x) ≥ f y0 − C x − y0 > −∞

y∈E

y0 ∈ E

176

Measure and Integral: An Introduction to Real Analysis

for all x ∈ Rn . Also, for any y0 ∈ E, inf y∈E γy (x) ≤ γy0 (x) < +∞. Define f1 (x) = inf γy (x), y∈E

x ∈ Rn .

(7.65)

Then (see Exercise 32) f1 (x) = f (x) if x ∈ E, f1 ∈ Lip (Rn ) and Cf1 ,Rn = C = Cf ,E .

Exercises 1. Let f be measurable in Rn and different from zero in some set of positive measure. Show that there is a positive constant c such that f ∗ (x) ≥ c|x|−n for |x| ≥ 1. 2. Let φ(x), x ∈ Rn , be a bounded measurable function such that φ(x) = 0 for |x| ≥ 1 and φ = 1. For ε > 0, let φε (x) = ε−n φ(x/ε). (φε is called an approximation to the identity.) If f ∈ L(Rn ), show that lim

ε→0

f ∗ φε (x) = f (x)

in the Lebesgue set of f . (Note that

φε = 1, ε > 0, so that

f ∗ φε (x) − f (x) = [f (x − y) − f (x)]φε (y) dy.

Use Theorem 7.16.) 3. Show that the conclusion of Lemma 7.4 remains true for the case of two dimensions if instead of being squares, the sets Q covering E are rectangles with x-dimension equal to h and y-dimension equal to h2 . (Of course, h varies with Q and the rectangles have edges parallel to the coordinate axes.) Show that the conclusion of Theorem 7.2 remains valid for n = 2 if the cubes Q are replaced by rectangles of this type that are centered at x and with h → 0. Show that the same conclusions are valid for rectangles whose xdimension is h and whose y-dimension is any fixed increasing function of h. Generalize this to higher dimensions. 4. If E1 and E2 are measurable subsets of R1 with |E1 | > 0 and |E2 | > 0, prove that the set {x : x = x1 − x2 , x1 ∈ E1 , x2 ∈ E2 } contains an interval. (cf. Lemma 3.37.)

177

Differentiation

5. Let f be of bounded variation on [a, b]. If f = g + h, where g is absolutely continuous and h is singular, show that b

φ df =

a

6. 7.

8.

9.

b

φf dx +

a

b

φ dh

a

for any continuous φ. Show that if α > 0, then xα is absolutely continuous on every bounded subinterval of [0, ∞). Prove that f is absolutely continuous on [a, b] if and only if given ε > 0, there existsδ > 0 such that [ f (bi ) − f (ai )] < ε for every finite collection [ai , bi ] of nonoverlapping subintervals of [a, b] with (bi − ai ) < δ. Prove the following converse of Theorem 7.31: If f is of bounded variation on [a, b], and if the function V(x) = V[a, x] is absolutely continuous on [a, b], then f is absolutely continuous on [a, b]. If f is of bounded variation on [a, b], show that b f ≤ V[a, b]. a

Show that if equality holds in this inequality, then f is absolutely continuous on [a, b]. (For the second part, use Theorems 2.2(ii) and 7.24 to show that V(x) is absolutely continuous and then use the result of Exercise 8.) 10. (a) Show that if f is absolutely continuous on [a, b] and Z is a subset of [a, b] of measure zero, then the image set defined by f (Z) = {w : w = f (z), z ∈ Z} also has measure zero. Deduce that the image under f of any measurable subset of [a, b] is measurable. (Compare Theorem 3.33.) (Hint: use the fact that the image of an interval [ai , bi ] is an interval of length at most V (bi ) − V (ai ).) (b) Give an example of a strictly increasing Lipschitz continuous function f and a set Z with measure 0 such that f −1 (Z) does not have measure 0 (and consequently, f −1 is not absolutely continuous). (Let f −1 (x) = x + C(x) on [0, 1], where C(x) is the Cantor–Lebesgue function.) 11. Prove the following result concerning changes of variable. Let g(t) be monotone increasing and absolutely continuous on [α, β] and let f be integrable on [a, b], a = g(α), b = g(β). Then f (g(t))g (t) is measurable and integrable on [α, β], and b a

f (x) dx =

β α

f (g(t))g (t) dt.

178

Measure and Integral: An Introduction to Real Analysis

(Consider the cases when f is the characteristic function of an interval, an open set, etc.) 12. Use Jensen’s inequality to prove that if a, b ≥ 0, p, q > 1, (1/p) + (1/q) = 1, then ab ≤

bq ap + . p q

More generally, show that pj

a1 · · · aN ≤

N a j j=1

where aj ≥ 0, pj > 1, convexity of ex .)

N

j=1 (1/pj )

pj

,

= 1. (Write aj = exj /pj and use the

13. Prove Theorem 7.36. 14. Prove that φ is convex on (a, b) if and only if it is continuous and

x1 + x2 φ 2

≤

φ (x1 ) + φ (x2 ) 2

for x1 , x2 ∈ (a, b). 15. Theorem 7.43 shows that a convex function is the indefinite integral of a monotone increasing function. Prove the converse: If φ(x) = xa f (t)dt + φ(a) in (a, b) and f is monotone increasing, then φ is convex in (a, b). (Use Exercise 14.) 16. Show that the formula +∞ −∞

fg = −

+∞

f g

−∞

for integration by parts may not hold if f is of bounded variation on (−∞, +∞) and g is infinitely differentiable with compact support. (Let f be the Cantor–Lebesgue function on [0, 1], and let f = 0 elsewhere.) 17. A sequence {φk } of set functions is said to be uniformly absolutely continuous if given ε > 0, there exists δ > 0 such that if E satisfies |E| < δ, then |φk (E)| < ε for all k. If fk is a sequence of integrable functions on (0, 1) which converges pointwise a.e. to an integrable f , show that 1 0 f − fk → 0 if and only if the indefinite integrals of the fk are uniformly absolutely continuous. (cf. Exercise 23 of Chapter 10.)

Differentiation

179

18. Prove the following set-theoretic result related to the simple Vitali covering lemma. If C = {Q} is a collection of cubes all contained in a fixed bounded set in Rn , then there is a countable subcollection {Qk } of disjoint cubes in C such that every Q ∈ C is contained in some Q∗k , where Q∗k denotes the cube concentric with Qk of edge length 5 times that of Qk . Deduce the measure-theoretic consequence (cf. Lemma 7.4) that if a set E is covered by such a collection C of cubes, then there exist β > 0, depending only on n, and a finite number of disjoint cubes Q1 , . . . , QN in C such that β|E|e ≤ N k=1 |Qk |. Formulate analogues of these facts for a collection of balls in Rn . 19. Use Exercise 18 to prove Lemma 7.9. 20. (a) Let f (x) be defined for all x ∈ Rn by f (x) = 0 if every coordinate of x is rational, and f (x) = 1 otherwise. Describe the set of all x at which 1 |Q| Q f has a limit as Q x and describe all Lebesgue points of f .

21.

22.

23. 24.

25.

26.

(b) Give an example of a bounded function f on (−∞, ∞) with the following x properties: f is continuous except at a single point x0 ; (d/dx) 0 f = f (x) for all x (in particular when x = x0 ); x0 is not a Lebesgue point of f . For x ∈ Rn and 0 < α < n, define f (x) = |x|−α χ{|x| 0, let Hρ be the intersection of H with the ballBρ of radius ρ centered at 0. (a) Show that the set 2x − y :x, y ∈ Hρ covers Bρ if ρ is sufficiently small. Deduce that the set x + h : x, x − h ∈ Hρ , h ∈ Rn covers Bρ if ρ is small. (For the first part, compare Lemma 3.37. Recall from Theorem 3.35 that if E is a measurable set in Rn , then the set 2E defined by 2E = {2x : x ∈ E} has measure |2E| = 2n |E|. Note that 2Hρ and the set obtained by translating Hρ by any fixed point in Bρ are both subsets of B2ρ .) (b) While the result in part (a) is adequate for the purpose of proving Theorem 7.63, it can be improved and generalized. If r, s are real numbers with 0 < |s| ≤ |r|, show that the set rx + sy : x, y ∈ Hρ covers B|r|ρ if ρ is sufficiently small. (Argue as for part (a), but consider only translations of an appropriate subset of −sHρ .) 31. Let f be a finite measurable function on Rn that satisfies f (x + h) + f (x − h) − 2f (x) = O(1) uniformly in x, h for |h| < δ, that is, there are positive constants A, δ such that |f (x + h) + f (x − h) − 2f (x)| ≤ A if x, h ∈ Rn and |h| < δ. Show that f is bounded on every bounded set in Rn . (First show that f is bounded on some ball in Rn . To do so, note that there is a measurable set E with |E| > 0 on which f is bounded. Pick a point of density of E and apply Exercise 30.) Generalize the result to functions f that satisfy f (x + ay) + f (x + by) − 2f (x) = O(1) for fixed real a, b = 0.

Differentiation

181

32. Complete the proof of the extension Theorem 7.64 by verifying the properties listed in the last line of its proof for the function f1 defined in (7.65). 33. Let f be defined and Lipschitz continuous on a measurable set E ⊂ Rn . Show that f has an approximate differential relative to E at almost every point of E.

8 Lp Classes

8.1 Definition of Lp If E is a measurable subset of Rn and p satisfies 0 < p < ∞, then Lp (E) denotes p the collection of measurable f for which E | f | is finite, that is, p

L (E) = f :

p

| f | < +∞ ,

0 < p < ∞.

E

Here, f may be complex-valued (see Exercise 3 of Chapter 4 for the definition of measurability of vector-valued functions). In this case, if f = f1 + if2 for measurable real-valued f1 and f2 , we have | f |2 = f12 + f22 , so that f1 , f2 ≤ | f | ≤ f 1 + f 2 . It follows that f ∈ Lp (E) if and only if both f1 , f2 ∈ Lp (E). See Exercise 1. We shall write || f ||p,E =

1/p |f|

p

,

0 < p < ∞;

E

thus, Lp (E) is the class of measurable f for which || f ||p,E is finite. Whenever it is clear from context what E is, we will write Lp for Lp (E) and || f ||p for || f ||p,E . Note that L1 = L. In order to define L∞ (E), let f be real-valued and measurable on a set E of positive measure. Define the essential supremum of f on E as follows: If | x ∈ E : f (x) > α | > 0 for all real α, let essE sup f = +∞; otherwise, let ess sup f = inf{α : |{x ∈ E : f (x) > α}| = 0}. E

Since the distribution function ω(α) = |{x ∈ E : f (x) > α}| is continuous from the right (see Lemma 5.39), it follows that ω(essE sup f ) = 0 if essE sup f 183

184

Measure and Integral: An Introduction to Real Analysis

is finite. Therefore, essE sup f is the smallest number M, −∞ ≤ M ≤ + ∞, such that f (x) ≤ M except for a subset of E of measure zero. In the definition of essE sup f , we have assumed that |E| = 0. If the same definition were used when |E| = 0, then essE sup f would be −∞ for every real-valued f defined on E, resulting in incorrect or awkward statements of results such as Theorem 8.1. We will avoid technical difficulty of this type by adopting the convention that essE sup f is 0 when |E| = 0. This may be considered an analogue of the fact that ∫E f = 0 when |E| = 0. In practice, we will use the convention only when f ≥ 0 and |E| = 0. A real- or complex-valued measurable f is said to be essentially bounded, or simply bounded, on E if essE sup | f | is finite. By convention, if |E| = 0, then every function is essentially bounded on E and has essential supremum equal to 0. The class of all functions that are essentially bounded on E is denoted by L∞ (E). Clearly, f belongs to L∞ (E) if and only if its real and imaginary parts do. We shall write || f ||∞ = || f ||∞,E = ess sup | f |. E

Thus, || f ||∞ is the smallest M such that | f | ≤ M a.e. in E, and L∞ = L∞ (E) = { f : || f ||∞ < +∞}. The following theorem gives some motivation for this notation.

Theorem 8.1

If |E| < +∞, then || f ||∞ = limp→∞ || f ||p .

Proof. We may assume that |E| > 0 since f ∞ and f p are both 0 if |E| = 0. Let M = f ∞ . If M < M, then the set A = {x ∈ E : | f (x)| > M } has posi 1/p tive measure. Moreover, || f ||p ≥ A | f |p ≥ M |A|1/p . Since |A|1/p → 1 as

p → ∞, it follows that lim inf p→∞ || f ||p ≥ M and therefore that lim inf p→∞

1/p || f ||p ≥ M. However, we also have || f ||p ≤ E Mp = M|E|1/p . Hence, lim supp→∞ || f ||p ≤ M, which completes the proof. Remark: This result may fail if |E| = +∞. Consider, for example, the constant function f (x) = c, c = 0, in (0, ∞). Clearly, f ∈ L∞ but f ∈ / Lp for 0 < p < ∞. See also Exercise 26. We will now study some basic properties of the Lp classes.

Theorem 8.2

If 0 < p1 < p2 ≤ ∞ and |E| < +∞, then Lp2 ⊂ Lp1 .

Lp Classes

185

Proof. Write E = E1 ∪ E2 , E1 being the set where | f | ≤ 1 and E2 the set where | f | > 1. Then E

| f |p =

E1

| f |p +

| f |p ,

0 < p < ∞.

E2

The first term on the right is majorized by |E1 |; the second increases with p since its integrand exceeds 1. It follows that if f ∈ Lp2 , p2 < ∞, then f ∈ Lp1 , p1 < p2 . If p1 < p2 = ∞ and f ∈ L∞ then f is a bounded function on a set of finite measure and so belongs to Lp1 . Remarks (i) In Theorem 8.2, the hypothesis that E have finite measure cannot be omitted: for example, x−1/p1 belongs to Lp2 (1, ∞) if p2 > p1 but does not belong to Lp1 (1, ∞). Again, any nonzero constant is in L∞ , but is not in Lp1 (E) if |E| = +∞ and p1 < ∞. (ii) A function may belong to all Lp1 with p1 < p2 and yet not belong to Lp2 . In fact, if p2 < ∞, x−1/p2 belongs to Lp1 (0, 1), p1 < p2 , but does not belong to Lp2 (0, 1); log(1/x) is in Lp1 (0, 1) for p1 < ∞, but is not in L∞ (0, 1). (iii) We leave it to the reader to show that any function that is bounded on E (|E| < +∞ or not) and that belongs to Lp1 (E) also belongs to Lp2 (E), p2 > p1 . The next theorem states that the Lp classes are vector (i.e., linear) spaces. Its proof is left as an exercise.

Theorem 8.3 constant c.

If f , g ∈ Lp (E), p > 0, then f + g ∈ Lp (E) and cf ∈ Lp (E) for any

8.2 Hölder’s Inequality and Minkowski’s Inequality In order to discuss the integrability of the product of two functions, we will use the following basic result.

Theorem 8.4 (Young’s Inequality) Let y = φ(x) be continuous, real-valued, and strictly increasing for x ≥ 0 and let φ(0) = 0. If x = ψ(y) is the inverse of φ, then for a, b > 0,

186

Measure and Integral: An Introduction to Real Analysis

ab ≤

a

φ(x) dx +

0

b

ψ(y) dy.

0

Equality holds if and only if b = φ(a). Proof. A geometric proof is immediate if we interpret each term as an area and remember that the graph of φ also serves as that of ψ if we interchange the x- and y-axes. Equality holds if and only if the point (a, b) lies on the graph of φ.

y y = f(x) b

0

a

x

If φ(x) = xα , α > 0, then ψ(y) = y1/α , and Young’s inequality becomes ab ≤ a1+α /(1 + α) + b1+1/α /(1 + 1/α). Setting p = α + 1 and p = 1 + 1/α, we obtain

bp 1 ap 1 + if a, b ≥ 0, 1 < p < ∞, and + = 1. ab ≤ p p p p

(8.5)

Two numbers p and p that satisfy 1/p + 1/p = 1, p, p > 1, are called conjugate exponents. Note that p = p/(p − 1), and that 2 is self-conjugate. We will adopt the conventions that p = ∞ if p = 1, and p = 1 if p = ∞. Theorem 8.6 (Hölder’s Inequality) If 1 ≤ p ≤ ∞ and 1/p + 1/p = 1, then || fg||1 ≤ || f ||p ||g||p ; that is, E

E

| fg| ≤

1/p | f |p

E

E

|g|

p

| fg| ≤ ess sup | f | |g|. E

E

1/p

,

1 < p < ∞;

Lp Classes

187

Proof. The last inequality, which corresponds to the case p = ∞, is obvious. Let us suppose then that 1 < p < ∞. In case || f ||p = ||g||p = 1, (8.5) implies that p

p

| f |p ||g||p

|| f ||p |g|p +

+ | fg| ≤ = p p p p

E

E

=

1 1 + = 1 = || f ||p ||g||p . p p

For the general case, we may assume that neither || f ||p nor ||g||p is zero; otherwise, fg is zero a.e. in E, and the result is immediate. We may also assume that neither || f ||p nor ||g||p is infinite. If we set f1 = f /|| f ||p and

g1 = g/||g||p , then || f1 || p = ||g1 ||p = 1. Therefore, by the case already consid ered, we have E f1 g1 ≤ 1; that is, E | fg| ≤ || f ||p ||g||p , as desired. See Exercise 4 concerning the case of equality in Hölder’s inequality. The case p = p = 2 of Hölder’s inequality is a classical inequality:

Corollary 8.7 (Schwarz’s Inequality) E

| fg| ≤

|f|

2

1/2

E

1/2 |g|

2

.

E

The theorem that follows is usually referred to as the converse of Hölder’s inequality (see also Exercise 15 in Chapter 10). Theorem 8.8 Let f be real-valued and measurable on E, and let 1 ≤ p ≤ ∞ and 1/p + 1/p = 1. Then (8.9) || f ||p = sup fg, E

where the supremum is taken over all real-valued g such that g p ≤ 1 and exists.

E fg

Proof. That the left-hand side of (8.9) majorizes the right-hand side follows from Hölder’s inequality. To show the opposite inequality, let us consider first the case of f ≥ 0, 1 < p < ∞. If || f ||p = 0, then f = 0 a.e. in E, and the result is obvious. If 0 < || f ||p < +∞, we may further assume that || f ||p = 1 by dividing both sides of (8.9)

188

Measure and Integral: An Introduction to Real Analysis

by || f ||p . Now let g = f p/p . It is easy to verify that ||g||p = 1 and which completes the proof in this case. If || f ||p = +∞, define functions fk on E by setting fk (x) = 0 if |x| > k,

E fg

= 1,

fk (x) = min{ f (x), k} if |x| ≤ k.

Then each fk belongs to Lp and || fk ||p → || f ||p = +∞. By the case already considered, we have || fk ||p = E fk gk for some gk ≥ 0 with ||gk ||p = 1. Since f ≥ fk , it follows that

fgk ≥

E

fk gk → +∞.

E

This shows that

fg = +∞ = || f ||p .

sup

||g||p =1 E

To dispose of the restriction f ≥ 0, apply the result above to | f |. Thus, there exists gk with ||gk ||p = 1 such that || f ||p = lim

| f | gk = lim

E

f g˜ k ,

E

where g˜ k = gk (sign f ). (By sign

x, we mean the function equal to +1 for x > 0 and to −1 for x < 0.) Since g˜ k p = 1, the result follows. The cases p = 1 and ∞ are left as exercises. We leave it to the reader to check that (8.9) is true if thesupremum is taken only over those real-valued g with ||g||p = 1 for which E fg exists. Also, if 1 ≤ p ≤ ∞, then a measurable function f belongs to Lp (E) if fg ∈ L1 (E) for

every g ∈ Lp (E), 1/p + 1/p = 1. See Exercise 2. We have already observed that the sum of two Lp functions is again in Lp . The next theorem gives a more specific result when 1 ≤ p ≤ ∞. If 1 ≤ p ≤ ∞, then f + g p ≤

Theorem 8.10 (Minkowski’s Inequality) f p + g p ; that is,

1/p | f + g|

E

p

≤

1/p |f|

E

p

+

1/p |g|

p

E

ess sup | f + g| ≤ ess sup | f | + ess sup |g|. E

E

E

,

1 ≤ p < ∞,

Lp Classes

189

Proof. If p = 1, the result is obvious. If p = ∞, we have | f | ≤ || f ||∞ a.e. in E and |g| ≤ ||g||∞ a.e. in E. Therefore, | f + g| ≤ || f ||∞ + ||g||∞ a.e. in E, so that || f + g||∞ ≤ || f ||∞ + ||g||∞ . For 1 < p < ∞, p

|| f + g||p =

| f + g|p =

E

≤

| f + g|p−1 | f + g|

E

| f + g|

E

p−1

|f| +

| f + g|p−1 |g|.

E

In the last integral, apply Hölder’s inequality to | f + g|p−1 and |g| with exponents p = p/(p − 1) and p, respectively. This gives

p−1

| f + g|p−1 |g| ≤ || f + g||p

||g||p .

E

Since a similar result holds for

p−1

E |f

p

+ g|p−1 | f |, we obtain || f + g||p ≤ || f +

(|| f ||p + ||g||p ), and the theorem follows by dividing both sides by || f + p−1 g||p . (Note that if || f + g||p = 0, there is nothing to prove; if || f + g||p = +∞, g||p

then either f p = +∞ or g p = +∞ by Theorem 8.3, and the result is obvious again.) See Exercise 27(a) concerning a version of Minkowski’s inequality for infinite series. Remark: Minkowski’s inequality fails for 0 < p < 1. To see this, take E = (0, 1), f = χ(0, 1 ) , and g = χ( 1 ,1) . Then || f + g||p = 1, while || f ||p + ||g||p = 2−1/p + 2

2

2−1/p = 21−1/p < 1. See also (8.17) and Exercise 27(b).

8.3 Classes l p Let a = {ak } be a sequence of real or complex numbers, and let

||a||p =

k

1/p |ak |p

, 0 < p < ∞;

||a||∞ = sup |ak |. k

Then a is said to belong to l p , 0 < p < ∞, if ||a||p < +∞, and to belong to l∞ if ||a||∞ < +∞. We will sometimes also denote ||a||p by ||a||l p .

190

Measure and Integral: An Introduction to Real Analysis

Let us show that if 0 < p1 < p2 ≤ ∞, then l p1 ⊂ l p2 . (The opposite inclusion holds for Lp (E), |E| < +∞ , by Theorem 8.2.) For p2 = ∞, this is clear, and for p2 < ∞, it follows from the fact that if |ak | ≤ 1, then |ak |p2 ≤ |ak |p1 . Moreover (see Exercise 31), ||a||p ≤ ||a||1 if 1 ≤ p ≤ ∞,

and ||a||p ≤ ||a||q if 0 < q ≤ p ≤ ∞.

2 for a given p < ∞ but that is not in An example ofa sequence that is in l p 2 1/p2 2 p 1 l for p1 < p2 is 1/k log k : k ≥ 2 . Any constant sequence {ak }, ak =

c = 0, belongs to l∞ but not to l p for p < ∞. The same is true for {1/ log k : k ≥ 2}, whose terms even tend to zero.

Theorem 8.11 a ∞ .

If a = {ak } belongs to l p for some p < ∞, then limp→∞ a p =

Proof. If a ∈ l p0 , then a ∈ l p for p0 ≤ p ≤ ∞. Since |ak | → 0,there is a |ak |p = |ak0 |p |ak /ak0 |p . largest |ak |, say |ak0 |. Thus, ||a|| ∞ = |ak0 |. Write |ak /ak0 |p decreases (and so is bounded) as Since |ak /ak0 | ≤ 1, we see that p p ∞. Hence, there is a constant c > 0 such that |ak0 |p ≤ ||a||p ≤ c|ak0 |p for all large p. Since c1/p → 1 as p → ∞, the theorem follows. The next two results are analogues for series of Hölder’s and Minkowski’s inequalities. Their proofs are left as exercises. If a = {ak } and b = {bk }, we use the notation ab = {ak bk } ,

a + b = {ak + bk } ,

etc.

Theorem 8.12 (Hölder’s Inequality) Suppose that 1 ≤ p ≤ ∞ , 1/p+1/p = 1, a = {ak }, b = {bk }, and ab = {ak bk }. Then ||ab||1 ≤ ||a||p ||b||p ; that is,

|ak bk | ≤

|ak |p

|ak bk | ≤ (sup |ak |)

1/p

|bk |p

1/p

,

1 < p < ∞;

|bk | .

Theorem 8.13 (Minkowski’s Inequality) Suppose that 1 ≤ p ≤ ∞, a = {ak }, b = {bk }, and a + b = {ak + bk }. Then ||a + b||p ≤ ||a||p + ||b||p ; that is,

|ak + bk |p

1/p

≤

|ak |p

1/p

+

sup |ak + bk | ≤ sup |ak | + sup |bk | .

|bk |p

1/p

,

1 ≤ p < ∞;

Lp Classes

191

Even though Minkowski’s inequality fails when p < 1 (see Exercise 3), l p is still a vector space for 0 < p < 1; that is, a + b ∈ l p and αa = {αak } ∈ l p if a, b ∈ l p and α is any constant.

8.4 Banach and Metric Space Properties We now define a notion that incorporates the main properties of Lp and l p when p ≥ 1. A set X is called a Banach space over the complex numbers if it satisfies the following three conditions: (B1 ) X is a linear space over the complex numbers C; that is, if x, y ∈ X and α ∈ C, then x + y ∈ X and αx ∈ X. (B2 ) X is a normed space; that is, for every x ∈ X there is a nonnegative (finite) number ||x|| such that (a) ||x|| = 0 if and only if x is the zero element of X, (b) ||αx|| = |α|||x|| for α ∈ C and x ∈ X, (c) x + y ≤ x + y . If these conditions are fulfilled, ||x|| is called the norm of x. (B3 ) X is complete with respect to its norm; that is, every Cauchy sequence in X converges in X, or if ||xk − xm || → 0 as k, m → ∞, then there is an x ∈ X such that ||xk − x|| → 0. A set X that satisfies (B1 ) and (B2 ), but not necessarily (B3 ), is called a normed linear space over the complex numbers. A sequence {xk } such that ||xk − x|| → 0 as k → ∞ is said to converge in norm to x. Restricting the scalars α in (B1 ) and (B2 ) to be real numbers, we obtain definitions for a Banach space over the real numbers and for a normed linear space over the real numbers. Unless specifically stated to the contrary, we will take the scalar field to be the complex numbers. If X is a Banach space, define d(x, y) = ||x − y|| to be the distance between x and y. Then, (M1 ) d(x, y) ≥ 0; d(x, y) = 0 if and only if x = y, (M2 ) d(x, y) = d(y, x), (M3 ) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality). Any set that has a distance function d(x, y) satisfying (M1 ), (M2 ), and (M3 ) is called a metric space with metric d. Therefore, a Banach space is a metric space whose metric is the norm. Moreover, by (B3 ), a Banach space X is a complete metric space; that is, if d (xk , xm ) → 0 as k, m → ∞, then there is an x ∈ X such that d (xk , x) → 0.

192

Theorem 8.14 || f ||p,E .

Measure and Integral: An Introduction to Real Analysis

For 1 ≤ p ≤ ∞, Lp (E) is a Banach space with norm || f || =

Proof. Parts (B1 ) and (B2 ) in the definition of a Banach space are clearly fulfilled by Lp (E), parts (a) and (c) of (B2 ) being Theorem 5.11 and Minkowski’s inequality, respectively. (Regarding part (a), we do not distinguish between two Lp functions that are equal a.e.; thus, the zero element of Lp (E) means any function equal to zero a.e. in E.) p To verify (B 3 ), suppose that fk is a Cauchy sequence in L (E). If p = ∞, fk − fm ≤ || fk − fm ||∞ except for a set Zk,m of measure zero. If Z = then measure zero, and fk − fm ≤ || fk − fm ||∞ outside Z for k,m Zk,m , then Z has all k and m. Hence, fk converges uniformly outside Z to a bounded limit f , and it follows that || fk −f ||∞ → 0. (Note that convergence in L∞ is equivalent to uniform convergence outside a set of measure zero.) In case 1 ≤ p < ∞, Tchebyshev’s inequality (5.49) implies that x ∈ E : | fk (x) − fm (x)| > ε ≤ ε−p fk − fm p . E

Hence, fk is a Cauchy in measure. By Theorems 4.22 and 4.23, sequence there is a subsequence fkj and a function f such that fkj → f a.e. in E. Given ε > 0, there is a K such that p 1/p = || fkj − fk ||p < ε if kj , k > K. f k j − fk E

Letting kj → ∞, we obtain by Fatou’s lemma that || f − fk ||p ≤ ε if k > K. Hence, || f −fk ||p → 0 as k → ∞. Finally, since || f ||p ≤ || f −fk ||p +|| fk ||p < +∞, it follows that f ∈ Lp (E), which completes the proof. A metric space X is said to be separable if it has a countable dense subset; that is, X is separable if there exists a countable set {xk } in X with the property that for every x ∈ X and every ε > 0, there is an xk with d (x, xk ) < ε. In the next theorem, we will show that Lp is separable if 1 ≤ p < ∞. Note that L∞ is not separable: take L∞ (0, 1), for example, and consider the functions ft (x) = χ(0,t) (x), 0 < t < 1. There are an uncountable number of these, and || ft −ft ||∞ = 1 if t = t (see also Exercise 10).

Theorem 8.15

If 1 ≤ p < ∞, Lp (E) is separable.

Proof. Suppose first that E = Rn , and consider a grid of dyadic cubes in Rn . Let D be the set of all (finite) linear combinations of characteristic functions

Lp Classes

193

of these cubes, the coefficients being complex numbers with rational real and imaginary parts. Then D is a countable subset of Lp (Rn ). To see that D is dense, use the method of successively approximating more and more general functions: First, consider characteristic functions of open sets (every open set is the countable union of nonoverlapping dyadic cubes by Theorem 1.11), of Gδ sets, and of measurable sets with finite measure; then consider simple functions whose supports have finite measure, nonnegative functions in Lp (Rn ), and, finally, arbitrary functions in Lp (Rn ). The details are left to the reader (cf. Lemma 7.3). This proves the case E = Rn . For an arbitrary measurable E, let D denote the restrictions to E of the functions in D. Then D is dense in Lp (E), 1 ≤ p < ∞. In fact, given p and f ∈ Lp (E), let f1 = f on E and f1 = 0 off E. Then f1 ∈ Lp (Rn ), so that given ε > 0,

1/p

1/p < ε. Therefore, E | f − g|p < ε. there exists g ∈ D with Rn | f1 − g|p This shows that D is dense in Lp (E) and completes the proof. As we have already noted, Minkowski’s inequality fails when 0 < p < 1. Therefore, || · ||p,E is not a norm for such p. However, we still have the following facts. Theorem 8.16 If 0 < p < 1, Lp (E) is a complete, separable metric space, with distance defined by p

d(f , g) = || f − g||p,E . Proof. With d(f , g) so defined, properties (M1 ) and (M2 ) of a metric space are clear. To verify (M3 ), which is the triangle inequality, we first claim that (a + b)p ≤ ap + bp if a, b ≥ 0, 0 < p ≤ 1. If both a and b are zero, this is obvious. If, say, a = 0, then dividing by ap , we reduce the inequality to (1 + t)p ≤ 1 + tp , t > 0 (t = b/a). This is clear since both sides are equal when t = 0 and the derivative of the right side majorizes that of the left for t > 0. It follows that | f (x) − g(x)|p ≤ | f (x) − h(x)|p + |h(x) − g(x)|p if 0 < p ≤ 1. p p p Integrating, we obtain || f − g||p ≤ || f − h||p + ||h − g||p , which is just the triangle inequality. The proofs that Lp is complete and separable with respect to || · ||p are the same as in Theorems 8.14 and 8.15. It is worth noting that in case 0 < p ≤ 1, the triangle inequality is equivalent to the basic estimate

194

Measure and Integral: An Introduction to Real Analysis

p

p

p

|| f + g||p ≤ || f ||p + ||g||p

(0 < p ≤ 1).

(8.17)

See also Exercise 27(b). The analogous results for series are listed in the next theorem.

Theorem 8.18 (i) If 1 ≤ p ≤ ∞, l p is a Banach space with a = a p . For 1 ≤ p < ∞, l p is separable; l∞ is not separable. (ii) If 0 < p < 1, l p is a complete, separable metric space, with distance d(a, b) = p a − b p . Proof. We will show that l p is complete and separable when 1 ≤ p < ∞ and that l∞ is not separable. The rest of the proof of (i) and the proof of (ii) are left to the reader.

Suppose that 1 ≤ p < ∞, a(i) = ak(i) ∈ l p for i = 1, 2, . . ., and a(i) − a(j) p →

(j) (i) 0 as i, j → ∞. Since a(i) − a(j) p ≥ ak − ak for every k, it follows that (j) (i) (i) ak − ak → 0 for every k as i, j → ∞. Let ak = limi→∞ ak and a = {ak }. We

will show that a ∈ l p and a(i) − a → 0. Given ε > 0, there exists N such that p

1/p (i) (j) p = a(i) − a(j) p < ε ak − ak

if i, j > N.

k

Restricting the summation to k ≤ M and letting j → ∞, we obtain M p 1/p (i) ≤ε ak − ak

for any M, if i > N.

k=1

Letting M → ∞, we get a(i) − a p ≤ ε if i > N; that is, a(i) − a p → 0. The

fact that a p ≤ a − a(i) p + a(i) p shows that a ∈ l p . Hence, l p is complete. To prove that l p is separable when p < ∞, let D be the set of all sequences {dk } such that (a) the real and imaginary parts of dk are rational, and (b) dk = 0 for k ≥ N (N may vary from sequence to sequence). Then D is a countable subset of l p . If a = {ak } ∈ l p and ε > 0, choose N so that ∞ p k=N+1 |ak | < ε/2. Choose d1 , . . . , dN with rational real and imaginary parts N such that k=1 |ak − dk |p < ε/2. Then d = {d1 , . . . , dN , 0, . . .} belongs to D and p a − d p < ε. It follows that D is dense in l p and therefore that l p is separable. To see that l∞ is not separable, consider the subclass of sequences a = {ak } for which each ak is 0 or 1. The number of such sequences is uncountable,

Lp Classes

195

and a − a ∞ = 1 for any two different such sequences. Hence, l∞ cannot be separable. We know from Lusin’s theorem that measurable functions have continuity properties. The next theorem gives a useful continuity property of functions in Lp . Theorem 8.19 (Continuity in Lp ) If f ∈ Lp (Rn ), 1 ≤ p < ∞, then lim f (x + h) − f (x) p = 0.

|h|→0

Proof. Let Cp denote the class of f ∈ Lp such that f (x + h) − f (x) p → 0 as |h| → 0. We claim that (a)

a finite

linear combination of functions in Cp is in

Cp , and (b) if fk ∈ Cp and fk − f p → 0, then f ∈ Cp . Both of these facts follow easily from Minkowski’s inequality; for (b), for example, note that f (x + h) − f (x) p

≤ f (x + h) − fk (x + h) p + fk (x + h) − fk (x) p + fk (x) − f (x) p

= fk (x + h) − fk (x) p + 2 fk (x) − f (x) p . Since fk ∈ Cp , we have lim sup|h|→0 f (x + h) − f (x) p ≤ 2 fk (x) − f (x) p , and (b) follows by letting k → ∞. Clearly, the characteristic function of a cube belongs to Cp . Hence, in view of the fact that linear combinations of characteristic functions of cubes are dense in Lp (Rn ) (see the proof of Theorem 8.15), it follows from (a) and (b) that Lp (Rn ) is contained in Cp , and the proof is complete. We remark without proof that Theorem 8.19 is also true for 0 < p < 1. (Use p the same ideas for · p .) It fails, however, for p = ∞, as shown by the function χ = χ(0,∞) (x) on (−∞, +∞). In fact, χ ∈ L∞ (−∞, +∞) but χ (x + h) − χ(x) ∞ = 1 for all h = 0.

8.5 The Space L2 and Orthogonality For complex-valued measurable f , f = f1 + if2 with f1 and f2 real-valued and f = measurable, we have E f1 + i E f2 (see p. 96 in Section 5.3). We will E use the fact that E f ≤ E | f | (see Exercise 1).

196

Measure and Integral: An Introduction to Real Analysis

Among the Lp spaces, L2 has the special property that the product of any two of its elements is integrable (Schwarz’s inequality). This simple fact leads to some important extra structure in L2 , which we will now discuss. E is a fixed subset of Rn of positive measure, Consider L2 = L2 (E), where and write f = f 2,E , E f = f , etc. For f , g ∈ L2 , define the inner product of f and g by f , g =

f g,

(8.20)

where g denotes the complex conjugate of g. Note that by Schwarz’s inequality, |f , g| ≤ f g . Moreover, the inner product has the following properties: (a) g, f = f , g, (b) f1 + f2 , g = f1 , g + f2 , g , f , g1 + g2 = f , g1 + f , g2 , (c) αf , g = αf , g, f , αg = αf , g, α ∈ C, (d) f , f 1/2 = f . If f , g = 0, then f and g are said to be orthogonal. A set {φα }α∈A is orthogonal if any two of its elements are orthogonal; {φα } is orthonormal if it is orthogonal and φα = 1 for all α. Note that if {φα } is orthogonal and φα = 0 for every α, then {φα / φα } is orthonormal. Henceforth, we will assume that φα = 0 for all α for an orthogonal system {φα }. This implies that no element is zero. Furthermore, since for α = β,

2

φα − φβ 2 = φα − φβ φα − φβ = φα 2 + φβ = 0, it implies that no two elements are equal. A simple example of an infinite orthogonal system in L2 (E) is χEj where Ej is an infinite collection of disjoint measurable subsets of E with 0 < Ej < ∞ (cf. Exercise 33 of Chapter 3. See also Exercise 24 of this chapter).

Theorem 8.21

Any orthogonal system {φα } in L2 is countable.

Proof. We may assume that {φα } is orthonormal. Then for α = β, as above,

2

φα − φβ 2 = φα − φβ φα − φβ = φα 2 + φβ = 2,

Lp Classes

197

√ so that φα − φβ = 2. Since L2 is separable, it follows that {φα } must be countable. A collection ψ1 , . . . , ψN is said to be linearly independent if N k=1 ak ψk (x) = 0 (a.e.) implies that every ak is zero. Any collection of functions is called linearly independent if each finite subcollection is linearly independent. No function in a linearly independent set can be zero a.e.

Theorem 8.22

If {ψk } is orthogonal, it is linearly independent.

Proof. Suppose that a1 ψk1 + · · · + aN ψkN = 0. Multiplying both sides by ψk1 and integrating, we obtain by orthogonality that a1 = 0. Similarly, a2 = · · · = aN = 0. The converse of Theorem 8.22 is not true. However, the next result shows that if {ψk } is linearly independent, then the system formed from suitable linear combinations of its elements is orthogonal. Theorem 8.23 (Gram–Schmidt Process) If {ψk } is linearly independent, then the system {φk } defined by φ1 = ψ1 φ2 = a21 ψ1 + ψ2 .. .

.. .

φk = ak1 ψ1 + · · · + ak,k−1 ψk−1 + ψk .. .

.. .

is orthogonal for proper selection of the aij . Proof. Having φ1 = ψ1 , we proceed by induction, assuming that φ1 , . . . , φk−1 have been chosen as required. We will determine constants bk1 , . . . , bk,k−1 so that the function φk defined by φk = bk1 φ1 + · · · + bk,k−1 φk−1 + ψk is orthogonal to φ1 , . . . , φk−1 . If j < k, φk , φj = bkj φj , φj + ψk , φj

198

Measure and Integral: An Introduction to Real Analysis

by orthogonality. Since φj , φj = 0, bkj can be chosen so that φk , φj = 0, j < k. Since each φj with j < k is a linear combination of ψ1 , . . . , ψj , the theorem follows. When the φk are selected by the Gram–Schmidt process, we shall say that they are generated from the ψk . Note that the triangular character of the matrix in Theorem 8.23 means that each ψk can also be written as a linear combination of the φj , j ≤ k. that is We call an orthogonal system {φk } complete if the only function orthogonal to every φk is zero; that is, {φk } is complete if f , φk = 0 for all k implies that f = 0 a.e. Thus, a complete orthogonal system is one that is maximal in the sense that it is not properly contained in any larger orthogonal system. The span of a set of functions {ψk } is the collection of all finite linear combinations of the ψk . In speaking of the span of {ψk }, we may always assume that {ψk } is orthogonal by discarding any dependent functions and applying the Gram–Schmidt process to the resulting linearly independent set. A set {ψk } is called a basis for L2 if its span is dense in L2 ; that is, {ψk } is a basis if given f ∈ L2 and ε > 0, there exist N and {ak }N k=1 such that

N

f − k=1 ak ψk < ε. The ak can always be chosen with rational real and imaginary parts. Any countable dense set in L2 is of course a basis. It follows that L2 has an infinite orthogonal basis. Theorem 8.24 Any orthogonal basis in L2 is complete. In particular, there exists a complete orthonormal basis for L2 . Proof. Let {ψk } be an orthogonal basis for L 2 . We may assume that {ψk } is orthonormal. To show that it is complete, let f , ψk = 0 for all k. Then f , f = N f , f − k=1 ak ψk for any finite sum N k=1 ak ψk . By Schwarz’s inequality,

N

|f , f | ≤ f f − k=1 ak ψk , and so, since the term on the right can be chosen arbitrarily small, f , f = 0. Therefore, f = 0 a.e., which completes the proof. Let us show that every complete orthogonal system in L2 (E) is infinite if (as always) |E| > 0. If not, there is a set E with |E| > 0 and a complete orthog2 onal system {φk }N k=1 in L with N finite. Assuming as we may that the system is orthonormal, its completeness implies that for every f ∈ L2 (E), we have f = N k=1 f , φk φk a.e. in E, and consequently, f , g =

N f , φk g, φk if f , g ∈ L2 (E). k=1

Lp Classes

199

(In the next section, similar facts are derived in the limit case N = ∞.) Therefore, if f and g are any two orthogonal functions in L2 (E), then the sequences N N f , φk k=1 and g, φk k=1 , considered as vectors, are orthogonal in the vecon p. 198 in tor space CN of N-tuples of complex numbers. We observed Section 8.5 that there is an infinite orthogonal system ψj in L2 (E). Hence, N there is an infinite collection ψj , φk k=1 of orthogonal vectors in the finite j

dimensional space CN , which is impossible. From now on, we will consider only orthogonal systems in L2 that are (countably) infinite.

8.6 Fourier Series and Parseval’s Formula Let {φk } be any (infinite) orthonormal system in L2 . If f ∈ L2 , the numbers defined by ck = ck (f ) = f , φk = f φk E

are called the Fourier coefficients of f with respect to {φk }. The series k ck φk is called the Fourier series of f with respect to {φk }, and denoted S[f ] = k ck φk . We also write f ∼ c k φk . k

The first question we ask is how well S[f ], or more precisely, the sequence of its partial sums, approximates f . Fix N and let L = N k=1 γk φk be a linear combination of φ1 , . . . , φN . We wish to know what choice of γ1 , . . . , γN makes f − L a minimum. Note that since {φk } is orthonormal, L 2 = L, L = N 2 k=1 |γk | . Hence, 2

f − L =

f−

N k=1

= f 2 −

γ k φk

f−

N

γ k φk

k=1

N N (γk ck + γk ck ) + |γk |2 , k=1

k=1

where the ck are the Fourier coefficients of f . Since |ck − γk |2 = (ck − γk ) (ck − γk ) = |ck |2 − (γk ck + γk ck ) + |γk |2 ,

200

Measure and Integral: An Introduction to Real Analysis

we obtain f − L 2 = f 2 +

N

|ck − γk |2 −

k=1

N

|ck |2 .

k=1

Therefore, min f − L 2 = f 2 −

N

γ1 ,...,γN

|ck |2 ;

(8.25)

k=1

that is, the minimum is achieved when γk = ck for k = 1, . . . , N, or equivalently, when L is the Nth partial sum of S[f ]. Writing sN = sN (f ) = N k=1 ck φk , we have from (8.25) that N

f − sN 2 = f 2 − |ck |2 .

(8.26)

k=1

Theorem 8.27

Let {φk } be an orthonormal system in L2 and let f ∈ L2 .

(i) Of all linear combinations N one that best approx1 γk φk with N fixed, the N imates f in L2 is given by the partial sum sN = 1 ck φk of the Fourier series of f . (ii) (Bessel’s inequality) The sequence {ck } of Fourier coefficients of f belongs to l2 and ∞

1/2 2

|ck |

≤ f .

k=1

2 Proof. Part (i) has been proved. Note that since f − sN ≥ 0, Bessel’s inequality follows from (8.26) by letting N → ∞, which completes the proof. If f is a function for which equality holds in Bessel’s inequality, that is, if ∞

1/2 |ck |

2

= f ,

(8.28)

k=1

then f is said to satisfy Parseval’s formula. From (8.26), we immediately obtain the next result.

Lp Classes

201

Theorem 8.29 Parseval’s formula holds for f if and only if sN − f → 0, that is, if and only if S[f ] converges to f in L2 norm. The following theorem is of great importance. Theorem 8.30 (Riesz–Fischer Theorem) Let {φk } be any orthonormalsystem and let {ck } be any sequence in l2 . Then there is an f ∈ L2 such that S[f ] = ck φk , that is, such that {ck } is the sequence of Fourier coefficients of f with respect to {φk }. Moreover, f can be chosen to satisfy Parseval’s formula. Proof. Let tN =

N

k=1 ck φk .

Then if M < N,

2

N

N

tN − tM 2 = |ck |2 . c φ = k k

M+1

M+1 The fact that {ck } ∈ l2 implies that {tN } is a Cauchy sequence in L2 . Since L2 is

2

complete, there is an f ∈ L such that f − tN → 0. If N ≥ k,

f φk =

f − tN φk + t N φk = f − tN φk + c k .

Since the integral on the right is bounded in absolute value by f − tN

φk = f − tN , we obtain by letting N → ∞ that f φk = ck . Thus, S[f ] = k ck φk , so that tN = sN (f ), and it follows from Theorem 8.29 that Parseval’s formula holds for f . This completes the proof. There is no guarantee that the Fourier coefficients of a function uniquely determine the function. However, if {φk } is complete, we can show that the correspondence between a function and its Fourier coefficients is unique; that is, if f and g have the same Fourier coefficients with respect to a complete system, then f = g a.e. This is simple, since the vanishing of all the Fourier coefficients of f − g implies that f − g = 0 a.e. An important related fact is the following. Theorem 8.31 Let {φk } be an orthonormal system. Then {φk } is complete if and only if Parseval’s formula holds for every f ∈ L2 . Proof. Suppose that {φk } is complete. If f ∈ L2 , Bessel’s inequality implies that its Fourier coefficients {ck } belong to l2 . Hence, by the Riesz–Fischer theorem,

202

Measure and Integral: An Introduction to Real Analysis

1/2 |ck |2 there exists a g in L2 with S[g] = ck φk and g = . Since f and g have the same Fourier coefficients and {φk } is complete, we see that f = g a.e. 1/2 |ck |2 Hence, f = g = , which is Parseval’s formula. Conversely, suppose that Parseval’s formula holds with respect to {φk } for 1/2 f , φk 2 every f ∈ L2 . If f , φk = 0 for all k, then f = = 0. Therefore, f = 0 a.e., so that {φk } is complete, which proves the result. Suppose that {φk } is orthonormal and complete and that f , g ∈ L2 . Let ck = f , φk , dk = g, φk , c = {ck } , d = {dk }, and (c, d) = ck dk . We claim that f , g = (c, d).

(8.32)

To prove this, observe that by Parseval’s formula, f + g, f + g = (c + d, c + d), or f , f + g, g + 2 Re f , g = (c, c) + (d, d) + 2 Re (c, d), where Re z denotes the real part of z. Cancelling equal terms gives Re f , g = Re (c, d). Applying this to the function if (x), we obtain Re if , g = Re (ic, d). But Re if , g = Re [if , g] = −Im f , g. Similarly, Re (ic, d) = −Im (c, d). Therefore, Im f , g = Im (c, d), and (8.32) is proved. Another corollary of Theorem 8.31 is given in the next result. First, we make several definitions. Let X1 and X2 be metric spaces with metrics d1 and d2 , respectively. Then X1 and X2 are said to be isometric if there is a mapping T of X1 onto X2 such that d1 (f , g) = d2 (Tf , Tg) for all f , g ∈ X1 . Such a T is called an isometry. Thus, an isometry is a mapping that preserves distances. An isometry is automatically one-to-one, and two isometric metric spaces may be regarded as the same space with a relabeling of the points. For example, two L2 spaces, L2 (E) and L2 (E ), are isometric if there is a mapping T of L2 (E) onto L2 (E ) such that f − g 2,E = Tf − Tg 2,E

for all f , g ∈ L2 (E). The isometries we shall encounter will be linear, that is, will satisfy T(αf + βg) = αTf + βTg

for all scalars α, β.

If T is a linear map of L2 (E) onto L2 (E ), then since Tf − Tg = T(f − g), it follows that T is an isometry if and only if f 2,E = Tf 2,E

Lp Classes

203

for all f ∈ L2 (E). Similarly, a linear map T of L2 (E) onto l2 is an isometry if and only if f 2,E = Tf l2 for all f ∈ L2 (E). Theorem 8.33 another.

All spaces L2 (E) are linearly isometric with l2 , and so with one

Proof. For a given E, define a linear correspondence between L2 (E) and l2 2 by choosing a complete orthonormal system {φk } in L (E) and mapping an 2 f ∈ L (E) onto the sequence f , φk of its Fourier coefficients. This mapping is onto all of l2 by the Riesz–Fischer theorem and is an isometry by Theorem 8.31.

8.7 Hilbert Spaces A set H is called a Hilbert space over the complex numbers C if it satisfies the following three conditions: (H1 ) H is a vector space over C; that is, if f , g ∈ H and α ∈ C, then f + g ∈ H and αf ∈ H. The zero element of H will be denoted by 0. (H2 ) For every pair f , g ∈ H, there is a complex number (f , g), called the inner product of f and g, which satisfies (a) (g, f ) = (f , g),

(b) f1 + f2 , g = f1 , g + f2 , g , (c) (αf , g) = α(f , g) for α ∈ C, (d) (f , f ) ≥ 0, and (f , f ) = 0 if and only if f = 0.

Notice that (a), (b), and (c) imply that f , g1 + g2 = f1 g1 + f , g2 , (f , αg) = α (f , g), and (0, f ) = 0. Define f = (f , f )1/2 . Before stating the third condition, we claim that |(f , g)| ≤ f g

(Schwarz s inequality).

(8.34)

If g = 0, this is obvious. Otherwise, letting λ = −(f , g)/ g 2 , we obtain 0 ≤ (f + λg, f + λg) = f 2 − 2

|(f , g)|2 |(f , g)|2 |(f , g)|2 + = f 2 − , 2 2 g g g 2

204

Measure and Integral: An Introduction to Real Analysis

and Schwarz’s inequality follows at once. This simple proof also shows that if equality holds in (8.34) and ||g|| = 0, then f is a constant multiple of g, namely, f = −λg =

(f , g) g. (g, g)

We will show that · is a norm on H by proving the triangle inequality. In fact, f + g 2 = (f + g, f + g) = f 2 + 2 Re (f , g) + g 2 . Since |Re (f , g)| ≤ |(f , g)| ≤ f g , it follows that the right side is at most ( f + g )2 . Taking square roots, we obtain f + g ≤ f + g , as desired. Hence, H is a normed linear space. We also require (H3 ) H is complete with respect to · . In particular, a Hilbert space is a Banach space. As for L2 spaces, a linear map T of a Hilbert space H onto a Hilbert space

H is an isometry if and only if f H = Tf H for all f ∈ H. A Hilbert space is called infinite dimensional if it cannot be spanned by a finite number of elements; hence, an infinite dimensional Hilbert space has an infinite linearly independent subset. The space L2 with inner product (f , g) = f g and the space l2 with (c, d) = k ck dk are examples of separable infinite dimensional Hilbert spaces. In fact, there are essentially no other examples, as the following theorem shows.

Theorem 8.35 All separable infinite dimensional Hilbert spaces are linearly isometric with l2 and so with one another. Proof. The proof is a repetition of the ideas leading to Theorem 8.33, so we shall brief. Let H be a separable infinite dimensional Hilbert space, and be let e k be a countable dense subset. Discarding those e k that are spanned by other e i , we obtain a linearly independent set {ek } with the same dense span as ek . Since H is infinite dimensional, {ek } is infinite. Using the Gram–Schmidt process, we may assume that {ek } is orthonormal: (ei , ek ) = 0 for i = k and ek = 1 for all k. It follows that {ek } is complete; in fact, if f , ek = 0 for all k, then

2 N N

ak ek = f 2 + |ak |2 ≥ f 2

f −

k=1

k=1

Lp Classes

205

for N = 1, 2, . . . . If f were not zero, the span of the ek could not be dense. Hence, f = 0, which shows that H has a complete orthonormal system {ek }. Next, we will show that Bessel’s inequality and an analogue of the

Riesz–Fischer theorem hold for {ek }. Let f ∈ H and ck = f , ek . Then

2

N N

|ck |2 . 0 ≤ f − ck ek = f 2 −

k=1

k=1

Letting N → ∞, we obtain Bessel’s inequality

|ck |2

1/2

≤ f . In particu-

lar, {ck } belongs to To derive the Riesz–Fischer theorem, let {γk } be a sequence in l2 and set tN = N k=1 γk ek . Then l2 .

tM − tN 2 =

N

|γk |2 → 0

as M, N → ∞, M < N.

k=M+1

Since H is complete, there is a g ∈ H such that g − tN → 0. We have

g, ek = g − tN , ek + (tN , ek ) = g − tN , ek + γk

(k ≤ N).

Letting N → ∞, it follows from Schwarz’s inequality that g, ek = γk . Hence,

2

= g 2 − N |γk |2 . Letting N → ∞ in the tN = N k=1 g, ek ek and g − tN k=1 1/2 |γk |2 . last equation, we see that g satisfies Parseval’s formula g = This gives the analogue of the Riesz–Fischer theorem.

Now, let f ∈ H and set ck = f , ek . Choose {γk } = {ck } in the version

of the Riesz–Fischer theorem just derived, and let g ∈ H satisfy g, ek = ck 1/2 |ck |2 and ||g|| = . We see by the completeness of {ek } that g = f , so 1/2 |ck |2 that Parseval’s formula holds: f = . The fact that H is linearly isometric with l2 now follows as in the proof of Theorem 8.33.

Exercises 1. For complex-valued measurable f ,f = f1 + if2 with f1 and f2 real-valued f = by definition. Prove that and measurable, we have E f1 + i E f2 E f ≤ f is finite if and only if | f | is finite, and | f |. (Note that E E E E

206

Measure and Integral: An Introduction to Real Analysis

1/2 2 1/2 2 f = , and use the fact that a2 + b2 = E E f1 + E f2 2

1/2 a cos α+b sin α for an appropriate α, while a + b2 ≥ |a cos α+b sin α| for all α.)

2. Prove the converse of Hölder’s inequality for p = 1 and ∞. Show also that for 1 ≤ p ≤ ∞, a real-valued measurable f belongs to Lp (E) if fg ∈ L1 (E)

for every g ∈ Lp (E), 1/p + 1/p = 1. The negation is also of interest: if

f ∈ / Lp (E), then there exists g ∈ Lp (E) such that fg ∈ / L1 (E). (To verify the negation, construct g of the form ak gk for appropriate ak and gk , with gk satisfying E fgk → +∞.) 3. Prove Theorems 8.12 and 8.13. Show that Minkowski’s inequality for series fails when p < 1. 4. Let f and g be real-valued and not identically 0 (i.e., neither function equals 0 a.e.), and let 1 < p < ∞. Prove that equality holds in the inequality fg ≤ f p g p if and only if fg has constant sign a.e. and | f |p is a

multiple of |g|p a.e. If f + g p = f p + g p and g = 0 in Minkowski’s inequality, show that f is a multiple of g. Find analogues of these results for the spaces l p . 5. For 0 < p ≤ ∞ and 0 < |E| < +∞, define Np [f ] =

1 | f |p |E|

1/p ,

E

where N∞ [f ] means f ∞ . Prove that if p1 < p2 , then Np1 [f ] ≤ Np2 [f ]. Prove also that if 1 ≤ p ≤ ∞, then Np [f + g] ≤ Np [f ] + Np [g], (1/|E|) | fg| ≤ Np [f ]Np [g], 1/p + 1/p = 1, and that limp→∞ Np [f ] = f ∞ . E Thus, Np behaves like · p but has the advantage of being monotone in p. Recall Exercise 28 of Chapter 5. 6. (a) Let 1 ≤ pi , r ≤ ∞ and ki=1 p1 = 1r . Prove the following generalizai tion of Hölder’s inequality:

f 1 · · · f k ≤ f 1 · · · f k . r p p 1

k

(See also Exercise 12 of Chapter 7.) (b) Let 1 ≤ p < r < q ≤ ∞ and define θ ∈ (0, 1) by the interpolation estimate

1 r

=

1−θ f r ≤ f θ . p f q

In particular, if A = max{ f p , f q }, then f r ≤ A.

θ p

+

1−θ q .

Prove

Lp Classes

207

7. Show that when 0 < p < 1, the neighborhoods { f : f p < ε} of zero in Lp (0, 1) are not convex. (Let f = χ(0,εp ) , and g = χ(εp ,2εp ) . Show that f p = g p = ε, but that 12 f + 12 g p > ε.) 8. Prove the following integral version of Minkowski’s inequality for 1 ≤ p < ∞ and a measurable function f (x, y):

p 1/p 1/p |f (x, y)| dx dy ≤ | f (x, y)|p dy dx.

(For 1 < p < ∞, note that the pth power of the left-hand side equals f (z, y) dz p−1 | f (x, y)| dx dy. Integrate first with respect to y and apply Hölder’s inequality.) 9. If f is real-valued and measurable on E, |E| > 0, define its essential infimum on E by ess inf f = sup{α : |{x ∈ E : f (x) < α}| = 0}. E

If f ≥ 0, show that essE inf f = (essE sup 1/f )−1 . 10. Prove that L∞ (E) is not separable for any E with |E| > 0. (Construct a sequence of decreasing subsets of E whose measures strictly decrease. Consider the characteristic functions of the class of sets obtained by taking all possible unions of the differences of these subsets.)

11. If fk → f in Lp , 1 ≤ p < ∞, gk → g pointwise, and gk ∞ ≤ M for all k, prove that fk gk → fg in Lp .

12. Let f , fk ∈ Lp , 0 < p ≤ ∞. Show that if f − fk p → 0, then

fk → f p . Conversely, if fk → f a.e. and fk → f p , 0 < p < ∞, p p

show that f − fk p → 0. Show that the converse may fail for p = ∞. p p (For the converse when 0 < p < ∞, note that f − fk ≤ c(|f |p + fk ) with c = max 2p−1 , 1 ; then apply, for example, the sequential version of Lebesgue’s dominated convergence theorem given in Exercise 23 of Chapter 5.)

13. Suppose that fk → f a.e. and that fk , f ∈ Lp , 1 < p ≤ ∞. If fk p ≤ M <

+∞, show that fk g → fg for all g ∈ Lp , 1/p + 1/p = 1. Show that the result is false if p = 1. (When p > 1, use Egorov’s theorem in case the domain of integration has finite measure.) 14. Verify that the following systems are orthogonal: (a) 12 , cos x, sin x, . . . , cos kx, sin kx, . . . on any interval of length 2π. (b) e2πikx/(b−a) ; k = 0, ±1, ±2, . . . on (a, b).

208

Measure and Integral: An Introduction to Real Analysis

15. If f ∈ L2 (0, 2π), show that 2π

lim

k→∞

f (x) cos kx dx = lim

k→∞

0

2π

f (x) sin kx dx = 0.

0

Prove that the same is true if f ∈ L1 (0, 2π). (This last statement is the Riemann–Lebesgue lemma. To prove it, approximate f in L1 norm by L2 functions. See Theorem 12.21.) 16. A sequence fk in Lp is said to converge weakly in Lp to a function f

(belonging to Lp ) if fk g → fg for all g ∈ Lp . Prove that if fk → f in Lp norm, 1 ≤ p ≤ ∞, then fk converges weakly in Lp to f . Note by Exercise 15 that the converse is not true. See Exercise 28 of Chapter 10. 17. Suppose that fk , f ∈ L2 and that f k g → fg for all g ∈ L2 (i.e., fk converges weakly in L2 to f ). If fk 2 → f 2 , show that fk → f in L2 norm. The same is true for Lp , 1 < p < ∞, by a 1913 result of Radon. 18. Prove the parallelogram law for L2 : f + g 2 + f − g 2 = 2 f 2 + 2 g 2 . Is this true for Lp when p = 2? The geometric interpretation is that the sum of the squares of the lengths of the diagonals of a parallelogram equals the sum of the squares of the edge lengths. 19. Prove that a finite dimensional Hilbert space is isometric with Rn for some n. 20. Construct a function in L1 (−∞, +∞) that is not in L2 (a, b) for any a < b. +∞ (Let g(x) = x−1/2 on (0,1) and g(x) = 0 elsewhere, so that −∞ g = 2. Consider the function f (x) = ak g (x − rk ), where {rk } is the rational numbers and {ak } satisfies ak > 0, ak < +∞.) 21. If f ∈ Lp (Rn ) , 0 < p < ∞, show that 1 | f (y) − f (x)|p dy = 0 a.e. Qx |Q| lim

Q

Note by Exercise 5 that if this condition holds for a given p, then it also holds for all smaller p. let m = {mk } be 22. Let {φk } be a complete orthonormal system in L2 and 2, f ∼ a fixed bounded sequence of numbers. If f ∈ L ck φk , define Tf by Tf ∼ mk ck φk . Such an operator is called a Fourier multiplier operator. Show that T is bounded on L2 , that is, that there is a constant c

Lp Classes

209

independent of f such that ||Tf ||2 ≤ c||f ||2 for all f ∈ L2 . Show also that the smallest possible choice for c is ||m||l∞ . 23. Show that every of a separable metric space (M, d) is separa subset ble. (Let D = fk be a countable dense set in M, and for j = 1, 2, . . ., define Dj = f ∈ D : inf λ∈ d(λ, f ) < 1/j . If fk ∈ Dj , pick λk,j ∈ with

d λk,j , fk < 1/j and show that λk,j is dense in .) 24. Let E be a measurable set in Rn with 0 < |E| < ∞. Construct an orthog ∞ onal system φj j=0 in L2 (E) with φ0 = 1 everywhere in E. (Use Exercise 32 of Chapter 3 with θ = 1/2, and choose φj for j ≥ 1 to be appropriate simple functions with values ±1.) 25. If f is a measurable function on Rn , define f = supα>0 α|{|f | > α}|, and recall that f belongs to weak L1 (Rn ) if and only if f < ∞. Show that weak L1 (Rn ) has all the properties of a Banach space with respect to · except the triangle inequality. Show however that there is a constant κ > 1 such that the quasi-triangle inequality f +g ≤ κ(f +g) holds for all measurable f , g. (To show that κ cannot be 1, consider the case of one dimension and the functions f = χ[0,1/2] + 2χ(1/2,1] , g = 2χ[0,1/2] + χ(1/2,1] .) 26. Show that lim inf p→∞ ||f ||Lp (E) ≥ ||f ||L∞ (E) even if |E| = ∞. 27. (a) Prove Minkowski’s inequality for infinite series:

∞ ∞

|fk | ≤ ||fk ||p ,

k=1

p

1 ≤ p ≤ ∞.

k=1

(b) Show that in part (a), the opposite inequality holds if 0 < p ≤ 1: ∞ k=1

∞

||fk ||p ≤ |fk | ,

k=1

0 < p ≤ 1.

p

fk is positive and finite a.e., multiply and (For (b), assuming that p divide each fk in the summation on the left side of the inequality by p(1−p) fk and apply Hölder’s inequality with exponents q = 1/p and q = 1/(1 − p).) 28. Let φ(t) be a continuous function on [0, ∞) that is positive, increasing, and convex on (0, ∞) and that satisfies φ(0) = 0, limt→∞ φ(t) = ∞, and |φ(2t)| ≤ c|φ(t)| for some constant c independent of t. For exam

ple, the function φ(t) = t 1 + log+ t has these properties (see Exercise 25 of Chapter 7). If E is a measurable set in Rn , define the Orlicz space

210

Measure and Integral: An Introduction to Real Analysis

Lφ (E) to be the collection of all measurable f on E that are finite a.e. in E and satisfy φ(|f |) ∈ L(E). Show that Lφ (E) is a Banach space with norm ||f ||Lφ (E) = inf λ > 0 :

E

|f | φ λ

dx ≤ 1 .

In case φ(t) = t 1 + log+ t , the class is often denoted L log L(E) (see Exercise 22 of Chapter 9 for a result about functions in L log L (Rn )). 29. Let 1 ≤ p < ∞ and f ∈ Lp (Rn ). Show that ||f (x + h) − f (x)||p (where the norm is taken with respect to x) is a uniformly continuous function of h. Is the same true when 0 < p < 1? 30. Let 1 ≤ p < ∞ and E be a measurable set in Rn . (a) Prove that if f1 , f2 , g1 , g2 are nonnegative and measurable on E, then

f1 + f2

p

1/p

p dx + g1 + g2

E

p p f1 + g1 dx ≤

1/p

+

E

p p f2 + g2 dx

1/p

.

E

(b) If the right side in part (a) is replaced by

E

p f1

p + f2

1/p

dx

p p g1 + g2 dx +

1/p

,

E

is the resulting inequality true? ∞ (c) If fi,j i,j=1 are measurable functions on E, show that ⎡ ⎡ ⎛ ⎞ ⎤1/p p ⎤1/p p fi,j (x) dx⎦ ≤ fi,j (x) ⎠ dx⎦ . ⎣ ⎣ ⎝ E

j

i

i

E

j

For i = 1, . . . , N (N finite), consider the sequences Fi = fi,j j and p 1/p N . note that || N i=1 Fi ||l p = i=1 |fi,j | j 31. Let a = {ak } be a sequence of real or complex numbers. Show that ||a||p ≤ ||a||1 if 1 ≤ p ≤ ∞ and more generally that ||a||p ≤ ||a||q if 0 < q ≤ p ≤ ∞. (If 1 ≤ p < ∞, the inequality |a1 |p + |a2 |p ≤ (|a1 | + |a2 |)p may be used together with an induction argument.)

Lp Classes

211

32. For nonnegative measurable functions f and g on (0, ∞), let ∞ x dy g(y) , F(x) = f y y

x ∈ (0, ∞).

0

Also, if 1 ≤ p < ∞, set ⎛

⎞1/p dx [f ]p = ⎝ f (x)p ⎠ , x ∞ 0

and if p = ∞, define [f ]∞ = ess sup f . Prove that for 1 ≤ p ≤ ∞, [F]p ≤ [f ]1 [g]p .

(0,∞)

9 Approximations of the Identity and Maximal Functions

9.1 Convolutions The convolution of two functions f and g that are measurable in Rn is defined by (f ∗ g)(x) = f (t)g(x − t) dt, x ∈ Rn , Rn

provided the integral exists. In Theorem 6.14, we saw that if f , g ∈ L1 (Rn ), then f ∗ g exists a.e. and is measurable in Rn , and f ∗ g1 ≤ f 1 g1 . Moreover, according to Corollary 6.16, f ∗ g1 = f 1 g1 if f and g are nonnegative and measurable. In this section, we will study some additional properties of convolutions, beginning with the following theorem.

Theorem 9.1 Lp (Rn ) and

Let 1 ≤ p ≤ ∞, f ∈ Lp (Rn ) and g ∈ L1 (Rn ). Then f ∗ g ∈ f ∗ gp ≤ f p g1 .

Proof. We may suppose that 1 < p ≤ ∞, since when p = 1, the result is just Theorem 6.14. Let us first prove the result in case f and g are nonnegative. Then f ∗ g is nonnegative and measurable on Rn by Corollary 6.16. If p = ∞, (f ∗ g)(x) ≤ f ∞ g(x − t) dt = f ∞ g(x − t) dt = f ∞ g1 . Rn

Rn

Therefore, f ∗ g∞ ≤ f ∞ g1 , as claimed. If 1 < p < ∞, we write 1 1 + = 1. f (t)g(x − t)1/p g(x − t)1/p dt, (f ∗ g)(x) = p p n R

213

214

Measure and Integral: An Introduction to Real Analysis

By Hölder’s inequality with exponents p and p , (f ∗ g)(x) ≤

1/p

p

f (t) g(x − t) dt

Rn

1/p g(x − t) dt

Rn 1/p

= (f p ∗ g)(x)1/p g1 . Now raise the first and last terms in this inequality to the pth power and p integrate the result. Since Rn (f p ∗ g) dx = f p g1 by Corollary 6.16, we obtain p

p

1 + (p/p )

f ∗ gp ≤ f p g1

p

p

= f p g1 .

The theorem follows for f , g ≥ 0 by taking pth roots. For general f ∈ Lp (Rn ) and g ∈ L1 (Rn ), let us first show that f ∗ g exists a.e. and is measurable. By the case already considered, we have | f | ∗ |g| ∈ Lp (Rn ). Hence, | f | ∗ |g| < ∞ a.e., so that f (x − t)g(t) ∈ L1 (dt) for a.e. x. Consequently, f ∗ g exists and is finite a.e. To show it is measurable, define fN = f χ{|x| 0, let Kε (x) = ε−n K

x ε

= ε−n K

x

1

ε

,...,

xn . ε

For example, if K(x) = χ{|x|δ |Kε | → 0 as ε → 0, for any fixed δ > 0. Proof. Part (i) follows immediately from the change of variables y = x/ε (see Exercise 20 of Chapter 5). For part (ii), fix δ > 0, and let y = x/ε. Then |x|>δ

|Kε (x)| dx = ε−n

x dx = K ε

|x|>δ

|K(y)| dy.

|y|>δ/ε

Since K ∈ L and δ/ε → +∞ as ε → 0, it follows that the last integral tends to zero as ε → 0. This completes the proof. Note that for K ≥ 0, property (i) means that the areas under the graphs of K and Kε are the same, while (ii) means that for small ε, the bulk of the area under the graph of Kε is concentrated in the region above a small neighborhood of the origin. For any K ∈ L, we can expect from (ii) that the effect of letting ε → 0 in the formula (f ∗ Kε )(x) = f (t)Kε (x − t)dt will be to emphasize the values of f (t)

218

Measure and Integral: An Introduction to Real Analysis

when t is near x. As a simple illustration, let Qε (x) denote the cube in Rn of edge length ε centered at x and consider the kernel k(t) = χQ1 (0) (t). Then kε (x − t) = ε−n k((x − t)/ε) = ε−n χQε (x) (t), and 1 1 f ∗ kε (x) = n f (t) dt = f (t) dt. ε |Qε (x)| Qε (x)

Qε (x)

In this case, the Lebesgue Differentiation Theorem 7.2 implies that (f ∗ kε ) (x) → f (x) a.e as ε → 0 if f is locally integrable. The next four theorems show that (f ∗Kε )(x) → f (x) in various senses (e.g., in norm or pointwise) as ε → 0 if K is suitably restricted. A family {Kε : ε > 0} of kernels for which f ∗ Kε → f in some sense is called an approximation of the identity. See also Exercise 23(a). In what follows, we shall use the notation fε (x) for the convolution (f ∗ Kε )(x). Theorem 9.6 Let fε = f ∗ Kε , where K ∈ L1 (Rn ) and 1 ≤ p < ∞, then

Rn

K = 1. If f ∈ Lp (Rn ),

fε − f p → 0 as ε → 0. Proof. By Lemma 9.5(i), f (x) = f (x)

Rn

Kε (t) dt =

f (x)Kε (t) dt.

Rn

Therefore, | fε (x) − f (x)| = [ f (x − t) − f (x)] Kε (t) dt n R

≤

Rn

| f (x − t) − f (x)||Kε (t)|1/p |Kε (t)|1/p dt,

219

Approximations of the Identity and Maximal Functions

where 1/p + 1/p = 1 (1/p = 0 if p = 1). Applying Hölder’s inequality with exponents p and p and then raising both sides to the pth power and integrating with respect to x, we obtain

|fε (x) − f (x)|p dx

Rn

≤

Rn

=

p

|f (x − t) − f (x)| |Kε (t)| dt

Rn

p/p K1

Rn

p/p |Kε (t)| dt

Rn

dx

p

| f (x − t) − f (x)| |Kε (t)| dt dx.

Rn

Changing the order of integration in the last expression (which is justified since the integrand is nonnegative), we obtain p/p

p

fε − f p ≤ K1

|Kε (t)|φ(t) dt,

Rn

where φ(t) =

p

Rn

| f (x − t) − f (x)|p dx = f (x − t) − f (x)p . For δ > 0, write

Iε =

|Kε (t)|φ(t) dt =

|t| 0, we can choose δ so small that φ(t) < η if |t| < δ (note that φ(t) → 0 as |t| → 0 by Theorem 8.19). Then Aε,δ ≤ η

|Kε (t)| dt ≤ ηK1

|t| 0, write f = g + h where g has compact support and ||h||p < η. Choose a kernel K ∈ C∞ 0 with Rn K = 1, and

220

Measure and Integral: An Introduction to Real Analysis

let gε = g ∗ Kε . Then gε ∈ C∞ 0 and, by Theorem 9.6, g − gε p → 0. By Minkowski’s inequality, f − gε p ≤ g − gε p + hp < g − gε p + η. Choosing ε so that g − gε p < η, we obtain f − gε p < 2η, and the corollary follows. The next result is a substitute for Theorem 9.6 in case f ∈ L∞ . Theorem 9.8 Let fε = f ∗ Kε , where K ∈ L1 (Rn ) and Rn K = 1. If f ∈ L∞ (Rn ), then fε → f as ε → 0 at every point of continuity of f , and the convergence is uniform on any set where f is uniformly continuous. Proof. Note that for every ε > 0, fε (x) converges absolutely for all x since f ∈ L∞ and K ∈ L1 . As before, | fε (x) − f (x)| ≤

|f (x − t) − f (x)||Kε (t)| dt.

Rn

If f is continuous at x, then given η > 0, there exists δ > 0 such that | f (x − t) − f (x)| < η if |t| < δ. Hence, Rn

| f (x − t) − f (x)||Kε (t)| dt ≤ η

|Kε (t)| dt + 2 f ∞

|t| 0, choose δ > 0 such that | f (x − t) − f (x)| < η if |t| < δ. Note that fε (x) converges absolutely for all x since f ∈ L1 and K ∈ L∞ . As usual, | fε (x) − f (x)| ≤ η

|Kε (t)| dt +

|t| 0.

(9.10)

Pε is called the Poisson kernel, and the convolution fε (x) = (f ∗ Pε )(x) =

+∞ 1 ε f (t) 2 dt π −∞ ε + (x − t)2

is called the Poisson integral of f. Setting ε = y and letting f (x, y) = fy (x), we obtain a function f (x, y) defined in the upper half plane {(x, y) : −∞ < x < +∞, y > 0}. Notice that y/(y2 + x2 ) is the imaginary part of −1/z, z = x + iy, and so is harmonic in the upper half-plane; that is, Py (x) satisfies Laplace’s equation

∂2 ∂2 Py (x) = 0 if y > 0. + ∂x2 ∂y2

We leave it as an exercise to show that if f ∈ Lp , 1 ≤ p ≤ ∞, then

∂2 ∂2 + 2 2 ∂x ∂y

+∞

∂2 ∂2 f (t) + 2 f (x, y) = 2 ∂x ∂y −∞

Py (x − t) dt,

so that (∂ 2 /∂x2 + ∂ 2 /∂y2 )f (x, y) = 0 for y > 0. Hence, f (x, y) is also harmonic in the upper half-plane. If f is integrable on (−∞, +∞), it follows from Theorem 9.9 that f (x, y) → f (x) as y → 0 wherever f is continuous. Thus, f (x, y) solves the Dirichlet problem for the upper half-plane; that is, if f (x) is continuous and integrable on (−∞, +∞), then f (x, y) defines a function that is harmonic in the upper half-plane and that tends to f (x) as y → 0. See also Exercises 15, 16. The Fejér kernel. Let 1 K(x) = π

sin x x

2

[x ∈ (−∞, +∞)].

Then K satisfies the same conditions as P(x) in the previous example, and Kε (x) = (1/π)[ε sin2 (x/ε)/x2 ]. Setting w = 1/ε, w > 0, we obtain the Fejér kernel F(x, w) =

1 sin2 wx . π wx2

(9.11)

Approximations of the Identity and Maximal Functions

223

If f εL(−∞, +∞) and if f is continuous at x, then by Theorem 9.9, +∞ 1 sin2 wt lim f (x − t) dt = f (x). w→+∞ π wt2 −∞

The Gauss–Weierstrass kernel. The function 1 2 K(x) = √ e−x π

[x ∈ (−∞, +∞)]

also satisfies all the required conditions (see Exercise 11 of Chapter 6). √ 2 2 √ Here, Kε (x) = (1/ πε)e−x /ε , and letting ε = y, y > 0, we obtain the Gauss–Weierstrass kernel 1 2 W(x, y) = √ e−x /y . πy

(9.12)

The convolution +∞ 1 2 f (x − t)e−t /y dt Wf (x, y) = f ∗ W(·, y) (x) = √ πy −∞

is called the Gauss–Weierstrass integral of f . If f is integrable on (−∞, +∞) and continuous at x, then lim Wf (x, y) = f (x).

y→0+

Notice that W(x, y) satisfies the heat equation ∂2 ∂ W=4 W ∂y ∂x2 in the upper half-plane, as does Wf (x, y) if f ∈ Lp (−∞, ∞) for some p, 1 ≤ p ≤ ∞. Higher dimensional versions of the Gauss–Weierstrass and Poisson kernels are discussed in Chapter 13. If we strengthen the condition K(x) = o(|x|−n ), |x| → + ∞, used in Theorem 9.9, we can obtain the convergence of fε to f almost everywhere. The following result is fairly typical of theorems of this kind. Its hypotheses are met by any of the three examples just listed.

224

Measure and Integral: An Introduction to Real Analysis

Theorem 9.13 Suppose that f ∈ L(Rn ), K is bounded, K(x) = O(|x|−n−λ ) as |x| → +∞ for some λ > 0, and Rn K = 1. If fε = f ∗ Kε , then fε → f at each point of the Lebesgue set of f . Proof. Let x0 be a point of the Lebesgue set of f (see (7.14)), so that ρ−n |x| 0, there is a δ > 0 such that F(ρ) < ηρn if ρ ≤ δ. Write

| f (x)|

Rn

ελ dx = + n+λ (ε + |x|) |x|≤δ

= A + B.

|x|>δ

Taking φ(ρ) = ελ /(ε + ρ)n+λ and [a, b] = [0, δ] in Lemma 9.14, we have A=

δ 0

ελ dF(ρ). (ε + ρ)n+λ

Integrating by parts and observing that F(0) = 0, we obtain ελ ελ F(δ) + (n + λ) F(ρ) dρ. n+λ (ε + δ) (ε + ρ)n+λ+1 δ

A=

0

226

Measure and Integral: An Introduction to Real Analysis

The first term on the right tends to zero as ε → 0. The definition of δ and the change of variables ρ = εt show that the second term is at most (n + λ)η

δ 0

ρn

δ/ε ελ tn dρ = (n + λ)η dt. n+λ+1 (ε + ρ) (1 + t)n+λ+1 0

Hence, lim sup A ≤ (n + λ)η ε→0

∞ 0

tn = c η, (1 + t)n+λ+1

where c (=cλ,n ) is finite since λ > 0. Finally, to estimate B, note that if |x| > δ, then ε + |x| > δ, so that B≤

ελ δn+λ

| f (x)| dx ≤

|x|>δ

ελ δn+λ

f 1 .

Hence, limε→0 B = 0. Combining these estimates, we obtain lim supε→0 (A + B) ≤ c η, and the theorem follows by letting η → 0. See Exercise 12 for the case f ∈ Lp , p > 1. The kernels {Kε } for K satisfying the kinds of conditions earlier are examples of approximations of the identity.

9.3 The Hardy–Littlewood Maximal Function Let f ∗ denote the Hardy–Littlewood maximal function of f : f ∗ (x) = sup

1 | f (y)| dy, |Q| Q

where the supremum is taken over all cubes Q with center x and edges parallel to the coordinate axes (see (7.5)). If f ∈ Lp (Rn ) for some p ≥ 1, then f is locally integrable in Rn , and consequently, | f | ≤ f ∗ a.e. by Theorem 7.11. We observed on p. 136 in Section 7.2 that f ∗ is not integrable over Rn (unless f = 0 a.e.), but does satisfy the weak-type condition |{x ∈ Rn : f ∗ (x) > α}| ≤

c f 1 α

(α > 0),

(9.15)

Approximations of the Identity and Maximal Functions

227

where c depends only on n (Lemma 7.9). The behavior of f ∗ on the other Lp spaces, 1 < p ≤ ∞, turns out to be better. For example, it is clear from the definition of f ∗ that f ∗ (x) ≤ f ∞ for all x. Thus, f ∗ is bounded if f is, and f ∗ ∞ ≤ f ∞ . The following theorem describes the behavior of f ∗ when f ∈ Lp , p > 1.

Theorem 9.16

Let 1 < p ≤ ∞ and f ∈ Lp (Rn ). Then f ∗ ∈ Lp (Rn ) and f ∗ p ≤ c f p ,

where c depends only on n and p. Proof. Let f ∈ Lp (Rn ). We may assume that 1 < p < ∞ since the result is obvious with constant c = 1 when p = ∞. The idea is to obtain information for Lp by interpolating between the known results for L1 and L∞ . For α > 0, let ω(α) = x ∈ Rn : f ∗ (x) > α denote the distribution function of f ∗ . Fix α > 0 and define a function g by g(x) = f (x) when | f (x)| ≥ α/2 and g(x) = 0 otherwise. Note that g ∈ L1 (Rn ) since

g1 =

| f (x)| dx

{x∈Rn :|f (x)|≥α/2}

≤

| f (x)|

Rn

| f (x)| α/2

p−1

dx =

p−1 2 p f p < ∞. α

Also, the difference f − g ∈ L∞ (Rn ); in fact, f − g∞ ≤ α/2. Then since | f (x)| ≤ |g(x)| + α/2, f ∗ (x) ≤ sup

1 α α |g(y)| dy + = g∗ (x) + . |Q| 2 2 Q

In particular, α {x ∈ Rn : f ∗ (x) > α} ⊂ x ∈ Rn : g∗ (x) > , 2

228

Measure and Integral: An Introduction to Real Analysis

so that, by (9.15), α ω(α) ≤ x ∈ Rn : g∗ (x) > 2 2c 2c ≤ g1 = α α n

| f (x)| dx.

{x∈R :|f (x)|≥α/2}

∞ We have the formula Rn f ∗p dx = p 0 αp−1 ω(α) dα, which was stated on two occasions: Exercise 16 of Chapter 5, and Exercise 5 of Chapter 6. Hence,

f ∗p dx ≤ p

Rn

∞ 0

⎡ 2c αp−1 ⎣ α

⎤

|f (x)| dx⎦ dα.

{x∈Rn :| f (x)|≥α/2}

Interchanging the order of integration in the expression on the right (which is justified since the integrand is nonnegative), we obtain

f ∗p dx ≤ 2cp

Rn

⎛ | f (x)| ⎝

Rn

2|f(x)|

⎞ αp−2 dα⎠ dx.

0

Since p − 2 > −1 (i.e., p > 1), the inner integral equals (2| f (x)|)p−1 /(p − 1) a.e. (wherever f (x) is finite), so that Rn

f ∗p dx ≤

2p pc 2p pc p f p . | f (x)|p dx = p−1 n p−1 R

p

Taking pth roots, we see that f ∗ p ≤ Cp f p , where Cp = 2p pc/(p − 1). This completes the proof. Note that the constant Cp tends to +∞ as p → 1 and is bounded as p → ∞. The Hardy–Littlewood maximal function plays an important role in many parts of analysis concerned with operator theory and differentiation. It arose naturally in Chapter 7 in connection with Lebesgue’s differentiation theorem, and it will be used in the proof of Theorem 9.19 and frequently in later chapters. As another illustration of its usefulness, we have the following result. Theorem 9.17 Let K(x) be nonnegative and integrable on Rn and suppose that K(x) depends only on |x| and decreases as |x| increases (i.e., K(x) = φ(|x|), where φ(t), t > 0, is monotone decreasing). Then

229

Approximations of the Identity and Maximal Functions

sup |(f ∗ Kε )(x)| ≤ cf ∗ (x), ε>0

with c independent of f . In particular, for such kernels K, || sup |(f ∗ Kε )| ||p ≤ c|| f ||p ,

if 1 < p ≤ ∞,

ε>0

|{x : sup |(f ∗ Kε )(x)| > α}| ≤ ε>0

c || f ||1 , α > 0, α

if p = 1,

with c independent of f and α. Proof. We first remark that there is a constant c depending only on n such that

sup δ−n δ>0

| f (x − y)| dy ≤ cf ∗ (x).

|y| 0, Kε (y) > t}. Then E is a measurable subset of Rn+1 by Theorem 5.1, and Kε (y) =

K ε (y)

dt =

0

∞

χE (y, t) dt.

0

Hence, ⎡ ⎤ ∞ |(f ∗ Kε )(x)| = f (x − y)Kε (y) dy ≤ | f (x − y)| ⎣ χE (y, t) dt⎦ dy. n n R

R

0

Changing the order of integration in the last expression, we obtain |(f ∗ Kε )(x)| ≤

∞ 0

=

∞ 0

| f (x − y)|χE (y, t) dy dt

Rn

⎡ ⎣

⎤ | f (x − y)| dy⎦ dt.

{y:Kε (y)>t}

Let Et = {y : Kε (y) > t}, t > 0. Since K(y) depends only on |y| and decreases as |y| increases, Et is a ball with center 0 unless it is empty or the single point 0.

230

Measure and Integral: An Introduction to Real Analysis

In the t-integration in the last display, we may ignore any values of t such that |Et | = 0 since the inner integral is then zero. For all other t, Et is a ball, and by our earlier remark, the inner integral satisfies Et

⎡

⎤ 1 | f (x − y)| dy = |Et | ⎣ | f (x − y)| dy⎦ ≤ |Et | cf ∗ (x). |Et | Et

By combining estimates, we have |(f ∗ Kε )(x)| ≤

∞

|Et | cf ∗ (x) dt = cf ∗ (x)

0

∞

|Et | dt.

0

Finally, note that |Et | is the distribution function of Kε , so that ∞

|Et | dt = Kε 1 = K1 .

0

Therefore, |(f ∗ Kε )(x)| ≤ c||K||1 f ∗ (x), and the first statement of the theorem follows by taking the sup over ε > 0. The second statement is then a corollary of Theorem 9.16 if p > 1, and of (9.15) if p = 1. We leave it to the reader (Exercise 18) to show that Theorem 9.17 can also be derived from the formula in the conclusion of Lemma 9.14 (see, e.g., Exercise 17 and the proof of Theorem 12.61). In particular, for the kernel K(x) = 1/(1 + |x|n+λ ), λ > 0, Theorem 9.17 gives ελ sup f (x − y) n+λ dy ≤ cf ∗ (x), n+λ ε + |y| ε>0 n

(9.18)

R

a fact that will be used in the next section. Note that the conclusion of Theorem 9.17 is valid for any K that is majorized in absolute value by a kernel satisfying the hypothesis of Theorem 9.17. This includes any K satisfying the hypothesis of Theorem 9.13.

9.4 The Marcinkiewicz Integral We recall from Theorem 6.17, that if F is a closed subset of a bounded open interval (a, b) in R1 , and if δ(x) denotes the distance from x to F, then the Marcinkiewicz integral

Approximations of the Identity and Maximal Functions

Mλ (x) =

b a

δλ (y) dy |x − y|1+λ

231

(λ > 0)

is integrable over F. More generally, in Exercise 7 of Chapter 6, we considered the expression δλ (y)f (y) dy |x − y|1+λ 1

(λ > 0),

R

where f is nonnegative and integrable over the complement of F. If f = χ(a,b) , this reduces to Mλ (x). In case λ = 1, Mλ plays a role in proving Theorem 12.67. Now we consider Lp estimates for n-dimensional analogues, namely, for the integral Jλ (f )(x) =

δλ (y)f (y) dy |x − y|n+λ n

(x ∈ Rn ),

R

and the modified form Hλ (f )(x) =

Rn

δλ (y)f (y) dy. |x − y|n+λ + δ(x)n+λ

Here again, λ > 0, δ(x) denotes the distance from x to a closed set F ⊂ Rn and f is nonnegative and measurable on Rn . Notice that Hλ (f ) and Jλ (f ) are equal in F since δ is zero there. For the same reason, Jλ (f ) and Hλ (f ) are independent of the values of f on F. Therefore, we may assume for simplicity that f = 0 on F. We will prove in the next theorem that if f ∈ Lp (Rn − F), 1 ≤ p < ∞, then Hλ (f ) ∈ Lp (Rn ). This implies the basic fact that Jλ (f ) ∈ Lp (F). (In general, Jλ (f ) diverges outside F: see, e.g., Exercise 9 of Chapter 6.) For the proof, it will be convenient to consider one more modification of Jλ (f ), namely, Hλ (f )(x) =

Rn

δλ (y)f (y) dy. |x − y|n+λ + δ(y)n+λ

As we move from a point x to another point y, the distance from F does not increase by more than |x − y|. Hence, |δ(x) − δ(y)| ≤ |x − y|.

232

Measure and Integral: An Introduction to Real Analysis

It follows that δ(y) < |x − y| + δ(x), so that we have δn+λ (y) ≤ 2n+λ [|x − y|n+λ + δ(x)n+λ ], as well as a similar inequality with x and y interchanged. We immediately obtain that 2−n−λ−1 Hλ (f )(x) ≤ Hλ (f )(x) ≤ 2n+λ+1 Hλ (f )(x). Thus, inequalities for Hλ lead to ones for Hλ , but Hλ is easier to deal with. Theorem 9.19

If f ∈ Lp (Rn ), 1 ≤ p < ∞, and λ > 0, then Hλ (f ) ∈ Lp (Rn ) and Hλ (f )p ≤ c f p ,

where c is independent of f . In particular, Jλ (f )p,F ≤ c f p . Proof. Fix p, 1 ≤ p < ∞, and let g be any nonnegative function with gp ≤ 1, where 1/p + 1/p = 1. By interchanging the order of integration, we obtain

Hλ (f )(x)g(x) dx =

Rn

f (y)δλ (y)

Rn

Rn

g(x) dx dy. |x − y|n+λ + δ(y)n+λ

The outer integration on the right can be restricted to Rn −F without changing the value of the integral. However, if y ∈ Rn − F, then δ(y) > 0, and it follows from (9.18) that the inner integral on the right is bounded by cδ(y)−λ g∗ (y). Combining this estimate with Holder’s inequality and Theorem 9.16, we obtain Rn

Hλ (f )(x)g(x) dx ≤ c

f (y)g∗ (y) dy

Rn

≤ c f p g∗ p ≤ c1 f p gp ≤ c1 f p . By (8.9), the supremum of the left side for all such g is Hλ (f )p , so that Hλ (f )p ≤ c1 f p , and the theorem follows.

233

Approximations of the Identity and Maximal Functions

Exercises 1. Use Minkowski’s integral inequality (see Exercise 8 of Chapter 8) to prove Theorem 9.1 for 1 < p < ∞. 2. (a) Prove Young’s Theorem 9.2. (For f , g ≥ 0 and p, q, r < ∞, write

( f ∗ g)(x) =

f (t)p/r g(x − t)q/r · f (t)p(1/p−1/r) · g(x − t)q(1/q−1/r) dt,

and apply Hölder’s inequality for three functions (Exercise 6 of Chapter 8) with exponents r, p1 , and p2 , where 1/p1 = 1/p − 1/r, 1/p2 = 1/q − 1/r.) (b) Suppose that the conclusion of Theorem 9.2 holds for three indices p, q, r > 0. Show that 1/r = 1/p + 1/q − 1. (Apply the inequality in the conclusion to the dilated functions f (λx) and g(λx), λ > 0, and let λ vary. A similar method is used in Exercise 13 of Chapter 14.)

3. (a) Show that if f ∈ Lp (Rn ) and K ∈ Lp (Rn ), 1 ≤ p ≤ ∞, 1/p + 1/p = 1, then f ∗ K is bounded and continuous in Rn . (b) Sketch the (trapezoidal) graph of χI ∗ χJ where I and J are onedimensional intervals. Consider also the case when the two intervals are the same. 4. (a) Show that the function h defined by h(x) = e−1/x for x > 0 and h(x) = 0 for x ≤ 0 is in C∞ . 2

(b) Show that the function g(x) = h(x − a)h(b − x), a < b, is C∞ with support [a, b]. n (c) Construct a function in C∞ 0 (R ) whose support is a ball or an interval. 5. Let G and G1 be bounded open subsets of Rn such that G1 ⊂ G. Construct a function h ∈ C∞ 0 such that h = 1 in G1 and h = 0 outside G. (Choose an open G2 such that G1 ⊂ G2 , G2 ⊂ G. Let h = χG2 ∗ K for a K ∈ C∞ with suitably small support and K = 1.) 6. Prove Theorem 9.4. 7. Let f ∈ Lp (−∞, +∞), 1 ≤ p ≤ ∞. Show that the Poisson integral of f , f (x, y), is harmonic in the upper half-plane y > 0. (Show that ((∂ 2 /∂x2 ) + +∞ (∂ 2 /∂y2 ))f (x, y) = −∞ f (t)((∂ 2 /∂x2 ) + (∂ 2 /∂y2 ))Py (x − t) dt.)

8. (Schur’s lemma) For s, t ≥ 0, let K(s, t) satisfy K ≥ 0 and K(λs, λt) = ∞ λ−1 K(s, t) for all λ > 0, and suppose that 0 t−1/p K(1, t) dt = γ < +∞ for some p, 1 ≤ p ≤ ∞. For example, K(s, t) = 1/(s + t) has these properties. Show that if

234

Measure and Integral: An Introduction to Real Analysis

(Tf )(s) =

∞

f (t)K(s, t) dt (f ≥ 0),

0

then Tf p ≤ γ f p . (Note that K(s, t) = s−1 K(1, t/s), and therefore ∞ (Tf )(s) = 0 f (ts)K(1, t) dt. Now apply Minkowski’s integral inequality [see Exercise 8 of Chapter 8].) 9. (a) The maximal function is defined as f ∗ (x) = sup |Q|−1 Q | f |, where the supremum is taken over cubes Q with center x. Let f ∗∗ (x) be defined similarly, but with the supremum taken over all cubes Q containing x. Thus, f ∗ (x) ≤ f ∗∗ (x). Show that there is a positive constant c depending only on the dimension such that f ∗∗ (x) ≤ cf ∗ (x). (b) If f ∗∗ were instead defined to be sup |B|−1 B | f | where the supremum is taken over all balls B containing x, show that there are positive constants c1 and c2 depending only on the dimension so that c1 f ∗ (x) ≤ f ∗∗ (x) ≤ c2 f ∗ (x) for all x. 10. Let T : f → Tf be a function transformation that is sublinear; that is, T has the property that if Tf1 and Tf2 are defined, then so is T(f1 + f2 ), and |T(f1 + f2 )(x)| ≤ |(Tf1 )(x)| + |(Tf2 )(x)|. Suppose also that there are constants c1 and c2 such that T satisfies Tf ∞ ≤ c1 f ∞ and |{x : |(Tf )(x)| > α}| ≤ c2 α−1 f 1 , α > 0. Show that for 1 < p < ∞, there is a constant c3 such that Tf p ≤ c3 f p . This is a special case of an interpolation result due to Marcinkiewicz. (An example of such a T is the maximal function operator Tf = f ∗ , and the proof in the general case is like that for f ∗ .) 11. Generalize Theorem 9.6 as follows: Let fε = f ∗Kε , K ∈ L1 (Rn ) and Rn K = γ. If f ∈ Lp (Rn ), 1 ≤ p < ∞, show that fε − γf p → 0. Derive analogous results for Theorems 9.8, 9.9, and 9.13. (The case γ = 0 follows from the case γ = 1 by considering K(x)/γ.) 12. Show that the conclusions of Theorems 9.9 and 9.13 remain true if the assumption that f ∈ L1 is replaced by f ∈ Lp , p > 1. 13. Let f ∈ Lp (0, 1), 1 ≤ p < ∞, and for each k = 1, 2, . . . , define a function fk on (0,1) by letting Ik,j = {x : (j − 1)2−k ≤ x < j2−k }, j = 1, . . . , 2k , and setting fk (x) equal to |Ik,j |−1 Ik,j f for x ∈ Ik,j . Prove that fk → f in Lp (0, 1) norm. (Exercise 17 of Chapter 7 may be helpful for the case p = 1.) 14. Show that Theorem 9.6 and Corollary 9.7 fail for p = ∞. 15. Regarding the Dirichlet problem for the upper half-space, more can be said about the behavior of the Poisson integral f (x, y) of f (x) near a point of continuity of f (x). Prove that if f ∈ Lp (−∞, ∞), 1 ≤ p ≤ ∞, and f

Approximations of the Identity and Maximal Functions

235

is continuous at a point x0 , then f (x, y) → f (x0 ) as (x, y) approaches x0 unrestrictedly, that is, as x → x0 and y → 0, y > 0. 16. Let 1 ≤ p ≤ ∞, f ∈ Lp (−∞, ∞), and x0 be a Lebesgue point of f . Show that the Poisson integral f (x, y) of f converges nontangentially to f (x0 ), that is, show that for any γ > 0, f (x, y) → f (x0 ) as x → x0 and y → 0 with |x − x0 | ≤ γy. (Note that the Poisson kernel satisfies Py (t + z) ≤ Cγ Py (t) if |z| ≤ γy, with Cγ independent of y, t, z.) (See also Theorems 12.42 and 12.64.) 17. Prove that the conclusion of Lemma 9.14 holds without assuming φ is b continuous provided a φ(ρ) dF(ρ) exists. Show, for example, that the conclusion holds if φ is monotone and finite on [a, b]. (For the second b b part, recall from Theorem 2.21 that a φ dF exists if a F dφ does.) 18. Derive the first part of Theorem 9.17 by using the formula in the conclusion of Lemma 9.14 (even though the function φ in Theorem 9.17 is not assumed to be continuous). (Use Exercise 17. Note that if K(x) = φ(|x|) satisfies the hypothesis of Theorem 9.17, then φ(|x|) = o(|x|−n ) as |x| → 0 and as |x| → ∞). n 19. Let K(x) be a nonnegative, decreasing, integrable radial function on R (i.e., K satisfies the hypothesis of Theorem 9.17), and let K = 1. Use Theorem 9.17 to show that if 1 ≤ p ≤ ∞ and f ∈ Lp (Rn ), then f ∗Kε → f a.e. as ε → 0. (In case 1 ≤ p < ∞, a proof reminiscent of the proof of Lebesgue’s differentiation theorem can be constructed by applying Corollary 9.7.)

20. Show that the conclusion f ∗ Kε → f in Exercise 19 is valid at every Lebesgue point of f . (A proof based on integrating the formula in Lemma 9.14 by parts is possible; cf. Exercise 18.) 21. Let f and g be locally integrable functions on Rn and suppose that | f | ∗ |g| is finite a.e. in Rn . Prove that f ∗ g is measurable on Rn . (See the argument in the last part of the proof of Theorem 9.1 involving the truncated functions fN and also truncate g.) 22. As we know from Chapter 7, the maximal function f ∗ of an f ∈ L1 (Rn ) maynot be locally integrable. Show that if f satisfies the stronger condition Rn | f |(1 + log+ | f |) dx < ∞, then f ∗ ∈ L1 (E) for every measurable set E with |E| < ∞, and

f ∗ dx ≤ C |E| + | f | log+ | f | dx ,

E

Rn

with C independent of f and distribution function ∞E. (Let ω(α) denote γ the ∞ of f ∗ relative to E. Write 0 ω(α) dα = 0 + γ for γ > 0. The first integral on the right is bounded by γ|E|. In the second integral, use the estimate ω(α) ≤ Cα−1 {| f |>α/2} | f |; cf. the proof of Theorem 9.16.)

236

Measure and Integral: An Introduction to Real Analysis

23. (a) Show that there is no identity element for the convolution operation on L1 (Rn ), that is, there is no function k ∈ L1 (Rn ) such that f ∗ k = f a.e. for every f ∈ L1 (Rn ). (b) Show that if f , g ∈ L2 (Rn ), then f ∗ g belongs to L∞ (Rn ) but not necessarily to L2 (Rn ). See also Lemma 13.49.

10 Abstract Integration In the preceding chapters, we developed a theory of integration based on a theory of measurable sets. The notion of the measure of a set was in turn based on the primitive and classical notion of the measure (or volume) of an interval in Rn ; this led almost automatically by the process of covering to the notion of measure for more general sets. In this chapter, we follow an alternate approach. We will consider a family of sets and assume that they all have measures, that is, assume that with each member of the family, we can associate a nonnegative number satisfying elementary and natural requirements that justify calling it a measure. Starting with this assumption, we will develop a theory of integration that follows the pattern of Lebesgue integration. The advantage of this method is that it can be applied not only to Rn but also to general abstract spaces with much less geometric structure than Rn . Thus, it is important for applications. There are new questions that arise in the abstract setting, but many of the theorems and proofs are practically the same as those for Lebesgue measure in Rn . In such cases, we will usually refer to earlier chapters for proofs. It is natural to ask how we can construct such measures. One possible approach is to start with the more elementary notion of an outer measure in an abstract space and, as in the case of Rn discussed in Chapter 3, select a subclass of sets on which the outer measure has additional properties, qualifying it as a measure. This idea will be developed in Chapter 11.

10.1 Additive Set Functions and Measures Let S be a fixed set, and let be a σ-algebra of subsets of S ; that is, let satisfy the following: (a) S ∈ . (b) If E ∈ , then its complement CE (=S − E) ∈ (i.e., is closed under complements). (c) If Ek ∈ for k = 1, 2, . . . , then Ek ∈ (i.e., is closed under countable unions).

237

238

Measure and Integral: An Introduction to Real Analysis

It is easy to see that the definition is unchanged if condition (a) is replaced by the assumption that be nonempty; see also p. 49 in Section 3.2. Another widely used term for a σ-algebra is a countably additive family of sets. Immediate consequences of the definition are that the following sets belong to : (1) The empty set ∅ ( = CS ), (2) Ek if Ek ∈ , k = 1, 2, . . ., ∞ ∞ ∞ ∞ (3) lim sup Ek ( = m=1 k=m Ek ) and lim inf Ek ( = m=1 k=m Ek ) if each Ek ∈ , (4) E1 − E2 ( = E1 ∩ CE2 ) if E1 , E2 ∈ . We recall the basic fact that the collection of Lebesgue measurable subsets of Rn is a σ-algebra: see Theorem 3.20. In general, the elements E of a σ-algebra are called -measurable sets, or simply measurable sets if it is clear from context what is. If is a σ-algebra, then a real-valued function φ(E), E ∈ , is called an additive set function on if (i) φ(E) is finite for every E ∈ , (ii) φ( Ek ) = φ(Ek ) for every countable family {Ek } of disjoint sets in . Since Ek is independent of the order of the Ek ’s, the series in (ii) converges absolutely. We obtain a simple example of a set function by choosing to be the σ-algebra of all subsets of S and defining φ(E) = χE (x0 ) for a fixed x0 ∈ S . As another example, let be the collection of all Lebesgue measurable subsets of Rn , and define φ(E) = E f , where f ∈ L(Rn ). A function μ(E) defined for E in is called a measure on if (i) 0 ≤ μ(E) ≤ +∞, Ek ) = μ(Ek ) for every countable family {Ek } of disjoint sets in .

(ii) μ(

The choices μ ≡ 0 or μ ≡ +∞ are always possible, but of little interest. If μ is a measure on , then the triplet (S , , μ) is called a measure space. For example, Lebesgue measure together with the class of Lebesgue measurable subsets of Rn is a measure space. As another example, let S be any countable numbers. Let be the set, S = {xk }, and let {ak } be a sequence of nonnegative family of all subsets of S , and define δ(E) = akj if E = {xkj }. Then (S , , δ) is a measure space. Such a space is called a discrete measure space. The distinction between a measure and an additive set function is that a measure is nonnegative, but may be infinite, while an additive set function may take both positive and negative values, but is finite. Any nonnegative

239

Abstract Integration

additive set function is a finite measure and vice versa. There are similarities between many of the properties of set functions and measures. If E1 ⊂ E2 and μ is a measure, then μ(E2 − E1 ) + μ(E1 ) = μ(E2 ), so that μ(E2 − E1 ) = μ(E2 )−μ(E1 ) if μ(E1 ) is finite. If φ is an additive set function, then the formula φ(E2 −E1 ) = φ(E2 )−φ(E1 ), E1 ⊂ E2 , always holds. Choosing E1 = E2 , we see that φ(∅) = 0 for an additive set function, and also μ(∅) = 0 for a measure, unless μ(E) = +∞ for all E. Moreover, if E1 ⊂ E2 , then μ(E1 ) ≤ μ(E2 ) even if μ(E1 ) = +∞, and φ(E1 ) ≤ φ(E2 ) if φ ≥ 0. The next few results concern limit properties and a basic decomposition for additive set functions. Both S and are fixed. Theorem 10.1 If {Ek } is a monotone sequence of sets in (i.e., Ek E or Ek E) and φ is an additive set function, then φ(E) = limk→∞ φ(Ek ). Proof. If Ek E, then E = disjointness,

φ(E) = φ(E1 ) +

∞

Ek = E1 ∪ (E2 − E1 ) ∪ (E3 − E2 ) ∪ · · · . Hence, by

φ(Ek − Ek−1 )

k=2

= φ(E1 ) + lim

N→∞

N

[φ(Ek ) − φ(Ek−1 )] = lim φ(EN ). N→∞

k=2

On the other hand, if Ek E, then S − Ek S − E. Therefore, by the case already considered, we have φ(S − Ek ) → φ(S − E). Since φ(S − Ek ) = φ(S ) − φ(Ek ) and φ(S − E) = φ(S ) − φ(E), the result follows. The next theorem is similar to Fatou’s lemma (Theorem 5.17). Theorem 10.2 Let φ be a nonnegative additive set function, and let {Ek } be any sequence of sets in . Then φ(lim inf Ek ) ≤ lim inf φ(Ek ) ≤ lim sup φ(Ek ) ≤ φ(lim sup Ek ). k→∞

k→∞

Proof. The sets Hm = ∞ k=m Ek increase to lim inf Ek . Therefore, by the preceding theorem, φ(lim inf Ek ) = lim φ(Hm ). Since Hm ⊂ Em and φ ≥ 0, we have φ(Hm ) ≤ φ(Em ) and lim φ(Hm ) ≤ lim inf φ(Em ). Therefore, φ(lim inf Ek ) ≤ lim inf φ(Em ), which proves the first inequality. The proof of the third one is similar, and the second is obvious.

240

Measure and Integral: An Introduction to Real Analysis

If E ∈ , the collection of sets E∩A as A ranges over forms a σ-algebra of subsets of E. In fact, is just the collection of all -measurable subsets of E. If ψ is an additive set function on , then its restriction to is additive on . On the other hand, if φ is an additive set function on , then the function defined by ψ(A) = φ(A ∩ E) is additive on . Now, let φ be an additive set function on the measurable subsets of a set E ∈ , and define V(E) = V(E; φ) = sup φ(A),

V(E) = V(E; φ) = − inf φ(A), A⊂E A∈

A⊂E A∈

V(E) = V(E; φ) = V(E) + V(E)

(10.3)

to be the upper, lower, and total variation of φ on E, respectively. Note that all three are nonnegative since φ(∅) = 0. Moreover, as is easy to see from the definitions, −V(E) ≤ φ(A) ≤ V(E)

if A ⊂ E and A ∈ ,

and therefore, supA⊂E,A∈ |φ(A)| ≤ V(E). In fact, sup |φ(A)| ≤ V(E) ≤ 2 sup |φ(A)|. A⊂E A∈

A⊂E A∈

Also, each variation is monotone increasing with E; that is, if E1 ⊂ E2 , then V(E1 ) ≤ V(E2 ), etc. of all Lebesgue measurable sets Rn and φ(E) in −= In case is the collection n + f for some fixed f ∈ L(R ), it is easy to see that V(E) = f , V(E) = E E Ef , and V(E) = E | f |; consequently, V, V, and V are also additive set functions. More generally, we will show that if φ is any additive set function on a σalgebra , then V, V, and V are also additive set functions on . A simple corollary of the finiteness of V is that an additive set function φ is not only finite but also bounded. The first step in proving that the three variations are additive is the following lemma.

Lemma 10.4 If φ is an additive set function on , then each of its three variations is countably subadditive; that is, if Ek ∈ , k = 1, 2, . . . , then V

with similar formulas for V and V.

V(Ek ), Ek ≤

241

Abstract Integration

Proof. Let H H2 = E2 − E1 , H3 = E3 − E2 − E1 , . . . . Then 1 = E1 , the Hk are Hk ) and disjoint and Ek = Hk . If A ∈ and A ⊂ Ek , then A = (A ∩ φ(A) = φ(A ∩ Hk ). Therefore, since A ∩ Hk ⊂ Ek , we have φ(A) ≤ V(Ek ). Hence,

V

Ek =

sup

A⊂

Ek ,A∈

φ(A) ≤

V(Ek ),

which proves the result for V. The proof for V is similar, and the result for V follows by adding.

Lemma 10.5 If φ is an additive set function on , then its variations V(E), V(E), and V(E) are finite for every E ∈ . Proof. It is enough to show the result for V. Suppose that V(E) = +∞ for some E. We claim that there would then exist sets Ek ∈ , k = 1, 2, . . . , such that Ek and both V(Ek ) = +∞ and |φ(Ek )| ≥ k − 1. To see this, we argue by induction. Let E1 = E, and suppose that E1 ⊃ E2 ⊃ · · · ⊃ EN have been constructed with |φ(Ek )| ≥ k − 1 and V(Ek ) = +∞ for k = 1, . . . , N. Since V(EN ) = +∞, there exists A ∈ such that A ⊂ EN and |φ(A)| ≥ |φ(EN )| + N. If V(A) = +∞, let EN+1 = A, noting that |φ(A)| ≥ N. If V(A) < +∞, let EN+1 = EN −A. Then V(EN+1 ) = +∞ since by Lemma 10.4, we have V(EN ) ≤ V(EN+1 ) + V(A). Furthermore, |φ(EN+1 )| = |φ(EN ) − φ(A)| ≥ |φ(A)| − |φ(EN )| ≥ N. This establishes the existence of sets Ek with the desired properties. Thus, by Theorem 10.1, we obtain |φ( Ek )| = lim |φ(Ek )| = +∞, contradicting the finiteness of φ and completing the proof of the lemma. The final step in proving that the variations are additive set functions is given in the next lemma. Lemma 10.6 If φ is an additiveset function on and {Ek } is a sequence of disjoint sets in , then V( Ek ) = V(Ek ). Similar formulas hold for V and V. Proof. By Lemma 10.4, we have V( Ek ) ≤ V(Ek ). To show the opposite inequality, given ε > 0, choose Ak ⊂ Ek with V(Ek ) ≤ φ(Ak ) + ε2−k . This is

242

Measure and Integral: An Introduction to Real Analysis

possible since V(Ek ) is finite by the previous lemma. Since the Ek are disjoint so are the Ak , and we obtain

V(Ek ) ≤

φ(Ak ) + ε = φ Ak + ε ≤ V Ek + ε.

Since ε is an arbitrary positive number, the result for V follows. The analogous formula for V is proved similarly, and the one for V follows by adding. Combining Lemmas 10.5 and 10.6, we immediately obtain the next theorem.

Theorem 10.7 V, and V.

If φ is an additive set function on , then so are its variations V,

The result that follows is basic and gives a decomposition of an additive set function into the difference of two nonnegative additive set functions. It may be compared to Theorem 2.6.

Theorem 10.8 (Jordan Decomposition) If φ is an additive set function on , then φ(E) = V(E) − V(E),

E ∈ .

Proof. If A ⊂ E and A ∈ , then φ(E) = φ(A)+φ(E − A). Choose measurable sets Ak ⊂ E with φ(Ak ) → V(E) as k → ∞. Then φ(E − Ak ) → φ(E) − V(E), and −V(E) ≤ φ(E) − V(E) since φ(E − Ak ) ≥ −V(E). If it were true that −V(E) < φ(E) − V(E), there would be a measurable set B ⊂ E with φ(B) < φ(E) − V(E), and consequently φ(E − B) > V(E), a contradiction. Hence, −V(E) = φ(E) − V(E) as claimed. A sequence {Ek } of sets is said to converge if lim sup Ek = lim inf Ek . Thus, {Ek } converges if each point that belongs to infinitely many Ek belongs to all Ek from some k on. For example, if either Ek E or Ek E, then {Ek } converges to E. If {Ek } is any sequence that converges, it is said to converge to the set E = lim sup Ek = lim inf Ek . As a simple corollary of the Jordan decomposition, we obtain the following result. Corollary 10.9 If a sequence of sets {Ek } of converges to E, and if φ is an additive set function on , then limk→∞ φ(Ek ) = φ(E).

243

Abstract Integration

Proof. If φ ≥ 0, we may apply Theorem 10.2. Since the extreme terms there both equal φ(E), it follows that all four equal φ(E). Hence, lim φ(Ek ) exists and equals φ(E). For arbitrary φ, the result therefore holds for V and V, and so, by the Jordan decomposition, for φ itself. Let (S , , μ) be a measure space. We have already observed that μ satisfies μ(E1 ) ≤ μ(E2 ) if E1 ⊂ E2 , E1 , E2 ∈ . Another basic property of μ is given in the next theorem. Theorem 10.10 Let (S , , μ) be a measure space, and let {Ek : k = 1, 2, . . .} be any sequence of measurable sets. Then μ Proof. Write

μ(Ek ). Ek ≤

Ek as a disjoint union as follows:

Ek = E1 ∪ (E2 − E1 ) ∪ (E3 − E2 − E1 ) ∪ · · · .

Then μ

Ek = μ(E1 ) + μ(E2 − E1 ) + μ(E3 − E2 − E1 ) + · · · ≤ μ(E1 ) + μ(E2 ) + μ(E3 ) + · · · = μ(Ek ),

which completes the proof. By definition, a measure is countably additive on disjoint measurable sets (cf. Theorem 3.23). The next result shows that it shares another basic property of Lebesgue measure (see Theorem 3.26).

Theorem 10.11 measurable sets.

Let (S , , μ) be a measure space, and let {Ek } be a sequence of

(i) If Ek E, then limk→∞ μ(Ek ) = μ(E). (ii) If Ek E and μ(Ek0 ) < +∞ for some k0 , then limk→∞ μ(Ek ) = μ(E). Proof. Suppose that Ek E. If μ(Ek ) < +∞ for all k, we may use the same argument used to prove the first part of Theorem 10.1. If μ(Ek ) = +∞ for some k, then lim μ(Ek ) = μ(E) = +∞. To prove the second part, we may assume that k0 = 1 and use the argument for Theorem 10.1 with S replaced by E1 .

244

Measure and Integral: An Introduction to Real Analysis

Corollary 10.12 Let (S , , μ) be a measure space and let {Ek : k = 1, 2, . . .} be a sequence of measurable sets. Then (i) μ(lim inf Ek ) ≤ lim inf k→∞ μ(Ek ). (ii) If μ( ∞ k0 Ek ) < +∞ for some k0 , then μ(lim sup Ek ) ≥ lim supk→∞ μ(Ek ). Proof. Part (ii) is an immediate corollary of Theorem 10.2. For part (i), let Am = ∞ k=m Ek , m = 1, 2, . . . . Then Am lim inf Ek , and by Theorem 10.11, μ(lim inf Ek ) = limm→∞ μ(Am ). Since Am ⊂ Em , we have μ(Am ) ≤ μ(Em ) and limm→∞ μ(Am ) ≤ lim inf m→∞ μ(Em ). The result follows by combining inequalities.

10.2 Measurable Functions and Integration We will now develop the notions of measurable functions and integration in a measure space. These will be used later in the chapter to prove several important results for set functions. Let be a fixed σ-algebra of subsets of S , and let f (x) be a real-valued function defined for x in a measurable set E. (As usual, f may take the values ±∞.) Then f is said to be -measurable, or simply measurable, if {x ∈ E : f (x) > a} is measurable for −∞ < a < +∞. We will state some familiar results whose proofs depend only on the fact that the class of measurable sets forms a σ-algebra. The proofs are therefore similar to those in Chapter 4 for Lebesgue measurable functions, and details are left to the reader. On the other hand, the proofs of some other results (such as the monotone convergence theorem) follow a pattern different from their analogues in Chapter 5 due to the lack of a geometric interpretation of the integral in the general setting.

Theorem 10.13 (i) If f and g are measurable on a set E ∈ , then so are f + g, cf for real c, φ(f ) if φ is continuous on R1 , f + , f − , | f |p for p > 0, fg, and 1/f if f = 0 in E. (ii) If {fk } are measurable on a set E ∈ , then so are supk fk , inf k fk , lim supk→∞ fk , lim inf k→∞ fk , and, if it exists, limk→∞ fk . N (iii) If f is a simple function taking values {vk }N k=1 on disjoint sets {Ek }k=1 , respectively, then f is measurable if and only if each Ek is measurable. In particular, χE is measurable if and only if E is.

(iv) If f is nonnegative and measurable on E ∈ , then there exist nonnegative, simple measurable fk f on E.

245

Abstract Integration

If (S , , μ) is a measure space, a measurable set E is said to have μ-measure zero, or measure zero, if μ(E) = 0. A property is said to hold almost everywhere in E with respect to μ, or a.e. (μ), if it holds in E except at most for a subset of measure zero. We have the following analogue of Egorov’s theorem.

Theorem 10.14 (Egorov’s Theorem) Let (S , , μ) be a measure space, and let E be a measurable set with μ(E) < +∞. Let {fk } be a sequence of measurable functions on E such that each fk is finite a.e. (μ) in E and {fk } converges a.e. (μ) in E to a finite limit. Then, given ε > 0, there is a measurable set A ⊂ E with μ(E − A) < ε such that {fk } converges uniformly on A. In general, we cannot choose A to be closed in Theorem 10.14; in fact, S has very little structure, and the notion of a closed set may not even be defined. The proof is similar to that for Lebesgue measure and is left as an exercise. Let f be nonnegative on a measurable set E. Define the integral of f over E with respect to μ by

f dμ = sup

j

E

[ inf f (x)] μ(Ej ), x∈Ej

f ≥ 0,

(10.15)

where the supremum is taken over all decompositions E = Ej of E into the union of a finite number of disjoint measurable sets Ej . We adopt the convention 0 · ∞ = ∞ · 0 = 0 for the terms of the sum in (10.15). By Theorem 5.8, the definition reduces to the usual Lebesgue integral in case S = Rn , is the class of Lebesgue measurable sets, μ is Lebesgue measure, and f is nonnegative and Lebesgue measurable. Although definition (10.15) does not require the measurability of f , many of the familiar properties of the integral are valid only for measurable functions. All functions considered in the rest of this section are assumed to be measurable.

Theorem 10.16 Let (S , , μ) be a measure space, and let f be a nonnegative, simple measurable function defined on a measurable set E. If f takes values v1 , . . . , vN on disjoint E1 , . . . , EN , then f dμ = vj μ(Ej ). E

Proof. Since f is measurable, each Ej is measurable by Theorem 10.13(iii). f dμ ≥ vj μ(Ej ). On the other hand, consider any decomposition Clearly, E E = Ak of E into a finite number of disjoint measurable sets, and let wk = inf Ak f . If Ak ∩ Ej is not empty, then wk ≤ vj . Therefore, by the additivity of μ,

246

Measure and Integral: An Introduction to Real Analysis

wk μ(Ak ) =

j

≤

k

vj

j

wk μ(Ak ∩ Ej )

μ(Ak ∩ Ej ) =

vj μ(Ej ).

k

Taking the supremum over all such decompositions gives which completes the proof.

Ef

dμ ≤

vj μ(Ej ),

Note that the previous theorem holds even if some of the vj are +∞. Theorem 10.17 Let (S , , μ) be a measure space, and let f and g be measurable functions defined on a set E ∈ .

g dμ. E (ii) If f ≥ 0 on E and μ(E) = 0, then E f dμ = 0. (i) If 0 ≤ f ≤ g on E, then

Ef

dμ ≤

Proof. Both parts follow immediately from the definition (10.15). For part (ii), note that μ(Ej ) = 0 whenever Ej ⊂ E and Ej ∈ . Hence, each term of the sum in (10.15) is zero. In order to further investigate the properties of the integral, we need the next two lemmas. In these and the results that follow, the measure space (S , , μ) is fixed.

Lemma 10.18 (i) If f and g are nonnegative, simple measurable functions on E, and if c is a (f + g) dμ = f dμ + g dμ and nonnegative constant, then E E E E cf dμ = c E f dμ.

(ii) If f is a nonnegative, simple measurable function on E, and E = E1 ∪ E2 is the union of two disjoint measurable sets, then E f dμ = E1 f dμ + E2 f dμ. The proof of the first part of (i) is like the first part of the proof of Theorem 5.14. The second part of (i) follows immediately from Theorem 10.16. Details and the proof of (ii) are left as an exercise. Lemma 10.19 Let fk , k = 1, 2, . . . , and g be nonnegative, simple measurable functions defined on a set E ∈ . If fk and limk→∞ fk ≥ g on E, then

247

Abstract Integration lim

k→∞

fk dμ ≥

E

g dμ.

E

Moreover, the conclusion remains true if some values of g are +∞. Proof. Suppose g takes finite values v1 , . . . , vm on disjoint sets E1 , . . . , Em . By Lemma 10.18(ii), it is enough to show that lim

k→∞

fk dμ ≥

Ej

g dμ

for each j.

Ej

We thus reduce the proof when g < +∞ to the case when g is constant on E, that is, g = v ≥ 0 on E. If v = 0, the result is obvious. Suppose then that 0 < v < +∞, and let 0 < ε < v and Ak = {x ∈ E : fk (x) ≥ v − ε}, k = 1, 2, . . . . Since fk , we have Ak E, so that μ(Ak ) → μ(E). Moreover, E

fk dμ ≥

fk dμ ≥ (v − ε)μ(Ak ).

Ak

Therefore, limk→∞ E fk dμ ≥ (v − ε)μ(E). Letting ε → 0 and observing that vμ(E) = E g dμ, we obtain the desired result. We leave it to the reader to check the case when some values of g are +∞. The next theorem is helpful in deriving properties of nonnegative f from those for simple f .

fdμ for arbitrary

Theorem 10.20 Let {fk } be a sequence of nonnegative, simple measurable functions defined on E ∈ . If fk f on E, then E fk dμ → E f dμ. Proof. Clearly, limk→∞ E fk dμ ≤ E f dμ. To show the opposite inequality, consider a partition E = Ej of E into a finite number of disjoint measurable vj μ(Ej ). The function g defined by g = sets Ej , and let vj = inf Ej f and σ = vj χEj is nonnegative and measurable, and E g dμ = σ. Since limk→∞ fk ≥ g, we have limk→∞ E fk dμ ≥ σ by Lemma 10.19. Taking the supremum of such σ over all partitions of E, we obtain the desired result. As a corollary, we have the following theorem.

Theorem 10.21 Let f and g be nonnegative measurable functions defined on E ∈ , and let c be a nonnegative constant. Then

248

Measure and Integral: An Introduction to Real Analysis

(i) E (f + g) dμ = E f dμ + E g dμ and E cf dμ = c E f dμ. (ii) If E2 , where E1 and E2 are disjoint and measurable, then E f dμ = E = E1 ∪ E1 f dμ + E2 f dμ. Proof. By Theorem 10.13(iv), choose simple measurable fk and gk such that 0 ≤ fk f and 0 ≤ gk g. Then fk + gk is simple and measurable, and 0 ≤ fk + gk f + g. Therefore, by Theorem 10.20 and Lemma 10.18,

(f + g) dμ = lim

k→∞

E

=

(fk + gk ) dμ = lim

E

f dμ +

E

k→∞

fk dμ +

E

gk dμ

E

g dμ.

E

This proves the first part of (i); the other parts are proved similarly. If f is any real-valued measurable function defined on a measurable set E, we define its integral with respect to μ by E

f (x) dμ(x) =

E

f dμ =

E

f + dμ −

f − dμ,

(10.22)

E

provided not both integrals on the right are +∞. We say that f is integrable with respect to μ, or μ-integrable, over E if E f dμ exists and is finite. When this is the case, we write f ∈ L(E; dμ) or f ∈ L(E; μ). The abbreviations L(dμ) or L(μ) are also useful when it is clear from context what the set E is. It is immediate from (10.22) and Theorem 10.17 that E f dμ = 0 if μ(E) = 0 exist. The familiar and that E f dμ ≤ E g dμ if f ≤ g on E and both integrals properties of the Lebesgue integral are shared by E f dμ; some of them are listed in the following theorem.

Theorem 10.23

Ef

dμ| ≤

E | f | dμ; furthermore, f ∈ L(E; dμ) if and only if | f | ∈ L(E; dμ). (ii) If | f | ≤ |g| a.e. (μ) in E, and if g ∈ L(E; dμ), then f ∈ L(E; dμ), and E | f | dμ ≤ |g| dμ. E (iii) If f ∈ L(E; dμ), then f is finite a.e. (μ) in E. (iv) If f = g a.e. (μ) in E and if E f dμ exists, then E g dμ exists and E g dμ = E f dμ.

(i) |

249

Abstract Integration

(v) If E f dμ exists and c is a constant, then E cf dμ exists and E cf dμ = c E f dμ. (vi) If f , g ∈ L(E; dμ), then f + g ∈ L(E; dμ) and E (f + g) dμ = E f dμ + E g dμ.

(vii) If f ≥ 0 and m ≤ g ≤ M on E, then m

f dμ ≤

E

fg dμ ≤ M

E

f dμ.

E

Proof. The proofs are similar to those for Lebesgue integrals. As examples, we will prove (iii) and (iv). For (iii), suppose that f ∈ L(E; dμ). Then, by (i), | f | ∈ L(E; dμ). Let Z = {x ∈ E : | f (x)| = +∞}. Then for any positive integer k, k μ(Z) ≤

| f | dμ ≤

Z

| f | dμ.

E

Since f ∈ L(E; dμ), it follows that μ(Z) = 0, which proves (iii). For (iv), since both f + = g+ and f − = g− a.e. (μ) in E, we may assume that f ≥ 0. Then both E f dμ and E g dμ clearly exist. Let E1 = {x ∈ E : f (x) = g(x)}. Since μ(E1 ) = 0, Theorem 10.21(ii) implies that

f dμ =

E

f dμ =

E−E1

g dμ =

E−E1

g dμ,

E

as asserted.

Theorem 10.24 then

If {fk } is a sequence of nonnegative measurable functions on E, fk dμ = fk dμ. E

E

m Proof. Let f = ∞ k=1 fk . Since f ≥ k=1 fk , the integral of f over E majorizes m k=1 E fk dμ for any m. Hence, the left side in the preceding equation (j)

majorizes the right. To show the opposite inequality, let {fk } be a sequence of j (j) nonnegative, simple measurable functions increasing to fk . Let sj = k=1 fk . Then sj is nonnegative and simple, and sj . We will show that sj f . Clearly, limj→∞ sj ≤ f . On the other hand, for any m, lim sj ≥ lim

j→∞

j→∞

m k=1

(j)

fk =

m k=1

fk .

250

Measure and Integral: An Introduction to Real Analysis

Therefore, limj→∞ sj ≥ f . It follows from Theorem 10.20 that j E f dμ. Since k=1 fk ≥ sj , we obtain ∞

j

fk dμ = lim

j→∞

k=1 E

fk dμ ≥ lim

j→∞

k=1 E

sj dμ =

E

E sj dμ

→

f dμ.

E

This proves the desired inequality, and the theorem follows. The next three results are essentially corollaries of Theorem 10.24. Ek is a countable union of disjoint Theorem 10.25 If E f dμ exists, and if E = measurable sets Ek , then

f dμ =

E

f dμ.

Ek

Proof. Suppose first that f≥ 0. Let fk = f χEk on E, so that fk is measurable and nonnegative, and f = fk . By Theorem 10.24,

f dμ =

E

fk dμ =

E

f dμ.

Ek

For arbitrary measurable f , the existence of E f dμ implies that of Ek f dμ; in fact, the integrals of f + and f − over any Ek are majorized by those over E. Moreover, by the case already considered, E

f + dμ =

Ek

f + dμ,

E

f − dμ =

f − dμ.

Ek

Since at least one of these sums is finite, the conclusion follows by subtraction. (Compare Theorem 5.24.)

Theorem 10.26

If fk are measurable and 0 ≤ fk f on E, then

E fk dμ

→

Ef

dμ.

Proof. If E fk dμ = +∞ for some k, the result is obvious. We may therefore assume that each fk ∈ L(dμ). Write f = f1 + ∞ k=2 (fk − fk−1 ). Since each term on

251

Abstract Integration

the right is nonnegative, we obtain from Theorems 10.24 and 10.23(vi) that E

f dμ =

f1 dμ +

∞ k=2

E

E

fk dμ −

fk−1 dμ = lim

k→∞

E

fk dμ.

E

Theorem 10.27 (Monotone Convergence Theorem) Let {fk } and f be measurable functions on E: (i) Suppose that fk f a.e. (μ) onE. If there exists φ ∈ L(E; dμ) such that fk ≥ φ on E for all k, then E fk dμ → E f dμ.

(ii) Suppose that fk f a.e. (μ) onE. If there exists φ ∈ L(E; dμ) such that fk ≤ φ on E for all k, then E fk dμ → E f dμ. Proof. The proof of (i) follows by applying Theorem 10.26 to the functions fk − φ. The details are as in the proof of Theorem 5.32. Part (ii) follows by applying (i) to the functions −fk . Theorem 10.28 (Uniform Convergence Theorem) If fk ∈ L(E; dμ), k = 1, 2, . . . , and {fk } converges uniformly to f on E, μ(E) < +∞, then f ∈ L(E; dμ) and E fk dμ → E f dμ. The proof is the same as for Lebesgue measure (see Theorem 5.33) and is omitted. Fatou’s lemma and the Lebesgue dominated convergence theorem are true for abstract measures. They are stated below without proof; the proofs are like those of Theorems 5.34 and 5.36. Theorem 10.29 (Fatou’s Lemma) If {fk } is a sequence of measurable functions on E and there exists φ ∈ L(E; dμ) such that fk ≥ φ a.e. (μ) on E for all k, then E

(lim inf fk ) dμ ≤ lim inf k→∞

k→∞

fk dμ.

E

The case φ = 0 (i.e., fk ≥ 0) is of special importance. In this case, we obtain the following useful corollary. functions on E such Corollary 10.30 Let {fk } and f be nonnegative measurable that fk → f a.e. (μ) in E. If E fk dμ ≤ M for all k, then E f dμ ≤ M.

252

Measure and Integral: An Introduction to Real Analysis

Theorem 10.31 (Lebesgue’s Dominated Convergence Theorem) Let φ, {fk }, and f be measurable functions on E such that | fk | ≤ φ a.e. (μ) on E and φ ∈ L(E; dμ). Then (i)

E (lim inf k→∞ fk ) dμ ≤ lim inf k→∞ E fk dμ ≤ ≤ E (lim supk→∞ fk ) dμ.

(ii) If fk → f a.e. (μ) in E, then

E fk dμ

→

Ef

lim supk→∞

E fk dμ

dμ.

Corollary 10.32 (Bounded Convergence Theorem) Suppose that {fk } and f are measurable functions on E such that fk → f a.e. (μ) in E. If μ(E) < +∞ and there is a constant M such that | fk | ≤ M a.e. (μ) in E, then E fk dμ → E f dμ. We conclude our brief study of integration with respect to abstract measures by defining Lp (E; dμ) = Lp (E, , dμ), 0 < p α) = 0}. E

We leave it to the reader to check that | f | ≤ f ∞ a.e. (μ) in E and that for every α < f ∞ , there is a set Eα ⊂ E such that μ(Eα ) > 0 and | f | > α on Eα . Observe that lp is Lp (S , , dμ) when S is the set of integers, is the set of all subsets of S , and μ(E) is the number of elements of E. For 1 ≤ p ≤ ∞, Hölder’s and Minkowski’s inequalities hold: fg1 ≤ f p gp ,

f + gp ≤ f p + gp ,

1/p + 1/p = 1. Moreover, Lp is a Banach space with norm · p if 1 ≤ p ≤ ∞. In general, Lp is not separable (see Exercise 9). However, if L2 is separable, we can define orthogonality, linear independence, completeness, Fourier coefficients, and Fourier series as usual, obtaining Bessel’s inequality and Parseval’s formula, as well as the usual result relating L2 and l2 .

Abstract Integration

253

10.3 Absolutely Continuous and Singular Set Functions and Measures We now turn our attention from the familiar results earlier to some new ones arising naturally in the context of abstract measure spaces. Let (S , , μ) be a measure space, and let φ be an additive set function on . If E ∈ , then φ is said to be absolutely continuous on E with respect to μ if φ(A) = 0 for every measurable A ⊂ E with μ(A) = 0. Note that this definition has a somewhat different pattern from the one for Lebesgue measure (see p. 130 in Section 7.1). However, in Theorem 10.34, we shall obtain a reformulation of the present definition in terms of the old one. On the other hand, φ is said to be singular on E with respect to μ if there is a measurable set Z ⊂ E such that μ(Z) = 0 and φ(A) = 0 for every measurable A ⊂ E − Z. Thus, φ is singular if it is supported on a set of μ-measure zero, so that E splits into the union of two sets, Z and E − Z, one with μ-measure zero and the other with the property that φ is zero on each measurable subset of it. As examples, note that if f ∈ L(E; dμ), then the function φ(A) = A f dμ is absolutely continuous on E with respect to μ. If Z is any measurable subset of E with μ(Z) = 0 and ψ is any additive set function on the measurable subsets of E, then the function φ(A) = ψ(A ∩ Z) is singular on E with respect to μ. We list several simple properties of such set functions in the next theorem.

Theorem 10.33 (i) If φ is both absolutely continuous and singular on E with respect to μ, then φ(A) = 0 for every measurable A ⊂ E. (ii) If both ψ and φ are absolutely continuous (singular) on E with respect to μ, then so are ψ + φ and cφ, where c is any real constant. (iii) φ is absolutely continuous (singular) on E with respect to μ if and only if its variations V and V are, or, equivalently, if and only if its total variation V is. (iv) If {φk } is a sequence of additive set functions that are absolutely continuous (singular) on E with respect to μ, and if φ(A) = limk→∞ φk (A) exists for every measurable A ⊂ E, then φ is absolutely continuous (singular) on E with respect to μ. Proof. For part (i), suppose that φ is absolutely continuous and singular on E. Let Z be a subset of E with μ-measure zero such that φ(H) = 0 if H is measurable and H ⊂ E − Z. If A is any measurable subset of E, then

254

Measure and Integral: An Introduction to Real Analysis

φ(A) = φ (A ∩ Z) + φ (A − Z). Since φ is absolutely continuous and μ (A ∩ Z) = 0, we have φ (A ∩ Z) = 0. Moreover, since A − Z ⊂ E − Z and φ is singular, we have φ(A − Z) = 0. Hence, φ(A) = 0, and part (i) is proved. The proofs of parts (ii)–(iv) are left as exercises. The next two theorems give alternate characterizations of absolutely continuous and singular set functions.

Theorem 10.34 An additive set function φ is absolutely continuous on E with respect to μ if and only if given ε > 0, there exists δ > 0 such that |φ(A)| < ε for any measurable A ⊂ E with μ(A) < δ. Proof. The sufficiency of the condition is immediate since if μ(A) = 0, then μ(A) < δ for all δ > 0, so that |φ(A)| < ε for all ε. Consequently, φ(A) = 0. For the converse, suppose that φ is absolutely continuous, but that there is an ε > 0 for which no δ > 0 gives the desired result. Then, taking δ = 2−k for k = 1, 2, . . . , there would exist measurable Ak ⊂ E with μ(Ak ) < 2−k and |φ(Ak )| ≥ ε. Let A = lim sup Ak . Then, for any m, μ(A) ≤ μ

∞ k=m

Ak

≤

∞

2−k ,

k=m

so that μ(A) = 0. Therefore, φ(A) = 0. Assuming for the moment that φ ≥ 0, we obtain from Theorem 10.2 that φ(A) = φ(lim sup Ak ) ≥ lim sup φ(Ak ) ≥ ε. This contradiction establishes the result in case φ ≥ 0. For the general case, the variation V of an absolutely continuous φ is absolutely continuous (by Theorem 10.33(iii)) and nonnegative. Since |φ(A)| ≤ V(A), the theorem follows.

Theorem 10.35 An additive set function φ is singular on E with respect to μ if and only if given ε > 0, there is a measurable subset E0 of E such that μ(E0 ) < ε and V(E − E0 ; φ) < ε. (Recall from p. 240 in Section 10.1 that V(E − E0 ; φ) is equivalent in size to supA⊂E−E0 ,A∈ |φ(A)|). Proof. If φ is singular, there exists Z ⊂ E with μ(Z) = 0 such that V(E − Z; φ) = 0. Taking E0 = Z, we obtain the necessity of the condition. To prove its −k sufficiency, choose for each k = 1, 2, . . . a measurable E k ⊂ E with μ(Ek ) < 2 ∞ −k and V(E − Ek ; φ) < 2 . Let Z = lim sup Ek . Since Z ⊂ k=m Ek for every m, it follows as usual that μ(Z) = 0. Moreover, by Theorem 10.2,

255

Abstract Integration

V(E − Z; φ) = V(E − lim sup Ek ; φ) = V(lim inf(E − Ek ); φ) ≤ lim inf V(E − Ek ; φ) = 0. Hence, φ is singular with respect to μ, which completes the proof. For any measurable set E and any additive set function φ, the next theorem gives a useful decomposition of E in terms of the sign of φ. In order to motivate it, let us first consider the special case when φ is the indefinite integral of an f ∈ L(E; dμ): φ(A) = A f dμ for measurable A ⊂ E. Letting P = {x ∈ E : f (x) ≥ 0}, we see that φ(A) ≥ 0 for any measurable A ⊂ P and that φ(A) ≤ 0 for any measurable A ⊂ E − P. It follows that V(E; φ) = V(P; φ) = φ(P) and V(E; φ) = V(E − P; φ) = −φ(E − P). As a simple consequence, since P = {x ∈ E : f (x) = f + (x)}, we have V(E; φ) =

f + dμ,

V(E; φ) =

E

f − dμ.

E

The splitting E = P ∪ (E − P) is the sort of decomposition of E that we have in mind. For an arbitrary φ, there is the following basic result.

Theorem 10.36 (Hahn Decomposition) Let E be a measurable set and let φ be an additive set function defined on the measurable subsets A of E. Then there is a measurable P ⊂ E such that φ(A) ≥ 0 for A ⊂ P and φ(A) ≤ 0 for A ⊂ E − P. Equivalently, V(P; φ) = V(E − P; φ) = 0. Hence, V(E; φ) = V(P; φ) = φ(P), V(E; φ) = V(E − P; φ) = −φ(E − P). Proof. Denote V(A) = V(A; φ) and V(A) = V(A; φ) for measurable sets A ⊂ E. For each positive integer k, choose a measurable Ak ⊂ E such that φ(Ak ) > V(E) − 2−k . Then V(Ak ) > V(E) − 2−k . Since V is additive, V(E − Ak ) = V(E) − V(Ak ) < 2−k . Moreover, by the Jordan decomposition (Theorem 10.8), V(Ak ) − V(Ak ) = φ(Ak ) > V(E) − 2−k , and therefore V(Ak ) < 2−k . Let P = lim inf Ak . Since V is nonnegative, Theorem 10.2 implies that V(P) ≤ lim inf V(Ak ) = 0. Also,

V(E − P) = V(E − lim inf Ak ) = V(lim sup(E − Ak )) ≤ V

∞

(E − Ak )

k=m

256

Measure and Integral: An Introduction to Real Analysis

for any m. Therefore, for any m, by using Lemma 10.4, we obtain

V(E − P) ≤

∞

V(E − Ak )

0, there is a decomposition E = Z ∪ k=1 k of E into disjoint measurable sets such that (i) μ(Z) = 0 (ii) a(k − 1)μ(A) ≤ φ(A) ≤ akμ(A) for measurable A ⊂ Ek , k = 1, 2, . . . Proof. We may assume that a = 1 by considering φ/a. For each positive integer k, let ψk (A) = φ(A) − kμ(A) for measurable A ⊂ E. Since φ and μ are finite and additive, ψk is an additive set function. By the Hahn decomposition, there is a set Pk ⊂ E such that ψk (A) ≥ 0 if A ⊂ Pk and ψk (A) ≤ 0 if A ⊂ E − Pk . Thus, φ(A) ≥ kμ(A) if A ⊂ Pk and φ(A) ≤ kμ(A) if A ⊂ E − Pk . Now, let Qk = ∞ m=k Pm for k = 1, 2, . . . , and observe that Pk ⊂ Qk and Qk . We will show that φ(A) ≥ kμ(A) if A ⊂ Qk , and φ(A) ≤ kμ(A) if A ⊂ E − Qk . To see this, write Qk = Pk ∪ (Pk+1 − Pk ) ∪ (Pk+2 − Pk+1 − Pk ) ∪ · · · and note that the terms on the right side are disjoint. Hence, if A ⊂ Qk , we may write A = ∞ m=k Am , where the Am are disjoint and Am ⊂ Pm , by simply intersecting A with each such term of Qk . Then

φ(A) =

∞ m=k

φ(Am ) ≥

∞ m=k

mμ(Am ) ≥ k

∞

μ(Am ) = kμ(A),

m=k

so that φ(A) ≥ kμ(A) for A ⊂ Qk , as claimed. On the other hand, if A ⊂ E−Qk , then A ⊂ E − Pk , so that φ(A) ≤ kμ(A). This proves the assertion above.

257

Abstract Integration

We can now give the decomposition of E. Let Z = and write

∞

k=1 Qk

= lim sup Pk ,

E = Z ∪ (E − Q1 ) ∪ (Q1 − Q2 ) ∪ (Q2 − Q3 ) ∪ · · · = Z ∪ E1 ∪ E2 ∪ E3 ∪ · · · . The terms in this decomposition are disjoint. If A ⊂ E1 (= E−Q1 ), then φ(A) ≤ μ(A) by what was shown earlier, and φ(A) ≥ 0 by hypothesis. For k ≥ 2, we have Ek = Qk−1 − Qk = Qk−1 ∩ (E − Qk ). Hence, if k ≥ 2 and A ⊂ Ek , then φ(A) ≥ (k − 1) μ(A) due to the fact that A ⊂ Qk−1 ; also, φ(A) ≤ kμ(A) due to A ⊂ E − Qk . Finally, since Z ⊂ Qk for all k, we have φ(Z) ≥ kμ(Z) for all k. Since φ is finite, it follows that μ(Z) = 0, which completes the proof. To give some idea of the significance of the last result, write A = (A ∩ Z) ∪ [ k (A ∩ Ek )] for measurable A ⊂ E. Then φ(A) = φ(A ∩ Z) +

φ(A ∩ Ek ).

The set function σ(A) = φ (A ∩ Z) is singular with respect to μ. By (ii) of the theorem, φ is absolutely continuous with respect to μ on each Ek . Hence, the set function α defined by α(A) = φ(A) − σ(A) =

φ(A ∩ Ek )

is absolutely continuous with respect to μ since if μ(A) = 0, then μ(A ∩ Ek ) = 0 and φ(A ∩ Ek ) = 0 for all k. Note also that (ii) can be written a(k − 1) dμ ≤ φ(A) ≤ ak dμ A

A

for measurable A ⊂ Ek . We will now use these ideas to decompose any set function into the sum of an absolutely continuous part, which will be an indefinite integral, and a singular part. This decomposition, which is of major importance, is stated in the following theorem. We assume that the measure μ defined on the measurable subsets of E is σ-finite, that is, that E can be written as a countable union of measurable sets with finite μ-measure.

Theorem 10.38 (Lebesgue Decomposition) Let φ be an additive set function on the measurable subsets of a measurable set E, and let μ be a σ-finite measure on E. Then there is a unique decomposition φ(A) = α(A) + σ(A) for measurable A ⊂ E,

258

Measure and Integral: An Introduction to Real Analysis

where α and σ are additive set functions, α is absolutely continuous with respect to μ, and σ is singular with respect to μ. These functions are α(A) =

σ(A) = φ(A ∩ Z)

f dμ,

A

for appropriate f ∈ L (E; dμ) and Z with μ(Z) = 0. Moreover, if φ ≥ 0, then f ≥ 0. Proof. Assuming that such a decomposition exists, we will show it is unique. If φ = α1 + σ1 is another decomposition of φ into absolutely continuous and singular parts, then α−α1 = σ1 −σ, which (being both absolutely continuous and singular with respect to μ) must vanish identically. Hence α = α1 and σ = σ1 . To show that the decomposition exists, first assume that φ ≥ 0 and μ(E) < +∞. Taking a = 2−m , m = 1, 2, . . . , in Theorem 10.37, we may write E as a (m) disjoint union E = Z(m) ∪ , where k Ek μ(Z(m) ) = 0 and 2−m (k − 1)μ(A) ≤ φ(A) ≤ 2−m kμ(A) if A ⊂ Ek . (m)

Given m, k, m , and k , let β = 2−m (k − 1), γ = 2−m k, β = 2−m (k − 1), and γ = 2−m k . If the intervals [β, γ] and [β , γ ] are disjoint, we will show that (m) (m ) the set A = Ek ∩ Ek has μ-measure zero. In fact, we have both βμ(A) ≤ φ(A) ≤ γμ(A)

and

β μ(A) ≤ φ(A) ≤ γ μ(A).

If, for example, γ < β , the inequalities β μ(A) ≤ φ(A) ≤ γμ(A) imply that μ(A) = 0. A similar argument applies if γ < β. Fixing m and k, and setting (m+1) m = m + 1, we see that there are at most four values of k such that Ek (m) intersects Ek in a set of positive μ-measure, namely, k = 2k − 2, 2k − 1, 2k, and 2k + 1. Hence, (m)

Ek

(m+1)

(m+1)

(m+1)

⊂ E2k−2 ∪ E2k−1 ∪ E2k

(m+1)

(m)

∪ E2k+1 ∪ Yk ,

(m)

where μ(Yk ) = 0. Let Z=

m

⎛

Z(m) ∪ ⎝

⎞ (m) Yk ⎠ ,

k,m

−m (k − 1) so that μ(Z) = 0, and define functions {fm }∞ m=1 on E by fm (x) = 2 (m) (m) if x ∈ Ek − Z and fm (x) = 0 if x ∈ Z. Therefore, if x ∈ Ek − Z, then

259

Abstract Integration

fm (x) = 2−m (k − 1) and fm+1 (x) takes one of the four values 2−m−1 j, j = 2k − (m) 3, 2k − 2, 2k − 1, 2k. Hence, |fm (x) − fm+1 (x)| ≤ 2−m if x ∈ Ek − Z, and so also if x ∈ E. It follows that {fm } converges uniformly on E to a limit f . Since fm ≥ 0, also f ≥ 0. (m) Since E is the disjoint union Z ∪ k Ek − Z and φ is absolutely continuous on each Ek(m) , φ(A) = φ(A ∩ Z) +

k

= φ(A ∩ Z) +

φ A ∩ Ek(m) − Z (m) φ A ∩ Ek

k

for measurable A ⊂ E. Therefore, φ(A ∩ Z) +

(m) ≤ φ(A) 2−m (k − 1)μ A ∩ Ek

k

≤ φ(A ∩ Z) +

2−m kμ A ∩ Ek(m) ,

k

which can be rewritten φ(A ∩ Z) +

fm dμ ≤ φ(A) ≤ φ(A ∩ Z) +

A

fm dμ + 2−m μ(A).

A

Since μ(A) is finite, we obtain from the uniform convergence theorem that f dμ → A f dμ. Therefore, A m φ(A) = φ(A ∩ Z) +

f dμ,

A

which proves the theorem in case φ ≥ 0 and μ(E) < +∞. If φ ≥ 0 and μ(E) = +∞, then E can still be written as a disjoint union E = Ej with μ(Ej ) < +∞, since E is σ-finite. Hence, there exist Zj ⊂ Ej , μ(Zj ) = 0, and nonnegative fj on Ej such that for all measurable A ⊂ E, φ(A ∩ Ej ) = φ(A ∩ Zj ) +

A∩Ej

fj dμ.

260

Letting Z =

Measure and Integral: An Introduction to Real Analysis

Zj and f = φ(A) =

fj χEj , we obtain μ(Z) = 0, f ≥ 0, and

φ(A ∩ Zj ) φ(A ∩ Ej ) = + fj dμ = φ(A ∩ Z) + f dμ.

A∩Ej

A

Of course, f is integrable since φ is finite. The proof is now complete if φ ≥ 0. For an arbitrary φ, apply the decomposition to each of V and V, and subtract the results. By the Jordan decomposition, we obtain φ(A) = A f dμ + σ(A), where f ∈ L (E; dμ) and σ is singular with respect to μ. It remains to show that there is a set Z, μ(Z) = 0, such that σ(A) = φ (A ∩ Z). Let Z be the set of μ-measure zero corresponding to σ in the definition of a singular set function. Then σ (A ∩ Z) = σ(A) and A∩Z f dμ + σ(A) = 0. Hence, replacing A by A ∩ Z in the formula φ(A) = A f dμ + σ(A), we obtain σ(A) = φ (A ∩ Z). This completes the proof. For a result concerning the uniqueness of f , see Exercise 6(a). We have already noted that the indefinite integral of an integrable function is absolutely continuous. The following fundamental result gives a converse: namely, in a σ-finite space, the only absolutely continuous set functions are indefinite integrals.

Theorem 10.39 (Radon–Nikodym) Let φ be an additive set function on the measurable subsets of a measurable E, and let μ be a σ-finite measure on E. If φ is absolutely continuous with respect to μ, there exists a unique f ∈ L (E; dμ) such that φ(A) =

f dμ

A

for every measurable A ⊂ E. Here, the function f is called the Radon–Nikodym derivative of φ with respect to μ. Proof. The result follows from the Lebesgue decomposition. In fact, φ(A) = f dμ + φ (A ∩ Z) for appropriate f ∈ L (E; dμ) and Z with μ(Z) = 0. Since φ A is absolutely continuous, we have φ (A ∩ Z) = 0, so that φ (A) = A f dμ. For the uniqueness of f , see Exercise 6(a). If ν and μ are two measures defined on the same family of measurable sets, we say that ν is absolutely continuous with respect to μ on a measurable

261

Abstract Integration

set E if ν(A) = 0 for every A ⊂ E with μ(A) = 0. If ν is finite, Theorem 10.34 implies that a necessary and sufficient condition for ν to be absolutely continuous with respect to μ is that given ε > 0, there exist δ > 0 such that ν(A) < ε if μ(A) < δ. The necessity of this condition may fail if ν is not finite; see Exercise 12. We say that two measures ν and μ are mutually singular on E if E can be written as a disjoint union, E = E1 ∪ E2 , of two measurable sets with ν(E1 ) = μ(E2 ) = 0. The reader can check that the following analogue of Theorem 10.35 is valid: two measures ν and μ are mutually singular on E if and only if given ε > 0, there are disjoint measurable E1 and E2 with E = E1 ∪ E2 and ν(E1 ) < ε, μ (E2 ) < ε. We also note that ifν and μ are mutually singular on E and if g ∈ L (E; dν), then the set function A g dν is singular with respect to μ. To see this, write E = E1 ∪ E2 , where E1 and E2 are disjoint with ν (E1 ) = μ (E2 ) = 0. Setting Z = E2 ,we have μ(Z) = 0 and ν(A) = 0 for every measurable A ⊂ E−Z = E1 . Hence, A g dν = 0 for such A, which proves the assertion. We have the following analogue of the Lebesgue decomposition.

Theorem 10.40 Let ν and μ be two σ-finite measures defined on the measurable subsets of a measurable E. Then there is a unique, nonnegative measurable f on E and a unique measure σ on the measurable subsets of E such that σ and μ are mutually singular on E and ν(A) = A f dμ + σ(A) for every measurable A ⊂ E. Moreover,

g dν =

A

whenever

A g dν

A

gf dμ +

g dσ

A

exists.

Before giving the proof, we add several remarks based on the theorem. First, A f dμ is an absolutely continuous measure with respect to μ since f ≥ 0. Next, if E = Z ∪ (E − Z), where μ(Z) = σ(E − Z) = 0, then σ has the form σ(A) = ν(A ∩ Z)

for measurable A ⊂ E,

as can be seen by replacing A by A ∩ Z in the decomposition of ν and observing that σ(A) = σ(A ∩ Z) + σ(A − Z) = σ(A ∩ Z). Note also that if ν(E) is finite, then f ∈ L(E; dμ). Finally, if g ∈ L(E; dν), then g ∈ L(E; dσ) since σ(A) ≤ ν(A) for all measurable A ⊂ E. In this case, the second formula in the theorem implies that gf ∈ L(E; dμ) and expresses the Lebesgue decomposition of A g dν with respect to μ. Proof. If ν(E) < +∞, the Lebesgue decomposition implies that ν(A) = A f dμ + σ(A) for measurable A ⊂ E, where f ≥ 0, f ∈ L(E; dμ), and σ

262

Measure and Integral: An Introduction to Real Analysis

and μ are mutually singular. If ν(E) = +∞, then since ν is σ-finite, we have E = Ej with Ej disjoint and ν(Ej ) < +∞. Choose Zj ⊂ Ej and fj on Ej such that μ(Zj ) = 0, fj ≥ 0 and ν(A ∩ Ej ) =

fj dμ + ν(A ∩ Zj )

for measurable A ⊂ E.

A∩Ej

Let Z =

Zj and f =

ν(A) =

fj χEj . Then μ(Z) = 0, and adding over j, we have

f dμ + ν(A ∩ Z) =

A

f dμ + σ(A),

A

as claimed. The proof of the uniqueness of f and σ is left as an exercise. If g is the characteristic function χB of ameasurable set B, the formula g dν = in question, namely, A A gf dμ + A g dσ, reduces to ν(A ∩ B) = f dμ + σ(A ∩ B), which we know to be valid. Hence, the formula is A∩B also valid for any simple measurable g and, therefore, by the monotone convergence theorem, for any measurable g ≥ 0. Now let g be any measurable function for which A g dν exists. Then at least one of A g+ dν and A g− dν is finite, and the formula for g follows by subtracting those for g+ and g− .

Corollary 10.41 Let ν and μ be two σ-finite measures defined on the measurable subsets of a measurable E. (i) Then ν is absolutely continuous with respect to μ on E if and only if there is a nonnegative measurable f such that ν(A) = A f dμ for every measurable A ⊂ E. In this case, A

g dν =

gf dμ

A

for any measurable g and A ⊂ E for which A g dν exists. (ii) Let g ∈ L(E; dν). Then A g dν = A gf dμ for some nonnegative f and all measurable A ⊂ E if and only if A g dν is absolutely continuous with respect to μ. Proof. Let ν(A) = A f dμ + σ(A) be the decomposition given by Theorem 10.40. Part (i) follows from Theorem 10.40 since σ ≡ 0 if and only if ν is absolutely continuous with respect to μ. Part (ii) follows from the fact that g dν = gf dμ + the formula A A A g dσ is the Lebesgue decomposition of g dν. A

263

Abstract Integration

10.4 The Dual Space of Lp If B is a Banach space (or, more generally, a normed linear space) over the real numbers, a real-valued linear functional l on B is by definition a finite real-valued function l(f ), f ∈ B, which satisfies l(f1 + f2 ) = l(f1 ) + l(f2 ),

l(αf ) = α l(f ), −∞ < α < +∞.

Note that l(0) = 0. A linear functional l is said to be bounded if there is a constant c such that |l(f )| ≤ c f for all f ∈ B. A bounded linear functional l is continuous with respect to the norm in B, by which we mean that if f − fk → 0 as k → ∞, then l(fk ) → l(f ), since |l(f ) − l(fk )| = |l(f − fk )| ≤ c f − fk → 0. The norm l of a bounded linear functional l is defined as l = sup |l(f )|.

(10.42)

f ≤1

Since f / f has norm 1 for any f = 0, and since l is linear, we have l = supf =0 |l(f )|/ f . The collection of all bounded linear functionals on B is called the dual space B of B. We shall consider the case when B = Lp = Lp (E; dμ) and for simplicity restrict our attention to real-valued functions. Our goal is to show that if 1 ≤ p < ∞ and μ is σ-finite, then the dual space (Lp ) of Lp can be identified in a natural way with Lp , 1/p + 1/p = 1. The main tool in doing so is the Radon–Nikodym theorem. The first result is the following.

Theorem 10.43 formula

Let 1 ≤ p ≤ ∞, 1/p + 1/p = 1. If g ∈ Lp (E; dμ), then the l(f ) =

fg dμ

E

defines a bounded linear functional l ∈ [Lp (E; dμ)] . Moreover, l ≤ gp . Proof. This follows easily from Hölder’s inequality and the linear properties of the integral: we have |l(f )| = fg dμ ≤ gp f p , E

and therefore l ≤ gp .

264

Measure and Integral: An Introduction to Real Analysis

The theorem shows that with each g ∈ Lp , we can associate a bounded linear functional, l(f ) = E fg dμ, on Lp . If μ is σ-finite on E, the correspondence between g and l is unique (see Exercise 6) and defines an embedding of Lp in (Lp ) . We now give the characterization of (Lp ) , 1 ≤ p < ∞. Theorem 10.44 Let 1 ≤ p < ∞, 1/p + 1/p = 1, and let μ be σ-finite. If l ∈ [Lp (E; dμ)] , there is a unique g ∈ Lp (E; dμ) such that l(f ) =

fg dμ.

E

Moreover, l = gp , and therefore the correspondence between l and g defines an isometry between (Lp ) and Lp . Proof. Suppose first that μ(E) < +∞. Let l ∈ (Lp ) and write l = c. Define a set function φ on the measurable sets A ⊂ E by φ(A) = l(χA ). Note that φ is finite; in fact, |φ(A)| ≤ cχA p = cμ(A)1/p . Clearly, φ is finitely ∞ additive. To show that it is countably additive, suppose that A = k=1 Ak , Ak ∞ m measurable and disjoint. Write A = ( k=1 Ak ) ∪ ( k=m+1 Ak ) = A ∪ A . Then

φ(A) = φ(A ) + φ(A ) =

m

φ(Ak ) + φ(A ).

k=1

10.11 implies that φ(A ) Since |φ(A )| ≤ cμ(A )1/p (with p = ∞), Theorem ∞ tends to zero as m → ∞. Hence, φ(A) = k=1 φ(Ak ), which shows that φ is countably additive. The fact that |φ(A)| ≤ cμ(A)1/p also implies that φ is absolutely continuous with respect to μ. 1 By the Radon–Nikodym theorem, there is a g ∈ L (E; dμ)such that φ(A) = A g dμ for measurable A ⊂ E. This means that l(χA ) = E χA g dμ, so that l(f ) = E fg dμ for any simple measurable f . To show the same formula holds for any f ∈ Lp , we first claim that g ∈ Lp and gp ≤ c. If p > 1, choose simple functions hk with 0 ≤ hk |g|p . Let {gk } be the simple functions defined by 1/p

gk = hk sign g.

265

Abstract Integration 1/p Then gk p = hk 1 , and

1/p gk g dμ = l(gk ) ≤ c gk p = c hk 1 .

E 1/p+1/p

1/p

1/p

= hk , we obtain hk 1 ≤ chk 1 . We may Since gk g = hk |g| ≥ hk assume that hk 1 = 0 for large k. (Otherwise, g would be zero a.e. (μ), and our claim would be obviously true.) Hence, dividing both sides of the last 1/p 1/p inequality by hk , we have hk ≤ c, so that g ≤ c by the monotone 1

1

p

convergence theorem. This proves the claim when p > 1. The case p = 1 is left as an exercise. fk converging to To show that l(f ) = E fg dμ for any f ∈ Lp , choose simple f in Lp norm (see Exercise 8). Then l(fk ) → l(f ), and E fk g dμ → E fg dμ by Hölder’s inequality: fk g dμ − fg dμ ≤ fk − f g dμ ≤ fk − f p gp . E

E

E

The fact that the formula holds for fk thus implies that it holds for f by passing to the limit. To complete the proof for the case μ(E) < +∞, it remains to show that g = c and that the correspondence between l and g is unique. However, p we already know that g ≤ c, and the opposite inequality follows from p

Theorem 10.43. For the uniqueness of the correspondence, see Exercise 6(c). If μ(E) = +∞, then since μ is σ-finite, there exist Ej E with μ(Ej ) < +∞. Let l ∈ [Lp (E)] . We may view functions in Lp (Ej ) as those functions in Lp (E) that vanish outside Ej . Since the restriction of l to Lp (Ej ) is a bounded linear functional, there is a unique gj ∈ Lp (Ej ), gj p ,E ≤ l, such that j

l(f ) =

fgj dμ

Ej

for every f in Lp that vanishes outside Ej . For such f , the fact that Ej ⊂ Ej+1 also gives l(f ) =

Ej+1

fgj+1 dμ =

Ej

fgj+1 dμ.

266

Measure and Integral: An Introduction to Real Analysis

Therefore, gj+1 = gj a.e. (μ) in Ej . We may assume that gj+1 = gj everywhere in Ej . Define g(x) = gj (x) if x ∈ Ej . Then g is measurable and it follows that g ≤ l. If f ∈ Lp (E), then p l(f χEj ) =

fgj dμ =

Ej

fg dμ.

Ej

Since f χEj converges in Lp to f and Ej fg dμ → E fg dμ (note that fg ∈ L1 by Hölder’s inequality), we obtain l f = E fg dμ in the limit. Therefore, by Theorem 10.43, l ≤ gp , so that l = gp and the proof is complete. We remark that the proof of Theorem 10.44 fails in several places if p = ∞, for example, at the place where we conclude that φ is absolutely continuous. In fact, the theorem itself is false when p = ∞, is, not every bounded that linear functional on L∞ can be represented l f = fg dμ for some g ∈ L1 ; an example is indicated in Exercise 18.

10.5 Relative Differentiation of Measures Lebesgue’s differentiation theorem, Theorem 7.11, states that if f is locally integrable in Rn , then lim

h→0

1 f (y) dy = f (x) a.e., |Qx (h)| Qx (h)

where Qx (h) is the cube with center x and edge length h. We will now study an analogue of this result for other measures on Rn . Specifically, if μ and ν are two σ-finite measures on the Borel subsets of Rn , we will study the existence of lim

h→0

ν(Qx (h)) μ(Qx (h))

and its relation to the Lebesgue decomposition of ν with respect to μ. We will follow the method used to prove Lebesgue’s differentiation theorem. To do this, we must find a replacement for Vitali’s lemma: the simple form of Vitali’s lemma (see Lemma 7.4) relies heavily on the fact that expanding a cube concentrically by a factor (say 5) only enlarges its Lebesgue

267

Abstract Integration

measure proportionally, whereas no such relation may hold for general measures. In order to bypass this difficulty, we shall present a covering lemma that is purely geometric in nature, that is, which makes no mention of measure. We consider only cubes whose edges are parallel to the coordinate axes and write Q = Qx for those with center x. We say that a family K of cubes has bounded overlaps if there is a constant c such that every x ∈ Rn belongs to at most c cubes from K. Thus, K has bounded overlaps if and only if

χQ (x) ≤ c for all x ∈ Rn .

Q∈K

Theorem 10.45 (Besicovitch Covering Lemma) Let E be a bounded subset of Rn and let K be a family of cubes covering E that contains a cube Qx with center x for each x ∈ E. Then there exist points {xk } in E such that (i) E ⊂ Qxk (ii) {Qxk } has bounded overlaps Moreover, the constant c for which

χQx ≤ c can be chosen to depend only on n. k

In order to prove this, it will be convenient to first prove the following lemma. Lemma 10.46 Let {Qk }∞ k=1 be a sequence of cubes with centers {xk } such that if / Qj and |Qk | ≤ 2|Qj |. Then {Qk } has bounded overlaps, and the j < k, then xk ∈ constant c for which χQk ≤ c can be chosen to depend only on n. Proof. We will consider only n = 2; the case n = 2 is similar and left as an exercise. Let Qkm , m = 1, 2, . . . , be those Qk that contain the origin and whose centers are in the first quadrant, and let hm denote the edge length of Qkm . Then Qk1 covers at least the region A = (x, y) : 0 ≤ x ≤ 12 h1 , 0 ≤ y ≤ 12 h1 . Hence, no Qkm can have its center outside the set {(x, y) : 0 ≤ x ≤ h1 , 0 ≤ y ≤ h1 }; otherwise, we would have hm > 2h1 for some m, so that |Qkm | > 4|Qk1 |, a contradiction. Therefore, the center of each Qkm , m ≥ 2, must lie in one of the regions A, B, C, or D indicated in the following illustration:

268

Measure and Integral: An Introduction to Real Analysis

h1 1 2

B

C

A

D

h1

1 2

h1

h1

The center cannot be in A since that would contradict the assumption that the center of Qkm , m ≥ 2, does not lie in Qk1 . If it lies in B, then since Qkm contains 0, it covers B, and so there is at most one Qkm with center in B. Similar arguments hold for C and D. Applying the same reasoning to each quadrant, we see that there are at most 16 cubes in {Qk } that contain the origin. By translation, the same holds at any point of the plane, and the lemma is proved. Proof of Besicovitch’s lemma. Let α1 = sup {|Qx | : x ∈ E}. If α1 = +∞, there are arbitrarily large Qx , and since E is bounded, we simply choose one that contains E. If α1 < +∞, write E1 = E and choose x1 ∈ E1 with |Qx1 | > α1 /2. Let E2 = E1 − Qx1 ,

α2 = sup {|Qx | : x ∈ E2 }.

The definition of α2 assumes that E2 = ∅. Then α2 > 0 and we choose x2 ∈ E2 with Qx2 > α2 /2. Proceed in this way, obtaining at the kth stage Ek = Ek−1 − Qxk−1 = E − xk ∈ Ek ,

Qx > αk /2. k

k−1

Qxj ,

αk = sup {|Qx | : x ∈ Ek },

j=1

We continue the process as long as Ek = ∅. Note that αk >0 if Ek = ∅. all j ≤ k. Therefore, Qxk ≤ αk ≤ αj < Since xk ∈ Ek , we have xk ∈ Ej for 2Qxj if j ≤ k. It follows that Qxk satisfies the hypothesis of Lemma 10.46 and so has bounded overlaps. It remains only to show that E ⊂ Qxk . If some Ek0 is empty, then E is contained in the union of the Qxk , k ≤ k0 − 1. then Qxk is defined for every If no Ek is empty, k = 1, 2, . . .. Since αk and αk /2 < Qxk ≤ αk , it follows either that Qxk → 0 or that there exists δ > 0

269

Abstract Integration

such that Qxk ≥ δ for all k. The second possibility cannot arise; otherwise, E would not be bounded since xk ∈ E but xk is not in any Qxj with j < k. Hence, Qxk → 0, or equivalently, αk → 0. If x ∈ E − Qxk , then x ∈ Ek for all k. Therefore, |Qx | ≤ αk for all k, which means that |Qx | = 0. This shows that E − Qxk is actually empty and completes the proof. A Borel measure μ on Rn (i.e., a measure on the Borel subsets of Rn ) is called regular if μ(E) = inf {μ(G) : G ⊃ E, G open} for every Borel set E. If μ is regular and E is a Borel set with μ(E) < ∞, then any open set G that satisfies E ⊂ G and μ(G) < μ(E) + ε for some ε > 0 also satisfies μ(G − E) < ε since μ(G − E) = μ(G) − μ(E) when μ(E) < ∞. Now suppose that μ is a σ-finite regular Borel measure. Let E be a Borel set and ε > 0. We will show that there is an open set G satisfying E ⊂ G and μ(G − E) < ε whether μ(E) is finite or not. In fact, since μ is σ-finite, we can by choosing open sets write E = ∞ 1 Ek with μ(Ek ) < ∞ for each k, and then ∞ −k , we obtain an open set G = ) < ε2 Gk with μ(Gk −E 1 Gk such that E ⊂ G ∞k and G − E ⊂ 1 (Gk − Ek ), and consequently μ(G − E) ≤

∞ 1

∞ ε μ(Gk − Ek ) < = ε, 2k 1

as desired. Moreover, since the complement CE of E is also a Borel set, it follows that there is an open set G with CE ⊂ G and μ( G − CE) < ε. Then the closed set F defined by F = CG satisfies F ⊂ E and μ(E − F) < ε since E−F = G − CE. In case μ(F) < ∞, we obtain μ(E) − μ(F) < ε. In any case, either both μ(E) and μ(F) are finite or μ(E) = μ(F) = ∞. Therefore, for any Borel set E, μ(E) =

sup

μ(F)

F closed F⊂E

if μ is a σ-finite regular Borel measure. From now on, we will consider two regular Borel measures μ and ν on Rn that are finite on the bounded Borel sets. Let Qx (h) denote the cube with center x and edge length h. We will also assume that (i) μ(Qx (h)) > 0 for x ∈ Rn , h > 0. (ii) Sets of the form ν(Qx (h)) ν(Qx (h)) x : sup > α , x : lim sup > α , etc., h>0 μ(Qx (h)) h→0 μ(Qx (h)) are (Borel) measurable.

270

Measure and Integral: An Introduction to Real Analysis

Assumption (ii) is made for simplicity and is not necessary (see Exercise 17 of Chapter 11). Also, the assumption that μ and ν are regular is redundant (see Theorem 11.24 and the remarks following it). We assume that μ and ν are finite on the bounded Borel sets in order to ensure their finiteness on every Qx (h). Thus μ and ν are σ-finite measures. We will also use the fact that the class of continuous functions with compact support is dense in L(dμ); that is, given f ∈ L(dμ), there exist continuous gk with compact support such that f − gk dμ → 0. (See Exercise 27.) Note that when μ is Lebesgue measure and ν(E) = E | f | dx for a locally integrable f , then suph>0 ν(Qx (h))/μ(Qx (h)) is the Hardy–Littlewood maximal function of f (see (7.5)). In the next lemma, we estimate the size of this expression if μ and ν are any measures with the properties listed above. The results we will prove about differentiation are corollaries of this estimate.

Lemma 10.47 Let μ and ν satisfy the stated conditions. Then there is a constant c depending only on n such that (a) μ {x ∈ Rn : suph>0 [ν(Qx (h))/μ(Qx (h))] > α} ≤ (c/α)ν(Rn ), and (b) μ{x ∈ E : lim suph→ 0 [ν(Qx (h))/μ(Qx (h))] > α} ≤ (c/α)ν(E) for any Borel set E ⊂ Rn and any α > 0 Proof. (a) Fix α > 0, and let ν(Qx (h)) >α . S = x ∈ Rn : sup h > 0 μ(Qx (h)) If B is any bounded Borel set and x ∈ S ∩ B, there is a cube Qx withcenter Besicovitch’s x such that ν(Qx )/μ(Qx ) > α. Using lemma, select Qxk and c such that ν(Qxk ) > αμ(Qxk ), S ∩ B ⊂ Qxk , and χQx ≤ c. We then have k

1 ν(Qxk ), μ(Qxk ) < Qxk ≤ α ν(Qxk ) = χQx dν ≤ c dν = cν Qxk . μ(S ∩ B) ≤ μ

k

Qxk

Qxk

Therefore, μ(S ∩ B) ≤ cν( Qxk )/α, so that μ(S ∩ B) ≤ cν(Rn )/α. Letting B Rn , we obtain μ(S) ≤ cν(Rn )/α, as desired.

271

Abstract Integration

(b) Fix α > 0, and let ν(Qx (h)) >α . T = x ∈ E : lim sup h→0 μ(Qx (h)) If ν(E) = + ∞, there is nothing to prove. Otherwise, choose an open set G ⊃ E with ν(G) < ν(E) + ε, and let B be a bounded Borel set. If x ∈ T ∩ B, there is a cube Qx such that Qx ⊂ G and ν(Qx )/μ(Qx ) > α. By again using Qxk ⊂ G, such that μ(T ∩ B) ≤ Besicovitch’s lemma, there exists {Qxk }, c ν( Qxk )/α. Therefore, μ(T ∩ B) ≤ c ν(G)/α ≤ c[ν(E) + ε]/α. The result now follows by first letting ε → 0 and then letting B Rn . The first result about differentiation of measures is the following.

Theorem 10.48 singular, then

Let ν and μ satisfy the stated conditions. If ν and μ are mutually

ν(Qx (h)) = 0 a.e. (μ). μ(Q h→0 x (h)) lim

Proof. Since ν and μ are mutually singular, there is a set Z with ν(Rn − Z) = μ(Z) = 0. Let E = Rn − Z, and consider the sets ν(Qx (h)) > α , α > 0, Tα = x ∈ E : lim sup h→0 μ(Qx (h)) ν(Qx (h)) T = x ∈ E : lim sup >0 . μ(Qx (h)) h→0 By Lemma 10.47(b), we have μ(Tα ) ≤ cν(E)/α = 0. Since T is the union of the Tαk for any sequence αk → 0, it also has μ-measure zero, and the result follows.

Theorem 10.49 Let μ satisfy the stated conditions, and let f be a Borel measurable function that is integrable (dμ) over every bounded Borel set in Rn . Then 1 f dμ = f (x) h→0 μ(Qx (h)) lim

Qx (h)

a.e. (μ).

272

Measure and Integral: An Introduction to Real Analysis

Proof. Assume first that f ∈ L (Rn ; dμ). For any integrable g, we have 1 1 ≤ f − g dμ f dμ − f (x) μ(Q (h)) μ(Q (h)) x x Qx (h) Qx (h) 1 + g dμ − f (x). μ(Qx (h)) Qx (h)

If g is also continuous, the last term on the right converges to |g(x) − f (x)| as h → 0. Hence, letting L(x) denote the lim sup as h → 0 of the term on the left, we obtain L(x) ≤ sup h>0

1 |f − g| dμ + |g(x) − f (x)|. μ(Qx (h)) Qx (h)

Therefore, the set Sε where L(x) > ε, ε > 0, is contained in the union of the two sets where the corresponding terms on the right side of the last inequality exceed ε/2. From Lemma 10.47 and Tchebyshev’s inequality, we obtain

μ(Sε ) ≤ c

−1 ε −1 f − g dμ + ε f − g dμ. 2 2 n n R

R

As noted before the proof of Lemma 10.47, g can be chosen such that Rn | f − g| dμ is arbitrarily small. Hence, μ(Sε ) = 0 for every ε > 0, and the result follows. The case when f ∈ / L(Rn ; dμ) is left as an exercise (cf. Theorem 7.11). Combining the last two theorems, we obtain the main result: Corollary 10.50 Let ν and μ satisfy the stated conditions. If ν(E) = E f dμ+σ(E) is the decomposition of ν into parts that are absolutely continuous and singular with respect to μ, then

lim

h→0

ν(Qx (h)) = f (x) μ(Qx (h))

a.e. (μ).

273

Abstract Integration

Exercises 1. Prove Theorem 10.13. 2. A measure space (S , , μ) is said to be complete if contains all subsets of sets with measure zero; that is, (S , , μ) is complete if Y ∈ whenever Y ⊂ Z, Z ∈ , and μ(Z) = 0. In this case, show that if f is measurable and g = f a.e. (μ), then g is also measurable (cf. Theorem 4.5 and Chapter 3, Exercise 34). Is this true if (S , , μ) is not complete? Give an example of an incomplete measure space with a measure that is neither identically infinite nor identically zero. 3. Prove Egorov’s Theorem 10.14. 4. If (S , , μ) is a measure space, and if f and {fk } are measurable and finite a.e. (μ) in a measurable set E, then {fk } is said to converge in μ-measure on E to limit f if lim μ{x ∈ E : |f (x) − fk (x)| > ε} = 0 for all ε > 0.

k→∞

Formulate and prove analogues of Theorems 4.21 through 4.23. 5. Complete the proof of Lemma 10.18. 6. (a) If f1 , f2 ∈ L(dμ) and E f1 dμ = E f2 dμ for all measurable E, show that f1 = f2 a.e. (μ). (b) Prove the uniqueness of f and σ in Theorem 10.40.

(c) Let and let f1 , f2 ∈ Lp (dμ), 1/p + 1/p = 1, 1 ≤ p ≤ ∞. If μ be σ-finite, f1 g dμ = f2 g dμ for all g ∈ Lp (dμ), show that f1 = f2 a.e. (μ). 7. Prove the integral convergence results in Theorems 10.27 through 10.29 and 10.31. 8. Show that for 1 ≤ p < ∞, the class of simple functions vanishing outside sets of finite measure is dense in Lp (dμ). See also Exercise 27. 9. The symmetric difference of two sets E1 and E2 is defined as E1 E2 = (E1 − E2 ) ∪ (E2 − E1 ). Let (S , , μ) be a measure space, and identify measurable sets E1 and E2 if μ(E1 E2 ) = 0. Show that is a metric space with distance d(E1 , E2 ) = μ(E1 E2 ) and that if μ is finite, then Lp (S , , μ) is separable if and only if is 1 ≤ p < ∞. (For the sufficiency in the second part, Exercise 8 may be helpful; for the necessity, let {fk } be a countable dense set in Lp (S , , μ) and consider the sets {1/2 < fk ≤ 3/2}.)

274

Measure and Integral: An Introduction to Real Analysis

10. If φ is a set function whose Jordan decomposition is φ = V − V, define E

f dφ =

f dV −

E

f dV,

E

provided not both integrals on the right are infinite with the same sign. If V is the total variation of φ on E, and if | f | ≤ M, prove that | E f dφ| ≤ MV. 11. Prove parts (ii)–(iv) of Theorem 10.33. 12. Give an example of a pair of measures ν and μ such that ν is absolutely continuous with respect to μ, but given ε > 0, there is no δ > 0 such that ν(A) < ε for every A with μ(A) < δ. (Thus, the analogue for measures of Theorem 10.34 may fail.) Prove the analogue of Theorem 10.35 for mutually singular measures ν and μ. 13. Show that the set P of the Hahn decomposition is unique up to null sets. (By a null set for φ, we mean a set N such that φ(A) = 0 for every measurable A ⊂ N.) 14. Complete the proof of Theorem 10.44 for p = 1. 15. (Converse of Hölder’s inequality) Let μ be a σ-finite measure and 1 ≤ p ≤ ∞. (a) Show that f p = sup fg dμ , where the supremum is taken over all bounded measurable functions g that vanish outside a set (depending on g) of finite measure, and for which gp ≤ 1 and fg dμ exists. (If 1 < p ≤ ∞ and f p < ∞, this can be deduced from Theorem 10.44.) (b) Show that a real-valued measurable f belongs to Lp if fg ∈ L1 for all g ∈ Lp , 1/p + 1/p = 1. 16. Consider a convolution operator Tf (x) = Rn f (y)K(x−y) dy with K ≥ 0. If 1 ≤ p ≤ ∞ and Tf p ≤ Mf p for all f , show that Tf p ≤ Mf p for all f , 1/p + 1/p = 1. (Use Exercise 15 to write Tf p = supgp ≤1 | Rn (Tf )g dx|, and note that (Tf )(x)g(x) dx = (T g)(−y)f (y) dy Rn

Rn

where g(x) = g(−x).) Find a generalization if the hypothesis is instead that Tf q ≤ M f p for all f , where q is a fixed index with 1 ≤ q ≤ ∞ and q = p.

275

Abstract Integration

17. Let μbe σ-finite and define L p (dμ) to be the class of complex-valued f with |f |p dμ < +∞. Let l be a complex-valued bounded linear functional on L p (dμ). If 1 ≤ p < ∞, show that there is a function g ∈ L p (dμ) such that l(f ) = fg dμ. (Here, as usual, we define h dμ = h1 dμ + i h2 dμ if h = h1 + ih2 with h1 and h2 real-valued.) (Hint: Reduce to the real case.) 18. Give an example to show that (L∞ ) cannot be identified with L1 as in Theorem 10.44. (Consider L∞ [−1, 1] with Lebesgue measure, and let C be the subspace of continuous functions on [−1,1] with the sup norm. Define l(f ) = f (0) for f ∈ C . Then l is a bounded linear functional on C , so by the Hahn–Banach theorem∗ , l has an extension l ∈ (L∞ [−1, 1]) . 1 If there were a function g ∈ L1 [−1, 1] such that l(f ) = −1 fg dx for all 1 f ∈ L∞ [−1, 1], then we would have f (0) = −1 fg dx for all f ∈ C . Show that this implies that g = 0 a.e., so that l ≡ 0. The functional l is called the Dirac δ-function.) To show that (L∞ (E; dx)) and L1 (E; dx) are not isometrically isomorphic, one can combine the following three facts: L1 (E; dx) is separable; L∞ (E; dx) is not separable; and, a Banach space is separable if its dual space is separable. For the latter, see the references in the footnote below, Theorem 8.11 on p. 192 of the first, or Theorem 3.26, p. 73, of the second. 19. Complete the proof of Theorem 10.49. 20. Under the hypothesis of Theorem 10.49, prove that 1 | f (y) − f (x)| dμ(y) = 0 h→0 μ(Qx (h)) lim

a.e. (μ).

Qx (h)

21. Derive an analogue of the Besicovitch Covering Lemma for the case of two dimensions (x, y) when the squares Q(x,y) are replaced by rectangles R(x,y) (h) centered at (x, y) whose x and y dimensions are h and h2 , respectively. Use this result to prove that under the hypothesis of Theorem 10.49, lim

h→0

1 μ(R(x,y) (h))

f dμ = f (x, y)

a.e. (μ).

R(x,y) (h)

22. Let μ be a measure and A be a set with 0 < μ(A) < ∞. Let f be measurable and bounded on A, and let φ be convex in an interval containing the range of f . Prove that ∗ (see, e.g., Theorem 2.5, p. 33 of M. Schechter, Principles of Functional Analysis, 2nd edition,

Graduate Studies in Mathematics, vol. 36 (2001), American Mathematical Society; or Theorem 1.1, p. 1, of H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer, 2011).

276

Measure and Integral: An Introduction to Real Analysis

φ(f ) dμ A f dμ φ ≤ A . dμ A A dμ

(This is Jensen’s inequality for measures. See Theorem 7.44.) 23. A sequence {φk } of set functions is said to be uniformly absolutely continuous with respect to a measure μ if given ε > 0, there exists δ > 0 such that if E satisfies μ(E) < δ, then |φk (E)| < ε for all k. If {fk } is a sequence of integrable functions on a finite measure space (S , , μ) that converges pointwise a.e. (μ) to an integrable f , show that fk → f in L(dμ) norm if and only if the indefinite integrals of the fk are uniformly absolutely continuous with respect to μ. (Cf. Exercise 17 of Chapter 7.) 24. Let (S , , μ) be a σ-finite measure space, and let f be -measurable and integrable over S . Let 0 be a σ-algebra satisfying 0 ⊂ . Of course, f that there is a unique function f0 that may not be 0 -measurable. Show is 0 -measurable such that fg dμ = f0 g dμ for every 0 -measurable g for which the integrals are finite. The function f0 is called the conditional expectation of f with respect to 0 , denoted f0 = E(f |0 ). (Apply the Radon–Nikodym theorem to the set function φ(E) = E f dμ, E ∈ 0 .) 25. Using the notation of the preceding exercise, prove the following: (a) E(af + bg|0 ) = aE(f |0 ) + bE(g|0 ), a, b constants. (b) E(f |0 ) ≥ 0 if f ≥ 0. (c) E(fg|0 ) = gE(f |0 ) if g is 0 -measurable. (d) If 1 ⊂ 0 ⊂ , then E(f |1 ) = E(E(f |0 )|1 ). 26. (Hardy’s inequality) Let f ≥ 0 on (0, ∞), 1 ≤ p < ∞, dμ(x) = xα dx and dν(x) = xα+p dx on (0, ∞). Prove there exists a constant c independent of f such x ∞ ∞that (i) 0 ( 0 f (t) dt)p dμ(x) ≤ c 0 f p (x) dν(x), α < −1, ∞ ∞ ∞ (ii) 0 ( x f (t) dt)p dμ(x) ≤ c 0 f p (x) dν(x), α > −1. x x (For (i), ( 0 f (t) dt)p ≤ cxp−η−1 0 f (t)p tη dt by Hölder’s inequality, provided p − η − 1 > 0. Multiply both sides by xα , integrate over (0, ∞), change the order of integration, and observe that an appropriate η exists since α < −1.) 27. If μ is a σ-finite regular Borel measure on Rn , show that the class of continuous functions with compact support is dense in Lp (dμ), 1 ≤ p < ∞. (By Exercise 8, it is enough to approximate χE , where E is a Borel set with finite measure. Given ε > 0, as shown in Section 10.5 on p. 269, there exist open G and closed F with F ⊂ E ⊂ G and μ(G − F) < ε. Now use Urysohn’s lemma: if F1 and F2 are disjoint closed sets in Rn , there is a continuous f on Rn with 0 ≤ f ≤ 1, f = 1 on F1 , f = 0 on F2 .)

28. Let 1 < p ≤ ∞ and μ be a σ-finite measure for which Lp (E; dμ) is separable, 1/p + 1/p = 1. Show that every bounded sequence in Lp (E; dμ) has a weakly convergent subsequence, that is, if supk fk p < ∞, show that

277

Abstract Integration

there exists {fkj } and f ∈ Lp such that

E fkj g dμ

→

E fg dμ

for all g ∈ Lp .

(Use the Bolzano–Weierstrass theorem to show that for every g ∈ Lp , there is a subsequence {fkj } depending on g such that E fkj g dμ converges. By using a diagonal argument, {fkj } can be chosen to be independent of

g for all g in any fixed countable subset S of Lp and consequently for all g ∈ Lp by choosing S to be dense in Lp . Finally, apply Theorem 10.44 to the linear functional l ∈ (Lp ) defined by l(g) = limkj →∞ E fkj g dμ.) 29. Let l p be defined as in Section 8.3. Explain how Theorem 10.44 can be used to describe both the action and the norm of a continuous linear functional on l p , 1 ≤ p < ∞. 30. Let be the σ-algebra of Lebesgue measurable sets in R1 . For every E ∈ , let (E) denote the Lebesgue measure of E, and define measures R and by R(E) = (E ∩ [0, 1]) and (E) = χE (0): (a) Show that E f d = f (0)(E). (b) Is either R or absolutely continuous or singular with respect to ? (c) Identify the functions f and the sets Z in the Lebesgue decompositions of with respect to , of with respect to , of R with respect to , and of with respect to R. 31. Prove the Besicovitch Covering Lemma in case n = 1. 32. Let w(x) be a nonnegative locally integrable function on Rn such that n Q w > 0 for every cube Q in R with edges parallel to the coordinate axes. Consider the w-weighted maximal function Mw f defined by Mw f (x) = sup

1 | f (y)| w(y) dy, Q w(y) dy

x ∈ Rn ,

Q

where f is a measurable function and the supremum is taken over all cubes Q ⊂ Rn centered at x with edges parallel to the coordinate axes. Show that

w(x) dx ≤

{x:Mw f (x)>α}

C | f (x)| w(x) dx, α n

α > 0,

R

and that for 1 < p ≤ ∞,

Rn

1/p p

|Mw f (x)| w(x) dx

≤C

1/p p

| f (x)| w(x) dx

,

Rn

where the constant C is independent of w, f , and α. (Compare Lemma 10.47 and Theorem 9.16.)

11 Outer Measure and Measure

11.1 Constructing Measures from Outer Measures A function = (A) that is defined for every subset A of a set S is called an outer measure if it satisfies the following: (i) (A) ≥ 0, (∅) = 0. (ii) (A1 ) ≤ (A2 ) if A1 ⊂ A2 . (iii) ( Ak ) ≤ (Ak ) for any countable collection of sets {Ak }. For example, ordinary Lebesgue outer measure is an outer measure on the subsets of Rn . Some other concrete examples will be constructed later in the chapter. As with Lebesgue outer measure, it is possible to use any outer measure to introduce a class of measurable sets and a corresponding measure. In doing so, we base the definition of measurability on Carathéodory’s Theorem 3.30. Thus, given an outer measure , we say that a subset E of S is -measurable, or simply measurable, if (A) = (A ∩ E) + (A − E)

(11.1)

for every A ⊂ S . Equivalently, E is measurable if and only if (A1 ∪ A2 ) = (A1 ) + (A2 ) whenever A1 ⊂ E, A2 ⊂ S − E. It follows that a set E is measurable if and only if its complement S − E is measurable. As a simple example, let us show that any set Z with (Z) = 0 is measurable. In fact, for such Z and any A ⊂ S , property (ii) gives (A ∩ Z) + (A − Z) ≤ (Z) + (A) = (A). But by (iii), the opposite inequality (A) ≤ (A ∩ Z) + (A − Z) is always true, and the measurability of Z follows. 279

280

Measure and Integral: An Introduction to Real Analysis

If E is a measurable set, then (E) is called its -measure, or simply its measure. The terminology is justified by the following theorem. Let be an outer measure on the subsets of S .

Theorem 11.2

(i) The family of -measurable subsets of S forms a σ-algebra. (ii) is countably additive on disjoint measurable sets, that k } is a count is, if {E (Ek ). More able collection of disjoint -measurable sets, then ( Ek ) = generally, for any A, measurable or not, (A ∩ Ek ) and A∩ Ek = Ek . (A) = (A ∩ Ek ) + A − Proof. Let {Ek } be a collection of disjoint measurable sets, and let H = j and Hj = k=1 Ek , j = 1, 2, . . .. We first claim that for every A, (A) =

j

∞

k=1 Ek

(A ∩ Ek ) + (A − Hj ).

k=1

The proof will be by induction on j. If j = 1, the formula follows from the measurability of E1 . Assuming that the formula holds for j − 1, we have (A) = (A ∩ Ej ) + (A − Ej ) = (A ∩ Ej ) +

j−1

((A − Ej ) ∩ Ek ) + ((A − Ej ) − Hj−1 ).

k=1

Since the Ek are disjoint, (A − Ej ) ∩ Ek = A ∩ Ek for k ≤ j − 1. Hence, since j (A − Ej ) − Hj−1 = A − Hj , we obtain (A) = k=1 (A ∩ Ek ) + (A − Hj ), as required. This proves the claim. Since Hj ⊂ H, we have (A − Hj ) ≥ (A − H). Using this fact in the previous formula and letting j → ∞, it follows that (A) ≥

∞

(A ∩ Ek ) + (A − H) ≥ (A ∩ H) + (A − H).

k=1

However, we also have (A) ≤ (A ∩ H) + (A − H). Therefore, H is measurable and (A) = ∞ k=1 (A ∩ Ek ) + (A − H). Replacing A by A ∩ H in

281

Outer Measure and Measure

this equation, we obtain (A ∩ H) = ∞ k=1 (A ∩ Ek ), and the proof of (ii) is complete. Note we have also shown that a countable union of disjoint measurable sets is measurable, and we know that the complement of a measurable set is measurable. To prove (i), it remains to show that a countable union of arbitrary measurable sets is measurable. We will use the next lemma.

Lemma 11.3

If E1 and E2 are measurable, then so is E1 − E2 .

Proof. We will show that (A ∪ B) = (A) + (B) whenever A ⊂ E1 − E2 and B ⊂ C(E1 − E2 ). Since B = (B ∩ E2 ) ∪ (B − E2 ), we have A ∪ B = [A ∪ (B − E2 )] ∪ [B ∩ E2 ]. Hence, since A ∪ (B − E2 ) ⊂ CE2 and B ∩ E2 ⊂ E2 , it follows from the measurability of E2 that (A ∪ B) = (A ∪ (B − E2 )) + (B ∩ E2 ). However, A ⊂ E1 and B − E2 ⊂ C(E1 − E2 ) − E2 ⊂ CE1 . Therefore, since E1 is measurable, (A ∪ (B − E2 )) = (A) + (B − E2 ). Combining equalities and using the measurability of E2 , we obtain (A ∪ B) = (A) + (B − E2 ) + (B ∩ E2 ) = (A) + (B), which proves the lemma. Returning now to the proof of part (i) of Theorem 11.2, recall that the complement of a measurable set is measurable. Since E1 ∪ E2 = C(CE1 − E2 ), it then follows from Lemma 11.3 that E1 ∪ E2 is measurable if E1 and E2 are. Therefore, any finite union of measurable sets is measurable. Now, let {Ek } be j a countable collection of measurable sets. If Hj = k=1 Ek , then ∞ k=1

⎡

⎤ ∞

Hj+1 − Hj ⎦ , Ek = H1 ∪ ⎣ j=1

and since the Hj are measurable and increasing, the terms on the right are measurable and disjoint. Thus, by the case already proved, it follows that ∞ E is measurable. This completes the proof of the theorem. k k=1 According to Theorem 11.2, an outer measure is a measure on the σalgebra of -measurable sets and so enjoys the usual properties of measures. We also have the following result. Corollary 11.4 Let be an outer measure on S , let {Ek } be a collection of measurable sets, and let A be any set.

282

Measure and Integral: An Introduction to Real Analysis

(i) If Ek , then (A ∩ lim Ek ) = limk→∞ (A ∩ Ek ); if Ek and if (A ∩ Ek0 ) is finite for some k0 , then (A ∩ lim Ek ) = limk→∞ (A ∩ Ek ). (ii) (A ∩ lim inf Ek ) ≤ lim inf k→∞ (A ∩ Ek ); if (A ∩ ∞ k=k0 Ek ) is finite for some k0 , then (A ∩ lim sup Ek ) ≥ lim supk→∞ (A ∩ Ek ). Proof. We will prove the first statements in (i) and (ii); the proofs of the second statements are left as exercises. Let Ek be measurable and Ek . To prove the first part of (i), we may assume that (A ∩ Ek ) is finite for each k; otherwise, the result is clear. The sets E1 , E2 − E1 , . . . , Ek+1 − Ek , . . . are disjoint and measurable. Since ∞ ∞ Ek = E1 ∪ (Ek+1 − Ek ) , lim Ek = k=1

k=1

it follows from Theorem 11.2 that (A ∩ lim Ek ) = (A ∩ E1 ) +

∞

(A ∩ (Ek+1 − Ek )).

k=1

Moreover, since Ek and Ek+1 −Ek are disjoint and measurable and Ek has finite measure, we have (A ∩ (Ek+1 − Ek )) = (A ∩ Ek+1 ) − (A ∩ Ek ). Therefore, (A ∩ lim Ek ) = (A ∩ E1 ) +

∞ [(A ∩ Ek+1 ) − (A ∩ Ek )] k=1

= lim (A ∩ Ek+1 ), k→∞

which proves the first part of (i). For the first part of (ii), let {Ek } be measurable and define sets Xj = ∞ k=j Ek , j = 1, 2, . . .. Then Xj lim inf Ek , so that by (i), (A ∩ lim inf Ek ) = limj→∞ (A ∩ Xj ). But since A ∩ Xj ⊂ A ∩ Ej , we have limj→∞ (A ∩ Xj ) ≤ lim inf j→∞ (A ∩ Ej ), and the result follows.

11.2 Metric Outer Measures Now let us introduce a new assumption concerning the underlying space S : namely, that it is a metric space with metric d. The distance between two sets A1 and A2 is then defined by d(A1 , A2 ) = inf{d(x, y) : x ∈ A1 , y ∈ A2 },

283

Outer Measure and Measure

as in Euclidean space (see p. 5 in Section 1.3). An outer measure on S is called a metric outer measure, or an outer measure in the sense of Carathéodory, if (A1 ∪ A2 ) = (A1 ) + (A2 ) whenever d(A1 , A2 ) > 0. For example, by Lemma 3.16, Lebesgue outer measure satisfies this condition. An outer measure in a metric space may not be a metric outer measure and may lack properties (in addition to the defining property) of metric outer measures. Consider, for example, the case when S is the x, y-plane and d is the usual Euclidean metric in R2 . Define r(A) =

1 , d(A, Y)

A ⊂ R2 , where Y is the y-axis,

with the conventions 1/0 = ∞ and r(∅) = 0. We leave it as an exercise to check that r is an outer measure on R2 but not a metric outer measure. Furthermore, Y and all its subsets are r-measurable (with infinite measure), and no set B ⊂ R2 with d(B, Y) > 0 is r-measurable. In particular, r does not have the property in the next result, Theorem 11.5. Since S is a metric space, it has the topology induced by its metric. Thus, a set G in S is said to be open if for every x ∈ G, there is a δ > 0 such that the metric ball {y : d(x, y) < δ} lies in G. A closed set is by definition the complement of an open set, and B denotes the σ-algebra of Borel subsets of S ; that is, B is the smallest σ-algebra containing all the open (closed) subsets of S . Theorem 11.5 Let be a metric outer measure on a metric space S . Then every Borel subset of S is -measurable. Since the collection of -measurable sets is a σ-algebra, it is enough to prove that every closed set is -measurable. To prove this, we will use the following fact. Lemma 11.6 Let be a metric outer measure on a space S with metric d. Let A be any set contained in an open set G, and let Ak = {x ∈ A : d(x, CG) ≥ 1/k}, k = 1, 2, . . .. Then limk→∞ (Ak ) = (A). Proof. Since G is open, we have Ak A. Clearly, limk→∞ (Ak ) ≤ (A). To prove the opposite inequality, let Dk = Ak+1 − Ak , k = 1, 2, . . . Then d(Dk+1 , Ak ) ≥ [(1/k) − (1/(k + 1))] > 0 since if x ∈ Ak and y ∈ Dk+1 , then 1 1 ≤ d(x, CG) ≤ d(x, y) + d(y, CG) < d(x, y) + , k k+1

284

Measure and Integral: An Introduction to Real Analysis

where the second inequality is true since d is a metric. We also have A = Ak ∪ Dk ∪ Dk+1 ∪ · · · ,

(A) ≤ (Ak ) + (Dk ) + (Dk+1 ) + · · · .

If (Dj ) < +∞, then j≥k (Dj ) tends to zero as k → ∞, and it follows that (A) ≤ limk→∞ (Ak ), as desired. If (Dj ) = +∞, then at least one of (D2j ) and (D2j+1 ) is infinite. We can therefore choose N so that (DN ) + (DN−2 ) + (DN−4 ) + · · · is arbi trarily large. However, when k ≥ 2, the fact that k−1 j=1 Dj ⊂ Ak implies that k−1 the distance between Dk+1 and j=1 Dj is positive. Therefore, (DN ∪ DN−2 ∪ DN−4 ∪ · · · ) = (DN ) + (DN−2 ) + (DN−4 ) + · · · . Since AN+1 contains DN ∪ DN−2 ∪ DN−4 ∪ · · · , it follows that lim (Ak ) = +∞, and the lemma is proved. Proof of Theorem 11.5. Let F be any closed set. It is enough to show that (A ∪ B) = (A) + (B) for A ⊂ CF, B ⊂ F. If Ak = {x ∈ A : d(x, F) ≥ 1/k}, then d(Ak , B) ≥ 1/k, so that (Ak ∪ B) = (Ak ) + (B). Therefore, (A ∪ B) ≥ (Ak ) + (B). Letting k → ∞, it follows from the lemma that (A ∪ B) ≥ (A) + (B). Since the opposite inequality is also true, the theorem is proved. If S is a metric space, the notions of upper and lower semicontinuity of functions can be defined just as in Rn . For example, a real-valued function f defined near a point x0 is said to be upper semicontinuous at x0 if lim sup f (x) ≤ f (x0 ). x→x0

Here, of course, the notation x → x0 means that d(x, x0 ) → 0. The results of Theorem 4.14 are valid for metric spaces; for example, f is usc at every point of S if and only if {f ≥ a} is closed for every a. We thus obtain the following fact. Corollary 11.7 Let be a metric outer measure on S . Then every semicontinuous function on S is -measurable. Proof. Suppose, for example, that f is upper semicontinuous on S . Then {x : f (x) ≥ a} is closed for every a, and so is -measurable by Theorem 11.5. Hence, f is -measurable. If f is lower semicontinuous, then −f is upper semicontinuous, and the corollary follows.

285

Outer Measure and Measure

11.3 Lebesgue–Stieltjes Measure In this section and the next, we will consider two specific examples of outer measures in the sense of Carathéodory. The first is known as Lebesgue– Stieltjes outer measure. It elucidates the connection between measures and monotone functions. The situation is relatively simple for measures on R1 and monotone functions of a single variable, and we shall restrict our attention to this case. Extensions to higher dimensions are possible but more complicated. To construct a typical Lebesgue–Stieltjes outer measure, consider any fixed function f that is finite and monotone increasing (i.e., nondecreasing) on (−∞, +∞). For each half-open finite interval of the form (a, b], let λ(a, b] = λf ((a, b]) = f (b) − f (a). Note that λ ≥ 0 since f is increasing. If A is a nonempty subset of R1 , let λ(ak , bk ], ∗ (A) = ∗f (A) = inf where the inf is taken over all countable collections {(ak , bk ]} such that A ⊂ (ak , bk ]. Further, define ∗ (∅) = 0. Theorem 11.8

∗ is a Carathéodory outer measure on R1 .

Proof. We have ∗ ≥ 0 and ∗ (∅) = 0. First, we will show that if A1 ⊂ A2 , ∗ (A ) = +∞. then ∗ (A1 ) ≤ ∗ (A2 ). This is obvious if either A 2 1 = ∅ or , bk ] and λ(ak , bk ] < In any other case, choose {(ak , bk ]} such that A2 ⊂ (ak λ(ak , bk ]. Therefore, ∗ (A2 ) + ε. Then A1 ⊂ (ak , bk ], so that ∗ (A1 ) ≤ ∗ (A1 ) < ∗ (A2 ) + ε, and the result follows by letting ε → 0. To show that ∗ is subadditive, let {Aj }∞ j=1 be a collection of nonempty 1 subsets of R and let A = Aj . We may assume that ∗ (Aj ) < +∞ for each j j j. Choose (ak , bk ] such that Aj ⊂

j j (ak , bk ]

and

k

Since A ⊂

j j j,k (ak , bk ],

j

j

λ(ak , bk ] < ∗ (Aj ) + ε2−j .

k

we have

∗ (A) ≤

j,k

It follows that ∗ (A) ≤

j

j

λ(ak , bk ]

0, then given ε > 0, we can choose {(ak , bk ]} such that each (ak , bk ] has length less than d(A1 , A2 ) and A1 ∪ A2 ⊂

(ak , bk ],

λ(ak , bk ] ≤ ∗ (A1 ∪ A2 ) + ε.

Thus, the collection {(ak , bk ]} splits into two coverings, one of A1 and the other λ(ak , bk ], so that since ε is arbitrary, of A2 . Therefore, ∗ (A1 ) + ∗ (A2 ) ≤ ∗ (A1 ) + ∗ (A2 ) ≤ ∗ (A1 ∪ A2 ). But the opposite inequality is always true, which completes the proof. ∗f is called the Lebesgue–Stieltjes outer measure corresponding to f , and its restriction to those sets that are ∗f -measurable is called the Lebesgue–Stieltjes measure corresponding to f and denoted f or simply . Every Borel set in (−∞, ∞) is ∗f -measurable by Theorems 11.5 and 11.8. In particular, since (a, b] is a Borel set, ∗f ((a, b]) = f ((a, b]) for every (a, b]. We leave it as an exercise to show that the Lebesgue–Stieltjes outer measure ∗x corresponding to f (x) = x coincides with ordinary Lebesgue outer measure in R1 . Hence, by Carathéodory’s Theorem 3.30, a set is ∗x -measurable if and only if it is Lebesgue measurable. An outer measure defined on the subsets of a set S is said to be regular if for every A ⊂ S there is a -measurable set E such that A ⊂ E and (A) = (E). Ordinary Lebesgue outer measure in Rn is regular by Theorem 3.8. The next theorem shows that any Lebesgue–Stieltjes outer measure is regular; in fact, it shows that any set in R1 can be included in a Borel set with the same Lebesgue–Stieltjes outer measure. Of course, Borel sets are ∗ -measurable by Theorem 11.5. Theorem 11.9 Let ∗ be a Lebesgue–Stieltjes outer measure. If A is a subset of R1 , there is a Borel set B containing A such that ∗ (A) = (B). j

j

Proof. Given j = 1, 2, . . ., choose {(ak , bk ]} such that A⊂

j j ak , bk ], k

k

1 j j λ(ak , bk ] ≤ ∗ (A) + . j

287

Outer Measure and Measure

Let Bj =

j j k (ak , bk ]

and B =

(Bj ) ≤

Bj . Then A ⊂ B and B is a Borel set. Moreover,

k

1 j j λ(ak , bk ] ≤ ∗ (A) + . j

Since B ⊂ Bj , it follows that (B) ≤ ∗ (A) + (1/j), so that (B) ≤ ∗ (A). But the opposite inequality is also true since A ⊂ B, and the theorem follows. If μ is a finite Borel measure on R1 , define fμ (x) = μ((−∞, x]),

−∞ < x < +∞.

Note that fμ is monotone increasing and that μ((a, b]) = fμ (b) − fμ (a). It is natural to ask if the Lebesgue–Stieltjes measure induced by fμ agrees with μ as a Borel measure. An affirmative answer would mean that every finite Borel measure is a Lebesgue–Stieltjes measure. We shall see later (Corollary 11.22) that this is actually the case and that the continuity from the right of fμ (see Exercise 2) plays a role. The next result is also useful.

Theorem 11.10 If f is an increasing function that is continuous from the right, then its Lebesgue–Stieltjes measure satisfies ((a, b]) = f (b) − f (a). In particular, ({a}) = f (a) − f (a−). ((a, b]) ≤ f (b) − Proof. Since (a, b] covers itself, we always have ((a, b]) = ∗ f (a). To show the opposite inequality, suppose that (a, b] ⊂ (ak , bk ]. Given ε > 0, use the right continuity of f to choose {bk } with bk < bk ,

f (bk ) > f (bk ) − ε2−k .

If a satisfies a < a < b, then [a , b] is covered by the (ak , bk ), and therefore, there is a finite N such that [a , b] ⊂ N k=1 (ak , bk ). By discarding any unnec essary (ak , bk ) and reindexing the rest, we may assume that ak+1 < bk for k = 1, . . . , N−1. Also, a1 < a and b < bN , so that f (a1 ) ≤ f (a ) and f (b) ≤ f (bN ). We have k

λ(ak , bk ] ≥

N k=1

λ(ak , bk ] =

N

[f (bk ) − f (ak )]

k=1

= f (bN ) − f (a1 ) +

N−1

[f (bk ) − f (ak+1 )].

k=1

288

Measure and Integral: An Introduction to Real Analysis

Now, f (bN ) − f (a1 ) = [f (bN ) − f (bN )] + [f (bN ) − f (a1 )] ≥ −ε + [f (b) − f (a )]. Also, since f (bk ) − f (ak+1 ) ≥ 0 for k = 1, . . . , N − 1, N−1

[f (bk ) − f (ak+1 )] =

k=1

N−1

[f (bk ) − f (bk )] +

k=1

≥

N−1

[f (bk ) − f (ak+1 )]

k=1

∞

(−ε2−k ) + 0 = −ε.

k=1

Combining estimates, we obtain

λ(ak , bk ] ≥ −2ε + [f (b) − f (a )].

k

Letting ε → 0 and a → a, we have k λ(ak , bk ] ≥ f (b)−f (a). Hence, ((a, b]) ≥ f (b) − f (a), and the first statement of the theorem follows. The second statement is proved by applying the first to the intervals (a − (1/k), a], k = 1, 2, . . ., which decrease to {a}. This completes the proof. Let g be a Borel measurable function defined on R1 , and let f be a Lebesgue–Stieltjes measure. Then the integral g df is called the Lebesgue– Stieltjes integral of g with respect to f .∗ The next theorem gives a relation between g df and the usual Riemann–Stieltjes integral g df . Theorem 11.11 Let f be an increasing function that is right continuous on [a, b], and let g be a bounded Borel measurable function on [a, b]. If the Riemann–Stieltjes b integral a g df exists, then (a,b]

g df =

b

g df .

a

Proof. Let = {xj } be a partition of [a, b], and let mj and Mj be the inf and sup of g in [xj−1 , xj ] respectively. Let ∗ In some other texts, any integral

f dμ of the kind considered in Chapter 10 is called a Lebesgue–Stieltjes integral. We shall use the terminology only when μ is a Lebesgue–Stieltjes measure.

289

Outer Measure and Measure

L =

mj [f (xj ) − f (xj−1 )],

U =

Mj [f (xj ) − f (xj−1 )]

denote the corresponding lower and upper Riemann–Stieltjes sums. Define functions g1 and g2 by setting g1 = mj in (xj−1 , xj ] and g2 = Mj in (xj−1 , xj ]. Since f is right continuous, it follows from Theorem 11.10 that

g1 d = L ,

(a,b]

g2 d = U .

(a,b]

Therefore, since g1 ≤ g ≤ g2 , we obtain L ≤ (a,b] g d ≤ U . However, as b || → 0, both L and U converge to a g df by Theorem 2.29. This completes the proof. We remark in passing that a right continuous function f of bounded variation can be written f = f1 − f2 , where f1 and f2 are right continuous, bounded, and increasing. If 1 and 2 are the Lebesgue–Stieltjes measures corresponding to f1 and f2 , consider the Borel set function = 1 − 2 , and define g d = g d1 − g d2 . b b If a g df1 and a g df2 exist and are finite for a bounded Borel measurable function g, it then follows from Theorems 11.11 and 2.16 that

g d =

(a,b]

b

g df .

a

11.4 Hausdorff Measure Our second example of a Carathéodory outer measure is Hausdorff outer measure in Rn . To define it, fix α > 0, and let A be any subset of Rn . Given ε > 0, let (ε) Hα (A) = inf

δ(Ak )α ,

k

where δ(Ak ) denotes the diameter of Ak (see p. 5 in Section 1.3), and the inf is taken over all countable collections {Ak } such that A ⊂ Ak and δ(Ak ) < ε for all k. We may always assume that the Ak in a given covering are disjoint and that A = Ak .

290

Measure and Integral: An Introduction to Real Analysis

If ε < ε, each covering of A by sets with diameters less than ε is also such a cover for ε. Hence, as ε decreases, the collection of coverings decreases, and (ε) consequently Hα (A) increases. Define (ε) Hα (A) = lim Hα (A). ε→0

Theorem 11.12

For α > 0, Hα is a Carathéodory outer measure on Rn .

Proof. Clearly, Hα ≥ 0 and Hα (∅) = 0. If A1 ⊂ A2 , then any covering of (ε) (ε) A2 is also one of A1 , so that Hα (A1 ) ≤ Hα (A2 ). Letting ε→ 0, we obtain Hα (A1 ) ≤ Hα (A2 ). To show that Hα is subadditive, let A = Ak , and choose a cover of Ak for each k. The union of these is a cover of A, and it is easy (ε) (ε) to show that Hα (Ak ) ≤ Hα (Ak ). Letting ε → 0, we get Hα (A) ≤ Hα (Ak ). The details of this argument and the proof that Hα is a Hα (A) ≤ Carathéodory outer measure are left as exercises. Hα is called Hausdorff outer measure of dimension α on Rn , and the corresponding measure is called Hausdorff measure of dimension α and also denoted Hα . It has the following basic property.

Theorem 11.13 (i) If Hα (A) < +∞, then Hβ (A) = 0 for β > α. (ii) If Hα (A) > 0, then Hβ (A) = +∞ for β < α. Proof. Statements (i) and (ii) are equivalent. To prove (i), let A = δ(Ak ) < ε. If β > α, then (ε) Hβ (A) ≤ (ε)

δ(Ak )β ≤ εβ−α

Ak ,

δ(Ak )α .

(ε)

Therefore, Hβ (A) ≤ εβ−α Hα (A). Letting ε → 0, we obtain Hβ (A) = 0 if Hα (A) < +∞, completing the proof. The next theorem shows that Hausdorff outer measure is regular, by showing that any set in Rn can be included in a Borel set with the same Hausdorff outer measure. Theorem 11.14 Given A ⊂ Rn and α > 0, there is a set B of type Gδ containing A such that Hα (A) = Hα (B).

291

Outer Measure and Measure

Proof. Given ε > 0, choose {Ak } such that A =

(ε/2)

δ(Ak )α ≤ Hα

Ak , δ(Ak ) < ε/2 and

(A) + ε ≤ Hα (A) + ε.

ε)δ(Ak ); this can be done Enclose Ak in an open set Gk with δ(Gk ) ≤ (1 + by letting Gk = {x : d(x, Ak ) < εδ(Ak )/2}. Let G = Gk . Then G is open and A ⊂ G. Since δ(Gk ) < (1 + ε)ε/2 < ε for 0 < ε < 1, we have (ε) Hα (G) ≤

δ(Gk )α ≤ (1 + ε)α

δ(Ak )α

≤ (1 + ε)α [Hα (A) + ε]. Now let ε → 0 through a sequence {εj }, and let G(j) be the corresponding open sets G as above. If B = G(j), then B is of type Gδ and A ⊂ B. Also, since B ⊂ G(j) for each j, we have (ε) (B) ≤ (1 + ε)α [Hα (A) + ε] Hα

for ε = εj .

Letting j → ∞, we obtain Hα (B) ≤ Hα (A). Since the opposite inequality is clearly true, the result follows. If A is a subset of R1 and A = Ak with δ(Ak ) < ε, then δ(Ak ) = |Ik |, where Ik is the smallest interval containing Ak . Hence, in the one-dimensional case, (ε) Hα (A) = inf

|Ik |a

(n = 1),

where the Ik ’s are intervals of length less than ε such that A ⊂ Ik . If α = 1, it follows that H1 (A) is the usual Lebesgue outer measure of A. In Rn , n > 1, Hn is not the same as Lebesgue outer measure (see Exercise 10). Nevertheless, there is a simple relation between the two, which is a corollary of the next lemma. Let (ε) (A) = inf Hα

δ(Qk )α ,

where {Qk } is any collection of cubes with edges parallel to the axes such that A ⊂ Qk and δ(Qk ) < ε. Also, let (ε) (A) = lim Hα (A). Hα ε→0

292

Measure and Integral: An Introduction to Real Analysis

is defined in the same way as H , except that cubes are used instead Thus, Hα α of arbitrary sets.

Lemma 11.15

There is a constant c depending only on n and α such that Hα (A) ≤ Hα (A) ≤ cHα (A),

A ⊂ Rn .

Proof. Since every covering of A by cubes is a covering of A, we obtain (A). Any set with diameter δ, say, is contained in a cube with Hα (A) ≤ Hα √ Ak , δ(Ak ) < ε. edge length 2δ and so with diameter √2 n δ. Now let A = Select cubes Qk ⊃ Ak with δ(Qk ) = 2 n δ(Ak ). Then

√ √ √ (2 n ε) (A). δ(Ak )α = (2 n)−α δ(Qk )α ≥ (2 n)−α Hα

√ −α (2√nε) (ε) Therefore, H (A) ≥ (2 n) Hα (A). Letting ε → 0, we obtain Hα (A) ≥ α √ (A), which completes the proof. (2 n)−α Hα Theorem 11.16 (i) There are positive constants c1 and c2 depending only on the dimension n such that c1 Hn (A) ≤ |A|e ≤ c2 Hn (A) for A ⊂ Rn . (ii) If α > n, then Hα (A) = 0 for every A ⊂ Rn . Proof. We first claim that for every set A ⊂ Rn , Hn (A) = inf

{Qk }

δ(Qk )n ,

where the inf is taken over all collections {Qk } of cubes with edges parallel to the coordinate axes that cover A, without restriction on the size of δ(Qk ). Let I denote the inf on the right side. It follows easily that Hn (A) ≥ I. To collection of cubes show the opposite inequality, let {Qk : k = 1, 2, . . .} be a with edges parallel to the coordinate axes such that A ⊂ Qk , and let ε, η > 0. Pick cubes {Q∗k } satisfying Qk ⊂ (Q∗k )◦ and |Q∗k − Qk | < η/2k for each k. Decompose (Q∗k )◦ = j Q k,j into the union of nonoverlapping cubes Qk,j with k,j ) < ε (and edges parallel to the coordinate axes); see Theorem 1.11 and δ(Q the comment after its proof about the size of the initial net of cubes used in its ∗ n proof. Then, for each k, we have |Q∗k | = j |Q k,j |, and consequently δ(Qk ) = n n j δ(Qk,j ) since for any cube Q, δ(Q) is proportional to the volume of Q: in k,j and fact, δ(Q)n = nn/2 |Q|. Then A ⊂ k,j Q

293

Outer Measure and Measure

Hn(ε) (A) ≤

k,j )n = nn/2 δ(Q

k,j

n by Theorem 11.13. If Hn (A) = +∞, write A = (A ∩ Qj ), where the Qj are disjoint (partly open) cubes. Since |A ∩ Qj |e is finite, so is Hn (A ∩ Qj ). Hence, Hα (A ∩ Qj ) = 0 for α > n. Therefore, Hα (A) ≤ Hα (A ∩ Qj ) = 0. It is natural to ask if Hα (A) is comparable to the expression inf δ(Ak )α , where the inf is taken over all coverings {Ak } of A, without any requirement on the size of the diameters. It is not difficult to see that the answer in general is no. In fact, this expression is finite for any bounded A, as is easily seen from covering A by itself. On the other hand, it is clear from Theorems 11.13 and 11.16 that if |A|e > 0, then Hα (A) = +∞ for α < n. However, in case α = n, Hα (A) is comparable to the previous expression. To see this, it is enough by Lemma 11.15 to prove that Hn (A) is comparable to it. But since Hn (A) = inf δ(Qk )n , where {Qk } is any collection of cubes covering A whose edges are parallel to the axes, this follows from the fact that inf δ(Qk )n and inf δ(Ak )n are comparable (cf. the proof of Lemma 11.15). We remark in passing that Hausdorff measure is particularly useful in measuring sets with Lebesgue measure zero since these may have positive Hausdorff measure for some α < n. For example, it can be shown that the Cantor set in [0,1] has Hausdorff measure of dimension log 2/log 3 equal to 1; see Exercise 19.

11.5 The Carathéodory–Hahn Extension Theorem In this section, we will settle the question that arose in the discussion preceding Theorem 11.10. We recall the situation. Let μ be a finite Borel measure

294

Measure and Integral: An Introduction to Real Analysis

on R1 , and let be the Lebesgue–Stieltjes measure induced by the function f (x) = μ((−∞, x]). Since f is continuous from the right, we have by Theorem 11.10 that ((a, b]) = μ((a, b]) [= f (b) − f (a)]. The point in question is whether this implies that μ and agree on every Borel set in R1 . More generally, we may ask if two Borel measures can be finite and equal on every (a, b] without being identical. It is worthwhile to consider this question in a still more general context, which we now present. Let S be a fixed set. By an algebra A of subsets of S , we mean a nonempty collection of subsets of S that is closed under the operations of taking complements and finite unions; that is, A is an algebra if it satisfies the following: (i) If A ∈ A , then CA (= S − A) ∈ A . (ii) If A1 , . . . , AN ∈ A , then N k=1 Ak ∈ A . What distinguishes an algebra from a σ-algebra is that an algebra is only closed under finite unions. It follows from the definition that an algebra is also closed under finite intersections and differences (relative complements) and that both the empty set ∅ and the whole space S belong to it. The collection of finite intervals (a, b] on the line is clearly not an algebra. However, we can generate an algebra from it by adjoining ∅, R1 , and all intervals of the form (−∞, a] and (b, +∞), as well as all possible finite disjoint unions of these and the intervals (a, b]. This algebra will be called the algebra generated by the intervals (a, b]. By a measure λ on an algebra A , we mean a function λ which is defined on the elements of A and which satisfies (i) λ(A) ≥ 0 and λ(∅) = 0, ∞ (ii) λ( ∞ k=1 Ak ) = k=1 λ(Ak ) whenever {Ak } is a countable collection of disjoint sets in A whose union also belongs to A . It follows easily that such a λ is monotone: if A1 ⊂ A2 and A1 , A2 ∈ A , then λ(A1 ) ≤ λ(A2 ). A measure λ on A is called σ-finite (with respect to A ) if S can be written S = Sk with Sk ∈ A and λ(Sk ) < +∞. For example, any Lebesgue–Stieltjes measure on the line is a σ-finite measure on the algebra generated by the intervals (a, b]. Using the ideas behind the construction of a Lebesgue–Stieltjes outer measure, we can construct an outer measure λ∗ from λ. Thus, let λ be a measure

295

Outer Measure and Measure

on an algebra A of subsets of S . For any subset A of S , define λ∗ (A) = inf

λ(Ak ),

(11.17)

where the infimum is taken over all countable collections {Ak } such that Ak ∈ A and A ⊂ Ak . It is always possible to find such a covering of A since S itself belongs to A . The facts that A is an algebra and λ is monotone allow us to assume without loss of generality that the sets Ak covering A are disjoint since k≥1 Ak = A1 ∪ (A2 − A1 ) ∪ (A3 − A2 − A1 ) ∪ · · · . Theorem 11.18 Let λ be a measure on an algebra A , and let λ∗ be defined by (11.17). Then λ∗ is an outer measure. The proof is similar to the first part of the proof of Theorem 11.8 and is left as an exercise. While λ is assumed to be defined only on A , λ∗ is defined on every subset of S . The next result shows that λ∗ equals λ when restricted to A . Theorem 11.19 Let λ be a measure on an algebra A , and let λ∗ be the corresponding outer measure. If A ∈ A , then λ∗ (A) = λ(A) and A is measurable with respect to λ∗ . Proof. Let A ∈ A . Clearly, λ∗ (A) ≤ λ(A). On the other hand, given disjoint Ak ∈ A with A ⊂ ∪Ak , let Ak = Ak∩ A. Then Ak ∈ A and A is the dis joint union of the Ak . Hence, λ(A) = ∗ λ(Ak ). Since Ak ⊂ Ak , it follows that λ(A) ≤ λ(Ak ). Therefore, λ(A) ≤ λ (A), and the proof of the first part of the theorem is complete. For the second part, let A ∈ A . To show that A is measurable with respect to λ∗ , we must show that λ∗ (E) = λ∗ (E ∩ A) + λ∗ (E − A) for every E ⊂ S . Since λ∗ is subadditive, the right side majorizes the left. To show the opposite inequality, we may that λ∗ (E) is finite. Given ε > 0, choose {Ek } such assume that Ek ∈ A , E ⊂ Ek and λ(Ek ) < λ∗ (E) + ε. Since Ek and A are in A and (Ek ∩ A) ∪ (Ek − A) = Ek , we have λ(Ek ∩ A) + λ(Ek − A) = λ(Ek ). Hence,

λ(Ek ∩ A) +

λ(Ek − A) < λ∗ (E) + ε.

296

Measure and Integral: An Introduction to Real Analysis

Therefore, since E ∩ A ⊂ the definition of λ∗ that

(Ek ∩ A) and E − A ⊂

(Ek − A), it follows from

λ∗ (E ∩ A) + λ∗ (E − A) < λ∗ (E) + ε. Letting ε → 0, we obtain the desired inequality, which completes the proof. Let λ be a measure on an algebra A , and let μ be a measure on a σ-algebra that contains A . Then μ is said to be an extension of λ to if μ(A) = λ(A) for every A ∈ A . If λ∗ is the outer measure generated by λ and A ∗ denotes the σ-algebra of λ∗ -measurable sets, it follows from the last theorem that λ∗ is an extension of λ to A ∗ . This proves the first part of the following theorem, which is the main result of this section.

Theorem 11.20 (Carathéodory–Hahn Extension Theorem) Let λ be a measure on an algebra A , let λ∗ be the corresponding outer measure, and let A ∗ be the σ-algebra of λ∗ -measurable sets. (i) The restriction of λ∗ to A ∗ is an extension of λ. (ii) If λ is σ-finite with respect to A , and if is any σ-algebra with A ⊂ ⊂ A ∗ , then λ∗ is the only measure on that is an extension of λ. Proof. As we have already observed, (i) follows from Theorem 11.19. To prove (ii), which states the uniqueness of the extension, let μ be any measure on , A ⊂ ⊂ A ∗ , which agrees with λ on A . Given a set E ∈ , consider any countable collection {Ak } such that E ⊂ Ak and each Ak ∈ A . Then μ(Ak ) = λ(Ak ). μ(E) ≤ μ Ak ≤ Therefore, by the definition of λ∗ , we have μ(E) ≤ λ∗ (E). To show that equality holds, first suppose that there exists a set A ∈ A with E ⊂ A and λ(A) < +∞. Applying what has just been proved to A − E (which belongs to ), we obtain μ(A − E) ≤ λ∗ (A − E). However, μ(E) + μ(A − E) = μ(A) = λ(A) = λ∗ (A) = λ∗ (E) + λ∗ (A − E). Since all these terms are finite (due to the fact that λ(A) is finite), it follows that μ(E) ≥ λ∗ (E), so that μ(E) = λ∗ (E) in this case. In the general case, since λ is σ-finite, there exists disjoint Sk ∈ A such that S = Sk and λ(Sk ) < +∞. We may apply the previous result to each E ∩ Sk (which is a subset of Sk ), obtaining μ(E ∩ Sk ) = λ∗ (E ∩ Sk ). By adding over k, we obtain μ(E) = λ∗ (E), which completes the proof.

297

Outer Measure and Measure

As a corollary, we can answer the questions raised at the beginning of this section. Corollary 11.21 Let μ and ν be two Borel measures on R1 which are finite and equal on every half-open interval (a, b], −∞ < a < b < +∞. Then μ(B) = ν(B) for every Borel set B ⊂ R1 . Proof. Such μ and ν must agree on the algebra generated by the (a, b] and are σ-finite with respect to this algebra. Since the smallest σ-algebra containing all (a, b] is the Borel σ-algebra, it follows from Theorem 11.20 that μ and ν are identical. Although we have confined ourselves to n = 1, we note that an analogue of Corollary 11.21 can be formulated in higher dimensions. See Exercise 18. Now let μ be any finite Borel measure on R1 , and define fμ by fμ (x) = μ((−∞, x]). Clearly, μ((a, b]) = fμ (b) − fμ (a). Moreover, if fμ denotes the Lebesgue–Stieltjes measure constructed from fμ , then since fμ is continuous from the right, it follows from Theorem 11.10 that fμ ((a, b]) = fμ (b) − fμ (a). Therefore, by the previous corollary, μ and fμ are identical as Borel measures, and we easily obtain Corollary 11.22 The class of finite Borel measures on R1 is identical with the class of Lebesgue–Stieltjes measures induced by bounded increasing functions that are continuous from the right. See also Exercise 4. Let μ be a Borel measure on R1 which is finite on every (a, b], −∞ < a < b < +∞ (equivalently, μ is finite on every bounded Borel set). Consider the restriction of μ to the algebra A generated by the (a, b], and let μ∗ be the corresponding outer measure. The smallest σ-algebra containing A is the Borel sets B. Thus, A ⊂ B ⊂ A ∗ . Since μ and μ∗ are measures on B that agree on A , it follows that μ = μ∗ on B. In particular, if B ∈ B, we see from the definition of μ∗ (see (11.17)) that μ(B) = inf

μ(Ak ) : B ⊂

Ak , Ak ∈ A .

Each Ak ∈ A is a countable union of disjoint (a, b]. Hence, we obtain the formula μ(B) = inf

μ(ak , bk ] : B ⊂

(ak , bk ] ,

B ∈ B.

(11.23)

298

Measure and Integral: An Introduction to Real Analysis

We recall (see p. 269 in Section 10.5) that a Borel measure μ is said to be regular if μ(B) = inf{μ(G) : B ⊂ G, G open},

B ∈ B.

Theorem 11.24 If μ is a Borel measure on R1 which is finite on every bounded Borel set, then μ is regular. Proof. This is a corollary of (11.23). Given a Borel set B and ε > 0, find a cover {(ak , bk ]} of B such that

μ(ak , bk ] ≤ μ(B) + ε.

Since μ is finite on bounded intervals, it follows from Theorem 10.11 that μ(a, b] = limε→0+ μ(a, b + ε). Hence, by slightly enlarging each (ak , bk ], we see that there is an open set G, G = (ak , bk + εk ) for sufficiently small εk , containing B such that μ(G) ≤

μ(ak , bk + εk ) ≤ μ(B) + 2ε.

This completes the proof. In Theorem 11.24, as in Corollary 11.21, we have limited ourselves to n = 1. For n > 1, see Exercise 18. In particular, note that the assumption in Section 10.5 on p. 269 concerning the regularity of μ and ν is redundant.

Exercises 1. (a) Prove the second statements in both parts of Corollary 11.4. (b) Verify the statements made before Theorem 11.5 about the function r(A) defined on sets A ⊂ R2 . (One way to see that a set B with d(Y, B) > 0 is not r-measurable is to denote the mirror reflection of B in the y-axis by B∗ and check that the equation r(B ∪ B∗ ) = r(B) + r(B∗ ) is false.) 2. Let μ be a finite Borel measure on R1 , and define fμ (x) = μ((−∞, x]), −∞ < x < +∞. Show that fμ is monotone increasing, μ((a, b]) = fμ (b) − fμ (a), fμ is continuous from the right, and limx→−∞ fμ (x) = 0. 3. Let f be monotone increasing on R1 . (a) Show that f (R1 ) is finite if and only if f is bounded.

Outer Measure and Measure

299

(b) Let f be bounded and right continuous, let μ = f , and let f¯ denote the function fμ defined in Exercise 2. Show that f and f¯ differ by a constant. Thus, if we make the additional assumption that limx→−∞ f (x) = 0, then f = f¯ . 4. If we identify two functions on R1 which differ by a constant, prove that there is a one-to-one correspondence between the class of finite Borel measures on R1 and the class of bounded increasing functions that are continuous from the right. 5. Let f be monotone increasing and right continuous on R1 . (a) Show that f is absolutely continuous with respect to Lebesgue measure if and only if f is absolutely continuous on R1 . (By absolutely continuous on R1 , we mean absolutely continuous on every compact interval.) (b) If f is absolutely continuous with respect to Lebesgue measure, show that its Radon–Nikodym derivative equals df /dx. 6. Prove that the Lebesgue–Stieltjes outer measure constructed from f (x) = x is the same as Lebesgue outer measure. 7. If f is monotone increasing and continuous from the right on R1 , show ◦∗ ∗ that ∗f (A) = ◦∗ f (A), where f is defined in the same way as f except that we use open intervals (ak , bk ). 8. If f is monotone increasing and continuous from the right, derive formulas for f ([a, b]) and f ((a, b)). 9. Complete the proof of Theorem 11.12. 10. Show that in Rn , n > 1, the Hausdorff outer measure Hn is not identical to Lebesgue outer measure. (For example, let n = 2, and write A = Ak , ) < ε. Enclose Ak in a circle Ck with the same diameter, and show δ(Ak that δ(Ak )2 ≥ (4/π)|A|e . Thus, H2ε (A) ≥ (4/π)|A|e .) 11. If A is a subset of Rn , define the Hausdorff dimension of A as follows: If Hα (A) = 0 for all α > 0, let dim A = 0; otherwise, let dim A = sup{α : Hα (A) = +∞}. (a) Show that Hα (A) = 0 if α > dim A and that Hα (A) = +∞ if α < dim A. Show that in Rn we have dim A ≤ n. See Exercise 19 in order to determine the Hausdorff dimension of the Cantor set. (b) If dim Ak = d for each Ak in a countable collection {Ak }, show that dim( Ak ) = d. Hence, show that every countable set has Hausdorff dimension 0. 12. Let be an outer measure on S , and let denote restricted to the -measurable sets. Since is a measure on an algebra, it induces an outer measure ∗ . Show that ∗ (A) ≥ (A) for A ⊂ S and that equality holds

300

Measure and Integral: An Introduction to Real Analysis

for a given A if and only if there is a -measurable set E such that A ⊂ E and (E) = (A). Thus, = ∗ if is regular. 13. Let λ be a measure on an algebra A , and let λ∗ be the corresponding outer measure. Given A, show that there is a set H of the form k j Ak,j such that Ak,j ∈ A , A ⊂ H and λ∗ (A) = λ∗ (H). Thus, every outer measure that is induced by a measure on an algebra is regular. 14. Prove Theorem 11.18. 15. (a) Show that the intersection of a family of algebras is an algebra. (b) A collection C of subsets of S is called a subalgebra if it is closed under finite intersections and if the complement of any set in C is the union of a finite number of disjoint sets in C . Give an example of a subalgebra. Show that a subalgebra C generates an algebra by adding ∅, S , and all finite disjoint unions of sets of C . 16. If μ is a finite Borel measure on R1 , show that μ(B) = sup μ(F) for every Borel set B, where the sup is taken over all closed subsets F of B. 17. Show that the conclusions of Theorems 10.48 and 10.49, and therefore also the conclusion of Corollary 10.50, remain true without the assumption (ii) stated before Lemma 10.47. (Show that without this assumption, the conclusions of Lemma 10.47 are true with μ replaced by μ∗ ; for example, ν(Rn ) ν(Qx (h)) >α ≤c . μ∗ x ∈ E : sup α h>0 μ(Qx (h)) 18. Derive analogues of Corollary 11.21 and Theorem 11.24 in Rn , n > 1. (Use partly open n-dimensional intervals in place of the intervals (a, b].) 19. Show that the Cantor set C in [0, 1] has the Hausdorff measure of order log 2/log 3 equal to 1. (In order to show the inequality Hlog 2/ log 3 (C) ≤ 1, (ε) consider Hlog 2/ log 3 (C) when ε = 3−k , k = 1, 2, . . ., and show that the

2k intervals {Ij } of length 3−k remaining at the kth stage Ck of construc α tion of C satisfy |Ij | = 2k 3−kα = 1 if α = log 2/ log 3. A proof of the opposite inequality is harder. It may be helpful to note that if I is a closed interval, containing at its two endpoints intervals from Ck , then |I|α ≥ nk (I)/2k , where nk (I) is the number of intervals of Ck contained in I and α = log 2/ log 3.)

12 A Few Facts from Harmonic Analysis

12.1 Trigonometric Fourier Series The Lebesgue measure and integration have been decisive in the development of many branches of analysis and are applied there in ever greater degree. But, conversely, some of the applications have had considerable impact on the theory of integration. In this chapter, we will consider one topic where this interdependence has been particularly fruitful: harmonic analysis (see p. 305 in Section 12.1). One of the principal goals of harmonic analysis, which is a vast field, is to represent very general functions f in terms of a collection of simpler oscillatory ones. The fact that typical representations involve integration of f accounts for the interrelation of the two fields. Oscillatory behavior of the simpler functions has advantages: it can help make them independent of one another, and it can be exploited in order to find particular linear combinations of them that approximate general functions. We begin by describing some elementary notions and facts; the concept of an orthogonal system, and in particular of the trigonometric system, is basic here. The notion of an orthogonal system, defined generally on a subset E of positive measure in Rn , was introduced in Chapter 8, and we refer the reader to that place for the definitions and properties of general orthogonal systems, restating only a few facts here. A system of complex-valued functions {φα (x)}, all belonging to L2 (E), is called orthogonal over E if

φα , φβ =

φα φβ E

= 0 α = β > 0 α = β.

The second condition means that φα ≡ 0. If φα , φα = 1 for all α, the orthogonal system is called normal, or orthonormal. If {φα } is orthogonal, the system {φα /||φα ||2 } is orthonormal. Thus, by merely multiplying the functions of an orthogonal system by suitable constants, we can normalize it, and formulas 301

302

Measure and Integral: An Introduction to Real Analysis

for orthonormal systems can be easily and automatically extended to general, not necessarily normal, orthogonal systems. On the other hand, certain important orthogonal systems very naturally appear, often for historical reasons, in a nonnormalized form, and because of this, it may be desirable not to insist on the normality of the system under consideration. Let us, therefore, briefly restate the definitions in this somewhat more general setup. Since orthogonal systems are countable (see Theorem 8.21), we may index them by integers. Let φ1 (x), φ2 (x), . . . be an orthogonal system on E ⊂ Rn . Thus, 0 k = l. φk φl = λk > 0 k = l.

E

Given any (complex-valued) f ∈ L2 (E), we call the numbers ck =

1 f φk λk E

the Fourier coefficients of f and the series S[f ] = f , with respect to {φk }. As before, we write f ∼

ck φk (x) the Fourier series of

ck φk (x).

If we set −1/2

ψk = λ k

φk ,

1/2

dk = λk ck =

f ψk ,

(12.1)

E

then {ψk } is orthonormal over E, and {dk } is the sequence of Fourier coefficients of f with respect to {ψk }. Clearly, dk ψk = ck φk .

(12.2)

This set of formulas enables us to rewrite relations for orthonormal systems forms valid for general orthogonal systems. Thus, Bessel’s inequality in |dk |2 ≤ E |f |2 and Parseval’s formula |dk |2 = E | f |2 take the forms

λk |ck |2 ≤

E

| f |2 ,

λk |ck |2 =

| f |2 .

(12.3)

E

The notion of completeness of an orthogonal system (“the vanishing of all the Fourier coefficients implies the vanishing of the function”) remains unchanged in the general case, and as in the case of normalized systems, the

303

A Few Facts from Harmonic Analysis

validity of the second formula in (12.3) is a necessary and sufficient condition for the completeness of {φk }. Let sn denote the nth partial sum of S[f ]. As a corollary of the corresponding result for orthonormal systems, we see that the equation λk |ck |2 = E | f |2 is equivalent to f − sn 2 → 0. E

Thus, if an orthogonal system is complete, the Fourier series of every f ∈ L2 (E) 2 converges to f , convergence being understood in the metric L . Of course, this says nothing about the pointwise convergence of ck φk (x). On the other hand, it holds for any rearrangement of the terms of ck φk since the orthogonality and completeness of a system are not affected by a permutation of the functions within the system. We shall now consider a special orthogonal system, the trigonometric system. This name is given to the system of functions eikx = cos kx + i sin kx, x ∈ (−∞, ∞)

(k = 0, ±1, ±2, . . .).

These functions are all periodic, with period 2π, and it is immediate that they form an orthogonal system over any interval Q = (a, a + 2π) of length 2π, since if k and m are distinct integers, then

eikx eimx dx

Q

=

e

i(k−m)x

dx =

Q

ei(k−m)x k−m

a+2π = 0. a

The system is not orthonormal since λk = Q |eikx |2 dx = 2π for all k. Thus, with any f ∈ L(Q), we may associate its Fourier coefficients ck =

1 1 f (t) eikt dt = f (t)e−ikt dt 2π 2π Q

(k = 0, ±1, ±2, . . .),

(12.4)

Q

and its Fourier series f ∼

+∞

ck eikx .

(12.5)

−∞

In what follows, this series will be designated by S[f ], and its coefficients by ck [f ].

304

Measure and Integral: An Introduction to Real Analysis

Observe that2 if two functions φ and ψ are orthogonal over a set E and if 2 = |φ| E E |ψ| , then the pair φ ± ψ is also orthogonal over E, as seen from the equation

(φ + ψ)(φ − ψ) =

E

|φ|2 −

E

|ψ|2 = 0.

E

Applying this to the pairs e±ikx (k = 1, 2, . . .), we see that the functions eikx + e−ikx eikx − e−ikx 1 ,..., , , . . . (k = 1, 2, . . .) 2 2 2i

(12.6)

or, what is the same thing, the functions 1 , cos x, sin x, . . . , cos kx, sin kx, . . . 2

(12.7)

are orthogonal over any interval Q of length 2π. Using the form (12.6), we find that the numbers λk for (12.7) are 1 π, π, π, . . . . 2 Thus, any f ∈ L(Q) can be developed into a new Fourier series ∞

f ∼

1 a0 + (ak cos kx + bk sin kx), 2

(12.8)

k=1

where a0 =

1 1 −1 1 π f = f, 2 2 π Q

1 f (t) cos kt dt, ak = π Q

Q

1 bk = f (t) sin kt dt. π

(12.9)

Q

The numbers ak and bk are easily expressible in terms of the coefficients ck of (12.4): 1 a0 = c0 , 2

ak = ck + c−k ,

bk = i(ck − c−k ).

(12.10)

305

A Few Facts from Harmonic Analysis

Hence, n

ck eikx = c0 +

k=−n

n

ck eikx +

k=1

=

1 a0 + 2

n

c−k e−ikx

k=1

n

(ak cos kx + bk sin kx),

k=1

and the nth partial sum of the series in (12.8) turns out to be the nth symmetric partial sum of ck eikx . The numbers ak = ak [f ] and bk = bk [f ] are called the Fourier cosine and sine coefficients of f , respectively. To sum up, we may consider the trigonometric system in two forms. One consists of functions eikx (k = 0, ±1, ±2, . . .), and the Fourier series has the +∞ the form −∞ ck eikx , where the ck are given by (12.4). The other consists of the functions (12.7), the Fourier series is (12.8), and the coefficients ak and bk are given by (12.9). The partial sums of (12.8) are the symmetric partial sums of (12.5). In both cases, the terms of the Fourier series are harmonic oscillations, and for this reason, the study of S[f ] is called the harmonic analysis of f . Each form of the trigonometric system has its advantages. For example, if f is real-valued, then the numbers ak and bk are real, while the ck have the property c−k = ck . Note also that if Q = (−π, π) and f is an even function, that is, if f (−x) = f (x), then 2 f (t) cos kt dt, π π

ak =

bk = 0,

0

and if f is an odd function, that is, if f (−x) = −f (x), then 2 f (t) sin kt dt. π π

ak = 0,

bk =

0

∞ 1 Thus, if f is even, (12.8) reduces ∞ to the cosine series 2 a0 + k=1 ak cos kx, and if f is odd, to the sine series k=1 bk sin kx. Since the terms of a (trigonometric) Fourier series are periodic with period 2π, if we expect to represent a function f by its Fourier series, we may assume from the start that f is defined everywhere (or almost everywhere [a.e.]) on the real axis and is periodic with period 2π. This amounts to considering the function as defined on the circumference of the unit circle. We do not distinguish between points that are congruent mod 2π. By an integrable function, we shall mean a function integrable over a period. Similarly, the Lp norm of a function will mean its Lp norm over a period and the familiar definitions

306

Measure and Integral: An Introduction to Real Analysis

of other classes of function such as functions of bounded variation and Lipschitz continuous functions will also be restricted to a period. In what follows, periodic will mean periodic of period 2π. In the preceding chapters, we proved a number of theorems about functions in Lp (Rn ) and, in particular, in Lp (R1 ). Usually, these results have analogues for periodic functions, where integrals over R1 are replaced by integrals over a period. The proofs are usually in essence identical with those for R1 (or are merely corollaries of the results for R1 ) and may be left as exercises. We would like to stress one point. The definition of a general orthogonal system presupposes that the functions in the system are of class L2 . This makes it possible to define Fourier coefficients for any f ∈ L2 . If f is not in L2 , it may be impossible to define its Fourier coefficients with respect to certain orthogonal systems. The situation is different

for special orthogonal systems. For example, in the case of the functions eikx , which are bounded, the coefficients ck are defined for any f that is merely integrable over Q and, in particular, for any f ∈ Lp (Q), 1 ≤ p ≤ ∞. Thus, the trigonometric system is richer in properties than general orthogonal systems. Some simple developments are important for the general theory of Fourier series. We consider two here and refer the reader to Exercise 5 for others. Example (a). Let f be periodic and equal to 12 (π − x) for 0 < x < 2π, with f (0) = f (2π) = 0. Since f is odd, its Fourier series is a sine series, and integration by parts gives ⎞ ⎛ π π 1 ⎝π 1 2 1 1 (π − x) sin kx dx = − bk = cos kx dx⎠ = . π 2 π k k k 0

0

Thus,

f ∼

∞ sin kx

k

k=1

+∞

=

1 eikx , 2 −∞ ik

where denotes k=0 . Example (b). Let f be periodic and equal in (−π, π) to the characteristic function of the interval (−h, h), 0 < h < π. Then f is even, and if k = 0, its cosine coefficient is 2 2 sin kh . cos kx dx = π π k h

ak =

0

307

A Few Facts from Harmonic Analysis

Since a0 = 2h/π, we obtain ∞ 2h 1 sin kh + cos kx f ∼ π 2 kh k=1 +∞ h sin kh ikx = e . 1+ π kh −∞ Series of the form +∞

∞

1 a0 + (ak cos kx + bk sin kx) , 2

ck eikx ,

−∞

k=1

whether they are Fourier not, are called trigonometric series. In definseries or ikx , we usually consider the limit, ordinary or c e ing the convergence of +∞ −∞ k generalized, of the symmetric partial sums +n −n , and once again n

1 a0 + (ak cos kx + bk sin kx) , 2 n

ck eikx =

−n

k=1

where 1 a0 = c0 , 2

ak = ck + c−k ,

bk = i ck − c−k .

A finite sum T = n−n ck eikx is called a trigonometric polynomial of order n, and if |c−n | + |cn | = 0, T is strictly of order n. If T is of order n and vanishes at more than 2n distinct points (i.e., distinct mod 2π), then it vanishes identically, that is, all the ck are 0. In fact, Teinx = n−n ck ei(k+n)x is an algebraic (power) polynomial in z = eix of degree ≤ 2n, and if it has more than 2n zeros, then it vanishes identically. If the numbers ak and bk are real, the trigonometric series ∞

S=

1 a0 + (ak cos kx + bk sin kx) 2 k=1

is the real part of the power series ∞

1 a0 + (ak − ibk ) zk 2 k=1

308

Measure and Integral: An Introduction to Real Analysis

on the unit circle z = eix . The imaginary part is then the series ∞

(ak sin kx − bk cos kx)

(12.11)

k=1

(with vanishing constant term). If S is written in the complex form it is easy to see that (12.11) is +∞

(−i sign k)ck eikx

+∞

−∞ ck e

ikx ,

(12.12)

−∞

(where, by convention, sign 0 = 0). The series (12.11), or (12.12), is said to be conjugate to S. A series conjugate to a trigonometric series S is denoted by S. If S has constant term 0, then ≈

S = −S.

It is natural to study the properties of S[f ] simultaneously with those of S[f ]. One more remark. Properties of functions in Lp (Rn ), and in particular in Lp (R1 ), are important for the theory of Fourier integrals, which for nonperiodic functions play the same role as Fourier series in the periodic case. The two theories run largely parallel. In this chapter, we shall limit ourselves to Fourier series since our primary aim is to show the role that Lebesgue integration plays in some problems of representability of functions, and both the results and techniques of Fourier series are sufficiently indicative of the situation. In Chapter 13, we will study the main aspects of Fourier integrals in Rn , n ≥ 1.

12.2 Theorems about Fourier Coefficients Theorem 12.13 If a periodic f is the indefinite integral of its derivative f (i.e., if ikx f is periodic and absolutely continuous), and if f ∼ ck e , then f ∼

+∞

ck (ik)eikx .

−∞

In symbols, S[f ] = S [f ], where S [f ] denotes the result of the termwise differentiation of S[f ].

309

A Few Facts from Harmonic Analysis

Proof. It is clear that f is also periodic and that its constant term equals −1

2π

(2π)

f (x) dx = (2π)−1 [f (2π) − f (0)] = 0.

0

If k = 0, integrating by parts and observing that the integrated term is zero, we have (2π)−1

2π

f (x)e−ikx dx = (2π)−1 ik

0

2π

f (x)e−ikx dx = ikck ,

0

which proves the theorem. By repeated application of this result, we see that if a periodic f is the mth indefinite integral of an integrable function f (m) , then S f (m) = S(m) [f ] = (ik)m ck eikx .

Theorem 12.14 If f is periodic, f ∼ f , then F(x) − c0 x is periodic and

ck eikx , and if F is the indefinite integral of

F(x) − c0 x ∼ C0 +

ck eikx , ik

constant (depending where C0 is a suitable on the choice of the arbitrary constant of integration in F) and again denotes k=0 . Proof. Let G(x) = F(x) − c0 x. The periodicity of G follows from the equation G(x + 2π) − G(x) =

x+2π

f dt − c0 (2π) = 2πc0 − 2πc0 = 0.

x

Since G is also absolutely continuous, Theorem 12.13 gives S [G] = S[G ] = S[f − c0 ] =

ck eikx .

k=0

Hence, S[G] is obtained by termwise integration of the result, C0 being the constant term of S[F − c0 x].

ck eikx , which leads to

310

Measure and Integral: An Introduction to Real Analysis

For the trigonometric system, we have Bessel’s inequality (see (12.3)) +∞

|ck |2 ≤

−∞

2π 1 | f (x)|2 dx, 2π 0

but actually, as we shall see, we also have Parseval’s formula +∞

|ck |2 =

−∞

2π 1 | f (x)|2 dx 2π

(12.15)

0

for every f ∈ L2 . We know by Theorem 8.31 that this is a corollary of the next result.

Theorem 12.16 The trigonometric system is complete. More precisely, if all the Fourier coefficients of an integrable f are zero, then f = 0 a.e. Proof. Assume first that f is continuous and real-valued, with all ck = 0. If f ≡ 0, then | f | attains a nonzero maximum M at some point x0 . Suppose, for example, that f (x0 ) = M > 0. Let δ > 0 be so small that f (x) > 12 M in the interval I = (x0 − δ, x0 + δ). Consider the trigonometric polynomial t(x) = 1 + cos(x − x0 ) − cos δ. It is strictly greater than 1 inside I and does not exceed 1 in absolute value elsewhere. The hypothesis that all the Fourier coefficients of f are 0 implies π that −π f T dx = 0 for any trigonometric polynomial T, and in particular, π

f tN dx = 0

(N = 1, 2, . . .).

−π

We claim that this is impossible for N large enough. The absolute value of the part of the last integral extended over the complement of I is ≤ 2π · M · 1N = 2πM. If I is the middle half of I, then t(x) ≥ θ > 1 in I , so that I

f tN dx ≥

I

f tN dx ≥

1 M · θN |I | → +∞. 2

π Collecting results, we see that −π f tN dx → +∞; this contradiction shows that f ≡ 0. 2π If f is continuous but not real-valued, the hypothesis 0 f (x)e−ikx dx = 0 2π for all k implies that 0 f (x) e−ikx dx = 0 for all k. By adding and subtracting

A Few Facts from Harmonic Analysis

311

the last two equations, we see that both the real and imaginary parts of f have all their Fourier coefficients equal to 0 and so vanish identically. Finally, if f is merely integrable, the hypothesis c0 = 0 implies that the x function F(x) = 0 f dt is periodic, and by Theorem 12.14, for a suitable C0 , the Fourier coefficients of the continuous function F − C0 are all 0. Hence, F − C0 ≡ 0, F is a constant, and f = F = 0 a.e. This completes the proof of the theorem. Another proof is given on p. 340 in Section 12.6. An immediate corollary is the following result.

Theorem 12.17

Parseval’s formula (12.15) holds for any f ∈ L2 .

Parseval’s formula can be written in more general which are, forms, however, corollaries of (12.15). Thus, besides f ∼ ck eikx ∈ L2 , consider another function g ∼ dk eikx ∈ L2 . Then, by an argument like that in Section 8.6 for (8.32), 2π +∞ 1 f g dx = ck dk . 2π −∞

(12.18)

0

This reduces to (12.15) in the special case f = g. The completeness of the trigonometric system also gives the next two theorems.

Theorem 12.19 If the Fourier series of a continuous f converges uniformly, then the sum of the series is f . Proof. Let g be the sum of the uniformly convergent series S[f ]. The Fourier coefficients of g can be obtained by multiplying S[f ] by e−ikx and integrating the result termwise. Thus, ck [g] = ck [f ] for all k, so that f ≡ g. Theorem 12.20 If a periodic f is the integral of a function in L2 , then S[f ] converges absolutely and uniformly. In particular, the Fourier series of a continuously differentiable function converges uniformly to the function. ikx , g∼ ck e (c0 = 0). Then S[f ] = C0 + Proof. Let f be the integral of g ∈ L2 ikx C |ck |2 < +∞ by Bessel’s inequality, so k e , Ck = ck /ik, k = 0. We have that |Ck | < +∞ by Schwarz’s inequality. This completes the proof. The theorem that follows is of basic importance.

312

Measure and Integral: An Introduction to Real Analysis

Theorem 12.21 (Riemann–Lebesgue) The Fourier coefficients ck of any integrable f tend to 0 as k → ±∞. Hence, also ak , bk → 0 as k → +∞. Proof. First, we note the obvious but important inequality

|ck [f ]| ≤

2π 1 | f | dx. 2π 0

We will give two proofs of the theorem. (a) (See also Exercise 15 of Chapter 8.) If f ∈ L2 , then ck → 0 as a corollary of Bessel’s inequality (p. 310, Section 12.2). If f ∈ L and 2π ε > 0, write f = g + h, where g ∈ L2 and 0 |h| < ε. (This decomposition can be made in various ways: we may, for example, take M large enough and define h to be f wherever | f | ≥ M and 0 elsewhere; clearly, |g| ≤ M, and so g ∈ L2 .) Then ck [f ] = ck [g] + ck [h]. Since ck [g] → 0 and |ck [h]| ≤ (2π)−1 ck [f ] → 0 follows.

2π 0

|h| < ε/2π for all k, the relation

(b) Observe that

ck [f ] =

2π 1 1 f (x)e−ikx dx = 2π 2π 0

=−

2π−(π/k) −(π/k)

π −ik(x+(π/k)) e f x+ dx k

2π π −ikx 1 e f x+ dx. 2π k 0

Taking the semi-sum of the first and third integrals, we obtain

ck [f ] =

2π π −ikx 1 e f (x) − f x + dx, 4π k 0

|ck [f ]| ≤

1 4π

2π 0

π dx. f (x) − f x + k

However, we know that the last integral tends to 0 as k → ±∞ (cf. Theorem 8.19; the analogous result for periodic functions is left to the reader). This completes the proof.

313

A Few Facts from Harmonic Analysis

Given any finite periodic f , the expression sup |f (x + h) − f (x)|

(δ > 0)

x,h;|h|≤δ

is called the modulus of continuity of f and denoted by ω(δ) or ω(δ, f ) (cf. Exercise 17 of Chapter 1). If f is in Lp , 1 ≤ p < ∞, the expression ⎡

⎤1/p 2π 1 sup ⎣ |f (x + h) − f (x)|p dx⎦ |h|≤δ 2π 0

is called the Lp modulus of continuity of f and denoted ωp (δ, f ) or simply ωp (δ). Clearly, ωp (δ) ≤ ω(δ) and, as is easily seen from Hölder’s inequality, ωp (δ) ≤ ωq (δ) if p ≤ q. We know that if f ∈ Lp , then ωp (δ, f ) → 0 with δ (Theorem 8.19). The last inequality in proof (b) of Theorem 12.21 gives |ck [f ]| ≤

π 1 ω ,f , 2 |k|

|ck [f ]| ≤

1 ω1 2

π ,f . |k|

(12.22)

These two inequalities contain the Riemann–Lebesgue theorem in a sharp form since they quantitatively estimate the magnitude of the Fourier coefficients of f in terms of various moduli of continuity. The estimates (12.22) are also useful for families of functions. The following special case deserves a separate mention. A continuous periodic f is said to satisfy a Lipschitz condition of order α, 0 < α ≤ 1, if ω(δ, f ) = O(δα ) or, equivalently, if there is a finite constant M independent of x, h such that |f (x + h) − f (x)| ≤ M|h|α .

Theorem 12.23

If f satisfies a Lipschitz condition of order α, 0 < α < 1, then |ck [f ]| = O(|k|−α ).

If α = 1, the stronger estimate |ck [f ]| = o is valid.

1 |k|

314

Measure and Integral: An Introduction to Real Analysis

Proof. The first part follows from the first inequality (12.22). If α = 1, then f is absolutely continuous (see p. 150 in Section 7.5) and so equals the indefinite integral of its derivative f . Since f is bounded (and so is in L2 ), its Fourier coefficients tend to zero. Hence, the coefficients of f are o(1/|k|) by Theorem 12.13. Theorem 12.24 If a periodic f is of bounded variation over a period, then |ck [f ]| = O(1/|k|). More precisely,

|ck | ≤

V , 2π|k|

where V is the total variation of f over a closed interval of length 2π. Proof. Integrating by parts and taking account of the periodicity of f , we have

2πck [f ] =

π −π

e

−ikx

π 1 −ikx f (x) dx = e df (x), ik −π

where the last integral is a Riemann-Stieltjes integral. Hence,

2π|ck [f ]| ≤ |k|−1

π

|df (x)| = |k|−1 V.

−π

It must be stressed that V is the total variation over a closed interval of periodicity.

12.3 Convergence of S[f ] and S[f ] We shall now briefly discuss the problem of pointwise convergence of S[f ], treating side-by-side the parallel problem for S[f ]. Among many existing results, we will consider only the simplest. Without loss of generality, we may restrict our attention to real-valued f . We begin by computing the partial sums of S[f ] and S[f ]. If ak and bk denote the cosine and sine coefficients of f and n = 1, 2, . . ., then the nth partial sum of S[f ] is

315

A Few Facts from Harmonic Analysis

1 a0 + (ak cos kx + bk sin kx) 2 k=1 π π n 1 1 = f (t) dt + f (t) cos kt dt + sin kx cos kx · 2π −π π −π k=1 π 1 · f (t) sin kt dt π −π π π n 1 1 1 = + f (t) cos k(x − t) dt = f (t)Dn (x − t) dt, π −π 2 π −π n

sn (x) =

k=1

where 1 + cos kt. 2 n

Dn (t) =

k=1

In case n = 0, we denote s0 (x) = 12 a0 and D0 (t) = 12 . The trigonometric polynomial Dn is called the nth Dirichlet kernel. Similarly, if n = 1, 2, . . ., the nth partial sum of S[f ] is

sn (x) =

n k=1

=

n π 1 f (t) sin k(x − t) dt (ak sin kx − bk cos kx) = π −π k=1

π

1 n (x − t) dt, f (t)D π −π

where n (t) = D

n

sin kt

k=1

is the nth conjugate Dirichlet kernel. It will be convenient to define s0 (x) = 0 n are even and odd functions of t, 0 (t) = 0. Notice that Dn and D and D respectively, and that π 1 Dn (t) dt = 1, π −π

π 1 Dn (t) dt = 0. π −π

(12.25)

316

Measure and Integral: An Introduction to Real Analysis

n are, respectively, the real and imaginary parts of Moreover, Dn and D z = eit , n ≥ 1 ,

1 ikt 1 k 1 zn+1 − z + e = + z = + 2 2 2 z−1 n

n

k=1

k=1

this expression being interpreted as computation gives (even if n = 0)

Dn (t) =

sin n + 12 t 2 sin 12 t

,

1 2

+ n when t = 0. Then an elementary

n (t) = D

cos 12 t − cos n + 12 t 2 sin 12 t

,

(12.26)

n (0) = 0. with Dn (0) = n + 12 and D A quicker, though somewhat artificial, method of obtaining the first formula in (12.26) is to multiply Dn (t) termwise by 2 sin 12 t, replace the products 2 sin 12 t cos kt by sin k + 12 t − sin k − 12 t, and make use of cancellation of n (t) can be derived similarly. terms. The formula for D Given a function f and a fixed point x, let us consider the expressions φx (t) =

1 [f (x + t) + f (x − t)], 2

ψx (t) =

1 [f (x + t) − f (x − t)] 2

as functions of t. They are called the even and odd parts of f at the point x, respectively. Clearly, f (x + t) = φx (t) + ψx (t). It turns out that the behaviors of φx (t) and ψx (t) near t = 0 are decisive for the behaviors of S[f ] and S[f ], as the case may be, at the point x. Returning to the formula for sn (x) and making use of the even character of Dn (t), we can write π π 1 1 f (t)Dn (x − t) dt = f (t)Dn (t − x) dt sn (x) = π −π π −π

=

π π 1 2 1 [f (x + t) + f (x − t)]Dn (t) dt f (x + t)Dn (t) dt = π −π π 2 0

=

2 π

π 0

φx (t)Dn (t) dt.

317

A Few Facts from Harmonic Analysis

The first formula (12.25) immediately gives 2 [φx (t) − f (x)] Dn (t) dt π 0 1 π sin n + 2 t 2 = dt. [φx (t) − f (x)] 1 π 2 sin 2 t π

sn (x) − f (x) =

0

It will be convenient to modify this formula slightly by replacing n by n − 1 and taking the semi-sum of the two formulas. When n ≥ 1, writing #

sn (x) =

1 1 [sn (x) + sn−1 (x)] = sn (x) − (an cos nx + bn sin nx) , 2 2

(12.27)

we obtain, after observing that 1 2 cos nt dt, [φx (t) − f (x)] (an cos nx + bn sin nx) = 2 π 2 π

0

the formula 2 sin nt dt. [φx (t) − f (x)] π 2 tan 12 t π

#

sn (x) − f (x) =

0

The right side here is the nth Fourier sine coefficient of the odd function [φx (t) − f (x)]

1 1 cot t, 2 2

and if this function happens to be integrable near t = 0, the Riemann– # Lebesgue theorem immediately gives sn (x) − f (x) → 0. Hence, making use of the fact that an , bn → 0, we obtain from (12.27) the following basic result. Theorem 12.28 (Dini’s Test) Let f be periodic and integrable. If the integral π φx (t) − f (x) 1 cot 1 t dt 2 2 0

is finite, then S[f ] converges at the point x to the value f (x).

318

1 2

Measure and Integral: An Introduction to Real Analysis

Since only small values of t matter here, and since for small t we have cot 12 t t−1 , Dini’s condition can be restated in the form π φx (t) − f (x) t

0

dt < +∞,

or what is the same thing π |f (x + t) + f (x − t) − 2f (x)| t

0

dt < +∞.

(12.29)

The following special case is useful. Suppose that f has a jump discontinuity at x, so that the one-sided limits f (x+), f (x−) exist and are finite. Since changing f at a single point does not affect S[f ], we may assume that f (x) =

1 [f (x+) + f (x−)], 2

in which case we say that f has a regular discontinuity at x. Condition (12.29) is then certainly satisfied if both π | f (x + t) − f (x+)| 0

t

dt < +∞,

π | f (x − t) − f (x−)| t

0

dt < +∞.

Thus, a corollary of Dini’s test is that if both f (x+) and f (x−) exist and are finite, and if both of the last two integrals are finite, then S[f ] converges at the point x to the value f (x+) + f (x−) . 2 There is a result analogous to Theorem 12.28 for S[f ], and we will be n , we have brief here. Using the formula for sn and the odd character of D if n ≥ 1 that 1 1 π π cos t − cos n + 2 2 t 1 n (t) dt = − 2 ψx (t) dt, sn (x) = − f (x + t)D 1 π −π π 2 sin 2 t 0

319

A Few Facts from Harmonic Analysis

# sn (x) =

sn (x) + sn−1 (x) 2

π π cos n + 12 t 2 1 2 1 =− dt, ψx (t) cot t dt + ψx (t) π 2 2 π 2 sin 12 t 0

0

provided that π

|ψx (t)|

0

dt < +∞. t

(12.30)

Under this hypothesis, the last term in the preceding equation tends to zero by the Riemann–Lebesgue theorem, and we obtain (see Exercise 22) Theorem 12.31 Under the hypothesis (12.30), the series S[f ] converges at the point x to the sum −

2 1 1 f (x + t) − f (x − t) 1 dt. ψx (t) cot t dt = − π 2 2 π 2 tan 12 t π

π

0

0

We denote the last integral, which converges absolutely when (12.30) holds, by f (x) : 1 f (x) = − π

π f (x + t) − f (x − t) 0

2 tan 12 t

dt.

(12.32)

This function is called the conjugate function of f and is intimately connected with the behavior of S[f ]. We will study the existence and properties of f in detail later. Observe that condition (12.30) is of a nature completely different from (12.29); (12.30) precludes the possibility that f may have a jump at x. See Exercise 15. The proofs of Theorems 12.28 and 12.31 are based on the Riemann– Lebesgue theorem and give convergence results only at individual points. They cannot give uniform convergence in an interval without additional and rather strong assumptions. We consider one such assumption that, though very restrictive, leads to an important result. Theorem 12.33 If f = 0 in an interval (a, b), then S[f ] and S[f ] converge uniformly in every smaller interval (a + ε, b − ε). The sum of S[f ] is 0.

320

Measure and Integral: An Introduction to Real Analysis

Proof. The pointwise convergence in (a, b) is a corollary of Theorems 12.28 and 12.31, and it is only the question of uniformity that requires additional comment. We will consider only S[f ]; the argument for S[f ] is similar. Fix ε > 0. From (12.27), we deduce that # sn (x0 )

π 1 sn (x0 ) + sn−1 (x0 ) sin nt = dt = f (x0 + t) 2 π −π 2 tan 12 t

=

π 1 f (x0 + t) χ(t) sin nt dt, π −π

x0 ∈ (a + ε, b − ε),

where χ(t) is periodic, equals 12 cot 12 t for ε ≤ |t| ≤ π, and is arbitrary for |t| < ε. Suppose χ is defined so that it is continuous everywhere. Write f (x0 +t)χ(t) = gx0 (t), treating t as the variable and x0 as a parameter, and consider the modulus of continuity of gx0 (t) in the metric L1 . If we show that ω1 (δ, gx0 ) tends to 0 with δ uniformly for x0 ∈ (a + ε, b − ε), then the Fourier coefficients of gx0 (t) will also tend to 0 uniformly for such x0 (see the second formula (12.22)), and the theorem will follow. Now, for h > 0, 2π 2π gx (t + h) − gx (t) dt = f (x0 + t + h) χ(t + h) − f (x0 + t) χ(t) dt 0 0 0

0

≤

2π

f (x0 + t + h) − f (x0 + t) |χ(t + h)| dt

0

+

2π f (x0 + t) |χ(t + h) − χ(t)| dt. 0

The last integral clearly tends to 0 with h, uniformly in x0 , since max |χ(t + h)−χ(t)| → 0 as h → 0. If M = max |χ|, the preceding integral is majorized by M

2π 2π f (x0 + t + h) − f (x0 + t) dt = M | f (t + h) − f (t)| dt, 0

0

a quantity independent of x0 and tending to 0. This completes the proof. Two trigonometric series T1 and T2 are said to be equiconvergent at a point x0 if their difference T1 − T2 converges to 0 at x0 . If T1 − T2 merely converges, but not necessarily to 0, then T1 and T2 are said to be equiconvergent in the wider sense at x0 . Each of two equiconvergent series may be individually divergent, but the character of divergence is so similar that divergence cancels out in T1 − T2 .

321

A Few Facts from Harmonic Analysis

Theorem 12.34 Let f1 and f2 be two periodic functions that are equal in an interval (a, b). Then S[f1 ] and S[f2 ] are uniformly equiconvergent in every subinterval (a + ε, b − ε); S[f1 ] and S[f2 ] are uniformly equiconvergent in the wider sense in every (a + ε, b − ε). This is a corollary of Theorem 12.33, since, for example, S[f1 ] − S[f2 ] = S[f ] where f (= f1 − f2 ) vanishes in (a, b). Thus, if we change the values of f in an arbitrary way outside an interval (a, b), we do not affect the behavior of S[f ] in (a + ε, b − ε). Likewise for S[f ], although in this case, if the series converges, the value of the sum may change. Therefore, the convergence or divergence of S[f ] and S[f ] at a point x0 is a local property, that is, it depends only on the behavior of f near x0 .

12.4 Divergence of Fourier Series Theorem 12.35 There exists a continuous periodic f such that S[f ] diverges (more specifically, the partial sums of S[f ] are unbounded) at some point. Proof. Let 1 ≤ m < n and consider the polynomials

Qm,n (x) =

cos(m + n − 1)x cos mx cos(m + 1)x + + ··· + n n−1 1 −

cos(m + n + 1)x cos(m + n + 2)x cos(m + 2n)x − − ··· − . 1 2 n

We will show that all these polynomials are uniformly bounded, but that their partial sums are not. To prove the first statement, we need the fact that the partial sums of the series ∞ sin kx k=1

k

,

which we considered on p. 306 in Section 12.1, are uniformly bounded. This is an elementary fact that can be proved in many ways (see, e.g., Theorem 12.50(c)), but here we take it for granted. Thus, since

322

Measure and Integral: An Introduction to Real Analysis

Qm,n (x) =

n cos(m + n − k)x − cos(m + n + k)x

k

k=1

= 2 sin(m + n)x

n sin kx

k

k=1

,

we obtain |Qm,n (x)| ≤ C, where C is independent of m and n. On the other hand, when x = 0, the partial sum #

Qm,n (x) =

cos(m + n − 1)x cos mx + ··· + n 1

has the value 1 + (1/2) + · · · + (1/n), which is of order log n. Now select integers mk and nk such that mk + 2nk < mk+1

(k = 1, 2, . . .),

and choose a series of positive numbers αk such that αk < +∞, αk log nk → +∞. (We will make the construction in a moment.) The series ∞

αk Qmk ,nk (x)

k=1

then converges uniformly to a continuous function f . In view of the previous inequality relating mk and mk+1 , the polynomials Qmk ,nk do not overlap. Hence, the last series can be written as a single trigonometric series, whose coefficients (because of uniform convergence) are the Fourier coefficients of f . Thus, this series, unbracketed, is S[f ]. But S[f ] has unbounded partial # sums at x = 0 since a single block of terms, namely, αk Qmk ,nk (x), is of order αk log nk at x = 0. It is easy to verify that if we set 3

mk = 5k ,

3 nk = 2mk = 2 5k ,

αk = 1/k2 ,

then all the conditions required previously are fulfilled. This completes the proof. We leave it to the reader to check that if we choose 2

mk = 5k ,

nk = 2mk ,

αk = 1/k2

in the construction above, we get a continuous f whose partial sums are divergent but bounded at x = 0.

323

A Few Facts from Harmonic Analysis

Theorem 12.35 asserts that the partial sums of S[f ] can be unbounded even if f is continuous. It is of interest to know how unbounded they can be. From the formula sn (x, f ) =

π 1 f (x + t)Dn (t) dt, π −π

we see that if | f | ≤ 1, then π π 2 sn (x, f ) ≤ 1 |Dn (t)| dt = |Dn (t)| dt, π −π π 0

uniformly in x. The right side here is called the nth Lebesgue constant and will be denoted by Ln . Note that Ln is actually the value of sn (0, f ) for a specific f , namely, f (t) = sign Dn (t).

Theorem 12.36

We have

2 4 |Dn (t)| dt = 2 log n + O(1) Ln = π π π

as n → ∞.

0

Proof. Write 1 sin n + 1 t dt 2 2 sin 12 t 0 0 π π 1 1 dt 1 2 2 1 sin n + sin n + = t + t − dt. π 2 t π 2 2 sin 12 t t π

2 2 |Dn (t)| dt = Ln = π π π

0

0

−1 − t−1 is nonnegative and bounded for 0 < t ≤ π, and since Since 2 sin 12 t | sin n + 12 t| ≤ 1, the last integral is nonnegative and majorized by an abso lute constant. The change of variable n + 12 t = u shows that the preceding term equals 2 π

(n+(1/2))π 0

| sin u| du. u

324

Measure and Integral: An Introduction to Real Analysis

We may disregard the parts of this integral extended over (0, π) and 1 nπ, n + 2 π , since the integrand is bounded. In view of the periodicity of sin u, what remains can be written as n−1 nπ π 1 2 | sin u| 2 du = (sin u) du. ππ u π u + kπ 0

k=1

For 0 ≤ u ≤ π, the sum in brackets is contained between π−1 nk=2 (1/k) and −1 log n by an amount that is bounded in π−1 n−1 k=1 (1/k) and so differs from π π n and u. If we now note that 0 sin u du = 2, and collect estimates, we obtain Ln = (4/π2 ) log n + O(1).

Theorem 12.37

If f is integrable, then at each point x0 of continuity of f , sn x0 , f = o(log n).

The estimate is uniform over every closed interval of continuity of f . Proof. We will prove only the first statement, leaving the second to the reader. Suppose, as we may, that x0 = 0, f (x0 ) = 0. Because of our results about localization (see Theorem 12.34), we may assume that f vanishes outside an arbitrarily small fixed interval (−δ, δ). Then δ π 1 |sn (0)| = |Dn (t)| dt. f (t)Dn (t) dt ≤ sup | f (t)| · π −δ |t|≤δ −π Since the sup here is small with δ and the integral is of order log n, the assertion follows.

12.5 Summability of Sequences and Series Theorem 12.35 shows that even continuous functions, when developed into Fourier series, may not be representable by those series in terms of pointwise convergence. The situation can be remedied by considering generalized sums of the series. This topic is vast and basic for analysis, and we will study only a few facts important for the theory of Fourier series.

325

A Few Facts from Harmonic Analysis

Consider a fixed doubly infinite matrix of numbers (real or complex): α00 α10

α01 · · · α0n · · · α11 · · · α1n · · · .. .

(M )

αm0 αm1 · · · αmn · · · .. .

Given an infinite sequence of numbers s0 , s1 , . . . , sn , . . . , we transform it by using (M ) into a sequence σ0 , σ1 , . . . , σm , . . . by means of the formulas σm = αm0 s0 + αm1 s1 + · · · + αmn sn + · · ·

(m = 0, 1, 2, . . .),

assuming that the series defining σm converges for each m. We may ask what conditions on (M ) will guarantee that whenever {sn } converges to a finite limit s, lim σm also exists and equals s. An answer is given by the following theorem.

Theorem 12.38

Suppose that (M ) satisfies the following three conditions:

≤ A (for all m, with A independent of m), (ii) limm→∞ ( n αmn ) = 1, (iii) limm→∞ αmn = 0 for each n. (i)

n |αmn |

Then for any sequence {sn } converging to a finite limit s, lim σm exists and equals s. Theorem 12.38 is due to Toeplitz, and a matrix (M ) that satisfies (i)–(iii) is called a Toeplitz matrix. Proof. First of all, since {sn } is bounded, (i) implies that σm exists for each m. Next, write sn = s + εn , where εn → 0. Then σm =

αmn (s + εn ) = s

n

We have s

n αmn

αmn +

n

αmn εn .

n

→ s by (ii), and it remains only to show that the expression ρm =

n

αmn εn

326

Measure and Integral: An Introduction to Real Analysis

tends to 0 as m → ∞. Given δ > 0, split ρm into two sums, ρm =

αmn εn +

n≤n0

αmn εn = ρ m + ρ m ,

n>n0

say, where n0 is so large that |εn | ≤ δ for n > n0 . By (i), ρ ≤ |αmn | |εn | ≤ |αmn | δ ≤ Aδ. m n>n0

n>n0

On the other hand, ρ m consists of a fixed number of terms each of which, by (iii), tends to 0 as m → ∞. Hence, ρ m < Aδ for m large enough. Combining estimates, we see that ρm → 0, which completes the proof. It is useful to note that if s = 0, then condition (ii) is not required in the proof (and so in the statement of the theorem) above. It is also immediate from the proof that if {sn } depends on a parameter, and if {sn } tends uniformly to a bounded limit s, then {σm } tends uniformly to s too. If σm → s, we shall say that the sequence {sn } (or the series whose partial sums are the sn ) is summable to limit (sum) s by means of the matrix (M ), or simply is summable (M ) to s. The matrix (M ) is called positive if αmn ≥ 0 for all m, n. Condition (i) is then a corollary of (ii). For positive (M ), Theorem 12.38 also holds if s = ±∞; we leave the proof to the reader. Two methods of summability are of special significance for Fourier series. (a) The method of the arithmetic mean. Given s0 , s1 , . . . , sn , . . ., consider the arithmetic means σ0 , σ1 , . . . , σm , . . . defined by σm =

s0 + s1 + · · · + sm m+1

(m = 0, 1, 2, . . .).

If sn → s (−∞ ≤ s ≤ +∞), then σm → s. This is clearly a special case of Theorem 12.38; the matrix is positive. It is useful (see, e.g., the comments following the proof of Theorem 12.44) to note that if the sn are the partial sums of a series ∞ k=0 uk , then u0 + (u0 + u1 ) + · · · + (u0 + u1 + · · · + um ) s0 + s1 + · · · + sm = m+1 m+1 m 1 = (m + 1 − k) uk . m+1

σm =

k=0

327

A Few Facts from Harmonic Analysis

Thus, m σm = 1− k=0

k uk , m+1

1 kuk . m+1 m

s m − σm =

(12.39)

k=0

(b) The method of Abel. Given a series u0 + u1 + · · · + un + · · · , consider the power series f (r) =

∞

un rn ,

0 ≤ r < 1,

n=0

assuming that it converges for 0 ≤ r < 1. If f (r) → s as r → 1, we say that un is Abel summable (or A-summable) to sum s. The method can also be applied to sequences since any sequence {sn } can be written as the partial sums of the series s0 + (s1 − s0 ) + (s2 − s1 ) + · · · . Let us now see the relation of Abel summability to the general scheme. We claim that for 0 ≤ r < 1, the formula ∞

n

un r = (1 − r)

n=0

∞

sn rn

(sn = u0 + · · · + un )

(12.40)

n=0

is valid assuming only that one of the two series that appear is convergent. If the right side converges, it equals ∞

sn rn −

n=0

∞

sn rn+1 =

n=0

∞

sn rn −

n=0

= s0 +

∞

sn−1 rn

n=1 ∞

(sn − sn−1 ) rn =

n=1

∞

un rn .

n=0

n Conversely, if ∞ some r, 0 < r < 1, its Cauchy product n=0 un r converges for n −1 converges to sum with the absolutely convergent series ∞ n=0 r = (1 − r) n ∞ n=0

k=0

k

uk r · r

n−k

=

∞

n

(u0 + u1 + · · · + un ) r =

n=0

∞

sn rn .

n=0

This proves (12.40). Now, if {rm } is any sequence tending to 1, 0 < rm < 1, then the positive numbers αmn = (1 − rm ) rnm

328

Measure and Integral: An Introduction to Real Analysis

satisfy conditions (i), (ii), (iii) of Theorem 12.38. We leave the verification to the reader.

Theorem 12.41 (Abel) If is A-summable to s.

∞

n=0

un converges to sum s, −∞ ≤ s ≤ +∞, then it

Proof. Suppose first that s is finite. Applying (12.40), we have to show that n (1 − r) ∞ n=0 sn r → s as r → 1. It is enough to prove that this relation holds for any sequence r = rm , m = 0, 1, . . . , where 0 < rm < 1, rm → 1. This is a corollary of Theorem 12.38 since the numbers αmn = (1 − rm )rnm satisfy (i), (ii), (iii). The matrix αmn is positive, and so the proof holds for s = ±∞, the only prerequisite being that the series un rn converges for 0 ≤ r < 1. We may also consider the power series f (z) =

∞

un zn ,

n=0

where z is a complex variable lying in the unit disc: z = reix , 0 ≤ r < 1. If f (z) tends to a limit s as z tends nontangentially to 1, that is, as z → 1 in such a way that |1 − z| ≤ C < +∞ (|z| < 1), 1 − |z| then ∞ n=0 un is said to be nontangentially Abel summable to sum s. The last inequality means that, in approaching 1, z remains between two chords of the unit circle through z = 1. In fact, if z = x + iy is a pointthat satisfies 0 < x < 1

and |1 − z| ≤ C(1 − |z|), then |y| < C(1 − x) since |y| < y2 + (1 − x)2 = |1 − z| and C(1 − |z|) ≤ C(1 − x). Conversely (see Exercise 23(a)), given a constant γ > 0, there are constants C and δ with C > 0 and 0 < δ < 1 such that if z = x+iy with |z| < 1, 1−δ < x < 1 and |y| < γ(1−x), then |1−z| ≤ C(1−|z|). See (12.65) for another characterization of the notion of nontangential approach of z to 1.

Theorem 12.42 (Abel–Stolz) If ∞ n=0 un converges to a finite sum s, then it is nontangentially Abel summable to s. Proof. The proof is identical tothat of Abel’s theorem, except that now we use the formula un zn = (1 − z) sn zn and consider any sequence {zm } tending

329

A Few Facts from Harmonic Analysis

to 1 from the interior of the unit disc. The matrix αmn is now (1 − zm )znm , conditions (ii) and (iii) of Theorem 12.38 are satisfied as before, and (i) takes the form |1 − zm | ≤ C. 1 − |zm | Theorem 12.43 If ∞ mean n=0 un is summable by the method of the arithmetic to sum s, then it is A-summable to s. If in addition s is finite, then ∞ n=0 un is nontangentially A-summable to s. Proof. Suppose that s is finite. By hypothesis, σn =

s0 + s1 + · · · + sn → s. n+1

Write s0 + s1 + · · · + sn = tn . Applying formula (12.40) twice, we have ∞ n=0

un rn = (1 − r)

∞ n=0

sn rn = (1 − r)2

∞

tn rn = (1 − r)2

n=0

∞ (n + 1)σn rn . n=0

Again, it is enough to consider any sequence rm → 1, 0 < rm < 1. We then have to apply Theorem 12.38 with matrix αmn = (1 − rm )2 (n + 1)rnm , and we easily verify that (αmn ) satisfies conditions (i), (ii), (iii) of Theorem 12.38. The rest of the proof of the theorem is left to the reader. While convergence of a series implies summability A, the converse is gen 1 n erally false: for example, ∞ n=0 (−1) diverges but is A-summable to sum 2 ∞ since n=0 (−r)n = 1/(1 + r) → 12 as r → 1−. If one makes additional assumptions on the terms of the series, however, the converse will hold. The following result is both elementary and useful. Theorem 12.44 (Tauber) If un is A-summable to sum s, −∞ ≤ s ≤ +∞, and if un = o(1/n) as n → ∞, then un converges to sum s.

330

Measure and Integral: An Introduction to Real Analysis

Proof. Write un = εn /n, n > 1, where εn → 0. Let rm be a sequence tending to 1, which we shall determine in a moment. Then sm − f (rm ) is a transformation of the sequence εn : sm − f (rm ) =

m εn n=1

n

−

∞ εn

n

n=1

rnm =

∞

αmn εn ,

n=1

where αmn =

1 1 − rnm n

if n ≤ m,

1 αmn = − rnm n

if n > m.

If we verify conditions (i) and (iii) of Theorem 12.38, then the fact that εn → 0 will give sm − f (rm ) → 0, and so also sm → s. Condition (iii) is obvious for any {rm } → 1. As for (i), observing that 1 − rn = (1 − r) 1 + · · · + rn−1 ≤ (1 − r)n, we have n

|αmn | ≤

m ∞ 1 1 n r (1 − rm ) n + n n m n=1

n=m+1 ∞

≤ m (1 − rm ) +

1 n rm m+1 n=0

1 1 = m (1 − rm ) + . m + 1 1 − rm Hence, if we choose rm = 1 − (1/m), then holds, and the theorem follows.

n |αmn |

≤ 2. Thus, condition (i)

un is summable by the method of the arithmetic mean and nun → 0, then If un converges. Of course, this is a corollary of Theorems 12.43 and 12.44, but a direct proof is on the surface: By (12.39), 1 nun , m+1 m

s m − σm =

n=0

and the assumption nun → 0 clearly implies sm − σm → 0. Thus, if σm → s, then also sm → s. Actually, this argument shows that if nun → 0, then whether {σm } converges or not, the difference sm −σm tends to 0, that is, the behavior of {sm } imitates that of {σm }. The same argument shows that if {σm } is a bounded

331

A Few Facts from Harmonic Analysis

sequence, and |un | ≤ A/n for n = 1, 2, . . . , then the sequence {sm } is bounded. The result that follows lies deeper.

Theorem 12.45 (Hardy) If mean to a finite sum s and if

un is summable by the method of the arithmetic

|un | ≤ then

A n

(n = 1, 2, . . .),

un converges to s.

Proof. Consider the expressions (which we shall call the delayed arithmetic means) σn,k =

sn+1 + sn+2 + · · · + sn+k . k

They are easily expressible in terms of the σn :

σn,k

s0 + · · · + sn+k − (s0 + · · · + sn ) n+k+1 n+1 = σn+k − σn = k k k n+1 σn+k − σn + σn+k . = k

It is clear that if kn is any sequence of integers such that n/kn is bounded as n → ∞, then σn → s implies that σn,kn → s. Using the definition of σn,k , we also deduce that

σn,k

(sn+1 − sn ) + · · · + sn+k − sn = sn + k 1 (k − j + 1)un+j . k k

= sn +

j=1

Hence, assuming as we may that A = 1, k σn,k − sn ≤ un+j ≤ j=1

k . n+1

332

Measure and Integral: An Introduction to Real Analysis

Let k = kn = [εn], where ε > 0 is arbitrarily small and fixed, and [x] designates the integral part of x. Then n/kn is bounded, and so σn,kn → s. But by taking kn = [εn] in the last estimate, we obtain lim sup σn,kn − sn ≤ ε. n→∞

Hence, sn → s, and the proof is complete. We remark in passing that the conclusion of Hardy’s theorem is true if the assumption of summability by the method of the arithmetic mean is replaced by A-summability (theorem of Littlewood)∗ .

12.6 Summability of S[f ] and S[f ] by the Method of the Arithmetic Mean Given a periodic f , we denote by sn (x) = sn (x, f ) the partial sums of S[f ] and by σn (x) = σn (x, f ) their arithmetic means. Thus (see p. 315 in Section 12.3 for the definition of the Dirichlet kernel Dn ), sn (x) =

π π 1 1 f (t)Dn (x − t) dt = f (x + t)Dn (t) dt, π −π π −π

σn (x) =

π π 1 1 f (t)Kn (x − t) dt = f (x + t)Kn (t) dt, π −π π −π

where, by using (12.26), n n 1 1 1 t. Dj (t) = sin j + Kn (t) = n+1 2 (n + 1)2 sin 12 t j=0 j=0 Note that σ0 (x) = a0 /2 and K0 (t) = 1/2. Multiplyingand dividing the last sum

termwise by 2 sin 12 t and using the equation 2 sin j + cos(j + 1)t, we get 1 − cos(n + 1)t 2 Kn (t) = 2 = n + 1 (n + 1) 2 sin 12 t

1 2

t sin 12 t = cos jt −

sin[(n + 1)t/2] 2 sin 12 t

2 .

(12.46)

∗ See A. Zygmund, Trigonometric Series, vol. 1, 2nd edn., Cambridge University Press,

Cambridge, U.K., 1968, p. 81.

A Few Facts from Harmonic Analysis

333

The trigonometric polynomial Kn (t) is called the nth Fejér kernel. An analogous kernel was considered in (9.11) for nonperiodic functions. The formula σn (x, f ) =

π π 1 1 f (t)Kn (x − t) dt = f (x − t)Kn (t) dt π −π π −π

is a periodic version of the notion of convolving a function and a kernel. Some of the facts we will prove below are similar to ones we have already had in Chapter 9, but rather than connecting the present case with those results, we shall give brief direct proofs of the theorems we need. j Using the formula Dj (t) = 12 + m=1 cos mt, j ≥ 1, the Fejér kernel can be written (see (12.39)): Kn (t) =

n n m |m| 1 1 1− 1− + cos mt = eimt , 2 n+1 2 m=−n n+1 m=1

which should be interpreted as 1/2 in case n = 0. Kn has the following properties: (a) Kn (t) ≥ 0; Kn (−t) = Kn (t). π (b) (1/π) −π Kn (t) dt = 1. (c) Kn (t) ≤ (n + 1)/2; Kn (t) ≤ A/[(n + 1)t2 ] (0 < |t| ≤ π; A an absolute constant). Here, (a) and (b) are obvious from the various previously mentioned formulas for Kn . The first part of (c) follows from the formula Kn (t) = (n + 1)−1 nj=0 Dj (t) together with the obvious estimate |Dj | ≤ j + 12 and the identity nj=0 j = n(n + 1)/2. The second part follows from (12.46) if we note that | sin u| ≥ (2/π)|u| for 0 ≤ |u| ≤ π/2. From the second inequality in (c), we immediately deduce (c’)

δ≤|t|≤π Kn (t) dt

→ 0 as n → ∞ for any fixed δ, 0 < δ ≤ π.

These properties of Kn lead to the next result, which is basic and related to Theorem 9.9.

Theorem 12.47 (Fejér) Let f be integrable and periodic. Then σn (x) → f (x)

334

Measure and Integral: An Introduction to Real Analysis

at each point of continuity of f , and the convergence is uniform over every closed interval of continuity. In particular, σn (x) tends to f (x) uniformly everywhere if f is continuous everywhere. If f has a jump discontinuity at x0 , then σn (x0 ) →

1 [f (x0 +) + f (x0 −)] . 2

Proof. Suppose f is continuous on a closed interval I = [a, b] (which may reduce to a point). Given ε > 0, we can find δ > 0 so that | f (x + t) − f (x)| < ε for x ∈ I, |t| < δ. Using (b), we can write (assuming as we may that δ ≤ π) π 1 σn (x) − f (x) = [f (x + t) − f (x)]Kn (t) dt π −π

=

1 1 + π π |t| α ∪ x : f > α ⊂ x : h > α = S ∪ T. x : 2 2

Recall that 0 ≤ g ≤ 2α; we also have Hence, by Tchebyshev’s inequality, |S| ≤

1 α 2

=

Qg

−2

Qf

since

Qh

= 0 by (12.71)(b).

g2

Q

4 8 8 ≤ 2 g2 ≤ g= f. α α α Q

Q

Q

To estimate |T|, let T1 = T ∩ P∗ , T2 = T ∩ Q∗ . Thus, |T| = |T1 | + |T2 |, and by (12.77), |T2 | ≤ |Q∗ | ≤

2 f. α Q

358

Measure and Integral: An Introduction to Real Analysis

On T1 , | h(x)| is majorized a.e. by (see (12.76)) Aα

Q

δ(t) δ(t) dt = Aα dt = AαM(x), (x − t)2 (x − t)2 ∗ Q

and if | h(x)| is to be ≥ 12 α there, then necessarily M(x) ≥ 1/(2A). But M(x) is the integral Mλ (x) of Theorem 12.72 corresponding to λ = 1, G = Q∗ , F = P∗ . Therefore, by the estimate given in Theorem 12.72 and (12.77),

M ≤ 2|Q∗ | ≤

P∗

4 f, α Q

∗ and by Tchebyshev’s inequality, the M(x) subset of P where−1 ≥ 1/(2A) has −1 f . Thus, |T | ≤ 8Aα measure at most 2A P∗ M ≤ 8Aα 1 Q Q f , and collecting estimates, we have

! " c f (x) > α ≤ f 1 (c = 10 + 8A). x : α This was proved for α ≥ (2π)−1 Q f but is trivially true for smaller α since our set certainly has measure ≤ 2π, while 2π ≤ α−1 Q f for such α. This completes the proof of Theorem 12.67. The theorem is strengthened by the following result.

Theorem 12.78

If f ∈ L, then the maximal conjugate function fε (x) f∗ (x) = sup 0 α/2 . g∗ > α/2 + ω(α) ≤

360

Measure and Integral: An Introduction to Real Analysis

By Theorem 12.67, with Q = (−π, π), ! " h∗ > α/2 ≤ A (α/2)−1 |h| = Aα−1

| f |,

{| f |>α}

Q

and by Tchebyshev’s inequality and Theorem 12.62,

−2 2 g∗ > α/2 ≤ α/2 g∗ ≤ Aα−2

Q

g2 = Aα−2

f 2.

{| f |≤α}

Q

We have (see Theorem 5.51 and Exercises 16 of Chapter 5 and 5 of Chapter 6) p || f∗ ||p = −

∞

αp dω(α) = p

0

∞

αp−1 ω(α) dα.

0

Using the estimates above, the last integral is majorized by

Ap

∞

⎛

⎞

αp−1 ⎝α−1

| f | dx⎠ dα + Ap

{| f |>α}

0

∞

⎛

αp−1 ⎝α−2

⎞ f 2 dx⎠ dα.

{| f |≤α}

0

Interchanging the order of integration, we can write this as

Ap

Q

⎛ | f (x)| ⎝

| f(x)|

⎞ αp−2 dα⎠ dx + Ap

0

Q

⎛ f 2 (x) ⎝

∞

⎞ αp−3 dα⎠ dx

| f (x)|

⎧ ⎫ ⎨ 1 ⎬ 1 = Ap | f (x)|p dx + | f (x)|p dx , ⎩p − 1 ⎭ 2−p Q

Q

due to the fact that 1 < p < 2. It follows that (12.80)(b), and so also (12.80)(a), holds with , + 1 1 p + (1 < p < 2), (12.82) Ap = Ap p−1 2−p which proves the lemma. It is not surprising that Ap becomes infinite as p → 1 since as we have observed, the integrability of f does not imply that of f . On the other hand,

361

A Few Facts from Harmonic Analysis

in view of the validity of (12.80)(b) for p = 2, it is natural to expect that there is a better estimate for Ap near p = 2 than (12.82) (which becomes infinite as p → 2). This we shall see below. Meanwhile, note that the exponent 2 plays a rather accidental role in Lemma 12.81. In fact, if we knew (12.80)(b) for any p0 > 1, then the previous argument would give it for 1 < p ≤ p0 , with Ap bounded for p away from p = 1 and p = p0 . The only change necessary in the proof is replacing the exponent 2 in the argument for g by p0 . Moreover, if instead of (12.80)(b) we only knew (12.80)(a) for some p0 , then by applying the same argument to f instead of f∗ , we would obtain (12.80)(a) for 1 < p ≤ p0 , with Ap bounded for p away from 1 and p0 . We will use this idea below. Inequality (12.80)(b) for any p implies || fε ||p ≤ Ap || f ||p

(12.83)

fε g| for for the same p and all ε > 0. From the fact that || fε ||p equals supg | Q all g with ||g||p ≤ 1, 1/p + 1/p = 1, and the easily verifiable formula

fε g = −

Q

f gε

Q

(apply Fubini’s theorem), it follows that if (12.83) holds for any p, 1 < p < ∞, then it also holds for the conjugate exponent p , and Ap = Ap (see also Exercise 16 of Chapter 10). But we proved (12.83) for 1 < p ≤ 2. Hence, it also holds for 2 ≤ p < ∞, and an observation analogous to the one made earlier just before (12.83), but this time for fε rather than f , shows that the constant Ap in (12.83) remains bounded for p away from 1 and ∞. Using (12.82), we thus see that the Ap in (12.83) satisfies the inequality Ap ≤

A p−1

(1 < p ≤ 2),

(12.84)

for p ≥ 2.

(12.85)

and so also (since Ap = Ap ) Ap ≤ Ap

f (x) a.e. as ε → 0 for any integrable f , (12.83) leads to the Since fε (x) → basic inequality ||f ||p ≤ Ap || f ||p , 1 < p < ∞, where Ap satisfies the two previous estimates. This proves (12.80)(a); we still have to prove (12.80)(b) for 2 < p < ∞ and the formula S[f ] = S[ f ] for f ∈ Lp , p > 1. p Let us show that if f ∈ L , p > 1, then S[f ] = S[ f ]. We may assume that p < 2. In view of Theorem 12.56, lim σn (x) exists and equals f (x) a.e. for f merely integrable. Next, by Theorem 12.61(ii), | σn (x)| is majorized by f∗ (x) + cf ∗ (x),

362

Measure and Integral: An Introduction to Real Analysis

an integrable function in our case since f ∈ Lp , 1 < p < 2. Hence, by the dominated convergence theorem, each Fourier coefficient of σn tends to the corresponding Fourier coefficient of f . But it also tends to the correspond ikx ing coefficient of S[f ]; in fact by (12.39), if f ∼ ck e , then the kth Fourier coefficient of σn equals [1 − |k|/(n + 1)](−i sign k)ck if |k| ≤ n and equals 0 if |k| > n. Thus, S[f ] = S[ f ]. To complete the proof of Theorem 12.79, it remains to show that (12.80)(b) holds for 1 < p < ∞ (we have shown it only for 1 < p ≤ 2), with Ap bounded for p away from 1 and ∞. It is immediate (see the proof of Theorem 12.62) that if 1/(n + 1) ≤ ε ≤ 1/n, then | fε (x) − f1/n (x)| is majorized by f ∗ (x). Hence, by Theorem 12.61(ii), f∗ (x) ≤ cf ∗ (x) + sup | σn (x)| . n

By Theorem 12.61(i) applied to S[f ] = S[ f ], ∗ sup f ≤c f (x). σn (x, f ) = σ∗ x, n

Hence,

! ∗ " f∗ ≤ c f ∗ + , f + ∗ , f || f∗ ||p ≤ c || f ∗ ||p + p

≤ c Cp {|| f ||p + || f ||p }, where Cp is the constant of the Hardy–Littlewood maximal Theorem 12.60. In view of (12.80)(a), the inequality || f∗ ||p ≤ Bp || f ||p is thus established for all p, 1 < p < ∞, with Bp = cCp [1 + Ap ], Ap being the constant in (12.80)(a). Combining the estimates (12.84) and (12.85) for Ap with the estimate for Cp on p. 228 in Section 9.3, it follows that Bp is bounded for p away from 1 and ∞. In particular, since Cp remains bounded as p → ∞, it follows that Bp is O(p) as p → ∞. To estimate Bp as p → 1, it is best to use the estimate derived in the proof of Lemma 12.81 (see (12.82)); thus, Bp = O(1/(p − 1)) as p → 1, so that Bp satisfies the same sorts of estimates as Ap . This completes the proof of Theorem 12.79. We have the following important corollary.

Corollary 12.86

fε converges in Lp norm to f. Let f ∈ Lp , 1 < p < ∞. Then

This is an immediate consequence of the convergence theorem dominated fε ≤ f∗ ∈ Lp . since fε converges pointwise a.e. to f and

363

A Few Facts from Harmonic Analysis

12.10 Application of Conjugate Functions to Partial Sums of S[f ] The behavior of the partial sums sn (x) = sn (x, f ) is a much more delicate topic than the behavior of the arithmetic means. We will consider only the question of the convergence of sn to f in the metric Lp . The main tool here is a connection between the partial sums and the conjugate function. Instead of sn , it will be convenient to consider the expressions (see (12.27)) #

sn (x) =

sn (x) + sn−1 (x) 1 = sn (x) − (an cos nx + bn sin nx) 2 2

(12.87)

when n ≥ 1. We have #

sn (x) = =

π 1 Dn (x − t) + Dn−1 (x − t) dt f (t) π −π 2 π 1 sin n(x − t) dt f (t) π −π 2 tan 12 (x − t)

π π 1 f (t) cos nt 1 f (t) sin nt = sin nx · dt − cos nx · dt. π −π 2 tan 12 (x − t) π −π 2 tan 12 (x − t)

The last decomposition was purely formal, but from Section 12.8, we know that the cofactors of sin nx and cos nx exist a.e. in the principal value sense (and represent the conjugate functions of f (t) cos nt and f (t) sin nt, respectively). Theorem 12.88 (M. Riesz) If f ∈ Lp , 1 < p < ∞, then (i) ||sn ||p ≤ c|| f ||p , (ii) || sn ||p ≤ c|| f ||p ,

|| f − sn ||p → 0, || f − sn ||p → 0,

where c depends only on p. Proof. It is enough to prove (i), which implies (ii) since sn (x, f ) = sn (x, f) # and ||f ||p ≤ Ap || f ||p , 1 < p < ∞. The first part of (i) with sn instead of sn #

is an immediate corollary of the last formula for sn and the inequality

364

Measure and Integral: An Introduction to Real Analysis

f p ≤ Ap f p : in fact, letting gn (t) = f (t) cos nt, hn (t) = f (t) sin nt in the formula, we obtain sn (x) = sin nx gn (x) − cos nx hn (x) #

(a.e.),

# ||sn ||p ≤ || gn ||p + || hn ||p ≤ Ap (||gn ||p + ||hn ||p ) ≤ Ap || f ||p #

for 1 < p < ∞. Since, by (12.87), sn − sn p ≤ A f 1 ≤ A f p , we obtain sn p ≤ Ap f p . The relation || f − sn ||p → 0 is obvious for functions that have a continuous derivative (see Theorem 12.20), and since such functions are dense in Lp , it also follows in the general case.

Exercises 1. Prove the following versions of Theorems 12.13 and 12.14. (a) If f ∼ 12 a0 + ∞ k=1 (ak cos kx + bk sin kx), and f is the indefinite integral of f , then f ∼ ∞ k=1 k(bk cos kx − ak sin kx). ∞ 1 (b) If f ∼ 2 a0 + k=1 (ak cos kx + bk sin kx), and F is the indefinite integral of f , then ∞

1 1 1 (ak sin kx − bk cos kx). F(x) − a0 x ∼ A0 + 2 2 k k=1

2. Prove that if f (x) ∼

ck eikx , then f (x + α) ∼

ck eikα eikx . 3. If f is real or complex-valued and f ∼ 12 a0 + ∞ k=1 (ak cos kx + bk sin kx), show that 2π ∞ 1 1 |ak |2 + |bk |2 . | f |2 = |a0 |2 + π 2 k=1

0

4. Deduce from (12.18) that if f , g ∈ L2 , f ∼

ck eikx , g ∼

2π 1 f (t)g(x − t) dt = ck dk eikx . 2π 0

dk eikx , then

365

A Few Facts from Harmonic Analysis

Thus, a Fourier coefficient of the convolution of f and g equals the product of the corresponding coefficients of f and g. (For nonperiodic versions of this result, see Theorems 13.30 and 13.59.) 5. Prove the following. (a) If f (x) is periodic and equal to sign x in (−π, π), then f (x) ∼

+ , sin 3x sin 5x 4 sin x + + + ··· . π 3 5

(b) Let 0 < h < 12 π, and let f be the triangular function defined as follows: f is periodic, even, continuous, f (0) = 1, f (x) = 0 for 2h ≤ x ≤ π, and f is linear in (0, 2h). Then 2h f ∼ π

2 ∞ +∞ h 1 sin kh 2 ikx sin kh + cos kx = e . 1+ 2 kh π kh −∞ k=1

(c) Let g be periodic and equal to

g∼

1 2

log 1/ 2 sin 12 x in (−π, π). Then

∞ cos kx k=1

k

.

(For (b), the coefficients can be computed directly, or by using Exercise 4 and Example (b), Section 12.1, p. 306. Observe that the convolution of the characteristic function of an interval with itself is a triangular function. For (c), one may either integrate by parts in the formula for the cosine coefficients of g or consider the real part of the series ∞ k z k=1

k

= log

1 , 1−z

z = reix ,

for r < 1, and then let r → 1.) 6. Using the formula for the Fourier series of 12 (π − x) given in Example (a), Section 12.1, p. 306, prove the formulas ∞ 1 π2 , = 2 6 n n=1

∞ 1 π4 , = 4 90 n n=1

∞ 1 = π2k Bk for some rational Bk n2k n=1

(k = 1, 2, . . .).

366

Measure and Integral: An Introduction to Real Analysis

7. Prove that each of the systems (a) 12 , cos x, cos 2x, . . . , cos kx, . . . (b) sin x, sin 2x, . . . , sin kx, . . . is orthogonal and complete over (0, π). 8. Let {φj (x)} and {ψk (y)} be two orthogonal systems: the first over a set A ⊂ Rm and the second over a set B ⊂ Rn . Then the (double) system ωj,k (x, y) = φj (x)ψk (y) is orthogonal over the Cartesian product C = A × B ⊂ Rm+n . If both {φj } and {ψk } are complete, so is {ωj,k }. Generalize this to the case of more than two orthogonal systems. 9. Let {(k1 , k1 , . . . , kn )} be all lattice points in the space Rn (i.e., all distinct points with integral coordinates). Then the system exp {i (k1 x1 + k2 x2 + · · · + kn xn )} is orthogonal and complete over any cube in Rn with edge length 2π. ikx 10. If f ∼ ck eikx ∈ L2 and g ∼ dk e ∈ L2 , then h = fg is integrable, and inx if h ∼ Cn e , then Cn =

+∞

ck dn−k ,

k=−∞

where the series on the right converges absolutely. (For n = 0, this 2π means (1/2π) 0 fg = ck d−k , which is a variant of (12.18).) See also Exercises 17 and 18. 11. Let f ∼ ck eikx ∈ L2 . For each n, let γn =

k=n

ck

1 1 = cn−k . n−k k k=0

Show that {γn } ∈ l2 and, more precisely, that |γn |2 ≤ π2 |ck |2 . The numbers γn are discrete analogues of the formal Hilbert integral [f (t)/(x − t)] dt; see Section 3 of Chapter 13. (The numbers γn are the Fourier coefficients of the function fg, where g ∼ k=0 eikx /k = i(π − x) for 0 < x < 2π (see Example (a), Section 12.1, p. 306), and we have 2π 0

| fg|2 ≤ π2

2π 0

| f |2 .)

367

A Few Facts from Harmonic Analysis

12. Let f and g be periodic, f ∈ Lp , g ∈ Lp , 1 ≤ p ≤ ∞, 1/p + 1/p = 1. Consider the product hx (t) = f (x+t)g(t) as a function of t. Show that the nth Fourier coefficient of hx (t) tends to 0 as n → ∞, uniformly in x. (Show that the L1 -modulus of continuity of h tends to 0 uniformly in x; apply (12.22).) 13. Let f be periodic and integrable. Show that for the partial sums sn and sn , we have the following formulas: π (a) sn (x) = (1/π) −π f (x + t) [(sin nt)/t] dt + εn (x), where εn (x) tends to 0 uniformly in x as n → ∞. π (b) sn (x) = (1/π) −π f (x + t) [(1 − cos nt)/t] dt + ηn (x), where ηn (x) tends to 0 uniformly in x. (For (a), except for an error which is o(1) uniformly in x, we have sn (x) =

π 1 sin nt dt, f (x + t) π −π 2 tan 12 t

and since 12 cot 12 t− 1t is bounded in (−π, π), the result follows easily from Exercise 12 with p = 1, p = ∞.) π n (t)| dt, D n denoting the conjugate Dirichlet kernel. 14. Let Ln = 2π−1 0 |D Prove the following analogue of Theorem 12.36: 2 Ln = log n + o(log n) π

as n → ∞.

n )| = max{| Show also that Ln = | sn (0, sign D sn (0, f )| : f with | f | ≤ 1}. 15. Show that if x is a point of continuity of an integrable f , then sn (x) = o(log n). If f has a jump discontinuity at x and the value of the jump is d = f (x+) − f (x−), show that sn (x) = −(d/π) log n + o(log n). (The second statement follows from the first by considering [if, e.g., x = 0] the function h(t) = 12 (π − t) of Example (a), Section 12.1, p. 306, whose conjugate function h satisfies h = −g, where g is the function of Exercise 5(c).) 16. Prove the following more general form of Theorem 12.38. Suppose that lim inf sn = s and lim sup sn = s¯ are finite. Then both lim inf σn and lim sup σn are contained between the numbers 1 1 (s + s¯) ± A (¯s − s), 2 2 where A is the same as in condition (i) of Theorem 12.38.

17. Prove the following extension of Exercise 10. Let f ∈ Lpand g ∈ Lp , 1 < p < ∞, 1/p + 1/p = 1 (thus also 1 < p < ∞). If f ∼ ck eikx and

368

Measure and Integral: An Introduction to Real Analysis ikx g∼ dk e , then the Fourier coefficients Cn of the (integrable) function h = fg are given by Cn =

∞

ck dn−k =

k=−∞

M

lim

M→+∞

.

k=−M

Thus, the Cn are the same as in Exercise 10, but the series representing the Cn are no longer claimed to be absolutely convergent, and we must consider the limits of their symmetric partial sums. (The proof is parallel to that of Exercise 10. We write Cn =

2π 2π 2π 1 1 1 fge−inx = (f − sM )ge−inx + sM geinx , 2π 2π 2π 0

0

0

where sM = sM (x, f ), observe that the first integral on the right is 2π majorized by 0 | f − sM | |g| , apply Hölder’s inequality, and use the fact that || f − sM ||p → 0 by Theorem 12.88.) 18. The result of Exercise 17 is valid when p = 1, p = ∞ (or when p = ∞, p = 1), but the series defining the Cn must be taken in the sense of the (symmetric) first arithmetic means: Cn =

lim

M→+∞

M

ck 1 −

k=−M

|k| dn−k . M+1

19. We know that Theorem 12.79 is false for p = 1. Show that it is also false for p = ∞, that is, that the conjugate function of a bounded function (sin kx)/k ∼ need not be bounded. (Consider, e.g., the two series ∞ k=1 ∞ 1 1 1 k=1 (cos kx)/k ∼ 2 log(1/|2 sin 2 x|), −π < x < π 2 (π−x), 0 < x < 2π, and [see Exercise 5].) 20. There is a substitute result for Theorem 12.79 in case p = ∞. Let f be a periodic function with | f | ≤ 1. Then there are absolute constants λ, μ > 0 such that π

eλ|f | ≤ μ.

−π

See also Exercise 27 in Chapter 14. (Write

eλ|f | − λ| f| − 1 =

∞ λn | f |n n=2

n!

369

A Few Facts from Harmonic Analysis

and integrate termwise. Use (12.80)(a), (12.85), and the fact that nn ≤ cn n!.) 21. Suppose that f is a periodic function for which π

| f | log+ | f | < +∞,

−π

where log+ stands for the positive part of log. This clearly implies that f ∈ L1 . Show that f ∈ L1 and that there are absolute constants A and B such that π

π

| f| ≤ A

−π

| f | log+ | f | dx + B.

−π

(Write ω(α) = |{x : |x| < π, | f (x)| > α}|. Then π ∞ 2 ∞ f = ω(α) dα = + . −π

0

0

2

For the first integral on the right, use the fact that ω(α) ≤ 2π, and for the second, use an argument like that in the proof of Lemma 12.81.) 22. The discussion that precedes Theorem 12.31 shows only that the averages ( sn (x) + sn−1 (x))/2 converge. Give the remaining details of the proof of sn−1 (x).) Theorem 12.31. (Consider sn (x) − 23. (a) Prove the statement about nontangential approach made before Theorem 12.42, namely, that given γ > 0, there exist C, δ > 0 with δ < 1 such that |1 − z| ≤ C(1 − |z|) if z = x + iy, |z| < 1, 1 − δ < x < 1 and |y| < γ(1 − x). (It may be helpful to use the asymptotic estimates | sin θ| |θ| and 1 − cos θ θ2 /2 as θ → 0.) (b) Verify that (12.65) characterizes the nontangential approach of z = reix to 1, |z| < 1.

13 The Fourier Transform In this chapter, we will study properties of the Fourier transform f (x) of a function f on Rn , n ≥ 1, defined formally (for the moment) as f (x) =

1 f (y) e−ix·y dy, (2π)n n

x ∈ Rn .

(13.1)

R

n

Here x · y = of x = (x1 , . . . , xn ) and y = 1 xk yk is the usual dot product √ f may (y1 , . . . , yn ), and i is the complex number i = −1 = eiπ/2 . Both f and be complex-valued. Different normalizations of f are common in the literature, such as 1 f (y)e−ix·y dy, (2π)n/2 n R

f (y)e2πix·y dy, . . . ,

Rn

but the important properties of f are unaffected by normalization, and passing from one normalization to another is easy by scaling. We will often abuse notation by denoting f (x) = f (x)

or f (x) = f (x)

instead of the more cumbersome notations f (y)(x), (f (·))(x), etc. For example, we will do this in Theorem 13.8 when computing the Fourier transform 2 −|x|2 and (e−|x|2 ) are somewhat simpler than of e−|x| since the notations e (e−| · | )(x). Note that f (x) is a formal analogue of the sequence {cj }∞ −∞ of trigonometric Fourier coefficients of a periodic function on the line, with the continuous variable x now playing the role of j. One of our main goals is to derive an analogue of Parseval’s formula (12.15), that is, to prove that the mapping f → f is essentially an isometry on L2 (Rn ). Of course, an important requirement for achieving this is to find an interpretation of f in case f ∈ L2 (Rn ). Unlike the formulas for Fourier coefficients in the one-dimensional periodic case, the integral in (13.1) may not converge absolutely for every f ∈ L2 (Rn ). However, as is easy to see, (13.1) does converge absolutely if f ∈ L1 (Rn ). Properties of f when f ∈ L1 (Rn ) are simpler to derive precisely because of this absolute convergence, and we will 2

371

372

Measure and Integral: An Introduction to Real Analysis

begin with that case. Furthermore, properties of f when f ∈ L1 (Rn ) will be useful later for studying f for other classes of functions f .

13.1 The Fourier Transform on L1 In this section, we list some properties of the Fourier transform of functions in L1 (Rn ). (1) Let f ∈ L1 (Rn ), n ≥ 1. Define f (x) by (13.1) and note that (cf. Exercise 1 of Chapter 8) 1 −ix·y | f (x)| = f (y)e dy n (2π) n R

1 ≤ | f (y)| dy, (2π)n n

x ∈ Rn .

R

Thus, the integral in (13.1) converges absolutely (i.e., exists in the usual Lebesgue sense) for every f ∈ L1 (Rn ) and every x ∈ Rn , and the mapping f → f sends the space L1 (Rn ) into L∞ (Rn ) with f (x)| ≤ (2π)−n f 1 . sup |

(13.2)

x∈Rn

The verification is immediate. (2) The mapping f → f is linear on L1 (Rn ), that is, if f1 , f2 ∈ L1 (Rn ) and c1 , c2 ∈ C (the class of complex numbers), then f1 (x) + c2 f2 (x), c1 f1 + c2 f2 (x) = c1

x ∈ Rn .

(13.3)

We leave the simple proof to the reader. (3) Next, let us show that f (x) is a uniformly continuous function of x on Rn 1 n if f ∈ L (R ). For any x, h ∈ Rn , f (x + h) − f (x) = f (y)e−ix·y e−ih·y − 1 dy (2π)n n R ≤ | f (y)| min{|h||y| , 2} dy Rn

≤

| f (y)| |h||y| dy +

|y| 0 will be chosen momentarily. Let ε > 0. Note that II depends only on N and f , not on x or h, and II → 0 as N → ∞ since f ∈ L1 (Rn ). Fix N so large that II < ε/2. For I, we have ε I≤ | f (y)| |h|N dy ≤ f 1 N|h| < 2 |y|≤N

if |h| < ε/(2 f 1 N). Here, we have assumed that f 1 = 0, but otherwise f ≡ 0 and the result is trivial. Hence, I + II < ε uniformly in x and h provided |h| is small, which proves the result. (4) There is a version for the Fourier transform of the Riemann–Lebesgue Theorem 12.21 for Fourier coefficients, namely, f (x) = 0. Theorem 13.4 (Riemann–Lebesgue) If f ∈ L1 (Rn ), then lim |x|→∞

Proof. We will give two proofs. First, for any x ∈ Rn , |x| = 0, we write f (x) = (2π)n

f (y)e−ix·y dy

Rn

=

2

πx πx −iπ |x| f y + 2 e−ix·y e |x|2 dy = − f y + 2 e−ix·y dy. |x| |x| n n

R

R

Adding the first and last formulas gives

πx −ix·y 2 (2π) f (x) = f (y) − f y + 2 e dy |x| n n

R

and 2 (2π) | f (x)| ≤ n

f (y) − f y + πx dy. |x|2 n

R

The last integral tends to 0 as |x| → ∞ by continuity in L1 , Theorem 8.19, and the first proof is complete. We may also proceed by computing the Fourier transform of step functions and using a density argument. Let I = nk=1 [ak , bk ] be any interval in Rn with positive measure |I|. For any x = (x1 , . . . , xn ), by Fubini’s theorem,

374

Measure and Integral: An Introduction to Real Analysis

b1

n

(2π) χI (x) =

bn

···

a1

an

bk n

=

e−ix1 y1 · · · e−ixn yn dy1 · · · dyn

e−ixk t dt =

k=1 ak

n

F[ak ,bk ] (xk ),

k=1

where for any one-dimensional interval [a, b] and any s ∈ R1 , we denote

F[a,b] (s) =

b a

⎧ ⎨b − a if s = 0 e−ist dt = e−isb − e−isa ⎩ if s = 0. −is

iφ iψ 1 The simple inequality |e − e | ≤ |φ − ψ|, φ, ψ ∈ R shows that F[a,b] (s) ≤ b − a. Also, F[a,b] (s) ≤ 2/|s| if s = 0. Since |x| ≤ |x1 | + · · · + |xn |, there exists k0 = 1, . . . , n depending on x such that |x| ≤ n|xk0 |. Combining estimates, if |x| = 0, we obtain n F[a ,b ] (xk ) ≤ (2π)n χI (x) ≤ k k k=1

≤

2n |I| , |x|

n 2 (bk − ak ) |xk0 | k=1 k =k0

where = min(bk − ak ). k

Hence, χI (x) = O(|x|−1 ) as |x| → ∞, and in particular, χI (x) → 0 as |x| → ∞. If |I| = 0, then χI ≡ 0. If f is a linear combination of characteristic functions of intervals in Rn (i.e., if f is a step function), it then follows from (13.3) that f (x) → 0 as |x| → ∞. Finally, given any f ∈ L1 (Rn ) and ε > 0, choose a step function g such that f − g1 < ε (see the comment at the end of the proof of Lemma 7.3 for the existence of such a g). Then for any x ∈ Rn , by (13.2) and (13.3), | f (x)| ≤ | f (x) − g(x)| + | g(x)| = |(f − g)(x)| + | g(x)| ε < + | g(x)|. (2π)n Since g tends to 0 at infinity, so does f , and Theorem 13.4 is again proved. The next three properties show how the Fourier transform interacts with translations, dilations, and rotations in Rn . The proofs of the first two are left as exercises.

375

The Fourier Transform

(5) (Translation) Let f ∈ L1 (Rn ) and h ∈ Rn . Define the translation τh f of f by h as (τh f )(x) = f (x + h). Then ix·h τ f (x), h f (x) = e

x ∈ Rn .

(13.5)

Also,

τh f (x) = f (x + h) = f (x)e−ix·h ,

x ∈ Rn ,

that is, if Eh f is the function defined by Eh f (x) = f (x)e−ix·h , then f =E τh hf . (6) (Dilation) Let f ∈ L1(Rn ) and λ ∈ R1 − {0}. Define the dilation δλ f of f by λ to be the function δλ f (x) = f (λx). Then 1 x δ f λ f (x) = |λ|n λ

1 = δ 1 f (x) , λ |λ|n

x ∈ Rn .

(13.6)

(7) (Rotation) Let O be an orthogonal linear transformation of Rn and set (Of )(x) = f (Ox), x ∈ Rn . If f ∈ L1 (Rn ), then (x) = (O Of f )(x)

= f (Ox) ,

x ∈ Rn .

(13.7)

In fact, (x) = Of

1 f (Oy)e−ix·y dy (2π)n n R

=

dy 1 −1 f (y)e−ix·O y n (2π) n | det O| R

=

1 f (y)e−i(Ox·y) dy (2π)n n R

since | det O| = 1 and x · O−1 y = Ox · OO−1 y = Ox · y by orthogonality. The last expression is f (Ox) by definition, and (13.7) is proved. See Exercise 5 for an analogue of (13.7) for general nonsingular linear transformations of Rn . By definition, a radial function of x is one that depends only on |x|. Thus, f is a radial function on Rn if there is a function g(t), t ≥ 0, such that f (x) = g(|x|) for all x ∈ Rn . Formula (13.7) says that rotations about the origin commute with the Fourier transform. As a consequence, we immediately obtain the next property.

376

Measure and Integral: An Introduction to Real Analysis

(8) The Fourier transform of an integrable radial function is a radial function. It is possible to explicitly compute the Fourier transforms of e−|x| and e−|x| , and the formulas obtained have important applications. The computation of −|x|2 is the simpler of the two, and the result will be used later in order to e −|x| find e . √ 2 2 (9) In shorthand notation, we have e−|x| = (2 π)−n e−|x| /4 , that is, 2

Theorem 13.8

For all x ∈ Rn , 1 −|y|2 −ix·y 1 2 −|x|2 = e e e dy = √ n e−|x| /4 . (2π)n n (2 π) R

Proof. By Fubini’s theorem, 1 −|y|2 −ix·y e e dy (2π)n n R

=

∞ ∞ 1 2 2 · · · e−y1 · · · e−yn e−ix1 y1 · · · e−ixn yn dy1 · · · dyn n (2π) −∞ −∞

=

n ∞ n ∞ 1 −y2 −ixk yk 1 −(t2 +itxk ) k e dy = e dt. k (2π)n (2π)n −∞ −∞ k=1

k=1

2 Writing t2 + itxk = t + i x2k +

x2k 4,

we see that the last expression equals

n ∞ n 1 −(t+ixk /2)2 2 e dt · e−xk /4 . (2π)n −∞ k=1

k=1

Next, we use, without proof, the identities ∞

e−(t+is) dt = 2

−∞

∞

e−t dt = 2

√

π if −∞ < s < ∞.

−∞

The first of these is a corollary of Cauchy’s contour integration theorem; the second one is classical and has been observed in Exercise 11 of Chapter 6. Hence, −|x|2 = e

√ 1 1 2 2 · ( π)n · e−|x| /4 = √ n e−|x| /4 , (2π)n (2 π)

and the proof is complete.

377

The Fourier Transform

By rescaling e−|x| and using the dilation property (13.6), we easily find a function that is equal to a multiple of its own Fourier transform: 2

e−|x|

2 /2

=

1 2 e−|x| /2 , (2π)n/2

x ∈ Rn .

(13.9)

In order to find an analogue in higher dimensions of the one-dimensional Gauss–Weierstrass kernel defined in (9.12), we first consider the kernel K(x) = π−n/2 e−|x| , 2

x ∈ Rn ,

and the corresponding approximation of the identity: Kε (x) = ε−n K(x/ε) =

√

−n −|x|2 /ε2 πε e ,

ε > 0.

√ Note that Rn K(x) dx = 1, again by Exercise 11 of Chapter 6. Setting ε = t, t > 0, we obtain the n-dimensional Gauss–Weierstrass kernel defined by √ 2 W(x, t) = ( πt)−n e−|x| /t ,

x ∈ Rn , t > 0.

(13.10)

Then

W(x, t) dx = 1,

t > 0.

Rn

For x ∈ Rn and t > 0, the convolution Wf (x, t) defined by Wf (x, t) = [f ∗ W(·, t)](x) =

f (x − y)W(y, t) dy

(13.11)

Rn

is called the Gauss–Weierstrass integral of f . The kernel W(x, t) satisfies the heat equation

∂2

∂2 + · · · + ∂x2n ∂x21

W(x, t) = 4

∂ W(x, t) ∂t

(13.12)

in the upper half-space Rn+1 = {(x, t) : x ∈ Rn , t > 0} (see Exercise 6(a)). + Due to the dilation property (13.6) and Theorem 13.8, the Gauss– Weierstrass kernel satisfies

2 W(x, t) = e−t|x| /4 ,

W(x, t) = (2π)−n e−t|x|

2 /4

(13.13)

378

Measure and Integral: An Introduction to Real Analysis

if (x, t) ∈ Rn+1 + , where the Fourier transforms are taken in the x variable. We will use the first of these equations to derive an inversion formula for the Fourier transform, that is, a way to recover f from f . We need the following basic result in order to accomplish this. (10) (Shifting hats) If f , g ∈ L1 (Rn ), then

f (x) g(x) dx =

Rn

f (x) g(x) dx.

(13.14)

Rn

Proof. Note that both integrals in (13.14) are finite; for example, the one on the left side is finite since g ∈ L1 (Rn ) and f ∈ L∞ (Rn ) by (13.2). The formula itself is a corollary of Fubini’s theorem since 1 f (x) g(x) dx = f (y)e−ix·y dy g(x)dx (2π)n n n n R R R 1 −ix·y = f (y) g(x)e dx dy (2π)n n Rn R = f (y) g(y) dy,

Rn

where the change in the order of integration is justified because f (y)g(x) ∈ L1 (R2n , dydx). (11) We have the following inversion result (see also (13.28)). Theorem 13.15 (Inversion of the Fourier transform on L1 ) If f ∈ L1 (Rn ), then at every point x of the Lebesgue set of f ,

f (x) = lim

t→0+

2 f (y)eix·y e−t|y| dy.

(13.16)

Rn

If in addition f ∈ L1 (Rn ), then at every Lebesgue point x of f , f (x) =

f (y)eix·y dy.

(13.17)

Rn

In particular, (13.16) and (13.17) hold a.e. in Rn and at every point of continuity of f .

379

The Fourier Transform

Note that (13.17) can be rewritten as

f (−x) = (2π)n f (−x) . f (x) = (2π)n

(13.18)

Before proving Theorem 13.15, we mention that the periodic analogue of (13.17) is that if the sum of the Fourier coefficients of a periodic f ∈ L1 (−π, π) converges absolutely, then S[f ], the Fourier series of f , converges to f at every Lebesgue point of f . This is a corollary of the first part of Lebesgue’s Theorem 12.51 (together with Theorem 12.38) since S[f ] converges everywhere if the sum of the Fourier coefficients of f converges absolutely. Proof. To prove Theorem 13.15, let f ∈ L1 (Rn ) and write

Wf (x, t) =

f (x − y)W(y, t) dy =

Rn

f (x + y)W(y, t) dy,

Rn

where the last equality is true since W(y, t) is an even function of y. By (13.13) and (13.14), the last integral equals

2 −t|y|2 /4 (τx f )(y) e−t|y| /4 dy = τ dy x f (y) e

Rn

=

Rn 2 f (y)eix·y e−t|y| /4 dy

by (13.5).

Rn

In summary, we have Wf (x, t) =

2 f (y)eix·y e−t|y| /4 dy for all (x, t) ∈ Rn+1 + .

(13.19)

Rn

Now let t → 0+. Due to the convolution representation of Wf (x, t) and Theorem 9.13, Wf (x, t) converges to f (x) at every Lebesgue point x of f . The same is true if t is replaced by 4t, and consequently, (13.16) follows from (13.19). Assuming in addition that f ∈ L1 (Rn ), we easily obtain from Lebesgue’s dominated convergence theorem that, as t → 0, the integral on f (y)eix·y dy. Formula (13.17) now follows at the right in (13.19) tends to Rn every Lebesgue point of f , and the proof is complete. The integrals in (13.16), namely, Rn

2 f (y)eix·y e−t|y| dy,

x ∈ Rn , t > 0,

380

Measure and Integral: An Introduction to Real Analysis

are referred to as the Gauss–Weierstrass means of the Fourier inversion integral ix·y dy, although the latter integral may not exist in the Lebesgue sense Rn f (y)e 1 if f ∈ L (Rn ). In general, the integrals

g(y)e−t|y| dy 2

Rn

are called the Gauss–Weierstrass means of Rn g and may exist and be finite for every t > 0 while Rn g may not. The analogous expressions

g(y)e−t|y| dy,

t > 0,

Rn

are called the Abel means of

Rn

g (see (13.28)).

(12) An immediate corollary of (13.16) in Theorem 13.15 is f (x) = g(x) for all x ∈ Rn , Corollary 13.20 (Uniqueness) If f , g ∈ L1 (Rn ) and n 1 n then f = g a.e. in R . In particular, if f ∈ L (R ) and f = 0 everywhere in Rn , then f = 0 a.e. in Rn . (13) Another corollary of Theorem 13.15 is the following simple sufficient condition for integrability of f. Corollary 13.21 Suppose that f ∈ L1 (Rn ), f is continuous at x = 0 and f ≥0 everywhere. Then f ∈ L1 (Rn ) and f 1 = f (0). Proof. Under the hypotheses, we may set x = 0 in (13.16) to obtain f (0) = lim

t→0+

2 f (y)e−t|y| dy.

Rn

2 f (y)e−t|y| dy is bounded in t for t > 0. Its inteIn particular, the integral Rn grand is nonnegative since f ≥ 0 by assumption. Letting t → 0, Fatou’s lemma then implies that f is integrable. Finally, (13.17) with x = 0 gives f (0) =

Rn

completing the proof.

f (y) dy = f 1 ,

381

The Fourier Transform

(14) Next, we compute the Fourier transform of e−|x| in Rn , n ≥ 1. Let

(α) =

∞

sα−1 e−s ds,

α > 0,

0

denote the classical Gamma function.

Theorem 13.22

For all x ∈ Rn and ε > 0, n+1 ε 2 −ε|x| e = n+1 n+1 . π 2 ε2 + |x|2 2

Proof. Because of the dilation property (13.6), it is enough to prove the result in case ε = 1. We consider first the one-dimensional case, n = 1, where the computation is simplest. Let x ∈ (−∞, ∞). Then −|x| = 2πe

∞

e−|t| e−ixt dt

−∞

=

∞

e

−t−ixt

dt +

∞

et−ixt dt

−∞

0

=

0

e−(1+ix)t dt +

0

∞

e−(1−ix)t dt

0

e−(1−ix)t e−(1+ix)t − = − 1 + ix 1 − ix =

∞ t=0

1 2 1 + = . 1 + ix 1 − ix 1 + x2

Hence, 1 1 −|x| = e , π 1 + x2

x ∈ (−∞, ∞),

and we are done since when n = 1, then ((n + 1)/2) = (1) = 1.

(13.23)

382

Measure and Integral: An Introduction to Real Analysis

Combining (13.23) with the inversion formula (13.17) in case n = 1, we obtain e−|x| =

∞ ∞ 1 1 1 1 ixt e dt = e−ixt dt, π −∞ 1 + t2 π −∞ 1 + t2

x ∈ (−∞, ∞).

Next, with the purpose of introducing an exponential factor e−t in the computation, we write 2

1 2 = e−(1+t )s ds. 2 1+t ∞

0

Consequently,

e−|x|

⎛ ⎞ ∞ ∞

1 ⎝ − 1+t2 s ⎠ −ixt = e ds e dt, π −∞

x ∈ (−∞, ∞).

(13.24)

0

Now consider the higher dimensional case: −|x| = (2π)n e

e−|y| e−ix·y dy,

x ∈ Rn .

Rn

On the right side, express the factor e−|y| by using (13.24) with x there chosen to be |y|, obtaining

e

−|y|

⎞ ⎛ ∞ ∞

1 ⎝ − 1+t2 s ⎠ −i|y|t = e ds e dt π −∞ 0 ∞ ∞ 1 −s −t2 s −i|y|t = e e e dt ds π −∞ 0

1 −s √ −|y|2 /(4s) 1 = e πe √ ds, π s ∞

0

where the last equality follows by dilation from the one-dimensional version of Theorem 13.8. By substitution, we obtain that for any x ∈ Rn ,

383

The Fourier Transform ⎞ ∞ √ 1 ds 2 −|x| = ⎝ (2π)n e e−s π e−|y| /(4s) √ ⎠ e−ix·y dy π s Rn 0 ∞ 1 e−s −|y|2 /(4s) −ix·y =√ e e dy ds √ π s n

⎛

R

0

∞ 1 e−s √ n √ n −s|x|2 =√ π 2 s e ds, √ π s 0

by another application of Theorem 13.8, this time in the n-dimensional form. Regrouping terms gives −|x| = 2n (2π)n e

1+|x|2 s

√ n−1 ∞ − π e

s

n−1 2

ds

0

=

2n

√ n−1 ∞ π

(1 + |x|2 )

n+1 2

e−s s

n−1 2

ds.

0

n+1 2 , and therefore n+1 1 2 −|x| e = , n+1 n+1 π 2 (1 + |x|2 ) 2

The last integral equals

x ∈ Rn ,

as desired. This completes the proof of Theorem 13.22. n+1 (15) (The Poisson Integral in Rn+1 = {(x, ε) : x ∈ Rn , + ) For (x, ε) ∈ R+ ε > 0}, we denote

n+1 ε 2 −ε|x| = P(x, ε) = e n+1 n+1 , π 2 ε2 + |x|2 2

(13.25)

and call P(x, ε) the Poisson kernel for the half-space Rn+1 + . The case n = 1 is considered in (9.10), and the periodic version is defined in Section 12.7 on p. 348. As a function of x, P(x, ε) is clearly positive and of class L1 (Rn ) ∩ L∞ (Rn ) for each ε > 0. Also, as a function of (x, ε), it is infinitely differentiable in n+1 R+ . By direct computation, it satisfies Laplace’s equation

∂2

∂2 ∂2 + · · · + + ∂x2n ∂ε2 ∂x21

P(x, ε) = 0

in Rn+1 + .

(13.26)

384

Measure and Integral: An Introduction to Real Analysis

Moreover, P(x, ε) is an approximation of the identity since if P(x) is defined by P(x) = P(x, 1), then P(x, ε) = ε−n P(x/ε), and by (13.17),

P(x, 1) dx =

Rn

−|x| −|x| e dx = e

Rn

= 1. x=0

Hence, since P(x, 1) = O(1/|x|n+1 ) as |x| → ∞, Theorem 9.13 implies that the Poisson integral of f , defined to be the convolution Pf (x, ε) =

f (x − y)P(y, ε) dy,

(13.27)

Rn

converges to f (x) as ε → 0 at each point x of the Lebesgue set of f if f ∈ L1 (Rn ). If f ∈ Lp (Rn ), p > 1, the same is true (cf. Exercise 12 of Chapter 9). Note that Pf (x, ε) =

−ε|y| dy = f (x − y)e

Rn

=

−ε|y| dy f (x + y)e

Rn

f (y)eix·y e−ε|y| dy

if f ∈ L1 (Rn ).

Rn

Hence, if f ∈ L1 (Rn ) and x is a Lebesgue point of f , then f (x) = lim

ε→0+

f (y)eix·y e−ε|y| dy.

(13.28)

Rn

Recall that the integrals

f (y)eix·y e−ε|y| dy,

ε > 0,

Rn

are called the Abel means of the (formal) integral We also have by inversion that P(x, ε) = e−ε|x| ,

Rn

f (y)eix·y dy.

x ∈ Rn , ε > 0,

(13.29)

where the Fourier transform is taken in the x variable. (16) (The Convolution Property) By Young’s theorem, the convolution of any two integrable functions is also integrable and therefore has a well-defined Fourier transform.

385

The Fourier Transform

Theorem 13.30

If f , g ∈ L1 (Rn ), then f (x) g(x) f ∗ g(x) = (2π)n

for all x ∈ Rn .

This follows easily from Fubini’s theorem; we omit the details. In case n = 1, the result is listed in Exercise 6 Chapter 6. See also Theorem 13.59 for an important related result. (17) (Differentiation Properties) We continue our list of properties of the Fourier transform with two important differentiation formulas.

Theorem 13.31 (a) Let xk , k = 1, . . . , n, denote the kth coordinate of x, x ∈ Rn . If f and xk f (x) belong to L1 (Rn ), then f has a partial derivative with respect to xk everywhere in Rn , and ∂ f (x) = − ixk f (x) ∂xk

−ix·y 1 − iyk f (y) e dy . = (2π)n n

(13.32)

R

(b) Fix k = 1, . . . , n and let h = (0, . . . , 0, hk , 0, . . . , 0) = 0 lie on the kth coordinate axis. Suppose that f ∈ L1 (Rn ) and that there is a function g such that f (x + h) − f (x) lim − g(x) dx = 0. h h →0 k

(13.33)

k

Rn

Then g ∈ L1 (Rn ) and g(x) = ixk f (x) for all x ∈ Rn .

(13.34)

A few remarks about condition (13.33) are listed after the proof of the theorem. Proof. (a) Denoting nonzero points on the kth coordinate axis by h = (0, . . . , 0, hk , 0, . . . , 0), we have e−ihk yk − 1 −ix·y f (x + h) − f (x) 1 = f (y) dy. e hk (2π)n n hk R

386

Measure and Integral: An Introduction to Real Analysis

Since the expression in curly brackets converges to −iyk as hk → 0 and is bounded in absolute value by |yk |, formula (13.32) follows from the Lebesgue dominated convergence theorem and the hypothesis that f (y)yk is integrable. (b) If g satisfies (13.33), then g is clearly integrable since f is. With h = (0, . . . , 0, hk , 0, . . . , 0) as usual, (13.33) together with (13.2) implies the pointwise equality

f (x + h) − f (x) = g(x), or hk hk →0 1 τh f − f (x) = g(x), x ∈ Rn . lim hk →0 hk lim

Equivalently, by (13.3) and (13.5), for all x,

eixk hk − 1 f (x) lim hk hk →0

= g(x),

and (13.34) follows. We now make some remarks related to (13.33). Given f , a function g that satisfies (13.33) is clearly unique up to a set of Lebesgue measure 0. If such a g exists, it is called the partial derivative of f with respect to xk in the L1 sense, the idea being that the difference quotient of f in the kth coordinate converges in the L1 norm. A simple case when (13.33) holds with g equal to the ordinary partial derivative ∂f /∂xk of f is when f ∈ C10 (Rn ), since then both f and ∂f /∂xk have compact support and the difference quotient of f in the variable xk converges uniformly to ∂f /∂xk . Hence, by (13.34), for every k = 1, . . . , n, we have

∂f ∂xk

f (x), (x) = ixk

x ∈ Rn , if f ∈ C10 (Rn ).

(13.35)

As we will see in Theorem 13.41, the restriction in (13.35) that f ∈ C10 (Rn ) can be considerably weakened. By iteration, (13.35) leads easily to analogous formulas for higher-order derivatives. For example, if f ∈ C20 (Rn ) and j, k = 1, . . . , n, then

∂ 2f ∂xj ∂xk

(x) = ixj

∂f ∂xk

(x) = (ixj ) (ixk ) f (x).

387

The Fourier Transform

More generally, if α = (α1 , . . . , αn ) is a multi-index of nonnegative integers and Dα is the differential operator defined by Dα =

∂ α1 ∂ αn , α1 · · · n ∂xα ∂x1 n

then for any N = 1, 2, . . . and any α with α1 + · · · + αn ≤ N, α f (x) = (ix )α1 · · · (ix )αn D f (x) n 1

n if f ∈ CN 0 (R ).

(13.36)

A higher-order analogue of part (a) of Theorem 13.31 is that if N = 1, 2, . . . and f has the property that p(x)f (x) ∈ L1 (Rn ) for every polynomial p(x) of degree N (or less), then f ∈ CN (Rn ) and α f (x) = [(−ix1 )α1 · · · (−ixn )αn f (x)] if α1 + · · · + αn ≤ N. D

(13.37)

Verification is left to the reader. In particular, (13.36) and (13.37) hold for derivatives of any order if n f ∈ C∞ 0 (R ). n Note that if f ∈ CN 0 (R ) for some N, then for every multi-index α = (α1 , . . . , αn ) with α1 + · · · + αn ≤ N, the fact that Dα f ∈ L1 (Rn ) implies α f is a bounded function on Rn . Hence, by (13.36), the function that D α 1 (ix1 ) · · · (ixn )αn f (x) is bounded in x if α1 + · · · + αn ≤ N. Consequently, | f (x)| ≤ C

1 (1 + |x|)N

n if f ∈ CN 0 (R ).

α f tends to 0 at infinity, fˆ (x) = o(|x|−n ) as |x| → ∞. Roughly Also, since D speaking, this means that the smoother a function with compact support is, the faster its Fourier transform tends to 0 at infinity. In particular, by choosing N = n + 1, we obtain a simple sufficient condition for f to be integrable, namely,

n 1 n Corollary 13.38 If f ∈ Cn+1 0 (R ), then f ∈ L (R ). In particular, the inversion n formula (13.17) holds everywhere in R for such f .

The class S = S (Rn ) of Schwartz functions, or rapidly decreasing functions, on Rn is defined to be the collection of all f ∈ C∞ (Rn ) such that p(x)Dα f (x) is bounded on Rn for every polynomial p(x) and every multi-index α. The bound may vary with α and the polynomial p. Thus, if xβ denotes the

388

Measure and Integral: An Introduction to Real Analysis

β

β

monomial xβ = x1 1 · · · xn n , where β = (β1 , . . . , βn ) and each βk is a nonnegative integer, then S = f ∈ C∞ (Rn ) : sup xβ Dα f (x) < ∞ for all α, β .

(13.39)

x∈Rn

n −|x| belongs Clearly, C∞ 0 (R ) ⊂ S , but the containment is proper since e −|x| is not differento S but does not have compact support. The function e tiable at the origin, and so it does not belong to S . The functions 1, |x|2 , and 2 e−1/|x| (defined to be 0 at the origin) are simple examples of infinitely differentiable functions that do not belong to S . Note that S ⊂ Lp (Rn ) for every p, 0 < p ≤ ∞. Also, if f ∈ S , then p(x)Dα f (x) ∈ S for every polynomial p(x) in Rn and every multi-index α. We leave the proof of the next result to the reader (see Exercise 8). 2

Theorem 13.40 If f ∈ S , then f ∈ S . Also, if f ∈ S , then the formulas in (13.36) and (13.37) hold for all α, that is,

α f (x) = (ix)α D f (x) and Dα f (x) = [(−ix)α f (x)] for all x ∈ Rn and all α. n n In particular, if f ∈ C∞ 0 (R ), then f ∈ S (R ). On the other hand, the only ∞ n function f in C0 (R ) such that f has compact support is f ≡ 0 (see Exercise 28). Theorem 13.40 shows that the formula (∂f /∂xk )(x) = ixk f (x) is valid if f ∈ S , and we observed in (13.35) that it is also true if f ∈ C10 (Rn ). Let us now show that it holds for a much larger class of functions. Recall that (by Theorem 7.29) an absolutely continuous function F(t), t ∈ [a, b] ⊂ R1 , has a b first derivative F a.e. in [a, b] and F(b) − F(a) = a F (t) dt.

Theorem 13.41 Let k = 1, . . . , n and f ∈ L1 (Rn ). Suppose that f is locally absolutely continuous on every line parallel to the kth coordinate axis, that is, suppose that f (x1 , . . . , xk−1 , xk , xk+1 , . . . , xn ), when considered as a function of xk , is absolutely continuous on every interval −∞ < a ≤ xk ≤ b < ∞ for all x1 , . . . , xk−1 , xk+1 , . . . , xn . If ∂f /∂xk ∈ L1 (Rn ), then

∂f ∂xk

f (x), (x) = ixk

x ∈ Rn .

In particular, the formula holds for every k = 1, . . . , n and every f such that f ∈ L1 (Rn ) ∩ Liploc (Rn ) and |∇f | ∈ L1 (Rn ).

389

The Fourier Transform

Proof. The second statement of the theorem (concerning the assumption that f is locally Lipschitz continuous) follows from the first statement since if f is locally Lipschitz continuous, then it is locally absolutely continuous on every line. To prove the first statement, let f satisfy its hypothesis. By Theorem 13.31(b), it is enough to verify (13.33) with g equal to ∂f /∂xk , that is, to show that ∂f /∂xk is the partial derivative of f with respect to xk in the L1 sense. We will show this only when n ≥ 2 and k = 1, leaving the remaining cases for the reader to check. If x = (x1 , x2 , . . . , xn ), denote x = (x1 , xˆ ) with xˆ = (x2 , . . . , xn ) ∈ Rn−1 . By hypothesis, for every xˆ , f (x1 , xˆ ) is an absolutely continuous function of x1 on every compact interval in R1 . Hence, by Theorem 7.29, for every x ∈ Rn and every h = (h1 , 0, . . . , 0) ∈ Rn , we have 1 ∂f f (x + h) − f (x) = f x1 + h1 , xˆ − f (x1 , xˆ ) = x1 + t, xˆ dt. ∂x1 h

0

Thus, if h1 = 0,

h1 f (x + h) − f (x) ∂f 1 ∂f ∂f − (x) = x1 + t, xˆ − x1 , xˆ dt. h1 ∂x1 h1 ∂x1 ∂x1 0

Taking the L1 norm in x, and denoting t = (t, 0, . . . , 0), we obtain h1 ∂f f (x + h) − f (x) ∂f ∂f dx ≤ 1 − (x) (x + t) − (x) h1 ∂x1 h1 ∂x1 ∂x1 n

R

0

dt. 1

Now let h1 → 0. By continuity in L1 (Theorem 8.19), the integrand in the last integral tends to 0 as t → 0, that is, ∂f ∂f (x + t) − (x) ∂x1 ∂x1

→ 0 as t → 0, 1

and the result follows. (18) (The Principal Value Fourier Transform of x−1 ) We close this section with the computation in one dimension of a different notion of the Fourier transform, which we call the principal value Fourier transform, of the function 1/x. Since 1/x fails to be integrable in R1 , both near the origin and near infinity,

390

Measure and Integral: An Introduction to Real Analysis

its Fourier transform cannot be defined by using the one-dimensional version of (13.1). Furthermore, since 1/x does not belong to L2 near the origin, the method developed in the next section for extending the definition of the Fourier transform to L2 functions will not apply to 1/x. However, as we will see, the symmetry and oddness of 1/x can be exploited in order to define its Fourier transform as a limit, denoted p.v. (x−1 ) and defined by p.v.

1 1 = lim x ε→0+ 2π ω→∞

ε1

| f (y)|

1 |y|n−α

dy < ∞,

444

Measure and Integral: An Introduction to Real Analysis

then Iα f exists and is finite a.e. in Rn and ⎧ n ⎫ ⎨ Iα f (x) − [Iα f ] n−α ⎬ 1 B exp c1 dx ≤ c2 ⎩ ⎭ |B| f n/α B

for every ball B ⊂ Rn , with c1 and c2 independent of f and B. See also Exercise 18.

14.5 Bounded Mean Oscillation In this section, we will study functions that satisfy (14.9) with B0 replaced by Rn for the constant functional a(B) = c; cf. (14.11) with exponent β = 0. Such functions have a remarkable local exponential integrability property discovered by F. John and L. Nirenberg. We begin with some standard terminology. A locally integrable function f on Rn is said to be of bounded mean oscillation on Rn (or to be a BMO function on Rn ) if there is a constant c ≥ 0 such that for every ball B ⊂ Rn , 1 f (x) − fB dx ≤ c. |B|

(14.47)

B

As usual, fB denotes the average |B|−1 B f . The collection of all such f is denoted BMO(Rn ) and called the class of functions of bounded mean oscillation on Rn . Equivalently, if we denote f ∗ = sup B

1 f (x) − fB dx, |B|

(14.48)

B

where the supremum is taken over all balls B ⊂ Rn , then f ∈ BMO(Rn ) means that f is locally integrable and f ∗ < ∞. Note that if f and g are any two locally integrable functions, then f +g ∗ ≤ f ∗ + g ∗ , and cf ∗ = |c| f ∗ for any constant c. However, · ∗ is not a norm in the usual sense since it vanishes on constant functions. The next lemma shows that if f is measurable and (14.47) holds with the average fB replaced by a different constant depending on B, then f belongs to BMO(Rn ).

445

Fractional Integrals

Lemma 14.49 such that

Let f be a measurable function on Rn . If there is a constant C 1 | f (x) − c(f , B)| dx ≤ C |B| B

for every ball B ⊂ Rn and for some constant c(f , B) depending on f and B, then f ∈ BMO(Rn ). Moreover, f ∗ ≤ 2C. Proof. If f satisfies the hypothesis, then f is clearly locally integrable by the triangle inequality. Furthermore, for any ball B, 1 1 f (x) − fB dx ≤ | f (x) − c(f , B)| dx + c(f , B) − fB |B| |B| B B ≤ C + c(f , B) − fB , where c(f , B) and C are as in the hypothesis. Also, 1 c(f , B) − fB = f (x) − c(f , B) dx |B| B

1 ≤ | f (x) − c(f , B)| dx ≤ C. |B| B

Therefore, f ∗ ≤ C + C = 2C, and the lemma is proved. A simple corollary of Lemma 14.49 is that balls can be replaced by cubes in the definition of BMO(Rn ). More precisely, suppose that f is locally integrable and satisfies the analogue of condition (14.47) for cubes, that is suppose that f ∗∗ := sup

1 f (x) − fQ dx < ∞, |Q| Q

where the supremum is taken over all cubes Q with edges parallel to the coordinate axes, and fQ = |Q|−1 Q f . Then, given any ball B, by enclosing B in a cube Q with |Q| ≤ cn |B|, we obtain f − fQ dx ≤ f − fQ dx ≤ f ∗∗ |Q| ≤ cn f ∗∗ |B|. B

Q

It follows from Lemma 14.49 with c(f , B) there chosen to be fQ that f ∈ BMO(Rn ) and f ∗ ≤ 2cn f ∗∗ . The converse is also true, that is, the definition of BMO(Rn ) using balls implies the analogous definition using cubes,

446

Measure and Integral: An Introduction to Real Analysis

and f ∗∗ ≤ c f ∗ for some constant c that depends only on n; we leave the verification as an exercise. Our main goal is to study the “size” of functions of bounded mean oscillation. To set the stage, we begin by listing a few examples. First, it is easy to see that L∞ (Rn ) ⊂ BMO(Rn ). In fact, if f ∈ L∞ (Rn ), then f is clearly locally integrable, and for any ball B, 1 | f (x) − fB | dx ≤ f − fB ∞ |B| B

≤ f ∞ + | fB | ≤ 2 f ∞ . Hence, f ∈ BMO(Rn ) and f ∗ ≤ 2 f ∞ . However, the containment L∞ (Rn ) ⊂ BMO(Rn ) is a proper containment. For example, the (essentially) unbounded function log |x| is of bounded mean oscillation on Rn ; see Exercise 20. We also leave it as an exercise to check the / BMO(Rn ); and there following two facts: if −∞ < λ < ∞ and λ = 0, then |x|λ ∈ p are functions with compact support that belong to L (Rn ) for all p, 0 < p < ∞, but do not belong to BMO(Rn ). See Exercises 21 and 22. Another subclass of BMO(Rn ) is the collection of all functions I α f defined in (14.42) when f ∈ Ln/α (Rn ), 0 < α < n. This follows immediately from the exponential integrability estimate in Theorem 14.44 and the simple inequality t ≤ exp (tγ ), t ≥ 0, γ ≥ 1. See also Exercise 27. All the examples given so far of BMO(Rn ) functions are locally exponentially integrable in the sense that there are positive constants c and γ such that

exp c| f (x)|γ dx < ∞ for every ball B ⊂ Rn .

B

A natural question is whether the same is true for every f ∈ BMO(Rn ). If γ > 1, the answer is no, and an example is f (x) = log |x|; for example, if n = 1 and γ > 1, then for any c > 0, 1/2 0

" ! ∞ γ 1 γ dx = exp c log ecu e−u du = +∞. x ln 2

However, the answer is yes if γ = 1. This is a corollary of the following basic fact, which is the main result of the section.

Theorem 14.50 (John–Nirenberg) There are positive constants c1 and c2 depending only on n such that if f ∈ BMO(Rn ), B is a ball in Rn and λ > 0, then

447

Fractional Integrals ! " x ∈ B : f (x) − fB > λ ≤ c1 exp − c2 λ |B|. f ∗

(14.51)

Here, we have assumed that f ∗ = 0; otherwise, f is constant a.e. in Rn and the left side of (14.51) is zero for all B and all λ > 0. Before giving a proof, let us deduce two corollaries of the theorem. Corollary 14.52 Let f ∈ BMO(Rn ) and 1 ≤ p < ∞. There is a positive constant c depending only on n and p such that for every ball B ⊂ Rn ,

1/p p 1 f − fB dx ≤ c f ∗ . |B| B

p

In particular, f ∈ Lloc (Rn ), and for every ball B,

1/p 1 p | f | dx ≤ c f ∗ + | fB | |B| B

Proof. To prove the corollary, we compute (cf. Exercise 16 of Chapter 5) ∞ p 1 p p−1 f − fB dx = x ∈ B : f (x) − fB > λ dλ λ |B| |B| B

0

≤

p |B|

∞ 0

= pc1

" ! c2 λ |B| dλ λp−1 c1 exp − f ∗

f ∗ c2

p ∞

by (14.51)

λp−1 e−λ dλ.

0

The first part of the corollary now follows by taking pth roots, and the second then follows from Minkowski’s inequality. Note that the integral ∞ part p−1 e−λ dλ is the classical gamma function (p). λ 0 The proof shows that the constant c can be chosen to satisfy p

c =

−p pc1 c2

∞

λp−1 e−λ dλ

0 −p

≥ pc1 c2

∞ p

−p

pp−1 e−λ dλ = pc1 c2 pp−1 e−p ,

448

Measure and Integral: An Introduction to Real Analysis

whose pth root is of order O(p1/p ) → ∞ as p → ∞. The constant c in the conclusion of the corollary must tend to ∞ as p → ∞ since functions in BMO(Rn ) may not be locally essentially bounded. Corollary 14.53 Let c1 and c2 be as in Theorem 14.50. If f ∈ BMO(Rn ) and c0 is a positive constant such that c0 f ∗ < c2 , then 1 c0 c1 exp c0 f − fB dx ≤ 1 + |B| c2 − c0 f ∗ B

for every ball B ⊂ Rn . In particular,

B exp{c0 | f |} dx

< ∞ for every ball B ⊂ Rn .

Proof. The result can be deduced from (14.51) combined with the following formula (see Exercise 29 of Chapter 5):

∞ exp c0 | f − fB | dx = |B| + c0 ec0 λ | x ∈ B : | f (x) − fB | > λ | dλ.

B

0

Further details are left to the reader. Theorem 14.50 will be proved in several steps. The main step is to derive an analogue of (14.51) for n-dimensional cubes Q with edges parallel to the coordinate axes, that is, to show that ! " x ∈ Q : f (x) − fQ > λ ≤ c1 exp − c2 λ |Q| (14.54) f ∗∗ for all such Q and all λ > 0, and now with c1 = 4 and c2 = 2−n−1 ln 2. As usual, we assume that f ∗∗ = 0 since otherwise there is nothing to prove. The proof of (14.54) will be based on the following n-dimensional version of the Decomposition Lemma 12.68 (cf. the second remark on p. 353 in Section 12.8). All cubes considered below are assumed to have edges parallel to the coordinate axes. Lemma 14.55 (Decomposition Lemma in R n ) Let Q be a cube in Rn and suppose that f ∈ L1 (Q) and f ≥ 0. Then for any real number s satisfying s≥

1 f, |Q| Q

there are nonoverlapping cubes Q1 , Q2 , . . . contained in Q such that

449

Fractional Integrals

1 |Qk | Qk

f ≤ 2n s for all k, ' (ii) f (x) ≤ s for a.e. x ∈ Q − Qk , ( 1 ' 1 (iii) k |Qk | ≤ s Qk f ≤ s Q f . (i) s

s. Using the hypothesis that |Q| Q f ≤ s together with the fact that |Q| = 2n |Q |, we have for each Q that either 1 f ≤s |Q |

or s

λ ⊂ x ∈ Qk : f (x) − fQ > λ .

(14.57)

k

We have 1 1 fQ − fQ = ≤ f − fQ ≤ 2n s f − f Q k |Q | k Q |Qk | Q k

k

by (14.56). Therefore, for any x ∈ Q, f (x) − fQ ≤ f (x) − fQ + fQ − fQ ≤ f (x) − fQ + 2n s. k k k f (x) − fQ exceeds λ, then by subtracting 2n s from both sides, If the left side we obtain f (x) − fQk > λ − 2n s. Hence, by (14.57), if λ > s, then x ∈ Q : f (x) − fQ > λ ≤ x ∈ Qk : f (x) − fQ > λ − 2n s . k k

451

Fractional Integrals

If λ > 2n s, it follows that |{x ∈ Q : | f (x) − fQ | > λ}| 1 ≤ F(λ − 2n s) |Qk | |Q| |Q| k

1 1 F(λ − 2n s) , | f − fQ | ≤ ≤ F(λ − 2n s) s |Q| s Q

where we have used property (iii) in Lemma 14.55 (applied to f − fQ ) in order to obtain the next-to-last estimate. Therefore, F(λ) ≤

F(λ − 2n s) s

if s ≥ 1 and λ > 2n s.

This inequality will now be iterated in order to prove the estimate F(λ) ≤ 4 exp − 2−n−1 ln 2 λ , λ > 0. Let s=2

γ = 2n s = 2n+1 .

and

Fix λ with λ > γ and choose m = 0, 1, 2, . . . such that (m + 1)γ < λ ≤ (m + 2)γ. Then if m = 0, λ > λ − γ > λ − 2γ > · · · > λ − mγ > γ, and we obtain F(λ) ≤ 2−m F(λ − mγ) after m iterations, even if m = 0. The trivial estimate F(λ − mγ) ≤ 1 together with m ≥ (λ/γ) − 2 then gives λ

F(λ) ≤ 2−m ≤ 22− γ = 4e−

ln 2 γ

λ

if λ > γ. ln 2

Finally, if 0 < λ ≤ γ, then it is also true that F(λ) ≤ 4e− γ λ since F(λ) ≤ 1. This proves (14.54) for all λ > 0 and for all Q and f . To complete the proof, it remains only to deduce inequality (14.51) for balls from its analogue (14.54) for cubes. The ideas needed are like those in the first paragraph after the proof of Lemma 14.49, and we will be brief. We will use the letters c, c1 , c2 to denote various positive constants depending only on n, which may be different at each occurrence. Let B be a ball and f ∈ BMO(Rn ), f ∗ = 0. Choose a cube satisfying B ⊂ Q and |Q| ≤ c |B|. As usual, f − fQ ≤ c f − fQ ≤ c f ∗∗ and so fB − f Q ≤ 1 |B| |Q| B Q f (x) − fQ ≥ f (x) − fB − c f ∗∗ , x ∈ Rn .

452

Measure and Integral: An Introduction to Real Analysis

Then if λ is much larger than f ∗∗ , for example, if λ/2 > c f ∗∗ , we have x ∈ B : f (x) − fB > λ ≤ x ∈ Q : f (x) − fQ > λ/2 ! " λ 1 ≤ c1 exp −c2 |Q| by (14.54). 2 f ∗∗ Hence, since |Q| ≤ c |B| and c−1 f ∗∗ ≤ f ∗ ≤ c f ∗∗ , ! " x ∈ B : | f (x) − fB | > λ ≤ c1 exp −c2 λ |B|, f ∗

λ > c f ∗ .

However, for the remaining values of λ, that is, when 0 < λ ≤ c f ∗ , such an inequality is obvious, and Theorem 14.50 follows.

Exercises 1. Show that if f ∈ C1 (B), where B denotes the closure of a ball B, then the inequalities in (14.3) and (14.4) are true for all x ∈ B. 2. Under the same assumptions as in Theorem 14.2, show that both of the estimates (14.3) and (14.4) can be improved by replacing the integral on their right-hand sides by n |∇f (y) · (x − y)| dy ≤ |x − y|n i=1 B

B

∂f |xi − yi | (y) ∂y |x − y|n dy. i

3. Derive the L1 , L1 Poincaré inequality (14.8) directly by the same method used to prove the subrepresentation Theorem 14.2, instead of obtaining it as a corollary of Theorem 14.2. 4. Let B1 and B2 be balls, with B1 ⊂ Rn1 and B2 ⊂ Rn2 , n1 , n2 ≥ 1. If f (x1 , x2 ) ∈ C1 (B1 × B2 ) ∩ L1 (B1 × B2 ), show that for every (x1 , x2 ) ∈ B1 × B2 , f (x1 , x2 ) − fB ×B ≤ c 1 2

∇y f y1 , y2 k1 x1 , x2 ; y1 , y2 dy1 dy2 1

B1 ×B2

+c

B1 ×B2

∇y f y1 , y2 k2 x1 , x2 ; y1 , y2 dy1 dy2 , 2

453

Fractional Integrals

where ki x1 , x2 ; y1 , y2 =

|xi − yi | 1 , |B1 | |B2 | |x1 −y1 | + |x2 −y2 | n1 +n2 r(B1 )

i = 1, 2,

r(B2 )

fB1 ×B2 = (|B1 | |B2 |)−1 B1 ×B2 f , and c is a constant that is independent of f , x1 , x2 , B1 , and B2 . Note that each integral is at most a multiple of

∇y f (y1 , y2 ) r(B1 ) + ∇y f (y1 , y2 ) r((B2 ) 1 2

B1 ×B2

× K x1 , x2 ; y1 , y2 dy1 dy2

where K x1 , x2 ; y1 , y2 =

1 |B1 | |B2 | |x1 −y1 | r(B1 )

1

+

|x2 −y2 | n1 +n2 −1 r(B2 )

.

5. Let Q be a closed cube in Rn , n > 1. Show that if f ∈ C1 (Q), then f (x) − fQ ≤ cn Q

|∇f (y)| dy, |x − y|n−1

x ∈ Q,

where fQ = |Q|−1 Q f and cn depends only on n. Also, by choosing Q to be the unit cube centered at *the origin and making an affine change of variables, show that if I = ni=1 [ai , bi ] is an interval in Rn and f ∈ C1 (I), then for all x ∈ I, f (x) − fI ≤ cn 1 |I| I

n i=1

∂f (y) (bi − ai ) (n ∂yi

1 |xi −yi |

n−1 dy,

i=1 bi −ai

where fI = |I|−1 I f . *n n 1 6. Let I = i=1 [ai , bi ] be an interval in R and suppose that f ∈ C (I). Show that

n | f (x) − fI | dx ≤ cn (bi − ai ) I i=1

I

where fI = |I|−1

I

∂f ∂x (x) dx, i

f and cn is independent of f and I.

454

Measure and Integral: An Introduction to Real Analysis

7. The proof of Lemma 14.15 was carried out for the ball B = B(0; r) and a point x = (x, 0, . . . , 0) with 0 ≤ x < r. Show that the proof is similar in the general case. 8. Verify the existence of balls Bx and By with the properties described after (14.32), that is, using the notation there and assuming that x, y, r, R, ε satisfy (14.31) and (14.32), show that there are open balls Bx , By ⊂ B(0; r) such that x ∈ Bx , y ∈ By , r(Bx ) = r(By ) = R, and Bx ∩By = ∅. (Denote d = |x− y|, x = x/|x|, y = y/|y|, and let xd = (|x| − d)x and yd = (|y| − d)y . Then the balls Bx = B(xd ; R) and By = B(yd ; R) have the desired properties. The restriction on ε in (14.31) helps to show that Bx , By ⊂ B(0; r). Also, (xd + yd )/2 ∈ Bx ∩ By . It may be helpful to note that |xd − yd | < d, for example, by the law of cosines.) 9. Show that a function that satisfies (14.28) can be redefined on a subset of B0 of measure zero so that (14.29) holds with the same constant C3 as in (14.28). 10. Prove that Theorems 14.12 and 14.25 remain true if B0 is replaced by Rn and f is assumed to be locally integrable on Rn . 11. Show that the function |x|1/2 is Hölder continuous of order β on Rn if and only if β = 1/2. 12. In the plane R2 , consider the class of rectangles I((x, y); h) with edges parallel to the coordinate axes, center (x, y), x-dimension h, and y-dimension h2 , h > 0, where (x, y) and h are allowed to vary. Suppose that f (x, y) is a locally integrable function that satisfies 1 h3

f (u, v) − fI((x,y);h) du dv ≤ C hβ

I((x,y);h)

for some C and β independent of (x, y) and h, with 0 < β ≤ 1. Show that after possible redefinition on a set of measure zero, f satisfies β | f (x, y) − f (u, v)| ≤ c |x − u| + |y − v|1/2 for all (x, y) and (u, v), where c depends only on C and β. (It may be helpful to consider R2 endowed with the metric d((x, y), (u, v)) = |x − u| + |y − v|1/2 instead of the usual metric.) 13. Verify the dilation property δλ (Iα f ) = λα Iα (δλ f ), where (δλ f )(x) = f (λx), λ > 0, and use it to prove (14.35). Similarly, show that if 0 < α < n, q > 0, and there is a constant c such that 1/q sup t x ∈ Rn : Iα f (x) > t ≤ c f 1 t>0

for all f , then q = n/(n − α).

455

Fractional Integrals

14. Let B be the ball B(0; 1/2) in Rn , and let 0 < α < n and 0 ≤ β < 1 − (α/n). Then the function $−1 # f (x) = χB (x) |x|α |log |x||1−β has compact support and belongs to Ln/α (Rn ). Show that there is a positive constant c such that ⎧ 1 ⎪ ⎨log log |x| if β = 0 Iα f (x) ≥ c β ⎪ ⎩ log 1 if 0 < β < 1 − |x|

α n

for all x near 0.

In particular, lim Iα f (x) = +∞,

x→0

and Iα f is not essentially bounded in any neighborhood of the origin. How are Iα f and I α f related? 15. Let 0 < α < n, γ = 2 − (α/n), and B = B(0; 1/2). Show that the function −1 f (x) = χB (x) |x|n |log |x||γ n/(n−α)

belongs to L1 (Rn ) but that Iα f ∈ / Lloc

(Rn ). (Show that there is a con1−γ stant c > 0 such that Iα f (x) ≥ c |x|α−n log |x| for all x near the origin.)

16. If 0 < α < n and f ∈ Ln/α (Rn ), show that I α f is a measurable function. (One way to proceed is to express I α f as the limit of a sequence of measurable functions: for example, by (14.43),

I α f (x) = lim

k→∞

|y|1

| f (y)|

1 |y|n−α

dy < ∞

is necessary and sufficient for the existence and finiteness a.e. of Iα f . Prove that the condition holds if there exists p ∈ [1, n/α) such that

456

Measure and Integral: An Introduction to Real Analysis

f ∈ Ln/α (Rn ) ∩ Lp (Rn ). Consequently, the conclusion of Corollary 14.46 is valid for such f . 19. Let 0 < β ≤ 1 and B0 be a ball in Rn . Suppose that f is a measurable function on B0 that satisfies sup B

1 |B|1+(β/n)

| f (x) − c(f , B)| dx < ∞,

B

where the supremum is taken over all balls B ⊂ B0 and c(f , B) is a constant depending on f and B. Show that f ∈ L1 (B0 ) and that (14.26) holds. (Argue as in the proof of Lemma 14.49.) 20. (a) Show that log |x| ∈ BMO(Rn ). (b) In case n = 1, show that the odd function (sign x) log |x| does not belong to BMO(−∞, ∞). Find an analogue of this fact in case n > 1. (For part (a), it may help to consider two types of balls, those with center xB = 0 and radius r(B) < |xB |/2 and those with center xB and radius r(B) ≥ |xB |/2 [e.g., xB = 0], and apply Lemma 14.49. If B is of the first type, use the fact that log |x| is continuously differentiable in B and choose c(log |x|, B) = log |xB | in the lemma. If B is of the second type, choose c(log |x|, B) = log r(B). For part (b), consider intervals centered at the origin.) / BMO(Rn ), −∞ < λ < ∞, λ = 0. 21. Show that |x|λ ∈ λ Show that |x| χ|x| 0 assuming that f ∈ L1 (Rn ). (Given s and f , adjust the size of the cubes Q in the initial grid, which now covers all of Rn , such that |Q|−1 Q f ≤ s and all Q have the same edge length.) 24. Let w(x) be measurable and positive a.e. in Rn , and suppose that log w ∈ BMO(Rn ). Show that there are positive constants A1 and A2 depending on n, with A1 also depending on log w ∗ , such that

1 A1 1 −A1 w dx w dx ≤ A2 |B| |B| B

for all balls B ⊂ Rn .

B

(This can be deduced from Corollary 14.53 by writing B

wA1 dx =

B

exp A1 log w − (log w)B dx exp A1 (log w)B ,

457

Fractional Integrals

together with a similar formula for B w−A1 dx; note that the function log(1/w) = − log w also belongs to BMO(Rn ).) 25. Let 1 < p < ∞. A nonnegative function w(x) on Rn is said to satisfy the Ap condition if w, w−1/(p−1) ∈ L1loc (Rn ) and there is a constant C such that for every ball B ⊂ Rn , p−1 1 1 1 w(x) dx w(x) p−1 dx ≤ C. |B| |B| B

B

For such w, we will write w ∈ Ap . The opposite inequality with C = 1 is a consequence of Hölder’s inequality. Note that 0 < w(x) < ∞ for a.e. x if w ∈ Ap . (a) Show by direct estimation that |x|γ ∈ Ap if −n < γ < n(p − 1). (Consider first the case when r(B) < |xB |/2, where r(B) and xB denote the radius and center of B.) (b) Show that log w ∈ BMO(Rn ) if w ∈ Ap . (Consider separately the cases p = 2, p < 2 and p > 2. If w ∈ A2 , show that 1 1 sup exp | log w(x) − log wB | dx < ∞, wB = w(x) dx. |B| B |B| B

B

If p < 2 and w ∈ Ap , show that w ∈ A2 . In case p > 2, use the fact that if w ∈ Ap then w−1/(p−1) ∈ Ap , 1/p + 1/p = 1.) 26. Let Q0 be a cube in Rn . We say that f ∈ BMO(Q0 ) if f ∈ L1 (Q0 ) and 1 f ∗∗, Q0 := sup | f − fQ | < ∞, Q⊂Q0 |Q| Q

where the supremum is taken over all cubes Q in Q0 that have the same orientation as Q0 . Prove that if f ∈ BMO(Q0 ), then there are positive constants c1 and c2 such that for all such Q and all λ > 0, (14.54) holds with f ∗∗, Q0 in place of f ∗∗ . 27. Let f be the periodic conjugate function of f as defined in Chapter 12: 1 dt f (x) = − p.v. . f (x − t) 1 π 2 tan 2t |t|≤π (a) If f ∈ L∞ [−π, π], show that f belongs to the class BMO[−π, π] defined in Exercise 26 and that f ∗∗, [−π,π] ≤ c f L∞ [−π,π] with c independent of f .

458

Measure and Integral: An Introduction to Real Analysis

(b) Deduce from part (a) the conclusion of Exercise 20 in Chapter 12. Let f ∈ L∞ (R1 ) and have compact support. Show that the Hilbert transform Hf of f belongs to BMO(R1 ), and derive an analogue for Hf ∗ of the estimate in part (a). (For (a), given a small interval I ⊂ [−π, π], let xI denote the center of I and 2I denote the interval concentric with I with length 2|I|. Decompose f on (xI − π, xI + π) as f = g + h where g = f χ2I . Use Hölder’s inequality and Theorem 12.79, applied to the periodic extension of g, to show that I | g| ≤ c|I| f L∞ [−π,π] . Then estimate I | h(x) − h(xI )| dx by using the mean value theorem.) 28. For a measurable function f on Rn and 0 < α < n, define the fractional maximal function Mα f of f by 1 | f (y)| dy, n−α B:x∈B r(B)

x ∈ Rn .

Mα f (x) = sup

B

Show that Mα f is a measurable function on Rn and that there is a constant c independent of f and x such that Mα f (x) ≤ cIα (| f |)(x). As a consequence, the estimates in Theorem 14.37 for Iα f also hold for Mα f . Show that the same is true if balls B are replaced by cubes Q in the definition of Mα f , with r(B) replaced by the edge length of Q. 29. Let 0 < α < n and suppose that f satisfies

| f |(1 + log+ | f |) dx < ∞.

Rn

Prove that Iα f ∈ Ln/(n−α) (E) for every measurable set E ⊂ Rn with |E| < ∞, and

|Iα f |

n/(n−α)

dx ≤

α/(n−α) C f L1 (Rn )

E

|E| +

+

| f | log | f | dx ,

Rn

where C is independent of f and E. (Combine (14.39) in case p = 1 with the estimate in Exercise 22 of Chapter 9.) 30. Let k = 2, 3, . . . and B be a ball in Rn , n > k. Show that if f ∈ Ck0 (B), then | f (x)| ≤ c

B

|∇ k f (y)|

1 dy, |x − y|n−k

x ∈ B,

( where |∇ k f (y)| denotes the sum |α|=k |Dα f (y)|, α = (α1 , . . . , αn ) is a multi-index of nonnegative integers, |α| = α1 + · · · + αn , and c is a

459

Fractional Integrals

constant independent of B and f . (In case k = 2, apply part (ii) of Corollary 14.6 to f and each component of its gradient, and verify the estimate B

when n > 2.)

1 1 1 dz ≤ cn , n−1 n−1 |x − z| |y − z| |x − y|n−2

x, y ∈ B,

15 Weak Derivatives and Poincaré–Sobolev Estimates First-order Poincaré–Sobolev estimates in Rn are inequalities showing how Lp norms of the gradient of a function control the function itself. For a sufficiently smooth function f , the first-order partial derivatives ∂f /∂xi , i = 1, . . . , n, and the gradient ∇f of course have the usual meanings. When f is continuously differentiable and n > 1, the first-order Poincaré–Sobolev estimates that we will derive are fairly simple consequences of the subrepresentation formulas and norm estimates for fractional integrals proved in Chapter 14. A notable exception to the simplicity of their derivation occurs when p = 1, as we will see. However, Poincaré–Sobolev estimates are also true for less smooth functions. Our first goal is to study a weaker notion of ∇f that may exist when the ordinary gradient does not. This will allow us to extend the subrepresentation formulas in Chapter 14 to more general functions than those of class C1 . Poincaré–Sobolev estimates for functions with weak derivatives can then be derived for n > 1 by using the same pattern as for smooth functions, namely, by applying norm estimates for the fractional integral operator I1 . Results when n = 1 will instead be obtained from the representation in Chapter 7 of an absolutely continuous function as the integral of its derivative.

15.1 Weak Derivatives Let be an open set in Rn , n ≥ 1. Let L1loc () denote the class of locally integrable real-valued functions f on ; as usual, we say f is locally integrable on if it is integrable on every compact subset of . The notion of the weak firstorder partial derivatives of such an f is based on generalizing the standard formula for integration by parts by allowing functions that may be different from the ordinary partial derivatives of f to play their role in the formula. The precise definition in case n > 1 is as follows: if f ∈ L1loc () and i = 1, . . . , n, then f (x) is said to have a weak partial derivative in with respect to xi , where x = (x1 , . . . , xn ), if there is a function gi ∈ L1loc () such that

461

462

Measure and Integral: An Introduction to Real Analysis

f (x)

∂ϕ (x) dx = − gi (x) ϕ(x) dx ∂xi

for every ϕ ∈ C∞ 0 ().

(15.1)

The justification for not including “integrated terms” on the right side of (15.1) is that each function ϕ has compact support in . An important part of the definition that f has a weak partial derivative gi in is that both f and gi belong to L1loc (), but neither f nor gi is required to belong to L1 (). The functions ϕ in (15.1) are called test functions because they serve to test whether the same function gi satisfies (15.1) as ϕ varies over a fairly large collection of functions with compact support. As we will see later, using C∞ 0 () as the class of test functions is sufficient to ensure that gi is unique if it exists, and it will then make sense to refer to gi as “the” weak partial derivative of f with respect to xi . On the other hand, if (15.1) holds for all ϕ ∈ C∞ 0 (), then it also holds for all ϕ ∈ Lip0 (), that is, for all ϕ that are Lipschitz continuous and compactly supported in (see Theorem 15.7). In case n = 1 and is an open set in (−∞, ∞), the analogous definition is that a function f ∈ L1loc () has a weak derivative in if there exists g ∈ L1loc () such that

f (x)ϕ (x) dx = −

g(x)ϕ(x) dx

for every ϕ ∈ C∞ 0 ().

(15.2)

There are simple, almost trivial, examples of functions that have weak partial derivatives but have ordinary partial derivatives nowhere. For example, in case n = 1, if −∞ < s < t < ∞, the function χ(x) that equals s for rational 1 x and equals t for irrational x has this property since for every ϕ ∈ C∞ 0 (R ), ∞ −∞

χ ϕ dx = t

∞ −∞

ϕ dx = 0 = −

∞

0 ϕ dx.

−∞

Therefore, χ has weak derivative 0 in R1 even though its ordinary derivative exists nowhere; in fact, χ is continuous nowhere. The averaging process that is inherent in integrating χ ϕ compensates for the lack of ordinary differentiability of χ. We leave it as an exercise to construct a similar example in any dimension. On the other hand, as we will see in Theorem 15.4, the weak first partial derivatives with respect to xi , i = 1, . . . , n, of a locally Lipschitz continuous function f agree with its corresponding ordinary partial derivatives ∂f /∂xi . Let us begin by making several comments related to (15.1) and (15.2).

463

Weak Derivatives and Poincaré–Sobolev Estimates

The integrals on both sides of (15.1) are well-defined and finite since if K denotes the support of ϕ, then ∂ϕ f dx = f ∂ϕ dx ≤ ∂ϕ | f | dx < ∞, ∂x ∞ ∂x ∂x i i i L (K) K

K

where we have used the local integrability of f in , and similarly, |gi ϕ| dx = |gi ϕ| dx ≤ ϕL∞ (K) |gi | dx < ∞. K

K

Next, let us consider the question of uniqueness of weak derivatives. Let i = 1, . . . , n and suppose that f has a weak partial derivative gi with respect to xi in . Clearly, both f and gi can be changed arbitrarily in any subset of of measure zero without affecting (15.1). However, it turns out that gi is uniquely determined a.e. in by f . To prove this, it is enough to first note that if g˜ i is another function in L1loc () that satisfies (15.1) for the same f , then (gi − g˜ i ) ϕ dx = 0 for all ϕ ∈ C∞ 0 (),

and then to apply part (i) of the next lemma with g chosen to be gi − g˜ i . Lemma 15.3

Let be an open set in Rn and let g ∈ L1loc ().

(i) If

g ϕ dx = 0

for all ϕ ∈ C∞ 0 (),

then g = 0 a.e. in . (ii) If is connected and

g

∂ϕ dx = 0 for all i = 1, . . . , n and all ϕ ∈ C∞ 0 (), ∂xi

then g is constant a.e. in . Part (ii) of Lemma 15.3 will be used in the proof of Theorem 15.6. Proof. Part (i) can be proved in several ways. We will use a method based on a smooth approximation of the identity; the method is flexible and will

464

Measure and Integral: An Introduction to Real Analysis

be used in other situations in this chapter. Let k(x) ∈ C∞ 0 ({|x| < 1}) sat isfy Rn k(x) dx = 1, and set kε (x) = ε−n k(x/ε), ε > 0. Each function kε (x) is infinitely differentiable in Rn and supported in {|x| < ε}. Let B be any fixed ball (or bounded open interval if n = 1) whose closure lies in . Then there is a number ε0 > 0 and a compact set K0 with B ⊂ K0 ⊂ such that for all y ∈ B and all ε ∈ (0, ε0 ), the function kε (y − x) considered as a function of x has support in K0 . By hypothesis, we obtain

g(x) kε (y − x) dx = 0 for all y ∈ B and all ε < ε0 .

For such y and ε, the integral on the left equals

(g χK0 )(x)kε (y − x) dx,

Rn

which by Theorem 9.13 converges to (gχK0 )(y) for a.e. y ∈ Rn as ε → 0, and so converges to g(y) for a.e. y ∈ B as ε → 0. Here, g χK0 is of course interpreted to be 0 outside K0 , that is, g χK0 (x) =

g(x) if x ∈ K0 0 if x ∈ Rn − K0 ,

and we have used the fact that g χK0 ∈ L1 (Rn ). It follows that g is zero a.e in B and therefore also a.e. in . This proves (i). To prove (ii), let kε , B, ε0 , and K0 be as above. If y ∈ B and ε < ε0 , then for every i = 1, . . . , n, we have by hypothesis that 0=

=−

g(x)

∂ kε (y − x) dx ∂xi

∂ g(x)kε (y − x) dx. ∂yi

Hence, for all ε < ε0 , there is a constant cε = cε,B such that

g(x)kε (y − x) dx = cε

for all y ∈ B.

As above, assuming that y ∈ B and ε < ε0 , g can be replaced by the integrable function gχK0 in this integration, and the domain of integration can then be enlarged to Rn . Now let ε → 0. Then the integral converges to g(y) a.e. in B, and consequently cε converges to some constant c, and g(y) = c a.e. in B.

Weak Derivatives and Poincaré–Sobolev Estimates

465

Since is connected by hypothesis, it follows that g is constant a.e. in . This completes the proof of the lemma. Since the function gi in (15.1) is unique (if it exists), we may call it the weak partial derivative of f with respect to xi in . It is customary to use the familiar notation ∂f /∂xi for gi even though f may not have a partial derivative with respect to xi in in the ordinary sense. Similarly, when n = 1, the weak derivative of f will be denoted by f . In situations when the notation might cause confusion, it can be clarified by also using the term “weak” or “ordinary” derivative as the case may be. Let us now show that if f ∈ Liploc (), then f has weak partial derivatives that are the same as its ordinary partial derivatives. Recall from the discussion preceding the Rademacher–Stepanov Theorem 7.53 that any f ∈ Liploc () has ordinary partial derivatives ∂f /∂xi , i = 1, . . . , n, a.e. in that are measurable and locally bounded in . We note however that the full strength of the Rademacher–Stepanov theorem is not used here. Theorem 15.4 Let be an open set in Rn and let f ∈ Liploc (). Then for each i = 1, . . . , n, f has a weak partial derivative gi with respect to xi in given by the ordinary partial derivative ∂f /∂xi , that is, ∂ϕ ∂f f dx = − ϕ dx ∂xi ∂xi

if ϕ ∈ C∞ 0 () and i = 1, . . . , n.

Proof. Let , f , and ϕ satisfy the hypothesis, and let ∂f /∂xi denote the ordinary partial derivative of f with respect to xi . Note that both integrals in the conclusion are finite since ϕ and ∂ϕ/∂xi are bounded and both f and ∂f /∂xi are integrable (even bounded) on the support of ϕ. We will prove the result in case n > 1 and i = 1; the general case is similar. Denote points x ∈ Rn by x = (x1 , x2 , . . . , xn ) = (x1 , xˆ )

with xˆ = (x2 , . . . , xn ) ∈ Rn−1 ,

and write xˆ = {x1 : (x1 , xˆ ) ∈ } and

dx = dx1 dˆx

with dˆx = dx2 · · · dxn .

By Theorem 6.8,

⎡ ⎤ ∂ϕ ∂ϕ ⎣ f (x1 , xˆ ) f dx = (x1 , xˆ ) dx1 ⎦ dˆx. ∂x1 ∂x 1 n−1 R

xˆ

466

Measure and Integral: An Introduction to Real Analysis

Each set xˆ is an open set in (−∞, ∞). If xˆ is not empty, then Theorem 1.10 implies that it can be written as a countable union j (αj , βj ) of disjoint open intervals (αj , βj ), possibly of infinite length. The intervals (αj , βj ) are the connected components of xˆ and of course depend on and xˆ . When f (x1 , xˆ ) and ϕ(x1 , xˆ ) are considered as functions of x1 , f (x1 , xˆ ) is locally Lipschitz continuous on xˆ and ϕ(x1 , xˆ ) is supported in the union j (aj , bj ) of open intervals (aj , bj ) that satisfy [aj , bj ] ⊂ (αj , βj ) and bj − aj < ∞ for each j. Therefore, using Theorem 7.32(ii) and the absolute continuity on [aj , bj ] of f (x1 , xˆ ), we obtain βj αj

bj

∂ϕ ∂ϕ f (x1 , xˆ ) (x1 , xˆ ) dx1 = f (x1 , xˆ ) (x1 , xˆ ) dx1 ∂x1 ∂x 1 a j

bj ∂f βj ∂f =− (x1 , xˆ )ϕ(x1 , xˆ ) dx1 = − (x1 , xˆ )ϕ(x1 , xˆ ) dx1 . ∂x1 ∂x1 a α j

j

Adding over j gives

f (x1 , xˆ )

xˆ

∂f ∂ϕ (x1 , xˆ ) dx1 = − (x1 , xˆ )ϕ(x1 , xˆ ) dx1 . ∂x1 ∂x1 xˆ

Hence,

⎡ ⎤ ∂f ∂ϕ ⎣− f dx = (x1 , xˆ )ϕ(x1 , xˆ ) dx1 ⎦ dˆx ∂x1 ∂x1 n−1 xˆ

R

∂f ϕ dx, =− ∂x1

where the final equality is also due to Theorem 6.8, completing the proof. Let f be a function defined on an open interval (α, β) ⊂ (−∞, ∞), possibly of infinte length. We say that f is absolutely continuous on (α, β) if f is absolutely continuous on every compact subinterval [a, b] of (α, β). By Theorem 7.29, this is equivalent to assuming that f has an ordinary derivative f a.e. in (α, β) that is locally integrable in (α, β) and that

f (b) − f (a) =

b a

f (x) dx

if α < a ≤ b < β.

Weak Derivatives and Poincaré–Sobolev Estimates

467

Remark 15.5 The proof of Theorem 15.4 shows that the conclusion ∂ϕ ∂f f dx = − ϕ dx ∂xi ∂xi

n if ϕ ∈ C∞ 0 (R )

is true for a particular i, i = 1, . . . , n, if the assumption that f ∈ Liploc () is weakened by assuming both of the following: (a) f is absolutely continuous with respect to xi on every connected component of the intersection of with a.e. ((n − 1)-dimensional measure) line parallel to the xi axis. (b) f and its ordinary partial derivative ∂f /∂xi belong to L1loc (). In case n = 1, condition (b) in Remark 15.5 is automatically true for any f that is absolutely continuous on every connected component of . In fact, we have the next basic result. Theorem 15.6 Let be an open set in (−∞, ∞) and f ∈ L1loc (). Then f has a weak derivative in if and only if f can be redefined in a subset of of measure zero so that f is absolutely continuous on every connected component of , that is, if and only if there is a function h on such that f = h a.e. in and h is absolutely continuous on every connected component of . Moreover, the weak derivative f of f coincides with the ordinary derivative h . Proof. The sufficiency of the condition follows from Remark 15.5, as we have already noted. To prove the necessity, suppose that f has weak derivative f on , and decompose into the countable union of disjoint open intervals (αj , βj ), possibly of infinite length: = j (αj , βj ). These intervals are the connected components of . Given an interval (αj , βj ), choose a point γj ∈ (αj , βj ) and define f˜ in (αj , βj ) by

f˜ (x) =

x

f (t) dt,

x ∈ (αj , βj ).

γj

In this way, f˜ is defined in all of and is absolutely continuous on every (αj , βj ) since f ∈ L1loc (). Clearly, f is the weak derivative of f on each (αj , βj ) since it is the weak derivative of f on . Therefore, for every ϕ ∈ C∞ 0 ((αj , βj )), we have

468

Measure and Integral: An Introduction to Real Analysis

βj

βj

f ϕ dx = −

αj

f ϕ dx

αj

βj = − (f˜ ) ϕ dx

by Theorem 7.11

αj

=

βj

f˜ ϕ dx

by Theorem 7.32(ii).

αj

In the middle equality above, (f˜ ) denotes the ordinary derivative of f˜ . Hence, for all ϕ ∈ C∞ 0 ((αj , βj )), βj (f − f˜ ) ϕ dx = 0, αj

and it follows from the second part of Lemma 15.3 (in the one-dimensional case of an open interval) that there is a constant cj such that f (x) = f˜ (x) + cj

for a.e. x ∈ (αj , βj ).

Define h(x) in by h(x) = f˜ (x) + cj if x ∈ (αj , βj ), j ≥ 1. Then h is absolutely continuous on every (αj , βj ) and f = h a.e. in . Also, the ordinary derivative h satisfies h = (f˜ ) = f a.e. in , and the proof is complete. A simple application of Theorem 15.6 is that the step function f (x) =

1 if 0 < x < 1 −1 if − 1 < x < 0

does not have a weak derivative on (−1, 1), even though it is infinitely differentiable on (−1, 1) except at a single point. This follows immediately from Theorem 15.6 since f has a jump discontinuity in (−1, 1), but a direct verification is not difficult. In fact, first note that if ϕ is infinitely differentiable and supported in (−1, 1), then the left side of (15.2) equals 1 −1

f ϕ dx =

1 0

ϕ dx −

0

ϕ dx

−1

=[ϕ(1) − ϕ(0)] − [ϕ(0) − ϕ(−1)] = −2ϕ(0)

469

Weak Derivatives and Poincaré–Sobolev Estimates

since ϕ(1) = ϕ(−1) = 0. If (15.2) were true, there would then exist g ∈ L1loc (−1, 1) such that 1

gϕ dx = 2ϕ(0)

for all ϕ ∈ C∞ 0 ((−1, 1)).

−1

We leave it as an exercise to show that this is impossible; see Exercise 2. See also Exercise 3. We conclude our basic comments related to (15.1) and (15.2) by showing that the class C∞ 0 () of test functions ϕ used in the definition of weak partial derivatives on can be enlarged to the class Lip0 () of Lipschitz continuous functions with compact support in . Thus, once it is verified by testing with the class C∞ 0 () that a function has weak derivatives, it is legitimate to use (15.1) (or (15.2)) for the larger class of functions ϕ ∈ Lip0 (). Theorem 15.7 Let be an open set in Rn and suppose that f has a weak partial derivative ∂f /∂xi in for some i, i = 1, . . . , n. Then ∂ϕ ∂f f dx = − ϕ dx ∂xi ∂xi

for every ϕ ∈ Lip0 ().

−n Proof. Consider again an approximation of the identity kε (x) = ε k(x/ε), ∞ n ε > 0, x ∈ R , with k ∈ C0 ({|x| < 1}) and Rn k dx = 1. Let ϕ ∈ Lip0 (). By extending ϕ to be zero outside , we may think of ϕ as being Lipschitz continuous in all of Rn . Since ϕ has compact support in , there is a compact set K0 ⊂ and a number ε0 > 0 such that K0 contains the supports of ϕ and ϕ ∗ kε for all ε < ε0 . Moreover, we may assume that ε0 is chosen so small that for all ε < ε0 and x ∈ K0 , kε (x − y) considered as a function of y is supported in . By the definition of the weak derivative ∂f /∂xi ,

f (x)

∂f ∂ (ϕ ∗ kε )(x) dx = − (x) (ϕ ∗ kε )(x) dx, ∂xi ∂xi

ε < ε0 .

Let us compute the limit as ε → 0 of each side of this equation. The domains of integration on both sides may be replaced by K0 because ε < ε0 . The expression on the right is then −

∂f (ϕ ∗ kε ) dx. ∂xi

K0

470

Measure and Integral: An Introduction to Real Analysis

Since ∂f /∂xi ∈ L1 (K0 ) and (ϕ ∗ k )(x) converges pointwise to ϕ(x) and is bounded uniformly in ε and x, this has limit equal to −

∂f ∂f ϕ dx = − ϕ dx. ∂xi ∂xi

K0

On the other hand, for ε < ε0 , the integral on the left side earlier above is K0

f (x)

∂ (ϕ ∗ kε )(x) dx. ∂xi

In order to compute its limit as ε → 0, note that

∂ ∂ k (x − y) dy (ϕ ∗ kε )(x) = ϕ(y) ∂xi ∂xi ∂ϕ

∂ = ϕ(y) − k (x − y) dy = (y)kε (x − y) dy, ∂yi ∂yi

where to obtain the final equality, we have assumed that x ∈ K0 and ε < ε0 and applied Theorem 15.4 to the Lipschitz function ϕ and the smooth test function kε (x−y) of y. (See the related result in Exercise 4.) Since ϕ is Lipschitz continuous, the last integral is bounded uniformly in x and ε for x ∈ Rn and ε > 0, and it converges a.e. to (∂ϕ/∂xi )(x) as ε → 0. Since f ∈ L1 (K0 ), the Lebesgue dominated convergence theorem implies that lim

ε→0

f (x)

K0

∂ϕ ∂ (ϕ ∗ kε )(x) dx = f dx ∂xi ∂xi K0

=

∂ϕ f dx. ∂xi

This completes the computation of the limits. It follows that ∂ϕ ∂f f dx = − ϕ dx, ∂xi ∂xi

and the theorem is proved.

471

Weak Derivatives and Poincaré–Sobolev Estimates

If a locally integrable function f in has weak partial derivatives ∂f /∂xi in for every i = 1, . . . , n, n > 1, we write ∇f =

∂f ∂f ,..., ∂x1 ∂xn

in

and call this vector the weak gradient of f in . It is customary to use the same notation ∇f for the weak gradient as for the ordinary gradient. By Theorem 15.4, the two notions are the same if f ∈ Liploc (). Note that if f is any function with a weak gradient in , then both f and |∇f | are locally integrable in by definition; here, the notation |∇f | stands for

n ∂f |∇f | = ∂x i=1

i

2 1/2 .

15.2 Smooth Approximation and Sobolev Spaces In Section 15.3, before deriving Poincaré–Sobolev estimates, we will show that the subrepresentation formulas for C1 functions in Chapter 14 can be extended to functions with a weak gradient. The proofs of these extended versions of the formulas will be based on the known ones for C1 functions and various results about approximation by smooth functions. Approximation theorems like Theorem 15.8 are generally attributed to K. Friedrichs, although their hypotheses may vary. Some variants are discussed farther below and in the exercises. Theorem 15.8 Let be an open set in Rn and K be a compact subset of . If f has a weak gradient ∇f in , then there is a sequence {fj }∞ j=1 of functions on such that (i) fj ∈ C∞ 0 () for all j, (ii) fj → f a.e. in K and in L1 (K) norm as j → ∞, (iii) ∇fj → ∇f a.e. in K and in L1 (K) norm as j → ∞. In part (iii), the terminology ∇fj → ∇f in L1 (K) norm means that for every i = 1, . . . , n, ∂fj /∂xi → ∂f /∂xi in L1 (K) norm as j → ∞. Before giving a proof, we note that more can be said about the supports of the approximating functions fj : if G is any open set satisfying K ⊂ G ⊂ ,

472

Measure and Integral: An Introduction to Real Analysis

then conclusion (i) of the theorem can be replaced by fj ∈ C∞ 0 (G) for all j. Of course, the sequence {fj } then depends on G as well as on K. Indeed, to see why, we have only to apply the theorem with replaced by G, after noting that if f has weak gradient ∇f in an open set , then f also has weak gradient ∇f in any open set G ⊂ . See Theorem 15.9 about the existence of a single subsequence {fj } that has the properties in Theorem 15.8 for every compact set K ⊂ . Proof. We will use the standard method based on a smooth approximation of the identity, with a further refinement. As usual, B(x; r) denotes the (open) ball with center x and radius r. Let f , , and K satisfy the hypothesis, and let kε (x) be as in the proof of Theorem 15.7. Choose an open set G and a number ε0 > 0 such that K ⊂ G, G has compact closure in , B(x; ε) ⊂ G for all x ∈ K and all ε < ε0 , and B(y; ε) ⊂ for all y ∈ G and all ε < ε0 . For example, G can be chosen as G=

B(x; ε0 ),

x∈K

where ε0 is chosen less than half the distance from K to the complement of (cf. Exercise 1(l) in Chapter 1). Note that kε (x − y) considered as a function of y has support in G for all x ∈ K and all ε < ε0 . Let g = f χG denote the function on Rn obtained by extending f to be zero outside G. Then g ∈ L1 (Rn ) since f ∈ L1loc () by hypothesis. Now define gε (x) = (g ∗ kε )(x),

x ∈ Rn , ε > 0.

n Then gε ∈ C∞ 0 (R ) by Theorem 9.3 and the comments after its proof. Moreover, since g vanishes outside G and kε has support in B(0; ε), it follows that gε has support in if ε < ε0 . By Theorems 9.6 and 9.13, gε → g in L1 (Rn ) norm and pointwise a.e. in Rn as ε → 0. Hence, gε → f in L1 (K) and pointwise a.e. in K. Next, by Theorem 9.3, for all x ∈ Rn , all ε > 0 and every i = 1, . . . , n, we have

∂gε ∂ kε (x − y) dy (x) = g(y) ∂xi ∂xi n R

=−

g(y)

Rn

=−

G

f (y)

∂ kε (x − y) dy ∂yi

∂ kε (x − y) dy. ∂yi

Weak Derivatives and Poincaré–Sobolev Estimates

473

Considered as a function of y, kε (x−y) has support in B(x; ε) and therefore has support in G if x ∈ K and ε < ε0 . Since ∂f /∂xi is the weak partial derivative of f with respect to xi in , it is also the weak partial derivative of f with respect to xi in G. Consequently, ∂f ∂gε (x) = (y)kε (x − y) dy ∂xi ∂yi

if x ∈ K and ε < ε0 .

G

The last integral equals

which converges to

∂f ∂xi χG

∂f χG ∗ kε (x), ∂xi

(x) in L1 (Rn ) and pointwise a.e. in Rn as ε → 0.

Restricting x to K, we obtain that ∂gε /∂xi → ∂f /∂xi in L1 (K) and pointwise a.e. in K. The theorem now follows by choosing fj = gεj for any sequence {εj }∞ j=1 → 0 such that εj < ε0 for all j. The proof of Theorem 15.8 can be modified to yield a single sequence {fj } that has the same properties as in Theorem 15.8 for every compact set K ⊂ . Indeed, we have the following result whose proof is left to the reader (see Exercise 5).

Theorem 15.9 Under the same hypotheses on and f as in Theorem 15.8, there is a sequence {fj } that satisfies the three properties in the conclusion of Theorem 15.8 for every compact set K ⊂ . It follows immediately that the sequence {fj } in Theorem 15.9 has the pointwise convergence properties fj → f and ∇fj → ∇f a.e. in . We arrive naturally at the definition of the Sobolev space W 1,p () by considering functions f that have a weak gradient in such that both f and |∇f | belong to Lp (), that is, if is an open set in Rn and 1 ≤ p ≤ ∞, then W 1,p () is defined by W 1,p () = f ∈ Lp () : f has a weak gradient in satisfying |∇f | ∈ Lp () . (15.10) The purpose of the first superscript 1 in the notation W 1,p is to indicate that only first-order derivatives enter the definition. We will not consider Sobolev spaces involving weak derivatives of order more than 1, although these spaces have important applications. In fact, we only consider a few

474

Measure and Integral: An Introduction to Real Analysis

aspects of the spaces W 1,p (), and then our primary concerns are the cases when is either a ball or the entire space Rn . One fact about (15.10) deserves emphasis: in order that f ∈ W 1,p (), the p definition requires that f and |∇f | belong to Lp () and not just to Lloc (). On the other hand, in case is a ball B, we will see in Section 15.3 that f ∈ W 1,p (B) if f ∈ L1loc (B) and f has a weak gradient in B satisfying |∇f | ∈ Lp (B). In other words, in the case of a ball B, the requirement in (15.10) that f ∈ Lp (B) can be replaced by assuming only that f ∈ L1loc (B). Some basic properties of Sobolev spaces are given in the exercises. For example (see Exercise 7), if is any open set in Rn and 1 ≤ p ≤ ∞, then W 1,p () is a Banach space with respect to the norm f W1,p ()

n ∂f = f Lp () + ∂x

.

(15.11)

i Lp ()

i=1

An equivalent norm is n 1/2 ∂f 2 f Lp () + ∂x i i=1

= f Lp () + ∇f Lp () .

Lp ()

Moreover, W 1,p () is separable if 1 ≤ p < ∞. In case 1 ≤ p < ∞, another equivalent norm is p f Lp ()

n ∂f + ∂x

p p

1/p ,

i L ()

i=1

and using this norm when p = 2 makes W 1,2 () a Hilbert space with inner product (f , g) =

fg dx +

∇f · ∇g dx.

(15.12)

See Exercises 11 and 12 for versions of the product rule and the chain rule in W 1,p (). Next, we state a way that functions in W 1,p () can be approximated by smooth functions when p is finite. Theorem 15.13 Let 1 ≤ p < ∞ and be an open set in Rn . If f ∈ W 1,p (), n there is a sequence {fj }∞ j=1 of functions on R such that n (i) fj ∈ C∞ 0 (R ) for all j,

475

Weak Derivatives and Poincaré–Sobolev Estimates

(ii) fj → f a.e. in and in Lp () norm as j → ∞, (iii) ∇fj → ∇f a.e. in and in Lp (K) norm as j → ∞ for every compact set K ⊂ . Moreover, in case = Rn and f ∈ W 1,p (Rn ) for some p with 1 ≤ p < ∞, the sequence {fj } can be chosen to converge to f in W 1,p (Rn ) norm, that is, so that fj → f in Lp (Rn ) and ∇fj → ∇f in Lp (Rn ).

Lp

Here the terminology ∇fj → ∇f in Lp norm means that ∂fj /∂xi → ∂f /∂xi in norm as j → ∞ for every i = 1, . . . , n.

Proof. The proof is similar to that of Theorem 15.8, now using the Lp versions of Theorems 9.6 and 9.13. We will leave some details for the reader to check. Fix p, , and f satisfying the hypotheses, and define g = f χ by extending f to be zero outside . Note that g belongs to Lp (Rn ) but may not have compact support and that g = f everywhere if = Rn . For ε > 0, define gε = g∗kε where kε is the same approximation of the identity used in the proofs of Theorems 15.7 and 15.8. Then gε ∈ C∞ (Rn ), although it may not have compact support, and gε → g in Lp (Rn ) and pointwise a.e. in Rn . Hence, gε → f in Lp () and pointwise a.e. in . The proof that ∇gε → ∇f pointwise a.e. in and also in Lp (K) norm for every compact set K in is left to the reader, as is the proof that if = Rn , then the sequence can be chosen with ∇gε → ∇f in Lp (Rn ) norm. Finally, whether = Rn or not, let η(x) be a smooth cutoff function on Rn n such that η ∈ C∞ 0 (R ) and η(x) =

1 0

if |x| ≤ 1 if |x| ≥ 2.

Then for any sequence εj → 0, the functions {fj } defined on Rn by fj (x) = η(εj x)gεj (x) have compact support in Rn and satisfy all the properties stated in the theorem. For example, let us show why ∇fj → ∇f in Lp (K) for every compact set K ⊂ . We compute ∇fj (x) = εj (∇η)(εj x)gεj (x) + η(εj x)∇gεj (x),

x ∈ Rn .

Therefore, if x ∈ , ∇fj (x) − ∇f (x) ≤ εj ∇ηL∞ (Rn ) |gε (x)| + η(εj x)∇gε (x) − ∇f (x) j j

+ 1 − η(εj x) |∇f (x)| := αj (x) + βj (x) + γj (x).

476

Measure and Integral: An Introduction to Real Analysis

First consider αj . Denote M = ∇ηL∞ (Rn ) sup gεj Lp (Rn ) . j

Then M is finite and αj Lp () ≤ Mεj → 0 as j → ∞. We estimate the norm of γj by using the facts that ηj (εj x) = 1 if |εj x| ≤ 1 and |∇f | ∈ Lp () with p finite: ⎛ ⎜ γj Lp () ≤ ⎝

⎞1/p ⎟ |∇f |p dx⎠

→0

as j → ∞.

x∈;|x|>ε−1 j

Next, fix a compact set K ⊂ . Since ∇gεj → ∇f in Lp (K), then βj Lp (K) ≤ ∇gεj − ∇f Lp (K) → 0 as j → ∞. Collecting estimates, it follows that ∇fj → ∇f in Lp (K) for every compact set K ⊂ . Next, suppose that = Rn and recall that ∇gεj → ∇f in Lp (Rn ) in this case. Hence, βj Lp (Rn ) ≤ ∇gεj − ∇f Lp (Rn ) → 0 as j → ∞. Finally, the arguments proving that both αj Lp (Rn ) and γj Lp (Rn ) tend to 0 were already included above. Thus, in case = Rn , we obtain that ∇fj → ∇f in Lp (Rn ). The remaining details of the proof are left to the reader. In some situations, fewer properties of the approximating sequence {fj } are needed than the ones listed in Theorems 15.8 or 15.13. The following simple variant of Theorem 15.13 will be useful in the proof of Theorem 15.45. Theorem 15.14 Let f have a weak gradient in Rn that satisfies |∇f | ∈ Lp (Rn ) for some p, 1 ≤ p < ∞. If either f ∈ Lr (Rn ) for some r with 1 ≤ r < ∞ or ∞ n lim|x|→∞ f (x) = 0, then there is a sequence {fj }∞ j=1 ⊂ C (R ) with lim|x|→∞ fj (x) = 0 for every j such that fj → f pointwise a.e. in Rn and ∇fj → ∇f in Lp (Rn ) norm as j → ∞.

477

Weak Derivatives and Poincaré–Sobolev Estimates

Proof. Let kε be as usual and let fj = f ∗ kεj for any sequence εj → 0. Then fj ∈ C∞ (Rn ) for all j. Also, fj → f a.e. in Rn since f ∈ L1loc (Rn ), and ∇fj → ∇f in Lp (Rn ) as usual. It remains to show that each fj (x) → 0 as |x| → ∞. First, suppose that f (x) → 0 as |x| → ∞. Then, for every ε > 0, sup | f | → 0

as

B(x;ε)

|x| → ∞,

and we are done since |(f ∗ kε )(x)| ≤

sup | f | kL1 (Rn ) .

B(x;ε)

Finally, assuming instead that f ∈ Lr (Rn ) for some r with 1 ≤ r < ∞, we have |(f ∗ kε )(x)| = f (y)kε (x − y) dy |x−y| 0, then | f (x)| ≤

cn |∇f (y)| dy γ |x − y|n−1

for a.e. x ∈ B.

B

(ii) If f has compact support in B, then | f (x)| ≤ cn

B

|∇f (y)| dy |x − y|n−1

for a.e. x ∈ B.

In both parts, the constant cn depends only on n. Proof. The proof of part (i) is omitted since it is essentially identical to the corresponding proof in Corollary 14.6. To prove (ii), let f have compact support and weak gradient ∇f in a ball B. Extend f and ∇f to Rn by setting them equal to zero outside B. Let B∗ be the ball of radius 2r(B) concentric with B. By Theorem 15.15, ∇f is the weak gradient of f in B∗ . Also, by Exercise 13, |∇f | = 0 outside B. By part (i) of this corollary applied to B∗ and its subset E = B∗ − B, we then obtain | f (x)| ≤ cn

B∗

|∇f (y)| |∇f (y)| dy = c dy n |x − y|n−1 |x − y|n−1

for a.e. x ∈ B, which completes the proof.

B

481

Weak Derivatives and Poincaré–Sobolev Estimates

There is also an analogue of Corollary 14.7. In stating it, we assume f has a weak gradient in Rn and satisfies the extra condition that there is a sequence n {Bi }∞ i=1 of balls such that Bi R and fBi → 0 as i → ∞. As in Chapter 14, typical situations when this holds are (i) f ∈ Lr (Rn ) for some r with 1 ≤ r < ∞, or (ii) f ∈ Lrloc (Rn ) for some r with 1 ≤ r ≤ ∞ and lim|x|→∞ f (x) = 0. Corollary 15.22 Let f have weak gradient ∇f in Rn and suppose that fB → 0 for some sequence of balls B Rn . Then there is a constant cn depending only on n such that | f (x)| ≤ cn I1 (|∇f |)(x)

for a.e. x ∈ Rn ,

where I1 is the fractional integral operator of order 1. In particular, the conclusion holds if f ∈ W 1,p (Rn ) for any p with 1 ≤ p < ∞. Proof. Fix a function f with a weak gradient in Rn and fB → 0 for some sequence of balls B increasing to Rn . For any B in the sequence, since f is locally integrable in Rn , we have that f ∈ L1 (B), and Theorem 15.16 (see (15.18)) yields | f (x) − fB | ≤ cn I1 (|∇f |)(x)

for a.e. x ∈ B.

Now let B Rn . Since fB → 0, the result follows. We now turn to the Poincaré–Sobolev inequalities themselves. Four separate ranges of p values will be considered: 1 < p < n, p = n, p > n, and p = 1. The case p = 1 uses a more local subrepresentation formula than the ones derived so far. Our primary interest is again when n > 1. In fact, the range 1 < p < n does not exist if n = 1, and results when p = n = 1 and p > n = 1 are easy consequences of Theorem 15.6; see Exercise 20. We begin with the range 1 < p < n. Theorem 15.23 Let B be a ball in Rn , n > 1, and let p and q satisfy 1 < p < n and 1/q = 1/p − 1/n (so that q = pn/(n − p) > p). If f is a locally integrable function on B with a weak gradient ∇f belonging to Lp (B), then f ∈ Lq (B) and

1/q | f (x) − fB | dx

B

where fB = |B|−1

q

B f (x) dx

≤ cn,p

1/p p

|∇f (x)| dx

,

B

and the constant cn,p depends only on n and p.

(15.24)

482

Measure and Integral: An Introduction to Real Analysis

If f also has compact support in B, then

1/q q

| f (x)| dx

≤ cn,p

B

1/p p

|∇f (x)| dx

.

(15.25)

B

Some comments about Theorem 15.23 are given after its proof. Proof. Fix B, p, and f satisfying the hypothesis of the first part of the theorem, and let 1/q = 1/p − 1/n. Let us begin by showing that f ∈ L1 (B). Theorem 14.37(a) in case α = 1 gives I1 (|∇f |χB )Lq (Rn ) ≤ cn,p (∇f )χB Lp (Rn ) = cn,p ∇f Lp (B) < ∞. In particular, I1 (|∇f |χB ) is finite a.e. in B. Since f ∈ L1loc (B) by hypothesis, f is also finite a.e. in B. Furthermore, by (15.17), 1 | f (x) − f (y)| dy ≤ cn I1 (|∇f |χB )(x) |B|

a.e. in B,

B

and therefore, 1 | f (y)| dy ≤ | f (x)| + cn I1 (|∇f |χB )(x) |B|

a.e. in B.

B

The fact that f ∈ L1 (B) now follows by choosing a point x ∈ B such that both f (x) and I1 (|∇f |χB )(x) are finite. Hence, the average fB is finite and (15.18) holds: | f (x) − fB | ≤ cn I1 (|∇f |χB )(x)

a.e. in B.

Taking Lq norms over B, we obtain f − fB Lq (B) ≤ cn I1 (|∇f |χB )Lq (B) ≤ cn,p ∇f Lp (B) , as noted earlier. This proves (15.24). The fact that f ∈ Lq (B) now follows immediately from Minkowski’s inequality: f Lq (B) ≤ f − fB Lq (B) + | fB | |B|1/q ≤ cn,p ∇f Lp (B) + | fB | |B|1/q < ∞. This proves the first statement in Theorem 15.23.

Weak Derivatives and Poincaré–Sobolev Estimates

483

If f also has compact support in B, a similar argument based on part (ii) of Corollary 15.21 yields (15.25). Theorem 15.23 is now proved. We pause briefly in order to add several comments related to Theorem 15.23. The proof of Theorem 15.23 fails if p = 1 since I1 does not map L1 (Rn ) into n/(n−1) L (Rn ); Theorem 14.37 provides only a weak type estimate for I1 when p = 1. However, as will see in Theorem 15.37, Theorem 15.23 remains true if p = 1. Inequality (15.24) is often called the Lp , Lq Poincaré estimate for f and B, and (15.25) for compactly supported f is called the Lp , Lq Sobolev estimate for f . The formula 1/q = 1/p − 1/n is called the Poincaré–Sobolev dimensional balance formula or simply the balance formula. Because of it and the fact that the measure of any ball B in Rn is a fixed multiple of r(B)n , the Lp , Lq Poincaré estimate for B can be written in the equivalent normalized form

1/q 1/p 1 1 q p | f (x) − fB | dx ≤ cn,p r(B) |∇f (x)| dx . |B| |B| B

(15.26)

B

Similarly, the Lp , Lq Sobolev estimate can be written in the form

1/q 1/p 1 1 q p | f (x)| dx ≤ cn,p r(B) |∇f (x)| dx . |B| |B| B

(15.27)

B

A remarkable fact is that the hypothesis in the first part of Theorem 15.23 is nominally weaker than assuming f belongs to W 1,p (B), because f is not assumed to belong to Lp (B) or even to L1 (B), but the conclusion is stronger since f turns out to belong to Lq (B) for some q > p. In fact, as already shown in the proof of Theorem 15.23, under the same hypothesis as in the first part of Theorem 15.23, we have f Lq (B) ≤ cn,p ∇f Lp (B) + | fB | |B|1/q < ∞.

(15.28)

From a heuristic point of view, the main content of Theorem 15.23 is that there is a gain in the order of integrability of f . An important corollary is that W 1,p (B) ⊂ Lq (B) if 1 < p < n and 1/q = 1/p − 1/n. See also Exercise 14. Norm estimates over all of Rn for the range 1 < p < n can be derived easily from Corollary 15.22. The second part of the following result is usually referred to as the first-order Sobolev embedding theorem for Rn in case 1 < p < n. Theorem 15.29 Let f have a weak gradient in Rn satisfying |∇f | ∈ Lp (Rn ) for some p with 1 < p < n, and suppose that fB → 0 for some sequence of balls increasing

484

Measure and Integral: An Introduction to Real Analysis

to Rn . Let 1/q = 1/p − 1/n. Then f ∈ Lq (Rn ) and f Lq (Rn ) ≤ cn,p ∇f Lp (Rn ) . In particular, if f ∈ W 1,p (Rn ) for some p with 1 < p < n, then f Lq (Rn ) ≤ cn,p ∇f Lp (Rn ) ≤ cn,p f W1,p (Rn ) , 1/q = 1/p − 1/n. Proof. The first statement follows immediately by combining Corollary 15.22 and Theorem 14.37(a). Alternately, it can be derived by applying (15.24) to each ball in the sequence of balls in the hypothesis of the theorem. The second statement is a corollary of the first one since fB → 0 for any sequence of balls B Rn if f ∈ Lp (Rn ), 1 ≤ p < ∞. See also Exercise 15. Next, we consider the endpoint value p = n that was omitted in Theorem 15.23. When p = n, the corresponding value of q in the balance formula 1/q = β 1/p − 1/n is q = ∞. However, the function log |x| belongs to W 1,n ({|x| < 1/2}) if n > 1 and 0 < β < (n − 1)/n, but it is not bounded; see Exercise 16. On the other hand, by using the estimates of Moser–Trudinger type in Chapter 14, we obtain the following result about local exponential integrability. The case n = 2 was studied by Pohozaev. Theorem 15.30 Let B be a ball in Rn , n > 1, and f have a weak gradient in B satisfying |∇f | ∈ Ln (B). There are positive constants c1 and c2 depending only on n such that 1 | f (x) − fB | n/(n−1) exp c1 dx ≤ c2 . |B| ∇f Ln (B)

(15.31)

B

If f is also supported in B, then (15.31) also holds with fB replaced by 0. We have assumed here that ∇f Ln (B) = 0; otherwise, f is constant a.e. in B by Corollary 15.20. Proof. Let B and f satisfy the first hypothesis. Since |∇f | ∈ Ln (B), then |∇f | ∈ Lp (B) for 1 ≤ p ≤ n by Hölder’s inequality, and consequently f ∈ L1 (B) by Theorem 15.23. The proof of (15.31) then follows immediately by combining (15.18) with the case α = 1 of Theorem 14.40. If f also has compact support

485

Weak Derivatives and Poincaré–Sobolev Estimates

in B, a similar argument based on the second part of Corollary 15.21 yields the second part of the theorem. In case n = 1, a better result than (15.31) holds; see Exercise 20. An immediate corollary of (15.31), under the same assumptions on f , is that for every r with 0 < r < ∞, 1 |B|

B

| f (x) − fB | ∇f Ln (B)

r

dx ≤ cn,r ,

or equivalently

1/r 1 r | f (x) − fB | dx ≤ cn,r ∇f Ln (B) , |B|

0 < r < ∞.

B

For a related global inequality, see Corollary 15.41. Let us now consider the case p > n. Theorem 15.32 (Morrey) Let B be a ball in Rn and n < p ≤ ∞. If f has a weak gradient ∇f in B satisfying |∇f | ∈ Lp (B), then after possible redefinition of f in a subset of B of measure zero, f is Hölder continuous of order 1−(n/p) on B. Moreover, there is a constant cn,p depending only on n and p such that | f (x) − f (y)| ≤ cn,p ∇f Lp (B) |x − y|

1− np

,

x, y ∈ B.

The order 1 − (n/p) of Hölder continuity should be interpreted as 1 if p = ∞, that is, f is Lipschitz continuous on B if p = ∞. Proof. Let B, p, and f satisfy the hypothesis. Then |∇f | ∈ Lr (B) for 1 ≤ r ≤ p, so f ∈ L1 (B) by previous results. Therefore, using (15.18) and Hölder’s inequality, for any ball B1 ⊂ B and a.e. x ∈ B1 , we obtain | f (x) − fB1 | ≤ cn

B1

|∇f (y)| dy |x − y|n−1 ⎛

≤ cn ∇f Lp (B1 ) ⎝

B1

⎞1/p 1 ⎠ , dy |x − y|(n−1)p

486

Measure and Integral: An Introduction to Real Analysis

1/p + 1/p = 1. Note that (n − 1)p < n since p > n. Also, B1 ⊂ B(x; 2r(B1 )) for every x ∈ B1 . Hence, if x ∈ B1 , then ⎛ ⎝

B1

⎞1/p ⎛ 1 ⎠ ≤⎝ dy |x − y|(n−1)p

|x−y| 2k , | f (x0 ) − fB | < 2k−1 , or | f (x0 ) − fB | = 2k−1 . In either of the first two cases, since f is continuous, there is a neighborhood of x0 in which gk is constant, namely, in which gk ≡ 2k or gk ≡ 2k−1 , respectively, and consequently |∇gk (x0 )| = 0 in either of the first two cases. On the other hand, if | f (x0 ) − fB | = 2k−1 , then gk (x0 ) = 2k−1 , and therefore, gk has an absolute minimum at x0 (since gk ≥ 2k−1 everywhere in B). Assuming as we may that ∇gk (x0 ) exists, it follows that |∇gk (x0 )| = 0, and (15.36) is verified. Note that gk is integrable, even bounded, in B. For all x ∈ B, gk (x) = [gk (x) − (gk )B ] + (gk )B ≤ cn I1 (|∇gk |χB )(x) + (gk )B ≤ cn I1 (|∇f |χk−1 )(x) + (gk )B , where we have applied the subrepresentation formula to the locally Lipschitz function gk and used (15.36). Also, gk ≤ 2k−1 + | f − fB | in B, and hence (gk )B ≤ 2k−1 +

1 | f − fB | dy. |B| B

Therefore, if x ∈ B,

1 | f − fB | dy. gk (x) ≤ cn I1 |∇f |χk−1 (x) + 2k−1 + |B| B

Restricting x to k , where we have gk (x) = 2k , gives 2k ≤ cn I1 (|∇f |χk−1 )(x) + 2k−1 +

1 | f − fB | dy, |B| B

and by then subtracting 2k−1 from both sides, it follows that 2k ≤ cn I1 (|∇f |χk−1 )(x) +

2 | f − fB | dy, |B|

x ∈ k .

B

Combining this with the fact that | f − fB | ≤ 2k+1 on k , we obtain the desired estimate | f (x) − fB | ≤ cn I1 (|∇f |χk−1 )(x) +

4 | f − fB | dy, |B| B

We can now extend Theorems 15.23 and 15.29 to p = 1.

x ∈ k .

489

Weak Derivatives and Poincaré–Sobolev Estimates

Theorem 15.37 (i) Let B be a ball in Rn and f be a function with a weak gradient ∇f in B that satisfies |∇f | ∈ L1 (B). Then f ∈ Ln/(n−1) (B) and

(n−1)/n | f − fB |

n/(n−1)

dx

≤ cn

B

|∇f | dx.

(15.38)

B

(ii) Let f have a weak gradient ∇f in Rn satisfing |∇f | ∈ L1 (Rn ). If fB → 0 for some sequence of balls B Rn , then f ∈ Ln/(n−1) (Rn ) and f Ln/(n−1) (Rn ) ≤ cn ∇f L1 (Rn ) .

(15.39)

In particular, (15.39) holds if f ∈ W 1,1 (Rn ). Thus, it holds if f has compact support and a weak gradient in Rn satisfing |∇f | ∈ L1 (Rn ). The constants cn depend only on n. Once again, there are normalized versions: (15.38) can be rewritten as

(n−1)/n 1 r(B) n/(n−1) | f − fB | dx ≤ cn |∇f | dx, |B| |B| B

B

and if f has compact support in B, (15.39) is the same as

(n−1)/n 1 r(B) n/(n−1) |f| dx ≤ cn |∇f | dx. |B| |B| B

B

Proof. It is enough to prove part (i) since part (ii) follows from it by applying (15.38) to the sequence of balls in the hypothesis of part (ii) and using Fatou’s lemma. Fix a ball B and let f have weak gradient in B with |∇f | ∈ L1 (B). Then f ∈ L1 (B) by an argument like the one at the beginning of the proof of Theorem 15.23, except that the finiteness of I1 (|∇f |χB ) a.e. in B now follows from the weak type estimate in Theorem 14.37(b) since |∇f | ∈ L1 (B). Assuming also that f ∈ C∞ (B), or even also just that f ∈ Liploc (B), we can apply Theorem 15.34. Thus, let k = x ∈ B : 2k < | f (x) − fB | ≤ 2k+1 ,

k = 0, ±1, . . . .

490

Measure and Integral: An Introduction to Real Analysis

Recall from the discussion following the statement of Theorem 15.34, that is, from the discussion about the size of the second term on the right side of (15.35), that | f (x) − fB | ≤ cn I1 (|∇f |χk−1 )(x) + cn

r(B) |∇f | dy, |B|

x ∈ k .

(15.40)

B

Denote q = n/(n − 1). Since B is the disjoint union of the k , we have

| f − fB | q =

∞

| f − fB | q

k=−∞ k

B

≤

∞

2(k+1)q |k | =

k=−∞

+

k≤N

= S1 + S2 ,

k≥N+1

say, where N is chosen such that the second term on the right of (15.40) satisfies 2N−1 < cn

r(B) |∇f | ≤ 2N . |B| B

Then S1 =

2(k+1)q |k | ≤ 2(N+1)q |B|

k≤N

2q (N−1)q

=2 2

= cn

2q

|B| ≤ 2

q |B|

B

q |∇f |

r(B) |∇f | cn |B|

since q = n/(n − 1).

B

In order to estimate S2 , note that for all x ∈ k , the choice of N and (15.40) imply that 2k < | f (x) − fB | ≤ cn I1 (|∇f |χk−1 )(x) + 2N . If k ≥ N + 1, then by subtracting 2N from both sides and noting that 2k−1 ≤ 2k − 2N , we obtain I1 (|∇f |χk−1 )(x) >

2k−1 , cn

x ∈ k , k ≥ N + 1.

491

Weak Derivatives and Poincaré–Sobolev Estimates

Equivalently, if k ≥ N + 1, then k ⊂ x : I1 (|∇f |χk−1 )(x) > 2k−1 /cn . Therefore, by the weak type estimate in Theorem 14.37(b), ⎛ |k | ≤ cn ⎝

1 2k−1

⎞q

|∇f |⎠

if k ≥ N + 1.

k−1

Hence,

S2 =

2(k+1)q |k |

k≥N+1

⎞q ⎛ 2(k+1)q ⎝ ≤ cn |∇f |⎠ 2(k−1)q k≥N+1

= cn

≤ cn ⎝

⎞q

⎝

k≥N+1

⎛

k−1

⎛

|∇f |⎠

k−1

⎞q

|∇f |⎠ = cn

k k−1

q |∇f |

.

B

%

q % Here, we have used the estimate |ak |q ≤ |ak | to obtain the last line (see Exercise 31 of Chapter 8). Combining the estimates for S1 and S2 proves (15.38) in case f is smooth. Now consider a general f with a weak gradient satisfying |∇f | ∈ L1 (B). Let D be a ball with D ⊂ B, and choose an approximating sequence {fj } as in Theorem 15.8 with K = D there. By the case just proved, we have (with q = n/(n − 1))

1/q | fj − (fj )D |

q

≤ cn

D

|∇fj |

D

for every j. As j → ∞, fj → f a.e. in D, (fj )D → fD since fj → f in L1 (D) norm, and |∇fj | → |∇f | in L1 (D). Therefore,

D

1/q | f − fD |

q

≤ cn

D

|∇f |,

492

Measure and Integral: An Introduction to Real Analysis

where we used Fatou’s lemma for the left side of the inequality. Now enlarge the domain of integration on the right side to B and then let D increase to B. Since f ∈ L1 (B), then fD → fB , and another application of Fatou’s lemma gives (15.38). Finally, since (15.38) and the finiteness of fB imply that f ∈ Lq (B) by Minkowski’s inequality, the proof is complete. Part (ii) of Theorem 15.37 yields the next result about the space W 1,n (Rn ).

Corollary 15.41

Let f ∈ W 1,n (Rn ) and 1 < n ≤ r < ∞. Then f ∈ Lr (Rn ) and n

1− n

f Lr (Rn ) ≤ cn,r f Lrn (Rn ) ∇f Ln (Rr n ) ,

(15.42)

where cn,r is a constant depending only on n and r. In particular, f Lr (Rn ) ≤ cn,r f W1,n (Rn ) ,

n ≤ r < ∞.

(15.43)

Proof. First suppose f ∈ C10 (Rn ) and note that the function F = | f |δ−1 f is then also of class C10 (Rn ) if δ ≥ 1. Moreover, ∂F/∂xi = δ| f |δ−1 ∂f /∂xi for every i = 1, . . . , n. Denote n = n/(n − 1) and f Ln (Rn ) = f n . Applying (15.39) to F, we obtain f δδn ≤ cn δ | f |δ−1 |∇f |1 ≤ cn δ f δ−1 (δ−1)n ∇f n ,

δ ≥ 1, by Hölder’s inequality

(15.44)

with exponents n and n. We will successively choose δ = n, n + 1, n + 2, . . . in (15.44). With δ = n, since (n − 1)n = n, we have f nnn ≤ cn n f n−1 n ∇f n . Next, choosing δ = n + 1, n f n+1 (n+1)n ≤ cn (n + 1) f nn ∇f n 2 ≤ cn (n + 1)n f n−1 n ∇f n ,

where we have used the case δ = n to obtain the last inequality. Continuing inductively in this way gives n−1 k+1 f n+k (n+k)n ≤ cn (n + k) · · · (n + 1)n f n ∇f n ,

k = 0, 1, . . . .

493

Weak Derivatives and Poincaré–Sobolev Estimates

Raising both sides to the power 1/(n + k), we obtain n

1− nr

f r ≤ cn,r f nr ∇f n

,

r = (n + k)n , k = 0, 1, . . . .

Also, the estimate is trivially true when r = n. In order to obtain the same estimate for every r ∈ [n, ∞), we use Hölder’s inequality in the form (cf. Exercise 6 of Chapter 8) 1−θ f r ≤ f θ r1 f r2 ,

1 ≤ r1 ≤ r ≤ r2 ≤ ∞,

θ 1 1−θ = + . r r1 r2

In fact, by choosing r1 = n and r2 = (n + k)n for a fixed k = 0, 1, . . ., it follows that for any r ∈ [n, (n + k)n ], θ 1−θ 1 = + , r n (n + k)n n−1 k+1 1−θ n+k n+k c f ∇f ≤ f θ n,k n n n

1−θ f r ≤ f θ n f (n+k)n ,

θ+ n−1 n+k (1−θ)

= cn,r f n

k+1

∇f nn+k

(1−θ)

n

1− nr

= cn,r f nr ∇f n

.

This proves the result when f ∈ C10 (Rn ). For general f ∈ W 1,n (Rn ), the same estimate then holds by applying the second statement in Theorem 15.13 with p = n. Details are left to the reader. This completes the proof. Theorem 15.37(ii) has an elegant companion result relying heavily on the product structure of Rn . There is an equally elegant analogue of Theorem 15.29 for functions defined in all of Rn . The result corresponding to Theorem 15.37(ii) is given next; see Exercise 18 for the one corresponding to Theorem 15.29.

Theorem 15.45

If f ∈ W 1,1 (Rn ), then f ∈ Ln/(n−1) (Rn ) and f Ln/(n−1) (Rn )

n & ∂f ≤ ∂x i=1

1/n 1

.

(15.46)

i L (Rn )

Moreover, the conclusion holds if the hypothesis that f ∈ W 1,1 (Rn ) is replaced by assuming that f has a weak gradient in Rn that satisfies |∇f | ∈ L1 (Rn ) and either f ∈ Lr (Rn ) for some r ∈ [1, ∞) or lim|x|→∞ f (x) = 0. In the proof, we will use the next lemma in case n > 1.

494

Measure and Integral: An Introduction to Real Analysis

Lemma 15.47 Let n = 2, 3, . . . and g1 (x), . . . , gn (x) be n functions of x = (x1 , . . . , xn ) ∈ Rn such that gi (x) is independent of xi , i = 1, . . . , n, that is, gi (x) depends only on (x1 , . . . , xi−1 , xi+1 , . . . , xn ) ∈ Rn−1 . If each gi ∈ Ln−1 (Rn−1 ), then n & gi i=1

≤

n & gi

Ln−1 (Rn−1 )

.

i=1

L1 (Rn )

Proof. The proof will be by induction on n. We may assume that every gi is nonnegative. If n = 2, the result is true since

⎛ g1 (x2 )g2 (x1 ) dx1 dx2 = ⎝

R2

⎞⎛ g1 (x2 ) dx2 ⎠ ⎝

R1

⎞ g2 (x1 ) dx1 ⎠ .

R1

Suppose n ≥ 3 and let (n − 1) denote the conjugate index of n − 1. Write g1 g2 · · · gn = (g1 g2 · · · gn−1 ) gn and apply Hölder’s inequality with indices (n − 1) and n − 1, obtaining n ( '& gi dx1 · · · dxn−1 Rn−1

1

⎛ ≤⎝

' n−1 &

Rn−1

(n−1)

gi

(

⎞ dx1 · · · dxn−1 ⎠

1

1 (n−1)

⎛ ⎝

⎞ gn−1 dx1 · · · dxn−1 ⎠ n

1 n−1

.

Rn−1

(15.48) The second factor on the right side of (15.48) is gn Ln−1 (Rn−1 ) , which is independent of xn since gn is independent of xn . We estimate the first factor on the right side by applying the inductive assumption as follows: ' n−1 & Rn−1

≤

(n−1)

(

gi

dx1 · · · dxn−1

1 n−1 & 1

⎡ ⎣

Rn−2

⎤ (n−1) (n−2)

gi

dx1 · · · dxi−1 dxi+1 · · · dxn−1 ⎦

1 n−2

.

495

Weak Derivatives and Poincaré–Sobolev Estimates

Now raise both sides of this inequality to the power 1/(n − 1) , note that (n − 1) (n − 2) = n − 1, and combine the result with (15.48) to obtain n ( '& gi dx1 · · · dxn−1 Rn−1

⎛

⎜ ≤⎝

1 n−1 &

⎡ ⎣

1

⎤

1 n−1

gn−1 dx1 · · · dxi−1 dxi+1 · · · dxn−1 ⎦ i

⎞ ⎟ ⎠ gn Ln−1 (Rn−1 ) .

Rn−2

On the right side of this inequality, each of the first n−1 factors in the product is a function only of xn . Hence, by integrating both sides of the inequality with respect to xn and then using Hölder’s inequality with n−1 exponents all equal to n − 1 (see Exercise 6 of Chapter 8), we obtain n ( '& gi dx ≤ Rn

n−1 &

1

gi Ln−1 (Rn−1 ) gn Ln−1 (Rn−1 )

1

=

n &

gi Ln−1 (Rn−1 ) .

1

This completes the proof of Lemma 15.47. Proof of Theorem 15.45. First assume that f ∈ C1 (Rn ), n > 1, with |∇f | ∈ L1 (Rn ) and that lim|x|→∞ f (x) = 0. Then for any x = (x1 , . . . , xn ) and i = 1, . . . , n, we have f (x) =

xi ∂f (x1 , . . . , xi−1 , t, xi+1 , . . . , xn ) dt ∂xi −∞

and ∞ ∂f | f (x)| ≤ ∂x (x1 , . . . , xi−1 , t, xi+1 , . . . , xn ) dt := hi (x), i −∞ where hi is defined by the last equality. Note that hi (x) is independent of xi and belongs to L1 (Rn−1 ). Raising both sides of the inequality to the power 1/(n − 1), n > 1, and then taking the product over i of the result gives n

| f (x)| n−1 ≤

n & i=1

1

hi (x) n−1 ,

x ∈ Rn .

496

Measure and Integral: An Introduction to Real Analysis

1/(n−1)

Each hi

∈ Ln−1 (Rn−1 ). Therefore, by Lemma 15.47 with gi chosen to be

1/(n−1) , hi

Rn

| f (x)|

n n−1

n 1 & n−1 h dx ≤ i i=1

Ln−1 (Rn−1 )

n & ∂f = ∂x i=1

1 n−1 1

.

i L (Rn )

Raising both sides to the power (n − 1)/n proves (15.46) when n > 1 and f ∈ C1 (Rn ) with limit 0 at ∞ and |∇f | ∈ L1 (Rn ). The general case when n > 1 follows by using the approximation result in Theorem 15.14 combined with Fatou’s lemma; details as well as the proof in case n = 1 are left to the reader. This completes the proof.

Exercises 1. Construct an example of a function f that has weak first-order partial derivatives in Rn , n > 1, but whose ordinary first-order partial derivatives exist nowhere in Rn . 2. Let c be a real number different from 0. Show that there is no locally 1 integrable function g on (−1, 1) such that −1 gϕ dx = cϕ(0) for all ϕ ∈ C∞ 0 ((−1, 1)). 3. The fact that the Cantor–Lebesgue function f does not have a weak derivative on the interval (0, 1) is an immediate corollary of Theorem 15.6. Verify this fact instead directly from the definition of a weak derivative by choosing test functions that are adapted to the graph of f . 4. Consider the convolution (ϕ ∗ k)(x), x ∈ Rn , where ϕ ∈ Liploc (Rn ), k ∈ L1 (Rn ), and k has compact support. Show that for every i = 1, . . . , n, the ordinary derivative ∂(ϕ∗k)/∂xi exists and is finite everywhere in Rn , and ∂ϕ ∂ (ϕ ∗ k)(x) = (y)k(x − y) dy, ∂xi ∂yi n

x ∈ Rn .

R

Show that the same is true without the assumption that k has compact support if ϕ satisfies the additional conditions ϕ, ∂ϕ/∂yi ∈ L∞ (Rn ). 5. Verify Theorem 15.9. (Choose a sequence {K }∞ =1 of compact subsets of whose interiors increase to . For each , use Theorem 15.8 to pick a function f ∈ C∞ 0 () such that f − f L1 (K ) + ∇f − ∇f L1 (K ) < 1/.

497

Weak Derivatives and Poincaré–Sobolev Estimates

Show that for every compact K ⊂ , {f } and {∇f } converge in L1 (K) to f and ∇f , respectively. Then use a diagonal process to construct a subsequence {j } of positive integers such that {fj } and {∇fj } also converge pointwise a.e. in .) 6. Let be an open set in Rn and let f , g ∈ L1loc (). Show that for any i = 1, . . . , n, f has weak partial derivative ∂f /∂xi = g if and only if there are ∞ 1 functions {fj }∞ j=1 ⊂ C () such that fj → f in L (K) and ∂fj /∂xi → g in L1 (K) as j → ∞ for every compact set K ⊂ . 7. Prove that when 1 ≤ p ≤ ∞, W 1,p () is a Banach space with respect to the norm (15.11) and that it is separable when 1 ≤ p < ∞. Prove that W 1,2 () is a Hilbert space with respect to the inner product (15.12). (To show separability, first note that the repeated Cartesian product of Lp () with itself is separable when 1 ≤ p < ∞. Then apply the result in Exercise 23 of Chapter 8.) 8. Complete the proof of Theorem 15.13. Furthermore, given a number δ > 0, show that the approximating functions {fj } can be chosen to have supports in the δ-neighborhood {x : d(x, ) < δ} of , where d(x, ) denotes the distance from x to . In particular, in case is a ball B, if B∗ is a larger ball concentric with B, the fj can be chosen to have supports in B∗ . 1,p n n 9. Let 1 ≤ p ≤ ∞

and f ∈ W (R ). For h and x in R , define the translated function τh f (x) = f (x + h). Show that τh f − f Lp (Rn ) ≤ |h| ∇f Lp (Rn ) . n (Consider first the case when f ∈ C∞ 0 (R ) and derive the estimate τh f − f Lp ({|x| c satisfies |∇fc | ≤ |∇f | a.e. in . Show also that |∇(| f |)| ≤ |∇f | a.e. in . 18. Prove the following analogue of Theorem 15.45 when 1 < p < n and 1/q = 1/p − 1/n. If f ∈ W 1,p (Rn ), then f Lq (Rn )

n & ∂f ≤ cn,p ∂x i=1

1/n p

.

i L (Rn )

(In case f ∈ C10 (Rn ), apply (15.46) to the function F = | f |δ−1 f with δ = q(n − 1)/n. Note that 1 < δ < ∞, and apply Hölder’s inequality to the formula ∂F/∂xi L1 (Rn ) = δ | f |δ−1 ∂f /∂xi L1 (Rn ) with exponents p and p.) 19. (a) Show that there exists f ∈ C∞ (Rn ) with |∇f | ∈ Lp (Rn ) for every p, 1 ≤ p ≤ ∞, such that (15.39), (15.46), and the first inequality in the conclusion of Theorem 15.29 fail. (b) Suppose that 1 ≤ p < n, f ∈ W 1,p (Rn ), and ∂f /∂xi = 0 a.e. in Rn for some i, i = 1, . . . , n. Prove that f = 0 a.e. in Rn . (See Exercise 18 in case 1 < p < n.) 20. Let n = 1, (a, b) be an open interval in R1 , possibly of infinite length, and suppose that f has a weak derivative f in (a, b). Verify the following analogues of Theorems 15.30 and 15.32: (a) If f ∈ L1 (a, b), then f ∈ L∞ (a, b) and f − f (c)L∞ (a,b) ≤ f L1 (a,b)

for a.e. c ∈ (a, b).

500

Measure and Integral: An Introduction to Real Analysis

(b) If f ∈ Lp (a, b) for some p with 1 < p ≤ ∞, then | f (x) − f (y)| ≤ f Lp (a,b) |x − y|1/p

for a.e. x, y ∈ (a, b),

where 1/p + 1/p + 1. 21. Let f ∈ W 1,1 (Rn ). Prove that f Lr (Rn ) ≤ f W1,1 (Rn ) ,

1≤r≤

n . n−1

22. Verify the following remarkable fact. Let f and g be locally integrable functions on Rn that satisfy 1 1 | f − fB | dx ≤ Cr(B) |g| dx |B| |B| B

B

for all balls B ⊂ Rn , with C independent of B. If 1 < p < n and 1/q = 1/p − 1/n, then for every ball B,

1/q 1/p 1 p 1 q | f − fB | dx ≤ C r(B) |g| dx , |B| |B| B

B

where C depends only on n and C. (Recall Theorem 14.12.) What can be said if p ≥ n? 23. Let B be a ball in Rn with r(B) = 1 and n ≥ 2. Let k = 1, 2, . . . and f ∈ Ck+1 0 (B). If n = 2k or n = 2k + 1, show that f L∞ (B) ≤ cn ∇ k+1 f L2 (B) , where the right-hand side denotes the L2 (B) norm of the sum of the absolute value of every partial derivative of f of order k + 1, and cn is a constant independent of B and f . (One way to proceed is to first apply Corollary 14.6 to obtain f L∞ (B) ≤ c∇f Lr (B) if r > n. Then show that ∇f Lr (B) ≤ c∇ 2 f Lrn/(r+n) (B) , and continue considering higher order derivatives as necessary.)

Notations x+y x·y |x| x+ x− f (x+) f (x−) → m − → {xk } {x : . . .} x∈E x∈ /E E 1 ∪ E2 Ek k E1 ∩ E2 Ek k E1

− E2 E1 ⊂ E2 CE E¯ ˚ E diam E δ(E) δ(x) |E| |E|e [a, b] (a, b) (a, b] [a, b) Gδ Fσ Ø sup inf

addition dot product absolute value positive part negative part limit from the right limit from the left converges to converges in measure to increases to decreases to sequence set of x satisfying . . . x an element of E x not an element of E union intersection difference, relative complement subset complement closure interior diameter diameter distance function measure outermeasure closed interval open interval partly open intervals countable intersection of open sets countable union of closed sets empty set supremum, least upper bound infimum, greatest lower bound 501

502

lim sup lim inf ess sup ess inf ⎫ L, L1 ⎪ ⎪ ⎪ ⎪ Lp ⎪ ⎪ ⎬ Lip, Liploc ∞ L ⎪ ⎪ ⎪ ⎪ W 1,p ⎪ ⎪ ⎭ S p l R1 Rn C f E f dμ Eb a fdφ

·

· p

· ∗ , · ∗∗ f ∗g χE a.e. (S, , μ) o(φ(x)) O(φ(x)) ⎫ V(E) ⎬ ¯ V(E), V(E) ⎭ V[f ; a, b] S[f ] S[f ] usc lsc R S ω(f ; δ) ωp (f ; δ) ωf ,E (α) f* f

f Ff Iα f , Iα f

Notations

limit superior limit inferior essential supremum essential infimum

classes of functions

classes of sequences real number system n-dimensional Euclidean space complex number system Lebesgue integral in Rn Lebesgue integral in a measure space Riemann–Stieltjes integral norm Lp norm BMO constants convolution characteristic function almost everywhere measure space order of magnitude relations variations Fourier series conjugate Fourier series upper semicontinuous lower semicontinuous Riemann–Stieltjes sum sum of increments moduli of continuity distribution function Hardy–Littlewood maximal function conjugate function Fourier transform Fourier transform fractional integrals

503

Notations

Hf Hε f , Hε,ω f Hα (A) (A), ∗ (A) v(I) ∂f ∂xi

∇f

Hilbert transform truncated Hilbert transform Hausdorff measure Lebesgue–Stieltjes measure, outer measure volume partial derivative gradient

Mathematics

Published nearly forty years after the first edition, this long-awaited Second Edition also:

Measure and Integral An Introduction to Real Analysis Second Edition

Richard L. Wheeden Antoni Zygmund

Wheeden • Zygmund

This widely used and highly respected text for upper-division undergraduate and first-year graduate students of mathematics, statistics, probability, or engineering is revised for a new generation of students and instructors. The book also serves as a handy reference for professional mathematicians.

Second Edition

• Studies the Fourier transform of functions in the spaces L1, L2, and Lp, 1

Published nearly forty years after the first edition, this long-awaited Second Edition also:

Measure and Integral An Introduction to Real Analysis Second Edition

Richard L. Wheeden Antoni Zygmund

Wheeden • Zygmund

This widely used and highly respected text for upper-division undergraduate and first-year graduate students of mathematics, statistics, probability, or engineering is revised for a new generation of students and instructors. The book also serves as a handy reference for professional mathematicians.

Second Edition

• Studies the Fourier transform of functions in the spaces L1, L2, and Lp, 1 0, let x k = xk /|x|, y k = yk /|y|, x = (x 1 , . . . , x n ) = x/|x|, and y = (y 1 ,n. . . , y n ) = y/|y|. Then |x | = |y | = 1, so that by the case already proved, k = 1 xk yk ≤ 1; that is, n x y ≤ |x||y|, as claimed. k=1 k k If we now define the distance between two points x and y by d(x, y) = |x − y|, we immediately obtain the characteristic metric space properties: (i) d(x, y) = d(y, x) (ii) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y (iii) d(x, y) ≤ d(x, z) + d(z, y) We have used the symbol xk to denote the kth coordinate of x. When no confusion should arise, we will also use {xk } to denote a sequence of points of Rn . If x ∈ Rn , we say that a sequence {xk } converges to x, or that x is the limit point of {xk }, if |x − xk | → 0 as k → ∞. We denote this by writing either

4

Measure and Integral: An Introduction to Real Analysis

x = limk→∞ xk or xk → x as k → ∞. A point x ∈ Rn is called a limit point of a set E if it is the limit point of a sequence of distinct points of E. A point x ∈ E is called an isolated point of E if it is not the limit of any sequence in E (excluding the trivial sequence {xk } where xk = x for all k). It follows that x is isolated if and only if there is a δ > 0 such that |x – y| > δ for every y ∈ E, y = x. For sequences {xk } in R1 , we will write limk→∞ xk = + ∞, or xk → +∞ as k → ∞, if given M > 0 there is an integer K such that xk ≥ M whenever k ≥ K. A similar definition holds for limk→∞ xk =− ∞. A sequence {xk } in Rn is called a Cauchy sequence if given ε > 0 there is an integer K such that |xk − xj | < ε for all k, j ≥ K. We leave it as an exercise to prove that Rn is a complete metric space, that is, that every Cauchy sequence in Rn converges to a point of Rn . A set E ⊂ E1 is said to be dense in E1 if for every x1 ∈ E1 and ε > 0 there is a point x ∈ E such that 0 < |x − x1 | < ε. Thus, E is dense in E1 if every point of E1 is a limit point of E. If E = E1 , we say E is dense in itself. As an example, the set of points of Rn each of whose coordinates is a rational number is dense in Rn . Since this set is also countable, it follows that Rn is separable, by which we mean that Rn has a countable dense subset. For nonempty subsets E of R1 , we use the standard notations sup E and inf E for the supremum (least upper bound) and infimum (greatest lower bound) of E. In case sup E belongs to E, it will be called max E; similarly, inf E will be called min E if it belongs to E. 1 If {ak }∞ k = 1 is a sequence of points in R , let bj = supk≥j ak and cj = inf k≥j ak , j = 1, 2, . . . . Then −∞ ≤ cj ≤ bj ≤ +∞, and {bj } and {cj } are monotone decreasing and increasing, respectively; that is, bj ≥ bj+1 and cj ≤ cj+1 . Define lim supk→∞ ak and lim inf k→∞ ak by sup ak , j→∞ j → ∞ k≥j k→∞ lim inf ak = lim cj = lim inf ak .

lim sup ak = lim bj = lim k→∞

j→∞

j→∞

(1.3)

k≥j

We leave it as an exercise to show that −∞ ≤ lim inf k→∞ ak ≤ lim supk→∞ ak ≤ +∞ and that the following characterizations hold.

Theorem 1.4 (a) L = lim supk→∞ ak if and only if (i) there is that converges to L and (ii) if L > L, there ak < L for k ≥ K. (b) l = lim inf k→∞ ak if and only if (i) there is that converges to l and (ii) if l < l, there ak > l for k ≥ K.

a subsequence {akj } of {ak } is an integer K such that a subsequence {akj } of {ak } is an integer K such that

5

Preliminaries

Thus, when they are finite, lim supk→∞ ak and lim inf k→∞ ak are the largest and smallest limit points of {ak }, respectively. We leave it as a problem to show that {ak } converges to a, −∞ ≤ a ≤ + ∞, if and only if lim supk→∞ ak = lim inf k→∞ ak = a. We can also use the metric on Rn to define the diameter of a set E by letting δ(E) = diam E = sup x − y : x, y ∈ E . If the diameter of E is finite, E is said to be bounded. Equivalently, E is bounded if there is a finite constant M such that |x| ≤ M for all x ∈ E. If E1 and E2 are two sets, the distance between E1 and E2 is defined by d(E1 , E2 ) = inf{|x − y| : x ∈ E1 , y ∈ E2 }.

1.3 Open and Closed Sets in Rn , and Special Sets For x ∈ Rn and δ > 0, the set B(x; δ) = {y : |x − y| < δ} is called the open ball with center x and radius δ. A point x of a set E is called an interior point of E if there exists δ > 0 such that B(x; δ) ⊂ E. The collection of all interior points of E is called the interior of E and denoted E◦ . A set E is said to be open if E = E◦ ; that is, E is open if for each x ∈ E there exists δ > 0 such that B(x; δ) ⊂ E. The empty set ∅ is open by convention. The whole space Rn is clearly open, and we leave it as an exercise to prove that B(x; δ) is open. We will generally denote open sets by the letter G. A set E is called closed if CE is open. Note that ∅ and Rn are closed. Closed sets will generally be denoted by the letter F. The union of a set E and all its ¯ By the boundary of E, we limit points is called the closure of E and written E. ◦ ¯ mean the set E − E . We leave it to the reader to prove the following facts.

Theorem 1.5 (i) B (x; δ) = y : x − y ≤ δ . ¯ that is, E is closed if and only if it contains all (ii) E is closed if and only if E = E; its limit points. (iii) E¯ is closed, and E¯ is the smallest closed set containing E; that is, if F is closed and E ⊂ F, then E¯ ⊂ F. The open subsets of Rn satisfy the properties listed in the next theorem.

6

Measure and Integral: An Introduction to Real Analysis

Theorem 1.6 (i) The union of any number of open sets is open. (ii) The intersection of a finite number of open sets is open. Verification is left to the reader. Using the De Morgan laws, we obtain the following equivalent statements.

Theorem 1.7 (i) The intersection of any number of closed sets is closed. (ii) The union of a finite number of closed sets is closed. A subset E1 of E is said to be relatively open with respect to E if it can be written E1 = E ∩ G for some open set G. Similarly, E1 is relatively closed with respect to E if E1 = E ∩ F for some closed F. Note that the relative complement of a relatively open set is relatively closed. A useful alternate characterization of relatively closed is as follows. Theorem 1.8 A set E1 ⊂ E is relatively closed with respect to E if and only if E1 = E ∩ E¯ 1 , that is, if and only if every limit point of E1 that lies in E is in E1 . The proof is left as an exercise. Consider a collection {A} of sets A. Then a set is said to be of type Aδ if it can be written as a countable intersection of sets A and to be of type Aσ if it can be written as a countable union of sets A. Thus, “δ” stands for intersection and “σ” for union. The most common uses of this notation are Gδ and Fσ , where {G} denotes the open sets in Rn and {F} the closed sets. Hence, H is of type Gδ if H=

Gk , Gk open,

k

and H is of type Fσ if H=

Fk , Fk closed.

k

The complement of a Gδ set is an Fσ set, and vice versa. A Gδ (Fσ ) set is of course not generally open (closed); in fact, any closed (open) set in Rn is of

7

Preliminaries

type Gδ (Fσ ): see Exercise l(j). These two special types of sets will be very useful later in the measure approximation of general sets. Another special type of set that we will have occasion to use is a perfect set, by that we mean a closed set C each of whose points is a limit point of C. Thus, a perfect set is a closed set that is dense in itself. One particular property of perfect sets we will use is stated in the following theorem. The proof is postponed until Section 1.4.

Theorem 1.9

A perfect set is uncountable.

Other special sets that will be important are n-dimensional intervals. When n = 1 and a < b, we will use the usual notations [a, b] = {x : a ≤ x ≤ b}, (a, b) = {x : a < x < b}, [a, b) = {x : a ≤ x < b}, and (a, b] = {x : a < x ≤ b} for closed, open, and partly open intervals. Whenever we use just the word interval, we generally mean closed interval. An n-dimensional interval I is a subset of Rn of the form I = {x = (x1 , . . . , xn ) : ak ≤ xk ≤ bk , k = 1, . . . , n}, where ak < bk , k = 1, . . . , n. An interval is thus closed, and we say it has edges parallel to the coordinate axes. If the edge lengths bk − ak are all equal, I will be called an n-dimensional cube with edges parallel to the coordinate axes. Cubes will usually be denoted by the letter Q. Two intervals I1 and I2 are said to be nonoverlapping if their interiors are disjoint, that is, if the most they have in common is some part of their boundaries. A set equal to an interval minus some part of its boundary will be called a partly open interval. By definition, the volume v(I) of the interval I = {(x1 , . . . , xn ) : ak ≤ xk ≤ bk , k = 1, . . . , n} is v(I) =

n

(bk − ak ).

k=1

Somewhat more generally, if {ek }nk = 1 is any given set of n vectors emanating from a point in Rn , we will consider the closed parallelepiped P = {x : x =

n

tk ek , 0 ≤ tk ≤ 1}.

k=1

Note that the edges of P are parallel translates of the ek . Thus, P is an interval if the ek are parallel to the coordinate axes. The volume v(P) of P is by definition the absolute value of the n × n determinant having e1 , . . . , en as rows.∗ In case P is an interval, this definition agrees with the one given earlier. A linear transformation T of Rn transforms a parallelpiped P into a parallelpiped ∗ See, for example, G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra, 3rd ed., Macmillan,

New York, 1965, Theorem 8, p. 290.

8

Measure and Integral: An Introduction to Real Analysis

P with volume v(P ) = |det T|v(P).∗ In particular, a rotation of axes in Rn (which is an orthogonal linear transformation) does not change the volume of a parallelepiped. We will assume basic facts about volume: for example, if N N is finite and P is a parallelepiped with P ⊂ N 1 Ik , then v (P) ≤ 1 v(Ik ), and if {Ik }N are nonoverlapping intervals contained in a parallelepiped P, then N 1 v(I ) ≤ v(P). k 1 We shall use the notion of interval to obtain a basic decomposition of open sets in Rn . We consider first the case n = 1, which is somewhat simpler than n > 1.

Theorem 1.10 open intervals.

Every open set in R1 can be written as a countable union of disjoint

Proof. Let G be an open set in R1 . For x ∈ G, let Ix denote the maximal open interval containing x which is in G; that is, Ix is the union of all open intervals that contain x and that lie in G. If x, x ∈ G and x = x , then Ix and Ix must either be disjoint or identical, since if they intersect, their union is an open interval containing x and x . Clearly, G = x∈G Ix . Since each Ix contains a rational number, the number of distinct Ix must be countable, and the theorem follows. The construction used in this proof fails in Rn if n > 1, since the union of (overlapping) intervals is not generally an interval. The theorem itself fails when n > 1, as is easily seen by considering any open ball. As a substitute, we have the following useful result. Theorem 1.11 Every open set in Rn , n ≥ 1, can be written as a countable union of nonoverlapping (closed) cubes. It can also be written as a countable union of disjoint partly open cubes. Proof. Consider the lattice of points of Rn with integral coordinates and the corresponding net K0 of cubes with edge length 1 and vertices at these lattice points. Bisecting each edge of a cube in K0 , we obtain from it 2n subcubes of edge length 12 . The total collection of these subcubes for every cube in K0 forms a net K1 of cubes. If we continue bisecting, we obtain finer and finer nets Kj of cubes such that each cube in Kj has edge length 2−j and is the union of 2n nonoverlapping cubes in Kj+1 . Now let G be any open set in Rn . Let S0 be the collection of all cubes in K0 that lie entirely in G. Let S1 be those cubes in K1 that lie in G but that are not ∗ See, for example, G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra, 3rd ed., Macmillan,

New York, 1965, Theorem 9, p. 290.

Preliminaries

9

subcubes of any cube in S0 . More generally, for j ≥ 1, let Sj be the cubes in Kj that lie in G but that are not subcubes of any cube in S0 , . . . , Sj−1 . If S denotes the total collection of cubes from all the Sj , then S is countable since each Kj is countable, and the cubes in S are nonoverlapping by construction. Moreover, ∞, each since G is open and the cubes in Kj become arbitrarily small as j → point of G will eventually be caught in a cube in some Sj . Hence, G = Q∈S Q, which proves the first statement. The proof of the second statement is left to the reader. The collection {Q: Q ∈ Kj , j = 1, 2, . . .} constructed above is called a family of dyadic cubes. In general, by dyadic cubes, we mean the family of cubes obtained from repeated bisection of any initial net of cubes in Rn . Note that the family of dyadic cubes used in the proof of Theorem 1.11 could be replaced by one in which the initial net consists of cubes of any fixed edge length. It follows from Theorem 1.10 that any closed set in R1 can be constructed by deleting a countable number of open disjoint intervals from R1 . A perfect set results by removing the intervals in such a way as to create no isolated points; thus, we would not remove any two open intervals with a common endpoint.

1.4 Compact Sets and the Heine–Borel Theorem

By a cover of a set E, we mean a family F of sets A such that E ⊂ A∈F A. A subcover F1 of a cover F is a cover with the property that A1 ∈ F whenever A1 ∈ F1 . A cover F is called an open cover if each set in F is open. We say E is compact if every open cover of E has a finite subcover. Two equivalent statements, whose proofs are left as exercises, are as follows.

Theorem 1.12 (i) (The Heine–Borel theorem) A set E ⊂ Rn is compact if and only if it is closed and bounded. (ii) A set E ⊂ Rn is compact if and only if every sequence of points of E has a subsequence that converges to a point of E. We leave it as an exercise to show that the distance between two nonempty, compact, disjoint sets is positive and that the intersection of a countable sequence of decreasing, nonempty, compact sets is nonempty. Thus, a nested sequence of closed intervals has a nonempty intersection. See also Exercise 12. With these facts, we can now prove Theorem 1.9.

10

Measure and Integral: An Introduction to Real Analysis

Proof of Theorem 1.9. Let C be a perfect set in Rn , and suppose that C is countable: C = {ck }∞ k = 1 . Let Ck = C − {ck }, k ≥ 1. Given x1 ∈ C1 , let Q1 be a (closed) / Q1 . Then Q1 ∩ C is compact (closed and cube with center x1 such that c1 ∈ bounded) and not empty. Since x1 ∈ C and C is perfect, x1 is a limit point of C and so also of C2 . It follows that C2 ∩ Q◦1 is not empty. Let x2 ∈ C2 ∩ Q◦1 and choose a cube Q2 with center x2 such that Q2 ⊂ Q1 and c2 ∈ / Q2 . Then Q2 ∩ C is a compact, nonempty subset of Q1 ∩ C. Continuing in this way, we obtain a decreasing sequence Qk ∩ C of compact, nonempty sets such that / Qk . It follows that k (Qk ∩ C) is a nonempty subset of C that contains no ck ∈ ck . This contradiction proves that C must be uncountable and establishes the theorem.

1.5 Functions By a function f = f (x) defined for x in a set E ⊂ Rn , we will always mean a real-valued function, unless explicitly stated otherwise. By real-valued, we generally mean extended real-valued, that is, f may take the values ±∞; if | f (x)| < + ∞ for all x ∈ E, we say f is finite (or finite-valued) on E. A finite function f is said to be bounded on E if there is a finite number M such that | f (x)| ≤ M for x ∈ E; that is, f is bounded on E if supx∈E |f (x)| is finite. A sequence {fk } of functions is said to be uniformly bounded on E if there is a finite M such that |fk (x)| ≤ M for x ∈ E and all k. By the support of f , we mean the closure of the set where f is not zero. Thus, the support of a function is always closed. It follows that a function defined in Rn has compact support if and only if it vanishes outside some bounded set. A function f defined on an interval I in R1 is called monotone increasing (decreasing) if f (x) ≤ f (y) [f (x) ≥ f (y)] whenever x < y and x, y ∈ I. By strictly monotone increasing (decreasing), we mean that f (x) < f (y) [f (x) > f (y)] if x < y and x, y ∈ I. Let f be defined on E ⊂ Rn and let x0 be a limit point of E. Let B (x0 ; δ) = B(x0 ; δ) − {x0 } denote the punctured ball with center x0 and radius δ, and let M(x0 ; δ) =

sup

x∈B (x0 ;δ)∩E

f (x),

m(x0 ; δ) =

inf

x∈B (x0 ;δ)∩E

f (x).

As δ 0, M(x0 ; δ) decreases and m(x0 ; δ) increases, and we define lim sup f (x) = lim M(x0 ; δ),

x→x0 ;x∈E

δ→0

x→x0 ;x∈E

δ→0

lim inf f (x) = lim m(x0 ; δ).

(1.13)

11

Preliminaries

We leave it as an exercise to show that the following characterizations are valid.

Theorem 1.14 (a) M = lim supx→x0 ;x∈E f (x) if and only if (i) there exists {xk } in E−{x0 } such that xk → x0 and f (xk ) → M and (ii) if M > M, there exists δ > 0 such that f (x) < M for x ∈ B (x0 ; δ) ∩ E. (b) m = lim inf x→x0 ;x∈E f (x) if and only if (i) there exists {xk } in E−{x0 } such that xk → x0 and f (xk ) → m and (ii) if m < m, there exists δ > 0 such that f (x) > m for x ∈ B (x0 ; δ) ∩ E. We also define lim sup|x|→∞;x∈E f (x) and lim inf |x|→∞;x∈E f (x). For example, M = lim sup|x|→∞;x∈E f (x) means (i) there exist {xk } in E such that |xk | → ∞ and f (xk ) → M and (ii) if M > M, there exists N such that f (x) < M if |x| > N and x ∈ E. These notions should not be confused with lim supk→∞ fk (x) and lim inf k→∞ fk (x), which denote the lim sup and lim inf of the sequence {fk (x)}.

1.6 Continuous Functions and Transformations A function f defined in a neighborhood of x0 is said to be continuous at x0 if f (x0 ) is finite and limx→x0 f (x) = f (x0 ). If f is not continuous at x0 , it follows that unless f (x0 ) is infinite, either limx→x0 f (x) does not exist or is different from f (x0 ). For functions on R1 , we will use the notation f (x0 +) =

lim

x→x0 ;x>x0

f (x)

and

f (x0 −) =

lim

x→x0 ;x<x0

f (x)

for the right- and left-hand limits of f at x0 , when they exist. If f (x0 +), f (x0 −), and f (x0 ) exist and are finite, but f is not continuous at x0 , then either f (x0 +) = f (x0 −) or f (x0 +) = f (x0 −) = f (x0 ). In the first case, x0 is called a jump discontinuity of f and in the second, a removable discontinuity of f (since by changing the value of f at x0 , we can make it continuous there). Such discontinuities are said to be of the first kind, as distinguished from those of the second kind, for which either f (x0 +) or f (x0 −) does not exist or for which f (x0 +), f (x0 −) or f (x0 ) is infinite. If f is defined only in a set E containing x0 , E ⊂ Rn , then f is said to be continuous at x0 relative to E if f (x0 ) is finite and either x0 is an isolated point of E or x0 is a limit point of E and limx→x0 ;x∈E f (x) = f (x0 ). If E1 ⊂ E, a function

12

Measure and Integral: An Introduction to Real Analysis

is said to be continuous in E1 relative to E if it is continuous relative to E at every point of E1 . The proofs of the following basic facts are left as exercises. Theorem 1.15 Let E be a compact set in Rn and f be continuous in E relative to E. Then the following are true: (i) f is bounded on E; that is, supx∈E |f (x)| < ∞. (ii) f attains its supremum and infimum on E; that is, there exist x1 , x2 ∈ E such that f (x1 ) = supx∈E f (x), f (x2 ) = inf x∈E f (x). (iii) f is uniformly continuous on E relative to E; that is, given ε > 0, there exists δ > 0 such that |f (x) − f (y)| < ε if |x − y| < δ and x, y ∈ E. A sequence of functions {fk } defined on E is said to converge uniformly on E to a finite f if given ε > 0, there exists K such that |fk (x) − f (x)| < ε for k ≥ K and x ∈ E. We will use the following fact, whose proof is again left to the reader. Theorem 1.16 Let {fk } be a sequence of functions defined on E that are continuous in E relative to E and that converge uniformly on E to a finite f. Then f is continuous in E relative to E. A transformation T of a set E ⊂ Rn into Rn is a mapping y = Tx that carries points x ∈ E into points y ∈ Rn . If y = (y1 , . . . , yn ), then T can be identified with the collection of coordinate functions yk = fk (x), k = 1, . . . , n, which are induced by T. The image of E under T is the set {y: y = Tx for some x ∈ E}. T is continuous at x0 ∈ E relative to E (by which we mean limx→x0 ;x∈E Tx = Tx0 ) if and only if each fk is continuous at x0 relative to E. We will use the following result in Chapter 3. Theorem 1.17 Let y = Tx be a transformation of Rn that is continuous in E relative to E. If E is compact, then so is its image TE.

1.7 The Riemann Integral We shall see that the Lebesgue integral is more general than the Riemann integral, in the sense that whenever the Riemann integral of a function exists, then so does its Lebesgue integral, and the two are equal (Theorem 5.52).

13

Preliminaries

The Riemann integral is nonetheless useful, its significance being simplicity and computability. If f is defined and bounded on an interval I = {x : x = (x1 , . . . , xn ), ak ≤ xk ≤ bk , k = 1, . . . , n} in Rn , its Riemann integral will be denoted by b1 (R)

bn ...

a1

f (x1 , . . . , xn ) dx1 · · · dxn or (R)

an

f (x) dx

(1.18)

I

and is defined as follows. Partition I into a finite collection of nonoverlapping intervals, = {Ik }N k = 1 , and define the norm || of by || = maxk (diam Ik ). Select a point ξk in Ik for k ≥ 1, and let R = R (ξ1 , . . . , ξN ) =

N

f (ξk )v(Ik ),

k=1

N [sup f (x)]v(Ik ), U =

L =

k=1 x∈Ik

N

(1.19) [ inf f (x)]v(Ik ).

k=1

x∈Ik

We then define the Riemann integral by saying that A = (R) I f (x) dx if lim||→0 R exists and equals A; that is, if given ε > 0, there exists δ > 0 such that |A − R | < ε for any and any chosen {ξk }, provided only that || < δ. This definition is actually equivalent to the statement that inf U = sup L = A.

(1.20)

The integral of course exists if f is continuous on I. Proofs of these facts are left as exercises; the treatment given in Chapter 2 for the Riemann–Stieltjes integrals should serve as a review for many facts about Riemann integrals. See also Theorem 5.54.

Exercises 1. Prove the following facts, which were left earlier as exercises. (a) For a sequence of sets {Ek }, lim sup Ek consists of those points that belong to infinitely many Ek , and lim inf Ek consists of those points that belong to all Ek from some k on. (b) The De Morgan laws. (c) Every Cauchy sequence in Rn converges to a point of Rn . (This can be deduced from its analogue in R1 by noting that the entries in a

14

Measure and Integral: An Introduction to Real Analysis

given coordinate position of the points in a Cauchy sequence in Rn form a Cauchy sequence in R1 .) (d) Theorem 1.4. (e) A sequence {ak } in R1 converges to a, −∞ ≤ a ≤ +∞, if and only if lim supk→∞ ak = lim inf k→∞ ak = a. (f) B(x; δ) is open. (g) Theorem 1.5. (h) Theorems 1.6 and 1.7. (i) Theorem 1.8. (j) Any closed (open) set in Rn is of type Gδ (Fσ ). (If F is closed, consider the sets {x : dist(x, F) < (1/k)}, k = 1, 2, . . . .) (k) Theorem 1.12. (l) The distance between two nonempty, compact, disjoint sets in Rn is positive. See also Exercise 12. (m) The intersection of a countable sequence of decreasing, nonempty, compact sets is nonempty. (n) Theorem 1.14. (o) Theorem 1.15. (p) Theorem 1.16. (q) Theorem 1.17. (r) The Riemann integral A = (R) I f (x) dx of a bounded f over an interval I exists if and only if inf U = sup L = A. (s) If f is continuous on an interval I, then (R) I f (x) dx exists. 2. Find lim sup Ek and lim inf Ek if Ek = [−(1/k), 1] for k odd and Ek = [−1, (1/k)] for k even. 3. (a) Show that C(lim sup Ek ) = lim inf CEk . (b) Show that if Ek E or Ek E, then lim sup Ek = lim inf Ek = E. 4. (a) Show that lim supk→∞ (−ak ) = − lim inf k→∞ ak . (b) Show that lim supk→∞ (ak + bk ) ≤ lim supk→∞ ak + lim supk→∞ bk , provided that the expression on the right does not have the form ∞ + (−∞) or −∞ + ∞. (c) If {ak } and {bk } are nonnegative, bounded sequences, show that lim supk→∞ (ak bk ) ≤ (lim supk→∞ ak )(lim supk→∞ bk ). (d) Give examples for which the inequalities in parts (b) and (c) are not equalities. Show that if either {ak } or {bk } converges, equality holds in (b) and (c). 5. Find analogues of the statements in Exercise 4 for lim supx→x0 ;x∈E f (x). 6. Compare lim supk→∞ ak and lim sup(−∞, ak ).

15

Preliminaries

7. Show that E◦1 ∩ E◦2 = (E1 ∩ E2 )◦ and E◦1 ∪ E◦2 ⊂ (E1 ∪ E2 )◦ . Give an example when E◦1 ∪ E◦2 = (E1 ∪ E2 )◦ . 8. Let E be a set in Rn that is relatively open with respect to an interval I. Show that E can be written as a countable union of nonoverlapping intervals. 9. Prove that any closed subset of a compact set is compact. 10. Let {xk } be a bounded infinite sequence in Rn . Show that {xk } has a limit point. (This is the Bolzano–Weierstrass theorem in Rn .) 11. Give an example of a decreasing sequence of nonempty closed sets in Rn whose intersection is empty. 12. (a) Give an example of two disjoint, nonempty, closed sets E1 and E2 in Rn for which d(E1 , E2 ) = 0. (b) Let E1 , E2 be nonempty sets in Rn with E1 closed and E2 compact. Show that there are points x1 ∈ E1 and x2 ∈ E2 such that d(E1 , E2 ) = |x1 − x2 |. Deduce that d(E1 , E2 ) is positive if such E1 , E2 are disjoint. 13. If f is defined and uniformly continuous on E, show there is a function f¯ defined and continuous on E¯ such that f¯ = f on E. 14. If f is defined and uniformly continuous on a bounded set E, show that f is bounded on E. 15. Show that a bounded f is Riemann integrable on I if and only if given ε > 0, there is a partition of I such that 0 ≤ U − L < ε. (Exercise 1(r) may be helpful.) 16. If {fk } is a sequence of bounded, Riemann integrable functions on an interval I that converges uniformly on I to f , show that f is Riemann integrable on I and that (R) fk (x) dx → (R) f (x) dx. I

I

17. Let f be a finite function on Rn and define ω(δ) = sup {f (x) − f (y) : x − y ≤ δ}, δ > 0, to be the modulus of continuity of f . Show that ω(δ) decreases as δ decreases to 0 and that f is uniformly continuous if and only if ω(δ) → 0 as δ → 0. 18. Let F be a closed subset of (−∞, +∞), and let f be continuous relative to F. Show that there is a continuous function g on (−∞, +∞) which equals f in F. If | f (x)| ≤ M for x ∈ F, show that g can be chosen so that |g(x)| ≤ M for −∞ < x < +∞. (This is the Tietze extension theorem for the real line.)

16

Measure and Integral: An Introduction to Real Analysis

19. Prove the following special case of the Baire category theorem: the intersection of a countable number of open dense sets in R1 is dense in R1 . 20. Show that the irrational numbers form a set of type Gδ , but that the rational numbers do not. (For the second part, it is possible to argue by contradiction, using the first part and the result in Exercise 19.) 21. Construct a set in R1 that is neither of type Gδ nor of type Fσ . (Consider the union of the negative rationals and the positive irrationals, and use facts from Exercise 20.) 22. For an integer k = 1, . . . , n and a real number α, consider the hyperplane H = {x = (x1 , . . . , xn ) : xk = α}. Show that for every ε > 0, there is a collecof cubes in Rn with edges parallel to the coordinate axes such tion {Qj }∞ j= 1 that H ⊂ Qj and v(Qj ) < ε. (Using the terminology of Chapter 3, it follows that H has outermeasure 0 in Rn .)

2 Functions of Bounded Variation and the Riemann–Stieltjes Integral In the chapters ahead, we will study the Lebesgue integral. In this chapter, we introduce the Riemann–Stieltjes integral and, as a natural preliminary step, study functions of bounded variation. The justification for doing so is that Lebesgue integration is intimately connected with Riemann–Stieltjes integration, although this is not apparent from the definitions. We shall see in Theorem 5.43 that Lebesgue integrals can be represented as Riemann–Stieltjes integrals.

2.1 Functions of Bounded Variation Let f (x) be a real-valued function that is defined and finite for all x in a closed bounded interval a ≤ x ≤ b. Let = {x0 , x1 , . . . , xm } be a partition of [a, b]; that is, is a collection of points xi , i = 0, 1, . . . , m, satisfying x0 = a, xm = b, and xi−1 < xi for i = 1, . . . , m. With each partition , we associate the sum S = S [ f ; a, b] =

m

| f (xi ) − f (xi−1 )|.

i=1

The variation of f over [a, b] is defined as V = V [ f ; a, b] = sup S ,

where the supremum is taken over all partitions of [a, b]. The variation V[ f ; a, b] will sometimes also be denoted by V[a, b] or V(f ). Since 0 ≤ S < + ∞, we have 0 ≤ V ≤ +∞. If V < +∞, f is said to be of bounded variation on [a, b]; if V = +∞, f is of unbounded variation on [a, b].

17

18

Measure and Integral: An Introduction to Real Analysis

We list several simple examples. Example 1 Suppose f is monotone in [a, b]. Then, clearly, each S equals | f (b) − f (a)|, and therefore V = | f (b) − f (a)|. Example 2 Suppose the graph of f can be split into a finite number k of monotone arcs; that is, suppose [a, b] = i=1 [ai , ai+1 ] and f is monok tone in each [ai , ai+1 ]. Then V = i=1 | f (ai+1 ) − f (ai )|. To see this, we use the result of Example 1 and the fact, to be proved in Theorem 2.2, that V = V[a, b] = ki=1 V[ai , ai+1 ]. Example 3 Let f be defined by f (x) = 0 when x = 0 and f (0) = 1, and let [a, b] be any interval containing 0 in its interior. Then S is either 2 or 0, depending on whether or not x = 0 is a partitioning point of . Thus, V[a, b] = 2. If = {x0 , x1 , . . . , xm } is a partition of [a, b], let ||, called the norm of , be defined as the length of a longest subinterval of : || = max xi − xi−1 . i

If f is continuous on [a, b] and {j } is a sequence of partitions of [a, b] with |j | → 0, we shall see in Theorem 2.9 that V = limj→∞ Sj . Example 3 shows that this equality may fail for functions that are discontinuous even at a single point: if we take f and [a, b] as in Example 3 and choose the j such that x = 0 is never a partitioning point, then lim Sj = 0, while if we choose the j such that x = 0 alternately is and is not a partitioning point, then lim Sj does not exist. See also Exercise 20. Example 4 Let f be the Dirichlet function, defined by f (x) = 1 for rational x and f (x) = 0 for irrational x. Then, clearly, V[a, b] = +∞ for any interval [a, b]. Example 5 A function that is continuous on an interval is not necessarily of bounded variation on the interval. To see this, let {aj } and {dj }, j = 1, 2, . . . , be two monotonedecreasing sequences in (0, 1] with a1 = 1, limj→∞ aj = limj→∞ dj = 0 and dj = +∞. Construct a continuous f as follows. On each subinterval [aj+1 , aj ], the graph of f consists of the sides of the isosceles tri angle with base aj+1 , aj and height dj . Thus, f (aj ) = 0, and if mj denotes the midpoint of aj+1 , aj , then f (mj ) = dj . If we further define f (0) = 0, then f is continuous on [0,1]. Taking k to be the partition defined by the points 0, k+1 k aj j=1 , and mj j=1 , we see that Sk = 2 kj=1 dj . Hence, V[ f ; 0, 1] = +∞. See also Exercise 1.

Functions of Bounded Variation and the Riemann–Stieltjes Integral

19

We mention here that there exist functions that are continuous on an interval but that are not of bounded variation on any subinterval. See Exercise 26 of Chapter 3. Example 6 A function f defined on [a, b] is said to satisfy a Lipschitz condition on [a, b], or to be a Lipschitz function on [a, b], if there is a constant C such that | f (x) − f (y)| ≤ C|x − y| for all x, y ∈ [a, b]. Such a function is clearly of bounded variation, with V[ f ; a, b] ≤ C(b − a). For example, if f has a continuous derivative on [a, b], or even just a bounded derivative, then (by the mean-value theorem) f satisfies a Lipschitz condition on [a, b]. For more examples of functions of bounded variation, see the exercises at the end of the chapter. In the next two theorems, we summarize some of the simplest properties of functions of bounded variation. The proof of the first theorem is left as an exercise. Theorem 2.1 (i) If f is of bounded variation on [a, b], then f is bounded on [a, b]. (ii) Let f and g be of bounded variation on [a, b]. Then cf (for any real constant c), f + g, and fg are of bounded variation on [a, b]. Moreover, f/g is of bounded variation on [a, b] if there exists an ε > 0 such that |g(x)| ≥ ε for x ε [a, b]. Before stating the second result, we note that if ¯ is a refinement of , that is, ¯ if contains all the partitioning points of plus some additional points, then S ≤ S¯ . This follows from the triangle inequality and is most easily seen in the case when ¯ consists of all the points of plus one additional point. The case of general ¯ can be reduced to this simple case by adding one point at a time to . Theorem 2.2 (i) If [a , b ] is a subinterval of [a, b], then V[a , b ] ≤ V[a, b]; that is, variation increases with interval. (ii) If a < c < b, then V[a, b] = V[a, c] + V[c, b]; that is, variation is additive on adjacent intervals.

20

Measure and Integral: An Introduction to Real Analysis

Proof. (i) This follows easily from (ii), as the reader can check. A simple direct proof based on adjoining the points a, b to generic partitions of [a , b ] can also be given. (ii) Let I = [a, b], I1 = [a, c], I2 = [c, b], V = V[a, b], V1 = V[a, c], and V2 = V[c, b]. If 1 and 2 are any partitions of I1 and I2 , respectively, then = 1 ∪2 is one of I, and S [I] = S1 [I1 ]+S2 [I2 ]. Thus, S1 [I1 ]+S2 [I2 ] ≤ V. Therefore, taking the supremum over 1 and 2 separately, we obtain V1 + V2 ≤ V. To show the opposite inequality, let be any partition of I, and let ¯ be with c adjoined. Then S [I] ≤ S¯ [I], and ¯ splits into partitions 1 of I1 and 2 of I2 . Thus, we have S [I] ≤ S¯ [I] = S1 [I1 ] + S2 [I2 ] ≤ V1 + V2 . Therefore, V ≤ V1 + V2 , which completes the proof of (ii). For any real number x, define

x x+ = 0

if x > 0 if x ≤ 0,

x− =

0 −x

if x > 0 if x ≤ 0.

These are called the positive and negative parts of x, respectively, and satisfy the relations x+ , x− ≥ 0;

|x| = x+ + x− ;

x = x+ − x− .

(2.3)

Given a finite function f on [a, b] and a partition = {xi }m i=0 of [a, b], define

P = P [ f ; a, b] =

m [ f (xi ) − f (xi−1 )]+ , i=1

m N = N [ f ; a, b] = [ f (xi ) − f (xi−1 )]− . i=1

Thus, P is the sum of the positive terms of S , and −N is the sum of the negative terms of S . In particular, by (2.3), P ≥ 0, N ≥ 0, P + N = S ,

(2.4)

P − N = f (b) − f (a).

(2.5)

Functions of Bounded Variation and the Riemann–Stieltjes Integral

21

The positive variation P and the negative variation N of f are defined by P = P[ f ; a, b] = sup P ,

N = N[ f ; a, b] = sup N .

Thus, 0 ≤ P, N ≤ +∞.

If any one of P, N, or V is finite, then all three are finite. Moreover,

Theorem 2.6 we then have

P + N = V,

P − N = f (b) − f (a),

or equivalently P=

1 [V + f (b) − f (a)] , 2

N=

1 [V − f (b) + f (a)] . 2

Proof. By (2.4), P +N ≤ V, and therefore, since P and N are nonnegative, P ≤ V and N ≤ V. In particular, P and N are finite if V is. By (2.4) again, S ≤ P + N and therefore V ≤ P + N. If either P or N is finite, so is the other by (2.5), and therefore so is V. This gives the first part of the theorem. Now choose a sequence of partitions k so that Pk → P. Let us show that Nk → N and P − [ f (b) − f (a)] = N. By (2.5), Nk = Pk − [ f (b) − f (a)] → P − [ f (b) − f (a)], and since Nk ≤ N, it follows that P − [ f (b) − f (a)] ≤ N. If P − [ f (b) − f (a)] < N, there is, by definition of N, a partition with N > P−[ f (b)−f (a)]. Then P = N + [ f (b)−f (a)] > P, which is impossible. Hence, P − [ f (b) − f (a)] = N and Nk → N. If N is finite, it follows that P − N = f (b) − f (a). Letting k → ∞ in the inequality Pk + Nk ≤ V gives P + N ≤ V. Since V ≤ P + N was shown earlier, we have V = P + N, and the theorem follows. Corollary 2.7 (Jordan’s Theorem) A function f is of bounded variation on [a, b] if and only if it can be written as the difference of two bounded increasing functions on [a, b]. Proof. Suppose f = f1 − f2 , where f1 and f2 are bounded and increasing on [a, b]. Then f1 and f2 are of bounded variation on [a, b], and therefore, by Theorem 2.1(ii), so is f . Conversely, suppose f is of bounded variation on [a, b]. By Theorem 2.2(i), f is of bounded variation on every interval [a, x], a ≤ x ≤ b. Let P(x) and N(x) denote the positive and negative variations of f on [a, x], respectively.

22

Measure and Integral: An Introduction to Real Analysis

By the analogue of Theorem 2.2(i) for P and N (see Exercise 3), it follows that P(x) and N(x) are bounded and increasing on [a, b]. Moreover, by Theorem 2.6 applied to [a, x], f (x) = [P(x) + f (a)] − N(x) when a ≤ x ≤ b. Since P(x) is bounded and increasing, so is P(x) + f (a), and the corollary follows. Note that since the negative of an increasing function is decreasing, Corollary 2.7 may be rephrased to say that f is of bounded variation if and only if it is the sum of a bounded increasing function and a bounded decreasing function. We remark here that there exist continuous functions of bounded variation that are not monotone in any subinterval. See Exercise 27 of Chapter 3. In the next theorem, we consider a continuity property of functions of bounded variation. We recall from Chapter 1 that a discontinuity is said to be of the first kind if it is either a jump or a removable discontinuity.

Theorem 2.8 Every function of bounded variation has at most a countable number of discontinuities, and they are all of the first kind. Here, if f is of bounded variation on [a, b], we can clarify what it means to say that f has a discontinuity of the first kind at the endpoints a, b by extending the definition of f outside [a, b] by setting f (x) = f (a) if x < a and f (x) = f (b) if x > b and then using the usual notion. Proof. Let f be of bounded variation on [a, b]. Suppose first that f is bounded and increasing on [a, b]. Then the only discontinuities of f are of the first kind; in fact, they are all jumpdiscontinuities. If D denotes the set of all discontinuities of f , then D = ∞ k=1 {x : f (x+) − f (x−) ≥ 1/k}. Since f is bounded, each set on the right is finite (or empty); therefore, D is countable. The general case follows from this by using Corollary 2.7. Note that removable discontinuities may arise by subtracting monotone functions; for example, consider the function f in Example 3 and the corresponding monotone functions P(x) and N(x). See also Exercise 25. We now discuss a property of the variation of a continuous function. See also Exercise 20. Theorem 2.9 If f is continuous on [a, b], then V = lim||→0 S ; that is, given M satisfying M < V, there exists δ > 0 such that S > M for any partition of [a, b] with || < δ. Proof. We remind the reader of the discussion following Example 3. Given M with M < V, we must find δ > 0 so that S > M if || < δ. Select μ > 0 such

Functions of Bounded Variation and the Riemann–Stieltjes Integral

23

that M+μ < V, and choose a fixed partition ¯ = {¯xj }kj=0 such that S¯ > M+μ. Using the uniform continuity of f on [a, b], pick η > 0 such that (i) |f (x) − f (x )| < μ/[2(k + 1)] if |x − x | < η. Now let be any partition that satisfies (ii) || < η, (iii) || < minj x¯ j − x¯ j−1 . We claim that S > M, from which the theorem will follow by choosing δ to be the smaller of η and minj (¯xj − x¯ j−1 ). Write = {xi }m i=0 and S =

m

| f (xi ) − f (xi−1 )| =

+

,

i=1

where is extended over all i such that (xi−1 , xi ) contains some x¯ j . By (iii), any (xi−1 , xi ) can contain at most one x¯ j , and therefore the number of terms of is at most k + 1. Let ∪ ¯ denote the partition formed by the union of ¯ ¯ ¯ the points of and . Then ∪ is a refinement of both and . Moreover, S∪¯ = + , where is obtained from by replacing each term by | f (xi ) − f (¯xj )| + | f (¯xj ) − f (xi−1 )|, x¯ j being the point of ¯ in (xi−1 , xi ). By (i) and (ii), each of these two terms is less than μ/[2(k + 1)], and therefore

< 2(k + 1)

μ = μ. 2(k + 1)

Hence,

= S∪¯ −

> S∪¯ − μ,

¯ S∪¯ ≥ S¯ . This gives so that S > S∪¯ − μ. Since ∪ ¯ is a refinement of , S > S¯ − μ > M and completes the proof. Corollary 2.10

If f has a continuous derivative f on [a, b], then

V=

b a

| f | dx,

P=

b a

+

{ f } dx,

N=

b a

{ f }− dx.

24

Measure and Integral: An Introduction to Real Analysis

Proof. By the mean-value theorem,

S =

m

| f (xi ) − f (xi−1 )| =

i=1

m

|f (ξi )|(xi − xi−1 )

i=1

for appropriate ξi ∈ (xi−1 , xi ), i = 1, . . . , m. Hence, by Theorem 2.9,

V = lim S = lim ||→0

||→0

m

|f (ξi )|(xi − xi−1 ) =

i=1

b

|f (x)| dx,

a

by definition of the Riemann integral. Moreover, by Theorem 2.6, ⎤ ⎡ b b 1 1 ⎣ P = [V + f (b) − f (a)] = |f (x)| dx + f (x) dx⎦ 2 2 a a =

b b 1 |f (x)| + f (x) dx = [f (x)]+ dx. 2a a

The formula for N follows similarly from the fact that

N=

1 [V − f (b) + f (a)]. 2

For an extension of Corollary 2.10, see Theorem 7.31. See also Exercise 22 of Chapter 7. In passing, we note that there are notions of bounded variation for open or partly open intervals, as well as for infinite intervals. Suppose, for example, that (a, b) is a bounded open interval. Let [a , b ] ⊂ (a, b), and define V ◦ (a, b) = lim V[a , b ] as a → a and b → b. If V ◦ (a, b) < + ∞, we say f is of bounded variation on (a, b). Similarly, if f is defined on (−∞, +∞), let V(−∞, +∞) = lim V[a, b] as a → −∞ and b → +∞. Analogous definitions hold for [a, b), (a, +∞), [a, +∞), etc. See Exercise 8. We may also consider the notion of bounded variation for complex-valued f defined on an interval. The definition is the same as for the real-valued case, and we leave it to the reader to show that a complex-valued f is of bounded variation if and only if both its real and imaginary parts are as well.

Functions of Bounded Variation and the Riemann–Stieltjes Integral

25

2.2 Rectifiable Curves As an application of the notion of bounded variation, we shall discuss its relation to rectifiable curves (initially, those in the plane). A curve C in the plane is two finite real-valued parametric equations C:

x = ϕ(t) , y = ψ(t)

a ≤ t ≤ b.

(2.11)

The graph of C is {(x, y) : x = φ(t), y = ψ(t), a ≤ t ≤ b}. The graph may have self-intersections and is not necessarily continuous or bounded. We think of the curve itself as the mapping of [a, b] onto the graph. Let = {a = t0 < t1 < · · · < tm = b} be a partition of [a, b], and consider the corresponding points Pi = (φ(ti ), ψ(ti )), i = 0, 1, . . . , m, on the graph of C. Draw the polygonal (broken) line connecting P0 to P1 , P1 to P2 , . . . , Pm−1 to Pm in order, and let l() =

m 2 2 1/2 φ (ti ) − φ ti−1 + ψ (ti ) − ψ ti−1 i=1

denote its length. The length L of C is defined by the equation L = L(C) = sup l().

(2.12)

Thus, 0 ≤ L ≤ +∞. If the graph of C is discontinuous, then as we move along the graph, the length of every missing segment will contribute to L. Moreover, the possibility that the graph may be traversed more than once, that is, that the mapping t → (φ(t), ψ(t)), a ≤ t ≤ b, may not be one-to-one, will add to L. We say C is rectifiable if L < +∞.

Theorem 2.13 Let C be a curve defined by (2.11). Then C is rectifiable if and only if both φ and ψ are of bounded variation. Moreover, V(φ), V(ψ) ≤ L ≤ V(φ) + V(ψ). Proof. We will use the simple inequalities 1/2 ≤ |a| + |b| |a|, |b| ≤ a2 + b2

26

Measure and Integral: An Introduction to Real Analysis

for real a and b. Thus, if C is rectifiable and = {ti } is any partition of [a, b], the inequality l() =

2 2 1/2 φ (ti ) − φ ti−1 + ψ (ti ) − ψ ti−1 ≤L

implies |φ (ti ) − φ ti−1 | ≤ L and |ψ (ti ) − ψ ti−1 | ≤ L. Hence, V(φ), V(ψ) ≤ L. On the other hand, for any C, l() ≤

|φ (ti ) − φ ti−1 | + |ψ (ti ) − ψ ti−1 | ≤ V(φ) + V(ψ).

Hence, L ≤ V(φ) + V(ψ), which completes the proof. It follows that if φ(t) is any bounded function that is not of bounded variation on [a, b] (see Example 5 and Exercise 1), then the curve given by x = y = φ(t), a ≤ t ≤ b, is not rectifiable, even though its graph lies in a finite segment of the line y = x. Thus, the length of the graph of a curve is not necessarily the same as the length of the curve. In the special case that C is given by a function y = f (x), Theorem 2.13 reduces to the simple statement that C is rectifiable if and only if f is of bounded variation. Curves in Rn can be treated similarly, and we shall be brief. By a curve C in n R , we mean a system x1 = φ1 (t), . . . , xn = φn (t), for t in some [a, b]. We consider a partition = {ti }m i=0 of [a, b] and the length l() of the corresponding polygonal line:

l() =

m

⎛ ⎞1/2 m n 2 ⎝ φj (ti ) − φj ti−1 ⎠ . Pi−1 Pi =

i=1

i=1

j=1

The quantity L = sup l() is called the length of C, and if L < +∞, C is said to be rectifiable. As seen from the definition of l(), exactly as in the case n = 2, C is rectifiable if and only if each φj is of bounded variation.

2.3 The Riemann–Stieltjes Integral Let f and φ be two functions that are defined and finite on a finite interval [a, b]. If = {a = x0 < x1 < · · · < xm = b} is a partition of [a, b], we arbitrarily select intermediate points {ξi }m i=1 satisfying xi−1 ≤ ξi ≤ xi and write R =

m i=1

f (ξi ) φ (xi ) − φ xi−1 .

(2.14)

Functions of Bounded Variation and the Riemann–Stieltjes Integral

27

R is called a Riemann–Stieltjes sum for and of course depends on the points ξi , the functions f and φ, and the interval [a, b], although we shall usually not display this dependence in our notation. If I = lim R

(2.15)

||→0

exists and is finite, that is, if given ε > 0 there is a δ > 0 such that |I – R | < ε for any satisfying || < δ and for any choice of intermediate points, then I is called the Riemann–Stieltjes integral of f with respect to φ on [a, b] and denoted I=

b

b f (x) dφ(x) = f dφ.

a

a

b A necessary and sufficient condition for the existence of a f dφ is the following Cauchy criterion: given ε > 0, there exists δ > 0 such that |R −R | < ε if ||, | | < δ. See Exercise 11. We list four preliminary remarks about this integral. b b 1. If φ(x) = x, a f dφ is clearly just the Riemann integral a f dx. In this case, Theorem 5.54 in Chapter 5 gives a necessary and sufficient condition on f for the existence of the integral. 2. If f is continuous on [a, b] and φ is continuously differentiable on [a, b], then b b a f dφ = a f φ dx. (See also Theorem 7.32.) In fact, by the mean-value theorem, R =

f (ξi ) φ (ηi ) xi − xi−1 , f (ξi ) φ (xi ) − φ xi−1 =

with xi−1 ≤ ξi , ηi ≤ xi . Using the uniform continuity of φ , we obtain b lim||→0 R = a f φ dx.

3. Let φ(x) be a step function; that is, suppose there are points a = α0 < α1 < · · · < αm = b such that φ is constant on each interval (αi−1 , αi ). Let φ(αi +) = lim φ(x), x→αi +

i = 0, 1, . . . , m − 1,

and φ(αi −) = lim φ(x), x→αi −

i = 1, . . . , m,

28

Measure and Integral: An Introduction to Real Analysis

denote the limits from the right and left at αi , and let di = φ(αi +) − φ(αi −), i = 1, . . . , m − 1, d0 = φ(α0 +) − φ(α0 ), and dm = φ(αm ) − φ(αm −) denote the jumps of φ. Then, for continuous f , b

f dφ =

a

m

f (αi )di .

i=0

The existence of the integral can be verified directly or by appealing to Theorem 2.24. See also Exercise 29. For example, if φ(x) is chosen to be the Heaviside function H(x) defined by H(x) = 0 when x < 0 and H(x) = 1 when x ≥ 0, then 1

f dH = f (0)

−1

if f is continuous at x = 0. 4. In definition (2.15), no condition other than finiteness is imposed on f or φ, but we shall see later that the most important applications occur when φ is monotone or, more generally, of bounded variation. We note now that b if a f dφ exists, then f and φ have no common points of discontinuity. To prove this, suppose that both f and φ are discontinuous at x¯ , a < x¯ < b. Suppose first that the discontinuity of φ is not removable. Then there is a fixed η > 0 such that, for every ε > 0, there exist points x¯ 1 and x¯ 2 with x¯ − ε2 < x¯ 1 < x¯ < x¯ 2 < x¯ + ε2 and |φ(¯x2 ) − φ(¯x1 )| > η. Let = {xi } be a partition of [a, b] with || < ε such that xi0 −1 = x¯ 1 and xi0 = x¯ 2 for some i0 . Choose a point ξi ∈ [xi−1 , xi ] for i = i0 and two different points ξi0 and ξi in 0 [xi0 −1 , xi0 ]. Let R be the Riemann–Stieltjes sum using ξi in each [xi−1 , xi ], and let R be the sum using ξi in [xi−1 , xi ] for i = i0 and ξi in [xi0 −1 , xi0 ]. 0 Then, clearly, |R − R | = | f ξi0 − f ξi0 | |φ xi0 − φ xi0 −1 | > η| f ξi0 − f ξi0 |. Since f is discontinuous at x¯ , we can choose ξi0 and ξi subject to the restric0 tions earlier and such that | f (ξi0 ) − f (ξi )| > μ for some μ > 0 independent 0 of ε. It follows that |R − R | exceeds a positive constant independent of ε, contradicting the assumption that R − R → 0 as ||, | | → 0. If the discontinuity of φ at x¯ is removable, a similar argument can be given. The main difference is that we consider with x¯ as a partitioning

29

Functions of Bounded Variation and the Riemann–Stieltjes Integral

point xi0 and argue for either [xi0 −1 , x¯ ] or [¯x, xi0 +1 ], depending on the nature of the discontinuity of f at x¯ . The arguments in the case where x¯ is either a or b are similar. In the theorem that follows, we list some simple properties of the Riemann–Stieltjes integral. The proofs are left as an exercise.

Theorem 2.16 (i) If

b a

f dφ exists, then so do b

b a

cf dφ and

cf dφ =

b

a

(ii) If

b

a f1 dφ

and

b

a f2 dφ

b

b a

f dφ1 and

b a

f d(cφ) for any constant c, and

a

a

both exist, so does

(f1 + f2 ) dφ =

b

b a

b a

(f1 + f2 ) dφ, and

f1 dφ +

a

f dφ2 exist, so does b

a

b f d(cφ) = c f dφ.

a

(iii) If

b

b

f2 dφ.

a

f d(φ1 + φ2 ), and

b b f d(φ1 + φ2 ) = f dφ1 + f dφ2 .

a

a

a

The additivity of the integral with respect to intervals is given by the following result. See also Exercise 14.

Theorem 2.17 exist and

If

b a

f dφ exists and a < c < b, then

b a

f dφ =

c a

f dφ +

b

c a

f dφ and

b c

f dφ both

f dφ.

c

Proof. In the proof, R [a, b] will denote a Riemann–Stieltjes sum correspondc ing to a partition of [a, b]. To show that a f dφ exists, it is enough to show

30

Measure and Integral: An Introduction to Real Analysis

that given ε > 0, there exists δ > 0 so that if 1 and 2 are partitions of [a, c] with |1 |, |2 | < δ, then |R1 [a, c] − R2 [a, c]| < ε.

(2.18)

b Since a f dφ exists, there is a δ > 0 so that for any partitions 1 and 2 of [a, b] with |1 |, |2 | < δ, we have |R [a, b] − R [a, b]| < ε. 1

(2.19)

2

Let 1 and 2 be partitions of [a, c] with given sets of intermediate points. Complete 1 and 2 to partitions 1 and 2 of [a, b] by adjoining the same points of [c, b]; that is, let be a partition of [c, b] and let 1 = 1 ∪ , 2 = 2 ∪ . Select a set of intermediate points in [c, b] for , and let the intermediate points of 1 and 2 consist of these together with the sets for 1 and 2 , respectively. Then R [a, b] = R1 [a, c] + R [c, b] 1

(2.20)

R [a, b] = R2 [a, c] + R [c, b]. 2

If we now assume that |1 |, |2 | < δ and choose so that | | < δ, then |1 |, |2 | < δ, and (2.18) follows from (2.19) by subtracting the equations in (2.20). b The proof of the existence of c f dφ is similar. The fact that b a

f dφ =

c

f dφ +

a

b

f dφ

c

follows from (2.20). This completes the proof. See also Exercise 19. The next result is the very useful formula for integration by parts.

Theorem 2.21

If

b a

b a

f dφ exists, then so does

b a

φ df , and

f dφ = [ f (b)φ(b) − f (a)φ(a)] −

b a

φ df .

31

Functions of Bounded Variation and the Riemann–Stieltjes Integral

Proof. Let = {a = x0 < x1 < · · · < xm = b} and xi−1 ≤ ξi ≤ xi . Then R =

m

m m f (ξi ) φ (xi ) − φ xi−1 = f (ξi ) φ (xi ) − f (ξi ) φ xi−1

i=1

=

m

i=1 m−1

f (ξi ) φ (xi ) −

i=1 m−1

=−

i=1

f ξi+1 φ (xi )

i=0

φ (xi ) f ξi+1 − f (ξi ) + f (ξm ) φ(b) − f (ξ1 ) φ(a)

i=1

since xm = b and x0 = a. If we subtract and add φ(a)[ f (ξ1 )−f (a)]+φ(b)[ f (b)− f (ξm )] on the right side of the last equality and cancel like terms, we obtain R = −T + [ f (b)φ(b) − f (a)φ(a)], where

T =

m−1

φ (xi ) f ξi+1 − f (ξi ) + φ(a) [ f (ξ1 ) − f (a)] + φ(b) [ f (b) − f (ξm )] .

i=1

Since the ξi straddle the xi (successive ξi s may be equal), T is a Riemann– b Stieltjes sum for a φ df . Observing that the roles of φ and f can be interb changed, and taking the limit as || → 0, we see that a f dφ exists if and b b b only if a φ df exists and that a f dφ = [ f (b)φ(b) − f (a)φ(a)] − a φ df . This completes the proof. Now let f be bounded and φ be monotone increasing on [a, b]. If = {xi }m i=0 , let mi = L =

inf

xi−1 ≤ x ≤ xi m

f (x),

Mi =

sup

xi−1 ≤ x ≤ xi

f (x),

mi φ (xi ) − φ xi−1 ,

i=1

U =

m

Mi φ (xi ) − φ xi−1 .

i=1

Since −∞ < mi ≤ Mi < +∞ and φ(xi ) − φ(xi−1 ) ≥ 0, we see that L ≤ R ≤ U .

(2.22)

32

Measure and Integral: An Introduction to Real Analysis

L and U are called the lower and upper Riemann–Stieltjes sums for , respectively. The behavior of L and U is somewhat more predictable than that of R , as we now show.

Lemma 2.23

Let f be bounded and φ be increasing on [a, b].

(i) If is a refinement of , then L ≥ L and U ≤ U . (ii) If 1 and 2 are any two partitions, then L1 ≤ U2 . Proof. To see (i) for upper sums, suppose that has only one point x not in . If x lies between xi−1 and xi of , then sup[xi−1 ,x ] f (x), sup[x ,xi ] f (x) ≤ Mi , so that sup f (x) φ(x ) − φ xi−1

[xi−1 ,x ]

+ sup f (x) φ (xi ) − φ x ≤ Mi φ (xi ) − φ xi−1 . [x ,xi ]

Hence, U ≤ U . Since can be obtained by adding one point at a time to , an extension of this reasoning proves (i) for upper sums. The argument for lower sums is similar. To show (ii), note that 1 ∪ 2 is a refinement of both 1 and 2 . Hence, by part (i) and (2.22), we obtain L1 ≤ L1 ∪2 ≤ U1 ∪2 ≤ U2 , which completes the proof. We now come to an important result that gives sufficient conditions for the b existence of a f dφ. See also Exercise 23. Theorem 2.24 If f is continuous on [a, b] and φ is of bounded variation on [a, b], b then a f dφ exists. Moreover, b f dφ ≤ sup| f | V[φ; a, b]. a

[a,b]

Proof. To prove the existence, we may suppose by Corollary 2.7 and Theorem 2.16(iii) that φ is monotone increasing. Then, by (2.22), L ≤ R ≤ U , and it is enough to show that lim||→0 L and lim||→0 U exist and are equal. This is clear if φ is constant on [a, b]. If φ is not constant, let = {xi } and note that

Functions of Bounded Variation and the Riemann–Stieltjes Integral

33

given ε > 0, the uniform continuity of f implies there exists δ > 0 such that if || < δ, then Mi − mi < ε/[φ(b) − φ(a)]. Hence, if || < δ, 0 ≤ U − L =

(Mi − mi ) φ (xi ) − φ xi−1 < ε.

(2.25)

Therefore, lim (U − L ) = 0,

||→0

(2.26)

and it is enough to show that lim||→0 U exists. This is immediate since otherwise there would exist an ε > 0 and two sequences of partitions, {k } and {k }, with norms tending to zero such that U − U > ε. In view of (2.26), k k we would then have, for k large enough, L − U > ε/2 > 0, contradicting k k the fact that L ≤ U for any and (Lemma 2.23). b To complete the proof, note that the inequality | a f dφ| ≤ (sup(a,b) | f |) V[φ; a, b] follows from a similar one for R by taking the limit. b Combining Theorems 2.21 and 2.24, we see that a f dφ exists if either f or φ is continuous and the other is of bounded variation. Theorem 2.27 (Mean-Value Theorem) If f is continuous on [a, b] and φ is bounded and increasing on [a, b], there exists ξ ∈ [a, b] such that b

f dφ = f (ξ)[φ(b) − φ(a)].

a

Proof. Since φ is increasing, we have min f [φ(b) − φ(a)] ≤ R ≤ max f [φ(b) − φ(a)] [a,b]

for any R . Since

[a,b]

b a

f dφ exists (see Theorem 2.24), it must satisfy

b min f [φ(b) − φ(a)] ≤ f dφ ≤ max f [φ(b) − φ(a)]. [a,b]

a

[a,b]

The result now follows immediately from the intermediate value theorem for continuous functions.

34

Measure and Integral: An Introduction to Real Analysis

In passing, we note that Riemann–Stieltjes integrals can also be defined, in the improper sense, on open or partly open bounded intervals and on infinite intervals. If f and φ are defined on (a, b), for example, let a < a < b < b and define b a

f dφ = lim

b

a →a b →b a

f dφ,

provided the limit exists in the sense that it is independent of how a → a and b → b. Similarly, let +∞

f dφ = lim

b

a→−∞ b→+∞ a

−∞

f dφ

if the limit exists. Analogous definitions can be given for [a, b), (a, +∞), [a, +∞), etc. See Exercise 24.

2.4 Further Results about Riemann–Stieltjes Integrals

b We will discuss a variant of the definition of a f dφ in the case where f is bounded and φ is increasing. Note that it then follows from part (ii) of Lemma 2.23 that −∞ < sup L ≤ inf U < +∞. It is natural to ask if the b existence of a f dφ in this case is equivalent to the statement that sup L = inf U ,

(2.28)

which we know to be an equivalent definition in the case of Riemann integrals (see (1.20)). Unfortunately, the answer in general is no, as the following example shows. Let [a, b] = [−1,1] and f (x) = φ(x) =

0 1

if − 1 ≤ x < 0 if 0 ≤ x ≤ 1,

0 1

if − 1 ≤ x ≤ 0 if 0 < x ≤ 1.

1 Since f and φ have a common discontinuity, −1 f dφ does not exist. In fact, if straddles 0, that is, if xi0 −1 < 0 < xi0 for some i0 , then R = f (ξi0 ) for xi0 −1 ≤ ξi0 ≤ xi0 . Hence, R may be 0 or 1 and thus cannot have a limit. On the other

35

Functions of Bounded Variation and the Riemann–Stieltjes Integral

hand, it is easy to check that U = 1 for any and that L = 0 if straddles 0 and L = 1 otherwise. Hence, neither lim||→0 R nor lim||→0 L exists, but sup L = inf U = 1. In the following two theorems, we explore relations between (2.15) and (2.28).

Theorem 2.29 Let f be bounded and φ be monotone increasing on [a, b]. If exists, then lim||→0 L and lim||→0 U exist, and

lim L = lim U = sup L = inf U =

||→0

||→0

b

b a

f dφ

f dφ.

a

Proof. We may assume that φ is not constant on [a, b] since the result is obvib ous otherwise. Let I = a f dφ. By hypothesis, given ε > 0, there is a δ > 0 such that |I – R | < ε for any R with || < δ. Given = {xi }m i=0 with || < δ, choose ξi and ηi in [xi−1 , xi ], i = 1, . . . , m, such that 0 ≤ Mi − f (ξi )

0 such that U < U + ε if || < δ. Choose ¯ = {¯xj }kj=0 such that U¯ < U + ε2 , and let M = sup[a,b] | f |. By the uniform continuity of φ, there exists η > 0 such that |φ(x) − φ(x )|

0, with V[ f ; a+ε, b] ≤ M < +∞. Show that V[ f ; a, b] < +∞. Is V[ f ; a, b] ≤ M? If not, what additional assumption will make it so? 6. Let f (x) = x2 sin (1/x) for 0 < x ≤ 1 and f (0) = 0. Show that V[ f ; 0, 1] < +∞. (Examine the graph of f , or use Exercise 5 and Corollary 2.10.) 7. Suppose f is of bounded variation on [a, b]. If f is continuous at a point x¯ , show that V(x), P(x), and N(x) are also continuous at x¯ . In particular, if f is continuous on [a, b], then so are V(x), P(x), and N(x). (If = {xi }, note

38

Measure and Integral: An Introduction to Real Analysis

that V[xi−1 , xi ] − |f (xi−1 ) − f (xi )| ≤ V[a, b] − S . Recall that S increases when is refined.) 8. The main results about functions of bounded variation on a closed bounded interval remain true for open or partly open intervals and for infinite intervals. Prove, for example, that if f is of bounded variation on (−∞, +∞), then f is the difference of two increasing bounded functions. 9. Let C be a curve with parametric equations x = φ(t) and y = ψ(t), a ≤ t ≤ b. (a) If φ and ψ are of bounded variation and continuous, show that L = lim||→0 l(). b (b) If φ and ψ are continuously differentiable, show that L = a ([φ (t)]2 + [ψ (t)]2 )1/2 dt. 10. If λ1 < λ2 < · · · < λm is a finite sequence and −∞ < s < +∞, write −sλk as a Riemann–Stieltjes integral. (Take f (x) = e−sx , φ to be an k ak e appropriate step function and [a, b] to contain all the λk in its interior.) b 11. Show that a f dφ exists if and only if given ε > 0, there exists δ > 0 such that |R − R | < ε if ||, | | < δ. 12. Prove that the conclusion of Theorem 2.30 is valid if the assumption that φ is continuous is replaced by the assumption that f and φ have no common discontinuities. (Instead of the uniform continuity of φ, use the fact ¯ that either f or φ is continuous at each point x¯ j of .) 13. Prove Theorem 2.16. c b 14. Give an example that shows that for a < c < b, a f dφ and c f dφ may b both exist but a f dφ may not. Compare Theorem 2.17. (Take [a, b] = [−1,1], c = 0, and f and φ as in the example following (2.28).) 15. Suppose f is continuous and x φ is of bounded variation on [a, b]. Show that the function ψ(x) = a f dφ is of bounded variation on [a, b]. If g is b b continuous on [a, b], show that a g dψ = a gf dφ.

16. Suppose that φ is of bounded variation on [a, b] and that f is bounded and continuous except for a finite number of jump discontinuities in [a, b]. If b φ is continuous at each discontinuity of f , show that a f dφ exists. 17. If φ is of bounded variation on (−∞, +∞), f is continuous on (−∞, +∞), +∞ and lim|x|→+∞ f (x) = 0, show that −∞ f dφ exists. k 18. Let f (z) = ∞ |ak | < +∞, then k=0 ak z be a power series. Show that if f (z) is of bounded variation on every radius of the circle 1. (If, e.g., |z| = k− k x a− the radius is 0 ≤ x ≤ 1 and the ak are real, then f (x) = a+ k k x .) b 19. Let f and φ be finite functions on [a, b]. If [a , b ] ⊂ [a, b] and a f dφ exists, b show that a f dφ exists.

Functions of Bounded Variation and the Riemann–Stieltjes Integral

39

20. Let f be finite on [a, b]. If lim||→0 S [ f ; a, b] exists, show that it equals V[ f ; a, b]. 21. If V[φ; a, b] = +∞, show that there is a point x0 ∈ [a, b] such that either V[φ; I] = + ∞ for every subinterval I of [a, b] having x0 as left-hand endpoint or V[φ; I] = +∞ for every subinterval I of [a, b] having x0 as right-hand endpoint. (Note that for any x0 , a , b such that a ≤ a < x0 < b ≤ b, V[φ; a , b ] < +∞ if both V[φ; a , x0 ], V[φ; x0 , b ] < +∞.) 22. If V[φ; a, b] = +∞, show that there exist x0 ∈ [a, b] and a monotone ∞ in [a, b] such that x → x and sequence {xk }∞ 0 k k=1 |φ(xk+1 ) − φ(xk )| = 1 +∞. (Use the results in Exercises 21 and 5.) 23. If V[φ; a, b] = +∞, show that there is a continuous f on [a, b] such that b a f dφ does not exist. (Use the result in Exercise 22 together with the αk = following fact: if {αk } is a sequence of positive numbers with } of positive numbers with ε → 0 and +∞, then there is a sequence {ε k k εk αk = +∞.) 24. Let f be continuous and φ be of bounded variation on [a, b], and recall b that the Riemann–Stieltjes integral a f dφ then exists by Theorem 2.24. a+ε Show that limε→0+ a f dφ = 0 if and only if either f (a) = 0 or φ is b b continuous at a. Deduce that the formula a f dφ = limε→0+ a+ε f dφ may not hold. 25. Construct a bounded nondecreasing function on (−∞, ∞) which is continuous at every irrational number and discontinuous at every rational enumeration of the rational numbers and number. (Let {rk }∞ k=1 be an consider the function f (x) = k:rk ≤x 2−k .)

26. Let f (x) = sin x. Sketch the graphs of its variations P(x), N(x), V(x) for 0 ≤ x ≤ 2π, and use Corollary 2.10 to find explicit formulas for these variations on [0, 2π].

27. If f is an even function on [−1, 1], verify the formulas V[ f ; −1, 1] = 2P[ f ; −1, 1], V[ f ; 0, 1] = P[ f ; −1, 1] and P[ f ; 0, 1] = N[ f ; −1, 0]. 28. Let f be Lipschitz continuous on an interval [a, b]. Show that V[ f ; a, b] is strictly less than the length of the graph of f over [a, b]. b 29. Use Theorem 2.17 to verify the formula for a f dφ given in remark 3 in Section 2.3. 30. Let f and φ be real-valued functions on [a, b]. b (a) If a f dφ exists and φ is not constant on any subinterval of [a, b], show that f is bounded on [a, b]. b b (b) If a f dφ exists and φ is increasing, show that L ≤ a f dφ ≤ U for every partition of [a, b], where L and U are the corresponding lower and upper sums.

40

Measure and Integral: An Introduction to Real Analysis

31. Show that for any real-valued function f on [a, b], the Riemann– b Stieltjes integral a df exists and equals f (b) − f (a). If f exists and is b b Riemann integrable on [a, b], show that a f (x) dx = a df , and conseb quently a f (x) dx = f (b) − f (a). 32. Let f be a function of bounded variation on an interval [a, b]. Show that f is Riemann integrable on [a, b]. (This follows from Theorems 5.54 and 2.8, but it can be derived solely from the results in this chapter.)

3 Lebesgue Measure and Outer Measure In this chapter, we will define and study the Lebesgue measure of sets in Rn . This will be the foundation for the theory of integration to be developed later. We will base the presentation on the notion of the outer measure of a set.

3.1 Lebesgue Outer Measure and the Cantor Set We consider closed n-dimensional intervals I = {x: aj ≤ xj ≤ bj , j = 1, . . . , n} and their volumes v(I) = nj=1 (bj − aj ). (See p. 7 in Section 1.3.) To define the outer measure of an arbitrary subset E of Rn , cover E by a countable collection S of intervals Ik , and let σ(S) = v(Ik ). Ik ∈S

The Lebesgue outer measure (or exterior measure) of E, denoted |E|e , is defined by |E|e = inf σ(S),

(3.1)

where the infimum is taken over all such covers S of E. Thus, 0 ≤ |E|e ≤ +∞.

Theorem 3.2

For an interval I, |I|e = v(I).

Proof. Since I covers itself, we have |I|e ≤ v(I). To show the opposite inequal∗ interval ity, suppose that S = {Ik }∞ k=1 is a cover of I. Given ε > 0, let Ik be an ∗ containing Ik in its interior such that v(Ik ) ≤ (1 + ε)v(Ik ). Then I ⊂ k (Ik∗ )◦ , and since I is closed and bounded, the Heine–Borel Theorem 1.9 implies there N N ∗ ∗ is an integer N such that I ⊂ k=1 Ik . Therefore, v(I) ≤ k=1 v(Ik ) by a basic property of the volume of intervals (see p. 8 in Section 1.3). Hence, v(I) ≤ (1 + ε) N k=1 v(Ik ) ≤ (1 + ε)σ(S). Since ε can be chosen arbitrarily small, it follows that v(I) ≤ σ(S) and, therefore, that v(I) ≤ |I|e . This completes the proof. Note that the boundary of any interval has outer measure zero. Compare Exercise 22 of Chapter 1. 41

42

Measure and Integral: An Introduction to Real Analysis

The following two theorems state simple but basic properties of outer measure.

Theorem 3.3

If E1 ⊂ E2 , then |E1 |e ≤ |E2 |e .

The proof follows immediately from the fact that any cover of E2 is also a cover of E1 . Theorem 3.4

If E =

Ek is a countable union of sets, then |E|e ≤

|Ek |e .

Proof. We may assume that |Ek |e < +∞ for each k = 1, 2, . . . , since otherwise (k) the conclusion is obvious. Fix ε > 0. Given k, choose intervals Ij such that Ek ⊂ j Ij(k) and j v(Ij(k) ) < |Ek |e + ε2−k . Since E ⊂ j,k Ij(k) , we have |E|e ≤ (k) (k) j,k v(Ij ) = k j v(Ij ). Therefore, |E|e ≤

(|Ek |e + ε2−k ) = |Ek |e + ε, k

k

and the result follows by letting ε → 0. We see in particular that any subset of a set with outer measure zero has outer measure zero and that the countable union of sets with outer measure zero has outer measure zero. Since any set consisting of a single point clearly has outer measure zero, it follows that any countable subset of Rn has outer measure zero. For example, the set consisting of all points each of whose coordinates is rational has outer measure zero, even though it is dense in Rn . There are sets with outer measure zero that are not countable. As an illustration, we will construct a subset of the real line with outer measure zero that is perfect, and therefore uncountable, by Theorem 1.9. Variants of the construction and analogues for Rn , n > 1, are given in the exercises. Consider the closed interval [0, 1]. The first stage of the construction is to subdivide [0, 1] into thirds and the interior of the middle third; that remove 1 2 is, remove the open interval 3 , 3 . Each successive step of the construction is essentially the same. Thus, stage, we subdivide each of the at the second remaining two intervals 0, 13 and 23 , 1 into thirds and remove the interiors, 1 2 7 8 and , , 9 9 9 9 , of their middle thirds. We continue the construction for each of the remaining intervals. The sets removed in the first three successive stages are indicated in the following illustration by darkened intervals:

43

Lebesgue Measure and Outer Measure

0

0

1 9

2 9

1 3

2 3

1 3

2 3

1

7 9

8 9

1

The subset of [0, 1] that remains after infinitely many such operations is called the Cantor set C: thus, if Ck denotes the union of the intervals left at the kth stage, then C=

∞

Ck .

(3.5)

k =1

Since each Ck is closed, it follows from Theorem 1.7 that C is closed. Note that Ck consists of 2k closed disjoint intervals, each of length 3−k , and that C contains the endpoints of all these intervals. Any point of C belongs to an interval in Ck for every k and is therefore a limit point of the endpoints of the intervals. This proves that C is perfect. Finally, since C is covered by the intervals in any Ck , we have |C|e ≤ 2k 3−k for each k. Therefore, |C|e = 0. We now introduce a function associated with the Cantor set that will be useful later. If Dk = [0, 1] − Ck , then Dk consists of the 2k − 1 open intervals Ijk (ordered from left to right as j proceeds from j = 1 to j = 2k − 1) removed in the first k stages of construction of the Cantor set. Let fk be the continuous function on [0, 1] which satisfies fk (0) = 0, fk (1) = 1, fk (x) = j2−k on Ijk , j = 1, . . . , 2k − 1, and which is linear on each interval of Ck . The graphs of f1 and f2 are shown in the following illustration:

1 3 4

f2 f1

1 2

1 4

0

1 9

2 9

1 3

2 3

7 9

8 9

1

44

Measure and Integral: An Introduction to Real Analysis

By construction, each fk is monotone increasing, fk+1 = fk on Ijk , j = 1, . . . , 2k − 1, and |fk − fk+1 | < 2−k . Hence, ( fk − fk+1 ) converges uniformly on [0, 1], and therefore { fk } converges uniformly on [0, 1]. Let f = limk→∞ fk . Then f (0) = 0, f (1) = 1, f is monotone increasing and continuous on [0, 1], and f is constant on every interval removed in constructing C. This f is called the Cantor–Lebesgue function. Its graph is sometimes called the Devil’s staircase. The next two theorems give useful relations between the outer measure of a set and the outer measures of open sets and Gδ sets (see p. 6 in Section 1.3) that contain it. Theorem 3.6 Let E ⊂ Rn . Then given ε > 0, there exists an open set G such that E ⊂ G and |G|e ≤ |E|e + ε. Hence, |E|e = inf |G|e ,

(3.7)

where the infimum is taken over all open sets G containing E. ∞ Proof. Given ε > 0, choose intervals Ik with E ⊂ ∞ k=1 Ik and k=1 v(Ik ) ≤ |E|e + 12 ε. Let Ik∗ be an interval containing Ik in its interior (Ik∗ )◦ such that v(Ik∗ ) ≤ v(Ik ) + ε2−k−1 . If G = (Ik∗ )◦ , then G is open and contains E. Furthermore, |G|e ≤

∞ k=1

v(Ik∗ ) ≤

∞ k=1

v(Ik ) + ε

∞ k=1

1 1 2−k−1 ≤ |E|e + ε + ε = |E|e + ε. 2 2

This completes the proof.

Theorem 3.8 |E|e = |H|e .

If E ⊂ Rn , there exists a set H of type Gδ such that E ⊂ H and

Proof. By Theorem 3.6, there is for every positive integer k an open set Gk ⊃ E such that |Gk |e ≤ |E|e + 1/k. If H = ∞ k=1 Gk , then H is of type Gδ and H ⊃ E. Moreover, for every k, |E|e ≤ |H|e ≤ |Gk |e ≤ |E|e + 1/k. Thus, |E|e = |H|e . Note that each |Gk |e < ∞ if |E|e < ∞. The significance of Theorem 3.8 is that the most general set in Rn can be included in a set of relatively simple type, namely, Gδ , with the same outer measure. In defining the notion of outer measure, we used intervals I with edges parallel to the coordinate axes. The question arises whether the outer measure of a set depends on the position of the (orthogonal) coordinate axes.

Lebesgue Measure and Outer Measure

45

The answer is no, and to prove this, we will simultaneously consider the usual coordinate system in Rn and a fixed rotation of this system. Notions pertaining to the rotated system will be denoted by primes. Thus, I denotes an interval with edges parallel to the rotated coordinate axes, and |E| e denotes the outer measure of a subset E relative to these rotated intervals: v(Ik ), (3.9) |E| e = inf where the infimum is taken over all coverings of E by rotated intervals Ik . The volume of an interval is clearly unchanged by rotation. (See p. 8 in Section 1.3.) Theorem 3.10

|E|e = |E|e for every E ⊂ Rn .

Proof. We first claim that given I and ε > 0, there exist {Il } such that I ⊂ v(Il ) ≤ v(I ) + ε. To see this, let I1 be an interval containing I

Il , and in its interior such that v(I1 ) ≤ v(I ) + ε. By Theorem 1.11, the interior of I1

can be written as a union of nonoverlapping intervals Il . Hence, I ⊂ Il . N

Moreover, since the Il are nonoverlapping and l=1 Il ⊂ I1 for every positive

integer N, we have N l ) ≤ v(I1 ) by a property of volume listed on p. 8 in l=1 v(I ∞ Section 1.3. Therefore, l=1 v(Il ) ≤ v(I1 ) ≤ v(I ) + ε, which proves the claim.

Il A parallel result is that given I and ε > 0, there exist {Il } such that I ⊂ and v(Il ) ≤ v(I) + ε. Let E be any subset of Rn . Given ε, choose {Ik : k = 1, 2, . . .} such that

} such that I ⊂ E⊂ Ik and v(Ik ) ≤ |E|e + 12 ε. For each k, choose {Ik,l k l Ik,l

−k−1 and l v(Ik,l ) ≤ v(Ik ) + ε2 . Thus, k,l

v(Ik,l ) ≤

k

1 v(Ik ) + ε ≤ |E|e + ε. 2

Since E ⊂ k,l Ik,l , we obtain |E| e ≤ |E|e + ε. Hence, |E| e ≤ |E|e , and by symme

try, |E|e = |E|e . For related results, see Exercise 22.

3.2 Lebesgue Measurable Sets A subset E of Rn is said to be Lebesgue measurable, or simply measurable, if given ε > 0, there exists an open set G such that E ⊂ G and |G − E|e < ε.

46

Measure and Integral: An Introduction to Real Analysis

If E is measurable, its outer measure is called its Lebesgue measure, or simply its measure, and denoted |E|: |E| = |E|e , for measurable E.

(3.11)

The condition that E be measurable should not be confused with the conclusion of Theorem 3.6, which states that there exists an open G containing E such that |G|e ≤ |E|e + ε. In general, since G = E (G − E) when E ⊂ G, we only have |G|e ≤ |E|e + |G − E|e , and we cannot conclude from |G|e ≤ |E|e + ε that |G − E|e < ε. We now list two simple examples of measurable sets. A nonmeasurable set will be constructed in Theorem 3.38. Example 1 Every open set is measurable. This is immediate from the definition. Example 2 Every set of outer measure zero is measurable. Suppose that |E|e = 0. Then given ε > 0, there is by Theorem 3.6 an open G containing E with |G| < |E|e + ε = ε. Hence, |G − E|e ≤ |G| < ε. Theorem 3.12 The union E = measurable, and

Ek of a countable number of measurable sets is

|E| ≤

|Ek |.

Proof. Let ε > 0. For each k = 1, 2, . . . , choose an open set Gk such that Ek ⊂ Gk −k . Then G = and |Gk − E | < ε2 Gk is open and E ⊂ G. Moreover, since ke G − E ⊂ (Gk − Ek ), we have |Gk − Ek |e < ε. |G − E|e ≤ (Gk − Ek ) ≤ e

This proves that E is measurable. The fact that |E| ≤ Theorem 3.4.

Corollary 3.13

An interval I is measurable, and |I| = v(I).

|Ek | follows from

47

Lebesgue Measure and Outer Measure

Proof. I is the union of its interior and its boundary. Since its boundary has measure zero, the fact that I is measurable follows from Theorem 3.12 and the results of Examples 1 and 2. Theorem 3.2 shows that |I| = v(I).

Theorem 3.14

Every closed set F is measurable.

In order to prove this, we will use Theorem 1.11 and the next two lemmas. N Lemma 3.15 If {I finite collection of nonoverlapping intervals, then k }k=1 is a is measurable and | Ik | = |Ik |.

Ik

Proof. The fact that Ik is measurable follows from Corollary 3.13. The rest of the lemma is a minor extension of Theorem 3.2, and its proof is left as an exercise. The reader should note the important role played by the Heine– Borel theorem in the proof. We recall from Chapter 1 that the distance between two sets E1 and E2 is defined as d(E1 , E2 ) = inf{|x1 − x2 | : x1 ∈ E1 , x2 ∈ E2 }. Lemma 3.16

If d(E1 , E2 ) > 0, then |E1 ∪ E2 |e = |E1 |e + |E2 |e .

Proof. By Theorem 3.4, |E1 ∪ E2 |e ≤ |E1 |e + |E2 |e . To prove the opposite Ik inequality, suppose ε > 0, and choose intervals {Ik } such that E1 ∪ E2 ⊂ and |Ik | ≤ |E1 ∪ E2 |e + ε. We may assume that the diameter of each Ik is less than d(E1 , E2 ). (Otherwise, we divide each Ik into a finite number of nonoverlapping subintervals with this property and apply Lemma 3.15.) Hence, {Ik } splits into two subsequences {Ik } and {Ik

}, the first of which covers E1 and the second, E2 . Clearly, |Ik | + |Ik

| = |Ik | ≤ |E1 ∪ E2 |e + ε. |E1 |e + |E2 |e ≤ Therefore, |E1 |e + |E2 |e ≤ |E1 ∪ E2 |e , which completes the proof. A special case of this result will be used in the proof of Theorem 3.14. If E1 and E2 are compact and disjoint, then d(E1 , E2 ) > 0 by Exercise 1(l) of Chapter 1; therefore, |E1 ∪ E2 |e = |E1 |e + |E2 |e if E1 and E2 are compact and disjoint. Proof of Theorem 3.14. Suppose first that F is compact. Given ε > 0, choose an open set G such that F ⊂ G and |G| < |F|e +ε. Since G−F is open, Theorem 1.11

48

Measure and Integral: An Introduction to Real Analysis

implies there are nonoverlapping closed intervals Ik , k = 1, 2, . . . , such that G−F = Ik . Therefore, |G−F|e ≤ |Ik |, and it suffices to show that |Ik | ≤ N ε. We have G = F ∪ ( Ik ) ⊃ F ∪ ( N k=1 Ik ) for every finite number {Ik }k=1 of the Ik . Therefore, N N

|G| ≥ F ∪ Ik = |F|e + Ik k=1

k=1

e

e

N

by Lemma 3.16, F and k=1 Ik being disjoint and compact. Since | N k=1 Ik |e = N N | by Lemma 3.15, we obtain k=1 |Ik | ≤ |G| − |F|e < ε for every N, k=1 |Ik so that |Ik | ≤ ε, as desired. This proves the result in the case when F is compact. Fk , To complete the proof, let F be any closed subset of Rn and write F = where Fk = F ∩ {x : |x| ≤ k}, k = 1, 2, . . . . Each Fk is compact and, therefore, measurable. Hence, F is measurable by Theorem 3.12.

Theorem 3.17

The complement of a measurable set is measurable.

Proof. Let E be measurable. For each positive integer k, choose an open set Gk such that E ⊂ Gk and |G k − E|e < 1/k. Since CGk is closed, it is measurable by Theorem 3.14. Let H = k CGk . Then H is measurable and H ⊂ CE. Write CE = H ∪ Z, where Z = CE − H. Then Z ⊂ CE − CGk = Gk − E, and therefore |Z|e < 1/k for every k. Hence, |Z|e = 0 and, in particular, Z is measurable. Thus, CE is measurable since it is the union of two measurable sets. The following two theorems are corollaries of Theorems 3.12 and 3.17. Theorem 3.18 The intersection E = sets is measurable.

k Ek

of a countable number of measurable

Proof. Since CEk is measurable by Theorem 3.17. However, Ek is measurable, CE = C( k Ek ) = k CEk , and hence CE is measurable. Therefore, by another application of Theorem 3.17, E is measurable.

Theorem 3.19

If E1 and E2 are measurable, then E1 − E2 is measurable.

Proof. Since E1 − E2 = E1 ∩ CE2 , the result follows from Theorems 3.17 and 3.18. As a consequence of Theorems 3.12, 3.17, 3.18, and 3.19 it follows that the class of measurable subsets of Rn is closed under the set-theoretic operations

49

Lebesgue Measure and Outer Measure

of taking complements, countable unions, and countable intersections. Such a class of sets is called a σ-algebra; that is, a nonempty collection of subsets E of some universal set U is called a σ-algebra of sets if it satisfies the following two conditions: (i) CE ∈ if E ∈ . k Ek ∈ if Ek ∈ , k = 1, 2, . . . .

(ii)

Note that it follows from the relation C( k Ek ) = k CEk that k Ek ∈ if is a σ-algebra and Ek ∈ , k = 1, 2, . . . . Moreover, it is easy to see that if is a σ-algebra, then the universal set U and the empty set ∅ belong to . The following result is just a reformulation of Theorems 3.12 and 3.17.

Theorem 3.20

The collection of measurable subsets of Rn is a σ-algebra.

Note, for example, that if {Ek }∞ k=1 are measurable, then lim sup Ek and lim inf Ek are measurable since lim sup Ek =

∞ ∞

Ek

and

lim inf Ek =

j=1 k=j

∞

∞

Ek .

j=1 k=j

If C1 and C2 are two collections of sets, we say that C1 is contained in C2 if every set in C1 is also in C2 . If F is a family of σ-algebras , we define the collection of all sets E that belong to every in F . It is easy ∈F to be to check that ∈F is itself a σ-algebra and is contained in every in F . , consider the family F of all Given a collection C of subsets of Rn σ-algebras that contain C , and let E = ∈F . Then E is the smallest σ-algebra containing C ; that is, E is a σ-algebra containing C , and if is any other σ-algebra containing C , then contains E . The smallest σ-algebra of subsets of Rn containing all the open subsets of Rn is called the Borel σ-algebra B of Rn , and the sets in B are called Borel subsets of Rn . Sets of type Fσ , Gδ , Fσδ , Gδσ (see p. 6 in Section 1.3), etc., are Borel sets.

Theorem 3.21

Every Borel set is measurable.

Proof. Let M be the collection of measurable subsets of Rn . By Theorem 3.20, M is a σ-algebra. Since every open set belongs to M , and B is the smallest σ-algebra containing the open sets, B is contained in M . The converse of Theorem 3.21 is false: see Exercise 31.

50

Measure and Integral: An Introduction to Real Analysis

3.3 Two Properties of Lebesgue Measure The next two theorems give properties of Lebesgue measure that are of fundamental importance. To prove them, we first need the following characterization of measurability in terms of closed sets. Lemma 3.22 A set E in Rn is measurable if and only if given ε > 0, there exists a closed set F ⊂ E such that |E − F|e < ε. Proof. E is measurable if and only if CE is measurable, that is, if and only if given ε > 0, there exists an open G such that CE ⊂ G and |G − CE|e < ε. Such G exists if and only if the set F = CG is closed, F ⊂ E, and |E − F|e < ε (since G − CE = E − F). If {Ek } is a countable collection of disjoint measurable sets, then

Theorem 3.23

Ek = |Ek |. k

k

Proof. First, suppose each Ek is bounded. Given ε > 0 and k = 1, 2, . . . , use Lemma 3.22 to choose a closed Fk ⊂ Ek with |Ek − Fk | < ε2−k . Then |Ek | ≤ |Fk | + ε2−k by Theorem 3.12. Since the Ek are bounded and disjoint, m the F | = Fk are compact and disjoint. Therefore, by Lemma 3.16, | m k k=1 k=1 |Fk | m m of the F . The fact that } F ⊂ for every finite number {F k k k k Ek then k=1 k=1 |F | ≤ | E |. Hence, implies that m k k k=1 k Ek ≥ |Fk | ≥ (|Ek | − ε2−k ) = |Ek | − ε, k

k

k

k

so that | k Ek | ≥ k |Ek |. Since the opposite inequality is always true, the theorem follows in this case. For the general case, let Ij , j = 1, 2, . . . , be a sequence of intervals increasing to Rn , and define S1 = I1 and Sj = Ij − Ij−1 for j ≥ 2. Then the sets Sj , k, j = 1, 2, . . ., are bounded, disjoint, and measurable; Ek = Ek,j = Ek ∩ E and E = j k,j k k k,j Ek,j . Therefore, by the case already established, we have Ek = Ek,j = |Ek,j | = k

k,j

k,j

k

j

|Ek,j | =

k

|Ek |.

51

Lebesgue Measure and Outer Measure

Corollary 3.24 |Ik |.

If {Ik } is a sequence of nonoverlapping intervals, then |

Ik | =

◦ Proof. It is clearthat | I |Ik |. other k| ≤ On the hand, the (I k ) being dis◦ ◦ |(Ik ) | = |Ik |. Thus, | Ik | = |Ik |. joint, we have | Ik | ≥ | (Ik ) | = An alternate proof not using Theorem 3.23 can be obtained from Lemma 3.15. Corollary 3.25 Suppose E1 and E2 are measurable, E2 ⊂ E1 , and |E2 | < +∞. Then |E1 − E2 | = |E1 | − |E2 |. Proof. Since E1 = E2 ∪ (E1 − E2 ), Theorem 3.23 gives |E1 | = |E2 | + |E1 − E2 |. The corollary now follows from the assumption that |E2 | < +∞. The second basic property of Lebesgue measure concerns its behavior for a monotone sequence of sets.

Theorem 3.26

Let {Ek }∞ k=1 be a sequence of measurable sets.

(i) If Ek E, then limk→∞ |Ek | = |E|. (ii) If Ek E and |Ek | < +∞ for some k, then limk→∞ |Ek | = |E|. Proof. (i) We may assume that |Ek | < +∞ for all k; otherwise, both limk→∞ |Ek | and |E| are infinite. Write E = E1 ∪ (E2 − E1 ) ∪ · · · ∪ (Ek − Ek−1 ) ∪ · · · . Since the terms in this union are measurable and disjoint, we have by Theorem 3.23 that |E| = |E1 | + |E2 − E1 | + · · · + |Ek − Ek−1 | + · · · . By Corollary 3.25, |E| = |E1 | + (|E2 | − |E1 |) + · · · + (|Ek | − |Ek−1 |) + · · · = lim |Ek |. k→∞

(ii) We may clearly assume that |E1 | < +∞. Write E1 = E ∪ (E1 − E2 ) ∪ · · · ∪ (Ek − Ek+1 ) ∪ · · · .

52

Measure and Integral: An Introduction to Real Analysis

Since the terms on the right are disjoint measurable sets, and since each Ek has finite measure, we have |E1 | = |E| + (|E1 | − |E2 |) + · · · + (|Ek | − |Ek+1 |) + · · · = |E| + |E1 | − lim |Ek |. k→∞

Therefore, |E| = limk→∞ |Ek |, which completes the proof. The restriction in (ii) that |Ek | < +∞ for some k is necessary, as the following example shows. Let Ek be the complement of the ball with center 0 and radius k. Then |Ek | = +∞ for all k and Ek ∅, the empty set. Therefore, limk→∞ |Ek | = +∞, while |∅| = 0. Although we are interested almost exclusively in measurable sets, proving the measurability of a given set is occasionally difficult in practice, and it may be desirable to apply theorems about outer measure. A particularly useful result is the following modification of part (i) of Theorem 3.26. The corresponding modification of part (ii) of Theorem 3.26 does not generally hold; see Exercise 21.

Theorem 3.27

If Ek E, then limk→∞ |Ek |e = |E|e .

Proof. For each k, let Hk be a measurable set (e.g., a set type Gδ ) such that of ∞ 2, . . . , let V = Ek ⊂ Hk and |Hk | = |Ek |e . For m = 1, m k=m Hk . Since the Vm are measurable and increase to V = Vm , it follows from Theorem 3.26 that limm→∞ |Vm | = |V|. Since Em ⊂ Vm ⊂ Hm , we have |Em |e ≤ |Vm | ≤|Hm | = Vm ⊃ |E m |e . Hence, |Vm | = |Em |e and limm→∞ |Em |e = |V|. However, V = Em = E, and therefore, limm→∞ |Em |e ≥ |E|e . The opposite inequality is obvious since Em ⊂ E, and the theorem follows.

3.4 Characterizations of Measurability Lemma 3.22 characterizes measurability in terms of closed subsets of a set. The next three theorems give some other characterizations. The first one states that the most general measurable set differs from a Borel set by a set of measure zero.

Theorem 3.28 (i) E is measurable if and only if E = H − Z, where H is of type Gδ and |Z| = 0. (ii) E is measurable if and only if E = H ∪ Z, where H is of type Fσ and |Z| = 0.

Lebesgue Measure and Outer Measure

53

Proof. If E has the representation expressed in either (i) or (ii), it is measurable since H and Z are. Conversely, to prove the necessity in (i), suppose that E is measurable. For each k = 1, 2, . . . , choose an open set Gk such that E ⊂ Gk and |Gk − E| < 1/k. Set H = k Gk . Then H is of type Gδ , E ⊂ H, and H − E ⊂ Gk − E for every k. Hence, |H − E| = 0, and (i) is proved. The necessity of (ii) follows from that of (i) by taking complements: if E is measurable, so is CE, andtherefore CE = Gk − Z, where the Gk are open and |Z| = 0. Hence, E = ( CGk ) ∪ Z, which completes the proof. Theorem 3.29 Suppose that |E|e < +∞. Then E is measurable if and only if given ε > 0, E = (S ∪ N1 )−N2 , where S is a finite union of nonoverlapping intervals and |N1 |e , |N2 |e < ε. The proof is left as an exercise. Our final characterization of measurability states that the measurable sets are those that split every set (measurable or not) into pieces that are additive with respect to outer measure. This characterization will be used in Chapter 11 to construct measures in abstract spaces.

Theorem 3.30 (Carathéodory) A set E is measurable if and only if for every set A |A|e = |A ∩ E|e + |A − E|e . Proof. Suppose that E is measurable. Given A, let H be a set of type Gδ such that A ⊂ H and |A|e = |H|. Since H = (H ∩ E) ∪ (H − E), and since H ∩ E and H − E are measurable and disjoint, |H| = |H ∩ E| + |H − E|. Therefore, |A|e = |H ∩ E| + |H − E| ≥ |A ∩ E|e + |A − E|e . Since the opposite inequality is clearly true, we obtain |A|e = |A ∩ E|e + |A − E|e . Conversely, suppose that E satisfies the stated condition for every A. In case |E|e < +∞, choose a Gδ set H such that E ⊂ H and |H| = |E|e . Then H = E ∪ (H − E), and by hypothesis, |H| = |E|e + |H − E|e . Therefore, |H − E|e = 0; so the set Z = H − E is measurable, and consequently, E is measurable. 2, . . ., In case |E|e = +∞, let Bk be the ball with center 0 and radius k, k = 1, Ek . and let Ek = E ∩ Bk . Then each Ek has finite outer measure and E = Let Hk be a set of type Gδ containing Ek with |Hk | = |Ek |e . By hypothesis, |Hk − E| = 0. |Hk | = |H k ∩ E|e + |Hk − E|e ≥ |Ek |e + |Hk − E|e . Therefore, Let H = Hk . Then H is measurable, E ⊂ H, and H − E = (Hk − E). In particular, H−E has measure zero, and since E = H−(H−E), E is measurable. This completes the proof.

54

Measure and Integral: An Introduction to Real Analysis

As a special case of Carathéodory’s theorem, we obtain the following result. Corollary 3.31 If E is a measurable subset of a set A, then |A|e = |E| + |A − E|e . Hence, if |E| < +∞, |A − E|e = |A|e − |E|. We can now prove a stronger version of Theorem 3.8. Theorem 3.32 Let E be a subset of Rn . There exists a set H of type Gδ such that E ⊂ H and for any measurable set M, |E ∩ M|e = |H ∩ M| . If M = Rn , this reduces to Theorem 3.8. Proof. Consider first the case when |E|e < +∞. Let H be a set of type Gδ such that E ⊂ H and |E|e = |H|. If M is measurable, then by Carathéodory’s theorem |E|e = |E ∩ M|e + |E − M|e and |H| = |H ∩ M| + |H − M|. Therefore, |E ∩ M|e + |E − M|e = |H ∩ M| + |H − M|. Since all these terms are finite, and since the inclusion E − M ⊂ H − M implies that |E − M|e ≤ |H − M|, we have |E ∩ M|e ≥ |H ∩ M|. The opposite inequality is also true since E ∩ M ⊂ H ∩ M, and the theorem follows in this case. Ek with |Ek |e < +∞ and Ek E. For In case |E|e = +∞, we write E = example, Ek could be the intersection of E with the ball of radius k centered at the origin. By the case already considered, there is a set Uk of type Gδ such Hk = that ∞ Ek ⊂ Uk and |Ek ∩ M|e = |Uk ∩ M| for any measurable M. Let U . Then H is measurable (in fact, H is of type G ), H H = Hk , m δ k k k m=k and Ek ⊂ Hk ⊂ Uk . Hence, Ek ∩ M ⊂ Hk ∩ M ⊂ Uk ∩ M, and therefore, |Ek ∩ M|e = |Hk ∩ M| for measurable M. Since Ek E and Hk H, we have Ek ∩ M E ∩ M and Hk ∩ M H ∩ M. By Theorem 3.27, |E ∩ M|e = |H ∩ M| for measurable M. Note that our set H is of type Gδσ . To obtain a set of type Gδ , use Theorem 3.28 to write H = H1 − Z, where H1 is of type Gδ and |Z| = 0. Then E ⊂ H1 . Moreover, H1 ∩ M = (H ∩ M) ∪ (Z ∩ M), and since |Z| = 0, we have |H1 ∩ M| = |H ∩ M|, so that |E ∩ M|e = |H1 ∩ M|. This completes the proof.

3.5 Lipschitz Transformations of Rn Theorem 3.10 shows that the notion of outer measure is independent of the orientation of the coordinate axes. Since measurability and measure are

55

Lebesgue Measure and Outer Measure

defined in terms of outer measure, it follows that these too are independent of rotation of the axes. We wish to study the effect of other transformations of Rn on the class of measurable sets; that is, we seek a condition on a transformation T: Rn → Rn such that the image TE = {y : y = Tx, x ∈ E} of every measurable set E is measurable. We note that a continuous transformation may not preserve measurability: see Exercise 17 of this chapter, as well as Chapter 7, Exercise 10. A transformation y = Tx of Rn into itself is called a Lipschitz transformation if there is a constant c such that |Tx − Tx | ≤ c|x − x |. The smallest such constant c, namely, the number c = sup

x,x ;x=x

|Tx − Tx | , |x − x |

is called the Lipschitz constant of T. If yj = fj (x), j = 1, . . . , n, are the coordinate functions representing T (see p. 12 in Section 1.7), it follows that T is Lipschitz if and only if each fj satisfies a Lipschitz condition |fj (x)−fj (x )| ≤ cj |x−x |. For example, a linear transformation of Rn is clearly a Lipschitz transformation; see Exercise 29. More generally, a mapping yj = fj (x), j = 1, . . . , n, for which each fj has bounded first partial derivatives in Rn is a Lipschitz mapping. Theorem 3.33 If y = Tx is a Lipschitz transformation of Rn , then T maps measurable sets into measurable sets. Proof. We will first show that a continuous transformation sends sets of type Fσ into sets of type Fσ . A continuous T maps compact sets into compact sets by Theorem 1.17; therefore, since any closed set can be written as a countable union of compact sets, closed sets into sets of type Fσ . Here, we T maps TEk , which holds for any T and {Ek }. It have used the relation T Ek = follows that a continuous T preserves the class of Fσ sets. We will next show that a Lipschitz transformation T sends sets of measure zero into sets of measure zero. Since |Tx − Tx | ≤ c|x − x |, the image of a set with diameter d has diameter at most cd. It follows (see Exercise 28) by covering with cubes that there is a constant c such that |TI| ≤ c |I| for any interval I; note that TI is measurable since I is closed. If |Z| = 0 and ε > 0, } covering Z such that |I | < ε. Since TZ ⊂ TIk , we choose intervals {I k k have |TZ|e ≤ |TIk | ≤ c |Ik | < c ε. Hence, |TZ| = 0. If E is a measurable set, we use Theorem 3.28 to write E = H ∪ Z, where H is of type Fσ and |Z| = 0. Since TE = TH ∪ TZ, TE is measurable as the union of measurable sets. This completes the proof.

56

Measure and Integral: An Introduction to Real Analysis

For an extension of Theorem 3.33 in case n = 1, see Exercise 10 of Chapter 7. In the special case that T is a linear transformation of Rn , we will derive a formula for the measure of TE. By elementary linear algebra, such T has a unique n × n matrix representation with respect to every given basis of Rn , and the value of the determinant of every matrix representation of T is the same. The common value is denoted det T and called the determinant of T. For the sake of definiteness, we may think of T as identified with its matrix representation with respect to the standard orthonormal basis of unit vectors along the coordinate axes. Then Tx is the vector resulting from the matrix action of T on x. If P is a parallelepiped (see p. 7 in Section 1.3), arguments like those used in proving Theorems 3.2 and 3.10 show that |P| = v(P)

(3.34)

(see Exercise 16). Hence, by p. 7 in Section 1.3, |P| is the absolute value of the n × n determinant whose rows are the edges of P. Theorem 3.35 Let T be a linear transformation of Rn , and let E be measurable. Then |TE| = δ|E|, where δ is the absolute value of the determinant of T.∗ Proof. Let δ = |det T|. Then for any interval I, |TI| = δ|I| by p. 8 in Section 1.3. n intervals {Ik } covering If E is any ε > 0, choose measurable subset of R and E with |Ik | ≤ |E| + ε. Then |TE| ≤ |TIk | = δ |Ik | ≤ δ(|E| + ε). Note here that if δ= 0, then |TIk | = 0 for every k and consequently the first inequality |TE| ≤ |TIk | yields |TE| = 0 even when |E| = ∞. Therefore, |TE| ≤ δ|E|,

(3.36)

where we interpret 0 · ∞ as 0. To see that |TE| = δ|E|, we may assume that δ > 0 by (3.36). Then T has an inverse T−1 that is also linear, and E = T−1 (TE). Therefore, by (3.36) applied to T−1 and the set TE, |E| ≤ |det (T−1 )| |TE| = δ−1 |TE|, or |TE| ≥ δ|E|. Hence, by (3.36), δ|E| = |TE|. We leave it as an exercise to show that |TE|e = δ|E|e for any set E.

∗ Here 0 · ∞ should be interpreted as 0.

Lebesgue Measure and Outer Measure

57

The following special case of Theorem 3.35 deserves mention: if E is a measurable set in Rn and λ is a real number, then the set λE defined by λE = {λx : x ∈ E} is measurable with measure |λE| = |λ|n |E|. In fact, λE is the image of E under the matrix λI, where I is the identity matrix on Rn .

3.6 A Nonmeasurable Set We will now construct a nonmeasurable subset of R1 ; the construction in Rn , n > 1, is similar. We will need the axiom of choice in the following form. Zermelo’s Axiom: Consider a family of arbitrary nonempty disjoint sets indexed by a set A, {Eα : α ∈ A}. Then there exists a set consisting of exactly one element from each Eα , α ∈ A. We also need the following lemma. Lemma 3.37 Let E be a measurable subset of R1 with |E| > 0. Then the set of differences {d : d = x − y, x ∈ E, y ∈ E} contains an interval centered at the origin. Proof. Given ε > 0, since |E| > 0, there exists an open set G such that E ⊂ G and |G| ≤ (1 + ε)|E|. By Theorem 1.11, G can be written as a union of . Letting Ek = E ∩ Ik , it follows that E = nonoverlapping intervals, G = I k that two different Ek , that the Ek are measurable, and Ek ’s have at most one |Ek |. Since |G| ≤ (1 + point in common. Therefore, |G| = |Ik | and |E| = ε)|E|, we must have |Ik0 | ≤ (1+ε)|Ek0 | for some k0 . Choosing ε = 13 and letting I and E denote the sets Ik0 and Ek0 , respectively, we have E ⊂ I and |E | ≥ 34 |I|. We claim that if E is translated by any number d satisfying |d| < 12 |I|, the translated set Ed has points in common with E . Otherwise, since E ∪ Ed is contained in an interval of length |I| + |d|, we would have |E | + |Ed | ≤ |I| + |d|, or 2|E | ≤ |I| + |d|. [Here, we have used the fact that Ed is measurable and |Ed | = |E | (see Exercise 18).] However, the last inequality is false if |d| < 12 |I| since |E | ≥ 34 |I|. This proves the claim and thus the lemma. See Exercise 30 in Chapter 7 for a related fact.

Theorem 3.38 (Vitali) There exist nonmeasurable sets. Proof. We define an equivalence relation on the real line by saying that x and y are equivalent if x−y is rational. The equivalence classes then have the form Ex = {x+r : r is rational}. Two classes Ex and Ey are either identical or disjoint;

58

Measure and Integral: An Introduction to Real Analysis

therefore, one equivalence class consists of all the rational numbers, and the other distinct classes are disjoint sets of irrational numbers. The number of distinct equivalence classes is uncountable since each class is countable but the union of all the classes is uncountable (this union being the entire line). Using Zermelo’s axiom, let E be a set consisting of exactly one element from each distinct equivalence class. Since any two points of E must differ by an irrational number, the set {d : d = x − y, x ∈ E, y ∈ E} cannot contain an interval. By Lemma 3.37, it follows that either E is not measurable or |E| = 0. Since the union of the translates of E by every rational number is all of R1 , R1 would have measure zero if E did. We conclude that E is not measurable.

Corollary 3.39 measurable set.

Any set in R1 with positive outer measure contains a non-

Proof. Let A satisfy |A|e > 0, and let E be the nonmeasurable set of Theoof E by r. Thenthe Er are rem 3.38. Forrational r, let Er denote the translate disjoint and Er = (−∞, +∞). Thus, A = (A ∩ Er ) and |A|e ≤ |A ∩ Er |e . If A ∩ Er is measurable, then |A ∩ Er | = 0 by Lemma 3.37, using the fact that for every r, {x − y : x, y ∈ A ∩ Er } is a subset of {x − y : x, y ∈ E} and so contains no interval. Since |A|e > 0, it follows that there is some r such that A ∩ Er is not measurable. This completes the proof.

Exercises 1. There is an analogue for bases different from 10 of the usual decimal expansion of a number. If b is an integer larger than 1 and 0 ≤ x ≤ 1, show that there exist integral coefficients ck , 0 ≤ ck < b, such that x = ∞ −k −k k=1 ck b . Show that this expansion is unique unless x = cb , c = k 1, . . . , b − 1, in which case there are two expansions. 2. When b = 3 in Exercise 1, the expansion is called the triadic or ternary expansion of x. (a) Show that the Cantor set C consists of all x such that x has some triadic expansion for which every ck is either 0 or 2. (b) Let f (x) be the Cantor–Lebesgue see p. 43 in Section 3.1. function: Show that if x ∈ C and x = ck 3−k , where each ck is either 0 or 2, then f (x) = ( 12 ck )2−k . 3. Construct a two-dimensional Cantor set in the unit square {(x, y) : 0 ≤ x, y ≤ 1} as follows. Subdivide the square into nine equal parts and keep only the four closed corner squares, removing the remaining region (which forms a cross). Then repeat this process in a suitably scaled

Lebesgue Measure and Outer Measure

59

version for the remaining squares, ad infinitum. Show that the resulting set is perfect, has plane measure zero, and equals C × C. 4. Construct a subset of [0, 1] in the same manner as the Cantor set by removing from each remaining interval a subinterval of relative length θ, 0 < θ < 1. Show that the resulting set is perfect and has measure zero. 5. Construct a subset of [0, 1] in the same manner as the Cantor set, except that at the kth stage, each interval removed has length δ3−k , 0 < δ < 1. Show that the resulting set is perfect, has measure 1 − δ, and contains no intervals. 6. Construct a Cantor-type subset of [0, 1] by removing from each interval < θk < 1. remaining at the kth stage a subinterval of relative length θk , 0 Show that the remainder has measure zero if and only if θk = ∞ > 0, a converges, in the sense that +∞. (Use the fact that for a k k=1 k ∞ a is finite and not zero, if and only if log a conlimN→∞ N k k=1 k=1 k verges.) 7. Prove Lemma 3.15. 8. Show that the Borel σ-algebra B in Rn is the smallest σ-algebra containing the closed sets in Rn . 9. If {Ek }∞ |Ek |e < +∞, show that lim sup Ek k=1 is a sequence of sets with (and so also lim inf Ek ) has measure zero. 10. If E1 and E2 are measurable, show that |E1 ∪ E2 | + |E1 ∩ E2 | = |E1 | + |E2 |. 11. Prove Theorem 3.29. (For the sufficiency, pick open sets G and G1 with S ⊂ G, N1 ⊂ G1 , |G − S| < ε, and |G1 | < |N1 |e + ε < 2ε. Estimate |(G ∪ G1 ) − E|e .) 12. If E1 and E2 are measurable subsets of R1 , show that E1 × E2 is a measurable subset of R2 and |E1 × E2 | = |E1 ||E2 |. (Interpret 0 · ∞ as 0.) (Hint: Use a characterization of measurability.) 13. Motivated by (3.7), define the inner measure of E by |E|i = sup |F|, where the supremum is taken over all closed subsets F of E. Show that (i) |E|i ≤ |E|e , and (ii) if |E|e < +∞, then E is measurable if and only if |E|i = |E|e . (Use Lemma 3.22.) 14. Show that the conclusion of part (ii) of Exercise 13 is false if |E|e = +∞. 15. If E is measurable and A is any subset of E, show that |E| = |A|i + |E − A|e . (See Exercise 13 for the definition of |A|i .) As a consequence, using Exercise 13, show that if A ⊂ [0, 1] and |A|e + |[0, 1] − A|e = 1, then A is measurable. 16. Prove (3.34). 17. Give an example that shows that the image of a measurable set under a continuous transformation may not be measurable. (Consider the Cantor–Lebesgue function and the pre-image of an appropriate nonmeasurable subset of its range.) See also Exercise 10 of Chapter 7.

60

Measure and Integral: An Introduction to Real Analysis

18. Prove that outer measure is translation invariant; that is, if Eh = {x + h : x ∈ E} is the translate of E by h, h ∈ Rn , show that |Eh |e = |E|e . If E is measurable, show that Eh is also measurable. (This fact was used in proving Lemma 3.37.) 19. Carry out the details of the construction of a nonmeasurable subset of Rn , n > 1. 20. Show that there exist disjoint E1 , E2 , . . . , Ek , . . . such that | Ek |e < |Ek |e with strict inequality. (Let E be a nonmeasurable subset of [0, 1] whose rational translates are disjoint. Consider the translates of E by all rational numbers r, 0 < r < 1, and use Exercise 18.) 21. Show that there exist sets E1 , E2 , . . . , Ek , . . . such that Ek E, |Ek |e < +∞, and limk→∞ |Ek |e > |E|e with strict inequality. 22. (a) Show that the outer measure of a set is unchanged if in the definition of outer measure we use coverings of the set by cubes with edges parallel to the coordinate axes instead of coverings by intervals. (b) Show that outer measure is also unchanged if coverings by parallelepipeds with a fixed orientation (i.e., with edges parallel to a fixed set of n linearly independent vectors) are used rather than coverings by intervals. 23. Let Z be a subset of R1 with measure zero. Show that the set {x2 : x ∈ Z) also has measure zero. 24. Let 0.α1 α2 . . . be the dyadic development of any x in [0, 1], that is, x = α1 2−1 + α2 2−2 + · · · with αi = 0, 1. Let k1 , k2 , . . . be a fixed permutation of the positive integers 1, 2, . . . , and consider the transformation T that sends x = 0.α1 α2 · · · to Tx = 0.αk1 αk2 . . .. If E is a measurable subset of [0, 1], show that its image TE is also measurable and that |TE| = |E|. (Consider first the special cases of E a dyadic interval [s2−k , (s + 1)2−k ], s = 0, 1, . . . , 2k −1, and then of E an open set [which is a countable union of nonoverlapping dyadic intervals]. Also show that if E has small measure, then so has TE.) 25. Construct a measurable subset E of [0, 1] such that for every subinterval I, both E ∩ I and I − E have positive measure. (Take a Cantor-type subset of [0, 1] with positive measure [see Exercise 5], and on each subinterval of the complement of this set, construct another such set, and so on. The measures can be arranged so that the union of all the sets has the desired property.) See also Exercise 21 of Chapter 4. 26. Construct a continuous function on [0, 1], which is not of bounded variation on any subinterval. (The construction follows the pattern of the Cantor–Lebesgue function with some modifications. At the first stage, for example, make the corresponding function increase to 2/3 (rather than 1/2) in (0, 1/3), then make it decrease by 1/3 in (1/3, 2/3), and then increase again 2/3 in (2/3, 1). The construction at other stages is

Lebesgue Measure and Outer Measure

61

similar, depending on whether the preceding function was increasing or decreasing in the subinterval under consideration.) 27. Construct a continuous function of bounded variation on [0, 1] which is not monotone in any subinterval. (The construction is like that in the preceding exercise, except that the approximating functions are less steep. For example, at the first stage, let the function increase to 1/2 + ε, then decrease by 2ε, and then increase again by 1/2 + ε. Choose the ε’s at each stage so that their sum converges.) 28. Prove the following assertion that is made in the proof of Theorem 3.33: If T : Rn → Rn is a Lipschitz transformation, then there is a constant c > 0 such that |TI| ≤ c |I| for every interval I. (Consider first the case when I is a cube Q, noting that TQ is contained in a cube with edge length c diam Q where c is the Lipschitz constant of T. The case of general I can then be deduced by applying Theorem 1.11 to the interiors J◦ of intervals J with I ⊂ J◦ .) 29. Let T : Rn → Rn be a linear transformation. 2 1/2 , show that (a) If T has matrix representation (tij ) and t = i,j tij n |Tx − Ty| ≤ t|x − y| for all x, y ∈ R . (Use (1.2).) The number t is called the Hilbert–Schmidt norm of (tij ). (b) Prove the fact mentioned at the end of the proof of Theorem 3.35 that |TE|e = δ|E|e for every E ⊂ Rn , where δ = | det T|. 30. Let f : Rn → R1 be continuous. Show that the inverse image f −1 (B) of a Borel set B is a Borel set; see p. 64 in Section 4.1 for the definition of the inverse image of a set. (The collection of sets {E : f −1 (E) is a Borel set} is a σ-algebra and contains all open sets in R1 ; cf. Exercise 10 of Chapter 4 and Corollary 4.15.) See also Exercise 22 of Chapter 4. 31. Construct a Lebesgue measurable set that is not Borel measurable. (If f is the Cantor–Lebesgue function, then the function g(x) = x + f (x) is strictly increasing and continuous on [0, 1]. Consider the set g−1 (E) for an appropriate E ⊂ g(C), where C is the Cantor set.) 32. Let E be a set in Rn with |E|e > 0 and let θ satisfy 0 < θ < 1. Show that there is a set Eθ ⊂ E with |Eθ |e = θ|E|e and that Eθ can be chosen to be measurable if E is measurable. (If Q(r) denotes the cube with edge length r centered at the origin, 0 < r < ∞, then |E ∩ Q(r)|e is a continuous monotone function of r.) 33. Let E be a measurable set with 0 < |E| ≤ ∞. Show that there are infinite collections {Aj }, {Bj } of measurable subsets of E with the following properties: 0 < |Aj |, |Bj | < ∞, Aj ∩ Al = ∅ if j = l, Bj+1 ⊂ Bj , and |Bj | → 0. See the end of Section 8.5 for an application. (This can be proved in many ways, for example, by using the Exercise 32.) 34. Let E and Z be sets in Rn and |Z| = 0. If E ∪ Z is measurable, show that E is measurable and that |E| = |E ∪ Z|. See Exercise 2 of Chapter 10.

4 Lebesgue Measurable Functions We will use the concept of Lebesgue measure to introduce a rich class of functions and a method of integrating these functions. In this chapter, we describe the class of functions. Let E be a measurable set in Rn . Let f be a real-valued (in the usual extended sense) function defined on E, that is, −∞ ≤ f (x) ≤ +∞, x ∈ E. Then, f is called a Lebesgue measurable function on E, or simply a measurable function, if for every finite a, the set {x ∈ E : f (x) > a} is a measurable subset of Rn . In what follows, we shall often use the abbreviation {f > a} for {x ∈ E : f (x) > a}. Note that the definition of measurability of a function on a set E presupposes that E is measurable. Since E = { f = −∞} ∪

∞

{ f > −k} ,

k=1

the measurability of E implies that of { f = − ∞} if we assume that f is measurable. As a varies, the behavior of the sets { f > a} describes how the values of f are distributed. Intuitively, it is clear that the smoother f is, the smaller the variety of such sets will be. For example, if E = Rn and f is continuous in Rn , then { f > a} is always open. A function f defined on a Borel set E is said to be Borel measurable if { f > a} is a Borel set for every a. Thus, every Borel measurable function is measurable. See also Exercise 24. One further comment will be helpful later. Let M denote the class of measurable subsets of Rn . Much of the development of measurable functions given in this chapter depends only on the σ-algebra structure of M and the properties of Lebesgue measure. Thus, a measurable function inherits its elementary properties from those of measurable sets. It is reasonable to expect, therefore, that many of the methods of this chapter can be used to develop results in more general settings, for spaces other than Rn and σ-algebras other than M . This will be done in Chapter 10. To save too much repetition there, it will be helpful if the reader notices which properties of M and Lebesgue measure are used in the proofs here. These will be discussed at the end of the chapter. 63

64

Measure and Integral: An Introduction to Real Analysis

4.1 Elementary Properties of Measurable Functions Theorem 4.1 Let f be a real-valued function defined on a measurable set E. Then f is measurable if and only if any of the following statements holds for every finite a: (i) { f ≥ a} is measurable. (ii) { f < a} is measurable. (iii) { f ≤ a} is measurable. Proof. Since { f ≥ a} = ∞ k=1 { f > a − 1/k}, the measurability of f implies (i). Since { f a} is the complement of { f ≤ a}, it follows that f is measurable if (iii) holds. The proof of the following is left as an exercise.

Corollary 4.2 Let f be defined on a measurable set E. If f is measurable, then { f > −∞}, { f < +∞}, { f = +∞}, {a ≤ f ≤ b}, { f = a}, etc., are all measurable. Moreover, for any f, if either { f = +∞} or { f = −∞} is measurable, then f is measurable if {a < f < +∞} is measurable for every finite a. Also, observe that {a < f ≤ b} = { f > a} − { f > b}. Let f be defined in E. If S is a subset of R1 , the inverse image of S under f is defined by f −1 (S) = {x ∈ E : f (x) ∈ S}.

Theorem 4.3 Let f be defined on a measurable set E. If f is measurable, then for every open set G in R1 , the inverse image f −l (G) is a measurable subset of Rn . Conversely, f is measurable if f −1 (G) is measurable for every open set G in Rn and either { f = +∞} or { f = −∞} is measurable. 1 Proof. Suppose that f is measurable and let G be any open−1subset of R . By Theorem 1.10, G can be written as G = k (ak , bk ). But f ((a k , bk )) equals {ak < f < bk } and is therefore measurable. Since f −1 (G) = k f −1 ((ak , bk )), it follows that f −1 (G) is measurable too. To prove the converse, note that if G = (a, +∞), then f −1 (G) = {a < f < +∞} and apply the second part of Corollary 4.2.

Lebesgue Measurable Functions

65

This result shows that a finite function f defined on a measurable set is measurable if and only if f −1 (G) is measurable for every open G ⊂ R1 . Similarly, a finite f defined on a Borel set is Borel measurable if and only if f −1 (G) is Borel measurable for every open G ⊂ R1 . Theorem 4.4 Let A be a dense subset of R1 . Then f is measurable if { f > a} is measurable for all a ∈ A. Proof. Given any real a, choose a sequence {ak } in A that converges to a from above: ak ∈ A, ak ≥ a, limk→∞ ak = a. Then { f > a} = { f > ak }, and the theorem follows. A property is said to hold almost everywhere in E or, in abbreviated form, a.e., if it holds in E except in some subset of E with measure zero. For example, the statement “f = 0 a.e. in E” means that f (x) = 0 in E, with the possible exception of those x in some subset Z of E with |Z| = 0. The next few theorems give some simple properties of the class of measurable functions. Theorem 4.5 If f is measurable and if g = f a.e., then g is measurable and |{g > a}| = |{ f > a}|. Proof. If Z = { f = g}, then |Z| = 0 and {g > a} ∪ Z = { f > a} ∪ Z. Therefore, f being measurable, {g > a}∪Z is measurable, and since this differs from {g > a} by a set of measure zero, g is measurable (cf. Exercise 34 of Chapter 3 and Exercise 2 of Chapter 10). Finally, |{g > a}| = |{g > a} ∪ Z| = |{ f > a} ∪ Z| = |{ f > a}|. In view of the previous theorem, it is natural to extend the definition of measurability to include functions that are defined only a.e. in E, by saying that such an f is measurable on E if it is measurable on the subset of E where it is defined. Note also that if f is measurable on E, then it is measurable on any measurable E1 ⊂ E since {x ∈ E1 : f (x) > a} = {x ∈ E : f (x) > a} ∩ E1 . If φ and f are finite measurable functions defined on R1 and Rn , respectively, their composition φ(f (x)) may not be measurable (see Exercise 5). If φ is continuous, however, we have the following result. Theorem 4.6 Let φ be continuous on R1 and let f be finite a.e. in E, so that, in particular, φ(f ) is defined a.e. in E. Then φ(f ) is measurable if f is.

66

Measure and Integral: An Introduction to Real Analysis

Proof. We may assume that f is finite everywhere in E. We will use the familiar fact that since φ is continuous, the inverse image φ−1 (G) of an open set G is open (cf. Corollary 4.15 and Exercise 10). By Theorem 4.3, it is enough to show that for every open G in R1 , {x : φ(f (x)) ∈ G} is measurable. However, {x : φ(f (x)) ∈ G} = f −1 (φ−1 (G)), and since φ−1 (G) is open and f is measurable, f −1 (φ−1 (G)) is measurable by Theorem 4.3. See also Exercise 22(b). Remark: The cases that arise most frequently are φ(t) = |t|, |t|p (p > 0), ect , etc. Thus, |f |, |f |p (p > 0), ecf are measurable if f is measurable (even if we do not assume that f is finite a.e., as is easily seen). Another special case worth mentioning is that of f + = max{ f , 0},

f − = −min{ f , 0}.

It is enough to observe that the functions x+ and x− are continuous.

Theorem 4.7

If f and g are measurable, then { f > g} is measurable.

Proof. Let {rk } be the rational numbers. Then, { f > g} =

k

{ f > rk > g} =

({f > rk } ∩ {g < rk }),

k

and the theorem follows.

Theorem 4.8 measurable.

If f is measurable and λ is any real number, then f + λ and λf are

The proof is left as an exercise. We interpret 0 · ±∞ to be 0. Note that the sum f + g of two functions f and g is well defined wherever it is not of the form +∞ + (−∞) or −∞ + ∞. In the next theorem, we assume for simplicity that f + g is well defined everywhere. See Exercise 6 for extensions.

Theorem 4.9

If f and g are measurable, so is f + g.

67

Lebesgue Measurable Functions

Proof. Since g is measurable, so is a − g for any real a, by Theorem 4.8. Since { f + g > a} = { f > a − g}, the result follows from Theorem 4.7. A corollary of Theorems 4.8 and 4.9 is that a finite linear combination λ1 f1 + · · · + λN fN of measurable functions f1 , . . . , fN is measurable provided it is well defined. Thus, the class of measurable functions on a set E that are finite a.e. in E forms a vector space; here, we identify measurable functions that are equal a.e. In the theorem that follows, we consider products of functions. In addition to the familiar conventions about products of infinities, we adopt as usual the convention that 0 · ±∞ = ±∞ · 0 = 0. Also, if −∞ ≤ α ≤ +∞, we interpret α/(±∞) = α · 0 = 0.

Theorem 4.10 measurable.

If f and g are measurable, so is fg. If g = 0 a.e., then f /g is

Proof. By Theorem 4.6 and the remark following it, f 2 (= |f |2 ) is measurable if f is. Hence, if f and g are measurable and finite, the formula fg = [(f + g)2 − (f −g)2 ]/4 implies that fg is measurable. The proof when f and g can be infinite and the proof of the second statement of the theorem are left as exercises. Theorem 4.11 If { fk (x)}∞ k=1 is a sequence of measurable functions, then supk fk (x) and inf k fk (x) are measurable. Proof. Since inf k fk = −supk (−fk ), it is enough to prove the result for supk fk . But this follows easily from the fact that {supk fk > a} = k { fk > a}. As a special case of the preceding theorem, we see that if f1 , . . . , fN are measurable, then so are maxk fk and mink fk . In particular, if f is measurable, then so are f + = max{ f , 0} and f − = −min{ f , 0}, a fact we have already observed in the remark following Theorem 4.6.

Theorem 4.12

If { fk } is a sequence of measurable functions, then lim sup fk and k→∞

lim inf fk are measurable. In particular, if lim fk exists a.e., it is measurable. k→∞

k→∞

Proof. Since lim sup fk = inf {sup fk }, k→∞

j

k≥j

lim inf fk = sup{inf fk }, k→∞

j

k≥j

68

Measure and Integral: An Introduction to Real Analysis

the first statement is a corollary of Theorem 4.11. The second statement then follows by Theorem 4.5 since wherever limk→∞ fk exists, it equals lim supk→∞ fk . The characteristic function, or indicator function, χA (x), of a set A is defined by χA (x) =

1 if x ∈ A 0 if x ∈ / A.

Clearly, χA is measurable if and only if A is measurable. χA is an example of what is called a simple function on Rn : a simple function on a set E ⊂ Rn is one that is defined on E and assumes only a finite number of values on E, all of which are finite. If f is a simple function on E taking (distinct) values a1 , . . . , aN on (disjoint) subsets E1 , . . . , EN of E, E = N k=1 Ek , then f (x) =

N

ak χEk (x),

x ∈ E.

k=1

We leave it as an exercise to show that such an f is measurable if and only if E1 , . . . , EN are measurable. Theorem 4.13 (i) Every function f can be written as the limit of a sequence { fk } of simple functions. (ii) If f ≥ 0, the sequence can be chosen to increase to f , that is, chosen such that fk ≤ fk+1 for every k. (iii) If the function f in either (i) or (ii) is measurable, then the fk can be chosen to be measurable. Proof. We will prove (ii) first. Thus, suppose that f ≥ 0. For each k, k = 1, 2, . . ., subdivide the values of f that fall in [0, k] by partitioning [0, k] into subintervals [(j − 1)2−k , j2−k ], j = 1, . . . , k2k . Let ⎧ j−1 j ⎨j − 1 if ≤ f (x) < k , j = 1, . . . , k2k fk (x) = 2k 2k 2 ⎩k if f (x) ≥ k. Each fk is a simple function defined everywhere in the domain of f . Clearly, fk ≤ fk+1 since in passing from fk to fk+1 , each subinterval [(j − 1)2−k , j2−k ] is divided in half. Moreover, fk → f since 0 ≤ f − fk ≤ 2−k for sufficiently large k wherever f is finite, and fk = k → +∞ wherever f = +∞. This proves (ii).

69

Lebesgue Measurable Functions

To prove (i), apply the result of (ii) to each of the nonnegative functions f + and f − , obtaining increasing sequences { fk } and { fk } of simple functions such that fk → f + and fk → f − . Then fk − fk is simple and fk − fk → f + − f − = f . Finally, it is enough to prove (iii) for f ≥ 0 since otherwise we may consider f + and f − . In this case, however, k

fk =

k2 j−1 j=1

2k

χ {( j−1)/2k ≤ f < j/2k } + kχ { f ≥ k} .

This is measurable if f is since all the sets involved are measurable. Note that if f is bounded, the simple functions earlier will converge uniformly to f .

4.2 Semicontinuous Functions We now study classes of functions f whose continuity properties on a set can be characterized by the topological nature of { f > a} or { f < a}. Measurability of such functions will consequently be easy to establish. We will encounter a particularly important example when we study the Hardy–Littlewood maximal function in Chapter 7. Let f be defined on E, and let x0 be a limit point of E that lies in E. Then f is said to be upper semicontinuous at x0 if lim sup f (x) ≤ f (x0 ).

x→x0 ;x∈E

We will usually abbreviate this by saying that f is usc at x0 . Note that if f (x0 ) = +∞, then f is automatically usc at x0 ; otherwise, the statement that f is usc at x0 means that given M > f (x0 ), there exists δ > 0 such that f (x) < M for all x ∈ E that lie in the ball |x − x0 | < δ. Intuitively, this means that near x0 , the values of f do not exceed f (x0 ) by a fixed amount. Similarly, f is said to be lower semicontinuous at x0 , or lsc at x0 , if lim inf f (x) ≥ f (x0 ).

x→x0 ;x∈E

Thus, if f (x0 ) = −∞, f is lsc at x0 , while if f (x0 ) > −∞, the definition amounts to saying that given m < f (x0 ), there exists δ > 0 such that f (x) > m if x ∈ E and |x − x0 | < δ. Equivalently, f is lsc at x0 if and only if −f is usc at x0 . It follows that f is continuous at x0 if and only if |f (x0 )| < +∞ and f is both usc and lsc at x0 . As simple examples of functions that are usc everywhere in R1 but not continuous at some x0 , we have

70

Measure and Integral: An Introduction to Real Analysis

0 if x < x0 u1 (x) = 1 if x ≥ x0 ,

u2 (x) =

0 1

if x = x0 if x = x0 .

Hence, −u1 and −u2 are lsc everywhere in R1 . The Dirichlet function of Example 4 in Chapter 2 is usc at the rational numbers and lsc at the irrationals. A function defined on E is called usc (lsc, continuous) relative to E if it is usc (lsc, continuous) at every limit point of E that is in E. The next theorem characterizes functions that are semicontinuous relative to a set.

Theorem 4.14 (i) A function f is usc relative to E if and only if {x ∈ E : f (x) ≥ a} is relatively closed [equivalently, {x ∈ E : f (x) < a} is relatively open] for all finite a. (ii) A function f is lsc relative to E if and only if {x ∈ E : f (x) ≤ a} is relatively closed [equivalently, {x ∈ E : f (x) > a} is relatively open] for all finite a. Proof. Statements (i) and (ii) are equivalent since f is usc if and only if −f is lsc. It is therefore enough to prove (i). Suppose first that f is usc relative to E. Given a, let x0 be a limit point of {x ∈ E : f (x) ≥ a} that is in E. Then there exist xk ∈ E such that xk → x0 and f (xk ) ≥ a. Since f is usc at x0 , we have f (x0 ) ≥ lim supk→∞ f (xk ). Therefore, f (x0 ) ≥ a, so that x0 ∈ {x ∈ E : f (x) ≥ a}. This shows that {x ∈ E : f (x) ≥ a} is relatively closed. Conversely, let x0 be a limit point of E that is in E. If f is not usc at x0 , then f (x0 ) < +∞, and there exist M and {xk } such that f (x0 ) < M, xk ∈ E, xk → x0 , and f (xk ) ≥ M. Hence, {x ∈ E : f (x) ≥ M} is not relatively closed since it does not contain all its limit points that are in E.

Corollary 4.15 A finite function f is continuous relative to E if and only if all sets of the form {x ∈ E : f (x) ≥ a} and {x ∈ E : f (x) ≤ a} are relatively closed [or, equivalently, all {x ∈ E : f (x) > a} and {x ∈ E : f (x) < a} are relatively open] for finite a.

Corollary 4.16 Let E be measurable, and let f be defined on E. If f is usc (lsc, continuous) relative to E, then f is measurable. Proof. Let f be usc relative to E. Since {x ∈ E : f (x) ≥ a} is relatively closed, it is the intersection of E with a closed set. Hence, it is measurable, and the result follows from Theorem 4.1. The previous three results deserve special attention in certain cases. Suppose, for example, that E = Rn and f is usc everywhere in Rn . ∞ Since { f > a} = k=1 { f ≥ a + 1/k}, it follows from Theorem 4.14 that { f > a}

Lebesgue Measurable Functions

71

is of type Fσ . Since an Fσ set is a Borel set, we see that a function that is usc (similarly, lsc or continuous) at every point of Rn is Borel measurable.

4.3 Properties of Measurable Functions and Theorems of Egorov and Lusin Our next theorem states in effect that if a sequence of measurable functions converges at each point of a set E, then, with the exception of a subset of E with arbitrarily small measure, the sequence actually converges uniformly. This remarkable result cannot hold, at least in the form just stated, without some further restrictions. For example, if E = Rn and fk = χ{x:|x| 0, there is a closed subset F of E such that |E − F| < ε and { fk } converge uniformly to f on F. In order to prove this, we need a preliminary result that is interesting in its own right.

Lemma 4.18 Under the same hypothesis as in Egorov’s theorem, given ε, η > 0, there is a closed subset F of E and an integer K such that |E − F| < η and |f (x) − fk (x)| < ε for x ∈ F and k > K. Proof. Fix ε, η > 0. For each m, let Em = {|f −fk | < ε for all k > m}. Thus, Em = {|f − fk | < ε}, so that Em is measurable. Clearly, Em ⊂ Em+1 . Moreover, k>m since fk → f a.e. in E and f is finite, Em E − Z, |Z| = 0. Hence, by Theorem 3.26, |Em | → |E − Z| = |E|. Since |E| < +∞, it follows that |E − Em | → 0. Choose m0 so that |E − Em0 | < 12 η, and let F be a closed subset of Em0 with |Em0 − F| < 12 η. Then |E − F| < η, and |f − fk | < ε in F if k > m0 . Proof of Egorov’s theorem. Given ε > 0, use Lemma 4.18 to select closed Fm ⊂ E, m ≥ 1, and integers Km,ε such that |E − Fm | < ε2−m and |f − fk | < 1/m

72

Measure and Integral: An Introduction to Real Analysis

in Fm if k > Km,ε . The set F = m Fm is closed, and since F ⊂Fm for all m, fk converges uniformly to f on F. Finally, E − F = E − Fm = (E − Fm ) and, therefore, |E − F| ≤ |E − Fm | < ε. This completes the proof. See Exercises 13 and 14 for an analogue of Egorov’s theorem in the continuous parameter case, that is, in the case when fy (x) → f (x) as y → y0 . We have observed that a continuous function is measurable. Our next result, Lusin’s theorem, gives a continuity property that characterizes measurable functions. In order to state the result, we first make the following definition. A function f defined on a measurable set E has property C on E if given ε > 0, there is a closed set F ⊂ E such that (i) |E − F| < ε (ii) f is continuous relative to F We recall that condition (ii) means that if x0 and {xk } belong to F and xk → x0 , then f (x0 ) is finite and f (xk ) → f (x0 ). In case F is bounded (and, therefore, compact), (ii) implies that the restriction of f to F is uniformly continuous (Theorem 1.15).

Lemma 4.19

A simple measurable function has property C .

Proof. Suppose that f is a simple measurable function on E, taking distinct values a1 , . . . , aN on measurable subsets E1 , . . . , EN . Given ε > 0, choose F closed Fj ⊂ Ej with |Ej − Fj | < ε/N. Then the set F = N j is closed, and j=1 since E − F = Ej − Fj ⊂ (Ej − Fj ), we have |E − F| ≤ |Ej − Fj | < ε. It remains only to show that f is continuous on F. Note that each Fj is rela

tively open in F, in fact, Fj = F ∩ C k =j Fk , so the only points of F in a small neighborhood of any point of Fj are points of Fj itself. The continuity on F of f follows from this since f is constant on each Fj . Property C is actually equivalent to measurability, as we now show.

Theorem 4.20 (Lusin’s Theorem) Let f be defined and finite on a measurable set E. Then f is measurable if and only if it has property C on E. Proof. If f is measurable, then by Theorem 4.13, there exist simple measurable fk , k = 1, 2, . . ., which converge to f . By Lemma 4.19, each fk has property C , so given ε > 0, there exist closed Fk ⊂ E such that |E − Fk | < ε2−k−1 and fk is continuous relative to Fk . Assuming for the moment that |E| < +∞, we see by Egorov’s theorem that there is a closed F0 ⊂ E with |E − F0 | < 12 ε such that

73

Lebesgue Measurable Functions

{ fk } converges uniformly to f on F0 . If F = F0 ∩ ( ∞ k=1 Fk ), then F is closed, each fk is continuous relative to F, and { fk } converges uniformly to f on F. Hence, f is continuous relative to F by Theorem 1.16. Since |E − F| ≤ |E − F0 | +

∞

|E − Fk |

a} = {x ∈ H : f (x) > a} ∪ {x ∈ Z : f (x) > a} =

∞

{x ∈ Fk : f (x) > a} ∪ {x ∈ Z : f (x) > a}.

k=1

Since {x ∈ Z : f (x) > a} has measure zero, the measurability of f will follow from that of each {x ∈ Fk : f (x) > a}. However, since f is continuous relative to Fk and Fk is measurable, {x ∈ Fk : f (x) > a} is measurable by Corollary 4.16. This completes the proof of Lusin’s theorem.

4.4 Convergence in Measure Let f and { fk } be measurable functions that are defined and finite a.e. in a set E. Then { fk } is said to converge in measure on E to f if for every ε > 0, lim |{x ∈ E : |f (x) − fk (x)| > ε}| = 0.

k→∞

We will indicate convergence in measure by writing m

fk −→ f .

74

Measure and Integral: An Introduction to Real Analysis

This concept has many useful applications in analysis. Here, we will discuss its relation to ordinary pointwise convergence; the first result is basically a reformulation of Lemma 4.18. Theorem 4.21 Let f and fk , k = 1, 2, . . ., be measurable and finite a.e. in E. If m fk → f a.e. on E and |E| < +∞, then fk −→ f on E. Proof. Given ε, η > 0, let F and K be as defined in Lemma 4.18. Then if k > K, {x ∈ E : |f (x) − fk (x)| > ε} ⊂ E − F, and since |E − F| < η, the result follows. We recall that this conclusion may not hold if |E| = +∞, as shown by the example E = Rn , fk = χ{x:|x| 1 < 1 j 2j for k ≥ kj . We may assume that kj . Let Ej = {|f − fkj | > 1/j}, and Hm = ∞ ∞ −j −j −m+1 and |f − f | ≤ 1/j in kj j=m Ej . Then |Ej | < 2 , |Hm | ≤ j=m 2 = 2 E − Ej . Thus, if j ≥ m, |f − fkj | ≤ 1/j in E − Hm , so that fkj → f in E − Hm . Then fkj → f in (E − Hm ) = E − Hm . Since |Hm | → 0, it follows that | Hm | = 0 and fkj → f a.e. in E. This completes the proof. The next theorem gives a Cauchy criterion for convergence in measure.

75

Lebesgue Measurable Functions

Theorem 4.23 A necessary and sufficient condition that { fk } converge in measure on E is that for each ε > 0, lim |{x ∈ E : |fk (x) − fl (x)| > ε}| = 0.

k,l→∞

Proof. The necessity follows from the formula 1 1 | fk − f l | > ε ⊂ | f k − f | > ε ∪ | f l − f | > ε 2 2 and the fact that the measures of the sets on the right tend to zero as k, l → ∞ m if fk −→ f . To prove the converse, choose Nj , j = 1, 2, . . . , so that if k, l ≥ Nj , then |{|fk − fl | > 2−j }| < 2−j . We may assume that Nj . Then |fNj+1 − fNj | ≤ 2−j except for a set Ej , |Ej | < 2−j . Let Hi = ∞ j=i Ej , i = 1, 2, . . .. Then |fNj+1 (x) − fNj (x)| ≤ 2−j for j ≥ i and x ∈ / Hi . It follows that (fNj+1 − fNj ) converges uniformly outside Hi for every i and, therefore, that { fNj } converges uniformly outside every Hi . Since |Hi | ≤ −j = 2−i+1 , we obtain that { f } converges a.e. in E and, letting f = Nj j≥i 2 m

m

lim fNj , that fNj −→ f on E. In order to show that fk −→ f on E, note that

1 1 | fk − f | > ε ⊂ | f k − f N j | > ε ∪ | f N j − f | > ε 2 2

for any Nj . To show that the measure of the set on the left is less than a prescribed η > 0 for all sufficiently large k, select Nj so that the first term on the right has measure less than 12 η for all large k (here, we use the Cauchy condition) and so that the measure of the second term on the right is also less than 1 2 η. This completes the proof. As pointed out at the beginning of the chapter, many of the results of the chapter depend on only a few basic properties of Lebesgue measurable sets and Lebesgue measure. This is especially true for the elementary properties of measurable functions (Corollary 4.2, Theorems 4.1 and 4.3 through 4.12) and the section about convergence in measure, which use only the fact that the class of measurable subsets of Rn is a σ-algebra and, in Theorem 4.5, that subsets of a set of measure zero are measurable. Egorov’s theorem uses two additional facts: the fundamental result Theorem 3.26 concerning monotone sequences of sets and Lemma 3.22 about the approximability of measurable

76

Measure and Integral: An Introduction to Real Analysis

sets by closed sets. Actually, even Lemma 3.22, which is a topological property of Lebesgue measure, is not needed in the proof of Egorov’s theorem if instead of requiring that F be closed, we merely require that it be measurable. The rest of the chapter, namely, the material on semicontinuous functions and Lusin’s theorem, uses somewhat more restrictive topological properties of Rn and Lebesgue measure. For example, about Rn , we have used metric properties, and about Lebesgue measure, we have used Lemma 3.22 (e.g., in Lemma 4.19) and the fact that Borel sets are measurable (e.g., in Corollary 4.16).

Exercises 1. Prove Corollary 4.2 and Theorem 4.8. 2. Let f be a simple function, taking its distinct values on disjoint sets E1 , . . . , EN . Show that f is measurable if and only if E1 , . . . , EN are measurable. 3. Theorem 4.3 can be used to define measurability for vector-valued (e.g., complex-valued) functions. Suppose, for example, that f and g are realvalued and finite in Rn , and let F(x) = (f (x), g(x)). Then F is said to be measurable if F−1 (G) is measurable for every open G ⊂ R2 . Prove that F is measurable if and only if both f and g are measurable in Rn . 4. Let f be defined and measurable in Rn . If T is a nonsingular linear transformation of Rn , show that f (Tx) is measurable. (If E1 = {x : f (x) > a} and E2 = {x : f (Tx) > a}, show that E2 = T−1 E1 .) 5. Give an example to show that φ(f (x)) may not be measurable if φ and f are measurable and finite. (Let F be the Cantor–Lebesgue function and let f be its inverse, suitably defined. Let φ be the characteristic function of a set of measure zero whose image under F is not measurable.) Show that the same may be true even if f is continuous. (Let g(x) = x + F(x), where F is the Cantor–Lebesgue function, and consider f = g−1 .) Cf. Exercise 22. 6. Let f and g be measurable functions on E: (a) If f and g are finite a.e. in E, show that f + g is measurable no matter how we define it at the points when it has the form +∞ + (−∞) or −∞ + ∞. (b) Show that fg is measurable without restriction on the finiteness of f and g. Show that f + g is measurable if it is defined to have the same value at every point where it has the form +∞ + (−∞) or −∞ + ∞. (Note that a function h defined on E is measurable if and only if both {h = +∞} and {h = −∞} are measurable and the restriction of h to the subset of E where h is finite is measurable.)

Lebesgue Measurable Functions

77

7. Let f be usc and less than +∞ on a compact set E. Show that f is bounded above on E. Show also that f assumes its maximum on E, that is, that there exists x0 ∈ E such that f (x0 ) ≥ f (x) for all x ∈ E. 8. (a) Let f and g be two functions that are usc at x0 . Show that f + g is usc at x0 . Is f − g usc at x0 ? When is fg usc at x0 ? (b) If { fk } is a sequence of functions that are usc at x0 , show that inf k fk (x) is usc at x0 . (c) If { fk } is a sequence of functions that are usc at x0 and converge uniformly near x0 , show that lim fk is usc at x0 . 9. (a) Show that the limit of a decreasing (increasing) sequence of functions usc (lsc) at x0 is usc (lsc) at x0 . In particular, the limit of a decreasing (increasing) sequence of functions continuous at x0 is usc (lsc) at x0 . (b) Let f be usc and less than +∞ on [a, b]. Show that there exist continuous fk on [a, b] such that fk f . (First show that there are usc step functions fk f .) 10. (a) If f is defined and continuous on E, show that {a < f < b} is relatively open and that {a ≤ f ≤ b} and { f = a} are relatively closed. (b) Let f be a finite function on Rn . Show that f is continuous on Rn if and only if f −1 (G) is open for every open G in R1 , or if and only if f −1 (F) is closed for every closed F in R1 . 11. Let f be defined on Rn and let B(x) denote the open ball {y : |x − y| < r} with center x and fixed radius r. Show that the function g(x) = sup{ f (y) : y ∈ B(x)} is lsc and that the function h(x) = inf{ f (y) : y ∈ B(x)} is usc on Rn . Is the same true for the closed ball {y : |x − y| ≤ r}? 12. If f (x), x ∈ R1 , is continuous at almost every point of an interval [a, b], show that f is measurable on [a, b]. Generalize this to functions defined in Rn . (For a constructive proof, use the subintervals of a sequence of partitions to define a sequence of simple measurable functions converging to f a.e. in [a, b]. Use Theorem 4.12. See also the proof of Theorem 5.54.) 13. One difficulty encountered in trying to extend the proof of Egorov’s theorem to the continuous parameter case fy (x) → f (x) as y → y0 is showing that the analogues of the sets Em in Lemma 4.18 are measurable. This difficulty can often be overcome in individual cases. Suppose, for example, that f (x, y) is defined and continuous in the square 0 ≤ x ≤ 1, 0 < y ≤ 1 and that f (x) = limy→0 f (x, y) exists and is finite for x in a measurable subset E of [0,1]. Show that if ε and δ satisfy 0 < ε, δ < 1, the set Eε,δ = {x ∈ E : |f (x, y)−f (x)| ≤ ε for all y < δ} is measurable. (If yk , k = 1, 2, . . ., is a dense subset of (0, δ), show that Eε,δ = k {x ∈ E : |f (x, yk ) − f (x)| ≤ ε}.) 14. Let f (x, y) be as in Exercise 13. Show that given ε > 0, there exists a closed F ⊂ E with |E − F| < ε such that f (x, y) converges uniformly for x ∈ F to f (x) as y → 0. (Follow the proof of Egorov’s theorem, using the

78

Measure and Integral: An Introduction to Real Analysis

sets Eε,1/m defined in Exercise 13 in place of the sets Em in the proof of Lemma 4.18.) 15. Let { fk } be a sequence of measurable functions defined on a measurable E with |E| < +∞. If |fk (x)| ≤ Mx < +∞ for all k for each x ∈ E, show that given ε > 0, there is a closed F ⊂ E and a finite M such that |E − F| < ε and |fk (x)| ≤ M for all k and all x ∈ F. m

16. Prove that fk −→ f on E if and only if given ε > 0, there exists K such that |{|f − f k | > ε}| < ε if k > K. Give an analogous Cauchy criterion. m

m

m

17. Suppose that fk −→ f and gk −→ g on E. Show that fk + gk −→ f + g on m E and, if |E| < +∞, that fk gk −→ fg on E. If, in addition, gk → g on E, g = m 0 a.e., and |E| < +∞, show that fk /gk −→ f /g on E. (For the product fk gk , write fk gk −fg = (fk − f )(gk − g)+f (gk − g)+g(fk − f ). Consider each term separately, using the fact that a function that is finite on E, |E| < +∞ is bounded outside a subset of E with small measure.) 18. If f is measurable on E, define ωf (a) = |{ f > a}| for −∞ < a < +∞. If m

fk f , show that ωfk ωf . If fk −→ f , show that ωfk → ωf at each m

point of continuity of ωf . (For the second part, show that if fk −→ f , then lim supk→∞ ωfk (a) ≤ ωf (a − ε) and lim inf k→∞ ωfk (a) ≥ ωf (a + ε) for every ε > 0.) 19. Let f (x, y) be a function defined on the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 which is continuous in each variable separately. Show that f is a measurable function of (x, y). Is the same true if f is only assumed to be continuous in x for each y? 20. If f is measurable and finite a.e. on [a, b], show that given ε > 0, there is a continuous g on [a, b] such that |{x : f (x) = g(x)}| < ε. (See Exercise 18 of Chapter 1.) Formulate and prove a similar result in Rn by combining Lusin’s theorem with the Tietze extension theorem. 21. Show that the necessity part of Lusin’s theorem is not true for ε = 0, that is, find a measurable set E and a finite measurable function f on E such that f is not continuous relative to E−Z for any Z with |Z| = 0. (Consider, e.g., χE for the set E in Exercise 25 of Chapter 3.) 22. (a) Show that if f is measurable and B is a Borel set in R1 , then f −1 (B) is measurable. (Recall that the Borel sets form the smallest σ-algebra that contains the open sets. Consider the collection of sets {E : f −1 (E) is measurable}.) Cf. Exercise 30 of Chapter 3. (b) If φ is a Borel measurable function on R1 and f is finite and measurable on Rn , show that φ(f (x)) is measurable on Rn . Cf. Theorem 4.6 and Exercise 5. 23. Let { fk }∞ k=1 be a sequence of measurable functions defined on a measurable set E. Show that the sets {x : lim fk (x) exists and is finite}, {x : lim fk (x) = +∞}, {x : lim fk (x) = −∞} are measurable.

Lebesgue Measurable Functions

79

24. Let f be a (Lebesgue) measurable function defined on a (Lebesgue) measurable set E. Show that there is a Borel set H ⊂ E and a Borel measurable function h on H such that |E| = |H| and f = h on H. (This can be deduced from Lusin’s theorem.) In case E is a Borel set, and consequently the exceptional set Z = E − H is also a Borel set, show that f can be redefined on Z so that the resulting function is Borel measurable on E.

5 The Lebesgue Integral

5.1 Definition of the Integral of a Nonnegative Function There are several equivalent ways to define the Lebesgue integral and develop its main properties. The approach we have chosen is based on the notion that the integral of a nonnegative f should represent the volume of the region under the graph of f . We start then with a nonnegative function f , 0 ≤ f ≤ +∞, defined on a measurable subset E of Rn . Let ( f , E) = x, f (x) ∈ Rn+1 : x ∈ E, f (x) < +∞ , R( f , E) = x, y ∈ Rn+1 : x ∈ E, 0 ≤ y ≤ f (x) if f (x) < +∞, and 0 ≤ y < +∞ if f (x) = +∞ . ( f , E) is called the graph of f over E and R( f , E) the region under f over E. If R( f , E) is measurable (as a subset of Rn+1 ), its measure |R( f , E)|(n+1) is called the Lebesgue integral of f over E, and we write |R( f , E)|(n+1) = f (x) dx. E

Usually, one of the abbreviations f dx E

or

f

E

is used, and at times the lengthy notation · · · f (x1 , . . . , xn ) dx1 . . . dxn E

is convenient. We stress that the definition applies only to nonnegative f ; a definition for functions that are not nonnegative will be given in Section 5.3. 81

82

Measure and Integral: An Introduction to Real Analysis

We also note that the existence of the integral is equivalent to the measurability of R( f , E) and does not require the finiteness of |R( f , E)|(n+1) . The next theorem is of basic importance.

Theorem 5.1 Let f be a nonnegative function defined on a measurable set E. Then f exists if and only if f is measurable. E We will show here only that the integral exists if f is measurable, postponing a proof of the converse until Theorem 6.11. We need several lemmas, the first of which proves the theorem for functions that are constant on E. In this case, R( f , E) is a cylinder set; that is, it has one of the forms {(x, y) : x ∈ E, 0 ≤ y ≤ a}, 0 ≤ a < +∞, or {(x, y) : x ∈ E, 0 ≤ y < +∞}. Lemma 5.2 Let E be a subset of Rn , 0 ≤ a ≤ +∞, and define Ea = {(x, y) : x ∈ E, 0 ≤ y ≤ a} for finite a and E∞ = {(x, y) : x ∈ E, 0 ≤ y < +∞}. If E is measurable (as a subset of Rn ), then Ea is measurable (as a subset of Rn+1 ) and |Ea |(n+1) = a|E|(n) .∗ Proof. The result follows from a series of simple observations. First, assume that a is finite. If |E| = 0 or if E is an interval that is either closed, partly open, or open, the result is clear. Next, if E is an open set, then by Theorem 1.11, it can be written as a disjoint union of partly open intervals, E = Ik . Therefore, Ea = Ik,a , and since Ik,a are measurable and disjoint, Ea is measurable and |Ea | = Ik,a = a |Ik | = a|E|. Let E be of type Gδ , E = ∞ k=1 Gk , with |G1 | < +∞. We may assume that Gk E by writing E = G1 ∩ (G1 ∩ G2 ) ∩ (G1 ∩ G2 ∩ G3 ) ∩ · · · . Therefore, by Theorem 3.26, |Gk | → |E| as k → ∞. Moreover, Gk,a is measurable, Gk,a = a |Gk |, and Gk,a Ea . Therefore, Ea is measurable and |Ea | = limk→∞ Gk,a = a limk→∞ |Gk | = a|E|. If E is any measurable set with |E| < +∞, then by Theorem 3.28, E = H − Z, where |Z| = 0, H is a set of type Gδ , H = ∞ k=1 Gk , and |G1 | < +∞. Since Ea = Ha − Za , we see that Ea is measurable and |Ea | = |Ha | = a|H| = a|E|. Finally, if |E| = +∞, the result follows by writing E as the countable union of disjoint measurable sets with finite measure. This completes the proof in case a is finite. If a = +∞, choose a sequence {ak } of finite numbers increasing to +∞. The conclusion then follows easily from the fact that Eak E∞ . As is easily seen (e.g., by using the same proof as above), the conclusion of Lemma 5.2 holds with Ea replaced by {(x, y) : x ∈ E, 0 ≤ y < a}, 0 ≤ a < +∞. ∗ Here and in the following text, 0 · ∞ and ∞ · 0 should be interpreted as 0.

83

The Lebesgue Integral

Lemma 5.3 If f is a nonnegative measurable function on E, 0 ≤ |E| ≤ +∞, then ( f , E) has measure zero. Proof. Given ε > 0 and k = 0, 1, . . . , let Ek = {kε ≤ f < (k + 1)ε}. The Ek are disjoint and measurable, their union is the subset of E where f is finite. and Hence, ( f , E) = f , Ek . Since ( f , Ek )e ≤ ε |Ek | by Lemma 5.2, we obtain

|( f , E)|e ≤

f , Ek ≤ ε |Ek | ≤ ε|E|. e

If |E| < +∞, this implies that ( f , E) has measure zero. If |E| = +∞, write E as the countable union of disjoint measurable sets with finite measure. Then ( f , E) is the countable union of sets of measure zero, and the lemma follows.

Proof of the sufficiency in Theorem 5.1. Let f be nonnegative and measurable on E. We must show that R( f , E) is measurable. By Theorem 4.13, there exist simple measurable fk f . Therefore, R fk , E ∪ ( f , E) R( f , E), and since (f , E) is measurable (with measure zero), it is enough to show that each R fk , E is measurable. Fix k and suppose that the distinct values of fk are a1 , . . . , aN , taken on measurable sets E1 , . . . , EN , respectively. Then R fk , E = N j=1 Ej,aj . Therefore, R fk , E is measurable by Lemma 5.2, and the proof is complete.

Corollary 5.4 If f is a nonnegative measurable function, taking constant values a1 , a2 , . . . (possibly +∞) on disjoint sets E1 , E2 , . . ., respectively, and if E = Ej , then E

f =

aj Ej .

j

Proof. Clearly R( f , E) = j Ej,aj . Since the Ej are measurable and disjoint, so are the Ej,aj . Therefore, E f = Ej,aj , and the corollary follows from the fact that Ej,aj = aj Ej . Note that Corollary 5.4 applies in particular to nonnegative simple measurable functions.

84

Measure and Integral: An Introduction to Real Analysis

5.2 Properties of the Integral Theorem 5.5 (i) If f and g are measurable and if 0 ≤ g ≤ f on E, then E g ≤ E f . In particular, E (inf f ) ≤ E f . (ii) If f is nonnegative and measurable on E and if E f is finite, then f < +∞ a.e. in E. (iii) Let E1 and E2 be measurable and E1 ⊂ E2 . If f is nonnegative and measurable on E2 , then E1 f ≤ E2 f . Proof. Parts (i) and (iii) follow from the relations R(g, E) ⊂ R( f , E) and R f , E1 ⊂ R f , E2 , respectively. To prove (ii), we may assume that |E| > 0. If f =+∞ in a subset E1 of E with positive measure, then by (iii) and (i), we have E f ≥ E1 f ≥ E1 a = a |E1 |, no matter how large a is. This contradicts the finiteness of E f . Theorem 5.6 (Monotone Convergence Theorem for Nonnegative Functions) If { fk } is a sequence of nonnegative measurable functions such that fk f on E, then fk → f . E

E

Proof. By Theorem 4.12, f is measurable. Since R fk , E ∪ ( f , E) R( f , E) and ( f , E) has measure zero, the result follows from Theorem 3.26.

Theorem 5.7 Suppose that f is nonnegative and measurable on E and that E is the countable union of disjoint measurable sets Ej , E = Ej . Then E

f =

f.

Ej

Proof. The sets R f , Ej are disjoint and measurable. Since R( f , E) = R f , Ej , the result follows from Theorem 3.23. The next four theorems are corollaries of the results just proved. The first one provides an alternate definition of the integral that will be useful in

85

The Lebesgue Integral

Chapter 10 as a motivation for defining integration with respect to abstract measures.

Theorem 5.8

Let f be nonnegative and measurable on E. Then

f = sup

inf f (x) Ej ,

x∈Ej

j

E

where the supremum is taken over all decompositions E = of a finite number of disjoint measurable sets Ej .

j Ej

of E into the union

The reader will observe that the formula resembles the definition of the Riemann integral if the Ej are taken to be subintervals. We note however that the roles of sup and inf cannot be interchanged; see Exercise 25. Proof. If E = N j=1 Ej is such a decomposition, consider the measurable function g taking values aj = inf y∈Ej f (y) on Ej , j = 1, . . . , N. Since 0 ≤ g ≤ f , we have by Corollary 5.4 and Theorem 5.5 that N aj Ej ≤ f . Therefore, j=1

sup

E

inf f |Ej | ≤ f . Ej

j

E

To prove the opposite inequality, consider for k = 1, 2, . . . , the sets (k) Ej , j = 0, 1, . . . , k2k , defined by

(k) Ej

=

j−1 j ≤ f < k , j = 1, . . . , k2k ; 2k 2

(k)

E0 = { f ≥ k},

and the corresponding measurable functions fk =

inf f χE(k) . j

(k)

Ej

j

(Compare the simple functions in Theorem 4.13(ii).) Then 0 ≤ fk f , and by the monotone convergence theorem, E fk → E f . Since E fk = (k) j inf E(k) f Ej by Corollary 5.4, it follows that j

86

Measure and Integral: An Introduction to Real Analysis

sup

j

(inf f ) |Ej | ≥ Ej

f,

E

which completes the proof. Theorem 5.9

Let f be nonnegative on E. If |E| = 0, then

Ef

= 0.

This can be proved in many ways; for example, it follows immediately from the last result. Measurability of f is automatic since |E| = 0. We can now slightly strengthen the statement of part (i) of Theorem 5.5. Theorem 5.10 If f and g are nonnegative and measurable on E and if g ≤ f a.e. in E, then E g ≤ E f . In particular, if f and g are nonnegative and measurable on E and if f = g a.e. in E, then E f = E g. Proof. Write E = A ∪ Z, where A and Z are disjoint and Z= {g > f }. Then |Z| = 0. Therefore, by Theorems 5.7 and 5.9, E f = A f + Z f = A f . Since the same is true for g, and since f ≥ g everywhere on A, the result follows. In defining E f , we assumed that f was defined everywhere in E. In view of Theorem 5.10, E f is unchangedif we modify f in a set of measure zero. Hence, we may consider integrals E f where f is defined only a.e. in E, by completing the definition of f arbitrarily in the set Z of measure zero where it is undefined. As a result of Theorem 5.9, this amounts to defining E f to be f . Similarly, we may extend the definition of the integral and the results E−Z earlier to measurable functions that are nonnegative only a.e. in E. Theorem 5.11 Let f be nonnegative and measurable on E. Then only if f = 0 a.e. in E.

Ef

= 0 if and

Proof. If f = 0 a.e. in E, then E f = 0 by Theorem 5.10. Conversely, suppose that f is nonnegative and measurable on E and that E f = 0. For α > 0, we have by Corollary 5.4 and Theorem 5.5 that α|{x ∈ E : f (x) > α}| ≤ f ≤ f = 0. {x∈E:f (x)>α}

E

Therefore, {x ∈ E : f (x) > α} has measure zero for every α > 0. Since the set where f > 0 is the union of those where f > 1/k, it follows that f = 0 a.e. in E. This proof also establishes the following useful result.

87

The Lebesgue Integral

Corollary 5.12 (Tchebyshev’s Inequality) Let f be nonnegative and measurable on E. If α > 0, then |{x ∈ E : f (x) > α}| ≤

1 f. α E

The significance of Tchebyshev’s inequality is that it estimates the size of f in terms of the integral of f . The next two theorems establish the linear properties of the integral for nonnegative functions.

If f is nonnegative and measurable, and if c is any nonnegative

Theorem 5.13 constant, then

cf = c

E

f.

E

Proof. If f is simple, then so is cf, and the theorem follows in this case from the formula for integrating simple functions (see Corollary 5.4). For arbitrary measurable f ≥ 0, choose simple measurable fk with 0 ≤ fk f . Then 0 ≤ cfk cf and

cf = lim

E

k→∞

cfk = lim c k→∞

E

fk = c

E

f.

E

If f and g are nonnegative and measurable, then

Theorem 5.14

( f + g) =

E

f+

E

g.

E

First, suppose that f and g are simple: f = N i=1 ai χAi and g = b χ , where E = A = B and all A and B are measurable. Then, i j j=1 j Bj i i j j f + g is also simple, taking values ai + bj on Ai ∩ Bj : f + g = i,j ai + bj χAi ∩Bj . Thus,

Ai ∩ Bj + Ai ∩ Bj ai + bj Ai ∩ Bj = ( f + g) = ai bj

Proof. M

i,j

E

=

ai |Ai | +

i

j

E

E

bj Bj = f + g.

j

i

88

Measure and Integral: An Introduction to Real Analysis

For general nonnegative measurable f and g, choose simple measurable fk and gk such that 0 ≤ fk f and 0 ≤ gk g. Then fk + gk is simple and 0 ≤ fk + gk f + g. Therefore, fk + gk = lim ( f + g) = lim fk + gk = f + g, k→∞

E

k→∞

E

E

E

E

E

which completes the proof. Corollary 5.15 Suppose that f and φ are measurable on E, 0 ≤ f ≤ φ, and finite. Then (φ − f ) = φ − f . E

E

Proof. By Theorem 5.14, we have the result follows by subtraction.

Theorem 5.16

Ef

+

Ef

is

E

− f) =

E (φ

E φ.

Since

Ef

is finite,

If fk , k = 1, 2, . . . , are nonnegative and measurable, then ∞ ∞

fk = fk . E

k=1

k=1 E

Proof. The functions FNdefined by FN = N k=1 fk are nonnegative and mea∞ surable and increase to k=1 fk . Hence, ∞ N ∞

fk = lim FN = lim fk = fk . E

k=1

N→∞

E

N→∞

k=1 E

k=1 E

Note that the preceding theorem is essentially a corollary of the Monotone Convergence Theorem 5.6. Conversely, Theorem 5.6 can be deduced from this result. Verification is left to the reader. The monotone convergence theorem gives a sufficient conditionfor interchanging the operations of integration and passage to the limit: E lim fk = lim E fk . It is an important problem to find other conditions under which this is true. First, we show that some restriction other than the mere convergence of fk to f is necessary. Let E be the interval [0, 1], and for k = 1, 2, . . . , let fk be defined as follows: when 0 ≤ x ≤ 1/k, the graph of fk consists of the sides of the isosceles triangle with altitude k and base [0, 1/k]; when 1/k ≤ x ≤ 1, fk (x) = 0.

89

The Lebesgue Integral

k

1

1/k

1 1 Clearly, fk → 0 on [0, 1], but 0 fk = 12 (1/k)(k) = 12 for all k. Hence, 0 lim fk < 1 lim 0 fk . For any nonnegative measurable fk such that fk → f and fk converges, the fact that f ≤ lim fk is a consequence of the next theorem. Theorem 5.17 (Fatou’s Lemma) If fk is a sequence of nonnegative measurable functions on E, then lim inf fk ≤ lim inf fk . k→∞

k→∞

E

E

Proof. First, note that the integral on the leftexists since its integrand is nonnegative and measurable. Next, let gk = inf fk , fk+1 , . . . for each k. Then, gk lim inf fk and 0 ≤ gk ≤ fk . Therefore, by Theorems 5.6 and 5.10, E

gk →

(lim inf fk ),

E

gk ≤

E

fk ,

E

so that E

(lim inf fk ) = lim

E

gk ≤ lim inf

fk .

E

Corollary 5.18 Let and measurable on E, and let fk , k = 1, 2, . . . , be nonnegative fk → f a.e. in E. If E fk ≤ M for all k, then E f ≤ M. Proof. By Fatou’s lemma, E lim inf fk ≤ M. Since lim inf fk = lim fk = f a.e. in E, the conclusion follows. We now prove a basic result about term-by-term integration of convergent sequences.

90

Measure and Integral: An Introduction to Real Analysis

Theorem 5.19 (Lebesgue’s Dominated Convergence Theorem for Nonnegative Functions) Let fk be a sequence of nonnegative measurable functions on E such that fk → f a.e. in E. If there exists a measurable function φ such that fk ≤ φ a.e. for all k and if E φ is finite, then fk → f . E

E

Proof. By Fatou’s lemma, f = lim inf fk ≤ lim inf fk , E

E

E

and the theorem will follow if we show that f ≥ lim sup fk . E

E

To prove this inequality, apply Fatou’s lemma to the nonnegative functions φ − fk , obtaining lim inf (φ − fk ) ≤ lim inf (φ − fk ). E

E

Since fk → f a.e., theintegrand on the left equals φ−f a.e., so that the integral on the left is E φ − E f by Corollary 5.15. The right-hand side equals

lim inf

φ−

E

fk

=

E

φ − lim sup

E

Combining formulas and cancelling lim sup E fk , as desired.

E φ,

fk .

E

we obtain the inequality

Ef

≥

5.3 The Integral of an Arbitrary Measurable f Let f be any measurable function defined on a set E. Then f = f + − f − and, by + − the comments following Theorem − 4.11, f and f are measurable. Therefore, + the integrals E f (x) dx and E f (x) dx exist and are nonnegative, possibly having value +∞. Provided at least one of these integrals is finite, we define E

f (x) dx =

E

f + (x) dx −

E

f − (x) dx

91

The Lebesgue Integral

and say that the integral E f (x) dx exists. If f ≥ 0, then f = f + , and this definition agrees with one. As in the case when f ≥ 0, we will use the the previous abbreviations E f dx and E f . The definition clearly applies if f is defined only a.e. in E, as in the case when f ≥ 0 (see p. 86 in Section 5.2). For the sake of simplicity, we shall usually in E. assume that f is defined everywhere If E f exists then, of course, −∞ ≤ E f ≤ +∞. If E f exists and is finite, we say that f is Lebesgue integrable, or simply integrable, on E and write f ∈ L(E). Thus, L(E) =

f :

f is finite .

E

If

Ef

exists, then E

f+ + f− , f ≤ f+ + f− = E

E

E

by Theorem 5.14. Since f + + f − = |f |, we obtain the inequality f dx ≤ |f | dx. E

Theorem 5.21 if |f | is.

(5.20)

E

Let f be measurable on E. Then f is integrable over E if and only

Proof. If |f | ∈ L(E) then f + , f − ∈ L(E), and consequently E f exists and is + − − is finite, and therefore, f f finite. If f ∈ L(E), then the difference E E must be finite. Hence, their since at least one of E f + or Ef − is finite, both sum is finite. Since this sum is E f + + f − = E |f |, it follows that |f | ∈ L(E). The simple properties of E f for general f follow from the results already established for f ≥ 0. As a first example, we have the following theorem.

Theorem 5.22

If f ∈ L(E), then f is finite a.e. in E.

Proof. If f ∈ L(E), then |f | ∈ L(E), and the result follows from Theorem 5.5(ii).

92

Measure and Integral: An Introduction to Real Analysis

Theorem 5.23 (i) If both E f and E g exist and if f ≤ g a.e. in E, then E f ≤ E g. Also, if f and functions with f = g a.e. in E and if E f exists, then E g exists and g are f = E E g. (ii) If E2 f exists and E1 is a measurable subset of E2 , then E1 f exists. + + − Proof. (i) The fact that f ≤ g a.e. implies that 0 ≤ f +≤ g and− 0 ≤g −≤ − + f a.e. in E. By Theorem 5.10, we then have E f ≤ E g and E f ≥ E g , and the first part of (i) follows by subtracting these inequalities. The proof of the second part of (i) is similar; note that measurability of f is equivalent to that of gwhen f = g a.e. (ii) If E2 f exists, at least one of E2 f + or E2 f − is finite. If E1 ⊂ E2 , then by + − Theorem 5.5(iii), at least one of E1 f or E1 f is finite. Therefore, E1 f exists.

Theorem 5.24 If E f exists and E = k Ek is the countable union of disjoint measurable sets Ek , then

f =

Proof. Each

Ek

f.

k Ek

E

f exists by Theorem 5.23(ii). We have

f =

E

E

f+ −

f− =

E

f+ −

Ek

f−

Ek

by Theorem 5.7. Since at least one of these sums is finite, we obtain E

⎞ ⎛

⎝ f + − f −⎠ = f = f. Ek

Ek

Ek

If |E| = 0 or if f = 0 a.e. in E, then

Theorem 5.25

Ef

= 0.

Proof. The theorem follows by applying Theorem 5.9 or 5.11 to f + and f − . The next few results deal with linearity properties of the integral.

Lemma 5.26

If

Ef

is defined, then so is

E (−f ),

and

E (−f )

=−

E f.

93

The Lebesgue Integral

Proof. Since (−f )+ = f − and (−f)− = f + , and at least one of finite, we have E (−f ) = E f − − E f + = − E f . Theorem 5.27

If

Ef

exists and c is any real constant, then

(cf ) = c

E

Ef

E (cf )

−

or

Ef

+

is

exists and

f.

E

− − Proof. If c ≥ 0, (cf )+ = cf + and by Theorem (cf− ) = cf . Therefore, 5.13, + + − (cf ) = c E f and E (cf ) = c E f . It follows that E (cf ) exists and E (cf ) = E c E f + − E f − = c E f . If c = −1, the theorem reduces to Lemma 5.26. For any c ≤ 0, we have c = (−1)(|c|), and the result follows from the cases c = −1 and c ≥ 0.

Theorem 5.28

If f , g ∈ L(E), then f + g ∈ L(E) and E

( f + g) =

f+

E

g.

E

Proof. Since |f + g| ≤ |f | + |g|, we have from Theorems 5.23(i) and 5.14 that E

|f + g| ≤

E

(|f | + |g|) =

E

|f | +

|g| < +∞.

E

Hence, f + g ∈ L(E). To prove the rest of the theorem, first note that if g = 0 everywhere in E, then the formula E ( f + g) = E f + E g is obvious by Theorem 5.25. Thus, by writing E = (E ∩ {g = 0}) ∪ (E ∩ {g = 0}) and using Theorem 5.24 for each of the three integrals in the formula, it is enough to prove the formula with E replaced by E ∩ {g = 0}. Hence, because of similar considerations for f , it suffices to prove the formula under the extra assumption that f and g never vanish on E. To do so, we begin by considering the following six cases, in which each inequality is assumed to hold everywhere in E: (1) f > 0, g > 0 (so that f + g > 0); (2) f > 0, g < 0, f + g ≥ 0; (3) f > 0, g < 0, f + g < 0; (4) f < 0, g > 0, f + g ≥ 0; (5) f < 0, g > 0, f + g < 0; (6) f < 0, g < 0 (so that f + g < 0). Note that these possibilities are mutually exclusive. The result in case 1 is just Theorem 5.14. Cases 2–6 are all similar, and we shall consider only case 2. Then f > 0, −g > 0, f + g ≥ 0, and since by Theorem 5.22 each function is finite = f a.e. Hence, by Theorems 5.23(i) and a.e., ( f + g) + (−g) 5.14, we have E ( f + g) + E (−g) = E f . The result in case 2 now follows from Lemma 5.26 and the fact that all the integrals involved are finite.

94

Measure and Integral: An Introduction to Real Analysis

For arbitrary f and g in L(E) that never vanish in E, we subdivide E into at most six measurable sets, E1 , . . . , E6 , where possibilities (1), . . . , (6) hold, respectively. Since Ei and Ej are disjoint for i = j, we have ⎛ ⎞ 6 6

⎝ f + g⎠ = f + g. ( f + g) = ( f + g) = E

j=1 Ej

j=1

Ej

Ej

E

E

This completes the proof. It follows that if fk ∈ L(E), k = 1, . . . , N, and if ak are real constants, then N k=1 ak fk ∈ L(E) and N N

ak fk = ak fk . k=1

E

k=1

E

Corollary 5.29 Let f and φ be measurable on E, f ≥ φ a.e., and φ ∈ L(E). Then, ( f − φ) = f − φ. E

E

E

Proof. First, note that E f exists since f − ≤ φ− a.e. implies that E f − is finite. Next, E ( f −φ) exists since f −φ ≥ 0 a.e. If f ∈ L(E), the corollary follows from have Theorem 5.28. If f ∈ L(E), then since f − ∈ L(E), we must E f = +∞. The fact that φ ∈ L(E) implies that f − φ ∈ L(E), so that E ( f − φ) = +∞ since f − φ ≥ 0 a.e. This completes the proof. In Chapter 8, we will study conditions on f and g that imply that fg ∈ L(E). For now, we have the following simple result. Theorem 5.30 If f ∈ L(E), g is measurable on E, and there exists a finite constant M such that |g| ≤ M a.e. in E, then fg ∈ L(E) and E |fg| ≤ M E |f |. |fg| ≤ M|f Proof. Since | a.e. in E, we have by Theorems 5.10 and 5.27 that |fg| ≤ M|f | = M E E E |f |. Hence, fg ∈ L(E). Corollary 5.31 If f ∈ L(E), f ≥ 0 a.e., and there exist finite constants α and β such that α ≤ g ≤ β a.e. in E, then α f ≤ fg ≤ β f . E

E

E

95

The Lebesgue Integral

Proof. By Theorem 5.30, fg ∈ L(E). Since f ≥ 0 a.e., we have αf ≤ fg ≤ βf a.e. in E, and the conclusion follows by integrating. We now study conditions that imply that E fk → E f if fk → f in E. Most of the results are extensions of those we derived for nonnegative functions. Theorem 5.32 (Monotone Convergence Theorem) Let { fk } be a sequence of measurable functions on E: (i) If fk E and there exists φ ∈ L(E) such that fk ≥ φ a.e. on E for all k, f a.e. on then E fk → E f . (ii) If fk E and there exists φ ∈ L(E) such that fk ≤ φ a.e. on E for all k, f a.e. on then E fk → E f . Proof. To prove (i), we may assume by Theorem 5.25 that fk f and fk ≥ φ that by Theorem 5.6, everywhere on E. Then 0 ≤ fk − φ f − φ on E, so f − φ → ( f −φ). Therefore, by Corollary 5.29, f − φ → f − E k E E k E E E φ, and since φ ∈ L(E), the result follows. We can deduce (ii) from (i) by considering the functions −fk . Details are left to the reader. Theorem 5.33 (Uniform Convergence Theorem) Let fk ∈ L(E) fork = 1, 2,. . ., and let {fk } converge uniformly to f on E, |E| < +∞. Then f ∈ L(E) and E fk → E f . Proof. Since |f | ≤ fk + f − fk and fk converge uniformly to f on E, we have |f | ≤ fk + 1 on E if k is sufficiently large. Since |E| < +∞, it follows that f ∈ L(E). From Theorem 5.28 and (5.20), we obtain f − f k = ( f − f k ) ≤ f − f k . E

E

E

E

The last integral is bounded by supx∈E f (x) − fk (x) |E|, which by hypothesis tends to 0 as k → ∞. This proves the theorem. Theorem 5.34 (Fatou’s Lemma) Let fk be a sequence of measurable functions on E. If there exists φ ∈ L(E) such that fk ≥ φ a.e. on E for all k, then lim inf fk ≤ lim inf fk . E

k→∞

k−∞

E

96

Measure and Integral: An Introduction to Real Analysis

Proof. The result follows by first applying Theorem 5.17 to the sequence fk − φ of nonnegative functions, and then using Corollary 5.29. Details are left to the reader. Corollary 5.35 Let fk be a sequence of measurable functions on E. If there exists φ ∈ L(E) such that fk ≤ φ a.e. on E for all k, then lim sup fk ≥ lim sup fk . E

k−∞

k→∞

E

Proof. This follows from Fatou’s lemma since −fk ≥ − φ a.e. and lim inf −fk = − lim sup fk . Theorem 5.36 (Lebesgue’s Dominated Convergence Theorem) Let fk be a sequence of measurable functions on E such that fk → f a.e. in E. If there exists φ ∈ L(E) such that fk ≤ φ a.e. in E for all k, then E fk → E f . Proof. By hypothesis, −φ ≤ fk ≤ φ a.e. in E. Therefore, 0 ≤ fk + φ ≤ 2φ a.e. in E. Since 2φ ∈ L(E), we conclude from Theorem 5.19 that E fk + φ → E ( f + φ). Since φ, f , and all the fk are integrable on E, the result follows from Theorem 5.28. See Exercises 23 and 26 for two useful variants of Theorem 5.36, one about weakening the assumption that fk ≤ φ and the other about replacing the hypothesis of pointwise convergence of fk by convergence in measure. The following special case of the dominated convergence theorem is often useful if E has finite measure. Corollary 5.37 (Bounded Convergence Theorem) Let fk be a sequence of measurable functions on E such that fk → f a.e. inE. If |E| < +∞ and there is a finite constant M such that |fk | ≤ M a.e. in E, then E fk → E f . In later chapters, we will consider the integrals of complex-valued functions. Here, we mention only the definition. (See p. 183 in Section 8.1 for some further remarks.) If f = f1 + if2 with f1 and f2 real-valued, we define E

f =

E

f1 + i

E

f2 ,

97

The Lebesgue Integral

provided the integrals on the right exist and are finite. (For the measurability of such f , see Exercise 3 Chapter 4.) Many basic properties of the ordinary integral are valid in this case.

5.4 Relation between Riemann–Stieltjes and Lebesgue Integrals, and the Lp Spaces, 0 < p < ∞ It turns out that there is a remarkably simple and useful representation of Lebesgue integrals over subsets of Rn in terms of Riemann–Stieltjes integrals (over subsets of R1 , of course). In order to establish this relation, we must first study the function ω(α) = ωf ,E (α) = |{x ∈ E : f (x) > α}|, where f is a measurable function on E and −∞ < α < +∞. We call ωf ,E the distribution function of f on E. Some properties of ω were given in Exercise 18 of Chapter 4. Clearly, it is not affected by changing f in a set of measure zero, and it is decreasing. As α +∞, {x ∈ E : f (x) > α} {x ∈ E : f (x) = +∞}; hence, assuming that f is finite a.e. in E, by Theorem 3.26(ii), lim ω(α) = 0,

α→+∞

unless ω(α) ≡ +∞. Similarly, lim ω(α) = |E|.

α→−∞

We will assume from now on that |E| < +∞. This insures that ω is bounded, that limα→+∞ ω(α) = 0, and that ω is of bounded variation on (−∞, +∞) with variation equal to |E|. The assumption is made only to simplify the properties of ω, and is not entirely necessary (see, e.g., Exercise 16); in fact, the case |E| = +∞ is often important. In the following results, we assume that f is a measurable function that is finite a.e. in E, |E| < +∞, and we write ω(α) = ωf ,E (α), Lemma 5.38

{ f > a} = {x ∈ E : f (x) > α}, etc.

If α < β, then |{α < f ≤ β}| = ω(α) − ω(β).

98

Measure and Integral: An Introduction to Real Analysis

Proof. We have { f > β} ⊂ { f > α} and {a < f ≤ β} = { f > α} − { f > β}. Since |{ f > β}| < +∞, the lemma follows from Corollary 3.25. Given α, let ω(α+) = lim ω(α + ε), ε0

ω(α−) = lim ω(α − ε) ε0

denote the limits of ω from the right and left at α.

Lemma 5.39 (a) ω(α+) = ω(α); that is, ω is continuous from the right. (b) ω(α−) = |{ f ≥ α}|. Proof. If εk 0, then { f > α + εk } { f > α} and { f > α − εk } { f ≥ α}. Since these sets have finite measures, it follows from Theorem 3.26 that ω(α + εk ) → ω(α) and ω(α − εk ) → |{ f ≥ α}|. This completes the proof. We now know that ω is a decreasing function that is continuous from the right. It may have jump discontinuities, with jumps ω(α−)−ω(α), and intervals of constancy. These possibilities are characterized by the behavior of f stated in the following result.

Corollary 5.40 (a) ω(α−) − ω(α) = |{ f = α}|; in particular, ω is continuous at α if and only if |{ f = α}| = 0. (b) ω is constant in an open interval (α, β) if and only if |{α < f < β}| = 0, that is, if and only if f takes almost no values between α and β. Proof. Since |{ f ≥ α}| = |{ f > α}| + |{ f = α}|, part (a) follows immediately from Lemma 5.39(b). To prove part (b), note that |{α < f < β}| = |{ f > α}| − |{ f ≥ β}| = ω(α) − ω(β−). This is zero if and only if ω is constant in the half-open interval [α, β). However, since ω is continuous from the right, it is constant in (α, β) if and only if it is constant in [α, β). The rest of the theorems in this section give relations between Lebesgue and Riemann–Stieltjes integrals. As always, f is measurable and finite a.e. in E, |E| < + ∞ and ω = ωf ,E·

99

The Lebesgue Integral

Theorem 5.41

If a < f (x) ≤ b (a and b finite) for all x ∈ E, then

f =−

b

α dω(α).

a

E

Proof. The Lebesgue integral on the left exists and is finite since f is bounded and |E| < +∞. The Riemann–Stieltjes integral on the right exists by Theorem 2.24. To show that they are equal, partition the interval [a, b] by a = α0 < α1 < · · · < αk = b and let Ej = αj−1 < f ≤ αj . The Ej are disjoint and E = kj=1 Ej . Hence, E f = kj=1 Ej f and, therefore, k

k

αj−1 Ej ≤ f ≤ αj Ej .

j=1

E

j=1

By Lemma 5.38, Ej = ω αj−1 − ω αj = − ω αj − ω αj−1 . Hence, b these sums are Riemann–Stieltjes sums for − a α d ω(α). Since these sums b must converge to − a α d ω(α) as the norm of the partition tends to zero, the conclusion follows. Theorem 5.41 can be extended to the case when f is not bounded on E as follows. Theorem 5.42 Let f be any measurable function on E, and let Eab = {x ∈ E : a < f (x) ≤ b} (a and b finite). Then, Eab

f =−

b

α dω(α).

a

Proof. Let ωab (α) = {x ∈ Eab : f (x) > α}. Then ωab is the distribution func b tion of f on Eab . By Theorem 5.41, we have Eab f = − a α dωab (α). We claim b that the last expression equals − a α dω(α). By taking limits of Riemann– Stieltjes sums that approximate the integrals, we only need to show that for a ≤ α < β ≤ b. By Lemma 5.38, this is ωab (α) − ωab (β) = ω(α) − ω(β) equivalent to showing that x ∈ Eab : α < f (x) ≤ β = |{x ∈ E : α < f (x) ≤ β}| for such the definition of Eab and the restrictions on α and β. However, by α and β, x ∈ Eab : α < f (x) ≤ β = {x ∈ E : α < f (x) ≤ β}. This proves the claim and the theorem too.

100

Measure and Integral: An Introduction to Real Analysis

In both Theorems 5.41 and 5.42, the integrals of f are extended over sets where f is bounded. This restriction is removed in the next theorem, where we define (see p. 34 in Section 2.4) +∞

α dω(α) = lim

b

a→−∞ b→+∞ a

−∞

α dω(α),

if the limit exists.

Theorem 5.43 If either exists and is finite, and

Ef

or

+∞ −∞

f =−

α dω(α) exists and is finite, then the other +∞

α dω(α).

−∞

E

b Proof. By Theorem 5.42, Eab f = − a α dω(α). If f ∈ L(E), then Eab f → f + and f − . Therefore, E f as a → −∞, b → +∞ since this holds for both b lima→−∞, b→+∞ − a α dω(α) exists and equals E f , which proves half of the theorem. ∞ +∞ Now suppose that −∞ α dω(α) exists and is finite. Then 0 α dω(α) ∞ is finite, and we claim that E f + = − 0 α dω(α). By Theorem 5.42, for b ∞ b > 0, E0b f = − 0 α dω(α). Therefore, as b → +∞, E0b f → − 0 α dω(α). On the other hand, as b increases to +∞, E0b {0 < f < +∞}. Therefore, E0b

f =

E0b

f+ →

{0 0.

(5.49)

{ f >α}

The proof is left as an exercise. Hence, if f is in Lp (E), then αp ω(α) remains bounded as α → + ∞. A stronger result is actually true.

105

The Lebesgue Integral

If 0 < p < ∞ and f ∈ Lp (E), then

Lemma 5.50

lim αp ω(α) = 0.

α→+∞

Proof. This will be a corollary of (5.49) if we show that { f >α} f p → 0 as α → +∞. We may suppose that α runs through a sequence αk → +∞. Let p fk = f wherever f > αk and fk = 0 elsewhere. Then { f >αk } f p = E fk . Since f is p

finite a.e., fk → 0 a.e. Moreover, 0 ≤ fk ≤ |f |p ∈ L(E), and the result follows from the dominated convergence theorem. In the next theorem, we use Lemma 5.50 to integrate the Riemann–Stieltjes integral in (5.48) by parts.

Theorem 5.51

If 0 < p < ∞, f ≥ 0, and f ∈ Lp (E), then E

fp = −

∞

αp dω(α) = p

0

∞

αp−1 ω(α)dα,

0

where the last integral may be interpreted as either a Lebesgue or an improper Riemann integral. Proof. The first equality is just (5.48). For the second, if 0 < a < b < +∞, we have −

b

αp dω(α) = −bp ω(b) + ap ω(a) + p

a

b

αp−1 ω(α) dα,

a

by Theorem 2.21 and the fact that αp is continuously differentiable on [a, b]. Here, together with Theorem 2.21, we use the fact that for partitions = {αk } of [a, b] and intermediate points {βk } satisfying αk < βk < αk+1 and p p p−1 αk+1 − αk , we have αk+1 − αk = pβk b a

p−1 ω(α) d αp = lim ω (βk ) pβk αk+1 − αk ||→0

=p

b a

k

αp−1 ω(α)dα.

106

Measure and Integral: An Introduction to Real Analysis

The last integral exists in the Riemann sense by Theorem 5.54 since the number of discontinuities of ω is at most countable. By Theorem 5.52, the last integral may also be interpreted in the Lebesgue sense. Now let a → 0 and b → +∞. Then bp ω(b) → 0 by Lemma 5.50, ap ω(a) → 0 since |E| < +∞ (see also Exercise 14), and the theorem follows. Note that in case 1 ≤ p < ∞, the proof works with a = 0 and no other changes. For an extension of Theorem 5.51, see Exercise 16. See also Exercise 5 of Chapter 6. In practice, the representation of E |f |p as a Riemann–Stieltjes integral provides a powerful tool for determining whether or not f ∈ Lp (E).

5.5 Riemann and Lebesgue Integrals We now study a relation between Lebesgue and Riemann integrals over finite bounded functions intervals [a, b] in R1 and give a characterization of those that are Riemann integrable. The Lebesgue integral [a,b] f will be denoted by b b a f and the Riemann integral by (R) a f . Theorem 5.52 Let f be a bounded function that is Riemann integrable on [a, b]. Then f ∈ L[a, b] and b

f = (R)

a

b

f.

a

Proof. Let {k } be a sequence of partitions of [a, b] with norms tending to (k) (k) zero. For each k, define two simple functions as follows: if x1 < x2 < · · · are the partitioning points of k , let lk (x) and uk (x) be defined in each semiopen

(k) (k) (k) interval [x(k) i , xi+1 ) as the inf and sup of f on xi , xi+1 , respectively. Then lk and uk are uniformly bounded and measurable in [a, b), and if Lk and Uk denote the lower and upper Riemann sums of f corresponding to k , we have

b a

lk = Lk ,

b

uk = Uk .

a

Note also that lk ≤ f ≤ uk on the half-open interval [a, b) and, if we assume that k+1 is a refinement of k , that lk and uk . Let l = limk→∞ lk and u = limk→∞ uk . Then l and u are measurable, l ≤ f ≤ u on [a, b), and, by b b the bounded convergence theorem, Lk → a l and Uk → a u. But since

107

The Lebesgue Integral

f is Riemann integrable, Lk and Uk both converge to (R) Therefore, b (R)

f =

a

b

l=

b

a

b a

f by Theorem 2.29.

u.

a

Since u − l ≥ 0, Theorem 5.11 implies that l = f = u a.e. in [a, b]. Therefore, f b b is measurable and (R) a f = a f , which completes the proof. Theorem 5.52 says that any function that is Riemann integrable is also Lebesgue integrable and that the two integrals are equal. There are, of course, bounded functions that are Lebesgue integrable but not Riemann integrable. One such is the Dirichlet function defined for 0 ≤ x ≤ 1 by letting f (x) = 1 if x is rational and f (x) = 0 if x is irrational. Since f = 0 except for a subset of [0,1] of measure zero, its Lebesgue integral is 0. On the other hand, its Riemann integral does not exist since every upper Riemann sum is 1 and every lower Riemann sum is 0. The practical value of Theorem 5.52 is that it allows us to compute the Lebesgue integral of Riemann integrable (e.g., continuous) functions. Using the monotone convergence theorem, we can easily extend Theorem 5.52 to include improper Riemann integrals of nonnegative functions. Special as it is, the following result is useful in applications. Theorem 5.53 Let f be nonnegative on a finite interval [a, b] and Riemann integrable (so, in particular, bounded) over every subinterval [a + ε, b], ε > 0. Define the improper Riemann integral b

I = lim (R) ε→0

f,

a+ε

0 ≤ I ≤ +∞. Then f is measurable on [a, b] and b

f = I.

a

Proof. Observe that by Theorem 5.52, for every ε > 0, f is measurable on b b [a + ε, b] and a+ε f = (R) a+ε f . Measurability of f on [a, b] follows easily from b Theorem 4.12 by letting ε → 0. The formula a f = I follows similarly from the monotone convergence theorem.

108

Measure and Integral: An Introduction to Real Analysis

Similar results hold for improper Riemann integrals over infinite intervals. For example, if a ∈ (−∞, ∞) and f is nonnegative on [a, ∞) and Riemann integrable on every [a, N], a < N < ∞, then f is measurable on [a, ∞) and N ∞ f = limN→∞ (R) a f . Verification is left to the reader; note that since a N (R) a f increases with N, the limit exists but may be +∞. We note in passing that the finiteness of the improper Riemann integral of an f that is not nonnegative does not in general imply that f is integrable (see Exercise 7). Our final result is a characterization of those bounded functions that are Riemann integrable. Theorem 5.54 A bounded function is Riemann integrable on [a, b] if and only if it is continuous a.e. in [a, b]. Proof. Suppose that f is bounded and Riemann integrable. Let k , lk , uk , etc., be as in the proof of Theorem 5.52. Let Z be the set of measure zero outside which l = f = u. We claim that if x is not a partitioning point of any k and if x∈ / Z, then f is continuous at x. In fact, if f is not continuous at x and x is never a partitioning point, there exists ε > 0, depending on x but not on k, such that / Z. uk (x)−lk (x) ≥ ε. This implies that u(x)−l(x) ≥ ε, which is impossible if x ∈ Therefore, f is continuous a.e. in [a, b]. To prove the converse, let f be a bounded function that is continuous a.e. in [a, b]. Let k be any sequence of partitions with norms tending to zero, and define the corresponding lk , uk , Lk , and Uk as in Theorem 5.52. Note that lk and uk may not be monotone since k+1 may not be a refinement of k . However, by the continuity of f , both lk and uk converge a.e. to f . Hence, b b b by the bounded convergence theorem, a lk and a uk both converge to a f . b b Since Lk = a lk and Uk = a uk , it follows that the upper and lower Riemann sums converge to the same limit. Therefore, f is Riemann integrable.

Exercises 1. If f is a simple measurable function (not necessarily nonnegative) tak N ing values aj on Ej , j = 1, 2, . . . , N, show that E f = j=1 aj Ej . (Use Theorem 5.24.) 2. Show that the conclusions of Theorem 5.32 are not generally true without the assumption that φ ∈ L(E). (In part (ii), for example, take fk = χ(k,∞) .) Show that Theorem 5.33 fails without the assumption that |E| < ∞.

109

The Lebesgue Integral

3. Let fk be a sequence of nonnegative measurable functions defined on E. If fk → f and fk ≤ f a.e. on E, show that E fk → E f . 4. If f ∈ L(0, 1), show that xk f (x) ∈ L(0, 1) for k = 1, 2, . . . , and that 1 k 0 x f (x) dx → 0. 5. Use Egorov’s theorem to prove the bounded convergence theorem. 6. Let f (x, y), 0 ≤ x, y ≤ 1, satisfy the following conditions: for each x, f (x, y) is an integrable function of y, and (∂f (x, y)/∂x) is a bounded function of (x, y). Show that (∂f (x, y)/∂x) is a measurable function of y for each x and ∂ d f (x, y) dy. f (x, y) dy = dx ∂x 1

1

0

0

7. Give an example of an f that is not integrable, but whose improper Riemann integral exists and is finite. 8. Prove (5.49).

p m 9. If p > 0 and E f − fk → 0 as k → ∞, show that fk −→f on E (and thus that there is a subsequence fkj → f a.e. in E). p p 10. If p > 0, E f − fk → 0, and E fk ≤ M for all k, show that E |f |p ≤ M.

11. For which p > 0 does 1/x ∈ Lp (0, 1)? Lp (1, ∞)? Lp (0, ∞)? 12. Give an example of a bounded continuous f on (0, ∞) such that limx→∞ f (x) = 0 but f ∈ / Lp (0, ∞) for any p > 0. 13. (a) Let fk be a sequence of measurable on E. Show that fk functions converges absolutely a.e. in E if E fk < +∞. (Use Theorems 5.16 and 5.22.) (b) If {rk } denotes the in [0,1] and {ak } satisfies |ak | < rational numbers +∞, show that ak |x − rk |−1/2 converges absolutely a.e. in [0,1]. 14. Prove the following result (which is obvious if |E| < +∞), describing the behavior of ap ω(a) as a → 0+. If f ∈ Lp (E), then lima→0+ ap ω(a) = 0. (If f ≥ 0, ε > 0, choose δ > 0 so that { f ≤δ} f p < ε. Thus, ap [ω(a) − ω(δ)] ≤ p {a 0, show that f ∈ Lr , 0 < r < p. k kp 18. If f ≥ 0, show that f ∈ Lp if and only if +∞ < +∞. (Use k=−∞ 2 ω 2 Exercise 16.) 0

19. Derive analogues of Theorems 5.52 and 5.54 for integrals over intervals in Rn , n > 1. 20. Let y = Tx be a nonsingular linear transformation of Rn . If E f (y) dy exists, show that E

f (y) dy = | det T|

f (Tx) dx.

T−1 E

(The case when f = χE1 , E1 ⊂ E, follows from integrating the formula χE1 (Tx) = χT−1 E1 (x) over T−1 E and then applying Theorem 3.35.) 21. If A f = 0 for every measurable subset A of a measurable set E, show that f = 0 a.e. in E. 22. Show that the conclusion ofthe dominated convergence theo Lebesgue rem can be strengthened to E fk − f → 0. 23. Prove the following fact, sometimes referred to as the Sequential (or Gener alized) Version of the Lebesgue Dominated Convergence Theorem. Let fk and {φk } be sequences of measurable functions on E satisfying fk → f a.e. in E, φk → φ a.e. in E, and fk ≤ φk a.e. in E. If φ ∈ L(E) and E φk → E φ, then f → 0. (In case f = 0 and all fk ≥ 0, apply Fatou’s lemma E fk − to φk − fk .) An application is given in Exercise 12of Chapter 8; for ≥ 0, f → f a.e. in E, f ∈ L(E), and f → example, if f k k k E E f , then E fk − f → 0. 24. A measurable function f on E is said to belong to weak Lp (E), 0 < p < ∞, if there is a constant A ≥ 0 such that ω|f | (α) ≤ Aα−p for all α > 0 (cf. (7.8) in case p = 1): (a) Show that if f ∈ Lp (E), then f belongs to weak Lp (E), but that the converse is generally false.

(b) Show that if 1 < p < r < ∞ and f belongs to both weak L1 (E) and weak Lr (E), then f ∈ Lp (E). (c) Show that if f belongs to weak L1 (E) and f is bounded on E, then f ∈ Lp (E) for all 1 < p < ∞. 25. Give an example to show that the analogue of Theorem 5.8 with the roles of sup and inf interchanged is false. 26. Prove the following variant of Lebesgue’s dominated convergence theo m rem: if fk satisfies fk −→f on E and fk ≤ φ ∈ L(E), then f ∈ L(E) and

111

The Lebesgue Integral

that every subsequence of fk has a subsequence E fk → E f . (Show fkj such that E fkj → E f .) 27. The notion of equimeasurability of functions can be extended to different sets E1 and E2 , even in different dimensions, by saying that two measurable functions f1 , f2 defined on E1 , E2 , respectively, are equimeasurable if x ∈ E1 : f1 (x) > α = y ∈ E2 : f2 (y) > α

for all α.

(a) Show that if f is measurable and finite a.e. in E and ωf is strictly decreasing and continuous, then f and the inverse function of ωf are equimeasurable (on E and (0, |E|), respectively). (b) Let f be measurable and finite a.e. in E, and suppose that ωf is finite. Define f ∗ (t) = inf α > 0 : ωf (α) ≤ t , t > 0. Show that f and f ∗ are equimeasurable on E and (0, ∞), respectively. (The function f ∗ is called the nonincreasing rearrangement of f .) 28. Let E be a measurable set in Rn with |E| < ∞. Suppose that f > 0 a.e. in E and f , log f ∈ L1 (E). Prove that ⎛ ⎞ ⎞1/p 1 1 lim ⎝ f p ⎠ = exp ⎝ log f ⎠ . |E| p→0+ |E| ⎛

E

E

(Start by using Theorem 5.36 to show that E f p → |E| as p → 0+. Note that f p − 1 /p → log f .) 29. Let f be measurable, nonnegative, and finite a.e. in a set E. Prove that for any nonnegative constant c, E

ecf (x) dx = |E| + c

∞

ecα ωf (α) dα.

0

Deduce that ecf ∈ L(E) if |E| < ∞ and there exist constants C1 and c1 such that c1 > c and ωf (α) ≤ C1 e−c1 α for all α > 0. We will study such an exponential integrability property in Section 14.5.

6 Repeated Integration Let f (x, y) be defined in a rectangle I=

x, y : a ≤ x ≤ b, c ≤ y ≤ d .

If f is continuous, we have the classical formula ⎡ ⎤ b d f x, y dxdy = ⎣ f x, y dy⎦ dx, I

a

c

and there is an analogous formula for functions of n variables. Sections 6.1 and 6.2 extend this and related results on repeated integration to the case of Lebesgue integrable functions. Section 6.3 contains some applications.

6.1 Fubini’s Theorem We shall use the following notation. Let x = (x1 , . . ., xn ) be a point of an n-dimensional interval I1 , I1 = {x = (x1 , . . . , xn ) : ai ≤ xi ≤ bi , i = 1, . . . , n} , and let y be a point of an m-dimensional interval I2 , I2 = y = y1 , . . . , ym : cj ≤ yj ≤ dj , j = 1, . . . , m . Here, I1 and I2 may also be partly open or unbounded, such as all of Rn and Rm , respectively. The Cartesian product I = I1 × I2 is contained in Rn+m and consists of points (x1 , . . ., xn , y1 , . . ., ym ). We shall denote such points by (x,y). A function m ) defined in I will be written f (x,y), and its f (x1 , . . ., xn , y1 , . . ., y integral I f will be denoted by I f (x, y) dxdy.

113

114

Measure and Integral: An Introduction to Real Analysis

Theorem 6.1 (Fubini’s Theorem) Let f (x, y) ∈ L(I), I = I1 × I2 . Then (i) For almost every x ∈ I1 , f (x, y) is measurable and integrable on I2 as a function of y; (ii) As a function of x, I2 f (x, y) dy is measurable and integrable on I1 , and ⎡ ⎤ f (x, y) dxdy = ⎣ f (x, y) dy⎦ dx. I

I1

I2

Setting f = 0 outside I, we see that it is enough to prove the theorem when we then drop I1 , I2 , and I from I1 = Rn , I2 = Rm , and I = Rn+m . For simplicity, the notation and write f (x, y) dx for I1 f (x, y) dx, L(dx) for L(I1 ), L(dxdy) for L(I), etc. We will prove the theorem by considering a series of special cases. The first two lemmas below will help in passing from one case to the next. In these lemmas, we say that a function f in L(dxdy) for which Fubini’s theorem is true has property F .

Lemma 6.2 property F .

A finite linear combination of functions with property F has

This follows immediately from Theorems 4.9 and 5.28. Lemma 6.3 Let f1 , f2 , . . ., fk , . . . have property F . If fk f or fk f , and if f ∈ L(dxdy), then f has property F . Proof. We will concentrate on integrability properties, leaving questions of measurability to the reader. Changing signs if necessary, we may assume that a set Zk in Rn with measure zero fk f . For each k, there exists by hypothesis

/ Zk . Let Z = k Zk , so that Z has Rn -measure such that fk (x, y) ∈ L(dy) if x ∈ zero. If x ∈ / Z, then fk (x, y) ∈ L(dy) for all k, and therefore, by the monotone convergence theorem applied to {fk (x, y)} as functions of y, hk (x) = fk (x, y) dy h(x) = f (x, y) dy (x ∈ / Z). By assumption, we have hk (x) ∈ L(dx), fk ∈ L(dxdy), and fk (x, y) dxdy = another application of the monotone convergence thehk (x) dx. Therefore, orem gives f (x, y) dxdy = h(x) dx. Since f ∈ L(dxdy), it follows that h ∈ L(dx), which implies that h is finite a.e. This completes the proof. The next three lemmas prove special cases of Fubini’s theorem.

115

Repeated Integration

Lemma 6.4 If E is a set of type Gδ , namely, E = measure, then χE has property F .

∞

k=1

Gk , and if G1 has finite

Proof. Case 1. Suppose that E is a bounded open interval in Rn+m : E = J1 × J2 , where J1 and J2 are bounded open intervals in Rn and Rm , respectively. Then |E| = |J1 ||J2 |, where |J1 | and |J2 | denote the measures of J1 and J2 in Rn and m R . For every x, χE (x, y) is clearly measurable as a function of y. If h(x) = χE (x, y) dy, then h(x) = |J2| for x ∈ J1 , and h(x) = 0 otherwise. Therefore, h(x) dx = |J1 ||J2 |. But also, χE (x, y) dxdy = |E| = |J1 ||J2 |, and the lemma is proved in this case. Case 2. Suppose that E is any set (of type Gδ or not) on the boundary of m an interval in Rn+m . Then for almost every x, the set {y : (x, y) ∈ E} has R measure zero. Therefore, if h(x) = χE (x, y) dy, it follows that h(x) = 0 a.e. Hence, h(x) dx = 0. But also, χE (x, y) dxdy = |E| = 0. Case 3. Suppose next that E is a partly open interval in Rn+m . Then E is the union of its interior and a subset of its boundary. It follows from cases 1 and 2 and Lemma 6.2 that χE has property F .

Case 4. Let E be an open set in Rn+m with finite measure. Write E = Ij ,

k where the Ij are disjoint, partly open intervals. If Ek = j=1 Ij , then χEk = k j=1 χIj , so that χEk has property F by case 3 and Lemma 6.2. Since χEk χE , χE has property F by Lemma 6.3. Case 5. Let E satisfy the hypothesis of Lemma 6.4. We may assume that Gk E by considering the open sets G1 , G1 ∩ G2 , G1 ∩ G2 ∩ G3 , etc. Then χGk χE , and the lemma follows from case 4 and Lemma 6.3. Lemma 6.5 If Z is a subset of Rn+m with measure zero, then χZ has property F . Hence, for almost every x ∈ Rn , the set {y : (x, y) ∈ Z} has Rm -measure zero. Proof. Using Theorem 3.8, select a set H of type Gδ such that Z ⊂ H and |H| = 0. If H = Gk , we may assume that G1 has finite measure, so that by Lemma 6.4,

χH (x, y) dxdy = 0. χH (x, y) dy dx =

Therefore, by Theorem 5.11, |{y : (x, y) ∈ H}| = χH (x, y) dy = 0 for almost every x. If {y : (x, y) ∈ H) has Rm -measure zero, so does {y : (x, y) ∈ Z} since every x, χZ (x, y) is measurable in y and Z ⊂ H. It follows that for almost y) dy = 0. Hence, [ χ (x, y) dy] dx = 0, which proves the lemma χZ (x, Z since χZ (x, y) dxdy = |Z| = 0.

116

Lemma 6.6 property F .

Measure and Integral: An Introduction to Real Analysis

Let E ⊂ Rn+m . If E is measurable with finite measure, then χE has

Proof. Using Theorem 3.28, write E = H − Z, where H is of type Gδ and Z has measure zero. If H = Gk , choose G1 with finite measure (see the proof of Theorem 3.28). Since χE = χH − χZ , the result follows from Lemmas 6.2, 6.4, and 6.5. Proof of Fubini’s theorem. We must show that every f ∈ L(dxdy) has property F . Since f = f + − f − , we may assume by Lemma 6.2 that f ≥ 0. Then, by Theorem 4.13, there are simple measurable fk f , fk ≥ 0. Each fk ∈ L(dxdy), and by Lemma 6.3, it is enough to show that these have property F . Hence, we may assume that f is simple and integrable, say f = N j=1 vj χEj . Since each Ej for which vj = 0 must have finite measure, the result follows from Lemmas 6.2 and 6.6. If f ∈ L(Rn+m ), then by Fubini’s theorem, f (x,y) is a measurable function of y for almost every x ∈ Rn . We now show that the same conclusion holds if f is merely measurable. Theorem 6.7 Let f (x, y) be a measurable function on Rn+m . Then for almost every x ∈ Rn , f (x, y) is a measurable function of y ∈ Rm . In particular, if E is a measurable subset of Rn+m , then the set Ex = {y : (x, y) ∈ E} is measurable in Rm for almost every x ∈ Rn . Proof. Note that if f is the characteristic function χE of a measurable E ⊂ Rn+m , then the two statements of the theorem are equivalent. To prove the result in this case, write E = H ∪ Z, where H is of type Fσ in Rn+m and |Z|n+m = 0. Then Ex = Hx ∪ Zx , Hx is of type Fσ in Rm , and for almost every x ∈ Rn , |Zx |m = 0 by Lemma 6.5. Therefore, Ex is measurable for almost every x. If f is any measurable function on Rn+m , consider the set E(a) = {(x, y) : f (x, y) > a}. Since E(a) is measurable in Rn+m , the set E(a)x = {y : (x, y) ∈ E(a)} is measurable in Rm for almost every x ∈ Rn . The exceptional set of Rn -measure zero depends on a. The union Z of these exceptional sets for all / Z, then {y : f (x, y) > a} is mearational a still has Rn -measure zero. If x ∈ surable for all rational a and so for all a by Theorem 4.4. This completes the proof. We will now extend Fubini’s theorem to functions defined on measurable subsets of Rn+m .

117

Repeated Integration

Theorem 6.8 Let f (x, y) be a measurable function defined on a measurable subset E of Rn+m , and let Ex = {y : (x, y) ∈ E}.

(i) For almost every x ∈ Rn , f (x, y) is a measurable function of y on Ex . (ii) If f (x, y) ∈ L(E), then for almost every x ∈ Rn , f (x, y) is integrable on Ex with respect to y; moreover, Ex f (x, y) dy is an integrable function of x and

f (x, y) dxdy =

Rn

E

⎡

⎤

⎣ f (x, y) dy⎦ dx. Ex

Proof. Let f¯ be the function equal to f in E and to zero elsewhere in Rn+m . Since f is measurable on E, f¯ is measurable on Rn+m . Therefore, by Theorem 6.7, f¯ (x, y) is a measurable function of y for almost every x ∈ Rn . Since Ex is measurable for almost every x ∈ Rn , it follows that f (x, y) is measurable on almost every Ex . This proves (i). If f ∈ L(E), then f¯ ∈ L(Rn+m ) and

f (x, y) dxdy =

E

Rn+m

f¯ (x, y) dxdy =

Rn

f¯ (x, y) dy dx.

Rm

Since Ex is measurable for almost every x, we obtain by Theorem 5.24 that Rm f¯ (x, y) dy = Ex f (x, y) dy for almost every x ∈ Rn . Part (ii) follows by combining equalities.

6.2 Tonelli’s Theorem By Fubini’s theorem, the finiteness of a multiple integral implies that of the corresponding iterated integrals. The converse is not true, even if all the iterated integrals are equal, as shown by the following example. Example 6.9 Let n = m = 1 and let I be the unit square and {Ik } be the infinite sequence of subsquares shown in the following illustration:

118

Measure and Integral: An Introduction to Real Analysis

1

(1,1) I3 I2

1 2

I1

1 2

1

Subdivide each Ik into four equal subsquares by lines parallel to the x- and y-axes.

Ik(4)

Ik(3)

Ik(1)

Ik(2)

Ik (1)

(3)

For each k, let f = 1/|Ik | on the interiors of Ik and Ik and let f = −1/|Ik |

on the interiors of Ik(2) and Ik(4) . Let f = 0 on the rest of I, that is, outside Ik 1 and on the boundaries of all the subsquares. Clearly, 0 f (x, y) dy = 0 for all 1 x, and 0 f (x, y) dx = 0 for all y. Therefore, ⎡ ⎤ ⎡ ⎤ 1 1 1 1 ⎣ f (x, y) dy⎦ dx = ⎣ f (x, y) dx⎦ dy = 0. 0

0

0

0

However, I

f + (x, y) dxdy =

k

Ik

f + (x, y) dxdy =

1 k

2

= +∞.

119

Repeated Integration

Similarly, I f − dxdy = +∞. Hence, finiteness of the iterated integrals of f does not in general imply either the existence of the multiple integral of f or the finiteness of the multiple integral of | f |. However, for nonnegative f , we have the following basic result.

Theorem 6.10 (Tonelli’s Theorem) Let f (x, y) be nonnegative and measurable on an interval I = I1 ×I2 of Rn+m . Then, for almost every x ∈ I1 , f (x, y) is a measurable function of y on I2 . Moreover, as a function of x, I2 f (x, y) dy is measurable on I1 , and

f (x, y) dxdy =

I

⎤ ⎡ ⎣ f (x, y) dy⎦ dx.

I1

I2

Proof. This is actually a corollary of Fubini’s theorem. For k = 1, 2, . . ., let fk (x, y) = 0 if |(x, y)| > k and fk (x, y) = min {k, f (x, y)} if |(x, y)| ≤ k. Then fk ≥ 0, fk f on I, and fk ∈ L(I) (fk is bounded and vanishes outside a compact set). Hence, Fubini’s theorem applies to each fk . The statement concerning the measurability of I2 f (x, y) dy then follows from its analogue for fk ; in fact, by the monotone convergence theorem, I2 fk (x, y) dy I2 f (x, y) dy. (The measurability of f (x,y) as a function of y was proved in Theorem 6.8.) By the monotone convergence theorem again,

fk (x, y) dxdy →

I

I1

f (x, y) dxdy,

and

I

⎤ ⎤ ⎡ ⎡ ⎣ fk (x, y) dy⎦ dx → ⎣ f (x, y) dy⎦ dx. I2

I1

I2

Since fk ∈ L(I), the left-hand sides in the last two limits are equal. Therefore, so are the right-hand sides, and the theorem follows. An extension of Tonelli’s theorem to functions defined over arbitrary measurable sets E is straightforward. Since the roles of x and y can be interchanged above, it follows that if f is nonnegative and measurable, then I

f (x, y) dxdy =

I1

⎤ ⎤ ⎡ ⎡ ⎣ f (x, y) dy⎦ dx = ⎣ f (x, y) dx⎦ dy. I2

I2

I1

In particular, we obtain the important fact that for f ≥ 0, the finiteness of any one of Fubini’s three integrals implies that of the other two. Hence, for any measurable

120

Measure and Integral: An Introduction to Real Analysis

f , the finiteness of one of these integrals for |f | implies that f is integrable and that all three Fubini integrals of f are equal. An easy consequence of Tonelli’s theorem is that the conclusion

f (x, y) dxdy =

I

I1

⎡ ⎤ ⎣ f (x, y) dy⎦dx I2

of Fubini’s theorem (including the existence and measurability of the inner integral on theright-hand side) holds for measurable f even if I f = ±∞ (i.e., it holds if I f merely exists). In fact, if I f = +∞, we have I f + = +∞ and f − ∈ L(I). By Tonelli’s theorem, I

Since

I

f+ =

I1

⎛ ⎝

⎞ f + dy⎠dx,

I2

I

f− =

I1

⎛ ⎝

⎞ f − dy⎠dx.

I2

f − is finite, the desired formula follows by subtraction.

6.3 Applications of Fubini’s Theorem We shall derive several important results as corollaries of Fubini’s and Tonelli’s theorems. The first one is the necessity of the condition in Theorem 5.1. Using the notation of Chapter 5, we will prove the following result. Theorem 6.11 Let f be a nonnegative function defined on a measurable set E ⊂ Rn . If R(f , E), the region under f over E, is a measurable subset of Rn+1 , then f is measurable. Proof. For 0 ≤ y < +∞, we have {x ∈ E : f (x) ≥ y} = {x : (x, y) ∈ R(f , E)}. Since R( f ,E) is measurable, it follows from Theorem 6.7 that {x ∈ E : f (x) ≥ y} is measurable (in Rn ) for almost all (linear measure) such y. In particular, {x ∈ E : f (x) ≥ y} is measurable for all y in a dense subset of (0, ∞). If y is negative, then {x ∈ E : f (x) ≥ y} = E, which is measurable. We conclude that f is measurable (cf. Theorem 4.4). As a second application of Fubini’s theorem, we will prove a result about the convolution of two functions. More general results of this kind will be proved

121

Repeated Integration

in Chapter 9. If f and g are measurable in Rn , their convolution (f ∗ g)(x) is defined by (f ∗ g)(x) =

f (x − t)g(t) dt,

Rn

provided the integral exists. We first claim that f ∗ g = g ∗ f , that is, that

f (x − t)g(t) dt =

Rn

f (t)g(x − t) dt.

(6.12)

Rn

This is actually a special case of results dealing with changes of variable. In this simple case, however, it amounts to the statement that if x ∈ Rn , then

r(t) dt =

Rn

r(x − t) dt

(6.13)

Rn

when r(t) = f (x − t)g(t). For fixed x, x – t ranges over Rn as t does. Therefore, for any measurable r ≥ 0, (6.13) follows from the geometric interpretation of the integral. (See Theorem 5.1 and the definition of the integral of a nonnegative function.) For any measurable r, it follows by writing r = r+ − r− . (For the effect of a linear change of variables, see Exercise 20 of Chapter 5, and see Exercise 18 of Chapter 3 for the effect of translations. Measurability of r(x − t) as a function of t can then be deduced from that of r(t).) The result we wish to prove for convolutions is the following. Theorem 6.14 If f ∈ L(Rn ) and g ∈ L(Rn ), then (f ∗ g)(x) exists for almost every x ∈ Rn and is measurable. Moreover, f ∗ g ∈ L(Rn ) and

| f ∗ g| dx ≤

Rn

Rn

|f | dx

Rn

(f ∗ g) dx =

f dx

Rn

Rn

|g| dx , g dx .

Rn

In order to prove this, we need a lemma. Lemma 6.15 If f (x) is measurable in Rn , then the function F(x, t) = f (x − t) is measurable in Rn × Rn = R2n .

122

Measure and Integral: An Introduction to Real Analysis

Proof. Let F1 (x, t) = f (x). Since f is measurable, it follows as in Lemma 5.2 (or by n-fold iteration of the conclusion of Lemma 5.2) that F1 (x, t) is measurable in R2n : in fact, the set {(x, t) : F1 (x, t) > a}, which equals {(x, t) : f (x) > a, t ∈ Rn }, is a cylinder type set with measurable base {x : f (x) > a) in Rn . For (ξ, η) ∈ R2n , consider the transformation x = ξ − η, t = ξ + η. This is a nonsingular linear transformation of R2n , and therefore, by Theorem 3.33 (see Exercise 4 of Chapter 4), the function F2 defined by F2 (ξ, η) = F1 (ξ −η, ξ +η) is measurable in R2n . Since F2 (ξ, η) = f (ξ − η), the lemma follows. Proof of Theorem 6.14 Suppose first that both f and g are nonnegative on Rn . By Lemma 6.15, f (x − t)g(t) is measurable on Rn × Rn since it is the product of two such functions. Hence, the integral I=

f (x − t)g(t) dtdx

is well defined. By Tonelli’s theorem and (6.13), f (x − t)g(t) dt dx

= g(t) f (x − t) dx dt = f (x) dx g(t) dt .

I=

The first of these equations can be written I = (f ∗ g)(x) dx, where measurability of f ∗ g is guaranteed by Tonelli’s theorem, so that

f (x) dx g(x) dx . f ∗ g (x) dx = This proves the theorem for f ≥ 0 and g ≥ 0. For general f , g ∈ L(Rn ), it follows that

| f (x − t)| |g(t)| dtdx =

g dx < ∞. f dx f ∗ g dx =

Hence, f (x − t)g(t) ∈ L(dtdx). By Fubini’s theorem, a.e. x and is measurable and integrable; also,

f (x − t)g(t) dtdx =

f (x − t)g(t) dt exists for

f (x − t)g(t) dt dx = f (x) dx g(x) dx .

In particular, f ∗ g is well-defined a.e. and measurable (even integrable), and

(f ∗ g) =

f g .

123

Repeated Integration

Finally, since | f ∗g| ≤ | f |∗|g| wherever f ∗g exists, we obtain by integration that

| f ∗ g| ≤

(| f | ∗ |g|) =

|f|

|g| ,

and the proof is complete. See Exercise 21 of Chapter 9 for a criterion which guarantees measurability of f ∗ g. In the proof of Theorem 6.14, we showed the following useful fact. Corollary 6.16 If f and g are nonnegative and measurable on Rn , then f ∗ g is measurable on Rn and Rn

f ∗ g dx =

Rn

f dx

g dx .

Rn

Our final application of Fubini’s theorem is an important result due to Marcinkiewicz concerning the structure of closed sets. For simplicity, we will restrict our attention to the one-dimensional case. Given a closed subset F of R1 and a point x, let δ (x) = δ (x, F) = min x − y : y ∈ F denote the distance of x from F. Thus, δ(x)

= 0 if and only if x ∈ F. By Theorem 1.10, the complement of F is a union k (ak , bk ) of disjoint open intervals. At most, two of these intervals can be infinite. The graph of δ(x) is thus an irregular sawtooth curve: over any finite interval [ak , bk ], the graph is the sides of the isosceles triangle with base [ak , bk ] and altitude 12 (bk − ak ) ; outside the terminal points of F, the graph is linear. If we move from a point x to a point y, the distance from F cannot increase by more than |x − y|. Hence, δ (x) − δ y ≤ x − y ; that is, δ satisfies a Lipschitz condition. We shall prove the following theorem. (See also Exercises 7 through 9 and Theorem 9.19.) The result will be used in the proof of Theorem 12.67.

Theorem 6.17 (Marcinkiewicz) Let F be a closed subset of a bounded open interval (a, b), and let δ(x) = δ(x, F) be the corresponding distance function. Then, given λ > 0, the integral

124

Measure and Integral: An Introduction to Real Analysis

Mλ (x) = Mλ (x; F) =

b a

δλ y dy x − y1+λ

is finite a.e. in F. Moreover, Mλ ∈ L(F) and

Mλ ≤ 2λ−1 |G| ,

F

where G = (a, b) − F. Before the proof, we note that what makes the finiteness of Mλ (x) remarkable is the singular behavior of δλ y /|x − y|1+λ as y → x. Since δ(y) → δ(x) / F (see Exercise 9). If x ∈ F, then as y → x, it follows that Mλ (x) = +∞ if x ∈ δ(x) = 0, but the mere Lipschitz character of δ in the estimate δ y − δ (x)λ x − yλ δλ y 1 1+λ = 1+λ ≤ 1+λ = x − y x − y x − y x − y b is not enough to imply that Mλ (x) is finite since a dy/|x − y| = +∞. The key to the convergence of Mλ at a point x is the fact that δ(y) vanishes not only at x but also at every y ∈ F. Thus, roughly speaking, the finiteness of Mλ (x) means that F is very dense near x. In this regard, see also Exercise 9(b). Proof. Measurability of Mλ follows from Corollary 6.16. Since δ = 0 in F, integration in the integral defining Mλ can be restricted to the set G = (a, b)−F without changing Mλ . Thus,

Mλ (x) dx =

F

G

dx δ y dy, x − y1+λ λ

F

the change in the order of integration being justified by Tonelli’s theorem. To estimate the inner integral, fix y ∈ G and note that for any x ∈ F, we have |x − y| ≥ δ(y) > 0. Thus, F

dx ≤ x − y1+λ

∞ dt −λ dx −1 = 2 = 2λ δ y . 1+λ t1+λ x − y δ(y) |x−y|≥δ(y)

125

Repeated Integration

In particular,

Mλ (x) dx ≤

F

δ

λ

−1 −λ dy = 2λ−1 |G| < +∞. y 2λ δ y

G

Exercises 1. (a) Let E be a measurable subset of R2 such that for almost every x ∈ R1 , {y: (x, y) ∈ E) has R1 -measure zero. Show that E has measure zero and that for almost every y ∈ R1 , {x: (x, y) ∈ E} has measure zero. (b) Let f (x, y) be nonnegative and measurable in R2 . Suppose that for almost every x ∈ Rl , f (x, y) is finite for almost every y. Show that for almost every y ∈ Rl , f (x, y) is finite for almost every x. 2. If f and g are measurable in Rn , show that the function h(x, y) = f (x)g(y) is measurable in Rn ×Rn . Deduce that if E1 and E2 are measurable subsets of Rn , then their Cartesian product E1 × E2 = {(x, y) : x ∈ E1 , y ∈ E2 } is measurable in Rn × Rn , and |E1 × E2 | = |E1 ||E2 |. As usual in measure theory, 0 · ∞ and ∞ · 0 are interpreted as 0. 3. Let f be measurable and finite a.e. on [0,1]. If f (x) − f (y) is integrable over the square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, show that f ∈ L[0, 1]. 4. Let f be measurable and periodic with period 1: f (t + 1) = f (t). Suppose that there is a finite c such that 1 f (a + t) − f (b + t) dt ≤ c 0

for all a and b. Show that f ∈ L[0,1]. (Set a = x, b = −x, integrate with respect to x, and make the change of variables ξ = x + t , η = −x + t.) 5. (a) If f is nonnegative and measurable on E and ω(y) = |{x ∈∞E : f(x) > y}|, y > 0, use Tonelli’s theorem to prove that E f = 0 ω y dy. (By definition of the integral, E f = R f , E = R(f ,E) dxdy. Use the observation in the proof of Theorem 6.11 that {x ∈ E : f (x) ≥ y} = {x : (x, y) ∈ R(f , E)}, and recall that ω(y) = |{x ∈ E : f (x) ≥ y}| unless y is a point of discontinuity of ω.) (b) Deduce from this special case the general formula E

fp = p

∞ 0

yp−1 ω y dy

f ≥ 0, 0 < p < ∞ .

126

Measure and Integral: An Introduction to Real Analysis

6. For f ∈ L(R1 ), define the Fourier transform f of f by x ∈ R1 .

+∞ 1 f (x) = f (t)e−ixt dt 2π −∞

(For a complex-valued function F = F0 + iF1 whose real and imaginary parts F0 and F1 are integrable, we define F = F0 + i F1 .) Show that if f and g belong to L(R1 ), then (f ∗ g)(x) = 2π f (x) g(x). 7. Let F be a closed subset of R1 and let δ(x) = δ(x, F) be the corresponding distance function. If λ > 0 and f is nonnegative and integrable over the complement of F, prove that the function δλ y f y dy x − y1+λ 1 R is integrable over F and so is finite a.e. in F. (In case f = χ(a,b) , this reduces to Theorem 6.17.) 8. Under the hypotheses of Theorem 6.17 and assuming that b−a < 1, prove that the function M0 (x) =

b log a

1 δ(y)

−1

|x − y|−1 dy

is finite a.e. in F. 9. (a) Show that Mλ (x; F) = +∞ if x ∈ / F, λ > 0. (b) Let F = [c, d] be a closed subinterval of a bounded open interval (a, b) ⊂ R1 , and let Mλ be the corresponding Marcinkiewicz integral, λ > 0. Show that Mλ is finite for every x ∈ (c, d) and that Mλ (c) = Mλ (d) = ∞. Show also that F Mλ ≤ λ−1 |G|, where G = (a, b) − [c, d]. 10. Let vn be the volume of the unit ball in Rn . Show by using Fubini’s theorem that vn = 2vn−1

1

1 − t2

(n−1)/2

dt.

0

(We also observe that by setting w = t2 , the integral is a multiple of a classical β-function and so can be expressed in terms of the -function: ∞ (s) = 0 e−t ts−1 dt, s > 0.)

127

Repeated Integration

11. Use Fubini’s theorem to prove that

2

e−|x| dx = πn/2 .

Rn

(For n = 1, write

+ ∞ −∞

e−x dx 2

2

=

+∞ + ∞

−x2 −y2 dxdy and use polar −∞ e 2 2 2 e−|x| = e−x1 . . . e−xn and Fubini’s

−∞

coordinates. For n > 1, use the formula theorem to reduce to the case n = 1.) 12. (a) Give an example that shows that the projection onto the x-axis of a measurable subset of the plane may not be linearly measurable. (b) Show that if E is either an open or closed set in the plane, then its projection onto the x-axis is linearly measurable. (For (b), the projection of an open set is open, and the projection of a compact set is compact. Any closed set is a countable union of compact sets.) 13. Let f ∈ L(−∞, ∞), and let h > 0 be fixed. Prove that ⎞ x+h ∞ 1 ⎠ ⎝ f (y) dy dx = f (x) dx. 2h −∞ −∞ ∞

⎛

x−h

7 Differentiation The main results in this chapter deal with questions of differentiability. A variety of topics is considered, but for the most part, the results are related to the analogue for Lebesgue integrals of the fundamental theorem of calculus and to the differentiability a.e. of functions that are Lipschitz continuous.

7.1 The Indefinite Integral If f is a Riemann integrable function on an interval [a, b] in R1 , then the familiar definition of its indefinite integral is

F(x) =

x

f (y) dy,

a ≤ x ≤ b.

a

The fundamental theorem of calculus asserts that F = f if f is continuous. We will study an analogue of this result for Lebesgue integrable f and higher dimensions. We must first find an appropriate definition of the indefinite integral. In two dimensions, for example, we might choose

F (x1 , x2 ) =

x1 x2 f y1 , y2 dy1 dy2 . a1 a2

It turns out, however, to be better to abandon the notion that the indefinite integral be a function of point and adopt the idea that it be a function of set. Thus, given f ∈ L(A), where A is a measurable subset of Rn , we define the indefinite integral of f to be the function F(E) =

f,

E

where E is any measurable subset of A. 129

130

Measure and Integral: An Introduction to Real Analysis

F is an example of a set function, by which we mean any real-valued function F defined on a σ-algebra of measurable sets such that (i) F(E) is finite for every E ∈ . (ii) F is countably additive; that is, if E = k Ek is a union of disjoint Ek ∈ , then F(E) =

F (Ek ).

k

By Theorems 5.5 and 5.24, the indefinite integral of an f ∈ L(A) satisfies (i) and (ii) for the σ-algebra of measurable subsets of A. We shall systematically study set functions in Chapter 10. We now discuss a continuity property of the indefinite integral. Recall (from p. 5 in Section 1.3) that the diameter of a set E is the value sup{|x − y| : x, y ∈ E}. A set function F(E) is called continuous if F(E) tends to zero as the diameter of E tends to zero; that is, F(E) is continuous if, given ε > 0, there exists δ > 0 such that |F(E)| < ε whenever the diameter of E is less than δ. An example of a set function that is not continuous can be obtained by setting F(E) = 1 for any measurable E that contains the origin, and F(E) = 0 otherwise. A set function F is called absolutely continuous with respect to Lebesgue measure, or simply absolutely continuous, if F(E) tends to zero as the measure of E tends to zero. Thus, F is absolutely continuous if given ε > 0, there exists δ > 0 such that |F(E)| < ε whenever the measure of E is less than δ. A set function that is absolutely continuous is clearly continuous. The converse, however, is false, as shown by the following example. Let A be the unit square in R2 , let D be a diagonal of A, and consider the σ-algebra of measurable subsets E of A for which E ∩ D is linearly measurable. For such E, let F(E) be the linear measure of E ∩ D. Then F is a continuous set function. However, it is not absolutely continuous since there are sets E containing a fixed segment of D whose R2 -measures are arbitrarily small.

Theorem 7.1

If f ∈ L(A), its indefinite integral is absolutely continuous.

Proof. We may assume that f ≥ 0 by considering f + and f − . Fix k and write f = g + h, where g = f whenever f ≤ k and g = k otherwise. Given ε > 0, we may choose k so large that 0 ≤ A h < 12 ε and, a fortiori, 0 ≤ E h < 12 ε for every measurable E ⊂ A. On the other hand, since 0 ≤ g ≤ k, we have 0 ≤ E g ≤ k|E| < 12 ε if |E| is small enough. Thus,

131

Differentiation

0≤

f =

E

g+

E

h

0, choose k0 so that f − fk0 0, choose an open G such that E ⊂ G and |G − E| < ε. Then |χG − χE | = |G − E| < ε, so we may assume that f = χG for an open G with |G| < +∞. Using Theorem 1.11, write G = Ik , where the Ik are disjoint, partly open intervals. If we let fN be the characteristic function of N k=1 Ik , we obtain ∞ f − fN = |Ik | → 0 k=N+1

since ∞ k=1 |Ik | = |G| < +∞. By (2), it is thus enough to show that each fN has property A . But fN is the sum of χIk , k = 1, . . . , N, so it suffices by (1) to show

134

Measure and Integral: An Introduction to Real Analysis

that the characteristic function of any partly open interval I has property A . This is practically self-evident: if I∗ denotes an interval that contains the closure of I in its interior and that satisfies |I∗ − I| < ε, then for any continuous C, 0 ≤ C ≤ 1, which is 1 in I and 0 outside I∗ , we have |χI − C| ≤ I∗ − I < ε. This completes the proof of Lemma 7.3. The proof also shows that the functions {Ck } can be chosen to be finite linear combinations of characteristic functions of intervals (i.e., step functions) instead of continuous functions. The lemma that follows is a preliminary version of a covering lemma due to Vitali (Theorem 7.17) and has many applications. Lemma 7.4 (Simple Vitali Lemma) Let E be a subset of Rn with |E|e < +∞, and let K be a collection of cubes Q covering E. Then there exist a positive constant β, depending only on n, and a finite number of disjoint cubes Q1 , . . . , QN in K such that N Qj ≥ β|E|e . j=1

Proof. We will index the size of a cube Q ∈ K by writing Q = Q(t), where t is the edge length of Q. Let K1 = K and t∗1 = sup {t : Q = Q(t) ∈ K1 } . If t∗1 = +∞, then K1 contains a sequence of cubes Q with |Q| → +∞. In this case, given β > 0, we simply choose one Q ∈ K1 with |Q| ≥ β|E|e . If t∗1 < +∞, the idea is still to pick a relatively large cube: choose Q1 = Q1 (t1 ) ∈ K1 such that t1 > 12 t∗1 . Now split K1 = K2 ∪ K2 , where K2 consists of those cubes in K1 that are disjoint from Q1 , and K2 of those that intersect Q1 . Let Q∗1 denote the cube concentric with Q1 whose edge length is 5t1 . Thus, Q∗1 = 5n |Q1 |, and since 2t1 > t∗1 , every cube in K2 is contained in Q∗1 . Starting with j = 2, continue this selection process for j = 2, 3, . . . , by letting t∗j = sup t : Q = Q(t) ∈ Kj , , choosing a cube Qj = Qj tj ∈ Kj with tj > 12 t∗j , and splitting Kj = Kj+1 ∪ Kj+1 where Kj+1 consists of all those cubes of Kj that are disjoint from Qj . If Kj+1 is empty, the process ends. We have t∗j ≥ t∗j+1 ; moreover, for each j, the Q1 , . . . , Qj

135

Differentiation

are disjoint from one another and from every cube in Kj+1 , and every cube in is contained in the cube Q∗j concentric with Qj whose edge length is 5tj . Kj+1 Note that Q∗j = 5n Qj . Consider the sequence t∗1 ≥ t∗2 ≥ · · · . If some KN+1 is empty (i.e., if t∗j = 0 for j ≥ N + 1), then since K1 = K2 ∪ K2 = · · · = KN+1 ∪ KN + 1 ∪ · · · ∪ K2 ,

and E is covered by the cubes in K1 , it follows that E is covered by the cubes ∗ ∪ · · · ∪ K2 . Hence, E ⊂ N in KN+1 j=1 Qj , so that |E|e ≤

N N ∗ Qj . Qj = 5n j=1

j=1

This proves the lemma with β = 5−n . On the other hand, if no t∗j is zero, then either there exists a δ > 0 such that

t∗j ≥ δ for all j, or t∗j → 0. In the first case, tj ≥ 12 δ for all j and, therefore, N j=1 Qj → +∞ as N → ∞. Given any β > 0, the lemma follows in this case by choosing N sufficiently large. Finally, if t∗j → 0, we claim that every cube in K1 is contained in j Q∗j . Otherwise, there would be a cube Q = Q(t) not intersecting any Qj . Since this cube would belong to every Kj , t would satisfy t ≤ t∗j for every j and, therefore, t = 0. This contradiction establishes the claim. Since E is covered by the cubes in K1 , it follows that |E|e ≤

Qj . Q∗j = 5n j

j

Hence, given β with 0 < β < 5−n , there exists an N such that This completes the proof.

N j=1 Qj ≥ β|E|e .

We stress that Lemma 7.4 does not presuppose the measurability of E and that the proof can be shortened if E is measurable. In fact, if E is measurable, we can suppose it is closed and bounded (see, e.g., Lemma 3.22). Hence, assuming as we may that the cubes in K are open (by slightly enlarging each cube concentrically), it follows from the Heine–Borel Theorem 1.12 that E can be covered by a finite number of cubes. For Q1 , we then choose the largest cube; similarly, in subsequent steps, we take Qj to be the largest cube disjoint from Q1 , . . . , Qj−1 . Thus, E ⊂ Q∗j , and the lemma follows. See Exercise 18 for a more set-theoretic version of Lemma 7.4.

136

Measure and Integral: An Introduction to Real Analysis

Before stating the final lemma, we make a definition and a few remarks. If f is defined on Rn and integrable over every cube Q, let f ∗ (x) = sup

1 |f (y)| dy, |Q|

(7.5)

Q

where the supremum is taken over all Q with edges parallel to the coordinate axes and center x. The function f ∗ , called the Hardy–Littlewood maximal function of f, is a gauge of the size of the averages of | f | around x. It clearly satisfies the following: (i)

0 ≤ f ∗ (x) ≤ +∞,

(ii)

(f + g)∗ (x) ≤ f ∗ (x) + g∗ (x),

(iii)

∗

(7.6)

∗

(cf ) (x) = |c| f (x).

If f ∗ (x0 ) > α for some x0 ∈ Rn and α > 0, it follows from the absolute continuity of the indefinite integral that f ∗ (x) > α for all x near x0 . Hence, according to Theorem 4.14, f ∗ is lower semicontinuous in Rn . In particular, it is measurable. We now investigate the size of f ∗ . For any measurable E, χ∗E (x)

|E ∩ Q| : Q has center x . = sup |Q|

If E is bounded and Qx denotes the smallest cube with center x containing E, then |E ∩ Qx | |E| = x . |Qx | |Q | It follows that there are positive constants c1 and c2 such that c1

|E| |E| ≤ χ∗E (x) ≤ c2 n n |x| |x|

for large |x|.

(7.7)

In particular, if |E| > 0, χ∗E is not integrable over Rn . We leave it as an exercise to show that for any measurable f that is different from zero on a set of positive measure, there is a positive constant c such that f ∗ (x) ≥

c for |x| ≥ 1. |x|n

Therefore, f ∗ is never integrable over {x : |x| ≥ 1} unless f = 0 a.e. This failure of integrability of f ∗ is due to its size when |x| is large. On the other hand,

137

Differentiation

f ∗ is not integrable over bounded sets for some f ∈ L (Rn ). For example, in case n = 1, the function f (x) =

1 |x| (log |x|)2

χ{x:|x| 1, or even if |f | 1 + log+ |f | ∈ L1 (Rn ); see Theorem 9.16 and Exercise 22 of Chapter 9. To find a way to estimate the size of f ∗ when f ∈ L (Rn ), recall that by Tchebyshev’s inequality, x ∈ Rn : f (x) > α ≤ 1 |f (x)| dx, α n

α > 0.

R

Hence, if f ∈ L (Rn ), there is a constant c independent of α such that

c |x ∈ Rn : |f (x)| > α | ≤ , α

α > 0.

(7.8)

Any measurable f , integrable or not, for which (7.8) is valid is said to belong to weak L(Rn ). Thus, any function in L (Rn ) is also in weak L (Rn ). The function |x|−n is an example of a function in weak L (Rn ), which is not in L (Rn ) (see also Exercise 24 in Chapter 5). Lemma 7.9 (Hardy–Littlewood) If f ∈ L (Rn ), then f ∗ belongs to weak L (Rn ). Moreover, there is a constant c independent of f and α such that x ∈ Rn : f ∗ (x) > α ≤ c |f |, α n R

Proof. Fix α > 0 and let E = f∗ > α .

α > 0.

138

Measure and Integral: An Introduction to Real Analysis

If x ∈ E, then by the definitions of E and f ∗ , there is a cube Qx with center x such that |Qx |−1 Qx |f | > α. Equivalently, |Qx |

0 (depending only on n) and a finite number (k)

⊂ E such that the cubes Qx(k) are disjoint in j (for each k) and j j −1 |Ek | < β j Qx(k) . Therefore, j

of points xj

|Ek |

0, let Eε be the set on which the left side of (7.10) exceeds ε. By (7.10), ε ∗ ε ∪ x : f (x) − Ck (x) > . Eε ⊂ x : f − Ck (x) > 2 2 Applying Lemma 7.9 to the first set on the right and Tchebyshev’s inequality to the second, we obtain |Eε |e ≤ c

ε −1 −1 f − Ck + ε f − Ck . 2 2 n n R

R

Since c is independent of k, it follows by letting k → ∞ that |E ε |e = 0. Let E be the set where the left side of (7.10) is positive. Then E = k Eεk for any sequence εk → 0, and therefore |E| = 0. This means that limQx F(Q)/|Q| exists and equals f (x) for almost every x, which completes the proof. We now list several extensions and corollaries of Lebesgue’s theorem. (I) A measurable function f defined on Rn is said to be locally integrable on if it is integrable over every bounded measurable subset of Rn . This is easily seen to be equivalent to the integrability of f over every compact set.

Rn

Theorem 7.11 The conclusion of Lebesgue’s theorem is valid if, instead of being integrable, f is locally integrable on Rn . Proof. It is enough to show that the conclusion holds a.e. in every open ball. Fix a ball and replace f by zero outside it. This new function is integrable over Rn , its integral is differentiable a.e., and since differentiability is a local property, the initial function f is differentiable a.e. in the ball. This completes the proof. (II) For any measurable E, note that |E ∩ Q| 1 . χE = |Q| |Q| Q

140

Measure and Integral: An Introduction to Real Analysis

By Theorem 7.11, the left-hand side tends to χE (x) a.e. as Q x; that is, |E ∩ Q| = χE (x) |Q| Qx lim

a.e.

(7.12)

A point x for which this limit is 1 is called a point of density of E, and a point for which it is zero is called a point of dispersion of E. Since |Q ∩ E| |Q ∩ CE| |Q| + = = 1, |Q| |Q| |Q| every point of density of E is a point of dispersion of CE, and vice versa. Formula (7.12) can be restated as follows.

Theorem 7.13 of density of E.

Let E be a measurable set. Then almost every point of E is a point

Thus, roughly speaking, a set clusters around almost all of its points. (III) The formula limQx (1/|Q|) Q f (y)dy = f (x) can be written 1 [f (y) − f (x)] dy = 0 Qx |Q| lim

Q

and is valid for almost every x if f is locally integrable. A point x at which the stronger statement 1 | f (y) − f (x)| dy = 0 Qx |Q| lim

(7.14)

Q

is valid is called a Lebesgue point of f, and the collection of all such points is called the Lebesgue set of f. Theorem 7.15 Let f be locally integrable in Rn . Then almost every point of Rn is a Lebesgue point of f; that is, there exists a set Z (depending on f) of measure zero such that (7.14) holds for x ∈ / Z. Proof. Let {rk } be the rational numbers, and let Zk be the set where the formula 1 f (y) − rk dy = f (x) − rk Qx |Q| lim

Q

141

Differentiation

is not valid. Since f (y) − rk is locally integrable, we have |Zk | = 0. Let Z = Zk ; then |Z| = 0. For any Q, x, and rk , 1 1 1 f (y) − rk dy + f (x) − rk dy |f (y) − f (x)| dy ≤ |Q| |Q| |Q| Q

Q

Q

1 f (y) − rk dy + f (x) − rk . = |Q| Q

Therefore, if x ∈ / Z, lim sup Qx

1 |f (y) − f (x)| dy ≤ 2|f (x) − rk | |Q| Q

for every rk . For an x at which (in particular, almost everywhere), f (x) is finite we can choose rk such that f (x) − rk is arbitrarily small. This shows that the left side of the last formula is zero a.e. and completes the proof. (IV) So far, the sets contracting to x have been cubes centered at x with edges parallel to the coordinate axes. Many other sets can be used. A family {S} of measurable sets is said to shrink regularly to x provided (i) The diameters of the sets S tend to zero. (ii) If Q is the smallest cube with center x containing S, there is a constant k independent of S such that |Q| ≤ k|S|. The sets S need not contain x. Theorem 7.16 Let f be locally integrable in Rn . Then at every point x of the Lebesgue set of f (in particular, almost everywhere), 1 |f (y) − f (x)| dy → 0 |S| S

for any family {S} that shrinks regularly to x. Thus, also 1 f (y) dy → f (x) |S| S

a.e.

142

Measure and Integral: An Introduction to Real Analysis

Proof. If S ⊂ Q, we have |f (y) − f (x)| dy ≤ |f (y) − f (x)| dy. S

Q

Hence, if {S} shrinks regularly to x and Q is the least cube with center x containing S, then 1 |Q| 1 |f (y) − f (x)| dy ≤ |f (y) − f (x)| dy |S| |S| |Q| S

Q

1 |f (y) − f (x)| dy. ≤k |Q| Q

If x is a Lebesgue point of f , the last expression tends to zero, and the theorem follows. In particular, for functions of a single variable, we obtain (cf. p. 131 in Section 7.2) x+h 1 f (y) dy = f (x) lim h→0 h x

a.e.

7.3 Vitali Covering Lemma The theorem that follows is a refinement of the simple Vitali Lemma 7.4. Given a set and a collection of cubes, we will now assume that each point of the set is covered not just by a single cube in the collection but by a sequence of cubes in the collection with diameters tending to zero. In this case, it turns out that we can cover almost all points of the set by a sequence of disjoint cubes in the collection. The result will be essentially a corollary of Lemma 7.4. A family K of cubes is said to cover a set E in the Vitali sense if for every x ∈ E and η > 0, there is a cube in K containing x whose diameter is less than η.

Theorem 7.17 (Vitali Covering Lemma) Suppose that E is covered in the Vitali sense by afamily K of cubes and that 0 < |E|e < +∞. Then, given ε > 0, there is a sequence Qj of disjoint cubes in K such that E − = 0 and Qj < (1 + ε)|E|e . Q j j j

143

Differentiation

Proof. The second relation is automatically satisfied if we choose an open set G containing E with |G| < (1 + ε)|E|e and consider only those Q in K which lie in G. By Lemma 7.4, there exist a constant β, 0 < β < 1, depending only on N 1 |Qj | > β|E|e . the dimension, and disjoint Q1 , . . . , QN1 in K such that j=1 Therefore, N1 N1 N1 E − Qj < |E|e (1 + ε − β). G − = |G| − Q ≤ Q j j j=1 j=1 j=1 e

Hence, by considering from the start only those ε with 0 < ε < β/2, we have N1 β E − . Qj < |E|e 1 − 2 j=1 e

Thus, the part of E not covered by the cubes obtained from the simple Vitali lemma has outer measure less than |E|e (1 − β/2). We now repeat the proN1 cess for the set E1 = E − j=1 Qj , which is still covered in the Vitali sense by those cubes in K that are disjoint from Q1 , . . . , QN1 . We obtain QN1 +1 , . . . , QN2 , disjoint from each other and from Q1 , . . . , QN1 , such that N2 N2 β β 2 E − < |E|e 1 − Qj = E1 − Qj < |E1 |e 1 − . 2 2 j=1 j=N1 +1 e

e

Continuing in this way, we obtain at the mth stage disjoint Q1 , . . . , QNm in K such that Nm β m E − 1 − Q < |E| . e j 2 j=1 e

Since (1 − β/2)m → 0 as m → ∞, there is a sequence of disjoint cubes in K with the desired properties.

Corollary 7.18 Suppose that E is covered in the Vitali sense by a family K of cubes and 0 < |E|e < +∞. Then, given ε > 0, there is a finite collection Q1 , . . . , QN of disjoint cubes in K such that

144

Measure and Integral: An Introduction to Real Analysis N E − Qj < ε j=1

N Qj < (1 + ε)|E|e .

and

j=1

e

This is part of the proof of Vitali’s lemma. Note by Carathéodory’s theorem that N N |E|e = E − Qj + E ∩ Qj , j=1 j=1 e

so that for E and Qj

N j=1

e

as in Corollary 7.18, we have N Qj . |E|e − ε < E ∩ j=1

(7.19)

e

In particular, N Qj . |E|e − ε

D4 f (x) has measure zero. A similar argument will apply to any two Dini numbers of f . It is enough to show that each set Ar,s = x ∈ (a, b) : D1 f (x) > r > s > D4 f (x) has measure zero since the original set is the union of these over rational r and s. We may assume r, s > 0 since f is increasing. Fix r and s with r > s > 0, write A = Ar,s , and suppose that |A|e > 0. If x ∈ A, the fact that D4 f (x) < s implies the existence of arbitrarily small h > 0 such that f (x − h) − f (x) < s. −h 7.18 and (7.19), given ε > 0, there exist disjoint intervals By Corollary xj − hj , xj , j = 1, . . . , N, such that (i) f xj − f xj − hj < shj , j = 1, . . . , N, (ii) A ∩ N x − h , x > |A|e − ε, j j j j=1 e N h < (1 + ε)|A| . (iii) e j=1 j Combining (i) and (iii), we obtain (iv)

N j=1 f xj − f xj − hj < s(1 + ε)|A|e .

Let B = A ∩ N j=1 xj −hj , xj . By(ii), |B|e > |A|e − ε. For every y ∈ B that is not an endpoint of some xj − hj , xj , the fact that D1 f (y) > r implies that there

146

Measure and Integral: An Introduction to Real Analysis

exist arbitrarily small k > 0 such that [y, y + k] lies in some xj − hj , xj and f (y + k) − f (y) > r. k Hence, by Corollary 7.18 and (7.20), there exist disjoint [yi , yi + ki ], i = 1, . . . , M, such that (v) Each [yi , yi + ki ] lies in some xj − hj , xj , (vi) f yi + ki − f yi > rki , i = 1, . . . , M, M (vii) i=1 ki > |B|e − ε > |A|e − 2ε [by (ii)]. Therefore, (viii)

M i = 1 f yi + ki − f yi > r (|A|e − 2ε).

Since f is increasing, it follows from (v) that M N f yi + ki − f yi ≤ f xj − f xj − hj . i=1

j=1

Combining this inequality with (iv) and (viii), we obtain r (|A|e − 2ε) < s (1 + ε)|A|e . Since ε > 0 is arbitrary, this gives r ≤ s, which is a contradiction. Hence, |A|e = 0. Since an analogous argument applies to any two Dini numbers, it follows that f (x) exists for almost every x in (a, b). Extend the definition of f to (a, ∞) by setting f (x) = f (b−) for x ≥ b, and let f (x + h) − f (x) , h

fk (x) =

h=

1 , k

for x ∈ (a, b) and k = 1, 2, . . . . Each fk is nonnegative and measurable, and fk → f a.e. in (a, b). Hence, f is measurable on (a, b), and by Fatou’s lemma, b a

f ≤ lim inf k→∞

b

fk .

a

If f (b−) is finite (otherwise, (7.22) is obvious), we have b

fk =

a

b+h b 1 1 f− f h ha a+h

=

b+h a+h a+h 1 1 1 f− f = f (b−) − f. h h a h a b

147

Differentiation

Since f (a+) ≤ (1/h) b a

a+h a

f ≤ f (a + h), we obtain

f ≤ lim inf k→∞

b

fk = lim

a

k→∞

b

fk = f (b−) − f (a+),

a

which proves (7.22). It remains to show that f is finite a.e. in (a, b). This follows from (7.22) if f (b−) − f (a+) < ∞ since then f ∈ L(a, b). Since f is finite on (a, b), it follows in any case by applying (7.22) to a sequence of intervals (ak , bk ) increasing to (a, b), a < ak < bk < b, and the proof is complete. The inequality in formula (7.22) cannot in general be replaced by equality, even if f is continuous on [a, b]. To see this, let f be the Cantor–Lebesgue function on [0, 1] (p. 43 in Section 3.1). Then f is continuous and monotone increasing on [0, 1], f (0) = 0, f (1) = 1, and since f is constant on every inter1 val removed in constructing the Cantor set, f = 0 a.e. Thus, 0 f = 0, while f (1) − f (0) = 1. We shall return in Theorem 7.29 to the question of equality in (7.22). By Corollary 2.7, any function of bounded variation can be written as the difference of two bounded monotone increasing functions. Hence, we obtain the following result (see also Exercise 9). Corollary 7.23 and f ∈ L[a, b].

If f is of bounded variation on [a, b], then f exists a.e. in [a, b],

If f is of bounded variation on [a, b] and V(x) denotes its (total) variation on [a, x], a ≤ x ≤ b, then V is monotone increasing by Theorem 2.2. The next result gives an important relation between V and f . Theorem 7.24 If f is of bounded variation on [a, b] and V(x) is the variation of f on [a, x], a ≤ x ≤ b, then V (x) = |f (x)| for a.e. x ∈ [a, b]. We will prove this with the help of the next lemma, which is of independent interest. Lemma 7.25 (Fubini) Let fk be a sequence of monotone increasing functions on [a, b]. If the series s(x) = fk (x) converges on [a, b] (equivalently, if s(a) and s(b) are finite), then fk (x) a.e. in [a, b]. s (x) = In particular, fk → 0 a.e. in [a, b].

148

Measure and Integral: An Introduction to Real Analysis

∞ Proof. Let sm = m k=1 fk and rm = k=m+1 fk . Then sm and rm are monotone increasing functions and s = sm + rm . With the exception of a set Zm , |Zm | = 0, these three functions together f1 , . . . , fm are differentiable and s = sm + mwith rm . In particular, s ≥ sm = k=1 fk except in Zm . It follows that ∞

fk ≤ s

k=1

except in Z = ∞ m=1 Zm , |Z| = 0. To prove that in the last inequality we actually have equality a.e., it is enough to show that rm→0 a.e. for m running through a sequence of values rm (x) converges at m1 < m2 < · · · . Select mj increasing so rapidly that j both x = a and x = b. This implies the convergence of rmj (b) − rmj (a) and also, in view of the monotonicity of rmj , of rmj (b−) − rmj (a+) . By (7.22), we have

0≤

b

rmj

=

b

a

rmj ≤

rmj (b−) − rmj (a+) .

a

Thus, rmj is integrable over (a, b), and therefore, it is finite a.e. in (a, b). Thus, rmj → 0 a.e., and the proof is complete. Proof of Theorem 7.24. Let f be of bounded variation on [a, b] and let V(x) = V[a, x] be its variation on [a, x], a ≤ x ≤ b. Then V(a)

= 0 and V(b) is the variation of f on [a, b]. Select a sequence k : k = xkj of partitions of [a, b] such that 0 ≤ V(b) − Sk < 2−k , where Sk =

f xkj − f xkj−1 . j

For each k, define a function fk on [a, b] as follows: if x ∈ xkj−1 , xkj , let ⎧ ⎨f (x) + ckj if f xkj ≥ f xkj−1 , fk (x) = ⎩−f (x) + ck if f xk < f xk , j j j−1

149

Differentiation

where the ckj are constants chosen so that fk (a) = 0 and fk is well-defined (i.e., single-valued) at xkj for every j. Then, for all k and j, fk xkj − fk xkj−1 = f xkj − f xkj−1 , so that Sk =

fk xkj − fk xkj−1 = fk (b). j

Hence, for any k, we obtain 0 ≤ V(b) − fk (b) < 2−k . We will show that each V(x) − fk (x) is an increasing function of x. This amounts to showing that if x < y, then fk (y)−fk (x) ≤ V(y)−V(x). If x and y both belong to the same partitioning interval of k , then fk (y)−fk (x) ≤ | f (y)−f (x)|, and therefore, fk (y) − fk (x) ≤ V[x, y] = V(y) − V(x). In the general case, if xkl , xkl+1 , . . . , xkm are the points of k between x and follows by adding the inequalities for the intervals y, the result k k k x, xl , xl , xl+1 , . . . , xkm , y . Since V(a) = fk (a) = 0 and V(b) − fk (b) < 2−k , it follows that 0 ≤ V(x) − fk (x) < 2−k for all x ∈ [a, b]. Hence,

[V(x) − fk (x)]

0, there exists δ > 0 such that for any collection [ai , bi ] (finite or not) of nonoverlapping subintervals of [a, b], f (bi ) − f (ai ) < ε if (bi − ai ) < δ. For example, if g is integrable on [a, b] and f (x) = f (bi ) − f (ai ) ≤

x a

g for a ≤ x ≤ b, then

|g|

[ai ,bi ]

for any nonoverlapping [ai , bi ]. By Theorem 7.1, E |g| is an absolutely continuous set function, and therefore, f is an absolutely continuous function on [a, b]. One of the main results proved below (Theorem 7.29) x is that every absolutely continuous function f has the form f (x) = f (a) + a g for an integrable g. Another example of an absolutely continuous function is any f that satisfies a Lipschitz condition: |f (x) − f (y)| ≤ C|x − y|

for all x, y ∈ [a, b]

(7.26)

and some constant C ≥ 0. On the other hand, the Cantor–Lebesgue function f is an example of a con tinuous function that is not absolutely continuous, since if Ck = j [akj , bkj ] denotes the intervals remaining at the kth stage of construction of the Cantor set, then |Ck | → 0, while f bkj − f akj = 1 j

for every k. It is simple to see that a linear combination of absolutely continuous functions is absolutely continuous and that an absolutely continuous function is continuous. Moreover, if f is absolutely continuous on [a, b], then f exists a.e. in [a, b] and f ∈ L[a, b]. This follows immediately from Corollary 7.23 and the next theorem. Theorem 7.27 If f is absolutely continuous on [a, b], then it is of bounded variation on [a, b]. Proof. Choose δ so that f (bi ) − f (ai ) ≤ 1 for any collection of nonoverlapping intervals with (bi − ai ) ≤ δ. Then the variation of f over any

151

Differentiation

subinterval of [a, b] with length less than δ is at most 1. Hence, if we split [a, b] into N intervals each with length less than δ, then V[a, b] ≤ N. A function f for which f is zero a.e. in [a, b] is said to be singular on [a, b]. The Cantor–Lebesgue function is an example of a nonconstant, singular function on [0, 1]. Theorem 7.28 If f is both absolutely continuous and singular on [a, b], then it is constant on [a, b]. Proof. It is enough to show that f (a) = f (b) since this result applied to any subinterval proves that f is constant. Let E be the subset of (a, b) where f = 0, so that |E| = b − a. Given ε > 0 and x ∈ E, we have [x, x + h] ⊂ (a, b) and | f (x + h) − f (x)| < εh for all sufficiently small h > 0. Let δ be the number corresponding to ε in the definition of the absolute continuity of f . By Corol lary 7.18 and (7.20), there exist disjoint Qj = xj , xj + hj , j = 1, . . . , N, in (a, b) such that (i) f xj + hj − f xj < εhj , N (ii) j=1 Qj > (b − a) − δ. By (i), N N f xj + hj − f xj < ε Qj ≤ ε(b − a). j=1

j=1

Moreover, since by (ii) the total length of the complementary intervals is less than δ, the sum of the absolute values of the increments of f over them is less than ε. Thus, the sum of the absolute values of the increments of f over the Qj and the complementary intervals is less than ε(b−a)+ε. Hence, | f (b)−f (a)| < ε(b − a) + ε, so that f (b) = f (a). This completes the proof. Theorem 7.29 A function f is absolutely continuous on [a, b] if and only if f exists a.e. in [a, b], f is integrable on [a, b], and

f (x) − f (a) =

x a

f ,

a ≤ x ≤ b.

152

Measure and Integral: An Introduction to Real Analysis

Proof. We have already x observed (see p. 150 in Section 7.5) that any function of the form G(x) = a g, g ∈ L[a, b], is absolutely continuous on [a, b]. Hence, the sufficiency part of the theorem follows. x Conversely, if f is absolutely continuous, let F(x) = a f . F is well-defined by virtue of Theorem 7.27 and Corollary 7.23. Moreover, by Theorem 7.16, F = f a.e. in [a, b]. It follows that F(x)−f (x) is both absolutely continuous and singular on [a, b]. Hence, by Theorem 7.28, we obtain F(x) − f (x) = F(a) − f (a) for x ∈ [a, b]. Since F(a) = 0, the proof is complete. Theorem 7.30 If f is of bounded variation on [a, b], then f can be written f = g+h, where g is absolutely continuous on [a, b] and h is singular on [a, b]. Moreover, g and h are unique up to additive constants. x Proof. Let g(x) = a f and h = f − g. Then h = f − g = f − f = 0 a.e. in [a, b], so that h is singular, and the formula f = g + h gives the desired decomposition. If f = g1 + h1 is another such decomposition, then g − g1 = h1 − h. Since g − g1 is absolutely continuous and h1 − h is singular, it follows from Theorem 7.28 that g − g1 = h1 − h = constant, which completes the proof. In view of Theorems 7.30 and 7.27, it is natural to ask if every bounded singular function is of bounded variation, as is the case, for example, with the Cantor–Lebesgue function. The answer is no, and an example is the characteristic function χC of the Cantor set C. In fact, since χC is zero on every (open) interval in the complement of C, then (χC ) = 0 there, and so (χC ) = 0 a.e. in [0, 1]. However, χC does not belong to BV[0, 1] since it equals 1 at the endpoints of every interval removed during the construction of C. The next theorem, which is an extension of Corollary 2.10, gives formulas for the variations of an absolutely continuous function. Theorem 7.31 Let f be absolutely continuous on [a, b] and let V(x), P(x) and N(x) denote its total, positive, and negative variations on [a, x], a ≤ x ≤ b. Then V, P, and N are absolutely continuous on [a, b], and

V(x) =

x a

|f |,

P(x) =

x + f , a

and

N(x) =

x − f . a

Proof. We will first show that V is absolutely continuous. If [α, β] is a subinterval of [a, b] and = {xk } is a partition of [α, β], then

153

Differentiation

V(β) − V(α) = V[α, β] = sup

f (xk ) − f xk−1

xk = sup xk−1

β f ≤ | f |. α

Hence, if [αi , βi ] is a collection of nonoverlapping subintervals of [a, b], then

[V (βi ) − V (αi )] ≤

|f |.

[αi ,βi ]

From this inequality and Theorem 7.1, it follows that V is absolutely continuous on [a, x b]. Therefore, by Theorem 7.29 and the fact that V(a) = 0, we x have V(x) = a V . Since V = | f | a.e. by Theorem 7.24, we obtain V(x) = a f . The fact that P and N are absolutely continuous for P x x and the formulas and N now follow from the relations V(x) = a f , f (x) = f (a) + a f , P(x) = 1 1 2 [V(x) + f (x) − f (a)], and N(x) = 2 [V(x) − f (x) + f (a)]. (See Theorem 2.6.) This completes the proof. On p. 27 in Section 2.3, we proved that if g is continuous on [a, b] and f is continuously differentiable on [a, b], then b

g df =

a

b

gf dx.

a

This is a special case of the first part of the following theorem.

Theorem 7.32 (Integration by Parts) (i) If g is continuous on [a, b] and f is absolutely continuous on [a, b], then b a

g df =

b

gf dx.

a

(ii) If both f and g are absolutely continuous on [a, b], then b a

gf dx = g(b)f (b) − g(a)f (a) −

b a

g f dx.

154

Measure and Integral: An Introduction to Real Analysis

Proof. To prove (i), first note that the integrals in the conclusion exist and are finite by Theorems 2.24 and 5.30. Let = {xi } be a partition of [a, b] with norm ||. Then b

gf dx =

xi

a

gf dx =

x i g xi−1 f (x) dx

xi−1

xi−1

+

xi g(x) − g xi−1 f (x) dx. xi−1

The first term on the right equals g xi−1 f (xi ) − f xi−1 , which conb verges to a g df as || → 0. The second term on the right is majorized in absolute value by sup

|x−y|≤||

x i g(x) − g(y) f dx = sup xi−1

|x−y|≤||

b g(x) − g(y) |f | dx. a

Since g is uniformly continuous on [a, b], the last expression tends to 0 as || → 0. This proves (i). We can easily deduce (ii) from (i) by using Theorem 2.21. In fact, if f and g are absolutely continuous, then b

gf dx =

a

b

g df = g(b)f (b) − g(a)f (a) −

a

= g(b)f (b) − g(a)f (a) −

b

f dg

a

b

fg dx.

a

This proves (ii). For a generalization of part (i) of the theorem, see Lemma 9.14.

7.6 Convex Functions Let φ be defined and finite on an open interval (a, b), possibly of infinite length. Then φ is said to be convex in (a, b) if for every [x1 , x2 ] in (a, b), the graph of φ on [x1 , x2 ] lies on or below the line segment connecting the points (x1 , φ (x1 )) and (x2 , φ (x2 )) of the graph of φ.

155

Differentiation

f(x)

(x2,f(x2))

L(x)

(x1,f(x1))

a

x1

x2

b

Let y = L(x) be the equation of this line segment. Then since every x ∈ [x1 , x2 ] can be written x = θx1 + (1 − θ)x2 for appropriate θ, 0 ≤ θ ≤ 1, the condition for convexity is φ (θx1 + (1 − θ)x2 ) ≤ L (θx1 + (1 − θ)x2 ) for [x1 , x2 ] ⊂ (a, b) and 0 ≤ θ ≤ 1. Since L (θx1 + (1 − θ) x2 ) = θL (x1 ) + (1 − θ)L (x2 ) = θφ (x1 ) + (1 − θ)φ (x2 ) , it follows that φ is convex in (a, b) if and only if φ (θx1 + (1 − θ)x2 ) ≤ θφ (x1 ) + (1 − θ)φ (x2 )

(7.33)

for a < x1 < x2 < b, 0 ≤ θ ≤ 1. Equivalently, φ is convex in (a, b) if and only if φ

p1 x1 + p2 x2 p1 + p2

≤

p1 φ (x1 ) + p2 φ (x2 ) p1 + p2

(7.34)

for a < x1 < x2 < b, p1 ≥ 0, p2 ≥ 0, p1 + p2 > 0. N Theorem 7.35 (Jensen’s Inequality) Let φ be convex in (a, b). Let xj j=1 be N points of (a, b) and pj j=1 satisfy pj ≥ 0 and pj > 0. Then φ

pj xj pj

≤

pj φ xj . pj

The proof follows by repeated application of (7.34).

156

Measure and Integral: An Introduction to Real Analysis

The slope of the chord connecting two points (α, φ(α)) and (β, φ(β)) of the graph of φ is [φ(β) − φ(α)]/(β − α). We leave it as an exercise to verify the following simple relations between the convexity of φ and the slopes of its chords: if φ is convex in (a, b), then for all x, x1 , x2 with a < x1 < x < x2 < b, φ(x) − φ (x1 ) φ (x2 ) − φ (x1 ) φ (x2 ) − φ(x) ; ≤ ≤ x − x1 x2 − x1 x2 − x conversely, if for every such x, x1 , x2 , either one of these two inequalities holds, then φ is convex in (a, b).

Theorem 7.36 (i) If φ1 and φ2 are convex in (a, b), then φ1 + φ2 is convex in (a, b). (ii) If φ is convex in (a, b) and c is a positive constant, then cφ is convex in (a, b). (iii) If φk , k = 1, 2, . . . , are convex in (a, b) and φk → φ in (a, b), then φ is convex in (a, b). The proof is left as an exercise. Theorem 7.37 If φ exists and is monotone increasing in (a, b), then φ is convex in (a, b). In particular, if φ exists and is nonnegative in (a, b), then φ is convex in (a, b). Proof. We shall use the following inequality: If b1 , b2 > 0, then

min

a1 a2 , b1 b2

≤

a1 + a2 a1 a2 . ≤ max , b1 + b2 b1 b2

(7.38)

To prove the first inequality in (7.38), let m = min {a1 /b1 , a2 /b2 }. Then mb1 ≤ a1 and mb2 ≤ a2 , so that m (b1 + b2 ) ≤ a1 + a2 . This is the desired result. The second inequality is proved similarly. To prove that φ is convex, we will show that if a < x1 < x < x2 < b, then φ (x2 ) − φ (x1 ) φ(x) − φ (x1 ) ≥ . x2 − x1 x − x1 Write [φ (x2 ) − φ(x)] + [φ(x) − φ (x1 )] φ (x2 ) − φ (x1 ) . = x2 − x1 [x2 − x] + [x − x1 ]

157

Differentiation

Since φ exists in (a, b), the mean-value theorem implies that there are ξ1 ∈ (x1 , x) and ξ2 ∈ (x, x2 ) such that φ(x) − φ (x1 ) φ (x2 ) − φ(x) = φ (ξ2 ) . = φ (ξ1 ) , x − x1 x2 − x Since φ is increasing, φ (ξ1 ) ≤ φ (ξ2 ), and it follows from the first inequality in (7.38) that φ (x2 ) − φ (x1 ) φ(x) − φ (x1 ) ≥ . x2 − x1 x − x1 This completes the proof. As a corollary of Theorem 7.37, we see that (i)

xp is convex in (0, ∞) if p ≥ 1 or if p ≤ 0

(ii)

eax is convex in (−∞, ∞)

(iii)

log(1/x) = − log x is convex in (0, ∞)

(7.39)

Theorem 7.40 If φ is convex in (a, b), then φ is continuous in (a, b). Moreover, φ exists except at most in a countable set and is monotone increasing. Proof. Since φ is convex, the slope [φ(x+h)−φ(x)]/h, h > 0, decreases with h. Hence, the derivative on the right, φ(x + h) − φ(x) , h h→0+

D+ φ(x) = lim

exists and is distinct from +∞ in (a, b). Similarly, the derivative on the left, φ(x) − φ(x − h) , h h→0+

D− φ(x) = lim

exists and is distinct from −∞ in (a, b). Since [φ(x) − φ(x − h)]/h ≤ [φ(x + h) − φ(x)]/h, h > 0, we obtain −∞ < D− φ(x) ≤ D+ φ(x) < + ∞.

(7.41)

This shows in particular that φ is continuous in (a, b). We next claim that D+ φ(y) ≤ D− φ(x) if a < y < x < b.

(7.42)

158

Measure and Integral: An Introduction to Real Analysis

In fact, if y < x, then as seen from the discussion earlier in the proof, we have D+ φ(y) ≤

φ(x) − φ(y) ≤ D− φ(x). x−y

This proves the claim. We therefore obtain D+ φ(y) ≤ D− φ(x) ≤ D+ φ(x) if y < x, which shows that D+ φ is monotone increasing. Similarly, D− φ is monotone increasing. To complete the proof of the theorem, note that (cf. Theorem 2.8) D+ φ can have at most a countable number of discontinuities since it is monotone and finite on (a, b). If x is a point of continuity of D+ φ, then letting y → x− in the last inequalities, we obtain D+ φ(x) = D− φ(x). Therefore, φ exists at every point of continuity of D+ φ, and the theorem follows.

Theorem 7.43 If φ is convex in (a, b), then it satisfies a Lipschitz condition on every closed subinterval of (a, b). In particular, if a < x1 < x2 < b, we have φ (x2 ) − φ (x1 ) =

x2

φ .

x1

Proof. Let [x1 , x2 ] be a closed subinterval of (a, b) and let x1 ≤ y < x ≤ x2 . Then as before D+ φ(y) ≤

φ(x) − φ(y) ≤ D− φ(x), x−y

so that, since D+ φ and D− φ are monotone increasing, D+ φ (x1 ) ≤

φ(x) − φ(y) ≤ D− φ (x2 ) . x−y

Hence, |φ(x) − φ(y)| ≤ C|x − y|, where C is the larger of D+ φ(x1 ) and − D φ (x2 ). This shows that φ satisfies a Lipschitz condition on [x1 , x2 ], and the rest of the theorem follows since Lipschitz functions of a single variable are absolutely continuous. The next result is a useful version of Jensen’s inequality for integrals. We shall need the notion of a supporting line: If φ is convex on (a, b) and x0 ∈ (a, b), a supporting line at x0 is a line through (x0 , φ (x0 )) that lies on or below the graph of φ on (a, b). It follows from the discussion preceding (7.41) that any line through (x0 , φ (x0 )) whose slope m satisfies D− φ (x0 ) ≤ m ≤ D+ φ (x0 ) is a supporting line at x0 .

159

Differentiation

Theorem 7.44 (Jensen’s Integral Inequality) Let f and p be measurable functions finite a.e. on a measurable set A ⊂ Rn . Suppose that fp and p are integrable on A, that p ≥ 0, and that A p > 0. If φ is convex in an interval containing the range of f , then φ(f ) p A fp ≤ A . φ p A Ap Proof. By hypothesis, f is finite a.e. in A. Choose (a, b), −∞ ≤ a < b ≤ +∞, such that φ is convex in (a, b) and a < f (x) < b for every x at which f (x) is finite. The number γ defined by fp γ = A Ap is finite and satisfies a < γ < b. If m is the slope of a supporting line at γ and a < t < b, then φ(γ) + m(t − γ) ≤ φ(t). Hence, for almost every x ∈ A, φ(γ)+m[f (x) − γ] ≤ φ(f (x)). Multiplying both sides of this inequality by p(x) and integrating the result with respect to x, we obtain φ(γ)

⎛

A

A

p + m ⎝ fp − γ

A

⎞ p⎠ ≤

φ(f ) p.

A

Here the existence of A φ(f ) p follows from the integrability of pand fp. (The continuity of φ implies that φ(f ) is measurable.) Since A fp − γ A p = 0, the last inequality reduces to φ(γ) A

p≤

φ(f ) p,

A

which is the desired result. In passing, we mention that a function f is called concave in (a, b) if −f is convex on (a, b). Properties of concave functions are easily deduced from those of convex functions.

160

Measure and Integral: An Introduction to Real Analysis

7.7 The Differential in Rn The principal fact to be proved in this section is the Rademacher–Stepanov theorem stating that a function that is locally Lipschitz continuous in an open set in Rn , n > 1, has a first differential (or tangent plane) almost everywhere in the set. The result is somewhat surprising in view of the fact that the graph of a Lipschitz function may have corners and edges. The analogous result in case n = 1 follows by combining Theorem 7.27 and Corollary 7.23. We will also derive a result in which the assumption of Lipschitz continuity is replaced by a weaker condition involving the second difference of a function, and we will study a simple way to extend Lipschitz continuous functions on subsets of Rn to all of Rn . We begin by defining the notion of the first differential of a function. A finite real-valued function f defined in a neighborhood of a point x ∈ Rn , n ≥ 1, is said to have a total first differential at x if there exists A = (a1 , . . . , an ) ∈ Rn such that f (x + h) − f (x) − A · h = o(|h|)

as h → 0.

(7.45a)

Here, A · h denotes the dot product of A and h, that is, A·h=

n

ai hi ,

h = (h1 , . . . hn ) ,

i=1

and the notation “o(|h|) as h → 0” in (7.45a) means that f (x + h) − f (x) − A · h = 0. |h| h→0 lim

(In general, for any finite real-valued function F(h) defined in a deleted neighborhood of the origin, the notation F(h) = o(|h|) as |h| → 0 means that limh→0 F(h)/|h| = 0.) Sometimes, we will just say that such an f has a first differential at x, or simply a differential at x. When n = 1, (7.45a) means only that f has a first derivative at x. If f satisfies (7.45a), then f is clearly continuous at x. By choosing h = (0, . . . , 0, hi , 0, . . . , 0), i = 1, . . . , n, in (7.45a), we also have f x1 , . . . , xi−1 , xi + hi , xi+1 , . . . , xn − f (x1 , . . . , xn ) = ai . lim hi hi →0

161

Differentiation

Thus, A is unique if it exists, and a function f that has a differential at x has (first) partial derivatives ∂f /∂xi at x, i = 1, . . . , n, and ∂f ∂f (x), . . . , (x) . A= ∂x1 ∂xn Consequently, (7.45a) is the same as f (x + h) = f (x) +

n ∂f (x)hi + o(|h|) ∂xi

as h = (h1 , . . . , hn ) → 0.

(7.45b)

i=1

If f has a differential at x, then the graph of the linear function Lx (y) defined by n ∂f Lx (y) = f (x) + (x) yi − xi , ∂xi

y = y1 , . . . , yn ∈ Rn ,

(7.46)

i=1

is called the tangent plane (or tangent line if n = 1) to f at x. By choosing h = y − x in (7.45b), we obtain f (y) = Lx (y) + o(|y − x|) as y → x.

(7.47)

When n > 1, a standard alternate terminology for (7.45b) is to say that f has a tangent plane at x. If f is any function whose partial derivatives ∂f /∂xi all exist at x, we write ∇f (x) =

∂f ∂f (x), . . . , (x) ∂x1 ∂xn

and call ∇f (x) the gradient of f at x. When n > 1, the mere existence of ∇f at a point x does not imply that f has a tangent plane at x even if f is continuous at x. For example, in case n = 2, the function " 2 2 if (x1 , x2 ) = (0, 0) x1 x2 / x1 + x22 (7.48) f (x1 , x2 ) = 0 if (x1 , x2 ) = (0, 0) is continuous and has first partial derivatives everywhere in R2 , but f does not have a tangent plane at (0, 0) since f (0, 0) = 0, ∇f (0, 0) = (0, 0), and f (h1 , h1 ) /h1 = 1/2 if h1 = 0. We leave it as an exercise to show that f satisfies the Lipschitz condition |f (x) − f (y)| ≤ C |x − y|,

x, y ∈ R2 ,

for some constant C that is independent of x and y.

162

Measure and Integral: An Introduction to Real Analysis

We now define a weaker notion of differentiability that will play a role in the proof of the Rademacher–Stepanov theorem. Let x ∈ Rn , n ≥ 1, be a finite real-valued function defined in a measurable set that contains x and has x as a point of density. Equivalently, suppose that f (x + h) is defined and finite for all h in a measurable set H ⊂ Rn containing 0 and having 0 as a point of density. We say that f has an approximate first differential at x, or simply an approximate differential at x, if there exists A ∈ Rn , depending on x, such that f (x + h) = f (x) + A · h + o(|h|),

h ∈ H,

h → 0.

(7.49)

When n = 1, the standard terminology for (7.49) is that f has an approximate derivative at the point in question. The vector A in (7.49) is unique in the following sense. If f satisfies (7.49) as well as f x + h = f (x) + A · h + o h ,

h ∈ H ,

h → 0

for some A and some measurable set H that has 0 as a point of density, then A = A (see Exercise 27). A function that has a total differential at a point clearly has an approximate differential there, but the converse is false even if the function is continuous. For example, let f (x1 , x2 ) be a function on R2 with the properties (i)

f (x1 , x2 ) = 0 if |x2 | ≥ x21 ,

(ii)

f (x1 , 0) = x1 for all x1 ∈ (−∞, ∞),

(iii)

f is continuous on R2 .

Then f has an approximate differential at (0, 0) since it satisfies (7.49) at x = (0, 0) with f (0, 0) = 0 and A = (0, 0) for the set H = (x1 , x2 ) : |x2 | ≥ x21 . However, f does not have a total differential at (0, 0) since f (0, 0) = 0, ∇f (0, 0) = (1, 0), and, for example, the estimate

1/2 f x1 , x21 = x1 + o x21 + x41

as x1 → 0

is false due to the fact that f x1 , x21 = 0 for all x1 . We leave it as an exercise to show that the function f in the example above cannot be redefined in any set of measure zero so that the resulting function has a total differential at (0, 0). On the other hand, the next theorem shows that a Lipschitz continuous function (see the following definition) has a total differential at every point where it has an approximate differential.

163

Differentiation

A finite real-valued function f defined in an open set G ⊂ Rn is said to be Lipschitz continuous in G if there is a constant C such that |f (x) − f (y)| ≤ C|x − y|

for all x, y ∈ G.

We then write f ∈ Lip(G). Similarly, for any finite f defined on G, we write f ∈ Liploc (G) and say that f is locally Lipschitz continuous in G if for every compact set K ⊂ G, there is a constant CK such that |f (x) − f (y)| ≤ CK |x − y|

for all x, y ∈ K.

If f ∈ Lip(G), then f ∈ Liploc (G), but the converse is false. For example, in R2 , the function f (x1 , x2 ) = (1 − x1 )−1 is locally Lipschitz continuous on the open unit ball centered at (0, 0), but it is not Lipschitz continuous there. In fact, for all (x1 , x2 ) , y1 , y2 in the ball, we have f (x1 , x2 ) − f y1 , y2 =

x1 − y1 . (1 − x1 ) 1 − y1

We leave it as an exercise to construct a function that is bounded, uniformly continuous and locally Lipschitz continuous on the open unit ball in R2 but not Lipschitz continuous there. Theorem 7.50 Let x0 ∈ Rn and f be a function that is Lipschitz continuous in a neighborhood of x0 . If f has an approximate differential at x0 , then f has a total differential at x0 . Proof. Suppose that f is Lipschitz continuous near x0 and that f (x0 + h) = f (x0 ) + A · h + o(|h|),

h → 0, h ∈ H,

for some A and some measurable set H ⊂ Rn that has 0 as a point of density. We will prove the theorem by showing that the same asymptotic formula holds for all h → 0 without the restriction that h ∈ H. By considering the function g(x) = f (x0 + x)−f (x0 )−A·x, we may assume that x0 = 0, f (0) = 0, and A = 0, that is, we may assume that f is Lipschitz continuous near 0 and lim

h→0,h∈H

f (h) = 0. |h|

Our goal is then to prove that f (h)/|h| → 0 as |h| → 0.

(7.51)

164

Measure and Integral: An Introduction to Real Analysis

For any y ∈ Rn and t > 0, let B(y; t) = x ∈ Rn : |x − y| < t denote the open ball with center y and radius t. Choose C, r > 0 such that |f (x) − f (y)| ≤ C|x − y|

if x, y ∈ B(0; r).

Fix ε > 0 and let β, γ satisfy 0 < β < 1,

0 < γ < 1 − β,

γ+1−β

0 such that if 0 < |h| < δ then α|B(0; |h|)| ≤ |B(0; |h|) ∩ H| ≤ |Dh ∩ H| + |B(0; |h|) − Dh | = |Dh ∩ H| + |B(0; |h|)| − |Dh | = |Dh ∩ H| + 1 − γn |B(0; |h|)|. Since α > 1 − γn , it follows that |Dh ∩ H| > 0 and in particular that Dh ∩ H is not empty. Thus, for every h with 0 < |h| < δ, there is a point h1 ∈ H such that |h1 | < |h| and |h − h1 | < {ε/(2C)}|h|. ˜ < We may also choose δ so small that δ < r and, by (7.51), so that |f (h)| ˜ ˜ ˜ (ε/2)|h| if h ∈ H and |h| < δ. Note that δ depends ultimately only on ε, H, f , and n. If 0 < |h| < δ and h1 is chosen as above, we obtain

165

Differentiation |f (h)| ≤ f (h) − f (h1 ) + f (h1 ) ε ≤ C |h − h1 | + |h1 | 2 ε ε ≤ C |h| + |h| = ε|h|, 2C 2 which completes the proof. The next theorem is the main result of this section.

Theorem 7.53 (Rademacher–Stepanov) Let G be an open set in Rn , n > 1, and let f ∈ Liploc (G). Then f has a total first differential a.e. in G. The partial derivatives ∂f /∂xi , i = 1, . . . , n, are measurable in G and bounded a.e. in every compact subset of G. Furthermore, if f ∈ Lip(G), then all ∂f /∂xi are bounded a.e. in G. The proof will use Theorem 7.50 together with Lusin’s theorem and the next four lemmas. It will also use the one-dimensional version of Theorem 7.53, which is included in Corollary 7.23. We begin by showing that the derivative (from the right) of a Lipschitz function of one variable can be defined in terms of a certain sequential (as opposed to ordinary) limit of difference quotients, a fact that will be useful in proving measurability of the partial derivatives in Theorem 7.53. Lemma 7.54 Let f be Lipschitz continuous in a half-open interval [a, b) in R1 and let k = 1, 2, . . .. If f (a + 1/k) − f (a) 1/k k→∞ lim

exists, then so does lim

h→0+

f (a + h) − f (a) , h

and the two limits are the same. Proof. Denote L = lim

k→∞

f (a + 1/k) − f (a) . 1/k

166

Measure and Integral: An Introduction to Real Analysis

Note that L is finite since f is Lipschitz continuous in [a, b). Given ε > 0, choose a positive integer Kε such that f (a + 1/k) − f (a) 1. If f ∈ Liploc (G), then f has measurable first partial derivatives ∂f /∂xi a.e. in G, i = 1, . . . , n.

167

Differentiation

Proof. Fix G and f as in the hypothesis. Consider, for example, the case i = 1 and denote points of Rn by (x, y),

with x ∈ (−∞, ∞) and y ∈ Rn−1 .

(7.56)

Also, denote the difference quotient of f in the first variable by Dh f (x, y) =

f (x + h, y) − f (x, y) , h

h = 0,

(7.57)

provided (x, y), (x + h, y) ∈ G. Let E be the set defined by

∂f E = (x, y) ∈ G : (x, y) = lim Dh f (x, y) exists . ∂x h→0 The derivative ∂f /∂x is automatically finite in E since f is locally Lipschitz continuous in G. Our goal is to show that E is measurable in Rn , that |G−E| = 0, and that (∂f /∂x)(x, y) is a measurable function on E. Let D+ f (x, y) = lim Dh f (x, y) h→0+

and

D− f (x, y) = lim Dh f (x, y) h→0−

at any (x, y) ∈ G where these limits exist, and define sets E+ and E− by E+ = (x, y) ∈ G : D+ f (x, y) exists , E− = (x, y) ∈ G : D− f (x, y) exists . Note that E = (x, y) ∈ E+ ∩ E− : D+ f (x, y) = D− f (x, y) .

(7.58)

Moreover, by Lemma 7.54, E+ can be expressed in terms of a sequential limit:

E = (x, y) ∈ G : lim D1/k f (x, y) exists , +

k→∞

and by similar reasoning,

E− = (x, y) ∈ G : lim D−1/k f (x, y) exists . k→∞

168

Measure and Integral: An Introduction to Real Analysis

In order to have a difference quotient that is well-defined for all h = 0 when (x, y) ∈ G, we extend f to be zero outside G. Thus, let "

f in G f¯ = 0 in Rn − G. Then f¯ is measurable in Rn , and consequently, Dh f¯ is defined and measurable in Rn for all h = 0. In particular, both D1/k f¯ and D−1/k f¯ are measurable in Rn for all k = 1, 2, . . .. Since G is open, for every (x, y) ∈ G, we have D1/k f¯ (x, y) = D1/k f (x, y) for all large k. Therefore,

¯ E = (x, y) ∈ G : lim D1/k f (x, y) exists , +

k→∞

and a similar representation holds for E− with D1/k replaced by D−1/k . It follows that E+ and E− are measurable in Rn (cf. Exercise 23 in Chapter 4) and that D+ f is measurable on E+ and D− f is measurable on E− . Then (7.58) implies that E is a measurable set in Rn and ∂f /∂x is a measurable function on E. Since G is open, for every y ∈ Rn−1 , the one-dimensional set Gy defined by Gy = x ∈ (−∞, ∞) : x, y ∈ G is open (possibly empty) in R1 . Also, f (x, y) considered as a function of x is locally Lipschitz continuous in Gy for every y. Hence, by Corollary 7.23, (∂f /∂x)(x, y) exists for a.e. (linear measure) x ∈ Gy . Defining Ey in the same of E manner as Gy , we have from Tonelli’s theorem and the measurability n−1 that for a.e. y ∈ R , Ey is linearly measurable, Ey = Gy (linear measure again), and |E| =

Ey dy = Gy dy = |G|. Rn−1

Rn−1

In case G has finite measure, it follows that |G − E| = 0, and we have accomplished our goal. The same is true if |G| = ∞ by intersecting G and E with a sequence of open balls increasing to Rn . Finally, a similar argument holds for every coordinate xi , i = 1, . . . , n, and the proof of Lemma 7.55 is complete. Before proceeding, we remark that if f is any measurable function in an open set G and if ∂f /∂xi exists a.e. in a measurable set E ⊂ G for some i, then ∂f /∂xi is measurable in E. If i = 1, for example, this follows by considering the extension f¯ in the proof of Lemma 7.55 and using the fact that Dhk f¯ is measurable in Rn and converges a.e. in E to ∂f /∂x1 for any fixed

169

Differentiation

sequence hk → 0. While this fact is generally useful, we did not use it in the proof of Lemma 7.55 because the existence of the first partial derivatives of f a.e. in G was not guaranteed in advance. The next lemma establishes a fact about uniform convergence of difference quotients. Its proof is similar in spirit to the proof of Egorov’s theorem (see also Exercise 13 of Chapter 4). We will continue to use the notations in (7.56) and (7.57) for points in Rn and the difference quotient in the first variable. Lemma 7.59 Let G be an open set in Rn , n > 1, and E be a measurable subset of G with |E| < ∞. Let f be a continuous function on G such that ∂f /∂x exists and is finite in E. Then given ε > 0, there exist a closed set F ⊂ E and δ > 0 such that |E − F| < ε, the set (x + h, y) : (x, y) ∈ F, |h| ≤ δ is contained in G, and Dh f (x, y) converges uniformly to (∂f /∂x)(x, y) in F as h → 0. A similar result holds for the difference quotient of f in each of the other coordinate variables. Proof. Fix G, E, and f as in the hypothesis. For m, k = 1, 2, . . ., let Em,k

= (x, y) ∈ E : (x + h, y) ∈ G and

Dh f (x, y) − ∂f (x, y) ≤ 1 if 0 < |h| ≤ 1 . m ∂x k

To see why each Em,k is measurable, first define G(r) = {(x, y) ∈ G : (x + s, y) ∈ G if |s| ≤ |r|},

r ∈ R1 − {0}.

Thus, G(r) consists of all points (x, y) of G such that the closed line segment of length 2|r| centered at (x, y) and parallel to the x-axis lies in G. Since the distance from any compact line segment in G to the complement of G is positive (cf. Exercise 12(b) of Chapter 1), it follows that G(r) is open and therefore that the set E(r) = G(r) ∩ E is measurable. Next, using the continuity of f on G, and letting Q denote the collection of all rational numbers, we can represent Em,k as a countable intersection as follows: Em,k =

# r∈Q 0 0 ε ⊂ E and an index k = kε and m = 1, 2, . . ., there is a measurable set Hm = Hm m m −m−1 such that |E − Hm | < ε2 and Dh f (x, y) − ∂f (x, y) ≤ 1 m ∂x

if (x, y) ∈ Hm and 0 < |h| ≤

1 . km

$ Let H = ∞ 1 Hm and δ = 1/k1 . Note that if (x, y) ∈ H and |h| ≤ δ then (x + h, y) ∈ G. Also, Dh f converges uniformly to ∂f /∂x in H as h → 0, and |E − H| ≤

∞ 1

ε 2m+1

=

ε . 2

Now choose a closed set F = Fε ⊂ H with |H − F| < ε/2. Then |E − F| < ε, (x + h, y) ∈ G if (x, y) ∈ H and |h| ≤ δ, and Dh f converges uniformly in F as h → 0. This proves Lemma 7.59. The final fact that we will use to prove Theorem 7.53 is given in the next lemma. Lemma 7.60 Let G be an open set in Rn , n > 1, and E be a measurable subset of G. Let f be a continuous function on G all of whose first partial derivatives exist and are finite in E. Then f has an approximate differential a.e. in E. Proof. The proof is by induction on n. Let G, E, and f satisfy the hypothesis. Fix n ≥ 2 and assume that the result is true for dimension n − 1. In case n = 2, this inductive assumption is true since a function of a single variable has an approximate derivative at every point where it has a derivative. We will again use the notation (x, y) and Dh f (x, y) in (7.56) and (7.57). Without loss of generality, we may assume that |E| < ∞. Let ε > 0 and choose δ and F as in Lemma 7.59. Then (x + h, y) ∈ G if (x, y) ∈ F and |h| ≤ δ, |E − F| < ε, and Dh f converges uniformly to ∂f /∂x in F. We may also assume that |F| > 0 and, by Lusin’s theorem, that ∂f /∂x is continuous on F relative to F. Fix any x0 ∈ (−∞, ∞) for which the set Fx0 defined by

Fx0 = y ∈ Rn−1 : x0 , y ∈ F

171

Differentiation

has positive (n − 1)-dimensional measure. Note that a.e. (linear measure) x0 such that Fx0 is not empty has this property by Fubini’s theorem. Similarly, define

Gx0 = y ∈ Rn−1 : x0 , y ∈ G , Ex0 = y ∈ Rn−1 : x0 , y ∈ E . Then G x0 is open in Rn−1 , and Ex0 is measurable in Rn−1 for a.e. x0 . Also, f x0 , y considered as a function of y is continuous in Gx0 , and since f has finite first partial derivatives a.e. in E (by hypothesis), we may assume that f x0 , y has finite first partial derivatives with respect to y for a.e. y ∈ Ex0 . Thus, by our inductive hypothesis, f x0 , y has an approximate differential in y a.e. ((n − 1)-dimensional measure) in Ex0 and so a.e. in Fx0 . Now fix any y0 ∈ Fx0 such that y0 is an (n − 1)-dimensional point of density of Fx0 and such that f x0 , y has a differential in y at y = y0 , that is, so that ˜ x0 , y0 · y − y0 + o |y − y0 | f x0 , y = f x0 , y0 + A

as y → y0 , (7.61)

˜ x0 , y0 is a vector in Rn−1 . Almost every point of Fx has these where A 0 properties. We claim that f has an approximate differential (relative to Rn ) at x0 , y0 . With δ as above, let H ⊂ Rn be the Cartesian product [x0 − δ, x0 + δ] × Fx0 : H = (x, y) ∈ Rn : |x − x0 | ≤ δ, y ∈ Fx0 . Then H ⊂ G, and x0 , y0 is an n-dimensional point of density of H since y0 is an (n − 1)-dimensional point of density of Fx0 . Let (x, y) ∈ H and write

A x0 , y0 =

∂f ˜ x0 , y0 , A x0 , y0 . ∂x

Then A(x0 , y0 ) is a vector in Rn and f (x, y) − f x0 , y0 − A x0 , y0 · x − x0 , y − y0 = f (x, y) − f x0 , y + f x0 , y − f x0 , y0 − A x0 , y0 · x − x0 , y − y0 & % ∂f x0 , y (x − x0 ) = f (x, y) − f x0 , y − ∂x & % ∂f ∂f x0 , y − x0 , y0 (x − x0 ) + ∂x ∂x ˜ x0 , y0 · y − y0 + f x0 , y − f x0 , y0 − A = I + II + III,

172

Measure and Integral: An Introduction to Real Analysis

say. Our by showing that each be proved of |I|, |II|, and |III| has claim will size o |x − x0 | + y − y0 if (x, y) ∈ H and (x, y) → x0 , y0 . Let η > 0. uniformly to ∂f /∂x in F, there exists ν > Since Dh f converges 0 such that Dh f − ∂f /∂x < η in F if 0 < |h| < ν. We may assume that ν < δ. Hence, since I=

% & ∂f Dx−x0 f x0 , y − x0 , y (x − x0 ) , ∂x

we obtain |I| < η |x − x0 |

if |x − x0 | < ν and (x, y) ∈ H.

Also, if (x, y) ∈ H, then x0 , y ∈ F, and consequently by using the continuity of ∂f /∂x on F relative to F, we see that there exists ν > 0 independent of (x, y) such that |II| < η |x − x0 |

if y − y0 < ν and (x, y) ∈ H.

Finally, by (7.61), we have |III| < η y − y0 when y − y0 is sufficiently small. Our claim follows by combining estimates. Since a.e. point in F shares the properties of the point x0 , y0 above, we conclude that f has an approximate differential relative to Rn a.e. in F. Recall that the value of ε > 0 can be arbitrarily small and that the set F = Fε satisfies F ⊂ E and |E − F| < ε. Now let ε → 0 through a sequence {εm }. Then f has an approximate differential a.e. in the set m Fεm , which is a subset of E of full measure. This completes the proof of the Lemma 7.60. Proof of Theorem 7.53. Let G be an open set in Rn , n > 1, and f ∈ Liploc (G). By Lemma 7.55, f has measurable first partial derivatives a.e. in G. The fact that these derivatives are bounded a.e. on every compact subset of G follows from their definitionsas limits of difference quotients since f ∈ Liploc (G). In fact, if Gj : j = 1, 2, . . . is a sequence of bounded open sets with closures contained in G such that Gj G, for example, Gj = x ∈ G : |x| < j and d(x, CG) > 1/j ,

j = 1, 2, . . . ,

then f ∈ Lip Gj for each j, and every compact set in G lies in some Gj . Clearly, the first partial derivatives of a function that is Lipschitz continuous in an open set are bounded a.e. in that set.

173

Differentiation

Thus, it remains only to show that f has a total first differential a.e. in G. However, this is an immediate consequence of Lemma 7.60 and Theorem 7.50, which completes the proof of the Rademacher–Stepanov theorem. An examination of the proof of Theorem 7.53 shows that the assumption of local Lipschitz continuity of f is used in two key ways, first in order to show that f has measurable partial derivatives a.e. in G and again in order to conclude that f has a total differential wherever it has an approximate one (Theorem 7.50). Note that the part of the proof showing that f has an approximate differential a.e. in G (Lemmas 7.59 and 7.60) does not require Lipschitz continuity, although it uses continuity of f in G and relies on the existence of the first partial derivatives of f a.e. in G. In order to transfer some of the proof technique to other situations, we can bypass the first way that Lipschitz continuity is used by simply assuming that all ∂f /∂xi exist and are finite. With this assumption, the next result gives a condition different from Lipschitz continuity that implies total differentiability. The condition is stated as follows in terms of the size of the second difference of f . We say that a function f that is defined and finite in a neighborhood of a point x ∈ Rn is smooth at x if f (x + h) + f (x − h) − 2f (x) = o(|h|)

as |h| → 0.

(7.62)

The intuitive reason for calling such an f smooth at x arises from the one-dimensional situation, noting that f (x + h) + f (x − h) − 2f (x) f (x + h) − f (x) f (x − h) − f (x) = − , h h −h

h = 0,

and consequently, if f is smooth and has a finite derivative from either side at x, then it has a derivative from both sides at x, and the two are the same. Thus, the graph of a function that is smooth at a point cannot have a corner there. Theorem 7.63 Let G be an open set in Rn , n > 1, and E be a measurable subset of G. Let f be a continuous function on G such that (i) f has finite partial derivatives ∂f /∂xi in E, i = 1, . . . , n, (ii) f satisfies the smoothness condition (7.62) at every x ∈ E. Then each ∂f /∂xi is measurable on E, and f has a total first differential a.e. in E. Proof. We will be brief. Fix G, E, and f as in the hypothesis. The measurability on E of the derivatives ∂f /∂xi follows from the remark after the proof of

174

Measure and Integral: An Introduction to Real Analysis

Lemma 7.55. It remains to show that f has a total first differential a.e. in E. For m, k = 1, 2, . . . , define Em,k

|f (x + h) + f (x − h) − 2f (x)| 1 1 . = x∈E: ≤ if 0 < |h| ≤ |h| m k

By (7.62), for every m, Em,k E as k ∞. Every Em,k is measurable since the continuity of f in G gives #

Em,k =

r∈Qn 0 0, it follows (cf. the proof of Lemma 7.59) that there is a measurable set H ⊂ E with |E − H| < ε such that |f (x + h) + f (x − h) − 2f (x)| → 0 uniformly in H as |h| → 0. |h| To complete the proof of the theorem, it is enough to show that f has a total differential a.e. in H. By Lemma 7.60, f has an approximate differential a.e. in E and therefore also a.e. in H. Let x0 ∈ H be a point of density of H where f has an approximate differential. It suffices to prove that f has a total differential at x0 . By the definition of an approximate differential, there is a vector A0 ∈ Rn and a measurable set, which we may assume is the same as the set H, having x0 as a point of density such that f (x0 + h) − f (x0 ) − A0 · h = o(|h|),

h ∈ H, as |h| → 0.

Without loss of generality, we may assume that x0 = 0, f (x0 ) = 0, and A0 = 0. Thus, given η > 0, there exists ρ0 > 0 such that both sup x,h x∈H,|h|≤ρ

|f (x + h) + f (x − h) − 2f (x)| ≤ ηρ if 0 ≤ ρ ≤ ρ0

and |f (h)| ≤ η|h|

if h ∈ H and 0 ≤ |h| ≤ ρ0 .

If ρ is sufficiently small, then every u ∈ Rn satisfying |u| < ρ can be expressed in the form u = x + h with x − h, x ∈ H and |x|, |h| < ρ (see Exercise 30).

175

Differentiation

Combining estimates, we then obtain that for all sufficiently small ρ > 0 and all u with |u| < ρ, |f (u)| = |f (x + h)| ≤ |f (x + h) + f (x − h) − 2f (x)| + |f (x − h)| + 2|f (x)| ≤ ηρ + η 2ρ + 2ηρ = 5ηρ, and the theorem follows. In passing, we will derive an interesting result about extending a Lipschitz function from a set to the entire space in such a way that the extended function remains Lipschitz continuous. The result is not limited to functions defined in open sets. If E is any set in Rn and f is a finite real-valued function defined on E, then f is said to be Lipschitz continuous on E if there is a constant C such that |f (x) − f (y)| ≤ C|x − y|,

x, y ∈ E.

The smallest such C, namely, the constant |f (x) − f (y)| , |x − y| x,y∈E

Cf ,E = sup x=y

is called the Lipschitz constant of f on E. We have the following extension result for such functions. Theorem 7.64 Let f be Lipschitz continuous on a set E ⊂ Rn . Then f can be extended to Rn as a Lipschitz function with the same Lipschitz constant, that is, there is a function f1 ∈ Lip (Rn ) such that f1 = f on E and Cf1 ,Rn = Cf ,E . Proof. Part of the proof will be left as an exercise. Let f and E be as n in the hypothesis if y, y0 ∈ E and and denote C = Cf ,E . Note that x ∈ R , then f y0 −C x −y0 ≤ f (y) + C|x − y| since f y0 − f (y) ≤ C y0 − y ≤ C x − y0 + x − y . Therefore, the conical functions γy (x) defined for y ∈ E and x ∈ Rn by γy (x) = γy,f (x) = f (y) + C|x − y| satisfy inf γy (x) ≥ f y0 − C x − y0 > −∞

y∈E

y0 ∈ E

176

Measure and Integral: An Introduction to Real Analysis

for all x ∈ Rn . Also, for any y0 ∈ E, inf y∈E γy (x) ≤ γy0 (x) < +∞. Define f1 (x) = inf γy (x), y∈E

x ∈ Rn .

(7.65)

Then (see Exercise 32) f1 (x) = f (x) if x ∈ E, f1 ∈ Lip (Rn ) and Cf1 ,Rn = C = Cf ,E .

Exercises 1. Let f be measurable in Rn and different from zero in some set of positive measure. Show that there is a positive constant c such that f ∗ (x) ≥ c|x|−n for |x| ≥ 1. 2. Let φ(x), x ∈ Rn , be a bounded measurable function such that φ(x) = 0 for |x| ≥ 1 and φ = 1. For ε > 0, let φε (x) = ε−n φ(x/ε). (φε is called an approximation to the identity.) If f ∈ L(Rn ), show that lim

ε→0

f ∗ φε (x) = f (x)

in the Lebesgue set of f . (Note that

φε = 1, ε > 0, so that

f ∗ φε (x) − f (x) = [f (x − y) − f (x)]φε (y) dy.

Use Theorem 7.16.) 3. Show that the conclusion of Lemma 7.4 remains true for the case of two dimensions if instead of being squares, the sets Q covering E are rectangles with x-dimension equal to h and y-dimension equal to h2 . (Of course, h varies with Q and the rectangles have edges parallel to the coordinate axes.) Show that the conclusion of Theorem 7.2 remains valid for n = 2 if the cubes Q are replaced by rectangles of this type that are centered at x and with h → 0. Show that the same conclusions are valid for rectangles whose xdimension is h and whose y-dimension is any fixed increasing function of h. Generalize this to higher dimensions. 4. If E1 and E2 are measurable subsets of R1 with |E1 | > 0 and |E2 | > 0, prove that the set {x : x = x1 − x2 , x1 ∈ E1 , x2 ∈ E2 } contains an interval. (cf. Lemma 3.37.)

177

Differentiation

5. Let f be of bounded variation on [a, b]. If f = g + h, where g is absolutely continuous and h is singular, show that b

φ df =

a

6. 7.

8.

9.

b

φf dx +

a

b

φ dh

a

for any continuous φ. Show that if α > 0, then xα is absolutely continuous on every bounded subinterval of [0, ∞). Prove that f is absolutely continuous on [a, b] if and only if given ε > 0, there existsδ > 0 such that [ f (bi ) − f (ai )] < ε for every finite collection [ai , bi ] of nonoverlapping subintervals of [a, b] with (bi − ai ) < δ. Prove the following converse of Theorem 7.31: If f is of bounded variation on [a, b], and if the function V(x) = V[a, x] is absolutely continuous on [a, b], then f is absolutely continuous on [a, b]. If f is of bounded variation on [a, b], show that b f ≤ V[a, b]. a

Show that if equality holds in this inequality, then f is absolutely continuous on [a, b]. (For the second part, use Theorems 2.2(ii) and 7.24 to show that V(x) is absolutely continuous and then use the result of Exercise 8.) 10. (a) Show that if f is absolutely continuous on [a, b] and Z is a subset of [a, b] of measure zero, then the image set defined by f (Z) = {w : w = f (z), z ∈ Z} also has measure zero. Deduce that the image under f of any measurable subset of [a, b] is measurable. (Compare Theorem 3.33.) (Hint: use the fact that the image of an interval [ai , bi ] is an interval of length at most V (bi ) − V (ai ).) (b) Give an example of a strictly increasing Lipschitz continuous function f and a set Z with measure 0 such that f −1 (Z) does not have measure 0 (and consequently, f −1 is not absolutely continuous). (Let f −1 (x) = x + C(x) on [0, 1], where C(x) is the Cantor–Lebesgue function.) 11. Prove the following result concerning changes of variable. Let g(t) be monotone increasing and absolutely continuous on [α, β] and let f be integrable on [a, b], a = g(α), b = g(β). Then f (g(t))g (t) is measurable and integrable on [α, β], and b a

f (x) dx =

β α

f (g(t))g (t) dt.

178

Measure and Integral: An Introduction to Real Analysis

(Consider the cases when f is the characteristic function of an interval, an open set, etc.) 12. Use Jensen’s inequality to prove that if a, b ≥ 0, p, q > 1, (1/p) + (1/q) = 1, then ab ≤

bq ap + . p q

More generally, show that pj

a1 · · · aN ≤

N a j j=1

where aj ≥ 0, pj > 1, convexity of ex .)

N

j=1 (1/pj )

pj

,

= 1. (Write aj = exj /pj and use the

13. Prove Theorem 7.36. 14. Prove that φ is convex on (a, b) if and only if it is continuous and

x1 + x2 φ 2

≤

φ (x1 ) + φ (x2 ) 2

for x1 , x2 ∈ (a, b). 15. Theorem 7.43 shows that a convex function is the indefinite integral of a monotone increasing function. Prove the converse: If φ(x) = xa f (t)dt + φ(a) in (a, b) and f is monotone increasing, then φ is convex in (a, b). (Use Exercise 14.) 16. Show that the formula +∞ −∞

fg = −

+∞

f g

−∞

for integration by parts may not hold if f is of bounded variation on (−∞, +∞) and g is infinitely differentiable with compact support. (Let f be the Cantor–Lebesgue function on [0, 1], and let f = 0 elsewhere.) 17. A sequence {φk } of set functions is said to be uniformly absolutely continuous if given ε > 0, there exists δ > 0 such that if E satisfies |E| < δ, then |φk (E)| < ε for all k. If fk is a sequence of integrable functions on (0, 1) which converges pointwise a.e. to an integrable f , show that 1 0 f − fk → 0 if and only if the indefinite integrals of the fk are uniformly absolutely continuous. (cf. Exercise 23 of Chapter 10.)

Differentiation

179

18. Prove the following set-theoretic result related to the simple Vitali covering lemma. If C = {Q} is a collection of cubes all contained in a fixed bounded set in Rn , then there is a countable subcollection {Qk } of disjoint cubes in C such that every Q ∈ C is contained in some Q∗k , where Q∗k denotes the cube concentric with Qk of edge length 5 times that of Qk . Deduce the measure-theoretic consequence (cf. Lemma 7.4) that if a set E is covered by such a collection C of cubes, then there exist β > 0, depending only on n, and a finite number of disjoint cubes Q1 , . . . , QN in C such that β|E|e ≤ N k=1 |Qk |. Formulate analogues of these facts for a collection of balls in Rn . 19. Use Exercise 18 to prove Lemma 7.9. 20. (a) Let f (x) be defined for all x ∈ Rn by f (x) = 0 if every coordinate of x is rational, and f (x) = 1 otherwise. Describe the set of all x at which 1 |Q| Q f has a limit as Q x and describe all Lebesgue points of f .

21.

22.

23. 24.

25.

26.

(b) Give an example of a bounded function f on (−∞, ∞) with the following x properties: f is continuous except at a single point x0 ; (d/dx) 0 f = f (x) for all x (in particular when x = x0 ); x0 is not a Lebesgue point of f . For x ∈ Rn and 0 < α < n, define f (x) = |x|−α χ{|x| 0, let Hρ be the intersection of H with the ballBρ of radius ρ centered at 0. (a) Show that the set 2x − y :x, y ∈ Hρ covers Bρ if ρ is sufficiently small. Deduce that the set x + h : x, x − h ∈ Hρ , h ∈ Rn covers Bρ if ρ is small. (For the first part, compare Lemma 3.37. Recall from Theorem 3.35 that if E is a measurable set in Rn , then the set 2E defined by 2E = {2x : x ∈ E} has measure |2E| = 2n |E|. Note that 2Hρ and the set obtained by translating Hρ by any fixed point in Bρ are both subsets of B2ρ .) (b) While the result in part (a) is adequate for the purpose of proving Theorem 7.63, it can be improved and generalized. If r, s are real numbers with 0 < |s| ≤ |r|, show that the set rx + sy : x, y ∈ Hρ covers B|r|ρ if ρ is sufficiently small. (Argue as for part (a), but consider only translations of an appropriate subset of −sHρ .) 31. Let f be a finite measurable function on Rn that satisfies f (x + h) + f (x − h) − 2f (x) = O(1) uniformly in x, h for |h| < δ, that is, there are positive constants A, δ such that |f (x + h) + f (x − h) − 2f (x)| ≤ A if x, h ∈ Rn and |h| < δ. Show that f is bounded on every bounded set in Rn . (First show that f is bounded on some ball in Rn . To do so, note that there is a measurable set E with |E| > 0 on which f is bounded. Pick a point of density of E and apply Exercise 30.) Generalize the result to functions f that satisfy f (x + ay) + f (x + by) − 2f (x) = O(1) for fixed real a, b = 0.

Differentiation

181

32. Complete the proof of the extension Theorem 7.64 by verifying the properties listed in the last line of its proof for the function f1 defined in (7.65). 33. Let f be defined and Lipschitz continuous on a measurable set E ⊂ Rn . Show that f has an approximate differential relative to E at almost every point of E.

8 Lp Classes

8.1 Definition of Lp If E is a measurable subset of Rn and p satisfies 0 < p < ∞, then Lp (E) denotes p the collection of measurable f for which E | f | is finite, that is, p

L (E) = f :

p

| f | < +∞ ,

0 < p < ∞.

E

Here, f may be complex-valued (see Exercise 3 of Chapter 4 for the definition of measurability of vector-valued functions). In this case, if f = f1 + if2 for measurable real-valued f1 and f2 , we have | f |2 = f12 + f22 , so that f1 , f2 ≤ | f | ≤ f 1 + f 2 . It follows that f ∈ Lp (E) if and only if both f1 , f2 ∈ Lp (E). See Exercise 1. We shall write || f ||p,E =

1/p |f|

p

,

0 < p < ∞;

E

thus, Lp (E) is the class of measurable f for which || f ||p,E is finite. Whenever it is clear from context what E is, we will write Lp for Lp (E) and || f ||p for || f ||p,E . Note that L1 = L. In order to define L∞ (E), let f be real-valued and measurable on a set E of positive measure. Define the essential supremum of f on E as follows: If | x ∈ E : f (x) > α | > 0 for all real α, let essE sup f = +∞; otherwise, let ess sup f = inf{α : |{x ∈ E : f (x) > α}| = 0}. E

Since the distribution function ω(α) = |{x ∈ E : f (x) > α}| is continuous from the right (see Lemma 5.39), it follows that ω(essE sup f ) = 0 if essE sup f 183

184

Measure and Integral: An Introduction to Real Analysis

is finite. Therefore, essE sup f is the smallest number M, −∞ ≤ M ≤ + ∞, such that f (x) ≤ M except for a subset of E of measure zero. In the definition of essE sup f , we have assumed that |E| = 0. If the same definition were used when |E| = 0, then essE sup f would be −∞ for every real-valued f defined on E, resulting in incorrect or awkward statements of results such as Theorem 8.1. We will avoid technical difficulty of this type by adopting the convention that essE sup f is 0 when |E| = 0. This may be considered an analogue of the fact that ∫E f = 0 when |E| = 0. In practice, we will use the convention only when f ≥ 0 and |E| = 0. A real- or complex-valued measurable f is said to be essentially bounded, or simply bounded, on E if essE sup | f | is finite. By convention, if |E| = 0, then every function is essentially bounded on E and has essential supremum equal to 0. The class of all functions that are essentially bounded on E is denoted by L∞ (E). Clearly, f belongs to L∞ (E) if and only if its real and imaginary parts do. We shall write || f ||∞ = || f ||∞,E = ess sup | f |. E

Thus, || f ||∞ is the smallest M such that | f | ≤ M a.e. in E, and L∞ = L∞ (E) = { f : || f ||∞ < +∞}. The following theorem gives some motivation for this notation.

Theorem 8.1

If |E| < +∞, then || f ||∞ = limp→∞ || f ||p .

Proof. We may assume that |E| > 0 since f ∞ and f p are both 0 if |E| = 0. Let M = f ∞ . If M < M, then the set A = {x ∈ E : | f (x)| > M } has posi 1/p tive measure. Moreover, || f ||p ≥ A | f |p ≥ M |A|1/p . Since |A|1/p → 1 as

p → ∞, it follows that lim inf p→∞ || f ||p ≥ M and therefore that lim inf p→∞

1/p || f ||p ≥ M. However, we also have || f ||p ≤ E Mp = M|E|1/p . Hence, lim supp→∞ || f ||p ≤ M, which completes the proof. Remark: This result may fail if |E| = +∞. Consider, for example, the constant function f (x) = c, c = 0, in (0, ∞). Clearly, f ∈ L∞ but f ∈ / Lp for 0 < p < ∞. See also Exercise 26. We will now study some basic properties of the Lp classes.

Theorem 8.2

If 0 < p1 < p2 ≤ ∞ and |E| < +∞, then Lp2 ⊂ Lp1 .

Lp Classes

185

Proof. Write E = E1 ∪ E2 , E1 being the set where | f | ≤ 1 and E2 the set where | f | > 1. Then E

| f |p =

E1

| f |p +

| f |p ,

0 < p < ∞.

E2

The first term on the right is majorized by |E1 |; the second increases with p since its integrand exceeds 1. It follows that if f ∈ Lp2 , p2 < ∞, then f ∈ Lp1 , p1 < p2 . If p1 < p2 = ∞ and f ∈ L∞ then f is a bounded function on a set of finite measure and so belongs to Lp1 . Remarks (i) In Theorem 8.2, the hypothesis that E have finite measure cannot be omitted: for example, x−1/p1 belongs to Lp2 (1, ∞) if p2 > p1 but does not belong to Lp1 (1, ∞). Again, any nonzero constant is in L∞ , but is not in Lp1 (E) if |E| = +∞ and p1 < ∞. (ii) A function may belong to all Lp1 with p1 < p2 and yet not belong to Lp2 . In fact, if p2 < ∞, x−1/p2 belongs to Lp1 (0, 1), p1 < p2 , but does not belong to Lp2 (0, 1); log(1/x) is in Lp1 (0, 1) for p1 < ∞, but is not in L∞ (0, 1). (iii) We leave it to the reader to show that any function that is bounded on E (|E| < +∞ or not) and that belongs to Lp1 (E) also belongs to Lp2 (E), p2 > p1 . The next theorem states that the Lp classes are vector (i.e., linear) spaces. Its proof is left as an exercise.

Theorem 8.3 constant c.

If f , g ∈ Lp (E), p > 0, then f + g ∈ Lp (E) and cf ∈ Lp (E) for any

8.2 Hölder’s Inequality and Minkowski’s Inequality In order to discuss the integrability of the product of two functions, we will use the following basic result.

Theorem 8.4 (Young’s Inequality) Let y = φ(x) be continuous, real-valued, and strictly increasing for x ≥ 0 and let φ(0) = 0. If x = ψ(y) is the inverse of φ, then for a, b > 0,

186

Measure and Integral: An Introduction to Real Analysis

ab ≤

a

φ(x) dx +

0

b

ψ(y) dy.

0

Equality holds if and only if b = φ(a). Proof. A geometric proof is immediate if we interpret each term as an area and remember that the graph of φ also serves as that of ψ if we interchange the x- and y-axes. Equality holds if and only if the point (a, b) lies on the graph of φ.

y y = f(x) b

0

a

x

If φ(x) = xα , α > 0, then ψ(y) = y1/α , and Young’s inequality becomes ab ≤ a1+α /(1 + α) + b1+1/α /(1 + 1/α). Setting p = α + 1 and p = 1 + 1/α, we obtain

bp 1 ap 1 + if a, b ≥ 0, 1 < p < ∞, and + = 1. ab ≤ p p p p

(8.5)

Two numbers p and p that satisfy 1/p + 1/p = 1, p, p > 1, are called conjugate exponents. Note that p = p/(p − 1), and that 2 is self-conjugate. We will adopt the conventions that p = ∞ if p = 1, and p = 1 if p = ∞. Theorem 8.6 (Hölder’s Inequality) If 1 ≤ p ≤ ∞ and 1/p + 1/p = 1, then || fg||1 ≤ || f ||p ||g||p ; that is, E

E

| fg| ≤

1/p | f |p

E

E

|g|

p

| fg| ≤ ess sup | f | |g|. E

E

1/p

,

1 < p < ∞;

Lp Classes

187

Proof. The last inequality, which corresponds to the case p = ∞, is obvious. Let us suppose then that 1 < p < ∞. In case || f ||p = ||g||p = 1, (8.5) implies that p

p

| f |p ||g||p

|| f ||p |g|p +

+ | fg| ≤ = p p p p

E

E

=

1 1 + = 1 = || f ||p ||g||p . p p

For the general case, we may assume that neither || f ||p nor ||g||p is zero; otherwise, fg is zero a.e. in E, and the result is immediate. We may also assume that neither || f ||p nor ||g||p is infinite. If we set f1 = f /|| f ||p and

g1 = g/||g||p , then || f1 || p = ||g1 ||p = 1. Therefore, by the case already consid ered, we have E f1 g1 ≤ 1; that is, E | fg| ≤ || f ||p ||g||p , as desired. See Exercise 4 concerning the case of equality in Hölder’s inequality. The case p = p = 2 of Hölder’s inequality is a classical inequality:

Corollary 8.7 (Schwarz’s Inequality) E

| fg| ≤

|f|

2

1/2

E

1/2 |g|

2

.

E

The theorem that follows is usually referred to as the converse of Hölder’s inequality (see also Exercise 15 in Chapter 10). Theorem 8.8 Let f be real-valued and measurable on E, and let 1 ≤ p ≤ ∞ and 1/p + 1/p = 1. Then (8.9) || f ||p = sup fg, E

where the supremum is taken over all real-valued g such that g p ≤ 1 and exists.

E fg

Proof. That the left-hand side of (8.9) majorizes the right-hand side follows from Hölder’s inequality. To show the opposite inequality, let us consider first the case of f ≥ 0, 1 < p < ∞. If || f ||p = 0, then f = 0 a.e. in E, and the result is obvious. If 0 < || f ||p < +∞, we may further assume that || f ||p = 1 by dividing both sides of (8.9)

188

Measure and Integral: An Introduction to Real Analysis

by || f ||p . Now let g = f p/p . It is easy to verify that ||g||p = 1 and which completes the proof in this case. If || f ||p = +∞, define functions fk on E by setting fk (x) = 0 if |x| > k,

E fg

= 1,

fk (x) = min{ f (x), k} if |x| ≤ k.

Then each fk belongs to Lp and || fk ||p → || f ||p = +∞. By the case already considered, we have || fk ||p = E fk gk for some gk ≥ 0 with ||gk ||p = 1. Since f ≥ fk , it follows that

fgk ≥

E

fk gk → +∞.

E

This shows that

fg = +∞ = || f ||p .

sup

||g||p =1 E

To dispose of the restriction f ≥ 0, apply the result above to | f |. Thus, there exists gk with ||gk ||p = 1 such that || f ||p = lim

| f | gk = lim

E

f g˜ k ,

E

where g˜ k = gk (sign f ). (By sign

x, we mean the function equal to +1 for x > 0 and to −1 for x < 0.) Since g˜ k p = 1, the result follows. The cases p = 1 and ∞ are left as exercises. We leave it to the reader to check that (8.9) is true if thesupremum is taken only over those real-valued g with ||g||p = 1 for which E fg exists. Also, if 1 ≤ p ≤ ∞, then a measurable function f belongs to Lp (E) if fg ∈ L1 (E) for

every g ∈ Lp (E), 1/p + 1/p = 1. See Exercise 2. We have already observed that the sum of two Lp functions is again in Lp . The next theorem gives a more specific result when 1 ≤ p ≤ ∞. If 1 ≤ p ≤ ∞, then f + g p ≤

Theorem 8.10 (Minkowski’s Inequality) f p + g p ; that is,

1/p | f + g|

E

p

≤

1/p |f|

E

p

+

1/p |g|

p

E

ess sup | f + g| ≤ ess sup | f | + ess sup |g|. E

E

E

,

1 ≤ p < ∞,

Lp Classes

189

Proof. If p = 1, the result is obvious. If p = ∞, we have | f | ≤ || f ||∞ a.e. in E and |g| ≤ ||g||∞ a.e. in E. Therefore, | f + g| ≤ || f ||∞ + ||g||∞ a.e. in E, so that || f + g||∞ ≤ || f ||∞ + ||g||∞ . For 1 < p < ∞, p

|| f + g||p =

| f + g|p =

E

≤

| f + g|p−1 | f + g|

E

| f + g|

E

p−1

|f| +

| f + g|p−1 |g|.

E

In the last integral, apply Hölder’s inequality to | f + g|p−1 and |g| with exponents p = p/(p − 1) and p, respectively. This gives

p−1

| f + g|p−1 |g| ≤ || f + g||p

||g||p .

E

Since a similar result holds for

p−1

E |f

p

+ g|p−1 | f |, we obtain || f + g||p ≤ || f +

(|| f ||p + ||g||p ), and the theorem follows by dividing both sides by || f + p−1 g||p . (Note that if || f + g||p = 0, there is nothing to prove; if || f + g||p = +∞, g||p

then either f p = +∞ or g p = +∞ by Theorem 8.3, and the result is obvious again.) See Exercise 27(a) concerning a version of Minkowski’s inequality for infinite series. Remark: Minkowski’s inequality fails for 0 < p < 1. To see this, take E = (0, 1), f = χ(0, 1 ) , and g = χ( 1 ,1) . Then || f + g||p = 1, while || f ||p + ||g||p = 2−1/p + 2

2

2−1/p = 21−1/p < 1. See also (8.17) and Exercise 27(b).

8.3 Classes l p Let a = {ak } be a sequence of real or complex numbers, and let

||a||p =

k

1/p |ak |p

, 0 < p < ∞;

||a||∞ = sup |ak |. k

Then a is said to belong to l p , 0 < p < ∞, if ||a||p < +∞, and to belong to l∞ if ||a||∞ < +∞. We will sometimes also denote ||a||p by ||a||l p .

190

Measure and Integral: An Introduction to Real Analysis

Let us show that if 0 < p1 < p2 ≤ ∞, then l p1 ⊂ l p2 . (The opposite inclusion holds for Lp (E), |E| < +∞ , by Theorem 8.2.) For p2 = ∞, this is clear, and for p2 < ∞, it follows from the fact that if |ak | ≤ 1, then |ak |p2 ≤ |ak |p1 . Moreover (see Exercise 31), ||a||p ≤ ||a||1 if 1 ≤ p ≤ ∞,

and ||a||p ≤ ||a||q if 0 < q ≤ p ≤ ∞.

2 for a given p < ∞ but that is not in An example ofa sequence that is in l p 2 1/p2 2 p 1 l for p1 < p2 is 1/k log k : k ≥ 2 . Any constant sequence {ak }, ak =

c = 0, belongs to l∞ but not to l p for p < ∞. The same is true for {1/ log k : k ≥ 2}, whose terms even tend to zero.

Theorem 8.11 a ∞ .

If a = {ak } belongs to l p for some p < ∞, then limp→∞ a p =

Proof. If a ∈ l p0 , then a ∈ l p for p0 ≤ p ≤ ∞. Since |ak | → 0,there is a |ak |p = |ak0 |p |ak /ak0 |p . largest |ak |, say |ak0 |. Thus, ||a|| ∞ = |ak0 |. Write |ak /ak0 |p decreases (and so is bounded) as Since |ak /ak0 | ≤ 1, we see that p p ∞. Hence, there is a constant c > 0 such that |ak0 |p ≤ ||a||p ≤ c|ak0 |p for all large p. Since c1/p → 1 as p → ∞, the theorem follows. The next two results are analogues for series of Hölder’s and Minkowski’s inequalities. Their proofs are left as exercises. If a = {ak } and b = {bk }, we use the notation ab = {ak bk } ,

a + b = {ak + bk } ,

etc.

Theorem 8.12 (Hölder’s Inequality) Suppose that 1 ≤ p ≤ ∞ , 1/p+1/p = 1, a = {ak }, b = {bk }, and ab = {ak bk }. Then ||ab||1 ≤ ||a||p ||b||p ; that is,

|ak bk | ≤

|ak |p

|ak bk | ≤ (sup |ak |)

1/p

|bk |p

1/p

,

1 < p < ∞;

|bk | .

Theorem 8.13 (Minkowski’s Inequality) Suppose that 1 ≤ p ≤ ∞, a = {ak }, b = {bk }, and a + b = {ak + bk }. Then ||a + b||p ≤ ||a||p + ||b||p ; that is,

|ak + bk |p

1/p

≤

|ak |p

1/p

+

sup |ak + bk | ≤ sup |ak | + sup |bk | .

|bk |p

1/p

,

1 ≤ p < ∞;

Lp Classes

191

Even though Minkowski’s inequality fails when p < 1 (see Exercise 3), l p is still a vector space for 0 < p < 1; that is, a + b ∈ l p and αa = {αak } ∈ l p if a, b ∈ l p and α is any constant.

8.4 Banach and Metric Space Properties We now define a notion that incorporates the main properties of Lp and l p when p ≥ 1. A set X is called a Banach space over the complex numbers if it satisfies the following three conditions: (B1 ) X is a linear space over the complex numbers C; that is, if x, y ∈ X and α ∈ C, then x + y ∈ X and αx ∈ X. (B2 ) X is a normed space; that is, for every x ∈ X there is a nonnegative (finite) number ||x|| such that (a) ||x|| = 0 if and only if x is the zero element of X, (b) ||αx|| = |α|||x|| for α ∈ C and x ∈ X, (c) x + y ≤ x + y . If these conditions are fulfilled, ||x|| is called the norm of x. (B3 ) X is complete with respect to its norm; that is, every Cauchy sequence in X converges in X, or if ||xk − xm || → 0 as k, m → ∞, then there is an x ∈ X such that ||xk − x|| → 0. A set X that satisfies (B1 ) and (B2 ), but not necessarily (B3 ), is called a normed linear space over the complex numbers. A sequence {xk } such that ||xk − x|| → 0 as k → ∞ is said to converge in norm to x. Restricting the scalars α in (B1 ) and (B2 ) to be real numbers, we obtain definitions for a Banach space over the real numbers and for a normed linear space over the real numbers. Unless specifically stated to the contrary, we will take the scalar field to be the complex numbers. If X is a Banach space, define d(x, y) = ||x − y|| to be the distance between x and y. Then, (M1 ) d(x, y) ≥ 0; d(x, y) = 0 if and only if x = y, (M2 ) d(x, y) = d(y, x), (M3 ) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality). Any set that has a distance function d(x, y) satisfying (M1 ), (M2 ), and (M3 ) is called a metric space with metric d. Therefore, a Banach space is a metric space whose metric is the norm. Moreover, by (B3 ), a Banach space X is a complete metric space; that is, if d (xk , xm ) → 0 as k, m → ∞, then there is an x ∈ X such that d (xk , x) → 0.

192

Theorem 8.14 || f ||p,E .

Measure and Integral: An Introduction to Real Analysis

For 1 ≤ p ≤ ∞, Lp (E) is a Banach space with norm || f || =

Proof. Parts (B1 ) and (B2 ) in the definition of a Banach space are clearly fulfilled by Lp (E), parts (a) and (c) of (B2 ) being Theorem 5.11 and Minkowski’s inequality, respectively. (Regarding part (a), we do not distinguish between two Lp functions that are equal a.e.; thus, the zero element of Lp (E) means any function equal to zero a.e. in E.) p To verify (B 3 ), suppose that fk is a Cauchy sequence in L (E). If p = ∞, fk − fm ≤ || fk − fm ||∞ except for a set Zk,m of measure zero. If Z = then measure zero, and fk − fm ≤ || fk − fm ||∞ outside Z for k,m Zk,m , then Z has all k and m. Hence, fk converges uniformly outside Z to a bounded limit f , and it follows that || fk −f ||∞ → 0. (Note that convergence in L∞ is equivalent to uniform convergence outside a set of measure zero.) In case 1 ≤ p < ∞, Tchebyshev’s inequality (5.49) implies that x ∈ E : | fk (x) − fm (x)| > ε ≤ ε−p fk − fm p . E

Hence, fk is a Cauchy in measure. By Theorems 4.22 and 4.23, sequence there is a subsequence fkj and a function f such that fkj → f a.e. in E. Given ε > 0, there is a K such that p 1/p = || fkj − fk ||p < ε if kj , k > K. f k j − fk E

Letting kj → ∞, we obtain by Fatou’s lemma that || f − fk ||p ≤ ε if k > K. Hence, || f −fk ||p → 0 as k → ∞. Finally, since || f ||p ≤ || f −fk ||p +|| fk ||p < +∞, it follows that f ∈ Lp (E), which completes the proof. A metric space X is said to be separable if it has a countable dense subset; that is, X is separable if there exists a countable set {xk } in X with the property that for every x ∈ X and every ε > 0, there is an xk with d (x, xk ) < ε. In the next theorem, we will show that Lp is separable if 1 ≤ p < ∞. Note that L∞ is not separable: take L∞ (0, 1), for example, and consider the functions ft (x) = χ(0,t) (x), 0 < t < 1. There are an uncountable number of these, and || ft −ft ||∞ = 1 if t = t (see also Exercise 10).

Theorem 8.15

If 1 ≤ p < ∞, Lp (E) is separable.

Proof. Suppose first that E = Rn , and consider a grid of dyadic cubes in Rn . Let D be the set of all (finite) linear combinations of characteristic functions

Lp Classes

193

of these cubes, the coefficients being complex numbers with rational real and imaginary parts. Then D is a countable subset of Lp (Rn ). To see that D is dense, use the method of successively approximating more and more general functions: First, consider characteristic functions of open sets (every open set is the countable union of nonoverlapping dyadic cubes by Theorem 1.11), of Gδ sets, and of measurable sets with finite measure; then consider simple functions whose supports have finite measure, nonnegative functions in Lp (Rn ), and, finally, arbitrary functions in Lp (Rn ). The details are left to the reader (cf. Lemma 7.3). This proves the case E = Rn . For an arbitrary measurable E, let D denote the restrictions to E of the functions in D. Then D is dense in Lp (E), 1 ≤ p < ∞. In fact, given p and f ∈ Lp (E), let f1 = f on E and f1 = 0 off E. Then f1 ∈ Lp (Rn ), so that given ε > 0,

1/p

1/p < ε. Therefore, E | f − g|p < ε. there exists g ∈ D with Rn | f1 − g|p This shows that D is dense in Lp (E) and completes the proof. As we have already noted, Minkowski’s inequality fails when 0 < p < 1. Therefore, || · ||p,E is not a norm for such p. However, we still have the following facts. Theorem 8.16 If 0 < p < 1, Lp (E) is a complete, separable metric space, with distance defined by p

d(f , g) = || f − g||p,E . Proof. With d(f , g) so defined, properties (M1 ) and (M2 ) of a metric space are clear. To verify (M3 ), which is the triangle inequality, we first claim that (a + b)p ≤ ap + bp if a, b ≥ 0, 0 < p ≤ 1. If both a and b are zero, this is obvious. If, say, a = 0, then dividing by ap , we reduce the inequality to (1 + t)p ≤ 1 + tp , t > 0 (t = b/a). This is clear since both sides are equal when t = 0 and the derivative of the right side majorizes that of the left for t > 0. It follows that | f (x) − g(x)|p ≤ | f (x) − h(x)|p + |h(x) − g(x)|p if 0 < p ≤ 1. p p p Integrating, we obtain || f − g||p ≤ || f − h||p + ||h − g||p , which is just the triangle inequality. The proofs that Lp is complete and separable with respect to || · ||p are the same as in Theorems 8.14 and 8.15. It is worth noting that in case 0 < p ≤ 1, the triangle inequality is equivalent to the basic estimate

194

Measure and Integral: An Introduction to Real Analysis

p

p

p

|| f + g||p ≤ || f ||p + ||g||p

(0 < p ≤ 1).

(8.17)

See also Exercise 27(b). The analogous results for series are listed in the next theorem.

Theorem 8.18 (i) If 1 ≤ p ≤ ∞, l p is a Banach space with a = a p . For 1 ≤ p < ∞, l p is separable; l∞ is not separable. (ii) If 0 < p < 1, l p is a complete, separable metric space, with distance d(a, b) = p a − b p . Proof. We will show that l p is complete and separable when 1 ≤ p < ∞ and that l∞ is not separable. The rest of the proof of (i) and the proof of (ii) are left to the reader.

Suppose that 1 ≤ p < ∞, a(i) = ak(i) ∈ l p for i = 1, 2, . . ., and a(i) − a(j) p →

(j) (i) 0 as i, j → ∞. Since a(i) − a(j) p ≥ ak − ak for every k, it follows that (j) (i) (i) ak − ak → 0 for every k as i, j → ∞. Let ak = limi→∞ ak and a = {ak }. We

will show that a ∈ l p and a(i) − a → 0. Given ε > 0, there exists N such that p

1/p (i) (j) p = a(i) − a(j) p < ε ak − ak

if i, j > N.

k

Restricting the summation to k ≤ M and letting j → ∞, we obtain M p 1/p (i) ≤ε ak − ak

for any M, if i > N.

k=1

Letting M → ∞, we get a(i) − a p ≤ ε if i > N; that is, a(i) − a p → 0. The

fact that a p ≤ a − a(i) p + a(i) p shows that a ∈ l p . Hence, l p is complete. To prove that l p is separable when p < ∞, let D be the set of all sequences {dk } such that (a) the real and imaginary parts of dk are rational, and (b) dk = 0 for k ≥ N (N may vary from sequence to sequence). Then D is a countable subset of l p . If a = {ak } ∈ l p and ε > 0, choose N so that ∞ p k=N+1 |ak | < ε/2. Choose d1 , . . . , dN with rational real and imaginary parts N such that k=1 |ak − dk |p < ε/2. Then d = {d1 , . . . , dN , 0, . . .} belongs to D and p a − d p < ε. It follows that D is dense in l p and therefore that l p is separable. To see that l∞ is not separable, consider the subclass of sequences a = {ak } for which each ak is 0 or 1. The number of such sequences is uncountable,

Lp Classes

195

and a − a ∞ = 1 for any two different such sequences. Hence, l∞ cannot be separable. We know from Lusin’s theorem that measurable functions have continuity properties. The next theorem gives a useful continuity property of functions in Lp . Theorem 8.19 (Continuity in Lp ) If f ∈ Lp (Rn ), 1 ≤ p < ∞, then lim f (x + h) − f (x) p = 0.

|h|→0

Proof. Let Cp denote the class of f ∈ Lp such that f (x + h) − f (x) p → 0 as |h| → 0. We claim that (a)

a finite

linear combination of functions in Cp is in

Cp , and (b) if fk ∈ Cp and fk − f p → 0, then f ∈ Cp . Both of these facts follow easily from Minkowski’s inequality; for (b), for example, note that f (x + h) − f (x) p

≤ f (x + h) − fk (x + h) p + fk (x + h) − fk (x) p + fk (x) − f (x) p

= fk (x + h) − fk (x) p + 2 fk (x) − f (x) p . Since fk ∈ Cp , we have lim sup|h|→0 f (x + h) − f (x) p ≤ 2 fk (x) − f (x) p , and (b) follows by letting k → ∞. Clearly, the characteristic function of a cube belongs to Cp . Hence, in view of the fact that linear combinations of characteristic functions of cubes are dense in Lp (Rn ) (see the proof of Theorem 8.15), it follows from (a) and (b) that Lp (Rn ) is contained in Cp , and the proof is complete. We remark without proof that Theorem 8.19 is also true for 0 < p < 1. (Use p the same ideas for · p .) It fails, however, for p = ∞, as shown by the function χ = χ(0,∞) (x) on (−∞, +∞). In fact, χ ∈ L∞ (−∞, +∞) but χ (x + h) − χ(x) ∞ = 1 for all h = 0.

8.5 The Space L2 and Orthogonality For complex-valued measurable f , f = f1 + if2 with f1 and f2 real-valued and f = measurable, we have E f1 + i E f2 (see p. 96 in Section 5.3). We will E use the fact that E f ≤ E | f | (see Exercise 1).

196

Measure and Integral: An Introduction to Real Analysis

Among the Lp spaces, L2 has the special property that the product of any two of its elements is integrable (Schwarz’s inequality). This simple fact leads to some important extra structure in L2 , which we will now discuss. E is a fixed subset of Rn of positive measure, Consider L2 = L2 (E), where and write f = f 2,E , E f = f , etc. For f , g ∈ L2 , define the inner product of f and g by f , g =

f g,

(8.20)

where g denotes the complex conjugate of g. Note that by Schwarz’s inequality, |f , g| ≤ f g . Moreover, the inner product has the following properties: (a) g, f = f , g, (b) f1 + f2 , g = f1 , g + f2 , g , f , g1 + g2 = f , g1 + f , g2 , (c) αf , g = αf , g, f , αg = αf , g, α ∈ C, (d) f , f 1/2 = f . If f , g = 0, then f and g are said to be orthogonal. A set {φα }α∈A is orthogonal if any two of its elements are orthogonal; {φα } is orthonormal if it is orthogonal and φα = 1 for all α. Note that if {φα } is orthogonal and φα = 0 for every α, then {φα / φα } is orthonormal. Henceforth, we will assume that φα = 0 for all α for an orthogonal system {φα }. This implies that no element is zero. Furthermore, since for α = β,

2

φα − φβ 2 = φα − φβ φα − φβ = φα 2 + φβ = 0, it implies that no two elements are equal. A simple example of an infinite orthogonal system in L2 (E) is χEj where Ej is an infinite collection of disjoint measurable subsets of E with 0 < Ej < ∞ (cf. Exercise 33 of Chapter 3. See also Exercise 24 of this chapter).

Theorem 8.21

Any orthogonal system {φα } in L2 is countable.

Proof. We may assume that {φα } is orthonormal. Then for α = β, as above,

2

φα − φβ 2 = φα − φβ φα − φβ = φα 2 + φβ = 2,

Lp Classes

197

√ so that φα − φβ = 2. Since L2 is separable, it follows that {φα } must be countable. A collection ψ1 , . . . , ψN is said to be linearly independent if N k=1 ak ψk (x) = 0 (a.e.) implies that every ak is zero. Any collection of functions is called linearly independent if each finite subcollection is linearly independent. No function in a linearly independent set can be zero a.e.

Theorem 8.22

If {ψk } is orthogonal, it is linearly independent.

Proof. Suppose that a1 ψk1 + · · · + aN ψkN = 0. Multiplying both sides by ψk1 and integrating, we obtain by orthogonality that a1 = 0. Similarly, a2 = · · · = aN = 0. The converse of Theorem 8.22 is not true. However, the next result shows that if {ψk } is linearly independent, then the system formed from suitable linear combinations of its elements is orthogonal. Theorem 8.23 (Gram–Schmidt Process) If {ψk } is linearly independent, then the system {φk } defined by φ1 = ψ1 φ2 = a21 ψ1 + ψ2 .. .

.. .

φk = ak1 ψ1 + · · · + ak,k−1 ψk−1 + ψk .. .

.. .

is orthogonal for proper selection of the aij . Proof. Having φ1 = ψ1 , we proceed by induction, assuming that φ1 , . . . , φk−1 have been chosen as required. We will determine constants bk1 , . . . , bk,k−1 so that the function φk defined by φk = bk1 φ1 + · · · + bk,k−1 φk−1 + ψk is orthogonal to φ1 , . . . , φk−1 . If j < k, φk , φj = bkj φj , φj + ψk , φj

198

Measure and Integral: An Introduction to Real Analysis

by orthogonality. Since φj , φj = 0, bkj can be chosen so that φk , φj = 0, j < k. Since each φj with j < k is a linear combination of ψ1 , . . . , ψj , the theorem follows. When the φk are selected by the Gram–Schmidt process, we shall say that they are generated from the ψk . Note that the triangular character of the matrix in Theorem 8.23 means that each ψk can also be written as a linear combination of the φj , j ≤ k. that is We call an orthogonal system {φk } complete if the only function orthogonal to every φk is zero; that is, {φk } is complete if f , φk = 0 for all k implies that f = 0 a.e. Thus, a complete orthogonal system is one that is maximal in the sense that it is not properly contained in any larger orthogonal system. The span of a set of functions {ψk } is the collection of all finite linear combinations of the ψk . In speaking of the span of {ψk }, we may always assume that {ψk } is orthogonal by discarding any dependent functions and applying the Gram–Schmidt process to the resulting linearly independent set. A set {ψk } is called a basis for L2 if its span is dense in L2 ; that is, {ψk } is a basis if given f ∈ L2 and ε > 0, there exist N and {ak }N k=1 such that

N

f − k=1 ak ψk < ε. The ak can always be chosen with rational real and imaginary parts. Any countable dense set in L2 is of course a basis. It follows that L2 has an infinite orthogonal basis. Theorem 8.24 Any orthogonal basis in L2 is complete. In particular, there exists a complete orthonormal basis for L2 . Proof. Let {ψk } be an orthogonal basis for L 2 . We may assume that {ψk } is orthonormal. To show that it is complete, let f , ψk = 0 for all k. Then f , f = N f , f − k=1 ak ψk for any finite sum N k=1 ak ψk . By Schwarz’s inequality,

N

|f , f | ≤ f f − k=1 ak ψk , and so, since the term on the right can be chosen arbitrarily small, f , f = 0. Therefore, f = 0 a.e., which completes the proof. Let us show that every complete orthogonal system in L2 (E) is infinite if (as always) |E| > 0. If not, there is a set E with |E| > 0 and a complete orthog2 onal system {φk }N k=1 in L with N finite. Assuming as we may that the system is orthonormal, its completeness implies that for every f ∈ L2 (E), we have f = N k=1 f , φk φk a.e. in E, and consequently, f , g =

N f , φk g, φk if f , g ∈ L2 (E). k=1

Lp Classes

199

(In the next section, similar facts are derived in the limit case N = ∞.) Therefore, if f and g are any two orthogonal functions in L2 (E), then the sequences N N f , φk k=1 and g, φk k=1 , considered as vectors, are orthogonal in the vecon p. 198 in tor space CN of N-tuples of complex numbers. We observed Section 8.5 that there is an infinite orthogonal system ψj in L2 (E). Hence, N there is an infinite collection ψj , φk k=1 of orthogonal vectors in the finite j

dimensional space CN , which is impossible. From now on, we will consider only orthogonal systems in L2 that are (countably) infinite.

8.6 Fourier Series and Parseval’s Formula Let {φk } be any (infinite) orthonormal system in L2 . If f ∈ L2 , the numbers defined by ck = ck (f ) = f , φk = f φk E

are called the Fourier coefficients of f with respect to {φk }. The series k ck φk is called the Fourier series of f with respect to {φk }, and denoted S[f ] = k ck φk . We also write f ∼ c k φk . k

The first question we ask is how well S[f ], or more precisely, the sequence of its partial sums, approximates f . Fix N and let L = N k=1 γk φk be a linear combination of φ1 , . . . , φN . We wish to know what choice of γ1 , . . . , γN makes f − L a minimum. Note that since {φk } is orthonormal, L 2 = L, L = N 2 k=1 |γk | . Hence, 2

f − L =

f−

N k=1

= f 2 −

γ k φk

f−

N

γ k φk

k=1

N N (γk ck + γk ck ) + |γk |2 , k=1

k=1

where the ck are the Fourier coefficients of f . Since |ck − γk |2 = (ck − γk ) (ck − γk ) = |ck |2 − (γk ck + γk ck ) + |γk |2 ,

200

Measure and Integral: An Introduction to Real Analysis

we obtain f − L 2 = f 2 +

N

|ck − γk |2 −

k=1

N

|ck |2 .

k=1

Therefore, min f − L 2 = f 2 −

N

γ1 ,...,γN

|ck |2 ;

(8.25)

k=1

that is, the minimum is achieved when γk = ck for k = 1, . . . , N, or equivalently, when L is the Nth partial sum of S[f ]. Writing sN = sN (f ) = N k=1 ck φk , we have from (8.25) that N

f − sN 2 = f 2 − |ck |2 .

(8.26)

k=1

Theorem 8.27

Let {φk } be an orthonormal system in L2 and let f ∈ L2 .

(i) Of all linear combinations N one that best approx1 γk φk with N fixed, the N imates f in L2 is given by the partial sum sN = 1 ck φk of the Fourier series of f . (ii) (Bessel’s inequality) The sequence {ck } of Fourier coefficients of f belongs to l2 and ∞

1/2 2

|ck |

≤ f .

k=1

2 Proof. Part (i) has been proved. Note that since f − sN ≥ 0, Bessel’s inequality follows from (8.26) by letting N → ∞, which completes the proof. If f is a function for which equality holds in Bessel’s inequality, that is, if ∞

1/2 |ck |

2

= f ,

(8.28)

k=1

then f is said to satisfy Parseval’s formula. From (8.26), we immediately obtain the next result.

Lp Classes

201

Theorem 8.29 Parseval’s formula holds for f if and only if sN − f → 0, that is, if and only if S[f ] converges to f in L2 norm. The following theorem is of great importance. Theorem 8.30 (Riesz–Fischer Theorem) Let {φk } be any orthonormalsystem and let {ck } be any sequence in l2 . Then there is an f ∈ L2 such that S[f ] = ck φk , that is, such that {ck } is the sequence of Fourier coefficients of f with respect to {φk }. Moreover, f can be chosen to satisfy Parseval’s formula. Proof. Let tN =

N

k=1 ck φk .

Then if M < N,

2

N

N

tN − tM 2 = |ck |2 . c φ = k k

M+1

M+1 The fact that {ck } ∈ l2 implies that {tN } is a Cauchy sequence in L2 . Since L2 is

2

complete, there is an f ∈ L such that f − tN → 0. If N ≥ k,

f φk =

f − tN φk + t N φk = f − tN φk + c k .

Since the integral on the right is bounded in absolute value by f − tN

φk = f − tN , we obtain by letting N → ∞ that f φk = ck . Thus, S[f ] = k ck φk , so that tN = sN (f ), and it follows from Theorem 8.29 that Parseval’s formula holds for f . This completes the proof. There is no guarantee that the Fourier coefficients of a function uniquely determine the function. However, if {φk } is complete, we can show that the correspondence between a function and its Fourier coefficients is unique; that is, if f and g have the same Fourier coefficients with respect to a complete system, then f = g a.e. This is simple, since the vanishing of all the Fourier coefficients of f − g implies that f − g = 0 a.e. An important related fact is the following. Theorem 8.31 Let {φk } be an orthonormal system. Then {φk } is complete if and only if Parseval’s formula holds for every f ∈ L2 . Proof. Suppose that {φk } is complete. If f ∈ L2 , Bessel’s inequality implies that its Fourier coefficients {ck } belong to l2 . Hence, by the Riesz–Fischer theorem,

202

Measure and Integral: An Introduction to Real Analysis

1/2 |ck |2 there exists a g in L2 with S[g] = ck φk and g = . Since f and g have the same Fourier coefficients and {φk } is complete, we see that f = g a.e. 1/2 |ck |2 Hence, f = g = , which is Parseval’s formula. Conversely, suppose that Parseval’s formula holds with respect to {φk } for 1/2 f , φk 2 every f ∈ L2 . If f , φk = 0 for all k, then f = = 0. Therefore, f = 0 a.e., so that {φk } is complete, which proves the result. Suppose that {φk } is orthonormal and complete and that f , g ∈ L2 . Let ck = f , φk , dk = g, φk , c = {ck } , d = {dk }, and (c, d) = ck dk . We claim that f , g = (c, d).

(8.32)

To prove this, observe that by Parseval’s formula, f + g, f + g = (c + d, c + d), or f , f + g, g + 2 Re f , g = (c, c) + (d, d) + 2 Re (c, d), where Re z denotes the real part of z. Cancelling equal terms gives Re f , g = Re (c, d). Applying this to the function if (x), we obtain Re if , g = Re (ic, d). But Re if , g = Re [if , g] = −Im f , g. Similarly, Re (ic, d) = −Im (c, d). Therefore, Im f , g = Im (c, d), and (8.32) is proved. Another corollary of Theorem 8.31 is given in the next result. First, we make several definitions. Let X1 and X2 be metric spaces with metrics d1 and d2 , respectively. Then X1 and X2 are said to be isometric if there is a mapping T of X1 onto X2 such that d1 (f , g) = d2 (Tf , Tg) for all f , g ∈ X1 . Such a T is called an isometry. Thus, an isometry is a mapping that preserves distances. An isometry is automatically one-to-one, and two isometric metric spaces may be regarded as the same space with a relabeling of the points. For example, two L2 spaces, L2 (E) and L2 (E ), are isometric if there is a mapping T of L2 (E) onto L2 (E ) such that f − g 2,E = Tf − Tg 2,E

for all f , g ∈ L2 (E). The isometries we shall encounter will be linear, that is, will satisfy T(αf + βg) = αTf + βTg

for all scalars α, β.

If T is a linear map of L2 (E) onto L2 (E ), then since Tf − Tg = T(f − g), it follows that T is an isometry if and only if f 2,E = Tf 2,E

Lp Classes

203

for all f ∈ L2 (E). Similarly, a linear map T of L2 (E) onto l2 is an isometry if and only if f 2,E = Tf l2 for all f ∈ L2 (E). Theorem 8.33 another.

All spaces L2 (E) are linearly isometric with l2 , and so with one

Proof. For a given E, define a linear correspondence between L2 (E) and l2 2 by choosing a complete orthonormal system {φk } in L (E) and mapping an 2 f ∈ L (E) onto the sequence f , φk of its Fourier coefficients. This mapping is onto all of l2 by the Riesz–Fischer theorem and is an isometry by Theorem 8.31.

8.7 Hilbert Spaces A set H is called a Hilbert space over the complex numbers C if it satisfies the following three conditions: (H1 ) H is a vector space over C; that is, if f , g ∈ H and α ∈ C, then f + g ∈ H and αf ∈ H. The zero element of H will be denoted by 0. (H2 ) For every pair f , g ∈ H, there is a complex number (f , g), called the inner product of f and g, which satisfies (a) (g, f ) = (f , g),

(b) f1 + f2 , g = f1 , g + f2 , g , (c) (αf , g) = α(f , g) for α ∈ C, (d) (f , f ) ≥ 0, and (f , f ) = 0 if and only if f = 0.

Notice that (a), (b), and (c) imply that f , g1 + g2 = f1 g1 + f , g2 , (f , αg) = α (f , g), and (0, f ) = 0. Define f = (f , f )1/2 . Before stating the third condition, we claim that |(f , g)| ≤ f g

(Schwarz s inequality).

(8.34)

If g = 0, this is obvious. Otherwise, letting λ = −(f , g)/ g 2 , we obtain 0 ≤ (f + λg, f + λg) = f 2 − 2

|(f , g)|2 |(f , g)|2 |(f , g)|2 + = f 2 − , 2 2 g g g 2

204

Measure and Integral: An Introduction to Real Analysis

and Schwarz’s inequality follows at once. This simple proof also shows that if equality holds in (8.34) and ||g|| = 0, then f is a constant multiple of g, namely, f = −λg =

(f , g) g. (g, g)

We will show that · is a norm on H by proving the triangle inequality. In fact, f + g 2 = (f + g, f + g) = f 2 + 2 Re (f , g) + g 2 . Since |Re (f , g)| ≤ |(f , g)| ≤ f g , it follows that the right side is at most ( f + g )2 . Taking square roots, we obtain f + g ≤ f + g , as desired. Hence, H is a normed linear space. We also require (H3 ) H is complete with respect to · . In particular, a Hilbert space is a Banach space. As for L2 spaces, a linear map T of a Hilbert space H onto a Hilbert space

H is an isometry if and only if f H = Tf H for all f ∈ H. A Hilbert space is called infinite dimensional if it cannot be spanned by a finite number of elements; hence, an infinite dimensional Hilbert space has an infinite linearly independent subset. The space L2 with inner product (f , g) = f g and the space l2 with (c, d) = k ck dk are examples of separable infinite dimensional Hilbert spaces. In fact, there are essentially no other examples, as the following theorem shows.

Theorem 8.35 All separable infinite dimensional Hilbert spaces are linearly isometric with l2 and so with one another. Proof. The proof is a repetition of the ideas leading to Theorem 8.33, so we shall brief. Let H be a separable infinite dimensional Hilbert space, and be let e k be a countable dense subset. Discarding those e k that are spanned by other e i , we obtain a linearly independent set {ek } with the same dense span as ek . Since H is infinite dimensional, {ek } is infinite. Using the Gram–Schmidt process, we may assume that {ek } is orthonormal: (ei , ek ) = 0 for i = k and ek = 1 for all k. It follows that {ek } is complete; in fact, if f , ek = 0 for all k, then

2 N N

ak ek = f 2 + |ak |2 ≥ f 2

f −

k=1

k=1

Lp Classes

205

for N = 1, 2, . . . . If f were not zero, the span of the ek could not be dense. Hence, f = 0, which shows that H has a complete orthonormal system {ek }. Next, we will show that Bessel’s inequality and an analogue of the

Riesz–Fischer theorem hold for {ek }. Let f ∈ H and ck = f , ek . Then

2

N N

|ck |2 . 0 ≤ f − ck ek = f 2 −

k=1

k=1

Letting N → ∞, we obtain Bessel’s inequality

|ck |2

1/2

≤ f . In particu-

lar, {ck } belongs to To derive the Riesz–Fischer theorem, let {γk } be a sequence in l2 and set tN = N k=1 γk ek . Then l2 .

tM − tN 2 =

N

|γk |2 → 0

as M, N → ∞, M < N.

k=M+1

Since H is complete, there is a g ∈ H such that g − tN → 0. We have

g, ek = g − tN , ek + (tN , ek ) = g − tN , ek + γk

(k ≤ N).

Letting N → ∞, it follows from Schwarz’s inequality that g, ek = γk . Hence,

2

= g 2 − N |γk |2 . Letting N → ∞ in the tN = N k=1 g, ek ek and g − tN k=1 1/2 |γk |2 . last equation, we see that g satisfies Parseval’s formula g = This gives the analogue of the Riesz–Fischer theorem.

Now, let f ∈ H and set ck = f , ek . Choose {γk } = {ck } in the version

of the Riesz–Fischer theorem just derived, and let g ∈ H satisfy g, ek = ck 1/2 |ck |2 and ||g|| = . We see by the completeness of {ek } that g = f , so 1/2 |ck |2 that Parseval’s formula holds: f = . The fact that H is linearly isometric with l2 now follows as in the proof of Theorem 8.33.

Exercises 1. For complex-valued measurable f ,f = f1 + if2 with f1 and f2 real-valued f = by definition. Prove that and measurable, we have E f1 + i E f2 E f ≤ f is finite if and only if | f | is finite, and | f |. (Note that E E E E

206

Measure and Integral: An Introduction to Real Analysis

1/2 2 1/2 2 f = , and use the fact that a2 + b2 = E E f1 + E f2 2

1/2 a cos α+b sin α for an appropriate α, while a + b2 ≥ |a cos α+b sin α| for all α.)

2. Prove the converse of Hölder’s inequality for p = 1 and ∞. Show also that for 1 ≤ p ≤ ∞, a real-valued measurable f belongs to Lp (E) if fg ∈ L1 (E)

for every g ∈ Lp (E), 1/p + 1/p = 1. The negation is also of interest: if

f ∈ / Lp (E), then there exists g ∈ Lp (E) such that fg ∈ / L1 (E). (To verify the negation, construct g of the form ak gk for appropriate ak and gk , with gk satisfying E fgk → +∞.) 3. Prove Theorems 8.12 and 8.13. Show that Minkowski’s inequality for series fails when p < 1. 4. Let f and g be real-valued and not identically 0 (i.e., neither function equals 0 a.e.), and let 1 < p < ∞. Prove that equality holds in the inequality fg ≤ f p g p if and only if fg has constant sign a.e. and | f |p is a

multiple of |g|p a.e. If f + g p = f p + g p and g = 0 in Minkowski’s inequality, show that f is a multiple of g. Find analogues of these results for the spaces l p . 5. For 0 < p ≤ ∞ and 0 < |E| < +∞, define Np [f ] =

1 | f |p |E|

1/p ,

E

where N∞ [f ] means f ∞ . Prove that if p1 < p2 , then Np1 [f ] ≤ Np2 [f ]. Prove also that if 1 ≤ p ≤ ∞, then Np [f + g] ≤ Np [f ] + Np [g], (1/|E|) | fg| ≤ Np [f ]Np [g], 1/p + 1/p = 1, and that limp→∞ Np [f ] = f ∞ . E Thus, Np behaves like · p but has the advantage of being monotone in p. Recall Exercise 28 of Chapter 5. 6. (a) Let 1 ≤ pi , r ≤ ∞ and ki=1 p1 = 1r . Prove the following generalizai tion of Hölder’s inequality:

f 1 · · · f k ≤ f 1 · · · f k . r p p 1

k

(See also Exercise 12 of Chapter 7.) (b) Let 1 ≤ p < r < q ≤ ∞ and define θ ∈ (0, 1) by the interpolation estimate

1 r

=

1−θ f r ≤ f θ . p f q

In particular, if A = max{ f p , f q }, then f r ≤ A.

θ p

+

1−θ q .

Prove

Lp Classes

207

7. Show that when 0 < p < 1, the neighborhoods { f : f p < ε} of zero in Lp (0, 1) are not convex. (Let f = χ(0,εp ) , and g = χ(εp ,2εp ) . Show that f p = g p = ε, but that 12 f + 12 g p > ε.) 8. Prove the following integral version of Minkowski’s inequality for 1 ≤ p < ∞ and a measurable function f (x, y):

p 1/p 1/p |f (x, y)| dx dy ≤ | f (x, y)|p dy dx.

(For 1 < p < ∞, note that the pth power of the left-hand side equals f (z, y) dz p−1 | f (x, y)| dx dy. Integrate first with respect to y and apply Hölder’s inequality.) 9. If f is real-valued and measurable on E, |E| > 0, define its essential infimum on E by ess inf f = sup{α : |{x ∈ E : f (x) < α}| = 0}. E

If f ≥ 0, show that essE inf f = (essE sup 1/f )−1 . 10. Prove that L∞ (E) is not separable for any E with |E| > 0. (Construct a sequence of decreasing subsets of E whose measures strictly decrease. Consider the characteristic functions of the class of sets obtained by taking all possible unions of the differences of these subsets.)

11. If fk → f in Lp , 1 ≤ p < ∞, gk → g pointwise, and gk ∞ ≤ M for all k, prove that fk gk → fg in Lp .

12. Let f , fk ∈ Lp , 0 < p ≤ ∞. Show that if f − fk p → 0, then

fk → f p . Conversely, if fk → f a.e. and fk → f p , 0 < p < ∞, p p

show that f − fk p → 0. Show that the converse may fail for p = ∞. p p (For the converse when 0 < p < ∞, note that f − fk ≤ c(|f |p + fk ) with c = max 2p−1 , 1 ; then apply, for example, the sequential version of Lebesgue’s dominated convergence theorem given in Exercise 23 of Chapter 5.)

13. Suppose that fk → f a.e. and that fk , f ∈ Lp , 1 < p ≤ ∞. If fk p ≤ M <

+∞, show that fk g → fg for all g ∈ Lp , 1/p + 1/p = 1. Show that the result is false if p = 1. (When p > 1, use Egorov’s theorem in case the domain of integration has finite measure.) 14. Verify that the following systems are orthogonal: (a) 12 , cos x, sin x, . . . , cos kx, sin kx, . . . on any interval of length 2π. (b) e2πikx/(b−a) ; k = 0, ±1, ±2, . . . on (a, b).

208

Measure and Integral: An Introduction to Real Analysis

15. If f ∈ L2 (0, 2π), show that 2π

lim

k→∞

f (x) cos kx dx = lim

k→∞

0

2π

f (x) sin kx dx = 0.

0

Prove that the same is true if f ∈ L1 (0, 2π). (This last statement is the Riemann–Lebesgue lemma. To prove it, approximate f in L1 norm by L2 functions. See Theorem 12.21.) 16. A sequence fk in Lp is said to converge weakly in Lp to a function f

(belonging to Lp ) if fk g → fg for all g ∈ Lp . Prove that if fk → f in Lp norm, 1 ≤ p ≤ ∞, then fk converges weakly in Lp to f . Note by Exercise 15 that the converse is not true. See Exercise 28 of Chapter 10. 17. Suppose that fk , f ∈ L2 and that f k g → fg for all g ∈ L2 (i.e., fk converges weakly in L2 to f ). If fk 2 → f 2 , show that fk → f in L2 norm. The same is true for Lp , 1 < p < ∞, by a 1913 result of Radon. 18. Prove the parallelogram law for L2 : f + g 2 + f − g 2 = 2 f 2 + 2 g 2 . Is this true for Lp when p = 2? The geometric interpretation is that the sum of the squares of the lengths of the diagonals of a parallelogram equals the sum of the squares of the edge lengths. 19. Prove that a finite dimensional Hilbert space is isometric with Rn for some n. 20. Construct a function in L1 (−∞, +∞) that is not in L2 (a, b) for any a < b. +∞ (Let g(x) = x−1/2 on (0,1) and g(x) = 0 elsewhere, so that −∞ g = 2. Consider the function f (x) = ak g (x − rk ), where {rk } is the rational numbers and {ak } satisfies ak > 0, ak < +∞.) 21. If f ∈ Lp (Rn ) , 0 < p < ∞, show that 1 | f (y) − f (x)|p dy = 0 a.e. Qx |Q| lim

Q

Note by Exercise 5 that if this condition holds for a given p, then it also holds for all smaller p. let m = {mk } be 22. Let {φk } be a complete orthonormal system in L2 and 2, f ∼ a fixed bounded sequence of numbers. If f ∈ L ck φk , define Tf by Tf ∼ mk ck φk . Such an operator is called a Fourier multiplier operator. Show that T is bounded on L2 , that is, that there is a constant c

Lp Classes

209

independent of f such that ||Tf ||2 ≤ c||f ||2 for all f ∈ L2 . Show also that the smallest possible choice for c is ||m||l∞ . 23. Show that every of a separable metric space (M, d) is separa subset ble. (Let D = fk be a countable dense set in M, and for j = 1, 2, . . ., define Dj = f ∈ D : inf λ∈ d(λ, f ) < 1/j . If fk ∈ Dj , pick λk,j ∈ with

d λk,j , fk < 1/j and show that λk,j is dense in .) 24. Let E be a measurable set in Rn with 0 < |E| < ∞. Construct an orthog ∞ onal system φj j=0 in L2 (E) with φ0 = 1 everywhere in E. (Use Exercise 32 of Chapter 3 with θ = 1/2, and choose φj for j ≥ 1 to be appropriate simple functions with values ±1.) 25. If f is a measurable function on Rn , define f = supα>0 α|{|f | > α}|, and recall that f belongs to weak L1 (Rn ) if and only if f < ∞. Show that weak L1 (Rn ) has all the properties of a Banach space with respect to · except the triangle inequality. Show however that there is a constant κ > 1 such that the quasi-triangle inequality f +g ≤ κ(f +g) holds for all measurable f , g. (To show that κ cannot be 1, consider the case of one dimension and the functions f = χ[0,1/2] + 2χ(1/2,1] , g = 2χ[0,1/2] + χ(1/2,1] .) 26. Show that lim inf p→∞ ||f ||Lp (E) ≥ ||f ||L∞ (E) even if |E| = ∞. 27. (a) Prove Minkowski’s inequality for infinite series:

∞ ∞

|fk | ≤ ||fk ||p ,

k=1

p

1 ≤ p ≤ ∞.

k=1

(b) Show that in part (a), the opposite inequality holds if 0 < p ≤ 1: ∞ k=1

∞

||fk ||p ≤ |fk | ,

k=1

0 < p ≤ 1.

p

fk is positive and finite a.e., multiply and (For (b), assuming that p divide each fk in the summation on the left side of the inequality by p(1−p) fk and apply Hölder’s inequality with exponents q = 1/p and q = 1/(1 − p).) 28. Let φ(t) be a continuous function on [0, ∞) that is positive, increasing, and convex on (0, ∞) and that satisfies φ(0) = 0, limt→∞ φ(t) = ∞, and |φ(2t)| ≤ c|φ(t)| for some constant c independent of t. For exam

ple, the function φ(t) = t 1 + log+ t has these properties (see Exercise 25 of Chapter 7). If E is a measurable set in Rn , define the Orlicz space

210

Measure and Integral: An Introduction to Real Analysis

Lφ (E) to be the collection of all measurable f on E that are finite a.e. in E and satisfy φ(|f |) ∈ L(E). Show that Lφ (E) is a Banach space with norm ||f ||Lφ (E) = inf λ > 0 :

E

|f | φ λ

dx ≤ 1 .

In case φ(t) = t 1 + log+ t , the class is often denoted L log L(E) (see Exercise 22 of Chapter 9 for a result about functions in L log L (Rn )). 29. Let 1 ≤ p < ∞ and f ∈ Lp (Rn ). Show that ||f (x + h) − f (x)||p (where the norm is taken with respect to x) is a uniformly continuous function of h. Is the same true when 0 < p < 1? 30. Let 1 ≤ p < ∞ and E be a measurable set in Rn . (a) Prove that if f1 , f2 , g1 , g2 are nonnegative and measurable on E, then

f1 + f2

p

1/p

p dx + g1 + g2

E

p p f1 + g1 dx ≤

1/p

+

E

p p f2 + g2 dx

1/p

.

E

(b) If the right side in part (a) is replaced by

E

p f1

p + f2

1/p

dx

p p g1 + g2 dx +

1/p

,

E

is the resulting inequality true? ∞ (c) If fi,j i,j=1 are measurable functions on E, show that ⎡ ⎡ ⎛ ⎞ ⎤1/p p ⎤1/p p fi,j (x) dx⎦ ≤ fi,j (x) ⎠ dx⎦ . ⎣ ⎣ ⎝ E

j

i

i

E

j

For i = 1, . . . , N (N finite), consider the sequences Fi = fi,j j and p 1/p N . note that || N i=1 Fi ||l p = i=1 |fi,j | j 31. Let a = {ak } be a sequence of real or complex numbers. Show that ||a||p ≤ ||a||1 if 1 ≤ p ≤ ∞ and more generally that ||a||p ≤ ||a||q if 0 < q ≤ p ≤ ∞. (If 1 ≤ p < ∞, the inequality |a1 |p + |a2 |p ≤ (|a1 | + |a2 |)p may be used together with an induction argument.)

Lp Classes

211

32. For nonnegative measurable functions f and g on (0, ∞), let ∞ x dy g(y) , F(x) = f y y

x ∈ (0, ∞).

0

Also, if 1 ≤ p < ∞, set ⎛

⎞1/p dx [f ]p = ⎝ f (x)p ⎠ , x ∞ 0

and if p = ∞, define [f ]∞ = ess sup f . Prove that for 1 ≤ p ≤ ∞, [F]p ≤ [f ]1 [g]p .

(0,∞)

9 Approximations of the Identity and Maximal Functions

9.1 Convolutions The convolution of two functions f and g that are measurable in Rn is defined by (f ∗ g)(x) = f (t)g(x − t) dt, x ∈ Rn , Rn

provided the integral exists. In Theorem 6.14, we saw that if f , g ∈ L1 (Rn ), then f ∗ g exists a.e. and is measurable in Rn , and f ∗ g1 ≤ f 1 g1 . Moreover, according to Corollary 6.16, f ∗ g1 = f 1 g1 if f and g are nonnegative and measurable. In this section, we will study some additional properties of convolutions, beginning with the following theorem.

Theorem 9.1 Lp (Rn ) and

Let 1 ≤ p ≤ ∞, f ∈ Lp (Rn ) and g ∈ L1 (Rn ). Then f ∗ g ∈ f ∗ gp ≤ f p g1 .

Proof. We may suppose that 1 < p ≤ ∞, since when p = 1, the result is just Theorem 6.14. Let us first prove the result in case f and g are nonnegative. Then f ∗ g is nonnegative and measurable on Rn by Corollary 6.16. If p = ∞, (f ∗ g)(x) ≤ f ∞ g(x − t) dt = f ∞ g(x − t) dt = f ∞ g1 . Rn

Rn

Therefore, f ∗ g∞ ≤ f ∞ g1 , as claimed. If 1 < p < ∞, we write 1 1 + = 1. f (t)g(x − t)1/p g(x − t)1/p dt, (f ∗ g)(x) = p p n R

213

214

Measure and Integral: An Introduction to Real Analysis

By Hölder’s inequality with exponents p and p , (f ∗ g)(x) ≤

1/p

p

f (t) g(x − t) dt

Rn

1/p g(x − t) dt

Rn 1/p

= (f p ∗ g)(x)1/p g1 . Now raise the first and last terms in this inequality to the pth power and p integrate the result. Since Rn (f p ∗ g) dx = f p g1 by Corollary 6.16, we obtain p

p

1 + (p/p )

f ∗ gp ≤ f p g1

p

p

= f p g1 .

The theorem follows for f , g ≥ 0 by taking pth roots. For general f ∈ Lp (Rn ) and g ∈ L1 (Rn ), let us first show that f ∗ g exists a.e. and is measurable. By the case already considered, we have | f | ∗ |g| ∈ Lp (Rn ). Hence, | f | ∗ |g| < ∞ a.e., so that f (x − t)g(t) ∈ L1 (dt) for a.e. x. Consequently, f ∗ g exists and is finite a.e. To show it is measurable, define fN = f χ{|x| 0, let Kε (x) = ε−n K

x ε

= ε−n K

x

1

ε

,...,

xn . ε

For example, if K(x) = χ{|x|δ |Kε | → 0 as ε → 0, for any fixed δ > 0. Proof. Part (i) follows immediately from the change of variables y = x/ε (see Exercise 20 of Chapter 5). For part (ii), fix δ > 0, and let y = x/ε. Then |x|>δ

|Kε (x)| dx = ε−n

x dx = K ε

|x|>δ

|K(y)| dy.

|y|>δ/ε

Since K ∈ L and δ/ε → +∞ as ε → 0, it follows that the last integral tends to zero as ε → 0. This completes the proof. Note that for K ≥ 0, property (i) means that the areas under the graphs of K and Kε are the same, while (ii) means that for small ε, the bulk of the area under the graph of Kε is concentrated in the region above a small neighborhood of the origin. For any K ∈ L, we can expect from (ii) that the effect of letting ε → 0 in the formula (f ∗ Kε )(x) = f (t)Kε (x − t)dt will be to emphasize the values of f (t)

218

Measure and Integral: An Introduction to Real Analysis

when t is near x. As a simple illustration, let Qε (x) denote the cube in Rn of edge length ε centered at x and consider the kernel k(t) = χQ1 (0) (t). Then kε (x − t) = ε−n k((x − t)/ε) = ε−n χQε (x) (t), and 1 1 f ∗ kε (x) = n f (t) dt = f (t) dt. ε |Qε (x)| Qε (x)

Qε (x)

In this case, the Lebesgue Differentiation Theorem 7.2 implies that (f ∗ kε ) (x) → f (x) a.e as ε → 0 if f is locally integrable. The next four theorems show that (f ∗Kε )(x) → f (x) in various senses (e.g., in norm or pointwise) as ε → 0 if K is suitably restricted. A family {Kε : ε > 0} of kernels for which f ∗ Kε → f in some sense is called an approximation of the identity. See also Exercise 23(a). In what follows, we shall use the notation fε (x) for the convolution (f ∗ Kε )(x). Theorem 9.6 Let fε = f ∗ Kε , where K ∈ L1 (Rn ) and 1 ≤ p < ∞, then

Rn

K = 1. If f ∈ Lp (Rn ),

fε − f p → 0 as ε → 0. Proof. By Lemma 9.5(i), f (x) = f (x)

Rn

Kε (t) dt =

f (x)Kε (t) dt.

Rn

Therefore, | fε (x) − f (x)| = [ f (x − t) − f (x)] Kε (t) dt n R

≤

Rn

| f (x − t) − f (x)||Kε (t)|1/p |Kε (t)|1/p dt,

219

Approximations of the Identity and Maximal Functions

where 1/p + 1/p = 1 (1/p = 0 if p = 1). Applying Hölder’s inequality with exponents p and p and then raising both sides to the pth power and integrating with respect to x, we obtain

|fε (x) − f (x)|p dx

Rn

≤

Rn

=

p

|f (x − t) − f (x)| |Kε (t)| dt

Rn

p/p K1

Rn

p/p |Kε (t)| dt

Rn

dx

p

| f (x − t) − f (x)| |Kε (t)| dt dx.

Rn

Changing the order of integration in the last expression (which is justified since the integrand is nonnegative), we obtain p/p

p

fε − f p ≤ K1

|Kε (t)|φ(t) dt,

Rn

where φ(t) =

p

Rn

| f (x − t) − f (x)|p dx = f (x − t) − f (x)p . For δ > 0, write

Iε =

|Kε (t)|φ(t) dt =

|t| 0, we can choose δ so small that φ(t) < η if |t| < δ (note that φ(t) → 0 as |t| → 0 by Theorem 8.19). Then Aε,δ ≤ η

|Kε (t)| dt ≤ ηK1

|t| 0, write f = g + h where g has compact support and ||h||p < η. Choose a kernel K ∈ C∞ 0 with Rn K = 1, and

220

Measure and Integral: An Introduction to Real Analysis

let gε = g ∗ Kε . Then gε ∈ C∞ 0 and, by Theorem 9.6, g − gε p → 0. By Minkowski’s inequality, f − gε p ≤ g − gε p + hp < g − gε p + η. Choosing ε so that g − gε p < η, we obtain f − gε p < 2η, and the corollary follows. The next result is a substitute for Theorem 9.6 in case f ∈ L∞ . Theorem 9.8 Let fε = f ∗ Kε , where K ∈ L1 (Rn ) and Rn K = 1. If f ∈ L∞ (Rn ), then fε → f as ε → 0 at every point of continuity of f , and the convergence is uniform on any set where f is uniformly continuous. Proof. Note that for every ε > 0, fε (x) converges absolutely for all x since f ∈ L∞ and K ∈ L1 . As before, | fε (x) − f (x)| ≤

|f (x − t) − f (x)||Kε (t)| dt.

Rn

If f is continuous at x, then given η > 0, there exists δ > 0 such that | f (x − t) − f (x)| < η if |t| < δ. Hence, Rn

| f (x − t) − f (x)||Kε (t)| dt ≤ η

|Kε (t)| dt + 2 f ∞

|t| 0, choose δ > 0 such that | f (x − t) − f (x)| < η if |t| < δ. Note that fε (x) converges absolutely for all x since f ∈ L1 and K ∈ L∞ . As usual, | fε (x) − f (x)| ≤ η

|Kε (t)| dt +

|t| 0.

(9.10)

Pε is called the Poisson kernel, and the convolution fε (x) = (f ∗ Pε )(x) =

+∞ 1 ε f (t) 2 dt π −∞ ε + (x − t)2

is called the Poisson integral of f. Setting ε = y and letting f (x, y) = fy (x), we obtain a function f (x, y) defined in the upper half plane {(x, y) : −∞ < x < +∞, y > 0}. Notice that y/(y2 + x2 ) is the imaginary part of −1/z, z = x + iy, and so is harmonic in the upper half-plane; that is, Py (x) satisfies Laplace’s equation

∂2 ∂2 Py (x) = 0 if y > 0. + ∂x2 ∂y2

We leave it as an exercise to show that if f ∈ Lp , 1 ≤ p ≤ ∞, then

∂2 ∂2 + 2 2 ∂x ∂y

+∞

∂2 ∂2 f (t) + 2 f (x, y) = 2 ∂x ∂y −∞

Py (x − t) dt,

so that (∂ 2 /∂x2 + ∂ 2 /∂y2 )f (x, y) = 0 for y > 0. Hence, f (x, y) is also harmonic in the upper half-plane. If f is integrable on (−∞, +∞), it follows from Theorem 9.9 that f (x, y) → f (x) as y → 0 wherever f is continuous. Thus, f (x, y) solves the Dirichlet problem for the upper half-plane; that is, if f (x) is continuous and integrable on (−∞, +∞), then f (x, y) defines a function that is harmonic in the upper half-plane and that tends to f (x) as y → 0. See also Exercises 15, 16. The Fejér kernel. Let 1 K(x) = π

sin x x

2

[x ∈ (−∞, +∞)].

Then K satisfies the same conditions as P(x) in the previous example, and Kε (x) = (1/π)[ε sin2 (x/ε)/x2 ]. Setting w = 1/ε, w > 0, we obtain the Fejér kernel F(x, w) =

1 sin2 wx . π wx2

(9.11)

Approximations of the Identity and Maximal Functions

223

If f εL(−∞, +∞) and if f is continuous at x, then by Theorem 9.9, +∞ 1 sin2 wt lim f (x − t) dt = f (x). w→+∞ π wt2 −∞

The Gauss–Weierstrass kernel. The function 1 2 K(x) = √ e−x π

[x ∈ (−∞, +∞)]

also satisfies all the required conditions (see Exercise 11 of Chapter 6). √ 2 2 √ Here, Kε (x) = (1/ πε)e−x /ε , and letting ε = y, y > 0, we obtain the Gauss–Weierstrass kernel 1 2 W(x, y) = √ e−x /y . πy

(9.12)

The convolution +∞ 1 2 f (x − t)e−t /y dt Wf (x, y) = f ∗ W(·, y) (x) = √ πy −∞

is called the Gauss–Weierstrass integral of f . If f is integrable on (−∞, +∞) and continuous at x, then lim Wf (x, y) = f (x).

y→0+

Notice that W(x, y) satisfies the heat equation ∂2 ∂ W=4 W ∂y ∂x2 in the upper half-plane, as does Wf (x, y) if f ∈ Lp (−∞, ∞) for some p, 1 ≤ p ≤ ∞. Higher dimensional versions of the Gauss–Weierstrass and Poisson kernels are discussed in Chapter 13. If we strengthen the condition K(x) = o(|x|−n ), |x| → + ∞, used in Theorem 9.9, we can obtain the convergence of fε to f almost everywhere. The following result is fairly typical of theorems of this kind. Its hypotheses are met by any of the three examples just listed.

224

Measure and Integral: An Introduction to Real Analysis

Theorem 9.13 Suppose that f ∈ L(Rn ), K is bounded, K(x) = O(|x|−n−λ ) as |x| → +∞ for some λ > 0, and Rn K = 1. If fε = f ∗ Kε , then fε → f at each point of the Lebesgue set of f . Proof. Let x0 be a point of the Lebesgue set of f (see (7.14)), so that ρ−n |x| 0, there is a δ > 0 such that F(ρ) < ηρn if ρ ≤ δ. Write

| f (x)|

Rn

ελ dx = + n+λ (ε + |x|) |x|≤δ

= A + B.

|x|>δ

Taking φ(ρ) = ελ /(ε + ρ)n+λ and [a, b] = [0, δ] in Lemma 9.14, we have A=

δ 0

ελ dF(ρ). (ε + ρ)n+λ

Integrating by parts and observing that F(0) = 0, we obtain ελ ελ F(δ) + (n + λ) F(ρ) dρ. n+λ (ε + δ) (ε + ρ)n+λ+1 δ

A=

0

226

Measure and Integral: An Introduction to Real Analysis

The first term on the right tends to zero as ε → 0. The definition of δ and the change of variables ρ = εt show that the second term is at most (n + λ)η

δ 0

ρn

δ/ε ελ tn dρ = (n + λ)η dt. n+λ+1 (ε + ρ) (1 + t)n+λ+1 0

Hence, lim sup A ≤ (n + λ)η ε→0

∞ 0

tn = c η, (1 + t)n+λ+1

where c (=cλ,n ) is finite since λ > 0. Finally, to estimate B, note that if |x| > δ, then ε + |x| > δ, so that B≤

ελ δn+λ

| f (x)| dx ≤

|x|>δ

ελ δn+λ

f 1 .

Hence, limε→0 B = 0. Combining these estimates, we obtain lim supε→0 (A + B) ≤ c η, and the theorem follows by letting η → 0. See Exercise 12 for the case f ∈ Lp , p > 1. The kernels {Kε } for K satisfying the kinds of conditions earlier are examples of approximations of the identity.

9.3 The Hardy–Littlewood Maximal Function Let f ∗ denote the Hardy–Littlewood maximal function of f : f ∗ (x) = sup

1 | f (y)| dy, |Q| Q

where the supremum is taken over all cubes Q with center x and edges parallel to the coordinate axes (see (7.5)). If f ∈ Lp (Rn ) for some p ≥ 1, then f is locally integrable in Rn , and consequently, | f | ≤ f ∗ a.e. by Theorem 7.11. We observed on p. 136 in Section 7.2 that f ∗ is not integrable over Rn (unless f = 0 a.e.), but does satisfy the weak-type condition |{x ∈ Rn : f ∗ (x) > α}| ≤

c f 1 α

(α > 0),

(9.15)

Approximations of the Identity and Maximal Functions

227

where c depends only on n (Lemma 7.9). The behavior of f ∗ on the other Lp spaces, 1 < p ≤ ∞, turns out to be better. For example, it is clear from the definition of f ∗ that f ∗ (x) ≤ f ∞ for all x. Thus, f ∗ is bounded if f is, and f ∗ ∞ ≤ f ∞ . The following theorem describes the behavior of f ∗ when f ∈ Lp , p > 1.

Theorem 9.16

Let 1 < p ≤ ∞ and f ∈ Lp (Rn ). Then f ∗ ∈ Lp (Rn ) and f ∗ p ≤ c f p ,

where c depends only on n and p. Proof. Let f ∈ Lp (Rn ). We may assume that 1 < p < ∞ since the result is obvious with constant c = 1 when p = ∞. The idea is to obtain information for Lp by interpolating between the known results for L1 and L∞ . For α > 0, let ω(α) = x ∈ Rn : f ∗ (x) > α denote the distribution function of f ∗ . Fix α > 0 and define a function g by g(x) = f (x) when | f (x)| ≥ α/2 and g(x) = 0 otherwise. Note that g ∈ L1 (Rn ) since

g1 =

| f (x)| dx

{x∈Rn :|f (x)|≥α/2}

≤

| f (x)|

Rn

| f (x)| α/2

p−1

dx =

p−1 2 p f p < ∞. α

Also, the difference f − g ∈ L∞ (Rn ); in fact, f − g∞ ≤ α/2. Then since | f (x)| ≤ |g(x)| + α/2, f ∗ (x) ≤ sup

1 α α |g(y)| dy + = g∗ (x) + . |Q| 2 2 Q

In particular, α {x ∈ Rn : f ∗ (x) > α} ⊂ x ∈ Rn : g∗ (x) > , 2

228

Measure and Integral: An Introduction to Real Analysis

so that, by (9.15), α ω(α) ≤ x ∈ Rn : g∗ (x) > 2 2c 2c ≤ g1 = α α n

| f (x)| dx.

{x∈R :|f (x)|≥α/2}

∞ We have the formula Rn f ∗p dx = p 0 αp−1 ω(α) dα, which was stated on two occasions: Exercise 16 of Chapter 5, and Exercise 5 of Chapter 6. Hence,

f ∗p dx ≤ p

Rn

∞ 0

⎡ 2c αp−1 ⎣ α

⎤

|f (x)| dx⎦ dα.

{x∈Rn :| f (x)|≥α/2}

Interchanging the order of integration in the expression on the right (which is justified since the integrand is nonnegative), we obtain

f ∗p dx ≤ 2cp

Rn

⎛ | f (x)| ⎝

Rn

2|f(x)|

⎞ αp−2 dα⎠ dx.

0

Since p − 2 > −1 (i.e., p > 1), the inner integral equals (2| f (x)|)p−1 /(p − 1) a.e. (wherever f (x) is finite), so that Rn

f ∗p dx ≤

2p pc 2p pc p f p . | f (x)|p dx = p−1 n p−1 R

p

Taking pth roots, we see that f ∗ p ≤ Cp f p , where Cp = 2p pc/(p − 1). This completes the proof. Note that the constant Cp tends to +∞ as p → 1 and is bounded as p → ∞. The Hardy–Littlewood maximal function plays an important role in many parts of analysis concerned with operator theory and differentiation. It arose naturally in Chapter 7 in connection with Lebesgue’s differentiation theorem, and it will be used in the proof of Theorem 9.19 and frequently in later chapters. As another illustration of its usefulness, we have the following result. Theorem 9.17 Let K(x) be nonnegative and integrable on Rn and suppose that K(x) depends only on |x| and decreases as |x| increases (i.e., K(x) = φ(|x|), where φ(t), t > 0, is monotone decreasing). Then

229

Approximations of the Identity and Maximal Functions

sup |(f ∗ Kε )(x)| ≤ cf ∗ (x), ε>0

with c independent of f . In particular, for such kernels K, || sup |(f ∗ Kε )| ||p ≤ c|| f ||p ,

if 1 < p ≤ ∞,

ε>0

|{x : sup |(f ∗ Kε )(x)| > α}| ≤ ε>0

c || f ||1 , α > 0, α

if p = 1,

with c independent of f and α. Proof. We first remark that there is a constant c depending only on n such that

sup δ−n δ>0

| f (x − y)| dy ≤ cf ∗ (x).

|y| 0, Kε (y) > t}. Then E is a measurable subset of Rn+1 by Theorem 5.1, and Kε (y) =

K ε (y)

dt =

0

∞

χE (y, t) dt.

0

Hence, ⎡ ⎤ ∞ |(f ∗ Kε )(x)| = f (x − y)Kε (y) dy ≤ | f (x − y)| ⎣ χE (y, t) dt⎦ dy. n n R

R

0

Changing the order of integration in the last expression, we obtain |(f ∗ Kε )(x)| ≤

∞ 0

=

∞ 0

| f (x − y)|χE (y, t) dy dt

Rn

⎡ ⎣

⎤ | f (x − y)| dy⎦ dt.

{y:Kε (y)>t}

Let Et = {y : Kε (y) > t}, t > 0. Since K(y) depends only on |y| and decreases as |y| increases, Et is a ball with center 0 unless it is empty or the single point 0.

230

Measure and Integral: An Introduction to Real Analysis

In the t-integration in the last display, we may ignore any values of t such that |Et | = 0 since the inner integral is then zero. For all other t, Et is a ball, and by our earlier remark, the inner integral satisfies Et

⎡

⎤ 1 | f (x − y)| dy = |Et | ⎣ | f (x − y)| dy⎦ ≤ |Et | cf ∗ (x). |Et | Et

By combining estimates, we have |(f ∗ Kε )(x)| ≤

∞

|Et | cf ∗ (x) dt = cf ∗ (x)

0

∞

|Et | dt.

0

Finally, note that |Et | is the distribution function of Kε , so that ∞

|Et | dt = Kε 1 = K1 .

0

Therefore, |(f ∗ Kε )(x)| ≤ c||K||1 f ∗ (x), and the first statement of the theorem follows by taking the sup over ε > 0. The second statement is then a corollary of Theorem 9.16 if p > 1, and of (9.15) if p = 1. We leave it to the reader (Exercise 18) to show that Theorem 9.17 can also be derived from the formula in the conclusion of Lemma 9.14 (see, e.g., Exercise 17 and the proof of Theorem 12.61). In particular, for the kernel K(x) = 1/(1 + |x|n+λ ), λ > 0, Theorem 9.17 gives ελ sup f (x − y) n+λ dy ≤ cf ∗ (x), n+λ ε + |y| ε>0 n

(9.18)

R

a fact that will be used in the next section. Note that the conclusion of Theorem 9.17 is valid for any K that is majorized in absolute value by a kernel satisfying the hypothesis of Theorem 9.17. This includes any K satisfying the hypothesis of Theorem 9.13.

9.4 The Marcinkiewicz Integral We recall from Theorem 6.17, that if F is a closed subset of a bounded open interval (a, b) in R1 , and if δ(x) denotes the distance from x to F, then the Marcinkiewicz integral

Approximations of the Identity and Maximal Functions

Mλ (x) =

b a

δλ (y) dy |x − y|1+λ

231

(λ > 0)

is integrable over F. More generally, in Exercise 7 of Chapter 6, we considered the expression δλ (y)f (y) dy |x − y|1+λ 1

(λ > 0),

R

where f is nonnegative and integrable over the complement of F. If f = χ(a,b) , this reduces to Mλ (x). In case λ = 1, Mλ plays a role in proving Theorem 12.67. Now we consider Lp estimates for n-dimensional analogues, namely, for the integral Jλ (f )(x) =

δλ (y)f (y) dy |x − y|n+λ n

(x ∈ Rn ),

R

and the modified form Hλ (f )(x) =

Rn

δλ (y)f (y) dy. |x − y|n+λ + δ(x)n+λ

Here again, λ > 0, δ(x) denotes the distance from x to a closed set F ⊂ Rn and f is nonnegative and measurable on Rn . Notice that Hλ (f ) and Jλ (f ) are equal in F since δ is zero there. For the same reason, Jλ (f ) and Hλ (f ) are independent of the values of f on F. Therefore, we may assume for simplicity that f = 0 on F. We will prove in the next theorem that if f ∈ Lp (Rn − F), 1 ≤ p < ∞, then Hλ (f ) ∈ Lp (Rn ). This implies the basic fact that Jλ (f ) ∈ Lp (F). (In general, Jλ (f ) diverges outside F: see, e.g., Exercise 9 of Chapter 6.) For the proof, it will be convenient to consider one more modification of Jλ (f ), namely, Hλ (f )(x) =

Rn

δλ (y)f (y) dy. |x − y|n+λ + δ(y)n+λ

As we move from a point x to another point y, the distance from F does not increase by more than |x − y|. Hence, |δ(x) − δ(y)| ≤ |x − y|.

232

Measure and Integral: An Introduction to Real Analysis

It follows that δ(y) < |x − y| + δ(x), so that we have δn+λ (y) ≤ 2n+λ [|x − y|n+λ + δ(x)n+λ ], as well as a similar inequality with x and y interchanged. We immediately obtain that 2−n−λ−1 Hλ (f )(x) ≤ Hλ (f )(x) ≤ 2n+λ+1 Hλ (f )(x). Thus, inequalities for Hλ lead to ones for Hλ , but Hλ is easier to deal with. Theorem 9.19

If f ∈ Lp (Rn ), 1 ≤ p < ∞, and λ > 0, then Hλ (f ) ∈ Lp (Rn ) and Hλ (f )p ≤ c f p ,

where c is independent of f . In particular, Jλ (f )p,F ≤ c f p . Proof. Fix p, 1 ≤ p < ∞, and let g be any nonnegative function with gp ≤ 1, where 1/p + 1/p = 1. By interchanging the order of integration, we obtain

Hλ (f )(x)g(x) dx =

Rn

f (y)δλ (y)

Rn

Rn

g(x) dx dy. |x − y|n+λ + δ(y)n+λ

The outer integration on the right can be restricted to Rn −F without changing the value of the integral. However, if y ∈ Rn − F, then δ(y) > 0, and it follows from (9.18) that the inner integral on the right is bounded by cδ(y)−λ g∗ (y). Combining this estimate with Holder’s inequality and Theorem 9.16, we obtain Rn

Hλ (f )(x)g(x) dx ≤ c

f (y)g∗ (y) dy

Rn

≤ c f p g∗ p ≤ c1 f p gp ≤ c1 f p . By (8.9), the supremum of the left side for all such g is Hλ (f )p , so that Hλ (f )p ≤ c1 f p , and the theorem follows.

233

Approximations of the Identity and Maximal Functions

Exercises 1. Use Minkowski’s integral inequality (see Exercise 8 of Chapter 8) to prove Theorem 9.1 for 1 < p < ∞. 2. (a) Prove Young’s Theorem 9.2. (For f , g ≥ 0 and p, q, r < ∞, write

( f ∗ g)(x) =

f (t)p/r g(x − t)q/r · f (t)p(1/p−1/r) · g(x − t)q(1/q−1/r) dt,

and apply Hölder’s inequality for three functions (Exercise 6 of Chapter 8) with exponents r, p1 , and p2 , where 1/p1 = 1/p − 1/r, 1/p2 = 1/q − 1/r.) (b) Suppose that the conclusion of Theorem 9.2 holds for three indices p, q, r > 0. Show that 1/r = 1/p + 1/q − 1. (Apply the inequality in the conclusion to the dilated functions f (λx) and g(λx), λ > 0, and let λ vary. A similar method is used in Exercise 13 of Chapter 14.)

3. (a) Show that if f ∈ Lp (Rn ) and K ∈ Lp (Rn ), 1 ≤ p ≤ ∞, 1/p + 1/p = 1, then f ∗ K is bounded and continuous in Rn . (b) Sketch the (trapezoidal) graph of χI ∗ χJ where I and J are onedimensional intervals. Consider also the case when the two intervals are the same. 4. (a) Show that the function h defined by h(x) = e−1/x for x > 0 and h(x) = 0 for x ≤ 0 is in C∞ . 2

(b) Show that the function g(x) = h(x − a)h(b − x), a < b, is C∞ with support [a, b]. n (c) Construct a function in C∞ 0 (R ) whose support is a ball or an interval. 5. Let G and G1 be bounded open subsets of Rn such that G1 ⊂ G. Construct a function h ∈ C∞ 0 such that h = 1 in G1 and h = 0 outside G. (Choose an open G2 such that G1 ⊂ G2 , G2 ⊂ G. Let h = χG2 ∗ K for a K ∈ C∞ with suitably small support and K = 1.) 6. Prove Theorem 9.4. 7. Let f ∈ Lp (−∞, +∞), 1 ≤ p ≤ ∞. Show that the Poisson integral of f , f (x, y), is harmonic in the upper half-plane y > 0. (Show that ((∂ 2 /∂x2 ) + +∞ (∂ 2 /∂y2 ))f (x, y) = −∞ f (t)((∂ 2 /∂x2 ) + (∂ 2 /∂y2 ))Py (x − t) dt.)

8. (Schur’s lemma) For s, t ≥ 0, let K(s, t) satisfy K ≥ 0 and K(λs, λt) = ∞ λ−1 K(s, t) for all λ > 0, and suppose that 0 t−1/p K(1, t) dt = γ < +∞ for some p, 1 ≤ p ≤ ∞. For example, K(s, t) = 1/(s + t) has these properties. Show that if

234

Measure and Integral: An Introduction to Real Analysis

(Tf )(s) =

∞

f (t)K(s, t) dt (f ≥ 0),

0

then Tf p ≤ γ f p . (Note that K(s, t) = s−1 K(1, t/s), and therefore ∞ (Tf )(s) = 0 f (ts)K(1, t) dt. Now apply Minkowski’s integral inequality [see Exercise 8 of Chapter 8].) 9. (a) The maximal function is defined as f ∗ (x) = sup |Q|−1 Q | f |, where the supremum is taken over cubes Q with center x. Let f ∗∗ (x) be defined similarly, but with the supremum taken over all cubes Q containing x. Thus, f ∗ (x) ≤ f ∗∗ (x). Show that there is a positive constant c depending only on the dimension such that f ∗∗ (x) ≤ cf ∗ (x). (b) If f ∗∗ were instead defined to be sup |B|−1 B | f | where the supremum is taken over all balls B containing x, show that there are positive constants c1 and c2 depending only on the dimension so that c1 f ∗ (x) ≤ f ∗∗ (x) ≤ c2 f ∗ (x) for all x. 10. Let T : f → Tf be a function transformation that is sublinear; that is, T has the property that if Tf1 and Tf2 are defined, then so is T(f1 + f2 ), and |T(f1 + f2 )(x)| ≤ |(Tf1 )(x)| + |(Tf2 )(x)|. Suppose also that there are constants c1 and c2 such that T satisfies Tf ∞ ≤ c1 f ∞ and |{x : |(Tf )(x)| > α}| ≤ c2 α−1 f 1 , α > 0. Show that for 1 < p < ∞, there is a constant c3 such that Tf p ≤ c3 f p . This is a special case of an interpolation result due to Marcinkiewicz. (An example of such a T is the maximal function operator Tf = f ∗ , and the proof in the general case is like that for f ∗ .) 11. Generalize Theorem 9.6 as follows: Let fε = f ∗Kε , K ∈ L1 (Rn ) and Rn K = γ. If f ∈ Lp (Rn ), 1 ≤ p < ∞, show that fε − γf p → 0. Derive analogous results for Theorems 9.8, 9.9, and 9.13. (The case γ = 0 follows from the case γ = 1 by considering K(x)/γ.) 12. Show that the conclusions of Theorems 9.9 and 9.13 remain true if the assumption that f ∈ L1 is replaced by f ∈ Lp , p > 1. 13. Let f ∈ Lp (0, 1), 1 ≤ p < ∞, and for each k = 1, 2, . . . , define a function fk on (0,1) by letting Ik,j = {x : (j − 1)2−k ≤ x < j2−k }, j = 1, . . . , 2k , and setting fk (x) equal to |Ik,j |−1 Ik,j f for x ∈ Ik,j . Prove that fk → f in Lp (0, 1) norm. (Exercise 17 of Chapter 7 may be helpful for the case p = 1.) 14. Show that Theorem 9.6 and Corollary 9.7 fail for p = ∞. 15. Regarding the Dirichlet problem for the upper half-space, more can be said about the behavior of the Poisson integral f (x, y) of f (x) near a point of continuity of f (x). Prove that if f ∈ Lp (−∞, ∞), 1 ≤ p ≤ ∞, and f

Approximations of the Identity and Maximal Functions

235

is continuous at a point x0 , then f (x, y) → f (x0 ) as (x, y) approaches x0 unrestrictedly, that is, as x → x0 and y → 0, y > 0. 16. Let 1 ≤ p ≤ ∞, f ∈ Lp (−∞, ∞), and x0 be a Lebesgue point of f . Show that the Poisson integral f (x, y) of f converges nontangentially to f (x0 ), that is, show that for any γ > 0, f (x, y) → f (x0 ) as x → x0 and y → 0 with |x − x0 | ≤ γy. (Note that the Poisson kernel satisfies Py (t + z) ≤ Cγ Py (t) if |z| ≤ γy, with Cγ independent of y, t, z.) (See also Theorems 12.42 and 12.64.) 17. Prove that the conclusion of Lemma 9.14 holds without assuming φ is b continuous provided a φ(ρ) dF(ρ) exists. Show, for example, that the conclusion holds if φ is monotone and finite on [a, b]. (For the second b b part, recall from Theorem 2.21 that a φ dF exists if a F dφ does.) 18. Derive the first part of Theorem 9.17 by using the formula in the conclusion of Lemma 9.14 (even though the function φ in Theorem 9.17 is not assumed to be continuous). (Use Exercise 17. Note that if K(x) = φ(|x|) satisfies the hypothesis of Theorem 9.17, then φ(|x|) = o(|x|−n ) as |x| → 0 and as |x| → ∞). n 19. Let K(x) be a nonnegative, decreasing, integrable radial function on R (i.e., K satisfies the hypothesis of Theorem 9.17), and let K = 1. Use Theorem 9.17 to show that if 1 ≤ p ≤ ∞ and f ∈ Lp (Rn ), then f ∗Kε → f a.e. as ε → 0. (In case 1 ≤ p < ∞, a proof reminiscent of the proof of Lebesgue’s differentiation theorem can be constructed by applying Corollary 9.7.)

20. Show that the conclusion f ∗ Kε → f in Exercise 19 is valid at every Lebesgue point of f . (A proof based on integrating the formula in Lemma 9.14 by parts is possible; cf. Exercise 18.) 21. Let f and g be locally integrable functions on Rn and suppose that | f | ∗ |g| is finite a.e. in Rn . Prove that f ∗ g is measurable on Rn . (See the argument in the last part of the proof of Theorem 9.1 involving the truncated functions fN and also truncate g.) 22. As we know from Chapter 7, the maximal function f ∗ of an f ∈ L1 (Rn ) maynot be locally integrable. Show that if f satisfies the stronger condition Rn | f |(1 + log+ | f |) dx < ∞, then f ∗ ∈ L1 (E) for every measurable set E with |E| < ∞, and

f ∗ dx ≤ C |E| + | f | log+ | f | dx ,

E

Rn

with C independent of f and distribution function ∞E. (Let ω(α) denote γ the ∞ of f ∗ relative to E. Write 0 ω(α) dα = 0 + γ for γ > 0. The first integral on the right is bounded by γ|E|. In the second integral, use the estimate ω(α) ≤ Cα−1 {| f |>α/2} | f |; cf. the proof of Theorem 9.16.)

236

Measure and Integral: An Introduction to Real Analysis

23. (a) Show that there is no identity element for the convolution operation on L1 (Rn ), that is, there is no function k ∈ L1 (Rn ) such that f ∗ k = f a.e. for every f ∈ L1 (Rn ). (b) Show that if f , g ∈ L2 (Rn ), then f ∗ g belongs to L∞ (Rn ) but not necessarily to L2 (Rn ). See also Lemma 13.49.

10 Abstract Integration In the preceding chapters, we developed a theory of integration based on a theory of measurable sets. The notion of the measure of a set was in turn based on the primitive and classical notion of the measure (or volume) of an interval in Rn ; this led almost automatically by the process of covering to the notion of measure for more general sets. In this chapter, we follow an alternate approach. We will consider a family of sets and assume that they all have measures, that is, assume that with each member of the family, we can associate a nonnegative number satisfying elementary and natural requirements that justify calling it a measure. Starting with this assumption, we will develop a theory of integration that follows the pattern of Lebesgue integration. The advantage of this method is that it can be applied not only to Rn but also to general abstract spaces with much less geometric structure than Rn . Thus, it is important for applications. There are new questions that arise in the abstract setting, but many of the theorems and proofs are practically the same as those for Lebesgue measure in Rn . In such cases, we will usually refer to earlier chapters for proofs. It is natural to ask how we can construct such measures. One possible approach is to start with the more elementary notion of an outer measure in an abstract space and, as in the case of Rn discussed in Chapter 3, select a subclass of sets on which the outer measure has additional properties, qualifying it as a measure. This idea will be developed in Chapter 11.

10.1 Additive Set Functions and Measures Let S be a fixed set, and let be a σ-algebra of subsets of S ; that is, let satisfy the following: (a) S ∈ . (b) If E ∈ , then its complement CE (=S − E) ∈ (i.e., is closed under complements). (c) If Ek ∈ for k = 1, 2, . . . , then Ek ∈ (i.e., is closed under countable unions).

237

238

Measure and Integral: An Introduction to Real Analysis

It is easy to see that the definition is unchanged if condition (a) is replaced by the assumption that be nonempty; see also p. 49 in Section 3.2. Another widely used term for a σ-algebra is a countably additive family of sets. Immediate consequences of the definition are that the following sets belong to : (1) The empty set ∅ ( = CS ), (2) Ek if Ek ∈ , k = 1, 2, . . ., ∞ ∞ ∞ ∞ (3) lim sup Ek ( = m=1 k=m Ek ) and lim inf Ek ( = m=1 k=m Ek ) if each Ek ∈ , (4) E1 − E2 ( = E1 ∩ CE2 ) if E1 , E2 ∈ . We recall the basic fact that the collection of Lebesgue measurable subsets of Rn is a σ-algebra: see Theorem 3.20. In general, the elements E of a σ-algebra are called -measurable sets, or simply measurable sets if it is clear from context what is. If is a σ-algebra, then a real-valued function φ(E), E ∈ , is called an additive set function on if (i) φ(E) is finite for every E ∈ , (ii) φ( Ek ) = φ(Ek ) for every countable family {Ek } of disjoint sets in . Since Ek is independent of the order of the Ek ’s, the series in (ii) converges absolutely. We obtain a simple example of a set function by choosing to be the σ-algebra of all subsets of S and defining φ(E) = χE (x0 ) for a fixed x0 ∈ S . As another example, let be the collection of all Lebesgue measurable subsets of Rn , and define φ(E) = E f , where f ∈ L(Rn ). A function μ(E) defined for E in is called a measure on if (i) 0 ≤ μ(E) ≤ +∞, Ek ) = μ(Ek ) for every countable family {Ek } of disjoint sets in .

(ii) μ(

The choices μ ≡ 0 or μ ≡ +∞ are always possible, but of little interest. If μ is a measure on , then the triplet (S , , μ) is called a measure space. For example, Lebesgue measure together with the class of Lebesgue measurable subsets of Rn is a measure space. As another example, let S be any countable numbers. Let be the set, S = {xk }, and let {ak } be a sequence of nonnegative family of all subsets of S , and define δ(E) = akj if E = {xkj }. Then (S , , δ) is a measure space. Such a space is called a discrete measure space. The distinction between a measure and an additive set function is that a measure is nonnegative, but may be infinite, while an additive set function may take both positive and negative values, but is finite. Any nonnegative

239

Abstract Integration

additive set function is a finite measure and vice versa. There are similarities between many of the properties of set functions and measures. If E1 ⊂ E2 and μ is a measure, then μ(E2 − E1 ) + μ(E1 ) = μ(E2 ), so that μ(E2 − E1 ) = μ(E2 )−μ(E1 ) if μ(E1 ) is finite. If φ is an additive set function, then the formula φ(E2 −E1 ) = φ(E2 )−φ(E1 ), E1 ⊂ E2 , always holds. Choosing E1 = E2 , we see that φ(∅) = 0 for an additive set function, and also μ(∅) = 0 for a measure, unless μ(E) = +∞ for all E. Moreover, if E1 ⊂ E2 , then μ(E1 ) ≤ μ(E2 ) even if μ(E1 ) = +∞, and φ(E1 ) ≤ φ(E2 ) if φ ≥ 0. The next few results concern limit properties and a basic decomposition for additive set functions. Both S and are fixed. Theorem 10.1 If {Ek } is a monotone sequence of sets in (i.e., Ek E or Ek E) and φ is an additive set function, then φ(E) = limk→∞ φ(Ek ). Proof. If Ek E, then E = disjointness,

φ(E) = φ(E1 ) +

∞

Ek = E1 ∪ (E2 − E1 ) ∪ (E3 − E2 ) ∪ · · · . Hence, by

φ(Ek − Ek−1 )

k=2

= φ(E1 ) + lim

N→∞

N

[φ(Ek ) − φ(Ek−1 )] = lim φ(EN ). N→∞

k=2

On the other hand, if Ek E, then S − Ek S − E. Therefore, by the case already considered, we have φ(S − Ek ) → φ(S − E). Since φ(S − Ek ) = φ(S ) − φ(Ek ) and φ(S − E) = φ(S ) − φ(E), the result follows. The next theorem is similar to Fatou’s lemma (Theorem 5.17). Theorem 10.2 Let φ be a nonnegative additive set function, and let {Ek } be any sequence of sets in . Then φ(lim inf Ek ) ≤ lim inf φ(Ek ) ≤ lim sup φ(Ek ) ≤ φ(lim sup Ek ). k→∞

k→∞

Proof. The sets Hm = ∞ k=m Ek increase to lim inf Ek . Therefore, by the preceding theorem, φ(lim inf Ek ) = lim φ(Hm ). Since Hm ⊂ Em and φ ≥ 0, we have φ(Hm ) ≤ φ(Em ) and lim φ(Hm ) ≤ lim inf φ(Em ). Therefore, φ(lim inf Ek ) ≤ lim inf φ(Em ), which proves the first inequality. The proof of the third one is similar, and the second is obvious.

240

Measure and Integral: An Introduction to Real Analysis

If E ∈ , the collection of sets E∩A as A ranges over forms a σ-algebra of subsets of E. In fact, is just the collection of all -measurable subsets of E. If ψ is an additive set function on , then its restriction to is additive on . On the other hand, if φ is an additive set function on , then the function defined by ψ(A) = φ(A ∩ E) is additive on . Now, let φ be an additive set function on the measurable subsets of a set E ∈ , and define V(E) = V(E; φ) = sup φ(A),

V(E) = V(E; φ) = − inf φ(A), A⊂E A∈

A⊂E A∈

V(E) = V(E; φ) = V(E) + V(E)

(10.3)

to be the upper, lower, and total variation of φ on E, respectively. Note that all three are nonnegative since φ(∅) = 0. Moreover, as is easy to see from the definitions, −V(E) ≤ φ(A) ≤ V(E)

if A ⊂ E and A ∈ ,

and therefore, supA⊂E,A∈ |φ(A)| ≤ V(E). In fact, sup |φ(A)| ≤ V(E) ≤ 2 sup |φ(A)|. A⊂E A∈

A⊂E A∈

Also, each variation is monotone increasing with E; that is, if E1 ⊂ E2 , then V(E1 ) ≤ V(E2 ), etc. of all Lebesgue measurable sets Rn and φ(E) in −= In case is the collection n + f for some fixed f ∈ L(R ), it is easy to see that V(E) = f , V(E) = E E Ef , and V(E) = E | f |; consequently, V, V, and V are also additive set functions. More generally, we will show that if φ is any additive set function on a σalgebra , then V, V, and V are also additive set functions on . A simple corollary of the finiteness of V is that an additive set function φ is not only finite but also bounded. The first step in proving that the three variations are additive is the following lemma.

Lemma 10.4 If φ is an additive set function on , then each of its three variations is countably subadditive; that is, if Ek ∈ , k = 1, 2, . . . , then V

with similar formulas for V and V.

V(Ek ), Ek ≤

241

Abstract Integration

Proof. Let H H2 = E2 − E1 , H3 = E3 − E2 − E1 , . . . . Then 1 = E1 , the Hk are Hk ) and disjoint and Ek = Hk . If A ∈ and A ⊂ Ek , then A = (A ∩ φ(A) = φ(A ∩ Hk ). Therefore, since A ∩ Hk ⊂ Ek , we have φ(A) ≤ V(Ek ). Hence,

V

Ek =

sup

A⊂

Ek ,A∈

φ(A) ≤

V(Ek ),

which proves the result for V. The proof for V is similar, and the result for V follows by adding.

Lemma 10.5 If φ is an additive set function on , then its variations V(E), V(E), and V(E) are finite for every E ∈ . Proof. It is enough to show the result for V. Suppose that V(E) = +∞ for some E. We claim that there would then exist sets Ek ∈ , k = 1, 2, . . . , such that Ek and both V(Ek ) = +∞ and |φ(Ek )| ≥ k − 1. To see this, we argue by induction. Let E1 = E, and suppose that E1 ⊃ E2 ⊃ · · · ⊃ EN have been constructed with |φ(Ek )| ≥ k − 1 and V(Ek ) = +∞ for k = 1, . . . , N. Since V(EN ) = +∞, there exists A ∈ such that A ⊂ EN and |φ(A)| ≥ |φ(EN )| + N. If V(A) = +∞, let EN+1 = A, noting that |φ(A)| ≥ N. If V(A) < +∞, let EN+1 = EN −A. Then V(EN+1 ) = +∞ since by Lemma 10.4, we have V(EN ) ≤ V(EN+1 ) + V(A). Furthermore, |φ(EN+1 )| = |φ(EN ) − φ(A)| ≥ |φ(A)| − |φ(EN )| ≥ N. This establishes the existence of sets Ek with the desired properties. Thus, by Theorem 10.1, we obtain |φ( Ek )| = lim |φ(Ek )| = +∞, contradicting the finiteness of φ and completing the proof of the lemma. The final step in proving that the variations are additive set functions is given in the next lemma. Lemma 10.6 If φ is an additiveset function on and {Ek } is a sequence of disjoint sets in , then V( Ek ) = V(Ek ). Similar formulas hold for V and V. Proof. By Lemma 10.4, we have V( Ek ) ≤ V(Ek ). To show the opposite inequality, given ε > 0, choose Ak ⊂ Ek with V(Ek ) ≤ φ(Ak ) + ε2−k . This is

242

Measure and Integral: An Introduction to Real Analysis

possible since V(Ek ) is finite by the previous lemma. Since the Ek are disjoint so are the Ak , and we obtain

V(Ek ) ≤

φ(Ak ) + ε = φ Ak + ε ≤ V Ek + ε.

Since ε is an arbitrary positive number, the result for V follows. The analogous formula for V is proved similarly, and the one for V follows by adding. Combining Lemmas 10.5 and 10.6, we immediately obtain the next theorem.

Theorem 10.7 V, and V.

If φ is an additive set function on , then so are its variations V,

The result that follows is basic and gives a decomposition of an additive set function into the difference of two nonnegative additive set functions. It may be compared to Theorem 2.6.

Theorem 10.8 (Jordan Decomposition) If φ is an additive set function on , then φ(E) = V(E) − V(E),

E ∈ .

Proof. If A ⊂ E and A ∈ , then φ(E) = φ(A)+φ(E − A). Choose measurable sets Ak ⊂ E with φ(Ak ) → V(E) as k → ∞. Then φ(E − Ak ) → φ(E) − V(E), and −V(E) ≤ φ(E) − V(E) since φ(E − Ak ) ≥ −V(E). If it were true that −V(E) < φ(E) − V(E), there would be a measurable set B ⊂ E with φ(B) < φ(E) − V(E), and consequently φ(E − B) > V(E), a contradiction. Hence, −V(E) = φ(E) − V(E) as claimed. A sequence {Ek } of sets is said to converge if lim sup Ek = lim inf Ek . Thus, {Ek } converges if each point that belongs to infinitely many Ek belongs to all Ek from some k on. For example, if either Ek E or Ek E, then {Ek } converges to E. If {Ek } is any sequence that converges, it is said to converge to the set E = lim sup Ek = lim inf Ek . As a simple corollary of the Jordan decomposition, we obtain the following result. Corollary 10.9 If a sequence of sets {Ek } of converges to E, and if φ is an additive set function on , then limk→∞ φ(Ek ) = φ(E).

243

Abstract Integration

Proof. If φ ≥ 0, we may apply Theorem 10.2. Since the extreme terms there both equal φ(E), it follows that all four equal φ(E). Hence, lim φ(Ek ) exists and equals φ(E). For arbitrary φ, the result therefore holds for V and V, and so, by the Jordan decomposition, for φ itself. Let (S , , μ) be a measure space. We have already observed that μ satisfies μ(E1 ) ≤ μ(E2 ) if E1 ⊂ E2 , E1 , E2 ∈ . Another basic property of μ is given in the next theorem. Theorem 10.10 Let (S , , μ) be a measure space, and let {Ek : k = 1, 2, . . .} be any sequence of measurable sets. Then μ Proof. Write

μ(Ek ). Ek ≤

Ek as a disjoint union as follows:

Ek = E1 ∪ (E2 − E1 ) ∪ (E3 − E2 − E1 ) ∪ · · · .

Then μ

Ek = μ(E1 ) + μ(E2 − E1 ) + μ(E3 − E2 − E1 ) + · · · ≤ μ(E1 ) + μ(E2 ) + μ(E3 ) + · · · = μ(Ek ),

which completes the proof. By definition, a measure is countably additive on disjoint measurable sets (cf. Theorem 3.23). The next result shows that it shares another basic property of Lebesgue measure (see Theorem 3.26).

Theorem 10.11 measurable sets.

Let (S , , μ) be a measure space, and let {Ek } be a sequence of

(i) If Ek E, then limk→∞ μ(Ek ) = μ(E). (ii) If Ek E and μ(Ek0 ) < +∞ for some k0 , then limk→∞ μ(Ek ) = μ(E). Proof. Suppose that Ek E. If μ(Ek ) < +∞ for all k, we may use the same argument used to prove the first part of Theorem 10.1. If μ(Ek ) = +∞ for some k, then lim μ(Ek ) = μ(E) = +∞. To prove the second part, we may assume that k0 = 1 and use the argument for Theorem 10.1 with S replaced by E1 .

244

Measure and Integral: An Introduction to Real Analysis

Corollary 10.12 Let (S , , μ) be a measure space and let {Ek : k = 1, 2, . . .} be a sequence of measurable sets. Then (i) μ(lim inf Ek ) ≤ lim inf k→∞ μ(Ek ). (ii) If μ( ∞ k0 Ek ) < +∞ for some k0 , then μ(lim sup Ek ) ≥ lim supk→∞ μ(Ek ). Proof. Part (ii) is an immediate corollary of Theorem 10.2. For part (i), let Am = ∞ k=m Ek , m = 1, 2, . . . . Then Am lim inf Ek , and by Theorem 10.11, μ(lim inf Ek ) = limm→∞ μ(Am ). Since Am ⊂ Em , we have μ(Am ) ≤ μ(Em ) and limm→∞ μ(Am ) ≤ lim inf m→∞ μ(Em ). The result follows by combining inequalities.

10.2 Measurable Functions and Integration We will now develop the notions of measurable functions and integration in a measure space. These will be used later in the chapter to prove several important results for set functions. Let be a fixed σ-algebra of subsets of S , and let f (x) be a real-valued function defined for x in a measurable set E. (As usual, f may take the values ±∞.) Then f is said to be -measurable, or simply measurable, if {x ∈ E : f (x) > a} is measurable for −∞ < a < +∞. We will state some familiar results whose proofs depend only on the fact that the class of measurable sets forms a σ-algebra. The proofs are therefore similar to those in Chapter 4 for Lebesgue measurable functions, and details are left to the reader. On the other hand, the proofs of some other results (such as the monotone convergence theorem) follow a pattern different from their analogues in Chapter 5 due to the lack of a geometric interpretation of the integral in the general setting.

Theorem 10.13 (i) If f and g are measurable on a set E ∈ , then so are f + g, cf for real c, φ(f ) if φ is continuous on R1 , f + , f − , | f |p for p > 0, fg, and 1/f if f = 0 in E. (ii) If {fk } are measurable on a set E ∈ , then so are supk fk , inf k fk , lim supk→∞ fk , lim inf k→∞ fk , and, if it exists, limk→∞ fk . N (iii) If f is a simple function taking values {vk }N k=1 on disjoint sets {Ek }k=1 , respectively, then f is measurable if and only if each Ek is measurable. In particular, χE is measurable if and only if E is.

(iv) If f is nonnegative and measurable on E ∈ , then there exist nonnegative, simple measurable fk f on E.

245

Abstract Integration

If (S , , μ) is a measure space, a measurable set E is said to have μ-measure zero, or measure zero, if μ(E) = 0. A property is said to hold almost everywhere in E with respect to μ, or a.e. (μ), if it holds in E except at most for a subset of measure zero. We have the following analogue of Egorov’s theorem.

Theorem 10.14 (Egorov’s Theorem) Let (S , , μ) be a measure space, and let E be a measurable set with μ(E) < +∞. Let {fk } be a sequence of measurable functions on E such that each fk is finite a.e. (μ) in E and {fk } converges a.e. (μ) in E to a finite limit. Then, given ε > 0, there is a measurable set A ⊂ E with μ(E − A) < ε such that {fk } converges uniformly on A. In general, we cannot choose A to be closed in Theorem 10.14; in fact, S has very little structure, and the notion of a closed set may not even be defined. The proof is similar to that for Lebesgue measure and is left as an exercise. Let f be nonnegative on a measurable set E. Define the integral of f over E with respect to μ by

f dμ = sup

j

E

[ inf f (x)] μ(Ej ), x∈Ej

f ≥ 0,

(10.15)

where the supremum is taken over all decompositions E = Ej of E into the union of a finite number of disjoint measurable sets Ej . We adopt the convention 0 · ∞ = ∞ · 0 = 0 for the terms of the sum in (10.15). By Theorem 5.8, the definition reduces to the usual Lebesgue integral in case S = Rn , is the class of Lebesgue measurable sets, μ is Lebesgue measure, and f is nonnegative and Lebesgue measurable. Although definition (10.15) does not require the measurability of f , many of the familiar properties of the integral are valid only for measurable functions. All functions considered in the rest of this section are assumed to be measurable.

Theorem 10.16 Let (S , , μ) be a measure space, and let f be a nonnegative, simple measurable function defined on a measurable set E. If f takes values v1 , . . . , vN on disjoint E1 , . . . , EN , then f dμ = vj μ(Ej ). E

Proof. Since f is measurable, each Ej is measurable by Theorem 10.13(iii). f dμ ≥ vj μ(Ej ). On the other hand, consider any decomposition Clearly, E E = Ak of E into a finite number of disjoint measurable sets, and let wk = inf Ak f . If Ak ∩ Ej is not empty, then wk ≤ vj . Therefore, by the additivity of μ,

246

Measure and Integral: An Introduction to Real Analysis

wk μ(Ak ) =

j

≤

k

vj

j

wk μ(Ak ∩ Ej )

μ(Ak ∩ Ej ) =

vj μ(Ej ).

k

Taking the supremum over all such decompositions gives which completes the proof.

Ef

dμ ≤

vj μ(Ej ),

Note that the previous theorem holds even if some of the vj are +∞. Theorem 10.17 Let (S , , μ) be a measure space, and let f and g be measurable functions defined on a set E ∈ .

g dμ. E (ii) If f ≥ 0 on E and μ(E) = 0, then E f dμ = 0. (i) If 0 ≤ f ≤ g on E, then

Ef

dμ ≤

Proof. Both parts follow immediately from the definition (10.15). For part (ii), note that μ(Ej ) = 0 whenever Ej ⊂ E and Ej ∈ . Hence, each term of the sum in (10.15) is zero. In order to further investigate the properties of the integral, we need the next two lemmas. In these and the results that follow, the measure space (S , , μ) is fixed.

Lemma 10.18 (i) If f and g are nonnegative, simple measurable functions on E, and if c is a (f + g) dμ = f dμ + g dμ and nonnegative constant, then E E E E cf dμ = c E f dμ.

(ii) If f is a nonnegative, simple measurable function on E, and E = E1 ∪ E2 is the union of two disjoint measurable sets, then E f dμ = E1 f dμ + E2 f dμ. The proof of the first part of (i) is like the first part of the proof of Theorem 5.14. The second part of (i) follows immediately from Theorem 10.16. Details and the proof of (ii) are left as an exercise. Lemma 10.19 Let fk , k = 1, 2, . . . , and g be nonnegative, simple measurable functions defined on a set E ∈ . If fk and limk→∞ fk ≥ g on E, then

247

Abstract Integration lim

k→∞

fk dμ ≥

E

g dμ.

E

Moreover, the conclusion remains true if some values of g are +∞. Proof. Suppose g takes finite values v1 , . . . , vm on disjoint sets E1 , . . . , Em . By Lemma 10.18(ii), it is enough to show that lim

k→∞

fk dμ ≥

Ej

g dμ

for each j.

Ej

We thus reduce the proof when g < +∞ to the case when g is constant on E, that is, g = v ≥ 0 on E. If v = 0, the result is obvious. Suppose then that 0 < v < +∞, and let 0 < ε < v and Ak = {x ∈ E : fk (x) ≥ v − ε}, k = 1, 2, . . . . Since fk , we have Ak E, so that μ(Ak ) → μ(E). Moreover, E

fk dμ ≥

fk dμ ≥ (v − ε)μ(Ak ).

Ak

Therefore, limk→∞ E fk dμ ≥ (v − ε)μ(E). Letting ε → 0 and observing that vμ(E) = E g dμ, we obtain the desired result. We leave it to the reader to check the case when some values of g are +∞. The next theorem is helpful in deriving properties of nonnegative f from those for simple f .

fdμ for arbitrary

Theorem 10.20 Let {fk } be a sequence of nonnegative, simple measurable functions defined on E ∈ . If fk f on E, then E fk dμ → E f dμ. Proof. Clearly, limk→∞ E fk dμ ≤ E f dμ. To show the opposite inequality, consider a partition E = Ej of E into a finite number of disjoint measurable vj μ(Ej ). The function g defined by g = sets Ej , and let vj = inf Ej f and σ = vj χEj is nonnegative and measurable, and E g dμ = σ. Since limk→∞ fk ≥ g, we have limk→∞ E fk dμ ≥ σ by Lemma 10.19. Taking the supremum of such σ over all partitions of E, we obtain the desired result. As a corollary, we have the following theorem.

Theorem 10.21 Let f and g be nonnegative measurable functions defined on E ∈ , and let c be a nonnegative constant. Then

248

Measure and Integral: An Introduction to Real Analysis

(i) E (f + g) dμ = E f dμ + E g dμ and E cf dμ = c E f dμ. (ii) If E2 , where E1 and E2 are disjoint and measurable, then E f dμ = E = E1 ∪ E1 f dμ + E2 f dμ. Proof. By Theorem 10.13(iv), choose simple measurable fk and gk such that 0 ≤ fk f and 0 ≤ gk g. Then fk + gk is simple and measurable, and 0 ≤ fk + gk f + g. Therefore, by Theorem 10.20 and Lemma 10.18,

(f + g) dμ = lim

k→∞

E

=

(fk + gk ) dμ = lim

E

f dμ +

E

k→∞

fk dμ +

E

gk dμ

E

g dμ.

E

This proves the first part of (i); the other parts are proved similarly. If f is any real-valued measurable function defined on a measurable set E, we define its integral with respect to μ by E

f (x) dμ(x) =

E

f dμ =

E

f + dμ −

f − dμ,

(10.22)

E

provided not both integrals on the right are +∞. We say that f is integrable with respect to μ, or μ-integrable, over E if E f dμ exists and is finite. When this is the case, we write f ∈ L(E; dμ) or f ∈ L(E; μ). The abbreviations L(dμ) or L(μ) are also useful when it is clear from context what the set E is. It is immediate from (10.22) and Theorem 10.17 that E f dμ = 0 if μ(E) = 0 exist. The familiar and that E f dμ ≤ E g dμ if f ≤ g on E and both integrals properties of the Lebesgue integral are shared by E f dμ; some of them are listed in the following theorem.

Theorem 10.23

Ef

dμ| ≤

E | f | dμ; furthermore, f ∈ L(E; dμ) if and only if | f | ∈ L(E; dμ). (ii) If | f | ≤ |g| a.e. (μ) in E, and if g ∈ L(E; dμ), then f ∈ L(E; dμ), and E | f | dμ ≤ |g| dμ. E (iii) If f ∈ L(E; dμ), then f is finite a.e. (μ) in E. (iv) If f = g a.e. (μ) in E and if E f dμ exists, then E g dμ exists and E g dμ = E f dμ.

(i) |

249

Abstract Integration

(v) If E f dμ exists and c is a constant, then E cf dμ exists and E cf dμ = c E f dμ. (vi) If f , g ∈ L(E; dμ), then f + g ∈ L(E; dμ) and E (f + g) dμ = E f dμ + E g dμ.

(vii) If f ≥ 0 and m ≤ g ≤ M on E, then m

f dμ ≤

E

fg dμ ≤ M

E

f dμ.

E

Proof. The proofs are similar to those for Lebesgue integrals. As examples, we will prove (iii) and (iv). For (iii), suppose that f ∈ L(E; dμ). Then, by (i), | f | ∈ L(E; dμ). Let Z = {x ∈ E : | f (x)| = +∞}. Then for any positive integer k, k μ(Z) ≤

| f | dμ ≤

Z

| f | dμ.

E

Since f ∈ L(E; dμ), it follows that μ(Z) = 0, which proves (iii). For (iv), since both f + = g+ and f − = g− a.e. (μ) in E, we may assume that f ≥ 0. Then both E f dμ and E g dμ clearly exist. Let E1 = {x ∈ E : f (x) = g(x)}. Since μ(E1 ) = 0, Theorem 10.21(ii) implies that

f dμ =

E

f dμ =

E−E1

g dμ =

E−E1

g dμ,

E

as asserted.

Theorem 10.24 then

If {fk } is a sequence of nonnegative measurable functions on E, fk dμ = fk dμ. E

E

m Proof. Let f = ∞ k=1 fk . Since f ≥ k=1 fk , the integral of f over E majorizes m k=1 E fk dμ for any m. Hence, the left side in the preceding equation (j)

majorizes the right. To show the opposite inequality, let {fk } be a sequence of j (j) nonnegative, simple measurable functions increasing to fk . Let sj = k=1 fk . Then sj is nonnegative and simple, and sj . We will show that sj f . Clearly, limj→∞ sj ≤ f . On the other hand, for any m, lim sj ≥ lim

j→∞

j→∞

m k=1

(j)

fk =

m k=1

fk .

250

Measure and Integral: An Introduction to Real Analysis

Therefore, limj→∞ sj ≥ f . It follows from Theorem 10.20 that j E f dμ. Since k=1 fk ≥ sj , we obtain ∞

j

fk dμ = lim

j→∞

k=1 E

fk dμ ≥ lim

j→∞

k=1 E

sj dμ =

E

E sj dμ

→

f dμ.

E

This proves the desired inequality, and the theorem follows. The next three results are essentially corollaries of Theorem 10.24. Ek is a countable union of disjoint Theorem 10.25 If E f dμ exists, and if E = measurable sets Ek , then

f dμ =

E

f dμ.

Ek

Proof. Suppose first that f≥ 0. Let fk = f χEk on E, so that fk is measurable and nonnegative, and f = fk . By Theorem 10.24,

f dμ =

E

fk dμ =

E

f dμ.

Ek

For arbitrary measurable f , the existence of E f dμ implies that of Ek f dμ; in fact, the integrals of f + and f − over any Ek are majorized by those over E. Moreover, by the case already considered, E

f + dμ =

Ek

f + dμ,

E

f − dμ =

f − dμ.

Ek

Since at least one of these sums is finite, the conclusion follows by subtraction. (Compare Theorem 5.24.)

Theorem 10.26

If fk are measurable and 0 ≤ fk f on E, then

E fk dμ

→

Ef

dμ.

Proof. If E fk dμ = +∞ for some k, the result is obvious. We may therefore assume that each fk ∈ L(dμ). Write f = f1 + ∞ k=2 (fk − fk−1 ). Since each term on

251

Abstract Integration

the right is nonnegative, we obtain from Theorems 10.24 and 10.23(vi) that E

f dμ =

f1 dμ +

∞ k=2

E

E

fk dμ −

fk−1 dμ = lim

k→∞

E

fk dμ.

E

Theorem 10.27 (Monotone Convergence Theorem) Let {fk } and f be measurable functions on E: (i) Suppose that fk f a.e. (μ) onE. If there exists φ ∈ L(E; dμ) such that fk ≥ φ on E for all k, then E fk dμ → E f dμ.

(ii) Suppose that fk f a.e. (μ) onE. If there exists φ ∈ L(E; dμ) such that fk ≤ φ on E for all k, then E fk dμ → E f dμ. Proof. The proof of (i) follows by applying Theorem 10.26 to the functions fk − φ. The details are as in the proof of Theorem 5.32. Part (ii) follows by applying (i) to the functions −fk . Theorem 10.28 (Uniform Convergence Theorem) If fk ∈ L(E; dμ), k = 1, 2, . . . , and {fk } converges uniformly to f on E, μ(E) < +∞, then f ∈ L(E; dμ) and E fk dμ → E f dμ. The proof is the same as for Lebesgue measure (see Theorem 5.33) and is omitted. Fatou’s lemma and the Lebesgue dominated convergence theorem are true for abstract measures. They are stated below without proof; the proofs are like those of Theorems 5.34 and 5.36. Theorem 10.29 (Fatou’s Lemma) If {fk } is a sequence of measurable functions on E and there exists φ ∈ L(E; dμ) such that fk ≥ φ a.e. (μ) on E for all k, then E

(lim inf fk ) dμ ≤ lim inf k→∞

k→∞

fk dμ.

E

The case φ = 0 (i.e., fk ≥ 0) is of special importance. In this case, we obtain the following useful corollary. functions on E such Corollary 10.30 Let {fk } and f be nonnegative measurable that fk → f a.e. (μ) in E. If E fk dμ ≤ M for all k, then E f dμ ≤ M.

252

Measure and Integral: An Introduction to Real Analysis

Theorem 10.31 (Lebesgue’s Dominated Convergence Theorem) Let φ, {fk }, and f be measurable functions on E such that | fk | ≤ φ a.e. (μ) on E and φ ∈ L(E; dμ). Then (i)

E (lim inf k→∞ fk ) dμ ≤ lim inf k→∞ E fk dμ ≤ ≤ E (lim supk→∞ fk ) dμ.

(ii) If fk → f a.e. (μ) in E, then

E fk dμ

→

Ef

lim supk→∞

E fk dμ

dμ.

Corollary 10.32 (Bounded Convergence Theorem) Suppose that {fk } and f are measurable functions on E such that fk → f a.e. (μ) in E. If μ(E) < +∞ and there is a constant M such that | fk | ≤ M a.e. (μ) in E, then E fk dμ → E f dμ. We conclude our brief study of integration with respect to abstract measures by defining Lp (E; dμ) = Lp (E, , dμ), 0 < p α) = 0}. E

We leave it to the reader to check that | f | ≤ f ∞ a.e. (μ) in E and that for every α < f ∞ , there is a set Eα ⊂ E such that μ(Eα ) > 0 and | f | > α on Eα . Observe that lp is Lp (S , , dμ) when S is the set of integers, is the set of all subsets of S , and μ(E) is the number of elements of E. For 1 ≤ p ≤ ∞, Hölder’s and Minkowski’s inequalities hold: fg1 ≤ f p gp ,

f + gp ≤ f p + gp ,

1/p + 1/p = 1. Moreover, Lp is a Banach space with norm · p if 1 ≤ p ≤ ∞. In general, Lp is not separable (see Exercise 9). However, if L2 is separable, we can define orthogonality, linear independence, completeness, Fourier coefficients, and Fourier series as usual, obtaining Bessel’s inequality and Parseval’s formula, as well as the usual result relating L2 and l2 .

Abstract Integration

253

10.3 Absolutely Continuous and Singular Set Functions and Measures We now turn our attention from the familiar results earlier to some new ones arising naturally in the context of abstract measure spaces. Let (S , , μ) be a measure space, and let φ be an additive set function on . If E ∈ , then φ is said to be absolutely continuous on E with respect to μ if φ(A) = 0 for every measurable A ⊂ E with μ(A) = 0. Note that this definition has a somewhat different pattern from the one for Lebesgue measure (see p. 130 in Section 7.1). However, in Theorem 10.34, we shall obtain a reformulation of the present definition in terms of the old one. On the other hand, φ is said to be singular on E with respect to μ if there is a measurable set Z ⊂ E such that μ(Z) = 0 and φ(A) = 0 for every measurable A ⊂ E − Z. Thus, φ is singular if it is supported on a set of μ-measure zero, so that E splits into the union of two sets, Z and E − Z, one with μ-measure zero and the other with the property that φ is zero on each measurable subset of it. As examples, note that if f ∈ L(E; dμ), then the function φ(A) = A f dμ is absolutely continuous on E with respect to μ. If Z is any measurable subset of E with μ(Z) = 0 and ψ is any additive set function on the measurable subsets of E, then the function φ(A) = ψ(A ∩ Z) is singular on E with respect to μ. We list several simple properties of such set functions in the next theorem.

Theorem 10.33 (i) If φ is both absolutely continuous and singular on E with respect to μ, then φ(A) = 0 for every measurable A ⊂ E. (ii) If both ψ and φ are absolutely continuous (singular) on E with respect to μ, then so are ψ + φ and cφ, where c is any real constant. (iii) φ is absolutely continuous (singular) on E with respect to μ if and only if its variations V and V are, or, equivalently, if and only if its total variation V is. (iv) If {φk } is a sequence of additive set functions that are absolutely continuous (singular) on E with respect to μ, and if φ(A) = limk→∞ φk (A) exists for every measurable A ⊂ E, then φ is absolutely continuous (singular) on E with respect to μ. Proof. For part (i), suppose that φ is absolutely continuous and singular on E. Let Z be a subset of E with μ-measure zero such that φ(H) = 0 if H is measurable and H ⊂ E − Z. If A is any measurable subset of E, then

254

Measure and Integral: An Introduction to Real Analysis

φ(A) = φ (A ∩ Z) + φ (A − Z). Since φ is absolutely continuous and μ (A ∩ Z) = 0, we have φ (A ∩ Z) = 0. Moreover, since A − Z ⊂ E − Z and φ is singular, we have φ(A − Z) = 0. Hence, φ(A) = 0, and part (i) is proved. The proofs of parts (ii)–(iv) are left as exercises. The next two theorems give alternate characterizations of absolutely continuous and singular set functions.

Theorem 10.34 An additive set function φ is absolutely continuous on E with respect to μ if and only if given ε > 0, there exists δ > 0 such that |φ(A)| < ε for any measurable A ⊂ E with μ(A) < δ. Proof. The sufficiency of the condition is immediate since if μ(A) = 0, then μ(A) < δ for all δ > 0, so that |φ(A)| < ε for all ε. Consequently, φ(A) = 0. For the converse, suppose that φ is absolutely continuous, but that there is an ε > 0 for which no δ > 0 gives the desired result. Then, taking δ = 2−k for k = 1, 2, . . . , there would exist measurable Ak ⊂ E with μ(Ak ) < 2−k and |φ(Ak )| ≥ ε. Let A = lim sup Ak . Then, for any m, μ(A) ≤ μ

∞ k=m

Ak

≤

∞

2−k ,

k=m

so that μ(A) = 0. Therefore, φ(A) = 0. Assuming for the moment that φ ≥ 0, we obtain from Theorem 10.2 that φ(A) = φ(lim sup Ak ) ≥ lim sup φ(Ak ) ≥ ε. This contradiction establishes the result in case φ ≥ 0. For the general case, the variation V of an absolutely continuous φ is absolutely continuous (by Theorem 10.33(iii)) and nonnegative. Since |φ(A)| ≤ V(A), the theorem follows.

Theorem 10.35 An additive set function φ is singular on E with respect to μ if and only if given ε > 0, there is a measurable subset E0 of E such that μ(E0 ) < ε and V(E − E0 ; φ) < ε. (Recall from p. 240 in Section 10.1 that V(E − E0 ; φ) is equivalent in size to supA⊂E−E0 ,A∈ |φ(A)|). Proof. If φ is singular, there exists Z ⊂ E with μ(Z) = 0 such that V(E − Z; φ) = 0. Taking E0 = Z, we obtain the necessity of the condition. To prove its −k sufficiency, choose for each k = 1, 2, . . . a measurable E k ⊂ E with μ(Ek ) < 2 ∞ −k and V(E − Ek ; φ) < 2 . Let Z = lim sup Ek . Since Z ⊂ k=m Ek for every m, it follows as usual that μ(Z) = 0. Moreover, by Theorem 10.2,

255

Abstract Integration

V(E − Z; φ) = V(E − lim sup Ek ; φ) = V(lim inf(E − Ek ); φ) ≤ lim inf V(E − Ek ; φ) = 0. Hence, φ is singular with respect to μ, which completes the proof. For any measurable set E and any additive set function φ, the next theorem gives a useful decomposition of E in terms of the sign of φ. In order to motivate it, let us first consider the special case when φ is the indefinite integral of an f ∈ L(E; dμ): φ(A) = A f dμ for measurable A ⊂ E. Letting P = {x ∈ E : f (x) ≥ 0}, we see that φ(A) ≥ 0 for any measurable A ⊂ P and that φ(A) ≤ 0 for any measurable A ⊂ E − P. It follows that V(E; φ) = V(P; φ) = φ(P) and V(E; φ) = V(E − P; φ) = −φ(E − P). As a simple consequence, since P = {x ∈ E : f (x) = f + (x)}, we have V(E; φ) =

f + dμ,

V(E; φ) =

E

f − dμ.

E

The splitting E = P ∪ (E − P) is the sort of decomposition of E that we have in mind. For an arbitrary φ, there is the following basic result.

Theorem 10.36 (Hahn Decomposition) Let E be a measurable set and let φ be an additive set function defined on the measurable subsets A of E. Then there is a measurable P ⊂ E such that φ(A) ≥ 0 for A ⊂ P and φ(A) ≤ 0 for A ⊂ E − P. Equivalently, V(P; φ) = V(E − P; φ) = 0. Hence, V(E; φ) = V(P; φ) = φ(P), V(E; φ) = V(E − P; φ) = −φ(E − P). Proof. Denote V(A) = V(A; φ) and V(A) = V(A; φ) for measurable sets A ⊂ E. For each positive integer k, choose a measurable Ak ⊂ E such that φ(Ak ) > V(E) − 2−k . Then V(Ak ) > V(E) − 2−k . Since V is additive, V(E − Ak ) = V(E) − V(Ak ) < 2−k . Moreover, by the Jordan decomposition (Theorem 10.8), V(Ak ) − V(Ak ) = φ(Ak ) > V(E) − 2−k , and therefore V(Ak ) < 2−k . Let P = lim inf Ak . Since V is nonnegative, Theorem 10.2 implies that V(P) ≤ lim inf V(Ak ) = 0. Also,

V(E − P) = V(E − lim inf Ak ) = V(lim sup(E − Ak )) ≤ V

∞

(E − Ak )

k=m

256

Measure and Integral: An Introduction to Real Analysis

for any m. Therefore, for any m, by using Lemma 10.4, we obtain

V(E − P) ≤

∞

V(E − Ak )

0, there is a decomposition E = Z ∪ k=1 k of E into disjoint measurable sets such that (i) μ(Z) = 0 (ii) a(k − 1)μ(A) ≤ φ(A) ≤ akμ(A) for measurable A ⊂ Ek , k = 1, 2, . . . Proof. We may assume that a = 1 by considering φ/a. For each positive integer k, let ψk (A) = φ(A) − kμ(A) for measurable A ⊂ E. Since φ and μ are finite and additive, ψk is an additive set function. By the Hahn decomposition, there is a set Pk ⊂ E such that ψk (A) ≥ 0 if A ⊂ Pk and ψk (A) ≤ 0 if A ⊂ E − Pk . Thus, φ(A) ≥ kμ(A) if A ⊂ Pk and φ(A) ≤ kμ(A) if A ⊂ E − Pk . Now, let Qk = ∞ m=k Pm for k = 1, 2, . . . , and observe that Pk ⊂ Qk and Qk . We will show that φ(A) ≥ kμ(A) if A ⊂ Qk , and φ(A) ≤ kμ(A) if A ⊂ E − Qk . To see this, write Qk = Pk ∪ (Pk+1 − Pk ) ∪ (Pk+2 − Pk+1 − Pk ) ∪ · · · and note that the terms on the right side are disjoint. Hence, if A ⊂ Qk , we may write A = ∞ m=k Am , where the Am are disjoint and Am ⊂ Pm , by simply intersecting A with each such term of Qk . Then

φ(A) =

∞ m=k

φ(Am ) ≥

∞ m=k

mμ(Am ) ≥ k

∞

μ(Am ) = kμ(A),

m=k

so that φ(A) ≥ kμ(A) for A ⊂ Qk , as claimed. On the other hand, if A ⊂ E−Qk , then A ⊂ E − Pk , so that φ(A) ≤ kμ(A). This proves the assertion above.

257

Abstract Integration

We can now give the decomposition of E. Let Z = and write

∞

k=1 Qk

= lim sup Pk ,

E = Z ∪ (E − Q1 ) ∪ (Q1 − Q2 ) ∪ (Q2 − Q3 ) ∪ · · · = Z ∪ E1 ∪ E2 ∪ E3 ∪ · · · . The terms in this decomposition are disjoint. If A ⊂ E1 (= E−Q1 ), then φ(A) ≤ μ(A) by what was shown earlier, and φ(A) ≥ 0 by hypothesis. For k ≥ 2, we have Ek = Qk−1 − Qk = Qk−1 ∩ (E − Qk ). Hence, if k ≥ 2 and A ⊂ Ek , then φ(A) ≥ (k − 1) μ(A) due to the fact that A ⊂ Qk−1 ; also, φ(A) ≤ kμ(A) due to A ⊂ E − Qk . Finally, since Z ⊂ Qk for all k, we have φ(Z) ≥ kμ(Z) for all k. Since φ is finite, it follows that μ(Z) = 0, which completes the proof. To give some idea of the significance of the last result, write A = (A ∩ Z) ∪ [ k (A ∩ Ek )] for measurable A ⊂ E. Then φ(A) = φ(A ∩ Z) +

φ(A ∩ Ek ).

The set function σ(A) = φ (A ∩ Z) is singular with respect to μ. By (ii) of the theorem, φ is absolutely continuous with respect to μ on each Ek . Hence, the set function α defined by α(A) = φ(A) − σ(A) =

φ(A ∩ Ek )

is absolutely continuous with respect to μ since if μ(A) = 0, then μ(A ∩ Ek ) = 0 and φ(A ∩ Ek ) = 0 for all k. Note also that (ii) can be written a(k − 1) dμ ≤ φ(A) ≤ ak dμ A

A

for measurable A ⊂ Ek . We will now use these ideas to decompose any set function into the sum of an absolutely continuous part, which will be an indefinite integral, and a singular part. This decomposition, which is of major importance, is stated in the following theorem. We assume that the measure μ defined on the measurable subsets of E is σ-finite, that is, that E can be written as a countable union of measurable sets with finite μ-measure.

Theorem 10.38 (Lebesgue Decomposition) Let φ be an additive set function on the measurable subsets of a measurable set E, and let μ be a σ-finite measure on E. Then there is a unique decomposition φ(A) = α(A) + σ(A) for measurable A ⊂ E,

258

Measure and Integral: An Introduction to Real Analysis

where α and σ are additive set functions, α is absolutely continuous with respect to μ, and σ is singular with respect to μ. These functions are α(A) =

σ(A) = φ(A ∩ Z)

f dμ,

A

for appropriate f ∈ L (E; dμ) and Z with μ(Z) = 0. Moreover, if φ ≥ 0, then f ≥ 0. Proof. Assuming that such a decomposition exists, we will show it is unique. If φ = α1 + σ1 is another decomposition of φ into absolutely continuous and singular parts, then α−α1 = σ1 −σ, which (being both absolutely continuous and singular with respect to μ) must vanish identically. Hence α = α1 and σ = σ1 . To show that the decomposition exists, first assume that φ ≥ 0 and μ(E) < +∞. Taking a = 2−m , m = 1, 2, . . . , in Theorem 10.37, we may write E as a (m) disjoint union E = Z(m) ∪ , where k Ek μ(Z(m) ) = 0 and 2−m (k − 1)μ(A) ≤ φ(A) ≤ 2−m kμ(A) if A ⊂ Ek . (m)

Given m, k, m , and k , let β = 2−m (k − 1), γ = 2−m k, β = 2−m (k − 1), and γ = 2−m k . If the intervals [β, γ] and [β , γ ] are disjoint, we will show that (m) (m ) the set A = Ek ∩ Ek has μ-measure zero. In fact, we have both βμ(A) ≤ φ(A) ≤ γμ(A)

and

β μ(A) ≤ φ(A) ≤ γ μ(A).

If, for example, γ < β , the inequalities β μ(A) ≤ φ(A) ≤ γμ(A) imply that μ(A) = 0. A similar argument applies if γ < β. Fixing m and k, and setting (m+1) m = m + 1, we see that there are at most four values of k such that Ek (m) intersects Ek in a set of positive μ-measure, namely, k = 2k − 2, 2k − 1, 2k, and 2k + 1. Hence, (m)

Ek

(m+1)

(m+1)

(m+1)

⊂ E2k−2 ∪ E2k−1 ∪ E2k

(m+1)

(m)

∪ E2k+1 ∪ Yk ,

(m)

where μ(Yk ) = 0. Let Z=

m

⎛

Z(m) ∪ ⎝

⎞ (m) Yk ⎠ ,

k,m

−m (k − 1) so that μ(Z) = 0, and define functions {fm }∞ m=1 on E by fm (x) = 2 (m) (m) if x ∈ Ek − Z and fm (x) = 0 if x ∈ Z. Therefore, if x ∈ Ek − Z, then

259

Abstract Integration

fm (x) = 2−m (k − 1) and fm+1 (x) takes one of the four values 2−m−1 j, j = 2k − (m) 3, 2k − 2, 2k − 1, 2k. Hence, |fm (x) − fm+1 (x)| ≤ 2−m if x ∈ Ek − Z, and so also if x ∈ E. It follows that {fm } converges uniformly on E to a limit f . Since fm ≥ 0, also f ≥ 0. (m) Since E is the disjoint union Z ∪ k Ek − Z and φ is absolutely continuous on each Ek(m) , φ(A) = φ(A ∩ Z) +

k

= φ(A ∩ Z) +

φ A ∩ Ek(m) − Z (m) φ A ∩ Ek

k

for measurable A ⊂ E. Therefore, φ(A ∩ Z) +

(m) ≤ φ(A) 2−m (k − 1)μ A ∩ Ek

k

≤ φ(A ∩ Z) +

2−m kμ A ∩ Ek(m) ,

k

which can be rewritten φ(A ∩ Z) +

fm dμ ≤ φ(A) ≤ φ(A ∩ Z) +

A

fm dμ + 2−m μ(A).

A

Since μ(A) is finite, we obtain from the uniform convergence theorem that f dμ → A f dμ. Therefore, A m φ(A) = φ(A ∩ Z) +

f dμ,

A

which proves the theorem in case φ ≥ 0 and μ(E) < +∞. If φ ≥ 0 and μ(E) = +∞, then E can still be written as a disjoint union E = Ej with μ(Ej ) < +∞, since E is σ-finite. Hence, there exist Zj ⊂ Ej , μ(Zj ) = 0, and nonnegative fj on Ej such that for all measurable A ⊂ E, φ(A ∩ Ej ) = φ(A ∩ Zj ) +

A∩Ej

fj dμ.

260

Letting Z =

Measure and Integral: An Introduction to Real Analysis

Zj and f = φ(A) =

fj χEj , we obtain μ(Z) = 0, f ≥ 0, and

φ(A ∩ Zj ) φ(A ∩ Ej ) = + fj dμ = φ(A ∩ Z) + f dμ.

A∩Ej

A

Of course, f is integrable since φ is finite. The proof is now complete if φ ≥ 0. For an arbitrary φ, apply the decomposition to each of V and V, and subtract the results. By the Jordan decomposition, we obtain φ(A) = A f dμ + σ(A), where f ∈ L (E; dμ) and σ is singular with respect to μ. It remains to show that there is a set Z, μ(Z) = 0, such that σ(A) = φ (A ∩ Z). Let Z be the set of μ-measure zero corresponding to σ in the definition of a singular set function. Then σ (A ∩ Z) = σ(A) and A∩Z f dμ + σ(A) = 0. Hence, replacing A by A ∩ Z in the formula φ(A) = A f dμ + σ(A), we obtain σ(A) = φ (A ∩ Z). This completes the proof. For a result concerning the uniqueness of f , see Exercise 6(a). We have already noted that the indefinite integral of an integrable function is absolutely continuous. The following fundamental result gives a converse: namely, in a σ-finite space, the only absolutely continuous set functions are indefinite integrals.

Theorem 10.39 (Radon–Nikodym) Let φ be an additive set function on the measurable subsets of a measurable E, and let μ be a σ-finite measure on E. If φ is absolutely continuous with respect to μ, there exists a unique f ∈ L (E; dμ) such that φ(A) =

f dμ

A

for every measurable A ⊂ E. Here, the function f is called the Radon–Nikodym derivative of φ with respect to μ. Proof. The result follows from the Lebesgue decomposition. In fact, φ(A) = f dμ + φ (A ∩ Z) for appropriate f ∈ L (E; dμ) and Z with μ(Z) = 0. Since φ A is absolutely continuous, we have φ (A ∩ Z) = 0, so that φ (A) = A f dμ. For the uniqueness of f , see Exercise 6(a). If ν and μ are two measures defined on the same family of measurable sets, we say that ν is absolutely continuous with respect to μ on a measurable

261

Abstract Integration

set E if ν(A) = 0 for every A ⊂ E with μ(A) = 0. If ν is finite, Theorem 10.34 implies that a necessary and sufficient condition for ν to be absolutely continuous with respect to μ is that given ε > 0, there exist δ > 0 such that ν(A) < ε if μ(A) < δ. The necessity of this condition may fail if ν is not finite; see Exercise 12. We say that two measures ν and μ are mutually singular on E if E can be written as a disjoint union, E = E1 ∪ E2 , of two measurable sets with ν(E1 ) = μ(E2 ) = 0. The reader can check that the following analogue of Theorem 10.35 is valid: two measures ν and μ are mutually singular on E if and only if given ε > 0, there are disjoint measurable E1 and E2 with E = E1 ∪ E2 and ν(E1 ) < ε, μ (E2 ) < ε. We also note that ifν and μ are mutually singular on E and if g ∈ L (E; dν), then the set function A g dν is singular with respect to μ. To see this, write E = E1 ∪ E2 , where E1 and E2 are disjoint with ν (E1 ) = μ (E2 ) = 0. Setting Z = E2 ,we have μ(Z) = 0 and ν(A) = 0 for every measurable A ⊂ E−Z = E1 . Hence, A g dν = 0 for such A, which proves the assertion. We have the following analogue of the Lebesgue decomposition.

Theorem 10.40 Let ν and μ be two σ-finite measures defined on the measurable subsets of a measurable E. Then there is a unique, nonnegative measurable f on E and a unique measure σ on the measurable subsets of E such that σ and μ are mutually singular on E and ν(A) = A f dμ + σ(A) for every measurable A ⊂ E. Moreover,

g dν =

A

whenever

A g dν

A

gf dμ +

g dσ

A

exists.

Before giving the proof, we add several remarks based on the theorem. First, A f dμ is an absolutely continuous measure with respect to μ since f ≥ 0. Next, if E = Z ∪ (E − Z), where μ(Z) = σ(E − Z) = 0, then σ has the form σ(A) = ν(A ∩ Z)

for measurable A ⊂ E,

as can be seen by replacing A by A ∩ Z in the decomposition of ν and observing that σ(A) = σ(A ∩ Z) + σ(A − Z) = σ(A ∩ Z). Note also that if ν(E) is finite, then f ∈ L(E; dμ). Finally, if g ∈ L(E; dν), then g ∈ L(E; dσ) since σ(A) ≤ ν(A) for all measurable A ⊂ E. In this case, the second formula in the theorem implies that gf ∈ L(E; dμ) and expresses the Lebesgue decomposition of A g dν with respect to μ. Proof. If ν(E) < +∞, the Lebesgue decomposition implies that ν(A) = A f dμ + σ(A) for measurable A ⊂ E, where f ≥ 0, f ∈ L(E; dμ), and σ

262

Measure and Integral: An Introduction to Real Analysis

and μ are mutually singular. If ν(E) = +∞, then since ν is σ-finite, we have E = Ej with Ej disjoint and ν(Ej ) < +∞. Choose Zj ⊂ Ej and fj on Ej such that μ(Zj ) = 0, fj ≥ 0 and ν(A ∩ Ej ) =

fj dμ + ν(A ∩ Zj )

for measurable A ⊂ E.

A∩Ej

Let Z =

Zj and f =

ν(A) =

fj χEj . Then μ(Z) = 0, and adding over j, we have

f dμ + ν(A ∩ Z) =

A

f dμ + σ(A),

A

as claimed. The proof of the uniqueness of f and σ is left as an exercise. If g is the characteristic function χB of ameasurable set B, the formula g dν = in question, namely, A A gf dμ + A g dσ, reduces to ν(A ∩ B) = f dμ + σ(A ∩ B), which we know to be valid. Hence, the formula is A∩B also valid for any simple measurable g and, therefore, by the monotone convergence theorem, for any measurable g ≥ 0. Now let g be any measurable function for which A g dν exists. Then at least one of A g+ dν and A g− dν is finite, and the formula for g follows by subtracting those for g+ and g− .

Corollary 10.41 Let ν and μ be two σ-finite measures defined on the measurable subsets of a measurable E. (i) Then ν is absolutely continuous with respect to μ on E if and only if there is a nonnegative measurable f such that ν(A) = A f dμ for every measurable A ⊂ E. In this case, A

g dν =

gf dμ

A

for any measurable g and A ⊂ E for which A g dν exists. (ii) Let g ∈ L(E; dν). Then A g dν = A gf dμ for some nonnegative f and all measurable A ⊂ E if and only if A g dν is absolutely continuous with respect to μ. Proof. Let ν(A) = A f dμ + σ(A) be the decomposition given by Theorem 10.40. Part (i) follows from Theorem 10.40 since σ ≡ 0 if and only if ν is absolutely continuous with respect to μ. Part (ii) follows from the fact that g dν = gf dμ + the formula A A A g dσ is the Lebesgue decomposition of g dν. A

263

Abstract Integration

10.4 The Dual Space of Lp If B is a Banach space (or, more generally, a normed linear space) over the real numbers, a real-valued linear functional l on B is by definition a finite real-valued function l(f ), f ∈ B, which satisfies l(f1 + f2 ) = l(f1 ) + l(f2 ),

l(αf ) = α l(f ), −∞ < α < +∞.

Note that l(0) = 0. A linear functional l is said to be bounded if there is a constant c such that |l(f )| ≤ c f for all f ∈ B. A bounded linear functional l is continuous with respect to the norm in B, by which we mean that if f − fk → 0 as k → ∞, then l(fk ) → l(f ), since |l(f ) − l(fk )| = |l(f − fk )| ≤ c f − fk → 0. The norm l of a bounded linear functional l is defined as l = sup |l(f )|.

(10.42)

f ≤1

Since f / f has norm 1 for any f = 0, and since l is linear, we have l = supf =0 |l(f )|/ f . The collection of all bounded linear functionals on B is called the dual space B of B. We shall consider the case when B = Lp = Lp (E; dμ) and for simplicity restrict our attention to real-valued functions. Our goal is to show that if 1 ≤ p < ∞ and μ is σ-finite, then the dual space (Lp ) of Lp can be identified in a natural way with Lp , 1/p + 1/p = 1. The main tool in doing so is the Radon–Nikodym theorem. The first result is the following.

Theorem 10.43 formula

Let 1 ≤ p ≤ ∞, 1/p + 1/p = 1. If g ∈ Lp (E; dμ), then the l(f ) =

fg dμ

E

defines a bounded linear functional l ∈ [Lp (E; dμ)] . Moreover, l ≤ gp . Proof. This follows easily from Hölder’s inequality and the linear properties of the integral: we have |l(f )| = fg dμ ≤ gp f p , E

and therefore l ≤ gp .

264

Measure and Integral: An Introduction to Real Analysis

The theorem shows that with each g ∈ Lp , we can associate a bounded linear functional, l(f ) = E fg dμ, on Lp . If μ is σ-finite on E, the correspondence between g and l is unique (see Exercise 6) and defines an embedding of Lp in (Lp ) . We now give the characterization of (Lp ) , 1 ≤ p < ∞. Theorem 10.44 Let 1 ≤ p < ∞, 1/p + 1/p = 1, and let μ be σ-finite. If l ∈ [Lp (E; dμ)] , there is a unique g ∈ Lp (E; dμ) such that l(f ) =

fg dμ.

E

Moreover, l = gp , and therefore the correspondence between l and g defines an isometry between (Lp ) and Lp . Proof. Suppose first that μ(E) < +∞. Let l ∈ (Lp ) and write l = c. Define a set function φ on the measurable sets A ⊂ E by φ(A) = l(χA ). Note that φ is finite; in fact, |φ(A)| ≤ cχA p = cμ(A)1/p . Clearly, φ is finitely ∞ additive. To show that it is countably additive, suppose that A = k=1 Ak , Ak ∞ m measurable and disjoint. Write A = ( k=1 Ak ) ∪ ( k=m+1 Ak ) = A ∪ A . Then

φ(A) = φ(A ) + φ(A ) =

m

φ(Ak ) + φ(A ).

k=1

10.11 implies that φ(A ) Since |φ(A )| ≤ cμ(A )1/p (with p = ∞), Theorem ∞ tends to zero as m → ∞. Hence, φ(A) = k=1 φ(Ak ), which shows that φ is countably additive. The fact that |φ(A)| ≤ cμ(A)1/p also implies that φ is absolutely continuous with respect to μ. 1 By the Radon–Nikodym theorem, there is a g ∈ L (E; dμ)such that φ(A) = A g dμ for measurable A ⊂ E. This means that l(χA ) = E χA g dμ, so that l(f ) = E fg dμ for any simple measurable f . To show the same formula holds for any f ∈ Lp , we first claim that g ∈ Lp and gp ≤ c. If p > 1, choose simple functions hk with 0 ≤ hk |g|p . Let {gk } be the simple functions defined by 1/p

gk = hk sign g.

265

Abstract Integration 1/p Then gk p = hk 1 , and

1/p gk g dμ = l(gk ) ≤ c gk p = c hk 1 .

E 1/p+1/p

1/p

1/p

= hk , we obtain hk 1 ≤ chk 1 . We may Since gk g = hk |g| ≥ hk assume that hk 1 = 0 for large k. (Otherwise, g would be zero a.e. (μ), and our claim would be obviously true.) Hence, dividing both sides of the last 1/p 1/p inequality by hk , we have hk ≤ c, so that g ≤ c by the monotone 1

1

p

convergence theorem. This proves the claim when p > 1. The case p = 1 is left as an exercise. fk converging to To show that l(f ) = E fg dμ for any f ∈ Lp , choose simple f in Lp norm (see Exercise 8). Then l(fk ) → l(f ), and E fk g dμ → E fg dμ by Hölder’s inequality: fk g dμ − fg dμ ≤ fk − f g dμ ≤ fk − f p gp . E

E

E

The fact that the formula holds for fk thus implies that it holds for f by passing to the limit. To complete the proof for the case μ(E) < +∞, it remains to show that g = c and that the correspondence between l and g is unique. However, p we already know that g ≤ c, and the opposite inequality follows from p

Theorem 10.43. For the uniqueness of the correspondence, see Exercise 6(c). If μ(E) = +∞, then since μ is σ-finite, there exist Ej E with μ(Ej ) < +∞. Let l ∈ [Lp (E)] . We may view functions in Lp (Ej ) as those functions in Lp (E) that vanish outside Ej . Since the restriction of l to Lp (Ej ) is a bounded linear functional, there is a unique gj ∈ Lp (Ej ), gj p ,E ≤ l, such that j

l(f ) =

fgj dμ

Ej

for every f in Lp that vanishes outside Ej . For such f , the fact that Ej ⊂ Ej+1 also gives l(f ) =

Ej+1

fgj+1 dμ =

Ej

fgj+1 dμ.

266

Measure and Integral: An Introduction to Real Analysis

Therefore, gj+1 = gj a.e. (μ) in Ej . We may assume that gj+1 = gj everywhere in Ej . Define g(x) = gj (x) if x ∈ Ej . Then g is measurable and it follows that g ≤ l. If f ∈ Lp (E), then p l(f χEj ) =

fgj dμ =

Ej

fg dμ.

Ej

Since f χEj converges in Lp to f and Ej fg dμ → E fg dμ (note that fg ∈ L1 by Hölder’s inequality), we obtain l f = E fg dμ in the limit. Therefore, by Theorem 10.43, l ≤ gp , so that l = gp and the proof is complete. We remark that the proof of Theorem 10.44 fails in several places if p = ∞, for example, at the place where we conclude that φ is absolutely continuous. In fact, the theorem itself is false when p = ∞, is, not every bounded that linear functional on L∞ can be represented l f = fg dμ for some g ∈ L1 ; an example is indicated in Exercise 18.

10.5 Relative Differentiation of Measures Lebesgue’s differentiation theorem, Theorem 7.11, states that if f is locally integrable in Rn , then lim

h→0

1 f (y) dy = f (x) a.e., |Qx (h)| Qx (h)

where Qx (h) is the cube with center x and edge length h. We will now study an analogue of this result for other measures on Rn . Specifically, if μ and ν are two σ-finite measures on the Borel subsets of Rn , we will study the existence of lim

h→0

ν(Qx (h)) μ(Qx (h))

and its relation to the Lebesgue decomposition of ν with respect to μ. We will follow the method used to prove Lebesgue’s differentiation theorem. To do this, we must find a replacement for Vitali’s lemma: the simple form of Vitali’s lemma (see Lemma 7.4) relies heavily on the fact that expanding a cube concentrically by a factor (say 5) only enlarges its Lebesgue

267

Abstract Integration

measure proportionally, whereas no such relation may hold for general measures. In order to bypass this difficulty, we shall present a covering lemma that is purely geometric in nature, that is, which makes no mention of measure. We consider only cubes whose edges are parallel to the coordinate axes and write Q = Qx for those with center x. We say that a family K of cubes has bounded overlaps if there is a constant c such that every x ∈ Rn belongs to at most c cubes from K. Thus, K has bounded overlaps if and only if

χQ (x) ≤ c for all x ∈ Rn .

Q∈K

Theorem 10.45 (Besicovitch Covering Lemma) Let E be a bounded subset of Rn and let K be a family of cubes covering E that contains a cube Qx with center x for each x ∈ E. Then there exist points {xk } in E such that (i) E ⊂ Qxk (ii) {Qxk } has bounded overlaps Moreover, the constant c for which

χQx ≤ c can be chosen to depend only on n. k

In order to prove this, it will be convenient to first prove the following lemma. Lemma 10.46 Let {Qk }∞ k=1 be a sequence of cubes with centers {xk } such that if / Qj and |Qk | ≤ 2|Qj |. Then {Qk } has bounded overlaps, and the j < k, then xk ∈ constant c for which χQk ≤ c can be chosen to depend only on n. Proof. We will consider only n = 2; the case n = 2 is similar and left as an exercise. Let Qkm , m = 1, 2, . . . , be those Qk that contain the origin and whose centers are in the first quadrant, and let hm denote the edge length of Qkm . Then Qk1 covers at least the region A = (x, y) : 0 ≤ x ≤ 12 h1 , 0 ≤ y ≤ 12 h1 . Hence, no Qkm can have its center outside the set {(x, y) : 0 ≤ x ≤ h1 , 0 ≤ y ≤ h1 }; otherwise, we would have hm > 2h1 for some m, so that |Qkm | > 4|Qk1 |, a contradiction. Therefore, the center of each Qkm , m ≥ 2, must lie in one of the regions A, B, C, or D indicated in the following illustration:

268

Measure and Integral: An Introduction to Real Analysis

h1 1 2

B

C

A

D

h1

1 2

h1

h1

The center cannot be in A since that would contradict the assumption that the center of Qkm , m ≥ 2, does not lie in Qk1 . If it lies in B, then since Qkm contains 0, it covers B, and so there is at most one Qkm with center in B. Similar arguments hold for C and D. Applying the same reasoning to each quadrant, we see that there are at most 16 cubes in {Qk } that contain the origin. By translation, the same holds at any point of the plane, and the lemma is proved. Proof of Besicovitch’s lemma. Let α1 = sup {|Qx | : x ∈ E}. If α1 = +∞, there are arbitrarily large Qx , and since E is bounded, we simply choose one that contains E. If α1 < +∞, write E1 = E and choose x1 ∈ E1 with |Qx1 | > α1 /2. Let E2 = E1 − Qx1 ,

α2 = sup {|Qx | : x ∈ E2 }.

The definition of α2 assumes that E2 = ∅. Then α2 > 0 and we choose x2 ∈ E2 with Qx2 > α2 /2. Proceed in this way, obtaining at the kth stage Ek = Ek−1 − Qxk−1 = E − xk ∈ Ek ,

Qx > αk /2. k

k−1

Qxj ,

αk = sup {|Qx | : x ∈ Ek },

j=1

We continue the process as long as Ek = ∅. Note that αk >0 if Ek = ∅. all j ≤ k. Therefore, Qxk ≤ αk ≤ αj < Since xk ∈ Ek , we have xk ∈ Ej for 2Qxj if j ≤ k. It follows that Qxk satisfies the hypothesis of Lemma 10.46 and so has bounded overlaps. It remains only to show that E ⊂ Qxk . If some Ek0 is empty, then E is contained in the union of the Qxk , k ≤ k0 − 1. then Qxk is defined for every If no Ek is empty, k = 1, 2, . . .. Since αk and αk /2 < Qxk ≤ αk , it follows either that Qxk → 0 or that there exists δ > 0

269

Abstract Integration

such that Qxk ≥ δ for all k. The second possibility cannot arise; otherwise, E would not be bounded since xk ∈ E but xk is not in any Qxj with j < k. Hence, Qxk → 0, or equivalently, αk → 0. If x ∈ E − Qxk , then x ∈ Ek for all k. Therefore, |Qx | ≤ αk for all k, which means that |Qx | = 0. This shows that E − Qxk is actually empty and completes the proof. A Borel measure μ on Rn (i.e., a measure on the Borel subsets of Rn ) is called regular if μ(E) = inf {μ(G) : G ⊃ E, G open} for every Borel set E. If μ is regular and E is a Borel set with μ(E) < ∞, then any open set G that satisfies E ⊂ G and μ(G) < μ(E) + ε for some ε > 0 also satisfies μ(G − E) < ε since μ(G − E) = μ(G) − μ(E) when μ(E) < ∞. Now suppose that μ is a σ-finite regular Borel measure. Let E be a Borel set and ε > 0. We will show that there is an open set G satisfying E ⊂ G and μ(G − E) < ε whether μ(E) is finite or not. In fact, since μ is σ-finite, we can by choosing open sets write E = ∞ 1 Ek with μ(Ek ) < ∞ for each k, and then ∞ −k , we obtain an open set G = ) < ε2 Gk with μ(Gk −E 1 Gk such that E ⊂ G ∞k and G − E ⊂ 1 (Gk − Ek ), and consequently μ(G − E) ≤

∞ 1

∞ ε μ(Gk − Ek ) < = ε, 2k 1

as desired. Moreover, since the complement CE of E is also a Borel set, it follows that there is an open set G with CE ⊂ G and μ( G − CE) < ε. Then the closed set F defined by F = CG satisfies F ⊂ E and μ(E − F) < ε since E−F = G − CE. In case μ(F) < ∞, we obtain μ(E) − μ(F) < ε. In any case, either both μ(E) and μ(F) are finite or μ(E) = μ(F) = ∞. Therefore, for any Borel set E, μ(E) =

sup

μ(F)

F closed F⊂E

if μ is a σ-finite regular Borel measure. From now on, we will consider two regular Borel measures μ and ν on Rn that are finite on the bounded Borel sets. Let Qx (h) denote the cube with center x and edge length h. We will also assume that (i) μ(Qx (h)) > 0 for x ∈ Rn , h > 0. (ii) Sets of the form ν(Qx (h)) ν(Qx (h)) x : sup > α , x : lim sup > α , etc., h>0 μ(Qx (h)) h→0 μ(Qx (h)) are (Borel) measurable.

270

Measure and Integral: An Introduction to Real Analysis

Assumption (ii) is made for simplicity and is not necessary (see Exercise 17 of Chapter 11). Also, the assumption that μ and ν are regular is redundant (see Theorem 11.24 and the remarks following it). We assume that μ and ν are finite on the bounded Borel sets in order to ensure their finiteness on every Qx (h). Thus μ and ν are σ-finite measures. We will also use the fact that the class of continuous functions with compact support is dense in L(dμ); that is, given f ∈ L(dμ), there exist continuous gk with compact support such that f − gk dμ → 0. (See Exercise 27.) Note that when μ is Lebesgue measure and ν(E) = E | f | dx for a locally integrable f , then suph>0 ν(Qx (h))/μ(Qx (h)) is the Hardy–Littlewood maximal function of f (see (7.5)). In the next lemma, we estimate the size of this expression if μ and ν are any measures with the properties listed above. The results we will prove about differentiation are corollaries of this estimate.

Lemma 10.47 Let μ and ν satisfy the stated conditions. Then there is a constant c depending only on n such that (a) μ {x ∈ Rn : suph>0 [ν(Qx (h))/μ(Qx (h))] > α} ≤ (c/α)ν(Rn ), and (b) μ{x ∈ E : lim suph→ 0 [ν(Qx (h))/μ(Qx (h))] > α} ≤ (c/α)ν(E) for any Borel set E ⊂ Rn and any α > 0 Proof. (a) Fix α > 0, and let ν(Qx (h)) >α . S = x ∈ Rn : sup h > 0 μ(Qx (h)) If B is any bounded Borel set and x ∈ S ∩ B, there is a cube Qx withcenter Besicovitch’s x such that ν(Qx )/μ(Qx ) > α. Using lemma, select Qxk and c such that ν(Qxk ) > αμ(Qxk ), S ∩ B ⊂ Qxk , and χQx ≤ c. We then have k

1 ν(Qxk ), μ(Qxk ) < Qxk ≤ α ν(Qxk ) = χQx dν ≤ c dν = cν Qxk . μ(S ∩ B) ≤ μ

k

Qxk

Qxk

Therefore, μ(S ∩ B) ≤ cν( Qxk )/α, so that μ(S ∩ B) ≤ cν(Rn )/α. Letting B Rn , we obtain μ(S) ≤ cν(Rn )/α, as desired.

271

Abstract Integration

(b) Fix α > 0, and let ν(Qx (h)) >α . T = x ∈ E : lim sup h→0 μ(Qx (h)) If ν(E) = + ∞, there is nothing to prove. Otherwise, choose an open set G ⊃ E with ν(G) < ν(E) + ε, and let B be a bounded Borel set. If x ∈ T ∩ B, there is a cube Qx such that Qx ⊂ G and ν(Qx )/μ(Qx ) > α. By again using Qxk ⊂ G, such that μ(T ∩ B) ≤ Besicovitch’s lemma, there exists {Qxk }, c ν( Qxk )/α. Therefore, μ(T ∩ B) ≤ c ν(G)/α ≤ c[ν(E) + ε]/α. The result now follows by first letting ε → 0 and then letting B Rn . The first result about differentiation of measures is the following.

Theorem 10.48 singular, then

Let ν and μ satisfy the stated conditions. If ν and μ are mutually

ν(Qx (h)) = 0 a.e. (μ). μ(Q h→0 x (h)) lim

Proof. Since ν and μ are mutually singular, there is a set Z with ν(Rn − Z) = μ(Z) = 0. Let E = Rn − Z, and consider the sets ν(Qx (h)) > α , α > 0, Tα = x ∈ E : lim sup h→0 μ(Qx (h)) ν(Qx (h)) T = x ∈ E : lim sup >0 . μ(Qx (h)) h→0 By Lemma 10.47(b), we have μ(Tα ) ≤ cν(E)/α = 0. Since T is the union of the Tαk for any sequence αk → 0, it also has μ-measure zero, and the result follows.

Theorem 10.49 Let μ satisfy the stated conditions, and let f be a Borel measurable function that is integrable (dμ) over every bounded Borel set in Rn . Then 1 f dμ = f (x) h→0 μ(Qx (h)) lim

Qx (h)

a.e. (μ).

272

Measure and Integral: An Introduction to Real Analysis

Proof. Assume first that f ∈ L (Rn ; dμ). For any integrable g, we have 1 1 ≤ f − g dμ f dμ − f (x) μ(Q (h)) μ(Q (h)) x x Qx (h) Qx (h) 1 + g dμ − f (x). μ(Qx (h)) Qx (h)

If g is also continuous, the last term on the right converges to |g(x) − f (x)| as h → 0. Hence, letting L(x) denote the lim sup as h → 0 of the term on the left, we obtain L(x) ≤ sup h>0

1 |f − g| dμ + |g(x) − f (x)|. μ(Qx (h)) Qx (h)

Therefore, the set Sε where L(x) > ε, ε > 0, is contained in the union of the two sets where the corresponding terms on the right side of the last inequality exceed ε/2. From Lemma 10.47 and Tchebyshev’s inequality, we obtain

μ(Sε ) ≤ c

−1 ε −1 f − g dμ + ε f − g dμ. 2 2 n n R

R

As noted before the proof of Lemma 10.47, g can be chosen such that Rn | f − g| dμ is arbitrarily small. Hence, μ(Sε ) = 0 for every ε > 0, and the result follows. The case when f ∈ / L(Rn ; dμ) is left as an exercise (cf. Theorem 7.11). Combining the last two theorems, we obtain the main result: Corollary 10.50 Let ν and μ satisfy the stated conditions. If ν(E) = E f dμ+σ(E) is the decomposition of ν into parts that are absolutely continuous and singular with respect to μ, then

lim

h→0

ν(Qx (h)) = f (x) μ(Qx (h))

a.e. (μ).

273

Abstract Integration

Exercises 1. Prove Theorem 10.13. 2. A measure space (S , , μ) is said to be complete if contains all subsets of sets with measure zero; that is, (S , , μ) is complete if Y ∈ whenever Y ⊂ Z, Z ∈ , and μ(Z) = 0. In this case, show that if f is measurable and g = f a.e. (μ), then g is also measurable (cf. Theorem 4.5 and Chapter 3, Exercise 34). Is this true if (S , , μ) is not complete? Give an example of an incomplete measure space with a measure that is neither identically infinite nor identically zero. 3. Prove Egorov’s Theorem 10.14. 4. If (S , , μ) is a measure space, and if f and {fk } are measurable and finite a.e. (μ) in a measurable set E, then {fk } is said to converge in μ-measure on E to limit f if lim μ{x ∈ E : |f (x) − fk (x)| > ε} = 0 for all ε > 0.

k→∞

Formulate and prove analogues of Theorems 4.21 through 4.23. 5. Complete the proof of Lemma 10.18. 6. (a) If f1 , f2 ∈ L(dμ) and E f1 dμ = E f2 dμ for all measurable E, show that f1 = f2 a.e. (μ). (b) Prove the uniqueness of f and σ in Theorem 10.40.

(c) Let and let f1 , f2 ∈ Lp (dμ), 1/p + 1/p = 1, 1 ≤ p ≤ ∞. If μ be σ-finite, f1 g dμ = f2 g dμ for all g ∈ Lp (dμ), show that f1 = f2 a.e. (μ). 7. Prove the integral convergence results in Theorems 10.27 through 10.29 and 10.31. 8. Show that for 1 ≤ p < ∞, the class of simple functions vanishing outside sets of finite measure is dense in Lp (dμ). See also Exercise 27. 9. The symmetric difference of two sets E1 and E2 is defined as E1 E2 = (E1 − E2 ) ∪ (E2 − E1 ). Let (S , , μ) be a measure space, and identify measurable sets E1 and E2 if μ(E1 E2 ) = 0. Show that is a metric space with distance d(E1 , E2 ) = μ(E1 E2 ) and that if μ is finite, then Lp (S , , μ) is separable if and only if is 1 ≤ p < ∞. (For the sufficiency in the second part, Exercise 8 may be helpful; for the necessity, let {fk } be a countable dense set in Lp (S , , μ) and consider the sets {1/2 < fk ≤ 3/2}.)

274

Measure and Integral: An Introduction to Real Analysis

10. If φ is a set function whose Jordan decomposition is φ = V − V, define E

f dφ =

f dV −

E

f dV,

E

provided not both integrals on the right are infinite with the same sign. If V is the total variation of φ on E, and if | f | ≤ M, prove that | E f dφ| ≤ MV. 11. Prove parts (ii)–(iv) of Theorem 10.33. 12. Give an example of a pair of measures ν and μ such that ν is absolutely continuous with respect to μ, but given ε > 0, there is no δ > 0 such that ν(A) < ε for every A with μ(A) < δ. (Thus, the analogue for measures of Theorem 10.34 may fail.) Prove the analogue of Theorem 10.35 for mutually singular measures ν and μ. 13. Show that the set P of the Hahn decomposition is unique up to null sets. (By a null set for φ, we mean a set N such that φ(A) = 0 for every measurable A ⊂ N.) 14. Complete the proof of Theorem 10.44 for p = 1. 15. (Converse of Hölder’s inequality) Let μ be a σ-finite measure and 1 ≤ p ≤ ∞. (a) Show that f p = sup fg dμ , where the supremum is taken over all bounded measurable functions g that vanish outside a set (depending on g) of finite measure, and for which gp ≤ 1 and fg dμ exists. (If 1 < p ≤ ∞ and f p < ∞, this can be deduced from Theorem 10.44.) (b) Show that a real-valued measurable f belongs to Lp if fg ∈ L1 for all g ∈ Lp , 1/p + 1/p = 1. 16. Consider a convolution operator Tf (x) = Rn f (y)K(x−y) dy with K ≥ 0. If 1 ≤ p ≤ ∞ and Tf p ≤ Mf p for all f , show that Tf p ≤ Mf p for all f , 1/p + 1/p = 1. (Use Exercise 15 to write Tf p = supgp ≤1 | Rn (Tf )g dx|, and note that (Tf )(x)g(x) dx = (T g)(−y)f (y) dy Rn

Rn

where g(x) = g(−x).) Find a generalization if the hypothesis is instead that Tf q ≤ M f p for all f , where q is a fixed index with 1 ≤ q ≤ ∞ and q = p.

275

Abstract Integration

17. Let μbe σ-finite and define L p (dμ) to be the class of complex-valued f with |f |p dμ < +∞. Let l be a complex-valued bounded linear functional on L p (dμ). If 1 ≤ p < ∞, show that there is a function g ∈ L p (dμ) such that l(f ) = fg dμ. (Here, as usual, we define h dμ = h1 dμ + i h2 dμ if h = h1 + ih2 with h1 and h2 real-valued.) (Hint: Reduce to the real case.) 18. Give an example to show that (L∞ ) cannot be identified with L1 as in Theorem 10.44. (Consider L∞ [−1, 1] with Lebesgue measure, and let C be the subspace of continuous functions on [−1,1] with the sup norm. Define l(f ) = f (0) for f ∈ C . Then l is a bounded linear functional on C , so by the Hahn–Banach theorem∗ , l has an extension l ∈ (L∞ [−1, 1]) . 1 If there were a function g ∈ L1 [−1, 1] such that l(f ) = −1 fg dx for all 1 f ∈ L∞ [−1, 1], then we would have f (0) = −1 fg dx for all f ∈ C . Show that this implies that g = 0 a.e., so that l ≡ 0. The functional l is called the Dirac δ-function.) To show that (L∞ (E; dx)) and L1 (E; dx) are not isometrically isomorphic, one can combine the following three facts: L1 (E; dx) is separable; L∞ (E; dx) is not separable; and, a Banach space is separable if its dual space is separable. For the latter, see the references in the footnote below, Theorem 8.11 on p. 192 of the first, or Theorem 3.26, p. 73, of the second. 19. Complete the proof of Theorem 10.49. 20. Under the hypothesis of Theorem 10.49, prove that 1 | f (y) − f (x)| dμ(y) = 0 h→0 μ(Qx (h)) lim

a.e. (μ).

Qx (h)

21. Derive an analogue of the Besicovitch Covering Lemma for the case of two dimensions (x, y) when the squares Q(x,y) are replaced by rectangles R(x,y) (h) centered at (x, y) whose x and y dimensions are h and h2 , respectively. Use this result to prove that under the hypothesis of Theorem 10.49, lim

h→0

1 μ(R(x,y) (h))

f dμ = f (x, y)

a.e. (μ).

R(x,y) (h)

22. Let μ be a measure and A be a set with 0 < μ(A) < ∞. Let f be measurable and bounded on A, and let φ be convex in an interval containing the range of f . Prove that ∗ (see, e.g., Theorem 2.5, p. 33 of M. Schechter, Principles of Functional Analysis, 2nd edition,

Graduate Studies in Mathematics, vol. 36 (2001), American Mathematical Society; or Theorem 1.1, p. 1, of H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer, 2011).

276

Measure and Integral: An Introduction to Real Analysis

φ(f ) dμ A f dμ φ ≤ A . dμ A A dμ

(This is Jensen’s inequality for measures. See Theorem 7.44.) 23. A sequence {φk } of set functions is said to be uniformly absolutely continuous with respect to a measure μ if given ε > 0, there exists δ > 0 such that if E satisfies μ(E) < δ, then |φk (E)| < ε for all k. If {fk } is a sequence of integrable functions on a finite measure space (S , , μ) that converges pointwise a.e. (μ) to an integrable f , show that fk → f in L(dμ) norm if and only if the indefinite integrals of the fk are uniformly absolutely continuous with respect to μ. (Cf. Exercise 17 of Chapter 7.) 24. Let (S , , μ) be a σ-finite measure space, and let f be -measurable and integrable over S . Let 0 be a σ-algebra satisfying 0 ⊂ . Of course, f that there is a unique function f0 that may not be 0 -measurable. Show is 0 -measurable such that fg dμ = f0 g dμ for every 0 -measurable g for which the integrals are finite. The function f0 is called the conditional expectation of f with respect to 0 , denoted f0 = E(f |0 ). (Apply the Radon–Nikodym theorem to the set function φ(E) = E f dμ, E ∈ 0 .) 25. Using the notation of the preceding exercise, prove the following: (a) E(af + bg|0 ) = aE(f |0 ) + bE(g|0 ), a, b constants. (b) E(f |0 ) ≥ 0 if f ≥ 0. (c) E(fg|0 ) = gE(f |0 ) if g is 0 -measurable. (d) If 1 ⊂ 0 ⊂ , then E(f |1 ) = E(E(f |0 )|1 ). 26. (Hardy’s inequality) Let f ≥ 0 on (0, ∞), 1 ≤ p < ∞, dμ(x) = xα dx and dν(x) = xα+p dx on (0, ∞). Prove there exists a constant c independent of f such x ∞ ∞that (i) 0 ( 0 f (t) dt)p dμ(x) ≤ c 0 f p (x) dν(x), α < −1, ∞ ∞ ∞ (ii) 0 ( x f (t) dt)p dμ(x) ≤ c 0 f p (x) dν(x), α > −1. x x (For (i), ( 0 f (t) dt)p ≤ cxp−η−1 0 f (t)p tη dt by Hölder’s inequality, provided p − η − 1 > 0. Multiply both sides by xα , integrate over (0, ∞), change the order of integration, and observe that an appropriate η exists since α < −1.) 27. If μ is a σ-finite regular Borel measure on Rn , show that the class of continuous functions with compact support is dense in Lp (dμ), 1 ≤ p < ∞. (By Exercise 8, it is enough to approximate χE , where E is a Borel set with finite measure. Given ε > 0, as shown in Section 10.5 on p. 269, there exist open G and closed F with F ⊂ E ⊂ G and μ(G − F) < ε. Now use Urysohn’s lemma: if F1 and F2 are disjoint closed sets in Rn , there is a continuous f on Rn with 0 ≤ f ≤ 1, f = 1 on F1 , f = 0 on F2 .)

28. Let 1 < p ≤ ∞ and μ be a σ-finite measure for which Lp (E; dμ) is separable, 1/p + 1/p = 1. Show that every bounded sequence in Lp (E; dμ) has a weakly convergent subsequence, that is, if supk fk p < ∞, show that

277

Abstract Integration

there exists {fkj } and f ∈ Lp such that

E fkj g dμ

→

E fg dμ

for all g ∈ Lp .

(Use the Bolzano–Weierstrass theorem to show that for every g ∈ Lp , there is a subsequence {fkj } depending on g such that E fkj g dμ converges. By using a diagonal argument, {fkj } can be chosen to be independent of

g for all g in any fixed countable subset S of Lp and consequently for all g ∈ Lp by choosing S to be dense in Lp . Finally, apply Theorem 10.44 to the linear functional l ∈ (Lp ) defined by l(g) = limkj →∞ E fkj g dμ.) 29. Let l p be defined as in Section 8.3. Explain how Theorem 10.44 can be used to describe both the action and the norm of a continuous linear functional on l p , 1 ≤ p < ∞. 30. Let be the σ-algebra of Lebesgue measurable sets in R1 . For every E ∈ , let (E) denote the Lebesgue measure of E, and define measures R and by R(E) = (E ∩ [0, 1]) and (E) = χE (0): (a) Show that E f d = f (0)(E). (b) Is either R or absolutely continuous or singular with respect to ? (c) Identify the functions f and the sets Z in the Lebesgue decompositions of with respect to , of with respect to , of R with respect to , and of with respect to R. 31. Prove the Besicovitch Covering Lemma in case n = 1. 32. Let w(x) be a nonnegative locally integrable function on Rn such that n Q w > 0 for every cube Q in R with edges parallel to the coordinate axes. Consider the w-weighted maximal function Mw f defined by Mw f (x) = sup

1 | f (y)| w(y) dy, Q w(y) dy

x ∈ Rn ,

Q

where f is a measurable function and the supremum is taken over all cubes Q ⊂ Rn centered at x with edges parallel to the coordinate axes. Show that

w(x) dx ≤

{x:Mw f (x)>α}

C | f (x)| w(x) dx, α n

α > 0,

R

and that for 1 < p ≤ ∞,

Rn

1/p p

|Mw f (x)| w(x) dx

≤C

1/p p

| f (x)| w(x) dx

,

Rn

where the constant C is independent of w, f , and α. (Compare Lemma 10.47 and Theorem 9.16.)

11 Outer Measure and Measure

11.1 Constructing Measures from Outer Measures A function = (A) that is defined for every subset A of a set S is called an outer measure if it satisfies the following: (i) (A) ≥ 0, (∅) = 0. (ii) (A1 ) ≤ (A2 ) if A1 ⊂ A2 . (iii) ( Ak ) ≤ (Ak ) for any countable collection of sets {Ak }. For example, ordinary Lebesgue outer measure is an outer measure on the subsets of Rn . Some other concrete examples will be constructed later in the chapter. As with Lebesgue outer measure, it is possible to use any outer measure to introduce a class of measurable sets and a corresponding measure. In doing so, we base the definition of measurability on Carathéodory’s Theorem 3.30. Thus, given an outer measure , we say that a subset E of S is -measurable, or simply measurable, if (A) = (A ∩ E) + (A − E)

(11.1)

for every A ⊂ S . Equivalently, E is measurable if and only if (A1 ∪ A2 ) = (A1 ) + (A2 ) whenever A1 ⊂ E, A2 ⊂ S − E. It follows that a set E is measurable if and only if its complement S − E is measurable. As a simple example, let us show that any set Z with (Z) = 0 is measurable. In fact, for such Z and any A ⊂ S , property (ii) gives (A ∩ Z) + (A − Z) ≤ (Z) + (A) = (A). But by (iii), the opposite inequality (A) ≤ (A ∩ Z) + (A − Z) is always true, and the measurability of Z follows. 279

280

Measure and Integral: An Introduction to Real Analysis

If E is a measurable set, then (E) is called its -measure, or simply its measure. The terminology is justified by the following theorem. Let be an outer measure on the subsets of S .

Theorem 11.2

(i) The family of -measurable subsets of S forms a σ-algebra. (ii) is countably additive on disjoint measurable sets, that k } is a count is, if {E (Ek ). More able collection of disjoint -measurable sets, then ( Ek ) = generally, for any A, measurable or not, (A ∩ Ek ) and A∩ Ek = Ek . (A) = (A ∩ Ek ) + A − Proof. Let {Ek } be a collection of disjoint measurable sets, and let H = j and Hj = k=1 Ek , j = 1, 2, . . .. We first claim that for every A, (A) =

j

∞

k=1 Ek

(A ∩ Ek ) + (A − Hj ).

k=1

The proof will be by induction on j. If j = 1, the formula follows from the measurability of E1 . Assuming that the formula holds for j − 1, we have (A) = (A ∩ Ej ) + (A − Ej ) = (A ∩ Ej ) +

j−1

((A − Ej ) ∩ Ek ) + ((A − Ej ) − Hj−1 ).

k=1

Since the Ek are disjoint, (A − Ej ) ∩ Ek = A ∩ Ek for k ≤ j − 1. Hence, since j (A − Ej ) − Hj−1 = A − Hj , we obtain (A) = k=1 (A ∩ Ek ) + (A − Hj ), as required. This proves the claim. Since Hj ⊂ H, we have (A − Hj ) ≥ (A − H). Using this fact in the previous formula and letting j → ∞, it follows that (A) ≥

∞

(A ∩ Ek ) + (A − H) ≥ (A ∩ H) + (A − H).

k=1

However, we also have (A) ≤ (A ∩ H) + (A − H). Therefore, H is measurable and (A) = ∞ k=1 (A ∩ Ek ) + (A − H). Replacing A by A ∩ H in

281

Outer Measure and Measure

this equation, we obtain (A ∩ H) = ∞ k=1 (A ∩ Ek ), and the proof of (ii) is complete. Note we have also shown that a countable union of disjoint measurable sets is measurable, and we know that the complement of a measurable set is measurable. To prove (i), it remains to show that a countable union of arbitrary measurable sets is measurable. We will use the next lemma.

Lemma 11.3

If E1 and E2 are measurable, then so is E1 − E2 .

Proof. We will show that (A ∪ B) = (A) + (B) whenever A ⊂ E1 − E2 and B ⊂ C(E1 − E2 ). Since B = (B ∩ E2 ) ∪ (B − E2 ), we have A ∪ B = [A ∪ (B − E2 )] ∪ [B ∩ E2 ]. Hence, since A ∪ (B − E2 ) ⊂ CE2 and B ∩ E2 ⊂ E2 , it follows from the measurability of E2 that (A ∪ B) = (A ∪ (B − E2 )) + (B ∩ E2 ). However, A ⊂ E1 and B − E2 ⊂ C(E1 − E2 ) − E2 ⊂ CE1 . Therefore, since E1 is measurable, (A ∪ (B − E2 )) = (A) + (B − E2 ). Combining equalities and using the measurability of E2 , we obtain (A ∪ B) = (A) + (B − E2 ) + (B ∩ E2 ) = (A) + (B), which proves the lemma. Returning now to the proof of part (i) of Theorem 11.2, recall that the complement of a measurable set is measurable. Since E1 ∪ E2 = C(CE1 − E2 ), it then follows from Lemma 11.3 that E1 ∪ E2 is measurable if E1 and E2 are. Therefore, any finite union of measurable sets is measurable. Now, let {Ek } be j a countable collection of measurable sets. If Hj = k=1 Ek , then ∞ k=1

⎡

⎤ ∞

Hj+1 − Hj ⎦ , Ek = H1 ∪ ⎣ j=1

and since the Hj are measurable and increasing, the terms on the right are measurable and disjoint. Thus, by the case already proved, it follows that ∞ E is measurable. This completes the proof of the theorem. k k=1 According to Theorem 11.2, an outer measure is a measure on the σalgebra of -measurable sets and so enjoys the usual properties of measures. We also have the following result. Corollary 11.4 Let be an outer measure on S , let {Ek } be a collection of measurable sets, and let A be any set.

282

Measure and Integral: An Introduction to Real Analysis

(i) If Ek , then (A ∩ lim Ek ) = limk→∞ (A ∩ Ek ); if Ek and if (A ∩ Ek0 ) is finite for some k0 , then (A ∩ lim Ek ) = limk→∞ (A ∩ Ek ). (ii) (A ∩ lim inf Ek ) ≤ lim inf k→∞ (A ∩ Ek ); if (A ∩ ∞ k=k0 Ek ) is finite for some k0 , then (A ∩ lim sup Ek ) ≥ lim supk→∞ (A ∩ Ek ). Proof. We will prove the first statements in (i) and (ii); the proofs of the second statements are left as exercises. Let Ek be measurable and Ek . To prove the first part of (i), we may assume that (A ∩ Ek ) is finite for each k; otherwise, the result is clear. The sets E1 , E2 − E1 , . . . , Ek+1 − Ek , . . . are disjoint and measurable. Since ∞ ∞ Ek = E1 ∪ (Ek+1 − Ek ) , lim Ek = k=1

k=1

it follows from Theorem 11.2 that (A ∩ lim Ek ) = (A ∩ E1 ) +

∞

(A ∩ (Ek+1 − Ek )).

k=1

Moreover, since Ek and Ek+1 −Ek are disjoint and measurable and Ek has finite measure, we have (A ∩ (Ek+1 − Ek )) = (A ∩ Ek+1 ) − (A ∩ Ek ). Therefore, (A ∩ lim Ek ) = (A ∩ E1 ) +

∞ [(A ∩ Ek+1 ) − (A ∩ Ek )] k=1

= lim (A ∩ Ek+1 ), k→∞

which proves the first part of (i). For the first part of (ii), let {Ek } be measurable and define sets Xj = ∞ k=j Ek , j = 1, 2, . . .. Then Xj lim inf Ek , so that by (i), (A ∩ lim inf Ek ) = limj→∞ (A ∩ Xj ). But since A ∩ Xj ⊂ A ∩ Ej , we have limj→∞ (A ∩ Xj ) ≤ lim inf j→∞ (A ∩ Ej ), and the result follows.

11.2 Metric Outer Measures Now let us introduce a new assumption concerning the underlying space S : namely, that it is a metric space with metric d. The distance between two sets A1 and A2 is then defined by d(A1 , A2 ) = inf{d(x, y) : x ∈ A1 , y ∈ A2 },

283

Outer Measure and Measure

as in Euclidean space (see p. 5 in Section 1.3). An outer measure on S is called a metric outer measure, or an outer measure in the sense of Carathéodory, if (A1 ∪ A2 ) = (A1 ) + (A2 ) whenever d(A1 , A2 ) > 0. For example, by Lemma 3.16, Lebesgue outer measure satisfies this condition. An outer measure in a metric space may not be a metric outer measure and may lack properties (in addition to the defining property) of metric outer measures. Consider, for example, the case when S is the x, y-plane and d is the usual Euclidean metric in R2 . Define r(A) =

1 , d(A, Y)

A ⊂ R2 , where Y is the y-axis,

with the conventions 1/0 = ∞ and r(∅) = 0. We leave it as an exercise to check that r is an outer measure on R2 but not a metric outer measure. Furthermore, Y and all its subsets are r-measurable (with infinite measure), and no set B ⊂ R2 with d(B, Y) > 0 is r-measurable. In particular, r does not have the property in the next result, Theorem 11.5. Since S is a metric space, it has the topology induced by its metric. Thus, a set G in S is said to be open if for every x ∈ G, there is a δ > 0 such that the metric ball {y : d(x, y) < δ} lies in G. A closed set is by definition the complement of an open set, and B denotes the σ-algebra of Borel subsets of S ; that is, B is the smallest σ-algebra containing all the open (closed) subsets of S . Theorem 11.5 Let be a metric outer measure on a metric space S . Then every Borel subset of S is -measurable. Since the collection of -measurable sets is a σ-algebra, it is enough to prove that every closed set is -measurable. To prove this, we will use the following fact. Lemma 11.6 Let be a metric outer measure on a space S with metric d. Let A be any set contained in an open set G, and let Ak = {x ∈ A : d(x, CG) ≥ 1/k}, k = 1, 2, . . .. Then limk→∞ (Ak ) = (A). Proof. Since G is open, we have Ak A. Clearly, limk→∞ (Ak ) ≤ (A). To prove the opposite inequality, let Dk = Ak+1 − Ak , k = 1, 2, . . . Then d(Dk+1 , Ak ) ≥ [(1/k) − (1/(k + 1))] > 0 since if x ∈ Ak and y ∈ Dk+1 , then 1 1 ≤ d(x, CG) ≤ d(x, y) + d(y, CG) < d(x, y) + , k k+1

284

Measure and Integral: An Introduction to Real Analysis

where the second inequality is true since d is a metric. We also have A = Ak ∪ Dk ∪ Dk+1 ∪ · · · ,

(A) ≤ (Ak ) + (Dk ) + (Dk+1 ) + · · · .

If (Dj ) < +∞, then j≥k (Dj ) tends to zero as k → ∞, and it follows that (A) ≤ limk→∞ (Ak ), as desired. If (Dj ) = +∞, then at least one of (D2j ) and (D2j+1 ) is infinite. We can therefore choose N so that (DN ) + (DN−2 ) + (DN−4 ) + · · · is arbi trarily large. However, when k ≥ 2, the fact that k−1 j=1 Dj ⊂ Ak implies that k−1 the distance between Dk+1 and j=1 Dj is positive. Therefore, (DN ∪ DN−2 ∪ DN−4 ∪ · · · ) = (DN ) + (DN−2 ) + (DN−4 ) + · · · . Since AN+1 contains DN ∪ DN−2 ∪ DN−4 ∪ · · · , it follows that lim (Ak ) = +∞, and the lemma is proved. Proof of Theorem 11.5. Let F be any closed set. It is enough to show that (A ∪ B) = (A) + (B) for A ⊂ CF, B ⊂ F. If Ak = {x ∈ A : d(x, F) ≥ 1/k}, then d(Ak , B) ≥ 1/k, so that (Ak ∪ B) = (Ak ) + (B). Therefore, (A ∪ B) ≥ (Ak ) + (B). Letting k → ∞, it follows from the lemma that (A ∪ B) ≥ (A) + (B). Since the opposite inequality is also true, the theorem is proved. If S is a metric space, the notions of upper and lower semicontinuity of functions can be defined just as in Rn . For example, a real-valued function f defined near a point x0 is said to be upper semicontinuous at x0 if lim sup f (x) ≤ f (x0 ). x→x0

Here, of course, the notation x → x0 means that d(x, x0 ) → 0. The results of Theorem 4.14 are valid for metric spaces; for example, f is usc at every point of S if and only if {f ≥ a} is closed for every a. We thus obtain the following fact. Corollary 11.7 Let be a metric outer measure on S . Then every semicontinuous function on S is -measurable. Proof. Suppose, for example, that f is upper semicontinuous on S . Then {x : f (x) ≥ a} is closed for every a, and so is -measurable by Theorem 11.5. Hence, f is -measurable. If f is lower semicontinuous, then −f is upper semicontinuous, and the corollary follows.

285

Outer Measure and Measure

11.3 Lebesgue–Stieltjes Measure In this section and the next, we will consider two specific examples of outer measures in the sense of Carathéodory. The first is known as Lebesgue– Stieltjes outer measure. It elucidates the connection between measures and monotone functions. The situation is relatively simple for measures on R1 and monotone functions of a single variable, and we shall restrict our attention to this case. Extensions to higher dimensions are possible but more complicated. To construct a typical Lebesgue–Stieltjes outer measure, consider any fixed function f that is finite and monotone increasing (i.e., nondecreasing) on (−∞, +∞). For each half-open finite interval of the form (a, b], let λ(a, b] = λf ((a, b]) = f (b) − f (a). Note that λ ≥ 0 since f is increasing. If A is a nonempty subset of R1 , let λ(ak , bk ], ∗ (A) = ∗f (A) = inf where the inf is taken over all countable collections {(ak , bk ]} such that A ⊂ (ak , bk ]. Further, define ∗ (∅) = 0. Theorem 11.8

∗ is a Carathéodory outer measure on R1 .

Proof. We have ∗ ≥ 0 and ∗ (∅) = 0. First, we will show that if A1 ⊂ A2 , ∗ (A ) = +∞. then ∗ (A1 ) ≤ ∗ (A2 ). This is obvious if either A 2 1 = ∅ or , bk ] and λ(ak , bk ] < In any other case, choose {(ak , bk ]} such that A2 ⊂ (ak λ(ak , bk ]. Therefore, ∗ (A2 ) + ε. Then A1 ⊂ (ak , bk ], so that ∗ (A1 ) ≤ ∗ (A1 ) < ∗ (A2 ) + ε, and the result follows by letting ε → 0. To show that ∗ is subadditive, let {Aj }∞ j=1 be a collection of nonempty 1 subsets of R and let A = Aj . We may assume that ∗ (Aj ) < +∞ for each j j j. Choose (ak , bk ] such that Aj ⊂

j j (ak , bk ]

and

k

Since A ⊂

j j j,k (ak , bk ],

j

j

λ(ak , bk ] < ∗ (Aj ) + ε2−j .

k

we have

∗ (A) ≤

j,k

It follows that ∗ (A) ≤

j

j

λ(ak , bk ]

0, then given ε > 0, we can choose {(ak , bk ]} such that each (ak , bk ] has length less than d(A1 , A2 ) and A1 ∪ A2 ⊂

(ak , bk ],

λ(ak , bk ] ≤ ∗ (A1 ∪ A2 ) + ε.

Thus, the collection {(ak , bk ]} splits into two coverings, one of A1 and the other λ(ak , bk ], so that since ε is arbitrary, of A2 . Therefore, ∗ (A1 ) + ∗ (A2 ) ≤ ∗ (A1 ) + ∗ (A2 ) ≤ ∗ (A1 ∪ A2 ). But the opposite inequality is always true, which completes the proof. ∗f is called the Lebesgue–Stieltjes outer measure corresponding to f , and its restriction to those sets that are ∗f -measurable is called the Lebesgue–Stieltjes measure corresponding to f and denoted f or simply . Every Borel set in (−∞, ∞) is ∗f -measurable by Theorems 11.5 and 11.8. In particular, since (a, b] is a Borel set, ∗f ((a, b]) = f ((a, b]) for every (a, b]. We leave it as an exercise to show that the Lebesgue–Stieltjes outer measure ∗x corresponding to f (x) = x coincides with ordinary Lebesgue outer measure in R1 . Hence, by Carathéodory’s Theorem 3.30, a set is ∗x -measurable if and only if it is Lebesgue measurable. An outer measure defined on the subsets of a set S is said to be regular if for every A ⊂ S there is a -measurable set E such that A ⊂ E and (A) = (E). Ordinary Lebesgue outer measure in Rn is regular by Theorem 3.8. The next theorem shows that any Lebesgue–Stieltjes outer measure is regular; in fact, it shows that any set in R1 can be included in a Borel set with the same Lebesgue–Stieltjes outer measure. Of course, Borel sets are ∗ -measurable by Theorem 11.5. Theorem 11.9 Let ∗ be a Lebesgue–Stieltjes outer measure. If A is a subset of R1 , there is a Borel set B containing A such that ∗ (A) = (B). j

j

Proof. Given j = 1, 2, . . ., choose {(ak , bk ]} such that A⊂

j j ak , bk ], k

k

1 j j λ(ak , bk ] ≤ ∗ (A) + . j

287

Outer Measure and Measure

Let Bj =

j j k (ak , bk ]

and B =

(Bj ) ≤

Bj . Then A ⊂ B and B is a Borel set. Moreover,

k

1 j j λ(ak , bk ] ≤ ∗ (A) + . j

Since B ⊂ Bj , it follows that (B) ≤ ∗ (A) + (1/j), so that (B) ≤ ∗ (A). But the opposite inequality is also true since A ⊂ B, and the theorem follows. If μ is a finite Borel measure on R1 , define fμ (x) = μ((−∞, x]),

−∞ < x < +∞.

Note that fμ is monotone increasing and that μ((a, b]) = fμ (b) − fμ (a). It is natural to ask if the Lebesgue–Stieltjes measure induced by fμ agrees with μ as a Borel measure. An affirmative answer would mean that every finite Borel measure is a Lebesgue–Stieltjes measure. We shall see later (Corollary 11.22) that this is actually the case and that the continuity from the right of fμ (see Exercise 2) plays a role. The next result is also useful.

Theorem 11.10 If f is an increasing function that is continuous from the right, then its Lebesgue–Stieltjes measure satisfies ((a, b]) = f (b) − f (a). In particular, ({a}) = f (a) − f (a−). ((a, b]) ≤ f (b) − Proof. Since (a, b] covers itself, we always have ((a, b]) = ∗ f (a). To show the opposite inequality, suppose that (a, b] ⊂ (ak , bk ]. Given ε > 0, use the right continuity of f to choose {bk } with bk < bk ,

f (bk ) > f (bk ) − ε2−k .

If a satisfies a < a < b, then [a , b] is covered by the (ak , bk ), and therefore, there is a finite N such that [a , b] ⊂ N k=1 (ak , bk ). By discarding any unnec essary (ak , bk ) and reindexing the rest, we may assume that ak+1 < bk for k = 1, . . . , N−1. Also, a1 < a and b < bN , so that f (a1 ) ≤ f (a ) and f (b) ≤ f (bN ). We have k

λ(ak , bk ] ≥

N k=1

λ(ak , bk ] =

N

[f (bk ) − f (ak )]

k=1

= f (bN ) − f (a1 ) +

N−1

[f (bk ) − f (ak+1 )].

k=1

288

Measure and Integral: An Introduction to Real Analysis

Now, f (bN ) − f (a1 ) = [f (bN ) − f (bN )] + [f (bN ) − f (a1 )] ≥ −ε + [f (b) − f (a )]. Also, since f (bk ) − f (ak+1 ) ≥ 0 for k = 1, . . . , N − 1, N−1

[f (bk ) − f (ak+1 )] =

k=1

N−1

[f (bk ) − f (bk )] +

k=1

≥

N−1

[f (bk ) − f (ak+1 )]

k=1

∞

(−ε2−k ) + 0 = −ε.

k=1

Combining estimates, we obtain

λ(ak , bk ] ≥ −2ε + [f (b) − f (a )].

k

Letting ε → 0 and a → a, we have k λ(ak , bk ] ≥ f (b)−f (a). Hence, ((a, b]) ≥ f (b) − f (a), and the first statement of the theorem follows. The second statement is proved by applying the first to the intervals (a − (1/k), a], k = 1, 2, . . ., which decrease to {a}. This completes the proof. Let g be a Borel measurable function defined on R1 , and let f be a Lebesgue–Stieltjes measure. Then the integral g df is called the Lebesgue– Stieltjes integral of g with respect to f .∗ The next theorem gives a relation between g df and the usual Riemann–Stieltjes integral g df . Theorem 11.11 Let f be an increasing function that is right continuous on [a, b], and let g be a bounded Borel measurable function on [a, b]. If the Riemann–Stieltjes b integral a g df exists, then (a,b]

g df =

b

g df .

a

Proof. Let = {xj } be a partition of [a, b], and let mj and Mj be the inf and sup of g in [xj−1 , xj ] respectively. Let ∗ In some other texts, any integral

f dμ of the kind considered in Chapter 10 is called a Lebesgue–Stieltjes integral. We shall use the terminology only when μ is a Lebesgue–Stieltjes measure.

289

Outer Measure and Measure

L =

mj [f (xj ) − f (xj−1 )],

U =

Mj [f (xj ) − f (xj−1 )]

denote the corresponding lower and upper Riemann–Stieltjes sums. Define functions g1 and g2 by setting g1 = mj in (xj−1 , xj ] and g2 = Mj in (xj−1 , xj ]. Since f is right continuous, it follows from Theorem 11.10 that

g1 d = L ,

(a,b]

g2 d = U .

(a,b]

Therefore, since g1 ≤ g ≤ g2 , we obtain L ≤ (a,b] g d ≤ U . However, as b || → 0, both L and U converge to a g df by Theorem 2.29. This completes the proof. We remark in passing that a right continuous function f of bounded variation can be written f = f1 − f2 , where f1 and f2 are right continuous, bounded, and increasing. If 1 and 2 are the Lebesgue–Stieltjes measures corresponding to f1 and f2 , consider the Borel set function = 1 − 2 , and define g d = g d1 − g d2 . b b If a g df1 and a g df2 exist and are finite for a bounded Borel measurable function g, it then follows from Theorems 11.11 and 2.16 that

g d =

(a,b]

b

g df .

a

11.4 Hausdorff Measure Our second example of a Carathéodory outer measure is Hausdorff outer measure in Rn . To define it, fix α > 0, and let A be any subset of Rn . Given ε > 0, let (ε) Hα (A) = inf

δ(Ak )α ,

k

where δ(Ak ) denotes the diameter of Ak (see p. 5 in Section 1.3), and the inf is taken over all countable collections {Ak } such that A ⊂ Ak and δ(Ak ) < ε for all k. We may always assume that the Ak in a given covering are disjoint and that A = Ak .

290

Measure and Integral: An Introduction to Real Analysis

If ε < ε, each covering of A by sets with diameters less than ε is also such a cover for ε. Hence, as ε decreases, the collection of coverings decreases, and (ε) consequently Hα (A) increases. Define (ε) Hα (A) = lim Hα (A). ε→0

Theorem 11.12

For α > 0, Hα is a Carathéodory outer measure on Rn .

Proof. Clearly, Hα ≥ 0 and Hα (∅) = 0. If A1 ⊂ A2 , then any covering of (ε) (ε) A2 is also one of A1 , so that Hα (A1 ) ≤ Hα (A2 ). Letting ε→ 0, we obtain Hα (A1 ) ≤ Hα (A2 ). To show that Hα is subadditive, let A = Ak , and choose a cover of Ak for each k. The union of these is a cover of A, and it is easy (ε) (ε) to show that Hα (Ak ) ≤ Hα (Ak ). Letting ε → 0, we get Hα (A) ≤ Hα (Ak ). The details of this argument and the proof that Hα is a Hα (A) ≤ Carathéodory outer measure are left as exercises. Hα is called Hausdorff outer measure of dimension α on Rn , and the corresponding measure is called Hausdorff measure of dimension α and also denoted Hα . It has the following basic property.

Theorem 11.13 (i) If Hα (A) < +∞, then Hβ (A) = 0 for β > α. (ii) If Hα (A) > 0, then Hβ (A) = +∞ for β < α. Proof. Statements (i) and (ii) are equivalent. To prove (i), let A = δ(Ak ) < ε. If β > α, then (ε) Hβ (A) ≤ (ε)

δ(Ak )β ≤ εβ−α

Ak ,

δ(Ak )α .

(ε)

Therefore, Hβ (A) ≤ εβ−α Hα (A). Letting ε → 0, we obtain Hβ (A) = 0 if Hα (A) < +∞, completing the proof. The next theorem shows that Hausdorff outer measure is regular, by showing that any set in Rn can be included in a Borel set with the same Hausdorff outer measure. Theorem 11.14 Given A ⊂ Rn and α > 0, there is a set B of type Gδ containing A such that Hα (A) = Hα (B).

291

Outer Measure and Measure

Proof. Given ε > 0, choose {Ak } such that A =

(ε/2)

δ(Ak )α ≤ Hα

Ak , δ(Ak ) < ε/2 and

(A) + ε ≤ Hα (A) + ε.

ε)δ(Ak ); this can be done Enclose Ak in an open set Gk with δ(Gk ) ≤ (1 + by letting Gk = {x : d(x, Ak ) < εδ(Ak )/2}. Let G = Gk . Then G is open and A ⊂ G. Since δ(Gk ) < (1 + ε)ε/2 < ε for 0 < ε < 1, we have (ε) Hα (G) ≤

δ(Gk )α ≤ (1 + ε)α

δ(Ak )α

≤ (1 + ε)α [Hα (A) + ε]. Now let ε → 0 through a sequence {εj }, and let G(j) be the corresponding open sets G as above. If B = G(j), then B is of type Gδ and A ⊂ B. Also, since B ⊂ G(j) for each j, we have (ε) (B) ≤ (1 + ε)α [Hα (A) + ε] Hα

for ε = εj .

Letting j → ∞, we obtain Hα (B) ≤ Hα (A). Since the opposite inequality is clearly true, the result follows. If A is a subset of R1 and A = Ak with δ(Ak ) < ε, then δ(Ak ) = |Ik |, where Ik is the smallest interval containing Ak . Hence, in the one-dimensional case, (ε) Hα (A) = inf

|Ik |a

(n = 1),

where the Ik ’s are intervals of length less than ε such that A ⊂ Ik . If α = 1, it follows that H1 (A) is the usual Lebesgue outer measure of A. In Rn , n > 1, Hn is not the same as Lebesgue outer measure (see Exercise 10). Nevertheless, there is a simple relation between the two, which is a corollary of the next lemma. Let (ε) (A) = inf Hα

δ(Qk )α ,

where {Qk } is any collection of cubes with edges parallel to the axes such that A ⊂ Qk and δ(Qk ) < ε. Also, let (ε) (A) = lim Hα (A). Hα ε→0

292

Measure and Integral: An Introduction to Real Analysis

is defined in the same way as H , except that cubes are used instead Thus, Hα α of arbitrary sets.

Lemma 11.15

There is a constant c depending only on n and α such that Hα (A) ≤ Hα (A) ≤ cHα (A),

A ⊂ Rn .

Proof. Since every covering of A by cubes is a covering of A, we obtain (A). Any set with diameter δ, say, is contained in a cube with Hα (A) ≤ Hα √ Ak , δ(Ak ) < ε. edge length 2δ and so with diameter √2 n δ. Now let A = Select cubes Qk ⊃ Ak with δ(Qk ) = 2 n δ(Ak ). Then

√ √ √ (2 n ε) (A). δ(Ak )α = (2 n)−α δ(Qk )α ≥ (2 n)−α Hα

√ −α (2√nε) (ε) Therefore, H (A) ≥ (2 n) Hα (A). Letting ε → 0, we obtain Hα (A) ≥ α √ (A), which completes the proof. (2 n)−α Hα Theorem 11.16 (i) There are positive constants c1 and c2 depending only on the dimension n such that c1 Hn (A) ≤ |A|e ≤ c2 Hn (A) for A ⊂ Rn . (ii) If α > n, then Hα (A) = 0 for every A ⊂ Rn . Proof. We first claim that for every set A ⊂ Rn , Hn (A) = inf

{Qk }

δ(Qk )n ,

where the inf is taken over all collections {Qk } of cubes with edges parallel to the coordinate axes that cover A, without restriction on the size of δ(Qk ). Let I denote the inf on the right side. It follows easily that Hn (A) ≥ I. To collection of cubes show the opposite inequality, let {Qk : k = 1, 2, . . .} be a with edges parallel to the coordinate axes such that A ⊂ Qk , and let ε, η > 0. Pick cubes {Q∗k } satisfying Qk ⊂ (Q∗k )◦ and |Q∗k − Qk | < η/2k for each k. Decompose (Q∗k )◦ = j Q k,j into the union of nonoverlapping cubes Qk,j with k,j ) < ε (and edges parallel to the coordinate axes); see Theorem 1.11 and δ(Q the comment after its proof about the size of the initial net of cubes used in its ∗ n proof. Then, for each k, we have |Q∗k | = j |Q k,j |, and consequently δ(Qk ) = n n j δ(Qk,j ) since for any cube Q, δ(Q) is proportional to the volume of Q: in k,j and fact, δ(Q)n = nn/2 |Q|. Then A ⊂ k,j Q

293

Outer Measure and Measure

Hn(ε) (A) ≤

k,j )n = nn/2 δ(Q

k,j

n by Theorem 11.13. If Hn (A) = +∞, write A = (A ∩ Qj ), where the Qj are disjoint (partly open) cubes. Since |A ∩ Qj |e is finite, so is Hn (A ∩ Qj ). Hence, Hα (A ∩ Qj ) = 0 for α > n. Therefore, Hα (A) ≤ Hα (A ∩ Qj ) = 0. It is natural to ask if Hα (A) is comparable to the expression inf δ(Ak )α , where the inf is taken over all coverings {Ak } of A, without any requirement on the size of the diameters. It is not difficult to see that the answer in general is no. In fact, this expression is finite for any bounded A, as is easily seen from covering A by itself. On the other hand, it is clear from Theorems 11.13 and 11.16 that if |A|e > 0, then Hα (A) = +∞ for α < n. However, in case α = n, Hα (A) is comparable to the previous expression. To see this, it is enough by Lemma 11.15 to prove that Hn (A) is comparable to it. But since Hn (A) = inf δ(Qk )n , where {Qk } is any collection of cubes covering A whose edges are parallel to the axes, this follows from the fact that inf δ(Qk )n and inf δ(Ak )n are comparable (cf. the proof of Lemma 11.15). We remark in passing that Hausdorff measure is particularly useful in measuring sets with Lebesgue measure zero since these may have positive Hausdorff measure for some α < n. For example, it can be shown that the Cantor set in [0,1] has Hausdorff measure of dimension log 2/log 3 equal to 1; see Exercise 19.

11.5 The Carathéodory–Hahn Extension Theorem In this section, we will settle the question that arose in the discussion preceding Theorem 11.10. We recall the situation. Let μ be a finite Borel measure

294

Measure and Integral: An Introduction to Real Analysis

on R1 , and let be the Lebesgue–Stieltjes measure induced by the function f (x) = μ((−∞, x]). Since f is continuous from the right, we have by Theorem 11.10 that ((a, b]) = μ((a, b]) [= f (b) − f (a)]. The point in question is whether this implies that μ and agree on every Borel set in R1 . More generally, we may ask if two Borel measures can be finite and equal on every (a, b] without being identical. It is worthwhile to consider this question in a still more general context, which we now present. Let S be a fixed set. By an algebra A of subsets of S , we mean a nonempty collection of subsets of S that is closed under the operations of taking complements and finite unions; that is, A is an algebra if it satisfies the following: (i) If A ∈ A , then CA (= S − A) ∈ A . (ii) If A1 , . . . , AN ∈ A , then N k=1 Ak ∈ A . What distinguishes an algebra from a σ-algebra is that an algebra is only closed under finite unions. It follows from the definition that an algebra is also closed under finite intersections and differences (relative complements) and that both the empty set ∅ and the whole space S belong to it. The collection of finite intervals (a, b] on the line is clearly not an algebra. However, we can generate an algebra from it by adjoining ∅, R1 , and all intervals of the form (−∞, a] and (b, +∞), as well as all possible finite disjoint unions of these and the intervals (a, b]. This algebra will be called the algebra generated by the intervals (a, b]. By a measure λ on an algebra A , we mean a function λ which is defined on the elements of A and which satisfies (i) λ(A) ≥ 0 and λ(∅) = 0, ∞ (ii) λ( ∞ k=1 Ak ) = k=1 λ(Ak ) whenever {Ak } is a countable collection of disjoint sets in A whose union also belongs to A . It follows easily that such a λ is monotone: if A1 ⊂ A2 and A1 , A2 ∈ A , then λ(A1 ) ≤ λ(A2 ). A measure λ on A is called σ-finite (with respect to A ) if S can be written S = Sk with Sk ∈ A and λ(Sk ) < +∞. For example, any Lebesgue–Stieltjes measure on the line is a σ-finite measure on the algebra generated by the intervals (a, b]. Using the ideas behind the construction of a Lebesgue–Stieltjes outer measure, we can construct an outer measure λ∗ from λ. Thus, let λ be a measure

295

Outer Measure and Measure

on an algebra A of subsets of S . For any subset A of S , define λ∗ (A) = inf

λ(Ak ),

(11.17)

where the infimum is taken over all countable collections {Ak } such that Ak ∈ A and A ⊂ Ak . It is always possible to find such a covering of A since S itself belongs to A . The facts that A is an algebra and λ is monotone allow us to assume without loss of generality that the sets Ak covering A are disjoint since k≥1 Ak = A1 ∪ (A2 − A1 ) ∪ (A3 − A2 − A1 ) ∪ · · · . Theorem 11.18 Let λ be a measure on an algebra A , and let λ∗ be defined by (11.17). Then λ∗ is an outer measure. The proof is similar to the first part of the proof of Theorem 11.8 and is left as an exercise. While λ is assumed to be defined only on A , λ∗ is defined on every subset of S . The next result shows that λ∗ equals λ when restricted to A . Theorem 11.19 Let λ be a measure on an algebra A , and let λ∗ be the corresponding outer measure. If A ∈ A , then λ∗ (A) = λ(A) and A is measurable with respect to λ∗ . Proof. Let A ∈ A . Clearly, λ∗ (A) ≤ λ(A). On the other hand, given disjoint Ak ∈ A with A ⊂ ∪Ak , let Ak = Ak∩ A. Then Ak ∈ A and A is the dis joint union of the Ak . Hence, λ(A) = ∗ λ(Ak ). Since Ak ⊂ Ak , it follows that λ(A) ≤ λ(Ak ). Therefore, λ(A) ≤ λ (A), and the proof of the first part of the theorem is complete. For the second part, let A ∈ A . To show that A is measurable with respect to λ∗ , we must show that λ∗ (E) = λ∗ (E ∩ A) + λ∗ (E − A) for every E ⊂ S . Since λ∗ is subadditive, the right side majorizes the left. To show the opposite inequality, we may that λ∗ (E) is finite. Given ε > 0, choose {Ek } such assume that Ek ∈ A , E ⊂ Ek and λ(Ek ) < λ∗ (E) + ε. Since Ek and A are in A and (Ek ∩ A) ∪ (Ek − A) = Ek , we have λ(Ek ∩ A) + λ(Ek − A) = λ(Ek ). Hence,

λ(Ek ∩ A) +

λ(Ek − A) < λ∗ (E) + ε.

296

Measure and Integral: An Introduction to Real Analysis

Therefore, since E ∩ A ⊂ the definition of λ∗ that

(Ek ∩ A) and E − A ⊂

(Ek − A), it follows from

λ∗ (E ∩ A) + λ∗ (E − A) < λ∗ (E) + ε. Letting ε → 0, we obtain the desired inequality, which completes the proof. Let λ be a measure on an algebra A , and let μ be a measure on a σ-algebra that contains A . Then μ is said to be an extension of λ to if μ(A) = λ(A) for every A ∈ A . If λ∗ is the outer measure generated by λ and A ∗ denotes the σ-algebra of λ∗ -measurable sets, it follows from the last theorem that λ∗ is an extension of λ to A ∗ . This proves the first part of the following theorem, which is the main result of this section.

Theorem 11.20 (Carathéodory–Hahn Extension Theorem) Let λ be a measure on an algebra A , let λ∗ be the corresponding outer measure, and let A ∗ be the σ-algebra of λ∗ -measurable sets. (i) The restriction of λ∗ to A ∗ is an extension of λ. (ii) If λ is σ-finite with respect to A , and if is any σ-algebra with A ⊂ ⊂ A ∗ , then λ∗ is the only measure on that is an extension of λ. Proof. As we have already observed, (i) follows from Theorem 11.19. To prove (ii), which states the uniqueness of the extension, let μ be any measure on , A ⊂ ⊂ A ∗ , which agrees with λ on A . Given a set E ∈ , consider any countable collection {Ak } such that E ⊂ Ak and each Ak ∈ A . Then μ(Ak ) = λ(Ak ). μ(E) ≤ μ Ak ≤ Therefore, by the definition of λ∗ , we have μ(E) ≤ λ∗ (E). To show that equality holds, first suppose that there exists a set A ∈ A with E ⊂ A and λ(A) < +∞. Applying what has just been proved to A − E (which belongs to ), we obtain μ(A − E) ≤ λ∗ (A − E). However, μ(E) + μ(A − E) = μ(A) = λ(A) = λ∗ (A) = λ∗ (E) + λ∗ (A − E). Since all these terms are finite (due to the fact that λ(A) is finite), it follows that μ(E) ≥ λ∗ (E), so that μ(E) = λ∗ (E) in this case. In the general case, since λ is σ-finite, there exists disjoint Sk ∈ A such that S = Sk and λ(Sk ) < +∞. We may apply the previous result to each E ∩ Sk (which is a subset of Sk ), obtaining μ(E ∩ Sk ) = λ∗ (E ∩ Sk ). By adding over k, we obtain μ(E) = λ∗ (E), which completes the proof.

297

Outer Measure and Measure

As a corollary, we can answer the questions raised at the beginning of this section. Corollary 11.21 Let μ and ν be two Borel measures on R1 which are finite and equal on every half-open interval (a, b], −∞ < a < b < +∞. Then μ(B) = ν(B) for every Borel set B ⊂ R1 . Proof. Such μ and ν must agree on the algebra generated by the (a, b] and are σ-finite with respect to this algebra. Since the smallest σ-algebra containing all (a, b] is the Borel σ-algebra, it follows from Theorem 11.20 that μ and ν are identical. Although we have confined ourselves to n = 1, we note that an analogue of Corollary 11.21 can be formulated in higher dimensions. See Exercise 18. Now let μ be any finite Borel measure on R1 , and define fμ by fμ (x) = μ((−∞, x]). Clearly, μ((a, b]) = fμ (b) − fμ (a). Moreover, if fμ denotes the Lebesgue–Stieltjes measure constructed from fμ , then since fμ is continuous from the right, it follows from Theorem 11.10 that fμ ((a, b]) = fμ (b) − fμ (a). Therefore, by the previous corollary, μ and fμ are identical as Borel measures, and we easily obtain Corollary 11.22 The class of finite Borel measures on R1 is identical with the class of Lebesgue–Stieltjes measures induced by bounded increasing functions that are continuous from the right. See also Exercise 4. Let μ be a Borel measure on R1 which is finite on every (a, b], −∞ < a < b < +∞ (equivalently, μ is finite on every bounded Borel set). Consider the restriction of μ to the algebra A generated by the (a, b], and let μ∗ be the corresponding outer measure. The smallest σ-algebra containing A is the Borel sets B. Thus, A ⊂ B ⊂ A ∗ . Since μ and μ∗ are measures on B that agree on A , it follows that μ = μ∗ on B. In particular, if B ∈ B, we see from the definition of μ∗ (see (11.17)) that μ(B) = inf

μ(Ak ) : B ⊂

Ak , Ak ∈ A .

Each Ak ∈ A is a countable union of disjoint (a, b]. Hence, we obtain the formula μ(B) = inf

μ(ak , bk ] : B ⊂

(ak , bk ] ,

B ∈ B.

(11.23)

298

Measure and Integral: An Introduction to Real Analysis

We recall (see p. 269 in Section 10.5) that a Borel measure μ is said to be regular if μ(B) = inf{μ(G) : B ⊂ G, G open},

B ∈ B.

Theorem 11.24 If μ is a Borel measure on R1 which is finite on every bounded Borel set, then μ is regular. Proof. This is a corollary of (11.23). Given a Borel set B and ε > 0, find a cover {(ak , bk ]} of B such that

μ(ak , bk ] ≤ μ(B) + ε.

Since μ is finite on bounded intervals, it follows from Theorem 10.11 that μ(a, b] = limε→0+ μ(a, b + ε). Hence, by slightly enlarging each (ak , bk ], we see that there is an open set G, G = (ak , bk + εk ) for sufficiently small εk , containing B such that μ(G) ≤

μ(ak , bk + εk ) ≤ μ(B) + 2ε.

This completes the proof. In Theorem 11.24, as in Corollary 11.21, we have limited ourselves to n = 1. For n > 1, see Exercise 18. In particular, note that the assumption in Section 10.5 on p. 269 concerning the regularity of μ and ν is redundant.

Exercises 1. (a) Prove the second statements in both parts of Corollary 11.4. (b) Verify the statements made before Theorem 11.5 about the function r(A) defined on sets A ⊂ R2 . (One way to see that a set B with d(Y, B) > 0 is not r-measurable is to denote the mirror reflection of B in the y-axis by B∗ and check that the equation r(B ∪ B∗ ) = r(B) + r(B∗ ) is false.) 2. Let μ be a finite Borel measure on R1 , and define fμ (x) = μ((−∞, x]), −∞ < x < +∞. Show that fμ is monotone increasing, μ((a, b]) = fμ (b) − fμ (a), fμ is continuous from the right, and limx→−∞ fμ (x) = 0. 3. Let f be monotone increasing on R1 . (a) Show that f (R1 ) is finite if and only if f is bounded.

Outer Measure and Measure

299

(b) Let f be bounded and right continuous, let μ = f , and let f¯ denote the function fμ defined in Exercise 2. Show that f and f¯ differ by a constant. Thus, if we make the additional assumption that limx→−∞ f (x) = 0, then f = f¯ . 4. If we identify two functions on R1 which differ by a constant, prove that there is a one-to-one correspondence between the class of finite Borel measures on R1 and the class of bounded increasing functions that are continuous from the right. 5. Let f be monotone increasing and right continuous on R1 . (a) Show that f is absolutely continuous with respect to Lebesgue measure if and only if f is absolutely continuous on R1 . (By absolutely continuous on R1 , we mean absolutely continuous on every compact interval.) (b) If f is absolutely continuous with respect to Lebesgue measure, show that its Radon–Nikodym derivative equals df /dx. 6. Prove that the Lebesgue–Stieltjes outer measure constructed from f (x) = x is the same as Lebesgue outer measure. 7. If f is monotone increasing and continuous from the right on R1 , show ◦∗ ∗ that ∗f (A) = ◦∗ f (A), where f is defined in the same way as f except that we use open intervals (ak , bk ). 8. If f is monotone increasing and continuous from the right, derive formulas for f ([a, b]) and f ((a, b)). 9. Complete the proof of Theorem 11.12. 10. Show that in Rn , n > 1, the Hausdorff outer measure Hn is not identical to Lebesgue outer measure. (For example, let n = 2, and write A = Ak , ) < ε. Enclose Ak in a circle Ck with the same diameter, and show δ(Ak that δ(Ak )2 ≥ (4/π)|A|e . Thus, H2ε (A) ≥ (4/π)|A|e .) 11. If A is a subset of Rn , define the Hausdorff dimension of A as follows: If Hα (A) = 0 for all α > 0, let dim A = 0; otherwise, let dim A = sup{α : Hα (A) = +∞}. (a) Show that Hα (A) = 0 if α > dim A and that Hα (A) = +∞ if α < dim A. Show that in Rn we have dim A ≤ n. See Exercise 19 in order to determine the Hausdorff dimension of the Cantor set. (b) If dim Ak = d for each Ak in a countable collection {Ak }, show that dim( Ak ) = d. Hence, show that every countable set has Hausdorff dimension 0. 12. Let be an outer measure on S , and let denote restricted to the -measurable sets. Since is a measure on an algebra, it induces an outer measure ∗ . Show that ∗ (A) ≥ (A) for A ⊂ S and that equality holds

300

Measure and Integral: An Introduction to Real Analysis

for a given A if and only if there is a -measurable set E such that A ⊂ E and (E) = (A). Thus, = ∗ if is regular. 13. Let λ be a measure on an algebra A , and let λ∗ be the corresponding outer measure. Given A, show that there is a set H of the form k j Ak,j such that Ak,j ∈ A , A ⊂ H and λ∗ (A) = λ∗ (H). Thus, every outer measure that is induced by a measure on an algebra is regular. 14. Prove Theorem 11.18. 15. (a) Show that the intersection of a family of algebras is an algebra. (b) A collection C of subsets of S is called a subalgebra if it is closed under finite intersections and if the complement of any set in C is the union of a finite number of disjoint sets in C . Give an example of a subalgebra. Show that a subalgebra C generates an algebra by adding ∅, S , and all finite disjoint unions of sets of C . 16. If μ is a finite Borel measure on R1 , show that μ(B) = sup μ(F) for every Borel set B, where the sup is taken over all closed subsets F of B. 17. Show that the conclusions of Theorems 10.48 and 10.49, and therefore also the conclusion of Corollary 10.50, remain true without the assumption (ii) stated before Lemma 10.47. (Show that without this assumption, the conclusions of Lemma 10.47 are true with μ replaced by μ∗ ; for example, ν(Rn ) ν(Qx (h)) >α ≤c . μ∗ x ∈ E : sup α h>0 μ(Qx (h)) 18. Derive analogues of Corollary 11.21 and Theorem 11.24 in Rn , n > 1. (Use partly open n-dimensional intervals in place of the intervals (a, b].) 19. Show that the Cantor set C in [0, 1] has the Hausdorff measure of order log 2/log 3 equal to 1. (In order to show the inequality Hlog 2/ log 3 (C) ≤ 1, (ε) consider Hlog 2/ log 3 (C) when ε = 3−k , k = 1, 2, . . ., and show that the

2k intervals {Ij } of length 3−k remaining at the kth stage Ck of construc α tion of C satisfy |Ij | = 2k 3−kα = 1 if α = log 2/ log 3. A proof of the opposite inequality is harder. It may be helpful to note that if I is a closed interval, containing at its two endpoints intervals from Ck , then |I|α ≥ nk (I)/2k , where nk (I) is the number of intervals of Ck contained in I and α = log 2/ log 3.)

12 A Few Facts from Harmonic Analysis

12.1 Trigonometric Fourier Series The Lebesgue measure and integration have been decisive in the development of many branches of analysis and are applied there in ever greater degree. But, conversely, some of the applications have had considerable impact on the theory of integration. In this chapter, we will consider one topic where this interdependence has been particularly fruitful: harmonic analysis (see p. 305 in Section 12.1). One of the principal goals of harmonic analysis, which is a vast field, is to represent very general functions f in terms of a collection of simpler oscillatory ones. The fact that typical representations involve integration of f accounts for the interrelation of the two fields. Oscillatory behavior of the simpler functions has advantages: it can help make them independent of one another, and it can be exploited in order to find particular linear combinations of them that approximate general functions. We begin by describing some elementary notions and facts; the concept of an orthogonal system, and in particular of the trigonometric system, is basic here. The notion of an orthogonal system, defined generally on a subset E of positive measure in Rn , was introduced in Chapter 8, and we refer the reader to that place for the definitions and properties of general orthogonal systems, restating only a few facts here. A system of complex-valued functions {φα (x)}, all belonging to L2 (E), is called orthogonal over E if

φα , φβ =

φα φβ E

= 0 α = β > 0 α = β.

The second condition means that φα ≡ 0. If φα , φα = 1 for all α, the orthogonal system is called normal, or orthonormal. If {φα } is orthogonal, the system {φα /||φα ||2 } is orthonormal. Thus, by merely multiplying the functions of an orthogonal system by suitable constants, we can normalize it, and formulas 301

302

Measure and Integral: An Introduction to Real Analysis

for orthonormal systems can be easily and automatically extended to general, not necessarily normal, orthogonal systems. On the other hand, certain important orthogonal systems very naturally appear, often for historical reasons, in a nonnormalized form, and because of this, it may be desirable not to insist on the normality of the system under consideration. Let us, therefore, briefly restate the definitions in this somewhat more general setup. Since orthogonal systems are countable (see Theorem 8.21), we may index them by integers. Let φ1 (x), φ2 (x), . . . be an orthogonal system on E ⊂ Rn . Thus, 0 k = l. φk φl = λk > 0 k = l.

E

Given any (complex-valued) f ∈ L2 (E), we call the numbers ck =

1 f φk λk E

the Fourier coefficients of f and the series S[f ] = f , with respect to {φk }. As before, we write f ∼

ck φk (x) the Fourier series of

ck φk (x).

If we set −1/2

ψk = λ k

φk ,

1/2

dk = λk ck =

f ψk ,

(12.1)

E

then {ψk } is orthonormal over E, and {dk } is the sequence of Fourier coefficients of f with respect to {ψk }. Clearly, dk ψk = ck φk .

(12.2)

This set of formulas enables us to rewrite relations for orthonormal systems forms valid for general orthogonal systems. Thus, Bessel’s inequality in |dk |2 ≤ E |f |2 and Parseval’s formula |dk |2 = E | f |2 take the forms

λk |ck |2 ≤

E

| f |2 ,

λk |ck |2 =

| f |2 .

(12.3)

E

The notion of completeness of an orthogonal system (“the vanishing of all the Fourier coefficients implies the vanishing of the function”) remains unchanged in the general case, and as in the case of normalized systems, the

303

A Few Facts from Harmonic Analysis

validity of the second formula in (12.3) is a necessary and sufficient condition for the completeness of {φk }. Let sn denote the nth partial sum of S[f ]. As a corollary of the corresponding result for orthonormal systems, we see that the equation λk |ck |2 = E | f |2 is equivalent to f − sn 2 → 0. E

Thus, if an orthogonal system is complete, the Fourier series of every f ∈ L2 (E) 2 converges to f , convergence being understood in the metric L . Of course, this says nothing about the pointwise convergence of ck φk (x). On the other hand, it holds for any rearrangement of the terms of ck φk since the orthogonality and completeness of a system are not affected by a permutation of the functions within the system. We shall now consider a special orthogonal system, the trigonometric system. This name is given to the system of functions eikx = cos kx + i sin kx, x ∈ (−∞, ∞)

(k = 0, ±1, ±2, . . .).

These functions are all periodic, with period 2π, and it is immediate that they form an orthogonal system over any interval Q = (a, a + 2π) of length 2π, since if k and m are distinct integers, then

eikx eimx dx

Q

=

e

i(k−m)x

dx =

Q

ei(k−m)x k−m

a+2π = 0. a

The system is not orthonormal since λk = Q |eikx |2 dx = 2π for all k. Thus, with any f ∈ L(Q), we may associate its Fourier coefficients ck =

1 1 f (t) eikt dt = f (t)e−ikt dt 2π 2π Q

(k = 0, ±1, ±2, . . .),

(12.4)

Q

and its Fourier series f ∼

+∞

ck eikx .

(12.5)

−∞

In what follows, this series will be designated by S[f ], and its coefficients by ck [f ].

304

Measure and Integral: An Introduction to Real Analysis

Observe that2 if two functions φ and ψ are orthogonal over a set E and if 2 = |φ| E E |ψ| , then the pair φ ± ψ is also orthogonal over E, as seen from the equation

(φ + ψ)(φ − ψ) =

E

|φ|2 −

E

|ψ|2 = 0.

E

Applying this to the pairs e±ikx (k = 1, 2, . . .), we see that the functions eikx + e−ikx eikx − e−ikx 1 ,..., , , . . . (k = 1, 2, . . .) 2 2 2i

(12.6)

or, what is the same thing, the functions 1 , cos x, sin x, . . . , cos kx, sin kx, . . . 2

(12.7)

are orthogonal over any interval Q of length 2π. Using the form (12.6), we find that the numbers λk for (12.7) are 1 π, π, π, . . . . 2 Thus, any f ∈ L(Q) can be developed into a new Fourier series ∞

f ∼

1 a0 + (ak cos kx + bk sin kx), 2

(12.8)

k=1

where a0 =

1 1 −1 1 π f = f, 2 2 π Q

1 f (t) cos kt dt, ak = π Q

Q

1 bk = f (t) sin kt dt. π

(12.9)

Q

The numbers ak and bk are easily expressible in terms of the coefficients ck of (12.4): 1 a0 = c0 , 2

ak = ck + c−k ,

bk = i(ck − c−k ).

(12.10)

305

A Few Facts from Harmonic Analysis

Hence, n

ck eikx = c0 +

k=−n

n

ck eikx +

k=1

=

1 a0 + 2

n

c−k e−ikx

k=1

n

(ak cos kx + bk sin kx),

k=1

and the nth partial sum of the series in (12.8) turns out to be the nth symmetric partial sum of ck eikx . The numbers ak = ak [f ] and bk = bk [f ] are called the Fourier cosine and sine coefficients of f , respectively. To sum up, we may consider the trigonometric system in two forms. One consists of functions eikx (k = 0, ±1, ±2, . . .), and the Fourier series has the +∞ the form −∞ ck eikx , where the ck are given by (12.4). The other consists of the functions (12.7), the Fourier series is (12.8), and the coefficients ak and bk are given by (12.9). The partial sums of (12.8) are the symmetric partial sums of (12.5). In both cases, the terms of the Fourier series are harmonic oscillations, and for this reason, the study of S[f ] is called the harmonic analysis of f . Each form of the trigonometric system has its advantages. For example, if f is real-valued, then the numbers ak and bk are real, while the ck have the property c−k = ck . Note also that if Q = (−π, π) and f is an even function, that is, if f (−x) = f (x), then 2 f (t) cos kt dt, π π

ak =

bk = 0,

0

and if f is an odd function, that is, if f (−x) = −f (x), then 2 f (t) sin kt dt. π π

ak = 0,

bk =

0

∞ 1 Thus, if f is even, (12.8) reduces ∞ to the cosine series 2 a0 + k=1 ak cos kx, and if f is odd, to the sine series k=1 bk sin kx. Since the terms of a (trigonometric) Fourier series are periodic with period 2π, if we expect to represent a function f by its Fourier series, we may assume from the start that f is defined everywhere (or almost everywhere [a.e.]) on the real axis and is periodic with period 2π. This amounts to considering the function as defined on the circumference of the unit circle. We do not distinguish between points that are congruent mod 2π. By an integrable function, we shall mean a function integrable over a period. Similarly, the Lp norm of a function will mean its Lp norm over a period and the familiar definitions

306

Measure and Integral: An Introduction to Real Analysis

of other classes of function such as functions of bounded variation and Lipschitz continuous functions will also be restricted to a period. In what follows, periodic will mean periodic of period 2π. In the preceding chapters, we proved a number of theorems about functions in Lp (Rn ) and, in particular, in Lp (R1 ). Usually, these results have analogues for periodic functions, where integrals over R1 are replaced by integrals over a period. The proofs are usually in essence identical with those for R1 (or are merely corollaries of the results for R1 ) and may be left as exercises. We would like to stress one point. The definition of a general orthogonal system presupposes that the functions in the system are of class L2 . This makes it possible to define Fourier coefficients for any f ∈ L2 . If f is not in L2 , it may be impossible to define its Fourier coefficients with respect to certain orthogonal systems. The situation is different

for special orthogonal systems. For example, in the case of the functions eikx , which are bounded, the coefficients ck are defined for any f that is merely integrable over Q and, in particular, for any f ∈ Lp (Q), 1 ≤ p ≤ ∞. Thus, the trigonometric system is richer in properties than general orthogonal systems. Some simple developments are important for the general theory of Fourier series. We consider two here and refer the reader to Exercise 5 for others. Example (a). Let f be periodic and equal to 12 (π − x) for 0 < x < 2π, with f (0) = f (2π) = 0. Since f is odd, its Fourier series is a sine series, and integration by parts gives ⎞ ⎛ π π 1 ⎝π 1 2 1 1 (π − x) sin kx dx = − bk = cos kx dx⎠ = . π 2 π k k k 0

0

Thus,

f ∼

∞ sin kx

k

k=1

+∞

=

1 eikx , 2 −∞ ik

where denotes k=0 . Example (b). Let f be periodic and equal in (−π, π) to the characteristic function of the interval (−h, h), 0 < h < π. Then f is even, and if k = 0, its cosine coefficient is 2 2 sin kh . cos kx dx = π π k h

ak =

0

307

A Few Facts from Harmonic Analysis

Since a0 = 2h/π, we obtain ∞ 2h 1 sin kh + cos kx f ∼ π 2 kh k=1 +∞ h sin kh ikx = e . 1+ π kh −∞ Series of the form +∞

∞

1 a0 + (ak cos kx + bk sin kx) , 2

ck eikx ,

−∞

k=1

whether they are Fourier not, are called trigonometric series. In definseries or ikx , we usually consider the limit, ordinary or c e ing the convergence of +∞ −∞ k generalized, of the symmetric partial sums +n −n , and once again n

1 a0 + (ak cos kx + bk sin kx) , 2 n

ck eikx =

−n

k=1

where 1 a0 = c0 , 2

ak = ck + c−k ,

bk = i ck − c−k .

A finite sum T = n−n ck eikx is called a trigonometric polynomial of order n, and if |c−n | + |cn | = 0, T is strictly of order n. If T is of order n and vanishes at more than 2n distinct points (i.e., distinct mod 2π), then it vanishes identically, that is, all the ck are 0. In fact, Teinx = n−n ck ei(k+n)x is an algebraic (power) polynomial in z = eix of degree ≤ 2n, and if it has more than 2n zeros, then it vanishes identically. If the numbers ak and bk are real, the trigonometric series ∞

S=

1 a0 + (ak cos kx + bk sin kx) 2 k=1

is the real part of the power series ∞

1 a0 + (ak − ibk ) zk 2 k=1

308

Measure and Integral: An Introduction to Real Analysis

on the unit circle z = eix . The imaginary part is then the series ∞

(ak sin kx − bk cos kx)

(12.11)

k=1

(with vanishing constant term). If S is written in the complex form it is easy to see that (12.11) is +∞

(−i sign k)ck eikx

+∞

−∞ ck e

ikx ,

(12.12)

−∞

(where, by convention, sign 0 = 0). The series (12.11), or (12.12), is said to be conjugate to S. A series conjugate to a trigonometric series S is denoted by S. If S has constant term 0, then ≈

S = −S.

It is natural to study the properties of S[f ] simultaneously with those of S[f ]. One more remark. Properties of functions in Lp (Rn ), and in particular in Lp (R1 ), are important for the theory of Fourier integrals, which for nonperiodic functions play the same role as Fourier series in the periodic case. The two theories run largely parallel. In this chapter, we shall limit ourselves to Fourier series since our primary aim is to show the role that Lebesgue integration plays in some problems of representability of functions, and both the results and techniques of Fourier series are sufficiently indicative of the situation. In Chapter 13, we will study the main aspects of Fourier integrals in Rn , n ≥ 1.

12.2 Theorems about Fourier Coefficients Theorem 12.13 If a periodic f is the indefinite integral of its derivative f (i.e., if ikx f is periodic and absolutely continuous), and if f ∼ ck e , then f ∼

+∞

ck (ik)eikx .

−∞

In symbols, S[f ] = S [f ], where S [f ] denotes the result of the termwise differentiation of S[f ].

309

A Few Facts from Harmonic Analysis

Proof. It is clear that f is also periodic and that its constant term equals −1

2π

(2π)

f (x) dx = (2π)−1 [f (2π) − f (0)] = 0.

0

If k = 0, integrating by parts and observing that the integrated term is zero, we have (2π)−1

2π

f (x)e−ikx dx = (2π)−1 ik

0

2π

f (x)e−ikx dx = ikck ,

0

which proves the theorem. By repeated application of this result, we see that if a periodic f is the mth indefinite integral of an integrable function f (m) , then S f (m) = S(m) [f ] = (ik)m ck eikx .

Theorem 12.14 If f is periodic, f ∼ f , then F(x) − c0 x is periodic and

ck eikx , and if F is the indefinite integral of

F(x) − c0 x ∼ C0 +

ck eikx , ik

constant (depending where C0 is a suitable on the choice of the arbitrary constant of integration in F) and again denotes k=0 . Proof. Let G(x) = F(x) − c0 x. The periodicity of G follows from the equation G(x + 2π) − G(x) =

x+2π

f dt − c0 (2π) = 2πc0 − 2πc0 = 0.

x

Since G is also absolutely continuous, Theorem 12.13 gives S [G] = S[G ] = S[f − c0 ] =

ck eikx .

k=0

Hence, S[G] is obtained by termwise integration of the result, C0 being the constant term of S[F − c0 x].

ck eikx , which leads to

310

Measure and Integral: An Introduction to Real Analysis

For the trigonometric system, we have Bessel’s inequality (see (12.3)) +∞

|ck |2 ≤

−∞

2π 1 | f (x)|2 dx, 2π 0

but actually, as we shall see, we also have Parseval’s formula +∞

|ck |2 =

−∞

2π 1 | f (x)|2 dx 2π

(12.15)

0

for every f ∈ L2 . We know by Theorem 8.31 that this is a corollary of the next result.

Theorem 12.16 The trigonometric system is complete. More precisely, if all the Fourier coefficients of an integrable f are zero, then f = 0 a.e. Proof. Assume first that f is continuous and real-valued, with all ck = 0. If f ≡ 0, then | f | attains a nonzero maximum M at some point x0 . Suppose, for example, that f (x0 ) = M > 0. Let δ > 0 be so small that f (x) > 12 M in the interval I = (x0 − δ, x0 + δ). Consider the trigonometric polynomial t(x) = 1 + cos(x − x0 ) − cos δ. It is strictly greater than 1 inside I and does not exceed 1 in absolute value elsewhere. The hypothesis that all the Fourier coefficients of f are 0 implies π that −π f T dx = 0 for any trigonometric polynomial T, and in particular, π

f tN dx = 0

(N = 1, 2, . . .).

−π

We claim that this is impossible for N large enough. The absolute value of the part of the last integral extended over the complement of I is ≤ 2π · M · 1N = 2πM. If I is the middle half of I, then t(x) ≥ θ > 1 in I , so that I

f tN dx ≥

I

f tN dx ≥

1 M · θN |I | → +∞. 2

π Collecting results, we see that −π f tN dx → +∞; this contradiction shows that f ≡ 0. 2π If f is continuous but not real-valued, the hypothesis 0 f (x)e−ikx dx = 0 2π for all k implies that 0 f (x) e−ikx dx = 0 for all k. By adding and subtracting

A Few Facts from Harmonic Analysis

311

the last two equations, we see that both the real and imaginary parts of f have all their Fourier coefficients equal to 0 and so vanish identically. Finally, if f is merely integrable, the hypothesis c0 = 0 implies that the x function F(x) = 0 f dt is periodic, and by Theorem 12.14, for a suitable C0 , the Fourier coefficients of the continuous function F − C0 are all 0. Hence, F − C0 ≡ 0, F is a constant, and f = F = 0 a.e. This completes the proof of the theorem. Another proof is given on p. 340 in Section 12.6. An immediate corollary is the following result.

Theorem 12.17

Parseval’s formula (12.15) holds for any f ∈ L2 .

Parseval’s formula can be written in more general which are, forms, however, corollaries of (12.15). Thus, besides f ∼ ck eikx ∈ L2 , consider another function g ∼ dk eikx ∈ L2 . Then, by an argument like that in Section 8.6 for (8.32), 2π +∞ 1 f g dx = ck dk . 2π −∞

(12.18)

0

This reduces to (12.15) in the special case f = g. The completeness of the trigonometric system also gives the next two theorems.

Theorem 12.19 If the Fourier series of a continuous f converges uniformly, then the sum of the series is f . Proof. Let g be the sum of the uniformly convergent series S[f ]. The Fourier coefficients of g can be obtained by multiplying S[f ] by e−ikx and integrating the result termwise. Thus, ck [g] = ck [f ] for all k, so that f ≡ g. Theorem 12.20 If a periodic f is the integral of a function in L2 , then S[f ] converges absolutely and uniformly. In particular, the Fourier series of a continuously differentiable function converges uniformly to the function. ikx , g∼ ck e (c0 = 0). Then S[f ] = C0 + Proof. Let f be the integral of g ∈ L2 ikx C |ck |2 < +∞ by Bessel’s inequality, so k e , Ck = ck /ik, k = 0. We have that |Ck | < +∞ by Schwarz’s inequality. This completes the proof. The theorem that follows is of basic importance.

312

Measure and Integral: An Introduction to Real Analysis

Theorem 12.21 (Riemann–Lebesgue) The Fourier coefficients ck of any integrable f tend to 0 as k → ±∞. Hence, also ak , bk → 0 as k → +∞. Proof. First, we note the obvious but important inequality

|ck [f ]| ≤

2π 1 | f | dx. 2π 0

We will give two proofs of the theorem. (a) (See also Exercise 15 of Chapter 8.) If f ∈ L2 , then ck → 0 as a corollary of Bessel’s inequality (p. 310, Section 12.2). If f ∈ L and 2π ε > 0, write f = g + h, where g ∈ L2 and 0 |h| < ε. (This decomposition can be made in various ways: we may, for example, take M large enough and define h to be f wherever | f | ≥ M and 0 elsewhere; clearly, |g| ≤ M, and so g ∈ L2 .) Then ck [f ] = ck [g] + ck [h]. Since ck [g] → 0 and |ck [h]| ≤ (2π)−1 ck [f ] → 0 follows.

2π 0

|h| < ε/2π for all k, the relation

(b) Observe that

ck [f ] =

2π 1 1 f (x)e−ikx dx = 2π 2π 0

=−

2π−(π/k) −(π/k)

π −ik(x+(π/k)) e f x+ dx k

2π π −ikx 1 e f x+ dx. 2π k 0

Taking the semi-sum of the first and third integrals, we obtain

ck [f ] =

2π π −ikx 1 e f (x) − f x + dx, 4π k 0

|ck [f ]| ≤

1 4π

2π 0

π dx. f (x) − f x + k

However, we know that the last integral tends to 0 as k → ±∞ (cf. Theorem 8.19; the analogous result for periodic functions is left to the reader). This completes the proof.

313

A Few Facts from Harmonic Analysis

Given any finite periodic f , the expression sup |f (x + h) − f (x)|

(δ > 0)

x,h;|h|≤δ

is called the modulus of continuity of f and denoted by ω(δ) or ω(δ, f ) (cf. Exercise 17 of Chapter 1). If f is in Lp , 1 ≤ p < ∞, the expression ⎡

⎤1/p 2π 1 sup ⎣ |f (x + h) − f (x)|p dx⎦ |h|≤δ 2π 0

is called the Lp modulus of continuity of f and denoted ωp (δ, f ) or simply ωp (δ). Clearly, ωp (δ) ≤ ω(δ) and, as is easily seen from Hölder’s inequality, ωp (δ) ≤ ωq (δ) if p ≤ q. We know that if f ∈ Lp , then ωp (δ, f ) → 0 with δ (Theorem 8.19). The last inequality in proof (b) of Theorem 12.21 gives |ck [f ]| ≤

π 1 ω ,f , 2 |k|

|ck [f ]| ≤

1 ω1 2

π ,f . |k|

(12.22)

These two inequalities contain the Riemann–Lebesgue theorem in a sharp form since they quantitatively estimate the magnitude of the Fourier coefficients of f in terms of various moduli of continuity. The estimates (12.22) are also useful for families of functions. The following special case deserves a separate mention. A continuous periodic f is said to satisfy a Lipschitz condition of order α, 0 < α ≤ 1, if ω(δ, f ) = O(δα ) or, equivalently, if there is a finite constant M independent of x, h such that |f (x + h) − f (x)| ≤ M|h|α .

Theorem 12.23

If f satisfies a Lipschitz condition of order α, 0 < α < 1, then |ck [f ]| = O(|k|−α ).

If α = 1, the stronger estimate |ck [f ]| = o is valid.

1 |k|

314

Measure and Integral: An Introduction to Real Analysis

Proof. The first part follows from the first inequality (12.22). If α = 1, then f is absolutely continuous (see p. 150 in Section 7.5) and so equals the indefinite integral of its derivative f . Since f is bounded (and so is in L2 ), its Fourier coefficients tend to zero. Hence, the coefficients of f are o(1/|k|) by Theorem 12.13. Theorem 12.24 If a periodic f is of bounded variation over a period, then |ck [f ]| = O(1/|k|). More precisely,

|ck | ≤

V , 2π|k|

where V is the total variation of f over a closed interval of length 2π. Proof. Integrating by parts and taking account of the periodicity of f , we have

2πck [f ] =

π −π

e

−ikx

π 1 −ikx f (x) dx = e df (x), ik −π

where the last integral is a Riemann-Stieltjes integral. Hence,

2π|ck [f ]| ≤ |k|−1

π

|df (x)| = |k|−1 V.

−π

It must be stressed that V is the total variation over a closed interval of periodicity.

12.3 Convergence of S[f ] and S[f ] We shall now briefly discuss the problem of pointwise convergence of S[f ], treating side-by-side the parallel problem for S[f ]. Among many existing results, we will consider only the simplest. Without loss of generality, we may restrict our attention to real-valued f . We begin by computing the partial sums of S[f ] and S[f ]. If ak and bk denote the cosine and sine coefficients of f and n = 1, 2, . . ., then the nth partial sum of S[f ] is

315

A Few Facts from Harmonic Analysis

1 a0 + (ak cos kx + bk sin kx) 2 k=1 π π n 1 1 = f (t) dt + f (t) cos kt dt + sin kx cos kx · 2π −π π −π k=1 π 1 · f (t) sin kt dt π −π π π n 1 1 1 = + f (t) cos k(x − t) dt = f (t)Dn (x − t) dt, π −π 2 π −π n

sn (x) =

k=1

where 1 + cos kt. 2 n

Dn (t) =

k=1

In case n = 0, we denote s0 (x) = 12 a0 and D0 (t) = 12 . The trigonometric polynomial Dn is called the nth Dirichlet kernel. Similarly, if n = 1, 2, . . ., the nth partial sum of S[f ] is

sn (x) =

n k=1

=

n π 1 f (t) sin k(x − t) dt (ak sin kx − bk cos kx) = π −π k=1

π

1 n (x − t) dt, f (t)D π −π

where n (t) = D

n

sin kt

k=1

is the nth conjugate Dirichlet kernel. It will be convenient to define s0 (x) = 0 n are even and odd functions of t, 0 (t) = 0. Notice that Dn and D and D respectively, and that π 1 Dn (t) dt = 1, π −π

π 1 Dn (t) dt = 0. π −π

(12.25)

316

Measure and Integral: An Introduction to Real Analysis

n are, respectively, the real and imaginary parts of Moreover, Dn and D z = eit , n ≥ 1 ,

1 ikt 1 k 1 zn+1 − z + e = + z = + 2 2 2 z−1 n

n

k=1

k=1

this expression being interpreted as computation gives (even if n = 0)

Dn (t) =

sin n + 12 t 2 sin 12 t

,

1 2

+ n when t = 0. Then an elementary

n (t) = D

cos 12 t − cos n + 12 t 2 sin 12 t

,

(12.26)

n (0) = 0. with Dn (0) = n + 12 and D A quicker, though somewhat artificial, method of obtaining the first formula in (12.26) is to multiply Dn (t) termwise by 2 sin 12 t, replace the products 2 sin 12 t cos kt by sin k + 12 t − sin k − 12 t, and make use of cancellation of n (t) can be derived similarly. terms. The formula for D Given a function f and a fixed point x, let us consider the expressions φx (t) =

1 [f (x + t) + f (x − t)], 2

ψx (t) =

1 [f (x + t) − f (x − t)] 2

as functions of t. They are called the even and odd parts of f at the point x, respectively. Clearly, f (x + t) = φx (t) + ψx (t). It turns out that the behaviors of φx (t) and ψx (t) near t = 0 are decisive for the behaviors of S[f ] and S[f ], as the case may be, at the point x. Returning to the formula for sn (x) and making use of the even character of Dn (t), we can write π π 1 1 f (t)Dn (x − t) dt = f (t)Dn (t − x) dt sn (x) = π −π π −π

=

π π 1 2 1 [f (x + t) + f (x − t)]Dn (t) dt f (x + t)Dn (t) dt = π −π π 2 0

=

2 π

π 0

φx (t)Dn (t) dt.

317

A Few Facts from Harmonic Analysis

The first formula (12.25) immediately gives 2 [φx (t) − f (x)] Dn (t) dt π 0 1 π sin n + 2 t 2 = dt. [φx (t) − f (x)] 1 π 2 sin 2 t π

sn (x) − f (x) =

0

It will be convenient to modify this formula slightly by replacing n by n − 1 and taking the semi-sum of the two formulas. When n ≥ 1, writing #

sn (x) =

1 1 [sn (x) + sn−1 (x)] = sn (x) − (an cos nx + bn sin nx) , 2 2

(12.27)

we obtain, after observing that 1 2 cos nt dt, [φx (t) − f (x)] (an cos nx + bn sin nx) = 2 π 2 π

0

the formula 2 sin nt dt. [φx (t) − f (x)] π 2 tan 12 t π

#

sn (x) − f (x) =

0

The right side here is the nth Fourier sine coefficient of the odd function [φx (t) − f (x)]

1 1 cot t, 2 2

and if this function happens to be integrable near t = 0, the Riemann– # Lebesgue theorem immediately gives sn (x) − f (x) → 0. Hence, making use of the fact that an , bn → 0, we obtain from (12.27) the following basic result. Theorem 12.28 (Dini’s Test) Let f be periodic and integrable. If the integral π φx (t) − f (x) 1 cot 1 t dt 2 2 0

is finite, then S[f ] converges at the point x to the value f (x).

318

1 2

Measure and Integral: An Introduction to Real Analysis

Since only small values of t matter here, and since for small t we have cot 12 t t−1 , Dini’s condition can be restated in the form π φx (t) − f (x) t

0

dt < +∞,

or what is the same thing π |f (x + t) + f (x − t) − 2f (x)| t

0

dt < +∞.

(12.29)

The following special case is useful. Suppose that f has a jump discontinuity at x, so that the one-sided limits f (x+), f (x−) exist and are finite. Since changing f at a single point does not affect S[f ], we may assume that f (x) =

1 [f (x+) + f (x−)], 2

in which case we say that f has a regular discontinuity at x. Condition (12.29) is then certainly satisfied if both π | f (x + t) − f (x+)| 0

t

dt < +∞,

π | f (x − t) − f (x−)| t

0

dt < +∞.

Thus, a corollary of Dini’s test is that if both f (x+) and f (x−) exist and are finite, and if both of the last two integrals are finite, then S[f ] converges at the point x to the value f (x+) + f (x−) . 2 There is a result analogous to Theorem 12.28 for S[f ], and we will be n , we have brief here. Using the formula for sn and the odd character of D if n ≥ 1 that 1 1 π π cos t − cos n + 2 2 t 1 n (t) dt = − 2 ψx (t) dt, sn (x) = − f (x + t)D 1 π −π π 2 sin 2 t 0

319

A Few Facts from Harmonic Analysis

# sn (x) =

sn (x) + sn−1 (x) 2

π π cos n + 12 t 2 1 2 1 =− dt, ψx (t) cot t dt + ψx (t) π 2 2 π 2 sin 12 t 0

0

provided that π

|ψx (t)|

0

dt < +∞. t

(12.30)

Under this hypothesis, the last term in the preceding equation tends to zero by the Riemann–Lebesgue theorem, and we obtain (see Exercise 22) Theorem 12.31 Under the hypothesis (12.30), the series S[f ] converges at the point x to the sum −

2 1 1 f (x + t) − f (x − t) 1 dt. ψx (t) cot t dt = − π 2 2 π 2 tan 12 t π

π

0

0

We denote the last integral, which converges absolutely when (12.30) holds, by f (x) : 1 f (x) = − π

π f (x + t) − f (x − t) 0

2 tan 12 t

dt.

(12.32)

This function is called the conjugate function of f and is intimately connected with the behavior of S[f ]. We will study the existence and properties of f in detail later. Observe that condition (12.30) is of a nature completely different from (12.29); (12.30) precludes the possibility that f may have a jump at x. See Exercise 15. The proofs of Theorems 12.28 and 12.31 are based on the Riemann– Lebesgue theorem and give convergence results only at individual points. They cannot give uniform convergence in an interval without additional and rather strong assumptions. We consider one such assumption that, though very restrictive, leads to an important result. Theorem 12.33 If f = 0 in an interval (a, b), then S[f ] and S[f ] converge uniformly in every smaller interval (a + ε, b − ε). The sum of S[f ] is 0.

320

Measure and Integral: An Introduction to Real Analysis

Proof. The pointwise convergence in (a, b) is a corollary of Theorems 12.28 and 12.31, and it is only the question of uniformity that requires additional comment. We will consider only S[f ]; the argument for S[f ] is similar. Fix ε > 0. From (12.27), we deduce that # sn (x0 )

π 1 sn (x0 ) + sn−1 (x0 ) sin nt = dt = f (x0 + t) 2 π −π 2 tan 12 t

=

π 1 f (x0 + t) χ(t) sin nt dt, π −π

x0 ∈ (a + ε, b − ε),

where χ(t) is periodic, equals 12 cot 12 t for ε ≤ |t| ≤ π, and is arbitrary for |t| < ε. Suppose χ is defined so that it is continuous everywhere. Write f (x0 +t)χ(t) = gx0 (t), treating t as the variable and x0 as a parameter, and consider the modulus of continuity of gx0 (t) in the metric L1 . If we show that ω1 (δ, gx0 ) tends to 0 with δ uniformly for x0 ∈ (a + ε, b − ε), then the Fourier coefficients of gx0 (t) will also tend to 0 uniformly for such x0 (see the second formula (12.22)), and the theorem will follow. Now, for h > 0, 2π 2π gx (t + h) − gx (t) dt = f (x0 + t + h) χ(t + h) − f (x0 + t) χ(t) dt 0 0 0

0

≤

2π

f (x0 + t + h) − f (x0 + t) |χ(t + h)| dt

0

+

2π f (x0 + t) |χ(t + h) − χ(t)| dt. 0

The last integral clearly tends to 0 with h, uniformly in x0 , since max |χ(t + h)−χ(t)| → 0 as h → 0. If M = max |χ|, the preceding integral is majorized by M

2π 2π f (x0 + t + h) − f (x0 + t) dt = M | f (t + h) − f (t)| dt, 0

0

a quantity independent of x0 and tending to 0. This completes the proof. Two trigonometric series T1 and T2 are said to be equiconvergent at a point x0 if their difference T1 − T2 converges to 0 at x0 . If T1 − T2 merely converges, but not necessarily to 0, then T1 and T2 are said to be equiconvergent in the wider sense at x0 . Each of two equiconvergent series may be individually divergent, but the character of divergence is so similar that divergence cancels out in T1 − T2 .

321

A Few Facts from Harmonic Analysis

Theorem 12.34 Let f1 and f2 be two periodic functions that are equal in an interval (a, b). Then S[f1 ] and S[f2 ] are uniformly equiconvergent in every subinterval (a + ε, b − ε); S[f1 ] and S[f2 ] are uniformly equiconvergent in the wider sense in every (a + ε, b − ε). This is a corollary of Theorem 12.33, since, for example, S[f1 ] − S[f2 ] = S[f ] where f (= f1 − f2 ) vanishes in (a, b). Thus, if we change the values of f in an arbitrary way outside an interval (a, b), we do not affect the behavior of S[f ] in (a + ε, b − ε). Likewise for S[f ], although in this case, if the series converges, the value of the sum may change. Therefore, the convergence or divergence of S[f ] and S[f ] at a point x0 is a local property, that is, it depends only on the behavior of f near x0 .

12.4 Divergence of Fourier Series Theorem 12.35 There exists a continuous periodic f such that S[f ] diverges (more specifically, the partial sums of S[f ] are unbounded) at some point. Proof. Let 1 ≤ m < n and consider the polynomials

Qm,n (x) =

cos(m + n − 1)x cos mx cos(m + 1)x + + ··· + n n−1 1 −

cos(m + n + 1)x cos(m + n + 2)x cos(m + 2n)x − − ··· − . 1 2 n

We will show that all these polynomials are uniformly bounded, but that their partial sums are not. To prove the first statement, we need the fact that the partial sums of the series ∞ sin kx k=1

k

,

which we considered on p. 306 in Section 12.1, are uniformly bounded. This is an elementary fact that can be proved in many ways (see, e.g., Theorem 12.50(c)), but here we take it for granted. Thus, since

322

Measure and Integral: An Introduction to Real Analysis

Qm,n (x) =

n cos(m + n − k)x − cos(m + n + k)x

k

k=1

= 2 sin(m + n)x

n sin kx

k

k=1

,

we obtain |Qm,n (x)| ≤ C, where C is independent of m and n. On the other hand, when x = 0, the partial sum #

Qm,n (x) =

cos(m + n − 1)x cos mx + ··· + n 1

has the value 1 + (1/2) + · · · + (1/n), which is of order log n. Now select integers mk and nk such that mk + 2nk < mk+1

(k = 1, 2, . . .),

and choose a series of positive numbers αk such that αk < +∞, αk log nk → +∞. (We will make the construction in a moment.) The series ∞

αk Qmk ,nk (x)

k=1

then converges uniformly to a continuous function f . In view of the previous inequality relating mk and mk+1 , the polynomials Qmk ,nk do not overlap. Hence, the last series can be written as a single trigonometric series, whose coefficients (because of uniform convergence) are the Fourier coefficients of f . Thus, this series, unbracketed, is S[f ]. But S[f ] has unbounded partial # sums at x = 0 since a single block of terms, namely, αk Qmk ,nk (x), is of order αk log nk at x = 0. It is easy to verify that if we set 3

mk = 5k ,

3 nk = 2mk = 2 5k ,

αk = 1/k2 ,

then all the conditions required previously are fulfilled. This completes the proof. We leave it to the reader to check that if we choose 2

mk = 5k ,

nk = 2mk ,

αk = 1/k2

in the construction above, we get a continuous f whose partial sums are divergent but bounded at x = 0.

323

A Few Facts from Harmonic Analysis

Theorem 12.35 asserts that the partial sums of S[f ] can be unbounded even if f is continuous. It is of interest to know how unbounded they can be. From the formula sn (x, f ) =

π 1 f (x + t)Dn (t) dt, π −π

we see that if | f | ≤ 1, then π π 2 sn (x, f ) ≤ 1 |Dn (t)| dt = |Dn (t)| dt, π −π π 0

uniformly in x. The right side here is called the nth Lebesgue constant and will be denoted by Ln . Note that Ln is actually the value of sn (0, f ) for a specific f , namely, f (t) = sign Dn (t).

Theorem 12.36

We have

2 4 |Dn (t)| dt = 2 log n + O(1) Ln = π π π

as n → ∞.

0

Proof. Write 1 sin n + 1 t dt 2 2 sin 12 t 0 0 π π 1 1 dt 1 2 2 1 sin n + sin n + = t + t − dt. π 2 t π 2 2 sin 12 t t π

2 2 |Dn (t)| dt = Ln = π π π

0

0

−1 − t−1 is nonnegative and bounded for 0 < t ≤ π, and since Since 2 sin 12 t | sin n + 12 t| ≤ 1, the last integral is nonnegative and majorized by an abso lute constant. The change of variable n + 12 t = u shows that the preceding term equals 2 π

(n+(1/2))π 0

| sin u| du. u

324

Measure and Integral: An Introduction to Real Analysis

We may disregard the parts of this integral extended over (0, π) and 1 nπ, n + 2 π , since the integrand is bounded. In view of the periodicity of sin u, what remains can be written as n−1 nπ π 1 2 | sin u| 2 du = (sin u) du. ππ u π u + kπ 0

k=1

For 0 ≤ u ≤ π, the sum in brackets is contained between π−1 nk=2 (1/k) and −1 log n by an amount that is bounded in π−1 n−1 k=1 (1/k) and so differs from π π n and u. If we now note that 0 sin u du = 2, and collect estimates, we obtain Ln = (4/π2 ) log n + O(1).

Theorem 12.37

If f is integrable, then at each point x0 of continuity of f , sn x0 , f = o(log n).

The estimate is uniform over every closed interval of continuity of f . Proof. We will prove only the first statement, leaving the second to the reader. Suppose, as we may, that x0 = 0, f (x0 ) = 0. Because of our results about localization (see Theorem 12.34), we may assume that f vanishes outside an arbitrarily small fixed interval (−δ, δ). Then δ π 1 |sn (0)| = |Dn (t)| dt. f (t)Dn (t) dt ≤ sup | f (t)| · π −δ |t|≤δ −π Since the sup here is small with δ and the integral is of order log n, the assertion follows.

12.5 Summability of Sequences and Series Theorem 12.35 shows that even continuous functions, when developed into Fourier series, may not be representable by those series in terms of pointwise convergence. The situation can be remedied by considering generalized sums of the series. This topic is vast and basic for analysis, and we will study only a few facts important for the theory of Fourier series.

325

A Few Facts from Harmonic Analysis

Consider a fixed doubly infinite matrix of numbers (real or complex): α00 α10

α01 · · · α0n · · · α11 · · · α1n · · · .. .

(M )

αm0 αm1 · · · αmn · · · .. .

Given an infinite sequence of numbers s0 , s1 , . . . , sn , . . . , we transform it by using (M ) into a sequence σ0 , σ1 , . . . , σm , . . . by means of the formulas σm = αm0 s0 + αm1 s1 + · · · + αmn sn + · · ·

(m = 0, 1, 2, . . .),

assuming that the series defining σm converges for each m. We may ask what conditions on (M ) will guarantee that whenever {sn } converges to a finite limit s, lim σm also exists and equals s. An answer is given by the following theorem.

Theorem 12.38

Suppose that (M ) satisfies the following three conditions:

≤ A (for all m, with A independent of m), (ii) limm→∞ ( n αmn ) = 1, (iii) limm→∞ αmn = 0 for each n. (i)

n |αmn |

Then for any sequence {sn } converging to a finite limit s, lim σm exists and equals s. Theorem 12.38 is due to Toeplitz, and a matrix (M ) that satisfies (i)–(iii) is called a Toeplitz matrix. Proof. First of all, since {sn } is bounded, (i) implies that σm exists for each m. Next, write sn = s + εn , where εn → 0. Then σm =

αmn (s + εn ) = s

n

We have s

n αmn

αmn +

n

αmn εn .

n

→ s by (ii), and it remains only to show that the expression ρm =

n

αmn εn

326

Measure and Integral: An Introduction to Real Analysis

tends to 0 as m → ∞. Given δ > 0, split ρm into two sums, ρm =

αmn εn +

n≤n0

αmn εn = ρ m + ρ m ,

n>n0

say, where n0 is so large that |εn | ≤ δ for n > n0 . By (i), ρ ≤ |αmn | |εn | ≤ |αmn | δ ≤ Aδ. m n>n0

n>n0

On the other hand, ρ m consists of a fixed number of terms each of which, by (iii), tends to 0 as m → ∞. Hence, ρ m < Aδ for m large enough. Combining estimates, we see that ρm → 0, which completes the proof. It is useful to note that if s = 0, then condition (ii) is not required in the proof (and so in the statement of the theorem) above. It is also immediate from the proof that if {sn } depends on a parameter, and if {sn } tends uniformly to a bounded limit s, then {σm } tends uniformly to s too. If σm → s, we shall say that the sequence {sn } (or the series whose partial sums are the sn ) is summable to limit (sum) s by means of the matrix (M ), or simply is summable (M ) to s. The matrix (M ) is called positive if αmn ≥ 0 for all m, n. Condition (i) is then a corollary of (ii). For positive (M ), Theorem 12.38 also holds if s = ±∞; we leave the proof to the reader. Two methods of summability are of special significance for Fourier series. (a) The method of the arithmetic mean. Given s0 , s1 , . . . , sn , . . ., consider the arithmetic means σ0 , σ1 , . . . , σm , . . . defined by σm =

s0 + s1 + · · · + sm m+1

(m = 0, 1, 2, . . .).

If sn → s (−∞ ≤ s ≤ +∞), then σm → s. This is clearly a special case of Theorem 12.38; the matrix is positive. It is useful (see, e.g., the comments following the proof of Theorem 12.44) to note that if the sn are the partial sums of a series ∞ k=0 uk , then u0 + (u0 + u1 ) + · · · + (u0 + u1 + · · · + um ) s0 + s1 + · · · + sm = m+1 m+1 m 1 = (m + 1 − k) uk . m+1

σm =

k=0

327

A Few Facts from Harmonic Analysis

Thus, m σm = 1− k=0

k uk , m+1

1 kuk . m+1 m

s m − σm =

(12.39)

k=0

(b) The method of Abel. Given a series u0 + u1 + · · · + un + · · · , consider the power series f (r) =

∞

un rn ,

0 ≤ r < 1,

n=0

assuming that it converges for 0 ≤ r < 1. If f (r) → s as r → 1, we say that un is Abel summable (or A-summable) to sum s. The method can also be applied to sequences since any sequence {sn } can be written as the partial sums of the series s0 + (s1 − s0 ) + (s2 − s1 ) + · · · . Let us now see the relation of Abel summability to the general scheme. We claim that for 0 ≤ r < 1, the formula ∞

n

un r = (1 − r)

n=0

∞

sn rn

(sn = u0 + · · · + un )

(12.40)

n=0

is valid assuming only that one of the two series that appear is convergent. If the right side converges, it equals ∞

sn rn −

n=0

∞

sn rn+1 =

n=0

∞

sn rn −

n=0

= s0 +

∞

sn−1 rn

n=1 ∞

(sn − sn−1 ) rn =

n=1

∞

un rn .

n=0

n Conversely, if ∞ some r, 0 < r < 1, its Cauchy product n=0 un r converges for n −1 converges to sum with the absolutely convergent series ∞ n=0 r = (1 − r) n ∞ n=0

k=0

k

uk r · r

n−k

=

∞

n

(u0 + u1 + · · · + un ) r =

n=0

∞

sn rn .

n=0

This proves (12.40). Now, if {rm } is any sequence tending to 1, 0 < rm < 1, then the positive numbers αmn = (1 − rm ) rnm

328

Measure and Integral: An Introduction to Real Analysis

satisfy conditions (i), (ii), (iii) of Theorem 12.38. We leave the verification to the reader.

Theorem 12.41 (Abel) If is A-summable to s.

∞

n=0

un converges to sum s, −∞ ≤ s ≤ +∞, then it

Proof. Suppose first that s is finite. Applying (12.40), we have to show that n (1 − r) ∞ n=0 sn r → s as r → 1. It is enough to prove that this relation holds for any sequence r = rm , m = 0, 1, . . . , where 0 < rm < 1, rm → 1. This is a corollary of Theorem 12.38 since the numbers αmn = (1 − rm )rnm satisfy (i), (ii), (iii). The matrix αmn is positive, and so the proof holds for s = ±∞, the only prerequisite being that the series un rn converges for 0 ≤ r < 1. We may also consider the power series f (z) =

∞

un zn ,

n=0

where z is a complex variable lying in the unit disc: z = reix , 0 ≤ r < 1. If f (z) tends to a limit s as z tends nontangentially to 1, that is, as z → 1 in such a way that |1 − z| ≤ C < +∞ (|z| < 1), 1 − |z| then ∞ n=0 un is said to be nontangentially Abel summable to sum s. The last inequality means that, in approaching 1, z remains between two chords of the unit circle through z = 1. In fact, if z = x + iy is a pointthat satisfies 0 < x < 1

and |1 − z| ≤ C(1 − |z|), then |y| < C(1 − x) since |y| < y2 + (1 − x)2 = |1 − z| and C(1 − |z|) ≤ C(1 − x). Conversely (see Exercise 23(a)), given a constant γ > 0, there are constants C and δ with C > 0 and 0 < δ < 1 such that if z = x+iy with |z| < 1, 1−δ < x < 1 and |y| < γ(1−x), then |1−z| ≤ C(1−|z|). See (12.65) for another characterization of the notion of nontangential approach of z to 1.

Theorem 12.42 (Abel–Stolz) If ∞ n=0 un converges to a finite sum s, then it is nontangentially Abel summable to s. Proof. The proof is identical tothat of Abel’s theorem, except that now we use the formula un zn = (1 − z) sn zn and consider any sequence {zm } tending

329

A Few Facts from Harmonic Analysis

to 1 from the interior of the unit disc. The matrix αmn is now (1 − zm )znm , conditions (ii) and (iii) of Theorem 12.38 are satisfied as before, and (i) takes the form |1 − zm | ≤ C. 1 − |zm | Theorem 12.43 If ∞ mean n=0 un is summable by the method of the arithmetic to sum s, then it is A-summable to s. If in addition s is finite, then ∞ n=0 un is nontangentially A-summable to s. Proof. Suppose that s is finite. By hypothesis, σn =

s0 + s1 + · · · + sn → s. n+1

Write s0 + s1 + · · · + sn = tn . Applying formula (12.40) twice, we have ∞ n=0

un rn = (1 − r)

∞ n=0

sn rn = (1 − r)2

∞

tn rn = (1 − r)2

n=0

∞ (n + 1)σn rn . n=0

Again, it is enough to consider any sequence rm → 1, 0 < rm < 1. We then have to apply Theorem 12.38 with matrix αmn = (1 − rm )2 (n + 1)rnm , and we easily verify that (αmn ) satisfies conditions (i), (ii), (iii) of Theorem 12.38. The rest of the proof of the theorem is left to the reader. While convergence of a series implies summability A, the converse is gen 1 n erally false: for example, ∞ n=0 (−1) diverges but is A-summable to sum 2 ∞ since n=0 (−r)n = 1/(1 + r) → 12 as r → 1−. If one makes additional assumptions on the terms of the series, however, the converse will hold. The following result is both elementary and useful. Theorem 12.44 (Tauber) If un is A-summable to sum s, −∞ ≤ s ≤ +∞, and if un = o(1/n) as n → ∞, then un converges to sum s.

330

Measure and Integral: An Introduction to Real Analysis

Proof. Write un = εn /n, n > 1, where εn → 0. Let rm be a sequence tending to 1, which we shall determine in a moment. Then sm − f (rm ) is a transformation of the sequence εn : sm − f (rm ) =

m εn n=1

n

−

∞ εn

n

n=1

rnm =

∞

αmn εn ,

n=1

where αmn =

1 1 − rnm n

if n ≤ m,

1 αmn = − rnm n

if n > m.

If we verify conditions (i) and (iii) of Theorem 12.38, then the fact that εn → 0 will give sm − f (rm ) → 0, and so also sm → s. Condition (iii) is obvious for any {rm } → 1. As for (i), observing that 1 − rn = (1 − r) 1 + · · · + rn−1 ≤ (1 − r)n, we have n

|αmn | ≤

m ∞ 1 1 n r (1 − rm ) n + n n m n=1

n=m+1 ∞

≤ m (1 − rm ) +

1 n rm m+1 n=0

1 1 = m (1 − rm ) + . m + 1 1 − rm Hence, if we choose rm = 1 − (1/m), then holds, and the theorem follows.

n |αmn |

≤ 2. Thus, condition (i)

un is summable by the method of the arithmetic mean and nun → 0, then If un converges. Of course, this is a corollary of Theorems 12.43 and 12.44, but a direct proof is on the surface: By (12.39), 1 nun , m+1 m

s m − σm =

n=0

and the assumption nun → 0 clearly implies sm − σm → 0. Thus, if σm → s, then also sm → s. Actually, this argument shows that if nun → 0, then whether {σm } converges or not, the difference sm −σm tends to 0, that is, the behavior of {sm } imitates that of {σm }. The same argument shows that if {σm } is a bounded

331

A Few Facts from Harmonic Analysis

sequence, and |un | ≤ A/n for n = 1, 2, . . . , then the sequence {sm } is bounded. The result that follows lies deeper.

Theorem 12.45 (Hardy) If mean to a finite sum s and if

un is summable by the method of the arithmetic

|un | ≤ then

A n

(n = 1, 2, . . .),

un converges to s.

Proof. Consider the expressions (which we shall call the delayed arithmetic means) σn,k =

sn+1 + sn+2 + · · · + sn+k . k

They are easily expressible in terms of the σn :

σn,k

s0 + · · · + sn+k − (s0 + · · · + sn ) n+k+1 n+1 = σn+k − σn = k k k n+1 σn+k − σn + σn+k . = k

It is clear that if kn is any sequence of integers such that n/kn is bounded as n → ∞, then σn → s implies that σn,kn → s. Using the definition of σn,k , we also deduce that

σn,k

(sn+1 − sn ) + · · · + sn+k − sn = sn + k 1 (k − j + 1)un+j . k k

= sn +

j=1

Hence, assuming as we may that A = 1, k σn,k − sn ≤ un+j ≤ j=1

k . n+1

332

Measure and Integral: An Introduction to Real Analysis

Let k = kn = [εn], where ε > 0 is arbitrarily small and fixed, and [x] designates the integral part of x. Then n/kn is bounded, and so σn,kn → s. But by taking kn = [εn] in the last estimate, we obtain lim sup σn,kn − sn ≤ ε. n→∞

Hence, sn → s, and the proof is complete. We remark in passing that the conclusion of Hardy’s theorem is true if the assumption of summability by the method of the arithmetic mean is replaced by A-summability (theorem of Littlewood)∗ .

12.6 Summability of S[f ] and S[f ] by the Method of the Arithmetic Mean Given a periodic f , we denote by sn (x) = sn (x, f ) the partial sums of S[f ] and by σn (x) = σn (x, f ) their arithmetic means. Thus (see p. 315 in Section 12.3 for the definition of the Dirichlet kernel Dn ), sn (x) =

π π 1 1 f (t)Dn (x − t) dt = f (x + t)Dn (t) dt, π −π π −π

σn (x) =

π π 1 1 f (t)Kn (x − t) dt = f (x + t)Kn (t) dt, π −π π −π

where, by using (12.26), n n 1 1 1 t. Dj (t) = sin j + Kn (t) = n+1 2 (n + 1)2 sin 12 t j=0 j=0 Note that σ0 (x) = a0 /2 and K0 (t) = 1/2. Multiplyingand dividing the last sum

termwise by 2 sin 12 t and using the equation 2 sin j + cos(j + 1)t, we get 1 − cos(n + 1)t 2 Kn (t) = 2 = n + 1 (n + 1) 2 sin 12 t

1 2

t sin 12 t = cos jt −

sin[(n + 1)t/2] 2 sin 12 t

2 .

(12.46)

∗ See A. Zygmund, Trigonometric Series, vol. 1, 2nd edn., Cambridge University Press,

Cambridge, U.K., 1968, p. 81.

A Few Facts from Harmonic Analysis

333

The trigonometric polynomial Kn (t) is called the nth Fejér kernel. An analogous kernel was considered in (9.11) for nonperiodic functions. The formula σn (x, f ) =

π π 1 1 f (t)Kn (x − t) dt = f (x − t)Kn (t) dt π −π π −π

is a periodic version of the notion of convolving a function and a kernel. Some of the facts we will prove below are similar to ones we have already had in Chapter 9, but rather than connecting the present case with those results, we shall give brief direct proofs of the theorems we need. j Using the formula Dj (t) = 12 + m=1 cos mt, j ≥ 1, the Fejér kernel can be written (see (12.39)): Kn (t) =

n n m |m| 1 1 1− 1− + cos mt = eimt , 2 n+1 2 m=−n n+1 m=1

which should be interpreted as 1/2 in case n = 0. Kn has the following properties: (a) Kn (t) ≥ 0; Kn (−t) = Kn (t). π (b) (1/π) −π Kn (t) dt = 1. (c) Kn (t) ≤ (n + 1)/2; Kn (t) ≤ A/[(n + 1)t2 ] (0 < |t| ≤ π; A an absolute constant). Here, (a) and (b) are obvious from the various previously mentioned formulas for Kn . The first part of (c) follows from the formula Kn (t) = (n + 1)−1 nj=0 Dj (t) together with the obvious estimate |Dj | ≤ j + 12 and the identity nj=0 j = n(n + 1)/2. The second part follows from (12.46) if we note that | sin u| ≥ (2/π)|u| for 0 ≤ |u| ≤ π/2. From the second inequality in (c), we immediately deduce (c’)

δ≤|t|≤π Kn (t) dt

→ 0 as n → ∞ for any fixed δ, 0 < δ ≤ π.

These properties of Kn lead to the next result, which is basic and related to Theorem 9.9.

Theorem 12.47 (Fejér) Let f be integrable and periodic. Then σn (x) → f (x)

334

Measure and Integral: An Introduction to Real Analysis

at each point of continuity of f , and the convergence is uniform over every closed interval of continuity. In particular, σn (x) tends to f (x) uniformly everywhere if f is continuous everywhere. If f has a jump discontinuity at x0 , then σn (x0 ) →

1 [f (x0 +) + f (x0 −)] . 2

Proof. Suppose f is continuous on a closed interval I = [a, b] (which may reduce to a point). Given ε > 0, we can find δ > 0 so that | f (x + t) − f (x)| < ε for x ∈ I, |t| < δ. Using (b), we can write (assuming as we may that δ ≤ π) π 1 σn (x) − f (x) = [f (x + t) − f (x)]Kn (t) dt π −π

=

1 1 + π π |t| α ∪ x : f > α ⊂ x : h > α = S ∪ T. x : 2 2

Recall that 0 ≤ g ≤ 2α; we also have Hence, by Tchebyshev’s inequality, |S| ≤

1 α 2

=

Qg

−2

Qf

since

Qh

= 0 by (12.71)(b).

g2

Q

4 8 8 ≤ 2 g2 ≤ g= f. α α α Q

Q

Q

To estimate |T|, let T1 = T ∩ P∗ , T2 = T ∩ Q∗ . Thus, |T| = |T1 | + |T2 |, and by (12.77), |T2 | ≤ |Q∗ | ≤

2 f. α Q

358

Measure and Integral: An Introduction to Real Analysis

On T1 , | h(x)| is majorized a.e. by (see (12.76)) Aα

Q

δ(t) δ(t) dt = Aα dt = AαM(x), (x − t)2 (x − t)2 ∗ Q

and if | h(x)| is to be ≥ 12 α there, then necessarily M(x) ≥ 1/(2A). But M(x) is the integral Mλ (x) of Theorem 12.72 corresponding to λ = 1, G = Q∗ , F = P∗ . Therefore, by the estimate given in Theorem 12.72 and (12.77),

M ≤ 2|Q∗ | ≤

P∗

4 f, α Q

∗ and by Tchebyshev’s inequality, the M(x) subset of P where−1 ≥ 1/(2A) has −1 f . Thus, |T | ≤ 8Aα measure at most 2A P∗ M ≤ 8Aα 1 Q Q f , and collecting estimates, we have

! " c f (x) > α ≤ f 1 (c = 10 + 8A). x : α This was proved for α ≥ (2π)−1 Q f but is trivially true for smaller α since our set certainly has measure ≤ 2π, while 2π ≤ α−1 Q f for such α. This completes the proof of Theorem 12.67. The theorem is strengthened by the following result.

Theorem 12.78

If f ∈ L, then the maximal conjugate function fε (x) f∗ (x) = sup 0 α/2 . g∗ > α/2 + ω(α) ≤

360

Measure and Integral: An Introduction to Real Analysis

By Theorem 12.67, with Q = (−π, π), ! " h∗ > α/2 ≤ A (α/2)−1 |h| = Aα−1

| f |,

{| f |>α}

Q

and by Tchebyshev’s inequality and Theorem 12.62,

−2 2 g∗ > α/2 ≤ α/2 g∗ ≤ Aα−2

Q

g2 = Aα−2

f 2.

{| f |≤α}

Q

We have (see Theorem 5.51 and Exercises 16 of Chapter 5 and 5 of Chapter 6) p || f∗ ||p = −

∞

αp dω(α) = p

0

∞

αp−1 ω(α) dα.

0

Using the estimates above, the last integral is majorized by

Ap

∞

⎛

⎞

αp−1 ⎝α−1

| f | dx⎠ dα + Ap

{| f |>α}

0

∞

⎛

αp−1 ⎝α−2

⎞ f 2 dx⎠ dα.

{| f |≤α}

0

Interchanging the order of integration, we can write this as

Ap

Q

⎛ | f (x)| ⎝

| f(x)|

⎞ αp−2 dα⎠ dx + Ap

0

Q

⎛ f 2 (x) ⎝

∞

⎞ αp−3 dα⎠ dx

| f (x)|

⎧ ⎫ ⎨ 1 ⎬ 1 = Ap | f (x)|p dx + | f (x)|p dx , ⎩p − 1 ⎭ 2−p Q

Q

due to the fact that 1 < p < 2. It follows that (12.80)(b), and so also (12.80)(a), holds with , + 1 1 p + (1 < p < 2), (12.82) Ap = Ap p−1 2−p which proves the lemma. It is not surprising that Ap becomes infinite as p → 1 since as we have observed, the integrability of f does not imply that of f . On the other hand,

361

A Few Facts from Harmonic Analysis

in view of the validity of (12.80)(b) for p = 2, it is natural to expect that there is a better estimate for Ap near p = 2 than (12.82) (which becomes infinite as p → 2). This we shall see below. Meanwhile, note that the exponent 2 plays a rather accidental role in Lemma 12.81. In fact, if we knew (12.80)(b) for any p0 > 1, then the previous argument would give it for 1 < p ≤ p0 , with Ap bounded for p away from p = 1 and p = p0 . The only change necessary in the proof is replacing the exponent 2 in the argument for g by p0 . Moreover, if instead of (12.80)(b) we only knew (12.80)(a) for some p0 , then by applying the same argument to f instead of f∗ , we would obtain (12.80)(a) for 1 < p ≤ p0 , with Ap bounded for p away from 1 and p0 . We will use this idea below. Inequality (12.80)(b) for any p implies || fε ||p ≤ Ap || f ||p

(12.83)

fε g| for for the same p and all ε > 0. From the fact that || fε ||p equals supg | Q all g with ||g||p ≤ 1, 1/p + 1/p = 1, and the easily verifiable formula

fε g = −

Q

f gε

Q

(apply Fubini’s theorem), it follows that if (12.83) holds for any p, 1 < p < ∞, then it also holds for the conjugate exponent p , and Ap = Ap (see also Exercise 16 of Chapter 10). But we proved (12.83) for 1 < p ≤ 2. Hence, it also holds for 2 ≤ p < ∞, and an observation analogous to the one made earlier just before (12.83), but this time for fε rather than f , shows that the constant Ap in (12.83) remains bounded for p away from 1 and ∞. Using (12.82), we thus see that the Ap in (12.83) satisfies the inequality Ap ≤

A p−1

(1 < p ≤ 2),

(12.84)

for p ≥ 2.

(12.85)

and so also (since Ap = Ap ) Ap ≤ Ap

f (x) a.e. as ε → 0 for any integrable f , (12.83) leads to the Since fε (x) → basic inequality ||f ||p ≤ Ap || f ||p , 1 < p < ∞, where Ap satisfies the two previous estimates. This proves (12.80)(a); we still have to prove (12.80)(b) for 2 < p < ∞ and the formula S[f ] = S[ f ] for f ∈ Lp , p > 1. p Let us show that if f ∈ L , p > 1, then S[f ] = S[ f ]. We may assume that p < 2. In view of Theorem 12.56, lim σn (x) exists and equals f (x) a.e. for f merely integrable. Next, by Theorem 12.61(ii), | σn (x)| is majorized by f∗ (x) + cf ∗ (x),

362

Measure and Integral: An Introduction to Real Analysis

an integrable function in our case since f ∈ Lp , 1 < p < 2. Hence, by the dominated convergence theorem, each Fourier coefficient of σn tends to the corresponding Fourier coefficient of f . But it also tends to the correspond ikx ing coefficient of S[f ]; in fact by (12.39), if f ∼ ck e , then the kth Fourier coefficient of σn equals [1 − |k|/(n + 1)](−i sign k)ck if |k| ≤ n and equals 0 if |k| > n. Thus, S[f ] = S[ f ]. To complete the proof of Theorem 12.79, it remains to show that (12.80)(b) holds for 1 < p < ∞ (we have shown it only for 1 < p ≤ 2), with Ap bounded for p away from 1 and ∞. It is immediate (see the proof of Theorem 12.62) that if 1/(n + 1) ≤ ε ≤ 1/n, then | fε (x) − f1/n (x)| is majorized by f ∗ (x). Hence, by Theorem 12.61(ii), f∗ (x) ≤ cf ∗ (x) + sup | σn (x)| . n

By Theorem 12.61(i) applied to S[f ] = S[ f ], ∗ sup f ≤c f (x). σn (x, f ) = σ∗ x, n

Hence,

! ∗ " f∗ ≤ c f ∗ + , f + ∗ , f || f∗ ||p ≤ c || f ∗ ||p + p

≤ c Cp {|| f ||p + || f ||p }, where Cp is the constant of the Hardy–Littlewood maximal Theorem 12.60. In view of (12.80)(a), the inequality || f∗ ||p ≤ Bp || f ||p is thus established for all p, 1 < p < ∞, with Bp = cCp [1 + Ap ], Ap being the constant in (12.80)(a). Combining the estimates (12.84) and (12.85) for Ap with the estimate for Cp on p. 228 in Section 9.3, it follows that Bp is bounded for p away from 1 and ∞. In particular, since Cp remains bounded as p → ∞, it follows that Bp is O(p) as p → ∞. To estimate Bp as p → 1, it is best to use the estimate derived in the proof of Lemma 12.81 (see (12.82)); thus, Bp = O(1/(p − 1)) as p → 1, so that Bp satisfies the same sorts of estimates as Ap . This completes the proof of Theorem 12.79. We have the following important corollary.

Corollary 12.86

fε converges in Lp norm to f. Let f ∈ Lp , 1 < p < ∞. Then

This is an immediate consequence of the convergence theorem dominated fε ≤ f∗ ∈ Lp . since fε converges pointwise a.e. to f and

363

A Few Facts from Harmonic Analysis

12.10 Application of Conjugate Functions to Partial Sums of S[f ] The behavior of the partial sums sn (x) = sn (x, f ) is a much more delicate topic than the behavior of the arithmetic means. We will consider only the question of the convergence of sn to f in the metric Lp . The main tool here is a connection between the partial sums and the conjugate function. Instead of sn , it will be convenient to consider the expressions (see (12.27)) #

sn (x) =

sn (x) + sn−1 (x) 1 = sn (x) − (an cos nx + bn sin nx) 2 2

(12.87)

when n ≥ 1. We have #

sn (x) = =

π 1 Dn (x − t) + Dn−1 (x − t) dt f (t) π −π 2 π 1 sin n(x − t) dt f (t) π −π 2 tan 12 (x − t)

π π 1 f (t) cos nt 1 f (t) sin nt = sin nx · dt − cos nx · dt. π −π 2 tan 12 (x − t) π −π 2 tan 12 (x − t)

The last decomposition was purely formal, but from Section 12.8, we know that the cofactors of sin nx and cos nx exist a.e. in the principal value sense (and represent the conjugate functions of f (t) cos nt and f (t) sin nt, respectively). Theorem 12.88 (M. Riesz) If f ∈ Lp , 1 < p < ∞, then (i) ||sn ||p ≤ c|| f ||p , (ii) || sn ||p ≤ c|| f ||p ,

|| f − sn ||p → 0, || f − sn ||p → 0,

where c depends only on p. Proof. It is enough to prove (i), which implies (ii) since sn (x, f ) = sn (x, f) # and ||f ||p ≤ Ap || f ||p , 1 < p < ∞. The first part of (i) with sn instead of sn #

is an immediate corollary of the last formula for sn and the inequality

364

Measure and Integral: An Introduction to Real Analysis

f p ≤ Ap f p : in fact, letting gn (t) = f (t) cos nt, hn (t) = f (t) sin nt in the formula, we obtain sn (x) = sin nx gn (x) − cos nx hn (x) #

(a.e.),

# ||sn ||p ≤ || gn ||p + || hn ||p ≤ Ap (||gn ||p + ||hn ||p ) ≤ Ap || f ||p #

for 1 < p < ∞. Since, by (12.87), sn − sn p ≤ A f 1 ≤ A f p , we obtain sn p ≤ Ap f p . The relation || f − sn ||p → 0 is obvious for functions that have a continuous derivative (see Theorem 12.20), and since such functions are dense in Lp , it also follows in the general case.

Exercises 1. Prove the following versions of Theorems 12.13 and 12.14. (a) If f ∼ 12 a0 + ∞ k=1 (ak cos kx + bk sin kx), and f is the indefinite integral of f , then f ∼ ∞ k=1 k(bk cos kx − ak sin kx). ∞ 1 (b) If f ∼ 2 a0 + k=1 (ak cos kx + bk sin kx), and F is the indefinite integral of f , then ∞

1 1 1 (ak sin kx − bk cos kx). F(x) − a0 x ∼ A0 + 2 2 k k=1

2. Prove that if f (x) ∼

ck eikx , then f (x + α) ∼

ck eikα eikx . 3. If f is real or complex-valued and f ∼ 12 a0 + ∞ k=1 (ak cos kx + bk sin kx), show that 2π ∞ 1 1 |ak |2 + |bk |2 . | f |2 = |a0 |2 + π 2 k=1

0

4. Deduce from (12.18) that if f , g ∈ L2 , f ∼

ck eikx , g ∼

2π 1 f (t)g(x − t) dt = ck dk eikx . 2π 0

dk eikx , then

365

A Few Facts from Harmonic Analysis

Thus, a Fourier coefficient of the convolution of f and g equals the product of the corresponding coefficients of f and g. (For nonperiodic versions of this result, see Theorems 13.30 and 13.59.) 5. Prove the following. (a) If f (x) is periodic and equal to sign x in (−π, π), then f (x) ∼

+ , sin 3x sin 5x 4 sin x + + + ··· . π 3 5

(b) Let 0 < h < 12 π, and let f be the triangular function defined as follows: f is periodic, even, continuous, f (0) = 1, f (x) = 0 for 2h ≤ x ≤ π, and f is linear in (0, 2h). Then 2h f ∼ π

2 ∞ +∞ h 1 sin kh 2 ikx sin kh + cos kx = e . 1+ 2 kh π kh −∞ k=1

(c) Let g be periodic and equal to

g∼

1 2

log 1/ 2 sin 12 x in (−π, π). Then

∞ cos kx k=1

k

.

(For (b), the coefficients can be computed directly, or by using Exercise 4 and Example (b), Section 12.1, p. 306. Observe that the convolution of the characteristic function of an interval with itself is a triangular function. For (c), one may either integrate by parts in the formula for the cosine coefficients of g or consider the real part of the series ∞ k z k=1

k

= log

1 , 1−z

z = reix ,

for r < 1, and then let r → 1.) 6. Using the formula for the Fourier series of 12 (π − x) given in Example (a), Section 12.1, p. 306, prove the formulas ∞ 1 π2 , = 2 6 n n=1

∞ 1 π4 , = 4 90 n n=1

∞ 1 = π2k Bk for some rational Bk n2k n=1

(k = 1, 2, . . .).

366

Measure and Integral: An Introduction to Real Analysis

7. Prove that each of the systems (a) 12 , cos x, cos 2x, . . . , cos kx, . . . (b) sin x, sin 2x, . . . , sin kx, . . . is orthogonal and complete over (0, π). 8. Let {φj (x)} and {ψk (y)} be two orthogonal systems: the first over a set A ⊂ Rm and the second over a set B ⊂ Rn . Then the (double) system ωj,k (x, y) = φj (x)ψk (y) is orthogonal over the Cartesian product C = A × B ⊂ Rm+n . If both {φj } and {ψk } are complete, so is {ωj,k }. Generalize this to the case of more than two orthogonal systems. 9. Let {(k1 , k1 , . . . , kn )} be all lattice points in the space Rn (i.e., all distinct points with integral coordinates). Then the system exp {i (k1 x1 + k2 x2 + · · · + kn xn )} is orthogonal and complete over any cube in Rn with edge length 2π. ikx 10. If f ∼ ck eikx ∈ L2 and g ∼ dk e ∈ L2 , then h = fg is integrable, and inx if h ∼ Cn e , then Cn =

+∞

ck dn−k ,

k=−∞

where the series on the right converges absolutely. (For n = 0, this 2π means (1/2π) 0 fg = ck d−k , which is a variant of (12.18).) See also Exercises 17 and 18. 11. Let f ∼ ck eikx ∈ L2 . For each n, let γn =

k=n

ck

1 1 = cn−k . n−k k k=0

Show that {γn } ∈ l2 and, more precisely, that |γn |2 ≤ π2 |ck |2 . The numbers γn are discrete analogues of the formal Hilbert integral [f (t)/(x − t)] dt; see Section 3 of Chapter 13. (The numbers γn are the Fourier coefficients of the function fg, where g ∼ k=0 eikx /k = i(π − x) for 0 < x < 2π (see Example (a), Section 12.1, p. 306), and we have 2π 0

| fg|2 ≤ π2

2π 0

| f |2 .)

367

A Few Facts from Harmonic Analysis

12. Let f and g be periodic, f ∈ Lp , g ∈ Lp , 1 ≤ p ≤ ∞, 1/p + 1/p = 1. Consider the product hx (t) = f (x+t)g(t) as a function of t. Show that the nth Fourier coefficient of hx (t) tends to 0 as n → ∞, uniformly in x. (Show that the L1 -modulus of continuity of h tends to 0 uniformly in x; apply (12.22).) 13. Let f be periodic and integrable. Show that for the partial sums sn and sn , we have the following formulas: π (a) sn (x) = (1/π) −π f (x + t) [(sin nt)/t] dt + εn (x), where εn (x) tends to 0 uniformly in x as n → ∞. π (b) sn (x) = (1/π) −π f (x + t) [(1 − cos nt)/t] dt + ηn (x), where ηn (x) tends to 0 uniformly in x. (For (a), except for an error which is o(1) uniformly in x, we have sn (x) =

π 1 sin nt dt, f (x + t) π −π 2 tan 12 t

and since 12 cot 12 t− 1t is bounded in (−π, π), the result follows easily from Exercise 12 with p = 1, p = ∞.) π n (t)| dt, D n denoting the conjugate Dirichlet kernel. 14. Let Ln = 2π−1 0 |D Prove the following analogue of Theorem 12.36: 2 Ln = log n + o(log n) π

as n → ∞.

n )| = max{| Show also that Ln = | sn (0, sign D sn (0, f )| : f with | f | ≤ 1}. 15. Show that if x is a point of continuity of an integrable f , then sn (x) = o(log n). If f has a jump discontinuity at x and the value of the jump is d = f (x+) − f (x−), show that sn (x) = −(d/π) log n + o(log n). (The second statement follows from the first by considering [if, e.g., x = 0] the function h(t) = 12 (π − t) of Example (a), Section 12.1, p. 306, whose conjugate function h satisfies h = −g, where g is the function of Exercise 5(c).) 16. Prove the following more general form of Theorem 12.38. Suppose that lim inf sn = s and lim sup sn = s¯ are finite. Then both lim inf σn and lim sup σn are contained between the numbers 1 1 (s + s¯) ± A (¯s − s), 2 2 where A is the same as in condition (i) of Theorem 12.38.

17. Prove the following extension of Exercise 10. Let f ∈ Lpand g ∈ Lp , 1 < p < ∞, 1/p + 1/p = 1 (thus also 1 < p < ∞). If f ∼ ck eikx and

368

Measure and Integral: An Introduction to Real Analysis ikx g∼ dk e , then the Fourier coefficients Cn of the (integrable) function h = fg are given by Cn =

∞

ck dn−k =

k=−∞

M

lim

M→+∞

.

k=−M

Thus, the Cn are the same as in Exercise 10, but the series representing the Cn are no longer claimed to be absolutely convergent, and we must consider the limits of their symmetric partial sums. (The proof is parallel to that of Exercise 10. We write Cn =

2π 2π 2π 1 1 1 fge−inx = (f − sM )ge−inx + sM geinx , 2π 2π 2π 0

0

0

where sM = sM (x, f ), observe that the first integral on the right is 2π majorized by 0 | f − sM | |g| , apply Hölder’s inequality, and use the fact that || f − sM ||p → 0 by Theorem 12.88.) 18. The result of Exercise 17 is valid when p = 1, p = ∞ (or when p = ∞, p = 1), but the series defining the Cn must be taken in the sense of the (symmetric) first arithmetic means: Cn =

lim

M→+∞

M

ck 1 −

k=−M

|k| dn−k . M+1

19. We know that Theorem 12.79 is false for p = 1. Show that it is also false for p = ∞, that is, that the conjugate function of a bounded function (sin kx)/k ∼ need not be bounded. (Consider, e.g., the two series ∞ k=1 ∞ 1 1 1 k=1 (cos kx)/k ∼ 2 log(1/|2 sin 2 x|), −π < x < π 2 (π−x), 0 < x < 2π, and [see Exercise 5].) 20. There is a substitute result for Theorem 12.79 in case p = ∞. Let f be a periodic function with | f | ≤ 1. Then there are absolute constants λ, μ > 0 such that π

eλ|f | ≤ μ.

−π

See also Exercise 27 in Chapter 14. (Write

eλ|f | − λ| f| − 1 =

∞ λn | f |n n=2

n!

369

A Few Facts from Harmonic Analysis

and integrate termwise. Use (12.80)(a), (12.85), and the fact that nn ≤ cn n!.) 21. Suppose that f is a periodic function for which π

| f | log+ | f | < +∞,

−π

where log+ stands for the positive part of log. This clearly implies that f ∈ L1 . Show that f ∈ L1 and that there are absolute constants A and B such that π

π

| f| ≤ A

−π

| f | log+ | f | dx + B.

−π

(Write ω(α) = |{x : |x| < π, | f (x)| > α}|. Then π ∞ 2 ∞ f = ω(α) dα = + . −π

0

0

2

For the first integral on the right, use the fact that ω(α) ≤ 2π, and for the second, use an argument like that in the proof of Lemma 12.81.) 22. The discussion that precedes Theorem 12.31 shows only that the averages ( sn (x) + sn−1 (x))/2 converge. Give the remaining details of the proof of sn−1 (x).) Theorem 12.31. (Consider sn (x) − 23. (a) Prove the statement about nontangential approach made before Theorem 12.42, namely, that given γ > 0, there exist C, δ > 0 with δ < 1 such that |1 − z| ≤ C(1 − |z|) if z = x + iy, |z| < 1, 1 − δ < x < 1 and |y| < γ(1 − x). (It may be helpful to use the asymptotic estimates | sin θ| |θ| and 1 − cos θ θ2 /2 as θ → 0.) (b) Verify that (12.65) characterizes the nontangential approach of z = reix to 1, |z| < 1.

13 The Fourier Transform In this chapter, we will study properties of the Fourier transform f (x) of a function f on Rn , n ≥ 1, defined formally (for the moment) as f (x) =

1 f (y) e−ix·y dy, (2π)n n

x ∈ Rn .

(13.1)

R

n

Here x · y = of x = (x1 , . . . , xn ) and y = 1 xk yk is the usual dot product √ f may (y1 , . . . , yn ), and i is the complex number i = −1 = eiπ/2 . Both f and be complex-valued. Different normalizations of f are common in the literature, such as 1 f (y)e−ix·y dy, (2π)n/2 n R

f (y)e2πix·y dy, . . . ,

Rn

but the important properties of f are unaffected by normalization, and passing from one normalization to another is easy by scaling. We will often abuse notation by denoting f (x) = f (x)

or f (x) = f (x)

instead of the more cumbersome notations f (y)(x), (f (·))(x), etc. For example, we will do this in Theorem 13.8 when computing the Fourier transform 2 −|x|2 and (e−|x|2 ) are somewhat simpler than of e−|x| since the notations e (e−| · | )(x). Note that f (x) is a formal analogue of the sequence {cj }∞ −∞ of trigonometric Fourier coefficients of a periodic function on the line, with the continuous variable x now playing the role of j. One of our main goals is to derive an analogue of Parseval’s formula (12.15), that is, to prove that the mapping f → f is essentially an isometry on L2 (Rn ). Of course, an important requirement for achieving this is to find an interpretation of f in case f ∈ L2 (Rn ). Unlike the formulas for Fourier coefficients in the one-dimensional periodic case, the integral in (13.1) may not converge absolutely for every f ∈ L2 (Rn ). However, as is easy to see, (13.1) does converge absolutely if f ∈ L1 (Rn ). Properties of f when f ∈ L1 (Rn ) are simpler to derive precisely because of this absolute convergence, and we will 2

371

372

Measure and Integral: An Introduction to Real Analysis

begin with that case. Furthermore, properties of f when f ∈ L1 (Rn ) will be useful later for studying f for other classes of functions f .

13.1 The Fourier Transform on L1 In this section, we list some properties of the Fourier transform of functions in L1 (Rn ). (1) Let f ∈ L1 (Rn ), n ≥ 1. Define f (x) by (13.1) and note that (cf. Exercise 1 of Chapter 8) 1 −ix·y | f (x)| = f (y)e dy n (2π) n R

1 ≤ | f (y)| dy, (2π)n n

x ∈ Rn .

R

Thus, the integral in (13.1) converges absolutely (i.e., exists in the usual Lebesgue sense) for every f ∈ L1 (Rn ) and every x ∈ Rn , and the mapping f → f sends the space L1 (Rn ) into L∞ (Rn ) with f (x)| ≤ (2π)−n f 1 . sup |

(13.2)

x∈Rn

The verification is immediate. (2) The mapping f → f is linear on L1 (Rn ), that is, if f1 , f2 ∈ L1 (Rn ) and c1 , c2 ∈ C (the class of complex numbers), then f1 (x) + c2 f2 (x), c1 f1 + c2 f2 (x) = c1

x ∈ Rn .

(13.3)

We leave the simple proof to the reader. (3) Next, let us show that f (x) is a uniformly continuous function of x on Rn 1 n if f ∈ L (R ). For any x, h ∈ Rn , f (x + h) − f (x) = f (y)e−ix·y e−ih·y − 1 dy (2π)n n R ≤ | f (y)| min{|h||y| , 2} dy Rn

≤

| f (y)| |h||y| dy +

|y| 0 will be chosen momentarily. Let ε > 0. Note that II depends only on N and f , not on x or h, and II → 0 as N → ∞ since f ∈ L1 (Rn ). Fix N so large that II < ε/2. For I, we have ε I≤ | f (y)| |h|N dy ≤ f 1 N|h| < 2 |y|≤N

if |h| < ε/(2 f 1 N). Here, we have assumed that f 1 = 0, but otherwise f ≡ 0 and the result is trivial. Hence, I + II < ε uniformly in x and h provided |h| is small, which proves the result. (4) There is a version for the Fourier transform of the Riemann–Lebesgue Theorem 12.21 for Fourier coefficients, namely, f (x) = 0. Theorem 13.4 (Riemann–Lebesgue) If f ∈ L1 (Rn ), then lim |x|→∞

Proof. We will give two proofs. First, for any x ∈ Rn , |x| = 0, we write f (x) = (2π)n

f (y)e−ix·y dy

Rn

=

2

πx πx −iπ |x| f y + 2 e−ix·y e |x|2 dy = − f y + 2 e−ix·y dy. |x| |x| n n

R

R

Adding the first and last formulas gives

πx −ix·y 2 (2π) f (x) = f (y) − f y + 2 e dy |x| n n

R

and 2 (2π) | f (x)| ≤ n

f (y) − f y + πx dy. |x|2 n

R

The last integral tends to 0 as |x| → ∞ by continuity in L1 , Theorem 8.19, and the first proof is complete. We may also proceed by computing the Fourier transform of step functions and using a density argument. Let I = nk=1 [ak , bk ] be any interval in Rn with positive measure |I|. For any x = (x1 , . . . , xn ), by Fubini’s theorem,

374

Measure and Integral: An Introduction to Real Analysis

b1

n

(2π) χI (x) =

bn

···

a1

an

bk n

=

e−ix1 y1 · · · e−ixn yn dy1 · · · dyn

e−ixk t dt =

k=1 ak

n

F[ak ,bk ] (xk ),

k=1

where for any one-dimensional interval [a, b] and any s ∈ R1 , we denote

F[a,b] (s) =

b a

⎧ ⎨b − a if s = 0 e−ist dt = e−isb − e−isa ⎩ if s = 0. −is

iφ iψ 1 The simple inequality |e − e | ≤ |φ − ψ|, φ, ψ ∈ R shows that F[a,b] (s) ≤ b − a. Also, F[a,b] (s) ≤ 2/|s| if s = 0. Since |x| ≤ |x1 | + · · · + |xn |, there exists k0 = 1, . . . , n depending on x such that |x| ≤ n|xk0 |. Combining estimates, if |x| = 0, we obtain n F[a ,b ] (xk ) ≤ (2π)n χI (x) ≤ k k k=1

≤

2n |I| , |x|

n 2 (bk − ak ) |xk0 | k=1 k =k0

where = min(bk − ak ). k

Hence, χI (x) = O(|x|−1 ) as |x| → ∞, and in particular, χI (x) → 0 as |x| → ∞. If |I| = 0, then χI ≡ 0. If f is a linear combination of characteristic functions of intervals in Rn (i.e., if f is a step function), it then follows from (13.3) that f (x) → 0 as |x| → ∞. Finally, given any f ∈ L1 (Rn ) and ε > 0, choose a step function g such that f − g1 < ε (see the comment at the end of the proof of Lemma 7.3 for the existence of such a g). Then for any x ∈ Rn , by (13.2) and (13.3), | f (x)| ≤ | f (x) − g(x)| + | g(x)| = |(f − g)(x)| + | g(x)| ε < + | g(x)|. (2π)n Since g tends to 0 at infinity, so does f , and Theorem 13.4 is again proved. The next three properties show how the Fourier transform interacts with translations, dilations, and rotations in Rn . The proofs of the first two are left as exercises.

375

The Fourier Transform

(5) (Translation) Let f ∈ L1 (Rn ) and h ∈ Rn . Define the translation τh f of f by h as (τh f )(x) = f (x + h). Then ix·h τ f (x), h f (x) = e

x ∈ Rn .

(13.5)

Also,

τh f (x) = f (x + h) = f (x)e−ix·h ,

x ∈ Rn ,

that is, if Eh f is the function defined by Eh f (x) = f (x)e−ix·h , then f =E τh hf . (6) (Dilation) Let f ∈ L1(Rn ) and λ ∈ R1 − {0}. Define the dilation δλ f of f by λ to be the function δλ f (x) = f (λx). Then 1 x δ f λ f (x) = |λ|n λ

1 = δ 1 f (x) , λ |λ|n

x ∈ Rn .

(13.6)

(7) (Rotation) Let O be an orthogonal linear transformation of Rn and set (Of )(x) = f (Ox), x ∈ Rn . If f ∈ L1 (Rn ), then (x) = (O Of f )(x)

= f (Ox) ,

x ∈ Rn .

(13.7)

In fact, (x) = Of

1 f (Oy)e−ix·y dy (2π)n n R

=

dy 1 −1 f (y)e−ix·O y n (2π) n | det O| R

=

1 f (y)e−i(Ox·y) dy (2π)n n R

since | det O| = 1 and x · O−1 y = Ox · OO−1 y = Ox · y by orthogonality. The last expression is f (Ox) by definition, and (13.7) is proved. See Exercise 5 for an analogue of (13.7) for general nonsingular linear transformations of Rn . By definition, a radial function of x is one that depends only on |x|. Thus, f is a radial function on Rn if there is a function g(t), t ≥ 0, such that f (x) = g(|x|) for all x ∈ Rn . Formula (13.7) says that rotations about the origin commute with the Fourier transform. As a consequence, we immediately obtain the next property.

376

Measure and Integral: An Introduction to Real Analysis

(8) The Fourier transform of an integrable radial function is a radial function. It is possible to explicitly compute the Fourier transforms of e−|x| and e−|x| , and the formulas obtained have important applications. The computation of −|x|2 is the simpler of the two, and the result will be used later in order to e −|x| find e . √ 2 2 (9) In shorthand notation, we have e−|x| = (2 π)−n e−|x| /4 , that is, 2

Theorem 13.8

For all x ∈ Rn , 1 −|y|2 −ix·y 1 2 −|x|2 = e e e dy = √ n e−|x| /4 . (2π)n n (2 π) R

Proof. By Fubini’s theorem, 1 −|y|2 −ix·y e e dy (2π)n n R

=

∞ ∞ 1 2 2 · · · e−y1 · · · e−yn e−ix1 y1 · · · e−ixn yn dy1 · · · dyn n (2π) −∞ −∞

=

n ∞ n ∞ 1 −y2 −ixk yk 1 −(t2 +itxk ) k e dy = e dt. k (2π)n (2π)n −∞ −∞ k=1

k=1

2 Writing t2 + itxk = t + i x2k +

x2k 4,

we see that the last expression equals

n ∞ n 1 −(t+ixk /2)2 2 e dt · e−xk /4 . (2π)n −∞ k=1

k=1

Next, we use, without proof, the identities ∞

e−(t+is) dt = 2

−∞

∞

e−t dt = 2

√

π if −∞ < s < ∞.

−∞

The first of these is a corollary of Cauchy’s contour integration theorem; the second one is classical and has been observed in Exercise 11 of Chapter 6. Hence, −|x|2 = e

√ 1 1 2 2 · ( π)n · e−|x| /4 = √ n e−|x| /4 , (2π)n (2 π)

and the proof is complete.

377

The Fourier Transform

By rescaling e−|x| and using the dilation property (13.6), we easily find a function that is equal to a multiple of its own Fourier transform: 2

e−|x|

2 /2

=

1 2 e−|x| /2 , (2π)n/2

x ∈ Rn .

(13.9)

In order to find an analogue in higher dimensions of the one-dimensional Gauss–Weierstrass kernel defined in (9.12), we first consider the kernel K(x) = π−n/2 e−|x| , 2

x ∈ Rn ,

and the corresponding approximation of the identity: Kε (x) = ε−n K(x/ε) =

√

−n −|x|2 /ε2 πε e ,

ε > 0.

√ Note that Rn K(x) dx = 1, again by Exercise 11 of Chapter 6. Setting ε = t, t > 0, we obtain the n-dimensional Gauss–Weierstrass kernel defined by √ 2 W(x, t) = ( πt)−n e−|x| /t ,

x ∈ Rn , t > 0.

(13.10)

Then

W(x, t) dx = 1,

t > 0.

Rn

For x ∈ Rn and t > 0, the convolution Wf (x, t) defined by Wf (x, t) = [f ∗ W(·, t)](x) =

f (x − y)W(y, t) dy

(13.11)

Rn

is called the Gauss–Weierstrass integral of f . The kernel W(x, t) satisfies the heat equation

∂2

∂2 + · · · + ∂x2n ∂x21

W(x, t) = 4

∂ W(x, t) ∂t

(13.12)

in the upper half-space Rn+1 = {(x, t) : x ∈ Rn , t > 0} (see Exercise 6(a)). + Due to the dilation property (13.6) and Theorem 13.8, the Gauss– Weierstrass kernel satisfies

2 W(x, t) = e−t|x| /4 ,

W(x, t) = (2π)−n e−t|x|

2 /4

(13.13)

378

Measure and Integral: An Introduction to Real Analysis

if (x, t) ∈ Rn+1 + , where the Fourier transforms are taken in the x variable. We will use the first of these equations to derive an inversion formula for the Fourier transform, that is, a way to recover f from f . We need the following basic result in order to accomplish this. (10) (Shifting hats) If f , g ∈ L1 (Rn ), then

f (x) g(x) dx =

Rn

f (x) g(x) dx.

(13.14)

Rn

Proof. Note that both integrals in (13.14) are finite; for example, the one on the left side is finite since g ∈ L1 (Rn ) and f ∈ L∞ (Rn ) by (13.2). The formula itself is a corollary of Fubini’s theorem since 1 f (x) g(x) dx = f (y)e−ix·y dy g(x)dx (2π)n n n n R R R 1 −ix·y = f (y) g(x)e dx dy (2π)n n Rn R = f (y) g(y) dy,

Rn

where the change in the order of integration is justified because f (y)g(x) ∈ L1 (R2n , dydx). (11) We have the following inversion result (see also (13.28)). Theorem 13.15 (Inversion of the Fourier transform on L1 ) If f ∈ L1 (Rn ), then at every point x of the Lebesgue set of f ,

f (x) = lim

t→0+

2 f (y)eix·y e−t|y| dy.

(13.16)

Rn

If in addition f ∈ L1 (Rn ), then at every Lebesgue point x of f , f (x) =

f (y)eix·y dy.

(13.17)

Rn

In particular, (13.16) and (13.17) hold a.e. in Rn and at every point of continuity of f .

379

The Fourier Transform

Note that (13.17) can be rewritten as

f (−x) = (2π)n f (−x) . f (x) = (2π)n

(13.18)

Before proving Theorem 13.15, we mention that the periodic analogue of (13.17) is that if the sum of the Fourier coefficients of a periodic f ∈ L1 (−π, π) converges absolutely, then S[f ], the Fourier series of f , converges to f at every Lebesgue point of f . This is a corollary of the first part of Lebesgue’s Theorem 12.51 (together with Theorem 12.38) since S[f ] converges everywhere if the sum of the Fourier coefficients of f converges absolutely. Proof. To prove Theorem 13.15, let f ∈ L1 (Rn ) and write

Wf (x, t) =

f (x − y)W(y, t) dy =

Rn

f (x + y)W(y, t) dy,

Rn

where the last equality is true since W(y, t) is an even function of y. By (13.13) and (13.14), the last integral equals

2 −t|y|2 /4 (τx f )(y) e−t|y| /4 dy = τ dy x f (y) e

Rn

=

Rn 2 f (y)eix·y e−t|y| /4 dy

by (13.5).

Rn

In summary, we have Wf (x, t) =

2 f (y)eix·y e−t|y| /4 dy for all (x, t) ∈ Rn+1 + .

(13.19)

Rn

Now let t → 0+. Due to the convolution representation of Wf (x, t) and Theorem 9.13, Wf (x, t) converges to f (x) at every Lebesgue point x of f . The same is true if t is replaced by 4t, and consequently, (13.16) follows from (13.19). Assuming in addition that f ∈ L1 (Rn ), we easily obtain from Lebesgue’s dominated convergence theorem that, as t → 0, the integral on f (y)eix·y dy. Formula (13.17) now follows at the right in (13.19) tends to Rn every Lebesgue point of f , and the proof is complete. The integrals in (13.16), namely, Rn

2 f (y)eix·y e−t|y| dy,

x ∈ Rn , t > 0,

380

Measure and Integral: An Introduction to Real Analysis

are referred to as the Gauss–Weierstrass means of the Fourier inversion integral ix·y dy, although the latter integral may not exist in the Lebesgue sense Rn f (y)e 1 if f ∈ L (Rn ). In general, the integrals

g(y)e−t|y| dy 2

Rn

are called the Gauss–Weierstrass means of Rn g and may exist and be finite for every t > 0 while Rn g may not. The analogous expressions

g(y)e−t|y| dy,

t > 0,

Rn

are called the Abel means of

Rn

g (see (13.28)).

(12) An immediate corollary of (13.16) in Theorem 13.15 is f (x) = g(x) for all x ∈ Rn , Corollary 13.20 (Uniqueness) If f , g ∈ L1 (Rn ) and n 1 n then f = g a.e. in R . In particular, if f ∈ L (R ) and f = 0 everywhere in Rn , then f = 0 a.e. in Rn . (13) Another corollary of Theorem 13.15 is the following simple sufficient condition for integrability of f. Corollary 13.21 Suppose that f ∈ L1 (Rn ), f is continuous at x = 0 and f ≥0 everywhere. Then f ∈ L1 (Rn ) and f 1 = f (0). Proof. Under the hypotheses, we may set x = 0 in (13.16) to obtain f (0) = lim

t→0+

2 f (y)e−t|y| dy.

Rn

2 f (y)e−t|y| dy is bounded in t for t > 0. Its inteIn particular, the integral Rn grand is nonnegative since f ≥ 0 by assumption. Letting t → 0, Fatou’s lemma then implies that f is integrable. Finally, (13.17) with x = 0 gives f (0) =

Rn

completing the proof.

f (y) dy = f 1 ,

381

The Fourier Transform

(14) Next, we compute the Fourier transform of e−|x| in Rn , n ≥ 1. Let

(α) =

∞

sα−1 e−s ds,

α > 0,

0

denote the classical Gamma function.

Theorem 13.22

For all x ∈ Rn and ε > 0, n+1 ε 2 −ε|x| e = n+1 n+1 . π 2 ε2 + |x|2 2

Proof. Because of the dilation property (13.6), it is enough to prove the result in case ε = 1. We consider first the one-dimensional case, n = 1, where the computation is simplest. Let x ∈ (−∞, ∞). Then −|x| = 2πe

∞

e−|t| e−ixt dt

−∞

=

∞

e

−t−ixt

dt +

∞

et−ixt dt

−∞

0

=

0

e−(1+ix)t dt +

0

∞

e−(1−ix)t dt

0

e−(1−ix)t e−(1+ix)t − = − 1 + ix 1 − ix =

∞ t=0

1 2 1 + = . 1 + ix 1 − ix 1 + x2

Hence, 1 1 −|x| = e , π 1 + x2

x ∈ (−∞, ∞),

and we are done since when n = 1, then ((n + 1)/2) = (1) = 1.

(13.23)

382

Measure and Integral: An Introduction to Real Analysis

Combining (13.23) with the inversion formula (13.17) in case n = 1, we obtain e−|x| =

∞ ∞ 1 1 1 1 ixt e dt = e−ixt dt, π −∞ 1 + t2 π −∞ 1 + t2

x ∈ (−∞, ∞).

Next, with the purpose of introducing an exponential factor e−t in the computation, we write 2

1 2 = e−(1+t )s ds. 2 1+t ∞

0

Consequently,

e−|x|

⎛ ⎞ ∞ ∞

1 ⎝ − 1+t2 s ⎠ −ixt = e ds e dt, π −∞

x ∈ (−∞, ∞).

(13.24)

0

Now consider the higher dimensional case: −|x| = (2π)n e

e−|y| e−ix·y dy,

x ∈ Rn .

Rn

On the right side, express the factor e−|y| by using (13.24) with x there chosen to be |y|, obtaining

e

−|y|

⎞ ⎛ ∞ ∞

1 ⎝ − 1+t2 s ⎠ −i|y|t = e ds e dt π −∞ 0 ∞ ∞ 1 −s −t2 s −i|y|t = e e e dt ds π −∞ 0

1 −s √ −|y|2 /(4s) 1 = e πe √ ds, π s ∞

0

where the last equality follows by dilation from the one-dimensional version of Theorem 13.8. By substitution, we obtain that for any x ∈ Rn ,

383

The Fourier Transform ⎞ ∞ √ 1 ds 2 −|x| = ⎝ (2π)n e e−s π e−|y| /(4s) √ ⎠ e−ix·y dy π s Rn 0 ∞ 1 e−s −|y|2 /(4s) −ix·y =√ e e dy ds √ π s n

⎛

R

0

∞ 1 e−s √ n √ n −s|x|2 =√ π 2 s e ds, √ π s 0

by another application of Theorem 13.8, this time in the n-dimensional form. Regrouping terms gives −|x| = 2n (2π)n e

1+|x|2 s

√ n−1 ∞ − π e

s

n−1 2

ds

0

=

2n

√ n−1 ∞ π

(1 + |x|2 )

n+1 2

e−s s

n−1 2

ds.

0

n+1 2 , and therefore n+1 1 2 −|x| e = , n+1 n+1 π 2 (1 + |x|2 ) 2

The last integral equals

x ∈ Rn ,

as desired. This completes the proof of Theorem 13.22. n+1 (15) (The Poisson Integral in Rn+1 = {(x, ε) : x ∈ Rn , + ) For (x, ε) ∈ R+ ε > 0}, we denote

n+1 ε 2 −ε|x| = P(x, ε) = e n+1 n+1 , π 2 ε2 + |x|2 2

(13.25)

and call P(x, ε) the Poisson kernel for the half-space Rn+1 + . The case n = 1 is considered in (9.10), and the periodic version is defined in Section 12.7 on p. 348. As a function of x, P(x, ε) is clearly positive and of class L1 (Rn ) ∩ L∞ (Rn ) for each ε > 0. Also, as a function of (x, ε), it is infinitely differentiable in n+1 R+ . By direct computation, it satisfies Laplace’s equation

∂2

∂2 ∂2 + · · · + + ∂x2n ∂ε2 ∂x21

P(x, ε) = 0

in Rn+1 + .

(13.26)

384

Measure and Integral: An Introduction to Real Analysis

Moreover, P(x, ε) is an approximation of the identity since if P(x) is defined by P(x) = P(x, 1), then P(x, ε) = ε−n P(x/ε), and by (13.17),

P(x, 1) dx =

Rn

−|x| −|x| e dx = e

Rn

= 1. x=0

Hence, since P(x, 1) = O(1/|x|n+1 ) as |x| → ∞, Theorem 9.13 implies that the Poisson integral of f , defined to be the convolution Pf (x, ε) =

f (x − y)P(y, ε) dy,

(13.27)

Rn

converges to f (x) as ε → 0 at each point x of the Lebesgue set of f if f ∈ L1 (Rn ). If f ∈ Lp (Rn ), p > 1, the same is true (cf. Exercise 12 of Chapter 9). Note that Pf (x, ε) =

−ε|y| dy = f (x − y)e

Rn

=

−ε|y| dy f (x + y)e

Rn

f (y)eix·y e−ε|y| dy

if f ∈ L1 (Rn ).

Rn

Hence, if f ∈ L1 (Rn ) and x is a Lebesgue point of f , then f (x) = lim

ε→0+

f (y)eix·y e−ε|y| dy.

(13.28)

Rn

Recall that the integrals

f (y)eix·y e−ε|y| dy,

ε > 0,

Rn

are called the Abel means of the (formal) integral We also have by inversion that P(x, ε) = e−ε|x| ,

Rn

f (y)eix·y dy.

x ∈ Rn , ε > 0,

(13.29)

where the Fourier transform is taken in the x variable. (16) (The Convolution Property) By Young’s theorem, the convolution of any two integrable functions is also integrable and therefore has a well-defined Fourier transform.

385

The Fourier Transform

Theorem 13.30

If f , g ∈ L1 (Rn ), then f (x) g(x) f ∗ g(x) = (2π)n

for all x ∈ Rn .

This follows easily from Fubini’s theorem; we omit the details. In case n = 1, the result is listed in Exercise 6 Chapter 6. See also Theorem 13.59 for an important related result. (17) (Differentiation Properties) We continue our list of properties of the Fourier transform with two important differentiation formulas.

Theorem 13.31 (a) Let xk , k = 1, . . . , n, denote the kth coordinate of x, x ∈ Rn . If f and xk f (x) belong to L1 (Rn ), then f has a partial derivative with respect to xk everywhere in Rn , and ∂ f (x) = − ixk f (x) ∂xk

−ix·y 1 − iyk f (y) e dy . = (2π)n n

(13.32)

R

(b) Fix k = 1, . . . , n and let h = (0, . . . , 0, hk , 0, . . . , 0) = 0 lie on the kth coordinate axis. Suppose that f ∈ L1 (Rn ) and that there is a function g such that f (x + h) − f (x) lim − g(x) dx = 0. h h →0 k

(13.33)

k

Rn

Then g ∈ L1 (Rn ) and g(x) = ixk f (x) for all x ∈ Rn .

(13.34)

A few remarks about condition (13.33) are listed after the proof of the theorem. Proof. (a) Denoting nonzero points on the kth coordinate axis by h = (0, . . . , 0, hk , 0, . . . , 0), we have e−ihk yk − 1 −ix·y f (x + h) − f (x) 1 = f (y) dy. e hk (2π)n n hk R

386

Measure and Integral: An Introduction to Real Analysis

Since the expression in curly brackets converges to −iyk as hk → 0 and is bounded in absolute value by |yk |, formula (13.32) follows from the Lebesgue dominated convergence theorem and the hypothesis that f (y)yk is integrable. (b) If g satisfies (13.33), then g is clearly integrable since f is. With h = (0, . . . , 0, hk , 0, . . . , 0) as usual, (13.33) together with (13.2) implies the pointwise equality

f (x + h) − f (x) = g(x), or hk hk →0 1 τh f − f (x) = g(x), x ∈ Rn . lim hk →0 hk lim

Equivalently, by (13.3) and (13.5), for all x,

eixk hk − 1 f (x) lim hk hk →0

= g(x),

and (13.34) follows. We now make some remarks related to (13.33). Given f , a function g that satisfies (13.33) is clearly unique up to a set of Lebesgue measure 0. If such a g exists, it is called the partial derivative of f with respect to xk in the L1 sense, the idea being that the difference quotient of f in the kth coordinate converges in the L1 norm. A simple case when (13.33) holds with g equal to the ordinary partial derivative ∂f /∂xk of f is when f ∈ C10 (Rn ), since then both f and ∂f /∂xk have compact support and the difference quotient of f in the variable xk converges uniformly to ∂f /∂xk . Hence, by (13.34), for every k = 1, . . . , n, we have

∂f ∂xk

f (x), (x) = ixk

x ∈ Rn , if f ∈ C10 (Rn ).

(13.35)

As we will see in Theorem 13.41, the restriction in (13.35) that f ∈ C10 (Rn ) can be considerably weakened. By iteration, (13.35) leads easily to analogous formulas for higher-order derivatives. For example, if f ∈ C20 (Rn ) and j, k = 1, . . . , n, then

∂ 2f ∂xj ∂xk

(x) = ixj

∂f ∂xk

(x) = (ixj ) (ixk ) f (x).

387

The Fourier Transform

More generally, if α = (α1 , . . . , αn ) is a multi-index of nonnegative integers and Dα is the differential operator defined by Dα =

∂ α1 ∂ αn , α1 · · · n ∂xα ∂x1 n

then for any N = 1, 2, . . . and any α with α1 + · · · + αn ≤ N, α f (x) = (ix )α1 · · · (ix )αn D f (x) n 1

n if f ∈ CN 0 (R ).

(13.36)

A higher-order analogue of part (a) of Theorem 13.31 is that if N = 1, 2, . . . and f has the property that p(x)f (x) ∈ L1 (Rn ) for every polynomial p(x) of degree N (or less), then f ∈ CN (Rn ) and α f (x) = [(−ix1 )α1 · · · (−ixn )αn f (x)] if α1 + · · · + αn ≤ N. D

(13.37)

Verification is left to the reader. In particular, (13.36) and (13.37) hold for derivatives of any order if n f ∈ C∞ 0 (R ). n Note that if f ∈ CN 0 (R ) for some N, then for every multi-index α = (α1 , . . . , αn ) with α1 + · · · + αn ≤ N, the fact that Dα f ∈ L1 (Rn ) implies α f is a bounded function on Rn . Hence, by (13.36), the function that D α 1 (ix1 ) · · · (ixn )αn f (x) is bounded in x if α1 + · · · + αn ≤ N. Consequently, | f (x)| ≤ C

1 (1 + |x|)N

n if f ∈ CN 0 (R ).

α f tends to 0 at infinity, fˆ (x) = o(|x|−n ) as |x| → ∞. Roughly Also, since D speaking, this means that the smoother a function with compact support is, the faster its Fourier transform tends to 0 at infinity. In particular, by choosing N = n + 1, we obtain a simple sufficient condition for f to be integrable, namely,

n 1 n Corollary 13.38 If f ∈ Cn+1 0 (R ), then f ∈ L (R ). In particular, the inversion n formula (13.17) holds everywhere in R for such f .

The class S = S (Rn ) of Schwartz functions, or rapidly decreasing functions, on Rn is defined to be the collection of all f ∈ C∞ (Rn ) such that p(x)Dα f (x) is bounded on Rn for every polynomial p(x) and every multi-index α. The bound may vary with α and the polynomial p. Thus, if xβ denotes the

388

Measure and Integral: An Introduction to Real Analysis

β

β

monomial xβ = x1 1 · · · xn n , where β = (β1 , . . . , βn ) and each βk is a nonnegative integer, then S = f ∈ C∞ (Rn ) : sup xβ Dα f (x) < ∞ for all α, β .

(13.39)

x∈Rn

n −|x| belongs Clearly, C∞ 0 (R ) ⊂ S , but the containment is proper since e −|x| is not differento S but does not have compact support. The function e tiable at the origin, and so it does not belong to S . The functions 1, |x|2 , and 2 e−1/|x| (defined to be 0 at the origin) are simple examples of infinitely differentiable functions that do not belong to S . Note that S ⊂ Lp (Rn ) for every p, 0 < p ≤ ∞. Also, if f ∈ S , then p(x)Dα f (x) ∈ S for every polynomial p(x) in Rn and every multi-index α. We leave the proof of the next result to the reader (see Exercise 8). 2

Theorem 13.40 If f ∈ S , then f ∈ S . Also, if f ∈ S , then the formulas in (13.36) and (13.37) hold for all α, that is,

α f (x) = (ix)α D f (x) and Dα f (x) = [(−ix)α f (x)] for all x ∈ Rn and all α. n n In particular, if f ∈ C∞ 0 (R ), then f ∈ S (R ). On the other hand, the only ∞ n function f in C0 (R ) such that f has compact support is f ≡ 0 (see Exercise 28). Theorem 13.40 shows that the formula (∂f /∂xk )(x) = ixk f (x) is valid if f ∈ S , and we observed in (13.35) that it is also true if f ∈ C10 (Rn ). Let us now show that it holds for a much larger class of functions. Recall that (by Theorem 7.29) an absolutely continuous function F(t), t ∈ [a, b] ⊂ R1 , has a b first derivative F a.e. in [a, b] and F(b) − F(a) = a F (t) dt.

Theorem 13.41 Let k = 1, . . . , n and f ∈ L1 (Rn ). Suppose that f is locally absolutely continuous on every line parallel to the kth coordinate axis, that is, suppose that f (x1 , . . . , xk−1 , xk , xk+1 , . . . , xn ), when considered as a function of xk , is absolutely continuous on every interval −∞ < a ≤ xk ≤ b < ∞ for all x1 , . . . , xk−1 , xk+1 , . . . , xn . If ∂f /∂xk ∈ L1 (Rn ), then

∂f ∂xk

f (x), (x) = ixk

x ∈ Rn .

In particular, the formula holds for every k = 1, . . . , n and every f such that f ∈ L1 (Rn ) ∩ Liploc (Rn ) and |∇f | ∈ L1 (Rn ).

389

The Fourier Transform

Proof. The second statement of the theorem (concerning the assumption that f is locally Lipschitz continuous) follows from the first statement since if f is locally Lipschitz continuous, then it is locally absolutely continuous on every line. To prove the first statement, let f satisfy its hypothesis. By Theorem 13.31(b), it is enough to verify (13.33) with g equal to ∂f /∂xk , that is, to show that ∂f /∂xk is the partial derivative of f with respect to xk in the L1 sense. We will show this only when n ≥ 2 and k = 1, leaving the remaining cases for the reader to check. If x = (x1 , x2 , . . . , xn ), denote x = (x1 , xˆ ) with xˆ = (x2 , . . . , xn ) ∈ Rn−1 . By hypothesis, for every xˆ , f (x1 , xˆ ) is an absolutely continuous function of x1 on every compact interval in R1 . Hence, by Theorem 7.29, for every x ∈ Rn and every h = (h1 , 0, . . . , 0) ∈ Rn , we have 1 ∂f f (x + h) − f (x) = f x1 + h1 , xˆ − f (x1 , xˆ ) = x1 + t, xˆ dt. ∂x1 h

0

Thus, if h1 = 0,

h1 f (x + h) − f (x) ∂f 1 ∂f ∂f − (x) = x1 + t, xˆ − x1 , xˆ dt. h1 ∂x1 h1 ∂x1 ∂x1 0

Taking the L1 norm in x, and denoting t = (t, 0, . . . , 0), we obtain h1 ∂f f (x + h) − f (x) ∂f ∂f dx ≤ 1 − (x) (x + t) − (x) h1 ∂x1 h1 ∂x1 ∂x1 n

R

0

dt. 1

Now let h1 → 0. By continuity in L1 (Theorem 8.19), the integrand in the last integral tends to 0 as t → 0, that is, ∂f ∂f (x + t) − (x) ∂x1 ∂x1

→ 0 as t → 0, 1

and the result follows. (18) (The Principal Value Fourier Transform of x−1 ) We close this section with the computation in one dimension of a different notion of the Fourier transform, which we call the principal value Fourier transform, of the function 1/x. Since 1/x fails to be integrable in R1 , both near the origin and near infinity,

390

Measure and Integral: An Introduction to Real Analysis

its Fourier transform cannot be defined by using the one-dimensional version of (13.1). Furthermore, since 1/x does not belong to L2 near the origin, the method developed in the next section for extending the definition of the Fourier transform to L2 functions will not apply to 1/x. However, as we will see, the symmetry and oddness of 1/x can be exploited in order to define its Fourier transform as a limit, denoted p.v. (x−1 ) and defined by p.v.

1 1 = lim x ε→0+ 2π ω→∞

ε1

| f (y)|

1 |y|n−α

dy < ∞,

444

Measure and Integral: An Introduction to Real Analysis

then Iα f exists and is finite a.e. in Rn and ⎧ n ⎫ ⎨ Iα f (x) − [Iα f ] n−α ⎬ 1 B exp c1 dx ≤ c2 ⎩ ⎭ |B| f n/α B

for every ball B ⊂ Rn , with c1 and c2 independent of f and B. See also Exercise 18.

14.5 Bounded Mean Oscillation In this section, we will study functions that satisfy (14.9) with B0 replaced by Rn for the constant functional a(B) = c; cf. (14.11) with exponent β = 0. Such functions have a remarkable local exponential integrability property discovered by F. John and L. Nirenberg. We begin with some standard terminology. A locally integrable function f on Rn is said to be of bounded mean oscillation on Rn (or to be a BMO function on Rn ) if there is a constant c ≥ 0 such that for every ball B ⊂ Rn , 1 f (x) − fB dx ≤ c. |B|

(14.47)

B

As usual, fB denotes the average |B|−1 B f . The collection of all such f is denoted BMO(Rn ) and called the class of functions of bounded mean oscillation on Rn . Equivalently, if we denote f ∗ = sup B

1 f (x) − fB dx, |B|

(14.48)

B

where the supremum is taken over all balls B ⊂ Rn , then f ∈ BMO(Rn ) means that f is locally integrable and f ∗ < ∞. Note that if f and g are any two locally integrable functions, then f +g ∗ ≤ f ∗ + g ∗ , and cf ∗ = |c| f ∗ for any constant c. However, · ∗ is not a norm in the usual sense since it vanishes on constant functions. The next lemma shows that if f is measurable and (14.47) holds with the average fB replaced by a different constant depending on B, then f belongs to BMO(Rn ).

445

Fractional Integrals

Lemma 14.49 such that

Let f be a measurable function on Rn . If there is a constant C 1 | f (x) − c(f , B)| dx ≤ C |B| B

for every ball B ⊂ Rn and for some constant c(f , B) depending on f and B, then f ∈ BMO(Rn ). Moreover, f ∗ ≤ 2C. Proof. If f satisfies the hypothesis, then f is clearly locally integrable by the triangle inequality. Furthermore, for any ball B, 1 1 f (x) − fB dx ≤ | f (x) − c(f , B)| dx + c(f , B) − fB |B| |B| B B ≤ C + c(f , B) − fB , where c(f , B) and C are as in the hypothesis. Also, 1 c(f , B) − fB = f (x) − c(f , B) dx |B| B

1 ≤ | f (x) − c(f , B)| dx ≤ C. |B| B

Therefore, f ∗ ≤ C + C = 2C, and the lemma is proved. A simple corollary of Lemma 14.49 is that balls can be replaced by cubes in the definition of BMO(Rn ). More precisely, suppose that f is locally integrable and satisfies the analogue of condition (14.47) for cubes, that is suppose that f ∗∗ := sup

1 f (x) − fQ dx < ∞, |Q| Q

where the supremum is taken over all cubes Q with edges parallel to the coordinate axes, and fQ = |Q|−1 Q f . Then, given any ball B, by enclosing B in a cube Q with |Q| ≤ cn |B|, we obtain f − fQ dx ≤ f − fQ dx ≤ f ∗∗ |Q| ≤ cn f ∗∗ |B|. B

Q

It follows from Lemma 14.49 with c(f , B) there chosen to be fQ that f ∈ BMO(Rn ) and f ∗ ≤ 2cn f ∗∗ . The converse is also true, that is, the definition of BMO(Rn ) using balls implies the analogous definition using cubes,

446

Measure and Integral: An Introduction to Real Analysis

and f ∗∗ ≤ c f ∗ for some constant c that depends only on n; we leave the verification as an exercise. Our main goal is to study the “size” of functions of bounded mean oscillation. To set the stage, we begin by listing a few examples. First, it is easy to see that L∞ (Rn ) ⊂ BMO(Rn ). In fact, if f ∈ L∞ (Rn ), then f is clearly locally integrable, and for any ball B, 1 | f (x) − fB | dx ≤ f − fB ∞ |B| B

≤ f ∞ + | fB | ≤ 2 f ∞ . Hence, f ∈ BMO(Rn ) and f ∗ ≤ 2 f ∞ . However, the containment L∞ (Rn ) ⊂ BMO(Rn ) is a proper containment. For example, the (essentially) unbounded function log |x| is of bounded mean oscillation on Rn ; see Exercise 20. We also leave it as an exercise to check the / BMO(Rn ); and there following two facts: if −∞ < λ < ∞ and λ = 0, then |x|λ ∈ p are functions with compact support that belong to L (Rn ) for all p, 0 < p < ∞, but do not belong to BMO(Rn ). See Exercises 21 and 22. Another subclass of BMO(Rn ) is the collection of all functions I α f defined in (14.42) when f ∈ Ln/α (Rn ), 0 < α < n. This follows immediately from the exponential integrability estimate in Theorem 14.44 and the simple inequality t ≤ exp (tγ ), t ≥ 0, γ ≥ 1. See also Exercise 27. All the examples given so far of BMO(Rn ) functions are locally exponentially integrable in the sense that there are positive constants c and γ such that

exp c| f (x)|γ dx < ∞ for every ball B ⊂ Rn .

B

A natural question is whether the same is true for every f ∈ BMO(Rn ). If γ > 1, the answer is no, and an example is f (x) = log |x|; for example, if n = 1 and γ > 1, then for any c > 0, 1/2 0

" ! ∞ γ 1 γ dx = exp c log ecu e−u du = +∞. x ln 2

However, the answer is yes if γ = 1. This is a corollary of the following basic fact, which is the main result of the section.

Theorem 14.50 (John–Nirenberg) There are positive constants c1 and c2 depending only on n such that if f ∈ BMO(Rn ), B is a ball in Rn and λ > 0, then

447

Fractional Integrals ! " x ∈ B : f (x) − fB > λ ≤ c1 exp − c2 λ |B|. f ∗

(14.51)

Here, we have assumed that f ∗ = 0; otherwise, f is constant a.e. in Rn and the left side of (14.51) is zero for all B and all λ > 0. Before giving a proof, let us deduce two corollaries of the theorem. Corollary 14.52 Let f ∈ BMO(Rn ) and 1 ≤ p < ∞. There is a positive constant c depending only on n and p such that for every ball B ⊂ Rn ,

1/p p 1 f − fB dx ≤ c f ∗ . |B| B

p

In particular, f ∈ Lloc (Rn ), and for every ball B,

1/p 1 p | f | dx ≤ c f ∗ + | fB | |B| B

Proof. To prove the corollary, we compute (cf. Exercise 16 of Chapter 5) ∞ p 1 p p−1 f − fB dx = x ∈ B : f (x) − fB > λ dλ λ |B| |B| B

0

≤

p |B|

∞ 0

= pc1

" ! c2 λ |B| dλ λp−1 c1 exp − f ∗

f ∗ c2

p ∞

by (14.51)

λp−1 e−λ dλ.

0

The first part of the corollary now follows by taking pth roots, and the second then follows from Minkowski’s inequality. Note that the integral ∞ part p−1 e−λ dλ is the classical gamma function (p). λ 0 The proof shows that the constant c can be chosen to satisfy p

c =

−p pc1 c2

∞

λp−1 e−λ dλ

0 −p

≥ pc1 c2

∞ p

−p

pp−1 e−λ dλ = pc1 c2 pp−1 e−p ,

448

Measure and Integral: An Introduction to Real Analysis

whose pth root is of order O(p1/p ) → ∞ as p → ∞. The constant c in the conclusion of the corollary must tend to ∞ as p → ∞ since functions in BMO(Rn ) may not be locally essentially bounded. Corollary 14.53 Let c1 and c2 be as in Theorem 14.50. If f ∈ BMO(Rn ) and c0 is a positive constant such that c0 f ∗ < c2 , then 1 c0 c1 exp c0 f − fB dx ≤ 1 + |B| c2 − c0 f ∗ B

for every ball B ⊂ Rn . In particular,

B exp{c0 | f |} dx

< ∞ for every ball B ⊂ Rn .

Proof. The result can be deduced from (14.51) combined with the following formula (see Exercise 29 of Chapter 5):

∞ exp c0 | f − fB | dx = |B| + c0 ec0 λ | x ∈ B : | f (x) − fB | > λ | dλ.

B

0

Further details are left to the reader. Theorem 14.50 will be proved in several steps. The main step is to derive an analogue of (14.51) for n-dimensional cubes Q with edges parallel to the coordinate axes, that is, to show that ! " x ∈ Q : f (x) − fQ > λ ≤ c1 exp − c2 λ |Q| (14.54) f ∗∗ for all such Q and all λ > 0, and now with c1 = 4 and c2 = 2−n−1 ln 2. As usual, we assume that f ∗∗ = 0 since otherwise there is nothing to prove. The proof of (14.54) will be based on the following n-dimensional version of the Decomposition Lemma 12.68 (cf. the second remark on p. 353 in Section 12.8). All cubes considered below are assumed to have edges parallel to the coordinate axes. Lemma 14.55 (Decomposition Lemma in R n ) Let Q be a cube in Rn and suppose that f ∈ L1 (Q) and f ≥ 0. Then for any real number s satisfying s≥

1 f, |Q| Q

there are nonoverlapping cubes Q1 , Q2 , . . . contained in Q such that

449

Fractional Integrals

1 |Qk | Qk

f ≤ 2n s for all k, ' (ii) f (x) ≤ s for a.e. x ∈ Q − Qk , ( 1 ' 1 (iii) k |Qk | ≤ s Qk f ≤ s Q f . (i) s

s. Using the hypothesis that |Q| Q f ≤ s together with the fact that |Q| = 2n |Q |, we have for each Q that either 1 f ≤s |Q |

or s

λ ⊂ x ∈ Qk : f (x) − fQ > λ .

(14.57)

k

We have 1 1 fQ − fQ = ≤ f − fQ ≤ 2n s f − f Q k |Q | k Q |Qk | Q k

k

by (14.56). Therefore, for any x ∈ Q, f (x) − fQ ≤ f (x) − fQ + fQ − fQ ≤ f (x) − fQ + 2n s. k k k f (x) − fQ exceeds λ, then by subtracting 2n s from both sides, If the left side we obtain f (x) − fQk > λ − 2n s. Hence, by (14.57), if λ > s, then x ∈ Q : f (x) − fQ > λ ≤ x ∈ Qk : f (x) − fQ > λ − 2n s . k k

451

Fractional Integrals

If λ > 2n s, it follows that |{x ∈ Q : | f (x) − fQ | > λ}| 1 ≤ F(λ − 2n s) |Qk | |Q| |Q| k

1 1 F(λ − 2n s) , | f − fQ | ≤ ≤ F(λ − 2n s) s |Q| s Q

where we have used property (iii) in Lemma 14.55 (applied to f − fQ ) in order to obtain the next-to-last estimate. Therefore, F(λ) ≤

F(λ − 2n s) s

if s ≥ 1 and λ > 2n s.

This inequality will now be iterated in order to prove the estimate F(λ) ≤ 4 exp − 2−n−1 ln 2 λ , λ > 0. Let s=2

γ = 2n s = 2n+1 .

and

Fix λ with λ > γ and choose m = 0, 1, 2, . . . such that (m + 1)γ < λ ≤ (m + 2)γ. Then if m = 0, λ > λ − γ > λ − 2γ > · · · > λ − mγ > γ, and we obtain F(λ) ≤ 2−m F(λ − mγ) after m iterations, even if m = 0. The trivial estimate F(λ − mγ) ≤ 1 together with m ≥ (λ/γ) − 2 then gives λ

F(λ) ≤ 2−m ≤ 22− γ = 4e−

ln 2 γ

λ

if λ > γ. ln 2

Finally, if 0 < λ ≤ γ, then it is also true that F(λ) ≤ 4e− γ λ since F(λ) ≤ 1. This proves (14.54) for all λ > 0 and for all Q and f . To complete the proof, it remains only to deduce inequality (14.51) for balls from its analogue (14.54) for cubes. The ideas needed are like those in the first paragraph after the proof of Lemma 14.49, and we will be brief. We will use the letters c, c1 , c2 to denote various positive constants depending only on n, which may be different at each occurrence. Let B be a ball and f ∈ BMO(Rn ), f ∗ = 0. Choose a cube satisfying B ⊂ Q and |Q| ≤ c |B|. As usual, f − fQ ≤ c f − fQ ≤ c f ∗∗ and so fB − f Q ≤ 1 |B| |Q| B Q f (x) − fQ ≥ f (x) − fB − c f ∗∗ , x ∈ Rn .

452

Measure and Integral: An Introduction to Real Analysis

Then if λ is much larger than f ∗∗ , for example, if λ/2 > c f ∗∗ , we have x ∈ B : f (x) − fB > λ ≤ x ∈ Q : f (x) − fQ > λ/2 ! " λ 1 ≤ c1 exp −c2 |Q| by (14.54). 2 f ∗∗ Hence, since |Q| ≤ c |B| and c−1 f ∗∗ ≤ f ∗ ≤ c f ∗∗ , ! " x ∈ B : | f (x) − fB | > λ ≤ c1 exp −c2 λ |B|, f ∗

λ > c f ∗ .

However, for the remaining values of λ, that is, when 0 < λ ≤ c f ∗ , such an inequality is obvious, and Theorem 14.50 follows.

Exercises 1. Show that if f ∈ C1 (B), where B denotes the closure of a ball B, then the inequalities in (14.3) and (14.4) are true for all x ∈ B. 2. Under the same assumptions as in Theorem 14.2, show that both of the estimates (14.3) and (14.4) can be improved by replacing the integral on their right-hand sides by n |∇f (y) · (x − y)| dy ≤ |x − y|n i=1 B

B

∂f |xi − yi | (y) ∂y |x − y|n dy. i

3. Derive the L1 , L1 Poincaré inequality (14.8) directly by the same method used to prove the subrepresentation Theorem 14.2, instead of obtaining it as a corollary of Theorem 14.2. 4. Let B1 and B2 be balls, with B1 ⊂ Rn1 and B2 ⊂ Rn2 , n1 , n2 ≥ 1. If f (x1 , x2 ) ∈ C1 (B1 × B2 ) ∩ L1 (B1 × B2 ), show that for every (x1 , x2 ) ∈ B1 × B2 , f (x1 , x2 ) − fB ×B ≤ c 1 2

∇y f y1 , y2 k1 x1 , x2 ; y1 , y2 dy1 dy2 1

B1 ×B2

+c

B1 ×B2

∇y f y1 , y2 k2 x1 , x2 ; y1 , y2 dy1 dy2 , 2

453

Fractional Integrals

where ki x1 , x2 ; y1 , y2 =

|xi − yi | 1 , |B1 | |B2 | |x1 −y1 | + |x2 −y2 | n1 +n2 r(B1 )

i = 1, 2,

r(B2 )

fB1 ×B2 = (|B1 | |B2 |)−1 B1 ×B2 f , and c is a constant that is independent of f , x1 , x2 , B1 , and B2 . Note that each integral is at most a multiple of

∇y f (y1 , y2 ) r(B1 ) + ∇y f (y1 , y2 ) r((B2 ) 1 2

B1 ×B2

× K x1 , x2 ; y1 , y2 dy1 dy2

where K x1 , x2 ; y1 , y2 =

1 |B1 | |B2 | |x1 −y1 | r(B1 )

1

+

|x2 −y2 | n1 +n2 −1 r(B2 )

.

5. Let Q be a closed cube in Rn , n > 1. Show that if f ∈ C1 (Q), then f (x) − fQ ≤ cn Q

|∇f (y)| dy, |x − y|n−1

x ∈ Q,

where fQ = |Q|−1 Q f and cn depends only on n. Also, by choosing Q to be the unit cube centered at *the origin and making an affine change of variables, show that if I = ni=1 [ai , bi ] is an interval in Rn and f ∈ C1 (I), then for all x ∈ I, f (x) − fI ≤ cn 1 |I| I

n i=1

∂f (y) (bi − ai ) (n ∂yi

1 |xi −yi |

n−1 dy,

i=1 bi −ai

where fI = |I|−1 I f . *n n 1 6. Let I = i=1 [ai , bi ] be an interval in R and suppose that f ∈ C (I). Show that

n | f (x) − fI | dx ≤ cn (bi − ai ) I i=1

I

where fI = |I|−1

I

∂f ∂x (x) dx, i

f and cn is independent of f and I.

454

Measure and Integral: An Introduction to Real Analysis

7. The proof of Lemma 14.15 was carried out for the ball B = B(0; r) and a point x = (x, 0, . . . , 0) with 0 ≤ x < r. Show that the proof is similar in the general case. 8. Verify the existence of balls Bx and By with the properties described after (14.32), that is, using the notation there and assuming that x, y, r, R, ε satisfy (14.31) and (14.32), show that there are open balls Bx , By ⊂ B(0; r) such that x ∈ Bx , y ∈ By , r(Bx ) = r(By ) = R, and Bx ∩By = ∅. (Denote d = |x− y|, x = x/|x|, y = y/|y|, and let xd = (|x| − d)x and yd = (|y| − d)y . Then the balls Bx = B(xd ; R) and By = B(yd ; R) have the desired properties. The restriction on ε in (14.31) helps to show that Bx , By ⊂ B(0; r). Also, (xd + yd )/2 ∈ Bx ∩ By . It may be helpful to note that |xd − yd | < d, for example, by the law of cosines.) 9. Show that a function that satisfies (14.28) can be redefined on a subset of B0 of measure zero so that (14.29) holds with the same constant C3 as in (14.28). 10. Prove that Theorems 14.12 and 14.25 remain true if B0 is replaced by Rn and f is assumed to be locally integrable on Rn . 11. Show that the function |x|1/2 is Hölder continuous of order β on Rn if and only if β = 1/2. 12. In the plane R2 , consider the class of rectangles I((x, y); h) with edges parallel to the coordinate axes, center (x, y), x-dimension h, and y-dimension h2 , h > 0, where (x, y) and h are allowed to vary. Suppose that f (x, y) is a locally integrable function that satisfies 1 h3

f (u, v) − fI((x,y);h) du dv ≤ C hβ

I((x,y);h)

for some C and β independent of (x, y) and h, with 0 < β ≤ 1. Show that after possible redefinition on a set of measure zero, f satisfies β | f (x, y) − f (u, v)| ≤ c |x − u| + |y − v|1/2 for all (x, y) and (u, v), where c depends only on C and β. (It may be helpful to consider R2 endowed with the metric d((x, y), (u, v)) = |x − u| + |y − v|1/2 instead of the usual metric.) 13. Verify the dilation property δλ (Iα f ) = λα Iα (δλ f ), where (δλ f )(x) = f (λx), λ > 0, and use it to prove (14.35). Similarly, show that if 0 < α < n, q > 0, and there is a constant c such that 1/q sup t x ∈ Rn : Iα f (x) > t ≤ c f 1 t>0

for all f , then q = n/(n − α).

455

Fractional Integrals

14. Let B be the ball B(0; 1/2) in Rn , and let 0 < α < n and 0 ≤ β < 1 − (α/n). Then the function $−1 # f (x) = χB (x) |x|α |log |x||1−β has compact support and belongs to Ln/α (Rn ). Show that there is a positive constant c such that ⎧ 1 ⎪ ⎨log log |x| if β = 0 Iα f (x) ≥ c β ⎪ ⎩ log 1 if 0 < β < 1 − |x|

α n

for all x near 0.

In particular, lim Iα f (x) = +∞,

x→0

and Iα f is not essentially bounded in any neighborhood of the origin. How are Iα f and I α f related? 15. Let 0 < α < n, γ = 2 − (α/n), and B = B(0; 1/2). Show that the function −1 f (x) = χB (x) |x|n |log |x||γ n/(n−α)

belongs to L1 (Rn ) but that Iα f ∈ / Lloc

(Rn ). (Show that there is a con1−γ stant c > 0 such that Iα f (x) ≥ c |x|α−n log |x| for all x near the origin.)

16. If 0 < α < n and f ∈ Ln/α (Rn ), show that I α f is a measurable function. (One way to proceed is to express I α f as the limit of a sequence of measurable functions: for example, by (14.43),

I α f (x) = lim

k→∞

|y|1

| f (y)|

1 |y|n−α

dy < ∞

is necessary and sufficient for the existence and finiteness a.e. of Iα f . Prove that the condition holds if there exists p ∈ [1, n/α) such that

456

Measure and Integral: An Introduction to Real Analysis

f ∈ Ln/α (Rn ) ∩ Lp (Rn ). Consequently, the conclusion of Corollary 14.46 is valid for such f . 19. Let 0 < β ≤ 1 and B0 be a ball in Rn . Suppose that f is a measurable function on B0 that satisfies sup B

1 |B|1+(β/n)

| f (x) − c(f , B)| dx < ∞,

B

where the supremum is taken over all balls B ⊂ B0 and c(f , B) is a constant depending on f and B. Show that f ∈ L1 (B0 ) and that (14.26) holds. (Argue as in the proof of Lemma 14.49.) 20. (a) Show that log |x| ∈ BMO(Rn ). (b) In case n = 1, show that the odd function (sign x) log |x| does not belong to BMO(−∞, ∞). Find an analogue of this fact in case n > 1. (For part (a), it may help to consider two types of balls, those with center xB = 0 and radius r(B) < |xB |/2 and those with center xB and radius r(B) ≥ |xB |/2 [e.g., xB = 0], and apply Lemma 14.49. If B is of the first type, use the fact that log |x| is continuously differentiable in B and choose c(log |x|, B) = log |xB | in the lemma. If B is of the second type, choose c(log |x|, B) = log r(B). For part (b), consider intervals centered at the origin.) / BMO(Rn ), −∞ < λ < ∞, λ = 0. 21. Show that |x|λ ∈ λ Show that |x| χ|x| 0 assuming that f ∈ L1 (Rn ). (Given s and f , adjust the size of the cubes Q in the initial grid, which now covers all of Rn , such that |Q|−1 Q f ≤ s and all Q have the same edge length.) 24. Let w(x) be measurable and positive a.e. in Rn , and suppose that log w ∈ BMO(Rn ). Show that there are positive constants A1 and A2 depending on n, with A1 also depending on log w ∗ , such that

1 A1 1 −A1 w dx w dx ≤ A2 |B| |B| B

for all balls B ⊂ Rn .

B

(This can be deduced from Corollary 14.53 by writing B

wA1 dx =

B

exp A1 log w − (log w)B dx exp A1 (log w)B ,

457

Fractional Integrals

together with a similar formula for B w−A1 dx; note that the function log(1/w) = − log w also belongs to BMO(Rn ).) 25. Let 1 < p < ∞. A nonnegative function w(x) on Rn is said to satisfy the Ap condition if w, w−1/(p−1) ∈ L1loc (Rn ) and there is a constant C such that for every ball B ⊂ Rn , p−1 1 1 1 w(x) dx w(x) p−1 dx ≤ C. |B| |B| B

B

For such w, we will write w ∈ Ap . The opposite inequality with C = 1 is a consequence of Hölder’s inequality. Note that 0 < w(x) < ∞ for a.e. x if w ∈ Ap . (a) Show by direct estimation that |x|γ ∈ Ap if −n < γ < n(p − 1). (Consider first the case when r(B) < |xB |/2, where r(B) and xB denote the radius and center of B.) (b) Show that log w ∈ BMO(Rn ) if w ∈ Ap . (Consider separately the cases p = 2, p < 2 and p > 2. If w ∈ A2 , show that 1 1 sup exp | log w(x) − log wB | dx < ∞, wB = w(x) dx. |B| B |B| B

B

If p < 2 and w ∈ Ap , show that w ∈ A2 . In case p > 2, use the fact that if w ∈ Ap then w−1/(p−1) ∈ Ap , 1/p + 1/p = 1.) 26. Let Q0 be a cube in Rn . We say that f ∈ BMO(Q0 ) if f ∈ L1 (Q0 ) and 1 f ∗∗, Q0 := sup | f − fQ | < ∞, Q⊂Q0 |Q| Q

where the supremum is taken over all cubes Q in Q0 that have the same orientation as Q0 . Prove that if f ∈ BMO(Q0 ), then there are positive constants c1 and c2 such that for all such Q and all λ > 0, (14.54) holds with f ∗∗, Q0 in place of f ∗∗ . 27. Let f be the periodic conjugate function of f as defined in Chapter 12: 1 dt f (x) = − p.v. . f (x − t) 1 π 2 tan 2t |t|≤π (a) If f ∈ L∞ [−π, π], show that f belongs to the class BMO[−π, π] defined in Exercise 26 and that f ∗∗, [−π,π] ≤ c f L∞ [−π,π] with c independent of f .

458

Measure and Integral: An Introduction to Real Analysis

(b) Deduce from part (a) the conclusion of Exercise 20 in Chapter 12. Let f ∈ L∞ (R1 ) and have compact support. Show that the Hilbert transform Hf of f belongs to BMO(R1 ), and derive an analogue for Hf ∗ of the estimate in part (a). (For (a), given a small interval I ⊂ [−π, π], let xI denote the center of I and 2I denote the interval concentric with I with length 2|I|. Decompose f on (xI − π, xI + π) as f = g + h where g = f χ2I . Use Hölder’s inequality and Theorem 12.79, applied to the periodic extension of g, to show that I | g| ≤ c|I| f L∞ [−π,π] . Then estimate I | h(x) − h(xI )| dx by using the mean value theorem.) 28. For a measurable function f on Rn and 0 < α < n, define the fractional maximal function Mα f of f by 1 | f (y)| dy, n−α B:x∈B r(B)

x ∈ Rn .

Mα f (x) = sup

B

Show that Mα f is a measurable function on Rn and that there is a constant c independent of f and x such that Mα f (x) ≤ cIα (| f |)(x). As a consequence, the estimates in Theorem 14.37 for Iα f also hold for Mα f . Show that the same is true if balls B are replaced by cubes Q in the definition of Mα f , with r(B) replaced by the edge length of Q. 29. Let 0 < α < n and suppose that f satisfies

| f |(1 + log+ | f |) dx < ∞.

Rn

Prove that Iα f ∈ Ln/(n−α) (E) for every measurable set E ⊂ Rn with |E| < ∞, and

|Iα f |

n/(n−α)

dx ≤

α/(n−α) C f L1 (Rn )

E

|E| +

+

| f | log | f | dx ,

Rn

where C is independent of f and E. (Combine (14.39) in case p = 1 with the estimate in Exercise 22 of Chapter 9.) 30. Let k = 2, 3, . . . and B be a ball in Rn , n > k. Show that if f ∈ Ck0 (B), then | f (x)| ≤ c

B

|∇ k f (y)|

1 dy, |x − y|n−k

x ∈ B,

( where |∇ k f (y)| denotes the sum |α|=k |Dα f (y)|, α = (α1 , . . . , αn ) is a multi-index of nonnegative integers, |α| = α1 + · · · + αn , and c is a

459

Fractional Integrals

constant independent of B and f . (In case k = 2, apply part (ii) of Corollary 14.6 to f and each component of its gradient, and verify the estimate B

when n > 2.)

1 1 1 dz ≤ cn , n−1 n−1 |x − z| |y − z| |x − y|n−2

x, y ∈ B,

15 Weak Derivatives and Poincaré–Sobolev Estimates First-order Poincaré–Sobolev estimates in Rn are inequalities showing how Lp norms of the gradient of a function control the function itself. For a sufficiently smooth function f , the first-order partial derivatives ∂f /∂xi , i = 1, . . . , n, and the gradient ∇f of course have the usual meanings. When f is continuously differentiable and n > 1, the first-order Poincaré–Sobolev estimates that we will derive are fairly simple consequences of the subrepresentation formulas and norm estimates for fractional integrals proved in Chapter 14. A notable exception to the simplicity of their derivation occurs when p = 1, as we will see. However, Poincaré–Sobolev estimates are also true for less smooth functions. Our first goal is to study a weaker notion of ∇f that may exist when the ordinary gradient does not. This will allow us to extend the subrepresentation formulas in Chapter 14 to more general functions than those of class C1 . Poincaré–Sobolev estimates for functions with weak derivatives can then be derived for n > 1 by using the same pattern as for smooth functions, namely, by applying norm estimates for the fractional integral operator I1 . Results when n = 1 will instead be obtained from the representation in Chapter 7 of an absolutely continuous function as the integral of its derivative.

15.1 Weak Derivatives Let be an open set in Rn , n ≥ 1. Let L1loc () denote the class of locally integrable real-valued functions f on ; as usual, we say f is locally integrable on if it is integrable on every compact subset of . The notion of the weak firstorder partial derivatives of such an f is based on generalizing the standard formula for integration by parts by allowing functions that may be different from the ordinary partial derivatives of f to play their role in the formula. The precise definition in case n > 1 is as follows: if f ∈ L1loc () and i = 1, . . . , n, then f (x) is said to have a weak partial derivative in with respect to xi , where x = (x1 , . . . , xn ), if there is a function gi ∈ L1loc () such that

461

462

Measure and Integral: An Introduction to Real Analysis

f (x)

∂ϕ (x) dx = − gi (x) ϕ(x) dx ∂xi

for every ϕ ∈ C∞ 0 ().

(15.1)

The justification for not including “integrated terms” on the right side of (15.1) is that each function ϕ has compact support in . An important part of the definition that f has a weak partial derivative gi in is that both f and gi belong to L1loc (), but neither f nor gi is required to belong to L1 (). The functions ϕ in (15.1) are called test functions because they serve to test whether the same function gi satisfies (15.1) as ϕ varies over a fairly large collection of functions with compact support. As we will see later, using C∞ 0 () as the class of test functions is sufficient to ensure that gi is unique if it exists, and it will then make sense to refer to gi as “the” weak partial derivative of f with respect to xi . On the other hand, if (15.1) holds for all ϕ ∈ C∞ 0 (), then it also holds for all ϕ ∈ Lip0 (), that is, for all ϕ that are Lipschitz continuous and compactly supported in (see Theorem 15.7). In case n = 1 and is an open set in (−∞, ∞), the analogous definition is that a function f ∈ L1loc () has a weak derivative in if there exists g ∈ L1loc () such that

f (x)ϕ (x) dx = −

g(x)ϕ(x) dx

for every ϕ ∈ C∞ 0 ().

(15.2)

There are simple, almost trivial, examples of functions that have weak partial derivatives but have ordinary partial derivatives nowhere. For example, in case n = 1, if −∞ < s < t < ∞, the function χ(x) that equals s for rational 1 x and equals t for irrational x has this property since for every ϕ ∈ C∞ 0 (R ), ∞ −∞

χ ϕ dx = t

∞ −∞

ϕ dx = 0 = −

∞

0 ϕ dx.

−∞

Therefore, χ has weak derivative 0 in R1 even though its ordinary derivative exists nowhere; in fact, χ is continuous nowhere. The averaging process that is inherent in integrating χ ϕ compensates for the lack of ordinary differentiability of χ. We leave it as an exercise to construct a similar example in any dimension. On the other hand, as we will see in Theorem 15.4, the weak first partial derivatives with respect to xi , i = 1, . . . , n, of a locally Lipschitz continuous function f agree with its corresponding ordinary partial derivatives ∂f /∂xi . Let us begin by making several comments related to (15.1) and (15.2).

463

Weak Derivatives and Poincaré–Sobolev Estimates

The integrals on both sides of (15.1) are well-defined and finite since if K denotes the support of ϕ, then ∂ϕ f dx = f ∂ϕ dx ≤ ∂ϕ | f | dx < ∞, ∂x ∞ ∂x ∂x i i i L (K) K

K

where we have used the local integrability of f in , and similarly, |gi ϕ| dx = |gi ϕ| dx ≤ ϕL∞ (K) |gi | dx < ∞. K

K

Next, let us consider the question of uniqueness of weak derivatives. Let i = 1, . . . , n and suppose that f has a weak partial derivative gi with respect to xi in . Clearly, both f and gi can be changed arbitrarily in any subset of of measure zero without affecting (15.1). However, it turns out that gi is uniquely determined a.e. in by f . To prove this, it is enough to first note that if g˜ i is another function in L1loc () that satisfies (15.1) for the same f , then (gi − g˜ i ) ϕ dx = 0 for all ϕ ∈ C∞ 0 (),

and then to apply part (i) of the next lemma with g chosen to be gi − g˜ i . Lemma 15.3

Let be an open set in Rn and let g ∈ L1loc ().

(i) If

g ϕ dx = 0

for all ϕ ∈ C∞ 0 (),

then g = 0 a.e. in . (ii) If is connected and

g

∂ϕ dx = 0 for all i = 1, . . . , n and all ϕ ∈ C∞ 0 (), ∂xi

then g is constant a.e. in . Part (ii) of Lemma 15.3 will be used in the proof of Theorem 15.6. Proof. Part (i) can be proved in several ways. We will use a method based on a smooth approximation of the identity; the method is flexible and will

464

Measure and Integral: An Introduction to Real Analysis

be used in other situations in this chapter. Let k(x) ∈ C∞ 0 ({|x| < 1}) sat isfy Rn k(x) dx = 1, and set kε (x) = ε−n k(x/ε), ε > 0. Each function kε (x) is infinitely differentiable in Rn and supported in {|x| < ε}. Let B be any fixed ball (or bounded open interval if n = 1) whose closure lies in . Then there is a number ε0 > 0 and a compact set K0 with B ⊂ K0 ⊂ such that for all y ∈ B and all ε ∈ (0, ε0 ), the function kε (y − x) considered as a function of x has support in K0 . By hypothesis, we obtain

g(x) kε (y − x) dx = 0 for all y ∈ B and all ε < ε0 .

For such y and ε, the integral on the left equals

(g χK0 )(x)kε (y − x) dx,

Rn

which by Theorem 9.13 converges to (gχK0 )(y) for a.e. y ∈ Rn as ε → 0, and so converges to g(y) for a.e. y ∈ B as ε → 0. Here, g χK0 is of course interpreted to be 0 outside K0 , that is, g χK0 (x) =

g(x) if x ∈ K0 0 if x ∈ Rn − K0 ,

and we have used the fact that g χK0 ∈ L1 (Rn ). It follows that g is zero a.e in B and therefore also a.e. in . This proves (i). To prove (ii), let kε , B, ε0 , and K0 be as above. If y ∈ B and ε < ε0 , then for every i = 1, . . . , n, we have by hypothesis that 0=

=−

g(x)

∂ kε (y − x) dx ∂xi

∂ g(x)kε (y − x) dx. ∂yi

Hence, for all ε < ε0 , there is a constant cε = cε,B such that

g(x)kε (y − x) dx = cε

for all y ∈ B.

As above, assuming that y ∈ B and ε < ε0 , g can be replaced by the integrable function gχK0 in this integration, and the domain of integration can then be enlarged to Rn . Now let ε → 0. Then the integral converges to g(y) a.e. in B, and consequently cε converges to some constant c, and g(y) = c a.e. in B.

Weak Derivatives and Poincaré–Sobolev Estimates

465

Since is connected by hypothesis, it follows that g is constant a.e. in . This completes the proof of the lemma. Since the function gi in (15.1) is unique (if it exists), we may call it the weak partial derivative of f with respect to xi in . It is customary to use the familiar notation ∂f /∂xi for gi even though f may not have a partial derivative with respect to xi in in the ordinary sense. Similarly, when n = 1, the weak derivative of f will be denoted by f . In situations when the notation might cause confusion, it can be clarified by also using the term “weak” or “ordinary” derivative as the case may be. Let us now show that if f ∈ Liploc (), then f has weak partial derivatives that are the same as its ordinary partial derivatives. Recall from the discussion preceding the Rademacher–Stepanov Theorem 7.53 that any f ∈ Liploc () has ordinary partial derivatives ∂f /∂xi , i = 1, . . . , n, a.e. in that are measurable and locally bounded in . We note however that the full strength of the Rademacher–Stepanov theorem is not used here. Theorem 15.4 Let be an open set in Rn and let f ∈ Liploc (). Then for each i = 1, . . . , n, f has a weak partial derivative gi with respect to xi in given by the ordinary partial derivative ∂f /∂xi , that is, ∂ϕ ∂f f dx = − ϕ dx ∂xi ∂xi

if ϕ ∈ C∞ 0 () and i = 1, . . . , n.

Proof. Let , f , and ϕ satisfy the hypothesis, and let ∂f /∂xi denote the ordinary partial derivative of f with respect to xi . Note that both integrals in the conclusion are finite since ϕ and ∂ϕ/∂xi are bounded and both f and ∂f /∂xi are integrable (even bounded) on the support of ϕ. We will prove the result in case n > 1 and i = 1; the general case is similar. Denote points x ∈ Rn by x = (x1 , x2 , . . . , xn ) = (x1 , xˆ )

with xˆ = (x2 , . . . , xn ) ∈ Rn−1 ,

and write xˆ = {x1 : (x1 , xˆ ) ∈ } and

dx = dx1 dˆx

with dˆx = dx2 · · · dxn .

By Theorem 6.8,

⎡ ⎤ ∂ϕ ∂ϕ ⎣ f (x1 , xˆ ) f dx = (x1 , xˆ ) dx1 ⎦ dˆx. ∂x1 ∂x 1 n−1 R

xˆ

466

Measure and Integral: An Introduction to Real Analysis

Each set xˆ is an open set in (−∞, ∞). If xˆ is not empty, then Theorem 1.10 implies that it can be written as a countable union j (αj , βj ) of disjoint open intervals (αj , βj ), possibly of infinite length. The intervals (αj , βj ) are the connected components of xˆ and of course depend on and xˆ . When f (x1 , xˆ ) and ϕ(x1 , xˆ ) are considered as functions of x1 , f (x1 , xˆ ) is locally Lipschitz continuous on xˆ and ϕ(x1 , xˆ ) is supported in the union j (aj , bj ) of open intervals (aj , bj ) that satisfy [aj , bj ] ⊂ (αj , βj ) and bj − aj < ∞ for each j. Therefore, using Theorem 7.32(ii) and the absolute continuity on [aj , bj ] of f (x1 , xˆ ), we obtain βj αj

bj

∂ϕ ∂ϕ f (x1 , xˆ ) (x1 , xˆ ) dx1 = f (x1 , xˆ ) (x1 , xˆ ) dx1 ∂x1 ∂x 1 a j

bj ∂f βj ∂f =− (x1 , xˆ )ϕ(x1 , xˆ ) dx1 = − (x1 , xˆ )ϕ(x1 , xˆ ) dx1 . ∂x1 ∂x1 a α j

j

Adding over j gives

f (x1 , xˆ )

xˆ

∂f ∂ϕ (x1 , xˆ ) dx1 = − (x1 , xˆ )ϕ(x1 , xˆ ) dx1 . ∂x1 ∂x1 xˆ

Hence,

⎡ ⎤ ∂f ∂ϕ ⎣− f dx = (x1 , xˆ )ϕ(x1 , xˆ ) dx1 ⎦ dˆx ∂x1 ∂x1 n−1 xˆ

R

∂f ϕ dx, =− ∂x1

where the final equality is also due to Theorem 6.8, completing the proof. Let f be a function defined on an open interval (α, β) ⊂ (−∞, ∞), possibly of infinte length. We say that f is absolutely continuous on (α, β) if f is absolutely continuous on every compact subinterval [a, b] of (α, β). By Theorem 7.29, this is equivalent to assuming that f has an ordinary derivative f a.e. in (α, β) that is locally integrable in (α, β) and that

f (b) − f (a) =

b a

f (x) dx

if α < a ≤ b < β.

Weak Derivatives and Poincaré–Sobolev Estimates

467

Remark 15.5 The proof of Theorem 15.4 shows that the conclusion ∂ϕ ∂f f dx = − ϕ dx ∂xi ∂xi

n if ϕ ∈ C∞ 0 (R )

is true for a particular i, i = 1, . . . , n, if the assumption that f ∈ Liploc () is weakened by assuming both of the following: (a) f is absolutely continuous with respect to xi on every connected component of the intersection of with a.e. ((n − 1)-dimensional measure) line parallel to the xi axis. (b) f and its ordinary partial derivative ∂f /∂xi belong to L1loc (). In case n = 1, condition (b) in Remark 15.5 is automatically true for any f that is absolutely continuous on every connected component of . In fact, we have the next basic result. Theorem 15.6 Let be an open set in (−∞, ∞) and f ∈ L1loc (). Then f has a weak derivative in if and only if f can be redefined in a subset of of measure zero so that f is absolutely continuous on every connected component of , that is, if and only if there is a function h on such that f = h a.e. in and h is absolutely continuous on every connected component of . Moreover, the weak derivative f of f coincides with the ordinary derivative h . Proof. The sufficiency of the condition follows from Remark 15.5, as we have already noted. To prove the necessity, suppose that f has weak derivative f on , and decompose into the countable union of disjoint open intervals (αj , βj ), possibly of infinite length: = j (αj , βj ). These intervals are the connected components of . Given an interval (αj , βj ), choose a point γj ∈ (αj , βj ) and define f˜ in (αj , βj ) by

f˜ (x) =

x

f (t) dt,

x ∈ (αj , βj ).

γj

In this way, f˜ is defined in all of and is absolutely continuous on every (αj , βj ) since f ∈ L1loc (). Clearly, f is the weak derivative of f on each (αj , βj ) since it is the weak derivative of f on . Therefore, for every ϕ ∈ C∞ 0 ((αj , βj )), we have

468

Measure and Integral: An Introduction to Real Analysis

βj

βj

f ϕ dx = −

αj

f ϕ dx

αj

βj = − (f˜ ) ϕ dx

by Theorem 7.11

αj

=

βj

f˜ ϕ dx

by Theorem 7.32(ii).

αj

In the middle equality above, (f˜ ) denotes the ordinary derivative of f˜ . Hence, for all ϕ ∈ C∞ 0 ((αj , βj )), βj (f − f˜ ) ϕ dx = 0, αj

and it follows from the second part of Lemma 15.3 (in the one-dimensional case of an open interval) that there is a constant cj such that f (x) = f˜ (x) + cj

for a.e. x ∈ (αj , βj ).

Define h(x) in by h(x) = f˜ (x) + cj if x ∈ (αj , βj ), j ≥ 1. Then h is absolutely continuous on every (αj , βj ) and f = h a.e. in . Also, the ordinary derivative h satisfies h = (f˜ ) = f a.e. in , and the proof is complete. A simple application of Theorem 15.6 is that the step function f (x) =

1 if 0 < x < 1 −1 if − 1 < x < 0

does not have a weak derivative on (−1, 1), even though it is infinitely differentiable on (−1, 1) except at a single point. This follows immediately from Theorem 15.6 since f has a jump discontinuity in (−1, 1), but a direct verification is not difficult. In fact, first note that if ϕ is infinitely differentiable and supported in (−1, 1), then the left side of (15.2) equals 1 −1

f ϕ dx =

1 0

ϕ dx −

0

ϕ dx

−1

=[ϕ(1) − ϕ(0)] − [ϕ(0) − ϕ(−1)] = −2ϕ(0)

469

Weak Derivatives and Poincaré–Sobolev Estimates

since ϕ(1) = ϕ(−1) = 0. If (15.2) were true, there would then exist g ∈ L1loc (−1, 1) such that 1

gϕ dx = 2ϕ(0)

for all ϕ ∈ C∞ 0 ((−1, 1)).

−1

We leave it as an exercise to show that this is impossible; see Exercise 2. See also Exercise 3. We conclude our basic comments related to (15.1) and (15.2) by showing that the class C∞ 0 () of test functions ϕ used in the definition of weak partial derivatives on can be enlarged to the class Lip0 () of Lipschitz continuous functions with compact support in . Thus, once it is verified by testing with the class C∞ 0 () that a function has weak derivatives, it is legitimate to use (15.1) (or (15.2)) for the larger class of functions ϕ ∈ Lip0 (). Theorem 15.7 Let be an open set in Rn and suppose that f has a weak partial derivative ∂f /∂xi in for some i, i = 1, . . . , n. Then ∂ϕ ∂f f dx = − ϕ dx ∂xi ∂xi

for every ϕ ∈ Lip0 ().

−n Proof. Consider again an approximation of the identity kε (x) = ε k(x/ε), ∞ n ε > 0, x ∈ R , with k ∈ C0 ({|x| < 1}) and Rn k dx = 1. Let ϕ ∈ Lip0 (). By extending ϕ to be zero outside , we may think of ϕ as being Lipschitz continuous in all of Rn . Since ϕ has compact support in , there is a compact set K0 ⊂ and a number ε0 > 0 such that K0 contains the supports of ϕ and ϕ ∗ kε for all ε < ε0 . Moreover, we may assume that ε0 is chosen so small that for all ε < ε0 and x ∈ K0 , kε (x − y) considered as a function of y is supported in . By the definition of the weak derivative ∂f /∂xi ,

f (x)

∂f ∂ (ϕ ∗ kε )(x) dx = − (x) (ϕ ∗ kε )(x) dx, ∂xi ∂xi

ε < ε0 .

Let us compute the limit as ε → 0 of each side of this equation. The domains of integration on both sides may be replaced by K0 because ε < ε0 . The expression on the right is then −

∂f (ϕ ∗ kε ) dx. ∂xi

K0

470

Measure and Integral: An Introduction to Real Analysis

Since ∂f /∂xi ∈ L1 (K0 ) and (ϕ ∗ k )(x) converges pointwise to ϕ(x) and is bounded uniformly in ε and x, this has limit equal to −

∂f ∂f ϕ dx = − ϕ dx. ∂xi ∂xi

K0

On the other hand, for ε < ε0 , the integral on the left side earlier above is K0

f (x)

∂ (ϕ ∗ kε )(x) dx. ∂xi

In order to compute its limit as ε → 0, note that

∂ ∂ k (x − y) dy (ϕ ∗ kε )(x) = ϕ(y) ∂xi ∂xi ∂ϕ

∂ = ϕ(y) − k (x − y) dy = (y)kε (x − y) dy, ∂yi ∂yi

where to obtain the final equality, we have assumed that x ∈ K0 and ε < ε0 and applied Theorem 15.4 to the Lipschitz function ϕ and the smooth test function kε (x−y) of y. (See the related result in Exercise 4.) Since ϕ is Lipschitz continuous, the last integral is bounded uniformly in x and ε for x ∈ Rn and ε > 0, and it converges a.e. to (∂ϕ/∂xi )(x) as ε → 0. Since f ∈ L1 (K0 ), the Lebesgue dominated convergence theorem implies that lim

ε→0

f (x)

K0

∂ϕ ∂ (ϕ ∗ kε )(x) dx = f dx ∂xi ∂xi K0

=

∂ϕ f dx. ∂xi

This completes the computation of the limits. It follows that ∂ϕ ∂f f dx = − ϕ dx, ∂xi ∂xi

and the theorem is proved.

471

Weak Derivatives and Poincaré–Sobolev Estimates

If a locally integrable function f in has weak partial derivatives ∂f /∂xi in for every i = 1, . . . , n, n > 1, we write ∇f =

∂f ∂f ,..., ∂x1 ∂xn

in

and call this vector the weak gradient of f in . It is customary to use the same notation ∇f for the weak gradient as for the ordinary gradient. By Theorem 15.4, the two notions are the same if f ∈ Liploc (). Note that if f is any function with a weak gradient in , then both f and |∇f | are locally integrable in by definition; here, the notation |∇f | stands for

n ∂f |∇f | = ∂x i=1

i

2 1/2 .

15.2 Smooth Approximation and Sobolev Spaces In Section 15.3, before deriving Poincaré–Sobolev estimates, we will show that the subrepresentation formulas for C1 functions in Chapter 14 can be extended to functions with a weak gradient. The proofs of these extended versions of the formulas will be based on the known ones for C1 functions and various results about approximation by smooth functions. Approximation theorems like Theorem 15.8 are generally attributed to K. Friedrichs, although their hypotheses may vary. Some variants are discussed farther below and in the exercises. Theorem 15.8 Let be an open set in Rn and K be a compact subset of . If f has a weak gradient ∇f in , then there is a sequence {fj }∞ j=1 of functions on such that (i) fj ∈ C∞ 0 () for all j, (ii) fj → f a.e. in K and in L1 (K) norm as j → ∞, (iii) ∇fj → ∇f a.e. in K and in L1 (K) norm as j → ∞. In part (iii), the terminology ∇fj → ∇f in L1 (K) norm means that for every i = 1, . . . , n, ∂fj /∂xi → ∂f /∂xi in L1 (K) norm as j → ∞. Before giving a proof, we note that more can be said about the supports of the approximating functions fj : if G is any open set satisfying K ⊂ G ⊂ ,

472

Measure and Integral: An Introduction to Real Analysis

then conclusion (i) of the theorem can be replaced by fj ∈ C∞ 0 (G) for all j. Of course, the sequence {fj } then depends on G as well as on K. Indeed, to see why, we have only to apply the theorem with replaced by G, after noting that if f has weak gradient ∇f in an open set , then f also has weak gradient ∇f in any open set G ⊂ . See Theorem 15.9 about the existence of a single subsequence {fj } that has the properties in Theorem 15.8 for every compact set K ⊂ . Proof. We will use the standard method based on a smooth approximation of the identity, with a further refinement. As usual, B(x; r) denotes the (open) ball with center x and radius r. Let f , , and K satisfy the hypothesis, and let kε (x) be as in the proof of Theorem 15.7. Choose an open set G and a number ε0 > 0 such that K ⊂ G, G has compact closure in , B(x; ε) ⊂ G for all x ∈ K and all ε < ε0 , and B(y; ε) ⊂ for all y ∈ G and all ε < ε0 . For example, G can be chosen as G=

B(x; ε0 ),

x∈K

where ε0 is chosen less than half the distance from K to the complement of (cf. Exercise 1(l) in Chapter 1). Note that kε (x − y) considered as a function of y has support in G for all x ∈ K and all ε < ε0 . Let g = f χG denote the function on Rn obtained by extending f to be zero outside G. Then g ∈ L1 (Rn ) since f ∈ L1loc () by hypothesis. Now define gε (x) = (g ∗ kε )(x),

x ∈ Rn , ε > 0.

n Then gε ∈ C∞ 0 (R ) by Theorem 9.3 and the comments after its proof. Moreover, since g vanishes outside G and kε has support in B(0; ε), it follows that gε has support in if ε < ε0 . By Theorems 9.6 and 9.13, gε → g in L1 (Rn ) norm and pointwise a.e. in Rn as ε → 0. Hence, gε → f in L1 (K) and pointwise a.e. in K. Next, by Theorem 9.3, for all x ∈ Rn , all ε > 0 and every i = 1, . . . , n, we have

∂gε ∂ kε (x − y) dy (x) = g(y) ∂xi ∂xi n R

=−

g(y)

Rn

=−

G

f (y)

∂ kε (x − y) dy ∂yi

∂ kε (x − y) dy. ∂yi

Weak Derivatives and Poincaré–Sobolev Estimates

473

Considered as a function of y, kε (x−y) has support in B(x; ε) and therefore has support in G if x ∈ K and ε < ε0 . Since ∂f /∂xi is the weak partial derivative of f with respect to xi in , it is also the weak partial derivative of f with respect to xi in G. Consequently, ∂f ∂gε (x) = (y)kε (x − y) dy ∂xi ∂yi

if x ∈ K and ε < ε0 .

G

The last integral equals

which converges to

∂f ∂xi χG

∂f χG ∗ kε (x), ∂xi

(x) in L1 (Rn ) and pointwise a.e. in Rn as ε → 0.

Restricting x to K, we obtain that ∂gε /∂xi → ∂f /∂xi in L1 (K) and pointwise a.e. in K. The theorem now follows by choosing fj = gεj for any sequence {εj }∞ j=1 → 0 such that εj < ε0 for all j. The proof of Theorem 15.8 can be modified to yield a single sequence {fj } that has the same properties as in Theorem 15.8 for every compact set K ⊂ . Indeed, we have the following result whose proof is left to the reader (see Exercise 5).

Theorem 15.9 Under the same hypotheses on and f as in Theorem 15.8, there is a sequence {fj } that satisfies the three properties in the conclusion of Theorem 15.8 for every compact set K ⊂ . It follows immediately that the sequence {fj } in Theorem 15.9 has the pointwise convergence properties fj → f and ∇fj → ∇f a.e. in . We arrive naturally at the definition of the Sobolev space W 1,p () by considering functions f that have a weak gradient in such that both f and |∇f | belong to Lp (), that is, if is an open set in Rn and 1 ≤ p ≤ ∞, then W 1,p () is defined by W 1,p () = f ∈ Lp () : f has a weak gradient in satisfying |∇f | ∈ Lp () . (15.10) The purpose of the first superscript 1 in the notation W 1,p is to indicate that only first-order derivatives enter the definition. We will not consider Sobolev spaces involving weak derivatives of order more than 1, although these spaces have important applications. In fact, we only consider a few

474

Measure and Integral: An Introduction to Real Analysis

aspects of the spaces W 1,p (), and then our primary concerns are the cases when is either a ball or the entire space Rn . One fact about (15.10) deserves emphasis: in order that f ∈ W 1,p (), the p definition requires that f and |∇f | belong to Lp () and not just to Lloc (). On the other hand, in case is a ball B, we will see in Section 15.3 that f ∈ W 1,p (B) if f ∈ L1loc (B) and f has a weak gradient in B satisfying |∇f | ∈ Lp (B). In other words, in the case of a ball B, the requirement in (15.10) that f ∈ Lp (B) can be replaced by assuming only that f ∈ L1loc (B). Some basic properties of Sobolev spaces are given in the exercises. For example (see Exercise 7), if is any open set in Rn and 1 ≤ p ≤ ∞, then W 1,p () is a Banach space with respect to the norm f W1,p ()

n ∂f = f Lp () + ∂x

.

(15.11)

i Lp ()

i=1

An equivalent norm is n 1/2 ∂f 2 f Lp () + ∂x i i=1

= f Lp () + ∇f Lp () .

Lp ()

Moreover, W 1,p () is separable if 1 ≤ p < ∞. In case 1 ≤ p < ∞, another equivalent norm is p f Lp ()

n ∂f + ∂x

p p

1/p ,

i L ()

i=1

and using this norm when p = 2 makes W 1,2 () a Hilbert space with inner product (f , g) =

fg dx +

∇f · ∇g dx.

(15.12)

See Exercises 11 and 12 for versions of the product rule and the chain rule in W 1,p (). Next, we state a way that functions in W 1,p () can be approximated by smooth functions when p is finite. Theorem 15.13 Let 1 ≤ p < ∞ and be an open set in Rn . If f ∈ W 1,p (), n there is a sequence {fj }∞ j=1 of functions on R such that n (i) fj ∈ C∞ 0 (R ) for all j,

475

Weak Derivatives and Poincaré–Sobolev Estimates

(ii) fj → f a.e. in and in Lp () norm as j → ∞, (iii) ∇fj → ∇f a.e. in and in Lp (K) norm as j → ∞ for every compact set K ⊂ . Moreover, in case = Rn and f ∈ W 1,p (Rn ) for some p with 1 ≤ p < ∞, the sequence {fj } can be chosen to converge to f in W 1,p (Rn ) norm, that is, so that fj → f in Lp (Rn ) and ∇fj → ∇f in Lp (Rn ).

Lp

Here the terminology ∇fj → ∇f in Lp norm means that ∂fj /∂xi → ∂f /∂xi in norm as j → ∞ for every i = 1, . . . , n.

Proof. The proof is similar to that of Theorem 15.8, now using the Lp versions of Theorems 9.6 and 9.13. We will leave some details for the reader to check. Fix p, , and f satisfying the hypotheses, and define g = f χ by extending f to be zero outside . Note that g belongs to Lp (Rn ) but may not have compact support and that g = f everywhere if = Rn . For ε > 0, define gε = g∗kε where kε is the same approximation of the identity used in the proofs of Theorems 15.7 and 15.8. Then gε ∈ C∞ (Rn ), although it may not have compact support, and gε → g in Lp (Rn ) and pointwise a.e. in Rn . Hence, gε → f in Lp () and pointwise a.e. in . The proof that ∇gε → ∇f pointwise a.e. in and also in Lp (K) norm for every compact set K in is left to the reader, as is the proof that if = Rn , then the sequence can be chosen with ∇gε → ∇f in Lp (Rn ) norm. Finally, whether = Rn or not, let η(x) be a smooth cutoff function on Rn n such that η ∈ C∞ 0 (R ) and η(x) =

1 0

if |x| ≤ 1 if |x| ≥ 2.

Then for any sequence εj → 0, the functions {fj } defined on Rn by fj (x) = η(εj x)gεj (x) have compact support in Rn and satisfy all the properties stated in the theorem. For example, let us show why ∇fj → ∇f in Lp (K) for every compact set K ⊂ . We compute ∇fj (x) = εj (∇η)(εj x)gεj (x) + η(εj x)∇gεj (x),

x ∈ Rn .

Therefore, if x ∈ , ∇fj (x) − ∇f (x) ≤ εj ∇ηL∞ (Rn ) |gε (x)| + η(εj x)∇gε (x) − ∇f (x) j j

+ 1 − η(εj x) |∇f (x)| := αj (x) + βj (x) + γj (x).

476

Measure and Integral: An Introduction to Real Analysis

First consider αj . Denote M = ∇ηL∞ (Rn ) sup gεj Lp (Rn ) . j

Then M is finite and αj Lp () ≤ Mεj → 0 as j → ∞. We estimate the norm of γj by using the facts that ηj (εj x) = 1 if |εj x| ≤ 1 and |∇f | ∈ Lp () with p finite: ⎛ ⎜ γj Lp () ≤ ⎝

⎞1/p ⎟ |∇f |p dx⎠

→0

as j → ∞.

x∈;|x|>ε−1 j

Next, fix a compact set K ⊂ . Since ∇gεj → ∇f in Lp (K), then βj Lp (K) ≤ ∇gεj − ∇f Lp (K) → 0 as j → ∞. Collecting estimates, it follows that ∇fj → ∇f in Lp (K) for every compact set K ⊂ . Next, suppose that = Rn and recall that ∇gεj → ∇f in Lp (Rn ) in this case. Hence, βj Lp (Rn ) ≤ ∇gεj − ∇f Lp (Rn ) → 0 as j → ∞. Finally, the arguments proving that both αj Lp (Rn ) and γj Lp (Rn ) tend to 0 were already included above. Thus, in case = Rn , we obtain that ∇fj → ∇f in Lp (Rn ). The remaining details of the proof are left to the reader. In some situations, fewer properties of the approximating sequence {fj } are needed than the ones listed in Theorems 15.8 or 15.13. The following simple variant of Theorem 15.13 will be useful in the proof of Theorem 15.45. Theorem 15.14 Let f have a weak gradient in Rn that satisfies |∇f | ∈ Lp (Rn ) for some p, 1 ≤ p < ∞. If either f ∈ Lr (Rn ) for some r with 1 ≤ r < ∞ or ∞ n lim|x|→∞ f (x) = 0, then there is a sequence {fj }∞ j=1 ⊂ C (R ) with lim|x|→∞ fj (x) = 0 for every j such that fj → f pointwise a.e. in Rn and ∇fj → ∇f in Lp (Rn ) norm as j → ∞.

477

Weak Derivatives and Poincaré–Sobolev Estimates

Proof. Let kε be as usual and let fj = f ∗ kεj for any sequence εj → 0. Then fj ∈ C∞ (Rn ) for all j. Also, fj → f a.e. in Rn since f ∈ L1loc (Rn ), and ∇fj → ∇f in Lp (Rn ) as usual. It remains to show that each fj (x) → 0 as |x| → ∞. First, suppose that f (x) → 0 as |x| → ∞. Then, for every ε > 0, sup | f | → 0

as

B(x;ε)

|x| → ∞,

and we are done since |(f ∗ kε )(x)| ≤

sup | f | kL1 (Rn ) .

B(x;ε)

Finally, assuming instead that f ∈ Lr (Rn ) for some r with 1 ≤ r < ∞, we have |(f ∗ kε )(x)| = f (y)kε (x − y) dy |x−y| 0, then | f (x)| ≤

cn |∇f (y)| dy γ |x − y|n−1

for a.e. x ∈ B.

B

(ii) If f has compact support in B, then | f (x)| ≤ cn

B

|∇f (y)| dy |x − y|n−1

for a.e. x ∈ B.

In both parts, the constant cn depends only on n. Proof. The proof of part (i) is omitted since it is essentially identical to the corresponding proof in Corollary 14.6. To prove (ii), let f have compact support and weak gradient ∇f in a ball B. Extend f and ∇f to Rn by setting them equal to zero outside B. Let B∗ be the ball of radius 2r(B) concentric with B. By Theorem 15.15, ∇f is the weak gradient of f in B∗ . Also, by Exercise 13, |∇f | = 0 outside B. By part (i) of this corollary applied to B∗ and its subset E = B∗ − B, we then obtain | f (x)| ≤ cn

B∗

|∇f (y)| |∇f (y)| dy = c dy n |x − y|n−1 |x − y|n−1

for a.e. x ∈ B, which completes the proof.

B

481

Weak Derivatives and Poincaré–Sobolev Estimates

There is also an analogue of Corollary 14.7. In stating it, we assume f has a weak gradient in Rn and satisfies the extra condition that there is a sequence n {Bi }∞ i=1 of balls such that Bi R and fBi → 0 as i → ∞. As in Chapter 14, typical situations when this holds are (i) f ∈ Lr (Rn ) for some r with 1 ≤ r < ∞, or (ii) f ∈ Lrloc (Rn ) for some r with 1 ≤ r ≤ ∞ and lim|x|→∞ f (x) = 0. Corollary 15.22 Let f have weak gradient ∇f in Rn and suppose that fB → 0 for some sequence of balls B Rn . Then there is a constant cn depending only on n such that | f (x)| ≤ cn I1 (|∇f |)(x)

for a.e. x ∈ Rn ,

where I1 is the fractional integral operator of order 1. In particular, the conclusion holds if f ∈ W 1,p (Rn ) for any p with 1 ≤ p < ∞. Proof. Fix a function f with a weak gradient in Rn and fB → 0 for some sequence of balls B increasing to Rn . For any B in the sequence, since f is locally integrable in Rn , we have that f ∈ L1 (B), and Theorem 15.16 (see (15.18)) yields | f (x) − fB | ≤ cn I1 (|∇f |)(x)

for a.e. x ∈ B.

Now let B Rn . Since fB → 0, the result follows. We now turn to the Poincaré–Sobolev inequalities themselves. Four separate ranges of p values will be considered: 1 < p < n, p = n, p > n, and p = 1. The case p = 1 uses a more local subrepresentation formula than the ones derived so far. Our primary interest is again when n > 1. In fact, the range 1 < p < n does not exist if n = 1, and results when p = n = 1 and p > n = 1 are easy consequences of Theorem 15.6; see Exercise 20. We begin with the range 1 < p < n. Theorem 15.23 Let B be a ball in Rn , n > 1, and let p and q satisfy 1 < p < n and 1/q = 1/p − 1/n (so that q = pn/(n − p) > p). If f is a locally integrable function on B with a weak gradient ∇f belonging to Lp (B), then f ∈ Lq (B) and

1/q | f (x) − fB | dx

B

where fB = |B|−1

q

B f (x) dx

≤ cn,p

1/p p

|∇f (x)| dx

,

B

and the constant cn,p depends only on n and p.

(15.24)

482

Measure and Integral: An Introduction to Real Analysis

If f also has compact support in B, then

1/q q

| f (x)| dx

≤ cn,p

B

1/p p

|∇f (x)| dx

.

(15.25)

B

Some comments about Theorem 15.23 are given after its proof. Proof. Fix B, p, and f satisfying the hypothesis of the first part of the theorem, and let 1/q = 1/p − 1/n. Let us begin by showing that f ∈ L1 (B). Theorem 14.37(a) in case α = 1 gives I1 (|∇f |χB )Lq (Rn ) ≤ cn,p (∇f )χB Lp (Rn ) = cn,p ∇f Lp (B) < ∞. In particular, I1 (|∇f |χB ) is finite a.e. in B. Since f ∈ L1loc (B) by hypothesis, f is also finite a.e. in B. Furthermore, by (15.17), 1 | f (x) − f (y)| dy ≤ cn I1 (|∇f |χB )(x) |B|

a.e. in B,

B

and therefore, 1 | f (y)| dy ≤ | f (x)| + cn I1 (|∇f |χB )(x) |B|

a.e. in B.

B

The fact that f ∈ L1 (B) now follows by choosing a point x ∈ B such that both f (x) and I1 (|∇f |χB )(x) are finite. Hence, the average fB is finite and (15.18) holds: | f (x) − fB | ≤ cn I1 (|∇f |χB )(x)

a.e. in B.

Taking Lq norms over B, we obtain f − fB Lq (B) ≤ cn I1 (|∇f |χB )Lq (B) ≤ cn,p ∇f Lp (B) , as noted earlier. This proves (15.24). The fact that f ∈ Lq (B) now follows immediately from Minkowski’s inequality: f Lq (B) ≤ f − fB Lq (B) + | fB | |B|1/q ≤ cn,p ∇f Lp (B) + | fB | |B|1/q < ∞. This proves the first statement in Theorem 15.23.

Weak Derivatives and Poincaré–Sobolev Estimates

483

If f also has compact support in B, a similar argument based on part (ii) of Corollary 15.21 yields (15.25). Theorem 15.23 is now proved. We pause briefly in order to add several comments related to Theorem 15.23. The proof of Theorem 15.23 fails if p = 1 since I1 does not map L1 (Rn ) into n/(n−1) L (Rn ); Theorem 14.37 provides only a weak type estimate for I1 when p = 1. However, as will see in Theorem 15.37, Theorem 15.23 remains true if p = 1. Inequality (15.24) is often called the Lp , Lq Poincaré estimate for f and B, and (15.25) for compactly supported f is called the Lp , Lq Sobolev estimate for f . The formula 1/q = 1/p − 1/n is called the Poincaré–Sobolev dimensional balance formula or simply the balance formula. Because of it and the fact that the measure of any ball B in Rn is a fixed multiple of r(B)n , the Lp , Lq Poincaré estimate for B can be written in the equivalent normalized form

1/q 1/p 1 1 q p | f (x) − fB | dx ≤ cn,p r(B) |∇f (x)| dx . |B| |B| B

(15.26)

B

Similarly, the Lp , Lq Sobolev estimate can be written in the form

1/q 1/p 1 1 q p | f (x)| dx ≤ cn,p r(B) |∇f (x)| dx . |B| |B| B

(15.27)

B

A remarkable fact is that the hypothesis in the first part of Theorem 15.23 is nominally weaker than assuming f belongs to W 1,p (B), because f is not assumed to belong to Lp (B) or even to L1 (B), but the conclusion is stronger since f turns out to belong to Lq (B) for some q > p. In fact, as already shown in the proof of Theorem 15.23, under the same hypothesis as in the first part of Theorem 15.23, we have f Lq (B) ≤ cn,p ∇f Lp (B) + | fB | |B|1/q < ∞.

(15.28)

From a heuristic point of view, the main content of Theorem 15.23 is that there is a gain in the order of integrability of f . An important corollary is that W 1,p (B) ⊂ Lq (B) if 1 < p < n and 1/q = 1/p − 1/n. See also Exercise 14. Norm estimates over all of Rn for the range 1 < p < n can be derived easily from Corollary 15.22. The second part of the following result is usually referred to as the first-order Sobolev embedding theorem for Rn in case 1 < p < n. Theorem 15.29 Let f have a weak gradient in Rn satisfying |∇f | ∈ Lp (Rn ) for some p with 1 < p < n, and suppose that fB → 0 for some sequence of balls increasing

484

Measure and Integral: An Introduction to Real Analysis

to Rn . Let 1/q = 1/p − 1/n. Then f ∈ Lq (Rn ) and f Lq (Rn ) ≤ cn,p ∇f Lp (Rn ) . In particular, if f ∈ W 1,p (Rn ) for some p with 1 < p < n, then f Lq (Rn ) ≤ cn,p ∇f Lp (Rn ) ≤ cn,p f W1,p (Rn ) , 1/q = 1/p − 1/n. Proof. The first statement follows immediately by combining Corollary 15.22 and Theorem 14.37(a). Alternately, it can be derived by applying (15.24) to each ball in the sequence of balls in the hypothesis of the theorem. The second statement is a corollary of the first one since fB → 0 for any sequence of balls B Rn if f ∈ Lp (Rn ), 1 ≤ p < ∞. See also Exercise 15. Next, we consider the endpoint value p = n that was omitted in Theorem 15.23. When p = n, the corresponding value of q in the balance formula 1/q = β 1/p − 1/n is q = ∞. However, the function log |x| belongs to W 1,n ({|x| < 1/2}) if n > 1 and 0 < β < (n − 1)/n, but it is not bounded; see Exercise 16. On the other hand, by using the estimates of Moser–Trudinger type in Chapter 14, we obtain the following result about local exponential integrability. The case n = 2 was studied by Pohozaev. Theorem 15.30 Let B be a ball in Rn , n > 1, and f have a weak gradient in B satisfying |∇f | ∈ Ln (B). There are positive constants c1 and c2 depending only on n such that 1 | f (x) − fB | n/(n−1) exp c1 dx ≤ c2 . |B| ∇f Ln (B)

(15.31)

B

If f is also supported in B, then (15.31) also holds with fB replaced by 0. We have assumed here that ∇f Ln (B) = 0; otherwise, f is constant a.e. in B by Corollary 15.20. Proof. Let B and f satisfy the first hypothesis. Since |∇f | ∈ Ln (B), then |∇f | ∈ Lp (B) for 1 ≤ p ≤ n by Hölder’s inequality, and consequently f ∈ L1 (B) by Theorem 15.23. The proof of (15.31) then follows immediately by combining (15.18) with the case α = 1 of Theorem 14.40. If f also has compact support

485

Weak Derivatives and Poincaré–Sobolev Estimates

in B, a similar argument based on the second part of Corollary 15.21 yields the second part of the theorem. In case n = 1, a better result than (15.31) holds; see Exercise 20. An immediate corollary of (15.31), under the same assumptions on f , is that for every r with 0 < r < ∞, 1 |B|

B

| f (x) − fB | ∇f Ln (B)

r

dx ≤ cn,r ,

or equivalently

1/r 1 r | f (x) − fB | dx ≤ cn,r ∇f Ln (B) , |B|

0 < r < ∞.

B

For a related global inequality, see Corollary 15.41. Let us now consider the case p > n. Theorem 15.32 (Morrey) Let B be a ball in Rn and n < p ≤ ∞. If f has a weak gradient ∇f in B satisfying |∇f | ∈ Lp (B), then after possible redefinition of f in a subset of B of measure zero, f is Hölder continuous of order 1−(n/p) on B. Moreover, there is a constant cn,p depending only on n and p such that | f (x) − f (y)| ≤ cn,p ∇f Lp (B) |x − y|

1− np

,

x, y ∈ B.

The order 1 − (n/p) of Hölder continuity should be interpreted as 1 if p = ∞, that is, f is Lipschitz continuous on B if p = ∞. Proof. Let B, p, and f satisfy the hypothesis. Then |∇f | ∈ Lr (B) for 1 ≤ r ≤ p, so f ∈ L1 (B) by previous results. Therefore, using (15.18) and Hölder’s inequality, for any ball B1 ⊂ B and a.e. x ∈ B1 , we obtain | f (x) − fB1 | ≤ cn

B1

|∇f (y)| dy |x − y|n−1 ⎛

≤ cn ∇f Lp (B1 ) ⎝

B1

⎞1/p 1 ⎠ , dy |x − y|(n−1)p

486

Measure and Integral: An Introduction to Real Analysis

1/p + 1/p = 1. Note that (n − 1)p < n since p > n. Also, B1 ⊂ B(x; 2r(B1 )) for every x ∈ B1 . Hence, if x ∈ B1 , then ⎛ ⎝

B1

⎞1/p ⎛ 1 ⎠ ≤⎝ dy |x − y|(n−1)p

|x−y| 2k , | f (x0 ) − fB | < 2k−1 , or | f (x0 ) − fB | = 2k−1 . In either of the first two cases, since f is continuous, there is a neighborhood of x0 in which gk is constant, namely, in which gk ≡ 2k or gk ≡ 2k−1 , respectively, and consequently |∇gk (x0 )| = 0 in either of the first two cases. On the other hand, if | f (x0 ) − fB | = 2k−1 , then gk (x0 ) = 2k−1 , and therefore, gk has an absolute minimum at x0 (since gk ≥ 2k−1 everywhere in B). Assuming as we may that ∇gk (x0 ) exists, it follows that |∇gk (x0 )| = 0, and (15.36) is verified. Note that gk is integrable, even bounded, in B. For all x ∈ B, gk (x) = [gk (x) − (gk )B ] + (gk )B ≤ cn I1 (|∇gk |χB )(x) + (gk )B ≤ cn I1 (|∇f |χk−1 )(x) + (gk )B , where we have applied the subrepresentation formula to the locally Lipschitz function gk and used (15.36). Also, gk ≤ 2k−1 + | f − fB | in B, and hence (gk )B ≤ 2k−1 +

1 | f − fB | dy. |B| B

Therefore, if x ∈ B,

1 | f − fB | dy. gk (x) ≤ cn I1 |∇f |χk−1 (x) + 2k−1 + |B| B

Restricting x to k , where we have gk (x) = 2k , gives 2k ≤ cn I1 (|∇f |χk−1 )(x) + 2k−1 +

1 | f − fB | dy, |B| B

and by then subtracting 2k−1 from both sides, it follows that 2k ≤ cn I1 (|∇f |χk−1 )(x) +

2 | f − fB | dy, |B|

x ∈ k .

B

Combining this with the fact that | f − fB | ≤ 2k+1 on k , we obtain the desired estimate | f (x) − fB | ≤ cn I1 (|∇f |χk−1 )(x) +

4 | f − fB | dy, |B| B

We can now extend Theorems 15.23 and 15.29 to p = 1.

x ∈ k .

489

Weak Derivatives and Poincaré–Sobolev Estimates

Theorem 15.37 (i) Let B be a ball in Rn and f be a function with a weak gradient ∇f in B that satisfies |∇f | ∈ L1 (B). Then f ∈ Ln/(n−1) (B) and

(n−1)/n | f − fB |

n/(n−1)

dx

≤ cn

B

|∇f | dx.

(15.38)

B

(ii) Let f have a weak gradient ∇f in Rn satisfing |∇f | ∈ L1 (Rn ). If fB → 0 for some sequence of balls B Rn , then f ∈ Ln/(n−1) (Rn ) and f Ln/(n−1) (Rn ) ≤ cn ∇f L1 (Rn ) .

(15.39)

In particular, (15.39) holds if f ∈ W 1,1 (Rn ). Thus, it holds if f has compact support and a weak gradient in Rn satisfing |∇f | ∈ L1 (Rn ). The constants cn depend only on n. Once again, there are normalized versions: (15.38) can be rewritten as

(n−1)/n 1 r(B) n/(n−1) | f − fB | dx ≤ cn |∇f | dx, |B| |B| B

B

and if f has compact support in B, (15.39) is the same as

(n−1)/n 1 r(B) n/(n−1) |f| dx ≤ cn |∇f | dx. |B| |B| B

B

Proof. It is enough to prove part (i) since part (ii) follows from it by applying (15.38) to the sequence of balls in the hypothesis of part (ii) and using Fatou’s lemma. Fix a ball B and let f have weak gradient in B with |∇f | ∈ L1 (B). Then f ∈ L1 (B) by an argument like the one at the beginning of the proof of Theorem 15.23, except that the finiteness of I1 (|∇f |χB ) a.e. in B now follows from the weak type estimate in Theorem 14.37(b) since |∇f | ∈ L1 (B). Assuming also that f ∈ C∞ (B), or even also just that f ∈ Liploc (B), we can apply Theorem 15.34. Thus, let k = x ∈ B : 2k < | f (x) − fB | ≤ 2k+1 ,

k = 0, ±1, . . . .

490

Measure and Integral: An Introduction to Real Analysis

Recall from the discussion following the statement of Theorem 15.34, that is, from the discussion about the size of the second term on the right side of (15.35), that | f (x) − fB | ≤ cn I1 (|∇f |χk−1 )(x) + cn

r(B) |∇f | dy, |B|

x ∈ k .

(15.40)

B

Denote q = n/(n − 1). Since B is the disjoint union of the k , we have

| f − fB | q =

∞

| f − fB | q

k=−∞ k

B

≤

∞

2(k+1)q |k | =

k=−∞

+

k≤N

= S1 + S2 ,

k≥N+1

say, where N is chosen such that the second term on the right of (15.40) satisfies 2N−1 < cn

r(B) |∇f | ≤ 2N . |B| B

Then S1 =

2(k+1)q |k | ≤ 2(N+1)q |B|

k≤N

2q (N−1)q

=2 2

= cn

2q

|B| ≤ 2

q |B|

B

q |∇f |

r(B) |∇f | cn |B|

since q = n/(n − 1).

B

In order to estimate S2 , note that for all x ∈ k , the choice of N and (15.40) imply that 2k < | f (x) − fB | ≤ cn I1 (|∇f |χk−1 )(x) + 2N . If k ≥ N + 1, then by subtracting 2N from both sides and noting that 2k−1 ≤ 2k − 2N , we obtain I1 (|∇f |χk−1 )(x) >

2k−1 , cn

x ∈ k , k ≥ N + 1.

491

Weak Derivatives and Poincaré–Sobolev Estimates

Equivalently, if k ≥ N + 1, then k ⊂ x : I1 (|∇f |χk−1 )(x) > 2k−1 /cn . Therefore, by the weak type estimate in Theorem 14.37(b), ⎛ |k | ≤ cn ⎝

1 2k−1

⎞q

|∇f |⎠

if k ≥ N + 1.

k−1

Hence,

S2 =

2(k+1)q |k |

k≥N+1

⎞q ⎛ 2(k+1)q ⎝ ≤ cn |∇f |⎠ 2(k−1)q k≥N+1

= cn

≤ cn ⎝

⎞q

⎝

k≥N+1

⎛

k−1

⎛

|∇f |⎠

k−1

⎞q

|∇f |⎠ = cn

k k−1

q |∇f |

.

B

%

q % Here, we have used the estimate |ak |q ≤ |ak | to obtain the last line (see Exercise 31 of Chapter 8). Combining the estimates for S1 and S2 proves (15.38) in case f is smooth. Now consider a general f with a weak gradient satisfying |∇f | ∈ L1 (B). Let D be a ball with D ⊂ B, and choose an approximating sequence {fj } as in Theorem 15.8 with K = D there. By the case just proved, we have (with q = n/(n − 1))

1/q | fj − (fj )D |

q

≤ cn

D

|∇fj |

D

for every j. As j → ∞, fj → f a.e. in D, (fj )D → fD since fj → f in L1 (D) norm, and |∇fj | → |∇f | in L1 (D). Therefore,

D

1/q | f − fD |

q

≤ cn

D

|∇f |,

492

Measure and Integral: An Introduction to Real Analysis

where we used Fatou’s lemma for the left side of the inequality. Now enlarge the domain of integration on the right side to B and then let D increase to B. Since f ∈ L1 (B), then fD → fB , and another application of Fatou’s lemma gives (15.38). Finally, since (15.38) and the finiteness of fB imply that f ∈ Lq (B) by Minkowski’s inequality, the proof is complete. Part (ii) of Theorem 15.37 yields the next result about the space W 1,n (Rn ).

Corollary 15.41

Let f ∈ W 1,n (Rn ) and 1 < n ≤ r < ∞. Then f ∈ Lr (Rn ) and n

1− n

f Lr (Rn ) ≤ cn,r f Lrn (Rn ) ∇f Ln (Rr n ) ,

(15.42)

where cn,r is a constant depending only on n and r. In particular, f Lr (Rn ) ≤ cn,r f W1,n (Rn ) ,

n ≤ r < ∞.

(15.43)

Proof. First suppose f ∈ C10 (Rn ) and note that the function F = | f |δ−1 f is then also of class C10 (Rn ) if δ ≥ 1. Moreover, ∂F/∂xi = δ| f |δ−1 ∂f /∂xi for every i = 1, . . . , n. Denote n = n/(n − 1) and f Ln (Rn ) = f n . Applying (15.39) to F, we obtain f δδn ≤ cn δ | f |δ−1 |∇f |1 ≤ cn δ f δ−1 (δ−1)n ∇f n ,

δ ≥ 1, by Hölder’s inequality

(15.44)

with exponents n and n. We will successively choose δ = n, n + 1, n + 2, . . . in (15.44). With δ = n, since (n − 1)n = n, we have f nnn ≤ cn n f n−1 n ∇f n . Next, choosing δ = n + 1, n f n+1 (n+1)n ≤ cn (n + 1) f nn ∇f n 2 ≤ cn (n + 1)n f n−1 n ∇f n ,

where we have used the case δ = n to obtain the last inequality. Continuing inductively in this way gives n−1 k+1 f n+k (n+k)n ≤ cn (n + k) · · · (n + 1)n f n ∇f n ,

k = 0, 1, . . . .

493

Weak Derivatives and Poincaré–Sobolev Estimates

Raising both sides to the power 1/(n + k), we obtain n

1− nr

f r ≤ cn,r f nr ∇f n

,

r = (n + k)n , k = 0, 1, . . . .

Also, the estimate is trivially true when r = n. In order to obtain the same estimate for every r ∈ [n, ∞), we use Hölder’s inequality in the form (cf. Exercise 6 of Chapter 8) 1−θ f r ≤ f θ r1 f r2 ,

1 ≤ r1 ≤ r ≤ r2 ≤ ∞,

θ 1 1−θ = + . r r1 r2

In fact, by choosing r1 = n and r2 = (n + k)n for a fixed k = 0, 1, . . ., it follows that for any r ∈ [n, (n + k)n ], θ 1−θ 1 = + , r n (n + k)n n−1 k+1 1−θ n+k n+k c f ∇f ≤ f θ n,k n n n

1−θ f r ≤ f θ n f (n+k)n ,

θ+ n−1 n+k (1−θ)

= cn,r f n

k+1

∇f nn+k

(1−θ)

n

1− nr

= cn,r f nr ∇f n

.

This proves the result when f ∈ C10 (Rn ). For general f ∈ W 1,n (Rn ), the same estimate then holds by applying the second statement in Theorem 15.13 with p = n. Details are left to the reader. This completes the proof. Theorem 15.37(ii) has an elegant companion result relying heavily on the product structure of Rn . There is an equally elegant analogue of Theorem 15.29 for functions defined in all of Rn . The result corresponding to Theorem 15.37(ii) is given next; see Exercise 18 for the one corresponding to Theorem 15.29.

Theorem 15.45

If f ∈ W 1,1 (Rn ), then f ∈ Ln/(n−1) (Rn ) and f Ln/(n−1) (Rn )

n & ∂f ≤ ∂x i=1

1/n 1

.

(15.46)

i L (Rn )

Moreover, the conclusion holds if the hypothesis that f ∈ W 1,1 (Rn ) is replaced by assuming that f has a weak gradient in Rn that satisfies |∇f | ∈ L1 (Rn ) and either f ∈ Lr (Rn ) for some r ∈ [1, ∞) or lim|x|→∞ f (x) = 0. In the proof, we will use the next lemma in case n > 1.

494

Measure and Integral: An Introduction to Real Analysis

Lemma 15.47 Let n = 2, 3, . . . and g1 (x), . . . , gn (x) be n functions of x = (x1 , . . . , xn ) ∈ Rn such that gi (x) is independent of xi , i = 1, . . . , n, that is, gi (x) depends only on (x1 , . . . , xi−1 , xi+1 , . . . , xn ) ∈ Rn−1 . If each gi ∈ Ln−1 (Rn−1 ), then n & gi i=1

≤

n & gi

Ln−1 (Rn−1 )

.

i=1

L1 (Rn )

Proof. The proof will be by induction on n. We may assume that every gi is nonnegative. If n = 2, the result is true since

⎛ g1 (x2 )g2 (x1 ) dx1 dx2 = ⎝

R2

⎞⎛ g1 (x2 ) dx2 ⎠ ⎝

R1

⎞ g2 (x1 ) dx1 ⎠ .

R1

Suppose n ≥ 3 and let (n − 1) denote the conjugate index of n − 1. Write g1 g2 · · · gn = (g1 g2 · · · gn−1 ) gn and apply Hölder’s inequality with indices (n − 1) and n − 1, obtaining n ( '& gi dx1 · · · dxn−1 Rn−1

1

⎛ ≤⎝

' n−1 &

Rn−1

(n−1)

gi

(

⎞ dx1 · · · dxn−1 ⎠

1

1 (n−1)

⎛ ⎝

⎞ gn−1 dx1 · · · dxn−1 ⎠ n

1 n−1

.

Rn−1

(15.48) The second factor on the right side of (15.48) is gn Ln−1 (Rn−1 ) , which is independent of xn since gn is independent of xn . We estimate the first factor on the right side by applying the inductive assumption as follows: ' n−1 & Rn−1

≤

(n−1)

(

gi

dx1 · · · dxn−1

1 n−1 & 1

⎡ ⎣

Rn−2

⎤ (n−1) (n−2)

gi

dx1 · · · dxi−1 dxi+1 · · · dxn−1 ⎦

1 n−2

.

495

Weak Derivatives and Poincaré–Sobolev Estimates

Now raise both sides of this inequality to the power 1/(n − 1) , note that (n − 1) (n − 2) = n − 1, and combine the result with (15.48) to obtain n ( '& gi dx1 · · · dxn−1 Rn−1

⎛

⎜ ≤⎝

1 n−1 &

⎡ ⎣

1

⎤

1 n−1

gn−1 dx1 · · · dxi−1 dxi+1 · · · dxn−1 ⎦ i

⎞ ⎟ ⎠ gn Ln−1 (Rn−1 ) .

Rn−2

On the right side of this inequality, each of the first n−1 factors in the product is a function only of xn . Hence, by integrating both sides of the inequality with respect to xn and then using Hölder’s inequality with n−1 exponents all equal to n − 1 (see Exercise 6 of Chapter 8), we obtain n ( '& gi dx ≤ Rn

n−1 &

1

gi Ln−1 (Rn−1 ) gn Ln−1 (Rn−1 )

1

=

n &

gi Ln−1 (Rn−1 ) .

1

This completes the proof of Lemma 15.47. Proof of Theorem 15.45. First assume that f ∈ C1 (Rn ), n > 1, with |∇f | ∈ L1 (Rn ) and that lim|x|→∞ f (x) = 0. Then for any x = (x1 , . . . , xn ) and i = 1, . . . , n, we have f (x) =

xi ∂f (x1 , . . . , xi−1 , t, xi+1 , . . . , xn ) dt ∂xi −∞

and ∞ ∂f | f (x)| ≤ ∂x (x1 , . . . , xi−1 , t, xi+1 , . . . , xn ) dt := hi (x), i −∞ where hi is defined by the last equality. Note that hi (x) is independent of xi and belongs to L1 (Rn−1 ). Raising both sides of the inequality to the power 1/(n − 1), n > 1, and then taking the product over i of the result gives n

| f (x)| n−1 ≤

n & i=1

1

hi (x) n−1 ,

x ∈ Rn .

496

Measure and Integral: An Introduction to Real Analysis

1/(n−1)

Each hi

∈ Ln−1 (Rn−1 ). Therefore, by Lemma 15.47 with gi chosen to be

1/(n−1) , hi

Rn

| f (x)|

n n−1

n 1 & n−1 h dx ≤ i i=1

Ln−1 (Rn−1 )

n & ∂f = ∂x i=1

1 n−1 1

.

i L (Rn )

Raising both sides to the power (n − 1)/n proves (15.46) when n > 1 and f ∈ C1 (Rn ) with limit 0 at ∞ and |∇f | ∈ L1 (Rn ). The general case when n > 1 follows by using the approximation result in Theorem 15.14 combined with Fatou’s lemma; details as well as the proof in case n = 1 are left to the reader. This completes the proof.

Exercises 1. Construct an example of a function f that has weak first-order partial derivatives in Rn , n > 1, but whose ordinary first-order partial derivatives exist nowhere in Rn . 2. Let c be a real number different from 0. Show that there is no locally 1 integrable function g on (−1, 1) such that −1 gϕ dx = cϕ(0) for all ϕ ∈ C∞ 0 ((−1, 1)). 3. The fact that the Cantor–Lebesgue function f does not have a weak derivative on the interval (0, 1) is an immediate corollary of Theorem 15.6. Verify this fact instead directly from the definition of a weak derivative by choosing test functions that are adapted to the graph of f . 4. Consider the convolution (ϕ ∗ k)(x), x ∈ Rn , where ϕ ∈ Liploc (Rn ), k ∈ L1 (Rn ), and k has compact support. Show that for every i = 1, . . . , n, the ordinary derivative ∂(ϕ∗k)/∂xi exists and is finite everywhere in Rn , and ∂ϕ ∂ (ϕ ∗ k)(x) = (y)k(x − y) dy, ∂xi ∂yi n

x ∈ Rn .

R

Show that the same is true without the assumption that k has compact support if ϕ satisfies the additional conditions ϕ, ∂ϕ/∂yi ∈ L∞ (Rn ). 5. Verify Theorem 15.9. (Choose a sequence {K }∞ =1 of compact subsets of whose interiors increase to . For each , use Theorem 15.8 to pick a function f ∈ C∞ 0 () such that f − f L1 (K ) + ∇f − ∇f L1 (K ) < 1/.

497

Weak Derivatives and Poincaré–Sobolev Estimates

Show that for every compact K ⊂ , {f } and {∇f } converge in L1 (K) to f and ∇f , respectively. Then use a diagonal process to construct a subsequence {j } of positive integers such that {fj } and {∇fj } also converge pointwise a.e. in .) 6. Let be an open set in Rn and let f , g ∈ L1loc (). Show that for any i = 1, . . . , n, f has weak partial derivative ∂f /∂xi = g if and only if there are ∞ 1 functions {fj }∞ j=1 ⊂ C () such that fj → f in L (K) and ∂fj /∂xi → g in L1 (K) as j → ∞ for every compact set K ⊂ . 7. Prove that when 1 ≤ p ≤ ∞, W 1,p () is a Banach space with respect to the norm (15.11) and that it is separable when 1 ≤ p < ∞. Prove that W 1,2 () is a Hilbert space with respect to the inner product (15.12). (To show separability, first note that the repeated Cartesian product of Lp () with itself is separable when 1 ≤ p < ∞. Then apply the result in Exercise 23 of Chapter 8.) 8. Complete the proof of Theorem 15.13. Furthermore, given a number δ > 0, show that the approximating functions {fj } can be chosen to have supports in the δ-neighborhood {x : d(x, ) < δ} of , where d(x, ) denotes the distance from x to . In particular, in case is a ball B, if B∗ is a larger ball concentric with B, the fj can be chosen to have supports in B∗ . 1,p n n 9. Let 1 ≤ p ≤ ∞

and f ∈ W (R ). For h and x in R , define the translated function τh f (x) = f (x + h). Show that τh f − f Lp (Rn ) ≤ |h| ∇f Lp (Rn ) . n (Consider first the case when f ∈ C∞ 0 (R ) and derive the estimate τh f − f Lp ({|x| c satisfies |∇fc | ≤ |∇f | a.e. in . Show also that |∇(| f |)| ≤ |∇f | a.e. in . 18. Prove the following analogue of Theorem 15.45 when 1 < p < n and 1/q = 1/p − 1/n. If f ∈ W 1,p (Rn ), then f Lq (Rn )

n & ∂f ≤ cn,p ∂x i=1

1/n p

.

i L (Rn )

(In case f ∈ C10 (Rn ), apply (15.46) to the function F = | f |δ−1 f with δ = q(n − 1)/n. Note that 1 < δ < ∞, and apply Hölder’s inequality to the formula ∂F/∂xi L1 (Rn ) = δ | f |δ−1 ∂f /∂xi L1 (Rn ) with exponents p and p.) 19. (a) Show that there exists f ∈ C∞ (Rn ) with |∇f | ∈ Lp (Rn ) for every p, 1 ≤ p ≤ ∞, such that (15.39), (15.46), and the first inequality in the conclusion of Theorem 15.29 fail. (b) Suppose that 1 ≤ p < n, f ∈ W 1,p (Rn ), and ∂f /∂xi = 0 a.e. in Rn for some i, i = 1, . . . , n. Prove that f = 0 a.e. in Rn . (See Exercise 18 in case 1 < p < n.) 20. Let n = 1, (a, b) be an open interval in R1 , possibly of infinite length, and suppose that f has a weak derivative f in (a, b). Verify the following analogues of Theorems 15.30 and 15.32: (a) If f ∈ L1 (a, b), then f ∈ L∞ (a, b) and f − f (c)L∞ (a,b) ≤ f L1 (a,b)

for a.e. c ∈ (a, b).

500

Measure and Integral: An Introduction to Real Analysis

(b) If f ∈ Lp (a, b) for some p with 1 < p ≤ ∞, then | f (x) − f (y)| ≤ f Lp (a,b) |x − y|1/p

for a.e. x, y ∈ (a, b),

where 1/p + 1/p + 1. 21. Let f ∈ W 1,1 (Rn ). Prove that f Lr (Rn ) ≤ f W1,1 (Rn ) ,

1≤r≤

n . n−1

22. Verify the following remarkable fact. Let f and g be locally integrable functions on Rn that satisfy 1 1 | f − fB | dx ≤ Cr(B) |g| dx |B| |B| B

B

for all balls B ⊂ Rn , with C independent of B. If 1 < p < n and 1/q = 1/p − 1/n, then for every ball B,

1/q 1/p 1 p 1 q | f − fB | dx ≤ C r(B) |g| dx , |B| |B| B

B

where C depends only on n and C. (Recall Theorem 14.12.) What can be said if p ≥ n? 23. Let B be a ball in Rn with r(B) = 1 and n ≥ 2. Let k = 1, 2, . . . and f ∈ Ck+1 0 (B). If n = 2k or n = 2k + 1, show that f L∞ (B) ≤ cn ∇ k+1 f L2 (B) , where the right-hand side denotes the L2 (B) norm of the sum of the absolute value of every partial derivative of f of order k + 1, and cn is a constant independent of B and f . (One way to proceed is to first apply Corollary 14.6 to obtain f L∞ (B) ≤ c∇f Lr (B) if r > n. Then show that ∇f Lr (B) ≤ c∇ 2 f Lrn/(r+n) (B) , and continue considering higher order derivatives as necessary.)

Notations x+y x·y |x| x+ x− f (x+) f (x−) → m − → {xk } {x : . . .} x∈E x∈ /E E 1 ∪ E2 Ek k E1 ∩ E2 Ek k E1

− E2 E1 ⊂ E2 CE E¯ ˚ E diam E δ(E) δ(x) |E| |E|e [a, b] (a, b) (a, b] [a, b) Gδ Fσ Ø sup inf

addition dot product absolute value positive part negative part limit from the right limit from the left converges to converges in measure to increases to decreases to sequence set of x satisfying . . . x an element of E x not an element of E union intersection difference, relative complement subset complement closure interior diameter diameter distance function measure outermeasure closed interval open interval partly open intervals countable intersection of open sets countable union of closed sets empty set supremum, least upper bound infimum, greatest lower bound 501

502

lim sup lim inf ess sup ess inf ⎫ L, L1 ⎪ ⎪ ⎪ ⎪ Lp ⎪ ⎪ ⎬ Lip, Liploc ∞ L ⎪ ⎪ ⎪ ⎪ W 1,p ⎪ ⎪ ⎭ S p l R1 Rn C f E f dμ Eb a fdφ

·

· p

· ∗ , · ∗∗ f ∗g χE a.e. (S, , μ) o(φ(x)) O(φ(x)) ⎫ V(E) ⎬ ¯ V(E), V(E) ⎭ V[f ; a, b] S[f ] S[f ] usc lsc R S ω(f ; δ) ωp (f ; δ) ωf ,E (α) f* f

f Ff Iα f , Iα f

Notations

limit superior limit inferior essential supremum essential infimum

classes of functions

classes of sequences real number system n-dimensional Euclidean space complex number system Lebesgue integral in Rn Lebesgue integral in a measure space Riemann–Stieltjes integral norm Lp norm BMO constants convolution characteristic function almost everywhere measure space order of magnitude relations variations Fourier series conjugate Fourier series upper semicontinuous lower semicontinuous Riemann–Stieltjes sum sum of increments moduli of continuity distribution function Hardy–Littlewood maximal function conjugate function Fourier transform Fourier transform fractional integrals

503

Notations

Hf Hε f , Hε,ω f Hα (A) (A), ∗ (A) v(I) ∂f ∂xi

∇f

Hilbert transform truncated Hilbert transform Hausdorff measure Lebesgue–Stieltjes measure, outer measure volume partial derivative gradient

Mathematics

Published nearly forty years after the first edition, this long-awaited Second Edition also:

Measure and Integral An Introduction to Real Analysis Second Edition

Richard L. Wheeden Antoni Zygmund

Wheeden • Zygmund

This widely used and highly respected text for upper-division undergraduate and first-year graduate students of mathematics, statistics, probability, or engineering is revised for a new generation of students and instructors. The book also serves as a handy reference for professional mathematicians.

Second Edition

• Studies the Fourier transform of functions in the spaces L1, L2, and Lp, 1

Our partners will collect data and use cookies for ad personalization and measurement. Learn how we and our ad partner Google, collect and use data. Agree & close