MP06 Attouch FM A.qxp
11/11/2005
11:57 AM
Page 1
VARIATIONAL ANALYSIS IN SOBOLEV AND BV SPACES
MPS-SIAM Series on Optimization
This series is published jointly by the Mathematical Programming Society and the Society for Industrial and Applied Mathematics. It includes research monographs, books on applications, textbooks at all levels, and tutorials. Besides being of high scientific quality, books in the series must advance the understanding and practice of optimization. They must also be written clearly and at an appropriate level.

Editor-in-Chief
Michael Overton, Courant Institute, New York University

Editorial Board
Michael Ferris, University of Wisconsin
C. T. Kelley, North Carolina State University
Monique Laurent, CWI, The Netherlands
Adrian S. Lewis, Cornell University
Jorge Nocedal, Northwestern University
Daniel Ralph, University of Cambridge
Franz Rendl, Universität Klagenfurt, Austria
F. Bruce Shepherd, Bell Laboratories - Lucent Technologies
Mike Todd, Cornell University

Series Volumes
Attouch, Hedy, Buttazzo, Giuseppe, and Michaille, Gérard, Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization
Wallace, Stein W. and Ziemba, William T., editors, Applications of Stochastic Programming
Grötschel, Martin, editor, The Sharpest Cut: The Impact of Manfred Padberg and His Work
Renegar, James, A Mathematical View of Interior-Point Methods in Convex Optimization
Ben-Tal, Aharon and Nemirovski, Arkadi, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications
Conn, Andrew R., Gould, Nicholas I. M., and Toint, Phillippe L., Trust-Region Methods
VARIATIONAL ANALYSIS IN SOBOLEV AND BV SPACES
APPLICATIONS TO PDES AND OPTIMIZATION

Hedy Attouch
Université Montpellier II
Montpellier, France

Giuseppe Buttazzo
Università di Pisa
Pisa, Italy

Gérard Michaille
Université Montpellier II
Montpellier, France

Society for Industrial and Applied Mathematics
Philadelphia

Mathematical Programming Society
Philadelphia
Copyright © 2006 by the Society for Industrial and Applied Mathematics and the Mathematical Programming Society. 10 9 8 7 6 5 4 3 2 1 All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA, 19104-2688.
Library of Congress Cataloging-in-Publication Data
Attouch, H.
Variational analysis in Sobolev and BV spaces : applications to PDEs and optimization / Hedy Attouch, Giuseppe Buttazzo, Gérard Michaille.
p. cm. – (MPS-SIAM series on optimization)
Includes bibliographical references and index.
ISBN 0-89871-600-4 (pbk.)
1. Mathematical optimization. 2. Calculus of variations. 3. Sobolev spaces. 4. Functions of bounded variation. 5. Differential equations, Partial. I. Buttazzo, Giuseppe. II. Michaille, Gérard. III. Title. IV. Series.
QA402.5.A84 2005
519.6—dc22 2005051592
SIAM is a registered trademark.
Contents

Preface
1 Introduction

Part I: Basic Variational Principles

2 Weak solution methods in variational analysis
  2.1 The Dirichlet problem: Historical presentation
  2.2 Test functions and distribution theory
    2.2.1 Definition of distributions
    2.2.2 Locally integrable functions as distributions: Regularization by convolution and mollifiers
    2.2.3 Radon measures
    2.2.4 Derivation of distributions, introduction to Sobolev spaces
    2.2.5 Convergence of sequences of distributions
  2.3 Weak solutions
    2.3.1 Weak formulation of the model examples
    2.3.2 Positive quadratic forms and convex minimization
  2.4 Weak topologies and weak convergences
    2.4.1 Topologies induced by functions in general topological spaces
    2.4.2 The weak topology $\sigma(V, V^*)$
    2.4.3 Weak convergence and geometry of uniformly convex spaces
    2.4.4 Weak compactness theorems in reflexive Banach spaces
    2.4.5 The Dunford–Pettis weak compactness theorem in $L^1(\Omega)$
    2.4.6 The weak* topology $\sigma(V^*, V)$
3 Abstract variational principles
  3.1 The Lax–Milgram theorem and the Galerkin method
    3.1.1 The Lax–Milgram theorem
    3.1.2 The Galerkin method
  3.2 Minimization problems: The topological approach
    3.2.1 Extended real-valued functions
    3.2.2 The interplay between functions and sets: The role of the epigraph
    3.2.3 Lower semicontinuous functions
    3.2.4 The lower closure of a function and the relaxation problem
    3.2.5 Inf-compactness functions, coercivity
    3.2.6 Topological minimization theorems
    3.2.7 Weak topologies and minimization of weakly lower semicontinuous functions
  3.3 Convex minimization theorems
    3.3.1 Extended real-valued convex functions and weak lower semicontinuity
    3.3.2 Convex minimization in reflexive Banach spaces
  3.4 Ekeland's ε-variational principle
    3.4.1 Ekeland's ε-variational principle and the direct method
    3.4.2 A dynamical approach and proof of Ekeland's ε-variational principle
4 Complements on measure theory
  4.1 Hausdorff measures and Hausdorff dimension
    4.1.1 Outer Hausdorff measures and Hausdorff measures
    4.1.2 Hausdorff measures: Scaling properties and Lipschitz transformations
    4.1.3 Hausdorff dimension
  4.2 Set functions and duality approach to Borel measures
    4.2.1 Borel measures as set functions
    4.2.2 Duality approach
  4.3 Introduction to Young measures
    4.3.1 Definition
    4.3.2 Slicing Young measures
    4.3.3 Prokhorov's compactness theorem
    4.3.4 Young measures associated with functions and generated by functions
    4.3.5 Semicontinuity and continuity properties
    4.3.6 Young measures capture oscillations
    4.3.7 Young measures do not capture concentrations
5 Sobolev spaces
  5.1 Sobolev spaces: Definition, density results
  5.2 The topological dual of $H^1_0(\Omega)$. The space $H^{-1}(\Omega)$
  5.3 Poincaré inequality and Rellich–Kondrakov theorem in $W^{1,p}_0(\Omega)$
  5.4 Extension operators from $W^{1,p}(\Omega)$ into $W^{1,p}(\mathbb{R}^N)$. Poincaré inequalities and the Rellich–Kondrakov theorem in $W^{1,p}(\Omega)$
  5.5 The Fourier approach to Sobolev spaces. The space $H^s(\Omega)$, $s \in \mathbb{R}$
  5.6 Trace theory for $W^{1,p}(\Omega)$ spaces
  5.7 Sobolev embedding theorems
    5.7.1 Case $1 \le p < N$
    5.7.2 Case $p > N$
    5.7.3 Case $p = N$
  5.8 Capacity theory and elements of potential theory
    5.8.1 Contractions operate on $W^{1,p}(\Omega)$
    5.8.2 Capacity
6 Variational problems: Some classical examples
  6.1 The Dirichlet problem
    6.1.1 The homogenous Dirichlet problem
    6.1.2 The nonhomogenous Dirichlet problem
  6.2 The Neumann problem
    6.2.1 The coercive homogenous Neumann problem
    6.2.2 The coercive nonhomogenous Neumann problem
    6.2.3 The semicoercive homogenous Neumann problem
    6.2.4 The semicoercive nonhomogenous Neumann problem
  6.3 Mixed Dirichlet–Neumann problems
    6.3.1 The Dirichlet–Neumann problem
    6.3.2 Mixed Dirichlet–Neumann boundary conditions
  6.4 Heterogenous media: Transmission conditions
  6.5 Linear elliptic operators
  6.6 The nonlinear Laplacian $\Delta_p$
  6.7 The Stokes system
7 The finite element method
  7.1 The Galerkin method: Further results
  7.2 Description of finite element methods
  7.3 An example
  7.4 Convergence of the finite element method
  7.5 Complements
    7.5.1 Flat triangles
    7.5.2 $H^2(\Omega)$ regularity of the solution of the Dirichlet problem on a convex polygon
    7.5.3 Finite element methods of type P2
8 Spectral Analysis of the Laplacian
  8.1 Introduction
  8.2 The Laplace–Dirichlet operator: Functional setting
  8.3 Existence of a Hilbertian basis of eigenvectors of the Laplace–Dirichlet operator
  8.4 The Courant–Fisher min-max and max-min formulas
  8.5 Multiplicity and asymptotic properties of the eigenvalues of the Laplace–Dirichlet operator
  8.6 A general abstract theory for spectral analysis of elliptic boundary value problems
9 Convex duality and optimization
  9.1 Dual representation of convex sets
  9.2 Passing from sets to functions: Elements of epigraphical calculus
  9.3 Legendre–Fenchel transform
  9.4 Legendre–Fenchel calculus
  9.5 Subdifferential calculus for convex functions
  9.6 Mathematical programming: Multipliers and duality
    9.6.1 Karush–Kuhn–Tucker optimality conditions
    9.6.2 The marginal approach to multipliers
    9.6.3 The Lagrangian approach to duality
    9.6.4 Duality for linear programming
  9.7 A general approach to duality in convex optimization
  9.8 Duality in the calculus of variations: First examples

Part II: Advanced Variational Analysis

10 Spaces BV and SBV
  10.1 The space $BV(\Omega)$: Definition, convergences, and approximation
  10.2 The trace operator, the Green's formula, and its consequences
  10.3 The coarea formula and the structure of BV functions
    10.3.1 Notion of density and regular points
    10.3.2 Sets of finite perimeter, structure of simple BV functions
    10.3.3 Structure of BV functions
  10.4 Structure of the gradient of BV functions
  10.5 The space $SBV(\Omega)$
    10.5.1 Definition
    10.5.2 Properties
11 Relaxation in Sobolev, BV, and Young measures spaces
  11.1 Relaxation in abstract metrizable spaces
  11.2 Relaxation of integral functionals with domain $W^{1,p}(\Omega, \mathbb{R}^m)$, $p > 1$
  11.3 Relaxation of integral functionals with domain $W^{1,1}(\Omega, \mathbb{R}^m)$
  11.4 Relaxation in the space of Young measures in nonlinear elasticity
    11.4.1 Young measures generated by gradients
    11.4.2 Relaxation of classical integral functionals in $Y(\Omega; E)$
12 $\Gamma$-convergence and applications
  12.1 $\Gamma$-convergence in abstract metrizable spaces
  12.2 Application to the nonlinear membrane model
  12.3 Application to homogenization of composite media
    12.3.1 The quadratic case in one dimension
    12.3.2 Periodic homogenization in the general case
  12.4 Application to image segmentation and phase transitions
    12.4.1 The Mumford–Shah model
    12.4.2 Variational approximation of a more elementary problem: A phase transitions model
    12.4.3 Variational approximation of the Mumford–Shah functional energy
13 Integral functionals of the calculus of variations
  13.1 Lower semicontinuity in the scalar case
  13.2 Lower semicontinuity in the vectorial case
  13.3 Lower semicontinuity for functionals defined on the space of measures
  13.4 Functionals with linear growth: Lower semicontinuity in BV and SBV
    13.4.1 Lower semicontinuity and relaxation in BV
    13.4.2 Compactness and lower semicontinuity in SBV
14 Application in mechanics and computer vision
  14.1 Problems in pseudoplasticity
    14.1.1 Introduction
    14.1.2 The Hencky model
    14.1.3 The spaces $BD(\Omega)$, $M(\mathrm{div})$, and $U(\Omega)$
    14.1.4 Relaxation of the Hencky model
  14.2 Some variational models in fracture mechanics
    14.2.1 A few considerations in fracture mechanics
    14.2.2 A first model in one dimension
    14.2.3 A second model in one dimension
  14.3 The Mumford–Shah model
15 Variational problems with a lack of coercivity
  15.1 Convex minimization problems and recession functions
  15.2 Nonconvex minimization problems and topological recession
  15.3 Some examples
  15.4 Limit analysis problems
16 An introduction to shape optimization problems
  16.1 The isoperimetric problem
  16.2 The Newton problem
  16.3 Optimal Dirichlet free boundary problems
  16.4 Optimal distribution of two conductors

Bibliography
Index
Preface

Most of the material in this book comes from graduate-level courses on variational analysis, PDEs, and optimization which the authors have given over the last decades: H. Attouch and G. Michaille at the University of Montpellier (France) and G. Buttazzo at the University of Pisa (Italy).

Our objective is twofold. The first objective is to provide students with the basic tools and methods of variational analysis and optimization in infinite dimensional spaces, together with applications to classical PDE problems. This corresponds to the first part of the book, chapters 1 through 9, which takes place in classical Sobolev spaces. We have made an effort to provide, as much as possible, a self-contained exposition, and we try to introduce each new development from various perspectives (historical, numerical, . . .). The second objective, which is oriented more toward research, is to present new trends in variational analysis and some of the most recent developments and applications. This corresponds to the second part of the book, chapters 10 through 16, where, in particular, the $BV(\Omega)$ spaces are introduced. This organization is intended to make the book accessible to a large audience, from students to researchers, with various backgrounds in mathematics, as well as to physicists, engineers, and others.

As a guideline, we try to portray direct methods in modern variational analysis, one century after D. Hilbert delineated them in his famous lecture at the Collège de France, Paris, 1900. The extraordinary success of these methods is intimately linked with the development, throughout the 20th century, of new branches of mathematics: functional analysis, measure theory, numerical analysis, (nonlinear) PDEs, and optimization. We try to show in this book the interplay among all these theories and also between theory and applications. Variational methods have proved to be very flexible.
In recent years, they have been developed to study a number of advanced problems of modern technology, such as composite materials, phase transitions, thin structures, large deformations, fissures, and shape optimization. To grasp these often intricate phenomena, the classical framework of variational analysis, which is presented in the first part, must be enlarged. This is the motivation for introducing, in the second part of the book, some advanced techniques such as BV and SBV spaces, Young measures, $\Gamma$-convergence, recession analysis, and relaxation methods.

Finally, we wish to stress that variational analysis is a remarkable example of international collaboration. All mathematical schools have contributed to its success, and it is a modest symbol that this book has been written in collaboration between mathematicians of two schools, French and Italian. This book owes much to the support of the Universities of Montpellier (France) and Pisa (Italy), of their mathematics departments, and of the cooperation agreement that connects them.

Acknowledgments. We would like to express our sincere thanks to all the students and colleagues whose comments and encouragement helped us in writing the final manuscript. Year by year, the writing of the book benefited greatly from their comments. We are grateful to our colleagues in the continuous optimization community, who strongly influenced the contents of the book and encouraged us from the very beginning. The chapter on convex analysis benefited much from the careful reading of L. Thibault and M. Valadier. We would like to thank SIAM and the editorial board of the MPS-SIAM Series on Optimization for the quality of the editing process. We address special thanks to B. Lacan in Montpellier, who helped us when we started the project. Finally, we take this opportunity to express our consideration and gratitude to H. Brezis and E. De Giorgi, who were our first guides in the discovery of this fascinating world of variational methods and their applications.
Chapter 1
Introduction
Let us detail the contents of each of the two parts of the book.

Part I: Basic Variational Principles. In Part I, we follow as a guideline the variational treatment of the celebrated Dirichlet problem. We show how the program of D. Hilbert, which was first delineated in his famous lecture at Collège de France in 1900 [154], was progressively carried out throughout the 20th century. We introduce the basic elements of variational analysis which allow one to solve this classical problem and closely related ones, like the Neumann problem and the Stokes problem.

Chapter 2 contains an extensive exposition of weak solution methods in variational analysis and of the accompanying notions: test functions, the distribution theory of L. Schwartz, weak convergences, and weak topologies.

Chapter 3 provides an exposition of the basic abstract variational principles. We emphasize the importance of the direct method for solving minimization problems and bring to the fore some of its basic topological ingredients: lower semicontinuity, coercivity, and inf-compactness. We show how weak topologies, reflexivity, and convexity properties come naturally into play. We insist on the modern approach to optimization theory, where the concept of the epigraph of a function plays a central role; see the recent monograph of Rockafellar and Wets [205] on variational analysis, where epigraphical analysis is systematically developed.

Chapter 4 contains some complements on geometric measure theory. We introduce in a self-contained way the notion of Hausdorff measure, which allows us to recover, as special cases, both the Lebesgue measure on an open set of $\mathbb{R}^N$ and surface measures (which play an important role, for example, in the definition of the trace spaces of Sobolev spaces).
These two basic ingredients, roughly speaking, the generalized differential calculus of distribution theory and the generalized integration theory of Lebesgue, allow us to introduce in chapter 5 the classical Sobolev spaces, which provide the right functional setting for the variational approach to the problems studied. All the ingredients of the variational approach to the model examples are then available: in chapter 6 we describe some of these examples, including the Dirichlet, Neumann, and mixed problems. We are then in the classical favorable situation: we have to minimize a convex, coercive, lower semicontinuous function on a reflexive Banach space, and in such a situation the direct method of Hilbert and Tonelli applies.

Chapters 7 and 8 complete this classical portrayal of variational methods by introducing two of the most powerful numerical tools for computing approximate solutions of variational problems: finite element methods and spectral analysis methods. Each of these two methods corresponds to a very specific type of Galerkin approximation of an infinite dimensional problem by a sequence of finite dimensional ones. Each method has its own advantages; for example, finite element methods allow one to treat engineering problems involving general domains, like the wing of a plane, which explains their great success.

Around 1970, the study of constrained problems and variational inequalities led Stampacchia, Browder, Brezis, Moreau, Rockafellar, et al. to develop the elements of a unilateral variational analysis. In particular, convex variational analysis has enjoyed considerable success and has familiarized mathematicians with the idea that sets play a decisive role in analysis. Fenchel duality, the subdifferential calculus of convex functions, and the extension of the Fermat rule are striking examples of this new approach. The role of the epigraph has progressively emerged as essential in the geometrical understanding of these concepts. Chapter 9 provides a thorough exposition of these elements of convex variational analysis in infinite dimensional spaces. We stress the importance of Fenchel duality, which allows us to associate with each convex variational problem a dual one, whose solutions generally have a deep physical (or numerical or economic) interpretation as multipliers.

Part II: Advanced Variational Analysis. This second part corresponds to chapters 10 through 16 and deals with our second objective, which is to present new trends in variational analysis.
Indeed, in recent years, variational methods have proved to be very flexible. They have been developed to study a number of advanced problems of modern technology, like composite materials, image processing, and shape optimization. To grasp these phenomena, the classical framework of variational analysis, which was studied in Part I, must be enlarged. Let us describe some of these extensions:

1. The modeling of a large number of problems in physics and image processing requires the introduction of new functional spaces permitting discontinuities of the solution. In phase transitions, image segmentation, plasticity theory, the study of cracks and fissures, the study of the wake in fluid dynamics, and shock theory in mechanics, the solution of the problem presents discontinuities along manifolds of codimension one. Its first distributional derivatives are then measures which may charge sets of zero Lebesgue measure, and the solution of these problems cannot be found in classical Sobolev spaces. The classical theory of Sobolev spaces, which was developed in chapter 5, is completed in chapter 10 by a self-contained and detailed presentation of these spaces, $BV(\Omega)$, $SBV(\Omega)$, $BD(\Omega)$. The space $BV(\Omega)$, for example, is the space of functions of bounded variation: a function $u \in L^1(\Omega)$ belongs to $BV(\Omega)$ iff its first distributional derivatives are bounded measures. The space $SBV(\Omega)$ is the subspace of $BV(\Omega)$ consisting of functions whose first distributional derivatives are bounded measures with no Cantor part.
2. In chapter 12, we introduce the concept of $\Gamma$-convergence, which provides a parametrized version of the direct method in variational analysis. Following Stampacchia's work, Mosco [187], [188] and Joly [161] introduced Mosco epi-convergence (1970) of sequences of convex functions to study approximation and perturbation schemes in variational analysis and potential theory. The general topological concept, without any convexity assumption, progressively emerged, and in 1975 De Giorgi introduced the notion of $\Gamma$-convergence for sequences of functions. It corresponds to the topological set convergence of the epigraphs, whence the equivalent terminology "epi-convergence." This concept has been successfully applied to a large variety of approximation and perturbation problems in the calculus of variations and mechanics: homogenization of composite materials, materials with many small holes and porous media, thin structures and reinforcement problems, and so forth. We illustrate the concept by describing some recent applications to thin structures, composite materials, phase transitions, and image segmentation.

3. Chapters 11 and 13 deal with the question of lower semicontinuity and relaxation of functionals of the calculus of variations. Indeed, as a general rule, when applying the direct method to a functional $F$ which is not lower semicontinuous, one obtains that minimizing sequences converge to solutions of the relaxed problem, which is the minimization of the lower semicontinuous envelope $\operatorname{cl} F$ of $F$. In the vectorial case, i.e., when functionals are defined on Sobolev spaces $W^{1,p}(\Omega, \mathbb{R}^m)$, $\Omega \subset \mathbb{R}^N$, relaxation with respect to the weak topology of $W^{1,p}(\Omega, \mathbb{R}^m)$ (or the strong topology of $L^p(\Omega, \mathbb{R}^m)$) leads to the important concepts of quasiconvexity (in the sense of Morrey), polyconvexity, and rank-one convexity. We also consider the case of functionals with linear growth and the corresponding lower semicontinuity and relaxation problems on BV and SBV spaces.
All these notions play an important role in the modeling of large deformations in mechanics and plasticity, as described in chapter 14. Following the microstructure school of Ball and James, in the modeling of solid/solid phase transformations, the energy density has a multiwell structure. An alternative and appropriate procedure consists in relaxing the corresponding free energy functional in the space of Young measures generated by gradients.

4. Another important aspect of the direct method concerns the coercivity property. In chapter 15, we examine how the method works when the variational problem lacks coercivity. In that case, existence of solutions relies on compatibility conditions, whose general formulation involves recession functions.

5. The last topic considered, in chapter 16, is shape optimization, which is a good illustration of the power of direct methods in variational analysis and also of their limitations.
Part I
Basic Variational Principles
Chapter 2
Weak solution methods in variational analysis
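2.1 The Dirichlet problem: Historical presentation

Throughout this book, we adopt the following notation: $\Omega$ is an open subset of $\mathbb{R}^N$ ($N \le 3$ for applications in classical mechanics), and $x = (x_1, x_2, \dots, x_N)$ is a generic point in $\Omega$. The topological boundary of $\Omega$ is denoted by $\partial\Omega$. Let $g : \partial\Omega \to \mathbb{R}$ be a given function defined on the boundary of $\Omega$. The Dirichlet problem consists in finding a function $u : \Omega \to \mathbb{R}$ which satisfies

$$\Delta u = 0 \quad \text{on } \Omega, \tag{2.1}$$
$$u = g \quad \text{on } \partial\Omega. \tag{2.2}$$

The operator $\Delta$ is the Laplacian

$$\Delta u = \sum_{i=1}^{N} \frac{\partial^2 u}{\partial x_i^2} = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_N^2};$$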
it is equal to the sum of the pure second partial derivatives of $u$ with respect to each variable $x_1, x_2, \dots, x_N$. Equation (2.1) is the Laplace equation, and a solution of this equation is said to be harmonic on $\Omega$. There are many harmonic functions; the following examples illustrate how rich this family is:

• $u(x) = \sum_{i=1}^{N} a_i x_i + b$ (affine function) is harmonic on $\mathbb{R}^N$;
• $u(x) = a x_1^2 + b x_2^2 + c x_3^2$ is harmonic on $\mathbb{R}^3$ iff $a + b + c = 0$;
• $u(x) = e^{x_1} \cos x_2$ and $v(x) = e^{x_1} \sin x_2$ are harmonic on $\mathbb{R}^2$ (note that $u(x_1, x_2) = \operatorname{Re} e^{x_1 + i x_2}$ and $v(x_1, x_2) = \operatorname{Im} e^{x_1 + i x_2}$);
• $u(x) = (x_1^2 + x_2^2 + x_3^2)^{-1/2}$ is harmonic on $\mathbb{R}^3 \setminus \{0\}$ (Newtonian potential).

The study of harmonic functions is a central topic of potential theory and of harmonic analysis. As the above examples suggest, we will see the close connections between the theory of harmonic functions, the theory of functions of a complex variable, and potential theory.
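The examples above can be checked numerically. The following Python sketch (an illustration, not part of the original text) approximates $\Delta u$ at a sample point by summing the central second differences $(u(x + h e_i) - 2u(x) + u(x - h e_i))/h^2$ over the coordinate directions $e_i$. The coefficients $a, b, c$, the sample points, and the step $h = 10^{-3}$ are illustrative assumptions. The harmonic examples should give values close to $0$ (up to an $O(h^2)$ truncation error), while the quadratic with $a = b = c = 1$ gives a value close to $2(a + b + c) = 6$.

```python
import math


def laplacian(u, x, h=1e-3):
    """Central-difference approximation of the Laplacian of u at the point x.

    Sums (u(x + h e_i) - 2 u(x) + u(x - h e_i)) / h**2 over the coordinate
    directions e_i; for smooth u the truncation error is O(h**2).
    """
    ux = u(x)
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += (u(xp) - 2.0 * ux + u(xm)) / h**2
    return total


P3 = [0.3, -0.7, 1.1]  # sample point in R^3, away from the origin
P2 = [0.3, -0.7]       # sample point in R^2

# Illustrative instances of the examples in the text (coefficients are assumptions).
EXAMPLES = {
    "affine": (lambda x: 2.0 * x[0] - 3.0 * x[1] + 0.5 * x[2] + 1.0, P3),
    "quadratic, a+b+c=0": (lambda x: x[0]**2 + 2.0 * x[1]**2 - 3.0 * x[2]**2, P3),
    "quadratic, a+b+c=3": (lambda x: x[0]**2 + x[1]**2 + x[2]**2, P3),
    "exp cos": (lambda x: math.exp(x[0]) * math.cos(x[1]), P2),
    "exp sin": (lambda x: math.exp(x[0]) * math.sin(x[1]), P2),
    "Newtonian potential": (lambda x: (x[0]**2 + x[1]**2 + x[2]**2) ** -0.5, P3),
}

for name, (u, point) in EXAMPLES.items():
    print(f"{name:22s} Laplacian ~ {laplacian(u, point):+.2e}")
```

A finite-difference check of this kind only tests the identity at one point; it is a sanity check, not a proof of harmonicity.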
Thus, we can reformulate the Dirichlet problem by saying that we are looking for a function harmonic on $\Omega$ which satisfies the boundary condition $u = g$ on $\partial\Omega$. Such a problem is called a boundary value problem. The condition $u = g$ on $\partial\Omega$, which prescribes the value of the function $u$ on the boundary of $\Omega$, is called the Dirichlet boundary condition and gives the problem its name.

Let us consider, for illustration, the elementary case $N = 1$. Take $\Omega = ]a, b[$, an open bounded interval. Given two real numbers $g_1$ and $g_2$, the Dirichlet problem reads as follows: find $u : [a, b] \to \mathbb{R}$ such that

$$u'' = 0 \ \text{ on } ]a, b[, \qquad u(a) = g_1, \qquad u(b) = g_2.$$

Clearly, this problem has a unique solution, whose graph in $\mathbb{R}^2$ is the line segment joining the point $(a, g_1)$ to the point $(b, g_2)$. We will see that, for an open bounded subset $\Omega$ and under some regularity hypotheses on $\Omega$ and $g$ that cover most practical situations, one can prove existence and uniqueness of a solution of the Dirichlet problem. This is a long story, whose important steps are summarized below.

The Laplace equation first appeared in 1782. While studying the orbits of the planets, Laplace discovered that the Newtonian gravitational potential of a mass distribution of density $\rho$ on a domain $\Omega \subset \mathbb{R}^3$, which is given by the formula

$$u(x) = \frac{1}{4\pi} \int_{\Omega} \frac{\rho(y)}{|x - y|}\, dy, \tag{2.3}$$

satisfies the equation

$$\Delta u = 0 \quad \text{on } \mathbb{R}^3 \setminus \overline{\Omega}. \tag{2.4}$$
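Establishing (2.4) is a good exercise. One first verifies that the Newtonian potential $v(x_1, x_2, x_3) = (x_1^2 + x_2^2 + x_3^2)^{-1/2}$ satisfies $\Delta v = 0$ on $\mathbb{R}^3 \setminus \{0\}$. For $x \notin \overline{\Omega}$ and $y \in \Omega$, we have $|x - y| \ge \operatorname{dist}(x, \overline{\Omega}) > 0$. One may therefore differentiate under the integral sign in (2.3), which yields that $u$ is harmonic on $\mathbb{R}^3 \setminus \overline{\Omega}$. In 1813, Poisson established that on $\Omega$ the potential $u$ satisfies
$$-\Delta u = \rho \quad \text{on } \Omega, \tag{2.5}$$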
which is the so-called Poisson equation. The central role played by the Laplace and Poisson equations in mathematical physics became more and more evident, especially through the work of Gauss. In 1813, Gauss established the following formula, which is often called the Gauss formula or the divergence theorem. Given a vector field $\vec V : \Omega \subset \mathbb{R}^3 \to \mathbb{R}^3$,
$$\int_\Omega \operatorname{div} \vec V(x)\,dx = \int_{\partial\Omega} \vec V(x) \cdot \vec n(x)\,ds(x). \tag{2.6}$$
This formula states that the volume integral of the divergence of the vector field $\vec V$ is equal to the global outward flux of $\vec V$ through the boundary of $\Omega$. In the above formula, if we denote $\vec V(x) = (v_1(x), v_2(x), v_3(x))$, then
2.1. The Dirichlet problem: Historical presentation
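$$\operatorname{div} \vec V(x) = \sum_i \frac{\partial v_i}{\partial x_i}(x) = \frac{\partial v_1}{\partial x_1}(x) + \frac{\partial v_2}{\partial x_2}(x) + \frac{\partial v_3}{\partial x_3}(x)$$
is the divergence of $\vec V$. The vector $\vec n(x)$ is the unit vector which is orthogonal to $\partial\Omega$ at $x$ and oriented toward the outside of $\Omega$. The measure $ds$ is the two-dimensional Hausdorff measure on $\partial\Omega$.

Let us briefly explain how the mathematical formulation of conservation laws in physics leads to the Laplace equation. Suppose that the vector field $\vec V$ derives from a potential $u$, that is,
$$\vec V(x) = Du(x) = \operatorname{grad} u(x) = \nabla u(x) = \left(\frac{\partial u}{\partial x_i}(x)\right)_{i = 1, \dots, N} \tag{2.7}$$
(these are the most commonly used notations). Suppose, moreover, that the vector field $\vec V$ is such that
$$\int_{\partial G} \vec V \cdot \vec n\,ds = 0 \quad \text{for all closed surfaces } \partial G \subset \Omega.$$
By the Gauss theorem, it follows that
$$\int_G \operatorname{div} \vec V\,dx = 0 \quad \forall\, G \subset \Omega.$$
Since $G$ is arbitrary (take, for example, small balls around each point of $\Omega$), the continuous function $\operatorname{div} \vec V$ must vanish, and hence
$$\operatorname{div} \vec V = 0 \quad \text{on } \Omega. \tag{2.8}$$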
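The Gauss formula (2.6) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions and is not taken from the text. It works in two dimensions on the unit disc, rather than in $\mathbb{R}^3$, to keep the quadrature short. It uses the illustrative field $\vec V(x, y) = (x^3, y^3)$, for which $\operatorname{div} \vec V = 3x^2 + 3y^2$. Both sides of (2.6) should then equal $3\pi/2$. The function names and grid sizes are our own choices.

```python
import math

def V(x, y):
    """Illustrative vector field V(x, y) = (x^3, y^3)."""
    return x**3, y**3

def div_V(x, y):
    """Its divergence dV1/dx + dV2/dy = 3x^2 + 3y^2."""
    return 3.0 * x**2 + 3.0 * y**2

def volume_integral(n_r=400, n_theta=400):
    """Integral of div V over the unit disc in polar coordinates
    (midpoint rule in r, uniform rule in theta; area element r dr dtheta)."""
    dr = 1.0 / n_r
    dt = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            t = j * dt
            total += div_V(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

def boundary_flux(n_theta=400):
    """Outward flux of V through the unit circle: at (cos t, sin t) the
    outward unit normal is n = (cos t, sin t) and ds = dt."""
    dt = 2.0 * math.pi / n_theta
    total = 0.0
    for j in range(n_theta):
        t = j * dt
        c, s = math.cos(t), math.sin(t)
        v1, v2 = V(c, s)
        total += (v1 * c + v2 * s) * dt
    return total

print(volume_integral(), boundary_flux(), 1.5 * math.pi)
```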
From (2.7) and (2.8), noticing that $\operatorname{div}(\operatorname{grad} u) = \Delta u$, we obtain $\Delta u = 0$, that is, $u$ is harmonic. The above argument applies both to Newton's gravitational field and to Coulomb's electrostatic field, in the regions where there is no mass (respectively, no charge). With the help of this formula, Gauss was able to prove a number of important properties of harmonic functions, such as the mean value property. This was the beginning of potential theory. Riemann was successively a student of Gauss (1846–1847 in Göttingen) and of Dirichlet (1847–1849 in Berlin). He established the foundations of the theory of functions of a complex variable and made the link (when $N = 2$) with the Laplace equation. Let us recall a basic fact. Let $z \in \mathbb{C} \mapsto f(z)$ be a function which is differentiable with respect to the complex variable $z$ ($f$ is then said to be holomorphic). Write $f(z) = P(x, y) + iQ(x, y)$, where $z = x + iy$. Then its real and imaginary parts $P$ and $Q$ satisfy the so-called Cauchy–Riemann equations
$$\frac{\partial P}{\partial x} = \frac{\partial Q}{\partial y}, \qquad \frac{\partial P}{\partial y} = -\frac{\partial Q}{\partial x}. \tag{2.9}$$
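It follows that
$$\Delta P = \frac{\partial^2 Q}{\partial x\,\partial y} - \frac{\partial^2 Q}{\partial y\,\partial x} = 0, \qquad \Delta Q = -\left(\frac{\partial^2 P}{\partial x\,\partial y} - \frac{\partial^2 P}{\partial y\,\partial x}\right) = 0,$$
since the mixed second derivatives of the (infinitely differentiable) functions $P$ and $Q$ coincide.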
Thus, the real part and the imaginary part of a holomorphic function are harmonic. This approach gives, for example, an elegant solution of the Dirichlet problem in a disc. Take $\Omega = D(0, 1) = \{z \in \mathbb{C} : |z| < 1\}$, the unit disc centered at the origin in $\mathbb{R}^2$ (identified with $\mathbb{C}$). Given a continuous function $g : \partial D \to \mathbb{R}$, we want to solve the Dirichlet problem
$$\begin{cases} \Delta u = 0 & \text{on } D, \\ u = g & \text{on } \partial D. \end{cases} \tag{2.10}$$
Let us start with the Fourier expansion of the $2\pi$-periodic function
$$g(e^{i\theta}) = \sum_{n \in \mathbb{Z}} c_n(g)\, e^{in\theta}, \tag{2.11}$$
where $c_n(g) = \frac{1}{2\pi}\int_{-\pi}^{+\pi} g(e^{it})\, e^{-int}\,dt$. Note that the above Fourier series converges in the $L^2(-\pi, +\pi)$ norm sense (Dirichlet theorem). Starting from the Fourier expansion of the boundary data $g$, one can give an explicit formula for the solution $u$ of the corresponding Dirichlet problem:
$$u(re^{i\theta}) := \sum_{n \in \mathbb{Z}} c_n(g)\, r^{|n|} e^{in\theta}. \tag{2.12}$$
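Clearly, when taking $r = 1$, one obtains $u = g$ on $\partial D$. Thus, the only point one has to verify is that $u$ is harmonic on $D$. Take $r < 1$ and replace $c_n(g)$ by its integral expression in (2.12). For $r < 1$ we have $|c_n(g)\, r^{|n|}| \le \|g\|_\infty r^{|n|}$, so the series converges uniformly. A standard argument based on this uniform convergence allows one to exchange the symbols $\sum_n$ and $\int_{-\pi}^{+\pi}$, which gives
$$u(re^{i\theta}) = \frac{1}{2\pi}\int_{-\pi}^{+\pi} g(e^{it}) \sum_{n \in \mathbb{Z}} r^{|n|} e^{in(\theta - t)}\,dt.$$
Let us introduce
$$P_r(\theta) = \sum_{n \in \mathbb{Z}} r^{|n|} e^{in\theta}, \tag{2.13}$$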
the so-called Poisson kernel, and observe that (we denote $z = re^{i\theta}$)
$$P_r(\theta - t) = \operatorname{Re}\frac{e^{it} + z}{e^{it} - z} = \frac{1 - r^2}{1 - 2r\cos(\theta - t) + r^2}. \tag{2.14}$$
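Hence,
$$u(re^{i\theta}) = \operatorname{Re}\left(\frac{1}{2\pi}\int_{-\pi}^{+\pi} \frac{e^{it} + z}{e^{it} - z}\, g(e^{it})\,dt\right). \tag{2.15}$$
Since $g$ is real-valued, $u$ appears as the real part of a holomorphic function of $z$, which shows that $u$ is harmonic on $D$. Using (2.14) one obtains the Poisson formula
$$u(re^{i\theta}) = \frac{1}{2\pi}\int_{-\pi}^{+\pi} g(e^{it})\, \frac{1 - r^2}{1 - 2r\cos(\theta - t) + r^2}\,dt. \tag{2.16}$$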
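Formulas (2.13), (2.14) and (2.16) can be checked numerically with the following Python sketch. The sketch is our own illustration and is not taken from the text, and the function names are hypothetical. It first compares a truncation of the series (2.13) with the closed form (2.14). It then applies the Poisson formula to the boundary data $g(e^{it}) = \cos 2t$, whose harmonic extension is $\operatorname{Re} z^2 = r^2\cos 2\theta$. The truncation order and the number of quadrature nodes are arbitrary choices.

```python
import math

def poisson_kernel_series(r, theta, n_max=200):
    """Partial sum of (2.13) over |n| <= n_max. Pairing n with -n gives
    r^n e^{in theta} + r^n e^{-in theta} = 2 r^n cos(n theta), so the sum is real."""
    return 1.0 + 2.0 * sum(r**k * math.cos(k * theta) for k in range(1, n_max + 1))

def poisson_kernel(r, theta):
    """Closed form (2.14) of the Poisson kernel."""
    return (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(theta) + r * r)

def poisson_integral(g, r, theta, m=2000):
    """Poisson formula (2.16). The integral over [-pi, pi] is approximated by the
    uniform rectangle rule, which is very accurate for smooth periodic integrands."""
    dt = 2.0 * math.pi / m
    total = 0.0
    for k in range(m):
        t = -math.pi + k * dt
        total += g(t) * poisson_kernel(r, theta - t)
    return total * dt / (2.0 * math.pi)

g = lambda t: math.cos(2.0 * t)   # boundary data; harmonic extension r^2 cos(2 theta)
r, theta = 0.5, 0.3
print(poisson_kernel_series(r, theta), poisson_kernel(r, theta))
print(poisson_integral(g, r, theta), r**2 * math.cos(2.0 * theta))
```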
By similar arguments, one can explicitly solve the Dirichlet problem on a square $]0, a[ \times ]0, a[$ of $\mathbb{R}^2$. Unfortunately, these methods can be used only in the two-dimensional case. The modern general treatment of the Dirichlet problem starts with the Dirichlet principle, whose formulation goes back to Gauss (1839), Lord Kelvin, and Dirichlet. It can be formulated as follows: the solution of the Dirichlet problem is the solution of the minimization problem
$$\min\left\{\int_\Omega \sum_{i=1}^N \left(\frac{\partial v}{\partial x_i}\right)^2 dx : v = g \text{ on } \partial\Omega\right\}. \tag{2.17}$$
The functional
$$v \mapsto J(v) := \int_\Omega \sum_{i=1}^N \left(\frac{\partial v}{\partial x_i}\right)^2 dx$$
is called the Dirichlet integral or Dirichlet energy. Thus, the Dirichlet principle states that the solution of the Dirichlet problem minimizes the Dirichlet energy over all functions $v$ satisfying the boundary data $v = g$ on $\partial\Omega$. Equivalently,
$$J(u) \le J(v) \quad \forall v,\ v = g \text{ on } \partial\Omega, \qquad u = g \text{ on } \partial\Omega,$$
that is, $u$ is characterized by an energy minimization principle. One can easily verify, at least heuristically, that the solution $u$ of the minimization problem (2.17) is a solution of the Dirichlet problem (2.1), (2.2). Indeed, the Laplace equation (2.1) can be seen as the optimality condition satisfied by the solution of the minimization problem (2.17). The celebrated Fermat's rule asserts that the derivative of a function $f$ is equal to zero at any point where $f$ achieves a minimum (respectively, maximum). This rule can be extended to our situation by using the notion of directional derivative. For problems coming from variational analysis, the optimality condition so obtained is called the Euler equation.

Thus, let us take $v : \Omega \to \mathbb{R}$ which satisfies $v = 0$ on $\partial\Omega$. Then, for any $t \in \mathbb{R}$, $u + tv$ still satisfies $u + tv = g$ on $\partial\Omega$. Since $u$ minimizes $J$ on the set $\{w = g \text{ on } \partial\Omega\}$, we have $J(u) \le J(u + tv)$. Let us compute, for any $t \in \mathbb{R}$, $t \ne 0$,
$$\frac{1}{t}\bigl(J(u + tv) - J(u)\bigr) = \frac{1}{t}\int_\Omega \bigl(|Du + tDv|^2 - |Du|^2\bigr)\,dx = 2\int_\Omega Du \cdot Dv\,dx + t\int_\Omega |Dv|^2\,dx.$$
For any $t > 0$ this quantity is nonnegative, and thus, by letting $t \to 0^+$,
$$\int_\Omega Du \cdot Dv\,dx \ge 0.$$
By taking $t < 0$ and letting $t \to 0^-$, we obtain the reverse inequality
$$\int_\Omega Du \cdot Dv\,dx \le 0.$$
Finally,
$$\int_\Omega Du \cdot Dv\,dx = 0 \quad \text{for any } v = 0 \text{ on } \partial\Omega.$$
Taking $v$ regular and integrating by parts (the boundary term vanishes since $v = 0$ on $\partial\Omega$), we obtain
$$\int_\Omega (\Delta u)\, v\,dx = 0 \quad \text{for any regular } v,\ v = 0 \text{ on } \partial\Omega,$$
that is, since $v$ is arbitrary, $\Delta u = 0$ on $\Omega$.

Riemann recognized the importance of this principle, but he did not discuss its validity. In 1870, Weierstrass, a very systematic and rigorous mathematician, was studying some results of his friend Riemann. He discovered that the Dirichlet principle raises some difficulties. In fact, Weierstrass proposed the following example (apparently close to the Dirichlet problem!):
$$\min\left\{\int_{-1}^{+1} x^2\left(\frac{dv}{dx}\right)^2 dx : v(-1) = a,\ v(+1) = b\right\}, \tag{2.18}$$
which fails to have a solution when $a \ne b$. A minimizing sequence for (2.18) can be obtained by considering the viscosity approximation problem
$$\min\left\{\int_{-1}^{+1} \left(x^2 + \frac{1}{n^2}\right)\left(\frac{dv}{dx}\right)^2 dx : v(-1) = a,\ v(+1) = b\right\}, \tag{2.19}$$
which now has a unique solution $u_n$ given by
$$u_n(x) = \frac{a + b}{2} - \frac{a - b}{2}\,\frac{\arctan nx}{\arctan n}, \qquad n = 1, 2, \dots.$$
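The following Python sketch checks the explicit minimizers $u_n$ numerically. It is a minimal illustration and is not taken from the text. It verifies that $u_n$ satisfies the boundary data, and it evaluates the Weierstrass functional (2.18) at $u_n$ by the composite Simpson rule. It compares this value with the closed form $\frac{(a - b)^2(\arctan n - n/(1 + n^2))}{4n\arctan^2 n}$, which we derived with the substitution $y = nx$. The helper names and the choice $a = 0$, $b = 1$ are our own.

```python
import math

def u_n(x, n, a, b):
    """Minimizer of the viscosity problem (2.19)."""
    return (a + b) / 2.0 - (a - b) / 2.0 * math.atan(n * x) / math.atan(n)

def u_n_prime(x, n, a, b):
    """Derivative of u_n."""
    return -(a - b) / (2.0 * math.atan(n)) * n / (1.0 + (n * x) ** 2)

def weierstrass_energy(n, a, b, m=20000):
    """Composite Simpson approximation (m even) of the functional in (2.18)
    evaluated at u_n, i.e. the integral of x^2 (u_n')^2 over [-1, 1]."""
    h = 2.0 / m
    f = lambda x: x * x * u_n_prime(x, n, a, b) ** 2
    s = f(-1.0) + f(1.0)
    for k in range(1, m):
        s += (4.0 if k % 2 else 2.0) * f(-1.0 + k * h)
    return s * h / 3.0

def weierstrass_energy_exact(n, a, b):
    """Closed form of the same integral (substitution y = n x)."""
    return (a - b) ** 2 * (math.atan(n) - n / (1.0 + n * n)) / (4.0 * n * math.atan(n) ** 2)

a, b = 0.0, 1.0
print(u_n(-1.0, 10, a, b), u_n(1.0, 10, a, b))   # boundary data a and b
for n in (1, 10, 100, 1000):
    print(n, weierstrass_energy(n, a, b), weierstrass_energy_exact(n, a, b))
```

The closed form behaves like $(a - b)^2/(2\pi n)$ for large $n$, which is consistent with the infimum of (2.18) being zero.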
One can directly verify that $u_n$ satisfies the boundary data and that
$$\int_{-1}^{+1} x^2\left(\frac{du_n}{dx}\right)^2 dx \to 0 \quad \text{as } n \to +\infty.$$
Thus the value of the infimum of (2.18) is zero. But there is no regular function (continuous, piecewise $C^1$) which satisfies the boundary conditions and such that
$$\int_{-1}^{+1} x^2\left(\frac{du}{dx}\right)^2 dx = 0.$$
Such a function would satisfy $du/dx = 0$ for $x \ne 0$, hence, by continuity, $u = $ constant on $]-1, +1[$. This is incompatible with the boundary data when $a \ne b$. As we will see, the pathology of the Weierstrass example comes from the coefficient $x^2$ in front of $(dv/dx)^2$, which vanishes (at zero) on the domain $\Omega = (-1, +1)$. As a result,
there is a lack of uniform ellipticity or coercivity, which is not the case in the Dirichlet problem. Thus, the Weierstrass example is not a counterexample to the Dirichlet principle; its merit is to underline the shortcomings of this principle. Moreover, it raises a decisive question: in which class of functions should one look for the solution of the Dirichlet principle? Until that date, it was commonly admitted that the functions to deal with had to be regular, $C^1$ or $C^2$ depending on the situation (differentiation being taken in the classical sense). It was only in 1900, in a famous lecture at the Collège de France in Paris, that Hilbert formulated the foundations of the modern variational approach to the Dirichlet principle, and hence to the Dirichlet problem. These ideas were worked out in the classical book of mathematical physics by Courant and Hilbert (1937) [115], and they can be summarized as follows. Hilbert's basic idea is to enlarge the class of functions in which one looks for a solution of the Dirichlet principle, and simultaneously to generalize the notion of solution. More precisely, Hilbert proposed the following general method for solving the Dirichlet principle:

1. First, construct a minimizing sequence of functions $(u_n)_{n \in \mathbb{N}}$.
2. Then, extract from this minimizing sequence a convergent subsequence, say, $u_{n_k} \to \bar u$. The function $\bar u$ so obtained is the (generalized) solution of the original problem.

This is what, in modern terminology, is called a compactness argument. Let us first notice that, even when starting with a sequence $(u_n)_{n \in \mathbb{N}}$ of smooth functions, its limit $\bar u$ may no longer be differentiable in the classical sense.
Such a construction of a generalized solution corresponds to finding a space obtained by a completion procedure. It is very similar to passing from the rational numbers to the real numbers, or from Riemann integrable functions to Lebesgue integrable functions. This was a decisive step, and this program was developed throughout the 20th century by a number of mathematicians from different countries. The functional space in which to find the generalized solution was present only in implicit form in the work of Courant and Hilbert (as a completion of piecewise $C^1$ functions). The celebrated Sobolev spaces were gradually introduced in the work of Friedrichs (1934) [140] and, in the Soviet mathematical school, of Sobolev (1936) [208] and Kondrakov. The compactness argument, that is, the compact embedding of the Sobolev space $H^1(\Omega)$ into $L^2(\Omega)$ when $\Omega$ is bounded, was proved by Rellich (1930). The modern language of distributions provides a generalized notion of derivative for nonsmooth functions (and much more). It was systematically developed by L. Schwartz (1950), who was teaching at the École Polytechnique in Paris, and it has proved to be a very flexible tool for handling generalized solutions of PDEs. The ideas of the compactness method introduced by Hilbert to solve the Dirichlet principle were developed systematically by the Italian school. Tonelli (1921) had the intuition to combine Baire's notion of semicontinuity with the Ascoli–Arzelà compactness theorem. In so doing, he transferred the classical compactness argument from real functions to the functionals of the calculus of variations (like the Dirichlet integral). He developed the so-called direct methods in the calculus of variations, whose basic topological ingredients are the lower semicontinuity of the functional and the compactness of the lower
level sets of the functional. Following Hilbert's approach, he laid the foundations of the topological method for minimization problems in infinite dimensional spaces. Thus, modern tools in variational analysis provide a general and quite simple approach to existence results for generalized solutions of a large number of boundary value problems from mathematical physics. The natural question which then arises is the regularity of such solutions: under which conditions on the data and the domain do we have a classical solution? A large number of contributions have been devoted to this difficult question. Let us simply mention the case of the Dirichlet problem: if the domain $\Omega$ and $g$ are sufficiently smooth, then there exists a classical solution $u \in C^2(\overline{\Omega})$. For a detailed bibliography on the regularity problem, see Brezis [90, Ch. IX].

So far, we have considered the Dirichlet problem as it was introduced historically. One can reformulate it in an equivalent form which is more suitable for a variational treatment. Let us introduce a function $\tilde g : \overline{\Omega} \to \mathbb{R}$, defined on the whole of $\overline{\Omega}$, whose restriction to $\partial\Omega$ is equal to $g$:
$$\tilde g|_{\partial\Omega} = g. \tag{2.20}$$
One usually chooses $\tilde g$ so as to preserve the regularity properties of $g$ and $\partial\Omega$ (for example, continuity or Lipschitz continuity). In most practical situations, this is quite easy to achieve. Take as a new unknown function
$$v := u - \tilde g. \tag{2.21}$$
Clearly, finding $v$ is equivalent to finding $u$. The boundary value problem satisfied by $v$ is
$$\begin{cases} -\Delta v = f & \text{on } \Omega, \\ v = 0 & \text{on } \partial\Omega, \end{cases} \tag{2.22}$$
with $f = \Delta\tilde g$. Indeed, $-\Delta v = -\Delta u + \Delta\tilde g = \Delta\tilde g$ on $\Omega$, and $v = g - g = 0$ on $\partial\Omega$. The Dirichlet boundary data $v = 0$ on $\partial\Omega$ is then said to be homogeneous, and problem (2.22) is often called the homogeneous Dirichlet problem. Note that a number of important physical situations lead to (2.22). For example, consider the electrostatic potential $u$ in a domain $\Omega$ with charge density $f$, whose boundary is connected to the earth. Then
$$-\Delta u = f \text{ on } \Omega, \qquad u = 0 \text{ on } \partial\Omega.$$
As another example, consider an elastic membrane in the horizontal plane $x_3 = 0$, occupying a domain $\Omega$ in the $(x_1, x_2)$ plane. Suppose that a vertical force of intensity $f(x)$ is exerted at each point $x \in \Omega$, and that the membrane is fixed on its boundary. Let us denote by $u(x)$ the vertical displacement of the point $x$ of the membrane when equilibrium is attained. Then
$$-c\,\Delta u = f \text{ on } \Omega, \qquad u = 0 \text{ on } \partial\Omega,$$
where $c > 0$ is the elasticity coefficient of the membrane.
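The membrane problem can be solved approximately by finite differences. The Python sketch below rests on several assumptions of our own: the unit square as domain, the five-point discretization of $\Delta$, and the Gauss–Seidel iteration. None of these is the method of the text, which proceeds variationally. The function name `solve_membrane` is also ours. The sketch is tested on the manufactured solution $u(x, y) = \sin(\pi x)\sin(\pi y)$, which corresponds to $f = 2\pi^2 c\,\sin(\pi x)\sin(\pi y)$.

```python
import math

def solve_membrane(f, c=1.0, n=20, tol=1e-10, max_iter=20000):
    """Gauss-Seidel iteration for the five-point finite-difference scheme
        -c * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1] - 4 u[i][j]) / h^2 = f(x_i, y_j)
    on the unit square (0, 1)^2, with u = 0 on the boundary (fixed membrane).
    Returns the grid values u[i][j], approximating u(i h, j h) with h = 1/n."""
    h = 1.0 / n
    u = [[0.0] * (n + 1) for _ in range(n + 1)]   # boundary values stay equal to 0
    # Precompute the scaled right-hand side h^2 f / c at the grid points.
    rhs = [[h * h * f(i * h, j * h) / c for j in range(n + 1)] for i in range(n + 1)]
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(1, n):
            for j in range(1, n):
                new = 0.25 * (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] + rhs[i][j])
                max_change = max(max_change, abs(new - u[i][j]))
                u[i][j] = new
        if max_change < tol:
            break
    return u

# Manufactured test: u = sin(pi x) sin(pi y) solves -c Laplacian(u) = f with u = 0 on the boundary.
c = 2.0
f = lambda x, y: 2.0 * math.pi**2 * c * math.sin(math.pi * x) * math.sin(math.pi * y)
n = 20
u = solve_membrane(f, c=c, n=n)
err = max(abs(u[i][j] - math.sin(math.pi * i / n) * math.sin(math.pi * j / n))
          for i in range(n + 1) for j in range(n + 1))
print(f"max error on the {n}x{n} grid: {err:.2e}")
```

The remaining error comes from the $O(h^2)$ discretization, not from the iteration.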
2.2 Test functions and distribution theory

2.2.1 Definition of distributions
The concept of distribution is quite natural if we start from some simple physical observations. Let us first consider a function $f \in L^1_{loc}(\Omega)$, where $\Omega$ is an open subset of $\mathbb{R}^N$. One cannot give a meaning to $f(x)$ for an arbitrary $x \in \Omega$. But, from a physical point of view, it is meaningful to consider the average of $f$ on a small ball with center $x$ and radius $\varepsilon > 0$, and to let $\varepsilon$ go to zero. Indeed, it follows from Lebesgue's theory that, for almost every $x \in \Omega$,
$$\lim_{\varepsilon \to 0} \frac{1}{|B(x, \varepsilon)|}\int_{B(x, \varepsilon)} f(\xi)\,d\xi$$
exists, and the limit, as a function of $x$, is a representative of $f$. (Such points $x$ are called Lebesgue points of $f$.) Let us notice that
$$\frac{1}{|B(x, \varepsilon)|}\int_{B(x, \varepsilon)} f(\xi)\,d\xi = \int_\Omega f(\xi)\, v_{x,\varepsilon}(\xi)\,d\xi,$$
where
$$v_{x,\varepsilon}(\xi) = \begin{cases} \dfrac{1}{|B(x, \varepsilon)|} & \text{if } \xi \in B(x, \varepsilon), \\ 0 & \text{elsewhere.} \end{cases}$$
Thus, knowing $f$ as an element of $L^1_{loc}(\Omega)$ is equivalent to knowing the values of the integrals $\int_\Omega f(x)v(x)\,dx$ for all $v$ in a sufficiently large class of functions. This is the starting point of the notion of distribution. The functions $v$ will be called test functions. Knowing $f$ as a function is equivalent to knowing it as a distribution. The distribution is viewed as the mapping which associates to a test function $v$ the number $\int_\Omega f v\,d\xi$:
$$f \in L^1_{loc}(\Omega): \quad v \mapsto \int_\Omega f v\,d\xi.$$
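The averaging procedure can be illustrated in dimension $N = 1$, which we assume for simplicity. In that case $B(x, \varepsilon)$ is the interval $]x - \varepsilon, x + \varepsilon[$. The Python sketch below is ours; the helper name `ball_average` and the sample functions are hypothetical. It computes the averages $\int f\, v_{x,\varepsilon}$ for two functions. For a continuous function, every point is a Lebesgue point and the averages converge to $f(x)$. For the Heaviside step, the averages at the jump stay equal to $1/2$, which is neither one-sided value. Applied to a continuous test function $v$ at $x = 0$, the same helper computes $\int f_\varepsilon v$, the quantity used in the next paragraph to approximate the Dirac mass.

```python
import math

def ball_average(f, x, eps, m=2000):
    """Average of f over the 1D ball B(x, eps) = ]x - eps, x + eps[, i.e. the
    integral of f * v_{x,eps}, approximated by the midpoint rule with m cells."""
    h = 2.0 * eps / m
    return sum(f(x - eps + (k + 0.5) * h) for k in range(m)) * h / (2.0 * eps)

f = lambda t: math.sqrt(abs(t))           # continuous: every point is a Lebesgue point
step = lambda t: 1.0 if t >= 0 else 0.0   # Heaviside step, with a jump at 0
v = math.cos                              # a continuous test function with v(0) = 1

for eps in (0.1, 0.01, 0.001):
    print(eps, ball_average(f, 0.25, eps), ball_average(step, 0.0, eps),
          ball_average(v, 0.0, eps))
```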
If one is concerned only with functions, there are many possibilities for the choice of the class of test functions. Let us go further and suppose that we want to model the concept of a Dirac mass, that is, of a unit mass concentrated at a point. This is an important physical notion. It can be viewed as the limiting case of a unit mass concentrated in a ball of radius $\varepsilon > 0$ as $\varepsilon \to 0$. For example, consider the Dirac mass at the origin $0 \in \Omega$ and the functions
$$f_\varepsilon(x) = \begin{cases} \dfrac{1}{|B(0, \varepsilon)|} & \text{if } x \in B(0, \varepsilon), \\ 0 & \text{elsewhere.} \end{cases}$$
Then, the distribution attached to $f_\varepsilon$ is the mapping
$$v \mapsto \int_\Omega f_\varepsilon(\xi)v(\xi)\,d\xi = \frac{1}{|B(0, \varepsilon)|}\int_{B(0, \varepsilon)} v(\xi)\,d\xi.$$
For this quantity to have a limit as $\varepsilon \to 0$, we need test functions that are at least continuous (at the origin). The limiting distribution is the mapping
$$v \in C_c(\Omega) \mapsto v(0),$$
where $C_c(\Omega)$ is the set of continuous real-valued functions with compact support in $\Omega$. This is the modern way to consider a Dirac mass at the origin: as the linear mapping which associates to a regular test function $v$ its value at the origin. Note that this distribution is no longer attached to a function. A similar device can be developed to attach a distribution to more general mathematical objects, such as the derivative of an $L^1_{loc}$ function. Take $f \in L^1_{loc}(\Omega)$ and let us try to define the distribution attached to $\frac{\partial f}{\partial x_i}$. Let us approximate $f$ by a sequence $f_n$ of smooth functions. Then the distribution attached to $\frac{\partial f_n}{\partial x_i}$ is the mapping
$$v \mapsto \int_\Omega \frac{\partial f_n}{\partial x_i}(x)\,v(x)\,dx.$$
But, unlike in the previous step, we cannot pass to the limit in this quantity just by taking $v$ continuous. So let us assume $v$ to be of class $C^1$ on $\Omega$ with compact support. Integrating by parts gives
$$\int_\Omega \frac{\partial f_n}{\partial x_i}\, v\,dx = -\int_\Omega f_n \frac{\partial v}{\partial x_i}\,dx,$$
and we can now pass to the limit in this last expression (for instance if $f_n \to f$ in $L^1_{loc}(\Omega)$). Finally, the distribution attached to $\frac{\partial f}{\partial x_i}$ is the mapping
$$v \in C^1_c(\Omega) \mapsto -\int_\Omega f \frac{\partial v}{\partial x_i}\,dx,$$
where $C^1_c(\Omega)$ is the set of real-valued functions of class $C^1$ with compact support in $\Omega$.

We are now ready to define the concept of distribution. We consider the space of test functions $\mathcal{D}(\Omega)$. It is the vector space of real- or complex-valued functions on $\Omega$ which are infinitely differentiable and have compact support in $\Omega$. (This allows us to cover all the previous situations and much more!) For $v \in \mathcal{D}(\Omega)$, we say that the support of $v$ is contained in a compact subset $K \subset \Omega$, and we write $\operatorname{spt} v \subset K$, if $v = 0$ on $\Omega \setminus K$ (equivalently, $\{v \ne 0\} \subset K$). We use the following notation. Let $N$ be the dimension of the space ($\Omega \subset \mathbb{R}^N$). An element $p = (p_1, p_2, \dots, p_N) \in \mathbb{N}^N$ is called a multi-index. The integer $|p| = p_1 + p_2 + \cdots + p_N$ is called the length of the multi-index $p$. For $v \in \mathcal{D}(\Omega)$, we write
$$D^p v := \frac{\partial^{|p|} v}{\partial x_1^{p_1}\,\partial x_2^{p_2} \cdots \partial x_N^{p_N}}.$$
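The operator $D^p$ can be viewed as the composition of elementary partial derivation operators
$$D^p = \left(\frac{\partial}{\partial x_1}\right)^{p_1} \circ \cdots \circ \left(\frac{\partial}{\partial x_N}\right)^{p_N},$$
where $\left(\frac{\partial}{\partial x_i}\right)^{p_i} = \frac{\partial}{\partial x_i} \circ \frac{\partial}{\partial x_i} \circ \cdots \circ \frac{\partial}{\partial x_i}$, $p_i$ times.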
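The integration-by-parts definition of the distribution attached to $\partial f/\partial x_i$ can be tested numerically in dimension one. The Python sketch below is our own illustration, not the text's. It anticipates the example $f(x) = |x|$ on $]-1, 1[$ considered at the end of Section 2.2.4. As test function it takes a shifted bump. Its centre $0.2$ and radius $0.5$ are arbitrary choices, so that the support $[-0.3, 0.7]$ contains the kink of $|x|$. The sketch checks that $-\int f v'\,dx$ agrees with $\int \operatorname{sgn}(x)\, v(x)\,dx$. This is consistent with the distribution derivative of $|x|$ being the sign function.

```python
import math

def bump(y):
    """Smooth function exp(-1/(1-y^2)) on ]-1, 1[, zero elsewhere."""
    return math.exp(-1.0 / (1.0 - y * y)) if abs(y) < 1.0 else 0.0

def bump_prime(y):
    """Derivative of bump: bump(y) * (-2y / (1-y^2)^2)."""
    return bump(y) * (-2.0 * y / (1.0 - y * y) ** 2) if abs(y) < 1.0 else 0.0

# Hypothetical test function v in D(]-1, 1[): bump centred at 0.2 with radius 0.5.
c0, rad = 0.2, 0.5
v = lambda x: bump((x - c0) / rad)
v_prime = lambda x: bump_prime((x - c0) / rad) / rad

def midpoint(g, a, b, m=20000):
    """Midpoint rule on [a, b]; with m even and [a, b] = [-1, 1],
    the kink/jump at x = 0 falls on a cell boundary, not on a node."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

sign = lambda x: 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
lhs = -midpoint(lambda x: abs(x) * v_prime(x), -1.0, 1.0)  # <Df, v> := -int f v'
rhs = midpoint(lambda x: sign(x) * v(x), -1.0, 1.0)        # int sgn(x) v(x) dx
print(lhs, rhs)
```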
Let us introduce the notion of sequential convergence on $\mathcal{D}(\Omega)$. It is the only topological notion on $\mathcal{D}(\Omega)$ that we use.

Definition 2.2.1. A sequence $(v_n)_{n \in \mathbb{N}}$ of functions converges in the sense of the space $\mathcal{D}(\Omega)$ to a function $v \in \mathcal{D}(\Omega)$ if the following two conditions are satisfied:
(i) there exists a compact subset $K$ of $\Omega$ such that $\operatorname{spt} v_n \subset K$ for all $n \in \mathbb{N}$ and $\operatorname{spt} v \subset K$;
(ii) for every multi-index $p \in \mathbb{N}^N$, $D^p v_n \to D^p v$ uniformly on $K$.

One can prove the existence of a locally convex topology on $\mathcal{D}(\Omega)$ with the following property: a linear functional $F$ is continuous for this topology iff it is sequentially continuous, that is, $F(v_n) \to F(v)$ whenever $v_n \to v$ in the sense of $\mathcal{D}(\Omega)$. But this topology is not easy to handle (it is not metrizable). Since we do not really need it, we will use only the notion of convergent sequence in $\mathcal{D}(\Omega)$ defined above.

Definition 2.2.2. A distribution $T$ on $\Omega$ is a continuous linear form on $\mathcal{D}(\Omega)$. Equivalently, a linear form $T : \mathcal{D}(\Omega) \to \mathbb{R}$ is a distribution on $\Omega$ if, for any sequence $(v_n)_{n \in \mathbb{N}}$ in $\mathcal{D}(\Omega)$, the following implication holds:
$$v_n \to 0 \text{ in the sense of } \mathcal{D}(\Omega) \implies T(v_n) \to 0.$$
The space of distributions on $\Omega$ is denoted by $\mathcal{D}'(\Omega)$. It is the topological dual space of $\mathcal{D}(\Omega)$. We write $\langle T, v\rangle_{(\mathcal{D}'(\Omega), \mathcal{D}(\Omega))} := T(v)$ for the duality pairing between $T \in \mathcal{D}'(\Omega)$ and $v \in \mathcal{D}(\Omega)$. Let us now give a practical criterion which allows us to verify that a linear form on $\mathcal{D}(\Omega)$ is continuous (and hence is a distribution).

Proposition 2.2.1. Let $T$ be a linear form on $\mathcal{D}(\Omega)$. Then $T$ is a distribution on $\Omega$ iff for every compact $K$ in $\Omega$ there exist $n \in \mathbb{N}$ and $C \ge 0$, possibly depending on $K$, such that
$$\forall v \in \mathcal{D}(\Omega) \text{ with } \operatorname{spt} v \subset K, \quad |T(v)| \le C \sum_{|p| \le n} \|D^p v\|_\infty.$$
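Proof. Clearly, the above condition implies that $T$ is continuous on $\mathcal{D}(\Omega)$. To prove the converse statement, let us argue by contradiction. Thus, given $T \in \mathcal{D}'(\Omega)$, let us assume that there exist a compact $K$ in $\Omega$ and a sequence $(v_n)_{n \in \mathbb{N}}$ in $\mathcal{D}(\Omega)$ such that, for each $n \in \mathbb{N}$, $\operatorname{spt} v_n \subset K$ and
$$|T(v_n)| > n \sum_{|p| \le n} \|D^p v_n\|_\infty.$$
In particular $v_n \ne 0$, so the following normalization is well defined. Let us define
$$w_n := \frac{1}{n \sum_{|p| \le n} \|D^p v_n\|_\infty}\, v_n.$$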
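Then $w_n \in \mathcal{D}(\Omega)$, $\operatorname{spt} w_n \subset K$ and, for each multi-index $m \in \mathbb{N}^N$,
$$D^m w_n = \frac{1}{n \sum_{|p| \le n} \|D^p v_n\|_\infty}\, D^m v_n,$$
so that, for all $n \ge |m|$,
$$\|D^m w_n\|_\infty \le \frac{1}{n},$$
and $w_n \to 0$ in $\mathcal{D}(\Omega)$. By linearity of $T$, $|T(w_n)| > 1$, so $T(w_n)$ does not tend to zero. This contradicts the fact that $T \in \mathcal{D}'(\Omega)$.

Proposition 2.2.1 leads naturally to the notion of a distribution of finite order.

Definition 2.2.3. A distribution $T \in \mathcal{D}'(\Omega)$ has finite order if there exists an integer $n \in \mathbb{N}$ such that for each compact subset $K \subset \Omega$, there exists a constant $C(K)$ such that
$$\forall v \in \mathcal{D}(\Omega) \text{ with } \operatorname{spt} v \subset K, \quad |T(v)| \le C(K) \sup_{|p| \le n} \|D^p v\|_\infty.$$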
If $T$ has finite order, the order of $T$ is the smallest integer $n$ for which the above inequality holds. In Proposition 2.2.1, the integer $n$ a priori depends on the compact set $K$; a distribution has finite order if $n$ can be taken independent of $K$. Let us describe some first examples of distributions.
2.2.2 Locally integrable functions as distributions: Regularization by convolution and mollifiers
Take $f \in L^1_{loc}(\Omega)$, which means that $\int_K |f(x)|\,dx < +\infty$ for each compact subset $K$ of $\Omega$. One can associate to $f$ the linear mapping
$$T_f : v \in \mathcal{D}(\Omega) \mapsto \int_\Omega f(x)v(x)\,dx.$$
For any compact subset $K \subset \Omega$ and any $v \in \mathcal{D}(\Omega)$ with $\operatorname{spt} v \subset K$, the following inequality holds:
$$|T_f(v)| \le C(K)\|v\|_\infty \quad \text{with } C(K) = \int_K |f(x)|\,dx < +\infty.$$
By Proposition 2.2.1, $T_f$ is a distribution of order zero. Moreover, a function $f \in L^1_{loc}(\Omega)$ is uniquely determined by its corresponding distribution $T_f$, as stated in the following theorem.

Theorem 2.2.1. Let $f \in L^1_{loc}(\Omega)$, $g \in L^1_{loc}(\Omega)$ be such that
$$\forall v \in \mathcal{D}(\Omega), \quad \int_\Omega f(x)v(x)\,dx = \int_\Omega g(x)v(x)\,dx.$$
Then $f = g$ almost everywhere (a.e.) on $\Omega$.
The above result allows us to identify $f \in L^1_{loc}(\Omega)$ with the corresponding distribution $T_f$, which gives the injection $L^1_{loc}(\Omega) \hookrightarrow \mathcal{D}'(\Omega)$. Theorem 2.2.1 is a direct consequence of the density of the space of test functions $\mathcal{D}(\Omega)$ in the space $C_c(\Omega)$.

Proposition 2.2.2. $\mathcal{D}(\Omega)$ is dense in $C_c(\Omega)$ for the topology of uniform convergence. More precisely, for every $v \in C_c(\Omega)$, there exist a sequence $(v_n)_{n \in \mathbb{N}}$, $v_n \in \mathcal{D}(\Omega)$, and a compact set $K \subset \Omega$ such that $v_n \to v$ uniformly and $\operatorname{spt} v_n \subset K$.

Proof. Take $v \in C_c(\Omega)$ and extend $v$ by zero outside of $\Omega$. We thus obtain a function, still denoted by $v$, which is continuous on $\mathbb{R}^N$ and has compact support in $\Omega$. Let us use the regularization method by convolution, and introduce a smoothing kernel $\rho$:
$$\rho \in \mathcal{D}(\mathbb{R}^N), \quad \rho \ge 0, \quad \operatorname{spt}\rho \subset B(0, 1), \quad \int_{\mathbb{R}^N} \rho(x)\,dx = 1.$$
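Take, for example,
$$\rho(x) = \begin{cases} m\, e^{-1/(1 - |x|^2)} & \text{if } |x| < 1, \\ 0 & \text{elsewhere,} \end{cases}$$
with $m$ chosen so that $\int_{\mathbb{R}^N} \rho(x)\,dx = 1$. Then let us define, for each integer $n = 1, 2, \dots$, $\rho_n(x) := n^N \rho(nx)$, which satisfies
$$\rho_n \in \mathcal{D}(\mathbb{R}^N), \quad \rho_n \ge 0, \quad \operatorname{spt}\rho_n \subset B(0, 1/n), \quad \int_{\mathbb{R}^N} \rho_n(x)\,dx = 1.$$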
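The sequence $(\rho_n)_{n \in \mathbb{N}}$ is called a mollifier. Given $v \in C_c(\mathbb{R}^N)$, let us define $v_n = v * \rho_n$, that is,
$$v_n(x) = \int_{\mathbb{R}^N} v(y)\rho_n(x - y)\,dy.$$
We have
$$\operatorname{spt} v_n \subset \operatorname{spt} v + \operatorname{spt}\rho_n \subset \operatorname{spt} v + B(0, 1/n),$$
so $v_n$ has compact support in $\Omega$ for $n$ large enough (namely, as soon as $1/n < \operatorname{dist}(\operatorname{spt} v, \mathbb{R}^N \setminus \Omega)$). The classical theorem on differentiation under the integral sign yields
$$\forall \alpha \in \mathbb{N}^N, \quad D^\alpha v_n = v * D^\alpha \rho_n,$$
so $v_n$ belongs to $C^\infty(\Omega)$. Thus $v_n$ belongs to $\mathcal{D}(\Omega)$.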
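The mollifier construction can be sketched in dimension $N = 1$, an assumption made for brevity. The Python code below is our own illustration, and the helper names are hypothetical. It computes the normalizing constant $m$ numerically and evaluates $v_n = v * \rho_n$ by quadrature. For $v$ it uses the Lipschitz hat function $v(x) = \max(0, 1 - |x|)$, a hypothetical choice of $v \in C_c(\mathbb{R})$. It then compares the uniform error with the modulus-of-continuity bound $\sup_{|x - z| \le 1/n}|v(z) - v(x)| = 1/n$ from the proof that follows.

```python
import math

def rho(x):
    """Unnormalized bump exp(-1/(1-x^2)) on ]-1, 1[, zero elsewhere (N = 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(g, a, b, m=4000):
    """Composite midpoint rule for g on [a, b] with m cells."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

M_NORM = 1.0 / integrate(rho, -1.0, 1.0)   # the constant m, so that rho integrates to 1

def rho_n(x, n):
    """rho_n(x) = n^N m rho(n x) with N = 1; supported in [-1/n, 1/n]."""
    return n * M_NORM * rho(n * x)

def mollify(v, x, n, m=400):
    """v_n(x) = (v * rho_n)(x) = integral of v(x - y) rho_n(y) dy over |y| < 1/n."""
    return integrate(lambda y: v(x - y) * rho_n(y, n), -1.0 / n, 1.0 / n, m)

def hat(x):
    """A continuous, compactly supported, 1-Lipschitz function (not smooth)."""
    return max(0.0, 1.0 - abs(x))

def sup_error(n, grid=201):
    """Approximate sup |v_n - v| on [-1.5, 1.5], which contains spt v_n for n >= 2."""
    xs = [-1.5 + 3.0 * k / (grid - 1) for k in range(grid)]
    return max(abs(mollify(hat, x, n) - hat(x)) for x in xs)

for n in (2, 5, 10):
    print(n, sup_error(n), 1.0 / n)   # the error should stay below 1/n
```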
Let us now prove that $v_n$ converges uniformly to $v$. To that end, we use the other equivalent formulation of $v_n$,
$$v_n(x) = \int_{\mathbb{R}^N} v(x - y)\rho_n(y)\,dy,$$
and the fact that $\int_{\mathbb{R}^N} \rho_n = 1$, to obtain
$$v_n(x) - v(x) = \int_{\mathbb{R}^N} [v(x - y) - v(x)]\,\rho_n(y)\,dy.$$
Using that $\operatorname{spt}\rho_n \subset B(0, 1/n)$, we have
$$\sup_{x \in \mathbb{R}^N} |v_n(x) - v(x)| \le \sup_{\substack{x, z \in \mathbb{R}^N \\ |x - z| \le 1/n}} |v(z) - v(x)|.$$
This last quantity goes to zero as $n \to +\infty$. This is a consequence of the uniform continuity of $v$ on $\mathbb{R}^N$ (recall that $v$ is continuous with compact support). We can now complete the proof of Theorem 2.2.1.

Proof of Theorem 2.2.1. Take $h = f - g$. Then $h \in L^1_{loc}(\Omega)$ satisfies
$$\forall v \in \mathcal{D}(\Omega), \quad \int_\Omega h(x)v(x)\,dx = 0.$$
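By the density of $\mathcal{D}(\Omega)$ in $C_c(\Omega)$ (Proposition 2.2.2), for any $v \in C_c(\Omega)$ there exists a sequence $(v_n)_{n \in \mathbb{N}}$ in $\mathcal{D}(\Omega)$ such that $\operatorname{spt} v_n \subset K$ for some fixed compact $K$ in $\Omega$ and $v_n \to v$ uniformly on $K$. We have $\int_\Omega h v_n\,dx = \int_K h v_n\,dx = 0$ and $h \in L^1(K)$. Passing to the limit as $n \to \infty$, we obtain
$$\forall v \in C_c(\Omega), \quad \int_\Omega h(x)v(x)\,dx = 0.$$
The conclusion $h = 0$ follows from the Riesz–Alexandrov theorem (see Theorem 2.4.6) and the uniqueness of the representation. Let us also give a direct, independent proof of the fact that $h = 0$. It relies mainly on the Tietze–Urysohn separation lemma.

One can first reduce to the case $h \in L^1(\Omega)$ with $|\Omega| < +\infty$. (Write $\Omega = \bigcup_{n \in \mathbb{N}} \Omega_n$, with $\Omega_n$ open and $\overline{\Omega_n}$ compact, and consider $h|_{\Omega_n}$.) By the density of $C_c(\Omega)$ in $L^1(\Omega)$, for each $\varepsilon > 0$ there exists some $h_\varepsilon \in C_c(\Omega)$ such that $\|h - h_\varepsilon\|_{L^1(\Omega)} < \varepsilon$. Hence
$$\left|\int_\Omega h_\varepsilon(x)v(x)\,dx\right| \le \varepsilon\|v\|_{L^\infty(\Omega)} \quad \forall v \in C_c(\Omega). \tag{2.23}$$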
Consider
$$K_1 = \{x \in \Omega : h_\varepsilon(x) \ge \varepsilon\}, \qquad K_2 = \{x \in \Omega : h_\varepsilon(x) \le -\varepsilon\}.$$
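These two sets are disjoint. They are compact, being closed subsets of the compact set $\operatorname{spt} h_\varepsilon$. By the Tietze–Urysohn separation lemma, there exists a function $\varphi \in C_c(\Omega)$ such that
$$\varphi = 1 \text{ on } K_1, \qquad \varphi = -1 \text{ on } K_2, \qquad -1 \le \varphi(x) \le 1 \quad \forall x \in \Omega.$$
Taking $K = K_1 \cup K_2$, we have
$$\int_\Omega |h_\varepsilon|\,dx = \int_K |h_\varepsilon|\,dx + \int_{\Omega \setminus K} |h_\varepsilon|\,dx.$$
On $K$, we have $|h_\varepsilon| = h_\varepsilon\varphi$, so that
$$\int_K |h_\varepsilon|\,dx = \int_K h_\varepsilon\varphi\,dx = \int_\Omega h_\varepsilon\varphi\,dx - \int_{\Omega \setminus K} h_\varepsilon\varphi\,dx.$$
By (2.23), $\left|\int_\Omega h_\varepsilon\varphi\,dx\right| \le \varepsilon\|\varphi\|_{L^\infty} \le \varepsilon$. Since $|h_\varepsilon| \le \varepsilon$ on $\Omega \setminus K$, we also have $\left|\int_{\Omega \setminus K} h_\varepsilon\varphi\,dx\right| \le \varepsilon|\Omega|$. So,
$$\int_K |h_\varepsilon|\,dx \le \varepsilon(1 + |\Omega|).$$
Finally,
$$\int_\Omega |h_\varepsilon|\,dx \le \varepsilon(1 + |\Omega|) + \varepsilon|\Omega| = \varepsilon + 2\varepsilon|\Omega|$$
and
$$\int_\Omega |h|\,dx \le \int_\Omega |h - h_\varepsilon|\,dx + \int_\Omega |h_\varepsilon|\,dx \le 2\varepsilon(1 + |\Omega|).$$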
This being true for any $\varepsilon > 0$, we conclude that $h = 0$.

Note that $L^p(\Omega) \subset L^1_{loc}(\Omega)$ for any $1 \le p \le +\infty$. As a direct consequence of Theorem 2.2.1, we obtain the following corollary.

Corollary 2.2.1. Given $1 \le p \le +\infty$, suppose that $f \in L^p(\Omega)$ and $g \in L^p(\Omega)$ satisfy
$$\int_\Omega f(x)v(x)\,dx = \int_\Omega g(x)v(x)\,dx \quad \forall v \in \mathcal{D}(\Omega);$$
then $f = g$ a.e. on $\Omega$.

Let us notice that, when $1 < p < +\infty$, we can obtain this result in a more direct way, by using the density of $\mathcal{D}(\Omega)$ in $L^q(\Omega)$, $1 \le q < +\infty$. Then take $q = p'$, the Hölder conjugate exponent of $p$: $\frac{1}{p} + \frac{1}{p'} = 1$. Clearly, the density of $\mathcal{D}(\Omega)$ in $L^q(\Omega)$ follows from Proposition 2.2.2 combined with the density of $C_c(\Omega)$ in $L^q(\Omega)$. Because of the importance of this result, let us give another proof of it, of independent interest, which relies only on $L^p$ techniques.
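Proposition 2.2.3. Let $\Omega$ be an arbitrary open subset of $\mathbb{R}^N$. Then $\mathcal{D}(\Omega)$ is dense in $L^p(\Omega)$ for $1 \le p < +\infty$.

This will result from the following proposition.

Proposition 2.2.4. Let $f \in L^p(\mathbb{R}^N)$ with $1 \le p < +\infty$. Then, for any mollifier $(\rho_n)_{n \in \mathbb{N}}$, the following properties hold:
(i) $f * \rho_n \in L^p(\mathbb{R}^N)$;
(ii) $\|f * \rho_n\|_{L^p(\mathbb{R}^N)} \le \|f\|_{L^p(\mathbb{R}^N)}$;
(iii) $f * \rho_n \to f$ in $L^p(\mathbb{R}^N)$ as $n \to +\infty$.

Proof. To prove (i) and (ii), we omit the subscript $n \in \mathbb{N}$. Let us consider the case $1 < p < \infty$ and introduce $p'$ with $\frac{1}{p} + \frac{1}{p'} = 1$. Then
$$|(f * \rho)(x)| \le \int_{\mathbb{R}^N} |f(x - y)|\,\rho(y)\,dy = \int_{\mathbb{R}^N} |f(x - y)|\,\rho(y)^{1/p}\rho(y)^{1/p'}\,dy.$$
Let us apply the Hölder inequality:
$$|(f * \rho)(x)| \le \left(\int_{\mathbb{R}^N} |f(x - y)|^p \rho(y)\,dy\right)^{1/p}\left(\int_{\mathbb{R}^N} \rho(y)\,dy\right)^{1/p'}.$$
Since $\int_{\mathbb{R}^N} \rho(y)\,dy = 1$, we obtain
$$|(f * \rho)(x)|^p \le \int_{\mathbb{R}^N} |f(x - y)|^p \rho(y)\,dy.$$
Let us integrate with respect to $x \in \mathbb{R}^N$ and apply the Fubini–Tonelli theorem:
$$\int_{\mathbb{R}^N} |(f * \rho)(x)|^p\,dx \le \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} |f(x - y)|^p \rho(y)\,dy\,dx = \int_{\mathbb{R}^N} \rho(y)\left(\int_{\mathbb{R}^N} |f(x - y)|^p\,dx\right)dy = \int_{\mathbb{R}^N} |f(x)|^p\,dx.$$
Here we have used again that $\int \rho\,dx = 1$, together with the translation invariance of the Lebesgue measure on $\mathbb{R}^N$. Thus $f * \rho \in L^p$ and $\|f * \rho\|_{L^p} \le \|f\|_{L^p}$. For $p = 1$, the same Fubini–Tonelli argument applies directly to the first inequality, without Hölder. The convergence of $f * \rho_n$ to $f$ in $L^p(\mathbb{R}^N)$ relies on a quite similar computation. Using that $\int \rho_n\,dx = 1$, we can write
$$f(x) - (f * \rho_n)(x) = \int_{\mathbb{R}^N} [f(x) - f(x - y)]\,\rho_n(y)\,dy$$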
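and
$$|f(x) - (f * \rho_n)(x)| \le \int_{\mathbb{R}^N} |f(x) - f(x - y)|\,\rho_n(y)\,dy.$$
Let us rewrite this last inequality as
$$|f(x) - (f * \rho_n)(x)| \le \int_{\mathbb{R}^N} |f(x) - f(x - y)|\,\rho_n(y)^{1/p}\rho_n(y)^{1/p'}\,dy$$
and apply the Hölder inequality to obtain
$$|f(x) - (f * \rho_n)(x)|^p \le \int_{\mathbb{R}^N} |f(x) - f(x - y)|^p \rho_n(y)\,dy.$$
Integrating with respect to $x$ on $\mathbb{R}^N$ and applying the Fubini–Tonelli theorem, we obtain
$$\|f - f * \rho_n\|_{L^p}^p \le \int_{\mathbb{R}^N} \rho_n(y)\,\|f - \tau_y f\|_{L^p}^p\,dy,$$
where $\tau_y f(x) := f(x - y)$ denotes the translate of $f$. Let us introduce $\varphi(y) := \|f - \tau_y f\|_{L^p}^p$. Since $f \in L^p(\mathbb{R}^N)$ with $p < +\infty$, translations are continuous in $L^p$. Hence $\varphi$ is a continuous function on $\mathbb{R}^N$ such that $\varphi(0) = 0$. We conclude thanks to the following property: for every continuous $\varphi : \mathbb{R}^N \to \mathbb{R}$ with $\varphi(0) = 0$,
$$\lim_{n \to +\infty} \int_{\mathbb{R}^N} \varphi(y)\rho_n(y)\,dy = 0.$$
This results from the inequality
$$\left|\int_{\mathbb{R}^N} \varphi(y)\rho_n(y)\,dy - \varphi(0)\right| = \left|\int_{\mathbb{R}^N} (\varphi(y) - \varphi(0))\,\rho_n(y)\,dy\right| \le \int_{B(0, 1/n)} |\varphi(y) - \varphi(0)|\,\rho_n(y)\,dy \le \sup_{|y| \le 1/n} |\varphi(y) - \varphi(0)|,$$
which tends to zero as $n \to +\infty$. We will interpret this last result as the convergence in $\mathcal{D}'(\mathbb{R}^N)$ of the sequence $(\rho_n)$ to $\delta_0$, the Dirac mass at the origin. We can now complete the proof.

Proof of Proposition 2.2.3. Take $f \in L^p(\Omega)$, $\varepsilon > 0$, and $g \in C_c(\Omega)$ such that $\|f - g\|_{L^p(\Omega)} < \varepsilon$. Let us extend $g$ by zero outside of $\Omega$. We obtain a function, still denoted by $g$, which belongs to $C_c(\mathbb{R}^N)$. Take $f_n = g * \rho_n$. Then $f_n \in \mathcal{D}(\mathbb{R}^N)$. In fact $f_n \in \mathcal{D}(\Omega)$ for $n$ large enough, because $\operatorname{spt} f_n \subset \operatorname{spt} g + B(0, 1/n)$. Moreover, $f_n \to g$ in $L^p(\Omega)$ by Proposition 2.2.4 (iii), so that $\|f - f_n\|_{L^p} \le \|f - g\|_{L^p} + \|g - f_n\|_{L^p} \le 2\varepsilon$ for $n$ large enough.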
2.2.3 Radon measures
Let us recall that a Radon measure $\mu$ on $\Omega$ is a linear form on $C_c(\Omega)$ whose restriction to $C_K(\Omega)$ is continuous for each compact $K \subset \Omega$. That is, for each compact $K \subset \Omega$, there exists some $C(K) \ge 0$ such that
$$\forall v \in C_c(\Omega) \text{ with } \operatorname{spt} v \subset K, \quad |\mu(v)| \le C(K)\|v\|_\infty.$$
To such a Radon measure, one can associate its restriction to $\mathcal{D}(\Omega)$,
$$T_\mu : v \in \mathcal{D}(\Omega) \mapsto \int_\Omega v(x)\,d\mu(x),$$
which, by the definition of $\mu$, is a distribution of order zero. Conversely, $\mu$ is completely determined by the corresponding distribution $T_\mu$. This is a consequence of the density of $\mathcal{D}(\Omega)$ in $C_c(\Omega)$; see Proposition 2.2.2. As a consequence, we can identify any Radon measure with its corresponding distribution, and $\mathcal{M}(\Omega) \hookrightarrow \mathcal{D}'(\Omega)$. As a typical example of a measure which is not in $L^1_{loc}(\Omega)$, assume $0 \in \Omega$. Take $\mu = \delta_0$, the Dirac mass at the origin, with $\langle \mu, v\rangle_{(\mathcal{D}', \mathcal{D})} := v(0)$. To describe further examples of great importance in applications, we need to introduce further notions, namely, the derivation of distributions and weak limits of distributions.
2.2.4 Derivation of distributions, introduction to Sobolev spaces
Definition 2.2.4. Let $T \in \mathcal{D}'(\Omega)$ be a distribution on $\Omega$. Then $\frac{\partial T}{\partial x_i}$ is defined as the linear mapping on $\mathcal{D}(\Omega)$
$$\frac{\partial T}{\partial x_i} : v \in \mathcal{D}(\Omega) \mapsto -\left\langle T, \frac{\partial v}{\partial x_i}\right\rangle_{(\mathcal{D}', \mathcal{D})}.$$
More generally, for any multi-index $p = (p_1, \dots, p_N)$, we define
$$D^p T : v \in \mathcal{D}(\Omega) \mapsto (-1)^{|p|}\langle T, D^p v\rangle_{(\mathcal{D}', \mathcal{D})}.$$

Proposition 2.2.5. For any distribution $T$ on $\Omega$ and any multi-index $p \in \mathbb{N}^N$, $D^p T$ is still a distribution on $\Omega$, and, for any $v \in \mathcal{D}(\Omega)$,
$$\langle D^p T, v\rangle_{(\mathcal{D}'(\Omega), \mathcal{D}(\Omega))} = (-1)^{|p|}\langle T, D^p v\rangle_{(\mathcal{D}'(\Omega), \mathcal{D}(\Omega))}.$$

Proof. One just needs to notice that, for $p \in \mathbb{N}^N$ fixed, the mapping $v \mapsto D^p v$ is continuous from $\mathcal{D}(\Omega)$ into $\mathcal{D}(\Omega)$. This is an immediate consequence of the definition of sequential convergence in $\mathcal{D}(\Omega)$. Recall that this definition involves a compact support condition and the uniform convergence of the derivatives of arbitrary order. These two properties are clearly preserved by the operations $D^p$.

Therefore, every distribution in $\mathcal{D}'(\Omega)$ possesses derivatives of arbitrary order in $\mathcal{D}'(\Omega)$. The notion of derivative $D^p T$ of a distribution has been defined so as to
extend the classical notion of derivative for a smooth function. Let us recall the identification we make between $f \in L^1_{loc}(\Omega)$ and the corresponding distribution $T_f$.
Proposition 2.2.6. Let $f$ be some function in the set $C^m(\Omega)$ of real-valued functions of class $C^m$ in $\Omega$. Then, for any $p \in \mathbb{N}^N$ with $|p| \le m$, the distribution derivative $D^p f$ coincides with the classical derivative $D^p f$ of functions.
Proof. It is a direct consequence of the integration by parts formula. If $f \in C^1(\Omega)$ and $v \in \mathcal{D}(\Omega)$,
$$\int_\Omega \frac{\partial f}{\partial x_i}(x)\,v(x)\,dx = -\int_\Omega f(x)\,\frac{\partial v}{\partial x_i}(x)\,dx.$$
Similarly, integration by parts $|p|$ times gives the following formula:
$$\int_\Omega (D^p f)(x)\,v(x)\,dx = (-1)^{|p|}\int_\Omega f(x)\,D^p v(x)\,dx;$$
this formula is valid for $f \in C^{|p|}(\Omega)$ and $v \in \mathcal{D}(\Omega)$. The fact that the test function $v$ has compact support is essential: it makes the boundary integral over $\partial\Omega$ in the integration by parts formula vanish.
We stress that the notion of derivative $D^p T$ of a distribution $T \in \mathcal{D}'(\Omega)$ takes the integration by parts formula as its definition, so that the derivation operation is transferred onto the test functions. This can be done at an arbitrary order since the test functions have been taken indefinitely differentiable. The two previous remarks justify the choice of test functions $v \in \mathcal{D}(\Omega)$.
We can now describe a fundamental example of distribution coming from the theory of Sobolev spaces. This theory, which plays a central role in the variational approach to a large number of boundary value problems (like the Dirichlet problem), will be developed in detail in chapter 5. Here we give some definitions and elementary examples. For any $m \in \mathbb{N}$ and $p \in [1,+\infty]$,
$$W^{m,p}(\Omega) = \{f \in L^p(\Omega) : D^j f \in L^p(\Omega)\ \ \forall j,\ |j| \le m\}.$$
One of the most important Sobolev spaces is the space
$$W^{1,2}(\Omega) = H^1(\Omega) = \left\{f \in L^2(\Omega) : \frac{\partial f}{\partial x_i} \in L^2(\Omega),\ i = 1,2,\dots,N\right\}.$$
In the above definition, the derivative $\frac{\partial f}{\partial x_i}$ (or, more generally, $D^j f$) is taken in the distribution sense. We will see that this choice of derivation is fundamental to obtain the desirable properties for the corresponding spaces.
As an elementary example, let us consider $\Omega = (-1,1)$ and $f(x) = |x|$. Clearly, $f$ is not differentiable in the classical sense at the origin. The function $f$ is continuous on $\Omega$ and belongs to every $L^p(\Omega)$, $1 \le p \le +\infty$. Thus it defines a distribution, and we can compute its first distribution derivative $Df$:
$$\langle Df, v\rangle_{(\mathcal{D}'(\Omega),\mathcal{D}(\Omega))} := -\int_{-1}^{1} f(x)\,v'(x)\,dx.$$
Let us write
$$\int_{-1}^{1} f(x)v'(x)\,dx = \int_{-1}^{0} f(x)v'(x)\,dx + \int_{0}^{1} f(x)v'(x)\,dx$$
and integrate by parts on each of the intervals $(-1,0)$ and $(0,1)$. This is possible since now $f \in C^1([-1,0])$ and $f \in C^1([0,1])$. We have
$$\int_{-1}^{0} f(x)v'(x)\,dx = f(0)v(0) - f(-1)v(-1) - \int_{-1}^{0} f'(x)v(x)\,dx,$$
$$\int_{0}^{1} f(x)v'(x)\,dx = f(1)v(1) - f(0)v(0) - \int_{0}^{1} f'(x)v(x)\,dx.$$
Note that since $v \in \mathcal{D}(-1,1)$, we have $v(-1) = v(1) = 0$, but for a general $v \in \mathcal{D}(-1,1)$, $v(0) \neq 0$. By adding the two above equalities, the terms containing $v(0)$ cancel and we obtain
$$\int_{-1}^{1} f(x)v'(x)\,dx = -\left[\int_{-1}^{0} (-v(x))\,dx + \int_{0}^{1} v(x)\,dx\right] \quad \forall v \in \mathcal{D}(-1,1),$$
that is,
$$\int_{-1}^{1} f(x)v'(x)\,dx = -\int_{-1}^{1} g(x)v(x)\,dx, \qquad\text{where } g(x) = \begin{cases} -1 & \text{if } -1 < x < 0,\\ \ \ 1 & \text{if } 0 < x < 1.\end{cases}$$
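The identity just obtained can be checked numerically. The sketch below is an illustration under stated assumptions. The pairing $\langle Df, v\rangle := -\langle f, v'\rangle$ is evaluated with a midpoint rule on $(-1,1)$, and $v'$ is replaced by a central difference quotient with step `h`, an arbitrary choice of ours. Applied to $f(x) = |x|$ and a bump $v$ whose support contains the origin, it should reproduce $\int_{-1}^1 g\,v\,dx$ with $g = \operatorname{sign}$.

```python
import math

def bump(x, c=0.2, rad=0.6):
    """A C-infinity test function supported in (c - rad, c + rad), here (-0.4, 0.8)."""
    s = (x - c) / rad
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - s * s))

def integrate(g, a=-1.0, b=1.0, n=20000):
    """Composite midpoint rule; with n even, x = 0 is a cell boundary, not a node."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def distributional_derivative(f, h=1e-5):
    """Return v -> -int f v' dx, the action of Df on a test function v."""
    def Df(v):
        dv = lambda x: (v(x + h) - v(x - h)) / (2.0 * h)  # central difference for v'
        return -integrate(lambda x: f(x) * dv(x))
    return Df

def sign(x):
    return -1.0 if x < 0 else 1.0

Df = distributional_derivative(abs)
lhs = Df(bump)                                # <D|x|, v>
rhs = integrate(lambda x: sign(x) * bump(x))  # <g, v> with g = sign
print(lhs, rhs)
```

The two numbers agree up to discretization error, even though $|x|$ has no classical derivative at the one point where the test function is largest in the transition region.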
The above function $g$ is then the distributional derivative of $f(x) = |x|$ on $\Omega = (-1,1)$. It belongs to $L^p(\Omega)$ for any $1 \le p \le +\infty$, so that $f \in W^{1,p}(\Omega)$ for any $1 \le p \le +\infty$.
The parameters $m \in \mathbb{N}$ and $p \in [1,+\infty]$ yield a scale of spaces that allows us to distinguish, for example, the different behaviors of the functions $f_\alpha(x) = |x|^\alpha$ and of their derivatives at zero. A computation similar to the one above yields $Df_\alpha = \alpha|x|^{\alpha-2}x$ in $\mathcal{D}'(-1,1)$. Hence, for $0 < \alpha < 1$, $f_\alpha \in W^{1,p}(-1,1)$ iff $\int_{-1}^{1}|x|^{p(\alpha-1)}\,dx < +\infty$, that is, $p < \frac{1}{1-\alpha}$. When $p = 2$, we have that $f_\alpha(x) = |x|^\alpha$ belongs to $H^1(-1,1)$ iff $\alpha > \frac12$. This expresses that, in some sense, the derivative of $f_\alpha$ at $x$ does not tend to $+\infty$ too rapidly when $x$ goes to zero.
Let us now examine the other parameter, $m$, which is related to the order of derivation. Take again $f(x) = |x|$ and compute the second order derivative of $f$ on $(-1,1)$. This amounts to computing the first order derivative of $g(x) = \operatorname{sign} x$. Thus
$$\langle D^2 f, v\rangle_{(\mathcal{D}',\mathcal{D})} = \langle Dg, v\rangle_{(\mathcal{D}',\mathcal{D})} = -\int_{-1}^{1} g(x)\,v'(x)\,dx.$$
As before, let us split the integral over $(-1,1)$ into two parts:
$$\int_{-1}^{1} g(x)v'(x)\,dx = -\int_{-1}^{0} v'(x)\,dx + \int_{0}^{1} v'(x)\,dx = -[v(0) - v(-1)] + [v(1) - v(0)] = -2v(0),$$
since $v \in \mathcal{D}(-1,1)$ and $v(-1) = v(1) = 0$. Thus $\langle Dg, v\rangle_{(\mathcal{D}'(-1,1),\mathcal{D}(-1,1))} = 2v(0)$ and $Dg = 2\delta_0$, where $\delta_0$ is the Dirac mass at the origin. But the distribution $\delta_0$ is a measure which is not representable by a function. Suppose that there exists a function $h \in L^1_{loc}(-1,1)$ such that
$$\forall v \in \mathcal{D}(-1,1) \quad v(0) = \int_{-1}^{1} h(x)\,v(x)\,dx.$$
Then, taking successively $v \in \mathcal{D}(-1,0)$ and $v \in \mathcal{D}(0,1)$, we conclude by Theorem 2.2.1 that $h = 0$ a.e. on $(-1,0)$ and on $(0,1)$. Hence $h = 0$ a.e. on $(-1,1)$, which would imply $v(0) = 0$ for every $v \in \mathcal{D}(-1,1)$, a clear contradiction. Therefore $f \in W^{1,2}(-1,1)$ but $f \notin W^{2,1}(-1,1)$.
The above computation of the distributional derivative of a function $g$ with a discontinuity is very important in a number of applications (phase transitions, plasticity, image segmentation, etc.). This situation will be considered in detail in chapter 10 and will lead us to the introduction of the functional space $BV(\Omega)$, the space of functions with bounded variation. It can be characterized as the space of integrable functions whose first distributional derivatives are bounded measures. The next operation on distributions that is very useful for applications is the notion of limit of a sequence of distributions.
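The computation $Dg = 2\delta_0$ for $g = \operatorname{sign}$ can be checked in the same numerical style. The sketch assumes a midpoint rule on $(-1,1)$ and a central difference for $v'$. It compares $-\int_{-1}^1 g\,v'\,dx$ with $2v(0)$ for a few bump test functions; the helper names and bump parameters are illustrative choices of ours.

```python
import math

def bump(x, c, rad):
    """A C-infinity test function supported in (c - rad, c + rad)."""
    s = (x - c) / rad
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - s * s))

def integrate(g, a=-1.0, b=1.0, n=20000):
    """Composite midpoint rule on [a, b]; x = 0 is a cell boundary, never a node."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def sign(x):
    return -1.0 if x < 0 else 1.0

def D_sign(v, h=1e-5):
    """<Dg, v> = -int g v' dx for g = sign, with v' replaced by a central difference."""
    dv = lambda x: (v(x + h) - v(x - h)) / (2.0 * h)
    return -integrate(lambda x: sign(x) * dv(x))

results = []
for c, rad in [(0.0, 0.5), (0.2, 0.6), (-0.1, 0.3)]:
    v = lambda x, c=c, rad=rad: bump(x, c, rad)
    results.append((D_sign(v), 2.0 * v(0.0)))  # the two entries should agree
print(results)
```

Only the value of $v$ at the origin survives the integration, which is exactly the action of $2\delta_0$.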
2.2.5 Convergence of sequences of distributions
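Definition 2.2.5. Let $T_n \in \mathcal{D}'(\Omega)$ for all $n \in \mathbb{N}$ and $T \in \mathcal{D}'(\Omega)$. The sequence $(T_n)_{n\in\mathbb{N}}$ is said to converge to $T$ in $\mathcal{D}'(\Omega)$ if
$$\forall v \in \mathcal{D}(\Omega) \quad \lim_{n\to+\infty} T_n(v) = T(v).$$
We will write $T_n \to T$ in $\mathcal{D}'(\Omega)$ or $\lim_{n\to+\infty} T_n = T$ in $\mathcal{D}'(\Omega)$. In Section 2.4, we will interpret this convergence as a weak* convergence in the dual space $\mathcal{D}'(\Omega)$ of $\mathcal{D}(\Omega)$. Let us recall that any distribution $T \in \mathcal{D}'(\Omega)$ possesses derivatives of arbitrary orders. The following result expresses that, for any multi-index $p \in \mathbb{N}^N$, the mapping $T \mapsto D^p T$ is continuous.
Proposition 2.2.7. Let $p \in \mathbb{N}^N$. The mapping $T \in \mathcal{D}'(\Omega) \mapsto D^p T \in \mathcal{D}'(\Omega)$ is continuous. This means that, for any sequence $(T_n)_{n\in\mathbb{N}}$ and any $T$ in $\mathcal{D}'(\Omega)$, the following implication holds:
$$T_n \to T \text{ in } \mathcal{D}'(\Omega) \ \Longrightarrow\ D^p T_n \to D^p T \text{ in } \mathcal{D}'(\Omega).$$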
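Proof. The proof is a direct consequence of the definitions. Let us assume that $T_n \to T$ in $\mathcal{D}'(\Omega)$ and take $v \in \mathcal{D}(\Omega)$. By definition of $D^p$,
$$\langle D^p T_n, v\rangle = (-1)^{|p|}\langle T_n, D^p v\rangle \quad \forall v \in \mathcal{D}(\Omega).$$
Since $v \in \mathcal{D}(\Omega)$, $D^p v$ still belongs to $\mathcal{D}(\Omega)$, and the convergence of $T_n$ to $T$ in $\mathcal{D}'(\Omega)$ implies
$$\lim_{n\to+\infty}\langle T_n, D^p v\rangle = \langle T, D^p v\rangle.$$
Thus, again by the definition of $D^p$,
$$\lim_{n\to+\infty}\langle D^p T_n, v\rangle = (-1)^{|p|}\langle T, D^p v\rangle = \langle D^p T, v\rangle,$$
which expresses that $D^p T_n \to D^p T$ in $\mathcal{D}'(\Omega)$.
The above proposition is one of the reasons that explain the success of the theory of distributions: it makes this theory a very flexible tool for the study of PDEs. We will often use this type of argument, for example, in the chapter on Sobolev spaces. Suppose $(v_n)_{n\in\mathbb{N}}$ is a sequence in $H^1(\Omega)$ such that
$$v_n \to v \text{ in } L^2(\Omega), \qquad \frac{\partial v_n}{\partial x_i} \to g_i \text{ in } L^2(\Omega), \quad i = 1,2,\dots,N.$$
Then $v \in H^1(\Omega)$ and $g_i = \frac{\partial v}{\partial x_i}$, $i = 1,2,\dots,N$. This can be justified in the language of distributions as follows. Since $v_n \to v$ in $L^2(\Omega)$, we have $v_n \to v$ in $\mathcal{D}'(\Omega)$ and hence $\frac{\partial v_n}{\partial x_i} \to \frac{\partial v}{\partial x_i}$ in $\mathcal{D}'(\Omega)$. On the other hand, $\frac{\partial v_n}{\partial x_i} \to g_i$ in $L^2(\Omega)$ and hence in $\mathcal{D}'(\Omega)$. The uniqueness of the limit in $\mathcal{D}'(\Omega)$ implies that $\frac{\partial v}{\partial x_i} = g_i$ for all $i = 1,2,\dots,N$, so that $v \in H^1(\Omega)$.
Let us give another illustration of the above tools and compute the fundamental solution of the Laplacian in $\mathbb{R}^3$.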
Proposition 2.2.8. Take $N = 3$ and consider the Newtonian potential
$$f(x) = \frac{1}{\sqrt{x_1^2 + x_2^2 + x_3^2}}.$$
Then
$$-\frac{1}{4\pi}\Delta f = \delta.$$
Proof. Let us denote by $r(x) = \sqrt{x_1^2 + x_2^2 + x_3^2}$ the Euclidean distance of $x = (x_1,x_2,x_3)$ from the origin, and notice that $f(x) = \frac{1}{r(x)}$ belongs to $L^1_{loc}(\mathbb{R}^3)$ and thus defines a distribution on $\mathbb{R}^3$. Let us compute $\Delta f$ in $\mathcal{D}'(\mathbb{R}^3)$. A standard approach consists in approximating $f$ by a sequence $f_\varepsilon$ of smooth functions, then computing $\Delta f_\varepsilon$ in the classical sense by Proposition 2.2.6, and passing to the limit in $\mathcal{D}'(\mathbb{R}^3)$ as $\varepsilon \to 0$. Clearly, the difficulty lies at the origin, where $f$ has a singularity. Thus, the parameter $\varepsilon$ is intended to isolate the origin and
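regularize $f$ at the origin. At this point, there are two possibilities, which lead to different computations and which we examine now. First, take for $\varepsilon > 0$
$$f_\varepsilon(x) = \begin{cases} 1/\varepsilon & \text{if } r(x) \le \varepsilon,\\ 1/r(x) & \text{if } r(x) \ge \varepsilon.\end{cases}$$
Clearly, $f_\varepsilon$ is now a continuous, piecewise smooth function on $\mathbb{R}^3$ (it is not $C^1$), and a standard computation yields that $\frac{\partial f_\varepsilon}{\partial x_i}$ belongs to $L^2(\mathbb{R}^3)$, with
$$\frac{\partial f_\varepsilon}{\partial x_i} = \begin{cases} 0 & \text{if } r < \varepsilon,\\ -x_i/r^3 & \text{if } r > \varepsilon.\end{cases}$$
Note that $\frac{\partial r}{\partial x_i} = \frac{x_i}{r}$ for $i = 1,2,3$. Let us now compute $-\Delta f_\varepsilon$. By definition,
$$\langle -\Delta f_\varepsilon, v\rangle_{(\mathcal{D}'(\mathbb{R}^3),\mathcal{D}(\mathbb{R}^3))} = \langle f_\varepsilon, -\Delta v\rangle_{(\mathcal{D}',\mathcal{D})} = \sum_{i=1}^{3}\left\langle \frac{\partial f_\varepsilon}{\partial x_i}, \frac{\partial v}{\partial x_i}\right\rangle_{(\mathcal{D}',\mathcal{D})}.$$
Since $\frac{\partial f_\varepsilon}{\partial x_i}$ belongs to $L^1_{loc}(\mathbb{R}^3)$,
$$\langle -\Delta f_\varepsilon, v\rangle = \sum_{i=1}^{3}\int_{\mathbb{R}^3}\frac{\partial f_\varepsilon}{\partial x_i}\frac{\partial v}{\partial x_i}\,dx = -\sum_{i=1}^{3}\int_{\{r \ge \varepsilon\}}\frac{x_i}{r^3}\frac{\partial v}{\partial x_i}\,dx.$$
Let us now integrate this last expression by parts. Noticing that on $\mathbb{R}^3 \setminus \{0\}$
$$\sum_{i=1}^{3}\frac{\partial}{\partial x_i}\left(\frac{x_i}{r^3}\right) = \sum_{i=1}^{3}\left(\frac{1}{r^3} - \frac{3}{r^4}\cdot\frac{x_i}{r}\cdot x_i\right) = \frac{3}{r^3} - \frac{3\sum_{i=1}^3 x_i^2}{r^5} = \frac{3}{r^3} - \frac{3}{r^3} = 0$$
(which means that $\Delta f = 0$ on $\mathbb{R}^3 \setminus \{0\}$), we obtain
$$\langle -\Delta f_\varepsilon, v\rangle = -\int_{S_\varepsilon}\sum_{i=1}^{3}\frac{x_i}{r^3}\left(-\frac{x_i}{r}\right)v\,d\sigma,$$
where $S_\varepsilon = \{x \in \mathbb{R}^3 : r(x) = \varepsilon\}$ is the sphere of radius $\varepsilon$ centered at the origin. Note that the unit normal to $S_\varepsilon$ at $x$ oriented toward the outside of $\{r \ge \varepsilon\}$ is equal to $-\frac{x}{r}$. Hence
$$\langle -\Delta f_\varepsilon, v\rangle_{(\mathcal{D}',\mathcal{D})} = \int_{S_\varepsilon}\frac{1}{r(x)^2}\,v(x)\,d\sigma(x) = \langle \mu_\varepsilon, v\rangle_{(\mathcal{D}',\mathcal{D})},$$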
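where $\mu_\varepsilon = \varepsilon^{-2}\,\mathcal{H}^2\lfloor S_\varepsilon$ and $\mathcal{H}^2\lfloor S_\varepsilon$ is the two-dimensional Hausdorff measure supported by $S_\varepsilon$. An elementary calculus yields
$$\lim_{\varepsilon\to0}\frac{1}{4\pi}\mu_\varepsilon = \delta \quad\text{in } \mathcal{D}'(\mathbb{R}^3).$$
Indeed, since $\mathcal{H}^2(S_\varepsilon) = 4\pi\varepsilon^2$, the quantity $\frac{1}{4\pi}\langle \mu_\varepsilon, v\rangle$ is the mean value of $v$ over $S_\varepsilon$, which tends to $v(0)$ by continuity of $v$. By definition of the convergence in $\mathcal{D}'(\mathbb{R}^3)$ we therefore have
$$-\frac{1}{4\pi}\Delta f_\varepsilon \to \delta \quad\text{in } \mathcal{D}'(\mathbb{R}^3).$$
On the other hand, $f_\varepsilon$ converges to $f$ in $L^1_{loc}(\mathbb{R}^3)$ (in fact $f_\varepsilon - f \to 0$ in $L^1(\mathbb{R}^3)$, for example by the dominated convergence theorem), and hence in $\mathcal{D}'(\mathbb{R}^3)$. By the continuity of the differential operator $\Delta$ in $\mathcal{D}'(\mathbb{R}^3)$ (cf. Proposition 2.2.7) we finally obtain $-\frac{1}{4\pi}\Delta f = \delta$.
Another regularization consists of building a function $f_\varepsilon \in C^1(\mathbb{R}^3)$ which approximates $f$. Take, for example,
$$f_\varepsilon(x) = \begin{cases} a_\varepsilon r^2(x) + b_\varepsilon & \text{if } r(x) \le \varepsilon,\\ 1/r(x) & \text{if } r(x) \ge \varepsilon,\end{cases}$$
with $a_\varepsilon$ and $b_\varepsilon$ chosen so that $f_\varepsilon \in C^1(\mathbb{R}^3)$. This is equivalent to the system
$$a_\varepsilon\varepsilon^2 + b_\varepsilon = 1/\varepsilon, \qquad a_\varepsilon = -1/(2\varepsilon^3),$$
which gives $b_\varepsilon = 3/(2\varepsilon)$. Noticing that
$$a_\varepsilon r^2 + b_\varepsilon = -\frac{r^2}{2\varepsilon^3} + \frac{3}{2\varepsilon} = \frac{1}{2\varepsilon}\left(3 - \frac{r^2}{\varepsilon^2}\right) \le \frac{3}{2\varepsilon} \le \frac{3}{2r(x)} \quad\text{for } r(x) \le \varepsilon,$$
we have
$$0 \le f_\varepsilon(x) \le \frac{3}{2r(x)} = \frac32 f(x) \quad\text{on } \mathbb{R}^3.$$
Hence, by the Lebesgue dominated convergence theorem, $f_\varepsilon \to f$ in $L^1_{loc}(\mathbb{R}^3)$ as $\varepsilon \to 0$. So $f_\varepsilon \to f$ in $\mathcal{D}'(\mathbb{R}^3)$ and $\Delta f_\varepsilon \to \Delta f$ in $\mathcal{D}'(\mathbb{R}^3)$. An elementary computation yields
$$-\Delta f_\varepsilon = -6a_\varepsilon\,\mathbf{1}_{B(0,\varepsilon)} = \frac{3}{\varepsilon^3}\,\mathbf{1}_{B(0,\varepsilon)}.$$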
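Hence
$$-\frac{1}{4\pi}\Delta f_\varepsilon = \frac{1}{\frac43\pi\varepsilon^3}\,\mathbf{1}_{B(0,\varepsilon)}.$$
Noticing that $\frac43\pi\varepsilon^3$ is precisely the volume of the ball $B(0,\varepsilon)$, we have
$$\lim_{\varepsilon\to0}\frac{1}{\frac43\pi\varepsilon^3}\,\mathbf{1}_{B(0,\varepsilon)} = \delta,$$
which finally implies
$$-\frac{1}{4\pi}\Delta f = \delta$$
and completes the proof.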
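The second regularization lends itself to a quick numerical sanity check; the Python sketch below is illustrative only. First, it evaluates the radial Laplacian $g'' + \frac{2}{r}g'$ of $f_\varepsilon$ by finite differences. This should give $6a_\varepsilon = -3/\varepsilon^3$ inside the ball and $0$ outside. Second, it pairs $-\frac{1}{4\pi}\Delta f_\varepsilon = \frac{1}{\frac43\pi\varepsilon^3}\mathbf{1}_{B(0,\varepsilon)}$ with the radial function $v(x) = e^{-|x|^2}$, which reduces the volume integral to $\frac{3}{\varepsilon^3}\int_0^\varepsilon v(r)\,r^2\,dr$. The Gaussian is our choice, not the text's. It is not compactly supported, but the pairing is still meaningful because $\mathbf{1}_{B(0,\varepsilon)}$ is. The pairing should approach $v(0) = 1$ as $\varepsilon \to 0$.

```python
import math

def f_eps(r, eps):
    """C^1 regularization of 1/r: a r^2 + b for r <= eps, 1/r for r >= eps."""
    a = -1.0 / (2.0 * eps ** 3)
    b = 3.0 / (2.0 * eps)
    return a * r * r + b if r <= eps else 1.0 / r

def radial_laplacian(g, r, h=1e-4):
    """Laplacian in R^3 of x -> g(|x|), i.e. g''(r) + (2/r) g'(r), by central differences."""
    d1 = (g(r + h) - g(r - h)) / (2.0 * h)
    d2 = (g(r + h) - 2.0 * g(r) + g(r - h)) / (h * h)
    return d2 + 2.0 * d1 / r

def pair_with_ball_density(v_radial, eps, n=2000):
    """<-(1/4pi) Laplacian f_eps, v> = (3/eps^3) int_0^eps v(r) r^2 dr (midpoint rule)."""
    h = eps / n
    s = sum(v_radial((k + 0.5) * h) * ((k + 0.5) * h) ** 2 for k in range(n))
    return 3.0 / eps ** 3 * s * h

eps = 0.5
inside = radial_laplacian(lambda r: f_eps(r, eps), 0.2)   # expected -3/eps^3 = -24
outside = radial_laplacian(lambda r: f_eps(r, eps), 1.0)  # expected 0 (1/r is harmonic)
gauss = lambda r: math.exp(-r * r)
pairings = [pair_with_ball_density(gauss, e) for e in (1.0, 0.1, 0.01)]
print(inside, outside, pairings)  # pairings increase toward v(0) = 1
```

Each pairing is a weighted average of $v$ over a shrinking ball, which is why it increases monotonically toward $v(0)$ for this radially decreasing $v$.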
2.3 Weak solutions

2.3.1 Weak formulation of the model examples

The Dirichlet problem. Let $\Omega$ be an open subset of $\mathbb{R}^N$ and $f : \Omega \to \mathbb{R}$ a given function; take $f \in L^2(\Omega)$, for example. We recall that the Dirichlet problem is to find a function $u : \Omega \to \mathbb{R}$ which solves
$$\begin{cases} -\Delta u = f & \text{on } \Omega,\\ u = 0 & \text{on } \partial\Omega.\end{cases} \tag{2.24}$$
In Section 2.1, we explained that it is difficult to prove directly the existence of a classical solution to this problem. By classical solution, we mean a function $u$ which is continuous on $\overline{\Omega}$ and of class $C^2$ on $\Omega$. So, the idea is to allow $u$ to be less regular (at least in a first stage) and to interpret $\Delta u$ in a weak sense, namely, in the sense of distributions. Taking test functions $v \in \mathcal{D}(\Omega)$, the equation $-\Delta u = f$ in (2.24) is equivalent to
$$\langle -\Delta u, v\rangle_{(\mathcal{D}'(\Omega),\mathcal{D}(\Omega))} = \int_\Omega f v\,dx \quad \forall v \in \mathcal{D}(\Omega).$$
The definition of the derivation of distributions (see Definition 2.2.4) is precisely based on the integration by parts formula and allows us to transfer the derivation operation from $u$ onto the test functions $v \in \mathcal{D}(\Omega)$. At this point, we have two possibilities. The two equalities
$$\langle -\Delta u, v\rangle = \sum_{i=1}^{N}\left\langle \frac{\partial u}{\partial x_i}, \frac{\partial v}{\partial x_i}\right\rangle_{(\mathcal{D}'(\Omega),\mathcal{D}(\Omega))}, \tag{2.25}$$
$$\langle -\Delta u, v\rangle = \langle u, -\Delta v\rangle_{(\mathcal{D}'(\Omega),\mathcal{D}(\Omega))} \tag{2.26}$$
correspond, respectively, to a partial transfer and a global transfer of the derivatives onto the test functions. They give rise to two distinct weak formulations of the initial problem, depending on the regularity properties we expect the solution $u$ to satisfy.
If we expect the solution $u$ to have first distribution derivatives which are integrable, then (2.25) gives rise to
$$\begin{cases}\displaystyle\sum_{i=1}^{N}\int_\Omega \frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_i}\,dx = \int_\Omega f v\,dx \quad \forall v \in \mathcal{D}(\Omega),\\ u = 0 \text{ on } \partial\Omega.\end{cases} \tag{2.27}$$
If we do not expect $u$ to have integrable first derivatives, but only expect $u$ to be in $L^1(\Omega)$ or in $C(\overline{\Omega})$, then by using (2.26), we obtain the (very) weak formulation
$$\begin{cases} -\displaystyle\int_\Omega u\,\Delta v\,dx = \int_\Omega f v\,dx \quad \forall v \in \mathcal{D}(\Omega),\\ u = 0 \text{ on } \partial\Omega.\end{cases} \tag{2.28}$$
For several reasons, the weak formulation (2.27) is the one that is well adapted to our situation:
(a) As a general rule, we will see that it is preferable to perform exactly the integrations by parts that are necessary and no more. Otherwise, the solution obtained satisfies the equation only in a very weak sense, we have only poor information on this solution, and the study of its uniqueness and regularity becomes quite involved.
(b) When trying to give a sense to the boundary condition $u = 0$ on $\partial\Omega$, it is useful to have some information on the derivatives of $u$ on $\Omega$. Finding a weak solution $u$ about which we only know that it belongs to some $L^p(\Omega)$ space is not sufficient to give meaning to the trace of $u$ on $\partial\Omega$ (recall that $\partial\Omega$ has zero Lebesgue measure).
Since we will be able to find a weak solution of the Dirichlet problem in the space $H^1(\Omega)$ (that is, with first order distribution derivatives in $L^2(\Omega)$), we will use (2.27) as a variational formulation of the Dirichlet problem. Note, too, that (2.27) has another advantage over (2.28). The left-hand side of the equation, which is the important part and involves the partial differential operator governing the equation, is symmetric with respect to $u$ and $v$ in (2.27), while it is not in (2.28). This has important consequences for the variational formulation of the problem; see Section 2.3.2 and chapter 3. Let us summarize the above comments and give a first definition of the notion of a weak solution for the Dirichlet problem. It will be made precise later, and the problem will be solved in chapter 6.
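Given $f \in L^2(\Omega)$, a weak solution $u$ of the Dirichlet problem (2.24) is a function $u \in H^1(\Omega)$ which satisfies
$$\begin{cases}\displaystyle\sum_{i=1}^{N}\int_\Omega \frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_i}\,dx = \int_\Omega f v\,dx \quad \forall v \in \mathcal{D}(\Omega),\\ u = 0 \text{ on } \partial\Omega.\end{cases} \tag{2.29}$$
Later on, we will justify the choice of the functional space $H^1(\Omega)$ and explain how to interpret the trace of such functions on $\partial\Omega$. We will reformulate (2.29) by introducing the subspace $H^1_0(\Omega)$ of $H^1(\Omega)$:
$$H^1_0(\Omega) = \{v \in H^1(\Omega) : v = 0 \text{ on } \partial\Omega\}.$$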
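In fact, $H^1_0(\Omega)$ is equal to the closure of $\mathcal{D}(\Omega)$ in $H^1(\Omega)$. As a consequence, the equality (2.29) can be extended by a density and continuity argument to all test functions in $H^1_0(\Omega)$. In this way we obtain the classical weak formulation of the Dirichlet problem.
Definition 2.3.1. A weak solution of the Dirichlet problem is a solution of the following system:
$$\begin{cases}\displaystyle\sum_{i=1}^{N}\int_\Omega \frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_i}\,dx = \int_\Omega f v\,dx \quad \forall v \in H^1_0(\Omega),\\ u \in H^1_0(\Omega).\end{cases} \tag{2.30}$$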
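To see (2.30) at work, here is a minimal Galerkin sketch in Python for the one-dimensional case $\Omega = (0,1)$. It assumes piecewise linear (P1) hat functions $\varphi_i$ on a uniform mesh as a finite-dimensional subspace of $H^1_0(0,1)$; this discretization is our illustrative choice and is not developed in the text. Testing (2.30) against each $\varphi_i$ gives the tridiagonal system $\sum_j a(\varphi_j,\varphi_i)\,U_j = L(\varphi_i)$. The boundary condition is built into the space, since no hat function is attached to the endpoints. The load $L(\varphi_i)$ is approximated by $h f(x_i)$, which is exact when $f$ is constant.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: lower[i] = A[i+1][i], upper[i] = A[i][i+1] (no pivoting)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    d[0] = rhs[0] / diag[0]
    if n > 1:
        c[0] = upper[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def dirichlet_p1(f, m):
    """P1 Galerkin solution of: u in H^1_0(0,1), int u'v' = int f v for all v.
    m interior nodes; returns (nodes, nodal values)."""
    h = 1.0 / (m + 1)
    nodes = [(i + 1) * h for i in range(m)]
    diag = [2.0 / h] * m             # a(phi_i, phi_i)
    off = [-1.0 / h] * (m - 1)       # a(phi_i, phi_{i+1})
    rhs = [h * f(x) for x in nodes]  # L(phi_i) approximated by h f(x_i)
    return nodes, solve_tridiagonal(off, diag, off, rhs)

nodes, u = dirichlet_p1(lambda x: 1.0, 49)
err = max(abs(ui - x * (1.0 - x) / 2.0) for x, ui in zip(nodes, u))
nodes2, u2 = dirichlet_p1(lambda x: math.pi ** 2 * math.sin(math.pi * x), 99)
err2 = max(abs(ui - math.sin(math.pi * x)) for x, ui in zip(nodes2, u2))
print(err, err2)  # err is at rounding level; err2 is O(h^2)
```

For $f = 1$ the nodal values coincide with $u(x) = x(1-x)/2$ up to rounding. For $f = \pi^2\sin(\pi x)$ the error with respect to $u(x) = \sin(\pi x)$ decreases like $h^2$.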
Note that (2.30) can be written in the following abstract form: find $u \in V$ such that $a(u,v) = L(v)$ for all $v \in V$, where $a : V \times V \to \mathbb{R}$ is a bilinear form which is symmetric and positive ($a(v,v) \ge 0$ for every $v \in V$) and $L$ is a linear form on $V$. In chapter 3, an existence result for such an abstract problem will be proved (Lax–Milgram theorem). In chapter 5 the basic ingredients of the theory of Sobolev spaces will be developed, for example, to treat the case $V = H^1_0(\Omega)$. Thus, in chapter 6 we will be able to prove the existence of a weak solution to the Dirichlet problem.

The Neumann problem. We recall that the Neumann problem consists of finding a solution $u$ to the boundary value problem
$$\begin{cases} -\Delta u + u = f & \text{on } \Omega,\\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega,\end{cases} \tag{2.31}$$
where $\frac{\partial u}{\partial n} = Du\cdot n$ is the outward normal derivative of $u$ on $\partial\Omega$. A major difference between the Dirichlet and the Neumann problems is that in the Neumann problem the value of $u$ on the boundary is not prescribed; it is $\frac{\partial u}{\partial n}$ that is prescribed. As a consequence, we have to test $u$ both on $\Omega$ and on $\partial\Omega$, and it is not sufficient to take test functions $v \in \mathcal{D}(\Omega)$. We will take test functions $v \in C^1(\overline{\Omega})$. We are no longer in the setting of distribution theory, but we can follow the lines of this theory. Let us first assume that $u$ is regular, multiply (2.31) by $v \in C^1(\overline{\Omega})$, and integrate by parts. Recall that, by the divergence theorem,
$$\int_\Omega \operatorname{div}(v\,Du)\,dx = \int_{\partial\Omega} v\,Du\cdot n\,d\sigma.$$
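Thus
$$\int_\Omega (v\,\Delta u + Du\cdot Dv)\,dx = \int_{\partial\Omega} v\,\frac{\partial u}{\partial n}\,d\sigma. \tag{2.32}$$
By using (2.31) and (2.32) we obtain
$$\int_\Omega (uv + Du\cdot Dv)\,dx = \int_\Omega f v\,dx \quad \forall v \in C^1(\overline{\Omega}). \tag{2.33}$$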
Now (2.33) makes sense even for a function $u$ for which we are only able to define first generalized derivatives as functions. So, as a first step, we take (2.33) as a notion of weak solution. Precisely, given $f \in L^2(\Omega)$, a weak solution of the Neumann problem is a function $u \in H^1(\Omega)$ such that
$$\int_\Omega (uv + Du\cdot Dv)\,dx = \int_\Omega f v\,dx \quad \forall v \in C^1(\overline{\Omega}). \tag{2.34}$$
The striking feature is that in this weak formulation (2.34) the Neumann boundary condition has disappeared! It is important to verify that, in doing so, we have not lost any information. In other words, we need to show that, conversely, if $u \in H^1(\Omega)$ verifies (2.34), then $u$ satisfies (2.31). The second condition in (2.31) is the so-called Neumann boundary condition. First take $v \in \mathcal{D}(\Omega)$. Clearly $\mathcal{D}(\Omega)$ is a subspace of $C^1(\overline{\Omega})$, and so we obtain
$$-\Delta u + u = f \quad\text{in } \mathcal{D}'(\Omega). \tag{2.35}$$
To recover the Neumann boundary condition, we have to perform the integration by parts in the reverse direction. To do so, we assume that we have been able to prove that the weak solution $u$ of (2.34) is in fact a regular function. Then, by using (2.32) and (2.34),
$$\int_\Omega (-\Delta u + u)\,v\,dx + \int_{\partial\Omega} v\,\frac{\partial u}{\partial n}\,d\sigma = \int_\Omega f v\,dx \quad \forall v \in C^1(\overline{\Omega}). \tag{2.36}$$
By (2.35), $-\Delta u + u = f$, so we can simplify (2.36) to obtain
$$\int_{\partial\Omega}\frac{\partial u}{\partial n}\,v\,d\sigma = 0 \quad \forall v \in C^1(\overline{\Omega}),$$
which implies $\frac{\partial u}{\partial n} = 0$ on $\partial\Omega$. Thus, the Neumann boundary condition is implicitly contained in the weak variational formulation (2.34). Indeed, just as for the Dirichlet problem, we will prove a density result, namely, "$C^1(\overline{\Omega})$ is dense in $H^1(\Omega)$." As a consequence, the equality (2.34) can be extended to all $v \in H^1(\Omega)$, and the final variational formulation of the Neumann problem will be the following.
Definition 2.3.2. A weak solution of the Neumann problem is a solution $u$ of the following system:
$$\begin{cases}\displaystyle\int_\Omega (uv + Du\cdot Dv)\,dx = \int_\Omega f v\,dx \quad \forall v \in H^1(\Omega),\\ u \in H^1(\Omega).\end{cases} \tag{2.37}$$
Note again that the above problem can be written as: find $u \in V = H^1(\Omega)$ such that $a(u,v) = L(v)$ for all $v \in V$,
where $a(u,v) = \int_\Omega (uv + Du\cdot Dv)\,dx$ is a bilinear form, symmetric, and positive and $L(v) = \int_\Omega f v\,dx$ is a linear form on $V$.
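The fact that the Neumann condition is natural (never imposed, yet recovered) can be observed on a one-dimensional discretization of (2.37). The Python sketch below is an illustration on $\Omega = (0,1)$. It assumes P1 hat functions on a uniform mesh that includes both endpoints, so that the discrete space is a subspace of $H^1(0,1)$ with no boundary constraint. The load $L(\varphi_i)$ is approximated by interpolating $f$, which is our choice. With $f = (\pi^2+1)\cos(\pi x)$, the exact solution of (2.31) is $u(x) = \cos(\pi x)$, which satisfies $u'(0) = u'(1) = 0$. The discrete solution approaches it although no boundary condition appears anywhere in the code.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: lower[i] = A[i+1][i], upper[i] = A[i][i+1] (no pivoting)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    d[0] = rhs[0] / diag[0]
    if n > 1:
        c[0] = upper[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def neumann_p1(f, m):
    """P1 Galerkin solution of: u in H^1(0,1), int (u v + u' v') = int f v for all v.
    m cells, m + 1 nodes (endpoints included: no boundary condition is imposed)."""
    h = 1.0 / m
    n = m + 1
    nodes = [i * h for i in range(n)]
    mass_diag = [2.0 * h / 3.0] * n   # int phi_i^2
    stiff_diag = [2.0 / h] * n        # int (phi_i')^2
    mass_diag[0] = mass_diag[-1] = h / 3.0  # half hats at the endpoints
    stiff_diag[0] = stiff_diag[-1] = 1.0 / h
    diag = [s + w for s, w in zip(stiff_diag, mass_diag)]
    off = [-1.0 / h + h / 6.0] * (n - 1)  # a(phi_i, phi_{i+1})
    fv = [f(x) for x in nodes]
    # L(phi_i) approximated by (mass matrix) applied to the nodal values of f
    rhs = [mass_diag[i] * fv[i] for i in range(n)]
    for i in range(n - 1):
        rhs[i] += h / 6.0 * fv[i + 1]
        rhs[i + 1] += h / 6.0 * fv[i]
    return nodes, solve_tridiagonal(off, diag, off, rhs)

f = lambda x: (math.pi ** 2 + 1.0) * math.cos(math.pi * x)
nodes, u = neumann_p1(f, 200)
print(max(abs(ui - math.cos(math.pi * x)) for x, ui in zip(nodes, u)))
```

With the constant load $f = 1$, the same routine returns $u \equiv 1$ up to rounding. This solution of (2.31) does not vanish on the boundary, so it lies outside $H^1_0(0,1)$.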
The basic difference between the weak variational formulations of the Dirichlet and Neumann problems lies in the choice of the space $V$, which reflects the choice of the test functions: $V = H^1_0(\Omega)$ in the Dirichlet problem and $V = H^1(\Omega)$ in the Neumann problem.

The Stokes system. Given $f = (f_1, f_2, \dots, f_N) \in L^2(\Omega)^N$ and $\mu > 0$, we are looking for the velocity vector field $u = (u_1, u_2, \dots, u_N)$ and the pressure $p : \Omega \to \mathbb{R}$ of the fluid which satisfy
$$-\mu\Delta u_i + \frac{\partial p}{\partial x_i} = f_i \quad\text{on } \Omega,\ i = 1,\dots,N, \tag{2.38}$$
$$\operatorname{div} u = 0 \quad\text{on } \Omega, \tag{2.39}$$
$$u_i = 0 \quad\text{on } \partial\Omega,\ i = 1,\dots,N. \tag{2.40}$$
The condition $\operatorname{div} u = \sum_{i=1}^N \frac{\partial u_i}{\partial x_i} = 0$ expresses that the fluid is incompressible. The choice of the test functions is not as immediate as in the two previous situations. A guideline is to choose test functions that are smooth enough to perform the integration by parts and that look like the function or vector field we want to test. A clever choice (J. Leray developed this method) is to take test fields $v \in \mathcal{V}$, where
$$\mathcal{V} = \{v = (v_1,\dots,v_N) : v_i \in \mathcal{D}(\Omega),\ i = 1,\dots,N,\ \text{and } \operatorname{div} v = 0\}.$$
One may as well require the $v_i$ to be $C^1$ functions with compact support. The important point is to assume that the divergence of $v$ is equal to zero. Let us interpret (2.38) in the sense of distributions. If we expect to find $u_i$ with first partial derivatives in $L^2(\Omega)$ (i.e., $u_i \in H^1(\Omega)$) and $p \in L^2(\Omega)$, this is equivalent to writing, for each $i = 1,2,\dots,N$,
$$\mu\int_\Omega Du_i\cdot Dv_i\,dx - \int_\Omega p\,\frac{\partial v_i}{\partial x_i}\,dx = \int_\Omega f_i v_i\,dx \quad \forall v_i \in \mathcal{D}(\Omega). \tag{2.41}$$
The trick is now to add these equalities ($i = 1,2,\dots,N$). Since the test functions $v_1,\dots,v_N$ verify, by definition of $\mathcal{V}$, $\sum_{i=1}^N \frac{\partial v_i}{\partial x_i} = 0$, we obtain
$$\mu\sum_{i=1}^{N}\int_\Omega Du_i\cdot Dv_i\,dx = \sum_{i=1}^{N}\int_\Omega f_i v_i\,dx \quad \forall v \in \mathcal{V}. \tag{2.42}$$
Conversely, it is easy to verify that if $u$ is regular and satisfies (2.42), then
$$\sum_{i=1}^{N}\int_\Omega (-\mu\Delta u_i - f_i)\,v_i\,dx = 0 \quad \forall v \in \mathcal{V}.$$
In other words, the vector $(\mu\Delta u_i + f_i)_{i=1,\dots,N}$ is orthogonal to $\mathcal{V}$ in $L^2(\Omega)^N$. One can prove (this is quite an involved result; see chapter 6) that this property implies the existence of $p \in L^2(\Omega)$ such that
$$\mu\Delta u_i + f_i = \frac{\partial p}{\partial x_i}, \quad i = 1,2,\dots,N.$$
As in the previous examples, the equality (2.42) can be extended by a density and continuity argument to $V = \{v \in H^1_0(\Omega)^N : \operatorname{div} v = 0\}$. Finally, the variational formulation of the Stokes system is given below.
Definition 2.3.3. A weak solution of the Stokes system is a solution $u = (u_1, u_2, \dots, u_N)$ of the system
$$\begin{cases}\mu\displaystyle\sum_{i=1}^N\int_\Omega Du_i\cdot Dv_i\,dx = \sum_{i=1}^N\int_\Omega f_i v_i\,dx \quad \forall v \in V,\\ u \in V,\end{cases} \tag{2.43}$$
where $V = \{v \in H^1_0(\Omega)^N : \operatorname{div} v = 0\}$. The choice of the functional space $V$ (which is obtained by completion of the space $\mathcal{V}$ of test functions) is of fundamental importance. The pressure $p$ has apparently disappeared in this formulation. It is in fact implicitly contained in it, since $p$ can be interpreted as a Lagrange multiplier of the constraint $\operatorname{div} v = 0$. Notice that, once more, the weak formulation we have obtained can be written in the following form: find $u \in V$ such that $a(u,v) = L(v)$ for all $v \in V$, where $a(u,v) = \mu\sum_{i=1}^N\int_\Omega Du_i\cdot Dv_i\,dx$ and $L(v) = \sum_{i=1}^N\int_\Omega f_i v_i\,dx$ are, respectively, a bilinear form and a linear form on $V$.
2.3.2 Positive quadratic forms and convex minimization

The weak formulations of the model examples studied in the previous section have very similar structures. Indeed, they can be viewed as particular cases of the following abstract problem. Given $V$ a linear vector space, $a : V \times V \to \mathbb{R}$ a bilinear form, and $L : V \to \mathbb{R}$ a linear form, find $u \in V$ such that
$$a(u,v) = L(v) \quad \forall v \in V. \tag{2.44}$$
In chapter 3, we will study in detail the existence of solutions to such problems. This will require some topological assumptions on the data V , a, L. For the moment, we will examine algebraic properties of such problems and make the link, when a(·, ·) is symmetric and positive, with convex minimization problems. Let us first make precise these notions concerning bilinear and quadratic forms.
Definition 2.3.4. Let $V$ be a linear vector space and $a : V \times V \to \mathbb{R}$ a bilinear form, i.e.,
$$\forall u \in V,\ v \mapsto a(u,v) \text{ is a linear form}; \qquad \forall v \in V,\ u \mapsto a(u,v) \text{ is a linear form}.$$
The bilinear form is said to be symmetric if
$$\forall u, v \in V \quad a(u,v) = a(v,u).$$
When $a$ is symmetric, one can associate to $a(\cdot,\cdot)$ the quadratic form $q : V \to \mathbb{R}$ defined by $q(v) = a(v,v)$. The bilinear form $a$ is said to be positive (one can equally say that the associated quadratic form $q$ is positive) if
$$\forall v \in V \quad a(v,v) \ge 0.$$
We say that $a(\cdot,\cdot)$ (or $q(\cdot)$) is positive definite if
$$\forall v \in V \quad a(v,v) \ge 0, \quad\text{and}\quad a(v,v) = 0 \Rightarrow v = 0.$$
We can now make the link between problem (2.44) and a minimization problem. All the notions used in the following statement are algebraic.
Proposition 2.3.1. Let $V$ be a linear vector space, $L : V \to \mathbb{R}$ a linear form, and $a : V \times V \to \mathbb{R}$ a bilinear, symmetric, positive form. Then the following two statements are equivalent:
(i) $u \in V$, $a(u,v) = L(v)$ for all $v \in V$;
(ii) $u \in V$, $J(u) \le J(v)$ for all $v \in V$, where
$$J(v) := \frac12 a(v,v) - L(v).$$
Proof. Let us first prove (i) $\Rightarrow$ (ii). Since $V$ is a linear space, it is equivalent to prove that
$$J(u) \le J(u+v) \quad \forall v \in V.$$
A simple computation gives
$$J(u+v) - J(u) = \frac12 a(u+v,u+v) - L(u+v) - \left(\frac12 a(u,u) - L(u)\right).$$
Note that, because of the symmetry assumption on the bilinear form $a(\cdot,\cdot)$,
$$a(u+v,u+v) = a(u,u) + 2a(u,v) + a(v,v).$$
Thus
$$\begin{aligned} J(u+v) - J(u) &= \frac12 a(u,u) + a(u,v) + \frac12 a(v,v) - \frac12 a(u,u) - [L(u) + L(v)] + L(u)\\ &= [a(u,v) - L(v)] + \frac12 a(v,v). \end{aligned}$$
Since, by assumption, $u$ is a solution of (i), $a(u,v) = L(v)$ and
$$J(u+v) - J(u) = \frac12 a(v,v),$$
which is nonnegative since $a$ has been assumed to be positive.
Let us now prove (ii) $\Rightarrow$ (i). We know that $u$ is a solution of the minimization problem, i.e., $u$ minimizes $J(\cdot)$. One is naturally tempted to write an optimality condition expressing that some derivative of $J$ at $u$ is equal to zero. Since $V$ was only assumed to be a linear vector space, the only notion of derivative we can use is the directional derivative, which always makes sense since it relies only on the topological structure of the real line. Since $u$ minimizes $J$, for any $t \in \mathbb{R}$ and any $v \in V$, $J(u+tv) - J(u) \ge 0$. Dividing by $t > 0$, we have
$$\frac1t[J(u+tv) - J(u)] \ge 0.$$
Before letting $t$ go to zero, let us compute this last expression:
$$\begin{aligned} \frac1t[J(u+tv) - J(u)] &= \frac1t\left[\frac12 a(u+tv,u+tv) - L(u+tv) - \frac12 a(u,u) + L(u)\right]\\ &= \frac1t\left[t\,a(u,v) + \frac{t^2}{2}a(v,v) - t\,L(v)\right]\\ &= a(u,v) + \frac t2 a(v,v) - L(v). \end{aligned}$$
Thus, by letting $t$ go to zero, we obtain
$$\lim_{t\to0^+}\frac1t[J(u+tv) - J(u)] = a(u,v) - L(v) \ge 0.$$
Then, one can either repeat the argument with $t < 0$ or replace $v$ by $-v$ in the above inequality to obtain the opposite inequality, and conclude that
$$a(u,v) = L(v) \quad \forall v \in V.$$
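In finite dimension, the equivalence is easy to test. The sketch below assumes $V = \mathbb{R}^2$, $a(u,v) = u^\top A v$ with a symmetric positive definite matrix $A$, and $L(v) = b\cdot v$; the data are illustrative values of our choosing. It solves (i), that is $Au = b$, and checks on random directions that $J(u+v) - J(u) = \frac12 a(v,v) \ge 0$. This is exactly the identity used in the proof of (i) $\Rightarrow$ (ii).

```python
import random

A = [[2.0, 1.0], [1.0, 3.0]]  # symmetric positive definite
b = [1.0, -2.0]

def a(u, v):
    """Bilinear form a(u, v) = u^T A v."""
    return sum(A[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

def J(v):
    """Energy J(v) = a(v, v)/2 - L(v) with L(v) = b . v."""
    return 0.5 * a(v, v) - (b[0] * v[0] + b[1] * v[1])

def solve_variational_equation():
    """Solve (i): a(u, v) = L(v) for all v, i.e. A u = b (Cramer's rule)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

u = solve_variational_equation()
rng = random.Random(0)
min_gap, max_identity_err = float("inf"), 0.0
for _ in range(1000):
    v = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
    gap = J([u[0] + v[0], u[1] + v[1]]) - J(u)
    min_gap = min(min_gap, gap)
    max_identity_err = max(max_identity_err, abs(gap - 0.5 * a(v, v)))
print(u, min_gap, max_identity_err)  # u = [1.0, -1.0]
```

Random sampling only illustrates (ii); it does not prove it. The proof is the algebraic identity, which the last column confirms up to rounding.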
Let us return to the model examples studied in Section 2.3.1 and use their weak formulations together with Proposition 2.3.1 to obtain the results below.
Corollary 2.3.1. With the notation of Section 2.3.1, the following facts hold.
(a) The weak solution $u$ of the Dirichlet problem
$$\begin{cases} -\Delta u = f & \text{on } \Omega,\\ u = 0 & \text{on } \partial\Omega\end{cases}$$
is a solution of the minimization problem
$$J(u) \le J(v) \quad \forall v \in H^1_0(\Omega), \qquad u \in H^1_0(\Omega),$$
where $J(v) := \frac12\int_\Omega |Dv|^2\,dx - \int_\Omega f v\,dx$. This is the Dirichlet variational principle.
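(b) The weak solution $u$ of the Neumann problem
$$\begin{cases} -\Delta u + u = f & \text{on } \Omega,\\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega\end{cases}$$
is a solution of the minimization problem
$$J(u) \le J(v) \quad \forall v \in H^1(\Omega), \qquad u \in H^1(\Omega),$$
where $J(v) := \frac12\int_\Omega (|Dv|^2 + v^2)\,dx - \int_\Omega f v\,dx$.
(c) The weak solution $u$ of the Stokes system
$$\begin{cases} -\mu\Delta u_i + \dfrac{\partial p}{\partial x_i} = f_i,\ \ i = 1,2,\dots,N & \text{on } \Omega,\\ \operatorname{div} u = 0 & \text{on } \Omega,\\ u = 0 & \text{on } \partial\Omega\end{cases}$$
is a solution of the minimization problem
$$J(u) \le J(v) \quad \forall v \in V = \{v \in H^1_0(\Omega)^N : \operatorname{div} v = 0\}, \qquad u \in V,$$
where $J(v) = \frac{\mu}{2}\sum_{i=1}^N\int_\Omega |Dv_i|^2\,dx - \sum_{i=1}^N\int_\Omega f_i v_i\,dx$.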
Note that in all these examples, the bilinear form $a(\cdot,\cdot)$ is symmetric and positive. We now come to the question of the nature of the minimization problem and the properties of the functional $J$ given by
$$J(v) = \frac12 a(v,v) - L(v).$$
Note that $J$ is the sum of a quadratic form and of a linear form. We are ready to introduce a property of fundamental importance in the study of minimization problems: convexity. Recall that a function $J : V \to \mathbb{R}$, where $V$ is a linear vector space, is convex if
$$\forall u, v \in V\ \ \forall \lambda \in [0,1] \quad J(\lambda u + (1-\lambda)v) \le \lambda J(u) + (1-\lambda)J(v).$$
The role of convexity in minimization problems will be examined in detail in chapters 3, 9, 13, and 15. The class of convex functionals is stable under sums and contains the linear forms, and we are going to see that it contains the positive quadratic forms. Thus, functionals of the form $J(v) = \frac12 a(v,v) - L(v)$, with $a$ and $L$ as above, will be convex. Let us now formulate the convexity property for positive quadratic forms.
Proposition 2.3.2. Let $V$ be a linear vector space and $a : V \times V \to \mathbb{R}$ a bilinear form which is symmetric and positive. Then the quadratic form $q : V \to \mathbb{R}$ associated with $a$, i.e., $q(v) = a(v,v)$, is a convex function.
Proof. This is just an algebraic computation. For any $u, v \in V$ and $\lambda \in [0,1]$,
$$q(\lambda u + (1-\lambda)v) = a(\lambda u + (1-\lambda)v, \lambda u + (1-\lambda)v) = \lambda^2 a(u,u) + (1-\lambda)^2 a(v,v) + 2\lambda(1-\lambda)a(u,v).$$
Thus,
$$\begin{aligned} \lambda q(u) + (1-\lambda)q(v) - q(\lambda u + (1-\lambda)v) &= (\lambda - \lambda^2)a(u,u) - 2\lambda(1-\lambda)a(u,v) + [(1-\lambda) - (1-\lambda)^2]a(v,v)\\ &= \lambda(1-\lambda)[a(u,u) - 2a(u,v) + a(v,v)]\\ &= \lambda(1-\lambda)\,a(u-v,u-v), \end{aligned}$$
which is nonnegative because $\lambda \in [0,1]$ and $a$ is positive.
When examining the question of the uniqueness of the solution of the previous problems, the notion that plays a central role is strict convexity. Recall that $J : V \to \mathbb{R}$ is strictly convex if $J$ is convex and the convexity inequality
$$J(\lambda u + (1-\lambda)v) < \lambda J(u) + (1-\lambda)J(v)$$
is strict whenever $u \neq v$ and $\lambda \in\, ]0,1[$. The importance of this notion is justified by the following elementary result.
Proposition 2.3.3. Let $V$ be a linear space and $J : V \to \mathbb{R}$ a strictly convex function. Then there exists at most one solution $u$ to the minimization problem
$$J(u) \le J(v) \quad \forall v \in V, \qquad u \in V.$$
Proof. Suppose that we have two distinct solutions $u_1$ and $u_2$ to the above minimization problem. Then
$$J\left(\frac{u_1 + u_2}{2}\right) < \frac12[J(u_1) + J(u_2)] = \inf_{v\in V} J(v),$$
a clear contradiction. Hence $u_1 = u_2$.
Proposition 2.3.4. Let $V$ be a linear vector space and $a : V \times V \to \mathbb{R}$ a bilinear form which is symmetric and positive definite. Then the quadratic form $q : V \to \mathbb{R}$ associated with $a$, i.e., $q(v) = a(v,v)$, is strictly convex.
Proof. The proof is the same computation as in the proof of Proposition 2.3.2: for any $u, v \in V$ and any $\lambda \in [0,1]$,
$$\lambda q(u) + (1-\lambda)q(v) - q(\lambda u + (1-\lambda)v) = \lambda(1-\lambda)\,a(u-v,u-v).$$
When $\lambda \in\, ]0,1[$ we have $\lambda(1-\lambda) > 0$, and when $u \neq v$ we have $a(u-v,u-v) > 0$ because $a$ is positive definite. So, for $\lambda \in\, ]0,1[$ and $u \neq v$, $\lambda q(u) + (1-\lambda)q(v) > q(\lambda u + (1-\lambda)v)$, and $q$ is strictly convex.
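The identity $\lambda q(u) + (1-\lambda)q(v) - q(\lambda u + (1-\lambda)v) = \lambda(1-\lambda)\,a(u-v,u-v)$ is purely algebraic, so it can be checked directly. In the sketch below, $a$ is given by a symmetric matrix on $\mathbb{R}^2$; the matrices and vectors are our illustrative choices. For a positive definite matrix, the gap is strictly positive when $u \ne v$ and $\lambda \in\,]0,1[$. For the positive but not definite matrix $\begin{pmatrix}1&1\\1&1\end{pmatrix}$, i.e. $q(v) = (v_1+v_2)^2$, the gap vanishes when $u - v = (1,-1)$ lies in the kernel. So this $q$ is convex but not strictly convex.

```python
def q(M, v):
    """Quadratic form q(v) = a(v, v) = v^T M v on R^2."""
    return sum(M[i][j] * v[i] * v[j] for i in range(2) for j in range(2))

def convexity_gap(M, u, v, lam):
    """lam q(u) + (1 - lam) q(v) - q(lam u + (1 - lam) v)."""
    w = [lam * ui + (1.0 - lam) * vi for ui, vi in zip(u, v)]
    return lam * q(M, u) + (1.0 - lam) * q(M, v) - q(M, w)

definite = [[2.0, 1.0], [1.0, 3.0]]
semidefinite = [[1.0, 1.0], [1.0, 1.0]]
u, v, lam = [1.0, 0.0], [0.0, 1.0], 0.3
diff = [u[0] - v[0], u[1] - v[1]]  # (1, -1), in the kernel of `semidefinite`
for M in (definite, semidefinite):
    # both numbers should agree: the gap equals lam (1 - lam) a(u - v, u - v)
    print(convexity_gap(M, u, v, lam), lam * (1.0 - lam) * q(M, diff))
```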
Proposition 2.3.5. The sum of a convex function and of a strictly convex function is strictly convex. Proof. The proof is a direct consequence of the fact that when adding an inequality and a strict inequality, one obtains a strict inequality. Let us return to the model examples and their variational formulations as minimization problems as given in Corollary 2.3.1. Noticing that in all these situations, the quadratic form q(v) = a(v, v) is positive definite, we obtain that the corresponding functional J is strictly convex. So, the weak solution of the problems under consideration, when it exists, is characterized as the unique solution of the associated minimization problems. This makes a natural transition to chapter 3, where the existence question will be examined.
2.4 Weak topologies and weak convergences

In recent decades, weak topologies have proved useful as a basic tool in variational analysis, in the study of PDEs, and more generally in all fields using tools from functional analysis. Let us explain some of the reasons for the success of weak convergence methods.
(a) Distributions are defined as continuous linear forms on $\mathcal{D}(\Omega)$. In other words, a distribution $T \in \mathcal{D}'(\Omega)$ is viewed via its action on test functions $v \in \mathcal{D}(\Omega)$:
$$T : v \in \mathcal{D}(\Omega) \longmapsto \langle T, v\rangle_{(\mathcal{D}'(\Omega),\mathcal{D}(\Omega))}.$$
Given a sequence $T_1, T_2, \dots, T_n, \dots$ of distributions (for example, functions or measures), a natural mode of convergence for such sequences is to require that
$$\forall v \in \mathcal{D}(\Omega) \quad \lim_{n\to\infty}\langle T_n, v\rangle_{(\mathcal{D}',\mathcal{D})} = \langle T, v\rangle.$$
This is a typical example of weak convergence.
(b) A celebrated theorem of Riesz asserts that the closed unit ball of a normed linear space is compact iff the space is finite dimensional. Thus, when looking for topologies that make bounded sets relatively compact in infinite dimensional spaces, one is naturally led to introduce new topologies that are weaker than the norm topology. This is why weak topologies play a decisive role.
(c) Besides the importance of weak topologies from a theoretical point of view, we will see that weak convergences naturally occur when describing concrete situations. For example, weak convergences allow us to describe high oscillations of a sequence of functions, as well as concentration phenomena on sets of zero Lebesgue measure.
Before introducing weak topologies on normed linear spaces, let us recall some basic facts from general topology.
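The oscillation phenomenon mentioned in (c) can be illustrated by the classical sequence $v_n(x) = \sin(2\pi n x)$ on $(0,1)$. The numerical sketch below uses midpoint quadrature and, as the fixed function to pair against, $g(x) = x$; both are our choices. It shows that $\int_0^1 v_n\,g\,dx = -\frac{1}{2\pi n} \to 0$, while $\|v_n\|_{L^2(0,1)}^2 = \frac12$ for every $n \ge 1$. The sequence tends to $0$ weakly but not in norm.

```python
import math

def integrate(h_fun, n_cells=20000):
    """Composite midpoint rule on [0, 1]."""
    dx = 1.0 / n_cells
    return dx * sum(h_fun((k + 0.5) * dx) for k in range(n_cells))

def pairing(n):
    """int_0^1 sin(2 pi n x) * x dx; the exact value is -1/(2 pi n)."""
    return integrate(lambda x: math.sin(2.0 * math.pi * n * x) * x)

def norm_sq(n):
    """||sin(2 pi n .)||^2 in L^2(0, 1); the exact value is 1/2."""
    return integrate(lambda x: math.sin(2.0 * math.pi * n * x) ** 2)

for n in (1, 10, 50):
    print(n, pairing(n), norm_sq(n))
```

A single test function only illustrates the phenomenon. Weak convergence in $L^2(0,1)$ requires the pairing to vanish in the limit for every $g \in L^2(0,1)$, which here follows from the Riemann–Lebesgue lemma.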
2.4.1 Topologies induced by functions in general topological spaces
First we need to fix the notation. Recall that a topology on a space $X$ is a family $\theta$ of subsets of $X$, called the family of the open sets of $X$, satisfying the axioms of the open sets, namely,
“abmb 2005/1 page 4 i
Chapter 2. Weak solution methods in variational analysis
(i) $X$ and $\emptyset$ belong to $\theta$;
(ii) for every family $(G_i)_{i\in I}$ with $G_i \in \theta$, $I$ arbitrary, $\bigcup_{i\in I} G_i \in \theta$;
(iii) for every family $(G_i)_{i\in I}$ with $G_i \in \theta$, $I$ finite, $\bigcap_{i\in I} G_i \in \theta$.
In other words, the open sets of $X$ for a given topology form a family of subsets of $X$ which is stable under arbitrary unions and finite intersections. We will often denote by $\tau$ a topology on a space $X$ and by $\theta_\tau$ the family of the $\tau$-open sets. A topology can thus be seen as a subset of $\mathcal{P}(X)$, the family of all subsets of $X$. There is a natural partial ordering on the topologies on a given space $X$, induced by the inclusion ordering on the subsets of $\mathcal{P}(X)$: we say that a topology $\tau_1$ is coarser or weaker than a topology $\tau_2$, and we write $\tau_1 < \tau_2$, if $\theta_{\tau_1} \subset \theta_{\tau_2}$, that is, if every element $G$ of $\theta_{\tau_1}$ also belongs to $\theta_{\tau_2}$. In that case we also say that $\tau_2$ is stronger or finer than $\tau_1$.
Proposition 2.4.1. The family of the topologies on a set $X$ forms a complete lattice for the relation $\tau_1 < \tau_2$ ($\tau_1$ weaker than $\tau_2$); that is, given an arbitrary collection of topologies $(\tau_i)_{i\in I}$ on $X$:
(a) There exists a lower bound, that is, a topology which is the largest among all the topologies weaker than every $\tau_i$, $i \in I$. We denote by $\tau = \bigwedge_{i\in I} \tau_i$ this lower bound (or infimum) of the $\tau_i$. Clearly $\theta_\tau = \bigcap_{i\in I} \theta_{\tau_i}$, that is, $G \in \theta_\tau$ iff $G$ belongs to $\theta_{\tau_i}$ for all $i \in I$.
(b) There exists an upper bound (or supremum), that is, a topology which is the smallest among the topologies stronger than all the $\tau_i$. We denote by $\tau = \bigvee_{i\in I} \tau_i$ this upper bound of the $\tau_i$. The family $\theta_\tau$ is generated by $\bigcup_{i\in I} \theta_{\tau_i}$ in the sense of Proposition 2.4.2.
Proof. (a) Clearly, if $(\theta_{\tau_i})_{i\in I}$ is a family of topologies on $X$, then $\bigcap_{i\in I} \theta_{\tau_i} = \{G \in \mathcal{P}(X) : G \in \theta_{\tau_i}\ \forall i \in I\}$ still satisfies the axioms of the open sets, so it is a topology. The topology $\tau$ attached to the family $\theta = \bigcap_{i\in I} \theta_{\tau_i}$ is weaker than all the topologies $\tau_i$, $i \in I$, and clearly it is the largest among the weaker ones.
(b) In contrast with the previous case, if a topology $\tau$ is stronger than all the topologies $\tau_i$, then $\theta_\tau$ must contain all the families $\theta_{\tau_i}$, that is, $\theta_\tau \supset \bigcup_{i\in I} \theta_{\tau_i}$. But now $\mathcal{A} = \bigcup_{i\in I} \theta_{\tau_i}$ does not, in general, satisfy the axioms of the open sets. So one is naturally led to the following question: given a class $\mathcal{A}$ of subsets of an abstract space $X$, does there exist a smallest topology $\theta_\tau$ on $X$ containing $\mathcal{A}$? By (a), the answer is yes: one takes for $\theta_\tau$ the intersection (equivalently, the infimum) of all the topologies containing $\mathcal{A} = \bigcup_{i\in I} \theta_{\tau_i}$; this collection of topologies is nonempty, since $\mathcal{P}(X)$ itself is a topology containing $\mathcal{A}$. This is made precise in Proposition 2.4.2.
Proposition 2.4.2. Let $X$ be an abstract space and let $\mathcal{A}$ be any class of subsets of $X$. Then there exists a smallest (weakest) topology on $X$ containing $\mathcal{A}$, denoted by $\tau_{\mathcal{A}}$ and called the topology generated by $\mathcal{A}$. It is equal to the intersection of all the topologies containing $\mathcal{A}$. It can be obtained via the following two-step procedure:
1. First, take the finite intersections of elements of $\mathcal{A}$. One thus obtains a family of sets which we call $\mathcal{B}_{\mathcal{A}}$.
2. Then $\mathcal{B}_{\mathcal{A}}$ is a base for the topology $\tau_{\mathcal{A}}$ generated by $\mathcal{A}$, that is, every member of $\tau_{\mathcal{A}}$ can be obtained as the union of a family of members of $\mathcal{B}_{\mathcal{A}}$.
We stress that in this construction one first takes finite intersections of elements of $\mathcal{A}$, then arbitrary unions of the sets so obtained. When the two operations are performed in the reverse order, the resulting family is in general not stable under unions. For a proof, see any book on general topology (for instance, Bourbaki [79]).
We now come to the situation which is of interest when considering weak topologies. Suppose $X$ is an abstract space and $(Y_i, \tau_i)_{i\in I}$ is a family of topological spaces. Suppose that for each $i \in I$ a function $f_i : X \to Y_i$ is given. We want to investigate the topologies on $X$ for which all the functions $f_i$ are continuous and, among these topologies, to examine the existence of a smallest (weakest) one. For each $i \in I$, the family $f_i^{-1}(\theta_{\tau_i}) := \{f_i^{-1}(G_i) : G_i \in \theta_{\tau_i}\}$ still satisfies the axioms of the open sets; we denote by $f_i^{-1}(\tau_i)$ the corresponding topology on $X$, which is the weakest topology making $f_i$ (for $i$ fixed) continuous. It follows from Propositions 2.4.1 and 2.4.2 that the answer to the previous question is given by $\tau = \bigvee_{i\in I} f_i^{-1}(\tau_i)$, whose precise description is given in the following theorem.
Theorem 2.4.1. Let $X$ be an abstract space, let $(Y_i, \tau_i)_{i\in I}$ be an arbitrary collection of topological spaces, and for each $i \in I$ let $f_i : X \to Y_i$ be a given function. Then there exists a weakest topology $\tau$ on $X$ making all the functions $(f_i)_{i\in I}$ continuous,
$$f_i : (X, \tau) \longrightarrow (Y_i, \tau_i), \qquad i \in I.$$
This topology $\tau$ is equal to $\bigvee_{i\in I} f_i^{-1}(\tau_i)$, that is, $\tau$ is generated by the class $\mathcal{A} = \{f_i^{-1}(G_i) : G_i \in \theta_{\tau_i},\ i \in I\}$.
The class
$$\mathcal{B}_\tau = \Big\{ \bigcap_{i\in J} f_i^{-1}(G_i) : G_i \in \theta_{\tau_i},\ J \subset I,\ J \text{ finite} \Big\}$$
is a base for this topology, that is, each element of $\theta_\tau$ can be written as a union of elements of $\mathcal{B}_\tau$. We say that the topology $\tau$ is induced by the family $(f_i)_{i\in I}$.
The following properties are quite elementary.
Proposition 2.4.3. Let $f_i : X \to (Y_i, \tau_i)$, $i \in I$, be given, and let $\tau = \bigvee_{i\in I} f_i^{-1}(\tau_i)$ be the topology on $X$ induced by the $(f_i)_{i\in I}$. For any sequence $(x_n)_{n\in\mathbb{N}}$ of elements of $X$, the following two conditions are equivalent:
(i) $x_n \xrightarrow{\ \tau\ } x$ as $n \to \infty$;
(ii) $\forall i \in I$, $f_i(x_n) \xrightarrow{\ \tau_i\ } f_i(x)$ as $n \to \infty$.
Proof. Since the topology $\tau$ makes each $f_i$ continuous, clearly (i) $\Rightarrow$ (ii). Conversely, let us assume (ii) and prove (i). When considering a neighborhood of $x$, it is equivalent to take an element of the base $\mathcal{B}_\tau$ which contains $x$. So let $x \in \bigcap_{i\in J} f_i^{-1}(G_i)$, with $G_i \in \theta_{\tau_i}$, $J \subset I$, $J$ finite. For each $i \in J$, $G_i$ is a $\tau_i$-open set containing $f_i(x)$; since $f_i(x_n) \xrightarrow{\tau_i} f_i(x)$, there exists $n_i$ such that $x_n \in f_i^{-1}(G_i)$ for $n \ge n_i$. Take $N = \max_{i\in J} n_i$; since $J$ is finite, $N$ is a finite integer, and $x_n \in \bigcap_{i\in J} f_i^{-1}(G_i)$ for $n \ge N$, which expresses that $x_n \xrightarrow{\tau} x$.
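Propositions 2.4.1 and 2.4.2 and Theorem 2.4.1 can be tried out on finite sets, where every construction is a finite computation. The Python sketch below is only a toy model under that assumption (the helpers `generate_topology`, `induced_topology` and `is_continuous` are hypothetical names, and a topology is stored as a set of `frozenset`s). It builds the topology generated by a family of subsets with the two-step procedure of Proposition 2.4.2 (finite intersections first, then unions) and the topology induced by a family of maps, as in Theorem 2.4.1.

```python
from itertools import combinations

def generate_topology(X, family):
    """Smallest topology on the finite set X containing `family`
    (two-step procedure: finite intersections, then unions)."""
    X = frozenset(X)
    sets = [frozenset(s) for s in family]
    # Step 1: finite intersections; the empty intersection is X itself.
    base = {X}
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            base.add(frozenset.intersection(*combo))
    # Step 2: unions of base elements; the empty union is the empty set.
    topology = {frozenset()}
    for B in base:
        topology |= {U | B for U in topology}
    return topology

def induced_topology(X, maps):
    """Weakest topology on X making each f: X -> (Y, tau_Y) continuous.
    `maps` is a list of pairs (f, tau_Y): f is a dict, tau_Y a set of frozensets."""
    preimages = [frozenset(x for x in X if f[x] in G) for f, tau_Y in maps for G in tau_Y]
    return generate_topology(X, preimages)

def is_continuous(f, X, tau_X, tau_Y):
    """f is continuous iff the preimage of every open set is open."""
    return all(frozenset(x for x in X if f[x] in G) in tau_X for G in tau_Y)

# Usage: Y = {0, 1} with the Sierpinski topology, two maps on X = {1, 2, 3}.
X = {1, 2, 3}
S = {frozenset(), frozenset({1}), frozenset({0, 1})}
f = {1: 1, 2: 0, 3: 0}
g = {1: 0, 2: 1, 3: 0}
tau = induced_topology(X, [(f, S), (g, S)])   # weakest making f and g continuous
tau_f = induced_topology(X, [(f, S)])         # induced by f alone: too coarse for g
```

The topology induced by $f$ alone is $\{\emptyset, \{1\}, X\}$, for which $g$ is not continuous; adding $g$ to the family enlarges it to the weakest topology making both maps continuous.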
A similar argument yields the following result.
Proposition 2.4.4. Let $(Z, T)$ be a topological space and let $g : (Z, T) \to (X, \tau)$ be a given function, where $\tau$ is the topology induced by the family $f_i : X \to (Y_i, \tau_i)$, $i \in I$. Then $g$ is continuous iff $f_i \circ g$ is continuous from $(Z, T)$ into $(Y_i, \tau_i)$ for each $i \in I$.
2.4.2 The weak topology $\sigma(V, V^*)$
We now assume that $X$ is a vector space. To emphasize this, we denote it by $V$ (as in vector) and assume that $V$ is a normed linear space, the norm of $v \in V$ being denoted by $\|v\|_V$, or $\|v\|$ when no confusion is possible. We denote by $V^*$ the topological dual of $V$, that is, the set of all continuous linear forms on $V$. To avoid confusion, generic elements of $V$ and $V^*$ are denoted, respectively, by $v \in V$ and $v^* \in V^*$. We write $\langle v^*, v\rangle = v^*(v)$ for the canonical pairing between $V^*$ and $V$, which is just the evaluation of $v^* \in V^*$ at $v \in V$. Recall that $V^*$ is a normed linear space (indeed, a Banach space) when equipped with the dual norm
$$\|v^*\|_{V^*} = \sup\{|\langle v^*, v\rangle| : \|v\|_V \le 1\}.$$
With this definition, we have
$$\forall v \in V,\ \forall v^* \in V^* \qquad |\langle v^*, v\rangle| \le \|v^*\|_{V^*}\,\|v\|_V,$$
and $\|v^*\|_{V^*}$ is precisely the smallest constant for which the above inequality holds.
Definition 2.4.1. Let $(V, \|\cdot\|)$ be a normed linear space with topological dual $V^*$. The topology $\sigma(V, V^*)$, called the weak topology on $V$, is the weakest topology on $V$ making all the elements of $V^*$ continuous.
Let us first comment on this definition. By definition, each element $v^* \in V^*$ is a function from $V$ into $\mathbb{R}$, that is,
$$v^* : V \longrightarrow \mathbb{R}, \qquad v \longmapsto \langle v^*, v\rangle_{(V^*, V)}.$$
The weak topology on $V$ is defined as the weakest topology on $V$ making all these functions $\{v^* : v^* \in V^*\}$ continuous. By Theorem 2.4.1, such a topology exists, and it is weaker than the norm topology (since, by definition, all the elements $v^*$ of $V^*$ are continuous for the norm topology). We collect below some first results on the topology $\sigma(V, V^*)$ which are direct consequences of its definition.
Proposition 2.4.5. Let $V$ be a normed space and $\sigma(V, V^*)$ the weak topology on $V$.
(i) A local base of neighborhoods of $v_0 \in V$ for $\sigma(V, V^*)$ consists of all sets of the form
$$N(v_0) = \{v \in V : |\langle v_i^*, v - v_0\rangle| < \varepsilon\ \ \forall i \in I\},$$
where $I$ is a finite index set, $v_i^* \in V^*$ for each $i \in I$, and $\varepsilon > 0$.
(ii) $(V, \sigma(V, V^*))$ is a Hausdorff topological space.
(iii) The topology $\sigma(V, V^*)$ is coarser than the norm topology on $V$.
(iv) When $V$ is finite dimensional, the weak topology and the norm topology coincide.
(v) When $V$ is infinite dimensional, the weak topology $\sigma(V, V^*)$ is strictly coarser than the norm topology.
Proof. (i) We have
$$N(v_0) = \bigcap_{i\in I} (v_i^*)^{-1}\big(]\alpha_i - \varepsilon, \alpha_i + \varepsilon[\big),$$
where $\alpha_i = \langle v_i^*, v_0\rangle$. By definition of the weak topology $\sigma(V, V^*)$, $N(v_0)$ is an open set for this topology. Let us prove that such sets form a local base of open neighborhoods of $v_0$ for $\sigma(V, V^*)$. Take an open set $A$ for $\sigma(V, V^*)$ containing $v_0$. By Theorem 2.4.1, there exists an element $B$ of the base of $\sigma(V, V^*)$ such that $v_0 \in B \subset A$, with $B = \bigcap_{i\in I} (v_i^*)^{-1}(G_i)$, $v_i^* \in V^*$, $G_i$ open in $\mathbb{R}$, $I$ finite. Since $v_i^*(v_0) \in G_i$ and $G_i$ is open in $\mathbb{R}$, there exists some $\varepsilon > 0$ such that $|v_i^*(v) - v_i^*(v_0)| < \varepsilon$ for all $i \in I$ implies $v_i^*(v) \in G_i$ for all $i \in I$. Hence $v_0 \in N(v_0) \subset B \subset A$ with
$$N(v_0) = \bigcap_{i\in I} (v_i^*)^{-1}\big(]\alpha_i - \varepsilon, \alpha_i + \varepsilon[\big), \qquad \alpha_i = \langle v_i^*, v_0\rangle.$$
(ii) Let us prove that the topology $\sigma(V, V^*)$ is Hausdorff. Take two distinct elements $v_1$ and $v_2$ of $V$; we must find two open sets $A_1$ and $A_2$ for $\sigma(V, V^*)$ such that $v_1 \in A_1$, $v_2 \in A_2$, and $A_1 \cap A_2 = \emptyset$. This is a direct consequence of the Hahn–Banach separation theorem: there exists a closed hyperplane which strictly separates $v_1$ and $v_2$, that is, there exist $v^* \in V^*$ and $\alpha \in \mathbb{R}$ such that $\langle v^*, v_1\rangle < \alpha < \langle v^*, v_2\rangle$. Take
$$A_1 = \{v \in V : \langle v^*, v\rangle < \alpha\}, \qquad A_2 = \{v \in V : \langle v^*, v\rangle > \alpha\}.$$
They are open for the topology $\sigma(V, V^*)$ and separate $v_1$ and $v_2$.
Assertion (iii) is obvious since all the elements $v^*$ of $V^*$ are continuous for the norm topology.
(iv) Since the weak topology $\sigma(V, V^*)$ is coarser than the norm topology, it has fewer open sets. Let us prove that when $V$ is finite dimensional the opposite inclusion holds, that is, every open set $A$ for the norm topology is also open for the weak topology. Take $v_0 \in A$ and $\varepsilon > 0$ with $B(v_0, \varepsilon) \subset A$, where $B(v_0, \varepsilon)$ is an open ball in $(V, \|\cdot\|)$, and let us prove that there exists an open set $U$ for $\sigma(V, V^*)$ such that $v_0 \in U \subset B(v_0, \varepsilon) \subset A$.
Let us choose a base $e_1, \ldots, e_N$ of $V$ with $\|e_i\| = 1$, $i = 1, \ldots, N$. Each element $v$ of $V$ can be uniquely written as $v = \sum_i x_i e_i$, and the mappings $e_i^* : v \mapsto x_i$ are continuous linear forms on $V$ (every linear form on a finite dimensional normed space is continuous), i.e., they are elements of $V^*$. Writing $v_0 = \sum_i x_{0i} e_i$, we have
$$\|v - v_0\| = \Big\|\sum_i (x_i - x_{0i}) e_i\Big\| \le \sum_i |x_i - x_{0i}| = \sum_i |\langle e_i^*, v - v_0\rangle|.$$
Therefore $v \in B(v_0, \varepsilon)$ as soon as
$$v \in U := \bigcap_{i=1,\ldots,N} (e_i^*)^{-1}\Big(\Big]\alpha_i - \frac{\varepsilon}{N}, \alpha_i + \frac{\varepsilon}{N}\Big[\Big), \qquad \alpha_i = \langle e_i^*, v_0\rangle = x_{0i}.$$
Since $U$ is open for $\sigma(V, V^*)$ and contains $v_0$, this concludes the proof of (iv).
(v) There are different ways to prove that in infinite dimensional spaces the weak topology is strictly coarser than the norm topology. One of them consists of proving that the unit sphere $S = \{v \in V : \|v\| = 1\}$, which is closed for the norm topology, is never closed for $\sigma(V, V^*)$ in infinite dimensional spaces. Indeed, $\overline{S}^{\sigma(V,V^*)} = \{v \in V : \|v\| \le 1\}$; see, for instance, [90, Proposition III.6] and the related comments for a detailed proof.
Remark 2.4.1. It is quite convenient to formulate topological properties with the help of sequences. This can be done without loss of generality when the topology under consideration is metrizable. But when $V$ is an infinite dimensional normed space, the weak topology $\sigma(V, V^*)$ is a locally convex topology (the basic operations on $V$, vector addition and multiplication by a scalar, are continuous for $\sigma(V, V^*)$) which is not metrizable. Therefore, it is important to state the properties of this topology with general topological arguments, as we have done up to now. Nevertheless, in most practical situations one can just use weakly convergent sequences. This will follow from deep results like the Eberlein–Smulian compactness theorem, or from simpler observations like the following one: if $V^*$ is separable, the weak topology $\sigma(V, V^*)$ is metrizable on each bounded subset of $V$. Consequently, we now focus on properties of sequences which are $\sigma(V, V^*)$ convergent. Let us start with the following elementary results, which are direct consequences of the definition and of Proposition 2.4.3.
Proposition 2.4.6. Let $V$ be a normed linear space and $\sigma(V, V^*)$ the weak topology on $V$. For any sequence $(v_n)_{n\in\mathbb{N}}$ in $V$ the following properties hold:
(i) $v_n \xrightarrow{\sigma(V,V^*)} v \iff \forall v^* \in V^*\ \ \langle v^*, v_n\rangle \to \langle v^*, v\rangle$;
(ii) $v_n \xrightarrow{\|\cdot\|} v \Rightarrow v_n \xrightarrow{\sigma(V,V^*)} v$;
(iii) $v_n \xrightarrow{\sigma(V,V^*)} v \Rightarrow$ the sequence $(v_n)_{n\in\mathbb{N}}$ is bounded and $\|v\| \le \liminf_n \|v_n\|$;
(iv) $v_n \xrightarrow{\sigma(V,V^*)} v$ and $v_n^* \xrightarrow{\|\cdot\|_*} v^* \Rightarrow \langle v_n^*, v_n\rangle \to \langle v^*, v\rangle$.
Proof. (iii) The fact that the sequence $(v_n)_{n\in\mathbb{N}}$ is bounded is a consequence of the Banach–Steinhaus theorem. Consider the family of linear operators from the Banach space $V^*$ into $\mathbb{R}$
$$T_n : V^* \longrightarrow \mathbb{R}, \qquad v^* \longmapsto \langle v^*, v_n\rangle, \qquad n \in \mathbb{N}.$$
For each $n \in \mathbb{N}$, $T_n$ is a continuous linear operator with norm
$$\|T_n\|_{\mathcal{L}(V^*, \mathbb{R})} = \sup_{\|v^*\|_* \le 1} |\langle v^*, v_n\rangle| = \|v_n\|.$$
(This last equality is a consequence of the Hahn–Banach theorem.) For each $v^* \in V^*$, the sequence $(T_n(v^*))_{n\in\mathbb{N}}$ is bounded in $\mathbb{R}$: indeed, $T_n(v^*) = \langle v^*, v_n\rangle$, which converges by the weak convergence of $(v_n)_{n\in\mathbb{N}}$. By the Banach–Steinhaus theorem, $\sup_{n\in\mathbb{N}} \|T_n\|_{\mathcal{L}(V^*, \mathbb{R})} < +\infty$, which is equivalent to $\sup_{n\in\mathbb{N}} \|v_n\| < +\infty$.
Let us now prove the inequality $\|v\| \le \liminf_n \|v_n\|$. By assumption, for each $v^* \in V^*$,
$$\langle v^*, v\rangle = \lim_{n\to+\infty} \langle v^*, v_n\rangle.$$
By using the inequality $|\langle v^*, v_n\rangle| \le \|v^*\|_*\,\|v_n\|$, we infer
$$\forall v^* \in V^* \qquad |\langle v^*, v\rangle| \le \Big(\liminf_{n\to+\infty} \|v_n\|\Big)\|v^*\|_*.$$
By using the Hahn–Banach theorem, we obtain
$$\|v\| = \sup_{\|v^*\|_* \le 1} |\langle v^*, v\rangle| \le \liminf_n \|v_n\|.$$
(iv) This follows from a splitting argument and the triangle inequality. Write
$$\langle v_n^*, v_n\rangle - \langle v^*, v\rangle = \langle v_n^* - v^*, v_n\rangle + \langle v^*, v_n - v\rangle.$$
Hence
$$|\langle v_n^*, v_n\rangle - \langle v^*, v\rangle| \le \|v_n^* - v^*\|_*\,\|v_n\| + |\langle v^*, v_n - v\rangle|.$$
By (iii), there exists some constant $C \in \mathbb{R}_+$ such that $\|v_n\| \le C$ for all $n \in \mathbb{N}$. So
$$|\langle v_n^*, v_n\rangle - \langle v^*, v\rangle| \le C\,\|v_n^* - v^*\|_* + |\langle v^*, v_n - v\rangle|,$$
and both terms on the right-hand side tend to zero, which proves the result.
Remark 2.4.2. In Section 3.2.3 we will interpret the property
$$v_n \xrightarrow{\sigma(V,V^*)} v \ \Rightarrow\ \|v\| \le \liminf_n \|v_n\|$$
as a lower semicontinuity property of the norm $\|\cdot\|_V$ for the topology $\sigma(V, V^*)$. More generally, this can be viewed as a consequence of the fact that $\|\cdot\|_V$ is convex and continuous on $V$, and hence lower semicontinuous for the topology $\sigma(V, V^*)$.
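The proof of (iii) relies on the identity $\|v\| = \sup_{\|v^*\|_* \le 1} |\langle v^*, v\rangle|$ obtained from the Hahn–Banach theorem. In the concrete case $V = \mathbb{R}^d$ with the $\ell^p$ norm, $1 < p < \infty$ (an example chosen here for illustration), the dual norm is the $\ell^{p'}$ norm and the supremum is attained at the explicit functional $v^*_i = \mathrm{sign}(v_i)\,|v_i|^{p-1}/\|v\|_p^{p-1}$, because $(p-1)p' = p$. The Python sketch below (function names are hypothetical) checks this, and compares it with randomly sampled unit functionals, which by Hölder's inequality can never do better.

```python
import math
import random

def lp_norm(x, p):
    """l^p norm of a vector of R^d, 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def norming_functional(v, p):
    """Functional v* with ||v*||_{p'} = 1 and <v*, v> = ||v||_p (v != 0, 1 < p < infinity)."""
    nv = lp_norm(v, p)
    return [math.copysign(abs(t) ** (p - 1), t) / nv ** (p - 1) for t in v]

p = 3.0
q = p / (p - 1)                       # conjugate exponent p'
v = [1.0, -2.0, 0.5]
v_star = norming_functional(v, p)
pairing = sum(a * b for a, b in zip(v_star, v))

# Hoelder: every w with ||w||_{p'} = 1 satisfies |<w, v>| <= ||v||_p.
rng = random.Random(0)
best_random = 0.0
for _ in range(1000):
    w = [rng.uniform(-1.0, 1.0) for _ in v]
    nw = lp_norm(w, q)
    w = [t / nw for t in w]
    best_random = max(best_random, abs(sum(a * b for a, b in zip(w, v))))
```

The random search only approaches $\|v\|_p$ from below, whereas the explicit functional attains it, which is the finite-dimensional content of the supremum used above.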
Because of its importance, let us say a few words about weak convergence in Hilbert spaces (which we denote by $H$). The Riesz representation theorem tells us that every element of the topological dual space can be represented as
$$H \ni v \longmapsto \langle f, v\rangle,$$
where $f$ is a given element of $H$ and $\langle\cdot,\cdot\rangle$ is the scalar product in $H$. Let us complete this observation with a few elementary results.
Proposition 2.4.7. Let $H$ be a Hilbert space. A sequence $(v_n)_{n\in\mathbb{N}}$ converges weakly to $v$ in $H$ iff
$$\forall z \in H \qquad \langle v_n, z\rangle \xrightarrow[n\to+\infty]{} \langle v, z\rangle.$$
Moreover, we have the following implication:
$$v_n \xrightarrow{\sigma(H,H)} v \ \text{ and }\ \|v_n\| \to \|v\| \ \Rightarrow\ v_n \xrightarrow{\|\cdot\|} v.$$
Proof. We just need to prove the last statement. We have
$$\|v_n - v\|^2 = \|v_n\|^2 + \|v\|^2 - 2\langle v_n, v\rangle.$$
Hence
$$\lim_{n\to+\infty} \|v_n - v\|^2 = \|v\|^2 + \|v\|^2 - 2\langle v, v\rangle = 0,$$
that is, $v_n \xrightarrow{\|\cdot\|} v$.
In the next section we will prove that this last property, which is quite important in applications, is valid in a much larger class than the Hilbert spaces, namely, the uniformly convex Banach spaces. For the moment, we pause in these theoretical developments to give some examples of sequences which are $\sigma(V, V^*)$ convergent but not $\|\cdot\|_V$ convergent.
Example 2.4.1. Take $V = \ell^2$. An element $v$ of $V$ is a sequence $(v_k)_{k\in\mathbb{N}}$ of real numbers such that $\sum_{k\in\mathbb{N}} |v_k|^2 < +\infty$. The scalar product $\langle u, v\rangle := \sum_{k\in\mathbb{N}} u_k v_k$ and the corresponding norm $\|v\| = \big(\sum_{k\in\mathbb{N}} |v_k|^2\big)^{1/2}$ give $V$ a Hilbert space structure. Consider the sequence $e_1, e_2, \ldots, e_n, \ldots$ with
$$e_n = (\delta_{n,k})_{k\in\mathbb{N}},$$
where $\delta_{n,k}$ (the Kronecker symbol) takes the value 1 if $k = n$ and 0 otherwise. The family $(e_n)_{n\in\mathbb{N}}$ is called the canonical basis of $\ell^2$ (it is a Hilbertian basis). Let us show that $(e_n)_{n\in\mathbb{N}}$ converges weakly to 0 in $V$, that is,
$$\forall v \in V \qquad \langle e_n, v\rangle \xrightarrow[n\to+\infty]{} 0.$$
Observe that $\langle e_n, v\rangle = v_n$, where $v = (v_n)_{n\in\mathbb{N}}$. Since $\sum_{n\in\mathbb{N}} |v_n|^2 < +\infty$, the general term $|v_n|^2$ of this convergent series tends to zero as $n$ goes to $+\infty$, hence so does $v_n$, which proves the result.
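The computations of Example 2.4.1 are easy to reproduce with finitely supported sequences. In the Python sketch below, a sequence is stored sparsely as a dictionary `{index: value}` (a representation chosen for this illustration, with hypothetical helper names), and the test vector $v_k = 1/k$ is truncated at a finite index $K$, which is an assumption of the sketch. It shows that $\langle e_n, v\rangle = v_n \to 0$, while $\|e_n - e_m\| = \sqrt{2}$ for $n \neq m$ and $\langle e_n, e_n\rangle = 1$. The last point also shows that in Proposition 2.4.6(iv) weak convergence of both factors would not suffice: $e_n \rightharpoonup 0$, but $\langle e_n, e_n\rangle \not\to 0$.

```python
import math

def e(n):
    """Canonical basis vector e_n of l^2, stored sparsely as {index: value}."""
    return {n: 1.0}

def inner(u, v):
    """Scalar product of two finitely supported sequences."""
    return sum(val * v.get(k, 0.0) for k, val in u.items())

def norm(u):
    return math.sqrt(inner(u, u))

def sub(u, v):
    """Difference u - v of two finitely supported sequences."""
    return {k: u.get(k, 0.0) - v.get(k, 0.0) for k in set(u) | set(v)}

# Truncation of the l^2 element v_k = 1/k (the truncation index K is an assumption).
K = 10_000
v = {k: 1.0 / k for k in range(1, K + 1)}

pairings = {n: inner(e(n), v) for n in (1, 10, 100, 1000)}   # equals v_n = 1/n
distances = [norm(sub(e(n), e(m))) for n, m in [(1, 2), (5, 90), (100, 101)]]
self_pairings = [inner(e(n), e(n)) for n in (1, 10, 100)]
```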
The sequence $(e_n)_{n\in\mathbb{N}}$ is not norm convergent in $V$; otherwise it would necessarily norm-converge to zero (recall that norm convergence implies weak convergence, and weak limits are unique). This is impossible since $\|e_n\| = 1$ for each $n \in \mathbb{N}$. Equivalently, one can observe that $\|e_n - e_m\| = \sqrt{2}$ for all $n \neq m$, so the sequence $(e_n)_{n\in\mathbb{N}}$ is not a Cauchy sequence and hence is not norm convergent.
Example 2.4.2 (weak convergence in $L^p(\Omega)$, $1 \le p < \infty$). Take a bounded open set $\Omega$ in $\mathbb{R}^N$, $1 \le p < \infty$, and
$$V = L^p(\Omega) = \Big\{v : \Omega \to \mathbb{R} \text{ Lebesgue measurable} : \int_\Omega |v(x)|^p\,dx < +\infty\Big\}.$$
Equipped with the norm $\|v\|_p = \big(\int_\Omega |v(x)|^p\,dx\big)^{1/p}$, $V$ is a Banach space with dual $V^* = L^{p'}(\Omega)$, $\frac{1}{p} + \frac{1}{p'} = 1$ (with the convention that the conjugate exponent of 1 is $+\infty$, i.e., $L^1(\Omega)^* = L^\infty(\Omega)$). Weak convergence in $V = L^p(\Omega)$ can be formulated as follows:
$$v_n \xrightarrow{\sigma(L^p, L^{p'})} v \iff \forall z \in L^{p'}(\Omega) \quad \int_\Omega v_n(x) z(x)\,dx \xrightarrow[n\to+\infty]{} \int_\Omega v(x) z(x)\,dx.$$
Weak convergence in $L^p$ allows us to model two different types of phenomena (which may occur simultaneously):
1. Oscillations. We describe the simplest situation of wild oscillations. Take $\Omega = (a, b)$, a bounded open interval of the real line, $dx$ the Lebesgue measure on $\Omega$, and $v_n(x) = \sin(nx)$, $n \in \mathbb{N}$. Clearly $v_n$ oscillates between $-1$ and $+1$ with period $T_n = 2\pi/n$. When $n$ goes to $+\infty$, $T_n \to 0$ and simultaneously the frequency goes to $+\infty$.
Let us prove that $v_n \xrightarrow{\sigma(L^p, L^{p'})} 0$ for any $1 \le p < \infty$. Indeed, we can state a slightly more precise result: for any $z \in L^1(\Omega)$,
$$\int_a^b z(x) \sin nx\,dx \longrightarrow 0 \quad \text{as } n \to +\infty.$$
(Note that for all $1 \le p < +\infty$, $L^{p'}(a, b) \subset L^1(a, b)$, since $(a, b)$ is bounded.) This is exactly Riemann's theorem (the Riemann–Lebesgue lemma), which states that the Fourier coefficients of an integrable function tend to zero as $n \to +\infty$. For convenience, we give the proof, which is a nice illustration of a density argument. We also emphasize that this result is a particular case of an ergodic theorem (see Section 13.2, Proposition 13.2.1, Remark 13.2.5, and Theorem 14.2.2). For an arbitrary $z \in L^1$, it is difficult to compute or obtain information on the integral $\int_a^b z(x) \sin nx\,dx$. So let us first consider the case where $z$ belongs to some dense subspace $Z$ of $L^1(\Omega)$, $Z$ being chosen to make the computation easier. For example, take $Z = C^1_c(a, b)$, the subspace of $C^1$ functions with compact support in $(a, b)$. By integration by parts, for any $z \in C^1_c(a, b)$,
$$\int_a^b z(x) \sin nx\,dx = \int_a^b z'(x)\,\frac{\cos nx}{n}\,dx,$$
which implies
$$\Big|\int_a^b z(x) \sin nx\,dx\Big| \le \frac{1}{n}\int_a^b |z'(x)|\,dx \xrightarrow[n\to+\infty]{} 0.$$
Another choice would be to take for $Z$ the subspace of step functions on $(a, b)$. In that case a direct computation yields a similar result. So, for any $z$ belonging to a dense subspace $Z$ of $L^1(\Omega)$, we have
$$\int_a^b z(x) \sin nx\,dx \xrightarrow[n\to+\infty]{} 0.$$
We complete the proof by a density argument. Take an arbitrary element $z$ of $L^1(\Omega)$. By the density of $Z$ in $L^1(\Omega)$, for any $\varepsilon > 0$ there exists some element $z_\varepsilon \in Z$ such that $\|z - z_\varepsilon\|_1 < \varepsilon$. Let us write
$$\int_a^b z(x) \sin nx\,dx = \int_a^b z_\varepsilon(x) \sin nx\,dx + \int_a^b \big(z(x) - z_\varepsilon(x)\big) \sin nx\,dx.$$
Thus,
$$\Big|\int_a^b z(x) \sin nx\,dx\Big| \le \Big|\int_a^b z_\varepsilon(x) \sin nx\,dx\Big| + \int_a^b |z(x) - z_\varepsilon(x)|\,dx \le \Big|\int_a^b z_\varepsilon(x) \sin nx\,dx\Big| + \varepsilon.$$
Since $z_\varepsilon \in Z$, we obtain
$$\limsup_{n\to+\infty} \Big|\int_a^b z(x) \sin nx\,dx\Big| \le \varepsilon \qquad \forall \varepsilon > 0,$$
which implies $\lim_{n\to+\infty} \int_a^b z(x) \sin nx\,dx = 0$.
We now observe that the sequence $(v_n)_{n\in\mathbb{N}}$, $v_n(x) = \sin nx$, does not norm-converge in $V = L^p(a, b)$. Otherwise, it would norm-converge to zero, but this is impossible since, for example, with $p = 2$,
$$\|v_n\|_2^2 = \int_a^b (\sin nx)^2\,dx = \frac{1}{2}\int_a^b (1 - \cos 2nx)\,dx = \frac{b-a}{2} - \frac{1}{4n}(\sin 2nb - \sin 2na) \xrightarrow[n\to+\infty]{} \frac{b-a}{2},$$
which is different from zero!
2. Concentration. Take $\Omega = (0, 1)$ and let $(v_n)_{n\in\mathbb{N}}$ be the sequence of step functions described as follows: let
$$A_n = \bigcup_{k=1,\ldots,n} \Big[\frac{k}{n+1} - \frac{1}{2n^2}, \frac{k}{n+1} + \frac{1}{2n^2}\Big]$$
(for $n \ge 2$ these $n$ intervals are pairwise disjoint) and take $v_n = \sqrt{n}$ on $A_n$ and $v_n = 0$ elsewhere. Let us examine the mode of convergence of the sequence $(v_n)_{n\in\mathbb{N}}$ in $V = L^2(0, 1)$. One can first observe that
(a) $\int_0^1 v_n^2(x)\,dx = n \cdot \frac{1}{n^2} \cdot n = 1$ for all $n \in \mathbb{N}$;
(b) the sequence $(v_n)_{n\in\mathbb{N}}$ converges to zero in measure, that is,
$$\forall \delta > 0 \qquad \mathrm{meas}\{x \in (0, 1) : |v_n(x)| > \delta\} \longrightarrow 0.$$
In fact, one just needs to observe that $\{x \in (0, 1) : |v_n(x)| > \delta\} = A_n$ as soon as $n > \delta^2$, and that $\mathrm{meas}(A_n) = n \cdot \frac{1}{n^2} = \frac{1}{n} \xrightarrow[n\to+\infty]{} 0$.
Therefore, the sequence $(v_n)_{n\in\mathbb{N}}$ does not norm-converge in $L^2(0, 1)$; otherwise it would converge to zero (recall that norm convergence in $L^2$ implies convergence in measure), which is impossible since $\|v_n\|_{L^2} = 1$. Let us now prove that the sequence $(v_n)_{n\in\mathbb{N}}$ converges weakly to zero in $V = L^2(0, 1)$. By the same density argument as in the previous oscillation example (the sequence is bounded in $L^2$, and step functions are dense in $L^2(0, 1)$), we just need to prove that for any step function $z : (0, 1) \to \mathbb{R}$,
$$\int_0^1 v_n(x) z(x)\,dx \xrightarrow[n\to+\infty]{} 0.$$
By linearity of the integral, we just need to compute, for any $0 < a < b < 1$, the integral $\int_a^b v_n(x)\,dx$. Observe that as $n$ goes to $+\infty$,
$$\int_a^b v_n(x)\,dx \sim n(b - a) \cdot \frac{1}{n^2} \cdot \sqrt{n} = \frac{b - a}{\sqrt{n}} \longrightarrow 0$$
($\sim$ stands for asymptotic equivalence).
We stress that in the concentration example, the weak convergence occurs simultaneously with the convergence in measure to zero. What happens in this situation is that the mass of $|v_n|^2$ is concentrated on a set of small Lebesgue measure. This is the concentration phenomenon. Note also that the sequence $(v_n)_{n\in\mathbb{N}}$ in the above example norm-converges to zero in every $L^p(0, 1)$, $1 \le p < 2$! To see this, just compute
$$\int_0^1 |v_n(x)|^p\,dx = n \cdot \frac{1}{n^2} \cdot n^{p/2} = n^{(p/2) - 1} \longrightarrow 0$$
as $n \to +\infty$ as soon as $(p/2) - 1 < 0$, that is, $p < 2$. Thus $p = 2$ is, in this situation, a critical exponent, at which we pass from strong convergence of the sequence $(v_n)_{n\in\mathbb{N}}$ in $L^p$, $p < 2$, to weak convergence in $L^2$. As we will see, weak convergence related to concentration effects often occurs in situations where some critical exponent is involved (like the critical Sobolev exponent).
The two previous examples illustrate the usefulness of density arguments when proving weak convergence. Let us state this in an abstract setting.
Proposition 2.4.8. Let $V$ be a normed linear space and $Z$ a dense subset of $V^*$. For any bounded sequence $(v_n)_{n\in\mathbb{N}}$ in $V$ and any $v \in V$, the following assertions are equivalent:
(i) $v_n \xrightarrow{\sigma(V,V^*)} v$;
(ii) $\forall z^* \in Z$, $\langle z^*, v_n\rangle \to \langle z^*, v\rangle$ as $n \to +\infty$.
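Proof. Clearly (i) $\Rightarrow$ (ii). So let us assume (ii) and prove that for any $v^* \in V^*$ we have
$$\lim_{n\to+\infty} \langle v^*, v_n\rangle = \langle v^*, v\rangle.$$
By the density of $Z$ in $V^*$, for any $\varepsilon > 0$ there exists some element $z_\varepsilon^* \in Z$ such that $\|v^* - z_\varepsilon^*\|_* < \varepsilon$. Let us write
$$\langle v^*, v_n - v\rangle = \langle z_\varepsilon^*, v_n - v\rangle + \langle v^* - z_\varepsilon^*, v_n - v\rangle,$$
which by the triangle inequality and the definition of the dual norm $\|\cdot\|_*$ yields
$$|\langle v^*, v_n - v\rangle| \le |\langle z_\varepsilon^*, v_n - v\rangle| + \|v^* - z_\varepsilon^*\|_*\,\|v_n - v\|.$$
Using the assumption that the sequence $(v_n)_{n\in\mathbb{N}}$ is bounded in $V$ and that $\|v^* - z_\varepsilon^*\|_* < \varepsilon$, we obtain, with $C = \sup_n \|v_n - v\| < +\infty$,
$$|\langle v^*, v_n - v\rangle| \le |\langle z_\varepsilon^*, v_n - v\rangle| + C\varepsilon.$$
Now let $n$ tend to $+\infty$, and use assumption (ii) together with $z_\varepsilon^* \in Z$ to get
$$\limsup_{n\to+\infty} |\langle v^*, v_n - v\rangle| \le C\varepsilon.$$
Since this inequality holds for every $\varepsilon > 0$, we finally infer
$$\forall v^* \in V^* \qquad \lim_{n\to+\infty} \langle v^*, v_n - v\rangle = 0,$$
that is, $v = \sigma(V, V^*)\text{-}\lim_{n\to+\infty} v_n$.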
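Proposition 2.4.8 is what allows the two examples above to be tested only against step functions. For a step function $z = \sum_j c_j \mathbf{1}_{(a_j, b_j)}$ the pairings have closed forms: $\int z(x) \sin nx\,dx = \sum_j c_j (\cos na_j - \cos nb_j)/n$, and $\int_0^1 v_n z\,dx = \sqrt{n}\sum_j c_j\,\mathrm{meas}(A_n \cap (a_j, b_j))$. The Python sketch below (helper names are hypothetical, and the step function `z` is an arbitrary choice) evaluates these exact expressions together with $\int_0^1 |v_n|^p\,dx = n^{p/2}\,\mathrm{meas}(A_n)$, so one can watch both pairings decay while $\|v_n\|_2$ stays equal to 1 and $\|v_n\|_1$ tends to zero.

```python
import math

def pair_sin(n, steps):
    """Exact integral of z(x) sin(nx), z = sum of c * 1_(a, b) over (a, b, c) in steps."""
    return sum(c * (math.cos(n * a) - math.cos(n * b)) / n for a, b, c in steps)

def concentration_intervals(n):
    """The n intervals [k/(n+1) - 1/(2n^2), k/(n+1) + 1/(2n^2)] forming A_n."""
    r = 1.0 / (2 * n * n)
    return [(k / (n + 1) - r, k / (n + 1) + r) for k in range(1, n + 1)]

def overlap(a, b, c, d):
    """Length of the intersection of the intervals (a, b) and (c, d)."""
    return max(0.0, min(b, d) - max(a, c))

def pair_concentration(n, steps):
    """Exact integral of v_n(x) z(x) over (0, 1), where v_n = sqrt(n) on A_n."""
    A_n = concentration_intervals(n)
    return math.sqrt(n) * sum(c * overlap(a, b, lo, hi)
                              for a, b, c in steps for lo, hi in A_n)

def lp_power(n, p):
    """Integral of |v_n|^p over (0, 1), i.e. n^(p/2) * meas(A_n) (disjoint intervals, n >= 2)."""
    measure = sum(hi - lo for lo, hi in concentration_intervals(n))
    return math.sqrt(n) ** p * measure

# Hypothetical step function on (0, 1): 2 on (0.1, 0.4), -1 on (0.4, 0.9).
z = [(0.1, 0.4, 2.0), (0.4, 0.9, -1.0)]
table = [(n, pair_sin(n, z), pair_concentration(n, z), lp_power(n, 2), lp_power(n, 1))
         for n in (10, 100, 1000)]
```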
2.4.3 Weak convergence and geometry of uniformly convex spaces
In this section we focus on a particular class of Banach spaces, namely, the uniformly convex Banach spaces, in which we will be able to extend the result of Proposition 2.4.7: weak convergence together with convergence of the norms implies strong convergence.
Definition 2.4.2. A Banach space $(V, \|\cdot\|)$ is said to be uniformly convex if, for any sequences $(u_n)_{n\in\mathbb{N}}$, $(v_n)_{n\in\mathbb{N}}$ in $V$ with $\|u_n\| = \|v_n\| = 1$ for all $n \in \mathbb{N}$, the following implication holds:
$$\Big\|\frac{u_n + v_n}{2}\Big\| \xrightarrow[n\to+\infty]{} 1 \ \Rightarrow\ \|u_n - v_n\| \xrightarrow[n\to+\infty]{} 0.$$
This definition reflects a geometrical property of the unit ball, which has to be well rounded. Note that this property is not stable when a norm is replaced by an equivalent one. As an elementary example, one can observe that $V = \mathbb{R}^N$ equipped with the norm $\|x\|_2 = \big(\sum_{i=1}^N x_i^2\big)^{1/2}$ is uniformly convex, whereas, for $N \ge 2$, the norms $\|x\|_1 = \sum_{i=1}^N |x_i|$ and $\|x\|_\infty = \max_{1\le i\le N} |x_i|$ are not uniformly convex. The uniform convexity of the norm expresses that if $u$ and $v$ are on the unit sphere, the fact that $\frac{u+v}{2}$ is close to the sphere forces $u$ and $v$ to be close to each other.
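The failure of uniform convexity for $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on $\mathbb{R}^2$ can be seen on explicit pairs of unit vectors whose midpoint stays on the unit sphere while $\|u - v\| = 2$; for $\|\cdot\|_2$, the parallelogram identity gives instead $\|u - v\|^2 = 4\big(1 - \|\frac{u+v}{2}\|^2\big)$ for unit vectors. The Python sketch below (hypothetical helper names) only checks these particular pairs; it is an illustration, not a proof.

```python
import math

def norm(x, p):
    """l^p norm on R^2; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def midpoint_and_gap(u, v, p):
    """Return (||(u + v)/2||_p, ||u - v||_p)."""
    mid = [(a + b) / 2 for a, b in zip(u, v)]
    diff = [a - b for a, b in zip(u, v)]
    return norm(mid, p), norm(diff, p)

# l^1: u = (1, 0), v = (0, 1) are unit vectors whose midpoint is still a unit vector.
l1_case = midpoint_and_gap([1.0, 0.0], [0.0, 1.0], 1)
# l^inf: u = (1, 1), v = (1, -1), same phenomenon.
linf_case = midpoint_and_gap([1.0, 1.0], [1.0, -1.0], math.inf)
# l^2: unit vectors at angle theta obey ||u - v||^2 = 4 (1 - ||(u + v)/2||^2).
theta = 0.3
m2, g2 = midpoint_and_gap([1.0, 0.0], [math.cos(theta), math.sin(theta)], 2)
```

In both non-uniformly convex cases the result is a midpoint norm of 1 with a gap of 2, the worst possible, whereas for the Euclidean norm a midpoint norm close to 1 forces a small gap.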
Proposition 2.4.9. (a) Hilbert spaces are uniformly convex. (b) The $L^p$ spaces, $1 < p < \infty$, are uniformly convex.
Proof. (a) The uniform convexity of Hilbert spaces is a direct consequence of the parallelogram identity:
$$\forall u, v \in V \qquad \|u + v\|^2 + \|u - v\|^2 = 2\big(\|u\|^2 + \|v\|^2\big).$$
Notice that this identity characterizes Hilbert spaces among general Banach spaces. So let us take $u_n, v_n$ in $V$ such that $\|u_n\| = \|v_n\| = 1$. We have
$$\|u_n - v_n\|^2 = 4\Big(1 - \Big\|\frac{u_n + v_n}{2}\Big\|^2\Big).$$
So $\big\|\frac{u_n + v_n}{2}\big\| \to 1$ forces $\|u_n - v_n\|$ to converge to zero as $n \to +\infty$.
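(b) The same type of argument works in $L^p$ spaces, $1 < p < \infty$, when the parallelogram identity is replaced by the so-called Clarkson inequalities. If $\|\cdot\|$ is the $L^p$ norm, one has to distinguish two cases:
$$\Big\|\frac{u+v}{2}\Big\|^p + \Big\|\frac{u-v}{2}\Big\|^p \le \frac{1}{2}\big(\|u\|^p + \|v\|^p\big) \qquad \text{if } 2 \le p < \infty,$$
$$\Big\|\frac{u+v}{2}\Big\|^{p'} + \Big\|\frac{u-v}{2}\Big\|^{p'} \le \Big(\frac{1}{2}\big(\|u\|^p + \|v\|^p\big)\Big)^{\frac{1}{p-1}} \qquad \text{if } 1 < p \le 2,$$
with $\frac{1}{p} + \frac{1}{p'} = 1$.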
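Clarkson's inequalities hold in every $L^p(\mu)$, in particular for the $\ell^p$ norms on $\mathbb{R}^d$ (counting measure). The Python sketch below, with hypothetical helper names, tests the inequality appropriate to each $p$ on random vectors of $\mathbb{R}^5$; a small tolerance absorbs floating-point rounding, which matters at $p = 2$, where both inequalities reduce to the parallelogram identity. Random testing proves nothing, of course; it only guards against misreading the exponents.

```python
import random

def lp(x, p):
    """l^p norm of a vector of R^d."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def clarkson_holds(u, v, p, tol=1e-12):
    """Check the Clarkson inequality appropriate to p for vectors in l^p(R^d)."""
    s = [(a + b) / 2 for a, b in zip(u, v)]
    d = [(a - b) / 2 for a, b in zip(u, v)]
    if p >= 2:
        lhs = lp(s, p) ** p + lp(d, p) ** p
        rhs = 0.5 * (lp(u, p) ** p + lp(v, p) ** p)
    else:
        q = p / (p - 1)                       # conjugate exponent p'
        lhs = lp(s, p) ** q + lp(d, p) ** q
        rhs = (0.5 * (lp(u, p) ** p + lp(v, p) ** p)) ** (1 / (p - 1))
    return lhs <= rhs * (1 + tol) + tol

rng = random.Random(1)
results = []
for p in (1.2, 1.5, 2.0, 3.0, 6.0):
    for _ in range(200):
        u = [rng.gauss(0, 1) for _ in range(5)]
        v = [rng.gauss(0, 1) for _ in range(5)]
        results.append(clarkson_holds(u, v, p))
```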
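The following result justifies the introduction of the notion of uniform convexity in this section devoted to weak convergence.
Proposition 2.4.10. Let $V$ be a uniformly convex Banach space. Then for any sequence $(v_n)_{n\in\mathbb{N}}$ in $V$ the following implication holds:
$$v_n \xrightarrow{\sigma(V,V^*)} v \ \text{ and }\ \|v_n\| \to \|v\| \ \Rightarrow\ v_n \xrightarrow{\|\cdot\|} v.$$
Proof. The case $v = 0$ is obvious, since then $\|v_n - v\| = \|v_n\| \to 0$; so we can assume $v \neq 0$. Let us reduce ourselves to the case $\|v_n\| = \|v\| = 1$. Since $\|v_n\| \to \|v\| > 0$, we have $v_n \neq 0$ for $n$ large enough, and we can consider
$$w_n := \frac{v_n}{\|v_n\|}, \qquad w := \frac{v}{\|v\|}.$$
One can notice that $\|w_n\| = 1$ and $w_n \xrightarrow{\sigma(V,V^*)} w$: indeed, for any $v^* \in V^*$,
$$\langle v^*, w_n - w\rangle = \Big\langle v^*, \frac{v_n}{\|v_n\|} - \frac{v}{\|v\|}\Big\rangle = \frac{1}{\|v_n\|}\langle v^*, v_n - v\rangle + \Big(\frac{1}{\|v_n\|} - \frac{1}{\|v\|}\Big)\langle v^*, v\rangle,$$
which goes to zero as $n \to +\infty$, since $v_n \xrightarrow{\sigma(V,V^*)} v$ and $\|v_n\| \to \|v\| \neq 0$.
Let us show that $\big\|\frac{w_n + w}{2}\big\| \to 1$ as $n \to +\infty$. Since $\frac{w_n + w}{2} \xrightarrow{\sigma(V,V^*)} w$, by the lower semicontinuity of the norm for the weak topology (Proposition 2.4.6(iii)),
$$1 = \|w\| \le \liminf_n \Big\|\frac{w_n + w}{2}\Big\| \le \limsup_n \Big\|\frac{w_n + w}{2}\Big\| \le 1.$$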
The last inequality follows from the triangle inequality and the fact that $\|w_n\| = \|w\| = 1$. So we have $\|w_n\| = \|w\| = 1$ and $\big\|\frac{w_n + w}{2}\big\| \to 1$. It follows from the uniform convexity property that $w_n \xrightarrow{\|\cdot\|} w$. Using once more that $\|v_n\| \to \|v\|$, we derive that $v_n = \|v_n\| w_n$ norm-converges to $v = \|v\| w$.
Remark 2.4.3. When proving that a sequence $(u_n)_{n\in\mathbb{N}}$ converges in norm in a Hilbert space, or more generally in a uniformly convex Banach space, it is often useful first to prove that weak convergence holds and then to prove that the norms converge too. For example, when minimizing the norm over a closed convex bounded subset of a uniformly convex Banach space, one automatically obtains that any minimizing sequence is norm convergent. The property that, for any sequence $(v_n)_{n\in\mathbb{N}}$ in $V$,
$$v_n \xrightarrow{\sigma(V,V^*)} v \ \text{ and }\ \|v_n\| \to \|v\| \ \Rightarrow\ v_n \xrightarrow{\|\cdot\|} v$$
is often called the Kadek property.
Remark 2.4.4. The Kadek property fails in general Banach spaces. For example, it is false in the space $L^1(\Omega)$, $\Omega \subset \mathbb{R}^N$, equipped with the Lebesgue measure: indeed, take $\Omega = (0, \pi)$ and
$$v_n(x) = 1 + \sin nx, \qquad n = 1, 2, \ldots.$$
Then $v_n \xrightarrow{\sigma(L^1, L^\infty)} v \equiv 1$ (by the Riemann–Lebesgue lemma of Example 2.4.2, since $L^\infty(0, \pi) \subset L^1(0, \pi)$), and
$$\|v_n\|_1 = \int_0^\pi v_n(x)\,dx = \pi + \frac{1}{n}(1 - \cos n\pi) \xrightarrow[n\to\infty]{} \pi = \|v\|_1.$$
But $\|v_n - v\|_1 = \int_0^\pi |\sin nx|\,dx$ does not converge to zero as $n \to +\infty$. Indeed,
$$\int_0^\pi |\sin nx|\,dx = n \int_0^{\pi/n} \sin nx\,dx = 2.$$
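The quantities in Remark 2.4.4 can be checked numerically. The Python sketch below (helper names are hypothetical) approximates $\|v_n\|_1$ and $\|v_n - v\|_1$ on $(0, \pi)$ with a composite midpoint rule, an arbitrary choice of quadrature, and compares them with the closed forms $\pi + (1 - \cos n\pi)/n$ and $2$: the norms approach $\|v\|_1 = \pi$ while the distance to $v$ stays equal to 2.

```python
import math

def midpoint_integral(f, a, b, m=20000):
    """Composite midpoint rule with m subintervals on (a, b)."""
    h = (b - a) / m
    return h * sum(f(a + (j + 0.5) * h) for j in range(m))

def exact_l1_norm(n):
    """||1 + sin(nx)||_1 on (0, pi); 1 + sin(nx) >= 0, so no absolute value is needed."""
    return math.pi + (1 - math.cos(n * math.pi)) / n

rows = []
for n in (1, 2, 5, 10):
    approx_norm = midpoint_integral(lambda x: 1 + math.sin(n * x), 0.0, math.pi)
    distance = midpoint_integral(lambda x: abs(math.sin(n * x)), 0.0, math.pi)
    rows.append((n, approx_norm, exact_l1_norm(n), distance))
```

Each lambda is evaluated inside its own loop iteration, so capturing the loop variable `n` is safe here.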
2.4.4 Weak compactness theorems in reflexive Banach spaces
We have already observed that $L^p$ spaces enjoy quite different properties with respect to weak convergence, depending on whether $1 < p < \infty$ or $p = 1$, $p = \infty$. One can distinguish these situations by introducing the concept of uniform convexity, or local uniform convexity, of the space, as done in Section 2.4.3. But this is a geometrical concept related to the choice of the norm, and when dealing with topological concepts like compactness, one is naturally led to consider notions of a topological nature (i.e., invariant under the choice of an equivalent norm). This is where the notion of reflexive Banach space plays a fundamental role. Let us first recall its definition. Let $V$ be a Banach space, $V^*$ its topological dual, and $V^{**}$ its topological bidual, equipped respectively with the norms
$$\|\cdot\|_V = \|\cdot\|, \qquad \|v^*\|_* = \sup_{\|v\| \le 1} |\langle v^*, v\rangle|, \qquad \|v^{**}\|_{**} = \sup_{\|v^*\|_* \le 1} |\langle v^{**}, v^*\rangle|.$$
There exists a canonical embedding of $V$ into $V^{**}$, denoted by $J : V \to V^{**}$, which is defined as follows:
$$\forall v \in V,\ \forall v^* \in V^* \qquad \langle Jv, v^*\rangle_{(V^{**}, V^*)} = \langle v^*, v\rangle_{(V^*, V)}.$$
Let us comment on this definition. For any $v \in V$, the mapping $v^* \in V^* \mapsto \langle v^*, v\rangle_{(V^*, V)} \in \mathbb{R}$ is linear and continuous on $V^*$, so it defines a unique element of $V^{**}$, which is denoted by $Jv$. Observe that
$$|\langle v^*, v\rangle| \le \|v^*\|_* \cdot \|v\| \qquad \forall v^* \in V^*,$$
so that $\|Jv\|_{**} \le \|v\|$. In fact, as a consequence of the Hahn–Banach theorem, we have
$$\|Jv\|_{**} = \sup_{\|v^*\|_* \le 1} |\langle v^*, v\rangle| = \|v\|,$$
so that $J$ is a linear isometry from $V$ into $V^{**}$. As a consequence, $J$ is an embedding of $V$ into $V^{**}$.
Definition 2.4.3. A Banach space $V$ is said to be reflexive if $J(V) = V^{**}$. When $V$ is reflexive, one can identify $V$ and $V^{**}$ with the help of $J$.
Remark 2.4.5. $J$ is a linear isometry. Thus it preserves the linear and the normed structures and allows us to identify $V$ and $V^{**}$ when $J$ is onto, that is, in the case of reflexive Banach spaces. One has to pay attention to the following fact: the definition of reflexive Banach spaces says that the map $J$ realizes an isometric isomorphism between $V$ and $V^{**}$. It is essential to use $J$ in the definition, since one can exhibit a nonreflexive Banach space $V$ for which there exists an isometry from $V$ onto $V^{**}$!
Proposition 2.4.11. Let $V$ be a uniformly convex Banach space. Then $V$ is reflexive.
For a proof of this result, see, for instance, [225], [90]. For $L^p$ spaces, this result is in accordance with the results concerning the dual of $L^p$ spaces: when $1 < p < \infty$, $L^p$ is uniformly convex and $(L^p)^* = L^{p'}$, where $\frac{1}{p} + \frac{1}{p'} = 1$, so that $(L^p)^{**} = (L^{p'})^* = L^p$ (the equalities above mean isometric isomorphisms).
Remark 2.4.6. Note that there exist reflexive Banach spaces which do not admit an equivalent norm making the space uniformly convex. However, one can always renorm a reflexive Banach space with an equivalent norm which is locally uniformly convex and whose dual norm is also locally uniformly convex. With this renorming, the space satisfies the Kadek property (as does its dual); see Section 2.4.3.
The importance of reflexive Banach spaces is justified by the following theorem.
Theorem 2.4.2. (a) In a reflexive Banach space $(V, \|\cdot\|)$ the closed unit ball $B = \{v \in V : \|v\|_V \le 1\}$ is compact for the topology $\sigma(V, V^*)$. As a consequence, the bounded subsets of $V$ are relatively compact for the topology $\sigma(V, V^*)$.
“abmb 2005/1 page 5 i
Chapter 2. Weak solution methods in variational analysis
(b) The above property characterizes the reflexive Banach spaces: a Banach space is reflexive iff its closed unit ball is compact for the topology $\sigma(V, V^*)$.

Proof. The proof is a direct consequence of the Banach–Alaoglu–Bourbaki theorem, Theorem 2.4.7. It makes use of the weak* topology on the dual of a Banach space.

Let us now state a theorem of Eberlein and Šmulian, which asserts that from every bounded sequence in a reflexive Banach space one can extract a subsequence which converges for the topology $\sigma(V, V^*)$. This is an important and rather surprising result: since the weak topology is not metrizable (in infinite dimension), compactness does not automatically yield sequential compactness, and one does not expect such a sequential compactness result!

Theorem 2.4.3. Let $V$ be a reflexive Banach space. Then, from each bounded sequence $(u_n)_{n\in\mathbb{N}}$ in $V$, one can extract a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ which converges for the topology $\sigma(V, V^*)$.

Proof. (a) Let us first assume that $V^*$ is separable, that is, there exists a countable set $D = (v_k^*)_{k\in\mathbb{N}}$ which is dense in $V^*$. The proof relies on a diagonalization argument. First, let us notice that for all $v^* \in V^*$,
$$|\langle v^*, u_n\rangle| \le \|v^*\|_* \cdot \|u_n\| \le C\|v^*\|_*, \quad \text{where } C = \sup_{n\in\mathbb{N}} \|u_n\| < +\infty.$$
So the sequence $\{\langle v^*, u_n\rangle : n \in \mathbb{N}\}$ is bounded in $\mathbb{R}$, and for each $v^* \in V^*$ one can extract from it a convergent subsequence. The difficulty is that, without any further argument, the extracted subsequence depends on $v^*$. This is where the separability of $V^*$ and the diagonalization argument come into play. Let us start with $v_1^* \in D$ and extract a convergent subsequence $\{\langle v_1^*, u_{\sigma_1(n)}\rangle : n \in \mathbb{N}\}$, where $\sigma_1 : \mathbb{N} \to \mathbb{N}$ is a strictly increasing mapping. In a similar way, the sequence $\{\langle v_2^*, u_{\sigma_1(n)}\rangle : n \in \mathbb{N}\}$ is bounded in $\mathbb{R}$, so there exists a convergent subsequence $\{\langle v_2^*, u_{\sigma_1\circ\sigma_2(n)}\rangle : n \in \mathbb{N}\}$. Iterating this argument by induction, we construct for each $n \in \mathbb{N}$ a strictly increasing mapping $\sigma_n : \mathbb{N} \to \mathbb{N}$ such that the sequence $\{\langle v_n^*, u_{\sigma_1\circ\sigma_2\circ\cdots\circ\sigma_n(k)}\rangle : k \in \mathbb{N}\}$ is convergent in $\mathbb{R}$.
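Note that it is important to compose the mappings $\sigma_i$ in the precise order $\sigma_1 \circ \cdots \circ \sigma_n$, so that this subsequence is extracted from all the previous ones, $\sigma_1, \sigma_1\circ\sigma_2, \ldots, \sigma_1\circ\cdots\circ\sigma_{n-1}$. Now the diagonalization argument consists of taking $\tau : \mathbb{N} \to \mathbb{N}$ defined by $\tau(n) = (\sigma_1\circ\sigma_2\circ\cdots\circ\sigma_n)(n)$. In other words, $\tau(n)$ is the element of rank $n$ of the subsequence extracted at step $n$. The map $\tau$ is strictly increasing: a strictly increasing map $\sigma : \mathbb{N} \to \mathbb{N}$ satisfies $\sigma(k) \ge k$, so $\sigma_{n+1}(n+1) > n$ and hence $\tau(n+1) = (\sigma_1\circ\cdots\circ\sigma_n)(\sigma_{n+1}(n+1)) > (\sigma_1\circ\cdots\circ\sigma_n)(n) = \tau(n)$. It is important to notice that for each $n \in \mathbb{N}$, the sequence $(u_{\tau(k)})_{k\ge n}$ is extracted from the sequence $(u_{\sigma_1\circ\cdots\circ\sigma_n(k)})_{k\ge n}$: indeed, for $k \ge n$,
$$u_{\tau(k)} = u_{\sigma_1\circ\sigma_2\circ\cdots\circ\sigma_k(k)} = u_{(\sigma_1\circ\cdots\circ\sigma_n)(p)},$$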
2.4. Weak topologies and weak convergences
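where $p = (\sigma_{n+1}\circ\cdots\circ\sigma_k)(k)$. Since the $\sigma_i$ are strictly increasing, $p \ge k \ge n$. It follows that for each $k \in \mathbb{N}$, the sequence $(\langle v_k^*, u_{\tau(n)}\rangle : n \in \mathbb{N})$ is convergent, since from rank $k$ on it is a subsequence of a convergent sequence. By the density of $D$ in $V^*$, the result is easily extended to an arbitrary element $v^*$ of $V^*$: for each $\varepsilon > 0$, take $v_{k_\varepsilon}^* \in D$ such that $\|v^* - v_{k_\varepsilon}^*\|_* < \varepsilon$. Then for each $n, m \in \mathbb{N}$,
$$|\langle v^*, u_{\tau(n)}\rangle - \langle v^*, u_{\tau(m)}\rangle| \le |\langle v_{k_\varepsilon}^*, u_{\tau(n)} - u_{\tau(m)}\rangle| + |\langle v^* - v_{k_\varepsilon}^*, u_{\tau(n)} - u_{\tau(m)}\rangle| \le 2C\varepsilon + |\langle v_{k_\varepsilon}^*, u_{\tau(n)} - u_{\tau(m)}\rangle|.$$
Hence,
$$\limsup_{n,m\to\infty} |\langle v^*, u_{\tau(n)} - u_{\tau(m)}\rangle| \le 2C\varepsilon.$$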
This being true for any $\varepsilon > 0$, we infer that the sequence $\{\langle v^*, u_{\tau(n)}\rangle : n \in \mathbb{N}\}$ satisfies the Cauchy criterion and is thus convergent in $\mathbb{R}$. For all $v^* \in V^*$, let us denote $L(v^*) := \lim_{n\to+\infty} \langle v^*, u_{\tau(n)}\rangle$. Clearly, $L$ is a linear form on $V^*$; by passing to the limit in the inequality $|\langle v^*, u_n\rangle| \le C\|v^*\|_*$, we also have $|L(v^*)| \le C\|v^*\|_*$, so $L$ is continuous. Hence $L \in V^{**}$. Let us stress the fact that, up to this point, we have not used the reflexivity hypothesis. We now use that $V$ is reflexive to assert that $L = J(u)$ for some $u \in V$, where $J$ is the canonical embedding of $V$ into $V^{**}$. So for all $v^* \in V^*$ we have
$$\lim_{n\to+\infty} \langle v^*, u_{\tau(n)}\rangle = \langle v^*, u\rangle,$$
that is, $u = \sigma(V, V^*)\text{-}\lim_{n\to\infty} u_{\tau(n)}$.

(b) Now take $V$ a reflexive Banach space, without any separability assumption. Let $E$ be the subspace of $V$ generated by the $(u_n)_{n\in\mathbb{N}}$ and define $W = \overline{E}$, the closure of $E$ in $(V, \|\cdot\|)$. It is easy to verify that $W$ is reflexive (as a closed subspace of a reflexive space) and separable (finite linear combinations of the $u_n$ with rational coefficients are dense in $W$). This in turn implies that $W^*$ is reflexive and separable (see [90, Corollary III.24]). We are now in the situation studied in the first part of the proof: one can extract a subsequence $(u_{\tau(n)})_{n\in\mathbb{N}}$ which converges in $W$ for the topology $\sigma(W, W^*)$ to some $u \in W$. Since every $v^* \in V^*$ restricts to an element of $W^*$ (in this sense $V^* \subset W^*$), we derive that $(u_{\tau(n)})_{n\in\mathbb{N}}$ converges to $u$ for the topology $\sigma(V, V^*)$.
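The index bookkeeping of the diagonal argument can be checked mechanically. The following Python sketch is purely combinatorial: the maps $\sigma_i(k) = 2k + i$ are arbitrary hypothetical choices of strictly increasing maps (in the proof they are produced by the Bolzano–Weierstrass theorem), and `phi`, `tau` and `tail` are illustrative names for $\sigma_1\circ\cdots\circ\sigma_n$, the diagonal map $\tau$ and the index $p$ used above.

```python
# Combinatorial check of the diagonal extraction in the proof of Theorem 2.4.3.
# The maps sigma_i(k) = 2k + i are hypothetical stand-ins for the extractions
# given by Bolzano-Weierstrass; any strictly increasing maps N -> N would do.

def sigma(i):
    """Return the strictly increasing map k -> 2k + i (hypothetical choice)."""
    return lambda k: 2 * k + i

def phi(n):
    """Return sigma_1 o sigma_2 o ... o sigma_n (sigma_n is applied first)."""
    def composed(k):
        for i in range(n, 0, -1):
            k = sigma(i)(k)
        return k
    return composed

def tau(n):
    """Diagonal index tau(n) = (sigma_1 o ... o sigma_n)(n)."""
    return phi(n)(n)

def tail(n, k):
    """p = (sigma_{n+1} o ... o sigma_k)(k), so that tau(k) = phi(n)(p); p = k if k == n."""
    p = k
    for i in range(k, n, -1):
        p = sigma(i)(p)
    return p

print([tau(n) for n in range(1, 5)])  # [3, 13, 41, 113]
```

Checking `tau(k) == phi(n)(tail(n, k))` with `tail(n, k) >= n` for all `k >= n` is exactly the statement that $(u_{\tau(k)})_{k\ge n}$ is extracted from the $n$th subsequence.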
2.4.5 The Dunford–Pettis weak compactness theorem in $L^1(\Omega)$

When $1 < p < \infty$, the $L^p$ spaces are reflexive Banach spaces, and it follows from Theorem 2.4.3 that the relatively compact subsets of $L^p$ for the topology $\sigma(L^p, L^{p'})$ are exactly the bounded subsets of $L^p$. The situation in the case $p = 1$ is very different ($L^1$ is not a reflexive Banach space), and understanding the weak convergence properties of bounded sequences in $L^1$ is a subject of great importance, and quite involved. Let us first examine the following example. Take $\Omega = (-1, 1)$ equipped with the Lebesgue measure and
$$v_n(x) = \begin{cases} n & \text{if } -\frac{1}{2n} \le x \le \frac{1}{2n}, \\ 0 & \text{elsewhere.} \end{cases}$$
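Clearly, the sequence $(v_n)_{n\in\mathbb{N}}$ satisfies
$$v_n \ge 0, \qquad \int_\Omega v_n(x)\,dx = 1, \qquad v_n(x) \xrightarrow[n\to+\infty]{} 0 \text{ for almost every } x \in \Omega, \qquad \int_\Omega v_n(x)z(x)\,dx \xrightarrow[n\to+\infty]{} z(0) \text{ for any } z \in C(\Omega).$$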
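These properties can be checked numerically. In the Python sketch below, the helper names `v` and `pair` are hypothetical, and the integral is approximated by a midpoint rule on the support of $v_n$ (where $v_n \equiv n$); the quadrature is an assumption of this illustration, not part of the text.

```python
import math

def v(n, x):
    """The concentrating sequence of Section 2.4.5 on (-1, 1)."""
    return float(n) if abs(x) <= 1.0 / (2 * n) else 0.0

def pair(n, z, m=2000):
    """Midpoint-rule approximation of the integral of v_n * z over (-1, 1).
    Only the support [-1/(2n), 1/(2n)] of v_n contributes, so we integrate there."""
    h = 1.0 / (2 * n)
    width = 2 * h / m
    total = 0.0
    for j in range(m):
        x = -h + (j + 0.5) * width
        total += v(n, x) * z(x) * width
    return total

for n in (1, 10, 1000):
    # the mass stays 1, and the pairings with cos and exp tend to cos(0) = exp(0) = 1
    print(n, pair(n, lambda x: 1.0), pair(n, math.cos), pair(n, math.exp))
```

The same computation shows why the family cannot be uniformly integrable in the sense of Definition 2.4.4: the support of $v_n$ has measure $1/n$, yet it carries the whole mass 1.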
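We can observe that the sequence $(v_n)_{n\in\mathbb{N}}$ is bounded in $L^1(\Omega)$, but one cannot extract from it a subsequence converging weakly in the sense of $\sigma(L^1, L^\infty)$. Indeed, for each $\delta > 0$, testing against indicator functions of measurable subsets of $\Omega \setminus [-\delta, \delta]$ shows that such a weak limit $v$ would vanish almost everywhere on $\Omega \setminus [-\delta, \delta]$, hence almost everywhere on $\Omega$, while testing against $z \equiv 1$ gives $\int_\Omega v\,dx = 1$. By contrast, we will see in the next section that one can interpret the convergence of the sequence $(v_n)_{n\in\mathbb{N}}$ with the help of a weak dual topology $\sigma(V^*, V)$, for example, $\sigma(M_b(\Omega), C_0(\Omega))$. For the moment, we just retain from this example that, to obtain $\sigma(L^1, L^\infty)$ compactness of a sequence of functions, it is not sufficient to assume that the sequence is bounded in $L^1$. This is where the notion of uniform integrability plays a central role.

Definition 2.4.4. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space with $\mu$ a positive and finite measure ($\mu(\Omega) < +\infty$). Let $K$ be a subset of $L^1(\Omega, \mathcal{A}, \mu)$. We say that $K$ is uniformly integrable if (a) and (b) hold:
(a) $K$ is bounded in $L^1(\Omega, \mathcal{A}, \mu)$;
(b) for every $\varepsilon > 0$ there exists some $\delta(\varepsilon) > 0$ such that
$$A \in \mathcal{A},\ \mu(A) < \delta(\varepsilon) \ \Longrightarrow\ \sup_{v\in K} \int_A |v(x)|\,d\mu(x) < \varepsilon.$$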
A comprehensive characterization of this property is given by the De La Vallée–Poussin theorem.

Theorem 2.4.4. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space with $\mu$ a positive and finite measure and $K$ a subset of $L^1(\Omega, \mathcal{A}, \mu)$. The following properties are equivalent:
(i) $K$ is uniformly integrable;
(ii) there exists a function $\theta : [0, +\infty[ \to [0, +\infty[$ ($\theta$ can be taken convex and increasing) such that $\lim_{s\to+\infty} \frac{\theta(s)}{s} = +\infty$ and
$$\sup_{v\in K} \int_\Omega \theta(|v(x)|)\,d\mu(x) < +\infty.$$
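Proof. The implication (ii) ⇒ (i) is important for applications. Let us prove it. First, one can observe that since $\theta$ has superlinear growth, for each $M > 0$ there exists some $C(M) \in \mathbb{R}_+$ such that
$$\forall s \in \mathbb{R}_+ \qquad 0 \le s \le \frac{1}{M}\theta(s) + C(M).$$
(Indeed, $\theta(s) \ge Ms$ for $s$ larger than some $s_M$, and one can take $C(M) = s_M$.) Let us fix $M_0 > 0$. We have for each $v \in K$
$$\int_\Omega |v(x)|\,d\mu(x) \le \frac{1}{M_0}\int_\Omega \theta(|v(x)|)\,d\mu(x) + C(M_0)\mu(\Omega),$$
and hence $\sup_{v\in K} \int_\Omega |v|\,d\mu < +\infty$, which proves (a) of Definition 2.4.4.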
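The constant $C(M)$ can be made explicit for a specific $\theta$. As a sketch, take the hypothetical superlinear function $\theta(s) = s^2$: then $s \le \theta(s)/M + M/4$ for all $s \ge 0$, because $\theta(s)/M - s + M/4 = (s - M/2)^2/M \ge 0$. The Python helpers below (names are illustrative) evaluate the resulting $L^1$ bound from the proof of (a) and optimize it over $M$; the optimal value $\sqrt{S\,\mu(\Omega)}$, where $S$ bounds $\int_\Omega \theta(|v|)\,d\mu$, is exactly what the Cauchy–Schwarz inequality gives, which is a useful sanity check.

```python
import math

def C(M):
    """C(M) for the hypothetical choice theta(s) = s**2:
    s <= s**2 / M + M / 4 for all s >= 0, since s**2/M - s + M/4 = (s - M/2)**2 / M >= 0."""
    return M / 4.0

def l1_bound(S, mu_omega, M):
    """Bound S/M + C(M) * mu(Omega) on the L^1 norm of any v whose
    integral of theta(|v|) is at most S, as in the proof of (a)."""
    return S / M + C(M) * mu_omega

def best_l1_bound(S, mu_omega):
    """Minimize the bound over M > 0 (S > 0 and mu_omega > 0 assumed).
    The optimum M = 2 sqrt(S / mu_omega) gives sqrt(S * mu_omega), the Cauchy-Schwarz bound."""
    M = 2.0 * math.sqrt(S / mu_omega)
    return l1_bound(S, mu_omega, M)

print(best_l1_bound(4.0, 2.0), math.sqrt(8.0))
```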
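Let us now prove (b). Fix $\varepsilon > 0$; for any $v \in K$, $A \in \mathcal{A}$, and $M > 0$,
$$\int_A |v(x)|\,d\mu(x) \le \frac{1}{M}\int_A \theta(|v(x)|)\,d\mu(x) + C(M)\mu(A) \le \frac{1}{M}\sup_{v\in K}\int_\Omega \theta(|v|)\,d\mu + C(M)\mu(A).$$
Take
$$M(\varepsilon) := \frac{2}{\varepsilon}\sup_{v\in K}\int_\Omega \theta(|v|)\,d\mu, \qquad \delta(\varepsilon) := \frac{\varepsilon}{2C(M(\varepsilon))}.$$
Then if $\mu(A) \le \delta(\varepsilon)$ we have
$$\sup_{v\in K}\int_A |v(x)|\,d\mu(x) \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$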
which proves (b). For the proof of the reverse implication (i) ⇒ (ii), see, for instance, [125, Theorem 22] or [102, Theorem 2.12].

We are now ready to state the Dunford–Pettis theorem, which gives a characterization of the weak compactness property in $L^1$.

Theorem 2.4.5 (Dunford–Pettis theorem). Let $(\Omega, \mathcal{A}, \mu)$ be a measure space with $\mu$ a positive and finite measure. Let $K$ be a subset of $L^1(\Omega, \mathcal{A}, \mu)$. The following properties are equivalent:
(i) $K$ is relatively compact for the weak topology $\sigma(L^1, L^\infty)$.
(ii) $K$ is uniformly integrable.
(iii) From each sequence $(v_n)_{n\in\mathbb{N}}$ contained in $K$, one can extract a subsequence converging for the topology $\sigma(L^1, L^\infty)$.

Proof. See [125, Theorem 25], [127, Theorem IV.8.9, Corollary IV.8.11], and [102]. When $\Omega$ is an open subset of $\mathbb{R}^N$ and $\mu$ the Lebesgue measure on $\Omega$, one can find a proof of implication (ii) ⇒ (iii) in Proposition 4.3.7, using the notion of Young measures.

Remark 2.4.7. As an illustration of the Dunford–Pettis theorem, let us consider a sequence $(v_n)_{n\in\mathbb{N}}$ in $L^1(\Omega, \mathcal{A}, \mu)$, $\mu(\Omega) < +\infty$, such that $\sup_{n\in\mathbb{N}} \int_\Omega |v_n| \ln |v_n|\,d\mu < +\infty$. We claim that the sequence $(v_n)$ is relatively compact for $\sigma(L^1, L^\infty)$. To obtain this result, just use the De La Vallée–Poussin theorem with $\theta(r) = r\ln r + e^{-1}$ (the constant $e^{-1} = -\min_{r\ge0} r\ln r$ makes $\theta$ nonnegative and only adds $\mu(\Omega)/e$ to the integrals; $\theta$ is superlinear), and then use the Dunford–Pettis theorem.
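For the concentrating sequence $v_n$ of Section 2.4.5, the hypothesis of Remark 2.4.7 fails in a quantifiable way: $v_n = n$ on a set of measure $1/n$, so $\int_\Omega v_n \ln v_n\,dx = \ln n \to +\infty$, consistent with the absence of a $\sigma(L^1, L^\infty)$ convergent subsequence. The short Python sketch below (function names are hypothetical) just evaluates these closed-form integrals, with the convention $0 \ln 0 = 0$.

```python
import math

def entropy_integral(n):
    """Integral over (-1, 1) of v_n ln v_n, where v_n = n on a set of
    measure 1/n and 0 elsewhere (convention 0 ln 0 = 0)."""
    measure = 1.0 / n
    return measure * (n * math.log(n))

def mass_on_support(n):
    """Return (mu(A_n), integral of v_n over A_n) for A_n = [-1/(2n), 1/(2n)]."""
    measure = 1.0 / n
    return measure, measure * n

for n in (10, 100, 1000):
    # the entropy integral grows like ln n while the mass on a shrinking set stays 1
    print(n, entropy_integral(n), mass_on_support(n))
```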
2.4.6 The weak* topology $\sigma(V^*, V)$

Let $(V, \|\cdot\|)$ be a normed linear space with topological dual $V^*$. On $V^*$ we have already defined two topologies:
• The norm topology associated to the dual norm $\|v^*\|_* = \sup\{|\langle v^*, v\rangle| : v \in V, \|v\| \le 1\}$, which makes $V^*$ a Banach space;
• The weak topology $\sigma(V^*, V^{**})$, where $V^{**}$ is the topological bidual of $V$. But this topology is often difficult to handle because the space $V^{**}$ may have a rather involved structure (think, for example, of $V = L^1$, $V^* = L^\infty$, and $V^{**} = (L^\infty)^*$), and moreover it may be too strong to enjoy desirable compactness properties.
The idea is to introduce a topology weaker than $\sigma(V^*, V^{**})$ by considering a weak topology on $V^*$ induced not by all the continuous linear forms on $V^*$ but only by a subfamily. At this point, there is a natural candidate, which consists of taking $J(V) \subset V^{**}$, where $J : V \to V^{**}$ is the canonical embedding of $V$ into its bidual $V^{**}$. Recall (see Section 2.4.4) that
$$\forall v \in V,\ \forall v^* \in V^* \qquad \langle J(v), v^*\rangle_{(V^{**},V^*)} := \langle v^*, v\rangle_{(V^*,V)}.$$
Definition 2.4.5. Let $V$ be a normed space with topological dual $V^*$. The weak* topology $\sigma(V^*, V)$ on $V^*$ is the weakest topology on $V^*$ making continuous all the mappings $(J(v))_{v\in V}$, where $J$ is the canonical embedding of $V$ into $V^{**}$:
$$J(v) : V^* \to \mathbb{R}, \qquad v^* \mapsto \langle J(v), v^*\rangle_{(V^{**},V^*)} = \langle v^*, v\rangle.$$
Let us collect in the following proposition some first elementary properties of the topology $\sigma(V^*, V)$.

Proposition 2.4.12. Let $V$ be a normed space and $\sigma(V^*, V)$ the weak* topology on the topological dual $V^*$. Then
(i) A local base of neighborhoods of $v_0^* \in V^*$ for the topology $\sigma(V^*, V)$ consists of all sets of the form
$$N(v_0^*) = \{v^* \in V^* : |\langle v^* - v_0^*, v_i\rangle| < \varepsilon \ \ \forall i \in I\},$$
where $I$ is a finite index set, $v_i \in V$ for each $i \in I$, and $\varepsilon > 0$.
(ii) For any sequence $(v_n^*)_{n\in\mathbb{N}}$ in $V^*$, we have
(a) $v_n^* \xrightarrow{\sigma(V^*,V)} v^* \iff \forall v \in V \ \ \langle v_n^*, v\rangle \to \langle v^*, v\rangle$;
(b) $v_n^* \xrightarrow{\|\cdot\|_*} v^* \Rightarrow v_n^* \xrightarrow{\sigma(V^*,V)} v^*$.
Assume now that $V$ is a Banach space. Then
(c) $v_n^* \xrightarrow{\sigma(V^*,V)} v^* \Rightarrow$ the sequence $(v_n^*)$ is bounded and $\|v^*\|_* \le \liminf_n \|v_n^*\|_*$;
(d) $v_n^* \xrightarrow{\sigma(V^*,V)} v^*$ and $v_n \xrightarrow{\|\cdot\|} v \Rightarrow \langle v_n^*, v_n\rangle \to \langle v^*, v\rangle$.
Proof. (i) and (ii)(a) are direct consequences of the general properties of topologies induced by functions; see, respectively, Theorem 2.4.1 and Proposition 2.4.3. (ii)(b), (ii)(c), (ii)(d) are obtained in a similar way as in the proof of Proposition 2.4.6. Just notice that to apply the uniform boundedness theorem, one needs to assume that $V$ is a Banach space. (In Proposition 2.4.6 one works on $V^*$, which is always a Banach space!)

An important example: $V = C_0(\Omega)$, $V^* = M_b(\Omega)$. Let $\Omega$ be a locally compact topological space which is σ-compact (i.e., $\Omega$ can be written as $\Omega = \bigcup_{n\in\mathbb{N}} K_n$ with $K_n$ compact). For example, we may take $\Omega = \mathbb{R}^N$, an open subset of $\mathbb{R}^N$, or an arbitrary compact topological space. Take $V = C_0(\Omega)$, the linear space of real continuous functions on $\Omega$ which tend to zero at infinity; more precisely, we say that a continuous function $u$ is in $C_0(\Omega)$ if for every $\varepsilon > 0$ there exists a compact set $K_\varepsilon$ such that $|u(x)| < \varepsilon$ on $\Omega \setminus K_\varepsilon$. Notice that $C_0(\Omega)$ reduces to $C(\Omega)$ when $\Omega$ is compact. We may endow the space $C_0(\Omega)$ with the norm
$$\|v\|_V = \sup_{x\in\Omega} |v(x)|.$$
Then $V$ is a Banach space whose topological dual $V^*$ can be described thanks to the celebrated result below.

Theorem 2.4.6 (Riesz–Alexandroff representation theorem). The topological dual of $C_0(\Omega)$ can be isometrically identified with the space of bounded Borel measures. More precisely, to each bounded linear functional $\varphi$ on $C_0(\Omega)$ there corresponds a unique bounded Borel measure $\mu$ on $\Omega$ such that for all $f \in C_0(\Omega)$,
$$\varphi(f) = \int_\Omega f(x)\,d\mu(x).$$
Moreover, $\|\varphi\| = |\mu|(\Omega)$.

For a proof of this theorem, see, for instance, [206, Theorem 6.19] or [94, Theorem 1.4.22]. Notice that this theorem also holds when $\Omega$ is not assumed to be σ-compact, but in that case one has to consider measures $\mu$ which are regular. We recall that a Borel measure $\mu \ge 0$ is said to be regular if
$$\forall B \in \mathcal{B}(\Omega)\quad \mu(B) = \inf\{\mu(U) : U \supset B,\ U \text{ open}\},$$
$$\forall B \in \mathcal{B}(\Omega)\quad \mu(B) = \sup\{\mu(K) : K \subset B,\ K \text{ compact}\},$$
and a signed measure $\mu$ is said to be regular if its total variation measure $|\mu|$ is regular. When $\Omega$ is σ-compact, this property is automatically satisfied by bounded Borel measures. Note also that since $C_c(\Omega)$ is dense in $C_0(\Omega)$, these two spaces have the same topological dual. We prefer to consider $C_0(\Omega)$ in this construction because it is a Banach space for the sup norm. Thus a bounded Borel measure $\mu$ can be considered either as a σ-additive set function on the Borel σ-algebra (the probabilistic approach) or as a continuous linear form on $C_0(\Omega)$ or $C_c(\Omega)$ (the functional analysis approach). Given a sequence $(\mu_n)_{n\in\mathbb{N}}$ of
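bounded Borel measures, we can consider these measures as elements of the topological dual space $V^* = M_b(\Omega)$ of $V = C_0(\Omega)$. This leads to the following definition.

Definition 2.4.6. (i) A sequence $(\mu_n)_{n\in\mathbb{N}} \subset M_b(\Omega)$ converges weakly to $\mu \in M_b(\Omega)$, and we write $\mu_n \rightharpoonup \mu$ in $M_b(\Omega)$, provided
$$\int_\Omega \varphi\,d\mu_n \to \int_\Omega \varphi\,d\mu \quad \text{as } n \to \infty$$
for each $\varphi \in C_c(\Omega)$.
(ii) A sequence $(\mu_n) \subset M_b(\Omega)$ converges $\sigma(M_b, C_0)$ to $\mu \in M_b(\Omega)$, and we write $\mu_n \xrightarrow{\sigma(M_b,C_0)} \mu$, provided
$$\int_\Omega \varphi\,d\mu_n \to \int_\Omega \varphi\,d\mu$$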
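for each $\varphi \in C_0(\Omega)$. The relation between these two closely related concepts is given by the following result.

Proposition 2.4.13. Given $(\mu_n)_{n\in\mathbb{N}} \subset M_b(\Omega)$ and $\mu \in M_b(\Omega)$, one has the equivalence
$$\mu_n \xrightarrow{\sigma(M_b,C_0)} \mu \iff \mu_n \rightharpoonup \mu \ \text{ and } \ \sup_{n\in\mathbb{N}} |\mu_n|(\Omega) < +\infty.$$
Proof. Since $C_c(\Omega) \subset C_0(\Omega)$, the implication $\mu_n \xrightarrow{\sigma(M_b,C_0)} \mu \Rightarrow \mu_n \rightharpoonup \mu$ is clear. Moreover, since $M_b(\Omega) = V^*$ with $V = C_0(\Omega)$, which is a Banach space, the uniform boundedness theorem implies (see Proposition 2.4.12(ii)(c))
$$\mu_n \xrightarrow{\sigma(M_b,C_0)} \mu \Rightarrow \sup_{n\in\mathbb{N}} \|\mu_n\| < +\infty,$$
that is, $\sup_{n\in\mathbb{N}} |\mu_n|(\Omega) < +\infty$. The converse statement follows from a density argument similar to the one developed in Proposition 2.4.8. (Note that $C_c(\Omega)$ is dense in $C_0(\Omega)$.)

Corollary 2.4.1. On any bounded subset of $M_b(\Omega)$ there is the equivalence
$$\mu_n \xrightarrow{\sigma(M_b,C_0)} \mu \iff \mu_n \rightharpoonup \mu.$$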
Let us now return to the general abstract properties of the weak∗ topologies and state the following compactness theorem, which explains the importance of these topologies.
Theorem 2.4.7 (Banach–Alaoglu–Bourbaki). Let $V$ be a normed linear space. Then the unit ball $B_{V^*} = \{v^* \in V^* : \|v^*\|_* \le 1\}$ of the topological dual $V^*$ is compact for the topology $\sigma(V^*, V)$.

Proof. An element $v^* \in V^*$ is a function from $V$ into $\mathbb{R}$. Let us write briefly $\mathbb{R}^V$ for the set of all functions from $V$ into $\mathbb{R}$ and denote by $i$ the canonical embedding
$$i : V^* \to \mathbb{R}^V, \qquad v^* \mapsto (\langle v^*, v\rangle)_{v\in V}.$$
When $\|v^*\|_* \le 1$, we have $|\langle v^*, v\rangle| \le \|v\|$, so
$$i : B_{V^*} \to \prod_{v\in V} [-\|v\|, +\|v\|] =: Y, \qquad i(v^*) = (\langle v^*, v\rangle)_{v\in V}.$$
Let us endow $Y$ with the product topology, which is the weakest topology on $Y$ making all the projections continuous. This topology induces on $i(B_{V^*})$ the weak* topology $\sigma(V^*, V)$; this is exactly the way the latter has been defined. The topological space $Y$, being a product of compact spaces equipped with the product topology, is compact: this is the Tikhonov compactness theorem. So we just have to verify that $i(B_{V^*})$ is closed in $Y$. This is clear: for every generalized sequence (net) $(v_\nu^*)_{\nu\in I}$ in $B_{V^*}$, the convergence of $i(v_\nu^*)$ for the product topology, that is, $\langle v_\nu^*, v\rangle \to \ell(v)$ for every $v \in V$ with $\ell \in Y$, implies that $\ell$ is still linear and $|\ell(v)| \le \|v\|$, so that $\ell(v) = \langle v^*, v\rangle$ for some $v^* \in B_{V^*}$ and $i(v_\nu^*) \to i(v^*)$.

To obtain a weak* sequential compactness result on $B_{V^*}$, we need to assume a separability condition on $V$. Let us recall that a topological space $V$ is said to be separable if there exists a dense countable subset of $V$. Typically, this is the case of the spaces $C_0(\Omega)$ and $L^p(\Omega)$ for $1 \le p < +\infty$, but not of $L^\infty(\Omega)$ (here $\Omega$ is an open subset of $\mathbb{R}^N$).

Theorem 2.4.8. Let $V$ be a separable normed space. Then the unit ball $B_{V^*}$ of $V^*$ is metrizable for the topology $\sigma(V^*, V)$.

Before proving Theorem 2.4.8, let us formulate the following important result, which is a direct consequence of Theorems 2.4.7 and 2.4.8.

Corollary 2.4.2. Let $V$ be a separable normed linear space and $(v_n^*)_{n\in\mathbb{N}}$ a bounded sequence in $V^*$. Then one can extract a subsequence $(v_{n_k}^*)_{k\in\mathbb{N}}$ which converges for the topology $\sigma(V^*, V)$.

Proof of Theorem 2.4.8. Let $(v_n)_{n\ge1}$ be a dense countable subset of the unit ball $B_V$ of $V$ (which exists since $V$ is assumed to be separable; note that separability is a hereditary property in metric spaces).
Then define on the unit ball $B_{V^*}$ of $V^*$ the following distance $d$:
$$\forall u^*, v^* \in B_{V^*} \qquad d(u^*, v^*) = \sum_{n=1}^{\infty} \frac{1}{2^n} |\langle u^* - v^*, v_n\rangle|.$$
Let us verify that the topology associated to the distance $d$ coincides with the weak* topology $\sigma(V^*, V)$ on $B_{V^*}$. This can be done with the help of neighborhoods or by using generalized sequences (nets): one has to verify that for an arbitrary net $(v_i^*)_{i\in I}$ contained in $B_{V^*}$,
$$v_i^* \xrightarrow[I]{\sigma(V^*,V)} v^* \iff d(v_i^*, v^*) \xrightarrow[I]{} 0.$$
It is easy to verify that, since $\sup_{i\in I} \|v_i^*\|_* \le 1$ (so that the $n$th term of the series is bounded by $2^{1-n}$),
$$d(v_i^*, v^*) \xrightarrow[I]{} 0 \iff \forall k \in \mathbb{N}\quad \langle v_i^*, v_k\rangle \xrightarrow[I]{} \langle v^*, v_k\rangle.$$
So we have to verify that
$$\forall v \in B_V \ \ \langle v_i^*, v\rangle \to \langle v^*, v\rangle \iff \forall k \in \mathbb{N} \ \ \langle v_i^*, v_k\rangle \to \langle v^*, v_k\rangle.$$
This is exactly the same argument as in Proposition 2.4.8, where one just uses nets instead of sequences.

Back to weak* convergence of measures. Let $\Omega$ be a locally compact, metrizable, σ-compact topological space. Recall that $V = C_0(\Omega)$ is separable: one can first notice that $C_c(\Omega)$ is dense in $C_0(\Omega)$ and that $C_c(\Omega) = \bigcup_n C(K_n)$, where the $K_n$ are compact, because of the σ-compactness assumption. Then observe that $C(K)$ is separable for $K$ compact and metrizable; this can be obtained as a consequence of the Stone–Weierstrass theorem. Note that, for a compact space $K$, the metrizability of $K$ is equivalent to the separability of $C(K)$ (see [94, Theorem 2.3.29]).

We can now reformulate Corollary 2.4.2 in the case of sequences of measures.

Proposition 2.4.14. Let $\Omega$ be a locally compact, metrizable, σ-compact topological space. Then, from any bounded sequence of Borel measures $(\mu_n)_{n\in\mathbb{N}}$ on $\Omega$, i.e., verifying
$$\sup_{n\in\mathbb{N}} |\mu_n|(\Omega) < +\infty,$$
one can extract a subsequence $(\mu_{n_k})_{k\in\mathbb{N}}$ which is $\sigma(M_b, C_0)$ convergent to some bounded Borel measure $\mu$:
$$\forall \varphi \in C_0(\Omega) \qquad \int_\Omega \varphi\,d\mu_{n_k} \xrightarrow[k\to+\infty]{} \int_\Omega \varphi\,d\mu.$$
As a particularly important situation where the result above can be directly applied, let us mention the following.

Corollary 2.4.3. Let $\Omega$ be an open subset of $\mathbb{R}^N$ equipped with the Lebesgue measure $dx$ and $(f_n)_{n\in\mathbb{N}}$ a sequence of functions which is bounded in $L^1(\Omega)$, i.e., $\sup_{n\in\mathbb{N}} \int_\Omega |f_n(x)|\,dx < +\infty$. Then there exist a subsequence $(f_{n_k})_{k\in\mathbb{N}}$ and a bounded Borel measure $\mu$ on $\Omega$ such that
$$\forall \varphi \in C_0(\Omega) \qquad \int_\Omega \varphi(x) f_{n_k}(x)\,dx \xrightarrow[k\to+\infty]{} \int_\Omega \varphi(x)\,d\mu(x).$$
Proof. Apply Proposition 2.4.14 to the sequence $\mu_n = f_n\,dx$ and note that $|\mu_n|(\Omega) = \int_\Omega |f_n|\,dx$.

Remark 2.4.8. (a) The above result, which allows us to extract from any bounded sequence $(f_n)_{n\in\mathbb{N}}$ in $L^1(\Omega)$ a subsequence which is $\sigma(M_b, C_0)$ convergent to a bounded Borel measure $\mu$, relies on the embedding of $L^1(\Omega)$ into $M_b(\Omega)$, which is a dual space, namely, the dual of $C_0(\Omega)$. This is a general method: when one has some estimates on a sequence $(v_n)_{n\in\mathbb{N}}$ in some space $X$, it consists of embedding $X \hookrightarrow Y^*$, where $Y^*$ is the topological dual of some (separable) normed space $Y$. Then one can extract a subsequence $v_{n_k} \xrightarrow{\sigma(Y^*,Y)} y^*$, the limit $y^*$ belonging to $Y^*$.
(b) As an example of application of Corollary 2.4.3, take the sequence $(v_n)_{n\in\mathbb{N}}$ defined at the beginning of Section 2.4.5; one has
$$v_n \xrightarrow{\sigma(M_b,C_0)} \delta_0,$$
where $\delta_0$ is the Dirac measure at the origin.
(c) As a counterpart of its generality, the information given by a weak* convergence $f_n\,dx \xrightarrow{\sigma(M_b,C_0)} \mu$, or even $f_n \xrightarrow{\sigma(L^1,L^\infty)} f$, is often not sufficient to analyze some situations, for example, those occurring in the study of some nonlinear PDEs. To treat such situations, we will introduce the concept of Young measures in Chapter 4.
Chapter 3
Abstract variational principles
The introduction in Chapter 2 of a weak formulation of the model examples (Dirichlet problem, Stokes system) leads us to study the following problem. Given a linear vector space $V$, a symmetric bilinear form $a : V \times V \to \mathbb{R}$, and a linear form $L : V \to \mathbb{R}$, find $u \in V$ such that
$$a(u, v) = L(v) \quad \forall v \in V. \tag{3.1}$$
When $a$ is positive, this turns out to be equivalent to the following minimization problem: find $u \in V$ such that
$$J(u) \le J(v) \quad \forall v \in V, \tag{3.2}$$
where $J(v) := \frac{1}{2}a(v, v) - L(v)$. In this chapter, we introduce the topological and geometrical concepts which allow us to solve this kind of problem and much more.
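In finite dimension the equivalence between (3.1) and (3.2) can be checked directly. The Python sketch below takes $V = \mathbb{R}^2$, $a(u, v) = u^{\mathsf T}Av$ with a hypothetical symmetric positive definite matrix $A$, and $L(v) = b \cdot v$ with a hypothetical vector $b$. It solves (3.1), which here is the linear system $Au = b$, by Cramer's rule, and checks the identity $J(u + w) - J(u) = \frac{1}{2}a(w, w) \ge 0$; this identity follows from (3.1) and the symmetry of $a$, and is the heart of the equivalence.

```python
A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric positive definite matrix
b = [1.0, 2.0]                  # hypothetical data: L(v) = b . v

def a(u, v):
    """Symmetric bilinear form a(u, v) = u^T A v on V = R^2."""
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def L(v):
    return b[0] * v[0] + b[1] * v[1]

def J(v):
    """Energy J(v) = a(v, v)/2 - L(v) from (3.2)."""
    return 0.5 * a(v, v) - L(v)

def solve_2x2(M, rhs):
    """Cramer's rule for the 2x2 system M x = rhs (M assumed invertible)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]

# (3.1) tested on the basis vectors e_j reads A u = b (A is symmetric)
u = solve_2x2(A, b)
print(u)  # [1/11, 7/11] up to rounding

for w in ([0.1, -0.2], [1.0, 1.0], [-0.5, 0.3]):
    # J(u + w) - J(u) equals a(w, w)/2 >= 0, so u minimizes J
    uw = [u[0] + w[0], u[1] + w[1]]
    print(J(uw) - J(u), 0.5 * a(w, w))
```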
3.1 The Lax–Milgram theorem and the Galerkin method

3.1.1 The Lax–Milgram theorem

In this section, $V$ is a Hilbert space equipped with the scalar product $\langle\cdot,\cdot\rangle$ and the associated norm: $\forall v \in V$, $\|v\|^2 = \langle v, v\rangle$. Let us recall the celebrated Riesz theorem.

Theorem 3.1.1 (Riesz). Let $V$ be a Hilbert space and $L \in V^*$ a continuous linear form on $V$. Then there exists a unique $f \in V$ such that
$$\forall v \in V \qquad L(v) = \langle f, v\rangle.$$
Notice that, given $f \in V$, the linear form $L_f$ defined by $L_f(v) = \langle f, v\rangle$ satisfies (by the Cauchy–Schwarz inequality) $|L_f(v)| \le \|f\|\,\|v\|$, and hence $\|L_f\|_* \le \|f\|$, where $\|L_f\|_*$ is the dual norm of the continuous linear form $L_f$. On the other hand, by taking $v = \frac{1}{\|f\|}f$ (if $f \ne 0$) we obtain $\|L_f\|_* \ge \|f\|$. The Riesz theorem tells us that the linear isometric embedding $f \mapsto L_f$ from $V$ into $V^*$ is onto. So $V$ and $V^*$ can be identified both as vector spaces and as Hilbert spaces. Note that it is not completely correct to say that the topological dual of $V$ is $V$ itself! The Riesz theorem tells us that the topological dual of $V$, that is, $V^*$, is isometric to $V$, and describes how any element of $V^*$ can be uniquely represented with the help of an element of $V$: the mapping $f \in V \mapsto L_f \in V^*$ is an isometric isomorphism from $V$ onto $V^*$. So we will often identify $V^*$ with $V$. But one may imagine other representations of $V^*$. We will illustrate this when describing the dual of the Sobolev space $H_0^1(\Omega)$.

An important situation where one has to be careful with such identifications is when we have an embedding $V \overset{i}{\hookrightarrow} H$ of two Hilbert spaces $(V, (\cdot,\cdot))$ and $(H, \langle\cdot,\cdot\rangle)$, with $i$ linear and continuous and $V$ dense in $H$. Clearly, any continuous linear form $L$ on $H$, when "restricted" to $V$ (indeed, $L|_V = L \circ i$), defines a continuous linear form on $V$. So $H^* \subset V^*$, and the mapping $L \in H^* \mapsto L|_V \in V^*$ is one-to-one because $V$ is dense in $H$. When identifying $H$ and $H^*$ we have the usual "triplet" $V \hookrightarrow H \hookrightarrow V^*$. But now we cannot also identify $V$ and $V^*$, because we would end up with the conclusion that $H \hookrightarrow V$! Thus one has to choose one identification: one cannot identify both $H$ with $H^*$ and $V$ with $V^*$. A typical situation is
$$V = H_0^1(\Omega) \hookrightarrow L^2(\Omega) \hookrightarrow H^{-1}(\Omega) = V^*.$$
This leads to a representation of the dual of the Hilbert space $V = H_0^1(\Omega)$ which is different from the Riesz representation. The Riesz representation theorem will play a key role when establishing the following theorem.
Theorem 3.1.2 (Lax–Milgram). Let $V$ be a Hilbert space with the scalar product $\langle\cdot,\cdot\rangle$ and $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$ the associated norm. Let $a : V \times V \to \mathbb{R}$ be a bilinear form which satisfies (i) and (ii):
(i) $a$ is continuous, that is, there exists a constant $M \in \mathbb{R}_+$ such that
$$\forall u, v \in V \qquad |a(u, v)| \le M\|u\| \cdot \|v\|;$$
(ii) $a$ is coercive, that is, there exists a constant $\alpha > 0$ such that
$$\forall v \in V \qquad a(v, v) \ge \alpha\|v\|^2.$$
Then for any $L \in V^*$ ($L$ is a continuous linear form on $V$) there exists a unique $u \in V$ such that $a(u, v) = L(v)$ for all $v \in V$.

Remark 3.1.1. Let us add some complements to the discussion above.
(a) Before proving the Lax–Milgram theorem, let us notice that it contains the Riesz representation theorem as a particular case. Take $a(u, v) = \langle u, v\rangle$ and verify (i) and (ii). By the Cauchy–Schwarz inequality we have $|a(u, v)| \le \|u\|\,\|v\|$. Hence $a$ is continuous (take $M = 1$). Moreover, $a(v, v) = \langle v, v\rangle = \|v\|^2$ and $a$ is coercive (take $\alpha = 1$). So, for any $L \in V^*$ there exists a unique $u \in V$ such that
$$L(v) = \langle u, v\rangle \quad \forall v \in V,$$
and this is the Riesz representation theorem.
(b) One can easily verify that, for a bilinear form, continuity is equivalent to the existence of some constant $M \ge 0$ such that
$$|a(u, v)| \le M\|u\|\,\|v\| \quad \forall u, v \in V. \tag{3.3}$$
Let us first verify that if (3.3) is satisfied, then $a$ is continuous: take $u_n \to u$ and $v_n \to v$. Then
$$a(u_n, v_n) - a(u, v) = a(u_n, v_n) - a(u_n, v) + a(u_n, v) - a(u, v) = a(u_n, v_n - v) + a(u_n - u, v).$$
It follows that
$$|a(u_n, v_n) - a(u, v)| \le M\|u_n\| \cdot \|v_n - v\| + M\|u_n - u\| \cdot \|v\|. \tag{3.4}$$
The sequence $(u_n)_{n\in\mathbb{N}}$, being norm convergent, is bounded in $V$: there exists a constant $C \ge 0$ such that $\sup_n \|u_n\| \le C$. Returning to (3.4),
$$|a(u_n, v_n) - a(u, v)| \le M\big(C\|v_n - v\| + \|v\| \cdot \|u_n - u\|\big),$$
which implies that $\lim_n a(u_n, v_n) = a(u, v)$.
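Conversely, let us assume that $a$ is a continuous bilinear form on $V \times V$. Since $a(0, 0) = 0$ and $a$ is continuous at $(0, 0)$, for any $\varepsilon > 0$ there exists some $\eta(\varepsilon) > 0$ such that
$$\|u\| \le \eta(\varepsilon) \text{ and } \|v\| \le \eta(\varepsilon) \ \Rightarrow\ |a(u, v)| \le \varepsilon.$$
Take $u, v$ arbitrary elements of $V$, $u \ne 0$, $v \ne 0$. Then
$$\Big\|\frac{\eta(\varepsilon)}{\|u\|}u\Big\| \le \eta(\varepsilon) \quad\text{and}\quad \Big\|\frac{\eta(\varepsilon)}{\|v\|}v\Big\| \le \eta(\varepsilon).$$
Hence
$$\Big|a\Big(\frac{\eta(\varepsilon)}{\|u\|}u, \frac{\eta(\varepsilon)}{\|v\|}v\Big)\Big| \le \varepsilon,$$
which, by bilinearity, implies
$$\forall u, v \ne 0 \qquad |a(u, v)| \le \frac{\varepsilon}{\eta(\varepsilon)^2}\|u\| \cdot \|v\|.$$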
This is still true if $u$ or $v$ is the zero element of $V$. So one can take $M = \varepsilon/\eta^2(\varepsilon)$.

Proof of the Lax–Milgram theorem. When establishing a weak formulation for some partial differential equations or systems
$$Au = f \tag{3.5}$$
(for example, $Au = -\Delta u$ for the Dirichlet problem, with the prescribed boundary data contained in the domain of $A$), we have been led to study problems of the form
$$a(u, v) = L(v) \quad \forall v \in V. \tag{3.6}$$
We are now going to reconstruct an abstract equation (3.5) from (3.6). The major interest of this reverse operation is that we are now able to formulate precisely the topological and geometrical properties of the operator $A$. Let us first apply the Riesz theorem to $L$: there exists some $f \in V$ such that
$$L(v) = \langle f, v\rangle \quad \forall v \in V. \tag{3.7}$$
For any fixed $u \in V$, the mapping $v \mapsto a(u, v)$ is a continuous linear form on $V$; note that $|a(u, v)| \le M\|u\|\,\|v\|$ for all $v \in V$. Applying once more the Riesz theorem, there exists a unique element, which we denote $A(u) \in V$, such that
$$\forall v \in V \qquad a(u, v) = \langle A(u), v\rangle.$$
The mapping $u \mapsto A(u)$ from $V$ into $V$ is linear: given $u_1, u_2$ belonging to $V$,
$$\langle A(u_1 + u_2), v\rangle = a(u_1 + u_2, v) = a(u_1, v) + a(u_2, v) = \langle A(u_1), v\rangle + \langle A(u_2), v\rangle = \langle A(u_1) + A(u_2), v\rangle \quad \forall v \in V.$$
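Hence, $A(u_1 + u_2) = A(u_1) + A(u_2)$. Similarly,
$$\forall \lambda \in \mathbb{R},\ \forall u \in V \qquad A(\lambda u) = \lambda A(u).$$
So our problem can be reformulated as follows: find $u \in V$ such that
$$\langle A(u), v\rangle = \langle f, v\rangle \quad \forall v \in V,$$
that is,
$$Au = f. \tag{3.8}$$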
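Let us reformulate in terms of $A$ the properties of the bilinear form $a(\cdot,\cdot)$:
(a) since $a$ is bilinear, we have that
$$A : V \to V \text{ is a linear mapping;} \tag{3.9}$$
(b) $A$ is continuous: for any $u, v \in V$, $\langle Au, v\rangle = a(u, v) \le M\|u\|\,\|v\|$. Taking $v = Au$ we obtain
$$\|Au\|^2 \le M\|u\|\,\|Au\|,$$
which implies (dividing by $\|Au\|$ when $Au \ne 0$, the case $Au = 0$ being trivial)
$$\|Au\| \le M\|u\|. \tag{3.10}$$
This expresses that $A$ is a continuous linear operator from $V$ into $V$ with $\|A\|_{\mathcal{L}(V,V)} \le M$.
(c) $A$ is coercive in the following sense: there exists some $\alpha > 0$ such that
$$\forall v \in V \qquad \langle Av, v\rangle \ge \alpha\|v\|^2. \tag{3.11}$$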
To solve (3.8), we formulate it as a fixed point problem. Let $\lambda$ be some strictly positive parameter. Clearly, solving (3.8) is equivalent to finding $u \in V$ such that
$$u - \lambda(Au - f) = u. \tag{3.12}$$
In other words, we are looking for a fixed point $u \in V$ of the mapping $g_\lambda : V \to V$ given by
$$g_\lambda(v) = v - \lambda(Av - f). \tag{3.13}$$
Let us prove that, with $\lambda$ adequately chosen, the mapping $g_\lambda$ satisfies the condition of the Banach fixed point theorem, which we recall now.

Theorem 3.1.3 (Banach fixed point theorem, Picard iterative method). Let $(X, d)$ be a complete metric space and $g : X \to X$ a Lipschitz continuous mapping with a Lipschitz constant $k$ strictly less than one, i.e.,
$$\forall x, y \in X \qquad d(g(x), g(y)) \le k\,d(x, y).$$
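Then there exists a unique $\bar{x} \in X$ such that $g(\bar{x}) = \bar{x}$. Moreover, for any $x_0 \in X$, the sequence $(x_n)$ starting from $x_0$ with $x_{n+1} = g(x_n)$ for all $n \in \mathbb{N}$ converges to $\bar{x}$ as $n$ goes to $+\infty$.

Proof of the Lax–Milgram theorem (continued). Take $v_1, v_2 \in V$. Then
$$g_\lambda(v_2) - g_\lambda(v_1) = [v_2 - \lambda(Av_2 - f)] - [v_1 - \lambda(Av_1 - f)] = (v_2 - v_1) - \lambda A(v_2 - v_1).$$
Let us denote $v = v_2 - v_1$, so that $g_\lambda(v_2) - g_\lambda(v_1) = v - \lambda Av$. To bound this quantity from above, we consider its squared norm and take advantage of the Hilbertian structure of $V$:
$$\|g_\lambda(v_2) - g_\lambda(v_1)\|^2 = \|v - \lambda Av\|^2 = \|v\|^2 - 2\lambda\langle Av, v\rangle + \lambda^2\|Av\|^2.$$
By using (3.10) and (3.11) (note that we have assumed $\lambda > 0$), we obtain
$$\|g_\lambda(v_2) - g_\lambda(v_1)\|^2 \le (1 - 2\lambda\alpha + \lambda^2M^2)\|v\|^2. \tag{3.14}$$
So the question is to find some $\lambda > 0$ such that $1 - 2\lambda\alpha + \lambda^2M^2 < 1$. Take $\bar\lambda$ for which the quantity $1 - 2\alpha\lambda + \lambda^2M^2$ is minimal, that is, $\bar\lambda = \alpha/M^2$, in which case
$$1 - 2\bar\lambda\alpha + \bar\lambda^2M^2 = 1 - \frac{\alpha^2}{M^2} < 1$$
(this quantity is nonnegative, since (3.10) and (3.11) give $\alpha\|v\|^2 \le \langle Av, v\rangle \le M\|v\|^2$, hence $\alpha \le M$). Hence
$$\|g_{\bar\lambda}(v_2) - g_{\bar\lambda}(v_1)\| \le \sqrt{1 - \frac{\alpha^2}{M^2}}\,\|v_2 - v_1\|, \tag{3.15}$$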
and $k_{\bar\lambda} = \sqrt{1 - \alpha^2/M^2}$ is strictly less than one (note that $\alpha > 0$). So $g_{\bar\lambda}$ has a unique fixed point $\bar u$; equivalently, the equation $Au = f$ has a unique solution $\bar u$.

Remark 3.1.2. (a) One of the main advantages of the proof above is that, since it relies on the Banach fixed point theorem, it is a constructive proof.
(b) A second advantage is that it can easily be extended to nonlinear equations: solve $Au = f$, where $A : V \to V$ satisfies
$$\exists M \ge 0 \text{ such that } \forall u, v \in V \qquad \|Au - Av\| \le M\|u - v\|;$$
$$\exists \alpha > 0 \text{ such that } \forall u, v \in V \qquad \langle Au - Av, u - v\rangle \ge \alpha\|u - v\|^2.$$
(c) Another approach consists of proving that $A$ is onto, that is, $R(A) = V$. To that end, one first establishes that
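(i) $R(A)$ is closed: for this, one can first notice that
$$\forall v \in V \qquad \alpha\|v\|^2 \le \langle Av, v\rangle \le \|Av\| \cdot \|v\|,$$
and hence $\alpha\|v\| \le \|Av\|$. If $Av_n \to z$, we have $\alpha\|v_n - v_m\| \le \|A(v_n - v_m)\| = \|Av_n - Av_m\|$, so $(v_n)$ is a Cauchy sequence in $V$. Hence $v_n \to v$ for some $v \in V$ and, by continuity of $A$, $Av_n \to Av$. Consequently $z = Av$.
(ii) $R(A)$ is dense: if $z \in R(A)^\perp$, then
$$\langle Av, z\rangle = 0 \quad \forall v \in V.$$
Taking $v = z$ gives $\alpha\|z\|^2 \le \langle Az, z\rangle = 0$, hence $z = 0$. Thus $R(A)^\perp = \{0\}$, that is, $\overline{R(A)} = V$. Combined with (i), this gives $R(A) = V$.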
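The constructive character stressed in Remark 3.1.2(a) can be seen on a small example. The Python sketch below takes $V = \mathbb{R}^2$ with the Euclidean scalar product and a hypothetical non-symmetric operator $A$, so that the bilinear form $a(u, v) = \langle Au, v\rangle$ is coercive but not symmetric. For this $A$ one has $\langle Av, v\rangle = 2\|v\|^2$ and $A^{\mathsf T}A = 5I$, so $\alpha = 2$ and $M = \sqrt5$. The sketch runs the Picard iteration for $g_{\bar\lambda}$ with $\bar\lambda = \alpha/M^2$, as in (3.13)-(3.15); the same loop would apply verbatim to a nonlinear operator satisfying the two conditions of Remark 3.1.2(b).

```python
import math

A = [[2.0, 1.0], [-1.0, 2.0]]   # hypothetical non-symmetric operator on V = R^2
f = [1.0, 0.0]                   # hypothetical right-hand side

def apply(mat, v):
    return [mat[0][0] * v[0] + mat[0][1] * v[1],
            mat[1][0] * v[0] + mat[1][1] * v[1]]

def norm(v):
    return math.hypot(v[0], v[1])

alpha = 2.0                 # <Av, v> = 2|v|^2: the skew part of A does not contribute
M = math.sqrt(5.0)          # |Av| = sqrt(5)|v| because A^T A = 5 I
lam = alpha / M ** 2        # lambda-bar = alpha / M^2
k = math.sqrt(1.0 - alpha ** 2 / M ** 2)   # contraction constant of (3.15)

def g(v):
    """g_lambda(v) = v - lambda (A v - f), as in (3.13)."""
    Av = apply(A, v)
    return [v[0] - lam * (Av[0] - f[0]), v[1] - lam * (Av[1] - f[1])]

def picard(v0, n_iter=60):
    """Picard iterates v_{j+1} = g(v_j); the error decays at least like k**j."""
    v = v0
    for _ in range(n_iter):
        v = g(v)
    return v

u = picard([0.0, 0.0])
print(u, apply(A, u))  # u is close to [0.4, 0.2] and A u to f
```

Here the bound (3.15) is attained: $\|v - \bar\lambda Av\| = \sqrt{1/5}\,\|v\|$ for every $v$, so the iteration gains roughly one decimal digit every three steps.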
3.1.2 The Galerkin method
A quite natural idea when considering an infinite dimensional (variational) problem is to approximate it by finite dimensional problems. This has important consequences from both the theoretical (existence, etc.) and the numerical points of view. In this section, we consider the situation of the Lax–Milgram theorem; using the Galerkin method, we both provide another proof of the existence of a solution and describe corresponding numerical approximation schemes. We stress that this type of finite dimensional approximation method is very flexible and can be applied (as we will illustrate in various situations) to a large number of linear and nonlinear problems.

Definition 3.1.1. Let $V$ be a Banach space. A Galerkin approximation scheme is a sequence $(V_n)_{n\in\mathbb N}$ of finite dimensional subspaces of $V$ such that for every $v \in V$ there exists a sequence $(v_n)_{n\in\mathbb N}$ with $v_n \in V_n$ for all $n \in \mathbb N$ and $(v_n)_{n\in\mathbb N}$ norm-converging to $v$. This approximation property can be reformulated as
$$\forall v \in V \qquad \lim_{n\to+\infty} \operatorname{dist}(v, V_n) = 0,$$
where $\operatorname{dist}(v, V_n) = \inf_{w\in V_n} \|v - w\|$.

Proposition 3.1.1. Let $V$ be a separable Banach space. Then one can construct a Galerkin approximation scheme $(V_n)$ by the following method: (i) take $(u_n)_{n\in\mathbb N}$ a countable dense subset of $V$ (the separability of $V$ just expresses that such a set exists); (ii) let $V_n = \operatorname{span}\{u_1, u_2, \dots, u_n\}$. Then $(V_n)_{n\in\mathbb N}$ is a Galerkin scheme.
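To make $\operatorname{dist}(v, V_n)$ concrete for nested spans $V_n = \operatorname{span}\{u_1, \dots, u_n\}$, here is a minimal Python sketch. It assumes a Euclidean space $\mathbb R^d$ stands in for $V$ (so the density requirement of Proposition 3.1.1 is not exercised, only the monotonicity in $n$); the function name is hypothetical. Vectors that are numerically dependent on the previous ones are skipped, in the spirit of Remark 3.1.3(a).

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def distances_to_nested_spans(v, vectors, tol=1e-12):
    """Return [dist(v, V_1), dist(v, V_2), ...] with V_n = span{u_1, ..., u_n}.

    The spans are orthonormalized incrementally (Gram-Schmidt); a vector that is
    numerically dependent on the previous ones does not enlarge V_n and is skipped.
    """
    basis = []          # orthonormal basis of the current V_n
    residual = list(v)  # v minus its orthogonal projection onto V_n
    out = []
    for u in vectors:
        w = list(u)
        for e in basis:  # remove the components along the current basis
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm_w = math.sqrt(dot(w, w))
        if norm_w > tol:  # u enlarges the span
            e = [wi / norm_w for wi in w]
            basis.append(e)
            c = dot(residual, e)
            residual = [ri - c * ei for ri, ei in zip(residual, e)]
        out.append(math.sqrt(dot(residual, residual)))
    return out


dists = distances_to_nested_spans(
    [1.0, 2.0, 3.0],
    [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
)
print(dists)  # sqrt(13), sqrt(13), 3.0, 0.0
```

The distances are nonincreasing in $n$ because the spans are nested, which is the monotonicity used in the proof of Proposition 3.1.1.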
Chapter 3. Abstract variational principles
Proof. Let $v \in V$. By (i), there exists a mapping $k \mapsto n(k)$ from $\mathbb N^*$ into $\mathbb N$ such that $\|u_{n(k)} - v\| \le \frac1k$ for all $k \in \mathbb N^*$. For $k$ fixed, $u_{n(k)} \in V_{n(k)}$ and hence
$$\operatorname{dist}(v, V_{n(k)}) \le \frac1k.$$
Since $V_n \supset V_{n(k)}$ for $n \ge n(k)$, we obtain
$$\forall n \ge n(k) \qquad \operatorname{dist}(v, V_n) \le \operatorname{dist}(v, V_{n(k)}) \le \frac1k,$$
that is, $\operatorname{dist}(v, V_n) \to 0$ as $n \to +\infty$.

Remark 3.1.3. (a) The vectors $u_1, \dots, u_n$ need not be linearly independent. By a classical linear algebra argument, one can replace in Proposition 3.1.1 the sequence $(u_n)_{n\in\mathbb N}$ by a sequence $(w_n)_{n\in\mathbb N}$, $w_n \in V$, made of linearly independent vectors. (b) In Proposition 3.1.1 the sequence of subspaces $(V_n)_{n\in\mathbb N}$ satisfies
• $V_1 \subset V_2 \subset V_3 \subset \cdots \subset V_n \subset \cdots$ is an increasing sequence of finite dimensional subspaces;
• $\overline{\bigcup_{n\in\mathbb N} V_n} = V$.

The Galerkin approach to the Lax–Milgram theorem. We now suppose that $V$ is a separable Hilbert space, $a : V \times V \to \mathbb R$ is a bilinear, continuous, coercive form, and $L : V \to \mathbb R$ is a linear, continuous form. We want to study the following problem:
$$\text{find } u \in V \text{ such that } a(u, v) = L(v) \quad \forall v \in V. \tag{3.16}$$
Since $V$ is separable, by Proposition 3.1.1 there exists a Galerkin scheme $(V_n)_{n\in\mathbb N}$ with $V_n$ increasing with $n \in \mathbb N$. Consider the approximate problems
$$\text{find } u_n \in V_n \text{ such that } a(u_n, v) = L(v) \quad \forall v \in V_n. \tag{3.17}$$
Problem (3.17) can be equivalently reformulated as
$$A_n u_n = f_n, \tag{3.18}$$
where $A_n$ is the linear operator from $V_n$ to $V_n$ such that
$$a(u, v) = \langle A_n u, v\rangle \quad \forall u, v \in V_n,$$
and $f_n \in V_n$ is such that
$$L(v) = \langle f_n, v\rangle \quad \forall v \in V_n.$$
This is exactly the same argument as in the proof of the Lax–Milgram theorem, except that we now work on finite dimensional spaces, which makes the existence of $(u_n)_{n\in\mathbb N}$ very easy: since $\ker A_n = \{0\}$ (which follows from the coercivity of $a$), and since an injective linear map of a finite dimensional space into itself is onto, $A_n$ is onto. Note the basic difference with the infinite dimensional situation, where such an argument fails: an injective operator need not be onto.
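As an illustration of (3.17)–(3.18), the sketch below assumes a model problem that is not in the text: $V = H^1_0(0,1)$, $a(u,v) = \int_0^1 u'v'$, $L(v) = \int_0^1 v$ (that is, $-u'' = 1$ with homogeneous Dirichlet conditions), and $V_n$ the continuous piecewise linear functions on a uniform mesh with $n$ interior nodes. In the hat-function basis, (3.17) becomes a tridiagonal linear system (the matrix form of (3.18)), solved here by the Thomas algorithm; coercivity of $a$ guarantees that it is invertible. These meshes give nested spaces only under refinement by halving. For this particular problem the nodal values of $u_n$ coincide with the exact solution $x(1-x)/2$ (a known one-dimensional property of linear elements), which gives an easy check.

```python
def galerkin_p1_poisson(n):
    """Galerkin solution of -u'' = 1 on (0, 1), u(0) = u(1) = 0, with P1 hat functions.

    Hypothetical model problem: a(u, v) = integral of u' v', L(v) = integral of v,
    V_n = continuous piecewise linear functions on a uniform mesh with n interior
    nodes. Returns the interior nodes and the nodal values of u_n.
    """
    h = 1.0 / (n + 1)
    diag = [2.0 / h] * n        # a(phi_i, phi_i)
    off = [-1.0 / h] * (n - 1)  # a(phi_i, phi_{i+1}) = a(phi_{i+1}, phi_i)
    rhs = [h] * n               # L(phi_i) = integral of phi_i = h
    # Thomas algorithm: forward elimination, then back substitution.
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = off[0] / diag[0] if n > 1 else 0.0
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * cp[i - 1]
        cp[i] = off[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - off[i - 1] * dp[i - 1]) / denom
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    nodes = [(i + 1) * h for i in range(n)]
    return nodes, u


nodes, u_h = galerkin_p1_poisson(9)
max_err = max(abs(uh - x * (1.0 - x) / 2.0) for x, uh in zip(nodes, u_h))
print(max_err)  # round-off level: the nodal values match x(1 - x)/2
```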
The question now is to study the convergence of the sequence $(u_n)_{n\in\mathbb N}$. In (3.17) take $v = u_n$, so that
$$\alpha\|u_n\|^2 \le a(u_n, u_n) = L(u_n) \le \|L\|_*\,\|u_n\|$$
and
$$\|u_n\| \le \frac{\|L\|_*}{\alpha}. \tag{3.19}$$
The sequence $(u_n)_{n\in\mathbb N}$ is bounded and hence weakly relatively compact, that is, there exist a subsequence $(u_{n_k})$ and some $u \in V$ such that
$$\langle u_{n_k}, v\rangle \to \langle u, v\rangle \quad \forall v \in V. \tag{3.20}$$
We write $u_{n_k} \xrightarrow{w-V} u$. (See Theorem 2.4.3, with a direct independent proof of this result in separable Hilbert spaces.) Given $v \in V_m$, we have $V_{n_k} \supset V_m$ for $k$ sufficiently large and hence
$$a(u_{n_k}, v) = L(v) \tag{3.21}$$
for all $k$ sufficiently large. Then notice that
$$u_{n_k} \xrightarrow{w-V} u \;\Longrightarrow\; a(u_{n_k}, v) \to a(u, v) \quad \text{as } k \to +\infty.$$
This follows from the representation of the linear continuous form $u \mapsto a(u, v)$ on $V$ as $a(u, v) = \langle u, A^t v\rangle$ ($A^t$ denotes the adjoint of $A$). Hence
$$u_{n_k} \xrightarrow{w-V} u \;\Longrightarrow\; a(u_{n_k}, v) = \langle u_{n_k}, A^t v\rangle \to \langle u, A^t v\rangle = a(u, v).$$
So, passing to the limit in (3.21), we obtain that, given $v \in V_m$, $a(u, v) = L(v)$. Hence $a(u, v) = L(v)$ for every $v \in \bigcup_{m\in\mathbb N} V_m$. Since $\bigcup_{m\in\mathbb N} V_m$ is dense in $V$, by continuity of $a(u, \cdot)$ and $L(\cdot)$ we finally infer
$$a(u, v) = L(v) \quad \forall v \in V.$$
Since $u$ is the unique solution of this problem, a classical compactness argument shows that the whole sequence $(u_n)_{n\in\mathbb N}$ weakly converges to $u$. In fact, one can prove that $(u_n)_{n\in\mathbb N}$ norm-converges to $u$; this is explained below, with an explicit bound on $\|u_n - u\|$.

Proposition 3.1.2 (Céa lemma). Let $V$ be a separable Hilbert space and $(V_n)_{n\in\mathbb N}$ a Galerkin scheme. Suppose
$$a(u_n, v) = L(v) \quad \forall v \in V_n, \quad u_n \in V_n, \qquad\text{and}\qquad a(u, v) = L(v) \quad \forall v \in V, \quad u \in V,$$
where $a$ and $L$ satisfy the assumptions of the Lax–Milgram theorem. Then
$$\|u_n - u\| \le \frac{M}{\alpha}\operatorname{dist}(u, V_n).$$
Proof. Subtracting (3.17) from (3.16), we obtain
$$a(u - u_n, v) = 0 \quad \forall v \in V_n,$$
and in particular $a(u - u_n, u_n) = 0$. It follows that, for every $v \in V_n$,
$$a(u - u_n, u - u_n) = a(u - u_n, u - v) + a(u - u_n, v - u_n) = a(u - u_n, u - v).$$
Let us now use the coercivity and continuity of $a$,
$$\alpha\|u - u_n\|^2 \le M\|u - u_n\|\,\|u - v\|,$$
to obtain
$$\alpha\|u - u_n\| \le M\|u - v\| \quad \forall v \in V_n.$$
Hence
$$\|u - u_n\| \le \frac{M}{\alpha}\operatorname{dist}(u, V_n).$$
Since $(V_n)_{n\in\mathbb N}$ is a Galerkin scheme, $\operatorname{dist}(u, V_n) \to 0$ as $n \to +\infty$, and $(u_n)_{n\in\mathbb N}$ norm-converges to $u$.
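The Céa bound can be checked numerically. The sketch below assumes a hypothetical finite dimensional setting: $V = \mathbb R^4$ with the Euclidean inner product, $a(u,v) = \langle Au, v\rangle$ with $A = cI + S$ where $S$ is skew-symmetric (so $a$ is nonsymmetric and $\alpha = c$), $L(v) = \langle f, v\rangle$, and $V_n$ spanned by the first $n$ coordinate vectors, so that $\operatorname{dist}(u, V_n)$ is the norm of the discarded coordinates. The Frobenius norm of $A$ is used as the continuity constant $M$ because it bounds the operator norm; the exact $M$ would give a sharper bound.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def norm(x):
    return math.sqrt(sum(t * t for t in x))


# Hypothetical data: A = c I + S with S skew-symmetric, hence a(v, v) = c ||v||^2.
c = 1.0
S = [[0.0, 2.0, 0.0, 1.0],
     [-2.0, 0.0, 3.0, 0.0],
     [0.0, -3.0, 0.0, 1.0],
     [-1.0, 0.0, -1.0, 0.0]]
d = len(S)
A = [[S[i][j] + (c if i == j else 0.0) for j in range(d)] for i in range(d)]
f = [1.0, 1.0, 1.0, 1.0]
alpha = c
M = math.sqrt(sum(x * x for row in A for x in row))  # Frobenius norm >= operator norm

u = solve(A, f)  # exact solution: a(u, v) = <f, v> for all v
results = []
for n in range(1, d + 1):
    # Galerkin problem on V_n = span{e_1, ..., e_n}: the leading n x n block of A.
    un = solve([row[:n] for row in A[:n]], f[:n]) + [0.0] * (d - n)
    err = norm([p - q for p, q in zip(u, un)])
    bound = (M / alpha) * norm(u[n:])  # (M / alpha) dist(u, V_n)
    results.append((n, err, bound))
    print(n, err, bound)
```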
3.2
Minimization problems: The topological approach
As a corollary of the Lax–Milgram theorem, we have obtained that if $a : V \times V \to \mathbb R$ is bilinear, continuous, coercive, and symmetric, then, for any $L \in V^*$, there exists a unique solution to the minimization problem: find $u \in V$ such that
$$J(u) \le J(v) \quad \forall v \in V,$$
where $J(v) = \frac12 a(v, v) - L(v)$. Let us observe that $J : V \to \mathbb R$ is convex, continuous, and coercive. By coercive, we mean that $\lim_{\|v\|\to+\infty} J(v) = +\infty$, which follows easily from the inequality
$$J(v) \ge \frac{\alpha}{2}\|v\|^2 - \|L\|_*\,\|v\|.$$
In fact, we will prove in Section 3.3.2 the following general result (Theorem 3.3.4), which contains the above situation as a particular case. Let $J : V \to \mathbb R \cup \{+\infty\}$ be an extended real-valued function on a reflexive Banach space $V$ which is convex, lower semicontinuous, and coercive. Then there exists at least one $u \in V$ such that $J(u) \le J(v)$ for all $v \in V$.
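For a symmetric form, the link between (3.16) and the minimization of $J$ rests on the identity $J(v) - J(u) = \frac12 a(v - u, v - u)$ when $u$ solves (3.16). The Python sketch below checks this identity, and the minimality of $u$, on a grid of points for a hypothetical symmetric coercive form on $\mathbb R^2$; the matrix and right-hand side are made-up data.

```python
# Hypothetical symmetric, coercive data on V = R^2: a(u, v) = <A u, v>, L(v) = <f, v>.
A = [[2.0, 1.0], [1.0, 3.0]]
f = [1.0, 2.0]


def a(u, v):
    return sum(A[i][j] * u[j] * v[i] for i in range(2) for j in range(2))


def J(v):
    return 0.5 * a(v, v) - (f[0] * v[0] + f[1] * v[1])


# Solution of a(u, v) = L(v) for all v, i.e. A u = f (Cramer's rule for 2 x 2).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
u = [(f[0] * A[1][1] - A[0][1] * f[1]) / det,
     (A[0][0] * f[1] - A[1][0] * f[0]) / det]

grid = [(-2.0 + 0.1 * i, -2.0 + 0.1 * j) for i in range(41) for j in range(41)]


def gap(v):
    """J(v) - J(u) - a(v - u, v - u) / 2, which should vanish up to round-off."""
    w = [v[0] - u[0], v[1] - u[1]]
    return J(v) - J(u) - 0.5 * a(w, w)


print(u, J(u))  # u is approximately [0.2, 0.6] and J(u) approximately -0.7
```

Since $a(w, w) \ge \alpha\|w\|^2$, the identity shows at once that $u$ is the unique minimizer of $J$.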
Before proving this theorem, in this section we successively examine its basic ingredients. We first justify the introduction of extended real-valued functions. Then we state the Weierstrass minimization theorem, which is purely topological, and in the process we study the notions of lower semicontinuity and inf-compactness, the interplay between inf-compactness and coercivity, and the role of the weak topology. In doing so, we will be able to explain why convexity plays an important role in such questions.
3.2.1
Extended real-valued functions
A main reason for introducing extended real-valued functions is that they provide a natural and flexible way of modeling minimization (or maximization) problems with constraints. Since in this chapter we consider minimization problems, we just need to consider functions $f : X \to \mathbb R \cup \{+\infty\}$. On the other hand, if one considers maximization problems, one needs to introduce functions $f : X \to \mathbb R \cup \{-\infty\}$. If maximization and minimization are both involved, as in saddle value problems, one needs to work with functions $f : X \to \overline{\mathbb R}$. The (effective) domain of a function $f : X \to \mathbb R \cup \{+\infty\}$ is the set
$$\operatorname{dom} f = \{x \in X : f(x) < +\infty\}.$$
The function $f$ is said to be proper if $\operatorname{dom} f \ne \emptyset$. Let us briefly justify the introduction of extended real-valued functions. Most minimization problems can be written as
$$\min\{f_0(x) : x \in C\}, \tag{3.22}$$
where $f_0 : X \to \mathbb R$ is a real-valued function and $C \subset X$ is the set of constraints. In economics, $C$ describes the available resources, the possible productions of a firm, or a set of decisions, and $f_0$ is the corresponding cost or economic criterion. In physics, the configurations $x$ of the system are subject to constraints (unilateral or bilateral) and $f_0$ is, for example, the corresponding energy. A natural way to solve such problems is to approach them by penalization. For example, let us introduce a distance $d$ on $X$ and, for any positive real number $k$, consider the minimization problem
$$\min\{f_0(x) + k\,d(x, C) : x \in X\}, \tag{3.23}$$
where
$$d(x, C) = \inf\{d(x, y) : y \in C\} \tag{3.24}$$
is the distance function from $x$ to $C$. Note that the penalization term is equal to zero if $x \in C$ (that is, if the constraint is fulfilled), and when $x \notin C$ (that is, if the constraint is violated) it takes larger and larger values, increasing to $+\infty$ with $k$ (provided $C$ is closed, so that $d(x, C) > 0$). Let us also notice that the approximate problem (3.23) can be written as $\min\{f_k(x) : x \in X\}$,
where $f_k(x) = f_0(x) + k\,d(x, C)$ is a real-valued function. The approximate problems are thus unconstrained problems, which makes the method attractive. As $k \to +\infty$ (and when $C$ is closed), the sequence of functions $\{f_k : k \in \mathbb N\}$ increases to the function $f : X \to \mathbb R \cup \{+\infty\}$ defined by
$$f(x) = \begin{cases} f_0(x) & \text{if } x \in C,\\ +\infty & \text{otherwise.}\end{cases} \tag{3.25}$$
The function $f$ is an extended real-valued function, so, if we want to treat problems (3.22) and (3.23) in a unified way, we are naturally led to introduce extended real-valued functions. The minimization problem (3.22) can be equivalently formulated as $\min\{f(x) : x \in X\}$, where $f$ is given by (3.25). Note that in this formulation the constraint set $C$ is equal to the domain of $f$. A particularly useful function in this unilateral framework is the indicator function $\delta_C$ of the set $C$:
$$\delta_C(x) = \begin{cases} 0 & \text{if } x \in C,\\ +\infty & \text{otherwise.}\end{cases} \tag{3.26}$$
With this notation we have $f = f_0 + \delta_C$. More generally, in variational analysis and optimization, one is often faced with expressions of the form
$$f = \sup_{i\in I} f_i,$$
and one should notice that the class of extended real-valued functions is stable under such supremum operations. As a further illustration of these considerations, convex duality theory establishes a one-to-one correspondence between the closed convex subsets $C$ of a normed linear space $X$ and their support functions $\sigma_C : X^* \to \mathbb R \cup \{+\infty\}$, defined by
$$\sigma_C(x^*) = \sup\{x^*(x) : x \in C\},$$
with $X^*$ the topological dual of $X$. One cannot avoid the value $+\infty$ for $\sigma_C$ as soon as the set $C$ is unbounded (which is often the case!), and general duality statements must consider extended real-valued functions. So, from now on, unless explicitly specified, we will consider functions $f : X \to \mathbb R \cup \{+\infty\}$ possibly taking the value $+\infty$. We adopt the conventions $\lambda \times (+\infty) = +\infty$ if $\lambda > 0$ and $0 \times (+\infty) = 0$.
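The penalization (3.23) and the limit function (3.25) can be seen on a hypothetical one-dimensional example not taken from the text: $X = \mathbb R$, $f_0(x) = (x-2)^2$, $C = [0, 1]$, so that $d(x, C) = \max(-x, 0, x - 1)$. The sketch below minimizes $f_k$ by brute force on a grid and represents $+\infty$ by `math.inf`. In this example the minimizers move from 2 (no penalty) through 1.5 ($k = 1$) to the constrained minimizer 1, which is already reached for $k \ge 2$; that exactness is a feature of this particular example, not a general claim.

```python
import math


def f0(x):
    return (x - 2.0) ** 2  # hypothetical cost function


def dist_C(x, lo=0.0, hi=1.0):
    return max(lo - x, 0.0, x - hi)  # d(x, C) for the interval C = [lo, hi]


def indicator_C(x, lo=0.0, hi=1.0):
    return 0.0 if lo <= x <= hi else math.inf  # delta_C, cf. (3.26)


def f_k(x, k):
    return f0(x) + k * dist_C(x)  # penalized function of (3.23)


def f(x):
    return f0(x) + indicator_C(x)  # f = f0 + delta_C, cf. (3.25)


grid = [-1.0 + 1e-3 * i for i in range(4001)]  # brute-force search on [-1, 3]
minimizers = {}
for k in (0, 1, 2, 4, 8):
    minimizers[k] = min(grid, key=lambda x: f_k(x, k))
    print(k, round(minimizers[k], 3))  # 2.0, 1.5, then 1.0 for k >= 2
x_star = min(grid, key=f)  # constrained minimizer of f0 on C
```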
3.2.2 The interplay between functions and sets: The role of the epigraph
The analysis of unilateral problems (like minimization) naturally leads to the introduction of mathematical concepts which have a unilateral character. The classical approach of analysis
does not provide the appropriate tools to deal with mathematical objects and operations that are intrinsically unilateral (constraints, minimization). In the previous section, we justified the introduction of extended real-valued functions $f : X \to \mathbb R \cup \{+\infty\}$, which are the appropriate concept for dealing with minimization problems. Whereas in classical analysis the properties of the graph of a function play a fundamental role, in variational analysis it is the epigraph that takes over this role. The set
$$\operatorname{epi} f = \{(x, \lambda) \in X \times \mathbb R : \lambda \ge f(x)\} \tag{3.27}$$
is the epigraph of the function $f : X \to \mathbb R \cup \{+\infty\}$. For any $\gamma \in \mathbb R$, the lower $\gamma$-level set of $f$ is
$$\operatorname{lev}_\gamma f = \{x \in X : f(x) \le \gamma\}. \tag{3.28}$$
When considering the minimization problem of a given function $f : X \to \mathbb R \cup \{+\infty\}$, the solution set is
$$\operatorname{arg\,min} f = \{\bar x \in X : f(\bar x) = \inf_X f\}. \tag{3.29}$$
Note that $\operatorname{arg\,min} f$ can be empty and
$$\operatorname{arg\,min} f = \bigcap_{\gamma > \inf_X f} \operatorname{lev}_\gamma f. \tag{3.30}$$
Thus, to an extended real-valued function we have associated several different sets: its epigraph, its lower level sets, and its set of minimizers. Conversely, to a set we have associated extended real-valued functions, for example, the indicator function, the support function, and the distance function (the last one is real-valued if the set is nonempty). In the following section, we describe how some basic topological properties of functions relevant to minimization problems can be naturally formulated with the help of the attached geometrical sets: epigraphs and lower level sets. Note that the lower level sets can be obtained by cutting the epigraph:
$$\operatorname{lev}_\gamma f \times \{\gamma\} = \operatorname{epi} f \cap (X \times \{\gamma\}).$$
Most basic operations in variational analysis can be naturally formulated with the help of epigraphs. They give rise to the so-called epigraphical calculus; see [39], [47], [71]. The following result is valid for an arbitrary family of functions on an abstract space $X$.

Proposition 3.2.1. Let $X$ be an abstract space and $(f_i)_{i\in I}$ be a family of extended real-valued functions $f_i : X \to \mathbb R \cup \{+\infty\}$ indexed by an arbitrary set $I$. Then
$$\operatorname{epi}\Big(\sup_{i\in I} f_i\Big) = \bigcap_{i\in I} \operatorname{epi} f_i, \qquad \operatorname{epi}\Big(\inf_{i\in I} f_i\Big) \supset \bigcup_{i\in I} \operatorname{epi} f_i,$$
with equality in the second relation when $I$ is finite.
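Proposition 3.2.1 can be sanity-checked numerically for a finite family. The sketch below uses three hypothetical functions on $\mathbb R$ and compares, on a grid of points $(x, \lambda)$, membership in $\operatorname{epi}(\sup_i f_i)$ with membership in every $\operatorname{epi} f_i$, and membership in $\operatorname{epi}(\min_i f_i)$ with membership in some $\operatorname{epi} f_i$. A grid check is evidence, not a proof.

```python
def f1(x):
    return x * x


def f2(x):
    return 1.0 - x


def f3(x):
    return abs(x) - 0.5


family = [f1, f2, f3]  # a hypothetical finite family


def sup_f(x):
    return max(g(x) for g in family)


def inf_f(x):
    return min(g(x) for g in family)


def in_epi(g, x, lam):
    return lam >= g(x)  # (x, lam) in epi g, cf. (3.27)


points = [(0.1 * i, 0.1 * j) for i in range(-20, 21) for j in range(-20, 21)]
sup_ok = all(in_epi(sup_f, x, lam) == all(in_epi(g, x, lam) for g in family)
             for x, lam in points)
inf_ok = all(in_epi(inf_f, x, lam) == any(in_epi(g, x, lam) for g in family)
             for x, lam in points)
print(sup_ok, inf_ok)  # True True

# For an infinite family the union can be strictly smaller: with f_n = 1/n (n >= 1),
# inf_n f_n = 0, so (x, 0) is in epi(inf_n f_n) but in no epi f_n, since 0 < 1/n.
```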
3.2.3
Lower semicontinuous functions
Let $(X, \tau)$ be a topological space. For any $x \in X$, we denote by $\mathcal V_\tau(x)$ the family of neighborhoods of $x$ for the topology $\tau$. We recall the classical definition of continuity for a function $f : (X, \tau) \to \mathbb R$. The function $f$ is said to be continuous at $x \in X$ for the topology $\tau$ if
$$\forall \varepsilon > 0 \quad \exists V_\varepsilon \in \mathcal V_\tau(x) \text{ such that } \forall y \in V_\varepsilon \quad |f(y) - f(x)| < \varepsilon.$$
This can be viewed as the conjunction of the two following properties:
1. $\forall \varepsilon > 0 \quad \exists V_\varepsilon \in \mathcal V_\tau(x)$ such that $\forall y \in V_\varepsilon \quad f(y) > f(x) - \varepsilon$;
2. $\forall \varepsilon > 0 \quad \exists W_\varepsilon \in \mathcal V_\tau(x)$ such that $\forall y \in W_\varepsilon \quad f(y) < f(x) + \varepsilon$.
Then take $V_\varepsilon \cap W_\varepsilon$, which still belongs to $\mathcal V_\tau(x)$, to obtain the continuity property. These properties are, respectively, called the lower semicontinuity and the upper semicontinuity of $f$ at $x$ for the topology $\tau$. To deal with extended real-valued functions, property 1 (lower semicontinuity) has to be formulated slightly differently.

Definition 3.2.1. Let $(X, \tau)$ be a topological space and $f : X \to \mathbb R \cup \{+\infty\}$. The function $f$ is said to be $\tau$-lower semicontinuous ($\tau$-lsc) at $x$ if
$$\forall \lambda < f(x) \quad \exists V_\lambda \in \mathcal V_\tau(x) \text{ such that } f(y) > \lambda \quad \forall y \in V_\lambda.$$
(We write $V_\lambda$ to stress the dependence of the set $V$ upon the choice of $\lambda$!) If $f$ is $\tau$-lsc at every point of $X$, then $f$ is said to be $\tau$-lower semicontinuous on $X$.

Proposition 3.2.2. Let $(X, \tau)$ be a topological space and $f : X \to \mathbb R \cup \{+\infty\}$ an extended real-valued function. The following statements are equivalent:
(i) $f$ is $\tau$-lower semicontinuous;
(ii) $\operatorname{epi} f$ is closed in $X \times \mathbb R$ (where $X \times \mathbb R$ is equipped with the product topology of $\tau$ on $X$ and of the usual topology on $\mathbb R$);
(iii) $\forall \gamma \in \mathbb R$, $\operatorname{lev}_\gamma f$ is closed in $(X, \tau)$;
(iv) $\forall \gamma \in \mathbb R$, $\{x \in X : f(x) > \gamma\}$ is open in $(X, \tau)$;
(v) $\forall x \in X \quad f(x) \le \liminf_{y\to x} f(y) := \sup_{V\in\mathcal V_\tau(x)} \inf_{y\in V} f(y)$.

Proof. We are going to prove (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (v) ⇒ (i).
(i) ⇒ (ii). Assume that $f$ is $\tau$-lsc and let us prove that $\operatorname{epi} f$ is closed or, equivalently, that the complement of $\operatorname{epi} f$ in $X \times \mathbb R$ is open. Take $(x, \lambda) \notin \operatorname{epi} f$. By definition of $\operatorname{epi} f$, $\lambda < f(x)$. Take $\lambda < \gamma < f(x)$. Since $f$ is $\tau$-lsc, there exists $V_\gamma \in \mathcal V_\tau(x)$ such that $f(y) > \gamma$ for all $y \in V_\gamma$. Hence $(y, \mu) \notin \operatorname{epi} f$ for all $y \in V_\gamma$ and all $\mu < \gamma$, that is, $(V_\gamma \times ]-\infty, \gamma[) \cap \operatorname{epi} f = \emptyset$. Noticing that $V_\gamma \times ]-\infty, \gamma[$ is a neighborhood of $(x, \lambda)$, the conclusion follows.
(ii) ⇒ (iii). The implication follows directly from the relation $\operatorname{lev}_\gamma f \times \{\gamma\} = \operatorname{epi} f \cap (X \times \{\gamma\})$. Assuming that $\operatorname{epi} f$ is closed, we infer that $\operatorname{lev}_\gamma f \times \{\gamma\}$ is closed, and hence $\operatorname{lev}_\gamma f$ is closed. (Note that $x \mapsto (x, \gamma)$ from $X$ onto $X \times \{\gamma\}$ is a homeomorphism.)
(iii) ⇒ (iv) is obvious, just by taking the complement of $\operatorname{lev}_\gamma f$.
(iv) ⇒ (v). Let $\gamma < f(x)$. Since, by assumption, $\{y \in X : f(y) > \gamma\}$ is open, there exists some $V \in \mathcal V_\tau(x)$ such that $V \subset \{y \in X : f(y) > \gamma\}$. Equivalently,
$$\forall y \in V \quad f(y) > \gamma,$$
which implies $\inf_{y\in V} f(y) \ge \gamma$. Hence
$$\sup_{V\in\mathcal V_\tau(x)} \inf_{y\in V} f(y) \ge \gamma,$$
and, this being true for any $\gamma < f(x)$, it follows that
$$\sup_{V\in\mathcal V_\tau(x)} \inf_{y\in V} f(y) \ge f(x),$$
that is, $f(x) \le \liminf_{y\to x} f(y)$.
(v) ⇒ (i). Let $\lambda < f(x)$. By assumption (v),
$$\lambda < \sup_{V\in\mathcal V_\tau(x)} \inf_{y\in V} f(y),$$
which implies the existence of some $V_\lambda \in \mathcal V_\tau(x)$ such that
$$\inf_{y\in V_\lambda} f(y) > \lambda,$$
i.e., $f(y) > \lambda$ for every $y \in V_\lambda$. This is exactly the lower semicontinuity property.

The class of lower semicontinuous functions enjoys remarkable stability properties. Since closedness is preserved under arbitrary intersections and finite unions of sets, we derive from Proposition 3.2.1 (epigraphical interpretation of sup and inf) and Proposition 3.2.2 (equivalence between $f$ lsc and $\operatorname{epi} f$ closed) the following important result.

Proposition 3.2.3. Let $(X, \tau)$ be a topological space and $(f_i)_{i\in I}$, $f_i : X \to \mathbb R \cup \{+\infty\}$, be an arbitrary collection of $\tau$-lower semicontinuous functions. Then $\sup_{i\in I} f_i$ is still $\tau$-lower semicontinuous. When $I$ is a finite set of indices, $\inf_{i\in I} f_i$ is still $\tau$-lower semicontinuous.

As a consequence, the supremum of a family of continuous functions is lower semicontinuous. One can prove that if $(X, \tau)$ is metrizable, the converse is true: if $f$ is $\tau$-lsc,
there exists an increasing sequence $(f_n)_{n\in\mathbb N}$ of $\tau$-continuous functions which converges pointwise to $f$. We will establish this important approximation result in Theorem 9.2.1 by using epigraphical regularization. See also [60, Theorem 1.3.7].

Proposition 3.2.4. Let $f, g : (X, \tau) \to \mathbb R \cup \{+\infty\}$ be two lower semicontinuous functions. Then $f + g$ is still lower semicontinuous.

Proof. Take a net $(x_\nu)$ $\tau$-converging to $x$. Then
$$\liminf_\nu (f + g)(x_\nu) = \liminf_\nu\, [f(x_\nu) + g(x_\nu)] \ge \liminf_\nu f(x_\nu) + \liminf_\nu g(x_\nu) \ge f(x) + g(x).$$
Since this holds for every net $\tau$-converging to $x$, $f + g$ is $\tau$-lsc at $x$.
3.2.4 The lower closure of a function and the relaxation problem
In some important situations, the function $f$ to minimize fails to be lower semicontinuous for a topology $\tau$ which makes a minimizing sequence $\tau$-relatively compact (see Section 3.2.5). In that case, the analysis of the behavior of the minimizing sequences requires the introduction of the lower closure of $f$ and leads to the introduction of the relaxed problem.

Definition 3.2.2. Given a topological space $(X, \tau)$ and $f : X \to \mathbb R \cup \{+\infty\}$, the $\tau$-lower envelope of $f$ is defined as
$$\operatorname{cl}_\tau f = \sup\{g : X \to \mathbb R \cup \{+\infty\} : g\ \tau\text{-lsc},\ g \le f\}.$$

Proposition 3.2.5. Let $(X, \tau)$ be a topological space and $f : X \to \mathbb R \cup \{+\infty\}$ an extended real-valued function. Then $\operatorname{cl}_\tau f$ is $\tau$-lsc; it is the greatest $\tau$-lsc function which minorizes $f$. We have the following properties:
(a) $\operatorname{epi}(\operatorname{cl}_\tau f)$ is the closure of $\operatorname{epi} f$ in $X \times \mathbb R$ equipped with the product topology of $\tau$ with the usual topology of $\mathbb R$: $\operatorname{epi}(\operatorname{cl}_\tau f) = \operatorname{cl}(\operatorname{epi} f)$;
(b) $(\operatorname{cl}_\tau f)(x) = \sup_{V\in\mathcal V_\tau(x)} \inf_{y\in V} f(y) = \liminf_{y\to x} f(y)$;
(c) $f$ is $\tau$-lsc at $x$ iff $f(x) \le (\operatorname{cl}_\tau f)(x)$;
(d) $f$ is $\tau$-lsc at $x$ iff $f(x) = (\operatorname{cl}_\tau f)(x)$.

Proof. Let us first notice that a set $C$ in $X \times \mathbb R$ is an epigraph iff
(i) $C$ recedes in the vertical direction: $(x, \lambda) \in C$ and $\mu > \lambda \Rightarrow (x, \mu) \in C$;
(ii) $C$ is vertically closed: for every $x \in X$ the set $\{\lambda \in \mathbb R : (x, \lambda) \in C\}$ is closed.
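(a) This implies that the closure of an epigraph is an epigraph. Set $\operatorname{cl}(\operatorname{epi} f) = \operatorname{epi} g$. Since $\operatorname{epi} g$ is closed, $g$ is $\tau$-lsc. Moreover, since $\operatorname{epi} g \supset \operatorname{epi} f$, we have $g \le f$. Hence $g$ is a $\tau$-lsc minorant of $f$. We claim that $g$ is the lower envelope of $f$, that is, $g$ is the greatest of such $\tau$-lsc minorants of $f$. Take $h \le f$ with $h$ $\tau$-lsc. Then $\operatorname{epi} h \supset \operatorname{epi} f$, which implies
$$\operatorname{epi} h = \operatorname{cl}(\operatorname{epi} h) \supset \operatorname{cl}(\operatorname{epi} f) = \operatorname{epi} g.$$
Hence $h \le g$. So $g = \sup\{h : X \to \mathbb R \cup \{+\infty\} : h\ \tau\text{-lsc},\ h \le f\}$, that is, $g = \operatorname{cl}_\tau f$. We have proved that $\operatorname{cl}(\operatorname{epi} f) = \operatorname{epi} g = \operatorname{epi}(\operatorname{cl}_\tau f)$.
(b) Since $\operatorname{cl}_\tau f$ is $\tau$-lsc, it follows from Proposition 3.2.2(v) that $(\operatorname{cl}_\tau f)(x) \le \liminf_{y\to x} (\operatorname{cl}_\tau f)(y)$; and since $\operatorname{cl}_\tau f \le f$,
$$(\operatorname{cl}_\tau f)(x) \le \liminf_{y\to x} f(y) \quad \forall x \in X. \tag{3.31}$$
Then notice that the function $h(x) = \liminf_{y\to x} f(y)$ is less than or equal to $f$ (since $x \in V$ for every $V \in \mathcal V_\tau(x)$) and is $\tau$-lsc. To verify this last point, fix $x \in X$ and take
$$\lambda < \liminf_{y\to x} f(y) = \sup_{V\in\mathcal V_\tau(x)} \inf_{y\in V} f(y).$$
Then there exists some open set $V_\lambda \in \mathcal V_\tau(x)$ such that $\inf_{y\in V_\lambda} f(y) > \lambda$. It follows that for all $\xi \in V_\lambda$, $V_\lambda \in \mathcal V_\tau(\xi)$ and hence
$$\sup_{V\in\mathcal V_\tau(\xi)} \inf_{y\in V} f(y) > \lambda,$$
that is, $\liminf_{y\to\xi} f(y) > \lambda$. Thus $x \mapsto \liminf_{y\to x} f(y)$ is $\tau$-lsc and minorizes $f$. By definition of $\operatorname{cl}_\tau f$ we have
$$\liminf_{y\to x} f(y) \le (\operatorname{cl}_\tau f)(x). \tag{3.32}$$
Then compare (3.31) and (3.32) to obtain
$$\forall x \in X \quad (\operatorname{cl}_\tau f)(x) = \liminf_{y\to x} f(y).$$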
(c) Proposition 3.2.2 expresses that $f$ is $\tau$-lsc at $x$ iff
$$f(x) \le \liminf_{y\to x} f(y).$$
By (b), this is equivalent to saying that $f(x) \le (\operatorname{cl}_\tau f)(x)$, and, since $\operatorname{cl}_\tau f \le f$ is always true, it is also equivalent to $f(x) = (\operatorname{cl}_\tau f)(x)$, which is (d).

We have the following “sequential” formulation of $\operatorname{cl}_\tau f$.

Proposition 3.2.6. Let $(X, \tau)$ be a topological space and $f : X \to \mathbb R \cup \{+\infty\}$. Then, for any $x \in X$,
$$(\operatorname{cl}_\tau f)(x) = \liminf_{y\to x} f(y) = \min\Big\{\liminf_\nu f(x_\nu) : (x_\nu) \text{ is a net},\ x_\nu \to x\Big\}.$$
When $(X, \tau)$ is metrizable,
$$(\operatorname{cl}_\tau f)(x) = \liminf_{y\to x} f(y) = \min\Big\{\liminf_n f(x_n) : (x_n) \text{ sequence},\ x_n \to x\Big\}.$$

Corollary 3.2.1. Let $(X, \tau)$ be a metrizable space and let $f : X \to \mathbb R \cup \{+\infty\}$ be an extended real-valued function. Then $f$ is $\tau$-lower semicontinuous at $x \in X$ iff
$$\forall x_n \to x \quad f(x) \le \liminf_n f(x_n).$$

Proof of Proposition 3.2.6. For simplicity, we give the proof only in the metrizable case. We have
$$\liminf_{y\to x} f(y) = \sup_{\varepsilon>0} \inf_{y\in B_\tau(x,\varepsilon)} f(y),$$
where $B_\tau(x, \varepsilon) = \{y \in X : d_\tau(y, x) < \varepsilon\}$, $d_\tau$ being a distance inducing the topology $\tau$. Then, for any $x_n \to x$ and any $\varepsilon > 0$, $x_n$ belongs to $B_\tau(x, \varepsilon)$ for $n$ sufficiently large. Hence
$$\inf_{y\in B_\tau(x,\varepsilon)} f(y) \le f(x_n) \quad \forall n \ge N(\varepsilon).$$
Passing to the limit as $n \to +\infty$ gives
$$\inf_{y\in B_\tau(x,\varepsilon)} f(y) \le \liminf_n f(x_n).$$
This being true for any $\varepsilon > 0$ and any $x_n \to x$, we obtain the inequality
$$\liminf_{y\to x} f(y) \le \inf\Big\{\liminf_n f(x_n) : x_n \to x\Big\}. \tag{3.33}$$
On the other hand, for each $n \in \mathbb N^*$, there exists some $x_n \in B_\tau(x, 1/n)$ such that
$$\begin{cases} \inf_{y\in B_\tau(x,1/n)} f(y) \ge f(x_n) - \dfrac1n & \text{if } \inf_{y\in B_\tau(x,1/n)} f(y) > -\infty,\\[4pt] -n \ge f(x_n) & \text{if } \inf_{y\in B_\tau(x,1/n)} f(y) = -\infty.\end{cases}$$
In both cases,
$$\liminf_{y\to x} f(y) = \lim_n \inf_{y\in B_\tau(x,1/n)} f(y) \ge \limsup_n f(x_n) \ge \liminf_n f(x_n). \tag{3.34}$$
Then compare (3.33) and (3.34) to obtain
$$\liminf_{y\to x} f(y) = \min\Big\{\liminf_n f(x_n) : x_n \to x\Big\} = \min\Big\{\limsup_n f(x_n) : x_n \to x\Big\}.$$
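Proposition 3.2.7. Let $f : (X, \tau) \to \mathbb R \cup \{+\infty\}$ be an extended real-valued function. Then
$$\inf_X f = \inf_X \operatorname{cl}_\tau f.$$
More generally, for any $\tau$-open subset $G$ of $X$,
$$\inf_G f = \inf_G \operatorname{cl}_\tau f.$$
Moreover, $\operatorname{arg\,min} f \subset \operatorname{arg\,min} \operatorname{cl}_\tau f$.

Proof. (a) Since $f \ge \operatorname{cl}_\tau f$, we just need to prove that
$$\inf_G \operatorname{cl}_\tau f \ge \inf_G f.$$
For any $x \in G$, since $G$ is $\tau$-open, we have $G \in \mathcal V_\tau(x)$. By Proposition 3.2.5,
$$(\operatorname{cl}_\tau f)(x) = \sup_{V\in\mathcal V_\tau(x)} \inf_{y\in V} f(y).$$
Hence, for every $x \in G$,
$$(\operatorname{cl}_\tau f)(x) \ge \inf_{y\in G} f(y).$$
This being true for any $x \in G$ gives $\inf_G (\operatorname{cl}_\tau f) \ge \inf_G f$.
(b) Let $x \in \operatorname{arg\,min} f$. We have
$$(\operatorname{cl}_\tau f)(x) \le f(x) = \inf_X f = \inf_X \operatorname{cl}_\tau f,$$
which implies $x \in \operatorname{arg\,min} \operatorname{cl}_\tau f$.

Remark 3.2.1. The function $\operatorname{cl}_\tau f$, which we call the lower envelope of $f$, is often called the lower semicontinuous regularization of $f$, the relaxed function, or the $\tau$-closure of $f$. The problem $\min\{(\operatorname{cl}_\tau f)(x) : x \in X\}$ is called the relaxed problem.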
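The formula $(\operatorname{cl}_\tau f)(x) = \sup_{V} \inf_{y\in V} f(y)$ of Proposition 3.2.5(b) suggests a crude numerical proxy on $\mathbb R$: fix one small radius $\varepsilon$ and take the infimum of $f$ over finitely many points of $(x - \varepsilon, x + \varepsilon)$. The sketch below applies it to a hypothetical step function that is not lsc at 0 and illustrates Propositions 3.2.5(c) and 3.2.7. The proxy is only a heuristic (a single $\varepsilon$, finitely many samples), not a general algorithm, and the name `lower_envelope` is illustrative.

```python
def f(x):
    # Hypothetical step function: 0 on (-inf, 0), 1 on [0, +inf).
    # It is not lower semicontinuous at 0 because f(0) = 1 > liminf_{y -> 0} f(y) = 0.
    return 1.0 if x >= 0 else 0.0


def lower_envelope(g, x, eps=1e-6, samples=201):
    """Heuristic proxy for (cl g)(x) = sup_V inf_{y in V} g(y) on R.

    Uses a single radius eps and the minimum of g over `samples` equally spaced
    points of the open interval (x - eps, x + eps); it is only a numerical proxy.
    """
    ys = [x - eps + 2.0 * eps * (k + 1) / (samples + 1) for k in range(samples)]
    return min(g(y) for y in ys)


for x in (-0.5, 0.0, 0.5):
    # f is lsc at x iff f(x) <= (cl f)(x), Proposition 3.2.5(c)
    print(x, f(x), lower_envelope(f, x))

xs = [-1.0 + 0.05 * i for i in range(41)]  # sample points in [-1, 1]
inf_f = min(f(x) for x in xs)
inf_cl_f = min(lower_envelope(f, x) for x in xs)  # same value, cf. Proposition 3.2.7
```

At $x = 0$ the proxy returns 0 while $f(0) = 1$, which flags the failure of lower semicontinuity there; away from 0 the two values agree.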
3.2.5
Inf-compactness functions, coercivity
Besides lower semicontinuity, the second basic ingredient in minimization problems is the inf-compactness property.

Definition 3.2.3. Let $(X, \tau)$ be a topological space and let $f : X \to \mathbb R \cup \{+\infty\}$ be an extended real-valued function. The function $f$ is said to be $\tau$-inf-compact if for any $\gamma \in \mathbb R$
$$\operatorname{lev}_\gamma f = \{x \in X : f(x) \le \gamma\}$$
is relatively compact in $X$ for the topology $\tau$. When $f$ is $\tau$-lsc, its lower level sets are closed for $\tau$, and $\tau$-inf-compactness is equivalent to saying that the lower level sets of $f$ are $\tau$-compact.

Definition 3.2.4. Let $X$ be a normed linear space. A function $f : X \to \mathbb R \cup \{+\infty\}$ is said to be coercive if $\lim_{\|x\|\to+\infty} f(x) = +\infty$.

The relation between the two concepts is made clear in the following.

Proposition 3.2.8. Let $X$ be a normed space and $f : X \to \mathbb R \cup \{+\infty\}$. The following conditions are equivalent:
(i) $f$ is coercive;
(ii) for any $\gamma \in \mathbb R$, $\operatorname{lev}_\gamma f$ is bounded.

Proof. (i) ⇒ (ii). Assume $f$ is coercive. If for some $\gamma_0 \in \mathbb R$, $\operatorname{lev}_{\gamma_0} f$ is not bounded, then there exists a sequence $(x_n)_{n\in\mathbb N}$ such that $\|x_n\| \to +\infty$ as $n \to +\infty$ and $f(x_n) \le \gamma_0$ for all $n \in \mathbb N$. But $f$ coercive implies that $f(x_n) \to +\infty$, a clear contradiction.
(ii) ⇒ (i). Assume that for any $\gamma \in \mathbb R$, $\operatorname{lev}_\gamma f$ is bounded. If $f$ is not coercive, we can construct a sequence $(x_n)_{n\in\mathbb N}$ such that $\|x_n\| \to +\infty$ as $n \to +\infty$ and $f(x_n) \le \gamma_0$ for all $n$, for some $\gamma_0 \in \mathbb R$. This contradicts the fact that $\operatorname{lev}_{\gamma_0} f$ is bounded.

Let us recall the Riesz theorem: the bounded sets in a normed space are relatively compact iff the space is finite dimensional.

Corollary 3.2.2. Let $X = \mathbb R^n$ be equipped with the usual topology and let $f : X \to \mathbb R \cup \{+\infty\}$. The following conditions are equivalent:
(i) $f$ is coercive;
(ii) $f$ is inf-compact.

In infinite dimensional spaces, the topologies which are directly related to coercivity are the weak topologies (see Section 2.4).
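Proposition 3.2.8 and Corollary 3.2.2 can be illustrated with the function $f(x) = x^2/(1 + x^2)$ on $\mathbb R$, which this section uses later to contrast Theorems 3.2.1 and 3.2.2. Solving $x^2 \le \gamma(1 + x^2)$ gives $\operatorname{lev}_\gamma f = [-r, r]$ with $r = \sqrt{\gamma/(1-\gamma)}$ for $0 \le \gamma < 1$, while $\operatorname{lev}_\gamma f = \mathbb R$ for $\gamma \ge 1$; so $f$ is neither coercive nor inf-compact, even though it has a minimizer at 0. The Python sketch below encodes this closed form and compares it with direct evaluation on a grid (the helper names are illustrative).

```python
import math


def f(x):
    return x * x / (1.0 + x * x)  # the example f(x) = x^2 / (1 + x^2) on R


def level_set_radius(gamma):
    """lev_gamma f in closed form.

    Returns None if the level set is empty, math.inf if it is all of R,
    and otherwise r such that lev_gamma f = [-r, r].
    """
    if gamma < 0:
        return None  # f >= 0 everywhere
    if gamma >= 1:
        return math.inf  # f < 1 everywhere, so lev_gamma f = R is unbounded
    return math.sqrt(gamma / (1.0 - gamma))  # from x^2 <= gamma (1 + x^2)


def agrees_with_grid(gamma, xs=None):
    """Compare the closed form with direct evaluation of f on a grid of [-10, 10]."""
    xs = xs if xs is not None else [-10.0 + 0.01 * i for i in range(2001)]
    r = level_set_radius(gamma)
    return all((f(x) <= gamma) == (r is not None and abs(x) <= r) for x in xs)


for gamma in (-0.1, 0.25, 0.7, 1.0):
    print(gamma, level_set_radius(gamma), agrees_with_grid(gamma))
```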
3.2.6 Topological minimization theorems
In this section, unless otherwise specified, $(X, \tau)$ is a general topological space.

Theorem 3.2.1. Let $(X, \tau)$ be a topological space and let $f : X \to \mathbb R \cup \{+\infty\}$ be an extended real-valued function which is $\tau$-lower semicontinuous and $\tau$-inf-compact. Then $\inf_X f > -\infty$ and there exists some $\bar x \in X$ which minimizes $f$ on $X$:
$$f(\bar x) \le f(x) \quad \forall x \in X.$$
Because of the importance of this theorem, which is often referred to as the Weierstrass theorem, we give two different proofs of it, each of independent interest. Without loss of generality we may assume that $f$ is proper, that is, $f \not\equiv +\infty$.

First proof. We want to prove that $\operatorname{arg\,min} f \ne \emptyset$. We use formula (3.30), which relates $\operatorname{arg\,min} f$ to the lower level sets of $f$:
$$\operatorname{arg\,min} f = \bigcap_{\gamma > \inf_X f} \operatorname{lev}_\gamma f = \bigcap_{\gamma_0 > \gamma > \inf_X f} \operatorname{lev}_\gamma f,$$
where $\gamma_0 \in \mathbb R$ is taken arbitrarily with $\gamma_0 > \inf_X f$. This comes from the fact that the sets $\operatorname{lev}_\gamma f$ increase with $\gamma$. The $\tau$-lower semicontinuity of $f$ implies that the sets $\operatorname{lev}_\gamma f$ are closed for the topology $\tau$. Moreover, for $\gamma_0 > \gamma > \inf_X f$ the sets $\operatorname{lev}_\gamma f$ are nonempty and contained in $\operatorname{lev}_{\gamma_0} f$, which is compact by the $\tau$-inf-compactness of $f$. Therefore, we have a family $\{\operatorname{lev}_\gamma f : \inf_X f < \gamma < \gamma_0\}$ of nonempty closed subsets, contained in a fixed compact set, which is monotone with respect to $\gamma$. For any finite subfamily $\{\operatorname{lev}_{\gamma_i} f : i = 1, 2, \dots, m\}$,
$$\bigcap_{i=1,\dots,m} \operatorname{lev}_{\gamma_i} f = \operatorname{lev}_\gamma f \quad \text{with } \gamma = \min\{\gamma_1, \dots, \gamma_m\} > \inf_X f,$$
and hence
$$\bigcap_{i=1,\dots,m} \operatorname{lev}_{\gamma_i} f \ne \emptyset.$$
From the finite intersection property (which characterizes compact topological spaces; it is obtained from the Heine–Borel property just by passing to complements, thus replacing open sets by closed sets), we conclude that
$$\bigcap_{\gamma_0 > \gamma > \inf_X f} \operatorname{lev}_\gamma f \ne \emptyset.$$

Second proof. The proof we present now illustrates the direct method in the calculus of variations. It is the approach initiated by Hilbert and further developed by Tonelli, which first introduces a minimizing sequence. Let us observe that, given a function $f : X \to \mathbb R \cup \{+\infty\}$, one can always construct a minimizing sequence, that is, a sequence $(x_n)_{n\in\mathbb N}$ such that $f(x_n) \to \inf_X f$ as $n \to +\infty$. To do so, we just rely on the definition of the infimum of a family of real numbers: if $\inf_X f > -\infty$, take $\inf_X f \le f(x_n) \le \inf_X f + 1/n$; if $\inf_X f = -\infty$, take $f(x_n) \le -n$.
Since $f$ is proper, $\inf_X f < +\infty$ and, for $n \ge 1$,
$$f(x_n) \le \max\{\inf_X f + 1/n,\ -n\} \le \max\{\inf_X f + 1,\ -1\} := \gamma_0.$$
Note that $\gamma_0 > \inf_X f$, which implies $\operatorname{lev}_{\gamma_0} f \ne \emptyset$ and
$$x_n \in \operatorname{lev}_{\gamma_0} f \quad \forall n \ge 1.$$
Thus, the sequence $(x_n)_{n\in\mathbb N}$ is trapped in a lower level set of $f$ which is compact for the topology $\tau$ ($f$ is $\tau$-inf-compact). To follow the argument, assume that $\tau$ is metrizable; in the general topological case, one has to replace sequences by nets. So we can extract a subsequence $\tau$-converging to some $\bar x \in X$, $x_{n_k} \xrightarrow{\tau} \bar x$. We have
$$\lim_k f(x_{n_k}) = \lim_n f(x_n) = \inf_X f.$$
By the $\tau$-lower semicontinuity of $f$,
$$f(\bar x) \le \lim_k f(x_{n_k}).$$
Hence
$$f(\bar x) \le \inf_X f,$$
which says both that $\inf_X f > -\infty$ (since $f$ takes its values in $\mathbb R \cup \{+\infty\}$) and that
$$f(\bar x) \le f(x) \quad \forall x \in X,$$
and the proof is complete.

Remark 3.2.2. In fact, the inf-compactness assumption can be slightly weakened by noticing that in the proof of Theorem 3.2.1 we just need to know that some nonempty lower level set of $f$ is relatively compact. Notice that it is equivalent to assume that $\operatorname{lev}_{\gamma_0} f$ is relatively compact or that $\operatorname{lev}_\gamma f$ is relatively compact for all $\gamma \le \gamma_0$. So we can formulate the following result.

Theorem 3.2.2. Let $(X, \tau)$ be a topological space and $f : X \to \mathbb R \cup \{+\infty\}$ an extended real-valued function which is $\tau$-lower semicontinuous and such that, for some $\gamma_0 \in \mathbb R$, $\operatorname{lev}_{\gamma_0} f$ is nonempty and $\tau$-compact. Then $\inf_X f > -\infty$ and there exists some $\bar x \in X$ which minimizes $f$ on $X$:
$$f(\bar x) \le f(x) \quad \forall x \in X.$$
To illustrate the difference between Theorems 3.2.1 and 3.2.2, take $X = \mathbb R$ and $f(x) = \dfrac{x^2}{1 + x^2}$. Then $\operatorname{lev}_\gamma f$ is compact for $\gamma < 1$, but $\operatorname{lev}_1 f = \mathbb R$. Thus we can apply
Theorem 3.2.2 to conclude the existence of a minimizer (which is zero!), but Theorem 3.2.1 does not apply!

Corollary 3.2.3. Let $(X, \tau)$ be a topological space and assume that $f : X \to \mathbb R \cup \{+\infty\}$ is $\tau$-lower semicontinuous. Then, for any compact subset $K$ of $(X, \tau)$, there exists some $\bar x \in K$ such that
$$f(\bar x) \le f(x) \quad \forall x \in K.$$
Proof. Take $g := f + \delta_K$. Since $K$ is closed (compact sets are closed when $(X, \tau)$ is Hausdorff), $\delta_K$ is lower semicontinuous. So $g$, as a sum of two lower semicontinuous functions, is still lower semicontinuous. The lower level sets of $g$ are contained in $K$, so $g$ is $\tau$-inf-compact. Therefore, by Theorem 3.2.1 there exists some $\bar x \in X$ such that $g(\bar x) \le g(x)$ for every $x \in X$, that is,
$$f(\bar x) \le f(x) \quad \forall x \in K, \quad \bar x \in K,$$
which completes the proof. (Without the Hausdorff assumption, one applies Theorem 3.2.1 directly to the restriction of $f$ to $K$, equipped with the induced topology.)

The above statement is the unilateral version of the classical theorem which says that a continuous function attains its minimum and maximum values on any compact set. If one is concerned only with the minimization problem, one just needs to consider lower semicontinuous functions.

Corollary 3.2.4. Take $X = \mathbb R^N$ with the usual topology and $f : \mathbb R^N \to \mathbb R \cup \{+\infty\}$ lower semicontinuous and coercive. Then there exists some $\bar x \in \mathbb R^N$ such that
$$f(\bar x) \le f(x) \quad \forall x \in \mathbb R^N.$$
Proof. Since $f : \mathbb R^N \to \mathbb R \cup \{+\infty\}$ is coercive, it is inf-compact for the usual topology (Corollary 3.2.2). Combined with the lower semicontinuity of $f$, this implies the existence of a minimizer.

Comments on the direct methods of the calculus of variations. Theorems 3.2.1 and 3.2.2 provide both a general existence result for the global minimization problem of an extended real-valued function and a method for solving such problems, originally introduced by Hilbert and further developed by Tonelli: when considering a minimization problem for a function $f : X \to \mathbb R \cup \{+\infty\}$,
$$\min\{f(x) : x \in X\},$$
one first constructs a minimizing sequence, that is, a sequence $(x_n)_{n\in\mathbb N}$ such that
$$f(x_n) \to \inf_X f \quad \text{as } n \to +\infty.$$
This is always possible; at this point, we don't need any structure on $X$. Then one has to establish that the sequence $(x_n)_{n\in\mathbb N}$ is relatively compact for some topology $\tau$ on $X$, and this is how the topology $\tau$ appears. This usually comes from some estimates on $(x_n)_{n\in\mathbb N}$
which follow from a coercivity property of $f$ with respect to some norm on $X$. Then one may use weak topologies or some compact embeddings to find the topology $\tau$. We stress that there is great flexibility in this method. One may consider special minimizing sequences enjoying compactness properties which are not shared by the whole lower level sets. We will return to this important point. We just say here that it is the skill of the mathematician to find a minimizing sequence which is relatively compact for a topology $\tau$ which is as strong as possible. Indeed, and this is the second point of the direct method, one then has to verify that $f$ is $\tau$-lower semicontinuous. Clearly, the stronger the topology $\tau$, the easier it is to verify the lower semicontinuity property. As a general rule, inf-compactness and lower semicontinuity are two antagonistic properties: if $\tau_1$ is finer than $\tau_2$ ($\tau_1 \supset \tau_2$), then
$$f\ \tau_1\text{-inf-compact} \Rightarrow f\ \tau_2\text{-inf-compact}, \qquad\text{while}\qquad f\ \tau_2\text{-lsc} \Rightarrow f\ \tau_1\text{-lsc}.$$
Thus, a balance between these two properties determines the choice of a “good” topology $\tau$ (if it exists!).

As we will see, in some important situations, the function $f$ fails to be $\tau$-lower semicontinuous and there is no solution to the minimization problem of $f$. In such situations, it is still interesting to understand the behavior of the minimizing sequences. The following “relaxation result” gives a first general answer to this question.

Theorem 3.2.3. Let $(X, \tau)$ be a topological space and let $f : X \to \mathbb R \cup \{+\infty\}$ be an extended real-valued function. Let $(x_n)_{n\in\mathbb N}$ be a minimizing sequence for $f$, and suppose that a subsequence $(x_{n_k})$ $\tau$-converges to some $\bar x \in X$. Then
$$(\operatorname{cl}_\tau f)(\bar x) \le (\operatorname{cl}_\tau f)(x) \quad \forall x \in X,$$
that is, $\bar x$ is a minimum point for $\operatorname{cl}_\tau f$.

Proof. Since $(x_n)_{n\in\mathbb N}$ is a minimizing sequence,
$$\lim_k f(x_{n_k}) = \lim_n f(x_n) = \inf_X f.$$
By Proposition 3.2.6, since $\bar x = \tau\text{-}\lim_k x_{n_k}$,
$$(\operatorname{cl}_\tau f)(\bar x) \le \lim_k f(x_{n_k}).$$
By Proposition 3.2.7,
$$\inf_X f = \inf_X \operatorname{cl}_\tau f.$$
Hence
$$(\operatorname{cl}_\tau f)(\bar x) \le \lim_k f(x_{n_k}) = \lim_n f(x_n) = \inf_X f = \inf_X \operatorname{cl}_\tau f,$$
that is,
$$(\operatorname{cl}_\tau f)(\bar x) \le (\operatorname{cl}_\tau f)(x) \quad \forall x \in X.$$
We say that min{(clτ f )(x) : x ∈ X} is the relaxed problem of the initial minimization problem of f over X.
3.2.7
Weak topologies and minimization of weakly lower semicontinuous functions
Until now, the basic ingredients used in the direct approach to minimization problems have been purely topological notions. We now assume that the underlying space $(X, \tau)$ is a vector space. To stress this fact, we denote it by $V$ (like vector) and assume that $V$ is a normed space, the norm of $v \in V$ being denoted by $\|v\|_V$, or $\|v\|$ when no confusion is possible. The consideration of coercive functions $f : V \to \mathbb{R} \cup \{+\infty\}$ leads naturally to the study of the topological properties of the bounded subsets of $V$, and this is a basic reason for studying weak topologies on topological vector spaces. We recall from Theorems 2.4.2 and 2.4.3 the following result.

Theorem 3.2.4. In a reflexive Banach space $V$, the bounded sets are weakly relatively compact. Moreover, from any bounded sequence $(u_n)_{n\in\mathbb{N}}$ in $V$ one can extract a weakly convergent subsequence.

As a direct application of Proposition 3.2.8 and of the previous compactness result, we obtain that a coercive function on a reflexive Banach space is weakly inf-compact. By using the Weierstrass minimization Theorem 3.2.1, we obtain the following existence result.

Theorem 3.2.5. Let $V$ be a reflexive Banach space and $f : V \to \mathbb{R} \cup \{+\infty\}$ an extended real-valued function which is coercive and weakly lower semicontinuous. Then there exists some $u \in V$ such that $f(u) \le f(v)$ for all $v \in V$.

The question that now naturally arises is to describe the class of weakly lower semicontinuous functions. This is where convexity plays a central role.
3.3
Convex minimization theorems
Let us first recall some definitions and elementary properties of extended real-valued convex functions.
3.3.1
Extended real-valued convex functions and weak lower semicontinuity
Definition 3.3.1. Let V be a linear space and let f : V −→ R ∪ {+∞}. Then, f is said to be convex if for each u, v ∈ V and each λ ∈ [0, 1] we have f (λu + (1 − λ)v) ≤ λf (u) + (1 − λ)f (v). Proposition 3.3.1. Let V be a linear space and f : V −→ R ∪ {+∞}. Then, f is convex if and only if its epigraph is a convex subset of V × R.
Proof. Let us first assume that $f$ is convex. Fix $(u, \alpha)$ and $(v, \beta)$ in $\mathrm{epi}\, f$ and $\lambda \in [0, 1]$. Since $\alpha \ge f(u)$ and $\beta \ge f(v)$, $f(u)$ and $f(v)$ are finite and we have
$$\lambda\alpha + (1-\lambda)\beta \ge \lambda f(u) + (1-\lambda) f(v) \ge f(\lambda u + (1-\lambda) v).$$
This is equivalent to saying that $(\lambda u + (1-\lambda) v, \lambda\alpha + (1-\lambda)\beta) \in \mathrm{epi}\, f$, i.e., $\lambda(u, \alpha) + (1-\lambda)(v, \beta) \in \mathrm{epi}\, f$, and so $\mathrm{epi}\, f$ is convex. Conversely, let us assume that $\mathrm{epi}\, f$ is convex. Fix $u, v \in V$ and $\lambda \in [0, 1]$. If either $f(u) = +\infty$ or $f(v) = +\infty$, then, with the convention $0 \times (+\infty) = 0$, the inequality is clearly valid. So let us assume that $f(u) < +\infty$ and $f(v) < +\infty$. The two points $(u, f(u))$ and $(v, f(v))$ are in $\mathrm{epi}\, f$, and so is the segment joining these two points. In particular, $\lambda(u, f(u)) + (1-\lambda)(v, f(v)) \in \mathrm{epi}\, f$, which is equivalent to saying that
$$f(\lambda u + (1-\lambda) v) \le \lambda f(u) + (1-\lambda) f(v).$$

So it is equivalent to study convex sets or convex functions. Let us now recall the geometric version of the Hahn–Banach theorem, which plays a basic role in convex analysis (see Chapter 9).

Theorem 3.3.1 (Hahn–Banach separation theorem). Let $(V, \|\cdot\|)$ be a normed linear space and suppose that $C$ is a nonempty closed convex subset of $V$. Then each point $u \notin C$ can be strongly separated from $C$ by a closed hyperplane, which means
$$\exists\, u^* \in V^*,\ \exists\, \alpha \in \mathbb{R} \ \text{ such that } \ u^*(v) \le \alpha \ \ \forall v \in C \quad \text{and} \quad u^*(u) > \alpha.$$
This is equivalent to saying that $C$ is contained in the closed half-space $H_{\alpha,u^*} = \{v \in V : u^*(v) \le \alpha\}$, whereas $u$ lies in its complement.

Corollary 3.3.1. Let $(V, \|\cdot\|)$ be a normed linear space and let $C$ be a nonempty closed convex subset of $V$. Then $C$ is equal to the intersection of the closed half-spaces that contain it.

Theorem 3.3.1 and Corollary 3.3.1 have important consequences for the topological (closedness) properties of convex sets. By definition of the weak topology on $V$, any strongly continuous linear form is continuous for the weak topology, which implies that any closed half-space $H_{\alpha,u^*} = \{v \in V : u^*(v) \le \alpha\}$ is closed for the weak topology. So any closed convex set, being an intersection of closed half-spaces, is closed for the weak topology. Since, for an arbitrary set, the reverse implication “closed for the weak topology” $\Rightarrow$ “closed for the strong topology” is always true, we finally obtain the following result.
Theorem 3.3.2. Let $(V, \|\cdot\|)$ be a normed linear space and $C$ a nonempty convex subset of $V$. Then the following statements are equivalent:
(i) $C$ is closed for the norm topology of $V$;
(ii) $C$ is closed for the weak topology of $V$.

When translating the above theorem from sets to functions via the correspondence $f \mapsto \mathrm{epi}\, f$, and recalling that
$$f \text{ convex} \iff \mathrm{epi}\, f \text{ convex in } V \times \mathbb{R}, \qquad f\ \tau\text{-lsc} \iff \mathrm{epi}\, f \text{ closed in } (V, \tau) \times \mathbb{R},$$
we obtain the following result.

Theorem 3.3.3. Let $V$ be a normed linear space and $f : V \to \mathbb{R} \cup \{+\infty\}$ a proper convex function. The following statements are equivalent:
(i) $f$ is lower semicontinuous for the norm topology on $V$;
(ii) $f$ is lower semicontinuous for the weak topology on $V$.

Of course, it is the implication (i) $\Rightarrow$ (ii) which is important. It tells us that a convex lower semicontinuous function is automatically weakly lower semicontinuous. As a particular case, a convex continuous function is weakly lower semicontinuous.
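The role of convexity in Theorem 3.3.3 can be glimpsed in $\ell^2$, truncated here to $\mathbb{R}^N$ (the truncation is an assumption of the sketch, so weak convergence is only mimicked by testing against a few fixed vectors). The unit vectors $e_n$ converge weakly to $0$, since $\langle e_n, y\rangle = y_n \to 0$ for every $y \in \ell^2$, while $\|e_n\| = 1$. The convex continuous function $\|\cdot\|$ satisfies $\|0\| \le \liminf_n \|e_n\|$, as the theorem predicts, whereas the concave function $-\|\cdot\|$, although norm continuous, violates this inequality and is therefore not weakly lower semicontinuous. The dimension, the seed, and the test vectors are arbitrary choices.

```python
import math
import random

N = 2000  # l^2 truncated to R^N: an assumption of this sketch
random.seed(0)
# Five fixed test vectors y with square-summable coordinates y_k = g_k / (k + 1).
tests = [[random.gauss(0.0, 1.0) / (k + 1) for k in range(N)] for _ in range(5)]


def unit_vector(n: int) -> list[float]:
    v = [0.0] * N
    v[n] = 1.0
    return v


def inner(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(v: list[float]) -> float:
    return math.sqrt(inner(v, v))


def neg_norm(v: list[float]) -> float:
    """Continuous but concave, so Theorem 3.3.3 does not apply to it."""
    return 0.0 - norm(v)


indices = [10, 100, 1000, N - 1]
zero = [0.0] * N
# <e_n, y> = y_n -> 0 for each fixed y: a finite-dimensional shadow of e_n -> 0 weakly.
pairings = [max(abs(inner(unit_vector(n), y)) for y in tests) for n in indices]
norms = [norm(unit_vector(n)) for n in indices]

print("max_y |<e_n, y>|:", pairings)
# Convex case: ||0|| <= liminf ||e_n||, consistent with weak lower semicontinuity.
print(norm(zero), "<=", min(norms))
# Concave case: -||0|| = 0 > -1 = liminf -||e_n||, so -||.|| is not weakly lsc.
print(neg_norm(zero), ">", min(neg_norm(unit_vector(n)) for n in indices))
```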
3.3.2
Convex minimization in reflexive Banach spaces
In this section, $(V, \|\cdot\|)$ is a reflexive Banach space. We can now state the following important result.

Theorem 3.3.4. Let $(V, \|\cdot\|)$ be a reflexive Banach space and $f : V \to \mathbb{R} \cup \{+\infty\}$ a convex, lower semicontinuous, and coercive function. Then there exists $u \in V$ which minimizes $f$ on $V$:
$$f(u) \le f(v) \quad \forall v \in V.$$

First proof. Since $f$ is coercive, its lower level sets are bounded in $V$ and hence weakly relatively compact. Since $f$ is convex and lower semicontinuous, it is weakly lower semicontinuous (Theorem 3.3.3), so its lower level sets are also weakly closed, hence weakly compact: $f$ is weakly inf-compact. Then apply the Weierstrass minimization Theorem 3.2.1 to $f$ with $\tau$ equal to the weak topology of $V$.

Second proof. We use the direct method of the calculus of variations. Take a minimizing sequence $(u_n)_{n\in\mathbb{N}}$ for $f$, that is, $\lim_n f(u_n) = \inf_V f$. For $n$ sufficiently large, $u_n$ remains in a fixed sublevel set of $f$, which is bounded by coercivity of $f$. Because the space $V$ is reflexive, one can extract a weakly convergent subsequence $u_{n_k} \rightharpoonup u$. We have
$$\lim_k f(u_{n_k}) = \lim_n f(u_n) = \inf_V f.$$
The function $f$ is convex and lower semicontinuous, so it is weakly lower semicontinuous and
$$f(u) \le \liminf_k f(u_{n_k}).$$
It follows that
$$f(u) \le \liminf_k f(u_{n_k}) = \lim_n f(u_n) = \inf_V f,$$
that is, $f(u) \le f(v)$ for all $v \in V$.

Remark 3.3.1. Concerning the question of uniqueness, we need to recall the notion of strict convexity: the function $f : V \to \mathbb{R} \cup \{+\infty\}$ is said to be strictly convex if, for all $u \ne v$ in $\mathrm{dom}\, f$ and all $\lambda \in\, ]0, 1[$,
$$f(\lambda u + (1-\lambda) v) < \lambda f(u) + (1-\lambda) f(v).$$
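As a quick sanity check of this definition, one can test numerically, in $\mathbb{R}^n$ with the Euclidean inner product, the identity behind Example 3.3.1: $\lambda\|u\|^2 + (1-\lambda)\|v\|^2 - \|\lambda u + (1-\lambda) v\|^2 = \lambda(1-\lambda)\|u - v\|^2$, whose right-hand side is strictly positive when $u \ne v$ and $\lambda \in\, ]0, 1[$. The random sampling (dimensions, ranges, seed) is an arbitrary choice of this sketch, and a numerical check of course proves nothing by itself.

```python
import random


def sq_norm(v):
    return sum(x * x for x in v)


def convexity_gap(u, v, lam):
    """lam*||u||^2 + (1 - lam)*||v||^2 - ||lam*u + (1 - lam)*v||^2 in R^n."""
    w = [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]
    return lam * sq_norm(u) + (1.0 - lam) * sq_norm(v) - sq_norm(w)


random.seed(1)
max_error = 0.0
for _ in range(1000):
    dim = random.randint(1, 6)
    u = [random.uniform(-5.0, 5.0) for _ in range(dim)]
    v = [random.uniform(-5.0, 5.0) for _ in range(dim)]
    lam = random.uniform(0.01, 0.99)
    predicted = lam * (1.0 - lam) * sq_norm([a - b for a, b in zip(u, v)])
    scale = 1.0 + sq_norm(u) + sq_norm(v)  # for a relative error measure
    max_error = max(max_error, abs(convexity_gap(u, v, lam) - predicted) / scale)

print("largest relative deviation from the identity:", max_error)
```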
It is easily seen that the conclusion of Proposition 2.3.3 still holds for functions with values in $\mathbb{R} \cup \{+\infty\}$.

Example 3.3.1. Take for $V$ a Hilbert space with norm $\|\cdot\|^2 = \langle \cdot, \cdot \rangle$. Then $\|\cdot\|^2$ is strictly convex: indeed, for $v_1, v_2 \in V$ and $\lambda \in\, ]0, 1[$, we have
$$\|\lambda v_1 + (1-\lambda) v_2\|^2 - \lambda\|v_1\|^2 - (1-\lambda)\|v_2\|^2 = -\lambda(1-\lambda)\big(\|v_2\|^2 + \|v_1\|^2 - 2\langle v_1, v_2\rangle\big) = -\lambda(1-\lambda)\|v_1 - v_2\|^2 \le 0.$$
The above inequality becomes an equality iff $v_1 = v_2$.

As an application of the results above, we consider the problem of best approximation in Hilbert spaces.

Theorem 3.3.5. Let $(V, \|\cdot\|)$ be a Hilbert space with norm $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$. Given a closed convex nonempty subset $C$ of $V$ and $u_0 \in V$, there exists a unique element $\bar u \in C$ such that
$$\|u_0 - \bar u\| \le \inf_{v \in C} \|u_0 - v\|.$$
We have $\|u_0 - \bar u\| = d(u_0, C)$, that is, $\bar u \in C$ realizes the minimum of the distance between $u_0$ and $C$. We say that $\bar u$ is the projection of $u_0$ on $C$ and we write $\bar u = \mathrm{proj}_C u_0$. Moreover, $\bar u$ is characterized by the following property:
$$\bar u \in C, \qquad \langle u_0 - \bar u, v - \bar u\rangle \le 0 \quad \forall v \in C.$$

Because of the importance of this result, we give two proofs of independent interest. The first relies on Theorem 3.3.4 and is straightforward, but recall that we have used the weak topology to prove Theorem 3.3.4. The second is direct and completely elementary (it does not use the weak topology); it can be the starting point for developing a theory of Hilbert spaces at a more elementary level, without using weak topologies.

First proof. First notice that it is equivalent to have
$$\|u_0 - \bar u\| \le \|u_0 - v\| \quad \forall v \in C \qquad \text{or} \qquad \|u_0 - \bar u\|^2 \le \|u_0 - v\|^2 \quad \forall v \in C.$$
So our problem is equivalent to minimizing $f(v) = \|u_0 - v\|^2 + \delta_C(v)$ over $V$. Clearly $f$ is convex, as a sum of convex functions. It is strictly convex, because $v \mapsto \|u_0 - v\|^2$ is strictly convex and the sum of a convex and a strictly convex function is still strictly convex. Since $C$ is closed, $\delta_C$ is lower semicontinuous; since $v \mapsto \|u_0 - v\|^2$ is continuous, it is also lower semicontinuous; and $f$, as a sum of two lower semicontinuous functions, is lower semicontinuous. Finally, $f(v) \ge \|u_0 - v\|^2$, which is clearly coercive, and so is $f$. So $f$ is strictly convex, lower semicontinuous, coercive, and proper (because $C$ is nonempty). By Theorem 3.3.4 and Remark 3.3.1, it achieves its minimum at a unique point $\bar u$, which belongs to $C$.

Second proof. Let $(u_n)_{n\in\mathbb{N}}$ be a minimizing sequence, that is, $u_n \in C$ and
$$\|u_0 - u_n\|^2 \to \inf\{\|u_0 - v\|^2 : v \in C\} = d(u_0, C)^2.$$
Then use the parallelogram equality: given $n, m \in \mathbb{N}$,
$$2\|u_0 - u_n\|^2 + 2\|u_0 - u_m\|^2 = \|u_n - u_m\|^2 + 4\Big\|u_0 - \frac{u_n + u_m}{2}\Big\|^2.$$
Since $C$ is convex, $(u_n + u_m)/2$ belongs to $C$ and
$$\Big\|u_0 - \frac{u_n + u_m}{2}\Big\|^2 \ge d(u_0, C)^2.$$
Hence
$$\|u_n - u_m\|^2 \le 2\|u_0 - u_n\|^2 + 2\|u_0 - u_m\|^2 - 4 d(u_0, C)^2.$$
It follows that $\limsup_{n,m\to+\infty} \|u_n - u_m\|^2 \le 2d(u_0, C)^2 + 2d(u_0, C)^2 - 4d(u_0, C)^2 = 0$, that is, the sequence $(u_n)_{n\in\mathbb{N}}$ is a Cauchy sequence. Since $V$ is a Hilbert space, the sequence $(u_n)_{n\in\mathbb{N}}$ norm-converges to some element $\bar u$, which still belongs to $C$ because $C$ is closed. Moreover,
$$\|u_0 - \bar u\|^2 = \lim_n \|u_0 - u_n\|^2 = d(u_0, C)^2.$$
Uniqueness follows from the same inequality: if $\bar u_1, \bar u_2 \in C$ both realize $d(u_0, C)$, taking $u_n = \bar u_1$ and $u_m = \bar u_2$ gives $\|\bar u_1 - \bar u_2\|^2 \le 0$.
Let us now prove the optimality condition for $\bar u$, that is,
$$\bar u \in C, \qquad \langle u_0 - \bar u, v - \bar u\rangle \le 0 \quad \forall v \in C.$$
This property says that $\bar u$ is characterized by the following geometric property: for any $v \in C$, the angle between the two vectors $u_0 - \bar u$ and $v - \bar u$ (whenever both are nonzero) is greater than or equal to $\pi/2$. We will later derive this property from general subdifferential calculus. For the moment, we give a direct elementary proof of it.
For any $v \in C$, by convexity of $C$, the line segment $[\bar u, v]$ is contained in $C$; hence, for all $t \in [0, 1]$, $w_t = tv + (1-t)\bar u$ belongs to $C$. By definition of $\bar u$,
$$\|u_0 - \bar u\|^2 \le \|u_0 - tv - (1-t)\bar u\|^2 = \|(u_0 - \bar u) - t(v - \bar u)\|^2.$$
By expanding this last expression, we obtain
$$2t\langle u_0 - \bar u, v - \bar u\rangle \le t^2 \|v - \bar u\|^2.$$
Dividing by $t \in\, ]0, 1]$ and then letting $t \to 0^+$, we obtain $\langle u_0 - \bar u, v - \bar u\rangle \le 0$.

Conversely, let us prove that if $\bar u \in C$ satisfies the optimality condition above, then
$$\|u_0 - \bar u\| \le \inf_{v \in C} \|u_0 - v\|.$$
First notice that the optimality condition implies
$$\forall v \in C, \qquad \langle u_0 - v, v - \bar u\rangle \le 0. \tag{3.35}$$
Indeed,
$$\langle u_0 - v, v - \bar u\rangle = \langle u_0 - \bar u + \bar u - v, v - \bar u\rangle = \langle u_0 - \bar u, v - \bar u\rangle - \|\bar u - v\|^2 \le 0.$$
We then have
$$\begin{aligned} \|u_0 - \bar u\|^2 &= \langle u_0 - \bar u, u_0 - v + v - \bar u\rangle \le \langle u_0 - \bar u, u_0 - v\rangle \\ &= \langle u_0 - v + v - \bar u, u_0 - v\rangle = \|u_0 - v\|^2 + \langle u_0 - v, v - \bar u\rangle \le \|u_0 - v\|^2, \end{aligned}$$
where we have used the optimality condition in the first inequality and relation (3.35) in the last one.

Corollary 3.3.2. When $C = W$ is a closed subspace, $\bar u = \mathrm{proj}_W u_0$ is characterized by $\bar u \in W$, $u_0 - \bar u \in W^\perp$, that is,
$$u_0 = (u_0 - \bar u) + \bar u \in W^\perp + W.$$
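This is the orthogonal decomposition $V = W \oplus W^\perp$ of $V$ as the sum of two orthogonal subspaces. Moreover, the projection operator $\mathrm{proj}_W : V \to W$ is linear.

Proposition 3.3.2. When $V$ is a Hilbert space and $C$ is a closed convex nonempty subset of $V$, the projection operator $V \to C$, which associates to each $u \in V$ its projection $\mathrm{proj}_C u$ on $C$, is a contraction (i.e., it is 1-Lipschitz):
$$\|\mathrm{proj}_C u - \mathrm{proj}_C v\| \le \|u - v\| \quad \forall u, v \in V.$$

Proof. We have
$$\langle u - \mathrm{proj}_C u, z - \mathrm{proj}_C u\rangle \le 0 \quad \forall z \in C, \qquad \langle v - \mathrm{proj}_C v, z - \mathrm{proj}_C v\rangle \le 0 \quad \forall z \in C.$$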
Take $z = \mathrm{proj}_C v$ in the first inequality and $z = \mathrm{proj}_C u$ in the second one. Summing up, we obtain
$$\langle \mathrm{proj}_C v - \mathrm{proj}_C u,\ u - \mathrm{proj}_C u - v + \mathrm{proj}_C v\rangle \le 0,$$
that is,
$$\|\mathrm{proj}_C v - \mathrm{proj}_C u\|^2 \le \langle \mathrm{proj}_C v - \mathrm{proj}_C u, v - u\rangle.$$
By the Cauchy–Schwarz inequality, it follows that $\|\mathrm{proj}_C v - \mathrm{proj}_C u\| \le \|v - u\|$.

We end this section by remarking that the Hilbertian structure plays a fundamental role in the previous results on best approximation. The existence of a best approximation $\bar u$ still holds when $(V, \|\cdot\|)$ is a reflexive Banach space. But when the space $(V, \|\cdot\|)$ is not reflexive, even the existence of $\bar u$ may fail, as shown by the following example.

Example 3.3.2. Take $V = C([0, 1]; \mathbb{R})$ equipped with the sup norm
$$\|v\|_\infty = \sup\{|v(t)| : t \in [0, 1]\} \quad \forall v \in V.$$
Then $(V, \|\cdot\|_\infty)$ is a Banach space which is not reflexive. Indeed, this is a consequence of the fact that the existence of a projection may fail in this space: take
$$C = \Big\{v \in V : \int_0^{1/2} v(t)\,dt - \int_{1/2}^1 v(t)\,dt = 1\Big\}.$$
Clearly $C$ is a closed convex nonempty subset of $V$ (it is in fact a closed hyperplane). One can easily verify that $d(0, C) = 1$, but there is no element $v \in C$ such that $\|v\|_\infty = 1$. Indeed, if $v \in C$,
$$1 = \int_0^{1/2} v(t)\,dt - \int_{1/2}^1 v(t)\,dt \le \int_0^{1/2} |v(t)|\,dt + \int_{1/2}^1 |v(t)|\,dt \le \frac12\|v\|_\infty + \frac12\|v\|_\infty = \|v\|_\infty.$$
Hence $d(0, C) = \inf\{\|v\|_\infty : v \in C\} \ge 1$, and it is not difficult to show that $d(0, C) = 1$. Suppose now that $\|v\|_\infty = 1$ for some $v \in C$. Since
$$\Big|\int_0^{1/2} v(t)\,dt\Big| \le \frac12, \qquad \Big|\int_{1/2}^1 v(t)\,dt\Big| \le \frac12,$$
we necessarily have $\int_0^{1/2} v(t)\,dt = \frac12$ and $\int_{1/2}^1 v(t)\,dt = -\frac12$.
So $\int_0^{1/2} (1 - v(t))\,dt = 0$. Since $1 - v$ is continuous and $1 - v(t) \ge 0$, this implies $v(t) \equiv 1$ on $[0, \frac12]$. Similarly, $v(t) \equiv -1$ on $[\frac12, 1]$, a clear contradiction with the fact that $v$ has to be continuous.
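The claim $d(0, C) = 1$ in Example 3.3.2 can be made concrete. For $0 < \delta < 1/2$, let $v_\delta$ equal $a$ on $[0, \frac12 - \delta]$, $-a$ on $[\frac12 + \delta, 1]$, and be linear in between, with $a = 1/(1-\delta)$; a direct computation gives $\int_0^{1/2} v_\delta - \int_{1/2}^1 v_\delta = a(1-\delta) = 1$, so $v_\delta \in C$ and $\|v_\delta\|_\infty = 1/(1-\delta) \to 1$, while no element of $C$ has norm exactly $1$. This family is our own choice (the text does not specify one); the sketch checks the constraint with a midpoint rule whose number of subintervals is arbitrary.

```python
from functools import partial


def v_delta(t: float, delta: float) -> float:
    """Continuous, piecewise linear element of C (hypothetical family):
    a on [0, 1/2 - delta], -a on [1/2 + delta, 1], linear in between."""
    a = 1.0 / (1.0 - delta)
    if t <= 0.5 - delta:
        return a
    if t >= 0.5 + delta:
        return -a
    return a * (0.5 - t) / delta  # goes from a to -a, vanishing at t = 1/2


def midpoint_integral(g, lo: float, hi: float, m: int = 100_000) -> float:
    """Composite midpoint rule with m subintervals."""
    h = (hi - lo) / m
    return h * sum(g(lo + (i + 0.5) * h) for i in range(m))


results = []
for delta in (0.2, 0.05, 0.01):
    v = partial(v_delta, delta=delta)
    constraint = midpoint_integral(v, 0.0, 0.5) - midpoint_integral(v, 0.5, 1.0)
    sup_norm = max(abs(v(i / 10_000)) for i in range(10_001))
    results.append((delta, constraint, sup_norm))
    print(f"delta={delta}: constraint={constraint:.8f}, sup norm={sup_norm:.6f}")
```

The sup norm decreases toward $1$ as $\delta \to 0$, but each $v_\delta$ keeps norm strictly above $1$, in line with the nonexistence of a projection of $0$ on $C$.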
3.4
Ekeland’s ε-variational principle
The so-called ε-variational principle was introduced by Ekeland in 1972 [128], [129], [130]. It is a powerful general tool in variational analysis and optimization, which can be traced back to a maximality result for a partial ordering introduced by Bishop and Phelps in 1962 [70]. Ekeland’s ε-variational principle asserts the existence of minimizing sequences of a particular kind: not only do they approach the infimal value of the minimization problem, but they also satisfy the first-order necessary optimality conditions up to any desired approximation. This is why, in many instances, this variational principle plays a key role in the application of the direct method. Indeed, Ekeland’s ε-variational principle has enjoyed considerable success in a wide variety of topics in nonlinear analysis and optimization (critical point theory, geometry of Banach spaces, etc.). In the last two decades, there has been increasing evidence that Ekeland’s ε-variational principle bears close connections with dissipative dynamical systems (dynamical systems with entropy). Indeed, solutions provided by the ε-variational principle can be seen as stable equilibria of such dynamics [92], [46]. In particular, the recent model in dynamical decision theory introduced by Attouch and Soubeyran [37] will serve as a guideline in the proof and interpretation of the results.
3.4.1
Ekeland’s ε-variational principle and the direct method
Let us start with the following formulation of Ekeland’s ε-variational principle.

Theorem 3.4.1 (Ekeland). Let $(X, d)$ be a complete metric space and $f : X \to \mathbb{R} \cup \{+\infty\}$ an extended real-valued function which is lower semicontinuous and bounded below ($\inf_X f > -\infty$). Then, for each $\varepsilon > 0$, there exists some $x_\varepsilon \in X$ which satisfies the following two properties:
(i) $\inf_X f \le f(x_\varepsilon) \le \inf_X f + \varepsilon$;
(ii) $f(x) \ge f(x_\varepsilon) - \varepsilon\, d(x, x_\varepsilon)$ for all $x \in X$.
Let us first comment on this result and show some direct consequences of it (its proof is postponed to the next section). Condition (ii) has a clear interpretation when $f : X \to \mathbb{R}$ is Gateaux differentiable on a Banach space $(X, \|\cdot\|)$: it can be seen as a unilateral, nonsmooth version of the condition $\|Df(x_\varepsilon)\|_* \le \varepsilon$. Let us start from the definition of the Gateaux differentiability of $f$ at $x_\varepsilon$: for any $\xi \in X$ with $\|\xi\| = 1$ and any $t > 0$,
$$f(x_\varepsilon + t\xi) = f(x_\varepsilon) + t\langle Df(x_\varepsilon), \xi\rangle + o(t).$$
By taking $x = x_\varepsilon + t\xi$ in (ii), so that $d(x, x_\varepsilon) = t$, and using the above equality, we obtain
$$f(x_\varepsilon) + t\langle Df(x_\varepsilon), \xi\rangle + o(t) \ge f(x_\varepsilon) - \varepsilon t.$$
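Let us simplify, divide by $t > 0$, and let $t \to 0^+$. We obtain $\langle Df(x_\varepsilon), \xi\rangle \ge -\varepsilon$. Changing $\xi$ into $-\xi$ yields
$$|\langle Df(x_\varepsilon), \xi\rangle| \le \varepsilon.$$
This being true for any $\xi \in X$ with $\|\xi\| = 1$, we finally obtain $\|Df(x_\varepsilon)\|_* \le \varepsilon$. We can summarize this result in the following corollary.

Corollary 3.4.1. Let $(X, \|\cdot\|)$ be a Banach space and $f : X \to \mathbb{R}$ a real-valued function which is lower semicontinuous, Gateaux differentiable, and bounded below. Then, for each $\varepsilon > 0$, there exists some $x_\varepsilon \in X$ such that
$$\inf_X f \le f(x_\varepsilon) \le \inf_X f + \varepsilon, \qquad \|Df(x_\varepsilon)\|_* \le \varepsilon.$$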
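A simple case where the infimum is not attained illustrates Corollary 3.4.1: $X = \mathbb{R}$ and $f(x) = e^x$, which is $C^1$ and bounded below with $\inf_X f = 0$. The explicit choice $x_\varepsilon = \ln \varepsilon$ (our own computation, not from the text) gives $f(x_\varepsilon) = f'(x_\varepsilon) = \varepsilon$, and property (ii) of Theorem 3.4.1 follows from convexity: $e^x \ge \varepsilon(1 + x - x_\varepsilon) \ge \varepsilon - \varepsilon|x - x_\varepsilon|$. The sketch checks these facts on a grid; the grid and the tolerances are arbitrary choices.

```python
import math

INF_F = 0.0  # inf of exp over R; it is not attained


def f(x: float) -> float:
    return math.exp(x)


def df(x: float) -> float:
    return math.exp(x)  # the derivative of exp is exp


def x_eps(eps: float) -> float:
    """Explicit epsilon-point for f = exp (our own choice): f(x) = f'(x) = eps."""
    return math.log(eps)


checks = []
for eps in (1e-1, 1e-3, 1e-6):
    x = x_eps(eps)
    near_inf = INF_F <= f(x) <= INF_F + eps * (1 + 1e-12)  # property (i)
    small_grad = abs(df(x)) <= eps * (1 + 1e-12)          # |f'(x_eps)| <= eps
    grid = [x + 0.01 * k for k in range(-2000, 2001)]
    ekeland_ii = all(f(y) >= f(x) - eps * abs(y - x) for y in grid)  # property (ii)
    checks.append((eps, near_inf, small_grad, ekeland_ii))
    print(f"eps={eps:g}: x_eps={x:.4f}", near_inf, small_grad, ekeland_ii)
```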
The above result asserts the existence of minimizing sequences $(x_n)_{n\in\mathbb{N}}$ of a particular type: take $x_n = x_{\varepsilon_n}$ with $\varepsilon_n \to 0^+$; then
$$f(x_n) \to \inf_X f \quad \text{and} \quad Df(x_n) \to 0 \text{ in } X^* \quad \text{as } n \to +\infty.$$
Applying the direct method to such particular minimizing sequences leads us naturally to introduce the so-called Palais–Smale compactness condition for a functional $f$.

Definition 3.4.1. Let $(X, \|\cdot\|)$ be a Banach space. We say that a $C^1$ function $f : X \to \mathbb{R}$ satisfies the Palais–Smale condition if every sequence $(x_n)_{n\in\mathbb{N}}$ in $X$ which satisfies
$$\sup_n |f(x_n)| < +\infty \quad \text{and} \quad Df(x_n) \to 0 \text{ in } X^* \text{ as } n \to +\infty$$
possesses a convergent subsequence (for the norm topology of $X$).

As an immediate consequence of Corollary 3.4.1, we obtain the next theorem.

Theorem 3.4.2. Let $(X, \|\cdot\|)$ be a Banach space and $f : X \to \mathbb{R}$ a $C^1$ function which satisfies the Palais–Smale condition and which is bounded below. Then the infimum of $f$ on $X$ is achieved at some point $\bar x \in X$, and $\bar x$ is a critical point of $f$, i.e., $Df(\bar x) = 0$.

Proof. Using Corollary 3.4.1 of Ekeland’s ε-variational principle, we obtain a sequence $(x_n)_{n\in\mathbb{N}}$ which satisfies
$$f(x_n) \to \inf_X f, \qquad Df(x_n) \to 0.$$
Since $\inf_X f \in \mathbb{R}$, we have $\sup_n |f(x_n)| < +\infty$, and the sequence $(x_n)_{n\in\mathbb{N}}$ satisfies the hypotheses of the Palais–Smale condition. Hence, one can extract a convergent subsequence $x_{n_k} \to \bar x$. By using the continuity of $f$ and $Df$, one gets at the limit $f(\bar x) = \inf_X f$ and $Df(\bar x) = 0$.

Judicious applications of this kind of result (based on the Palais–Smale compactness condition) provide existence results for critical points which are not only local minima or maxima but also saddle points. One of the most celebrated of these results is the mountain pass theorem of Ambrosetti and Rabinowitz [12]. For further results in this direction, see [46] or [121].

Moreover, when $f$ is not necessarily smooth, condition (ii) turns out to be a convenient formulation of an ε-approximate optimality condition. The key is the following observation: property (ii) of Theorem 3.4.1 just expresses that $x_\varepsilon$ is an exact solution of the perturbed minimization problem
$$\inf\{f(x) + \varepsilon\, d(x, x_\varepsilon) : x \in X\}. \tag{$P_\varepsilon$}$$
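The function $f(x) = e^x$ on $\mathbb{R}$ (an illustrative choice, not from the text) shows both sides of this discussion. Since $\inf f = 0$ is not attained, Theorem 3.4.2 implies that $f$ cannot satisfy the Palais–Smale condition; the sequence $x_n = -n$ is a witness: $f(x_n)$ stays bounded and $f'(x_n) = e^{-n} \to 0$, yet $|x_n - x_m| \ge 1$ for $n \ne m$. On the other hand, with $x_\varepsilon = \ln \varepsilon$ (so that $f(x_\varepsilon) = f'(x_\varepsilon) = \varepsilon$), the point $x_\varepsilon$ is an exact minimizer of the perturbed problem $(P_\varepsilon)$ with $d(x, y) = |x - y|$: the objective $e^y + \varepsilon|y - x_\varepsilon|$ is decreasing for $y < x_\varepsilon$ and increasing for $y > x_\varepsilon$. The grid search below confirms this; the grid and $\varepsilon$ are arbitrary.

```python
import math

EPS = 1e-3
X_EPS = math.log(EPS)  # f(X_EPS) = f'(X_EPS) = EPS for f = exp


def perturbed(y: float) -> float:
    """Objective of (P_eps) for f = exp on X = R with d(x, y) = |x - y|."""
    return math.exp(y) + EPS * abs(y - X_EPS)


grid = [X_EPS + 0.01 * k for k in range(-1000, 1001)]
best = min(grid, key=perturbed)  # grid point with the smallest perturbed value

# Palais–Smale fails: x_n = -n has |f(x_n)| <= 1 and f'(x_n) = e^{-n} -> 0,
# but distinct terms stay at distance >= 1, so no subsequence is Cauchy.
seq = [-float(n) for n in range(1, 41)]
values = [math.exp(x) for x in seq]  # equal to both f(x_n) and f'(x_n)
separated = all(abs(a - b) >= 1.0 for i, a in enumerate(seq) for b in seq[i + 1:])

print("argmin of (P_eps) on the grid equals x_eps:", best == X_EPS)
print("sup |f(x_n)|:", max(values), " last f'(x_n):", values[-1], " separated:", separated)
```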
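In the particular and important case where $f$ is convex and lower semicontinuous on a Banach space, one gets the following result.

Corollary 3.4.2. Let $(X, \|\cdot\|)$ be a Banach space and $f : X \to \mathbb{R} \cup \{+\infty\}$ an extended real-valued function which is convex, lower semicontinuous, proper ($f \not\equiv +\infty$), and bounded below. Then, for each $\varepsilon > 0$, there exist $x_\varepsilon \in X$ and $x_\varepsilon^* \in X^*$ such that
$$\inf_X f \le f(x_\varepsilon) \le \inf_X f + \varepsilon, \qquad x_\varepsilon^* \in \partial f(x_\varepsilon), \quad \|x_\varepsilon^*\|_* \le \varepsilon,$$
where $\partial f(x_\varepsilon)$ is the subdifferential of $f$ at $x_\varepsilon$.

Proof. We use standard tools from convex subdifferential calculus (see Chapter 9). Since $x_\varepsilon$ minimizes the closed convex proper function $x \mapsto \varphi(x) := f(x) + \varepsilon\|x - x_\varepsilon\|$, we have $0 \in \partial\varphi(x_\varepsilon)$. The norm being a continuous function on $X$, the additivity rule of subdifferential calculus holds (Theorem 9.5.4), and we have $0 \in \partial f(x_\varepsilon) + \varepsilon B(0, 1)$, where $B(0, 1)$ denotes the closed unit ball of $X^*$ (the subdifferential of the norm at the origin). Equivalently, there exists some $x_\varepsilon^* \in \partial f(x_\varepsilon)$ with $\|x_\varepsilon^*\|_* \le \varepsilon$.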
3.4.2
A dynamical approach and proof of Ekeland’s ε-variational principle
Ekeland’s ε-variational principle bears close connections with dissipative dynamical systems. This fact was recognized by Brezis and Browder [92]: “A general ordering principle”; Aubin and Ekeland [46]: “walking in complete metric spaces”; and Zeidler [227]: “The abstract entropy principle.” More recently, the importance of this principle in the modeling of dynamical decision-making with bounded rationality was brought to the fore by Attouch and Soubeyran [37]. This cognitive interpretation will serve as a guideline throughout this section. The central concept in the dynamical approach to Ekeland’s ε-variational principle is the following partial ordering relation.
Definition 3.4.2. Let $(X, d)$ be a metric space and $f : X \to \mathbb{R} \cup \{+\infty\}$ an extended real-valued function which is proper ($f \not\equiv +\infty$). Let us introduce the following partial ordering on $X$:
$$y \preceq_S x \iff f(y) + d(x, y) \le f(x).$$
We call it the marginal satisficing relation. We write
$$S(x) = \{y \in X : y \preceq_S x\} = \{y \in X : f(y) + d(x, y) \le f(x)\}$$
for the set of elements of $X$ which satisfy this ordering with respect to $x$.

Let us introduce some elements of decision theory that allow us to interpret this relation in a natural and intuitive way. The space $X$ is the decision or performance space (the state space in physics). It is supposed that to each element $x \in X$ the agent is able to attribute a value or valence $f(x) \in \mathbb{R} \cup \{+\infty\}$ which measures the quality of the decision or performance $x$ (the value $+\infty$ allows us to take account of constraints). For example, when performing $x$, $f(x)$ measures how far the agent is from a given goal. In our context, $f(x)$ measures the dissatisfaction of the agent who, choosing $x \in X$, is faced with a problem which is not completely solved. Thus the agent is willing to reduce its dissatisfaction and make $f(x)$ as small as possible. The connection with the traditional formulation in decision sciences is obtained by taking $f(x) = \bar g - g(x)$, where $g$ is a classical utility or gain function and $\bar g$ is a desirable level of resolution of the problem (for example $\bar g = \sup_{x\in X} g(x)$). We choose this presentation to fit well with the classical formulation of variational principles in mathematics and physics and, in our situation, with the usual formulation of Ekeland’s ε-variational principle. Classical decision theory deals with perfectly rational agents who have immediate and free access to a global knowledge of their environment, and correspondingly minimize their value function $f$ on $X$. Modeling decision processes in a complex real world requires us to introduce some further notions.
Following Simon’s [207] pioneering work in decision theory and bounded rationality, one needs to model the ability of the agent, and the difficulty it faces, to move and decide in a complex environment. A major difficulty for the agent is that it needs to explore its environment and get enough information to make further decisions. In this context, decision making becomes a dynamical process which at each step $k = 1, 2, \dots$ is based on the following question: is it worthwhile for the agent to pass from a given state $x_k \in X$ (performance, decision, allocation) at time $t_k$ to a further state $x_{k+1} \in X$ at time $t_{k+1}$? A key ingredient in modeling this balance between the advantage for the agent of passing from $x$ to $y$ and the possibility and difficulty of realizing it is the notion of cost to change. Following Attouch and Soubeyran [37], one introduces, for any $x$ and $y$ in $X$, $c(x, y) \ge 0$, which is the cost to pass (change, move) from $x$ to $y$. In our context, we assume that $c(x, y) \ge \theta\, d(x, y)$, where $d$ is a metric on $X$ and $\theta > 0$ is a unit cost to move. This expresses that the cost to move is high for small displacements, in the sense that it is at least proportional to the distance traveled (by contrast with $c(x, y) = d(x, y)^2$, for example!). The metric $d$ models the difficulty for the agent to pass from a state $x$ to a further state $y$. This is where the metric $d$ appears in the cognitive interpretation of Ekeland’s ε-variational principle! This is a rich concept which covers several aspects: there are costs to explore and get information (these come from the limited time and energy available to the agent), physical costs to move, and also costs with a psychological and cognitive interpretation (dissimilarity costs, costs to quit a routine and enter into another one, excitation and inhibition costs). From now on, we consider the particular situation $c(x, y) = d(x, y)$, which is enough for our purpose.

We now have the two terms of the balance: on one hand, the marginal gain $f(x) - f(y)$, and on the other hand, the cost to change $d(x, y)$. Precisely, the marginal satisficing relation $y \preceq_S x$ says that it is worthwhile for the agent to pass from $x$ to $y$ if the expected marginal gain (“reduction of dissatisfaction”) $f(x) - f(y)$ is greater than or equal to the cost to pass from $x$ to $y$:
$$f(x) - f(y) \ge d(x, y).$$
Let us examine some properties of the marginal satisficing relation $\preceq_S$.

Proposition 3.4.1. The marginal satisficing relation $\preceq_S$ is a partial ordering relation on $\mathrm{dom}\, f$. An element $\bar x \in X$ is maximal with respect to this order iff, for all $x \in X$ with $x \ne \bar x$, one has
$$f(\bar x) < f(x) + d(\bar x, x).$$

Proof. Clearly $\preceq_S$ is reflexive: we have $x \preceq_S x$ for all $x \in X$, because $d(x, x) = 0$. Let us verify that $\preceq_S$ is antisymmetric. Suppose $y \preceq_S x$ and $x \preceq_S y$. We thus have $f(x) \ge f(y) + d(x, y)$ and $f(y) \ge f(x) + d(x, y)$. Let us add these two inequalities and simplify the resulting expression. (At this point, note that it is important to consider $x$ and $y$ in $\mathrm{dom}\, f$.) One obtains $2d(x, y) \le 0$, and hence $x = y$. Let us now verify that $\preceq_S$ is transitive. Suppose that $z \preceq_S y$ and $y \preceq_S x$. We have
$$f(y) \ge f(z) + d(y, z), \qquad f(x) \ge f(y) + d(x, y).$$
By adding the two above inequalities, and using again that $x, y, z \in \mathrm{dom}\, f$, we obtain
$$f(x) \ge f(z) + d(x, y) + d(y, z) \ge f(z) + d(x, z).$$
In the last inequality we used the triangle inequality satisfied by the metric $d$. Hence $z \preceq_S x$, and $\preceq_S$ is transitive.
Let us now express that an element $\bar x$ of $X$ is maximal with respect to the partial ordering relation $\preceq_S$. This means that, for any $x \in X$, the following implication holds:
$$x \preceq_S \bar x \Rightarrow x = \bar x, \qquad \text{i.e.,} \qquad \forall x \in X,\ x \ne \bar x: \quad f(\bar x) < f(x) + d(\bar x, x),$$
which completes the proof.
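On a finite set, the marginal satisficing relation and its maximal elements can be computed exhaustively. The space $X = \{0, \dots, 10\}$ with $d(x, y) = |x - y|$ and the integer values of $f$ below are hypothetical data chosen for illustration; integer values keep every comparison exact. The sketch checks the three axioms of Proposition 3.4.1 and lists the maximal elements: there are two, only one of which is a global minimizer, so a stable routine need not be optimal. Anticipating the dynamics (S) introduced below, it also runs two trajectories from $x_0 = 0$: a "greedy" agent that moves to the point of $S(x_k)$ with the smallest value of $f$, and a "cautious" agent that makes the shortest worthwhile move. Both selection rules are our own choices, not the construction used in the text.

```python
from itertools import product

# Hypothetical finite example (not from the text): X = {0, ..., 10} with
# d(x, y) = |x - y| and integer values f, so all comparisons are exact.
X = list(range(11))
F = [9, 6, 3, 4, 8, 12, 7, 2, 0, 3, 10]


def d(x: int, y: int) -> int:
    return abs(x - y)


def preceq(y: int, x: int) -> bool:
    """Marginal satisficing relation: y <=_S x iff f(y) + d(x, y) <= f(x)."""
    return F[y] + d(x, y) <= F[x]


def S(x: int) -> list[int]:
    return [y for y in X if preceq(y, x)]


# Partial-order axioms of Proposition 3.4.1, checked exhaustively.
assert all(preceq(x, x) for x in X)
assert all(x == y for x, y in product(X, X) if preceq(x, y) and preceq(y, x))
assert all(preceq(z, x) for x, y, z in product(X, X, X) if preceq(z, y) and preceq(y, x))

# Maximal elements: f(xbar) < f(x) + d(xbar, x) for every x != xbar.
maximal = [xb for xb in X if all(F[xb] < F[x] + d(xb, x) for x in X if x != xb)]
print("maximal elements:", maximal)


def trajectory(x0: int, choose) -> list[int]:
    """Follow x_{k+1} = choose(x_k, moves) with moves = S(x_k) minus {x_k},
    stopping when S(x_k) = {x_k}, i.e. at a maximal element."""
    path = [x0]
    while True:
        moves = [y for y in S(path[-1]) if y != path[-1]]
        if not moves:
            return path
        path.append(choose(path[-1], moves))


greedy = trajectory(0, lambda x, moves: min(moves, key=lambda y: F[y]))
cautious = trajectory(0, lambda x, moves: min(moves, key=lambda y: d(x, y)))
print("greedy:", greedy, " cautious:", cautious)
```

Each move decreases $f$ by at least $1$ (distinct points are at distance at least $1$ and $f$ is integer-valued), so both loops terminate; the total length of each path is bounded by $f(x_0) - \min f$, in line with the bound of Proposition 3.4.2 below.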
Cognitive interpretation of maximal elements for $\preceq_S$. Let us show that maximal elements of the satisficing relation $\preceq_S$ can be interpreted as stable routines of the corresponding “worthwhile to move” (marginal satisficing) dynamical system. In our cognitive version, $\bar x \in X$ is said to be a stable routine if, starting from $\bar x$, the agent prefers to stay at $\bar x$ rather than move from $\bar x$ to $x$, for every $x$ different from $\bar x$. Let us make this precise and consider, when starting from $\bar x$, the following two possibilities:
(a) if the agent chooses to stay at $\bar x$, his gain (dissatisfaction, in our case) will be $f(\bar x) + d(\bar x, \bar x) = f(\bar x)$;
(b) if the agent considers moving from $\bar x$ to $x$, his gain (dissatisfaction) after moving is $f(x) + d(\bar x, x)$ (one adds two dissatisfactions: the cost to move $d(\bar x, x)$ and the dissatisfaction attached to $x$).
Thus, the agent is willing to stay at $\bar x$ and rejects any move from $\bar x$ to $x$, for all $x \ne \bar x$, iff
$$f(\bar x) < f(x) + d(\bar x, x),$$
which, by Proposition 3.4.1, just expresses that $\bar x$ is maximal for $\preceq_S$. Therefore, Ekeland’s ε-variational principle can be reformulated as an existence result for a maximal element of the partial ordering $\preceq_S$. Let us make this precise in the following statement.

Theorem 3.4.3. Let us assume that $(X, d)$ is a complete metric space and $f : X \to \mathbb{R} \cup \{+\infty\}$ is an extended real-valued function which is lower semicontinuous and bounded below. Then, for any $x_0 \in \mathrm{dom}\, f$, there exists some $\bar x \in X$ which satisfies the following two properties:
(i) $\bar x \preceq_S x_0$;
(ii) $\bar x$ is maximal with respect to the partial ordering $\preceq_S$.

Before proving Theorem 3.4.3, let us show how Ekeland’s ε-variational principle can be derived from it. Given $\varepsilon > 0$, take $x_0 \in \mathrm{dom}\, f$ such that
$$\inf_X f \le f(x_0) \le \inf_X f + \varepsilon.$$
Then let us apply Theorem 3.4.3 with the metric $\varepsilon d$ (which is still complete) and the corresponding satisficing relation
$$y \preceq_S x \iff f(y) + \varepsilon\, d(x, y) \le f(x).$$
Theorem 3.4.3 asserts the existence of $\bar x_\varepsilon$ such that $\bar x_\varepsilon \preceq_S x_0$ and $\bar x_\varepsilon$ is maximal with respect to $\preceq_S$. The property $\bar x_\varepsilon \preceq_S x_0$ implies
$$f(\bar x_\varepsilon) \le f(x_0) - \varepsilon\, d(\bar x_\varepsilon, x_0) \le f(x_0) \le \inf_X f + \varepsilon.$$
On the other hand, by Proposition 3.4.1 and the maximality property of $\bar x_\varepsilon$, we have
$$f(\bar x_\varepsilon) < f(x) + \varepsilon\, d(\bar x_\varepsilon, x) \quad \forall x \ne \bar x_\varepsilon.$$
Thus, $\bar x_\varepsilon$ satisfies the two desired properties (i) and (ii) of Theorem 3.4.1.

We are going to prove Theorem 3.4.3 (and hence Ekeland’s ε-variational principle, Theorem 3.4.1) by using the dynamical system which is naturally associated with the marginal satisficing relation.

Definition 3.4.3. A trajectory $(x_k)_{k\in\mathbb{N}}$ of the marginal satisficing dynamics is a sequence of elements $x_k$ of $X$ such that
$$x_{k+1} \in S(x_k) \quad \forall k = 0, 1, 2, \dots, \tag{S}$$
where $S(\cdot)$ is the set-valued map associated with the marginal satisficing relation (Definition 3.4.2). Equivalently,
$$x_0 \succeq_S x_1 \succeq_S x_2 \succeq_S \cdots \succeq_S x_k \succeq_S x_{k+1} \succeq_S \cdots,$$
that is, $f(x_{k+1}) + d(x_k, x_{k+1}) \le f(x_k)$ for all $k = 0, 1, 2, \dots$.

Let us establish some general properties of the trajectories of the above dynamical system (S). We are mostly concerned with the asymptotic behavior of these trajectories as $k \to +\infty$.

Proposition 3.4.2. Let $(X, d)$ be a metric space and $f : X \to \mathbb{R} \cup \{+\infty\}$ an extended real-valued function which is proper and bounded below. Take any trajectory $(x_k)_{k\in\mathbb{N}}$ of (S) starting from some $x_0 \in \mathrm{dom}\, f$:
$$x_0 \succeq_S x_1 \succeq_S x_2 \succeq_S \cdots \succeq_S x_k \succeq_S x_{k+1} \succeq_S \cdots.$$
Then the following properties hold:
(i) $(f(x_k))_{k\in\mathbb{N}}$ is nonincreasing, and $f(x_k) \to \inf_k f(x_k) \in \mathbb{R}$ as $k \to +\infty$;
(ii) the sequence $(x_k)_{k\in\mathbb{N}}$ satisfies $\sum_{k=0}^{+\infty} d(x_k, x_{k+1}) < +\infty$. Hence, it is a Cauchy sequence in $(X, d)$. When $(X, d)$ is a complete metric space, the sequence $(x_k)_{k\in\mathbb{N}}$ converges in $(X, d)$ to some $\bar x \in X$. Moreover, when $f$ is lower semicontinuous, we have $\bar x \preceq_S x_k$ for all $k \in \mathbb{N}$.

Proof. (i) For any $k \in \mathbb{N}$, by definition of $\succeq_S$,
$$f(x_{k+1}) \le f(x_{k+1}) + d(x_k, x_{k+1}) \le f(x_k).$$
We have used $d(x_k, x_{k+1}) \ge 0$, which expresses that changes are costly. Therefore, the sequence $(f(x_k))_{k\in\mathbb{N}}$ is nonincreasing. Since
$$-\infty < \inf_X f \le f(x_k) \le f(x_0) < +\infty,$$
we have $f(x_k) \downarrow \inf_k f(x_k)$, which is a finite real number.
3.4. Ekeland’s ε-variational principle
(ii) Let us write the inequality $f(x_{k+1}) + d(x_k, x_{k+1}) \le f(x_k)$ for $k = 0, 1, \dots, n-1$:
$$f(x_1) + d(x_0, x_1) \le f(x_0), \quad \dots, \quad f(x_n) + d(x_{n-1}, x_n) \le f(x_{n-1}).$$
Then we sum these inequalities and simplify the resulting expression (note that $f(x_k) \in \mathbb R$ for all $k \in \mathbb N$). We obtain
$$f(x_n) + \sum_{k=0}^{n-1} d(x_k, x_{k+1}) \le f(x_0).$$
Let us now use the minorization $f(x_n) \ge \inf_X f$ and the assumption $\inf_X f > -\infty$. We thus have
$$\sum_{k=0}^{n-1} d(x_k, x_{k+1}) \le f(x_0) - \inf_X f < +\infty.$$
This being true for any $n \in \mathbb N$, we deduce
$$\sum_{k=0}^{+\infty} d(x_k, x_{k+1}) \le f(x_0) - \inf_X f < +\infty.$$
Note that this holds just by assuming that $(X,d)$ is a metric space. Then, by a classical argument, when $(X,d)$ is a complete metric space, this implies the convergence of the sequence $(x_k)_{k\in\mathbb N}$ in $(X,d)$. To see this, write the triangle inequality
$$d(x_n, x_{n+p}) \le \sum_{k=n}^{n+p-1} d(x_k, x_{k+1}) \le \sum_{k=n}^{+\infty} d(x_k, x_{k+1}),$$
whose right-hand side tends to zero as $n \to +\infty$. Hence $(x_k)_{k\in\mathbb N}$ is a Cauchy sequence in $(X,d)$, which implies its convergence when $(X,d)$ is a complete metric space.

Let $x_k \to \bar x$ in $(X,d)$ as $k \to +\infty$. Let us prove that $\bar x \succeq_S x_n$ for all $n \in \mathbb N$. By transitivity of $\succeq_S$, we have $x_k \succeq_S x_n$ for all $k \ge n$, i.e.,
$$f(x_k) + d(x_k, x_n) \le f(x_n) \quad \forall k \ge n.$$
Let us fix $n \in \mathbb N$ and let $k \to +\infty$ in this inequality. Since $x_k \to \bar x$ in $(X,d)$, by using the lower semicontinuity of $f$ (up to now we have not used it!), we obtain
$$f(\bar x) + d(\bar x, x_n) \le f(x_n),$$
that is, $\bar x \succeq_S x_n$.

Our objective is to prove the existence of a trajectory $(x_k)_{k\in\mathbb N}$ of the marginal satisficing dynamics (S) which converges to a maximal element $\bar x$ for $\succeq_S$. So doing, we will have $\bar x \succeq_S x_k$ for all $k \in \mathbb N$ and hence $\bar x \succeq_S x_0$, which combined with the maximality of $\bar x$ is precisely the claim of Theorem 3.4.3. In this perspective, considering an arbitrary trajectory of (S) does not provide enough information: note that $x_k \equiv x_0$ for all $k \in \mathbb N$ is a trajectory of (S)!

The dynamical system (S) models a general rejection decision mechanism. We are now going to consider trajectories of (S) which describe the decision process of a motivated agent, that is, an agent who at each step is willing to substantially improve his performance. This modeling question is rich and involves some optimization aspects. In this perspective, the notion of aspiration index $m(x)$, defined in the next statement, plays an important role.

Lemma 3.4.1. For any $x \in X$, $\operatorname{diam} S(x) \le 2(f(x) - m(x))$, where
$$m(x) = \inf\{f(y) : y \in S(x)\} = \inf\{f(y) : y \succeq_S x\}$$
is called the aspiration index of the agent at $x$.

Proof. The proof is an immediate consequence of the definition of $y \in S(x)$:
$$y \in S(x) \iff f(y) + d(x,y) \le f(x).$$
Noticing that $f(y) \ge m(x)$ for $y \in S(x)$, we deduce
$$d(x,y) \le f(x) - m(x) \quad \forall y \in S(x).$$
As a consequence, for any $y, z \in S(x)$, $d(y,z) \le d(y,x) + d(x,z) \le 2(f(x) - m(x))$, and $\operatorname{diam} S(x) \le 2(f(x) - m(x))$.

The aspiration index $m(x)$ is the best level the agent can hope to reach from $x$, so the difference $f(x) - m(x)$ measures the gap between its present level of satisfaction at $x$ and the maximum level of satisfaction that it can hope to obtain at a further step. Note that $m(x)$ is, in general, not known by the agent, who is not able to explore all of $S(x)$. The cognitive model says that if the agent is motivated and willing to explore enough at each step (and pay the corresponding exploration costs!), then it knows a sufficiently good approximation of $m(x)$ and the process converges to a stable routine.
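As a numerical sanity check of Lemma 3.4.1, the following Python sketch samples $X$ on a finite grid, where the infimum defining $m(x)$ becomes a minimum. The helper names `satisficing_set`, `aspiration_index`, and `diameter`, as well as the grid, the function $f$, and the metric $d(x,y) = |x-y|$, are hypothetical choices made only for illustration.

```python
import math


def satisficing_set(points, f, d, x):
    """S(x) = {y : f(y) + d(x, y) <= f(x)} restricted to a finite sample; x itself always belongs to it."""
    fx = f(x)
    return [y for y in list(points) + [x] if f(y) + d(x, y) <= fx]


def aspiration_index(points, f, d, x):
    """m(x) = inf{f(y) : y in S(x)}, which is a minimum on a finite sample."""
    return min(f(y) for y in satisficing_set(points, f, d, x))


def diameter(pts, d):
    """Diameter of a finite nonempty set (0.0 for a single point)."""
    return max(d(a, b) for a in pts for b in pts)


# Hypothetical example: X sampled on a grid of [-2, 2], a nonconvex f, and d(x, y) = |x - y|.
points = [i / 100 for i in range(-200, 201)]
f = lambda x: x * x + math.sin(5 * x)
d = lambda a, b: abs(a - b)

for x in (-2.0, -0.7, 0.0, 1.3):
    S = satisficing_set(points, f, d, x)
    gap = f(x) - aspiration_index(points, f, d, x)
    # Lemma 3.4.1 predicts diam S(x) <= 2 * gap.
    print(f"x = {x:+.2f}   f(x) - m(x) = {gap:.4f}   diam S(x) = {diameter(S, d):.4f}")
```

The check only involves the triangle inequality and the bound $d(x,y) \le f(x) - m(x)$, so it holds for any finite sample, up to floating-point rounding.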
We are going to consider trajectories $(x_k)_{k\in\mathbb N}$ corresponding to a motivated agent who satisfices enough at each step. As an example, we assume that at each step $k$ the agent satisfices and fills a given fraction $\lambda \in\, ]0,1[$ of the gap between $f(x_k)$ and $m(x_k)$: thus $x_{k+1} \succeq_S x_k$ and
$$f(x_{k+1}) \le \lambda m(x_k) + (1-\lambda) f(x_k) = f(x_k) - \lambda[f(x_k) - m(x_k)].$$
We now have all the ingredients to state and prove a dynamical, cognitive version of Ekeland's ε-variational principle [37].

Theorem 3.4.4 (Attouch and Soubeyran). Let $(X,d)$ be a complete metric space and $f : X \to \mathbb R \cup \{+\infty\}$ an extended real-valued function which is lower semicontinuous and bounded below.
(a) For any $x_0 \in \operatorname{dom} f$, there exists a trajectory $(x_k)_{k\in\mathbb N}$ of the marginal satisficing dynamical system (S),
$$x_0 \preceq_S x_1 \preceq_S x_2 \preceq_S \cdots \preceq_S x_k \preceq_S x_{k+1} \preceq_S \cdots,$$
which converges in $(X,d)$ to some $\bar x \in X$ that is a maximal element for the partial ordering $\succeq_S$.
(b) Such a trajectory can be obtained by satisficing enough at each step; for example, given some parameter $0 < \lambda < 1$, by taking at each step $x_{k+1} \succeq_S x_k$ and
$$f(x_{k+1}) \le f(x_k) - \lambda[f(x_k) - m(x_k)],$$
where $m(\cdot)$ is the aspiration index: $m(x) = \inf\{f(y) : y \succeq_S x\}$.

Proof. Take a trajectory $(x_k)_{k\in\mathbb N}$ of (S) which satisfices enough at each step. One can always construct such a trajectory just by using the definition of $m(x)$ as an infimum. Then observe that the sequence $(S(x_k))_{k\in\mathbb N}$ is nested. Since $\succeq_S$ is transitive and $x_{k+1} \succeq_S x_k$, we have the implication
$$y \in S(x_{k+1}) \iff y \succeq_S x_{k+1} \implies y \succeq_S x_k \iff y \in S(x_k),$$
i.e., $S(x_{k+1}) \subset S(x_k)$ for all $k \in \mathbb N$. Let us prove that $\operatorname{diam} S(x_k) \to 0$ as $k \to +\infty$. By Lemma 3.4.1, it is enough to prove that $f(x_k) - m(x_k) \to 0$ as $k \to +\infty$. Since $S(x_{k+1}) \subset S(x_k)$, we have
$$m(x_{k+1}) = \inf\{f(y) : y \in S(x_{k+1})\} \ge \inf\{f(y) : y \in S(x_k)\} = m(x_k).$$
We now use that the agent satisfices enough, i.e., $f(x_{k+1}) \le f(x_k) - \lambda[f(x_k) - m(x_k)]$, and the inequality $m(x_{k+1}) \ge m(x_k)$ to obtain
$$f(x_{k+1}) - m(x_{k+1}) \le f(x_k) - \lambda[f(x_k) - m(x_k)] - m(x_k) = (1-\lambda)[f(x_k) - m(x_k)].$$
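Hence $f(x_k) - m(x_k) \le (1-\lambda)^k[f(x_0) - m(x_0)]$ and
$$\operatorname{diam} S(x_k) \le 2(1-\lambda)^k[f(x_0) - m(x_0)].$$
Since $0 < \lambda < 1$, we have $\operatorname{diam} S(x_k) \to 0$ as $k \to +\infty$. The sequence $(S(x_k))_{k\in\mathbb N}$ is a nonincreasing sequence of nonempty closed sets (closedness follows from the lower semicontinuity of $f$) whose diameter tends to zero. Since $(X,d)$ is complete, a classical result (Cantor's intersection theorem) ensures that $\bigcap_{k\in\mathbb N} S(x_k) = \{\bar x\}$ is nonempty and reduced to a single element $\bar x \in X$. For any $k \in \mathbb N$, both $x_k$ and $\bar x$ belong to $S(x_k)$, hence $d(x_k, \bar x) \le \operatorname{diam} S(x_k)$, which tends to zero. Thus $x_k$ converges to $\bar x$ in $(X,d)$ as $k \to +\infty$.

The maximality of $\bar x$ with respect to $\succeq_S$ follows from the following observation: suppose that $y \succeq_S \bar x$. Since $\bar x \in S(x_k)$ for every $k \in \mathbb N$, transitivity gives $y \succeq_S x_k$ for all $k \in \mathbb N$, i.e., $y \in \bigcap_{k\in\mathbb N} S(x_k) = \{\bar x\}$, hence $y = \bar x$.

In fact, when proving Theorem 3.4.3 and its dynamical version (Theorem 3.4.4), we have obtained a stronger version of Ekeland's variational principle, which is formulated below.

Theorem 3.4.5. Let $(X,d)$ be a complete metric space and $f : X \to \mathbb R \cup \{+\infty\}$ a proper lower semicontinuous function which is bounded below. Let $\varepsilon > 0$ and $x_0 \in X$ be such that
$$f(x_0) \le \inf_X f + \varepsilon,$$
and let $\lambda > 0$. Then there exists some $\bar x_{\varepsilon,\lambda} \in X$ such that
$$f(\bar x_{\varepsilon,\lambda}) \le f(x_0) \le \inf_X f + \varepsilon; \qquad d(\bar x_{\varepsilon,\lambda}, x_0) \le \lambda; \qquad f(\bar x_{\varepsilon,\lambda}) < f(x) + \frac{\varepsilon}{\lambda} d(\bar x_{\varepsilon,\lambda}, x) \quad \forall x \ne \bar x_{\varepsilon,\lambda}.$$

Proof. Let us apply Theorem 3.4.3 with $\frac{\varepsilon}{\lambda} d$ instead of $d$. One obtains the existence of some $\bar x_{\varepsilon,\lambda} \in X$ which satisfies $\bar x_{\varepsilon,\lambda} \succeq_S x_0$, i.e.,
$$f(\bar x_{\varepsilon,\lambda}) + \frac{\varepsilon}{\lambda} d(\bar x_{\varepsilon,\lambda}, x_0) \le f(x_0),$$
and $\bar x_{\varepsilon,\lambda}$ is maximal with respect to $\succeq_S$. We thus have $f(\bar x_{\varepsilon,\lambda}) \le f(x_0)$ and
$$\inf_X f + \frac{\varepsilon}{\lambda} d(\bar x_{\varepsilon,\lambda}, x_0) \le f(\bar x_{\varepsilon,\lambda}) + \frac{\varepsilon}{\lambda} d(\bar x_{\varepsilon,\lambda}, x_0) \le f(x_0) \le \inf_X f + \varepsilon,$$
which implies $d(\bar x_{\varepsilon,\lambda}, x_0) \le \lambda$. The last property expresses, via Proposition 3.4.1, that $\bar x_{\varepsilon,\lambda}$ is maximal with respect to $\succeq_S$.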
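The motivated-agent dynamics of Theorem 3.4.4 (b) can be simulated when $X$ is replaced by a finite grid: every infimum is then a minimum, and a maximal element is reached after finitely many steps. The Python sketch below is only an illustration under these assumptions. The function `motivated_trajectory`, the double-well objective, the grid, and the rule "among admissible points, move as little as possible" are hypothetical choices, and, as in the proof of Theorem 3.4.1, the metric $\varepsilon d$ is used in place of $d$.

```python
def motivated_trajectory(points, f, d, x0, eps=1.0, lam=0.5, max_steps=10_000):
    """Trajectory x_0, x_1, ... of the satisficing dynamics on a finite set of points.

    Relation (with metric eps*d): y is in S(x) iff f(y) + eps*d(x, y) <= f(x).
    Step rule: x_{k+1} in S(x_k) with f(x_{k+1}) <= f(x_k) - lam*(f(x_k) - m(x_k)).
    """
    traj = [x0]
    x = x0
    for _ in range(max_steps):
        S = [y for y in list(points) + [x] if f(y) + eps * d(x, y) <= f(x)]
        gap = f(x) - min(f(y) for y in S)   # f(x_k) - m(x_k)
        if gap == 0:                         # S(x_k) = {x_k}: x_k is maximal for the relation
            return traj
        target = f(x) - lam * gap
        good = [y for y in S if f(y) <= target]  # nonempty: a minimizer of f on S(x_k) qualifies
        prev = x
        x = min(good, key=lambda y: d(prev, y))  # hypothetical rule: smallest admissible move
        traj.append(x)
    raise RuntimeError("max_steps reached before a maximal point was found")


# Hypothetical data: a grid of [-3, 3], a double-well objective, and d(x, y) = |x - y|.
points = [i / 100 for i in range(-300, 301)]
f = lambda x: (x * x - 1) ** 2 + 0.3 * x
d = lambda a, b: abs(a - b)
eps = 0.5

traj = motivated_trajectory(points, f, d, x0=2.5, eps=eps, lam=0.5)
xbar = traj[-1]
print(f"{len(traj) - 1} steps, x_bar = {xbar}, f(x_bar) = {f(xbar):.4f}")
```

Any other admissible choice of $x_{k+1}$ would do, which reflects the freedom left by Theorem 3.4.4 (b). Along the computed trajectory one can check the monotonicity of $f$ and the bound on the total moving cost from Proposition 3.4.2, and, at the final point, the strict inequality of Theorem 3.4.1 (ii) against every other grid point.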
Chapter 4
Complements on measure theory
4.1
Hausdorff measures and Hausdorff dimension
This section aims to define intrinsically the intuitive notions of length, area, and volume. More precisely, we would like to provide a nonnegative measure for any subset of $\mathbb R^N$ which agrees with the well-known k-dimensional measure for regular k-dimensional surfaces when k is an integer. Hausdorff's construction, as shown below, is particularly well suited to the geometry of sets and does not require any local parametrization of these sets; therefore, no regularity assumption is needed. For instance, the Hausdorff measure makes it possible to measure fractal sets and to define a new notion of dimension for any set, thereby extending the classical topological dimension. Note that the process described in the first subsection is Carathéodory's general approach to constructing a measure from a σ-subadditive set function (or outer measure).
4.1.1
Outer Hausdorff measures and Hausdorff measures
We denote the collection of all subsets of $\mathbb R^N$ by $\mathcal P(\mathbb R^N)$ and, for any nonempty set $E \in \mathcal P(\mathbb R^N)$, we denote by
$$\operatorname{diam}(E) = \sup\{d(x,y) : x, y \in E\}$$
the diameter of $E$, where $d$ is the Euclidean distance in $\mathbb R^N$. When $s$ is a positive integer, we denote the volume of the unit ball of $\mathbb R^s$ by $\omega_s$; in the general case $s \ge 0$, we set
$$\omega_s = \frac{\pi^{s/2}}{\Gamma(1 + s/2)},$$
where $\Gamma$ is the well-known Euler function
$$\Gamma(t) = \int_0^{+\infty} x^{t-1} e^{-x}\,dx.$$
We also set $c_s = 2^{-s}\omega_s$. For instance, we have
$$c_0 = 1, \quad c_1 = 1, \quad c_2 = \pi/4, \quad c_3 = \pi/6.$$
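The constants $\omega_s$ and $c_s$ can be evaluated directly from the Gamma function. The Python sketch below uses `math.gamma` from the standard library; the helper names `omega` and `c` are hypothetical and only mirror the notation of the text.

```python
import math


def omega(s):
    """omega_s = pi^(s/2) / Gamma(1 + s/2); for integer s, the volume of the unit ball of R^s."""
    return math.pi ** (s / 2) / math.gamma(1 + s / 2)


def c(s):
    """c_s = 2^(-s) * omega_s, the normalization constant of the s-dimensional Hausdorff measure."""
    return 2.0 ** (-s) * omega(s)


for s in (0, 1, 2, 3, 0.5):
    print(f"s = {s}: omega_s = {omega(s):.6f}, c_s = {c(s):.6f}")
```

Since $c_s(\operatorname{diam} B)^s = \omega_s r^s$ for a ball $B$ of radius $r$ in $\mathbb R^s$, the constant $c_s$ turns $(\operatorname{diam})^s$ into the volume of a ball of that diameter.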
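Let $E$ be any set of $\mathcal P(\mathbb R^N)$ and $\delta > 0$. A finite or countable family $(A_i)_{i\in\mathbb N}$ of sets in $\mathcal P(\mathbb R^N)$ satisfying $0 < \operatorname{diam}(A_i) \le \delta$ and $E \subset \bigcup_{i\in\mathbb N} A_i$ will be called a δ-covering of $E$.

Definition 4.1.1. For each $s \ge 0$, $\delta > 0$, and $E \subset \mathbb R^N$, let us set
$$\mathcal H^s_\delta(E) := \inf\Big\{c_s \sum_{i\in\mathbb N} \operatorname{diam}(A_i)^s : (A_i)_{i\in\mathbb N}\ \delta\text{-covering of } E\Big\}.$$
The s-dimensional outer Hausdorff measure is the set mapping $\mathcal H^s$ with values in $[0,+\infty]$ defined by
$$\mathcal H^s(E) := \sup_{\delta>0} \mathcal H^s_\delta(E) = \lim_{\delta\to 0} \mathcal H^s_\delta(E).$$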
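Any particular δ-covering only gives an upper bound for $\mathcal H^s_\delta(E)$, since Definition 4.1.1 takes an infimum over all δ-coverings. The Python sketch below evaluates $c_s\sum_i \operatorname{diam}(A_i)^s$ for two explicit coverings: a segment cut into $n$ equal pieces, and the $2^n$ intervals of length $3^{-n}$ of the middle third Cantor construction. The helper names are hypothetical.

```python
import math


def c(s):
    """Normalization constant c_s = 2^(-s) * pi^(s/2) / Gamma(1 + s/2)."""
    return 2.0 ** (-s) * math.pi ** (s / 2) / math.gamma(1 + s / 2)


def segment_cover_sum(length, n, s):
    """c_s * sum diam(A_i)^s for the cover of a segment by n equal pieces (delta = length / n)."""
    return c(s) * n * (length / n) ** s


def cantor_cover_sum(n, s):
    """The same sum for the 2^n intervals of length 3^(-n) that cover the middle third Cantor set."""
    return c(s) * 2 ** n * 3.0 ** (-n * s)


for s in (0.5, 1.0, 1.5):
    print(s, [round(segment_cover_sum(2.0, n, s), 6) for n in (10, 1_000, 100_000)])
```

For the segment, the sum equals the length when $s = 1$ and tends to 0 when $s > 1$, which shows that $\mathcal H^s$ of a segment vanishes for $s > 1$; the growth for $s < 1$ proves nothing by itself, because it is only an upper bound. For the Cantor coverings, the sum stays equal to $c_s$ exactly at $s = \ln 2/\ln 3$ and tends to 0 above that value.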
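Remark 4.1.1. The constant $c_s$ is a normalization parameter chosen so that, when $s \in \mathbb N$ and $E$ is an s-dimensional regular surface of $\mathbb R^N$, $\mathcal H^s(E)$ is the s-dimensional area of $E$ (see Proposition 4.1.6 or Theorem 4.1.1). The positive number δ, intended to tend to zero, forces the sets $A_i$ of the δ-covering to follow the geometry of $E$.

Proposition 4.1.1. The set function $\mathcal H^s : \mathcal P(\mathbb R^N) \to [0,+\infty]$ is an outer measure, i.e., it satisfies
(i) $\mathcal H^s(\emptyset) = 0$;
(ii) (σ-subadditivity) for all sequences $(E_i)_{i\in\mathbb N}$ of subsets of $\mathbb R^N$ such that $E \subset \bigcup_{i\in\mathbb N} E_i$,
$$\mathcal H^s(E) \le \sum_{i\in\mathbb N} \mathcal H^s(E_i);$$
(iii) $\mathcal H^s$ is a nondecreasing set function, that is, $\mathcal H^s(A) \le \mathcal H^s(B)$ whenever $A \subset B$.

Proof. For every $A \in \mathcal P(\mathbb R^N)$ such that $0 < \operatorname{diam}(A) \le \delta$, the family $\{A\}$ is a δ-covering of $\emptyset$, so that $\mathcal H^s_\delta(\emptyset) \le c_s \operatorname{diam}(A)^s \le c_s\delta^s$, and (i) follows. The monotonicity property (iii) follows straightforwardly from the definition of $\mathcal H^s$. Let now $\varepsilon > 0$ and, for each $i$, let $(A_{i,j})_{j\in\mathbb N}$ be a δ-covering of $E_i$ satisfying
$$c_s \sum_{j\in\mathbb N} \operatorname{diam}(A_{i,j})^s \le \frac{\varepsilon}{2^i} + \mathcal H^s_\delta(E_i).$$
Obviously, $(A_{i,j})_{(i,j)\in\mathbb N^2}$ is a δ-covering of $\bigcup_{i\in\mathbb N} E_i$, so that
$$\mathcal H^s_\delta\Big(\bigcup_{i\in\mathbb N} E_i\Big) \le \sum_{i\in\mathbb N} \mathcal H^s_\delta(E_i) + 2\varepsilon.$$
Letting δ → 0, conclusion (ii) follows from the fact that $\mathcal H^s$ is a nondecreasing set function and ε is arbitrary.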
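Following the classical construction of a measure from an outer measure, we define the family $\mathcal M_s$ of $\mathcal H^s$-measurable sets in the sense of Carathéodory:
$$A \in \mathcal M_s \iff \forall X \in \mathcal P(\mathbb R^N),\ \mathcal H^s(X) = \mathcal H^s(X \cap A) + \mathcal H^s(X \setminus A).$$
Note that $\emptyset$ and $\mathbb R^N$ belong to $\mathcal M_s$.

Proposition 4.1.2. The family $\mathcal M_s$ is a σ-algebra and $\mathcal H^s$ is σ-additive on $\mathcal M_s$.

Proof. Obviously $\mathbb R^N$ belongs to $\mathcal M_s$. The stability of $\mathcal M_s$ under complementation follows from a straightforward calculation (the definition is symmetric in $A$ and $\mathbb R^N \setminus A$). The proof of stability under countable unions and of the σ-additivity of $\mathcal H^s$ is divided into three steps.

First step. Stability under finite unions and finite intersections. Let $A_1$, $A_2$ be two sets in $\mathcal M_s$ and $X$ some set in $\mathcal P(\mathbb R^N)$. Since $A_1$ and $A_2$ are measurable, an elementary calculation on sets gives
$$\begin{aligned}
\mathcal H^s(X) &= \mathcal H^s(A_1 \cap X) + \mathcal H^s(X \setminus A_1)\\
&= \mathcal H^s(A_1 \cap X) + \mathcal H^s((X \setminus A_1) \cap A_2) + \mathcal H^s((X \setminus A_1) \setminus A_2)\\
&= \mathcal H^s(A_1 \cap X) + \mathcal H^s((X \cap A_2) \setminus A_1) + \mathcal H^s(X \setminus (A_1 \cup A_2)).
\end{aligned}$$
According to the identities $X \cap A_1 = X \cap (A_1 \cup A_2) \cap A_1$ and $(X \cap A_2) \setminus A_1 = (X \cap (A_1 \cup A_2)) \setminus A_1$, and applying the measurability of $A_1$ to the set $X \cap (A_1 \cup A_2)$, we derive
$$\mathcal H^s(X) = \mathcal H^s(X \cap (A_1 \cup A_2)) + \mathcal H^s(X \setminus (A_1 \cup A_2)),$$
so that $A_1 \cup A_2 \in \mathcal M_s$. Stability under finite intersections is then obtained from the stability under complementation. Note that substituting $X \cap (A_1 \cup A_2)$ for $X$ in $\mathcal H^s(X) = \mathcal H^s(A_1 \cap X) + \mathcal H^s(X \setminus A_1)$, we obtain the following important equality, used in the next step:
$$\mathcal H^s(X \cap (A_1 \cup A_2)) = \mathcal H^s(X \cap A_1) + \mathcal H^s(X \cap A_2) \qquad (4.1)$$
whenever $A_1$ and $A_2$ are disjoint sets in $\mathcal M_s$.

Second step. Stability under disjoint countable unions and σ-additivity of $\mathcal H^s$. Let $(A_i)_{i\in\mathbb N}$ be a family of pairwise disjoint sets in $\mathcal M_s$ and $A$ their union. According to the subadditivity of $\mathcal H^s$, for all $X$ in $\mathcal P(\mathbb R^N)$ we have
$$\begin{aligned}
\mathcal H^s(X) &\le \mathcal H^s(X \setminus A) + \mathcal H^s(X \cap A)\\
&\le \mathcal H^s(X \setminus A) + \sum_{i=0}^\infty \mathcal H^s(X \cap A_i)\\
&= \mathcal H^s(X \setminus A) + \lim_{n\to+\infty} \mathcal H^s\Big(X \cap \bigcup_{i=0}^n A_i\Big)\\
&\le \liminf_{n\to+\infty}\Big[\mathcal H^s\Big(X \setminus \bigcup_{i=0}^n A_i\Big) + \mathcal H^s\Big(X \cap \bigcup_{i=0}^n A_i\Big)\Big]\\
&= \mathcal H^s(X),
\end{aligned}$$
where we have used (4.1) in the first equality, the monotonicity of $\mathcal H^s$ in the last inequality, and the stability under finite unions in the last equality. This proves that $A$ belongs to $\mathcal M_s$. The σ-additivity of $\mathcal H^s$ is obtained by taking $X = A$, since all the inequalities above then become equalities.

Last step. Stability under countable unions. Let $(A_i)_{i\in\mathbb N}$ be a family of sets in $\mathcal M_s$. Set $B_0 = A_0$ and, for $n \ge 1$,
$$B_n = A_n \setminus \bigcup_{i=0}^{n-1} A_i.$$
According to the first step, the family $(B_i)_{i\in\mathbb N}$ is made up of pairwise disjoint sets of $\mathcal M_s$, so that, from the second step, $\bigcup_{i\in\mathbb N} B_i = \bigcup_{i\in\mathbb N} A_i$ belongs to $\mathcal M_s$.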
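Definition 4.1.2. The restriction to $\mathcal M_s$ of the set function $\mathcal H^s$ is called the s-dimensional Hausdorff measure.

The s-dimensional Hausdorff measure $\mathcal H^s$ is a $[0,+\infty]$-valued Borel measure in the following sense.

Proposition 4.1.3. The σ-algebra $\mathcal M_s$ contains the σ-algebra of all the Borel sets of $\mathbb R^N$.

Proof. The proof is based on the following Carathéodory criterion. An outer measure satisfying this criterion is sometimes called an outer metric measure.

Lemma 4.1.1. Let µ be an outer measure on a metric space $E$ equipped with its Borel σ-algebra $\mathcal B(E)$, and let $\mathcal M_\mu$ be the σ-algebra of the sets that are measurable in the sense of Carathéodory. Then $\mathcal B(E) \subset \mathcal M_\mu$ if and only if µ is additive on every pair $\{A, B\}$ of subsets of $E$ satisfying $d(A,B) > 0$.

Proof of Lemma 4.1.1. Let us assume $\mathcal B(E) \subset \mathcal M_\mu$, and let $A$, $B$ be two subsets of $E$ such that $d(A,B) > 0$. The closure $\bar A$ is a Borel set, hence $\bar A \in \mathcal M_\mu$; moreover, $B \cap \bar A = \emptyset$ since $d(A,B) > 0$. The two decompositions $A = (A \cup B) \cap \bar A$ and $B = (A \cup B) \setminus \bar A$ then imply $\mu(A \cup B) = \mu(A) + \mu(B)$.

Conversely, since $\mathcal M_\mu$ is a σ-algebra, it is enough to prove that every open subset $A$ of $E$ belongs to $\mathcal M_\mu$. Let $X$ be any fixed subset of $E$. According to subadditivity, it is enough to establish the inequality
$$\mu(X) \ge \mu(X \cap A) + \mu(X \setminus A),$$
and one may assume $\mu(X) < +\infty$. For all $k \in \mathbb N^*$, let us define the sets
$$A_k := \Big\{x \in A : d(x, E \setminus A) > \frac1k\Big\}, \qquad B_k := A_{k+1} \setminus A_k.$$
Noticing that $d(B_k, B_l) \ge \frac{1}{l+1} - \frac1k > 0$ for $k - l \ge 2$, we have, for all $n \in \mathbb N$,
$$\sum_{i=1}^n \mu(X \cap B_{2i}) = \mu\Big(\bigcup_{i=1}^n X \cap B_{2i}\Big) \le \mu(X),$$
hence
$$\sum_{k\ \mathrm{even}} \mu(X \cap B_k) \le \mu(X).$$
The same calculation gives
$$\sum_{k\ \mathrm{odd}} \mu(X \cap B_k) \le \mu(X),$$
so that
$$\sum_{k\in\mathbb N} \mu(X \cap B_k) \le 2\mu(X) \quad\text{and}\quad \lim_{k\to+\infty} \sum_{i\ge k} \mu(X \cap B_i) = 0.$$
Since $A$ is open, $A \setminus A_k = \bigcup_{i\ge k} B_i$, and we infer
$$\lim_{k\to+\infty} \mu(X \cap (A \setminus A_k)) = \lim_{k\to+\infty} \mu\Big(X \cap \bigcup_{i\ge k} B_i\Big) \le \lim_{k\to+\infty} \sum_{i\ge k} \mu(X \cap B_i) = 0.$$
By subadditivity, $\mu(X \cap A) \le \mu(X \cap A_k) + \mu(X \cap (A \setminus A_k))$ for every $k$, and we deduce
$$\mu(X \cap A) \le \liminf_{k\to+\infty} \mu(X \cap A_k).$$
Since obviously $\limsup_{k\to+\infty} \mu(X \cap A_k) \le \mu(X \cap A)$, one has
$$\mu(X \cap A) = \lim_{k\to+\infty} \mu(X \cap A_k).$$
From $d(A_k, X \setminus A) \ge \frac1k > 0$ we now obtain
$$\mu(X) \ge \mu((X \cap A_k) \cup (X \setminus A)) = \mu(X \cap A_k) + \mu(X \setminus A),$$
and we complete the proof of Lemma 4.1.1 by letting $k \to +\infty$.

Proof of Proposition 4.1.3 continued. According to Lemma 4.1.1, it is enough to prove that $\mathcal H^s$ is an outer metric measure. Let $A$ and $B$ be two subsets of $\mathbb R^N$ such that $d(A,B) > 0$, and let $\mathcal C = (C_i)_{i\in\mathbb N}$ be a covering of $A \cup B$ with $\operatorname{diam}(C_i) \le \delta < d(A,B)$. No $C_i$ meets both $A$ and $B$, so we can extract from this covering two disjoint δ-coverings: $\mathcal C_A = \{C \in \mathcal C : C \cap A \ne \emptyset\}$ of $A$ and $\mathcal C_B = \{C \in \mathcal C : C \cap B \ne \emptyset\}$ of $B$. Therefore
$$\sum_{i=1}^\infty c_s \operatorname{diam}(C_i)^s \ge \sum_{C\in\mathcal C_A} c_s \operatorname{diam}(C)^s + \sum_{C\in\mathcal C_B} c_s \operatorname{diam}(C)^s$$
and $\mathcal H^s_\delta(A \cup B) \ge \mathcal H^s_\delta(A) + \mathcal H^s_\delta(B)$. We end the proof by letting δ → 0.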
Remark 4.1.2. The following process is often used to construct an outer measure, and thus a measure, on Cantor-type sets $C$ in $\mathbb R^N$. Let $\mathcal E_0 = \{C\}$ and, for $n \in \mathbb N^*$, let $\mathcal E_n$ be a finite collection of disjoint subsets $E$ of $C$ such that each $E \in \mathcal E_n$ is contained in one of the sets of $\mathcal E_{n-1}$. We assume moreover that $\lim_{n\to+\infty} \max\{\operatorname{diam}(E) : E \in \mathcal E_n\} = 0$. We now set $\mu(C) = a$, where $a$ is an arbitrary number satisfying $0 < a < +\infty$, and, for the sets $E_i$, $i = 1, \dots, m_1$, of $\mathcal E_1$, we define masses $\mu(E_i)$ such that
$$\sum_{i=1}^{m_1} \mu(E_i) = \mu(C).$$
Similarly, we assign masses $\mu(E_i)$ to the sets of $\mathcal E_n$ such that, if $E \in \mathcal E_{n-1}$ and $E = \bigcup_{i=1}^{m_n} E_i$ with $E_i \in \mathcal E_n$, then
$$\sum_{i=1}^{m_n} \mu(E_i) = \mu(E).$$
We finally set
$$\mu\Big(\mathbb R^N \setminus \bigcup_{E\in\mathcal E_n} E\Big) = 0.$$
Let us denote by $\widetilde{\mathcal E}_n$ the collection of all the complementary sets of $\mathcal E_n$ and set $\mathcal E = \bigcup_{n\in\mathbb N} (\mathcal E_n \cup \widetilde{\mathcal E}_n)$. For each set $A \in \mathcal P(\mathbb R^N)$, we now extend µ by setting
$$\mu(A) = \inf\Big\{\sum_{i\in\mathbb N} \mu(E_i) : A \subset \bigcup_{i\in\mathbb N} E_i,\ E_i \in \mathcal E\Big\},$$
which defines an outer measure. One can prove that µ is a measure whose σ-algebra of measurable sets contains the σ-algebra of Borel sets of $\mathbb R^N$. Moreover, the support of µ, that is, the smallest closed set $X$ of $\mathbb R^N$ such that $\mu(\mathbb R^N \setminus X) = 0$, is contained in $\bigcap_{n\in\mathbb N} \bigcup_{E\in\mathcal E_n} E$.

Remark 4.1.3. The construction of Hausdorff measures $\mathcal H^s$ can be carried out in a general metric space $X$, and most of the properties continue to hold in this larger framework. For a detailed presentation of Hausdorff measures in metric spaces, see Ambrosio and Tilli [23].

Theorem 4.1.1. For every Lebesgue measurable set $E$ in $\mathbb R^N$, we have $\mathcal H^N(E) = \mathcal L^N(E)$, where $\mathcal L^N$ denotes the Lebesgue measure on $\mathbb R^N$. Moreover, $\mathcal H^s(E) = 0$ for $s > N$, whereas $\mathcal H^0$ is the counting measure.

Proof. For every Lebesgue measurable set $E$ in $\mathbb R^N$, let us recall the so-called isodiametric inequality, which asserts that the Lebesgue measure of $E$ does not exceed the Lebesgue measure of a ball having the same diameter:
$$\mathcal L^N(E) \le c_N (\operatorname{diam}(E))^N.$$
For a proof see, for instance, Evans and Gariepy [132]. For every δ-covering $(A_i)_{i\in\mathbb N}$ of $E$ we then have
$$\sum_{i\in\mathbb N} c_N (\operatorname{diam}(A_i))^N \ge \sum_{i\in\mathbb N} \mathcal L^N(A_i) \ge \mathcal L^N(E),$$
hence $\mathcal H^N_\delta(E) \ge \mathcal L^N(E)$. Letting δ → 0 gives $\mathcal H^N(E) \ge \mathcal L^N(E)$.
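The isodiametric inequality can be checked numerically on simple sets. The Python sketch below compares, for rectangular boxes with random side lengths (a hypothetical test family), the volume with $c_N(\operatorname{diam})^N$, which is the volume of the ball with the same diameter; for balls the two quantities coincide. The helper names are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math
import random


def c(N):
    """c_N = 2^(-N) * omega_N, so that c_N * diam^N is the volume of a ball of diameter diam."""
    return 2.0 ** (-N) * math.pi ** (N / 2) / math.gamma(1 + N / 2)


def box_volume_and_bound(sides):
    """Return (Lebesgue measure of the box, c_N * diam(box)^N) for a box with the given side lengths."""
    N = len(sides)
    volume = math.prod(sides)
    diam = math.sqrt(sum(a * a for a in sides))
    return volume, c(N) * diam ** N


random.seed(1)
for N in range(1, 7):
    vol, bound = box_volume_and_bound([random.uniform(0.1, 2.0) for _ in range(N)])
    print(f"N = {N}: volume {vol:.4f} <= {bound:.4f}")
```

In dimension 1 the inequality is an equality, since an interval is a ball; in higher dimensions the gap for boxes grows quickly with $N$.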
The converse inequality is more involved. We will use Vitali's covering theorem. Let us first define the notion of fine covering.

Definition 4.1.3. A family $\mathcal F$ of closed balls in $\mathbb R^N$ is said to cover a set $E \subset \mathbb R^N$ finely if, for each $x \in E$ and each $\varepsilon > 0$, there exists $\overline B_r(x) \in \mathcal F$ with $r < \varepsilon$, where $B_r(x)$ denotes the open ball of radius $r$ centered at $x$ and $\overline B_r(x)$ its closure.

Lemma 4.1.2 (Vitali's covering theorem). Let $E \subset \mathbb R^N$ with $\mathcal L^N(E) < +\infty$ be finely covered by a family $\mathcal F$ of closed balls. Then there exists a countable subfamily $\mathcal G$ of pairwise disjoint elements of $\mathcal F$ such that
$$\mathcal L^N\Big(E \setminus \bigcup_{B\in\mathcal G} B\Big) = 0.$$

Proof. For the proof see, for instance, Ziemer [229].

Remark 4.1.4. In the definition of a family finely covering a subset $E$ of $\mathbb R^N$, one may replace the family of closed balls by a regular family of closed subsets of $\mathbb R^N$ (see [229]). Vitali's covering theorem is also valid for nonnegative Radon measures µ on $\mathbb R^N$ (i.e., locally bounded nonnegative Borel measures; see Section 4.2). For a proof consult [229].

Proof of Theorem 4.1.1 continued. Let $A$ be a set of finite Lebesgue measure in $\mathbb R^N$ and, for each $\eta > 0$, let $U$ be an open subset of $\mathbb R^N$ such that $A \subset U$ and $\mathcal L^N(U \setminus A) < \eta$. There exists a family $\mathcal F$ of closed balls with diameter at most δ, included in $U$, which covers $U$ finely. Indeed, for every $x \in U$ there exists a closed ball $\overline B_{r(x)}(x)$ included in $U$, and $\mathcal F = (\overline B_r(x))_{x\in U,\ r \le r(x)\wedge\delta/2}$ is such a family. According to Lemma 4.1.2, one can extract a subfamily $(B_i)_{i\ge 1}$ of pairwise disjoint elements with diameter at most δ, satisfying
$$\mathcal L^N\Big(U \setminus \bigcup_{i=1}^\infty B_i\Big) = 0, \qquad \bigcup_{i=1}^\infty B_i \subset U.$$
Set $A^* := \bigcup_{i=1}^\infty (B_i \cap A)$. We have $\mathcal L^N(A \setminus A^*) = 0$ and
$$\mathcal H^N_\delta(A^*) \le \sum_{i=1}^\infty c_N (\operatorname{diam} B_i)^N = \sum_{i=1}^\infty \mathcal L^N(B_i) = \mathcal L^N(U) \le \mathcal L^N(A) + \eta.$$
Letting δ → 0 and η → 0, we obtain $\mathcal H^N(A^*) \le \mathcal L^N(A)$. It remains to establish $\mathcal L^N(A \setminus A^*) = 0 \Rightarrow \mathcal H^N(A \setminus A^*) = 0$ and, more generally, that if $E$ is a Borel set satisfying $\mathcal L^N(E) = 0$, then $\mathcal H^N(E) = 0$. This property is a corollary of the first assertion of Lemma 4.1.3 below, whose proof may be found in [229].
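Lemma 4.1.3. Let $\mathcal F$ be a family of closed balls with $\sup\{\operatorname{diam} B : B \in \mathcal F\} < +\infty$. Then there exists a countable subfamily $\mathcal G$ of pairwise disjoint elements of $\mathcal F$ such that
$$\bigcup_{B\in\mathcal F} B \subset \bigcup_{B\in\mathcal G} B^*,$$
where $B^*$ denotes the closed ball concentric with $B$ whose radius is five times that of $B$. Let $E$ be a subset of $\mathbb R^N$ finely covered by $\mathcal F$; then, for every finite family $\mathcal G^* \subset \mathcal G$,
$$E \subset \bigcup_{B\in\mathcal G^*} B \ \cup \bigcup_{B\in\mathcal G\setminus\mathcal G^*} B^*.$$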
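Indeed, let $E$ satisfy $\mathcal L^N(E) = 0$. For η > 0, consider an open subset $U$ of $\mathbb R^N$ such that $E \subset U$ and $\mathcal L^N(U) \le \eta$. Since the open set $U$ is the union of the closed balls included in $U$ with diameter less than δ/5, Lemma 4.1.3 provides a family $\mathcal G$ of pairwise disjoint closed balls included in $U$ such that $U \subset \bigcup_{B\in\mathcal G} B^*$, and
$$\mathcal H^N_\delta(E) \le \sum_{B\in\mathcal G} \mathcal H^N_\delta(B^*) \le \sum_{B\in\mathcal G} c_N 5^N (\operatorname{diam}(B))^N = 5^N \sum_{B\in\mathcal G} \mathcal L^N(B) = 5^N \mathcal L^N\Big(\bigcup_{B\in\mathcal G} B\Big) \le 5^N \eta.$$
The conclusion follows by letting δ → 0 and η → 0.

We now establish that $\mathcal H^s(E) = 0$ for every Borel subset $E$ of $\mathbb R^N$ when $s > N$. One can write $E = \bigcup_{n\in\mathbb N} E_n$, where $\mathcal H^N(E_n) < +\infty$ (take $E_n = B_n \cap E$, where $B_n$ denotes the ball of $\mathbb R^N$ with radius $n$ centered at 0). Let δ > 0 and let $(A_i)_{i\in\mathbb N}$ be a δ-covering of $E_n$. We have
$$c_s \sum_{i\in\mathbb N} \operatorname{diam}(A_i)^s \le \frac{c_s}{c_N}\delta^{s-N}\, c_N \sum_{i\in\mathbb N} \operatorname{diam}(A_i)^N,$$
which yields
$$\mathcal H^s_\delta(E_n) \le \frac{c_s}{c_N}\delta^{s-N} \mathcal H^N(E_n),$$
and finally, letting δ → 0, one obtains $\mathcal H^s(E_n) = 0$. Therefore, by subadditivity,
$$\mathcal H^s(E) \le \sum_{n=0}^\infty \mathcal H^s(E_n) = 0.$$
The fact that $\mathcal H^0$ is the counting measure is easy to establish and is left to the reader.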
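The selection idea behind the covering Lemma 4.1.3 is easiest to see for a finite family of balls: repeatedly keep the largest remaining ball and discard every ball that meets it. Each discarded ball meets a kept ball of at least the same radius, so it lies in the concentric enlargement by a factor 3; the factor 5 of the lemma leaves room for the selection by size classes needed for infinite families (see [229]). The Python sketch below is restricted to finite families of discs in $\mathbb R^2$; the function names and the random family are hypothetical.

```python
import math
import random


def vitali_greedy(balls):
    """Greedy disjoint subfamily of a FINITE family of closed balls ((x, y), r) in R^2.

    Repeatedly keep the largest remaining ball and discard every ball that meets it.
    """
    remaining = sorted(balls, key=lambda b: b[1], reverse=True)
    chosen = []
    while remaining:
        big = remaining.pop(0)
        chosen.append(big)
        # Closed balls are disjoint iff the distance between centers exceeds the sum of radii.
        remaining = [b for b in remaining if math.dist(b[0], big[0]) > b[1] + big[1]]
    return chosen


def inside_enlarged(ball, chosen, factor=5.0):
    """True if the closed ball is contained in some concentric enlargement factor * B, B chosen."""
    center, r = ball
    return any(math.dist(center, c2) + r <= factor * r2 for c2, r2 in chosen)


random.seed(0)  # hypothetical random family of 300 discs in [0, 10]^2
family = [((random.uniform(0, 10), random.uniform(0, 10)), random.uniform(0.05, 1.0)) for _ in range(300)]
G = vitali_greedy(family)
print(len(G), "disjoint balls selected;", all(inside_enlarged(B, G) for B in family))
```

The containment test uses the elementary fact that $\overline B_r(x) \subset \overline B_R(y)$ whenever $|x - y| + r \le R$.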
4.1.2
Hausdorff measures: Scaling properties and Lipschitz transformations
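The scaling properties of length, area, and volume are well known. The two propositions below summarize and generalize these properties.

Proposition 4.1.4. Let $A$ be any subset of $\mathbb R^N$ and $\lambda > 0$. Then $\mathcal H^s(\lambda A) = \lambda^s \mathcal H^s(A)$.

Proof. If $(A_i)_i$ is a δ-covering of $A$, then $(\lambda A_i)_i$ is a λδ-covering of $\lambda A$, so that
$$\mathcal H^s_{\lambda\delta}(\lambda A) \le \sum_{i=1}^\infty c_s (\lambda \operatorname{diam}(A_i))^s = \lambda^s \sum_{i=1}^\infty c_s \operatorname{diam}(A_i)^s.$$
Therefore $\mathcal H^s_{\lambda\delta}(\lambda A) \le \lambda^s \mathcal H^s_\delta(A)$. Letting δ → 0, we obtain $\mathcal H^s(\lambda A) \le \lambda^s \mathcal H^s(A)$. Replacing λ with 1/λ and $A$ with $\lambda A$ gives the converse inequality.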
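The mechanism of the proof, namely that dilating a covering by λ multiplies every covering sum by $\lambda^s$, can be observed on the intervals of the middle third Cantor construction. In the Python sketch below (hypothetical helper names), the dilation by 1/3 of the stage-$n$ intervals reproduces the left half of the stage-$(n+1)$ intervals, and at $s = \ln 2/\ln 3$ the dilation by 3 doubles the covering sum, since $3^s = 2$.

```python
import math


def c(s):
    """Normalization constant c_s = 2^(-s) * pi^(s/2) / Gamma(1 + s/2)."""
    return 2.0 ** (-s) * math.pi ** (s / 2) / math.gamma(1 + s / 2)


def cover_sum(intervals, s):
    """c_s * sum of diam(I)^s over a finite family of intervals (a, b)."""
    return c(s) * sum((b - a) ** s for a, b in intervals)


def dilate(intervals, lam):
    """Image of each interval under x -> lam * x (lam > 0)."""
    return [(lam * a, lam * b) for a, b in intervals]


def cantor_intervals(n):
    """The 2^n closed intervals of length 3^(-n) left at stage n of the middle third construction."""
    intervals = [(0.0, 1.0)]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt += [(a, a + third), (b - third, b)]
        intervals = nxt
    return intervals


s = math.log(2) / math.log(3)
level = cantor_intervals(6)
print(cover_sum(dilate(level, 3.0), s) / cover_sum(level, s), 3.0 ** s)
```

Both printed numbers are $3^s = 2$ up to rounding; this is the identity used in Example 4.1.4 below.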
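Proposition 4.1.5. Let $A$ be any subset of $\mathbb R^N$ and let $f : A \to \mathbb R^m$ satisfy, for all $x$ and $y$ in $A$,
$$|f(x) - f(y)| \le L|x - y|^\alpha,$$
where $L > 0$ and $\alpha > 0$ are two given constants. Then
$$\mathcal H^{s/\alpha}(f(A)) \le \frac{c_{s/\alpha}}{c_s} L^{s/\alpha} \mathcal H^s(A).$$
Consequently, if $f$ is a Lipschitz function with constant $L$, then $\mathcal H^s(f(A)) \le L^s \mathcal H^s(A)$, and if $f$ is an isometry, then $\mathcal H^s(f(A)) = \mathcal H^s(A)$.

Proof. If $(A_i)_i$ is a δ-covering of $A$, then $(f(A \cap A_i))_i$ is an $L\delta^\alpha$-covering of $f(A)$, and
$$\mathcal H^{s/\alpha}_{L\delta^\alpha}(f(A)) \le \sum_{i=1}^\infty c_{s/\alpha} \operatorname{diam}(f(A \cap A_i))^{s/\alpha} \le \frac{c_{s/\alpha}}{c_s} L^{s/\alpha} \sum_{i=1}^\infty c_s \operatorname{diam}(A_i)^s.$$
We deduce $\mathcal H^{s/\alpha}_{L\delta^\alpha}(f(A)) \le \frac{c_{s/\alpha}}{c_s} L^{s/\alpha} \mathcal H^s_\delta(A)$, and the conclusion follows after letting δ → 0.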
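The covering estimate in this proof can be compared numerically for a concrete Hölder map. The Python sketch below uses $f = \sqrt{\cdot}$ on $[0,1]$, which satisfies $|\sqrt x - \sqrt y| \le |x-y|^{1/2}$ (so $L = 1$, $\alpha = 1/2$), and the uniform covering of $[0,1]$ by $n$ intervals. It assumes $f$ is monotone, so that $\operatorname{diam} f([a,b]) = |f(b) - f(a)|$; the helper name `holder_cover_sums` is hypothetical.

```python
import math


def c(s):
    """Normalization constant c_s = 2^(-s) * pi^(s/2) / Gamma(1 + s/2)."""
    return 2.0 ** (-s) * math.pi ** (s / 2) / math.gamma(1 + s / 2)


def holder_cover_sums(f, L, alpha, s, n):
    """Both sides of the covering estimate for the uniform cover of [0, 1] by n intervals.

    Assumes f is monotone, so that diam f([a, b]) = |f(b) - f(a)|.
    """
    pts = [i / n for i in range(n + 1)]
    pieces = list(zip(pts, pts[1:]))
    lhs = sum(c(s / alpha) * abs(f(b) - f(a)) ** (s / alpha) for a, b in pieces)
    rhs = c(s / alpha) / c(s) * L ** (s / alpha) * sum(c(s) * (b - a) ** s for a, b in pieces)
    return lhs, rhs


for s in (0.5, 1.0):
    lhs, rhs = holder_cover_sums(math.sqrt, L=1.0, alpha=0.5, s=s, n=1000)
    print(f"s = {s}: image sum {lhs:.6f} <= bound {rhs:.6f}")
```

The left side is a covering sum for $f([0,1])$ in dimension $s/\alpha$, so it bounds $\mathcal H^{s/\alpha}_{L\delta^\alpha}(f([0,1]))$ from above, exactly as in the first inequality of the proof.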
The Hausdorff measure of an N-dimensional regular surface of $\mathbb R^m$ defined by means of a parametrization is nothing but its classical area.

Proposition 4.1.6. Let $m \in \mathbb N$ with $m \ge N$, let $f : \mathbb R^N \to \mathbb R^m$ be a one-to-one Lipschitz map, and let $E$ be a Borel subset of $\mathbb R^N$. Then
$$\mathcal H^N(f(E)) = \int_E \Big(\sum_{i=1}^{C_m^N} |J_i|^2\Big)^{1/2} d\mathcal L^N,$$
where $J_i$, $i = 1, \dots, C_m^N$, are the $N \times N$ minors of the Jacobian matrix of $f$.
Proof. To illustrate how well the definition of the one-dimensional Hausdorff measure is adapted to the local geometry of arcs, we only give the proof in the case N = 1, where $f : [0,1] \to \mathbb R^m$ is denoted by $t \mapsto (x_i(t))_{i=1,\dots,m}$. For a complete proof, consult Ambrosio [16] or Evans and Gariepy [132]. We must establish
$$\mathcal H^1(f([0,1])) = \int_{[0,1]} \Big(\sum_{i=1}^m |x_i'(t)|^2\Big)^{1/2} dt = \int_{[0,1]} |f'(t)|\,dt,$$
which is the length of the parametrized arc $\Gamma = f([0,1])$. We use the following well-known classical result, straightforwardly obtained from Taylor's formula and Riemann integration theory: if $f$ belongs to $C^1([\alpha,\beta], \mathbb R^m)$, we have
$$\int_{[\alpha,\beta]} |f'(t)|\,dt = \lim_{\eta\to 0} \sum_{i=0}^{n-1} |f(t_{i+1}) - f(t_i)| \ge |f(\alpha) - f(\beta)|,$$
where $t_0 = \alpha < t_1 < \dots < t_i < t_{i+1} < \dots < t_n = \beta$ is some finite subdivision of $[\alpha,\beta]$ with $\eta = \max_{i=0,\dots,n-1}(t_{i+1} - t_i)$.

Step 1. The function $f$ is assumed to belong to $C^1([0,1], \mathbb R^m)$, and we prove $\int_{[0,1]} |f'(t)|\,dt \ge \mathcal H^1(\Gamma)$. For every δ > 0, let η > 0 be such that $|t - t'| < \eta \Rightarrow |f(t) - f(t')| < \delta$. Let us consider a finite subdivision $t_0 = 0 < t_1 < \dots < t_i < t_{i+1} < \dots < t_n = 1$ of $[0,1]$ satisfying $\max_{i=0,\dots,n-1}(t_{i+1} - t_i) < \eta$ and set $\Gamma_i = f([t_i, t_{i+1}))$ for $i = 0, \dots, n-1$. Consider now $a_i$, $b_i$ in $[t_i, t_{i+1}]$ such that $\operatorname{diam}(\Gamma_i) = |f(a_i) - f(b_i)|$. We have
$$\Gamma \setminus \{f(1)\} = \bigcup_{i=0}^{n-1} \Gamma_i$$
with $\operatorname{diam}(\Gamma_i) < \delta$, and
$$\int_{[0,1]} |f'(t)|\,dt = \sum_{i=0}^{n-1} \int_{[t_i,t_{i+1}]} |f'(t)|\,dt \ge \sum_{i=0}^{n-1} \int_{[a_i,b_i]} |f'(t)|\,dt \ge \sum_{i=0}^{n-1} |f(a_i) - f(b_i)| = \sum_{i=0}^{n-1} \operatorname{diam}(\Gamma_i) \ge \mathcal H^1_\delta(\Gamma).$$
We end this step by letting δ → 0.
Step 2. The function $f$ is again assumed to belong to $C^1([0,1], \mathbb R^m)$, and we prove the converse inequality $\int_{[0,1]} |f'(t)|\,dt \le \mathcal H^1(\Gamma)$. Let $n \in \mathbb N$, $h = 1/n$, $t_i = ih$, $i = 0, \dots, n$, and consider the covering
$$\Gamma = \bigcup_{i=0}^{n-1} f([t_i, t_{i+1})) \cup \{f(1)\}$$
by the pairwise disjoint Borel subsets $f([t_i, t_{i+1}))$ of Γ (recall that $f$ is one-to-one). We then have
$$\mathcal H^1(\Gamma) = \sum_{i=0}^{n-1} \mathcal H^1(f([t_i, t_{i+1}))).$$
For each $i = 0, \dots, n-1$, consider the orthogonal projection π of $f([t_i, t_{i+1}))$ onto the line through $f(t_i)$ and $f(t_{i+1})$. As π does not increase distances, Proposition 4.1.5 gives
$$\mathcal H^1(f([t_i, t_{i+1}))) \ge \mathcal H^1(\pi(f([t_i, t_{i+1})))),$$
which is greater than or equal to $\mathcal H^1([f(t_i), f(t_{i+1})))$. Indeed, a convexity argument yields $[f(t_i), f(t_{i+1})) \subset \pi(f([t_i, t_{i+1})))$. By using the definition, it is easy to prove that $\mathcal H^1([a,b]) = |b - a|$ for segments $[a,b]$ in $\mathbb R^m$. Therefore $\mathcal H^1([f(t_i), f(t_{i+1}))) = |f(t_i) - f(t_{i+1})|$ and
$$\mathcal H^1(\Gamma) \ge \sum_{i=0}^{n-1} |f(t_i) - f(t_{i+1})|.$$
The conclusion follows after letting $n \to +\infty$.

Step 3. The function $f$ is assumed to be Lipschitz continuous. The conclusion is a straightforward consequence of the two previous steps and of the following approximation result: every Lipschitz map $f : E \to \mathbb R^m$ can be approximated in the Lusin sense by functions of class $C^1$. More precisely, there exists a nondecreasing family $(K_i)_{i\in\mathbb N}$ of compact sets included in $E$ such that
$$\mathcal H^N\Big(E \setminus \bigcup_{i=0}^\infty K_i\Big) = 0$$
and such that $f|_{K_i}$ is the restriction of a function of class $C^1$. For a proof of this property, consult [132].

Remark 4.1.5. The measure $\big(\sum_{i=1}^{C_m^N} |J_i|^2\big)^{1/2} \cdot \mathcal L^N$ is sometimes called the element of area of the N-dimensional surface $f(E)$, and, for every Borel subset $B$ of $f(E)$,
$$B \mapsto \mathcal H^N(B) = \int_{f^{-1}(B)} \Big(\sum_{i=1}^{C_m^N} |J_i|^2\Big)^{1/2} d\mathcal L^N$$
is its corresponding surface measure.
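Steps 1 and 2 compare $\mathcal H^1(\Gamma)$ with the inscribed polygonal sums $\sum_i |f(t_{i+1}) - f(t_i)|$. The Python sketch below computes these sums for the uniform subdivision $t_i = i/n$ and two hypothetical test curves: the unit circle, of length $2\pi$, and one turn of a helix $t \mapsto (\cos 2\pi t, \sin 2\pi t, t)$, of length $\sqrt{4\pi^2 + 1}$. The helper name `polygonal_length` is hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def polygonal_length(f, n):
    """Sum of |f(t_{i+1}) - f(t_i)| over the uniform subdivision t_i = i/n of [0, 1]."""
    pts = [f(i / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


circle = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))
helix = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t), t)

for n in (4, 16, 256, 4096):
    print(n, polygonal_length(circle, n), polygonal_length(helix, n))
```

The sums increase under refinement of the subdivision (triangle inequality) and converge to $\int_{[0,1]} |f'(t)|\,dt$, in agreement with the classical result recalled at the beginning of the proof.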
When $f$ is not one-to-one, the generalization of the formula in Proposition 4.1.6 is the so-called area formula:
$$\int_{f(E)} \mathcal H^0(E \cap f^{-1}(x))\,\mathcal H^N(dx) = \int_E \Big(\sum_{i=1}^{C_m^N} |J_i|^2\Big)^{1/2} d\mathcal L^N,$$
where $x \mapsto \mathcal H^0(E \cap f^{-1}(x))$ is nothing but the multiplicity function, as illustrated by the parametrization of the unit circle $t \mapsto (\cos(2n\pi t), \sin(2n\pi t))$ from $E = [0,1[$ into $\mathbb R^2$. One can also generalize this formula as follows: let $h : E \to [0,+\infty]$ be a Borel function; then
$$\int_{f(E)} \sum_{x \in E \cap f^{-1}(y)} h(x)\,\mathcal H^N(dy) = \int_E h(x) \Big(\sum_{i=1}^{C_m^N} |J_i|^2\Big)^{1/2} \mathcal L^N(dx).$$
For these generalizations, see [16], [132], or Federer [134].
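For the parametrization $t \mapsto (\cos(2n\pi t), \sin(2n\pi t))$ of the unit circle on $E = [0,1[$, both sides of the area formula can be estimated numerically: the right side is the length $2n\pi$ of the parametrized curve, and the left side is the multiplicity (here $n$ at every point) times $\mathcal H^1$ of the circle, $2\pi$. In the Python sketch below (hypothetical helper names), the multiplicity at the point of angle θ is counted by detecting on a fine grid the upward zero crossings of $\sin(2n\pi t - \theta)$, which occur exactly when the curve passes through that point; θ should be a generic angle so that no crossing falls on a grid node.

```python
import math


def param(n):
    """The parametrization t -> (cos 2n pi t, sin 2n pi t) of the unit circle on [0, 1)."""
    return lambda t: (math.cos(2 * n * math.pi * t), math.sin(2 * n * math.pi * t))


def polygonal_length(f, m):
    """Inscribed polygonal length for the uniform subdivision of [0, 1] into m pieces."""
    pts = [f(i / m) for i in range(m + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def multiplicity(n, theta, m=100_000):
    """Number of t in [0, 1) with 2 n pi t = theta (mod 2 pi), via upward zero crossings on a grid."""
    count = 0
    for j in range(m):
        phi0 = 2 * n * math.pi * j / m - theta
        phi1 = 2 * n * math.pi * (j + 1) / m - theta
        if math.sin(phi0) < 0 <= math.sin(phi1) and math.cos(phi0) > 0:
            count += 1
    return count


n, theta = 3, 1.0
print(polygonal_length(param(n), 60_000), multiplicity(n, theta) * 2 * math.pi)
```

Both printed values approximate $6\pi$, the common value of the two sides of the area formula for $n = 3$.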
4.1.3
Hausdorff dimension
Among the wide variety of definitions of dimension, the Hausdorff dimension introduced in Theorem 4.1.2 has the advantage of being defined for any set. For alternative definitions of dimension, consult, for instance, Falconer [133].

Lemma 4.1.4. For every fixed set $A$ of $\mathbb R^N$, the map $s \mapsto \mathcal H^s(A)$ is nonincreasing. More precisely, for all δ > 0 and for $t > s$, we have
$$\mathcal H^t_\delta(A) \le \delta^{t-s} \frac{c_t}{c_s} \mathcal H^s_\delta(A). \qquad (4.2)$$

Proof. Let $(A_i)_{i\in\mathbb N}$ be a δ-covering of $A$. One has
$$\mathcal H^t_\delta(A) \le c_t \sum_{i\in\mathbb N} \operatorname{diam}(A_i)^t \le \frac{c_t}{c_s}\, c_s \sum_{i\in\mathbb N} \operatorname{diam}(A_i)^s \delta^{t-s}.$$
The conclusion then follows by taking the infimum over all the δ-coverings of $A$.

Theorem 4.1.2 (definition of the Hausdorff dimension). Let $A$ be a set of $\mathbb R^N$ and set $s_0 := \inf\{t \ge 0 : \mathcal H^t(A) = 0\}$. Then $s_0$ satisfies
$$\mathcal H^s(A) = \begin{cases} +\infty & \text{if } s < s_0,\\ 0 & \text{if } s > s_0.\end{cases}$$
The real number $s_0$ is called the Hausdorff dimension of the set $A$ and is denoted by $\dim_H(A)$. At the critical value $s_0$, $\mathcal H^{s_0}(A)$ may be zero or infinite, or may satisfy $0 < \mathcal H^{s_0}(A) < +\infty$. In this last case, $A$ is called an $s_0$-set.

Proof. Letting δ → 0 in inequality (4.2), we obtain
$$\mathcal H^s(A) < +\infty \Rightarrow \forall t > s,\ \mathcal H^t(A) = 0.$$
Set $s_0 := \inf\{t \ge 0 : \mathcal H^t(A) = 0\}$ and take $s < s_0$. Assume that $\mathcal H^s(A) < +\infty$. For $t$ satisfying $s < t < s_0$ we then have $\mathcal H^t(A) = 0$, which contradicts the definition of $s_0$. Consequently, $\mathcal H^s(A) = +\infty$ for $s < s_0$. For $s > s_0$, there exists $t \in [s_0, s)$ with $\mathcal H^t(A) = 0$, and since the map $s \mapsto \mathcal H^s(A)$ is nonincreasing, we have $\mathcal H^s(A) = 0$.

Obviously $\dim_H(\mathbb R) = 1$ and $\mathcal H^1(\mathbb R) = +\infty$. On the other hand, Proposition 4.1.9 provides a nontrivial set $A$ in $\mathbb R$ with $\dim_H(A) = 1$ and $\mathcal H^1(A) = 0$.

Remark 4.1.6. Taking, as δ-coverings, only families of balls of $\mathbb R^N$, one may define
$$\tilde{\mathcal H}^s_\delta(E) := \inf\Big\{c_s \sum_{i=1}^\infty \operatorname{diam}(B_i)^s : E \subset \bigcup_{i=1}^\infty B_i,\ 0 < \operatorname{diam}(B_i) \le \delta,\ B_i \text{ ball of } \mathbb R^N\Big\},$$
and the set mapping $\tilde{\mathcal H}^s$ by
$$\tilde{\mathcal H}^s(E) := \sup_{\delta>0} \tilde{\mathcal H}^s_\delta(E) = \lim_{\delta\to0} \tilde{\mathcal H}^s_\delta(E).$$
It is easy to establish the bounds $\mathcal H^s(E) \le \tilde{\mathcal H}^s(E) \le 2^s \mathcal H^s(E)$. Thanks to this estimate, the Hausdorff dimensions defined from the two mappings $\mathcal H^s$ and $\tilde{\mathcal H}^s$ are equal.

The following properties are useful for computing Hausdorff dimensions.

Proposition 4.1.7. Let $A$ be any set in $\mathbb R^N$.
(i) The following implications hold true:
$$\mathcal H^s(A) < +\infty \Rightarrow \dim_H(A) \le s, \qquad \mathcal H^s(A) > 0 \Rightarrow \dim_H(A) \ge s;$$
(ii) Let $f : A \to \mathbb R^m$ satisfy, for all $x$ and $y$ in $A$, $|f(x) - f(y)| \le L|x - y|^\alpha$, where $L > 0$ and $\alpha > 0$ are two given constants. Then
$$\dim_H(f(A)) \le \frac{1}{\alpha} \dim_H(A).$$
Proof. Assertion (i) is a straightforward consequence of the definition. Let us prove (ii). If $s > \dim_H(A)$, then $\mathcal H^s(A) = 0$, and the inequality $\mathcal H^{s/\alpha}(f(A)) \le \frac{c_{s/\alpha}}{c_s} L^{s/\alpha} \mathcal H^s(A)$ established in Proposition 4.1.5 yields $\mathcal H^{s/\alpha}(f(A)) = 0$, so that
$$\frac{s}{\alpha} \ge \dim_H(f(A)).$$
The conclusion follows by letting $s \to \dim_H(A)$.

Example 4.1.1. When $U$ is a nonempty open subset of $\mathbb R^N$, then $\dim_H(U) = N$. Indeed, $U$ contains an open ball $B$ and $\mathcal H^N(U) \ge \mathcal H^N(B) > 0$, and thus $\dim_H(U) \ge N$; moreover, $\mathcal H^s(U) = 0$ for $s > N$ by Theorem 4.1.1, so that $\dim_H(U) \le N$.

Example 4.1.2. Every countable subset $A$ of $\mathbb R^N$ has zero Hausdorff dimension. Indeed, by σ-additivity, one has
$$\mathcal H^s(A) = \sum_{a\in A} \mathcal H^s(\{a\}).$$
Since $\mathcal H^0(\{a\}) = 1$, for all $s > 0$ one has $\mathcal H^s(\{a\}) = 0$, and thus $\mathcal H^s(A) = 0$. This proves that $\dim_H(A) = 0$.

Example 4.1.3. Let us consider, for $N \le m$, a one-to-one map $f : \mathbb R^N \to \mathbb R^m$ of class $C^1$. Let $E$ be a compact subset of $\mathbb R^N$. We have $\dim_H(f(E)) = N$. Indeed, from Proposition 4.1.6, we have $0 < \mathcal H^N(f(E)) \le C\mathcal L^N(E) < +\infty$, where $C$ is a positive constant.

Example 4.1.4. Let us now consider the middle third Cantor set $C$ of the interval $[0,1]$. We have $\dim_H(C) = \frac{\ln 2}{\ln 3}$. Indeed,
$$C = \Big(C \cap \big[0, \tfrac13\big]\Big) \cup \Big(C \cap \big[\tfrac23, 1\big]\Big).$$
Set $C_1 := C \cap [0, \frac13]$ and $C_2 := C \cap [\frac23, 1]$. These two sets are geometrically similar to $C$ with ratio $\frac13$. According to the properties of the Hausdorff measure established in Propositions 4.1.4 and 4.1.5, we derive
$$\mathcal H^s(C) = \frac{2}{3^s} \mathcal H^s(C).$$
The conclusion follows if we assume that $C$ is an s-set, that is, $0 < \mathcal H^s(C) < +\infty$: then $2 \cdot 3^{-s} = 1$, i.e., $s = \ln 2/\ln 3$. For a proof of this property, consult [133], where various interesting methods are given for the calculation of Hausdorff dimensions of fractal sets.

More generally, let $S_1, \dots, S_m : \mathbb R^N \to \mathbb R^N$ be $m$ similarities, i.e., satisfying $|S_i(x) - S_i(y)| = c_i|x - y|$ for all $x, y$ in $\mathbb R^N$, where $0 < c_i < 1$ (the ratio of $S_i$). We assume that the $S_i$ satisfy the open set condition, that is, there exists a nonempty bounded open set $U$ such that the sets $S_i(U)$ are pairwise disjoint and
$$\bigcup_{i=1}^m S_i(U) \subset U.$$
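For the middle third Cantor set, Example 4.1.4 reduces the dimension computation to the equation $2 \cdot 3^{-s} = 1$. For $m$ similarities with ratios $c_i$ the analogous equation is $\sum_{i=1}^m c_i^s = 1$; its left side is continuous and strictly decreasing in $s$, equals $m \ge 1$ at $s = 0$, and tends to 0 as $s \to +\infty$, so it has a unique root, which bisection finds. The Python sketch below is a minimal illustration; the function name `similarity_dimension` is hypothetical, and besides the Cantor set the tests use the ratios of three further standard self-similar examples (four maps of ratio 1/2 for the unit square, three maps of ratio 1/2 for the Sierpinski triangle, ratios 1/2 and 1/4), which are not taken from the text.

```python
import math


def similarity_dimension(ratios, tol=1e-13):
    """Solve sum(c_i ** s) = 1 for s >= 0 by bisection, assuming 0 < c_i < 1."""
    g = lambda s: sum(c ** s for c in ratios) - 1.0   # strictly decreasing, g(0) = m - 1 >= 0
    lo, hi = 0.0, 1.0
    while g(hi) > 0:          # enlarge the bracket until g changes sign
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(similarity_dimension([1 / 3, 1 / 3]), math.log(2) / math.log(3))
```

Both printed values agree with $\ln 2/\ln 3 \approx 0.6309$ up to the bisection tolerance.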
Consider now a so-called self-similar set $F$, that is, a nonempty compact set satisfying $F = \bigcup_{i=1}^m S_i(F)$. Then $\dim_H F = s$, where $s$ is the solution of
$$\sum_{i=1}^m c_i^s = 1,$$
and $0 < \mathcal{H}^s(F) < +\infty$. For a proof, consult [133]. Note that the middle third Cantor set is such a self-similar set, by taking $S_1(x) = \frac{1}{3}x$ and $S_2(x) = \frac{1}{3}x + \frac{2}{3}$; the open set condition holds for $U = ]0,1[$.
The Hausdorff dimension of a set gives us some information about its topological structure, as stated in the proposition below.
Proposition 4.1.8. A subset $E$ of $\mathbb{R}^N$ with $\dim_H(E) < 1$ is totally disconnected: no two of its points lie in the same connected component.
Proof. Let $x$ and $y$ be distinct elements of $E$ and consider the distance function $d_x$ to $x$ in $\mathbb{R}^N$, that is, $d_x(z) = |z - x|$. As $d_x$ does not increase distances, from Proposition 4.1.7 we deduce that $\dim_H(d_x(E)) \le \dim_H(E) < 1$. Thus $d_x(E)$ is a subset of $\mathbb{R}$ of $\mathcal{H}^1$ measure (or Lebesgue measure) zero. Consequently, there exists $r > 0$ such that $r < d_x(y)$ and $r \notin d_x(E)$ (otherwise $(0, d_x(y)) \subset d_x(E)$). The set $E$ is then the union of the two disjoint relatively open sets
$$E = \{z \in E : d_x(z) < r\} \cup \{z \in E : d_x(z) > r\},$$
where $x$ is in one set and $y$ in the other, so that $x$ and $y$ lie in different connected components.
We end this section by giving a nontrivial set in $\mathbb{R}$ with null Lebesgue measure but with Hausdorff dimension equal to one.
Proposition 4.1.9. There exists a compact set $E$ of $[0,1]$ such that $\mathcal{H}^1(E) = 0$ and $\dim_H(E) = 1$.
Proof. We make use of a Cantor-type set construction but reduce the proportion of the intervals removed at each stage. More precisely, we define a family $(K_n)_{n\in\mathbb{N}}$ of closed sets as follows: $K_0 = [0,1]$; $K_1 = K_0 \setminus I_1^1$, where $I_1^1$ is an open interval centered at $1/2$ with length $l_0$, $0 < l_0 < 1$; for all $n \ge 1$, $K_n$ is the union of $2^n$ closed disjoint intervals $I_i^n$ with the same length $l_n$, and $K_{n+1}$ is obtained by removing from each $I_i^n$ an open interval of length $\frac{l_n}{n+1}$ centered at the center of $I_i^n$. Let us show that the compact set $E = \bigcap_{n\in\mathbb{N}} K_n$ answers the question. A straightforward calculation gives $l_n = 2^{-n}(1 - l_0)/n$, so that $\mathcal{H}^1(E) = \lim_{n\to+\infty}\mathcal{H}^1(K_n) = \lim_{n\to+\infty} 2^n l_n = 0$.
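Now fix $0 < r < 1$ and $x \in E$, and define the integer (depending on $r$ and $x$)
$$n_0 = \sup\{n \in \mathbb{N} : \exists i \in \{1, \dots, 2^n\},\ B_r(x) \cap E \subset I_i^n\}.$$
It is easily seen that
$$r \ge \frac{l_{n_0}}{n_0 + 1} = \frac{2^{-n_0}(1 - l_0)}{n_0(n_0 + 1)}. \qquad (4.3)$$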
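On the other hand, for each $0 < \alpha < 1$, there exists a positive constant $C(\alpha, l_0)$ (i.e., independent of $n_0$) such that
$$2^{-n_0} \le C(\alpha, l_0)\,\frac{2^{-\alpha n_0}(1 - l_0)^\alpha}{n_0^\alpha (n_0 + 1)^\alpha}; \qquad (4.4)$$
indeed, it is enough to take
$$C(\alpha, l_0) = (1 - l_0)^{-\alpha}\max\{x^\alpha (1 + x)^\alpha 2^{-(1-\alpha)x} : x \ge 1\}.$$
Consider now the outer measure $\mu$ on $\mathbb{R}$ satisfying $\mu(I_i^n) = 2^{-n}$ for all $n$ and all $i = 1, \dots, 2^n$, defined following the general construction of Remark 4.1.2. Collecting (4.3) and (4.4), we obtain, for all $x \in E$ and all $0 < r < 1$,
$$\mu(B_r(x) \cap E) \le C(\alpha, l_0)\,r^\alpha.$$
This estimate yields $\dim_H(E) \ge \alpha$. Indeed, if $(B_i)_{i\in\mathbb{N}}$ is a $\delta$-covering of $E$ made up of balls $B_i$ centered in $E$ with $\delta < 1$, then
$$\sum_{i\in\mathbb{N}} c_\alpha\,\mathrm{diam}(B_i)^\alpha \ge \frac{\omega_\alpha}{C(\alpha, l_0)}\sum_{i\in\mathbb{N}}\mu(B_i \cap E) \ge \frac{\omega_\alpha}{C(\alpha, l_0)}\,\mu(E) = \frac{\omega_\alpha}{C(\alpha, l_0)},$$
so that $\tilde{\mathcal{H}}^\alpha(E) > 0$; hence $\mathcal{H}^\alpha(E) > 0$ and $\dim_H(E) \ge \alpha$. Since $\alpha < 1$ is arbitrary, we have $\dim_H(E) \ge 1$. The opposite inequality is trivial since $E \subset \mathbb{R}$; hence $\dim_H(E) = 1$.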
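Two numerical facts used in this section, the dimension equation $\sum_{i=1}^m c_i^s = 1$ for self-similar sets and the length formula $l_n = 2^{-n}(1-l_0)/n$ in the proof of Proposition 4.1.9, can be checked with a few lines of code. The following Python sketch is only an illustration; `similarity_dimension` and `cantor_type_lengths` are hypothetical helper names of our own. The first solves the dimension equation by bisection, using that $s \mapsto \sum_i c_i^s$ is strictly decreasing. The second iterates the recursion $l_{n+1} = \frac{1}{2}\big(l_n - \frac{l_n}{n+1}\big)$ implied by the construction of $K_{n+1}$.

```python
import math


def similarity_dimension(ratios, tol=1e-12):
    """Return the unique s >= 0 such that sum_i c_i**s == 1, found by bisection.

    Assumes every ratio c_i lies in (0, 1): then s -> sum_i c_i**s is continuous,
    strictly decreasing, equal to m >= 1 at s = 0 and tends to 0 as s -> +inf.
    """
    if not ratios or any(not 0.0 < c < 1.0 for c in ratios):
        raise ValueError("every ratio must lie in (0, 1)")

    def excess(s):
        return sum(c ** s for c in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:  # the root lies beyond hi: shift and double the bracket
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cantor_type_lengths(l0, n_max):
    """Lengths l_1, ..., l_{n_max} of the intervals I_i^n in Proposition 4.1.9.

    K_1 has 2 intervals of length (1 - l0) / 2; K_{n+1} removes from the middle of
    each interval of K_n an open interval of length l_n / (n + 1).
    """
    lengths = {1: (1.0 - l0) / 2.0}
    for n in range(1, n_max):
        ln = lengths[n]
        lengths[n + 1] = (ln - ln / (n + 1)) / 2.0
    return lengths


print(similarity_dimension([1 / 3, 1 / 3]), math.log(2) / math.log(3))
lengths = cantor_type_lengths(0.5, 20)
print([2 ** n * lengths[n] for n in (1, 5, 10, 20)])  # total lengths of K_n
```

For the middle third Cantor set, `similarity_dimension([1/3, 1/3])` agrees with $\ln 2/\ln 3$ up to the bisection tolerance, and the total lengths $2^n l_n = (1-l_0)/n$ of the sets $K_n$ decrease to zero, as used in the proof.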
4.2 Set functions and duality approach to Borel measures
4.2.1 Borel measures as set functions
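Let $\Omega$ be a separable locally compact topological space (for example, $\mathbb{R}^N$ or an open subset of $\mathbb{R}^N$) and $\mathcal{B}(\Omega)$ its Borel field. We denote the set of all $\mathbb{R}^m$-valued Borel measures by $M(\Omega, \mathbb{R}^m)$. Let us recall that $M(\Omega, \mathbb{R}^m)$ is the vector space of all the set functions $\mu : \mathcal{B}(\Omega) \to \mathbb{R}^m$ satisfying $\mu(\emptyset) = 0$ and the σ-additivity condition:
$$\mu\Big(\bigcup_{n\in\mathbb{N}} B_n\Big) = \sum_{n\in\mathbb{N}} \mu(B_n) \quad \text{for every pairwise disjoint family } (B_n)_{n\in\mathbb{N}} \text{ in } \mathcal{B}(\Omega).$$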
In the case when $m = 1$, we will use the notation $M(\Omega)$ (or $M_b(\Omega)$), and the elements of $M(\Omega)$ are called signed Borel measures. The subset of its nonnegative elements is denoted by $M^+(\Omega)$. If $A$ is a fixed Borel subset of $\Omega$, the restriction to $A$ of a Borel measure $\mu \in M(\Omega, \mathbb{R}^m)$ is the Borel measure $\mu \lfloor A$ of $M(\Omega, \mathbb{R}^m)$ defined for every Borel set $E$ of $\Omega$ by $\mu \lfloor A(E) = \mu(E \cap A)$. It is worth noticing that Section 4.1 provides many concrete Borel measures on $\mathbb{R}^N$. Indeed, if $A$ is a Borel subset of $\mathbb{R}^N$ satisfying $\mathcal{H}^s(A) < +\infty$, then $\mu = \mathcal{H}^s \lfloor A$ belongs to $M^+(\mathbb{R}^N)$. The support of a measure $\mu \in M(\Omega, \mathbb{R}^m)$, denoted by $\mathrm{spt}(\mu)$, is the smallest closed set $E$ of $\Omega$ such that $|\mu|(\Omega \setminus E) = 0$. As a straightforward consequence of the definition, we also have $\mathrm{spt}(\mu) = \{x \in \Omega : \forall r > 0,\ |\mu|(B_r(x)) > 0\}$.
Let us recall that if $\mu \in M^+(\Omega)$, the measure $\mu(B)$ of every Borel subset $B$ of $\Omega$ can be approximated by the measures of the open or compact subsets of $\Omega$. More precisely, the following holds.
Proposition 4.2.1. The Borel measures $\mu$ in $M^+(\Omega)$ are regular, i.e., for every Borel set $B$ of $\Omega$, one has
$$\mu(B) = \sup\{\mu(K) : K \subset B,\ K \text{ compact set of } \Omega\} = \inf\{\mu(U) : B \subset U,\ U \text{ open set of } \Omega\}.$$
The total variation of a measure $\mu \in M(\Omega, \mathbb{R}^m)$ is the real-valued set function $|\mu|$ defined for every Borel set $B$ of $\Omega$ by
$$|\mu|(B) = \sup\Big\{\sum_{i=0}^{\infty} |\mu(B_i)| : \bigcup_{i=0}^{\infty} B_i = B\Big\},$$
where the supremum is taken over all the partitions of $B$ in $\mathcal{B}(\Omega)$. We point out that for all $\mu$ in $M(\Omega, \mathbb{R}^m)$ we automatically have $|\mu|(\Omega) < +\infty$ ($\mu$ is bounded) and $|\mu|$ is σ-additive, so that $|\mu|$ is a Borel measure in $M^+(\Omega)$. Actually, it is easily seen that $|\mu|$ is the smallest nonnegative scalar Borel measure $\nu$ such that $|\mu(B)| \le \nu(B)$ for all Borel sets $B$, and that the mapping $\mu \mapsto |\mu|(\Omega)$ is a norm for which $M(\Omega, \mathbb{R}^m)$ is a Banach space. In the scalar case, for all $\mu \in M(\Omega)$ we define in $M^+(\Omega)$ the positive part $\mu^+$ and the negative part $\mu^-$ of $\mu$ by
$$\mu^+ = \frac{|\mu| + \mu}{2}, \qquad \mu^- = \frac{|\mu| - \mu}{2},$$
so that $\mu = \mu^+ - \mu^-$ and $|\mu| = \mu^+ + \mu^-$. We now define the nonnegative Radon measures as the locally finite nonnegative Borel measures.
Definition 4.2.1. A set function $\mu : \mathcal{B}(\Omega) \to [0, +\infty]$ such that, for all $\Omega' \subset\subset \Omega$, its restriction to $\mathcal{B}(\Omega')$ is a Borel measure on $\Omega'$ is called a nonnegative Radon measure.
Remark 4.2.1. It is easily seen that the nonnegative Radon measures are regular.
Given a measure $\lambda$ in $M^+(\Omega)$, we denote the set of all Borel functions $f : \Omega \to \mathbb{R}^m$ such that $\int_\Omega |f|\,d\lambda < +\infty$ by $L^1_\lambda(\Omega, \mathbb{R}^m)$, or by $L^1_\lambda(\Omega)$ when $m = 1$. Given now two measures $\mu \in M(\Omega, \mathbb{R}^m)$ and $\lambda \in M^+(\Omega)$, we say that the measure $\mu$ is absolutely continuous with respect to the measure $\lambda$ and we write $\mu$ […] $\frac{1}{n}\}$ is finite. Assume that $I_n$ contains an infinite sequence of indices in $I$, that is, $\{i_0, \dots, i_k, \dots\} \subset I_n$. We then have
$$+\infty > \mu\Big(\bigcup_{k\in\mathbb{N}} B_{i_k}\Big) = \sum_{k\in\mathbb{N}} \mu(B_{i_k}) = +\infty,$$
a contradiction.
Example 4.2.1. Let $\mu \in M(\Omega, \mathbb{R}^m)$ and let $B_r(x_0)$ be the open ball in $\Omega$ centered at $x_0$ with radius $r$. Then for all but countably many $r$ in $\mathbb{R}_+$, one has $|\mu|(\partial B_r(x_0)) = 0$.
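For a purely atomic measure $\mu = \sum_k m_k \delta_{a_k}$ with finitely many atoms, the objects introduced above can be computed by hand. The supremum over partitions defining $|\mu|(B)$ is attained by separating the atoms, so $|\mu|(B) = \sum_{a_k \in B} |m_k|$. The Jordan decomposition acts atom by atom. Finally, since the spheres $\partial B_r(x_0)$ are pairwise disjoint for different $r$, only the radii $r = |a_k - x_0|$ can satisfy $|\mu|(\partial B_r(x_0)) > 0$, in line with Example 4.2.1. The Python sketch below illustrates this on hypothetical data (the atoms and weights in `ATOMS` are arbitrary choices, and the helper names are our own).

```python
import math

# Hypothetical data: an R^2-valued atomic measure mu = sum_k m_k delta_{a_k} on R^2,
# stored as (atom a_k, weight m_k) pairs.
ATOMS = [
    ((0.0, 0.0), (1.0, -2.0)),
    ((1.0, 0.0), (0.5, 0.0)),
    ((0.0, 2.0), (-3.0, 4.0)),
]


def total_variation(atoms, in_set):
    """|mu|(B) = sum of the Euclidean norms |m_k| over the atoms a_k lying in B.

    `in_set` is the indicator function of the Borel set B."""
    return sum(math.hypot(*m) for a, m in atoms if in_set(a))


def jordan(weights):
    """Positive and negative parts of a scalar atomic measure {atom: weight},
    computed atom by atom as (|w| + w) / 2 and (|w| - w) / 2."""
    pos = {a: (abs(w) + w) / 2 for a, w in weights.items() if w > 0}
    neg = {a: (abs(w) - w) / 2 for a, w in weights.items() if w < 0}
    return pos, neg


def charged_radii(atoms, x0):
    """Radii r > 0 with |mu|(boundary of B_r(x0)) > 0, i.e. the distances from x0
    of the atoms (other than x0 itself) carrying a nonzero weight."""
    return sorted({math.dist(a, x0) for a, m in atoms if math.hypot(*m) > 0 and a != x0})


first_component = {a: m[0] for a, m in ATOMS}
print(total_variation(ATOMS, lambda a: True))  # |mu|(R^2) = sqrt(5) + 0.5 + 5
print(jordan(first_component))
print(charged_radii(ATOMS, (0.0, 0.0)))
```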
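Lemma 4.2.2. Let $\mu \in M^+(\Omega)$, let $(f_i)_{i\in\mathbb{N}}$ be a family of nonnegative functions in $L^1_\mu(\Omega)$, and set $f = \sup_i f_i$. Then
$$\int_\Omega f\,d\mu = \sup\Big\{\sum_{i\in I} \int_{A_i} f_i\,d\mu\Big\},$$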
where the supremum is taken over all finite families $(A_i)_{i\in I}$ of pairwise disjoint open subsets of $\Omega$.
Proof. Let $n$ be a fixed element of $\mathbb{N}$ and consider the µ-measurable sets
$$E_i := \Big\{x \in \Omega : \sup_{0 \le k \le n} f_k(x) = f_i(x)\Big\}, \qquad i = 0, \dots, n.$$
We now construct the following family $(\Omega_i)_{i=0,\dots,n}$ of pairwise disjoint µ-measurable sets:
$$\Omega_0 = E_0, \qquad \Omega_{i+1} = \Big(\Omega \setminus \bigcup_{k=0}^{i} E_k\Big) \cap E_{i+1}, \quad i = 0, \dots, n-1.$$
It is easy to check that $\Omega = \bigcup_{i=0}^n E_i = \bigcup_{i=0}^n \Omega_i$. Moreover,
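$$\int_\Omega \sup_{0\le k\le n} f_k\,d\mu = \int_{\bigcup_{i=0}^n \Omega_i} \sup_{0\le k\le n} f_k\,d\mu = \sum_{i=0}^n \int_{\Omega_i} \sup_{0\le k\le n} f_k\,d\mu = \sum_{i=0}^n \int_{\Omega_i} f_i\,d\mu.$$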
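Let $\mu_i = f_i \cdot \mu$ be the Borel measure in $M^+(\Omega)$ whose density with respect to $\mu$ is $f_i$. Since each measure $\mu_i$ is regular, one has $\mu_i(\Omega_i) = \sup\{\mu_i(K) : K \text{ compact subset of } \Omega_i\}$; hence
$$\int_\Omega \sup_{0\le i\le n} f_i\,d\mu = \sup\Big\{\sum_{i=0}^n \int_{K_i} f_i\,d\mu : (K_i)_{i=0,\dots,n} \text{ pairwise disjoint compact sets}\Big\}.$$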
From the regularity property of $\mu_i$ again, and by compactness, for all $\varepsilon > 0$ there exists a family $(O_i)_{i=0,\dots,n}$ of open sets, which one may assume pairwise disjoint, each $O_i$ containing $K_i$, such that $\mu_i(O_i \setminus K_i) < \varepsilon/(n+1)$. Therefore
$$\int_\Omega \sup_{0\le i\le n} f_i\,d\mu = \sup\Big\{\sum_{i=0}^n \int_{A_i} f_i\,d\mu : (A_i)_{i=0,\dots,n} \text{ pairwise disjoint open sets}\Big\}.$$
Taking the supremum over $n$ and using the monotone convergence theorem, we obtain
$$\int_\Omega f\,d\mu \le \sup\Big\{\sum_{i\in I} \int_{A_i} f_i\,d\mu\Big\}.$$
Since the converse inequality is obvious, the proof is complete.
Example 4.2.2. Let $\Omega$ be a bounded open set of $\mathbb{R}^N$ and let us consider the following measure associated with a given function $u$ in the space $SBV(\Omega)$ defined in Section 10.5:
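$S_u$ is a hypersurface in $\Omega$ which is the union of a negligible subset of $\Omega$ for the $(N-1)$-dimensional Hausdorff measure and of countably many $C^1$-hypersurfaces with Hausdorff dimension $N - 1$ ($S_u$ is the jump set of $u$). For $\mathcal{H}^{N-1}$-almost every $x \in S_u$, $\nu_u(x)$ is a unit normal vector at $x$, and
$$\mu := \mathcal{L}^N + \mathcal{H}^{N-1} \lfloor S_u,$$
where $\mathcal{L}^N$ denotes the Lebesgue measure in $\mathbb{R}^N$. On the other hand, let us consider the functions $f_\nu := |\nabla u \cdot \nu|^p + |\nu_u \cdot \nu|\,1_{S_u}$, indexed by $\nu$ in a countable dense subset $D$ of $S^{N-1}$. We have
$$\int_\Omega \sup_{\nu\in D} f_\nu\,d\mu = \int_{\Omega\setminus S_u} \sup_{\nu\in D} f_\nu\,d\mu + \int_{S_u} \sup_{\nu\in D} f_\nu\,d\mu = \int_\Omega |\nabla u|^p\,dx + \mathcal{H}^{N-1}(S_u),$$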
which is the Mumford–Shah energy functional introduced in image segmentation and studied in Sections 12.4 and 14.3. By now applying Lemma 4.2.2, we obtain the following formula, which is a central point in extending to arbitrary dimension the approximation of the Mumford–Shah energy functional established in one dimension:
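$$\int_\Omega |\nabla u|^p\,dx + \mathcal{H}^{N-1}(S_u) = \sup\Big\{\sum_{i\in I}\Big(\int_{A_i} |\nabla u \cdot \nu_i|^p\,dx + \int_{A_i} |\nu_u \cdot \nu_i|\,d\mathcal{H}^{N-1}\lfloor S_u\Big)\Big\}.$$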
The supremum is taken over all finite families $(A_i)_{i\in I}$ of pairwise disjoint open subsets of $\Omega$, where $(\nu_i)_{i\in\mathbb{N}}$ is an enumeration of $D$. Note that for $p = 1$, we obtain the expression of the total variation of the measure $Du := \nabla u\,\mathcal{L}^N + [u]\,\nu_u\,\mathcal{H}^{N-1}\lfloor S_u$. For more details on the spaces $BV(\Omega)$ and $SBV(\Omega)$, see Chapter 10, and for the definition and properties of the Mumford–Shah energy functional, consult Sections 12.4 and 14.3. For another application of the localization Lemma 4.2.2, we refer the reader to Section 13.3.
Lemma 4.2.3. Let $\Omega$ be an open bounded subset of $\mathbb{R}^N$, $\mu$ a nonnegative Radon measure on $\Omega$, and $0 < s \le N$, $t > 0$. For each Borel subset $E$ of $\Omega$, the following implication holds:
$$\Big(\forall x \in E,\ \limsup_{\rho\to0} \frac{\mu(B_\rho(x))}{\rho^s} > t\Big) \Longrightarrow \mu \ge C\,t\,\mathcal{H}^s \lfloor E,$$
where $C$ is a positive constant depending only on $s$. When $s$ is an integer, one may take $C = \omega_s^{-1}$, where $\omega_s$ is the volume of the unit ball of $\mathbb{R}^s$.
Proof. One may assume $\mu(E) < +\infty$. Since $\mu(E) = \inf\{\mu(U) : E \subset U,\ U \text{ open set of } \Omega\}$, one may choose an arbitrary open subset $U$ of $\Omega$ such that $E \subset U$ and $\mu(U) < +\infty$. Let now $\delta > 0$ and consider the family of closed balls
$$\mathcal{F} := \Big\{\overline{B}_\rho(x) \subset U : x \in E,\ \rho \le \frac{\delta}{2},\ \frac{\mu(B_\rho(x))}{\rho^s} \ge t\Big\}.$$
From the hypothesis, it is easily seen that this family finely covers $E$ (Definition 4.1.3). Therefore, according to Lemma 4.1.3, there exists a countable subfamily $\mathcal{G}$ of pairwise
disjoint elements of $\mathcal{F}$ such that
$$\bigcup_{B\in\mathcal{F}} B \subset \bigcup_{B\in\mathcal{G}} B^*,$$
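where $B^*$ denotes the closed ball concentric with $B$, with radius five times that of $B$. Moreover, for each finite family $\mathcal{G}^* \subset \mathcal{G}$,
$$E \subset \Big(\bigcup_{B\in\mathcal{G}^*} B\Big) \cup \Big(\bigcup_{B\in\mathcal{G}\setminus\mathcal{G}^*} B^*\Big).$$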
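Thus,
$$\mathcal{H}^s_{5\delta}(E) \le c_s \sum_{B\in\mathcal{G}^*} (\mathrm{diam}(B))^s + c_s 5^s \sum_{B\in\mathcal{G}\setminus\mathcal{G}^*} (\mathrm{diam}(B))^s \le 2^s c_s t^{-1} \sum_{B\in\mathcal{G}^*} \mu(B) + 2^s c_s 5^s t^{-1} \sum_{B\in\mathcal{G}\setminus\mathcal{G}^*} \mu(B).$$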
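Since the series $\sum_{B\in\mathcal{G}}\mu(B) \le \mu(U)$ converges, the second term of the last inequality can be made less than $\delta$ by an appropriate choice of $\mathcal{G}^*$; recalling that $2^s c_s = \omega_s$, we obtain
$$\mathcal{H}^s_{5\delta}(E) \le \omega_s t^{-1} \Big(\sum_{B\in\mathcal{G}^*} \mu(B)\Big) + \delta \le \omega_s t^{-1} \mu(U) + \delta.$$
We end the proof by letting $\delta \to 0$ and taking the infimum over $U$.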
4.2.2 Duality approach
We now recall some definitions and important results concerning the Riesz functional analysis approach. We denote by $C_0(\Omega, \mathbb{R}^m)$ the space of all continuous functions which tend to zero at infinity, i.e., such that for all $\varepsilon > 0$ there exists a compact set $K_\varepsilon \subset \Omega$ with
$$\sup_{x \in \Omega\setminus K_\varepsilon} |\varphi(x)| \le \varepsilon.$$
Recall that, equipped with the norm $\|\varphi\|_\infty = \sup_{x\in\Omega} |\varphi(x)|$, $C_0(\Omega, \mathbb{R}^m)$ is a Banach space. We denote its subspace made up of all continuous functions with compact support in $\Omega$ by $C_c(\Omega, \mathbb{R}^m)$. When $m = 1$, these two spaces will be denoted by $C_0(\Omega)$ and $C_c(\Omega)$, respectively. According to the vectorial version of the Riesz–Alexandroff representation Theorem 2.4.6, the dual of $C_0(\Omega, \mathbb{R}^m)$ (and then of $C_c(\Omega, \mathbb{R}^m)$) can be isometrically identified with $M(\Omega, \mathbb{R}^m)$. Then any Borel measure is a continuous linear form on $C_0(\Omega, \mathbb{R}^m)$ or
$C_c(\Omega, \mathbb{R}^m)$, and the two dual norms $\|\cdot\|_{C_c(\Omega,\mathbb{R}^m)'}$ and $\|\cdot\|_{C_0(\Omega,\mathbb{R}^m)'}$ are equal to the total mass $|\cdot|(\Omega)$:
$$|\mu|(\Omega) = \sup\{\langle\mu, \varphi\rangle : \varphi \in C_0(\Omega, \mathbb{R}^m),\ \|\varphi\|_\infty \le 1\} = \sup\{\langle\mu, \varphi\rangle : \varphi \in C_c(\Omega, \mathbb{R}^m),\ \|\varphi\|_\infty \le 1\},$$
where $\langle\mu, \varphi\rangle = \mu(\varphi)$. In what follows, $\langle\mu, \varphi\rangle$ will also be denoted by $\int_\Omega \varphi\,d\mu$. Note that $M(\Omega, \mathbb{R}^m)$ is isomorphic to the product space $M(\Omega)^m$ and that, according to this isomorphism,
$$\mu \in M(\Omega, \mathbb{R}^m) \iff \mu = (\mu_1, \dots, \mu_m) \text{ and } \mu_i \in C_0(\Omega)', \quad i = 1, \dots, m.$$
The following proposition is an easy generalization, for vectorial measures, of Proposition 2.4.13 and Corollary 2.4.1. To shorten notation, $\sigma(C_0', C_0)$ and $\sigma(C_c', C_c)$ denote, respectively, the two weak topologies $\sigma(C_0(\Omega, \mathbb{R}^m)', C_0(\Omega, \mathbb{R}^m))$ and $\sigma(C_c(\Omega, \mathbb{R}^m)', C_c(\Omega, \mathbb{R}^m))$.
Proposition 4.2.2. The weak topologies $\sigma(C_0', C_0)$ and $\sigma(C_c', C_c)$ induce the same topology on bounded subsets of $M(\Omega, \mathbb{R}^m)$. Moreover, from any bounded sequence of Borel measures $(\mu_n)_{n\in\mathbb{N}}$ in $M(\Omega, \mathbb{R}^m)$, one can extract a subsequence $\sigma(C_0', C_0)$-converging (thus $\sigma(C_c', C_c)$-converging) to some Borel measure $\mu$ in $M(\Omega, \mathbb{R}^m)$.
Note that for unbounded sequences of $M(\Omega, \mathbb{R}^m)$ the two weak topologies do not agree, as illustrated by the following example: take $\Omega = (0, +\infty)$ and $\mu_n = n\delta_n$. Then $n\delta_n \rightharpoonup 0$ in the topology $\sigma(C_c(\Omega)', C_c(\Omega))$, but $\langle n\delta_n, \varphi\rangle \to 1$ as $n \to +\infty$ for any function $\varphi$ of $C_0(\Omega)$ satisfying $\varphi(x) \sim \frac{1}{x}$ in a neighborhood of $+\infty$.
According to Proposition 4.2.2, from now on, for bounded sequences in $M(\Omega, \mathbb{R}^m)$, we do not distinguish the convergences associated with these two weak topologies, and we will refer to them as the weak convergence in $M(\Omega, \mathbb{R}^m)$. In terms of the probabilistic approach, we have the following properties.
Proposition 4.2.3 (Alexandrov). Let $\mu$, $\mu_n$ in $M^+(\Omega)$ be such that $\mu_n$ weakly converges to $\mu$; then for every open subset $U$ of $\Omega$,
$$\mu(U) \le \liminf_{n\to+\infty} \mu_n(U),$$
and for every compact subset $K$ of $\Omega$,
$$\mu(K) \ge \limsup_{n\to+\infty} \mu_n(K).$$
Consequently, for every relatively compact Borel subset $B$ of $\Omega$ such that $\mu(\partial B) = 0$, we have
$$\mu(B) = \lim_{n\to+\infty} \mu_n(B).$$
Proof. Let $U$ be an open subset of $\Omega$ and $1_U$ its characteristic function. Classically, there exists a nondecreasing sequence $(\varphi_p)_{p\in\mathbb{N}}$ in $C_c(\Omega)$ such that $1_U = \sup_p \varphi_p$. Therefore
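$$\mu(U) = \lim_{p\to+\infty} \int_\Omega \varphi_p\,d\mu = \lim_{p\to+\infty} \lim_{n\to+\infty} \int_\Omega \varphi_p\,d\mu_n \le \liminf_{n\to+\infty} \int_U d\mu_n = \liminf_{n\to+\infty} \mu_n(U).$$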
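For the other assertion, it suffices to notice that if $K$ is a compact subset of $\Omega$, there exists a nonincreasing sequence $(\varphi_p)_{p\in\mathbb{N}}$ in $C_c(\Omega)$ such that $1_K = \inf_p \varphi_p$, and to argue similarly. Let us prove the last assertion. Since $\mu_n$ is a nondecreasing set function, according to the first assertions (applied to the open set $\mathring{B}$ and to the compact set $\overline{B}$), we have
$$\mu(\mathring{B}) \le \liminf_{n\to+\infty} \mu_n(\mathring{B}) \le \liminf_{n\to+\infty} \mu_n(B) \le \limsup_{n\to+\infty} \mu_n(B) \le \limsup_{n\to+\infty} \mu_n(\overline{B}) \le \mu(\overline{B}).$$
The conclusion follows from $\mu(\mathring{B}) = \mu(B) = \mu(\overline{B})$, a consequence of $\mu(\partial B) = 0$.
The following corollary clarifies the relation between the weak convergence of sequences in $M(\Omega, \mathbb{R}^m)$ and the convergence of the corresponding measures of suitable Borel sets.
Corollary 4.2.1. Let $\mu$, $\mu_n$ in $M(\Omega, \mathbb{R}^m)$ be such that $\mu_n$ weakly converges to $\mu$ and $|\mu_n|$ weakly converges to some $\sigma$ in $M^+(\Omega)$. Then $|\mu| \le \sigma$ and, for every relatively compact Borel subset $B$ of $\Omega$ such that $\sigma(\partial B) = 0$, we have $\mu(B) = \lim_{n\to+\infty} \mu_n(B)$.
Proof. Let $U$ be an open subset of $\Omega$ and $\varphi$ any function in $C_c(U, \mathbb{R}^m)$ with $\|\varphi\|_\infty \le 1$. Letting $n \to +\infty$ in
$$\Big|\int_U \varphi\,d\mu_n\Big| \le \int_U |\varphi|\,d|\mu_n|,$$
we obtain
$$\Big|\int_U \varphi\,d\mu\Big| \le \int_U |\varphi|\,d\sigma.$$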
Taking the supremum over $\varphi$ gives $|\mu|(U) \le \sigma(U)$. The conclusion $|\mu| \le \sigma$ follows from Proposition 4.2.1. For proving the last assertion, let us denote the $m$ components of $\mu_n$ and $\mu$ by $\mu_n^i$ and $\mu^i$, $i = 1, \dots, m$, respectively, and the weak limits (for a subsequence not relabeled) of the positive and negative parts of $\mu_n^i$ by $\nu^{i,+}$ and $\nu^{i,-}$, respectively. Going to the limit as $n \to +\infty$ in $\mu_n^i = \mu_n^{i,+} - \mu_n^{i,-}$ and $\mu_n^{i,\pm} \le |\mu_n|$, we obtain $\mu^i = \nu^{i,+} - \nu^{i,-}$ and $\nu^{i,\pm} \le \sigma$. The conclusion then follows by applying Proposition 4.2.3 to the $m$ components $\mu_n^{i,\pm}$.
We now restrict ourselves to the case when $\Omega = \mathbb{R}^N$, and we study the approximation of a Borel measure in $M(\mathbb{R}^N, \mathbb{R}^m)$ by regular functions in the sense of the weak convergence of measures. To this end, we define the regularization (or mollification) of a measure by means of a regularizing kernel. Let us recall that a regularizer $\rho_\varepsilon$ is a function in $C_c^\infty(\mathbb{R}^N)$
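defined by $\rho_\varepsilon(x) = \varepsilon^{-N}\rho(x/\varepsilon)$, where $\rho$ is some nonnegative real-valued function in $C_c^\infty(\mathbb{R}^N)$ satisfying
$$\int_{\mathbb{R}^N} \rho(x)\,dx = 1, \qquad \mathrm{spt}\,\rho \subset \overline{B}_1(0).$$
Note that the support $\mathrm{spt}\,\rho_\varepsilon$ of $\rho_\varepsilon$ is included in $\overline{B}_\varepsilon(0)$. For any measure $\mu$ in $M(\mathbb{R}^N, \mathbb{R}^m)$, we define the function $\rho_\varepsilon * \mu$ on $\mathbb{R}^N$ by
$$\rho_\varepsilon * \mu(x) = \int_{\mathbb{R}^N} \rho_\varepsilon(x - y)\,\mu(dy),$$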
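and we aim to show that $\rho_\varepsilon * \mu$ is a suitable approximation of $\mu$.
Theorem 4.2.2. The functions $\rho_\varepsilon * \mu$ belong to $C^\infty(\mathbb{R}^N, \mathbb{R}^m)$ and, for any $\alpha \in \mathbb{N}^N$, $D^\alpha(\rho_\varepsilon * \mu) = D^\alpha\rho_\varepsilon * \mu$. Moreover, when $\varepsilon$ goes to zero,
(i) $\rho_\varepsilon * \mu \rightharpoonup \mu$ weakly in $M(\mathbb{R}^N, \mathbb{R}^m)$;
(ii) $\int_{\mathbb{R}^N} |\rho_\varepsilon * \mu|\,dx \le |\mu|(\mathbb{R}^N)$;
(iii) $\int_{\mathbb{R}^N} |\rho_\varepsilon * \mu|\,dx \to |\mu|(\mathbb{R}^N)$.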
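Before the proof, a numerical sketch may help to see assertions (i) and (ii) at work in dimension $N = m = 1$. It assumes the standard bump $\rho(x) = Z^{-1}\exp(-1/(1-x^2))$ on $]-1,1[$, with $Z$ computed numerically, and the hypothetical signed measure $\mu = \delta_0 - \frac{1}{2}\delta_1$, for which $\rho_\varepsilon * \mu(x) = \rho_\varepsilon(x) - \frac{1}{2}\rho_\varepsilon(x-1)$. Integrals are approximated by the midpoint rule, so all values are approximate, and the function names are our own.

```python
import math


def bump(x):
    """Unnormalized C-infinity bump, positive exactly on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


# Normalizing constant Z = integral of bump over [-1, 1] (midpoint rule), so rho = bump / Z.
_M = 100_000
Z = sum(bump(-1.0 + (k + 0.5) * 2.0 / _M) for k in range(_M)) * 2.0 / _M


def rho_eps(x, eps):
    """rho_eps(x) = eps^{-1} rho(x / eps) in dimension N = 1."""
    return bump(x / eps) / (Z * eps)


# Hypothetical signed measure mu = delta_0 - 0.5 * delta_1, stored as {atom: weight}.
ATOMS = {0.0: 1.0, 1.0: -0.5}


def mollified(x, eps):
    """(rho_eps * mu)(x) = sum over the atoms a of m_a * rho_eps(x - a)."""
    return sum(m * rho_eps(x - a, eps) for a, m in ATOMS.items())


def phi(x):
    """A continuous test function with compact support [-3, 3]."""
    return max(0.0, 1.0 - abs(x) / 3.0)


def integrals(eps, lo=-2.0, hi=3.0, h=1e-4):
    """Midpoint-rule values of int |rho_eps * mu| dx and int phi (rho_eps * mu) dx."""
    n = round((hi - lo) / h)
    total_var = pairing = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        v = mollified(x, eps)
        total_var += abs(v)
        pairing += phi(x) * v
    return total_var * h, pairing * h


for eps in (0.1, 0.01):
    print(eps, integrals(eps))
```

Here $\int\varphi\,d\mu = 1 - \frac{1}{2}\cdot\frac{2}{3} = \frac{2}{3}$, and the second value returned by `integrals(eps)` approaches it as `eps` decreases, while the first stays at $|\mu|(\mathbb{R}) = 1.5$ (equality holds in (ii) for this example because the two bumps have disjoint supports when $\varepsilon < 1/2$).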
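Proof. The classical theorem on differentiation under the integral sign yields $D^\alpha(\rho_\varepsilon * \mu) = D^\alpha\rho_\varepsilon * \mu$. Let us establish (i). Let $\varphi \in C_c(\mathbb{R}^N, \mathbb{R}^m)$. According to Fubini's theorem,
$$\Big|\int_{\mathbb{R}^N} \varphi(x)\cdot\rho_\varepsilon * \mu(x)\,dx - \int_{\mathbb{R}^N} \varphi(y)\,\mu(dy)\Big| = \Big|\int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \varphi(x)\rho_\varepsilon(x - y)\,\mu(dy)\,dx - \int_{\mathbb{R}^N} \varphi(y)\,\mu(dy)\Big|$$
$$= \Big|\int_{\mathbb{R}^N}\Big(\int_{\mathbb{R}^N} \varphi(x)\rho_\varepsilon(x - y)\,dx\Big)\mu(dy) - \int_{\mathbb{R}^N}\Big(\int_{\mathbb{R}^N} \rho_\varepsilon(x - y)\,dx\Big)\varphi(y)\,\mu(dy)\Big|$$
$$\le \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} |\varphi(x) - \varphi(y)|\,\rho_\varepsilon(x - y)\,dx\,|\mu|(dy) \le \sup_{\{(x,y)\in\mathbb{R}^{2N} : |x - y| \le \varepsilon\}} |\varphi(x) - \varphi(y)|\;|\mu|(\mathbb{R}^N),$$
which, thanks to the uniform continuity of $\varphi$, tends to zero as $\varepsilon \to 0$. We now establish (ii). By Fubini's theorem and a change of scale, we have
$$\int_{\mathbb{R}^N} |\rho_\varepsilon * \mu|(x)\,dx = \int_{\mathbb{R}^N}\Big|\int_{\mathbb{R}^N} \varepsilon^{-N}\rho\Big(\frac{x - y}{\varepsilon}\Big)\mu(dy)\Big|\,dx \le \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \varepsilon^{-N}\rho\Big(\frac{x - y}{\varepsilon}\Big)|\mu|(dy)\,dx = \int_{\mathbb{R}^N}\Big(\int_{\mathbb{R}^N} \varepsilon^{-N}\rho\Big(\frac{x - y}{\varepsilon}\Big)dx\Big)|\mu|(dy) = \int_{\mathbb{R}^N} |\mu|.$$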
Assertion (iii) is a straightforward consequence of (i), of the weak lower semicontinuity of the map $\mu \mapsto \int_{\mathbb{R}^N} |\mu|$, and of (ii).
Let $C_b(\Omega, \mathbb{R}^m)$ be the set of all bounded continuous functions from $\Omega$ into $\mathbb{R}^m$. We now introduce a stronger notion of convergence, induced by the weak topology $\sigma(C_b(\Omega, \mathbb{R}^m)', C_b(\Omega, \mathbb{R}^m))$.
Definition 4.2.2. A sequence $(\mu_n)_n$ in $M(\Omega, \mathbb{R}^m)$ narrowly converges to $\mu$ in $M(\Omega, \mathbb{R}^m)$ iff
$$\int_\Omega \varphi\,d\mu_n \to \int_\Omega \varphi\,d\mu$$
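for all $\varphi$ in $C_b(\Omega, \mathbb{R}^m)$.
This convergence is strictly stronger than the weak convergence of measures. Indeed, let $\Omega = (0,1)$ and $\mu_n = \delta_{1/n}$. Then $\mu_n$ weakly converges to $0$, but $\int_\Omega 1\,d\mu_n = \mu_n(\Omega) = 1$ for all $n$, so that $\mu_n$ does not narrowly converge to $0$. Moreover, taking $\varphi \in C_b(\Omega)$ with $\varphi(x) = \sin(1/x)$ near $0^+$, the sequence $\int_\Omega \varphi\,d\mu_n = \sin n$ ($n$ large) does not converge; and no subsequence of $(\mu_n)$ converges narrowly, since a narrow limit would also be a weak limit, hence equal to $0$, which is incompatible with $\mu_n(\Omega) = 1$. This example shows that the unit ball of $M(\Omega)$ is not sequentially compact for the narrow topology. Nevertheless, the Prokhorov theorem below asserts that the bounded subsets of $M^+(\Omega)$ are relatively sequentially compact for the narrow topology as long as a uniform control is assumed outside a compact set whose measure is close to that of $\Omega$.
Theorem 4.2.3 (Prokhorov). Let $\mathcal{H}$ be a bounded subset of $M^+(\Omega)$ satisfying
$$\forall \varepsilon > 0,\ \exists K_\varepsilon \text{ compact subset of } \Omega \text{ such that } \sup\{\mu(\Omega \setminus K_\varepsilon) : \mu \in \mathcal{H}\} \le \varepsilon.$$
Then $\mathcal{H}$ is relatively sequentially compact for the narrow topology.
For a proof, consult, for instance, Dellacherie and Meyer [125]. Any subset of $M^+(\Omega)$ satisfying the previous uniform bound is said to be tight. Note that in the previous example, $\{\delta_{1/n} : n \in \mathbb{N}^*\}$ is not tight. In the bounded subsets of $M^+(\Omega)$, the weak and the narrow topologies agree when there is no loss of mass, i.e., when $\mu(\Omega) = \lim_{n\to+\infty} \mu_n(\Omega)$. More precisely, we have the following.
Proposition 4.2.4. Let $\mu_n$, $\mu$ in $M^+(\Omega)$. Then the following assertions are equivalent:
(i) $\mu_n \rightharpoonup \mu$ narrowly;
(ii) $\mu_n \rightharpoonup \mu$ weakly and $\mu_n(\Omega) \to \mu(\Omega)$.
Proof. We prove (ii) ⇒ (i); the converse is obvious. Let $f \in C_b(\Omega)$, $\varepsilon > 0$, and let $K$ be a compact subset of $\Omega$ such that $\mu(\Omega \setminus K) \le \varepsilon$. Let moreover $\varphi \in C_c(\Omega)$ satisfy $0 \le \varphi \le 1$ and $\varphi = 1$ on $K$. We have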
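$$\Big|\int_\Omega f\,d\mu_n - \int_\Omega f\,d\mu\Big| \le \Big|\int_\Omega f\,d\mu_n - \int_\Omega f\varphi\,d\mu_n\Big| + \Big|\int_\Omega f\varphi\,d\mu_n - \int_\Omega f\varphi\,d\mu\Big| + \Big|\int_\Omega f\varphi\,d\mu - \int_\Omega f\,d\mu\Big|$$
$$\le \|f\|_\infty \int_\Omega (1 - \varphi)\,d\mu_n + \Big|\int_\Omega f\varphi\,d\mu_n - \int_\Omega f\varphi\,d\mu\Big| + \|f\|_\infty \int_\Omega (1 - \varphi)\,d\mu,$$
so that
$$\limsup_{n\to+\infty} \Big|\int_\Omega f\,d\mu_n - \int_\Omega f\,d\mu\Big| \le 2\varepsilon\|f\|_\infty,$$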
and the conclusion follows after letting $\varepsilon \to 0$. In the probabilistic approach we have a similar statement.
Proposition 4.2.5. Let $\mu_n$, $\mu$ in $M^+(\Omega)$. Then the following assertions are equivalent:
(i) $\mu_n$ narrowly converges to $\mu$;
(ii) $\mu_n(\Omega) \to \mu(\Omega)$ and, for every open subset $U$, $\mu(U) \le \liminf_{n\to+\infty} \mu_n(U)$;
(iii) $\mu_n(\Omega) \to \mu(\Omega)$ and, for every closed subset $F$, $\mu(F) \ge \limsup_{n\to+\infty} \mu_n(F)$;
(iv) for every Borel subset $B$ such that $\mu(\partial B) = 0$, $\mu_n(B) \to \mu(B)$.
Proof. The only implication we have to establish is (iv) ⇒ (i). Indeed, each of the others is an easy consequence of Propositions 4.2.4 and 4.2.3. Since (iv) with $B = \Omega$ gives $\mu_n(\Omega) \to \mu(\Omega)$, according to Proposition 4.2.4 it is enough to establish the weak convergence of $\mu_n$ to $\mu$. For this, let $\varphi \in C_c(\Omega)$ with compact support $K$, set $M = \|\varphi\|_{L^\infty(\Omega)}$, and consider a subdivision
$$-M = a_0 < a_1 < \dots < a_i < a_{i+1} < \dots < a_m = M, \qquad a_{i+1} - a_i \le \varepsilon, \qquad \mu([\varphi = a_i]) = 0.$$
Such a subdivision exists; indeed, the last property is a consequence of Lemma 4.2.1. Consider now the Borel subsets $U_i = [\varphi < a_i] \cap K$ and the function
$$\varphi_\varepsilon = \sum_{i=1}^m a_i\,\chi_{U_i \setminus U_{i-1}}.$$
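Since $\mu(\partial U_i) = 0$, from assertion (iv) we have $\mu_n(U_i) \to \mu(U_i)$, so that for all $\varepsilon > 0$, $\int_\Omega \varphi_\varepsilon\,d\mu_n \to \int_\Omega \varphi_\varepsilon\,d\mu$. From the equiboundedness of the measures $\mu_n$ and the estimate $\|\varphi_\varepsilon - \varphi\|_\infty \le \varepsilon$, we easily deduce $\int_\Omega \varphi\,d\mu_n \to \int_\Omega \varphi\,d\mu$, which ends the proof.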
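The two examples given earlier in this subsection, $n\delta_n$ on $(0, +\infty)$ and $\delta_{1/n}$ on $(0,1)$, reduce to evaluating test functions at single points, since $\langle c\,\delta_a, \varphi\rangle = c\,\varphi(a)$. The Python sketch below makes the comparison explicit. The particular test functions are our own choices, selected only so that they lie in $C_c$, $C_0$ or $C_b$ of the relevant interval; `pair` is a hypothetical helper name.

```python
import math


def pair(atoms, phi):
    """<mu, phi> = sum_a m_a * phi(a) for an atomic measure given as {a: m_a}."""
    return sum(m * phi(a) for a, m in atoms.items())


# First example: Omega = (0, +inf), mu_n = n * delta_n.
def phi_cc(x):
    """Continuous with compact support [1, 3] in (0, +inf)."""
    return max(0.0, 1.0 - abs(x - 2.0))


def phi_c0(x):
    """In C_0((0, +inf)): tends to 0 as x -> 0+ and as x -> +inf, with phi(x) ~ 1/x."""
    return x / (1.0 + x * x)


# Second example: Omega = (0, 1), mu_n = delta_{1/n}.
def psi_cc(x):
    """Continuous with compact support [1/4, 3/4] in (0, 1)."""
    return max(0.0, 1.0 - 4.0 * abs(x - 0.5))


def psi_cb(x):
    """Bounded and continuous on (0, 1), but oscillating as x -> 0+."""
    return math.sin(1.0 / x)


for n in (10, 100, 1000):
    print(n,
          pair({n: n}, phi_cc), pair({n: n}, phi_c0),            # n * delta_n
          pair({1 / n: 1.0}, psi_cc), pair({1 / n: 1.0}, lambda x: 1.0),
          pair({1 / n: 1.0}, psi_cb))                              # delta_{1/n}
```

Pairings with the compactly supported `phi_cc` and `psi_cc` vanish for large $n$ (weak convergence to $0$). Pairing $n\delta_n$ with `phi_c0` gives $n^2/(1+n^2) \to 1$, so $n\delta_n$ does not converge to $0$ for $\sigma(C_0', C_0)$. Pairing $\delta_{1/n}$ with the constant $1$ gives $1$ for every $n$, so $\delta_{1/n}$ does not converge to $0$ narrowly, and pairing it with `psi_cb` gives $\sin n$, which keeps oscillating.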
The following proposition is an extension of property (iv). For a proof, consult Marle [178, Proposition 9.9.4].
Proposition 4.2.6. Let $\mu_n$, $\mu$ be Borel measures in $M^+(\Omega)$ such that $\mu_n$ narrowly converges to $\mu$, and let $f$ be a $\mu_n$-measurable (for every $n$) and bounded function from $\Omega$ into $\mathbb{R}$ such that the set of its discontinuity points has null $\mu$-measure. Then $f$ is $\mu$-measurable and
$$\lim_{n\to+\infty} \int_\Omega f\,d\mu_n = \int_\Omega f\,d\mu.$$
We end this subsection by stating two theorems extending, in some sense, the classical Fubini theorem. We prove only the first one; for the second, see, for instance, [132]. Let $\mu$ in $M^+(\Omega \times \mathbb{R}^m)$. We denote the projection of $\mu$ on $\Omega$ by $\sigma$. Let us recall that $\sigma$ is the measure of $M^+(\Omega)$ defined for every Borel set $E$ of $\Omega$ by $\sigma(E) = \mu(E \times \mathbb{R}^m)$. The following slicing decomposition holds.
Theorem 4.2.4. There exists a family $(\mu_x)_{x\in\Omega}$ of probability measures on $\mathbb{R}^m$, unique up to equality σ-a.e., such that for all $f$ in $C_0(\Omega \times \mathbb{R}^m)$
(i) $x \mapsto \int_{\mathbb{R}^m} f(x,y)\,d\mu_x(y)$ is σ-measurable;
(ii) $$\int_{\Omega\times\mathbb{R}^m} f(x,y)\,d\mu(x,y) = \int_\Omega \Big(\int_{\mathbb{R}^m} f(x,y)\,d\mu_x(y)\Big)d\sigma(x).$$
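We will write $\mu = (\mu_x)_{x\in\Omega} \otimes \sigma$.
Proof. First step. We establish the result for all $f$ of the form $f = g \otimes h$, where $g(x) = 1_B(x)$, $B$ is any Borel subset of $\Omega$, and $h$ belongs to $C_0(\mathbb{R}^m)$. Let us first assume that $h$ belongs to a dense countable subset $D$ of $C_0(\mathbb{R}^m)$, and define the (signed) measure $\gamma_h \in M(\Omega)$ by
$$\gamma_h(B) = \int_{B\times\mathbb{R}^m} h(y)\,d\mu(x,y) \qquad \forall B \in \mathcal{B}(\Omega).$$
Since $\mu(B \times \mathbb{R}^m) = \sigma(B) = 0 \Rightarrow \gamma_h(B) = 0$, according to the Radon–Nikodym theorem, Theorem 4.2.1, the measure $\gamma_h$ has a density $a_h \in L^1_\sigma(\Omega)$ with respect to $\sigma$, i.e., $\gamma_h = a_h \cdot \sigma$. We then obtain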
$$\int_{B\times\mathbb{R}^m} h(y)\,d\mu(x,y) = \int_B a_h(x)\,d\sigma(x). \qquad (4.5)$$
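Moreover, there exists a sequence $(N_h)_{h\in D}$ of σ-null sets such that $a_h$ is given, for all $x_0 \in \Omega \setminus \bigcup_{h\in D} N_h$, by
$$a_h(x_0) = \lim_{\rho\to0} \frac{\gamma_h(B_\rho(x_0))}{\sigma(B_\rho(x_0))} = \lim_{\rho\to0} \frac{\int_{B_\rho(x_0)\times\mathbb{R}^m} h(y)\,d\mu(x,y)}{\mu(B_\rho(x_0) \times \mathbb{R}^m)}. \qquad (4.6)$$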
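For every fixed $x_0$ in $\Omega \setminus \bigcup_{h\in D} N_h$, let us now consider the linear map $\Lambda_{x_0} : D \to \mathbb{R}$ defined by $\Lambda_{x_0}(h) = a_h(x_0)$. From (4.6), we easily check that $|\Lambda_{x_0}(h)| \le \|h\|_\infty$. Therefore $\Lambda_{x_0}$ may be extended by a measure $\mu_{x_0}$ in $M^+(\mathbb{R}^m)$. Since the map $x \mapsto 1_B(x)\mu_x(h) = 1_B(x)a_h(x)$ is σ-measurable for all $h \in D$, it is also σ-measurable for all $h$ in $C_0(\mathbb{R}^m)$. Now, from (4.5), one can write
$$\int_{B\times\mathbb{R}^m} h(y)\,d\mu(x,y) = \int_B \Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x)$$
with $\|\mu_x\| \le 1$.
Second step. We establish the result for all $f$ of the form $f = g \otimes h$, where $g \in L^1_\sigma(\Omega)$ and $h \in C_0(\mathbb{R}^m)$. For all $\varepsilon > 0$, let us consider a step function $g_\varepsilon = \sum_{i\in I} \alpha_i 1_{B_i}$, $B_i \in \mathcal{B}(\Omega)$, $I$ finite, such that
$$\int_\Omega |g - g_\varepsilon|\,d\sigma < \varepsilon. \qquad (4.7)$$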
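We have
$$\Big|\int_{\Omega\times\mathbb{R}^m} g \otimes h\,d\mu - \int_\Omega g(x)\Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x)\Big| \le \Big|\int_{\Omega\times\mathbb{R}^m} g \otimes h\,d\mu - \int_{\Omega\times\mathbb{R}^m} g_\varepsilon \otimes h\,d\mu\Big|$$
$$+ \Big|\int_{\Omega\times\mathbb{R}^m} g_\varepsilon \otimes h\,d\mu - \int_\Omega g_\varepsilon(x)\Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x)\Big| + \Big|\int_\Omega g_\varepsilon(x)\Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x) - \int_\Omega g(x)\Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x)\Big|.$$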
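According to the first step, the second term of the right-hand side is equal to zero. According to (4.7), each of the two other terms is less than $\varepsilon\|h\|_\infty$. Since $\varepsilon$ is arbitrary, we obtain
$$\int_{\Omega\times\mathbb{R}^m} g \otimes h\,d\mu = \int_\Omega g(x)\Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x). \qquad (4.8)$$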
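We are going to prove that $\mu_x$ is a probability measure. Let $(h_n)_{n\in\mathbb{N}}$ be a nondecreasing sequence of functions in $C_0(\mathbb{R}^m)$ pointwise converging to $1_{\mathbb{R}^m}$. From (4.8) we deduce, for every Borel set $B$ in $\Omega$,
$$\int_{B\times\mathbb{R}^m} h_n(y)\,d\mu(x,y) = \int_B \Big(\int_{\mathbb{R}^m} h_n(y)\,d\mu_x(y)\Big)d\sigma(x),$$
and, by letting $n \to +\infty$,
$$\sigma(B) = \int_B \mu_x(\mathbb{R}^m)\,d\sigma(x).$$
Since, for σ-a.e. $x$ in $\Omega$, $\mu_x(\mathbb{R}^m) = \|\mu_x\| \le 1$, we infer that $\mu_x(\mathbb{R}^m) = 1$ for σ-a.e. $x$ in $\Omega$.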
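Third step. We assume that $f$ belongs to $C_0(\Omega \times \mathbb{R}^m)$. The result is now an easy consequence of the density of
$$\Big\{\sum_{i\in I} g_i \otimes h_i : g_i \in C_c(\Omega),\ h_i \in C_c(\mathbb{R}^m),\ I \in \mathcal{P}_F(\mathbb{N})\Big\}$$
in $C_0(\Omega \times \mathbb{R}^m)$ for the uniform norm ($\mathcal{P}_F(\mathbb{N})$ denotes the family of all finite subsets of $\mathbb{N}$).
Last step. It remains to establish the uniqueness of the family $(\mu_x)_x$, up to equality σ-a.e. Take $f = 1_{B_\rho(x_0)} \otimes h$, where $h$ is any function in $D$ and $x_0 \in \Omega$ is such that the limit
$$\lim_{\rho\to0} \frac{1}{\sigma(B_\rho(x_0))}\int_{B_\rho(x_0)} \Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x)$$
exists. According to the theory of differentiation of measures (Radon theorem), there exists a Borel set $N_h$ with $\sigma(N_h) = 0$ such that the above limit exists for $x_0 \in \Omega \setminus N_h$. Hence this limit exists for all $x_0 \in \Omega' := \Omega \setminus \bigcup_{h\in D} N_h$ and all $h \in D$. From
$$\int_{\Omega\times\mathbb{R}^m} 1_{B_\rho(x_0)}(x)\,h(y)\,d\mu(x,y) = \int_{B_\rho(x_0)} \Big(\int_{\mathbb{R}^m} h(y)\,d\mu_x(y)\Big)d\sigma(x)$$
we deduce that, for $x_0 \in \Omega'$,
$$\int_{\mathbb{R}^m} h(y)\,d\mu_{x_0}(y) = \lim_{\rho\to0} \frac{\int_{B_\rho(x_0)\times\mathbb{R}^m} h(y)\,d\mu(x,y)}{\sigma(B_\rho(x_0))} = a_h(x_0).$$
Therefore $\mu_x(h) = a_h(x)$ for all $x \in \Omega'$ and all $h \in D$. This gives the required uniqueness.
Theorem 4.2.5 (classical coarea formula). For all Lipschitz functions $f : \mathbb{R}^N \to \mathbb{R}$ and all functions $g : \mathbb{R}^N \to \mathbb{R}$ in $L^1(\mathbb{R}^N)$, we have
$$\int_{\mathbb{R}^N} g(x)\,|Df(x)|\,dx = \int_{-\infty}^{+\infty} \Big(\int_{[f=t]} g(x)\,d\mathcal{H}^{N-1}(x)\Big)dt.$$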
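A quick numerical sanity check of the coarea formula can be made in $\mathbb{R}^2$ with the Lipschitz function $f(x) = 2|x|$ (so that $|Df| = 2$ away from the origin) and $g(x) = e^{-|x|^2} \in L^1(\mathbb{R}^2)$; these choices are ours. For $t > 0$, the level set $[f = t]$ is the circle of radius $t/2$, of length $\pi t$, on which $g = e^{-t^2/4}$, and both sides equal $2\pi$. The Python sketch below (`coarea_check` is a hypothetical name) approximates the left-hand side by a midpoint sum on a square grid and the right-hand side by a one-dimensional midpoint sum in $t$.

```python
import math


def coarea_check(c=2.0, R=6.0, h=0.02, T=40.0, dt=1e-3):
    """Approximate both sides of the coarea formula in R^2 for f(x) = c|x|, g(x) = exp(-|x|^2).

    Left side: midpoint sum of g |Df| over the square [-R, R]^2, with |Df| = c off the origin.
    Right side: midpoint sum over t in (0, T) of the integral of g on the level set [f = t],
    the circle of radius t / c, on which g is constant.
    """
    n = round(2.0 * R / h)
    lhs = 0.0
    for i in range(n):
        x = -R + (i + 0.5) * h
        for j in range(n):
            y = -R + (j + 0.5) * h
            lhs += math.exp(-(x * x + y * y)) * c
    lhs *= h * h

    rhs = 0.0
    for k in range(round(T / dt)):
        r = (k + 0.5) * dt / c                        # radius of the circle [f = t]
        rhs += math.exp(-r * r) * 2.0 * math.pi * r   # g times the length of the circle
    rhs *= dt
    return lhs, rhs


print(coarea_check(), 2.0 * math.pi)
```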
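As a corollary of Theorem 4.2.5 we obtain the so-called curvilinear Fubini theorem.
Corollary 4.2.2. Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$ and $\Gamma_t$ the set $\{x \in \Omega : d(x, \mathbb{R}^N \setminus \Omega) = t\}$. Then
$$\int_\Omega g(x)\,dx = \int_{-\infty}^{+\infty} \Big(\int_{\Gamma_t} g(x)\,d\mathcal{H}^{N-1}(x)\Big)dt.$$
Proof. Take $f = d(\cdot, \mathbb{R}^N \setminus \Omega)$, which is 1-Lipschitz and satisfies $|Df| = 1$ a.e. in $\Omega$.
Taking now $g = 1$ and $f$ the truncation of $d(\cdot, S)$ between $s$ and $s'$, with $s < s'$, we obtain the next corollary.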
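Corollary 4.2.3. Let $S$ be a subset of $\mathbb{R}^N$. Then
$$\mathcal{L}^N([s < d(\cdot, S) < s']) = \int_s^{s'} \mathcal{H}^{N-1}([d(\cdot, S) = t])\,dt,$$
and the distributional derivative of $t \mapsto \mathcal{L}^N([d(\cdot, S) < t])$ is given by
$$\frac{d}{dt}\mathcal{L}^N([d(\cdot, S) < t]) = \mathcal{H}^{N-1}([d(\cdot, S) = t]).$$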
4.3 Introduction to Young measures
We now deal with the notion of Young measure, a measure-theoretic tool well suited to the analysis of oscillations of minimizing sequences (see, for instance, [108]). In Chapter 11 we will give an important application in the scope of relaxation in nonlinear elasticity. For an application to phase transitions in crystals, consult [110]. For a general exposition of the theory, see [53], [55], [104], [215], [223], [224], and the references therein.
4.3.1 Definition
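Throughout this section, $\Omega$ is an open bounded subset of $\mathbb{R}^N$ and $E = \mathbb{R}^d$. In Chapter 11, Section 11.4, we will consider the case $d = mN$, so that $E$ will be isomorphic to the space $\mathbb{M}^{m\times N}$ of $m \times N$ matrices. To shorten notation, we denote the $N$-dimensional Lebesgue measure restricted to $\Omega$ by $\mathcal{L}$.
Definition 4.3.1. We call a Young measure on $\Omega \times E$ any positive measure $\mu \in M^+(\Omega \times E)$ whose image $\pi\#\mu$ under the projection $\pi$ on $\Omega$ is the Lebesgue measure $\mathcal{L}$ on $\Omega$, i.e., for every Borel subset $B$ of $\Omega$,
$$\pi\#\mu(B) := \mu(B \times E) = \mathcal{L}(B).$$
We denote the set of all Young measures on $\Omega \times E$ by $\mathcal{Y}(\Omega; E)$.
We now consider the space $\mathcal{C}_b(\Omega; E)$ of Carathéodory integrands, namely, the space of all functions $\psi : \Omega \times E \to \mathbb{R}$ that are $\mathcal{B}(\Omega) \otimes \mathcal{B}(E)$-measurable and satisfy
(i) $\psi(x, \cdot)$ is bounded and continuous on $E$ for all $x \in \Omega$;
(ii) $x \mapsto \sup_{\lambda\in E} |\psi(x, \lambda)|$ is Lebesgue integrable.
We equip $\mathcal{Y}(\Omega; E)$ with the narrow topology, i.e., the weakest topology which makes the maps
$$\mu \mapsto \int_{\Omega\times E} \psi\,d\mu$$
continuous when $\psi$ runs through $\mathcal{C}_b(\Omega; E)$. This topology induces the narrow convergence of Young measures, defined as follows: let $(\mu_n)_{n\in\mathbb{N}}$ be a sequence in $\mathcal{Y}(\Omega; E)$ and $\mu \in \mathcal{Y}(\Omega; E)$; then
$$\mu_n \overset{nar}{\rightharpoonup} \mu \iff \lim_{n\to+\infty} \int_{\Omega\times E} \psi(x, \lambda)\,d\mu_n(x, \lambda) = \int_{\Omega\times E} \psi(x, \lambda)\,d\mu(x, \lambda) \quad \forall \psi \in \mathcal{C}_b(\Omega; E).$$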
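The narrow convergence of Young measures is designed to capture oscillations. As an illustration (not part of the text above), consider the hypothetical square wave $u_n$ on $\Omega = (0,1)$ equal to $1$ and $-1$ on alternate intervals of length $\frac{1}{2n}$, and associate with it, as is customary, the Young measure $\delta_{u_n(x)} \otimes \mathcal{L}$, so that $\int_{\Omega\times E} \psi\,d\mu_n = \int_\Omega \psi(x, u_n(x))\,dx$. It is classical that these measures narrowly converge to $(\frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1) \otimes \mathcal{L}$. The Python sketch below checks this for the single integrand $\psi(x, \lambda) = \cos(x + \lambda)$, whose limit value is $\int_0^1 \cos x \cos 1\,dx = \sin 1 \cos 1$; testing one integrand is only a sanity check, not a proof.

```python
import math


def u_n(x, n):
    """Hypothetical square wave on (0, 1): +1 on [k/(2n), (k+1)/(2n)) for even k, -1 for odd k."""
    return 1.0 if math.floor(2 * n * x) % 2 == 0 else -1.0


def psi(x, lam):
    """A Caratheodory integrand: measurable in x, bounded and continuous in lam."""
    return math.cos(x + lam)


def pairing_with_un(n, samples=100_000):
    """Midpoint rule for int_0^1 psi(x, u_n(x)) dx, the pairing of psi with delta_{u_n(x)} (x) L."""
    h = 1.0 / samples
    return h * sum(psi((k + 0.5) * h, u_n((k + 0.5) * h, n)) for k in range(samples))


# Pairing of psi with (1/2 delta_{-1} + 1/2 delta_{1}) (x) L, in closed form:
# int_0^1 (cos(x - 1) + cos(x + 1)) / 2 dx = int_0^1 cos(x) cos(1) dx = sin(1) cos(1).
LIMIT = math.sin(1.0) * math.cos(1.0)

for n in (2, 20, 200):
    print(n, pairing_with_un(n), LIMIT)
```

The discrepancy decreases roughly like $1/n$. Note that the limit is not of the form $\delta_{u(x)} \otimes \mathcal{L}$ for any function $u$, since $\mu_x = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1$ is not a Dirac mass.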
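Remark 4.3.1. Let $(\mu_n)_{n\in\mathbb{N}}$ be a sequence in $\mathcal{Y}(\Omega; E)$ and $\mu \in \mathcal{Y}(\Omega; E)$. It is easily seen (see Valadier [223], [224]) that
$$\mu_n \overset{nar}{\rightharpoonup} \mu \iff \lim_{n\to+\infty} \int_{\Omega\times E} 1_B(x)\varphi(\lambda)\,d\mu_n = \int_{\Omega\times E} 1_B(x)\varphi(\lambda)\,d\mu \quad \forall (B, \varphi) \in \mathcal{B}(\Omega) \times C_b(E).$$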
The space $\mathcal{Y}(\Omega; E)$ is closed in $M(\Omega \times E)$ equipped with the narrow convergence; more precisely, we have the next proposition.
Proposition 4.3.1. Let $(\mu_n)_{n\in\mathbb{N}}$ be a sequence in $\mathcal{Y}(\Omega; E)$ narrowly converging to some $\mu$ in $M(\Omega \times E)$. Then $\mu$ belongs to $\mathcal{Y}(\Omega; E)$.
Proof. From Remark 4.3.1, taking the test function $(x, \lambda) \mapsto \varphi(x, \lambda) = 1_B(x)1_E(\lambda)$, we obtain
$$\mathcal{L}(B) = \lim_{n\to+\infty} \mu_n(B \times E) = \mu(B \times E)$$
for all Borel subsets $B$ of $\Omega$.
4.3.2 Slicing Young measures
According to Theorem 4.2.4, to each Young measure $\mu$ there corresponds a unique family $(\mu_x)_{x\in\Omega}$ (up to equality a.e.) of probability measures on $E$ such that $\mu = (\mu_x)_{x\in\Omega} \otimes \mathcal{L}$. Moreover, the map $x \mapsto \mu_x$ is measurable in the following sense:
$$\forall h \in C_0(E), \quad x \mapsto \int_E h\,d\mu_x \text{ is measurable}.$$
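A Young measure $\mu$ is then also called a parametrized measure, and $\Omega$ is the set of parameters. Let $L_w(\Omega, M(E))$ be the space of all families $(\mu_x)_{x\in\Omega}$ of measures $\mu_x \in M(E)$ (not necessarily probability measures) such that $x \mapsto \mu_x$ is measurable in the previous sense. By identifying $\mu = (\mu_x)_{x\in\Omega} \otimes \mathcal{L}$ with $(\mu_x)_{x\in\Omega}$, we have $\mathcal{Y}(\Omega; E) \subset L_w(\Omega, M(E))$. We equip $L_w(\Omega, M(E))$ with the following weak convergence:
$$(\mu^n_x)_{x\in\Omega} \overset{L_w}{\rightharpoonup} (\mu_x)_{x\in\Omega} \iff \forall h \in C_0(E),\ \int_E h\,d\mu^n_x \rightharpoonup \int_E h\,d\mu_x \text{ in } L^\infty(\Omega) \text{ weak star},$$
i.e.,
$$\int_\Omega g(x)\Big(\int_E h\,d\mu^n_x\Big)dx \to \int_\Omega g(x)\Big(\int_E h\,d\mu_x\Big)dx \quad \forall h \in C_0(E),\ \forall g \in L^1(\Omega).$$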
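The identification of a measure with the family of its slices rests on Theorem 4.2.4. For a finite atomic measure on $\Omega \times \mathbb{R}^m$ the slices are elementary: $\sigma(\{x\})$ is the total weight above $x$, and $\mu_x$ is the normalized distribution of the weights above $x$. The following Python sketch, with hypothetical helper names and exact rational arithmetic, computes this decomposition and checks formula (ii) of Theorem 4.2.4 on an example. Note that such an atomic measure is not a Young measure, since its projection is not the Lebesgue measure.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(weights):
    """Slice a finite atomic measure mu = sum w_(x,y) delta_(x,y) with positive weights.

    Returns sigma (dict x -> sigma({x}), the projection of mu on Omega) and, for each
    x charged by sigma, the probability measure mu_x (dict y -> mu_x({y})).
    """
    sigma = defaultdict(Fraction)
    for (x, _y), w in weights.items():
        sigma[x] += w
    slices = defaultdict(dict)
    for (x, y), w in weights.items():
        slices[x][y] = w / sigma[x]
    return dict(sigma), dict(slices)


def integrate(weights, f):
    """Integral of f(x, y) with respect to mu over Omega x R^m."""
    return sum(w * f(x, y) for (x, y), w in weights.items())


def integrate_sliced(sigma, slices, f):
    """Iterated integral: int_Omega ( int f(x, y) d mu_x(y) ) d sigma(x)."""
    return sum(s * sum(p * f(x, y) for y, p in slices[x].items()) for x, s in sigma.items())


def f(x, y):
    """An arbitrary test function of (x, y)."""
    return x * x + y * y * (x + 1)


# Hypothetical example with m = 1: three atoms above the two points x = 0 and x = 1.
MU = {(0, -1): Fraction(1, 4), (0, 1): Fraction(1, 4), (1, 2): Fraction(1, 2)}
sigma, slices = disintegrate(MU)
print(sigma, slices, integrate(MU, f), integrate_sliced(sigma, slices, f))
```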
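Remark 4.3.2. The set $\mathcal{Y}(\Omega; E)$ is not closed in $L_w(\Omega, M(E))$ equipped with this convergence. Indeed, take $\Omega = (0,1)$, $E = \mathbb{R}$, and $\mu_n = \delta_n \otimes \mathcal{L}$. Then $(\mu^n_x)_{x\in\Omega}$ is the constant family $\delta_n$, and $(\mu^n_x)_{x\in\Omega} \overset{L_w}{\rightharpoonup} 0$, which is not a Young measure. Let us define the tightness notion for Young measures.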
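Definition 4.3.2. A subset $\mathcal{H}$ of $\mathcal{Y}(\Omega; E)$ is said to be tight if
$$\forall \varepsilon > 0,\ \exists K_\varepsilon \text{ compact subset of } E \text{ such that } \sup_{\mu\in\mathcal{H}} \mu(\Omega \times (E \setminus K_\varepsilon)) < \varepsilon.$$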
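The sequence $\mu_n = \delta_n \otimes \mathcal{L}$ of Remark 4.3.2 is the typical example of a family which is not tight: its whole mass escapes to infinity. The short Python sketch below, with hypothetical helper names, computes the two relevant quantities. The $L_w$ pairing with $g \otimes h$ equals $h(n)\int_\Omega g$ and hence tends to $0$ for $h \in C_0(\mathbb{R})$, while the mass $\mu_n(\Omega \times (\mathbb{R} \setminus [-K, K]))$ equals $\mathcal{L}(\Omega) = 1$ as soon as $n > K$, whatever the compact interval $[-K, K]$.

```python
def h0(lam):
    """A function of C_0(R)."""
    return 1.0 / (1.0 + lam * lam)


def lw_pairing(n, h, g_integral=1.0):
    """Pairing of the constant family (delta_n) with g (x) h on Omega = (0, 1): h(n) * int_Omega g."""
    return h(n) * g_integral


def mass_outside(n, K, omega_measure=1.0):
    """mu_n(Omega x (R minus [-K, K])) = L(Omega) * delta_n(R minus [-K, K])."""
    return omega_measure if abs(n) > K else 0.0


for K in (1.0, 10.0, 100.0):
    # the supremum over n of the mass outside [-K, K] stays equal to L(Omega) = 1
    print(K, max(mass_outside(n, K) for n in range(1, 1000)), lw_pairing(1000, h0))
```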
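Tight subsets of $\mathcal{Y}(\Omega; E)$ are closed in $L_w(\Omega, M(E))$. More precisely, we have the next proposition.
Proposition 4.3.2. Let $(\mu_n)_{n\in\mathbb{N}}$ be a tight sequence in $\mathcal{Y}(\Omega; E)$ with $\mu_n = (\mu^n_x)_{x\in\Omega} \otimes \mathcal{L}$, and assume that $(\mu^n_x)_{x\in\Omega} \overset{L_w}{\rightharpoonup} (\mu_x)_{x\in\Omega}$ in $L_w(\Omega, M(E))$. Then for a.e. $x$ in $\Omega$, $\mu_x$ is a probability measure on $E$, so that $(\mu_x)_{x\in\Omega} \otimes \mathcal{L}$ belongs to $\mathcal{Y}(\Omega; E)$.
Proof. Since $\sup_{n\in\mathbb{N}} \mu_n(\Omega \times E) = \mathcal{L}(\Omega) < +\infty$, there exist a subsequence (not relabeled) and some $\mu \in M^+(\Omega \times E)$ such that $\mu_n \rightharpoonup \mu$ weakly in the sense of measures in $M(\Omega \times E)$. We claim that it suffices to prove that $\mu \in \mathcal{Y}(\Omega; E)$. Indeed, assuming $\mu \in \mathcal{Y}(\Omega; E)$ and denoting by $(\nu_x)_{x\in\Omega}$ the family of probability measures associated with $\mu$, by a density argument we easily obtain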
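$$(\mu^n_x)_{x\in\Omega} \overset{L_w}{\rightharpoonup} (\nu_x)_{x\in\Omega} \text{ in } L_w(\Omega, M(E)).$$
Therefore, by uniqueness of the weak limit in $L_w(\Omega, M(E))$, up to a Lebesgue-negligible subset of $\Omega$ we obtain $(\nu_x)_{x\in\Omega} = (\mu_x)_{x\in\Omega}$, so that $\mu = (\mu_x)_{x\in\Omega} \otimes \mathcal{L} \in \mathcal{Y}(\Omega; E)$. We are going to establish that $\mu \in \mathcal{Y}(\Omega; E)$. According to Alexandrov's theorem (Proposition 4.2.3), for every open subset $U$ of $\Omega$, one has
$$\pi\#\mu(U) = \mu(U \times E) \le \liminf_{n\to+\infty} \mu_n(U \times E) = \mathcal{L}(U).$$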
Let now $K$ be any compact subset of $\Omega$, and for all $\varepsilon > 0$ let $K_\varepsilon$ be a compact subset of $E$ given by the tightness hypothesis. According to Alexandrov's theorem again, one has
$$\pi\#\mu(K) = \mu(K \times E) \ge \mu(K \times K_\varepsilon) \ge \limsup_{n\to+\infty} \mu_n(K \times K_\varepsilon) \ge \limsup_{n\to+\infty} \mu_n(K \times E) - \varepsilon = \mathcal{L}(K) - \varepsilon,$$
so that, since $\varepsilon$ is arbitrary, $\pi\#\mu(K) \ge \mathcal{L}(K)$. Since the measure $\pi\#\mu$ is regular, we deduce that $\pi\#\mu(B) = \mathcal{L}(B)$ for all Borel subsets $B$ of $\Omega$.
On $\mathcal{Y}(\Omega; E)$, the narrow convergence and the weak convergence of the families of corresponding probability measures are equivalent. More precisely, we have the following theorem.
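Theorem 4.3.1. Let $(\mu_n)_{n\in\mathbb N}$ be a sequence in $\mathcal Y(\Omega;E)$ and $\mu\in\mathcal Y(\Omega;E)$, with $\mu_n=(\mu^n_x)_{x\in\Omega}\otimes\mathcal L$ and $\mu=(\mu_x)_{x\in\Omega}\otimes\mathcal L$. Then
$$\mu_n\xrightarrow{\text{nar}}\mu\iff(\mu^n_x)_{x\in\Omega}\rightharpoonup(\mu_x)_{x\in\Omega}\ \text{ in }L^\infty_w(\Omega,\mathcal M(E)).$$
Proof. The implication $\Rightarrow$ is straightforward. We now prove the converse implication.
First step. We establish the tightness of $(\mu_n)_{n\in\mathbb N}$. Averaging each family, we define the two probability measures on $E$
$$\nu_n:=\frac{1}{\mathcal L(\Omega)}\int_\Omega\mu^n_x\,dx,\qquad\nu:=\frac{1}{\mathcal L(\Omega)}\int_\Omega\mu_x\,dx,$$
which act on all $\varphi\in C_0(E)$ as follows:
$$\langle\nu_n,\varphi\rangle:=\frac{1}{\mathcal L(\Omega)}\int_\Omega\int_E\varphi(\lambda)\,d\mu^n_x(\lambda)\,dx,\qquad\langle\nu,\varphi\rangle:=\frac{1}{\mathcal L(\Omega)}\int_\Omega\int_E\varphi(\lambda)\,d\mu_x(\lambda)\,dx.$$
Thus, the weak convergence of $(\mu^n_x)_{x\in\Omega}$ toward $(\mu_x)_{x\in\Omega}$ in $L^\infty_w(\Omega,\mathcal M(E))$ yields the weak convergence of $\nu_n$ toward $\nu$ in $\mathcal M(E)$. By the regularity of $\nu$, for arbitrary $\varepsilon>0$ there exists a compact subset $K_\varepsilon$ of $E$ such that $\nu(E\setminus K_\varepsilon)<\varepsilon$. From Lemma 4.2.1, one may assume $\nu(\partial K_\varepsilon)=0$. Then, according to Alexandrov’s theorem (Proposition 4.2.3), $\nu_n(K_\varepsilon)\to\nu(K_\varepsilon)$. Since $\nu_n$ and $\nu$ are probability measures, it follows that $\nu_n(E\setminus K_\varepsilon)\to\nu(E\setminus K_\varepsilon)$. We deduce that $\sup_{n\ge N_\varepsilon}\nu_n(E\setminus K_\varepsilon)<2\varepsilon$ for some $N_\varepsilon\in\mathbb N$. Enlarge $K_\varepsilon$ to take care of the finitely many indices $n<N_\varepsilon$. Our claim then follows from $\mu_n(\Omega\times(E\setminus K_\varepsilon))=\mathcal L(\Omega)\,\nu_n(E\setminus K_\varepsilon)$.
Second step. We establish $\mu_n\xrightarrow{\text{nar}}\mu$. According to Remark 4.3.1, it suffices to prove
$$\lim_{n\to+\infty}\int_{\Omega\times E}1_B(x)\varphi(\lambda)\,d\mu_n(x,\lambda)=\int_{\Omega\times E}1_B(x)\varphi(\lambda)\,d\mu(x,\lambda)\qquad\forall B\in\mathcal B(\Omega),\ \forall\varphi\in C_b(E).$$
For $\varepsilon>0$, let $K_\varepsilon$ be the compact subset of $E$ given by the tightness of $(\mu_n,\mu)_{n\in\mathbb N}$. Consider $\phi_\varepsilon\in C_c(E)$ satisfying $0\le\phi_\varepsilon\le1$ and $\phi_\varepsilon=1$ on $K_\varepsilon$. We now write
$$\begin{aligned}\Big|\int_{\Omega\times E}1_B\varphi\,d\mu_n-\int_{\Omega\times E}1_B\varphi\,d\mu\Big|&\le\Big|\int_{\Omega\times E}1_B\varphi\,d\mu_n-\int_{\Omega\times E}1_B\phi_\varepsilon\varphi\,d\mu_n\Big|\\&\quad+\Big|\int_{\Omega\times E}1_B\phi_\varepsilon\varphi\,d\mu_n-\int_{\Omega\times E}1_B\phi_\varepsilon\varphi\,d\mu\Big|\\&\quad+\Big|\int_{\Omega\times E}1_B\phi_\varepsilon\varphi\,d\mu-\int_{\Omega\times E}1_B\varphi\,d\mu\Big|.\end{aligned}\tag{4.9}$$
By the tightness of $(\mu_n,\mu)_{n\in\mathbb N}$, the first and the last terms in the right-hand side of (4.9) are less than $\varepsilon\|\varphi\|_\infty$. On the other hand, since $\phi_\varepsilon\varphi\in C_0(E)$, the second term tends to zero as $n\to+\infty$ by hypothesis. We end the proof by letting $n\to+\infty$ in (4.9), since $\varepsilon$ is arbitrary.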
4.3.3 Prokhorov’s compactness theorem
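The theorem below may be considered as a parametrized version of the classical Prokhorov compactness theorem, Theorem 4.2.3.
Theorem 4.3.2 (Prokhorov’s compactness theorem for Young measures). Let $(\mu_n)_{n\in\mathbb N}$ be a tight sequence in $\mathcal Y(\Omega;E)$. Then there exist a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ of $(\mu_n)_{n\in\mathbb N}$ and $\mu$ in $\mathcal Y(\Omega;E)$ such that $\mu_{n_k}\xrightarrow{\text{nar}}\mu$ in $\mathcal Y(\Omega;E)$.
Proof. Since $\sup_{n\in\mathbb N}\mu_n(\Omega\times E)=\mathcal L(\Omega)<+\infty$, there exist a subsequence (not relabeled) and $\mu\in\mathcal M^+(\Omega\times E)$ such that $\mu_n\rightharpoonup\mu$ weakly in the sense of measures in $\mathcal M(\Omega\times E)$. Since $(\mu_n)_{n\in\mathbb N}$ is tight, arguing as in the proof of Proposition 4.3.2, one may assert that $\mu$ belongs to $\mathcal Y(\Omega;E)$. It remains to establish the narrow convergence of $\mu_n$ toward $\mu$. Equivalently, according to Theorem 4.3.1, we must establish the weak convergence of $(\mu^n_x)_{x\in\Omega}$ toward $(\mu_x)_{x\in\Omega}$ in $L^\infty_w(\Omega,\mathcal M(E))$. Let $\Psi\in L^1(\Omega)$, $\varphi\in C_0(E)$, and $\Psi_\varepsilon\in C_c(\Omega)$ satisfying
$$\int_\Omega|\Psi-\Psi_\varepsilon|\,dx<\varepsilon.\tag{4.10}$$
Since $\mu_n$ weakly converges to $\mu$ in $\mathcal M(\Omega\times E)$ and $(x,\lambda)\mapsto\Psi_\varepsilon(x)\varphi(\lambda)$ belongs to $C_0(\Omega\times E)$, the slicing Theorem 4.2.4 gives
$$\lim_{n\to+\infty}\Big(\int_\Omega\Psi_\varepsilon(x)\Big(\int_E\varphi(\lambda)\,d\mu^n_x\Big)dx-\int_\Omega\Psi_\varepsilon(x)\Big(\int_E\varphi(\lambda)\,d\mu_x\Big)dx\Big)=0.\tag{4.11}$$
Let us write
$$\begin{aligned}\Big|\int_\Omega\Psi\Big(\int_E\varphi\,d\mu^n_x\Big)dx-\int_\Omega\Psi\Big(\int_E\varphi\,d\mu_x\Big)dx\Big|&\le\Big|\int_\Omega\Psi\Big(\int_E\varphi\,d\mu^n_x\Big)dx-\int_\Omega\Psi_\varepsilon\Big(\int_E\varphi\,d\mu^n_x\Big)dx\Big|\\&\quad+\Big|\int_\Omega\Psi_\varepsilon\Big(\int_E\varphi\,d\mu^n_x\Big)dx-\int_\Omega\Psi_\varepsilon\Big(\int_E\varphi\,d\mu_x\Big)dx\Big|\\&\quad+\Big|\int_\Omega\Psi_\varepsilon\Big(\int_E\varphi\,d\mu_x\Big)dx-\int_\Omega\Psi\Big(\int_E\varphi\,d\mu_x\Big)dx\Big|.\end{aligned}\tag{4.12}$$
From (4.10), the first and the last terms of the right-hand side of (4.12) are less than $\varepsilon\|\varphi\|_\infty$. Since $\varepsilon$ is arbitrary, the claim follows from (4.11) by letting $n\to+\infty$ in (4.12).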
4.3.4 Young measures associated with functions and generated by functions
Let $u:\Omega\to E$ be a given Borel function. Consider the image $\mu=G_\#\mathcal L$ of the measure $\mathcal L$ by the graph function $G:\Omega\to\Omega\times E$, $x\mapsto(x,u(x))$. Since the image of the measure $\mu$ by the projection $\pi$ on $\Omega$ is the Lebesgue measure $\mathcal L$, $\mu$ belongs to $\mathcal Y(\Omega;E)$. This measure, concentrated on the graph of $u$, is called the Young measure associated with the function $u$. By definition of the image of a measure, $\mu$ “acts” on $C_b(\Omega;E)$ as follows:
$$\forall\varphi\in C_b(\Omega;E),\qquad\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu(x,\lambda)=\int_\Omega\varphi(x,u(x))\,dx.$$
This shows that the family of probability measures $(\mu_x)_{x\in\Omega}$ associated with $\mu$ is $(\delta_{u(x)})_{x\in\Omega}$. Let $(u_n)_{n\in\mathbb N}$ be a sequence of Borel functions $u_n:\Omega\to E$. Consider the sequence $(\mu_n)_{n\in\mathbb N}$ of their associated Young measures, $\mu_n=(\delta_{u_n(x)})_{x\in\Omega}\otimes\mathcal L$. If $\mu_n\xrightarrow{\text{nar}}\mu$ in $\mathcal Y(\Omega;E)$, the Young measure $\mu$ is said to be generated by the sequence $(u_n)_{n\in\mathbb N}$. In general (see Examples 4.3.1 and 4.3.2), $\mu$ is not associated with a function. Let us rephrase the tightness of $(\mu_n)_{n\in\mathbb N}$ in terms of the sequence $(u_n)_{n\in\mathbb N}$. We easily obtain the following equivalence: the sequence $(\mu_n)_{n\in\mathbb N}$ is tight iff
$$\forall\varepsilon>0,\ \exists K_\varepsilon\text{ compact subset of }E\text{ such that }\sup_{n\in\mathbb N}\mathcal L\big(\{x\in\Omega:u_n(x)\in E\setminus K_\varepsilon\}\big)<\varepsilon.$$
Remark 4.3.3. It is worth noticing that a sequence $(\mu_n)_{n\in\mathbb N}$ of Young measures associated with a bounded sequence $(u_n)_{n\in\mathbb N}$ in $L^1(\Omega,E)$ is tight. Indeed, according to the Markov inequality, one has
$$\mathcal L\big(\{x\in\Omega:|u_n(x)|>M\}\big)\le\frac1M\int_\Omega|u_n|\,dx\le\frac1M\sup_{n\in\mathbb N}\int_\Omega|u_n|\,dx,$$
which tends to zero as $M\to+\infty$. Therefore, according to Prokhorov’s theorem, Theorem 4.3.2, from each bounded sequence $(u_n)_{n\in\mathbb N}$ in $L^1(\Omega,E)$ one can extract a subsequence generating a Young measure $\mu$, i.e., $(\delta_{u_n(x)})_{x\in\Omega}\otimes\mathcal L\xrightarrow{\text{nar}}\mu$.
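The Markov bound in Remark 4.3.3 can be checked numerically on a concrete bounded sequence in $L^1$. The following Python sketch is only an illustration and is not taken from the text. It assumes $\Omega=(0,1)$ and uses the hypothetical sequence $u_n=n\,1_{(0,1/n)}$, for which $\int_\Omega|u_n|\,dx=1$. It evaluates $\mathcal L(\{|u_n|>M\})$ with a midpoint rule and compares it with $\frac1M\sup_n\int_\Omega|u_n|\,dx$.

```python
def level_set_measure(u, M, m=20_000):
    """Midpoint-rule approximation of L({x in (0,1) : |u(x)| > M})."""
    h = 1.0 / m
    return h * sum(1 for k in range(m) if abs(u((k + 0.5) * h)) > M)

def l1_norm(u, m=20_000):
    """Midpoint-rule approximation of the integral of |u| over (0,1)."""
    h = 1.0 / m
    return h * sum(abs(u((k + 0.5) * h)) for k in range(m))

def spike(n):
    """Hypothetical example: u_n = n on (0, 1/n) and 0 elsewhere, so ||u_n||_1 = 1."""
    return lambda x: float(n) if x < 1.0 / n else 0.0

ns = [1, 2, 5, 10, 50, 100, 500, 1000]
sup_l1 = max(l1_norm(spike(n)) for n in ns)
for M in (2.0, 10.0, 100.0):
    worst = max(level_set_measure(spike(n), M) for n in ns)
    print(f"M={M:6.1f}  sup_n L(|u_n|>M) = {worst:.4f} <= {sup_l1 / M:.4f}")
```

The mesh size $1/20000$ is chosen so that every breakpoint $1/n$ above falls on a cell boundary. The midpoint rule is then exact up to rounding.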
4.3.5 Semicontinuity and continuity properties
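Here is a first semicontinuity result for nonnegative extended real-valued functions.
Proposition 4.3.3. Let $\varphi:\Omega\times E\to[0,+\infty]$ be a $\mathcal B(\Omega)\otimes\mathcal B(E)$-measurable function such that $\lambda\mapsto\varphi(x,\lambda)$ is lsc for a.e. $x$ in $\Omega$. Moreover, let $(\mu_n)_{n\in\mathbb N}$ be a sequence of Young measures in $\mathcal Y(\Omega;E)$ narrowly converging to some Young measure $\mu$ in $\mathcal Y(\Omega;E)$. Then
$$\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu(x,\lambda)\le\liminf_{n\to+\infty}\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu_n(x,\lambda).$$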
Proof. Let us consider the Lipschitz regularization of $\varphi$,
$$\varphi_p(x,\lambda):=\inf_{\xi\in E}\big\{\varphi(x,\xi)+p|\lambda-\xi|\big\},$$
for $p\in\mathbb N$ intended to go to $+\infty$, and set $\psi_p=\varphi_p\wedge p$. It is easily seen that $\psi_p$ belongs to $C_b(\Omega;E)$ (see Theorem 9.2.1). Moreover, $(\psi_p)_{p\in\mathbb N}$ is a nondecreasing sequence which converges pointwise to $\varphi$, with $\psi_p\le\varphi$. Consequently,
$$\int_{\Omega\times E}\psi_p(x,\lambda)\,d\mu(x,\lambda)=\lim_{n\to+\infty}\int_{\Omega\times E}\psi_p(x,\lambda)\,d\mu_n(x,\lambda)\le\liminf_{n\to+\infty}\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu_n(x,\lambda).$$
We complete the proof by letting $p\to+\infty$ in the left-hand side and applying the monotone convergence theorem.
We would like to extend Proposition 4.3.3 to functions $\varphi$ taking negative values or, more generally, to functions $\varphi$ which are not necessarily bounded from below. We restrict ourselves to sequences of Young measures associated with functions. Let us first recall the notion of uniform integrability: a sequence $(f_n)_{n\in\mathbb N}$ of functions $f_n:\Omega\to\mathbb R$ in $L^1(\Omega)$ is said to be uniformly integrable if
$$\lim_{R\to+\infty}\sup_{n\in\mathbb N}\int_{[|f_n|>R]}|f_n|\,dx=0.$$
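To see this definition at work, here is a minimal Python sketch. It is illustrative only and not from the text. It uses piecewise-constant "spikes" $f_n=h_n1_{(0,w_n)}$ on $\Omega=(0,1)$. For these, $\int_{[|f_n|>R]}|f_n|\,dx$ equals $|h_n|w_n$ when $|h_n|>R$ and $0$ otherwise. The two example families are hypothetical choices:
- $n\,1_{(0,1/n)}$ is bounded in $L^1$ but not uniformly integrable;
- $\sqrt n\,1_{(0,1/n)}$ is bounded in $L^2$, hence uniformly integrable.

```python
import math

def tail_mass(height, width, R):
    """Integral of |f| over the set [|f| > R] for f = height * 1_(0, width)."""
    return abs(height) * width if abs(height) > R else 0.0

def sup_tail(family, R, n_max=10_000):
    """sup over 1 <= n <= n_max of the tail mass; family(n) returns (height, width)."""
    return max(tail_mass(*family(n), R) for n in range(1, n_max + 1))

def concentrating(n):   # n * 1_(0, 1/n): bounded in L^1, not uniformly integrable
    return float(n), 1.0 / n

def l2_bounded(n):      # sqrt(n) * 1_(0, 1/n): bounded in L^2, uniformly integrable
    return math.sqrt(n), 1.0 / n

for R in (10.0, 50.0):
    print(R, sup_tail(concentrating, R), sup_tail(l2_bounded, R))
```

The truncation `n_max` does not affect these values. For the first family the supremum is already attained at any $n>R$, where it equals $1$ for every $R$. For the second family it is attained at the smallest $n>R^2$ and tends to $0$ like $1/R$.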
It is classical that this definition of uniform integrability is equivalent to Definition 2.4.4 (see Dellacherie and Meyer [125]).
Proposition 4.3.4. Let $(\mu_n)_{n\in\mathbb N}$ be a sequence of Young measures associated with a sequence of functions $(u_n)_{n\in\mathbb N}$, narrowly converging to some Young measure $\mu$. Let $\varphi:\Omega\times E\to\mathbb R$ be a $\mathcal B(\Omega)\otimes\mathcal B(E)$-measurable function such that $\lambda\mapsto\varphi(x,\lambda)$ is lsc for a.e. $x$ in $\Omega$. Assume moreover that the negative part $x\mapsto\varphi(x,u_n(x))^-$ is uniformly integrable. Then
$$\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu(x,\lambda)\le\liminf_{n\to+\infty}\int_\Omega\varphi(x,u_n(x))\,dx.$$
Proof. Let $R>0$ be intended to tend to $+\infty$, and set $\varphi_R=\sup(-R,\varphi)+R$. Since $\varphi_R\ge0$ and $\lambda\mapsto\varphi_R(x,\lambda)$ is lsc, one may apply Proposition 4.3.3. Removing the term $R\,\mathcal L(\Omega)$, one obtains
$$\int_{\Omega\times E}\varphi\,d\mu\le\int_{\Omega\times E}\sup(-R,\varphi)\,d\mu\le\liminf_{n\to+\infty}\int_{\Omega\times E}\sup(-R,\varphi)\,d\mu_n=\liminf_{n\to+\infty}\int_\Omega\sup\big(-R,\varphi(x,u_n(x))\big)\,dx.\tag{4.13}$$
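On the other hand,
$$\int_\Omega\sup\big(-R,\varphi(x,u_n(x))\big)\,dx=\int_{[\varphi(\cdot,u_n(\cdot))\ge-R]}\varphi(x,u_n(x))\,dx+\int_{[\varphi(\cdot,u_n(\cdot))<-R]}(-R)\,dx$$
[…] $>0$ such that $\mu(\partial(B_\rho(x_0)\times A))=0$. Such a choice is possible thanks to Lemma 4.2.1. Since $\mu_n\xrightarrow{\text{nar}}\mu$, in particular $\mu_n$ weakly converges in the sense of measures in $\mathcal M(\Omega\times E)$. According to Alexandrov’s theorem (Proposition 4.2.3), it follows that
$$\mu(B_\rho(x_0)\times A)=\lim_{n\to+\infty}\mu_n(B_\rho(x_0)\times A)=\lim_{n\to+\infty}\mathcal L\big(\{x\in B_\rho(x_0):u_n(x)\in A\}\big).\tag{4.17}$$
Collecting (4.16) and (4.17), we finally obtain
$$\mu_{x_0}(A)=\lim_{\rho\to0}\lim_{n\to+\infty}\frac{\mathcal L\big(\{x\in B_\rho(x_0):u_n(x)\in A\}\big)}{\mathcal L(B_\rho(x_0))},\tag{4.18}$$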
which proves the thesis.
For a given sequence $(u_n)_{n\in\mathbb N}$, the first mode of behavior which can cause a defect of strong convergence is the presence of rapid oscillations in the functions $u_n$. Formula (4.18) shows that Young measures capture some information on such oscillations. We illustrate this property with a few examples. Let $u:Y=(0,1)^N\to E$ be a given function in $L^p(Y,E)$, $p\ge1$, extended by $Y$-periodicity to $\mathbb R^N$. Define the sequence $(u_n)_{n\in\mathbb N}$ by setting $u_n(x)=u(nx)$ for all $x\in\mathbb R^N$. Classically, $u_n\rightharpoonup\bar u$ weakly in $L^p(\Omega,E)$ (weakly star if $p=+\infty$), where $\bar u$ is the mean value of $u$, defined by $\bar u=\int_Yu(y)\,dy$. If $u$ is not a constant function, we have neither strong convergence in $L^p(\Omega,E)$ nor a.e. pointwise convergence on $\Omega$ of the sequence $(u_n)_{n\in\mathbb N}$ toward $\bar u$. Let $(\mu_n)_{n\in\mathbb N}$ denote the sequence of Young measures associated with the sequence $(u_n)_{n\in\mathbb N}$, i.e., $\mu_n=(\delta_{u_n(x)})_{x\in\Omega}\otimes\mathcal L$. Then we have the next proposition.
Proposition 4.3.5. The sequence $(\mu_n)_{n\in\mathbb N}$ narrowly converges to $\mu=(\mu_x)_{x\in\Omega}\otimes\mathcal L$ in $\mathcal Y(\Omega;E)$. Here, for a.e. $x$ in $\Omega$, the probability measure $\mu_x$ is the image $u_\#\mathcal L_Y$ of the Lebesgue measure $\mathcal L_Y$ on $Y$ by the function $u$. In other words, $\mu$ acts on all $\varphi\in C_b(\Omega;E)$ as follows:
$$\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu(x,\lambda)=\int_\Omega\Big(\int_Y\varphi(x,u(y))\,dy\Big)dx.$$
Proof. It is enough to establish
$$\lim_{n\to+\infty}\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu_n(x,\lambda)=\int_\Omega\Big(\int_Y\varphi(x,u(y))\,dy\Big)dx$$
when $\varphi$ is of the form $\varphi(x,\lambda)=1_B(x)\phi(\lambda)$, where $B$ belongs to $\mathcal B(\Omega)$ and $\phi$ is a bounded continuous function on $E$ (see Remark 4.3.1). Classically, $x\mapsto\phi(u(nx))$ converges to $\int_Y\phi(u(y))\,dy$ in $L^\infty(\Omega)$ weak star. Hence
$$\lim_{n\to+\infty}\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu_n(x,\lambda)=\lim_{n\to+\infty}\int_\Omega1_B(x)\phi(u(nx))\,dx=\int_\Omega1_B(x)\Big(\int_Y\phi(u(y))\,dy\Big)dx=\int_\Omega\Big(\int_Y\varphi(x,u(y))\,dy\Big)dx,$$
and the thesis is proved.
Example 4.3.1. Take $u:(0,1)\to\mathbb R$ defined by
$$u(x)=\begin{cases}-1&\text{if }x\in(0,\tfrac12),\\+1&\text{if }x\in(\tfrac12,1),\end{cases}$$
and consider the sequence $(u_n)_{n\in\mathbb N}$ defined as previously with, for instance, $\Omega=(0,1)$. Using Proposition 4.3.5 (or (4.18)), it is easily seen that $\mu_x=\frac12\delta_{-1}+\frac12\delta_{+1}$. Thus $\mu$ is the measure on $(0,1)\times\mathbb R$ concentrated on the union of the two segments $(0,1)\times\{-1\}$ and $(0,1)\times\{1\}$, with mass $1/2$ on each segment. Therefore the measure $\mu$, which encodes the oscillations, is not associated with a function.
Example 4.3.2. Take $u:(0,1)\to\mathbb R$ defined by $u(x)=\sin(2\pi x)$ and consider the sequence $(u_n)_{n\in\mathbb N}$ defined as previously with, for instance, $\Omega=(0,1)$. An elementary calculation gives $\mu_x=\frac{1}{\pi\sqrt{1-y^2}}\,\mathcal L\lfloor(-1,1)$, that is, $\mu_x$ is the probability measure on $(-1,1)$ with density $y\mapsto\frac{1}{\pi\sqrt{1-y^2}}$. So $\mu$ is not associated with a function; it is spread over the whole rectangle $(0,1)\times(-1,1)$.
The next proposition shows how the weak limit of an oscillating sequence can be read off from the Young measure it generates.
Proposition 4.3.6. Let $(u_n)_{n\in\mathbb N}$ be a given sequence of functions in $L^p(\Omega,E)$, $p\ge1$, weakly converging to some $u$ in $L^p(\Omega,E)$. Assume that the sequence $(\mu_n)_{n\in\mathbb N}$ of their associated Young measures narrowly converges to some Young measure $\mu$. Then, for a.e. $x$ in $\Omega$, $u(x)$ is the barycenter (or the expectation) of the probability measure $\mu_x$:
$$u(x)=\int_E\lambda\,d\mu_x(\lambda).$$
Proof. Reasoning with each component of $u_n$, one may assume, without restriction, that $u_n$ is a real-valued function and that $E=\mathbb R$. Let us first apply Theorem 4.3.3 with $\varphi$ defined by $\varphi(x,\lambda)=\phi(x)\lambda$, where $\phi\in C_c(\Omega)$. Since $(u_n)_{n\in\mathbb N}$ weakly converges in $L^p(\Omega)$, the sequence $(\varphi(\cdot,u_n(\cdot)))_{n\in\mathbb N}$ is uniformly integrable (see Remark 4.3.4). According to Theorem 4.3.3, it follows that
$$\int_{\Omega\times\mathbb R}\phi(x)\lambda\,d\mu(x,\lambda)=\lim_{n\to+\infty}\int_\Omega\phi(x)u_n(x)\,dx=\int_\Omega\phi(x)u(x)\,dx.$$
According now to Theorem 4.2.4, we infer
$$\int_\Omega\phi(x)\Big(\int_{\mathbb R}\lambda\,d\mu_x(\lambda)\Big)dx=\int_\Omega\phi(x)u(x)\,dx.$$
Since $\phi$ is arbitrary, one obtains $u(x)=\int_{\mathbb R}\lambda\,d\mu_x(\lambda)$ for a.e. $x$ in $\Omega$.
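Examples 4.3.1 and 4.3.2, together with Proposition 4.3.6, can be checked numerically. The Python sketch below is an illustration under stated assumptions and is not part of the text. It samples $u_n(x)=u(nx)$ at the midpoints of a fine uniform grid of a subinterval $B\subset\Omega=(0,1)$. The empirical distribution of the samples approximates $\frac{1}{\mathcal L(B)}\int_B\mu_x\,dx=u_\#\mathcal L_Y$, in the spirit of (4.18). For $u(y)=\sin(2\pi y)$, this is compared with the arcsine distribution function $F(t)=\frac12+\frac1\pi\arcsin t$. The empirical mean approximates the barycenter, that is, the weak limit $\bar u=0$. The values of $n$, $B$ and the grid size are arbitrary.

```python
import math

def young_samples(u, n, a, b, m=100_000):
    """Values u_n(x) = u(n x) at the m midpoints of a uniform grid of B = (a, b)."""
    h = (b - a) / m
    return [u(n * (a + (k + 0.5) * h)) for k in range(m)]

def sine(y):                       # Example 4.3.2 (already 1-periodic)
    return math.sin(2.0 * math.pi * y)

def square(y):                     # Example 4.3.1, extended by 1-periodicity
    return -1.0 if (y % 1.0) < 0.5 else 1.0

def empirical_cdf(values, t):
    """Fraction of sampled values <= t."""
    return sum(1 for v in values if v <= t) / len(values)

def arcsine_cdf(t):                # distribution function of u#L_Y for u(y) = sin(2 pi y)
    return 0.5 + math.asin(t) / math.pi

# n = 500 and B = (0.2, 0.45) are arbitrary; n*B = (100, 225) covers whole periods.
vals = young_samples(sine, 500, 0.2, 0.45)
for t in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(t, round(empirical_cdf(vals, t), 4), round(arcsine_cdf(t), 4))
print("empirical barycenter:", sum(vals) / len(vals))    # weak limit of u_n is 0

sq = young_samples(square, 500, 0.2, 0.45)
print("fraction of values equal to -1:", sum(1 for v in sq if v == -1.0) / len(sq))
```

Here $nB$ covers an integer number of periods, which makes the agreement sharper. For a general subinterval, the discrepancy is of order $1/(n\,\mathcal L(B))$.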
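One can now establish the Dunford–Pettis theorem, Theorem 2.4.5.
Proposition 4.3.7. Let $(u_n)_{n\in\mathbb N}$ be a given sequence of uniformly integrable functions in $L^1(\Omega,E)$. Then there exist a subsequence $(u_{n_k})_{k\in\mathbb N}$ and $u$ in $L^1(\Omega,E)$ such that $u_{n_k}\rightharpoonup u$ in $\sigma(L^1,L^\infty)$.
Proof. Since $(u_n)_{n\in\mathbb N}$ is uniformly integrable, one can easily establish that
$$\sup_{n\in\mathbb N}\int_\Omega|u_n|\,dx<+\infty.$$
Hence (see Remark 4.3.3) the sequence of Young measures $\mu_n=(\delta_{u_n(x)})_{x\in\Omega}\otimes\mathcal L$ is tight. According to Prokhorov’s compactness Theorem 4.3.2, there exist a subsequence of $(\mu_n)_{n\in\mathbb N}$ (not relabeled) and $\mu$ in $\mathcal Y(\Omega;E)$ satisfying $\mu_n\xrightarrow{\text{nar}}\mu$. Consider $g\in L^\infty(\Omega,E)$ and set $\varphi(x,\lambda):=g(x)\cdot\lambda$. Since $|g(x)\cdot u_n(x)|\le\|g\|_\infty|u_n(x)|$, the sequence $(\varphi(x,u_n(x)))_{n\in\mathbb N}$ is uniformly integrable and satisfies the hypotheses of Theorem 4.3.3. Therefore
$$\lim_{n\to+\infty}\int_\Omega g(x)\cdot u_n(x)\,dx=\int_\Omega g(x)\cdot\Big(\int_E\lambda\,d\mu_x\Big)dx.$$
The barycenter $u:x\mapsto\int_E\lambda\,d\mu_x$ then satisfies $u_n\rightharpoonup u$ in $\sigma(L^1,L^\infty)$.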
4.3.7 Young measures do not capture concentrations
Another mode of behavior which causes a defect of strong convergence for a sequence $(u_n)_{n\in\mathbb N}$ weakly converging to some $u$ in $L^p(\Omega,E)$ is the concentration effect. Such concentration effects appear when $u_n-u$ converges to zero in measure while the total mass $\int_\Omega|u_n-u|^p\,dx$ concentrates, in the limit, on a set of zero Lebesgue measure. Note that no concentration effects occurred in the examples of Section 4.3.6, because the sequences $(u_n)_{n\in\mathbb N}$ did not converge in measure. Let us illustrate the concentration phenomenon with the following elementary example. Let $\Omega=(-1,1)$ and $p=2$. Consider the real-valued functions $u_n$, $n\in\mathbb N^*$, defined by
$$u_n(x)=\begin{cases}\sqrt n&\text{if }x\in(-\frac1n,\frac1n),\\0&\text{otherwise.}\end{cases}$$
It is easily seen that $u_n$ converges to $0$ a.e. in $(-1,1)$, in measure, and weakly in $L^2(-1,1)$. On the other hand, the total mass is $\int_{(-1,1)}|u_n|^2\,dx=2$, while the measure $|u_n|^2\mathcal L$ weakly converges to $2\delta_0$ in the sense of measures in $\mathcal M(-1,1)$. Therefore the total mass concentrates at the point $x=0$. Proposition 4.3.8 shows that Young measures generated by sequences converging in measure are trivial, and therefore do not capture concentration effects. For the analysis of both oscillation and concentration effects, see [217] and [137].
Proposition 4.3.8. A sequence $(u_n)_{n\in\mathbb N}$ of Borel functions converges to $u$ in measure iff the associated sequence of Young measures $(\mu_n)_{n\in\mathbb N}$ narrowly converges to the Young measure associated with $u$, i.e., $\mu=(\delta_{u(x)})_{x\in\Omega}\otimes\mathcal L$. Moreover, let $(\mu_n)_{n\in\mathbb N}$ be a sequence of Young measures associated with a sequence of Borel functions $(u_n)_{n\in\mathbb N}$, narrowly converging to some Young measure $\mu$. Let $(v_n)_{n\in\mathbb N}$ be another sequence of Borel functions $v_n:\Omega\to E$ such that $v_n-u_n$ converges to $0$ in measure. Then the sequence $(v_n)_{n\in\mathbb N}$ generates the same Young measure $\mu$.
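The contrast in the example of this section can be checked with a few lines of Python. This is an illustrative sketch: it uses a midpoint quadrature on $(-1,1)$, and the test function $\phi(\lambda)=1/(1+\lambda^2)\in C_0(\mathbb R)$ is an arbitrary choice. Two quantities are computed:
- $\int_\Omega\phi(u_n)\,dx$ tests the Young measures. It approaches $\int_\Omega\phi(0)\,dx=2$, as it should for the trivial Young measure $\delta_0\otimes\mathcal L$.
- $\int_{(-\delta,\delta)}|u_n|^2\,dx$ stays equal to $2$ for a fixed $\delta>0$ (here $\delta=0.05$) as soon as $n>1/\delta$.

```python
import math

def u(n, x):
    """u_n = sqrt(n) on (-1/n, 1/n) and 0 elsewhere (Section 4.3.7)."""
    return math.sqrt(n) if abs(x) < 1.0 / n else 0.0

def integrate(f, a, b, m=100_000):
    """Composite midpoint rule for f on (a, b) with m cells."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

def phi(lam):                      # an arbitrary test function in C_0(R)
    return 1.0 / (1.0 + lam * lam)

for n in (10, 100, 1000):
    young_test = integrate(lambda x: phi(u(n, x)), -1.0, 1.0)   # tends to 2 = integral of phi(0)
    l2_mass = integrate(lambda x: u(n, x) ** 2, -1.0, 1.0)      # equals 2 for every n
    near_zero = integrate(lambda x: u(n, x) ** 2, -0.05, 0.05)  # equals 2 once n > 20
    print(n, round(young_test, 6), round(l2_mass, 6), round(near_zero, 6))
```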
In other words, the second assertion of Proposition 4.3.8 says that $v_n=(v_n-u_n)+u_n$ generates the Young measure $\mu$ generated by $u_n$. The perturbation by $v_n-u_n$, in which a concentration phenomenon may occur, therefore has no effect on $\mu$.
Proof. First step. We first claim
$$u_n\to u\text{ in measure}\implies(\delta_{u_n(x)})_{x\in\Omega}\rightharpoonup(\delta_{u(x)})_{x\in\Omega}\ \text{ in }L^\infty_w(\Omega,\mathcal M(E)),$$
which, according to Theorem 4.3.1, is equivalent to the narrow convergence of the corresponding Young measures. After an easy density argument, it is enough to test the convergence with $\varphi(x,\lambda)=1_B(x)\phi(\lambda)$, where $B\in\mathcal B(\Omega)$ and $\phi\in C_0(E)$. Let $\varepsilon>0$ be arbitrary. By uniform continuity of $\phi$, there exists $\eta>0$ such that $|\lambda-\lambda'|\le\eta\Rightarrow|\phi(\lambda)-\phi(\lambda')|\le\varepsilon$.
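Let us write
$$\Big|\int_\Omega\varphi(x,u_n(x))\,dx-\int_\Omega\varphi(x,u(x))\,dx\Big|\le\int_\Omega|\phi(u_n(x))-\phi(u(x))|\,dx=\int_{[|u_n-u|>\eta]}|\phi(u_n(x))-\phi(u(x))|\,dx+\int_{[|u_n-u|\le\eta]}|\phi(u_n(x))-\phi(u(x))|\,dx\le2\|\phi\|_\infty\,\mathcal L([|u_n-u|>\eta])+\varepsilon\,\mathcal L(\Omega).\tag{4.19}$$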
Now, by hypothesis, $\lim_{n\to+\infty}\mathcal L([|u_n-u|>\eta])=0$. Since $\varepsilon$ is arbitrary, the claim follows after letting $n\to+\infty$ in (4.19).
Second step. We establish the converse implication. Let us consider $\varphi\in C_b(\Omega;E)$ defined by $\varphi(x,\lambda)=|\lambda-u(x)|\wedge C$, where $C$ is any positive constant. Since $\mu_n\xrightarrow{\text{nar}}\mu$, one has
$$\lim_{n\to+\infty}\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu_n(x,\lambda)=\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu(x,\lambda),$$
that is,
$$\lim_{n\to+\infty}\int_\Omega|u_n(x)-u(x)|\wedge C\,dx=\int_\Omega|u(x)-u(x)|\wedge C\,dx=0.\tag{4.20}$$
On the other hand, for any $\eta>0$,
$$\mathcal L([|u_n-u|>\eta])\le\frac{1}{\min(\eta,C)}\int_\Omega|u_n(x)-u(x)|\wedge C\,dx.$$
Consequently, (4.20) yields $\lim_{n\to+\infty}\mathcal L([|u_n-u|>\eta])=0$.
Last step. We establish the second assertion. As previously, according to Theorem 4.3.1, it suffices to establish
$$\lim_{n\to+\infty}\int_B\phi(v_n)\,dx=\int_{\Omega\times E}1_B(x)\phi(\lambda)\,d\mu(x,\lambda)$$
for all Borel subsets $B$ of $\Omega$ and all $\phi\in C_c(E)$. Let $\varepsilon>0$. Since $\phi$ is uniformly continuous on $E$, there exists $\eta>0$ such that $|\phi(v_n(x))-\phi(u_n(x))|<\varepsilon$ for all $x$ in the set $[|v_n-u_n|<\eta]$. On the other hand, since $v_n-u_n$ tends to $0$ in measure, $\lim_{n\to+\infty}\mathcal L([|v_n-u_n|\ge\eta])=0$. Therefore
$$\Big|\int_B\phi(v_n)\,dx-\int_B\phi(u_n)\,dx\Big|\le\int_B|\phi(v_n)-\phi(u_n)|\,dx=\int_{B\cap[|v_n-u_n|\ge\eta]}|\phi(v_n)-\phi(u_n)|\,dx+\int_{B\cap[|v_n-u_n|<\eta]}|\phi(v_n)-\phi(u_n)|\,dx$$
[…] $>0$, $\frac1t\big[J(u+tv)-J(u)\big]\ge0$, and pass to the limit in this inequality as $t\to0^+$. We have
$$\forall v\in W^{1,p}_0(\Omega)\qquad\int_\Omega\frac1p\,\frac{|\nabla u+t\nabla v|^p-|\nabla u|^p}{t}\,dx-\int_\Omega fv\,dx\ge0.\tag{6.98}$$
To pass to the limit in (6.98), we use the Lebesgue dominated convergence theorem. Set $h(t)=|\nabla u+t\nabla v|^p$ (pointwise in $x$). We have
$$h'(t)=p\,|\nabla u+t\nabla v|^{p-2}(\nabla u+t\nabla v)\cdot\nabla v.$$
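Hence
$$\frac1t\big(|\nabla u+t\nabla v|^p-|\nabla u|^p\big)=\frac1t\big(h(t)-h(0)\big)=\frac1t\int_0^tp\,|\nabla u+s\nabla v|^{p-2}(\nabla u+s\nabla v)\cdot\nabla v\,ds.$$
From this, by taking $0<t\le1$, we obtain
$$\Big|\frac1t\big(|\nabla u+t\nabla v|^p-|\nabla u|^p\big)\Big|\le\frac pt\int_0^t|\nabla u+s\nabla v|^{p-1}|\nabla v|\,ds\le p\,(|\nabla u|+|\nabla v|)^{p-1}|\nabla v|.\tag{6.99}$$
Let us notice that $|\nabla v|\in L^p(\Omega)$. Moreover, $(|\nabla u|+|\nabla v|)^{p-1}$ belongs to $L^{p'}(\Omega)$, where $p'=p/(p-1)$ is the conjugate exponent, because $(p-1)p'=p$. Hence, by Hölder’s inequality, the right-hand side of (6.99) is a function which belongs to $L^1(\Omega)$ and which is independent of $t\in\,]0,1]$. We can now pass to the limit in (6.98) and obtain
$$\forall v\in W^{1,p}_0(\Omega)\qquad\int_\Omega|\nabla u|^{p-2}\nabla u\cdot\nabla v\,dx-\int_\Omega fv\,dx\ge0.$$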
Let us now replace $v$ by $-v$. We obtain
$$\forall v\in W^{1,p}_0(\Omega)\qquad\int_\Omega|\nabla u|^{p-2}\nabla u\cdot\nabla v\,dx=\int_\Omega fv\,dx.$$
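The argument above relies on two pointwise facts, which can be sanity-checked at a single point $x$:
- the difference quotient in (6.98) converges to $|\nabla u|^{p-2}\nabla u\cdot\nabla v$;
- this quotient is dominated by the $t$-independent bound (6.99).

At such a point, $\nabla u(x)$ and $\nabla v(x)$ are just vectors of $\mathbb R^N$. The Python sketch below uses hypothetical sample vectors $a=\nabla u(x)$ and $b=\nabla v(x)$. It illustrates the two facts and is not a proof.

```python
import math

def norm(w):
    return math.sqrt(sum(c * c for c in w))

def quotient(a, b, p, t):
    """(1/p) (|a + t b|^p - |a|^p) / t: the gradient part of the integrand in (6.98)."""
    at = [ai + t * bi for ai, bi in zip(a, b)]
    return (norm(at) ** p - norm(a) ** p) / (p * t)

def limit(a, b, p):
    """|a|^(p-2) a.b: the pointwise limit of the quotient as t -> 0+ (here a != 0)."""
    return norm(a) ** (p - 2) * sum(ai * bi for ai, bi in zip(a, b))

def bound(a, b, p):
    """(|a| + |b|)^(p-1) |b|: the right-hand side of (6.99) divided by p."""
    return (norm(a) + norm(b)) ** (p - 1) * norm(b)

a, b = [1.0, 2.0], [0.5, -1.0]       # hypothetical values of grad u(x) and grad v(x)
ts = [2.0 ** -k for k in range(21)]  # sample values of t in ]0, 1]
for p in (1.5, 2.0, 3.0):
    dominated = all(abs(quotient(a, b, p, t)) <= bound(a, b, p) for t in ts)
    print(p, quotient(a, b, p, 1e-7), limit(a, b, p), dominated)
```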
(iii) By taking $v\in\mathcal D(\Omega)$ and using the definition of derivation in the sense of distributions, we obtain the following equality:
$$-\operatorname{div}\big(|\nabla u|^{p-2}\nabla u\big)=f\quad\text{in }\mathcal D'(\Omega).$$
On the other hand, when $\Omega$ is regular, from $u\in W^{1,p}_0(\Omega)$ we infer that $u=0$ on $\partial\Omega$ in the trace sense (Proposition 5.6.1).
Remark 6.6.1. We could as well consider Neumann-type boundary value problems for the $\Delta_p$ operator. Let us just notice that, assuming $u\in C^2(\bar\Omega)$, we have the following integration by parts formula:
$$\forall v\in\mathcal D(\bar\Omega)\qquad\int_\Omega|\nabla u|^{p-2}\nabla u\cdot\nabla v\,dx=-\int_\Omega v\,\Delta_pu\,dx+\int_{\partial\Omega}|\nabla u|^{p-2}\frac{\partial u}{\partial n}\,v\,d\sigma.$$
It follows that variational problems in $W^{1,p}(\Omega)$ lead to boundary conditions of the following type:
$$|\nabla u|^{p-2}\frac{\partial u}{\partial n}=g\quad\text{on }\partial\Omega.$$
(Note that when $p=2$, one recovers the classical Neumann boundary condition.)
6.7 The Stokes system
In this section, we make precise the variational approach to the Stokes system, which was introduced in Section 2.3.1. Let us recall that the Stokes system for an incompressible viscous fluid in a domain $\Omega$ of $\mathbb R^N$ consists in finding functions $u_1,u_2,\dots,u_N:\Omega\to\mathbb R$ and $p:\Omega\to\mathbb R$ which satisfy
$$\begin{cases}-\mu\Delta u_i+\dfrac{\partial p}{\partial x_i}=f_i&\text{on }\Omega,\quad i=1,\dots,N,\\[4pt]\displaystyle\sum_{i=1}^N\frac{\partial u_i}{\partial x_i}=0&\text{on }\Omega,\\[4pt]u_i=0&\text{on }\partial\Omega,\quad i=1,\dots,N.\end{cases}$$
The given vector $f=(f_1,f_2,\dots,f_N)\in L^2(\Omega)^N$ represents a volumic density of forces, and $\mu>0$ is the viscosity coefficient. (It is a positive scalar which is inversely proportional to the Reynolds number.) The vector function $u=(u_1,\dots,u_N):\Omega\to\mathbb R^N$ is the velocity vector field of the fluid; it assigns to each point $x\in\Omega$ the velocity vector $u(x)=(u_i(x))_{i=1,\dots,N}$ of the fluid at $x$. The scalar function $p:\Omega\to\mathbb R$ is the pressure; for each $x\in\Omega$, $p(x)$ is the pressure of the fluid at $x$. The Stokes system can be written in the following form:
$$-\mu\Delta u+\nabla p=f\ \text{on }\Omega,\qquad\operatorname{div}(u)=0\ \text{on }\Omega,\qquad u=0\ \text{on }\partial\Omega.$$
The condition $\operatorname{div}(u)=0$ expresses that the fluid is incompressible. The Stokes system is a linear system of $N+1$ partial differential equations on $\Omega$ involving $N+1$ unknown functions $(u_1,\dots,u_N,p)$.
The variational formulation of the Stokes system was introduced by Leray around 1934. The idea is to work in the functional space
$$V=\big\{v\in H^1_0(\Omega)^N:\operatorname{div}(v)=0\big\}\tag{6.100}$$
and to make the pressure appear as a Lagrange multiplier of the constraint “$\operatorname{div}(v)=0$.” Let us assume that $\Omega$ is a bounded connected open subset of $\mathbb R^N$ whose boundary is piecewise $C^1$. The space $V$ is equipped with the scalar product of $H^1_0(\Omega)^N$,
$$\langle u,v\rangle_{H^1_0(\Omega)^N}=\sum_{i=1}^N\langle u_i,v_i\rangle_{H^1_0(\Omega)},\qquad\text{where}\quad\langle u_i,v_i\rangle_{H^1_0(\Omega)}=\int_\Omega(u_iv_i+\nabla u_i\cdot\nabla v_i)\,dx,$$
and the corresponding norm
$$\|v\|_{H^1_0(\Omega)^N}=\Big(\sum_{i=1}^N\int_\Omega\big(v_i^2+|\nabla v_i|^2\big)\,dx\Big)^{1/2}.$$
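The space $V$ is equal to the kernel of the divergence operator
$$\operatorname{div}:v\in H^1_0(\Omega)^N\longmapsto\operatorname{div}(v)\in L^2(\Omega),$$
which is a linear continuous operator from $H^1_0(\Omega)^N$ into $L^2(\Omega)$. The continuity of the div operator follows from the following inequality:
$$\forall v\in H^1_0(\Omega)^N\qquad\|\operatorname{div}(v)\|^2_{L^2(\Omega)}=\int_\Omega\Big|\sum_{i=1}^N\frac{\partial v_i}{\partial x_i}\Big|^2dx\le\sum_{i=1}^N\int_\Omega|\nabla v_i|^2\,dx\le\|v\|^2_{H^1_0(\Omega)^N}.$$
The first inequality is proved as follows. For $v\in\mathcal D(\Omega)^N$, two integrations by parts give $\int_\Omega|\operatorname{div}v|^2\,dx=\sum_{i,j}\int_\Omega\frac{\partial v_i}{\partial x_j}\frac{\partial v_j}{\partial x_i}\,dx$, and $2ab\le a^2+b^2$ bounds the right-hand side by $\sum_i\int_\Omega|\nabla v_i|^2\,dx$. The inequality extends to $H^1_0(\Omega)^N$ by density. Hence $V$ is a closed subspace of $H^1_0(\Omega)^N$, and $V$ is a Hilbert space. We can now state the variational formulation of the Stokes system.
Theorem 6.7.1. (a) For every $f\in L^2(\Omega)^N$ there exists a unique $u$ which satisfies
$$\begin{cases}\displaystyle\mu\sum_{i=1}^N\int_\Omega\nabla u_i\cdot\nabla v_i\,dx=\sum_{i=1}^N\int_\Omega f_iv_i\,dx\quad\forall v\in V,\\u\in V.\end{cases}\tag{6.101}$$
(b) Let $u$ be the solution of (6.101). Then there exists $p\in L^2(\Omega)$, unique up to an additive constant, such that the couple $(u,p)\in V\times L^2(\Omega)$ satisfies
$$\mu\sum_{i=1}^N\int_\Omega\nabla u_i\cdot\nabla v_i\,dx-\sum_{i=1}^N\int_\Omega f_iv_i\,dx=\int_\Omega p\operatorname{div}(v)\,dx\qquad\forall v\in H^1_0(\Omega)^N.\tag{6.102}$$
(c) The couple $(u,p)$ is a weak solution of the Stokes system:
$$-\mu\Delta u+\nabla p=f\ \text{in }\mathcal D'(\Omega)^N,\qquad\operatorname{div}(u)=0\ \text{in }\mathcal D'(\Omega),\qquad u=0\ \text{on }\partial\Omega\text{ in the trace sense.}$$
The couple $(u,p)$ is called the variational solution of the Stokes system.
Proof. (a) Let us consider the bilinear form $a:V\times V\to\mathbb R$,
$$a(u,v)=\mu\sum_{i=1}^N\int_\Omega\nabla u_i\cdot\nabla v_i\,dx,$$
and the linear form $l:V\to\mathbb R$,
$$l(v)=\sum_{i=1}^N\int_\Omega f_iv_i\,dx.$$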
The bilinear form $a$ is clearly continuous, and its coercivity follows, by a standard argument, from the Poincaré inequality in $H^1_0(\Omega)$. The continuity of $l$ is also immediate. Thus all the assumptions of the Lax–Milgram theorem, Theorem 3.1.2, are satisfied. This implies the existence and uniqueness of the solution $u$ of (6.101).
(b) The difficulty comes from the fact that $\mathcal D(\Omega)^N$ is not contained in the space $V$, so one cannot directly interpret (6.101) in terms of distributions. Moreover, up to now, the pressure $p$ has not appeared in the above variational formulation. Let us reformulate (6.101) as an orthogonality relation. Consider the linear form $L:H^1_0(\Omega)^N\to\mathbb R$ defined by
$$L(v)=\mu\sum_{i=1}^N\int_\Omega\nabla u_i\cdot\nabla v_i\,dx-\sum_{i=1}^N\int_\Omega f_iv_i\,dx.\tag{6.103}$$
The linear form $L$ is clearly continuous on $H^1_0(\Omega)^N$, and (6.101) precisely tells us that $L(v)=0$ for all $v\in V$. In other words, $L\in\big(H^1_0(\Omega)^N\big)^*$ and $L(v)=0$ for all $v\in V$, i.e., $L\in V^\perp$, the orthogonal subspace of $V$ (for the pairing between $H^1_0(\Omega)^N$ and its topological dual). The precise description of such elements is provided by the following theorem, obtained by De Rham in 1955.
Theorem 6.7.2. Let $\Omega$ be a bounded connected open set in $\mathbb R^N$ whose boundary is piecewise $C^1$. Let $L\in\big(H^1_0(\Omega)^N\big)^*$ be a linear continuous form on $H^1_0(\Omega)^N$. Set $V=\{v\in H^1_0(\Omega)^N:\operatorname{div}(v)=0\}$. Then
$$L(v)=0\quad\forall v\in V\iff\exists p\in L^2(\Omega)\text{ such that }L(v)=\int_\Omega p\operatorname{div}(v)\,dx\quad\forall v\in H^1_0(\Omega)^N.$$
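Proof of Theorem 6.7.1 continued. Let us admit De Rham’s theorem. The implication $L\in V^\perp\Rightarrow\exists p\dots$, which is the interesting part of the theorem, is a nontrivial result. We apply it to the linear form $L$ defined in (6.103). We thus have the existence of $p\in L^2(\Omega)$ such that
$$\mu\sum_{i=1}^N\int_\Omega\nabla u_i\cdot\nabla v_i\,dx-\sum_{i=1}^N\int_\Omega f_iv_i\,dx=\int_\Omega p\operatorname{div}(v)\,dx\qquad\forall v\in H^1_0(\Omega)^N.$$
This is precisely (6.102).
(c) Since $\mathcal D(\Omega)^N$ is dense in $H^1_0(\Omega)^N$, the solution of (6.102) is characterized by $(u,p)\in V\times L^2(\Omega)$ and
$$\mu\sum_{i=1}^N\int_\Omega\nabla u_i\cdot\nabla v_i\,dx-\int_\Omega p\operatorname{div}(v)\,dx=\sum_{i=1}^N\int_\Omega f_iv_i\,dx\qquad\forall v\in\mathcal D(\Omega)^N.$$
Taking $v=(0,\dots,v_i,\dots,0)$, $i=1,\dots,N$, yields
$$\mu\int_\Omega\nabla u_i\cdot\nabla v_i\,dx-\int_\Omega p\,\frac{\partial v_i}{\partial x_i}\,dx=\int_\Omega f_iv_i\,dx\qquad\forall v_i\in\mathcal D(\Omega),$$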
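that is,
$$-\mu\Delta u_i+\frac{\partial p}{\partial x_i}=f_i\quad\text{in }\mathcal D'(\Omega).$$
Moreover, $u\in V$ contains the information $\operatorname{div}(u)=0$ in $\mathcal D'(\Omega)$ and $u=0$ on $\partial\Omega$ in the trace sense. Hence $(u,p)$ is a weak solution of the Stokes system.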
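In finite dimensions, De Rham’s theorem reduces to elementary linear algebra: a linear form vanishing on $\ker B$ lies in the range of $B^{\mathsf T}$. The multiplier then appears by solving a saddle-point system. The following Python sketch shows only this finite-dimensional analogue, with small hypothetical matrices:
- $A$ (symmetric positive definite) plays the role of the form $a$;
- $B$ plays the role of $\operatorname{div}$;
- $f$ is the load.

With the sign convention of (6.102), $\langle Au-f,v\rangle=\langle p,Bv\rangle$ for all $v$, i.e., $Au-B^{\mathsf T}p=f$ together with $Bu=0$.

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            fac = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= fac * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical data: 3 "velocity" unknowns and 1 constraint (a discrete "divergence").
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
B = [[1.0, 1.0, 1.0]]
f = [1.0, 0.0, 2.0]

# Saddle-point matrix [[A, -B^T], [B, 0]] acting on the unknown (u, p).
K = [A[i] + [-B[0][i]] for i in range(3)] + [B[0] + [0.0]]
sol = solve(K, f + [0.0])
u, p = sol[:3], sol[3]

residual = [au - fi for au, fi in zip(matvec(A, u), f)]
print("B u =", matvec(B, u))                 # ~ [0]: u lies in ker B
print("A u - f =", residual, " p B^T =", [p * bi for bi in B[0]])
```

For these data one finds $u=(-0.1,-0.3,0.4)$ and $p=-0.9$. The same algebraic structure, with $A$ and $B$ coming from a discretization, underlies mixed finite element methods for the Stokes system. Here it only illustrates why the multiplier enforcing $Bu=0$ plays the role of a (discrete) pressure.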
Chapter 7
The finite element method
One of the major interests of variational methods is to provide both a theory for the existence of solutions and numerical methods for computing accurate approximations of these solutions. Certainly, the most celebrated of these variational approximation methods is the finite element method. It is a Galerkin approximation scheme in which the elements of the finite dimensional approximating subspaces $V_n$ are piecewise polynomial functions. This method has proved very successful. The main reason is that many problems have a local character, so choosing a basis of $V_n$ whose functions have small supports yields approximated problems with sparse matrices, i.e., with most entries equal to zero. This is a decisive property for solving the corresponding linear system numerically. Engineering problems involving systems of PDEs from continuum mechanics usually give rise to large linear systems ($100\times100$ or $1000\times1000$ are quite frequent!). Our aim is to introduce the main ideas of the finite element method and then to describe a typical example. For simplicity of exposition, we restrict ourselves to linear problems whose variational formulation enters into the abstract setting of the Lax–Milgram theorem:
$$\text{find }u\in V\text{ such that }a(u,v)=l(v)\text{ for all }v\in V.\tag{7.1}$$
Let us first recall and make precise some aspects of the Galerkin method (which was introduced in Section 3.1.2).
7.1 The Galerkin method: Further results
Let us briefly recall the assumptions on the abstract variational problem (7.1). $V$ is a Hilbert space, and $a:V\times V\to\mathbb R$ is a continuous coercive bilinear form, i.e., there exist constants $M\in\mathbb R_+$ and $\alpha>0$ such that
$$\forall u,v\in V\qquad|a(u,v)|\le M\|u\|\|v\|,\tag{7.2}$$
$$\forall v\in V\qquad a(v,v)\ge\alpha\|v\|^2.\tag{7.3}$$
The linear form $l:V\to\mathbb R$ is assumed to be continuous. Then the Lax–Milgram theorem asserts the existence and uniqueness of a solution $u\in V$ of problem (7.1). Typically, as in the boundary value problems studied in the previous sections, $V$ is a Sobolev space, like $H^1(\Omega)$ or $H^1_0(\Omega)$. It is an infinite dimensional space; this is a common feature of all methods from functional analysis. The numerical computation of the solution $u$ therefore requires a further step, namely the reduction to a finite dimensional problem. The Galerkin method approximates the infinite dimensional space $V$ by a sequence $(V_n)_{n\in\mathbb N}$ of finite dimensional subspaces. More precisely, for each $n\in\mathbb N$, $V_n$ is a finite dimensional subspace of $V$, and one assumes that the following approximation property holds:
$$\forall v\in V,\ \exists(v_n)_{n\in\mathbb N}\text{ with }v_n\in V_n\ \forall n\in\mathbb N,\text{ and }v_n\to v\text{ in }V.\tag{7.4}$$
For each $n\in\mathbb N$, the approximated variational problem is
$$\text{find }u_n\in V_n\text{ such that }a(u_n,v)=l(v)\quad\forall v\in V_n.\tag{7.5}$$
One should notice that the approximated problem (7.5) is still a variational problem. It has the same structure as the initial variational problem (7.1), except that it is now posed on a finite dimensional space $V_n$. Indeed, the existence and uniqueness of the solution $u_n$ of (7.5) follow, in a similar way, from the Lax–Milgram theorem. When $a$ is symmetric, problem (7.5) reduces to a minimization problem, namely, $u_n$ is the solution of
$$\min\Big\{\frac12a(v,v)-l(v):v\in V_n\Big\}.\tag{7.6}$$
This alternative description of the finite dimensional approximation method is often called the Ritz method. The term variational approximation is justified by the fact that the sequence of problems (7.5) does approximate the initial problem (7.1): the sequence $(u_n)_{n\in\mathbb N}$ converges in norm in $V$ to $u$. More precisely, in Proposition 3.1.2 it was proved that
$$\|u-u_n\|\le\frac M\alpha\operatorname{dist}(u,V_n),\tag{7.7}$$
and the approximation property (7.4) clearly implies that $\operatorname{dist}(u,V_n)\to0$ as $n\to+\infty$ for any $u\in V$.
Let us now make precise the structure of the approximated problem (7.5). Let us introduce a basis $(\varphi_1,\varphi_2,\dots,\varphi_{I(n)})$ of the vector space $V_n$, with $I(n)=\dim V_n$, and write $u_n=\sum_{i=1}^{I(n)}\lambda_i\varphi_i$. Then (7.5) is equivalent to
$$a(u_n,\varphi_j)=l(\varphi_j)\quad\forall j=1,2,\dots,I(n),\qquad u_n=\sum_{i=1}^{I(n)}\lambda_i\varphi_i.$$
This, in turn, is equivalent to finding $\lambda=(\lambda_i)_{i=1,2,\dots,I(n)}$ in $\mathbb R^{I(n)}$ which is a solution of the linear system
$$\sum_{i=1}^{I(n)}\lambda_i\,a(\varphi_i,\varphi_j)=l(\varphi_j)\quad\forall j=1,2,\dots,I(n).\tag{7.8}$$
Let us set $A_n=\big(a(\varphi_j,\varphi_i)\big)_{1\le i,j\le I(n)}$. It is an $I(n)\times I(n)$ square matrix which, by reference to the elasticity problem, is often called the stiffness matrix. Similarly, the vector $b_n=(l(\varphi_j))_j$ in $\mathbb R^{I(n)}$ is often called the load vector. With these notations, (7.8) can be written in the following form:
$$A_n\lambda=b_n.\tag{7.9}$$
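As an illustration of (7.8)–(7.9), here is a minimal Python sketch of the Galerkin method for the one-dimensional model problem $-u''=f$ on $(0,1)$, $u(0)=u(1)=0$. This corresponds to $V=H^1_0(0,1)$, $a(u,v)=\int_0^1u'v'\,dx$ and $l(v)=\int_0^1fv\,dx$. The subspace $V_n$ consists of continuous piecewise affine functions on a uniform mesh, with "hat" basis functions; this choice anticipates Section 7.2. The model problem, the mesh and the choice $f=1$ (for which the load vector is computed exactly) are assumptions of the example. Each hat function overlaps only its two neighbors, so $A_n$ is tridiagonal, which is the kind of sparsity discussed above.

```python
def p1_system(n):
    """Stiffness matrix A_n and load vector b_n for -u'' = 1 on (0,1), u(0) = u(1) = 0,
    using the n interior hat functions of a uniform mesh (mesh size h = 1/(n+1)).

    a(phi_i, phi_j) = int phi_i' phi_j' gives (1/h) * tridiag(-1, 2, -1);
    l(phi_j) = int phi_j = h, which is exact for f = 1.
    """
    h = 1.0 / (n + 1)
    diag = [2.0 / h] * n
    off = [-1.0 / h] * (n - 1)       # A_n is symmetric: same sub- and super-diagonal
    load = [h] * n
    return diag, off, load

def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm; no pivoting is needed since A_n is positive definite, cf. (7.10)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def exact(x):                        # exact solution of -u'' = 1 with u(0) = u(1) = 0
    return 0.5 * x * (1.0 - x)

for n in (1, 4, 9, 99):
    lam = solve_tridiagonal(*p1_system(n))    # lambda_i = u_n(x_i), since phi_i(x_j) = delta_ij
    h = 1.0 / (n + 1)
    print(n, max(abs(lam[i] - exact((i + 1) * h)) for i in range(n)))
```

In this particular one-dimensional setting, the printed nodal errors are at the level of rounding errors. This is because the Galerkin solution happens to coincide with the exact solution at the nodes. In general, between the nodes and in higher dimensions, one only has the estimate (7.7).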
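Let us now examine the properties of the matrix $A_n$. Here $\langle\cdot,\cdot\rangle$ is the Euclidean scalar product in $\mathbb R^{I(n)}$ and $|\cdot|$ the Euclidean norm. For any vector $\lambda\in\mathbb R^{I(n)}$ we have
$$\langle A_n\lambda,\lambda\rangle=\sum_{i=1}^{I(n)}(A_n\lambda)_i\lambda_i=\sum_{i=1}^{I(n)}\sum_{j=1}^{I(n)}a(\varphi_j,\varphi_i)\lambda_j\lambda_i=a\Big(\sum_{j=1}^{I(n)}\lambda_j\varphi_j,\sum_{i=1}^{I(n)}\lambda_i\varphi_i\Big)\ge\alpha\Big\|\sum_{i=1}^{I(n)}\lambda_i\varphi_i\Big\|^2.$$
Since $(\varphi_i)_{i=1}^{I(n)}$ is a basis of $V_n$, one can easily verify that $\lambda\mapsto\big\|\sum_{i=1}^{I(n)}\lambda_i\varphi_i\big\|$ is a norm on $\mathbb R^{I(n)}$. All norms being equivalent on the finite dimensional space $\mathbb R^{I(n)}$, there exists some constant $c>0$ such that
$$\forall\lambda\in\mathbb R^{I(n)}\qquad c|\lambda|\le\Big\|\sum_{i=1}^{I(n)}\lambda_i\varphi_i\Big\|.$$
Hence
$$\forall\lambda\in\mathbb R^{I(n)}\qquad\langle A_n\lambda,\lambda\rangle\ge\alpha c^2|\lambda|^2.\tag{7.10}$$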
From (7.10) it follows that $A_n$ is one-to-one (that is, $\ker(A_n)=\{0\}$). In this finite dimensional setting, this implies that problem (7.9) has a unique solution for any load vector $b_n$. This is another elementary way (without using the Lax–Milgram theorem) to prove the existence and uniqueness of the solution $\lambda$ of the approximated problem (7.9). Let us also notice that when $a$ is symmetric, so is the matrix $A_n$. We now come to the central point of this theory: the effective construction of the finite dimensional approximating subspaces $V_n$ and the resolution of the linear system (7.9). As we have already stressed, it is crucial, from a numerical point of view, that the matrix $A_n$ have as many zero entries as possible. At this point, there are different strategies. In the next section, we shall describe an approach which uses spectral analysis in infinite dimensional spaces and a special basis whose elements are eigenfunctions. The finite element method relies on a different strategy, which we now describe.
7.2 Description of finite element methods
Let us now assume that $V$ is a closed subspace of $H^1(\Omega)$ and that the bilinear form $a:V\times V\to\mathbb R$ is of the type
$$a(u,v)=\int_\Omega(\nabla u\cdot\nabla v+a_0uv)\,dx,\tag{7.11}$$
with $a_0\in L^\infty(\Omega)$, $a_0\ge0$. This allows us to cover various situations, like the Dirichlet, Neumann, and mixed problems studied in the previous sections. A key property of $a$ is that it is a local bilinear form. That is, $a(\varphi,\psi)=0$ as soon as $\varphi$ and $\psi$ are two elements of $V$ whose supports do not intersect or, more generally, whose supports intersect in a set of zero Lebesgue measure. Let us recall that the stiffness matrix $A_n$ is equal to $(a(\varphi_i,\varphi_j))$, where $(\varphi_i)_{i=1,\dots,I(n)}$ is a basis of $V_n$. The strategy is now clear: we have to choose $V_n$ such that one can find a canonical basis of $V_n$ whose functions have supports which are as small as possible. This is made possible thanks to a triangulation of the set $\Omega$. For simplicity of exposition, we restrict ourselves to problems posed over sets $\Omega\subset\mathbb R^2$ which are polyhedra; we also say that such a set is polygonal.
Definition 7.2.1. A triangulation $\mathcal T$ of a polygonal set $\Omega$ of $\mathbb R^2$ is a finite decomposition of the set $\bar\Omega$ of the form
$$\bar\Omega=\bigcup_{K\in\mathcal T}K$$
such that (i) each set $K \in \mathcal{T}$ is a triangle; (ii) whenever $K_1$ and $K_2$ are distinct elements of $\mathcal{T}$, $K_1 \cap K_2$ is either empty, or reduced to a common vertex, or a common face (edge). In particular, for distinct $K_1, K_2 \in \mathcal{T}$, one has $\operatorname{int}(K_1) \cap \operatorname{int}(K_2) = \emptyset$. Two triangles $K_1$ and $K_2$ which have a common face are said to be adjacent. The triangles $K \in \mathcal{T}$ are called finite elements. We set
$$h(\mathcal{T}) = \max_{K \in \mathcal{T}} \operatorname{diam} K, \tag{7.12}$$
where $\operatorname{diam} K = \sup\{|x - y| : x, y \in K\}$ is the diameter of $K$. By convention, we denote by $\mathcal{T}_h$ a triangulation $\mathcal{T}$ such that $h(\mathcal{T}) = h$. As we shall see, to each triangulation $\mathcal{T}_h$ is associated a finite dimensional approximating subspace $V_h$. It is convenient to consider the family of approximating subspaces indexed by the positive parameter $h$, say, $V_h$ with $h \to 0$. Of course, to reduce to an abstract Galerkin scheme as described in Section 7.1, one may take $V_n = V_{h_n}$ for some $h_n \to 0$. An example of triangulation is given in Figure 7.1.
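As an illustration of (7.12), the following Python sketch (using NumPy; the mesh generator and all function names are hypothetical, chosen for this example) computes $h(\mathcal{T})$ for a uniform triangulation of the unit square in which each small square is cut along one diagonal, similar to the example of Section 7.3 (the direction of the diagonal is an assumption). Since the diameter of a triangle is its longest edge, $h(\mathcal{T})$ is here $\sqrt{2}$ times the grid spacing.

```python
import numpy as np

def unit_square_mesh(N):
    """Uniform triangulation of [0, 1]^2 with grid spacing h = 1/(N+1).

    Each small square is split along its diagonal from (l, m) to (l+1, m+1).
    Returns vertex coordinates, shape (n_v, 2), and triangles as index triples.
    """
    n = N + 2                                   # grid points per direction (boundary included)
    h = 1.0 / (N + 1)
    idx = lambda l, m: l * n + m                # numbering of a_{l,m} (illustrative choice)
    pts = np.array([(l * h, m * h) for l in range(n) for m in range(n)])
    tris = []
    for l in range(n - 1):
        for m in range(n - 1):
            tris.append((idx(l, m), idx(l + 1, m), idx(l + 1, m + 1)))
            tris.append((idx(l, m), idx(l + 1, m + 1), idx(l, m + 1)))
    return pts, np.array(tris)

def mesh_size(pts, tris):
    """h(T) = max over K of diam K; the diameter of a triangle is its longest edge."""
    P = pts[tris]                               # shape (n_t, 3, 2)
    edges = np.stack([P[:, 1] - P[:, 0], P[:, 2] - P[:, 1], P[:, 0] - P[:, 2]], axis=1)
    return np.linalg.norm(edges, axis=2).max()

pts, tris = unit_square_mesh(4)
print(len(tris), mesh_size(pts, tris))          # 50 triangles, h(T) = sqrt(2)/5
```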
Figure 7.1. Example of triangulation.
One may already observe that one can use a finer triangulation in a subregion where a particular behavior of the solution is expected (for example, near special parts of an airplane). Figure 7.2 shows a forbidden situation, where the intersection of two triangles $K_1$ and $K_2$ is not an edge of $K_2$:
Figure 7.2. A forbidden situation.
Let us now describe the finite dimensional space $V_h$ which is associated to a triangulation $\mathcal{T}_h$. At this point, we need to specify the boundary condition; take, for example, the Dirichlet boundary condition $u = 0$ on $\partial\Omega$ and $V = H^1_0(\Omega)$. Then
$$V_h = \{v \in C(\overline{\Omega}) : v \text{ is affine on each } K \in \mathcal{T}_h,\ v = 0 \text{ on } \partial\Omega\}.$$
In other words, $V_h$ is the linear space of continuous functions on $\overline{\Omega}$ which are piecewise affine with respect to the triangulation $\mathcal{T}_h$ and which vanish on the boundary. One can easily verify that an affine function on a triangle $K$ is uniquely determined by its values at the vertices of $K$. Hence, any function $v \in V_h$ is uniquely determined by its values at the vertices (also called nodes) of the triangulation which lie in the interior of $\overline{\Omega}$, i.e., in $\Omega$ (at the vertices which lie on the boundary $\partial\Omega$, $v$ is prescribed to be equal to zero). For any vertex $a_i$ of the triangulation, $i = 1, 2, \dots, I(h)$, which lies in $\Omega$, let us denote by $\varphi_i$ the element of $V_h$ which satisfies
$$\varphi_i(a_j) = \delta_{ij}, \quad 1 \le i, j \le I(h).$$
Equivalently, $\varphi_i$ is the function of $V_h$ which is equal to one at the vertex $a_i$ and equal to zero at all other vertices $a_j$ with $j \ne i$. It is usually called a hat function. Clearly, $(\varphi_1, \varphi_2, \dots, \varphi_{I(h)})$ is a basis of $V_h$, and each element $v$ of $V_h$ can be uniquely written in the form
$$v = \sum_{i=1}^{I(h)} v(a_i)\, \varphi_i.$$
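On each triangle $K$, the restriction of the hat function $\varphi_i$ attached to a vertex $a_i$ of $K$ is the barycentric coordinate of $a_i$, that is, the unique affine function equal to 1 at $a_i$ and to 0 at the two other vertices. A minimal NumPy sketch (function names are hypothetical) evaluating $\sum_i v(a_i)\varphi_i$ on a single triangle in this way:

```python
import numpy as np

def barycentric(x, K):
    """Barycentric coordinates of the point x with respect to the triangle K (3x2 array).

    On K, the hat function attached to the vertex K[i] coincides with coordinate i.
    """
    a1, a2, a3 = K
    T = np.column_stack([a2 - a1, a3 - a1])     # invertible for a non-degenerate triangle
    l2, l3 = np.linalg.solve(T, np.asarray(x, dtype=float) - a1)
    return np.array([1.0 - l2 - l3, l2, l3])

def p1_interpolant(v, K, x):
    """Evaluate sum_i v(a_i) phi_i(x) at a point x of K."""
    return sum(v(a) * lam for a, lam in zip(K, barycentric(x, K)))

K = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
affine = lambda p: 2.0 + 3.0 * p[0] - p[1]
x = np.array([0.2, 0.3])
print([barycentric(a, K) for a in K])            # rows of the identity: phi_i(a_j) = delta_ij
print(p1_interpolant(affine, K, x), affine(x))   # affine functions are reproduced exactly
```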
One should notice that each element $\varphi_i$ of this basis has a small support: more precisely, the support of $\varphi_i$ is the union of all triangles $K$ of $\mathcal{T}_h$ such that $a_i$ is a vertex of $K$. The stiffness matrix $A_h = (a(\varphi_i, \varphi_j))_{1 \le i, j \le I(h)}$ is a sparse matrix, since $a(\varphi_i, \varphi_j) = 0$ except when $a_i$ and $a_j$ are two vertices of the same triangle $K$ of the triangulation $\mathcal{T}_h$. Let us now stress a technical but important point: for a given triangulation $\mathcal{T}_h$, the structure of $A_h$, i.e., the distribution of its zeros, depends heavily on the enumeration of the vertices. One therefore has to choose an enumeration which gives $A_h$ a simple structure, for example, a band or block tridiagonal matrix. Let us illustrate this in a concrete situation.
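The dependence of the sparsity pattern on the enumeration can be checked on the unit-square mesh of Section 7.3. The sketch below (NumPy; function names hypothetical; squares assumed to be cut along the diagonal from $a_{l,m}$ to $a_{l+1,m+1}$) builds the a priori pattern of $A_h$, marking entry $(i, j)$ whenever $a_i$ and $a_j$ are vertices of a common triangle, and measures the bandwidth $\max |i - j|$ for a row-by-row numbering and for a random one.

```python
import numpy as np

def interior_pattern(N, order):
    """A priori nonzero pattern of A_h for the P1 mesh of the unit square.

    Interior node a_{l,m} (1 <= l, m <= N) receives the number order[(l, m)].  With squares
    cut along (l, m)-(l+1, m+1), node a_{l,m} shares a triangle with the offsets in `nbrs`.
    """
    nbrs = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]
    S = np.zeros((N * N, N * N), dtype=bool)
    for (l, m), i in order.items():
        for dl, dm in nbrs:
            q = (l + dl, m + dm)
            if q in order:                      # boundary nodes are not unknowns
                S[i, order[q]] = True
    return S

def bandwidth(S):
    i, j = np.nonzero(S)
    return int(np.max(np.abs(i - j)))

N = 4
nodes = [(l, m) for l in range(1, N + 1) for m in range(1, N + 1)]
lexico = {p: k for k, p in enumerate(nodes)}                       # row-by-row numbering
rng = np.random.default_rng(0)
shuffled = {p: int(k) for p, k in zip(nodes, rng.permutation(N * N))}
print(bandwidth(interior_pattern(N, lexico)))     # N + 1
print(bandwidth(interior_pattern(N, shuffled)))   # typically much larger
```

With the row-by-row numbering the bandwidth is $N + 1$; a random numbering keeps the same number of entries but typically scatters them far from the diagonal.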
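7.3 An example
Take $\Omega = (0, 1) \times (0, 1)$, the unit square in $\mathbb{R}^2$, and, given $f \in L^2(\Omega)$, let us consider the Dirichlet boundary value problem
$$-\Delta u = f \text{ on } \Omega, \qquad u = 0 \text{ on } \partial\Omega.$$
Its variational formulation is as follows: find $u \in H^1_0(\Omega)$ such that
$$\int_\Omega \nabla u \cdot \nabla v\, dx = \int_\Omega f v\, dx \quad \forall\, v \in H^1_0(\Omega).$$
Let us consider the triangulation of $\Omega$ in Figure 7.3.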
Figure 7.3. Triangulation of $\Omega$ with indexed nodes.
We denote by $a_{l,m} = (lh, mh)$, $0 \le l, m \le N + 1$, the nodes of the triangulation, where $h = 1/(N + 1)$ is the grid spacing. There are $N^2$ nodes in $\Omega$, so the dimension of $V_h$ is equal to $N^2$. Let us index the nodes of the triangulation as indicated in Figure 7.3, where $N$ has been taken equal to 4. Figure 7.4 shows, for example, a perspective view of the hat function $\varphi_6$, which is an element of the finite element basis.
Figure 7.4. The hat function $\varphi_6$.
Recall that $a(\varphi_i, \varphi_j)$ can be nonzero only if $a_i$ and $a_j$ are two vertices of the same triangle. Notice then that a vertex $a_i$ is connected to at most six other vertices $a_j$ with $j \ne i$, which, a priori, yields a seven-point numerical scheme. The matrix $A_h$ then has the following structure, where each cross represents an element $a(\varphi_i, \varphi_j)$ which is, a priori, not equal to zero. It is a matrix with a band structure, that is, $a(\varphi_i, \varphi_j) = 0$ for $|i - j| > d_{\max}$, where $d_{\max}$, the width of the band, is small with respect to the size of the matrix. In fact, an elementary computation shows that $a(\varphi_{l,m}, \varphi_{l+1,m+1}) = 0$ and $a(\varphi_{l,m}, \varphi_{l-1,m-1}) = 0$, so that we obtain a five-point scheme! Let $\lambda_{l,m} = u_h(a_{l,m})$ be the component of $u_h$ with respect to $\varphi_{l,m}$, so that $u_h = \sum_{l,m} \lambda_{l,m} \varphi_{l,m}$. An elementary computation yields
$$\begin{cases} -\lambda_{l-1,m} - \lambda_{l,m-1} + 4\lambda_{l,m} - \lambda_{l,m+1} - \lambda_{l+1,m} = h^2 f_{l,m}, & 1 \le l, m \le N,\\ \lambda_{l,0} = \lambda_{l,N+1} = 0, & 0 \le l \le N + 1,\\ \lambda_{0,m} = \lambda_{N+1,m} = 0, & 0 \le m \le N + 1. \end{cases}$$
This is the classical five-point scheme for the Laplacian. (Here $h^2 f_{l,m} = \int_\Omega f \varphi_{l,m}\, dx$, or an approximation of this integral; since $\int_\Omega \varphi_{l,m}\, dx = h^2$, $f_{l,m}$ is close to $f(a_{l,m})$ when $f$ is continuous.)
Remark 7.3.1. It is worth noticing that the block tridiagonal structure of $A_h$ is intimately related to the enumeration of the elements of the basis. This structure may be lost by choosing a different enumeration!
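The five-point scheme can be recovered by assembling $A_h$ triangle by triangle. This sketch (NumPy, dense storage, hypothetical names; mesh diagonals assumed along $a_{l,m}a_{l+1,m+1}$) computes on each triangle the constant gradients of the three hat functions and adds the local contributions $\int_K \nabla\varphi_p \cdot \nabla\varphi_q\, dx$. The result coincides with the matrix of the five-point stencil, which confirms that the couplings $a(\varphi_{l,m}, \varphi_{l+1,m+1})$ vanish.

```python
import numpy as np

def assemble_p1_laplacian(N):
    """Stiffness matrix A_h = (a(phi_i, phi_j)) for a(u, v) = int grad u . grad v on (0,1)^2.

    Mesh: grid spacing h = 1/(N+1), each square split along (l, m)-(l+1, m+1).
    Unknowns are the interior nodes a_{l,m}, numbered (l-1)*N + (m-1).
    """
    h = 1.0 / (N + 1)

    def unknown(l, m):                          # None for boundary nodes (v = 0 there)
        return (l - 1) * N + (m - 1) if 1 <= l <= N and 1 <= m <= N else None

    ref = np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])  # reference gradients of barycentric coords
    A = np.zeros((N * N, N * N))
    for l in range(N + 1):
        for m in range(N + 1):
            for tri in (((l, m), (l + 1, m), (l + 1, m + 1)),
                        ((l, m), (l + 1, m + 1), (l, m + 1))):
                P = h * np.array(tri, dtype=float)
                T = np.column_stack([P[1] - P[0], P[2] - P[0]])
                G = np.linalg.inv(T).T @ ref    # columns: gradients of the 3 hat functions on K
                area = 0.5 * abs(np.linalg.det(T))
                local = area * G.T @ G          # local[p, q] = int_K grad phi_p . grad phi_q
                ids = [unknown(*v) for v in tri]
                for p, i in enumerate(ids):
                    for q, j in enumerate(ids):
                        if i is not None and j is not None:
                            A[i, j] += local[p, q]
    return A

N = 4
A = assemble_p1_laplacian(N)
T1 = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)     # 1D second-difference matrix
five_point = np.kron(T1, np.eye(N)) + np.kron(np.eye(N), T1)
print(np.allclose(A, five_point))   # True: the diagonal couplings vanish on this mesh
```

Dense storage only keeps the sketch short; a practical code would store $A_h$ in a sparse format.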
7.4 Convergence of the finite element method
The convergence of the finite element method, which is a Galerkin approximation method, relies on Proposition 3.1.2 (Céa's lemma):
$$\|u - u_h\|_{H^1_0(\Omega)} \le \frac{M}{\alpha}\, \operatorname{dist}(u, V_h).$$
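For the reader's convenience, here is a short derivation of this bound. It assumes, as the constants suggest, that $a$ is continuous on $V$ with constant $M$ and coercive with constant $\alpha$, and that $u$ and $u_h$ solve $a(u, v) = L(v)$ for all $v \in V$ and $a(u_h, v_h) = L(v_h)$ for all $v_h \in V_h \subset V$ (the linear form $L$ is notation introduced only for this sketch):

```latex
\begin{aligned}
&a(u - u_h, w_h) = 0 \quad \forall\, w_h \in V_h \qquad \text{(Galerkin orthogonality)},\\
&\alpha \|u - u_h\|^2 \le a(u - u_h, u - u_h) = a(u - u_h, u - v_h) \le M \|u - u_h\|\, \|u - v_h\| \quad \forall\, v_h \in V_h,\\
&\text{hence } \|u - u_h\| \le \frac{M}{\alpha} \|u - v_h\| \ \text{ for every } v_h \in V_h,
 \ \text{ i.e., } \ \|u - u_h\| \le \frac{M}{\alpha} \operatorname{dist}(u, V_h).
\end{aligned}
```

The equality in the second line uses Galerkin orthogonality with $w_h = v_h - u_h \in V_h$.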
Therefore, estimating the error $u - u_h$ (and showing that it goes to zero as $h \to 0$) reduces to a problem in approximation theory: one has to bound from above the distance, for the $H^1_0(\Omega)$ norm, between a function $u \in H^1_0(\Omega)$ and the subspace $V_h$ of continuous functions which are piecewise affine relative to a given triangulation $\mathcal{T}_h$. To that end, we need to make a geometrical assumption on the family of triangulations $(\mathcal{T}_h)_{h \to 0}$.
Definition 7.4.1. A family of triangulations $(\mathcal{T}_h)_{h>0}$ is said to be regular if there exists a constant $\sigma \ge 0$ such that for any $h > 0$ and any $K \in \mathcal{T}_h$
$$\frac{h_K}{\rho_K} \le \sigma, \tag{7.13}$$
where $h_K$ is the diameter of $K$ and $\rho_K$ is the supremum of the diameters of the balls contained in $K$. It can easily be shown that this condition is equivalent to the following: there exists a constant $\theta_0 > 0$ such that for any $h > 0$ and any $K \in \mathcal{T}_h$, $\theta_K \ge \theta_0$, where $\theta_K$ denotes the smallest angle of the triangle $K$. Thus the regularity of a family of triangulations $(\mathcal{T}_h)_{h>0}$ in the sense of Definition 7.4.1 prevents the triangles from becoming "flat" in the limit as $h \to 0$. As we shall see, this is a key assumption for obtaining the convergence of the method. For example, the situation of Figure 7.5 with $\varepsilon_h \to 0$ is not allowed for a regular family of triangulations (as $\varepsilon_h \to 0$, $K_h$ becomes flat).
Figure 7.5. A triangle $K_h$ becoming flat.
We shall return to this example later and show that, in such a situation, some of the estimates below fail.
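The equivalence between (7.13) and a lower bound on the smallest angle can be explored numerically. The sketch below (NumPy; names hypothetical) computes $h_K/\rho_K$, using the fact that the inscribed disc of a triangle has diameter $4\,\mathrm{area}/\mathrm{perimeter}$, together with the smallest angle $\theta_K$, for the flattening triangles of Figure 7.5 with vertices $(0, \frac{h}{2})$, $(0, -\frac{h}{2})$, $(h\varepsilon, 0)$ as in Section 7.5.1, taking $h = 1$.

```python
import numpy as np

def shape_ratio(K):
    """h_K / rho_K for a triangle K (3x2 array of vertices).

    h_K is the longest edge; rho_K is the diameter of the inscribed disc, 4 * area / perimeter.
    """
    e = np.linalg.norm(K - np.roll(K, -1, axis=0), axis=1)   # the three edge lengths
    (x1, y1), (x2, y2), (x3, y3) = K
    area = 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    return e.max() / (4.0 * area / e.sum())

def min_angle(K):
    """Smallest interior angle theta_K of the triangle, in radians."""
    angles = []
    for i in range(3):
        u, w = K[(i + 1) % 3] - K[i], K[(i + 2) % 3] - K[i]
        c = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
        angles.append(np.arccos(np.clip(c, -1.0, 1.0)))
    return min(angles)

# Triangles of Figure 7.5 with h = 1: vertices (0, 1/2), (0, -1/2), (eps, 0).
for eps in [0.5, 0.1, 0.01, 0.001]:
    K = np.array([[0.0, 0.5], [0.0, -0.5], [eps, 0.0]])
    print(f"eps={eps:<6} h_K/rho_K={shape_ratio(K):10.2f} theta_K={np.degrees(min_angle(K)):7.3f} deg")
```

As $\varepsilon$ decreases, $h_K/\rho_K$ grows roughly like $1/\varepsilon$ while $\theta_K$ tends to zero, so no uniform $\sigma$ in (7.13) exists for this family.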
The main result of this section, which is the convergence of the finite element method under the regularity assumption (7.13), is given below.
Theorem 7.4.1. Let $\Omega$ be a polygon and let $(\mathcal{T}_h)_{h \to 0}$ be a regular family of triangulations of $\Omega$. Then the finite element method converges, i.e.,
$$\lim_{h \to 0} \|u - u_h\|_{H^1_0(\Omega)} = 0.$$
Moreover, if $u$ belongs to $H^2(\Omega)$, the following estimate holds: there exists some constant $C > 0$ such that for all $h > 0$
$$\|u - u_h\|_{H^1_0(\Omega)} \le C h \|u\|_{H^2(\Omega)}.$$
Proof. (a) Let us first assume that $u \in H^2(\Omega)$. Recalling that $N = 2$, by the Sobolev embedding theorems (see Section 5.7) we have $u \in C(\overline{\Omega})$ (indeed, this is true under the assumption $N \le 3$). This allows us to talk about the value of $u$ at any point of $\overline{\Omega}$, and especially at the nodes $(a_i)_{i=1,\dots,I(h)}$ of the triangulation $\mathcal{T}_h$. Let us introduce the function $\Pi_h(u)$, which is the continuous piecewise affine interpolant of $u$ at the nodes of $\mathcal{T}_h$:
$$\Pi_h(u) = \sum_{i=1}^{I(h)} u(a_i)\, \varphi_i.$$
Recall that $\varphi_i$ is the hat function related to the node $a_i$ and that the $(\varphi_i)_{i=1,\dots,I(h)}$ form a basis of $V_h$. The above formula is just the decomposition of $\Pi_h(u)$ in the basis $(\varphi_i)_{i=1,\dots,I(h)}$. Since $\Pi_h(u) \in V_h$, by definition of $\operatorname{dist}(u, V_h)$ we have $\operatorname{dist}(u, V_h) \le \|u - \Pi_h(u)\|_{H^1_0(\Omega)}$. This inequality, combined with Proposition 3.1.2 (Céa's lemma), yields
$$\|u - u_h\|_{H^1_0(\Omega)} \le \frac{M}{\alpha} \|u - \Pi_h(u)\|_{H^1_0(\Omega)}. \tag{7.14}$$
Let us now use the following approximation result, which we take for granted for the moment (we shall return to this crucial result below): there exists a constant $C$ independent of $h$ such that for all $u \in H^2(\Omega)$
$$\|u - \Pi_h(u)\|_{H^1(\Omega)} \le C h \|u\|_{H^2(\Omega)}. \tag{7.15}$$
Let us notice that this estimate makes essential use of the regularity assumption (7.13) on the family of triangulations $(\mathcal{T}_h)_{h \to 0}$ and of the fact that $u \in H^2(\Omega)$. Combining (7.14) and (7.15), we obtain
$$\|u - u_h\|_{H^1_0(\Omega)} \le \frac{M}{\alpha} C h \|u\|_{H^2(\Omega)}. \tag{7.16}$$
Thus, in the case $u \in H^2(\Omega)$, the finite element method converges, that is, $(u_h)_{h \to 0}$ converges to $u$ in the norm of $H^1_0(\Omega)$ as $h \to 0$. More precisely, the estimate (7.16) gives the rate of convergence of the method: the error is $O(h)$.
(b) In the general case, that is, $u \in H^1_0(\Omega)$, one completes the proof by a density argument: for any $\varepsilon > 0$ let us introduce some $v_\varepsilon \in \mathcal{D}(\Omega) = C^\infty_c(\Omega)$ such that
$$\|u - v_\varepsilon\|_{H^1_0(\Omega)} < \varepsilon. \tag{7.17}$$
Since $v_\varepsilon \in \mathcal{D}(\Omega) \subset H^2(\Omega)$, for each $\varepsilon > 0$ we can use the previous argument and, by (7.15), we have
$$\|v_\varepsilon - \Pi_h(v_\varepsilon)\|_{H^1_0(\Omega)} \le C h \|v_\varepsilon\|_{H^2(\Omega)}. \tag{7.18}$$
Let us now write the triangle inequality
$$\|u - \Pi_h(v_\varepsilon)\|_{H^1_0(\Omega)} \le \|u - v_\varepsilon\|_{H^1_0(\Omega)} + \|v_\varepsilon - \Pi_h(v_\varepsilon)\|_{H^1_0(\Omega)}$$
and use inequalities (7.17) and (7.18) to obtain
$$\|u - \Pi_h(v_\varepsilon)\|_{H^1_0(\Omega)} \le \varepsilon + C h \|v_\varepsilon\|_{H^2(\Omega)}.$$
Since $\Pi_h(v_\varepsilon) \in V_h$, this implies $\operatorname{dist}(u, V_h) \le \varepsilon + C h \|v_\varepsilon\|_{H^2(\Omega)}$. Hence
$$\limsup_{h \to 0} \operatorname{dist}(u, V_h) \le \varepsilon.$$
This being true for any $\varepsilon > 0$, we finally obtain
$$\lim_{h \to 0} \operatorname{dist}(u, V_h) = 0,$$
which, by Proposition 3.1.2 (Céa's lemma), implies the norm convergence in $H^1_0(\Omega)$ of $u_h$ to $u$ as $h \to 0$.
Let us now give the proof of the piecewise affine interpolation inequality (7.15), which, together with the abstract Céa's lemma, is the key ingredient of the proof of Theorem 7.4.1. Because of its importance and its own interest, let us state it independently.
Theorem 7.4.2. Let $\Omega$ be a polygon and $(\mathcal{T}_h)_{h \to 0}$ a regular family of triangulations of $\Omega$ (i.e., (7.13) is supposed to be satisfied). Then there exists a constant $C$, independent of $h$, such that for any $u \in H^2(\Omega)$
$$\|u - \Pi_h(u)\|_{H^1(\Omega)} \le C h \|u\|_{H^2(\Omega)}.$$
We recall that $\Pi_h(u)$ is the piecewise affine interpolant of $u$ relative to $\mathcal{T}_h$. For pedagogical reasons, it is worthwhile to first prove Theorem 7.4.2 in one dimension, i.e., when $\Omega = (a, b)$ is an interval of $\mathbb{R}$. Indeed, the role of the norms of $H^1(\Omega)$ and $H^2(\Omega)$ and of the assumption $u \in H^2(\Omega)$ already appears quite naturally in this situation, and the proof requires only elementary tools. We shall then consider the two-dimensional case and show how, in that case, one has to make some geometrical assumptions on $\mathcal{T}_h$. (This is where the regularity assumption (7.13) on $\mathcal{T}_h$ plays a central role.)
Proof of Theorem 7.4.2 in the one-dimensional case. Let $\Omega = (a, b)$ be an interval of $\mathbb{R}$ with $-\infty < a < b < +\infty$. Let $a = a_0 < a_1 < a_2 < \dots < a_n = b$ be a discretization of $\Omega$, and set $h = \max_i |a_{i+1} - a_i|$.
(a) Let us first assume that $u$ is smooth, say, $u \in C^\infty([a, b])$, and let $x \in (a_j, a_{j+1})$. The Taylor–Lagrange formula at order one yields
$$\Pi_h(u)'(x) = \frac{u(a_{j+1}) - u(a_j)}{a_{j+1} - a_j} = u'(a_j + \theta_j)$$
for some $0 < \theta_j < a_{j+1} - a_j \le h$. Hence, for any $x \in (a_j, a_{j+1})$,
$$|u'(x) - \Pi_h(u)'(x)| = |u'(x) - u'(a_j + \theta_j)| \le \int_{a_j}^{a_{j+1}} |u''(s)|\, ds.$$
Applying the Cauchy–Schwarz inequality, we obtain
$$|u'(x) - \Pi_h(u)'(x)|^2 \le h \int_{a_j}^{a_{j+1}} |u''(s)|^2\, ds.$$
The above inequality holds for any $x \in (a_j, a_{j+1})$. After integration on $(a_j, a_{j+1})$ one obtains
$$\int_{a_j}^{a_{j+1}} |u'(x) - \Pi_h(u)'(x)|^2\, dx \le h^2 \int_{a_j}^{a_{j+1}} |u''(s)|^2\, ds.$$
Summing the above inequality with respect to $j = 0, 1, \dots, n - 1$ finally yields
$$\int_a^b |u'(x) - \Pi_h(u)'(x)|^2\, dx \le h^2 \int_a^b |u''(s)|^2\, ds,$$
that is,
$$\|u' - \Pi_h(u)'\|_{L^2(\Omega)} \le h \|u''\|_{L^2(\Omega)}. \tag{7.19}$$
Let us prove that there exists some constant $C > 0$ such that
$$\|u - \Pi_h(u)\|_{L^2(\Omega)} \le C h \|u''\|_{L^2(\Omega)}. \tag{7.20}$$
Using the same argument and notation as above, we can write, for $x \in (a_j, a_{j+1})$,
$$\begin{aligned} u(x) - \Pi_h(u)(x) &= u(x) - u(a_j) - (x - a_j)\, \frac{u(a_{j+1}) - u(a_j)}{a_{j+1} - a_j}\\ &= u(x) - u(a_j) - (x - a_j)\, u'(a_j + \theta_j)\\ &= (x - a_j)\, u'(a_j + \theta_{x,j}) - (x - a_j)\, u'(a_j + \theta_j), \end{aligned}$$
where $0 < \theta_{x,j} < x - a_j$. It follows that
$$|u(x) - \Pi_h(u)(x)| \le (x - a_j) \int_{a_j}^{a_{j+1}} |u''(t)|\, dt,$$
which, by arguments similar to the ones above, yields
$$\|u - \Pi_h(u)\|_{L^2(\Omega)} \le \frac{h^2}{\sqrt{3}} \|u''\|_{L^2(\Omega)}.$$
Since $h \le b - a$, this gives (7.20) with $C = (b - a)/\sqrt{3}$. Finally, by combining (7.19) and (7.20), one obtains
$$\|u - \Pi_h(u)\|_{H^1(\Omega)} \le C h \|u\|_{H^2(\Omega)}. \tag{7.21}$$
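The bounds of part (a) can be observed numerically. The following sketch (NumPy; names hypothetical; the integrals are approximated by Gauss–Legendre quadrature, so the reported errors are approximations) interpolates $u = \sin$ on $(0, \pi)$ on uniform meshes and reports $e_0 = \|u - \Pi_h(u)\|_{L^2}$ and $e_1 = \|u' - \Pi_h(u)'\|_{L^2}$.

```python
import numpy as np

def interp_errors(u, du, a, b, n, q=10):
    """L2 errors of the P1 interpolant Pi_h(u) on a uniform mesh of (a, b) with n cells.

    Returns (||u - Pi_h(u)||_L2, ||u' - Pi_h(u)'||_L2).  The integrals are approximated
    with a q-point Gauss-Legendre rule on each cell.
    """
    nodes = np.linspace(a, b, n + 1)
    gx, gw = np.polynomial.legendre.leggauss(q)
    e0 = e1 = 0.0
    for x0, x1 in zip(nodes[:-1], nodes[1:]):
        x = 0.5 * (x1 - x0) * gx + 0.5 * (x0 + x1)     # quadrature points in (x0, x1)
        w = 0.5 * (x1 - x0) * gw
        slope = (u(x1) - u(x0)) / (x1 - x0)            # Pi_h(u)' on this cell
        e0 += np.sum(w * (u(x) - (u(x0) + slope * (x - x0))) ** 2)
        e1 += np.sum(w * (du(x) - slope) ** 2)
    return np.sqrt(e0), np.sqrt(e1)

# u = sin on (0, pi), where ||u''||_L2 = sqrt(pi/2).
for n in [10, 20, 40, 80]:
    e0, e1 = interp_errors(np.sin, np.cos, 0.0, np.pi, n)
    h = np.pi / n
    print(n, e0, e1, e1 / h, e0 / h**2)
```

The ratios $e_1/h$ and $e_0/h^2$ stay essentially constant as $n$ grows, reflecting first-order convergence for the derivative and second-order convergence in $L^2$, in agreement with (7.19) and the $h^2/\sqrt{3}$ bound.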
(b) Let us now extend the above inequality to an arbitrary $u \in H^2(\Omega)$. To that end, one uses a density argument: noticing that $C^\infty([a, b])$ is dense in $H^1(a, b)$ and in $H^2(a, b)$, we just need to prove that the operator $\Pi_h : H^1(\Omega) \to H^1(\Omega)$ is continuous, with a bound independent of $h$. More precisely, one can state the following result of independent interest, which concludes the proof in the one-dimensional case.
Lemma 7.4.1. Suppose $\Omega = (a, b)$ and $(\mathcal{T}_h)_{h \to 0}$ is a discretization of $\Omega$. Then there exists a constant $C > 0$ such that for any $h > 0$ and any $v \in H^1(\Omega)$,
$$\|\Pi_h(v)\|_{H^1(\Omega)} \le C \|v\|_{H^1(\Omega)}.$$
Proof. For any $x \in (a_j, a_{j+1})$
$$\Pi_h(v)'(x) = \frac{v(a_{j+1}) - v(a_j)}{a_{j+1} - a_j} = \frac{1}{a_{j+1} - a_j} \int_{a_j}^{a_{j+1}} v'(t)\, dt$$
(see Theorem 5.1.1). Hence
$$|\Pi_h(v)'(x)|^2 \le \frac{1}{(a_{j+1} - a_j)^2} \left( \int_{a_j}^{a_{j+1}} |v'(t)|\, dt \right)^2,$$
which, by the Cauchy–Schwarz inequality, yields
$$|\Pi_h(v)'(x)|^2 \le \frac{1}{a_{j+1} - a_j} \int_{a_j}^{a_{j+1}} |v'(t)|^2\, dt.$$
After integration on $(a_j, a_{j+1})$ and summation with respect to $j$, one obtains
$$\|\Pi_h(v)'\|_{L^2(\Omega)} \le \|v'\|_{L^2(\Omega)}. \tag{7.22}$$
On the other hand, since $\Pi_h(v)$ is affine between consecutive nodes, where it coincides with $v$,
$$\|\Pi_h(v)\|_{L^2(\Omega)} \le \sqrt{b - a}\, \|\Pi_h(v)\|_{L^\infty(\Omega)} \le \sqrt{b - a}\, \|v\|_{L^\infty(\Omega)}. \tag{7.23}$$
In the one-dimensional case ($N = 1$), it was proved in Theorem 5.1.1 that each element of $H^1(a, b)$ has a unique continuous representative. Let us verify by an elementary computation that this canonical embedding $H^1(a, b) \subset C([a, b])$ is continuous, i.e., that there exists some constant $C > 0$ such that
$$\forall\, v \in H^1(a, b) \quad \|v\|_{L^\infty(a,b)} \le C \|v\|_{H^1(a,b)}. \tag{7.24}$$
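Recall that we still denote by $v$ the continuous representative $\tilde v$ of $v$ and that, for any $x_0, x \in [a, b]$,
$$v(x_0) = v(x) + \int_x^{x_0} v'(t)\, dt.$$
Let us apply the Cauchy–Schwarz inequality to the above formula:
$$|v(x_0)| \le |v(x)| + \int_a^b |v'(t)|\, dt \le |v(x)| + \sqrt{b - a} \left( \int_a^b |v'(t)|^2\, dt \right)^{1/2}.$$
Let us now integrate this inequality with respect to $x \in [a, b]$:
$$\begin{aligned} (b - a)|v(x_0)| &\le \int_a^b |v(x)|\, dx + (b - a)^{3/2} \left( \int_a^b |v'(t)|^2\, dt \right)^{1/2}\\ &\le (b - a)^{1/2} \left( \int_a^b |v(x)|^2\, dx \right)^{1/2} + (b - a)^{3/2} \left( \int_a^b |v'(t)|^2\, dt \right)^{1/2}. \end{aligned}$$
Dividing by $b - a$ and using the elementary inequality $(\alpha + \beta)^2 \le 2(\alpha^2 + \beta^2)$, we obtain
$$\|v\|_{L^\infty(a,b)} \le \sqrt{2} \left( \frac{1}{b - a} \int_a^b v(x)^2\, dx + (b - a) \int_a^b v'(x)^2\, dx \right)^{1/2} \le \sqrt{2}\, \max\left\{ \frac{1}{b - a},\ b - a \right\}^{1/2} \|v\|_{H^1(a,b)}.$$
Combining (7.23) and (7.24) finally yields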
$$\|\Pi_h(v)\|_{L^2(\Omega)} \le C \sqrt{b - a}\, \|v\|_{H^1(\Omega)},$$
which, together with (7.22), gives $\|\Pi_h(v)\|_{H^1(\Omega)} \le C \|v\|_{H^1(\Omega)}$, and the proof of Lemma 7.4.1 is complete.
Remark 7.4.1. In fact, one can prove the following result: for all $v \in H^1(a, b)$
$$\lim_{h \to 0} \|v - \Pi_h(v)\|_{H^1(a,b)} = 0. \tag{7.25}$$
This is slightly more precise than proving that for every $u \in H^1(a, b)$
$$\lim_{h \to 0} \operatorname{dist}(u, V_h) = 0,$$
which we have used in the proof of the convergence of the finite element method. The proof follows the lines of the previous arguments: given $v \in H^1(a, b)$ and $\varepsilon > 0$, let $v_\varepsilon \in H^2(a, b)$ with $\|v - v_\varepsilon\|_{H^1(a,b)} < \varepsilon$. By (7.21) and Lemma 7.4.1, we have
$$\begin{aligned} \|v - \Pi_h(v)\|_{H^1(a,b)} &\le \|v - v_\varepsilon\|_{H^1(a,b)} + \|v_\varepsilon - \Pi_h(v_\varepsilon)\|_{H^1(a,b)} + \|\Pi_h(v - v_\varepsilon)\|_{H^1(a,b)}\\ &\le C \|v - v_\varepsilon\|_{H^1(a,b)} + C h \|v_\varepsilon\|_{H^2(a,b)}. \end{aligned}$$
It follows that
$$\limsup_{h \to 0} \|v - \Pi_h(v)\|_{H^1(a,b)} \le C \varepsilon.$$
This being true for any $\varepsilon > 0$, the conclusion follows.
Proof of Theorem 7.4.2 in the two-dimensional case. Suppose now that $\Omega$ is a polygon. The proof of the basic estimate, for $u \in H^2(\Omega)$,
$$\|u - \Pi_h(u)\|_{H^1(\Omega)} \le C h \|u\|_{H^2(\Omega)}, \tag{7.26}$$
is much more involved than in the one-dimensional case. To establish this result, one needs to make some geometrical assumptions on the triangulation $\mathcal{T}_h$; this is where the regularity assumption (7.13) plays a central role. To establish (7.26), we first argue on a single triangle (the key step) and show the following result.
Theorem 7.4.3. There exists a constant $C > 0$ such that for any triangle $K$ and any $u \in H^2(K)$,
$$\|u - \Pi(u)\|_{H^1(K)} \le C h \left( h + \frac{h}{\rho} \right) \|u\|_{H^2(K)}, \tag{7.27}$$
where $\Pi(u)$ is the affine interpolant of $u$ at the vertices of $K$, $h$ is the diameter of $K$, and $\rho$ is the diameter of the largest ball contained in $K$.
Proof of Theorem 7.4.2 (continuation). The basic estimate (7.26), and hence Theorem 7.4.2, can be easily deduced from (7.27), as follows. Squaring both sides of (7.27), writing the corresponding inequalities for all the triangles $K$ of the triangulation $\mathcal{T}_h$ (with $h_K \le h$ and $\rho_K$ in place of $h$ and $\rho$), and summing these inequalities, one obtains
$$\|u - \Pi_h(u)\|_{H^1(\Omega)}^2 \le C h^2 \max_{K \in \mathcal{T}_h} \left( h + \frac{h_K}{\rho_K} \right)^2 \|u\|_{H^2(\Omega)}^2.$$
Let us now use the regularity assumption (7.13) on the triangulation ($h_K/\rho_K \le \sigma$) and take $h \le 1$ to obtain
$$\|u - \Pi_h(u)\|_{H^1(\Omega)} \le C h (1 + \sigma) \|u\|_{H^2(\Omega)},$$
which proves (7.26).
Thus, to complete the proof of convergence of the finite element method in the case $N = 2$, it remains to prove Theorem 7.4.3. A key idea in obtaining the estimate (7.27) consists first in establishing such a formula for a fixed triangle $\hat K$ which is used as a reference; take, for example, $\hat K$ equal to the unit simplex.
Lemma 7.4.2. Let $\hat K$ be a given triangle. Then there exists a constant $C > 0$ such that for any $v \in H^2(\hat K)$
$$\|v - \Pi_{\hat K}(v)\|_{H^1(\hat K)} \le C |D^2 v|_{L^2(\hat K)}, \tag{7.28}$$
where $\Pi_{\hat K}(v)$ is the affine interpolant of $v$ at the vertices of $\hat K$ and
$$|D^2 v|_{L^2(\hat K)} := \left( \sum_{i_1 + i_2 = 2} \int_{\hat K} \left| \frac{\partial^2 v}{\partial x_1^{i_1} \partial x_2^{i_2}}(x) \right|^2 dx \right)^{1/2}.$$
This is the two-dimensional version of inequalities (7.19) and (7.20). At this stage, we do not need to know the precise value of the constant $C$; the point is just to know that such a constant exists. Then we shall pass from the reference triangle $\hat K$ to a triangle $K$ of $\mathcal{T}_h$ by using an affine transformation $x = B\hat x + b = F(\hat x)$, where $B$ is an invertible matrix and $b \in \mathbb{R}^2$, which satisfies $K = F(\hat K)$. The geometrical properties of the triangulation will appear through this transformation.
Proof of Lemma 7.4.2. To prove (7.28), and since we do not need to know the precise value of $C$, we follow an analysis similar to the one in the proof of general Poincaré inequalities (Theorem 5.4.3, with the Poincaré–Wirtinger inequality as an example). The idea is to argue by contradiction and use the Rellich–Kondrakov compact embedding theorem, Theorem 5.4.2. For simplicity of notation, and without ambiguity, let us write $K$ instead of $\hat K$. Suppose the assertion (7.28) is false. Then we could find a sequence $(v_n)_{n \in \mathbb{N}}$ such that, for all $n \in \mathbb{N}$, $v_n \in H^2(K)$ and
$$\|v_n - \Pi_K(v_n)\|_{H^1(K)} > n\, |D^2 v_n|_{L^2(K)}.$$
Noticing that $D^2(\Pi_K(v_n)) = 0$, one can rewrite the above inequality in the following form:
$$\left| D^2 \left( \frac{v_n - \Pi_K(v_n)}{\|v_n - \Pi_K(v_n)\|_{H^1(K)}} \right) \right|_{L^2(K)} \le \frac{1}{n}.$$
Let us introduce the function
$$u_n := \frac{v_n - \Pi_K(v_n)}{\|v_n - \Pi_K(v_n)\|_{H^1(K)}}.$$
We have
$$u_n \in H^2(K), \quad \|u_n\|_{H^1(K)} = 1, \quad |D^2 u_n|_{L^2(K)} \le 1/n, \quad u_n(a_j) = 0,\ j = 1, 2, 3,$$
where the $a_j$ are the vertices of $K$.
Let us show how to obtain a contradiction from this set of properties. From $\|u_n\|_{H^1(K)} = 1$ and $|D^2 u_n|_{L^2(K)} \le 1/n$ we obtain that the sequence $(u_n)_{n \in \mathbb{N}}$ is bounded in $H^2(K)$. Since $K$ is bounded and piecewise $C^1$, one can apply the Rellich–Kondrakov theorem, which gives that the sequence $(u_n)_{n \in \mathbb{N}}$ is relatively compact in $H^1(K)$. We can then extract a subsequence, still denoted by $u_n$, such that
$$u_n \to u \quad \text{in } H^1(K).$$
Hence $\|u\|_{H^1(K)} = 1$. On the other hand, from $|D^2 u_n|_{L^2(K)} \le 1/n$ we obtain that $D^2 u_n \to 0$ in $L^2(K)$. Consequently, $u_n \to u$ in $H^2(K)$ and $D^2 u = 0$. Hence $u$ is an affine function on $K$. The linear map $u \mapsto u(a_j)$ from $H^2(K)$ into $\mathbb{R}$ is continuous. Since $u_n \to u$ in $H^2(K)$, we have $u(a_j) = 0$, $j = 1, 2, 3$. The only affine function on $K$ which vanishes at the three vertices is the function $u = 0$. This contradicts $\|u\|_{H^1(K)} = 1$, which establishes (7.28) and concludes the proof of Lemma 7.4.2.
Let us now consider an affine invertible transformation
$$F(\hat x) = B\hat x + b \tag{7.29}$$
with $K = F(\hat K)$, and examine how it affects the formula (7.28). The following notation and definitions will be helpful. We use the mappings
$$F : \hat x \in \hat K \longmapsto F(\hat x) = x \in K$$
and $F^{-1} : K \to \hat K$, which is the inverse of $F$. To each function $v$ defined on $K$ one can associate the function $\hat v : \hat K \to \mathbb{R}$ defined by $\hat v(\hat x) = v(F(\hat x))$, which, with the above notation, gives $\hat v(\hat x) = v(x)$. Recall that $F(\hat x) = B\hat x + b$ is an affine invertible map. The spectral norms of $B$ and $B^{-1}$ will play a central role in the following. Recall that these norms are defined by
$$\|B\| = \sup\{|B\xi| : |\xi| = 1\}, \qquad \|B^{-1}\| = \sup\{|B^{-1}\xi| : |\xi| = 1\},$$
where $|\xi|$ is the Euclidean norm. The geometrical characteristic numbers of the triangulation ($h$ and $\rho$) appear naturally in the estimation of these norms. Let us denote by $h_K$ and $\rho_K$ (respectively, $h_{\hat K}$ and $\rho_{\hat K}$) the geometrical characteristic numbers of $K$ (respectively, $\hat K$) as defined in (7.13).
Lemma 7.4.3. The following estimates hold:
$$\|B\| \le \frac{h_K}{\rho_{\hat K}}, \qquad \|B^{-1}\| \le \frac{h_{\hat K}}{\rho_K}.$$
Proof. We have
$$\|B\| = \sup\{|B\xi| : |\xi| = 1\} = \frac{1}{\rho_{\hat K}} \sup\{|B\xi| : |\xi| = \rho_{\hat K}\}.$$
By definition of $\rho_{\hat K}$, for any $\xi$ with $|\xi| = \rho_{\hat K}$ one can find two points $\hat x_1$ and $\hat x_2$ in $\hat K$ such that $\xi = \hat x_2 - \hat x_1$. Hence $B\xi = B\hat x_2 - B\hat x_1 = F(\hat x_2) - F(\hat x_1) = x_2 - x_1$, where $x_2$ and $x_1$ belong to $K = F(\hat K)$. By definition of $h_K$, which is the diameter of $K$, we have $|B\xi| = |x_2 - x_1| \le h_K$. This inequality being true for any $\xi$ with $|\xi| = \rho_{\hat K}$, we deduce
$$\|B\| \le \frac{h_K}{\rho_{\hat K}}.$$
The other inequality is obtained in a similar way, by reversing the roles of $K$ and $\hat K$.
The other basic ingredient of the proof is the change of variables in the integrals defining the Sobolev seminorms. Let us write, for a given integer $m \ge 0$,
$$|v|_{m,K} = \left( \sum_{|\alpha| = m} \int_K |\partial^\alpha v(x)|^2\, dx \right)^{1/2}. \tag{7.30}$$
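Lemma 7.4.3 can be checked on a concrete pair of triangles. The sketch below (NumPy; names hypothetical; the element $K$ is arbitrary) computes $B$ and $b$ from the vertices, taking $\hat K$ to be the unit simplex as suggested in the text, and compares the spectral norms $\|B\|$ and $\|B^{-1}\|$ (computed with `np.linalg.norm(..., 2)`, the largest singular value) with the bounds of the lemma.

```python
import numpy as np

def affine_map(K_hat, K):
    """B, b such that F(x_hat) = B x_hat + b sends the vertices of K_hat to those of K (in order)."""
    E_hat = np.column_stack([K_hat[1] - K_hat[0], K_hat[2] - K_hat[0]])
    E = np.column_stack([K[1] - K[0], K[2] - K[0]])
    B = E @ np.linalg.inv(E_hat)
    return B, K[0] - B @ K_hat[0]

def diam_and_rho(K):
    """(h_K, rho_K): longest edge and diameter of the inscribed disc (4 * area / perimeter)."""
    e = np.linalg.norm(K - np.roll(K, -1, axis=0), axis=1)
    (x1, y1), (x2, y2), (x3, y3) = K
    area = 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    return e.max(), 4.0 * area / e.sum()

K_hat = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])     # unit simplex as reference element
K = np.array([[0.3, 0.1], [0.5, 0.15], [0.35, 0.4]])        # an arbitrary element (illustrative)
B, b = affine_map(K_hat, K)
h_K, rho_K = diam_and_rho(K)
h_Kh, rho_Kh = diam_and_rho(K_hat)
print(np.linalg.norm(B, 2), "<=", h_K / rho_Kh)                  # ||B|| <= h_K / rho_Khat
print(np.linalg.norm(np.linalg.inv(B), 2), "<=", h_Kh / rho_K)   # ||B^-1|| <= h_Khat / rho_K
```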
Lemma 7.4.4. Let $K$ and $\hat K$ be two finite elements which are affine equivalent, that is, $K = F(\hat K)$ with $F(\hat x) = B\hat x + b$ and $B$ invertible. If a function $v$ belongs to the space $H^m(K)$ for some integer $m \ge 0$, then the function $\hat v = v \circ F$ belongs to $H^m(\hat K)$, and there is a constant $C(m) > 0$ such that
$$\forall\, v \in H^m(K) \quad |\hat v|_{m,\hat K} \le C(m) \|B\|^m |\det B|^{-1/2} |v|_{m,K}.$$
Analogously, one has
$$\forall\, \hat v \in H^m(\hat K) \quad |v|_{m,K} \le C(m) \|B^{-1}\|^m |\det B|^{1/2} |\hat v|_{m,\hat K}.$$
Proof. By standard density arguments, one just needs to argue with $v \in C^\infty(K)$. Then $\hat v \in C^\infty(\hat K)$. It is convenient to introduce the first and second derivatives of $v$: $Dv(x)$ is a linear form and
$$\frac{\partial v}{\partial x_i}(x) = Dv(x) \cdot e_i,$$
where $(e_i)$ are the vectors of the canonical basis of $\mathbb{R}^N$ (here $N = 2$). Similarly, $D^2 v(x)$ is the symmetric bilinear form associated with the Hessian matrix and
$$\frac{\partial^2 v}{\partial x_i \partial x_j}(x) = D^2 v(x) \cdot (e_i, e_j).$$
One can unify these two situations (and much more) by writing, for any multi-index $\alpha = (\alpha_1, \alpha_2)$ of length $|\alpha| = m$,
$$\partial^\alpha v(x) = D^m v(x)(e_1, \dots, e_1, e_2, \dots, e_2),$$
where $e_1$ is repeated $\alpha_1$ times and $e_2$ is repeated $\alpha_2$ times. Recall that $\partial^\alpha v(x) = \left( \partial^{|\alpha|} v / \partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \right)(x)$. Set
$$\|D^m v(x)\| = \sup\{ |D^m v(x)(\xi_1, \dots, \xi_m)| : |\xi_i| \le 1,\ 1 \le i \le m \}.$$
Then $|\partial^\alpha v(x)| \le \|D^m v(x)\|$ and
$$|v|_{m,K} = \left( \int_K \sum_{|\alpha| = m} |\partial^\alpha v(x)|^2\, dx \right)^{1/2} \le C_1 \left( \int_K \|D^m v(x)\|^2\, dx \right)^{1/2}, \tag{7.31}$$
where $C_1^2$ is the cardinality of the set of multi-indices $\alpha$ such that $|\alpha| = m$, i.e., $C_1 = C_1(m)$. We can now apply the differentiation rule for composite functions. Recalling that $\hat v(\hat x) = v(F(\hat x)) = v(B\hat x + b)$, we have
$$D^m \hat v(\hat x)(\xi_1, \dots, \xi_m) = D^m v(x)(B\xi_1, \dots, B\xi_m),$$
so that
$$\|D^m \hat v(\hat x)\| \le \|D^m v(x)\|\, \|B\|^m.$$
Taking the square and integrating on $\hat K$, one obtains
$$\int_{\hat K} \|D^m \hat v(\hat x)\|^2\, d\hat x \le \|B\|^{2m} \int_{\hat K} \|D^m v(F(\hat x))\|^2\, d\hat x.$$
Using the change of variables formula for multiple integrals, we get
$$\int_{\hat K} \|D^m \hat v(\hat x)\|^2\, d\hat x \le \|B\|^{2m} |\det(B^{-1})| \int_K \|D^m v(x)\|^2\, dx. \tag{7.32}$$
Combining (7.31) (applied to $\hat v$ on $\hat K$) and (7.32), we obtain
$$|\hat v|_{m,\hat K} \le C_1(m) \|B\|^m |\det B|^{-1/2} \left( \int_K \|D^m v(x)\|^2\, dx \right)^{1/2}.$$
Since, conversely, there exists a constant $C_2(m)$ such that
$$\left( \int_K \|D^m v(x)\|^2\, dx \right)^{1/2} \le C_2(m) |v|_{m,K}, \tag{7.33}$$
we finally obtain
$$|\hat v|_{m,\hat K} \le C(m) \|B\|^m |\det B|^{-1/2} |v|_{m,K}$$
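with $C(m) = C_1(m) C_2(m)$. Reversing the roles of $K$ and $\hat K$ yields the other inequality.
End of the proof of Theorem 7.4.3. We now have all the ingredients of the proof of the basic estimate (7.27) in Theorem 7.4.3:
$$\|u - \Pi(u)\|_{H^1(K)} \le C h \left( h + \frac{h}{\rho} \right) \|u\|_{H^2(K)}.$$
Let $u \in H^2(K)$. Since $\Pi(u) \circ F$ is affine and coincides with $\hat u$ at the vertices of $\hat K$, we have $\widehat{\Pi(u)} = \Pi_{\hat K}(\hat u)$. By Lemma 7.4.4 we have, for $m = 0, 1$,
$$|u - \Pi(u)|_{m,K} \le C \|B^{-1}\|^m |\det B|^{1/2} |\hat u - \Pi_{\hat K}(\hat u)|_{m,\hat K}. \tag{7.34}$$
The estimate (7.28) on $\hat K$ yields, for $m = 0, 1$,
$$|\hat u - \Pi_{\hat K}(\hat u)|_{m,\hat K} \le C |\hat u|_{2,\hat K}. \tag{7.35}$$
Applying Lemma 7.4.4 again, we have
$$|\hat u|_{2,\hat K} \le C \|B\|^2 |\det B|^{-1/2} |u|_{2,K}. \tag{7.36}$$
Combining (7.34), (7.35), and (7.36), we obtain
$$|u - \Pi(u)|_{m,K} \le C \|B^{-1}\|^m \|B\|^2 |u|_{2,K}. \tag{7.37}$$
Taking the square of inequality (7.37) and summing over $m = 0, 1$, we obtain
$$\|u - \Pi(u)\|_{H^1(K)} \le C \|B\|^2 \left( 1 + \|B^{-1}\| \right) |u|_{2,K}.$$
Using Lemma 7.4.3, we finally get
$$\|u - \Pi(u)\|_{H^1(K)} \le C \frac{h_K^2}{\rho_{\hat K}^2} \left( 1 + \frac{h_{\hat K}}{\rho_K} \right) |u|_{2,K} \le C \left( h_K^2 + \frac{h_K^2}{\rho_K} \right) |u|_{2,K},$$
where the factors depending only on $\rho_{\hat K}$ and $h_{\hat K}$ have been included in the constant $C$ (recall that $\hat K$ is a fixed reference triangle). Noticing that $|u|_{2,K} \le \|u\|_{H^2(K)}$, the proof is complete.
Remark 7.4.2. Note that we have obtained a slightly more precise result than (7.27); indeed, we proved that for every $u \in H^2(K)$
$$\|u - \Pi(u)\|_{H^1(K)} \le C h \left( h + \frac{h}{\rho} \right) |u|_{2,K},$$
where $|u|_{2,K} = |D^2 u|_{L^2(K)}$ involves only the $L^2$-norm of the second-order partial derivatives of $u$. Consequently, in Theorem 7.4.2, we have that for any $u \in H^2(\Omega)$
$$\|u - \Pi_h(u)\|_{H^1(\Omega)} \le C h |u|_{2,\Omega}.$$
Similarly, in Theorem 7.4.1, we have that if $u \in H^2(\Omega)$, then
$$\|u - u_h\|_{H^1(\Omega)} \le C h |u|_{2,\Omega}.$$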
7.5 Complements

7.5.1 Flat triangles
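Let us return to the situation described in Figure 7.5, which illustrates a family of triangulations $(\mathcal{T}_h)_{h \to 0}$ involving triangles $K_h \in \mathcal{T}_h$ that become flat as $h \to 0$. Let us show that on such triangles the affine interpolant can lead to significant errors. Take a simple function which is not affine, for example, the quadratic function $u(x, y) = y^2$. Let us compute the affine interpolant $\Pi_h(u)$ of $u$ on the triangle $K_h$ whose vertices are $(0, \frac{h}{2})$, $(0, -\frac{h}{2})$, and $(h\varepsilon_h, 0)$. The function $\Pi_h(u)$ vanishes at $(h\varepsilon_h, 0)$ and is equal to $h^2/4$ at the two other vertices. Hence
$$\frac{\partial}{\partial x} \Pi_h(u) = -\frac{h^2/4}{h\varepsilon_h} = -\frac{h}{4\varepsilon_h}.$$
Since $\frac{\partial u}{\partial x} = 0$, we obtain $\left| \frac{\partial}{\partial x}(u - \Pi_h(u)) \right| = \frac{h}{4\varepsilon_h}$ and, since $K_h$ has area $h^2 \varepsilon_h / 2$,
$$\left( \int_{K_h} \left| \frac{\partial}{\partial x}(u - \Pi_h(u))(x) \right|^2 dx \right)^{1/2} = \frac{h^2}{4\sqrt{2}\sqrt{\varepsilon_h}}.$$
On the other hand, we have $\frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial x \partial y} = 0$ and $\frac{\partial^2 u}{\partial y^2} = 2$, which give
$$|u|_{2,K_h} = \left( \int_{K_h} 4\, dx\, dy \right)^{1/2} = \sqrt{2}\, h \sqrt{\varepsilon_h}.$$
An inequality of the type $\|u - \Pi_h(u)\|_{H^1(K_h)} \le C h |u|_{2,K_h}$ would then imply
$$\left\| \frac{\partial}{\partial x}(u - \Pi_h(u)) \right\|_{L^2(K_h)} \le C h |u|_{2,K_h},$$
that is,
$$\frac{h^2}{4\sqrt{2}\sqrt{\varepsilon_h}} \le C h \sqrt{2}\, h \sqrt{\varepsilon_h},$$
i.e., $\varepsilon_h \ge 1/(8C)$ for all $h > 0$, which is equivalent to
$$\inf_{h > 0} \varepsilon_h > 0.$$
Thus, the convergence analysis developed in this chapter breaks down without a geometrical assumption on the family of triangulations preventing the triangles from becoming flat.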
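The computation above can be reproduced numerically: by the formulas just derived, the ratio $\|\partial_x(u - \Pi_h u)\|_{L^2(K_h)} / (h\,|u|_{2,K_h})$ equals $1/(8\varepsilon_h)$, independently of $h$. A NumPy sketch (names hypothetical) that builds the affine interpolant by solving a $3 \times 3$ system:

```python
import numpy as np

def flat_triangle_ratio(h, eps):
    """||d/dx (u - Pi u)||_{L2(K)} / (h |u|_{2,K}) for u(x, y) = y^2 on the triangle
    with vertices (0, h/2), (0, -h/2), (h*eps, 0), as in Figure 7.5."""
    V = np.array([[0.0, h / 2], [0.0, -h / 2], [h * eps, 0.0]])
    # Affine interpolant Pi u = c0 + c1 x + c2 y matching u = y^2 at the three vertices.
    c0, c1, c2 = np.linalg.solve(np.column_stack([np.ones(3), V]), V[:, 1] ** 2)
    area = 0.5 * h * (h * eps)
    err = abs(c1) * np.sqrt(area)      # du/dx = 0, so d/dx (u - Pi u) = -c1 is constant on K
    semi2 = np.sqrt(4.0 * area)        # |u|_{2,K}: only d^2u/dy^2 = 2 is nonzero
    return err / (h * semi2)

for eps in [1.0, 0.1, 0.01, 0.001]:
    print(eps, flat_triangle_ratio(0.1, eps), 1 / (8 * eps))   # the two columns agree
```

No constant $C$ can bound this ratio uniformly as $\varepsilon_h \to 0$, which is exactly the failure described above.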
7.5.2 $H^2(\Omega)$ regularity of the solution of the Dirichlet problem on a convex polygon
In the model situation studied in this section, we chose to take $\Omega$ as a polygon in $\mathbb{R}^2$ in order to make the description of the triangulation in the finite element method as simple as possible. (Otherwise, for a general $\Omega$, one has to approximate it by polygonal sets $\Omega_h$.) On the other hand, this raises a difficulty: does the solution $u$ of the Dirichlet problem with $f \in L^2(\Omega)$,
$$-\Delta u = f \text{ on } \Omega, \qquad u = 0 \text{ on } \partial\Omega,$$
satisfy $u \in H^2(\Omega)$? Indeed, the estimate $\|u - u_h\|_{H^1_0(\Omega)} \le C h \|u\|_{H^2(\Omega)}$ has been established under the assumption $u \in H^2(\Omega)$. Here $\Omega$ is a polygon, so its boundary is not smooth (it is only piecewise $C^1$, or Lipschitz continuous), and the classical Agmon–Douglis–Nirenberg theorem, which asserts that $u \in H^2(\Omega)$ when $\Omega$ is of class $C^2$, does not apply. The answer to this question is quite involved. It was studied by Grisvard in [150], [151], who proved in particular that if $\Omega$ is a convex polygon, then $u \in H^2(\Omega)$.
7.5.3 Finite element methods of type $P_2$
The method which has been developed in $\mathbb{R}^2$, with $\Omega$ a polygon and finite elements which are triangles, extends naturally to $\mathbb{R}^3$ by replacing triangles with tetrahedra. Functions of the approximating subspaces $V_h$ are continuous and piecewise affine. This is what we call a finite element method of type $P_1$ (with reference to the degree of the polynomial functions which are used). To improve the quality of the approximation, one may naturally enrich the approximating subspaces so that they contain more functions. This can be done, for example, by considering functions which are piecewise polynomial of degree less than or equal to two. Let us briefly describe an example of such a finite element method of type $P_2$. Take a polygon $\Omega$ in $\mathbb{R}^2$ and a given triangulation $\mathcal{T}_h$ of $\Omega$. We introduce the space
$$V_h = \{ v \in C(\overline{\Omega}) : v|_K \in P_2 \text{ for every } K \in \mathcal{T}_h \},$$
where $P_2$ is the space of polynomial functions on $\mathbb{R}^2$ of degree less than or equal to two. The general form of an element $p \in P_2$ is then
$$p(x, y) = a + bx + cy + dx^2 + exy + fy^2,$$
and one can verify that $P_2$ is a vector space of dimension 6. Hence, to determine an element $p \in P_2$, it is not sufficient to give its values at the vertices of a triangle (as it was for $p \in P_1$): we need to give its values at six carefully chosen points. Take, for example, the case of Figure 7.6, where $a_1, a_2, a_3$ are the vertices of $K$ and $a_{ij} = \frac{1}{2}(a_i + a_j)$ are the midpoints of the edges of $K$.
Figure 7.6. Six points on the triangle K.
This choice leads to triangulations whose nodes are the vertices of the triangles and the midpoints of all the edges. As an illustration, consider the case of Figure 7.7.
Figure 7.7. A triangulation for a finite element method of type $P_2$.
Let us denote by $(N_j)$, $j = 1, \dots, I(h)$, the nodes of $\mathcal{T}_h$ (vertices and midpoints of edges). It is quite elementary to verify that (a) $V_h$ is a subspace of dimension $I(h)$ of $H^1(\Omega)$, and any function $v$ of $V_h$ is uniquely determined by its values at the nodes of the triangulation; (b) a basis of $V_h$ is given by the family of functions $(p_j)_{j=1,\dots,I(h)}$ defined by
$$p_j \in V_h, \quad p_j(N_i) = \delta_{ij} \quad \text{for } i, j = 1, \dots, I(h).$$
For any element $v \in V_h$ one has
$$v(x) = \sum_{j=1}^{I(h)} v(N_j)\, p_j(x).$$
The finite element method can now be developed in a way parallel to what we did before. As expected, one obtains a better order of approximation with piecewise $P_2$ functions. Let us denote by
$$\Pi_h(v) = \sum_{j=1}^{I(h)} v(N_j)\, p_j$$
the element of $V_h$ obtained by interpolation of $v$ at the nodes of the triangulation ($\Pi_h(v) = v$ at the nodes). Then one can show the following result (which we do not prove): if the family of triangulations $(\mathcal{T}_h)_{h \to 0}$ is regular, then there exists a constant $C > 0$ such that for any $u \in H^3(\Omega)$,
$$\|u - \Pi_h(u)\|_{H^1(\Omega)} \le C h^2 |u|_{3,\Omega}.$$
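A common way to write the local basis functions attached to the six nodes of Figure 7.6 uses the barycentric coordinates $\lambda_1, \lambda_2, \lambda_3$ of the triangle: $\lambda_i(2\lambda_i - 1)$ for the vertex $a_i$ and $4\lambda_i\lambda_j$ for the midpoint $a_{ij}$. These formulas are a standard choice assumed here, not taken from the text. The NumPy sketch below (hypothetical names) checks the property $p_j(N_i) = \delta_{ij}$ on one element, and that interpolation at the six nodes reproduces quadratics.

```python
import numpy as np

def p2_basis(lam):
    """The six P2 nodal basis functions of a triangle, at barycentric coordinates lam.

    Order: vertices a1, a2, a3, then midpoints a12, a23, a13.  Vertex functions are
    l_i (2 l_i - 1) and midpoint functions are 4 l_i l_j (a standard choice).
    """
    l1, l2, l3 = lam
    return np.array([l1 * (2 * l1 - 1), l2 * (2 * l2 - 1), l3 * (2 * l3 - 1),
                     4 * l1 * l2, 4 * l2 * l3, 4 * l1 * l3])

# Barycentric coordinates of the six nodes of Figure 7.6, in the order used above.
nodes = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                  [0.5, 0.5, 0], [0, 0.5, 0.5], [0.5, 0, 0.5]])
V = np.array([p2_basis(n) for n in nodes])       # V[i, j] = p_j(N_i)
print(np.allclose(V, np.eye(6)))                 # True: p_j(N_i) = delta_ij

# On the unit simplex a1 = (0,0), a2 = (1,0), a3 = (0,1) one has x = l2, y = l3,
# so interpolating a quadratic at the six nodes reproduces it exactly.
p = lambda x, y: 1 + 2 * x - y + 3 * x**2 - x * y + 0.5 * y**2
lam = np.array([0.2, 0.3, 0.5])
interp = p2_basis(lam) @ np.array([p(n[1], n[2]) for n in nodes])
print(interp, p(lam[1], lam[2]))                 # equal up to rounding
```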
Chapter 8
Spectral Analysis of the Laplacian

8.1 Introduction
From the very beginning of the 19th century, the study of the eigenvalue problem for the Laplace equation emerged as a fundamental topic in the theory of partial differential equations. In 1822, J.B.J. Fourier was faced with this question when developing the so-called separation of variables method. Let us illustrate this method in the case of the wave equation with Dirichlet boundary data, which gives, for example, a model for the vibrations of an elastic membrane clamped on its boundary. Given initial data $u_0, u_1 : \Omega \to \mathbb{R}$, one looks for a solution $u : Q = \Omega \times (0, +\infty) \to \mathbb{R}$ of the boundary value problem
$$\begin{cases} \dfrac{\partial^2 u}{\partial t^2} - \Delta u = 0 & \text{on } Q,\\ u = 0 & \text{on } \partial\Omega \times (0, +\infty),\\ u(x, 0) = u_0(x) & \text{on } \Omega,\\ \dfrac{\partial u}{\partial t}(x, 0) = u_1(x) & \text{on } \Omega. \end{cases}$$
The idea is to look for a solution $u$ of the form
$$u(x, t) = w(x)\varphi(t), \tag{8.1}$$
where the dependence of $u$ on $x$ and $t$ has been separated. The wave equation then becomes $w(x)\varphi''(t) - \varphi(t)\Delta w(x) = 0$ or, equivalently,
$$\frac{\varphi''(t)}{\varphi(t)} = \frac{\Delta w(x)}{w(x)}. \tag{8.2}$$
Since the left-hand side of (8.2) is a function of $t$ only and the right-hand side a function of $x$ only, both expressions must be constant, that is,
$$\frac{\varphi''(t)}{\varphi(t)} = \frac{\Delta w(x)}{w(x)} = -\lambda$$
for some constant $\lambda$.
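This method leads to the study of the spectral problem for the so-called Laplace–Dirichlet operator,
$$\begin{cases} -\Delta w = \lambda w & \text{on } \Omega,\\ w = 0 & \text{on } \partial\Omega, \end{cases} \tag{8.3}$$
and to the resolution of the ordinary differential equation
$$\varphi''(t) + \lambda \varphi(t) = 0. \tag{8.4}$$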
One can easily verify that the eigenvalues of the spectral problem (8.3) are positive: it is enough to multiply by $w$ and integrate by parts on $\Omega$. Therefore the solutions of (8.4) have the form
$$\varphi(t) = A \cos\sqrt{\lambda}\,t + B \sin\sqrt{\lambda}\,t.$$
The question is now the following. Can one, by linear combinations of such separated solutions, obtain a solution
$$u(x, t) = \sum_i w_i(x)\left(A_i \cos\sqrt{\lambda_i}\,t + B_i \sin\sqrt{\lambda_i}\,t\right) \tag{8.5}$$
that satisfies the initial data $u(x, 0) = u_0(x)$ and $(\partial u/\partial t)(x, 0) = u_1(x)$? This question is intimately related to whether any given function can be generated by linear combinations (in fact, series) of eigenvectors of the Laplace–Dirichlet operator. We shall answer it positively by proving the following theorem, which is the main result of this chapter.

Assume $\Omega$ is a bounded open set in $\mathbb{R}^N$. Then there exists a complete orthonormal system of eigenvectors of the Laplace–Dirichlet operator in the space $L^2(\Omega)$.

A complete orthonormal system is also called a Hilbertian basis. This deep result of Rellich rests on two ingredients: the Rellich–Kondrakov theorem (compact embedding of $H^1_0(\Omega)$ into $L^2(\Omega)$; see Theorem 5.3.3) and the abstract spectral decomposition theorem for compact self-adjoint operators. Variational methods thus play a central role in the theory of eigenvalues of elliptic partial differential equations. Another striking result in this direction is the variational characterization of the eigenvalues of the Laplace–Dirichlet operator. One of the formulas provided by the Courant–Fisher min-max principle is the following: the first eigenvalue of the Laplace–Dirichlet operator is given by the variational formula
$$\lambda_1(-\Delta) = \min\left\{ \int_\Omega |\nabla v(x)|^2\,dx : v \in H^1_0(\Omega),\ \int_\Omega v(x)^2\,dx = 1 \right\}.$$
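The question raised by (8.5) can be illustrated in one dimension. On $\Omega = (0, \pi)$ the Dirichlet eigenpairs are known explicitly: $\lambda_n = n^2$ and $w_n(x) = \sin(nx)$. The Python sketch below makes three illustrative assumptions: this choice of $\Omega$, a midpoint quadrature for the sine coefficients, and the truncation level. It computes $A_n$ from $u_0$ and $B_n$ from $u_1$, using the fact that $\sqrt{\lambda_n}\,B_n$ is the $n$-th sine coefficient of $u_1$, and returns the truncated series.

```python
import math

def sine_coefficients(f, n_modes, n_quad=2000):
    """c_n = (2/pi) * int_0^pi f(x) sin(n x) dx for n = 1..n_modes (midpoint rule)."""
    h = math.pi / n_quad
    xs = [(j + 0.5) * h for j in range(n_quad)]
    return [(2.0 / math.pi) * h * sum(f(x) * math.sin(n * x) for x in xs)
            for n in range(1, n_modes + 1)]

def separated_series(u0, u1, n_modes=10):
    """Truncated series (8.5) on Omega = (0, pi), with lambda_n = n^2 and w_n = sin(n x)."""
    a = sine_coefficients(u0, n_modes)             # u(x, 0) = u0 determines A_n
    c1 = sine_coefficients(u1, n_modes)            # du/dt(x, 0) = u1 determines sqrt(lambda_n) B_n
    b = [c / n for n, c in enumerate(c1, start=1)]  # sqrt(lambda_n) = n

    def u(x, t):
        return sum(math.sin(n * x) * (a[n - 1] * math.cos(n * t) + b[n - 1] * math.sin(n * t))
                   for n in range(1, n_modes + 1))
    return u

# Usage: data built from finitely many modes, so the truncated series is exact.
u0 = lambda x: math.sin(x) + 0.5 * math.sin(3 * x)
u1 = lambda x: math.sin(2 * x)
u = separated_series(u0, u1)
exact = lambda x, t: (math.sin(x) * math.cos(t) + 0.5 * math.sin(3 * x) * math.cos(3 * t)
                      + 0.5 * math.sin(2 * x) * math.sin(2 * t))
print(abs(u(1.0, 0.7) - exact(1.0, 0.7)) < 1e-9)  # True
```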
This formula and its companions are powerful tools for studying the eigenvalues and eigenvectors of the Laplace–Dirichlet operator. In 1911, Weyl used this principle to solve the problem of the asymptotic distribution of the eigenvalues of the Laplace–Dirichlet operator. We shall briefly describe such a result in Section 8.5. In the last two decades, spectral methods, like finite element methods, have proved very efficient in the numerical treatment of some partial differential operators. They give rise to approximation methods in which the finite-dimensional approximating spaces $V_n$ are based on (orthogonal) polynomials of high degree. Finite element methods, by contrast, fix the degree: for example, degree one for the P1 method and degree two for the P2 method.
Here, the degree of the polynomials in $V_n$ increases with $n$. These methods provide accurate approximations, limited only by the regularity of the solution. On the other hand, they impose some restrictions on the geometry of $\Omega$. For pedagogical reasons and simplicity of exposition, we restrict our attention to the spectral analysis of the Laplace equation with Dirichlet boundary condition. In Section 8.6, we shall briefly survey some direct extensions of these results.
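8.2 The Laplace–Dirichlet operator: Functional setting

Our objective is to study the eigenvalue problem for the Laplace equation on $\Omega$ with Dirichlet boundary conditions on $\partial\Omega$. We seek $\lambda \in \mathbb{R}$ and $u \neq 0$ such that
$$\begin{cases} -\Delta u = \lambda u & \text{on } \Omega,\\ u = 0 & \text{on } \partial\Omega. \end{cases} \tag{8.6}$$
Throughout this chapter, $\Omega$ is assumed to be a bounded open set in $\mathbb{R}^N$. Let us give a precise meaning to this problem and write its variational formulation.

Definition 8.2.1. We say that $\lambda \in \mathbb{R}$ is an eigenvalue of the Laplace–Dirichlet operator if there exists some $u \in H^1_0(\Omega)$, $u \neq 0$, such that
$$\begin{cases} \displaystyle\int_\Omega \nabla u(x)\cdot\nabla v(x)\,dx = \lambda \int_\Omega u(x)v(x)\,dx & \forall v \in H^1_0(\Omega),\\ u \in H^1_0(\Omega). \end{cases} \tag{8.7}$$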
When such a $u$ exists, it is called an eigenvector associated with the eigenvalue $\lambda$.

Remark 8.2.1. Let us make some comments on the definition above:
(a) If (8.7) is satisfied, then $-\Delta u = \lambda u$ in the distribution sense and $u = 0$ in the trace sense, i.e., (8.6) is satisfied in a weak sense. The next step consists in proving that $u$ is regular, and hence a classical solution of (8.6).
(b) One may wonder whether the restriction "$\lambda \in \mathbb{R}$" is too strong, and allow $\lambda \in \mathbb{C}$ instead. In fact, taking $v = u$ in (8.7) gives
$$\lambda = \frac{\int_\Omega |\nabla u|^2\,dx}{\int_\Omega u^2(x)\,dx},$$
so all the eigenvalues of the Laplace–Dirichlet operator are positive real numbers.

Let us now come to the central idea that will let us formulate the above problem in terms of classical operator theory.
The classical theory for the spectral analysis of operators in infinite-dimensional spaces works with operators $T : H \to H$, where $H$ is a Hilbert space and $T \in \mathcal{L}(H)$ is a linear, continuous, and compact operator from $H$ into $H$. One cannot write $-\Delta$ in this setting because, as for any differential operator, regularity is lost when passing from $u$ to $-\Delta u$. Indeed, $-\Delta$ is a linear continuous operator $-\Delta : H^1_0(\Omega) \to H^{-1}(\Omega)$. Another option is to view $-\Delta$ as an operator from $L^2(\Omega)$ into $L^2(\Omega)$ with a domain, namely $\operatorname{dom}(-\Delta) = H^2(\Omega) \cap H^1_0(\Omega)$. By contrast, and this is the central idea, the inverse operator $T = (-\Delta)^{-1}$ is a nice operator that fits well with the classical theory. The direct relation between the spectrum of an operator and the spectrum of its inverse will then let us conclude our analysis. Let us now define $T$ as the inverse of the Laplace–Dirichlet operator.

Definition 8.2.2. The inverse of the Laplace–Dirichlet operator is the operator $T : L^2(\Omega) \to L^2(\Omega)$ defined as follows. For every $h \in L^2(\Omega)$, $Th \in H^1_0(\Omega) \subset L^2(\Omega)$ is the unique solution of the variational problem
$$\begin{cases} \displaystyle\int_\Omega \nabla(Th)(x)\cdot\nabla v(x)\,dx = \int_\Omega h(x)v(x)\,dx & \forall v \in H^1_0(\Omega),\\ Th \in H^1_0(\Omega). \end{cases}$$
Equivalently, $Th$ is the variational solution of the Dirichlet problem (see Theorem 5.1.1)
$$\begin{cases} -\Delta(Th) = h & \text{on } \Omega,\\ Th = 0 & \text{on } \partial\Omega. \end{cases}$$
Note that $T$ maps $L^2(\Omega)$ into $H^1_0(\Omega)$. To view $T$ as a linear continuous operator from a space $H$ into itself, there are two possibilities: either $T : L^2(\Omega) \to L^2(\Omega)$ or $T : H^1_0(\Omega) \to H^1_0(\Omega)$. The two approaches lead to parallel developments. We choose to let $T$ act from $L^2(\Omega)$ into $L^2(\Omega)$. We then have
$$(-\Delta) \circ T = \mathrm{id}_H, \qquad H = L^2(\Omega),$$
i.e., $T$ is a right inverse of $-\Delta$.
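A discrete analogue helps to visualize $T$. The sketch below assumes $\Omega = (0,1)$, a uniform grid with spacing `dx`, and the standard three-point finite-difference approximation of $-\Delta$. These assumptions and the names `apply_T` and `minus_laplacian` are ours, chosen for illustration. `apply_T` solves the resulting tridiagonal system with the Thomas algorithm, so `minus_laplacian(apply_T(h, dx), dx)` returns `h`. This is the discrete counterpart of $(-\Delta)\circ T = \mathrm{id}_H$.

```python
def apply_T(h, dx):
    """Discrete T = (-Delta)^{-1} on (0, 1): solve (-u[j-1] + 2u[j] - u[j+1]) / dx^2 = h[j]
    at the interior nodes, with u = 0 at both endpoints (Thomas algorithm)."""
    n = len(h)
    off = -1.0 / dx ** 2   # sub- and super-diagonal entries
    diag = 2.0 / dx ** 2   # diagonal entries
    c = [0.0] * n          # modified super-diagonal
    d = [0.0] * n          # modified right-hand side
    c[0], d[0] = off / diag, h[0] / diag
    for j in range(1, n):  # forward elimination
        denom = diag - off * c[j - 1]
        c[j] = off / denom
        d[j] = (h[j] - off * d[j - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for j in range(n - 2, -1, -1):  # back substitution
        u[j] = d[j] - c[j] * u[j + 1]
    return u

def minus_laplacian(u, dx):
    """Three-point approximation of -u'' at the interior nodes, with zero boundary values."""
    padded = [0.0] + list(u) + [0.0]
    return [(-padded[j] + 2.0 * padded[j + 1] - padded[j + 2]) / dx ** 2 for j in range(len(u))]

# Usage: h = 1 on (0, 1). The exact Th is x(1 - x)/2, which the scheme reproduces at the nodes.
n_int = 49
dx = 1.0 / (n_int + 1)
xs = [(j + 1) * dx for j in range(n_int)]
u = apply_T([1.0] * n_int, dx)
print(max(abs(uj - x * (1 - x) / 2) for uj, x in zip(u, xs)) < 1e-10)  # True
```

The Thomas algorithm is used here only because the discrete $-\Delta$ is tridiagonal in one dimension. In higher dimensions one would use a sparse solver instead.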
The introduction of $T = (-\Delta)^{-1}$ in the spectral analysis of the Laplace–Dirichlet operator is justified by the following result.

Lemma 8.2.1. The real number $\lambda$ is an eigenvalue of the Laplace–Dirichlet operator iff $1/\lambda$ is an eigenvalue of $T = (-\Delta)^{-1}$.

Proof. Assume that $\lambda$ is an eigenvalue of the Laplace–Dirichlet operator, i.e., there exists some $u \in H^1_0(\Omega)$, $u \neq 0$, such that
$$\int_\Omega \nabla u\cdot\nabla v\,dx = \lambda \int_\Omega uv\,dx \qquad \forall v \in H^1_0(\Omega).$$
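By definition of $T = (-\Delta)^{-1}$, this is equivalent to $u = T(\lambda u)$. By linearity of $T$ (proved in the next proposition), and since $\lambda \neq 0$, we deduce
$$T(u) = \frac{1}{\lambda}u,$$
i.e., $1/\lambda$ is an eigenvalue of $T$. Reading the same equivalences in the reverse order gives the converse.

The following properties of the operator $T$ play a central role in its spectral analysis.

Proposition 8.2.1. The operator $T$ satisfies the following properties:
(i) $T : L^2(\Omega) \to L^2(\Omega)$ is a linear continuous operator,
(ii) $T$ is self-adjoint in $L^2(\Omega)$,
(iii) $T$ is compact from $L^2(\Omega)$ into $L^2(\Omega)$,
(iv) $T$ is positive definite.

Proof. (i)$_1$ Take $h_1, h_2 \in L^2(\Omega)$ and $\alpha_1, \alpha_2 \in \mathbb{R}$. By definition of $T$,
$$\int_\Omega \nabla(Th_1)\cdot\nabla v\,dx = \int_\Omega h_1 v\,dx \quad \forall v \in H^1_0(\Omega), \qquad \int_\Omega \nabla(Th_2)\cdot\nabla v\,dx = \int_\Omega h_2 v\,dx \quad \forall v \in H^1_0(\Omega).$$
Taking a linear combination of these two equalities, we obtain
$$\begin{cases} \displaystyle\int_\Omega \nabla(\alpha_1 Th_1 + \alpha_2 Th_2)\cdot\nabla v\,dx = \int_\Omega (\alpha_1 h_1 + \alpha_2 h_2)v\,dx & \forall v \in H^1_0(\Omega),\\ \alpha_1 Th_1 + \alpha_2 Th_2 \in H^1_0(\Omega). \end{cases}$$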
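By uniqueness of the solution of the Dirichlet problem, we get
$$T(\alpha_1 h_1 + \alpha_2 h_2) = \alpha_1 Th_1 + \alpha_2 Th_2,$$
which is the linearity of $T$.

(i)$_2$ Let us now prove that $T : L^2(\Omega) \to L^2(\Omega)$ is continuous. By definition of $T$, the equality
$$\int_\Omega \nabla(Th)\cdot\nabla v\,dx = \int_\Omega hv\,dx$$
holds for any $v \in H^1_0(\Omega)$. In particular, it holds for $v = Th \in H^1_0(\Omega)$, which gives
$$\int_\Omega |\nabla(Th)|^2\,dx = \int_\Omega h\,T(h)\,dx. \tag{8.8}$$
By the Cauchy–Schwarz inequality,
$$\int_\Omega |\nabla(Th)|^2\,dx \le \left(\int_\Omega h^2\,dx\right)^{1/2}\left(\int_\Omega (Th)^2\,dx\right)^{1/2}. \tag{8.9}$$
We now use the Poincaré inequality and the fact that $\Omega$ is bounded (Theorem 5.3.1). There exists a constant $C > 0$, depending only on $\Omega$, such that
$$\left(\int_\Omega v(x)^2\,dx\right)^{1/2} \le C\left(\int_\Omega |\nabla v(x)|^2\,dx\right)^{1/2} \qquad \forall v \in H^1_0(\Omega).$$
In particular, since $Th \in H^1_0(\Omega)$ for every $h \in L^2(\Omega)$, we can apply the Poincaré inequality to $v = Th$, which gives
$$\left(\int_\Omega |Th(x)|^2\,dx\right)^{1/2} \le C\left(\int_\Omega |\nabla(Th)(x)|^2\,dx\right)^{1/2}. \tag{8.10}$$
Combining (8.9) and (8.10), we obtain
$$\int_\Omega |\nabla(Th)|^2\,dx \le C\left(\int_\Omega h^2\,dx\right)^{1/2}\left(\int_\Omega |\nabla(Th)|^2\,dx\right)^{1/2}.$$
Equivalently,
$$\left(\int_\Omega |\nabla(Th)|^2\,dx\right)^{1/2} \le C\left(\int_\Omega h^2\,dx\right)^{1/2}, \tag{8.11}$$
which, together with (8.10), gives
$$\left(\int_\Omega |Th(x)|^2\,dx\right)^{1/2} \le C^2\left(\int_\Omega h^2\,dx\right)^{1/2}. \tag{8.12}$$
We have thus obtained
$$\forall h \in L^2(\Omega) \qquad \|Th\|_{L^2(\Omega)} \le C^2 \|h\|_{L^2(\Omega)}, \tag{8.13}$$
i.e., $T$ is a linear continuous operator from $L^2(\Omega)$ into $L^2(\Omega)$. In fact, we have obtained a sharper result: from (8.11) and (8.12) we deduce that
$$\forall h \in L^2(\Omega) \qquad \|Th\|_{H^1_0(\Omega)} \le C\sqrt{1 + C^2}\,\|h\|_{L^2(\Omega)}, \tag{8.14}$$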
i.e., $T : L^2(\Omega) \to H^1_0(\Omega)$ is a linear continuous operator. Consequently, one can also treat $T$ as a linear continuous operator from $H^1_0(\Omega)$ into $H^1_0(\Omega)$.

(ii) Let us now prove that $T$ is a self-adjoint operator in $L^2(\Omega)$, i.e.,
$$\forall g, h \in L^2(\Omega) \qquad \langle Th, g\rangle_{L^2(\Omega)} = \langle h, Tg\rangle_{L^2(\Omega)},$$
which means
$$\int_\Omega (Th)(x)g(x)\,dx = \int_\Omega h(x)(Tg)(x)\,dx.$$
By definition of $Th$ and $Tg$, we have
$$\int_\Omega \nabla(Th)\cdot\nabla v\,dx = \int_\Omega hv\,dx \quad \forall v \in H^1_0(\Omega), \qquad \int_\Omega \nabla(Tg)\cdot\nabla v\,dx = \int_\Omega gv\,dx \quad \forall v \in H^1_0(\Omega).$$
Take $v = Tg \in H^1_0(\Omega)$ in the first equality and $v = Th \in H^1_0(\Omega)$ in the second. We obtain
$$\int_\Omega \nabla(Th)\cdot\nabla(Tg)\,dx = \int_\Omega h\,T(g)\,dx = \langle h, Tg\rangle_{L^2(\Omega)}, \qquad \int_\Omega \nabla(Tg)\cdot\nabla(Th)\,dx = \int_\Omega g\,T(h)\,dx = \langle g, Th\rangle_{L^2(\Omega)}.$$
Hence
$$\langle h, Tg\rangle_{L^2(\Omega)} = \langle g, Th\rangle_{L^2(\Omega)} = \int_\Omega \nabla(Th)\cdot\nabla(Tg)\,dx,$$
which shows that $T$ is self-adjoint in $L^2(\Omega)$.

(iii) $T$ is compact from $L^2(\Omega)$ into $L^2(\Omega)$. Take a bounded set $B$ in $L^2(\Omega)$. By (8.14), since $T$ is linear and continuous from $L^2(\Omega)$ into $H^1_0(\Omega)$, the set $T(B)$ is bounded in $H^1_0(\Omega)$. Since $\Omega$ is bounded, the Rellich–Kondrakov theorem (Theorem 5.3.3) then shows that $T(B)$ is relatively compact in $L^2(\Omega)$.

(iv) $T$ is positive definite. By (8.8) we have
$$\forall h \in L^2(\Omega) \qquad \langle Th, h\rangle = \int_\Omega |\nabla(Th)|^2\,dx \ge 0,$$
that is, $T$ is positive. Moreover, if $\langle Th, h\rangle = 0$, then $\nabla(Th) = 0$, that is, $Th$ is locally constant. Since $Th \in H^1_0(\Omega)$, this forces $Th = 0$. By the definition of $Th$, $Th = 0$ means that $\int_\Omega hv\,dx = 0$ for all $v \in H^1_0(\Omega)$. Since $H^1_0(\Omega)$ is dense in $L^2(\Omega)$, we conclude that $h = 0$.

To summarize this section: the operator $T = (-\Delta)^{-1}$ is a linear, continuous, self-adjoint, compact, positive definite operator from $L^2(\Omega)$ into $L^2(\Omega)$. In the next section, this will give us the spectral decomposition of the Laplace–Dirichlet operator.
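Properties (ii) and (iv) can also be observed numerically. On $\Omega = (0,1)$, $T$ has the integral representation $Th(x) = \int_0^1 G(x,y)h(y)\,dy$, with the classical Green's function $G(x,y) = \min(x,y)\,(1-\max(x,y))$. The symmetry $G(x,y) = G(y,x)$ is exactly the self-adjointness (ii). The sketch below discretizes this formula with a Riemann sum on a uniform grid. The grid, the test functions and the names are illustrative assumptions. For this particular grid, the Riemann-sum matrix coincides with the inverse of the standard three-point finite-difference Laplacian, which is why the discrete form of (8.8) holds up to rounding.

```python
import math

def green(x, y):
    """Green's function of -u'' = h on (0, 1) with u(0) = u(1) = 0."""
    return min(x, y) * (1.0 - max(x, y))

def apply_T(h, xs, dx):
    """Riemann-sum discretization of (Th)(x) = int_0^1 G(x, y) h(y) dy at the nodes xs."""
    return [dx * sum(green(x, y) * hy for y, hy in zip(xs, h)) for x in xs]

def inner(f, g, dx):
    """Discrete L^2(0, 1) scalar product."""
    return dx * sum(a * b for a, b in zip(f, g))

def dirichlet_energy(u, dx):
    """Discrete int_0^1 |u'|^2 for nodal values u with zero boundary values."""
    padded = [0.0] + list(u) + [0.0]
    return sum((padded[j + 1] - padded[j]) ** 2 for j in range(len(padded) - 1)) / dx

n_int = 99
dx = 1.0 / (n_int + 1)
xs = [(j + 1) * dx for j in range(n_int)]
h = [math.sin(3 * x) + x for x in xs]
g = [math.cos(5 * x) for x in xs]

Th, Tg = apply_T(h, xs, dx), apply_T(g, xs, dx)
print(abs(inner(Th, g, dx) - inner(h, Tg, dx)) < 1e-12)          # (ii) self-adjoint: True
print(inner(Th, h, dx) > 0)                                      # (iv) positive: True
print(abs(inner(Th, h, dx) - dirichlet_energy(Th, dx)) < 1e-10)  # discrete (8.8): True
```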
8.3 Existence of a Hilbertian basis of eigenvectors of the Laplace–Dirichlet operator
Let us first recall the well-known abstract "diagonalization" theorem for compact self-adjoint positive definite operators (see, for instance, Brezis [90]).

Theorem 8.3.1. Let $H$ be a separable Hilbert space with $\dim H = +\infty$, and let $T : H \to H$ be a linear, continuous, self-adjoint, compact, and positive definite operator. Then the following hold:
(i) $T$ is diagonalizable: there exists a Hilbertian basis of eigenvectors of $T$.
(ii) The set $\sigma(T)$ of eigenvalues of $T$ is countable. It can be written as a sequence $(\mu_n)_{n\in\mathbb{N}}$ of positive distinct real numbers that decreases to zero as $n \to +\infty$:
$$0 \leftarrow \mu_n < \cdots < \mu_3 < \mu_2 < \mu_1.$$
(iii) For each $\mu_n \in \sigma(T)$, $E_{\mu_n} = \ker(T - \mu_n I)$ is a finite-dimensional subspace of $H$: it is the eigensubspace associated with the eigenvalue $\mu_n$. Its dimension is called the multiplicity of $\mu_n$.
(iv) For all $\mu_i, \mu_j \in \sigma(T)$ with $\mu_i \neq \mu_j$, $E_{\mu_i} \perp E_{\mu_j}$ (orthogonal subspaces).
(v) $H = \bigoplus_{n\in\mathbb{N}} E_{\mu_n}$, i.e.,
$$\forall x \in H \quad x = \sum_{n\in\mathbb{N}} \operatorname{proj}_{E_{\mu_n}}(x) \qquad \text{and} \qquad \forall x \in H \quad Tx = \sum_{n\in\mathbb{N}} \mu_n \operatorname{proj}_{E_{\mu_n}}(x).$$
Remark 8.3.1. The situation described above is simplified by the assumption that $T$ is positive definite, i.e., $\ker T = \{0\}$. This lets us avoid considering $\mu = 0$ in the spectral decomposition; in general, the null space $\ker T$ may be infinite-dimensional.

Sketch of the proof of Theorem 8.3.1. It is worth recalling some of the basic ingredients of the proof.
(a) First, $\sigma(T)$ is a bounded subset of $(0, +\infty)$. If $\mu \in \sigma(T)$, there exists some $u \in H$, $u \neq 0$, such that $Tu = \mu u$. Hence $\langle Tu, u\rangle = \mu|u|^2$. Since $u \neq 0$ and $T$ is positive definite, $\langle Tu, u\rangle > 0$, which forces $\mu$ to be positive. On the other hand, since $T$ is linear and continuous,
$$\mu|u|^2 \le |Tu|_H |u|_H \le \|T\|_{\mathcal{L}(H,H)}|u|^2,$$
which gives $0 < \mu \le \|T\|_{\mathcal{L}(H,H)}$.
(b) Now take $\mu, \nu \in \sigma(T)$ with $\mu \neq \nu$. By definition of $\sigma(T)$, $E_\mu$ and $E_\nu$,
$$\forall h \in E_\mu \quad Th = \mu h, \qquad \forall k \in E_\nu \quad Tk = \nu k.$$
We deduce
$$\langle Th, k\rangle = \mu\langle h, k\rangle, \qquad \langle Tk, h\rangle = \nu\langle k, h\rangle.$$
Since $T$ is self-adjoint, $\langle Th, k\rangle = \langle h, Tk\rangle$. Hence $(\mu - \nu)\langle h, k\rangle = 0$. Since $\mu \neq \nu$, this forces $\langle h, k\rangle = 0$, that is, $E_\mu \perp E_\nu$.
(c) It is interesting to see where the compactness of $T$ comes into play. For any $\mu \in \sigma(T)$, the subspace $E_\mu$ is closed and invariant under $T$: if $h \in E_\mu$, i.e., $Th = \mu h$, then $T(Th) = \mu(Th)$, i.e., $Th \in E_\mu$. Hence $T : E_\mu \to E_\mu$, and $E_\mu$ is a Hilbert space for the structure induced by $H$. Moreover, if $B_{E_\mu}$ denotes the unit ball of $E_\mu$, then $T(B_{E_\mu}) = \mu B_{E_\mu}$. Since $\mu \neq 0$ and $T$ is compact, $B_{E_\mu}$ must be relatively compact, that is, $\dim E_\mu < +\infty$.

We have obtained the following decomposition of $H$ as a Hilbertian sum of eigenspaces:
$$H = \bigoplus_{n\in\mathbb{N}} E_{\mu_n}. \tag{8.15}$$
To derive a Hilbertian basis of $H$ from this formula, we pick in each $E_{\mu_n}$ an orthonormal basis whose cardinality equals the (finite) multiplicity of $\mu_n$. To keep the notation simple, we adopt the following convention.

Definition 8.3.1. We now count the eigenvalues of $T$ according to their multiplicity, that is,
$\mu_1$ is repeated $k_1$ times, where $k_1$ is the multiplicity of $\mu_1$,
...
$\mu_n$ is repeated $k_n$ times, where $k_n$ is the multiplicity of $\mu_n$,
... and so on.
In this way we obtain a sequence of positive real numbers, still denoted by $(\mu_n)_{n\in\mathbb{N}}$, such that
$$0 \leftarrow \mu_n \le \cdots \le \mu_3 \le \mu_2 \le \mu_1.$$
Note that the $\mu_i$ are now not necessarily distinct.
With this convention, we can pick an orthonormal basis in each finite-dimensional eigensubspace and obtain a Hilbertian basis $(h_n)_{n\in\mathbb{N}}$ of $H$ such that $Th_n = \mu_n h_n$ for every $n \in \mathbb{N}$.

Let us now come back to our model example. By Proposition 8.2.1, the operator $T = (-\Delta)^{-1}$, acting from $H = L^2(\Omega)$ into $L^2(\Omega)$, satisfies all the assumptions of the abstract diagonalization theorem, Theorem 8.3.1. Thus there exists a Hilbertian basis $(e_n)_{n\in\mathbb{N}}$ of $L^2(\Omega)$ such that each $e_n$ is an eigenvector of $T$. More precisely, $Te_n = \mu_n e_n$, where $(\mu_n)_{n\in\mathbb{N}}$ is a sequence of positive numbers decreasing to zero. By Lemma 8.2.1, $1/\mu_n$ is an eigenvalue of the Laplace–Dirichlet operator and $e_n$ is a corresponding eigenvector. This means
$$\begin{cases} -\Delta e_n = \dfrac{1}{\mu_n} e_n & \text{on } \Omega,\\ e_n = 0 & \text{on } \partial\Omega, \end{cases}$$
where $e_n$ is understood in the variational sense, i.e., $e_n \in H^1_0(\Omega)$ and
$$\int_\Omega \nabla e_n\cdot\nabla v\,dx = \frac{1}{\mu_n}\int_\Omega e_n v\,dx \qquad \forall v \in H^1_0(\Omega).$$
It is immediate to verify that this equality is equivalent to $T(e_n/\mu_n) = e_n$, that is, $T(e_n) = \mu_n e_n$.

Let us now set aside the operator $T = (-\Delta)^{-1}$, which was only a technical ingredient, and restate the previous results directly in terms of $-\Delta$, the Laplace–Dirichlet operator. The sequence $(\lambda_n)_{n\in\mathbb{N}}$ with $\lambda_n = 1/\mu_n$ is an increasing sequence of positive numbers tending to $+\infty$ as $n \to +\infty$. We thus obtain the following theorem.

Theorem 8.3.2. The Laplace–Dirichlet operator has a countable family of eigenvalues $(\lambda_n)_{n\in\mathbb{N}}$, which can be written as an increasing sequence of positive numbers tending to $+\infty$ as $n \to +\infty$:
$$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \le \cdots.$$
Each eigenvalue is repeated a number of times equal to its multiplicity, which is finite. There exists a Hilbertian basis $(e_n)_{n\in\mathbb{N}}$ of $L^2(\Omega)$ such that, for each $n \in \mathbb{N}$, $e_n$ is an eigenvector of the Laplace–Dirichlet operator associated with the eigenvalue $\lambda_n$:
$$\begin{cases} -\Delta e_n = \lambda_n e_n & \text{on } \Omega,\\ e_n = 0 & \text{on } \partial\Omega. \end{cases}$$
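Together with Lemma 8.2.1, Theorem 8.3.2 suggests a simple way to approximate $\lambda_1$. The largest eigenvalue $\mu_1$ of the compact operator $T$ can be found by power iteration, and then $\lambda_1 = 1/\mu_1$. The sketch below assumes $\Omega = (0,1)$, a uniform grid, and $T$ discretized through the Green's function $G(x,y) = \min(x,y)(1-\max(x,y))$. It illustrates the principle and is not an efficient eigensolver. On this grid the discrete operator is the inverse of the three-point finite-difference Laplacian, whose smallest eigenvalue has the closed form $\lambda_1^h = \frac{4}{\Delta x^2}\sin^2\frac{\pi\Delta x}{2}$. This value tends to $\lambda_1 = \pi^2$ as the grid is refined.

```python
import math

def green(x, y):
    """Green's function of -u'' = h on (0, 1) with u(0) = u(1) = 0."""
    return min(x, y) * (1.0 - max(x, y))

def power_iteration_T(n_int=99, iters=100):
    """Power iteration on the discrete T. Returns (1/mu_1, normalized eigenvector, dx)."""
    dx = 1.0 / (n_int + 1)
    xs = [(j + 1) * dx for j in range(n_int)]
    kernel = [[dx * green(x, y) for y in xs] for x in xs]   # matrix of the discrete T
    u = [1.0] * n_int                                         # positive starting vector
    for _ in range(iters):
        tu = [sum(k * v for k, v in zip(row, u)) for row in kernel]
        norm = math.sqrt(sum(v * v for v in tu))
        u = [v / norm for v in tu]
    tu = [sum(k * v for k, v in zip(row, u)) for row in kernel]
    mu1 = sum(a * b for a, b in zip(tu, u))                   # Rayleigh quotient <Tu, u>, since |u| = 1
    return 1.0 / mu1, u, dx

lam1, e1, dx = power_iteration_T()
lam1_h = 4.0 / dx ** 2 * math.sin(math.pi * dx / 2) ** 2
print(lam1, lam1_h, math.pi ** 2)
```

Convergence is fast here because the ratio $\mu_2/\mu_1 = \lambda_1/\lambda_2$ is close to $1/4$. The iterates also stay positive, in line with the positivity of the first eigenvector discussed in Section 8.5.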
We already mentioned that the spectral analysis of the Laplace–Dirichlet operator could equally well have been developed in the space $H^1_0(\Omega)$. There is in fact a direct link between the two approaches, described below.

Proposition 8.3.1. The family $\left(e_n/\sqrt{\lambda_n}\right)_{n\in\mathbb{N}}$ is a Hilbertian basis of the space $H^1_0(\Omega)$ equipped with the scalar product
$$\langle u, v\rangle = \int_\Omega \nabla u\cdot\nabla v\,dx.$$
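Proof. (a) Let us start from the definition of $e_n$:
$$\int_\Omega \nabla e_n\cdot\nabla v\,dx = \lambda_n \int_\Omega e_n v\,dx \qquad \forall v \in H^1_0(\Omega).$$
Taking $v = e_n \in H^1_0(\Omega)$, we obtain
$$\int_\Omega |\nabla e_n|^2\,dx = \lambda_n \int_\Omega e_n^2\,dx = \lambda_n.$$
Hence
$$\|e_n\|^2_{H^1_0(\Omega)} = \lambda_n \qquad \text{and} \qquad \left\|\frac{e_n}{\sqrt{\lambda_n}}\right\|_{H^1_0(\Omega)} = 1.$$
(b) Let us verify the orthogonality property in $H^1_0(\Omega)$. For $n \neq m$,
$$\left\langle \frac{e_n}{\sqrt{\lambda_n}}, \frac{e_m}{\sqrt{\lambda_m}} \right\rangle_{H^1_0(\Omega)} = \frac{1}{\sqrt{\lambda_n\lambda_m}}\int_\Omega \nabla e_n\cdot\nabla e_m\,dx = \frac{\lambda_n}{\sqrt{\lambda_n\lambda_m}}\int_\Omega e_n e_m\,dx,$$
which is zero because $(e_n)_{n\in\mathbb{N}}$ is an orthogonal system in $L^2(\Omega)$.
(c) Let us verify that $\left(e_n/\sqrt{\lambda_n}\right)_{n\in\mathbb{N}}$ spans a dense subspace of $H^1_0(\Omega)$. Equivalently, we must show that if $f \in H^1_0(\Omega)$ satisfies, for all $n \in \mathbb{N}$,
$$\left\langle f, \frac{e_n}{\sqrt{\lambda_n}} \right\rangle_{H^1_0(\Omega)} = 0,$$
then $f = 0$. Observe that
$$\left\langle f, \frac{e_n}{\sqrt{\lambda_n}} \right\rangle_{H^1_0(\Omega)} = \frac{1}{\sqrt{\lambda_n}}\int_\Omega \nabla f\cdot\nabla e_n\,dx = \sqrt{\lambda_n}\int_\Omega f e_n\,dx.$$
Since $\lambda_n \neq 0$, the assumption becomes $\int_\Omega f e_n\,dx = 0$ for all $n \in \mathbb{N}$. This implies $f = 0$ because $(e_n)_{n\in\mathbb{N}}$ is an orthonormal basis of $L^2(\Omega)$.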
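Proposition 8.3.1 can be checked concretely on $\Omega = (0,1)$. There, $e_n(x) = \sqrt{2}\sin(n\pi x)$ and $\lambda_n = n^2\pi^2$, which are standard facts for this interval. The sketch below uses a midpoint quadrature and names of our own choosing. It computes the Gram matrix of the family $e_n/\sqrt{\lambda_n}$, $n = 1, \dots, 4$, for the scalar product $\int_0^1 u'v'\,dx$, and finds the identity matrix up to rounding.

```python
import math

def h10_inner(df, dg, n_quad=4000):
    """<f, g>_{H^1_0(0,1)} = int_0^1 f'(x) g'(x) dx, computed with the midpoint rule."""
    h = 1.0 / n_quad
    return h * sum(df((j + 0.5) * h) * dg((j + 0.5) * h) for j in range(n_quad))

def scaled_derivative(n):
    """Derivative of e_n / sqrt(lambda_n), where e_n(x) = sqrt(2) sin(n pi x), lambda_n = (n pi)^2."""
    lam = (n * math.pi) ** 2
    return lambda x: math.sqrt(2.0) * n * math.pi * math.cos(n * math.pi * x) / math.sqrt(lam)

gram = [[h10_inner(scaled_derivative(n), scaled_derivative(m)) for m in range(1, 5)]
        for n in range(1, 5)]
for row in gram:
    print([round(v, 6) for v in row])
```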
8.4 The Courant–Fisher min-max and max-min formulas

Let us start with the variational characterization of the first eigenvalue $\lambda_1(-\Delta)$ of the Laplace–Dirichlet operator, which we simply write $\lambda_1$ when there is no ambiguity. To introduce this result,
note that if $\lambda$ is an eigenvalue of $-\Delta$, then there exists some $u \in H^1_0(\Omega)$, $u \neq 0$, such that
$$\int_\Omega \nabla u\cdot\nabla v\,dx = \lambda \int_\Omega uv\,dx \qquad \forall v \in H^1_0(\Omega).$$
Taking $v = u$ and using $u \neq 0$, we obtain
$$\lambda = \frac{\int_\Omega |\nabla u(x)|^2\,dx}{\int_\Omega u(x)^2\,dx}. \tag{8.16}$$
This expression plays a central role in the variational approach to eigenvalue problems for the Laplace equation.

Definition 8.4.1. For any $v \in H^1_0(\Omega)$, $v \neq 0$, let
$$R(v) = \frac{\int_\Omega |\nabla v(x)|^2\,dx}{\int_\Omega v(x)^2\,dx}.$$
The map $R : H^1_0(\Omega)\setminus\{0\} \to \mathbb{R}^+$ is called the Rayleigh quotient. From (8.16) we immediately obtain that, for any eigenvalue $\lambda$ of the Laplace–Dirichlet operator,
$$\lambda \ge \inf\left\{R(v) : v \in H^1_0(\Omega),\ v \neq 0\right\},$$
which is equivalent to
$$\lambda_1 \ge \inf\left\{R(v) : v \in H^1_0(\Omega),\ v \neq 0\right\}. \tag{8.17}$$
In fact, the two sides are equal. This is the object of the following theorem.

Theorem 8.4.1 (Courant–Fisher formula). Assume $\Omega$ is a bounded open subset of $\mathbb{R}^N$. The first eigenvalue $\lambda_1$ of the Laplace–Dirichlet operator on $\Omega$ is given by the variational formula
$$\lambda_1 = \min\left\{\frac{\int_\Omega |\nabla v(x)|^2\,dx}{\int_\Omega v^2(x)\,dx} : v \in H^1_0(\Omega),\ v \neq 0\right\}.$$
Moreover, the infimum is attained, and the solutions of this variational problem are exactly the eigenvectors associated with the first eigenvalue $\lambda_1$.

Proof. By (8.17), we only need to prove the inequality
$$\inf\left\{R(v) : v \in H^1_0(\Omega),\ v \neq 0\right\} \ge \lambda_1,$$
that is,
$$\forall v \in H^1_0(\Omega),\ v \neq 0, \qquad R(v) \ge \lambda_1. \tag{8.18}$$
The idea is to express $R(v)$, for any $v \in H^1_0(\Omega)$, in a Hilbertian basis of eigenvectors of $-\Delta$. By Theorem 8.3.2 and Proposition 8.3.1, there exists a Hilbertian basis $(e_n)_{n\in\mathbb{N}}$ of $L^2(\Omega)$ such that each $e_n$ is an eigenvector of $-\Delta$ associated with the eigenvalue $\lambda_n$, and $(e_n/\sqrt{\lambda_n})_{n\in\mathbb{N}}$ is a corresponding Hilbertian basis of $H^1_0(\Omega)$. The Bessel–Parseval equality in $H^1_0(\Omega)$ and in $L^2(\Omega)$, respectively, gives
$$\|v\|^2_{H^1_0(\Omega)} = \int_\Omega |\nabla v(x)|^2\,dx = \sum_{n=1}^{+\infty} \left\langle v, \frac{e_n}{\sqrt{\lambda_n}}\right\rangle^2_{H^1_0(\Omega)}, \tag{8.19}$$
$$\|v\|^2_{L^2(\Omega)} = \int_\Omega v(x)^2\,dx = \sum_{n=1}^{+\infty} \langle v, e_n\rangle^2_{L^2(\Omega)}. \tag{8.20}$$
These two quantities are easily compared using the fact that $e_n$ is an eigenvector of $-\Delta$ associated with $\lambda_n$. Indeed, (8.19) gives
$$\|v\|^2_{H^1_0(\Omega)} = \sum_{n=1}^{+\infty} \frac{1}{\lambda_n}\left(\int_\Omega \nabla v\cdot\nabla e_n\,dx\right)^2 = \sum_{n=1}^{+\infty} \frac{\lambda_n^2}{\lambda_n}\left(\int_\Omega e_n v\,dx\right)^2 = \sum_{n=1}^{+\infty} \lambda_n \langle v, e_n\rangle^2_{L^2(\Omega)} \ge \lambda_1 \sum_{n=1}^{+\infty} \langle v, e_n\rangle^2_{L^2(\Omega)} = \lambda_1 \|v\|^2_{L^2(\Omega)},$$
where the last equality uses (8.20). Hence $R(v) \ge \lambda_1$ for any $v \in H^1_0(\Omega)$, $v \neq 0$, which proves the first part of the theorem.

By (8.16), any eigenvector $v$ associated with $\lambda_1$ solves the Courant–Fisher minimization problem. Conversely, if $v$ is such a solution, the Parseval equalities (8.19) and (8.20) give
$$R(v) = \frac{\|v\|^2_{H^1_0(\Omega)}}{\|v\|^2_{L^2(\Omega)}} = \frac{\sum_{n=1}^{+\infty} \lambda_n \langle v, e_n\rangle^2}{\sum_{n=1}^{+\infty} \langle v, e_n\rangle^2} = \lambda_1, \tag{8.21}$$
i.e.,
$$\sum_{n=1}^{+\infty} (\lambda_n - \lambda_1)\langle v, e_n\rangle^2 = 0.$$
This forces $\langle v, e_n\rangle = 0$ for every index $n \in \mathbb{N}^*$ such that $\lambda_n \neq \lambda_1$. Equivalently, $v$ belongs to the eigenspace associated with $\lambda_1$.

Another elegant and direct approach to the Courant–Fisher formula relies on the theory of Lagrange multipliers. Let us start from the following equivalent formulation of the Courant–Fisher variational formula for $\lambda_1$:
$$\inf\left\{\int_\Omega |\nabla v(x)|^2\,dx : v \in H^1_0(\Omega),\ \int_\Omega v(x)^2\,dx = 1\right\}. \tag{P}$$
Problem (P) is now viewed as a constrained minimization problem: minimizing the Dirichlet energy functional on the unit sphere of $L^2(\Omega)$. This is where Lagrange multipliers naturally come into play. First, the direct methods of the calculus of variations together with the Rellich–Kondrakov theorem easily show that problem (P) admits a solution $u$. To write an optimality condition for such a solution $u$, we take $V = H^1_0(\Omega)$ with $\langle u, v\rangle = \int_\Omega \nabla u\cdot\nabla v\,dx$ and $\|v\| = \sqrt{\langle v, v\rangle}$, and we consider the functionals
$$F(v) = \int_\Omega |\nabla v(x)|^2\,dx = \|v\|^2, \qquad G(v) = \int_\Omega v^2(x)\,dx.$$
Then (P) is equivalent to
$$\min\left\{F(v) : G(v) = 1,\ v \in V\right\}.$$
The sphere $S = \{v \in V : G(v) = 1\}$ is a submanifold of class $C^1$ and codimension 1 in $V$. Indeed, for any $u, v$, as $t \to 0$,
$$\frac{1}{t}\left(G(u + tv) - G(u)\right) \longrightarrow 2\int_\Omega uv\,dx.$$
We interpret this limit as a linear continuous form on $H^1_0(\Omega)$. The identity
$$\int_\Omega uv\,dx = \langle h, v\rangle_{H^1_0(\Omega)} = \int_\Omega \nabla h\cdot\nabla v\,dx \qquad \forall v \in H^1_0(\Omega)$$
means
$$\begin{cases} -\Delta h = u & \text{on } \Omega,\\ h = 0 & \text{on } \partial\Omega, \end{cases}$$
i.e., $h = Tu$, where $T = (-\Delta)^{-1}$. Hence $G$ is Fréchet differentiable on $V$ and $\nabla G(u) = 2Tu$. The theory of Lagrange multipliers applies, so there exists $\mu \in \mathbb{R}$ such that $\nabla F(u) = \mu\nabla G(u)$, that is, $u = \mu Tu$. Equivalently,
$$\begin{cases} -\Delta u = \mu u & \text{on } \Omega,\\ u = 0 & \text{on } \partial\Omega, \end{cases}$$
which implies
$$\mu = R(u) = \frac{\int_\Omega |\nabla u(x)|^2\,dx}{\int_\Omega u(x)^2\,dx} = \lambda_1.$$
Let us summarize the previous results in the following statement.
Proposition 8.4.1. The first eigenvalue $\lambda_1(-\Delta)$ of the Laplace–Dirichlet operator is a Lagrange multiplier of the constrained minimization problem
$$\min\left\{\int_\Omega |\nabla v(x)|^2\,dx : v \in H^1_0(\Omega),\ \int_\Omega v^2(x)\,dx = 1\right\}.$$
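Identity (8.21), used in the proof of Theorem 8.4.1, says that $R(v)$ is a weighted average of the eigenvalues $\lambda_n$, with weights $\langle v, e_n\rangle^2$. This can be checked directly on $\Omega = (0,\pi)$, where $\lambda_n = n^2$ and the eigenvectors are multiples of $\sin(nx)$. In the Python sketch below, the function names are illustrative. `rayleigh_from_coeffs` evaluates the right-hand side of (8.21) for $v = \sum_n c_n \sin(nx)$. `rayleigh_by_quadrature` evaluates $\int |v'|^2 / \int v^2$ directly. Since all the $\sin(nx)$ have the same $L^2$ norm on $(0,\pi)$, this normalization cancels in the quotient.

```python
import math

def rayleigh_from_coeffs(c):
    """R(v) for v = sum_n c[n-1] sin(n x) on (0, pi), via (8.21) with lambda_n = n^2."""
    num = sum((n ** 2) * cn ** 2 for n, cn in enumerate(c, start=1))
    den = sum(cn ** 2 for cn in c)
    return num / den

def rayleigh_by_quadrature(c, n_quad=4000):
    """R(v) = int |v'|^2 / int v^2 with the midpoint rule (the common factor h cancels)."""
    h = math.pi / n_quad
    xs = [(j + 0.5) * h for j in range(n_quad)]
    v = lambda x: sum(cn * math.sin(n * x) for n, cn in enumerate(c, start=1))
    dv = lambda x: sum(cn * n * math.cos(n * x) for n, cn in enumerate(c, start=1))
    return sum(dv(x) ** 2 for x in xs) / sum(v(x) ** 2 for x in xs)

print(rayleigh_from_coeffs([1.0, 1.0]))   # 2.5, between lambda_1 = 1 and lambda_2 = 4
print(rayleigh_from_coeffs([3.0]))        # 1.0: v is a first eigenvector, so R(v) = lambda_1
```

Any admixture of higher modes pushes the weighted average strictly above $\lambda_1 = 1$. This is the mechanism behind the characterization of the minimizers in Theorem 8.4.1.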
The above approach gives a direct variational proof that the Laplace–Dirichlet operator has an eigenvalue, namely the first one, $\lambda_1(-\Delta)$. The whole theory can then be developed recursively in the same way: at the next step, we apply the same argument in the orthogonal complement of $V_1$ (the eigenspace associated with $\lambda_1$), and so on. Let us make this precise in the following statement.

Proposition 8.4.2. Let $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \le \cdots$ be the sequence of eigenvalues of the Laplace–Dirichlet operator, repeated according to their multiplicities, and let $(e_n)_{n\in\mathbb{N}}$ be a corresponding Hilbertian basis of eigenvectors in $L^2(\Omega)$:
$$\begin{cases} -\Delta e_n = \lambda_n e_n & \text{on } \Omega,\\ e_n = 0 & \text{on } \partial\Omega. \end{cases}$$
Denote by $V_n$ the subspace of $H^1_0(\Omega)$ spanned by the first $n$ eigenvectors,
$$V_n = \operatorname{span}\{e_1, e_2, \dots, e_n\},$$
and by $V_n^\perp$ the orthogonal complement of $V_n$ in $H^1_0(\Omega)$ with respect to the scalar product of $H^1_0(\Omega)$:
$$\langle u, v\rangle = \int_\Omega \nabla u\cdot\nabla v\,dx.$$
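Then the following variational formulas hold:
$$\lambda_1 = \min\left\{R(v) : v \in H^1_0(\Omega),\ v \neq 0\right\},$$
$$\lambda_2 = \min\left\{R(v) : v \in V_1^\perp,\ v \neq 0\right\},$$
$$\vdots$$
$$\lambda_n = \min\left\{R(v) : v \in V_{n-1}^\perp,\ v \neq 0\right\},$$
$$\vdots$$
Proof. By definition of $\lambda_n$ and $e_n$,
$$\int_\Omega \nabla e_n\cdot\nabla v\,dx = \lambda_n \int_\Omega e_n v\,dx \qquad \forall v \in H^1_0(\Omega).$$
Taking $v = e_n$, we obtain
$$R(e_n) = \lambda_n, \qquad n = 1, 2, \dots.$$
Since $e_n \in V_{n-1}^\perp$, we first obtain
$$\lambda_n \ge \inf\left\{R(v) : v \in V_{n-1}^\perp,\ v \neq 0\right\}. \tag{8.22}$$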
For the reverse inequality, we argue as in the proof of Theorem 8.4.1, relying on the Bessel–Parseval equality. Recall from (8.21) that for any $v \in H^1_0(\Omega)$, $v \neq 0$,
$$R(v) = \frac{\sum_{i=1}^{+\infty} \lambda_i \langle v, e_i\rangle^2_{L^2(\Omega)}}{\sum_{i=1}^{+\infty} \langle v, e_i\rangle^2_{L^2(\Omega)}},$$
and that $\langle v, e_i\rangle_{H^1_0(\Omega)} = \lambda_i \langle v, e_i\rangle_{L^2(\Omega)}$. The latter identity lets us pass from orthogonality in $H^1_0(\Omega)$ to orthogonality in $L^2(\Omega)$ and vice versa. Hence, for any $v \in V_{n-1}^\perp$, $v \neq 0$,
$$R(v) = \frac{\sum_{i\ge n} \lambda_i \langle v, e_i\rangle^2_{L^2(\Omega)}}{\sum_{i\ge n} \langle v, e_i\rangle^2_{L^2(\Omega)}} \ge \lambda_n.$$
Therefore
$$\inf\left\{R(v) : v \in V_{n-1}^\perp,\ v \neq 0\right\} \ge \lambda_n. \tag{8.23}$$
Combining (8.22) and (8.23) yields the result. That the infimum is actually attained follows from the direct methods of the calculus of variations, since the functional $v \mapsto R(v)$ is coercive and weakly lower semicontinuous. It can also be seen directly from $R(e_n) = \lambda_n$.

We can now state the Courant–Fisher min-max and max-min formulas.

Theorem 8.4.2 (Courant–Fisher min-max and max-min formulas). With the same notation as in Proposition 8.4.2,
$$\lambda_n = \min_{M\in\mathcal{L}_n}\ \max_{v\in M,\ v\neq 0} R(v) = \max_{M\in\mathcal{L}_{n-1}}\ \min_{v\in M^\perp,\ v\neq 0} R(v),$$
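where $\mathcal{L}_n$ is the class of all $n$-dimensional linear subspaces of $H^1_0(\Omega)$ and $M^\perp$ denotes the orthogonal complement of $M$ in $H^1_0(\Omega)$.

Proof. (a) Take first $M = V_n = \operatorname{span}\{e_1, \dots, e_n\}$. For any $v \in V_n$, we have $\langle v, e_i\rangle_{H^1_0(\Omega)} = \langle v, e_i\rangle_{L^2(\Omega)} = 0$ for all $i > n$. Writing $R(v)$ as in (8.21),
$$R(v) = \frac{\sum_{i=1}^{+\infty} \lambda_i \langle v, e_i\rangle^2_{L^2(\Omega)}}{\sum_{i=1}^{+\infty} \langle v, e_i\rangle^2_{L^2(\Omega)}},$$
we obtain, for $v \in V_n$, $v \neq 0$,
$$R(v) = \frac{\sum_{i=1}^{n} \lambda_i \langle v, e_i\rangle^2_{L^2(\Omega)}}{\sum_{i=1}^{n} \langle v, e_i\rangle^2_{L^2(\Omega)}} \le \lambda_n.$$
Hence
$$\max\left\{R(v) : v \in V_n,\ v \neq 0\right\} \le \lambda_n \tag{8.24}$$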
(in fact, equality holds, as taking $v = e_n$ shows), and consequently
$$\min_{M\in\mathcal{L}_n}\ \max_{v\in M,\ v\neq 0} R(v) \le \lambda_n.$$
Let us now prove the reverse inequality. Equivalently, we must prove that for any $n$-dimensional subspace $M$ of $H^1_0(\Omega)$,
$$\lambda_n \le \max\left\{R(v) : v \in M,\ v \neq 0\right\}. \tag{8.25}$$
Take a subspace $M$ of $H^1_0(\Omega)$ with $\dim M = n$. We claim that
$$M \cap V_{n-1}^\perp \neq \{0\}.$$
This follows from the classical relation between the dimensions of the image and of the kernel of a linear mapping. Let $P : M \to V_{n-1}$ be the linear mapping that associates with any $v \in M$ its projection on $V_{n-1}$, $P(v) = \operatorname{proj}_{V_{n-1}} v$. We have
$$\dim M = n = \dim(\ker P) + \dim(P(M)).$$
Since the image $P(M)$ is contained in $V_{n-1}$, we have $\dim(P(M)) \le n - 1$. Hence $\dim(\ker P) \ge n - (n - 1) = 1$. Equivalently, there exists some $v \in M$, $v \neq 0$, such that $\operatorname{proj}_{V_{n-1}} v = 0$, that is, $v \in M \cap V_{n-1}^\perp$. This proves the claim.

Now take any $v \in M \cap V_{n-1}^\perp$, $v \neq 0$. By Proposition 8.4.2,
$$\lambda_n = \min\left\{R(w) : w \in V_{n-1}^\perp,\ w \neq 0\right\}.$$
Hence
$$\lambda_n \le R(v) \le \max\left\{R(w) : w \in M,\ w \neq 0\right\},$$
which proves (8.25) and completes the proof of the min-max formula.

(b) The proof of the max-min formula is very similar. First, by Proposition 8.4.2,
$$\lambda_n = \min\left\{R(v) : v \in V_{n-1}^\perp,\ v \neq 0\right\}. \tag{8.26}$$
Hence
$$\lambda_n \le \sup_{M\in\mathcal{L}_{n-1}}\ \inf_{v\in M^\perp,\ v\neq 0} R(v). \tag{8.27}$$
To prove the reverse inequality, we must show that for any $M \in \mathcal{L}_{n-1}$,
$$\lambda_n \ge \inf_{v\in M^\perp,\ v\neq 0} R(v). \tag{8.28}$$
We claim that there exists some $v \in M^\perp \cap V_n$ with $v \neq 0$. To see this, consider the linear mapping $Q : V_n \to M$ that associates with any $v \in V_n$ its projection $Q(v) = \operatorname{proj}_M v$. We have
$$n = \dim V_n = \dim(\ker Q) + \dim Q(V_n).$$
Since $\dim Q(V_n) \le \dim M = n - 1$, we get $\dim(\ker Q) \ge 1$. Equivalently, there exists some $v \in V_n$, $v \neq 0$, such that $\operatorname{proj}_M v = 0$, that is, $v \in M^\perp \cap V_n$. By (8.24),
$$\lambda_n \ge R(v) \ge \inf_{w\in M^\perp,\ w\neq 0} R(w),$$
which is (8.28). Hence $\lambda_n = \sup_{M\in\mathcal{L}_{n-1}} \inf_{v\in M^\perp,\ v\neq 0} R(v)$. Moreover, by (8.26), the sup is a max (attained at $M = V_{n-1}$) and the inf is a min (attained at $v = e_n$). Finally,
$$\lambda_n = \max_{M\in\mathcal{L}_{n-1}}\ \min_{v\in M^\perp,\ v\neq 0} R(v),$$
which completes the proof.

Remark 8.4.1. (1) The Courant–Fisher min-max and max-min principles, which characterize the eigenvalues of the Laplace–Dirichlet operator variationally, hold for the sequence $(\lambda_n)_{n\in\mathbb{N}}$ of eigenvalues counted with multiplicity. This further justifies the convention, which records both the values of the eigenvalues and their multiplicities. For these reasons, the sequence of eigenvalues of the Laplace–Dirichlet operator will always mean $(\lambda_n)_{n\in\mathbb{N}}$ counted with multiplicity, i.e., with each $\lambda_n$ repeated a number of times equal to its multiplicity.
(2) In Proposition 8.4.2, the $(\lambda_n)_{n\in\mathbb{N}}$ are obtained by a recursive formula: one must first know $V_{n-1}$ to obtain $\lambda_n$. By contrast, the Courant–Fisher min-max and max-min principles give a direct variational formulation of each eigenvalue of the Laplace–Dirichlet operator.
(3) The Courant–Fisher min-max principle is the point of departure for the Ljusternik–Schnirelman theory of critical points. Indeed, in 1930, Ljusternik wrote, "The theory of eigenvalues of quadratic form developed by R. Courant enables one to discern their existence and reality without calculations. We shall generalize their theory to arbitrary functions having continuous second partial derivatives." Typically, the Ljusternik theory deals with the variational approach to nonlinear eigenvalue problems of the type
$$F'(u) = \lambda u, \qquad u \in X,\ \lambda \in \mathbb{R},\ \|u\| = 1,$$
where $X$ is a separable Hilbert space with $\dim X = +\infty$, and $F : X \to \mathbb{R}$ is even and of class $C^1$, with $F'$ compact.
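The min-max formula can be tested numerically in a finite-dimensional setting. The sketch below makes several assumptions for illustration. It takes $\Omega = (0,1)$, replaces $-\Delta$ by the three-point finite-difference matrix $A$, and replaces the $L^2$ scalar product by the Euclidean one, so that $R_h(v) = \langle Av, v\rangle / \langle v, v\rangle$. For a two-dimensional subspace $M = \operatorname{span}\{f_1, f_2\}$, $\max_{v\in M} R_h(v)$ is the largest root $\mu$ of $\det(K - \mu G) = 0$, where $K_{ij} = \langle Af_i, f_j\rangle$ and $G_{ij} = \langle f_i, f_j\rangle$. By the min-max principle for the symmetric matrix $A$, this maximum never falls below the second discrete eigenvalue $\lambda_2^h = \frac{4}{\Delta x^2}\sin^2(\pi\Delta x)$. It equals $\lambda_2^h$ when $M$ is spanned by the first two discrete eigenvectors.

```python
import math

def apply_A(v, dx):
    """Three-point Dirichlet Laplacian on (0, 1): (-v[j-1] + 2 v[j] - v[j+1]) / dx^2."""
    padded = [0.0] + list(v) + [0.0]
    return [(-padded[j] + 2.0 * padded[j + 1] - padded[j + 2]) / dx ** 2 for j in range(len(v))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def max_rayleigh_on_plane(f1, f2, dx):
    """max of <Av, v>/<v, v> over v in span{f1, f2}, v != 0 (f1, f2 linearly independent)."""
    a1, a2 = apply_A(f1, dx), apply_A(f2, dx)
    k11, k12, k22 = dot(a1, f1), dot(a1, f2), dot(a2, f2)
    g11, g12, g22 = dot(f1, f1), dot(f1, f2), dot(f2, f2)
    # det(K - mu G) = a mu^2 + b mu + c; its largest root is the requested maximum.
    a = g11 * g22 - g12 ** 2
    b = -(k11 * g22 + k22 * g11 - 2.0 * k12 * g12)
    c = k11 * k22 - k12 ** 2
    return (-b + math.sqrt(max(b * b - 4.0 * a * c, 0.0))) / (2.0 * a)

n_int = 99
dx = 1.0 / (n_int + 1)
xs = [(j + 1) * dx for j in range(n_int)]
lam2_h = 4.0 / dx ** 2 * math.sin(math.pi * dx) ** 2

# M spanned by the first two discrete eigenvectors: the maximum is lambda_2^h.
s1 = [math.sin(math.pi * x) for x in xs]
s2 = [math.sin(2 * math.pi * x) for x in xs]
print(max_rayleigh_on_plane(s1, s2, dx), lam2_h)

# An arbitrary two-dimensional subspace: the maximum is at least lambda_2^h.
f1 = [x * (1 - x) for x in xs]
f2 = [x * (1 - x) * (x - 0.3) for x in xs]
print(max_rayleigh_on_plane(f1, f2, dx) >= lam2_h)  # True
```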
Let us conclude this section with a direct connection between the first eigenvalue $\lambda_1(-\Delta)$ of the Laplace–Dirichlet operator and the Poincaré constant (cf. Definition 5.3.1). Recall that the Poincaré constant is the smallest constant $C$ such that, for any $v \in H^1_0(\Omega)$,
$$\left(\int_\Omega v(x)^2\,dx\right)^{1/2} \le C\left(\int_\Omega |\nabla v(x)|^2\,dx\right)^{1/2}.$$
This is equivalent to
$$\frac{1}{C^2} = \inf\left\{R(v) : v \in H^1_0(\Omega),\ v \neq 0\right\},$$
i.e., $1/C^2 = \lambda_1$. In other words, we have obtained the following result.

Proposition 8.4.3. The Poincaré constant $C$ and the first eigenvalue $\lambda_1$ of the Laplace–Dirichlet operator are related by
$$\frac{1}{C^2} = \lambda_1.$$
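On $\Omega = (0,1)$ we have $\lambda_1 = \pi^2$, so Proposition 8.4.3 gives the Poincaré constant $C = 1/\pi$. The sketch below is an illustration with function names of our own. It computes the Rayleigh quotient of polynomial test functions $v$ with $v(0) = v(1) = 0$ (hence $v \in H^1_0(0,1)$) exactly, using rational arithmetic. For example, $v = x(1-x)$ gives $R(v) = (1/3)/(1/30) = 10 \ge \pi^2$, which is consistent with the Poincaré inequality for $C = 1/\pi$.

```python
from fractions import Fraction
import math

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists [a0, a1, a2, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_deriv(p):
    """Derivative of a polynomial given by its coefficient list."""
    return [i * a for i, a in enumerate(p)][1:]

def integral_01(p):
    """Exact value of int_0^1 p(x) dx."""
    return sum(Fraction(a) / (i + 1) for i, a in enumerate(p))

def rayleigh_exact(p):
    """R(v) = int |v'|^2 / int v^2 on (0, 1), for a polynomial v with v(0) = v(1) = 0."""
    dp = poly_deriv(p)
    return integral_01(poly_mul(dp, dp)) / integral_01(poly_mul(p, p))

print(rayleigh_exact([0, 1, -1]))      # 10  (v = x - x^2)
print(rayleigh_exact([0, 0, 1, -1]))   # 14  (v = x^2 - x^3)
print(math.pi ** 2)                    # lambda_1 = 1/C^2 on (0, 1)
```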
8.5 Multiplicity and asymptotic properties of the eigenvalues of the Laplace–Dirichlet operator
The first eigenvalue $\lambda_1(-\Delta)$ plays a fundamental role, for example, in the analysis of resonance phenomena for vibrating structures and in some related shape optimization problems. It also enjoys remarkable properties, stated in the following result.

Theorem 8.5.1. Let $\Omega$ be a bounded, connected, regular open set in $\mathbb{R}^N$. The first eigenvalue $\lambda_1$ of the Laplace–Dirichlet operator has multiplicity one. Its eigenspace is spanned by a vector $e_1 \in H^1_0(\Omega)$ such that $e_1 > 0$ on $\Omega$.

Proof. Denote by $E_1$ the eigenspace associated with the first eigenvalue $\lambda_1$. By the Courant–Fisher theorem, Theorem 8.4.1, the elements $v \in E_1$, $v \neq 0$, are exactly the solutions of the minimization problem
$$\min\left\{\frac{\int_\Omega |\nabla v(x)|^2\,dx}{\int_\Omega v(x)^2\,dx} : v \in H^1_0(\Omega),\ v \neq 0\right\}. \tag{P}$$
An important consequence of this formula is that $v \in E_1$ automatically implies $|v| \in E_1$. This follows from the fact that truncations operate on the space $H^1_0(\Omega)$. In particular (see Corollary 5.8.1), for any $v \in H^1_0(\Omega)$, $|v| \in H^1_0(\Omega)$ and $R(|v|) = R(v)$. The rest of the argument proceeds by contradiction and uses the strong maximum principle. Suppose that the eigensubspace $E_1$ contains two elements $v_1$ and $v_2$ that are not proportional.
Chapter 8. Spectral Analysis of the Laplacian
Because of the regularity assumption on $\Omega$, $v_1$ and $v_2$ are smooth functions, and it makes sense to consider their values at any point $x \in \Omega$. Since $v_1$ and $v_2$ are not proportional, we can find $x_0, x_1 \in \Omega$ and $\alpha_1, \alpha_2 \in \mathbb R$, not both zero, such that
$$\alpha_1 v_1(x_0) + \alpha_2 v_2(x_0) = 0,\qquad \alpha_1 v_1(x_1) + \alpha_2 v_2(x_1) \ne 0.$$
Now take $w = |\alpha_1 v_1 + \alpha_2 v_2|$. Since $v_1, v_2 \in E_1$, we have $\alpha_1 v_1 + \alpha_2 v_2 \in E_1$, and $w = |\alpha_1 v_1 + \alpha_2 v_2|$ still belongs to $E_1$ (as shown just above, as a consequence of the Courant–Rayleigh variational formula for $\lambda_1$). Let us summarize the properties of $w$:
$$w \in E_1,\qquad w \ge 0,\qquad w(x_0) = 0,\qquad w(x_1) \ne 0.$$
Since $-\Delta w = \lambda_1 w$, from $\lambda_1 > 0$ and $w \ge 0$ we deduce that
$$-\Delta w \ge 0 \text{ in } \Omega,\qquad w = 0 \text{ on } \partial\Omega,\qquad w(x_0) = 0,\ x_0 \in \Omega.$$
The strong maximum principle of Hopf now implies that $w = 0$ on $\Omega$ (recall that $\Omega$ is connected), a clear contradiction to the fact that $w(x_1) \ne 0$. Thus $E_1$ has dimension one. By taking any vector $w \in E_1 \setminus \{0\}$ and $e_1 = |w|$, we obtain a vector in $E_1$ which, again by the strong maximum principle, satisfies $e_1 > 0$ on $\Omega$.
Corollary 8.5.1. Let $\Omega$ be as in Theorem 8.5.1 and take any eigenvector $e_i$ of the Laplace–Dirichlet operator corresponding to an eigenvalue $\lambda_i > \lambda_1$. Then the sign of $e_i$ is not constant on $\Omega$.
Proof. Since $\lambda_i \ne \lambda_1$, we have $E(\lambda_i) \perp E(\lambda_1)$ and
$$\int_\Omega e_i(x)\,e_1(x)\,dx = 0.$$
Since $e_1 > 0$ on $\Omega$, this forces $e_i$ to change sign on $\Omega$.
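Corollary 8.5.1 can be observed on the explicit one-dimensional eigenfunctions $e_n(x) = \sqrt2\sin(n\pi x)$ of Proposition 8.5.2 below. The sketch (function names and grid size are our own choices) counts the sign changes of $e_n$ on a fine grid and approximates $\langle e_n, e_1\rangle_{L^2(0,1)}$: only $e_1$ keeps a constant sign, while $e_n$ with $n \ge 2$ has $n - 1$ interior zeros and is orthogonal to $e_1$.

```python
import math

def e(n, x):
    """Normalized Dirichlet eigenfunction e_n(x) = sqrt(2) sin(n pi x) on (0, 1)."""
    return math.sqrt(2.0) * math.sin(n * math.pi * x)

def midpoints(m):
    """Midpoints of the m cells of a uniform grid on (0, 1). For m = 10000 and n <= 31,
    no midpoint is a zero k/n of e_n, so sign changes are detected unambiguously."""
    return [(i + 0.5) / m for i in range(m)]

def sign_changes(n, m=10000):
    """Number of sign changes of e_n between consecutive grid points (its interior zeros)."""
    values = [e(n, x) for x in midpoints(m)]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

def inner_l2(n, k, m=10000):
    """Midpoint-rule approximation of the L^2(0, 1) scalar product <e_n, e_k>."""
    return sum(e(n, x) * e(k, x) for x in midpoints(m)) / m

for n in range(1, 6):
    print(f"n = {n}: sign changes = {sign_changes(n)}, <e_n, e_1> = {inner_l2(n, 1):+.2e}")
```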
This means that the status of the first eigenvalue is very particular: it is the only eigenvalue which possesses an eigenvector with constant sign. For instance, in the analysis of the second eigenvalue $\lambda_2(-\Delta)$, the nodal set of a second eigenvector (the set where it vanishes) plays a central role. The explicit computation of the spectrum of the Laplace–Dirichlet operator is possible only in very particular situations. Nevertheless, even in situations where such a computation is not possible (or is too complicated), one can get rather precise information on the spectrum by using comparison arguments. To stress the dependence of the eigenvalues on $\Omega$, let us denote by $(\lambda_n(\Omega))_{n\in\mathbb N}$ the sequence of eigenvalues of the Laplace–Dirichlet operator on $\Omega$ (with the multiplicity convention). Then, as a direct consequence of the Courant–Fisher min-max principle, we have the following comparison result.
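Proposition 8.5.1. Let $\Omega$ and $\tilde\Omega$ be two bounded open subsets of $\mathbb R^N$ with $\Omega \subset \tilde\Omega$. Then, for any $n \ge 1$,
$$\lambda_n(\tilde\Omega) \le \lambda_n(\Omega),$$
i.e., $\lambda_n(\Omega)$ is a decreasing function of $\Omega$ with respect to inclusion.
Proof. For any $v \in H_0^1(\Omega)$, let us denote by $\tilde v$ the function which is equal to $v$ on $\Omega$ and to zero on $\tilde\Omega \setminus \Omega$. By Proposition 5.1.1, we have $\tilde v \in H_0^1(\tilde\Omega)$. Moreover,
$$\int_{\tilde\Omega} |\tilde v(x)|^2\,dx = \int_\Omega |v(x)|^2\,dx,\qquad \int_{\tilde\Omega} |\nabla\tilde v(x)|^2\,dx = \int_\Omega |\nabla v(x)|^2\,dx.$$
Hence, $H_0^1(\Omega)$ can be isometrically identified with a subspace of $H_0^1(\tilde\Omega)$ by the mapping
$$\tilde i : v \in H_0^1(\Omega) \longmapsto \tilde v = \tilde i(v) \in H_0^1(\tilde\Omega).$$
If $M \in \mathcal L_n(\Omega)$ is an $n$-dimensional subspace of $H_0^1(\Omega)$, then $\tilde i(M) \in \mathcal L_n(\tilde\Omega)$ is an $n$-dimensional subspace of $H_0^1(\tilde\Omega)$. We can now apply the Courant–Fisher min-max formula (Theorem 8.4.2) to obtain
$$\begin{aligned}
\lambda_n(\Omega) &= \min_{M\in\mathcal L_n(\Omega)}\ \max_{v\in M,\,v\ne0} R(v,\Omega) = \min_{M\in\mathcal L_n(\Omega)}\ \max_{v\in M,\,v\ne0} R(\tilde i(v),\tilde\Omega)\\
&= \min_{W=\tilde i(M),\,M\in\mathcal L_n(\Omega)}\ \max_{w\in W\setminus\{0\}} R(w,\tilde\Omega) \ \ge\ \min_{W\in\mathcal L_n(\tilde\Omega)}\ \max_{w\in W\setminus\{0\}} R(w,\tilde\Omega) = \lambda_n(\tilde\Omega),
\end{aligned}$$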
which ends the proof. To go further and use this comparison result, we need to know some particular situations where the spectrum of $-\Delta$ can be explicitly computed. Let us start with the simplest situation, that is, $N = 1$ and $\Omega = (0,1)$.
Proposition 8.5.2. Let $N = 1$ and $\Omega = (0,1)$. Then the eigenvalues $(\lambda_n)_{n\in\mathbb N}$ of the Laplace–Dirichlet operator are given by
$$\lambda_n = n^2\pi^2,\qquad n = 1, 2, \dots,$$
and the corresponding orthonormal basis $(e_n)_{n\in\mathbb N}$ of eigenvectors in $L^2(\Omega)$ is given by
$$e_n(x) = \sqrt2\,\sin(n\pi x).$$
Proof. The proof is elementary. When solving the ordinary differential equation $u'' + \lambda u = 0$,
one obtains (recall that $\lambda > 0$)
$$u(x) = A\sin(\sqrt\lambda\,x) + B\cos(\sqrt\lambda\,x).$$
The boundary condition $u(0) = 0$ gives $B = 0$, and the boundary condition $u(1) = 0$ gives $\sin\sqrt\lambda = 0$, that is, $\lambda = n^2\pi^2$ for some $n \ge 1$. The corresponding solution is $u(x) = A\sin(n\pi x)$. After $L^2$-normalization one obtains $A = \sqrt2$. In this very simple situation, each eigenvalue has multiplicity one.
Let us now study the Laplace equation with the Dirichlet boundary condition on the $N$-cube $\Omega = (0,1)^N$ and the corresponding eigenvalue problem.
Proposition 8.5.3. Let $\Omega = (0,1)^N$. For each $p = (p_1, p_2, \dots, p_N)$ with $p_i \in \mathbb N \setminus \{0\}$, $i = 1, 2, \dots, N$ (i.e., $p \in (\mathbb N^*)^N$), the positive real number
$$\lambda_p := \pi^2\,(p_1^2 + p_2^2 + \cdots + p_N^2)$$
is an eigenvalue of the Laplace–Dirichlet operator on $\Omega = (0,1)^N$, and the function
$$u_p(x) = 2^{N/2}\prod_{i=1}^N \sin(\pi p_i x_i)$$
is an eigenfunction corresponding to the eigenvalue $\lambda_p$. Moreover,
$$\Lambda(-\Delta,\Omega) = \{\lambda_p : p \in (\mathbb N^*)^N\},$$
i.e., all the eigenvalues of $-\Delta$ on $\Omega = (0,1)^N$ can be expressed in this way, and the family $\{u_p : p \in (\mathbb N^*)^N\}$ is an orthonormal basis of $L^2(\Omega)$.
Proof. Take $p = (p_1, p_2, \dots, p_N)$, $v \in \mathcal D(\Omega)$, and compute
$$\int_\Omega \nabla u_p(x)\cdot\nabla v(x)\,dx = \sum_{i=1}^N \int_\Omega \frac{\partial u_p}{\partial x_i}(x)\,\frac{\partial v}{\partial x_i}(x)\,dx.$$
Let us notice that
$$u_p(x) = \prod_{i=1}^N e_{p_i}(x_i)\qquad\text{with}\qquad e_{p_i}(x_i) = \sqrt2\,\sin(\pi p_i x_i).$$
Consequently,
$$\frac{\partial u_p}{\partial x_i}(x) = \Big(\prod_{j\ne i} e_{p_j}(x_j)\Big)\,e_{p_i}'(x_i).$$
(Here $e_{p_i}'$ stands for the derivative of the function of one variable $e_{p_i}(\cdot)$.) Hence, by Fubini's theorem,
$$\int_\Omega \nabla u_p(x)\cdot\nabla v(x)\,dx = \sum_{i=1}^N \int_{(0,1)^{N-1}} \Big(\int_0^1 e_{p_i}'(x_i)\,\frac{\partial v}{\partial x_i}(x)\,dx_i\Big) \prod_{j\ne i} e_{p_j}(x_j)\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_N.$$
An integration by parts yields
$$\int_0^1 e_{p_i}'(x_i)\,\frac{\partial v}{\partial x_i}(x)\,dx_i = -\int_0^1 e_{p_i}''(x_i)\,v(x)\,dx_i = \pi^2 p_i^2 \int_0^1 e_{p_i}(x_i)\,v(x)\,dx_i.$$
(The last equality follows from $e_{p_i}'' + \pi^2 p_i^2 e_{p_i} = 0$, since $e_{p_i}$ is an eigenvector relative to the eigenvalue $\pi^2 p_i^2$ of the one-dimensional Laplace–Dirichlet problem.) Thus
$$\int_\Omega \nabla u_p(x)\cdot\nabla v(x)\,dx = \pi^2\Big(\sum_{i=1}^N p_i^2\Big)\int_\Omega u_p(x)\,v(x)\,dx.$$
By a classical density and extension-by-continuity argument, this equality can be extended to an arbitrary $v \in H_0^1(\Omega)$, and
$$\int_\Omega \nabla u_p(x)\cdot\nabla v(x)\,dx = \lambda_p\int_\Omega u_p\,v\,dx\quad \forall v \in H_0^1(\Omega),\qquad u_p \in H_0^1(\Omega),$$
which precisely means that $\lambda_p = \pi^2\sum_{i=1}^N p_i^2$ is an eigenvalue of the Laplace–Dirichlet operator on $(0,1)^N$, and $u_p(x) = \prod_{i=1}^N e_{p_i}(x_i)$ is a corresponding eigenvector.
Let us now show that the family of eigenfunctions $\{u_p : p \in (\mathbb N^*)^N\}$ is an orthonormal basis of $L^2(\Omega)$. First notice that if $p \ne q$, then there exists at least one $i \in \{1, 2, \dots, N\}$ such that $p_i \ne q_i$. From the orthogonality in $L^2(0,1)$ of the two functions $\sin(p_i\pi x)$ and $\sin(q_i\pi x)$, and by Fubini's theorem, we immediately obtain that the family $\{u_p : p \in (\mathbb N^*)^N\}$ is orthogonal in $L^2(\Omega)$; each $u_p$ has unit norm, since each factor $e_{p_i}$ does. The more delicate point is to prove that the family $\{u_p : p \in (\mathbb N^*)^N\}$ generates $L^2(\Omega)$ in the topological sense, that is, the vector space generated by this family is dense in $L^2(\Omega)$. Indeed, by a careful application of the Fubini theorem, one can prove the following result (which is quite classical in integration theory; we omit its proof).
Lemma 8.5.1. Let $(v_p)_{p\in\mathbb N^*}$ and $(w_q)_{q\in\mathbb N^*}$ be two Hilbertian bases of $L^2(0,1)$. Then the family of functions $(x,y) \mapsto v_p(x)\,w_q(y)$ is a Hilbertian basis of $L^2((0,1)^2)$.
Thus, by iterating this result a finite number of times, we obtain that the family $\{u_p : p \in (\mathbb N^*)^N\}$ is an orthonormal basis of $L^2(\Omega)$. This clearly implies that, by taking $\Lambda = \{\lambda_p : p \in (\mathbb N^*)^N\}$, we have obtained all the eigenvalues; otherwise, there would exist some $v \in H_0^1(\Omega)$, $v \ne 0$, which is an eigenvector corresponding to some eigenvalue $\lambda \notin \Lambda$. Since eigenvectors associated with distinct eigenvalues are orthogonal, this would imply that $v$ is orthogonal in $L^2(\Omega)$ to all the $u_p$, $p \in (\mathbb N^*)^N$, which form a basis, and hence $v = 0$, a clear contradiction. This completes the spectral analysis of the Laplace–Dirichlet operator in the case $\Omega = (0,1)^N$.
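The two facts established in this proof can be checked numerically for $N = 2$. The sketch below (names, evaluation point $x_0$, step $h$ and grid size are illustrative assumptions) approximates $-\Delta u_p$ by central differences and compares $-\Delta u_p(x_0)/u_p(x_0)$ with $\lambda_p$; it also evaluates the scalar products $\langle u_p, u_q\rangle_{L^2((0,1)^2)}$ with a tensor midpoint rule, which should reproduce the identity matrix.

```python
import math

def u(p, x):
    """Tensor-product eigenfunction u_p(x) = 2^{N/2} prod_i sin(pi p_i x_i) on (0, 1)^N."""
    value = 2.0 ** (len(p) / 2)
    for p_i, x_i in zip(p, x):
        value *= math.sin(math.pi * p_i * x_i)
    return value

def lam(p):
    """lambda_p = pi^2 (p_1^2 + ... + p_N^2)."""
    return math.pi ** 2 * sum(k * k for k in p)

def minus_laplacian(p, x, h=1e-3):
    """Central-difference approximation of -Delta u_p at the point x (2N + 1 point stencil)."""
    total = 0.0
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        total -= (u(p, x_plus) - 2.0 * u(p, x) + u(p, x_minus)) / h ** 2
    return total

def inner_2d(p, q, m=100):
    """Tensor midpoint-rule approximation of <u_p, u_q> in L^2((0, 1)^2)."""
    pts = [(i + 0.5) / m for i in range(m)]
    return sum(u(p, (s, t)) * u(q, (s, t)) for s in pts for t in pts) / m ** 2

x0 = (0.3, 0.7)
modes = [(1, 1), (1, 2), (2, 1), (2, 3)]
for p in modes:
    ratio = minus_laplacian(p, x0) / u(p, x0)
    gram_row = [round(inner_2d(p, q), 9) for q in modes]
    print(p, "ratio / lambda_p =", round(ratio / lam(p), 6), "Gram row:", gram_row)
```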
Proposition 8.5.3 and Proposition 8.5.1 (comparison principle) permit us to obtain a sharp estimate of the asymptotic behavior of the sequence $(\lambda_n(\Omega))_{n\in\mathbb N^*}$ of the eigenvalues of the Laplace–Dirichlet operator in a bounded open set $\Omega$ in $\mathbb R^N$. Indeed, one can prove the following result.
Theorem 8.5.2. Let $(\lambda_n(\Omega))_{n\in\mathbb N}$ be the sequence of the eigenvalues of the Laplace–Dirichlet operator in a bounded open set $\Omega$ in $\mathbb R^N$ (with the multiplicity convention). Then there exist two positive constants $c_\Omega$ and $d_\Omega$, which depend only on $\Omega$, such that for all $n \ge 1$,
$$c_\Omega\,n^{2/N} \le \lambda_n(\Omega) \le d_\Omega\,n^{2/N}.$$
Sketch of the proof. The proof is quite technical, but the idea is very simple: it consists in comparing $\Omega$ (after a translation, which does not change its spectrum) with two $N$-cubes $Q_a$ and $Q_b$ such that
$$Q_a \subset \Omega \subset Q_b,\qquad Q_a = (-a/2, a/2)^N,\qquad Q_b = (-b/2, b/2)^N.$$
Then Proposition 8.5.1 applies and one obtains
$$\lambda_n(Q_b) \le \lambda_n(\Omega) \le \lambda_n(Q_a).$$
The problem has thus been reduced to the evaluation of $\lambda_n(Q_a)$ and $\lambda_n(Q_b)$. By translation invariance and scaling, $\lambda_n(Q_a) = \lambda_n(Q)/a^2$ and $\lambda_n(Q_b) = \lambda_n(Q)/b^2$, where $Q = (0,1)^N$. By Proposition 8.5.3, the numbers $\lambda_n(Q)/\pi^2$ are precisely the positive integers of the form $\sum_{i=1}^N p_i^2$ with $p_i \in \mathbb N \setminus \{0\}$. Thus, one has to arrange the numbers $\{\sum_{i=1}^N p_i^2 : p_i \in \mathbb N^*\}$ as an increasing sequence to obtain the sequence $\{\lambda_n(Q)/\pi^2 : n \in \mathbb N^*\}$: this is just a combinatorial problem! To that end, it is convenient to introduce, for any $t > 0$, the quantity $\nu_N(t)$, which is the number of elements $p = (p_1, \dots, p_N) \in (\mathbb N^*)^N$ such that $\sum_{i=1}^N p_i^2 \le t$. The key point of the proof then consists in showing the estimate $\nu_N(t) \sim C_N\,t^{N/2}$ for some constant $C_N > 0$; inverting this relation gives $\lambda_n(Q) \sim \pi^2 (n/C_N)^{2/N}$, from which the two bounds follow.
Remark 8.5.1. From Proposition 8.5.3 we can obtain, as indicated before, after some combinatorial argument, a complete description of the sequence $(\lambda_n)_{n\in\mathbb N}$ of the eigenvalues of the Laplace–Dirichlet operator in $\Omega = (0,1)^N$. For example (listing the distinct eigenvalues, each with its multiplicity):
(a) $N = 2$. Then
$\lambda_1(\Omega) = 2\pi^2$, multiplicity $= 1$ (no surprise!),
$\lambda_2(\Omega) = 5\pi^2$, multiplicity $= 2$,
$\lambda_3(\Omega) = 8\pi^2$, multiplicity $= 1$,
$\lambda_4(\Omega) = 10\pi^2$, multiplicity $= 2$, ....
(b) $N = 3$. Then
$\lambda_1(\Omega) = 3\pi^2$, multiplicity $= 1$ (no surprise!),
$\lambda_2(\Omega) = 6\pi^2$, multiplicity $= 3$,
$\lambda_3(\Omega) = 9\pi^2$, multiplicity $= 3$, ....
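The combinatorial problem in the sketch of proof of Theorem 8.5.2, and the lists of Remark 8.5.1, can be reproduced by brute force. In the sketch below (function names and truncation parameters are ours), `cube_spectrum` lists the distinct values of $\lambda_p/\pi^2 = p_1^2 + \cdots + p_N^2$ with their multiplicities, and `nu` computes $\nu_N(t)$ exactly. For comparison we use the natural candidate $C_N = |B_1|/2^N$, the volume of the part of the unit ball of $\mathbb R^N$ lying in the positive orthant.

```python
import math
from collections import Counter
from itertools import product

def cube_spectrum(N, pmax):
    """Distinct values s = p_1^2 + ... + p_N^2 = lambda_p / pi^2 with their multiplicities,
    using p in {1, ..., pmax}^N. Values s < (pmax + 1)^2 + N - 1 are complete: any p
    with some p_i > pmax gives a sum of at least (pmax + 1)^2 + N - 1."""
    counts = Counter(sum(k * k for k in p) for p in product(range(1, pmax + 1), repeat=N))
    bound = (pmax + 1) ** 2 + N - 1
    return sorted((s, c) for s, c in counts.items() if s < bound)

def nu(N, t):
    """nu_N(t) = number of p in (N*)^N with p_1^2 + ... + p_N^2 <= t (t a nonnegative integer)."""
    if t < 0:
        return 0
    if N == 1:
        return math.isqrt(t)
    total, k = 0, 1
    while k * k <= t:
        total += nu(N - 1, t - k * k)
        k += 1
    return total

def weyl_constant(N):
    """Volume of {x in R^N : x_i >= 0, |x| <= 1}, i.e. |unit ball| / 2^N."""
    return math.pi ** (N / 2) / math.gamma(N / 2 + 1) / 2 ** N

print(cube_spectrum(2, 4)[:4])  # [(2, 1), (5, 2), (8, 1), (10, 2)], as in Remark 8.5.1 (a)
print(cube_spectrum(3, 3)[:3])  # [(3, 1), (6, 3), (9, 3)], as in Remark 8.5.1 (b)
for N, t in [(2, 10 ** 6), (3, 4 * 10 ** 4)]:
    print(N, nu(N, t) / t ** (N / 2), weyl_constant(N))
```

The ratios $\nu_N(t)/t^{N/2}$ come out slightly below $\pi/4 \approx 0.785$ for $N = 2$ and $\pi/6 \approx 0.524$ for $N = 3$, because lattice points on the coordinate hyperplanes are excluded from $(\mathbb N^*)^N$.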
Indeed, in the various examples we have encountered, the multiplicity of the second eigenvalue of the Laplace–Dirichlet operator does not obey a simple rule: it may be equal to one, two, three,….
This is in sharp contrast with the first eigenvalue $\lambda_1(-\Delta)$, which (on a connected $\Omega$) always has multiplicity one, and it makes the second eigenvalue more delicate to work with.
Remark 8.5.2. The previous analysis of the spectrum of the Laplace–Dirichlet operator on a general bounded open set $\Omega$ relies on the comparison of $\Omega$ with a reference set $\tilde\Omega$ such that either $\Omega \subset \tilde\Omega$ or $\tilde\Omega \subset \Omega$. There is another way to obtain comparison results, which consists in the use of rearrangements (Steiner symmetrization). The idea is that this type of transformation preserves the measure of the set and the $L^2$ norm of functions, and does not increase the $(L^2)^N$ norm of the gradient (the Dirichlet integral). In this way it is possible to compare the corresponding Rayleigh quotients. This device is very useful in shape optimization; it permits us, for example, to solve an old problem of Rayleigh, which consists in proving that the ball minimizes the first eigenvalue among all open sets of given volume.
8.6 A general abstract theory for spectral analysis of elliptic boundary value problems

So far, we have considered the spectral analysis of the Laplacian with Dirichlet boundary conditions. To be able to develop a similar analysis for more general linear elliptic operators and for different types of boundary conditions (like Neumann, mixed, …), let us introduce the following abstract setting.
(i) Let $V$ and $H$ be two real infinite-dimensional Hilbert spaces such that $V \overset{i}{\longrightarrow} H$; we assume that
• $V$ can be embedded in $H$ by $i$, which is linear, continuous, and one-to-one;
• $V$ is dense in $H$ (i.e., $\overline{i(V)}^{H} = H$);
• $V$ is compactly embedded in $H$ (i.e., $i$ is compact)
(as a typical example, take $V = H_0^1(\Omega)$ and $H = L^2(\Omega)$ with their usual Hilbertian structures).
(ii) Let
$$a : V \times V \longrightarrow \mathbb R,\qquad (u, v) \longmapsto a(u, v),$$
be a bilinear form on $V \times V$ which is symmetric, continuous, and coercive: there exists $\alpha > 0$ such that
$$a(v, v) \ge \alpha\,\|v\|^2\qquad \forall v \in V.$$
Here $\|\cdot\|$ stands for the norm in $V$. The norm and the scalar product in $H$ are denoted respectively by $|\cdot|_H$ and $\langle\cdot,\cdot\rangle_H$ (or simply $|\cdot|$ and $\langle\cdot,\cdot\rangle$ when there is no ambiguity). Note that we are in the situation of the Lax–Milgram theorem, Theorem 3.1.2: for any $L \in V^*$ there exists a unique $u \in V$ which satisfies $a(u, v) = L(v)$ for all $v \in V$. Noticing that for any $h \in H$ the linear form $v \in V \mapsto \langle h, v\rangle_H$
is continuous on $V$, we deduce from the Lax–Milgram theorem the existence of a unique solution $u = Th$ of the following problem:
$$a(Th, v) = \langle h, v\rangle_H\quad \forall v \in V,\qquad Th \in V.$$
By using the same device as in Proposition 8.2.1, we can easily prove that $T : H \to H$ is a linear, continuous, self-adjoint, compact, and positive definite operator. Thus, one can apply to $T$ the abstract diagonalization theorem, Theorem 8.3.1, for compact, self-adjoint, positive definite operators and conclude that there exists a Hilbertian basis $(e_n)_{n\in\mathbb N}$ of $H$ made of eigenvectors, $Te_n = \mu_n e_n$, where $(\mu_n)_{n\in\mathbb N}$ is the decreasing sequence of positive eigenvalues (with the multiplicity convention), which tends to zero as $n \to +\infty$. Note that the family $(e_n)_{n\in\mathbb N}$ is also an orthogonal basis of $V$ when $V$ is equipped with the scalar product $\langle u, v\rangle := a(u, v)$ (which is equivalent to the initial one). We can now give a precise description of the solutions of the abstract spectral problem: find $\lambda \in \mathbb R$ such that there exists $u \in V$, $u \ne 0$, which satisfies
$$a(u, v) = \lambda\,\langle u, v\rangle_H\qquad \forall v \in V.$$
(When such a $u \ne 0$ exists, it is called an eigenvector relative to the eigenvalue $\lambda$.)
Theorem 8.6.1. Assume that the canonical injection of $V$ into $H$ is dense and compact and that the continuous bilinear form $a : V \times V \to \mathbb R$ is symmetric and coercive on $V$ ($V$-elliptic). Then the eigenvalues $\lambda$ of the abstract variational problem
$$\text{find } \lambda \in \mathbb R \text{ such that there exists } u \in V,\ u \ne 0,\quad a(u, v) = \lambda\,\langle u, v\rangle_H\ \ \forall v \in V,$$
can be written as an increasing sequence of positive numbers $(\lambda_n)_{n\in\mathbb N}$ (namely $\lambda_n = 1/\mu_n$) which tends to $+\infty$ as $n \to +\infty$:
$$0 < \lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots \le \lambda_n \le \cdots.$$
(We again adopt the multiplicity convention: each eigenvalue is repeated a number of times equal to its multiplicity, which is finite.) There exists an orthonormal basis (Hilbertian basis) $(e_n)_{n\in\mathbb N}$ of $H$ such that, for each $n \in \mathbb N$, $e_n$ is an eigenvector relative to the eigenvalue $\lambda_n$:
$$a(e_n, v) = \lambda_n\,\langle e_n, v\rangle_H\quad \forall v \in V,\qquad e_n \in V.$$
Moreover, the sequence $(e_n/\sqrt{\lambda_n})_{n\in\mathbb N}$ is a Hilbertian basis of $V$ when this space is equipped with the (equivalent) scalar product $a(\cdot,\cdot)$.
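Theorem 8.6.1 has a finite-dimensional analogue that makes the role of the operator $T$ concrete. In the sketch below, which is our own illustration and not taken from the text, $V = H = \mathbb R^n$ and $a$ is given by the standard finite-difference matrix $A$ of $-d^2/dx^2$ on $(0,1)$ with Dirichlet conditions. Applying $T = A^{-1}$ amounts to a tridiagonal solve, and power iteration on $T$ converges to its largest eigenvalue $\mu_1 = 1/\lambda_1$, with an eigenvector of constant sign. The grid size $n$ and the iteration count are assumptions.

```python
import math

def solve_tridiagonal(h, rhs):
    """Apply T = A^{-1}: solve (-u_{i-1} + 2 u_i - u_{i+1}) / h^2 = rhs_i, i = 1..n,
    with u_0 = u_{n+1} = 0, by the Thomas algorithm (A is the discrete -d^2/dx^2)."""
    n = len(rhs)
    off, diag = -1.0 / h ** 2, 2.0 / h ** 2
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def first_eigenpair(n=200, iters=100):
    """Power iteration on the compact, self-adjoint, positive operator T = A^{-1}.
    Returns (lambda_1, mu_1, e_1) with lambda_1 = 1 / mu_1 and e_1 normalized
    in the discrete L^2(0, 1) norm."""
    h = 1.0 / (n + 1)
    v = [1.0] * n  # positive starting vector, not orthogonal to e_1
    mu = 0.0
    for _ in range(iters):
        w = solve_tridiagonal(h, v)
        # Rayleigh quotient <T v, v> / <v, v>; the quadrature weight h cancels.
        mu = sum(a * b for a, b in zip(w, v)) / sum(b * b for b in v)
        norm = math.sqrt(h * sum(a * a for a in w))
        v = [a / norm for a in w]
    return 1.0 / mu, mu, v

n = 200
lam1, mu1, e1 = first_eigenpair(n)
h = 1.0 / (n + 1)
print(lam1, 4.0 / h ** 2 * math.sin(math.pi * h / 2) ** 2, math.pi ** 2)
print(all(value > 0 for value in e1))
```

The computed $\lambda_1$ should match the known discrete eigenvalue $(4/h^2)\sin^2(\pi h/2)$, which differs from $\pi^2$ by $O(h^2)$. Since $\mu_2/\mu_1 = \lambda_1/\lambda_2$ is close to $1/4$ here, convergence is fast.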
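We now consider some applications of the results above.
(1) Neumann problem. Take in this case
$$V = \Big\{v \in H^1(\Omega) : \int_\Omega v(x)\,dx = 0\Big\}\qquad\text{and}\qquad a(u, v) = \int_\Omega \nabla u(x)\cdot\nabla v(x)\,dx.$$
The coercivity of $a$ on $V \times V$ follows from the Poincaré–Wirtinger inequality (Corollary 5.4.1), and the compactness of the embedding of $V$ into $H = \{h \in L^2(\Omega) : \int_\Omega h\,dx = 0\}$ (the closure of $V$ in $L^2(\Omega)$) follows from the Rellich–Kondrakov theorem, Theorem 5.4.2 ($\Omega$ is assumed smooth and bounded). From Theorem 8.6.1, we deduce the existence of a Hilbertian basis $(e_n)_{n\ge1}$ of $H$ such that
$$-\Delta e_n = \lambda_n e_n \text{ on } \Omega,\qquad \frac{\partial e_n}{\partial\nu} = 0 \text{ on } \partial\Omega,$$
where $\partial/\partial\nu$ stands for the outward normal derivative on the boundary $\partial\Omega$. Adding the normalized constant function $e_0 = |\Omega|^{-1/2}$ (eigenvalue $\lambda_0 = 0$) yields a Hilbertian basis of $L^2(\Omega)$.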
(2) Mixed Dirichlet–Neumann problem (see Section 6.3). By taking $V = \{v \in H^1(\Omega) : \gamma_0(v) = 0 \text{ on } \Gamma_0\}$ with $\mathcal H^{N-1}(\Gamma_0) > 0$, one obtains the existence of a Hilbertian basis $(e_n)_{n\in\mathbb N}$ of $L^2(\Omega)$ such that
$$-\Delta e_n = \lambda_n e_n \text{ on } \Omega,\qquad e_n = 0 \text{ on } \Gamma_0,\qquad \frac{\partial e_n}{\partial\nu} = 0 \text{ on } \Gamma_1 = \partial\Omega\setminus\Gamma_0.$$
(3) One can obtain similar results by replacing $-\Delta$ by an elliptic linear operator $A$ of the form
$$Av = -\sum_{i,j}\frac{\partial}{\partial x_i}\Big(a_{i,j}\frac{\partial v}{\partial x_j}\Big) + a_0 v,$$
or, more generally, when considering elliptic systems (elasticity, Stokes, …).
Remark 8.6.1. The variational approach of Courant–Fisher works without any particular difficulty in such a general setting. One introduces the abstract Rayleigh quotient $R(v) = a(v, v)/|v|_H^2$, and then
$$\lambda_1 = \min\{R(v) : v \in V,\ v \ne 0\},\qquad \lambda_n = \min_{M\in\mathcal L_n}\ \max_{v\in M,\,v\ne0} R(v) = \max_{M\in\mathcal L_{n-1}}\ \min_{v\in M^\perp,\,v\ne0} R(v).$$
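Let us end this chapter by returning to the situation which was our first motivation for this study, namely Fourier's method of separation of variables applied to the wave equation
$$\begin{cases}\dfrac{\partial^2 u}{\partial t^2} - \Delta u = 0 & \text{on } Q = \Omega\times(0,+\infty),\\[4pt] u = 0 & \text{on } \Sigma = \partial\Omega\times(0,+\infty),\\[4pt] u(x,0) = u_0(x) & \text{on } \Omega,\\[4pt] \dfrac{\partial u}{\partial t}(x,0) = u_1(x) & \text{on } \Omega.\end{cases}$$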
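Denote by $0 < \lambda_1 < \lambda_2 \le \lambda_3 \le \cdots \le \lambda_n \le \cdots$ the eigenvalues of the Laplace–Dirichlet operator and by $(e_n)_{n\in\mathbb N}$ a corresponding Hilbertian basis of $L^2(\Omega)$ made of eigenvectors. Set $\omega_n = \sqrt{\lambda_n}$. Then the unique (variational) solution of the above problem is given by the following formula (for given $u_0 \in H_0^1(\Omega)$ and $u_1 \in L^2(\Omega)$):
$$u(t) = \sum_{n=1}^{+\infty}\Big(\langle u_0, e_n\rangle_{L^2(\Omega)}\cos(\omega_n t) + \frac{1}{\omega_n}\langle u_1, e_n\rangle_{L^2(\Omega)}\sin(\omega_n t)\Big)e_n.$$
Here the variational solution is taken in the following sense: for any $0 < T < +\infty$,
$$u \in C\big([0,T]; H_0^1(\Omega)\big) \cap C^1\big([0,T]; L^2(\Omega)\big)$$
and
$$\begin{cases}\dfrac{d^2}{dt^2}\langle u(t), v\rangle_{L^2(\Omega)} + \displaystyle\int_\Omega \nabla u(t)\cdot\nabla v\,dx = 0 & \text{in the distributional sense on } (0,T),\ \ \forall v \in H_0^1(\Omega),\\[4pt] u(0) = u_0,\qquad \dfrac{du}{dt}(0) = u_1\end{cases}$$
(see, for example, Raviart–Thomas [199, Section 8.2] for further details).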
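For $\Omega = (0,1)$, Proposition 8.5.2 gives $\lambda_n = n^2\pi^2$ and $e_n(x) = \sqrt2\sin(n\pi x)$, so $\omega_n = n\pi$ and the series above can be evaluated directly. The sketch below is a minimal illustration under stated assumptions: the series is truncated to `n_modes` terms and the coefficients $\langle u_0, e_n\rangle$, $\langle u_1, e_n\rangle$ are computed with a midpoint rule. The initial data are chosen so that the exact solution $\cos(\pi t)\sin(\pi x) + \tfrac12\sin(2\pi t)\sin(2\pi x)$ is known.

```python
import math

def l2_coefficients(f, n_modes, m=2000):
    """Midpoint-rule approximations of <f, e_n>_{L^2(0,1)}, n = 1..n_modes,
    where e_n(x) = sqrt(2) sin(n pi x)."""
    xs = [(i + 0.5) / m for i in range(m)]
    return [sum(f(x) * math.sqrt(2.0) * math.sin(n * math.pi * x) for x in xs) / m
            for n in range(1, n_modes + 1)]

def wave_solution(u0, u1, n_modes=50):
    """Truncated Fourier series of the solution of the 1D wave equation on (0, 1)
    with u = 0 at x = 0 and x = 1; omega_n = sqrt(lambda_n) = n pi."""
    a = l2_coefficients(u0, n_modes)
    b = l2_coefficients(u1, n_modes)

    def u(x, t):
        total = 0.0
        for n in range(1, n_modes + 1):
            omega = n * math.pi
            e_n = math.sqrt(2.0) * math.sin(n * math.pi * x)
            total += (a[n - 1] * math.cos(omega * t) + b[n - 1] / omega * math.sin(omega * t)) * e_n
        return total

    return u

def exact(x, t):
    """Closed-form solution for u0(x) = sin(pi x), u1(x) = pi sin(2 pi x)."""
    return math.cos(math.pi * t) * math.sin(math.pi * x) + 0.5 * math.sin(2 * math.pi * t) * math.sin(2 * math.pi * x)

u = wave_solution(lambda x: math.sin(math.pi * x), lambda x: math.pi * math.sin(2 * math.pi * x))
print(u(0.3, 0.7), exact(0.3, 0.7))
```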
Chapter 9
Convex duality and optimization
In this chapter, unless specified otherwise, $(V, \|\cdot\|_V)$ is a general normed linear space with topological dual $V^*$. For any $v \in V$ and $v^* \in V^*$, we write $v^*(v) = \langle v^*, v\rangle_{(V^*,V)}$. Recall that $V^*$ is a Banach space when equipped with the dual norm $\|v^*\|_{V^*} = \sup\{\langle v^*, v\rangle : \|v\|_V \le 1\}$. When there is no ambiguity, for simplicity of notation, we write $\|\cdot\|$ instead of $\|\cdot\|_V$, $\|\cdot\|_*$ instead of $\|\cdot\|_{V^*}$, and $\langle v^*, v\rangle$ instead of $\langle v^*, v\rangle_{(V^*,V)}$.
9.1 Dual representation of convex sets
We know that several basic geometrical objects in a normed linear space $V$ can be described by using continuous linear forms, i.e., elements of the topological dual space. For example, a closed hyperplane $H$ can be written $H = \{v \in V : \langle v^*, v\rangle = \alpha\}$ for some $v^* \in V^*$, $v^* \ne 0$, and $\alpha \in \mathbb R$. Similarly, a closed half-space can be written $\{v \in V : \langle v^*, v\rangle \le \alpha\}$. Intersections of finite collections of closed half-spaces yield convex polyhedra. In fact, we are going to show that arbitrary closed convex sets in $V$ can be described by using only continuous linear forms. This is what we call a dual representation. This theory is based on the Hahn–Banach theorem (stated in Theorem 3.3.1), in the geometric form formulated below.
Theorem 9.1.1. Let $C$ be a nonempty closed convex subset of a normed linear space $V$. Then each point $u \notin C$ can be strongly separated from $C$ by a closed hyperplane, which means
$$\exists\,u^* \in V^*,\ u^* \ne 0,\ \exists\,\alpha \in \mathbb R \text{ such that } \langle u^*, u\rangle > \alpha \text{ and } \langle u^*, v\rangle \le \alpha\ \ \forall v \in C.$$
From a geometrical point of view, this means that $C$ is contained in the closed half-space
$$H_{\{u^*\le\alpha\}} := \{v \in V : \langle u^*, v\rangle \le \alpha\},$$
whereas $u$ is in the complement $H_{\{u^*>\alpha\}} := \{v \in V : \langle u^*, v\rangle > \alpha\}$.
Proof. Let us give the proof of Theorem 9.1.1 when $V$ is a Hilbert space. In that case, one can give a constructive proof relying on the projection theorem onto a closed convex subset. (In the general case of a normed linear space $V$, one can use the analytic version of the Hahn–Banach theorem, which itself is a consequence of Zorn's lemma.) Let us denote by $P_C(u)$ the projection of $u$ onto $C$. It is characterized by the angle condition (optimality condition)
$$\langle u - P_C(u), v - P_C(u)\rangle \le 0\quad \forall v \in C,\qquad P_C(u) \in C.$$
Set $z := u - P_C(u)$. Since $u \notin C$, we have $z \ne 0$, and we can rewrite the above inequality in the following form:
$$\sup_{v\in C}\,\langle z, v\rangle \le \langle z, P_C(u)\rangle. \tag{9.1}$$
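On the other hand, by definition of $z$ and since $z \ne 0$,
$$0 < |z|^2 = \langle z, z\rangle = \langle z, u\rangle - \langle z, P_C(u)\rangle,$$
which implies
$$\langle z, P_C(u)\rangle < \langle z, u\rangle. \tag{9.2}$$
Take $\alpha := \langle z, P_C(u)\rangle$. Combining (9.1) and (9.2), we obtain
$$\sup_{v\in C}\,\langle z, v\rangle \le \alpha < \langle z, u\rangle,$$
i.e., $C \subset H_{\{\langle z,\cdot\rangle\le\alpha\}}$ and $u \in H_{\{\langle z,\cdot\rangle>\alpha\}}$.
As a direct consequence of Theorem 9.1.1 we obtain the following corollary.
Corollary 9.1.1. Let $C$ be a nonempty closed convex subset of a normed linear space $V$. Then $C$ is equal to the intersection of all closed half-spaces that contain it:
$$C = \bigcap_{C\subset H_{\{v^*\le\alpha\}}} H_{\{v^*\le\alpha\}}.$$
Proof. Let us denote by $F$ the set
$$F = \{(v^*, \alpha) \in V^* \times \mathbb R : C \subset H_{\{v^*\le\alpha\}}\}.$$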
Clearly $C \subset \bigcap_{(v^*,\alpha)\in F} H_{\{v^*\le\alpha\}}$. Let us prove the converse inclusion $\bigcap_{(v^*,\alpha)\in F} H_{\{v^*\le\alpha\}} \subset C$. By taking complements, this is equivalent to proving
$$V\setminus C \subset V\setminus\bigcap_{(v^*,\alpha)\in F} H_{\{v^*\le\alpha\}} = \bigcup_{(v^*,\alpha)\in F} H_{\{v^*>\alpha\}},$$
which is precisely the conclusion of the Hahn–Banach separation theorem, Theorem 9.1.1.
Among closed convex sets, an important subclass is obtained by taking the intersection of a finite number of closed half-spaces.
Definition 9.1.1. A closed convex polyhedron $P$ is an intersection of finitely many closed half-spaces: in other words, there exist $v_1^*, \dots, v_k^* \in V^*$ with $v_i^* \ne 0$ and $\alpha_1, \dots, \alpha_k \in \mathbb R$ such that
$$P = \{v \in V : \langle v_i^*, v\rangle \le \alpha_i \text{ for } i = 1, \dots, k\}.$$
In the representation of closed convex sets as intersections of closed half-spaces, it is natural to look for the simplest representation. To that end, let us observe the following elementary facts:
(a) $\alpha' \ge \alpha$ and $C \subset H_{\{v^*\le\alpha\}}$ $\Rightarrow$ $C \subset H_{\{v^*\le\alpha'\}}$;
(b) fixing $v^* \ne 0$ and letting $\alpha$ vary provides parallel hyperplanes.
From Corollary 9.1.1 and the above observations, we deduce
$$C = \bigcap_{\substack{v^*\in V^*,\ v^*\ne0\\ \exists\,\alpha\in\mathbb R:\ C\subset H_{\{v^*\le\alpha\}}}}\ \ \bigcap_{\{\alpha\,:\,C\subset H_{\{v^*\le\alpha\}}\}} H_{\{v^*\le\alpha\}}. \tag{9.3}$$
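Returning to the constructive Hilbert-space proof of Theorem 9.1.1, its steps can be followed in $\mathbb R^d$ with the Euclidean scalar product. In the sketch below (our own illustration), $C = [0,1]^d$ is a box, for which the projection $P_C$ is a componentwise clamp. We form $z = u - P_C(u)$ and $\alpha = \langle z, P_C(u)\rangle$, and check both separation inequalities, using the fact that $\sup_{v\in C}\langle z, v\rangle = \sum_i \max(z_i, 0)$ for this particular box.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project_box(u, lo=0.0, hi=1.0):
    """Euclidean projection onto the closed convex box C = [lo, hi]^d (componentwise clamp)."""
    return [min(max(ui, lo), hi) for ui in u]

def separating_hyperplane(u):
    """Return (z, alpha) with z = u - P_C(u) and alpha = <z, P_C(u)>, as in the proof."""
    p = project_box(u)
    z = [ui - pk for ui, pk in zip(u, p)]
    return z, dot(z, p)

def sup_over_unit_box(z):
    """sup {<z, v> : v in [0, 1]^d}, attained at v_i = 1 if z_i > 0 and v_i = 0 otherwise."""
    return sum(max(zi, 0.0) for zi in z)

for u in ([2.0, 0.5], [-1.0, 3.0], [1.5, -0.25, 0.5]):
    z, alpha = separating_hyperplane(u)
    print(u, z, sup_over_unit_box(z) <= alpha < dot(z, u))
```

In each case the supremum over $C$ actually equals $\alpha$, since it is attained at $P_C(u) \in C$; this is inequality (9.1) together with $\langle z, P_C(u)\rangle = \alpha$.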
The question we have to examine is the following: given $v^* \in V^*$, $v^* \ne 0$, such that $C \subset H_{\{v^*\le\alpha\}}$ for some $\alpha \in \mathbb R$, what is the intersection of all the parallel half-spaces $H_{\{v^*\le\alpha\}}$ which contain $C$? The answer to this question gives rise to the notion of support function.
Proposition 9.1.1. For any $v^* \in V^*$, $v^* \ne 0$, such that $C \subset H_{\{v^*\le\alpha\}}$ for some $\alpha \in \mathbb R$, we have
$$\bigcap_{\{\alpha\,:\,C\subset H_{\{v^*\le\alpha\}}\}} H_{\{v^*\le\alpha\}} = H_{\{v^*\le\sigma_C(v^*)\}},$$
where $\sigma_C(v^*) := \sup\{\langle v^*, v\rangle : v \in C\}$. In other words, for any given $v^* \in V^*$, $v^* \ne 0$, such that $C \subset H_{\{v^*\le\alpha\}}$ for some $\alpha \in \mathbb R$, the intersection of all the "parallel" closed half-spaces $H_{\{v^*\le\alpha\}}$ containing $C$ is the closed half-space $H_{\{v^*\le\sigma_C(v^*)\}}$, where $\sigma_C(v^*)$ is defined as above. It is convenient to extend the definition of $\sigma_C$ to an arbitrary $v^* \in V^*$ by allowing it to take the value $+\infty$.
Definition 9.1.2. For any nonempty subset $C$ of $V$, the function $\sigma_C : V^* \to \mathbb R \cup \{+\infty\}$ defined by
$$\sigma_C(v^*) = \sup\{\langle v^*, v\rangle : v \in C\}$$
is called the support function of the set $C$.
Proof of Proposition 9.1.1. (a) For any $v \in C$, by definition of $\sigma_C$, we have $\langle v^*, v\rangle \le \sigma_C(v^*)$. Hence $C \subset H_{\{v^*\le\sigma_C(v^*)\}}$, which clearly implies
$$\bigcap_{\{\alpha\,:\,C\subset H_{\{v^*\le\alpha\}}\}} H_{\{v^*\le\alpha\}} \subset H_{\{v^*\le\sigma_C(v^*)\}}.$$
(b) For any $\alpha \in \mathbb R$ such that $C \subset H_{\{v^*\le\alpha\}}$, we have $\alpha \ge \sup\{\langle v^*, v\rangle : v \in C\} = \sigma_C(v^*)$. Hence $H_{\{v^*\le\sigma_C(v^*)\}} \subset H_{\{v^*\le\alpha\}}$, and
$$H_{\{v^*\le\sigma_C(v^*)\}} \subset \bigcap_{\{\alpha\,:\,C\subset H_{\{v^*\le\alpha\}}\}} H_{\{v^*\le\alpha\}},$$
which completes the proof.
As a direct consequence of formula (9.3) and Proposition 9.1.1, we obtain the following important result.
Theorem 9.1.2. Let $C$ be a nonempty closed convex subset of a normed linear space $V$. Then
$$C = \bigcap_{v^*\in V^*,\ v^*\ne0} H_{\{v^*\le\sigma_C(v^*)\}},$$
where $\sigma_C$ is the support function of $C$. Equivalently,
$$C = \{v \in V : \langle v^*, v\rangle \le \sigma_C(v^*)\ \ \forall v^* \in V^*\}.$$
Remark 9.1.1. The dual representation of a closed convex set $C$ has been obtained with the help of the support function $\sigma_C : V^* \to \mathbb R \cup \{+\infty\}$. As we will see in this chapter, the mapping $C \mapsto \sigma_C$ can be viewed as a particular case of the general duality correspondence, namely, the Legendre–Fenchel transform $f \mapsto f^*$. More precisely, by taking $f = \delta_C$, the indicator function of $C$, we have $f^* = \sigma_C$. We examine below the properties of $\sigma_C$ which are direct consequences of its definition.
Proposition 9.1.2. The support function $\sigma_C : V^* \to \mathbb R \cup \{+\infty\}$ of a nonempty closed convex subset $C$ is closed, convex, proper, and positively homogeneous of degree 1.
Proof. For any $v \in C$, the mapping $v^* \in V^* \mapsto \langle v^*, v\rangle$ is a continuous linear form on $V^*$, hence convex and continuous. The function $\sigma_C$, as a supremum of convex functions, is convex and, as a supremum of continuous functions,
it is closed (lower semicontinuous); see Proposition 3.2.3. Moreover, $\sigma_C(0) = 0$ and $\sigma_C$ is proper. Finally, for any $v^* \in V^*$ and $t > 0$ we have
$$\sigma_C(tv^*) = \sup\{\langle tv^*, v\rangle : v \in C\} = t\,\sup\{\langle v^*, v\rangle : v \in C\} = t\,\sigma_C(v^*),$$
which expresses that $\sigma_C$ is positively homogeneous of degree 1.
To have a sharper view of the dual generation of closed convex sets, it is interesting to introduce the notion of supporting hyperplane. This notion is closely related to the question: in the definition of $\sigma_C(v^*) = \sup\{\langle v^*, v\rangle : v \in C\}$, is the supremum attained?
Definition 9.1.3. An element $v^* \in V^*$, $v^* \ne 0$, is said to support $C$ at a point $u \in C$ if
$$\sigma_C(v^*) = \langle v^*, u\rangle = \sup\{\langle v^*, v\rangle : v \in C\}.$$
An equivalent terminology consists in saying that $v^*$ is a supporting functional of $C$ at $u \in C$. The geometric terminology above comes from the fact that when $v^*$ supports $C$ at $u \in C$, the closed half-space $H_{\{v^*\le\sigma_C(v^*)\}}$ contains $C$ and the corresponding hyperplane
$$H = \{v \in V : \langle v^*, v\rangle = \sigma_C(v^*)\}$$
intersects $C$ at $u$. (Note that the intersection of $H$ with $C$ may contain other points.) An interesting question is whether it is possible to obtain a dual representation of closed convex sets by supporting functionals. As we will see, this is a quite involved question, intimately connected with the properties of the subdifferential of a closed convex function and with the Bishop–Phelps theorem (density properties of the domain of the subdifferential). Let us end this section with some elementary examples illustrating the concept of support function.
Example 9.1.1. (1) Take $C = B(0,1)$, the unit ball of $V$. Then, for any $v^* \in V^*$,
$$\sigma_C(v^*) = \sup\{\langle v^*, v\rangle : \|v\|_V \le 1\} = \|v^*\|_{V^*},$$
i.e., $\sigma_C$ is the dual norm $\|\cdot\|_{V^*}$ of $\|\cdot\|_V$.
(2) Let $C$ be a cone, i.e., $\lambda v \in C$ for all $v \in C$ and $\lambda \ge 0$ (note that necessarily $0 \in C$ when $C \ne \emptyset$). Let us assume moreover that $C$ is nonempty, closed, and convex. Then
$$\sigma_C(v^*) = \sup\{\langle v^*, v\rangle : v \in C\} = \begin{cases} 0 & \text{if } \langle v^*, v\rangle \le 0\ \ \forall v \in C,\\ +\infty & \text{otherwise.}\end{cases}$$
Let us notice that the set
$$C^* = \{v^* \in V^* : \langle v^*, v\rangle \le 0 \text{ for all } v \in C\}$$
is a closed convex cone; it is called the polar cone of $C$. Thus $\sigma_C$ is equal to the indicator function of this polar cone: $\sigma_C = \delta_{C^*}$.
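Example 9.1.1 (1) and Theorem 9.1.2 can be illustrated in $\mathbb R^n$ with polytopes, for which the supremum defining $\sigma_C$ is attained at a vertex. In the sketch below (names are ours), the support function of the cube $[-1,1]^n$, the unit ball of the max-norm, turns out to be the dual norm $\|\cdot\|_1$, and that of the cross-polytope, the unit ball of $\|\cdot\|_1$, is the max-norm. Membership in the cross-polytope is then tested through the dual representation $\langle v^*, v\rangle \le \sigma_C(v^*)$, using only the finitely many facet normals $v^* \in \{-1,1\}^n$, which suffice for this polytope.

```python
from itertools import product

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def support(vertices, y):
    """sigma_C(y) = sup {<y, v> : v in C} for C = conv(vertices): a linear form
    attains its maximum over a polytope at a vertex, so a max over vertices suffices."""
    return max(dot(y, v) for v in vertices)

def member_by_support(v, vertices, directions, tol=1e-12):
    """Dual-representation test <y, v> <= sigma_C(y) for every y in a finite list of
    directions. Theorem 9.1.2 uses all y in V*; for a polytope, its facet normals suffice."""
    return all(dot(y, v) <= support(vertices, y) + tol for y in directions)

n = 3
# Vertices of the cube [-1, 1]^n (unit ball of the max-norm); they are also the
# outer facet normals of the cross-polytope.
cube = [list(s) for s in product((-1.0, 1.0), repeat=n)]
# Vertices +-e_i of the cross-polytope (unit ball of the l1-norm).
cross = [[float(s) if j == i else 0.0 for j in range(n)] for i in range(n) for s in (-1, 1)]

y = [0.5, -2.0, 1.5]
print(support(cube, y), sum(abs(t) for t in y))    # sigma of the max-norm ball vs l1-norm of y
print(support(cross, y), max(abs(t) for t in y))   # sigma of the l1-norm ball vs max-norm of y
print(member_by_support([0.2, -0.3, 0.4], cross, cube))  # ||v||_1 = 0.9, inside
print(member_by_support([0.5, -0.5, 0.2], cross, cube))  # ||v||_1 = 1.2, outside
```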
9.2 Passing from sets to functions: Elements of epigraphical calculus
Our next goal is to apply the dual representation Theorem 9.1.2 to the set $C = \operatorname{epi} f$, where $f : V \to \mathbb R \cup \{+\infty\}$ is a closed convex proper function. In doing so, we will obtain, by a purely geometrical approach, the Legendre–Fenchel duality theory for closed convex functions. To that end, it will be useful to develop some tools of epigraphical calculus, which consists in viewing functions as sets, via their epigraphs. As stressed in Section 3.2.2, the epigraph of an extended real-valued function is a geometrical object that carries most of the properties of the corresponding variational problems. In our context, given $f : V \to \mathbb R \cup \{+\infty\}$, recall that
$$f \text{ is closed (lsc)} \iff \operatorname{epi} f \text{ is closed},\qquad f \text{ is convex} \iff \operatorname{epi} f \text{ is convex},$$
and that the basic operation in convex analysis and duality, which consists in taking the supremum of a family of convex (affine) functions, has an immediate epigraphical interpretation:
$$\operatorname{epi}\Big(\sup_{k\in I} f_k\Big) = \bigcap_{k\in I} \operatorname{epi} f_k.$$
Beyond the classical operations on extended real-valued functions (sum and multiplication by a positive scalar), let us introduce the epi-addition, also called inf-convolution.
Definition 9.2.1. Let $V$ be a linear space and $f, g : V \to \mathbb R \cup \{+\infty\}$ two extended real-valued functions. The epi-sum of $f$ and $g$ (also called inf-convolution) is the function $f \#_e g : V \to \overline{\mathbb R}$
defined by
$$\begin{aligned}(f \#_e g)(v) &= \inf\{f(v_1) + g(v_2) : v_1 + v_2 = v,\ v_1, v_2 \in V\}\\ &= \inf\{f(v - w) + g(w) : w \in V\} = \inf\{f(w) + g(v - w) : w \in V\}.\end{aligned}$$
We often briefly write $f \# g$. Note that $f \#_e g$ may take the value $-\infty$ (for example, take $g = 0$ and $f$ not bounded from below). The term epi-sum comes from the following geometrical interpretation of this operation.
Proposition 9.2.1. For any $f, g : V \to \mathbb R \cup \{+\infty\}$,
$$\operatorname{epi}_S(f \#_e g) = \operatorname{epi}_S f + \operatorname{epi}_S g,$$
where $\operatorname{epi}_S f$ stands for the strict epigraph of $f$, i.e., $\operatorname{epi}_S f = \{(v, \lambda) \in V \times \mathbb R : \lambda > f(v)\}$, and $\operatorname{epi}_S f + \operatorname{epi}_S g$ is the vectorial sum (also called Minkowski sum) of the two sets $\operatorname{epi}_S f$ and $\operatorname{epi}_S g$.
Proof. We have $\lambda > (f \#_e g)(v)$ iff there exist $v_1, v_2 \in V$ with $v = v_1 + v_2$ such that $\lambda > f(v_1) + g(v_2)$. This is clearly equivalent to the existence of $v_1, v_2 \in V$ and $\lambda_1, \lambda_2 \in \mathbb R$ such that $\lambda_1 > f(v_1)$, $\lambda_2 > g(v_2)$, $v = v_1 + v_2$, and $\lambda = \lambda_1 + \lambda_2$. Equivalently, $(v, \lambda) = (v_1, \lambda_1) + (v_2, \lambda_2)$ with $(v_1, \lambda_1) \in \operatorname{epi}_S f$ and $(v_2, \lambda_2) \in \operatorname{epi}_S g$.
Remark 9.2.1. The term inf-convolution refers to the (formal) similarities of this operation with the usual convolution of functions on $\mathbb R^N$,
$$(f * g)(x) = \int_{\mathbb R^N} f(x - y)\,g(y)\,dy,$$
where one replaces the integral over $\mathbb R^N$ by an infimum and the product by an addition. As we will see, there are many striking similarities between these two operations.
Proposition 9.2.2. Let $f, g : V \to \mathbb R \cup \{+\infty\}$ be two convex functions. Then their epi-sum $f \#_e g$ is still a convex function.
Proof. This property is a clear consequence of the geometrical interpretation of the epi-sum via epigraphs (Proposition 9.2.1),
$$\operatorname{epi}_S(f \#_e g) = \operatorname{epi}_S f + \operatorname{epi}_S g,$$
and of the fact that the Minkowski (vectorial) sum of two convex sets is still convex. Indeed, given two convex subsets $C$ and $D$ of a vector space $E$, consider two points of $C + D$, $v_1 = c_1 + d_1$ and $v_2 = c_2 + d_2$ with $c_i \in C$ and $d_i \in D$ ($i = 1, 2$). For any $0 \le \lambda \le 1$, one has
$$\lambda v_1 + (1 - \lambda)v_2 = \lambda(c_1 + d_1) + (1 - \lambda)(c_2 + d_2) = \big[\lambda c_1 + (1 - \lambda)c_2\big] + \big[\lambda d_1 + (1 - \lambda)d_2\big],$$
which still belongs to $C + D$. One can then easily verify that the convexity of an extended real-valued function is equivalent to the convexity of its strict epigraph.
The fact that the epi-sum preserves convexity, as we have just observed, follows clearly from its geometrical interpretation. On the other hand, it is somewhat surprising from the analytical point of view, since $(f \#_e g)(v)$ is expressed as an infimum of convex functions, namely $v \mapsto f(v - w) + g(w)$, and the class of convex functions is not stable under pointwise infima. This calls for some explanation. In fact, the convexity of $f \#_e g$ when $f$ and $g$ are convex functions is a consequence of the observation that the function of two variables $(v, w) \mapsto h(v, w) := f(v - w) + g(w)$ is jointly convex with respect to the pair $(v, w)$, and of the following proposition.
Proposition 9.2.3. Let $V$ and $W$ be two linear spaces and $h : V \times W \to \mathbb R \cup \{+\infty\}$ be a convex function. Then the function $p : V \to \overline{\mathbb R}$ defined by
$$p(v) = \inf_{w\in W} h(v, w)$$
is still convex.
Proof. Let us prove that for any $u, v \in V$ and $\lambda \in\, ]0, 1[$,
$$p(\lambda u + (1 - \lambda)v) \le \lambda p(u) + (1 - \lambda)p(v).$$
Without any restriction, we can assume $p(u) < +\infty$ and $p(v) < +\infty$; otherwise the inequality is trivially satisfied. Take arbitrary real numbers $s > p(u)$ and $t > p(v)$. By definition of $p$, one can find elements $w_{u,s}$ and $w_{v,t}$ in $W$ such that $s > h(u, w_{u,s})$ and $t > h(v, w_{v,t})$. By convexity of $h$ with respect to the pair of variables $(v, w)$,
$$h\big(\lambda u + (1 - \lambda)v,\ \lambda w_{u,s} + (1 - \lambda)w_{v,t}\big) \le \lambda h(u, w_{u,s}) + (1 - \lambda)h(v, w_{v,t}) \le \lambda s + (1 - \lambda)t.$$
By definition of $p$,
$$p(\lambda u + (1 - \lambda)v) \le h\big(\lambda u + (1 - \lambda)v,\ \lambda w_{u,s} + (1 - \lambda)w_{v,t}\big).$$
Combining the two above inequalities, we obtain
$$p(\lambda u + (1 - \lambda)v) \le \lambda s + (1 - \lambda)t.$$
This being true for any s > p(u) and t > p(v), by letting s tend to p(u) and t tend to p(v), we obtain the required convexity inequality. We will see that the epi-sum (inf-convolution) is the dual operation of the usual sum. The epi-sum plays also an important role in the regularization of lower semicontinuous extended real-valued functions. The following theorem has a long history (it goes back to Hausdorff, Pasch, Baire, and has been revisited by many authors). Theorem 9.2.1 (Lipschitz regularization via epi-sum). Let (V , · ) be a normed space and let f : V → R ∪ {+∞} be a proper and lower semicontinuous function. Suppose moreover that f is conically minorized, i.e., there exists some k0 ≥ 0 such that for all v ∈ V f (v) ≥ −k0 (1 + v). Let us define for all k ∈ R+ the function fk := f #e k · , i.e., % & fk (v) = inf f (w) + kv − w . w∈V
Then we have:

(a) for all $k \ge k_0$, $f_k$ is Lipschitz continuous on $V$ with constant $k$, that is, for all $u, v \in V$
$$|f_k(u) - f_k(v)| \le k\|u - v\|;$$

(b) for all $v \in V$ one has
$$f(v) = \lim_{k \to +\infty} f_k(v).$$
More precisely, the sequence $(f_k)_k$ monotonically increases to $f$ as $k \uparrow +\infty$;

(c) when $f$ is convex, so is $f_k$ for all $k \ge k_0$.

Proof. (a) For all $k \ge k_0$, we have
$$f_k(v) \ge \inf_{w \in V}\big\{-k_0 - k_0\|w\| + k\|v - w\|\big\} \ge \inf_{w \in V}\big\{-k_0 - k_0\|w\| + k\|w\| - k\|v\|\big\} \ge -k_0 - k\|v\| > -\infty,$$
where the last inequality uses $(k - k_0)\|w\| \ge 0$. On the other hand, taking some $w_0 \in \operatorname{dom} f \neq \emptyset$ ($f$ is proper),
$$f_k(v) \le f(w_0) + k\|v - w_0\| < +\infty.$$
Hence, for all $k \ge k_0$ and all $v \in V$, $f_k(v)$ is a real number. Take now $u, v \in V$. The triangle inequality yields, for any $w \in V$, $\|v - w\| \le \|u - w\| + \|v - u\|$. Hence, for all $w \in V$ and all $k \in \mathbb{R}_+$,
$$f(w) + k\|v - w\| \le f(w) + k\|u - w\| + k\|v - u\|.$$
Chapter 9. Convex duality and optimization
Taking the infimum with respect to $w \in V$ yields $f_k(v) \le f_k(u) + k\|v - u\|$. Exchanging the roles of $v$ and $u$, and noticing that for $k \ge k_0$ both $f_k(v)$ and $f_k(u)$ are finite, yields $|f_k(v) - f_k(u)| \le k\|v - u\|$.

(b) By taking $w = v$ in the definition of $f_k(v)$, one has $f_k(v) \le f(v)$. Clearly, for fixed $v$, the sequence $(f_k(v))_k$ is nondecreasing with respect to $k$. Hence
$$\lim_{k \to +\infty} f_k(v) \le f(v).$$
Let us prove the reverse inequality
$$f(v) \le \lim_{k \to +\infty} f_k(v).$$
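If $\lim_{k \to +\infty} f_k(v) = +\infty$, there is nothing to prove. So let us assume that $\lim_{k \to +\infty} f_k(v) < +\infty$. For each $k \ge k_0$, let us introduce some $w_k \in V$ such that
$$f_k(v) \ge f(w_k) + k\|v - w_k\| - \varepsilon_k$$
for some $\varepsilon_k > 0$, with $\varepsilon_k \to 0$ as $k \to +\infty$. Using the growth condition on $f$, we obtain, for every $k \ge k_0$,
$$+\infty > \sup_{j > 0} f_j(v) \ge -k_0(1 + \|w_k\|) + k\|v - w_k\| - \varepsilon_k,$$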
which, since $\|w_k\| \le \|v\| + \|v - w_k\|$, gives $(k - k_0)\|v - w_k\| \le \sup_{j > 0} f_j(v) + k_0(1 + \|v\|) + \varepsilon_k$ and therefore implies $w_k \to v$ in $(V, \|\cdot\|)$ as $k \to +\infty$. Let us now pass to the limit in the inequality $f_k(v) \ge f(w_k) - \varepsilon_k$ and use the lower semicontinuity of $f$ to obtain
$$\lim_{k \to +\infty} f_k(v) \ge \liminf_{k} f(w_k) \ge f(v).$$
(c) Noticing that $f$ and $k\|\cdot\|$ are both convex functions, the convexity of $f_k = f \#_e k\|\cdot\|$ is a straightforward consequence of Proposition 9.2.2.

Let us end this section with the following striking property of closed convex functions. We know that a linear operator from a normed space into another normed space is continuous iff it is bounded on bounded sets. This is an important property since it reduces the study of the continuity of a linear operator $A : E \to F$ to the establishment of majorizations of the following type: there exists some $M \in \mathbb{R}_+$ such that
$$\|v\|_E \le 1 \Rightarrow \|Av\|_F \le M.$$
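Returning to Theorem 9.2.1, the regularization $f_k = f \#_e k\|\cdot\|$ can be explored numerically in one dimension. The following is a minimal sketch under stated assumptions: the infimum over $V = \mathbb{R}$ is replaced by a minimum over a finite grid, the example function is the lower semicontinuous step $f = 0$ on $]-\infty, 0]$ and $f = 1$ on $]0, +\infty[$ (conically minorized with $k_0 = 0$), and the helper names `lipschitz_regularization` and `empirical_lipschitz` are hypothetical. For this $f$ one has $f_k(v) = \min(k\max(v, 0), 1)$, so the sketch should show slopes bounded by $k$ and values increasing to $f$.

```python
def lipschitz_regularization(f, grid, k):
    """Grid version of f_k(v) = inf_w { f(w) + k|v - w| } for every v in grid.

    Hypothetical helper: the infimum over V = R is replaced by a minimum
    over the finite list `grid`, so the result only approximates f #_e k|.|.
    """
    return [min(f(w) + k * abs(v - w) for w in grid) for v in grid]


def empirical_lipschitz(grid, values):
    """Largest difference quotient between consecutive grid points."""
    return max(
        abs(values[i + 1] - values[i]) / (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def step(x):
    """Lower semicontinuous step function: 0 on (-inf, 0], 1 on (0, +inf)."""
    return 0.0 if x <= 0 else 1.0


grid = [i / 100 for i in range(-100, 101)]  # [-1, 1] with step 0.01; grid[110] is 0.1
for k in (1, 5, 50):
    fk = lipschitz_regularization(step, grid, k)
    # f_k(0.1) = min(0.1 * k, 1) increases to f(0.1) = 1; the slopes stay <= k
    print(k, fk[110], empirical_lipschitz(grid, fk))
```

With these choices the printed values of $f_k(0.1)$ are $0.1$, $0.5$ and $1$ (up to rounding), while the empirical slopes never exceed $k$, in line with parts (a) and (b) of the theorem.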
We will prove in Theorem 9.3.1 that any closed convex function is the supremum of all its continuous affine minorants. Thus, it is not surprising that, for closed convex functions too, local boundedness implies continuity. Let us make this precise in the following statement.

Theorem 9.2.2. Let $(V, \|\cdot\|)$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a convex function which is majorized on a neighbourhood of a point $v_0 \in \operatorname{dom} f$, i.e., there exists $r > 0$ such that
$$\sup_{\|v - v_0\| < r} f(v) < +\infty.$$
[…] > 0 and observe that for any $v \in B(0, r\varepsilon)$ the following convexity inequalities hold: writing $v = (1 - \varepsilon)\,0 + \varepsilon\big(\frac{1}{\varepsilon}v\big)$, we have
$$f(v) \le (1 - \varepsilon) f(0) + \varepsilon f\Big(\frac{1}{\varepsilon}v\Big) \le \varepsilon M;$$
writing $0 = \frac{1}{1 + \varepsilon}v + \frac{\varepsilon}{1 + \varepsilon}\big(-\frac{1}{\varepsilon}v\big)$, we have
$$0 = f(0) \le \frac{1}{1 + \varepsilon} f(v) + \frac{\varepsilon}{1 + \varepsilon} f\Big(-\frac{1}{\varepsilon}v\Big) \le \frac{1}{1 + \varepsilon} f(v) + \frac{\varepsilon M}{1 + \varepsilon},$$
which yields $f(v) \ge -\varepsilon M$. Combining the two inequalities above, we obtain $|f(v)| \le \varepsilon M$ for $v \in B(0, r\varepsilon)$, which yields the continuity of $f$ at the origin. (b) First observe that, in the above argument, taking $\varepsilon = 1$ yields the existence of some positive constant, which we still denote by $M$, such that
$$|f(v + v_0) - f(v_0)| \le M \quad \forall v \in B(0, r).$$
Take arbitrary $v, w \in B(v_0, r')$ with $v \neq w$. Set $\varepsilon = r - r' > 0$ and
$$u = w + \frac{\varepsilon}{\|w - v\|}(w - v), \qquad \lambda = \frac{\|w - v\|}{\varepsilon + \|w - v\|}.$$
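We have $u \in B(v_0, r)$ and $\|w - v\|\,u = (\varepsilon + \|w - v\|)\,w - \varepsilon v$. Equivalently,
$$w = \lambda u + \frac{\varepsilon}{\varepsilon + \|w - v\|}\,v,$$
i.e., $w = \lambda u + (1 - \lambda) v$ with $\lambda \in \,]0, 1[$. By convexity of $f$,
$$f(w) \le \lambda f(u) + (1 - \lambda) f(v) = f(v) + \lambda\big(f(u) - f(v)\big),$$
which yields (observe that $|f(u) - f(v)| \le |f(u) - f(v_0)| + |f(v) - f(v_0)| \le 2M$)
$$f(w) - f(v) \le \frac{2M\|w - v\|}{\varepsilon + \|w - v\|} \le \frac{2M}{\varepsilon}\|w - v\|.$$
Exchanging the roles of $w$ and $v$, we obtain
$$|f(w) - f(v)| \le \frac{2M}{\varepsilon}\|w - v\|,$$
which completes the proof.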
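The constant $2M/\varepsilon$ produced in part (b) is explicit, so it can be compared with the true Lipschitz constant in a simple case. The sketch below uses illustrative data of our own choosing (not from the text): $V = \mathbb{R}$, $f = \exp$, $v_0 = 0$, $r = 1$, $r' = 1/2$, so that $M = \sup_{|v| < 1}|e^v - 1| = e - 1$ and $\varepsilon = r - r' = 1/2$. The helper `lipschitz_bound` is hypothetical.

```python
import math


def lipschitz_bound(M, r, r_prime):
    """Constant 2M/eps with eps = r - r', as in part (b) of the proof (hypothetical helper)."""
    eps = r - r_prime
    return 2.0 * M / eps


# Illustrative data: V = R, f = exp, v0 = 0, r = 1, r' = 1/2.
M = math.e - 1.0  # sup of |exp(v) - exp(0)| over the open ball |v| < 1
bound = lipschitz_bound(M, r=1.0, r_prime=0.5)

# Largest difference quotient of exp between consecutive grid points of B(0, 1/2).
h = 0.001
pts = [-0.5 + j * h for j in range(1, 1000)]
empirical = max((math.exp(b) - math.exp(a)) / (b - a) for a, b in zip(pts, pts[1:]))
print(bound, empirical)
```

The bound $4(e - 1) \approx 6.87$ is valid but far from the actual Lipschitz constant $e^{1/2} \approx 1.65$ of $\exp$ on $B(0, 1/2)$: the proof only aims at a constant that depends on $M$ and on $r - r'$.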
9.3
Legendre–Fenchel transform
Given a closed convex proper function $f : V \to \mathbb{R} \cup \{+\infty\}$, we are going to introduce $f^*$, the Legendre–Fenchel transform of $f$, by considering the set $C = \operatorname{epi} f \subset V \times \mathbb{R}$ and its dual representation, as given by Theorem 9.1.2. To that end, we need to exploit the particular structure of the set $C = \operatorname{epi} f$ in $V \times \mathbb{R}$ and describe the family of closed half-spaces in $V \times \mathbb{R}$ containing it. Let us start with the following elementary result, which describes the closed half-spaces in $V \times \mathbb{R}$.

Lemma 9.3.1. Let $l \in (V \times \mathbb{R})^*$ be a continuous linear form on $V \times \mathbb{R}$, $l \neq 0$. Then there exist $u^* \in V^*$ and $\gamma \in \mathbb{R}$, $(u^*, \gamma) \neq 0$, such that
$$l(v, t) = \langle u^*, v \rangle + \gamma t \quad \forall (v, t) \in V \times \mathbb{R}.$$
A closed half-space $H$ in $V \times \mathbb{R}$ is of the following form:
$$H = H_{\{(u^*, \gamma) \le \alpha\}} := \{(v, t) \in V \times \mathbb{R} : \langle u^*, v \rangle + \gamma t \le \alpha\}.$$
Depending on the value of $\gamma$ ($\gamma = 0$ or $\gamma \neq 0$), we have two distinct situations:

(a) $\gamma = 0$. Then $H = \{(v, t) \in V \times \mathbb{R} : \langle u^*, v \rangle \le \alpha\} = \{u^* \le \alpha\} \times \mathbb{R}$ is invariant under all translations parallel to $\{0\} \times \mathbb{R}$. In that case, we say that the half-space is “vertical”.

(b) $\gamma \neq 0$. By normalization (dividing by $-\gamma$ and renaming $u^*$ and $\alpha$), one can rewrite the closed half-space $H$ in the form
$$H = \{(v, t) \in V \times \mathbb{R} : t \ge \langle u^*, v \rangle - \alpha\}$$
or
$$H = \{(v, t) \in V \times \mathbb{R} : t \le \langle u^*, v \rangle - \alpha\}.$$
It is the epigraph or the hypograph of the continuous affine function $v \mapsto \langle u^*, v \rangle - \alpha$.
Because of its particular structure, the epigraph of a proper function cannot be contained in a half-space of the form $\{(v, t) \in V \times \mathbb{R} : t \le \langle u^*, v \rangle - \alpha\}$. We can summarize the previous results in the following lemma.

Lemma 9.3.2. Let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a proper function. Then a closed half-space $H$ in $V \times \mathbb{R}$ containing the set $C = \operatorname{epi} f$ is either vertical or equal to the epigraph of a continuous affine function, i.e., there exist some $u^* \in V^*$ and $\alpha \in \mathbb{R}$ such that
$$H = \{(v, t) \in V \times \mathbb{R} : t \ge \langle u^*, v \rangle - \alpha\}.$$
Let us keep in mind that we are looking for the simplest dual representation of convex functions $f$. From this perspective, it is a striking and important property that one can get rid of the vertical half-spaces in their dual representation. On reflection, this is not surprising, since one can approach closed vertical half-spaces, and get arbitrarily “close” to them, by epigraphs of continuous affine functions. Let us make this precise in the following statement.

Theorem 9.3.1. Let $(V, \|\cdot\|)$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function. Then $f$ is equal to the supremum of all the continuous affine functions which minorize $f$.

Proof. (a) Let us first notice that, among all the closed half-spaces containing $C = \operatorname{epi} f$, at least one is the epigraph of a continuous affine function. Otherwise, there would be only vertical half-spaces in this family, and $C$, as an intersection of such sets, would be vertically invariant. This is impossible, because $f$ is proper. Indeed, for any $v_0 \in \operatorname{dom} f$, $C \cap (\{v_0\} \times \mathbb{R}) = \{v_0\} \times [f(v_0), +\infty[$, with $[f(v_0), +\infty[$ strictly included in $\mathbb{R}$. Then notice that
$$\operatorname{epi} f \subset \operatorname{epi}\big(v \mapsto \langle u^*, v \rangle - \alpha\big)$$
is equivalent to
$$f(v) \ge \langle u^*, v \rangle - \alpha \quad \forall v \in V,$$
i.e., $f$ admits at least one continuous affine minorant. (b) Let us recall the general property:
$$f = \sup_{i \in I} f_i \iff \operatorname{epi} f = \bigcap_{i \in I} \operatorname{epi} f_i.$$
Thus, to establish the assertion of the theorem, it suffices to show that each point $(v_0, t_0) \notin \operatorname{epi} f$ lies outside the epigraph of some continuous affine function that is majorized by $f$. We know that $C = \operatorname{epi} f$ is equal to the intersection of all the closed half-spaces that contain it (Corollary 9.1.1) and that any such half-space either is vertical or is the epigraph of a continuous affine function (Lemma 9.3.2). To eliminate the vertical half-spaces in their
dual representation, we use the Lipschitz regularization theorem, Theorem 9.2.1. Since $f$ admits a continuous affine minorant, it is conically minorized, and $f = \sup_k f_k$ with $f_k$ convex and Lipschitz continuous. Since $t_0 < f(v_0)$, for some sufficiently large $k_1 \ge k_0$ we have $t_0 < f_{k_1}(v_0) \le f(v_0)$, hence $(v_0, t_0) \notin \operatorname{epi} f_{k_1}$. Let us now use the dual representation of the closed convex set $\operatorname{epi} f_{k_1}$ as the intersection of all the closed half-spaces that contain it. Since $f_{k_1}$ is finite everywhere, no vertical half-space contains $\operatorname{epi} f_{k_1}$. Thus $f_{k_1}$ is the supremum of all its continuous affine minorants. As a consequence, one can find a continuous affine function $v \mapsto l(v) = \langle u^*, v \rangle - \alpha$ with $(v_0, t_0) \notin \operatorname{epi} l$ and $l \le f_{k_1}$. Since $f_{k_1} \le f$, we have $l \le f$ and $(v_0, t_0) \notin \operatorname{epi} l$, that is, $l$ satisfies all the required properties.

Just as for convex sets, we are going to look for the simplest dual description of closed convex functions, i.e., using the simplest continuous affine minorants. Theorem 9.3.1 tells us that
$$f(v) = \sup\{\langle v^*, v \rangle - \alpha : (v^*, \alpha) \in V^* \times \mathbb{R},\ \langle v^*, w \rangle - \alpha \le f(w)\ \forall w \in V\}.$$
Let us now observe that, for $v^* \in V^*$ fixed, letting $\alpha$ vary provides parallel continuous affine minorants. Clearly, the best (i.e., smallest) $\alpha$ is obtained by taking
$$\alpha = \sup\{\langle v^*, v \rangle - f(v) : v \in V\},$$
which, for $v^* \in V^*$ fixed, is a real number iff $f$ admits a continuous affine minorant with slope $v^*$. This is precisely the quantity which is classically denoted by
$$f^*(v^*) = \sup\{\langle v^*, v \rangle - f(v) : v \in V\}$$
and which makes sense for an arbitrary $v^* \in V^*$, possibly with the value $+\infty$. The above geometrical considerations allow us to reformulate Theorem 9.3.1 in the following form:
$$\forall v \in V \quad f(v) = \sup\{\langle v^*, v \rangle - f^*(v^*) : v^* \in V^*\}. \qquad (9.4)$$
We are now ready to introduce the classical notation, terminology, and basic facts concerning the Legendre–Fenchel transform, which is defined below for an arbitrary proper function $f$.

Definition 9.3.1.
Let $V$ be a normed linear space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a proper function. The Legendre–Fenchel conjugate of $f$ (L-F for short) is the function $f^* : V^* \to \mathbb{R} \cup \{+\infty\}$ defined by
$$f^*(v^*) = \sup\{\langle v^*, v \rangle - f(v) : v \in V\}.$$
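To make the definition concrete, here is a minimal sketch of a grid approximation of $f^*$ on $V = \mathbb{R}$, where $\langle v^*, v \rangle = v^* v$. The supremum over $V$ is replaced by a maximum over a finite grid (an assumption: the grid must contain, or come close to, the maximizers, otherwise the value is underestimated), and the name `conjugate` is hypothetical. It checks that $\frac12|\cdot|^2$ is its own conjugate and, anticipating Corollary 9.3.1, that the conjugate of $\frac1p|\cdot|^p$ is $\frac1{p'}|\cdot|^{p'}$ for the illustrative exponent $p = 3$, $p' = 3/2$.

```python
def conjugate(f, grid, s):
    """Grid approximation of f*(s) = sup_v { s*v - f(v) } on V = R (hypothetical helper).

    The value is a lower bound for f*(s); it is accurate only if the grid
    contains (or comes close to) a maximizer.
    """
    return max(s * v - f(v) for v in grid)


def half_square(v):
    return 0.5 * v * v


p = 3.0
q = p / (p - 1.0)  # Hölder conjugate exponent, here q = 1.5


def power(v):
    return abs(v) ** p / p


grid = [i / 1000 for i in range(-5000, 5001)]  # [-5, 5] with step 0.001

for s in (-2.0, -0.5, 0.0, 1.0, 2.0):
    # grid conjugate next to the closed form, for both functions
    print(s, conjugate(half_square, grid, s), 0.5 * s * s,
          conjugate(power, grid, s), abs(s) ** q / q)
```

Each grid value agrees with the corresponding closed form to within the discretization error, which is tiny here because every maximizer ($v = s$, respectively $v = \operatorname{sign}(s)|s|^{1/2}$) lies in $[-5, 5]$.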
Let us notice that, since $f$ is proper, taking some $v_0 \in \operatorname{dom} f$ gives
$$f^*(v^*) \ge \langle v^*, v_0 \rangle - f(v_0),$$
i.e., $f^*$ admits a continuous affine minorant and $f^* : V^* \to \mathbb{R} \cup \{+\infty\}$. Moreover, $v^* \in \operatorname{dom} f^*$ iff there exists $\alpha \in \mathbb{R}$ such that for all $v \in V$ one has $\langle v^*, v \rangle - f(v) \le \alpha$, i.e., $f(v) \ge \langle v^*, v \rangle - \alpha$. Let us now return to the case where $f$ is closed, convex, and proper. We know that $f$ admits at least one such continuous affine minorant. This implies that $f^*$ is proper. Since $f^*$ is a supremum of continuous affine functions, it is a closed convex proper function from $V^*$ into $\mathbb{R} \cup \{+\infty\}$. Let us examine the two formulas
$$f^*(v^*) = \sup\{\langle v^*, v \rangle - f(v) : v \in V\} \quad \text{(definition of } f^*\text{)},$$
$$f(v) = \sup\{\langle v^*, v \rangle - f^*(v^*) : v^* \in V^*\} \quad \text{(Theorem 9.3.1)}.$$
They are essentially the same. Let us make this precise. Since $f^*$ is closed, convex, and proper, we can compute its conjugate $f^{**} : V^{**} \to \mathbb{R} \cup \{+\infty\}$. By using the canonical embedding of $V$ into $V^{**}$, we can restrict $f^{**}$ to $V$ to obtain
$$\forall v \in V \quad f^{**}(v) = \sup\{\langle v^*, v \rangle - f^*(v^*) : v^* \in V^*\}$$
(recall that $i : V \to V^{**}$ is defined by $i(v)(v^*) = v^*(v)$). The dual representation theorem for closed convex functions, Theorem 9.3.1, can then be reformulated as $f = f^{**}$. This is the Fenchel–Moreau–Rockafellar theorem, which we now state.

Theorem 9.3.2. Let $V$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function. Then $f = f^{**}$, i.e., $f$ is equal to its biconjugate. Equivalently, for all $v \in V$,
$$f(v) = \sup\{\langle v^*, v \rangle - f^*(v^*) : v^* \in V^*\}.$$
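Theorem 9.3.2 can be observed numerically on $V = \mathbb{R}$ by applying a grid conjugate twice. In the sketch below, the function $f(v) = \frac12 v^2 + |v|$, the two grids, and the helper name `grid_conjugate` are illustrative assumptions of ours. The slope grid is chosen wide enough to contain a subgradient of $f$ at every point of the primal grid, which is what makes the discrete biconjugate reproduce $f$ at the grid points. The intermediate values also match the closed form $f^*(s) = \frac12\big(\max(|s| - 1, 0)\big)^2$ for slopes whose maximizer lies in the primal grid.

```python
def grid_conjugate(values, points, slopes):
    """For each s in `slopes`, return max_i { s*points[i] - values[i] } (hypothetical helper)."""
    return [max(s * x - fx for x, fx in zip(points, values)) for s in slopes]


def f(v):
    return 0.5 * v * v + abs(v)


xs = [i / 20 for i in range(-60, 61)]     # primal grid on [-3, 3], step 0.05
ss = [j / 20 for j in range(-120, 121)]   # slope grid on [-6, 6], step 0.05

f_vals = [f(x) for x in xs]
f_star = grid_conjugate(f_vals, xs, ss)        # approximates f* on the slope grid
f_star_star = grid_conjugate(f_star, ss, xs)   # approximates f** on the primal grid

# maximal discrepancy between f** and f on the primal grid (rounding level only)
print(max(abs(a - b) for a, b in zip(f_star_star, f_vals)))
```

The second call simply swaps the roles of points and slopes, which reflects the symmetry of the pairing $\langle v^*, v \rangle = v^* v$.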
It is worth noticing that, in the above theorem, the dual representation of closed convex proper functions has been given a very simple formulation. On the other hand, it is a precise analytic formulation which may hide the geometrical features of this duality theory. It is good to keep both aspects in mind. At this point, it is interesting to observe that the duality for functions has been derived from the duality for sets (via the representation of $f : V \to \mathbb{R} \cup \{+\infty\}$ by its epigraph $C = \operatorname{epi} f$). Conversely, the duality for sets can be obtained as a particular case of the duality for functions. Let us associate with a set $C$ its indicator function $\delta_C$ and first observe
that whenever $C$ is a nonempty closed convex set, $\delta_C$ is a closed convex proper function. Then notice that
$$(\delta_C)^*(v^*) = \sup\{\langle v^*, v \rangle - \delta_C(v) : v \in V\} = \sup\{\langle v^*, v \rangle : v \in C\} = \sigma_C(v^*),$$
i.e., $(\delta_C)^*$ is the support function of $C$. The Fenchel–Moreau–Rockafellar theorem, Theorem 9.3.2, says that $(\delta_C)^{**} = \delta_C$, which is equivalent to
$$\delta_C(v) = \sup\{\langle v^*, v \rangle - \sigma_C(v^*) : v^* \in V^*\}.$$
Noticing that $v \in C$ iff $\delta_C(v) = 0$, one gets
$$C = \{v \in V : \langle v^*, v \rangle \le \sigma_C(v^*)\ \forall v^* \in V^*\}.$$
Let us summarize the previous results.

Proposition 9.3.1. Let $C$ be a nonempty closed convex subset of a normed linear space $V$; then $(\delta_C)^* = \sigma_C$ and $(\sigma_C)^* = \delta_C$, that is,
$$C = \{v \in V : \langle v^*, v \rangle \le \sigma_C(v^*) \text{ for all } v^* \in V^*\}.$$
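Proposition 9.3.1 can be illustrated in $V = \mathbb{R}^2$ with the Euclidean pairing. For a polytope $C = \operatorname{conv}(\text{vertices})$, the support function is a maximum over the vertices, and membership can be tested through the inequalities $\langle v^*, v \rangle \le \sigma_C(v^*)$. The sketch below uses the square $C = [-1, 1]^2$ (an illustrative choice, for which $\sigma_C(v^*) = |v^*_1| + |v^*_2|$) and tests only finitely many unit directions, which approximates the condition “for all $v^* \in V^*$”. The function names are hypothetical.

```python
import math


def support_function(vertices, direction):
    """sigma_C(v*) = max over vertices c of <v*, c>, valid for C = conv(vertices)."""
    return max(direction[0] * c[0] + direction[1] * c[1] for c in vertices)


def in_set(vertices, point, n_dirs=360):
    """Check <v*, v> <= sigma_C(v*) on n_dirs unit directions (a finite approximation)."""
    for m in range(n_dirs):
        theta = 2 * math.pi * m / n_dirs
        d = (math.cos(theta), math.sin(theta))
        if d[0] * point[0] + d[1] * point[1] > support_function(vertices, d) + 1e-12:
            return False  # a separating direction was found
    return True


square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
print(support_function(square, (3.0, -2.0)))  # 5.0 = |3| + |-2|
print(in_set(square, (0.5, 0.5)), in_set(square, (1.2, 0.0)))  # True False
```

Since the square is the closed unit ball of the $\ell^\infty$ norm, the identity $\sigma_C = \|\cdot\|_1$ observed here is a finite-dimensional instance of the relation $\sigma_{B(0,1)} = \|\cdot\|_{V^*}$ of Example 9.3.1.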
It is worth noticing that the biconjugate operation $f \mapsto f^{**}$ enjoys nice properties for convex functions which are not necessarily closed. We recall (see Section 3.2.4) that, given a general topological space $(X, \tau)$ and $f : X \to \mathbb{R} \cup \{+\infty\}$, $\operatorname{cl}_\tau f$ is the largest $\tau$-lower semicontinuous function that minorizes $f$. We have $\operatorname{epi}(\operatorname{cl}_\tau f) = \operatorname{cl}(\operatorname{epi} f)$; $\operatorname{cl}_\tau f$ is called the lower semicontinuous regularization of $f$. Moreover (see Proposition 3.2.5 (d)), $f$ is $\tau$-lsc at $x$ iff $f(x) = (\operatorname{cl}_\tau f)(x)$. For convex functions $f : V \to \mathbb{R} \cup \{+\infty\}$, with $V$ a normed space, we have the following elegant characterization of $\operatorname{cl} f$ (for the norm topology of $V$) in terms of the biconjugate $f^{**}$.

Proposition 9.3.2. Let $V$ be a normed linear space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a convex proper function. Let us assume that $f$ admits a continuous affine minorant. Then the following equality holds:
$$f^{**} = \operatorname{cl} f.$$
As a consequence,
$$f \text{ is lower semicontinuous at } u \in V \iff f(u) = f^{**}(u).$$
Proof. By definition, for any $v \in V$,
$$f^{**}(v) = \sup\{\langle v^*, v \rangle - f^*(v^*) : v^* \in V^*\},$$
which implies that $f^{**}$ is the upper envelope of the continuous affine minorants of $f$. It is a closed (convex) proper function, hence
$$f^{**} \le \operatorname{cl} f \le f.$$
Let us now observe that $\operatorname{cl} f$ is still convex, because the epigraph of $\operatorname{cl} f$ is the closure of $\operatorname{epi} f$, which is a convex set. Hence $\operatorname{cl} f$ is a closed convex proper function. Since $f^{**}$ and $\operatorname{cl} f$ are both closed convex proper functions, we apply Theorem 9.3.2 to obtain
$$(f^{**})^{**} = f^{**}, \qquad (\operatorname{cl} f)^{**} = \operatorname{cl} f.$$
The inequality $f^{**} \le \operatorname{cl} f \le f$ implies, by taking the biconjugate of each term (biconjugation preserves the order), that $(f^{**})^{**} \le (\operatorname{cl} f)^{**} \le f^{**}$, i.e.,
$$f^{**} \le \operatorname{cl} f \le f^{**},$$
and the equality $f^{**} = \operatorname{cl} f$ follows. By Proposition 3.2.5 (d), for general functions $f : V \to \mathbb{R} \cup \{+\infty\}$ we have the equivalence
$$f \text{ is lower semicontinuous at } u \in V \iff f(u) = \operatorname{cl} f(u).$$
As a consequence, when $f$ is proper and convex, we have
$$f \text{ is lower semicontinuous at } u \in V \iff f(u) = f^{**}(u),$$
which completes the proof.

Example 9.3.1. (1) Take $C = B(0, 1)$. By definition of the dual norm, for any $v^* \in V^*$,
$$\sigma_{B(0,1)}(v^*) = \sup_{v \in B(0,1)} \langle v^*, v \rangle = \|v^*\|_{V^*}.$$
Thus $\sigma_{B(0,1)} = \|\cdot\|_{V^*}$. Conversely, the convex duality theorem yields
$$(\|\cdot\|_*)^* = \delta_{B(0,1)}. \qquad (9.5)$$
(2) As suggested by the result above, we have
$$(\|\cdot\|)^* = \delta_{B_*(0,1)}, \qquad (9.6)$$
where $B_*(0,1)$ denotes the closed unit ball of $V^*$.
Let us prove (9.6). In contrast with (9.5), (9.6) is an elementary result which does not use the Hahn–Banach theorem. Set $f(v) = \|v\|_V$. Then, for any $v^* \in V^*$,
$$f^*(v^*) = \sup\{\langle v^*, v \rangle - \|v\| : v \in V\}.$$
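If $\|v^*\|_* \le 1$, then $\langle v^*, v \rangle - \|v\| \le 0$ for all $v \in V$, and $\langle v^*, v \rangle - \|v\| = 0$ for $v = 0$. Thus $f^*(v^*) = 0$. If $\|v^*\|_* > 1$, by definition of $\|\cdot\|_*$ there exists some $v_0 \in V$ such that $\langle v^*, v_0 \rangle > \|v_0\|$. It follows that, for all $t > 0$,
$$\langle v^*, tv_0 \rangle - \|tv_0\| = t\big(\langle v^*, v_0 \rangle - \|v_0\|\big).$$
Hence $\lim_{t \to +\infty}\big(\langle v^*, tv_0 \rangle - \|tv_0\|\big) = +\infty$, which implies
$$f^*(v^*) = \sup_{v \in V}\{\langle v^*, v \rangle - f(v)\} = +\infty.$$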
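The dichotomy in the proof of (9.6) can be observed numerically. The sketch below rests on assumptions of our own: $V = \mathbb{R}^2$ with the $\ell^1$ norm (whose dual norm is the $\ell^\infty$ norm), and the supremum restricted to the box $[-R, R]^2$ sampled with step $1/2$; the name `truncated_conjugate` is hypothetical. The truncated value stays at $0$ when $\|v^*\|_\infty \le 1$, whereas for $\|v^*\|_\infty > 1$ it grows linearly with $R$, exactly as $t(\langle v^*, v_0 \rangle - \|v_0\|)$ does along the ray $t v_0$.

```python
def l1_norm(v):
    return abs(v[0]) + abs(v[1])


def truncated_conjugate(vstar, R, step=0.5):
    """max of <v*, v> - ||v||_1 over grid points of the box [-R, R]^2 (hypothetical helper)."""
    n = int(round(R / step))
    pts = [i * step for i in range(-n, n + 1)]
    return max(vstar[0] * a + vstar[1] * b - l1_norm((a, b)) for a in pts for b in pts)


for vstar in [(0.5, -1.0), (1.5, 0.0)]:
    # ||v*||_inf <= 1 for the first slope, > 1 for the second
    print(vstar, [truncated_conjugate(vstar, R) for R in (5, 10, 20)])
```

For the first slope the printed values are all zero; for the second they grow like $R/2$, which is the numerical shadow of $f^*(v^*) = +\infty$.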
By taking the conjugate in (9.6) and applying the duality Theorem 9.3.2, we obtain, for any $v \in V$,
$$\|v\| = \sup\{\langle v^*, v \rangle : \|v^*\|_* \le 1\};$$
this is the isometric embedding theorem from $V$ into its bidual $V^{**}$. We give in the following proposition an important example of a dual convex function, which is indeed an extension of Example 9.3.1, case (2).

Proposition 9.3.3. Let $(V, \|\cdot\|)$ be a normed space with topological dual space $(V^*, \|\cdot\|_*)$. Let $\varphi : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ be a closed convex function which is even (i.e., $\varphi(-t) = \varphi(t)$). Then the function $f : V \to \mathbb{R} \cup \{+\infty\}$, $f(v) = \varphi(\|v\|)$, is a closed convex proper function and
$$f^*(v^*) = \varphi^*(\|v^*\|_*).$$

Proof. The assumptions on $\varphi$ imply that $\varphi$ is nondecreasing on $\mathbb{R}_+$. Thus $f$ is still convex and clearly closed. Moreover,
$$\begin{aligned}
f^*(v^*) &= \sup_{v \in V}\{\langle v^*, v \rangle - \varphi(\|v\|)\} = \sup_{t \ge 0}\ \sup_{v \in V,\ \|v\| = t}\{\langle v^*, v \rangle - \varphi(\|v\|)\} \\
&= \sup_{t \ge 0}\{t\|v^*\|_* - \varphi(t)\} \\
&= \sup_{t \in \mathbb{R}}\{t\|v^*\|_* - \varphi(t)\} \quad \text{(because } \varphi \text{ is even)} \\
&= \varphi^*(\|v^*\|_*),
\end{aligned}$$
which completes the proof. As a straightforward consequence, we obtain the following useful result.
Corollary 9.3.1. Set $f(v) = \frac{1}{p}\|v\|^p$ with $1 < p < +\infty$. Then $f^*(v^*) = \frac{1}{p'}\|v^*\|_*^{p'}$, where $p'$ is the Hölder conjugate exponent of $p$, i.e., $1/p + 1/p' = 1$. In particular, taking
$V = L^p(\Omega, \mathcal{A}, \mu)$, $1 < p < +\infty$, we have $V^* = L^{p'}(\Omega, \mathcal{A}, \mu)$, and the conjugate function of
$$f(v) = \frac{1}{p}\int_\Omega |v(x)|^p \, d\mu(x)$$
is equal to
$$f^*(v^*) = \frac{1}{p'}\int_\Omega |v^*(x)|^{p'} \, d\mu(x).$$
This makes the transition to the next important example, which is concerned with integral functionals and which will be studied in detail in Chapter 13.

Theorem 9.3.3. Let $V = L^p(\Omega, \mathcal{A}, \mu)$, $1 < p < +\infty$, and let
$$f(v) = \int_\Omega j(x, v(x)) \, d\mu(x)$$
be a convex integral functional associated with a convex normal integrand $j$. Then $f^* : V^* = L^{p'}(\Omega, \mathcal{A}, \mu) \to \mathbb{R} \cup \{+\infty\}$ is given by
$$f^*(v^*) = \int_\Omega j^*(x, v^*(x)) \, d\mu(x),$$
where $j^*(x, \cdot)$ is the convex conjugate of $j(x, \cdot)$.

Remark 9.3.1. When $V = H$ is a Hilbert space, the L-F transform $f \mapsto f^*$ is an involution from $\Gamma_0(H)$ into itself, where $\Gamma_0(H)$ is the set of closed convex proper functions on $H$:
$$\Gamma_0(H) \xrightarrow{\;*\;} \Gamma_0(H), \quad f \mapsto f^*, \quad \text{i.e., } f^{**} = f.$$
This transform has some analogy with the Fourier–Plancherel transform (F-P for short), $\mathcal{F} : L^2(\mathbb{R}^N) \to L^2(\mathbb{R}^N)$, $f \mapsto \mathcal{F}(f)$, which is an isometry and satisfies $\mathcal{F}\mathcal{F}f = f$ for even $f$ (in general, $(\mathcal{F}\mathcal{F}f)(x) = f(-x)$). Let us notice that
$$\big(\|\cdot\|^2/2\big)^* = \|\cdot\|^2/2,$$
i.e., $\|\cdot\|^2/2$ is invariant under the L-F transform, while $f(x) = e^{-|x|^2/2}$ is invariant under the F-P transform (with the unitary normalization of $\mathcal{F}$). A basic property of the F-P transform is $\mathcal{F}(f \ast g) = \mathcal{F}(f)\,\mathcal{F}(g)$ (up to a normalization constant), where $f \ast g$ is the convolution of functions. This property can be seen as the (formal) analogue of the property $(f \#_e g)^* = f^* + g^*$, which is studied in Section 9.4. Let us complete this section by studying the natural setting in which the Legendre–Fenchel transform acts as an operator. We will pay particular attention to the description
of its range. Our final result (Theorem 9.3.5) shows that the Legendre–Fenchel transform is a one-to-one mapping from $\Gamma_0(V)$ onto $\Gamma_0(V^*)$ (cf. Definition 9.3.2). Given a general normed space $(V, \|\cdot\|)$, let us recall that the Legendre–Fenchel transform is the mapping which associates with a closed convex proper function $f : V \to \mathbb{R} \cup \{+\infty\}$ its conjugate $f^* : V^* \to \mathbb{R} \cup \{+\infty\}$, defined by
$$\forall v^* \in V^* \quad f^*(v^*) = \sup\{\langle v^*, v \rangle - f(v) : v \in V\}.$$
Let us start with some simple observations.

(a) The Legendre–Fenchel transform is one-to-one. This results from the implication $f^* = g^* \Rightarrow f^{**} = g^{**}$, together with $f^{**} = f$ and $g^{**} = g$, which hold true for arbitrary closed convex proper functions $f$ and $g$ (Theorem 9.3.2).

(b) The description of the range of the L-F transform requires some attention. Let us return to the definition of $f^*$ above. In this supremum, one just needs to consider $v \in \operatorname{dom} f$. For such $v$, the mapping $v^* \mapsto \langle v^*, v \rangle - f(v)$ is continuous on $V^*$ for the topology $\sigma(V^*, V)$, which is the weak star topology of the dual. This follows directly from the definition of this topology. Hence $f^*$, as a supremum of such affine $\sigma(V^*, V)$-continuous functions, is a convex proper function which is $\sigma(V^*, V)$ lower semicontinuous. Note that, when $V$ is not reflexive, for a convex function $g : V^* \to \mathbb{R} \cup \{+\infty\}$, being $\sigma(V^*, V)$ lower semicontinuous is a strictly stronger property than just being lower semicontinuous for the norm topology. Indeed, one can exhibit a closed convex function $g : V^* \to \mathbb{R}$ which is not $\sigma(V^*, V)$ lower semicontinuous. Take any $\xi \in V^{**} \setminus I(V)$, where $I$ is the canonical embedding of $V$ into its bidual $V^{**}$ (recall that $\langle I(v), v^* \rangle_{(V^{**}, V^*)} = \langle v^*, v \rangle_{(V^*, V)}$). Then define $g(v^*) := \langle \xi, v^* \rangle_{(V^{**}, V^*)}$. Clearly $g$ is continuous on $V^*$, but it is not $\sigma(V^*, V)$ lower semicontinuous. Otherwise, by linearity, it would be continuous for the topology $\sigma(V^*, V)$, which would imply $\xi \in I(V)$.
It turns out that this $\sigma(V^*, V)$ lower semicontinuity property allows us to characterize the range of the L-F transform. Let us make this precise in the following statement.

Theorem 9.3.4. Let $(V, \|\cdot\|)$ be a normed space.
(a) For any $f : V \to \mathbb{R} \cup \{+\infty\}$ which is closed, convex, and proper, its Legendre–Fenchel conjugate $f^* : V^* \to \mathbb{R} \cup \{+\infty\}$ is a convex proper function which is $\sigma(V^*, V)$ lower semicontinuous.
(b) Conversely, let $g : V^* \to \mathbb{R} \cup \{+\infty\}$ be a convex, proper, and $\sigma(V^*, V)$ lower semicontinuous function. Then $g = g^{**}$ and $g$ belongs to the range of the Legendre–Fenchel transform. More precisely, $g = (g^*)^*$ is equal to the Legendre–Fenchel transform of the closed convex proper function $g^* : V \to \mathbb{R} \cup \{+\infty\}$ defined by
$$\forall v \in V \quad g^*(v) = \sup\{\langle v^*, v \rangle - g(v^*) : v^* \in V^*\}.$$

Proof. Part (a) has already been proved. The proof of part (b) requires some further topological tools. When equipped with the topology $\sigma(V^*, V)$, the space $V^*$ is a locally convex
topological vector space whose dual can be identified with $V$. The Hahn–Banach theorem still holds in locally convex topological vector spaces, and from that point on, the proof is essentially the same as that of Theorem 9.3.2.

To give a unified formulation of Theorems 9.3.2 and 9.3.4, where $V$ and $V^*$, $f$ and $f^*$ play symmetric roles, it is convenient to introduce the following notions and notation.

Definition 9.3.2. Let $(V, \|\cdot\|)$ be a normed space with topological dual $V^*$. We set
$$\Gamma_{0,V^*}(V) = \{f : V \to \mathbb{R} \cup \{+\infty\} : f \text{ is a pointwise supremum of a nonempty family of affine functions with slopes in } V^*,\ f \not\equiv +\infty\};$$
$$\Gamma_{0,V}(V^*) = \{g : V^* \to \mathbb{R} \cup \{+\infty\} : g \text{ is a pointwise supremum of a nonempty family of affine functions with slopes in } V,\ g \not\equiv +\infty\}.$$
These definitions make explicit reference to the pairing between the two spaces $V$ and $V^*$, that is, $(v, v^*) \in V \times V^* \mapsto \langle v^*, v \rangle_{(V^*, V)} = v^*(v)$. When there is no ambiguity, one often omits the subscript referring to the coupled space and writes briefly $\Gamma_0(V)$ and $\Gamma_0(V^*)$. To be more precise, one has
$$f \in \Gamma_0(V) \iff f = \sup_{i \in I} f_i \quad \text{with } f_i(v) = \langle v_i^*, v \rangle - \alpha_i$$
for some index set $I$, $v_i^* \in V^*$ (slope), and $\alpha_i \in \mathbb{R}$;
$$g \in \Gamma_0(V^*) \iff g = \sup_{j \in J} g_j \quad \text{with } g_j(v^*) = \langle v^*, v_j \rangle - \beta_j$$
for some index set $J$, $v_j \in V$ (slope), and $\beta_j \in \mathbb{R}$. We can now reformulate Theorem 9.3.1 and its counterpart for the locally convex topological vector space $(V^*, \sigma(V^*, V))$, together with Theorems 9.3.2 and 9.3.4, in the following final statement.

Theorem 9.3.5. Let $(V, \|\cdot\|)$ be a normed space with topological dual $V^*$. Then:
(a) one has
$$\Gamma_0(V) = \{f : V \to \mathbb{R} \cup \{+\infty\} : f \text{ closed, convex, proper}\} = \{f : V \to \mathbb{R} \cup \{+\infty\} : f\ \sigma(V, V^*) \text{ closed, convex, proper}\},$$
while
$$\Gamma_0(V^*) = \{g : V^* \to \mathbb{R} \cup \{+\infty\} : g\ \sigma(V^*, V) \text{ closed, convex, proper}\};$$
(b) the Legendre–Fenchel transform is a one-to-one mapping from $\Gamma_0(V)$ onto $\Gamma_0(V^*)$:
$$\Gamma_0(V) \xrightarrow{\;*\;} \Gamma_0(V^*), \quad f \mapsto f^*.$$
For any $f \in \Gamma_0(V)$ one has $f = f^{**}$, and for any $g \in \Gamma_0(V^*)$ one has $g = g^{**}$.
Remark 9.3.2. The preceding theory can be developed in the general setting of two vector spaces $V$ and $W$ in separating duality. Let us denote by $\langle v, w \rangle_{(V,W)}$ a given pairing between elements $v \in V$ and $w \in W$. It is a bilinear form with separating properties, namely,
$$\forall v \in V,\ v \neq 0,\ \exists w \in W \text{ with } \langle v, w \rangle \neq 0; \qquad \forall w \in W,\ w \neq 0,\ \exists v \in V \text{ with } \langle v, w \rangle \neq 0.$$
Then $W$ is the dual of $(V, \sigma(V, W))$ and conversely. The set $\Gamma_0(V)$ (respectively, $\Gamma_0(W)$) is defined by taking suprema of affine functions with slopes in $W$ (respectively, $V$), and the Legendre–Fenchel transform is a one-to-one mapping from $\Gamma_0(V)$ onto $\Gamma_0(W)$; see Moreau [182] for further details.
9.4
Legendre–Fenchel calculus
As we have already stressed, most optimization problems can be written as
$$\inf\{f(v) : v \in V\},$$
where $f = f_0 + \delta_C$ is the sum of the objective function $f_0$ and the indicator function of the constraint $C$. This explains the importance of a formula for the Legendre–Fenchel conjugate of a sum of functions. At this point, the epi-sum plays a central role, because of the following general property.

Proposition 9.4.1. Let $\varphi, \psi : V \to \mathbb{R} \cup \{+\infty\}$ be two proper functions. Then
$$(\varphi \# \psi)^* = \varphi^* + \psi^*.$$

Proof. It is enough to take $v^* \in V^*$ and compute
$$\begin{aligned}
(\varphi \# \psi)^*(v^*) &= \sup_{v \in V}\{\langle v^*, v \rangle - (\varphi \# \psi)(v)\} \\
&= \sup_{v \in V}\Big\{\langle v^*, v \rangle - \inf_{v_1 + v_2 = v}\big[\varphi(v_1) + \psi(v_2)\big]\Big\} \\
&= \sup_{v \in V}\Big\{\langle v^*, v \rangle + \sup_{v_1 + v_2 = v}\big[-\varphi(v_1) - \psi(v_2)\big]\Big\} \\
&= \sup_{v \in V,\ v_1 + v_2 = v}\big\{\langle v^*, v_1 \rangle - \varphi(v_1) + \langle v^*, v_2 \rangle - \psi(v_2)\big\} \\
&= \varphi^*(v^*) + \psi^*(v^*),
\end{aligned}$$
where the last equality holds because $(v_1, v_2)$ then ranges over all of $V \times V$; this completes the proof.

Corollary 9.4.1. Let $f, g : V \to \mathbb{R} \cup \{+\infty\}$ be two closed convex proper functions. Then
$$(f + g)^* = (f^* \# g^*)^{**}.$$
As a consequence, when the convex function $f^* \# g^*$ is a $\sigma(V^*, V)$ closed proper function, we have $(f + g)^* = f^* \# g^*$.

Proof. By Proposition 9.4.1, we have $(f^* \# g^*)^* = f^{**} + g^{**}$. When $f$ and $g$ are assumed to be closed, convex, and proper, one gets $(f^* \# g^*)^* = f + g$. Taking again the Legendre–Fenchel conjugate, we obtain $(f + g)^* = (f^* \# g^*)^{**}$. The function $f^* \# g^*$, as the epi-sum of two convex functions, is still convex (Proposition 9.2.2). When it is $\sigma(V^*, V)$ closed and proper, Theorem 9.3.4 yields $(f + g)^* = f^* \# g^*$, which completes the proof.

We can now state the following theorem from Rockafellar [201] and Moreau [182], which, under a so-called qualification assumption on $f$ and $g$, asserts that $f^* \# g^*$ is $\sigma(V^*, V)$ closed and hence $(f + g)^* = f^* \# g^*$.

Theorem 9.4.1. Let $V$ be a normed linear space and let $f, g : V \to \mathbb{R} \cup \{+\infty\}$ be two closed convex proper functions which satisfy the following qualification assumption:
$$\text{there is a point } u_0 \in \operatorname{dom} f \cap \operatorname{dom} g \text{ where } f \text{ is continuous.} \qquad (Q)$$
Then $f^* \# g^*$ is a $\sigma(V^*, V)$ closed convex proper function and the following equality holds:
$$(f + g)^* = f^* \# g^*.$$
Moreover, for any $v^* \in V^*$, the infimum in the definition of $(f^* \# g^*)(v^*)$ is achieved.

Proof. Corollary 9.4.1 tells us that the only point we need to verify is that $f^* \# g^*$ is $\sigma(V^*, V)$ closed. Equivalently, we have to prove that, for every $\lambda \in \mathbb{R}$, the sublevel set of $f^* \# g^*$,
$$C = \{v^* \in V^* : (f^* \# g^*)(v^*) \le \lambda\},$$
is $\sigma(V^*, V)$ closed. Indeed, we are going to establish that, for each $\rho > 0$, $C \cap \rho B_{V^*}$ is $\sigma(V^*, V)$ closed, i.e., the traces of $C$ on all closed balls of $V^*$ are $\sigma(V^*, V)$ closed. It will then follow from the Banach–Dieudonné–Krein–Šmulian theorem (see, e.g., [127, Theorem V 5.7]) that $C$ is $\sigma(V^*, V)$ closed. Let $(v_n^*)_{n \in \mathbb{N}}$ be a bounded sequence of elements of $C$ with $v_n^* \to v^*$ for $\sigma(V^*, V)$. When $V$ is separable, it is not restrictive to consider sequences. For
general $V$, the argument can be readily extended by considering generalized sequences. By definition of $f^* \# g^*$, for each $n \in \mathbb{N}$ there exists some $w_n^* \in V^*$ such that
$$f^*(v_n^* - w_n^*) + g^*(w_n^*) \le \lambda + \frac{1}{n}. \qquad (9.7)$$
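The key point of the proof is to show that the sequence $(w_n^*)_{n \in \mathbb{N}}$ is bounded in $V^*$. To that end, we use, as an essential fact, the qualification assumption (Q): there exist some $r > 0$ and some $M \in \mathbb{R}$ such that
$$\sup_{\|v\|_V \le 1} f(u_0 + rv) \le M. \qquad (9.8)$$
For any $v \in B(0, 1)$, let us majorize $\langle w_n^*, v \rangle$. To that end, let us write (the last step uses the definition of the conjugates $g^*$ and $f^*$)
$$\begin{aligned}
r\langle w_n^*, v \rangle_{(V^*, V)} &= \langle w_n^*, rv \rangle = \langle w_n^*, u_0 \rangle + \langle w_n^*, rv - u_0 \rangle \\
&= \langle w_n^*, u_0 \rangle + \langle v_n^* - w_n^*, u_0 - rv \rangle - \langle v_n^*, u_0 - rv \rangle \\
&\le g(u_0) + g^*(w_n^*) + f(u_0 - rv) + f^*(v_n^* - w_n^*) - \langle v_n^*, u_0 - rv \rangle.
\end{aligned}$$
We rewrite the above inequality in the form
$$r\langle w_n^*, v \rangle_{(V^*, V)} \le f^*(v_n^* - w_n^*) + g^*(w_n^*) + f(u_0 - rv) + g(u_0) + \|u_0 - rv\|\,\|v_n^*\|_*$$
and use (9.7) and (9.8) to obtain
$$r\langle w_n^*, v \rangle_{(V^*, V)} \le \lambda + \frac{1}{n} + M + g(u_0) + \|v_n^*\|_*\big(\|u_0\| + r\big).$$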
Using that the sequence $(v_n^*)$ is bounded, we immediately obtain from the above inequality (which is valid for any $v \in B(0, 1)$) that $\sup_n \|w_n^*\|_* < +\infty$. We now use the Banach–Alaoglu–Bourbaki theorem, Theorem 1.4.7, and Corollary 1.4.2: when $V$ is separable (for a general $V$ one can use a device of Attouch and Brezis [31]), the unit ball of $V^*$ is $\sigma(V^*, V)$ sequentially compact. As a consequence, one can find a subsequence $(w_{n_k}^*)_{k \in \mathbb{N}}$ and some $w^* \in V^*$ such that $w_{n_k}^* \rightharpoonup w^*$ in $\sigma(V^*, V)$. Let us now use the lower semicontinuity of $f^*$ and $g^*$ for the topology $\sigma(V^*, V)$ and pass to the limit in (9.7) to obtain
$$f^*(v^* - w^*) + g^*(w^*) \le \lambda.$$
As a consequence, $(f^* \# g^*)(v^*) \le f^*(v^* - w^*) + g^*(w^*) \le \lambda$ and $v^* \in C$. The same argument with $\lambda = (f^* \# g^*)(v^*)$ and $v_n^* = v^*$ shows that the infimum in the definition of $(f^* \# g^*)(v^*)$ is achieved.

The qualification assumption (Q), because of its importance, has been intensively studied, and many weakened versions of it have been established. Let us quote the following result (see Aubin [45]).
Theorem 9.4.2. Let $V$ be a Banach space and let $f, g : V \to \mathbb{R} \cup \{+\infty\}$ be two closed convex proper functions such that $\operatorname{dom} f - \operatorname{dom} g$ is a neighborhood of the origin. Then the same conclusions as in Theorem 9.4.1 hold, and $(f + g)^* = f^* \# g^*$.
In the same spirit, Attouch and Brezis established the same result in [31] under the even weaker assumption
$$\bigcup_{\lambda > 0} \lambda(\operatorname{dom} f - \operatorname{dom} g) \text{ is a closed subspace of } V.$$
Note that, in contrast with the Rockafellar theorem, which holds in general normed spaces, the Aubin and Attouch–Brezis theorems require $V$ to be a Banach space. Indeed, an essential ingredient in the proof of these theorems is the Banach–Steinhaus theorem. Apart from this, the proof is essentially the same as that of Theorem 9.4.1.
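Proposition 9.4.1 can be checked on a one-dimensional example. With the illustrative choices $\varphi(v) = \frac12 v^2$ and $\psi(v) = |v|$ (ours, not the text's), the epi-sum $\varphi \# \psi$ is the Huber function, equal to $\frac12 v^2$ for $|v| \le 1$ and to $|v| - \frac12$ otherwise, while $\varphi^* + \psi^* = \frac12|\cdot|^2 + \delta_{[-1,1]}$ by (9.6). The sketch below computes the inf-convolution and a truncated conjugate on a grid; this is only an approximation, and the helper names are hypothetical.

```python
def inf_convolution(phi, psi, grid):
    """(phi # psi)(v) = min over w in grid of phi(v - w) + psi(w), for each v in grid."""
    return [min(phi(v - w) + psi(w) for w in grid) for v in grid]


def grid_sup(values, grid, s):
    """max over the grid of s*v - value(v): a truncated conjugate at slope s."""
    return max(s * v - fv for v, fv in zip(grid, values))


def phi(v):
    return 0.5 * v * v


def huber(v):
    return 0.5 * v * v if abs(v) <= 1 else abs(v) - 0.5


grid = [i / 20 for i in range(-60, 61)]  # [-3, 3] with step 0.05
epi_sum = inf_convolution(phi, abs, grid)

print(max(abs(a - huber(v)) for a, v in zip(epi_sum, grid)))  # agreement with Huber
print(grid_sup(epi_sum, grid, 0.5))  # phi*(0.5) + psi*(0.5) = 0.125 + 0
print(grid_sup(epi_sum, grid, 2.0))  # psi*(2) = +inf: the value grows with the grid radius
```

At the slope $0.5$ the grid value reproduces $\varphi^*(0.5) + \psi^*(0.5) = 0.125$; at the slope $2$, outside $[-1, 1]$, the truncated supremum equals $3.5$ on $[-3, 3]$ and keeps increasing if the grid is enlarged.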
9.5
Subdifferential calculus for convex functions
To obtain the simplest possible dual representation of a closed convex set $C$ of a normed linear space $(V, \|\cdot\|)$, we introduce the notion of supporting hyperplane. When taking $C = \operatorname{epi} f$, the epigraph of a closed convex proper function, the corresponding notion is that of exact minoration: a continuous affine function $l : V \to \mathbb{R}$ is an exact minorant of $f$ at $u$ if $l \le f$ and $l(u) = f(u)$. Equivalently, setting $l(v) = \langle u^*, v \rangle + \alpha$, this becomes
$$f(v) \ge \langle u^*, v \rangle + \alpha \quad \forall v \in V, \qquad f(u) = \langle u^*, u \rangle + \alpha,$$
i.e., $\alpha = f(u) - \langle u^*, u \rangle$ and $l(v) = f(u) + \langle u^*, v - u \rangle$, which is equivalent to
$$\forall v \in V \quad f(v) \ge f(u) + \langle u^*, v - u \rangle.$$
This leads to the following definition.

Definition 9.5.1. Let $(V, \|\cdot\|)$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function. We say that an element $u^* \in V^*$ belongs to the subdifferential of $f$ at $u \in V$ if
$$\forall v \in V \quad f(v) \ge f(u) + \langle u^*, v - u \rangle_{(V^*, V)}.$$
We then write $u^* \in \partial f(u)$.
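Definition 9.5.1 is a global inequality, so a sampled version of it can only refute membership in $\partial f(u)$, never prove it. The sketch below rests on illustrative assumptions: the example $f = |\cdot|$ on $\mathbb{R}$, a finite sample grid, and the hypothetical helper `satisfies_subgradient_inequality`. It recovers the familiar facts $\partial f(0) = [-1, 1]$ and $\partial f(u) = \{1\}$ for $u > 0$.

```python
def satisfies_subgradient_inequality(f, u, u_star, samples, tol=1e-12):
    """Check f(v) >= f(u) + u_star * (v - u) at every sample point v.

    Necessary for u_star to lie in the subdifferential of f at u; not sufficient,
    since only finitely many v are tested.
    """
    fu = f(u)
    return all(f(v) >= fu + u_star * (v - u) - tol for v in samples)


samples = [i / 10 for i in range(-50, 51)]  # v in [-5, 5], step 0.1
for u, u_star in [(0.0, 0.7), (0.0, -1.0), (0.0, 1.2), (2.0, 1.0), (2.0, 0.9)]:
    print(u, u_star, satisfies_subgradient_inequality(abs, u, u_star, samples))
```

The candidates $0.7$ and $-1$ at $u = 0$ and $1$ at $u = 2$ pass, whereas $1.2$ at $u = 0$ fails (at $v = 1$) and $0.9$ at $u = 2$ fails (at $v = 0$).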
The terminology reflects the fact that, when $f$ is continuously differentiable and convex, the following inequality holds:
$$\forall v \in V \quad f(v) \ge f(u) + \langle \nabla f(u), v - u \rangle.$$
Moreover, this inequality characterizes $\nabla f(u)$. For this reason, when $u^* \in \partial f(u)$, we say either that $u^*$ belongs to the subdifferential of $f$ at $u$ or that $u^*$ is a subgradient of $f$ at $u$. Note that if $u^* \in \partial f(u)$, then necessarily $u \in \operatorname{dom} f$ (take $v_0 \in \operatorname{dom} f \neq \emptyset$; we have $f(v_0) - \langle u^*, v_0 - u \rangle \ge f(u)$, hence $f(u) < +\infty$). It is also important to notice that, given $u \in \operatorname{dom} f$, the set $\partial f(u)$ may be empty; see Phelps [197, Example 3.8].

Proposition 9.5.1. Let $(V, \|\cdot\|)$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function. Then the following two conditions are equivalent:
(i) $u^* \in \partial f(u)$;
(ii) $f(u) + f^*(u^*) - \langle u^*, u \rangle = 0$.

Proof. (a) Let us first give a geometrical proof: to say that $u^* \in \partial f(u)$ means that $v \mapsto \langle u^*, v \rangle + f(u) - \langle u^*, u \rangle$ is an exact minorant of $f$ at $u$. This implies that it is a maximal minorant with slope $u^*$, i.e., $f(u) - \langle u^*, u \rangle = -f^*(u^*)$. (b) The analytic proof is also immediate: the inequality $f(u) + f^*(u^*) - \langle u^*, u \rangle \ge 0$ is always true. Thus the equality $f(u) + f^*(u^*) - \langle u^*, u \rangle = 0$ is equivalent to the inequality $f(u) + f^*(u^*) - \langle u^*, u \rangle \le 0$. By definition of $f^*$, this is equivalent to saying that
$$\langle u^*, u \rangle - f(u) \ge \langle u^*, v \rangle - f(v) \quad \forall v \in V,$$
i.e., $u^* \in \partial f(u)$.

Remark 9.5.1. As we have already stressed, for any $v \in V$ and $v^* \in V^*$ the inequality $f(v) + f^*(v^*) - \langle v^*, v \rangle \ge 0$ is always true. Thus, when writing the characterization of $u^* \in \partial f(u)$, namely $f(u) + f^*(u^*) - \langle u^*, u \rangle = 0$, we express that, at the pair $(u, u^*) \in V \times V^*$, the function $(v, v^*) \mapsto f(v) + f^*(v^*) - \langle v^*, v \rangle$ attains its minimal value. For this reason, relation (ii) in Proposition 9.5.1 is called the Fenchel extremality relation. A major interest of the Fenchel extremality characterization of subdifferentials is that $f$ and $f^*$ play symmetric roles in its formulation. This, together with the F-M-R duality theorem, Theorem 9.3.2 (which expresses that $f = f^{**}$), yields the following result.
Theorem 9.5.1. Let $(V, \|\cdot\|)$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function. Then, for $u \in V$ and $u^* \in V^*$, we have
$$u^* \in \partial f(u) \iff u \in \partial f^*(u^*).$$

Proof. Indeed,
$$u^* \in \partial f(u) \iff f(u) + f^*(u^*) - \langle u^*, u \rangle = 0 \iff f^{**}(u) + f^*(u^*) - \langle u^*, u \rangle = 0 \iff u \in \partial f^*(u^*),$$
where we use the Fenchel extremality characterization of the subdifferential and the F-M-R duality theorem, Theorem 9.3.2 ($f = f^{**}$).

Remark 9.5.2. In the notation of set-valued analysis, we can write $(\partial f)^{-1} = \partial f^*$. This is precisely the formulation, in terms of subdifferentials, of convex duality theory.

For theoretical reasons, it is important to know whether a closed convex proper function is uniquely determined (up to an additive constant) by its subdifferential. From a geometrical point of view, this question can be formulated as follows: is a closed convex proper function the upper envelope of its exact continuous affine minorants? The answer is yes when $V$ is a Banach space. The proof of this result relies on Ekeland's $\varepsilon$-variational principle (see Section 3.4). We state the result without proof, referring, for instance, to Phelps [197, Corollary 3.1.9].

Theorem 9.5.2. Suppose $(V, \|\cdot\|)$ is a Banach space and $f : V \to \mathbb{R} \cup \{+\infty\}$ is a closed convex proper function. Then, for any $u \in \operatorname{dom} f$,
$$f(u) = \sup\{ f(v) + \langle v^*, u - v \rangle : v \in V,\ v^* \in V^* \text{ with } v^* \in \partial f(v) \} = \sup\{ \langle v^*, u \rangle - f^*(v^*) : \exists v \in V \text{ such that } v^* \in \partial f(v) \}.$$

Note that this theorem, when specialized to indicator functions of convex sets, says that any closed convex nonempty set in a Banach space is the intersection of the closed half-spaces defined by its supporting hyperplanes (Phelps [197, Proposition 3.2.1]). When proving the above theorem via Ekeland's variational principle, one obtains in the process the following density result.

Theorem 9.5.3. Suppose $(V, \|\cdot\|)$ is a Banach space and $f : V \to \mathbb{R} \cup \{+\infty\}$ is a closed convex proper function. Then $\operatorname{dom} \partial f$ is dense in $\operatorname{dom} f$. More precisely, for any $v \in \operatorname{dom} f$, there exists a sequence $(v_n)_{n \in \mathbb{N}}$ with $v_n \in \operatorname{dom} \partial f$ for all $n \in \mathbb{N}$ such that $v_n \to v$ and $f(v_n) \to f(v)$.
Proof. For the proof, see Azé [49, Theorem 3.2.4] and Aubin–Ekeland [46, Theorem 3].
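As a one-dimensional illustration of Theorem 9.5.1 and Remark 9.5.2 (our own example, not taken from the text), take $V = \mathbb{R}$ and $f(x) = |x|$, whose conjugate is $f^* = \delta_{[-1,1]}$. The Python sketch below encodes $\partial f$ and $\partial f^*$ as closed intervals (the helper names are hypothetical) and checks, on a grid of exactly representable points, that the three statements $u^* \in \partial f(u)$, $u \in \partial f^*(u^*)$, and $f(u) + f^*(u^*) - u^* u = 0$ always agree.

```python
import math

def f(x):
    # f(x) = |x| on V = R
    return abs(x)

def f_conj(y):
    # Conjugate of |x|: the indicator function of [-1, 1]
    return 0.0 if abs(y) <= 1 else math.inf

def subdiff_f(u):
    # Subdifferential of |x| at u, as a closed interval (lo, hi)
    if u > 0:
        return (1.0, 1.0)
    if u < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def subdiff_f_conj(y):
    # Subdifferential of the indicator of [-1, 1], i.e. its normal cone at y;
    # None encodes the empty set (y outside the domain [-1, 1])
    if abs(y) > 1:
        return None
    if y == 1:
        return (0.0, math.inf)
    if y == -1:
        return (-math.inf, 0.0)
    return (0.0, 0.0)

def contains(interval, t):
    return interval is not None and interval[0] <= t <= interval[1]

def fenchel_gap(u, y):
    # f(u) + f*(y) - y u >= 0 (Fenchel inequality); it vanishes iff y is in df(u)
    return f(u) + f_conj(y) - y * u

grid = [k / 4 for k in range(-8, 9)]  # exactly representable points of [-2, 2]
mismatches = [
    (u, y)
    for u in grid
    for y in grid
    if not (contains(subdiff_f(u), y)
            == contains(subdiff_f_conj(y), u)
            == (fenchel_gap(u, y) == 0))
]
print("mismatches:", mismatches)
```

Representing subdifferentials as intervals only works in one dimension; it is used here purely to keep the comparison exact.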
To develop a calculus for subdifferentials, it is convenient to consider $\partial f$ as a multivalued operator $\partial f : V \rightrightarrows V^*$ and to identify $\partial f$ with its graph
$$\partial f = \{ (v, v^*) \in V \times V^* : v^* \in \partial f(v) \}.$$
We recall the basic definitions for the calculus of set-valued mappings: given $A, B : V \rightrightarrows V^*$, we have
$$\operatorname{dom} A = \{ v \in V : \exists v^* \in V^* \text{ with } (v, v^*) \in A \},$$
$$A^{-1} = \{ (v^*, v) \in V^* \times V : (v, v^*) \in A \},$$
$$\operatorname{dom}(A + B) = \operatorname{dom} A \cap \operatorname{dom} B, \qquad (A + B)(v) = Av + Bv$$
in the sense of the vector sum. Moreover, we say that $A \subset B$ if $\operatorname{graph} A \subset \operatorname{graph} B$. As we have already stressed, convex duality theory can be expressed as $\partial f^* = (\partial f)^{-1}$.

Theorem 9.5.4. Let $(V, \|\cdot\|)$ be a normed space and let $f, g : V \to \mathbb{R} \cup \{+\infty\}$ be two closed convex proper functions.
(a) The following inclusion always holds: $\partial f + \partial g \subset \partial(f + g)$.
(b) If moreover the qualification assumption
$$f \text{ is finite and continuous at a point of } \operatorname{dom} g \tag{Q}$$
holds, then $\partial f + \partial g = \partial(f + g)$.

Proof. (a) Take $u \in \operatorname{dom} \partial f \cap \operatorname{dom} \partial g$, $u^* \in \partial f(u)$, and $w^* \in \partial g(u)$. By definition of $\partial f$ and $\partial g$, for any $v \in V$,
$$f(v) \ge f(u) + \langle u^*, v - u \rangle, \qquad g(v) \ge g(u) + \langle w^*, v - u \rangle.$$
By adding these two inequalities, we obtain for any $v \in V$
$$(f + g)(v) \ge (f + g)(u) + \langle u^* + w^*, v - u \rangle,$$
i.e., $u^* + w^* \in \partial(f + g)(u)$.
(b) Take $u^* \in \partial(f + g)(u)$. Equivalently, by the Fenchel extremality relation,
$$(f + g)(u) + (f + g)^*(u^*) - \langle u^*, u \rangle = 0.$$
By Theorem 9.4.1, we have $(f + g)^*(u^*) = (f^* \mathbin{\#} g^*)(u^*)$, and the infimum in the definition of the infimal convolution $(f^* \mathbin{\#} g^*)(u^*)$ is achieved. Consequently, there exists some $w^* \in V^*$ such that
$$(f + g)(u) + f^*(u^* - w^*) + g^*(w^*) - \langle u^*, u \rangle = 0.$$
Equivalently,
$$\big( f(u) + f^*(u^* - w^*) - \langle u^* - w^*, u \rangle \big) + \big( g(u) + g^*(w^*) - \langle w^*, u \rangle \big) = 0.$$
By the Fenchel inequality,
$$f(u) + f^*(u^* - w^*) - \langle u^* - w^*, u \rangle \ge 0, \qquad g(u) + g^*(w^*) - \langle w^*, u \rangle \ge 0.$$
Since the sum of these two nonnegative quantities is equal to zero, both vanish:
$$f(u) + f^*(u^* - w^*) - \langle u^* - w^*, u \rangle = 0, \qquad g(u) + g^*(w^*) - \langle w^*, u \rangle = 0.$$
These are Fenchel extremality relations (Proposition 9.5.1), and they are equivalent to
$$u^* - w^* \in \partial f(u) \quad \text{and} \quad w^* \in \partial g(u).$$
Finally, we obtain
$$u^* = (u^* - w^*) + w^* \in \partial f(u) + \partial g(u),$$
i.e., $u^* \in (\partial f + \partial g)(u)$.

We have already stressed that for $u \in \operatorname{dom} f$, the set $\partial f(u)$ may be empty. The following result, which can be viewed as a corollary of Theorem 9.5.4, gives a sufficient condition for $\partial f(u)$ to be nonempty. As we will see in Sections 9.6 and 9.8, this result is quite useful in applications.

Proposition 9.5.2. Let $(V, \|\cdot\|)$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function. Assume that $f$ is continuous at $u \in \operatorname{dom} f$. Then $\partial f(u) \neq \emptyset$, and $\partial f(u)$ is a closed, convex, and bounded subset of $V^*$.

Proof. Let us apply Theorem 9.4.1 to the sum of the two closed convex proper functions $f$ and $g = \delta_{\{u\}}$ (the indicator function of the singleton $\{u\}$). By assumption, $f$ is continuous at the point $u$, so the qualification assumption (Q) of Theorem 9.4.1 is satisfied.
Hence, for any $v^* \in V^*$, the equality
$$(f + \delta_{\{u\}})^*(v^*) = (f^* \mathbin{\#} \delta^*_{\{u\}})(v^*)$$
holds, and the infimum in the formulation of $(f^* \mathbin{\#} \delta^*_{\{u\}})(v^*)$ is achieved. An elementary computation yields
$$(f + \delta_{\{u\}})^*(v^*) = \langle v^*, u \rangle - f(u), \qquad \delta^*_{\{u\}}(w^*) = \langle w^*, u \rangle.$$
Hence, for any $v^* \in V^*$, there exists some $w^* \in V^*$ such that
$$\langle v^*, u \rangle - f(u) = f^*(v^* - w^*) + \langle w^*, u \rangle.$$
Equivalently,
$$f(u) + f^*(v^* - w^*) - \langle v^* - w^*, u \rangle = 0.$$
This is the Fenchel extremality relation, which is equivalent to $v^* - w^* \in \partial f(u)$; in particular, $\partial f(u) \neq \emptyset$. Note that we could equally well have applied Theorem 9.5.4 instead of Theorem 9.4.1 to obtain this result.

In general, the set $\partial f(u)$ is closed and convex; this is an immediate consequence of the definition of $\partial f(u)$ as an intersection of closed convex subsets of $V^*$. Let us now verify that, under the assumption that $f$ is continuous at $u$, this set is bounded. Since $f$ is continuous at $u$, it is bounded above on a neighborhood of $u$. Let $r > 0$ and $M \ge 0$ be such that $f(u + rv) \le M$ for all $v \in B(0, 1)$. Take $v^* \in \partial f(u)$. By definition of $\partial f$, we have for all $v \in B(0, 1)$
$$M \ge f(u + rv) \ge f(u) + r \langle v^*, v \rangle.$$
Hence
$$\langle v^*, v \rangle \le \frac{1}{r}\big( M + |f(u)| \big).$$
This being true for any $v \in B(0, 1)$, we obtain
$$\|v^*\|_* \le \frac{1}{r}\big( M + |f(u)| \big),$$
and, as a consequence, the set $\partial f(u)$ is bounded.

Let us now come to the central role played by subdifferential calculus in convex optimization. The following result, despite its elementary proof (it is a straightforward consequence of the definition of $\partial f$), shows the role of the subdifferential optimality rule $0 \in \partial f(u)$ as a substitute for the classical Fermat rule.

Proposition 9.5.3. Let $(V, \|\cdot\|)$ be a normed space and let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function. Then, for an element $u \in V$, the following two statements are equivalent:
(i) $f(u) \le f(v)$ for all $v \in V$;
(ii) $0 \in \partial f(u)$.

Let us stress that the above proposition gives a necessary and sufficient condition for an element $u \in V$ to be a solution of the convex minimization problem
$$\min\{ f(v) : v \in V \}.$$
This necessary and sufficient condition, $0 \in \partial f(u)$, extends to nonsmooth convex functions the classical first-order necessary and sufficient optimality condition for convex $C^1$ functions, namely, $\nabla f(u) = 0$. Thus, for a given convex optimization problem, finding the optimal solutions can be attacked by using subdifferential calculus and solving the generalized equation $0 \in \partial f(u)$.

As we have stressed, Legendre–Fenchel calculus and subdifferential calculus are intimately connected; playing with both of them when passing from one formulation to the other gives a lot of flexibility and makes for a rich calculus. This calculus becomes even richer when exploiting some of its geometrical aspects (duality via polar cones, etc.). Let us develop these ideas in the following general approach to optimization problems (both finite and infinite dimensional).

Let $f_0 : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function (the objective function) on a normed linear space $V$, and let $C \subset V$ be a closed convex nonempty subset of $V$ (the set of constraints). Consider the following optimization problem:
$$\inf\{ f_0(v) : v \in C \}. \tag{P}$$
It can be written in the equivalent form
$$\inf\{ f(v) : v \in V \}, \quad \text{where } f := f_0 + \delta_C.$$
An element $u \in V$ is an optimal solution of (P) iff $0 \in \partial f(u)$. To compute $\partial f$, we assume that the following qualification assumption is satisfied:
$$f_0 \text{ is continuous at a point of } C \quad \text{or} \quad \operatorname{int} C \cap \operatorname{dom} f_0 \neq \emptyset. \tag{Q}$$
Then Theorem 9.5.4 tells us that it is equivalent to look for a solution $u$ of the generalized equation
$$0 \in \partial f_0(u) + \partial \delta_C(u).$$
To describe the subdifferential of the indicator function of a closed convex set $C$, we need to introduce the notions of tangent cone and normal cone to $C$ at a point $u \in C$.
Definition 9.5.2. Let $C$ be a closed convex nonempty subset of a normed space $V$ and let $u \in C$.
(a) The tangent cone to $C$ at $u$, denoted by $T_C(u)$, is defined by
$$T_C(u) = \overline{\bigcup_{\lambda \ge 0} \lambda (C - u)}.$$
It is the closure of the cone spanned by $C - u$.
(b) The normal cone (also called the outward normal cone) $N_C(u)$ to $C$ at $u \in C$ is the polar cone of the tangent cone:
$$N_C(u) = \{ v^* \in V^* : \langle v^*, v \rangle \le 0 \ \forall v \in T_C(u) \} = \{ v^* \in V^* : \langle v^*, v - u \rangle \le 0 \ \forall v \in C \}.$$

Proposition 9.5.4. Let $C$ be a closed convex nonempty subset of a normed space $V$. Then, for every $u \in C$, $\partial \delta_C(u) = N_C(u)$.

Proof. By definition of the subdifferential,
$$u^* \in \partial \delta_C(u) \iff \delta_C(v) \ge \delta_C(u) + \langle u^*, v - u \rangle \ \forall v \in V$$
$$\iff u \in C \text{ and } \langle u^*, v - u \rangle \le 0 \ \forall v \in C$$
$$\iff u \in C \text{ and } \langle u^*, v \rangle \le 0 \ \forall v \in T_C(u),$$
that is, $u^* \in N_C(u)$.

An equivalent and quite useful characterization of $N_C(u)$ is obtained by using the Fenchel extremality relation:
$$u^* \in N_C(u) \iff u^* \in \partial \delta_C(u) \iff \delta_C(u) + \delta_C^*(u^*) = \langle u^*, u \rangle \iff \sigma_C(u^*) = \langle u^*, u \rangle,$$
where we have used that $\delta_C^* = \sigma_C$ (see Proposition 9.3.1). Let us formulate this result precisely.

Proposition 9.5.5. Let $C$ be a closed convex nonempty subset of a normed linear space $V$. For every $u \in C$, we have
$$N_C(u) = \{ u^* \in V^* : \langle u^*, u \rangle = \max\{ \langle u^*, v \rangle : v \in C \} \}.$$
Equivalently, an element $u^*$ of $N_C(u)$ is characterized by the fact that the linear form $v \mapsto \langle u^*, v \rangle$ attains its maximum on $C$ at the point $u$.
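The characterization of Proposition 9.5.5 can be tested numerically in $\mathbb{R}^n$ with the Euclidean pairing. The sketch below assumes that $C$ is a box $\prod_i [a_i, b_i]$ (our choice, because its support function has the closed form $\sigma_C(y) = \sum_i \max(y_i a_i, y_i b_i)$); membership $y \in N_C(u)$ is decided by comparing $\sigma_C(y)$ with $\langle y, u \rangle$, and the answer is cross-checked against the elementary coordinatewise description of the normal cone to a box. All function names are hypothetical.

```python
def support_box(y, a, b):
    # sigma_C(y) = max of <y, v> over the box C = prod_i [a_i, b_i]
    return sum(max(yi * ai, yi * bi) for yi, ai, bi in zip(y, a, b))

def in_normal_cone(y, u, a, b, tol=1e-12):
    # Proposition 9.5.5: y is in N_C(u) iff <y, u> = sigma_C(y), for u in C
    if not all(ai <= ui <= bi for ui, ai, bi in zip(u, a, b)):
        raise ValueError("u must belong to C")
    pairing = sum(yi * ui for yi, ui in zip(y, u))
    return abs(support_box(y, a, b) - pairing) <= tol

def in_normal_cone_coordinatewise(y, u, a, b):
    # Direct description of the normal cone to a box: y_i = 0 where a_i < u_i < b_i,
    # y_i <= 0 where u_i = a_i < b_i, y_i >= 0 where u_i = b_i > a_i
    for yi, ui, ai, bi in zip(y, u, a, b):
        if ai < ui < bi and yi != 0:
            return False
        if ui == ai < bi and yi > 0:
            return False
        if ai < ui == bi and yi < 0:
            return False
    return True

a, b = [0.0, 0.0], [1.0, 2.0]
u = [1.0, 0.5]                               # on the face {x_1 = 1}, interior in x_2
print(in_normal_cone([3.0, 0.0], u, a, b))   # outward normal to that face
print(in_normal_cone([3.0, 1.0], u, a, b))   # has a tangential component

# Cross-check the two descriptions at a few points of C and directions y
vals = [-1.0, 0.0, 1.0]
points = [[0.0, 0.0], [1.0, 2.0], [0.5, 2.0], [0.5, 1.0], u]
disagreements = [
    (p, [s, t])
    for p in points
    for s in vals
    for t in vals
    if in_normal_cone([s, t], p, a, b) != in_normal_cone_coordinatewise([s, t], p, a, b)
]
print("disagreements:", disagreements)
```

The box is used only because both $\sigma_C$ and $N_C(u)$ are explicit; for a general convex set, evaluating $\sigma_C$ is itself an optimization problem.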
Let us come back to the convex constrained optimization problem (P). We can summarize the previous results in the following statement.

Theorem 9.5.5. Let $(V, \|\cdot\|)$ be a normed space, let $f_0 : V \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function, and let $C \subset V$ be a closed convex nonempty subset. Assume that one of the following two qualification assumptions is satisfied:
$$f_0 \text{ is continuous at some point of } C, \tag{$Q_1$}$$
$$\operatorname{dom} f_0 \cap \operatorname{int} C \neq \emptyset. \tag{$Q_2$}$$
Then the following statements are equivalent:
(i) $u$ is an optimal solution of the minimization problem
$$\min\{ f_0(v) : v \in C \}; \tag{P}$$
(ii) $u$ is a solution of the generalized equation $0 \in \partial f_0(u) + N_C(u)$;
(iii) there exists some $u^* \in V^*$ such that
$$u \in C, \qquad u^* \in \partial f_0(u), \qquad \langle u^*, v - u \rangle \ge 0 \ \forall v \in C.$$

To go further, we need to enrich the model and give more information on the structure of the constraint set $C$. Because of its practical importance, in the next section we pay particular attention to the theory of convex mathematical programming (in particular, linear programming) and to the theory of multipliers. We will see how the notion of dual problem naturally occurs.

When $f_0$ is a smooth convex function, say, $f_0 \in C^1(V, \mathbb{R})$, Theorem 9.5.5 takes the following simpler equivalent form: $u$ is an optimal solution of the above minimization problem (P) iff
$$u \in C \quad \text{and} \quad \langle \nabla f_0(u), v - u \rangle \ge 0 \ \text{for every } v \in C. \tag{iii}$$
Problem (iii) is a particular case of the following general variational inequality problem: given an operator $A : V \to V^*$ and $z \in V^*$, find $u \in C$ such that
$$\langle Au, v - u \rangle \ge \langle z, v - u \rangle \qquad \forall v \in C.$$
Note that when $C = V$ (i.e., when there are no constraints), the above problem reduces to the standard equation $Au = z$.
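In finite dimensions, with $V = \mathbb{R}^n$ identified with its dual through the Euclidean inner product, one standard way to solve the variational inequality above when $A$ is a symmetric positive definite matrix is the projected-gradient iteration $u \leftarrow P_C(u - \tau(Au - z))$: its fixed points are exactly the solutions, and it converges for $0 < \tau < 2/\lambda_{\max}(A)$. The sketch below is our own illustration, not a method from the text; it assumes a box constraint $C$ so that the projection $P_C$ is a coordinatewise clip, and the matrix, right-hand side, and step size are arbitrary.

```python
from itertools import product

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def projected_gradient(A, z, lo, hi, tau, iters=2000):
    # Fixed-point iteration u <- P_C(u - tau * (A u - z)) on the box C = prod [lo_i, hi_i];
    # for A symmetric positive definite it converges when 0 < tau < 2 / lambda_max(A)
    u = [clip(0.0, l, h) for l, h in zip(lo, hi)]
    for _ in range(iters):
        g = [r - zi for r, zi in zip(matvec(A, u), z)]   # A u - z
        u = [clip(ui - tau * gi, l, h) for ui, gi, l, h in zip(u, g, lo, hi)]
    return u

def vi_residual(A, z, u, lo, hi):
    # min over v in C of <A u - z, v - u>; u solves the inequality iff this is >= 0.
    # v -> <A u - z, v - u> is affine, so scanning the vertices of the box suffices.
    g = [r - zi for r, zi in zip(matvec(A, u), z)]
    return min(sum(gi * (vi - ui) for gi, vi, ui in zip(g, v, u))
               for v in product(*zip(lo, hi)))

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric positive definite, lambda_max = 3
z = [4.0, -1.0]
lo, hi = [0.0, 0.0], [1.0, 1.0]
u = projected_gradient(A, z, lo, hi, tau=0.3)
print(u, vi_residual(A, z, u, lo, hi))
```

Here the unconstrained solution $A^{-1}z = (3, -2)$ lies outside the box, so the constraint is active; scanning the $2^n$ vertices is only practical for small $n$.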
As an example, let us examine the important case where $C$ is a closed convex cone such that $C \cap (-C) = \{0\}$. Then $C$ is the positive cone of the partial ordering $v \ge u \iff v - u \in C$, and for $v^* \in V^*$ we write $v^* \ge 0$ when $\langle v^*, v \rangle \ge 0$ for all $v \in C$. Then problem (iii) takes the following equivalent form:
$$\nabla f_0(u) \ge 0, \qquad u \ge 0, \qquad \langle \nabla f_0(u), u \rangle = 0.$$
(The last equality is obtained by taking successively $v = 0$ and $v = 2u$ in (iii).) This type of problem is called a complementarity problem.

Take now a closely related problem where $C = \{ v \in V : v \ge g \}$, with $g \in V$ given. One easily obtains that (iii) becomes
$$\nabla f_0(u) \ge 0, \qquad u \ge g, \qquad \langle \nabla f_0(u), u - g \rangle = 0.$$
When $V = H^1_0(\Omega)$ and $f_0(v) = \frac{1}{2} \int_\Omega |\nabla v(x)|^2 \, dx$ is the Dirichlet integral, we obtain
$$-\Delta u \ge 0, \qquad u \ge g, \qquad \langle -\Delta u, u - g \rangle = 0.$$
The first condition expresses that $-\Delta u = \mu \ge 0$ is a nonnegative Radon measure. The last condition (the complementarity condition) can be recognized as
$$\int_\Omega (\tilde u - \tilde g) \, d\mu = 0,$$
where $\tilde u$ and $\tilde g$ are the quasi-continuous representatives of $u$ and $g$. It expresses that $\mu = -\Delta u$ does not charge the set where $\tilde u > \tilde g$. In other words, $\mu$ is concentrated on the contact set $\omega = \{ \tilde u = \tilde g \}$, and we have to solve the free boundary value problem
$$-\Delta u = 0 \ \text{on } \Omega \setminus \omega, \qquad u = g \ \text{on } \omega, \qquad u = 0 \ \text{on } \partial\Omega.$$
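The complementarity form of the obstacle problem suggests a simple iterative solver once the problem is discretized. The sketch below is our own illustration under stated assumptions: $\Omega = (0, 1)$, a hypothetical concave obstacle $g$ that is negative near the endpoints, the standard three-point finite-difference approximation of $-u''$ on a uniform grid, and projected Gauss–Seidel sweeps (each nodal update is the unconstrained Gauss–Seidel value clipped from below by the obstacle). At convergence, the discrete analogues of $-\Delta u \ge 0$, $u \ge g$, and $\langle -\Delta u, u - g \rangle = 0$ should hold up to the solver tolerance, and the contact nodes approximate the contact set $\omega$.

```python
def solve_obstacle_1d(obstacle_fn, n, max_sweeps=50000, tol=1e-12):
    # Discrete obstacle problem on (0, 1): nodes x_k = k h, h = 1/n, u_0 = u_n = 0.
    # Projected Gauss-Seidel: each interior value is set to the unconstrained
    # Gauss-Seidel value for -u'' = 0, then clipped from below by the obstacle.
    h = 1.0 / n
    g = [obstacle_fn(k * h) for k in range(n + 1)]
    u = [0.0] * (n + 1)
    for _ in range(max_sweeps):
        change = 0.0
        for k in range(1, n):
            new = max(g[k], 0.5 * (u[k - 1] + u[k + 1]))
            change = max(change, abs(new - u[k]))
            u[k] = new
        if change < tol:
            break
    return u, g, h

def minus_laplacian(u, h):
    # Three-point approximation of -u'' at the interior nodes
    return [(2.0 * u[k] - u[k - 1] - u[k + 1]) / h ** 2 for k in range(1, len(u) - 1)]

def obstacle_fn(x):
    # Hypothetical obstacle: concave, equal to -1/4 at both endpoints
    return 0.25 - 2.0 * (x - 0.5) ** 2

u, g, h = solve_obstacle_1d(obstacle_fn, n=40)
mu = minus_laplacian(u, h)                       # discrete analogue of mu = -u''
gap = [u[k] - g[k] for k in range(1, len(u) - 1)]
print("min of -u'':", min(mu))
print("min of u - g:", min(gap))
print("max |(-u'') (u - g)|:", max(abs(m * d) for m, d in zip(mu, gap)))
print("contact nodes:", [k for k, d in enumerate(gap, start=1) if d == 0.0])
```

Off the contact set the computed $u$ is (up to the tolerance) piecewise linear, which is the one-dimensional form of $-\Delta u = 0$ on $\Omega \setminus \omega$.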
9.6 Mathematical programming: Multipliers and duality

In this section, $(V, \|\cdot\|)$ is a normed space. Mathematical programming is concerned with optimization problems of the form
$$\min\{ f_0(v) : f_1(v) \le 0, \ldots, f_n(v) \le 0 \}, \tag{P}$$
where $f_i$ ($i = 1, \ldots, n$) are given functions from $V$ into $\mathbb{R}$.
Thus, a mathematical programming problem is an optimization problem where the constraint set $C$ has the following specific form:
$$C = \{ v \in V : f_i(v) \le 0, \ i = 1, \ldots, n \}.$$
This problem is of fundamental importance: a large number of problems in decision sciences, engineering, and so forth can be written as mathematical programming problems. The mathematical analysis of this kind of problem depends heavily on the geometrical properties of the functions $f_i$ ($i = 0, \ldots, n$). When the functions $f_i$ are affine, (P) is called a linear programming problem. When $f_i$ ($i = 1, \ldots, n$) are affine and $f_0$ is quadratic, (P) is called a quadratic programming problem. In this section, we study the situation where $f_0, f_1, \ldots, f_n$ are convex functions. Then (P) is a convex minimization problem ($f_0$ and $C$ are convex); it is called a convex mathematical programming problem.
9.6.1 Karush–Kuhn–Tucker optimality conditions

The following theorem, which is the central result of this section, will be obtained by applying Theorem 9.5.5 to our situation. Because of the specific form of the constraint set $C$, the constraint qualification assumption takes a quite simple form (the Slater qualification assumption). The computation of the normal cone $N_C(u)$ provides, as fundamental mathematical objects, the Karush–Kuhn–Tucker optimality conditions and the corresponding Lagrange multipliers.

Theorem 9.6.1. Suppose that $V$ is a normed space, $f_0 : V \to \mathbb{R} \cup \{+\infty\}$ is closed convex proper, and $f_1, \ldots, f_n : V \to \mathbb{R}$ are convex and continuous. Suppose moreover that the following Slater qualification assumption is satisfied: there exists some $v_0 \in V$ such that $f_0(v_0) < +\infty$ and $f_i(v_0) < 0$ for all $i = 1, \ldots, n$. Then the following statements are equivalent:
(i) $u$ is a solution of problem (P) above;
(ii) there exist $\lambda_1, \lambda_2, \ldots, \lambda_n$ in $\mathbb{R}_+$ such that
$$0 \in \partial f_0(u) + \lambda_1 \partial f_1(u) + \cdots + \lambda_n \partial f_n(u),$$
$$\lambda_i \ge 0, \qquad f_i(u) \le 0, \qquad \lambda_i f_i(u) = 0 \qquad \forall i = 1, \ldots, n.$$

The central point of the proof of Theorem 9.6.1 is the computation of the normal cone $N_C(u)$. We first do this when $C$ is a closed half-space (that is, when $C = \{ v \in V : f(v) \le 0 \}$ with $f$ affine and continuous) and then in the general case.

Lemma 9.6.1. Let $(V, \|\cdot\|)$ be a normed space and $u^* \in V^*$ with $u^* \neq 0$. Consider the closed half-space
$$H = \{ v \in V : \langle u^*, v - u \rangle \le 0 \}.$$
Then $N_H(u) = \mathbb{R}_+ u^*$.
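In other words, $v^* \in V^*$ belongs to the normal cone to $H$ at $u$ iff there exists some $\lambda \ge 0$ such that $v^* = \lambda u^*$.

Proof. The inclusion $\mathbb{R}_+ u^* \subset N_H(u)$ is immediate: by definition of $H$, we have $\langle u^*, v - u \rangle \le 0$ for all $v \in H$. Hence $u^* \in N_H(u)$, and since $N_H(u)$ is a cone, $\mathbb{R}_+ u^* \subset N_H(u)$. Conversely, take $v^* \in N_H(u)$, $v^* \neq 0$ (the case $v^* = 0$ is trivial). By definition of $N_H(u)$, we have
$$\langle v^*, v - u \rangle \le 0 \qquad \forall v \in H. \tag{9.9}$$
As a particular subset of $H$, let us consider the affine subspace
$$W = \{ v \in V : \langle u^*, v - u \rangle = 0 \}.$$
We have $W = u + M$, where $M = \ker u^*$ is the hyperplane
$$M = \{ v \in V : \langle u^*, v \rangle = 0 \}.$$
By taking in (9.9) elements $v$ belonging to $W = u + M$, we obtain $\langle v^*, v \rangle \le 0$ for all $v \in M$. Then, replacing $v$ by $-v$ ($M$ is a subspace), we obtain $\langle v^*, v \rangle = 0$ for all $v \in M$. We now follow a standard device in linear algebra. Take an arbitrary element $w \in V$ with $w \notin M$, so that $\langle u^*, w \rangle \neq 0$. Noticing that
$$\Big\langle u^*, v - \frac{\langle u^*, v \rangle}{\langle u^*, w \rangle} w \Big\rangle = 0,$$
we deduce that for every $v \in V$,
$$v - \frac{\langle u^*, v \rangle}{\langle u^*, w \rangle} w \in M = \ker u^*.$$
Since $v^* = 0$ on $M$, we have
$$\langle v^*, v \rangle = \Big\langle v^*, \frac{\langle u^*, v \rangle}{\langle u^*, w \rangle} w \Big\rangle,$$
that is,
$$\langle v^*, v \rangle = \frac{\langle v^*, w \rangle}{\langle u^*, w \rangle} \langle u^*, v \rangle.$$
This being true for all $v \in V$, we finally obtain
$$v^* = \frac{\langle v^*, w \rangle}{\langle u^*, w \rangle} u^*,$$
i.e., $v^* = t u^*$ for some $t \in \mathbb{R}$.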
Until now, we have exploited only part of the information given by (9.9). Returning to (9.9), $t$ must satisfy
$$t \langle u^*, v - u \rangle \le 0 \qquad \forall v \in H.$$
Since $\langle u^*, v - u \rangle \le 0$ for all $v \in H$, with strict inequality for some $v \in H$ (because $u^* \neq 0$), we necessarily have $t \ge 0$.

Let us now examine the situation where $C = \{ v \in V : f(v) \le 0 \}$ and compute the normal cone $N_C(u)$ at an arbitrary point $u$ of $C$.

Proposition 9.6.1. Suppose that $f : V \to \mathbb{R}$ is a convex continuous function on a normed linear space $V$. Set
$$C = \{ v \in V : f(v) \le 0 \}$$
and assume that $C$ satisfies the following Slater property: there exists some $v_0 \in C$ such that $f(v_0) < 0$. Then, for every $u \in C$,
$$N_C(u) = \begin{cases} \{0\} & \text{if } f(u) < 0, \\ \mathbb{R}_+ \partial f(u) & \text{if } f(u) = 0. \end{cases}$$
As a consequence,
$$u^* \in N_C(u) \iff \exists \lambda \ge 0 \text{ such that } u^* \in \lambda \partial f(u) \text{ and } \lambda f(u) = 0.$$

Proof. Take $u \in C$. If $f(u) < 0$, then, by the continuity of $f$, $u \in \operatorname{int} C$, which yields $T_C(u) = V$ and hence $N_C(u) = \{0\}$. If, on the contrary, $f(u) = 0$, let us prove that $N_C(u) = \mathbb{R}_+ \partial f(u)$. The inclusion $\mathbb{R}_+ \partial f(u) \subset N_C(u)$ is easy to verify: take $u^* \in \partial f(u)$; by definition of the subdifferential $\partial f(u)$ of $f$ at $u$,
$$f(v) \ge f(u) + \langle u^*, v - u \rangle \qquad \forall v \in V.$$
Noticing that $f(u) = 0$ and $f(v) \le 0$ for all $v \in C$, we obtain $\langle u^*, v - u \rangle \le 0$ for all $v \in C$, i.e., $u^* \in N_C(u)$. Since $N_C(u)$ is a cone, we obtain $\mathbb{R}_+ \partial f(u) \subset N_C(u)$. Let us now prove the opposite inclusion, which is the delicate part of the proof: $N_C(u) \subset \mathbb{R}_+ \partial f(u)$. Equivalently, we have to prove that if $f(u) = 0$ and $u^* \in N_C(u)$, then there exists some $\lambda \ge 0$ such that $u^* \in \lambda \partial f(u)$. The case $u^* = 0$ is trivial, so we assume in the following that $u^* \neq 0$. We are going to prove the existence of such a $\lambda$ by a variational argument. As a direct consequence of the definition of the normal cone, we have (see Proposition 9.5.5) the equivalence
$$u^* \in N_C(u) \iff \text{the linear form } v \mapsto \langle u^*, v \rangle \text{ attains its maximal value on } C \text{ at } u \in C.$$
As a general property of a nonzero linear form, the maximum of $v \mapsto \langle u^*, v \rangle$ on $C$ can only be attained on the boundary of $C$, and
$$v \in \operatorname{int} C \implies \langle u^*, v \rangle < \langle u^*, u \rangle.$$
Since $f$ is continuous, $f(v) < 0$ implies $v \in \operatorname{int} C$; hence
$$f(v) < 0 \implies \langle u^*, v \rangle < \langle u^*, u \rangle.$$
Therefore
$$\langle u^*, v - u \rangle \ge 0 \implies f(v) \ge 0,$$
that is, on the closed half-space $H = \{ v \in V : \langle u^*, v - u \rangle \ge 0 \}$ we have $f(v) \ge 0$. Noticing that $u \in H$ and $f(u) = 0$, we have the following variational property: "$f$ achieves its minimal value on the half-space $H$ at the point $u$."

Hence, $0 \in \partial(f + \delta_H)(u)$. Since $f$ is continuous, we can apply Theorem 9.5.5 to obtain $0 \in \partial f(u) + N_H(u)$. We are in the situation described in Lemma 9.6.1. Noticing that
$$H = \{ v \in V : \langle -u^*, v - u \rangle \le 0 \},$$
we have $N_H(u) = \mathbb{R}_+(-u^*) = \mathbb{R}_- u^*$. As a consequence, there exists some $t \le 0$ such that $0 \in \partial f(u) + t u^*$. Let us finally prove that $t < 0$. Otherwise, $t = 0$ and $0 \in \partial f(u)$, which expresses that $f$ attains its minimal value at $u$. This is impossible because $f(v_0) < 0$ (Slater condition) and $f(u) = 0$. Thus $t < 0$, and dividing the above relation by $t$, we obtain $u^* \in -\frac{1}{t} \partial f(u)$, i.e., $u^* \in \mathbb{R}_+ \partial f(u)$.

We now have all the elements needed to prove Theorem 9.6.1.

Proof of Theorem 9.6.1. Let us first verify that all the assumptions of Theorem 9.5.5 are satisfied. Since the functions $f_i$ are continuous, the Slater condition implies that $v_0 \in \operatorname{int} C$. Since $f_0(v_0) < +\infty$, we have $\operatorname{dom} f_0 \cap \operatorname{int} C \neq \emptyset$, and the qualification assumption ($Q_2$) is satisfied. Thus $u$ is a solution of the convex programming problem (P) iff $0 \in \partial f_0(u) + N_C(u)$. Then notice that $C = \bigcap_{i=1}^n C_i$, where $C_i = \{ v \in V : f_i(v) \le 0 \}$, which is equivalent to saying that $\delta_C = \delta_{C_1} + \cdots + \delta_{C_n}$. The Slater condition implies that each of the closed convex functions $\delta_{C_i}$ is continuous at the point $v_0$ (indeed, $v_0 \in \operatorname{int} C_i$). Thus, the subdifferential rule for the sum of convex functions (see Theorem 9.5.4) gives
$$\partial \delta_C = \partial \delta_{C_1} + \cdots + \partial \delta_{C_n},$$
that is, for any $u \in C$,
$$N_C(u) = N_{C_1}(u) + \cdots + N_{C_n}(u).$$
We now combine these results with Proposition 9.6.1 to obtain the existence of real numbers $\lambda_1 \ge 0, \ldots, \lambda_n \ge 0$ such that
$$0 \in \partial f_0(u) + \lambda_1 \partial f_1(u) + \cdots + \lambda_n \partial f_n(u), \qquad \lambda_i = 0 \text{ if } f_i(u) < 0.$$
Thus, in all cases, $\lambda_i f_i(u) = 0$.
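The conditions of Theorem 9.6.1 are easy to check numerically on a small smooth example. The sketch below uses a hypothetical problem in $\mathbb{R}^2$: minimize $f_0(x) = (x_1 - 2)^2 + (x_2 - 1)^2$ subject to $f_1(x) = x_1 + x_2 - 2 \le 0$ and $f_2(x) = -x_1 \le 0$. The Slater condition holds (for instance at $v_0 = (0.5, 0.5)$), and since all functions are differentiable, each $\partial f_i(u)$ reduces to $\{\nabla f_i(u)\}$. The candidate $u = (1.5, 0.5)$, $\lambda = (1, 0)$ was obtained by hand by projecting $(2, 1)$ onto the half-plane $x_1 + x_2 \le 2$; a crude grid search then confirms that no feasible grid point does better.

```python
def f0(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def grad_f0(x):
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]

# Hypothetical constraints f_i(x) <= 0, stored as (f_i, gradient of f_i)
constraints = [
    (lambda x: x[0] + x[1] - 2.0, lambda x: [1.0, 1.0]),
    (lambda x: -x[0], lambda x: [-1.0, 0.0]),
]

def kkt_violation(u, lam):
    # Largest violation of: stationarity, primal feasibility,
    # dual feasibility (lambda_i >= 0), and complementary slackness
    stat = grad_f0(u)
    for (fi, grad_fi), li in zip(constraints, lam):
        stat = [s + li * gi for s, gi in zip(stat, grad_fi(u))]
    stationarity = max(abs(s) for s in stat)
    primal = max(max(fi(u), 0.0) for fi, _ in constraints)
    dual = max(max(-li, 0.0) for li in lam)
    slackness = max(abs(li * fi(u)) for (fi, _), li in zip(constraints, lam))
    return max(stationarity, primal, dual, slackness)

u, lam = [1.5, 0.5], [1.0, 0.0]
print("KKT violation:", kkt_violation(u, lam))

# Crude confirmation of optimality: no feasible grid point does better
grid = [i / 20 for i in range(-40, 61)]          # [-2, 3] with step 0.05
best = min(f0([s, t]) for s in grid for t in grid
           if all(fi([s, t]) <= 0.0 for fi, _ in constraints))
print("f0(u) =", f0(u), " best feasible grid value =", best)
```

A wrong multiplier, for example $\lambda = (0.5, 0)$, leaves a nonzero stationarity residual, which is how the check discriminates between candidates.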
9.6.2 The marginal approach to multipliers

Let us first restate Theorem 9.6.1 in a variational way.

Proposition 9.6.2. Assume that the hypotheses of Theorem 9.6.1 are satisfied. Let $u$ be an optimal solution of the minimization problem
$$\min\{ f_0(v) : f_i(v) \le 0, \ i = 1, \ldots, n \}. \tag{P}$$
(a) Then there exists some vector $\lambda \in \mathbb{R}^n_+$ such that $u$ is a solution of the unconstrained minimization problem
$$\min\Big\{ f_0(v) + \sum_{i=1}^n \lambda_i f_i(v) : v \in V \Big\}. \tag{$P_\lambda$}$$
Moreover, the complementary slackness condition holds:
$$\lambda_i f_i(u) = 0, \qquad i = 1, \ldots, n.$$
(b) Conversely, if for some $\lambda \in \mathbb{R}^n_+$, $u$ is a solution of the unconstrained minimization problem ($P_\lambda$) and
$$f_i(u) \le 0, \qquad \lambda_i f_i(u) = 0, \qquad i = 1, \ldots, n,$$
then $u$ is an optimal solution of the minimization problem (P).

Proof. Just notice that, since the functions $f_i$ ($i = 1, \ldots, n$) are continuous, the additivity rule for subdifferentials holds,
$$\partial f_0 + \lambda_1 \partial f_1 + \cdots + \lambda_n \partial f_n = \partial(f_0 + \lambda_1 f_1 + \cdots + \lambda_n f_n),$$
and the Karush–Kuhn–Tucker condition can be written in the form
$$0 \in \partial\Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big)(u).$$
This expresses that $u$ is a solution of the convex unconstrained minimization problem ($P_\lambda$).
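Proposition 9.6.2 can be watched at work on a one-dimensional toy problem of our own choosing: minimize $f_0(v) = v^2$ subject to $f_1(v) = 1 - v \le 0$. The solution is $u = 1$, and the stationarity condition $2u - \lambda = 0$ gives $\lambda = 2$. The sketch below minimizes $v \mapsto f_0(v) + \lambda f_1(v)$ by golden-section search (valid here because this function is convex, hence unimodal) for a few values of $\lambda$, and reports whether the unconstrained minimizer is feasible and satisfies complementary slackness. Only $\lambda = 2$ should pass both tests, in line with part (b); the search interval and the tolerances are arbitrary choices.

```python
import math

def golden_section_min(phi, lo, hi, tol=1e-9):
    # Golden-section search for the minimizer of a unimodal function on [lo, hi]
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - r * (b - a)
        else:
            a, c = c, d
            d = a + r * (b - a)
    return 0.5 * (a + b)

def f0(v):
    return v * v

def f1(v):
    return 1.0 - v          # constraint f1(v) <= 0, i.e. v >= 1

results = {}
for lam in [0.0, 1.0, 2.0, 3.0]:
    # Unconstrained problem (P_lambda): minimize f0 + lam * f1 over a large interval
    v = golden_section_min(lambda x, lam=lam: f0(x) + lam * f1(x), -10.0, 10.0)
    feasible = f1(v) <= 1e-6
    complementary = abs(lam * f1(v)) <= 1e-6
    results[lam] = (v, feasible, complementary)
    print(f"lambda = {lam}: minimizer {v:.6f}, feasible = {feasible}, "
          f"complementary slackness = {complementary}")
```

For $\lambda < 2$ the unconstrained minimizer $\lambda/2$ is infeasible, while for $\lambda > 2$ it is feasible but the constraint is inactive with $\lambda > 0$, violating complementary slackness.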
Definition 9.6.1. Let $u$ be an optimal solution of the minimization problem (P) above. We call a vector $\lambda \in \mathbb{R}^n_+$ a Lagrange multiplier vector for $u$ if
$$0 \in \partial f_0(u) + \sum_{i=1}^n \lambda_i \partial f_i(u) \qquad \text{and} \qquad \lambda_i f_i(u) = 0, \quad i = 1, \ldots, n.$$
The determination of Lagrange multipliers is a central question since, if we are able to compute a Lagrange multiplier $\lambda(u)$ of an optimal solution $u$, then $u$ can be obtained as a solution of the unconstrained minimization problem
$$\min\Big\{ f_0(v) + \sum_{i=1}^n \lambda_i f_i(v) : v \in V \Big\}. \tag{$P_\lambda$}$$
Let us first notice that the set of Lagrange multipliers does not depend on the solution $u$; i.e., if $u_1$ and $u_2$ are two solutions of the minimization problem (P), then $M(u_1) = M(u_2)$, where $M(u_i)$ is the set of Lagrange multipliers of the solution $u_i$. This is a consequence of the following characterization of Lagrange multipliers.

Proposition 9.6.3. Let $u$ be an optimal solution of the minimization problem
$$\min\{ f_0(v) : f_i(v) \le 0, \ i = 1, \ldots, n \}. \tag{P}$$
Then the set of Lagrange multipliers for $u$ is equal to
$$M = \Big\{ \lambda \in \mathbb{R}^n_+ : \inf_C f_0 = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big) \Big\},$$
where $C = \{ v \in V : f_i(v) \le 0 \text{ for all } i = 1, \ldots, n \}$ is the set of constraints.

Proof. Take a Lagrange multiplier $\lambda$ for $u$. Since $\partial f_0(u) + \sum_{i=1}^n \lambda_i \partial f_i(u) \subset \partial\big( f_0 + \sum_{i=1}^n \lambda_i f_i \big)(u)$ (Theorem 9.5.4(a)), $u$ is a solution of the unconstrained minimization problem ($P_\lambda$):
$$f_0(u) + \sum_{i=1}^n \lambda_i f_i(u) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big).$$
Because of the complementary slackness property, we deduce
$$f_0(u) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big).$$
On the other hand, since $u$ is an optimal solution of (P), we have $f_0(u) = \inf_V (f_0 + \delta_C) = \inf_C f_0$, which proves that $\lambda \in M$.
Conversely, let us suppose that $\lambda \in M$. Since $u$ is an optimal solution of (P), we have
$$f_0(u) = \inf_C f_0 = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big) \le f_0(u) + \sum_{i=1}^n \lambda_i f_i(u),$$
and hence $\sum_{i=1}^n \lambda_i f_i(u) \ge 0$. Since $\lambda_i \ge 0$ and $f_i(u) \le 0$, this implies $\lambda_i f_i(u) = 0$ for all $i = 1, \ldots, n$. Hence
$$f_0(u) + \sum_{i=1}^n \lambda_i f_i(u) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big),$$
which expresses that $u$ is a solution of the unconstrained minimization problem
$$\min\Big\{ f_0(v) + \sum_{i=1}^n \lambda_i f_i(v) : v \in V \Big\}.$$
As a consequence, by the additivity rule for subdifferentials (the functions $f_i$ being continuous),
$$0 \in \partial f_0(u) + \sum_{i=1}^n \lambda_i \partial f_i(u),$$
which, together with $\lambda_i \ge 0$, $f_i(u) \le 0$, and $\lambda_i f_i(u) = 0$, tells us that $\lambda$ is a Lagrange multiplier vector for $u$.

Clearly, the set $M$ does not depend on the solution $u$ of (P). Thus we can speak of the set of Lagrange multipliers of a convex program. Moreover, the definition of the set $M$ makes sense, and $M$ may be nonempty, even when the convex program (P) has no solution. This leads us to the following definition.

Definition 9.6.2. For a given convex program
$$\inf\{ f_0(v) : f_i(v) \le 0, \ i = 1, \ldots, n \}, \tag{P}$$
the set $M$ of generalized Lagrange multiplier vectors is defined by
$$M = \Big\{ \lambda \in \mathbb{R}^n_+ : \inf_V (f_0 + \delta_C) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big) \Big\},$$
where $C = \{ v \in V : f_i(v) \le 0, \ i = 1, \ldots, n \}$. When the problem (P) has a solution, $M$ is the set of Lagrange multiplier vectors for (P).

Without ambiguity, in what follows we will omit the word "generalized." We are going to characterize the set $M$ by using marginal analysis.

Definition 9.6.3. The value function attached to a convex program
$$\inf\{ f_0(v) : f_i(v) \le 0 \ \forall i = 1, \ldots, n \} \tag{P}$$
is the function $p : \mathbb{R}^n \to \overline{\mathbb{R}}$ defined, for every $y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n$, by
$$p(y) := \inf\{ f_0(v) : f_i(v) \le y_i \ \forall i = 1, \ldots, n \}.$$
The function $p$ is also called the marginal function. Let us observe that the value function is the optimal value of the perturbed convex program
$$\inf\{ f_0(v) : f_i(v) \le y_i \ \forall i = 1, \ldots, n \}. \tag{$P_y$}$$
The initial, or unperturbed, problem corresponds to the case $y = 0$, i.e., (P) = ($P_0$). We also notice that the value function may take the value $-\infty$, which may be a source of difficulties. We are going to show that Lagrange multiplier vectors for problem (P) correspond to subgradients of the value function $p$.

Theorem 9.6.2. Consider the convex minimization problem
$$\inf\{ f_0(v) : f_i(v) \le 0 \ \forall i = 1, \ldots, n \} \tag{P}$$
and its value function $p : \mathbb{R}^n \to \overline{\mathbb{R}}$,
$$p(y) = \inf\{ f_0(v) : f_i(v) \le y_i \ \forall i = 1, \ldots, n \}.$$
Then the following properties hold:
(a) the value function $p$ is convex;
(b) if $p(0) \in \mathbb{R}$, then $M = -\partial p(0)$, i.e., the set of generalized Lagrange multiplier vectors for (P) is equal to the opposite of the subdifferential of $p$ at the origin;
(c) if $p(0) \in \mathbb{R}$ and the Slater qualification assumption is satisfied, then $p$ is continuous at the origin and, as a consequence, $M$ is a nonempty, closed, convex, bounded set in $\mathbb{R}^n_+$.

Proof. (a) We notice that
$$p(y) = \inf_{v \in V} f(v, y),$$
where $f(v, y) = f_0(v) + \delta_{C(y)}(v)$ and $C(y) = \{ v \in V : f_i(v) \le y_i \text{ for all } i = 1, \ldots, n \}$. Let us verify that the mapping $(v, y) \mapsto \delta_{C(y)}(v)$ is convex. We just need to verify that for every $(u, z) \in V \times \mathbb{R}^n$ and $(v, y) \in V \times \mathbb{R}^n$ such that $u \in C(z)$ and $v \in C(y)$, we still have $\lambda u + (1 - \lambda) v \in C(\lambda z + (1 - \lambda) y)$ for all $\lambda \in [0, 1]$. This is an immediate consequence of the convexity of the functions $f_i$:
$$f_i(\lambda u + (1 - \lambda) v) \le \lambda f_i(u) + (1 - \lambda) f_i(v) \le \lambda z_i + (1 - \lambda) y_i = (\lambda z + (1 - \lambda) y)_i.$$
Since $f_0$ is convex, we obtain that $f$ is convex with respect to the pair $(v, y)$. The convexity of the value function $p$ is then a consequence of Proposition 9.2.3.

(b) We first prove that every generalized Lagrange multiplier vector $\lambda \in \mathbb{R}^n_+$ satisfies $-\lambda \in \partial p(0)$. Equivalently, we need to prove that
$$\forall y \in \mathbb{R}^n \qquad p(y) \ge p(0) - \sum_{i=1}^n \lambda_i y_i.$$
By definition of $p$ and by Definition 9.6.2 of generalized Lagrange multiplier vectors, we have
$$p(0) = \inf_V (f_0 + \delta_C) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big).$$
Take an arbitrary $y \in \mathbb{R}^n$ and denote by $C(y)$ the set
$$C(y) = \{ v \in V : f_i(v) \le y_i, \ i = 1, \ldots, n \}.$$
For every $v \in C(y)$, we have $\sum_{i=1}^n \lambda_i f_i(v) \le \sum_{i=1}^n \lambda_i y_i$ (recall that $\lambda_i \ge 0$ for all $i = 1, \ldots, n$). Hence, for all $v \in C(y)$,
$$p(0) \le f_0(v) + \sum_{i=1}^n \lambda_i f_i(v) \le f_0(v) + \sum_{i=1}^n \lambda_i y_i.$$
As a consequence, by taking the infimum with respect to $v \in C(y)$, we obtain
$$p(0) \le p(y) + \sum_{i=1}^n \lambda_i y_i.$$

Let us now prove that, conversely, if $-\lambda \in \partial p(0)$, then $\lambda$ is a generalized Lagrange multiplier vector for (P). We first prove that $\lambda \in \mathbb{R}^n_+$. Indeed, for every $y \in \mathbb{R}^n_+$ we have $C \subset C(y)$, and as a consequence $p(y) \le p(0)$. Combining this inequality with the subdifferential inequality
$$p(y) \ge p(0) - \sum_{i=1}^n \lambda_i y_i,$$
we obtain
$$\sum_{i=1}^n \lambda_i y_i \ge 0.$$
This being true for all $y \in \mathbb{R}^n_+$, we obtain that $\lambda \in \mathbb{R}^n_+$. Let us now prove that
$$\inf_V (f_0 + \delta_C) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big).$$
Equivalently, we need to prove that
$$p(0) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big).$$
The inequality $p(0) \ge \inf_V \big( f_0 + \sum_{i=1}^n \lambda_i f_i \big)$ is always true for arbitrary $\lambda \in \mathbb{R}^n_+$: indeed, for every $v \in C$ we have $f_i(v) \le 0$ and hence $\lambda_i f_i(v) \le 0$. This immediately yields
$$\inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big) \le \inf_C \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big) \le \inf_C f_0 = p(0).$$
The opposite inequality $p(0) \le \inf_V \big( f_0 + \sum_{i=1}^n \lambda_i f_i \big)$ relies on the fact that $-\lambda \in \partial p(0)$. We thus have, for each $y \in \mathbb{R}^n$,
$$p(y) + \sum_{i=1}^n \lambda_i y_i \ge p(0).$$
Take an arbitrary $v \in V$ and choose correspondingly $y_i = f_i(v)$ for all $i = 1, \ldots, n$. Then $v \in C(y)$ and $p(y) \le f_0(v)$. As a consequence,
$$f_0(v) + \sum_{i=1}^n \lambda_i f_i(v) \ge p(0).$$
This being true for all $v \in V$, by taking the infimum with respect to $v$, we obtain
$$\inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big) \ge p(0).$$
Finally, we have proved that $\lambda \in \mathbb{R}^n_+$ and
$$\inf_V (f_0 + \delta_C) = \inf_V \Big( f_0 + \sum_{i=1}^n \lambda_i f_i \Big).$$
By Definition 9.6.2, $\lambda$ is a generalized Lagrange multiplier vector.

(c) By the Slater qualification assumption, there exists some $v_0 \in \operatorname{dom} f_0$ such that $f_i(v_0) < 0$ for all $i = 1, \ldots, n$. Thus, we can find a neighborhood of the origin in $\mathbb{R}^n$, say, $B(0, r)$ with $r > 0$, such that
$$\forall y \in B(0, r), \ \forall i = 1, \ldots, n \qquad f_i(v_0) < y_i.$$
(It is enough to take, for example, $r = \frac{1}{2} \inf\{ |f_i(v_0)| : i = 1, \ldots, n \}$.) By definition of the value function $p$, we have
$$\forall y \in B(0, r) \qquad p(y) \le f_0(v_0).$$
Since $f_0(v_0) < +\infty$, $p$ is bounded from above on the ball $B(0, r)$. Let us prove that this property, together with $p(0) \in \mathbb{R}$, implies
$$\forall y \in \mathbb{R}^n \qquad p(y) > -\infty.$$
We first formulate the properties above in terms of epigraphs. We have $B(0, r) \times [f_0(v_0), +\infty[\ \subset \operatorname{epi} p$. If $p(y) = -\infty$ for some $y \in \mathbb{R}^n$, we would have $\{y\} \times \mathbb{R} \subset \operatorname{epi} p$. Take $\xi = -\alpha y$ with $\alpha > 0$ small enough to have $|\xi| < r$, for example, $\alpha = r/(2|y|)$. We can write $\alpha = (1 - \lambda)/\lambda$ for some $0 < \lambda < 1$, which gives $\lambda \xi + (1 - \lambda) y = 0$. Then we observe that
$$(\xi, f_0(v_0)) \in B(0, r) \times [f_0(v_0), +\infty[, \qquad (y, t) \in \{y\} \times \mathbb{R} \quad \text{for every } t \in \mathbb{R}.$$
By the convexity of $\operatorname{epi} p$ we obtain $\big(\lambda \xi + (1 - \lambda) y,\ \lambda f_0(v_0) + (1 - \lambda) t\big) \in \operatorname{epi} p$, i.e., $p(0) \le \lambda f_0(v_0) + (1 - \lambda) t$ for every $t \in \mathbb{R}$. Since $0 < \lambda < 1$, letting $t \to -\infty$ gives $p(0) = -\infty$, a contradiction.

We can now apply Theorem 9.2.2: the function $p : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is convex and majorized on a neighborhood of the origin. Hence, $p$ is continuous at the origin. By Proposition 9.5.2, $\partial p(0) \neq \emptyset$, and the set $M = -\partial p(0)$ is a nonempty closed convex bounded subset of $\mathbb{R}^n_+$.

Let us now introduce a dual problem (P∗) to the convex program (P) and show that the generalized Lagrange multiplier vectors are precisely the solutions of this dual problem.

Theorem 9.6.3 (dual convex program). Let us consider a convex program
$$\inf\{f_0(v) : f_i(v) \le 0\ \ \forall i = 1, \dots, n\} \qquad (\mathrm{P})$$
and let $p : \mathbb{R}^n \to \overline{\mathbb{R}}$ be its value function. We assume that $p(0) \in \mathbb{R}$ and that the Slater qualification assumption holds. Then

(a) The generalized Lagrange multiplier vectors of (P) are the solutions of the maximization problem
$$\sup\{-p^*(-\lambda) : \lambda \in \mathbb{R}^n_+\}, \qquad (\mathrm{P}^*)$$
which is called the dual problem of (P). The set of solutions of (P∗) is a nonempty closed convex bounded subset of $\mathbb{R}^n_+$.

(b) For every $\lambda \in \mathbb{R}^n_+$, the equality
$$-p^*(-\lambda) = \inf_{v \in V}\Big\{f_0(v) + \sum_{i=1}^n \lambda_i f_i(v)\Big\}$$
holds and, as a consequence, the dual problem (P∗) can be written in the form
$$\sup_{\lambda \in \mathbb{R}^n_+}\ \inf_{v \in V}\Big\{f_0(v) + \sum_{i=1}^n \lambda_i f_i(v)\Big\}. \qquad (\mathrm{P}^*)$$
Chapter 9. Convex duality and optimization
Proof. By Theorem 9.6.2, we have the following equivalence:
$$\lambda \in M \iff -\lambda \in \partial p(0).$$
We know that $p$ is a convex function. Thus, by using the Fenchel extremality relation (Proposition 9.5.1), we obtain
$$\lambda \in M \iff p(0) + p^*(-\lambda) = 0.$$
Theorem 9.6.2 also tells us that $p$ is continuous at the origin. By Proposition 9.3.2, we thus have $p(0) = p^{**}(0)$. Noticing that $p^{**}(0) = \sup\{-p^*(\mu) : \mu \in \mathbb{R}^n\}$, we obtain
$$\lambda \in M \iff -p^*(-\lambda) = \sup_{\mu \in \mathbb{R}^n} -p^*(\mu) = \sup_{\mu \in \mathbb{R}^n} -p^*(-\mu).$$
Thus, $\lambda \in M$ iff $\lambda$ is a solution of the maximization problem $\sup_{\mu \in \mathbb{R}^n} -p^*(-\mu)$.
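Let us now compute $p^*$:
$$\begin{aligned}
p^*(\mu) &= \sup_{y}\{\langle \mu, y\rangle - p(y)\} \\
&= \sup_{y}\Big\{\langle \mu, y\rangle - \inf\{f_0(v) : f_i(v) \le y_i\ \forall i = 1, \dots, n\}\Big\} \\
&= \sup_{v \in V,\, y \in \mathbb{R}^n}\{\langle \mu, y\rangle - f_0(v) : f_i(v) \le y_i\ \forall i = 1, \dots, n\}.
\end{aligned}$$
If for some $i \in \{1, \dots, n\}$ we have $\mu_i > 0$, then $p^*(\mu) = +\infty$ (the constraint set is nonempty since $p(0)$ is finite, and we may let $y_i \to +\infty$). Otherwise, when $-\mu \in \mathbb{R}^n_+$, we have
$$p^*(\mu) = \sup_{v \in V}\Big\{-f_0(v) + \sup_{y_i \ge f_i(v)} \sum_{i=1}^n \mu_i y_i\Big\} = \sup_{v \in V}\Big\{\sum_{i=1}^n \mu_i f_i(v) - f_0(v)\Big\}.$$
Hence
$$-p^*(-\mu) = \begin{cases} \displaystyle\inf_{v \in V}\Big\{f_0(v) + \sum_{i=1}^n \mu_i f_i(v)\Big\} & \text{if } \mu \in \mathbb{R}^n_+,\\[1ex] -\infty & \text{otherwise,}\end{cases}$$
and the dual problem can be written
$$\sup_{\mu \in \mathbb{R}^n_+}\ \inf_{v \in V}\Big\{f_0(v) + \sum_{i=1}^n \mu_i f_i(v)\Big\}.$$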
9.6.3 The Lagrangian approach to duality

In the framework of convex problems, and thanks to the Legendre–Fenchel transform, we have seen that a large number of mathematical objects can be paired with a dual one. We now go further and show how optimization problems themselves can be put in duality. In the previous section, we introduced a variational problem (P∗), called the dual problem of (P). We are going to justify this terminology and explore the remarkable ways in which the primal problem (P) and its dual (P∗) are related.

Let us introduce some basic notation and concepts. The convex program
$$\inf\{f_0(v) : f_i(v) \le 0\ \ \forall i = 1, \dots, n\} \qquad (\mathrm{P})$$
is called the primal problem. The key notion in the duality theory for optimization problems is the Lagrangian.

Definition 9.6.4. The Lagrangian function attached to the convex program (P) is the function $L : V \times \mathbb{R}^n_+ \to \mathbb{R} \cup \{+\infty\}$ defined by
$$L(v, \lambda) = f_0(v) + \sum_{i=1}^n \lambda_i f_i(v).$$
We already noticed that this expression plays a central role in the theory of Lagrange multipliers. The new aspect in the definition above is to consider this expression as a bivariate function, i.e., $L$ is a function of the two variables $v$ and $\lambda$. This is a big step: we are no longer concerned only with the primal problem, its solutions, and their characterization; from the very beginning we give the variables $v$ and $\lambda$ an equal status. Let us first observe that the Lagrangian function $L$ encapsulates all the information of the primal problem (P). Indeed,
$$\sup_{\lambda \in \mathbb{R}^n_+} L(v, \lambda) = f_0(v) + \delta_C(v),$$
where $C = \{v \in V : f_i(v) \le 0 \text{ for all } i = 1, \dots, n\}$ is the set of constraints: if $f_i(v) > 0$ for some $i$, letting $\lambda_i \to +\infty$ gives $+\infty$, while if $v \in C$ every term $\lambda_i f_i(v)$ is nonpositive and the supremum $f_0(v)$ is attained at $\lambda = 0$. Thus, the primal problem can be equivalently written as an inf-sup problem, namely,
$$\inf_{v \in V}\ \sup_{\lambda \in \mathbb{R}^n_+} L(v, \lambda). \qquad (\mathrm{P})$$
Let us denote by $\alpha \in \overline{\mathbb{R}}$ the optimal value of (P),
$$\alpha := \inf_{v \in V}\ \sup_{\lambda \in \mathbb{R}^n_+} L(v, \lambda);$$
$\alpha$ is called the primal value. This formulation of (P) makes it rather natural to consider the associated variational problem
$$\sup_{\lambda \in \mathbb{R}^n_+}\ \inf_{v \in V} L(v, \lambda), \qquad (\mathrm{P}^*)$$
called the dual problem, which is obtained by interchanging the order of the sup and inf operators. This formulation fits perfectly with the conclusion of Theorem 9.6.3, where it is shown that, under suitable assumptions, Lagrange multipliers are solutions of the dual problem (P∗). Let us denote by $\beta \in \overline{\mathbb{R}}$ the optimal value of (P∗),
$$\beta := \sup_{\lambda \in \mathbb{R}^n_+}\ \inf_{v \in V} L(v, \lambda);$$
$\beta$ is called the dual value. The dual problem therefore consists in maximizing over vectors $\lambda \in \mathbb{R}^n_+$ the dual function
$$d(\lambda) := \inf_{v \in V} L(v, \lambda).$$
Note that the dual problem (P∗) is well defined without any assumptions on the functions $f_i$, $i = 1, \dots, n$. One always has $\beta \le \alpha$, because the inequality $\sup_{\lambda}\inf_{v} L(v, \lambda) \le \inf_{v}\sup_{\lambda} L(v, \lambda)$ is always true: for every $v'$ and $\lambda'$ we have $\inf_{v} L(v, \lambda') \le L(v', \lambda') \le \sup_{\lambda} L(v', \lambda)$, and it remains to take the supremum in $\lambda'$ on the left and the infimum in $v'$ on the right. It can happen that the primal value $\alpha$ is strictly larger than the dual value $\beta$. In this case, we say that there is a duality gap. A basic question is to find conditions ensuring that there is no duality gap. When this is the case, the primal and the dual problems are connected through a rich calculus involving value functions, the Legendre–Fenchel transform and subdifferentials, minimax and saddle value problems. The notion of saddle value and saddle point of the Lagrangian function is also fundamental: it allows us to treat the primal and the dual aspects of optimization problems in a unified way, and it will allow us to develop all these ideas in a far more general setting in the next section.

Definition 9.6.5. Let $L : X \times Y \to \overline{\mathbb{R}}$ be a bivariate function, where $X$ and $Y$ are arbitrary spaces. A point $(\bar x, \bar y) \in X \times Y$ is called a saddle point of $L$ if
$$\max_{y \in Y} L(\bar x, y) = L(\bar x, \bar y) = \min_{x \in X} L(x, \bar y).$$
Equivalently, $(\bar x, \bar y)$ is a saddle point of $L$ if
$$L(\bar x, y) \le L(\bar x, \bar y) \le L(x, \bar y) \qquad \forall x \in X,\ \forall y \in Y.$$
Another way to say this is:
(a) $\bar x$ is a solution of the minimization problem $\inf_{x \in X} L(x, \bar y)$,
(b) $\bar y$ is a solution of the maximization problem $\max_{y \in Y} L(\bar x, y)$.
Note that the existence of a saddle point $(\bar x, \bar y)$ implies that there is no duality gap. This follows from the equalities
$$\sup_{y \in Y} L(\bar x, y) = L(\bar x, \bar y) = \inf_{x \in X} L(x, \bar y),$$
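which imply
$$\alpha = \inf_{x}\sup_{y} L(x, y) \le \sup_{y} L(\bar x, y) = L(\bar x, \bar y) = \inf_{x} L(x, \bar y) \le \sup_{y}\inf_{x} L(x, y) = \beta.$$
Since $\alpha \ge \beta$ is always true, we obtain
$$L(\bar x, \bar y) = \inf_{x}\sup_{y} L(x, y) = \sup_{y}\inf_{x} L(x, y).$$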
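The inequality $\beta \le \alpha$ and the role of saddle points can be checked on the simplest possible case, where $X$ and $Y$ are finite sets and $L(x, y)$ is the entry in row $x$ and column $y$ of a payoff matrix. The sketch below computes $\alpha$, $\beta$ and the saddle points of Definition 9.6.5 by brute force. The two matrices and the function names are illustrative choices, not taken from the text.

```python
def inf_sup(L):
    """alpha: minimum over rows x of the maximum over columns y of L[x][y]."""
    return min(max(row) for row in L)


def sup_inf(L):
    """beta: maximum over columns y of the minimum over rows x of L[x][y]."""
    return max(min(column) for column in zip(*L))


def saddle_points(L):
    """All (x, y) with max over y' of L[x][y'] = L[x][y] = min over x' of L[x'][y]."""
    columns = list(zip(*L))
    return [
        (x, y)
        for x, row in enumerate(L)
        for y, value in enumerate(row)
        if value == max(row) and value == min(columns[y])
    ]


# Hypothetical payoff matrices: rows = minimizing variable x, columns = maximizing y.
pennies = [[1, -1],
           [-1, 1]]        # "matching pennies": duality gap, no saddle point
with_saddle = [[3, 1],
               [4, 2]]     # saddle point at (0, 0), no duality gap

for name, L in (("pennies", pennies), ("with_saddle", with_saddle)):
    alpha, beta = inf_sup(L), sup_inf(L)
    assert beta <= alpha   # weak duality: sup inf <= inf sup
    print(name, alpha, beta, saddle_points(L))
# prints:
# pennies 1 -1 []
# with_saddle 3 3 [(0, 0)]
```

For matching pennies $\alpha = 1 > \beta = -1$. For the second matrix the common value $3$ is attained at the saddle point $(0, 0)$. On finite sets, the equality $\alpha = \beta$ always produces a saddle point, so the failure of the converse discussed next can only occur when infima or suprema are not attained.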
The converse is not true in general: it is possible to have no duality gap without the existence of saddle points. We can now reformulate the conclusions of Theorem 9.6.3 in the following form.

Theorem 9.6.4. Consider a convex program (P) and assume that the Slater condition holds. Then the following facts hold true.
(a) There is no duality gap, i.e., the primal and the dual values are equal; we call their common value the optimal value.
(b) (dual attainment) Assuming moreover that the optimal value is finite, the set of solutions of the dual problem (P∗) is nonempty: it is the set of generalized Lagrange multipliers of problem (P), and it is closed, convex, and bounded.
(c) (saddle point formulation of primal solutions) The following assertions are equivalent:
(i) $u$ is a solution of the primal problem (P);
(ii) there exists a vector $\bar\lambda \in \mathbb{R}^n_+$ such that $(u, \bar\lambda)$ is a saddle point of the Lagrangian function $L$.
If (ii) is satisfied, then $\bar\lambda$ is a Lagrange multiplier of the optimal solution $u$, and it is a solution of the dual problem (P∗).

Proof. (a) If $\alpha = -\infty$, there is nothing to prove, since we know that $\alpha \ge \beta$. Otherwise, $\alpha$ is finite: the Slater point $v_0$ belongs to $\operatorname{dom} f_0 \cap C$, so that $\alpha \le f_0(v_0) < +\infty$. We are then in the situation studied in Theorem 9.6.2: the Slater condition implies $\partial p(0) \neq \emptyset$, and the set $M$ of generalized Lagrange multiplier vectors is nonempty. For every $\bar\lambda \in M$ we have
$$\alpha = \inf_{v}\sup_{\lambda} L(v, \lambda) = \inf_{v} L(v, \bar\lambda) \le \sup_{\lambda}\inf_{v} L(v, \lambda) = \beta.$$
Since $\alpha \ge \beta$ is always true, we obtain $\alpha = \beta$.
(b) This is just a reformulation of Theorem 9.6.3: the set of solutions of the dual problem (P∗) has been characterized with the help of the value function:
$$\lambda \text{ solution of } (\mathrm{P}^*) \iff -\lambda \in \partial p(0) \iff \lambda \text{ generalized Lagrange multiplier of } (\mathrm{P}).$$
(c) Let us first prove the implication (i) ⇒ (ii).
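Let $u$ be a solution of problem (P). Then the Slater condition implies the existence of a Lagrange multiplier $\bar\lambda$ associated to $u$ (Theorem 9.6.1 and Proposition 9.6.3). Therefore,
$$L(u, \bar\lambda) = \min_{v \in V} L(v, \bar\lambda).$$
On the other hand, because of the complementary slackness property ($\bar\lambda_i f_i(u) = 0$ for all $i = 1, \dots, n$), we have for any $\lambda \in \mathbb{R}^n_+$
$$L(u, \bar\lambda) = f_0(u) + \sum_{i=1}^n \bar\lambda_i f_i(u) = f_0(u) \ge f_0(u) + \sum_{i=1}^n \lambda_i f_i(u)$$
(use that $\lambda_i \ge 0$ and $f_i(u) \le 0$). Hence
$$L(u, \bar\lambda) \ge \sup_{\lambda \in \mathbb{R}^n_+} L(u, \lambda).$$
Finally,
$$\inf_{v \in V} L(v, \bar\lambda) = L(u, \bar\lambda) \ge \sup_{\lambda \in \mathbb{R}^n_+} L(u, \lambda),$$
which expresses that $(u, \bar\lambda)$ is a saddle point of $L$.

Let us now prove the implication (ii) ⇒ (i). If $(u, \bar\lambda)$ is a saddle point of $L$ on $V \times \mathbb{R}^n_+$, we have
$$\begin{aligned}
f_0(u) + \delta_C(u) = \sup_{\lambda \in \mathbb{R}^n_+} L(u, \lambda) &\le L(u, \bar\lambda) \le \inf_{v} L(v, \bar\lambda) \le \sup_{\lambda \in \mathbb{R}^n_+}\inf_{v} L(v, \lambda) \\
&\le \inf_{v}\sup_{\lambda \in \mathbb{R}^n_+} L(v, \lambda) = \inf_{v}\big(f_0(v) + \delta_C(v)\big).
\end{aligned}$$
Hence, $u$ is a solution of the primal problem (P) and
$$\inf_V (f_0 + \delta_C) = \inf_V \Big(f_0 + \sum_{i=1}^n \bar\lambda_i f_i\Big),$$
which expresses (see Proposition 9.6.3) that $\bar\lambda$ is a Lagrange multiplier and, hence, that $\bar\lambda$ is a solution of the dual problem.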
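A one-dimensional example, chosen here for illustration and not taken from the text, makes Theorems 9.6.3 and 9.6.4 concrete. Take $V = \mathbb{R}$, $n = 1$, $f_0(v) = v^2$ and $f_1(v) = 1 - v$, so that (P) consists in minimizing $v^2$ over $v \ge 1$. The Slater condition holds with $v_0 = 2$. The value function, the multiplier set, the dual function and the Lagrangian can all be computed by hand:

```latex
\begin{aligned}
&p(y) = \inf\{v^2 : 1 - v \le y\} =
  \begin{cases} (1-y)^2 & \text{if } y \le 1,\\ 0 & \text{if } y > 1,\end{cases}
  \qquad p(0) = 1, \quad \partial p(0) = \{p'(0)\} = \{-2\},\\
&M = -\partial p(0) = \{2\},\\
&d(\lambda) = -p^*(-\lambda) = \inf_{v \in \mathbb{R}}\{v^2 + \lambda(1 - v)\}
  = \lambda - \frac{\lambda^2}{4} \quad (\text{infimum attained at } v = \lambda/2),\\
&\max_{\lambda \ge 0} d(\lambda) = d(2) = 1 = p(0) \quad (\text{no duality gap}),\\
&L(1, \lambda) = 1 \ \ \forall \lambda \ge 0, \qquad
  L(v, 2) = (v - 1)^2 + 1 \ge 1 = L(1, 2) \ \ \forall v \in \mathbb{R}.
\end{aligned}
```

The last line shows that $(u, \bar\lambda) = (1, 2)$ is a saddle point of $L$, in accordance with Theorem 9.6.4(c), and the complementary slackness relation $\bar\lambda f_1(u) = 2 \cdot 0 = 0$ holds.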
9.6.4 Duality for linear programming

Take $V = \mathbb{R}^n$. Given vectors $a^1, a^2, \dots, a^m, c$ in $\mathbb{R}^n$ and a vector $b$ in $\mathbb{R}^m$, consider the primal linear program
$$\inf\{\langle c, x\rangle : \langle a^i, x\rangle - b_i \le 0,\ i = 1, \dots, m\}, \qquad (\mathrm{P})$$
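where $\langle \cdot, \cdot\rangle$ is the usual Euclidean scalar product in $\mathbb{R}^n$. This is clearly a problem of linear programming, with
$$f_0(x) = \langle c, x\rangle, \qquad f_i(x) = \langle a^i, x\rangle - b_i, \quad i = 1, \dots, m.$$
Indeed, $f_0$ is linear and $C = \{x \in \mathbb{R}^n : \langle a^i, x\rangle - b_i \le 0,\ i = 1, \dots, m\}$ is a polyhedral set (a finite intersection of closed half-spaces). The Lagrangian function $L : \mathbb{R}^n \times \mathbb{R}^m_+ \to \mathbb{R}$ is given by
$$L(x, \lambda) = \langle c, x\rangle + \sum_{i=1}^m \lambda_i\big(\langle a^i, x\rangle - b_i\big).$$
Therefore, the primal problem can be rewritten as
$$\inf_{x}\ \sup_{\lambda \in \mathbb{R}^m_+} L(x, \lambda) \qquad (\mathrm{P})$$
and the dual problem (P∗) is given by
$$\sup_{\lambda \in \mathbb{R}^m_+}\ \inf_{x} L(x, \lambda). \qquad (\mathrm{P}^*)$$
Let us compute the dual function
$$d(\lambda) = \inf_{x} L(x, \lambda) = \inf_{x \in \mathbb{R}^n}\Big\{\Big\langle x,\ c + \sum_{i=1}^m \lambda_i a^i\Big\rangle - \sum_{i=1}^m \lambda_i b_i\Big\}.$$
Since a linear function on $\mathbb{R}^n$ is bounded from below only when it vanishes identically, we find
$$d(\lambda) = \begin{cases} -\sum_{i=1}^m \lambda_i b_i & \text{if } c + \sum_{i=1}^m \lambda_i a^i = 0,\\ -\infty & \text{otherwise.}\end{cases}$$
The dual problem (P∗) is then given by
$$\sup\ -\langle b, \lambda\rangle \quad \text{subject to} \quad \sum_{i=1}^m \lambda_i a^i = -c,\ \ \lambda \in \mathbb{R}^m_+, \qquad (\mathrm{P}^*)$$
and the Kuhn–Tucker optimality conditions are the following:
$$\sum_{i=1}^m \lambda_i a^i = -c, \quad \lambda \ge 0, \quad x \in \mathbb{R}^n, \quad \langle a^i, x\rangle - b_i \le 0\ \ \forall i, \quad \lambda_i\big(\langle a^i, x\rangle - b_i\big) = 0\ \ \forall i.$$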
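The primal and dual linear programs above, and the Kuhn–Tucker conditions, can be compared numerically on a small instance. The sketch below is a brute-force illustration, not an efficient LP solver. It assumes that the primal feasible set has a vertex and that both optimal values are finite, so that the optima are attained at basic solutions, and it enumerates those basic solutions with exact rational arithmetic. The data `a`, `b`, `c` and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, rhs):
    """Solve M z = rhs exactly by Gauss-Jordan elimination; None if M is singular."""
    k = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if A[r][col] != 0), None)
        if pivot is None:
            return None
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [p - factor * q for p, q in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]


def primal_lp(a, b, c):
    """min <c, x> s.t. <a^i, x> <= b_i, by enumerating vertices of the feasible set."""
    n, best = len(c), None
    for S in combinations(range(len(a)), n):
        x = solve_square([a[i] for i in S], [b[i] for i in S])
        if x is None:
            continue
        if all(sum(p * q for p, q in zip(a[i], x)) <= b[i] for i in range(len(a))):
            value = sum(p * q for p, q in zip(c, x))
            best = value if best is None else min(best, value)
    return best


def dual_lp(a, b, c):
    """max -<b, lambda> s.t. sum_i lambda_i a^i = -c, lambda >= 0, over basic solutions."""
    n, best = len(c), None
    for S in combinations(range(len(a)), n):
        # Column j of this n x n system is the vector a^{S[j]}.
        M = [[a[i][k] for i in S] for k in range(n)]
        lam = solve_square(M, [-ck for ck in c])
        if lam is None or any(l < 0 for l in lam):
            continue
        value = -sum(b[i] * l for i, l in zip(S, lam))
        best = value if best is None else max(best, value)
    return best


def kkt_holds(a, b, c, x, lam):
    """Check the Kuhn-Tucker conditions for a candidate pair (x, lambda)."""
    slack = [sum(p * q for p, q in zip(ai, x)) - bi for ai, bi in zip(a, b)]
    stationary = all(sum(l * ai[k] for l, ai in zip(lam, a)) == -c[k] for k in range(len(c)))
    return (stationary
            and all(l >= 0 for l in lam)
            and all(s <= 0 for s in slack)
            and all(l * s == 0 for l, s in zip(lam, slack)))


# Hypothetical instance: minimize -x1 - x2 subject to
# x1 <= 2, x2 <= 3, x1 + x2 <= 4, -x1 <= 0, -x2 <= 0.
a = [(1, 0), (0, 1), (1, 1), (-1, 0), (0, -1)]
b = [2, 3, 4, 0, 0]
c = (-1, -1)
print(primal_lp(a, b, c), dual_lp(a, b, c))          # prints: -4 -4
print(kkt_holds(a, b, c, (1, 3), (0, 0, 1, 0, 0)))   # prints: True
```

The two optimal values coincide, and the pair $x = (1, 3)$, $\lambda = (0, 0, 1, 0, 0)$ satisfies the Kuhn–Tucker conditions: only the constraint $x_1 + x_2 \le 4$ carries a positive multiplier, and it is active.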
9.7 A general approach to duality in convex optimization

In Section 9.6, we developed a duality theory for convex programs. This is an important class of convex optimization problems, but it is far from covering the whole field of convex optimization. A number of natural questions therefore arise. Is it possible to develop a duality theory for general convex optimization problems, and if so, is the dual problem unique? What are the relations between the primal and dual problems, and how should the solutions of the dual problem be interpreted? At the center of all these questions is the notion of Lagrangian function, which we now introduce. Let us consider the primal problem
$$\inf\{f(v) : v \in V\}, \qquad (\mathrm{P})$$
where $f : V \to \mathbb{R} \cup \{+\infty\}$ is a general convex, lower semicontinuous, and proper function, whose definition usually includes the constraints. The basic idea is to introduce a bivariate function $L : V \times W \to \overline{\mathbb{R}}$ which satisfies
$$\forall v \in V \qquad f(v) = \sup_{w \in W} L(v, w).$$
We say that $L$ is a Lagrangian function associated to problem (P). In this way, the primal problem can be written as an inf-sup problem:
$$\inf_{v \in V}\ \sup_{w \in W} L(v, w). \qquad (\mathrm{P})$$
As we did in Section 9.6, one can associate to problem (P) a dual problem (P∗), which is obtained by interchanging the order of inf and sup:
$$\sup_{w \in W}\ \inf_{v \in V} L(v, w). \qquad (\mathrm{P}^*)$$
Equivalently, (P∗) can be written as a maximization problem in the form
$$\sup_{w \in W} d(w), \qquad (\mathrm{P}^*)$$
with $d : W \to \overline{\mathbb{R}}$ defined by
$$d(w) := \inf_{v \in V} L(v, w).$$
We call $d(\cdot)$ the dual function (attached to the Lagrangian function $L$). The interesting situation occurs when there is no duality gap, i.e., $\inf(\mathrm{P}) = \sup(\mathrm{P}^*)$, which is equivalent to saying
$$\inf_{v \in V}\ \sup_{w \in W} L(v, w) = \sup_{w \in W}\ \inf_{v \in V} L(v, w).$$
In this general abstract setting, we have the following result.

Proposition 9.7.1. Let $L : V \times W \to \overline{\mathbb{R}}$ be a general bivariate function. Then the following facts are equivalent:
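(i) $(\bar v, \bar w)$ is a saddle point of $L$;
(ii) $\bar v$ is a solution of the primal problem (P), $\bar w$ is a solution of the dual problem (P∗), and there is no duality gap: $\inf(\mathrm{P}) = \sup(\mathrm{P}^*)$.

Proof. (i) ⇒ (ii). By definition of a saddle point,
$$L(\bar v, w) \le L(\bar v, \bar w) \le L(v, \bar w) \qquad \forall v \in V,\ \forall w \in W.$$
Hence, for every $v \in V$,
$$\sup_{w \in W} L(\bar v, w) \le L(\bar v, \bar w) \le L(v, \bar w) \le \sup_{w \in W} L(v, w).$$
This being true for all $v \in V$, we obtain that $\bar v$ is a solution of the minimization problem
$$\inf_{v \in V}\Big\{\sup_{w \in W} L(v, w)\Big\},$$
which is precisely the primal problem (P). In a similar way, for every $w \in W$,
$$\inf_{v \in V} L(v, w) \le L(\bar v, w) \le L(\bar v, \bar w) \le \inf_{v \in V} L(v, \bar w).$$
Hence, $\bar w$ is a solution of the maximization problem
$$\sup_{w \in W}\Big\{\inf_{v \in V} L(v, w)\Big\},$$
which is the dual problem (P∗). Moreover, these inequalities give $\sup_{w} L(\bar v, w) = L(\bar v, \bar w) = \inf_{v} L(v, \bar w)$, whence $\inf(\mathrm{P}) = \sup(\mathrm{P}^*)$.

(ii) ⇒ (i). Since $\bar v$ is a solution of the primal problem (P), we have
$$\sup_{w \in W} L(\bar v, w) = \inf_{v \in V}\ \sup_{w \in W} L(v, w).$$
Similarly, since $\bar w$ is a solution of the dual problem (P∗), we have
$$\inf_{v \in V} L(v, \bar w) = \sup_{w \in W}\ \inf_{v \in V} L(v, w).$$
Since there is no duality gap, $\inf\sup = \sup\inf$, and we obtain
$$\sup_{w \in W} L(\bar v, w) = \inf_{v \in V} L(v, \bar w),$$
that is,
$$\forall v \in V,\ \forall w \in W \qquad L(\bar v, w) \le L(v, \bar w).$$
Taking $w = \bar w$ gives $L(\bar v, \bar w) \le L(v, \bar w)$ for all $v \in V$, and taking $v = \bar v$ gives $L(\bar v, w) \le L(\bar v, \bar w)$ for all $w \in W$, i.e.,
$$L(\bar v, \bar w) = \min_{v} L(v, \bar w) = \max_{w} L(\bar v, w),$$
which expresses that $(\bar v, \bar w)$ is a saddle point of $L$.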
As described above, duality theory for minimization problems follows very naturally from the Lagrangian formulation: it simply consists in permuting the inf and the sup. We stress the fact that the primal and the dual problems are intimately paired as soon as there is no duality gap and $L$ has saddle points. The question is therefore: for which class of bivariate functions $L$ can one expect such properties? This is a central question in game theory, fixed point theory, economics, and so forth. Let us quote the celebrated von Neumann minimax theorem. We give a slightly more general formulation, which recovers as a particular case the existence theorem for convex minimization problems (see Aubin [45], for example).

Theorem 9.7.1 (von Neumann's minimax theorem). Let $V$ and $W$ be two reflexive Banach spaces and let $M \subset V$ and $N \subset W$ be two nonempty closed convex sets. Let $L : M \times N \to \mathbb{R}$ be a bivariate function which satisfies the following properties:
(i$_a$) for all $w \in N$, $v \mapsto L(v, w)$ is convex and lower semicontinuous;
(i$_b$) for all $v \in M$, $w \mapsto L(v, w)$ is concave and upper semicontinuous;
(ii$_a$) $M$ is bounded or there exists some $w_0 \in N$ such that $v \mapsto L(v, w_0)$ is coercive;
(ii$_b$) $N$ is bounded or there exists some $v_0 \in M$ such that $w \mapsto -L(v_0, w)$ is coercive.
Then $L$ possesses a saddle point $(\bar v, \bar w) \in M \times N$, i.e.,
$$\min_{v \in M} L(v, \bar w) = L(\bar v, \bar w) = \max_{w \in N} L(\bar v, w).$$
In particular $\inf_v \sup_w L(v, w) = \sup_w \inf_v L(v, w)$, i.e., there is no duality gap.

It follows from the previous results that the key property for developing a duality theory for optimization problems is the possibility of writing the function $f$ in the form
$$f(v) = \sup_{w \in W} L(v, w)$$
with $L : V \times W \to \overline{\mathbb{R}}$ a convex-concave bivariate function. We are going to see how the Legendre–Fenchel transform allows us to produce such convex-concave Lagrangian functions in a systematic and elegant way. The idea is first to introduce a perturbation function $F : V \times Y \to \mathbb{R} \cup \{+\infty\}$ such that $F(v, 0) = f(v)$ for all $v \in V$. The primal problem (P) can now be written as
$$\inf\{F(v, 0) : v \in V\}. \qquad (\mathrm{P})$$
The key property which allows us to produce a convex-concave Lagrangian function from $F$ is that $F$ is jointly convex with respect to $(v, y) \in V \times Y$. For example, in convex programming, the duality scheme that was studied in Section 9.6 is associated with the perturbation function
$$F(v, y) = \begin{cases} f_0(v) & \text{if } f_i(v) \le y_i,\ i = 1, \dots, n,\\ +\infty & \text{otherwise.}\end{cases}$$
One can easily verify that when $f_0$ and the $f_i$ ($i = 1, \dots, n$) are convex functions, so is $F$: its epigraph is the intersection of the convex sets $\{(v, y, t) : f_0(v) \le t\}$ and $\{(v, y, t) : f_i(v) \le y_i\}$, $i = 1, \dots, n$.
Let us now describe how one can associate a Lagrangian function to $F$.

Proposition 9.7.2 (definition of Lagrangian). Let $F : V \times Y \to \mathbb{R} \cup \{+\infty\}$ be a convex function. We associate to $F$ a Lagrangian function $L : V \times Y^* \to \overline{\mathbb{R}}$ by the following formula:
$$\forall v \in V,\ \forall y^* \in Y^* \qquad -L(v, y^*) = \sup_{y \in Y}\{\langle y^*, y\rangle - F(v, y)\},$$
i.e., $-L(v, \cdot)$ is the Legendre–Fenchel conjugate of $F(v, \cdot)$. Then $L$ is a convex-concave function. More precisely,
(1) for all $v \in V$, $y^* \mapsto L(v, y^*)$ is concave and upper semicontinuous on $Y^*$;
(2) for all $y^* \in Y^*$, $v \mapsto L(v, y^*)$ is convex.

Proof. The proof is immediate: (1) holds because $-L(v, \cdot)$ is a Legendre–Fenchel conjugate, hence convex and lower semicontinuous, and (2) follows from Proposition 9.2.3 and from the fact that the function $(v, y) \mapsto -\langle y^*, y\rangle + F(v, y)$ is convex.

As expected, we can reformulate problem (P) by using the Lagrangian function $L$ attached to the (convex) perturbation function $F$.

Proposition 9.7.3. Let $F : V \times Y \to \mathbb{R} \cup \{+\infty\}$ be a convex, lower semicontinuous, proper function and let $L : V \times Y^* \to \overline{\mathbb{R}}$ be the corresponding Lagrangian function, given by
$$-L(v, y^*) = \sup_{y \in Y}\{\langle y^*, y\rangle - F(v, y)\}.$$
(a) Primal problem: we have
$$\forall v \in V \qquad F(v, 0) = \sup_{y^* \in Y^*} L(v, y^*).$$
As a consequence, with the notation $f(v) = F(v, 0)$, the primal problem $\inf\{f(v) : v \in V\}$ can be written as
$$\inf_{v \in V}\ \sup_{y^* \in Y^*} L(v, y^*). \qquad (\mathrm{P})$$
(b) Dual problem: the dual problem, which by definition is
$$\sup_{y^* \in Y^*}\ \inf_{v \in V} L(v, y^*), \qquad (\mathrm{P}^*)$$
can be written as
$$\sup_{y^* \in Y^*} d(y^*), \qquad (\mathrm{P}^*)$$
where the dual function $d : Y^* \to \mathbb{R} \cup \{-\infty\}$ is given by
$$d(y^*) = \inf_{v \in V} L(v, y^*) = -F^*(0, y^*),$$
and $F^*$ is the Legendre–Fenchel conjugate of $F$ with respect to $(v, y)$.
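For the convex-programming perturbation function introduced earlier in this section ($F(v, y) = f_0(v)$ if $f_i(v) \le y_i$ for all $i$, and $+\infty$ otherwise), the formula of Proposition 9.7.2 can be evaluated by hand. In this illustration we take $Y = Y^* = \mathbb{R}^n$ and assume that $f_0(v)$ and the $f_i(v)$ are finite. The supremum over $y_i \ge f_i(v)$ is finite only when $y^*_i \le 0$, and it is then attained at $y_i = f_i(v)$:

```latex
-L(v, y^*) = \sup_{y_i \ge f_i(v)} \Big\{ \sum_{i=1}^n y^*_i y_i \Big\} - f_0(v)
  = \begin{cases}
      \displaystyle \sum_{i=1}^n y^*_i f_i(v) - f_0(v) & \text{if } y^* \le 0,\\[1ex]
      +\infty & \text{otherwise,}
    \end{cases}
\qquad\text{hence}\qquad
L(v, -\lambda) = f_0(v) + \sum_{i=1}^n \lambda_i f_i(v) \quad (\lambda \in \mathbb{R}^n_+).
```

Up to the change of variable $y^* = -\lambda$, this is the Lagrangian of Definition 9.6.4, while $L(v, y^*) = -\infty$ as soon as some $y^*_i > 0$, so that such $y^*$ play no role in the dual problem.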
Proof. (a) Since $F$ is closed, convex, and proper on $V \times Y$, for every $v \in V$ the function $\varphi_v : y \mapsto F(v, y)$ is closed and convex on $Y$. Hence, for all $v \in V$,
$$\varphi_v(y) = \varphi_v^{**}(y) = \sup_{y^* \in Y^*}\{\langle y^*, y\rangle - \varphi_v^*(y^*)\}.$$
By definition of $L$ we have
$$\varphi_v^*(y^*) = \sup_{y \in Y}\{\langle y^*, y\rangle - \varphi_v(y)\} = \sup_{y \in Y}\{\langle y^*, y\rangle - F(v, y)\} = -L(v, y^*).$$
Hence,
$$\varphi_v(y) = \sup_{y^* \in Y^*}\{\langle y^*, y\rangle + L(v, y^*)\}.$$
Take now $y = 0$ to obtain
$$f(v) = F(v, 0) = \varphi_v(0) = \sup_{y^* \in Y^*} L(v, y^*).$$
(b) By definition, the dual function $d : Y^* \to \mathbb{R} \cup \{-\infty\}$ is equal to
$$d(y^*) = \inf_{v \in V} L(v, y^*).$$
By definition of $L$,
$$L(v, y^*) = -\sup_{y}\{\langle y^*, y\rangle - F(v, y)\} = \inf_{y}\{-\langle y^*, y\rangle + F(v, y)\}.$$
Hence,
$$d(y^*) = \inf_{v \in V,\, y \in Y}\{-\langle y^*, y\rangle + F(v, y)\}.$$
Thus,
$$d(y^*) = -\sup_{v \in V,\, y \in Y}\{\langle 0, v\rangle + \langle y^*, y\rangle - F(v, y)\} = -F^*(0, y^*),$$
where $F^*$ is the conjugate of $F$ with respect to $(v, y)$.

The other fundamental mathematical object attached to the perturbation function $F$ is the value function.
Proposition 9.7.4 (definition of the value function). Let $F : V \times Y \to \mathbb{R} \cup \{+\infty\}$ be a convex function. The value function (also called marginal function) attached to $F$ is the function $p : Y \to \overline{\mathbb{R}}$ defined by
$$\forall y \in Y \qquad p(y) := \inf_{v \in V} F(v, y).$$
It is a convex function. Moreover, for every $y^* \in Y^*$,
$$p^*(y^*) = F^*(0, y^*).$$
Thus, the dual problem (P∗) can be formulated in terms of $p$ as follows:
$$\sup_{y^* \in Y^*}\big(-p^*(y^*)\big) = -\inf_{y^* \in Y^*} p^*(y^*). \qquad (\mathrm{P}^*)$$

Proof. The convexity of $p$ follows from the convexity of $F$ by applying Proposition 9.2.3. For $y^* \in Y^*$ we have
$$\begin{aligned}
p^*(y^*) &= \sup_{y \in Y}\{\langle y^*, y\rangle - p(y)\} = \sup_{y \in Y}\Big\{\langle y^*, y\rangle - \inf_{v \in V} F(v, y)\Big\} \\
&= \sup_{(v, y) \in V \times Y}\{\langle 0, v\rangle + \langle y^*, y\rangle - F(v, y)\} = F^*(0, y^*) = -d(y^*),
\end{aligned}$$
where $d$ is the dual function introduced in Proposition 9.7.3(b).

We now have all the ingredients for developing a general convex duality theory. The following theorem may be proved in the same way as Theorems 9.6.2 and 9.6.3. Let us notice that the qualification assumption "there exists some $v_0 \in V$ such that $F(v_0, \cdot)$ is finite and continuous at the origin" plays the role of the Slater qualification assumption in convex programming. For this reason, we call it (GS) (for generalized Slater).

Theorem 9.7.2. Let $f : V \to \mathbb{R} \cup \{+\infty\}$ be a closed, convex, and proper function such that $\inf f > -\infty$. Let $F : V \times Y \to \mathbb{R} \cup \{+\infty\}$ be a perturbation function which satisfies the following conditions:
(i) $F(v, 0) = f(v)$ for all $v \in V$;
(ii) $F$ is a closed, convex, proper function;
(iii) (generalized Slater) there exists some $v_0 \in V$ such that $y \mapsto F(v_0, y)$ is finite and continuous at the origin.
Then the following properties hold:
(a) The value function $p$ is continuous at the origin. As a consequence, the set of solutions of the dual problem (P∗), which is equal to $\partial p(0)$, is a nonempty closed convex bounded subset of $Y^*$.
(b) There is no duality gap, i.e., $\inf(\mathrm{P}) = \max(\mathrm{P}^*)$.
(c) Let $u$ be a solution of the primal problem (P). Then, for every solution $y^*$ of the dual problem (by property (a) such elements exist), $(u, y^*)$ is a saddle point of the Lagrangian function $L$ associated to $F$. Conversely, if $(u, y^*)$ is a saddle point of $L$, then $u$ is a solution of (P) and $y^*$ is a solution of (P∗).
(d) $(u, y^*)$ is a saddle point of $L$ iff it satisfies the extremality relation
$$F(u, 0) + F^*(0, y^*) = 0.$$

Proof. (a) The generalized Slater condition implies the existence of some neighborhood of $0$ in $Y$ on which the function $y \mapsto F(v_0, y)$ is bounded from above: let $r > 0$ and $M \in \mathbb{R}$ be such that
$$F(v_0, y) \le M \qquad \forall y \in B_Y(0, r).$$
As a consequence, the value function $p$ satisfies
$$p(y) = \inf_{v \in V} F(v, y) \le F(v_0, y) \le M \qquad \forall y \in B_Y(0, r).$$
The function $p$ also satisfies $p(0) = \inf(\mathrm{P}) = \inf f$, which is finite since $\inf f > -\infty$ and $p(0) \le F(v_0, 0) < +\infty$. Thus $p$ is finite at the origin and bounded from above on a neighborhood of it. By Theorem 9.3.2 and Proposition 9.5.2 we obtain that $p$ is continuous and subdifferentiable at $0$, i.e., $\partial p(0) \neq \emptyset$.

We know by Proposition 9.7.4 that the dual problem (P∗) can be expressed in terms of the value function $p$: $y^*$ is a solution of (P∗) if and only if $y^*$ is a solution of the minimization problem
$$\inf_{y^* \in Y^*} p^*(y^*).$$
Thus,
$$y^* \text{ is a solution of } (\mathrm{P}^*) \iff 0 \in \partial p^*(y^*) \iff y^* \in \partial p(0) \quad \text{(Theorem 9.5.1)}.$$
By the argument above, $\partial p(0)$ is a nonempty closed convex bounded subset of $Y^*$. Thus, the dual problem (P∗) has solutions, and the set of these solutions is a closed convex bounded subset of $Y^*$.
(b) Let $y^*$ be any solution of the dual problem (P∗). We know by (a) that such elements exist and that they are characterized by the relation $y^* \in \partial p(0)$ or, equivalently, by the Fenchel extremality relation
$$p(0) + p^*(y^*) = \langle y^*, 0\rangle = 0.$$
Hence
$$\inf(\mathrm{P}) = p(0) = -p^*(y^*) = \sup_{z^* \in Y^*}\big(-p^*(z^*)\big) = \sup(\mathrm{P}^*).$$
(c) We use the Lagrangian formulation of problems (P) and (P∗) given by Proposition 9.7.3:
$$(\mathrm{P})\ \ \inf_{v \in V}\ \sup_{y^* \in Y^*} L(v, y^*), \qquad (\mathrm{P}^*)\ \ \sup_{y^* \in Y^*}\ \inf_{v \in V} L(v, y^*).$$
By (b) there is no duality gap, so the characterization of pairs $(u, y^*)$ of optimal solutions of problems (P) and (P∗), respectively, as saddle points of the Lagrangian $L$ is a direct consequence of Proposition 9.7.1.
(d) Let $(u, y^*)$ be a saddle point of $L$. Let us reformulate the extremality relation
$$p(0) + p^*(y^*) = 0 \qquad \text{(see (b) above)}$$
in terms of the function $F$: we have
$$p(0) = \inf(\mathrm{P}) = F(u, 0), \qquad p^*(y^*) = F^*(0, y^*) = -\sup(\mathrm{P}^*) \quad \text{(Proposition 9.7.4)}.$$
Hence
$$F(u, 0) + F^*(0, y^*) = 0,$$
that is, $(0, y^*) \in \partial F(u, 0)$.
9.8 Duality in the calculus of variations: First examples

As a model example, let us consider the Dirichlet problem: given $h \in L^2(\Omega)$, find $u \in H^1_0(\Omega)$ such that
$$-\Delta u = h \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega.$$
The variational formulation of this problem has been extensively studied in Chapter 5: the solution $u$ of the Dirichlet problem is the minimizer, on the Sobolev space $H^1_0(\Omega)$, of the functional
$$f(v) = \frac12 \int_\Omega |\nabla v(x)|^2\,dx - \int_\Omega h(x) v(x)\,dx.$$
The primal problem (P) can be expressed as
$$\min_{v \in H^1_0(\Omega)}\Big\{\frac12 \int_\Omega |\nabla v(x)|^2\,dx - \int_\Omega h(x) v(x)\,dx\Big\}. \qquad (\mathrm{P})$$
We now introduce the perturbation function
$$F(v, y) = \frac12 \int_\Omega |\nabla v(x) + y(x)|^2\,dx - \int_\Omega h(x) v(x)\,dx,$$
which we consider as a function $F : H^1_0(\Omega) \times L^2(\Omega)^N \to \mathbb{R}$. To compute the Lagrangian function $L$ associated to $F$ and describe the corresponding dual problem, we first analyze the structure of this problem. The primal function $f$ can be written as
$$f(v) = \Phi(Av) + \Psi(v),$$
where $\Phi : L^2(\Omega)^N \to \mathbb{R}$ is the convex integral functional
$$\Phi(w) = \frac12 \int_\Omega |w(x)|^2\,dx$$
and $A$ is the gradient operator, which can be viewed as a linear continuous operator from $H^1_0(\Omega)$ into $L^2(\Omega)^N$. The functional $\Psi : H^1_0(\Omega) \to \mathbb{R}$ is the linear and continuous mapping
$$\Psi(v) = -\int_\Omega h(x) v(x)\,dx.$$
The perturbation function $F : H^1_0(\Omega) \times L^2(\Omega)^N \to \mathbb{R}$ can then be written as
$$F(v, y) = \Phi(Av + y) + \Psi(v).$$

Theorem 9.8.1. Let $V$ and $Y$ be two Banach spaces and suppose $\Phi \in \Gamma_0(Y)$, $\Psi \in \Gamma_0(V)$, and $A \in \mathcal{L}(V, Y)$ ($A$ is a linear continuous operator). Consider the primal problem
$$\inf_{v \in V}\{\Phi(Av) + \Psi(v)\} \qquad (\mathrm{P})$$
and the perturbation function $F : V \times Y \to \mathbb{R} \cup \{+\infty\}$ defined by $F(v, y) = \Phi(Av + y) + \Psi(v)$. Then the following facts hold.
(a) The Lagrangian function $L : V \times Y^* \to \overline{\mathbb{R}}$ associated to $F$ is given by
$$L(v, y^*) = \Psi(v) + \langle y^*, Av\rangle - \Phi^*(y^*),$$
and the dual problem (P∗) is equal to
$$\sup_{y^* \in Y^*}\{-\Psi^*(-A^* y^*) - \Phi^*(y^*)\}, \qquad (\mathrm{P}^*)$$
where $A^*$ is the adjoint operator of $A$.
(b) Let us assume that there exists some $v_0 \in V$ such that $\Psi(v_0) < +\infty$, $\Phi(Av_0) < +\infty$, and $\Phi$ is continuous at $Av_0$. Then (P∗) has at least one solution $y^*$. If $u$ is a solution of (P), one has the following extremality relations:
$$-A^* y^* \in \partial\Psi(u), \qquad y^* \in \partial\Phi(Au).$$

Proof. (a) By definition of the Lagrangian (Proposition 9.7.2),
$$\begin{aligned}
L(v, y^*) &= \inf_{y \in Y}\{-\langle y^*, y\rangle + F(v, y)\} = \inf_{y \in Y}\{-\langle y^*, y\rangle + \Phi(Av + y) + \Psi(v)\} \\
&= \Psi(v) - \sup_{y \in Y}\{\langle y^*, y\rangle - \Phi(Av + y)\} \\
&= \Psi(v) - \sup_{y \in Y}\{\langle y^*, Av + y\rangle - \Phi(Av + y) - \langle y^*, Av\rangle\} \\
&= \Psi(v) + \langle y^*, Av\rangle - \Phi^*(y^*),
\end{aligned}$$
where in the last step we used that $Av + y$ ranges over all of $Y$ when $y$ does.
The perturbation function $F$ is convex and lower semicontinuous: this is an immediate consequence of the facts that $\Phi \in \Gamma_0(Y)$, $\Psi \in \Gamma_0(V)$, and $A \in \mathcal{L}(V, Y)$. Therefore, the primal and the dual problems can be expressed in terms of the Lagrangian function $L$, and we have (Proposition 9.7.3)
$$(\mathrm{P})\ \ \inf_{v \in V}\ \sup_{y^* \in Y^*} L(v, y^*), \qquad (\mathrm{P}^*)\ \ \sup_{y^* \in Y^*}\ \inf_{v \in V} L(v, y^*).$$
Let us compute the dual function $d$:
$$\begin{aligned}
d(y^*) &= \inf_{v \in V} L(v, y^*) = \inf_{v \in V}\{\Psi(v) + \langle y^*, Av\rangle - \Phi^*(y^*)\} \\
&= -\Phi^*(y^*) - \sup_{v \in V}\{-\langle y^*, Av\rangle - \Psi(v)\} \\
&= -\Phi^*(y^*) - \sup_{v \in V}\{\langle -A^* y^*, v\rangle - \Psi(v)\} \\
&= -\Phi^*(y^*) - \Psi^*(-A^* y^*).
\end{aligned}$$
Therefore, by the analysis made in Section 9.7, part (a) is proved.
(b) The existence of $v_0 \in V$ such that $\Psi(v_0) < +\infty$, with $\Phi$ continuous at $Av_0$, clearly implies that the generalized Slater condition is satisfied: $y \mapsto F(v_0, y) = \Phi(Av_0 + y) + \Psi(v_0)$ is finite and continuous at the origin. Hence (P∗) admits at least one solution (Theorem 9.7.2). Let $y^*$ be such a solution, and let $u$ be a solution of the primal problem (P). We know that there is no duality gap. Hence $\inf(\mathrm{P}) = \sup(\mathrm{P}^*)$, i.e.,
$$\Phi(Au) + \Psi(u) = -\Phi^*(y^*) - \Psi^*(-A^* y^*).$$
Thus
$$\Phi(Au) + \Phi^*(y^*) + \Psi(u) + \Psi^*(-A^* y^*) = 0.$$
Equivalently, since $\langle A^* y^*, u\rangle = \langle y^*, Au\rangle$,
$$\big[\Phi(Au) + \Phi^*(y^*) - \langle y^*, Au\rangle\big] + \big[\Psi(u) + \Psi^*(-A^* y^*) + \langle A^* y^*, u\rangle\big] = 0.$$
Since by the Fenchel inequality the quantities $\Phi(Au) + \Phi^*(y^*) - \langle y^*, Au\rangle$ and $\Psi(u) + \Psi^*(-A^* y^*) + \langle A^* y^*, u\rangle$ are nonnegative, we obtain
$$\Phi(Au) + \Phi^*(y^*) - \langle y^*, Au\rangle = 0, \qquad \Psi(u) + \Psi^*(-A^* y^*) + \langle A^* y^*, u\rangle = 0.$$
These are the Fenchel extremality relations, which are equivalent to
$$y^* \in \partial\Phi(Au), \qquad -A^* y^* \in \partial\Psi(u).$$

Remark 9.8.1. It is often more convenient to write the dual problem as a minimization problem
$$\inf_{y^* \in Y^*}\{\Phi^*(y^*) + \Psi^*(-A^* y^*)\}.$$
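Let us come back to the Dirichlet problem and apply the above results; recall that $V = H^1_0(\Omega)$ and $Y = L^2(\Omega)^N$.

(a) $A : H^1_0(\Omega) \to L^2(\Omega)^N$ is the gradient operator. The adjoint operator $A^* : L^2(\Omega)^N \to H^{-1}(\Omega) = H^1_0(\Omega)^*$ is defined by
$$\langle A^* y, v\rangle_{(H^{-1}, H^1_0)} = \langle y, Av\rangle_{L^2(\Omega)^N} = \sum_{i=1}^N \int_\Omega y_i \frac{\partial v}{\partial x_i}\,dx = \Big\langle -\sum_{i=1}^N \frac{\partial y_i}{\partial x_i},\ v\Big\rangle_{(\mathcal{D}'(\Omega), \mathcal{D}(\Omega))},$$
where in the last equality $v$ varies in $\mathcal{D}(\Omega)$. That is, $A^* y = -\sum_{i=1}^N D_i y_i = -\operatorname{div} y$ in the distribution sense, i.e., $A^*$ is the opposite of the divergence operator.

(b) We know by Theorem 9.3.3 that the conjugate of the function
$$\Phi(y) = \frac12 \int_\Omega |y(x)|^2\,dx$$
is equal to
$$\Phi^*(y^*) = \frac12 \int_\Omega |y^*(x)|^2\,dx.$$

(c) Let $\Psi(v) = -\int_\Omega h(x) v(x)\,dx$. An easy computation gives
$$\Psi^*(v^*) = \sup_{v \in H^1_0(\Omega)}\big\{\langle v^*, v\rangle_{(H^{-1}, H^1_0)} - \langle -h, v\rangle_{L^2(\Omega)}\big\} = \sup_{v \in H^1_0(\Omega)} \langle v^* + h, v\rangle_{(H^{-1}, H^1_0)} = \begin{cases} 0 & \text{if } v^* + h = 0,\\ +\infty & \text{otherwise.}\end{cases}$$

We now collect all these results to obtain, thanks to Theorem 9.8.1, the description of the dual problem (P∗) of the Dirichlet problem. Since $-A^* y^* = \operatorname{div} y^*$, the term $\Psi^*(-A^* y^*)$ is finite (and equal to $0$) exactly when $\operatorname{div} y^* + h = 0$, so that (P∗) reads
$$\sup\Big\{-\frac12 \int_\Omega |y^*(x)|^2\,dx : y^* \in L^2(\Omega)^N,\ \operatorname{div} y^* = -h\Big\}. \qquad (\mathrm{P}^*)$$
Clearly, the generalized Slater condition is satisfied ($F$ is everywhere continuous). Thus there exists a solution $y^*$ of (P∗), and this solution is unique because of the strict convexity of the mapping $y^* \mapsto \int_\Omega |y^*(x)|^2\,dx$. On the other hand, we know that (P) admits a unique solution $u$. The extremality relations yield
$$y^* = Au = \operatorname{grad} u, \qquad \operatorname{div} y^* = -h,$$
i.e., we have $-\operatorname{div}(\operatorname{grad} u) = h$, which is in accordance with the definition of $u$.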
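The duality between (P) and (P∗) can be observed on a finite-difference analogue of the Dirichlet problem in dimension $N = 1$, with $\Omega = (0, 1)$. The sketch below illustrates the structure under stated assumptions and is not a scheme taken from the text. $H^1_0(\Omega)$ is replaced by the values at $n$ interior grid points, the gradient $A$ by forward differences on the $n + 1$ cells, and both $L^2$ scalar products are weighted by the mesh size. With these weights, the adjoint of the discrete gradient is minus the discrete divergence, as in (a) above. Solving the discrete equation $A^* A u = h$ and setting $y^* = Au$, one can check that $\operatorname{div} y^* = -h$ and that the primal and dual values coincide. The function names and the constant source are hypothetical choices.

```python
def solve_dirichlet_1d(h):
    """Finite-difference solution of -u'' = h on (0, 1), u(0) = u(1) = 0.

    h holds the source values at the n interior nodes x_i = i / (n + 1).
    The tridiagonal system (-u[i-1] + 2 u[i] - u[i+1]) / step**2 = h[i]
    is solved with the Thomas algorithm.
    """
    n = len(h)
    step = 1.0 / (n + 1)
    off = -1.0 / step**2            # sub- and super-diagonal entries
    diag = [2.0 / step**2] * n
    rhs = list(h)
    for i in range(1, n):           # forward elimination
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return u, step


def grad(u, step):
    """Discrete gradient A: forward differences on the n + 1 cells (zero boundary values)."""
    padded = [0.0] + list(u) + [0.0]
    return [(padded[j + 1] - padded[j]) / step for j in range(len(padded) - 1)]


def div(y, step):
    """Discrete divergence at the interior nodes; A* = -div for step-weighted products."""
    return [(y[i] - y[i - 1]) / step for i in range(1, len(y))]


def primal_dual_values(h):
    u, step = solve_dirichlet_1d(h)
    y_star = grad(u, step)          # extremality relation y* = Au
    primal = 0.5 * step * sum(g * g for g in y_star) - step * sum(a * b for a, b in zip(h, u))
    dual = -0.5 * step * sum(g * g for g in y_star)
    residual = max(abs(d + s) for d, s in zip(div(y_star, step), h))  # |div y* + h|
    return u, primal, dual, residual


n = 50
h = [1.0] * n                       # constant source (illustrative)
u, primal, dual, residual = primal_dual_values(h)
print(primal, dual, residual)
```

For the constant source $h = 1$ the discrete solution coincides at the nodes with $u(x) = x(1 - x)/2$, because central second differences are exact on quadratics, and the common value $-n(n+2)/(24(n+1)^2)$ tends to $-1/24 = -\tfrac12 \int_0^1 h u\,dx$ as $n \to \infty$.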
Part II
Advanced Variational Analysis
Chapter 10
Spaces BV and SBV
The modeling of a large number of problems in physics, mechanics, or image processing requires the introduction of new function spaces that allow the solution to have discontinuities. In phase transitions, image segmentation, plasticity theory, the study of cracks and fissures, the study of wakes in fluid dynamics, and so forth, the solution of the problem presents discontinuities along manifolds of codimension one. Its first distributional derivatives are then measures which may charge sets of zero Lebesgue measure, and the solution of these problems cannot be found in classical Sobolev spaces. Thus, the classical theory of Sobolev spaces must be completed by the new spaces $BV(\Omega)$ and $SBV(\Omega)$.
10.1 The space BV(Ω): Definition, convergences, and approximation

In this section $\Omega$ is an open subset of $\mathbb{R}^N$. Let us recall (see Chapter 4) that $\mathcal{M}(\Omega, \mathbb{R}^N)$ denotes the space of all $\mathbb{R}^N$-valued Borel measures, which is also, according to the Riesz theory, the dual of the space $C_0(\Omega, \mathbb{R}^N)$ of all continuous functions $\varphi$ vanishing at infinity, equipped with the uniform norm $\|\varphi\|_\infty = \big(\sum_{i=1}^N \sup_{x \in \Omega} |\varphi_i(x)|^2\big)^{1/2}$. Note that $\mathcal{M}(\Omega, \mathbb{R}^N)$ is isomorphic to the product space $\mathcal{M}(\Omega)^N$ and that
$$\mu = (\mu_1, \dots, \mu_N) \in \mathcal{M}(\Omega, \mathbb{R}^N) \iff \mu_i \in C_0(\Omega)', \quad i = 1, \dots, N.$$

Definition 10.1.1. We say that a function $u : \Omega \to \mathbb{R}$ is a function of bounded variation iff it belongs to $L^1(\Omega)$ and its gradient $Du$ in the distributional sense belongs to $\mathcal{M}(\Omega, \mathbb{R}^N)$. We denote the set of all functions of bounded variation by $BV(\Omega)$.

The following four assertions are then equivalent:
(i) $u \in BV(\Omega)$;
(ii) $u \in L^1(\Omega)$ and $\dfrac{\partial u}{\partial x_i} \in \mathcal{M}(\Omega)$ for all $i = 1, \dots, N$;
(iii) $u \in L^1(\Omega)$ and $\|Du\| := \sup\{\langle Du, \varphi\rangle : \varphi \in C_c(\Omega, \mathbb{R}^N),\ \|\varphi\|_\infty \le 1\} < +\infty$;
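(iv) $u \in L^1(\Omega)$ and $\|Du\| = \sup\big\{\int_\Omega u \operatorname{div}\varphi\,dx : \varphi \in C^1_c(\Omega, \mathbb{R}^N),\ \|\varphi\|_\infty \le 1\big\} < +\infty$,

where the bracket in (iii) is defined by
$$\langle Du, \varphi\rangle := \sum_{i=1}^N \int_\Omega \varphi_i\,\frac{\partial u}{\partial x_i}.$$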
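As a one-dimensional illustration of characterization (iv), added here as an example and not taken from the text, take $N = 1$, $\Omega = (0, 3)$ and $u = \mathbf{1}_{(1, 2)}$, the characteristic function of the interval $(1, 2)$. For $\varphi \in C^1_c(\Omega)$ with $\|\varphi\|_\infty \le 1$:

```latex
\int_\Omega u\,\varphi'\,dx = \int_1^2 \varphi'(x)\,dx = \varphi(2) - \varphi(1) \le 2,
\qquad
\langle Du, \varphi\rangle = -\int_\Omega u\,\varphi'\,dx = \varphi(1) - \varphi(2).
```

The bound $2$ is attained by any $\varphi$ with $\varphi(2) = 1$ and $\varphi(1) = -1$. Hence $\|Du\| = 2$ and $Du = \delta_1 - \delta_2$: the derivative of $u$ is a measure concentrated on the two jump points, not a function, so that $u \in BV(\Omega)$ but $u \notin W^{1,1}(\Omega)$.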
Equivalence between (ii) and (iii) is a direct consequence of the density of the space $C_c(\Omega, \mathbb{R}^N)$ in $C_0(\Omega, \mathbb{R}^N)$ equipped with the uniform norm. Equivalence between (iii) and (iv) can easily be established by the density of $C^\infty_c(\Omega, \mathbb{R}^N)$ in $C_c(\Omega, \mathbb{R}^N)$ and in $C^1_c(\Omega, \mathbb{R}^N)$.

Remark 10.1.1. According to the vectorial version of the Riesz–Alexandroff representation theorem, Theorem 2.4.6, the dual norm $\|Du\|$ is also the total mass $|Du|(\Omega) = \int_\Omega |Du|$ of the total variation $|Du|$ of the measure $Du$. Moreover, from classical integration theory, the integral $\int_\Omega f\,Du$ can be defined for all $Du$-integrable functions $f$ from $\Omega$ into $\mathbb{R}^N$, for example, for functions in $C_b(\Omega, \mathbb{R}^N)$. For the same reasons, $\int_\Omega f\,|Du|$ is well defined for all $|Du|$-integrable real-valued functions $f$, for example, for functions in $C_b(\Omega)$. According to the Radon–Nikodym theorem, Theorem 4.2.1, there exist $\nabla u \in L^1(\Omega, \mathbb{R}^N)$ and a measure $D^s u$, singular with respect to the $N$-dimensional Lebesgue measure $\mathcal{L}^N$ restricted to $\Omega$, such that
$$Du = \nabla u\,\mathcal{L}^N + D^s u.$$
Consequently, $W^{1,1}(\Omega)$ is a subspace of the vector space $BV(\Omega)$, and $u \in W^{1,1}(\Omega)$ iff $Du = \nabla u\,\mathcal{L}^N$. For functions in $W^{1,1}(\Omega)$, we will sometimes write $\nabla u$ for $Du$.

The space $BV(\Omega)$ is equipped with the following norm, which extends the classical norm of $W^{1,1}(\Omega)$:
$$\|u\|_{BV(\Omega)} := |u|_{L^1(\Omega)} + \|Du\|.$$
We will define two weak convergence processes in $BV(\Omega)$. The first is too weak to ensure continuity of the trace operator defined in Section 10.2, but sufficient to provide compactness of bounded sequences. The second is intermediate between the weak convergence and the strong convergence associated with the norm.

Definition 10.1.2. A sequence $(u_n)_{n \in \mathbb{N}}$ in $BV(\Omega)$ weakly converges to some $u$ in $BV(\Omega)$, and we write $u_n \rightharpoonup u$, iff the two following convergences hold:
$$u_n \to u \text{ strongly in } L^1(\Omega); \qquad Du_n \rightharpoonup Du \text{ weakly in } \mathcal{M}(\Omega, \mathbb{R}^N).$$
We will see later that when $\Omega$ is regular, the boundedness of a sequence in $BV(\Omega)$ is sufficient to ensure the existence of a weak cluster point (Theorem 10.1.4).
In the proposition below we establish a compactness result related to this convergence, together with the lower semicontinuity of the total mass.

Proposition 10.1.1. Let $(u_n)_{n\in\mathbb{N}}$ be a sequence in $BV(\Omega)$ strongly converging to some $u$ in $L^1(\Omega)$ and satisfying $\sup_{n\in\mathbb{N}} \int_\Omega |Du_n| < +\infty$. Then
(i) $u \in BV(\Omega)$ and $\displaystyle \int_\Omega |Du| \le \liminf_{n\to+\infty} \int_\Omega |Du_n|$;
(ii) $u_n$ weakly converges to $u$ in $BV(\Omega)$.

Proof. For all $\varphi$ in $C^1_c(\Omega,\mathbb{R}^N)$ such that $\|\varphi\|_\infty \le 1$, we have
$$\int_\Omega u\,\mathrm{div}\,\varphi\,dx = \lim_{n\to+\infty} \int_\Omega u_n\,\mathrm{div}\,\varphi\,dx \le \liminf_{n\to+\infty} \int_\Omega |Du_n|,$$
and assertion (i) is proved by taking the supremum of the left-hand side over all the elements $\varphi$ in $C^1_c(\Omega,\mathbb{R}^N)$ satisfying $\|\varphi\|_\infty \le 1$. We now establish (ii). Since $u_n$ strongly converges to $u$ in $L^1(\Omega)$, for all $\varphi \in C^\infty_c(\Omega,\mathbb{R}^N)$ one has
$$\langle Du_n, \varphi \rangle = -\int_\Omega u_n\,\mathrm{div}\,\varphi\,dx \to -\int_\Omega u\,\mathrm{div}\,\varphi\,dx = \langle Du, \varphi \rangle.$$
By using the density of $C^\infty_c(\Omega,\mathbb{R}^N)$ in $C_0(\Omega,\mathbb{R}^N)$ for the uniform norm and the boundedness of $(Du_n)_{n\in\mathbb{N}}$, we easily conclude that the sequence $(Du_n)_{n\in\mathbb{N}}$ weakly converges to $Du$.
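The inequality in (i) can be strict. A minimal numerical check, using the sequence $u_n(x) = \min(nx, 1)$ on $(0,1)$ that reappears in Remark 10.1.3: $u_n \to 1$ in $L^1(0,1)$ with $\int_0^1 |1 - u_n|\,dx = 1/(2n)$, while $\int_{(0,1)} |Du_n| = 1$ for every $n$ and the limit has total variation $0$. The grid size `m` and the helper names are illustrative choices, not part of the text.

```python
import numpy as np

def u_n(x, n):
    """u_n(x) = min(n x, 1) on (0, 1)."""
    return np.minimum(n * x, 1.0)

def l1_distance_to_one(n, m=10_000):
    """Midpoint-rule value of int_0^1 |1 - u_n(x)| dx (exact value 1/(2n))."""
    x_mid = (np.arange(m) + 0.5) / m
    return float(np.sum(np.abs(1.0 - u_n(x_mid, n))) / m)

def total_mass(n, m=10_000):
    """|Du_n|((0, 1)) computed as the variation of u_n on a uniform grid."""
    x = np.linspace(0.0, 1.0, m + 1)
    return float(np.sum(np.abs(np.diff(u_n(x, n)))))

for n in (10, 100, 1000):
    print(n, l1_distance_to_one(n), total_mass(n))
```

The $L^1$ distance decreases like $1/(2n)$ while the total mass stays at $1$: the whole mass concentrates near $x = 0$ and disappears at the limit.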
As a consequence of the semicontinuity property (i), $BV(\Omega)$ is a complete normed space.

Theorem 10.1.1. Equipped with its norm, $BV(\Omega)$ is a Banach space.

Proof. Let $(u_n)_{n\in\mathbb{N}}$ be a Cauchy sequence in $BV(\Omega)$. Then $(u_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^1(\Omega)$ and for all $\varepsilon > 0$ there exists $N_\varepsilon$ in $\mathbb{N}$ such that
$$\int_\Omega |Du_p - Du_q| < \varepsilon \quad \text{for all } p, q > N_\varepsilon. \qquad (10.1)$$
Thus, there exists $u \in L^1(\Omega)$ such that $u_n \to u$ strongly in $L^1(\Omega)$. In particular $u_p - u_q \to u - u_q$ strongly in $L^1(\Omega)$ when $p$ goes to $+\infty$. According to the lower semicontinuity property (i) of Proposition 10.1.1, (10.1) yields, for $q > N_\varepsilon$,
$$\int_\Omega |D(u - u_q)| \le \liminf_{p\to+\infty} \int_\Omega |D(u_p - u_q)| \le \varepsilon.$$
This estimate yields first $u \in BV(\Omega)$, then $\lim_{q\to+\infty} \int_\Omega |D(u - u_q)| = 0$, so that $u_n \to u$ in $BV(\Omega)$.
To define the second weak convergence process, let us recall the notion of narrow convergence defined in Section 4.2.2. As said in Remark 10.1.1, the integral $\int_\Omega f\,|Du|$ is well defined for all $f$ in the space $C_b(\Omega)$ of bounded continuous functions on $\Omega$. Thus $|Du|$ may be considered as an element of $C_b(\Omega)'$. We now say that a sequence $(|Du_n|)_{n\in\mathbb{N}}$ narrowly converges to $\mu$ in $\mathcal{M}(\Omega)$ iff $|Du_n| \rightharpoonup \mu$ for the $\sigma(C_b(\Omega)', C_b(\Omega))$ convergence (see Section 4.2.2).

Definition 10.1.3. Let $(u_n)_{n\in\mathbb{N}}$ be a sequence in $BV(\Omega)$ and $u \in BV(\Omega)$. We say that $u_n$ converges to $u$ in the sense of the intermediate convergence iff
$$u_n \to u \ \text{strongly in } L^1(\Omega), \qquad \int_\Omega |Du_n| \to \int_\Omega |Du|.$$
The term intermediate convergence is due to Temam [219]; this convergence is also called strict convergence. Let us notice that, according to Proposition 10.1.1(i), when $u_n$ strongly converges to $u$ in $L^1(\Omega)$, the total mass can only decrease at the limit, and in general part of it is lost. The proposition below states that the intermediate convergence is stronger than the weak convergence, thereby justifying the terminology.

Proposition 10.1.2. The three following assertions are equivalent:
(i) $u_n \to u$ in the sense of the intermediate convergence;
(ii) $u_n \rightharpoonup u$ weakly in $BV(\Omega)$ and $\int_\Omega |Du_n| \to \int_\Omega |Du|$;
(iii) $u_n \to u$ strongly in $L^1(\Omega)$ and $|Du_n| \rightharpoonup |Du|$ narrowly in $\mathcal{M}(\Omega)$.
Proof. (i) ⇒ (ii). This implication is a straightforward consequence of Proposition 10.1.1. We are going to prove (ii) ⇒ (iii). Let us recall (cf. Proposition 4.2.5) that, for nonnegative Borel measures $\mu_n$ and $\mu$ in $\mathcal{M}(\Omega)$, $\mu_n \rightharpoonup \mu$ narrowly if and only if
$$\mu_n(\Omega) \to \mu(\Omega) \quad\text{and}\quad \mu(U) \le \liminf_{n\to+\infty} \mu_n(U) \ \text{ for every open subset } U \text{ of } \Omega.$$
Set $\mu_n = |Du_n|$ and $\mu = |Du|$. First, $\mu_n(\Omega) = \int_\Omega |Du_n| \to \mu(\Omega) = \int_\Omega |Du|$. Let now $U$ be any open subset of $\Omega$. Obviously, $u_n$ and $u$ belong to $BV(U)$ and $u_n \to u$ strongly in $L^1(U)$. Applying Proposition 10.1.1 with $\Omega = U$ (note that $\sup_{n\in\mathbb{N}} \int_U |Du_n| \le \sup_{n\in\mathbb{N}} \int_\Omega |Du_n| < +\infty$), we obtain
$$\int_U |Du| \le \liminf_{n\to+\infty} \int_U |Du_n|.$$
Implication (iii) ⇒ (i) is straightforward.
Remark 10.1.2. It follows from Propositions 4.2.5 and 10.1.2 that if $u_n \to u$ in the sense of the intermediate convergence, then for every Borel subset $B$ of $\Omega$ such that $\int_{\partial B} |Du| = 0$, one has
$$\int_B |Du_n| \to \int_B |Du|.$$
More generally, according to Proposition 4.2.6, for every bounded Borel function $f : \Omega \to \mathbb{R}$ whose set of discontinuity points has null $|Du|$-measure, one has $\int_\Omega f\,|Du_n| \to \int_\Omega f\,|Du|$.
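The condition $\int_{\partial B} |Du| = 0$ cannot be dropped. A minimal sketch, assuming $\Omega = (0,1)$, $u = \mathbf{1}_{(1/2,1)}$, and the uniform mollification $u_\varepsilon(x) = \min\{1, \max\{0, (x - 1/2 + \varepsilon)/(2\varepsilon)\}\}$ (the convolution of $u$ with the normalized indicator of $(-\varepsilon, \varepsilon)$), which converges to $u$ in the sense of the intermediate convergence. Then $|Du_\varepsilon| = (2\varepsilon)^{-1}\mathcal{L}^1$ on $(1/2 - \varepsilon, 1/2 + \varepsilon)$, so the mass of any interval is a length ratio. The helper name `mass_on_interval` is hypothetical.

```python
def mass_on_interval(a, b, eps, jump_at=0.5):
    """|D u_eps|((a, b)) for the uniform mollification u_eps of 1_{x > jump_at}.

    |D u_eps| has density 1/(2 eps) on (jump_at - eps, jump_at + eps),
    so the mass of (a, b) is the length of the overlap divided by 2 eps.
    """
    lo = max(a, jump_at - eps)
    hi = min(b, jump_at + eps)
    return max(hi - lo, 0.0) / (2.0 * eps)


for eps in (0.1, 0.01, 0.001):
    away = mass_on_interval(0.0, 0.4, eps)    # |Du| does not charge {0, 0.4}: limit |Du|((0, 0.4)) = 0
    across = mass_on_interval(0.4, 1.0, eps)  # |Du| does not charge {0.4, 1}: limit |Du|((0.4, 1)) = 1
    half = mass_on_interval(0.0, 0.5, eps)    # |Du|({1/2}) = 1: stays 1/2, while |Du|((0, 1/2)) = 0
    print(f"eps={eps}: {away:.3f} {across:.3f} {half:.3f}")
```

For $B = (0, 1/2)$ the boundary point $1/2$ carries the whole jump, and the masses $\int_B |Du_\varepsilon| = 1/2$ do not converge to $\int_B |Du| = 0$.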
Remark 10.1.3. The intermediate convergence is strictly finer than the weak convergence in $BV(\Omega)$. Indeed, the sequence of functions in $BV(0,1)$ defined by
$$u_n(x) = \begin{cases} nx & \text{if } 0 < x \le \frac1n, \\ 1 & \text{if } x > \frac1n, \end{cases}$$
weakly converges to the function $1$ in $BV(0,1)$ and does not converge in the sense of the intermediate convergence: the total mass $|Du_n|(0,1)$ is the constant $1$, whereas $|D1|(0,1) = 0$.

The space $C^\infty(\Omega)$ is not dense in $BV(\Omega)$ when $BV(\Omega)$ is equipped with its strong convergence. Indeed, its closure is the space $W^{1,1}(\Omega)$ (see Proposition 5.4.1). Nevertheless, one can approximate every element of $BV(\Omega)$ by a function of $C^\infty(\Omega)$ in the sense of the intermediate convergence.

Theorem 10.1.2. The space $C^\infty(\Omega) \cap BV(\Omega)$ is dense in $BV(\Omega)$ equipped with the intermediate convergence. Consequently $C^\infty(\Omega)$ is also dense in $BV(\Omega)$ for the intermediate convergence.

Proof. First notice that $C^\infty(\Omega) \cap BV(\Omega) = C^\infty(\Omega) \cap W^{1,1}(\Omega)$. The second assertion is then a straightforward consequence of the density of the space $C^\infty(\Omega)$ in $W^{1,1}(\Omega)$ equipped with its strong convergence (Proposition 5.4.1), which is finer than the intermediate convergence. Let $\varepsilon > 0$, intended to go to zero, and $u \in BV(\Omega)$. We are going to construct $u_\varepsilon$ in $C^\infty(\Omega) \cap W^{1,1}(\Omega)$ such that
$$\int_\Omega |u - u_\varepsilon|\,dx < \varepsilon \quad\text{and}\quad \Big| \int_\Omega |Du_\varepsilon| - \int_\Omega |Du| \Big| < 4\varepsilon. \qquad (10.2)$$
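The following construction is similar to the proof of the Meyers–Serrin theorem, Theorem 5.1.4. Let us consider a family $(\Omega_i)_{i\in\mathbb{N}}$ of open subsets of $\Omega$ such that
$$\int_{\Omega\setminus\Omega_0} |Du| < \varepsilon; \qquad (10.3)$$
$$\Omega_i \subset\subset \Omega_{i+1}; \qquad \Omega = \bigcup_{i=0}^{\infty} \Omega_i.$$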
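We construct the open covering $(C_i)_{i\in\mathbb{N}^*}$ of $\Omega$ as follows: set $C_1 = \Omega_2$ and, for $i \ge 2$, $C_i = \Omega_{i+1} \setminus \overline{\Omega}_{i-1}$. Let now $(\varphi_i)_{i\in\mathbb{N}^*}$ be a partition of unity subordinate to the covering $(C_i)_{i\in\mathbb{N}^*}$. The functions $\varphi_i$ satisfy $\varphi_i \in C^\infty_c(C_i)$, $0 \le \varphi_i \le 1$, $\sum_{i=1}^\infty \varphi_i = 1$. Note that $\varphi_1 = 1$ on $\Omega_1$. For each $i$, choose $\varepsilon_i > 0$ such that
$$\mathrm{spt}(\rho_{\varepsilon_i} * (\varphi_i u)) \subset C_i, \qquad (10.4)$$
$$\Big| \int_\Omega |\rho_{\varepsilon_1} * (\varphi_1 Du)|\,dx - \int_\Omega |\varphi_1 Du| \Big| < \varepsilon, \qquad (10.5)$$
$$\int_\Omega |\rho_{\varepsilon_i} * (u\varphi_i) - u\varphi_i|\,dx < \varepsilon 2^{-i}, \qquad (10.6)$$
$$\int_\Omega |\rho_{\varepsilon_i} * (u D\varphi_i) - u D\varphi_i|\,dx < \varepsilon 2^{-i}, \qquad (10.7)$$
where $\rho_{\varepsilon_i}$ are the regularizers defined in Theorem 4.2.2. Estimate (10.5) is obtained by applying Theorem 4.2.2(iii) to the measure $\varphi_1 Du$. Estimates (10.6) and (10.7) are straightforward consequences of the convergence of $\rho_\varepsilon * (u\varphi_i)$ and $\rho_\varepsilon * (uD\varphi_i)$ to $u\varphi_i$ and $uD\varphi_i$, respectively, in $L^1(\Omega)$ (see Proposition 2.2.4). We define $u_\varepsilon$ by
$$u_\varepsilon = \sum_{i=1}^\infty \rho_{\varepsilon_i} * (u\varphi_i).$$
Note that each $x$ in $\Omega$ belongs to at most two of the sets $C_i$ and that, by (10.4), $u_\varepsilon$ is well defined and clearly belongs to $C^\infty(\Omega)$. Since $u = \sum_{i=1}^\infty u\varphi_i$, from (10.6) we obtain
$$\int_\Omega |u - u_\varepsilon|\,dx \le \sum_{i=1}^\infty \int_\Omega |\rho_{\varepsilon_i} * (u\varphi_i) - u\varphi_i|\,dx < \varepsilon.$$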
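We are going to establish the last estimate of (10.2). In the distributional sense, we have $D(u\varphi_i) = \varphi_i Du + u D\varphi_i\,\mathcal{L}^N$, so that
$$Du_\varepsilon = \sum_{i=1}^\infty D(\rho_{\varepsilon_i} * (u\varphi_i)) = \sum_{i=1}^\infty \rho_{\varepsilon_i} * (D(u\varphi_i)) = \sum_{i=1}^\infty \rho_{\varepsilon_i} * (\varphi_i Du) + \sum_{i=1}^\infty \rho_{\varepsilon_i} * (u D\varphi_i)$$
$$= \sum_{i=1}^\infty \rho_{\varepsilon_i} * (\varphi_i Du) + \sum_{i=1}^\infty \big( \rho_{\varepsilon_i} * (u D\varphi_i) - u D\varphi_i \big),$$
where the last equality uses $\sum_{i=1}^\infty u\,D\varphi_i = u\,D\big(\sum_{i=1}^\infty \varphi_i\big) = 0$.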
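Therefore, according to (10.7), Theorem 4.2.2(ii), and (10.3),
$$\Big| \int_\Omega |Du_\varepsilon|\,dx - \int_\Omega |\rho_{\varepsilon_1} * (\varphi_1 Du)|\,dx \Big| \le \sum_{i=2}^\infty \int_\Omega |\rho_{\varepsilon_i} * (\varphi_i Du)|\,dx + \sum_{i=1}^\infty \int_\Omega |\rho_{\varepsilon_i} * (u D\varphi_i) - u D\varphi_i|\,dx$$
$$\le \sum_{i=2}^\infty \int_\Omega |\rho_{\varepsilon_i} * (\varphi_i Du)|\,dx + \varepsilon \le \sum_{i=2}^\infty \int_\Omega |\varphi_i Du| + \varepsilon \le \int_{\Omega\setminus\Omega_0} |Du| + \varepsilon < 2\varepsilon. \qquad (10.8)$$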
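On the other hand, from (10.5), (10.3), and because $\varphi_1 = 1$ on $\Omega_1$,
$$\Big| \int_\Omega |\rho_{\varepsilon_1} * (\varphi_1 Du)|\,dx - \int_\Omega |Du| \Big| \le \varepsilon + \int_\Omega (1 - \varphi_1)\,|Du| \le \varepsilon + \int_{\Omega\setminus\Omega_0} |Du| \le 2\varepsilon. \qquad (10.9)$$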
Collecting (10.8) and (10.9), we obtain
$$\Big| \int_\Omega |Du_\varepsilon| - \int_\Omega |Du| \Big| < 4\varepsilon,$$
which completes the proof of (10.2).

Theorem 10.1.2 allows us to extend Sobolev's inequalities and the compact embedding results on $W^{1,1}(\Omega)$ (see Section 5.7) to the space $BV(\Omega)$.

Theorem 10.1.3. Let $\Omega$ be a 1-regular open bounded subset of $\mathbb{R}^N$. For all $p$, $1 \le p \le \frac{N}{N-1}$, the embedding $BV(\Omega) \hookrightarrow L^p(\Omega)$ is continuous. More precisely, there exists a constant $C$, which depends only on $\Omega$, $p$, and $N$, such that for all $u$ in $BV(\Omega)$,
$$\Big( \int_\Omega |u|^p\,dx \Big)^{1/p} \le C\,\|u\|_{BV(\Omega)}.$$
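In dimension one the exponent $N/(N-1)$ is read as $+\infty$, and for $\Omega = (0,1)$ the inequality can be checked by hand: $|u(x)| \le |u(y)| + |Du|(0,1)$ for almost all $x, y$, and averaging in $y$ gives $\|u\|_\infty \le \|u\|_{L^1(0,1)} + |Du|(0,1) = \|u\|_{BV(0,1)}$, that is, $C = 1$. The sketch below tests this one-dimensional analogue on random step functions with pieces of equal length; it is an illustration under these assumptions, not part of the proof. The function name `check_bv_linf` and the sampling choices are hypothetical.

```python
import numpy as np

def check_bv_linf(values):
    """Return (sup norm, BV norm) of the step function on (0, 1) taking
    values[k] on the k-th of len(values) equal subintervals."""
    v = np.asarray(values, dtype=float)
    sup_norm = np.max(np.abs(v))
    l1_norm = np.mean(np.abs(v))            # each piece has length 1/len(v)
    total_var = np.sum(np.abs(np.diff(v)))  # |Du|(0, 1) = sum of jump sizes
    return sup_norm, l1_norm + total_var

rng = np.random.default_rng(0)
for _ in range(1000):
    k = rng.integers(1, 50)
    sup_norm, bv_norm = check_bv_linf(rng.normal(size=k))
    assert sup_norm <= bv_norm + 1e-12      # C = 1 in the one-dimensional case
```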
Proof. Let $(u_n)_{n\in\mathbb{N}}$ be a sequence of functions in $C^\infty(\Omega) \cap BV(\Omega)$ which converges to some $u$ in $BV(\Omega)$ for the intermediate convergence. Since the embedding $W^{1,1}(\Omega) \hookrightarrow L^p(\Omega)$ is continuous for $1 \le p \le \frac{N}{N-1}$, there exists a constant $C$, which depends only on $\Omega$, $p$, and $N$, such that
$$\Big( \int_\Omega |u_n|^p\,dx \Big)^{1/p} \le C \Big( |u_n|_{L^1(\Omega)} + \int_\Omega |Du_n|\,dx \Big) < +\infty.$$
We deduce that $u_n \rightharpoonup u$ in $L^p(\Omega)$ and, according to the weak lower semicontinuity of the norm of $L^p(\Omega)$,
$$\Big( \int_\Omega |u|^p\,dx \Big)^{1/p} \le \liminf_{n\to+\infty} \Big( \int_\Omega |u_n|^p\,dx \Big)^{1/p} \le \liminf_{n\to+\infty} C \Big( |u_n|_{L^1(\Omega)} + \int_\Omega |Du_n|\,dx \Big) = C\,\|u\|_{BV(\Omega)},$$
where we have used the intermediate convergence in the last equality.

Theorem 10.1.4. Let $\Omega$ be a 1-regular open bounded subset of $\mathbb{R}^N$. Then for all $p$, $1 \le p < \frac{N}{N-1}$, the embedding $BV(\Omega) \hookrightarrow L^p(\Omega)$ is compact.

Proof. According to Theorem 10.1.3, every element of $BV(\Omega)$ belongs to $L^p(\Omega)$ for $1 \le p \le \frac{N}{N-1}$, so that we can slightly improve the density Theorem 10.1.2 as follows: for all $u \in BV(\Omega)$, there exists $u_n \in C^\infty(\Omega) \cap BV(\Omega)$ satisfying
$$u_n \to u \ \text{in } L^p(\Omega), \qquad \int_\Omega |Du_n| \to \int_\Omega |Du|.$$
We conclude thanks to the compactness of the embedding $W^{1,1}(\Omega) \hookrightarrow L^p(\Omega)$. Indeed, let $u_n$ be such that $\|u_n\|_{BV(\Omega)} \le 1$ and let $v_n \in C^\infty(\Omega) \cap BV(\Omega)$ be such that
$$|v_n - u_n|_{L^p(\Omega)} \le \frac1n, \qquad \int_\Omega |Dv_n|\,dx \le 2.$$
Since $\|v_n\|_{W^{1,1}(\Omega)} \le 4$ for $n$ large enough, there exist a subsequence $(v_{n_k})_{k\in\mathbb{N}}$ and $u$ in $L^p(\Omega)$ such that $v_{n_k} \to u$ strongly in $L^p(\Omega)$; thus $u_{n_k} \to u$ strongly in $L^p(\Omega)$. By the lower semicontinuity of the total mass, and since $u_{n_k}$ strongly converges to $u$ in $L^1(\Omega)$, we obtain
$$|u|_{L^1(\Omega)} + \int_\Omega |Du| \le \lim_{k\to+\infty} |u_{n_k}|_{L^1(\Omega)} + \liminf_{k\to+\infty} \int_\Omega |Du_{n_k}| \le \liminf_{k\to+\infty} \|u_{n_k}\|_{BV(\Omega)} \le 1,$$
and the proof is complete.
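The approximation of Theorem 10.1.2 can be observed numerically in one dimension. The sketch below is a simplified global mollification, not the partition-of-unity construction of the proof: it assumes $\Omega = (0,1)$, a jump located well inside the interval, a discrete nonnegative hat kernel standing in for $\rho_\varepsilon$, and constant extension of $u$ near the endpoints. Since convolving a nondecreasing function with a nonnegative kernel keeps it nondecreasing, the total variation stays equal to $1$ while the $L^1$ error goes to zero with $\varepsilon$; this is the intermediate convergence, and not norm convergence, which is impossible here because the limit does not belong to $W^{1,1}(0,1)$. The helper names are hypothetical.

```python
import numpy as np

def hat_kernel(radius_pts):
    """Nonnegative, symmetric discrete kernel with unit mass (stand-in for rho_eps)."""
    offsets = np.arange(-radius_pts, radius_pts + 1)
    k = (radius_pts + 1 - np.abs(offsets)).astype(float)
    return k / k.sum()

def mollify(u, radius_pts):
    """Convolve grid values with the kernel, extending u by its end values."""
    padded = np.pad(u, radius_pts, mode="edge")
    return np.convolve(padded, hat_kernel(radius_pts), mode="valid")

def strict_approximation(eps, m=4001):
    """Return (L1 error, total variation) of the mollified step 1_{x > 1/2}."""
    x = np.linspace(0.0, 1.0, m)
    h = x[1] - x[0]
    u = (x > 0.5).astype(float)
    u_eps = mollify(u, max(1, int(round(eps / h))))
    l1_error = float(np.sum(np.abs(u_eps - u)) * h)
    total_variation = float(np.sum(np.abs(np.diff(u_eps))))
    return l1_error, total_variation

for eps in (0.1, 0.05, 0.01):
    print(eps, strict_approximation(eps))
```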
10.2 The trace operator, the Green's formula, and its consequences

In all this section, $\Omega$ is a domain of $\mathbb{R}^N$ with a Lipschitz boundary $\Gamma$ (i.e., a Lipschitz domain). Under a weaker hypothesis on the regularity of the set $\Omega$, we extend to the space $BV(\Omega)$ the notion of trace developed for Sobolev functions in Section 5.6. It is worth noticing that the method used for establishing the trace theorem below also applies to Sobolev functions.

Theorem 10.2.1. There exists a linear continuous map $\gamma_0$ from $BV(\Omega)$ onto $L^1_{\mathcal{H}^{N-1}}(\Gamma)$ satisfying
(i) for all $u$ in $C(\overline{\Omega}) \cap BV(\Omega)$, $\gamma_0(u) = u$ on $\Gamma$;
(ii) the generalized Green's formula holds: for all $\varphi \in C^1(\overline{\Omega},\mathbb{R}^N)$,
$$\int_\Omega \varphi\,Du = -\int_\Omega u\,\mathrm{div}\,\varphi\,dx + \int_\Gamma \gamma_0(u)\,\varphi\cdot\nu\,d\mathcal{H}^{N-1},$$
where $\nu(x)$ is the outer unit normal at $\mathcal{H}^{N-1}$ almost all $x$ in $\Gamma$.

Proof. Each generic element $x$ in $\mathbb{R}^N$ will be denoted by $x = (\tilde{x}, x_N)$, where $\tilde{x} = (x_1, \dots, x_{N-1}) \in \mathbb{R}^{N-1}$ and $x_N \in \mathbb{R}$. Let us consider a finite cover of $\Gamma$ by the open cylinders
$$C_R(y) = S_R(\tilde{y}) \times (y_N - R, y_N + R), \qquad y = (\tilde{y}, y_N) \in \Gamma,$$
where $S_R(\tilde{y})$ is the open ball of $\mathbb{R}^{N-1}$ with radius $R$ centered at $\tilde{y} = (y_1, \dots, y_{N-1})$. Since $\Omega$ is Lipschitz regular, relabeling the coordinate axes if necessary, there exist $\varepsilon_0 > 0$ and a Lipschitz function $f$ such that $\Omega \cap C_R(y)$ contains the open set
$$C_{R,\varepsilon_0}(y) := \{x \in \mathbb{R}^N : \tilde{x} \in S_R(\tilde{y}),\ f(\tilde{x}) - \varepsilon_0 < x_N < f(\tilde{x})\},$$
and
$$\Gamma(y) := \{(\tilde{x}, x_N) : \tilde{x} \in S_R(\tilde{y}),\ x_N = f(\tilde{x})\}$$
is a neighborhood of $y$ in $\Gamma$. Let $u$ be a fixed function in $BV(\Omega)$. According to Lemma 4.2.2, since $\int_{C_{R,\varepsilon_0}(y)} |Du| < +\infty$, $R$ and $\varepsilon_0$ can be chosen so that the measure $|Du|$ does not charge $\partial C_{R,\varepsilon_0}(y) \setminus \Gamma(y)$, i.e., $\int_{\partial C_{R,\varepsilon_0}(y)\setminus\Gamma(y)} |Du| = 0$.
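First step. We fix $y$ in $\Gamma$ and, to shorten notation, we denote the sets $\Gamma(y)$, $S_R(\tilde{y})$, and $C_{R,\varepsilon_0}(y)$ by $\Gamma$, $S_R$, and $C_{R,\varepsilon_0}$, respectively. Since $\Gamma$ is Lipschitz, according to Rademacher's theorem (see [132]), the outer unit normal $\nu(x)$ exists at $\mathcal{H}^{N-1}$ a.e. $x$ on $\Gamma$. This step is devoted to the existence of $u^+$ in $L^1_{\mathcal{H}^{N-1}}(\Gamma)$ satisfying
$$\int_\Gamma |u^+|\,d\mathcal{H}^{N-1} \le C \Big( \int_{C_{R,\varepsilon_0}} |u|\,dx + \int_{C_{R,\varepsilon_0}} |Du| \Big),$$
$$\forall \varphi \in C^1_c(C_{R,\varepsilon_0} \cup \Gamma, \mathbb{R}^N), \quad \int_{C_{R,\varepsilon_0}} \varphi\,Du = -\int_{C_{R,\varepsilon_0}} u\,\mathrm{div}\,\varphi\,dx + \int_\Gamma u^+\,\varphi\cdot\nu\,d\mathcal{H}^{N-1}, \qquad (10.10)$$
where $C$ is a positive constant depending only on $f$. For every regular function $v$ defined on $C_{R,\varepsilon_0}$ and every $\varepsilon$ in $]0, \varepsilon_0[$, we adopt the notation $v^\varepsilon(\tilde{x}) := v(\tilde{x}, f(\tilde{x}) - \varepsilon)$. Consider now a sequence $(u_n)_{n\in\mathbb{N}}$ in $C^\infty(C_{R,\varepsilon_0}) \cap BV(C_{R,\varepsilon_0})$ which converges to $u$ in $BV(C_{R,\varepsilon_0})$ in the sense of the intermediate convergence (see Theorem 10.1.2). We have
$$\int_{C_{R,\varepsilon_0}} |u_n - u|\,dx \to 0, \qquad \int_{C_{R,\varepsilon_0}} |Du_n|\,dx \to \int_{C_{R,\varepsilon_0}} |Du|. \qquad (10.11)$$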
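Since the function $u_n$ is smooth, for $\varepsilon' > \varepsilon$ in $]0, \varepsilon_0[$ one has
$$u_n^{\varepsilon}(\tilde{x}) - u_n^{\varepsilon'}(\tilde{x}) = \int_{-\varepsilon'}^{-\varepsilon} \frac{\partial u_n}{\partial x_N}(\tilde{x}, f(\tilde{x}) + s)\,ds,$$
so that
$$|u_n^{\varepsilon}(\tilde{x}) - u_n^{\varepsilon'}(\tilde{x})| \le \int_{\varepsilon}^{\varepsilon'} |Du_n(\tilde{x}, f(\tilde{x}) - s)|\,ds.$$
Thus, according to Proposition 4.1.6 and Remark 4.1.5 applied to the map $S_R \subset \mathbb{R}^{N-1} \to \mathbb{R}^N$, $\tilde{x} \mapsto (\tilde{x}, f(\tilde{x}))$, we deduce
$$\int_\Gamma |u_n^{\varepsilon}(\tilde{x}) - u_n^{\varepsilon'}(\tilde{x})|\,d\mathcal{H}^{N-1}(x) \le \int_{S_R} |u_n^{\varepsilon}(\tilde{x}) - u_n^{\varepsilon'}(\tilde{x})| \sqrt{1 + |Df(\tilde{x})|^2}\,d\tilde{x} \le C \int_{S_R} \int_{\varepsilon}^{\varepsilon'} |Du_n(\tilde{x}, f(\tilde{x}) - s)|\,ds\,d\tilde{x} = C \int_{C_{R,\varepsilon,\varepsilon'}} |Du_n(x)|\,dx,$$
where $C$ is a positive constant depending only on $f$ and
$$C_{R,\varepsilon,\varepsilon'} = \{x \in \mathbb{R}^N : \tilde{x} \in S_R,\ f(\tilde{x}) - \varepsilon' < x_N < f(\tilde{x}) - \varepsilon\}.$$
Therefore, with the notation made precise above,
$$\int_\Gamma |u_n^{\varepsilon} - u_n^{\varepsilon'}|\,d\mathcal{H}^{N-1} \le C \int_{C_{R,\varepsilon,\varepsilon'}} |Du_n|\,dx. \qquad (10.12)$$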
We intend to go to the limit on $n$ in (10.12). From the coarea formula (Theorem 4.2.5), one has
$$\int_0^{\varepsilon_0} \Big( \int_\Gamma |u_n^{\varepsilon}(\tilde{x}) - u^{\varepsilon}(\tilde{x})|\,d\mathcal{H}^{N-1}(x) \Big)\,d\varepsilon \le C \int_{C_{R,\varepsilon_0}} |u_n(x) - u(x)|\,dx,$$
and the first limit in (10.11) yields the existence of a subsequence on $n$ (not relabeled) such that, for a.e. $\varepsilon$ in $]0, \varepsilon_0[$, $u_n^{\varepsilon} \to u^{\varepsilon}$ in $L^1_{\mathcal{H}^{N-1}}(\Gamma)$. On the other hand, according to Proposition 10.1.2, (10.11) ensures the narrow convergence of the measure $|Du_n|$ to the measure $|Du|$ in $\mathcal{M}^+(C_{R,\varepsilon_0})$. Let us consider $\varepsilon$, $\varepsilon'$ such that $\int_{\partial C_{R,\varepsilon,\varepsilon'}} |Du| = 0$. This choice is indeed valid in the complement $I$ of a countable subset of $]0, \varepsilon_0[$, by using Lemma 4.2.2 and because $|Du|$ does not charge $\partial C_{R,\varepsilon_0} \setminus \Gamma$. According to the properties of the narrow convergence (cf. Proposition 4.2.5), we have, for $\varepsilon$ and $\varepsilon'$ in $I$,
$$\int_{C_{R,\varepsilon,\varepsilon'}} |Du_n| \to \int_{C_{R,\varepsilon,\varepsilon'}} |Du|.$$
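Going to the limit on $n$ in (10.12), we finally obtain, for $\varepsilon$ and $\varepsilon'$ in $]0, \varepsilon_0[ \setminus \mathcal{N}$, where $\mathcal{N}$ is an $\mathcal{L}^1$-negligible set,
$$\int_\Gamma |u^{\varepsilon} - u^{\varepsilon'}|\,d\mathcal{H}^{N-1} \le C \int_{C_{R,\varepsilon,\varepsilon'}} |Du|. \qquad (10.13)$$
From now on, $\varepsilon$ denotes a sequence of numbers in $]0, \varepsilon_0[ \setminus \mathcal{N}$ going to zero, and (10.13) must be taken in the sense
$$\int_\Gamma |u^{\varepsilon_p} - u^{\varepsilon_q}|\,d\mathcal{H}^{N-1} \le C \int_{C_{R,\varepsilon_p,\varepsilon_q}} |Du|.$$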
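Since the strips $C_{R,\varepsilon_p,\varepsilon_q}$ shrink to the empty set as $p, q \to +\infty$, the right-hand side of (10.13) tends to zero; hence $(u^{\varepsilon})_\varepsilon$ is a Cauchy sequence in $L^1_{\mathcal{H}^{N-1}}(\Gamma)$ and therefore strongly converges to some function $u^+$ in $L^1(\Gamma)$. It remains to prove that $u^+$ satisfies (10.10). Letting $\varepsilon \to 0$ in (10.13) and integrating the resulting inequality with respect to $\varepsilon'$ over $]0, \varepsilon_0[$, we obtain
$$\int_\Gamma |u^+|\,d\mathcal{H}^{N-1} \le C \Big( \int_{C_{R,\varepsilon_0}} |u|\,dx + \int_{C_{R,\varepsilon_0}} |Du| \Big). \qquad (10.14)$$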
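On the other hand, since $u_n \in C^\infty(C_{R,\varepsilon_0})$ and $\varphi \in C^1_c(C_{R,\varepsilon_0} \cup \Gamma, \mathbb{R}^N)$, going to the limit on $n$ in the classical Green's formula
$$\int_{C_{R,\varepsilon_0,\varepsilon}} u_n\,\mathrm{div}\,\varphi\,dx = -\int_{C_{R,\varepsilon_0,\varepsilon}} Du_n\cdot\varphi\,dx + \int_\Gamma u_n^{\varepsilon}\,\varphi^{\varepsilon}\cdot\nu\,d\mathcal{H}^{N-1},$$
where
$$C_{R,\varepsilon_0,\varepsilon} := \{x \in \mathbb{R}^N : \tilde{x} \in S_R(\tilde{y}),\ f(\tilde{x}) - \varepsilon_0 < x_N < f(\tilde{x}) - \varepsilon\},$$
we claim that
$$\int_{C_{R,\varepsilon_0,\varepsilon}} u\,\mathrm{div}\,\varphi\,dx = -\int_{C_{R,\varepsilon_0,\varepsilon}} \varphi\,Du + \int_\Gamma u^{\varepsilon}\,\varphi^{\varepsilon}\cdot\nu\,d\mathcal{H}^{N-1}. \qquad (10.15)$$
We must justify the convergence
$$\int_{C_{R,\varepsilon_0,\varepsilon}} Du_n\cdot\varphi\,dx \to \int_{C_{R,\varepsilon_0,\varepsilon}} \varphi\,Du. \qquad (10.16)$$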
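The other two convergences are straightforward. We reason by truncation: let $\varepsilon' < \varepsilon$ and consider $\tilde{\varphi} = \varphi\theta$ in $C^1_c(C_{R,\varepsilon_0,\varepsilon'}, \mathbb{R}^N)$, where the scalar function $\theta$ belongs to $C^1_c(C_{R,\varepsilon_0,\varepsilon'})$ and satisfies $\theta = 1$ in $C_{R,\varepsilon_0,\varepsilon}$, with $\|\theta\|_\infty \le 1$. We have
$$\int_{C_{R,\varepsilon_0,\varepsilon}} Du_n\cdot\varphi\,dx = \int_{C_{R,\varepsilon_0,\varepsilon'}} Du_n\cdot\tilde{\varphi}\,dx - \int_{C_{R,\varepsilon,\varepsilon'}} Du_n\cdot\tilde{\varphi}\,dx. \qquad (10.17)$$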
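From
$$\Big| \int_{C_{R,\varepsilon,\varepsilon'}} Du_n\cdot\tilde{\varphi}\,dx \Big| \le \|\tilde{\varphi}\|_\infty \int_{C_{R,\varepsilon,\varepsilon'}} |Du_n|,$$
the narrow convergence of $|Du_n|$ to $|Du|$, and because $|Du|$ does not charge $\partial C_{R,\varepsilon,\varepsilon'}$ for $\varepsilon$ and $\varepsilon'$ in $]0, \varepsilon_0[ \setminus \mathcal{N}$, we obtain
$$\lim_{\varepsilon'\to\varepsilon} \limsup_{n\to+\infty} \Big| \int_{C_{R,\varepsilon,\varepsilon'}} Du_n\cdot\tilde{\varphi}\,dx \Big| = 0.$$
The convergence in (10.16) is obtained by letting $n \to +\infty$ and $\varepsilon' \to \varepsilon$ in (10.17), and the claim is proved. Finally, going to the limit on $\varepsilon$ in (10.15), we obtain
$$\int_{C_{R,\varepsilon_0}} u\,\mathrm{div}\,\varphi\,dx = -\int_{C_{R,\varepsilon_0}} \varphi\,Du + \int_\Gamma u^+\,\varphi\cdot\nu\,d\mathcal{H}^{N-1},$$
which is the Green's formula in (10.10).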
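Second step. According to the first step and from a straightforward argument using a partition of unity subordinate to a finite cover $(C_R(y_i))_{i=1,\dots,r}$ of $\Gamma$, there exists $\gamma_0(u)$ in $L^1_{\mathcal{H}^{N-1}}(\Gamma)$ satisfying
$$\int_\Gamma |\gamma_0(u)|\,d\mathcal{H}^{N-1} \le C \Big( \int_\Omega |u|\,dx + \int_\Omega |Du| \Big) \qquad (10.18)$$
and such that, for all $\varphi \in C^1(\overline{\Omega}, \mathbb{R}^N)$,
$$\int_\Omega \varphi\,Du = -\int_\Omega u\,\mathrm{div}\,\varphi\,dx + \int_\Gamma \gamma_0(u)\,\varphi\cdot\nu\,d\mathcal{H}^{N-1}, \qquad (10.19)$$
where $\nu(x)$ is the outer unit normal at $\mathcal{H}^{N-1}$ almost all $x$ in $\Gamma$ and $C$ is a positive constant depending only on $\Omega$. Writing $\Gamma_i := \Gamma(y_i)$ and $u_i^+$ for the function given by the first step on $\Gamma_i$, the operator $\gamma_0$ is well defined by $\gamma_0(u) = u_i^+$ in $\Gamma_i$, $i = 1, \dots, r$. Indeed, Green's formula (10.10) established in the first step yields
$$\int_{\Gamma_i \cap \Gamma_j} (u_i^+ - u_j^+)\,\varphi\cdot\nu\,d\mathcal{H}^{N-1} = 0$$
for all functions $\varphi$ in $C^1_c\big((C_{R,\varepsilon_0}(y_i) \cap C_{R,\varepsilon_0}(y_j)) \cup (\Gamma_i \cap \Gamma_j), \mathbb{R}^N\big)$, so that $u_i^+ = u_j^+$ in $\Gamma_i \cap \Gamma_j$ up to sets of $\mathcal{H}^{N-1}$ measure zero. The generalized Green's formula (10.19) yields the linearity of $\gamma_0$. The continuity is a consequence of (10.18). The identity $\gamma_0(u) = u$ on $\Gamma$ for all $u$ in $C(\overline{\Omega}) \cap BV(\Omega)$ is a straightforward consequence of the definition of $\gamma_0(u)$. The operator $\gamma_0$ then agrees with the trace operator defined in $W^{1,1}(\Omega)$. Since $\gamma_0(W^{1,1}(\Omega)) = L^1(\Gamma)$, we also have $\gamma_0(BV(\Omega)) = L^1(\Gamma)$.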
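For $N = 1$ and $\Omega = (0,1)$, the outer normal is $\nu(0) = -1$, $\nu(1) = +1$, and formula (ii) of Theorem 10.2.1 reads $\int_{(0,1)} \varphi\,Du = -\int_0^1 u\,\varphi'\,dx + \gamma_0(u)(1)\,\varphi(1) - \gamma_0(u)(0)\,\varphi(0)$. For a step function, $Du$ is a sum of Dirac masses at the jump points weighted by the jumps, and the traces are the values on the first and last pieces. The sketch below checks the identity numerically for one such function, computing $\int_0^1 u\,\varphi'\,dx$ with a midpoint rule; the breakpoints, the values, and the test function $\varphi(x) = \cos 3x + x^2$ are arbitrary illustrative choices.

```python
import numpy as np

breaks = np.array([0.0, 0.3, 0.7, 1.0])   # pieces (0, 0.3), (0.3, 0.7), (0.7, 1)
values = np.array([2.0, -1.0, 0.5])        # value of u on each piece

def phi(x):
    return np.cos(3.0 * x) + x**2

def dphi(x):
    return -3.0 * np.sin(3.0 * x) + 2.0 * x

def u(x):
    # index of the piece containing x (midpoints never hit a breakpoint here)
    idx = np.searchsorted(breaks, x, side="right") - 1
    return values[idx]

# Left-hand side: Du = sum of jumps times Dirac masses at interior breakpoints
jumps = np.diff(values)
lhs = float(np.sum(jumps * phi(breaks[1:-1])))

# Right-hand side: -int u phi' dx plus boundary terms, with nu(0) = -1 and nu(1) = +1
m = 200_000
x_mid = (np.arange(m) + 0.5) / m
integral = float(np.sum(u(x_mid) * dphi(x_mid)) / m)
rhs = -integral + values[-1] * phi(1.0) - values[0] * phi(0.0)

print(lhs, rhs)
```

The two sides agree up to the quadrature error; with the outer normal the boundary term at $x = 0$ enters with a minus sign, exactly as in (10.19).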
Remark 10.2.1. Let $\Omega$ be a Lipschitz open bounded subset of $\mathbb{R}^N$. The density theorem, Theorem 10.1.2, may be slightly improved in the sense that one may further assert that the trace of each regular approximating function of $u \in BV(\Omega)$, belonging to $C^\infty(\Omega) \cap BV(\Omega)$, coincides with the trace of $u$ on the boundary of $\Omega$. Indeed, it is easily seen, with the notation of Theorem 10.1.2, that $u_\varepsilon - u$ is the strong limit in $BV(\Omega)$ of the functions
$$u_{\varepsilon,n} - u_n := \sum_{i=1}^{n} \big( \rho_{\varepsilon_i} * (u\varphi_i) - u\varphi_i \big),$$
whose traces on $\Gamma := \partial\Omega$ are null. The result then follows from the strong continuity of the trace operator.

Remark 10.2.2. One may define the space $BV(\Omega,\mathbb{R}^m)$ as the space of all functions $u : \Omega \to \mathbb{R}^m$ in $L^1(\Omega,\mathbb{R}^m)$ whose distributional derivative $Du$ belongs to the space $\mathcal{M}(\Omega, \mathbb{M}^{m\times N})$ of $m \times N$ matrix-valued measures. Arguing as in the proof of Theorem 10.2.1 with each component of $u$, one may prove the existence of the trace operator $\gamma_0$ from $BV(\Omega,\mathbb{R}^m)$ onto $L^1_{\mathcal{H}^{N-1}}(\Gamma,\mathbb{R}^m)$ satisfying
(i) for all $u \in C(\overline{\Omega},\mathbb{R}^m) \cap BV(\Omega,\mathbb{R}^m)$, $\gamma_0(u) = u$ on $\Gamma$;
(ii) the Green's formula holds: for all $\varphi \in C^1(\overline{\Omega}, \mathbb{M}^{m\times N})$,
$$\int_\Omega \varphi : Du = -\int_\Omega u\cdot\mathrm{div}\,\varphi\,dx + \int_\Gamma \gamma_0(u)\otimes\nu : \varphi\,d\mathcal{H}^{N-1},$$
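where $\nu(x)$ is the outer unit normal at $\mathcal{H}^{N-1}$ almost all $x$ in $\Gamma$ and $\gamma_0(u)\otimes\nu$ is the $\mathbb{M}^{m\times N}$-valued function $(\gamma_0(u)_i\,\nu_j)_{i=1,\dots,m;\ j=1,\dots,N}$. The integral with respect to the measure $Du$ in the left-hand side is defined by
$$\varphi : Du := \sum_{i=1}^m \sum_{j=1}^N \varphi_{i,j}\,\frac{\partial u_i}{\partial x_j} \quad\text{and}\quad \gamma_0(u)\otimes\nu : \varphi := \sum_{i=1}^m \sum_{j=1}^N \gamma_0(u)_i\,\nu_j\,\varphi_{i,j}.$$
The divergence of $\varphi$ is the vector-valued function $(\mathrm{div}\,\varphi)_i := \sum_{j=1}^N \frac{\partial \varphi_{i,j}}{\partial x_j}$, $i = 1, \dots, m$. The density Theorem 10.1.2 and Remark 10.2.1 also hold in $BV(\Omega,\mathbb{R}^m)$.

We now give some consequences of Theorem 10.2.1.

Example 10.2.1. Consider two disjoint Lipschitz domains $\Omega_1$ and $\Omega_2$, included in an open bounded subset $\Omega$ of $\mathbb{R}^N$, such that $\overline{\Omega} = \overline{\Omega}_1 \cup \overline{\Omega}_2$, and set $\Gamma_{1,2} := \partial\Omega_1 \cap \partial\Omega_2$, which is assumed to satisfy $\mathcal{H}^{N-1}(\Gamma_{1,2}) > 0$ (see Figure 10.1). We denote the trace operators from $BV(\Omega_1)$ onto $L^1(\partial\Omega_1)$ and from $BV(\Omega_2)$ onto $L^1(\partial\Omega_2)$ by $\gamma_1$ and $\gamma_2$, respectively. Let $u_1$ and $u_2$ be two functions in $BV(\Omega_1)$ and $BV(\Omega_2)$, respectively, and define
$$u = \begin{cases} u_1 & \text{in } \Omega_1, \\ u_2 & \text{in } \Omega_2. \end{cases}$$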
Then $u$ belongs to $BV(\Omega)$ and
$$Du = Du_1\lfloor\Omega_1 + Du_2\lfloor\Omega_2 + [u]\,\nu\,\mathcal{H}^{N-1}\lfloor\Gamma_{1,2},$$
where $[u] = \gamma_1(u_1) - \gamma_2(u_2)$ and $\nu(x)$ is the unit inner normal at $x$ to $\Gamma_{1,2}$ considered as a part of the boundary of $\Omega_1$ (see Figure 10.1). In particular, if $u \in W^{1,1}(\Omega \setminus \Gamma_{1,2})$,
$$Du = \nabla u\,\mathcal{L}^N + [u]\,\nu\,\mathcal{H}^{N-1}\lfloor\Gamma_{1,2},$$
where $\nabla u$ is the gradient of $u$ in $L^1(\Omega)$.
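Figure 10.1. The set $\Omega$.

Proof. For all $\varphi \in C^1_c(\Omega,\mathbb{R}^N)$,
$$\langle Du, \varphi \rangle = -\int_\Omega u\,\mathrm{div}\,\varphi\,dx = -\int_{\Omega_1} u_1\,\mathrm{div}\,\varphi\,dx - \int_{\Omega_2} u_2\,\mathrm{div}\,\varphi\,dx.$$
Since $\varphi$ belongs to $C^1(\overline{\Omega}_1,\mathbb{R}^N) \cap C^1(\overline{\Omega}_2,\mathbb{R}^N)$, applying the generalized Green's formula in $BV(\Omega_1)$ and $BV(\Omega_2)$, we have
$$\int_{\Omega_1} u_1\,\mathrm{div}\,\varphi\,dx = -\int_{\Omega_1} \varphi\,Du_1 + \int_{\Gamma_{1,2}} \gamma_1(u_1)\,\varphi\cdot(-\nu)\,d\mathcal{H}^{N-1},$$
$$\int_{\Omega_2} u_2\,\mathrm{div}\,\varphi\,dx = -\int_{\Omega_2} \varphi\,Du_2 + \int_{\Gamma_{1,2}} \gamma_2(u_2)\,\varphi\cdot\nu\,d\mathcal{H}^{N-1}.$$
By summing these two equalities, we obtain
$$\langle Du, \varphi \rangle = \int_{\Omega_1} \varphi\,Du_1 + \int_{\Omega_2} \varphi\,Du_2 + \int_{\Gamma_{1,2}} \big( \gamma_1(u_1) - \gamma_2(u_2) \big)\,\varphi\cdot\nu\,d\mathcal{H}^{N-1}. \qquad (10.20)$$
Assume now $\|\varphi\|_\infty \le 1$. According to the continuity of $\gamma_1$ and $\gamma_2$, there exists a positive constant $C$ depending only on $\Omega_1$ and $\Omega_2$ such that
$$\Big| \int_{\Gamma_{1,2}} \big( \gamma_1(u_1) - \gamma_2(u_2) \big)\,\varphi\cdot\nu\,d\mathcal{H}^{N-1} \Big| \le C \big( \|u_1\|_{BV(\Omega_1)} + \|u_2\|_{BV(\Omega_2)} \big),$$
so that $u$ belongs to $BV(\Omega)$ and, with a possibly larger constant $C$,
$$\|Du\| := \sup\{ \langle Du, \varphi \rangle : \varphi \in C^1_c(\Omega,\mathbb{R}^N),\ \|\varphi\|_\infty \le 1 \} \le C \big( \|u_1\|_{BV(\Omega_1)} + \|u_2\|_{BV(\Omega_2)} \big).$$
Finally, (10.20) can be written
$$\langle Du, \varphi \rangle = \int_\Omega \varphi\,\big( Du_1\lfloor\Omega_1 + Du_2\lfloor\Omega_2 + (\gamma_1(u_1) - \gamma_2(u_2))\,\nu\,\mathcal{H}^{N-1}\lfloor\Gamma_{1,2} \big).$$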
This shows that $Du$ and $Du_1\lfloor\Omega_1 + Du_2\lfloor\Omega_2 + (\gamma_1(u_1) - \gamma_2(u_2))\,\nu\,\mathcal{H}^{N-1}\lfloor\Gamma_{1,2}$ coincide on $C^1_c(\Omega,\mathbb{R}^N)$, and thus are equal by density.

Example 10.2.2. Let us slightly modify the previous example by considering the function
$$v = \begin{cases} u & \text{in } \Omega, \\ 0 & \text{in } \mathbb{R}^N \setminus \Omega, \end{cases}$$
where $\Omega$ is a Lipschitz domain of $\mathbb{R}^N$ and $u \in BV(\Omega)$. We see that $v$ belongs to $BV(\mathbb{R}^N)$ and
$$Dv = Du\lfloor\Omega + u^+\,\nu\,\mathcal{H}^{N-1}\lfloor\Gamma,$$
where $\Gamma$ is the boundary of $\Omega$, $\nu$ denotes the inner unit normal vector to $\Gamma$, and $u^+$ the trace of $u$ on $\Gamma$. Thus, taking for instance $u$ equal to the constant $1$ in $\Omega$, the characteristic function $\mathbf{1}_\Omega$ of $\Omega$ belongs to $BV(\mathbb{R}^N)$ (more precisely to a subspace $SBV(\mathbb{R}^N)$ introduced in Section 10.5) and $D\mathbf{1}_\Omega = \nu\,\mathcal{H}^{N-1}\lfloor\Gamma$. This formula will be generalized in Section 10.3 when $\Omega$ possesses a reduced boundary $\partial_r\Omega$ (which in general does not coincide with the topological boundary) and a generalized unit normal $\nu$ to $\partial_r\Omega$. More precisely, for $\mathbf{1}_\Omega$ belonging to $BV(\mathbb{R}^N)$, we will obtain $D\mathbf{1}_\Omega = \nu\,\mathcal{H}^{N-1}\lfloor\partial_r\Omega$.

Example 10.2.3. Jump through a family of hypersurfaces. Let $\Omega$ be a Lipschitz domain of $\mathbb{R}^N$, $u$ a function in $BV(\Omega)$, and $(\Gamma_t)_{t\in I}$ a family of Lipschitz hypersurfaces such that $\Gamma_t \subset \Omega$ is the boundary of a Lipschitz open bounded subset $\Omega_t$ of $\Omega$, with $t < t' \Rightarrow \Omega_t \subset\subset \Omega_{t'}$. As a consequence of Example 10.2.1, for all but countably many $t$ in $I$, the jumps $[u]_{\Gamma_t}$ of $u$ through $\Gamma_t$ are null.

Proof. Indeed, from Example 10.2.1,
$$\int_{\Gamma_t} |Du| = \int_{\Gamma_t} |[u]_{\Gamma_t}|\,d\mathcal{H}^{N-1},$$
and, since the hypersurfaces $\Gamma_t$ are pairwise disjoint, we conclude by Lemma 4.2.1.

Example 10.2.4. Let $\Omega$ be a Lipschitz domain of $\mathbb{R}^N$ and $u \in BV(\Omega)$. For all $t > 0$, consider the Lipschitz domains $\Omega_t = \{x \in \Omega : d(x, \mathbb{R}^N \setminus \Omega) > t\}$ with boundary $\Gamma_t$, and denote by $u_t^+$ and $u_t^-$ the traces of $u$ when $u$ is considered as an element of $BV(\Omega_t)$ or of $BV(\Omega \setminus \overline{\Omega}_t)$, respectively. For all but countably many $t$ in $\mathbb{R}_+$, the traces $u_t^+(x)$ and $u_t^-(x)$ agree with $u(x)$ for $\mathcal{H}^{N-1}$ almost all $x$ in $\Gamma_t$.

Proof. By using arguments of the proof of Theorem 10.2.1, one may assume that $\Omega$ is a cylinder. With the notation of this theorem, for all $t$ in the complement of a countable subset of $\mathbb{R}_+$, estimate (10.13) becomes
$$\int_{\Gamma_t} |u_{t+\varepsilon} - u_t|\,d\mathcal{H}^{N-1} \le C \int_{C_{R,t+\varepsilon,t}} |Du|.$$
We deduce, for these $t$, that $u_{t+\varepsilon}$ tends to $u_t$ in $L^1(\Gamma_t)$ when $\varepsilon$ goes to $0^+$. On the other hand, according to the proof of the trace theorem, $u_{t+\varepsilon}$ tends to $u_t^+$ in $L^1(\Gamma_t)$ when $\varepsilon$ goes to $0^+$. Therefore, $u_t^+ = u$ for $\mathcal{H}^{N-1}$ almost all $x$ in $\Gamma_t$. With the same arguments, but now reasoning on $C_{R,t,t+\varepsilon}$, we obtain that $u_t^- = u$ for $\mathcal{H}^{N-1}$ almost all $x$ in $\Gamma_t$.

We are going to establish the continuity of the trace operator with respect to the intermediate convergence. Let us recall that the trace operator is continuous from $W^{1,p}(\Omega)$ into $L^p(\Gamma)$, $p \ge 1$, when these two spaces are equipped with their weak topologies. Indeed, this is a consequence of the following property (cf. [90, Proposition III.9]): if $T$ is a continuous linear operator from a Banach space $E$ into a Banach space $F$, then $T$ is continuous from $E$ equipped with the $\sigma(E, E')$ topology into $F$ equipped with the $\sigma(F, F')$ topology. We cannot apply this general principle to the trace operator defined in $BV(\Omega)$. Indeed, the weak topology in $BV(\Omega)$ is not the $\sigma(BV(\Omega), BV(\Omega)')$ topology, and we must be careful with the terminology "weak" convergence in $BV(\Omega)$. The example of Remark 10.1.3 shows that $\gamma_0$ is not continuous from $BV(\Omega)$ into $L^1(\Gamma)$ when $BV(\Omega)$ and $L^1(\Gamma)$ are equipped with their weak convergences: define $u_n \in BV(\Omega)$, $\Omega = (0,1)$, by $u_n(x) = nx$ if $x \in (0, \frac1n]$ and $u_n(x) = 1$ if $x \in [\frac1n, 1)$. It is easily seen that $u_n$ weakly converges to the constant $1$ in $BV(\Omega)$, whereas the trace of $u_n$ at the boundary point $0$ is $0$ for every $n$, while the trace of the limit is $1$. Note that in this example, $u_n$ does not converge to $u = 1$ in the sense of the intermediate convergence because there is "loss of total mass": indeed $\int_\Omega |Du_n| = 1$ but $\int_\Omega |Du| = 0$. When the total mass is preserved at the limit, the trace operator is continuous.

Theorem 10.2.2. Let $\Omega$ be a Lipschitz domain of $\mathbb{R}^N$. The trace operator $\gamma_0$ is continuous from $BV(\Omega)$ equipped with the intermediate convergence onto $L^1(\Gamma)$ equipped with the strong convergence.

Proof.
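For each $t > 0$, consider the Lipschitz domain $\Omega_t = \{x \in \Omega : d(x, \mathbb{R}^N \setminus \Omega) > t\}$ with Lipschitz boundary $\Gamma_t$, and let $(u_n)_{n\in\mathbb{N}}$ and $u$ in $BV(\Omega)$ be such that
$$\int_\Omega |u_n - u|\,dx \to 0, \qquad \int_\Omega |Du_n| \to \int_\Omega |Du|. \qquad (10.21)$$
Possibly passing to a subsequence on $n$, for almost all $t \in \mathbb{R}_+$ we have
$$\int_{\Gamma_t} |Du| = 0, \qquad u_n - u = (u_n - u)_t \ \text{ for } \mathcal{H}^{N-1}\text{ a.e. } x \text{ on } \Gamma_t, \qquad \lim_{n\to+\infty} \int_{\Gamma_t} |u_n - u|\,d\mathcal{H}^{N-1} = 0, \qquad (10.22)$$
where $(u_n - u)_t$ denotes the trace on $\Gamma_t$ of the function $u_n - u$ in $BV(\Omega_t)$. Indeed, the first two assertions are satisfied for all but countably many $t$ in $\mathbb{R}_+$, thanks to Lemma 4.2.1 and to Example 10.2.4. The last assertion is a consequence of the curvilinear Fubini theorem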
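(cf. Corollary 4.2.2):
$$\int_0^{+\infty} \Big( \int_{\Gamma_t} |u_n - u|\,d\mathcal{H}^{N-1} \Big)\,dt = \int_0^{+\infty} \Big( \int_{[d(x,\mathbb{R}^N\setminus\Omega) = t]} |u_n - u|\,d\mathcal{H}^{N-1} \Big)\,dt = \int_\Omega |u_n - u|\,dx.$$
For a fixed $t$ for which these assertions hold, let us define the function $u_{n,t}$ in $BV(\Omega)$ by
$$u_{n,t} = \begin{cases} u_n - u & \text{in } \Omega \setminus \overline{\Omega}_t, \\ 0 & \text{in } \Omega_t. \end{cases}$$
We have
$$Du_{n,t} = D(u_n - u)\lfloor(\Omega \setminus \overline{\Omega}_t) + (u_n - u)_t\,\nu_t\,\mathcal{H}^{N-1}\lfloor\Gamma_t,$$
where $\nu_t$ is the outer unit normal vector to $\Omega_t$. According to the strong continuity of the trace operator $\gamma_0$ from $BV(\Omega)$ onto $L^1(\Gamma)$, we finally deduce
$$\int_\Gamma |\gamma_0(u_n - u)|\,d\mathcal{H}^{N-1} = \int_\Gamma |\gamma_0(u_{n,t})|\,d\mathcal{H}^{N-1} \le C \Big( \int_{\Omega\setminus\Omega_t} |u_n - u|\,dx + \int_{\Omega\setminus\Omega_t} |Du_n - Du| + \int_{\Gamma_t} |u_n - u|\,d\mathcal{H}^{N-1} \Big).$$
Letting $n \to \infty$, (10.21) and (10.22) yield
$$\limsup_{n\to+\infty} \int_\Gamma |\gamma_0(u_n - u)|\,d\mathcal{H}^{N-1} \le 2C \int_{\Omega\setminus\Omega_t} |Du|.$$
We have used the narrow convergence of $|Du_n|$ to $|Du|$ and $\int_{\Gamma_t} |Du| = 0$ to assert that $\int_{\Omega\setminus\Omega_t} |Du_n|$ tends to $\int_{\Omega\setminus\Omega_t} |Du|$ (see Proposition 4.2.5). We complete the proof by letting $t$ go to zero, since $\int_{\Omega\setminus\Omega_t} |Du| \to 0$.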
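The role of the total mass near $\Gamma$ can be seen on the one-dimensional model behind this proof. For $v \in BV(0,1)$ and $0 < t \le 1$, averaging $|v(0^+)| \le |v(y)| + |Dv|((0,y))$ over $y \in (0,t)$ gives $|v(0^+)| \le \frac1t \int_0^t |v|\,dx + |Dv|((0,t))$, a one-dimensional counterpart of (10.14). For $v = u_n - 1$ with $u_n(x) = \min(nx, 1)$, the first term vanishes as $n \to \infty$ but the second stays equal to $1$: the mass $|Du_n|$ concentrates at the boundary, which is precisely what the intermediate convergence rules out. The sketch below evaluates both terms on a grid; the names and grid size are illustrative assumptions.

```python
import numpy as np

def trace_bound_terms(n, t=0.5, m=100_000):
    """Terms of |v(0+)| <= (1/t) int_0^t |v| dx + |Dv|((0, t)) for v = min(n x, 1) - 1."""
    x = np.linspace(0.0, t, m + 1)              # grid nodes of [0, t]
    v = np.minimum(n * x, 1.0) - 1.0
    x_mid = 0.5 * (x[:-1] + x[1:])
    v_mid = np.minimum(n * x_mid, 1.0) - 1.0
    average = float(np.mean(np.abs(v_mid)))     # midpoint rule for (1/t) int_0^t |v| dx
    variation = float(np.sum(np.abs(np.diff(v))))  # |Dv|((0, t))
    trace = float(abs(v[0]))                    # |v(0+)| = 1 for every n
    return trace, average, variation

for n in (10, 100, 1000):
    print(n, trace_bound_terms(n))
```

For $t = 1/2$ the average term equals $1/n$ while the variation term stays at $1$, so the trace of $u_n - 1$ never becomes small.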
10.3 The coarea formula and the structure of BV functions

It is well known that each real-valued function $u$ of bounded variation on an interval $I$ of $\mathbb{R}$ is the difference between two monotone functions and, consequently, possesses at every point $x_0 \in I$ two limits $u(x_0 - 0)$ and $u(x_0 + 0)$. One can then define its jump set $S_u := \{x \in I : u(x - 0) \ne u(x + 0)\}$. The main objective of this section is to establish that one can associate to each $u$ in $BV(\Omega)$ a set $S_u$ which generalizes to any dimension the jump set of $u$ in dimension one. In the particular case of a simple function $u = \chi_E \in BV(\Omega)$, we will see that $S_u$ is a part $\partial_M E$ of the topological boundary $\partial E$, which may differ from $\partial E$ by a set of null $\mathcal{H}^{N-1}$ measure. The structure theorem, Theorem 10.3.4, will be a straightforward consequence of this property thanks to the generalized coarea formula in Section 10.3.3. For a first reading, the reader is advised to go directly to the definitions of the approximate limit sup and approximate limit inf (Definition 10.3.4) and to Theorem 10.3.4.
10.3.1 Notion of density and regular points
In this subsection, we generalize the notions of interior and exterior of subsets of $\mathbb{R}^N$, as well as the notions of limit, continuity, and jump for measurable functions. Indeed, we intend to extend the expression $D\chi_\Omega = \nu\,\mathcal{H}^{N-1}\lfloor\Gamma$, previously obtained for Lipschitz domains thanks to the theory of traces (see Example 10.2.2), to sets $\Omega$ whose characteristic function $\chi_\Omega$ belongs to $BV(\mathbb{R}^N)$. Actually, for these sets, we will obtain $D\chi_\Omega = \nu\,\mathcal{H}^{N-1}\lfloor\partial_M\Omega$, where the generalized boundary $\partial_M\Omega$ (the measure theoretical boundary) is, up to a set of $\mathcal{H}^{N-1}$ measure zero, the set of all points that are neither in the generalized interior nor in the generalized exterior of $\Omega$. In what follows, $B_\rho(x_0)$ denotes the open ball of $\mathbb{R}^N$ with radius $\rho > 0$ centered at $x_0$.

Definition 10.3.1. Let $E$ be a Borel subset of $\mathbb{R}^N$. A point $x_0$ in $\mathbb{R}^N$ is a density point of $E$ iff
$$\lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \cap E)}{\mathcal{L}^N(B_\rho(x_0))} = 1.$$
A point $x_0$ is a rarefaction point of $E$ iff
$$\lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \cap E)}{\mathcal{L}^N(B_\rho(x_0))} = 0.$$
The sets of all density points and of all rarefaction points of $E$ are respectively called the measure theoretical interior and the measure theoretical exterior of $E$, and are denoted by $E_*$ and $E^*$.

Example 10.3.1. When $O$ is an open subset of $\mathbb{R}^N$, it is easily seen that all the points of $O$ are density points and all the points of $\mathbb{R}^N \setminus \overline{O}$ are rarefaction points. Nevertheless, if $x_0$ belongs to the boundary of $O$, various situations may occur, as shown in Figure 10.2.
Figure 10.2. The point $x_0$ is a density point of the union of the two discs but is a rarefaction point of the complement of this union.

We generalize these definitions relative to a fixed Borel subset $F$ of $\mathbb{R}^N$ as follows.

Definition 10.3.2. Let $F$ and $E$ be two Borel subsets of $\mathbb{R}^N$ and assume that $x_0$ in $\mathbb{R}^N$ is such that $\mathcal{L}^N(B_\rho(x_0) \cap F) > 0$ for all $\rho > 0$ small enough. The point $x_0$ is an $F$-density point of $E$ iff
$$\lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \cap F \cap E)}{\mathcal{L}^N(B_\rho(x_0) \cap F)} = 1.$$
The point $x_0$ is an $F$-rarefaction point of $E$ iff
$$\lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \cap F \cap E)}{\mathcal{L}^N(B_\rho(x_0) \cap F)} = 0.$$
These definitions allow us to adopt the following notion of boundary.

Definition 10.3.3. Let $E$ be a Borel subset of $\mathbb{R}^N$. The measure theoretical boundary of $E$ is the subset of $\mathbb{R}^N$, denoted by $\partial_M E$, made up of all the elements of $\mathbb{R}^N$ which are neither density points nor rarefaction points of $E$.

As a straightforward consequence of this definition, one can easily establish that the measure theoretical boundary is a part of the classical topological boundary, as stated in the following proposition. The proof is left to the reader.

Proposition 10.3.1. The measure theoretical boundary of $E$ is the subset of the topological boundary $\partial E$ defined by
$$\partial_M E = \Big\{ x \in \mathbb{R}^N : \limsup_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x) \cap E)}{\mathcal{L}^N(B_\rho(x))} > 0 \ \text{and}\ \limsup_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x) \setminus E)}{\mathcal{L}^N(B_\rho(x))} > 0 \Big\}.$$
Remark 10.3.1. The measure theoretical boundary may differ from the topological boundary by a set of nonnull $\mathcal{H}^{N-1}$ measure. Indeed, consider $E = B \setminus [(0,0),(0,1)[$, where $B$ is the unit open ball of $\mathbb{R}^2$. The measure theoretical boundary is the unit circle, whereas the (topological) boundary is the union of the circle and the segment $[(0,0),(0,1)]$.

Let $E$ be an open subset of $\mathbb{R}^N$ satisfying the following property: for every point $x_0$ of $\partial E$, there exists a normal vector $\nu(x_0)$ to $\partial E$ such that $E$ is included in the half-space $\pi_\nu(x_0) := \{x \in \mathbb{R}^N : \langle x - x_0, \nu(x_0) \rangle > 0\}$, where $\langle\cdot,\cdot\rangle$ denotes the scalar product in $\mathbb{R}^N$. Then one has $\partial E = \partial_M E$. Indeed, it is easily seen that $\chi_{\frac1\rho(E - x_0)}$ strongly converges in $L^1_{\mathrm{loc}}(\mathbb{R}^N)$ to the characteristic function of the half-space $\pi_\nu(x_0) - x_0 = \{x \in \mathbb{R}^N : \langle x, \nu(x_0) \rangle > 0\}$, so that, for $x_0$ in $\partial E$,
$$\lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \cap E)}{\mathcal{L}^N(B_\rho(x_0))} = \lim_{\rho\to0} \frac{\mathcal{L}^N\big(B_1(0) \cap \frac1\rho(E - x_0)\big)}{\mathcal{L}^N(B_1(0))} = \frac12 = \lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \setminus E)}{\mathcal{L}^N(B_\rho(x_0))}.$$
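Density ratios are easy to estimate numerically, which gives a concrete feel for Remark 10.3.1. The sketch below counts grid points in a small disc around $x_0$, assuming that a fine uniform grid is an acceptable stand-in for Lebesgue measure (a crude approximation, not an exact computation). For the slit disc $E = B \setminus [(0,0),(0,1)[$ of Remark 10.3.1, the point $x_0 = (0, 1/2)$ lies on the topological boundary but has ratio close to $1$, so it is a density point and does not belong to $\partial_M E$; a point of the circle has ratio close to $1/2$. The function names are hypothetical.

```python
import numpy as np

def density_ratio(indicator, x0, rho, n=801):
    """Grid estimate of L^2(B_rho(x0) ∩ E) / L^2(B_rho(x0)) for E = {indicator(X, Y)}."""
    t = np.linspace(-rho, rho, n)
    X, Y = np.meshgrid(x0[0] + t, x0[1] + t, indexing="ij")
    in_ball = (X - x0[0]) ** 2 + (Y - x0[1]) ** 2 < rho ** 2
    in_set = indicator(X, Y)
    return float(np.count_nonzero(in_ball & in_set) / np.count_nonzero(in_ball))

def unit_disc(X, Y):
    return X ** 2 + Y ** 2 < 1.0

def slit_disc(X, Y):
    # the slit has zero area: on a grid it removes at most one column of points
    on_slit = (X == 0.0) & (Y >= 0.0) & (Y < 1.0)
    return unit_disc(X, Y) & ~on_slit

print(density_ratio(slit_disc, (0.0, 0.5), 0.01))  # close to 1: density point of E
print(density_ratio(unit_disc, (1.0, 0.0), 0.01))  # close to 1/2: point of the circle
```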
We now look into the notion of approximate limit.

Definition 10.3.4. Let $f : \mathbb{R}^N \to \mathbb{R}$ be a measurable function and $x_0 \in \mathbb{R}^N$. A real number $\alpha$ is the approximate limit of $f$ at $x_0$ iff
$$\forall \varepsilon > 0, \quad x_0 \text{ is a density point of the set } [\,|f - \alpha| < \varepsilon\,],$$
or equivalently
$$\forall \varepsilon > 0, \quad x_0 \text{ is a rarefaction point of the set } [\,|f - \alpha| > \varepsilon\,].$$
We then write $\alpha = \operatorname{ap\,lim}_{x\to x_0} f(x)$. More generally, we define in $\overline{\mathbb{R}}$ the approximate limit sup and approximate limit inf of $f$ at $x_0$ by
$$\operatorname*{ap\,lim\,sup}_{x\to x_0} f(x) = \inf\Big\{ t \in \mathbb{R} : \lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \cap [f > t])}{\mathcal{L}^N(B_\rho(x_0))} = 0 \Big\},$$
$$\operatorname*{ap\,lim\,inf}_{x\to x_0} f(x) = \sup\Big\{ t \in \mathbb{R} : \lim_{\rho\to0} \frac{\mathcal{L}^N(B_\rho(x_0) \cap [f < t])}{\mathcal{L}^N(B_\rho(x_0))} = 0 \Big\}.$$
Let $F$ be a fixed Borel subset of $\mathbb{R}^N$. A real number $\alpha$ is called the $F$-approximate limit of $f$ at $x_0$ iff, for all $\varepsilon > 0$, $x_0$ is an $F$-density point of $[\,|f - \alpha| < \varepsilon\,]$, and we write $\alpha = \operatorname{ap\,lim}_{x\to x_0,\,x\in F} f(x)$.

Example 10.3.2. If $\alpha = \lim_{x\to x_0} f(x)$, it is straightforward to show that $\alpha = \operatorname{ap\,lim}_{x\to x_0} f(x)$. When $f$ is the characteristic function $\chi_{D_1\cup D_2}$ of the union of the two discs in Example 10.3.1, $[\,|f - 1| < \varepsilon\,] = D_1 \cup D_2$ for $0 < \varepsilon \le 1$, so that $\operatorname{ap\,lim}_{x\to x_0} f(x) = 1$ although the classical limit does not exist. When $f$ is the characteristic function of the complement of the union of these two discs, the approximate limit of $f$ at $x_0$ is zero.

It is easy to establish the uniqueness of the approximate limit when it exists. For further details related to these notions, see [156]. We only give, without proof, four elementary useful properties.

Proposition 10.3.2. For all Borel subsets $A$ and $B$ of $\mathbb{R}^N$, the four following assertions hold:
(i) Let $C = A \cup B$. If the approximate limits $\operatorname{ap\,lim}_{x\to x_0,\,x\in A} f(x)$ and $\operatorname{ap\,lim}_{x\to x_0,\,x\in B} f(x)$ exist and coincide, then $\operatorname{ap\,lim}_{x\to x_0,\,x\in C} f(x)$ exists.
(ii) If $x_0$ is not a rarefaction point of $A$, $A \subset B$, and $x_0$ is a $B$-rarefaction point of $B \setminus A$, then the existence of $\operatorname{ap\,lim}_{x\to x_0,\,x\in A} f(x)$ implies the existence of $\operatorname{ap\,lim}_{x\to x_0,\,x\in B} f(x)$.
(iii) If $A\subset B$ and $x_0$ is not a rarefaction point of $A$, then the existence of $\operatorname*{ap\,lim}_{x\to x_0,\,x\in B}f(x)$ implies the existence of $\operatorname*{ap\,lim}_{x\to x_0,\,x\in A}f(x)$.
(iv) If $x_0$ is not a rarefaction point of $A\cap B$, then the existence of $\operatorname*{ap\,lim}_{x\to x_0,\,x\in A}f(x)$ and of $\operatorname*{ap\,lim}_{x\to x_0,\,x\in B}f(x)$ implies that these two approximate limits are equal.

In the following proposition, we show that the approximate limit at $x_0$ is a classical limit for the restriction of $f$ to a suitable Borel set. Moreover, when the approximate lim inf and the approximate lim sup coincide, their common value is the approximate limit.

Proposition 10.3.3. A measurable function $f:\mathbb{R}^N\to\mathbb{R}$ possesses an approximate limit $\alpha$ at $x_0$ iff there exists a Borel subset $B$ of $\mathbb{R}^N$ with two properties: $x_0$ is a rarefaction point of $\mathbb{R}^N\setminus B$, and the restriction $f\llcorner B$ of $f$ to $B$ possesses the classical limit $\alpha$ at $x_0$. On the other hand, one always has
$$\operatorname*{ap\,lim\,inf}_{x\to x_0}f\le\operatorname*{ap\,lim\,sup}_{x\to x_0}f\quad\text{in }\overline{\mathbb{R}},$$
and $\operatorname*{ap\,lim\,inf}_{x\to x_0}f = \operatorname*{ap\,lim\,sup}_{x\to x_0}f =: \alpha\in\mathbb{R}$ if and only if $\operatorname*{ap\,lim}_{x\to x_0}f = \alpha$.

Proof. Let us prove the first assertion. It is easily seen that the given condition is sufficient. Conversely, assume that $f$ possesses an approximate limit $\alpha$ at $x_0$. Without loss of generality, one may assume $\alpha=0$. For every integer $i\ge1$, $x_0$ is then a rarefaction point of $A_i := \mathbb{R}^N\setminus\{x\in\mathbb{R}^N : |f(x)|<1/i\}$. Consider a decreasing sequence $\rho_1>\rho_2>\cdots>\rho_i>\cdots$ in $\mathbb{R}_+$ tending to $0$ such that $\mathcal{L}^N(B_\rho(x_0)\cap A_i)\le2^{-i}\mathcal{L}^N(B_\rho(x_0))$ whenever $0<\rho\le\rho_i$. Denote by $B$ the complement of $\bigcup_{i\in\mathbb{N}^*}(A_i\cap B_{\rho_i}(x_0))$. It is straightforward to show that $f\llcorner B$ has the limit zero at $x_0$: if $x\in B\cap B_{\rho_i}(x_0)$, then $x\notin A_i$, that is, $|f(x)|<1/i$. We now show that $x_0$ is a rarefaction point of $\mathbb{R}^N\setminus B = \bigcup_{i\in\mathbb{N}^*}(A_i\cap B_{\rho_i}(x_0))$. Let $\rho_{i+1}<\rho<\rho_i$. Since the sets $A_j$ increase with $j$, we obtain
$$\mathcal{L}^N(B_\rho(x_0)\cap(\mathbb{R}^N\setminus B))\le\mathcal{L}^N(B_\rho(x_0)\cap A_i)+\sum_{k=1}^{\infty}\mathcal{L}^N(B_{\rho_{i+k}}(x_0)\cap A_{i+k})\le C\rho^N2^{-(i-1)}.$$
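We conclude the proof of the first assertion by letting $\rho$ go to zero, hence $i$ go to $+\infty$, in the inequality
$$\frac{\mathcal{L}^N(B_\rho(x_0)\cap(\mathbb{R}^N\setminus B))}{\mathcal{L}^N(B_\rho(x_0))}\le C2^{-(i-1)}.$$
We now establish the second assertion. Assume that $\operatorname*{ap\,lim\,inf}_{x\to x_0}f(x) = \operatorname*{ap\,lim\,sup}_{x\to x_0}f(x) = \alpha$. Let $\varepsilon>0$, and let $t_\varepsilon$, $t'_\varepsilon$ satisfy $\alpha\le t_\varepsilon<\alpha+\varepsilon$, $\alpha-\varepsilon<t'_\varepsilon\le\alpha$ and
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f>t_\varepsilon])}{\mathcal{L}^N(B_\rho(x_0))}=0,\qquad\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f<t'_\varepsilon])}{\mathcal{L}^N(B_\rho(x_0))}=0.$$
We have $[\,|f-\alpha|>\varepsilon\,] = [f>\alpha+\varepsilon]\cup[f<\alpha-\varepsilon]$, and
$$[f>\alpha+\varepsilon]\subset[f>t_\varepsilon],\qquad[f<\alpha-\varepsilon]\subset[f<t'_\varepsilon].$$
We infer
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[\,|f-\alpha|>\varepsilon\,])}{\mathcal{L}^N(B_\rho(x_0))}=0,$$
and thus $\alpha = \operatorname*{ap\,lim}_{x\to x_0}f(x)$. Conversely, assume that $\alpha = \operatorname*{ap\,lim}_{x\to x_0}f(x)$, i.e., for all $\varepsilon>0$,
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[\,|f-\alpha|>\varepsilon\,])}{\mathcal{L}^N(B_\rho(x_0))}=0.$$
We deduce that, $\forall\varepsilon>0$,
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f>\alpha+\varepsilon])}{\mathcal{L}^N(B_\rho(x_0))}=0\quad\text{and}\quad\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f<\alpha-\varepsilon])}{\mathcal{L}^N(B_\rho(x_0))}=0.$$
Therefore
$$\operatorname*{ap\,lim\,sup}_{x\to x_0}f(x)\le\alpha+\varepsilon\quad\text{and}\quad\operatorname*{ap\,lim\,inf}_{x\to x_0}f(x)\ge\alpha-\varepsilon.$$
Letting $\varepsilon\to0$ gives $\operatorname*{ap\,lim\,sup}_{x\to x_0}f(x)\le\alpha\le\operatorname*{ap\,lim\,inf}_{x\to x_0}f(x)$. Thus $\operatorname*{ap\,lim\,inf}_{x\to x_0}f(x) = \operatorname*{ap\,lim\,sup}_{x\to x_0}f(x) = \alpha$, provided that we have established $\operatorname*{ap\,lim\,inf}_{x\to x_0}f(x)\le\operatorname*{ap\,lim\,sup}_{x\to x_0}f(x)$.

Let us show this last inequality. One may assume $\operatorname*{ap\,lim\,sup}_{x\to x_0}f(x)<+\infty$. Let $\tau\in\mathbb{R}$ satisfy
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f<\tau])}{\mathcal{L}^N(B_\rho(x_0))}=0.$$
We claim that $\tau\le\operatorname*{ap\,lim\,sup}_{x\to x_0}f(x)$. Assume, on the contrary, that $\tau>\operatorname*{ap\,lim\,sup}_{x\to x_0}f(x)$, and set $\varepsilon = \tau-\operatorname*{ap\,lim\,sup}_{x\to x_0}f(x)$. By the definition of the approximate lim sup as an infimum, there exists $t_\varepsilon<\tau$ such that
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f>t_\varepsilon])}{\mathcal{L}^N(B_\rho(x_0))}=0.$$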
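The inclusion $[f\ge\tau]\subset[f>t_\varepsilon]$ then yields
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f\ge\tau])}{\mathcal{L}^N(B_\rho(x_0))}=0.$$
This contradicts
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap[f\ge\tau])}{\mathcal{L}^N(B_\rho(x_0))}=1,$$
which follows from the choice of $\tau$.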
Remark 10.3.2. Every measurable function $f:\mathbb{R}^N\to\mathbb{R}$ possesses an approximate limit almost everywhere. For a proof, consult Morgan [184]. Let $f$ be a function in $L^1_{\mathrm{loc}}(\mathbb{R}^N)$. Then each Lebesgue point of $f$ is a point of approximate limit. This is a straightforward consequence of the Chebyshev-type inequality
$$\frac{\mathcal{L}^N(B_\rho(x_0)\cap[\,|f-f(x_0)|>\varepsilon\,])}{\mathcal{L}^N(B_\rho(x_0))}\le\frac1\varepsilon\,\frac{1}{\mathcal{L}^N(B_\rho(x_0))}\int_{B_\rho(x_0)}|f(x)-f(x_0)|\,dx.$$
Nevertheless, there exist points of approximate limit which are not Lebesgue points; consult, for instance, Morgan [184, Exercise 2.7]. In the next sections, for every function $f$ in $L^1(\Omega)$, $\Omega$ an open subset of $\mathbb{R}^N$, we adopt the following convention. We choose a representative of $f$, still denoted $f$, such that $f(x_0) = \operatorname*{ap\,lim}_{x\to x_0}f(x)$ at every point $x_0$ of approximate limit. Such a representative is said to be approximately continuous at its points of approximate limit.

We now generalize the concept of left and right limits $u(x_0-0)$ and $u(x_0+0)$ to functions defined on $\mathbb{R}^N$. We denote the unit sphere of $\mathbb{R}^N$ by $S^{N-1}$. For all $a$ in $S^{N-1}$ and all $x_0$ in $\mathbb{R}^N$, $\pi_a(x_0)$ denotes the open half-space $\pi_a(x_0) := \{x\in\mathbb{R}^N : \langle x-x_0,a\rangle>0\}$. We also denote the hemiball $\pi_a(x_0)\cap B_\rho(x_0)$ by $H_{\rho,a}(x_0)$.

Definition 10.3.5. A point $x_0$ in $\mathbb{R}^N$ is called a regular point for the measurable function $f:\mathbb{R}^N\to\mathbb{R}$ iff there exists $a\in S^{N-1}$ such that the following two approximate limits exist:
$$f_a(x_0) := \operatorname*{ap\,lim}_{x\to x_0,\,x\in\pi_a(x_0)}f(x)\quad\text{and}\quad f_{-a}(x_0) := \operatorname*{ap\,lim}_{x\to x_0,\,x\in\pi_{-a}(x_0)}f(x).$$
Example 10.3.3. Let us consider Example 10.3.1 and take for $f$ the characteristic function of the union of the two discs. The point $x_0$ is a regular point of $f$ and satisfies $1 = f_a(x_0) = f_{-a}(x_0)$, where $a$ is one of the two unit vectors orthogonal to the common tangent hyperplane to the two discs at $x_0$. One could say that $x_0$ is a point of approximate continuity for $f$; note that we also have $1 = \operatorname*{ap\,lim}_{x\to x_0}f(x)$. Consider now the characteristic function $f$ of one of the two discs. The point $x_0$ is also a regular point of $f$, but it now satisfies $f_a(x_0)\ne f_{-a}(x_0)$. The following theorem asserts that every regular point falls into one of the two cases of Example 10.3.3.
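Hemiball densities let us check the two cases of Example 10.3.3 numerically. Example 10.3.1 is not reproduced here, so the placement of the discs is an assumption made for illustration. We take $D_1$ and $D_2$ to be the unit discs centred at $(0,1)$ and $(0,-1)$, tangent at $x_0=(0,0)$, so that $a=(0,1)$ is orthogonal to the common tangent line. `hemiball_density` is a hypothetical grid-counting helper, and its estimates are approximate.

```python
def hemiball_density(indicator, x0, a, rho, n=200):
    """Grid estimate of L^2(H_{rho,a}(x0) ∩ E) / L^2(H_{rho,a}(x0)) in R^2.

    H_{rho,a}(x0) is the open half-ball {x in B_rho(x0) : <x - x0, a> > 0}.
    """
    h = 2.0 * rho / n
    total = hits = 0
    for i in range(n):
        dx = -rho + (i + 0.5) * h
        for j in range(n):
            dy = -rho + (j + 0.5) * h
            if dx * dx + dy * dy < rho * rho and dx * a[0] + dy * a[1] > 0.0:
                total += 1
                if indicator(x0[0] + dx, x0[1] + dy):
                    hits += 1
    return hits / total


# Assumed configuration: unit discs centred at (0, 1) and (0, -1), tangent at (0, 0).
def disc_1(x, y):
    return x * x + (y - 1.0) ** 2 < 1.0


def disc_2(x, y):
    return x * x + (y + 1.0) ** 2 < 1.0


def union(x, y):
    return disc_1(x, y) or disc_2(x, y)


origin, up, down = (0.0, 0.0), (0.0, 1.0), (0.0, -1.0)
print(hemiball_density(union, origin, up, 0.01), hemiball_density(union, origin, down, 0.01))
print(hemiball_density(disc_1, origin, up, 0.01), hemiball_density(disc_1, origin, down, 0.01))
```

For the union, both one-sided densities are close to 1, in agreement with $f_a(x_0)=f_{-a}(x_0)=1$. For $D_1$ alone, the densities are close to 1 and equal to 0, so $f_a(x_0)=1\ne0=f_{-a}(x_0)$. In the direction $(1,0)$, the density of $D_1$ is close to $\frac12$, so no one-sided approximate limit exists there. This is the behaviour described in Theorem 10.3.1(ii).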
Theorem 10.3.1 (structure of the set of regular points, jump points, and jump sets). Let $x_0$ be a regular point of a measurable function $f:\mathbb{R}^N\to\mathbb{R}$. We have the following alternative:
(i) If $f_a(x_0) = f_{-a}(x_0)$, then $\operatorname*{ap\,lim}_{x\to x_0}f(x)$ exists and, for all $b$ in $S^{N-1}$, $f_b(x_0) = \operatorname*{ap\,lim}_{x\to x_0}f(x)$. The point $x_0$ is called a point of approximate continuity of $f$.
(ii) If $f_a(x_0)\ne f_{-a}(x_0)$, then $\pm a$ are the only elements $b$ of $S^{N-1}$ such that $f_b(x_0)$ and $f_{-b}(x_0)$ exist. The point $x_0$ is called a jump point of $f$. The real number $|f_a(x_0)-f_{-a}(x_0)|$ is the jump of $f$ at $x_0$, and $(f_a(x_0)-f_{-a}(x_0))\,a$ is the oriented jump. The set of all jump points of $f$ is called the jump set of $f$ and is denoted by $S_f$.

Proof. Assume that $f_a(x_0) = f_{-a}(x_0)$. The existence of $\operatorname*{ap\,lim}_{x\to x_0}f(x)$ is a consequence of Proposition 10.3.2(i), with $A = \pi_a(x_0)$ and $B = \pi_{-a}(x_0)$. By Proposition 10.3.2(iii), with $B = \mathbb{R}^N$ and $A = \pi_b(x_0)$, $f_b(x_0)$ then exists for all $b\in S^{N-1}$. Proposition 10.3.2(iv) gives $f_b(x_0) = \operatorname*{ap\,lim}_{x\to x_0}f(x)$. We finally establish (ii). Let $b$ be any element of $S^{N-1}$. We show that $f_b(x_0)$ does not exist if $b\ne\pm a$. Otherwise, since $x_0$ is not a rarefaction point of the sets $\pi_a(x_0)\cap\pi_b(x_0)$ and $\pi_{-a}(x_0)\cap\pi_b(x_0)$, Proposition 10.3.2(iv) gives $f_a(x_0) = f_b(x_0)$ and $f_{-a}(x_0) = f_b(x_0)$, a contradiction.

Example 10.3.4. Let us consider the function $f$ defined by $f = \alpha$ if $|x|\le1$ and $f = \beta$ if $|x|>1$, where $\alpha\ne\beta$. The jump set is the sphere $S^{N-1}$. In Example 10.3.1, if $f = \chi_{D_1\cup D_2}$, the jump set is the (topological) boundary minus the point $x_0$. The next proposition characterizes the jump set of simple functions.

Proposition 10.3.4 (inner measure theoretic normal). Let $E$ be a Borel subset of $\mathbb{R}^N$ and $\chi$ its characteristic function. The jump set $S_\chi$ of $\chi$ is the set of all points $x_0$ of $\mathbb{R}^N$ for which there exists $a$ in $S^{N-1}$ satisfying
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(H_{\rho,a}(x_0)\cap E)}{\mathcal{L}^N(H_{\rho,a}(x_0))}=1,\qquad\lim_{\rho\to0}\frac{\mathcal{L}^N(H_{\rho,-a}(x_0)\cap E)}{\mathcal{L}^N(H_{\rho,-a}(x_0))}=0.$$
For such points, the unit vector $a$ is unique and is called the inner measure theoretic normal to $E$ at $x_0$. Moreover, $S_\chi\subset\partial_M E$.

Proof. By definition, every point satisfying the two above properties is a jump point of $\chi$, and the uniqueness of $a$ is a consequence of Theorem 10.3.1(ii). Conversely, if $x_0$ is a jump point of $\chi$, there exists a unique pair $\pm a$ in $S^{N-1}$ such that $\chi_a(x_0)\ne\chi_{-a}(x_0)$. It is easily seen that $\chi_a(x_0)$ and $\chi_{-a}(x_0)$ belong to $\{0,1\}$. Then,
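exchanging $a$ and $-a$ if necessary, we have $\chi_a(x_0)=1$ and $\chi_{-a}(x_0)=0$, which gives the two required limits. We must now prove $S_{\chi_E}\subset\partial_M E$. Let $x_0\in S_{\chi_E}$. One has
$$\frac{\mathcal{L}^N(B_\rho(x_0)\cap E)}{\mathcal{L}^N(B_\rho(x_0))} = \frac{\mathcal{L}^N(B_\rho(x_0)\cap E)}{\mathcal{L}^N(B_\rho(x_0)\cap E\cap\pi_a(x_0))}\times\frac{\mathcal{L}^N(B_\rho(x_0)\cap E\cap\pi_a(x_0))}{\mathcal{L}^N(B_\rho(x_0)\cap\pi_a(x_0))}\times\frac{\mathcal{L}^N(B_\rho(x_0)\cap\pi_a(x_0))}{\mathcal{L}^N(B_\rho(x_0))}.$$
The first factor is at least $1$ and the third one equals $\frac12$, hence
$$\frac{\mathcal{L}^N(B_\rho(x_0)\cap E)}{\mathcal{L}^N(B_\rho(x_0))}\ge\frac12\,\frac{\mathcal{L}^N(B_\rho(x_0)\cap E\cap\pi_a(x_0))}{\mathcal{L}^N(B_\rho(x_0)\cap\pi_a(x_0))}.$$
The second factor tends to $1$ when $\rho\to0$, so the inequality above yields
$$\limsup_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\cap E)}{\mathcal{L}^N(B_\rho(x_0))}\ge\frac12>0.$$
Exchanging the roles of $a$ and $-a$, and of $E$ and $\mathbb{R}^N\setminus E$, we also obtain
$$\limsup_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x_0)\setminus E)}{\mathcal{L}^N(B_\rho(x_0))}\ge\frac12>0.$$
Thus, by Proposition 10.3.1, $x_0\in\partial_M E$.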
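Proposition 10.3.4 suggests a brute-force way to locate the inner measure theoretic normal. Scan unit vectors $a$, and keep those for which the densities of $E$ in $H_{\rho,a}(x_0)$ and $H_{\rho,-a}(x_0)$ look like $1$ and $0$. The sketch below does this for the unit disc at two boundary points. Several choices are assumptions of the illustration rather than part of the text: the fixed radius, the 15-degree angular step, the tolerance, the grid estimate, and the helper names.

```python
import math


def in_unit_disc(x, y):
    return x * x + y * y < 1.0


def hemiball_density(indicator, x0, a, rho, n=100):
    """Grid estimate of L^2(H_{rho,a}(x0) ∩ E) / L^2(H_{rho,a}(x0)) in R^2."""
    h = 2.0 * rho / n
    total = hits = 0
    for i in range(n):
        dx = -rho + (i + 0.5) * h
        for j in range(n):
            dy = -rho + (j + 0.5) * h
            if dx * dx + dy * dy < rho * rho and dx * a[0] + dy * a[1] > 0.0:
                total += 1
                if indicator(x0[0] + dx, x0[1] + dy):
                    hits += 1
    return hits / total


def inner_normal_candidates(indicator, x0, rho=0.01, steps=24, tol=0.05):
    """Angles (in degrees) of the directions a whose hemiball densities look like (1, 0)."""
    found = []
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        a = (math.cos(theta), math.sin(theta))
        front = hemiball_density(indicator, x0, a, rho)
        back = hemiball_density(indicator, x0, (-a[0], -a[1]), rho)
        if front > 1.0 - tol and back < tol:
            found.append(round(math.degrees(theta)))
    return found
```

At $(1,0)$, the only surviving direction should be $180^\circ$, i.e. $a=(-1,0)$, the inner normal. At $(0,1)$ it should be $270^\circ$. Neighbouring directions fail because the rotated hemiball only partly overlaps the disc (about $165/180$ of it for a $15^\circ$ tilt). This mirrors the uniqueness statement in the proposition.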
10.3.2
Sets of finite perimeter, structure of simple BV functions
To clarify the structure of BV functions, we now establish the following fact. Up to a set of $\mathcal{H}^{N-1}$ measure zero, the jump set of every simple function $\chi_E$ belonging to $BV(\mathbb{R}^N)$ is the measure theoretical boundary of $E$.

Definition 10.3.6. A Borel subset $E$ of $\mathbb{R}^N$ is called a set of finite perimeter in $\Omega$ iff its characteristic function $\chi_E$ belongs to $BV(\Omega)$. The total mass $|D\chi_E|(\Omega)$ is called the perimeter of $E$ in $\Omega$. It is denoted by $P(E,\Omega)$, or by $P(E)$ when $\Omega = \mathbb{R}^N$. A Borel subset $E$ of $\mathbb{R}^N$ is called a set of locally finite perimeter if it is a set of finite perimeter in $U$ for every bounded open subset $U$ of $\mathbb{R}^N$.

Remark 10.3.3. When $E$ is a Lipschitz open bounded subset of $\Omega$, the trace Theorem 10.2.1 and Example 10.2.2 give $P(E,\Omega) = \mathcal{H}^{N-1}(\Omega\cap\partial E)$.

Theorem 10.3.2 (structure of simple functions of $BV(\Omega)$). Let $E$ be a set of finite perimeter in $\Omega$. Then:
(i) up to a Borel subset of $\mathcal{H}^{N-1}$ measure zero, $\partial_M E\cap\Omega$ is the jump set of $\chi_E$;
(ii) the set $\partial_M E\cap\Omega$ is countably $(N-1)$-rectifiable, i.e., $\partial_M E\cap\Omega\subset\bigcup_{i\in\mathbb{N}}A_i$, where $\mathcal{H}^{N-1}(A_0)=0$ and, for each $i\ge1$, there exists a Lipschitz function $f_i:\mathbb{R}^{N-1}\to\mathbb{R}^N$ such that $A_i = f_i(\mathbb{R}^{N-1})$;
(iii) the following generalized Gauss–Green formula holds. For $\mathcal{H}^{N-1}$-almost all $x$ in $\partial_M E\cap\Omega$, there exists $\nu(x)\in S^{N-1}$, called the generalized inner normal vector to $E$ at $x$, such that for all $\varphi$ in $C^1_c(\Omega,\mathbb{R}^N)$,
$$\int_\Omega\chi_E\,\operatorname{div}\varphi\,dx = \int_{\partial_M E\cap\Omega}\varphi\cdot(-\nu)\,d\mathcal{H}^{N-1},$$
that is, $D\chi_E = \nu\,\mathcal{H}^{N-1}\llcorner(\partial_M E\cap\Omega)$.

Proof. The proof is divided into the following five steps.
Step 1. We define the generalized inner normal vector $\nu(x)$ to $E$ at $x$ and the reduced boundary $\partial_r E$ of $E$.
Step 2. We establish $\partial_r E\subset S_{\chi_E}\cap\Omega\subset\partial_M E\cap\Omega$.
Step 3. We prove that $\partial_M E\cap\Omega$, $S_{\chi_E}\cap\Omega$, and $\partial_r E$ are essentially the same sets; more precisely, $\mathcal{H}^{N-1}((\partial_M E\cap\Omega)\setminus\partial_r E)=0$.
Step 4. We establish that $S_{\chi_E}\cap\Omega$ is countably $(N-1)$-rectifiable.
Step 5. We prove the generalized Gauss–Green formula.

Step 1. The notions of boundary $\partial_M E$ and $S_{\chi_E}$ apply to any set. The definition below, by contrast, applies only to subsets $E$ such that $D\chi_E$ belongs to $\mathcal{M}(\Omega,\mathbb{R}^N)$.

Definition 10.3.7. Let $E$ be a set of finite perimeter in $\Omega$. The generalized unit inner normal to $E$ is the Radon–Nikodym derivative of $D\chi_E$ with respect to the measure $|D\chi_E|$. In other words, for $|D\chi_E|$-a.e. $x$ in $\Omega$,
$$\nu(x) = \lim_{\rho\to0}\frac{D\chi_E(B_\rho(x))}{|D\chi_E|(B_\rho(x))}.$$
The reduced boundary $\partial_r E$ consists of all points $x$ in $\Omega$ at which the limit above exists.

Let us remark on the case of a Lipschitz domain $E$ of $\mathbb{R}^N$ with boundary $\Gamma$. By the trace theory, $D\chi_E = \nu\,\mathcal{H}^{N-1}\llcorner\Gamma$, where $\nu(x)$ is the inner normal to $E$ at $\mathcal{H}^{N-1}$-a.e. $x$ in $\Gamma$. Consequently, $\frac{D\chi_E}{|D\chi_E|}(x) = \nu(x)$ for $\mathcal{H}^{N-1}$-a.e. $x$ in $\Gamma$, and $\mathcal{H}^{N-1}(\Gamma\setminus\partial_r E)=0$.

Step 2. We establish that $\partial_r E\subset S_{\chi_E}\cap\Omega$. The inclusion $S_{\chi_E}\cap\Omega\subset\partial_M E\cap\Omega$ has already been proved in Proposition 10.3.4. The key to the proof is the following blow-up lemma.

Lemma 10.3.1. Let $x_0\in\partial_r E$, and let $\nu(x_0)$ be the generalized unit inner normal to $E$ at $x_0$. Let $E_\rho$ be the blown-up set $\{x\in\mathbb{R}^N : x_0+\rho(x-x_0)\in E\}$, and let $\pi_\nu(x_0)$ be the half-space $\{x\in\mathbb{R}^N : \langle x-x_0,\nu(x_0)\rangle>0\}$. Then $\chi_{E_\rho}$ weakly converges to $\chi_{\pi_\nu(x_0)}$ in $BV(B_1(x_0))$ as $\rho\to0$.

Sketch of the proof of Lemma 10.3.1. We admit the following three estimates. For each $x\in\partial_r E$ there exists a positive constant $C$ such that, for all sufficiently small $r>0$,
$$C\le r^{1-N}|D\chi_E|(B_r(x))\le C^{-1},\tag{10.23}$$
$$C\le r^{-N}\mathcal{L}^N(B_r(x)\cap E),\tag{10.24}$$
$$C\le r^{-N}\mathcal{L}^N(B_r(x)\setminus E).\tag{10.25}$$
For a proof, we refer the reader to [229, Lemma 5.5.4].
Without loss of generality, one may assume $x_0=0$ and $\nu(0) = (0,\dots,0,1)$. Moreover, it is enough to establish the following: for each sequence $(\rho_h)_{h\in\mathbb{N}}$ tending to $0$, there exists a subsequence (not relabeled) satisfying $\chi_{E_{\rho_h}}\rightharpoonup\chi_{\pi_\nu(0)}$ in $BV(B_1(0))$. Let us fix $r>0$. We reason with a smooth approximating sequence in the sense of the intermediate convergence (Theorem 10.1.2) and change scale. A straightforward calculation gives
$$D\chi_{E_{\rho_h}}(B_r(0)) = \rho_h^{1-N}D\chi_E(B_{\rho_hr}(0)),\tag{10.26}$$
$$|D\chi_{E_{\rho_h}}|(B_r(0)) = \rho_h^{1-N}|D\chi_E|(B_{\rho_hr}(0)).\tag{10.27}$$
Collecting (10.23) and (10.27), we obtain, for $\rho_h$ small enough,
$$Cr^{N-1}\le|D\chi_{E_{\rho_h}}|(B_r(0))\le C^{-1}r^{N-1}.\tag{10.28}$$
We apply Theorem 10.1.4 in $BV(B_1(0))$, using (10.28) with $r=1$. There exist a subsequence (not relabeled) and a subset $F\subset\mathbb{R}^N$ of finite perimeter in $B_1(0)$ such that
$$\chi_{E_{\rho_h}}\to\chi_F\ \text{ strongly in } L^1(B_1(0)),\tag{10.29}$$
$$D\chi_{E_{\rho_h}}\rightharpoonup D\chi_F\ \text{ weakly in } \mathcal{M}(B_1(0),\mathbb{R}^N).\tag{10.30}$$
From now on we reason in $BV(B_1(0))$. It remains to establish that $F = \pi_\nu(0)$. By Theorem 4.2.1 and Lemma 4.2.1, for all but countably many $0<\rho<1$,
$$D\chi_{E_{\rho_h}}(B_\rho(0))\to D\chi_F(B_\rho(0)).\tag{10.31}$$
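Thus, from (10.27), (10.26), the definition of $\nu(0) = (0,\dots,0,1)$, and (10.31), we deduce
$$\begin{aligned}\lim_{h\to+\infty}|D\chi_{E_{\rho_h}}|(B_\rho(0)) &= \lim_{h\to+\infty}\frac{|D\chi_{E_{\rho_h}}|(B_\rho(0))}{\frac{\partial}{\partial x_N}\chi_{E_{\rho_h}}(B_\rho(0))}\,\lim_{h\to+\infty}\frac{\partial}{\partial x_N}\chi_{E_{\rho_h}}(B_\rho(0))\\ &= \lim_{h\to+\infty}\frac{|D\chi_E|(B_{\rho_h\rho}(0))}{\frac{\partial}{\partial x_N}\chi_E(B_{\rho_h\rho}(0))}\,\lim_{h\to+\infty}\frac{\partial}{\partial x_N}\chi_{E_{\rho_h}}(B_\rho(0))\\ &= \lim_{h\to+\infty}\frac{\partial}{\partial x_N}\chi_{E_{\rho_h}}(B_\rho(0)) = \frac{\partial}{\partial x_N}\chi_F(B_\rho(0)).\end{aligned}\tag{10.32}$$
The lower semicontinuity of the total variation (Proposition 10.1.1), together with (10.30) and (10.32), finally yields
$$|D\chi_F|(B_\rho(0))\le\frac{\partial}{\partial x_N}\chi_F(B_\rho(0)),$$
hence equality. Let us consider the density $\nu_N$ of $\frac{\partial}{\partial x_N}\chi_F$ with respect to $|D\chi_F|$ (cf. the Radon–Nikodym theorem, Theorem 4.2.1). From the above, one has
$$|D\chi_F|(B_\rho(0)) = \frac{\partial}{\partial x_N}\chi_F(B_\rho(0)) = \int_{B_\rho(0)}\nu_N(x)\,d|D\chi_F|(x).$$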
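This shows that $\nu_N(x)=1$ for $|D\chi_F|$-a.e. $x\in B_1(0)$. Hence $\frac{\partial}{\partial x_N}\chi_F = |D\chi_F|$ is a nonnegative Radon measure. Note also that (10.32) and (10.28) give $\frac{\partial}{\partial x_N}\chi_F(B_\rho(0))\ge C\rho^{N-1}$, so that $\frac{\partial}{\partial x_N}\chi_F\not\equiv0$. On the other hand, we use (10.31), (10.26), (10.27), and the definition of $\nu(0)$. For $i=1,\dots,N-1$ and for all but countably many $0<\rho<1$, we have
$$\frac{\frac{\partial}{\partial x_i}\chi_F(B_\rho(0))}{|D\chi_F|(B_\rho(0))} = \lim_{h\to+\infty}\frac{\frac{\partial}{\partial x_i}\chi_{E_{\rho_h}}(B_\rho(0))}{|D\chi_{E_{\rho_h}}|(B_\rho(0))} = \lim_{h\to+\infty}\frac{\frac{\partial}{\partial x_i}\chi_E(B_{\rho_h\rho}(0))}{|D\chi_E|(B_{\rho_h\rho}(0))} = 0.\tag{10.33}$$
We deduce from (10.33) that $\frac{\partial}{\partial x_i}\chi_F(B_\rho(0))=0$ for $i=1,\dots,N-1$. Let us consider the density $\nu_i$ of $\frac{\partial}{\partial x_i}\chi_F$ with respect to $|D\chi_F|$. We have
$$0 = \frac{\partial}{\partial x_i}\chi_F(B_\rho(0)) = \int_{B_\rho(0)}\nu_i(x)\,d|D\chi_F|(x).$$
Hence $\nu_i(x)=0$ for $|D\chi_F|$-a.e. $x\in B_1(0)$, and so $\frac{\partial}{\partial x_i}\chi_F=0$. The function $\chi_F$ thus depends only on the variable $x_N$ and is nondecreasing. By (10.29), it takes only the two values $0$ and $1$. Now set $\alpha = \sup\{x_N : \chi_F(x_N)=0\}$. Note that $\alpha\ne\pm\infty$; otherwise $D\chi_F\equiv0$. To prove $F = \pi_\nu(0)$, it suffices to establish $\alpha=0$. Assuming $\alpha>0$ gives $B_\rho(0)\subset\mathbb{R}^N\setminus F$ for $\rho<\alpha$. By (10.29), this yields
$$0 = \mathcal{L}^N(B_\rho(0)\cap F) = \lim_{h\to+\infty}\mathcal{L}^N(B_\rho(0)\cap E_{\rho_h}) = \lim_{h\to+\infty}\rho_h^{-N}\mathcal{L}^N(B_{\rho\rho_h}(0)\cap E),$$
which contradicts (10.24). If $\alpha<0$, then $B_\rho(0)\subset F$ for $\rho<-\alpha$, and a similar argument gives
$$0 = \lim_{h\to+\infty}\rho_h^{-N}\mathcal{L}^N(B_{\rho\rho_h}(0)\setminus E),$$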
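which contradicts (10.25). The proof of Lemma 10.3.1 is complete.

We now establish $\partial_r E\subset S_{\chi_E}\cap\Omega$. Let $x_0\in\partial_r E$. We have
$$\frac{\mathcal{L}^N(E\cap H_{\rho,-\nu}(x_0))}{\mathcal{L}^N(H_{\rho,-\nu}(x_0))} = \frac{\mathcal{L}^N(E_\rho\cap B_1(x_0)\cap\pi_{-\nu}(x_0))}{\mathcal{L}^N(B_1(x_0)\cap\pi_{-\nu}(x_0))}.$$
The convergence of $\chi_{E_\rho}$ to $\chi_{\pi_\nu(x_0)}$ in $L^1(B_1(x_0))$ yields
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(E\cap H_{\rho,-\nu}(x_0))}{\mathcal{L}^N(H_{\rho,-\nu}(x_0))} = \frac{\mathcal{L}^N(\pi_\nu(x_0)\cap B_1(x_0)\cap\pi_{-\nu}(x_0))}{\mathcal{L}^N(B_1(x_0)\cap\pi_{-\nu}(x_0))} = 0.$$
Similarly,
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(E\cap H_{\rho,\nu}(x_0))}{\mathcal{L}^N(H_{\rho,\nu}(x_0))} = \frac{\mathcal{L}^N(\pi_\nu(x_0)\cap B_1(x_0)\cap\pi_\nu(x_0))}{\mathcal{L}^N(B_1(x_0)\cap\pi_\nu(x_0))} = 1.$$
By Proposition 10.3.4, $x_0\in S_{\chi_E}$.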
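The blow-up of Lemma 10.3.1 can be observed directly when $E$ is the unit disc and $x_0=(1,0)$. There $\nu(x_0)=(-1,0)$ and $\pi_\nu(x_0)=\{x_1<1\}$. We take $E_\rho=\{x : x_0+\rho(x-x_0)\in E\}$. The sketch then estimates on a grid the $L^1(B_1(x_0))$ distance between $\chi_{E_\rho}$ and $\chi_{\pi_\nu(x_0)}$ for a few radii. The function name and the grid size are illustrative assumptions.

```python
def blowup_l1_distance(rho, n=200):
    """Grid estimate of L^2(B_1(x0) ∩ (E_rho Δ pi_nu(x0))) for E the unit disc, x0 = (1, 0)."""
    x0, y0 = 1.0, 0.0
    h = 2.0 / n
    count = 0
    for i in range(n):
        dx = -1.0 + (i + 0.5) * h
        for j in range(n):
            dy = -1.0 + (j + 0.5) * h
            if dx * dx + dy * dy >= 1.0:
                continue  # outside B_1(x0)
            # x = x0 + d lies in E_rho iff x0 + rho * d lies in E.
            px, py = x0 + rho * dx, y0 + rho * dy
            in_blowup = px * px + py * py < 1.0
            in_halfplane = dx < 0.0  # <x - x0, nu> > 0 with nu = (-1, 0)
            if in_blowup != in_halfplane:
                count += 1
    return count * h * h


for rho in (0.5, 0.1, 0.02):
    print(rho, blowup_l1_distance(rho))
```

The distances decrease with $\rho$, roughly like $\rho/3$ for this smooth boundary. This is the strong $L^1$ convergence used above to obtain the hemiball limits $0$ and $1$.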
Step 3. The proof of $\mathcal{H}^{N-1}((\partial_M E\cap\Omega)\setminus\partial_r E)=0$ has two parts. We first prove $\mathcal{H}^{N-1}((S_{\chi_E}\cap\Omega)\setminus\partial_r E)=0$, and then $\mathcal{H}^{N-1}((\partial_M E\cap\Omega)\setminus S_{\chi_E})=0$. The keys of the proof are the relative isoperimetric inequality and Lemma 4.2.3. This is summarized in the next lemma.

Lemma 10.3.2. Let $E$ be a set of finite perimeter in $\Omega$. Then the following assertions hold:
(i) Relative isoperimetric inequality: there exists a positive constant $C$ such that
$$\min\{\mathcal{L}^N(B_r\cap E),\,\mathcal{L}^N(B_r\setminus E)\}^{\frac{N-1}{N}}\le C|D\chi_E|(B_r)$$
for every open ball $B_r$ of radius $r$ included in $\Omega$.
(ii) There exists a positive constant $C$ such that, for all $x\in S_{\chi_E}\cap\Omega$,
$$\liminf_{\rho\to0}\frac{|D\chi_E|(B_\rho(x))}{\rho^{N-1}}\ge C.$$
(iii) For $\mathcal{H}^{N-1}$-almost all $x\in\Omega\setminus S_{\chi_E}$,
$$\limsup_{\rho\to0}\frac{|D\chi_E|(B_\rho(x))}{\rho^{N-1}}=0.$$
(iv) $\mathcal{H}^{N-1}((S_{\chi_E}\cap\Omega)\setminus\partial_r E)=0$.
(v) $\mathcal{H}^{N-1}((\partial_M E\cap\Omega)\setminus S_{\chi_E})=0$.

Proof of Lemma 10.3.2. Assertion (i) is a straightforward consequence of the Poincaré–Wirtinger inequality
$$\Big(\int_{B_r}|u-\bar u|^{\frac{N}{N-1}}\,dx\Big)^{\frac{N-1}{N}}\le C\int_{B_r}|Du|,\qquad\text{where }\bar u = \frac{1}{\mathcal{L}^N(B_r)}\int_{B_r}u(x)\,dx,$$
applied to $u=\chi_E$. Indeed, every function $u$ in $BV(\Omega)$ satisfies the Poincaré–Wirtinger inequality. To see this, consider a smooth approximating sequence in the sense of the intermediate convergence, and apply Corollary 5.4.1 and Theorem 10.1.2 as in the proof of Proposition 10.1.3. Let us prove (ii). Let $x\in S_{\chi_E}\cap\Omega$. By Proposition 10.3.4, an easy computation yields
$$\liminf_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap E)}{\mathcal{L}^N(B_\rho(x))}\ge\frac12\quad\text{and}\quad\liminf_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\setminus E)}{\mathcal{L}^N(B_\rho(x))}\ge\frac12,$$
so that
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap E)}{\mathcal{L}^N(B_\rho(x))} = \lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\setminus E)}{\mathcal{L}^N(B_\rho(x))} = \frac12.$$
Assertion (ii) then follows from (i).
To prove (iii), it suffices to establish that, for all $\delta>0$, the set
$$A_\delta = (\Omega\setminus S_{\chi_E})\cap\Big\{x\in\Omega : \limsup_{\rho\to0}\frac{|D\chi_E|(B_\rho(x))}{\rho^{N-1}}>\delta\Big\}$$
satisfies $\mathcal{H}^{N-1}(A_\delta)=0$. Lemma 4.2.3, applied to the Borel measure $\mu=|D\chi_E|$, gives $|D\chi_E|(A_\delta)\ge C\delta\,\mathcal{H}^{N-1}(A_\delta)$. The conclusion then follows from $|D\chi_E|(A_\delta)=0$. This last equality holds because $|D\chi_E|(\Omega\setminus\partial_r E)=0$ and because of the inclusion $\partial_r E\subset S_{\chi_E}\cap\Omega$ established in Step 2. We now establish assertion (iv). Apply assertion (ii) and Lemma 4.2.3 to the Borel measure $\mu=|D\chi_E|$. There exists a positive constant $C$, depending only on $N$, such that, for every Borel set $B$ included in $S_{\chi_E}\cap\Omega$,
$$\mathcal{H}^{N-1}(B)\le C\,|D\chi_E|(B).\tag{10.34}$$
By definition, $|D\chi_E|(\Omega\setminus\partial_r E)=0$. Thus $|D\chi_E|((S_{\chi_E}\cap\Omega)\setminus\partial_r E)=0$, and (10.34) gives the conclusion $\mathcal{H}^{N-1}((S_{\chi_E}\cap\Omega)\setminus\partial_r E)=0$. We finally establish $\mathcal{H}^{N-1}((\partial_M E\cap\Omega)\setminus S_{\chi_E})=0$. By (iii), it suffices to prove that, for each $x$ in $\partial_M E\cap\Omega$,
$$\limsup_{\rho\to0}\frac{|D\chi_E|(B_\rho(x))}{\rho^{N-1}}>0.$$
By the definition of the measure theoretical boundary (Proposition 10.3.1), there exists $\delta>0$ such that
$$\limsup_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap E)}{\mathcal{L}^N(B_\rho(x))}>\delta\quad\text{and}\quad\liminf_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap E)}{\mathcal{L}^N(B_\rho(x))}<1-\delta.$$
Consequently, choosing δ < 1/2, for ρ small enough, δ
t] := {x ∈ Ω : u(x) > t} when t varies in ℝ. The following property was established by Fleming and Rishel in [136]. It generalizes the classical coarea formula (Theorem 4.2.5) to BV functions. It states that, for almost every $t$ in $\mathbb{R}$, the level set $[u>t]$ of each BV function $u$ has finite perimeter in $\Omega$. We will then show that the jump set of $u\in BV(\Omega)$ inherits its structure from the jump sets of the finite perimeter sets $[u>t]$, $t\in\mathbb{R}$, described in Theorem 10.3.2.

Theorem 10.3.3 (coarea formula). Let $u$ be a given function in $BV(\Omega)$. Then, for a.e. $t$ in $\mathbb{R}$, the level set $E_t = \{x\in\Omega : u(x)>t\}$ of $u$ is a set of finite perimeter in $\Omega$, and
$$Du = \int_{-\infty}^{+\infty}D\chi_{E_t}\,dt,\qquad|Du|(\Omega) = \int_{-\infty}^{+\infty}|D\chi_{E_t}|(\Omega)\,dt.$$
More generally, for every Borel function $f:\Omega\to\mathbb{R}_+$,
$$\int_\Omega f\,d|Du| = \int_{-\infty}^{+\infty}\Big(\int_\Omega f\,d|D\chi_{E_t}|\Big)dt.$$
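In dimension one the coarea formula can be checked exactly for piecewise linear functions. The boundary of $[u>t]$ inside $(0,1)$ is the finite set of points where $u$ crosses level $t$. So $|D\chi_{E_t}|((0,1))$ is the number of crossings, and $D\chi_{E_t}((0,1))$ is the number of upward crossings minus the number of downward ones. The sketch below samples the hypothetical test function $u(x)=\sin 6x+x$ and uses its piecewise linear interpolant. It integrates the counts over $t$ exactly, since they are constant between consecutive sample values. The sampling and the function names are illustrative choices.

```python
import math


def total_variation(values):
    """|Du|((0,1)) for the piecewise linear interpolant of the samples."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


def boundary_count(values, t):
    """H^0 of the boundary of [u > t] inside (0,1), for t not equal to a sample value."""
    return sum(1 for a, b in zip(values, values[1:]) if min(a, b) < t < max(a, b))


def signed_count(values, t):
    """Upward minus downward crossings of level t, i.e. the mass D chi_{[u>t]}((0,1))."""
    up = sum(1 for a, b in zip(values, values[1:]) if a < t < b)
    down = sum(1 for a, b in zip(values, values[1:]) if b < t < a)
    return up - down


def integrate_over_levels(values, count):
    """Exact integral of count(values, t) dt: the count is constant between sorted sample values."""
    levels = sorted(set(values))
    return sum(count(values, 0.5 * (s0 + s1)) * (s1 - s0) for s0, s1 in zip(levels, levels[1:]))


xs = [k / 200 for k in range(201)]
u = [math.sin(6 * x) + x for x in xs]
print(total_variation(u), integrate_over_levels(u, boundary_count))  # |Du| = integral of |D chi_{E_t}|
print(u[-1] - u[0], integrate_over_levels(u, signed_count))          # Du = integral of D chi_{E_t}
```

In each printed pair, the two numbers agree up to rounding. Each segment of the interpolant contributes its rise $|b-a|$ to both sides of the first identity, and its signed rise $b-a$ to both sides of the second.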
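Proof. Let us assume for the moment that, for a.e. $t$ in $\mathbb{R}$, $D\chi_{E_t}$ belongs to $\mathcal{M}(\Omega,\mathbb{R}^N)$. For all $t$ in $\mathbb{R}$, set
$$f_t = \begin{cases}\chi_{E_t} & \text{if } t\ge0,\\ -\chi_{\Omega\setminus E_t} & \text{if } t<0.\end{cases}$$
It is easily seen that $u(x) = \int_{-\infty}^{+\infty}f_t(x)\,dt$ for all $x$ in $\Omega$. For all $\varphi$ in $C^1_c(\Omega,\mathbb{R}^N)$ we have
$$\begin{aligned}\langle Du,\varphi\rangle &= -\int_\Omega u\,\operatorname{div}\varphi\,dx = -\int_\Omega\Big(\int_{-\infty}^{+\infty}f_t(x)\,dt\Big)\operatorname{div}\varphi\,dx\\ &= -\int_0^{+\infty}\Big(\int_\Omega\chi_{E_t}\operatorname{div}\varphi\,dx\Big)dt + \int_{-\infty}^0\Big(\int_\Omega\chi_{\Omega\setminus E_t}\operatorname{div}\varphi\,dx\Big)dt\\ &= -\int_0^{+\infty}\Big(\int_\Omega\chi_{E_t}\operatorname{div}\varphi\,dx\Big)dt - \int_{-\infty}^0\Big(\int_\Omega\chi_{E_t}\operatorname{div}\varphi\,dx\Big)dt\\ &= -\int_{-\infty}^{+\infty}\Big(\int_\Omega\chi_{E_t}\operatorname{div}\varphi\,dx\Big)dt = \int_{-\infty}^{+\infty}\langle D\chi_{E_t},\varphi\rangle\,dt.\end{aligned}$$
The third equality uses $\int_\Omega\operatorname{div}\varphi\,dx = 0$. Hence
$$Du = \int_{-\infty}^{+\infty}D\chi_{E_t}\,dt\quad\text{and}\quad|Du|(\Omega)\le\int_{-\infty}^{+\infty}|D\chi_{E_t}|(\Omega)\,dt.$$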
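We now establish the converse inequality $\int_{-\infty}^{+\infty}|D\chi_{E_t}|(\Omega)\,dt\le|Du|(\Omega)$. This also proves that $D\chi_{E_t}$ belongs to $\mathcal{M}(\Omega,\mathbb{R}^N)$ for a.e. $t$ in $\mathbb{R}$.
Step 1. We assume that $u$ belongs to the space $\mathcal{A}(\Omega)$ of piecewise linear continuous functions in $\Omega$. By linearity, one can assume that $u$ is the linear function $u = a\cdot x+b$ with $a\in\mathbb{R}^N$ and $b\in\mathbb{R}$, so that
$$|D\chi_{E_t}|(\Omega) = \mathcal{H}^{N-1}(\Omega\cap\partial E_t) = \mathcal{H}^{N-1}(\Omega\cap[a\cdot x+b=t]).$$
Consequently, by the classical coarea formula (Theorem 4.2.5),
$$\int_{-\infty}^{+\infty}|D\chi_{E_t}|(\Omega)\,dt = \int_{-\infty}^{+\infty}\Big(\int_{[a\cdot x+b=t]}\chi_\Omega(x)\,d\mathcal{H}^{N-1}(x)\Big)dt = |a|\,\mathcal{L}^N(\Omega) = \int_\Omega|Du|.$$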
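Step 1 can be checked numerically for a linear function on the unit square $\Omega=(0,1)^2$. The perimeter of $E_t$ in $\Omega$ is the length of the chord $\Omega\cap[a\cdot x+b=t]$, so integrating these lengths over $t$ should return $|a|\,\mathcal{L}^2(\Omega)=|a|$. The sketch computes each chord by clipping the line against the square and approximates the $t$-integral by the midpoint rule. These numerical choices and the function names are ours, not the text's.

```python
import math


def chord_length(a, b, t):
    """H^1 length of {x in (0,1)^2 : a . x + b = t}, for a != 0."""
    a1, a2 = a
    norm2 = a1 * a1 + a2 * a2
    p = ((t - b) * a1 / norm2, (t - b) * a2 / norm2)  # point of the line closest to the origin
    d = (-a2, a1)                                     # direction vector of the line
    s_lo, s_hi = -math.inf, math.inf
    for pc, dc in zip(p, d):
        if dc == 0:
            if not 0.0 < pc < 1.0:
                return 0.0
            continue
        s0, s1 = (0.0 - pc) / dc, (1.0 - pc) / dc     # parameters where this coordinate hits 0 and 1
        s_lo, s_hi = max(s_lo, min(s0, s1)), min(s_hi, max(s0, s1))
    return max(0.0, s_hi - s_lo) * math.sqrt(norm2)


def coarea_linear(a, b, m=4000):
    """Midpoint-rule approximation of the integral over t of the chord lengths."""
    corners = [a[0] * x + a[1] * y + b for x in (0, 1) for y in (0, 1)]
    lo, hi = min(corners), max(corners)
    dt = (hi - lo) / m
    return sum(chord_length(a, b, lo + (k + 0.5) * dt) for k in range(m)) * dt


print(coarea_linear((3.0, -4.0), 0.5))  # close to |a| = 5
print(coarea_linear((2.0, 0.0), 0.0))   # close to |a| = 2
```

The chord length is a continuous, piecewise linear function of $t$ whose kinks lie at the corner values. The midpoint rule is therefore essentially exact here.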
Step 2. We establish the inequality $\int_{-\infty}^{+\infty}|D\chi_{E_t}|(\Omega)\,dt\le|Du|(\Omega)$ for all $u\in BV(\Omega)$. Let $(u_n)_{n\in\mathbb{N}}$ be a sequence in $\mathcal{A}(\Omega)$ such that $u_n\to u$ for the intermediate convergence. Such a sequence exists by Theorem 10.1.2 and the well-known density of $\mathcal{A}(\Omega)$ in $W^{1,1}(\Omega)$ equipped with its strong topology. For another, direct proof of this assertion, consult Ziemer [229, Exercise 5.2]. Let us set $E_{n,t} := \{x\in\Omega : u_n(x)>t\}$. By the first step and Fatou's lemma, we have
$$\int_\Omega|Du| = \lim_{n\to+\infty}\int_\Omega|Du_n| = \lim_{n\to+\infty}\int_{-\infty}^{+\infty}|D\chi_{E_{n,t}}|(\Omega)\,dt\ge\int_{-\infty}^{+\infty}\liminf_{n\to+\infty}|D\chi_{E_{n,t}}|(\Omega)\,dt.\tag{10.37}$$
On the other hand,
$$\int_\Omega|u_n-u|\,dx = \int_\Omega\Big(\int_{-\infty}^{+\infty}|\chi_{E_{n,t}}-\chi_{E_t}|\,dt\Big)dx = \int_{-\infty}^{+\infty}\Big(\int_\Omega|\chi_{E_{n,t}}-\chi_{E_t}|\,dx\Big)dt,$$
which converges to zero. Thus, for a subsequence (not relabeled) and for almost all $t$ in $\mathbb{R}$,
$$\chi_{E_{n,t}}\to\chi_{E_t}\ \text{ strongly in } L^1(\Omega).\tag{10.38}$$
The lower semicontinuity of the total variation with respect to the strong convergence in $L^1(\Omega)$ (Proposition 10.1.1), together with (10.37) and (10.38), finally yields
$$\int_\Omega|Du|\ge\int_{-\infty}^{+\infty}|D\chi_{E_t}|(\Omega)\,dt.$$
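The layer-cake identity used above rests on an elementary fact. For real numbers $\alpha$ and $\beta$, $|\alpha-\beta|=\int_{-\infty}^{+\infty}|\chi_{\{\alpha>t\}}-\chi_{\{\beta>t\}}|\,dt$, because the two indicators differ exactly for $t\in[\min(\alpha,\beta),\max(\alpha,\beta))$. The sketch below compares both sides of $\int_\Omega|u_n-u|\,dx=\int(\int_\Omega|\chi_{E_{n,t}}-\chi_{E_t}|\,dx)\,dt$ on $\Omega=(0,1)$. It uses the hypothetical choice $u(x)=x^2$, $u_n(x)=x^2+\sin(nx)/n$, with midpoint rules in $x$ and in $t$.

```python
import math


def l1_distance_direct(f, g, N=200):
    """Midpoint-rule approximation of the integral over (0,1) of |f - g|."""
    xs = [(k + 0.5) / N for k in range(N)]
    return sum(abs(f(x) - g(x)) for x in xs) / N


def l1_distance_by_levels(f, g, N=200, M=2000):
    """Same quantity computed as the t-integral of L^1([f > t] Δ [g > t])."""
    xs = [(k + 0.5) / N for k in range(N)]
    fv = [f(x) for x in xs]
    gv = [g(x) for x in xs]
    lo, hi = min(fv + gv), max(fv + gv)
    dt = (hi - lo) / M
    total = 0.0
    for k in range(M):
        t = lo + (k + 0.5) * dt
        # measure (on the x-grid) of the symmetric difference of the two superlevel sets
        total += sum(1 for a, b in zip(fv, gv) if (a > t) != (b > t)) / N
    return total * dt


def u(x):
    return x * x


for n in (5, 20, 80):
    def u_n(x, n=n):
        return x * x + math.sin(n * x) / n
    print(n, l1_distance_direct(u_n, u), l1_distance_by_levels(u_n, u))
```

In each row the two computations agree to within the $t$-step, and both shrink as $n$ grows. This is the mechanism that yields (10.38) along a subsequence.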
It is now easy to adapt the proof above to obtain the coarea formula with $f=\chi_E$ for any Borel subset $E$ of $\Omega$. The general coarea formula, with a Borel function $f:\Omega\to\mathbb{R}_+$, then follows by a classical density argument. We are now in a position to establish two facts about functions in $BV(\Omega)$. First, $\mathcal{H}^{N-1}$-almost all points of $\Omega$ are regular for such functions. Second, their jump set (cf. Theorem 10.3.1) is countably $(N-1)$-rectifiable. Let $u$ be a function in $BV(\Omega)$ whose representative satisfies the convention of Remark 10.3.2. We set $S_u = \{x\in\Omega : u^-(x)<u^+(x)\}$, where $u^-(x) = \operatorname*{ap\,lim\,inf}_{y\to x}u(y)$ and $u^+(x) = \operatorname*{ap\,lim\,sup}_{y\to x}u(y)$.

Theorem 10.3.4. Let $u$ be a given function in $BV(\Omega)$. Then, for $\mathcal{H}^{N-1}$-almost all $x$ in $\Omega$, $u^-(x)$ and $u^+(x)$ are finite, and $S_u$ is countably $(N-1)$-rectifiable. Moreover, $S_u$ is, up to a set of $\mathcal{H}^{N-1}$ measure zero, the jump set of $u$, and $\mathcal{H}^{N-1}$-almost all $x$ in $\Omega$ are regular for $u$.

Proof. We begin by proving that $S_u$ is countably $(N-1)$-rectifiable. By the coarea formula (Theorem 10.3.3), for almost all $t\in\mathbb{R}$, $E_t = \{x\in\Omega : u(x)>t\} =: [u>t]$ is a set of finite perimeter in $\Omega$. Now let $D$ be a dense countable subset of $\{t\in\mathbb{R} : E_t\text{ is of finite perimeter}\}$, and set $S_{u,t} := \{x\in S_u : u^-(x)<t<u^+(x)\}$. We have $S_u = \bigcup_{t\in D}S_{u,t}$. On the other hand, the definitions of $u^-$ and $u^+$ easily give, for all $x\in S_{u,t}$,
$$\limsup_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[u>t])}{\mathcal{L}^N(B_\rho(x))}>0,\qquad\limsup_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[u<t])}{\mathcal{L}^N(B_\rho(x))}>0,$$
so that $x\in\partial_M E_t$. Thus $S_u\subset\bigcup_{t\in D}\partial_M E_t$, and the conclusion follows from the structure theorem, Theorem 10.3.2. We admit that $-\infty<u^-(x)\le u^+(x)<+\infty$ for $\mathcal{H}^{N-1}$-almost all $x$ in $\Omega$. For a proof, consult Evans and Gariepy [132, Theorem 2].
It remains to show that, up to an $\mathcal{H}^{N-1}$-negligible set, $S_u$ is the jump set of $u$, and that $\mathcal{H}^{N-1}$-almost all $x$ in $\Omega$ are regular for $u$. By Proposition 10.3.3, Definition 10.3.5, and Theorem 10.3.1, it is enough to establish the following. For $\mathcal{H}^{N-1}$-almost all $x$ in $S_u$, there exists $\nu(x)$ in $S^{N-1}$ such that
$$u^-(x) = \operatorname*{ap\,lim}_{y\to x,\,y\in\pi_{-\nu(x)}(x)}u(y)\quad\text{and}\quad u^+(x) = \operatorname*{ap\,lim}_{y\to x,\,y\in\pi_{\nu(x)}(x)}u(y).$$
Let $x\in S_u$ be such that $u^-(x)$ and $u^+(x)$ are finite. Set $t = u^+(x)-\varepsilon$, with $\varepsilon$ small enough that $u^-(x)<t<u^+(x)$. Then $x\in\partial_M E_t$. By Theorem 10.3.2, for $\mathcal{H}^{N-1}$-almost all such $x$ in $S_u$, there exists $\nu(x)$ in $S^{N-1}$ such that
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(H_{\rho,\nu(x)}(x)\cap[u>u^+(x)-\varepsilon])}{\mathcal{L}^N(H_{\rho,\nu(x)}(x))}=1.\tag{10.39}$$
On the other hand, by the definition of the approximate lim sup,
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[u>u^+(x)+\varepsilon])}{\mathcal{L}^N(B_\rho(x))}=0,$$
hence
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(H_{\rho,\nu(x)}(x)\cap[u>u^+(x)+\varepsilon])}{\mathcal{L}^N(H_{\rho,\nu(x)}(x))}=0.\tag{10.40}$$
Combining (10.39) and (10.40), we obtain
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(H_{\rho,\nu(x)}(x)\cap[\,|u-u^+(x)|\le\varepsilon\,])}{\mathcal{L}^N(H_{\rho,\nu(x)}(x))}=1.$$
Since $\varepsilon$ can be taken arbitrarily small, this proves that $u^+(x) = \operatorname*{ap\,lim}_{y\to x,\,y\in\pi_{\nu(x)}(x)}u(y)$. The proof of
$$u^-(x) = \operatorname*{ap\,lim}_{y\to x,\,y\in\pi_{-\nu(x)}(x)}u(y)$$
is similar.

Remark 10.3.4. In the proof of Theorem 10.3.4 we have established that, for $\mathcal{H}^{N-1}$-almost every $x$ in $S_u$, $S_u$ possesses a unit normal vector $\nu_u(x)$, and that
$$\mathcal{H}^{N-1}\big(S_u\setminus\{x\in\mathbb{R}^N : u_{\nu_u(x)}(x)\ne u_{-\nu_u(x)}(x)\}\big)=0.$$
Moreover, we have obtained that $\nu_u(x) = \nu_{E_t}(x)$ for almost every $t\in\mathbb{R}$ and for $\mathcal{H}^{N-1}$-almost every $x$ in $\partial_M E_t\cap S_u$. Here $\nu_{E_t}(x)$ is the inner measure theoretic normal to $E_t$ at $x$.

Remark 10.3.5. By Theorem 10.3.4 and Proposition 10.3.3, there exist two Borel sets $E^+$ and $E^-$ such that, for $\mathcal{H}^{N-1}$-almost all $x$ in $S_u$,
$$u^+(x) = \lim_{y\to x,\,y\in E^+\cap\pi_{\nu_u(x)}(x)}u(y)\quad\text{and}\quad u^-(x) = \lim_{y\to x,\,y\in E^-\cap\pi_{-\nu_u(x)}(x)}u(y).$$

Remark 10.3.6. By our convention (Remark 10.3.2) on the representative of $L^1$-functions, $u^+(x) = u^-(x) = u(x)$ for $\mathcal{H}^{N-1}$-a.e. $x$ in $\Omega\setminus S_u$. In the following proposition, we give some information on the level sets $E_t$ of $u$.

Proposition 10.3.5. Let $u\in BV(\Omega)$ and set $E_t := [u>t]$. For $\mathcal{H}^{N-1}$-almost every $x$ in $S_u$, one has
$$(u^-(x),u^+(x))\subset\{t\in\mathbb{R} : x\in\partial_M E_t\cap\Omega\}\subset[u^-(x),u^+(x)].$$
Moreover, $u(x)=t$ for $\mathcal{H}^{N-1}$-almost every $x$ in $(\partial_M E_t\cap\Omega)\setminus S_u$.

Proof. Consider the first inclusion $(u^-(x),u^+(x))\subset\{t\in\mathbb{R} : x\in\partial_M E_t\cap\Omega\}$. It suffices to note that $t\in(u^-(x),u^+(x))$ implies $x\in S_{u,t}\subset\partial_M E_t$. We now establish the second inclusion $\{t\in\mathbb{R} : x\in\partial_M E_t\cap\Omega\}\subset[u^-(x),u^+(x)]$. Let $t\in\mathbb{R}$ be such that $x\in\partial_M E_t\cap\Omega$, and assume that $t>u^+(x)$. Take $t_0$ such that $t>t_0>u^+(x)$ and
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[u>t_0])}{\mathcal{L}^N(B_\rho(x))}=0.$$
Such a $t_0$ exists by the definition of the approximate lim sup. Since $[u>t]\subset[u>t_0]$, we have
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[u>t])}{\mathcal{L}^N(B_\rho(x))}=0,$$
which contradicts $x\in\partial_M E_t$. Assuming $t<u^-(x)$ yields the same contradiction. Up to a set of $\mathcal{H}^{N-1}$ measure zero, $S_u$ is the jump set of $u$. Hence, for $\mathcal{H}^{N-1}$-almost every $x$ in $(\partial_M E_t\cap\Omega)\setminus S_u$, one has $u(x) = u^+(x) = u^-(x)$, so that $u(x) = \operatorname*{ap\,lim}_{y\to x}u(y)$. For such $x$, we establish that $u(x)=t$. Otherwise, assume first that $t>u(x)$ and set $\varepsilon = t-u(x)$. By the definition of the approximate limit at $x$,
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[\,|u-u(x)|>\varepsilon\,])}{\mathcal{L}^N(B_\rho(x))}=0,$$
which yields
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[u>u(x)+\varepsilon])}{\mathcal{L}^N(B_\rho(x))}=0,$$
that is,
$$\lim_{\rho\to0}\frac{\mathcal{L}^N(B_\rho(x)\cap[u>t])}{\mathcal{L}^N(B_\rho(x))}=0.$$
This contradicts the hypothesis $x\in\partial_M E_t$. The same argument rules out $t<u(x)$: in that case $\Omega\setminus E_t = [u\le t]$ has density zero at $x$.
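Proposition 10.3.5 can be observed on the hypothetical one-dimensional function $u(x)=x+2\chi_{(0,1)}(x)$ on $(-1,1)$, for which $u^-(0)=0$ and $u^+(0)=2$. So can the identity $\mathcal{L}^1(\{t : x\in\partial_M E_t\})=u^+(x)-u^-(x)$, used in the proof of Theorem 10.4.1 below. The sketch computes the density of $E_t=[u>t]$ in $(-\rho,\rho)$ exactly. It then flags the levels $t$ at which this density stays away from 0 and 1. Two simplifications are made: the limit is replaced by one small $\rho$, and a tolerance is used.

```python
def superlevel_density(t, rho):
    """Density of E_t = [u > t] in (-rho, rho) for u(x) = x + 2*chi_{(0,1)}(x) on (-1, 1)."""
    left = max(0.0, 0.0 - max(-rho, t))       # length of {x in (-rho, 0) : x > t}
    right = max(0.0, rho - max(0.0, t - 2.0))  # length of {x in (0, rho) : x + 2 > t}
    return (left + right) / (2.0 * rho)


def boundary_levels(rho=1e-6, tol=1e-3):
    """Levels t = -1, -0.99, ..., 3 for which 0 looks like a point of the boundary of E_t."""
    levels = [-1.0 + k / 100 for k in range(401)]
    return [t for t in levels if tol < superlevel_density(t, rho) < 1.0 - tol]


levels = boundary_levels()
print(levels[0], levels[-1], len(levels))  # 0.0 2.0 201
```

The flagged levels are exactly $0,0.01,\dots,2$. So, up to the grid, the set $\{t : 0\in\partial_M E_t\}$ is the interval $[u^-(0),u^+(0)]=[0,2]$, and its length is the jump $u^+(0)-u^-(0)=2$.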
10.4
Structure of the gradient of BV functions
Let $u$ be a given function in $BV(\Omega)$, and let $Du = D^au+D^su$ be the Lebesgue–Nikodym decomposition of the measure $Du$ with respect to the $N$-dimensional Lebesgue measure $\mathcal{L}^N$ restricted to $\Omega$. Recall that $D^au$ denotes the absolutely continuous part of $Du$ with respect to $\mathcal{L}^N$, and $D^su$ its singular part. We denote the density of $D^au$ with respect to $\mathcal{L}^N$ by $\nabla u$, so that $D^au = \nabla u\,\mathcal{L}^N$. The theorem below makes the structure of the singular part $D^su$ precise.

Theorem 10.4.1. Let us denote the two measures $D^su\llcorner S_u$ and $D^su\llcorner(\Omega\setminus S_u)$ by $Ju$ and $Cu$, respectively. They are called the jump part and the Cantor part of $Du$. Then $Ju$ is absolutely continuous with respect to the restriction of the $(N-1)$-dimensional Hausdorff measure to $S_u$. More precisely,
$$Ju = (u^+-u^-)\,\nu_u\,\mathcal{H}^{N-1}\llcorner S_u.$$
Moreover, $Ju$ and $Cu$ are mutually singular: for all Borel sets $E$ of $\Omega$,
$$\mathcal{H}^{N-1}(E)<+\infty\ \Longrightarrow\ |Cu|(E)=0.$$
Consequently, the Hausdorff dimension of the support $\operatorname{spt}(Cu)$ of the measure $Cu$ satisfies $N-1\le\dim_{\mathcal{H}}(\operatorname{spt}(Cu))<N$.
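Proof. We use the coarea formula (Theorem 10.3.3) and Theorem 10.3.2(iii). Recall that $\mathcal{L}^N(S_u)=0$, since $S_u$ is countably $(N-1)$-rectifiable, and that $\nu_{E_t}=\nu_u$ on $\partial_M E_t\cap S_u$ by Remark 10.3.4. For every Borel set $E\subset S_u$,
$$\begin{aligned}Ju(E) = Du(E) &= \int_{-\infty}^{+\infty}D\chi_{E_t}(E)\,dt = \int_{-\infty}^{+\infty}\Big(\int_{E\cap\partial_M E_t}\nu_u(x)\,d\mathcal{H}^{N-1}(x)\Big)dt\\ &= \int_E\Big(\int_{-\infty}^{+\infty}\chi_{\{t\in\mathbb{R}\,:\,x\in\partial_M E_t\}}\,dt\Big)\nu_u(x)\,d\mathcal{H}^{N-1}(x).\end{aligned}$$
Since $E\subset S_u$, Proposition 10.3.5 gives $\mathcal{L}^1(\{t\in\mathbb{R} : x\in\partial_M E_t\}) = u^+(x)-u^-(x)$ for $\mathcal{H}^{N-1}$-a.e. $x$ in $E$. Thus $Ju = (u^+-u^-)\,\nu_u\,\mathcal{H}^{N-1}\llcorner S_u$. Let now $E$ be any Borel set included in $\Omega\setminus S_u$ and satisfying $\mathcal{H}^{N-1}(E)<+\infty$. By Theorems 10.3.3 and 10.3.2(iii), one has
$$|Cu|(E) = |Du|(E) = \int_{-\infty}^{+\infty}\mathcal{H}^{N-1}(E\cap\partial_M E_t)\,dt.\tag{10.41}$$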
With our convention (Remark 10.3.2), the points of $\Omega\setminus S_u$ are points of approximate continuity for $u$. Thus, by Proposition 10.3.5, one has $E\cap\partial_M E_t\subset\{y\in E : u(y)=t\}$ up to an $\mathcal{H}^{N-1}$-negligible set. Moreover, $\mathcal{H}^{N-1}(E)<+\infty$. By Lemma 4.2.1, the set of all $t$ such that $\mathcal{H}^{N-1}(\{y\in E : u(y)=t\})>0$ is therefore at most countable, and (10.41) yields $|Cu|(E)=0$.

The proposition below states that every function $u$ in $BV_{\mathrm{loc}}(\mathbb{R}^N)$ possesses an approximate derivative at almost every $x$ in $\mathbb{R}^N$. This means that there exists a linear function $L:\mathbb{R}^N\to\mathbb{R}$, denoted by $\operatorname{ap}Du(x)$, such that
$$\operatorname*{ap\,lim}_{y\to x}\frac{|u(y)-u(x)-L(y-x)|}{|y-x|}=0.$$
For a proof, see [134, Theorem 4.5.9] or [132, Theorem 4].

Proposition 10.4.1. Let $u$ be a given function of $BV_{\mathrm{loc}}(\mathbb{R}^N)$, i.e., $u\in BV(U)$ for every bounded open subset $U$ of $\mathbb{R}^N$. Then, for almost all $x_0$ in $\mathbb{R}^N$,
$$\lim_{\rho\to0}\frac{1}{\rho^N}\int_{B_\rho(x_0)}\frac{|u(x)-u(x_0)-\nabla u(x_0)\cdot(x-x_0)|}{|x-x_0|}\,dx=0.$$
Consequently, for almost all $x_0$ in $\mathbb{R}^N$, $u$ is approximately differentiable at $x_0$ and $\operatorname{ap}Du(x_0) = \nabla u(x_0)$.

Example 10.4.1. We construct a BV function whose gradient reduces to its Cantor part: the Cantor–Vitali function. Let $\Omega=(0,1)$ and let $C$ be the classical triadic Cantor set
$C = \bigcap_{n\in\mathbb{N}}C_n$, where $C_n$ is the union of $2^n$ intervals of length $3^{-n}$. We define
$$f_n(x) := \Big(\frac32\Big)^n\chi_{C_n}(x),\qquad u_n(x) := \int_0^xf_n(t)\,dt\quad\text{(see Figure 10.3)}.$$
Figure 10.3. Construction of the Cantor–Vitali function.
All the functions $u_n$ belong to $C([0,1])$. If $I$ is any of the $2^n$ intervals of $C_n$, then
$$\int_If_n(t)\,dt = \int_If_{n+1}(t)\,dt = 2^{-n},\qquad\text{and}\qquad u_n(x) = u_{n+1}(x)\quad\forall x\in(0,1)\setminus C_n.$$
Indeed, $\int_If_n(t)\,dt = (\frac32)^n\,\mathrm{mes}(I) = 2^{-n}$. We deduce that $|u_n(x)-u_{n+1}(x)|\le2^{-(n-1)}$ for all $x$ in $C_n$, so that $u_n$ converges uniformly to a continuous function $u$. By the lower semicontinuity of the total variation, we then obtain
$$\int_{(0,1)}|Du|\le\liminf_{n\to+\infty}\int_{(0,1)}|Du_n|\,dx = 1.$$
This proves that $u$ belongs to $BV(0,1)$ and, since $u$ is continuous, that $Ju=0$. Finally, $u$ is locally constant on $(0,1)\setminus C$ and $\mathcal{L}^1(C)=0$, so $\nabla u=0$ and $Du=Cu$. Moreover, the support of $Cu$ is the Cantor set $C$, whose Hausdorff dimension is $\ln2/\ln3\approx0.631$ (see Example 4.1.1).
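The construction of Example 10.4.1 can be reproduced with exact rational arithmetic, using Python's `fractions` module (our choice, not the text's). The sketch builds the intervals of $C_n$ and evaluates $u_n(x)=(\frac32)^n\mathcal{L}^1([0,x]\cap C_n)$. On a grid it checks the bound $|u_n-u_{n+1}|\le2^{-(n-1)}$, and it also checks the values $u_n(1)=1$ and $u_n(1/3)=1/2$ and the dimension $\ln2/\ln3$. The function names and the grid are illustrative.

```python
import math
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def cantor_intervals(n):
    """The 2^n closed intervals of length 3^-n whose union is C_n, as exact rationals."""
    if n == 0:
        return ((Fraction(0), Fraction(1)),)
    refined = []
    for a, b in cantor_intervals(n - 1):
        third = (b - a) / 3
        refined.append((a, a + third))   # keep the left third
        refined.append((b - third, b))   # keep the right third
    return tuple(refined)


def u_n(n, x):
    """u_n(x) = integral from 0 to x of f_n, with f_n = (3/2)^n on C_n and 0 elsewhere."""
    x = Fraction(x)
    covered = sum((min(b, x) - a for a, b in cantor_intervals(n) if a < x), Fraction(0))
    return Fraction(3, 2) ** n * covered


grid = [Fraction(k, 81) for k in range(82)]
for n in range(5):
    gap = max(abs(u_n(n, x) - u_n(n + 1, x)) for x in grid)
    print(n, gap, gap <= Fraction(2) ** (1 - n))  # uniform bound 2^-(n-1)

print(u_n(6, 1), u_n(6, Fraction(1, 3)))  # prints: 1 1/2
print(math.log(2) / math.log(3))          # Hausdorff dimension of C, about 0.6309
```

Each $u_n$ is nondecreasing with $u_n(0)=0$ and $u_n(1)=1$, so its total variation is $1$. This is the bound used above for $\int_{(0,1)}|Du|$.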
10.5
The space SBV(Ω)
Some problems arising in image segmentation, or in mechanics in the study of cracks and fissures (see Chapters 12 and 14), involve competing functions whose first distributional derivatives are
functions which operate in the models are often measures without singular Cantor part. The solutions of these problems may be found in a special space of functions of bounded variation.
10.5.1 Definition
Definition 10.5.1. The space of special functions of bounded variation is the subset $SBV(\Omega)$ of $BV(\Omega)$ made up of all the functions of $BV(\Omega)$ whose gradient measures have no Cantor part in their Lebesgue decomposition, i.e.,
$$u\in SBV(\Omega)\iff u\in L^1(\Omega)\ \text{and}\ Du=\nabla u\,\mathcal L^N+(u^+-u^-)\,\nu_u\,\mathcal H^{N-1}\lfloor S_u.$$

Remark 10.5.1. Arguing as in Remark 10.2.2, one may define the space $SBV(\Omega,\mathbb R^m)$ as the space of all functions $u:\Omega\to\mathbb R^m$ which belong to $L^1(\Omega,\mathbb R^m)$ and whose distributional derivative $Du$ is an $\mathbb M^{m\times N}$-valued measure of the form
$$Du=\nabla u\,\mathcal L^N+(u^+-u^-)\otimes\nu_u\,\mathcal H^{N-1}\lfloor S_u.$$

Example 10.5.1. Let $\Omega$ be an open bounded subset of $\mathbb R^N$, $K$ a closed subset of $\Omega$ such that $\mathcal H^{N-1}(K)<+\infty$, and $u\in W^{1,1}(\Omega\setminus K)\cap L^\infty(\Omega)$. We claim that $u$ belongs to $SBV(\Omega)$ and that $S_u\subset K$.

We first assume that $K$ is regular in the following sense: there exist a $C^1$ hypersurface $\Gamma$ such that $K\subset\Gamma$ and two disjoint subsets $\Omega_1$ and $\Omega_2$ of $\Omega$ such that $\partial\Omega_1\cap\partial\Omega_2=\Gamma$ and $\mathcal L^N(\Omega\setminus(\Omega_1\cup\Omega_2))=0$. Then the result is a straightforward consequence of the trace theory (see Section 10.2 and Example 10.2.1). It is worth pointing out that in this case the hypothesis $u\in L^\infty(\Omega)$ is unnecessary.

We now consider the general case. If $N=1$, $K$ is a finite set and the result follows from the previous argument. We assume $N\ge2$. Since $\mathcal H^{N-1}(K)<+\infty$ and $K$ is a compact set, for all $n\in\mathbb N^*$ there exists a finite family of closed balls $(B_{\rho_i^n}(x_i))_{i\in I_n}$ covering $K$, with $\rho_i^n\le\frac1n$ and such that
$$\sum_{i\in I_n}c_{N-1}(2\rho_i^n)^{N-1}\le\mathcal H^{N-1}(K)+1.$$
Therefore
$$\sum_{i\in I_n}\mathcal H^{N-1}\big(\partial B_{\rho_i^n}(x_i)\big)\le C\big(\mathcal H^{N-1}(K)+1\big),$$
where $C$ is a positive constant depending only on the dimension $N$. We now consider the functions
$$u_n(x)=\begin{cases}u(x)&\text{for }x\in\Omega\setminus\bigcup_{i\in I_n}B_{\rho_i^n}(x_i),\\0&\text{elsewhere.}\end{cases}$$
Since $\mathcal L^N\big(\bigcup_{i\in I_n}B_{\rho_i^n}(x_i)\big)\le C'\sum_{i\in I_n}(\rho_i^n)^N\le\frac{C'}{n}\sum_{i\in I_n}(\rho_i^n)^{N-1}$, with $C'$ depending only on $N$, tends to zero, $u_n\to u$ strongly in $L^1(\Omega)$ when $n\to+\infty$. Since, moreover, $\bigcup_{i\in I_n}\partial B_{\rho_i^n}(x_i)$ is a finite union of $C^1$ hypersurfaces, reasoning on a neighborhood
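of each $\partial B_{\rho_i^n}(x_i)$, from the trace theory and the estimate $|u_n^+(x)|\le\|u\|_{L^\infty(\Omega)}$ (note that, according to Remark 10.3.4, $u^+$ is a classical limit in a Borel set $E^+$ of $\Omega$), one has
$$|Du_n|(\Omega)\le\int_{\Omega\setminus K}|\nabla u|\,dx+C\|u\|_{L^\infty(\Omega)}\big(\mathcal H^{N-1}(K)+1\big).$$
The lower semicontinuity of the total variation (Proposition 10.1.1) yields $u\in BV(\Omega)$. On the other hand, it is easily seen that $S_u\subset K$. Since $\mathcal H^{N-1}(K\setminus S_u)\le\mathcal H^{N-1}(K)<+\infty$, Theorem 10.4.1 gives $Cu(K\setminus S_u)=0$. Moreover, $Cu(\Omega\setminus K)=0$ because $u\in W^{1,1}(\Omega\setminus K)$, and $Cu(S_u)=0$. Finally, $Cu=0$, i.e., $u\in SBV(\Omega)$.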
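In dimension one, Definition 10.5.1 splits the total variation of an $SBV$ function into an absolutely continuous part and a jump part: $|Du|(\Omega)=\int_\Omega|u'|\,dx+\sum_{x\in S_u}|u^+(x)-u^-(x)|$. The sketch below illustrates this under stated assumptions. It uses the hypothetical function $u(x)=\sin(2\pi x)+\chi_{[1/2,1)}(x)$ on $\Omega=(0,1)$, which belongs to $W^{1,1}(\Omega\setminus K)\cap L^\infty(\Omega)$ with $K=\{1/2\}$, as in Example 10.5.1. It then compares a discrete total variation on a uniform grid with the value $4+1=5$ given by the decomposition.

```python
import math

def u(x):
    # Hypothetical test function: smooth part sin(2*pi*x) plus a jump of height 1 at x = 1/2.
    return math.sin(2.0 * math.pi * x) + (1.0 if x >= 0.5 else 0.0)

def discrete_total_variation(func, n):
    """Sum of |func(x_{k+1}) - func(x_k)| over the uniform grid x_k = k/n of [0, 1]."""
    xs = [k / n for k in range(n + 1)]
    return sum(abs(func(b) - func(a)) for a, b in zip(xs, xs[1:]))

def absolutely_continuous_part(n):
    """Midpoint-rule approximation of int_0^1 |u'(x)| dx, where u'(x) = 2*pi*cos(2*pi*x)."""
    h = 1.0 / n
    return sum(abs(2.0 * math.pi * math.cos(2.0 * math.pi * (k + 0.5) * h)) for k in range(n)) * h

JUMP = 1.0  # |u^+(1/2) - u^-(1/2)|: the only jump, since S_u = {1/2}

n = 100_000
tv_grid = discrete_total_variation(u, n)
tv_sbv = absolutely_continuous_part(n) + JUMP
print(tv_grid, tv_sbv)  # both close to 5
```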
10.5.2 Properties
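The following chain rule for derivatives in $BV(\Omega)$ was established in Ambrosio [20].

Proposition 10.5.1. Let $u$ be a given function in $BV(\Omega)$ and $\varphi$ in $C^1_0(\mathbb R)$. Then $v:=\varphi\circ u$ belongs to $BV(\Omega)$ and, possibly after replacing $\nu_u$ by $-\nu_u$,
$$Jv=\big(\varphi(u^+)-\varphi(u^-)\big)\nu_u\,\mathcal H^{N-1}\lfloor S_u,\qquad\nabla v=\varphi'(u)\nabla u,\qquad Cv=\varphi'(u)\,Cu.$$

Proof. Consider $u_n\in C^\infty(\Omega)\cap BV(\Omega)$ converging to $u$ for the intermediate convergence. Then $v_n:=\varphi\circ u_n\to v:=\varphi\circ u$ in $L^1(\Omega)$. On the other hand,
$$|Dv|(\Omega)\le\liminf_{n\to+\infty}|Dv_n|(\Omega)\le\|\varphi'\|_\infty\liminf_{n\to+\infty}|Du_n|(\Omega)=\|\varphi'\|_\infty|Du|(\Omega)<+\infty.$$
This proves that $v\in BV(\Omega)$. We now show that $Jv=(\varphi(u^+)-\varphi(u^-))\nu_u\,\mathcal H^{N-1}\lfloor S_u$. Since $\varphi\in C^1_0(\mathbb R)$, $\varphi$ is the difference of two nondecreasing functions in $C^1(\mathbb R)$. One may then assume $\varphi$ nondecreasing, so that $v^+=\varphi\circ u^+$, $v^-=\varphi\circ u^-$, $S_v=S_u$, and $\nu_v=\nu_u$. It remains to establish $\nabla v=\varphi'(u)\nabla u$ and $Cv=\varphi'(u)Cu$ or, equivalently, $Dv\lfloor(\Omega\setminus S_v)=\varphi'(u)\,Du\lfloor(\Omega\setminus S_v)$. Consider a Borel set $E$ of $\Omega$ included in $\Omega\setminus S_v$. From the coarea formula (Theorem 10.3.3) and the structure of simple functions of $BV(\Omega)$ (Theorem 10.3.2(iii)), one has
$$\begin{aligned}Dv(E)&=\int_{-\infty}^{+\infty}D\chi_{[v>t]}(E)\,dt=\int_{-\infty}^{+\infty}D\chi_{[u>\varphi^{-1}(t)]}(E)\,dt\\&=\int_{-\infty}^{+\infty}D\chi_{[u>t]}(E)\,\varphi'(t)\,dt=\int_{-\infty}^{+\infty}\Big(\int_{E\cap\partial_M([u>t])}\varphi'(t)\,dD\chi_{[u>t]}\Big)dt,\end{aligned}\tag{10.42}$$
where the last equality holds because $D\chi_{[u>t]}$ is carried by $\partial_M([u>t])$.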
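But, according to Proposition 10.3.5, for $\mathcal H^{N-1}$-a.e. $x$ in $\Omega$, $x\in\partial_M([u>t])\cap E\Rightarrow u(x)=t$, so that (10.42) yields
$$Dv(E)=\int_{-\infty}^{+\infty}\Big(\int_{E}\varphi'(u(x))\,dD\chi_{[u>t]}(x)\Big)dt=\int_E\varphi'(u)\,dDu=\big(\varphi'(u)\,Du\big)(E),$$
which completes the proof.

The following criterion for a function $u$ in $BV(\Omega)$ to belong to $SBV(\Omega)$ was established by Ambrosio in [14].

Theorem 10.5.1. Let $u$ be a given function in $BV(\Omega)$. Then $u$ belongs to $SBV(\Omega)$ if and only if there exist a Borel measure $\mu$ in $\mathcal M(\Omega\times\mathbb R,\mathbb R^N)$ and $a$ in $L^1(\Omega,\mathbb R^N)$ such that, for all $\Phi$ in $C^1_c(\Omega,\mathbb R^N)$ and all $\varphi$ in $C^1_0(\mathbb R)$,
$$\int_{\Omega\times\mathbb R}\varphi(s)\,\Phi(x)\cdot\mu(dx,ds)=-\int_\Omega\big(\varphi'(u)\,a\cdot\Phi(x)+\varphi(u)\,\operatorname{div}\Phi(x)\big)\,dx.\tag{10.43}$$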
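Moreover, $a=\nabla u$ a.e. and
$$\mu=\gamma^+_\#\big(\nu_u\,\mathcal H^{N-1}\lfloor S_u\big)-\gamma^-_\#\big(\nu_u\,\mathcal H^{N-1}\lfloor S_u\big),\qquad|\mu|(\Omega\times\mathbb R)=2\,\mathcal H^{N-1}(S_u),$$
where $\gamma^+:\Omega\to\Omega\times\mathbb R$, $x\mapsto(x,u^+(x))$, and $\gamma^-:\Omega\to\Omega\times\mathbb R$, $x\mapsto(x,u^-(x))$.

Proof. Let us assume that $u$ belongs to $SBV(\Omega)$. According to Proposition 10.5.1, for all $\varphi\in C^1_0(\mathbb R)$, $\varphi(u)$ belongs to $SBV(\Omega)$ and
$$D(\varphi(u))=\varphi'(u)\nabla u\,\mathcal L^N+\big(\varphi(u^+)-\varphi(u^-)\big)\nu_u\,\mathcal H^{N-1}\lfloor S_u.$$
Consequently, for all $\Phi\in C^1_c(\Omega,\mathbb R^N)$,
$$\int_\Omega\varphi(u)\operatorname{div}\Phi\,dx=-\langle D(\varphi(u)),\Phi\rangle=-\int_\Omega\varphi'(u)\nabla u\cdot\Phi\,dx-\int_{S_u}\big(\varphi(u^+)-\varphi(u^-)\big)\Phi\cdot\nu_u\,d\mathcal H^{N-1},$$
which we write
$$\int_{S_u}\big(\varphi(u^+)-\varphi(u^-)\big)\Phi\cdot\nu_u\,d\mathcal H^{N-1}=-\int_\Omega\big(\varphi'(u)\nabla u\cdot\Phi+\varphi(u)\operatorname{div}\Phi\big)\,dx.$$
According to the definition of the image of a measure, the left-hand side above is nothing but $\int_{\Omega\times\mathbb R}\varphi(s)\Phi(x)\cdot d\mu(x,s)$. Finally, since $u^+>u^-$ on $S_u$, the sets $\gamma^+(S_u)$ and $\gamma^-(S_u)$ are disjoint, and by injectivity of $\gamma^+$ and $\gamma^-$ one has
$$\begin{aligned}|\mu|&=\big|\gamma^+_\#(\nu_u\mathcal H^{N-1}\lfloor S_u)-\gamma^-_\#(\nu_u\mathcal H^{N-1}\lfloor S_u)\big|=\big|\gamma^+_\#(\nu_u\mathcal H^{N-1}\lfloor S_u)\big|+\big|\gamma^-_\#(\nu_u\mathcal H^{N-1}\lfloor S_u)\big|\\&=\gamma^+_\#(\mathcal H^{N-1}\lfloor S_u)+\gamma^-_\#(\mathcal H^{N-1}\lfloor S_u).\end{aligned}$$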
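Hence $|\mu|(\Omega\times\mathbb R)=2\mathcal H^{N-1}(S_u)$.

The proof of the converse implication proceeds in three steps.

First step. We establish $a(x_0)=\nabla u(x_0)$ for a.e. $x_0$ in $\Omega$. Let us fix $x_0\in\Omega$ such that
$$a_\rho(y):=a(x_0+\rho y)\to a(x_0)\ \text{strongly in }L^1(B),\qquad u_\rho\to\nabla u(x_0)\cdot y\ \text{strongly in }L^1(B),\qquad\lim_{\rho\to0}\frac{1}{\rho^{N-1}}|\mu|(B_\rho(x_0)\times\mathbb R)=0,$$
where $u_\rho$ denotes the rescaled function
$$u_\rho(y):=\frac{u(x_0+\rho y)-u(x_0)}{\rho},$$
and $B_\rho(x_0)$ and $B$ denote, respectively, the open ball in $\mathbb R^N$ with radius $\rho$ centered at $x_0$ and the unit open ball in $\mathbb R^N$ centered at $0$. Let us justify the possible choice of such an $x_0$. The first property holds as soon as $x_0$ is a Lebesgue point of $a$. According to Proposition 10.4.1, the second property is satisfied a.e. in $\Omega$. Finally, denoting by $\pi$ the projection from $\Omega\times\mathbb R$ onto $\Omega$ and by $\pi_\#|\mu|$ the image of the measure $|\mu|$ by $\pi$, we have
$$\frac{1}{\rho^{N-1}}|\mu|(B_\rho(x_0)\times\mathbb R)=\rho\cdot\frac{1}{\rho^N}\,\pi_\#|\mu|(B_\rho(x_0)),$$
and the limit $\lim_{\rho\to0}\rho^{-N}\pi_\#|\mu|(B_\rho(x_0))$ exists and is finite for almost every $x_0$ in $\Omega$: up to a positive multiplicative constant, it is equal to the density of the absolutely continuous part of $\pi_\#|\mu|$ in its Lebesgue–Nikodym decomposition. Hence the third property also holds a.e. In what follows, $x_0$ is a fixed element of $\Omega$ where these three properties are satisfied.

Applying condition (10.43) to the function $\Phi$ defined by $\Phi(x):=\tilde\Phi\big(\frac{x-x_0}{\rho}\big)$, where $\tilde\Phi\in C^1_c(B,\mathbb R^N)$, one obtains
$$-\int_{B_\rho(x_0)}\frac1\rho\operatorname{div}\tilde\Phi\Big(\frac{x-x_0}{\rho}\Big)\varphi(u)\,dx-\int_{B_\rho(x_0)}\varphi'(u)\,\tilde\Phi\Big(\frac{x-x_0}{\rho}\Big)\cdot a\,dx=\int_{\Omega\times\mathbb R}\varphi(s)\Phi(x)\cdot d\mu,$$
and the change of scale $y=\frac{x-x_0}{\rho}$ gives
$$-\int_B\Big(\operatorname{div}\tilde\Phi(y)\,\varphi(u(x_0+\rho y))+\rho\,\varphi'(u(x_0+\rho y))\,\tilde\Phi(y)\cdot a_\rho(y)\Big)dy=\frac{1}{\rho^{N-1}}\int_{\Omega\times\mathbb R}\varphi(s)\Phi(x)\cdot d\mu.$$
Testing this equality with the function $\varphi$ defined by $\varphi(s):=\gamma\big(\frac{s-u(x_0)}{\rho}\big)$, with $\gamma\in C^1_0(\mathbb R)$, one obtains
$$-\int_B\Big(\operatorname{div}\tilde\Phi(y)\,\gamma(u_\rho)+\gamma'(u_\rho)\,\tilde\Phi(y)\cdot a_\rho(y)\Big)dy=\frac{1}{\rho^{N-1}}\int_{\Omega\times\mathbb R}\varphi(s)\Phi(x)\cdot d\mu,$$
and, letting $\rho\to0$ (the right-hand side is bounded by $\|\gamma\|_\infty\|\tilde\Phi\|_\infty\,\rho^{1-N}|\mu|(B_\rho(x_0)\times\mathbb R)$, which tends to $0$),
$$\int_B\Big(\operatorname{div}\tilde\Phi(y)\,\gamma(\nabla u(x_0)\cdot y)+\gamma'(\nabla u(x_0)\cdot y)\,\tilde\Phi(y)\cdot a(x_0)\Big)dy=0.$$
Since
$$\int_B\Big(\operatorname{div}\tilde\Phi(y)\,\gamma(\nabla u(x_0)\cdot y)+\gamma'(\nabla u(x_0)\cdot y)\,\tilde\Phi(y)\cdot\nabla u(x_0)\Big)dy=\int_B\operatorname{div}\big(\gamma(\nabla u(x_0)\cdot y)\,\tilde\Phi(y)\big)\,dy=0,$$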
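we deduce
$$\big(a(x_0)-\nabla u(x_0)\big)\cdot\int_B\gamma'(\nabla u(x_0)\cdot y)\,\tilde\Phi(y)\,dy=0.$$
The choice of $\tilde\Phi$ and $\gamma$ being arbitrary, we deduce $a(x_0)=\nabla u(x_0)$, and the proof of the first step is complete.

Second step. We establish $Cu=0$. From Proposition 10.5.1, for all $\varphi$ in $C^1_0(\mathbb R)$, $\varphi\circ u\in BV(\Omega)$, so that for all $\Phi$ in $C^1_c(\Omega,\mathbb R^N)$ one has
$$\int_\Omega\big(\varphi(u)\operatorname{div}\Phi+\varphi'(u)\nabla u\cdot\Phi\big)dx=-\int_{S_u}\big(\varphi(u^+)-\varphi(u^-)\big)\Phi\cdot\nu_u\,d\mathcal H^{N-1}-\int_\Omega\varphi'(u)\,\Phi\cdot dCu,$$
and condition (10.43), with $a=\nabla u$, yields
$$\int_\Omega\varphi'(u)\,\Phi\cdot dCu=\int_{\Omega\times\mathbb R}\varphi(s)\Phi(x)\cdot\mu(dx,ds)-\int_{S_u}\big(\varphi(u^+)-\varphi(u^-)\big)\Phi\cdot\nu_u\,d\mathcal H^{N-1}.\tag{10.44}$$
We now focus on a careful analysis of the measure $\varphi'(u)\,Cu$. Let us apply the slicing Theorem 4.2.4 to the measure $\mu$. Let $\tau_i$ denote the density of $\mu_i$ with respect to $|\mu_i|$, where $(\mu_i)_{1\le i\le N}$ is the family of components of $\mu$. According to Theorem 4.2.4, one has
$$\int_{\Omega\times\mathbb R}\varphi(s)\Phi_i(x)\,\mu_i(dx,ds)=\int_\Omega\Phi_i(x)\Big(\int_{\mathbb R}\varphi(s)\tau_i(x,s)\,\theta_x(ds)\Big)\sigma_i(dx),$$
where $\sigma_i$ is the image of the measure $|\mu_i|$ by the projection of $\Omega\times\mathbb R$ on $\Omega$ and $(\theta_x)_{x\in\Omega}$ is a family of probability measures on $\mathbb R$. Then (10.44) yields, for $i=1,\dots,N$,
$$\varphi'(u)\,C_iu=\Big(\int_{\mathbb R}\varphi(s)\tau_i(\cdot,s)\,\theta_\cdot(ds)\Big)\sigma_i-\big(\varphi(u^+)-\varphi(u^-)\big)\nu_{u,i}\,\mathcal H^{N-1}\lfloor S_u,\tag{10.45}$$
and finally, since $Cu$ and $\mathcal H^{N-1}\lfloor S_u$ are mutually singular,
$$\varphi'(u)\,C_iu=\Big(\int_{\mathbb R}\varphi(s)\tau_i(\cdot,s)\,\theta_\cdot(ds)\Big)\lambda_i\tag{10.46}$$
with $\lambda_i:=\sigma_i\lfloor(\Omega\setminus S_u)$. To complete the proof, the idea is to express (10.46) as a pointwise identity. Let $b$ and $c$ be the densities of, respectively, $C_iu$ and $\lambda_i$ with respect to the measure $\alpha:=|C_iu|+\lambda_i$; equality (10.46) yields, for $\alpha$-a.e. $x$ in $\Omega$,
$$b(x)\,(\varphi'\circ u)(x)=c(x)\int_{\mathbb R}\varphi(s)\tau_i(x,s)\,\theta_x(ds).$$
More precisely, for each $\varphi$ there exists a Borel set $N_\varphi$ with $\alpha(N_\varphi)=0$ such that the above equality holds for all $x$ in $\Omega\setminus N_\varphi$. Since $C^1_0(\mathbb R)$ possesses a dense countable subset $\mathcal D$, it also holds for all $x$ in $\Omega'=\Omega\setminus\bigcup_{\varphi\in\mathcal D}N_\varphi$ and all $\varphi\in C^1_0(\mathbb R)$. Let then $x\in\Omega'$ and assume that $b(x)\neq0$. We deduce
$$(\varphi'\circ u)(x)=\frac{c(x)}{b(x)}\int_{\mathbb R}\varphi(s)\tau_i(x,s)\,\theta_x(ds)\qquad\forall\varphi\in C^1_0(\mathbb R).$$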
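Equality between the two linear forms $\varphi\mapsto(\varphi'\circ u)(x)$ and $\varphi\mapsto\frac{c(x)}{b(x)}\int_{\mathbb R}\varphi(s)\tau_i(x,s)\,\theta_x(ds)$ brings a contradiction: the second one is continuous with respect to the uniform norm on $C^1_0(\mathbb R)$, whereas the first one is not. Consequently, $b(x)=0$ and $C_iu=b\,\alpha=0$, which ends the proof of the second step.

Last step. It remains to establish $\mu=\gamma^+_\#(\nu_u\mathcal H^{N-1}\lfloor S_u)-\gamma^-_\#(\nu_u\mathcal H^{N-1}\lfloor S_u)$. According to the previous step, equality (10.45) now becomes
$$\Big(\int_{\mathbb R}\varphi(s)\tau_i(\cdot,s)\,\theta_\cdot(ds)\Big)\sigma_i=\big(\varphi(u^+)-\varphi(u^-)\big)\nu_{u,i}\,\mathcal H^{N-1}\lfloor S_u.$$
Let $\beta:=\sigma_i+\mathcal H^{N-1}\lfloor S_u$, and let now $b$ and $c$ denote the densities of $\mathcal H^{N-1}\lfloor S_u$ and $\sigma_i$ with respect to $\beta$. Arguing as in the second step, there exists $\mathcal N$ with $\beta(\mathcal N)=0$ such that, for all $x\in\Omega\setminus\mathcal N$ and all $\varphi\in C^1_0(\mathbb R)$,
$$c(x)\int_{\mathbb R}\varphi(s)\tau_i(x,s)\,\theta_x(ds)=b(x)\big(\varphi(u^+(x))-\varphi(u^-(x))\big)\nu_{u,i}(x).$$
Thus, for all $x$ in $\Omega\setminus\mathcal N$,
$$c(x)\,\tau_i(x,\cdot)\,\theta_x=b(x)\,\nu_{u,i}(x)\big(\delta_{u^+(x)}-\delta_{u^-(x)}\big).$$
For every bounded Borel function $f:\Omega\times\mathbb R\to\mathbb R$, we now obtain
$$\begin{aligned}\int_{\Omega\times\mathbb R}f(x,s)\,d\mu_i(x,s)&=\int_\Omega\int_{\mathbb R}f(x,s)\tau_i(x,s)\,\theta_x(ds)\,\sigma_i(dx)=\int_\Omega\int_{\mathbb R}f(x,s)\tau_i(x,s)\,\theta_x(ds)\,c(x)\,\beta(dx)\\&=\int_{S_u}\big(f(x,u^+(x))-f(x,u^-(x))\big)\nu_{u,i}(x)\,d\mathcal H^{N-1}(x),\end{aligned}$$
and the proof is complete.

We now state, without proof, another criterion for a function $u$ in $L^\infty(\Omega)$ to belong to $SBV(\Omega)$. This criterion concerns the restrictions of $u$ to the one-dimensional slices of $\Omega$. For all $\nu\in S^{N-1}$, let us define
$$\pi_\nu=\{x\in\mathbb R^N: x\cdot\nu=0\},\qquad\Omega_x=\{t\in\mathbb R: x+t\nu\in\Omega\}\ \ (x\in\pi_\nu),\qquad\Omega_\nu=\{x\in\pi_\nu:\Omega_x\neq\emptyset\}.$$
For every Borel function $u:\Omega\to\mathbb R$ and every $x$ in $\Omega_\nu$, we define the Borel function $u_x$ on $\Omega_x$ by $u_x(t)=u(x+t\nu)$. For a proof of the theorem below, consult Braides [82].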
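Identity (10.43) of Theorem 10.5.1 can be tested numerically in the simplest one-dimensional situation. The sketch below is an illustration under explicit assumptions. Take $\Omega=(0,1)$, $u(x)=x$ on $(0,1/2)$ and $u(x)=x+1$ on $(1/2,1)$. Then $a=u'=1$, $S_u=\{1/2\}$, $u^-=1/2$, $u^+=3/2$, $\nu_u=1$, and $\mu=\delta_{(1/2,3/2)}-\delta_{(1/2,1/2)}$, whose total variation $2=2\mathcal H^0(S_u)$ agrees with the theorem. The test functions $\varphi(s)=e^{-s^2}$ and $\psi(x)=\sin^2(\pi x)$ (playing the role of $\Phi$ with $N=1$) are hypothetical choices. $\psi$ is not compactly supported, but it vanishes at $0$ and $1$, which is all the integration by parts needs here.

```python
import math

# Hypothetical 1D data: jump set S_u = {1/2}, one-sided traces u^- = 1/2 and u^+ = 3/2.
X0, U_MINUS, U_PLUS = 0.5, 0.5, 1.5

def u(x):
    return x if x < X0 else x + 1.0

def phi(s):    # smooth test function of s, decaying at infinity
    return math.exp(-s * s)

def dphi(s):
    return -2.0 * s * math.exp(-s * s)

def psi(x):    # test function vanishing at x = 0 and x = 1
    return math.sin(math.pi * x) ** 2

def dpsi(x):
    return math.pi * math.sin(2.0 * math.pi * x)

def lhs():
    """Integral of phi(s) psi(x) against mu = delta_(1/2, u^+) - delta_(1/2, u^-)."""
    return psi(X0) * (phi(U_PLUS) - phi(U_MINUS))

def rhs(n=20000):
    """-int_0^1 [phi'(u) a psi + phi(u) psi'] dx with a = 1, midpoint rule on each side of the jump."""
    total = 0.0
    for left, right in ((0.0, X0), (X0, 1.0)):
        h = (right - left) / n
        for k in range(n):
            x = left + (k + 0.5) * h
            total += (dphi(u(x)) * 1.0 * psi(x) + phi(u(x)) * dpsi(x)) * h
    return -total

print(lhs(), rhs())
```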
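Theorem 10.5.2. Let $u$ be a given function in $L^\infty(\Omega)$ such that, for all $\nu\in S^{N-1}$,
(i) $u_x\in SBV(\Omega_x)$ for $\mathcal H^{N-1}$-a.e. $x\in\Omega_\nu$;
(ii) $\displaystyle\int_{\Omega_\nu}\Big(\int_{\Omega_x}|\nabla u_x|\,dt+\mathcal H^0(S_{u_x})\Big)\mathcal H^{N-1}(dx)<+\infty$.
Then $u$ belongs to $SBV(\Omega)$. Conversely, if $u$ belongs to $SBV(\Omega)\cap L^\infty(\Omega)$, conditions (i) and (ii) are satisfied for all $\nu$ in $S^{N-1}$. Moreover, for $\mathcal H^{N-1}$-a.e. $x$ in $\Omega_\nu$, $\nabla u(x+t\nu)\cdot\nu=\nabla u_x(t)$ for a.e. $t\in\Omega_x$, and
$$\int_{\Omega_\nu}\mathcal H^0(S_{u_x})\,\mathcal H^{N-1}(dx)=\int_{S_u}|\nu_u\cdot\nu|\,d\mathcal H^{N-1}.$$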
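The last identity of Theorem 10.5.2 can be checked on a simple example. The sketch below is illustrative only. It assumes $N=2$ and takes $u=\chi_{B_r}$, the characteristic function of the disk of radius $r=1$ centered at the origin. Then $S_u$ is the circle and $u\in SBV\cap L^\infty$ of any square containing it. The direction $\nu$ is an arbitrary choice. The left-hand side counts the jump points of $u_x$ slice by slice; the right-hand side integrates $|\nu_u\cdot\nu|$ along the circle. Both sides equal $4r$. All function names are hypothetical.

```python
import math

def in_disk(p, r):
    return p[0] ** 2 + p[1] ** 2 < r * r

def jump_count_on_slice(c, nu, r, half_length, n):
    """H^0(S_{u_x}) on the line {c * nu_perp + t * nu}, counted as value changes on a t-grid."""
    perp = (-nu[1], nu[0])
    count, prev = 0, None
    for k in range(n + 1):
        t = -half_length + 2.0 * half_length * k / n
        val = in_disk((c * perp[0] + t * nu[0], c * perp[1] + t * nu[1]), r)
        if prev is not None and val != prev:
            count += 1
        prev = val
    return count

def slice_side(nu, r=1.0, half_length=2.0, m=400, n=2000):
    """Midpoint rule over pi_nu of the number of jump points of each slice."""
    dc = 2.0 * half_length / m
    return dc * sum(jump_count_on_slice(-half_length + (j + 0.5) * dc, nu, r, half_length, n)
                    for j in range(m))

def boundary_side(nu, r=1.0, m=100_000):
    """Midpoint rule for the integral over the circle of |nu_u . nu|, nu_u = (cos t, sin t)."""
    dth = 2.0 * math.pi / m
    return r * dth * sum(abs(math.cos((j + 0.5) * dth) * nu[0] + math.sin((j + 0.5) * dth) * nu[1])
                         for j in range(m))

nu = (math.cos(0.3), math.sin(0.3))
print(slice_side(nu), boundary_side(nu))  # both approximately 4.0
```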
Chapter 11
Relaxation in Sobolev, BV, and Young measures spaces
11.1 Relaxation in abstract metrizable spaces
This section is devoted to the description of the relaxation principle in a general metrizable space or, more generally, in a first countable topological space $X$. Roughly speaking, given an extended real-valued function $F:X\to\mathbb R\cup\{+\infty\}$, we wish to apply the direct method of the calculus of variations to the lower semicontinuous envelope $\mathrm{cl}(F)$ of $F$, so that $\inf_XF=\min_X\mathrm{cl}(F)$. Such a procedure is very important in various applications and leads to the concept of generalized solutions for the optimization problem $\inf_XF$. We begin by giving some complements on the sequential version, denoted by $\overline F$, of the general notion of lower semicontinuous envelope $\mathrm{cl}(F)$ introduced in Chapter 3.

Proposition 11.1.1. Let $F:X\to\mathbb R\cup\{+\infty\}$ be a proper extended real-valued function defined on a metrizable space $(X,d)$ or, more generally, on a first countable topological space, and let us define the extended real-valued function $\overline F:X\to\overline{\mathbb R}$ by
$$\overline F(x):=\inf\Big\{\liminf_{n\to+\infty}F(x_n):(x_n)_{n\in\mathbb N},\ x=\lim_{n\to+\infty}x_n\Big\}.\tag{11.1}$$
Then the function $\overline F$ is characterized, for every $x$ in $X$, by the following two assertions:
(i) for every sequence $(x_n)_{n\in\mathbb N}$ such that $x_n\to x$, $\overline F(x)\le\liminf_{n\to+\infty}F(x_n)$;
(ii) there exists a sequence $(y_n)_{n\in\mathbb N}$ in $X$ such that $y_n\to x$ and $\overline F(x)\ge\limsup_{n\to+\infty}F(y_n)$.

Proof. Note that, trivially, the system of assertions (i) and (ii) is equivalent to (i) and
(ii)′ there exists a sequence $(y_n)_{n\in\mathbb N}$ in $X$ such that $y_n\to x$ and $\overline F(x)=\lim_{n\to+\infty}F(y_n)$;
and each function $\overline F$ satisfying (i) and (ii)′ automatically satisfies
$$\overline F(x)=\inf\Big\{\liminf_{n\to+\infty}F(x_n):(x_n)_{n\in\mathbb N},\ x=\lim_{n\to+\infty}x_n\Big\}.$$
We are thus reduced to establishing that the function $\overline F$ defined by formula (11.1) satisfies (i) and (ii)′. We establish only the nontrivial assertion (ii)′. Its proof is based on the following diagonalization lemma.

Lemma 11.1.1. Let $(a_{m,n})_{(m,n)\in\mathbb N\times\mathbb N}$ be a sequence in a first countable topological space $X$ such that
(i) $\lim_{n\to+\infty}a_{m,n}=a_m$;
(ii) $\lim_{m\to+\infty}a_m=a$.
Then there exists a nondecreasing map $n\mapsto m(n)$ from $\mathbb N$ into $\mathbb N$ such that $\lim_{n\to+\infty}a_{m(n),n}=a$.
For a proof and other diagonalization results, see [28]. Note that under the same conditions we have the following more classical diagonalization procedure: there exists an increasing map $m\mapsto n(m)$ such that $\lim_{m\to+\infty}a_{m,n(m)}=a$. This second result could be applied to prove Proposition 11.1.1, but we prefer to use Lemma 11.1.1, which will turn out to be the fitting tool for establishing a similar proposition in the context of $\Gamma$-convergence (see the next chapter).

Let us go back to the proof of Proposition 11.1.1. By definition of the infimum, for all $m\in\mathbb N^*$ there exists a sequence $(x_{m,n})_{n\in\mathbb N}$ in $X$ satisfying $\lim_{n\to+\infty}x_{m,n}=x$ and such that
$$\begin{cases}\overline F(x)\le\liminf_{n\to+\infty}F(x_{m,n})\le\overline F(x)+\frac1m&\text{if }\overline F(x)\neq-\infty,\\[2pt]\liminf_{n\to+\infty}F(x_{m,n})\le-m&\text{if }\overline F(x)=-\infty.\end{cases}$$
Therefore $\lim_{m\to+\infty}\lim_{n\to+\infty}F(x_{m,\sigma_m(n)})=\overline F(x)$, where $\sigma_m:\mathbb N\to\mathbb N$ is an increasing map, possibly depending on $m$, such that
$$\liminf_{n\to+\infty}F(x_{m,n})=\lim_{n\to+\infty}F(x_{m,\sigma_m(n)}).$$
We end the proof by applying Lemma 11.1.1 to the sequence $\big((x_{m,\sigma_m(n)},F(x_{m,\sigma_m(n)}))\big)_{(m,n)\in\mathbb N^2}$ in the first countable space $X\times\overline{\mathbb R}$: there exists $n\mapsto m(n)$ mapping $\mathbb N$ into $\mathbb N$ such that
$$\lim_{n\to+\infty}F(x_{m(n),\sigma_{m(n)}(n)})=\lim_{m\to+\infty}\lim_{n\to+\infty}F(x_{m,\sigma_m(n)})=\overline F(x),\qquad\lim_{n\to+\infty}x_{m(n),\sigma_{m(n)}(n)}=\lim_{m\to+\infty}\lim_{n\to+\infty}x_{m,\sigma_m(n)}=x.$$
The sequence $(y_n)_{n\in\mathbb N}$ defined by $y_n=x_{m(n),\sigma_{m(n)}(n)}$ fulfills assertion (ii)′.
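The construction behind Lemma 11.1.1 can be made concrete on a real double sequence. The sketch below uses the hypothetical example $a_{m,n}=\frac1m+\frac mn$. Here $a_m=\frac1m$ and $a=0$, while the naive diagonal $a_{n,n}=1+\frac1n$ tends to $1$. It follows one standard construction, which may differ from the proof given in [28]. First choose increasing thresholds $N_m$ with $|a_{m,n}-a_m|<\frac1m$ for $n\ge N_m$; in this example $N_m=m^2+1$. Then set $m(n)=\max\{m:N_m\le n\}$. The resulting $m(n)$ is nondecreasing and tends to $+\infty$, and in this example $|a_{m(n),n}-a|<\frac2{m(n)}\to0$.

```python
def a(m, n):
    """Hypothetical double sequence: a_{m,n} -> a_m = 1/m as n -> oo, and a_m -> a = 0."""
    return 1.0 / m + m / n

def threshold(m):
    """N_m = m^2 + 1: for n >= N_m, |a_{m,n} - a_m| = m/n < 1/m (specific to this example)."""
    return m * m + 1

def m_of_n(n):
    """m(n) = max{m >= 1 : N_m <= n}, with m(n) = 1 if no such m exists; nondecreasing in n."""
    m = 1
    while threshold(m + 1) <= n:
        m += 1
    return m

for n in (10, 10**3, 10**6):
    # slowly growing index m(n) versus the naive diagonal a_{n,n}
    print(n, m_of_n(n), a(m_of_n(n), n), a(n, n))
```

The point of the slowly growing index $m(n)$ is that $n$ must stay beyond the threshold $N_{m(n)}$. The plain diagonal $m(n)=n$ fails precisely because it ignores this constraint.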
Theorem 11.1.1. The function $\overline F$ defined in Proposition 11.1.1 is the lower semicontinuous (lsc) envelope $\mathrm{cl}(F)$ of the function $F$, i.e., the greatest lower semicontinuous function less than or equal to $F$.

Proof. We must establish
$$\overline F\le F;\qquad\overline F\ \text{lsc};\qquad G:X\to\overline{\mathbb R},\ G\ \text{lsc and}\ G\le F\ \Rightarrow\ G\le\overline F.$$
For the first assertion, take the constant sequence $(x_n)_{n\in\mathbb N}=(x)_{n\in\mathbb N}$ in formula (11.1). Let us prove the second assertion. Let $(y_m)_{m\in\mathbb N}$ be a sequence in $X$ converging to $y\in X$, and consider a subsequence $(y_{\sigma(m)})_{m\in\mathbb N}$ satisfying $\lim_{m\to+\infty}\overline F(y_{\sigma(m)})=\liminf_{m\to+\infty}\overline F(y_m)$. According to Proposition 11.1.1, for every $m$ there exists a sequence $(y_{\sigma(m),n})_{n\in\mathbb N}$ in $X$ satisfying $\lim_{n\to+\infty}y_{\sigma(m),n}=y_{\sigma(m)}$ and $\overline F(y_{\sigma(m)})=\lim_{n\to+\infty}F(y_{\sigma(m),n})$, so that
$$\liminf_{m\to+\infty}\overline F(y_m)=\lim_{m\to+\infty}\overline F(y_{\sigma(m)})=\lim_{m\to+\infty}\lim_{n\to+\infty}F(y_{\sigma(m),n}).$$
On the other hand, $\lim_{m\to+\infty}\lim_{n\to+\infty}y_{\sigma(m),n}=y$. Applying the diagonalization lemma, Lemma 11.1.1, to the sequence $\big((y_{\sigma(m),n},F(y_{\sigma(m),n}))\big)_{(m,n)\in\mathbb N^2}$ in the space $X\times\overline{\mathbb R}$, there exists $n\mapsto m(n)$ mapping $\mathbb N$ to $\mathbb N$ such that
$$\lim_{n\to+\infty}F(y_{\sigma(m(n)),n})=\liminf_{m\to+\infty}\overline F(y_m),\qquad\lim_{n\to+\infty}y_{\sigma(m(n)),n}=y.$$
Hence
$$\liminf_{m\to+\infty}\overline F(y_m)=\lim_{n\to+\infty}F(y_{\sigma(m(n)),n})\ge\inf\Big\{\liminf_{n\to+\infty}F(x_n):(x_n)_{n\in\mathbb N},\ y=\lim_{n\to+\infty}x_n\Big\}=\overline F(y).$$
We now establish the third assertion. Let $G\le F$ be an lsc function mapping $X$ into $\overline{\mathbb R}$ and, for every $x$ in $X$, consider any sequence $(x_n)_{n\in\mathbb N}$ in $X$ converging to $x$. We have
$$G(x)\le\liminf_{n\to+\infty}G(x_n)\le\liminf_{n\to+\infty}F(x_n).$$
Taking the infimum over all sequences $(x_n)_{n\in\mathbb N}$ converging to $x$, we finally obtain $G(x)\le\overline F(x)$.

From now on, we write $\overline F$ or $\mathrm{cl}(F)$ interchangeably when $X$ is metrizable or first countable. Note, however, that the two notions differ when $X$ is a general topological space. In the following theorem, we state the abstract relaxation principle in first countable topological spaces. (One could also apply Theorem 3.2.3 and Theorem 11.1.1 to obtain it.)
Theorem 11.1.2. Let $F:X\to\mathbb R\cup\{+\infty\}$ be a proper extended real-valued function defined on a metrizable space $(X,d)$ or, more generally, on a first countable topological space, and assume that there exists a minimizing sequence $(x_n)_{n\in\mathbb N}$ (i.e., $\lim_{n\to+\infty}F(x_n)=\inf_XF$) such that $S=\{x_n:n\in\mathbb N\}$ is relatively compact in $X$. Then
(i) $\inf_XF=\min_X\mathrm{cl}(F)$;
(ii) every cluster point $\bar x$ of the sequence $(x_n)_{n\in\mathbb N}$ is a solution of $\min_X\mathrm{cl}(F)$, i.e., $\mathrm{cl}(F)(\bar x)=\min_X\mathrm{cl}(F)$.

Proof. Let $\bar x$ be any cluster point of $(x_n)_{n\in\mathbb N}$ and $(x_{\sigma(n)})_{n\in\mathbb N}$ a subsequence converging to $\bar x$. According to Proposition 11.1.1(i), we have
$$\mathrm{cl}(F)(\bar x)\le\liminf_{n\to+\infty}F(x_{\sigma(n)})=\inf_XF.\tag{11.2}$$
Let now $x$ be any element of $X$. From Proposition 11.1.1(ii), there exists a sequence $(y_n)_{n\in\mathbb N}$ converging to $x$ and satisfying
$$\mathrm{cl}(F)(x)\ge\limsup_{n\to+\infty}F(y_n).\tag{11.3}$$
Combining (11.2) and (11.3), we obtain
$$\mathrm{cl}(F)(\bar x)\le\liminf_{n\to+\infty}F(x_{\sigma(n)})=\inf_XF\le\limsup_{n\to+\infty}F(y_n)\le\mathrm{cl}(F)(x).\tag{11.4}$$
This proves that $\mathrm{cl}(F)(\bar x)=\min_X\mathrm{cl}(F)$. Taking now $x=\bar x$ in (11.4), we obtain
$$\inf_XF=\min_X\mathrm{cl}(F)=\mathrm{cl}(F)(\bar x),$$
and the proof is complete.

In the terminology of relaxation theory, the problem
$$(\overline{\mathcal P}):\quad\min_X\mathrm{cl}(F)$$
is called the relaxed problem of the optimization problem
$$(\mathcal P):\quad\inf_XF.$$
A solution of $(\overline{\mathcal P})$ is sometimes called a generalized solution of the initial problem $(\mathcal P)$. The relaxation procedure consists in making explicit the lsc envelope of the functional $F$ for a suitable topology on the space $X$, so as to obtain a well-posed problem $(\overline{\mathcal P})$ in the sense of Theorem 11.1.2 (i.e., $(\overline{\mathcal P})$ admits an optimal solution).
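A one-dimensional toy example illustrates Theorem 11.1.2 on $X=\mathbb R$. Let $F(x)=x^2$ for $x\neq0$ and $F(0)=1$; this $F$ is a hypothetical choice made for illustration. Then $\inf_XF=0$ is not attained. The minimizing sequence $x_n=1/n$ is relatively compact with cluster point $\bar x=0$, and $\mathrm{cl}(F)(x)=x^2$, so $(\overline{\mathcal P})$ is solved by $\bar x=0$. In a metric space, $\mathrm{cl}(F)(x)=\lim_{\delta\to0}\inf_{|y-x|\le\delta}F(y)$. The sketch approximates this limit by sampling a small ball, which is only a numerical stand-in for the exact envelope.

```python
def F(x):
    """Toy objective: F(x) = x^2 for x != 0 and F(0) = 1, so inf F = 0 is not attained."""
    return 1.0 if x == 0 else x * x

def cl_F(x, delta=1e-6, samples=200):
    """Approximate cl(F)(x) = lim_{delta->0} inf_{|y-x|<=delta} F(y) by sampling the ball."""
    ys = [x + delta * (2.0 * k / samples - 1.0) for k in range(samples + 1)]
    return min(F(y) for y in ys)

minimizing_sequence = [1.0 / n for n in range(1, 10_001)]
inf_estimate = min(F(x) for x in minimizing_sequence)  # decreases to inf F = 0
x_bar = 0.0                                             # cluster point of x_n = 1/n
print(F(x_bar), cl_F(x_bar), inf_estimate)
```

The original problem has no solution at $\bar x$, since $F(\bar x)=1$. The relaxed problem attains its minimum $0=\inf_XF$ there, exactly as assertions (i) and (ii) of the theorem predict.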
11.2 Relaxation of integral functionals with domain $W^{1,p}(\Omega,\mathbb R^m)$, $p>1$
One of the fundamental hypotheses in elasticity theory is that the total free energy $F$ associated with many materials is of local nature. From a mathematical point of view, the functional $F$ can be represented as the integral, over the reference configuration $\Omega\subset\mathbb R^N$ ($N=3$), of a density depending on the deformation gradients of the body, which account for the local deformations. The other basic principle is that equilibrium configurations correspond to minimizers of $F$ under prescribed conditions in a Sobolev space $W^{1,p}(\Omega,\mathbb R^m)$ ($m=3$). The functional $F$ may fail to be lower semicontinuous. Indeed, to model the various solid/solid phase transformations in the microstructure, the energy density generally has a multiwell structure, and the corresponding optimization problems may have no solution. According to Section 11.1, a classical procedure is to replace $F$ by its lower semicontinuous envelope with respect to the weak topology of $W^{1,p}(\Omega,\mathbb R^m)$. The relaxed problem then has at least one solution, and its minimum value equals the infimum of the initial energy.

The case $p=1$ will be treated in the next section. We will see that, contrary to the case $p>1$, the domain of the lower semicontinuous envelope $\mathrm{cl}(F)$ of $F$ strictly contains the domain $W^{1,1}(\Omega,\mathbb R^m)$ of $F$. This domain is indeed the space $BV(\Omega,\mathbb R^m)$ of functions of bounded variation introduced in Chapter 10. This phenomenon is due to the lack of reflexivity of the Sobolev space $W^{1,1}(\Omega,\mathbb R^m)$.

This approach, which consists in relaxing $F$ with respect to the weak topology of $W^{1,p}(\Omega,\mathbb R^m)$ (or the strong topology of $L^p(\Omega,\mathbb R^m)$), is not the only one. Another idea consists in "enlarging" the space $W^{1,p}(\Omega,\mathbb R^m)$ of admissible functions and treating the problem in the space $\mathcal Y(\Omega;\mathbb M^{m\times N})$ of Young measures introduced in Section 4.3. Actually this is the same general procedure: the integral functionals are considered as living on the space $X=\mathcal Y(\Omega;\mathbb M^{m\times N})$ equipped with a metrizable topology for which compactness of minimizing sequences also holds, and one computes the lsc envelope of $F$ relative to this new space. The two relaxed problems are different, but there are important connections between them. This procedure will be described in detail in Section 11.4.

Let us now make precise the structure of the functional $F$. We consider a bounded open subset $\Omega$ of $\mathbb R^N$, sufficiently regular so that the trace theory, the Rellich–Kondrakov theorem (Theorem 5.4.2), and density arguments apply (take, for example, $\Omega$ of class $C^1$). We denote by $\mathbb M^{m\times N}$ the space of $m\times N$ matrices with real entries and consider a function $f:\mathbb M^{m\times N}\to\mathbb R$ such that there exist three positive constants $\alpha$, $\beta$, $L$ satisfying
$$\forall a\in\mathbb M^{m\times N},\quad\alpha|a|^p\le f(a)\le\beta(1+|a|^p),\tag{11.5}$$
$$\forall a,b\in\mathbb M^{m\times N},\quad|f(a)-f(b)|\le L|b-a|\big(1+|a|^{p-1}+|b|^{p-1}\big).\tag{11.6}$$
Let $W^{1,p}(\Omega,\mathbb R^m)$ be the space (isomorphic to $W^{1,p}(\Omega)^m$; see Chapter 5) made up of all functions $u:\Omega\to\mathbb R^m$ whose distributional gradient $\nabla u=\big(\frac{\partial u_i}{\partial x_j}\big)_{i=1,\dots,m,\ j=1,\dots,N}$ belongs to
$L^p(\Omega,\mathbb M^{m\times N})$. We define the functional $F:L^p(\Omega,\mathbb R^m)\to\mathbb R_+\cup\{+\infty\}$ by
$$F(u)=\begin{cases}\displaystyle\int_\Omega f(\nabla u)\,dx&\text{if }u\in W^{1,p}(\Omega,\mathbb R^m),\\+\infty&\text{otherwise,}\end{cases}\tag{11.7}$$
and we intend to compute its lsc envelope in the space $X=L^p(\Omega,\mathbb R^m)$ equipped with its strong topology. Actually, by classical arguments (the compactness of the embedding of $W^{1,p}(\Omega,\mathbb R^m)$ into $L^p(\Omega,\mathbb R^m)$ and the lower bound in (11.5)), one can easily establish that the lsc envelope of $F$ considered on $W^{1,p}(\Omega,\mathbb R^m)$ equipped with its weak topology coincides with the restriction to $W^{1,p}(\Omega,\mathbb R^m)$ of the lsc envelope of $F$ considered here. Proposition 11.2.1 states that relaxation does not enlarge the domain of $F$.

Proposition 11.2.1. The domain of the functional $\mathrm{cl}(F)$ is the space $W^{1,p}(\Omega,\mathbb R^m)$.

Proof. From the inequality $\mathrm{cl}(F)\le F$, we obviously obtain $W^{1,p}(\Omega,\mathbb R^m)\subset\mathrm{dom}(\mathrm{cl}(F))$. For the converse inclusion, let $u\in L^p(\Omega,\mathbb R^m)$ be such that $\mathrm{cl}(F)(u)<+\infty$, and consider a sequence $(u_n)_{n\in\mathbb N}$ strongly converging to $u$ in $L^p(\Omega,\mathbb R^m)$ and satisfying $\mathrm{cl}(F)(u)=\lim_{n\to+\infty}F(u_n)$; such a sequence exists by Proposition 11.1.1. According to the lower bound in (11.5) and to $\mathrm{cl}(F)(u)=\lim_{n\to+\infty}F(u_n)<+\infty$, one obtains
$$\sup_{n\in\mathbb N}\int_\Omega|\nabla u_n|^p\,dx<+\infty,$$
so that $(u_n)_{n\in\mathbb N}$, which is also bounded in $L^p(\Omega,\mathbb R^m)$, is bounded in $W^{1,p}(\Omega,\mathbb R^m)$. Since $W^{1,p}(\Omega,\mathbb R^m)$ is reflexive for $p>1$, and according to the Rellich–Kondrakov theorem (Theorem 5.4.2), there exist a subsequence (not relabeled) and $v\in W^{1,p}(\Omega,\mathbb R^m)$ such that
$$u_n\rightharpoonup v\ \text{weakly in }W^{1,p}(\Omega,\mathbb R^m),\qquad u_n\to v\ \text{strongly in }L^p(\Omega,\mathbb R^m).$$
Consequently, $u=v\in W^{1,p}(\Omega,\mathbb R^m)$.

In Theorem 11.2.1, we establish that the functional $\mathrm{cl}(F)$ possesses an integral representation. In the following proposition, also valid for $p=1$, we characterize the density of this integral functional. For every bounded Borel set $A$ of $\mathbb R^N$, we will sometimes denote its $N$-dimensional Lebesgue measure by $|A|$ rather than $\mathcal L^N(A)$.

Proposition 11.2.2 (quasi-convex envelope of $f$). Let $f:\mathbb M^{m\times N}\to\mathbb R_+$ satisfy, for some $p\ge1$ and all $a\in\mathbb M^{m\times N}$, the upper growth condition $0\le f(a)\le\beta(1+|a|^p)$ and the continuity assumption (11.6). Then for each fixed $a$ in $\mathbb M^{m\times N}$, the map
$$D\mapsto I_D:=\inf\Big\{\frac1{|D|}\int_Df(a+\nabla\phi(x))\,dx:\phi\in W^{1,p}_0(D,\mathbb R^m)\Big\}$$
is constant on the family of all open bounded subsets $D$ of $\mathbb R^N$ whose boundary satisfies $|\partial D|=0$; we denote its value by $Qf(a)$. If $f$ satisfies (11.5), the function $Qf:\mathbb M^{m\times N}\to\mathbb R_+$, defined for all $a\in\mathbb M^{m\times N}$ by
$$Qf(a)=\inf\Big\{\frac1{|D|}\int_Df(a+\nabla\phi(x))\,dx:\phi\in W^{1,p}_0(D,\mathbb R^m)\Big\},$$
satisfies the same conditions (11.5) and (11.6), the latter with a new constant $L'$ depending only on $L$, $\alpha$, $\beta$, and $p$. Moreover, $Qf$ is $W^{1,p}$-quasi-convex in the sense of Morrey (quasi-convex for short), namely, it satisfies the so-called quasi-convexity inequality: for every open bounded subset $D$ of $\mathbb R^N$ with $|\partial D|=0$,
$$\forall a\in\mathbb M^{m\times N},\ \forall\phi\in W^{1,p}_0(D,\mathbb R^m),\qquad Qf(a)\le\frac1{|D|}\int_DQf(a+\nabla\phi(x))\,dx.\tag{11.8}$$
Furthermore, $Qf$ is the greatest quasi-convex function less than or equal to $f$, also called the quasi-convexification or quasi-convex envelope of $f$.

Proof. (a) Let $D$ and $D'$ be two open bounded subsets of $\mathbb R^N$ with $|\partial D|=|\partial D'|=0$. To prove the first assertion, it suffices to establish $I_D\le I_{D'}$ and to exchange the roles of $D$ and $D'$. For every $\varepsilon>0$ there exists a finite family $(x_i+\varepsilon_iD')_{i\in I_\varepsilon}$ of pairwise disjoint sets $x_i+\varepsilon_iD'\subset D$, $\varepsilon_i>0$, satisfying
$$\Big|D\setminus\bigcup_{i\in I_\varepsilon}(x_i+\varepsilon_iD')\Big|<\varepsilon.\tag{11.9}$$
First, we claim that the map $I$ satisfies the following subadditivity property: if $A$ and $B$ are two disjoint bounded open subsets of $\mathbb R^N$, then
$$|A\cup B|\,I_{A\cup B}\le|A|\,I_A+|B|\,I_B.\tag{11.10}$$
Indeed, let $\phi_A$ and $\phi_B$ be two $\eta$-minimizers of $|A|I_A$ and $|B|I_B$ in $\mathcal D(A,\mathbb R^m)$ and $\mathcal D(B,\mathbb R^m)$, respectively, extended by $0$ on $\mathbb R^N\setminus A$ and $\mathbb R^N\setminus B$. We have
$$|A|\,I_A\ge\int_Af(a+\nabla\phi_A)\,dx-\eta,\qquad|B|\,I_B\ge\int_Bf(a+\nabla\phi_B)\,dx-\eta.$$
Such $\eta$-minimizers exist thanks to (11.6) and a density argument. The function $\phi$ which coincides with $\phi_A$ and $\phi_B$ on $A$ and $B$, respectively, belongs to $W^{1,p}_0(A\cup B,\mathbb R^m)$, so that
$$|A\cup B|\,I_{A\cup B}\le\int_{A\cup B}f(a+\nabla\phi)\,dx=\int_Af(a+\nabla\phi_A)\,dx+\int_Bf(a+\nabla\phi_B)\,dx\le|A|\,I_A+|B|\,I_B+2\eta.$$
The claim follows by letting $\eta\to0$. Using quite similar arguments, one also obtains
$$|A|\,I_A\le|A\setminus\overline B|\,I_{A\setminus\overline B}+|B|\,I_B\tag{11.11}$$
whenever $A$ and $B$ are two open bounded subsets of $\mathbb R^N$ with $B\subset A$ and $|\partial B|=0$.
Applying (11.10) to the finite union $\bigcup_{i\in I_\varepsilon}(x_i+\varepsilon_iD')$, and (11.11) to $A=D$, $B=\bigcup_{i\in I_\varepsilon}(x_i+\varepsilon_iD')$, according to (11.9) and to the upper growth condition on $f$, we obtain
$$\Big|\bigcup_{i\in I_\varepsilon}(x_i+\varepsilon_iD')\Big|\,I_{\bigcup_{i\in I_\varepsilon}(x_i+\varepsilon_iD')}\le\sum_{i\in I_\varepsilon}|\varepsilon_iD'|\,I_{x_i+\varepsilon_iD'},\tag{11.12}$$
$$|D|\,I_D\le\beta(1+|a|^p)\,\varepsilon+\Big|\bigcup_{i\in I_\varepsilon}(x_i+\varepsilon_iD')\Big|\,I_{\bigcup_{i\in I_\varepsilon}(x_i+\varepsilon_iD')}.\tag{11.13}$$
A change of scale easily gives $I_{x_i+\varepsilon_iD'}=I_{D'}$. Combining (11.12) and (11.13), one obtains
$$I_D\le\frac{\beta(1+|a|^p)\,\varepsilon}{|D|}+I_{D'}.$$
Since $\varepsilon$ is arbitrary, we have established $I_D\le I_{D'}$. It is worth noticing that the above subadditivity argument is a particular case of a more general result related to subadditive ergodic processes (see Krengel [165], Dal Maso and Modica [119], or Licht and Michaille [170]). Such a general argument will be used in Chapter 12.

(b) We assume that $f$ satisfies (11.5) and show that $Qf$ satisfies the same conditions. Taking $D=Y=(0,1)^N$ in the definition of $Qf$, from (11.5) we have
$$\begin{aligned}Qf(a)&\ge\alpha\inf\Big\{\int_Y|a+\nabla\phi|^p\,dx:\phi\in W^{1,p}_0(Y,\mathbb R^m)\Big\}\ge\alpha\inf\Big\{\Big|\int_Y(a+\nabla\phi)\,dx\Big|^p:\phi\in W^{1,p}_0(Y,\mathbb R^m)\Big\}\\&=\alpha|a|^p.\end{aligned}$$
We have used Jensen's inequality for the convex function $a\mapsto|a|^p$ in the second inequality above, and $\int_Y\nabla\phi\,dx=0$ for $\phi\in W^{1,p}_0(Y,\mathbb R^m)$ in the last equality. The upper bound in (11.5) is trivially obtained by taking $\phi=0$ as an admissible function in the infimum and using the upper bound satisfied by $f$. Let us now establish (11.6). Given arbitrary $\eta>0$, let $\phi_\eta\in W^{1,p}_0(Y,\mathbb R^m)$ be such that
$$Qf(b)\ge\int_Yf(b+\nabla\phi_\eta)\,dx-\eta.$$
From (11.6) and Hölder's inequality, we obtain
$$\begin{aligned}Qf(a)-Qf(b)&\le\int_Yf(a+\nabla\phi_\eta)\,dx-\int_Yf(b+\nabla\phi_\eta)\,dx+\eta\le\int_Y|f(a+\nabla\phi_\eta)-f(b+\nabla\phi_\eta)|\,dx+\eta\\&\le L|a-b|\Big(\int_Y\big(1+|a+\nabla\phi_\eta(x)|^{p-1}+|b+\nabla\phi_\eta(x)|^{p-1}\big)^{\frac{p}{p-1}}dx\Big)^{\frac{p-1}{p}}+\eta\\&\le CL|a-b|\Big(\int_Y\big(1+|a|^p+|b|^p+|b+\nabla\phi_\eta(x)|^p\big)dx\Big)^{\frac{p-1}{p}}+\eta,\end{aligned}\tag{11.14}$$
where $C$ is a constant depending only on $p$. On the other hand, from (11.5),
$$\int_Y|b+\nabla\phi_\eta(x)|^p\,dx\le\frac1\alpha\int_Yf(b+\nabla\phi_\eta)\,dx\le\frac1\alpha\big(Qf(b)+\eta\big)\le\frac\beta\alpha(1+|b|^p)+\frac\eta\alpha.\tag{11.15}$$
Combining (11.14) and (11.15) and letting $\eta\to0$, we obtain
$$Qf(a)-Qf(b)\le L'|b-a|\big(1+|a|^{p-1}+|b|^{p-1}\big),$$
where $L'$ is a constant which depends only on $L$, $p$, $\alpha$, and $\beta$. We end the proof by interchanging the roles of $a$ and $b$.

(c) We establish the quasi-convexity inequality. Let us set
$$\mathrm{Aff}_0(D,\mathbb R^m):=\big\{\phi\in W^{1,p}_0(D,\mathbb R^m):\phi\ \text{piecewise affine}\big\}.$$
Note that, according to the density of $\mathrm{Aff}_0(D,\mathbb R^m)$ in $W^{1,p}_0(D,\mathbb R^m)$ equipped with its strong topology, and to (11.6), we have
$$\forall a\in\mathbb M^{m\times N},\qquad Qf(a)=\inf\Big\{\frac1{|D|}\int_Df(a+\nabla\phi)\,dx:\phi\in\mathrm{Aff}_0(D,\mathbb R^m)\Big\}.$$
It is now easily seen that the quasi-convexity inequality is satisfied by every $\phi\in\mathrm{Aff}_0(D,\mathbb R^m)$. Indeed, since $\phi\in\mathrm{Aff}_0(D,\mathbb R^m)$, there exist open bounded pairwise disjoint sets $D_i\subset D$, $i=1,\dots,r$, with $\overline D=\bigcup_{i=1}^r\overline D_i$ and $|\partial D_i|=0$, and matrices $a_i\in\mathbb M^{m\times N}$, $i=1,\dots,r$, such that $\nabla\phi\equiv a_i$ on $D_i$, which clearly implies
$$\int_DQf(a+\nabla\phi)\,dx=\sum_{i=1}^rQf(a+a_i)\,|D_i|.\tag{11.16}$$
On the other hand, for $\eta>0$ and $i=1,\dots,r$, there exists $\phi_{i,\eta}\in\mathrm{Aff}_0(D_i,\mathbb R^m)$ such that
$$Qf(a+a_i)\ge\frac1{|D_i|}\int_{D_i}f(a+a_i+\nabla\phi_{i,\eta})\,dx-\eta.\tag{11.17}$$
Let us consider the function $\tilde\phi$ defined on $D$ by $\tilde\phi(x)=\phi(x)+\phi_{i,\eta}(x)$ when $x\in D_i$. Clearly $\tilde\phi$ belongs to $W^{1,p}_0(D,\mathbb R^m)$ (actually to $\mathrm{Aff}_0(D,\mathbb R^m)$). Multiplying (11.17) by $|D_i|$ and summing over $i=1,\dots,r$, equality (11.16) yields
$$\int_DQf(a+\nabla\phi)\,dx\ge\int_Df(a+\nabla\tilde\phi)\,dx-\eta|D|\ge Qf(a)|D|-\eta|D|.$$
Letting $\eta\to0$, we obtain the quasi-convexity inequality for every $\phi\in\mathrm{Aff}_0(D,\mathbb R^m)$. Thanks to the density of $\mathrm{Aff}_0(D,\mathbb R^m)$ in $W^{1,p}_0(D,\mathbb R^m)$ and to (11.6) satisfied by $Qf$, the quasi-convexity inequality then holds for every $\phi\in W^{1,p}_0(D,\mathbb R^m)$. It remains to establish that $Qf$ is the greatest quasi-convex function less than or equal to $f$. First notice that $g\le f$ yields $Qg\le Qf$. On the other hand, if $g$ is quasi-convex, then $Qg=g$. Indeed, $Qg\le g$ by definition, and inequality (11.8) satisfied by $g$ gives the converse inequality. We then obtain, from $g\le f$ and the quasi-convexity of $g$, that $g=Qg\le Qf$.

Remark 11.2.1. One can prove that $Qf$ is the quasi-convex envelope of $f$ under less restrictive assumptions on $f$, for example without the continuity condition (see Dacorogna [116, Theorem 1.1] and [57]). For the relationship between the notions of convexity, polyconvexity, rank-one convexity, and quasi-convexity, and for various examples, consult Dacorogna [116] and Sverak [211], [212], [213]. One can establish that $Qf$ is convex on each segment $[a,b]$ in $\mathbb M^{m\times N}$ satisfying $\mathrm{rank}(a-b)=1$, i.e., $Qf$ is rank-one convex. Consequently, when $m=1$ or $N=1$, $Qf$ is a convex function and actually $Qf=f^{**}$. For a proof, consult Step 3 in the proof of Theorem 1.1 in Dacorogna [116]. Nevertheless, when $m>1$ and $N>1$, $Qf$ is in general not the convexification of $f$. In any case, optimization problems for integral functionals $F$ whose density $f$ is not quasi-convex are in general not well posed, and one needs a relaxation procedure in the sense of the relaxation theorem, Theorem 11.1.2.

Let $f:\mathbb M^{m\times N}\to\mathbb R$ satisfy (11.5) and (11.6). In Theorem 13.2.1, we will establish, in a more general situation, that the quasi-convexity inequality (11.8) is a necessary and sufficient condition for the lower semicontinuity of the integral functional $u\mapsto\int_\Omega f(\nabla u)\,dx$ defined on $W^{1,p}(\Omega,\mathbb R^m)$ equipped with its weak topology. We now state the main result of this section.

Theorem 11.2.1. Let $f:\mathbb M^{m\times N}\to\mathbb R$ satisfy (11.5) and (11.6) with $p>1$, and let $F$ be the associated integral functional (11.7) defined on $L^p(\Omega,\mathbb R^m)$ equipped with its strong topology. Then the lsc envelope of $F$ is given, for every $u$ in $L^p(\Omega,\mathbb R^m)$, by
$$\mathrm{cl}(F)(u)=\begin{cases}\displaystyle\int_\Omega Qf(\nabla u)\,dx&\text{if }u\in W^{1,p}(\Omega,\mathbb R^m),\\+\infty&\text{otherwise.}\end{cases}$$
Theorem 11.2.1 is a straightforward consequence of Propositions 11.2.3 and 11.2.4. We adopt the following notation:
$$QF(u)=\begin{cases}\displaystyle\int_\Omega Qf(\nabla u)\,dx&\text{if }u\in W^{1,p}(\Omega,\mathbb R^m),\\+\infty&\text{otherwise.}\end{cases}$$
The proofs given here have the advantage of being easily adapted to the theory of homogenization (see the next chapter).
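By Remark 11.2.1, in the scalar case ($m=1$ or $N=1$) the relaxed density in Theorem 11.2.1 is the convex envelope, $Qf=f^{**}$. The sketch below is a numerical illustration under stated assumptions. It uses the hypothetical double-well density $f(a)=(a^2-1)^2+a^4=2a^4-2a^2+1$ on $\mathbb R$. This density satisfies (11.5) with $p=4$, $\alpha=1$, $\beta=2$, and it has two wells at $a=\pm1/\sqrt2$, where $f=1/2$, while $f(0)=1$. The envelope is computed on a grid as the lower convex hull of the sampled graph (Andrew's monotone chain), which only approximates $f^{**}$ between grid points. One expects $f^{**}\equiv1/2$ on $[-1/\sqrt2,1/\sqrt2]$ and $f^{**}=f$ outside.

```python
import bisect

def f(a):
    """Hypothetical scalar double-well density f(a) = (a^2 - 1)^2 + a^4."""
    return (a * a - 1.0) ** 2 + a ** 4

def convex_envelope(func, grid):
    """Lower convex hull of the points (a, func(a)), a in the increasing grid, evaluated on the
    grid by linear interpolation: a discrete approximation of the convex envelope func**."""
    hull = []
    for p in ((a, func(a)) for a in grid):
        # Andrew's monotone chain (lower hull): drop the last point while it lies on or above
        # the segment joining its predecessor to p (cross product <= 0).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    xs = [q[0] for q in hull]
    values = []
    for a in grid:
        i = min(max(bisect.bisect_right(xs, a) - 1, 0), len(hull) - 2)
        (x1, y1), (x2, y2) = hull[i], hull[i + 1]
        values.append(y1 + (y2 - y1) * (a - x1) / (x2 - x1))
    return values

grid = [-2.0 + 4.0 * k / 4000 for k in range(4001)]
env = convex_envelope(f, grid)
print(env[2000], f(grid[2000]))  # envelope at a = 0 is about 0.5, whereas f(0) = 1
```

In the vectorial case $m>1$, $N>1$, this convex-hull computation does not give $Qf$ in general (Remark 11.2.1).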
11.2. Relaxation of integral functionals with domain $W^{1,p}(\Omega,\mathbb{R}^m)$, $p>1$
“abmb 2005/1 page 4 i
427
Proposition 11.2.3. For every $u$ in $L^p(\Omega,\mathbb{R}^m)$, $p>1$, and every sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^p(\Omega,\mathbb{R}^m)$, one has
$$QF(u)\le\liminf_{n\to+\infty}F(u_n). \tag{11.18}$$
Assume that $f$ satisfies (11.6) and only the upper growth condition $0\le f(a)\le\beta(1+|a|^p)$ for all $a\in\mathbb{M}^{m\times N}$. Then, for every $u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ and every sequence $(u_n)_{n\in\mathbb{N}}$ weakly converging to $u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$, one has $QF(u)\le\liminf_{n\to+\infty}F(u_n)$.
Proof. We assume that $f$ satisfies (11.5) and (11.6) and establish the first assertion; the second assertion will be obtained at the end of the proof as a straightforward consequence. Obviously, one can assume $\liminf_{n\to+\infty}F(u_n)<+\infty$, so that, by the coercivity in (11.5), $u$ belongs to $W^{1,p}(\Omega,\mathbb{R}^m)$. For a nonrelabeled subsequence, consider the nonnegative Borel measure $\mu_n:=f(\nabla u_n(\cdot))\mathcal{L}^N$. We have
$$\sup_{n\in\mathbb{N}}\mu_n(\Omega)<+\infty.$$
Consequently there exist a further subsequence (not relabeled) and a nonnegative Borel measure $\mu\in\mathcal{M}(\Omega)$ such that $\mu_n\rightharpoonup\mu$ weakly in $\mathcal{M}(\Omega)$. Let $\mu=g\mathcal{L}^N+\mu^s$ be the Lebesgue–Nikodym decomposition of $\mu$, where $\mu^s$ is a nonnegative Borel measure, singular with respect to the Lebesgue measure $\mathcal{L}^N$. To establish (11.18) it is enough to prove that $g(x)\ge Qf(\nabla u(x))$ for a.e. $x\in\Omega$. Indeed, according to Alexandrov's theorem, Proposition 4.2.3, we will obtain
$$\liminf_{n\to+\infty}F(u_n)=\liminf_{n\to+\infty}\mu_n(\Omega)\ge\mu(\Omega)=\int_\Omega g(x)\,dx+\mu^s(\Omega)\ge\int_\Omega g(x)\,dx\ge\int_\Omega Qf(\nabla u(x))\,dx.$$
Let $\rho>0$ be intended to tend to $0$ and denote by $B_\rho(x_0)$ the open ball of radius $\rho$ centered at $x_0$. According to the theory of differentiation of measures (see Theorem 4.2.1), there exists an $\mathcal{L}^N$-negligible set $N$ such that for all $x_0\in\Omega\setminus N$,
$$g(x_0)=\lim_{\rho\to0}\frac{\mu(B_\rho(x_0))}{|B_\rho(x_0)|}.$$
Chapter 11. Relaxation in Sobolev, BV, and Young measures spaces
Applying Lemma 4.2.1, for all but countably many $\rho>0$ one may assume $\mu(\partial B_\rho(x_0))=0$. From Alexandrov's theorem, Proposition 4.2.3, we then obtain $\mu(B_\rho(x_0))=\lim_{n\to+\infty}\mu_n(B_\rho(x_0))$, and we are finally reduced to establishing
$$\lim_{\rho\to0}\lim_{n\to+\infty}\frac{\mu_n(B_\rho(x_0))}{|B_\rho(x_0)|}\ge Qf(\nabla u(x_0))\quad\text{for }x_0\in\Omega\setminus N. \tag{11.19}$$
Let us assume for the moment that the trace of $u_n$ on $\partial B_\rho(x_0)$ is equal to the affine function $u_0$ defined by $u_0(x):=u(x_0)+\nabla u(x_0)(x-x_0)$. It follows that
$$\lim_{n\to+\infty}\frac{\mu_n(B_\rho(x_0))}{|B_\rho(x_0)|}=\lim_{n\to+\infty}\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}f\big(\nabla u(x_0)+\nabla(u_n-u_0)\big)\,dx\ge\inf\Big\{\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}f(\nabla u(x_0)+\nabla\varphi)\,dx:\varphi\in W_0^{1,p}(B_\rho(x_0),\mathbb{R}^m)\Big\}=Qf(\nabla u(x_0)),$$
and the proof would be complete. Note that the small size of the radius $\rho$ of $B_\rho(x_0)$ plays no role in the estimate above. The idea now consists in modifying $u_n$ by a function of $W^{1,p}(B_\rho(x_0),\mathbb{R}^m)$ which coincides with $u_0$ on $\partial B_\rho(x_0)$ in the trace sense, in following the previous procedure, and in controlling the additional terms, as $\rho$ goes to zero, thanks to the following classical estimate.
Lemma 11.2.1. For every $u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$, $p\ge1$, there exists a negligible set $N'$ such that for all $x_0$ in $\Omega\setminus N'$ we have
$$\Big(\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}\big|u(x)-\big(u(x_0)+\nabla u(x_0)(x-x_0)\big)\big|^p\,dx\Big)^{1/p}=o(\rho). \tag{11.20}$$
Proof. For the proof we refer the reader to Ziemer [229, Theorem 3.4.2].
From now on, we fix $x_0$ in the set $\Omega\setminus(N\cup N')$. To suitably modify $u_n$ near the boundary of $B_\rho(x_0)$, we use a well-known method due to De Giorgi. A neighborhood of the boundary of $B_\rho(x_0)$ is sliced as follows: let $\nu\in\mathbb{N}^*$, $\rho_0:=\lambda\rho$, where $0<\lambda<1$, and set
$$B_i:=B_{\rho_0+i\frac{\rho-\rho_0}{\nu}}(x_0)\quad\text{for }i=0,\dots,\nu.$$
On the other hand, consider for $i=1,\dots,\nu$ functions $\varphi_i\in C_0^\infty(B_i)$ with
$$0\le\varphi_i\le1,\qquad\varphi_i=1\text{ in }B_{i-1},\qquad\|\nabla\varphi_i\|_{L^\infty(B_i)}\le\frac{\nu}{\rho-\rho_0}=\frac{\nu}{\rho(1-\lambda)},$$
and define $u_{n,i}\in W^{1,p}(B_\rho(x_0),\mathbb{R}^m)$ by $u_{n,i}:=u_0+\varphi_i(u_n-u_0)$. For $i=1,\dots,\nu$, we have
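$$\begin{aligned}
Qf(\nabla u(x_0))&=\inf\Big\{\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}f(\nabla u(x_0)+\nabla\varphi)\,dx:\varphi\in W_0^{1,p}(B_\rho(x_0),\mathbb{R}^m)\Big\}\\
&\le\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}f(\nabla u_{n,i})\,dx\\
&=\frac{1}{|B_\rho(x_0)|}\int_{B_{i-1}}f(\nabla u_n)\,dx+\frac{1}{|B_\rho(x_0)|}\int_{B_i\setminus B_{i-1}}f(\nabla u_{n,i})\,dx+\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)\setminus B_i}f(\nabla u(x_0))\,dx\\
&\le\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}f(\nabla u_n)\,dx+\frac{1}{|B_\rho(x_0)|}\int_{B_i\setminus B_{i-1}}f(\nabla u_{n,i})\,dx+\beta\big(1+|\nabla u(x_0)|^p\big)(1-\lambda^N).
\end{aligned}\tag{11.21}$$
Let us estimate the second term of the right-hand side of (11.21). Since $\nabla u_{n,i}=\nabla u(x_0)+\varphi_i\nabla(u_n-u_0)+(u_n-u_0)\otimes\nabla\varphi_i$ and $|B_i\setminus B_{i-1}|\le|B_\rho(x_0)|(1-\lambda^N)$, the growth condition (11.5) gives
$$\frac{1}{|B_\rho(x_0)|}\int_{B_i\setminus B_{i-1}}f(\nabla u_{n,i})\,dx\le C\big(1+|\nabla u(x_0)|^p\big)(1-\lambda^N)+\frac{C}{|B_\rho(x_0)|}\int_{B_i\setminus B_{i-1}}|\nabla(u_n-u_0)|^p\,dx+\frac{C\nu^p}{|B_\rho(x_0)|\rho^p(1-\lambda)^p}\int_{B_i\setminus B_{i-1}}|u_n-u_0|^p\,dx,$$
where $C$ is a positive constant depending only on $p$ and $\beta$. Then, using $|\nabla(u_n-u_0)|^p\le2^{p-1}\big(|\nabla u_n|^p+|\nabla u(x_0)|^p\big)$ on each slice (the contribution of $|\nabla u(x_0)|^p$ is absorbed into the term $C(1+|\nabla u(x_0)|^p)(1-\lambda^N)$), averaging inequalities (11.21) over $i=1,\dots,\nu$, and noting that the slices $B_i\setminus B_{i-1}$ are pairwise disjoint, we obtain
$$\begin{aligned}
Qf(\nabla u(x_0))&=\frac1\nu\sum_{i=1}^\nu Qf(\nabla u(x_0))\\
&\le\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}f(\nabla u_n)\,dx+\frac{C\nu^{p-1}}{|B_\rho(x_0)|\rho^p(1-\lambda)^p}\int_{B_\rho(x_0)}|u_n-u_0|^p\,dx\\
&\quad+C\big(1+|\nabla u(x_0)|^p\big)(1-\lambda^N)+\frac{C}{\nu}\,\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}|\nabla u_n|^p\,dx\\
&\le\Big(\frac{C}{\alpha\nu}+1\Big)\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}f(\nabla u_n)\,dx+\frac{C\nu^{p-1}}{|B_\rho(x_0)|\rho^p(1-\lambda)^p}\int_{B_\rho(x_0)}|u_n-u_0|^p\,dx\\
&\quad+C\big(1+|\nabla u(x_0)|^p\big)(1-\lambda^N),
\end{aligned}\tag{11.22}$$
where we have used the lower bound in (11.5) in the last inequality. Letting $n\to+\infty$ (so that $u_n\to u$ in $L^p$) and then $\rho\to0$, Lemma 11.2.1 makes the term involving $|u_n-u_0|^p$ vanish, and we obtain
$$Qf(\nabla u(x_0))\le\Big(\frac{C}{\alpha\nu}+1\Big)\lim_{\rho\to0}\lim_{n\to+\infty}\frac{\mu_n(B_\rho(x_0))}{|B_\rho(x_0)|}+C\big(1+|\nabla u(x_0)|^p\big)(1-\lambda^N),$$
and (11.19) follows after letting $\lambda\to1$ and $\nu\to+\infty$. It is worth noticing that we have used the slicing method to control the term $\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}|\nabla u_n|^p\,dx$ by letting the slices become increasingly thin (i.e., $\nu\to+\infty$).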
If now $f$ does not satisfy the lower bound in (11.5), we first note that the weak convergence of $u_n$ to $u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ yields the bound
$$\sup_{n\in\mathbb{N}}\int_{B_\rho(x_0)}|\nabla u_n|^p\,dx<+\infty.$$
Moreover, according to the Rellich–Kondrakov compact embedding theorem, Theorem 5.4.2, $u_n\to u$ strongly in $L^p(\Omega,\mathbb{R}^m)$. Then, to conclude, it suffices to pass to the limit successively in $n$ and $\rho$ in the first inequality of (11.22), which does not use the lower bound, and to let $\lambda\to1$, $\nu\to+\infty$ as above.
Proposition 11.2.4. For every $u$ in $L^p(\Omega,\mathbb{R}^m)$, $p>1$, there exists a sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^p(\Omega,\mathbb{R}^m)$ such that
$$QF(u)\ge\limsup_{n\to+\infty}F(u_n).$$
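Proof. One can assume $QF(u)<+\infty$. Therefore, taking $D=Y:=(0,1)^N$ (so that $|Y|=1$) in the definition of $Qf$, and $\eta=1/k$, $k\in\mathbb{N}^*$, we have
$$QF(u)=\int_\Omega Qf(\nabla u)\,dx=\int_\Omega\inf\Big\{\int_Y f(\nabla u(x)+\nabla_y\varphi(y))\,dy:\varphi\in W_0^{1,p}(Y,\mathbb{R}^m)\Big\}\,dx\ge\int_{\Omega\times Y}f(\nabla u(x)+\nabla_y\varphi_\eta(x,y))\,dx\,dy-\eta|\Omega|, \tag{11.23}$$
where $\varphi_\eta(x,\cdot)$ is an $\eta$-minimizer of $\inf\big\{\int_Y f(\nabla u(x)+\nabla_y\varphi(y))\,dy:\varphi\in W_0^{1,p}(Y,\mathbb{R}^m)\big\}$. We admit that we can select a measurable map $x\mapsto\varphi_\eta(x,\cdot)$ from $\Omega$ into $W_0^{1,p}(Y,\mathbb{R}^m)$; for a proof, consult Castaing and Valadier [105]. We claim that $\varphi_\eta$ belongs to $L^p(\Omega,W_0^{1,p}(Y,\mathbb{R}^m))$. Indeed,
$$\begin{aligned}
\int_\Omega\|\nabla_y\varphi_\eta(x,\cdot)\|^p_{L^p(Y,\mathbb{M}^{m\times N})}\,dx&=\int_{\Omega\times Y}|\nabla_y\varphi_\eta(x,y)|^p\,dx\,dy\\
&\le C\Big(\int_{\Omega\times Y}|\nabla u(x)+\nabla_y\varphi_\eta(x,y)|^p\,dx\,dy+\int_\Omega|\nabla u|^p\,dx\Big)\\
&\le C\Big(\frac1\alpha\int_{\Omega\times Y}f(\nabla u(x)+\nabla_y\varphi_\eta(x,y))\,dx\,dy+\int_\Omega|\nabla u|^p\,dx\Big)\\
&\le C\Big(\frac1\alpha QF(u)+\frac{\eta|\Omega|}{\alpha}+\int_\Omega|\nabla u|^p\,dx\Big)<+\infty,
\end{aligned}$$
where $C$ is a positive constant depending only on $p$. Classically, $C_c(\Omega,\mathcal{D}(Y,\mathbb{R}^m))$ is dense in $L^p(\Omega,W_0^{1,p}(Y,\mathbb{R}^m))$. Consequently, from the continuity assumption (11.6) fulfilled by $f$, it is easily seen that (11.23) yields
$$QF(u)=\int_\Omega Qf(\nabla u)\,dx\ge\int_{\Omega\times Y}f(\nabla u(x)+\nabla_y\tilde\varphi_\eta(x,y))\,dx\,dy-2\eta|\Omega| \tag{11.24}$$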
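for some $\tilde\varphi_\eta$ in $C_c(\Omega,\mathcal{D}(Y,\mathbb{R}^m))$. We have actually established the following interchange result between infimum and integral:
$$\begin{aligned}
\int_\Omega\inf_{\varphi\in W_0^{1,p}(Y,\mathbb{R}^m)}\int_Y f(\nabla u(x)+\nabla_y\varphi(y))\,dy\,dx
&=\inf_{\Phi\in L^p(\Omega,W_0^{1,p}(Y,\mathbb{R}^m))}\int_{\Omega\times Y}f(\nabla u(x)+\nabla_y\Phi(x,y))\,dx\,dy\\
&=\inf_{\Phi\in C_c(\Omega,\mathcal{D}(Y,\mathbb{R}^m))}\int_{\Omega\times Y}f(\nabla u(x)+\nabla_y\Phi(x,y))\,dx\,dy.
\end{aligned}$$
Indeed, from (11.23) and (11.24), letting $\eta\to0$,
$$\int_\Omega\inf_{\varphi\in W_0^{1,p}(Y,\mathbb{R}^m)}\int_Y f(\nabla u(x)+\nabla_y\varphi(y))\,dy\,dx\ge\inf_{\Phi\in L^p(\Omega,W_0^{1,p}(Y,\mathbb{R}^m))}\int_{\Omega\times Y}f(\nabla u(x)+\nabla_y\Phi(x,y))\,dx\,dy$$
(and similarly with $C_c(\Omega,\mathcal{D}(Y,\mathbb{R}^m))$ in place of $L^p(\Omega,W_0^{1,p}(Y,\mathbb{R}^m))$), while the converse inequalities are trivial, since $\Phi(x,\cdot)$ is an admissible competitor for a.e. $x$. Note that this result also holds for $p=1$. Because of its importance, we state it in a slightly more general form.
Lemma 11.2.2. Let $f:\mathbb{M}^{m\times N}\to\mathbb{R}_+$ be any function satisfying (11.5) and (11.6) with $p\ge1$, and let $\xi$ be any element of $L^p(\Omega,\mathbb{M}^{m\times N})$. Then
$$\begin{aligned}
\int_\Omega\inf_{\varphi\in W_0^{1,p}(Y,\mathbb{R}^m)}\int_Y f(\xi(x)+\nabla_y\varphi(y))\,dy\,dx
&=\inf_{\Phi\in L^p(\Omega,W_0^{1,p}(Y,\mathbb{R}^m))}\int_{\Omega\times Y}f(\xi(x)+\nabla_y\Phi(x,y))\,dx\,dy\\
&=\inf_{\Phi\in C_c(\Omega,\mathcal{D}(Y,\mathbb{R}^m))}\int_{\Omega\times Y}f(\xi(x)+\nabla_y\Phi(x,y))\,dx\,dy.
\end{aligned}$$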
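In a discretized setting, with finitely many points $x$ in $\Omega$ and finitely many candidate correctors, the interchange of Lemma 11.2.2 reduces to the elementary fact that a sum of independent terms can be minimized term by term. The sketch below checks this by brute force. The cost array `h[x, k]` stands in for $\int_Y f(\xi(x)+\nabla_y\varphi_k(y))\,dy$ and is a hypothetical random example; the sketch deliberately ignores the measurable-selection issue (Castaing–Valadier) that the continuous statement must handle.

```python
import itertools
import numpy as np

def interchange_sides(h):
    """Both sides of the discrete interchange formula for a cost array h[x, k].

    Left side:  sum over points x of min over candidates k of h[x, k].
    Right side: min over all selections x -> k(x) of sum over x of h[x, k(x)].
    """
    n_points, n_cand = h.shape
    left = float(h.min(axis=1).sum())
    right = min(
        float(sum(h[x, sel[x]] for x in range(n_points)))
        for sel in itertools.product(range(n_cand), repeat=n_points)
    )
    return left, right

rng = np.random.default_rng(0)
h = rng.random((5, 3))        # 5 "points of Omega", 3 candidate correctors each
left, right = interchange_sides(h)
```

The brute-force right-hand side enumerates all $3^5$ selections; the point is only that it agrees with the pointwise minimization on the left.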
For more about interchange theorems, consult, for instance, [26].
Proof of Proposition 11.2.4 continued. Let us go back to (11.24). To shorten notation, we denote the previous $\eta$-minimizer $\tilde\varphi_\eta$ in $C_c(\Omega,\mathcal{D}(Y,\mathbb{R}^m))$ by $\varphi_\eta$ and extend $y\mapsto\varphi_\eta(x,y)$ by $Y$-periodicity to $\mathbb{R}^N$. Consider now the function $u_{\eta,n}$ defined by
$$u_{\eta,n}(x)=u(x)+\frac1n\varphi_\eta(x,nx).$$
Note that $\varphi_\eta$ is a Carathéodory function, so that $x\mapsto\varphi_\eta(x,nx)$ is measurable. Clearly $u_{\eta,n}$ belongs to $W^{1,p}(\Omega,\mathbb{R}^m)$ and $u_{\eta,n}\to u$ strongly in $L^p(\Omega,\mathbb{R}^m)$ as $n\to+\infty$, since $\|u_{\eta,n}-u\|_{L^p}=\frac1n\|\varphi_\eta(\cdot,n\,\cdot)\|_{L^p}$ and
$$\int_\Omega|\varphi_\eta(x,nx)|^p\,dx\le\int_\Omega\sup_{y\in Y}|\varphi_\eta(x,y)|^p\,dx\le|\Omega|\sup_{x\in\Omega}\sup_{y\in Y}|\varphi_\eta(x,y)|^p<+\infty.$$
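The construction $u_{\eta,n}(x)=u(x)+\frac1n\varphi_\eta(x,nx)$ can be observed on a one-dimensional toy case. The sketch assumes $N=m=1$, $\Omega=(0,1)$, $u=0$, the illustrative double-well density $f(a)=a^4-a^2+1$ (whose convex envelope at $0$ equals $3/4$, attained at the slopes $\pm1/\sqrt2$), and the hypothetical corrector $\varphi(y)=s\min(y,1-y)$ with $s=1/\sqrt2$, which belongs to $W_0^{1,p}(Y)$ and does not depend on $x$. It checks that $u_n\to0$ uniformly while the energy stays equal to $3/4<f(0)=1$.

```python
import numpy as np

def f(a):
    # illustrative double-well density, not taken from the text
    return a**4 - a**2 + 1.0

def oscillating_sequence(n, points_per_cell=200):
    """Grid values of u_n(x) = (1/n) * phi(n x) on [0, 1], with u = 0 and the
    1-periodic corrector phi(y) = s * min(y, 1 - y), s = 1/sqrt(2)."""
    s = 1.0 / np.sqrt(2.0)
    x = np.linspace(0.0, 1.0, n * points_per_cell + 1)
    y = np.mod(n * x, 1.0)                  # fast variable, reduced to Y = [0, 1)
    u_n = s * np.minimum(y, 1.0 - y) / n
    return x, u_n

results = []
for n in (1, 10, 100):
    x, u_n = oscillating_sequence(n)
    slopes = np.diff(u_n) / np.diff(x)      # equal to +s or -s on every grid cell
    energy = np.mean(f(slopes))             # Riemann sum of int_0^1 f(u_n'(x)) dx
    results.append((n, float(np.max(np.abs(u_n))), float(energy)))
# sup|u_n| = s/(2n) -> 0, while the energy stays at 3/4 for every n
```

Since the corrector here is an exact minimizer of the cell problem, no $\eta$-approximation is needed in this toy case.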
On the other hand,
$$\begin{aligned}
\lim_{n\to+\infty}\int_\Omega f(\nabla u_{\eta,n})\,dx&=\lim_{n\to+\infty}\int_\Omega f\Big(\nabla u(x)+(\nabla_y\varphi_\eta)(x,nx)+\frac1n(\nabla_x\varphi_\eta)(x,nx)\Big)\,dx\\
&=\lim_{n\to+\infty}\int_\Omega f\big(\nabla u(x)+(\nabla_y\varphi_\eta)(x,nx)\big)\,dx\\
&=\int_{\Omega\times Y}f\big(\nabla u(x)+\nabla_y\varphi_\eta(x,y)\big)\,dx\,dy.
\end{aligned}$$
To pass from the first to the second equality we have used the continuity assumption (11.6) on $f$. The last equality is a consequence of Lemma 11.2.3, stated at the end of the proof, applied to $g(x,y)=f(\nabla u(x)+\nabla_y\varphi_\eta(x,y))$ (note that $y\mapsto f(\nabla u(x)+\nabla_y\varphi_\eta(x,y))$ belongs to the set $C_\#(Y)$ of all the restrictions to $Y$ of continuous and $Y$-periodic functions on $\mathbb{R}^N$, and that $g\in L^1(\Omega,C_\#(Y))$). Consequently, from (11.24),
$$\lim_{n\to+\infty}F(u_{\eta,n})=\lim_{n\to+\infty}\int_\Omega f\big(\nabla u(x)+(\nabla_y\varphi_\eta)(x,nx)\big)\,dx=\int_{\Omega\times Y}f\big(\nabla u(x)+\nabla_y\varphi_\eta(x,y)\big)\,dx\,dy\le QF(u)+2\eta|\Omega|.$$
Letting $\eta\to0$ (i.e., $k\to+\infty$), up to a subsequence, one obtains
$$\lim_{\eta\to0^+}\lim_{n\to+\infty}F(u_{\eta,n})\le QF(u).$$
We now apply the diagonalization Lemma 11.1.1 to the sequence $\big(F(u_{\eta,n}),u_{\eta,n}\big)_{\eta,n}$ in the metric space $\mathbb{R}\times L^p(\Omega,\mathbb{R}^m)$: there exists a map $n\mapsto\eta(n):=\frac{1}{k(n)}$ such that
$$\limsup_{n\to+\infty}F(u_{\eta(n),n})\le QF(u),\qquad\lim_{n\to+\infty}u_{\eta(n),n}=u\ \text{strongly in }L^p(\Omega,\mathbb{R}^m).$$
The sequence $(u_n)_{n\in\mathbb{N}}$, where $u_n=u_{\eta(n),n}$, then satisfies the assertion of Proposition 11.2.4.
We now state and prove Lemma 11.2.3, invoked above. Let $D$ be any open cube of $\mathbb{R}^N$. We recall that $C_\#(D)$ denotes the set of all the restrictions to $D$ of continuous and $D$-periodic functions on $\mathbb{R}^N$, equipped with the uniform norm on $D$, and that $L^1(\Omega,C_\#(D))$ denotes the space of all measurable functions $h$ from $\Omega$ into $C_\#(D)$ satisfying
$$\int_\Omega\sup_{y\in D}|h(x,y)|\,dx<+\infty.$$
Lemma 11.2.3. For every function $g$ in $L^1(\Omega,C_\#(D))$,
$$\lim_{n\to+\infty}\int_\Omega g(x,nx)\,dx=\frac{1}{|D|}\int_{\Omega\times D}g(x,y)\,dx\,dy. \tag{11.25}$$
Proof. It is well known (see, for instance, Yosida [225]) that if $g$ belongs to $L^1(\Omega,C_\#(D))$, then $g$ is a Carathéodory function satisfying $\sup_{y\in D}|g(\cdot,y)|\in L^1(\Omega)$, so that $\int_\Omega g(x,nx)\,dx$
is well defined. For $k\in\mathbb{N}^*$, let us decompose the cube $D$ as follows:
$$\overline D=\bigcup_{i=1}^{k^N}\overline{D_i},$$
where the $D_i$ are small pairwise disjoint open cubes, $1/k$-homothetic to $D$. We approximate $g$ in $L^1(\Omega,C_\#(D))$ by the following step function $g_k$:
$$g_k(x,y)=\sum_{i=1}^{k^N}g(x,y_i)\mathbf{1}_{D_i}(y),$$
where $\mathbf{1}_{D_i}$ is the characteristic function of the set $D_i$ extended by $D$-periodicity to $\mathbb{R}^N$, and $y_i$ is any fixed element of $D_i$. Due to the periodicity of $\mathbf{1}_{D_i}$, classically $x\mapsto\mathbf{1}_{D_i}(nx)$ converges weakly in $\sigma(L^\infty,L^1)$ to $|D_i|/|D|$, so that
$$\lim_{n\to+\infty}\int_\Omega g(x,y_i)\mathbf{1}_{D_i}(nx)\,dx=\frac{|D_i|}{|D|}\int_\Omega g(x,y_i)\,dx.$$
For a proof, see Example 2.4.2, or Proposition 13.2.1 and the proof of Theorem 13.2.1. Summing these equalities over $i=1,\dots,k^N$, one obtains that (11.25) is satisfied by $g_k$. To conclude by passing to the limit in $k$, we use the following bound, uniform with respect to $n$:
$$\int_\Omega|g(x,nx)-g_k(x,nx)|\,dx\le\|g-g_k\|_{L^1(\Omega,C_\#(D))}. \tag{11.26}$$
Moreover, since
$$\sup_{y\in D}|g_k(x,y)-g(x,y)|\le2\sup_{y\in D}|g(x,y)|\in L^1(\Omega)$$
and $\lim_{k\to+\infty}\sup_{y\in D}|g_k(x,y)-g(x,y)|=0$ (by the uniform continuity of the periodic function $g(x,\cdot)$), the Lebesgue dominated convergence theorem gives
$$\lim_{k\to+\infty}\|g-g_k\|_{L^1(\Omega,C_\#(D))}=0. \tag{11.27}$$
Thus, from (11.26),
$$\begin{aligned}
\Big|\int_\Omega g(x,nx)\,dx-\frac1{|D|}\int_{\Omega\times D}g(x,y)\,dx\,dy\Big|
&\le\Big|\int_\Omega g(x,nx)\,dx-\int_\Omega g_k(x,nx)\,dx\Big|+\Big|\int_\Omega g_k(x,nx)\,dx-\frac1{|D|}\int_{\Omega\times D}g_k(x,y)\,dx\,dy\Big|\\
&\quad+\Big|\frac1{|D|}\int_{\Omega\times D}g_k(x,y)\,dx\,dy-\frac1{|D|}\int_{\Omega\times D}g(x,y)\,dx\,dy\Big|\\
&\le2\|g-g_k\|_{L^1(\Omega,C_\#(D))}+\Big|\int_\Omega g_k(x,nx)\,dx-\frac1{|D|}\int_{\Omega\times D}g_k(x,y)\,dx\,dy\Big|.
\end{aligned}$$
We conclude by first letting $n\to+\infty$ and using (11.25) satisfied by $g_k$, and then letting $k\to+\infty$ and using (11.27).
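Lemma 11.2.3 can be checked numerically in a simple case. The sketch below assumes $\Omega=D=(0,1)$ ($N=1$) and the hypothetical integrand $g(x,y)=x^2\sin^2(2\pi y)$, which is continuous and $1$-periodic in $y$. Here the right-hand side of (11.25) equals $1/6$, and a direct computation gives $\int_0^1 g(x,nx)\,dx=\frac16-\frac{1}{16\pi^2n^2}$. Both integrals are approximated with the midpoint rule, so the computed errors include a small quadrature error.

```python
import numpy as np

def oscillating_integral(g, n, m=200_000):
    """Midpoint-rule approximation of int_0^1 g(x, n x) dx (Omega = (0, 1))."""
    x = (np.arange(m) + 0.5) / m
    return float(np.mean(g(x, n * x)))

def two_scale_integral(g, m=1_000):
    """Midpoint-rule approximation of (1/|D|) int_{Omega x D} g(x, y) dx dy,
    with Omega = D = (0, 1)."""
    s = (np.arange(m) + 0.5) / m
    X, Y = np.meshgrid(s, s, indexing="ij")
    return float(np.mean(g(X, Y)))

def g(x, y):
    # continuous and 1-periodic in y, integrable in x: an element of L^1(Omega, C_#(D))
    return x**2 * np.sin(2.0 * np.pi * y) ** 2

limit = two_scale_integral(g)
errors = [abs(oscillating_integral(g, n) - limit) for n in (1, 5, 50)]
# errors decrease roughly like 1/(16 pi^2 n^2), as predicted by the closed form
```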
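Now, we would like to compute the lower semicontinuous envelope of the integral functional in (11.7) when a boundary condition is imposed on a part $\Gamma_0$ of the boundary $\partial\Omega$ of $\Omega$. More precisely, we aim to describe the lower semicontinuous envelope of the integral functional $F:L^p(\Omega,\mathbb{R}^m)\to\mathbb{R}_+\cup\{+\infty\}$ defined by
$$F(u)=\begin{cases}\displaystyle\int_\Omega f(\nabla u)\,dx & \text{if } u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise,}\end{cases} \tag{11.28}$$
where $W^{1,p}_{\Gamma_0}(\Omega,\mathbb{R}^m)$ denotes the subspace of all the functions $u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ such that $u=0$ on $\Gamma_0$ in the sense of traces. In the following corollary, we state that the boundary condition is "not relaxed." We will see that this preservation of the boundary condition fails in the case $p=1$.
Corollary 11.2.1. The lower semicontinuous envelope of the integral functional $F$ defined in (11.28) is given by
$$\mathrm{cl}(F)(u)=\begin{cases}\displaystyle\int_\Omega Qf(\nabla u)\,dx & \text{if } u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise.}\end{cases}$$
Proof. We only have to establish the existence of a sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^p(\Omega,\mathbb{R}^m)$ such that $\int_\Omega Qf(\nabla u)\,dx\ge\limsup_{n\to+\infty}F(u_n)$. Assuming $u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb{R}^m)$, we must construct a sequence $(u_n)_{n\in\mathbb{N}}$ in $W^{1,p}_{\Gamma_0}(\Omega,\mathbb{R}^m)$, strongly converging to $u$ in $L^p(\Omega,\mathbb{R}^m)$ and satisfying $\int_\Omega Qf(\nabla u)\,dx\ge\limsup_{n\to+\infty}F(u_n)$. According to Theorem 11.2.1, there exists a sequence $(v_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ strongly converging to $u$ in $L^p(\Omega,\mathbb{R}^m)$ such that
$$\int_\Omega Qf(\nabla u)\,dx\ge\limsup_{n\to+\infty}\int_\Omega f(\nabla v_n)\,dx. \tag{11.29}$$
The idea is now to modify $v_n$ in a neighborhood of $\partial\Omega$ so that the new function belongs to $W^{1,p}_{\Gamma_0}(\Omega,\mathbb{R}^m)$, while keeping the energy under control. We use again the slicing method of De Giorgi. Let $\nu\in\mathbb{N}^*$ and $\Omega_0\subset\subset\Omega$ be such that
$$\int_{\Omega\setminus\Omega_0}\big(1+|\nabla u|^p\big)\,dx\le\frac1\nu, \tag{11.30}$$
and let $(\Omega_i)_{i=0,\dots,\nu}$ be an increasing sequence of open subsets strictly included in $\Omega$, $\Omega_i\subset\subset\Omega_{i+1}\subset\subset\Omega$. Let $(\varphi_i)_{i=0,\dots,\nu-1}$ be a sequence of functions in $\mathcal{D}(\mathbb{R}^N)$ satisfying
$$\varphi_i=1\text{ on }\Omega_i,\qquad\varphi_i=0\text{ on }\mathbb{R}^N\setminus\Omega_{i+1},\qquad0\le\varphi_i\le1,\qquad|\nabla\varphi_i|\le\frac\nu d,$$
where $d=\operatorname{dist}(\Omega_0,\mathbb{R}^N\setminus\Omega)$, and define $u_{n,i}=\varphi_i(v_n-u)+u$.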
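Clearly $u_{n,i}$ belongs to $W^{1,p}_{\Gamma_0}(\Omega,\mathbb{R}^m)$, since it coincides with $u$ near $\partial\Omega$, and
$$\int_\Omega f(\nabla u_{n,i})\,dx=\int_{\Omega_i}f(\nabla u_{n,i})\,dx+\int_{\Omega_{i+1}\setminus\Omega_i}f(\nabla u_{n,i})\,dx+\int_{\Omega\setminus\Omega_{i+1}}f(\nabla u_{n,i})\,dx\le\int_\Omega f(\nabla v_n)\,dx+\int_{\Omega_{i+1}\setminus\Omega_i}f(\nabla u_{n,i})\,dx+\int_{\Omega\setminus\Omega_0}f(\nabla u)\,dx.$$
Then, from (11.30) and the growth condition in (11.5), we obtain
$$\int_\Omega f(\nabla u_{n,i})\,dx\le C\Big(\frac1\nu+\Big(\frac\nu d\Big)^p\int_\Omega|v_n-u|^p\,dx+\int_{\Omega_{i+1}\setminus\Omega_i}|\nabla v_n|^p\,dx\Big)+\int_\Omega f(\nabla v_n)\,dx,$$
where, from now on, $C$ denotes various positive constants depending only on $\beta$, $p$, and $\Omega$. By averaging these $\nu$ inequalities, we obtain
$$\frac1\nu\sum_{i=0}^{\nu-1}\int_\Omega f(\nabla u_{n,i})\,dx\le C\Big(\frac1\nu+\Big(\frac\nu d\Big)^p\int_\Omega|v_n-u|^p\,dx+\frac1\nu\int_\Omega|\nabla v_n|^p\,dx\Big)+\int_\Omega f(\nabla v_n)\,dx.$$
As already said in the proof of Proposition 11.2.3, we used a slicing method to control the term $\int_\Omega|\nabla v_n|^p\,dx$ by taking increasingly thin slices (i.e., $\nu\to+\infty$); we could not conclude by using a simple truncation. From the coercivity condition (11.5) and (11.29), $\int_\Omega|\nabla v_n|^p\,dx$ is bounded, hence
$$\frac1\nu\sum_{i=0}^{\nu-1}\int_\Omega f(\nabla u_{n,i})\,dx\le C\Big(\frac1\nu+\Big(\frac\nu d\Big)^p\int_\Omega|v_n-u|^p\,dx\Big)+\int_\Omega f(\nabla v_n)\,dx. \tag{11.31}$$
Let $i(n,\nu)$ be an index $i$ such that
$$\int_\Omega f(\nabla u_{n,i(n,\nu)})\,dx=\min_{i=0,\dots,\nu-1}\int_\Omega f(\nabla u_{n,i})\,dx.$$
Since a minimum does not exceed an average, inequality (11.31) yields
$$\int_\Omega f(\nabla u_{n,i(n,\nu)})\,dx\le C\Big(\frac1\nu+\Big(\frac\nu d\Big)^p\int_\Omega|v_n-u|^p\,dx\Big)+\int_\Omega f(\nabla v_n)\,dx,$$
so that, from (11.29) and since $v_n\to u$ in $L^p(\Omega,\mathbb{R}^m)$,
$$\limsup_{\nu\to+\infty}\limsup_{n\to+\infty}F(u_{n,i(n,\nu)})\le\limsup_{n\to+\infty}\int_\Omega f(\nabla v_n)\,dx\le\int_\Omega Qf(\nabla u)\,dx.$$
We conclude by a classical diagonalization argument: there exists a map $n\mapsto\nu(n)$ from $\mathbb{N}$ into $\mathbb{N}$ such that
$$\limsup_{n\to+\infty}F(u_{n,i(n,\nu(n))})\le\int_\Omega Qf(\nabla u)\,dx.$$
Obviously $\lim_{n\to+\infty}u_{n,i(n,\nu(n))}=u$ strongly in $L^p(\Omega,\mathbb{R}^m)$, since $|u_{n,i}-u|\le|v_n-u|$ for every $i$.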
The sequence $(u_n)_{n\in\mathbb{N}}$ defined by $u_n=u_{n,i(n,\nu(n))}$ then tends to $u$ in $L^p(\Omega,\mathbb{R}^m)$ and satisfies $\int_\Omega Qf(\nabla u)\,dx\ge\limsup_{n\to+\infty}F(u_n)$.
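The choice of the index $i(n,\nu)$ in the proof above rests on an elementary averaging fact: among $\nu$ disjoint slices, at least one carries no more than the fraction $1/\nu$ of the total mass. The sketch below illustrates this with a hypothetical energy density $e(t)=1/\sqrt t$ on a layer parametrized by $t\in(0,1)$; it concentrates near $t=0$ and stands in for $|\nabla v_n|^p$ in the boundary layer $\Omega\setminus\Omega_0$. The function name `least_loaded_slice` is illustrative.

```python
import numpy as np

def least_loaded_slice(density, nu, m=120_000):
    """Split (0, 1) into nu equal slices and return (index, mass) of the slice
    carrying the least mass of `density`, plus the total mass (midpoint rule)."""
    t = (np.arange(m) + 0.5) / m
    weights = density(t) / m                              # midpoint quadrature weights
    slice_id = np.minimum((t * nu).astype(int), nu - 1)   # slice containing each node
    masses = np.bincount(slice_id, weights=weights, minlength=nu)
    i = int(np.argmin(masses))
    return i, float(masses[i]), float(masses.sum())

density = lambda t: 1.0 / np.sqrt(t)                      # integrable, total mass about 2
best = {nu: least_loaded_slice(density, nu) for nu in (1, 4, 16, 64)}
# the least loaded slice always carries at most total / nu, which tends to 0 as nu grows
```

The bound does not depend on where the mass concentrates, which is why thin slices control $\int|\nabla v_n|^p$ where a single cut-off would not.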
As a consequence we obtain the following relaxation theorem in the case $p>1$.
Theorem 11.2.2 (relaxation theorem, $p>1$). Let us consider a function $f:\mathbb{M}^{m\times N}\to\mathbb{R}$ satisfying (11.5) and (11.6), a function $g$ in $L^q(\Omega,\mathbb{R}^m)$, where $\frac1p+\frac1q=1$, and the following problem:
$$\inf\Big\{\int_\Omega f(\nabla u)\,dx-\int_\Omega g\cdot u\,dx:u\in W_0^{1,p}(\Omega,\mathbb{R}^m)\Big\}. \tag{$\mathcal{P}$}$$
Then the relaxed problem of $(\mathcal{P})$ in the sense of Theorem 11.1.2 is
$$\min\Big\{\int_\Omega Qf(\nabla u)\,dx-\int_\Omega g\cdot u\,dx:u\in W_0^{1,p}(\Omega,\mathbb{R}^m)\Big\}.$$
Proof. With the notation of Corollary 11.2.1, taking $\Gamma_0=\partial\Omega$, one has
$$\inf\Big\{\int_\Omega f(\nabla u)\,dx-\int_\Omega g\cdot u\,dx:u\in W_0^{1,p}(\Omega,\mathbb{R}^m)\Big\}=\inf_{u\in L^p(\Omega,\mathbb{R}^m)}\Big\{F(u)-\int_\Omega g\cdot u\,dx\Big\}$$
and
$$\inf\Big\{\int_\Omega Qf(\nabla u)\,dx-\int_\Omega g\cdot u\,dx:u\in W_0^{1,p}(\Omega,\mathbb{R}^m)\Big\}=\inf_{u\in L^p(\Omega,\mathbb{R}^m)}\Big\{\mathrm{cl}(F)(u)-\int_\Omega g\cdot u\,dx\Big\}.$$
Since $G:u\mapsto-\int_\Omega g\cdot u\,dx$ is a continuous perturbation of $F$, one has $\mathrm{cl}(F+G)=\mathrm{cl}(F)+G$ in $L^p(\Omega,\mathbb{R}^m)$. Then, according to Theorem 11.1.2 and Corollary 11.2.1, it suffices to establish the inf-compactness of $u\mapsto F(u)-\int_\Omega g\cdot u\,dx$ in $L^p(\Omega,\mathbb{R}^m)$ equipped with its strong topology. Let $u\in L^p(\Omega,\mathbb{R}^m)$ be such that $F(u)-\int_\Omega g\cdot u\,dx\le C$, where $C$ is any positive constant. Then $u\in W_0^{1,p}(\Omega,\mathbb{R}^m)$ and
$$F(u)-\int_\Omega g\cdot u\,dx=\int_\Omega f(\nabla u)\,dx-\int_\Omega g\cdot u\,dx\le C.$$
From (11.5) and Hölder's inequality, we obtain
$$\alpha\int_\Omega|\nabla u|^p\,dx\le\|g\|_{L^q(\Omega,\mathbb{R}^m)}\|u\|_{L^p(\Omega,\mathbb{R}^m)}+C. \tag{11.32}$$
Apply Young's inequality $ab\le\lambda^p\frac{a^p}{p}+\frac{1}{\lambda^q}\frac{b^q}{q}$ with $a=\|u\|_{L^p(\Omega,\mathbb{R}^m)}$ and $b=\|g\|_{L^q(\Omega,\mathbb{R}^m)}$, where $\lambda$ is chosen so that $\lambda^pC_P<\alpha$ and $C_P$ denotes the Poincaré constant, i.e., the best constant satisfying
$$\int_\Omega|u|^p\,dx\le C_P\int_\Omega|\nabla u|^p\,dx\quad\text{for all }u\in W_0^{1,p}(\Omega,\mathbb{R}^m).$$
Then the estimate (11.32) yields
$$\int_\Omega|\nabla u|^p\,dx\le C,$$
where $C$ is a positive constant depending only on $\Omega$, $p$, $\alpha$, and $\|g\|_{L^q(\Omega,\mathbb{R}^m)}$. Therefore $u$ belongs to the closed ball of radius $C$ of $W_0^{1,p}(\Omega,\mathbb{R}^m)$ which, according to the Rellich–Kondrakov theorem, Theorem 5.4.2, is compact in $L^p(\Omega,\mathbb{R}^m)$.
Remark 11.2.2. In the case $p=1$, Theorem 11.2.1 obviously remains valid as long as one considers the restrictions of $F$ and $\mathrm{cl}(F)$ to $W^{1,1}(\Omega,\mathbb{R}^m)$. More precisely, for all $u\in W^{1,1}(\Omega,\mathbb{R}^m)$,
$$\mathrm{cl}(F)(u)=\int_\Omega Qf(\nabla u)\,dx.$$
Nevertheless, this does not give a complete description of $\mathrm{cl}(F)$, which is given in the next section.
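The coercivity estimate in the proof of Theorem 11.2.2 absorbs the term $\|g\|_{L^q}\|u\|_{L^p}$ into the left-hand side. The sketch below reproduces this bookkeeping with scalar stand-ins $X=\int_\Omega|\nabla u|^p\,dx$, $A=\|u\|_{L^p}$ (so that $A^p\le C_PX$) and $G=\|g\|_{L^q}$. The specific choice $\lambda^pC_P=\alpha/2$ (which satisfies the condition $\lambda^pC_P<\alpha$ of the text), the sample numbers, and the function names are assumptions made for illustration. With this choice, (11.32) and Young's inequality give $\alpha X\le\frac{\alpha}{2p}X+\frac{G^q}{q\lambda^q}+C$, hence $X\le\big(\frac{G^q}{q\lambda^q}+C\big)/\big(\alpha(1-\frac1{2p})\big)$.

```python
import numpy as np

def young_rhs(a, b, p, lam):
    """Right-hand side of Young's inequality  a*b <= lam^p a^p / p + b^q / (q lam^q)."""
    q = p / (p - 1.0)
    return lam**p * a**p / p + b**q / (q * lam**q)

def gradient_bound(alpha, C_P, G, C, p):
    """Upper bound on X = int |grad u|^p implied by
    alpha X <= G * A + C  together with  A^p <= C_P X,
    obtained with Young's inequality for lam^p C_P = alpha / 2."""
    q = p / (p - 1.0)
    lam = (alpha / (2.0 * C_P)) ** (1.0 / p)
    K = G**q / (q * lam**q) + C
    return K / (alpha * (1.0 - 1.0 / (2.0 * p)))

# Young's inequality on a few sample values (p kept in (1.5, 4) to avoid overflow of b**q)
rng = np.random.default_rng(1)
samples = [(5 * rng.random(), 5 * rng.random(), 1.5 + 2.5 * rng.random(), 0.1 + rng.random())
           for _ in range(100)]
young_ok = all(a * b <= young_rhs(a, b, p, lam) + 1e-12 for a, b, p, lam in samples)

alpha, C_P, G, C, p = 0.5, 3.0, 2.0, 1.0, 2.0
B = gradient_bound(alpha, C_P, G, C, p)
# at X = B the constraint alpha X <= G (C_P X)^(1/p) + C is already violated or tight,
# so every admissible X satisfies X <= B
```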
11.3
Relaxation of integral functionals with domain $W^{1,1}(\Omega,\mathbb{R}^m)$
We show how the space $BV(\Omega,\mathbb{R}^m)$ and the notion of trace (see Remark 10.2.2) enter the relaxation theory. For simplicity of exposition, in a first approach we limit our study to the case $m=1$ and $f=|\cdot|$; the general case is treated at the end of this section. From now on, $\Omega$ is a Lipschitz bounded open subset of $\mathbb{R}^N$. It is well known that the integral functionals defined on $L^p(\Omega)$, $p>1$, by
$$F(u)=\begin{cases}\displaystyle\int_\Omega|\nabla u|^p\,dx & \text{if } u\in W^{1,p}(\Omega),\\ +\infty & \text{otherwise,}\end{cases}\qquad G(u)=\begin{cases}\displaystyle\int_\Omega|\nabla u|^p\,dx & \text{if } u\in W_0^{1,p}(\Omega),\\ +\infty & \text{otherwise,}\end{cases}$$
are lower semicontinuous for the strong topology of $L^p(\Omega)$ or the weak topology of $W^{1,p}(\Omega)$. For instance, one may argue directly by using the convexity of these two functionals, or one may apply the previous section by noticing that $Qf=f$ when $f=|\cdot|^p$. The case $p=1$, where $L^1(\Omega)$ is equipped with its strong topology and the functionals $F$, $G$ are given by
$$F(u)=\begin{cases}\displaystyle\int_\Omega|\nabla u|\,dx & \text{if } u\in W^{1,1}(\Omega),\\ +\infty & \text{otherwise,}\end{cases}\qquad G(u)=\begin{cases}\displaystyle\int_\Omega|\nabla u|\,dx & \text{if } u\in W_0^{1,1}(\Omega),\\ +\infty & \text{otherwise,}\end{cases}$$
is more involved. Indeed, we have seen in Section 10.4 that the sequence $(u_n)_{n\in\mathbb{N}}$ which generates the Cantor–Vitali function $u$ satisfies $u_n\to u$ strongly in $L^1(0,1)$ and $\sup_{n\in\mathbb{N}}F(u_n)<+\infty$, but $u\notin W^{1,1}(0,1)$. Consequently, the domain of the lower closure of $F$ strictly contains the space $W^{1,1}(\Omega)$. We see below that this domain is included in the space $BV(\Omega)$; actually, as a consequence of Proposition 11.3.2, the domain is exactly $BV(\Omega)$. Concerning the second functional $G$, we will see that the boundary condition $u=0$ is relaxed into a surface energy.
Proposition 11.3.1. The domains of $\mathrm{cl}(F)$ and $\mathrm{cl}(G)$ are included in $BV(\Omega)$.
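The Cantor–Vitali phenomenon can be reproduced numerically. The sketch below uses one standard self-similar construction of piecewise linear approximations $u_k$ of the Cantor function, which may differ in details from the sequence of Section 10.4. Each $u_k$ is nondecreasing from $0$ to $1$, so $\int_0^1|u_k'|\,dx=1$, while $u_k'$ equals $(3/2)^k$ on a set of measure $(2/3)^k$ and vanishes elsewhere. A uniform bound on $F(u_k)$ together with $L^1$ convergence is thus compatible with a limit whose derivative is a purely singular measure. The function names are illustrative.

```python
import numpy as np

def cantor_step(u):
    """Return x -> u(3x)/2 on [0, 1/3], 1/2 on [1/3, 2/3], 1/2 + u(3x - 2)/2 on [2/3, 1]."""
    def v(x):
        x = np.asarray(x, dtype=float)
        # np.where evaluates every branch on the whole array; the values computed
        # outside each branch's interval are finite and simply discarded
        return np.where(x < 1/3, u(3 * x) / 2,
                        np.where(x <= 2/3, 0.5, 0.5 + u(3 * x - 2) / 2))
    return v

def cantor_approximations(levels, M=3**8):
    """Grid values of u_0, ..., u_levels on the triadic grid x_j = j / M."""
    x = np.linspace(0.0, 1.0, M + 1)
    u = lambda s: np.asarray(s, dtype=float)   # u_0(x) = x
    values = []
    for _ in range(levels + 1):
        values.append(u(x))
        u = cantor_step(u)
    return x, values

x, us = cantor_approximations(6)
dx = np.diff(x)
total_variation = [float(np.abs(np.diff(u)).sum()) for u in us]   # stays equal to 1
max_slope = [float(np.max(np.diff(u) / dx)) for u in us]          # equals (3/2)^k
l1_steps = [float(np.mean(np.abs(us[k + 1] - us[k]))) for k in range(6)]  # shrinks geometrically
```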
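Proof. Let $u\in L^1(\Omega)$ be such that $\mathrm{cl}(F)(u)<+\infty$ and consider a sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^1(\Omega)$ and satisfying $\mathrm{cl}(F)(u)=\lim_{n\to+\infty}F(u_n)$; such a sequence exists by Proposition 11.1.1. Since $\mathrm{cl}(F)(u)=\lim_{n\to+\infty}F(u_n)<+\infty$, one obtains, for a nonrelabeled subsequence of $(u_n)_{n\in\mathbb{N}}$,
$$\sup_{n\in\mathbb{N}}\int_\Omega|\nabla u_n|\,dx<+\infty.$$
Thus, from Proposition 10.1.1(i), $u\in BV(\Omega)$. Since $G\ge F$, one has $\mathrm{cl}(G)\ge\mathrm{cl}(F)$, so the domain of $\mathrm{cl}(G)$ is included in that of $\mathrm{cl}(F)$.
Proposition 11.3.2. Let $\Omega$ be a Lipschitz bounded open subset of $\mathbb{R}^N$ and $\Gamma$ its boundary. The lsc envelopes $\mathrm{cl}(F)$ and $\mathrm{cl}(G)$ of the functionals $F$ and $G$ defined on $L^1(\Omega)$ equipped with its strong topology are given by
$$\mathrm{cl}(F)(u)=\begin{cases}\displaystyle\int_\Omega|Du| & \text{if } u\in BV(\Omega),\\ +\infty & \text{otherwise,}\end{cases}\qquad\mathrm{cl}(G)(u)=\begin{cases}\displaystyle\int_\Omega|Du|+\int_\Gamma|\gamma_0(u)|\,d\mathcal{H}^{N-1} & \text{if } u\in BV(\Omega),\\ +\infty & \text{otherwise,}\end{cases}$$
where $\gamma_0$ is the trace operator from $BV(\Omega)$ into $L^1(\Gamma)$ defined in Section 10.2.
Proof. Let us set
$$QF(u)=\begin{cases}\displaystyle\int_\Omega|Du| & \text{if } u\in BV(\Omega),\\ +\infty & \text{otherwise,}\end{cases}\qquad QG(u)=\begin{cases}\displaystyle\int_\Omega|Du|+\int_\Gamma|\gamma_0(u)|\,d\mathcal{H}^{N-1} & \text{if } u\in BV(\Omega),\\ +\infty & \text{otherwise.}\end{cases}$$
We first establish that for all $u$ in $L^1(\Omega)$:
if $u_n\to u$ in $L^1(\Omega)$, then $QF(u)\le\liminf_{n\to+\infty}F(u_n)$;
there exists a sequence $(u_n)_{n\in\mathbb{N}}$ converging to $u$ in $L^1(\Omega)$ such that $QF(u)\ge\limsup_{n\to+\infty}F(u_n)$.
These two assertions are straightforward consequences of Proposition 10.1.1 and Theorem 10.1.2. We now deal with the functional $G$ and establish, for every $u\in L^1(\Omega)$:
if $u_n\to u$ in $L^1(\Omega)$, then $QG(u)\le\liminf_{n\to+\infty}G(u_n)$;
there exists a sequence $(u_n)_{n\in\mathbb{N}}$ converging to $u$ in $L^1(\Omega)$ such that $QG(u)\ge\limsup_{n\to+\infty}G(u_n)$.
Proof of the first assertion. Consider a sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^1(\Omega)$ such that $\liminf_{n\to+\infty}G(u_n)<+\infty$. For a subsequence (not relabeled), we have $u_n\in W_0^{1,1}(\Omega)$. Let $\tilde\Omega$ denote a bounded open subset of $\mathbb{R}^N$ strongly containing $\Omega$, i.e., $\overline\Omega\subset\tilde\Omega$, and define, for every function $v$ in $L^1(\Omega)$, the function $\tilde v$ in $L^1(\tilde\Omega)$ by
$$\tilde v(x)=\begin{cases}v(x) & \text{if } x\in\Omega,\\ 0 & \text{if } x\in\tilde\Omega\setminus\Omega.\end{cases}$$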
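It is easily seen that $\tilde u_n\to\tilde u$ in $L^1(\tilde\Omega)$, $\tilde u_n\in W^{1,1}(\tilde\Omega)$, and $\tilde u\in BV(\tilde\Omega)$ with $D\tilde u=Du+\gamma_0(u)\,\nu\,\mathcal{H}^{N-1}\lfloor\Gamma$, where $\nu$ denotes the inner unit normal at $\mathcal{H}^{N-1}$-a.e. $x$ in $\Gamma$ (see Examples 10.2.1 and 10.2.2). Since $Du$ and $\gamma_0(u)\,\nu\,\mathcal{H}^{N-1}\lfloor\Gamma$ are mutually singular, one has $|D\tilde u|=|Du|+|\gamma_0(u)|\,\mathcal{H}^{N-1}\lfloor\Gamma$. Thus, according to the previous result for $F$, one has
$$\int_\Omega|Du|+\int_\Gamma|\gamma_0(u)|\,d\mathcal{H}^{N-1}=\int_{\tilde\Omega}|D\tilde u|\le\liminf_{n\to+\infty}\int_{\tilde\Omega}|D\tilde u_n|=\liminf_{n\to+\infty}\int_\Omega|Du_n|.$$
Proof of the second assertion. For $t>0$ let us consider the open subset $\Omega_t=\{x\in\Omega:\operatorname{dist}(x,\mathbb{R}^N\setminus\Omega)>t\}$ of $\Omega$, and define, for every $u$ in $BV(\Omega)$, the function $u_t$ by
$$u_t(x)=\begin{cases}u(x) & \text{if } x\in\Omega_t,\\ 0 & \text{if } x\in\Omega\setminus\Omega_t.\end{cases}$$
It is easy to see that $u_t\in BV(\Omega)$ and that $Du_t=Du\lfloor\Omega_t+\gamma_t(u)\,\nu_t\,\mathcal{H}^{N-1}\lfloor\Gamma_t$, where $\Gamma_t$ denotes the boundary of $\Omega_t$, $\nu_t$ the inner unit normal at $\mathcal{H}^{N-1}$-a.e. $x$ in $\Gamma_t$, and $\gamma_t$ the trace operator from $BV(\Omega_t)$ into $L^1(\Gamma_t)$ (see Examples 10.2.1 and 10.2.2). According to Theorem 10.1.2 and Remark 10.2.1, there exist functions $u_{t,n}\in C^\infty(\Omega)\cap BV(\Omega)$ satisfying $u_{t,n}=0$ on $\Gamma$ in the trace sense and converging to $u_t$ for the intermediate convergence in $BV(\Omega)$. We obtain, for a.e. $t$,
$$\lim_{n\to+\infty}\int_\Omega|Du_{t,n}|=\int_\Omega|Du_t|=\int_{\Omega_t}|Du|+\int_{\Gamma_t}|u|\,d\mathcal{H}^{N-1},$$
where we have used that, for a.e. $t$, $\gamma_t(u)=u$ on $\Gamma_t$ (cf. Example 10.2.4). Letting $t\to0^+$, we claim that
$$\lim_{t\to0^+}\lim_{n\to+\infty}\int_\Omega|Du_{t,n}|=\int_\Omega|Du|+\int_\Gamma|\gamma_0(u)|\,d\mathcal{H}^{N-1}. \tag{11.33}$$
To justify the nontrivial limit
$$\lim_{t\to0^+}\int_{\Gamma_t}|u|\,d\mathcal{H}^{N-1}=\int_\Gamma|\gamma_0(u)|\,d\mathcal{H}^{N-1},$$
we argue with local coordinates and make use of estimate (10.13) in the proof of Theorem 10.2.1. With the notation of that proof, we have
$$\int|u(\tilde x,t)-u(\tilde x,t')|\,d\tilde x\le\int_{C_{R,t,t'}}|Du|,$$
and, letting $t'\to0$,
$$\int|\gamma_0(u)(\tilde x)-u(\tilde x,t)|\,d\tilde x\le\int_{C_{R,t}}|Du|.$$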
The result follows after letting $t\to0$. Now let $t$ denote a sequence converging to $0$. Going back to (11.33) and using the diagonalization Lemma 11.1.1 for the sequence $\big(\int_\Omega|Du_{t,n}|,u_{t,n}\big)$ in the metrizable space $\mathbb{R}\times L^1(\Omega)$, we conclude that there exists a map $n\mapsto t(n)$ such that
$$\lim_{n\to+\infty}\int_\Omega|Du_{t(n),n}|=\int_\Omega|Du|+\int_\Gamma|\gamma_0(u)|\,d\mathcal{H}^{N-1},\qquad\lim_{n\to+\infty}u_{t(n),n}=u\ \text{in }L^1(\Omega),$$
and the proof is complete.
Let us now consider the general case and state the following theorem, analogous to Theorem 11.2.1. Given a function $f:\mathbb{M}^{m\times N}\to\mathbb{R}$ satisfying (11.5) and (11.6) with $p=1$, we intend to compute the lsc envelope of the functional $F:L^1(\Omega,\mathbb{R}^m)\to\mathbb{R}_+\cup\{+\infty\}$ defined by
$$F(u)=\begin{cases}\displaystyle\int_\Omega f(\nabla u)\,dx & \text{if } u\in W^{1,1}(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise.}\end{cases} \tag{11.34}$$
Let us recall that when $u\in BV(\Omega,\mathbb{R}^m)$, according to the Radon–Nikodym theorem, Theorem 4.2.1, one has $Du=\nabla u\,\mathcal{L}^N+D^su$, where $\nabla u\,\mathcal{L}^N$ and $D^su$ are two mutually singular measures in $\mathcal{M}(\Omega,\mathbb{M}^{m\times N})$.
Theorem 11.3.1. The lsc envelope of the functional $F$ defined in (11.34) is given, for every $u\in L^1(\Omega,\mathbb{R}^m)$, by
$$\mathrm{cl}(F)(u)=\begin{cases}\displaystyle\int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,|D^su| & \text{if } u\in BV(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise,}\end{cases}$$
where $Qf$ is the quasi-convex envelope of $f$ defined in Proposition 11.2.2, and $(Qf)^\infty$ is the recession function of $Qf$, defined for every $a$ in $\mathbb{M}^{m\times N}$ by $(Qf)^\infty(a)=\limsup_{t\to+\infty}Qf(ta)/t$.
The proof of Theorem 11.3.1 is a consequence of Propositions 11.3.3 and 11.3.4. We set
$$QF(u)=\begin{cases}\displaystyle\int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,|D^su| & \text{if } u\in BV(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise.}\end{cases}$$
Proposition 11.3.3. For every $u$ in $L^1(\Omega,\mathbb{R}^m)$ and every sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^1(\Omega,\mathbb{R}^m)$, one has
$$QF(u)\le\liminf_{n\to+\infty}F(u_n). \tag{11.35}$$
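To see what the singular term of Theorem 11.3.1 charges, consider the hypothetical density $f(a)=\sqrt{1+|a|^2}$ (Euclidean norm on $\mathbb{M}^{m\times N}$). It satisfies (11.5) with $p=1$, since $|a|\le f(a)\le1+|a|$, and it is convex, hence quasi-convex, so that $Qf=f$. Its recession function is $(Qf)^\infty(a)=|a|$, and since $D^su/|D^su|$ has norm one $|D^su|$-a.e., the relaxed energy reads $\int_\Omega\sqrt{1+|\nabla u|^2}\,dx+|D^su|(\Omega)$. The sketch below only evaluates the quotients $Qf(ta)/t$ for increasing $t$ at a rank-one matrix $a\otimes b$, the kind of direction singled out by Lemma 11.3.1; the function names are illustrative.

```python
import numpy as np

def f(a):
    """Illustrative density f(a) = sqrt(1 + |a|^2), |.| the Euclidean (Frobenius) norm."""
    return float(np.sqrt(1.0 + np.sum(np.asarray(a, dtype=float) ** 2)))

def recession_quotients(g, a, ts=(1e1, 1e3, 1e5)):
    """Quotients g(t a) / t for increasing t; their limit is g^infinity(a)."""
    a = np.asarray(a, dtype=float)
    return [g(t * a) / t for t in ts]

a_vec = np.array([0.6, 0.8])            # unit vector in R^m, m = 2
b_vec = np.array([1.0, 0.0, 0.0])       # unit vector in R^N, N = 3
rank_one = np.outer(a_vec, b_vec)       # the 2 x 3 rank-one matrix a b^T, of norm 1
quotients = recession_quotients(f, rank_one)       # decrease towards |a b^T| = 1
doubled = recession_quotients(f, 2.0 * rank_one)   # positive 1-homogeneity: tends to 2
```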
Proof. Our strategy is exactly that of Proposition 11.2.3. Obviously, one may assume $\liminf_{n\to+\infty}F(u_n)<+\infty$. For a nonrelabeled subsequence, let us consider the nonnegative Borel measure $\mu_n:=f(\nabla u_n(\cdot))\mathcal{L}^N$. Since
$$\sup_{n\in\mathbb{N}}\mu_n(\Omega)<+\infty,$$
there exist a further subsequence (not relabeled) and a nonnegative Borel measure $\mu\in\mathcal{M}(\Omega)$ such that $\mu_n\rightharpoonup\mu$ weakly in $\mathcal{M}(\Omega)$. Let $\mu=g\mathcal{L}^N+\mu^s$ be the Lebesgue–Nikodym decomposition of $\mu$, where $\mu^s$ is a nonnegative Borel measure, singular with respect to the $N$-dimensional Lebesgue measure $\mathcal{L}^N$. To establish (11.35) it suffices to prove that
$$g(x)\ge Qf(\nabla u(x))\ \text{for a.e. }x\in\Omega,\qquad\mu^s\ge(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)|D^su|.$$
Indeed, according to Alexandrov's theorem, Proposition 4.2.3, we will obtain
$$\liminf_{n\to+\infty}F(u_n)=\liminf_{n\to+\infty}\mu_n(\Omega)\ge\mu(\Omega)=\int_\Omega g(x)\,dx+\mu^s(\Omega)\ge\int_\Omega Qf(\nabla u(x))\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)|D^su|.$$
(a) Proof of $g(x)\ge Qf(\nabla u(x))$ for a.e. $x\in\Omega$. It suffices to reproduce the proof of Proposition 11.2.3, which obviously holds true for $p=1$ (indeed, Lemma 11.2.1 holds true in this setting by Proposition 10.4.1).
(b) Proof of $\mu^s\ge(Qf)^\infty\big(\frac{D^su}{|D^su|}\big)|D^su|$. The proof is based on various arguments of Ambrosio–Dal Maso [21]. The density $\frac{Du}{|Du|}$ satisfies the following property (see Alberti [8]).
Lemma 11.3.1. For $|D^su|$-a.e. $x_0\in\Omega$, the density $Du/|Du|$ is a rank-one matrix, i.e., for $|D^su|$-a.e. $x_0\in\Omega$,
$$\frac{Du}{|Du|}(x_0)=a(x_0)\otimes b(x_0),$$
with $a(x_0)\in\mathbb{R}^m$, $b(x_0)\in\mathbb{R}^N$, $|a(x_0)|=|b(x_0)|=1$.
The rank-one property of the jump part of the singular measure $D^su$ is indeed trivial because of its structure (see Section 10.3). De Giorgi conjectured that the diffuse singular part is also a rank-one matrix-valued measure; the proof was later given by Alberti in [8]. Let us now introduce some notation. Let $x_0$ be an element of the complement of the $|D^su|$-null set invoked in Lemma 11.3.1, and let $Q$ denote the unit cube of $\mathbb{R}^N$ centered at the origin whose sides are either orthogonal or parallel to $b(x_0)$. We set $Q_\rho(x_0):=\{x_0+\rho x:x\in Q\}$, where $\rho$ is a positive parameter intended to tend to zero.
According to the theory of differentiation of measures (see Theorem 4.2.1), to establish $\mu^s\ge(Qf)^\infty\big(\frac{D^su}{|D^su|}\big)|D^su|$ it is enough to prove
$$\lim_{\rho\to0}\frac{\mu(Q_\rho(x_0))}{|Du|(Q_\rho(x_0))}\ge(Qf)^\infty(a(x_0)\otimes b(x_0)). \tag{11.36}$$
We will make use of the following two estimates: for $|D^su|$-a.e. $x_0\in\Omega$, we have
$$\lim_{\rho\to0}\frac{|Du|(Q_\rho(x_0))}{\rho^N}=+\infty, \tag{11.37}$$
$$\limsup_{\rho\to0}\frac{|Du|(Q_{t\rho}(x_0))}{|Du|(Q_\rho(x_0))}\ge t^N\quad\forall t\in\,]0,1[. \tag{11.38}$$
Assertion (11.37) easily follows from the theory of differentiation of measures (see Theorem 4.2.1). For a proof of (11.38), consult Ambrosio–Dal Maso [21, Theorem 2.3]. From now on, $x_0$ will be a fixed element of the complement of the $|D^su|$-null set invoked in Lemma 11.3.1, for which moreover estimates (11.37) and (11.38) hold true and the two limits
$$\lim_{\rho\to0}\frac{\mu(Q_\rho(x_0))}{|Du|(Q_\rho(x_0))},\qquad\lim_{\rho\to0}\frac{Du(Q_\rho(x_0))}{|Du|(Q_\rho(x_0))}$$
exist (cf. Theorem 4.2.1); we then make use of a blow-up procedure. We set $t_\rho:=|Du|(Q_\rho(x_0))/\rho^N$. According to (11.37), $t_\rho\to+\infty$, and $t_\rho$ will play the role of the parameter $t$ in the definition of $(Qf)^\infty$:
$$(Qf)^\infty(a)=\limsup_{t\to+\infty}\frac{Qf(ta)}{t}.$$
Let us now define the following rescaled functions of $BV(Q,\mathbb{R}^m)$:
$$v_\rho(x):=\frac{\rho^{N-1}}{|Du|(Q_\rho(x_0))}\Big(u(x_0+\rho x)-\frac{1}{|Q_\rho(x_0)|}\int_{Q_\rho(x_0)}u(y)\,dy\Big),$$
$$v_{\rho,n}(x):=\frac{\rho^{N-1}}{|Du|(Q_\rho(x_0))}\Big(u_n(x_0+\rho x)-\frac{1}{|Q_\rho(x_0)|}\int_{Q_\rho(x_0)}u_n(y)\,dy\Big),$$
which satisfy
$$\int_Qv_\rho(y)\,dy=0,\qquad\lim_{n\to+\infty}\|v_{\rho,n}-v_\rho\|_{L^1(Q,\mathbb{R}^m)}=0,$$
and, from Lemma 11.3.1,
$$Dv_\rho(Q)=\frac{Du(Q_\rho(x_0))}{|Du|(Q_\rho(x_0))}\to a(x_0)\otimes b(x_0)\quad\text{as }\rho\to0.$$
The family $(v_\rho)_{\rho>0}$ has the following properties (for a proof, consult Ambrosio and Dal Maso [21, Theorem 2.3]).
Lemma 11.3.2. There exists a subsequence of $(v_\rho)_{\rho>0}$, not relabeled, which weakly converges in $BV(Q,\mathbb{R}^m)$ to a function $v$ of the form $v(x)=\bar v(\langle b(x_0),x\rangle)\,a(x_0)$, where $\bar v$ is nondecreasing and belongs to $BV(]-1/2,1/2[)$. Moreover, for a.e. $\delta$ in $(0,1)$ one has $Dv_\rho(\delta Q)\to Dv(\delta Q)$.
We are now in a position to establish $\mu^s\ge(Qf)^\infty\big(\frac{D^su}{|D^su|}\big)|D^su|$. The proof is divided into three steps. We fix $\delta$ in $(0,1)$ outside a set of null measure, so that the last assertion of Lemma 11.3.2 holds. We will let $\delta$ tend to $1$ at the end of the proof.
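The blow-up of Lemma 11.3.2 can be visualized at a jump point in dimension one. Assume $N=m=1$, $x_0=0$ and the hypothetical function $u(x)=H(x)+x$, with $H$ the Heaviside function, so that $Du=\delta_0+\mathcal{L}^1$ and $|Du|(Q_\rho(0))=1+\rho$. Then $v_\rho(x)=\frac{H(x)-1/2+\rho x}{1+\rho}$ on $Q=(-1/2,1/2)$, and $v_\rho$ converges in $L^1(Q)$ to the nondecreasing profile $H-1/2$, with $\|v_\rho-(H-1/2)\|_{L^1(Q)}=\frac{\rho}{4(1+\rho)}$. The sketch below approximates the average and $|Du|(Q_\rho(0))$ on a grid, so it matches this formula only up to a small discretization error; the function name `blow_up` is illustrative.

```python
import numpy as np

def blow_up(u, x0, rho, m=10_000):
    """Grid version of v_rho(x) = rho^(N-1) / |Du|(Q_rho(x0)) * (u(x0 + rho x) - average of u
    on Q_rho(x0)), for N = 1, on the midpoints of Q = (-1/2, 1/2)."""
    x = (np.arange(m) + 0.5) / m - 0.5
    vals = u(x0 + rho * x)
    average = vals.mean()                        # midpoint approximation of the average on Q_rho(x0)
    variation = np.abs(np.diff(vals)).sum()      # grid approximation of |Du|(Q_rho(x0))
    return x, (vals - average) / variation

def u(s):
    # jump of height 1 at 0 plus a linear part: Du = delta_0 + Lebesgue measure
    return np.heaviside(s, 0.5) + s

profile = lambda x: np.heaviside(x, 0.5) - 0.5   # expected blow-up limit
distances = {}
for rho in (0.5, 0.1, 0.01):
    x, v = blow_up(u, 0.0, rho)
    distances[rho] = float(np.mean(np.abs(v - profile(x))))   # approximates the L1(Q) distance
```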
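First step (truncation). Let $u_{\rho,n}\in W^{1,1}(Q,\mathbb{R}^m)$ be defined by
$$u_{\rho,n}(y):=\frac1\rho\Big(u_n(x_0+\rho y)-\frac{1}{|Q_\rho(x_0)|}\int_{Q_\rho(x_0)}u_n(z)\,dz\Big).$$
Note that $\frac{1}{t_\rho}u_{\rho,n}$ is the function $v_{\rho,n}$ previously defined. A change of scale gives
$$\frac{1}{|Du|(Q_\rho(x_0))}\int_{Q_{\delta\rho}(x_0)}f(\nabla u_n)\,dx=\frac{1}{t_\rho}\int_{\delta Q}f(\nabla u_{\rho,n})\,dy.$$
We want to modify the function $u_{\rho,n}$ in a neighborhood of $\partial(\delta Q)$ so that it coincides on $\partial(\delta Q)$ with an affine function of gradient $t_\rho\,Dv(\delta Q)/\delta^N$. The basic idea, similar to the one used in the proof of Proposition 11.2.3, consists in slicing a neighborhood of the boundary of $\delta Q$ into thin slices whose size is of order $\|v_\rho-v\|^{1/2}_{L^1(Q,\mathbb{R}^m)}$. (Recall that $\|v_\rho-v\|_{L^1(Q,\mathbb{R}^m)}$ goes to zero as $\rho\to0$.) More precisely, we set $\alpha_\rho:=\|v_\rho-v\|^{1/2}_{L^1(Q,\mathbb{R}^m)}$ and, for $\nu\in\mathbb{N}^*$ intended to tend to $+\infty$,
$$Q_0:=(1-\alpha_\rho)\delta Q,\qquad Q_i:=\Big(1-\alpha_\rho+i\frac{\alpha_\rho}{\nu}\Big)\delta Q\quad\text{for }i=1,\dots,\nu.$$
On the other hand, for $i=1,\dots,\nu$, we consider
$$\varphi_i\in C_0^\infty(Q_i),\qquad0\le\varphi_i\le1,\qquad\varphi_i=1\text{ in }Q_{i-1},\qquad|\nabla\varphi_i|\le\frac{\nu}{\alpha_\rho},$$
and define $u_{\rho,n,i}\in t_\rho\ell_\delta+W_0^{1,1}(\delta Q,\mathbb{R}^m)$ by
$$u_{\rho,n,i}:=t_\rho\ell_\delta+\varphi_i(u_{\rho,n}-t_\rho\ell_\delta),$$
where $\ell_\delta$ is the affine function
$$\ell_\delta(x):=\frac{Dv(\delta Q)}{\delta^N}\,x+\frac{\bar v\big((\frac\delta2)^-\big)+\bar v\big((-\frac\delta2)^+\big)}{2}\,a(x_0).$$
The constant $\frac{\bar v((\delta/2)^-)+\bar v((-\delta/2)^+)}{2}\,a(x_0)$ has been chosen so that the traces of $v$ and $\ell_\delta$ agree on the faces of $\delta Q$ orthogonal to $b(x_0)$ and, consequently, so that $v-\ell_\delta$ satisfies the Poincaré inequality on $\delta Q$. From (11.5), an easy computation gives
$$\frac{1}{t_\rho}\int_{\delta Q}f(\nabla u_{\rho,n})\,dx\ge\frac{1}{t_\rho}\int_{\delta Q}f(\nabla u_{\rho,n,i})\,dx-R_{\rho,n,\nu,\delta,i}, \tag{11.39}$$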
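where
$$R_{\rho,n,\nu,\delta,i}:=O_\rho+\frac{\beta}{t_\rho}\int_{Q_i\setminus Q_{i-1}}|D(u_{\rho,n}-t_\rho\ell_\delta)|\,dx+\frac{\beta\nu}{\alpha_\rho}\int_{Q_i\setminus Q_{i-1}}|v_{\rho,n}-\ell_\delta|\,dx$$
and $O_\rho$ does not depend on $n$ and tends to $0$ as $\rho\to0$.
Second step (averaging). According to (11.39) and to the definition of $Qf$, we have
$$\frac{1}{|Du|(Q_\rho(x_0))}\int_{Q_{\delta\rho}(x_0)}f(\nabla u_n)\,dx\ge\frac{\delta^N}{t_\rho}Qf\Big(\frac{t_\rho}{\delta^N}Dv(\delta Q)\Big)-O_\rho-\frac{\beta}{t_\rho}\int_{Q_i\setminus Q_{i-1}}|D(u_{\rho,n}-t_\rho\ell_\delta)|\,dx-\frac{\beta\nu}{\alpha_\rho}\int_{Q_i\setminus Q_{i-1}}|v_{\rho,n}-\ell_\delta|\,dx.$$
Averaging these $\nu$ inequalities, and using $|v_{\rho,n}-\ell_\delta|\le|v_{\rho,n}-v|+|v-\ell_\delta|$, we obtain
$$\begin{aligned}
\frac{1}{|Du|(Q_\rho(x_0))}\int_{Q_{\delta\rho}(x_0)}f(\nabla u_n)\,dx&\ge\frac{\delta^N}{t_\rho}Qf\Big(\frac{t_\rho}{\delta^N}Dv(\delta Q)\Big)-O_\rho-\frac{\beta}{\nu t_\rho}\int_Q|D(u_{\rho,n}-t_\rho\ell_\delta)|\,dx-\frac{\beta}{\alpha_\rho}\int_{\delta Q\setminus(1-\alpha_\rho)\delta Q}|v_{\rho,n}-\ell_\delta|\,dx\\
&\ge\frac{\delta^N}{t_\rho}Qf\Big(\frac{t_\rho}{\delta^N}Dv(\delta Q)\Big)-O_\rho-\frac{\beta}{\nu t_\rho}\int_Q|D(u_{\rho,n}-t_\rho\ell_\delta)|\,dx\\
&\quad-\frac{\beta}{\alpha_\rho}\int_Q|v_{\rho,n}-v|\,dx-\frac{\beta}{\alpha_\rho}\int_{\delta Q\setminus(1-\alpha_\rho)\delta Q}|v-\ell_\delta|\,dx.
\end{aligned}$$
Letting successively $\nu\to+\infty$ and $n\to+\infty$, and recalling that $\int_Q|v_{\rho,n}-v|\,dx\to\int_Q|v_\rho-v|\,dx=\alpha_\rho^2$, we obtain
$$\limsup_{n\to+\infty}\frac{1}{|Du|(Q_\rho(x_0))}\int_{Q_{\delta\rho}(x_0)}f(\nabla u_n)\,dx\ge\frac{\delta^N}{t_\rho}Qf\Big(\frac{t_\rho}{\delta^N}Dv(\delta Q)\Big)-O_\rho-\beta\Big(\int_Q|v_\rho-v|\,dx\Big)^{1/2}-\frac{\beta}{\alpha_\rho}\int_{\delta Q\setminus(1-\alpha_\rho)\delta Q}|v-\ell_\delta|\,dx. \tag{11.40}$$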
Last step. From Lipschitz continuity of Qf (cf Proposition 11.2.2) and according to Lemmas 11.3.1 and 11.3.2 and estimates (11.37) and (11.38), we obtain δN δN tρ tρ Qf N Dv(δQ) = lim sup Qf N Dv(δQ) lim sup tρ δ tρ δ ρ→0 ρ→0 N δ tρ Qf N a(x0 ) ⊗ b(x0 ) ≥ lim sup tρ δ ρ→0 |Du|(Cδρ (x0 )) − lim inf L 1 − ρ→0 |Du|(Cρ (x0 )) ≥ (Qf )∞ (a(x0 ) ⊗ b(x0 )) − L (1 − δ N ). (11.41)
On the other hand,
$$\lim_{\rho\to0}\frac{1}{\alpha_\rho}\int_{\delta Q\setminus(1-\alpha_\rho)\delta Q}|v-\theta_\delta|\,dx=0.\tag{11.42}$$
Indeed, from Poincaré's inequality,
$$\frac{1}{\alpha_\rho}\int_{\delta Q\setminus(1-\alpha_\rho)\delta Q}|v-\theta_\delta|\,dx\ \le\ C\,\Big|Dv-\frac{Dv(\delta Q)}{\delta^N}\,\mathcal{L}^N\Big|\big(\delta Q\setminus(1-\alpha_\rho)\delta Q\big),$$
which tends to zero when $\rho$ goes to zero. Combining (11.40), (11.41), and (11.42), we obtain
$$\limsup_{\rho\to0}\ \limsup_{n\to+\infty}\frac{1}{|Du|(Q_\rho(x_0))}\int_{Q_{\delta\rho}(x_0)}f(\nabla u_n)\,dx\ \ge\ (Qf)^\infty\big(a(x_0)\otimes b(x_0)\big)-L\,(1-\delta^N).$$
The lower bound is then established after letting $\delta\to1$.

Proposition 11.3.4. For every $u$ in $L^1(\Omega,\mathbb{R}^m)$, there exists a sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^1(\Omega,\mathbb{R}^m)$ such that
$$QF(u)\ \ge\ \limsup_{n\to+\infty}F(u_n).$$
Proof. The proof proceeds in two steps. First step (computation of $\mathrm{cl}(F)$ on $W^{1,1}(\Omega,\mathbb{R}^m)$). According to Remark 11.2.2 and Theorem 11.2.1, for every $u\in W^{1,1}(\Omega,\mathbb{R}^m)$,
$$\mathrm{cl}(F)(u)=\int_\Omega Qf(\nabla u)\,dx.\tag{11.43}$$
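Second step. Let us consider the functional $\tilde F : L^1(\Omega,\mathbb{R}^m)\to\mathbb{R}_+\cup\{+\infty\}$ defined by
$$\tilde F(u)=\begin{cases}\displaystyle\int_\Omega Qf(\nabla u)\,dx & \text{if } u\in W^{1,1}(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise.}\end{cases}$$
We claim that it is enough to establish that, for all $u\in BV(\Omega,\mathbb{R}^m)$,
$$\mathrm{cl}(\tilde F)(u)\ \le\ \int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|.\tag{11.44}$$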
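Indeed, let us assume (11.44). For every $u\in BV(\Omega,\mathbb{R}^m)$, according to Proposition 11.1.1 and Theorem 11.1.1, we obtain the existence of a sequence $(u_k)_{k\in\mathbb{N}}$ in $W^{1,1}(\Omega,\mathbb{R}^m)$ strongly converging to $u$ in $L^1(\Omega,\mathbb{R}^m)$ and satisfying
$$\int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|\ \ge\ \mathrm{cl}(\tilde F)(u)=\lim_{k\to+\infty}\tilde F(u_k)=\lim_{k\to+\infty}\int_\Omega Qf(\nabla u_k)\,dx.$$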
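On the other hand, from (11.43), there exists a sequence $(u_{k,n})_{n\in\mathbb{N}}$ in $W^{1,1}(\Omega,\mathbb{R}^m)$ strongly converging to $u_k$ in $L^1(\Omega,\mathbb{R}^m)$ and satisfying
$$\int_\Omega Qf(\nabla u_k)\,dx=\lim_{n\to+\infty}\int_\Omega f(\nabla u_{k,n})\,dx.$$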
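Consequently,
$$\int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|\ \ge\ \lim_{k\to+\infty}\lim_{n\to+\infty}\int_\Omega f(\nabla u_{k,n})\,dx,\qquad \lim_{k\to+\infty}\lim_{n\to+\infty}u_{k,n}=u\ \text{strongly in } L^1(\Omega,\mathbb{R}^m),$$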
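and we conclude by a diagonalization argument: there exists a map $n\mapsto k(n)$ from $\mathbb{N}$ to $\mathbb{N}$ such that
$$\lim_{n\to+\infty}\int_\Omega f(\nabla u_{k(n),n})\,dx\ \le\ \int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|,\qquad \lim_{n\to+\infty}u_{k(n),n}=u\ \text{strongly in } L^1(\Omega,\mathbb{R}^m).$$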
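The diagonal selection can be made concrete. The Python sketch below uses one standard selection rule, $k(n)\in\operatorname{argmin}_{1\le k\le n}\big(b(k,n)+1/k\big)$, where $b(k,n)\ge0$ collects both $\|u_{k,n}-u\|_{L^1}$ and the energy excess; this rule is an illustrative choice, not necessarily the construction behind Lemma 11.1.1. For every fixed $k\le n$ one has $b(k(n),n)+1/k(n)\le b(k,n)+1/k$, hence $\limsup_n b(k(n),n)\le\limsup_n b(k,n)+1/k$ for every $k$, which forces $b(k(n),n)\to0$ as soon as $\lim_k\limsup_n b(k,n)=0$. The error table `b` used here is hypothetical.

```python
# Hypothetical error table: b(k, n) >= 0 stands for
#   ||u_{k,n} - u||_{L^1} + max(0, energy(u_{k,n}) - target energy),
# with lim_k limsup_n b(k, n) = 0.  We pretend ||u_k - u|| <= 1/k and
# ||u_{k,n} - u_k|| <= k/n; these bounds are purely illustrative.
def b(k: int, n: int) -> float:
    return 1.0 / k + k / n


def diagonal_index(n: int) -> int:
    """Return k(n) in argmin_{1 <= k <= n} (b(k, n) + 1/k)."""
    return min(range(1, n + 1), key=lambda k: b(k, n) + 1.0 / k)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        k = diagonal_index(n)
        print(f"n={n:>6}  k(n)={k:>4}  b(k(n), n)={b(k, n):.4f}")
```

The penalty $1/k$ is what pushes $k(n)\to+\infty$; without it the rule could stay at a fixed $k$ and only reach the limit $u_k$ instead of $u$.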
We now establish (11.44), using some arguments of Ambrosio and Dal Maso [21]. Let $u$ be a fixed element of $BV(\Omega,\mathbb{R}^m)$ and consider the set function $\mu : A\mapsto\mathrm{cl}(\tilde F)(u,A)$ defined for all bounded open subsets $A$ of $\Omega$. The notation $\mathrm{cl}(\tilde F)(\cdot,A)$ means that we consider the lower semicontinuous envelope of the functional $\tilde F$ localized on $A$, the space $L^1(A,\mathbb{R}^m)$ being equipped with its strong topology. Note that $\mu$ satisfies the following estimate for every bounded open subset $A$ of $\Omega$:
$$\mu(A)\ \le\ \beta|A|+\beta\,|Du|(A).\tag{11.45}$$
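Indeed, applying the approximation theorem, Theorem 10.1.2, there exists a sequence $(v_n)_{n\in\mathbb{N}}$ in $BV(A,\mathbb{R}^m)\cap C^\infty(A,\mathbb{R}^m)$ such that
$$v_n\to u\ \text{strongly in } L^1(A,\mathbb{R}^m),\qquad \int_A|\nabla v_n|\,dx\to|Du|(A).$$
Thus, since $Qf\le f$ and from the growth condition (11.5),
$$\mu(A)=\mathrm{cl}(\tilde F)(u,A)\ \le\ \liminf_{n\to+\infty}F(v_n,A)\ \le\ \beta|A|+\beta\liminf_{n\to+\infty}\int_A|\nabla v_n|\,dx=\beta|A|+\beta\,|Du|(A).$$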
According to the definition of $\mu$ and to (11.45), one can now easily establish that, for all bounded open subsets $A$ and $A'$ of $\Omega$, $\mu$ satisfies
$$\begin{aligned}
&A\subset A'\ \Rightarrow\ \mu(A)\le\mu(A');\\
&A\cap A'=\emptyset\ \Rightarrow\ \mu(A\cup A')\ge\mu(A)+\mu(A');\\
&\mu(A)\le\sup\{\mu(A') : A'\subset\subset A\};\\
&\mu(A\cup A')\le\mu(A)+\mu(A').
\end{aligned}$$
For a complete proof, consult [21]. Consequently (consult, for instance, the book [117]), one can extend $\mu$ to a Borel measure on $\Omega$, still denoted by $\mu$, defined for every Borel set $B$ of $\Omega$ by $\mu(B)=\inf\{\mu(A) : A \text{ open},\ B\subset A\}$. By considering the Lebesgue–Nikodym decomposition $\mu=\mu^a+\mu^s$ of $\mu$ (cf. Theorem 4.2.1), it now suffices to establish, for every Borel set $B$ of $\Omega$,
$$\mu^a(B)\ \le\ \int_B Qf(\nabla u)\,dx,\tag{11.46}$$
$$\mu^s(B)\ \le\ \int_B(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|.\tag{11.47}$$
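To estimate the singular part $\mu^s$ from above, i.e., (11.47), we will need the following continuity result.

Lemma 11.3.3. Let $h : \mathbb{M}^{m\times N}\to\mathbb{R}$ be a continuous function and $(u_n)_{n\in\mathbb{N}}$ a sequence in $BV(\Omega,\mathbb{R}^m)$ converging to $u$ for the intermediate convergence. Then
$$\lim_{n\to+\infty}\int_\Omega h\Big(\frac{Du_n}{|Du_n|}\Big)\,d|Du_n|=\int_\Omega h\Big(\frac{Du}{|Du|}\Big)\,d|Du|.$$
For a proof, we refer the reader to Luckhaus and Modica [174] or, in the convex case, to Demengel and Temam [126]. Let us set, for every $a\in\mathbb{M}^{m\times N}$,
$$g(a)=\sup_{t>0}\frac{Qf(ta)-Qf(0)}{t}.$$
According to the rank-one convexity of $Qf$ (cf. Remark 11.2.1), we note that if $\mathrm{rank}(a)=1$, one has $g(a)=(Qf)^\infty(a)$: indeed, $t\mapsto Qf(ta)$ is then convex, so the difference quotients above are nondecreasing in $t$ and their supremum is their limit as $t\to+\infty$. For every open subset $A$ of $\Omega$, consider $u_n\in BV(A,\mathbb{R}^m)\cap C^\infty(A,\mathbb{R}^m)$, strongly converging to $u$ in $L^1(A,\mathbb{R}^m)$, and such that
$$\lim_{n\to+\infty}|Du_n|(A)=|Du|(A).$$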
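The identity $g(a)=(Qf)^\infty(a)$ for rank-one $a$ can be observed numerically. The sketch below assumes, for illustration only, the convex density $Qf(a)=\sqrt{1+|a|^2}$ (Frobenius norm), whose recession function is $|a|$; it evaluates the difference quotients defining $g$ along a rank-one matrix and checks that they increase towards $|a|$.

```python
import numpy as np


def Qf(a: np.ndarray) -> float:
    # Hypothetical convex density (so Qf = f): Qf(a) = sqrt(1 + |a|^2),
    # |a| the Frobenius norm; its recession function is (Qf)^inf(a) = |a|.
    return float(np.sqrt(1.0 + np.sum(a * a)))


def difference_quotient(a: np.ndarray, t: float) -> float:
    # (Qf(t a) - Qf(0)) / t, whose supremum over t > 0 defines g(a)
    return (Qf(t * a) - Qf(np.zeros_like(a))) / t


# A rank-one 2 x 3 matrix a = u (x) v
a = np.outer([1.0, 2.0], [0.5, -1.0, 3.0])
ts = np.logspace(-2, 6, 81)
quotients = np.array([difference_quotient(a, t) for t in ts])

# t -> Qf(t a) is convex, so the quotients are nondecreasing in t and their
# supremum g(a) is the limit t -> +inf, i.e. (Qf)^inf(a) = |a|.
print("nondecreasing:", bool(np.all(np.diff(quotients) >= -1e-9)))
print("largest quotient:", quotients[-1], " |a| =", np.linalg.norm(a))
```

For this particular density convexity holds in every direction, so the same monotonicity holds for any $a$; for a genuinely nonconvex $Qf$ the argument only uses convexity along rank-one lines.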
Such a sequence exists by Theorem 10.1.2. From $Qf\le Qf(0)+g$ (take $t=1$ in the definition of $g$), the positive 1-homogeneity of $g$, and Lemma 11.3.3, one has
$$\mu(A)\ \le\ Qf(0)|A|+\lim_{n\to+\infty}\int_A g\Big(\frac{Du_n}{|Du_n|}\Big)\,d|Du_n|=Qf(0)|A|+\int_A g\Big(\frac{Du}{|Du|}\Big)\,d|Du|.$$
The same inequality now holds true for any Borel set $B$ of $\Omega$. Taking the singular part of the two members and according to the rank-one property of the singular part of $Du$, we obtain
$$\mu^s(B)\ \le\ \int_B(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|.$$
We are going to estimate from above the absolutely continuous part $\mu^a$, i.e., (11.46). Let $\rho_n$ be a regularizing kernel (see Theorem 4.2.2) and set $u_n=\rho_n*u$. From $\nabla u_n=\rho_n*Du=\rho_n*\nabla u+\rho_n*D^su$ and the local Lipschitz continuity of $Qf$, one has, for every open set $A'\subset\subset A$,
$$\int_{A'}Qf(\nabla u_n)\,dx\ \le\ \int_{A'}Qf(\rho_n*\nabla u)\,dx+L\,|\rho_n*D^su|(A')\ \le\ \int_{A'}Qf(\rho_n*\nabla u)\,dx+L\,|D^su|\big(A'+\operatorname{spt}(\rho_n)\big).$$
Letting $n\to+\infty$ yields
$$\mu(A')\ \le\ \int_{A'}Qf(\nabla u)\,dx+L\,|D^su|(A)$$
and, since $A'\subset\subset A$ is arbitrary,
$$\mu(A)\ \le\ \int_A Qf(\nabla u)\,dx+L\,|D^su|(A)$$
for every open subset $A$ of $\Omega$. Since the Lebesgue measure and the nonnegative Borel measure $|D^su|$ are regular, the same inequality now holds true for every Borel subset $B$ of $\Omega$. Taking the absolutely continuous part of each member, we finally obtain
$$\mu^a(B)\ \le\ \int_B Qf(\nabla u)\,dx$$
and the proof is complete.

We now compute the lsc envelope of the integral functional in (11.34) by taking into account a boundary condition on a part $\Gamma_0$ of the boundary $\partial\Omega$ of $\Omega$. More precisely, we aim to describe the lsc envelope of the integral functional $F : L^1(\Omega,\mathbb{R}^m)\to\mathbb{R}_+\cup\{+\infty\}$ defined by
$$F(u)=\begin{cases}\displaystyle\int_\Omega f(\nabla u)\,dx & \text{if } u\in W^{1,1}_{\Gamma_0}(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise.}\end{cases}$$
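Corollary 11.3.1. The lsc envelope of the integral functional $F$ is given by
$$\mathrm{cl}(F)(u)=\begin{cases}\displaystyle\int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|+\int_{\Gamma_0}(Qf)^\infty(\gamma_0(u)\otimes\nu)\,d\mathcal{H}^{N-1} & \text{if } u\in BV(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise,}\end{cases}$$
where $\nu$ denotes the outer unit normal to $\Gamma_0$ and $\gamma_0$ the trace operator. In other words, in this relaxation process, the boundary condition is translated in terms of the surface energy $\int_{\Gamma_0}(Qf)^\infty(\gamma_0(u)\otimes\nu)\,d\mathcal{H}^{N-1}$. The proof is very similar to that of Proposition 11.3.2. For details, consult [1] and the references therein. As a consequence, we obtain the relaxation theorem in the case $p=1$.

Theorem 11.3.2 (relaxation theorem, $p=1$). Let us consider a function $f:\mathbb{M}^{m\times N}\to\mathbb{R}$ satisfying (11.5) and (11.6) and $g$ in $L^\infty(\Omega,\mathbb{R}^m)$ satisfying $\|g\|_\infty<\alpha/C_P$, where $C_P$ is the Poincaré constant in $\Omega$. Then the relaxed problem of
$$\inf\Big\{\int_\Omega f(\nabla u)\,dx-\int_\Omega g\cdot u\,dx : u\in W_0^{1,1}(\Omega,\mathbb{R}^m)\Big\}\tag{P}$$
in the sense of Theorem 11.1.2 is given by
$$\min\Big\{\int_\Omega Qf(\nabla u)\,dx+\int_\Omega(Qf)^\infty\Big(\frac{D^su}{|D^su|}\Big)\,d|D^su|+\int_{\partial\Omega}(Qf)^\infty(\gamma_0(u)\otimes\nu)\,d\mathcal{H}^{N-1}-\int_\Omega g\cdot u\,dx : u\in BV(\Omega,\mathbb{R}^m)\Big\}.\tag{$\overline{P}$}$$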
Proof. Note that, from the estimate $\|g\|_\infty<\alpha/C_P$, we easily see that the infimum is finite. Arguing as in the proof of Theorem 11.2.2, according to Theorem 11.1.2, it suffices to establish the inf-compactness of
$$u\mapsto F(u)-\int_\Omega g\cdot u\,dx$$
in $L^1(\Omega,\mathbb{R}^m)$. This property is a straightforward consequence of the estimate $\|g\|_\infty<\alpha/C_P$ and of the compactness of the embedding of $W_0^{1,1}(\Omega,\mathbb{R}^m)$ into $L^1(\Omega,\mathbb{R}^m)$.
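The coercivity behind this inf-compactness can be written out in two lines. The derivation below is a sketch resting on two assumptions, since (11.5) is not restated in this section: that the lower bound in (11.5) reads $\alpha|a|\le f(a)$, and that $C_P$ is normalized so that $\|u\|_{L^1}\le C_P\|\nabla u\|_{L^1}$ on $W_0^{1,1}(\Omega,\mathbb{R}^m)$.

```latex
\begin{aligned}
\int_\Omega f(\nabla u)\,dx-\int_\Omega g\cdot u\,dx
  &\ge \alpha\,\|\nabla u\|_{L^1}-\|g\|_\infty\,\|u\|_{L^1}
   && \text{(lower growth, H\"older)}\\
  &\ge \bigl(\alpha-C_P\,\|g\|_\infty\bigr)\,\|\nabla u\|_{L^1}
   && \text{(Poincar\'e on } W_0^{1,1})\\
  &\ge 0 .
\end{aligned}
```

Since $\alpha-C_P\|g\|_\infty>0$, every sublevel set of the functional is bounded in $W_0^{1,1}(\Omega,\mathbb{R}^m)$, hence relatively compact in $L^1(\Omega,\mathbb{R}^m)$ by the Rellich–Kondrachov theorem, and the infimum is bounded below by $0$.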
11.4 Relaxation in the space of Young measures in nonlinear elasticity
The strategy described in Sections 11.2 and 11.3 has the disadvantage of quasi-convexifying the density function $f$, so that the relaxed functional, with density $Qf$, does not provide information on the oscillations of the gradients of minimizing sequences. In this section, we describe an alternative way of relaxing the free energy by using the notion of Young measure introduced in Section 4.3, which is well adapted for capturing oscillations of minimizing sequences (see Subsection 4.3.6). According to the point of view of Ball and James in [56] or Bhattacharya and Kohn in [69], the density of the relaxed free energy obtained this way is the microscopic free-energy density corresponding to the macroscopic free-energy density $Qf$.
11.4.1 Young measures generated by gradients

To shorten notation, we denote by $\mathcal{L}$ the $N$-dimensional Lebesgue measure restricted to the bounded open subset $\Omega\subset\mathbb{R}^N$.

Definition 11.4.1. Let us denote by $E:=\mathbb{R}^{mN}\simeq\mathbb{M}^{m\times N}$ the set of $m\times N$ matrices. A Young measure $\mu$ in $\mathcal{Y}(\Omega;E)$ is called a $W^{1,p}$-Young measure if there exists a bounded sequence $(u_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$, $p\ge1$, such that $\mu$ is generated by the sequence of gradients $(\nabla u_n)_{n\in\mathbb{N}}$, i.e.,
$$(\delta_{\nabla u_n(x)})_{x\in\Omega}\otimes\mathcal{L}\ \overset{\mathrm{nar}}{\rightharpoonup}\ \mu,$$
or, equivalently, from Theorem 4.3.1,
$$(\delta_{\nabla u_n(x)})_{x\in\Omega}\ \overset{Lw}{\rightharpoonup}\ (\mu_x)_{x\in\Omega}.$$
We admit the following important technical lemma. For a proof, consult Kinderlehrer and Pedregal [162], Pedregal [195], or Fonseca, Müller, and Pedregal [137].

Lemma 11.4.1. Let $(u_n)_{n\in\mathbb{N}}$ be a sequence in $W^{1,p}(\Omega,\mathbb{R}^m)$ weakly converging to some $u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$. Then there exists a sequence $(v_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ satisfying
(i) $v_n\in u+W_0^{1,p}(\Omega,\mathbb{R}^m)$;
(ii) $(|\nabla v_n|^p)_{n\in\mathbb{N}}$ is uniformly integrable;
(iii) $v_n-u_n\to0$ and $\nabla(v_n-u_n)\to0$ in measure.

Let us point out that, according to item (iii) and to Proposition 4.3.8, the two sequences $(\nabla u_n)_{n\in\mathbb{N}}$ and $(\nabla v_n)_{n\in\mathbb{N}}$ generate the same $W^{1,p}$-Young measure. The main result of this section is the following characterization of $W^{1,p}$-Young measures, established by Kinderlehrer and Pedregal (see Kinderlehrer and Pedregal [162], Pedregal [195], Sychev [215]).

Theorem 11.4.1. Let $p>1$; then $\mu\in\mathcal{Y}(\Omega;E)$ is a $W^{1,p}$-Young measure if and only if there exists $u\in W^{1,p}(\Omega,\mathbb{R}^m)$ such that the following three assertions hold:
(i) $\nabla u(x)=\int_E\lambda\,d\mu_x(\lambda)$ for a.e. $x$ in $\Omega$;
(ii) for every quasi-convex function $\phi:E\to\mathbb{R}$ for which there exist some $\gamma\in\mathbb{R}$ and $\beta>0$ such that $\gamma\le\phi(\lambda)\le\beta(1+|\lambda|^p)$ for all $\lambda\in E$, one has $\phi(\nabla u(x))\le\int_E\phi(\lambda)\,d\mu_x(\lambda)$ for a.e. $x$ in $\Omega$;
(iii) $\int_{\Omega\times E}|\lambda|^p\,d\mu(x,\lambda)<+\infty$.
The function $u$ will be referred to as the underlying deformation of the Young measure $\mu$.

Proof. We split the proof into several steps.

Proof of the necessary conditions. First step: Necessity of (i) and (iii). By definition, there exists a bounded sequence $(u_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ weakly converging (up to a subsequence) to some $u\in W^{1,p}(\Omega,\mathbb{R}^m)$, whose sequence of gradients generates $\mu$. Since $\nabla u_n\rightharpoonup\nabla u$ weakly in $L^p(\Omega,E)$, one obtains (i) by applying Proposition 4.3.6. Now take $\varphi(x,\lambda)=|\lambda|^p$. Since $\lambda\mapsto\varphi(x,\lambda)$ is lsc (actually continuous), according to Proposition 4.3.3, we deduce
$$\int_{\Omega\times E}|\lambda|^p\,d\mu(x,\lambda)\ \le\ \liminf_{n\to+\infty}\int_\Omega|\nabla u_n|^p\,dx<+\infty.$$
Second step: Necessity of (ii). Let $\phi$ be a quasi-convex function satisfying the growth condition in (ii) and $x_0$ a fixed element of $\Omega$ such that the following two limits exist:
$$\phi(\nabla u(x_0))=\lim_{\rho\to0}\frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\phi(\nabla u(x))\,dx,$$
$$\int_E\phi(\lambda)\,d\mu_{x_0}(\lambda)=\lim_{\rho\to0}\frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\int_E\phi(\lambda)\,d\mu_x(\lambda)\,dx.$$
Such an $x_0$ exists outside a negligible set, by the Lebesgue differentiation theorem. Let us set $\varphi(x,\lambda)=\frac{1}{\mathcal{L}(B_\rho(x_0))}1_{B_\rho(x_0)}(x)\,\phi(\lambda)$, which defines a $\mathcal{B}(\Omega)\otimes\mathcal{B}(E)$-measurable function, continuous with respect to $\lambda$. Consider now the sequence $(v_n)_{n\in\mathbb{N}}$ given by Lemma 11.4.1 with $\Omega=B_\rho(x_0)$. From the growth condition fulfilled by $\phi$, the sequence $(\varphi(x,\nabla v_n(x)))_{n\in\mathbb{N}}$ is uniformly integrable. Therefore, by applying Theorem 4.3.3, we obtain
$$\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu(x,\lambda)=\lim_{n\to+\infty}\frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\phi(\nabla v_n(x))\,dx\ \ge\ \frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\phi(\nabla u(x))\,dx.\tag{11.48}$$
The last inequality is a consequence of the lower semicontinuity of the integral functional
$$v\mapsto\int_{B_\rho(x_0)}\phi(\nabla v(x))\,dx$$
when $W^{1,p}(B_\rho(x_0),\mathbb{R}^m)$ is equipped with its weak convergence, due to the quasi-convexity assumption on $\phi$ (see Theorem 13.2.1). But, according to the slicing theorem, Theorem 4.2.4,
$$\int_{\Omega\times E}\varphi(x,\lambda)\,d\mu(x,\lambda)=\frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\int_E\phi(\lambda)\,d\mu_x(\lambda)\,dx,$$
so that (11.48) yields
$$\frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\int_E\phi(\lambda)\,d\mu_x(\lambda)\,dx\ \ge\ \frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\phi(\nabla u(x))\,dx.$$
The conclusion then follows by letting $\rho\to0$.

Proof of the sufficient conditions. The proof of the sufficient conditions is more involved. Furthermore, we prove that any Young measure satisfying conditions (i), (ii), and (iii) is generated by the sequence of gradients of a bounded sequence in $u+W_0^{1,p}(\Omega,\mathbb{R}^m)$. For the convenience of the reader we divide the proof into several steps.

First step: The Young measure $\mu$ is assumed to be homogeneous. Let $a$ be a fixed matrix in $E$, $Q$ a fixed bounded open cube of $\mathbb{R}^N$, and let $l_a$ be the linear function defined for every $x\in Q$ by $l_a(x)=a\cdot x$. We consider the closed set $H_a(E)$ of probability measures $\mu$ on $E$ satisfying the following three conditions:
(i) $\int_E\lambda\,d\mu=a$;
(ii) for every quasi-convex function $\phi$ satisfying $\gamma\le\phi(\lambda)\le\beta(1+|\lambda|^p)$ for all $\lambda\in E$, for some $\gamma\in\mathbb{R}$ and $\beta>0$, one has $\phi(a)\le\int_E\phi(\lambda)\,d\mu(\lambda)$;
(iii) $\int_E|\lambda|^p\,d\mu<+\infty$.
Note that these three conditions are exactly the ones stated in Theorem 11.4.1, fulfilled by the homogeneous Young measure $\mu\otimes\mathcal{L}$. In this step we set $\Omega=Q$ and we aim to establish the existence of a bounded sequence $(u_n)_{n\in\mathbb{N}}$ in $l_a+W_0^{1,p}(Q,\mathbb{R}^m)$ such that the sequence $(\nabla u_n)_{n\in\mathbb{N}}$ generates the Young measure $\mu\otimes\mathcal{L}_Q$. In the second step, we will apply this result in the case $a=0$. For every $v\in l_a+W^{1,p}(Q,\mathbb{R}^m)$, let us consider the following probability measure on $E$:
$$\mu_v:=\frac{1}{\mathcal{L}(Q)}\int_Q\delta_{\nabla v(x)}\,dx,$$
which acts on every $\varphi\in C_0(E)$ as follows:
$$\langle\mu_v,\varphi\rangle=\frac{1}{\mathcal{L}(Q)}\int_Q\varphi(\nabla v(x))\,dx.$$
Note that $\int_E|\lambda|^p\,d\mu_v$ is well defined and precisely
$$\int_E|\lambda|^p\,d\mu_v=\frac{1}{\mathcal{L}(Q)}\int_Q|\nabla v(x)|^p\,dx.$$
We finally consider the following convex subset of $H_a(E)$:
$$C_a(E):=\big\{\mu_v : v\in l_a+W_0^{1,p}(Q,\mathbb{R}^m)\big\}.$$
Now let $v$ be a fixed element of $l_a+W_0^{1,p}(Q,\mathbb{R}^m)$, with $v-l_a$ extended by $Q$-periodicity to $\mathbb{R}^N$, and, for every $n\in\mathbb{N}^*$, let us define the function $v_n : x\mapsto v_n(x):=\frac1n v(nx)$ in $l_a+W^{1,p}(Q,\mathbb{R}^m)$. According to a classical result on oscillating functions, rephrased in terms of narrow convergence of Young measures, one has
$$(\delta_{\nabla v_n(x)})_{x\in Q}\otimes\mathcal{L}_Q\ \overset{\mathrm{nar}}{\rightharpoonup}\ \mu_v\otimes\mathcal{L}_Q\quad\text{when } n\to+\infty.\tag{11.49}$$
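Note that the norm of $\nabla v_n$ in $L^p(Q,E)$ is exactly that of $\nabla v$. To conclude, it is enough to show that for every $\mu\in H_a(E)$ there exists a sequence $(\mu_{w_k})_{k\in\mathbb{N}}$ in $C_a(E)$, with $(w_k)_{k\in\mathbb{N}}$ bounded in $l_a+W_0^{1,p}(Q,\mathbb{R}^m)$, such that $\mu_{w_k}\rightharpoonup\mu$ weakly in $\mathcal{M}(E)$, i.e., for the $\sigma(C_0(E)',C_0(E))$ topology; hence, according to Theorem 4.3.1, and since $\mu_{w_k}$ and $\mu$ are homogeneous,
$$\mu_{w_k}\otimes\mathcal{L}_Q\ \overset{\mathrm{nar}}{\rightharpoonup}\ \mu\otimes\mathcal{L}_Q\quad\text{when } k\to+\infty.\tag{11.50}$$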
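The boundedness requirement on $(w_k)$ is a genuine extra condition, which a toy case with $E=\mathbb{R}$ and $p=2$ makes visible. The Python sketch below uses a hypothetical finite family of Gaussian bumps as a stand-in for a dense sequence $(\varphi_i)$ in $C_0(\mathbb{R})$ (a truncation, not a dense family) and the measures $\mu_k=(1-\frac1k)\delta_0+\frac1k\delta_k$. Their distance to $\delta_0$ in the weak-$*$ metric $d(\mu,\nu)=\sum_i 2^{-i}\|\varphi_i\|_\infty^{-1}|\langle\mu,\varphi_i\rangle-\langle\nu,\varphi_i\rangle|$ tends to $0$, whereas the moments $\int|\lambda|^p\,d\mu_k=k^{p-1}$ blow up, so the finer distance $d'$, which adds $\big|\int|\lambda|^p\,d\mu-\int|\lambda|^p\,d\nu\big|$ to $d$, does not tend to $0$.

```python
import numpy as np

P = 2.0  # exponent p > 1 chosen for the illustration


def bumps(centres):
    # Hypothetical stand-in for (phi_i): Gaussian bumps, each in C_0(R), sup norm 1
    return [lambda lam, c=c: np.exp(-(lam - c) ** 2) for c in centres]


def pairing(measure, phi) -> float:
    # <mu, phi> for a finitely supported probability measure given as (weights, atoms)
    weights, atoms = measure
    return float(np.dot(weights, phi(np.asarray(atoms, dtype=float))))


def moment(measure) -> float:
    # ∫ |lambda|^P d(mu)
    weights, atoms = measure
    return float(np.dot(weights, np.abs(np.asarray(atoms, dtype=float)) ** P))


def d(mu, nu, phis) -> float:
    # truncated weak-* metric; ||phi_i||_inf = 1 for the bumps, index i starts at 1
    return sum(abs(pairing(mu, f) - pairing(nu, f)) / 2 ** i
               for i, f in enumerate(phis, start=1))


def d_prime(mu, nu, phis) -> float:
    return abs(moment(mu) - moment(nu)) + d(mu, nu, phis)


phis = bumps(range(-10, 11))
delta0 = ([1.0], [0.0])
for k in (10, 100, 1000):
    mu_k = ([1.0 - 1.0 / k, 1.0 / k], [0.0, float(k)])
    print(k, d(mu_k, delta0, phis), d_prime(mu_k, delta0, phis))
```

Mass $1/k$ escaping to infinity is invisible to test functions vanishing at infinity, which is why weak-$*$ closeness alone cannot keep $\int_Q|\nabla w_k|^p\,dx$ bounded.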
Indeed, since the space $\mathcal{Y}(Q;E)$ endowed with the topology of narrow convergence is metrizable (see [104, Proposition 2.3.1]), combining (11.49) and (11.50) and using a diagonalization argument (cf. Lemma 11.1.1), we will show that there exists a bounded sequence $(u_n)_{n\in\mathbb{N}^*}$ in $l_a+W^{1,p}(Q,\mathbb{R}^m)$ whose sequence of gradients generates the Young measure $\mu\otimes\mathcal{L}_Q$. We are going to establish (11.50) or, equivalently, the density of $C_a(E)$ in $H_a(E)$ for the $\sigma(C_0(E)',C_0(E))$ topology. Let us point out that we want to prove the existence of a sequence $(\mu_{w_m})_{m\in\mathbb{N}}$ weakly converging to $\mu$ such that, moreover, $(w_m)_{m\in\mathbb{N}}$ is bounded in $l_a+W_0^{1,p}(Q,\mathbb{R}^m)$ or, equivalently, such that $(\nabla w_m)_{m\in\mathbb{N}}$ is bounded in $L^p(Q,E)$. To take this condition into account, we establish $\overline{C_a(E)}=H_a(E)$ for a metric $d'$ finer than the classical metric $d$ inducing the $\sigma(C_0(E)',C_0(E))$ topology on the set $\mathcal{P}(E)$ of probability measures on $E$. Let us recall that, given a dense countable family $(\varphi_i)_{i\in\mathbb{N}^*}$ in $C_0(E)$, the distance $d$ is given by
$$\forall(\mu,\nu)\in\mathcal{P}(E)\times\mathcal{P}(E),\qquad d(\mu,\nu):=\sum_{i=1}^{+\infty}\frac{1}{2^i\|\varphi_i\|_\infty}\big|\langle\mu,\varphi_i\rangle-\langle\nu,\varphi_i\rangle\big|.$$
We now define the distance $d'$ by setting, for every $(\mu,\nu)\in\mathcal{P}(E)\times\mathcal{P}(E)$,
$$d'(\mu,\nu):=\Big|\int_E|\lambda|^p\,d\mu-\int_E|\lambda|^p\,d\nu\Big|+\sum_{i=1}^{+\infty}\frac{1}{2^i\|\varphi_i\|_\infty}\big|\langle\mu,\varphi_i\rangle-\langle\nu,\varphi_i\rangle\big|,$$
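and we argue by contradiction. Let us assume that $C_a(E)$ is not dense in $H_a(E)$ for the metric associated with the distance $d'$. Then there exist $\mu_0\in H_a(E)$, $k\in\mathbb{N}^*$, and $\eta>0$ such that
$$\forall\nu\in C_a(E),\qquad\Big|\int_E|\lambda|^p\,d\mu_0-\int_E|\lambda|^p\,d\nu\Big|+\sum_{i=1}^{k}\frac{1}{2^i\|\varphi_i\|_\infty}\big|\langle\mu_0,\varphi_i\rangle-\langle\nu,\varphi_i\rangle\big|>\eta.\tag{11.51}$$
We now reason in the finite-dimensional space $\mathbb{R}^{k+1}$. From (11.51), the vector
$$\Big(\int_E|\lambda|^p\,d\mu_0,\ \Big(\frac{1}{2^i\|\varphi_i\|_\infty}\langle\mu_0,\varphi_i\rangle\Big)_{i=1,\dots,k}\Big)$$
does not belong to the closure of the following convex set of $\mathbb{R}^{k+1}$:
$$\mathcal{C}_a:=\Big\{\Big(\int_E|\lambda|^p\,d\nu,\ \Big(\frac{1}{2^i\|\varphi_i\|_\infty}\langle\nu,\varphi_i\rangle\Big)_{i=1,\dots,k}\Big) : \nu\in C_a(E)\Big\}.$$
Consequently, according to the Hahn–Banach separation theorem, Theorem 9.1.1, there exist $(c_i)_{i=0,\dots,k}$ in $\mathbb{R}^{k+1}$ and $\eta>0$ such that, for all $\nu\in C_a(E)$,
$$c_0\int_E|\lambda|^p\,d\nu+\sum_{i=1}^{k}\frac{c_i}{2^i\|\varphi_i\|_\infty}\langle\nu,\varphi_i\rangle\ \ge\ \eta+c_0\int_E|\lambda|^p\,d\mu_0+\sum_{i=1}^{k}\frac{c_i}{2^i\|\varphi_i\|_\infty}\langle\mu_0,\varphi_i\rangle.\tag{11.52}$$
Let us set
$$\phi:=c_0|\cdot|^p+\sum_{i=1}^{k}\frac{1}{2^i}\,\frac{c_i\,\varphi_i}{\|\varphi_i\|_\infty}.$$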
We claim that $c_0\ge0$. Otherwise, taking $\nu=\mu_{v_t}$ in (11.52) with $v_t:=tw+l_a$, $t>0$, where $w$ is a fixed function in $W_0^{1,p}(Q,\mathbb{R}^m)$, and letting $t\to+\infty$, the left-hand side of (11.52) would tend to $-\infty$. Replacing if necessary (when $c_0=0$) the function $\phi$ by $\phi+\delta|\cdot|^p$ with $\delta>0$ small enough (precisely, choose $\delta$ such that $\eta-\delta\int_E|\lambda|^p\,d\mu_0>0$), one may assume $c_0>0$. Thus the function $\phi$ still satisfies (11.52), with $\eta>0$ replaced by $\eta-\delta\int_E|\lambda|^p\,d\mu_0>0$ if necessary, together with the growth conditions $c_0|\lambda|^p+\gamma\le\phi(\lambda)\le\beta(1+|\lambda|^p)$ for some constants $\gamma\in\mathbb{R}$ and $\beta>0$. Finally, replacing $\phi$ by $\phi-\gamma$, one may assume that $\phi$ satisfies (11.52) and the growth conditions $c_0|\lambda|^p\le\phi(\lambda)\le\beta(1+|\lambda|^p)$ of Proposition 11.2.2. Therefore, (11.52) and condition (ii) satisfied by $\mu_0$ yield
$$\inf\Big\{\frac{1}{\mathcal{L}(Q)}\int_Q\phi(\nabla v(x))\,dx : v\in l_a+W_0^{1,p}(Q,\mathbb{R}^m)\Big\}\ >\ \int_E\phi\,d\mu_0\ \ge\ \int_E Q\phi\,d\mu_0\ \ge\ Q\phi(a).$$
But, according to the classical variational principle (Proposition 11.2.2), since $\phi$ satisfies appropriate growth conditions,
$$\inf\Big\{\frac{1}{\mathcal{L}(Q)}\int_Q\phi(\nabla v(x))\,dx : v\in l_a+W_0^{1,p}(Q,\mathbb{R}^m)\Big\}=Q\phi(a),$$
a contradiction.

Second step: The Young measure $\mu$ satisfies the conditions (i), (ii), and (iii) of Theorem 11.4.1 and $u=0$. We want to construct a bounded sequence $(u_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ whose gradients generate $\mu$. The idea of the proof consists in "localizing" $(\mu_x)_{x\in\Omega}$ thanks to Vitali's covering theorem, applying the previous step to each localization, and then gluing together the sequences of functions whose gradients generate each localized Young measure. According to Vitali's covering Lemma 4.1.2 and Remark 4.1.4, for every $k\in\mathbb{N}^*$, there exists a finite family $(Q_{i,k})_{i\in I_k}$ of pairwise disjoint open cubes included in $\Omega$ satisfying
$$\mathcal{L}\Big(\Omega\setminus\bigcup_{i\in I_k}Q_{i,k}\Big)\le\frac1k,\qquad\operatorname{diam}(Q_{i,k})<\frac1k.\tag{11.53}$$
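For every $i\in I_k$, let us define the probability measure $\mu_{i,k}$ on $E$ by
$$\mu_{i,k}:=\frac{1}{\mathcal{L}(Q_{i,k})}\int_{Q_{i,k}}\mu_x\,dx,$$
which acts on every $\varphi\in C_0(E)$ as follows:
$$\langle\mu_{i,k},\varphi\rangle=\frac{1}{\mathcal{L}(Q_{i,k})}\int_{Q_{i,k}}\int_E\varphi(\lambda)\,d\mu_x(\lambda)\,dx.$$
With the notation of the first step, it is easy to show that $\mu_{i,k}$ belongs to $H_0(E)$. Consequently, according to step 1, for every $i\in I_k$, there exists a bounded sequence $(v_{i,k,n})_{n\in\mathbb{N}}$ in $W_0^{1,p}(Q_{i,k},\mathbb{R}^m)$ such that
$$(\delta_{\nabla v_{i,k,n}(x)})_{x\in Q_{i,k}}\otimes\mathcal{L}_{Q_{i,k}}\ \overset{\mathrm{nar}}{\rightharpoonup}\ \mu_{i,k}\otimes\mathcal{L}_{Q_{i,k}}\quad\text{when } n\to+\infty.\tag{11.54}$$
By using Lemma 11.4.1, one may furthermore assume that the sequence $(|\nabla v_{i,k,n}|^p)_{n\in\mathbb{N}}$ is uniformly integrable so that, from Theorem 4.3.3, (11.54) yields
$$\lim_{n\to+\infty}\int_{Q_{i,k}}|\nabla v_{i,k,n}|^p\,dx=\int_{Q_{i,k}}\int_E|\lambda|^p\,d\mu_{i,k}\,dx=\int_{Q_{i,k}}\int_E|\lambda|^p\,d\mu_x\,dx.\tag{11.55}$$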
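We now glue together the functions $v_{i,k,n}$, $i\in I_k$, by setting
$$v_{k,n}(x):=\begin{cases}v_{i,k,n}(x) & \text{if } x\in Q_{i,k},\\ 0 & \text{if } x\in\Omega\setminus\bigcup_{i\in I_k}Q_{i,k}.\end{cases}$$
Now take $\theta$ in a dense subset of regular functions of $L^1(\Omega)$, $\varphi\in C_0(E)$, and set $R_k:=\int_{\Omega\setminus\bigcup_{i\in I_k}Q_{i,k}}\theta(x)\varphi(0)\,dx$. Note that, from (11.53), $\lim_{k\to+\infty}R_k=0$. From (11.54), the definition of $\mu_{i,k}$, and the mean value theorem, one has
$$\begin{aligned}
\lim_{n\to+\infty}\int_\Omega\theta(x)\varphi(\nabla v_{k,n}(x))\,dx
&=\lim_{n\to+\infty}\sum_{i\in I_k}\int_{Q_{i,k}}\theta(x)\varphi(\nabla v_{k,n}(x))\,dx+R_k\\
&=\sum_{i\in I_k}\int_{Q_{i,k}}\theta(x)\langle\mu_{i,k},\varphi\rangle\,dx+R_k\\
&=\sum_{i\in I_k}\Big(\frac{1}{\mathcal{L}(Q_{i,k})}\int_{Q_{i,k}}\int_E\varphi\,d\mu_y\,dy\Big)\int_{Q_{i,k}}\theta(x)\,dx+R_k\\
&=\sum_{i\in I_k}\theta(x_{i,k})\int_{Q_{i,k}}\int_E\varphi\,d\mu_y\,dy+R_k
\end{aligned}\tag{11.56}$$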
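for some $x_{i,k}\in Q_{i,k}$. We stress the fact that the convergence in (11.56) may be understood in the sense of narrow convergence. Indeed, setting
$$\tilde\mu_x:=\begin{cases}\mu_{i,k} & \text{if } x\in Q_{i,k},\\ \delta_0 & \text{if } x\in\Omega\setminus\bigcup_{i\in I_k}Q_{i,k},\end{cases}$$
one defines a Young measure $(\tilde\mu_x)_{x\in\Omega}\otimes\mathcal{L}$, and estimate (11.56) yields $\lim_{n\to+\infty}(\delta_{\nabla v_{k,n}(x)})_{x\in\Omega}\otimes\mathcal{L}=(\tilde\mu_x)_{x\in\Omega}\otimes\mathcal{L}$ in the narrow convergence sense in $\mathcal{Y}(\Omega;E)$. Letting $k\to+\infty$ in (11.56), from (11.53), one obtains
$$\begin{aligned}
\lim_{k\to+\infty}\lim_{n\to+\infty}\int_\Omega\theta(x)\varphi(\nabla v_{k,n}(x))\,dx
&=\lim_{k\to+\infty}\sum_{i\in I_k}\theta(x_{i,k})\int_{Q_{i,k}}\int_E\varphi\,d\mu_y\,dy\\
&=\lim_{k\to+\infty}\sum_{i\in I_k}\int_{Q_{i,k}}\theta(y)\int_E\varphi\,d\mu_y\,dy\\
&=\int_\Omega\theta(y)\int_E\varphi\,d\mu_y\,dy.
\end{aligned}$$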
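This shows that $\lim_{k\to+\infty}\lim_{n\to+\infty}(\delta_{\nabla v_{k,n}(x)})_{x\in\Omega}\otimes\mathcal{L}=(\mu_x)_{x\in\Omega}\otimes\mathcal{L}$, where each limit is taken in the narrow convergence sense in $\mathcal{Y}(\Omega;E)$. According to the diagonalization Lemma 11.1.1, there exists a map $n\mapsto k(n)$ such that
$$(\delta_{\nabla v_{k(n),n}(x)})_{x\in\Omega}\otimes\mathcal{L}\ \overset{\mathrm{nar}}{\rightharpoonup}\ (\mu_x)_{x\in\Omega}\otimes\mathcal{L}\quad\text{when } n\to+\infty.$$
Furthermore, from (11.55), the sequence $(v_{k(n),n})_{n\in\mathbb{N}}$ is bounded in $W^{1,p}(\Omega,\mathbb{R}^m)$. Setting $u_n:=v_{k(n),n}$ gives what is required.

Last step: The Young measure $\mu$ satisfies the conditions (i), (ii), and (iii) of Theorem 11.4.1, with an arbitrary underlying deformation $u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$. Given $\mu\in\mathcal{Y}(\Omega;E)$ satisfying (i), (ii), and (iii), let us consider $\tilde\mu=(\tilde\mu_x)_{x\in\Omega}\otimes\mathcal{L}$ in $\mathcal{Y}(\Omega;E)$ defined by $\langle\tilde\mu_x,\varphi\rangle=\langle\mu_x,\varphi(\cdot-\nabla u(x))\rangle$ for every $\varphi\in L^p_{\mu_x}(E)$ and for a.e. $x$ in $\Omega$. It is straightforward to check that $\tilde\mu$ satisfies the conditions of step 2. Thus, there exists a sequence $(v_n)_{n\in\mathbb{N}}$ in $W_0^{1,p}(\Omega,\mathbb{R}^m)$ such that $(\nabla v_n)_{n\in\mathbb{N}}$ generates $\tilde\mu$. Consider $u_n:=v_n+u$ in $W^{1,p}(\Omega,\mathbb{R}^m)$. For each $\psi\in C_b(\Omega;E)$, let us define $\tilde\psi$ in $C_b(\Omega;E)$ by setting $\tilde\psi(x,\lambda):=\psi(x,\lambda+\nabla u(x))$. Thus, for every $\psi\in C_b(\Omega;E)$ we have
$$\lim_{n\to+\infty}\int_\Omega\psi(x,\nabla u_n(x))\,dx=\lim_{n\to+\infty}\int_\Omega\tilde\psi(x,\nabla v_n(x))\,dx=\int_{\Omega\times E}\tilde\psi(x,\lambda)\,d\tilde\mu(x,\lambda)=\int_{\Omega\times E}\psi(x,\lambda)\,d\mu(x,\lambda),$$
which proves that $(\nabla u_n)_{n\in\mathbb{N}}$ generates $\mu$.

Remark 11.4.1. When considering functions depending on $x$ and $u$, the necessary condition (ii) must be replaced by condition (ii') below: for every $\phi:\Omega\times\mathbb{R}^m\times E\to\mathbb{R}$ such that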
for a.e. $x$ in $\Omega$, $\phi(x,u(x),\cdot)$ is quasi-convex and satisfies $\gamma\le\phi(x,\xi,\lambda)\le\beta(1+|\xi|^p+|\lambda|^p)$ for some constants $\gamma$ and $\beta>0$, one has
$$\phi(x,u(x),\nabla u(x))\ \le\ \int_E\phi(x,u(x),\lambda)\,d\mu_x(\lambda)\quad\text{for a.e. } x\in\Omega.$$
Indeed, it suffices to argue as in the second step of the proof of the necessary conditions by setting
$$\varphi(x,\lambda)=\frac{1}{\mathcal{L}(B_\rho(x_0))}1_{B_\rho(x_0)}(x)\,\phi(x,u(x),\lambda),$$
where $x_0$ is such that the following two limits exist:
$$\phi(x_0,u(x_0),\nabla u(x_0))=\lim_{\rho\to0}\frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\phi(x,u(x),\nabla u(x))\,dx,$$
$$\int_E\phi(x_0,u(x_0),\lambda)\,d\mu_{x_0}(\lambda)=\lim_{\rho\to0}\frac{1}{\mathcal{L}(B_\rho(x_0))}\int_{B_\rho(x_0)}\int_E\phi(x,u(x),\lambda)\,d\mu_x\,dx.$$
11.4.2 Relaxation of classical integral functionals in $\mathcal{Y}(\Omega;E)$
We intend to apply the relaxation procedure to the integral functionals of Section 11.2, but now considered as living in the space $X=\mathcal{Y}(\Omega;E)$, equipped with the topology of narrow convergence. The generalized solutions of the relaxed problem may be interpreted as microstructures: they capture minimizing sequences that oscillate on smaller and smaller spatial scales and describe fine phase mixtures in elastic crystals. Fundamental applications to real materials and polycrystals may be found in [66], [67], [68], [69], and [110]. For papers dealing with laminates and multiwell problems, see [56], [190], [214], [196], and [228]. We consider the problem
$$\inf\big\{F(u) : u\in u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)\big\}\quad(:=\inf(P)),\tag{P}$$
where $u_0$ is a given function in $W^{1,p}(\Omega,\mathbb{R}^m)$, $p>1$, and $F:W^{1,p}(\Omega,\mathbb{R}^m)\to\mathbb{R}$ is the integral functional defined by
$$F(u)=\int_\Omega f(x,u,\nabla u)\,dx.$$
The density $f:\Omega\times\mathbb{R}^m\times E\to\mathbb{R}$ is assumed to be $\mathcal{B}(\Omega)\otimes\mathcal{B}(\mathbb{R}^m)\otimes\mathcal{B}(E)$-measurable, continuous with respect to the third variable, to satisfy the continuity assumption
$$|f(x,\xi,\lambda)-f(x,\xi',\lambda)|\ \le\ L\,|\xi-\xi'|\,\big(1+|\xi|^{p-1}+|\xi'|^{p-1}\big)\tag{11.57}$$
with respect to the second variable, and the usual bounds
$$\alpha(|\lambda|^p-1)\ \le\ f(x,\xi,\lambda)\ \le\ \beta(1+|\xi|^p+|\lambda|^p),$$
where $L>0$ and $0<\alpha<\beta$ are three given constants. For example, $f$ may have the structure $f(x,\xi,\lambda)=f_0(x,\xi)+f_1(x,\lambda)$. For less restrictive continuity assumptions on $f$, see Acerbi and Fusco [2], Dacorogna [116], and Pedregal [195]. In general (P) has no solution, and the relaxation procedure applied to this problem with $X=W^{1,p}(\Omega,\mathbb{R}^m)$ equipped with its weak convergence leads to the classical relaxed problem
$$\min\big\{\overline{F}(u) : u\in u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)\big\}\quad(:=\min(\overline{P})),\tag{$\overline{P}$}$$
where $\overline{F}$ is the integral functional defined on $W^{1,p}(\Omega,\mathbb{R}^m)$ by
$$\overline{F}(u)=\int_\Omega Qf(x,u,\nabla u)\,dx,$$
and, for a.e. $x$ in $\Omega$, $Qf(x,\xi,\cdot)$ is the quasi-convex envelope of $f(x,\xi,\cdot)$. According to Theorem 11.1.2 and arguing as in the proof of Theorem 11.2.2, one can establish $\inf(P)=\min(\overline{P})$. Because of the quasiconvexification, this procedure has the disadvantage of erasing the possible potential wells of $f(x,\xi,\cdot)$, so that the relaxed problem does not provide much information on the behavior of minimizing sequences (see Example 11.4.1 or the examples given in [142]). An alternative way, as we will see now, is to relax (P) in the space $\mathcal{Y}(\Omega;E)$ equipped with narrow convergence. Let us first give some definitions and notation. For a fixed $u_0$ in $W^{1,p}(\Omega,\mathbb{R}^m)$, let $GY_0(\Omega;E)$ be the set of $W^{1,p}$-Young measures such that the underlying deformation $u$ defined in Theorem 11.4.1 belongs to $u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)$. We define its subset of elementary $W^{1,p}$-Young measures by
$$EGY_0(\Omega;E):=\big\{\mu\in\mathcal{Y}(\Omega;E) : \exists u\in u_0+W_0^{1,p}(\Omega,\mathbb{R}^m),\ \mu=(\delta_{\nabla u(x)})_{x\in\Omega}\otimes\mathcal{L}\big\}.$$
Note that $GY_0(\Omega;E)$ is nothing but the closure of $EGY_0(\Omega;E)$ in $\mathcal{Y}(\Omega;E)$. Since $u\in u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)$ is uniquely defined by its gradient $\nabla u\in L^p(\Omega,E)$, the operator
$$T:GY_0(\Omega;E)\to u_0+W_0^{1,p}(\Omega,\mathbb{R}^m),\qquad\mu\mapsto T\mu:=u,\quad\text{where } \nabla u(x)=\int_E\lambda\,d\mu_x(\lambda),$$
is well defined. We now reformulate the problem (P) in terms of Young measures by considering the integral functional $G:\mathcal{Y}(\Omega;E)\to\mathbb{R}\cup\{+\infty\}$ defined by
$$G(\mu)=\begin{cases}\displaystyle\int_{\Omega\times E}f(x,T\mu(x),\lambda)\,d\mu(x,\lambda) & \text{if } \mu\in EGY_0(\Omega;E),\\ +\infty & \text{otherwise.}\end{cases}$$
Note that, in its domain, the functional $G$ is nothing but the functional $F$. More precisely,
$$G(\mu)=\int_\Omega f(x,u(x),\nabla u(x))\,dx=F(u),$$
when $u\in u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)$ and $\mu=(\delta_{\nabla u(x)})_{x\in\Omega}\otimes\mathcal{L}$. The problem (P) is then equivalent to
$$\inf\big\{G(\mu) : \mu\in\mathcal{Y}(\Omega;E)\big\}.$$
On the other hand, let us define the integral functional $\overline{G}:\mathcal{Y}(\Omega;E)\to\mathbb{R}\cup\{+\infty\}$ by setting
$$\overline{G}(\mu)=\begin{cases}\displaystyle\int_{\Omega\times E}f(x,T\mu(x),\lambda)\,d\mu(x,\lambda) & \text{if } \mu\in GY_0(\Omega;E),\\ +\infty & \text{otherwise.}\end{cases}$$
According to Theorem 11.4.1(iii) and to the growth conditions fulfilled by $f$, the domain of $\overline{G}$ is $GY_0(\Omega;E)$. The functional $\overline{G}$ is nothing but the natural extension of $G$ to $GY_0(\Omega;E)$. In the following theorem, we show that $\overline{G}$ is the sequential lower semicontinuous envelope of $G$ when $\mathcal{Y}(\Omega;E)$ is equipped with narrow convergence, and we make precise the relationship between the two relaxed problems in $W^{1,p}(\Omega,\mathbb{R}^m)$ and $\mathcal{Y}(\Omega;E)$.

Theorem 11.4.2. The sequential lower semicontinuous envelope of $G$ in $\mathcal{Y}(\Omega;E)$ equipped with narrow convergence is the extended functional $\overline{G}$. Moreover, we have
$$\inf(P)=\min(\overline{P})=\min(P^{\mathrm{young}}),\qquad\text{where}\quad\min(P^{\mathrm{young}}):=\min\big\{\overline{G}(\mu) : \mu\in\mathcal{Y}(\Omega;E)\big\},$$
and if $\mu$ is a solution of $\min(P^{\mathrm{young}})$, then $u=T\mu$ is a solution of $\min(\overline{P})$. Furthermore, if $(u_n)_{n\in\mathbb{N}}$ is a $1/n$-minimizing sequence of the problem $\inf(P)$, then every narrow cluster point of the sequence $\big((\delta_{\nabla u_n(x)})_{x\in\Omega}\otimes\mathcal{L}\big)_{n\in\mathbb{N}}$ is a solution of $\min(P^{\mathrm{young}})$.

Proof. We begin by proving that $\overline{G}$ is the lsc envelope of $G$. According to Proposition 11.1.1 and Theorem 11.1.1, we must establish the following two assertions: for every $\mu\in\mathcal{Y}(\Omega;E)$,
$$\forall\mu_n\overset{\mathrm{nar}}{\rightharpoonup}\mu,\qquad\overline{G}(\mu)\ \le\ \liminf_{n\to+\infty}G(\mu_n),\tag{11.58}$$
$$\text{there exists a sequence } (\nu_n)_{n\in\mathbb{N}} \text{ in } \mathcal{Y}(\Omega;E),\ \nu_n\overset{\mathrm{nar}}{\rightharpoonup}\mu, \text{ such that } \overline{G}(\mu)\ \ge\ \limsup_{n\to+\infty}G(\nu_n).\tag{11.59}$$
Next, to apply Theorem 11.1.1, we will establish the compactness, in terms of Young measures, of every minimizing sequence of $\inf(P)$. We will conclude by giving the relations between a solution of $\min(P^{\mathrm{young}})$ and the corresponding underlying deformation.

First step: Proof of (11.58). One may assume $\liminf_{n\to+\infty}G(\mu_n)<+\infty$, so that, for a nonrelabeled subsequence, one has $G(\mu_n)<+\infty$, $\mu_n\in EGY_0(\Omega;E)$, $\mu\in GY_0(\Omega;E)$, and
$$G(\mu_n)=\int_\Omega f(x,u_n(x),\nabla u_n(x))\,dx.$$
According to the coerciveness assumption on $f$, there exists $u\in u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)$ such that $u_n\rightharpoonup u$ weakly in $W^{1,p}(\Omega,\mathbb{R}^m)$ and strongly in $L^p(\Omega,\mathbb{R}^m)$. Let us write
$$G(\mu_n)=\int_\Omega f(x,u_n(x),\nabla u_n(x))\,dx=\int_{\Omega\times E}f(x,u(x),\lambda)\,d\mu_n(x,\lambda)+\Big(\int_\Omega f(x,u_n(x),\nabla u_n(x))\,dx-\int_\Omega f(x,u(x),\nabla u_n(x))\,dx\Big).$$
From the continuity assumption (11.57), the boundedness of the sequence $(u_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$, and its strong convergence to $u$ in $L^p(\Omega,\mathbb{R}^m)$, the second term in the right-hand side tends to zero. Since, for a.e. $x\in\Omega$, $\lambda\mapsto f(x,u(x),\lambda)$ is lsc (actually continuous), the conclusion then follows from Proposition 4.3.3.

Second step: Proof of (11.59). One may assume $\overline{G}(\mu)<+\infty$, so that $\mu\in GY_0(\Omega;E)$. According to the definition of $GY_0(\Omega;E)$, there exist $u\in u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)$ and a bounded sequence $(u_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ such that $\mu_n=(\delta_{\nabla u_n(x)})_{x\in\Omega}\otimes\mathcal{L}$ narrowly converges to $\mu$. But, from Lemma 11.4.1, there exists a sequence $(v_n)_{n\in\mathbb{N}}$ in $W^{1,p}(\Omega,\mathbb{R}^m)$ satisfying
(i) $v_n\in u+W_0^{1,p}(\Omega,\mathbb{R}^m)$;
(ii) $(|\nabla v_n|^p)_{n\in\mathbb{N}}$ is uniformly integrable;
(iii) $v_n-u_n\to0$ and $\nabla(v_n-u_n)\to0$ in measure.

Hence, by (iii) and Proposition 4.3.8, $\nu_n:=(\delta_{\nabla v_n(x)})_{x\in\Omega}\otimes\mathcal{L}$ narrowly converges to $\mu$. Moreover, it is easily seen that, up to a subsequence, $v_n$ strongly converges to $u$ in $L^p(\Omega,\mathbb{R}^m)$. From now on we consider a nonrelabeled subsequence of $(v_n)_{n\in\mathbb{N}}$ such that $v_n$ strongly converges to $u$ in $L^p(\Omega,\mathbb{R}^m)$. Let us write
$$\int_\Omega f(x,v_n,\nabla v_n)\,dx=\int_\Omega f(x,u,\nabla v_n)\,dx+\Big(\int_\Omega f(x,v_n,\nabla v_n)\,dx-\int_\Omega f(x,u,\nabla v_n)\,dx\Big).\tag{11.60}$$
According to the growth condition satisfied by $f$, the sequence $(f(x,u,\nabla v_n))_{n\in\mathbb{N}}$ is also uniformly integrable, so that, by Theorem 4.3.3,
$$\lim_{n\to+\infty}\int_\Omega f(x,u,\nabla v_n)\,dx=\int_{\Omega\times E}f(x,u,\lambda)\,d\mu(x,\lambda).\tag{11.61}$$
On the other hand, from (11.57), the boundedness of the sequence $(v_n)_{n\in\mathbb{N}}$ in $L^p(\Omega,\mathbb{R}^m)$, and its strong convergence to $u$ in $L^p(\Omega,\mathbb{R}^m)$, the second term in the right-hand side of (11.60) tends to zero. Collecting (11.60) and (11.61), we obtain
$$\lim_{n\to+\infty}G(\nu_n)=\overline{G}(\mu),$$
which completes the proof of the second step.

Third step: Compactness of minimizing sequences. Let $(\mu_n)_{n\in\mathbb{N}}$ be a minimizing sequence of $\inf(P)$. Since $G(\mu_n)<+\infty$, there exists $(u_n)_{n\in\mathbb{N}}$ in $u_0+W_0^{1,p}(\Omega,\mathbb{R}^m)$ such that $\mu_n=(\delta_{\nabla u_n(x)})_{x\in\Omega}\otimes\mathcal{L}$ and $G(\mu_n)=F(u_n)$. According to the coerciveness
assumption on $f$, the sequence $(\nabla u_n)_{n\in\mathbb{N}}$ is bounded in $L^p(\Omega,E)$; thus, from Remark 4.3.3, the sequence $(\mu_n)_{n\in\mathbb{N}}$ is tight. The conclusion then follows by applying Prokhorov's theorem, Theorem 4.3.2.

Last step. Let $\mu$ be a solution of $\min(P^{\mathrm{young}})$ and $u=T\mu$. According to Theorem 11.4.1(ii) and Remark 11.4.1, we have, for a.e. $x\in\Omega$,
$$Qf(x,u(x),\nabla u(x))\ \le\ \int_E Qf(x,u(x),\lambda)\,d\mu_x(\lambda)\ \le\ \int_E f(x,u(x),\lambda)\,d\mu_x(\lambda).$$
Therefore,
$$\int_\Omega Qf(x,u(x),\nabla u(x))\,dx\ \le\ \int_{\Omega\times E}f(x,T\mu(x),\lambda)\,d\mu(x,\lambda)=\min(P^{\mathrm{young}})=\inf(P)=\min(\overline{P}).$$
This shows that $u=T\mu$ is a solution of $\min(\overline{P})$. The last assertion is a straightforward consequence of Theorem 11.1.2.

Example 11.4.1. Let us illustrate Theorem 11.4.2 with the classical example: take $\Omega=(0,1)$, $p=4$, $u_0=0$, and $\varphi(\xi,\lambda)=(\lambda^2-1)^2+\xi^2$; see Figure 11.1. The map $\lambda\mapsto\varphi(\xi,\lambda)$ possesses the two potential wells $\pm1$, and $\inf(P)=0$ has no solution.
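Figure 11.1. The graph of $\lambda\mapsto\varphi(\xi,\lambda)$ with $\xi=1$.

For each fixed $\xi$, the quasi-convex envelope of $\lambda\mapsto(\lambda^2-1)^2+\xi^2$ (equivalently, its convex envelope, since here $m=N=1$) is given by
$$Q\varphi(\xi,\lambda)=\begin{cases}(\lambda^2-1)^2+\xi^2 & \text{if } |\lambda|\ge1,\\ \xi^2 & \text{otherwise,}\end{cases}$$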
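so that $\min(\overline{\mathcal P}) = 0$ possesses a unique solution $u = 0$. Therefore, from Theorem 11.4.2, we have $\min(\mathcal P^{\mathrm{young}}) = 0$ and, if $\mu$ is a solution, then $T\mu = 0$ and
$$\int_{(0,1)}\int_{\mathbb R}\big((\lambda^2-1)^2+(T\mu)^2\big)\,d\mu_x(\lambda)\,dx = 0,\qquad 0 = \int_{\mathbb R}\lambda\,d\mu_x(\lambda)\quad\text{for a.e. } x\in(0,1).$$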
From the first equality, we deduce that the support of $\mu_x$ is included in $\{-1,1\}$ for a.e. $x$. Therefore, the second equality gives $\mu_x = \frac12\delta_{-1}+\frac12\delta_1$ and $\mu = (\frac12\delta_{-1}+\frac12\delta_1)\otimes\mathcal L^1$.
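A numerical sanity check of Example 11.4.1 may help. The Python sketch below (standard library only; every function name is ours and purely illustrative) first builds the lower convex hull of $\lambda\mapsto\varphi(1,\lambda)$ on a grid and compares it with the closed form of $Q\varphi$. It then evaluates the energy of the zig-zag functions $u_n$ with $n$ teeth and slopes $\pm1$, which we assume as a minimizing sequence. Their energy is exactly $\int_0^1 u_n^2\,dx = 1/(12n^2)\to0$, and their slopes equal $+1$ on exactly half of $(0,1)$, in agreement with the Young measure $\frac12\delta_{-1}+\frac12\delta_1$.

```python
def phi(xi, lam):
    """Integrand of Example 11.4.1: (lam^2 - 1)^2 + xi^2."""
    return (lam * lam - 1.0) ** 2 + xi * xi


def q_phi(xi, lam):
    """Closed-form envelope of lam -> phi(xi, lam) stated in the text."""
    return phi(xi, lam) if abs(lam) >= 1.0 else xi * xi


def lower_hull(points):
    """Lower convex hull (Andrew's monotone chain) of points sorted by abscissa."""
    hull = []
    for p in points:
        # pop the last vertex while hull[-2] -> hull[-1] -> p is not a strict left turn
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0.0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_value(hull, x):
    """Evaluate the piecewise-linear hull at abscissa x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = (x - x1) / (x2 - x1)
            return (1.0 - t) * y1 + t * y2
    raise ValueError("x outside the sampled interval")


def sawtooth_energy(n, m=100_000):
    """Midpoint rule for int_0^1 phi(u_n, u_n') dx for the zig-zag u_n with n teeth,
    slopes +1/-1 and u_n(0) = u_n(1) = 0. Also returns the fraction of (0, 1)
    on which u_n' = +1, i.e. the weight of delta_{+1} in the gradient distribution."""
    h = 1.0 / m
    energy, plus = 0.0, 0
    for i in range(m):
        s = ((i + 0.5) * h * n) % 1.0   # position inside the current tooth
        u = min(s, 1.0 - s) / n         # value of the zig-zag
        du = 1.0 if s < 0.5 else -1.0   # its slope
        energy += phi(u, du) * h
        plus += du > 0.0
    return energy, plus / m


xi = 1.0
grid = [-2.0 + 4.0 * k / 4000 for k in range(4001)]
hull = lower_hull([(lam, phi(xi, lam)) for lam in grid])
probe = [-1.9 + 3.8 * k / 999 for k in range(1000)]
err = max(abs(hull_value(hull, lam) - q_phi(xi, lam)) for lam in probe)
print(f"max |convex hull - Q phi| on [-1.9, 1.9]: {err:.1e}")

for n in (1, 10, 100):
    energy, weight = sawtooth_energy(n)
    print(n, energy, 1.0 / (12 * n * n), weight)
```

Any sequence with $|u_n'| = 1$ a.e. and $u_n\to0$ uniformly would serve equally well: its slopes are supported in $\{-1,1\}$ and average to zero, which forces the same Young measure.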
Chapter 12
Γ-convergence and applications
12.1 Γ-convergence in abstract metrizable spaces
Given a metrizable space $X$, or more generally a first countable topological space, we would like to define a convergence notion on the space of extended real-valued functions $F : X\to\mathbb R\cup\{+\infty\}$ so that the maps
$$F\mapsto\inf_X F,\qquad F\mapsto\operatorname{argmin}_X F$$
are sequentially continuous. More precisely, given $F_n, F : X\to\mathbb R\cup\{+\infty\}$, under some compactness hypotheses, we want the following implications to hold when $n\to+\infty$:
$$F_n\to F\ \Longrightarrow\ \inf_X F_n\to\min_X F;$$
$$F_n\to F,\ u_n\in\operatorname{argmin}_X F_n\ \Longrightarrow\ u_n\to u\in\operatorname{argmin}_X F\ \text{ at least for a subsequence.}$$
It is worth noticing that such a convergence theory contains, in some sense, the theory of relaxation of Chapter 11. Indeed, according to the relaxation theorem, Theorem 11.1.2, one has
$$F_n\equiv F\to\operatorname{cl}(F)\ \Longrightarrow\ \inf_X F = \min_X\operatorname{cl}(F)$$
and every relatively compact minimizing sequence possesses a subsequence converging to some $u\in\operatorname{argmin}_X\operatorname{cl}(F)$. Therefore, constant sequences $F_n\equiv F$ must converge to $\operatorname{cl}(F)$ in the sense described above (see Remark 12.1.1). This issue is of central importance in the calculus of variations. Indeed, many problems arising from physics, mechanics, economics, and approximation methods in numerical analysis are modeled by the minimization of functionals depending on some parameter, here formally denoted by $n$. Typically, $F_n$ stands for $F_\varepsilon$, where $\varepsilon$ is a small parameter associated with a thickness or a stiffness in mechanics, or with the size of small discontinuities. Then, if the model associated with $F_n$ possesses a variational formulation, the problem of finding a functional $F$ asymptotically equivalent to $F_n$, formally written $F\sim F_n$, must be posed in terms of variational analysis: $F\sim F_n$ means that when $n$ tends
to infinity (or $\varepsilon$ tends to zero),
$$\min_X F\sim\inf_X F_n;\qquad x\in\operatorname{argmin}_X F\sim x_n\in\varepsilon_n\text{-}\operatorname{argmin}_X F_n,\ \varepsilon_n\to0,$$
in the sense of some suitable topology on $X$. The notion of Γ-convergence, introduced by De Giorgi and Franzoni [124] and studied in this section, addresses exactly this issue.
Definition 12.1.1. Let $(X,d)$ be a metrizable space, or more generally a first countable topological space, $(F_n)_{n\in\mathbb N}$ a sequence of extended real-valued functions $F_n : X\to\mathbb R\cup\{+\infty\}$, and $F : X\to\mathbb R\cup\{+\infty\}$. The sequence $(F_n)_{n\in\mathbb N}$ (sequentially) Γ-converges to $F$ at $x\in X$ iff both of the following assertions hold:
(i) for all sequences $(x_n)_{n\in\mathbb N}$ converging to $x$ in $X$, one has $F(x)\le\liminf_{n\to+\infty}F_n(x_n)$;
(ii) there exists a sequence $(y_n)_{n\in\mathbb N}$ converging to $x$ in $X$ such that $F(x)\ge\limsup_{n\to+\infty}F_n(y_n)$.
When (i) and (ii) hold for all $x$ in $X$, we say that $(F_n)_{n\in\mathbb N}$ Γ-converges to $F$ in $(X,d)$ and we write $F = \Gamma\text{-}\lim_{n\to+\infty}F_n$.
Note that, trivially, the system of assertions (i) and (ii) is equivalent to (i) and (ii)′:
(i) for all sequences $(x_n)_{n\in\mathbb N}$ converging to $x$ in $X$, one has $F(x)\le\liminf_{n\to+\infty}F_n(x_n)$;
(ii)′ there exists a sequence $(y_n)_{n\in\mathbb N}$ converging to $x$ in $X$ such that $F(x) = \lim_{n\to+\infty}F_n(y_n)$.
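Definition 12.1.1 can be tested on the classical oscillating sequence $F_n(x) = \sin(nx)$ on $X = \mathbb R$ (our choice of example, not taken from the text). Assertion (i) holds with $F\equiv-1$ because $\sin\ge-1$, and for (ii) one picks $y_n$ at the minimum of $\sin(n\,\cdot)$ nearest to $x$, so that $|y_n-x|\le2\pi/n$. Numerically, the sketch uses the standard metric-space characterization $\Gamma\text{-}\liminf F_n(x) = \lim_{\delta\to0}\liminf_n\inf_{B(x,\delta)}F_n$ (and similarly with $\limsup_n$; see, e.g., Dal Maso [117]), truncated to one small $\delta$ and a finite range of $n$. This truncation is a heuristic illustration, not a proof; the helper `ball_inf` is hypothetical.

```python
import math


def ball_inf(F, x, delta, samples=2001):
    """Approximate inf of F over [x - delta, x + delta] by dense sampling."""
    return min(F(x - delta + 2.0 * delta * k / (samples - 1)) for k in range(samples))


x0, delta = 0.3, 0.05

# F_n(x) = sin(n x) has no pointwise limit at x0 ...
pointwise = [math.sin(n * x0) for n in range(1, 101)]
print(min(pointwise), max(pointwise))

# ... but inf over B(x0, delta) of F_n equals -1 (up to sampling error) for all
# large n, which is the signature of the Gamma-limit F = -1.
local = [ball_inf(lambda y, n=n: math.sin(n * y), x0, delta) for n in range(200, 211)]
print(local)
```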
Remark 12.1.1. Let us consider the constant sequence $(F_n)_{n\in\mathbb N}$, where $F_n = F : X\to\mathbb R\cup\{+\infty\}$ is a given function. From Chapter 11, this sequence does not Γ-converge to $F$ (unless $F$ is lower semicontinuous) but to the lsc envelope $\operatorname{cl}(F)$ of $F$. Consequently, since in any topology a constant sequence converges to its own value, Γ-convergence is not, in general, associated with a topology on the family of all functions $F : X\to\mathbb R\cup\{+\infty\}$. For a detailed analysis of subfamilies of functions on which Γ-convergence is induced by a topology, we refer the interested reader to [117].
Remark 12.1.2. Let us recall the definition of set convergence. Let $(C_n)_{n\in\mathbb N}$ be a sequence of subsets of a metric space $(X,d)$, or more generally of any topological space.
The lower limit of the sequence $(C_n)_{n\in\mathbb N}$ is the subset of $X$ denoted by $\liminf_{n\to+\infty}C_n$ and defined by
$$\liminf_{n\to+\infty}C_n = \{x\in X : \exists\,x_n\to x,\ x_n\in C_n\text{ for all }n\in\mathbb N\}.$$
The upper limit of the sequence $(C_n)_{n\in\mathbb N}$ is the subset of $X$ denoted by $\limsup_{n\to+\infty}C_n$ and defined by
$$\limsup_{n\to+\infty}C_n = \{x\in X : \exists\,(n_k)_{k\in\mathbb N},\ \exists\,(x_k)_{k\in\mathbb N},\ \forall k,\ x_k\in C_{n_k},\ x_k\to x\}.$$
The sequence $(C_n)_{n\in\mathbb N}$ is said to be convergent if the following equality holds:
$$\liminf_{n\to+\infty}C_n = \limsup_{n\to+\infty}C_n.$$
The common value is called the limit of $(C_n)_{n\in\mathbb N}$ in the Painlevé–Kuratowski sense and is denoted by $\lim_{n\to+\infty}C_n$. Therefore, by definition, $C = \lim_{n\to+\infty}C_n$ iff the following two assertions hold: for every $x\in C$, there exists $(x_n)_{n\in\mathbb N}$ such that $x_n\in C_n$ for all $n\in\mathbb N$ and $x_n\to x$; for all $(n_k)_{k\in\mathbb N}$ and $(x_k)_{k\in\mathbb N}$ such that $x_k\in C_{n_k}$ for all $k\in\mathbb N$ and $x_k\to x$, one has $x\in C$. One can prove that the Γ-convergence of a sequence $(F_n)_{n\in\mathbb N}$ to a function $F$ is equivalent to the convergence of the sequence of epigraphs of the functions $F_n$ to the epigraph of $F$, when the class of subsets of $X\times\mathbb R$ is equipped with the set convergence previously defined (see Attouch [28]). This is why Γ-convergence is sometimes called epiconvergence. We define the extended real-valued functions $\Gamma\text{-}\limsup_{n\to+\infty}F_n$ and $\Gamma\text{-}\liminf_{n\to+\infty}F_n$ by setting, for all $x\in X$,
$$\Gamma\text{-}\limsup_{n\to+\infty}F_n(x) := \min\Big\{\limsup_{n\to+\infty}F_n(x_n) : x_n\to x\Big\},\qquad\Gamma\text{-}\liminf_{n\to+\infty}F_n(x) := \min\Big\{\liminf_{n\to+\infty}F_n(x_n) : x_n\to x\Big\}.$$
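To make the set limits of Remark 12.1.2 concrete, here is a sketch on the real line, assuming finite sets $C_n$ and a finite tail of indices in place of the limit in $n$ (so it is a heuristic, not a decision procedure). It relies on the metric characterizations $x\in\liminf C_n$ iff $\operatorname{dist}(x,C_n)\to0$, and $x\in\limsup C_n$ iff $\liminf_n\operatorname{dist}(x,C_n) = 0$, valid when the $C_n$ are nonempty. The helper names `dist` and `kuratowski` and both example sequences are hypothetical.

```python
def dist(x, C):
    """Distance from the real number x to a finite nonempty set C of reals."""
    return min(abs(x - c) for c in C)


def kuratowski(candidates, sets, tol=1e-3):
    """Crude finite-tail version of the Painleve-Kuratowski limits on the real line.
    In a metric space (with C_n nonempty),
        x in liminf C_n  iff  dist(x, C_n) -> 0,
        x in limsup C_n  iff  liminf_n dist(x, C_n) = 0.
    Here the limit along n is replaced by the worst (max) or best (min) distance
    over the finite list `sets`, compared with the tolerance `tol`."""
    lower = [x for x in candidates if max(dist(x, C) for C in sets) < tol]
    upper = [x for x in candidates if min(dist(x, C) for C in sets) < tol]
    return lower, upper


candidates = [-1.0, 0.0, 1.0]

# C_n = {(-1)^n + 1/n}: jumps between neighbourhoods of -1 and 1
alternating = [{(-1) ** n + 1.0 / n} for n in range(1000, 2000)]
print(kuratowski(candidates, alternating))   # ([], [-1.0, 1.0])

# C_n = {1/n, 1 - 1/n}: converges to {0, 1}
shrinking = [{1.0 / n, 1.0 - 1.0 / n} for n in range(2000, 3000)]
print(kuratowski(candidates, shrinking))     # ([0.0, 1.0], [0.0, 1.0])
```

The first sequence has an empty lower limit and the upper limit $\{-1,1\}$, so it does not converge; the second converges to $\{0,1\}$.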
The following proposition is a straightforward consequence of the definitions above.
Proposition 12.1.1. Let $(X,d)$ be a metrizable space, or more generally a first countable topological space, $(F_n)_{n\in\mathbb N}$ a sequence of functions $F_n : X\to\mathbb R\cup\{+\infty\}$, and $F : X\to\mathbb R\cup\{+\infty\}$. Then
(i) the sequence $(F_n)_{n\in\mathbb N}$ (sequentially) Γ-converges to $F$ iff $\Gamma\text{-}\limsup_{n\to+\infty}F_n\le F\le\Gamma\text{-}\liminf_{n\to+\infty}F_n$;
(ii) the functions $\Gamma\text{-}\limsup_{n\to+\infty}F_n$ and $\Gamma\text{-}\liminf_{n\to+\infty}F_n$ are lower semicontinuous.
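Remark 12.1.1 and Proposition 12.1.1 can be illustrated with the neighbourhood formula $\Gamma\text{-}\liminf F_n(x) = \lim_{\delta\to0}\liminf_n\inf_{B(x,\delta)}F_n$ (and likewise with $\limsup_n$ for the Γ-upper limit), again truncated to one small $\delta$ and finitely many $n$. In the hypothetical examples below, the constant sequence built on a step function that is not lsc at $0$ has Γ-limit $0 = \operatorname{cl}(F)(0)$ there, while $F(0) = 1$. The alternating sequence $x^2,(x-1)^2,x^2,\dots$ has Γ-lower limit $\min(x^2,(x-1)^2)$ and Γ-upper limit $\max(x^2,(x-1)^2)$, which differ away from $x = 1/2$, so by Proposition 12.1.1(i) it has no Γ-limit. All function names are ours.

```python
def ball_inf(F, x, delta, samples=2001):
    """Approximate inf of F over [x - delta, x + delta] by dense sampling."""
    return min(F(x - delta + 2.0 * delta * k / (samples - 1)) for k in range(samples))


def gamma_lower_upper(seq, x, delta, tail):
    """Proxy for (Gamma-liminf, Gamma-limsup) of seq(n) at x: a single small delta,
    and liminf/limsup over n replaced by min/max over the finite index set `tail`."""
    values = [ball_inf(seq(n), x, delta) for n in tail]
    return min(values), max(values)


def step(x):
    """F = 1 on (-inf, 0] and 0 on (0, +inf): not lower semicontinuous at 0."""
    return 1.0 if x <= 0.0 else 0.0


def constant(n):
    return step                     # F_n = F for every n (Remark 12.1.1)


def alternating(n):
    # F_n = G for even n and H for odd n, with G(x) = x^2 and H(x) = (x - 1)^2
    return (lambda x: x * x) if n % 2 == 0 else (lambda x: (x - 1.0) ** 2)


print(step(0.0), gamma_lower_upper(constant, 0.0, 1e-3, range(10)))   # 1.0 (0.0, 0.0)
print(gamma_lower_upper(alternating, 0.2, 1e-4, range(10)))          # about (0.04, 0.64)
```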
The main interest of the concept of Γ-convergence is its variational nature, made precise in item (i) below. For more details about epiconvergence or Γ-convergence, we refer the reader to Attouch [28], Braides [83], and Dal Maso [117].
Theorem 12.1.1. Let $(F_n)_{n\in\mathbb N}$ be a sequence of functions $F_n : X\to\mathbb R\cup\{+\infty\}$ which Γ-converges to some function $F : X\to\mathbb R\cup\{+\infty\}$. Then the following assertions hold:
(i) Let $x_n\in X$ be such that $F_n(x_n)\le\inf\{F_n(x) : x\in X\}+\varepsilon_n$, where $\varepsilon_n>0$ and $\varepsilon_n\to0$ when $n\to+\infty$. Assume that $\{x_n : n\in\mathbb N\}$ is relatively compact; then every cluster point $\bar x$ of $\{x_n : n\in\mathbb N\}$ is a minimizer of $F$ and
$$\lim_{n\to+\infty}\inf\{F_n(x) : x\in X\} = F(\bar x).$$
(ii) If $G : X\to\mathbb R$ is continuous, then $(F_n+G)_{n\in\mathbb N}$ Γ-converges to $F+G$.
Let $(F_n)_{n\in\mathbb N}$ be a sequence of functions $F_n : X\to\mathbb R\cup\{+\infty\}$. If there exists a function $F : X\to\mathbb R\cup\{+\infty\}$ such that each subsequence of $(F_n)_{n\in\mathbb N}$ possesses a subsequence which Γ-converges to $F$, then the whole sequence Γ-converges to $F$.
Proof. Assertion (ii) is easy to establish and is left to the reader. For a proof of the last assertion, consult Attouch [28, Proposition 2.72]. The proof of assertion (i) is very close to that of Theorem 11.1.2. Let $\bar x$ be a cluster point of $\{x_n : n\in\mathbb N\}$, let $(x_{\sigma(n)})_{n\in\mathbb N}$ be a subsequence converging to $\bar x$, and set
$$\tilde x_m = \begin{cases}x_{\sigma(n)} & \text{if there exists } n \text{ such that } m = \sigma(n),\\ \bar x & \text{otherwise.}\end{cases}$$
Then $\tilde x_m\to\bar x$ when $m\to+\infty$ and, according to (i) of Definition 12.1.1, we have
$$F(\bar x)\le\liminf_{n\to+\infty}F_n(\tilde x_n)\le\liminf_{n\to+\infty}F_{\sigma(n)}(x_{\sigma(n)}) = \liminf_{n\to+\infty}\inf_X F_{\sigma(n)}, \tag{12.1}$$
where the last equality holds because $\inf_X F_{\sigma(n)}\le F_{\sigma(n)}(x_{\sigma(n)})\le\inf_X F_{\sigma(n)}+\varepsilon_{\sigma(n)}$ and $\varepsilon_{\sigma(n)}\to0$.
Let now $x$ be any element of $X$. According to (ii) of Definition 12.1.1, there exists a sequence $(y_n)_{n\in\mathbb N}$ converging to $x$ and satisfying
$$F(x)\ge\limsup_{n\to+\infty}F_n(y_n)\ge\limsup_{n\to+\infty}F_{\sigma(n)}(y_{\sigma(n)}). \tag{12.2}$$
Combining (12.1) and (12.2), we obtain
$$F(\bar x)\le\liminf_{n\to+\infty}\inf_X F_{\sigma(n)}\le\limsup_{n\to+\infty}\inf_X F_{\sigma(n)}\le\limsup_{n\to+\infty}F_{\sigma(n)}(y_{\sigma(n)})\le F(x). \tag{12.3}$$
This proves that $F(\bar x) = \min_X F$. Taking $x = \bar x$ in (12.3), we also obtain $\lim_{n\to+\infty}\inf_X F_{\sigma(n)} = \min_X F$. Since every subsequence of $(\inf_X F_n)_{n\in\mathbb N}$ possesses a subsequence converging to $\min_X F$, one has $\lim_{n\to+\infty}\inf_X F_n = \min_X F$, as required.
In the following sections, we give three applications of Γ-convergence. In Section 14.2, we also show how Γ-convergence allows us to justify some one-dimensional models in the framework of fracture mechanics.
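Theorem 12.1.1 can be watched at work on $F_n(x) = x^2+\sin(nx)$ on $X = \mathbb R$, an example of our own. Since $\sin(nx)$ Γ-converges to $-1$ and $x\mapsto x^2$ is continuous, assertion (ii) gives the Γ-limit $F(x) = x^2-1$, whose unique minimizer is $0$, with $\min F = -1$. The minimizers of $F_n$ lie in $[-1,1]$, so (i) applies. The brute-force grid search below (a sketch, not an efficient optimizer; `F` and `grid_argmin` are hypothetical names) shows $x_n\to0$ and $\min F_n\to-1$, although $F_n$ itself has no pointwise limit.

```python
import math


def F(n, x):
    """F_n(x) = x^2 + sin(n x). By Theorem 12.1.1(ii) its Gamma-limit is x^2 - 1."""
    return x * x + math.sin(n * x)


def grid_argmin(n, lo=-1.0, hi=1.0):
    """Brute-force minimizer of F_n on [lo, hi] with mesh 1/(50 n). Minimizers of F_n
    lie in [-1, 1], since F_n(x) >= x^2 - 1 while F_n(0) = 0."""
    steps = int(50 * n * (hi - lo))
    best_x, best_v = lo, F(n, lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        v = F(n, x)
        if v < best_v:
            best_x, best_v = x, v
    return best_x, best_v


for n in (10, 100, 1000):
    x_n, m_n = grid_argmin(n)
    # x_n -> 0 = argmin(x^2 - 1) and m_n -> -1 = min(x^2 - 1), as Theorem 12.1.1(i) predicts
    print(n, x_n, m_n)
```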
12.2 Application to the nonlinear membrane model
Let $\omega$ be an open bounded subset of $\mathbb R^2$ with boundary $\gamma$, and consider $\Omega_\varepsilon = \omega\times(0,\varepsilon)$, the reference configuration filled up by some elastic material. This three-dimensional thin structure is clamped on a part $\Gamma_{0,\varepsilon} = \gamma_0\times(0,\varepsilon)$ of the boundary $\partial\Omega_\varepsilon$ of $\Omega_\varepsilon$ (see Figure 12.1).
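Figure 12.1. The deformation of a thin layer $\Omega_\varepsilon$ of size $\varepsilon$.
To take into account large purely elastic deformation, the constitutive law of the deformable body is associated with a nonconvex elastic density $f$ satisfying a growth condition of order $p>1$. The stored strain energy associated with a displacement field $u : \Omega_\varepsilon\to\mathbb R^3$ is given by the integral functional $F_\varepsilon : L^p(\Omega_\varepsilon,\mathbb R^3)\to\mathbb R^+\cup\{+\infty\}$ defined by
$$F_\varepsilon(u) = \begin{cases}\dfrac1\varepsilon\displaystyle\int_{\Omega_\varepsilon}f(\nabla u)\,dx & \text{if } u\in W^{1,p}_{\Gamma_{0,\varepsilon}}(\Omega_\varepsilon,\mathbb R^3),\\ +\infty & \text{otherwise,}\end{cases}$$
where the density $f$ satisfies the conditions (11.5) and (11.6), namely, there exist three positive constants $\alpha$, $\beta$, $L$ such that
$$\forall a\in\mathbb M^{3\times3},\quad\alpha|a|^p\le f(a)\le\beta(1+|a|^p), \tag{12.4}$$
$$\forall a,b\in\mathbb M^{3\times3},\quad|f(a)-f(b)|\le L|b-a|(1+|a|^{p-1}+|b|^{p-1}). \tag{12.5}$$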
The scaling parameter $\varepsilon^{-1}$ accounts for the stiffness of the material. In the linearized elasticity framework, it corresponds to Lamé coefficients of order $\varepsilon^{-1}$. The structure is subjected to applied body forces $g_\varepsilon : \Omega_\varepsilon\to\mathbb R^3$, for which we make the following assumption: there exists a vector-valued function $g : \Omega = \omega\times(0,1)\to\mathbb R^3$, $g\in L^q(\Omega,\mathbb R^3)$ ($1/p+1/q = 1$), such that $\varepsilon g_\varepsilon(\hat x,\varepsilon x_3) = g(x)$, $x = (\hat x,x_3)$. The exterior loading is
$$L_\varepsilon(u) = \int_{\Omega_\varepsilon}g_\varepsilon\cdot u\,dx,$$
and the equilibrium configuration is given by displacement vector fields $u_\varepsilon$, solutions of the problem
$$\inf\{F_\varepsilon(u)-L_\varepsilon(u) : u\in L^p(\Omega_\varepsilon,\mathbb R^3)\}.$$
Due to the very small thickness $\varepsilon$ of the layer $\Omega_\varepsilon$, it is illusory to compute an approximate equilibrium displacement field by a direct use of the finite element method described in Chapter 7. The variational property of Γ-convergence (Theorem 12.1.1) provides a new procedure: by letting $\varepsilon$ go to zero, we aim at finding the elastic energy of a (fictitious) material occupying the two-dimensional membrane $\omega$. We will finally compute an approximate equilibrium displacement field for the corresponding minimization problem by means of a two-dimensional finite element method.
To work in the fixed space $L^p(\Omega,\mathbb R^3)$, $\Omega = \omega\times(0,1)$, the change of scale $(\hat x,x_3)\mapsto(\hat x,x_3/\varepsilon)$, transforming $\Omega_\varepsilon$ into $\Omega$, leads to the following equivalent optimization problem: find $\tilde u_\varepsilon$ solution of
$$\inf\left\{\tilde F_\varepsilon(u)-\int_\Omega g\cdot u\,dx : u\in L^p(\Omega,\mathbb R^3)\right\},$$
where
$$\tilde F_\varepsilon(v) = \begin{cases}\displaystyle\int_\Omega f\Big(\hat\nabla v,\frac1\varepsilon\frac{\partial v}{\partial x_3}\Big)dx & \text{if } v\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^3),\\ +\infty & \text{otherwise,}\end{cases}$$
$\Gamma_0 = \gamma_0\times(0,1)$, and $\hat\nabla v$ denotes the tangential gradient of $v$, i.e., $\hat\nabla v = \big(\frac{\partial v_i}{\partial x_j}\big)_{i=1,2,3,\ j=1,2}$. As suggested above, we first establish the existence of the Γ-limit of the sequence $(\tilde F_\varepsilon)_{\varepsilon>0}$ when $\varepsilon$ goes to zero. To make its domain precise, we establish the following compactness result.
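Proposition 12.2.1 (compactness). Let $(u_\varepsilon)_{\varepsilon>0}$ be a sequence in $L^p(\Omega,\mathbb R^3)$ satisfying $\sup_{\varepsilon>0}\tilde F_\varepsilon(u_\varepsilon)<+\infty$. Then there exist a nonrelabeled subsequence and $u$ in
$$V = \Big\{v\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^3) : \frac{\partial v}{\partial x_3} = 0\Big\}$$
such that $u_\varepsilon$ converges to $u$ weakly in $W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^3)$ and strongly in $L^p(\Omega,\mathbb R^3)$.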
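Proof. Since $\sup_{\varepsilon>0}\tilde F_\varepsilon(u_\varepsilon)<+\infty$, we have
$$\tilde F_\varepsilon(u_\varepsilon) = \int_\Omega f\Big(\hat\nabla u_\varepsilon,\frac1\varepsilon\frac{\partial u_\varepsilon}{\partial x_3}\Big)dx,$$
and, by using the lower bound in (12.4), we obtain that $(\nabla u_\varepsilon)_{\varepsilon>0}$ is bounded in $L^p(\Omega,\mathbb M^{3\times3})$ and that $\big(\frac1\varepsilon\frac{\partial u_\varepsilon}{\partial x_3}\big)_{\varepsilon>0}$ is bounded in $L^p(\Omega,\mathbb R^3)$.
Consequently, according to the Rellich–Kondrakov theorem, Theorem 5.4.2, and the Poincaré inequality, Theorem 5.3.1, there exist some $u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^3)$ and a nonrelabeled subsequence of $(u_\varepsilon)_{\varepsilon>0}$ such that
$$u_\varepsilon\rightharpoonup u\ \text{weakly in } W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^3);\qquad u_\varepsilon\to u\ \text{strongly in } L^p(\Omega,\mathbb R^3);\qquad\frac{\partial u_\varepsilon}{\partial x_3}\to0\ \text{strongly in } L^p(\Omega,\mathbb R^3).$$
Therefore, since the weak limit of $\partial u_\varepsilon/\partial x_3$ is $\partial u/\partial x_3$, the function $u$ belongs to $V$.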
Note that $V$ is canonically isomorphic to $W^{1,p}_{\gamma_0}(\omega,\mathbb R^3)$. In the sequel, we will use the same notation for $v\in V$ and its canonical representative in $W^{1,p}_{\gamma_0}(\omega,\mathbb R^3)$. The following theorem was established by Le Dret and Raoult [168]. For more general and recent variational models related to thin elastic plates, see [25], [27], [77], [175], and [176]. For a variational model taking into account oscillation-concentration effects, see [169].
Theorem 12.2.1 (Le Dret and Raoult [168]). Let us equip $L^p(\Omega,\mathbb R^3)$ with its strong topology. The sequence of integral functionals $(\tilde F_\varepsilon)_{\varepsilon>0}$ Γ-converges to the integral functional $F$ defined on $L^p(\Omega,\mathbb R^3)$ by
$$F(u) = \begin{cases}\displaystyle\int_\omega Qf_0(\hat\nabla u)\,d\hat x & \text{if } u\in V,\\ +\infty & \text{otherwise.}\end{cases}$$
The energy density $f_0 : \mathbb M^{3\times2}\to\mathbb R$ is defined for all $m$ in $\mathbb M^{3\times2}$ by
$$f_0(m) = \inf\{f((m|\xi)) : \xi\in\mathbb R^3\},$$
and $Qf_0$ denotes the quasi-convex envelope of $f_0$. We write $(m|\xi)$ for the matrix of $\mathbb M^{3\times3}$ obtained by completing the matrix $m$ with the column $\xi$.
Since the map $u\mapsto\int_\Omega g\cdot u\,dx$ is continuous on $L^p(\Omega,\mathbb R^3)$, we deduce from Theorem 12.1.1 the following corollary.
Corollary 12.2.1. The sequence of optimization problems
$$\inf\left\{F_\varepsilon(u)-\int_{\Omega_\varepsilon}g_\varepsilon\cdot u\,dx : u\in L^p(\Omega_\varepsilon,\mathbb R^3)\right\} \tag{$\mathcal P_\varepsilon$}$$
converges to the limit problem
$$\min\left\{F(u)-\int_\omega\bar g\cdot u\,d\hat x : u\in V\right\}, \tag{$\mathcal P$}$$
where $\bar g$ is defined for all $\hat x\in\omega$ by $\bar g(\hat x) = \int_0^1 g(\hat x,s)\,ds$. Moreover, if $u_\varepsilon$ is a solution or an $\varepsilon$-minimizer of $(\mathcal P_\varepsilon)$, then $\tilde u_\varepsilon$, defined by $\tilde u_\varepsilon(x) = u_\varepsilon(\hat x,\varepsilon x_3)$, strongly converges in $L^p(\Omega,\mathbb R^3)$ to a solution $u$ of the limit problem $(\mathcal P)$.
Roughly speaking, for a very small thickness of the layer $\Omega_\varepsilon$, an equilibrium configuration $u_\varepsilon$ of $(\mathcal P_\varepsilon)$, living in $W^{1,p}_{\Gamma_{0,\varepsilon}}(\Omega_\varepsilon,\mathbb R^3)$, is close to $u$, living in $W^{1,p}_{\gamma_0}(\omega,\mathbb R^3)$, and the layer $\Omega_\varepsilon$ may be considered as a two-dimensional membrane $\omega$: a reference configuration filled up by some elastic material whose strain energy density is the function $Qf_0$.
Proof of Theorem 12.2.1. For every bounded Borel set $A$ of $\mathbb R^N$, we will sometimes denote its $N$-dimensional Lebesgue measure by $|A|$ rather than $\mathcal L^N(A)$. The proof proceeds in two
steps, corresponding to each inequality in the definition of Γ-convergence. Let us first notice that $f_0$ satisfies
$$\forall m\in\mathbb M^{3\times2},\quad\alpha|m|^p\le f_0(m)\le\beta(1+|m|^p), \tag{12.6}$$
$$\forall m,m'\in\mathbb M^{3\times2},\quad|f_0(m)-f_0(m')|\le L'|m-m'|(1+|m|^{p-1}+|m'|^{p-1}), \tag{12.7}$$
where $L'$ is a positive constant depending only on $p$ and $L$. These estimates are obtained by easy calculations from (12.4) and (12.5) and are left to the reader. Consequently, $f_0$ fulfills all the conditions of Proposition 11.2.2.
First step. Let $u_\varepsilon$ strongly converge to $u$ in $L^p(\Omega,\mathbb R^3)$. We are going to establish
$$F(u)\le\liminf_{\varepsilon\to0}\tilde F_\varepsilon(u_\varepsilon).$$
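Obviously, one may assume $\liminf_{\varepsilon\to0}\tilde F_\varepsilon(u_\varepsilon)<+\infty$, so that, for a nonrelabeled subsequence,
$$\tilde F_\varepsilon(u_\varepsilon) = \int_\Omega f\Big(\hat\nabla u_\varepsilon,\frac1\varepsilon\frac{\partial u_\varepsilon}{\partial x_3}\Big)dx,$$
and, from Proposition 12.2.1, $u$ belongs to $V$. Trivially, we have
$$\int_\Omega f\Big(\hat\nabla u_\varepsilon,\frac1\varepsilon\frac{\partial u_\varepsilon}{\partial x_3}\Big)dx\ge\int_\Omega f_0(\hat\nabla u_\varepsilon)\,dx\ge\int_\Omega Qf_0(\hat\nabla u_\varepsilon)\,dx. \tag{12.8}$$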
Let us now consider the function $h : \mathbb M^{3\times3}\to\mathbb R$ defined by $h(a) = Qf_0((a_1|a_2))$, where $a_1$ and $a_2$ denote the first two columns of the matrix $a = (a_1,a_2,a_3)$. We claim that
$$\forall a\in\mathbb M^{3\times3},\quad0\le h(a)\le\beta(1+|a|^p), \tag{12.9}$$
$$\forall a,b\in\mathbb M^{3\times3},\quad|h(a)-h(b)|\le L''|b-a|(1+|a|^{p-1}+|b|^{p-1}), \tag{12.10}$$
where $L''$ is some positive constant depending only on $p$ and $L'$. We also claim that $h = Qh$ where, $D$ denoting any bounded open subset of $\mathbb R^3$ with $|\partial D| = 0$, the function $Qh$ is defined by
$$\forall a\in\mathbb M^{3\times3},\quad Qh(a) = \inf\left\{\frac1{|D|}\int_D h(a+\nabla\phi)\,dx : \phi\in W_0^{1,p}(D,\mathbb R^3)\right\}$$
(see Proposition 11.2.2). Inequalities (12.9) and (12.10) are straightforward consequences of inequalities (12.6) and (12.7). Let us show that $h = Qh$. Indeed, let $Y = \hat Y\times(0,1)$, where $\hat Y = (0,1)^2$, let $\phi\in W_0^{1,p}(Y,\mathbb R^3)$, and set $\phi_y(\hat x) = \phi(\hat x,y)$ for a.e. $y$ in $(0,1)$. Clearly, $\phi_y$ belongs to $W_0^{1,p}(\hat Y,\mathbb R^3)$ and we have
$$\int_Y h(a+\nabla\phi)\,dx = \int_Y Qf_0((a_1|a_2)+\hat\nabla\phi)\,dx = \int_0^1\left(\int_{\hat Y}Qf_0((a_1|a_2)+\hat\nabla\phi_y(\hat x))\,d\hat x\right)dy\ge\int_0^1 Qf_0((a_1|a_2))\,dy = h(a),$$
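where we have used the quasi-convexity inequality satisfied by $Qf_0$ in the last inequality (see Proposition 11.2.2). Consequently (taking $\phi = 0$ for the reverse inequality),
$$h(a) = \inf\left\{\int_Y h(a+\nabla\phi)\,dx : \phi\in W_0^{1,p}(Y,\mathbb R^3)\right\}$$
and, according to Proposition 11.2.2, the equality $h = Qh$ follows from
$$\inf\left\{\frac1{|D|}\int_D h(a+\nabla\phi)\,dx : \phi\in W_0^{1,p}(D,\mathbb R^3)\right\} = \inf\left\{\int_Y h(a+\nabla\phi)\,dx : \phi\in W_0^{1,p}(Y,\mathbb R^3)\right\}.$$
Letting $\varepsilon\to0$ in (12.8) and according to Remark 11.2.1 and Theorem 13.2.1, we obtain
$$\liminf_{\varepsilon\to0}\int_\Omega f\Big(\hat\nabla u_\varepsilon,\frac1\varepsilon\frac{\partial u_\varepsilon}{\partial x_3}\Big)dx\ge\liminf_{\varepsilon\to0}\int_\Omega Qf_0(\hat\nabla u_\varepsilon)\,dx = \liminf_{\varepsilon\to0}\int_\Omega h(\nabla u_\varepsilon)\,dx\ge\int_\Omega h(\nabla u)\,dx = \int_\Omega Qf_0(\hat\nabla u)\,dx.$$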
Since $u$ belongs to $V$, with our convention the last integral is also equal to $\int_\omega Qf_0(\hat\nabla u)\,d\hat x$, and the proof of the first step is complete.
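Second step. We are going to establish $\Gamma\text{-}\limsup_{\varepsilon\to0}\tilde F_\varepsilon\le F$. Let us assume $F(u)<+\infty$, so that $u\in W^{1,p}_{\gamma_0}(\omega,\mathbb R^3)$ and $F(u) = \int_\omega Qf_0(\hat\nabla u(\hat x))\,d\hat x$.
Following the proof of Lemma 11.2.2 about the interchange between infimum and integral, one may easily establish
$$\int_\omega f_0(\hat\nabla u)\,d\hat x = \inf_{\xi\in\mathcal D(\omega,\mathbb R^3)}\int_\omega f(\hat\nabla u(\hat x),\xi(\hat x))\,d\hat x. \tag{12.11}$$
Let now $\xi$ be some fixed element of $\mathcal D(\omega,\mathbb R^3)$ and define in $W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^3)$ the function
$$w_\varepsilon(\hat x,x_3) = u(\hat x)+\varepsilon x_3\xi(\hat x).$$
It is easy to see that $w_\varepsilon$ strongly converges to $u$ in $L^p(\Omega,\mathbb R^3)$. On the other hand, since $\hat\nabla w_\varepsilon = \hat\nabla u+\varepsilon x_3\hat\nabla\xi$ and $\frac1\varepsilon\frac{\partial w_\varepsilon}{\partial x_3} = \xi$, an easy computation based on (12.5) gives
$$\lim_{\varepsilon\to0}\tilde F_\varepsilon(w_\varepsilon) = \int_\omega f(\hat\nabla u(\hat x),\xi(\hat x))\,d\hat x. \tag{12.12}$$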
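Now, taking the infimum over $\xi\in\mathcal D(\omega,\mathbb R^3)$ in (12.12) and using (12.11), we obtain
$$\inf_{\xi\in\mathcal D(\omega,\mathbb R^3)}\lim_{\varepsilon\to0}\tilde F_\varepsilon(w_\varepsilon)\le\int_\omega f_0(\hat\nabla u)\,d\hat x,$$
so that
$$\inf\left\{\limsup_{\varepsilon\to0}\tilde F_\varepsilon(v_\varepsilon) : v_\varepsilon\to u\ \text{strongly in } L^p(\Omega,\mathbb R^3)\right\}\le\int_\omega f_0(\hat\nabla u)\,d\hat x,$$
that is to say,
$$\Big(\Gamma\text{-}\limsup_{\varepsilon\to0}\tilde F_\varepsilon\Big)(u)\le\tilde F(u), \tag{12.13}$$
where $\tilde F$ is the functional defined on $L^p(\Omega,\mathbb R^3)$ by
$$\tilde F(v) = \begin{cases}\displaystyle\int_\omega f_0(\hat\nabla v(\hat x))\,d\hat x & \text{if } v\in V,\\ +\infty & \text{otherwise.}\end{cases}$$
Obviously, (12.13) also holds for any function $u\in L^p(\Omega,\mathbb R^3)$. Taking the lower semicontinuous envelope of each of the two members when $L^p(\Omega,\mathbb R^3)$ is equipped with its strong topology, and according to Proposition 12.1.1(ii) and to an easy adaptation of Corollary 11.2.1, we obtain
$$\Big(\Gamma\text{-}\limsup_{\varepsilon\to0}\tilde F_\varepsilon\Big)(u)\le F(u)\quad\text{for all } u\in L^p(\Omega,\mathbb R^3).$$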
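Last step. Collecting the two previous steps gives $\Gamma\text{-}\limsup_{\varepsilon\to0}\tilde F_\varepsilon\le F\le\Gamma\text{-}\liminf_{\varepsilon\to0}\tilde F_\varepsilon$, so that $\Gamma\text{-}\lim_{\varepsilon\to0}\tilde F_\varepsilon = F$ by Proposition 12.1.1(i).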
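To see what the membrane density $f_0(m) = \inf_\xi f((m|\xi))$ of Theorem 12.2.1 looks like, consider the hypothetical density $f(a) = |a|^2+(\det a-1)^2$. It is chosen only because the infimum is explicit, and it does not satisfy the growth conditions (12.4) and (12.5). With $n = m_1\times m_2$ (the cross product of the two columns of $m$), one has $\det(m|\xi) = n\cdot\xi$, and minimizing $|\xi|^2+(n\cdot\xi-1)^2$ over $\xi$ gives $f_0(m) = |m|^2+1/(1+|n|^2)$. The sketch below compares this closed form with a plain gradient descent in $\xi$; all names are ours. It does not compute the quasi-convex envelope $Qf_0$ that actually enters the limit energy.

```python
def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def f_toy(c1, c2, c3):
    """Hypothetical density f(a) = |a|^2 + (det a - 1)^2 for a = (c1|c2|c3).
    Illustration only: it does not satisfy the growth conditions (12.4) and (12.5)."""
    det = dot(cross(c1, c2), c3)
    return dot(c1, c1) + dot(c2, c2) + dot(c3, c3) + (det - 1.0) ** 2


def f0_numeric(m1, m2, iters=5000):
    """f0(m) = inf over xi of f((m|xi)), by gradient descent in xi. Since
    det(m|xi) = (m1 x m2) . xi is linear in xi, the objective is a convex quadratic."""
    n = cross(m1, m2)
    step = 1.0 / (2.0 * (1.0 + dot(n, n)))   # 1 / (Lipschitz constant of the gradient)
    xi = [0.0, 0.0, 0.0]
    for _ in range(iters):
        r = dot(n, xi) - 1.0
        xi = [x - step * (2.0 * x + 2.0 * r * nk) for x, nk in zip(xi, n)]
    return f_toy(m1, m2, xi)


def f0_closed(m1, m2):
    """Closed form for this toy density: |m|^2 + 1 / (1 + |m1 x m2|^2)."""
    n = cross(m1, m2)
    return dot(m1, m1) + dot(m2, m2) + 1.0 / (1.0 + dot(n, n))


cases = [
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    ([1.0, 2.0, 0.0], [0.0, 1.0, 3.0]),
    ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]),
]
for m1, m2 in cases:
    print(f0_numeric(m1, m2), f0_closed(m1, m2))
```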
12.3 Application to homogenization of composite media
12.3.1 The quadratic case in one dimension
Before giving the general result concerning homogenization of composite media in Subsection 12.3.2, we establish a complete description of Γ-limits of integral functionals with quadratic density in the one-dimensional case. More precisely, given $a_\varepsilon : \mathbb R\to\mathbb R$ for which there exist $\alpha>0$ and $\beta>0$ such that, for all $x\in\mathbb R$,
$$\alpha\le a_\varepsilon(x)\le\beta, \tag{12.14}$$
we would like to establish the existence of a Γ-limit for the sequence of integral functionals $F_\varepsilon : L^2(0,1)\to\mathbb R^+\cup\{+\infty\}$ defined by
$$F_\varepsilon(u) = \begin{cases}\displaystyle\int_{(0,1)}a_\varepsilon(x)\,u'(x)^2\,dx & \text{if } u\in H^1(0,1),\\ +\infty & \text{otherwise,}\end{cases}$$
when $L^2(0,1)$ is equipped with its strong topology.
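Theorem 12.3.1. Assume that $a_\varepsilon$ fulfills condition (12.14). Then the following assertions hold:
(i) If $\frac1{a_\varepsilon}\rightharpoonup\frac1a$ for the $\sigma(L^\infty,L^1)$ topology, then $(F_\varepsilon)_{\varepsilon>0}$ Γ-converges to the integral functional $F$ defined on $L^2(0,1)$ by
$$F(u) = \begin{cases}\displaystyle\int_{(0,1)}a(x)\,u'(x)^2\,dx & \text{if } u\in H^1(0,1),\\ +\infty & \text{otherwise.}\end{cases}$$
(ii) Conversely, if $(F_\varepsilon)_{\varepsilon>0}$ Γ-converges to some functional $F$, then $(\frac1{a_\varepsilon})_{\varepsilon>0}$ $\sigma(L^\infty,L^1)$-converges to some $b$ with $a = \frac1b$ satisfying (12.14), and $F$ has the integral representation
$$F(u) = \begin{cases}\displaystyle\int_{(0,1)}a(x)\,u'(x)^2\,dx & \text{if } u\in H^1(0,1),\\ +\infty & \text{otherwise.}\end{cases}$$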
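Proof of (i). Let $(u_\varepsilon)_{\varepsilon>0}$ be a sequence strongly converging to some $u$ in $L^2(0,1)$. We want to establish
$$F(u)\le\liminf_{\varepsilon\to0}F_\varepsilon(u_\varepsilon). \tag{12.15}$$
Obviously, one may assume $\liminf_{\varepsilon\to0}F_\varepsilon(u_\varepsilon)<+\infty$, so that, for a nonrelabeled subsequence, $u_\varepsilon$ belongs to $H^1(0,1)$. From (12.14), the derivatives $u'_\varepsilon$ are equibounded in $L^2(0,1)$; hence $u_\varepsilon$ is bounded in $H^1(0,1)$, so that $u_\varepsilon$ weakly converges to $u$ in $H^1(0,1)$. We now take into account the quadratic expression of $F_\varepsilon$ and write $F_\varepsilon(u_\varepsilon)$ as follows:
$$F_\varepsilon(u_\varepsilon) = \int_{(0,1)}a_\varepsilon u_\varepsilon'^2\,dx = \int_{(0,1)}a_\varepsilon\Big(u'_\varepsilon-\frac{au'}{a_\varepsilon}\Big)^2dx+2\int_{(0,1)}u'_\varepsilon u'a\,dx-\int_{(0,1)}\frac{u'^2a^2}{a_\varepsilon}\,dx\ge2\int_{(0,1)}u'_\varepsilon u'a\,dx-\int_{(0,1)}\frac{u'^2a^2}{a_\varepsilon}\,dx.$$
Letting $\varepsilon\to0$ gives (12.15): since $u'_\varepsilon\rightharpoonup u'$ weakly in $L^2(0,1)$ and $1/a_\varepsilon\rightharpoonup1/a$ for $\sigma(L^\infty,L^1)$, the right-hand side tends to $2\int_{(0,1)}au'^2\,dx-\int_{(0,1)}au'^2\,dx = F(u)$.
Given $u\in L^2(0,1)$, we must now construct a sequence $(v_\varepsilon)_{\varepsilon>0}$ strongly converging to $u$ and satisfying
$$F(u)\ge\limsup_{\varepsilon\to0}F_\varepsilon(v_\varepsilon). \tag{12.16}$$
One may assume $u\in H^1(0,1)$. Let us set
$$v_\varepsilon(x) = u(0)+\int_0^x\frac{a(t)u'(t)}{a_\varepsilon(t)}\,dt.$$
We recall that $u$ belongs to $C([0,1])$, so that the previous expression is well defined. Then $v'_\varepsilon = au'/a_\varepsilon$ weakly converges to $u'$ in $L^2(0,1)$. Since $v_\varepsilon$ is bounded in $L^\infty(0,1)$, we deduce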
that $v_\varepsilon$ is bounded in $H^1(0,1)$; thus, from the Rellich–Kondrakov theorem, Theorem 5.4.2, it strongly converges in $L^2(0,1)$ to some $\theta$ with $\theta' = u'$, thus $\theta = u+c$ for some constant $c$. According to the continuity of the trace, one has $\theta(0) = u(0)$, so that $\theta = u$ and $v_\varepsilon\to u$ strongly in $L^2(0,1)$. On the other hand,
$$F_\varepsilon(v_\varepsilon) = \int_{(0,1)}\frac{a^2u'^2}{a_\varepsilon}\,dx,$$
and letting $\varepsilon\to0$ yields
$$\lim_{\varepsilon\to0}F_\varepsilon(v_\varepsilon) = F(u),$$
hence (12.16).
Proof of (ii). From (12.14), $1/a_\varepsilon$ is bounded in $L^\infty(0,1)$. Therefore, for a nonrelabeled subsequence, it $\sigma(L^\infty,L^1)$-converges to some $b$, and $a = \frac1b$ satisfies (12.14). According to (i), the corresponding subsequence of $(F_\varepsilon)_{\varepsilon>0}$ Γ-converges to the functional $G$ defined by
$$G(u) = \begin{cases}\displaystyle\int_{(0,1)}a(x)\,u'(x)^2\,dx & \text{if } u\in H^1(0,1),\\ +\infty & \text{otherwise.}\end{cases}$$
Consequently, $F = G$, and $a$ is uniquely determined by $F$. Since every subsequence of $(1/a_\varepsilon)_{\varepsilon>0}$ possesses a subsequence which $\sigma(L^\infty,L^1)$-converges to the same limit $1/a$, the whole sequence $(1/a_\varepsilon)_{\varepsilon>0}$ $\sigma(L^\infty,L^1)$-converges to $1/a$.
Example 12.3.1. Let us consider $a_\varepsilon$ defined by $a_\varepsilon(x) = a(x/\varepsilon)$, where $a$ is a $(0,1)$-periodic function taking two positive values $\alpha$ and $\beta$ on $(0,1/2)$ and $(1/2,1)$, respectively. Then $1/a_\varepsilon$ converges for the $\sigma(L^\infty,L^1)$ topology to $(\alpha^{-1}+\beta^{-1})/2$. For a proof, see Example 2.4.2, or Proposition 13.2.1 and the proof of Theorem 13.2.1. Therefore,
$$F(u) = \begin{cases}\displaystyle\int_{(0,1)}\Big(\frac{\alpha^{-1}+\beta^{-1}}2\Big)^{-1}u'(x)^2\,dx & \text{if } u\in H^1(0,1),\\ +\infty & \text{otherwise.}\end{cases}$$
The functional $F_\varepsilon$ may be interpreted, for example, as the elastic energy of a system of two kinds of small periodically distributed springs with size $\varepsilon$. Theorem 12.3.1 shows that, in the sense of Γ-convergence, the mechanical behavior of such a system is equivalent to that of a homogeneous string. The equivalent density is associated with the strain tensor $\big(\frac{\alpha^{-1}+\beta^{-1}}2\big)^{-1}$ (the harmonic mean of $\alpha$ and $\beta$), which is strictly smaller than the mean value $\frac{\alpha+\beta}2$ of the tensors associated with the two kinds of material whenever $\alpha\ne\beta$.
Remark 12.3.1. The same problem in the two-dimensional case, describing, for example, a system of two kinds of small elastic pieces in $\Omega = (0,1)^2$ arranged in a chessboard structure, may be treated as above by using the concept of Γ-convergence. One can show that the equivalent density is quadratic and associated with the strain tensor $\sqrt{\alpha\beta}$. In the three-dimensional case, there is no explicit formula for the limit strain tensor.
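Example 12.3.1 and the recovery sequence used in the proof of (i) can be checked numerically. The sketch below assumes $\alpha = 1$, $\beta = 3$ and the target $u(x) = \sin(\pi x)$ (our choices), builds $v_\varepsilon(x) = u(0)+\int_0^x a u'/a_\varepsilon\,dt$ with the midpoint rule, and reports $F_\varepsilon(v_\varepsilon)$ together with $\max|v_\varepsilon-u|$ on the grid. These should approach $F(u) = a\pi^2/2$ and $0$, respectively, where $a = 1.5$ is the harmonic mean of $1$ and $3$, not their arithmetic mean $2$. The names `a_eps`, `recovery`, `A_HOM` and `A_MEAN` are hypothetical.

```python
import math

ALPHA, BETA = 1.0, 3.0
# The weak-* limit of 1/a_eps is (1/ALPHA + 1/BETA)/2; its inverse, the harmonic
# mean, is the homogenized coefficient of Example 12.3.1.
A_HOM = 1.0 / ((1.0 / ALPHA + 1.0 / BETA) / 2.0)
A_MEAN = (ALPHA + BETA) / 2.0


def a_eps(x, eps):
    """Two-phase coefficient a(x/eps): ALPHA on the first half of each cell, BETA on the second."""
    return ALPHA if (x / eps) % 1.0 < 0.5 else BETA


def recovery(eps, m=100_000):
    """Midpoint-rule approximation, for the target u(x) = sin(pi x), of the recovery
    sequence v_eps(x) = u(0) + int_0^x A_HOM u'(t) / a_eps(t) dt.
    Returns (F_eps(v_eps), max over grid nodes of |v_eps - u|)."""
    h = 1.0 / m
    v, energy, err = 0.0, 0.0, 0.0
    for i in range(m):
        x = (i + 0.5) * h
        a = a_eps(x, eps)
        dv = A_HOM * math.pi * math.cos(math.pi * x) / a   # v_eps' = a u' / a_eps
        energy += a * dv * dv * h
        v += dv * h
        err = max(err, abs(v - math.sin(math.pi * (i + 1) * h)))
    return energy, err


target = A_HOM * math.pi ** 2 / 2.0   # F(u) = int_0^1 A_HOM u'^2 dx for u = sin(pi x)
print(A_HOM, A_MEAN)
for eps in (0.1, 0.01):
    print(eps, recovery(eps), target)
```

Replacing `A_HOM` by `A_MEAN` in `recovery` makes $v'_\varepsilon$ converge weakly to $\frac43u'$ instead of $u'$, so $v_\varepsilon$ no longer approaches $u$; this is another way to see that the arithmetic mean is not the effective coefficient.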
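More generally, when working with general quadratic densities of the form $f_\varepsilon(\xi) = \langle A_\varepsilon\xi,\xi\rangle$, $A_\varepsilon\in\mathbb M^{3\times3}$, satisfying, for all $\xi\in\mathbb R^3$, $\alpha|\xi|^2\le\langle A_\varepsilon\xi,\xi\rangle\le\beta|\xi|^2$,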
one can show that the Γ-limit of the associated integral functional possesses a density of the form $\langle A\xi,\xi\rangle$. The strategy is then to derive optimal bounds for the constant limit matrix $A$ (see Murat and Tartar [192]).
12.3.2 Periodic homogenization in the general case
Let $\Omega$ be an open bounded subset of $\mathbb R^3$ which represents the interior of the reference configuration filled up by some elastic ($p>1$) or pseudoplastic ($p = 1$) material which is clamped on a part $\Gamma_0$ of the boundary $\partial\Omega$ of $\Omega$. We assume that this material is heterogeneous with a periodic distribution of small heterogeneities of size of order $\varepsilon>0$, so that the stored strain energy density is of the form
$$(x,a)\mapsto f\Big(\frac x\varepsilon,a\Big),$$
where $f(\cdot,a)$ is $Y$-periodic, $Y = (0,1)^3$. We assume that $f$ satisfies conditions (12.4) and (12.5) of the previous section and, to take into account large purely elastic deformations, $f$ is not assumed to be convex but possibly quasiconvex. With the notation of Sections 11.2 and 11.3, the stored strain energy associated with a displacement field $u : \Omega\to\mathbb R^3$ is given by the integral functional $F_\varepsilon : L^p(\Omega,\mathbb R^3)\to\mathbb R^+\cup\{+\infty\}$ defined by
$$F_\varepsilon(u) = \begin{cases}\displaystyle\int_\Omega f\Big(\frac x\varepsilon,\nabla u\Big)dx & \text{if } u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^3),\\ +\infty & \text{otherwise.}\end{cases}$$
The structure is subjected to applied body forces $g : \Omega\to\mathbb R^3$, $g\in L^q(\Omega,\mathbb R^3)$ ($1/p+1/q = 1$ if $p>1$; $q = +\infty$ if $p = 1$), and the exterior loading is defined by
$$L(u) = \int_\Omega g\cdot u\,dx.$$
The equilibrium configuration is then given by the displacement field $u_\varepsilon$, solution of the problem
$$\inf\{F_\varepsilon(u)-L(u) : u\in L^p(\Omega,\mathbb R^3)\}.$$
Due to the very small size $\varepsilon$ of the heterogeneities, it is illusory to compute an approximate equilibrium displacement field by a direct use of the finite element method. The variational property of Γ-convergence (Theorem 12.1.1) again provides a new procedure: find a fictitious material occupying $\Omega$, which appears to be homogeneous when $\varepsilon$ goes to zero, and compute the approximate equilibrium displacement field by means of a finite element method related to a discretization of the new model.
To treat more general situations, we deal with functionals $F_\varepsilon : L^p(\Omega,\mathbb R^m)\to\mathbb R^+\cup\{+\infty\}$ defined by
$$F_\varepsilon(u) = \begin{cases}\displaystyle\int_\Omega f\Big(\frac x\varepsilon,\nabla u\Big)dx & \text{if } u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^m),\\ +\infty & \text{otherwise,}\end{cases}$$
where $\Omega$ is an open bounded subset of $\mathbb R^N$, $m$ is any positive integer, $f$ satisfies the growth conditions (12.4) and (12.5), and, for all $a$ in the set $\mathbb M^{m\times N}$ of $m\times N$ matrices, the Borel function $f(\cdot,a)$ is assumed to be $Y$-periodic, $Y = (0,1)^N$. Following the strategy of the previous subsection, we are going to establish the Γ-convergence of the sequence $(F_\varepsilon)_{\varepsilon>0}$ when $L^p(\Omega,\mathbb R^m)$ is equipped with its strong topology. In Theorem 12.3.2, we will establish that the Γ-limit of $F_\varepsilon$ possesses an integral representation. In the following proposition, we characterize its density ($p>1$) or the density of its regular part ($p = 1$). It is worth noticing the similarity between this proposition and Proposition 11.2.2, where we defined the relaxed density of the integral functional on Sobolev or BV spaces.
Proposition 12.3.1. For every open bounded convex set $A$ in $\mathbb R^N$, the following limit exists:
$$f^{\mathrm{hom}}(a) = \lim_{\varepsilon\to0}\inf\left\{\frac1{|A/\varepsilon|}\int_{A/\varepsilon}f(x,a+\nabla u(x))\,dx : u\in W_0^{1,p}(A/\varepsilon,\mathbb R^m)\right\}.$$
Moreover, this limit does not depend on the choice of the open bounded convex set $A$ and is equal to
$$\inf_{n\in\mathbb N^*}\inf\left\{\frac1{n^N}\int_{nY}f(y,a+\nabla u(y))\,dy : u\in W_0^{1,p}(nY,\mathbb R^m)\right\}.$$
The proof is based on a convergence result related to subadditive processes. To go further, we first give some notation and definitions. Let us denote the family of all bounded Borel sets of $\mathbb R^N$ by $\mathcal B_b(\mathbb R^N)$. A sequence $(B_n)_{n\in\mathbb N}$ of sets of $\mathcal B_b(\mathbb R^N)$ is said to be regular if there exist an increasing sequence of intervals $I_n$ with vertices in $\mathbb Z^N$ and a positive constant $C$ independent of $n$ such that $B_n\subset I_n$ and $|I_n|\le C|B_n|$ for all $n\in\mathbb N$. A subadditive $\mathbb Z^N$-invariant set function indexed on $\mathcal B_b(\mathbb R^N)$ is a map $S : \mathcal B_b(\mathbb R^N)\to\mathbb R$, $A\mapsto S_A$, such that
(i) for all $A,B\in\mathcal B_b(\mathbb R^N)$ with $A\cap B = \emptyset$, $S_{A\cup B}\le S_A+S_B$;
(ii) for all $A\in\mathcal B_b(\mathbb R^N)$ and all $z\in\mathbb Z^N$, $S_{z+A} = S_A$.
Finally, for all $A$ in $\mathcal B_b(\mathbb R^N)$, we define the nonnegative number $\rho(A) := \sup\{r\ge0 : \exists\,\bar B_r(x)\subset A\}$, where $\bar B_r(x)$ is the closed ball with radius $r>0$ centered at $x$.
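In one dimension, with the hypothetical two-phase density $f(y,\xi) = a(y)|\xi|^p$ ($a = \alpha$ on $(0,1/2)$, $a = \beta$ on $(1/2,1)$, $p>1$), the cell formula of Proposition 12.3.1 can be solved by hand. By Jensen's inequality the optimal slope $\xi+u'$ is constant on each phase, and the zero boundary condition only fixes the mean slope $\xi$, so every $n$ gives the same value, namely $f^{\mathrm{hom}}(\xi) = |\xi|^pM^{1-p}$, with $M$ the cell average of $a^{-1/(p-1)}$. This derivation is ours and covers only this special case. The sketch compares the formula with a golden-section search over the two phase slopes; for $p = 2$ it recovers the harmonic mean of Example 12.3.1. The function names are hypothetical.

```python
import math


def golden_min(h, lo, hi, iters=200):
    """Golden-section search for the minimum of a convex function h on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if h(c) < h(d):
            b = d
        else:
            a = c
    x = 0.5 * (a + b)
    return x, h(x)


def f_hom_numeric(xi, p, alpha, beta):
    """Cell problem for f(y, xi) = a(y)|xi|^p with a = alpha on (0, 1/2), beta on (1/2, 1):
    minimize over the slope s used in the alpha-phase, the beta-phase slope being 2 xi - s
    so that the mean slope is xi."""
    def cell_energy(s):
        return 0.5 * (alpha * abs(s) ** p + beta * abs(2.0 * xi - s) ** p)
    lo, hi = min(0.0, 2.0 * xi), max(0.0, 2.0 * xi)
    return golden_min(cell_energy, lo, hi)[1]


def f_hom_closed(xi, p, alpha, beta):
    """|xi|^p * M^(1 - p), with M the cell average of a^(-1/(p - 1)) (requires p > 1)."""
    mean = 0.5 * (alpha ** (-1.0 / (p - 1.0)) + beta ** (-1.0 / (p - 1.0)))
    return abs(xi) ** p * mean ** (1.0 - p)


print(f_hom_numeric(1.5, 3.0, 1.0, 4.0), f_hom_closed(1.5, 3.0, 1.0, 4.0))  # both close to 6.0
print(f_hom_closed(1.0, 2.0, 1.0, 3.0))  # p = 2: harmonic mean of 1 and 3, i.e. 1.5
```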
The following lemma generalizes the classical limit theorem related to subadditive processes indexed by cubes. For a proof, we refer the reader to Akcoglu and Krengel [3], to Licht and Michaille [170], or to Dal Maso and Modica [119].
Lemma 12.3.1. Let $S$ be a subadditive $\mathbb Z^N$-invariant set function such that
$$\gamma(S) := \inf\left\{\frac{S_I}{|I|} : I = [a,b[,\ a = (a_i)_{i=1,\dots,N},\ b = (b_i)_{i=1,\dots,N}\in\mathbb Z^N,\ a_i<b_i\ \forall i = 1,\dots,N\right\}>-\infty,$$
and which satisfies the following domination property: there exists a positive constant $C(S)<+\infty$ such that $|S_A|\le C(S)$ for all Borel sets $A$ included in $[0,1[^N$. Let $(A_n)_{n\in\mathbb N}$ be a regular sequence of convex Borel sets of $\mathcal B_b(\mathbb R^N)$ satisfying $\lim_{n\to+\infty}\rho(A_n) = +\infty$. Then
$$\lim_{n\to+\infty}\frac{S_{A_n}}{|A_n|} = \inf_{m\in\mathbb N^*}\frac{S_{[0,m[^N}}{m^N} = \gamma(S).$$
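For $N = 1$ and intervals $[0,m[$, Lemma 12.3.1 contains Fekete's subadditive lemma. The sketch below uses the hypothetical set function $S_A = c|A|+\sqrt{|A|}$, which is $\mathbb Z$-invariant, subadditive on disjoint sets (because $\sqrt{s+t}\le\sqrt s+\sqrt t$), bounded on subsets of $[0,1[$, and has $\gamma(S) = c$, an infimum that is not attained. Both the averages $S_{[0,m[}/m$ and the averages over the regular convex sets $A_n = [-n,n]$ approach $c$.

```python
import math


def S(length, c=2.0):
    """Hypothetical set function on the line depending only on |A|:
    S_A = c |A| + sqrt(|A|). It is Z-invariant, and subadditive on disjoint sets
    because sqrt(s + t) <= sqrt(s) + sqrt(t)."""
    return c * length + math.sqrt(length)


# gamma(S): infimum of S_I / |I| over integer intervals I = [0, m[ (not attained here)
ratios = [S(m) / m for m in (1, 10, 100, 10_000, 1_000_000)]
print(ratios)   # decreases to c = 2.0; the gap is 1 / sqrt(m)

# a regular sequence of convex sets with rho(A_n) -> +infinity: A_n = [-n, n]
balls = [S(2 * n) / (2 * n) for n in (10, 1_000, 100_000)]
print(balls)    # also tends to gamma(S) = 2.0
```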
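Proof of Proposition 12.3.1. Let us notice that, in the definition of subadditivity, assertion (i) may be replaced by the following: for all $A,B\in\mathcal B_b(\mathbb R^N)$ with $A\cap B = \emptyset$ and $|\partial A| = |\partial B| = 0$, $S_{A\cup B}\le S_A+S_B$ (see [170, Remark of Theorem 2.1] or [119]). Then we claim that
$$S : A\mapsto\inf\left\{\int_{\mathring A}f(x,a+\nabla u(x))\,dx : u\in W_0^{1,p}(\mathring A,\mathbb R^m)\right\}$$
is a subadditive $\mathbb Z^N$-invariant process. Indeed, the $Y$-periodicity of $f(\cdot,a)$ yields $S_{A+z} = S_A$ for all $A\in\mathcal B_b(\mathbb R^N)$ and all $z\in\mathbb Z^N$. Let now $A,B$ in $\mathcal B_b(\mathbb R^N)$ be such that $A\cap B = \emptyset$ and $|\partial A| = |\partial B| = 0$. For arbitrary fixed $\eta>0$, consider $\varphi_A\in\mathcal D(\mathring A,\mathbb R^m)$ and $\varphi_B\in\mathcal D(\mathring B,\mathbb R^m)$ satisfying
$$\int_{\mathring A}f(x,a+\nabla\varphi_A)\,dx\le S_A+\eta,\qquad\int_{\mathring B}f(x,a+\nabla\varphi_B)\,dx\le S_B+\eta.$$
Extending $\varphi_A$ and $\varphi_B$ by zero on $\mathbb R^N\setminus\mathring A$ and $\mathbb R^N\setminus\mathring B$, respectively, the function $\varphi$ which coincides with $\varphi_A$ on $\mathring A$ and with $\varphi_B$ on $\mathring B$ belongs to $W_0^{1,p}((A\cup B)^\circ,\mathbb R^m)$. Since $|\partial A| = |\partial B| = 0$, we have $|(A\cup B)^\circ\setminus(\mathring A\cup\mathring B)| = 0$; thus
$$\begin{aligned}\int_{(A\cup B)^\circ}f(x,a+\nabla\varphi)\,dx &= \int_{\mathring A}f(x,a+\nabla\varphi_A)\,dx+\int_{\mathring B}f(x,a+\nabla\varphi_B)\,dx+\int_{(A\cup B)^\circ\setminus(\mathring A\cup\mathring B)}f(x,a)\,dx\\ &= \int_{\mathring A}f(x,a+\nabla\varphi_A)\,dx+\int_{\mathring B}f(x,a+\nabla\varphi_B)\,dx\\ &\le S_A+S_B+2\eta.\end{aligned}$$
Consequently, $S_{A\cup B}\le S_A+S_B+2\eta$, and the subadditivity of $S$ follows by letting $\eta\to0$. We conclude the proof by applying Lemma 12.3.1 to this process.
Remark 12.3.2. Lemma 12.3.1 may be generalized to ergodic subadditive processes, i.e., to subadditive processes with values in $L^1(\Sigma,\mathcal T,\mathbb P)$ whose probability law is, roughly speaking, invariant under a group $(T_z)_{z\in\mathbb Z^N}$ of measure-preserving transformations on the probability space $(\Sigma,\mathcal T,\mathbb P)$. This probabilistic version allows one to treat stochastic homogenization (see, for instance, [119], [170], or Section 14.2).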
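We can now establish the main convergence result, a generalization of Theorems 11.2.1 and 11.3.1. In what follows, $\varepsilon$ actually denotes a sequence $(\varepsilon_n)_{n\in\mathbb N}$ of positive numbers $\varepsilon_n$ going to zero when $n\to+\infty$, and we adopt the notation of Section 11.2. For more general problems involving multiple small parameters, see [10], [11]. For problems concerned with nonlocal effects, see [62], [63].
Theorem 12.3.2. Let $f$ satisfy (12.4) and (12.5) with $p\ge1$ and assume that the Borel function $f(\cdot,a)$ is $Y$-periodic for all $a$ in $\mathbb M^{m\times N}$. Let us consider the integral functional $F_\varepsilon$ defined on $L^p(\Omega,\mathbb R^m)$ by
$$F_\varepsilon(u) = \begin{cases}\displaystyle\int_\Omega f\Big(\frac x\varepsilon,\nabla u\Big)dx & \text{if } u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^m),\\ +\infty & \text{otherwise,}\end{cases}$$
where $L^p(\Omega,\mathbb R^m)$ is equipped with its strong topology. Then $(F_\varepsilon)_{\varepsilon>0}$ Γ-converges to the integral functional $F^{\mathrm{hom}}$ defined as follows:
(i) case $p>1$:
$$F^{\mathrm{hom}}(u) = \begin{cases}\displaystyle\int_\Omega f^{\mathrm{hom}}(\nabla u)\,dx & \text{if } u\in W^{1,p}_{\Gamma_0}(\Omega,\mathbb R^m),\\ +\infty & \text{otherwise;}\end{cases}$$
(ii) case $p = 1$:
$$F^{\mathrm{hom}}(u) = \begin{cases}\displaystyle\int_\Omega f^{\mathrm{hom}}(\nabla u)\,dx+\int_\Omega f^{\mathrm{hom},\infty}\Big(\frac{D^su}{|D^su|}\Big)|D^su|+\int_{\Gamma_0}f^{\mathrm{hom},\infty}(\gamma_0(u)\otimes\nu)\,d\mathcal H^{N-1} & \text{if } u\in BV(\Omega,\mathbb R^m),\\ +\infty & \text{otherwise,}\end{cases}$$
where $\nu$ denotes the outer unit normal to $\Gamma_0$, $\gamma_0$ the trace operator, and $f^{\mathrm{hom},\infty}$ the recession function of $f^{\mathrm{hom}}$, defined by
$$f^{\mathrm{hom},\infty}(a) = \limsup_{t\to+\infty}\frac{f^{\mathrm{hom}}(ta)}t.$$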
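The recession function in Theorem 12.3.2(ii) is a limit along rays, which a short heuristic sketch can make tangible by evaluating $f(ta)/t$ for a few large $t$. The density $\sqrt{1+|a|^2}$ below is a hypothetical stand-in for $f^{\mathrm{hom}}$ with linear growth ($p = 1$); its recession function is $|a|$, which is positively 1-homogeneous. The quadratic density illustrates that for growth of order $p>1$ the ratio blows up, so no finite recession function exists. The helper names are ours.

```python
import math


def recession(f, a, ts=(1e2, 1e4, 1e6, 1e8)):
    """Heuristic estimate of f^infinity(a) = limsup_{t -> +inf} f(t a) / t:
    evaluates f(t a) / t for a few large t and returns (last value, all values)."""
    values = [f([t * x for x in a]) / t for t in ts]
    return values[-1], values


def area_like(a):
    """Hypothetical density with linear growth: sqrt(1 + |a|^2), where a lists the
    matrix entries and |.| is the Euclidean (Frobenius) norm. Its recession is |a|."""
    return math.sqrt(1.0 + sum(x * x for x in a))


def quadratic(a):
    """|a|^2, a growth of order p = 2 > 1: f(t a) / t grows like t for a != 0."""
    return sum(x * x for x in a)


value, trend = recession(area_like, [3.0, 4.0])    # |a| = 5
print(value, trend)                                 # tends to 5.0
print(recession(area_like, [6.0, 8.0])[0])          # about 10.0: positively 1-homogeneous
print(recession(quadratic, [1.0])[1])               # grows like t: no finite recession
```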
The proof of Theorem 12.3.2 is a consequence of Propositions 12.3.2 and 12.3.3. To shorten the proofs, we do not take into account the boundary condition, i.e., the domain of $F_\varepsilon$ is $W^{1,p}(\Omega,\mathbb R^m)$. To treat the general case, it suffices to reproduce exactly the proofs of Corollary 11.2.1 when $p>1$ and of Corollary 11.3.1 when $p = 1$.
Proposition 12.3.2. For all $u$ in $L^p(\Omega,\mathbb R^m)$ and every sequence $(u_n)_{n\in\mathbb N}$ strongly converging to $u$ in $L^p(\Omega,\mathbb R^m)$, one has
$$F^{\mathrm{hom}}(u)\le\liminf_{n\to+\infty}F_{\varepsilon_n}(u_n). \tag{12.17}$$
12.3. Application to homogenization of composite media
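Proof. Our strategy is exactly that of Proposition 11.2.3 or 11.3.3. Obviously, one may assume $\liminf_{n\to+\infty} F_{\varepsilon_n}(u_n) < +\infty$. Passing to a nonrelabeled subsequence achieving the lim inf, consider the nonnegative Borel measure $\mu_n := f\left(\frac{\cdot}{\varepsilon_n}, \nabla u_n(\cdot)\right)\mathcal{L}^N\lfloor\Omega$; we have $\sup_{n\in\mathbb{N}}\mu_n(\Omega) < +\infty$.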
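Consequently, there exist a further subsequence (not relabeled) and a nonnegative Borel measure $\mu \in \mathbf{M}(\Omega)$ such that $\mu_n \rightharpoonup \mu$ weakly in $\mathbf{M}(\Omega)$. Let $\mu = g\mathcal{L}^N + \mu^s$ be the Lebesgue–Nikodym decomposition of $\mu$, where $\mu^s$ is a nonnegative Borel measure, singular with respect to the $N$-dimensional Lebesgue measure $\mathcal{L}^N$ restricted to $\Omega$. To establish (12.17), it is enough to prove that
$$g(x) \ge f^{\mathrm{hom}}(\nabla u(x)) \quad \text{for a.e. } x \in \Omega, \qquad (12.18)$$
$$\mu^s \ge (f^{\mathrm{hom}})^\infty\left(\frac{D^s u}{|D^s u|}\right)|D^s u| \quad \text{when } p = 1. \qquad (12.19)$$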
(a) Proof of (12.18). Let $\rho > 0$, intended to tend to 0, and let $B_\rho(x_0)$ be the open ball of radius $\rho$ centered at $x_0$. According to the theory of differentiation of measures, for a.e. $x_0 \in \Omega$,
$$g(x_0) = \lim_{\rho\to0}\frac{\mu(B_\rho(x_0))}{|B_\rho(x_0)|}.$$
By Lemma 4.2.1, $\mu(\partial B_\rho(x_0)) = 0$ for all but countably many $\rho > 0$, and we restrict ourselves to such radii. Alexandrov's theorem (Proposition 4.2.3) then gives $\mu(B_\rho(x_0)) = \lim_{n\to+\infty}\mu_n(B_\rho(x_0))$, so that we finally must establish
$$\lim_{\rho\to0}\lim_{n\to+\infty}\frac{\mu_n(B_\rho(x_0))}{|B_\rho(x_0)|} \ge f^{\mathrm{hom}}(\nabla u(x_0)) \quad \text{for a.e. } x_0 \in \Omega. \qquad (12.20)$$
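Let us assume for the moment that the trace of $u_n$ on $\partial B_\rho(x_0)$ coincides with the affine function $u_0$ defined by $u_0(x) := u(x_0) + \nabla u(x_0)(x - x_0)$. Then $\nabla u_n = \nabla u(x_0) + \nabla(u_n - u_0)$ with $u_n - u_0 \in W^{1,p}_0(B_\rho(x_0),\mathbb{R}^m)$. By the change of variables $x \mapsto x/\varepsilon_n$ and Proposition 12.3.1, it follows that
$$\begin{aligned}
\lim_{n\to+\infty}\frac{\mu_n(B_\rho(x_0))}{|B_\rho(x_0)|} &= \lim_{n\to+\infty}\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)} f\left(\frac{x}{\varepsilon_n}, \nabla u(x_0) + \nabla(u_n - u_0)\right)dx\\
&\ge \limsup_{n\to+\infty}\inf\left\{\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)} f\left(\frac{x}{\varepsilon_n}, \nabla u(x_0) + \nabla\varphi\right)dx : \varphi \in W^{1,p}_0(B_\rho(x_0),\mathbb{R}^m)\right\}\\
&= \lim_{n\to+\infty}\inf\left\{\frac{1}{|\frac{1}{\varepsilon_n}B_\rho(x_0)|}\int_{\frac{1}{\varepsilon_n}B_\rho(x_0)} f(x, \nabla u(x_0) + \nabla\varphi)\,dx : \varphi \in W^{1,p}_0\left(\tfrac{1}{\varepsilon_n}B_\rho(x_0),\mathbb{R}^m\right)\right\}\\
&= f^{\mathrm{hom}}(\nabla u(x_0)),
\end{aligned}$$
and the proof would be complete. The idea now consists in modifying $u_n$ into a function of $W^{1,p}(B_\rho(x_0),\mathbb{R}^m)$ which coincides with $u_0$ on $\partial B_\rho(x_0)$ in the trace sense, to follow the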
previous procedure, and to control the additional terms as $\rho$ goes to zero. This control comes from the following estimate (see Lemma 11.2.1 and Proposition 10.4.1): for a.e. $x_0 \in \Omega$,
$$\left(\frac{1}{|B_\rho(x_0)|}\int_{B_\rho(x_0)}\Big|u(x) - \big(u(x_0) + \nabla u(x_0)(x - x_0)\big)\Big|^p dx\right)^{1/p} = o(\rho).$$
Because of conditions (12.4) and (12.5) satisfied by $f$, the suitable modification of $u_n$ is exactly the one of Proposition 11.2.3. The proof of (12.20) is then complete.

(b) Proof of (12.19). It suffices to reproduce the proof of the inequality
$$\mu^s \ge (Qf)^\infty\left(\frac{D^s u}{|D^s u|}\right)|D^s u|$$
obtained in the proof of Proposition 11.3.3. In that proof, substitute $f\left(\frac{x}{\varepsilon_n}, \cdot\right)$ for $f$ and, according to Proposition 12.3.1, $f^{\mathrm{hom}}$ for $Qf$.

Proposition 12.3.3. For all $u$ in $L^p(\Omega,\mathbb{R}^m)$, $p \ge 1$, there exists a sequence $(u_n)_{n\in\mathbb{N}}$ strongly converging to $u$ in $L^p(\Omega,\mathbb{R}^m)$ such that
$$F^{\mathrm{hom}}(u) \ge \limsup_{n\to+\infty} F_{\varepsilon_n}(u_n).$$
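Proof. The proof will be obtained in two steps.

First step. We assume $u \in W^{1,p}(\Omega,\mathbb{R}^m)$ and reproduce, with minor modifications, the outline of the proof of Proposition 11.2.4. According to Proposition 12.3.1 and to the Lebesgue dominated convergence theorem,
$$F^{\mathrm{hom}}(u) = \int_\Omega f^{\mathrm{hom}}(\nabla u)\,dx = \lim_{k\to+\infty}\int_\Omega f^{\mathrm{hom}}_k(\nabla u)\,dx, \qquad (12.21)$$
where
$$f^{\mathrm{hom}}_k(a) = \inf\left\{\frac{1}{|kY|}\int_{kY} f(y, a + \nabla v)\,dy : v \in W^{1,p}_0(kY,\mathbb{R}^m)\right\}.$$
Let us fix $k \in \mathbb{N}^*$. We apply the interchange lemma, Lemma 11.2.2, for every $\eta > 0$ of the form $1/h$ with $h$ an integer. It provides some $\varphi_{k,\eta}$ in $C_c(\Omega, \mathcal{D}(kY,\mathbb{R}^m))$ such that
$$\frac{1}{|kY|}\int_{\Omega\times kY} f(y, \nabla u(x) + \nabla_y\varphi_{k,\eta}(x,y))\,dx\,dy \ge \int_\Omega f^{\mathrm{hom}}_k(\nabla u)\,dx > \frac{1}{|kY|}\int_{\Omega\times kY} f(y, \nabla u(x) + \nabla_y\varphi_{k,\eta}(x,y))\,dx\,dy - \eta. \qquad (12.22)$$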
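Let us extend $y \mapsto \varphi_{k,\eta}(x,y)$ by $kY$-periodicity to $\mathbb{R}^N$ and consider the function $u_{k,\eta,n}$ defined by
$$u_{k,\eta,n}(x) = u(x) + \varepsilon_n\varphi_{k,\eta}\left(x, \frac{x}{\varepsilon_n}\right).$$
Note that $\varphi_{k,\eta}$ is a Carathéodory function, so that $x \mapsto \varphi_{k,\eta}\left(x, \frac{x}{\varepsilon_n}\right)$ is measurable. Clearly $u_{k,\eta,n}$ belongs to $W^{1,p}(\Omega,\mathbb{R}^m)$ and $u_{k,\eta,n} \to u$ strongly in $L^p(\Omega,\mathbb{R}^m)$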
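as $n$ goes to $\infty$. On the other hand, according to the continuity assumption (12.5) on $f$ and to Lemma 11.2.3,
$$\begin{aligned}
\lim_{n\to+\infty}\int_\Omega f\left(\frac{x}{\varepsilon_n}, \nabla u_{k,\eta,n}\right)dx &= \lim_{n\to+\infty}\int_\Omega f\left(\frac{x}{\varepsilon_n}, \nabla u(x) + \nabla_y\varphi_{k,\eta}\left(x,\frac{x}{\varepsilon_n}\right) + \varepsilon_n\nabla_x\varphi_{k,\eta}\left(x,\frac{x}{\varepsilon_n}\right)\right)dx\\
&= \lim_{n\to+\infty}\int_\Omega f\left(\frac{x}{\varepsilon_n}, \nabla u(x) + \nabla_y\varphi_{k,\eta}\left(x,\frac{x}{\varepsilon_n}\right)\right)dx\\
&= \frac{1}{|kY|}\int_{\Omega\times kY} f(y, \nabla u(x) + \nabla_y\varphi_{k,\eta}(x,y))\,dx\,dy.
\end{aligned}$$
Consequently, from (12.22),
$$\lim_{n\to+\infty}F_{\varepsilon_n}(u_{k,\eta,n}) = \frac{1}{|kY|}\int_{\Omega\times kY} f(y, \nabla u(x) + \nabla_y\varphi_{k,\eta}(x,y))\,dx\,dy \le \int_\Omega f^{\mathrm{hom}}_k(\nabla u)\,dx + \eta.$$
The inequality above, (12.22), and (12.21) give the following limit after letting $\eta \to 0$ (i.e., $h \to +\infty$) and then $k \to +\infty$:
$$\lim_{k\to+\infty}\lim_{\eta\to0}\lim_{n\to+\infty}F_{\varepsilon_n}(u_{k,\eta,n}) = F^{\mathrm{hom}}(u).$$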
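In one dimension the oscillating recovery sequence can be made fully explicit. The following sketch is again hypothetical. It uses a two-phase coefficient $\kappa$ and $u(x) = \xi x$ on $(0,1)$, and ignores boundary conditions as in the proof. It also replaces $\varphi_{k,\eta}(x,\cdot)$ by the exact periodic corrector $\chi$, which does not depend on $x$ because $\nabla u$ is constant. For $\varepsilon = 1/k$, the sketch checks two facts about $u_\varepsilon = u + \varepsilon\chi(x/\varepsilon)$. First, $u_\varepsilon$ stays within $\varepsilon\max|\chi|$ of $u$. Second, its energy equals $\kappa_{\mathrm{hom}}\xi^2$. The unperturbed $u$, by contrast, has energy equal to the arithmetic mean of $\kappa$ times $\xi^2$.

```python
import numpy as np

# Hypothetical two-phase Y-periodic coefficient (Y = (0, 1)): KAPPA_1 on
# (0, 1/2) and KAPPA_2 on (1/2, 1); integrand f(y, xi) = kappa(y) * xi**2.
KAPPA_1, KAPPA_2 = 1.0, 4.0
INV_MEAN = 0.5 / KAPPA_1 + 0.5 / KAPPA_2   # mean of 1/kappa over Y


def kappa(y):
    return np.where(np.mod(y, 1.0) < 0.5, KAPPA_1, KAPPA_2)


def corrector(y, xi):
    """Periodic corrector chi with kappa(y) * (xi + chi'(y)) constant, chi(0) = 0.

    Returns (chi(y), chi'(y)), y being reduced modulo 1.
    """
    y = np.mod(y, 1.0)
    flux = xi / INV_MEAN                     # the constant kappa * (xi + chi')
    s1, s2 = flux / KAPPA_1 - xi, flux / KAPPA_2 - xi
    chi = np.where(y < 0.5, s1 * y, 0.5 * s1 + s2 * (y - 0.5))
    dchi = np.where(y < 0.5, s1, s2)
    return chi, dchi


def energies(xi, eps, m=20000):
    """Midpoint-rule energies on (0, 1) for u(x) = xi * x.

    Returns (energy of u_eps = u + eps * chi(x / eps), energy of u,
    sup |u_eps - u|).
    """
    x = (np.arange(m) + 0.5) / m
    y = x / eps
    chi, dchi = corrector(y, xi)
    k = kappa(y)
    e_corrected = float(np.mean(k * (xi + dchi) ** 2))
    e_plain = float(np.mean(k * xi ** 2))
    return e_corrected, e_plain, float(np.max(np.abs(eps * chi)))


for eps in (0.1, 0.01):
    print(eps, energies(1.0, eps))   # energies close to 1.6 and 2.5
```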
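Let us now apply the diagonalization Lemma 11.1.1 to the sequence $\big(F_{\varepsilon_n}(u_{k,\eta,n}), u_{k,\eta,n}\big)_{k,\eta,n}$ in the metric space $\mathbb{R}\times L^p(\Omega,\mathbb{R}^m)$. There exists a map $n \mapsto (k,\eta)(n)$ such that
$$\lim_{n\to+\infty}F_{\varepsilon_n}(u_{(k,\eta)(n),n}) = F^{\mathrm{hom}}(u), \qquad \lim_{n\to+\infty}u_{(k,\eta)(n),n} = u \ \text{ strongly in } L^p(\Omega,\mathbb{R}^m).$$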
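The diagonalization lemma is used above as a black box. The sketch below is a minimal constructive version for real double sequences. It uses a hypothetical $a(k,n)$ for which the naive diagonal $a(n,n)$ fails. For each $n$, it picks the largest index $k \le n$ at which $a(k,n)$ is already $1/k$-close to its limit $b(k)$. This is one standard way to prove such a statement, not necessarily the proof of Lemma 11.1.1.

```python
def diagonal_index(a, b, n):
    """Largest k in {1, ..., n} with |a(k, n) - b(k)| <= 1/k (1 if none).

    If lim_n a(k, n) = b(k) for every k, then k(n) -> +infinity and
    |a(k(n), n) - b(k(n))| <= 1/k(n); hence a(k(n), n) -> lim_k b(k).
    """
    best = 1
    for k in range(1, n + 1):
        if abs(a(k, n) - b(k)) <= 1.0 / k:
            best = k
    return best


# Hypothetical double sequence: lim_n a(k, n) = 1/k and lim_k 1/k = 0,
# whereas the naive diagonal a(n, n) tends to 1.
def a(k, n):
    return 1.0 / k + k / (n + 1.0)


def b(k):
    return 1.0 / k


for n in (10, 1000, 10000):
    k = diagonal_index(a, b, n)
    print(n, k, a(k, n), a(n, n))
```

Here the selected index grows like $\sqrt{n}$, slowly enough for the inner limit to take effect before $k$ increases.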
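We have thus proved that $F^{\mathrm{hom}}(u) \ge \limsup_{n\to+\infty}F_{\varepsilon_n}(u_n)$ for the sequence $u_n = u_{(k,\eta)(n),n}$ of $W^{1,p}(\Omega,\mathbb{R}^m)$, which converges to $u$ strongly in $L^p(\Omega,\mathbb{R}^m)$. If $p > 1$, the proof is complete because the domain of $F^{\mathrm{hom}}$ is $W^{1,p}(\Omega,\mathbb{R}^m)$.

Second step ($p = 1$). Let us consider the functional $G$ defined on $L^1(\Omega,\mathbb{R}^m)$ by
$$G(u) = \begin{cases}\displaystyle\int_\Omega f^{\mathrm{hom}}(\nabla u)\,dx & \text{if } u \in W^{1,1}(\Omega,\mathbb{R}^m),\\ +\infty & \text{otherwise}.\end{cases}$$
According to Theorem 11.3.1, $F^{\mathrm{hom}}$ is nothing but the lower semicontinuous envelope $\mathrm{cl}(G)$ of $G$. Therefore, for all $u \in L^1(\Omega,\mathbb{R}^m)$, there exists a sequence $(u_l)_{l\in\mathbb{N}}$ in $L^1(\Omega,\mathbb{R}^m)$, strongly converging to $u$ in $L^1(\Omega,\mathbb{R}^m)$, such that
$$F^{\mathrm{hom}}(u) = \lim_{l\to+\infty}G(u_l).$$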
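One may assume $F^{\mathrm{hom}}(u) < +\infty$, so that $G(u_l) < +\infty$ for $l$ large enough. According to the first step, there then exists a sequence $(u_{l,n})_{n\in\mathbb{N}}$, strongly converging to $u_l$ in $L^1(\Omega,\mathbb{R}^m)$ as $n \to +\infty$, such that
$$G(u_l) = \lim_{n\to+\infty}F_{\varepsilon_n}(u_{l,n}).$$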
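Combining these two equalities, we obtain
$$F^{\mathrm{hom}}(u) = \limsup_{l\to+\infty}\,\limsup_{n\to+\infty}F_{\varepsilon_n}(u_{l,n}) \quad\text{and}\quad \lim_{l\to+\infty}\lim_{n\to+\infty}u_{l,n} = u \ \text{ strongly in } L^1(\Omega,\mathbb{R}^m).$$
We end the proof by applying the diagonalization Lemma 11.1.1.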
12.4 Application to image segmentation and phase transitions

12.4.1 The Mumford–Shah model

Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$ and $g$ a given function in $L^\infty(\Omega)$. Denote by $\mathcal{F}$ the class of closed subsets of $\Omega$. For all $K$ in $\mathcal{F}$ and all $u$ in $C^1(\Omega\setminus K)$, we consider the functional
$$E(u,K) := \int_\Omega|u - g|^2\,dx + \int_{\Omega\setminus K}|\nabla u|^2\,dx + \mathcal{H}^{N-1}(K)$$
and the associated optimization problem
$$\inf\{E(u,K) : (u,K) \in C^1(\Omega\setminus K)\times\mathcal{F}\}. \qquad (12.23)$$
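When $\Omega$ is a rectangle in $\mathbb{R}^2$ and $g(x)$ is the intensity of the light signal striking the point $x$, (12.23) is the Mumford–Shah model of image segmentation. In computer vision, $K$ may be considered as the outline of the given light image. If it exists, a solution $(u^*, K^*)$ of (12.23) fulfills the following three properties:
(i) the first term in $E(u,K)$ asks that $u^*$ approximate the light signal $g$ in $L^2(\Omega)$;
(ii) in $\Omega\setminus K^*$, $u^*$ does not vary much (because of the term $\int_{\Omega\setminus K}|\nabla u^*|^2\,dx$);
(iii) the third term asks that the boundaries $K^*$ be as short as possible.
Let us remark that dropping any one of the three terms makes the problem trivial, i.e., $\inf\{E(u,K) : (u,K) \in C^1(\Omega\setminus K)\times\mathcal{F}\} = 0$. Indeed, when $E(u,K) = \int_{\Omega\setminus K}|\nabla u|^2\,dx + \mathcal{H}^{N-1}(K)$, take $u^* = 0$ and $K^* = \emptyset$. When $E(u,K) := \int_\Omega|u - g|^2\,dx + \mathcal{H}^{N-1}(K)$, take $K = \emptyset$ and $u = g$. If $g$ is not of class $C^1$, take instead a sequence of $C^1$ functions converging to $g$ in $L^2(\Omega)$. When $E(u,K) := \int_\Omega|u - g|^2\,dx + \int_{\Omega\setminus K}|\nabla u|^2\,dx$, let us decompose $\Omega$, up to a null set, into a finite union of open cubes $Q_{i,\eta}$ with diameter $\eta$ and boundaries $K_{i,\eta}$:
$$\mathcal{L}^N\Big(\Omega\setminus\bigcup_{i\in I(\eta)}Q_{i,\eta}\Big) = 0, \qquad K_\eta = \bigcup_{i\in I(\eta)}K_{i,\eta},$$
and set
$$u_{i,\eta} := \frac{1}{|Q_{i,\eta}|}\int_{Q_{i,\eta}}g(x)\,dx, \qquad u_\eta := \sum_{i\in I(\eta)}u_{i,\eta}\mathbf{1}_{Q_{i,\eta}}.$$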
12.4. Application to image segmentation and phase transitions
Then $E(u_\eta, K_\eta)$ tends to zero as $\eta$ goes to zero. In this last case, problem (12.23) with the modified functional obviously has no solution if $g$ is not constant. To fit several applications to computer vision problems, one can weight the terms of $E$ by suitable positive constants $\alpha$, $\beta$, and $\gamma$ and consider
$$E(u,K) := \alpha\int_\Omega|u - g|^2\,dx + \beta\int_{\Omega\setminus K}|\nabla u|^2\,dx + \gamma\mathcal{H}^{N-1}(K).$$
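The third degenerate case can be reproduced numerically in one dimension. There the cubes $Q_{i,\eta}$ become intervals and $K_\eta$ is the set of their interior endpoints. The sketch below uses a hypothetical datum $g$ with one jump, sampled at cell midpoints. Under dyadic refinement, the fidelity term of the piecewise-constant averages $u_\eta$ decreases to zero. Meanwhile, the number $\mathcal{H}^0(K_\eta)$ of jump points blows up; the term $\gamma\mathcal{H}^{N-1}(K)$ is what rules this out.

```python
import numpy as np


def piecewise_constant_fit(g_vals, n_cells):
    """Average g over n_cells equal cells of (0, 1) (midpoint samples g_vals).

    Returns the fidelity term int_0^1 |u_eta - g|^2 dx of the piecewise
    constant u_eta. The Dirichlet term vanishes since u_eta is constant on
    each cell, while H^0(K_eta) = n_cells - 1 grows without bound.
    """
    m = g_vals.size
    if m % n_cells:
        raise ValueError("n_cells must divide the number of samples")
    blocks = g_vals.reshape(n_cells, m // n_cells)
    u_eta = np.repeat(blocks.mean(axis=1), m // n_cells)
    return float(np.mean((u_eta - g_vals) ** 2))


m = 2 ** 14
x = (np.arange(m) + 0.5) / m
g = np.sin(6.0 * x) + (x > 0.3)   # hypothetical bounded datum with one jump
errors = [piecewise_constant_fit(g, 2 ** j) for j in range(1, 11)]
for j, err in zip(range(1, 11), errors):
    print(f"eta = 2^-{j}: fidelity = {err:.2e}, jump points = {2 ** j - 1}")
```

Because the dyadic partitions are nested, each $u_\eta$ is an orthogonal projection onto a larger subspace. The fidelity term therefore can only decrease.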
In what follows, to shorten notation, we set $\alpha = \beta = \gamma = 1$. The existence of a solution of the optimization problem (12.23) was conjectured in [191]. It was established in [123] by using Ambrosio's semicontinuity and compactness results [13] for functionals defined on SBV spaces (see Chapter 13). The authors of [123] introduced a weak formulation of (12.23) as follows. If (12.23) has a solution $(u^*, K^*)$, the closed set $K^*$ must contain the jump set of $u^*$. It is then natural to solve the problem in $SBV(\Omega)$ and to take for $K^*$ the closure of the set $S_{u^*}$. This leads us to the following weak formulation of problem (12.23):
$$\inf\left\{\int_\Omega|\nabla u|^2\,dx + \mathcal{H}^{N-1}(S_u) + \int_\Omega|u - g|^2\,dx : u \in SBV(\Omega)\right\}, \qquad (12.24)$$
where $\nabla u$ denotes the density of the Lebesgue part of $Du$ and $S_u$ the jump set of $u$ (see Chapter 10). The functional $E$ defined on $SBV(\Omega)$ by
$$E(u) = \int_\Omega|\nabla u|^2\,dx + \mathcal{H}^{N-1}(S_u) + \int_\Omega|u - g|^2\,dx$$
will be referred to as the Mumford–Shah energy functional. In Section 14.3 we will establish the following existence result.

Theorem 12.4.1. There exists at least one solution of the weak problem (12.24).
12.4.2 Variational approximation of a more elementary problem: A phase transitions model

A natural way to treat the weak formulation (12.24) numerically consists in approximating, in a variational sense, the Mumford–Shah energy functional by classical integral functionals. Before treating the Mumford–Shah energy, we show how the van der Waals–Cahn–Hilliard thermodynamical model of phase transitions yields a good approximation of the term $\mathcal{H}^{N-1}(S_u)$. For another, more direct method in one or two dimensions (i.e., $N = 1, 2$), we refer the reader to Chambolle [107]. For nonlocal variational approximations of the Mumford–Shah functional, we refer the reader to Braides–Dal Maso [86], Gobbino [148], Cortesani and Toader [114], and the references therein.

Let $\Omega$ be an open bounded subset of $\mathbb{R}^N$, and let $m > 0$ and $0 < \alpha < \beta$ be such that $\alpha\,\mathrm{meas}(\Omega) \le m \le \beta\,\mathrm{meas}(\Omega)$. Let $SBV(\Omega : \{\alpha,\beta\})$ be the subspace of all functions of $SBV(\Omega)$ taking only the two values $\alpha$ and $\beta$. We consider the following problem:
$$\inf\left\{\mathcal{H}^{N-1}(S_u) : u \in SBV(\Omega : \{\alpha,\beta\}),\ \int_\Omega u\,dx = m\right\}.$$
As said above, the thermodynamical model of phase transitions provides an approximation of this problem. Indeed, consider the functional $F_\varepsilon$ defined in $L^1(\Omega)$ by
$$F_\varepsilon(u) = \begin{cases}c_0\displaystyle\int_\Omega\left(\sqrt{\varepsilon}\,|Du|^2 + \frac{1}{\sqrt{\varepsilon}}W(u)\right)dx & \text{if } u \in H^1(\Omega),\ u \ge 0,\ \int_\Omega u\,dx = m,\\ +\infty & \text{otherwise},\end{cases}$$
where
$$c_0 = \left(2\int_\alpha^\beta\sqrt{W(t)}\,dt\right)^{-1}.$$
Here $\varepsilon$ is a positive parameter intended to go to zero. The function $W : [0,+\infty[\ \to \mathbb{R}$ is nonnegative and continuous, with exactly two zeros $\alpha$, $\beta$ ($0 < \alpha < \beta$). The integral functional $c_0^{-1}F_\varepsilon$ is the van der Waals–Cahn–Hilliard energy functional rescaled by the factor $1/\sqrt{\varepsilon}$. The function $W$ is a thermodynamical potential of a liquid with constant mass $m$, confined to a bounded container under isothermal conditions. Its density distribution $u$ presents two phases, $\alpha$ and $\beta$. More precisely, the van der Waals–Cahn–Hilliard energy is the functional
$$u \mapsto \int_\Omega W(u)\,dx + \mathrm{Ind}_{\{\int_\Omega v\,dx = m,\ v \ge 0\}}(u) + \varepsilon\int_\Omega|Du|^2\,dx,$$
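where
$$\mathrm{Ind}_{\{\int_\Omega v\,dx = m,\ v \ge 0\}}(u) = \begin{cases}0 & \text{if } \int_\Omega u(x)\,dx = m,\ u \ge 0,\\ +\infty & \text{otherwise}.\end{cases}$$
The thickness $L$ of the transition layer between the two phases is given by
$$L = \sqrt{\varepsilon}\,\frac{\beta - \alpha}{2\int_\alpha^\beta\sqrt{W(\tau)}\,d\tau}.$$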
Since $L$ is very small, so is $\varepsilon$, and the van der Waals–Cahn–Hilliard free energy is nothing but a perturbation of the Gibbs free energy functional
$$G : u \mapsto \int_\Omega W(u)\,dx + \mathrm{Ind}_{\{\int_\Omega v\,dx = m,\ v \ge 0\}}(u)$$
by the functional $H : u \mapsto \varepsilon\int_\Omega|Du|^2\,dx$. The first model, due to Gibbs, consists in minimizing $G$, and it is unsatisfactory. Indeed, $\mathrm{argmin}(G)$ consists of infinitely many piecewise constant functions $u$. Each takes the value $\alpha$ on an arbitrary subset $A$ of $\Omega$ with measure $(\beta\,\mathrm{meas}(\Omega) - m)/(\beta - \alpha)$ and the value $\beta$ on $\Omega\setminus A$. There is no restriction on the shape of the interface between $[u = \alpha]$ and $[u = \beta]$. In particular, there is no way to recover the physical criterion that the interface has minimal area. This criterion may be recovered by the new model, which consists in minimizing the functional $F_\varepsilon$. Because $\mathrm{argmin}(G)\cap\mathrm{dom}(H) = \emptyset$, this last model is a (viscosity) singular perturbation of the first one. For a general study of viscosity perturbations, consult Attouch [29]. Modica proved in [180] the following result, previously established in the special case $N = 1$ by Gurtin [152].
Theorem 12.4.2. The sequence $(F_\varepsilon)_{\varepsilon>0}$ $\Gamma$-converges to the functional $F$ defined by
$$F(u) = \begin{cases}\mathcal{H}^{N-1}(S_u) & \text{if } u \in SBV(\Omega : \{\alpha,\beta\}) \text{ and } \int_\Omega u\,dx = m,\\ +\infty & \text{otherwise}\end{cases}$$
in $L^1(\Omega)$ equipped with its strong topology. Assume moreover that $W$ has the following polynomial behavior at infinity: there exist $t_0 > 0$, $c_1 > 0$, $c_2 > 0$, $k \ge 2$ such that, for all $t \ge t_0$,
$$c_1 t^k \le W(t) \le c_2 t^k.$$
Then the set $\{u_\varepsilon : \varepsilon \to 0\}$ of minimum points of $F_\varepsilon$ has a compact closure in $L^1(\Omega)$, and any cluster point $u$ is a minimum point of $F$.

Proof. We only prove the lower bound in the definition of $\Gamma$-convergence and establish the compactness result. For a complete proof, consult [180] or the proof of Proposition 12.4.2. We begin by substituting $\varepsilon$ for $\sqrt{\varepsilon}$, and we omit the constant $c_0$ in the definition of $F_\varepsilon$. The expected $\Gamma$-limit is then
$$F(u) = \begin{cases}c_0^{-1}\mathcal{H}^{N-1}(S_u) & \text{if } u \in SBV(\Omega : \{\alpha,\beta\}) \text{ and } \int_\Omega u\,dx = m,\\ +\infty & \text{otherwise},\end{cases}$$
which is actually the asymptotic model of van der Waals–Cahn–Hilliard.

First step. We begin by proving that for all $v$ in $L^1(\Omega)$ and every sequence $(v_\varepsilon)_{\varepsilon>0}$ strongly converging to $v$ in $L^1(\Omega)$, one has
$$\liminf_{\varepsilon\to0}F_\varepsilon(v_\varepsilon) \ge F(v).$$
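The proof given here is based on a general method described in [221]. One may assume, for a subsequence not relabeled, that $\liminf_{\varepsilon\to0}F_\varepsilon(v_\varepsilon) = \lim_{\varepsilon\to0}F_\varepsilon(v_\varepsilon) = C < +\infty$, where $C$ is a nonnegative constant which does not depend on $\varepsilon$. We then deduce, for $\varepsilon$ small enough and up to a change of the constant $C$,
$$v \ge 0, \qquad \int_\Omega v\,dx = m, \qquad \int_\Omega W(v_\varepsilon)\,dx \le C\varepsilon.$$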
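According to the continuity of $W$ and Fatou's lemma (applied along a subsequence converging a.e.), the last inequality yields
$$\int_\Omega W(v)\,dx \le \liminf_{\varepsilon\to0}\int_\Omega W(v_\varepsilon)\,dx \le 0,$$
so that $W(v(x)) = 0$ a.e. and $v$ takes only the two values $\alpha$ and $\beta$. Since truncations operate on $H^1(\Omega)$, $v$ is also the strong limit of the truncated functions $\tilde v_\varepsilon = \alpha\vee v_\varepsilon\wedge\beta$. Moreover, $W$ vanishes at $\alpha$ and $\beta$, where it attains its minimum, so $F_\varepsilon(v_\varepsilon) \ge F_\varepsilon(\tilde v_\varepsilon)$. In view of these remarks, we keep the same notation and replace $v_\varepsilon$ by $\tilde v_\varepsilon$. The elementary Young inequality yields
$$F_\varepsilon(v_\varepsilon) \ge 2\left(\int_\Omega W(v_\varepsilon)\,dx\right)^{1/2}\left(\int_\Omega|Dv_\varepsilon|^2\,dx\right)^{1/2}.$$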
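This estimate is optimal. To see this, fix $u$ in $H^1(\Omega)$ and study the map $\varepsilon \mapsto F_\varepsilon(u)$. Its minimum point is
$$\varepsilon = \left(\frac{\int_\Omega W(u)\,dx}{\int_\Omega|Du|^2\,dx}\right)^{1/2},$$
and its minimal value is
$$2\left(\int_\Omega W(u)\,dx\right)^{1/2}\left(\int_\Omega|Du|^2\,dx\right)^{1/2}.$$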
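The one-parameter minimization behind this remark is elementary. For $A, B > 0$, the function $\varepsilon A + B/\varepsilon$ is minimal at $\varepsilon = (B/A)^{1/2}$, where it equals $2(AB)^{1/2}$. The sketch below checks this on a grid. It takes $A = \int|Du|^2$ and $B = \int W(u)$ computed from a fixed, hypothetical profile $u$ with the double well $W(t) = (t-1)^2(2-t)^2$. As in the proof, $\sqrt{\varepsilon}$ has been replaced by $\varepsilon$.

```python
import numpy as np


def phase_field_terms(u, dx, alpha=1.0, beta=2.0):
    """Return (A, B) = (int |u'|^2 dx, int W(u) dx) for grid values u.

    W(t) = (t - alpha)^2 (beta - t)^2 is a hypothetical double-well choice;
    derivatives are forward differences and integrals are Riemann sums.
    """
    du = np.diff(u) / dx
    w = (u - alpha) ** 2 * (beta - u) ** 2
    return float(np.sum(du ** 2) * dx), float(np.sum(w) * dx)


x = np.linspace(-1.0, 1.0, 4001)
dx = x[1] - x[0]
u = 1.5 + 0.5 * np.tanh(x / 0.2)     # a fixed (hypothetical) H^1 function
A, B = phase_field_terms(u, dx)

eps_star = np.sqrt(B / A)            # minimizer of eps -> eps * A + B / eps
bound = 2.0 * np.sqrt(A * B)         # minimal value 2 (A B)^(1/2)
eps_grid = np.logspace(-4, 1, 2001)
values = eps_grid * A + B / eps_grid
print(eps_star, bound, values.min())
```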
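By the Cauchy–Schwarz inequality we obtain
$$\liminf_{\varepsilon\to0}F_\varepsilon(v_\varepsilon) \ge 2\liminf_{\varepsilon\to0}\int_\Omega\sqrt{W(v_\varepsilon)}\,|Dv_\varepsilon|\,dx = 2\liminf_{\varepsilon\to0}\int_\Omega\big|D\big(\psi(v_\varepsilon)\big)\big|\,dx,$$
where $\psi(t) = \int_\alpha^t\sqrt{W(s)}\,ds$. We have $v_\varepsilon \to v$ strongly in $L^1(\Omega)$ and $\alpha \le v_\varepsilon \le \beta$. Moreover, $\psi$ is continuous, indeed Lipschitz on $[\alpha,\beta]$. Hence $\psi(v_\varepsilon)$ strongly converges to $\psi(v)$ in $L^1(\Omega)$. According to Proposition 10.1.1, we finally deduce that $\psi(v)$ belongs to $BV(\Omega)$ and
$$\liminf_{\varepsilon\to0}F_\varepsilon(v_\varepsilon) \ge 2\int_\Omega|D\psi(v)|. \qquad (12.25)$$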
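Let us compute this last integral. Since
$$\psi(v) = \begin{cases}0 & \text{on } [v = \alpha],\\ \displaystyle\int_\alpha^\beta\sqrt{W(s)}\,ds & \text{on } [v = \beta],\end{cases}$$
the function $\psi(v)$ is a simple function of $BV(\Omega)$ and
$$\int_\Omega|D\psi(v)| = \left(\int_\alpha^\beta\sqrt{W(s)}\,ds\right)\mathcal{H}^{N-1}(S_v).$$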
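The constant $2\int_\alpha^\beta\sqrt{W}$ is attained by an explicit transition. The sketch below uses the hypothetical double well $W(t) = (t-\alpha)^2(\beta-t)^2$, for which $\int_\alpha^\beta\sqrt{W} = (\beta-\alpha)^3/6$ and hence $c_0 = 3/(\beta-\alpha)^3$. It computes $c_0^{-1} = 2\int_\alpha^\beta\sqrt{W}$ by Simpson's rule. It then compares this value with the one-dimensional energy $\int(\varepsilon|u'|^2 + W(u)/\varepsilon)\,dx$ of the transition solving $\varepsilon u' = \sqrt{W(u)}$. For this profile, Young's and the Cauchy–Schwarz inequalities above become equalities. A linear ramp of any width gives a strictly larger energy. This is a numerical illustration under these assumptions, not part of the proof.

```python
import numpy as np

ALPHA, BETA = 1.0, 2.0   # hypothetical wells, W(t) = (t - ALPHA)^2 (BETA - t)^2


def W(t):
    return (t - ALPHA) ** 2 * (BETA - t) ** 2


def simpson(f, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    t = np.linspace(a, b, n + 1)
    y = f(t)
    h = (b - a) / n
    return float(h / 3.0 * (y[0] + y[-1] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum()))


surface_tension = 2.0 * simpson(lambda t: np.sqrt(W(t)), ALPHA, BETA)  # = 1 / c0
c0 = 1.0 / surface_tension


def profile_energy(eps, half_width=1.0, m=200001):
    """Energy int (eps |u'|^2 + W(u) / eps) dx of the optimal 1D transition.

    u solves eps u' = sqrt(W(u)): a logistic profile from ALPHA to BETA.
    """
    x = np.linspace(-half_width, half_width, m)
    d = BETA - ALPHA
    u = ALPHA + d / (1.0 + np.exp(-d * x / eps))
    du = (u - ALPHA) * (BETA - u) / eps      # exact derivative sqrt(W(u)) / eps
    density = eps * du ** 2 + W(u) / eps
    return float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(x)))


def ramp_energy(eps, width):
    """Same energy for a linear ramp of the given width (closed form)."""
    d = BETA - ALPHA
    return eps * d ** 2 / width + width * d ** 4 / (30.0 * eps)


eps = 0.01
print(surface_tension, profile_energy(eps))                         # both close to 1/3
print(min(ramp_energy(eps, w) for w in np.linspace(0.01, 0.2, 200)))  # larger
```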
Inequality (12.25) finally gives
$$\liminf_{\varepsilon\to0}F_\varepsilon(v_\varepsilon) \ge 2\left(\int_\alpha^\beta\sqrt{W(s)}\,ds\right)\mathcal{H}^{N-1}(S_v) = F(v),$$
concluding the first step.

Second step. Let us now establish the relative compactness in $L^1(\Omega)$ of the set $\{u_\varepsilon : \varepsilon \to 0\}$ of minimum points of $F_\varepsilon$. The letter $C$ will denote various positive constants. Consider $v_\varepsilon = \psi\circ u_\varepsilon$, where $\psi$ is the primitive of the function $W^{1/2}$ defined above. Let us first prove the relative compactness of the set $\{v_\varepsilon : \varepsilon \to 0\}$. From the polynomial behavior of $W$ and the fact that $k/2 + 1 \le k$, we have, for all $t \ge t_0$,
$$\psi(t) \le \int_\alpha^{t_0}W^{1/2}(s)\,ds + \int_{t_0}^t W^{1/2}(s)\,ds \le C(1 + W(t)),$$
which yields
$$\int_\Omega v_\varepsilon\,dx \le C\left(1 + \sqrt{\varepsilon}\,F_\varepsilon(u_\varepsilon)\right)$$
and gives the boundedness of $v_\varepsilon$ in $L^1(\Omega)$. On the other hand, the proof of the lower bound above gives
$$\int_\Omega|Dv_\varepsilon|\,dx \le \frac{1}{2}F_\varepsilon(u_\varepsilon).$$
The values $F_\varepsilon(u_\varepsilon)$ are bounded, since $u_\varepsilon$ minimizes $F_\varepsilon$. Hence $v_\varepsilon$ is bounded in $BV(\Omega)$. The relative compactness of $\{v_\varepsilon : \varepsilon \to 0\}$ is a consequence of the compactness of the embedding $BV(\Omega) \hookrightarrow L^1(\Omega)$.

Let us now go back to the functions $u_\varepsilon$. Let $v$ be a strong limit in $L^1(\Omega)$ of a nonrelabeled subsequence of $(v_\varepsilon)$, consider the inverse function $\psi^{-1}$ of $\psi$, and set $u = \psi^{-1}\circ v$. We establish the strong convergence of $u_\varepsilon$ to $u$ in $L^1(\Omega)$. To this end, we prove the equi-integrability of $(u_\varepsilon)$ and the convergence in measure of $u_\varepsilon$ to $u$ (see, for instance, Marle [178]). From the polynomial behavior of $W$, we have
$$\int_\Omega|u_\varepsilon|^k\,dx \le t_0^k\,\mathrm{meas}(\Omega) + C\int_\Omega W(u_\varepsilon)\,dx \le C\left(1 + \sqrt{\varepsilon}\,F_\varepsilon(u_\varepsilon)\right) \le C,$$
and equi-integrability follows from $k \ge 2$. On the other hand, $\psi'(t) = W^{1/2}(t) \ge c_1^{1/2}t_0^{k/2}$ for all $t > t_0$. Hence $\psi^{-1}$ is a Lipschitz function on $[\psi(t_0), +\infty)$, and therefore uniformly continuous on $\mathbb{R}_+$. Consequently, $u_\varepsilon$ converges in measure to $u$.

For a numerical approach, it now suffices to establish the $\Gamma$-convergence of the finite element discretization $F_{\varepsilon,h(\varepsilon)}$ of $F_\varepsilon$ to the functional $F$, for a suitable choice of the discretization size $h$. For this study, consult Bellettini [61].
12.4.3 Variational approximation of the Mumford–Shah energy functional
When the term $u \mapsto \int_\Omega|u - g|^2\,dx$ is neglected in the Mumford–Shah functional, a natural domain that still controls the jumps of admissible functions is the space $GSBV(\Omega)$ of generalized special functions of bounded variation, defined by
$$GSBV(\Omega) := \left\{u : \Omega \to \mathbb{R} : u \text{ Borel function},\ k\wedge u\vee(-k) \in SBV(\Omega)\ \forall k \in \mathbb{N}\right\}.$$
It can be shown (see Ambrosio–Tortorelli [24]) that to each function $u \in GSBV(\Omega)$ there correspond a Borel function $\nabla u : \Omega \to \mathbb{R}^N$ and a set $S_u \subset \Omega$ with the following properties: $\nabla u = \nabla(k\wedge u\vee(-k))$ a.e. on $[|u| \le k]$ for all $k \in \mathbb{N}$, and $\mathcal{H}^{N-1}(S_{k\wedge u\vee(-k)}) \to \mathcal{H}^{N-1}(S_u)$ as $k \to +\infty$. Following the strategy of the previous subsection, as in Ambrosio–Tortorelli [24], we work on $X := L^1(\Omega)\times L^1(\Omega,[0,1])$, equipped with its strong topology. We establish that the functional $F$ defined on $X$ by
$$F(u,s) = \begin{cases}\displaystyle\int_\Omega|\nabla u|^2\,dx + \mathcal{H}^{N-1}(S_u) & \text{if } u \in GSBV(\Omega) \text{ and } s = 1,\\ +\infty & \text{otherwise}\end{cases}$$
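can be approximated, in the sense of $\Gamma$-convergence, by the functionals defined on $X$ by
$$F_\varepsilon(u,s) = \begin{cases}\displaystyle\int_\Omega(s^2 + \varepsilon^2)|\nabla u|^2\,dx + M_\varepsilon(s,\Omega) & \text{if } (u,s) \in \big(C^1(\Omega)\times C^1(\Omega,[0,1])\big)\cap X,\\ +\infty & \text{otherwise}.\end{cases}$$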
For all open subsets $A$ of $\Omega$ and all $s$ in $C^1(\Omega,[0,1])$, $M_\varepsilon(\cdot, A)$ denotes the integral functional
$$M_\varepsilon(s,A) := \int_A\left(\varepsilon|\nabla s|^2 + \frac{1}{4\varepsilon}(1 - s)^2\right)dx.$$
As we will see in the proof, the second argument $s$ acts as a control parameter on the gradient. The approximation of the Mumford–Shah energy will be the functional $G_\varepsilon$ defined by
$$G_\varepsilon(u,s) = F_\varepsilon(u,s) + \int_\Omega|u - g|^2\,dx.$$
Indeed, $u \mapsto \int_\Omega|u - g|^2\,dx$ is a continuous perturbation of $F_\varepsilon$, and the conclusion will follow from Theorem 12.1.1(ii). We assume that $\Omega$ satisfies the following “reflection condition” (R) on $\partial\Omega$: there exist an open neighborhood $U$ of $\partial\Omega$ in $\mathbb{R}^N$ and a one-to-one Lipschitz function $\varphi : U\cap\Omega \to U\setminus\overline{\Omega}$ such that $\varphi^{-1}$ is Lipschitz.

Theorem 12.4.3. Assume that $\Omega$ satisfies condition (R). Then the sequence of functionals $(F_\varepsilon)_{\varepsilon>0}$ $\Gamma$-converges to the functional $F$.

The proof is split into Propositions 12.4.1 and 12.4.2. We denote the strong topology of $L^1(\Omega)\times L^1(\Omega,[0,1])$ by $\tau$, and the letter $C$ will denote various positive constants which do not depend on $\varepsilon$. Condition (R) is not needed for the lower bound in Proposition 12.4.1.

Proposition 12.4.1. For all $(u,s) \in X$ and all sequences $((u_\varepsilon, s_\varepsilon))_\varepsilon$ $\tau$-converging to $(u,s)$, we have $F(u,s) \le \liminf_{\varepsilon\to0}F_\varepsilon(u_\varepsilon,s_\varepsilon)$, or equivalently, $F \le \Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon$.

Proof. Obviously, one may assume $\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon(u,s) < +\infty$, and then $s = 1$; otherwise the term $\frac{1}{4\varepsilon}\int_\Omega(1 - s_\varepsilon)^2\,dx$ would blow up.

First step. We assume $u \in L^\infty(\Omega)$ and establish the proposition in the one-dimensional case $N = 1$ when $\Omega$ is a bounded interval $I$ of $\mathbb{R}$. When $\Omega$ is not an interval, it suffices to argue on each connected component of $\Omega$ and to conclude by the superadditivity of $\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon$. When working on a bounded open subset $A$ of $\mathbb{R}$, we denote $F$ and $\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon$ by $F(\cdot,A)$ and $\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon(\cdot,A)$, respectively. The key point of the proof is the following lemma.

Lemma 12.4.1. Assume $u$ in $L^\infty(I)$ and fix $x_0 \in I$.
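(i) If there exists $\eta > 0$ such that $u \notin W^{1,2}(B_\rho(x_0))$ for all $\rho < \eta$, then, for all $\rho < \eta$,
$$\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon((u,1), B_\rho(x_0)) \ge 1.$$
(ii) If there exists $\rho > 0$ such that $u \in W^{1,2}(B_\rho(x_0))$, then
$$\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon((u,1), B_\rho(x_0)) \ge \int_{B_\rho(x_0)}|\nabla u|^2\,dx.$$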
Assume for the moment that Lemma 12.4.1 is established. We claim that the set
$$E := \{x \in I : \exists\eta > 0\ \forall\rho < \eta,\ u \notin W^{1,2}(B_\rho(x))\}$$
is finite. Otherwise, $E$ would contain an infinite countable subset $D = \{x_i : i \in \mathbb{N}\}$. Take $n$ in $\mathbb{N}$ and $\rho$ small enough that the balls $B_\rho(x_i)$, $i = 0,\dots,n$, are pairwise disjoint. By (i) and the superadditivity and monotonicity of $A \mapsto \Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon(\cdot,A)$, we would have
$$+\infty > \Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon((u,1), I) \ge \sum_{i=0}^n\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon((u,1), B_\rho(x_i)) \ge n.$$
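This being true for all $n \in \mathbb{N}$, we obtain a contradiction. The set $E$ is thus made up of a finite number of points $x_0,\dots,x_n$, and it is easily seen that $u \in W^{1,2}(I\setminus E)$. From $\mathcal{H}^0(E) < +\infty$, we deduce $u \in SBV(I)$ and $E = S_u$. For $\rho$ small enough as previously, (i) and (ii) of Lemma 12.4.1 give
$$\begin{aligned}
\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon((u,1), I) &\ge \Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon\Big((u,1), I\setminus\bigcup_{x\in S_u}B_\rho(x)\Big) + \Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon\Big((u,1), \bigcup_{x\in S_u}B_\rho(x)\Big)\\
&\ge \int_{I\setminus\bigcup_{x\in S_u}B_\rho(x)}|\nabla u|^2\,dx + \mathcal{H}^0(S_u).
\end{aligned}$$
We conclude the step by letting $\rho \to 0$ in the above inequality.

It remains to establish assertions (i) and (ii) of Lemma 12.4.1. Let $(u_\varepsilon, s_\varepsilon) \in X\cap\big(C^1(I)\times C^1(I,[0,1])\big)$ be $\tau$-converging to $(u,1)$ and satisfying
$$\liminf_{\varepsilon\to0}F_\varepsilon((u_\varepsilon,s_\varepsilon), B_\rho(x_0)) < +\infty.$$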
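To prove (i), we establish the existence of $x'_\varepsilon$, $x_\varepsilon$, $x''_\varepsilon$ in $B_\rho(x_0)$ such that $x'_\varepsilon < x_\varepsilon < x''_\varepsilon$ and, for a nonrelabeled subsequence,
$$\lim_{\varepsilon\to0}s_\varepsilon(x_\varepsilon) = 0, \qquad \lim_{\varepsilon\to0}s_\varepsilon(x'_\varepsilon) = \lim_{\varepsilon\to0}s_\varepsilon(x''_\varepsilon) = 1.$$
Let us assume this result for the moment. We conclude as follows: by the convexity inequality $a^2 + b^2 \ge 2ab$,
$$\begin{aligned}
F_\varepsilon((u_\varepsilon,s_\varepsilon), B_\rho(x_0)) \ge M_\varepsilon(s_\varepsilon, B_\rho(x_0)) &\ge \int_{B_\rho(x_0)}(1 - s_\varepsilon)|\nabla s_\varepsilon|\,dx \ge \int_{x'_\varepsilon}^{x_\varepsilon}(1 - s_\varepsilon)|\nabla s_\varepsilon|\,dx + \int_{x_\varepsilon}^{x''_\varepsilon}(1 - s_\varepsilon)|\nabla s_\varepsilon|\,dx\\
&\ge \left|\int_{x'_\varepsilon}^{x_\varepsilon}(1 - s_\varepsilon)\nabla s_\varepsilon\,dx\right| + \left|\int_{x_\varepsilon}^{x''_\varepsilon}(1 - s_\varepsilon)\nabla s_\varepsilon\,dx\right|\\
&= \left|\left[-\frac{(1 - s_\varepsilon)^2}{2}\right]_{x'_\varepsilon}^{x_\varepsilon}\right| + \left|\left[-\frac{(1 - s_\varepsilon)^2}{2}\right]_{x_\varepsilon}^{x''_\varepsilon}\right|,
\end{aligned}$$
which tends to $\frac12 + \frac12 = 1$ as $\varepsilon \to 0$.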
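The mechanism of (i) survives discretization. The sketch below works with hypothetical grid profiles and forward differences. Applying $a^2 + b^2 \ge 2ab$ term by term gives $M_\varepsilon \ge \sum(1 - \bar s_i)|s_{i+1} - s_i|$, where $\bar s_i$ is the midpoint value. Moreover, $(1 - \bar s_i)(s_{i+1} - s_i) = \frac12[(1 - s_i)^2 - (1 - s_{i+1})^2]$ exactly. So this sum is at least 1 for any profile going from 1 to 0 and back to 1, whatever $\varepsilon$. The exponential profile $1 - e^{-|x|/(2\varepsilon)}$ nearly attains the bound.

```python
import numpy as np


def discrete_modica_mortola(s, h, eps):
    """Discrete M_eps(s, I) = sum h * (eps |s'|^2 + (1 - s)^2 / (4 eps)).

    Forward differences for s'; midpoint values of s in the potential term.
    """
    ds = np.diff(s) / h
    s_mid = 0.5 * (s[1:] + s[:-1])
    return float(np.sum(h * (eps * ds ** 2 + (1.0 - s_mid) ** 2 / (4.0 * eps))))


def transition_lower_bound(s):
    """sum (1 - s_mid) |Delta s|; it telescopes to >= 1 if s goes 1 -> 0 -> 1."""
    s_mid = 0.5 * (s[1:] + s[:-1])
    return float(np.sum((1.0 - s_mid) * np.abs(np.diff(s))))


rng = np.random.default_rng(0)
n = 2001
h = 2.0 / (n - 1)
s_rand = rng.uniform(0.0, 1.0, n)   # arbitrary (hypothetical) profile ...
s_rand[0] = s_rand[-1] = 1.0        # ... equal to 1 at both ends
s_rand[n // 2] = 0.0                # ... and vanishing somewhere inside
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, transition_lower_bound(s_rand), discrete_modica_mortola(s_rand, h, eps))

# Near-optimal profile 1 - exp(-|x| / (2 eps)): M_eps is close to 1.
eps = 1e-2
x = np.linspace(-1.0, 1.0, 20001)
s_opt = 1.0 - np.exp(-np.abs(x) / (2.0 * eps))
print(discrete_modica_mortola(s_opt, x[1] - x[0], eps))
```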
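We now establish the existence of $x'_\varepsilon$, $x_\varepsilon$, and $x''_\varepsilon$. In what follows, we argue with various nonrelabeled subsequences, and $C$ denotes various positive constants independent of $\varepsilon$. Let $\sigma < \rho$ and set $m_\varepsilon := \inf_{B_\sigma(x_0)}s_\varepsilon$. From $F_\varepsilon((u_\varepsilon,s_\varepsilon), B_\rho(x_0)) \le C$, we derive
$$m_\varepsilon^2\int_{B_\sigma(x_0)}|\nabla u_\varepsilon|^2\,dx \le C.$$
Up to a subsequence, $m_\varepsilon$ converges to some $l$, $0 \le l \le 1$. We claim that $l = 0$. Otherwise,
$$\limsup_{\varepsilon\to0}\int_{B_\sigma(x_0)}|\nabla u_\varepsilon|^2\,dx \le \frac{C}{l^2},$$
and $u_\varepsilon$ would weakly converge to $u$ in $W^{1,2}(B_\sigma(x_0))$. This contradicts $u \notin W^{1,2}(B_\rho(x_0))$ for all $\rho < \eta$. Consequently, there exists $x_\varepsilon \in B_\sigma(x_0)$ satisfying $\lim_{\varepsilon\to0}s_\varepsilon(x_\varepsilon) = \lim_{\varepsilon\to0}m_\varepsilon = 0$. On the other hand, consider the estimates
$$\int_{x_0-\rho}^{x_0-\sigma}\frac{(1 - s_\varepsilon)^2}{4\varepsilon}\,dx \le C, \qquad \int_{x_0+\sigma}^{x_0+\rho}\frac{(1 - s_\varepsilon)^2}{4\varepsilon}\,dx \le C.$$
Together with the mean value theorem, they yield, for a subsequence, $x'_\varepsilon$ and $x''_\varepsilon$ satisfying the required assertions.

Let us show (ii). Since
$$\int_{B_\rho(x_0)}s_\varepsilon^2|\nabla u_\varepsilon|^2\,dx \le F_\varepsilon((u_\varepsilon,s_\varepsilon), B_\rho(x_0)),$$
it is enough to establish the inequality
$$\int_{B_\rho(x_0)}|\nabla u|^2\,dx \le \liminf_{\varepsilon\to0}\int_{B_\rho(x_0)}s_\varepsilon^2|\nabla u_\varepsilon|^2\,dx$$
when $F_\varepsilon(u_\varepsilon,s_\varepsilon)$ is equibounded. Set $v_\varepsilon = (1 - s_\varepsilon)^2$. The equiboundedness of $\nabla v_\varepsilon$ in $L^1(I)$ will provide the following uniform control on $v_\varepsilon$. For all $\delta > 0$, there exists a finite subset $J_\delta$ of $I$ such that, for every compact set $K \subset I\setminus J_\delta$,
$$\limsup_{\varepsilon\to0}\sup_K v_\varepsilon < \delta. \qquad (12.26)$$
We will then deduce $\liminf_{\varepsilon\to0}\inf_K s_\varepsilon > 1 - \delta^{1/2}$. Let us assume estimate (12.26) for the moment. For all $\delta > 0$ and every compact set $K \subset I\setminus J_\delta$, we have
$$C \ge \int_{B_\rho(x_0)}s_\varepsilon^2|\nabla u_\varepsilon|^2\,dx \ge \int_{B_\rho(x_0)\cap K}s_\varepsilon^2|\nabla u_\varepsilon|^2\,dx \ge \Big(\inf_K s_\varepsilon\Big)^2\int_{B_\rho(x_0)\cap K}|\nabla u_\varepsilon|^2\,dx.$$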
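Therefore
$$C \ge \liminf_{\varepsilon\to0}\int_{B_\rho(x_0)}s_\varepsilon^2|\nabla u_\varepsilon|^2\,dx \ge \big(1 - \delta^{1/2}\big)^2\liminf_{\varepsilon\to0}\int_{B_\rho(x_0)\cap K}|\nabla u_\varepsilon|^2\,dx,$$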
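and the weak convergence of $u_\varepsilon$ to $u$ in $W^{1,2}(B_\rho(x_0)\cap K)$ yields, by lower semicontinuity,
$$\liminf_{\varepsilon\to0}\int_{B_\rho(x_0)}s_\varepsilon^2|\nabla u_\varepsilon|^2\,dx \ge \big(1 - \delta^{1/2}\big)^2\int_{B_\rho(x_0)\cap K}|\nabla u|^2\,dx.$$
Conclusion (ii) follows after letting $K$ increase to $I\setminus J_\delta$ and then $\delta \to 0$.

We now establish (12.26). We claim that $\sup_\varepsilon\int_I|\nabla v_\varepsilon|\,dx < +\infty$. Indeed, by convexity, for every $\varepsilon$,
$$C \ge 2M_\varepsilon(s_\varepsilon, I) \ge 2\int_I(1 - s_\varepsilon)|\nabla s_\varepsilon|\,dx = \int_I|\nabla v_\varepsilon|\,dx.$$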
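Let $0 < \sigma < \delta$ and consider, for all $t$ in $\mathbb{R}$, the sets $A^t_\varepsilon := [v_\varepsilon \le t]$. According to the classical coarea formula, more precisely to Corollary 4.2.2, we have
$$C \ge \int_I|\nabla v_\varepsilon|\,dx = \int_{-\infty}^{+\infty}\mathcal{H}^0([v_\varepsilon = t])\,dt \ge \int_\sigma^\delta\mathcal{H}^0([v_\varepsilon = t])\,dt.$$
Therefore, there exists $t_\varepsilon \in\ ]\sigma,\delta[$ such that $\mathcal{H}^0([v_\varepsilon = t_\varepsilon]) \le \frac{C}{\delta - \sigma}$. The set $A^{t_\varepsilon}_\varepsilon$ then has at most $k = \left[\frac{C}{\delta - \sigma}\right]$ connected components, with $k$ independent of $\varepsilon$. More precisely, there exists a family $(I^i_\varepsilon)_{i=1,\dots,k}$ of intervals (possibly empty) such that $A^{t_\varepsilon}_\varepsilon = \bigcup_{i=1}^k I^i_\varepsilon$. For every $i = 1,\dots,k$, consider the interval
$$I^i_\infty := \bigcup_N\bigcap_{n\ge N}I^i_{\varepsilon_n}.$$
The complement of the union $I_\infty := \bigcup_{i=1}^k I^i_\infty$ of these $k$ intervals is the required finite subset $J_\delta$ of $I$. Indeed, since $v_\varepsilon$ converges a.e. to zero,
$$\mathrm{meas}(I_\infty) = \mathrm{meas}\Big(\bigcup_{i=1}^k\bigcup_N\bigcap_{n\ge N}I^i_{\varepsilon_n}\Big) = \mathrm{meas}\Big(\bigcup_N\bigcap_{n\ge N}A^{t_{\varepsilon_n}}_{\varepsilon_n}\Big) \ge \mathrm{meas}\Big(\bigcup_N\bigcap_{n\ge N}[v_{\varepsilon_n} \le \sigma]\Big) = \mathrm{meas}(I).$$
Hence $J_\delta = I\setminus I_\infty$, the complement in $I$ of $k$ intervals whose union has full measure, is a finite set. Finally, let $K$ be a compact set included in $I\setminus J_\delta$. Arguing on each interval $I^i_\infty$, we have $K\cap\mathring{I}^i_\infty \subset\subset \mathring{I}^i_\infty$ and, for $N$ large enough,
$$K\cap\mathring{I}^i_\infty \subset \bigcap_{n\ge N}I^i_{\varepsilon_n} \subset \bigcap_{n\ge N}[v_{\varepsilon_n} \le \delta].$$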
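In one dimension $\mathcal{H}^0([v = t])$ simply counts the points where $v$ crosses the level $t$. The coarea formula then says that the total variation equals the integral over $t$ of the number of crossings. The sketch below uses a hypothetical sample $v$ on a grid, interpolated piecewise affinely. It checks this identity and reproduces the mean-value choice of a level $t \in\ ]\sigma,\delta[$ with at most $\int|\nabla v|/(\delta - \sigma)$ crossings.

```python
import numpy as np


def crossing_counts(v, levels):
    """H^0([v = t]) for the piecewise-affine interpolant of v, at each level t.

    A level t distinct from the nodal values is crossed once on each segment
    whose endpoint values strictly enclose it.
    """
    lo = np.minimum(v[:-1], v[1:])[:, None]
    hi = np.maximum(v[:-1], v[1:])[:, None]
    return np.sum((lo < levels) & (levels < hi), axis=0)


def coarea_integral(v, n_levels=10000):
    """Midpoint approximation of int H^0([v = t]) dt over [min v, max v]."""
    a, b = float(v.min()), float(v.max())
    dt = (b - a) / n_levels
    levels = a + (np.arange(n_levels) + 0.5) * dt
    return float(np.sum(crossing_counts(v, levels)) * dt)


def good_level(v, sigma, delta, n_levels=2000):
    """A level t in ]sigma, delta[ with the fewest crossings."""
    levels = sigma + (np.arange(n_levels) + 0.5) * (delta - sigma) / n_levels
    counts = crossing_counts(v, levels)
    i = int(np.argmin(counts))
    return float(levels[i]), int(counts[i])


x = np.linspace(0.0, 2.0 * np.pi, 401)
v = np.sin(x) ** 2                  # hypothetical sample playing the role of v_eps
total_variation = float(np.sum(np.abs(np.diff(v))))
print(total_variation, coarea_integral(v))       # both close to 4
t, count = good_level(v, 0.2, 0.5)
print(t, count, total_variation / (0.5 - 0.2))   # count <= TV / (delta - sigma)
```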
Second step. We establish Proposition 12.4.1 in the $N$-dimensional case, $N > 1$. We use the same notation for the functionals in the one-dimensional and in the $N$-dimensional case.
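We begin by assuming $u \in L^\infty(\Omega)$. Let $(u_\varepsilon, s_\varepsilon)$ be a sequence in $X$ converging to $(u,1)$ such that $\liminf_{\varepsilon\to0}F_\varepsilon((u_\varepsilon,s_\varepsilon),\Omega) < +\infty$, and let $A$ be any open subset of $\Omega$. We use the notation and definitions of Theorem 10.5.2. For all $\nu \in S^{N-1}$ and for a subsequence not relabeled, $(u_{\varepsilon,x}, s_{\varepsilon,x})$ strongly converges to $(u_x, 1)$ in $L^1(A_x)\times L^1(A_x,[0,1])$ for $\mathcal{H}^{N-1}$-a.e. $x$ in $A_\nu$; this is an easy consequence of Fubini's theorem. On the other hand, consider the slices. Their one-dimensional energies involve only the derivatives in the direction $\nu$, so Fatou's lemma gives
$$\int_{A_\nu}\liminf_{\varepsilon\to0}F_\varepsilon((u_{\varepsilon,x}, s_{\varepsilon,x}), A_x)\,d\mathcal{H}^{N-1} \le \liminf_{\varepsilon\to0}\int_{A_\nu}F_\varepsilon((u_{\varepsilon,x}, s_{\varepsilon,x}), A_x)\,d\mathcal{H}^{N-1} \le \liminf_{\varepsilon\to0}F_\varepsilon((u_\varepsilon,s_\varepsilon), A) < +\infty.$$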
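Thus, for $\mathcal{H}^{N-1}$-a.e. $x$ in $A_\nu$, $\liminf_{\varepsilon\to0}F_\varepsilon((u_{\varepsilon,x}, s_{\varepsilon,x}), A_x) < +\infty$, and we may apply the result of the first step. For $\mathcal{H}^{N-1}$-a.e. $x$ in $A_\nu$, $u_x$ belongs to $SBV(A_x)\cap L^\infty(A_x)$ and
$$\liminf_{\varepsilon\to0}F_\varepsilon((u_{\varepsilon,x}, s_{\varepsilon,x}), A_x) \ge \int_{A_x}|\nabla u_x|^2\,dt + \mathcal{H}^0(S_{u_x}\cap A_x).$$
Integrating this inequality over $A_\nu$ and using Theorem 10.5.2, we deduce that, for every open subset $A$ of $\Omega$, $u \in SBV(A)$ and
$$\begin{aligned}
\liminf_{\varepsilon\to0}F_\varepsilon((u_\varepsilon,s_\varepsilon), A) &\ge \int_{A_\nu}\liminf_{\varepsilon\to0}F_\varepsilon((u_{\varepsilon,x}, s_{\varepsilon,x}), A_x)\,d\mathcal{H}^{N-1}\\
&\ge \int_{A_\nu}\left(\int_{A_x}|\nabla u_x|^2\,dt\right)d\mathcal{H}^{N-1}(x) + \int_{A_\nu}\mathcal{H}^0(S_{u_x}\cap A_x)\,d\mathcal{H}^{N-1}(x)\\
&= \int_A|\nabla u\cdot\nu|^2\,dx + \int_{A\cap S_u}|\nu_u\cdot\nu|\,d\mathcal{H}^{N-1}.
\end{aligned}$$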
We conclude by Lemma 4.2.2 and Example 4.2.2. If $u$ is not assumed to belong to $L^\infty(\Omega)$, a truncation argument gives
$$\Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon((u,s),\Omega) \ge \Gamma\text{-}\liminf_{\varepsilon\to0}F_\varepsilon\big((N\wedge u\vee(-N), s),\Omega\big) \ge F\big((N\wedge u\vee(-N), s),\Omega\big).$$
Letting $N \to +\infty$ gives the thesis.

Proposition 12.4.2. For all $(u,s) \in X$ there exists a sequence $((u_\varepsilon, s_\varepsilon))_\varepsilon$ $\tau$-converging to $(u,s)$ such that $F(u,s) \ge \limsup_{\varepsilon\to0}F_\varepsilon(u_\varepsilon,s_\varepsilon)$, or, equivalently, $F \ge \Gamma\text{-}\limsup_{\varepsilon\to0}F_\varepsilon$.

Proof. One may assume $u \in SBV(\Omega)\cap L^\infty(\Omega)$; indeed, if $u \in GSBV(\Omega)$, an easy truncation argument gives the thesis. Fix $u \in SBV(\Omega)\cap L^\infty(\Omega)$. It suffices to construct $(u_\varepsilon,s_\varepsilon)$ in $\big(H^1(\Omega)\times H^1(\Omega,[0,1])\big)\cap X$, $\tau$-converging to $(u,1)$ in $X$, with $F(u,1) \ge \limsup_{\varepsilon\to0}F_\varepsilon(u_\varepsilon,s_\varepsilon)$. Indeed, the expression of $F_\varepsilon$ makes sense on $\big(H^1(\Omega)\times H^1(\Omega,[0,1])\big)\cap X$. Moreover, $C^1(\Omega)\times C^1(\Omega,[0,1])$ is dense in $H^1(\Omega)\times H^1(\Omega,[0,1])$ equipped with its strong topology. Finally, $F_\varepsilon$ is continuous for this topology, which is stronger than $\tau$. The conclusion will follow by a diagonalization argument.
First step. Let $a_\varepsilon$, $b_\varepsilon$, $c_\varepsilon$ be three sequences in $\mathbb{R}_+$ going to zero, to be suitably adjusted later. The idea consists in modifying $(u,1)$ in a neighborhood of $S_u$ so as to extract an equivalent of $\mathcal{H}^{N-1}(S_u)$ from the expression of $F_\varepsilon(u_\varepsilon,s_\varepsilon)$. We begin by assuming the following regularity condition on $S_u$:
$$\lim_{\rho\to0}\frac{\mathrm{meas}(\Omega\cap(S_u)_\rho)}{2\rho} = \mathcal{H}^{N-1}(S_u), \qquad (12.27)$$
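where $(S_u)_\rho$ is the tubular neighborhood $\{x \in \mathbb{R}^N : d(x,S_u) < \rho\}$ of order $\rho$ of $S_u$. In what follows, for any $t > 0$, $(S_u)_t$ denotes the tubular neighborhood of order $t$ of $S_u$. We construct $u_\varepsilon$ in $H^1(\Omega)$ satisfying $u_\varepsilon = u$ in $\Omega\setminus(S_u)_{a_\varepsilon}$ and such that its gradient satisfies
$$|\nabla u_\varepsilon(x)| \le \frac{C}{a_\varepsilon} \qquad (12.28)$$
a.e. in $(S_u)_{a_\varepsilon}$. Consider now $s_\varepsilon$ in $H^1(\Omega,[0,1])$ such that
$$s_\varepsilon = \begin{cases}0 & \text{on } (S_u)_{a_\varepsilon},\\ 1 - c_\varepsilon & \text{on } \Omega\setminus(S_u)_{a_\varepsilon+b_\varepsilon}.\end{cases}$$
The positive constant $c_\varepsilon$ is introduced for technical reasons and, as said before, will be adjusted later. We have $(u_\varepsilon,s_\varepsilon) \in \big(H^1(\Omega)\times H^1(\Omega,[0,1])\big)\cap X$ and
$$\begin{aligned}
F_\varepsilon((u_\varepsilon,s_\varepsilon),\Omega) ={}& \int_{\Omega\setminus(S_u)_{a_\varepsilon}}(s_\varepsilon^2 + \varepsilon^2)|\nabla u|^2\,dx + \varepsilon^2\int_{(S_u)_{a_\varepsilon}}|\nabla u_\varepsilon|^2\,dx\\
&+ \frac{c_\varepsilon^2}{4\varepsilon}\,\mathrm{meas}\big(\Omega\setminus(S_u)_{a_\varepsilon+b_\varepsilon}\big) + \frac{1}{4\varepsilon}\,\mathrm{meas}\big((S_u)_{a_\varepsilon}\big) + M_\varepsilon\big(s_\varepsilon, (S_u)_{a_\varepsilon+b_\varepsilon}\setminus(S_u)_{a_\varepsilon}\big).
\end{aligned}$$
The first term trivially goes to $\int_\Omega|\nabla u|^2\,dx$ as $\varepsilon$ goes to zero. We adjust $a_\varepsilon$ so that the second and fourth terms go to zero. By (12.28) and (12.27), these terms are of order $\varepsilon^2/a_\varepsilon$ and $a_\varepsilon/\varepsilon$, respectively. It therefore suffices to take for $a_\varepsilon$ an intermediate power of $\varepsilon$ between $\varepsilon^2$ and $\varepsilon$, for instance $a_\varepsilon = \varepsilon^{3/2}$. To make the third term vanish, it suffices to choose $c_\varepsilon = o(\sqrt{\varepsilon})$, for instance $c_\varepsilon = \varepsilon^{3/4}$. We are reduced to finding $s_\varepsilon$ satisfying
$$\limsup_{\varepsilon\to0}M_\varepsilon\big(s_\varepsilon, (S_u)_{a_\varepsilon+b_\varepsilon}\setminus(S_u)_{a_\varepsilon}\big) \le \mathcal{H}^{N-1}(S_u).$$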
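The bookkeeping of exponents can be checked directly. Take $a_\varepsilon = \varepsilon^{3/2}$ and $c_\varepsilon = \varepsilon^{3/4}$. Up to omitted constants depending on $u$, $\Omega$ and $\mathcal{H}^{N-1}(S_u)$, the three error terms are then of order $\varepsilon^2/a_\varepsilon$, $a_\varepsilon/\varepsilon$ and $c_\varepsilon^2/\varepsilon$, all equal to $\sqrt{\varepsilon}$. The sketch also checks that $b_\varepsilon = -\varepsilon\ln(\varepsilon^{3/2})$ is the width at which the exponential profile chosen in this proof, $\sigma_\varepsilon(t) = 1 - \exp((a_\varepsilon - t)/(2\varepsilon))$, reaches $1 - c_\varepsilon$.

```python
import math


def first_step_parameters(eps):
    """Choices a = eps^(3/2), c = eps^(3/4), b = -eps * log(eps^(3/2)).

    Also returns the orders eps^2 / a, a / eps and c^2 / eps of the second,
    fourth and third terms of F_eps((u_eps, s_eps), Omega); constants
    depending on u, Omega and H^{N-1}(S_u) are omitted.
    """
    a = eps ** 1.5
    c = eps ** 0.75
    b = -eps * math.log(eps ** 1.5)
    orders = (eps ** 2 / a, a / eps, c ** 2 / eps)
    return a, b, c, orders


for eps in (1e-2, 1e-4, 1e-6):
    a, b, c, orders = first_step_parameters(eps)
    # sigma(t) = 1 - exp((a - t) / (2 eps)) reaches 1 - c at t = a + b
    print(eps, a, b, c, orders, 1.0 - math.exp(-b / (2.0 * eps)))
```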
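Let us denote the map $\mathrm{dist}(\cdot, S_u)$ by $d$. We look for $s_\varepsilon$ of the form $\sigma_\varepsilon\circ d$. We apply the coarea formula, Theorem 4.2.5, to the function $g = \varepsilon|\sigma'_\varepsilon\circ d|^2 + \frac{(1 - \sigma_\varepsilon\circ d)^2}{4\varepsilon}$ and to the truncated function $f = a_\varepsilon\vee d\wedge(a_\varepsilon + b_\varepsilon)$ of $d$, recalling that $|\nabla d| = 1$ a.e. We obtain
$$M_\varepsilon\big(s_\varepsilon, (S_u)_{a_\varepsilon+b_\varepsilon}\setminus(S_u)_{a_\varepsilon}\big) = \int_{a_\varepsilon}^{a_\varepsilon+b_\varepsilon}\left(\varepsilon|\sigma'_\varepsilon(t)|^2 + \frac{(1 - \sigma_\varepsilon(t))^2}{4\varepsilon}\right)\mathcal{H}^{N-1}([d = t])\,dt.$$
Consider $h(t) = \mathrm{meas}([d < t])$. Then, according to Corollary 4.2.3, $h'(t) = \mathcal{H}^{N-1}([d = t])$, and
$$M_\varepsilon\big(s_\varepsilon, (S_u)_{a_\varepsilon+b_\varepsilon}\setminus(S_u)_{a_\varepsilon}\big) = \int_{a_\varepsilon}^{a_\varepsilon+b_\varepsilon}\left(\varepsilon|\sigma'_\varepsilon(t)|^2 + \frac{(1 - \sigma_\varepsilon(t))^2}{4\varepsilon}\right)h'(t)\,dt.$$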
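The function $\sigma_\varepsilon$ is chosen as the solution of the ordinary boundary value problem
$$\sigma'_\varepsilon = \frac{1 - \sigma_\varepsilon}{2\varepsilon}, \qquad \sigma_\varepsilon(a_\varepsilon) = 0, \qquad \sigma_\varepsilon(a_\varepsilon + b_\varepsilon) = 1 - c_\varepsilon,$$
that is, $\sigma_\varepsilon(t) = 1 - \exp\left(\frac{a_\varepsilon - t}{2\varepsilon}\right)$, when we choose $b_\varepsilon = -\varepsilon\ln(\varepsilon^{3/2})$. On the other hand, the regularity assumption on $S_u$ yields, for every $\eta > 0$, some $\varepsilon_0$ such that $h(t) \le 2t(\mathcal{H}^{N-1}(S_u) + \eta)$ for all $\varepsilon < \varepsilon_0$ and all $t < a_\varepsilon + b_\varepsilon$. Thanks to this estimate, the conclusion follows by integrating by parts in the expression of $M_\varepsilon\big(s_\varepsilon, (S_u)_{a_\varepsilon+b_\varepsilon}\setminus(S_u)_{a_\varepsilon}\big)$ (for details, see [24, Proposition 5.1]).

Second step. We no longer assume hypothesis (12.27). To apply the first step, we construct a sequence $u_\eta$ converging to $u$ in $L^2(\Omega)$ such that $S_{u_\eta}$ satisfies (12.27) and $F(u) = \lim_{\eta\to0}F(u_\eta)$. It will then suffice to apply the procedure of the first step to the function $u_\eta$ and to conclude by a diagonalization argument. To construct $u_\eta$, the idea consists in taking for $u_\eta$ a solution of the Mumford–Shah problem
$$\inf\left\{\int_{\Omega'}|\nabla v|^2\,dx + \mathcal{H}^{N-1}(S_v) + \frac{1}{\eta}\int_{\Omega'}|v - \tilde u|^2\,dx : v \in SBV(\Omega')\right\}, \qquad (P_\eta)$$
where $\Omega' = \Omega\cup U$ and $\tilde u$ is the extension of $u$ to $\Omega'$ defined by
$$\tilde u(x) = \begin{cases}u(\varphi^{-1}(x)) & \text{if } x \in U\setminus\overline{\Omega},\\ \gamma_0(u)(x) & \text{if } x \in \partial\Omega,\\ u(x) & \text{if } x \in \Omega.\end{cases}$$
Here $U$ and $\varphi$ are given by the reflection condition (R) fulfilled by $\Omega$, and $\gamma_0$ denotes the trace operator. We next use the following regularity property of Mumford–Shah minimizers (see Ambrosio and Tortorelli [24], De Giorgi, Carriero, and Leaci [123]). First, $\mathcal{H}^{N-1}\big((\overline{S_{u_\eta}}\cap\Omega')\setminus S_{u_\eta}\big) = 0$. Second, for every compact set $K$ included in $\overline{S_{u_\eta}}\cap\Omega'$,
$$\lim_{\rho\to0}\frac{\mathrm{meas}((K)_\rho)}{2\rho} = \mathcal{H}^{N-1}(K).$$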
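Taking $K = \overline{\Omega}\cap\overline{S_{u_\eta}}$, we obtain the required regularity of $S_{u_\eta}$ in $\Omega$. Obviously, $u_\eta$ converges to $u$ in $L^2(\Omega)$ thanks to the penalization parameter $1/\eta$ in $(P_\eta)$. It remains to establish the convergence of $F(u_\eta)$ to $F(u)$. Consider the two Borel measures $\mu_\eta$ and $\mu$ in $\mathbf{M}^+(\Omega')$ defined, for every Borel set $B$ in $\Omega'$, by
$$\mu_\eta(B) := \int_B|\nabla u_\eta|^2\,dx + \mathcal{H}^{N-1}(B\cap S_{u_\eta}), \qquad \mu(B) := \int_B|\nabla\tilde u|^2\,dx + \mathcal{H}^{N-1}(B\cap S_{\tilde u}).$$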
It is worth noticing that $\tilde u$ has no jump across $\partial\Omega$, so that $\mu(\partial\Omega) = 0$. Taking $v = \tilde u$ as a test function in $(P_\eta)$, we obtain
$$\limsup_{\eta\to 0} \mu_\eta(\Omega') \le \mu(\Omega').$$
On the other hand, according to Theorem 13.4.3, which we establish in the next chapter, we have for every open subset $A$ of $\Omega'$
$$\mu(A) \le \liminf_{\eta\to 0} \mu_\eta(A).$$
According to Proposition 4.2.5, we deduce that $\mu_\eta$ narrowly converges to $\mu$. Since $\mu(\partial\Omega) = 0$, we have $\mu_\eta(\Omega) \to \mu(\Omega)$.
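The first step of the proof hinges on the explicit profile $\sigma_\varepsilon$. The short Python sketch below is only a numerical sanity check, under the illustrative assumptions $a_\varepsilon = 0$ and $\varepsilon = 0.01$ (the function names are ours): it verifies that $\sigma_\varepsilon(t) = 1 - \exp\big(\frac{a_\varepsilon - t}{2\varepsilon}\big)$ satisfies $\varepsilon\sigma_\varepsilon' = \frac{1-\sigma_\varepsilon}{2}$, and that the one-dimensional energy $\int_{a_\varepsilon}^{a_\varepsilon+b_\varepsilon}\big(\varepsilon|\sigma_\varepsilon'|^2 + \frac{(1-\sigma_\varepsilon)^2}{4\varepsilon}\big)\,dt$ with $b_\varepsilon = -\varepsilon\ln(\varepsilon^{3/2})$ equals $\frac12(1 - \varepsilon^{3/2})$.

```python
import math

# Illustrative sketch (names are ours): the 1D profile of the first step,
# sigma_eps(t) = 1 - exp((a - t) / (2 eps)), and its layer energy.

def sigma(t, a, eps):
    return 1.0 - math.exp((a - t) / (2.0 * eps))

def sigma_prime(t, a, eps):
    return math.exp((a - t) / (2.0 * eps)) / (2.0 * eps)

def integrand(t, a, eps):
    # eps * |sigma'|^2 + (1 - sigma)^2 / (4 eps)
    return eps * sigma_prime(t, a, eps) ** 2 + (1.0 - sigma(t, a, eps)) ** 2 / (4.0 * eps)

def layer_energy(eps, a=0.0, n=20_000):
    """Trapezoidal rule on [a, a + b] with b = -eps * ln(eps**1.5)."""
    b = -eps * math.log(eps ** 1.5)
    dt = b / n
    total = 0.5 * (integrand(a, a, eps) + integrand(a + b, a, eps))
    for k in range(1, n):
        total += integrand(a + k * dt, a, eps)
    return total * dt, b

eps = 0.01
energy, b = layer_energy(eps)
print(energy, 0.5 * (1.0 - eps ** 1.5))      # numerically equal
print(sigma(b, 0.0, eps), 1.0 - eps ** 0.75)  # value of the profile at a + b
```

Along this profile the two terms of the integrand coincide (equipartition), so the integrand is $e^{(a_\varepsilon - t)/\varepsilon}/(2\varepsilon)$. Heuristically, since $h'(t) \approx 2\mathcal{H}^{N-1}(S_u)$ near $S_u$, this explains why $M_\varepsilon$ on the layer is close to $\mathcal{H}^{N-1}(S_u)$. The sketch also shows that this choice of $b_\varepsilon$ gives $\sigma_\varepsilon(a_\varepsilon + b_\varepsilon) = 1 - \varepsilon^{3/4}$.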
Chapter 13
Integral functionals of the calculus of variations
This chapter is devoted to the study of the sequential lower semicontinuity of certain types of functionals which occur in many variational problems. As noted in Chapter 11, lower semicontinuity is the key tool for applying the direct methods of the calculus of variations, and the sections below deal with several cases, depending on the spaces on which the functionals are defined. We will see that, owing to the integral form of the functionals under consideration, convexity or quasi-convexity conditions play a central role in all the results. We first complement Section 11.2 by establishing necessary and sufficient conditions for more general integral functionals to be lower semicontinuous. Then, to complement Section 11.3, we deal with the lower semicontinuity of functionals defined on spaces of measures, on $BV$, and on $SBV$. We do not claim to be exhaustive in this widely studied field; we only intend to present some of the principal results. For other cases and details see [22], [97], [102], [116], [143], [185].
13.1
Lower semicontinuity in the scalar case
In this section we consider integral functionals of the form
$$F(u) = \int_\Omega f(x, u, Du)\,dx, \tag{13.1}$$
where $u$ varies in a Sobolev space $W^{1,p}(\Omega)$. We stress that in this section we restrict our attention to functions $u$ which take their values in $\mathbb{R}$; some differences with the case of functionals defined on vector-valued functions will be discussed in the next section. To study sufficient conditions for the lower semicontinuity of functionals of the form (13.1), it is convenient first to consider functionals of the form
$$F(u, v) = \int_\Omega f\big(x, u(x), v(x)\big)\,d\mu(x), \tag{13.2}$$
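where $(\Omega, \mathcal{A}, \mu)$ is a measure space with $\mu$ nonnegative and finite (i.e., $\mu \in \mathcal{M}^+(\Omega)$), $f : \Omega \times \mathbb{R}^m \times \mathbb{R}^n \to [0, +\infty]$ is an $\mathcal{A} \otimes \mathcal{B}^m \otimes \mathcal{B}^n$-measurable function ($\mathcal{B}^m$ and $\mathcal{B}^n$,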
respectively, denote the $\sigma$-algebras of Borel subsets of $\mathbb{R}^m$ and $\mathbb{R}^n$), and $u$, $v$ vary, respectively, in the spaces $L^1_\mu(\Omega; \mathbb{R}^m)$ and $L^1_\mu(\Omega; \mathbb{R}^n)$ of $\mu$-integrable $\mathbb{R}^m$- and $\mathbb{R}^n$-valued functions. The theorem below is a lower semicontinuity result for functionals of the form (13.2); the link with the case (13.1) will be discussed later.

Theorem 13.1.1. Assume that the function $f$ satisfies the following conditions:
(i) for $\mu$-a.e. $x \in \Omega$ the function $f(x, \cdot, \cdot)$ is lower semicontinuous on $\mathbb{R}^m \times \mathbb{R}^n$;
(ii) for $\mu$-a.e. $x \in \Omega$ and for every $s \in \mathbb{R}^m$ the function $f(x, s, \cdot)$ is convex on $\mathbb{R}^n$.
Then the functional $F$ defined in (13.2) is sequentially lower semicontinuous on the space $L^1_\mu(\Omega; \mathbb{R}^m) \times L^1_\mu(\Omega; \mathbb{R}^n)$ endowed with the strong topology on $L^1_\mu(\Omega; \mathbb{R}^m)$ and the weak topology on $L^1_\mu(\Omega; \mathbb{R}^n)$.

Proof. Let $u_h \to u$ strongly in $L^1_\mu(\Omega; \mathbb{R}^m)$ and $v_h \rightharpoonup v$ weakly in $L^1_\mu(\Omega; \mathbb{R}^n)$; we have to prove that
$$F(u, v) \le \liminf_{h\to+\infty} F(u_h, v_h). \tag{13.3}$$
Possibly passing to subsequences, we may assume without loss of generality that the lim inf on the right-hand side of (13.3) is a finite limit, that is,
$$\lim_{h\to+\infty} F(u_h, v_h) = c \in \mathbb{R}. \tag{13.4}$$
Since the sequence $(v_h)$ is weakly compact in $L^1_\mu(\Omega; \mathbb{R}^n)$, we may use the Dunford–Pettis theorem, Theorem 2.4.5, and the De La Vallée–Poussin criterion, Theorem 2.4.4, to conclude that there exists a function $\vartheta : [0, +\infty[\, \to [0, +\infty[$, which can be taken convex and strictly increasing, with superlinear growth, that is,
$$\lim_{t\to+\infty} \frac{\vartheta(t)}{t} = +\infty,$$
such that
$$\sup_{h\in\mathbb{N}} \int_\Omega \vartheta(|v_h|)\,d\mu \le 1. \tag{13.5}$$
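Setting
$$H(t) = \sqrt{t\,\vartheta(t)}, \qquad \Psi(t) = \vartheta\big(H^{-1}(t)\big), \qquad \xi_h(x) = H\big(|v_h(x)|\big),$$
it is easy to see that
(i) $H$ is strictly increasing and $H(t)/t \to +\infty$ as $t \to +\infty$;
(ii) $\Psi$ is strictly increasing and $\Psi(t)/t \to +\infty$ as $t \to +\infty$;
(iii) $\vartheta(t)/H(t) \to +\infty$ as $t \to +\infty$;
(iv) $\Psi\big(\xi_h(x)\big) = \vartheta\big(|v_h(x)|\big)$.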
Therefore, by (13.5) we have
$$\sup_{h\in\mathbb{N}} \int_\Omega \Psi(\xi_h)\,d\mu \le 1.$$
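The elementary properties (i)–(iv) of $H(t) = \sqrt{t\vartheta(t)}$ and $\Psi = \vartheta \circ H^{-1}$ can be checked on a concrete case. The Python sketch below assumes, purely for illustration, $\vartheta(t) = t^2$ (convex, strictly increasing on $[0,+\infty[$, superlinear), for which $H(t) = t^{3/2}$ and $\Psi(s) = s^{4/3}$. The inverse $H^{-1}$ is computed by bisection, so the same code applies unchanged to any other admissible $\vartheta$.

```python
import math

def theta(t):
    # Hypothetical De La Vallee-Poussin function (assumption): convex, increasing, superlinear.
    return t * t

def H(t):
    return math.sqrt(t * theta(t))

def H_inv(s):
    """Invert the strictly increasing function H on [0, +inf) by bisection."""
    hi = 1.0
    while H(hi) < s:
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if H(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def Psi(s):
    return theta(H_inv(s))

for t in (0.5, 2.0, 10.0, 100.0):
    # (iv): Psi(H(t)) = theta(t); (i) and (iii): H(t)/t and theta(t)/H(t) grow with t.
    print(t, Psi(H(t)), theta(t), H(t) / t, theta(t) / H(t))
```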
We can now use the Dunford–Pettis theorem again to deduce that the sequence $(\xi_h)$ is weakly compact in $L^1_\mu(\Omega)$; hence, up to extracting a subsequence, we may assume that $\xi_h \rightharpoonup \eta$ weakly in $L^1_\mu(\Omega)$ for a suitable $\eta$. By the Mazur theorem, a suitable sequence of convex combinations of $(\xi_h, v_h)$ converges strongly in $L^1_\mu(\Omega) \times L^1_\mu(\Omega; \mathbb{R}^n)$ to $(\eta, v)$. More precisely, there exist $N_h \to +\infty$ and $\alpha_{i,h} \ge 0$ with
$$\sum_{i=N_h+1}^{N_{h+1}} \alpha_{i,h} = 1$$
such that the sequences
$$\eta_h = \sum_{i=N_h+1}^{N_{h+1}} \alpha_{i,h}\,\xi_i, \qquad \nu_h = \sum_{i=N_h+1}^{N_{h+1}} \alpha_{i,h}\,v_i$$
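converge strongly to $\eta$ in $L^1_\mu(\Omega)$ and to $v$ in $L^1_\mu(\Omega; \mathbb{R}^n)$, respectively. Possibly passing to subsequences, we may also assume that $\eta_h \to \eta$, $\nu_h \to v$, and $u_h \to u$ pointwise $\mu$-a.e. on $\Omega$. Consider now a point $x \in \Omega$ where all the convergences above occur, and set
$$\varepsilon_h = \max\big\{ |u(x) - u_i(x)| : N_h < i \le N_{h+1} \big\}, \qquad \lambda_h = \sum_{i=N_h+1}^{N_{h+1}} \alpha_{i,h}\,f\big(x, u_i(x), v_i(x)\big),$$
$$A_h = \big\{ (\nu, \eta, \lambda) \in \mathbb{R}^{n+2} : \eta = H(|\nu|),\ \exists s \in \mathbb{R}^m,\ |s - u(x)| \le \varepsilon_h,\ \lambda \ge f(x, s, \nu) \big\}.$$
We have $\varepsilon_h \to 0$ and, by the definition of $\nu_h, \eta_h, \lambda_h$, the vector $\big(\nu_h(x), \eta_h(x), \lambda_h(x)\big)$ belongs to the convex hull $\operatorname{co} A_h$ of $A_h$. Since $A_h \subset \mathbb{R}^{n+2}$, by the Carathéodory theorem on convex hulls in Euclidean spaces the vector $\big(\nu_h(x), \eta_h(x), \lambda_h(x)\big)$ can be written as a convex combination of $n+3$ elements of $A_h$, that is, there exist
$$\beta_{i,h} \ge 0, \qquad \nu_{i,h} \in \mathbb{R}^n, \qquad \eta_{i,h} \ge 0, \qquad \lambda_{i,h} \ge 0 \qquad (i = 1, \dots, n+3)$$
such that $(\nu_{i,h}, \eta_{i,h}, \lambda_{i,h}) \in A_h$ for every index $i$, and
$$\sum_{i=1}^{n+3} \beta_{i,h} = 1, \qquad \sum_{i=1}^{n+3} \beta_{i,h}\,\nu_{i,h} = \nu_h(x), \qquad \sum_{i=1}^{n+3} \beta_{i,h}\,\eta_{i,h} = \eta_h(x), \qquad \sum_{i=1}^{n+3} \beta_{i,h}\,\lambda_{i,h} = \lambda_h(x).$$
Therefore, for suitable $s_{i,h} \in \mathbb{R}^m$ with $|s_{i,h} - u(x)| \le \varepsilon_h$, we have $\lambda_{i,h} \ge f(x, s_{i,h}, \nu_{i,h})$.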
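The reduction to $n+3$ points used above is Carathéodory's theorem in $\mathbb{R}^{n+2}$: any convex combination of points of $\mathbb{R}^d$ can be rewritten using at most $d+1$ of them. The following Python/NumPy sketch implements the standard constructive proof (find an affine dependence among the points and shift weight along it until one weight vanishes, then repeat). The function name `caratheodory` and the tolerance are our own choices; the sketch is meant as an illustration rather than a robust geometric routine.

```python
import numpy as np

def caratheodory(points, weights, tol=1e-12):
    """Rewrite sum_i weights[i] * points[i] (weights >= 0 summing to 1, points in R^d)
    as a convex combination of at most d + 1 of the points."""
    P = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    d = P.shape[1]
    while len(w) > d + 1:
        # Affine dependence: c != 0 with sum_i c_i P_i = 0 and sum_i c_i = 0.
        # M has d + 1 rows and more than d + 1 columns, so its null space is nontrivial.
        M = np.vstack([P.T, np.ones(len(w))])
        c = np.linalg.svd(M)[2][-1]
        # Since sum(c) = 0 and c != 0, c has at least one positive entry.
        pos = c > tol
        alpha = np.min(w[pos] / c[pos])
        w = w - alpha * c        # same point, same total weight, weights stay >= 0
        keep = w > tol           # at least one weight has dropped to zero
        P, w = P[keep], w[keep]
    return P, w / w.sum()

rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 3))   # 20 points in R^3, so at most 4 survive
wts = rng.random(20)
wts /= wts.sum()
Q, lam = caratheodory(pts, wts)
print(len(lam), np.allclose(lam @ Q, wts @ pts))
```

With $d = n+2$ this is exactly the bound $n+3$ invoked in the proof.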
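By extracting subsequences, we may assume without loss of generality that for every index $i$ the sequence $|\nu_{i,h}|$ tends to a limit; denoting by $I$ the set of indices $i$ for which this limit is finite and passing again to subsequences, we may also assume that
$$\nu_{i,h} \to \nu_i \quad \forall i \in I, \qquad |\nu_{i,h}| \to +\infty \quad \forall i \notin I, \qquad \beta_{i,h} \to \beta_i \quad \forall i = 1, \dots, n+3.$$
Since
$$\sum_{i=1}^{n+3} \beta_{i,h}\,H(|\nu_{i,h}|) = \eta_h(x) \to \eta(x)$$
and $H(|\nu_{i,h}|) \to +\infty$ for $i \notin I$, the set $I$ cannot be empty and $\beta_i = 0$ for every $i \notin I$. Moreover, from
$$\eta_h(x) = \sum_{i=1}^{n+3} \beta_{i,h}\,\eta_{i,h} \ge \sum_{i\notin I} \beta_{i,h}\,\eta_{i,h} = \sum_{i\notin I} \beta_{i,h}\,|\nu_{i,h}|\,\frac{H(|\nu_{i,h}|)}{|\nu_{i,h}|}$$
we get $\beta_{i,h}|\nu_{i,h}| \to 0$ for all $i \notin I$, so that, passing to the limit in $\sum_i \beta_{i,h} = 1$ and in $\sum_i \beta_{i,h}\nu_{i,h} = \nu_h(x) \to v(x)$,
$$\sum_{i\in I} \beta_i = 1, \qquad \sum_{i\in I} \beta_i\,\nu_i = v(x).$$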
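We now use the assumptions on the function $f$ to obtain
$$\begin{aligned}
f\big(x, u(x), v(x)\big) &\le \sum_{i\in I} \beta_i\,f\big(x, u(x), \nu_i\big) \le \liminf_{h\to+\infty} \sum_{i\in I} \beta_{i,h}\,f(x, s_{i,h}, \nu_{i,h})\\
&\le \liminf_{h\to+\infty} \sum_{i=1}^{n+3} \beta_{i,h}\,f(x, s_{i,h}, \nu_{i,h}) \le \liminf_{h\to+\infty} \lambda_h(x),
\end{aligned} \tag{13.6}$$
so that, by Fatou's lemma,
$$\int_\Omega f(x, u, v)\,d\mu \le \liminf_{h\to+\infty} \int_\Omega \lambda_h(x)\,d\mu = \liminf_{h\to+\infty} \sum_{i=N_h+1}^{N_{h+1}} \alpha_{i,h} \int_\Omega f(x, u_i, v_i)\,d\mu. \tag{13.7}$$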
Fix now $\varepsilon > 0$; by using (13.4) we obtain, for $h$ large enough,
$$\int_\Omega f(x, u_i, v_i)\,d\mu \le c + \varepsilon \qquad \forall i \in [N_h + 1, N_{h+1}],$$
and so, by (13.7), $F(u, v) \le c + \varepsilon$. The proof is then completed by letting $\varepsilon \to 0^+$.

Remark 13.1.1. It is easy to see that the result of Theorem 13.1.1 remains true if the measure $\mu$ is only assumed to be $\sigma$-finite. The result above, under the slightly stronger assumption that $f(x, \cdot, \cdot)$ is continuous for $\mu$-a.e. $x \in \Omega$, was first obtained by De Giorgi in 1968 in an unpublished paper. De Giorgi's original proof approximates the convex function $f(x, s, \cdot)$ from below by finite suprema of affine functions $f_k(x, s, \cdot)$, for which the proof is easier, and then passes to the limit as $k \to +\infty$; the interested reader may find further details about this type of proof in the book by Buttazzo [97]. The proof reported above instead follows the scheme of the proof given in 1977 by Ioffe [157].

Using Theorem 13.1.1 we can easily give some sufficient conditions for the sequential lower semicontinuity of functionals of the form (13.1) on the Sobolev space $W^{1,1}(\Omega)$. More precisely, the following result holds.

Theorem 13.1.2. Let $\Omega$ be a Lipschitz domain of $\mathbb{R}^n$ and let $f : \Omega \times \mathbb{R}^m \times \mathbb{R}^{mn} \to [0, +\infty]$ be a function satisfying the assumptions of Theorem 13.1.1. Then the functional $F$ defined in (13.1) is sequentially weakly lower semicontinuous on the Sobolev space $W^{1,1}(\Omega; \mathbb{R}^m)$.

Proof. Let $(u_h)$ be a sequence in $W^{1,1}(\Omega; \mathbb{R}^m)$ converging weakly to some function $u$. Setting $v_h = Du_h$, we have that $v_h$ converges weakly to $v = Du$ in $L^1(\Omega; \mathbb{R}^{mn})$ and, by the Rellich theorem, $u_h$ converges strongly to $u$ in $L^1(\Omega; \mathbb{R}^m)$. By Theorem 13.1.1 we have
$$F(u) = \int_\Omega f(x, u, v)\,dx \le \liminf_{h\to+\infty} \int_\Omega f(x, u_h, v_h)\,dx = \liminf_{h\to+\infty} F(u_h),$$
which proves the assertion. In the so-called scalar case (i.e., when $m = 1$), and also in the case of ordinary integrals (i.e., when $n = 1$), we prove below that convexity of the integrand $f$ with respect to the gradient is also a necessary condition for lower semicontinuity. This is no longer true in the vector-valued case $m > 1$ for multiple integrals ($n > 1$), as we will discuss in the next section. For the sake of simplicity we limit ourselves here to functionals of the form
$$F(u) = \int_\Omega f(Du)\,dx; \tag{13.8}$$
the more general case (13.1) presents only technical differences in the proof, and we refer to one of the books mentioned at the beginning of this chapter for the details.

Theorem 13.1.3. Assume that either $m = 1$ or $n = 1$ and that the functional $F$ in (13.8) is sequentially weakly* lower semicontinuous in the Sobolev space $W^{1,\infty}$, in the sense that
$$F(u) \le \liminf_{h\to+\infty} F(u_h) \tag{13.9}$$
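for every sequence $u_h$ converging to $u$ uniformly in $\Omega$ and with $Du_h$ uniformly bounded in $\Omega$. Then the function $f$ is convex and lower semicontinuous.

Proof. We give the proof only in the case $m = 1$, the other one being similar. Let $z_1, z_2 \in \mathbb{R}^n$, let $t \in\, ]0, 1[$, and let $z = tz_1 + (1-t)z_2$. Denote by $u_z$ the linear function $u_z(x) = z \cdot x$ and define
$$z_0 = \frac{z_2 - z_1}{|z_2 - z_1|},$$
$$\Omega^1_{hj} = \Big\{ x \in \Omega : \frac{j-1}{h} < z_0 \cdot x < \frac{j-1+t}{h} \Big\}, \qquad \Omega^2_{hj} = \Big\{ x \in \Omega : \frac{j-1+t}{h} < z_0 \cdot x < \frac{j}{h} \Big\}, \qquad j \in \mathbb{Z},\ h \in \mathbb{N},$$
$$\Omega^1_h = \bigcup_{j\in\mathbb{Z}} \Omega^1_{hj}, \qquad \Omega^2_h = \bigcup_{j\in\mathbb{Z}} \Omega^2_{hj},$$
$$u_h(x) = \begin{cases} c^1_{hj} + z_1 \cdot x & \text{if } x \in \Omega^1_{hj},\\ c^2_{hj} + z_2 \cdot x & \text{if } x \in \Omega^2_{hj},\end{cases}$$
where
$$c^1_{hj} = \frac{(j-1)(1-t)}{h}\,|z_2 - z_1|, \qquad c^2_{hj} = -\frac{jt}{h}\,|z_2 - z_1|.$$
It is easy to verify that, as $h \to +\infty$,
$$\frac{\operatorname{meas}(\Omega^1_h)}{\operatorname{meas}(\Omega)} \to t, \qquad \frac{\operatorname{meas}(\Omega^2_h)}{\operatorname{meas}(\Omega)} \to 1 - t. \tag{13.10}$$
Moreover, the functions $u_h$ are Lipschitz continuous and for every $x \in \Omega^1_{hj}$ we have
$$u_h(x) - u_z(x) = c^1_{hj} + (z_1 - z) \cdot x = (1-t)\Big(\frac{j-1}{h}\,|z_2 - z_1| + (z_1 - z_2) \cdot x\Big) = (1-t)\,|z_2 - z_1|\Big(\frac{j-1}{h} - z_0 \cdot x\Big),$$
so that $|u_h(x) - u_z(x)| \le \frac{t(1-t)}{h}\,|z_2 - z_1|$. Analogously, a similar computation gives $|u_h(x) - u_z(x)| \le \frac{t(1-t)}{h}\,|z_2 - z_1|$ for every $x \in \Omega^2_{hj}$. Therefore $u_h \to u_z$ uniformly on $\Omega$. Moreover, the gradients $Du_h$ are clearly uniformly bounded on $\Omega$, so that the sequence $(u_h)$ converges to $u_z$ weakly* in $W^{1,\infty}(\Omega)$. By assumption (13.9), using also (13.10), we then obtain
$$\begin{aligned}
f(z)\operatorname{meas}(\Omega) = F(u_z) &\le \liminf_{h\to+\infty} F(u_h) = \liminf_{h\to+\infty} \big[ f(z_1)\operatorname{meas}(\Omega^1_h) + f(z_2)\operatorname{meas}(\Omega^2_h) \big]\\
&= t\,f(z_1)\operatorname{meas}(\Omega) + (1-t)\,f(z_2)\operatorname{meas}(\Omega),
\end{aligned}$$
which proves the convexity of $f$. The lower semicontinuity of $f$ on $\mathbb{R}^n$ is a straightforward consequence of the sequential lower semicontinuity assumption (13.9) on $F$.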
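The laminate $u_h$ built in the proof can be evaluated directly. The NumPy sketch below assumes, for illustration only, $n = 2$, $\Omega = (0,1)^2$ sampled at the midpoints of a grid, $z_1 = (1, 0)$, $z_2 = (0, 2)$ and $t = 0.3$. It checks numerically the uniform bound $|u_h - u_z| \le t(1-t)|z_2 - z_1|/h$ and that the volume fraction of $\Omega^1_h$ is close to $t$, as in (13.10). The helper names `laminate` and `bound` are ours.

```python
import numpy as np

def laminate(x, z1, z2, t, h):
    """Evaluate u_h from the proof of Theorem 13.1.3 (case m = 1) at points x of shape (k, n).
    Returns the values and a mask telling which points lie in Omega^1_h."""
    nd = np.linalg.norm(z2 - z1)
    z0 = (z2 - z1) / nd
    s = h * (x @ z0)            # h * (z0 . x)
    j = np.floor(s) + 1         # strip index: (j - 1)/h <= z0 . x < j/h
    frac = s - (j - 1)          # position inside the strip, in [0, 1)
    in1 = frac < t              # Omega^1_{hj} is the first fraction t of each strip
    c1 = (j - 1) * (1 - t) / h * nd
    c2 = -j * t / h * nd
    return np.where(in1, c1 + x @ z1, c2 + x @ z2), in1

def bound(h):
    return t * (1 - t) * np.linalg.norm(z2 - z1) / h

# Midpoints of a 400 x 400 grid on Omega = (0, 1)^2.
g = (np.arange(400) + 0.5) / 400
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
z1, z2, t = np.array([1.0, 0.0]), np.array([0.0, 2.0]), 0.3
z = t * z1 + (1 - t) * z2

for h in (5, 50):
    u, in1 = laminate(X, z1, z2, t, h)
    err = np.max(np.abs(u - X @ z))
    print(h, err, bound(h), in1.mean())
```

Increasing $h$ shrinks the uniform error like $1/h$ while the gradients only take the two values $z_1$ and $z_2$, which is exactly what makes the sequence converge weakly* in $W^{1,\infty}$.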
13.2
Lower semicontinuity in the vectorial case
We have seen in the previous section that, in the case of scalar functions $u$, convexity of the integrand $f(x, s, \cdot)$ is necessary and sufficient for the sequential weak lower semicontinuity of the functional
$$F(u) = \int_\Omega f(x, u, Du)\,dx. \tag{13.11}$$
By contrast, for functions $u$ with vector values, convexity of the integrand with respect to the gradient variable describes only a small class of weakly lower semicontinuous functionals. For this reason, following Morrey [185], we introduce the notion of quasi-convexity, already defined in Chapter 11 in the context of relaxation theory.

Definition 13.2.1. A Borel function $f : \mathbb{R}^{mn} \to [0, +\infty]$ is said to be quasi-convex if
$$f(z)\operatorname{meas}(A) \le \int_A f\big(z + D\phi(x)\big)\,dx \tag{13.12}$$
for some (hence for every) bounded open subset $A$ of $\mathbb{R}^n$, every $m \times n$ matrix $z$, and every $\phi \in C^1_0(A; \mathbb{R}^m)$.

Remark 13.2.1. It is possible to prove that, when either $m = 1$ or $n = 1$, quasi-convexity reduces to usual convexity; this will actually follow from Theorem 13.1.3 once the equivalence between lower semicontinuity and quasi-convexity has been proved. On the other hand, there are many examples of quasi-convex functions which are not convex, for instance the function $z \mapsto f(z) = |\det z|$. If $f : \mathbb{R}^{mn} \to [0, +\infty]$ is quasi-convex, then for every $z \in \mathbb{R}^{mn}$ and $s \in \mathbb{R}^m$ the function $\phi_{s,z} : \mathbb{R}^n \to [0, +\infty]$ defined by
$$\phi_{s,z}(\xi) = f(z + s \otimes \xi) \tag{13.13}$$
is convex. This fact again follows from the results of the previous section on the scalar case, once we remark that (13.12) gives
$$\phi_{s,z}(\xi)\operatorname{meas}(A) \le \int_A \phi_{s,z}\big(\xi + D\phi(x)\big)\,dx$$
for every vector $\xi \in \mathbb{R}^n$ and every scalar function $\phi \in C^1_0(A)$. The property expressed by (13.13) is called rank-one convexity. From the convexity of the functions $\phi_{s,z}$ defined in (13.13), we obtain that every rank-one convex function $f$ of class $C^2(\mathbb{R}^{mn})$ satisfies the so-called Legendre–Hadamard condition:
$$\frac{\partial^2 f}{\partial z_{ij}\,\partial z_{hk}}(z_0)\,\alpha_i\alpha_h\beta_j\beta_k \ge 0$$
for all $z_0 \in \mathbb{R}^{mn}$ and all $\alpha \in \mathbb{R}^m$, $\beta \in \mathbb{R}^n$ (the summation convention over repeated indices is adopted). Moreover, every finite-valued rank-one convex function is locally Lipschitz on $\mathbb{R}^{mn}$; this is made precise in Lemma 13.2.1.

Remark 13.2.2. A wide class of quasi-convex functions is the class of polyconvex functions, introduced by Ball in [54]. A function $f$ is called polyconvex if it can be written in the form
$$f(z) = g\big(X(z)\big) \qquad \forall z \in \mathbb{R}^{m\times n},$$
where $X(z)$ denotes the vector of all subdeterminants of the matrix $z$ and $g$ is a convex function. For instance, if $m = n = 2$, every polyconvex function is of the form $g(z, \det z)$ with $g$ convex on $\mathbb{R}^4 \times \mathbb{R}$; analogously, if $m = n = 3$, every polyconvex function is of the form $g(z, \operatorname{adj} z, \det z)$ with $g$ convex on $\mathbb{R}^9 \times \mathbb{R}^9 \times \mathbb{R}$, where $\operatorname{adj} z$ denotes the adjugate matrix of $z$, that is, the transpose of the matrix of cofactors of $z$.

Lemma 13.2.1. Let $f : \mathbb{R}^{mn} \to \mathbb{R}$ be a rank-one convex function such that
$$0 \le f(z) \le c\,(1 + |z|^p) \qquad \forall z \in \mathbb{R}^{mn}.$$
Then, for a suitable constant $k > 0$, we have
$$|f(z) - f(w)| \le k\,|z - w|\,\big(1 + |z|^{p-1} + |w|^{p-1}\big) \qquad \forall z, w \in \mathbb{R}^{mn}.$$

Proof. Since $f$ is convex with respect to each column vector, it is enough to prove the inequality when $f$ is convex. Again, arguing component by component, we may assume that $f$ is a convex function of a single variable $t$. Such functions are differentiable almost everywhere, and we have
$$f'(t) \le \frac{f(t+h) - f(t)}{h} \qquad \forall h > 0, \tag{13.14}$$
$$f'(t) \ge \frac{f(t+h) - f(t)}{h} \qquad \forall h < 0. \tag{13.15}$$
Taking $h = 1 + |t|$ in (13.14) and $h = -1 - |t|$ in (13.15), and using $f \ge 0$, we obtain for a.e. $t \in \mathbb{R}$, with the appropriate choice of sign $h = \pm(1 + |t|)$ and with $c$ denoting a constant that may change from line to line,
$$|f'(t)| \le \frac{f(t+h)}{|h|} \le \frac{c}{1+|t|}\big(1 + |t|^p + |h|^p\big) \le c\,\big(1 + |t|^{p-1}\big),$$
from which the conclusion follows easily. For a systematic study of the properties of quasi-convex functions, see [186], [2], [177], [185], and [116]. The main interest of the notion of quasi-convexity lies in its relation with the lower semicontinuity of integral functionals, in the sense specified by the following theorem.

Theorem 13.2.1. Let $p \ge 1$ and let $f : \Omega \times \mathbb{R}^m \times \mathbb{R}^{mn} \to \mathbb{R}$ be a Carathéodory integrand such that
$$0 \le f(x, s, z) \le c\,\big(a(x) + |s|^p + |z|^p\big) \tag{13.16}$$
for all $(x, s, z) \in \Omega \times \mathbb{R}^m \times \mathbb{R}^{mn}$, where $c \ge 0$ is a constant and $a \in L^1(\Omega)$. Then the following conditions are equivalent:
(i) for a.e. $x \in \Omega$ and every $s \in \mathbb{R}^m$ the function $f(x, s, \cdot)$ is quasi-convex;
(ii) the functional $F$ defined by (13.11) is sequentially lower semicontinuous on $W^{1,p}(\Omega; \mathbb{R}^m)$ with respect to its weak topology.

Proof. We give the proof here only in the basic case $f = f(z)$, where $0 \le f(z) \le c\,(1 + |z|^p)$; the proof of the general case can be found in the references mentioned above. We start by proving that the quasi-convexity condition (i) implies the lower semicontinuity condition (ii). We have to prove that
$$F(u) \le \liminf_{h\to+\infty} F(u_h) \tag{13.17}$$
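whenever $u_h \rightharpoonup u$ weakly in $W^{1,p}(\Omega; \mathbb{R}^m)$. This will be done in three steps.

First step. Inequality (13.17) holds when $u$ is affine and $u_h = u$ on $\partial\Omega$. Indeed, in this case $Du$ is a constant matrix $z$ and $u_h - u \in W^{1,p}_0(\Omega; \mathbb{R}^m)$, so that, by the definition of quasi-convexity (which extends from $C^1_0$ to $W^{1,p}_0$ test functions by density, thanks to the growth condition and Lemma 13.2.1), we get
$$F(u) = |\Omega|\,f(z) \le \int_\Omega f\big(z + D(u_h - u)\big)\,dx = F(u_h),$$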
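hence (13.17).

Second step. Inequality (13.17) holds when $u$ is affine. We use a slicing method near the boundary of $\Omega$ (see also the proof of Proposition 11.2.3). Let $\Omega_0$ be a compact subset of $\Omega$, let $R = \frac12\operatorname{dist}(\Omega_0, \partial\Omega)$, and let $N \in \mathbb{N}$. For every integer $i = 1, \dots, N$ define
$$\Omega_i = \Big\{ x \in \Omega : \operatorname{dist}(x, \Omega_0) < \frac{iR}{N} \Big\}$$
and let $\varphi_i$ be a smooth function such that
$$0 \le \varphi_i \le 1, \qquad \varphi_i = 0 \text{ on } \Omega \setminus \Omega_i, \qquad \varphi_i = 1 \text{ on } \Omega_{i-1}, \qquad |D\varphi_i| \le \frac{2N}{R}.$$
Finally, take $v_{i,h} = u + \varphi_i\,(u_h - u)$. For every $i = 1, \dots, N$ we have $v_{i,h} \rightharpoonup u$ weakly in $W^{1,p}(\Omega; \mathbb{R}^m)$ as $h \to +\infty$, and $v_{i,h} = u$ on $\partial\Omega$, so that, by the first step,
$$F(u) \le F(v_{i,h}) = \int_{\Omega_{i-1}} f(Du_h)\,dx + \int_{\Omega_i\setminus\Omega_{i-1}} f(Dv_{i,h})\,dx + \int_{\Omega\setminus\Omega_i} f(Du)\,dx$$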
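$$\begin{aligned}
&\le F(u_h) + c\int_{\Omega_i\setminus\Omega_{i-1}} \big(1 + |Dv_{i,h}|^p\big)\,dx + c\int_{\Omega\setminus\Omega_i} \big(1 + |Du|^p\big)\,dx\\
&\le F(u_h) + c\int_{\Omega_i\setminus\Omega_{i-1}} \Big(1 + |Du_h|^p + |Du|^p + |u_h - u|^p\,\frac{N^p}{R^p}\Big)\,dx + c\int_{\Omega\setminus\Omega_i} \big(1 + |Du|^p\big)\,dx\\
&\le F(u_h) + c\int_{\Omega_i\setminus\Omega_{i-1}} \Big(|Du_h|^p + |u_h - u|^p\,\frac{N^p}{R^p}\Big)\,dx + c\int_{\Omega\setminus\Omega_0} \big(1 + |Du|^p\big)\,dx.
\end{aligned}$$
Summing over $i = 1, \dots, N$ and dividing by $N$ gives
$$F(u) \le F(u_h) + \frac{c}{N}\int_{\Omega_N\setminus\Omega_0} \Big(|Du_h|^p + |u_h - u|^p\,\frac{N^p}{R^p}\Big)\,dx + c\int_{\Omega\setminus\Omega_0} \big(1 + |Du|^p\big)\,dx.$$
Passing to the limit as $h \to +\infty$, and taking into account that $(u_h)$ is bounded in $W^{1,p}(\Omega; \mathbb{R}^m)$ and that $u_h \to u$ strongly in $L^p(\Omega_N; \mathbb{R}^m)$ by the Rellich theorem ($\Omega_N$ being relatively compact in $\Omega$), yields
$$F(u) \le \liminf_{h\to+\infty} F(u_h) + \frac{c}{N} + c\int_{\Omega\setminus\Omega_0} \big(1 + |Du|^p\big)\,dx.$$
Now, to complete the proof of the second step, it is enough to pass to the limit as $N \to +\infty$ and as $\Omega_0 \uparrow \Omega$.

Third step. Inequality (13.17) holds in the general case $u \in W^{1,p}(\Omega; \mathbb{R}^m)$. Let us fix $\varepsilon > 0$ and let $w$ be a piecewise affine function such that $\|u - w\|_{W^{1,p}} < \varepsilon$. In particular, there exist open sets $\Omega_i$ and constant matrices $z_i$ such that $Dw = z_i$ on $\Omega_i$. Setting $w_{i,h}(x) = u_h(x) - u(x) + z_i x$ on $\Omega_i$, we have $w_{i,h} \rightharpoonup z_i x$ weakly in $W^{1,p}(\Omega_i; \mathbb{R}^m)$, so that, by the second step,
$$\int_{\Omega_i} f(z_i)\,dx \le \liminf_{h\to+\infty} \int_{\Omega_i} f(Dw_{i,h})\,dx.$$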
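By Lemma 13.2.1 we get
$$\begin{aligned}
\Big| F(u_h) - \sum_i \int_{\Omega_i} f(Dw_{i,h})\,dx \Big| &\le \sum_i \int_{\Omega_i} \big| f(Du_h) - f(Dw_{i,h}) \big|\,dx\\
&\le c\sum_i \int_{\Omega_i} |Du_h - Dw_{i,h}|\,\big(1 + |Du_h|^{p-1} + |Dw_{i,h}|^{p-1}\big)\,dx\\
&\le c\int_\Omega |Du - Dw|\,\big(1 + |Du_h|^{p-1} + |Du - Dw|^{p-1}\big)\,dx\\
&\le c\Big(\int_\Omega |Du - Dw|^p\,dx\Big)^{1/p}\Big(\int_\Omega \big(1 + |Du_h|^p + |Du - Dw|^p\big)\,dx\Big)^{1-1/p}\\
&\le c\,\varepsilon.
\end{aligned}$$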
Analogously,
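$$\Big| F(u) - \sum_i \int_{\Omega_i} f(z_i)\,dx \Big| \le c\,\varepsilon.$$
Therefore,
$$\begin{aligned}
F(u) &\le c\varepsilon + \sum_i \int_{\Omega_i} f(z_i)\,dx \le c\varepsilon + \sum_i \liminf_{h\to+\infty} \int_{\Omega_i} f(Dw_{i,h})\,dx\\
&\le c\varepsilon + \liminf_{h\to+\infty} \sum_i \int_{\Omega_i} f(Dw_{i,h})\,dx \le 2c\varepsilon + \liminf_{h\to+\infty} F(u_h),
\end{aligned}$$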
and the conclusion follows by letting $\varepsilon \to 0^+$.

We now prove that the lower semicontinuity condition (ii) implies the quasi-convexity of $f$. The proof is based on the following result which, as noted in Example 2.4.2 about oscillation phenomena, is a straightforward consequence of a general ergodic theorem (see also Lemma 12.3.1).

Lemma 13.2.2. Let $Q$ be an open cube in $\mathbb{R}^n$ of side length $L > 0$, let $v$ be a function in $L^p(Q, \mathbb{R}^m)$, and let $\tilde v$ be its $Q$-periodic extension, i.e., the function of $L^p_{\mathrm{loc}}(\mathbb{R}^n, \mathbb{R}^m)$ defined by
$$\tilde v(x) = v(x - z) \qquad \text{if } x \in Q + z,\ z \in L\mathbb{Z}^n.$$
Then the sequence $(v_h)_{h\in\mathbb{N}}$ defined by $v_h(x) = \tilde v(hx)$ converges weakly in $L^p_{\mathrm{loc}}(\mathbb{R}^n, \mathbb{R}^m)$ (weakly* if $p = +\infty$) to its mean value $\frac{1}{\operatorname{meas}(Q)}\int_Q v(x)\,dx$.

The proof is a straightforward consequence of Proposition 13.2.1, which we establish below because of its own interest. Let now $z$ be an $m \times n$ matrix, $l_z$ the linear function defined by $l_z(x) = zx$, $u \in C^1_0(Q, \mathbb{R}^m)$, $\tilde u$ its $Q$-periodic extension, and set, for every $x \in \mathbb{R}^n$, $u_h(x) = \tilde u(hx)/h$. It is easily seen that $u_h$ converges strongly to $0$ in $L^p_{\mathrm{loc}}(\mathbb{R}^n, \mathbb{R}^m)$, since $|u_h| \le \|u\|_\infty/h$. On the other hand, according to Lemma 13.2.2, $Du_h$ converges weakly to
$$\frac{1}{\operatorname{meas}(Q)}\int_Q Du(x)\,dx = 0$$
in $L^p_{\mathrm{loc}}(\mathbb{R}^n, \mathbb{R}^{mn})$. Therefore $u_h + l_z$ converges weakly to $l_z$ in $W^{1,p}(Q, \mathbb{R}^m)$ and, by hypothesis (ii),
$$\liminf_{h\to+\infty} \int_Q f(Du_h + z)\,dx \ge \int_Q f(z)\,dx = \operatorname{meas}(Q)\,f(z). \tag{13.18}$$
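The weak but not strong convergence provided by Lemma 13.2.2, on which this step relies, can be observed numerically in one dimension. The sketch below assumes, for illustration only, $Q = (0,1)$, $L = 1$, the cell function $v(y) = y$ (mean $1/2$) and the test function $\cos$ on $(0, 2)$. The pairings $\int_0^2 \tilde v(hx)\cos x\,dx$ approach $\frac12\int_0^2 \cos x\,dx = \frac12\sin 2$, while the squared $L^2$ distance from $\tilde v(h\,\cdot)$ to the constant $1/2$ stays equal to $1/6$ for integer $h$.

```python
import numpy as np

def v_tilde(y):
    # Q-periodic extension (Q = (0, 1), L = 1) of the hypothetical cell function v(y) = y.
    return np.mod(y, 1.0)

def midpoints(a, b, n):
    dx = (b - a) / n
    return a + (np.arange(n) + 0.5) * dx, dx

def pairing(h, phi, a=0.0, b=2.0, n=400_000):
    """Midpoint rule for int_a^b v_tilde(h x) phi(x) dx."""
    x, dx = midpoints(a, b, n)
    return np.sum(v_tilde(h * x) * phi(x)) * dx

def l2_gap_squared(h, a=0.0, b=2.0, n=400_000):
    """Midpoint rule for int_a^b |v_tilde(h x) - 1/2|^2 dx."""
    x, dx = midpoints(a, b, n)
    return np.sum((v_tilde(h * x) - 0.5) ** 2) * dx

limit = 0.5 * np.sin(2.0)
for h in (10, 50, 200):
    # first value -> 0 as h grows, second value stays near 1/6
    print(h, pairing(h, np.cos) - limit, l2_gap_squared(h))
```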
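A change of scale and the periodicity of $\tilde u$ give, for every $h \in \mathbb{N}$,
$$\frac{1}{\operatorname{meas}(Q)}\int_Q f(Du_h + z)\,dx = \frac{1}{\operatorname{meas}(hQ)}\int_{hQ} f(D\tilde u + z)\,dx = \frac{1}{\operatorname{meas}(Q)}\int_Q f(Du + z)\,dx,$$
so that (13.18) yields
$$\frac{1}{\operatorname{meas}(Q)}\int_Q f(Du + z)\,dx \ge f(z),$$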
which completes the proof.

Remark 13.2.3. When $p = +\infty$, the result of Theorem 13.2.1 still holds if we replace the weak topology with the weak* topology of $W^{1,\infty}(\Omega; \mathbb{R}^m)$, and condition (13.16) with
$$0 \le f(x, s, z) \le \alpha\big(x, |s|, |z|\big) \tag{13.19}$$
for all $(x, s, z) \in \Omega \times \mathbb{R}^m \times \mathbb{R}^{mn}$, where $\alpha(x, t, \tau)$ is a function which is summable in $x$ and increasing in $t$ and $\tau$.

Remark 13.2.4. From what was presented above, it follows that the implications
$$f \text{ convex} \;\Rightarrow\; f \text{ polyconvex} \;\Rightarrow\; f \text{ quasi-convex} \;\Rightarrow\; f \text{ rank-one convex}$$
hold true. We stress that, as shown in Theorem 13.2.1, quasi-convexity is the right property to use when dealing with the lower semicontinuity of integral functionals of the calculus of variations. However, because of the way it is defined, quasi-convexity is not easy to work with, whereas polyconvexity and rank-one convexity are much more explicit conditions: in many cases, the sufficiency of the first and the necessity of the second are of great help. None of the implications above can be reversed; this is easily seen for the first one (indeed, $|\det z|$ is a polyconvex function which is not convex), whereas more delicate counterexamples are needed for the remaining ones. See the book by Dacorogna [116] and the paper by Sverak [213] for details concerning these topics. The only question that remains open in the study of the implications above is the equivalence between quasi-convexity and rank-one convexity in the case $m = n = 2$: neither counterexamples nor proofs of the equivalence are known.

We now state the ergodic theorem used in Lemma 13.2.2, which generalizes the convergence result described in Example 2.4.2 about oscillation phenomena and is a particular case of Lemma 12.3.1.

Proposition 13.2.1. With the notation of Lemma 12.3.1, let $Q$ be a cube in $\mathbb{R}^n$ of the form $\prod_{i=1}^n (a_i, a_i + L)$ for some $L > 0$, and let $\mathcal{A} : \mathcal{B}_b(\mathbb{R}^n) \to \mathbb{R}$ satisfy
(i) $\mathcal{A}_{A\cup B} = \mathcal{A}_A + \mathcal{A}_B$ for every pair of disjoint sets $A$, $B$ in $\mathcal{B}_b(\mathbb{R}^n)$;
(ii) $\mathcal{A}_{A+z} = \mathcal{A}_A$ for every set $A$ in $\mathcal{B}_b(\mathbb{R}^n)$ and every $z$ in $L\mathbb{Z}^n$.
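Then, for any bounded convex Borel subset $B$ of $\mathbb{R}^n$,
$$\lim_{h\to+\infty} \frac{\mathcal{A}_{hB}}{\operatorname{meas}(hB)} = \frac{\mathcal{A}_Q}{\operatorname{meas}(Q)}.$$
Consequently, if $v$ belongs to $L^p(Q, \mathbb{R}^m)$ and $\tilde v$ denotes its $Q$-periodic extension, then the sequence $v_h$ defined by $v_h(x) = \tilde v(hx)$, $h \in \mathbb{N}$, converges weakly in $L^p_{\mathrm{loc}}(\mathbb{R}^n, \mathbb{R}^m)$ (weakly* if $p = +\infty$) to its mean value $\frac{1}{\operatorname{meas}(Q)}\int_Q v(x)\,dx$.

Proof. When $B = Q$, the first assertion is a straightforward consequence of the following decomposition:
$$hQ = \bigcup_{z\in L\mathbb{Z}^n \cap hQ} (z + Q).$$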
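For the proof of the general case, see [106]. We now turn to the proof of the second assertion. Let $U$ be an open ball of $\mathbb{R}^n$; then $(v_h)$ is bounded in $L^p(U, \mathbb{R}^m)$. Indeed,
$$\frac{1}{\operatorname{meas}(U)}\int_U |v_h|^p\,dx = \frac{1}{\operatorname{meas}(hU)}\int_{hU} |\tilde v|^p\,dx,$$
which converges thanks to the first assertion. Therefore, in the case $p > 1$, there exist a subsequence of $(v_h)_{h\in\mathbb{N}}$ (not relabeled) and $u \in L^p(U, \mathbb{R}^m)$ such that $v_h \rightharpoonup u$ weakly in $L^p(U, \mathbb{R}^m)$. To identify the weak limit $u$, we work in the space of measures $\mathcal{M}(U, \mathbb{R}^m)$. Obviously $v_h\,\mathcal{L}^n \lfloor U \rightharpoonup u\,\mathcal{L}^n \lfloor U$ weakly in $\mathcal{M}(U, \mathbb{R}^m)$. For a.e. $x_0$ in $U$ and for $\rho$ in $(0, 1) \setminus \mathcal{N}$, where $\mathcal{N}$ is a countable subset of $\mathbb{R}$ (see Lemma 4.2.2), according to the first assertion we have
$$\begin{aligned}
u(x_0) &= \lim_{\rho\to 0} \frac{1}{\operatorname{meas}(B_\rho(x_0))}\int_{B_\rho(x_0)} u(x)\,dx = \lim_{\rho\to 0}\lim_{h\to+\infty} \frac{1}{\operatorname{meas}(B_\rho(x_0))}\int_{B_\rho(x_0)} v_h(x)\,dx\\
&= \lim_{\rho\to 0}\lim_{h\to+\infty} \frac{1}{\operatorname{meas}(hB_\rho(x_0))}\int_{hB_\rho(x_0)} \tilde v(x)\,dx = \frac{1}{\operatorname{meas}(Q)}\int_Q v(x)\,dx.
\end{aligned}$$
It remains to treat the case $p = 1$. We establish the uniform integrability of the sequence $(v_h)_{h\in\mathbb{N}}$ in $L^1(U, \mathbb{R}^m)$; the conclusion then follows by the Dunford–Pettis theorem, Theorem 2.4.5, and by the above procedure for identifying the weak limit. We use a truncation argument. Let $\delta > 0$ be intended to go to $+\infty$ and set
$$v_{h,\delta} = v_h \wedge \delta \vee (-\delta), \qquad v_\delta = v \wedge \delta \vee (-\delta), \qquad w_{h,\delta} = v_h - v_{h,\delta}, \qquad w_\delta = v - v_\delta.$$
Note that $v_{h,\delta} = (v_\delta)_h$. For every Borel subset $A$ of $U$, we have
$$\int_A |v_h|\,dx \le \int_U |w_{h,\delta}|\,dx + \int_A |v_{h,\delta}|\,dx. \tag{13.20}$$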
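On the other hand, according to the first assertion,
$$\lim_{h\to+\infty} \frac{1}{\operatorname{meas}(U)}\int_U |w_{h,\delta}|\,dx = \frac{1}{\operatorname{meas}(Q)}\int_Q |w_\delta|\,dx.$$
Then, given $\varepsilon > 0$, since there exists $\delta$ large enough such that
$$\frac{1}{\operatorname{meas}(Q)}\int_Q |w_\delta|\,dx < \frac{\varepsilon}{4},$$
there exists $h(\varepsilon) \in \mathbb{N}$ such that
$\int_U |w_{h,\delta}|\,dx$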
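1,
$$\sup_{n\in\mathbb{N}} \Big\{ \|u_n\|_\infty + \int_\Omega |\nabla u_n|^p\,dx + \mathcal{H}^{N-1}(S_{u_n}) \Big\} < +\infty.$$
Then there exist a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ and a function $u$ in $SBV(\Omega)$ such that
$$u_{n_k} \to u \text{ strongly in } L^1_{\mathrm{loc}}(\Omega); \qquad \nabla u_{n_k} \rightharpoonup \nabla u \text{ weakly in } L^p(\Omega, \mathbb{R}^N); \qquad \mathcal{H}^{N-1}(S_u) \le \liminf_{k\to+\infty} \mathcal{H}^{N-1}(S_{u_{n_k}}).$$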
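Moreover, the Lebesgue part and the jump part of the derivatives converge separately. More precisely, $\nabla u_{n_k} \rightharpoonup \nabla u$ weakly in $L^1(\Omega)$ and $Ju_{n_k} \rightharpoonup Ju$ weakly in $\mathcal{M}(\Omega, \mathbb{R}^N)$.

Proof. In what follows $C$ denotes various constants which do not depend on $n$. Thanks to the inequalities
$$\int_\Omega |\nabla u_n|^p\,dx \le C \qquad \text{and} \qquad \|u_n\|_\infty \le C,$$
we have $\|u_n\|_{BV(\Omega)} \le C$. Indeed, according to Remark 10.3.4, for $\mathcal{H}^{N-1}$-a.e. $x$ in $S_{u_n}$, $|u_n^+(x)| \le \|u_n\|_{L^\infty(\Omega)} \le C$ and $|u_n^-(x)| \le \|u_n\|_{L^\infty(\Omega)} \le C$, so that the jump part satisfies $|Ju_n|(\Omega) \le 2C\,\mathcal{H}^{N-1}(S_{u_n}) \le C$. From the compactness of the embedding of $BV(U)$ into $L^1(U)$ for each regular open subset $U$ of $\Omega$, there exist a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ and $u$ in $BV(\Omega)$ such that $u_{n_k} \rightharpoonup u$ weakly in $BV(\Omega)$ and $u_{n_k} \to u$ strongly in $L^1_{\mathrm{loc}}(\Omega)$.

Let us show that $u \in SBV(\Omega)$. Since $u_{n_k} \in SBV(\Omega)$, according to Theorem 10.5.1, there exists a Borel measure $\mu_{n_k}$ in $\mathcal{M}(\Omega \times \mathbb{R}, \mathbb{R}^N)$ such that for all $\Phi \in C^1_c(\Omega, \mathbb{R}^N)$ and all $\varphi \in C^1_0(\mathbb{R})$,
$$\int_{\Omega\times\mathbb{R}} \varphi(s)\,\Phi(x)\,\mu_{n_k}(dx, ds) = -\int_\Omega \big(\varphi'(u_{n_k})\,\nabla u_{n_k}\cdot\Phi(x) + \varphi(u_{n_k})\operatorname{div}\Phi(x)\big)\,dx,$$
$$|\mu_{n_k}|(\Omega \times \mathbb{R}) = 2\,\mathcal{H}^{N-1}(S_{u_{n_k}}).$$
The second equality and the hypothesis $\mathcal{H}^{N-1}(S_{u_{n_k}}) \le C$ yield the existence of a Borel measure $\mu$ in $\mathcal{M}(\Omega \times \mathbb{R}, \mathbb{R}^N)$ and of a subsequence (not relabeled) of $(\mu_{n_k})_{k\in\mathbb{N}}$ converging weakly to $\mu$ in $\mathcal{M}(\Omega \times \mathbb{R}, \mathbb{R}^N)$. On the other hand, for a further subsequence (not relabeled), there exists $a \in L^1(\Omega, \mathbb{R}^N)$ such that $\nabla u_{n_k}$ converges weakly to $a$ in $L^1(\Omega, \mathbb{R}^N)$. Passing to the limit in the first equality, we obtain
$$\int_{\Omega\times\mathbb{R}} \varphi(s)\,\Phi(x)\,\mu(dx, ds) = -\int_\Omega \big(\varphi'(u)\,a\cdot\Phi(x) + \varphi(u)\operatorname{div}\Phi(x)\big)\,dx,$$
which, by Theorem 10.5.1, yields $u \in SBV(\Omega)$ and $a = \nabla u$. On the other hand, by the lower semicontinuity of the total variation,
$$2\,\mathcal{H}^{N-1}(S_u) = |\mu|(\Omega \times \mathbb{R}) \le \liminf_{k\to+\infty} |\mu_{n_k}|(\Omega \times \mathbb{R}) = \liminf_{k\to+\infty} 2\,\mathcal{H}^{N-1}(S_{u_{n_k}}).$$
Finally, since $Du_{n_k}$ converges weakly to $Du$ in $\mathcal{M}(\Omega, \mathbb{R}^N)$ and $\nabla u_{n_k}$ converges weakly to $\nabla u$ in $L^1(\Omega, \mathbb{R}^N)$, we have that $Ju_{n_k} = Du_{n_k} - \nabla u_{n_k}\,\mathcal{L}^N$ converges weakly to $Ju = Du - \nabla u\,\mathcal{L}^N$ in $\mathcal{M}(\Omega, \mathbb{R}^N)$. When $(u_n)_{n\in\mathbb{N}}$ is not bounded in $L^\infty(\Omega)$, we obtain the same result provided that $(u_n)_{n\in\mathbb{N}}$ is bounded in $BV(\Omega)$. Note also that the boundedness of $(\nabla u_n)_{n\in\mathbb{N}}$ in $L^p(\Omega)$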
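with $p > 1$ implies the equi-integrability of $(\nabla u_n)_{n\in\mathbb{N}}$. In this sense, the next theorem generalizes Theorem 13.4.3.

Theorem 13.4.4. Let $(u_n)_n$ be a sequence in $SBV(\Omega)$ satisfying
(i) $\sup_{n\in\mathbb{N}} \|u_n\|_{BV(\Omega)} < +\infty$;
(ii) the approximate gradients $\nabla u_n$ are equi-integrable (i.e., $(\nabla u_n)_{n\in\mathbb{N}}$ is relatively compact with respect to the weak topology of $L^1(\Omega, \mathbb{R}^N)$);
(iii) the sequence $\big(\mathcal{H}^{N-1}(S_{u_n})\big)_{n\in\mathbb{N}}$ is bounded.
Then there exists a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ converging weakly to some $u \in SBV(\Omega)$ such that
$$u_{n_k} \to u \text{ strongly in } L^1_{\mathrm{loc}}(\Omega); \qquad \nabla u_{n_k} \rightharpoonup \nabla u \text{ weakly in } L^1(\Omega, \mathbb{R}^N);$$
$$Ju_{n_k} \rightharpoonup Ju \text{ weakly in } \mathcal{M}(\Omega, \mathbb{R}^N); \qquad \mathcal{H}^{N-1}(S_u) \le \liminf_{k\to+\infty} \mathcal{H}^{N-1}(S_{u_{n_k}}).$$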
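Hypothesis (ii) is exactly what fails for the Cantor–Vitali example of Section 10.4 (see Remark 13.4.1). The exact-arithmetic Python sketch below uses the standard piecewise-linear approximants $u_n$ of the Cantor–Vitali function, defined recursively; this is our choice and may differ in detail from the sequence of Section 10.4. On each of the $2^n$ intervals of the $n$-th Cantor stage $C_n$ the slope of $u_n$ is $(3/2)^n$ (and $u_n$ is constant on the gaps), so $\int_0^1 \nabla u_n\,dx$ stays equal to $1$ while $\operatorname{meas}(C_n) = (2/3)^n \to 0$: a unit mass concentrates on sets of vanishing measure, which rules out equi-integrability.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

def u(n, x):
    """n-th piecewise-linear approximant of the Cantor-Vitali function (exact arithmetic)."""
    if n == 0:
        return x
    if x <= THIRD:
        return u(n - 1, 3 * x) / 2
    if x < 2 * THIRD:
        return Fraction(1, 2)
    return Fraction(1, 2) + u(n - 1, 3 * x - 2) / 2

def cantor_stage(n):
    """The 2**n closed intervals of length 3**-n forming C_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

for n in range(1, 7):
    C = cantor_stage(n)
    slopes = {(u(n, b) - u(n, a)) / (b - a) for a, b in C}
    mass = sum(u(n, b) - u(n, a) for a, b in C)   # integral of u_n' over C_n
    meas = sum(b - a for a, b in C)
    print(n, slopes, mass, meas)   # {(3/2)**n}, 1, (2/3)**n
```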
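Proof. According to Theorem 2.4.5, the equi-integrability condition (ii) yields the existence of a subsequence of $(\nabla u_n)_{n\in\mathbb{N}}$ and of $a$ in $L^1(\Omega, \mathbb{R}^N)$ such that $\nabla u_n \rightharpoonup a$ weakly in $L^1(\Omega, \mathbb{R}^N)$. We then argue as in the proof of Theorem 13.4.3 and adopt the same notation. We only have to justify the convergence of
$$\int_\Omega \varphi'(u_{n_k})\,\nabla u_{n_k}\cdot\Phi(x)\,dx \qquad \text{to} \qquad \int_\Omega \varphi'(u)\,a\cdot\Phi(x)\,dx.$$
Since $\varphi'(u_{n_k})$ converges a.e. to $\varphi'(u)$ (up to a further subsequence), Egorov's theorem gives, for every $\varepsilon > 0$, a Borel subset $\Omega_\varepsilon$ of $\Omega$ such that $\mathcal{L}^N(\Omega \setminus \Omega_\varepsilon) < \varepsilon$ and $\lim_{k\to+\infty} \sup_{x\in\Omega_\varepsilon} |\varphi'(u_{n_k}(x)) - \varphi'(u(x))| = 0$. Let us write
$$\int_\Omega \varphi'(u_{n_k})\,\nabla u_{n_k}\cdot\Phi(x)\,dx = \int_{\Omega_\varepsilon} \varphi'(u_{n_k})\,\nabla u_{n_k}\cdot\Phi(x)\,dx + \int_{\Omega\setminus\Omega_\varepsilon} \varphi'(u_{n_k})\,\nabla u_{n_k}\cdot\Phi(x)\,dx.$$
Since moreover $\sup_{k\in\mathbb{N}} \int_\Omega |\nabla u_{n_k}|\,dx < +\infty$, one easily obtains that the first term on the right-hand side tends to
$$\int_{\Omega_\varepsilon} \varphi'(u)\,a\cdot\Phi(x)\,dx.$$
Letting $\varepsilon \to 0$, the conclusion then follows, provided that we establish
$$\lim_{\varepsilon\to 0}\,\limsup_{k\to+\infty} \Big| \int_{\Omega\setminus\Omega_\varepsilon} \varphi'(u_{n_k})\,\nabla u_{n_k}\cdot\Phi(x)\,dx \Big| = 0,$$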
which is a straightforward consequence of
$$\Big| \int_{\Omega\setminus\Omega_\varepsilon} \varphi'(u_{n_k})\,\nabla u_{n_k}\cdot\Phi(x)\,dx \Big| \le C\int_{\Omega\setminus\Omega_\varepsilon} |\nabla u_{n_k}|\,dx$$
and of the equi-integrability of $(\nabla u_n)_{n\in\mathbb{N}}$.

Remark 13.4.1. If either of the two conditions (ii) and (iii) is not satisfied, the conclusion may fail. Indeed, in the Cantor–Vitali example of Section 10.4, $u_n$ belongs to $SBV(0,1)$, converges weakly to the Cantor–Vitali function $u$ in $BV(0,1)$, and $\mathcal{H}^0(S_{u_n}) = 0$. Nevertheless, $(\nabla u_n)_{n\in\mathbb{N}}$ is not equi-integrable, since $\int_{(0,1)} \nabla u_n\,dx = 1$ does not converge to $\int_{(0,1)} \nabla u\,dx = 0$. With the notation of this example, consider now $v_n$ in $SBV(0,1)$ defined by $v_n = u_n\,1_{(0,1)\setminus C_n}$. It is easily seen that $v_n$ converges weakly to $u$ in $BV(0,1)$ and that $\nabla v_n = 0$ is obviously equi-integrable. But $\mathcal{H}^0(S_{v_n}) = 2(2^n - 1)$ is not uniformly bounded. Condition (iii) of Theorem 13.4.4 can be weakened to the following condition:
(iii)′ there exists a function $\psi : [0, +\infty) \to [0, +\infty]$ such that $\psi(t)/t \to +\infty$ as $t \to 0$ and
$$\sup_{n\in\mathbb{N}} \int_{S_{u_n}} \psi\big(|u_n^+ - u_n^-|\big)\,d\mathcal{H}^{N-1} < +\infty.$$
For a proof, consult Braides [82].

Remark 13.4.2. Theorem 13.4.4 obviously holds in the vectorial case, i.e., when the sequences considered belong to $SBV(\Omega, \mathbb{R}^m)$. We now deal with the lower semicontinuity of functionals of the form
$$\int_\Omega f(\nabla u)\,dx + \int_{S_u} g(u^+, u^-)\,h(\nu_u)\,d\mathcal{H}^{N-1},$$
where $f$, $g$, and $h$ satisfy suitable conditions.

Theorem 13.4.5. Let $f : \mathbb{R}^N \to \mathbb{R}_+$ be a function satisfying the De La Vallée–Poussin criterion: $f$ is convex and
$$\lim_{|a|\to+\infty} \frac{f(a)}{|a|} = +\infty.$$
Let moreover $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ be a lower semicontinuous, symmetric, and subadditive function, i.e.,
$$g(a, b) = g(b, a) \le g(b, c) + g(c, a) \qquad \forall a, b, c \in \mathbb{R},$$
and assume that $g(a, b) \ge \max\big(\psi(|a - b|), \delta|a - b|\big)$ for all $a, b \in \mathbb{R}$, where the function $\psi : [0, +\infty) \to [0, +\infty]$ satisfies $\psi(t)/t \to +\infty$ as $t \to 0$, and $\delta$ is some positive constant. Let finally $h : \mathbb{R}^N \to [0, +\infty)$ be a convex, even function, positively homogeneous of degree 1 and satisfying, for all $\nu \in \mathbb{R}^N$, $h(\nu) \ge c|\nu|$ for some positive constant $c$. Then
the functional defined in L1 () by f (∇u) dx + g(u+ , u− )h(νu ) dHN −1 Su if u ∈ SBV (), F (u) =
+∞ otherwise
is lower semicontinuous for the strong topology of L1 (). If the lower semicontinuous, symmetric, and subadditive function g only satisfies the condition g(a, b) ≥ ψ(|a − b|), then, given a nonempty compact subset K of R, the functional F˜ defined in L1 () by f (∇u) dx + g(u+ , u− )h(νu ) dHN −1 Su if u ∈ SBV () and u(x) ∈ K, F˜ (u) =
+∞ otherwise
is lower semicontinuous for the strong topology of L1 (). Sketch of the proof. The proof is based on the following lemma. Lemma 13.4.1. Let g : R × R → R+ be a lower semicontinuous, symmetric, and subadditive function and h : RN → [0, +∞) be a convex, even function, positively homogeneous of degree 1. Then − N −1 g(u+ Sun g(u+ , u− )h(ν) dHN−1 Su ≤ lim inf n , un )h(νun )) dH n→+∞
whenever $u_n$ and $u$ satisfy the hypotheses of Theorem 13.4.4 (with condition (iii) or (iii)'). For the proof of Lemma 13.4.1, consult Braides [82, Theorem 2.12].

Let $(u_n)_{n\in\mathbb N}$ be a sequence strongly converging to some $u$ in $L^1(\Omega)$ and such that $\liminf_{n\to+\infty}F(u_n)<+\infty$. From the De La Vallée–Poussin criterion, there exists a subsequence of $(u_n)_{n\in\mathbb N}$ (not relabeled) such that $(\nabla u_n)_{n\in\mathbb N}$ weakly converges in $L^1(\Omega,\mathbb R^N)$. On the other hand, according to the coercivity assumption on $g$,
$$\sup_{n\in\mathbb N}\int_{S_{u_n}}|u_n^+-u_n^-|\,d\mathcal H^{N-1}<+\infty.$$
The sequence $(u_n)_{n\in\mathbb N}$ is then bounded in $BV(\Omega)$. Moreover, from the assumption made on $\psi$, condition (iii)' of Remark 13.4.1 is satisfied. Thus conditions (i), (ii), and (iii)' of Theorem 13.4.4 are fulfilled, and the conclusion follows from the convexity of $f$ and Lemma 13.4.1.

The proof of the lower semicontinuity of the functional $\tilde F$ follows the same scheme. The bound
$$\sup_{n\in\mathbb N}\int_{S_{u_n}}|u_n^+-u_n^-|\,d\mathcal H^{N-1}<+\infty$$
is now satisfied thanks to $|u_n^+(x)|\le\|u_n\|_{L^\infty(\Omega)}\le C$ and $|u_n^-(x)|\le\|u_n\|_{L^\infty(\Omega)}\le C$ for $\mathcal H^{N-1}$-a.e. $x$ in $S_{u_n}$ (cf. Remark 10.3.4).
Remark 13.4.3. In the vectorial case, Theorem 13.4.5 holds with the same conditions on $g$ and $h$ (now $g:\mathbb R^m\times\mathbb R^m\to\mathbb R^+$) when $f:\mathbb R\times\mathbb M^{m\times N}\to\mathbb R$ is quasi-convex and satisfies the growth conditions of order $p>1$:
$$\alpha|A|^p\le f(x,A)\le\beta(1+|A|^p)\quad\text{for all } A\in\mathbb M^{m\times N},$$
for some positive constants $\alpha$ and $\beta$. For a proof, consult Ambrosio [15]. This result will be essential in Section 14.2 to establish the existence of a solution for the weak Griffith model in the framework of fracture mechanics.
Chapter 14
Application in mechanics and computer vision
14.1 Problems in pseudoplasticity
14.1.1 Introduction
This section is devoted to the study of the equilibrium of a three-dimensional elastoplastic material occupying a bounded domain $\Omega\subset\mathbb R^3$ as reference configuration and subjected to body and surface forces. The unknown displacement vector field $u$ solves a minimization problem of the form
$$\inf\Big\{\int_\Omega W(\varepsilon(v))\,dx - L(v) : v\in\mathcal A\Big\},$$
where $\varepsilon(v)$ denotes the linearized strain tensor $\varepsilon_{i,j}(v) = \frac12\Big(\frac{\partial v_i}{\partial x_j}+\frac{\partial v_j}{\partial x_i}\Big)$. The linear mapping $v\mapsto L(v)$ accounts for the exterior loading and is of the form
$$L(v) = \int_\Omega f\cdot v\,dx + \int_{\Gamma_1} g\cdot v\,d\mathcal H^2,$$
where $f$ denotes the body forces and $g$ the surface forces on a part $\Gamma_1$ of the boundary. The body is assumed to be clamped on $\Gamma_0 = \partial\Omega\setminus\Gamma_1$, with $\mathcal H^2(\Gamma_0)>0$. We denote the space of $3\times3$ symmetric matrices by $\mathbb M_S$ and its subspace of trace-free matrices by $\mathbb M_S^D$, so that $\mathbb M_S = \mathbb M_S^D\oplus\mathbb R I$. The constitutive equation of the material is such that the restriction $W^D$ of the stored energy density $W$ to $\mathbb M_S^D$ has linear growth at infinity. The admissible set $\mathcal A$ of displacement fields is a subset of the space $\{v\in LD(\Omega) : v = 0 \text{ on } \Gamma_0\}$, where
$$LD(\Omega) = \{v\in L^1(\Omega,\mathbb R^3) : \varepsilon(v)\in L^1(\Omega,\mathbb M_S)\}.$$
From a mathematical point of view, due to the linear growth of $W^D$ at infinity, two difficulties may appear:
• The value of the infimum may be $-\infty$. This problem leads to the theory of yield design (or limit load), which consists in analyzing the set of $\lambda\in\mathbb R^+$ such that
$$\inf\Big\{\int_\Omega W(\varepsilon(v))\,dx - \lambda L(v) : v\in\mathcal A\Big\}>-\infty.$$
From the mechanical point of view, this analysis predicts the load capacity of the structure. For more details about limit analysis, see Section 15.4.
• Even if the infimum is finite, the problem generally has no solution. This is not surprising from a mechanical point of view, since the observed displacements are sometimes discontinuous across surfaces. A well-adapted space must therefore contain admissible displacements that are discontinuous across two-dimensional surfaces. This space, derived from $BV(\Omega,\mathbb R^3)$ by letting the measure $\varepsilon(v)$ play the role of the measure $Dv$, is precisely
$$BD(\Omega) := \{v\in L^1(\Omega,\mathbb R^3) : \varepsilon(v)\in\mathcal M(\Omega,\mathbb M_S)\}.$$
It will be described in detail in Subsection 14.1.3. The integral functional $\int_\Omega W(\varepsilon(v))$ of the measure $\varepsilon(v)$ will be defined in Subsection 14.1.2. Unfortunately, the new problem
$$\inf\Big\{\int_\Omega W(\varepsilon(v)) - L(v) : v\in\tilde{\mathcal A}\Big\},$$
where now the set $\tilde{\mathcal A}$ of admissible displacements is a subset of the space $\{v\in BD(\Omega) : v = 0 \text{ on }\Gamma_0\}$, generally has no solution either. Indeed, plastification phenomena may appear on the boundary, and discontinuities are sometimes observed on the part $\Gamma_0$ of the boundary. The boundary condition $v = 0$ in the trace sense must be replaced by a surface energy of the form
$$\int_{\Gamma_0}W^\infty(\gamma_0(v)_\tau\otimes_s\nu)\,d\mathcal H^2.$$
We have denoted by $\gamma_0$ the trace operator and by $\nu$ the exterior unit normal to $\Gamma_0$. The symmetric matrix field $\gamma_0(v)_\tau\otimes_s\nu$ will be defined further. Roughly, the function $W^\infty$ describes the behavior of $W$ at infinity along the straight lines generated by $\gamma_0(v)_\tau\otimes_s\nu$. The set of admissible displacement fields, still denoted by $\tilde{\mathcal A}$, is now a subset of $BD(\Omega)$ whose elements satisfy a weaker boundary condition, described further. The mathematical theory of relaxation introduced in Chapter 11 allows us to sum up this discussion as follows:
$$\inf\Big\{\int_\Omega W(\varepsilon(v)) + \int_{\Gamma_0}W^\infty(\gamma_0(v)_\tau\otimes_s\nu)\,d\mathcal H^2 - L(v) : v\in\tilde{\mathcal A}\Big\}\qquad(\overline{\mathcal P})$$
is the relaxed problem of
$$\inf\Big\{\int_\Omega W(\varepsilon(v))\,dx - L(v) : v\in\mathcal A\Big\},\qquad(\mathcal P)$$
that is,
$$v\mapsto\int_\Omega W(\varepsilon(v)) + \int_{\Gamma_0}W^\infty(\gamma_0(v)_\tau\otimes_s\nu)\,d\mathcal H^2 + \mathrm{Ind}_{\tilde{\mathcal A}}$$
is the lower semicontinuous envelope of $v\mapsto\int_\Omega W(\varepsilon(v))\,dx + \mathrm{Ind}_{\mathcal A}$ when $BD(\Omega)$ is equipped with its weak convergence. Moreover, $\inf(\mathcal P) = \min(\overline{\mathcal P})$. The next subsections are devoted to a precise description and a proof of this relaxation scheme.
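A one-dimensional toy analogue shows why the clamping condition has to be relaxed into a surface energy. In the sketch below (our own illustration, not taken from the text) $\Omega = (0,1)$, $W(s) = |s|$, so that $W^\infty = W$, the body is clamped at $x = 0$, and $u_n(x) = \min(nx,1)$. Every $u_n$ satisfies $u_n(0) = 0$ and has energy $\int_0^1|u_n'|\,dx = 1$, yet $u_n\to1$ in $L^1(0,1)$, a limit violating the clamping condition. The relaxed functional keeps the bulk term $\int_0^1|u'| = 0$ and charges the boundary jump $|\gamma_0(u) - 0| = 1$, which recovers the limit energy.

```python
def ramp(n, x):
    """u_n(x) = min(n x, 1): a W^{1,1}(0, 1) function with u_n(0) = 0."""
    return min(n * x, 1.0)


def energies(n, cells=20_000):
    """Midpoint-rule values of (int_0^1 |u_n'| dx, ||u_n - 1||_{L^1(0,1)})."""
    h = 1.0 / cells
    total_variation = 0.0
    l1_distance = 0.0
    for i in range(cells):
        x = (i + 0.5) * h
        slope = float(n) if n * x < 1 else 0.0   # u_n' = n before the kink, 0 after
        total_variation += abs(slope) * h
        l1_distance += abs(ramp(n, x) - 1.0) * h
    return total_variation, l1_distance


def relaxed_energy_of_limit():
    """Relaxed energy of the limit u = 1: no bulk term, boundary jump |1 - 0|."""
    bulk = 0.0
    boundary_jump = abs(1.0 - 0.0)
    return bulk + boundary_jump


for n in (5, 50):
    print(n, energies(n), relaxed_energy_of_limit())
```

The energy stays equal to 1 while the $L^1$ distance to the limit is $1/(2n)$, so the whole energy concentrates at the clamped end in the limit.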
14.1.2 The Hencky model

To illustrate the previous general considerations, we now describe the Hencky model. The reference configuration $\Omega$ is assumed to have a boundary of class $C^1$. Let $\lambda$ and $\mu$ be two given positive constants, namely, the Lamé coefficients of the material, and set $k = \lambda + 2\mu/3$, the compression stiffness. The constitutive equation of the material is such that there exists a potential $W$ of the form
$$W(E) = W^D(E^D) + \frac k2(\operatorname{tr}E)^2$$
for all $E = E^D + \frac13(\operatorname{tr}E)\,I$ in $\mathbb M_S = \mathbb M_S^D\oplus\mathbb R I$. More precisely, the density $W^D$ is defined by $W^D(E^D) = \varphi(|E^D|)$, where $\varphi:\mathbb R^+\to\mathbb R$ has quadratic growth up to $\frac{k}{\mu\sqrt2}$ and linear growth beyond this threshold:
$$\varphi(s) = \begin{cases}\mu s^2 & \text{if } s\le\dfrac{k}{\mu\sqrt2},\\[2mm] k\sqrt2\,s - \dfrac{k^2}{2\mu} & \text{if } s\ge\dfrac{k}{\mu\sqrt2}.\end{cases}$$
The proof of Lemma 14.1.1 below is easy and is left to the reader.

Lemma 14.1.1. The function $W^D$ is convex and fulfills the three conditions (13.23), (13.24), and (13.25).

The set of admissible displacement fields is the set of fields with finite energy, that is,
$$\mathcal A = \{v\in LD(\Omega) : \operatorname{div}(v)\in L^2(\Omega),\ v = 0 \text{ on }\Gamma_0\},$$
and problem $(\mathcal P)$ is precisely
$$\inf\Big\{\int_\Omega W^D(\varepsilon^D(v))\,dx + \frac k2\int_\Omega(\operatorname{div}(v))^2\,dx - L(v) : v\in\mathcal A\Big\}.$$
Let us recall that, for a function $v:\Omega\to\mathbb R^N$, its divergence in the distributional sense is given by $\operatorname{div}v := \sum_{i=1}^N\frac{\partial v_i}{\partial x_i}$. The boundary condition $v = 0$ on $\Gamma_0$ must be taken in the trace sense. The trace operator is indeed well defined from $LD(\Omega)$ into $L^1(\Gamma_0,\mathbb R^3)$. For a proof, it suffices to adapt the trace Theorem 10.2.1 (see Remark 10.2.2 or, for instance, Temam [219]).

We now define the relaxed problem. For all $a$ in $\mathbb R^3$ and every unit vector $\nu$ in $\mathbb R^3$, we denote the tangential and normal components of $a$ relative to $\nu$ by $a_\tau$ and $a_\nu$, respectively. In other words, $a_\nu = a\cdot\nu$ and $a_\tau = a - (a\cdot\nu)\nu$, where $a\cdot\nu$ denotes the scalar product of $a$ and $\nu$ in $\mathbb R^3$. For all $a$ and $b$ in $\mathbb R^3$, we define their symmetric tensor product by $a\otimes_s b := \frac12(a_ib_j + a_jb_i)_{i,j}$. The relaxed problem is precisely
$$\inf\Big\{\int_\Omega W^D(\varepsilon^D(v)) + \frac k2\int_\Omega(\operatorname{div}(v))^2\,dx + \int_{\Gamma_0}W^\infty(\gamma_0(v)_\tau\otimes_s\nu)\,d\mathcal H^2 - L(v) : v\in\tilde{\mathcal A}\Big\},\qquad(\overline{\mathcal P})\quad(14.1)$$
where $\gamma_0$ is the trace operator from $BD(\Omega)$ into $L^1(\partial\Omega,\mathbb R^3)$ and the set of admissible displacements is
$$\tilde{\mathcal A} = \{v\in BD(\Omega) : \operatorname{div}v\in L^2(\Omega),\ v_\nu = 0 \text{ on }\Gamma_0\}.$$
Note that the boundary condition $v = 0$ on $\Gamma_0$ (taken in the trace sense) in problem $(\mathcal P)$ has been relaxed, in problem $(\overline{\mathcal P})$, into the surface energy
$$\int_{\Gamma_0}W^\infty(\gamma_0(v)_\tau\otimes_s\nu)\,d\mathcal H^2$$
and the weaker boundary condition $v_\nu = 0$ on $\Gamma_0$. This last condition must be taken in the trace sense $\gamma_\nu(v) = 0$, where $\gamma_\nu$ is a linear continuous operator from the space $\{v\in L^1(\Omega,\mathbb R^3) : \operatorname{div}v\in L^2(\Omega)\}$ into the dual $C^1(\partial\Omega)'$ of $C^1(\partial\Omega)$. The existence of this trace operator $\gamma_\nu$ will be established in Theorem 14.1.3. The integral $\int_\Omega W^D(\varepsilon^D(v))$ must be taken in the sense of measures defined in Sections 13.3 and 13.4.1. Let us recall that the measure $W^D(\varepsilon^D(v))$ denotes the Borel measure $W^D(e^D)\mathcal L^3 + (W^D)^\infty(\varepsilon^D(v)^S)$, where $e^D\mathcal L^3 + \varepsilon^D(v)^S$ is the Lebesgue–Nikodym decomposition of the measure $\varepsilon^D(v)$, and that the recession function $(W^D)^\infty$ of $W^D$ is defined for all $E$ in $\mathbb M_S^D$ by
$$(W^D)^\infty(E) = \lim_{t\to+\infty}\frac{W^D(tE)}{t}.$$
Consequently, by definition,
$$\int_\Omega W^D(\varepsilon^D(v)) := \int_\Omega W^D(e^D)\,dx + \int_\Omega(W^D)^\infty(\varepsilon^D(v)^S).$$
When the singular part $\varepsilon^D(v)^S$ of $\varepsilon^D(v)$ vanishes, we also denote the measure $e^D\mathcal L^3$ by $\varepsilon^D(v)\mathcal L^3$. In the same spirit, if $e\mathcal L^3 + \varepsilon(v)^S$ is the Lebesgue–Nikodym decomposition of the measure $\varepsilon(v)$, we denote the measure $e\mathcal L^3$ by $\varepsilon(v)\mathcal L^3$ when $\varepsilon(v)^S = 0$.
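The Hencky density and its recession function can be checked numerically. The sketch below assumes that $|E^D|$ is the Euclidean (Hilbert–Schmidt) norm and uses the illustrative values $k = 2$, $\mu = 1$ (not from the text). It evaluates $\varphi$, confirms that the two branches and their slopes agree at the threshold $s_0 = k/(\mu\sqrt2)$, and approximates $(W^D)^\infty(E)$ by $W^D(tE)/t$ for a large $t$. Since $W^D(tE)/t = k\sqrt2\,|E| - k^2/(2\mu t)$ once $t|E|\ge s_0$, the limit is $k\sqrt2\,|E|$.

```python
import math


def hencky_phi(s, k, mu):
    """phi(s) = mu s^2 up to s0 = k / (mu sqrt 2), then k sqrt(2) s - k^2 / (2 mu)."""
    s0 = k / (mu * math.sqrt(2.0))
    if s <= s0:
        return mu * s * s
    return k * math.sqrt(2.0) * s - k * k / (2.0 * mu)


def frobenius(E):
    """Euclidean (Hilbert-Schmidt) norm |E| of a 3x3 matrix given as nested lists."""
    return math.sqrt(sum(x * x for row in E for x in row))


def w_dev(E, k, mu):
    """W^D(E) = phi(|E|) on trace-free symmetric matrices (choice of norm assumed)."""
    return hencky_phi(frobenius(E), k, mu)


def recession(E, k, mu, t=1e8):
    """Finite-t approximation of (W^D)^infty(E) = lim_{t -> +inf} W^D(t E) / t."""
    return w_dev([[t * x for x in row] for row in E], k, mu) / t


k, mu = 2.0, 1.0
s0 = k / (mu * math.sqrt(2.0))
E = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]   # trace-free, |E| = sqrt(2)
print(recession(E, k, mu), k * math.sqrt(2.0) * frobenius(E))
```

The matching of values and slopes at $s_0$ is what makes $\varphi$ of class $C^1$ and convex, in line with Lemma 14.1.1.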
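14.1.3 The spaces $BD(\Omega)$, $M(\operatorname{div})$, and $U(\Omega)$

Unless otherwise specified, $\Omega$ is, for the moment, a bounded open subset of $\mathbb R^3$. As said before, a well-adapted space for relaxing the above model is the space defined below.

Definition 14.1.1. The subspace
$$BD(\Omega) := \{v\in L^1(\Omega,\mathbb R^3) : \varepsilon(v)\in\mathcal M(\Omega,\mathbb M_S)\}$$
of $L^1(\Omega,\mathbb R^3)$ is called the space of bounded deformations. The measure $\varepsilon(v)\in C_0(\Omega,\mathbb M_S)'$ is defined by its action on all $\varphi$ in $C_0(\Omega,\mathbb M_S)$:
$$\langle\varepsilon(v),\varphi\rangle = \sum_{i,j}\langle\varepsilon(v)_{i,j},\varphi_{i,j}\rangle,$$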
where the brackets in the right-hand side denote the action of the signed measure $\varepsilon(v)_{i,j}$ on the scalar function $\varphi_{i,j}$ for the duality $(C_0(\Omega)',C_0(\Omega))$.

Remark 14.1.1. The action of $\varepsilon(v)$ on $\varphi$ will also be written as $\int_\Omega\varphi\,\varepsilon(v)$, an expression which is also well defined for bounded $|\varepsilon(v)|$-integrable functions $\varphi$. Note that, when $v$ is regular (for instance, when $v$ belongs to $W^{1,1}(\Omega,\mathbb R^3)$),
$$\int_\Omega\varphi\,\varepsilon(v) = \int_\Omega\varphi(x)\,\varepsilon(v)(x)\,dx := \int_\Omega\varphi(x):\varepsilon(v)(x)\,dx,$$
where, for two $3\times3$ matrices $A$ and $B$, $A:B$ denotes their Hilbert–Schmidt scalar product, defined by $A:B := \operatorname{trace}(A^TB)$. This is why we also write $\int_\Omega\varphi:\varepsilon(v)$ for the integral $\int_\Omega\varphi\,\varepsilon(v)$ with respect to the measure $\varepsilon(v)$. With these conventions, we leave it to the reader to adapt the definitions of the weak and intermediate convergences and the proof of the approximating Theorem 10.1.2 or its generalization, Theorem 13.4.1: it suffices to argue with the components of $u$ and to replace $Du$ by $\varepsilon(u)$ everywhere (see also Remark 10.2.2). Because of its importance in Subsection 14.1.4, we state the approximating theorem.

Theorem 14.1.1. Let $f:\mathbb M_S\to\mathbb R^+$ be a convex function satisfying (13.23) and (13.24). The space $C^\infty(\Omega,\mathbb R^3)\cap BD(\Omega)$ is dense in $BD(\Omega)$ equipped with the intermediate convergence associated with $f$. More precisely, for all $u$ in $BD(\Omega)$, there exists a sequence $(u_n)_{n\in\mathbb N}$ in $C^\infty(\Omega,\mathbb R^3)\cap BD(\Omega)$ such that
$$u_n\to u\ \text{strongly in } L^1(\Omega,\mathbb R^3);\quad\int_\Omega|\varepsilon(u_n)|\,dx\to\int_\Omega|\varepsilon(u)|;\quad\int_\Omega f(\varepsilon(u_n))\,dx\to\int_\Omega f(\varepsilon(u));\quad\int_\Omega f(\varepsilon^D(u_n))\,dx\to\int_\Omega f(\varepsilon^D(u)).$$
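The identity $\varphi:\varepsilon(v) = \varphi:\nabla v$ for symmetric $\varphi$, which is behind the Green formula stated next, can be checked on an affine field $v(x) = Gx$, for which $\nabla v = G$ and $\varepsilon(v) = \frac12(G+G^T)$. The sketch computes $A:B$ literally as $\operatorname{trace}(A^TB)$ and uses exact fractions; the matrices $\varphi$ and $G$ are arbitrary sample values of our own choosing.

```python
from fractions import Fraction


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def trace(A):
    return sum(A[i][i] for i in range(3))


def hs(A, B):
    """Hilbert-Schmidt product A : B := trace(A^T B)."""
    return trace(matmul(transpose(A), B))


def sym_part(G):
    """Strain of the affine field v(x) = G x: epsilon(v) = (G + G^T) / 2."""
    return [[Fraction(1, 2) * (G[i][j] + G[j][i]) for j in range(3)] for i in range(3)]


phi = [[1, 2, 0], [2, -1, 3], [0, 3, 4]]   # symmetric sample value of the test field
G = [[1, 5, -2], [0, 2, 7], [3, -4, 1]]    # gradient of v(x) = G x
skew = [[G[i][j] - sym_part(G)[i][j] for j in range(3)] for i in range(3)]
print(hs(phi, G), hs(phi, sym_part(G)), hs(phi, skew))
```

The skew-symmetric part of $\nabla v$ is orthogonal to every symmetric matrix, so it never contributes to $\varphi:\varepsilon(v)$.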
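In the same spirit, we state the trace theorem without proof.

Theorem 14.1.2. Let $\Omega$ be a Lipschitz open bounded subset of $\mathbb R^3$. There exists a linear continuous map $\gamma_0$ from $BD(\Omega)$ onto $L^1_{\mathcal H^2}(\partial\Omega,\mathbb R^3)$ satisfying
(i) for all $u\in C(\overline\Omega,\mathbb R^3)\cap BD(\Omega)$, $\gamma_0(u) = u|_{\partial\Omega}$;
(ii) for all $\varphi\in C^1(\overline\Omega,\mathbb M_S)$,
$$\int_\Omega\varphi:\varepsilon(u) = -\int_\Omega u\cdot\operatorname{div}\varphi\,dx + \int_{\partial\Omega}(\gamma_0(u)\otimes_s\nu):\varphi\,d\mathcal H^2,$$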
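where $\nu$ is the outer unit normal at $\mathcal H^2$-almost every $x$ on $\partial\Omega$ and $\operatorname{div}\varphi$ denotes the vector-valued function defined by $(\operatorname{div}\varphi)_i := \sum_{j=1}^3\frac{\partial\varphi_{i,j}}{\partial x_j}$, $i = 1,\dots,3$.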
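For a field $u\in C^1(\overline\Omega,\mathbb R^3)$, property (i) gives $\gamma_0(u) = u|_{\partial\Omega}$, and formula (ii) reduces to componentwise integration by parts combined with the symmetry of $\varphi$. We only sketch this smooth case; the passage to general $u\in BD(\Omega)$ is the content of the theorem and is not reproduced here.

```latex
% Smooth case: u \in C^1(\overline\Omega,\mathbb R^3), \varphi \in C^1(\overline\Omega,\mathbb M_S), \nu outer normal
\begin{aligned}
\int_\Omega \varphi:\varepsilon(u)\,dx
 &= \sum_{i,j}\int_\Omega \varphi_{i,j}\,\tfrac12\Big(\frac{\partial u_i}{\partial x_j}+\frac{\partial u_j}{\partial x_i}\Big)dx
  = \sum_{i,j}\int_\Omega \varphi_{i,j}\,\frac{\partial u_i}{\partial x_j}\,dx
  &&(\varphi_{i,j}=\varphi_{j,i})\\
 &= -\sum_{i,j}\int_\Omega u_i\,\frac{\partial \varphi_{i,j}}{\partial x_j}\,dx
    + \sum_{i,j}\int_{\partial\Omega} u_i\,\varphi_{i,j}\,\nu_j\,d\mathcal H^2
  &&\text{(Gauss--Green)}\\
 &= -\int_\Omega u\cdot\operatorname{div}\varphi\,dx
    + \int_{\partial\Omega} (u\otimes_s\nu):\varphi\,d\mathcal H^2 ,
  &&\Big(\textstyle\sum_{i,j} u_i\nu_j\varphi_{i,j} = (u\otimes_s\nu):\varphi\Big)
\end{aligned}
```

The last step uses the symmetry of $\varphi$ once more: $(u\otimes\nu):\varphi$ and $(u\otimes_s\nu):\varphi$ coincide when $\varphi\in\mathbb M_S$.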
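As a consequence of Green's formula (ii) above, one can adapt the first two examples of Section 10.2.

Example 14.1.1. Consider two disjoint Lipschitz open bounded subsets $\Omega_1$ and $\Omega_2$ of an open bounded subset $\Omega$ of $\mathbb R^3$ such that $\overline\Omega = \overline\Omega_1\cup\overline\Omega_2$, and set $\Gamma_{1,2} := \partial\Omega_1\cap\partial\Omega_2$, which is assumed to satisfy $\mathcal H^2(\Gamma_{1,2})>0$. We denote by $\gamma_1$ and $\gamma_2$ the trace operators from $BD(\Omega_1)$ onto $L^1_{\mathcal H^2}(\partial\Omega_1,\mathbb R^3)$ and from $BD(\Omega_2)$ onto $L^1_{\mathcal H^2}(\partial\Omega_2,\mathbb R^3)$, respectively. Let $u_1$ and $u_2$ be two elements of $BD(\Omega_1)$ and $BD(\Omega_2)$, respectively, and define
$$u = \begin{cases}u_1 & \text{in }\Omega_1,\\ u_2 & \text{in }\Omega_2.\end{cases}$$
Then $u$ belongs to $BD(\Omega)$ and
$$\varepsilon(u) = \varepsilon(u_1)\lfloor\Omega_1 + \varepsilon(u_2)\lfloor\Omega_2 + [u]\otimes_s\nu\,\mathcal H^{N-1}\lfloor\Gamma_{1,2},$$
where $[u] = \gamma_1(u_1) - \gamma_2(u_2)$ and $\nu(x)$ is the unit inner normal at $x$ to $\Gamma_{1,2}$, considered as a part of the boundary of $\Omega_1$.

Example 14.1.2. By slightly modifying the previous example, if we set
$$v = \begin{cases}u & \text{in }\Omega,\\ 0 & \text{in }\mathbb R^3\setminus\Omega,\end{cases}$$
where $\Omega$ is a Lipschitz bounded open subset of $\mathbb R^3$ and $u\in BD(\Omega)$, we see that $v$ belongs to $BD(\mathbb R^3)$ and that
$$\varepsilon(v) = \varepsilon(u)\lfloor\Omega + u^+\otimes_s\nu\,\mathcal H^{N-1}\lfloor\Gamma,$$
where $\Gamma$ is the boundary of $\Omega$, $\nu$ denotes the inner unit normal to $\Gamma$, and $u^+$ denotes the trace of $u$ on $\Gamma$.

Remark 14.1.2. Remark 10.2.1 also holds in this situation. More precisely, let $\Omega$ be a Lipschitz open bounded subset of $\mathbb R^3$. The approximating Theorem 14.1.1 may easily be improved as follows: the regular approximating functions of $u\in BD(\Omega)$ can be chosen with traces equal to that of $u$ on the boundary of $\Omega$. For a proof, see, for instance, [219].

We now define the space $M(\operatorname{div})$, which is involved in the definition of the admissible set of displacement fields in the Hencky model.

Definition 14.1.2. We denote by $M(\operatorname{div})$ the space of all functions in $L^1(\Omega,\mathbb R^3)$ whose divergence belongs to $L^2(\Omega)$:
$$M(\operatorname{div}) := \{v\in L^1(\Omega,\mathbb R^3) : \operatorname{div}v\in L^2(\Omega)\}.$$
On $M(\operatorname{div})$, a notion of normal trace can be defined.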
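Theorem 14.1.3. Let $\Omega$ be a $C^2$ open bounded subset of $\mathbb R^3$. There exists a linear continuous map $\gamma_\nu$ from $M(\operatorname{div})$ into $C^1(\partial\Omega)'$ satisfying
(i) for all $u\in C(\overline\Omega,\mathbb R^3)\cap M(\operatorname{div})$, $\gamma_\nu(u) = u\cdot\nu$ on $\partial\Omega$;
(ii) for all $u$ in $M(\operatorname{div})$, all $\varphi\in C^1(\partial\Omega)$, and all $\Phi\in C^1(\overline\Omega)$ such that $\Phi|_{\partial\Omega} = \varphi$,
$$\langle\gamma_\nu(u),\varphi\rangle = \int_\Omega u\cdot D\Phi\,dx + \int_\Omega\Phi\,\operatorname{div}u\,dx.$$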
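Formula (ii) is consistent with (i): for a smooth field $u\in C^1(\overline\Omega,\mathbb R^3)$ and $\nu$ the outer unit normal, the divergence theorem applied to $\Phi u$ gives the following short computation (smooth case only), which also shows that the right-hand side of (ii) does not depend on the chosen extension $\Phi$ of $\varphi$.

```latex
% Smooth case: u \in C^1(\overline\Omega,\mathbb R^3), \Phi \in C^1(\overline\Omega), \Phi|_{\partial\Omega} = \varphi
\begin{aligned}
\int_\Omega u\cdot D\Phi\,dx + \int_\Omega \Phi\,\operatorname{div}u\,dx
 &= \int_\Omega \operatorname{div}(\Phi\,u)\,dx
  = \int_{\partial\Omega} \Phi\,(u\cdot\nu)\,d\mathcal H^2 \\
 &= \int_{\partial\Omega} \varphi\,(u\cdot\nu)\,d\mathcal H^2
  = \langle \gamma_\nu(u),\varphi\rangle .
\end{aligned}
```

For a general $u\in M(\operatorname{div})$, the left-hand side still makes sense, and (ii) is taken as the definition of the pairing.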
For a proof, consult Temam [219, Proposition 7.2]. Note that this theorem also makes sense when $\operatorname{div}u$ is a Borel measure; in this case, the space $M(\operatorname{div})$ is defined by $M(\operatorname{div}) := \{v\in L^1(\Omega,\mathbb R^3) : \operatorname{div}v\in\mathcal M(\Omega)\}$. For our application we consider only the case $\operatorname{div}v\in L^2(\Omega)$.

We now consider the space $U(\Omega) := BD(\Omega)\cap M(\operatorname{div})$ equipped with two convergences. Precisely, for every sequence $(u_n)_{n\in\mathbb N}$ in $U(\Omega)$, one defines
• the weak convergence, defined by the weak convergence of $u_n$ to $u$ in $BD(\Omega)$ together with the strong convergence of $\operatorname{div}u_n$ to $\operatorname{div}u$ in $L^2(\Omega)$;
• the intermediate convergence associated with a convex function $f$, defined by the intermediate convergence of $u_n$ to $u$ in $BD(\Omega)$ associated with $f$ together with the strong convergence of $\operatorname{div}u_n$ to $\operatorname{div}u$ in $L^2(\Omega)$.

The following result completes the approximating Theorem 14.1.1 in the spirit of Remark 14.1.2. For a proof, consult Theorems 3.4 and 5.3 in [219].

Theorem 14.1.4. Let $f:\mathbb M_S\to\mathbb R$ be a convex function satisfying $\alpha(|A|-1)\le f(A)\le\beta(1+|A|)$ for all $A$ in $\mathbb M_S$. Then, for all $u$ in $U(\Omega)$, there exists a sequence $(u_n)_{n\in\mathbb N}$ in $C^\infty(\Omega,\mathbb R^3)\cap BD(\Omega)$ such that
$$u_n\to u\ \text{strongly in } L^1(\Omega,\mathbb R^3);\quad\int_\Omega|\varepsilon(u_n)|\,dx\to\int_\Omega|\varepsilon(u)|;\quad\int_\Omega f(\varepsilon(u_n))\,dx\to\int_\Omega f(\varepsilon(u));$$
$$\operatorname{div}u_n\to\operatorname{div}u\ \text{strongly in } L^2(\Omega);\quad\int_\Omega f(\varepsilon^D(u_n))\,dx\to\int_\Omega f(\varepsilon^D(u));\quad\gamma_0(u_n) = \gamma_0(u).$$
14.1.4 Relaxation of the Hencky model
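In this subsection we show that the lower semicontinuous envelope of the Hencky energy functional for the weak topology of $U(\Omega)$ is the functional of $(\overline{\mathcal P})$ in (14.1). We moreover establish the convergence of the corresponding minimal energies to the minimal energy of the relaxed functional. Here, $\Omega$ is a bounded open subset of $\mathbb R^3$ of class $C^2$.

Theorem 14.1.5. The lower semicontinuous envelope, for the weak convergence of $U(\Omega)$, of the integral functional defined on $U(\Omega)$ by
$$F(v) = \begin{cases}\displaystyle\int_\Omega W^D(\varepsilon^D(v))\,dx + \frac k2\int_\Omega(\operatorname{div}(v))^2\,dx & \text{if } v\in LD(\Omega)\cap M(\operatorname{div}),\ v = 0 \text{ on }\Gamma_0,\\ +\infty & \text{otherwise},\end{cases}$$
is the functional defined on $U(\Omega)$ by
$$\overline F(v) = \begin{cases}\displaystyle\int_\Omega W^D(\varepsilon^D(v)) + \frac k2\int_\Omega(\operatorname{div}(v))^2\,dx + \int_{\Gamma_0}(W^D)^\infty(\gamma_0(v)_\tau\otimes_s\nu)\,d\mathcal H^2 & \text{if } v\in U(\Omega),\ \gamma_\nu(v) = 0 \text{ on }\Gamma_0,\\ +\infty & \text{otherwise}.\end{cases}$$

Sketch of the proof. Arguing exactly as in the proof of Theorem 13.4.2, with Theorem 13.4.1 replaced by Theorem 14.1.4, we obtain
$$\overline F(v) = \begin{cases}\displaystyle\int_\Omega W^D(\varepsilon^D(v)) + \frac k2\int_\Omega(\operatorname{div}(v))^2\,dx + \int_{\Gamma_0}(W^D)^\infty(\gamma_0(v)\otimes_s\nu)\,d\mathcal H^2 & \text{if } v\in U(\Omega),\ v_\nu = 0 \text{ on }\Gamma_0,\\ +\infty & \text{otherwise},\end{cases}$$
where the condition $v_\nu = 0$ on $\Gamma_0$ is intended in the trace sense: $\gamma_\nu(v)(x) = 0$ for $\mathcal H^2$-a.e. $x$ on $\Gamma_0$. Indeed, the map $v\mapsto\int_\Omega(\operatorname{div}(v))^2\,dx$ is continuous for the weak convergence of $U(\Omega)$. Note also that, according to Theorem 14.1.3, the trace operator from $M(\operatorname{div})$ into $C^1(\partial\Omega)'$ is continuous when the two spaces are equipped with their weak convergences. Finally, for $\mathcal H^2$-a.e. $x$ on $\Gamma_0$, since $v_\nu = 0$ in the trace sense on $\Gamma_0$, we have $\gamma_0(v)_\tau\otimes_s\nu(x) = \gamma_0(v)\otimes_s\nu(x)$. Indeed, it is easily seen that $\gamma_0(v)\cdot\nu = \gamma_\nu(v)$ $\mathcal H^2$-a.e. on $\Gamma_0$.

Corollary 14.1.1. The energy of the Hencky model
$$\inf\Big\{\int_\Omega W^D(\varepsilon^D(v))\,dx + \frac k2\int_\Omega(\operatorname{div}(v))^2\,dx - L(v) : v\in\mathcal A\Big\},$$
where $\mathcal A = \{v\in LD(\Omega) : \operatorname{div}(v)\in L^2(\Omega),\ v = 0 \text{ on }\Gamma_0\}$, relaxes to
$$\min\Big\{\int_\Omega W^D(\varepsilon^D(v)) + \frac k2\int_\Omega(\operatorname{div}(v))^2\,dx + \int_{\Gamma_0}W^\infty(\gamma_0(v)_\tau\otimes_s\nu)\,d\mathcal H^2 - L(v) : v\in\tilde{\mathcal A}\Big\},$$
where $\tilde{\mathcal A} = \{v\in BD(\Omega) : \operatorname{div}(v)\in L^2(\Omega),\ v_\nu = 0 \text{ on }\Gamma_0\}$.

Sketch of the proof. According to the general theory of relaxation (see Theorem 11.1.2), it suffices to prove that any minimizing sequence related to the Hencky energy possesses a subsequence weakly converging in the space $U(\Omega)$. This assertion is easily established thanks to the coercivity condition (13.25).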
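Two algebraic facts sit behind the last step of the proof of Theorem 14.1.5: $\operatorname{tr}(a\otimes_s\nu) = a\cdot\nu$, so that $a_\tau\otimes_s\nu$ is always trace-free (it lies in $\mathbb M_S^D$, where $(W^D)^\infty$ is defined), and $a_\tau = a$ whenever $a\cdot\nu = 0$. The sketch below checks both with exact fractions; the unit vector $\nu = (3/5,4/5,0)$ and the vectors $a$, $b$ are sample values of our own choosing.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sym_tensor(a, b):
    """a (x)_s b = (1/2)(a_i b_j + a_j b_i)_{i,j}."""
    return [[Fraction(1, 2) * (a[i] * b[j] + a[j] * b[i]) for j in range(3)]
            for i in range(3)]


def tangential(a, nu):
    """a_tau = a - (a . nu) nu, for a unit vector nu."""
    s = dot(a, nu)
    return [a[i] - s * nu[i] for i in range(3)]


def trace(M):
    return sum(M[i][i] for i in range(3))


nu = [Fraction(3, 5), Fraction(4, 5), Fraction(0)]   # exact unit vector
a = [Fraction(2), Fraction(-1), Fraction(5)]          # generic vector, a . nu = 2/5
b = [Fraction(4), Fraction(-3), Fraction(7)]          # b . nu = 0
print(trace(sym_tensor(a, nu)), dot(a, nu))
print(trace(sym_tensor(tangential(a, nu), nu)))
```

For a vector such as $b$ with $b\cdot\nu = 0$, the tangential projection changes nothing, which is exactly the identity $\gamma_0(v)_\tau\otimes_s\nu = \gamma_0(v)\otimes_s\nu$ used on $\Gamma_0$.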
14.2 Some variational models in fracture mechanics
14.2.1 A few considerations in fracture mechanics

Let us consider an elastic brittle medium whose reference configuration is a bounded domain $\Omega$ of $\mathbb R^3$. Griffith's theory of fracture mechanics asserts that the energy necessary to produce a crack $K$ included in $\overline\Omega$ is proportional to the crack area $\mathcal H^2(K)$. Consequently, the elastic deformation energy outside the crack must be completed by an additional energy whose simplest form is $\lambda\mathcal H^2(K)$. The constant $\lambda$ is the Griffith coefficient, introduced for fracture initiation (see [149], [200]). The elastic energy of the deformable body under consideration then takes the form
$$E(u,K) = \int_{\Omega\setminus K}f(\nabla u)\,dx + \lambda\mathcal H^2(K),$$
where $u$ denotes the deformation vector field and $\nabla u$ the deformation gradient. Under suitable conditions on $f$, the functional $E$ makes sense in a classical way if, for instance, $K$ is a closed set and $u$ belongs to $C^1(\Omega\setminus K,\mathbb R^3)$. From the inequality $\mathcal H^2(K)\le\lambda^{-1}E(u,K)$, we see that when $E(u,K)<+\infty$, the crack surface $K$ is a closed subset of $\overline\Omega$ with finite $\mathcal H^2$ measure (hence of Hausdorff dimension at most two), so that its Lebesgue measure is zero. Thus the crack surface $K$ can be seen as the set of discontinuity points of the measurable function $u$, more precisely, of the measurable representative of $u$ defined on $\Omega$ satisfying the convention of Remark 10.3.2. It is worth noticing the analogy between this model and the strong model introduced in image segmentation by Mumford–Shah and discussed in Section 12.4 (see also Section 14.3 for complements). Following the idea developed in Section 12.4, one may define a weak formulation in the setting of the SBV functions introduced in Section 10.5 (completed by Remark 10.5.1) by considering the functional
$$E(u) = \int_\Omega f(\nabla u)\,dx + \lambda\mathcal H^2(S_u),\quad u\in SBV(\Omega,\mathbb R^3),$$
where $\nabla u$ is the density of the absolutely continuous part of the measure $Du$ and $S_u$ is the jump set of $u$. Actually, one can deal with functionals of the more general form
$$E(u) = \int_\Omega f(x,\nabla u)\,dx + \int_{S_u}g(x,u^+(x),u^-(x),\nu_u(x))\,d\mathcal H^2.$$
The bulk energy density $f$ accounts for the elastic deformation outside the crack and $g$ for the energy density necessary to produce a crack of surface $S_u$. The terms $u^+(x)$, $u^-(x)$, and $\nu_u(x)$ express the fact that the fracture energy may depend on the crack opening and, for nonisotropic materials, on the orientation of the crack surface. According to Theorem 13.4.5 and Remark 13.4.3, suitable conditions on $f$ and $g$ ensure the lower semicontinuity of $E$, so that the direct method of the calculus of variations provides the existence of solutions for optimization problems related to energy functionals of type $E$. More precisely, we consider a lower semicontinuous, symmetric, and subadditive function $g:\mathbb R^3\times\mathbb R^3\to\mathbb R^+$, i.e.,
$$g(a,b) = g(b,a)\le g(b,c)+g(c,a)\quad\forall\,a,b,c\in\mathbb R^3.$$
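A concrete density may help to see that these requirements are compatible with the lower bound $g(a,b)\ge\psi(|a-b|)$, $\psi(t)/t\to+\infty$ as $t\to0$, imposed on $g$. The example $g(a,b) = \sqrt{|a-b|}$ below is our own illustration, not taken from the text: it is continuous and symmetric, it is subadditive because the square root of a distance is again a distance, and it satisfies the lower bound with $\psi(t) = \sqrt t$, since $\psi(t)/t = t^{-1/2}$. The sketch tests symmetry and subadditivity on random triples of points of $\mathbb R^3$.

```python
import math
import random


def g_sqrt(a, b):
    """Hypothetical crack-opening density g(a, b) = sqrt(|a - b|) on R^3."""
    return math.sqrt(math.dist(a, b))


def check_symmetric_subadditive(samples=2000, seed=0):
    """Test g(a,b) = g(b,a) <= g(b,c) + g(c,a) on random triples in [-5, 5]^3."""
    rng = random.Random(seed)
    for _ in range(samples):
        a, b, c = ([rng.uniform(-5, 5) for _ in range(3)] for _ in range(3))
        if g_sqrt(a, b) != g_sqrt(b, a):
            return False
        if g_sqrt(a, b) > g_sqrt(b, c) + g_sqrt(c, a) + 1e-12:   # rounding slack
            return False
    return True


# psi(t) = sqrt(t): the ratio psi(t) / t = t**(-1/2) blows up as t -> 0.
print(check_symmetric_subadditive(), [math.sqrt(t) / t for t in (1e-2, 1e-4, 1e-6)])
```

A random test is of course no proof; the inequality $\sqrt{x+y}\le\sqrt x+\sqrt y$ combined with the triangle inequality gives subadditivity for all triples.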
We assume moreover that $g(a,b)\ge\psi(|a-b|)$ for all $a,b\in\mathbb R^3$, where $\psi:[0,+\infty)\to[0,+\infty]$ satisfies $\psi(t)/t\to+\infty$ as $t\to0$. The subadditivity assumption on $g$ forces the crack material to possess a minimal number of connected components. We finally consider a convex, even function $h:\mathbb R^3\to[0,+\infty)$, positively homogeneous of degree 1 and satisfying $h(\nu)\ge c|\nu|$ for all $\nu\in\mathbb R^3$, where $c$ is a given positive constant.

Theorem 14.2.1. Let $f:\mathbb R\times\mathbb M^{3\times3}\to\mathbb R$ be a quasi-convex function satisfying the following growth conditions of order $p>1$: there exist two positive constants $\alpha$ and $\beta$ such that, for all $A\in\mathbb M^{3\times3}$,
$$\alpha|A|^p\le f(x,A)\le\beta(1+|A|^p).$$
Let $g$ and $h$ be two functions satisfying the conditions introduced above, and let $L\in L^\infty(\Omega,\mathbb R^3)$ (the exterior loading). Then, given a nonempty compact subset $K$ of $\mathbb R^3$, there exists a solution of the minimum problem
$$\min\Big\{\int_\Omega f(\nabla u)\,dx + \int_{S_u}g(u^+,u^-)\,h(\nu_u)\,d\mathcal H^2 + \int_\Omega L\cdot u\,dx : u\in SBV(\Omega,\mathbb R^3),\ u(x)\in K \text{ for a.e. } x\in\Omega\Big\}.$$

Proof. Following the classical direct method of the calculus of variations, the conclusion is a straightforward consequence of the lower semicontinuity of the energy functional, which has been established in Theorem 13.4.5 and Remark 13.4.3. Coercivity easily follows from the confinement condition $u(x)\in K$ for a.e. $x$. A similar result is described in detail in Section 14.3 for the Mumford–Shah model in image segmentation.

In Theorem 14.2.1, the confinement condition $u\in K$ does not seem natural for all problems in fracture mechanics. To remove such a condition, one generally has to state the problems in the space $GSBV$ of functions whose truncations are in $SBV$ (see [17], [82], and references therein).

A similar statement can be given for boundary value problems. In this case we know (see the previous section or Theorem 11.3.1) that the Dirichlet boundary condition $u = u_0$ on a subset $\Gamma_0$ of the boundary of $\Omega$, assumed to be Lipschitz, is relaxed into a surface energy on the boundary. Taking, for instance, $g = 1$, we have to consider minimization problems of the form
$$\min\Big\{\int_\Omega f(\nabla u)\,dx + \int_{S_u}h(\nu_u)\,d\mathcal H^2 + \int_{\Gamma_0\cap\{u\neq u_0\}}h(\nu)\,d\mathcal H^2 + \int_\Omega L\cdot u\,dx : u\in SBV(\Omega,\mathbb R^3),\ u(x)\in K \text{ for a.e. } x\Big\},$$
where $\nu$ is the inner unit normal to $\Gamma_0$. Existence of a solution is obtained in a similar way. The argument above also works for more general minimum problems of the form (see, for instance, [118])
$$\min\Big\{\int_\Omega f(\nabla u)\,dx + \int_{S_u}g(u^+,u^-)\,h(\nu_u)\,d\mathcal H^2 + \int_\Omega V(x,u)\,dx : u\in SBV(\Omega,\mathbb R^3)\Big\},$$
where $f$, $g$, $h$ are as above and the potential $V$ is integrable in $x$ and satisfies
$$V(x,s)\ge\theta(|s|) - a(x)$$
with $a\in L^1(\Omega)$ and $\theta(t)/t\to+\infty$ as $t\to+\infty$. The case of Theorem 14.2.1 corresponds to $V(x,s) = L(x)\cdot s + \delta_K(s)$, where $\delta_K$ is the indicator function of $K$.

Functionals of type $E$ can be approximated by elliptic functionals via a variational procedure due to Ambrosio and Tortorelli, described in Section 12.4.3 for the Mumford–Shah energy. These approximating functionals are better suited to numerical treatment (see [61], [80]). In the one-dimensional case, the two models introduced in Subsections 14.2.2 and 14.2.3 provide an alternative and more direct way of defining a discrete variational approximation of $E$.

Other models, proposed by Barenblatt [58], can be similarly weakened in the space $SBV(\Omega,\mathbb R^3)$ by considering energies of the type
$$E(u) = \int_\Omega f(\nabla u)\,dx + \int_{S_u}g(|u^+-u^-|)\,d\mathcal H^2,$$
where $g(t)\to0$ as $t\to0$. Minimizing sequences of optimization problems related to such energy functionals need not satisfy the hypotheses used in Theorem 14.2.1: we lose control of the Hausdorff measure of the jump set. Therefore, minimizing sequences may converge to a function whose jump part is not concentrated on an $(N-1)$-dimensional set and which, consequently, does not belong to $SBV(\Omega,\mathbb R^3)$. To select minimizers, one can follow a singular perturbation approach involving the notion of viscosity solutions (consult Attouch [29]). This procedure has already been described in detail for the phase transition model in Section 12.4.2. Precisely, the method consists in perturbing the functional $E$ by the fracture initiation energy $\varepsilon\mathcal H^2(S_u)$ and introducing functionals of Griffith type
$$E_\varepsilon(u) = E(u) + \varepsilon\mathcal H^2(S_u)\quad(\varepsilon>0 \text{ intended to tend to zero}),$$
for which Theorem 14.2.1 applies. The cluster points of $\varepsilon$-minimizers of $E_\varepsilon$ then belong to $SBV(\Omega,\mathbb R^3)$, minimize $E$, and have a jump set of minimal $\mathcal H^{N-1}$ measure among the minimizers of $E$ (see [17] and [85]). For a specific study of problems in fracture mechanics in the modern framework of SBV functions, we refer the reader to [17], [22], [138], [139], and references therein.

In the subsections below, we would like to supply a justification of weak Griffith models in the one-dimensional case by taking into account the microscopic scale and the statistical local energy distribution. More precisely, we deal with two discrete systems of material points which, in the reference configuration, occupy the points of the lattice $\varepsilon\mathbb Z$, $\varepsilon = 1/n$, included in the interval $[0,1]$. For each material point placed at $x\in\varepsilon\mathbb Z$ in the reference configuration, $u(x)\in\mathbb R$ denotes its new position in the deformed state. Each point interacts only with its nearest neighbors. We aim at describing the continuous limit of the discrete energy in the sense of $\Gamma$-convergence. We show that the continuous variational limit of the second model takes the form of the Griffith model discussed above. In the first model, we show that the convexity conditions satisfied by the local energy densities entail the presence of a Cantor-part energy: the Griffith initiation energy is completed by an additional term $C\,u'_c(0,1)$, where $u'_c$ is the Cantor part of $u'$ and $C$ a suitable constant. It is worth noticing that the discrete energies considered may also be viewed as discrete variational approximations of Griffith-type functionals in the one-dimensional case. This approach is very close to that of Chambolle, who approximates the Mumford–Shah functional in the two-dimensional case (see Chambolle [107]), and to some recent works by Braides [84] and Braides and Gelli [89].
14.2.2 A first model in one dimension

The interaction between each pair $\{z\varepsilon,(z+1)\varepsilon\}$ of contiguous points is described by a random energy
$$\varepsilon\,W_{\omega_z}\Big(\frac{u(\varepsilon(z+1))-u(\varepsilon z)}{\varepsilon}\Big),$$
where $W_{\omega_z}$ belongs to a finite set $\{W_j, j\in J\}$. Each density $W_j$, $j\in J$, mapping $\mathbb R$ into $[0,+\infty]$, is assumed to be of Hencky pseudoplastic type (see Section 14.1) and takes the value $+\infty$ if two neighboring points occupy the same location. The total energy of the interaction between the points located in $[0,1]\cap\varepsilon\mathbb Z$ is
$$E_\varepsilon(\omega,u) = \sum_{z=0}^{n-1}\varepsilon\,W_{\omega_z}\Big(\frac{u(\varepsilon(z+1))-u(\varepsilon z)}{\varepsilon}\Big),$$
where $\omega = (\omega_z)_{z\in\mathbb Z}\in J^{\mathbb Z}$. One may think of two neighboring material points as being connected by a randomly chosen nonlinear spring; the model thus describes a system of $n$ randomly distributed springs. We still denote by $u$ the continuous extension of $u$ which is affine on $(\varepsilon z,\varepsilon(z+1))$ for all $z\in\mathbb Z$. Thus the discrete energy may be considered as defined on $L^1(0,1)$. This first model is described by the almost sure $\Gamma$-convergence of $u\mapsto E_\varepsilon(\omega,u)$ toward a deterministic energy functional defined on $BV(0,1)$.

Let us fix some notation for $BV$ functions in the one-dimensional case. For each function $u$ in $BV(0,1)$, we write $u' = u'_a\,dt + u'_s$ for the Lebesgue–Nikodym decomposition of its distributional derivative $u'$ with respect to the Lebesgue measure $dt$ on $(0,1)$. The part singular with respect to $dt$ has the decomposition
$$u'_s = \sum_{t\in S_u}(u^+(t)-u^-(t))\,\delta_t + u'_c,$$
where $u^+$ and $u^-$ are, respectively, the approximate upper and lower limits of $u$, $S_u = \{t\in(0,1) : u^+(t)\neq u^-(t)\}$ is the jump set of $u$, and $u'_c$ is the singular diffuse part, also called the Cantor part, of $u'$. Let $BV^+(0,1)$ be the subset of all functions $u$ in $BV(0,1)$ such that $u'_a>0$ a.e. in $(0,1)$ and $u'_s\ge0$. We will establish that the (deterministic) limit energy functional is defined on $BV^+(0,1)$ by
$$E(u) := \int_0^1W^{hom}(u'_a)\,dt + C\,u'_s(0,1),$$
where $C$ is a positive constant. The density $e\mapsto W^{hom}(e)$ is obtained as the almost sure limit of a suitable subadditive ergodic process. This mathematical result expresses that the macroscopic mechanical behavior of a string can be interpreted as the variational limit of a discrete system at a microscopic scale. Moreover, we will establish the duality principle $(W^{hom})^* = \sum_{j\in J}p_jW_j^*$, where $p_j$ is the probability of presence of each density $W_j$. For other models in a deterministic setting and some discussion of the possible existence of a fracture site related to these models, consult [87] and [17].

Let us give more details on the random discrete model. As said above, we consider a finite set of functions $W_j$, $j\in J$, with probabilities of presence $p_j$, satisfying the following three conditions:
(i) $W_j:\mathbb R\to\mathbb R^+\cup\{+\infty\}$ is convex, finite for $e>0$, $W_j(1) = 0$, and there exists $\alpha>0$ such that $\alpha(e-1)\le W_j(e)$ for all $e\ge0$;
(ii) there exists $\beta>0$ such that $W_j(e)\le\beta(1+e)$ for all $e>1$;
(iii) $\lim_{e\to0^+}W_j(e) = +\infty$ and $W_j(e) = +\infty$ when $e\le0$.
The assumption $W_j(1) = 0$ means that no energy is needed when no deformation occurs. Assumption (iii) means that an infinite amount of energy is needed to squeeze a pair of material points down to a single one, and that there is no interpenetration of matter. These density functions are of Hencky pseudoplastic type owing to the convexity and the linear growth conditions.

The fundamental stochastic setting that we need for describing the discrete energy and its asymptotic behavior is the discrete dynamical system $(\Omega,\mathcal T,\mathbb P,(T_z)_{z\in\mathbb Z})$: $\Omega = J^{\mathbb Z}$, $(\Omega,\mathcal T,\mathbb P)$ is the product probability space of the Bernoulli probability space on $J$ constructed from $p_j$, $j\in J$, and the transformation $T_z$ is the shift defined for all $z\in\mathbb Z$ by $T_z(\omega_s)_{s\in\mathbb Z} = (\omega_{s+z})_{s\in\mathbb Z}$. The expectation operator is denoted by $\mathbb E$. To write the random energy functional in continuous form, we consider the random function defined for all $(\omega,t,e)\in\Omega\times\mathbb R\times\mathbb R$ by $W(\omega,t,e) = W_{\omega_z}(e)$ when $t\in[z,z+1)$. Let $A_\varepsilon(0,1)$ denote the space of continuous functions on $(0,1)$ which are affine on each interval $(\varepsilon z,\varepsilon(z+1))$ of $(0,1)$. More generally, when $s = (b-a)/n$, $A_s(a,b)$ denotes the space of continuous functions on $(a,b)$ which are affine on each interval $(a+sz,a+s(z+1))$ of $(a,b)$. The total energy due to the interactions between the points of $[0,1]\cap\varepsilon\mathbb Z$ is the functional defined in $L^1(0,1)$ by
$$E_\varepsilon(\omega,u) = \begin{cases}\displaystyle\sum_{z=0}^{n-1}\varepsilon\,W\Big(\omega,z,\frac{u(\varepsilon(z+1))-u(\varepsilon z)}{\varepsilon}\Big) & \text{if } u\in A_\varepsilon(0,1),\\ +\infty & \text{otherwise},\end{cases}$$
or, in continuous form,
$$E_\varepsilon(\omega,u) = \begin{cases}\displaystyle\int_0^1W\Big(\omega,\frac t\varepsilon,u'(t)\Big)\,dt & \text{if } u\in A_\varepsilon(0,1),\\ +\infty & \text{otherwise},\end{cases}$$
where $t\mapsto u(t)$ also denotes the piecewise affine extension of $u$. Note that the domain of $E_\varepsilon(\omega,\cdot)$ is the subset of all functions $u$ in $A_\varepsilon(0,1)$ whose distributional derivative $u'$ is positive.

To solve the problem, we make use of the following concept of subadditive random process, which extends, in a probabilistic setting, the notion of subadditive process developed in Lemma 12.3.1. Consider a probability space $(\Omega,\mathcal T,\mathbb P)$ and a group $(\tau_s)_{s\in\mathbb Z^d}$ of $\mathbb P$-preserving transformations on $(\Omega,\mathcal T)$, i.e.,
(i) $\tau_s$ is $\mathcal T$-measurable;
(ii) $\mathbb P(\tau_sE) = \mathbb P(E)$ for all $E$ in $\mathcal T$ and all $s$ in $\mathbb Z^d$;
(iii) $\tau_s\circ\tau_t = \tau_{s+t}$ and $\tau_{-s} = \tau_s^{-1}$ for all $s$ and $t$ in $\mathbb Z^d$.
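The discrete random energy $E_\varepsilon(\omega,u)$ of this subsection can be sketched as follows. The densities are a hypothetical family of our own choosing, $W_a(e) = a\max(e-1,-\log e)$ for $e>0$ and $W_a(e) = +\infty$ for $e\le0$, with a stiffness $a>0$ attached to each type $j$: as a maximum of convex functions it is convex, it vanishes at $e = 1$, it blows up as $e\to0^+$, and it grows linearly, so it satisfies (i)–(iii) with $\alpha = \min_j a_j$. The types $\omega_z$ are drawn independently with the probabilities $p_j$, as in the Bernoulli product space above.

```python
import math
import random


def w_spring(a, e):
    """Hypothetical density W_a(e) = a * max(e - 1, -log e) for e > 0, +inf for e <= 0."""
    if e <= 0:
        return math.inf
    return a * max(e - 1.0, -math.log(e))


def discrete_energy(omega, stiffness, u):
    """E_eps(omega, u) = sum_z eps * W_{omega_z}((u[z+1] - u[z]) / eps), eps = 1/n.

    u lists the deformed positions u(0), u(eps), ..., u(1) of the n + 1 points.
    """
    n = len(u) - 1
    eps = 1.0 / n
    return sum(eps * w_spring(stiffness[omega[z]], (u[z + 1] - u[z]) / eps)
               for z in range(n))


def sample_omega(n, probs, rng):
    """Draw omega_0, ..., omega_{n-1} independently with P(omega_z = j) = probs[j]."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


rng = random.Random(1)
n = 1000
omega = sample_omega(n, [0.3, 0.7], rng)
stretch = [2.0 * z / n for z in range(n + 1)]        # u(x) = 2x: every slope equals 2
print(discrete_energy(omega, [1.0, 3.0], stretch))   # close to 0.3 * 1 + 0.7 * 3
```

Since $W_a(2) = a$, the energy of $u(x) = 2x$ is the empirical mean of the stiffnesses, whereas any decreasing step (two points swapping order) gives $+\infty$, in line with assumption (iii).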
The group (τs )s∈Zd is said to be ergodic if every set E in T , such that for all s ∈ Zd τs E = E, has a probability measure equal to 0 or 1. A sufficient condition to ensure ergodicity is the following mixing condition: for all E and F in T lim P(τs E ∩ F ) = P(E)P(F ),
|s|→+∞
which expresses an asymptotic independence. We summarize these conditions by saying that (, T , P, (τz )z∈Z ) is a discrete ergodic dynamical system. By using classical probabilistic arguments, it is easily seen that (, T , P, (Tz )z∈Z ), previously defined, is a discrete ergodic dynamical system. We are now going to recall the important notion of random subadditive < process in a general probabilistic setting. The notation [a, b), a ∈ Z d , b ∈ Z d , stands for di=1 [ai , bi ) and J is the set of all subsets [a, b) with a, b in Zd . For every I in J , |I | denotes its Lebesgue measure. Finally, the positive number ρ(I ) denotes the supremum of the set {r ≥ 0 : ∃B¯ r (x) ⊂ I } where B¯ r (x) is the closed ball of Zd with radius r, centered at x. Given a discrete ergodic dynamical system (, T , P, (τz )z∈Zd ), a random subadditive process with respect to the group (τs )s∈Zd is a set function S : J −→ L1 (, T , P) satisfying (i) and (ii): (i) for all I ∈ J such that there exists a finite family (Ij )j ∈J of disjoint sets in J with I = j ∈J Ij , SI (·) ≤ SIj (·); j ∈J
(ii) for all $I\in\mathcal{J}$ and all $s\in\mathbb{Z}^d$, $S_{s+I}=S_I\circ\tau_s$.
A sequence $(A_n)_{n\in\mathbb{N}}$ of sets of $\mathcal{J}$ is said to be regular if there exist an increasing sequence $(I_n)_{n\in\mathbb{N}}$ of $\mathcal{J}$ and a positive constant $C$ independent of $n$ such that $A_n\subset I_n$ and $|I_n|\le C|A_n|$ for all $n\in\mathbb{N}$. The following subadditive ergodic theorem is due to Akcoglu and Krengel. It will be applied below, with $d=1$, to the discrete ergodic dynamical system $(\Omega,\mathcal{T},P,(T_z)_{z\in\mathbb{Z}})$.
Theorem 14.2.2. Let $S$ be a subadditive process with respect to the group $(\tau_s)_{s\in\mathbb{Z}^d}$, which is assumed to be ergodic. Suppose that
$$\inf\left\{\int_\Omega\frac{S_I(\omega)}{|I|}\,P(d\omega):|I|\neq0\right\}>-\infty$$
and let $(A_n)_{n\in\mathbb{N}}$ be a regular sequence of $\mathcal{J}$ satisfying $\lim_{n\to+\infty}\rho(A_n)=+\infty$. Then, almost surely,
$$\lim_{n\to+\infty}\frac{S_{A_n}(\omega)}{|A_n|}=\inf_{m\in\mathbb{N}^*}\mathbb{E}\left(\frac{S_{[0,m)^d}}{m^d}\right).$$
For a proof see [3] and, for some extensions, see [165] and [170]. The space $L^1(0,1)$, equipped with its norm, plays the role of the metric space $(X,d)$ in the $\Gamma$-convergence definition of Chapter 12.
Theorem 14.2.3. The energy functional $E_\varepsilon(\omega,\cdot)$ $\Gamma$-converges almost surely to the functional $E$ defined in $L^1(0,1)$ by
$$E(u)=\begin{cases}\displaystyle\int_0^1W^{\mathrm{hom}}(u'_a)\,dt+W^{\mathrm{hom},\infty}(1)\,u'_s(0,1)&\text{if }u\in BV^+(0,1),\\+\infty&\text{otherwise.}\end{cases}$$
The density $W^{\mathrm{hom}}$ is defined as follows: $W^{\mathrm{hom}}(e)=+\infty$ if $e\le0$ and, if $e>0$, one has $\omega$-a.s.
$$W^{\mathrm{hom}}(e)=\lim_{n\to+\infty}\inf\left\{\frac1n\int_0^nW(\omega,t,e+v'(t))\,dt:v\in W^{1,1}_0(0,n)\right\}=\inf_{n\in\mathbb{N}^*}\mathbb{E}\left(\inf\left\{\frac1n\int_0^nW(\cdot,t,e+v'(t))\,dt:v\in W^{1,1}_0(0,n)\right\}\right).$$
Moreover, $W^{\mathrm{hom}}$ satisfies properties (i), (ii), and (iii) of the functions $W_j$, and its Legendre–Fenchel transform is given by $(W^{\mathrm{hom}})^*=\sum_{j\in J}p_jW_j^*$.
The proof is established by means of Propositions 14.2.1 and 14.2.2, which give, respectively, the lower bound and the upper bound (the latter along a subsequence) in the definition of $\Gamma$-convergence. Before stating Proposition 14.2.1, we introduce a parametrized subadditive process, i.e., a family of subadditive processes which will be used to define the limit problem. For this purpose, for $\delta\in(0,\delta_0]$ and $j\in J$, we consider the truncated functions $T_\delta W_j$, equal to $W_j$ on $[\delta,+\infty)$ and to $L_\delta$ on $(-\infty,\delta)$, where $L_\delta$ is the affine function defined by $L_\delta(t)=W_j(\delta)+\tau_j(t-\delta)$, $\tau_j\in\partial W_j(\delta)$ (the subdifferential of $W_j$ at $\delta$), and we set $W_\delta(\omega,t,e)=T_\delta\omega_z(e)$ when $t\in[z,z+1)$, $W_0=W$. The nonincreasing family $(W_\delta)_{\delta\in(0,\delta_0]}$ satisfies, for $0<\delta\le\delta_0$, $\omega\in\Omega$, $t\in\mathbb{R}$, and $e\in\mathbb{R}$: $W_\delta\le W$; $W_\delta(e)=W(e)$ for $e\ge\delta$; $\lim_{\delta\to0}W_\delta(\omega,t,e)=W(\omega,t,e)$; $e\mapsto W_\delta(\omega,t,e)$ is convex; $\alpha(|e|-1)\le W_\delta(\omega,t,e)\le\beta_\delta(1+|e|)$, where $\beta_\delta$ is a positive constant depending only on $\delta$.
Let $\mathcal{F}(\Omega,\mathbb{R}^+\cup\{+\infty\})$ be the set of all measurable functions from $\Omega$ into $\mathbb{R}^+\cup\{+\infty\}$ and consider the parametrized subadditive process $S$ defined by
$$S:\mathcal{J}\times[0,\delta_0]\times\mathbb{R}\to\mathcal{F}(\Omega,\mathbb{R}^+\cup\{+\infty\}),\qquad(A,\delta,e)\mapsto S_A(\delta,e,\cdot),$$
where
$$S_A(\delta,e,\omega)=\inf\left\{\int_AW_\delta(\omega,t,e+v'(t))\,dt:v\in W^{1,1}_0(A)\right\}.$$
It is worth noticing that the domain of $e\mapsto S_A(\delta,e,\omega)$ is $\mathbb{R}$ when $\delta\in(0,\delta_0]$, while that of $e\mapsto S_A(0,e,\omega)$ is $(0,+\infty)$. Indeed, if $S_A(0,e,\omega)<+\infty$ for some $e\le0$, there exists $v\in W^{1,1}_0(A)$ such that $W(\omega,t,e+v'(t))<+\infty$ for a.e. $t$ in $A$. Therefore $v'(t)>-e\ge0$ for a.e. $t$ in $A$, so that $0=\int_Av'\,dt>-e|A|\ge0$, a contradiction.
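To make the truncation concrete, here is a small numerical sketch. It assumes a hypothetical density $W(e)=c(e-1-\ln e)$ for $e>0$ and $+\infty$ otherwise (convex, $W(1)=0$, $W(e)\to+\infty$ as $e\to0^+$), chosen only for illustration, and implements $T_\delta W$ as the function equal to $W$ on $[\delta,+\infty)$ and to the tangent line $L_\delta$ below $\delta$, which is the reading consistent with the property $W_\delta(e)=W(e)$ for $e\ge\delta$. The checks mirror the listed properties; for this $W$ one also has $T_\delta W(0)=-c\ln\delta$, which blows up as $\delta\to0$.

```python
import math

def W(e, c=1.0):
    """Hypothetical example density W(e) = c (e - 1 - ln e) for e > 0, +inf for e <= 0.
    It is convex, W(1) = 0 and W(e) -> +inf as e -> 0+."""
    if e <= 0:
        return math.inf
    return c * (e - 1.0 - math.log(e))

def dW(e, c=1.0):
    # W is differentiable on (0, +inf): its subdifferential at e is {c (1 - 1/e)}
    return c * (1.0 - 1.0 / e)

def truncate(e, delta, c=1.0):
    """T_delta W: equal to W on [delta, +inf) and to the tangent line
    L_delta(e) = W(delta) + tau (e - delta), tau = W'(delta), on (-inf, delta)."""
    if e >= delta:
        return W(e, c)
    tau = dW(delta, c)
    return W(delta, c) + tau * (e - delta)

if __name__ == "__main__":
    for delta in (0.5, 0.1, 0.01, 0.001):
        # T_delta W(0) = W(delta) - delta W'(delta) = -c ln(delta)
        print(delta, truncate(0.0, delta))
```

Because the tangent line lies below the convex function $W$, the truncation is finite everywhere, stays below $W$, and remains convex across the gluing point.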
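It is easily seen that, for all fixed $(\delta,e)$ in $\big((0,\delta_0]\times\mathbb{R}\big)\cup\big(\{0\}\times(0,+\infty)\big)$, the map $A\mapsto S_A(\delta,e,\cdot)$ is a subadditive process satisfying $S_A(\delta,e,\omega)\le|A|\max\{T_\delta W_j(e):j\in J\}$, and that all the conditions of Theorem 14.2.2 are fulfilled. Consequently, there exists a set $\Omega'$ of full probability such that, for all $(\delta,e)$ in $\big((0,\delta_0]\cap\mathbb{Q}\big)\times\mathbb{R}$ and all $\omega\in\Omega'$,
$$W_\delta^{\mathrm{hom}}(e):=\lim_{n\to+\infty}\frac{S_{[0,n)}(\delta,e,\omega)}{n}=\inf_{m\in\mathbb{N}^*}\mathbb{E}\left(\frac{S_{[0,m)}(\delta,e,\cdot)}{m}\right).\tag{14.2}$$
On the other hand, for all $e>0$, there exists a set $\Omega_e$ of full probability such that, for all $\omega\in\Omega_e$,
$$W^{\mathrm{hom}}(e):=\lim_{n\to+\infty}\frac{S_{[0,n)}(0,e,\omega)}{n}=\inf_{m\in\mathbb{N}^*}\mathbb{E}\left(\frac{S_{[0,m)}(0,e,\cdot)}{m}\right),$$
and $W^{\mathrm{hom}}(e)=+\infty$ if $e\le0$. The independence of the first set with respect to $e$ comes from the equi-Lipschitz property of the maps $e\mapsto S_{[0,n)}(\delta,e,\omega)/n$ (see [119] or [179]). The following lemma states the continuity at $\delta=0^+$ of the function $\delta\mapsto W_\delta^{\mathrm{hom}}(e)$ when $e>0$.
Lemma 14.2.1. For all fixed $e>0$, there exists $\delta(e)>0$ such that $W_\delta^{\mathrm{hom}}(e)=W^{\mathrm{hom}}(e)$ for all $\delta\in(0,\delta(e)]\cap\mathbb{Q}$.
Proof. First step. This step is valid for all $e\in\mathbb{R}$ and all $\omega\in\Omega$. Let $\delta>0$; we show that there exists $u_{\delta,n}(\omega)\in A_1(0,n)\cap W^{1,1}_0(0,n)$ such that
$$\frac{S_{[0,n)}(\delta,e,\omega)}{n}=\frac1n\int_0^nW_\delta(\omega,t,e+u'_{\delta,n}(\omega))\,dt.$$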
To shorten notation, we set $W_n(e)=\dfrac{S_{[0,n)}(\delta,e,\omega)}{n}$. From a classical calculation, the Fenchel conjugate of $e\mapsto W_n(e)$ is given for all $\sigma$ in $\mathbb{R}$ by
$$W_n^*(\sigma)=\frac1n\int_0^nW_\delta^*(\omega,t,\sigma)\,dt.$$
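Set $A_j(\omega,n)=\{t\in[0,n]:W_\delta(\omega,t,\cdot)=T_\delta W_j\}$ and $\lambda_j(\omega,n)=\mathrm{meas}(A_j(\omega,n))$ for $j\in J$. Thus
$$W_n^*(\sigma)=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}(T_\delta W_j)^*(\sigma)\tag{14.3}$$
and, by classical subdifferential rules (see Chapter 9),
$$\partial W_n^*(\sigma)=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}\partial(T_\delta W_j)^*(\sigma).\tag{14.4}$$
Let now $\sigma\in\partial W_n(e)$. Then $e\in\partial W_n^*(\sigma)$ and, from (14.4), there exist $U_{j,\delta,n}(\omega)\in\partial(T_\delta W_j)^*(\sigma)$ such that
$$e=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}U_{j,\delta,n}(\omega).$$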
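The function
$$u_{\delta,n}(\omega)(x)=\int_0^x\sum_{j\in J}\big(U_{j,\delta,n}(\omega)-e\big)\mathbf{1}_{A_j(\omega,n)}(s)\,ds$$
answers the question. Indeed, $u_{\delta,n}$ belongs to $A_1(0,n)\cap W^{1,1}_0(0,n)$. On the other hand, according to $U_{j,\delta,n}(\omega)\in\partial(T_\delta W_j)^*(\sigma)$, (14.3), and the fact that $e\in\partial W_n^*(\sigma)$,
$$\begin{aligned}\frac1n\int_0^nW_\delta(\omega,t,e+u'_{\delta,n}(\omega))\,dt&=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}T_\delta W_j(U_{j,\delta,n}(\omega))\\&=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}\Big(\sigma U_{j,\delta,n}(\omega)-(T_\delta W_j)^*(\sigma)\Big)\\&=\sigma e-W_n^*(\sigma)=W_n(e).\end{aligned}$$
Second step. From classical probabilistic arguments, there exists a set $\Omega''$ of full probability such that, for all $\omega\in\Omega''$ and all $j\in J$,
$$\lim_{n\to+\infty}\frac{\lambda_j(\omega,n)}{n}=p_j.\tag{14.5}$$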
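We pick $\omega_0$ in $\Omega_e\cap\Omega'\cap\Omega''$ and claim that $\inf_nU_{j,\delta,n}(\omega_0)\ge\delta$ for all $j\in J$ and all $\delta\in\mathbb{Q}^*_+$ small enough, say, $\delta\in(0,\delta(e)]\cap\mathbb{Q}$. Otherwise there exist an index $j$ and two sequences $(\delta_k)_k$ and $(n_k)_k$, converging to $0$ and $+\infty$ respectively, such that $U_{j,\delta_k,n_k}(\omega_0)\le\delta_k$. But, according to the first step (for $\delta_k<1$, $T_{\delta_k}W_j$ is nonincreasing on $(-\infty,\delta_k]$ and all the terms of the sum are nonnegative),
$$\frac{S_{[0,n_k)}(0,e,\omega_0)}{n_k}\ge\frac{S_{[0,n_k)}(\delta_k,e,\omega_0)}{n_k}=\sum_{i\in J}\frac{\lambda_i(\omega_0,n_k)}{n_k}T_{\delta_k}W_i(U_{i,\delta_k,n_k}(\omega_0))\ge\frac{\lambda_j(\omega_0,n_k)}{n_k}W_j(\delta_k),$$
and letting $k\to+\infty$, we obtain $W^{\mathrm{hom}}(e)=+\infty$, a contradiction.
Last step. According to the two previous steps, for $\delta\in(0,\delta(e)]\cap\mathbb{Q}$ one has
$$\begin{aligned}\frac{S_{[0,n)}(0,e,\omega_0)}{n}\ge\frac{S_{[0,n)}(\delta,e,\omega_0)}{n}&=\sum_{j\in J}\frac{\lambda_j(\omega_0,n)}{n}T_\delta W_j(U_{j,\delta,n}(\omega_0))=\sum_{j\in J}\frac{\lambda_j(\omega_0,n)}{n}W_j(U_{j,\delta,n}(\omega_0))\\&=\frac1n\int_0^nW(\omega_0,t,e+u'_{\delta,n}(\omega_0))\,dt\ge\frac{S_{[0,n)}(0,e,\omega_0)}{n}.\end{aligned}$$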
Therefore $\dfrac{S_{[0,n)}(0,e,\omega_0)}{n}=\dfrac{S_{[0,n)}(\delta,e,\omega_0)}{n}$. Letting $n\to+\infty$, we obtain that $W^{\mathrm{hom}}(e)=W_\delta^{\mathrm{hom}}(e)$ as soon as $\delta\in(0,\delta(e)]\cap\mathbb{Q}$.
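The first step of this proof reduces the cell problem to a finite-dimensional one: the optimal slopes take one value $U_j$ per density, all sharing a common subgradient $\sigma$, so $S_{[0,n)}(0,e,\omega)/n$ depends on $\omega$ only through the frequencies $\lambda_j(\omega,n)/n$. The sketch below uses this to watch $S_{[0,n)}/n$ approach $W^{\mathrm{hom}}(e)$ as $n$ grows, as (14.2) and (14.5) predict. Assumptions, all hypothetical: two densities $W_j(e)=c_j(e-1-\ln e)$ with $c=(1,4)$, i.i.d. cell labels with $p=(0.3,0.7)$, and $e>0$. For these densities $\sigma=W_j'(U_j)$ gives $U_j=c_j/(c_j-\sigma)$, and $\sigma$ is found by bisection.

```python
import math
import random

C = [1.0, 4.0]   # hypothetical parameters c_j of W_j(e) = c_j (e - 1 - ln e)
P = [0.3, 0.7]   # hypothetical probabilities p_j of the cell labels

def W(e, c):
    return c * (e - 1.0 - math.log(e)) if e > 0 else math.inf

def relaxed_energy(weights, e):
    """min sum_j w_j W_j(U_j) subject to sum_j w_j U_j = e (e > 0, weights summing to 1).
    At the optimum all U_j share one subgradient sigma = c_j (1 - 1/U_j),
    i.e. U_j = c_j / (c_j - sigma) with sigma < min c_j; sigma is found by bisection."""
    active = [(c, w) for c, w in zip(C, weights) if w > 0]
    cmin = min(c for c, _ in active)
    phi = lambda s: sum(w * c / (c - s) for c, w in active)   # increasing in s
    lo, hi = -1.0, cmin
    while phi(lo) > e:          # phi(s) -> 0 as s -> -inf, so this loop stops
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < e:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return sum(w * W(c / (c - sigma), c) for c, w in active)

def cell_energy(n, e, rng):
    """S_[0,n)(0, e, omega) / n for n i.i.d. cells: it depends on omega only
    through the frequencies lambda_j(omega, n) / n."""
    labels = rng.choices(range(len(C)), weights=P, k=n)
    freqs = [labels.count(j) / n for j in range(len(C))]
    return relaxed_energy(freqs, e)

def W_hom(e):
    # same formula with the frequencies replaced by their limits p_j, cf. (14.5)
    return relaxed_energy(P, e)

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 10_000, 1_000_000):
        print(n, cell_energy(n, 1.5, rng), W_hom(1.5))
```

Note that the relaxed value lies below $\sum_jp_jW_j(e)$, the energy of the unperturbed affine deformation, which is the whole point of the homogenization.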
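Proposition 14.2.1. Let $\Omega'$, $\Omega''$ be the subsets of full probability defined in (14.2) and (14.5), and let $u$, $u_\varepsilon$ in $L^1(0,1)$ be such that $u_\varepsilon\to u$ strongly in $L^1(0,1)$. Then, for all $\omega$ in $\Omega'\cap\Omega''$,
$$E(u)\le\liminf_{\varepsilon\to0}E_\varepsilon(\omega,u_\varepsilon).$$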
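Moreover, the domain of $\Gamma\text{-}\liminf_{\varepsilon\to0}E_\varepsilon(\omega,\cdot)$ is included in $BV^+(0,1)$.
Proof. We fix $\omega$ in $\Omega'\cap\Omega''$. First step. If $\liminf_{\varepsilon\to0}E_\varepsilon(\omega,u_\varepsilon)<+\infty$, by the coercivity condition on the functions $W_j$ (property (i)), $u$ belongs to $BV(0,1)$ and obviously $u'\ge0$. Let us now consider, for $\delta\in(0,\delta_0]\cap\mathbb{Q}$, the truncated energy
$$E_{\varepsilon,\delta}(\omega,u)=\begin{cases}\displaystyle\int_0^1W_\delta\Big(\omega,\frac t\varepsilon,u'(t)\Big)\,dt&\text{if }u\in A_\varepsilon(0,1),\\+\infty&\text{otherwise,}\end{cases}$$
and the corresponding energy with domain $W^{1,1}(0,1)$,
$$\tilde E_{\varepsilon,\delta}(\omega,u)=\begin{cases}\displaystyle\int_0^1W_\delta\Big(\omega,\frac t\varepsilon,u'(t)\Big)\,dt&\text{if }u\in W^{1,1}(0,1),\\+\infty&\text{otherwise,}\end{cases}$$
which satisfies all the properties of the random integral functionals considered in [1]. From the inequalities $E_\varepsilon(\omega,\cdot)\ge E_{\varepsilon,\delta}(\omega,\cdot)\ge\tilde E_{\varepsilon,\delta}(\omega,\cdot)$ and according to [1], we then deduce
$$\liminf_{\varepsilon\to0}E_\varepsilon(\omega,u_\varepsilon)\ge\liminf_{\varepsilon\to0}E_{\varepsilon,\delta}(\omega,u_\varepsilon)\ge E_\delta(u),\tag{14.6}$$
where
$$E_\delta(u)=\begin{cases}\displaystyle\int_0^1W_\delta^{\mathrm{hom}}(u'_a)\,dt+W_\delta^{\mathrm{hom},\infty}(1)\,u'_s(0,1)&\text{if }u\in BV(0,1),\\+\infty&\text{otherwise.}\end{cases}$$
Second step. It is not restrictive to assume $\liminf_{\varepsilon\to0}E_\varepsilon(\omega,u_\varepsilon)<+\infty$. Obviously $u'_a\ge0$ and $u'_s\ge0$. We would like to let $\delta$ go to $0$ in (14.6) and apply Lemma 14.2.1. It remains to prove that $u'_a>0$ a.e. in $(0,1)$. Otherwise, there exists a Borel set $N$ of $(0,1)$ with $\mathrm{meas}(N)>0$ such that $u'_a=0$ on $N$. We have
$$+\infty>\liminf_{\varepsilon\to0}E_\varepsilon(\omega,u_\varepsilon)\ge\mathrm{meas}(N)\,W_\delta^{\mathrm{hom}}(0),\tag{14.7}$$
where, from the first step in the proof of Lemma 14.2.1,
$$W_\delta^{\mathrm{hom}}(0)=\lim_{n\to+\infty}\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}T_\delta W_j(U_{j,\delta,n}(\omega)),\qquad\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}U_{j,\delta,n}(\omega)=0.$$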
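By the coercivity assumption (i), an easy calculation leads to $\sup_n|U_{j,\delta,n}(\omega)|\le C$. Thus there exists $U_{j,\delta}(\omega)$ in $\mathbb{R}$ satisfying, up to a subsequence, $\lim_{n\to+\infty}U_{j,\delta,n}(\omega)=U_{j,\delta}(\omega)$ and
$$W_\delta^{\mathrm{hom}}(0)=\sum_{j\in J}p_jT_\delta W_j(U_{j,\delta}(\omega)),\qquad\sum_{j\in J}p_jU_{j,\delta}(\omega)=0.$$
The second equality yields the existence of an index $j_\delta$ such that $U_{j_\delta,\delta}(\omega)\le0$, and the first equality gives
$$W_\delta^{\mathrm{hom}}(0)\ge p_{j_\delta}T_\delta W_{j_\delta}(U_{j_\delta,\delta}(\omega))\ge p_{j_\delta}T_\delta W_{j_\delta}(0)\ge\min_jp_j\,\min_jT_\delta W_j(0),$$
so that $\lim_{\delta\to0}W_\delta^{\mathrm{hom}}(0)=+\infty$. Letting $\delta\to0$ in (14.7) leads to a contradiction.
Last step. Letting $\delta\to0$ in (14.6), according to the monotone convergence theorem and Lemma 14.2.1, we finally obtain
$$\liminf_{\varepsilon\to0}E_\varepsilon(\omega,u_\varepsilon)\ge\int_0^1W^{\mathrm{hom}}(u'_a)\,dt+W^{\mathrm{hom},\infty}(1)\,u'_s(0,1).$$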
The proof is then complete. To establish the upper bound, we will apply the following lemma.
Lemma 14.2.2. (i) Let $e>0$ and let $i\in\mathbb{N}$ be a fixed integer. There exists $u_{i,n}(\omega)$ in $A_1(in,(i+1)n)\cap W^{1,1}_0(in,(i+1)n)$ such that
$$S_{[in,(i+1)n)}(0,e,\omega)=\int_{in}^{(i+1)n}W(\omega,t,e+u'_{i,n}(\omega))\,dt.$$
(ii) The map $W^{\mathrm{hom}}:\mathbb{R}\to[0,+\infty]$ is convex, continuous, and $W^{\mathrm{hom}}(1)=0$.
Proof. To establish (i), reproduce the first step of the proof of Lemma 14.2.1 with $(0,n)$ replaced by $(in,(i+1)n)$ and $W_\delta$ by $W$. Note also that there exist $U_{j,n}>0$, $j\in J$ (depending on $e$), satisfying $\sup_nU_{j,n}<+\infty$, such that
$$\frac{S_{[0,n)}(0,e,\omega)}{n}=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}W_j(U_{j,n}(\omega)),\qquad\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}U_{j,n}=e,\quad U_{j,n}>0.$$
The convexity of the map $e\mapsto W^{\mathrm{hom}}(e)$ is a consequence of Jensen's inequality, which holds for $e>0$ and is established by a straightforward calculation. Consequently, this map
is continuous on its domain $\mathbb{R}^*_+$ and, to prove (ii), we must show that $\lim_{e\to0^+}W^{\mathrm{hom}}(e)=+\infty$. Let $e_k>0$ tend to $0$ and fix $\omega$ in $\bigcap_{k\in\mathbb{N}}\Omega_{e_k}$. Letting $n\to+\infty$ (up to a subsequence) in
$$\frac{S_{[0,n)}(0,e_k,\omega)}{n}=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}W_j(U_{j,n,k}(\omega)),\qquad\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}U_{j,n,k}=e_k,$$
we obtain the existence of numbers $U_{j,k}\ge0$ such that
$$W^{\mathrm{hom}}(e_k)=\sum_{j\in J}p_jW_j(U_{j,k}(\omega)),\qquad\sum_{j\in J}p_jU_{j,k}=e_k.$$
Note that this shows that $W^{\mathrm{hom}}(1)=0$. Obviously $U_{j,k}\to0$ when $k\to+\infty$. Thus
$$\lim_{k\to+\infty}W^{\mathrm{hom}}(e_k)=\sum_{j\in J}p_j\lim_{k\to+\infty}W_j(U_{j,k})=+\infty,$$
and the proof is complete. We now establish the upper bound.
Proposition 14.2.2. There exists a subset $\Omega'''$ of full probability such that, for all $\omega\in\Omega'''$, there exists a subsequence $(n_k)_{k\in\mathbb{N}}$ of positive integers satisfying the following: for all $u$ in $L^1(0,1)$, there exists $u_{n_k}$ in $L^1(0,1)$, possibly depending on $\omega$, converging to $u$ and such that $\limsup_{k\to+\infty}E_{1/n_k}(\omega,u_{n_k})\le E(u)$.
Proof. First step. We prove Proposition 14.2.2 for all $u$ belonging to $SBV_\#(0,1)\cap BV^+(0,1)$, where $SBV_\#(0,1)$ denotes the subspace of all functions of $SBV(0,1)$ having a finite jump set. It is not restrictive to assume that $S_u=\{t_0\}$. For $m\in\mathbb{N}^*$, consider the decomposition $(0,1)=\bigcup_{i=0}^{m-1}(i/m,(i+1)/m)$. Let $i_0$ denote the integer such that $t_0\in[i_0/m,(i_0+1)/m)$, and let $u_m$ denote the interpolate function of $u$ with respect to this decomposition. We set $e_{i,m}:=u'_m$ on $(i/m,(i+1)/m)$ if $i\neq i_0$, and $e_{i_0,m}=m\big(u((i_0+1)/m)-u(i_0/m)\big)$. For each $i\in\{0,\dots,m-1\}$ we consider a sequence $(e_{i,m,\eta})_\eta$ in $\mathbb{Q}^*_+$ such that $\lim_{\eta\to0}e_{i,m,\eta}=e_{i,m}$. By continuity of $W^{\mathrm{hom}}$ and Jensen's inequality, we have
$$\lim_{\eta\to0}\sum_{\substack{i=0\\i\neq i_0}}^{m-1}\frac1mW^{\mathrm{hom}}(e_{i,m,\eta})\le\sum_{\substack{i=0\\i\neq i_0}}^{m-1}\int_{i/m}^{(i+1)/m}W^{\mathrm{hom}}(u'_a)\,dt\le\int_0^1W^{\mathrm{hom}}(u'_a)\,dt$$
and
$$\lim_{m\to+\infty}\lim_{\eta\to0}\frac1mW^{\mathrm{hom}}(e_{i_0,m,\eta})=\lim_{m\to+\infty}\frac1mW^{\mathrm{hom}}\Big(m\Big(u\Big(\frac{i_0+1}{m}\Big)-u\Big(\frac{i_0}{m}\Big)\Big)\Big)=W^{\mathrm{hom},\infty}([u](t_0)),$$
so that
$$\lim_{m\to+\infty}\lim_{\eta\to0}\sum_{i=0}^{m-1}\frac1mW^{\mathrm{hom}}(e_{i,m,\eta})\le E(u).\tag{14.8}$$
But Lemma 14.2.2 implies the existence of $u_{i,\eta,n}$ in $A_1(in,(i+1)n)\cap W^{1,1}_0(in,(i+1)n)$ (we have dropped the dependence on $\omega$) such that
$$\frac{S_{[in,(i+1)n)}(0,e_{i,m,\eta},\omega)}{n}=\frac1n\int_{in}^{(i+1)n}W(\omega,t,e_{i,m,\eta}+u'_{i,\eta,n})\,dt=m\int_{i/m}^{(i+1)/m}W(\omega,mnt,e_{i,m,\eta}+u'_{i,\eta,n}(mnt))\,dt.$$
Therefore, as the sequence of intervals $\big((in,(i+1)n)\big)_{n\in\mathbb{N}^*}$ is regular, according to Theorem 14.2.2 there exists $\Omega'''=\bigcap_{e\in\mathbb{Q}^*_+}\Omega_e$ of full probability such that, for all $\omega\in\Omega'''$,
$$W^{\mathrm{hom}}(e_{i,m,\eta})=\lim_{n\to+\infty}m\int_{i/m}^{(i+1)/m}W(\omega,mnt,e_{i,m,\eta}+u'_{i,\eta,n}(mnt))\,dt.\tag{14.9}$$
Combining (14.8) and (14.9), we obtain
$$\lim_{m\to+\infty}\lim_{\eta\to0}\lim_{n\to+\infty}\int_0^1W(\omega,mnt,v'_{m,\eta,n})\,dt\le E(u),$$
where
$$v_{m,\eta,n}(t)=u_m(t)+\sum_{i=0}^{m-1}\mathbf{1}_{(i/m,(i+1)/m)}(t)\,\frac1{mn}\,u_{i,\eta,n}(nmt).$$
An easy calculation shows that $\lim_{m\to+\infty}\lim_{\eta\to0}\lim_{n\to+\infty}v_{m,\eta,n}=u$ strongly in $L^1(0,1)$ and that $v_{m,\eta,n}$ belongs to $A_{1/nm}(0,1)$. Therefore, by a diagonalization argument, there exists a map $n\mapsto(m(n),\eta(n))$ such that
$$\limsup_{n\to+\infty}E_{1/(m(n)n)}(\omega,v_{m(n),\eta(n),n})\le E(u),\qquad\lim_{n\to+\infty}v_{m(n),\eta(n),n}=u.$$
We complete the proof by denoting by $k\mapsto n_k$ the subsequence $n\mapsto m(n)n$ and setting $u_{nm(n)}:=v_{m(n),\eta(n),n}$.
Second step. We prove Proposition 14.2.2 for $u$ belonging to $SBV^+(0,1)$. Let $S_u=\{t_0,\dots,t_i,\dots\}$ be the jump set of $u$ and let $u_l$ be the function of $SBV_\#(0,1)$ with jump set $S_{u_l}=\{t_0,\dots,t_l\}$ defined by $u_l(0^+)=u(0^+)$ and $u'_l=u'_a+\sum_{i=0}^l[u](t_i)\,\delta_{t_i}$. According to the first step, there exists $u_{l,n_k}$ strongly converging to $u_l$ in $L^1(0,1)$ such that
$$\limsup_{k\to+\infty}E_{1/n_k}(\omega,u_{l,n_k})\le E(u_l).$$
Letting $l$ tend to $+\infty$ and using a diagonalization argument, there exists a map $k\mapsto l(k)$ such that the sequence $u_{l(k),n_k}$, still denoted by $u_{n_k}$, strongly converges to $u$ in $L^1(0,1)$ and satisfies
$$\limsup_{k\to+\infty}E_{1/n_k}(\omega,u_{n_k})\le E(u).$$
Last step. According to the previous step, we have $\Gamma\text{-}\limsup_{k\to+\infty}E_{1/n_k}(\omega,\cdot)\le\tilde E$, where
$$\tilde E(u)=\begin{cases}\displaystyle\int_0^1W^{\mathrm{hom}}(u'_a)\,dt+W^{\mathrm{hom},\infty}(1)\sum_{t\in S_u}[u](t)&\text{if }u\in SBV^+(0,1),\\+\infty&\text{otherwise.}\end{cases}$$
Its lower semicontinuous envelope for the strong topology of $L^1(0,1)$ is (see [76])
$$E(u)=\begin{cases}\displaystyle\int_0^1W^{\mathrm{hom}}(u'_a)\,dt+W^{\mathrm{hom},\infty}(1)\,u'_s(0,1)&\text{if }u\in BV^+(0,1),\\+\infty&\text{otherwise,}\end{cases}$$
which ends the proof.
Proof of Theorem 14.2.3. Propositions 14.2.1 and 14.2.2 and the last assertion of Theorem 12.1.1 imply that, for all $\omega$ in the set $\Omega'\cap\Omega''\cap\Omega'''$ of full probability, $E_\varepsilon(\omega,\cdot)$ $\Gamma$-converges to $E$. It remains to show that $W^{\mathrm{hom}}$ satisfies the three properties of the functions $W_j$. Convexity, property (iii), and $W^{\mathrm{hom}}(1)=0$ have been proved in Lemma 14.2.2(ii), and the growth conditions are trivially satisfied. Finally, we establish $(W^{\mathrm{hom}})^*=\sum_{j\in J}p_jW_j^*$. We make precise the notation introduced in the proof of Lemma 14.2.1 by pointing out the dependence of $W_n$ on the parameter $\delta$. We then set
$$W_n^\delta(e)=\frac{S_{[0,n)}(\delta,e,\omega)}{n}.$$
Its Fenchel conjugate is given for all $\sigma$ in $\mathbb{R}$ by
$$(W_n^\delta)^*(\sigma)=\frac1n\int_0^nW_\delta^*(\omega,t,\sigma)\,dt.$$
Equation (14.3) now becomes
$$(W_n^\delta)^*(\sigma)=\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}(T_\delta W_j)^*(\sigma).\tag{14.10}$$
According to the strong law of large numbers, the right-hand side of (14.10) tends almost surely to
$$\sum_{j\in J}p_j(T_\delta W_j)^*(\sigma)$$
when $n$ goes to infinity. Note that this pointwise limit is also the $\Gamma$-limit of the functions $\sigma\mapsto\sum_{j\in J}\frac{\lambda_j(\omega,n)}{n}(T_\delta W_j)^*(\sigma)$ defined on $\mathbb{R}$.
On the other hand, the pointwise limit of $W_n^\delta$ toward $W_\delta^{\mathrm{hom}}$ obtained by the subadditive ergodic theorem is also the $\Gamma$-limit of the functions $\sigma\mapsto W_n^\delta(\sigma)$ defined on $\mathbb{R}$. Indeed, the sequence of functions $(W_n^\delta)_{n\in\mathbb{N}^*}$ is equi-Lipschitz, so that pointwise and $\Gamma$-limits agree (see Corollary 2.59 in [28]). According to the continuity of the Fenchel conjugate with respect to Mosco convergence, hence here with respect to $\Gamma$-convergence, we deduce from (14.10)
$$(W_\delta^{\mathrm{hom}})^*(\sigma)=\sum_{j\in J}p_j(T_\delta W_j)^*(\sigma)\qquad\forall\sigma\in\mathbb{R}.\tag{14.11}$$
We would now like to pass to the limit in $\delta$ in (14.11). Since the family $(W_\delta^{\mathrm{hom}})_\delta$ of lower semicontinuous functions defined on $\mathbb{R}$ increases to the lower semicontinuous function $W^{\mathrm{hom}}$ when $\delta$ tends to $0$, we have $W_\delta^{\mathrm{hom}}\to W^{\mathrm{hom}}$ in the sense of $\Gamma$-convergence for functionals defined on $\mathbb{R}$. The same argument implies that $T_\delta W_j$ $\Gamma$-converges to $W_j$. According to the continuity of the Fenchel conjugate with respect to $\Gamma$-convergence (note that Mosco and $\Gamma$-convergence agree here), we finally deduce our result by passing to the limit in $\delta$ in (14.11).
Remark 14.2.1. Concerning the upper bound, for all $u$ in $SBV_\#(0,1)\cap BV^+(0,1)$, we have proved the existence of $u_\varepsilon$ strongly converging to $u$ in $L^1(0,1)$ and having the traces of $u$ at $0^+$ and $1^-$. It is possible to generalize the previous study when the density functions $W_j$ satisfy a growth condition of order $p>1$. In this case $W^{\mathrm{hom},\infty}(1)=+\infty$ and the limit functional given by Theorem 14.2.3 becomes
$$E(u)=\begin{cases}\displaystyle\int_0^1W^{\mathrm{hom}}(u')\,dt&\text{if }u'\in L^p(0,1),\ u'>0,\\+\infty&\text{otherwise.}\end{cases}$$
Let us consider the functional
$$\tilde E_\varepsilon(\omega,u)=\begin{cases}\displaystyle\int_0^1W\Big(\omega,\frac t\varepsilon,u'\Big)\,dt&\text{if }u\in W^{1,1}(0,1),\\+\infty&\text{otherwise.}\end{cases}$$
Then, according to Proposition 14.2.2, we have $\Gamma\text{-}\limsup_{\varepsilon\to0}\tilde E_\varepsilon\le E$ almost surely. On the other hand, the truncation argument of the first step in the proof of Proposition 14.2.1 implies that, for $u_\varepsilon\to u$ in $L^1(0,1)$,
$$\liminf_{\varepsilon\to0}\tilde E_\varepsilon(\omega,u_\varepsilon)\ge\liminf_{\varepsilon\to0}\tilde E_{\varepsilon,\delta}(\omega,u_\varepsilon)\ge E_\delta(u).$$
Arguing as in the second step of that proof, we also obtain $\liminf_{\varepsilon\to0}\tilde E_\varepsilon(\omega,u_\varepsilon)\ge E(u)$. Thus the functional $\tilde E_\varepsilon(\omega,\cdot)$ $\Gamma$-converges to the functional $E$.
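As a numerical sanity check of the identity $(W^{\mathrm{hom}})^*=\sum_jp_jW_j^*$, the sketch below computes $W^{\mathrm{hom}}(e)$ in two independent ways for a hypothetical two-density example $W_j(e)=c_j(e-1-\ln e)$, $c=(1,4)$, $p=(0.3,0.7)$. The primal way minimizes $\sum_jp_jW_j(U_j)$ under $\sum_jp_jU_j=e$ (the weighted inf-convolution whose conjugate is $\sum_jp_jW_j^*$, a standard fact of convex analysis); the dual way evaluates the biconjugate $\sup_\sigma\{\sigma e-\sum_jp_jW_j^*(\sigma)\}$ using the closed form $W_j^*(\sigma)=-c_j\ln(1-\sigma/c_j)$ for $\sigma<c_j$. Both are crude grid searches, adequate only for illustration.

```python
import math

C = [1.0, 4.0]   # hypothetical c_j in W_j(e) = c_j (e - 1 - ln e)
P = [0.3, 0.7]   # hypothetical probabilities p_j

def W(e, c):
    return c * (e - 1.0 - math.log(e)) if e > 0 else math.inf

def W_star(s, c):
    # conjugate of W(., c): sup_e (s e - W(e, c)) = -c ln(1 - s/c) for s < c, +inf otherwise
    return -c * math.log(1.0 - s / c) if s < c else math.inf

def W_hom_primal(e, steps=20000):
    """min p1 W1(U1) + p2 W2(U2) subject to p1 U1 + p2 U2 = e, scanning U1 in (0, e/p1)."""
    best = math.inf
    for k in range(1, steps):
        u1 = (e / P[0]) * k / steps
        u2 = (e - P[0] * u1) / P[1]          # stays > 0 on this range
        best = min(best, P[0] * W(u1, C[0]) + P[1] * W(u2, C[1]))
    return best

def W_hom_dual(e, steps=20000, s_min=-20.0):
    """sup_sigma (sigma e - sum_j p_j W_j^*(sigma)), scanning sigma in [s_min, min c_j).
    s_min must lie below the optimal sigma; -20 is enough for the e used below."""
    s_max = min(C)
    best = -math.inf
    for k in range(steps):
        s = s_min + (s_max - s_min) * k / steps
        best = max(best, s * e - sum(p * W_star(s, c) for p, c in zip(P, C)))
    return best

if __name__ == "__main__":
    for e in (0.5, 1.0, 1.5, 3.0):
        print(e, W_hom_primal(e), W_hom_dual(e))
```

The same example also exhibits the properties proved in Lemma 14.2.2(ii): $W^{\mathrm{hom}}(1)=0$ and $W^{\mathrm{hom}}(e)\to+\infty$ as $e\to0^+$.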
14.2.3 A second model in one dimension
Keeping the same probabilistic setting, we study a new discrete model in which the interaction between each pair of contiguous points is described by a random energy density which is
no longer assumed to be convex but which satisfies the same conditions in a neighborhood of $0^+$. More precisely, this energy density is assumed to be subadditive beyond a random threshold $e_\varepsilon$ satisfying $\lim_{\varepsilon\to0}\varepsilon e_\varepsilon=0$. In this case, the total energy is of the form
$$F_\varepsilon(\omega,u)=\sum_{z=0}^{n-1}\varepsilon W_\varepsilon\Big(\omega,z,\frac{u_{z+1}-u_z}{\varepsilon}\Big)$$
and almost surely $\Gamma$-converges to a deterministic energy functional defined on the subset $SBV^+(0,1):=SBV(0,1)\cap BV^+(0,1)$ by
$$F(u)=\int_0^1f(u'_a)\,dt+\sum_{t\in S_u}g([u](t)).$$
More precisely, we consider a finite number of density functions $W_j$, $j\in J$, satisfying
(i) $W_j:\mathbb{R}\to\mathbb{R}^+\cup\{+\infty\}$ is convex, finite for $e>0$, $W_j(1)=0$, and there exists $\alpha>0$ such that $\alpha(e-1)^2\le W_j(e)$ for all $e$ in $[0,+\infty)$;
(ii) there exists $\beta>0$ such that $W_j(e)\le\beta(1+e^2)$ for all $e>1$;
(iii) $\lim_{e\to0^+}W_j(e)=+\infty$ and $W_j(e)=+\infty$ when $e\le0$.
Note that (i), (ii), and (iii) are the conditions of the first model, but with growth conditions of order 2. On the other hand, let $g$ be a subadditive continuous function with at most linear growth, mapping $[0,+\infty)$ into $(0,+\infty)$ and satisfying $\inf_{[0,+\infty)}g>0$. We then consider the density functions $W_{j,\varepsilon}$, $j\in J$, from $\mathbb{R}$ into $[0,+\infty]$ defined by
$$W_{j,\varepsilon}(e)=\begin{cases}W_j(e)&\text{if }e\le1,\\W_j(e)\wedge\dfrac1\varepsilon g(\varepsilon(e-1))&\text{if }e>1.\end{cases}$$
According to the growth conditions, it is easily seen that there exists $e_{j,\varepsilon}>1$ satisfying $\lim_{\varepsilon\to0}e_{j,\varepsilon}=+\infty$ and $\lim_{\varepsilon\to0}\varepsilon e_{j,\varepsilon}=0$, and such that
$$W_{j,\varepsilon}(e)=\begin{cases}W_j(e)&\text{if }e\le e_{j,\varepsilon},\\\dfrac1\varepsilon g(\varepsilon(e-1))&\text{if }e>e_{j,\varepsilon}.\end{cases}$$
An example of such functions is given by $W_j(e)=(e-1)^2$ when $e\ge1$, $W_j(e)=-\ln e$ when $0<e\le1$, and $g(e)=1+\sqrt e$, $g(e)=\sqrt{1+e}$, or $g(e)=1+e$. As said above, the subadditivity assumption on $g$ forces the crack to possess a minimal number of connected components. Denoting by $e_{W_j,\varepsilon}$ the threshold $e_{j,\varepsilon}$, we can now define the random threshold $\omega\mapsto e_\varepsilon(\omega)$ by $e_\varepsilon(\omega)=(e_{\omega_z,\varepsilon})_{z\in\mathbb{Z}}$ and, for all $t\in[z,z+1)$, the random function $W_\varepsilon$ by
$$W_\varepsilon(\omega,t,e)=\begin{cases}\omega_z(e)&\text{if }e\le e_{\omega_z,\varepsilon},\\\dfrac1\varepsilon g(\varepsilon(e-1))&\text{if }e>e_{\omega_z,\varepsilon}.\end{cases}$$
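For such densities the threshold $e_{j,\varepsilon}$ can be located numerically. The sketch below takes $W(e)=(e-1)^2$ for $e\ge1$ and $-\ln e$ on $(0,1)$, together with three candidate interfacial densities $g(s)=1+\sqrt s$, $\sqrt{1+s}$ and $1+s$ (all subadditive, with at most linear growth and $\inf g\ge1$). It finds the crossing $e>1$ of $W(e)$ and $g(\varepsilon(e-1))/\varepsilon$ by bisection and lets one check that $e_\varepsilon$ grows while $\varepsilon e_\varepsilon$ shrinks as $\varepsilon$ decreases, consistent with $\lim e_{j,\varepsilon}=+\infty$ and $\lim\varepsilon e_{j,\varepsilon}=0$. For $g(s)=1+s$ the crossing has the closed form $e_\varepsilon=1+\big(1+\sqrt{1+4/\varepsilon}\big)/2$, used as a reference in the checks.

```python
import math

def W(e):
    # example density: (e - 1)^2 for e >= 1, -ln e on (0, 1), +inf for e <= 0
    if e <= 0:
        return math.inf
    return (e - 1.0) ** 2 if e >= 1.0 else -math.log(e)

G = {                                   # candidate interfacial densities g(s), s >= 0
    "1+sqrt(s)": lambda s: 1.0 + math.sqrt(s),
    "sqrt(1+s)": lambda s: math.sqrt(1.0 + s),
    "1+s": lambda s: 1.0 + s,
}

def threshold(eps, g):
    """Crossing point e > 1 of W(e) and g(eps (e - 1)) / eps, found by bisection.
    h(e) = W(e) - g(eps (e - 1)) / eps is convex on [1, +inf) for these g and h(1) < 0,
    so W is the smaller function below the crossing and g/eps is the smaller one above it."""
    h = lambda e: W(e) - g(eps * (e - 1.0)) / eps
    lo, hi = 1.0, 2.0
    while h(hi) < 0:                    # W grows quadratically, g/eps at most linearly
        hi = 1.0 + 2.0 * (hi - 1.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi

def is_subadditive(g, grid):
    return all(g(a + b) <= g(a) + g(b) + 1e-12 for a in grid for b in grid)

if __name__ == "__main__":
    for name, g in G.items():
        for eps in (1e-1, 1e-2, 1e-3, 1e-4):
            e_eps = threshold(eps, g)
            print(f"{name:10s} eps={eps:.0e}  e_eps={e_eps:10.3f}  eps*e_eps={eps * e_eps:.4f}")
```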
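The total energy modeling the interactions between contiguous points of $[0,1]\cap\varepsilon\mathbb{Z}$ is the functional defined on $L^1(0,1)$ by
$$F_\varepsilon(\omega,u)=\begin{cases}\displaystyle\sum_{z=0}^{n-1}\varepsilon W_\varepsilon\Big(\omega,z,\frac{u(\varepsilon(z+1))-u(\varepsilon z)}{\varepsilon}\Big)&\text{if }u\in A_\varepsilon(0,1),\\+\infty&\text{otherwise,}\end{cases}$$
or, in continuous form, by
$$F_\varepsilon(\omega,u)=\begin{cases}\displaystyle\int_0^1W_\varepsilon\Big(\omega,\frac t\varepsilon,u'\Big)\,dt&\text{if }u\in A_\varepsilon(0,1),\\+\infty&\text{otherwise.}\end{cases}$$
We equip $L^1(0,1)$ with its strong convergence. The main result is given below.
Theorem 14.2.4. The functional $F_\varepsilon$ $\Gamma$-converges almost surely to the functional $F$ defined in $L^1(0,1)$ by
$$F(u)=\begin{cases}\displaystyle\int_0^1W^{\mathrm{hom}}(u'_a)\,dt+\sum_{t\in S_u}g([u](t))&\text{if }u\in SBV^+(0,1),\\+\infty&\text{otherwise,}\end{cases}$$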
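where $W^{\mathrm{hom}}$ is the limit density defined in the first model. The proof follows the lines of that of the first model: first we establish the lower bound, then the upper bound for a subsequence. For more general models, but in a deterministic setting, see [88]. We would like to write the functional $F_\varepsilon$ so that the contributions of $W_j$ and $g$ are separated. To this end, we consider the space $SBV_\varepsilon(0,1)$ of all functions of $SBV(0,1)$ whose restriction to each interval $(\varepsilon z,\varepsilon(z+1))$ included in $(0,1)$ is affine, and we associate with each function $u$ of $A_\varepsilon(0,1)$ the function $\tilde u$ in $SBV_\varepsilon(0,1)$ defined for all $t\in\varepsilon[z,z+1)$ by
$$\tilde u(t)=\begin{cases}u(t)&\text{if }\dfrac{u(\varepsilon(z+1))-u(\varepsilon z)}{\varepsilon}\le e_{\omega_z,\varepsilon},\\t-\varepsilon z+u(\varepsilon z)&\text{otherwise.}\end{cases}$$
Note that $\tilde u$ is actually a random function, but we have dropped the dependence on $\omega$ to shorten notation. Then, for all $u\in A_\varepsilon(0,1)$, one has
$$\begin{aligned}F_\varepsilon(\omega,u)&=\sum_{\{z:\,u(\varepsilon(z+1))-u(\varepsilon z)\le\varepsilon e_{\omega_z,\varepsilon}\}}\varepsilon W\Big(\omega,z,\frac{u(\varepsilon(z+1))-u(\varepsilon z)}{\varepsilon}\Big)+\sum_{\{z:\,u(\varepsilon(z+1))-u(\varepsilon z)>\varepsilon e_{\omega_z,\varepsilon}\}}g\Big(\varepsilon\Big(\frac{u(\varepsilon(z+1))-u(\varepsilon z)}{\varepsilon}-1\Big)\Big)\\&\ge\int_0^1W\Big(\omega,\frac t\varepsilon,\tilde u'_a\Big)\,dt+\sum_{t\in S_{\tilde u}}g([\tilde u](t)):=\tilde F_\varepsilon(\omega,\tilde u),\end{aligned}$$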
where $\omega\mapsto W(\omega,t,e)$ is the random function defined in the first model. The proof of the lower bound in the definition of $\Gamma$-convergence is based on the following lemma.
Lemma 14.2.3. Assume that $\sup_\varepsilon F_\varepsilon(\omega,u_\varepsilon)<+\infty$ and that $u_\varepsilon$ strongly converges to some $u$ in $L^1(0,1)$. Then $\tilde u_\varepsilon$ strongly converges to $u$ in $L^1_{\mathrm{loc}}(0,1)$. Moreover, $u$ belongs to $SBV^+(0,1)$ and
$$\sum_{t\in S_u}g([u](t))\le\liminf_{\varepsilon\to0}\sum_{t\in S_{\tilde u_\varepsilon}}g([\tilde u_\varepsilon](t)).$$
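Proof. In the proof of the first statement, the difficulty stems from the lack of coercivity of $W_\varepsilon$ with respect to $(u'_\varepsilon-1)^+$. Nevertheless, note that for all $i,j$ in $\{1,\dots,n-1\}$,
$$\int_{i\varepsilon}^{j\varepsilon}|u_\varepsilon-\tilde u_\varepsilon|\,dt\le\frac\varepsilon2\int_{i\varepsilon}^{j\varepsilon}(u'_\varepsilon-1)^+\,dt,$$
and that
$$\sup_\varepsilon\int_0^1(u'_\varepsilon-1)^-\,dt<+\infty.$$
Thus, for $(a,b)\subset\subset(0,1)$, where $a$, $b$ satisfy $\lim_{\varepsilon\to0}u_\varepsilon(a)=u(a)$ and $\lim_{\varepsilon\to0}u_\varepsilon(b)=u(b)$, one has
$$\sup_\varepsilon\int_a^b(u'_\varepsilon-1)^+\,dt<+\infty.$$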
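Let now $(a,b)\subset\subset(0,1)$ be as above. From $\sup_\varepsilon\tilde F_\varepsilon(\omega,\tilde u_\varepsilon)<+\infty$, using the previous estimate, the coercivity assumption on $W$, and the fact that $\inf_{[0,+\infty)}g>0$, we have
$$\sup_\varepsilon\|\tilde u_\varepsilon\|_{BV(a,b)}<+\infty;\qquad(\tilde u'_{\varepsilon,a})_\varepsilon\ \text{equi-integrable on }(a,b);\qquad\sup_\varepsilon\mathcal{H}^0(S_{\tilde u_\varepsilon}\cap(a,b))<+\infty.$$
Thus, according to compactness Theorem 13.4.4, $u$ belongs to $SBV(a,b)$, and there exists a subsequence (not relabeled) satisfying
$$\begin{cases}\tilde u_\varepsilon\rightharpoonup u&\text{weakly in }BV(a,b),\\\tilde u'_{\varepsilon,a}\rightharpoonup u'_a&\text{weakly in }L^1(a,b),\\\displaystyle\sum_{t\in S_{\tilde u_\varepsilon}\cap(a,b)}[\tilde u_\varepsilon](t)\,\delta_t\rightharpoonup\sum_{t\in S_u\cap(a,b)}[u](t)\,\delta_t&\text{weakly in }\mathcal{M}(a,b),\\\mathcal{H}^0(S_u\cap(a,b))\le\displaystyle\liminf_{\varepsilon\to0}\mathcal{H}^0(S_{\tilde u_\varepsilon}\cap(a,b)).\end{cases}$$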
Note that, thanks to the coercivity assumption on $W_j$, $(\tilde u'_{\varepsilon,a})_\varepsilon$ is equi-integrable on $(0,1)$ and thus weakly converges to $u'_a$ in $L^1(0,1)$. Moreover, $u'$ is a Borel measure on $(0,1)$ as a nonnegative distribution, so that $u$ belongs to $BV(0,1)$. Therefore $u$ belongs to $SBV(0,1)$ owing to the characterization of the space $SBV(0,1)$ (see Theorem 10.5.1). To prove the last assertion, it is enough to establish
$$\sum_{t\in S_u\cap(a,b)}g([u](t))\le\liminf_{\varepsilon\to0}\sum_{t\in S_{\tilde u_\varepsilon}}g([\tilde u_\varepsilon](t))$$
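and to let $a$ go to $0$ and $b$ to $1$. Set $\mu_\varepsilon=g([\tilde u_\varepsilon])\,\mathcal{H}^0\lfloor(S_{\tilde u_\varepsilon}\cap(a,b))$. From $\sup_\varepsilon\tilde F_\varepsilon(\omega,\tilde u_\varepsilon)<+\infty$, up to a subsequence, $\mu_\varepsilon$ weakly converges to a Borel measure $\mu\in\mathcal{M}(a,b)$. Let $\mu=\theta\,\mathcal{H}^0\lfloor(S_u\cap(a,b))+\mu^s$ be its Lebesgue–Nikodym decomposition with respect to the Borel measure $\mathcal{H}^0\lfloor(S_u\cap(a,b))$. It now suffices to establish $\theta(t_0)\ge g([u](t_0))$ for $\mathcal{H}^0\lfloor(S_u\cap(a,b))$-a.e. $t_0$ in $(a,b)$. For such $t_0$ and for a.e. $\rho>0$, we have
$$\begin{aligned}\theta(t_0)&=\lim_{\rho\to0}\lim_{\varepsilon\to0}\frac{\mu_\varepsilon(B_\rho(t_0))}{\mathcal{H}^0\big(S_u\cap(a,b)\cap B_\rho(t_0)\big)}=\lim_{\rho\to0}\lim_{\varepsilon\to0}\mu_\varepsilon(B_\rho(t_0))\\&=\lim_{\rho\to0}\lim_{\varepsilon\to0}\sum_{t\in B_\rho(t_0)\cap S_{\tilde u_\varepsilon}\cap(a,b)}g([\tilde u_\varepsilon](t))\\&\ge\liminf_{\rho\to0}\liminf_{\varepsilon\to0}g\Big(\sum_{t\in B_\rho(t_0)\cap S_{\tilde u_\varepsilon}\cap(a,b)}[\tilde u_\varepsilon](t)\Big)\\&\ge\liminf_{\rho\to0}g\Big(\sum_{t\in B_\rho(t_0)\cap S_u\cap(a,b)}[u](t)\Big)=g([u](t_0)),\end{aligned}$$
where we have used the subadditivity and continuity assumptions on $g$.
We now establish the lower bound.
Proposition 14.2.3. There exists a set $\widetilde\Omega$ of full probability such that, for all $\omega$ in $\widetilde\Omega$ and all $u$, $u_\varepsilon$ in $L^1(0,1)$ satisfying $u_\varepsilon\to u$ strongly in $L^1(0,1)$,
$$F(u)\le\liminf_{\varepsilon\to0}F_\varepsilon(\omega,u_\varepsilon).$$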
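Proof. We assume $\liminf_{\varepsilon\to0}F_\varepsilon(\omega,u_\varepsilon)<+\infty$. Define $v_\varepsilon$ in $W^{1,1}(0,1)$ by $v_\varepsilon(t)=\int_0^t\tilde u'_{a,\varepsilon}\,ds$. Since $\tilde u'_{a,\varepsilon}$ weakly converges to $u'_a$ in $L^2(0,1)$, $v_\varepsilon$ strongly converges in $L^2(0,1)$ to the function $v$ of $L^2(0,1)$ defined by $v(t)=\int_0^tu'_a\,ds$. Therefore, according to Remark 14.2.1 seen for the first model, there exists a set $\widetilde\Omega$ of full probability such that, for all $\omega\in\widetilde\Omega$,
$$\liminf_{\varepsilon\to0}\int_0^1W\Big(\omega,\frac t\varepsilon,v'_\varepsilon\Big)\,dt\ge\int_0^1W^{\mathrm{hom}}(v')\,dt.$$
Therefore
$$\liminf_{\varepsilon\to0}\int_0^1W\Big(\omega,\frac t\varepsilon,\tilde u'_{a,\varepsilon}\Big)\,dt\ge\int_0^1W^{\mathrm{hom}}(u'_a)\,dt,$$
and $u'_a>0$ a.e. in $(0,1)$. We end the proof by applying Lemma 14.2.3.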
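The lower bound rests on rewriting $F_\varepsilon$ through $\tilde u$ as a bulk term plus a jump term. In the discrete setting this can be checked cell by cell: a cell whose slope $e$ exceeds the threshold contributes $\varepsilon W_\varepsilon=g(\varepsilon(e-1))$, while in $\tilde u$ that cell gets slope $1$ (where $W_j$ vanishes) and a jump of size $u(\varepsilon(z+1))-u(\varepsilon z)-\varepsilon$ at its right end, whose cost is the same $g$-value. The sketch below computes both sides for a hypothetical environment with $W_j(e)=c_j(e-1)^2$ for $e\ge1$, $-c_j\ln e$ on $(0,1)$, $c=(1,2)$, and $g(s)=1+s$, for which the threshold is $e_{j,\varepsilon}=1+\big(1+\sqrt{1+4c_j/\varepsilon}\big)/(2c_j)$; in this discrete computation the two sides agree, which is consistent with the inequality $F_\varepsilon(\omega,u)\ge\tilde F_\varepsilon(\omega,\tilde u)$.

```python
import math
import random

C = [1.0, 2.0]                  # hypothetical c_j
g = lambda s: 1.0 + s           # subadditive, inf g = 1 > 0

def W(e, c):
    # W_j(e) = c_j (e - 1)^2 for e >= 1, -c_j ln e on (0, 1), +inf for e <= 0
    if e <= 0:
        return math.inf
    return c * (e - 1.0) ** 2 if e >= 1.0 else -c * math.log(e)

def threshold(eps, c):
    # e_{j,eps}: root e > 1 of c (e - 1)^2 = g(eps (e - 1)) / eps = 1/eps + (e - 1)
    return 1.0 + (1.0 + math.sqrt(1.0 + 4.0 * c / eps)) / (2.0 * c)

def F_direct(u, labels, eps):
    """sum_z eps * W_eps(omega, z, (u_{z+1} - u_z) / eps) for nodal values u_0, ..., u_n."""
    total = 0.0
    for z, j in enumerate(labels):
        e = (u[z + 1] - u[z]) / eps
        if e <= threshold(eps, C[j]):
            total += eps * W(e, C[j])
        else:
            total += g(eps * (e - 1.0))            # eps * (1/eps) g(eps (e - 1))
    return total

def F_split(u, labels, eps):
    """Bulk energy of u~ plus its jump energy: above-threshold cells get slope 1
    (where W_j vanishes) and a jump u_{z+1} - u_z - eps at eps (z + 1)."""
    bulk, jumps = 0.0, []
    for z, j in enumerate(labels):
        e = (u[z + 1] - u[z]) / eps
        if e <= threshold(eps, C[j]):
            bulk += eps * W(e, C[j])
        else:
            bulk += eps * W(1.0, C[j])             # = 0
            jumps.append(u[z + 1] - u[z] - eps)    # [u~](eps (z + 1))
    return bulk + sum(g(a) for a in jumps), jumps

if __name__ == "__main__":
    n = 50
    eps = 1.0 / n
    rng = random.Random(3)
    labels = rng.choices(range(len(C)), k=n)
    # slope 1 everywhere except one cell carrying an extra rise of 0.5
    u = [z * eps + (0.5 if z >= 25 else 0.0) for z in range(n + 1)]
    print(F_direct(u, labels, eps), F_split(u, labels, eps))
```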
To conclude the proof of Theorem 14.2.4, we now establish the upper bound.
Proposition 14.2.4. There exists a subset $\widehat\Omega$ of full probability such that, for all $\omega\in\widehat\Omega$, there exists a subsequence $(n_k)_{k\in\mathbb{N}}$ of positive integers satisfying the following: for all $u$ in $L^1(0,1)$, there exists $u_{n_k}$ in $L^1(0,1)$, possibly depending on $\omega$, converging to $u$ and such that $\limsup_{k\to+\infty}F_{1/n_k}(\omega,u_{n_k})\le F(u)$.
Proof. Without loss of generality, one may assume $F(u)<+\infty$ and $S_u=\{t_0\}$. Let $m\in\mathbb{N}^*$ and $i_0\in\mathbb{N}$ be such that $t_0\in[i_0/m,(i_0+1)/m)$, and set $I_m=(0,i_0/m)\cup((i_0+1)/m,1)$. According to Remark 14.2.1, where $(0,1)$ is replaced by $(0,i_0/m)$ or $((i_0+1)/m,1)$, there exists a subset $\widehat\Omega$ of full probability, which can be chosen independent of $m$, such that for all $\omega\in\widehat\Omega$ there exists $w_n\in A_{1/mn}(0,i_0/m)\cap A_{1/mn}((i_0+1)/m,1)$ satisfying
$$\begin{cases}\lim_{n\to+\infty}w_n=u\ \text{strongly in }L^1(I_m),\\w_n\big(\frac{i_0}{m}\big)=u\big(\frac{i_0}{m}\big)\ \text{and}\ w_n\big(\frac{i_0+1}{m}\big)=u\big(\frac{i_0+1}{m}\big),\\\displaystyle\limsup_{n\to+\infty}\int_{I_m}W(\omega,nmt,w'_n)\,dt\le\int_{I_m}W^{\mathrm{hom}}(u'_a)\,dt\le\int_0^1W^{\mathrm{hom}}(u'_a)\,dt.\end{cases}$$
Noticing that $W_{1/mn}\le W$, we obtain
$$\limsup_{n\to+\infty}\int_{I_m}W_{1/mn}(\omega,nmt,w'_n)\,dt\le\int_0^1W^{\mathrm{hom}}(u'_a)\,dt.\tag{14.12}$$
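Consider now the function $w_{n,m}$ defined as follows:
$w_{n,m}=w_n$ on $I_m$;
$w_{n,m}\big(\frac{i_0}{m}\big)=u\big(\frac{i_0}{m}\big)$ and $w_{n,m}\big(\frac{i_0+1}{m}\big)=u\big(\frac{i_0+1}{m}\big)$;
$w_{n,m}$ is affine on $\big(\frac{i_0}{m},\frac{i_0+1}{m}-\frac1{nm}\big)$ with $w'_{n,m}=1$;
$w_{n,m}$ is affine on $\big(\frac{i_0+1}{m}-\frac1{nm},\frac{i_0+1}{m}\big)$.
Clearly $w_{n,m}$ belongs to $A_{1/mn}(0,1)$, and the slope $e$ of its restriction to $\big(\frac{i_0+1}{m}-\frac1{nm},\frac{i_0+1}{m}\big)$ satisfies
$$\frac1{mn}\,e=u\Big(\frac{i_0+1}{m}\Big)-u\Big(\frac{i_0}{m}\Big)-\frac{n-1}{mn}>u\Big(\frac{i_0+1}{m}\Big)-u\Big(\frac{i_0}{m}\Big)-\frac1m,$$
where the last term tends to $[u](t_0)>0$ as $m\to+\infty$. Since $\frac1{mn}e_{\omega_{i_0+1},1/mn}$ tends to $0$, for $n$ large enough we have $e>e_{\omega_{i_0+1},1/mn}$. A straightforward calculation then yields
$$\int_{(0,1)\setminus I_m}W_{1/mn}(\omega,nmt,w'_{n,m})\,dt=g\Big(u\Big(\frac{i_0+1}{m}\Big)-u\Big(\frac{i_0}{m}\Big)-\frac1m\Big),$$
so that
$$\limsup_{m\to+\infty}\limsup_{n\to+\infty}\int_{(0,1)\setminus I_m}W_{1/mn}(\omega,nmt,w'_{n,m})\,dt=g([u](t_0)).\tag{14.13}$$
Finally, combining (14.12) and (14.13), we obtain
$$\limsup_{m\to+\infty}\limsup_{n\to+\infty}\int_0^1W_{1/mn}(\omega,nmt,w'_{n,m})\,dt\le F(u).$$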
On the other hand, clearly
$$\limsup_{m\to+\infty}\limsup_{n\to+\infty}w_{n,m}=u\quad\text{strongly in }L^1(0,1),$$
and we complete the proof by a diagonalization argument as in Proposition 14.2.2.
Remark 14.2.2. One can slightly generalize this model by assuming the function $g$ to be of random type (see [159], [160]). More precisely, let $\{g_j,\ j\in J\}$ be a finite set of Lipschitz functions mapping $[0,+\infty)$ into $(0,+\infty)$, satisfying $\inf_{[0,+\infty)}g_j>0$, and not necessarily subadditive. We set $\Omega=\{(W_j,g_j),\ j\in J\}^{\mathbb{Z}}$ and
$$W_{j,\varepsilon}(e)=\begin{cases}W_j(e)&\text{if }e\le1,\\W_j(e)\wedge\dfrac1\varepsilon g_j(\varepsilon(e-1))&\text{if }e>1,\end{cases}$$
which, as previously, leads to the random function defined by
$$W_\varepsilon(\omega,t,e)=\begin{cases}\omega_z^1(e)&\text{if }e\le e_{\omega_z,\varepsilon},\\\dfrac1\varepsilon\,\omega_z^2(\varepsilon(e-1))&\text{if }e>e_{\omega_z,\varepsilon},\end{cases}$$
for all $\omega=((\omega_z^1,\omega_z^2))_{z\in\mathbb{Z}}$ in $\Omega$, $t$ in $[z,z+1)$, and $e$ in $\mathbb{R}$. Then the corresponding total energy almost surely $\Gamma$-converges to the functional
$$F(u)=\begin{cases}\displaystyle\int_0^1W^{\mathrm{hom}}(u'_a)\,dt+\sum_{t\in S_u}g^{\mathrm{hom}}([u](t))&\text{if }u\in SBV^+(0,1),\\+\infty&\text{otherwise.}\end{cases}$$
Setting $g(\omega,t,\cdot)=\omega_z^2$ for all $t$ in $[z,z+1)$, the density $g^{\mathrm{hom}}(a)$ at $a\in\mathbb{R}$ is the almost sure deterministic limit, as $T$ goes to $+\infty$, of the process
$$G_{(-T,T)}(a)=\inf\left\{\sum_{t\in(-T,T)\cap S_v}g(\omega,t,[v](t)):v\in SBV^+_{0,a}(-T,T)\right\},$$
where
$$SBV^+_{0,a}(-T,T)=\{v\in SBV^+(-T,T):v'_a=0,\ v(-T)=0,\ v(T)=a\}.$$
It is easily seen that $a\mapsto g^{\mathrm{hom}}(a)$ is subadditive.
14.3 The Mumford–Shah model
Let us recall the Mumford–Shah model discussed in Section 12.4. Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$ and $g$ a given function in $L^\infty(\Omega)$. Denoting by $\mathcal{F}$ the class of closed sets of $\Omega$, for all $K$ in $\mathcal{F}$ and all $u$ in $C^1(\Omega\setminus K)$ we defined the functional
$$E(u,K):=\int_{\Omega\setminus K}|u-g|^2\,dx+\int_{\Omega\setminus K}|\nabla u|^2\,dx+\mathcal{H}^{N-1}(K)$$
and the associated optimization problem, which is the strong formulation of the Mumford–Shah model in image segmentation:
$$\inf\{E(u,K):(u,K)\in C^1(\Omega\setminus K)\times\mathcal{F}\}.\tag{14.14}$$
When $\Omega$ is a rectangle in $\mathbb{R}^2$ and $g(x)$ is the intensity of the light signal striking the point $x$, (14.14) is the Mumford–Shah model of image segmentation: $K$ may be considered as the outline of the given light image in computer vision. In Section 12.4 we introduced the corresponding weak formulation,
$$m_w:=\inf\left\{\int_\Omega|u-g|^2\,dx+\int_\Omega|\nabla u|^2\,dx+\mathcal{H}^{N-1}(S_u):u\in SBV(\Omega)\right\},\tag{14.15}$$
where $\nabla u$ denotes the density of the Lebesgue part of $Du$. We may now establish the weak existence result.
Theorem 14.3.1. There exists at least one solution of the weak problem (14.15).
Proof. We adopt the strategy of the so-called direct methods of the calculus of variations. From the hypothesis $g\in L^\infty(\Omega)$, we may assume that all the admissible functions $u$ in (14.15) are uniformly bounded in $L^\infty(\Omega)$ by $\|g\|_{L^\infty(\Omega)}$. Indeed, let $c=\|g\|_{L^\infty(\Omega)}$ and consider the truncated function $u_c=c\wedge u\vee(-c)$. It is easily seen that $u_c$ belongs to $SBV(\Omega)$ and satisfies
$$S_{u_c}\subset S_u,\qquad\int_\Omega|\nabla u_c|^2\,dx\le\int_\Omega|\nabla u|^2\,dx,\qquad\int_\Omega|u_c-g|^2\,dx\le\int_\Omega|u-g|^2\,dx$$
(note that only the last inequality requires the explicit value of $c$). Let $(u_n)_n$ be a minimizing sequence of (14.15). It obviously satisfies
$$\|u_n\|_\infty+\int_\Omega|\nabla u_n|^2\,dx+\mathcal{H}^{N-1}(S_{u_n})\le C,$$
where $C$ is a constant which does not depend on $n$. According to Theorem 13.4.3, there exist a subsequence $(u_{n_k})_k$ and a function $u^*$ in $SBV(\Omega)$ such that
$$u_{n_k}\to u^*\ \text{in }L^1_{\mathrm{loc}}(\Omega),\qquad\nabla u_{n_k}\rightharpoonup\nabla u^*\ \text{in }L^2(\Omega,\mathbb{R}^N),\qquad\mathcal{H}^{N-1}(S_{u^*})\le\liminf_{k\to+\infty}\mathcal{H}^{N-1}(S_{u_{n_k}}).$$
Extracting a further subsequence so that $u_{n_k}\to u^*$ a.e. in $\Omega$, and according to Fatou's lemma and to the weak lower semicontinuity of the norm of the space $L^2(\Omega,\mathbb{R}^N)$, we deduce
$$\begin{aligned}\int_\Omega|u^*-g|^2\,dx+\int_\Omega|\nabla u^*|^2\,dx+\mathcal{H}^{N-1}(S_{u^*})&\le\liminf_{k\to+\infty}\int_\Omega|u_{n_k}-g|^2\,dx+\liminf_{k\to+\infty}\int_\Omega|\nabla u_{n_k}|^2\,dx+\liminf_{k\to+\infty}\mathcal{H}^{N-1}(S_{u_{n_k}})\\&\le\liminf_{k\to+\infty}\Big(\int_\Omega|u_{n_k}-g|^2\,dx+\int_\Omega|\nabla u_{n_k}|^2\,dx+\mathcal{H}^{N-1}(S_{u_{n_k}})\Big)=m_w,\end{aligned}$$
which proves that $u^*$ is a solution of (14.15).
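The truncation step at the start of this proof rests on two elementary facts: $t\mapsto c\wedge t\vee(-c)$ is 1-Lipschitz, so differences (hence gradients and jumps) can only shrink, and it is the projection onto $[-c,c]$, so it brings $u$ closer to any $g$ with values in $[-c,c]$. The discrete sketch below checks the three inequalities on a 1D grid, where the "jump set" is, as a hypothetical stand-in, the set of grid bonds whose difference exceeds a tolerance. It also shows that only the fidelity inequality needs $c\ge\|g\|_\infty$.

```python
import random

def clip(v, c):
    # u_c = c ∧ u ∨ (-c), the truncation used in the proof
    return max(-c, min(c, v))

def fidelity(u, g):
    return sum((a - b) ** 2 for a, b in zip(u, g))

def dirichlet(u):
    # discrete analogue of the integral of |grad u|^2 on a unit-spaced 1D grid
    return sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))

def jump_set(u, tol):
    # hypothetical discrete jump set: bonds where the difference exceeds tol
    return {i for i in range(len(u) - 1) if abs(u[i + 1] - u[i]) > tol}

def check_truncation(u, g, c, tol=0.5):
    uc = [clip(x, c) for x in u]
    return (
        jump_set(uc, tol) <= jump_set(u, tol),   # S_{u_c} ⊂ S_u
        dirichlet(uc) <= dirichlet(u),           # gradient term does not increase
        fidelity(uc, g) <= fidelity(u, g),       # needs |g| <= c
    )

if __name__ == "__main__":
    rng = random.Random(1)
    g = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    u = [rng.uniform(-3.0, 3.0) for _ in range(50)]
    c = max(abs(x) for x in g)                   # c = ||g||_inf
    print(check_truncation(u, g, c))             # all three inequalities hold
    print(check_truncation([1.0], [1.0], 0.1))   # c too small: fidelity increases
```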
We now establish the existence of a solution for the strong problem (14.14).
Theorem 14.3.2. There exists at least one solution of the strong problem (14.14).
Proof. The proof proceeds in three steps.
First step. Let $m_s$ be the value of the infimum of problem (14.14). We begin by showing that $m_w\le m_s$. Indeed, arguing as in the previous proof, one may assume that all the admissible functions of (14.14) are uniformly bounded in $L^\infty(\Omega)$. Moreover, according to Example 10.5.1, for all $K$ in $\mathcal{F}$ such that $\mathcal{H}^{N-1}(K)<+\infty$, the space $W^{1,1}(\Omega\setminus K)\cap L^\infty(\Omega)$ is included in $SBV(\Omega)$ and all its elements satisfy $\mathcal{H}^{N-1}(S_u\setminus K)=0$.
Second step. We establish that any solution $u^*$ of (14.15) satisfies
$$u^*\in C^1(\Omega\setminus\overline{S_{u^*}}),\qquad\mathcal{H}^{N-1}\big((\overline{S_{u^*}}\cap\Omega)\setminus S_{u^*}\big)=0.$$
We only prove the first assertion. The second is more involved, and we refer the reader to the paper of De Giorgi, Carriero, and Leaci [123]. Let $B_\rho(x)$ be the open ball centered at $x$ with radius $\rho$ small enough so that $B_\rho(x)\subset\Omega\setminus\overline{S_{u^*}}$. Then $u^*$ belongs to $W^{1,2}(B_\rho(x))$ and minimizes the problem
$$\inf\left\{\int_{B_\rho(x)}|\nabla v|^2\,dx+\int_{B_\rho(x)}|v-g|^2\,dx:v\in u^*+W^{1,2}_0(B_\rho(x))\right\}.$$
Thus $u^*$ is a solution of the Dirichlet problem $-\Delta v+v=g$ in $B_\rho(x)$, $v=u^*$ on $\partial B_\rho(x)$. According to classical results on the regularity of solutions of Dirichlet problems (see, for instance, [90], [193]), we have $u^*\in C^1(B_\rho(x))$.
Last step. Collecting the two previous steps, we straightforwardly deduce that $m_w=m_s$ and that $(u^*,\overline{S_{u^*}}\cap\Omega)$ is a solution of (14.14). For other variational models in computer vision and image processing, see [183], [41], [42], [43], and references therein.
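In one dimension the weak problem (14.15) can be explored numerically. The sketch below uses a hypothetical discretization that does not come from the text: on a grid with unit spacing it minimizes $\sum_k(u_k-g_k)^2+\mu\sum(u_{k+1}-u_k)^2$ over bonds that are not cut, plus $\lambda$ for each cut bond, mimicking $|u-g|^2$, $|\nabla u|^2$ and $\mathcal{H}^0(S_u)$. For a fixed set of cuts the problem splits into independent quadratic problems on the segments, each solved by a tridiagonal (Thomas) solve of $(I+\mu L)u=g$, with $L$ the path-graph Laplacian; the optimal cut set is then found exactly by dynamic programming over segment endpoints ($O(n^3)$ work, fine for small $n$).

```python
import math

def solve_segment(g, mu):
    """Minimize sum (u_k - g_k)^2 + mu * sum (u_{k+1} - u_k)^2 on one segment.
    The optimality condition is (I + mu L) u = g, L the path-graph Laplacian,
    a symmetric tridiagonal system solved by the Thomas algorithm."""
    m = len(g)
    if m == 1:
        return [g[0]], 0.0
    diag = [1.0 + mu * (1 if k in (0, m - 1) else 2) for k in range(m)]
    off = -mu                                  # sub- and super-diagonal entries
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = off / diag[0], g[0] / diag[0]
    for k in range(1, m):                      # forward elimination
        denom = diag[k] - off * cp[k - 1]
        cp[k] = off / denom
        dp[k] = (g[k] - off * dp[k - 1]) / denom
    u = [0.0] * m
    u[-1] = dp[-1]
    for k in range(m - 2, -1, -1):             # back substitution
        u[k] = dp[k] - cp[k] * u[k + 1]
    cost = sum((a - b) ** 2 for a, b in zip(u, g))
    cost += mu * sum((u[k + 1] - u[k]) ** 2 for k in range(m - 1))
    return u, cost

def mumford_shah_1d(g, mu, lam):
    """Exact minimum of the discrete functional over all cut sets, by dynamic
    programming: best[j] = min_i best[i] + cost(g[i:j]) + lam * [i > 0]."""
    n = len(g)
    best = [0.0] + [math.inf] * n
    arg = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            _, c = solve_segment(g[i:j], mu)
            val = best[i] + c + (lam if i > 0 else 0.0)
            if val < best[j]:
                best[j], arg[j] = val, i
    cuts, j = [], n
    while j > 0:                               # recover the segment start points
        if arg[j] > 0:
            cuts.append(arg[j])                # cut between nodes arg[j]-1 and arg[j]
        j = arg[j]
    return best[n], sorted(cuts)

if __name__ == "__main__":
    step = [0.0] * 10 + [1.0] * 10
    print(mumford_shah_1d(step, mu=10.0, lam=0.1))   # a cheap jump: one cut at index 10
    print(mumford_shah_1d(step, mu=10.0, lam=5.0))   # jump too expensive: no cut
```

The two runs illustrate the trade-off in (14.15): when the jump penalty is small, the minimizer places a discontinuity at the edge of $g$; when it is large, it prefers a smooth transition paid for by the gradient and fidelity terms.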
Chapter 15
Variational problems with a lack of coercivity
As we have seen in Section 3.2, every minimization problem of a coercive lower semicontinuous function admits a solution. On the other hand, without the coercivity assumption we cannot, in general, apply the direct method, and the existence of a minimizer may fail. This may occur even if the cost function is convex, as happens, for instance, for $\min\{e^x : x\in\mathbb{R}\}$. However, some minimum problems, even if not coercive, still admit a solution, as the case $\min\{x^2 : (x,y)\in\mathbb{R}^2\}$ trivially shows. In this chapter we present some methods which allow us to identify the noncoercive minimum problems which admit a solution. The history of these tools goes back to Stampacchia [210] and Fichera [135], who developed them to treat noncoercive cases in the framework of variational inequalities and of unilateral contact problems in elasticity, respectively. On the other hand, at least in finite dimensional convex situations, the geometrical tool of the recession function was introduced by Rockafellar in 1964, and it has proved very useful in a large number of cases. The theory we present in Section 15.1 for the convex case and in Section 15.2 for the general one first appeared in the paper by Baiocchi et al. in 1988 [50]; it makes it possible to treat in a unified way problems of geometrical type as well as problems coming from continuum mechanics.
15.1
Convex minimization problems and recession functions
In this section we will treat convex minimum problems, not necessarily coercive, and we will prove the existence of minimizers provided some compatibility conditions are satisfied. The simplest example of a variational noncoercive minimization problem is the classical Neumann problem presented in Section 6.2,
$$\min\Big\{\frac12\int_\Omega|Du|^2\,dx-\langle L,u\rangle : u\in H^1(\Omega)\Big\},$$
where $\Omega$ is a connected bounded Lipschitz domain of $\mathbb{R}^n$ and $L$ belongs to the dual space $H^1(\Omega)'$. The Euler–Lagrange equation of the minimization problem above can be written in the weak form
$$\int_\Omega Du\,D\varphi\,dx=\langle L,\varphi\rangle\qquad\forall\varphi\in H^1(\Omega).\tag{15.1}$$
Now, if the term $L$ on the right-hand side of (15.1) is of the form
$$\langle L,\varphi\rangle=\int_\Omega f\varphi\,dx+\int_{\partial\Omega}g\varphi\,d\mathcal{H}^{n-1}$$
with $f\in L^2(\Omega)$ and $g\in L^2(\partial\Omega)$, which we write as $L=f+g$, integrating by parts the left-hand side of (15.1) we obtain the PDE problem
$$-\Delta u=f\ \text{ in }\Omega,\qquad\frac{\partial u}{\partial\nu}=g\ \text{ on }\partial\Omega.$$
It is well known that a solution of the problem above exists iff the compatibility condition
$$\langle L,1\rangle=0\tag{15.2}$$
is fulfilled. This can be seen in a simple way by remarking that if a solution of (15.1) exists, condition (15.2) follows straightforwardly by taking as a test function $\varphi=1$; on the other hand, if the compatibility condition (15.2) is fulfilled, then by the Poincaré inequality (Theorem 5.3.1), the minimization problem becomes coercive as soon as it is restricted to the class of functions in $H^1(\Omega)$ with zero average. Therefore we obtain the existence of a solution $u$ for the problem written in weak form as
$$\int_\Omega Du\,D\psi\,dx=\langle L,\psi\rangle\qquad\forall\psi\in H^1(\Omega),\ \int_\Omega\psi\,dx=0.$$
Now, this implies the existence of a solution for (15.1) by noticing that every function $\varphi$ in $H^1(\Omega)$ can be written as $\varphi=\psi+c$, where $\psi$ is a function with zero average and $c\in\mathbb{R}$, so that condition (15.2) yields
$$\int_\Omega Du\,D\varphi\,dx=\int_\Omega Du\,D\psi\,dx=\langle L,\psi\rangle=\langle L,\varphi\rangle.$$
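On the interval $\Omega=(0,1)$ the compatibility condition (15.2) can be checked by hand: the outward normal derivative is $-u'(0)$ at $x=0$ and $u'(1)$ at $x=1$, so, writing $g_0=-u'(0)$ and $g_1=u'(1)$, integration of $-u''=f$ gives $\int_0^1f\,dx+g_0+g_1=0$, which is exactly $\langle L,1\rangle=0$. The following sketch, which assumes this one-dimensional setting and uses the trapezoidal rule as quadrature, integrates the equation twice, raises an error when the discrete $\langle L,1\rangle$ is not zero, and otherwise fixes the free additive constant by imposing zero average, mirroring the restriction to zero-average functions used above. All names are ours.

```python
import math

def cumulative_trapezoid(values, h):
    """Running trapezoidal integral of equally spaced samples (first entry 0)."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append(out[-1] + 0.5 * h * (values[i - 1] + values[i]))
    return out

def solve_neumann_1d(f, g0, g1, n=1000, tol=1e-8):
    """Zero-average solution of -u'' = f on (0, 1) with -u'(0) = g0 and u'(1) = g1.

    Here <L, 1> = int_0^1 f dx + g0 + g1. If this (numerically computed) quantity
    is not zero, no solution exists and ValueError is raised.
    """
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    F = cumulative_trapezoid([f(x) for x in xs], h)  # F(x) ~ int_0^x f
    defect = F[-1] + g0 + g1                        # discrete version of <L, 1>
    if abs(defect) > tol:
        raise ValueError(f"compatibility condition fails: <L,1> = {defect:.3e}")
    du = [-g0 - Fx for Fx in F]          # u'(x) = u'(0) - int_0^x f, with u'(0) = -g0
    u = cumulative_trapezoid(du, h)      # one solution; all others differ by a constant
    mean = cumulative_trapezoid(u, h)[-1]
    return xs, [ui - mean for ui in u]   # select the zero-average representative

# f = pi^2 cos(pi x), g0 = g1 = 0 is compatible; the zero-average solution is cos(pi x).
xs, u = solve_neumann_1d(lambda x: math.pi**2 * math.cos(math.pi * x), 0.0, 0.0)
print(u[0], u[500], u[-1])
```

Once the defect vanishes, the condition $u'(1)=g_1$ holds automatically, since $u'(1)=-g_0-\int_0^1f\,dx=g_1$.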
In this section, $(V,\sigma)$ will denote a real locally convex Hausdorff topological vector space, and $F:V\to\,]-\infty,+\infty]$ will be a proper convex and sequentially $\sigma$-lower semicontinuous mapping. The minimization problem we are interested in is
$$\min\{F(v) : v\in V\}.$$
From the discussion above we know that the existence or nonexistence of a solution depends on some compatibility conditions that we want to identify. To do this we recall the classical definition of recession function introduced by Rockafellar [201] in the finite dimensional case (see also Sections 11.3 and 13.3).
Definition 15.1.1. Given a proper convex and sequentially $\sigma$-lower semicontinuous functional $F:V\to\,]-\infty,+\infty]$, the recession functional $F^\infty$ of $F$ is defined, for every $v\in V$, by
$$F^\infty(v)=\lim_{t\to+\infty}\frac{F(v_0+tv)}{t},\tag{15.3}$$
where $v_0$ is any element of $\operatorname{dom}F=\{v\in V : F(v)<+\infty\}$.

The main properties of the recession functional $F^\infty$ are listed in the following proposition.

Proposition 15.1.1. We have the following:

(i) The limit in (15.3) exists and is independent of $v_0$.

(ii) The functional $F^\infty$ can equivalently be expressed by
$$F^\infty(v)=\sup\{F(u+v)-F(u) : u\in\operatorname{dom}F\}=\sup\Big\{\frac{F(v_0+tv)-F(v_0)}{t} : t>0\Big\}$$
for any $v_0\in\operatorname{dom}F$.

(iii) $F^\infty$ is proper, convex, sequentially $\sigma$-lower semicontinuous, and positively 1-homogeneous, that is,
$$F^\infty(tv)=tF^\infty(v)\qquad\forall t\ge0,\ \forall v\in V.$$

(iv) For every $F_1,\dots,F_n$ proper, convex, sequentially $\sigma$-lower semicontinuous mappings with $(\operatorname{dom}F_1)\cap\dots\cap(\operatorname{dom}F_n)\ne\emptyset$, we have
$$\Big(\sum_{i=1}^nF_i\Big)^\infty=\sum_{i=1}^nF_i^\infty.$$

(v) $F^\infty(v)+F^\infty(-v)\ge0$ for every $v\in V$.

Proof. Let us prove that the limit in (15.3) exists. This follows from the fact that for every $v_0\in\operatorname{dom}F$ and $v\in V$ the function $\varphi(t)=F(v_0+tv)$ is convex on $\mathbb{R}$; hence the mapping $t\mapsto\big(\varphi(t)-\varphi(0)\big)/t$ is nondecreasing on $]0,+\infty[$, and so it admits a limit as $t\to+\infty$. Moreover, the fact that the definition of $F^\infty$ does not depend on $v_0\in\operatorname{dom}F$ follows from property (ii). Let us now prove that for every $v_0\in\operatorname{dom}F$ and $v\in V$
$$\lim_{t\to+\infty}\frac{F(v_0+tv)}{t}=\sup\Big\{\frac{F(v_0+tv)-F(v_0)}{t} : t>0\Big\}.$$
The inequality $\le$ is trivial because
$$\lim_{t\to+\infty}\frac{F(v_0+tv)}{t}=\lim_{t\to+\infty}\frac{F(v_0+tv)-F(v_0)}{t}\le\sup\Big\{\frac{F(v_0+tv)-F(v_0)}{t} : t>0\Big\}.$$
To prove the opposite inequality, fix $s>0$ and $t>s$; by the convexity of $F$ we have
$$F(v_0+sv)=F\Big(\Big(1-\frac st\Big)v_0+\frac st(v_0+tv)\Big)\le\Big(1-\frac st\Big)F(v_0)+\frac stF(v_0+tv),$$
so that
$$\frac{F(v_0+sv)-F(v_0)}{s}\le\frac{F(v_0+tv)-F(v_0)}{t}.$$
By letting first $t\to+\infty$, and then taking the supremum over $s>0$, we get
$$\sup\Big\{\frac{F(v_0+sv)-F(v_0)}{s} : s>0\Big\}\le\lim_{t\to+\infty}\frac{F(v_0+tv)}{t}.$$
To conclude the proof of (ii) it remains to prove that for every $v_0\in\operatorname{dom}F$ and $v\in V$ the equality
$$\sup_{u\in\operatorname{dom}F}\big(F(u+v)-F(u)\big)=\sup_{t>0}\frac{F(v_0+tv)-F(v_0)}{t}\tag{15.4}$$
holds. Let $u,v_0\in\operatorname{dom}F$ and $v\in V$; by using the convexity and lower semicontinuity of $F$ we obtain
$$F(u+v)\le\liminf_{t\to+\infty}F\Big(\Big(1-\frac1t\Big)u+\frac1t(v_0+tv)\Big)\le\liminf_{t\to+\infty}\Big[\Big(1-\frac1t\Big)F(u)+\frac1tF(v_0+tv)\Big]=F(u)+\lim_{t\to+\infty}\frac{F(v_0+tv)-F(v_0)}{t};$$
hence, inequality $\le$ in (15.4) is proved. To prove the opposite inequality, we denote by $S$ the left-hand side of (15.4); it is clear that without loss of generality we may assume $S<+\infty$. Then $u+v\in\operatorname{dom}F$ for every $u\in\operatorname{dom}F$ and so, from $F(u+v)\le S+F(u)$, we deduce for every integer $k\ge0$
$$F(u+kv)=F(u)+\sum_{i=1}^k\big(F(u+iv)-F(u+(i-1)v)\big)\le F(u)+kS.$$
Take now two integers $h\ge0$ and $k\ge1$; by using the convexity of $F$ and the inequality above we obtain
$$F\Big(u+\frac hkv\Big)=F\Big(\Big(1-\frac1k\Big)u+\frac1k(u+hv)\Big)\le\Big(1-\frac1k\Big)F(u)+\frac1kF(u+hv)\le\Big(1-\frac1k\Big)F(u)+\frac1k\big(F(u)+hS\big)=F(u)+\frac hkS.$$
Finally, approximating any $t\ge0$ by nonnegative rationals $h/k$ and using the lower semicontinuity of $F$, we have
$$F(u+tv)\le F(u)+tS\qquad\forall t\ge0.$$
Thus, taking $u=v_0$, we obtain
$$\frac{F(v_0+tv)-F(v_0)}{t}\le S\qquad\forall t>0,$$
which concludes the proof of (15.4). The fact that $F^\infty$ is proper, convex, and sequentially $\sigma$-lower semicontinuous follows from assertion (ii). Indeed, for every $u\in\operatorname{dom}F$ the mapping $v\mapsto F(u+v)-F(u)$ is clearly convex and sequentially $\sigma$-lower semicontinuous, and so is $F^\infty$, thanks to the well-known properties of the supremum of convex functions. The fact that $F^\infty$ is positively 1-homogeneous follows easily from the definition. Indeed, given $v\in V$ and $s>0$, we have
$$F^\infty(sv)=\lim_{t\to+\infty}\frac{F(v_0+tsv)}{t}$$
and, setting $\tau=ts$,
$$F^\infty(sv)=s\lim_{\tau\to+\infty}\frac{F(v_0+\tau v)}{\tau}=sF^\infty(v).$$
Assertion (iv) follows straightforwardly from the definition of recession function. Finally, to prove (v), we may reduce ourselves to the case when both $F^\infty(v)$ and $F^\infty(-v)$ are finite; otherwise the statement is trivial. Therefore, by using (ii), we deduce that
$$u+v\in\operatorname{dom}F,\qquad u-v\in\operatorname{dom}F\qquad\forall u\in\operatorname{dom}F;$$
hence, by using (ii) again and taking $u-v$ instead of $u$ in the supremum associated to $F^\infty(v)$, we have
$$F^\infty(v)+F^\infty(-v)\ge\sup_{u\in\operatorname{dom}F}\big(F(u)-F(u-v)\big)+\sup_{u\in\operatorname{dom}F}\big(F(u-v)-F(u)\big)\ge0,$$
which concludes the proof of Proposition 15.1.1.

Here are some simple examples in which the recession functional can be explicitly computed.

Example 15.1.1. Let $V=\mathbb{R}$ and let $F(x)=e^x$ for every $x\in\mathbb{R}$. Then an easy calculation shows that in this case
$$F^\infty(x)=\begin{cases}0&\text{if }x\le0,\\+\infty&\text{if }x>0.\end{cases}$$
Analogously, if $V=\mathbb{R}^2$ and $F(x)=x_1^2$, then
$$F^\infty(x)=\begin{cases}0&\text{if }x_1=0,\\+\infty&\text{if }x_1\ne0.\end{cases}$$
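Proposition 15.1.1(ii), together with the monotonicity used in the proof of (i), suggests a simple numerical probe of Example 15.1.1: evaluate the quotients $(F(v_0+tv)-F(v_0))/t$ for a few increasing values of $t$. The sketch below does this for $F(x)=e^x$ on $\mathbb{R}$ and $F(x)=x_1^2$ on $\mathbb{R}^2$. Sampling finitely many $t$ can only suggest the value of $F^\infty(v)$, never prove it; the helper names, the sample values of $t$, and the overflow guard returning `math.inf` are our own choices.

```python
import math

def recession_quotients(F, v0, v, ts=(1.0, 10.0, 100.0, 1000.0)):
    """Return (F(v0 + t v) - F(v0)) / t for each sampled t.

    For convex F these quotients are nondecreasing in t, and their limit as
    t -> +infinity is F^infinity(v) (Proposition 15.1.1).
    """
    Fv0 = F(v0)
    return [(F([a + t * b for a, b in zip(v0, v)]) - Fv0) / t for t in ts]

def F_exp(x):
    """F(x) = e^x on V = R (points are 1-element lists); +inf instead of overflow."""
    return math.exp(x[0]) if x[0] < 700.0 else math.inf

def F_sq(x):
    """F(x) = x_1^2 on V = R^2."""
    return x[0] ** 2

print(recession_quotients(F_exp, [0.0], [-1.0]))          # nondecreasing, tending to 0
print(recession_quotients(F_exp, [0.0], [1.0]))           # unbounded: F^inf(1) = +inf
print(recession_quotients(F_sq, [0.0, 0.0], [0.0, 1.0]))  # identically 0 along x_1 = 0
print(recession_quotients(F_sq, [0.0, 0.0], [1.0, 0.0]))  # equal to t: F^inf = +inf
```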
As we will see more precisely later, this example shows how the recession function indicates the directions of coercivity.

Example 15.1.2. Let $F:V\to[0,+\infty]$ be a nonnegative convex $\sigma$-lower semicontinuous functional which is positively homogeneous of degree $p>1$, that is,
$$F(tv)=t^pF(v)\qquad\forall t>0,\ \forall v\in V.$$
Then we have
$$F^\infty(v)=\begin{cases}0&\text{if }F(v)=0,\\+\infty&\text{otherwise.}\end{cases}$$
On the other hand, if $F$ is positively homogeneous of degree 1, then it is clear that $F^\infty=F$.

Example 15.1.3. Let $\Omega$ be an open subset of $\mathbb{R}^n$ with a Lipschitz boundary, and let $V=W^{1,p}(\Omega;\mathbb{R}^m)$ ($p\ge1$) be the Sobolev space of all $\mathbb{R}^m$-valued functions which are in $L^p(\Omega)$ along with their first derivatives. Consider the functional
$$F(u)=\int_\Omega f(x,Du)\,dx\qquad\forall u\in W^{1,p}(\Omega;\mathbb{R}^m),$$
where
(i) $f:\Omega\times\mathbb{R}^{mn}\to[0,+\infty]$ is a Borel function,
(ii) for a.e. $x\in\Omega$ the function $f(x,\cdot)$ is convex and lower semicontinuous on $\mathbb{R}^{mn}$,
(iii) there exists $u_0\in W^{1,p}(\Omega;\mathbb{R}^m)$ such that $F(u_0)<+\infty$.
It is well known (see, for instance, Section 13.1) that under the assumptions above, the functional $F$ turns out to be proper, convex, and sequentially lower semicontinuous with respect to the weak topology of $W^{1,p}(\Omega;\mathbb{R}^m)$. Moreover, we have
$$F^\infty(u)=\int_\Omega f^\infty(x,Du)\,dx\qquad\forall u\in W^{1,p}(\Omega;\mathbb{R}^m),$$
where $f^\infty(x,\cdot)$ is the recession function of $f(x,\cdot)$. In fact, because of the convexity of $f$, for all $u\in W^{1,p}(\Omega;\mathbb{R}^m)$ the function
$$g(x,t)=\frac{f\big(x,Du_0(x)+tDu(x)\big)-f\big(x,Du_0(x)\big)}{t}$$
is nondecreasing with respect to $t$ for a.e. $x\in\Omega$. Therefore the monotone convergence theorem gives
$$F^\infty(u)=\lim_{t\to+\infty}\frac{F(u_0+tu)-F(u_0)}{t}=\lim_{t\to+\infty}\int_\Omega g(x,t)\,dx=\int_\Omega f^\infty\big(x,Du(x)\big)\,dx.$$
As a consequence, if
$$F(u)=\int_\Omega|Du|^p\,dx\qquad\forall u\in W^{1,p}(\Omega;\mathbb{R}^m)$$
with $p>1$, we get
$$F^\infty(u)=\begin{cases}0&\text{if }u\text{ is locally constant in }\Omega,\\+\infty&\text{otherwise.}\end{cases}$$
We are now in a position to give a first result on necessary conditions in convex minimization.

Proposition 15.1.2. Assume that
$$\inf\{F(v) : v\in V\}>-\infty$$
(which always occurs, for instance, if $F$ admits a minimum point on $V$). Then
$$F^\infty(v)\ge0\qquad\forall v\in V.\tag{15.5}$$

Proof. Let $m$ be the infimum of $F$ on $V$, let $v_0$ be any point in $\operatorname{dom}F$, and let $v\in V$. Since $F$ is proper, the infimum $m$ is finite and so, by the definition of $F^\infty$, we obtain
$$F^\infty(v)=\lim_{t\to+\infty}\frac{F(v_0+tv)}{t}\ge\lim_{t\to+\infty}\frac mt=0.$$
Example 15.1.4. Let $P:V\to[0,+\infty]$ be a convex sequentially $\sigma$-lower semicontinuous functional which is positively homogeneous of degree $p>1$ (for instance, the $p$th power of a seminorm), let $L\in V'$, and let $F$ be the functional defined by
$$F(v)=P(v)-\langle L,v\rangle\qquad\forall v\in V.$$
By Example 15.1.2 we have
$$F^\infty(v)=\begin{cases}-\langle L,v\rangle&\text{if }P(v)=0,\\+\infty&\text{otherwise;}\end{cases}$$
hence, by the necessary condition of Proposition 15.1.2 we deduce that if the minimum problem
$$\min\{F(v) : v\in V\}$$
admits a solution, then the linear functional $L$ has to satisfy the compatibility condition
$$\langle L,v\rangle\le0\qquad\forall v\in V\text{ with }P(v)=0.$$
Example 15.1.5. The necessary condition of Proposition 15.1.2 is clearly not sufficient to obtain the existence of a minimizer, as the example of the function $F(x)=e^x$ with $V=\mathbb{R}$ shows. In fact, in this case $F^\infty\ge0$ (see Example 15.1.1), but the function $F$ has no minimum points on $\mathbb{R}$.

We now give an existence result for convex minimum problems without coercivity; the existence of a minimizer for the functional $F$ will be obtained by adding to the necessary condition (15.5) some more requirements, namely, semicontinuity, compactness, and
compatibility conditions, in the sense specified below. We will use the notation $\ker F^\infty$ for the set $\{v\in V : F^\infty(v)=0\}$, which, if condition (15.5) is satisfied, is a sequentially $\sigma$-closed convex cone.

Theorem 15.1.1. Let $V$ be a reflexive and separable Banach space with norm $\|\cdot\|$, and let $F:V\to\,]-\infty,+\infty]$ be a proper convex sequentially weakly lower semicontinuous functional. Assume that the following conditions are satisfied:
(i) compactness: if $t_h\to+\infty$, $v_h\to v$ weakly, and $F(t_hv_h)$ is bounded from above, then $\|v_h-v\|\to0$;
(ii) necessary condition: $F^\infty(v)\ge0$ for every $v\in V$;
(iii) compatibility: $\ker F^\infty$ is a linear subspace of $V$.
Then the minimum problem
$$\min\{F(v) : v\in V\}\tag{15.6}$$
admits at least one solution.

Proof. For convenience, we divide the proof into several steps.

Step 1. For every $h\in\mathbb{N}$ consider the minimum problem
$$\min\{F(v) : v\in B_h\},\tag{$\wp_h$}$$
where $B_h=\{v\in V : \|v\|\le h\}$. Since $F$ is sequentially weakly lower semicontinuous and $B_h$ is sequentially weakly compact, by the direct method of the calculus of variations (see Section 3.2) we obtain that for every $h\in\mathbb{N}$ there exists a solution $v_h$ of problem ($\wp_h$).

Step 2. If $\|v_h\|<h$ for some $h\in\mathbb{N}$, we claim that the proof of the theorem is achieved, because such a $v_h$ is a solution of problem (15.6). Indeed, due to the convexity of $F$, for every $v\in V$ and every $\theta\in\,]0,1[$,
$$F\big(v_h+\theta(v-v_h)\big)\le\theta F(v)+(1-\theta)F(v_h);$$
hence
$$F(v)-F(v_h)\ge\frac1\theta\Big[F\big(v_h+\theta(v-v_h)\big)-F(v_h)\Big].$$
Due to the definition of $v_h$, the right-hand side is nonnegative whenever $v_h+\theta(v-v_h)\in B_h$, which always occurs when $\theta$ is chosen small enough to have $\theta\|v-v_h\|\le h-\|v_h\|$. Therefore, to conclude the proof of the theorem, it remains to show that the case
$$\|v_h\|=h\qquad\forall h\in\mathbb{N}\tag{15.7}$$
also leads to the existence of a solution of problem (15.6). Thus, we assume for the rest of the proof that (15.7) holds.
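Steps 1 and 2 admit a constructive reading in the simplest case $V=\mathbb{R}$, where $B_h=[-h,h]$. The sketch below, which assumes a convex $F$ and uses ternary search as a one-dimensional solver (our choice, not part of the text), minimizes $F$ on $B_h$ for $h=1,2,\dots$ and stops at the first minimizer lying strictly inside the ball, which by Step 2 is a global minimizer. For $F(x)=(x-3)^2$ this happens at $h=4$; for $F(x)=e^{-x}$, which has no minimizer, every $v_h$ lies on the boundary, which is the case (15.7) treated in Steps 3 and 4. The function names and the tolerance `margin` are hypothetical.

```python
import math

def argmin_on_ball(F, h, iters=200):
    """Ternary search for a minimizer of a convex F: R -> R on B_h = [-h, h]."""
    lo, hi = -h, h
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if F(m1) <= F(m2):
            hi = m2   # by convexity some minimizer lies in [lo, m2]
        else:
            lo = m1   # otherwise some minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

def first_interior_minimizer(F, hs, margin=1e-6):
    """Steps 1-2 of the proof of Theorem 15.1.1 in V = R.

    For h in hs, compute v_h minimizing F on B_h; the first v_h with |v_h| < h
    is returned together with h, since by convexity it minimizes F on all of R.
    Returns None if |v_h| = h (up to margin) for every tested h, i.e. case (15.7).
    """
    for h in hs:
        vh = argmin_on_ball(F, h)
        if abs(vh) < h - margin:
            return h, vh
    return None

print(first_interior_minimizer(lambda x: (x - 3.0) ** 2, range(1, 11)))  # h = 4, v_h close to 3
print(first_interior_minimizer(lambda x: math.exp(-x), range(1, 11)))    # None
```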
Step 3. For every $h\in\mathbb{N}$ set $w_h=v_h/h$; by (15.7) we have $\|w_h\|=1$, so the sequence $(w_h)$ is bounded and hence relatively weakly compact in the reflexive space $V$. Then we may extract a subsequence, which we still denote by $(w_h)$, weakly converging to some $w\in V$. We have
$$F(hw_h)=F(v_h)\le F(v_0)<+\infty\tag{15.8}$$
for every $h$ large enough (namely $h\ge\|v_0\|$, so that $v_0\in B_h$), where $v_0$ is any point in $\operatorname{dom}F$. Therefore, by the compactness assumption (i), we obtain $\|w_h-w\|\to0$. Moreover, by using the lower semicontinuity and convexity of $F$, we have for every $t>0$
$$F(tw+v_0)\le\liminf_{h\to+\infty}F\Big(tw_h+\Big(1-\frac th\Big)v_0\Big)\le\liminf_{h\to+\infty}\Big[\frac thF(hw_h)+\Big(1-\frac th\Big)F(v_0)\Big]\le F(v_0),$$
where the last inequality follows from (15.8). Hence
$$F^\infty(w)=\lim_{t\to+\infty}\frac{F(tw+v_0)}{t}\le\lim_{t\to+\infty}\frac{F(v_0)}{t}=0,$$
which, together with the necessary condition (ii), implies $w\in\ker F^\infty$.

Step 4. By the compatibility condition (iii) we obtain $F^\infty(-w)=0$, and this implies, by Proposition 15.1.1(ii), that
$$F(v-tw)\le F(v)\qquad\forall v\in V,\ \forall t\ge0.\tag{15.9}$$
In particular, take $v=v_h$ and $t=h$: since $\|v_h-hw\|=h\|w_h-w\|$ and $\|w_h-w\|\to0$, we have $\|v_h-hw\|<h$ for $h$ large enough, and then (15.9) shows that $v_h-hw$ is a solution of problem ($\wp_h$). Hence, for $h$ large enough, we have found a solution of problem ($\wp_h$) with norm strictly less than $h$, and by repeating the argument used in Step 2, this provides a solution of the minimum problem (15.6).

Remark 15.1.1. In the case of nonreflexive Banach spaces $V$ the result above still holds; it is enough to assume that $V$ is a Banach space with norm $\|\cdot\|$ and to consider a topology $\sigma$ on $V$ coarser than the norm topology and such that $(V,\sigma)$ is a Hausdorff vector space with the closed unit ball of $(V,\|\cdot\|)$ sequentially $\sigma$-compact. This happens, for instance, when $V$ is the dual $W'$ of a separable Banach space $W$ and $\sigma$ is the weak* topology on $V$. Then the proof above can be repeated if $F:V\to\,]-\infty,+\infty]$ is assumed to be proper, convex, and sequentially $\sigma$-lower semicontinuous and such that (ii), (iii) hold together with the compactness condition: if $t_h\to+\infty$, $v_h\to v$ with respect to $\sigma$, and $F(t_hv_h)$ is bounded from above, then $\|v_h-v\|\to0$.
Remark 15.1.2. Notice that if $\operatorname{dom}F$ is bounded in $(V,\|\cdot\|)$, then conditions (i), (ii), and (iii) are automatically fulfilled. In fact, in this case the set $\ker F^\infty$ reduces to $\{0\}$, as can be seen immediately.

Remark 15.1.3. Consider the particular case when the so-called condition of Lions–Stampacchia type (see [173])
$$F^\infty(v)>0\qquad\forall v\ne0\tag{15.10}$$
holds. Then we obtain immediately that assumptions (ii) and (iii) are fulfilled. By Theorem 15.1.1 we obtain that, if the compactness assumption (i) also holds, then the set of solutions of problem (15.6) is nonempty. In this case the set of solutions is also bounded. In fact, assume by contradiction that there exist solutions $v_h$ of (15.6) with $\|v_h\|\to+\infty$; arguing as in the proof of Theorem 15.1.1, it is possible to prove that the sequence $w_h=v_h/\|v_h\|$ converges in norm to some $w\in\ker F^\infty$. From assumption (15.10) it follows that $w=0$, and this contradicts the fact that $\|w_h\|=1$ for every $h$.

We now consider the particular case of quadratic forms on a Hilbert space. More precisely, let $V$ be a separable Hilbert space, let $a:V\times V\to\mathbb{R}$ be a symmetric bilinear continuous form, and let $L\in V'$. Define
$$F(v)=\frac12a(v,v)-\langle L,v\rangle\qquad\forall v\in V$$
and consider the minimum problem
$$\min\{F(v) : v\in V\},\tag{15.11}$$
which is equivalent to the equation $a(v,\cdot)=L$ in $V'$.
On the bilinear form $a$ we assume that the following conditions are satisfied:
$$a(v,v)\ge0\qquad\forall v\in V,\tag{15.12}$$
$$v_h\to0\text{ weakly and }a(v_h,v_h)\to0\ \Longrightarrow\ v_h\to0\text{ strongly}.\tag{15.13}$$
In this case, since $\ker(F^\infty-L)=(\ker F^\infty)\cap(\ker L)$, as a corollary of Theorem 15.1.1 we obtain the following result.

Proposition 15.1.3. The minimum problem (15.11) admits a solution iff the compatibility condition
$$L\text{ is orthogonal to }\ker a\quad(\text{i.e., }\langle L,v\rangle=0\text{ whenever }a(v,v)=0)\tag{15.14}$$
is fulfilled. Note that the condition above reads simply $\ker a\subset\ker L$.

Proof. An easy computation shows that
$$F^\infty(v)=\begin{cases}-\langle L,v\rangle&\text{if }a(v,v)=0,\\+\infty&\text{if }a(v,v)\ne0;\end{cases}$$
then, by Proposition 15.1.2, if problem (15.11) admits a solution, we must necessarily have $F^\infty\ge0$, that is, $\langle L,v\rangle\le0$ for every $v\in\ker a$; since $\ker a$ is a linear subspace, this is exactly (15.14). Conversely, let us assume (15.14) holds. The weak lower semicontinuity of the functional $F$ is then a consequence of the continuity of the bilinear form $a$ (together with the convexity ensured by (15.12)), the compactness assumption (i) follows from property (15.13), and finally the necessary condition (ii) and the compatibility condition (iii) follow from property (15.14). Therefore, by Theorem 15.1.1 we obtain that the minimum problem (15.11) admits a solution.

Example 15.1.6. Consider the variational formulation of the classical Neumann problem
$$\min\Big\{\frac12\int_\Omega|Du|^2\,dx-\langle L,u\rangle : u\in H^1(\Omega)\Big\},$$
where $\Omega$ is a bounded regular connected open subset of $\mathbb{R}^n$ and $L\in H^1(\Omega)'$. If we consider the bilinear form
$$a(u,v)=\int_\Omega Du\,Dv\,dx\qquad\forall u,v\in H^1(\Omega)$$
and apply Proposition 15.1.3, we obtain that a solution exists iff the compatibility condition $\langle L,1\rangle=0$ is fulfilled (the connectedness of $\Omega$ ensures that $\ker a$ consists of the constant functions). In terms of partial differential equations, when $L=f+g$ with $f\in L^2(\Omega)$ and $g\in L^2(\partial\Omega)$, this means that the problem
$$-\Delta u=f\ \text{ in }\Omega,\qquad\frac{\partial u}{\partial\nu}=g\ \text{ on }\partial\Omega$$
admits a solution iff
$$\int_\Omega f(x)\,dx+\int_{\partial\Omega}g(x)\,d\mathcal{H}^{n-1}(x)=0.$$
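A finite dimensional analogue of Proposition 15.1.3 and Example 15.1.6 can be sketched on a path of $n$ nodes, an assumption of ours standing in for $\Omega$: take $V=\mathbb{R}^n$ and $a(u,v)=\sum_i(u_{i+1}-u_i)(v_{i+1}-v_i)$, a discrete Dirichlet form whose kernel consists of the constant vectors. Condition (15.14) then reads $\sum_iL_i=0$, the discrete counterpart of $\langle L,1\rangle=0$. The code below solves the Euler–Lagrange equations by accumulating the fluxes $q_i=u_{i+1}-u_i$ when the condition holds and returns `None` otherwise; in the incompatible case the energy decreases without bound along the constant vectors. The names `neumann_energy` and `solve_path_neumann` are hypothetical.

```python
def neumann_energy(u, L):
    """F(u) = (1/2) a(u, u) - <L, u> with a(u, v) = sum_i (u[i+1]-u[i]) * (v[i+1]-v[i])."""
    quad = sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))
    return 0.5 * quad - sum(l * x for l, x in zip(L, u))

def solve_path_neumann(L, tol=1e-12):
    """Minimizer of neumann_energy normalized by u[0] = 0, or None if sum(L) != 0.

    With fluxes q_i = u[i+1] - u[i], the Euler-Lagrange equations a(u, .) = L read
    -q_0 = L_0, q_{i-1} - q_i = L_i, q_{n-2} = L_{n-1}; they are solvable iff sum(L) = 0.
    """
    if abs(sum(L)) > tol:
        return None           # L is not orthogonal to ker a = constant vectors
    u = [0.0]
    q = 0.0
    for l in L[:-1]:
        q -= l                # q_i = -(L_0 + ... + L_i)
        u.append(u[-1] + q)   # u[i+1] = u[i] + q_i
    return u

print(solve_path_neumann([1.0, -2.0, 0.5, 0.5]))   # [0.0, -1.0, 0.0, 0.5]
print(solve_path_neumann([1.0, 0.0, 0.0]))         # None
print([neumann_energy([c] * 3, [1.0, 0.0, 0.0]) for c in (1.0, 10.0, 100.0)])  # [-1.0, -10.0, -100.0]
```

Adding a constant to the computed minimizer gives another minimizer, reflecting the nontrivial kernel of $a$.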
For every subset $K$ of $V$ we denote by $\chi_K$ the indicator function of $K$, defined by
$$\chi_K(v)=\begin{cases}0&\text{if }v\in K,\\+\infty&\text{otherwise.}\end{cases}$$
Notice that if $K$ is a nonempty sequentially $\sigma$-closed convex subset of $V$, then the function $\chi_K$ turns out to be a proper convex sequentially $\sigma$-lower semicontinuous mapping.

Definition 15.1.2. For every nonempty sequentially $\sigma$-closed convex subset $K$ of $V$ we define the recession cone $K^\infty$ of $K$ by setting
$$K^\infty=\operatorname{dom}(\chi_K)^\infty.\tag{15.15}$$
For every function $F:V\to\,]-\infty,+\infty]$ we denote by $\operatorname{epi}F$ the epigraph of $F$,
$$\operatorname{epi}F=\{(v,t)\in V\times\mathbb{R} : t\ge F(v)\}.\tag{15.16}$$
We remark that $\operatorname{epi}F$ is a nonempty sequentially closed convex subset of $V\times\mathbb{R}$ whenever $F$ is a proper sequentially $\sigma$-lower semicontinuous convex function.

Proposition 15.1.4. If $F:V\to\,]-\infty,+\infty]$ is a proper convex sequentially $\sigma$-lower semicontinuous function, we have
$$\operatorname{epi}F^\infty=(\operatorname{epi}F)^\infty.$$

Proof. By Proposition 15.1.1(ii) we have
$$(v,t)\in\operatorname{epi}F^\infty\iff t\ge F(u+v)-F(u)\quad\forall u\in\operatorname{dom}F.$$
In other words, if $(u,s)\in\operatorname{epi}F$ we have $F(u+v)\le t+F(u)\le t+s$, that is, $(v,t)+\operatorname{epi}F\subset\operatorname{epi}F$, or equivalently,
$$\chi_{\operatorname{epi}F}\big((v,t)+(w,\alpha)\big)=0\qquad\forall(w,\alpha)\in\operatorname{epi}F.$$
By Proposition 15.1.1 again, we obtain $\chi_{\operatorname{epi}F}^\infty(v,t)=0$, that is, $(v,t)\in(\operatorname{epi}F)^\infty$.

The definition (15.15) of recession cone is a very useful tool in convex analysis (see [201], [51], [52], and [79], where it is called “cône asymptote”); let us now state its main properties.

Proposition 15.1.5. For any nonempty sequentially $\sigma$-closed convex subset $K$ of $V$ the set $K^\infty$ is a convex sequentially $\sigma$-closed cone, and the following properties hold true:
$$K^\infty=\bigcap_{t>0}t^{-1}(K-v_0)\qquad\forall v_0\in K,\tag{15.17}$$
$$K\text{ is a cone}\iff K^\infty=K,\tag{15.18}$$
$$0\in K\iff K^\infty\subset K,\tag{15.19}$$
$$K\text{ is bounded}\implies K^\infty=\{0\},\tag{15.20}$$
$$\operatorname{dom}F^\infty\subset(\operatorname{dom}F)^\infty\ \text{ whenever }F:V\to\,]-\infty,+\infty]\text{ is convex and lsc.}\tag{15.21}$$
Moreover, a point $w\in V$ belongs to $K^\infty$ iff one of the following conditions is fulfilled:
$$k+w\in K\qquad\forall k\in K,\tag{15.22}$$
$$k+tw\in K\qquad\forall k\in K,\ \forall t\ge0,\tag{15.23}$$
$$\exists k\in K:\ k+tw\in K\qquad\forall t\ge0.\tag{15.24}$$
Proof. By Proposition 15.1.1(ii) and the definition (15.15) of $K^\infty$ we have that $v\in K^\infty$ iff
$$\chi_K(v_0+tv)-\chi_K(v_0)\le0\qquad\forall t>0,\ \forall v_0\in K,$$
which is equivalent to
$$v_0+tv\in K\qquad\forall t>0,\ \forall v_0\in K.$$
Therefore (15.17) follows. The proof of (15.18) follows from Proposition 15.1.1(iii). Assertion (15.19) follows easily from (15.17) by taking $v_0=0$. Again from (15.17) we obtain that $K^\infty$ reduces to $\{0\}$ whenever $K$ is bounded, that is, (15.20). The proof of (15.21) simply follows by remarking that when $F^\infty(v)<+\infty$, then $v_0+tv\in\operatorname{dom}F$ for every $t>0$ and every $v_0\in\operatorname{dom}F$. Finally, (15.22), (15.23), (15.24) follow from Proposition 15.1.1(ii).

Remark 15.1.4. We remark that the inclusion in (15.21) may be strict: this can be seen by taking, for instance, $F(v)=\|v\|^2$. In this case we have
$$\operatorname{dom}F=(\operatorname{dom}F)^\infty=V\qquad\text{whereas}\qquad\operatorname{dom}F^\infty=\{0\}.$$
In the finite dimensional case, implication (15.20) can be reversed, as the following proposition shows.

Proposition 15.1.6. Let $K$ be a convex closed subset of $\mathbb{R}^n$ such that $K^\infty=\{0\}$. Then $K$ is bounded.

Proof. Assume by contradiction that $K$ is unbounded; then for every $h\in\mathbb{N}$ there exists $x_h\in K$ with $|x_h|\ge h$. The sequence $y_h=x_h/|x_h|$ is bounded, so that we may extract a subsequence (which we still denote for simplicity by $y_h$) converging to some $y$ with $|y|=1$. We claim that $y\in K^\infty$. In fact, fix $t>0$ and let $x_0$ be any point in $K$; since $K$ is convex, for $h$ large enough we have
$$\frac t{|x_h|}x_h+\Big(1-\frac t{|x_h|}\Big)x_0\in K,$$
and, since $K$ is closed, passing to the limit as $h\to+\infty$, $ty+x_0\in K$, which implies that $y$ belongs to $K^\infty$. This gives a contradiction because by assumption $K^\infty=\{0\}$ and $|y|=1$.

In general infinite dimensional topological vector spaces, Proposition 15.1.6 is false, as shown by the following example.

Example 15.1.7. Let $V$ be a separable (infinite dimensional) Hilbert space, and let $(e_n)_{n\in\mathbb{N}}$ be a complete orthonormal system in $V$. Consider the set
$$K=\{v\in V : |(v,e_n)|\le n\text{ for every }n\in\mathbb{N}\},$$
where $(\cdot,\cdot)$ denotes the scalar product in $V$. It is easy to see that $K$ is a nonempty convex weakly closed subset of $V$; moreover, we have $K^\infty=\{0\}$. In fact, if $v\in K^\infty$, by using the fact that $0\in K$, we obtain from (15.17)
$$tv\in K\quad\text{for every }t>0,$$
that is,
$$|(v,e_n)|\le\frac nt\quad\text{for every }t>0\text{ and }n\in\mathbb{N}.$$
Therefore, as $t\to+\infty$, we get $(v,e_n)=0$ for all $n\in\mathbb{N}$, and so $v=0$. Nevertheless, $K$ is unbounded, as it contains the points $v_n=ne_n$ for every $n\in\mathbb{N}$.

We will specialize now our existence results on noncoercive minimum problems to the case when the functional $F$ can be written in the form
$$F(v)=J(v)-\langle L,v\rangle+\chi_K(v).$$
This situation arises, for instance, in many problems of mathematical physics, where $J$ represents the stored energy functional depending on the nature of the body, $L$ describes the action of the applied forces, and $K$ is the set of admissible configurations which takes into account the physical constraints of the problems. In this case, the minimization problem we are dealing with takes the form
$$\min\{J(v)-\langle L,v\rangle : v\in K\}.$$

Theorem 15.1.2. Assume that
$$J:V\to[0,+\infty]\text{ is a proper convex sequentially }\sigma\text{-lsc functional},\tag{15.25}$$
$$L:V\to\mathbb{R}\text{ is a linear }\sigma\text{-continuous functional},\tag{15.26}$$
$$K\subset V\text{ is a nonempty convex sequentially }\sigma\text{-closed set},\tag{15.27}$$
and consider the minimum problem
$$\min\{J(v)-\langle L,v\rangle : v\in K\}.\tag{15.28}$$
Then a necessary condition for the existence of at least one minimizer is
$$J^\infty(v)\ge\langle L,v\rangle\qquad\forall v\in K^\infty.\tag{15.29}$$
On the other hand, the minimum problem (15.28) admits at least one solution provided the necessary condition (15.29) holds and the following compactness and compatibility conditions are fulfilled:
$$\text{if }t_h\to+\infty,\ v_h\in K,\ v_h\to v\text{ weakly, and }J(t_hv_h)-t_h\langle L,v_h\rangle\text{ is bounded from above, then }\|v_h-v\|\to0,\tag{15.30}$$
$$K^\infty\cap\ker(J^\infty-L)\text{ is a linear subspace of }V.\tag{15.31}$$
Proof. Noticing that by Proposition 15.1.1(iv) the equality
$$(J-L+\chi_K)^\infty=J^\infty-L+\chi_{K^\infty}$$
holds, and taking Proposition 15.1.2 into account, we obtain that condition (15.29) is necessary for the existence of at least one solution of the minimum problem (15.28). Analogously, the compactness and compatibility conditions (i) and (iii) become in this case (15.30) and (15.31), so that the conclusion follows by Theorem 15.1.1.

Remark 15.1.5. When $J$ is a quadratic form, or more generally when $J^\infty$ takes only the values $0$ and $+\infty$ (which occurs, for instance, if $J$ is positively $p$-homogeneous with $p>1$; see Example 15.1.2), condition (15.31) can be written in the simpler form
$$K^\infty\cap\ker J^\infty\cap\ker L\text{ is a linear subspace of }V,\tag{15.32}$$
which will be used in the following.

Let us discuss the structural assumptions of the existence result of Theorem 15.1.1 above.

Example 15.1.8. The compactness assumption (i) in the existence theorem, Theorem 15.1.1, cannot be dropped, as the following example shows. Let $V$ be an infinite dimensional separable Hilbert space, and let $(e_n)_{n\in\mathbb{N}}$ be a complete orthonormal system in $V$. We denote by $(\cdot,\cdot)$ the scalar product in $V$, and we define a functional $F:V\to\mathbb{R}$ by setting
$$F(v)=\sum_{n\in\mathbb{N}}2^{-n}|(v,e_n)-1|^2\qquad\forall v\in V.$$
It is easy to see that the functional $F$ is finite-valued, convex, and weakly lower semicontinuous. Moreover, for every $v\in V$
$$F^\infty(v)=\lim_{t\to+\infty}\frac{F(tv)}{t}=\lim_{t\to+\infty}\sum_{n\in\mathbb{N}}2^{-n}\Big(t|(v,e_n)|^2-2(v,e_n)+\frac1t\Big).$$
Hence
$$F^\infty(v)=\begin{cases}0&\text{if }v=0,\\+\infty&\text{if }v\ne0,\end{cases}$$
so that the necessary condition (ii) and the compatibility condition (iii) of Theorem 15.1.1 are fulfilled. Nevertheless, the functional $F$ does not admit any minimum point in $V$. In fact, taking for every $k\in\mathbb{N}$
$$v_k=\sum_{i=1}^ke_i,$$
we get
$$\inf_{v\in V}F(v)\le F(v_k)=\sum_{n=k+1}^\infty2^{-n},$$
so that
$$\inf_{v\in V}F(v)=0.$$
But there are no points $v\in V$ such that $F(v)=0$. Indeed, $F(v)=0$ would imply $(v,e_n)=1$ for every $n\in\mathbb{N}$, which contradicts the equality
$$\|v\|^2=\sum_{n\in\mathbb{N}}|(v,e_n)|^2.$$
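The minimizing sequence of Example 15.1.8 can be inspected numerically once the sum is truncated at some $N$ (here $N=60$, an assumption made only to obtain a finite computation). The sketch below evaluates the truncated functional on $v_k=e_1+\dots+e_k$, whose exact truncated value is $2^{-k}-2^{-N}$, and shows that $\|v_k\|^2=k$ grows. This also exhibits the failure of the compactness condition (i): with $t_h=\sqrt h$ and $w_h=v_h/\sqrt h$, one has $\|w_h\|=1$, $w_h\to0$ weakly, and $F(t_hw_h)=F(v_h)\le1$, yet $\|w_h-0\|\not\to0$. The helper names are hypothetical.

```python
def F_truncated(coords, N=60):
    """Example 15.1.8 cut off at n = N: sum_{n=1}^{N} 2^{-n} ((v, e_n) - 1)^2.

    coords[n - 1] plays the role of (v, e_n); coordinates beyond len(coords) are 0.
    """
    total = 0.0
    for n in range(1, N + 1):
        c = coords[n - 1] if n <= len(coords) else 0.0
        total += 2.0 ** (-n) * (c - 1.0) ** 2
    return total

def v_k(k):
    """Coordinates of v_k = e_1 + ... + e_k."""
    return [1.0] * k

for k in (1, 5, 10, 20):
    # F(v_k) tends to 0 while ||v_k||^2 = k tends to +infinity.
    print(k, F_truncated(v_k(k)), sum(c * c for c in v_k(k)))
```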
Example 15.1.9. The compatibility condition (iii) in Theorem 15.1.1 cannot be dropped, as the following example shows. Take $V=\mathbb{R}$ and define $F:\mathbb{R}\to\,]-\infty,+\infty]$ by
$$F(x)=\begin{cases}-\log x&\text{if }x>0,\\+\infty&\text{if }x\le0.\end{cases}$$
The function $F$ is proper, convex, and lower semicontinuous, and the compactness condition (i) is fulfilled since the dimension of $V$ is finite. Moreover, a simple calculation yields
$$F^\infty(x)=\begin{cases}0&\text{if }x\ge0,\\+\infty&\text{if }x<0,\end{cases}$$
and so the necessary condition (ii) is fulfilled too, but
$$\inf\{F(x) : x\in\mathbb{R}\}=-\infty.$$
Here $\ker F^\infty=[0,+\infty[$ is not a linear subspace, so (iii) fails.

We conclude this section by showing how Theorem 15.1.1 can be used to determine whether the algebraic difference of two closed convex sets in a Banach space is closed. Here we consider a reflexive Banach space $V$ and two nonempty closed convex subsets $A,B$ of $V$; the algebraic difference $A-B$ is defined by
$$A-B=\{a-b : a\in A,\ b\in B\}.$$

Example 15.1.10. We emphasize that even when dealing with cones in a finite dimensional space, the convex set $A-B$ may fail to be closed: take, for instance, $V=\mathbb{R}^3$ and
$$A=\{x\in\mathbb{R}^3 : x_1\ge0,\ x_2\ge0,\ x_3\ge0,\ x_1x_3\ge x_2^2\},\qquad B=\{x\in\mathbb{R}^3 : x_2=x_3=0\}.$$
The sets $A$ and $B$ are closed convex cones, but a simple calculation gives
$$A-B=\{x\in\mathbb{R}^3 : x_2>0,\ x_3>0\}\cup\{x\in\mathbb{R}^3 : x_2=0,\ x_3\ge0\},$$
which is not closed. Indeed, for every $n\in\mathbb{N}$ the point $(0,1,1/n)$ belongs to $A-B$, whereas the limit $(0,1,0)$ of these points is not in $A-B$.

In the previous example the convex sets $A$ and $B$ are both unbounded. On the other hand, if $A$ and $B$ are two closed convex subsets of a reflexive Banach space and at least one of them is bounded, then $A-B$ is closed (weak and strong closedness coincide,
due to convexity). Indeed, if $A$ is bounded and $x_h=a_h-b_h$ tends weakly to $x$ with $a_h\in A$ and $b_h\in B$, then up to subsequences we have $a_h\to a\in A$ weakly, hence $b_h=a_h-x_h\to a-x\in B$ weakly, so that $x=a-(a-x)\in A-B$.

The following lemma characterizes the closed convex subsets of $V$.

Lemma 15.1.1. Let $K$ be a nonempty convex subset of a reflexive Banach space $V$. Then the following conditions are equivalent:
(i) $K$ is closed.
(ii) For every $u\in V$ the function $v\mapsto\|u-v\|$ has a minimum on $K$.

Proof. Assume $K$ is closed and let $u\in V$; set
$$M=\inf\{\|u-v\| : v\in K\},$$
and for every $h\in\mathbb{N}$ let $v_h\in K$ be such that
$$\|u-v_h\|\le M+\frac1h.$$
The sequence $(v_h)$ is bounded; since $V$ is reflexive, possibly passing to subsequences, we may assume that $v_h$ converges weakly to some $v$ which belongs to $K$, because $K$ is weakly closed (being strongly closed and convex). By the weak lower semicontinuity of the norm, we obtain
$$\|u-v\|\le\liminf_{h\to+\infty}\|u-v_h\|\le M,$$
which proves (ii). Conversely, assume (ii) holds, and let $(v_h)$ be a sequence in $K$ strongly convergent to some $v\in V$. By (ii) there exists $w\in K$ such that
$$\|w-v\|\le\|w'-v\|\qquad\forall w'\in K.$$
In particular,
$$\|w-v\|\le\|v_h-v\|\qquad\forall h\in\mathbb{N};$$
hence, as $h\to+\infty$, we obtain $w=v$, and this proves that $v\in K$.

We are now in a position to prove a closure result for the difference of two closed sets.

Theorem 15.1.3. Let $V$ be a reflexive Banach space, and let $A$ and $B$ be two nonempty closed convex subsets of $V$. Assume that $A$ is locally compact for the strong topology and that
$$A^\infty\cap B^\infty\text{ is a linear subspace.}\tag{15.33}$$
Then the convex set $A-B$ is closed.

Proof. By Lemma 15.1.1 it is enough to prove that for every $u\in V$ the function $v\mapsto\|u-v\|$ has a minimum on $A-B$, or equivalently that the problem
$$\min\{F(a,b) : (a,b)\in V\times V\}\tag{15.34}$$
has at least a solution, where
$$F(a, b) = \|u - a + b\| + \chi_A(a) + \chi_B(b).$$
We apply the existence theorem, Theorem 15.1.1, to the minimum problem (15.34). Since $F$ is a nonnegative functional, the necessary condition $F^\infty \ge 0$ is immediately fulfilled. To prove the compactness condition (i), take $a_h \to a$ weakly in $V$, $b_h \to b$ weakly in $V$, and $t_h \to +\infty$ such that $F(t_h a_h, t_h b_h) \le C$. Then, by the definition of $F$, we have
$$t_h a_h \in A, \qquad t_h b_h \in B, \qquad \|u - t_h a_h + t_h b_h\| \le C.$$
Since the convex set $A$ is assumed to be locally compact for the strong topology of $V$ and $t_h a_h \in A$, the convergence $a_h \to a$ is actually strong. Moreover, dividing $\|u - t_h a_h + t_h b_h\| \le C$ by $t_h$ we obtain $\|b_h - a_h\| \to 0$, so that $b_h \to a$ strongly as well. Let us finally prove the compatibility condition (iii). By the definition of $F$ we find
$$F^\infty(a, b) = \lim_{t\to+\infty} \frac{F(a_0 + ta, b_0 + tb)}{t},$$
where $a_0 \in A$ and $b_0 \in B$. Therefore
$$F^\infty(a, b) = \lim_{t\to+\infty} \Big(\|a - b\| + \chi_A(a_0 + ta) + \chi_B(b_0 + tb)\Big) = \|a - b\| + \chi_{A^\infty}(a) + \chi_{B^\infty}(b),$$
so that
$$\ker F^\infty = \big\{(a, b) : a \in A^\infty,\ b \in B^\infty,\ a = b\big\} = \big\{(a, a) : a \in A^\infty \cap B^\infty\big\}.$$
Therefore the compatibility condition (iii) follows from assumption (15.33).

Remark 15.1.6. The requirement that $A$ be locally compact for the strong topology is clearly fulfilled when $A$ is finite dimensional. On the other hand, there are closed convex subsets $A$ which are locally compact for the strong topology but not finite dimensional. For instance, it is enough to take, in a separable Hilbert space $V$, the set
$$A = \big\{v \in V : |(v, e_n)| \le 1/n \text{ for all } n \in \mathbb{N}\big\},$$
where $(e_n)_{n\in\mathbb{N}}$ is a complete orthonormal system in $V$.

The results of this section allow us to study the lower semicontinuity of the inf-convolution of two convex functions, and we recover some results of Section 9.2. If $V$ is a reflexive Banach space and $f, g : V \to\, ]-\infty, +\infty]$ are two proper convex functions, we recall that the inf-convolution $f \#_e g$ is defined by
$$(f \#_e g)(w) = \inf\{f(u) + g(v) : u, v \in V,\ u + v = w\}. \tag{15.35}$$
For instance, if $f = \chi_A$ and $g = \chi_B$, with $A, B$ convex subsets of $V$, we have $f \#_e g = \chi_{A+B}$. Moreover, it is easy to see that, at least when the infimum in (15.35) is attained, the inf-convolution operation reduces in terms of epigraphs to the algebraic sum, that is, $\operatorname{epi}(f \#_e g) = \operatorname{epi} f + \operatorname{epi} g$.
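The role of assumption (15.33) can be seen on a standard planar example. The sets below are our own choice, not taken from the text: $A = \{(x, y) : x > 0,\ xy \ge 1\}$ and $B = \mathbb{R} \times \{0\}$ in $V = \mathbb{R}^2$. Both are closed and convex, and $A$ is locally compact. However, $A^\infty \cap B^\infty = [0, +\infty) \times \{0\}$ is not a subspace, and $A - B = \mathbb{R} \times (0, +\infty)$ is not closed. The origin is at distance zero from $A - B$ without belonging to it, so $v \mapsto \|v\|$ has no minimum on $A - B$ (compare Lemma 15.1.1). Taking $f = \chi_A$ and $g = \chi_{-B}$ in (15.35), the same example gives an inf-convolution $f \#_e g = \chi_{A-B}$ that is not lower semicontinuous. The sketch only samples points, so it illustrates these facts rather than proving them.

```python
import math

def in_A(p):
    """Membership in A = {(x, y) : x > 0, x * y >= 1}, a closed convex set."""
    x, y = p
    return x > 0 and x * y >= 1

def in_B(p):
    """Membership in B = R x {0}, a closed subspace (so -B = B)."""
    return p[1] == 0.0

# Points a - b of A - B with a = (x, 1/x) in A and b = (x, 0) in B.
# Powers of two keep 1/x exact in floating point.
xs = [2.0 ** k for k in range(0, 21, 5)]
pts = []
for x in xs:
    a, b = (x, 1.0 / x), (x, 0.0)
    assert in_A(a) and in_B(b)
    pts.append((a[0] - b[0], a[1] - b[1]))

dists = [math.hypot(*p) for p in pts]
print(dists)  # 1, 1/32, 1/1024, ...: the distance to the origin tends to 0

# Every point of A - B has second coordinate y_a >= 1/x_a > 0,
# so the infimum 0 is not attained and A - B is not closed.
# Recession directions: (1, 0) is in A^inf and in B^inf, (-1, 0) only in B^inf.
a0 = (1.0, 1.0)
plus_ok = all(in_A((a0[0] + t, a0[1])) for t in (1.0, 10.0, 100.0))
minus_ok = all(in_A((a0[0] - t, a0[1])) for t in (1.0, 10.0, 100.0))
print(plus_ok, minus_ok)  # True False
```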
15.1. Convex minimization problems and recession functions
As we saw in Example 15.1.10, it may happen that $f$ and $g$ are both lower semicontinuous while their inf-convolution $f \#_e g$ is not. To see when $f \#_e g$ is lower semicontinuous, we equivalently investigate the closedness of $\operatorname{epi}(f \#_e g)$, and we obtain the following result.

Proposition 15.1.7. Let $f, g : V \to\, ]-\infty, +\infty]$ be two proper convex lower semicontinuous functions. Assume the following:
(i) compactness: if $t_h \to +\infty$, $u_h$ and $v_h$ converge weakly, $u_h + v_h \to 0$ strongly, and $f(t_h u_h) + g(t_h v_h)$ is bounded from above, then $u_h$ and $v_h$ converge strongly;
(ii) compatibility: if $f^\infty(v) + g^\infty(-v) \le 0$, then $f^\infty(-v) + g^\infty(v) \le 0$.
Then the inf-convolution $f \#_e g$ defined in (15.35) is lower semicontinuous.

Proof. By Lemma 15.1.1 it is enough to show that for every $M \in \mathbb{R}$ and every $u \in V$ the function $(v, t) \mapsto |t - M| + \|v - u\|$ admits a minimum on $\operatorname{epi}(f \#_e g)$. In other words, we have to show the existence of a solution for the minimum problem
$$\min\big\{|t - M| + \|v - u\| : t \ge (f \#_e g)(v)\big\}.$$
It is easy to see that, for a fixed $v$, the optimal $t$ in the minimum problem above is given by $t = M \vee (f \#_e g)(v)$. Hence, up to the additive constant $M$, we have to show the existence of a solution for the problem
$$\min\big\{M \vee (f \#_e g)(v) + \|v - u\| : v \in V\big\}.$$
By the definition of inf-convolution, this turns out to be equivalent to the existence of a solution for
$$\min\big\{M \vee \big(f(x) + g(y)\big) + \|x + y - u\| : x, y \in V\big\}.$$
Setting $\Phi(x, y) = M \vee \big(f(x) + g(y)\big) + \|x + y - u\|$, we apply the existence theorem, Theorem 15.1.1. The convexity and the weak lower semicontinuity of $\Phi$ follow straightforwardly. To prove the compactness assumption (i) of Theorem 15.1.1, take $x_h \to x$ and $y_h \to y$ weakly in $V$, and $t_h \to +\infty$ such that $\Phi(t_h x_h, t_h y_h)$ is bounded from above. Dividing by $t_h$ we obtain
$$\limsup_{h\to+\infty} \Big[\Big(\frac{f(t_h x_h)}{t_h} + \frac{g(t_h y_h)}{t_h}\Big)^+ + \|x_h + y_h\|\Big] \le 0,$$
which implies that $x_h + y_h \to 0$ strongly in $V$. By the compactness assumption (i), the convergence of $x_h$ and of $y_h$ is actually strong in $V$. Since
$$\Phi^\infty(x, y) = \big(f^\infty(x) + g^\infty(y)\big)^+ + \|x + y\| \ge 0,$$
the necessary condition (ii) of Theorem 15.1.1 is fulfilled. It remains to prove the compatibility condition (iii). Since
$$\ker \Phi^\infty = \big\{(x, y) \in V \times V : x + y = 0,\ f^\infty(x) + g^\infty(y) \le 0\big\},$$
the fact that $\ker \Phi^\infty$ is a subspace follows immediately from assumption (ii).
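The following minimal numerical sketch makes definition (15.35) concrete. It is our own illustration and uses a grid search on $\mathbb{R}$ as an assumed approximation of the infimum. It computes the inf-convolution of $f(x) = x^2/2$ and $g(x) = |x|$. Here $f^\infty$ equals $0$ at the origin and $+\infty$ elsewhere, so the compatibility assumption (ii) of Proposition 15.1.7 holds trivially. In $\mathbb{R}$ weak and strong convergence coincide, so the compactness assumption (i) is automatic. Accordingly, the result is lower semicontinuous (indeed continuous). It is the Huber function, equal to $w^2/2$ for $|w| \le 1$ and to $|w| - 1/2$ otherwise.

```python
def inf_convolution(f, g, w, lo=-5.0, hi=5.0, n=10_000):
    """Approximate (f #_e g)(w) = inf_u f(u) + g(w - u) by a grid search over u in [lo, hi]."""
    step = (hi - lo) / n
    return min(f(lo + k * step) + g(w - (lo + k * step)) for k in range(n + 1))

def huber(w):
    """Closed form of the inf-convolution of x**2/2 and |x|."""
    return 0.5 * w * w if abs(w) <= 1 else abs(w) - 0.5

def f(x):
    return 0.5 * x * x

g = abs

for w in (-3.0, -0.5, 0.0, 0.5, 2.0):
    print(w, inf_convolution(f, g, w), huber(w))
```

The grid contains the exact minimizer $u = \max(-1, \min(1, w))$ for these sample points, so the two columns agree up to rounding. For other $w$ the grid search is only accurate up to the grid step.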
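Example 15.1.11. Let $\Omega$ be a bounded connected open subset of $\mathbb{R}^n$ and let $X$ be the Sobolev space $H^1(\Omega)$. We denote by $(x, y)$ the points of $\Omega$, where $x$ represents some $k$ coordinates and $y$ the remaining $n - k$. Consider the functionals
$$F(u) = \int_\Omega |D_x u|^2\,dx\,dy - \langle f, u\rangle, \qquad G(u) = \int_\Omega |D_y u|^2\,dx\,dy - \langle g, u\rangle,$$
where $f$ and $g$ are in the dual space of $X$. The functionals $F$ and $G$ are convex and lower semicontinuous on $H^1(\Omega)$; we want to see whether their inf-convolution $F \#_e G$ is still lower semicontinuous. By Proposition 15.1.7 it is enough to verify the compactness assumption (i) and the compatibility assumption (ii). If $t_h \to +\infty$, $u_h$ and $v_h$ converge weakly, $u_h + v_h \to 0$ strongly, and $F(t_h u_h) + G(t_h v_h)$ is bounded from above, then dividing by $t_h^2$ we have that
$$D_x u_h \to 0 \text{ in } L^2(\Omega), \qquad D_y v_h \to 0 \text{ in } L^2(\Omega).$$
Together with the fact that $u_h + v_h \to 0$ strongly in $H^1(\Omega)$, this implies that $u_h$ and $v_h$ actually converge strongly in $H^1(\Omega)$. Finally, an easy calculation gives the recession functions of $F$ and $G$:
$$F^\infty(u) = \begin{cases} -\langle f, u\rangle & \text{if } D_x u \equiv 0,\\ +\infty & \text{if } D_x u \not\equiv 0,\end{cases} \qquad G^\infty(v) = \begin{cases} -\langle g, v\rangle & \text{if } D_y v \equiv 0,\\ +\infty & \text{if } D_y v \not\equiv 0.\end{cases}$$
Indeed, $F^\infty(v) + G^\infty(-v) < +\infty$ forces $D_x v \equiv 0$ and $D_y v \equiv 0$, so that $v \equiv c$ is constant on the connected set $\Omega$. Then $F^\infty(\pm v) + G^\infty(\mp v) = \mp c\,\langle f - g, 1\rangle$. Therefore, to prove the compatibility assumption, and hence the lower semicontinuity of $F \#_e G$, it is enough to assume that $\langle f - g, 1\rangle = 0$.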
15.2 Nonconvex minimization problems and topological recession
In this section we will consider general minimum problems of the form
$$\min\big\{F(v) : v \in V\big\}, \tag{15.36}$$
where $F$ is possibly nonconvex as well as noncoercive. Problems of this kind arise, for instance, in nonlinear elasticity (see Section 11.2). Because of the lack of convexity, the results of the previous sections cannot be applied. We will introduce a new kind of recession functional for general nonconvex functions, and we will prove an abstract existence result under lower semicontinuity, compactness, and compatibility conditions expressed by means of this new tool. As in Section 15.1, $(V, \sigma)$ will denote a real locally convex Hausdorff topological vector space, and $F : V \to\, ]-\infty, +\infty]$ will be a proper (not necessarily convex) mapping.

Definition 15.2.1. The topological recession functional $F_\infty$ of $F$ is defined for every $v \in V$ by
$$F_\infty(v) = \liminf_{\substack{t\to+\infty\\ w\to v}} \frac{F(tw)}{t}. \tag{15.37}$$
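As a rough numerical companion to Definition 15.2.1, the sketch below evaluates the quotient $F(tv)/t$ along the ray through $v$ for increasing $t$. Definition (15.37) also lets $w$ vary near $v$, so ray quotients only give an upper estimate of $F_\infty(v)$ in general. We use the nonconvex function $F(x) = \sqrt{1 + x^2} + 3\sin x$ on $\mathbb{R}$, chosen by us for illustration. For this function $F_\infty(v) = |v|$, and the ray quotients approach that value. The function is bounded from below, consistent with the sign condition $F_\infty \ge 0$ of Proposition 15.2.5.

```python
import math

def F(x):
    # illustrative nonconvex function on R (our choice, not from the text)
    return math.sqrt(1.0 + x * x) + 3.0 * math.sin(x)

def ray_quotients(F, v, ts):
    """Quotients F(t*v)/t along the ray w = v; their liminf bounds F_inf(v) from above."""
    return [F(t * v) / t for t in ts]

ts = [10.0 ** k for k in range(1, 7)]
for v in (2.0, -1.0, 0.0):
    print(v, ray_quotients(F, v, ts)[-1])  # close to |v| for t = 1e6
```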
The main properties of the functional $F_\infty$ are listed in the following proposition.

Proposition 15.2.1. We have the following:
(i) $F_\infty$ is $\sigma$-lower semicontinuous and positively homogeneous of degree 1.
(ii) $F_\infty = F$ whenever $F$ is $\sigma$-lower semicontinuous and positively homogeneous of degree 1.
(iii) $F_\infty = F^\infty$ whenever $F$ is proper, convex, and $\sigma$-lower semicontinuous.
(iv) $(F + G)_\infty(v) \ge F_\infty(v) + G_\infty(v)$ for every mapping $G : V \to\, ]-\infty, +\infty]$ and for every $v \in V$ such that the sum on the right-hand side is defined.
(v) The equality $(F + G)_\infty = F_\infty + G_\infty$ holds in the following cases:
(va) $F$ and $G$ are proper, convex, $\sigma$-lower semicontinuous, and $\operatorname{dom} F \cap \operatorname{dom} G \ne \emptyset$;
(vb) $G$ is positively homogeneous of degree 1, finite, and $\sigma$-continuous;
(vc) $F$ is convex and $\sigma$-lower semicontinuous, $F(0) < +\infty$, and $G$ is $\sigma$-lower semicontinuous and positively homogeneous of degree 1.

Proof. Properties (i), (ii), (iv), and (vb) follow immediately from Definition 15.2.1. Property (va) follows from property (iii) and from Proposition 15.1.1(iv). Property (vc) follows from property (iv) together with the following estimate, obtained for every $v \in V$ from properties (ii), (iii), and Proposition 15.1.1(i):
$$(F + G)_\infty(v) \le \liminf_{t\to+\infty} \frac{F(tv) + G(tv)}{t} = \liminf_{t\to+\infty} \frac{F(tv)}{t} + G(v) = F^\infty(v) + G(v) = F_\infty(v) + G_\infty(v).$$
It remains to prove property (iii). Let $v \in V$ and let $v_0 \in \operatorname{dom} F$ be fixed. Taking $w_t = v + v_0/t$ we get
$$F_\infty(v) \le \liminf_{t\to+\infty} \frac{F(t w_t)}{t} = \liminf_{t\to+\infty} \frac{F(v_0 + tv)}{t} = F^\infty(v).$$
On the other hand, by the convexity and the lower semicontinuity of $F$, for every $s > 0$ we have
$$F(v_0 + sv) \le \liminf_{\substack{t\to+\infty\\ w\to v}} F\Big(\Big(1 - \frac{s}{t}\Big)v_0 + \frac{s}{t}\,tw\Big) \le \liminf_{\substack{t\to+\infty\\ w\to v}} \Big[\Big(1 - \frac{s}{t}\Big)F(v_0) + \frac{s}{t}F(tw)\Big] = F(v_0) + s\liminf_{\substack{t\to+\infty\\ w\to v}} \frac{F(tw)}{t} = F(v_0) + sF_\infty(v).$$
Therefore,
$$\frac{F(v_0 + sv) - F(v_0)}{s} \le F_\infty(v)$$
for every $s > 0$, and passing to the limit as $s \to +\infty$, we obtain $F^\infty(v) \le F_\infty(v)$.

Analogously to what we did in Section 15.1 in the convex case, for every nonempty subset $K$ of $V$ we may define the topological recession cone $K_\infty$ of $K$.
Definition 15.2.2. Let $K$ be a nonempty subset of $V$. The topological recession cone $K_\infty$ of $K$ is defined by
$$K_\infty = \operatorname{dom}(\chi_K)_\infty. \tag{15.38}$$

Remark 15.2.1. By Definition 15.2.1, a point $u$ belongs to $K_\infty$ if and only if
$$\forall s > 0\quad \forall U \in \mathcal{U}(u)\quad \exists t > s : \ U \cap \tfrac{1}{t}K \ne \emptyset,$$
where $\mathcal{U}(u)$ denotes the family of all $\sigma$-neighborhoods of $u$. In other words,
$$K_\infty = \bigcap_{s>0} \operatorname{cl}_\sigma\Big(\bigcup_{t>s} \tfrac{1}{t}K\Big), \tag{15.39}$$
where $\operatorname{cl}_\sigma$ denotes the closure with respect to $\sigma$.

The following proposition lists some properties of topological recession cones.

Proposition 15.2.2. We have the following:
(i) $K_\infty$ is a $\sigma$-closed cone (possibly nonconvex);
(ii) $K_\infty = K$ whenever $K$ is a $\sigma$-closed cone;
(iii) $K_\infty = K^\infty$ whenever $K$ is a nonempty convex $\sigma$-closed set;
(iv) $K_\infty = \{0\}$ whenever $K$ is bounded (the converse is false even in the convex case, as shown in Example 15.1.7);
(v) if $V$ is finite dimensional, then $K_\infty = \{0\}$ implies that $K$ is bounded;
(vi) $K_\infty = (K \cup H)_\infty = (K + H)_\infty$ for every bounded subset $H$ of $V$;
(vii) $(\operatorname{epi} F)_\infty = \operatorname{epi} F_\infty$ for every proper function $F : V \to\, ]-\infty, +\infty]$.

Proof. Properties (i), (ii), (iii) follow from the definition (15.38) of $K_\infty$ and from Proposition 15.2.1(i), (ii), (iii), respectively. To prove property (iv), let $U$ be a closed neighborhood of $0$. Since $K$ is bounded, there exists $s > 0$ such that $K \subset tU$ for every $t > s$. By the characterization (15.39) of $K_\infty$, this implies that $K_\infty \subset U$. Since $U$ is arbitrary and $\sigma$ is Hausdorff, we get $K_\infty = \{0\}$.
Let us prove property (v). Assume by contradiction that $K$ is unbounded; then for every $h \in \mathbb{N}$ there exists $x_h \in K$ with $|x_h| \ge h$. Since the sequence $y_h = x_h/|x_h|$ is bounded, we may extract a subsequence (still denoted by $y_h$) converging to some $y \in V$ with $|y| = 1$. Therefore
$$(\chi_K)_\infty(y) \le \liminf_{h\to+\infty} \frac{\chi_K(|x_h| y_h)}{|x_h|} = \liminf_{h\to+\infty} \frac{\chi_K(x_h)}{|x_h|} = 0,$$
so that $y \in K_\infty$. This is impossible because $y \ne 0$.

Let us prove property (vi). Since the inclusion $K_\infty \subset (K \cup H)_\infty$ is obvious, it is enough to prove the opposite inclusion. Let $x \in (K \cup H)_\infty$. If $x = 0$, then $x \in K_\infty$ because, by (i), $K_\infty$ is a closed cone. If $x \ne 0$, let $U_0$ and $U$ be two disjoint neighborhoods of $0$ and $x$, respectively. Since $H$ is bounded, there exists $s_0 > 0$ such that $H \subset tU_0$ for every $t > s_0$. Moreover, since $x \in (K \cup H)_\infty$, by (15.39) we have
$$\forall s > 0\quad \forall W_x\quad \exists t > s : \ W_x \cap \tfrac{1}{t}(K \cup H) \ne \emptyset, \tag{15.40}$$
where $W_x$ denotes a generic neighborhood of $x$. When $s \ge s_0$ and $W_x \subset U$ we have
$$W_x \cap \tfrac{1}{t}H \subset U \cap U_0 = \emptyset \qquad \forall t > s,$$
so that, by (15.40), $W_x \cap \tfrac{1}{t}K \ne \emptyset$. By (15.39) this proves that $x \in K_\infty$. The equality $K_\infty = (K + H)_\infty$ can be proved in a similar way.

Let us now prove property (vii). Let $(u, \xi) \in (\operatorname{epi} F)_\infty$, and denote by $U$ and $I$ generic neighborhoods of $u$ and $\xi$, respectively. By (15.39) we have
$$\forall s > 0\quad \forall U\quad \forall I\quad \exists t > s\quad \exists v \in U\quad \exists \eta \in I : \ F(tv) \le t\eta.$$
Therefore $F_\infty(u) \le \xi$, so that $(u, \xi) \in \operatorname{epi} F_\infty$. On the other hand, if $(u, \xi) \in \operatorname{epi} F_\infty$, we have $F_\infty(u) \le \xi$. Hence, by the definition of $F_\infty$, we obtain
$$\forall s > 0\quad \forall U\quad \forall \eta > \xi\quad \exists t > s\quad \exists v \in U : \ \frac{F(tv)}{t} < \eta.$$
Therefore $(u, \xi) \in (\operatorname{epi} F)_\infty$.

Remark 15.2.2. It can be useful to characterize the topological recession functional $F_\infty$ in terms of converging nets. Indeed, it is easy to show that for every $v \in V$
$$F_\infty(v) = \inf\Big\{\liminf_{\lambda\in\Lambda} \frac{F(t_\lambda v_\lambda)}{t_\lambda} : t_\lambda \to +\infty,\ v_\lambda \to v\Big\}, \tag{15.41}$$
where $\Lambda$ is an arbitrary directed set and $(t_\lambda)$, $(v_\lambda)$ are nets indexed by $\Lambda$. Recall that a directed set is a set together with a relation $\ge$ which is both transitive and reflexive, and such that for any two elements $a, b$ there exists another element $c$ with $c \ge a$ and $c \ge b$.
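Characterization (15.39) suggests a simple numerical test in $\mathbb{R}^2$: a point $x$ lies in $K_\infty$ when its distance to the rescaled set $\frac{1}{t}K$ can be made small for arbitrarily large $t$. The sketch below applies this heuristic to the nonconvex closed set $K = \{(s, s^2) : s \in \mathbb{R}\}$, an example of our own choosing. The heuristic is not a decision procedure, and the grid and the range $|u| \le 3$ are assumptions of the sketch. Since $\frac{1}{t}K = \{(u, tu^2) : u \in \mathbb{R}\}$, a short computation gives $K_\infty = \{0\} \times [0, +\infty)$. Accordingly, the distance from $(0, 1)$ to $\frac{1}{t}K$ behaves like $t^{-1/2}$, whereas the distance from $(1, 0)$ stays close to $1$.

```python
import math

def dist_to_scaled_parabola(a, b, t, u_max=3.0, n=100_000):
    """Grid estimate of the distance from (a, b) to (1/t)K = {(u, t*u**2)}, for |u| <= u_max."""
    best = math.inf
    for k in range(-n, n + 1):
        u = u_max * k / n
        best = min(best, math.hypot(u - a, t * u * u - b))
    return best

for t in (1e2, 1e4):
    # (0, 1) is in K_inf: distance shrinks roughly like 1/sqrt(t)
    # (1, 0) is not in K_inf: distance stays near 1
    print(t, dist_to_scaled_parabola(0.0, 1.0, t), dist_to_scaled_parabola(1.0, 0.0, t))
```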
Definition (15.37), or the equivalent one (15.41), is very general and provides a good extension of the notion of convex recession functional introduced in the last section. However, in many situations it is not easy to deal with neighborhoods or nets. For this reason, we now introduce a new definition of recession functional, in which only the behavior of converging sequences is involved. More precisely, for every $v \in V$ we set
$$F^{\mathrm{seq}}_\infty(v) = \inf\Big\{\liminf_{h\to+\infty} \frac{F(t_h v_h)}{t_h} : t_h \to +\infty,\ v_h \to v\Big\}, \tag{15.42}$$
where $(t_h)$ and $(v_h)$ are sequences. It is clear from (15.41) and (15.42) that $F_\infty \le F^{\mathrm{seq}}_\infty$, and that $F_\infty = F^{\mathrm{seq}}_\infty$ whenever the space $(V, \sigma)$ is metrizable. Moreover, by a proof similar to that of Proposition 15.2.1, one can show that $F^{\mathrm{seq}}_\infty$ has the following properties.

Proposition 15.2.3. We have the following:
(i) $F^{\mathrm{seq}}_\infty$ is positively homogeneous of degree 1.
(ii) $F^{\mathrm{seq}}_\infty = F$ whenever $F$ is sequentially $\sigma$-lower semicontinuous and positively homogeneous of degree 1.
(iii) $F^{\mathrm{seq}}_\infty = F^\infty$ whenever $F$ is proper, convex, and sequentially $\sigma$-lower semicontinuous.
(iv) $(F + G)^{\mathrm{seq}}_\infty(v) \ge F^{\mathrm{seq}}_\infty(v) + G^{\mathrm{seq}}_\infty(v)$ for every mapping $G : V \to\, ]-\infty, +\infty]$ and for every $v \in V$ such that the sum on the right-hand side is defined.
(v) The equality $(F + G)^{\mathrm{seq}}_\infty(v) = F^{\mathrm{seq}}_\infty(v) + G^{\mathrm{seq}}_\infty(v)$ holds in the following cases:
(va) $F$ and $G$ are proper, convex, sequentially $\sigma$-lower semicontinuous, and $\operatorname{dom} F \cap \operatorname{dom} G \ne \emptyset$;
(vb) $G$ is positively homogeneous of degree 1 and sequentially $\sigma$-continuous;
(vc) $F$ is convex and sequentially $\sigma$-lower semicontinuous, $F(0) < +\infty$, and $G$ is sequentially $\sigma$-lower semicontinuous and positively homogeneous of degree 1.
Remark 15.2.3. We point out that in general the functional $F^{\mathrm{seq}}_\infty$ is neither $\sigma$-lower semicontinuous nor sequentially $\sigma$-lower semicontinuous. However, this simpler definition will be sufficient to obtain an existence theorem for minimizers (Theorem 15.2.1), which will be used in many applications. We stress that the explicit computation of $F^{\mathrm{seq}}_\infty$ may be difficult. Nevertheless, to apply the existence result of Theorem 15.2.1, in many cases it will be enough to establish qualitative properties of $F^{\mathrm{seq}}_\infty$, which are easy to obtain thanks to Proposition 15.2.3.

In an analogous way, for every nonempty subset $K$ of $V$ we may introduce the sequential recession cone $K^{\mathrm{seq}}_\infty$ by setting
$$K^{\mathrm{seq}}_\infty = \operatorname{dom}(\chi_K)^{\mathrm{seq}}_\infty. \tag{15.43}$$
In other words,
$$x \in K^{\mathrm{seq}}_\infty \iff \exists t_h \to +\infty\ \ \exists x_h \to x\ \ \forall h \in \mathbb{N}:\ t_h x_h \in K.$$
The set $K^{\mathrm{seq}}_\infty$ turns out to be a cone with vertex $0$ (possibly not sequentially $\sigma$-closed). By a proof similar to that of Proposition 15.2.2, we obtain the following properties.

Proposition 15.2.4. We have the following:
(i) $K^{\mathrm{seq}}_\infty = K$ if $K$ is a sequentially $\sigma$-closed cone.
(ii) $K^{\mathrm{seq}}_\infty = K^\infty$ if $K$ is a nonempty convex sequentially $\sigma$-closed set.
(iii) $K^{\mathrm{seq}}_\infty = \{0\}$ if $K$ is bounded.
(iv) If $V$ is finite dimensional, $K^{\mathrm{seq}}_\infty = \{0\}$ implies that $K$ is bounded.
(v) $K^{\mathrm{seq}}_\infty = (K \cup H)^{\mathrm{seq}}_\infty = (K + H)^{\mathrm{seq}}_\infty$ if $H \subset V$ is bounded.

The following result provides a general necessary condition for the existence of minimizers.

Proposition 15.2.5. Assume that
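$$\inf\big\{F(v) : v \in V\big\} > -\infty.$$
Then $F_\infty(v) \ge 0$ for all $v \in V$ (hence $F^{\mathrm{seq}}_\infty \ge 0$, because $F_\infty \le F^{\mathrm{seq}}_\infty$).

Proof. Let $m$ be the infimum of $F$ on $V$, and let $v \in V$. By the definition of $F_\infty$ we get
$$F_\infty(v) = \liminf_{\substack{t\to+\infty\\ w\to v}} \frac{F(tw)}{t} \ge \liminf_{t\to+\infty} \frac{m}{t} = 0.$$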
We recall that even in the convex case the condition $F^\infty \ge 0$ is not sufficient for the existence of a solution of the problem
$$\min\big\{F(v) : v \in V\big\} \tag{15.44}$$
(see, for instance, Example 15.1.9). To obtain an existence result for problem (15.44), analogously to what was done in Section 15.1, we set
$$\ker F^{\mathrm{seq}}_\infty = \big\{v \in V : F^{\mathrm{seq}}_\infty(v) = 0\big\}.$$

Theorem 15.2.1. Assume that $V$ is a Banach space with norm $\|\cdot\|$. Let $\sigma$ be a topology on $V$, coarser than the norm topology, such that $(V, \sigma)$ is a Hausdorff topological vector space in which the closed unit ball of $(V, \|\cdot\|)$ is sequentially $\sigma$-compact. Let $F : V \to\, ]-\infty, +\infty]$ be a sequentially $\sigma$-lower semicontinuous functional. Assume also that the following conditions are satisfied:
(i) compactness: if $t_h \to +\infty$, $v_h \to v$ with respect to $\sigma$, and $F(t_h v_h)$ is bounded from above, then $\|v_h - v\| \to 0$;
(ii) necessary condition: $F^{\mathrm{seq}}_\infty(v) \ge 0$ for every $v \in V$;
(iii) compatibility: for every $u \in \ker F^{\mathrm{seq}}_\infty$ there exists $t > 0$ such that $F(v - tu) \le F(v)$ for all $v \in V$.
Then the minimum problem (15.44) has at least one solution.

Proof. The proof is similar to that of Theorem 15.1.1, and we will follow it step by step.

Step 1. For every $h \in \mathbb{N}$ let $v_h$ be a solution of the minimum problem
$$\min\big\{F(v) : \|v\| \le h\big\}. \tag{$\wp_h$}$$
Since $F$ is sequentially $\sigma$-lower semicontinuous and $\{v \in V : \|v\| \le h\}$ is sequentially $\sigma$-compact, the direct method of the calculus of variations (see Corollary 3.2.3) gives a solution $v_h$ of problem ($\wp_h$). Moreover, by the lower semicontinuity of $F$ and by the assumptions on $\sigma$, we may choose $v_h$ such that
$$\|v_h\| = \min\big\{\|w\| : w \text{ solves } (\wp_h)\big\}. \tag{15.45}$$
Step 2. If the sequence $(v_h)$ is bounded in norm, the lower semicontinuity of $F$ and the $\sigma$-compactness of bounded sets give a solution of problem (15.44) by the direct method of the calculus of variations. Indeed, if $(v_{h_k})$ is a subsequence of $(v_h)$ which $\sigma$-converges to some $v \in V$, we have
$$F(v) \le \liminf_{k\to+\infty} F(v_{h_k}) = \inf\big\{F(w) : w \in V\big\}.$$

Step 3. It remains to show that $(v_h)$ cannot be unbounded. By contradiction, assume that along a subsequence (which we still index by $h$) we have $\|v_h\| \to +\infty$. Since the normalized vectors $w_h = v_h/\|v_h\|$ are bounded, a further subsequence of $(w_h)$ (still indexed by $h$) $\sigma$-converges to some $w \in V$. We have $F(v_{h+1}) \le F(v_h)$ for all $h \in \mathbb{N}$, so that $F(v_h)$ is bounded from above. Hence
$$F^{\mathrm{seq}}_\infty(w) \le \liminf_{h\to+\infty} \frac{F(\|v_h\| w_h)}{\|v_h\|} = \liminf_{h\to+\infty} \frac{F(v_h)}{\|v_h\|} \le 0,$$
which, together with the necessary condition (ii), gives
$$w \in \ker F^{\mathrm{seq}}_\infty. \tag{15.46}$$
Step 4. We have $\|v_h\| \to +\infty$, $w_h \to w$ with respect to $\sigma$, and $F(\|v_h\| w_h) = F(v_h)$ bounded from above. By the compactness assumption (i) we obtain $\|w_h - w\| \to 0$. This prevents $w$ from being zero, because $\|w_h\| = 1$ for all $h \in \mathbb{N}$. From (15.46) and from the compatibility condition (iii), there exists $t > 0$ such that
$$F(v_h - tw) \le F(v_h) \qquad \forall h \in \mathbb{N}. \tag{15.47}$$
Finally, for $h$ large enough that $\|v_h\| > t$,
$$\|v_h - tw\| = \Big\|\Big(1 - \frac{t}{\|v_h\|}\Big)v_h + t(w_h - w)\Big\| \le \Big(1 - \frac{t}{\|v_h\|}\Big)\|v_h\| + t\|w_h - w\| = \|v_h\| + t\big(\|w_h - w\| - 1\big).$$
The right-hand side is strictly less than $\|v_h\|$ for $h$ large enough. Then $v_h - tw$ is admissible for ($\wp_h$) and, by (15.47), is again a solution of ($\wp_h$), with norm strictly smaller than $\|v_h\|$. This contradicts (15.45).

Remark 15.2.4. The proof of Theorem 15.2.1 shows that the compactness condition (i) needs to be imposed only on sequences $(v_h)$ which $\sigma$-converge to an element $v \in \ker F^{\mathrm{seq}}_\infty$.

Remark 15.2.5. In Theorem 15.2.1 a weaker form of the compatibility condition (iii) can be used, namely:
for every $u \in \ker F^{\mathrm{seq}}_\infty$ there exists $R > 0$ such that, for every $v \in V$ with $\|v\| \ge R$, there exists $t > 0$ with $F(v - tu) \le F(v)$.
Moreover, the proof of Theorem 15.2.1 shows that, for a product space $V_1 \times \cdots \times V_n$, the compatibility condition (iii) can be replaced by the following weaker one:
for every $(u_1, \dots, u_n) \in \ker F^{\mathrm{seq}}_\infty$ there exist positive numbers $t_1, \dots, t_n$ such that
$$F(v_1 - t_1 u_1, \dots, v_n - t_n u_n) \le F(v_1, \dots, v_n) \qquad \forall (v_1, \dots, v_n) \in V_1 \times \cdots \times V_n.$$
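The proof of Theorem 15.2.1 is constructive in spirit. It minimizes on the balls $\{\|v\| \le h\}$ and, among the minimizers, keeps one of least norm, as in (15.45). The sketch below mimics Steps 1 and 2 in $V = \mathbb{R}$ for the noncoercive, nonconvex function $F(v) = 2 + \sin v$, which is our example. Every $u$ lies in $\ker F^{\mathrm{seq}}_\infty$ because $F$ is bounded. The compatibility condition (iii) holds with $t = 2\pi/|u|$ for $u \ne 0$. Accordingly, the least-norm minimizers $v_h$ stay bounded, close to $-\pi/2$. The grid search and the tolerance used to detect ties are assumptions of the sketch, not part of the theorem.

```python
import math

def least_norm_ball_minimizer(F, h, step=1e-4, tol=1e-8):
    """Grid version of problem (P_h): minimize F on [-h, h], then keep a minimizer of least |v|."""
    n = int(round(h / step))
    grid = [k * step for k in range(-n, n + 1)]
    values = [F(v) for v in grid]
    m = min(values)
    # grid points within tol of the minimum stand in for the solution set of (P_h)
    return min((v for v, fv in zip(grid, values) if fv <= m + tol), key=abs)

def F(v):
    return 2.0 + math.sin(v)

v_h = {h: least_norm_ball_minimizer(F, h) for h in (1, 2, 5, 10)}
print(v_h)  # v_1 = -1 (the constraint is active), then values near -pi/2
```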
Remark 15.2.6. In a certain sense, Theorem 15.2.1 includes the classical direct method of the calculus of variations, which gives the existence of a solution of problem (15.44) under the following assumptions:
(i) $F$ is sequentially $\sigma$-lower semicontinuous.
(ii) There exist $\alpha > 0$ and $b \in \mathbb{R}$ such that $F(v) \ge \alpha\|v\| + b$ for all $v \in V$.
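Indeed, by (ii) we obtain
$$t_h \to +\infty,\ v_h \to v,\ F(t_h v_h) \le c \implies \|v_h\| \to 0 \text{ and } v = 0,$$
so that the compactness hypothesis (i) is satisfied. Moreover, by (ii) again we get
$$F^{\mathrm{seq}}_\infty(v) \ge \alpha\|v\| \qquad \forall v \in V,$$
so that $\ker F^{\mathrm{seq}}_\infty$ reduces to $\{0\}$. Hence hypotheses (ii) and (iii) of Theorem 15.2.1 are also satisfied, and problem (15.44) admits at least one solution. In particular, (ii) holds if $\operatorname{dom} F$ is bounded, because $F$, being sequentially $\sigma$-lower semicontinuous, is bounded from below on the sequentially $\sigma$-compact balls of $V$.

As in Section 15.1, we now specialize our results to the case when the functional $F$ is of the form
$$F(v) = J(v) - \langle L, v\rangle + \chi_K(v),$$
where $J : V \to [0, +\infty]$ is a sequentially $\sigma$-lower semicontinuous functional, $L : V \to \mathbb{R}$ is a linear $\sigma$-continuous functional, and $K \subset V$ is a sequentially $\sigma$-closed set. Taking into account Propositions 15.39 and 15.40, a necessary condition for the existence of a solution of the minimum problem
$$\min\big\{J(v) - \langle L, v\rangle : v \in K\big\}$$
is given by
$$J^{\mathrm{seq}}_\infty(v) \ge \langle L, v\rangle \qquad \forall v \in K^{\mathrm{seq}}_\infty, \tag{15.48}$$
whereas the compatibility condition (iii) can be written as
(iii′) for every $u \in K^{\mathrm{seq}}_\infty$ with $J^{\mathrm{seq}}_\infty(u) = \langle L, u\rangle$, there exists $t > 0$ such that for every $v \in K$
$$v - tu \in K, \qquad J(v - tu) + t\langle L, u\rangle \le J(v).$$
In many situations the functional $J$ has "superlinear growth", so that $J^{\mathrm{seq}}_\infty$ reduces to
$$J^{\mathrm{seq}}_\infty(v) = \begin{cases} 0 & \text{if } v \in \ker J^{\mathrm{seq}}_\infty,\\ +\infty & \text{otherwise}.\end{cases}$$
In this case the necessary condition (ii) becomes
(ii′) $\langle L, v\rangle \le 0$ for all $v \in K^{\mathrm{seq}}_\infty \cap \ker J^{\mathrm{seq}}_\infty$,
whereas the compatibility condition (iii) is given by (iii′) with $u \in K^{\mathrm{seq}}_\infty \cap \ker J^{\mathrm{seq}}_\infty \cap \ker L$.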
15.3 Some examples
In this section we present some examples that can be treated with the theory of noncoercive minimum problems developed in the previous sections of the chapter. Some other cases, with applications to problems from mechanics, were presented in Chapter 14. Consider the following minimization problem:
$$\min\Big\{\int_\Omega \Big(\frac{1}{2}|Dv|^2 + B(v)\Big)\,dx - \langle L, v\rangle : v \in H^1(\Omega)\Big\}, \tag{15.49}$$
where $\Omega$ is a bounded connected open subset of $\mathbb{R}^n$, $L$ is in the dual space of $H^1(\Omega)$, and $B : \mathbb{R} \to\, ]-\infty, +\infty]$ is a convex lower semicontinuous function.

Proposition 15.3.1. For the existence of a solution of problem (15.49) the assumption
$$-B^\infty(-1) \le \frac{\langle L, 1\rangle}{\operatorname{meas}(\Omega)} \le B^\infty(1) \tag{15.50}$$
$-B^\infty(-1)$
0, then (15.53) has a solution. It remains to consider the case
$$\langle L, 1\rangle = 0. \tag{15.55}$$
If $h$ is uniformly continuous and $g = 0$, a proof of existence can be found in [93]. Note that in this case such regularity assumptions imply that the solution $w$ of the associated linearized problem
$$\begin{cases} -\Delta w = h & \text{in } \Omega,\\ \partial w/\partial\nu = g & \text{on } \partial\Omega \end{cases} \tag{15.56}$$
is essentially bounded. More generally, the following proposition holds.

Proposition 15.3.2. Let $h \in H^{-1}(\Omega)$ and $g \in H^{-1/2}(\partial\Omega)$ satisfy (15.55). Then the following conditions are equivalent:
(i) there exists a solution of problem (15.53);
(ii) the linearized problem (15.56) admits a negative solution.
Moreover, any solution $u$ of (15.53) solves (15.56) and satisfies $u^+ = 0$.

Proof. Assume (i) holds. If $u$ solves (15.53) and $u^+ \ne 0$, then for every constant $c > 0$ we have $F(u - c) < F(u)$, which contradicts the fact that $u$ is a minimum point of the functional $F$. Hence $u^+ = 0$ and $u$ solves (15.56), that is, $u$ is a negative solution of (15.56). Assume now (ii), and let $w$ be a negative solution of (15.56); then $w$ obviously solves (15.53).

Remark 15.3.2. For instance, by well-known regularity results for solutions of elliptic partial differential equations, condition (ii) of Proposition 15.3.2 is satisfied if $\partial\Omega$ is smooth, $h \in L^p(\Omega)$, and $g \in W^{1-1/p,p}(\partial\Omega)$ with $p > n/2$. In this case the solutions of (15.56) are bounded. Since they are determined only up to an additive constant, subtracting a large enough constant yields a negative solution.

Remark 15.3.3. The argument used in the proof of Proposition 15.3.2 applies also to problems of the form
$$\min\Big\{\int_\Omega \Big(\frac{1}{2}|Dv|^2 + B(v^+)\Big)\,dx - \langle L, v\rangle : v \in H^1(\Omega)\Big\} \tag{15.57}$$
for any convex and strictly increasing $B : \mathbb{R}_+ \to \mathbb{R}_+$. In this case the associated Euler–Lagrange equation reads as
$$\begin{cases} -\Delta u + \partial B(u^+) \ni h & \text{in } \Omega,\\ \partial u/\partial\nu = g & \text{on } \partial\Omega, \end{cases} \tag{15.58}$$
where $L = h + g$ and $\partial B$ is the subdifferential of the convex function $B$, whereas the associated linearized equation coincides with (15.56).

Remark 15.3.4. Consider the obstacle (from below) problem (see also [141])
$$\min\Big\{\int_\Omega \frac{1}{2}|Dv|^2\,dx - \langle L, v\rangle : v \in H^1(\Omega),\ v \le 0\Big\}. \tag{15.59}$$
By introducing the function
$$B(s) = \begin{cases} 0 & \text{if } s \le 0,\\ +\infty & \text{if } s > 0,\end{cases}$$
problem (15.59) can be written in the form (15.57), and the argument above works in the same way. In this case the Euler–Lagrange equation is usually written as a variational inequality:
$$u \in H^1(\Omega),\ u \le 0, \qquad \int_\Omega Du \cdot D(v - u)\,dx - \langle L, v - u\rangle \ge 0 \quad \forall v \in H^1(\Omega),\ v \le 0.$$
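Condition (15.55) and Remark 15.3.2 can be illustrated in one space dimension. The sketch below solves $-w'' = h$ on $(0, 1)$ with $w'(0) = w'(1) = 0$ (that is, $g = 0$), using a cell-centred finite-volume discretization of our own choosing. The discrete problem is solvable exactly when the discrete analogue of $\langle L, 1\rangle = 0$ holds, namely $\sum_i h_i = 0$. Since solutions are determined up to an additive constant, a bounded solution can then be shifted to a negative one, which is condition (ii) of Proposition 15.3.2. The function name and the sample datum $h$ are hypothetical.

```python
import math

def solve_neumann_1d(h_vals, dx, tol=1e-10):
    """Finite-volume solve of -w'' = h on (0,1), w'(0) = w'(1) = 0, normalized by w[0] = 0."""
    n = len(h_vals)
    q = [0.0] * (n + 1)          # q[i] approximates -w' at the interface x = i*dx; q[0] = 0
    for i in range(n):
        q[i + 1] = q[i] + dx * h_vals[i]
    if abs(q[n]) > tol:          # q[n] must vanish too: discrete form of <L, 1> = 0
        raise ValueError("compatibility condition fails: sum of h is not zero")
    w = [0.0] * n
    for i in range(n - 1):
        w[i + 1] = w[i] - dx * q[i + 1]
    return w

n = 200
dx = 1.0 / n
h_vals = [math.cos(2 * math.pi * (i + 0.5) * dx) for i in range(n)]  # zero-mean datum
w = solve_neumann_1d(h_vals, dx)
w_neg = [x - max(w) - 1.0 for x in w]   # adding a constant keeps it a solution; now negative
print(max(w_neg))  # -1.0
```

Passing a datum with nonzero sum, such as a constant $h$, makes the function raise `ValueError`, mirroring the failure of (15.55).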
Consider now, in a Hilbert space $V$, a linear continuous functional $L : V \to \mathbb{R}$, a closed convex subset $K \subset V$, and a symmetric continuous bilinear form $a : V \times V \to \mathbb{R}$ which we assume to be nonnegative, that is,
$$a(v, v) \ge 0 \qquad \forall v \in V.$$
If $F$ denotes the functional
$$F(v) = \frac{1}{2}a(v, v) - (L, v) \qquad \forall v \in V,$$
we may consider the minimum problem
$$\min\{F(v) : v \in K\}. \tag{15.60}$$

Proposition 15.3.3. The following conditions are equivalent:
(i) There exists $u \in K$ such that $F(u) \le F(v)$ for every $v \in K$.
(ii) There exists $u \in K$ such that $a(u, u - v) \le (L, u - v)$ for every $v \in K$.

Proof. Assume (i). Since $K$ is convex, for every $v \in K$ and $t \in [0, 1]$ we have $tv + (1 - t)u \in K$, and the map
$$g_v(t) = F\big(tv + (1 - t)u\big), \qquad t \in [0, 1],$$
achieves its minimum at $t = 0$. Hence
$$0 \le g_v'(0) = -a(u, u) + a(u, v) - (L, v) + (L, u),$$
that is, (ii) holds. Assume now (ii). We have to prove that the map $g_v$ above achieves its minimum at $t = 0$. Since
$$g_v''(t) = a(v, v) + a(u, u) - 2a(u, v) = a(v - u, v - u) \ge 0,$$
the function $g_v$ is convex on $[0, 1]$. Therefore it is enough to show that $g_v'(0) \ge 0$. This holds true because $g_v'(0) = -a(u, u) + a(u, v) - (L, v) + (L, u)$, which is nonnegative by (ii).

Remark 15.3.5. Since the map $v \mapsto a(v, v)$ is quadratic, the recession functional associated to $F$ is given by
$$F^\infty(v) = \begin{cases} -(L, v) & \text{if } a(v, v) = 0,\\ +\infty & \text{otherwise}.\end{cases}$$
Hence (see Theorem 15.1.2) a necessary condition for the existence of a solution of (15.60) is
$$(L, v) \le 0 \qquad \forall v \in K^\infty \cap \ker a.$$
On the other hand, suppose $a$ satisfies the compactness condition
$$v_h \to 0 \text{ weakly},\ a(v_h, v_h) \to 0 \implies v_h \to 0 \text{ strongly}.$$
In that case, a sufficient condition is that $K^\infty \cap \ker a \cap \ker L$ is a subspace.
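For the reader's convenience, the computation behind the proof of Proposition 15.3.3 can be written out explicitly. It is a routine expansion that uses only the symmetry and bilinearity of $a$, with $g_v$ as in that proof:

```latex
g_v(t) = F\big(u + t(v-u)\big)
       = \tfrac12 a(u,u) - (L,u)
         + t\,\big[a(u, v-u) - (L, v-u)\big]
         + \tfrac{t^2}{2}\,a(v-u, v-u),
\\[4pt]
g_v'(0) = a(u,v) - a(u,u) - (L,v) + (L,u),
\qquad
g_v''(t) = a(v-u, v-u) = a(v,v) + a(u,u) - 2a(u,v) \ge 0 .
```

Thus $g_v'(0) \ge 0$ for every $v \in K$ is exactly inequality (ii), $a(u, u - v) \le (L, u - v)$.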
Problem (15.60) written in the form (ii) above is called a variational inequality. Obstacle problems, for instance, are of this type: the Hilbert space is $H^1(\Omega)$ and the convex set $K$ is
$$K = \big\{v \in H^1(\Omega) : v \ge \psi \text{ q.e. in } \Omega\big\},$$
where q.e. is intended in the sense of capacity (see Section 5.8). Let us consider this situation more closely in the case where the bilinear form $a$ is given by an elliptic operator,
$$a(u, v) = \sum_{i,j=1}^n \int_\Omega a_{ij}(x)\, D_i u\, D_j v\,dx,$$
whose coefficients $a_{ij}$ are symmetric, measurable, and satisfy the ellipticity condition
$$c_1|\xi|^2 \le \sum_{i,j=1}^n a_{ij}(x)\,\xi_i\xi_j \le c_2|\xi|^2 \qquad \forall \xi \in \mathbb{R}^n.$$
We denote by $A$ the corresponding elliptic operator
$$Au = -\sum_{i,j=1}^n D_i\big(a_{ij}(x)\, D_j u\big).$$

Proposition 15.3.4. The obstacle problem (15.60), equivalent to the variational inequality (ii) of Proposition 15.3.3, is also equivalent to the complementarity problem
$$u - \psi \ge 0, \qquad Au - L \ge 0, \qquad \langle Au - L, u - \psi\rangle = 0. \tag{15.61}$$

Proof. Let $u$ be a solution of the obstacle problem (15.60). By formulation (ii) of Proposition 15.3.3, for every nonnegative function $\varphi \in H^1(\Omega)$, taking $v = u + \varphi$, we obtain $\langle Au, \varphi\rangle \ge \langle L, \varphi\rangle$, that is, $Au - L \ge 0$. On the other hand, taking $v = \psi$ we have
$$0 \le \langle Au - L, u - \psi\rangle = \langle Au, u - \psi\rangle - \langle L, u - \psi\rangle \le 0,$$
that is, $\langle Au - L, u - \psi\rangle = 0$. Conversely, let $u$ satisfy (15.61), and write a generic $v \ge \psi$ in the form $\psi + \varphi$ with $\varphi \ge 0$. Then
$$\langle Au - L, u - v\rangle = \langle Au - L, u - \psi\rangle - \langle Au - L, \varphi\rangle \le 0,$$
so that, by Proposition 15.3.3, $u$ solves the obstacle problem.
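A finite-dimensional version of Propositions 15.3.3 and 15.3.4 can be checked numerically. The sketch below is our own discretization, not the book's:

- $\Omega = (0, 1)$;
- $A$ is replaced by the standard three-point matrix $\frac{1}{\delta^2}\operatorname{tridiag}(-1, 2, -1)$ with zero boundary values, so that $a$ is coercive;
- the load is the constant $L = -10$ and the obstacle is the constant $\psi = -0.2$.

Projected Gauss–Seidel is a classical iterative method for such discrete variational inequalities. It minimizes $\frac12 a(v, v) - (L, v)$ one coordinate at a time and projects onto $\{v_i \ge \psi_i\}$. At convergence the three relations (15.61) hold. The residual $Au - L$, the discrete counterpart of the reaction of the obstacle, vanishes off the coincidence set and equals $10$ deep inside it.

```python
def obstacle_pgs(load, psi, max_sweeps=100_000, tol=1e-14):
    """Projected Gauss-Seidel for min 1/2 v.Av - L.v subject to v >= psi,
    with A = tridiag(-1, 2, -1)/dx**2 on (0, 1) and zero boundary values."""
    n = len(load)
    inv_dx2 = (n + 1) ** 2
    u = [max(p, 0.0) for p in psi]            # feasible starting point
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            # exact minimizer in the i-th coordinate, then projection onto v_i >= psi_i
            new = max(psi[i], (load[i] + (left + right) * inv_dx2) / (2 * inv_dx2))
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            break
    return u

def residual(u, load):
    """Discrete Au - L: nonnegative at the solution, zero off the contact set."""
    n = len(u)
    inv_dx2 = (n + 1) ** 2
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append((2 * u[i] - left - right) * inv_dx2 - load[i])
    return r

n = 30
load, psi = [-10.0] * n, [-0.2] * n
u = obstacle_pgs(load, psi)
r = residual(u, load)
contact = [i for i in range(n) if u[i] == psi[i]]
print(len(contact), min(r), r[n // 2])
```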
Remark 15.3.6. Since $Au - L$ is a nonnegative distribution, by the Riesz–Schwartz theorem there exists a nonnegative Borel measure $\mu$ on $\Omega$ such that $Au - L = \mu$. Moreover, by (15.61) the measure $\mu$ is concentrated on the coincidence set $\{u = \psi\}$. For further details on obstacle problems see, for instance, the book by Kinderlehrer and Stampacchia [163]. Here we only remark that the obstacle problem is a mathematical model for the shape of a thin elastic membrane subject to a vertical load $L$, where the unknown $u$ represents the vertical displacement of the membrane. In this framework the measure $\mu$ has a natural interpretation as the upward force due to the constraint reaction of the rigid obstacle.

As a further example we now consider minimum problems of the form
$$\min\Big\{\int_0^T \Big(\frac{1}{2}|u'|^2 + V(t, u)\Big)\,dt : u \in H^1(0, T; \mathbb{R}^n),\ u(0) = u(T)\Big\}, \tag{15.62}$$
where $V$ is a Borel function, with $V(t, \cdot)$ lower semicontinuous on $\mathbb{R}^n$, such that for a.e. $t \in (0, T)$ and for every $s \in \mathbb{R}^n$
$$V(t, s) \ge -a(t) - b(t)|s|^q \tag{15.63}$$
for suitable $q < 2$ and $a(t), b(t)$ in $L^1(0, T)$. If $V(t, \cdot)$ is smooth, the minimum problem above has the Euler–Lagrange equation
$$\begin{cases} -u'' + \nabla V(t, u) = 0,\\ u(0) = u(T),\quad u'(0) = u'(T). \end{cases} \tag{15.64}$$
We are therefore looking for solutions of the differential equation $-u'' + \nabla V(t, u) = 0$ which are periodic on the interval $[0, T]$. It is convenient to denote by $H^1_T$ the space
$$H^1_T = \big\{u \in H^1(0, T; \mathbb{R}^n) : u(0) = u(T)\big\}$$
and by $F$ the functional
$$F(u) = \int_0^T \Big(\frac{1}{2}|u'|^2 + V(t, u)\Big)\,dt.$$
Notice that in general the functional $F$ is not convex. Nevertheless, the sequential weak lower semicontinuity of $F$ is straightforward. Let us now prove the compactness property (i) of Theorem 15.2.1. Suppose $t_h \to +\infty$ and $u_h \to u$ weakly in $H^1_T$, with $F(t_h u_h)$ bounded from above. Dividing by $t_h^2$ and using (15.63), we deduce that $\int_0^T |u_h'|^2\,dt \to 0$, and this immediately implies the strong convergence of the sequence $\{u_h\}$ to a constant. To prove the necessary condition $F_\infty(u) \ge 0$ for every $u \in H^1_T$, we notice that it is trivially fulfilled when $u$ is a nonconstant function, because in this case $F_\infty(u) = +\infty$. Indeed, if $t_h \to +\infty$ and $u_h \to u$ weakly in $H^1_T$ with $u$ nonconstant, we have
$$\liminf_{h\to+\infty} \int_0^T |u_h'|^2\,dt \ge \int_0^T |u'|^2\,dt > 0.$$
Therefore, by (15.63) and using the fact that $q < 2$, we obtain
$$\liminf_{h\to+\infty} \frac{F(t_h u_h)}{t_h} \ge \liminf_{h\to+\infty} \int_0^T \Big(\frac{t_h}{2}|u_h'|^2 - \frac{a(t)}{t_h} - b(t)\,t_h^{q-1}|u_h|^q\Big)\,dt = +\infty.$$
The necessary condition $F_\infty(u) \ge 0$ then reduces to
$$F_\infty(c) \ge 0 \qquad \text{for every } c \in \mathbb{R}^n. \tag{15.65}$$
In what follows we assume that (15.65) is satisfied, and we examine some special cases in which additional assumptions on the potential $V$ are made. When $V(t, \cdot)$ is convex on $\mathbb{R}^n$, the functional $F$ turns out to be convex, and we are in the framework of Section 13.1. Assume there exists a function $u_0 \in H^1_T$ such that $V\big(t, u_0(t)\big)$ is integrable. For every $c \in \mathbb{R}^n$, introduce the convex function $\varphi_c : \mathbb{R} \to\, ]-\infty, +\infty]$ given by
$$\varphi_c(r) = \int_0^T V\big(t, u_0(t) + cr\big)\,dt.$$
Then, by applying Theorem 15.1.1, it is easy to see that problem (15.62) has at least one solution provided that, for every $c \in \mathbb{R}^n$, the function $\varphi_c$ is either constant or such that
$$\lim_{|r|\to+\infty} \varphi_c(r) = +\infty. \tag{15.66}$$
For instance, if $V(t, s) = V(s) - f(t)\cdot s$, condition (15.66) is fulfilled for every $f \in L^1(0, T; \mathbb{R}^n)$ whenever the function $V(s)$ has superlinear growth, that is,
$$\lim_{|s|\to+\infty} \frac{V(s)}{|s|} = +\infty.$$
Another case in which the existence of minimizers for problem (15.62) can be easily obtained is when the potential $V$ satisfies a Lipschitz condition of the form
$$|V(t, s_1) - V(t, s_2)| \le k(t)|s_1 - s_2| \qquad \text{for a.e. } t \in (0, T) \text{ and for every } s_1, s_2 \in \mathbb{R}^n. \tag{15.67}$$
Here we assume that $k \in L^1(0, T)$ and that the potential $V$ satisfies the coercivity condition
$$\lim_{|s|\to+\infty} \int_0^T V(t, s)\,dt = +\infty. \tag{15.68}$$
In this case problem (15.62) is actually coercive, and the existence result follows immediately from the direct methods of the calculus of variations of Section 3.2. Indeed, the Lipschitz condition (15.67) gives, for every $u \in H^1_T$,
$$F(u) \ge \int_0^T \Big(\frac{1}{2}|u'|^2 + V\big(t, u(0)\big)\Big)\,dt - \int_0^T k(t)\,|u(t) - u(0)|\,dt.$$
Moreover, since
$$|u(t) - u(0)| = \Big|\int_0^t u'(s)\,ds\Big| \le \sqrt{T}\Big(\int_0^T |u'|^2\,dt\Big)^{1/2},$$
we deduce that
$$F(u) \ge \int_0^T \Big(\frac{1}{2}|u'|^2 + V\big(t, u(0)\big)\Big)\,dt - C\Big(\int_0^T |u'|^2\,dt\Big)^{1/2}$$
for a suitable positive constant $C$ (for instance $C = \sqrt{T}\,\|k\|_{L^1(0,T)}$). Now use the coercivity assumption (15.68) and the fact that on $H^1_T$ the norm is equivalent to
$$\Big(\int_0^T |u'|^2\,dt + |u(0)|^2\Big)^{1/2}.$$
We obtain that $\|u\|_{H^1_T} \to +\infty$ implies $F(u) \to +\infty$, hence the coercivity of $F$. The lower semicontinuity of $F$ is a straightforward consequence of the results of Section 13.1.

Consider now the case when the function $V(t, \cdot)$ is periodic. More precisely, we assume that the function $V$ is nonnegative, or more generally bounded from below by an $L^1$ function. We also assume that, for suitable independent vectors $\tau_i \in \mathbb{R}^n$, we have for a.e. $t \in (0, T)$ and for every $s \in \mathbb{R}^n$
$$V(t, s + \tau_i) = V(t, s), \qquad i = 1, \dots, n.$$
Note that in this case the functional $F$ is not convex in general and that, by the nonnegativity of $V$, the necessary condition (15.65) is clearly fulfilled. Thus, to apply Theorem 15.2.1 and obtain the existence of a minimizer for $F$, it is enough to show that the compatibility condition (iii) of Remark 15.2.5 holds. In other words, we have to find for every $c\in\mathbb{R}^n$ a vector $\mu\in\mathbb{R}^n$ with positive components $\mu_i$ such that
$$F(u_1 - \mu_1c_1,\dots,u_n - \mu_nc_n)\le F(u_1,\dots,u_n)\qquad\forall u\in H^1_T.$$
This can be achieved, for instance, by choosing $\mu_i = |\tau_i|/|c_i|$ for all $i = 1,\dots,n$, with $\mu_i = 1$ if $c_i = 0$.
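The compatibility condition can be checked numerically in a toy case. The sketch below is only an illustration under assumptions that are ours, not the text's: $n = 2$; periods $\tau_i = p_i e_i$ directed along the coordinate axes (the situation in which the choice $\mu_i = |\tau_i|/|c_i|$ turns each shift $\mu_ic_i$ into $\pm|\tau_i|$, a period of $V$); a hypothetical potential $V(s) = \sum_i\big(1 - \cos(2\pi s_i/p_i)\big)$; and, consistently with the estimates above, $F(u) = \int_0^T\big(\frac12|u'|^2 + V(u)\big)\,dt$, discretized by finite differences and the trapezoidal rule on a $T$-periodic test path.

```python
import numpy as np

# Hypothetical data (not from the text): n = 2, periods tau_i = p[i] * e_i.
p = np.array([1.0, 2.5])
T = 1.0
t = np.linspace(0.0, T, 2001)


def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def V(s):
    """Nonnegative potential, p[i]-periodic in the i-th coordinate; s has shape (len(t), 2)."""
    return np.sum(1.0 - np.cos(2.0 * np.pi * s / p), axis=1)


def F(u):
    """Discretization of F(u) = int_0^T (|u'|^2 / 2 + V(u)) dt."""
    du = np.gradient(u, t, axis=0)
    return trapezoid(0.5 * np.sum(du**2, axis=1) + V(u), t)


def mu_vector(c):
    """mu_i = |tau_i| / |c_i|, with mu_i = 1 when c_i = 0 (the choice made in the text)."""
    mu = np.ones_like(c, dtype=float)
    nonzero = c != 0
    mu[nonzero] = p[nonzero] / np.abs(c[nonzero])
    return mu


# An arbitrary T-periodic test path in R^2.
u = np.stack([0.3 + np.sin(2 * np.pi * t / T), 0.7 * np.cos(2 * np.pi * t / T)], axis=1)

for c in (np.array([0.4, -3.0]), np.array([0.0, 1.7])):
    shift = mu_vector(c) * c  # componentwise mu_i * c_i, i.e. +p_i, -p_i or 0
    print(c, F(u - shift) - F(u))  # zero up to rounding
```

A shift that is not a period (for example half a period in the first coordinate) does change the value of $F$, so the equality above really comes from periodicity.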
15.4 Limit analysis problems
We consider the so-called limit analysis problems, which consist in minimizing functionals of the form
$$F(x) - \gamma L(x),\qquad x\in X,$$
where $X$ is a normed space, $F:X\to\,]-\infty,+\infty]$ is a (possibly nonconvex) functional, and $L\in X'$. We are interested in characterizing the values of $\gamma$ for which the minimum is attained. As an application we consider nonconvex minimum problems defined on the space of measures and on the space $BV$.

Let $X$ be a normed space; we consider on $X$ the norm topology $\tau$ and another (linear Hausdorff) topology $\sigma$, weaker than $\tau$, such that the unit ball $\{x\in X : \|x\|\le1\}$ is $\sigma$-compact. Let $F:X\to\,]-\infty,+\infty]$ be a proper functional which is $\sigma$-lsc (on all $\tau$-bounded sets), and let $L:X\to\mathbb{R}$ be a linear $\sigma$-continuous functional. The limit analysis problem associated with $F$ and $L$ consists in finding the values $\gamma\in\mathbb{R}$ for which the minimum problem
$$\min\{F(x) - \gamma L(x) : x\in X\} \qquad (15.69)$$
admits at least one solution. When $F$ is convex, the problem was studied by Bouchitté and Suquet in [78], where the following result is proved.

Theorem 15.4.1. Assume that $F$ is proper, convex, $\sigma$-lsc, and $\sigma$-coercive in the sense that
$$\lim_{\|x\|\to+\infty}F(x) = +\infty.$$
Then, setting
$$\gamma^* = \min\{F^\infty(x) : x\in X,\ L(x) = 1\},\qquad\gamma_* = -\min\{F^\infty(x) : x\in X,\ L(x) = -1\},$$
the following statements are equivalent:
(i) $F - \gamma L$ is $\sigma$-coercive;
(ii) $\gamma_* < \gamma < \gamma^*$.

Proof. Since $F$ is proper, there exists $x_0\in X$ such that $F(x_0) < +\infty$; by considering the functional $F(x + x_0) - F(x_0)$ we may assume, without any loss of generality, that $F(0) = 0$. Assume now (i). By the properties of recession functions we have
$$(F - \gamma L)^\infty(x)\ge\frac{F(tx) - \gamma L(tx)}{t}\qquad\forall t > 0,\ \forall x\in X;$$
if for some $x\ne0$ we had $(F - \gamma L)^\infty(x)\le0$, this would contradict the coerciveness of $F - \gamma L$. Then, taking as $x^*$ the solution of $\gamma^* = \min\{F^\infty(x) : L(x) = 1\}$, we obtain
$$0 < (F - \gamma L)^\infty(x^*) = F^\infty(x^*) - \gamma L(x^*) = \gamma^* - \gamma.$$
Analogously, if $x_*$ is the solution of $-\gamma_* = \min\{F^\infty(x) : L(x) = -1\}$, we get
$$0 < (F - \gamma L)^\infty(x_*) = F^\infty(x_*) - \gamma L(x_*) = -\gamma_* + \gamma,$$
that is, (ii). Assume now (ii), and let $0 < \gamma < \gamma^*$ (the case $\gamma = 0$ is immediate, since $F$ itself is $\sigma$-coercive). Arguing by contradiction, if $F - \gamma L$ were not coercive, we could find a sequence $(x_h)$ such that $\|x_h\|\to+\infty$ and
$$(F - \gamma L)(x_h)\le M \qquad (15.70)$$
for a suitable $M\in\mathbb{R}$. Note that $L(x_h)\to+\infty$, because otherwise (15.70) and the coerciveness of $F$ would imply that $(x_h)$ is bounded. Setting $y_h = x_h/\|x_h\|$ and $t_h = \|x_h\|$, we
get that $(y_h)$ is $\sigma$-relatively compact, hence (up to subsequences) $\sigma$-converging to a suitable $y\in X$. By the lower semicontinuity and convexity of $F$, and the fact that $F(0) = 0$, we have for every $t > 0$
$$\frac{F(ty)}{t}\le\liminf_{h\to+\infty}\frac{F(ty_h)}{t}\le\liminf_{h\to+\infty}\frac{F(t_hy_h)}{t_h}\le\liminf_{h\to+\infty}\gamma\,\frac{L(x_h)}{\|x_h\|} = \gamma L(y),$$
so that, letting $t\to+\infty$,
$$F^\infty(y)\le\gamma L(y). \qquad (15.71)$$
Since $F$ is lower semicontinuous and $\sigma$-coercive, by Proposition 15.1.3 we have $F^\infty\ge0$, hence $L(y)\ge0$ by (15.71). The case $L(y) > 0$ must be excluded because, taking $z = y/L(y)$, we get $L(z) = 1$ and
$$\gamma^*\le F^\infty(z) = \frac{F^\infty(y)}{L(y)}\le\gamma,$$
which contradicts (ii). Therefore $L(y) = 0$ and, by (15.71), $F^\infty(y) = 0$. Arguing as in the first part of the proof, the coerciveness of $F$ implies $y = 0$, so that, using (15.70) and recalling that $L(x_h)\to+\infty$,
$$F\Big(\frac{x_h}{L(x_h)}\Big)\le\frac{F(x_h)}{L(x_h)}\le\frac{M}{L(x_h)} + \gamma.$$
Hence $x_h/L(x_h)$ is bounded in $X$ (by the coerciveness of $F$), and so is $y_h/L(y_h) = x_h/L(x_h)$. But this contradicts the fact that $\|y_h\| = 1$ whereas $L(y_h)\to0$. An analogous argument can be used in the case $\gamma_* < \gamma < 0$.

When $F$ is not necessarily convex we have the following result.

Theorem 15.4.2. Assume that
(i) $F$ is proper and sequentially $\sigma$-lsc;
(ii) for all $z\in\ker F_\infty$ there exists $\eta = \eta(z) > 0$ such that
$$F(x - \eta z)\le F(x)\qquad\forall x\in X;$$
(iii) there exist a seminorm $P:X\to[0,+\infty[$ satisfying the compactness condition (2.3.11) and a number $C\ge0$ such that
$$F(x)\ge P(x) - C\qquad\forall x\in X.$$
Then, setting
$$\gamma^* = \inf\{F_\infty(x) : L(x) = 1\},\qquad\gamma_* = -\inf\{F_\infty(x) : L(x) = -1\},$$
we have
(a) if the minimum problem (15.69) admits a solution, then $\gamma_*\le\gamma\le\gamma^*$;
(b) if $\gamma_* < \gamma < \gamma^*$, then the minimum problem (15.69) admits a solution.

Proof. To prove statement (a) we apply Proposition 15.2.5. The necessary condition for the existence of a solution to problem (15.69) is
$$(F - \gamma L)_\infty\ge0\quad\text{on } X,$$
which, thanks to Proposition 15.2.1, becomes
$$F_\infty(x)\ge\gamma L(x)\qquad\forall x\in X. \qquad (15.72)$$
Taking $L(x) = -1$ in (15.72) gives
$$\gamma\ge\sup\{-F_\infty(x) : L(x) = -1\} = \gamma_*;$$
similarly, taking $L(x) = 1$ in (15.72) gives
$$\gamma\le\inf\{F_\infty(x) : L(x) = 1\} = \gamma^*.$$
To prove statement (b) we verify all the hypotheses of Theorem 15.2.1 for the functional $G = F - \gamma L$, which is clearly sequentially $\sigma$-lsc. As seen in the proof of statement (a), the necessary condition (ii) of Theorem 15.2.1 is equivalent to the inequalities $\gamma_*\le\gamma\le\gamma^*$. To verify the compactness hypothesis (i) of Theorem 15.2.1, let $t_h\to+\infty$ and let $(x_h)$ be a sequence $\sigma$-converging to $x$ in $X$ such that
$$F(t_hx_h) - \gamma L(t_hx_h)\le C. \qquad (15.73)$$
Dividing by $t_h$ we get
$$\frac{F(t_hx_h)}{t_h} - \gamma L(x_h)\le\frac{C}{t_h},$$
so that, by the definition of the topological recession functional, $F_\infty(x)\le\gamma L(x)$. If $L(x) > 0$ we would obtain
$$F_\infty\Big(\frac{x}{L(x)}\Big)\le\gamma < \gamma^* = \inf\{F_\infty(y) : L(y) = 1\},$$
which gives a contradiction. In an analogous way we can exclude the case $L(x) < 0$. Therefore $L(x) = 0$, and so also $F_\infty(x) = 0$ by (15.72). Now, if $L(t_hx_h)$ is bounded from above, by (15.73) we would get (for a possibly different constant $C$)
$$P(t_hx_h)\le C,$$
and so, since $P$ satisfies the compactness condition (2.3.11), we would obtain $x_h\to x$ strongly in $X$. Otherwise, if $L(t_hx_h)$ (or a subsequence of it) tends to $+\infty$, dividing by $L(t_hx_h)$ in (15.73) we get
$$P\Big(\frac{x_h}{L(x_h)}\Big)\le C.$$
Since $L(x_h)\to L(x) = 0$, by the compactness assumption on $P$ we obtain again $x_h\to x$. Therefore the compactness hypothesis (2.3.11) is satisfied. Finally we verify the compatibility condition (iii) of Theorem 15.2.1. Let $z\in\ker(F_\infty - \gamma L)$, i.e., $F_\infty(z) = \gamma L(z)$. As before, the strict inequalities $\gamma_* < \gamma < \gamma^*$ give $F_\infty(z) = L(z) = 0$, and so, by assumption (ii),
$$F(x - \eta z) - \gamma L(x - \eta z) = F(x - \eta z) - \gamma L(x)\le F(x) - \gamma L(x)$$
for all $x\in X$; that is, the compatibility condition (iii) is satisfied.

Remark 15.4.1. If the infimum
$$\inf\{F_\infty(x) : L(x) = 1\}\qquad\big(\text{respectively, }\inf\{F_\infty(x) : L(x) = -1\}\big)$$
is not attained, then in Theorem 15.4.2(b) we can also accept $\gamma = \gamma^*$ (respectively, $\gamma = \gamma_*$).

As an application of the limit analysis theorems above we consider the case of functionals defined on measures. We refer to Section 13.3 for the theory of convex functionals on measures and to Bouchitté and Buttazzo [74], [75], [76] for further details on nonconvex functionals defined on measures. Consider a measure space $(\Omega,\mathcal{B},\mu)$, where $\Omega$ is a separable locally compact metric space, $\mathcal{B}$ is the $\sigma$-algebra of all Borel subsets of $\Omega$, and $\mu:\mathcal{B}\to[0,+\infty[$ is a positive, finite, nonatomic measure. Consider a functional defined on $\mathcal{M}(\Omega;\mathbb{R}^n)$ of the form first considered by Bouchitté and Buttazzo [74],
$$F(\lambda) = \int_\Omega f\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{\Omega\setminus A_\lambda}f^\infty(\lambda^s) + \int_{A_\lambda}g\big(\lambda(x)\big)\,d\#. \qquad (15.74)$$
Here $f:\mathbb{R}^n\to[0,+\infty]$ is a proper, convex, lsc function with $f(0) = 0$; $f^\infty$ is its recession function; $g:\mathbb{R}^n\to[0,+\infty]$ is an lsc function with $g(0) = 0$ satisfying the subadditivity condition
$$g(s_1 + s_2)\le g(s_1) + g(s_2)\qquad\forall s_1,s_2\in\mathbb{R}^n;$$
$\lambda = \frac{d\lambda}{d\mu}\mu + \lambda^s$ is the Lebesgue–Nikodym decomposition of $\lambda$ into absolutely continuous and singular parts with respect to $\mu$;
$A_\lambda$ is the set of all atoms of $\lambda$; $\lambda(x)$ is the value $\lambda(\{x\})$; $\#$ is the counting measure. As already recalled, Bouchitté and Buttazzo [74] proved that if the condition
$$f^\infty(s) = \lim_{t\to0^+}\frac{g(ts)}{t}\qquad\forall s\in\mathbb{R}^n$$
is fulfilled, then the functional (15.74) is sequentially weakly$^*$ lsc on $\mathcal{M}(\Omega;\mathbb{R}^n)$. For our purposes, it is convenient to introduce, for every function $g:\mathbb{R}^n\to[0,+\infty]$, the functions
$$g^\infty(s) = \liminf_{t\to+\infty}\frac{g(ts)}{t},\qquad g^0(s) = \limsup_{t\to0^+}\frac{g(ts)}{t}.$$
The following proposition holds (see [74]).

Proposition 15.4.1. Let $g:\mathbb{R}^n\to[0,+\infty[$ be an lsc and subadditive function with $g(0) = 0$. Then we have the following:
(i) the functions $g^0$ and $g^\infty$ are convex, lsc, and positively 1-homogeneous;
(ii) $\displaystyle g^0(s) = \sup_{t>0}\frac{g(ts)}{t} = \lim_{t\to0^+}\frac{g(ts)}{t}$ for every $s\in\mathbb{R}^n$;
(iii) $\displaystyle g^\infty(s) = \inf_{t>0}\frac{g(ts)}{t} = \lim_{t\to+\infty}\frac{g(ts)}{t}$ for every $s\in\mathbb{R}^n$.

Remark 15.4.2. From Proposition 15.4.1 (taking $t = 1$ in (ii) and (iii)) it follows easily that
$$g^\infty(s)\le g(s)\le g^0(s)\qquad\text{for every } s\in\mathbb{R}^n.$$
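Proposition 15.4.1(ii)–(iii) suggests a direct way to approximate $g^0$ and $g^\infty$: take the supremum (respectively, infimum) of $g(ts)/t$ over a wide range of dilations $t$. The sketch below does this for a hypothetical scalar jump cost ($n = 1$) chosen by us, $g(s) = \min\{2|s|,|s| + 1\}$, which is lsc, subadditive, and vanishes at 0; for it one expects $g^0(s) = 2|s|$ and $g^\infty(s) = |s|$, together with the sandwich of Remark 15.4.2. The log-spaced grid of $t$ values is an assumption of the sketch, and the infimum is only approximated up to an error of order $1/t_{\max}$.

```python
import numpy as np


def g(s):
    """Hypothetical jump cost: lsc, subadditive, g(0) = 0 (scalar case n = 1)."""
    a = np.abs(s)
    return np.minimum(2.0 * a, a + 1.0)


# Dilation parameters t > 0, from 1e-6 to 1e6 (log-spaced).
ts = np.logspace(-6, 6, 2001)


def g_zero(s):
    """Proposition 15.4.1(ii): g^0(s) = sup_{t>0} g(ts)/t, approximated on the grid ts."""
    return float(np.max(g(ts * s) / ts))


def g_infty(s):
    """Proposition 15.4.1(iii): g^inf(s) = inf_{t>0} g(ts)/t, approximated on the grid ts."""
    return float(np.min(g(ts * s) / ts))


for s in (-3.0, -0.5, 0.0, 0.2, 4.0):
    # Remark 15.4.2: g^inf(s) <= g(s) <= g^0(s)
    print(s, g_infty(s), float(g(s)), g_zero(s))
```

The same $g$ also satisfies (iv) of Theorem 15.4.3 below with $C_2 = 1$, and (v) holds if one pairs it with $f(s) = 2|s|$.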
The following theorem gives sufficient conditions on $f$ and $g$ to apply Theorem 15.4.2 to functionals of the form (15.74).

Theorem 15.4.3. Let $f:\mathbb{R}^n\to[0,+\infty]$ and $g:\mathbb{R}^n\to[0,+\infty[$ be given functions. Assume that
(i) $f$ is convex, lsc, and proper on $\mathbb{R}^n$, and $f(0) = 0$;
(ii) there exist $C_1 > 0$ and $D\in\mathbb{R}$ such that $f(x)\ge C_1|x| - D$ for all $x\in\mathbb{R}^n$;
(iii) $g$ is lsc and subadditive on $\mathbb{R}^n$, and $g(0) = 0$;
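(iv) there exists $C_2 > 0$ such that $g(x)\ge C_2|x|$ for all $x\in\mathbb{R}^n$;
(v) $g^0 = f^\infty$ in $\mathbb{R}^n$;
(vi) $H\in C_0(\Omega;\mathbb{R}^n)$.
Let $F:\mathcal{M}(\Omega;\mathbb{R}^n)\to[0,+\infty]$ be the functional defined in (15.74). Then, setting
$$\gamma^* = \inf\{F_\infty(\lambda) : \langle H,\lambda\rangle = 1\},\qquad\gamma_* = -\inf\{F_\infty(\lambda) : \langle H,\lambda\rangle = -1\},$$
we have
(a) if the functional $F - \gamma\langle H,\cdot\rangle$ admits a minimum on $\mathcal{M}(\Omega;\mathbb{R}^n)$, then $\gamma_*\le\gamma\le\gamma^*$;
(b) the functional $F - \gamma\langle H,\cdot\rangle$ admits a minimum on $\mathcal{M}(\Omega;\mathbb{R}^n)$ for every $\gamma$ such that $\gamma_* < \gamma < \gamma^*$.

Proof. By the assumptions made on $f$ and $g$, the functional $F$ is sequentially weakly$^*$ lsc on $\mathcal{M}(\Omega;\mathbb{R}^n)$. Moreover, by assumptions (ii) and (iv), we have
$$F(\lambda)\ge C_1\int_\Omega\Big|\frac{d\lambda}{d\mu}\Big|\,d\mu + C_1\int_{\Omega\setminus A_\lambda}|\lambda^s| + C_2\int_{A_\lambda}|\lambda^s(x)|\,d\# - D\mu(\Omega),$$
so that
$$F(\lambda)\ge C\|\lambda\| - b \qquad (15.75)$$
for suitable $C > 0$ and $b\in\mathbb{R}$. Finally, from (15.75) we get $\ker F_\infty = \{0\}$, so that hypothesis (ii) of Theorem 15.4.2 is satisfied too, and the conclusions follow from Theorem 15.4.2.

We now give an explicit formula for the bounds $\gamma^*$ and $\gamma_*$. To obtain this result we first need an explicit representation of the topological recession function $F_\infty$. We use a representation theorem for the relaxed functional associated with integrals of the form (15.74). More precisely, given a functional $F:\mathcal{M}(\Omega;\mathbb{R}^n)\to[0,+\infty]$ of the form
$$F(\lambda) = \begin{cases}\displaystyle\int_\Omega f\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{A_\lambda}g\big(\lambda(x)\big)\,d\# & \text{if } \lambda^s = 0 \text{ on } \Omega\setminus A_\lambda,\\ +\infty & \text{otherwise,}\end{cases}$$
we consider its relaxed functional $\overline{F}$ defined by
$$\overline{F} = \sup\{G : G\le F,\ G \text{ sequentially weakly}^*\text{ lsc on } \mathcal{M}(\Omega;\mathbb{R}^n)\}.$$
Bouchitté and Buttazzo [75] proved that if $f,g:\mathbb{R}^n\to[0,+\infty]$ satisfy the assumptions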
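• $f$ is convex and lsc on $\mathbb{R}^n$, and $f(0) = 0$,
• there exist $\alpha > 0$ and $\beta\ge0$ such that $f(s)\ge\alpha|s| - \beta$ for all $s\in\mathbb{R}^n$,
• $g$ is subadditive and lsc on $\mathbb{R}^n$, and $g(0) = 0$,
• $g^0(s)\ge\alpha|s|$ for every $s\in\mathbb{R}^n$,
then the following integral representation holds for $\overline{F}$:
$$\overline{F}(\lambda) = \int_\Omega\tilde f\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{\Omega\setminus A_\lambda}(\tilde f)^\infty(\lambda^s) + \int_{A_\lambda}\tilde g\big(\lambda(x)\big)\,d\#,$$
where
$$\tilde f = f\,\#_e\,g^0,\qquad\tilde g = f^\infty\,\#_e\,g, \qquad (15.76)$$
and $\#_e$ denotes the inf-convolution (epi-sum), $(\varphi\,\#_e\,\psi)(s) = \inf\{\varphi(a) + \psi(s - a) : a\in\mathbb{R}^n\}$.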
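To see what the inf-convolution in (15.76) does, here is a small sketch for a hypothetical scalar pair chosen by us (not taken from the text): $f(s) = s^2$ and a jump cost with $g^0(s) = 2|s|$ (for instance $g(s) = \min\{2|s|,|s| + 1\}$). A direct computation gives $\tilde f(s) = s^2$ for $|s|\le1$ and $\tilde f(s) = 2|s| - 1$ for $|s| > 1$: relaxation replaces the quadratic growth of $f$ by the linear slope of $g^0$. Since here $f^\infty$ is the indicator of $\{0\}$, one also has $\tilde g = g$. The code approximates the infimum over $a$ by a minimum over a finite grid, which is an approximation we introduce.

```python
import numpy as np


def inf_convolution(phi, psi, s, grid):
    """(phi #_e psi)(s) = inf_a [phi(a) + psi(s - a)], approximated by a minimum over `grid`."""
    return float(np.min(phi(grid) + psi(s - grid)))


def f(a):
    return a**2                  # hypothetical bulk integrand


def g_zero(a):
    return 2.0 * np.abs(a)       # hypothetical g^0


grid = np.linspace(-10.0, 10.0, 200001)   # step 1e-4


def f_tilde(s):
    """Relaxed bulk integrand f #_e g^0, computed numerically."""
    return inf_convolution(f, g_zero, s, grid)


def f_tilde_exact(s):
    """Closed form of f #_e g^0 for this pair (a Huber-type function)."""
    return s * s if abs(s) <= 1.0 else 2.0 * abs(s) - 1.0


for s in (-3.0, -1.0, -0.4, 0.0, 0.7, 1.5, 4.0):
    print(s, f_tilde(s), f_tilde_exact(s))
```

Note also that $(\tilde f)^\infty(s) = 2|s| = g^0(s)$ in this example, which is the integrand that (15.76) places on the diffuse singular part.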
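To characterize the topological recession function for functionals of the form (15.74) we introduce the functional
$$G^\infty(\lambda) = \int_\Omega g^\infty(\lambda) = \int_\Omega g^\infty\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_\Omega g^\infty(\lambda^s).$$

Theorem 15.4.4. Under the assumptions of Theorem 15.4.3, we have
$$F_\infty(\lambda) = G^\infty(\lambda)\qquad\forall\lambda\in\mathcal{M}(\Omega;\mathbb{R}^n).$$

Proof. We first prove that $F_\infty(\lambda)\le G^\infty(\lambda)$ for every $\lambda\in\mathcal{M}(\Omega;\mathbb{R}^n)$. Let $\lambda\in\mathcal{M}(\Omega;\mathbb{R}^n)$ be a measure with a finite number of atoms. Then
$$F_\infty(\lambda)\le\liminf_{t\to+\infty}\bigg[\int_\Omega\frac1tf\Big(t\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{\Omega\setminus A_\lambda}f^\infty(\lambda^s) + \int_{A_\lambda}\frac{g\big(t\lambda(x)\big)}{t}\,d\#\bigg]$$
$$\le\int_\Omega f^\infty\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{\Omega\setminus A_\lambda}f^\infty(\lambda^s) + \int_{A_\lambda}g^\infty\big(\lambda(x)\big)\,d\#. \qquad (15.77)$$
Now let $\lambda$ be any measure in $\mathcal{M}(\Omega;\mathbb{R}^n)$. Setting $A^h_\lambda = \{x\in A_\lambda : |\lambda|(x) < 1/h\}$ and $\lambda_h = \lambda\cdot1_{\Omega\setminus A^h_\lambda}$, we get $\lambda_h\to\lambda$; moreover, since $A_{\lambda_h} = \{x\in A_\lambda : |\lambda|(x)\ge1/h\}$, each $\lambda_h$ has a finite number of atoms. From the weak$^*$ lower semicontinuity of $F_\infty$, taking (15.77) into account, we have
$$F_\infty(\lambda)\le\liminf_{h\to+\infty}F_\infty(\lambda_h)\le\liminf_{h\to+\infty}\bigg[\int_\Omega f^\infty\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{\Omega\setminus A_\lambda}f^\infty(\lambda^s) + \int_{A_\lambda\setminus A^h_\lambda}g^\infty\big(\lambda^s(x)\big)\,d\#\bigg]$$
$$\le\int_\Omega f^\infty\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{\Omega\setminus A_\lambda}\chi_{\{0\}}(\lambda^s) + \int_{A_\lambda}g^\infty\big(\lambda^s(x)\big)\,d\#. \qquad (15.78)$$
By computing the relaxation of the first and the last terms of (15.78), we get by (15.76)
$$F_\infty(\lambda)\le\int_\Omega(f^\infty\#_eg^\infty)\Big(\frac{d\lambda}{d\mu}\Big)\,d\mu + \int_{\Omega\setminus A_\lambda}(f^\infty\#_eg^\infty)(\lambda^s) + \int_{A_\lambda}(f^\infty\#_eg^\infty)\big(\lambda^s(x)\big)\,d\#$$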
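for every $\lambda\in\mathcal{M}(\Omega;\mathbb{R}^n)$. From (v) of Theorem 15.4.3 we deduce that $f^\infty\#_eg^\infty = g^\infty$ (indeed $g^\infty\le g^0 = f^\infty$ by Remark 15.4.2 and (v), and $g^\infty$ is convex and positively 1-homogeneous, hence subadditive), so that
$$F_\infty(\lambda)\le G^\infty(\lambda)\qquad\forall\lambda\in\mathcal{M}(\Omega;\mathbb{R}^n).$$
We now prove the opposite inequality. We claim that for every $\varepsilon > 0$ there exists $k_\varepsilon > 0$ such that
$$g^\infty(s)\le f(s) + \varepsilon|s| + k_\varepsilon\qquad\forall s\in\mathbb{R}^n. \qquad (15.79)$$
By contradiction, assume there exists $\varepsilon_0 > 0$ such that for every $k\in\mathbb{N}$ there exists $s_k\in\mathbb{R}^n$ with $g^\infty(s_k) > f(s_k) + \varepsilon_0|s_k| + k$. Setting $v_k = s_k/|s_k|$ and $t_k = |s_k|$, we have
$$g^\infty(v_k) > \frac{f(t_kv_k)}{t_k} + \varepsilon_0 + \frac{k}{t_k}. \qquad (15.80)$$
Since $|v_k| = 1$, it is not restrictive to assume $v_k\to v$ for some $v\in\mathbb{R}^n$. If $(t_k)$ is bounded we get, by using Proposition 15.4.1(i), $g(v)\ge g^\infty(v) = +\infty$, which is impossible since $g$ is finite. Therefore we can assume $t_k\to+\infty$. Passing to the limit in (15.80) yields, taking Proposition 15.4.1(i) into account again,
$$g^\infty(v)\ge f^\infty(v) + \varepsilon_0 = g^0(v) + \varepsilon_0 > g^\infty(v),$$
which is a contradiction. Then (15.79) holds, and from the weak$^*$ lower semicontinuity and the positive 1-homogeneity of $G^\infty$ we get
$$G^\infty(\lambda) = \inf\liminf_{h\to+\infty}\frac{G^\infty(t_h\lambda_h)}{t_h}$$
$$\le\inf\liminf_{h\to+\infty}\frac1{t_h}\bigg[\int_\Omega\Big(f\Big(t_h\frac{d\lambda_h}{d\mu}\Big) + \varepsilon t_h\Big|\frac{d\lambda_h}{d\mu}\Big| + k_\varepsilon\Big)\,d\mu + \int_{\Omega\setminus A_{\lambda_h}}f^\infty(t_h\lambda_h^s) + \int_{A_{\lambda_h}}g\big(t_h\lambda_h(x)\big)\,d\#\bigg]$$
$$\le\inf\liminf_{h\to+\infty}\Big[\frac{F(t_h\lambda_h)}{t_h} + \varepsilon\|\lambda_h\|\Big],$$
where the infimum is taken over all $t_h\to+\infty$ and all $\lambda_h\to\lambda$; on the singular parts we used $g^\infty\le g^0 = f^\infty$ and $g^\infty\le g$ (Remark 15.4.2), and the term $k_\varepsilon\mu(\Omega)/t_h$ vanishes as $t_h\to+\infty$. Taking $(t_h)$ and $(\lambda_h)$ such that
$$F_\infty(\lambda) = \liminf_{h\to+\infty}\frac{F(t_h\lambda_h)}{t_h},$$
we have
$$G^\infty(\lambda)\le F_\infty(\lambda) + \varepsilon\limsup_{h\to+\infty}\|\lambda_h\|\le F_\infty(\lambda) + \varepsilon C.$$
Letting $\varepsilon\to0$, we get
$$G^\infty(\lambda)\le F_\infty(\lambda),$$
and the proof is complete.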
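By virtue of Theorem 15.4.4 we can write
$$\gamma^* = \inf\{G^\infty(\lambda) : \langle H,\lambda\rangle = 1\},\qquad\gamma_* = -\inf\{G^\infty(\lambda) : \langle H,\lambda\rangle = -1\},$$
or equivalently
$$\gamma^* = \frac{1}{\sup\{\langle H,\lambda\rangle : G^\infty(\lambda) = 1\}},\qquad\gamma_* = \frac{1}{\inf\{\langle H,\lambda\rangle : G^\infty(\lambda) = 1\}}.$$
The last expressions allow us to compute $\gamma^*$ and $\gamma_*$ explicitly in terms of $g^\infty$ and $H$ only. Indeed, by the definition of the Fenchel transform, it is easy to see that
$$\frac{1}{\sup\{\langle H,\lambda\rangle : G^\infty(\lambda) = 1\}} = \sup\{t : (G^\infty)^*(tH) = 0\}.$$
Therefore, since
$$(G^\infty)^*(w) = \begin{cases}0 & \text{if } (g^\infty)^*(w)\equiv0,\\ +\infty & \text{otherwise,}\end{cases}$$
where $(g^\infty)^*(w)\equiv0$ means that $(g^\infty)^*\big(w(x)\big) = 0$ for every $x\in\Omega$, we obtain
$$\gamma^* = \sup\{t : (g^\infty)^*(tH)\equiv0\} = \Big(\sup_{x,s}\frac{H(x)\cdot s}{g^\infty(s)}\Big)^{-1}.$$
Analogously,
$$\gamma_* = \inf\{t : (g^\infty)^*(tH)\equiv0\} = \Big(\inf_{x,s}\frac{H(x)\cdot s}{g^\infty(s)}\Big)^{-1}.$$
For instance, if $g^\infty(s) = c|s|$, we get
$$\gamma^* = \frac{c}{\|H\|_{C_0(\Omega;\mathbb{R}^n)}},\qquad\gamma_* = -\frac{c}{\|H\|_{C_0(\Omega;\mathbb{R}^n)}}.$$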
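The formulas for $\gamma^*$ and $\gamma_*$ reduce to a finite-dimensional search over points $x\in\Omega$ and directions $s$. The sketch below evaluates them on grids, which is an approximation we introduce; since $g^\infty$ is positively 1-homogeneous, it suffices to let $s$ range over unit vectors. As hypothetical data we take $n = 2$, $\Omega = ]0,1[$, $H(x) = (\sin\pi x,\ \frac12\sin2\pi x)$, and two choices of $g^\infty$: the isotropic $c|s|$ (for which the text gives $\pm c/\|H\|_{C_0}$) and the anisotropic $|s_1| + 2|s_2|$, for which the supremum equals $\max_x\max\{|H_1(x)|, |H_2(x)|/2\} = 1$.

```python
import numpy as np


def gamma_bounds(H_vals, g_inf, n_dirs=3600):
    """Approximate (gamma_*, gamma^*) from
        gamma^* = (sup_{x,s} H(x).s / g_inf(s))^(-1),  gamma_* = (inf_{x,s} H(x).s / g_inf(s))^(-1).
    H_vals: array (m, 2) of samples H(x_j); g_inf: positively 1-homogeneous, positive off 0,
    evaluated row-wise on an array of directions of shape (k, 2)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    S = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # unit directions s
    ratios = (H_vals @ S.T) / g_inf(S)[None, :]             # H(x_j).s_k / g_inf(s_k)
    sup_r, inf_r = ratios.max(), ratios.min()
    gamma_upper = 1.0 / sup_r if sup_r > 0 else np.inf      # gamma^*
    gamma_lower = 1.0 / inf_r if inf_r < 0 else -np.inf     # gamma_*
    return gamma_lower, gamma_upper


x = np.linspace(0.0, 1.0, 1001)
H_vals = np.stack([np.sin(np.pi * x), 0.5 * np.sin(2.0 * np.pi * x)], axis=1)

c = 3.0
print(gamma_bounds(H_vals, lambda S: c * np.linalg.norm(S, axis=1)))
print(gamma_bounds(H_vals, lambda S: np.abs(S[:, 0]) + 2.0 * np.abs(S[:, 1])))
```

The same routine applies verbatim to the Neumann problem below, with $H$ replaced by $\varphi - H$.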
The result we obtained allows us to study the limit analysis problem for a class of nonconvex functionals defined on $BV$. More precisely, let $\Omega = ]a,b[$ be an open interval of $\mathbb{R}$, and assume that $f$ and $g$ satisfy all the hypotheses of Theorem 15.4.3. Consider the nonconvex functional $F:BV(\Omega;\mathbb{R}^n)\to[0,+\infty]$ defined by
$$F(u) = \int_\Omega f(\nabla u)\,dx + \int_{\Omega\setminus S_u}f^\infty(D^su) + \int_{S_u}g\big(D^su(x)\big)\,d\#(x), \qquad (15.81)$$
where $\nabla u$ and $D^su$ denote, respectively, the absolutely continuous and the singular parts of $Du$ with respect to the Lebesgue measure, and $S_u$ is the set of jumps of $u$, that is, the set of all points $x\in\Omega$ at which the left and right traces $u^+(x)$ and $u^-(x)$ do not coincide. Setting $\lambda = Du$, functionals of type (15.81) can be interpreted as functionals of type (15.74) on $\mathcal{M}(\Omega;\mathbb{R}^n)$.
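The Neumann problem. We deal with functionals $G$ defined on $BV(\Omega;\mathbb{R}^n)$ by
$$G(u) = \int_\Omega f(\nabla u)\,dx + \int_{\Omega\setminus S_u}f^\infty(D^su) + \int_{S_u}g\big(D^su(x)\big)\,d\#(x) - \gamma\langle L,u\rangle, \qquad (15.82)$$
where
$$\langle L,u\rangle = \int_\Omega hu\,dx + \int_\Omega\varphi\,Du$$
with $h\in L^1(\Omega;\mathbb{R}^n)$ and $\varphi\in C_0(\Omega;\mathbb{R}^n)$. It is easily verified that $\langle L,1\rangle = 0$ (that is, $\int_\Omega h\,dx = 0$) is a necessary condition for the functional (15.82) to attain a minimum: adding a constant vector to $u$ leaves $Du$ unchanged, so otherwise $G$ would be unbounded from below when $\gamma\ne0$. Therefore, setting
$$H(x) = \int_a^x h(y)\,dy,$$
we have $H\in C_0(\Omega;\mathbb{R}^n)$ (because $\int_\Omega h\,dx = 0$), and, integrating by parts,
$$\langle L,u\rangle = \langle\varphi - H, Du\rangle,\qquad\text{with } \varphi - H\in C_0(\Omega;\mathbb{R}^n).$$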
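Hence, by the limit analysis result above, a necessary condition for the existence of a minimizer of the functional $G$ in (15.82) is $\gamma_*\le\gamma\le\gamma^*$, whereas a sufficient condition is $\gamma_* < \gamma < \gamma^*$, with
$$\gamma_* = \Big(\inf_{x,s}\frac{\big(\varphi(x) - H(x)\big)\cdot s}{g^\infty(s)}\Big)^{-1},\qquad\gamma^* = \Big(\sup_{x,s}\frac{\big(\varphi(x) - H(x)\big)\cdot s}{g^\infty(s)}\Big)^{-1}.$$

The Dirichlet problem. To deal with the Dirichlet problem associated with functionals of the form (15.82), it is convenient to consider an open interval $\Omega_0$ containing $\overline\Omega$ and the space
$$BV_0 = \{u\in BV(\Omega_0;\mathbb{R}^n) : u = 0 \text{ on } \Omega_0\setminus\Omega\}.$$
Then, given $h\in L^1(\Omega;\mathbb{R}^n)$ and $\varphi\in C(\overline\Omega;\mathbb{R}^n)$, and denoting by $\tilde h\in L^1(\Omega_0;\mathbb{R}^n)$ and $\tilde\varphi\in C_0(\Omega_0;\mathbb{R}^n)$ some extensions of $h$ and $\varphi$ to $\Omega_0$, we may set for every $u\in BV_0$
$$\langle\tilde L,u\rangle = \int_\Omega hu\,dx + \int_{\overline\Omega}\varphi\,Du = \int_{\Omega_0}\tilde hu\,dx + \int_{\Omega_0}\tilde\varphi\,Du$$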
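and consider the problem
$$\min\bigg\{\int_{\Omega_0}f(\nabla u)\,dx + \int_{\Omega_0\setminus S_u}f^\infty(D^su) + \int_{S_u}g\big(D^su(x)\big)\,d\#(x) - \gamma\langle\tilde L,u\rangle : u\in BV_0\bigg\}, \qquad (15.83)$$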
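where $S_u$ now denotes the set of jumps of $u$ in $\Omega_0$. If $H\in C_0(\Omega_0;\mathbb{R}^n)$ is such that $H' = h$ a.e. on $\Omega$, we have
$$\langle\tilde L,u\rangle = \int_{\Omega_0}H'u\,dx + \int_{\Omega_0}\tilde\varphi\,Du = \int_{\Omega_0}(\tilde\varphi - H)\,Du$$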
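and problem (15.83) can be written as
$$\min\bigg\{\int_{\overline\Omega}f\Big(\frac{d\lambda}{dx}\Big)\,dx + \int_{\overline\Omega\setminus A_\lambda}f^\infty(\lambda^s) + \int_{A_\lambda}g\big(\lambda(x)\big)\,d\# - \gamma\langle\varphi - H,\lambda\rangle : \lambda\in\mathcal{M}(\overline\Omega;\mathbb{R}^n),\ \lambda(\overline\Omega) = 0\bigg\}.$$
Therefore, arguing as in the previous case, we obtain that a necessary (respectively, sufficient) condition for existence in the Dirichlet problem (15.83) is $\gamma_*\le\gamma\le\gamma^*$ (respectively, $\gamma_* < \gamma < \gamma^*$), with
$$\gamma_* = \bigg(\inf\Big\{\frac{\langle\varphi - H,\lambda\rangle}{\int_{\overline\Omega}g^\infty(\lambda)} : \lambda(\overline\Omega) = 0\Big\}\bigg)^{-1},\qquad\gamma^* = \bigg(\sup\Big\{\frac{\langle\varphi - H,\lambda\rangle}{\int_{\overline\Omega}g^\infty(\lambda)} : \lambda(\overline\Omega) = 0\Big\}\bigg)^{-1}.$$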
Remark 15.4.3. The Neumann and Dirichlet problems can be considered in the more general and interesting case of functions $u$ defined on a subset $\Omega$ of $\mathbb{R}^n$. The functional $F:BV(\Omega;\mathbb{R}^m)\to[0,+\infty]$ is then of the form (see Sections 10.3 and 10.4)
$$F(u) = \int_\Omega f(\nabla u)\,dx + \int_{\Omega\setminus S_u}f^\infty(D^su) + \int_{S_u}g([u],\nu_u)\,d\mathcal{H}^{n-1},$$
where $\mathcal{H}^{n-1}$ is the $(n-1)$-dimensional Hausdorff measure, $[u]$ is the jump of $u$ across $S_u$, and $\nu_u$ is the unit normal vector to $S_u$. In this case the associated limit analysis problems have not been studied, and even the study of general conditions on $f$ and $g$ that imply the lower semicontinuity of $F$ leaves some open questions.
Chapter 16
An introduction to shape optimization problems
In this chapter we give a quick introduction to shape optimization problems in a rather general framework, and we discuss some of their features, especially in relation to the existence of an optimal solution. Our goal is not to give a detailed presentation of the many problems and results in this very wide field, but only to show that several optimization problems, often very important in applications to mechanics and engineering, cannot be formulated by taking a Banach function space as the set of admissible choices: a more appropriate framework consists in taking as admissible controls the elements of a class of domains. We refer the reader interested in a deeper knowledge of this fascinating field to one of the several books on the subject [153], [9], [198], [209], to the notes by Tartar [218], or to the recent collection of lecture notes by Bucur and Buttazzo [95].

A shape optimization problem is a minimization problem in which the unknown variable runs over a class of domains; every shape optimization problem can therefore be written in the form
$$\min\{F(A) : A\in\mathcal{A}\}, \qquad (16.1)$$
where $\mathcal{A}$ is the class of admissible domains and $F$ is the cost function to be minimized over $\mathcal{A}$. Note that the class $\mathcal{A}$ of admissible domains has no linear or convex structure, so in shape optimization problems it is meaningless to speak of convex functionals and similar notions. Moreover, even if several topologies on families of domains are available, in general there is no a priori choice of a topology which allows us to apply the direct methods of the calculus of variations to obtain the existence of at least one optimal domain. We want to stress that, as also happens in other kinds of optimal control problems, in several situations an optimal domain does not exist; this is mainly because in these cases the minimizing sequences are highly oscillating and converge to a limit object only in a suitable "relaxed" sense.
Then we may have, in these cases, only the existence of a relaxed solution which in general is not a domain, and whose characterization may change from problem to problem.
A general procedure to relax optimal control problems can be successfully developed by using the Γ-convergence scheme, which provides the right topology to be used for sequences of admissible controls. In particular, for shape optimization problems, this provides the right notion of convergence for sequences of domains. Presenting in detail the abstract framework of relaxed optimal control problems through Γ-convergence would require us to develop several preliminary tools as background. This goes beyond our purposes, so we simply refer the interested reader to [64], where this framework was first introduced, or to [95], [101]. Coming back to the minimization problem (16.1), in general, unless some geometrical constraints on the admissible sets are assumed or some very special cost functionals are considered, the existence of an optimal domain may fail. In these situations the discussion will then focus on the relaxed solutions, which always exist. As usually happens in optimization problems, to give a qualitative description of the optimal solutions of a shape optimization problem it is important to derive the so-called necessary conditions of optimality. These conditions are obtained by comparing the cost of an optimal solution $A_{\rm opt}$ with the cost of other suitable admissible choices, close enough to $A_{\rm opt}$. This procedure is what is usually called a variation near the solution. The difficulty in obtaining necessary conditions of optimality for shape optimization problems lies in the fact that, since the unknown variables are domains, the notion of neighborhood is not a priori clear; the possible choices of domain variations can then be rather wide. The same method can be applied to relaxed solutions when no classical solution exists, and this will provide some qualitative information about the behavior of the minimizing sequences of the original problem.
Finally, for some particular problems presenting special behaviors or symmetries, one would like to exhibit explicit solutions (balls, ellipsoids, …). This can be very difficult, even for simple problems, and often, instead of established results, one can give only conjectures. In general, since explicit computations are difficult, one has to develop efficient numerical schemes to produce approximate solutions. This is a challenging field that we will not enter; we refer the interested reader to the books and papers available on the subject [9], [198], [209]. In the following examples we show that several classical optimization problems can be written in the form (16.1).
16.1 The isoperimetric problem

The isoperimetric problem is certainly the oldest shape optimization problem; it seems to go back to the golden period of Greek mathematics (Archimedes, Zenodorus, …), and a legend about Queen Dido shows that the question was clearly formulated long ago. The problem with constraint $Q$ (see, for instance, [59]) consists in finding, among all Borel subsets $A$ of a given closed set $Q\subset\mathbb{R}^N$, the one which minimizes the perimeter once its Lebesgue measure, or more generally the quantity $\int_Af(x)\,dx$ for a given function $f\in L^1_{\rm loc}(\mathbb{R}^N)$, is prescribed. With this notation the isoperimetric problem can then be formulated in the form (16.1) if we take
$$F(A) = \mathrm{Per}(A),$$
$$\mathcal{A} = \Big\{A\subset Q : \int_Af(x)\,dx = c\Big\}.$$
Here the perimeter of a Borel set is the one defined in Chapter 10 as
$$\mathrm{Per}(A) = \int|D1_A| = \mathcal{H}^{N-1}(\partial^*A),$$
where $D1_A$ is the distributional derivative of the characteristic function of $A$, $\mathcal{H}^{N-1}$ is the $(N-1)$-dimensional Hausdorff measure introduced in Section 4.1, and $\partial^*A$ is the reduced boundary defined in Section 10.3. By using the properties of $BV$ spaces seen in Chapter 10, when $Q$ is bounded we obtain the lower semicontinuity and the coercivity of the perimeter with respect to $L^1$ convergence, which enables us to apply the direct methods of the calculus of variations of Section 3.2 and to obtain straightforwardly the existence of an optimal solution for the problem
$$\min\Big\{\mathrm{Per}(A) : A\subset Q,\ \int_Af\,dx = c\Big\}. \qquad (16.2)$$
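It is also very simple to show that, in general, the problem above may have no solution if we drop the assumption that $Q$ is bounded (see, for instance, [95]). Take indeed $f\equiv1$, $c = \pi$, and $Q$ the countable union of all closed disks in $\mathbb{R}^2$ of the form $\overline{B(x_n,r_n)}$, where $x_n = (2n,0)$ and $r_n = 1 - 1/n$ (see Figure 16.1). It is then easy to see that the infimum of problem (16.2) is $2\pi$, whereas no admissible domain in $\mathcal{A}$ gives the value $2\pi$ to the cost functional (by the isoperimetric inequality in the plane, every set of area $\pi$ has perimeter at least $2\pi$, with equality only for disks of radius 1, and no component of $Q$ contains such a disk).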
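A sketch of why the infimum is $2\pi$: a natural minimizing sequence (our choice of competitors, not necessarily the one the text has in mind) is $A_n = \overline{B(x_n,r_n)}\cup\overline{B(y,\rho_n)}$, where the second disk has radius $\rho_n = \sqrt{1 - r_n^2}$, so that $|A_n| = \pi$, and is placed inside another component $\overline{B(x_m,r_m)}$ of $Q$ that is large enough to contain it. Its perimeter is $2\pi(r_n + \rho_n)$, which stays strictly above $2\pi$ and decreases to $2\pi$, slowly (to leading order the excess behaves like $2\pi\sqrt{2/n}$).

```python
import itertools
import math


def r(n):
    """Radius of the n-th closed disk of Q, centred at (2n, 0); these disks are pairwise disjoint."""
    return 1.0 - 1.0 / n


def competitor(n):
    """Return (perimeter, host index m) for A_n = B(x_n, r_n) plus a disk of radius rho_n in B(x_m, r_m)."""
    rho = math.sqrt(1.0 - r(n) ** 2)  # pi r_n^2 + pi rho^2 = pi, i.e. the constraint c = pi with f = 1
    m = next(k for k in itertools.count(1) if k != n and r(k) >= rho)  # a component that can host it
    return 2.0 * math.pi * (r(n) + rho), m


for n in (4, 16, 256, 10**6):
    perimeter, m = competitor(n)
    print(n, m, perimeter - 2.0 * math.pi)  # positive, decreasing to 0
```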
Figure 16.1. An unbounded set Q.

On the other hand, it is very well known that the classical isoperimetric problem, with $Q = \mathbb{R}^N$ and $f\equiv1$, admits a solution, namely any ball of measure $c$, even if the complete proof of this fact requires very delicate tools, especially when the dimension $N$ is larger than 2. A complete characterization of the pairs $(Q,f)$ which provide the existence of a solution for problem (16.2) seems to be difficult. A variant of the isoperimetric problem consists in not counting some parts of the boundary $\partial A$ in the cost functional. More precisely, if $Q$ is the closure of an open set $\Omega$ with a Lipschitz boundary, we may consider problem (16.2) with $\mathrm{Per}(A)$ replaced by the cost functional
$$\mathrm{Per}_\Omega(A) = \int_\Omega|D1_A| = \mathcal{H}^{N-1}(\Omega\cap\partial^*A),$$
which does not count the part of $\partial A$ that is included in $\partial\Omega$. The existence of a solution when $\Omega$ is bounded still holds, as above, together with nonexistence examples when this
boundedness condition is dropped. Indeed, it is enough to take $f$, $c$, $Q$ as above and to observe that the infimum of problem (16.2), with $\mathrm{Per}_\Omega$ in place of $\mathrm{Per}$, is in this case zero, whereas no admissible domain gives the value zero to the cost functional.
16.2 The Newton problem

Another classical question which can be considered as a shape optimization problem is the determination of the best aerodynamical profile for a body in a fluid stream under some constraints on its size. This problem, at least within the class of radially symmetric bodies, which makes it one-dimensional, was first considered by Newton, who gave a rather simple variational expression for the aerodynamical resistance of a convex body in a fluid stream. Here are his words (from Principia Mathematica):

If in a rare medium, consisting of equal particles freely disposed at equal distances from each other, a globe and a cylinder described on equal diameter move with equal velocities in the direction of the axis of the cylinder, (then) the resistance of the globe will be half as great as that of the cylinder.… I reckon that this proposition will be not without application in the building of ships.

Indeed, if we assume that the resistance is due to the impact of fluid particles against the body surface, that all the particles are independent (which is quite reasonable if the fluid is rarefied), and that tangential friction is negligible, simple geometric considerations lead to the following expression for the resistance along the direction of the fluid stream:
$$R(u) = \int_\Omega\frac{1}{1 + |Du|^2}\,dx, \qquad (16.3)$$
where we normalized to one all the physical multiplicative constants involving the density and the velocity of the fluid. Here $\Omega$ represents the cross section of the body at the basis level, and $u(x)$ is a function whose graph is the upper boundary of the body. Since the validity of the model requires that all particles hit the body at most once, we consider only convex bodies, which amounts to requiring $\Omega$ convex and $u:\Omega\to[0,+\infty[$ concave. Note that the integral functional $R$ above is neither convex nor coercive. Therefore, obtaining an existence theorem for minimizers via the usual direct method of the calculus of variations may fail.
Indeed, if we do not impose any further constraint on the competing functions $u$, the infimum of the functional in (16.3) turns out to be zero, as is immediately seen by taking, for instance, $u_n(x) = n\,\mathrm{dist}(x,\partial\Omega)$ for every $n\in\mathbb{N}$ (so that $|Du_n| = n$ a.e. and $R(u_n) = |\Omega|/(1 + n^2)$) and letting $n\to+\infty$. Therefore, no function $u$ can minimize the functional $R$, because $R(u) > 0$ for every function $u$. A complete discussion of the problem can be found in [95], where all the relevant references are quoted. Here we simply recall that the problem
$$\min\Big\{\int_\Omega\frac{1}{1 + |Du|^2}\,dx : u\text{ concave},\ 0\le u\le M\Big\}$$
admits a solution $u_{\rm opt}$. Some interesting necessary conditions of optimality can be deduced: for instance (see [166]), it can be proved that on every open set $\omega$ where $u_{\rm opt}$ is of class $C^2$
we obtain
$$\det D^2u_{\rm opt}(x) = 0\qquad\forall x\in\omega.$$
In particular, this excludes that in the case $\Omega = B(0,R)$ the solution $u_{\rm opt}$ is radially symmetric. The optimal radially symmetric profile and a nonsymmetric profile which is better than all the radial ones are shown, respectively, in Figures 16.2 and 16.3.
Figure 16.2. The optimal radial shape for M = R.

It is interesting to notice that, with simple calculations, one can write the optimization problem above in the form (16.1) by taking the cost functional as a boundary integral,
$$F(A) = \int_{\partial A}j\big(x,\nu(x)\big)\,d\mathcal{H}^{N-1},$$
for a suitable integrand $j(x,s)$, where $\nu(x)$ is the exterior unit normal vector to $\partial A$ at $x$ and $\mathcal{H}^{N-1}$ is the $(N-1)$-dimensional Hausdorff measure (see [103]).
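For radial profiles $u(x) = v(|x|)$ on $\Omega = B(0,R)$, (16.3) reduces to the one-dimensional integral $R(u) = 2\pi\int_0^R\frac{r}{1 + v'(r)^2}\,dr$. The sketch below evaluates it with a midpoint rule (a discretization we chose) for three profiles: the flat front face of a cylinder ($v' = 0$), a hemisphere ($v = \sqrt{R^2 - r^2}$), and the cones $v_n = n(R - r)$, which are the functions $n\,\mathrm{dist}(x,\partial\Omega)$ used above. It recovers Newton's statement that the globe meets half the resistance of the cylinder, and shows $R(u_n) = \pi R^2/(1 + n^2)\to0$.

```python
import math


def radial_resistance(dv, R=1.0, m=100_000):
    """Midpoint-rule approximation of (16.3) for u(x) = v(|x|) on B(0, R):
    R(u) = 2*pi * int_0^R r / (1 + v'(r)^2) dr, where dv(r) = v'(r)."""
    h = R / m
    total = 0.0
    for k in range(m):
        r = (k + 0.5) * h
        total += r / (1.0 + dv(r) ** 2)
    return 2.0 * math.pi * total * h


R = 1.0
cylinder = radial_resistance(lambda r: 0.0, R)                         # flat front face
globe = radial_resistance(lambda r: -r / math.sqrt(R * R - r * r), R)  # hemisphere
print(cylinder, globe, globe / cylinder)                               # ratio close to 1/2

for n in (1, 10, 100):
    cone = radial_resistance(lambda r: -float(n), R)                   # u_n = n dist(x, boundary)
    print(n, cone, math.pi * R * R / (1 + n * n))
```

The hemisphere value can be checked by hand: the integrand simplifies to $r(R^2 - r^2)/R^2$, whose integral is $R^2/4$, giving $\pi R^2/2$.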
16.3 Optimal Dirichlet free boundary problems
We now consider the model example of a Dirichlet problem over an unknown domain, which has to be optimized according to a given cost functional. More precisely, we consider a given bounded open subset $\Omega$ of $\mathbb{R}^N$, an admissible class $\mathcal{A}$ of subsets of $\Omega$, a given function
$f\in L^2(\Omega)$, and a cost functional of the form
$$F(A) = \int_\Omega j(x,u_A)\,dx. \qquad (16.4)$$

Figure 16.3. A nonradial profile better than all radial ones for M = 2R.
Here the integrand $j:\Omega\times\mathbb{R}\to\mathbb{R}$ is given, and we denote by $u_A$ the unique solution of the elliptic problem
$$-\Delta u = f\ \text{ in } A,\qquad u\in H^1_0(A), \qquad (16.5)$$
extended by zero to $\Omega\setminus A$. It is well known that in general one should not expect the existence of an optimal solution; below we show an example in which an optimal domain does not exist. The problem we consider is of the form (16.1), where the admissible class $\mathcal{A}$ consists of all subdomains of a given bounded open subset $\Omega$ of $\mathbb{R}^N$ and the cost functional $F$ is of the form (16.4) with $j(x,s) = |s - \bar u(x)|^2$ for a prescribed desired state $\bar u$. In the thermostatic model, the shape optimization problem (16.1) with the choices above consists in finding an optimal distribution, inside $\Omega$, of the
i
i i
i
i
i
i
16.3. Optimal Dirichlet free boundary problems
“abmb 2005/1 page 6 i
607
Dirichlet region \ A to achieve a temperature which is as close as possible to the desired temperature u, once the heat sources f are prescribed. For simplicity, we consider a uniformly distributed heat source, that is, we take f ≡ 1, and we take the desired temperature u constantly equal to c > 0. Therefore, problem (16.1) becomes 2 1 min |uA − c| dx : −uA = 1 in A, uA ∈ H0 (A) . (16.6)
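We will actually show that for small values of the constant $c$ no regular domain $A$ can solve problem (16.6); proving that no domain at all is optimal is slightly more delicate and requires additional tools, such as the capacitary form of the necessary conditions of optimality (see, for instance, [95], [98], [99], [109]).

Proposition 16.3.1. If $c > 0$ is small enough, then problem (16.6) has no smooth solutions.

Proof. We argue by contradiction. Assume that a regular domain $A$ solves the optimization problem (16.6) and that $A$ does not coincide with the whole set $\Omega$. Take a point $x_0 \in \Omega \setminus A$ and a ball $B_\varepsilon$ centered at $x_0$ whose radius $\varepsilon$ is so small that $B_\varepsilon$ does not intersect $A$. If $u_A$ denotes the solution of (16.5) corresponding to $A$, then the solution $u_{A\cup B_\varepsilon}$ corresponding to the admissible choice $A \cup B_\varepsilon$ is given by
$$u_{A\cup B_\varepsilon}(x) = \begin{cases} u_A(x) & \text{if } x \in A,\\[2pt] \dfrac{\varepsilon^2 - |x-x_0|^2}{2N} & \text{if } x \in B_\varepsilon,\\[2pt] 0 & \text{otherwise,}\end{cases}$$
since $-\Delta\big((\varepsilon^2-|x-x_0|^2)/(2N)\big) = 1$ (for $N = 2$ the middle expression is $(\varepsilon^2-|x-x_0|^2)/4$). Therefore we obtain
$$F(A) = \int_A |u_A - c|^2\,dx + \int_{B_\varepsilon} c^2\,dx + \int_{\Omega\setminus(A\cup B_\varepsilon)} c^2\,dx,$$
$$F(A\cup B_\varepsilon) = \int_A |u_A - c|^2\,dx + \int_{B_\varepsilon}\Big(\frac{\varepsilon^2-|x-x_0|^2}{2N} - c\Big)^2 dx + \int_{\Omega\setminus(A\cup B_\varepsilon)} c^2\,dx.$$
By the minimality of $A$ this yields
$$\begin{aligned} c^2 \operatorname{meas}(B_\varepsilon) &\le \int_{B_\varepsilon}\Big(\frac{\varepsilon^2-|x-x_0|^2}{2N} - c\Big)^2 dx = N\varepsilon^{-N}\operatorname{meas}(B_\varepsilon)\int_0^\varepsilon\Big(\frac{\varepsilon^2-r^2}{2N} - c\Big)^2 r^{N-1}\,dr\\ &= c^2\operatorname{meas}(B_\varepsilon) + \frac{\varepsilon^{-N}\operatorname{meas}(B_\varepsilon)}{4N}\int_0^\varepsilon(\varepsilon^2-r^2)(\varepsilon^2-r^2-4Nc)\,r^{N-1}\,dr,\end{aligned}$$
that is, $\int_0^\varepsilon(\varepsilon^2-r^2)(\varepsilon^2-r^2-4Nc)\,r^{N-1}\,dr \ge 0$. For a fixed $c > 0$ this is false as soon as $\varepsilon^2 < 4Nc$, because the integrand is then negative on $(0,\varepsilon)$. Thus all regular domains $A \neq \Omega$ are ruled out by the argument above. If $c$ is small enough, we can now also exclude the case $A = \Omega$ by comparing the full domain with the empty set $\emptyset$. Taking into account that $u_\emptyset \equiv 0$, this gives
$$F(\Omega) = \int_\Omega |u_\Omega - c|^2\,dx,$$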
$$F(\emptyset) = \int_\Omega c^2\,dx = c^2\operatorname{meas}(\Omega),$$
so that $F(\emptyset) < F(\Omega)$ if $c$ is small enough: as $c \to 0$ we have $F(\emptyset) \to 0$, while $F(\Omega) \to \int_\Omega u_\Omega^2\,dx > 0$. Hence all regular subdomains of $\Omega$ are excluded, and the nonexistence proof is complete.

Nevertheless, an optimal domain does exist for problem (16.1) in some particular cases: (i) when severe geometrical constraints are imposed on the class of admissible domains; (ii) when the cost functional fulfills some particular qualitative assumptions; (iii) when the problem is of a very special type, involving, for instance, only the eigenvalues of the Laplace operator, in which case neither geometrical constraints nor monotonicity of the cost are required. See the lecture notes of Bucur and Buttazzo [95] for a more complete discussion of this topic; here, for the sake of simplicity, we list only some situations in which conditions (i), (ii), and (iii) occur.

Concerning case (i), a rather simple situation in which an optimal solution exists is when the class of admissible domains fulfills the geometrical constraint called the exterior cone condition. It consists in requiring that there exist a fixed height $h$ and opening $\omega$ such that, for every domain $A$ of the admissible class $\mathcal{A}$ and every point $x_0 \in \partial A$, a cone with height $h$, opening $\omega$, and vertex at $x_0$ is contained in $\Omega \setminus A$. This condition is weaker than an equi-Lipschitz condition on the class of admissible domains; conditions weaker than the exterior cone condition which still imply the existence of an optimal domain can be given in terms of capacity (see [95]).

Concerning case (ii), we consider the problem
$$\min\big\{F(A) : A \in \mathcal{A},\ \operatorname{meas}(A) \le m\big\}, \tag{16.7}$$
where the volume constraint $\operatorname{meas}(A) \le m$ has been added. The cost functional $F$ is assumed to be monotone nonincreasing with respect to set inclusion, that is,
$$A_1 \subset A_2 \implies F(A_2) \le F(A_1).$$
Moreover, $F$ is assumed to be lower semicontinuous with respect to the $\gamma$-convergence on the class of domains, defined by
$$A_n \to A \text{ in the } \gamma\text{-convergence} \iff u_{A_n} \to u_A \text{ weakly in } H^1_0(\Omega),$$
where $u_{A_n}$ and $u_A$ are the solutions of (16.5) in $A_n$ and $A$, respectively, with right-hand side $f \equiv 1$. Under the assumptions above, Buttazzo and Dal Maso [100] showed that problem (16.7) admits an optimal solution. It is important to stress that several optimal shape problems can be written in the form (16.7) with a cost functional $F$ which is nonincreasing with respect to set inclusion and $\gamma$-lower semicontinuous. For instance, if $L$ denotes a second-order elliptic operator of the form $Lu = -\operatorname{div}\big(a(x)Du\big)$
with the $N\times N$ matrix $a(x)$ symmetric, uniformly elliptic, and with bounded measurable coefficients, we may consider the spectrum $\Lambda(A)$ of $L$ associated with Dirichlet boundary conditions on $A$:
$$Lu = \lambda u, \qquad u \in H^1_0(A).$$
It is known that $\Lambda(A)$ is given by a nonnegative sequence $\lambda_k(A)$ which tends to $+\infty$, that every $\lambda_k(A)$ is a nonincreasing set function with respect to $A$, and that the mappings $A \mapsto \lambda_k(A)$ are $\gamma$-continuous. Therefore the existence result above applies, and the problem
$$\min\big\{\Phi\big(\Lambda(A)\big) : A \in \mathcal{A},\ \operatorname{meas}(A) \le m\big\}$$
admits an optimal solution whenever the mapping $\Phi : \mathbb{R}^{\mathbb{N}} \to [0,+\infty]$ is
• nondecreasing, in the sense that $\Lambda^1_k \le \Lambda^2_k$ for all $k$ implies $\Phi(\Lambda^1) \le \Phi(\Lambda^2)$;
• lower semicontinuous, in the sense that $\Lambda^n_k \to \Lambda_k$ for all $k$ implies $\Phi(\Lambda) \le \liminf_{n\to+\infty}\Phi(\Lambda^n)$.
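The monotonicity of $\lambda_k(A)$ with respect to set inclusion can be observed numerically in a simple case. The following Python sketch is only an illustration under our own assumptions, not a method from the text: it takes $N = 1$, $Lu = -(a(x)u')'$ on intervals $A = (0, \ell)$, a three-point finite-difference discretization on a fixed grid, and eigenvalues computed by bisection on a Sturm-sequence count. The function names `dirichlet_matrix`, `count_below`, and `kth_eigenvalue` are hypothetical. On a shared grid, the matrix for a shorter interval is a leading principal submatrix of the one for a longer interval, so by Cauchy interlacing the discrete eigenvalues inherit the inequality $\lambda_k(A_2) \le \lambda_k(A_1)$ for $A_1 \subset A_2$.

```python
import math

def dirichlet_matrix(a, ell, h):
    """Three-point finite-difference matrix of L u = -(a u')' on (0, ell)
    with u(0) = u(ell) = 0; a is sampled at the cell midpoints.
    Returns (diag, off) of the symmetric tridiagonal matrix."""
    m = round(ell / h) - 1  # number of interior nodes x_i = i*h, i = 1..m
    diag = [(a((i - 0.5) * h) + a((i + 0.5) * h)) / h**2 for i in range(1, m + 1)]
    off = [-a((i + 0.5) * h) / h**2 for i in range(1, m)]
    return diag, off

def count_below(diag, off, x):
    """Sturm count: number of eigenvalues strictly less than x
    (negative pivots in the LDL^T factorization of T - x I)."""
    count, q = 0, 1.0
    for i, d in enumerate(diag):
        q = d - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-300  # perturb an exactly zero pivot
        if q < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, rel_tol=1e-12):
    """k-th smallest eigenvalue (k = 1, 2, ...) by bisection on the Sturm
    count, starting from the Gershgorin bounds."""
    n = len(diag)
    radius = [(abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(d - r for d, r in zip(diag, radius))
    hi = max(d + r for d, r in zip(diag, radius))
    while hi - lo > rel_tol * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) >= k:
            hi = mid  # at least k eigenvalues below mid, so lambda_k < mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Constant coefficient on (0, 1): lambda_1 should be close to pi^2.
    print(kth_eigenvalue(*dirichlet_matrix(lambda x: 1.0, 1.0, 0.01), 1), math.pi ** 2)
    a = lambda x: 1.0 + x                   # smooth, uniformly elliptic coefficient
    small = dirichlet_matrix(a, 0.5, 0.01)  # A1 = (0, 1/2)
    large = dirichlet_matrix(a, 1.0, 0.01)  # A2 = (0, 1), A1 subset of A2
    for k in (1, 2, 3):
        print(k, kth_eigenvalue(*large, k), "<=", kth_eigenvalue(*small, k))
```

Bisection on the Sturm count is used here only because it needs nothing beyond the standard library; any symmetric tridiagonal eigensolver would serve the same purpose.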
In particular, for a fixed integer $k$, all cost functionals of the form
$$F(A) = \varphi\big(\lambda_1(A), \dots, \lambda_k(A)\big)$$
with a mapping $\varphi : \mathbb{R}^k \to \mathbb{R}$ continuous and nondecreasing in each variable satisfy the assumptions above. Concerning case (iii), we may still prove the existence of an optimal domain for problem (16.7) for some special forms of the cost functional. The case considered in [96] (see also [95]) is
$$F(A) = \varphi\big(\lambda_1(A), \lambda_2(A)\big),$$
where $\lambda_1(A)$ and $\lambda_2(A)$ are the first two eigenvalues of the Laplace operator $-\Delta$ on $H^1_0(A)$. Owing to the special form of the cost functional and to the fact that the operator is $-\Delta$, it is possible to show that an optimal solution exists without any monotonicity assumption on $\varphi$, requiring only that $\varphi$ be a lower semicontinuous function on $\mathbb{R}^2$.
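The two estimates in the proof of Proposition 16.3.1 can be checked numerically. The sketch below is our own illustration under stated assumptions. `radial_gap` evaluates $\int_0^\varepsilon(\varepsilon^2-r^2)(\varepsilon^2-r^2-4Nc)\,r^{N-1}\,dr$, where the ball profile is taken as $(\varepsilon^2-|x-x_0|^2)/(2N)$, which solves $-\Delta u = 1$ in any dimension and equals $(\varepsilon^2-|x-x_0|^2)/4$ for $N = 2$; with this profile, $F(A\cup B_\varepsilon) - F(A)$ is a positive multiple of that integral. `full_vs_empty` compares $F(\Omega)$ and $F(\emptyset)$ in the hypothetical one-dimensional case $\Omega = (0,1)$, where $u_\Omega(x) = x(1-x)/2$. Both use a plain composite Simpson rule, and all function names are hypothetical. In this 1D case $F(\Omega) = 1/120 - c/6 + c^2$ and $F(\emptyset) = c^2$, so the empty set beats the full domain exactly when $c < 1/20$.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def radial_gap(eps, c, N):
    """I(eps) = int_0^eps (eps^2 - r^2)(eps^2 - r^2 - 4Nc) r^(N-1) dr.
    With the ball profile (eps^2 - |x - x0|^2)/(2N), F(A u B_eps) - F(A)
    is a positive multiple of I(eps), so minimality of A needs I(eps) >= 0."""
    integrand = lambda r: (eps**2 - r**2) * (eps**2 - r**2 - 4 * N * c) * r ** (N - 1)
    return simpson(integrand, 0.0, eps)

def full_vs_empty(c):
    """1D illustration: Omega = (0, 1), f = 1, u_Omega(x) = x(1 - x)/2.
    Returns (F(Omega), F(empty)) for the cost int_Omega |u_A - c|^2 dx."""
    f_full = simpson(lambda x: (x * (1 - x) / 2 - c) ** 2, 0.0, 1.0)
    f_empty = c**2  # u_empty = 0 and meas(Omega) = 1
    return f_full, f_empty

if __name__ == "__main__":
    c = 0.01
    for eps in (0.5, 0.1, 0.05):
        print(eps, radial_gap(eps, c, 2))  # negative whenever eps^2 < 8c (here 4Nc = 8c)
    print(full_vs_empty(c))                # F(empty) < F(Omega) since c < 1/20
```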
16.4 Optimal distribution of two conductors
Another interesting case of a shape optimization problem is the optimal distribution of two given conductors in a given set. If $\Omega$ denotes a bounded open subset of $\mathbb{R}^N$ (the prescribed container) and $\alpha$ and $\beta$ denote the conductivities of the two materials, the problem consists in filling $\Omega$ with the two materials in the best performing way according to some given cost functional. The volume of each material can also be prescribed. It is convenient to denote by $A$ the domain where the conductivity is $\alpha$ and by $a_A(x)$ the conductivity coefficient
$$a_A(x) = \alpha 1_A(x) + \beta 1_{\Omega\setminus A}(x).$$
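To see how the coefficient $a_A$ acts in the corresponding state problem $-\operatorname{div}(a_A(x)Du) = f$ in $\Omega$, $u = 0$ on $\partial\Omega$, here is a one-dimensional finite-difference sketch. It is our own illustration, not the authors' method: it assumes $\Omega = (0,1)$ and $A = (x_1, x_2)$, samples $a_A$ at cell midpoints so that the discrete flux $a_A u'$ is conserved across the interface, and solves the tridiagonal system with the Thomas algorithm. The name `solve_two_phase` and its parameters are hypothetical.

```python
def solve_two_phase(alpha, beta, x1, x2, n=200, f=lambda x: 1.0):
    """Sketch: solve -(a_A u')' = f on (0, 1), u(0) = u(1) = 0, where
    a_A = alpha on A = (x1, x2) and beta on (0, 1) minus A.
    Returns the nodal values u(k/n), k = 0..n (boundary zeros included)."""
    h = 1.0 / n

    def a(x):  # a_A(x) = alpha * 1_A(x) + beta * 1_{Omega \ A}(x)
        return alpha if x1 < x < x2 else beta

    am = [a((i + 0.5) * h) for i in range(n)]          # a_A at cell midpoints
    m = n - 1                                          # interior unknowns
    lower = [-am[i] / h**2 for i in range(1, m)]       # sub-diagonal
    diag = [(am[i] + am[i + 1]) / h**2 for i in range(m)]
    upper = [-am[i + 1] / h**2 for i in range(m - 1)]  # super-diagonal
    rhs = [f((i + 1) * h) for i in range(m)]
    # Thomas algorithm: forward elimination ...
    for i in range(1, m):
        w = lower[i - 1] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # ... and back substitution
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]

if __name__ == "__main__":
    n = 200
    h = 1.0 / n
    u = solve_two_phase(2.0, 1.0, 0.25, 0.75, n)
    # Discrete flux a_A (u_{i+1} - u_i)/h on each cell; it drops by h*f = h per cell.
    flux = [(2.0 if 0.25 < (i + 0.5) * h < 0.75 else 1.0) * (u[i + 1] - u[i]) / h
            for i in range(n)]
    print(max(abs(flux[i + 1] - flux[i] + h) for i in range(n - 1)))  # round-off level
    print(h * sum(u), "<=", 1.0 / 12)  # compliance vs. the homogeneous case a = 1
```

Sampling the coefficient at midpoints is a design choice: with interfaces placed at grid nodes, the scheme reproduces the continuity of the flux $a_A u'$ across the jumps of $a_A$ exactly.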
In this way the state equation becomes
$$\begin{cases} -\operatorname{div}\big(a_A(x)Du\big) = f & \text{in } \Omega,\\ u = 0 & \text{on } \partial\Omega,\end{cases} \tag{16.8}$$
where $f$ is the (given) source density, and we denote by $u_A$ its unique solution. It is well known (see, for instance, [164], [192]) that if we take as a cost functional an integral of the form
$$\int_\Omega j(x, 1_A, u_A, Du_A)\,dx,$$
in general an optimal configuration does not exist. However, adding a perimeter penalization is enough to ensure the existence of classical optimizers. More precisely, if we take as a cost the functional
$$J(u, A) = \int_\Omega j(x, 1_A, u, Du)\,dx + \sigma \operatorname{Per}(A),$$
where $\sigma > 0$, the problem can be written as an optimal control problem in the form
$$\min\big\{J(u, A) : A \subset \Omega,\ u \text{ solves } (16.8)\big\}. \tag{16.9}$$
A volume constraint of the form $\operatorname{meas}(A) = m$ could also be present. The main ingredient in the proof of the existence of an optimal classical solution is the following result.

Theorem 16.4.1. Let $a_n(x)$ be a sequence of $N\times N$ symmetric matrices with measurable coefficients such that the uniform ellipticity condition
$$c_0|z|^2 \le a_n(x)z\cdot z \le c_1|z|^2 \qquad \forall x \in \Omega,\ \forall z \in \mathbb{R}^N \tag{16.10}$$
holds with $0 < c_0 \le c_1$. Given $f \in H^{-1}(\Omega)$, we denote by $u_n$ the unique solution of the problem
$$-\operatorname{div}\big(a_n(x)Du_n\big) = f, \qquad u_n \in H^1_0(\Omega). \tag{16.11}$$
If $a_n(x) \to a(x)$ a.e. in $\Omega$, then $u_n \to u$ weakly in $H^1_0(\Omega)$, where $u$ is the solution of (16.11) with $a_n$ replaced by $a$.

Proof. By the uniform ellipticity condition (16.10) we have
$$c_0\int_\Omega |Du_n|^2\,dx \le \int_\Omega a_n(x)Du_n\cdot Du_n\,dx = \langle f, u_n\rangle,$$
and, by the Poincaré inequality, the functions $u_n$ are bounded in $H^1_0(\Omega)$, so that a subsequence (still denoted by the same indices) converges weakly in $H^1_0(\Omega)$ to some $v$. All we have to show is that $v = u$, or equivalently that
$$-\operatorname{div}\big(a(x)Dv\big) = f. \tag{16.12}$$
This means that for every smooth test function $\varphi$ we have
$$\int_\Omega a(x)Dv\cdot D\varphi\,dx = \langle f, \varphi\rangle.$$
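Since $\int_\Omega a_n(x)Du_n\cdot D\varphi\,dx = \langle f, \varphi\rangle$ for every $n$, it is then enough to show that for every smooth test function $\varphi$ we have
$$\lim_{n\to+\infty}\int_\Omega a_n(x)Du_n\cdot D\varphi\,dx = \int_\Omega a(x)Dv\cdot D\varphi\,dx.$$
This is an immediate consequence of the facts that $\varphi$ is smooth, $Du_n \to Dv$ weakly in $L^2(\Omega)$, and $a_n \to a$ a.e. in $\Omega$ while remaining bounded (so that $a_n D\varphi \to a D\varphi$ strongly in $L^2(\Omega)$ by dominated convergence). Another way to show that (16.12) holds is to verify that $v$ minimizes the functional
$$F(w) = \int_\Omega a(x)Dw\cdot Dw\,dx - 2\langle f, w\rangle, \qquad w \in H^1_0(\Omega). \tag{16.13}$$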
Since the function $\alpha(s, z) = sz\cdot z$, defined for all $z \in \mathbb{R}^N$ and all symmetric positive definite $N\times N$ matrices $s$, is convex in $z$ and lower semicontinuous in $s$, the functional
$$\Psi(a, \xi) = \int_\Omega a(x)\xi\cdot\xi\,dx$$
is sequentially lower semicontinuous with respect to the strong $L^1$ convergence in $a$ and the weak $L^1$ convergence in $\xi$ (see, for instance, Theorem 13.1.1 and [97]). Therefore we have
$$F(v) = \Psi(a, Dv) - 2\langle f, v\rangle \le \liminf_{n\to+\infty}\big(\Psi(a_n, Du_n) - 2\langle f, u_n\rangle\big) = \liminf_{n\to+\infty} F_n(u_n).$$
Since $u_n$ minimizes the functional $F_n$ defined as in (16.13) with $a$ replaced by $a_n$, we also have, for every $w \in H^1_0(\Omega)$,
$$F_n(u_n) \le F_n(w) = \int_\Omega a_n(x)Dw\cdot Dw\,dx - 2\langle f, w\rangle,$$
so that, taking the limit as $n \to +\infty$ and using the convergence $a_n \to a$, we obtain
$$\liminf_{n\to+\infty} F_n(u_n) \le \int_\Omega a(x)Dw\cdot Dw\,dx - 2\langle f, w\rangle = F(w).$$
Thus $F(v) \le F(w)$, which is what was required.

Remark 16.4.1. The result above can be rephrased in terms of G-convergence by saying that, for uniformly elliptic operators of the form $-\operatorname{div}\big(a(x)Du\big)$, G-convergence is weaker than $L^1$-convergence of the coefficients (that is, $L^1$-convergence of the coefficients implies G-convergence of the operators). Analogously, we can say that the functionals
$$G_n(w) = \int_\Omega a_n(x)Dw\cdot Dw\,dx$$
$\Gamma$-converge, with respect to the $L^2(\Omega)$ convergence, to the functional $G$ defined in the same way with $a$ in place of $a_n$.

Corollary 16.4.1. If $A_n \to A$ in $L^1(\Omega)$ (that is, $1_{A_n} \to 1_A$ in $L^1(\Omega)$), then $u_{A_n} \to u_A$ weakly in $H^1_0(\Omega)$.

A more careful inspection of the proof of Theorem 16.4.1 shows that the following stronger result holds.
Theorem 16.4.2. Under the same assumptions as in Theorem 16.4.1, the convergence of $u_n$ is actually strong in $H^1_0(\Omega)$.

Proof. We have already seen that $u_n \to u$ weakly in $H^1_0(\Omega)$, which gives $Du_n \to Du$ weakly in $L^2(\Omega)$. Denoting by $c_n(x)$ and $c(x)$ the square root matrices of $a_n(x)$ and $a(x)$, respectively, we have that $c_n \to c$ a.e. in $\Omega$ while remaining equibounded. Then $c_n(x)Du_n$ converges to $c(x)Du$ weakly in $L^2(\Omega)$. Multiplying (16.11) by $u_n$ and integrating by parts, we obtain
$$\int_\Omega a(x)Du\cdot Du\,dx = \langle f, u\rangle = \lim_{n\to+\infty}\langle f, u_n\rangle = \lim_{n\to+\infty}\int_\Omega a_n(x)Du_n\cdot Du_n\,dx,$$
that is, $\|c_n Du_n\|_{L^2(\Omega)} \to \|c\,Du\|_{L^2(\Omega)}$. Together with the weak convergence, this implies that $c_n(x)Du_n \to c(x)Du$ strongly in $L^2(\Omega)$. Multiplying now by $c_n^{-1}(x)$, which is equibounded by (16.10) and converges a.e. to $c^{-1}(x)$, we finally obtain the strong convergence of $Du_n$ to $Du$ in $L^2(\Omega)$.

We are now in a position to obtain an existence result for the optimization problem (16.9). On the function $j$ we only assume that it is nonnegative, Borel measurable, and such that $j(x, s, z, w)$ is lower semicontinuous in $(s, z, w)$ for a.e. $x \in \Omega$.

Theorem 16.4.3. Under the assumptions above, the minimum problem (16.9) admits at least one solution.

Proof. Let $(A_n)$ be a minimizing sequence; since $j \ge 0$, the perimeters $\operatorname{Per}(A_n)$ are bounded, so that, up to extracting subsequences, we may assume that $(A_n)$ converges strongly in the $L^1_{loc}$ sense to some set $A \subset \Omega$, and that $1_{A_n} \to 1_A$ a.e. in $\Omega$. We claim that $A$ is a solution of problem (16.9). Let us denote by $u_n$ the solution of problem (16.8) associated with $A_n$; since $a_{A_n} \to a_A$ a.e., Theorem 16.4.2 shows that $(u_n)$ converges strongly in $H^1_0(\Omega)$ to the solution $u$ of (16.8) associated with $A$, and, up to a further subsequence, $u_n \to u$ and $Du_n \to Du$ a.e. in $\Omega$. Then, by the lower semicontinuity of the perimeter (see Proposition 10.1.1) and by Fatou's lemma, we have
$$J(u, A) \le \liminf_{n\to+\infty} J(u_n, A_n),$$
which proves the optimality of $A$.

Remark 16.4.2. The same proof works when volume constraints of the form $\operatorname{meas}(A) = m$ are present. Indeed, this constraint passes to the limit when $A_n \to A$ strongly in $L^1(\Omega)$.

The existence result above provides a classical solution of the optimization problem (16.9). This solution is merely a set of finite perimeter, and additional assumptions are needed to prove further regularity. For instance, in [18] Ambrosio and Buttazzo considered the similar problem
$$\min\big\{E(u, A) + c\operatorname{Per}(A) : u \in H^1_0(\Omega),\ A \subset \Omega\big\},$$
where $c > 0$ and
$$E(u, A) = \int_\Omega \Big[a_A(x)|Du|^2 + 1_A(x)g_1(x, u) + 1_{\Omega\setminus A}(x)g_2(x, u)\Big]\,dx.$$
They showed that every solution $A$ is actually an open set, provided $g_1$ and $g_2$ are Borel measurable and satisfy the inequalities
$$g_i(x, s) \ge \gamma(x) - k|s|^2, \qquad i = 1, 2,$$
where $\gamma \in L^1(\Omega)$ and $k < \alpha\lambda_1$, $\lambda_1$ being the first eigenvalue of $-\Delta$ on $\Omega$.
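Corollary 16.4.1 and Theorem 16.4.2 can be checked by hand in one dimension. The sketch below is an illustration under our own assumptions ($N = 1$, $\Omega = (0,1)$, $f \equiv 1$, $a_A = \alpha 1_A + \beta 1_{\Omega\setminus A}$ with $A$ an interval): there the flux $a_A u_A'$ equals $q_0 - x$, and the boundary condition $u_A(1) = 0$ forces $q_0 = \int_0^1 x/a_A\,dx \big/ \int_0^1 1/a_A\,dx$, so $u_A'$ is explicit. Taking the hypothetical sequence $A_n = (1/4, 3/4 + 1/n)$, which converges to $A = (1/4, 3/4)$ in $L^1$, the $H^1_0$ distance $\|u_{A_n}' - u_A'\|_{L^2}$ should go to zero. The helper names `gradient` and `h1_gap` are hypothetical, and integrals are approximated by the midpoint rule.

```python
import math

def gradient(alpha, beta, x1, x2, m=20000):
    """Midpoint samples of u_A' for -(a_A u')' = 1 on (0, 1), u(0) = u(1) = 0,
    with a_A = alpha on A = (x1, x2) and beta elsewhere (1D only)."""
    h = 1.0 / m
    xs = [(i + 0.5) * h for i in range(m)]
    a = [alpha if x1 < x < x2 else beta for x in xs]
    # flux a_A u' = q0 - x; u(1) = int_0^1 (q0 - x)/a_A dx = 0 fixes q0
    q0 = sum(x / ai for x, ai in zip(xs, a)) / sum(1.0 / ai for ai in a)
    return [(q0 - x) / ai for x, ai in zip(xs, a)]

def h1_gap(g1, g2):
    """Midpoint approximation of ||u1' - u2'||_{L^2(0, 1)} (the H^1_0 distance)."""
    h = 1.0 / len(g1)
    return math.sqrt(h * sum((p - q) ** 2 for p, q in zip(g1, g2)))

if __name__ == "__main__":
    alpha, beta = 2.0, 1.0
    limit = gradient(alpha, beta, 0.25, 0.75)
    for n in (4, 16, 64, 256):
        g_n = gradient(alpha, beta, 0.25, 0.75 + 1.0 / n)
        print(n, h1_gap(g_n, limit))  # decreases as meas(A_n \ A) = 1/n -> 0
```

In this example the gap behaves roughly like $\operatorname{meas}(A_n\setminus A)^{1/2}$, because the gradients differ by an amount of order one on the thin layer $A_n \setminus A$ and only slightly elsewhere.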
Bibliography
[1] Y. Abddaimi, C. Licht, G. Michaille. Stochastic homogenization of an integral functional of quasiconvex function with linear growth. Asymptotic Analysis 15 (1997), 183–202.
[2] E. Acerbi, N. Fusco. Semicontinuity problems in the calculus of variations. Arch. Ration. Mech. Anal. 86 (1984), 125–145.
[3] M. A. Akcoglu, U. Krengel. Ergodic theorem for superadditive processes. J. Reine Angew. Math. 323 (1981), 53–67.
[4] S. Adly, M. Thera, E. Ernst. Stability of the solution set of non-coercive variational inequalities. Comm. Contemp. Math. 4 (2002), no. 1, 145–160.
[5] S. Adly, E. Ernst, M. Thera. A characterization of convex and semicoercive functionals. J. Convex Anal. 8 (2001), no. 1, 127–148.
[6] S. Agmon, A. Douglis, L. Nirenberg. Estimates near the boundary for solutions of elliptic partial differential equations satisfying general boundary conditions I. Comm. Pure Appl. Math. 12 (1959), 623–727.
[7] S. Agmon, A. Douglis, L. Nirenberg. Estimates near the boundary for solutions of elliptic partial differential equations satisfying general boundary conditions II. Comm. Pure Appl. Math. 17 (1964), 35–92.
[8] G. Alberti. Rank one properties for derivatives of functions with bounded variation. Proc. Roy. Soc. Edinburgh A-123 (1993), 239–274.
[9] G. Allaire. Shape optimization by the homogenization method. Appl. Math. Sci. 146, Springer-Verlag, New York, 2002.
[10] F. Alvarez, J.-P. Mandallena. Multiparameter homogenization by localization and blow-up. Proc. Roy. Soc. Edinburgh 134A (2004), 801–814.
[11] F. Alvarez, J.-P. Mandallena. Homogenization of multiparameter integrals. Nonlinear Anal. 50 (2002), 839–870.
[12] A. Ambrosetti, P. Rabinowitz. Dual variational methods in critical point theory and applications. J. Funct. Anal. 14 (1973), 349–381.
[13] L. Ambrosio. A compactness theorem for a special class of functions of bounded variation. Boll. Un. Mat. It. 3-B (1989), 857–881.
[14] L. Ambrosio. A new proof of the SBV compactness theorem. Università di Pisa, 2.148 754, Sept. 1993.
[15] L. Ambrosio. On the lower semicontinuity of quasi-convex integrals in $SBV(\Omega;\mathbb{R}^k)$. Nonlinear Anal. 23 (1994), 405–425.
[16] L. Ambrosio. Corso introduttivo alla Teoria Geometrica della misura ed alle Superfici Minime. Appunti dei corsi tenuti da docenti della Scuola, Scuola Normale Superiore, Pisa, 1997.
[17] L. Ambrosio, A. Braides. Energies in SBV and variational models in fracture mechanics. Homogenization and Applications to Material Sciences 9. D. Cioranescu, A. Damlamian, P. Donato eds, Gakuto, Gakkōtosho, Tokyo, Japan (1997), 1–22.
[18] L. Ambrosio, G. Buttazzo. An optimal design problem with perimeter penalization. Calc. Var. 1 (1993), 55–69.
[19] L. Ambrosio, G. Buttazzo. Weak lower semicontinuous envelope of functionals defined on a space of measures. Ann. Mat. Pura Appl. 150 (1988), 311–340.
[20] L. Ambrosio, G. Dal Maso. A general chain rule for distributional derivatives. Proc. Amer. Math. Soc. 108 (1990), 691–702.
[21] L. Ambrosio, G. Dal Maso. On the relaxation in $BV(\Omega;\mathbb{R}^m)$ of quasi-convex integrals. J. Funct. Anal. 109 (1992), 76–97.
[22] L. Ambrosio, N. Fusco, D. Pallara. Functions of bounded variation and free discontinuity problems. Oxford Mathematical Monographs, Oxford University Press, New York, 2000.
[23] L. Ambrosio, P. Tilli. Topics on analysis in metric spaces. Oxford Lecture Series in Math. Appl. 25, Oxford University Press, Oxford, UK, 2004.
[24] L. Ambrosio, V. M. Tortorelli. Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Comm. Pure Appl. Math. 43 (1990), 999–1036.
[25] O. Anza Hafsa. Variational formulations on thin elastic plates with constraints. J. Convex Anal. (to appear).
[26] O. Anza Hafsa, J.-P. Mandallena. Interchange of infimum and integral. Calc. Var. 18 (2003), 433–449.
[27] O. Anza Hafsa, J.-P. Mandallena. Relaxation of second order geometric integrals and non-local effects. J. Nonlinear Convex Anal. 5 (2004), 295–306.
[28] H. Attouch. Variational convergence for functions and operators. Applicable Mathematics Series, Pitman Advanced Publishing Program, Boston, 1985.
[29] H. Attouch. Viscosity solutions of minimization problems. SIAM J. Optim. 6 (1996), 769–806.
[30] H. Attouch, G. Bouchitté, M. Mabrouk. Variational formulations for semilinear elliptic equations involving measures. Nonlinear Variational Problems, vol II, A. Marino, A. Murthy, eds., Pitman Res. Notes in Math. 193 (1989), 1–56.
[31] H. Attouch, H. Brezis. Duality for the sum of convex functions in general Banach spaces. Aspects of Mathematics and Its Applications, J. Barroso, ed., North-Holland, Amsterdam (1986), 125–133.
[32] H. Attouch, T. Champion. $L^p$ regularization of the non-parametric minimal surface problem. Ill-Posed Variational Problems and Regularization Techniques, Lecture Notes in Econom. and Math. Systems 477 (1999), 25–34.
[33] H. Attouch, R. Cominetti. $L^p$ approximation of variational problems in $L^1$ and $L^\infty$. Nonlinear Anal. 36 (1999), no. 3, 373–399.
[34] H. Attouch, A. Damlamian. Applications des méthodes de convexité et monotonie à l’étude de certaines équations quasi-linéaires. Proc. Roy. Soc. Edinburgh 79A (1977), 107–129.
[35] H. Attouch, C. Picard. Variational inequalities with varying obstacles: The general form of the limit problem. J. Funct. Anal. 50 (1983), no. 3, 329–386.
[36] H. Attouch, H. Riahi. Stability results for Ekeland’s ε-variational principle and cone extremal solutions. Math. Oper. Res. 18 (1993), no. 1, 173–201.
[37] H. Attouch, A. Soubeyran. Towards stable routines: Improving and satisficing enough by exploration-exploitation on an unknown landscape. Working paper GREQAM, Montpellier (2004).
[38] H. Attouch, M. Thera. A general duality principle for the sum of two operators. J. Convex Anal. 3 (1996), no. 1, 1–24.
[39] H. Attouch, R. J.-B. Wets. Epigraphical analysis. In Analyse non linéaire, Perpignan 1987, and Ann. Inst. H. Poincaré, Anal. Non Linéaire 6 (1989), 73–100.
[40] H. Attouch, R. J.-B. Wets. Quantitative stability of variational systems: I. The epigraphical distance. Trans. Amer. Math. Soc. 328 (1991), no. 2, 695–729.
[41] G. Aubert, R. Deriche, P. Kornprobst. Computing optical flow via variational techniques. SIAM J. Appl. Math. 60 (1999), 156–182.
[42] G. Aubert, P. Kornprobst. A mathematical study of the relaxed optical flow problem in the space $BV(\Omega)$. SIAM J. Math. Anal. 30 (1999), 1282–1308.
[43] G. Aubert, P. Kornprobst. Mathematical problems in image processing: Partial differential equations and the calculus of variations. Appl. Math. Sci. 147, Springer-Verlag, New York, 2002.
[44] Th. Aubin. Problèmes isopérimétriques et espaces de Sobolev. J. Diff. Geom. 11 (1976), 573–598.
[45] J.P. Aubin. Mathematical methods of game and economic theory. North-Holland, Amsterdam, 1979.
[46] J.P. Aubin, I. Ekeland. Applied nonlinear analysis. John Wiley, New York, 1984.
[47] J.P. Aubin, H. Frankowska. Set-valued analysis. Systems Control Found. Appl. 2, Birkhäuser-Verlag, Boston, 1990.
[48] A. Auslender. Noncoercive optimization problems. Math. Oper. Res. 21 (1996), 769–782.
[49] D. Azé. Eléments d’Analyse Convexe et Variationnelle. Ellipse, Paris, 1997.
[50] C. Baiocchi, G. Buttazzo, F. Gastaldi, F. Tomarelli. General existence theorems for unilateral problems in continuum mechanics. Arch. Ration. Mech. Anal. 100 (1988), no. 2, 149–189.
[51] C. Baiocchi, F. Gastaldi. Inéquations variationnelles non coercives. C.R. Acad. Sci. Paris 299 (1984), 647–650.
[52] C. Baiocchi, F. Gastaldi. Some existence results on noncoercive variational inequalities. Ann. Scuola Norm. Sup. Pisa (4) 13 (1986), 617–659.
[53] E.J. Balder. Lectures on Young measures theory and its applications in economics. Workshop di Teoria della Misura e Analisi Reale, Grado, 1997, Rend. Instit. Univ. Trieste 31 (2000), suppl. 1, 1–69.
[54] J. M. Ball. Convexity conditions and existence theorems in nonlinear elasticity. Arch. Ration. Mech. Anal. 63 (1977), 13–23.
[55] J.M. Ball. A version of the fundamental theorem for Young measures. PDE’s and Continuum Models of Phase Transitions, M. Rascle, D. Serre, M. Slemrod, eds., Lecture Notes in Physics 344, Springer-Verlag, New York, 1989, 207–215.
[56] J.M. Ball, R.D. James. Fine phase mixtures as minimizers of energy. Arch. Ration. Mech. Anal. 100 (1987), 13–52.
[57] J.M. Ball, F. Murat. $W^{1,p}$-quasiconvexity and variational problems for multiple integrals. J. Funct. Anal. 58 (1984), 225–253.
[58] G. I. Barenblatt. The mathematical theory of equilibrium cracks in brittle fracture. Adv. Appl. Mech. 7 (1962), 55.
[59] E. Barozzi, E. H. A. Gonzalez. Least area problems with a volume constraint. Variational methods for equilibrium problems of fluids, Astérisque 118 (1984), 33–53.
[60] G. Beer. Lipschitz regularization and the convergence of convex functions. Numer. Funct. Anal. Optim. 15 (1994), no. 1–2, 31–46.
[61] G. Bellettini, A. Coscia. Discrete approximation of a free discontinuity problem. Numer. Funct. Anal. Optim. 15 (1994), no. 3–4, 201–224.
[62] M. Bellieud, G. Bouchitté. Homogenization of elliptic problems in a fiber reinforced structure. Nonlocal effect. Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 26 (1998), no. 3, 407–436.
[63] M. Bellieud, G. Bouchitté. Homogenization of a soft elastic material reinforced by fibers. Asymptot. Anal. 32 (2002), no. 2, 153–183.
[64] M. Belloni, G. Buttazzo, L. Freddi. Completion by Gamma-convergence for optimal control problems. Ann. Fac. Sci. Toulouse Math. 2 (1993), 149–162.
[65] A. Beurling, J. Deny. Dirichlet spaces. Proc. Nat. Acad. Sci. USA 45 (1959), 208–215.
[66] K. Bhattacharya. Wedge-like microstructure in martensite. Acta Metal. 39 (1991), 2431–2444.
[67] K. Bhattacharya. Self accommodation in martensite. Arch. Ration. Mech. Anal. 120 (1992), 201–244.
[68] K. Bhattacharya, N. Firoozye, R. D. James, R. V. Kohn. Restrictions on microstructures. Proc. Roy. Soc. Edinburgh 124A (1994), 843–878.
[69] K. Bhattacharya, R. V. Kohn. Elastic energy minimization and the recoverable strains of polycrystalline shape-memory materials. Arch. Ration. Mech. Anal. 139 (1997), 99–180.
[70] E. Bishop, R. Phelps. The support functional of a convex set. Proc. Symp. Pure Math. Amer. Math. Soc. 7 (1962), 27–35.
[71] J.M. Borwein, A. Lewis. Convex analysis and nonlinear optimization. Canadian Mathematical Society Books in Mathematics, Springer-Verlag, New York, 2000.
[72] J.M. Borwein, A. Lewis, D. Noll. Maximum entropy spectral analysis using first order information. Part I: Fisher information and convex duality. Math. Oper. Res. 21 (1996), 442–468.
[73] J.M. Borwein, A. Lewis, R. Nussbaum. Entropy minimization, DAD problems and doubly stochastic kernels. J. Funct. Anal. 123 (1994), 264–307.
[74] G. Bouchitté, G. Buttazzo. New lower semicontinuity results for non convex functionals defined on measures. Nonlinear Anal. 15 (1990), 679–692.
[75] G. Bouchitté, G. Buttazzo. Integral representation of nonconvex functionals defined on measures. Ann. Inst. H. Poincaré Anal. Non Linéaire 9 (1992), 101–117.
[76] G. Bouchitté, G. Buttazzo. Relaxation for a class of nonconvex functionals defined on measures. Ann. Inst. H. Poincaré Anal. Non Linéaire 10 (1993), 345–361.
[77] G. Bouchitté, G. Buttazzo, P. Seppecher. Energies with respect to a measure and applications to low dimensional structures. Calc. Var. Partial Differential Equations 5 (1997), no. 1, 37–54.
[78] G. Bouchitté, P. Suquet. Equi-coercivity of variational problems: The role of recession functions. Proceedings of the Séminaire du Collège de France, Vol XII, (Paris, 1991–1993), 31–54, Pitman Res. Notes Math. Ser., 302.
[79] Bourbaki. Elements de mathématiques–Espaces vectoriels topologiques, Act. Sci. Ind., 1189, Hermann, Paris 1966.
[80] B. Bourdin, G.A. Francfort, J.-J. Marigo. Numerical experiments in revisited brittle fracture. J. Mech. Phys. Solids 48 (2000), no. 4, 797–826.
[81] A. Bourgeat, A. Mikelic, S. Wright. Stochastic two-scale convergence in the mean and applications. J. Reine Angew. Math. 456 (1994), 19–51.
[82] A. Braides. Approximation of free-discontinuity problems. Lecture Notes in Math. 1694, Springer-Verlag, New York, 1998.
[83] A. Braides. Γ-convergence for beginners. Oxford Lecture Ser. Math. Appl. 22, Oxford University Press, Oxford, UK, 2002.
[84] A. Braides. Discrete approximation of functionals with jumps and creases. Homogenization, 2001 (Naples), Gakuto Internat. Ser. Math. Sci. Appl. 18, Gakkōtosho, Tokyo, 2003, 147–153.
[85] A. Braides, A. Coscia. A singular perturbation approach to problems in fracture mechanics. Math. Mod. Meth. Appl. Sci. 3 (1993), 302–340.
[86] A. Braides, G. Dal Maso. Non-local approximation of the Mumford-Shah functional. Calc. Var. Partial Differential Equations 5 (1997), 293–322.
[87] A. Braides, G. Dal Maso, A. Garroni. Variational formulation of softening phenomena in fracture mechanics: The one dimensional case. Arch. Ration. Mech. Anal. 146 (1999), 23–58.
[88] A. Braides, M.S. Gelli. Continuum limits of discrete systems without convexity hypotheses. Math. Mech. Solids 7 (2002), 41–66.
[89] A. Braides, M. S. Gelli. Limits of discrete systems with long-range interactions. J. Convex Anal. 9 (2002), no. 2, 363–399.
[90] H. Brezis. Analyse Fonctionnelle, Théorie et Applications. Masson, Paris, 1983.
[91] H. Brezis. Intégrales convexes dans les espaces de Sobolev. Israel J. Math. 13 (1972), 9–23.
[92] H. Brezis, F.E. Browder. A general principle on ordered sets in nonlinear functional analysis. Adv. Math. 21 (1976), no. 3, 353–364.
[93] H. Brezis, L. Nirenberg. Characterizations of the ranges of some nonlinear operators and applications to boundary value problems. Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 5 (1978), 225–326.
[94] H. Buchwalter. Variations sur l’Analyse. Ellipse, Paris, 1992.
[95] D. Bucur, G. Buttazzo. Variational Methods in Some Shape Optimization Problems. Lecture notes, Dipartimento di Matematica Università di Pisa and Scuola Normale Superiore di Pisa, Series Appunti di Corsi della Scuola Normale Superiore (2002).
[96] D. Bucur, G. Buttazzo, I. Figueiredo. On the attainable eigenvalues of the Laplace operator. SIAM J. Math. Anal. 30 (1999), 527–536.
[97] G. Buttazzo. Semicontinuity, relaxation and integral representation in the calculus of variations. Pitman Res. Notes Math. Ser. 207, Longman, Harlow, UK, 1989.
[98] G. Buttazzo, G. Dal Maso. Shape optimization for Dirichlet problems: Relaxed solutions and optimality conditions. Bull. Amer. Math. Soc. 23 (1990), 531–535.
[99] G. Buttazzo, G. Dal Maso. Shape optimization for Dirichlet problems: Relaxed formulation and optimality conditions. Appl. Math. Optim. 23 (1991), 17–49.
[100] G. Buttazzo, G. Dal Maso. An existence result for a class of shape optimization problems. Arch. Ration. Mech. Anal. 122 (1993), 183–195.
[101] G. Buttazzo, L. Freddi. Relaxed optimal control problems and applications to shape optimization. Lecture Notes, NATO-ASI Summer School Nonlinear Analysis, Differential Equations and Control, Montreal, Kluwer, Dordrecht, The Netherlands 1999, 159–206.
[102] G. Buttazzo, M. Giaquinta, S. Hildebrandt. One-dimensional variational problems. An introduction. Oxford Lecture Ser. Math. Appl. 15, Oxford University Press, New York, 1998.
[103] G. Buttazzo, P. Guasoni. Shape optimization problems over classes of convex domains. J. Convex Anal. 4 (1997), 343–351.
[104] C. Castaing, P. Raynaud de Fitte, M. Valadier. Young measure on topological spaces. With applications in control theory and probability theory. Math. Appl. 571, Kluwer Academic Publishers, Dordrecht, Netherlands, 2004.
[105] C. Castaing, M. Valadier. Convex analysis and measurable multifunctions. Lecture Notes in Math. 590, Springer-Verlag, Berlin, 1977.
[106] E. Chabi, G. Michaille. Ergodic theory and application to nonconvex homogenization. Set Valued Anal. 2 (1994), 117–134.
[107] A. Chambolle. Image segmentation by variational methods: Mumford and Shah functional and the discrete approximations. SIAM J. Appl. Math. 55 (1995), 827–863.
[108] M. Chipot, C. Collins, D. Kinderlehrer. Numerical analysis of oscillations in multiple well problems. Numer. Math. 70 (1995), 259–282.
[109] M. Chipot, G. Dal Maso. Relaxed shape optimization: The case of nonnegative data for the Dirichlet problem. Adv. Math. Sci. Appl. 1 (1992), 47–81.
[110] M. Chipot, D. Kinderlehrer. Equilibrium configurations of crystals. Arch. Ration. Mech. Anal. 103 (1988), 237–277.
[111] Ph. Ciarlet. The finite element method for elliptic problems. North-Holland, Amsterdam, 1978.
[112] F. H. Clarke. Optimization and nonsmooth analysis. John Wiley, New York, 1983.
[113] C. Combari, L. Thibault. On the graph convergence of subdifferentials of convex functions. Proc. Amer. Math. Soc. 126 (1998), 2231–2240.
[114] G. Cortesani, R. Toader. Nonlocal approximation of nonisotropic free-discontinuity problems. SIAM J. Appl. Math. 59 (1999), 1507–1519.
[115] R. Courant, D. Hilbert. Methoden der Mathematischen Physik. Berlin, Vol I (1931), Vol II (1937). English ed. Interscience, New York, Vol I (1953), Vol II (1962).
[116] B. Dacorogna. Direct methods in the calculus of variations. Appl. Math. Sci. 78, Springer-Verlag, Berlin, 1989.
[117] G. Dal Maso. An introduction to Γ-convergence. Birkhäuser, Boston, 1993.
[118] G. Dal Maso, G. Francfort, R. Toader. Quasi-static evolution in brittle fracture: the case of bounded solutions. Calculus of Variations: Topics from the Mathematical Heritage of E. De Giorgi, Quad. Mat., 14, Dept. Math., Seconda Univ. Napoli, Caserta, 2004, 245–266. Also available on the Web at http://cvgmt.sns.it/cgi/get.cgi/papers/dalfratoa04a/.
[119] G. Dal Maso, L. Modica. Nonlinear stochastic homogenization and ergodic theory. J. Reine Angew. Math. 363 (1986), 27–43.
[120] R. Dautray, J. L. Lions. Analyse mathématique et calcul numérique pour les sciences et les techniques. Masson, Paris, 1984.
[121] D. G. De Figueiredo. Lectures at the Tata Institute of Fundamental Research. Springer-Verlag, New York, 1989.
[122] E. De Giorgi, L. Ambrosio. Un nuovo tipo di funzionale del calcolo delle variazioni. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 82 (1988), 199–210.
[123] E. De Giorgi, M. Carriero, A. Leaci. Existence theorem for a minimum problem with free discontinuity set. Arch. Ration. Mech. Anal. 108 (1989), 195–218.
[124] E. De Giorgi, T. Franzoni. Su un tipo di convergenza variazionale. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. (8) 58 (1975), 842–850.
Bibliography
[125] C. Dellacherie, P.A. Meyer. Probabilités et Potentiel. Hermann, Paris, 1975.
[126] F. Demengel, R. Temam. Convex functions of measures and applications. Indiana Univ. Math. J. 33 (1984), 673–709.
[127] N. Dunford, J.T. Schwartz. Linear operators. Interscience, New York, 1958.
[128] I. Ekeland. Remarques sur les problèmes variationnels. CRAS, t. 275, 1057–1059, 1972.
[129] I. Ekeland. Remarques sur les problèmes variationnels II. CRAS, t. 276, 1347–1349, 1973.
[130] I. Ekeland. On the variational principle. J. Math. Anal. Appl. 47 (1974), 324–353.
[131] I. Ekeland, R. Temam. Convex analysis and variational problems. North-Holland, Amsterdam, 1978.
[132] L.C. Evans, R.F. Gariepy. Measure theory and fine properties of functions. Stud. Adv. Math., CRC Press, Boca Raton, FL, 1992.
[133] K.J. Falconer. The geometry of fractal sets. Cambridge University Press, Cambridge, UK, 1986.
[134] H. Federer. Geometric measure theory. Springer-Verlag, Berlin, 1969.
[135] G. Fichera. Problemi elastostatici con vincoli unilaterali: il problema di Signorini con ambigue condizioni al contorno. Atti Accad. Naz. Lincei Mem. Cl. Sci. Fis. Mat. Natur. Sez. I (8) 7, 1963/1964, 91–140.
[136] W. Fleming, R. Rishel. An integral formula for total gradient variation. Arch. Math. 11 (1960), 218–222.
[137] I. Fonseca, S. Müller, P. Pedregal. Analysis of concentration and oscillation effects generated by gradients. SIAM J. Math. Anal. 29 (1998), no. 3, 736–756.
[138] G. A. Francfort, J.-J. Marigo. Une approche variationnelle de la mécanique du défaut. Actes du 30ème Congrès d’Analyse Numérique: CANum’98 (Arles 1998), 57–74, ESAIM Proc. 6, Soc. Math. Appl. Indust., Paris, 1999.
[139] G. A. Francfort, J.-J. Marigo. Cracks in fracture mechanics: A time indexed family of energy minimizers. Variations of domain and free-boundary problems in solid mechanics (Paris, 1997), 197–202, Solid Mech. Appl. 66, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.
[140] K. Friedrichs. Spektraltheorie halbbeschränkter Operatoren, I, II. Math. Ann. 109 (1934), 465–487; 685–713.
[141] F. Gastaldi, F. Tomarelli. Some remarks on nonlinear and non-coercive variational inequalities. Boll. U.M.I. (7), 1-B (1987), 143–165.
[142] G. Geymonat. Introduction à la localisation. Cours LMGC, Université Montpellier II, March–April 2003.
[143] M. Giaquinta, S. Hildebrandt. Calculus of variations, I, II. Springer-Verlag, Berlin, 1996.
[144] E. Giusti. Minimal surfaces and functions of bounded variation. Birkhäuser, Boston, 1984.
[145] D. Gilbarg, N. Trudinger. Elliptic partial differential equations of second order. Springer-Verlag, Berlin, 1977.
[146] R. Glowinski. Numerical methods for nonlinear variational problems. Springer-Verlag, New York, 1984.
[147] R. Glowinski, J. L. Lions, R. Trémolière. Numerical analysis of variational inequalities. North-Holland, Amsterdam, 1981.
[148] M. Gobbino. Finite difference approximation of the Mumford–Shah functional. Comm. Pure Appl. Math. 51 (1998), 197–228.
[149] A. A. Griffith. The phenomenon of rupture and flow in solids. Phil. Trans. Royal Soc. London A 221 (1920), 163–198.
[150] P. Grisvard. Problèmes aux limites dans des domaines avec points de rebroussement. Ann. Fac. Sci. Toulouse Math. 6 (1995), no. 3, 561–578.
[151] P. Grisvard. Singularities in boundary value problems. Rech. Math. Appl. 22, Masson, Paris, Springer-Verlag, Berlin, 1992.
[152] M. E. Gurtin. On phase transitions with bulk, interfacial, and boundary energy. Arch. Ration. Mech. Anal. 96 (1986), 243–264.
[153] A. Henrot, M. Pierre. Variation et optimisation de formes. Une analyse géométrique. Mathématiques et Applications 48, Springer, New York, 2005.
[154] D. Hilbert. Mathematical Problems. Lecture delivered before the International Congress of Mathematicians at Paris, 1900. Bull. Amer. Math. Soc. 8, 437–479. (Reprinted in Browder, F., ed. 1976, Vol. 1, 1–34.)
[155] J.B. Hiriart-Urruty, C. Lemaréchal. Convex analysis and minimization algorithms I, II. Springer-Verlag, New York, 1993.
[156] S.I. Huhjaev, A.I. Vol’pert. Analysis in classes of discontinuous functions and equations of mathematical physics. Martinus Nijhoff, Dordrecht, The Netherlands, 1985.
[157] A. D. Ioffe. On lower semicontinuity of integral functionals I, II. SIAM J. Control Optim. 15 (1977), 521–538 and 991–1000.
[158] A. D. Ioffe, V. M. Tihomirov. Theory of extremal problems. North-Holland, Amsterdam, 1979.
[159] O. Iosifescu, C. Licht, G. Michaille. Variational limit of a one dimensional discrete and statistically homogeneous system of material points. Asymptot. Anal. 28 (2001), 309–329.
[160] O. Iosifescu, C. Licht, G. Michaille. Variational limit of a one dimensional discrete and statistically homogeneous system of material points. CRAS, t. 322, Série I, 575–580, 2001.
[161] J. L. Joly. Une famille de topologies sur l’ensemble des fonctions convexes pour lesquelles la polarité est bicontinue. J. Math. Pures Appl. 52 (1973), 421–441.
[162] D. Kinderlehrer, P. Pedregal. Characterization of Young measures generated by gradients. Arch. Ration. Mech. Anal. 119 (1991), 329–365.
[163] D. Kinderlehrer, G. Stampacchia. An introduction to variational inequalities and their applications. Classics Appl. Math. 31, SIAM, Philadelphia, 2000.
[164] R. V. Kohn, G. Strang. Optimal design and relaxation of variational problems, I, II, III. Comm. Pure Appl. Math. 39 (1986), 113–137, 139–182, 353–377.
[165] U. Krengel. Ergodic theorems. Studies in Mathematics, Walter de Gruyter, Berlin, New York, 1985.
[166] T. Lachand-Robert, M.A. Peletier. An example of non-convex minimization and an application to Newton’s problem of the body of least resistance. Ann. Inst. H. Poincaré Anal. Non Linéaire 18 (2001), no. 2, 179–198.
[167] B. Larrouturou, P. L. Lions. Méthodes Mathématiques pour les Sciences de l’ingénieur, Optimisation et Analyse Numérique. École Polytechnique, Palaiseau, 1996.
[168] H. Le Dret, A. Raoult. The nonlinear membrane model as variational limit in nonlinear three-dimensional elasticity. J. Math. Pures Appl. (9) 74 (1995), no. 6, 549–578.
[169] L. Leghmizi, C. Licht, G. Michaille. The nonlinear membrane model: A Young measure and varifold formulation. ESAIM: COCV 11 (2005), 449–472.
[170] C. Licht, G. Michaille. Global-local subadditive ergodic theorems and application to homogenization in elasticity. Ann. Math. Blaise Pascal 9 (2002), 21–62.
[171] E.H. Lieb. Sharp constants in the Hardy–Littlewood–Sobolev and related inequalities. Ann. Math. 118 (1983), no. 2, 349–374.
[172] J.L. Lions. Quelques méthodes de résolution des problèmes aux limites non linéaires. Dunod, Paris, 1969.
[173] J.L. Lions, G. Stampacchia. Variational inequalities. Comm. Pure Appl. Math. 20 (1967), 493–519.
[174] S. Luckhaus, L. Modica. The Gibbs-Thompson relation within the gradient theory of phase transitions. Arch. Ration. Mech. Anal. 107 (1989), 71–83.
[175] J.-P. Mandallena. On the relaxation of nonconvex superficial integral functionals. J. Math. Pures Appl. 79 (2000), 1011–1028.
[176] J.-P. Mandallena. Quasiconvexification of geometric integrals. Ann. Mat. Pura Appl. (to appear).
[177] P. Marcellini. Approximation of quasiconvex functions, and lower semicontinuity of multiple integrals. Manuscripta Math. 51 (1985), 1–28.
[178] C.M. Marle. Mesure et probabilités. Hermann, Paris, 1974.
[179] K. Messaoudi, G. Michaille. Stochastic homogenization of nonconvex integral functionals. Math. Modelling Numer. Anal. 28 (1994), no. 3, 329–356.
[180] L. Modica. The gradient theory of phase transitions and the minimal interface criterion. Arch. Ration. Mech. Anal. 98 (1987), 123–142.
[181] B. Mohammadi, J.-H. Saïac. Pratique de la Simulation Numérique. Dunod, Paris, 2003.
[182] J.J. Moreau. Fonctionnelles convexes. Cours Collège de France 1967, new edition CNR, Facoltà di Ingegneria di Roma, Roma, 2003.
[183] J.M. Morel, S. Solimini. Variational models in image segmentation. Birkhäuser Boston, Boston, 1995.
[184] F. Morgan. Geometric measure theory: A beginner’s guide. Academic Press, New York, 1988.
[185] C. B. Morrey. Multiple integrals in the calculus of variations. Springer-Verlag, Berlin, 1966.
[186] C.B. Morrey. Quasiconvexity and the semicontinuity of multiple integrals. Pacific J. Math. 2 (1952), 25–53.
[187] U. Mosco. Convergence of convex sets and of solutions of variational inequalities. Adv. Math. 3 (1969), 510–585.
[188] U. Mosco. On the continuity of the Young-Fenchel transformation. J. Math. Anal. Appl. 35 (1971), 518–535.
[189] J. Moser. A Harnack inequality for parabolic differential equations. Comm. Pure Appl. Math. 17 (1964), 101–134.
[190] S. Müller, V. Sverak. Attainment results for the two-well problem by convex integration. Geometric Analysis and the Calculus of Variations, International Press, Cambridge, MA (1996), 239–251.
[191] D. Mumford, J. Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42 (1989), 577–685.
[192] F. Murat, L. Tartar. Optimality conditions and homogenization. Proceedings of Nonlinear Variational Problems, Isola d’Elba 1983, Res. Notes Math. 127, Pitman, London (1985), 1–8.
[193] J. Nečas. Les méthodes directes en théorie des équations elliptiques. Masson, Paris, 1967.
[194] X.X. Nguyen, H. Zessin. Ergodic theorems for spatial processes. Z. Wahrsch. Verw. Gebiete 48 (1979), 133–158.
[195] P. Pedregal. Parametrized measures and variational principles. Birkhäuser Boston, Boston, 1997.
[196] P. Pedregal. Laminates and microstructure. European J. Appl. Math. 4 (1993), 121–149.
[197] R.R. Phelps. Convex functions, monotone operators and differentiability. Lecture Notes in Math. 1364, Springer-Verlag, New York, 1989.
[198] O. Pironneau. Optimal shape design for elliptic systems. Springer Ser. Comput. Phys., Springer-Verlag, New York, 1984.
[199] P.A. Raviart, J.M. Thomas. Introduction à l’analyse numérique des équations aux dérivées partielles. Masson, Paris, 1983.
[200] J. R. Rice. Mathematical analysis in the mechanics of fracture. Fracture: An Advanced Treatise, H. Liebowitz ed., 2 (1969), 191–311.
[201] R.T. Rockafellar. Convex analysis. Princeton University Press, Princeton, NJ, 1970.
[202] R.T. Rockafellar. Integrals which are convex functionals. Pacific J. Math. 24 (1968), no. 3, 525–539.
[203] R.T. Rockafellar. Convex integral functionals, II. Pacific J. Math. 39 (1971), 439–469.
[204] R.T. Rockafellar. Conjugate convex functions in optimal control and the calculus of variations. J. Math. Anal. Appl. 32 (1970), 174–222.
[205] R.T. Rockafellar, R.J.B. Wets. Variational analysis. Springer-Verlag, Berlin, Heidelberg, New York, 1998.
[206] W. Rudin. Analyse réelle et complexe. Masson, Paris, 1975.
[207] H. A. Simon. Bounded rationality in social science: Today and tomorrow. Mind & Society 1 (2000), 25–39.
[208] S. Sobolev. Méthode nouvelle à résoudre le problème de Cauchy pour les équations linéaires hyperboliques normales. Mat. Sbornik 1 (1936), 39–72.
[209] J. Sokolowski, J.-P. Zolesio. Introduction to shape optimization. Shape sensitivity analysis. Springer Series in Comput. Math. 16, Springer-Verlag, Berlin (1992).
[210] G. Stampacchia. Formes bilinéaires coercitives sur les ensembles convexes. C. R. Acad. Sci. Paris 258 (1964), 4413–4416.
[211] V. Sverak. New examples of quasiconvex functions. Arch. Ration. Mech. Anal. 119 (1992), 293–300.
[212] V. Sverak. Quasiconvex functions with subquadratic growth. Proc. Roy. Soc. Lond. A433 (1991), 723–725.
[213] V. Sverak. Rank-one convexity does not imply quasiconvexity. Proc. Roy. Soc. Edinburgh 120A (1992), 185–189.
[214] V. Sverak. On the problem of two wells. Microstructure and Phase Transition, D. Kinderlehrer et al., eds., IMA Vol. Math. Appl. 54, Springer-Verlag, New York (1993), 183–190.
[215] M.A. Sychev. A new approach to Young measure theory, relaxation and convergence in energy. Ann. Inst. Henri Poincaré 16 (1999), no. 6, 773–812.
[216] G. Talenti. Best constants in Sobolev inequality. Ann. Mat. Pura Appl. 110 (1976), 353–372.
[217] L. Tartar. H-measures, a new approach for studying homogenization, oscillations and concentration effects in partial differential equations. Proc. Roy. Soc. Edinburgh 115A (1990), 193–230.
[218] L. Tartar. An introduction to the homogenization method in optimal design. Optimal Shape Design. Lecture Notes in Math. 1740, Springer-Verlag, Berlin (2000), 47–156.
[219] R. Temam. Problèmes Mathématiques en Plasticité. Gauthier-Villars, Paris, 1983.
[220] L. Thibault. Sequential convex subdifferential calculus and sequential Lagrange multipliers. SIAM J. Control Optim. 35 (1997), 1434–1444.
[221] D. Torralba. Applications aux transitions de phases et à la méthode barrière logarithmique. Thesis, Université Montpellier II, 1996.
[222] N.S. Trudinger. On Harnack type inequalities and their applications to quasilinear elliptic equations. Comm. Pure Appl. Math. 20 (1967), 721–747.
[223] M. Valadier. Young measures. Methods of Nonconvex Analysis, A. Cellina ed. Lecture Notes in Math. 1446, Springer-Verlag, Berlin (1990), 152–188.
[224] M. Valadier. A course on Young measures. Rend. Istit. Mat. Univ. Trieste 26 suppl. (1994), 349–394.
[225] K. Yosida. Functional analysis. Springer-Verlag, New York, 1971.
[226] L. C. Young. Lectures on calculus of variations and optimal control theory. W. B. Saunders, Philadelphia, 1969.
[227] E. Zeidler. Nonlinear functional analysis and its applications III. Variational Methods and Optimization, Springer-Verlag, Berlin, 1984.
[228] K. Zhang. Rank-one connections at infinity and quasiconvex hulls. J. Convex Anal. 7 (2000), no. 1, 19–45.
[229] W. P. Ziemer. Weakly differentiable functions. Springer-Verlag, Berlin, 1989.
Index
Aff_0(D, ℝ^m), 425
α = ap lim_{x→x_0} f(x), 390
ap lim inf_{x→x_0} f(x), 390
ap lim sup_{x→x_0} f(x), 390
B(Ω), 124
BV(Ω), 371
BV(Ω, ℝ^m), 383
C_0(Ω), 61
C_0(Ω, ℝ^m), 129
C_b(Ω; E), 138
C_b(Ω, ℝ^m), 133
C_c(Ω), 16
C_c(Ω, ℝ^m), 129
C^1_c(Ω), 16
C^m(Ω), 25
C_#(Y), 432
C^1-diffeomorphism, 174
Cap_p(·), 212
D(Ω), 16
D′(Ω), 17
Δ, 7
diam(E), 109
dim_H, 121
div, 9
dom f, 77
epi f, 79
F, 181
f #_e g, f # g, 313
f^*, 321
Γ-convergence, 464
Γ-lim inf_{n→+∞} F_n, 465
Γ-lim sup_{n→+∞} F_n, 465
H^1(Ω), 152
H^1_0(Ω), 154
H^{−1}(Ω), 167
H^s, 112
H^s(ℝ^N), 185
J_u, C_u, 407
K^∞, 563
L^1_λ(Ω, ℝ^m), 125
L_w(Ω, M(E)), 139
L^N, 114
M(Ω), 124
M(Ω, ℝ^m), 124
M_b(Ω), 124
M^{m×N}, 421
μ = (μ_x)_{x∈Ω} ⊗ σ, 135
μA, 124
μ^+, μ^−, 125
v̂, 180
∂f, 331
∂_M E, 389
∂_r E, 397
Qf, 423
ρ_ε ∗ μ, 132
SBV(Ω), 409
SBV(Ω, ℝ^m), 409
S_f, 394
σ_C, 309
spt, 16, 124
u^−, u^+, 404
W^{1,p}(Ω), 153
W^{1,p}_0(Ω), 154
W^{1,p}(Ω, ℝ^m), 421
W^{−m,p}(Ω), 167
y $S x, 101
Y(Ω; E), 138
nar, 138
absolutely continuous, 125
approximate
  derivative, 408
  limit, 389
  limit inf, 390
  limit sup, 390
Cantor part, 407
Cantor set, 122
Cantor–Vitali function, 408, 518
Carathéodory criterion, 112
Cauchy–Riemann, 9
coercive, 76
σ-coercive, 589
coercivity, 86
complementary problem, 340
complementary slackness condition, 345, 356
concentration, 50
convolution, 18
Courant–Fisher, 290, 291, 294
covering
  fine, 115
De La Vallée–Poussin criterion, 518, 519
density point, 388
Dirac mass, 15, 27
Dirichlet, 598
  problem, 31
distribution, 17
  derivation, 24
domain, 77
dual
  function, 354, 358, 361
  problem, 351, 354
  value, 354
duality gap, 354, 358
duality gap, 354
Dunford–Pettis theorem, 145
Eberlein–Smulian theorem, 56
eigenvalue, 279–281
eigenvector, 280, 281, 286, 288, 290
Ekeland’s ε-variational principle, 98
epiconvergence, 466
epi-sum, 312, 328
ergodic
  dynamical system, 534
  subadditive ergodic theorem, 534
ergodic theorem, 49
exact minorant, 331, 333
extension
  operator, 174
  theorem, 179
Fenchel
  extremality relation, 332, 364
  extremality relations, 335
  Fenchel–Moreau theorem, 321
  Legendre–Fenchel conjugate, 320
  transform, 597
finite perimeter (set of), 396
Galerkin approximating method, 257
Galerkin method, 73
Gateaux differentiability, 98
Gauss, 8
Gauss–Green formula, 396, 401
generalized solution, 420
Hahn–Banach separation theorem, 92
Hahn–Banach theorem, 307, 454
harmonic function, 7
hat function, 261, 262
Hausdorff
  dimension, 120
  measure, 109, 112
  outer measure, 109
Hilbert, 13
infcompact function, 86
inner measure-theoretic normal, 395
interpolant, 265
jump
  part, 407
  point, 394
  set, 394, 404
Karush–Kuhn–Tucker optimality conditions, 341, 345
kernel, 10
Lagrange multiplier vector, 346
  characterization of, 346
Lagrangian, 353
Laplace, 8
Lax–Milgram theorem, 67
Lebesgue–Nikodym, 592
Legendre–Fenchel conjugate, 361
limit analysis, 597
lower semicontinuous regularization, 85
marginal function, 348, 363
Markov inequality, 143
measure
  Borel, 124
  bounded, 125
  counting, 593
  Radon, 115, 125
  regular, 61, 125
  signed, 124
  support, 124
  total variation, 125
measure theoretical
  boundary, 389
  exterior, 388
  interior, 388
mollifier, 18
mountain pass theorem, 100
narrow, 374
narrow convergence of Young measures, 138
Neumann, 598
  boundary condition, 34
  problem, 33
Newtonian potential, 7, 28
nodes, 261, 262
normal cone, 338
optimal value, 355
oscillations, 49
Palais–Smale compactness condition, 99
perturbation function, 360, 361, 363, 365
Picard iterative method, 71
Poincaré inequality, 168
Poincaré–Wirtinger inequality, 180, 400
Poisson equation, 8
primal
  problem, 353
  value, 353
proper, 77
quasi-continuous representatives, 340
quasi-convex envelope, 423, 458
Rademacher, 379
Radon measure, 24
Radon–Nikodym theorem, 126
rarefaction point, 388
Rayleigh
  Courant–Rayleigh formula, 298
  quotient, 290
recession
  cone, 563
  function, 440, 478, 510, 524, 592
  functional, 555
reduced boundary, 397
reflexive, 55
regular triangulation, 264
regular point, 393
relaxation, 437
relaxation scheme, 457
relaxed problem, 85, 420
Rellich–Kondrakov compact embedding theorem, 179
Rellich–Kondrakov theorem, 172
Riemann, 9
Riesz
  representation theorem, 20, 48, 61, 67, 129
  theorem, 41, 86
Rockafellar theorem, 329
saddle
  point, 354, 355
  value problems, 354
self-similar set, 123
separation of variables method, 279, 305
set convergence, 464
set of class C^1, 174
singular, 125
Slater
  generalized Slater, 363
  Slater qualification assumption, 341, 343, 344, 348, 350, 351, 355, 363
slicing decomposition, 135
Sobolev spaces, 24
Stokes problem, 35
subadditive, 593
subadditivity, 110, 592
subdifferential, 331
support function, 309
tangent cone, 338
test function, 31
test functions, 15
tightness
  for nonnegative Borel measures, 133
  for Young measures, 139
uniformly convex, 52, 55
uniformly integrable, 58, 144, 145, 148, 450, 451, 460
value function, 363
Vitali’s covering theorem, 115
Von Neumann’s minimax theorem, 360
weak
  convergence, 41
  solution, 31
  topology, 41, 44
weak∗ topology, 60
Weierstrass
  example, 12
  theorem, 87
Young measure, 138
Young measures
  W^{1,p}-Young measures, 450
  associated with functions, 142
  generated by functions, 143