Differential geometry and the calculus of variations (Mathematics in science and engineering volume 49)


Author: Robert Hermann


DIFFERENTIAL GEOMETRY AND THE CALCULUS OF VARIATIONS

This is Volume 49 in
MATHEMATICS IN SCIENCE AND ENGINEERING
A series of monographs and textbooks
Edited by RICHARD BELLMAN, University of Southern California

A complete list of the books in this series appears at the end of this volume.

DIFFERENTIAL GEOMETRY AND THE CALCULUS OF VARIATIONS Robert Hermann UNIVERSITY OF CALIFORNIA SANTA CRUZ, CALIFORNIA


ACADEMIC PRESS New York and London

1968

COPYRIGHT © 1968, BY ACADEMIC PRESS INC.

ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.

ACADEMIC PRESS INC.

111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD. Berkeley Square House, London W.1

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 68-14664

PRINTED IN THE UNITED STATES OF AMERICA

Preface

Differential geometry has radically changed in the last twenty years: A “global” approach based on the theory of manifolds and inspired, at least in part, by progress in the sister field of topology has replaced the traditional methods. However, unlike topology (and like, say, analysis) the problems have not really changed, and the student who ignores the history of the subject cuts himself off from the richest sources of intuition. In fact, one might say that the new methods are just a systematization of viewpoints that have always been inherent in the subject, at least in the work of such masters as Lie, Darboux, Cartan, Levi-Civita, and Carathéodory. (These are the men from the classical period of differential geometry whose work will appear often in this book.)

This volume is meant to serve a variety of functions. It was originally planned to show to mathematically inclined engineers and physicists how differential forms and vector fields could be used in the calculus of variations and Hamilton-Jacobi theory, i.e., in the mathematics of classical mechanics. However, over the years in which the book was written its scope has widened, and now differential geometry itself is emphasized. Hopefully, enough of the applied flavor remains to interest the audience for whom it was originally intended.

Half of the book is an exposition of the geometric side of the classical one-independent-variable calculus of variations and Hamilton-Jacobi theory, corresponding to the classical treatises “Leçons sur les Invariants Intégraux” by E. Cartan, and “Variationsrechnung” by C. Carathéodory. Now, this material has been in a complete form for at least 50 years. The reasons for giving it such prominence are (a) I like it, feel it has great beauty, and deplore that it has virtually disappeared from mathematical education, and (b) I think that its combination of qualitative geometric reasoning and detailed computation is very useful as a model for training in differential geometry and mathematical physics. Especially important, the student can really learn how vector fields and differential forms, the building blocks of the subject, are used. However, this is not a systematic general treatment of the calculus of variations. (The excellent treatise by Gelfand and Fomin is highly recommended here.)

In differential geometry today the most important variational structures are Riemannian metrics. Accordingly, I have gone as far as seemed profitable in a study of general variational structures, then switched to Riemannian geometry, which has a different flavor because of the intervention of the theory of affine connections. This switch causes a discontinuity in the nature of the material presented in the book. In the first two parts the reader will find the material presented along more classical lines, while in the third and fourth parts we change gears in order to present to the reader the different outlook of contemporary differential geometry.

Although in principle the only prerequisites would be a good course in advanced calculus and possibly some vector and/or tensor analysis, it probably would be best for the reader to be familiar with the introduction to differential forms given in the book by H. Flanders [1] and with the introduction to Lie groups and the vector field concept given in the book by Auslander and MacKenzie [1]. Spivak's book [1] is recommended as preparation in calculus, and we have referred to it occasionally for proofs. Abraham's book [1] can be consulted for an alternate treatment of many topics, as well as an introduction to the advanced parts of classical mechanics.

The beginner in differential geometry will find that the matter of notations is the most annoying obstacle to grasping the fundamental ideas. In fact, there is an amusing definition of modern differential geometry as “the study of invariance under change of notation.” I believe that the situation is not really this bad, and that there is a reasonably optimal notation available, namely what can be called the differentiable manifold-vector field-differential form notation.
However, one must be prudent in using the big guns of modern mathematics, and the reader will notice that various currently fashionable bits of jargon have been exorcised from the treatment. The aim is not to construct a quasi-algebraic apparatus to tackle a few central problems, but rather to achieve a synthesis of algebraic, analytical, and topological techniques to cover a variety of topics. While it would be desirable in the abstract to say that the “global” problems are the central ones, it is usually not possible to make a decisive distinction between “local” and “global.”

This is the first in what might be a two volume work. Several items in this volume may seem isolated from the rest of the book. Some were put in for their own sake as interesting side points, but others are planned as introductions to topics that will be covered more systematically in the second volume.

This book was begun at the Lincoln Laboratory of Massachusetts Institute of Technology; I am greatly indebted to my colleagues there. A grant of support for one year from the Mathematics Division of the Air Force Office of Scientific Research enabled me to extend the scope of the book; it was completed at Argonne National Laboratory.

I am, of course, indebted to many colleagues for conversations and ideas and I would like to thank them. I shall attempt a partial listing: W. Ambrose, L. Auslander, M. Berger, S. S. Chern, J. M. Cook, R. Crittenden, B. Friedman, P. Griffiths, S. Helgason, R. Kalman, W. Klingenberg, N. Kuiper, C. C. Moore, J. Moser, R. Palais, R. Prosser, S. Smale, I. M. Singer, D. C. Spencer, S. Sternberg, and H. C. Wang. J. Moyal and Harley Flanders have read part of the manuscript and made many suggestions.

April, 1968

R. HERMANN


Contents

PREFACE

Part 1. Differential and Integral Calculus on Manifolds

1. Introduction
2. Tangent Vector-Vector Field Formalism
3. Differential Forms
4. Specialization to Euclidean Spaces: Differential Manifolds
5. Mappings, Submanifolds, and the Implicit Function Theorem
6. The Jacobi Bracket and the Lie Theory of Ordinary Differential Equations
7. Lie Derivation and Exterior Derivative; Integration on Manifolds
8. The Frobenius Complete Integrability Theorem
9. Reduction of Dimension When a Lie Algebra of Vector Fields Leaves a Vector-Field Invariant
10. Lie Groups
11. Classical Mechanics of Particles and Continua

Part 2. The Hamilton-Jacobi Theory and Calculus of Variations

12. Differential Forms and Variational Problems
13. Hamilton-Jacobi Theory
14. Extremal Fields and Sufficient Conditions for a Minimum
15. The Ordinary Problems of the Calculus of Variations
16. Groups of Symmetries of Variational Problems: Applications to Mechanics
17. Elliptic Functions
18. Accessibility Problems for Path Systems

Part 3. Global Riemannian Geometry

19. Affine Connections on Differential Manifolds
20. The Riemannian Affine Connection and the First Variation Formula
21. The Hopf-Rinow Theorem; Applications to the Theory of Covering Spaces
22. The Second Variation Formula and Jacobi Vector Fields
23. Sectional Curvature and the Elementary Comparison Theorems
24. Submanifolds of Riemannian Manifolds
25. Groups of Isometries
26. Deformation of Submanifolds in Riemannian Spaces

Part 4. Differential Geometry and the Calculus of Variations: Additional Topics in Differential Geometry

27. First-Order Invariants of Submanifolds and Convexity for Affinely Connected Manifolds
28. Affine Groups of Automorphisms. Induced Connections on Submanifolds. Projective Changes of Connection
29. The Laplace-Beltrami Operator
30. Characteristics and Shock Waves
31. The Morse Index Theorem
32. Complex Manifolds and Their Submanifolds
33. Mechanics on Riemannian Manifolds

BIBLIOGRAPHY

SUBJECT INDEX

DIFFERENTIAL GEOMETRY AND THE CALCULUS OF VARIATIONS


Part 1

DIFFERENTIAL AND INTEGRAL CALCULUS ON MANIFOLDS

1 Introduction

We begin by recalling the main principles of ordinary three-dimensional vector analysis. The underlying space is the space of three real variables $x = (x_1, x_2, x_3)$. A scalar field is a real-valued function $f(x_1, x_2, x_3)$. A vector field, denoted by $X$, say, is an ordered triplet $(A_1(x), A_2(x), A_3(x))$ of scalar functions. (In the usual geometric representation of a vector as a directed line segment in Euclidean space, they are just the components along the three coordinate axes. However, we shall try to avoid using the Euclidean properties of three-dimensional number space.) Gibbsian vector analysis, as commonly used in the physical sciences, is concerned with the rules of calculation and the physicogeometric interpretation of the six basic operations on vector and scalar fields. The three basic algebraic operations are:

(a) Multiplication of a scalar $f$ by a vector: $fX = (fA_1, fA_2, fA_3)$.

(b) Dot or inner product of vector fields. If $X = (A_1, A_2, A_3)$, $Y = (B_1, B_2, B_3)$, then $X \cdot Y$ is the scalar field $A_1 B_1 + A_2 B_2 + A_3 B_3$.

(c) Vector or cross product of vector fields $X$ and $Y$:

$$X \times Y = (A_2 B_3 - B_2 A_3,\; B_1 A_3 - A_1 B_3,\; A_1 B_2 - B_1 A_2).$$

These operations really involve only the algebraic properties of the range spaces of the scalar and vector fields. However, when we turn to the following three basic operations, which involve differentiation as well, it is clear that the domain spaces play a vital role also:

(a) Gradient. If $f$ is a scalar field, $\operatorname{grad} f$ is the vector field:

$$\operatorname{grad} f = \left(\frac{\partial f}{\partial x_1},\; \frac{\partial f}{\partial x_2},\; \frac{\partial f}{\partial x_3}\right).$$

(b) Divergence. If $X = (A_1, A_2, A_3)$ is a vector field, $\operatorname{div} X$ is the scalar field:

$$\operatorname{div} X = \frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3}.$$

(c) Curl. If $X$ is a vector field, $\operatorname{curl} X$ is the vector field:

$$\operatorname{curl} X = \left(\frac{\partial A_3}{\partial x_2} - \frac{\partial A_2}{\partial x_3},\; \frac{\partial A_1}{\partial x_3} - \frac{\partial A_3}{\partial x_1},\; \frac{\partial A_2}{\partial x_1} - \frac{\partial A_1}{\partial x_2}\right).$$

These operations involve, in the last analysis, writing out everything in terms of three components, which is cumbersome. However, there are a few simple rules of combination that, when proved once and for all, enable one to work problems of physics, differential geometry, and other fields without referring back to the components at each stage. For example:

$$\operatorname{curl}(\operatorname{grad} f) = 0,$$
$$X \times (Y \times Z) = (X \times Y) \times Z + Y \times (X \times Z),$$
$$X \times Y = -Y \times X,$$
$$X \cdot Y = Y \cdot X.$$

Of course these operations also have a simple physical or geometric interpretation, but we shall not consider either in detail at this time. As examples:

(1) $\operatorname{grad} f$ is the direction of steepest ascent of the function $f$; or, which says the same thing in different words, $\operatorname{grad} f$ is perpendicular to the level surfaces of $f$.

(2) $\Delta f = \operatorname{div}(\operatorname{grad} f) = 0$ expresses the absence of sources and sinks of a flow with a potential $f$.
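The first of the rules of combination listed above can be checked once in components, after which it never needs to be expanded again. The following is a sketch only; it assumes that $f$ has continuous second partial derivatives (a hypothesis not stated explicitly in the text), so that mixed partial derivatives commute.

```latex
% Component check of curl(grad f) = 0.
% Assumption: f is twice continuously differentiable, so mixed partials commute.
\begin{align*}
\operatorname{grad} f &= (A_1, A_2, A_3), \qquad A_i = \frac{\partial f}{\partial x_i},\\[4pt]
\operatorname{curl}(\operatorname{grad} f)
 &= \left(\frac{\partial A_3}{\partial x_2} - \frac{\partial A_2}{\partial x_3},\;
          \frac{\partial A_1}{\partial x_3} - \frac{\partial A_3}{\partial x_1},\;
          \frac{\partial A_2}{\partial x_1} - \frac{\partial A_1}{\partial x_2}\right)\\[4pt]
 &= \left(\frac{\partial^2 f}{\partial x_2\,\partial x_3} - \frac{\partial^2 f}{\partial x_3\,\partial x_2},\;
          \frac{\partial^2 f}{\partial x_3\,\partial x_1} - \frac{\partial^2 f}{\partial x_1\,\partial x_3},\;
          \frac{\partial^2 f}{\partial x_1\,\partial x_2} - \frac{\partial^2 f}{\partial x_2\,\partial x_1}\right)
  = (0, 0, 0).
\end{align*}
```

Each component vanishes for the same reason, which is the kind of regularity that the component-free formalism developed later makes visible at a glance.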

However, the simplicity and beauty of this scheme when applied to everyday problems of the physical sciences almost inevitably forces difficulties and awkwardness when problems involving change of coordinates are encountered. This awkwardness can often be circumvented by a clever combination of physical and mathematical reasoning, but there are limits to what can be done because these operations depend on the flat or Euclidean structure of the underlying space. One might suppose that tensor analysis offers a way out of this dilemma by bringing the requirements of invariance under change of coordinates to the foreground, but the geometric properties, at least, of the fundamental objects and operations are often hidden in a maze of indices and conventions. The great advantages of tensor analysis are that some of the formal simplicity of ordinary vector analysis is retained and that it is the supreme tool in subjects where extensive computations must be made. However, in this book we shall develop and use what we call the formalism of vector-fields-differential forms in n-dimensional spaces and manifolds. Despite the fact that this formalism may, as justly as tensor analysis, be regarded as the direct generalization of the ordinary vector analysis outlined above, it has up to now occupied almost no place in the mathematical arsenal of the theoretical physicist or engineer. We shall attempt to provide the reader with a link


between his presumed knowledge of ordinary vector analysis and/or tensor analysis by defining as explicitly as possible the notion of vector field and differential form on the Euclidean n-space ; further, we shall motivate this notion by applications to the theory of ordinary differential equations and the calculus of variations.

2 Tangent Vector-Vector Field Formalism

It will be assumed that the reader knows the rudiments of finite dimensional vector-space theory and point-set topology. For the latter, this should include familiarity with such notions as “compactness,” “continuity,” “Hausdorff,” and “topological space,” and with the elementary general theorems interrelating these notions. However, in order to correlate these principles with material that the reader may have encountered in physics or engineering (for example, tensor analysis), we shall show in the next chapter how the notions introduced in this chapter assume a more familiar form for Euclidean spaces. Most of our work in Parts 1 and 2 will be concerned with them.

The aim of our formalism is to be able to carry over differential calculus from Euclidean spaces to more general (finite or infinite dimensional) spaces. Now, the primitive notion in calculus is the idea of a derivative of a real-valued function $t \to f(t)$ of a real variable $t$:

$$\frac{df}{dt}(t) = f'(t) = \lim_{\Delta t \to 0} \frac{f(t + \Delta t) - f(t)}{\Delta t}. \tag{2.1}$$

This is extended to real-valued functions $f(x_1, \dots, x_n)$ of $n$ real variables by defining the partial derivatives:

$$\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}.$$

However, this is just a special case of (2.1):

$$\frac{\partial f}{\partial x_1} \text{ is the derivative of the function } t \to f(x_1 + t, x_2, \dots, x_n) \text{ at } t = 0. \tag{2.2}$$

The coordinates $(x_1, \dots, x_n)$ play a special role in this definition, since if $x_1'(x), \dots, x_n'(x)$ are new coordinates, the

$$\frac{\partial f}{\partial x_1'}, \dots, \frac{\partial f}{\partial x_n'}$$

are quite different from the

$$\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}.$$

However, let us analyze what is done in (2.2). We take the curve $t \to (x_1 + t, x_2, \dots, x_n)$ in $R^n$, restrict the function to the curve, and differentiate it by the rule (2.1).† However, notice that we have used only curves of a special type here; namely, those that are coordinate lines. When we change coordinates, naturally we change this system of curves. Thus we can encompass all possible changes of coordinates by considering this “directional derivative” process applied to all‡ curves in $R^n$. We shall now describe the mathematical structure (the “tangent bundle”) to which this leads.

Suppose $t \to x(t)$ is such a curve, with $t$ running over the interval $0 \le t \le 1$. Let us pick any point in the curve. For illustrative purposes, suppose this is the point $t = 0$. Consider the mapping

$$f \to \frac{d}{dt} f(x(t))\Big|_{t=0} \tag{2.3}$$

from the vector space of real-valued functions of $x$ to real numbers. There are two algebraic rules satisfied by this process:

(a) It is linear.

(b) If $f(x)$ and $g(x)$ are functions,

$$\frac{d}{dt}\bigl[f(x(t))\,g(x(t))\bigr]\Big|_{t=0} = \Bigl(\frac{d}{dt} f(x(t))\Big|_{t=0}\Bigr)\, g(x(0)) + f(x(0))\, \Bigl(\frac{d}{dt} g(x(t))\Big|_{t=0}\Bigr). \tag{2.4}$$

Explicitly, of course, if $x(t) = (x_1(t), \dots, x_n(t))$, and if

$$v_1 = \frac{dx_1}{dt}(0), \;\dots,\; v_n = \frac{dx_n}{dt}(0),$$

then

$$\frac{d}{dt} f(x(t))\Big|_{t=0} = v_1 \frac{\partial f}{\partial x_1}(x(0)) + \cdots + v_n \frac{\partial f}{\partial x_n}(x(0)). \tag{2.5}$$

We are accustomed to interpreting $(v_1, \dots, v_n)$ as the “tangent vector” to the curve $t \to x(t)$ at $t = 0$. But (2.5) tells us that by making use of the Euclidean structure of $R^n$ and by using its ordinary coordinate system, the linear mapping (2.3) can be essentially identified with the “tangent vector” to the curve. This suggests in general that we can say that two curves $t \to x(t)$ and $t \to y(t)$ have the same tangent vector or “have contact to first order” at $t = 0$ if the functional defined by (2.3) is the same. Also (2.5) tells us that two such curves must meet at $t = 0$. Thus we can say that the set of “tangent vectors” to a point $x(0)$ of $R^n$ is the set of mappings of the form (2.3). This is an idea that is obviously independent of coordinates. Let us use it to define analogous notions for more general spaces.

† $R^n$ denotes the set of $n$-tuples of real numbers considered, say, as a vector space over the real numbers. We shall also use a vector notation $x = (x_1, \dots, x_n)$ when it is convenient; no confusion is likely.

‡ Of course we are not making precise questions of differentiability of curves and functions that are necessary to enable the derivatives to make sense. This will be made more precise in the next chapter; in general, we assume everything is differentiable infinitely many times.

Let $M$ be a space. The set of real-valued functions on $M$ forms a ring. Such functions can be added and multiplied in the usual way. Let $F(M)$ be a subring of the ring of all functions that we shall regard as the “basic” functions on $M$. For example, if $M$ is $R^n$, we shall want $F(M)$ to be the ring of all functions that depend on the underlying variables $x_1, \dots, x_n$ in an infinitely differentiable ($C^\infty$) way. In this chapter we shall not be so precise about this ring, but shall regard it as given. We shall denote points of $M$ by such letters as $p$, $q$, and elements of $F(M)$ by such letters as $f, g, \dots$, occasionally confusing the function $f$ with its values $f(p)$, and abbreviating $F(M)$ to $F$ if $M$ is fixed in the discussion.

Definition

Let $p$ be a point of $M$. A tangent vector to $p$ is a linear mapping, typically denoted by $v$, of $F(M)$ to $R$, such that

$$v(fg) = v(f)\,g(p) + v(g)\,f(p) \qquad \text{for } f, g \in F. \tag{2.6}$$

(Note that (2.6) corresponds to (2.4b).) The set of all such linear mappings is called the tangent space to $M$ at $p$, denoted by $M_p$. Since two such mappings defined at the same point $p$ can be added and multiplied by real scalars, the tangent vectors at $p$ form a real vector space. The union $\bigcup_{p \in M} M_p$ of tangent spaces to all points of $M$ forms a new space called the tangent bundle to $M$, denoted by $T(M)$.

We can consider a curve in $M$ as a mapping (typically denoted by such letters as $\sigma$ or $\gamma$) of an interval of real numbers into points of $M$. Throughout this book $t$ will denote such a real parameter. For simplicity, we normalize the interval to $0 \le t \le 1$. (Occasionally, $s$ will also serve to denote a real parameter.) For each point of the interval we define the tangent vector to $\sigma$ at $t$, denoted by $\sigma'(t)$, as follows:

$$\sigma'(t)(f) = \frac{d}{dt} f(\sigma(t)) \qquad \text{for } f \in F(M). \tag{2.7}$$

(Of course $\sigma$ cannot be an arbitrary continuous curve, since the derivative in (2.7) must exist, but we assume that the reader has enough experience from advanced calculus to formulate the correct differentiability hypotheses.) Thus

$$\sigma'(t) \in M_{\sigma(t)} \qquad \text{for } 0 \le t \le 1.$$
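The claim that $\sigma'(t)$ lies in $M_{\sigma(t)}$ amounts to checking the product rule (2.6) at the point $p = \sigma(t)$. The short derivation below is a sketch under the assumption that (2.7) reads $\sigma'(t)(f) = \frac{d}{dt} f(\sigma(t))$ and that $f$, $g$, and $\sigma$ are differentiable enough for the ordinary product rule to apply.

```latex
% Check that sigma'(t) satisfies the product rule (2.6) at p = sigma(t).
% Assumption: (2.7) is sigma'(t)(f) = d/dt f(sigma(t)); f, g, sigma are differentiable.
\begin{align*}
\sigma'(t)(fg)
 &= \frac{d}{dt}\Bigl[f(\sigma(t))\,g(\sigma(t))\Bigr]\\
 &= \Bigl[\frac{d}{dt} f(\sigma(t))\Bigr]\, g(\sigma(t))
    + f(\sigma(t))\,\Bigl[\frac{d}{dt} g(\sigma(t))\Bigr]\\
 &= \sigma'(t)(f)\, g(p) + \sigma'(t)(g)\, f(p), \qquad p = \sigma(t).
\end{align*}
```

Linearity of $f \to \sigma'(t)(f)$ follows in the same way from linearity of $d/dt$, so both requirements in the definition of a tangent vector are met.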


At this general level, it is not possible to assert that, for each point $p \in M$ and each $v \in M_p$, there is a curve passing through $p$ whose tangent vector is $v$. Thus a “tangent vector” is not necessarily, as was found for Euclidean space, an equivalence class of curves passing through $p$, with two curves “identified” if they meet at $p$ to “first order of contact.” The definition we have adopted is the appropriate one if we want tangent vectors to form a vector space.

Regard a curve as a mapping $\sigma: [0, 1] \to M$. Its tangent vector $t \to \sigma'(t)$ then defines a mapping $\sigma': [0, 1] \to T(M)$. It is a cross section in the sense that applying the “projection mapping” $T(M) \to M$ (which assigns to each vector $v \in T(M)$ the point to which it is “attached”) gives back $\sigma$. Here we touch on the theory of fiber bundles. It will repay our investment to pause and explain the geometric idea of a “vector bundle,” which is a special case of a fiber bundle.

Definition

Consider a mapping $\pi$ of a space $E$ onto a space $M$. It is called a vector bundle if, for each $p \in M$, the inverse image $\pi^{-1}(p)$ (the fiber above $p$) is a real vector space. (In the case $E = T(M)$, the fiber above $p$ is just $M_p$, the tangent space to $M$ at $p$.)

The vector-bundle concept permeates all modern mathematics, and is relevant to many ideas in physics, particularly in quantum field theory. Intuitively, the space $M$ (the base space of the bundle) is “nonlinear,” while the fibers are linear: $E$ is “nonlinear” horizontally and is “linear” vertically. (See Fig. 1.)

FIGURE 1

In this book we shall not consider vector bundles very extensively, although some of the language and naive geometric intuition built up for their study will be helpful. (For example, the book by Auslander and MacKenzie [1] is recommended for its more extensive treatment of many ideas mentioned here.) Associated with a vector bundle $\pi: E \to M$ is the concept of cross section, denoted when general vector bundles are being discussed by $\psi$. It is a map $M \to E$ such that

$$\pi\psi(p) = p \qquad \text{for all } p \in M. \tag{2.8}$$

The set of cross sections is denoted by $\Gamma(E)$.



Although the points of $E$ that are in different fibers cannot be added, cross sections can be:

$$(\psi_1 + \psi_2)(p) = \psi_1(p) + \psi_2(p) \qquad \text{for } p \in M,\; \psi_1, \psi_2 \in \Gamma(E). \tag{2.9}$$

Notice that (2.8) guarantees that $\psi_1(p)$ and $\psi_2(p)$ lie in the same fiber $\pi^{-1}(p)$, and the basic postulate of a vector bundle allows us to add $\psi_1(p)$ and $\psi_2(p)$. Further, $\psi \in \Gamma(E)$ can be multiplied by a function $f \in F(M)$:

$$(f\psi)(p) = f(p)\,\psi(p) \qquad \text{for } p \in M. \tag{2.10}$$

In algebraic jargon, $\Gamma(E)$ forms a module over the ring $F(M)$. As we shall see, many concepts in differential geometry take on optimally elegant analytical form when expressed in this module language.

For example, let us look at the case $E = T(M)$. A cross section, which we denote in this case by such letters as $X, Y, \dots$, can be geometrically described, then, as a vector field, since to each $p \in M$ it assigns a tangent vector $X(p)$ lying in $M_p$. In this case we denote $\Gamma(E)$ by $V(M)$. The “module” description of $V(M)$ will be very convenient for us. Let $X \in V(M)$, that is, $X$ assigns a tangent vector $X(p) \in M_p$ to each $p \in M$. Thus, for $f \in F(M)$, $p \to X(p)(f)$ defines another function on $M$. Assume that it too belongs to $F(M)$, and denote it by $X(f)$. $X$ then defines a linear mapping $F(M) \to F(M)$, called the Lie derivative operation. $X(f)$ is called the Lie derivative of $f$ by $X$. By definition,

$$X(f)(p) = X(p)(f). \tag{2.11}$$

The basic property (2.6) of tangent vectors then allows us to note a property of $X$ as a mapping $F(M) \to F(M)$:

$$X(fg) = X(f)\,g + f\,X(g) \qquad \text{for } f, g \in F(M), \tag{2.12}$$

that is, $X$ is a derivation of the ring $F(M)$. This property enables us in favorable cases (for example, if $M$ is a differentiable manifold and $F(M)$ is the ring of $C^\infty$ functions) to define a vector field as a derivation of $F(M)$. In local coordinates, we shall see that such a derivation is nothing but a first-order, linear differential operator. Conversely, suppose $X: F(M) \to F(M)$ satisfies (2.12). Then $X$ defines a cross section $p \to X(p) \in M_p$ of the tangent bundle to $M$:

$$X(p)(f) = X(f)(p).$$

For the purposes of this chapter, we have not made any distinction between the objects $X$ defined in either of two ways, either “geometrically” as cross sections of $T(M)$ or “algebraically” as derivations of $F(M)$.
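To make the derivation property (2.12) concrete, here is a small check on $M = R^2$. It is a sketch only: the particular operator $X$ and the functions $f$, $g$ are chosen here for illustration and do not come from the text, and writing $X$ as a first-order differential operator anticipates the local-coordinate description that the text defers to a later chapter.

```latex
% Illustrative (hypothetical) example on R^2 with coordinates (x_1, x_2):
% X is the rotation field, written as a first-order linear differential operator.
\begin{align*}
X &= x_1 \frac{\partial}{\partial x_2} - x_2 \frac{\partial}{\partial x_1},
 \qquad f = x_1, \quad g = x_2,\\[4pt]
X(f) &= -x_2, \qquad X(g) = x_1,\\[4pt]
X(fg) &= x_1 \frac{\partial (x_1 x_2)}{\partial x_2} - x_2 \frac{\partial (x_1 x_2)}{\partial x_1}
       = x_1^2 - x_2^2,\\[4pt]
X(f)\,g + f\,X(g) &= (-x_2)(x_2) + (x_1)(x_1) = x_1^2 - x_2^2 .
\end{align*}
```

The two sides agree, as (2.12) requires; evaluating at a point $p$ then recovers the tangent vector $X(p)$ through $X(p)(f) = X(f)(p)$.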

3 Differential Forms

Whenever we are given a vector bundle $\pi: E \to M$, we can construct “new” vector bundles by the operations of tensor algebra on the fibers. As an example, we mention the dual space and the skew-symmetric multilinear forms.

Let $V$ be a real vector space whose elements we denote by $v$ and call “vectors.” A “covector” is a linear mapping, typically denoted by $\omega$, of $V$ into $R$, the real numbers. The space $V^*$ of covectors forms a new vector space called the dual space to $V$. If $V$ is finite dimensional, $V^*$ has the same dimension as $V$. In fact, suppose $v_1, \dots, v_n$ is a basis for $V$; that is, every element $v \in V$ can be written in the unique form

$$v = a_1 v_1 + \cdots + a_n v_n.$$

The coefficients $a_1, \dots, a_n$ in this expansion depend linearly on $v$, and hence define linear forms on $V$, that is, elements of $V^*$, which we denote by $\omega_1, \dots, \omega_n$. One can prove (as an exercise) that $\omega_1, \dots, \omega_n$ forms a basis for $V^*$, called the dual basis to the given basis $(v_1, \dots, v_n)$ of $V$. It can also be characterized by the condition†

$$\omega_i(v_j) = \delta_{ij}, \qquad 1 \le i, j \le n.$$
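As a small illustration of the dual basis, the following sketch works in $V = R^2$ with a basis chosen purely for the example (the vectors $v_1 = (1, 0)$ and $v_2 = (1, 1)$ are an assumption, not taken from the text). It solves for the coefficients $a_1, a_2$ of a general vector and confirms the Kronecker-delta condition.

```latex
% Hypothetical example: V = R^2 with basis v_1 = (1,0), v_2 = (1,1).
\begin{align*}
v = (a, b) &= a_1 v_1 + a_2 v_2 = (a_1 + a_2,\; a_2)
  \quad\Longrightarrow\quad a_2 = b, \quad a_1 = a - b,\\[4pt]
\omega_1(a, b) &= a - b, \qquad \omega_2(a, b) = b,\\[4pt]
\omega_1(v_1) &= 1, \quad \omega_1(v_2) = 0, \quad
\omega_2(v_1) = 0, \quad \omega_2(v_2) = 1 .
\end{align*}
```

Note that $\omega_1$ is not the first coordinate function: the dual basis depends on the whole basis $(v_1, v_2)$, not on $v_1$ alone.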

An $r$-covector on $V$ is a mapping

$$(v_1, \dots, v_r) \to \omega(v_1, \dots, v_r),$$

with domain the $r$-tuples of elements of $V$ and with values in the real numbers. We can indicate this by the notation $\omega: V \times \cdots \times V \to R$. We require that $\omega$ be multilinear in the sense that it is linear in each of the variables $v_1, \dots, v_r$ when all others are held fixed. In addition, we require that it be skew-symmetric in the sense that $\omega(v_1, \dots, v_r)$ changes sign when neighboring arguments are permuted; that is,

$$\omega(v_1, \dots, v_r) = -\omega(v_2, v_1, v_3, \dots, v_r),$$
$$\omega(v_1, v_2, v_3, \dots, v_r) = -\omega(v_1, v_3, v_2, v_4, \dots, v_r),$$

and so forth. Again, the set of these $r$-covectors forms a vector space, which we shall denote by $V^{*r}$. If $\omega \in V^{*r}$, $v \in V$, we shall define $v \lrcorner \omega$, the contraction of $\omega$ by $v$, or the inner product of $\omega$ with $v$, in the following way:

$$(v \lrcorner \omega)(v_1, \dots, v_{r-1}) = \omega(v, v_1, \dots, v_{r-1}).$$

† $\delta_{ij}$ is the “Kronecker delta” symbol; it is zero except when $i = j$, when it is 1.

Thus $v \lrcorner \omega$ is the element of $V^{*(r-1)}$ resulting from holding fixed one of the $r$ arguments of $\omega$. We shall find the mapping from $V^{*r}$ to $V^{*(r-1)}$ convenient for proving facts about $r$-covectors by induction on $r$. If $r = 1$, of course $v \lrcorner \omega$ coincides with $\omega(v)$, the value of the linear form $\omega$ on $v$. We shall use both notations interchangeably.

If $\omega_1$ is an $r$-covector and $\omega_2$ an $s$-covector, a “product” form denoted by $\omega_1 \wedge \omega_2$ can be defined as an $(r + s)$-covector. It is called the exterior product of $\omega_1$ and $\omega_2$. Roughly, it is obtained in the following way: Consider $(r + s)$ vectors $v_1, \dots, v_{r+s}$. One can assign to them the number

$$\omega_1(v_1, \dots, v_r)\,\omega_2(v_{r+1}, \dots, v_{r+s}).$$

However, this assignment does not depend skew-symmetrically on all variables. It can be made so by permuting the variables and adding up the results, with appropriate signs. For example, if $r = s = 1$,

$$(\omega_1 \wedge \omega_2)(v_1, v_2) = \omega_1(v_1)\,\omega_2(v_2) - \omega_1(v_2)\,\omega_2(v_1).$$

Notice that from this formula follows

$$v \lrcorner (\omega_1 \wedge \omega_2) = \omega_1(v)\,\omega_2 - \omega_2(v)\,\omega_1 = (v \lrcorner \omega_1) \wedge \omega_2 - \omega_1 \wedge (v \lrcorner \omega_2).$$

(If $c$ is a constant, $c \wedge \omega$ is taken to be just $c\omega$.) The general formula can be guessed as

$$v \lrcorner (\omega_1 \wedge \omega_2) = (v \lrcorner \omega_1) \wedge \omega_2 + (-1)^r\, \omega_1 \wedge (v \lrcorner \omega_2). \tag{3.1}$$
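For $r = s = 1$ the exterior product is a $2 \times 2$ determinant, and formula (3.1) can be checked directly against it. The sketch below uses only the $r = s = 1$ formula and the convention $c \wedge \omega = c\omega$ stated in the text; the letter $w$ is introduced here to denote an arbitrary second vector.

```latex
% Check of (3.1) for r = s = 1; w is an arbitrary vector introduced for the check.
\begin{align*}
(\omega_1 \wedge \omega_2)(v, w)
 &= \omega_1(v)\,\omega_2(w) - \omega_1(w)\,\omega_2(v)
  = \det\begin{pmatrix} \omega_1(v) & \omega_1(w)\\ \omega_2(v) & \omega_2(w) \end{pmatrix},\\[4pt]
\Bigl[(v \lrcorner \omega_1) \wedge \omega_2 + (-1)^{1}\, \omega_1 \wedge (v \lrcorner \omega_2)\Bigr](w)
 &= \omega_1(v)\,\omega_2(w) - \omega_2(v)\,\omega_1(w).
\end{align*}
```

The two right-hand sides coincide, so $(v \lrcorner (\omega_1 \wedge \omega_2))(w) = (\omega_1 \wedge \omega_2)(v, w)$ as the definition of contraction demands. Setting $w = v$ gives zero, in agreement with skew-symmetry.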

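Now we turn things around and use (3.1) to define the exterior product $\omega_1 \wedge \omega_2$ by induction on $r + s$. Suppose it is defined for covectors whose degree adds up to a number less than $r + s$, with (3.1) true. Define $\omega_1 \wedge \omega_2$ by the formula:

$$(\omega_1 \wedge \omega_2)(v_1, \dots, v_{r+s}) = \bigl(v_1 \lrcorner (\omega_1 \wedge \omega_2)\bigr)(v_2, \dots, v_{r+s}), \tag{3.2}$$

where $v_1 \lrcorner (\omega_1 \wedge \omega_2)$ is given by the right-hand side of (3.1). We must show that $\omega_1 \wedge \omega_2$ as defined really depends skew-symmetrically on the variables $(v_1, \dots, v_{r+s})$. That it depends skew-symmetrically on the variables $v_2, \dots, v_{r+s}$ follows from our inductive hypotheses that the right-hand side of (3.1) is a genuine $(r + s - 1)$-covector. We must check that it changes sign when $v_1$ and $v_2$ are permuted:

$$(\omega_1 \wedge \omega_2)(v_1, \dots, v_{r+s}) = \bigl(v_2 \lrcorner (v_1 \lrcorner (\omega_1 \wedge \omega_2))\bigr)(v_3, \dots, v_{r+s}).$$

But

$$\begin{aligned}
v_2 \lrcorner (v_1 \lrcorner (\omega_1 \wedge \omega_2))
 &= v_2 \lrcorner \bigl[(v_1 \lrcorner \omega_1) \wedge \omega_2 + (-1)^r\, \omega_1 \wedge (v_1 \lrcorner \omega_2)\bigr]\\
 &= (v_2 \lrcorner (v_1 \lrcorner \omega_1)) \wedge \omega_2 + (-1)^{r-1} (v_1 \lrcorner \omega_1) \wedge (v_2 \lrcorner \omega_2)\\
 &\quad + (-1)^r (v_2 \lrcorner \omega_1) \wedge (v_1 \lrcorner \omega_2) + (-1)^{2r}\, \omega_1 \wedge (v_2 \lrcorner (v_1 \lrcorner \omega_2)).
\end{aligned}$$

It is now clear that this changes sign when $v_1$ and $v_2$ are permuted. We have shown that $\omega_1 \wedge \omega_2$ is defined as an $(r + s)$-covector, satisfying (3.1). The “bilinear” rules

$$(\omega_1 + \omega_1') \wedge \omega_2 = \omega_1 \wedge \omega_2 + \omega_1' \wedge \omega_2, \tag{3.3a}$$
$$\omega_1 \wedge (\omega_2 + \omega_2') = \omega_1 \wedge \omega_2 + \omega_1 \wedge \omega_2', \tag{3.3b}$$

are easy to prove by induction also, and are left to the reader. The “anticommutativity” of the exterior product, namely,

$$\omega_1 \wedge \omega_2 = (-1)^{rs}\, \omega_2 \wedge \omega_1 \tag{3.4}$$

will also be proved by induction on $r + s$. Notice that an identity such as (3.4) between $(r + s)$-forms holds if and only if the identity resulting from applying $v \lrcorner$ to both sides holds for all $v \in V$. But

$$\begin{aligned}
v \lrcorner (\omega_1 \wedge \omega_2)
 &= (v \lrcorner \omega_1) \wedge \omega_2 + (-1)^r\, \omega_1 \wedge (v \lrcorner \omega_2)\\
 &= (-1)^{(r-1)s}\, \omega_2 \wedge (v \lrcorner \omega_1) + (-1)^{r + (s-1)r}\, (v \lrcorner \omega_2) \wedge \omega_1\\
 &\qquad \text{(assuming (3.4) is true for forms whose sum of degrees is } < r + s)\\
 &= (-1)^{rs} \bigl[(-1)^{s}\, \omega_2 \wedge (v \lrcorner \omega_1) + (v \lrcorner \omega_2) \wedge \omega_1\bigr]\\
 &= v \lrcorner \bigl((-1)^{rs}\, \omega_2 \wedge \omega_1\bigr).
\end{aligned}$$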

Then (3.4) is proved.

THEOREM 3.1 Suppose $\omega_1, \dots, \omega_n$ is a basis of $V^*$. Then, for each $r$, the $r$-covectors

$$\omega_{i_1} \wedge \cdots \wedge \omega_{i_r}, \qquad 1 \le i_1 < i_2 \;\dots$$

… $(f) = (v_1 + v_2)(\phi^*(f)) = v_1(\phi^*(f)) + v_2(\phi^*(f))$, which is occasionally called the differential of $\phi$. The geometric interpretation of $\phi_*$ is very natural: Suppose $t \to \sigma(t)$ is a curve in $M$. Let $t \to \phi\sigma(t) = \sigma_1(t)$ be the image curve in $M'$. Then

$$\phi_*(\sigma'(t)) = \sigma_1'(t);$$

that is, the tangent vector of the image curve is the image of the tangent vector to the original curve in $M$ under $\phi$.

Proof.

$$\phi_*(\sigma'(t))(f) = \sigma'(t)(\phi^*(f)) = \frac{d}{dt} f(\sigma_1(t)) = \sigma_1'(t)(f).$$
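A simple case shows the image-curve rule at work, including the possibility that a nonzero tangent vector is sent to zero. This is a sketch under assumptions: the defining formulas are not visible in this part of the text, so we take $\phi^*(f) = f \circ \phi$ and $\phi_*(v)(f) = v(\phi^*(f))$, which is the reading consistent with the proof just given. The map $\phi$ and the curve $\sigma$ are chosen here for illustration only.

```latex
% Hypothetical example: M = R^2, M' = R.
% Assumed definitions: phi^*(f) = f o phi,  phi_*(v)(f) = v(phi^*(f)).
\begin{align*}
\phi(x_1, x_2) &= x_1^2 + x_2^2, \qquad \sigma(t) = (\cos t, \sin t), \quad 0 \le t \le 1,\\[4pt]
\sigma_1(t) &= \phi(\sigma(t)) = \cos^2 t + \sin^2 t = 1,\\[4pt]
\phi_*(\sigma'(t))(f) &= \sigma'(t)(f \circ \phi) = \frac{d}{dt} f(\phi(\sigma(t)))
  = \frac{d}{dt} f(1) = 0 = \sigma_1'(t)(f) \qquad \text{for } f \in F(R).
\end{align*}
```

Here $\sigma'(t)$ is not the zero tangent vector (for instance $\sigma'(t)(x_1) = -\sin t$ and $\sigma'(t)(x_2) = \cos t$ never vanish together), yet its image under $\phi_*$ is zero because the curve runs along a level set of $\phi$.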

Notice that $\phi$ does not have to be either 1-1 or onto in order that $\phi_*$ can be defined on tangent vectors. However, the situation with regard to vector fields is not so simple. Suppose that $X \in V(M)$. Then $p \to \phi_*(X(p))$ is a mapping $M \to T(M')$. One cannot associate a vector field on $M'$ with this mapping in an unambiguous way unless $\phi^{-1}$ exists. (Then one can define an “image” vector field $\phi_*(X)$ as $p' \to \phi_*(X(\phi^{-1}(p'))) = \phi_*(X)(p')$.)

However, differential forms do admit a simple law of transformation under $\phi$: This is one main reason for their usefulness in differential geometry. (The other is that they serve as “volume elements” for integration.) Recall that $\phi^*$ maps $F(M')$ into $F(M)$, that is, maps $F^0(M')$ into $F^0(M)$. We shall now extend this to a map of $F^r(M')$ to $F^r(M)$ for all $r$, that is, to a map sending a differential form $\omega'$ on $M'$ back to a differential form $\phi^*(\omega')$ on $M$. Recall that the linear map $\phi_*: M_p \to M'_{\phi(p)}$ induces a dual map $\phi^*$ from covectors on $M'_{\phi(p)}$ back to covectors on $M_p$, that is, $\phi^*: {M'}^{*r}_{\phi(p)} \to M_p^{*r}$. Regard $\omega'$ as a cross section $p' \to \omega'(p') \in {M'}^{*r}_{p'}$ of $T^{*r}(M')$. Then $\phi^*(\omega')$ is by definition the cross section $p \to \phi^*(\omega'(\phi(p)))$; that is,

$$\phi^*(\omega')(v_1, \dots, v_r) = \omega'(\phi_*(v_1), \dots, \phi_*(v_r)) \qquad \text{for } v_1, \dots, v_r \in T(M). \tag{3.10}$$

20


Now we have the following very nice property of the exterior derivative operation:

$$\phi^*(df) = d\phi^*(f) \qquad \text{for } f \in F(M'). \tag{3.11}$$

(When we extend $d$ to higher-degree differential forms, this property will also extend.)

Proof. For $v \in T(M)$,

$$\phi^*(df)(v) = df(\phi_*(v)) = \phi_*(v)(f) = v(\phi^*(f)) = d\phi^*(f)(v).$$

Notice also that the definition of $\phi^*$ would be impossible if differential forms were defined solely in terms of the $F(M)$-module structure of $V(M)$ and were not the same as those defined as cross sections of the covector bundles.
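Property (3.11) lends itself to a quick numerical sanity check. The sketch below is illustrative only: the map `phi`, the function `f`, the base point, and the tangent vector are arbitrary choices made here (none of them come from the text), and all derivatives are approximated by central differences. It compares $\phi^*(df)(v) = df(\phi_*(v))$ with $d(\phi^*(f))(v)$ for a map of the plane into itself.

```python
import math

def phi(x, y):
    """Illustrative map phi: R^2 -> R^2 (an assumption, not from the text)."""
    return (x * x - y, math.sin(x) + y * y)

def f(u, w):
    """Illustrative real-valued function on the target space."""
    return u * w + math.exp(u)

def gradient(func, point, h=1e-6):
    """Central-difference partial derivatives of func at point."""
    grads = []
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        grads.append((func(*plus) - func(*minus)) / (2 * h))
    return grads

def pushforward(point, v):
    """phi_*(v): the Jacobian matrix of phi at point applied to v."""
    comps = []
    for k in range(2):
        g = gradient(lambda x, y: phi(x, y)[k], point)
        comps.append(sum(gi * vi for gi, vi in zip(g, v)))
    return comps

def pullback_of_df(point, v):
    """phi^*(df)(v) = df(phi_*(v)), with df evaluated at phi(point)."""
    df = gradient(f, phi(*point))
    return sum(a * b for a, b in zip(df, pushforward(point, v)))

def d_of_pullback(point, v):
    """d(phi^* f)(v), where phi^* f is the composite f o phi."""
    g = gradient(lambda x, y: f(*phi(x, y)), point)
    return sum(a * b for a, b in zip(g, v))

p, v = (0.7, -0.3), (1.5, 2.0)
lhs, rhs = pullback_of_df(p, v), d_of_pullback(p, v)
print(abs(lhs - rhs) < 1e-5)
```

The two sides agree up to finite-difference error, which is all that (3.11) asserts in coordinates: it is the chain rule.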

Exercises

1. Suppose $v_1, \ldots, v_n$ is a basis of a vector space $V$. Define $\omega_1, \ldots, \omega_n \in V^*$ such that $v = \omega_1(v)v_1 + \cdots + \omega_n(v)v_n$ for all $v \in V$. Prove that $\omega_1, \ldots, \omega_n$ are a basis of $V^*$.

2. Prove (3.7).

3. Prove the traditional explicit formula for the determinant of a matrix, using the definition of the determinant of a linear transformation given in the text.

4. Prove (3.11) in an explicit way for mappings between Euclidean spaces.

4

Specialization to Euclidean Spaces: Differential Manifolds

Let $R^n$ be the space of $n$ real variables $(x_1, \ldots, x_n)$. Thus, a point of $R^n$ is an ordered $n$-tuple of real numbers; such an $n$-tuple will be denoted by $(x_j)$, $1 \le j \le n$, or by a vector notation $x$ when no confusion is likely. Also, $x_j$, without parentheses, will denote the real-valued function on $R^n$ that assigns the $j$th coordinate to each point of $R^n$.† Consider a domain $D$ in $R^n$; say, for simplicity, that it is convex‡: Let $F(D)$ be the ring of real-valued functions that depend in a $C^\infty$ way on the points of $D$, that is, such that the partial derivatives of all orders exist. If $f \in F(D)$, $\partial f/\partial x_i$, $\partial^2 f/(\partial x_i\,\partial x_j)$, etc., denote the partial derivatives with respect to the indicated variables. We shall now show how the objects such as "tangent vector," "vector field," and "differential form," which were introduced rather abstractly in Chapter 3, take a very familiar form here.

THEOREM 4.1 Let $X \in V(D)$. Then, there are uniquely determined functions $A_1, \ldots, A_n \in F(D)$ such that§

$$X(f) = A_i \frac{\partial f}{\partial x_i} \qquad \text{for } f \in F(D). \tag{4.1}$$

† The reader should note that much of the notational confusion in undergraduate differential calculus is caused by not making this distinction.

‡ A domain in $R^n$ is an open, connected set of points. It is convex if it contains, for any two points $x^0$ and $x'$ in it, the whole line segment $t x^0 + (1 - t)x'$, for $0 \le t \le 1$.

§ We shall use the summation convention from now on. The general rules (as far as they can be formalized) are that when two indices occur in expressions multiplied together, they are to be summed over their "natural range of values" (which, presumably, has already been specified). We do not use lower and upper indices as in tensor analysis; where upper indices are used, usually they will be ordinary counting indices. The same indices should not occur three or more times; this always indicates that a mistake has been made. Occasionally it will be required that indices occurring together not be summed, but this will be stated explicitly. Part of the convention requires that one index standing "alone" take on all values from its natural range; for example, $y_{ijk}\,p_j$ means $\sum_j y_{ijk}\,p_j$ for $i$ and $k = 1, 2, \ldots, n$.

Proof. Let $x^0 = (x_i^0)$ be a fixed (for the moment) point of $D$. Then, given any $C^\infty$ function $f(x)$ in $D$, Taylor's formula implies that $f$ has a representation of the form

$$f(x) = f(x^0) + \frac{\partial f}{\partial x_i}(x^0)\,(x_i - x_i^0) + g_{ij}(x)\,(x_i - x_i^0)(x_j - x_j^0). \tag{4.2}$$

(The functions $g_{ij}(x)$ are those obtained from any of the classical formulas for the remainder; they are $C^\infty$ if $f$ is also.) Calculate $X(f)$, using the linearity and (2.12), and evaluate it at $x^0$ to obtain $X(f)(x^0)$. Notice, for example, that $X(f(x^0)) = 0$, since $X$ applied to any constant function is, from (2.12), zero. The result is

$$X(f)(x^0) = \frac{\partial f}{\partial x_i}(x^0)\,X(x_i)(x^0).$$

(The third terms drop out when $x^0$ is substituted.) Now $x_i$ is merely another function on $D$: Call $X(x_i) = A_i$. Thus, letting $x^0$ vary also, we have

$$X(f) = A_i \frac{\partial f}{\partial x_i} = A_i \frac{\partial}{\partial x_i}(f),$$

whence the theorem. Q.E.D.

Theorem 4.1 can be interpreted in the following way: Consider the differential operators $\partial/\partial x_i$, $i = 1, \ldots, n$, which map $f$ into $\partial f/\partial x_i$. They satisfy (2.12) and hence define elements of $V(D)$ (if we define $V(D)$ as derivations of $F(D)$). Theorem 4.1 then asserts that they form a basis for the module $V(D)$ over the ring $F(D)$. Now we can state precisely how vector fields are defined as cross sections of the tangent bundle.

THEOREM 4.2 Let $x^0$ be a point of $D$, and let $v \in D_{x^0}$. Then

$$v(f) = v(x_i)\,\frac{\partial f}{\partial x_i}(x^0) \qquad \text{for } f \in F(D);$$

that is,

$$v = v(x_i)\,\frac{\partial}{\partial x_i}(x^0). \tag{4.3}$$

In particular, we see that the values of the vector fields $\partial/\partial x_i$ at each point of $D$ form a basis of the tangent space.

Proof. Applying $v$ to (4.2) gives the result, using (2.6). Q.E.D.

Theorem 4.2 tells us that $T(D)$ itself is parametrized by a domain $D' \subset R^{2n}$, for any $v \in D_x$ admits a unique representation of the form

$$a_i\,\frac{\partial}{\partial x_i}(x):$$

The assignment $v \to (x, a_i)$ defines the correspondence. We can use this to require that a cross section map $X: D \to T(D)$ be $C^\infty$-differentiable. The condition for this is clearly that the functions $x^0 \to X(x_j)(x^0) = A_j(x^0)$ be elements of $F(D)$. We see from Theorem 4.2 that $X$ considered as a cross section arises from the derivation $A_i(\partial/\partial x_i)$ of $F(D)$: There is equivalence of the two possible definitions of vector field.

Let us now examine differential forms. The coordinate functions $x_j$ are elements of $F(D)$. Their differentials $dx_j$ are both $F(D)$-linear functions on $V(D)$ and cross sections of the bundle $T^*(D)$:

$$dx_j(X) = X(x_j) \quad \text{for } X \in V(D), \qquad dx_j(v) = v(x_j) \quad \text{for } v \in T(D).$$

Suppose $\omega$ is any 1-differential form on $D$ in the sense that it is an $F(D)$-linear map of $V(D) \to F(D)$. Then, for $X = A_i(\partial/\partial x_i) \in V(D)$,

$$\omega(X) = A_i\,\omega\Bigl(\frac{\partial}{\partial x_i}\Bigr) = \omega\Bigl(\frac{\partial}{\partial x_i}\Bigr)\,dx_i(X);$$

that is, $\omega = \omega(\partial/\partial x_i)\,dx_i$. In particular, every such form arises from a cross section of the bundle $T^*(D)$, and the particular "coordinate" differential forms $dx_j$ form a basis for the module $F^1(D)$. A similar remark holds for $r$-degree forms. Once we know that the $dx_j$ form a basis, arguments identical to those used in Theorem 3.1 imply that the $dx_{j_1} \wedge \cdots \wedge dx_{j_r}$, $1 \le j_1 < j_2 < \cdots < j_r \le n$, form a basis for all forms of degree $r$, whether they are defined as $F(D)$-multilinear skew-symmetric maps $V(D) \times \cdots \times V(D) \to F(D)$ or as a cross section of the bundle of $r$-covectors. In particular, every $r$-form admits a unique expansion of the form $\omega = a_{i_1 \cdots i_r}\,dx_{i_1} \wedge \cdots \wedge dx_{i_r}$, with skew-symmetric coefficients $a_{i_1 \cdots i_r}$.

We shall now switch from domains in Euclidean space, denoted by $D$, $D'$, etc., to general differentiable manifolds (of differentiability class $C^\infty$), denoted by $M$, $N$, etc. A differentiable manifold carries several sorts of structures, since intuitively it should be considered as a space that locally looks like a convex domain in Euclidean space, with all the local Euclidean structures tied together globally by the topology of the space. (Think of a closed surface in 3-space, say a sphere or a torus.)

Definition

A space $M$ is said to be a differentiable manifold (of class $C^\infty$) of dimension $n$ if it carries the following structures:

(a) $M$ is a Hausdorff topological space and is the union of a countable number of compact subsets.

(b) $M$ has a covering by a family of open subsets, the typical ones denoted by $U, U', \ldots$, and each open subset $U$ of the family has associated with it a convex domain $D$ in $R^n$ and a homeomorphism $\phi$ of $U$ with $D$ such that, whenever two open sets $U$, $U'$ of the covering intersect, the associated transition mapping of $\phi(U \cap U')$ with $\phi'(U \cap U')$ is a map of differentiability class $C^\infty$ in the usual sense for Euclidean space. (Explicitly, this map assigns to an $x \in D$ such that $\phi(p) = x$, for $p \in U \cap U'$, the point $\phi'(p) \in D'$.)

In addition, for differential-geometric purposes, it is convenient to suppose that the underlying topological space is connected (hence arcwise connected). We shall suppose that this is so implicitly, noting any exceptions explicitly. Each of these admissible homeomorphisms of an open set $U$ with a subset $D$ of $R^n$ will be called a chart. $U$ itself will be called a coordinate neighborhood. The collection of all these charts will be called the atlas that defines the manifold structure. Notice that any open subset of $R^n$ has a manifold structure: The covering can be taken as that defined by its open, convex subsets.

A mapping $\phi: M \to M'$ between two manifolds will be said to be of differentiability class $C^\infty$ if, whenever referred back via charts for $M$ and $M'$, it defines a $C^\infty$ map in the usual sense for $R^n$. As in the earlier part of this work, we shall deal mainly with $C^\infty$ maps; hence we shall not so specify every time one is introduced. Non-$C^\infty$ maps (usually piecewise $C^\infty$, though) will appear later, but we shall identify them explicitly. In particular, a real-valued $C^\infty$ function on a manifold $M$ is a well-defined concept, that is, just a map $M \to R$. The set of these functions will be denoted by $F(M)$.

We base the definitions of the fundamental concepts of tangent vector, vector field, differential form, and similar terms on the properties of $F(M)$, as described before. A few details must be settled before proceeding further, to assure that everything works as smoothly as for domains in $R^n$. First, the manifold structure on the tangent and cotangent bundles should be made precise. This can be done in the following way: Let $U$ be a coordinate neighborhood with a chart giving a correspondence with a domain $D \subset R^n$. Then, under this correspondence, the tangent vectors above points of $U$ correspond in a 1-1 way to the tangent bundle $T(D)$, which we have seen to be homeomorphic to a domain in $R^{2n}$. These "charts" for $T(M)$ can be used to give it a manifold structure, and the vector fields are defined as the $C^\infty$ cross-section maps $M \to T(M)$. It can be verified (exercise) that a cross-section map $X: M \to T(M)$ is $C^\infty$ if and only if the function $p \to X(p)(f)$ is $C^\infty$ for each $f \in F(M)$. Then one shows the identity between vector fields as cross sections of $T(M)$ or as derivations of $F(M)$ by relating the proof back to the special case where $M$ is a domain in $R^n$. We leave the details of this as exercises.

Let $M$ be a manifold. A set of functions $x_1, \ldots, x_n$ defined on an open subset $U$ of $M$ is a coordinate system for $U$ if the map $p \to (x_1(p), \ldots, x_n(p))$ of $U \to R^n$ is a diffeomorphism.† Then the differentials $dx_1, \ldots, dx_n$ form a basis for differential forms in $U$. We define $\partial/\partial x_j$ as the dual basis of vector fields in $U$; that is, $(\partial/\partial x_j)(f)$ are the coefficients in the expansion:

$$df = \frac{\partial}{\partial x_j}(f)\,dx_j. \tag{4.4}$$

Conversely, suppose $(x_i)$ are functions such that the $dx_j$ form a basis for the differential forms in $U$. Do they form a coordinate system for $U$, at least if $U$ is a sufficiently small open set of $M$? The answer is "yes," by the use of the implicit function theorem. Suppose that $\phi$ is a chart diffeomorphism from $U$ to a domain $D$ of $R^n$, with coordinates on this $R^n$ denoted by $y_1, \ldots, y_n$. Transferring back to $D$, the $x_1, \ldots, x_n$ become functions of the $y$. By (4.4),

$$dx_i = \frac{\partial x_i}{\partial y_j}\,dy_j. \tag{4.5}$$

Thus the $(dx_1, \ldots, dx_n)$ form a basis for differential forms if and only if the Jacobian matrix $(\partial x_i/\partial y_j)$ is nonsingular, that is, has nonzero determinant. If this is so, the implicit function theorem (see Chapter 5) asserts that the mapping $y \to x$ is a local diffeomorphism, that is, a diffeomorphism if $U$ is sufficiently small.

Classical tensor analysis works by describing geometric objects completely in terms of such local coordinate systems. Suppose we regard $(x_i)$ and $(y_i)$ as two different local coordinate systems for the same neighborhood $U$ of $M$. Then (4.5) enables us to express the transformation law of components of covariant tensors (like differential forms) from one coordinate system to the other. For example, if $\omega = a_i\,dx_i = b_i\,dy_i$, then

$$b_j = a_i\,\frac{\partial x_i}{\partial y_j},$$

which is the characteristic transformation law. The transformation law for contravariant tensors (like vector fields) is most readily derived from (4.5).

† A map $\phi$ from manifold $M$ to manifold $M'$ is a diffeomorphism if the inverse map $\phi^{-1}: M' \to M$ exists and is $C^\infty$. (A map may be 1-1 and onto, with inverse mapping not $C^\infty$; for example, the map $x \to x^3$ from $R \to R$.)


Recall that $\partial/\partial x_i$ and $\partial/\partial y_i$ are defined as the vector fields dual to the $dx_i$ and $dy_i$. Suppose

$$\frac{\partial}{\partial y_i} = A_{ij}\,\frac{\partial}{\partial x_j}.$$

Then

$$\frac{\partial x_k}{\partial y_i} = \frac{\partial}{\partial y_i}(x_k) = A_{ij}\,\frac{\partial}{\partial x_j}(x_k) = A_{ik}.$$

That is,

$$\frac{\partial}{\partial y_i} = \frac{\partial x_j}{\partial y_i}\,\frac{\partial}{\partial x_j}. \tag{4.6}$$

Then (4.6) is a differential-geometric version of the chain rule for differentiation. However, this restriction to the use of "flat" bases of differential forms and vector fields is the major defect of classical tensor analysis. Many geometric structures (for example, Riemannian metrics) take a very awkward form when their description is forced into this mold. E. Cartan, in his work on differential geometry and Lie groups (which is the foundation of all "modern" differential geometry), worked with a formalism that is halfway between tensor analysis and the formalism used today. Roughly, he worked with bases $(\omega_1, \ldots, \omega_n)$ of the 1-differential forms in neighborhoods of $M$ that are not necessarily just the differentials of coordinate functions. He used the greater freedom to choose the "moving frames" to reduce many geometric problems to much simpler form than was possible in classical language.
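The chain-rule reading of (4.6) can be checked numerically. The sketch below takes (4.6) in the form $\partial/\partial y_i = (\partial x_j/\partial y_i)\,\partial/\partial x_j$ and specializes it, as an assumption made purely for illustration, to $(y_1, y_2) = (r, \theta)$ and $(x_1, x_2) = (x, y)$ with $x = r\cos\theta$, $y = r\sin\theta$. The test function `f` and the sample point are arbitrary choices, not taken from the text.

```python
import math

def f(x, y):
    """Illustrative test function (an assumption, not from the text)."""
    return x * x * y + math.cos(y)

def to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def partial(func, point, i, h=1e-6):
    """Central-difference partial derivative of func in its i-th argument."""
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (func(*plus) - func(*minus)) / (2 * h)

r, theta = 1.3, 0.8
x, y = to_cartesian(r, theta)

# Left side: differentiate f directly in the (r, theta) coordinates.
f_polar = lambda r_, t_: f(*to_cartesian(r_, t_))
direct = [partial(f_polar, (r, theta), i) for i in range(2)]

# Right side: (dx_j/dy_i) * (df/dx_j), with the Jacobian written analytically.
jac = [[math.cos(theta), math.sin(theta)],            # dx/dr,     dy/dr
       [-r * math.sin(theta), r * math.cos(theta)]]   # dx/dtheta, dy/dtheta
grad = [partial(f, (x, y), j) for j in range(2)]
via_chain = [sum(jac[i][j] * grad[j] for j in range(2)) for i in range(2)]

max_error = max(abs(a - b) for a, b in zip(direct, via_chain))
print(max_error < 1e-6)
```

Both computations give the same components of $\partial f/\partial r$ and $\partial f/\partial\theta$ up to finite-difference error.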

Exercises

1. If $M$ is a manifold, $T(M) = \bigcup_{p \in M} M_p$ is its tangent bundle, prove that the procedure sketched in the text defines a manifold structure for $T(M)$. Prove that a cross-section map $X: M \to T(M)$ is $C^\infty$ if and only if $X(f)$ is $C^\infty$ for each $f \in F(M)$.

2. Let $M$ be a manifold, with $p_0 \in M$. Show that there exist two open neighborhoods $U_1$, $U_2$ of $p_0$ and a function $f \in F(M)$ such that:
(a) The closure of $U_1$ is contained in $U_2$.
(b) $f(p) = 1$ for $p \in U_1$.
(c) $f(p) = 0$ for $p \in M - U_2$.

3. Show that a vector field on a manifold $M$ can be defined either as a derivation of $F(M)$ or as a $C^\infty$ cross section of the tangent bundle $T(M) \to M$.

4. Similarly, show that a differential form can be defined either as a $C^\infty$ cross section of the covector bundle, or as an $F(M)$-multilinear form on $V(M)$.

5. Prove (4.2).

6. Suppose $\phi: M \to M'$ is a $C^\infty$ map between manifolds. Suppose $X$, $X'$ are vector fields on $M$ and $M'$. Let us say that they are $\phi$-related if

$$\phi^*(X'(f')) = X(\phi^*(f')) \qquad \text{for each } f' \in F(M').$$

Prove: If $X, Y \in V(M)$ are $\phi$-related to $X', Y' \in V(M')$, then $[X, Y]$ is $\phi$-related to $[X', Y']$. Prove this abstractly, and then by using coordinate systems.

7. Suppose $\Delta$ is the Laplace operator in the plane:

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}.$$

Work out its expression in polar coordinates, $x = r\cos\theta$, $y = r\sin\theta$, using (4.6).

5

Mappings, Submanifolds, and the Implicit Function Theorem

We now develop the implicit function theorem and its consequences for the theory of mappings between manifolds, based on the "inverse function theorem" in the version given by Spivak [1, p. 35]. In fact, this result takes the following form for manifolds.

THEOREM 5.1 Suppose $M$ and $M'$ are manifolds of the same dimension, and $\phi: M \to M'$ is a map between them. Let $p$ be a point of $M$, $p' = \phi(p)$. Suppose that $\phi_*(M_p) = M'_{p'}$. Then there is an open subset $U$ containing $p$ such that:
(a) $\phi(U)$ is an open subset of $M'$.
(b) $\phi$ is a diffeomorphism between $U$ and $\phi(U)$, that is, there is a map $\pi: \phi(U) \to U$ such that

$$\phi\pi(p') = p' \quad \text{for } p' \in \phi(U), \qquad \pi\phi(p) = p \quad \text{for } p \in U.$$

For example, suppose $M$ and $M'$ are identified with convex subsets $D$ and $D'$ of $R^n$. One often wants to regard $\phi$ as defining a "new coordinate system" for $D$. For example, suppose $x \in D$ is of the form $(x_1, \ldots, x_n)$, with $\phi(x) = (\phi_1(x), \ldots, \phi_n(x)) = (x_1', \ldots, x_n')$. Then, regarding $x_1', \ldots, x_n'$ as real-valued functions on $M'$, we have $\phi^*(x_i') = \phi_i(x)$. Now

$$\phi^*(dx_i') = d\phi_i = \frac{\partial \phi_i}{\partial x_j}\,dx_j.$$

Since $(\partial\phi_i/\partial x_j)$ is the matrix of $\phi_*$ with respect to the natural bases for the tangent spaces of $M$ and $M'$ defined by their being in $R^n$, the condition $\phi_*(M_p) = M'_{p'}$ is equivalent to any one of the following conditions:
(a) $\det(\partial\phi_i/\partial x_j) \ne 0$.
(b) $d\phi_1, \ldots, d\phi_n$ are linearly independent at every point.
(c) $d\phi_1 \wedge \cdots \wedge d\phi_n \ne 0$.

We can now turn to the case of mapping between manifolds of different dimensions.


THEOREM 5.2 Let $\phi: D \to D'$ be a map of domains, $D \subset R^n$, $D' \subset R^m$, $m \le n$. Suppose that $\phi$ satisfies the following condition:

$$\phi_*(D_x) = D'_{\phi(x)} \qquad \text{for all } x \in D.$$

($\phi$ is then said to be a maximal rank mapping.) Then, if $D$ is sufficiently small, it can be changed by a diffeomorphism so that $\phi$ is just the standard projection of $R^n$ onto $R^m$. Schematically, there is a diagram of maps as follows:

$$\begin{array}{ccc} D & \longrightarrow & R^n \\ & {\scriptstyle \phi}\searrow & \downarrow {\scriptstyle \text{projection}} \\ & & R^m \end{array}$$

(The map $R^n \to R^m$ is that which assigns, say, to the point $(x_1, \ldots, x_n) \in R^n$ the point $(x_1, \ldots, x_m) \in R^m$. (Recall that $m \le n$.))

Proof. Suppose coordinates for $D$ are $x_1, \ldots, x_n$; coordinates for $D'$ are $y_1, \ldots, y_m$. Then the map is defined by $\phi(x_1, \ldots, x_n) = (\phi_1(x), \ldots, \phi_m(x)) = (y_1, \ldots, y_m)$. The $n \times m$ matrix

$$\Bigl(\frac{\partial \phi_a}{\partial x_i}(x)\Bigr), \qquad 1 \le a \le m, \quad 1 \le i \le n,$$

is the matrix of the linear transformation $\phi_*: D_x \to D'_{\phi(x)}$ with respect to the natural bases of these vector spaces. To say that $\phi_*$ is onto is to say that the rank of this matrix is maximal, that is, $m$. After possibly relabeling coordinates and shrinking $D$, we may then suppose that

$$\det\Bigl(\frac{\partial \phi_a}{\partial x_b}(x)\Bigr) \ne 0, \qquad 1 \le a, b \le m. \tag{5.1}$$

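Consider the functions $\phi_1, \ldots, \phi_m, x_{m+1}, \ldots, x_n$ on $D$. We want to show that they form a new coordinate system for $D$. (This is precisely what is required to prove the theorem, since the diffeomorphism $D \to R^n$ is defined by $(x_1, \ldots, x_n) \to (\phi_1(x), \ldots, \phi_m(x), x_{m+1}, \ldots, x_n)$.) We must then show that the 1-forms $d\phi_1, \ldots, d\phi_m, dx_{m+1}, \ldots, dx_n$ are linearly independent. Suppose that there is a linear relation of the form

$$\sum_{a=1}^{m} \lambda_a\,d\phi_a + \sum_{i=m+1}^{n} \lambda_i\,dx_i = 0, \tag{5.2a}$$

or

$$\sum_{a,b=1}^{m} \lambda_a\,\frac{\partial \phi_a}{\partial x_b}\,dx_b + \cdots = 0 \tag{5.2b}$$

(the terms $\cdots$ involve $dx_{m+1}, \ldots, dx_n$), forcing

$$\sum_{a=1}^{m} \lambda_a\,\frac{\partial \phi_a}{\partial x_b} = 0 \qquad \text{for } 1 \le b \le m;$$

hence $\lambda_a = 0$ by (5.1); hence $\lambda_{m+1} = 0 = \cdots = \lambda_n$ by (5.2), which completes the proof.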
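As a small worked illustration of the construction used in this proof, take $n = 2$, $m = 1$. The particular map below is an illustrative choice made here, not one taken from the text; the new coordinates are called $z_1, z_2$ only for this example.

```latex
\phi(x_1, x_2) = x_1 + x_2^{2}, \qquad
\frac{\partial \phi}{\partial x_1} = 1 \neq 0, \\[4pt]
(x_1, x_2) \longmapsto (z_1, z_2) = \bigl(\phi(x_1, x_2),\; x_2\bigr), \qquad
(z_1, z_2) \longmapsto (x_1, x_2) = \bigl(z_1 - z_2^{2},\; z_2\bigr), \\[4pt]
dz_1 \wedge dz_2 = (dx_1 + 2x_2\,dx_2) \wedge dx_2 = dx_1 \wedge dx_2 \neq 0 .
```

In the coordinates $(z_1, z_2)$ the map $\phi$ is simply $(z_1, z_2) \to z_1$, the standard projection of $R^2$ onto $R^1$, as the theorem asserts.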

This result can be rephrased in another way that is often useful in practice. Suppose $M$ is a manifold, and $f_1, \ldots, f_m$ is a set of real-valued functions on $M$. Suppose also that $df_1, \ldots, df_m$ are linearly independent at every point of $M$. Then if $(x_1, \ldots, x_n)$ is a coordinate system for $M$, the map $(x_1, \ldots, x_n) \to (f_1(x), \ldots, f_m(x))$ of a domain in $R^n$ into $R^m$ is of maximal rank. We conclude that in a neighborhood of each point of $M$, a coordinate system of functions can be introduced for which $f_1, \ldots, f_m$ are the first $m$ elements.

Now we study submanifold maps.

Definition Let $N$ and $M$ be manifolds, $\phi: N \to M$ a map. $\phi$ is said to define an immersion of $N$ in $M$ if the following condition is satisfied:

For each $p \in N$, the tangent map $\phi_*: N_p \to M_{\phi(p)}$ is 1-1. (5.3)

If, in addition, $\phi$ is 1-1, it is said to be a submanifold map of $N$ in $M$, or defines $N$ as a submanifold of $M$.

Remarks. Strictly speaking, a submanifold consists of the ordered triple $(N, M, \phi)$ satisfying these two conditions. It is often convenient and customary to relax this precise statement and regard the submanifold as $\phi(N)$, when no confusion is likely.

THEOREM 5.3 Let $\phi: N \to M$ be an immersion map, with $n = \dim N$ and $m = \dim M$. Then, each point $p \in N$ has a neighborhood $U$ such that:
(a) $\phi$ restricted to $U$ is a submanifold map.
(b) $\phi(p)$ has a neighborhood $V$ with a coordinate system $z_1, \ldots, z_m$ for $V$ such that $\phi(U) \subset V$, and $\phi(U)$ is the set of all points of $V$ on which the functions $z_{n+1}, \ldots, z_m$ are zero.

Proof. Since $\phi_*: N_p \to M_{\phi(p)}$ is 1-1, the dual map $\phi^*: M^*_{\phi(p)} \to N^*_p$ on covectors is onto. Then we can find a coordinate system $y_1, \ldots, y_m$ valid in a neighborhood of $\phi(p)$ such that:

The values of the 1-forms $\phi^*(dy_1), \ldots, \phi^*(dy_n)$ in a neighborhood of $p$ form a basis for the 1-covectors.

Then, by Theorem 5.1, in a neighborhood of $p$ the functions $\phi^*(y_1), \ldots, \phi^*(y_n)$ form a coordinate system of $N$. Now, the functions $\phi^*(y_{n+1}), \ldots, \phi^*(y_m)$ are functionally dependent on the $\phi^*(y_1), \ldots, \phi^*(y_n)$, say,

$$\phi^*(y_{n+1}) = F_{n+1}(\phi^*(y_1), \ldots, \phi^*(y_n)), \quad \ldots, \quad \phi^*(y_m) = F_m(\phi^*(y_1), \ldots, \phi^*(y_n)).$$

We may suppose without loss of generality that the $F$ are defined over all $R^n$. Consider the following functions on the neighborhood of $\phi(p)$:

$$y_1, \ldots, y_n, \quad y_{n+1} - F_{n+1}(y_1, \ldots, y_n), \quad \ldots, \quad y_m - F_m(y_1, \ldots, y_n).$$

Their differentials are linearly independent in this neighborhood and hence there is a new coordinate system, say, $z_1, \ldots, z_m$, for a possibly smaller neighborhood of $\phi(p)$ such that $\phi^*(z_1), \ldots, \phi^*(z_n)$ is a coordinate system for $U$; $\phi^*(z_{n+1}) = 0 = \cdots = \phi^*(z_m)$. These properties imply (a) and (b) required for the theorem.

Next, we inquire about the intersection of two immersed submanifolds. Here we must make the distinction between the case where the two submanifolds intersect "in general position" and where they do not. Intuitively, two such submanifolds are not in "general position" when they can be deformed slightly to change the dimension of the intersection, although this will not be the precise definition.

First we shall deal with a problem in linear algebra. Let $V$ be a vector space over the real numbers, and let $V'$, $V''$ be linear subspaces. Construct the direct sum vector space $V' \oplus V''$. We can map this into $V$, sending $v' \oplus v''$ into $v' + v''$. The kernel of this linear map is (isomorphic to) $V' \cap V''$, the range is $V' + V'' \subset V$. Thus we have the relation

$$\dim V' + \dim V'' - \dim(V' \cap V'') = \dim(V' + V'') \le \dim V,$$

or

$$\dim(V' \cap V'') \ge \dim V' + \dim V'' - \dim V.$$

This inequality suggests that we make the following definition:

Definition The linear subspaces $V'$ and $V''$ of $V$ are in general position if:
(a) $\dim(V' \cap V'') = \dim V' + \dim V'' - \dim V$ for the case $\dim V' + \dim V'' \ge \dim V$.
(b) $\dim(V' \cap V'') = 0$ for the case $\dim V' + \dim V'' \le \dim V$.
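The definition can be tested mechanically once the subspaces are given by spanning vectors. The following is a minimal sketch, not part of the text: the helper names `rank` and `in_general_position` are hypothetical, exact rational arithmetic is used to avoid rounding issues, and $\dim(V' \cap V'')$ is obtained from the relation $\dim V' + \dim V'' - \dim(V' \cap V'') = \dim(V' + V'')$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    cols = len(m[0]) if m else 0
    for c in range(cols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_general_position(span1, span2, dim_v):
    """Return (is_general, dim of intersection) for the spans of two vector lists."""
    d1, d2 = rank(span1), rank(span2)
    d_sum = rank(span1 + span2)          # dim(V' + V'')
    d_int = d1 + d2 - d_sum              # dim(V' n V'')
    expected = max(d1 + d2 - dim_v, 0)   # cases (a) and (b) of the definition
    return d_int == expected, d_int

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
# Two distinct planes in R^3 meet in a line: general position.
print(in_general_position([e1, e2], [e1, e3], 3))   # (True, 1)
# A plane and a line lying inside it: not in general position.
print(in_general_position([e1, e2], [e1], 3))       # (False, 1)
```

The second example fails because a slight tilt of the line would reduce the intersection from a line to a point, which is the intuitive content of the definition.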

Roughly, we may say that $V'$ and $V''$ are in general position if $V' \cap V''$ has minimal dimension compatible with the above inequality. Notice that, in case (a), $\dim(V' + V'') = \dim V$; that is, $V' + V'' = V$. The crucial geometric property of this definition can now be stated.

THEOREM 5.4 If $(V_t', V_t'')$ are two families of linear subspaces of $V$, depending continuously on a parameter $t$ (say, $0 \le t \le 1$) and if $(V_0', V_0'')$ are in general position, then $(V_t', V_t'')$ are in general position for $t$ sufficiently small.

Proof. The general position condition is equivalent to the condition that the linear map $V' \oplus V'' \to V$, constructed above, when made into a matrix by means of bases, have maximal rank; that is, a subdeterminant of maximal order must be nonzero. Since the determinant must vary continuously, this subdeterminant remains nonzero for sufficiently small $t$.

THEOREM 5.5 Let $N$, $N'$, $M$ be manifolds, and let $\phi: N \to M$, $\phi': N' \to M$ be immersion maps. Suppose $p \in N$, $p' \in N'$ are points such that $\phi(p) = \phi'(p')$; $\phi_*(N_p)$ and $\phi'_*(N'_{p'})$ are in general position within $M_{\phi(p)}$. Then there are neighborhoods $U$ of $p$, $U'$ of $p'$, in $N$ and $N'$ such that $\phi(U) \cap \phi'(U')$ is a submanifold of $M$ whose dimension is equal to $\dim N + \dim N' - \dim M$.

Proof. Let $n = \dim M - \dim N$. By Theorem 5.3, there is a neighborhood $V$ of $\phi(p)$, a neighborhood $U$ of $p$, and a maximal rank map $\psi: V \to R^n$ such that

$$\psi(\phi(p)) = 0, \qquad \phi(U) = \psi^{-1}(0), \qquad \phi_*(N_p) = \psi_*^{-1}(0).$$

Let $U' = \phi'^{-1}(V)$. Consider the map $\psi\phi': U' \to R^n$. In case $\dim N + \dim N' = \dim M$, $(\psi\phi')_*: N'_{p'} \to R^n_0$ must be 1-1; hence $\psi\phi'$ is an immersion map if $U'$ is sufficiently small. But

$$(\psi\phi')\bigl(\phi'^{-1}(\phi(U) \cap \phi'(U'))\bigr) = 0,$$

which shows that $\phi(U) \cap \phi'(U') = \phi'(p')$ only, as required. Consider the case that $\dim N + \dim N' > \dim M$. Then $(\psi\phi')_*$ must map $N'_{p'}$ onto $R^n_0$; that is, $\psi\phi'$ is a maximal rank mapping if everything is sufficiently small. Then, again,

$$\phi'^{-1}\bigl(\phi(U) \cap \phi'(U')\bigr) = (\psi\phi')^{-1}(0).$$

But $(\psi\phi')^{-1}(0)$ can be represented by an immersion map of the required dimension by Theorem 5.3, which finishes the proof.


Finally we remark that all these different versions of the implicit function theorem may be intuitively summarized by saying that arbitrary $C^\infty$ mappings satisfying maximal rank conditions behave locally just as linear mappings of vector spaces. Thus there is a good technical reason why a thorough knowledge of linear algebra is one of the most important prerequisites for the study of differential geometry!

Exercises

1. Suppose $\phi: M \to N$ is a maximal rank mapping of manifolds (that is, $\phi_*(M_p) = N_{\phi(p)}$ for all $p \in M$). Prove that for $p \in \phi(M)$, the "fiber" $\phi^{-1}(p)$ is an embedded submanifold of $M$.

2. Prove that the intersection of two embedded submanifolds which always meet in general position is a submanifold. (Determine whether they are embedded or immersed.) What is the global structure of the general-position intersection of two immersed submanifolds?

6

The Jacobi Bracket and the Lie Theory of Ordinary Differential Equations

Jacobi Bracket

So far, the basic objects (vector fields and differential forms) have been considered in order to set up a formalism equivalent to ordinary differential calculus which, in addition, is independent of the choice of local coordinates. However, these ideas were first developed by S. Lie, not only for formal reasons but also for expressing in geometric form the many subtle and interesting relations between the theory of ordinary differential equations and the theory of Lie groups. Although the modern theory of Lie groups emerged from this work, it is an area that seems to be neglected in modern research. Here, we shall indicate only a few of the simpler ideas, partly for their own sake and partly to motivate the introduction of the Jacobi bracket operation on vector fields, which will play a major role in our work.

We shall work with domains $D$ in $R^n$ with coordinates $x_i$. The extension to general manifolds will usually be evident. Recall that a vector field on $D$ is defined as a derivation, say $X$, of $F(D)$. It was proved in Chapter 4 that $X$ took the form of a first-order partial differential operator:

$$X = A_i\,\frac{\partial}{\partial x_i}.$$

If $X$ and $Y$ are vector fields, $XY$ is not, since as an operator it is a second-order differential operator. Explicitly, for $f, g \in F(D)$,

$$XY(fg) = X\bigl(Y(f)\,g + f\,Y(g)\bigr) = XY(f)\,g + Y(f)\,X(g) + X(f)\,Y(g) + f\,XY(g).$$

However, note that the "bad" middle terms cancel out if $(YX)(fg)$ is subtracted from $XY(fg)$; that is, $f \to XY(f) - YX(f)$ is a derivation of $F(D)$, and hence defines another vector field that we call the Jacobi bracket of $X$ and $Y$, and denote by $[X, Y]$. The following formal laws follow from direct (although occasionally tedious) computations, which we leave to the reader:

(6.1e)

($X, X_1, \ldots, Y, Z, \ldots$ denote vector fields, $c_1, c_2, \ldots$ denote real constants, $f \in F(D)$.) Properties (6.1a), (6.1b), and (6.1c) express the fact that $V(D)$ is, as a real vector space, also a Lie algebra. (A Lie algebra is a vector space with a "multiplication" $(X, Y) \to [X, Y]$ defined for any two elements $X$ and $Y$ satisfying (6.1a), (6.1b), and (6.1c).)

A curve $\sigma(t)$, $a \le t \le b$, is said to be an integral curve of a vector field $X$ if $\sigma'(t) = X(\sigma(t))$ for $a \le t \le b$. Suppose the expression in coordinates for $X$ is

$$X = A_i\,\frac{\partial}{\partial x_i}.$$

Now

$$\frac{d}{dt}\,x_i(\sigma(t)) = \sigma'(t)(x_i) = X(\sigma(t))(x_i) = A_j(\sigma(t))\,\frac{\partial}{\partial x_j}(x_i) = A_i(\sigma(t)).$$

Thus we see that $\sigma$ is an integral curve of $X$ if and only if its coordinate functions $x_i(t) = x_i(\sigma(t))$ satisfy the system of first-order differential equations:

$$\frac{d}{dt}\,x_i(t) = A_i(x(t)). \tag{6.2}$$
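Since (6.2) is an ordinary autonomous system, an integral curve can be approximated with any standard numerical integrator. The sketch below is only an illustration under stated assumptions: the field $X = -y\,\partial/\partial x + x\,\partial/\partial y$ on $R^2$, the classical fourth-order Runge-Kutta scheme, and the step count are all choices made here, not taken from the text. For this field the integral curve through $(1, 0)$ is known in closed form, $t \to (\cos t, \sin t)$, which gives something to compare against.

```python
import math

def X(point):
    """Coefficient functions (A_1, A_2) of the illustrative field -y d/dx + x d/dy."""
    x, y = point
    return (-y, x)

def rk4_step(field, point, dt):
    """One classical Runge-Kutta step for the system dx_i/dt = A_i(x)."""
    def shifted(p, k, scale):
        return tuple(pi + scale * ki for pi, ki in zip(p, k))
    k1 = field(point)
    k2 = field(shifted(point, k1, dt / 2))
    k3 = field(shifted(point, k2, dt / 2))
    k4 = field(shifted(point, k3, dt))
    return tuple(p + dt / 6 * (a + 2 * b + 2 * c + d)
                 for p, a, b, c, d in zip(point, k1, k2, k3, k4))

def integral_curve(field, start, t_end, steps=1000):
    """Approximate sigma(t_end) for the integral curve with sigma(0) = start."""
    point, dt = start, t_end / steps
    for _ in range(steps):
        point = rk4_step(field, point, dt)
    return point

t = 1.0
numeric = integral_curve(X, (1.0, 0.0), t)
exact = (math.cos(t), math.sin(t))
error = max(abs(a - b) for a, b in zip(numeric, exact))
print(error < 1e-9)
```

Any field whose coefficient functions $A_i$ can be evaluated pointwise can be substituted for `X`; only the comparison with the closed-form solution is specific to the rotation field.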

Invoking the existence theorem for systems of ordinary differential equations, we have:

THEOREM 6.1 Suppose $X$ is a vector field (of class $C^\infty$) in the domain $D$.† Then
(a) Two integral curves $\sigma: [a, b] \to D$ and $\sigma_1: [a_1, b_1] \to D$ of $X$ that coincide at just one common point $t_0$ of their domains of existence must coincide in their entire common domain.
(b) Given $x^0 \in D$, there is a number $a > 0$ and an integral curve $t \to \sigma(t, x^0)$, $0 \le t \le a$, of $X$ with $\sigma(0, x^0) = x^0$. This function depends in a $C^\infty$ way on $x^0$. In addition, $a$ depends on $x^0$, but can be chosen independent of $x^0$ over any compact subset of $D$.
(c) If $\sigma(t)$, $a \le t \le b$, is an integral curve of $X$, so is the curve $\sigma_1(t) = \sigma(t + c)$, $a - c \le t \le b - c$, obtained by translating the "time" parametrization of $\sigma$.

† All these statements will hold for manifolds also.

All these statements are the geometric analogs of fundamental analytical properties of systems of differential equations of the type (6.2). For example, condition (a) of the theorem is just the uniqueness of solutions of (6.2); (b) follows from the usual Picard iteration method of solving (6.2); (c) follows from the uniqueness and the fact that (6.2) does not contain $t$ explicitly on the right-hand side (that is, it is a so-called autonomous system).

These properties of integral curves enable us to try to "continue" solutions of (6.2) so that we can obtain integral curves defined over maximal domains of $t$. For example, start off with an integral curve $\sigma(t)$, $0 \le t \le a_1$, with $\sigma(0) = x^0$. Find an integral curve $\sigma_1(t)$, $a_1 \le t \le a_2$, with $\sigma_1(a_1) = \sigma(a_1)$. By uniqueness, the two curves can be fitted together (without corners) to obtain an integral curve over $0 \le t \le a_2$. Repeat the process beyond $a_2$ and also in the negative direction. Although we do not want to go into the details here, we can say that the process will succeed in proving the existence of an integral curve defined over $(-\infty, \infty)$ unless "barriers" are met in the form of two numbers $\alpha$, $\beta$, $\alpha < 0$, $\beta > 0$, such that:

There is an integral curve $\sigma(t)$, $\alpha < t < \beta$, with $\sigma(0) = x^0$, but there is no such integral curve in a domain containing $(\alpha, \beta)$.
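To see how such a pair of barriers can arise in a concrete case, consider the following worked example. The field $X = (1 + x^2)(d/dx)$ on $D = R^1$ is an illustrative choice made here, not one taken from the text; for it the system (6.2) reduces to a single equation that can be solved in closed form.

```latex
\frac{dx}{dt} = 1 + x^{2}, \qquad x(0) = x^{0}
\quad\Longrightarrow\quad
x(t) = \tan\bigl(t + \arctan x^{0}\bigr), \\[4pt]
\alpha = -\frac{\pi}{2} - \arctan x^{0} \;<\; t \;<\; \frac{\pi}{2} - \arctan x^{0} = \beta,
\qquad
x(t) \to \pm\infty \ \text{ as } \ t \to \beta^{-},\ \alpha^{+}.
```

Here $\alpha < 0 < \beta$ for every $x^0$, and the integral curve through $x^0$ cannot be continued beyond $(\alpha, \beta)$ because it leaves every bounded set in finite time.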

One reason for the existence of these "barriers" may be that the integral curve wants to escape from $D$ into the remainder of $R^n$. However, barriers may occur even if $D = R^n$; for example, suppose that $n = 1$, $D = R^1$, $X = x^2(d/dx)$, and (6.2) becomes

$$\frac{dx}{dt} = x^2.$$

The solution with $x(0) = x^0$ is $x(t) = x^0/(1 - x^0 t)$. Intuitively, the curve wants to escape to $\infty$ at $t = 1/x^0$. One might think that one way of remedying this would be to add "a point at $\infty$" to $R^1$. This can be done successfully, but it leads to a differentiable manifold. (In this case, the manifold is the circle.)

With these warnings in mind, let us, for the sake of understanding the geometric meaning of the Jacobi bracket, suppose that $X$ and $Y$ are vector fields defined in $D$, all of whose integral curves can be extended over $(-\infty, \infty)$. Suppose $\sigma(t)$, $0 \le t \le a$, is an integral curve of $X$. For each $t$ construct the curves $s \to \sigma(t, s)$, $0 \le s \le b$, such that:
(a) $\sigma(t, 0) = \sigma(t)$ for $0 \le t \le a$.
(b) For each $t$, the curve $s \to \sigma(t, s)$ is an integral curve of $Y$.

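We ask: If we hold $s$ fixed, and consider the curve $t \to \sigma(t, s)$, when will this curve become an integral curve of $X$ for each such $s$? In terms of the coordinates $(x_i)$ for $R^n$, suppose that

$$X = A_i\,\frac{\partial}{\partial x_i}, \qquad Y = B_i\,\frac{\partial}{\partial x_i}, \qquad x_i(t, s) = x_i(\sigma(t, s)), \qquad x_i(t) = x_i(\sigma(t)).$$

Then our constructions translate into the conditions

$$x_i(t, 0) = x_i(t), \qquad \frac{d x_i(t)}{dt} = A_i(x(t)), \qquad \frac{\partial x_i(t, s)}{\partial s} = B_i(x(t, s)).$$

Put

$$C_i(t, s) = \frac{\partial}{\partial t}\,x_i(t, s) - A_i(x(t, s)).$$

Now

$$\frac{\partial C_i}{\partial s} = \frac{\partial}{\partial t}\bigl(B_i(x(t, s))\bigr) - \frac{\partial A_i}{\partial x_j}\,B_j = \frac{\partial B_i}{\partial x_j}\,C_j + \Bigl(A_j\,\frac{\partial B_i}{\partial x_j} - B_j\,\frac{\partial A_i}{\partial x_j}\Bigr),$$

with all coefficient functions evaluated at $x(t, s)$.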

THEOREM 6.2 If [ X , Y ] = 0, then for each s the curve t + a(t, s) is an integral curve of X . Intuitively, knowing one integral curve of X and all integral curves of Y starting on this curve, a whole family of integral curves of X can be obtained. Proof.

[ X , Y ] = 0 if and only if aAi

aBi

-A, - - B .

axj

axj

J

is identically zero. Then Ci(t,s) satisfies the equations

ac, -(r, as

S)

aB.

= 2( x ( t ,s))cj(t, s),

axj

38


which comprise a system of linear homogeneous first-order ordinary differential equations for C i in s (with t held fixed). Ci(t,0) = 0 , since o(t) is an integral curve of X . Then we take it as known from the theory of ordinary differential equations (uniqueness!) that Ci(t,s) is identically zero, which implies that for each s, t --+ o(t, s) in an integral curve of X . Q.E.D. We now turn to the interpretation of Theorem 6.2 and the integral curves of a vector field in terms of the theory of groups (assuming always that those integral curves of vector fields that we shall be considering can be extended indefinitely within D). Suppose t + o ( t ; x") is the integral curve for X , - 00 < t < 03, such that o(0) = x". Since t + o(t 4- a ; x") is also an integral curve, which takes on the value o(a; x") at t = 0, we must (by the uniqueness of integral curves, that is, by part (a) of Theorem 2.1) have

$$\sigma(t + a; x^0) = \sigma(t; \sigma(a; x^0)).$$

For each $t \in (-\infty, \infty)$ define a transformation $T_t$ of D into itself as follows:

$$T_t(x^0) = \sigma(t; x^0) \quad \text{for each } x^0 \in D.$$

Now $x^0 \to T_t(x^0)$ is a transformation of D into itself (of differentiability class $C^\infty$, by the fundamental existence theorem for ordinary differential equations). Also,

$$T_{t+a}(x^0) = \sigma(t + a; x^0) = \sigma(t; \sigma(a; x^0)) = T_t(T_a(x^0)).$$

Since this holds for each $x^0 \in D$, we have $T_{t+a} = T_t T_a$. In particular, $T_0$ is the identity transformation and $T_t T_{-t} = T_{-t} T_t = T_0$; that is, $T_{-t}$ is the inverse of $T_t$, and $T_t$ is an invertible transformation of D into itself (a "diffeomorphism"). The property $T_{t+a} = T_t T_a$ tells us that the family $\{T_t : -\infty < t < \infty\}$ of transformations forms a one-parameter group of transformations of D into itself. This is the one-parameter group generated by X. X can be reconstructed from $T_t$, since the curves $t \to T_t(x^0)$ are integral curves of X for each $x^0 \in D$.

Suppose that Y is another such vector field, with $[X, Y] = 0$, and that Y generates the one-parameter group $S_s$, $-\infty < s < \infty$. Transcribing Theorem 6.2 into group language, we have several equivalent statements:

(a) For each integral curve $\sigma(t)$ of X, the transform by each $S_s$, $t \to S_s(\sigma(t))$ (which, in the notation of Theorem 6.2, equals $\sigma(t, s)$) is also an integral curve of X. Thus the one-parameter group $S_s$ permutes the integral curves of X, or leaves invariant the differential equations giving the integral curves of X. This interpretation is basic to the Lie theory of ordinary differential equations.
(b) For each $x^0 \in D$ and each $s, t \in (-\infty, \infty)$, $S_s(T_t(x^0))$ must equal $T_t(S_s(x^0))$, since the curve $t \to S_s(T_t(x^0))$ is an integral curve of X starting at $S_s(x^0)$ when t = 0. Since this is true for each $x^0 \in D$, we have $S_s T_t = T_t S_s$; that is, the one-parameter groups generated by X and Y commute.

Suppose now that $\phi$ is a diffeomorphism of a domain D onto the domain D'. Thus $\phi^*$ sets up an isomorphism of F(D') and F(D). Hence vector fields on D and D' correspond. Given $X \in V(D)$, define $\phi_*(X) \in V(D')$ as follows:

$$\phi_*(X)(f') = \phi^{-1*}\bigl(X(\phi^*(f'))\bigr) \quad \text{for } f' \in F(D'). \tag{6.3}$$

This is equivalent to the property

$$\phi_*(X)(x') = \phi_*\bigl(X(\phi^{-1}(x'))\bigr) \quad \text{for } x' \in D'. \tag{6.4}$$

Proof. Using (6.3),

$$\phi_*(X)(x')(f') = \phi_*(X)(f')(x') = \phi^{-1*}\bigl(X(\phi^*(f'))\bigr)(x') = X(\phi^*(f'))(\phi^{-1}(x')).$$

However, this is just the right-hand side of (6.4). This mapping $X \to \phi_*(X)$ has two main properties:

(a)
$$\phi_*([X, Y]) = [\phi_*(X), \phi_*(Y)] \quad \text{for } X, Y \in V(D), \tag{6.5}$$
that is, $\phi_*$ is a Lie algebra isomorphism.
(b) If $t \to \sigma(t)$ is an integral curve of X, then the image curve $t \to \phi(\sigma(t))$ is an integral curve of $\phi_*(X)$. Conversely, if the image curves of all integral curves of X under $\phi$ are integral curves of $Y \in V(D')$, then
$$Y = \phi_*(X). \tag{6.6}$$

The proofs of these statements are straightforward, and are therefore left to the reader. To provide some practice with this formalism, we discuss the local canonical form theorem for nonsingular vector fields and give some indications of its importance in the Lie theory of ordinary differential equations.
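The group law $T_{t+a} = T_t T_a$ and the commutation of the one-parameter groups generated by fields with vanishing bracket can also be checked numerically. The following is only a sketch under stated assumptions: the three plane vector fields, the Runge-Kutta integrator, the step count, and all function names are illustrative choices made here and are not part of the text.

```python
import math

def rk4_flow(field, point, t, steps=2000):
    """Approximate the time-t map of the flow of `field` by classical RK4."""
    h = t / steps
    x, y = point
    for _ in range(steps):
        k1 = field(x, y)
        k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = field(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return (x, y)

# Illustrative fields (assumptions of this sketch), given by their components.
def rotation(x, y):      # -y d/dx + x d/dy
    return (-y, x)

def dilation(x, y):      # x d/dx + y d/dy; its bracket with `rotation` vanishes
    return (x, y)

def translation(x, y):   # d/dx; its bracket with `rotation` does not vanish
    return (1.0, 0.0)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def group_law_gap(F, point, t, a):
    """Distance between T_{t+a}(p) and T_t(T_a(p)), where T is the flow of F."""
    return dist(rk4_flow(F, point, t + a),
                rk4_flow(F, rk4_flow(F, point, a), t))

def commutation_gap(F, G, point, t, s):
    """Distance between S_s(T_t(p)) and T_t(S_s(p)); T = flow of F, S = flow of G."""
    first_T = rk4_flow(G, rk4_flow(F, point, t), s)
    first_S = rk4_flow(F, rk4_flow(G, point, s), t)
    return dist(first_T, first_S)

p = (1.0, 0.5)
print("group law gap:          ", group_law_gap(rotation, p, 0.3, 0.5))
print("rotation vs dilation:   ", commutation_gap(rotation, dilation, p, 0.7, 0.4))
print("rotation vs translation:", commutation_gap(rotation, translation, p, 0.7, 0.4))
```

The first two gaps should be at the level of rounding error, while the third is visibly nonzero, in line with the fact that the rotation and translation fields do not commute.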

THEOREM 6.3 Suppose that X is a vector field in a domain D of $R^n$, of coordinates $x = (x_i)$, and $x^0$ is a point of D with $X(x^0) \ne 0$ (that is, if $X = A_i(\partial/\partial x_i)$, then not all $A_i(x^0)$ are zero). Then, if D is small enough, there is an invertible transformation $\phi: D \to D'$, where D' is a domain in the space of variables $y_i$, such that

$$\phi_*(X) = \frac{\partial}{\partial y_1}.$$

Proof. We shall give a geometric proof, but shall leave verification of certain analytical details to the reader.



After at most reordering the coordinates, we can suppose that $A_1(x^0) \ne 0$. Suppose that $x^0 = 0$. Construct a mapping of $(y_1, \ldots, y_n)$-space into D by mapping $(y_1, \ldots, y_n)$ into $(x_1(y_1), \ldots, x_n(y_1))$, where $t \to (x_i(t))$ is the integral curve of X which is equal at t = 0 to $(0, y_2, \ldots, y_n)$. This mapping is invertible near the origin (the verification is left to the reader); hence the $y_1, \ldots, y_n$ can be introduced as coordinates in a neighborhood of $x^0$. When one follows an integral curve of X in these new coordinates, $y_1$ increases linearly while $y_2, \ldots, y_n$ remain the same. This, however, is just the condition that X in these new coordinates is $\partial/\partial y_1$. Q.E.D.

Theorem 6.3 is the simplest of the "canonical form" theorems that play a central role in the modern theory of ordinary differential equations. Notice that actually putting X into this canonical form is more or less equivalent to "solving" the differential equations defining the integral curves.
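As a small worked illustration of the canonical form (an example chosen here, not taken from the text), consider the dilation field in the plane near the point $x^0 = (1, 0)$, where it does not vanish. The coordinates $y_1, y_2$ below are an assumed choice; one checks directly that X becomes $\partial/\partial y_1$ in them.

```latex
% Illustrative example (an assumption of this sketch):
%   X = x_1 \partial/\partial x_1 + x_2 \partial/\partial x_2, near x^0 = (1, 0).
\begin{align*}
y_1 &= \tfrac12 \log\bigl(x_1^2 + x_2^2\bigr), &
y_2 &= \arctan(x_2 / x_1), \\
X(y_1) &= \frac{x_1 \cdot x_1 + x_2 \cdot x_2}{x_1^2 + x_2^2} = 1, &
X(y_2) &= \frac{x_1 \cdot (-x_2) + x_2 \cdot x_1}{x_1^2 + x_2^2} = 0 .
\end{align*}
% Hence X = \partial/\partial y_1 in the coordinates (y_1, y_2):
% along an integral curve x(t) = e^t x(0), y_1 increases linearly and y_2 stays fixed.
```

Finding $y_1$ and $y_2$ here amounted to knowing the integral curves $x(t) = e^t x(0)$ in advance, which is the sense in which the canonical form is equivalent to solving the equations.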

Applications to the Lie Theory of Ordinary Differential Equations

The "Lie theory" is merely an interplay between the geometric interpretation of a system,

$$\frac{dx_i}{dt} = A_i(x(t)), \tag{6.7}$$

of ordinary differential equations as the integral curves of the vector field $X = A_i(\partial/\partial x_i)$ and the interpretation of X as a generator of a one-parameter group of transformations. Actually, we have been using "Lie theory" all along. However, one may consider the Lie theory in the more restricted sense as the discussion of those parts of the general theory that have relevance to the sort of problem one faces in "explicitly" solving differential equations. As a first remark: Suppose X is a vector field, f is a function with X(f) = 0. Suppose $\sigma(t)$ is an integral curve of X. Then

$$\frac{d}{dt} f(\sigma(t)) = X(f)(\sigma(t)) = 0;$$

that is, f is constant along all the integral curves of X. Classically, the function f is called an integral of X, or of the system (6.7), defining the integral curves. Conversely, functions having this property satisfy X(f) = 0; that is, f satisfies the first-order partial differential equation

$$A_i \frac{\partial f}{\partial x_i} = 0.$$

One may interpret the problem of "explicitly" solving the system (6.7) as that of finding (n - 1) functionally independent integral functions, say, $f_2, \ldots, f_n$, for then the submanifolds $f_2$ = constant, ..., $f_n$ = constant, in x-space, are one-dimensional and in fact are the sets of points described by each integral curve of (6.7). Thus, "explicitly solving" involves some formal process that converts a set of integral functions into a possibly larger set of interest. To see an example of such a formal process, suppose we are given another vector field Y on D such that $[Y, X] = gX$ for some $g \in F(D)$. If f is an integral of X, so is Y(f).

Proof. $X(Y(f)) = Y(X(f)) + [X, Y](f) = 0 - gX(f) = 0$.

Thus, Lie derivation by Y is the formal process generating (possibly) "new" integrals. Let us now see what the condition $[Y, X] = gX$ means geometrically. First we ask whether there is a function h in D such that the vector field $X' = hX$ satisfies $[Y, X'] = 0$. Since $[Y, hX] = (Y(h) + hg)X$, h must satisfy:

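$$Y(h) + hg = 0 \quad \text{or} \quad Y(\log h) = -g.$$

Thus log h (and hence h) can be found locally if, for example, $Y(x) \ne 0$ for $x \in D$, for then we can suppose, in view of Theorem 6.3, that $Y = \partial/\partial x_1$; hence log h can be found by a simple quadrature:

$$\frac{\partial (\log h)}{\partial x_1} = -g(x_1, \ldots, x_n) \quad \text{or} \quad \log h = -\int g(x_1, \ldots, x_n)\, dx_1 .$$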

Now notice that the integral curves of $X' = hX$ and X differ only by a change in parametrization, provided $h(x) \ne 0$ for $x \in D$. (We may say that two curves $\sigma(t)$, $a \le t \le b$, and $\sigma_1(t)$, $a_1 \le t \le b_1$, differ only by change in parametrization if there is a map $\alpha: [a, b] \to [a_1, b_1]$, that is, between the intervals of parametrization, such that $d\alpha/dt$ is always $\ne 0$ and such that $\sigma(t) = \sigma_1(\alpha(t))$ for $a \le t \le b$. If $d\alpha/dt$ is always $> 0$ (resp. ) dt.

Thus the system of order (n - 1) can be solved first, and then $x_1(t)$ can be found by "quadrature," that is, by an integration. The order of the differential equations defining the integral curves of Y has been essentially reduced by 1. If n = 2, this is ideal, since the system (i) can also be solved by "quadrature." These observations constitute Lie's main contribution to the classical problem of solving differential equations in the plane. If

$$\frac{dy}{dx} = \frac{P(x, y)}{Q(x, y)}$$

is such a differential equation, the solution curves, when written in parametric form, are the integral curves of

$$X = Q(x, y)\frac{\partial}{\partial x} + P(x, y)\frac{\partial}{\partial y}.$$

Lie observed that all the classical tricks for “solving” this equation by quadrature were associated, in the way we described above, with a one-parameter group of transformations in the plane.
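One classical instance of this observation is the homogeneous equation $dy/dx = F(y/x)$. The sketch below is a standard textbook computation added here for illustration; the names u, v, and F are assumptions of the example, and the region x > 0 is assumed throughout.

```latex
% Illustrative example: dy/dx = F(y/x) on the region x > 0.
% Solution curves are integral curves of X; the dilation field Y is a symmetry.
\begin{align*}
X &= \frac{\partial}{\partial x} + F(y/x)\,\frac{\partial}{\partial y}, \qquad
Y = x\,\frac{\partial}{\partial x} + y\,\frac{\partial}{\partial y}, \qquad
[Y, X] = -X \quad (\text{so } g = -1). \\
\intertext{In the coordinates $v = \log x$, $u = y/x$ one has $Y(v) = 1$, $Y(u) = 0$,
so that $Y = \partial/\partial v$. Writing $y = u x$,}
\frac{dy}{dx} &= u + x\,\frac{du}{dx} = F(u)
\quad\Longrightarrow\quad
\frac{du}{dv} = F(u) - u
\quad\Longrightarrow\quad
v = \int \frac{du}{F(u) - u} .
\end{align*}
```

The right-hand side of the reduced equation no longer involves v, which is why a single quadrature suffices; this is the familiar substitution y = ux seen as a change to coordinates adapted to the dilation group.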

Exercises

1. Suppose $X = B_i(\partial/\partial x_i)$, where all the $B_i(x)$ are homogeneous of degree 1; that is, $B_i(\lambda x) = \lambda B_i(x)$ for each $\lambda > 0$. Show that for each s, the transformation $x \to e^s x$ permutes the integral curves of X. Deduce that $[X, Y] = 0$, where $Y = x_i(\partial/\partial x_i)$. Now verify this directly.

2. In the (x, y) plane, consider the vector field

$$Y = y\frac{\partial}{\partial x} - x\frac{\partial}{\partial y}.$$

Show that the one-parameter group it generates is the group of rotations:

$$(x, y) \to (x\cos t + y\sin t,\; -x\sin t + y\cos t).$$

Let X be another vector field

$$A\frac{\partial}{\partial x} + B\frac{\partial}{\partial y}$$

in the plane. Find the condition: The one-parameter group generated by Y permutes the integral curves of X up to a change in parameter.

3. Find the coordinate system in which the infinitesimal generator of the one-parameter group of rotations in the plane has its canonical form. If

$$X = A\frac{\partial}{\partial x} + B\frac{\partial}{\partial y}$$

is such that $[Y, X] = fX$ (that is, if the problem is rotationally symmetric), find the "explicit" formulas for the integral curves of X.

4. Consider the space of one variable x, and on this space the three vector fields

Compute the Jacobi brackets. Show that the one-parameter group generated by any linear combination of these three vector fields with constant coefficients is contained in the group $x \to (ax + b)/(cx + d)$ of linear fractional transformations.
5. Suppose $X = A_i(\partial/\partial x_i)$ and $Y = B_i(\partial/\partial x_i)$ are vector fields such that $[X, Y] = Y$ and such that the n-vectors $(A_i(x^0))$ and $(B_i(x^0))$ are linearly independent. Show that the coordinate system can be chosen so that $x^0 = 0$, and about this point,

where $\alpha$ is some function of the indicated variables. Suppose that $Z = C_i(\partial/\partial x_i)$ is such that $0 = [X, Z] = [Y, Z]$. Show that the integral curves of Z in this coordinate system can be found by solving a system of order n - 2, followed by quadratures.

6. Suppose $X = A_i(\partial/\partial x_i)$, $Y = B_i(\partial/\partial x_i)$, $Z = C_i(\partial/\partial x_i)$ are vector fields such that $[X, Y] = Y$, $[X, Z] = -Z$, $[Y, Z] = 2X$ and such that the vectors $(A_i(x^0))$, $(B_i(x^0))$, $(C_i(x^0))$ are linearly independent. Show that the coordinate system can be chosen so that $x^0 = 0$, and about this point,

where $\alpha$, $\beta$, $\gamma$ are functions of the indicated variables. Show that the problem of finding the integral curves of any vector field W that satisfies $0 = [X, W] = [Y, W] = [Z, W]$ in these coordinates can be reduced to solving a system of differential equations of order n - 3, quadratures, and a Riccati equation. (A Riccati equation is one of the form $dx/dt = a(t) + b(t)x + c(t)x^2$.)

7. Prove (6.5) and (6.6).

7

Lie Derivation and Exterior Derivative; Integration on Manifolds

Let us return to the study of differential forms on a manifold M. We have described the Jacobi bracket operation $(X, Y) \to [X, Y]$ on vector fields, the Lie derivative $f \to X(f)$ of a function by a vector field, and exterior derivative $f \to df$ of a function. We shall now extend the latter two operations from functions f (that is, differential forms of degree zero) to differential forms of any degree. Note first that the definition of $[X, Y]$ can be rewritten as

$$X(Y(f)) = Y(X(f)) + [X, Y](f).$$

The key idea is that X acting on Y(f) acts first on Y, leading to $[X, Y]$, and then on f, leading to X(f). Suppose now that $\omega$ is an rth degree differential form. For $X_1, \ldots, X_r \in V(M)$, $\omega(X_1, \ldots, X_r)$ is then a function. Let X be another vector field. Let us apply X to this function and write

$$X(\omega(X_1, \ldots, X_r)) = X(\omega)(X_1, \ldots, X_r) + \omega([X, X_1], X_2, \ldots, X_r) + \cdots + \omega(X_1, \ldots, [X, X_r]). \tag{7.1}$$

Now, we use (7.1) as the definition of the r-form $X(\omega)$, and call it the Lie derivative of $\omega$ by X. We must verify that $X(\omega)$ is well defined by (7.1). That it depends skew-symmetrically on $X_1, \ldots, X_r$ should be obvious. The only nontrivial point is that it is F(M)-multilinear, that is,

$$X(\omega)(fX_1, X_2, \ldots, X_r) = f\,X(\omega)(X_1, \ldots, X_r) \quad \text{for } f \in F(M). \tag{7.2}$$

(Here we use the algebraic “module” definition of differential forms. It is much more convenient for the purpose of doing things in a coordinate-free way than is the vector bundle definition.) Now

$$\begin{aligned}
X(\omega)(fX_1, \ldots, X_r) &= X(\omega(fX_1, \ldots, X_r)) - \omega([X, fX_1], X_2, \ldots, X_r) - \omega(fX_1, [X, X_2], \ldots, X_r) - \cdots \\
&= X(f)\,\omega(X_1, \ldots, X_r) + f\,X(\omega(X_1, \ldots, X_r)) - \omega(X(f)X_1, \ldots, X_r) - \omega(f[X, X_1], X_2, \ldots, X_r) - \cdots .
\end{aligned}$$

Notice that the first and third terms now cancel, as required in order to prove (7.2).

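For $\omega \in F^r(M)$, $X \in V(M)$, define the contraction of $\omega$ by X, $X \lrcorner\, \omega$,† as the (r - 1)-form given by

$$(X \lrcorner\, \omega)(X_1, \ldots, X_{r-1}) = \omega(X, X_1, \ldots, X_{r-1}). \tag{7.3}$$

It is readily verified that (7.1) gives the rule

$$X(Y \lrcorner\, \omega) = [X, Y] \lrcorner\, \omega + Y \lrcorner\, X(\omega) \quad \text{for } X, Y \in V(M). \tag{7.4}$$

Using this, we can prove that

$$X(\omega_1 \wedge \omega_2) = X(\omega_1) \wedge \omega_2 + \omega_1 \wedge X(\omega_2) \quad \text{for } X \in V(M). \tag{7.5}$$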

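Proof. Suppose degree $\omega_1 = r$, degree $\omega_2 = s$. Then (7.5) is true for r = s = 0. Proceed to prove (7.5) by induction on r + s. Let Y be another vector field:

$$\begin{aligned}
Y \lrcorner\, X(\omega_1 \wedge \omega_2) &= \text{(using (7.4))}\quad X(Y \lrcorner\, (\omega_1 \wedge \omega_2)) - [X, Y] \lrcorner\, (\omega_1 \wedge \omega_2) \\
&= X\bigl((Y \lrcorner\, \omega_1) \wedge \omega_2 + (-1)^r \omega_1 \wedge (Y \lrcorner\, \omega_2)\bigr) - [X, Y] \lrcorner\, (\omega_1 \wedge \omega_2) \\
&= \text{(using (7.4) again and the induction hypothesis)} \\
&\quad ([X, Y] \lrcorner\, \omega_1) \wedge \omega_2 + (Y \lrcorner\, X(\omega_1)) \wedge \omega_2 + (Y \lrcorner\, \omega_1) \wedge X(\omega_2) \\
&\quad + (-1)^r \bigl(X(\omega_1) \wedge (Y \lrcorner\, \omega_2) + \omega_1 \wedge ([X, Y] \lrcorner\, \omega_2) + \omega_1 \wedge (Y \lrcorner\, X(\omega_2))\bigr) \\
&\quad - ([X, Y] \lrcorner\, \omega_1) \wedge \omega_2 - (-1)^r \omega_1 \wedge ([X, Y] \lrcorner\, \omega_2).
\end{aligned}$$

When the cancellations are made, we get $Y \lrcorner$ applied to the right-hand side of (7.5). Since Y is an arbitrary vector field, (7.5) holds for forms of this degree also.

If $f \in F(M)$, then

$$X(df) = dX(f). \tag{7.6}$$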

Proof. For $Y \in V(M)$,

$$Y \lrcorner\, X(df) = X(Y \lrcorner\, df) - [X, Y] \lrcorner\, df = X(Y(f)) - [X, Y](f) = Y(X(f)).$$

But,

$$Y \lrcorner\, d(X(f)) = Y(X(f)).$$

Q.E.D.
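The rules (7.5) and (7.6) already suffice to compute Lie derivatives in coordinates. As a worked illustration (the form and the two vector fields below are choices made for this example, not taken from the text), take the 1-form $x\,dy - y\,dx$ in the plane.

```latex
% Illustrative example in the (x, y) plane:
%   \omega = x\,dy - y\,dx,
%   X = -y\,\partial/\partial x + x\,\partial/\partial y   (rotation field),
%   Y =  x\,\partial/\partial x + y\,\partial/\partial y   (dilation field).
\begin{align*}
X(\omega) &= X(x)\,dy + x\,d(X(y)) - X(y)\,dx - y\,d(X(x)) \\
          &= -y\,dy + x\,dx - x\,dx + y\,dy = 0, \\
Y(\omega) &= Y(x)\,dy + x\,d(Y(y)) - Y(y)\,dx - y\,d(Y(x)) \\
          &= x\,dy + x\,dy - y\,dx - y\,dx = 2\,\omega .
\end{align*}
% Here X(x) = -y, X(y) = x, Y(x) = x, Y(y) = y, and (7.6) was used in the
% form X(dy) = d(X(y)), and so on.
```

So this form is unchanged by the rotation field, while the dilation field rescales it by a factor of 2.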

The geometric meaning of the Lie derivative by X is not so evident in this formal treatment: It will become clearer when we deal with Lie groups.

† In some differential geometry books, $X \lrcorner\, \omega$ is denoted by $i(X)(\omega)$ and $X(\omega)$ is denoted by $\theta(X)(\omega)$ or $L_X(\omega)$.



Roughly, $X(\omega)$ is a measure of the extent to which $\omega$ is invariant under the one-parameter group generated by X. Suppose, for example, that we examine the geometric consequences of the condition $X(\omega) = 0$. If X is identically zero, it means nothing, since $X(\omega)$ is always zero. Pick a point p at which $X(p) \ne 0$, and introduce a coordinate system $x_i$ about p in which $X = \partial/\partial x_1$. Now $\omega$ admits an expansion in this neighborhood of the form

$$\omega = \sum a_{j_1 \cdots j_r}\, dx_{j_1} \wedge \cdots \wedge dx_{j_r}.$$

Using the rules developed above,

$$0 = X(\omega) = \sum X(a_{j_1 \cdots j_r})\, dx_{j_1} \wedge \cdots \wedge dx_{j_r} + a_{j_1 \cdots j_r}\, d(X(x_{j_1})) \wedge \cdots \wedge dx_{j_r} + \cdots = \sum X(a_{j_1 \cdots j_r})\, dx_{j_1} \wedge \cdots \wedge dx_{j_r},$$

since $d(X(x_j))$ is always zero. Hence

$$\frac{\partial a_{j_1 \cdots j_r}}{\partial x_1} = 0; \tag{7.7}$$

that is, $a_{j_1 \cdots j_r}$ is a function of $x_2, \ldots, x_n$ alone. The one-parameter group $t \to \phi_t$ generated by X then leaves $x_2, \ldots, x_n$ alone and increases $x_1$ linearly; $x_1 \to x_1 + t$. We see that (7.7) is the condition that $\omega$ be invariant under each of these transformations; that is,

$$\phi_t^*(\omega) = \omega \quad \text{for all } t.$$

This holds in a neighborhood of p; however, the set of points where it holds is open and closed in M, and since we are assuming M to be connected, it holds everywhere on M.

Now we turn to extending d to forms of all degree, sending an r-form $\omega$ onto an (r + 1)-form $d\omega$. We shall want the following basic formula relating d to Lie derivative to hold:

$$X(\omega) = X \lrcorner\, d\omega + d(X \lrcorner\, \omega) \quad \text{for } X \in V(M). \tag{7.8}$$

Let us take advantage of the fact that we have already defined $X(\omega)$, and can assume that $d(X \lrcorner\, \omega)$ is defined by induction, to define $d\omega$ for forms of degree r, assuming it is defined (and satisfies (7.8)) for forms of degree less than r. Explicitly,

$$d\omega(X_1, \ldots, X_{r+1}) = X_1(\omega)(X_2, \ldots, X_{r+1}) - d(X_1 \lrcorner\, \omega)(X_2, \ldots, X_{r+1}). \tag{7.9}$$

Now, as in the earlier definition of $X(\omega)$, we must verify that (7.9) really defines $d\omega$ as a skew-symmetric, F(M)-multilinear function of $X_1, \ldots, X_{r+1}$. This is similar to the earlier computation and is left as an exercise. The three remaining important properties of d are

$$X(d\omega) = dX(\omega), \tag{7.10}$$

$$d(\omega_1 \wedge \omega_2) = d\omega_1 \wedge \omega_2 + (-1)^r \omega_1 \wedge d\omega_2 \tag{7.11}$$

(if $\omega_1$ is an r-form), and

$$d(d\omega) = 0 \quad \text{for all forms } \omega. \tag{7.12}$$

All three can be proved by the technique we have used already; namely, we assume that they are true for forms of lower degree and apply the inner product $Y \lrcorner$ to both sides, for an arbitrary vector field Y. As an example, we prove (7.12) with this technique:

$$\begin{aligned}
Y \lrcorner\, d(d\omega) &= Y(d\omega) - d(Y \lrcorner\, d\omega) \\
&= Y(d\omega) - d\bigl(Y(\omega) - d(Y \lrcorner\, \omega)\bigr) \\
&= Y(d\omega) - dY(\omega) \quad \text{(the third term vanishes by induction hypothesis)} \\
&= 0 \quad \text{(by (7.10))}.
\end{aligned}$$

In principle, (7.9) can be worked out to give a noninductive, explicit definition of $d\omega$. However, this is never used in practice. Either (7.8) is used or d can be calculated very simply in local coordinates. Suppose $x_i$ are coordinate functions on a neighborhood of M. Then

$$\omega = \sum a_{j_1 \cdots j_r}\, dx_{j_1} \wedge \cdots \wedge dx_{j_r}.$$

Using (7.11) and (7.12), we see that

$$d\omega = \sum da_{j_1 \cdots j_r} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_r}. \tag{7.13}$$

(Notice that (7.13) requires (7.12) only for zero forms, where it is easily proved directly, and (7.11). But (7.10) for forms of degree greater than zero is an easy consequence of (7.5) and (7.10) for forms of degree zero. Thus (7.10) could be proved quite simply by using (7.13) instead of the formal method indicated above as an exercise.)

We find that (7.13) has as consequence the basic rule for the behavior of d under mappings between manifolds. Suppose $\phi$ is a map: $M' \to M$ of manifolds. Then we have

$$\phi^*(d\omega) = d\phi^*(\omega) \quad \text{for each differential form } \omega \text{ on } M. \tag{7.14}$$

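Proof. We have already verified (7.14) for zero forms. But it suffices to verify (7.14) in each coordinate patch, and (7.13) obviously enables us to do this, since

$$\begin{aligned}
\phi^*(\omega) &= \sum \phi^*(a_{j_1 \cdots j_r})\, d\phi^*(x_{j_1}) \wedge \cdots \wedge d\phi^*(x_{j_r}), \\
d\phi^*(\omega) &= \sum d\phi^*(a_{j_1 \cdots j_r}) \wedge d\phi^*(x_{j_1}) \wedge \cdots \wedge d\phi^*(x_{j_r}) \\
&= \sum \phi^*(da_{j_1 \cdots j_r}) \wedge \phi^*(dx_{j_1}) \wedge \cdots \wedge \phi^*(dx_{j_r}) = \phi^*(d\omega).
\end{aligned}$$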
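The coordinate formula (7.13) and the relation (7.8) between d, contraction, and Lie derivative can be checked against each other on a small example. The form and vector field below are illustrative choices made here, not taken from the text.

```latex
% Illustrative check of (7.8) in the (x, y) plane, with
%   \omega = x\,dy - y\,dx   and   X = -y\,\partial/\partial x + x\,\partial/\partial y.
\begin{align*}
d\omega &= dx \wedge dy - dy \wedge dx = 2\,dx \wedge dy
          && \text{(by (7.13))}, \\
X \lrcorner\, d\omega &= 2\bigl(X(x)\,dy - X(y)\,dx\bigr) = -2\,(x\,dx + y\,dy), \\
X \lrcorner\, \omega &= x\,X(y) - y\,X(x) = x^2 + y^2, \\
d(X \lrcorner\, \omega) &= 2\,(x\,dx + y\,dy), \\
X \lrcorner\, d\omega + d(X \lrcorner\, \omega) &= 0 .
\end{align*}
% Computing the Lie derivative directly from (7.5) and (7.6) gives
%   X(\omega) = X(x)\,dy + x\,d(X(y)) - X(y)\,dx - y\,d(X(x))
%             = -y\,dy + x\,dx - x\,dx + y\,dy = 0,
% in agreement with (7.8).
```

Neither of the two terms on the right of (7.8) vanishes separately here; only their sum does.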

Integration on Manifolds

Our main concern in this book is with differential calculus on manifolds. Since we shall also occasionally need some of the basic facts of integral geometry, we now present a short survey of what we need. The reader can refer to Spivak [1] for a fuller treatment of the integration theory. Let M be a manifold (always assumed to be $C^\infty$ and representable as a countable union of compact sets). Suppose dim M = n.

Definitions

M is orientable if it admits at least one differential form $\omega$ of degree n whose value is nonzero at each point of M. Two such forms $\omega$ and $\omega'$ are equivalent (for the purposes of orientation) if $\omega = f\omega'$, with $f \in F(M)$, $f(p) > 0$, for all $p \in M$. An orientation of M is just an equivalence class of such forms. It is readily verified that if M is connected, it admits either no orientation or two. (For if an n-form $\omega$ defines an orientation, $-\omega$ defines an orientation in a different class. If $\omega'$ also defines an orientation, $\omega'$ must equal $f\omega$ for some everywhere nonzero $f \in F(M)$. Since M is connected, either f > 0 or f < 0 everywhere on M. In the former case, $\omega'$ is equivalent to $\omega$; in the latter, to $-\omega$.) If M is disconnected, fixing an orientation on each connected component of M defines an orientation for M in an obvious way.

A coordinate system $(x_1, \ldots, x_n)$ valid in a connected open set U of M is positively (respectively, negatively) oriented with respect to an orientation defined by an n-form $\omega$ if

$$dx_1 \wedge \cdots \wedge dx_n = f\omega,$$

with f > 0 in U (respectively, f < 0 in U).

A partition of unity for M is a sequence $f_1, f_2, \ldots$ of functions from F(M) such that
(a) $\sum_{j=1}^{\infty} f_j(p) = 1$ and $f_j(p) \ge 0$ for all $p \in M$, j = 1, 2, ....
(b) Each function $f_j$ has assigned to it an open subset $U_j$ of M such that $f_j$ vanishes outside $U_j$, and each $U_j$ meets only a finite number of the other sets of the sequence.
(c) Each set $U_j$ is a coordinate neighborhood of the manifold, with a coordinate system of functions $x_1, \ldots, x_n$ valid in $U_j$.
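To make conditions (a) and (b) concrete, here is a minimal sketch of a partition of unity on the real line, with the sets $U_j = (j - 1, j + 1)$ indexed by the integers. The bump function, the covering, and all names are assumptions of this illustration rather than anything prescribed by the text.

```python
import math

def bump(u):
    """Smooth bump: positive for |u| < 1 and identically zero otherwise."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def f(j, x):
    """j-th function of the partition; it vanishes outside U_j = (j - 1, j + 1)."""
    # At a given x only the bumps centred at floor(x) and floor(x) + 1 can be
    # nonzero, so the normalising sum has at most two terms (local finiteness).
    k0 = math.floor(x)
    total = bump(x - k0) + bump(x - (k0 + 1))
    return bump(x - j) / total

def partition_sum(x):
    """Sum of all f_j at x; only the indices near floor(x) can contribute."""
    k0 = math.floor(x)
    return sum(f(j, x) for j in (k0 - 1, k0, k0 + 1, k0 + 2))

for x in (-2.3, 0.0, 0.5, 3.999):
    print(x, partition_sum(x))
```

Each $U_j$ meets only $U_{j-1}$ and $U_{j+1}$, and each is an interval, hence a coordinate neighborhood, so condition (c) holds as well in this one-dimensional case.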

It follows from (a) and (b) that M is the union of the $U_j$. Conversely, it can be proved that any open covering $U_1, U_2, \ldots$ of M such that each set meets only a finite number of the other sets has a set of functions $f_1, f_2, \ldots$ associated with it satisfying (a) and (b). The proof can be found quite easily in Helgason [1, p. 8] or Spivak [1, p. 63].

Now, let M be orientable, with a fixed orientation. Let $\theta$ be an n-form on M and let f be a continuous real-valued† function on M. We define the integral of f over M with respect to $\theta$, denoted by $\int_M f\theta$, as follows:

Case 1 f vanishes outside a coordinate neighborhood U.

Let $\phi: D \to U$ be a diffeomorphism of U with a convex domain D in $R^n$, chosen so that the coordinate system defined by $\phi$ on U is positively oriented. Let $(x_1, \ldots, x_n)$ be coordinates on D, $\phi^*(\theta) = g\, dx_1 \wedge \cdots \wedge dx_n$. Then put

$$\int_M f\theta = \int_D \phi^*(f)\, g\, dx_1 \cdots dx_n,$$

where the integral on the right-hand side is the ordinary‡ Riemann integral for the function $\phi^*(f)g$ on the domain D. We pause to show that this is independent of the choice made of $\phi$ and D. Suppose then that $\phi': D' \to U$ is another diffeomorphism of U with a convex domain D' in $R^n$, with $\phi'^*(\theta) = g'\, dx_1 \wedge \cdots \wedge dx_n$. Let $\psi$ be the mapping $D' \to D$ defined as: $\psi = \phi^{-1}\phi'$. Then

$$\psi^*(dx_1 \wedge \cdots \wedge dx_n) = \phi'^*\phi^{-1*}(dx_1 \wedge \cdots \wedge dx_n) = \frac{g'}{\psi^*(g)}\, dx_1 \wedge \cdots \wedge dx_n .$$

But $\psi^*(dx_1 \wedge \cdots \wedge dx_n)$ also equals $J\, dx_1 \wedge \cdots \wedge dx_n$, where J is the Jacobian of the mapping $\psi$ from D' to D. Now we take, as known from advanced calculus, the behavior of the Riemann integral under a diffeomorphism $\psi: D' \to D$, namely,

$$\int_D h\, dx_1 \cdots dx_n = \int_{D'} \psi^*(h)\, |J|\, dx_1 \cdots dx_n \quad \text{for all } h \in C(D).$$

But $J = g'/\psi^*(g)$, which is positive, since both $\phi$ and $\phi'$ define coordinate systems that are oriented positively. Now

$$\int_M f\theta = \int_D \phi^*(f)\, g\, dx_1 \cdots dx_n = \int_{D'} \psi^*(\phi^*(f)\, g)\, J\, dx_1 \cdots dx_n = \int_{D'} \phi'^*(f)\, g'\, dx_1 \cdots dx_n .$$

† Of course the theory can be trivially extended to complex functions by separating them into real and imaginary parts.
‡ When we write $dx_1 \cdots dx_n$ with no wedge products, we just mean the ordinary, unoriented Riemann integral. Thus, to be pedantic, it should as a symbol be distinguished from $\int_D \phi^*(f)\, g\, dx_1 \wedge \cdots \wedge dx_n$, although, of course, it is equal to it as a number.
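The change-of-variables rule for the Riemann integral that underlies this independence argument can be tried out numerically. The sketch below is only an illustration under stated assumptions: the map `psi`, the integrand `h_func`, the midpoint rule, and the grid size are choices made here, with D and D' both taken to be the open unit square.

```python
def midpoint_integral(func, n=200):
    """Midpoint-rule approximation of the integral of func over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += func(u, v)
    return total * h * h

def psi(u, v):
    """A diffeomorphism of the open unit square onto itself (arbitrary choice)."""
    return (u * u, v ** 3)

def jacobian(u, v):
    """Jacobian determinant of psi; it is positive on the open square."""
    return (2.0 * u) * (3.0 * v * v)

def h_func(x, y):
    """Integrand on D; its exact integral over the unit square is 1."""
    return x + y

def pulled_back(u, v):
    """The integrand psi^*(h) |J| on D'."""
    x, y = psi(u, v)
    return h_func(x, y) * abs(jacobian(u, v))

direct = midpoint_integral(h_func)
transformed = midpoint_integral(pulled_back)
print(direct, transformed)
```

Both numbers should agree with the exact value 1 up to the discretization error of the midpoint rule.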

Case 2 f vanishes outside a compact subset of M.

Let $\{f_1, f_2, \ldots\}$ be functions on M defining a partition of unity for M. Then f can be written as

$$f = \sum_{j=1}^{\infty} f f_j .$$

The sum on the right-hand side is really finite, for by property (b) of a partition of unity, the compact subset outside of which f vanishes meets only a finite number of the elements of the covering of M associated with the partition of unity. Since each $f f_j$ vanishes outside a coordinate neighborhood, by case 1, $\int_M f f_j \theta$ is defined. Now let us define

$$\int_M f\theta = \sum_{j=1}^{\infty} \int_M f f_j \theta .$$

(Again, this is really only a finite sum.) We must prove that this is independent of the partition of unity chosen. Suppose, then, that $\{f_1', f_2', \ldots\}$ is another partition of unity. Now $\{f_j f_k' : 1 \le j, k < \infty\}$ is also a partition of unity, since†

$$\sum_{j,k=1}^{\infty} f_j f_k'(p) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} f_j(p) f_k'(p) = 1 .$$

† Some analysis is needed to prove that the double summation can be broken up, but the justification (left to the reader) readily follows from theorems on convergence of infinite series, since all terms are nonnegative.


Now consider

$$\sum_{j,k=1}^{\infty} \int_M f f_j f_k' \theta .$$

Since this is really a finite sum, it can be split up into

$$\sum_{j=1}^{\infty} \Bigl( \sum_{k=1}^{\infty} \int_M f f_j f_k' \theta \Bigr).$$

For fixed j, $f f_j f_k'$ vanishes outside a coordinate neighborhood; hence the additivity of the Riemann integral implies that

$$\sum_{k=1}^{\infty} \int_M f f_j f_k' \theta = \int_M f f_j \theta .$$

Performing the double summation in the reverse order, we see that

$$\sum_{j=1}^{\infty} \int_M f f_j \theta = \sum_{k=1}^{\infty} \int_M f f_k' \theta,$$

which proves invariance.

Case 3 The General Case.

Let $C_c(M)$ be the vector space of continuous, real-valued functions on M that vanish outside a compact subset of M. We can sum up what we have proved above in the following way: An n-differential form $\theta$ on M defines a linear functional $f \to \int_M f\theta$ on $C_c(M)$ with the following property: Given a compact set K of M, there exists a number a > 0 such that, whenever $f \in C_c(M)$ vanishes outside K and is everywhere bounded in absolute value by 1,

$$\Bigl| \int_M f\theta \Bigr| \le a .$$

(Tracing through case 2 and referring back to the properties of the Riemann integral in bounded domains in $R^n$, we see that a is fixed, once a particular finite set of coordinate neighborhoods covering K is chosen.) Now, this is just the property needed to extend $\int_M f\theta$ to functions f on M that are merely Borel measurable. (Following the pattern established in extending the Riemann integral from continuous functions on closed intervals to a Lebesgue integral over the whole real line, roughly, one approximates f by sequences from $C_c(M)$, while defining the integral by a limiting operation.) As usual in measure theory, we say that such an f is integrable if $\int_M |f|\theta$ is finite. We have all the general tools of integration theory that are available on, say, the real line for the Lebesgue integral. However, all immediate work in this chapter is concerned with continuous functions, so that we are really dealing only with the analog of the Riemann integral.

It must be emphasized, however, that one should not go too far in trying to regard this from the point of view of functional analysis. The main aims in integral geometry are computation of such integrals, or at least statements of theorems showing how such computations can be reduced to computation of geometric invariants, and of theorems about the behavior of the integrals under mappings. In addition, in the future the problem of singularities of such integrals will be increasingly important. In such problems it is important to be able to utilize the intuitive tricks of integrating that are learned in integral calculus. In these computations, our standard notation is sometimes awkward,† and more intuitive notations are desirable. For example, for $\int_M f\theta$ we sometimes write

and so forth. The third notation is useful when one form of $\theta$ is fixed throughout the discussion; it can be called dp. Physicists have devised a useful notation for integrating over domains in Euclidean space of variables $x_1, \ldots, x_n$, which goes something like this: $\int f(x)\, d^n x$.

We now turn to the question of behavior under mappings, which is really the main problem of integral geometry. The most immediate concern is behavior under diffeomorphisms, which follows more or less from the definitions.

THEOREM 7.1 Let $\phi: M \to M'$ be a diffeomorphism between manifolds, and let $\theta'$ be a volume element‡ form on M', $f' \in F(M')$. Then

$$\int_{M'} f'\theta' = \int_M \phi^*(f')\,\phi^*(\theta').$$

Proof. When f' vanishes outside a coordinate neighborhood of M', this is inherent in the proof given above that the defining Riemann integral does not depend on the choice of coordinate neighborhood. The general case can be obtained from this one by using a partition of unity.

Next, suppose that M is a manifold, that $\omega$ is a p-form on M, that N is a p-dimensional manifold, and that the map $\phi: N \to M$ defines N as a submanifold of M. Then $\phi^*(\omega)$ is a volume element form with respect to N; hence we

† Although, conversely, sometimes problems in integral calculus become much more amenable when written in a more abstract than usual notation. This is particularly true of "change of variable" arguments; writing down explicitly the mappings involved and using the general formalism often clears up much confusion.
‡ By this we mean a form of the same degree as the dimension of the space. We shall state things for $C^\infty$ functions f, but of course everything that does not involve differentiation of f usually extends to at least continuous f. In addition, we shall not state explicitly, unless there is a possibility of confusion, whether we are working with a particular orientation on each manifold.



can define $\int_N \phi^*(\omega)$ as above. As usual, it is often convenient to suppress mention of $\phi$ and write this simply as $\int_N \omega$. Defining the integral of a p-form over a p-dimensional manifold is very natural and easy, since differential forms can be "pulled back" under mappings.

Now let us suppose that $\phi: M \to B$ is a map of manifolds and that $\omega$ is a volume element differential form on M. Of course it makes no geometric sense to think of defining an object like $\phi_*(\omega)$, since differential forms are covariant objects; that is, they behave under mappings as functions rather than as points. However, $\omega$ does define a linear form on functions, namely, $f \to \int_M f\omega$, which, as the dual of a covariant object, is contravariant. Thus it may be expected that the linear functional defined by $\omega$ on functions does get "pushed" by $\phi$ to define a linear functional on functions of B. We shall denote this functional by $\phi^{-1*}(\omega)$. Thus, as definition, $\phi^{-1*}(\omega)$ is a linear map $C_c(B) \to R$, defined by

$$\phi^{-1*}(\omega)(f) = \int_M \phi^*(f)\,\omega \quad \text{for } f \in C_c(B).$$

Now there is one immediate difficulty with this definition, namely, $\phi^*(f)$ may not be integrable with respect to the measure defined by $\omega$. However, there are two main cases where this difficulty does not arise:
(a) All continuous functions on M are integrable with respect to $\omega$.
(b) $\phi$ is a proper map; that is, the inverse image of every compact subset of B under $\phi$ is a compact subset of M.
For the moment, the reader can assume that we are working with one of these assumptions; there are various devices available for weakening them.

To justify this unorthodox notation, namely, $\phi^{-1*}(\omega)$, note that by Theorem 7.1 (in case $\phi^{-1}$ exists; that is, $\phi$ is a diffeomorphism) $\phi^{-1*}(\omega)$ as a functional agrees with the functional defined by the form $\phi^{-1*}(\omega)$. In addition, if $\psi: B \to C$ is another map, the reader can easily check that $(\psi\phi)^{-1*}$ is just $\psi^{-1*}\phi^{-1*}$, so that the notation will not lead to inconsistency on iteration of mappings. We also believe that the notation has some intuitive geometric content. If $\phi^{-1}$ exists as a map, $\phi^{-1*}(\omega)$ is of the same degree as $\omega$. If not (for example, if B is of lower dimension than M), then $\phi^{-1*}(\omega)$ is something like a volume element form on B, so that $\phi^{-1*}$ "collapses" the degree of $\omega$. In fact, we shall see below that $\phi^{-1*}$ acts by "collapsing" the component of $\omega$ along the fibers of $\phi$ by a process of "integration over the fibers."

There is another mapping associated with $\phi$ that carries measures on B back into measures on M. Let $\theta$ be any fixed volume element form on B. Then any $f \in F(B)$ defines a measure on B, namely, that associated with the volume element form $f\theta$. But we can pull back f to $\phi^*(f)$, then form $\phi^*(f)\omega$ to get a volume element, and hence get a measure on M. We can thus



regard this as a mapping of F(B) into the space of measures on M, or more generally, into the space of linear functionals on $C_c(M)$. Thus we have the possibility of extending this map, defined by $\phi$ and $\omega$, of C(B) into† $C_c(M)^*$, to a map from certain generalized functions‡ on B into certain generalized functions on M; we shall continue to use the notation $\phi^*$ for this mapping. The most important such generalized function is the Dirac delta function. Suppose, then, that b is a point of B, that $d_1, d_2, \ldots$ are a sequence of elements of $C_c(B)$ that converge to the Dirac delta function $\delta_b$; that is,

$$\lim_{j \to \infty} \int_B d_j f\theta = f(b) \quad \text{for all } f \in C_c(B).$$

Then, according to our definition, $\phi^*(\delta_b)$ is the linear functional on $C_c(M)$ such that

Physicists use the notation

$$\phi^*(\delta_b) = \delta_{\phi^{-1}(b)},$$

regarding $\delta_{\phi^{-1}(b)}$ as the "delta function" corresponding to the set $\phi^{-1}(b)$, just as the usual Dirac functions are associated with a set consisting of one point. Using this bit of formalism, we can write the relation between $\phi^{-1*}$ and $\phi^*$ in an interesting, but purely formal, way. Suppose the measure $\phi^{-1*}(\omega)$ on M' is defined by a volume element differential form on M'; for example, by one of the forms $g\theta$. Thus

where db is a suggestive shorthand for $\theta$. Now, formally, g can be written as

thus, formally again,

Using the relation $\phi^{-1*}(\omega) = g\theta$ again, we have (purely formally, of course)

$$\int_M \phi^*(\delta_b)\,\omega = \int_B \delta_b(b')\, g(b')\, db' .$$

† This just denotes the set of real-valued linear forms on $C_c(M)$.
‡ We refer to Gelfand-Silov [1] for the notion of "generalized function."



Finally,

$$\int_M f\,\omega = \int_B \Bigl( \int_M f\,\phi^*(\delta_b)\,\omega \Bigr)\,db \quad \text{for all } f \in F(M). \tag{7.15}$$

This is one of the two main formulas of integral geometry (the other is Stokes' formula); it describes how an integration over $M$ can be decomposed into an integration over the fibers of $\phi$, followed by an integration over $B$. (It is a generalization of the formula

$$\iint f(x, y)\,dx\,dy = \int \Bigl( \int f(x, y)\,dx \Bigr)\,dy,$$

which is well known in integral calculus. However, (7.15) contains a good deal more information, since it holds in cases where the map $\phi$ is much more complicated than the simple projection map associated with a Cartesian product.) At any rate, one of the most urgent tasks of integral geometry is to find the broadest conditions under which (7.15) holds, that is, under which the formal tricks we used can be justified. We shall now mention one general theorem that is relevant.

Let $M$ be a manifold, $\phi: M \to B$ a map of $M$ onto $B$, with $\dim B \le \dim M$ and with both $M$ and $B$ orientable. We say that a point $p \in M$ is a nonsingular point of the mapping if $\phi_*(M_p) = B_{\phi(p)}$, and we then say that a point $b \in B$ is a nonsingular image point of the mapping if $\phi^{-1}(b)$ consists only of nonsingular points. Let $\psi$ be an everywhere nonzero volume element form on $B$. A theorem of Sard tells us that the singular image points of $\phi$ form a subset of measure zero on $B$. (By "measure zero" we mean relative to the measure defined by $\psi$ on $B$. We must refer to the exposition given by Sternberg [1, p. 47].) Now, if $b \in B$ is a nonsingular image point of $\phi$, it follows from the implicit function theorem that $\phi^{-1}(b)$ is a submanifold of $M$ whose dimension is equal to $(\dim M - \dim B)$. Now, let $f \in C_c(B)$ and let $\omega$ be a differential form that vanishes outside a compact subset of $M$ and whose degree is equal to $(\dim M - \dim B)$. Thus

$$\theta = \omega \wedge \phi^*(f\psi)$$

is a volume element form on $M$; hence $\int_M \theta$ is well defined. We want to express this integral in terms of an integration over the fibers of $\phi$ and over the base manifold $B$.

THEOREM 7.2 With the above notations,

$$\int_M \theta = \int_B f(b) \Bigl( \int_{\phi^{-1}(b)} \omega \Bigr)\,\psi. \tag{7.16}$$



A word of explanation of the notations inherent in this formula is necessary. If $b$ is a singular image point, we must regard $\int_{\phi^{-1}(b)} \omega$ as undefined. If $b$ is a nonsingular image point, we define $\int_{\phi^{-1}(b)} \omega$ as follows: $\omega$ restricted to $\phi^{-1}(b)$ is a volume element form for the submanifold. The points where this form is nonzero form an open subset of the submanifold (hence also a submanifold of $M$), and $\int_{\phi^{-1}(b)} \omega$ is defined as the integral of $\omega$ over this submanifold. Thus $b \to f(b) \int_{\phi^{-1}(b)} \omega$ is a real-valued function defined except for a set of measure zero on $B$; hence the right-hand side of (7.16) is defined as the integral of this function over $B$.

The proof of formula (7.16) can be found in all generality in a paper by Federer [1], although it is expressed there in a different language. We shall not give the proof of (7.16) in full here, but only in the case where $\phi$ has no singular points, that is, where $\phi$ is a maximal rank, onto mapping. Actually, the fact that (7.16) can allow singularities is its most interesting and delicate point, but the "nonsingular" version we prove here is adequate for most of the applications we have in mind.

Since (7.16) is linear in $\omega$, notice first that it suffices to prove it in case $\omega$ vanishes outside a coordinate neighborhood of $M$. By the implicit function theorem, $M$ can be covered by coordinate neighborhoods $U$ having coordinate systems $x_1, \ldots, x_n$ with the following properties:

(a) $0 < x_1, \ldots, x_n < 1$.
(b) $\phi(U)$ is a coordinate neighborhood for $B$, with coordinate system $y_1, \ldots, y_m$ such that $\phi^*(y_1) = x_1, \ldots, \phi^*(y_m) = x_m$.

Suppose that in $\phi(U)$, $f\psi$ is given by a form $h(y)\,dy_1 \wedge \cdots \wedge dy_m$. We can suppose without loss of generality that $\omega$ is of the form $k(x)\,dx_{m+1} \wedge \cdots \wedge dx_n$ (since any of the terms involving $dx_1, \ldots, dx_m$ will not affect either side of (7.16)). Thus the left-hand side of (7.16) is

$$\int_0^1 \cdots \int_0^1 h(x)\,k(x)\,dx_1 \cdots dx_n.$$

The right-hand side is

$$\int_0^1 \cdots \int_0^1 h(y) \Bigl( \int_0^1 \cdots \int_0^1 k(y, x_{m+1}, \ldots, x_n)\,dx_{m+1} \cdots dx_n \Bigr)\,dy_1 \cdots dy_m,$$

and the two are then equal by the property of the Riemann integral that asserts that multiple integrals can be evaluated by iterated partial integrals. Q.E.D.
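As a numerical sanity check of (7.16) in the nonsingular case, the following sketch takes $M = R^2$ minus the origin, $\phi(x, y) = x^2 + y^2 - 1$, $\psi = db$, and the 1-form $(x\,dy - y\,dx)/(2(x^2 + y^2))$, which satisfies $d\phi \wedge \omega = dx \wedge dy$ and restricts to $dt/2$ on the unit circle ($t$ the polar angle). With $f$ replaced by a narrow bump concentrated at $b = 0$, the left-hand side becomes an ordinary double integral and the right-hand side is approximately the fiber integral over $\phi^{-1}(0)$. All of these choices, the weight functions `k_one` and `k_quad`, and the helper names `hat`, `smeared_integral`, and `fiber_integral` are illustrative assumptions, not taken from the text; orientation signs are ignored, and only the underlying measures are compared.

```python
import math


def hat(t, eps):
    """Triangular bump of unit integral supported on [-eps, eps] (stand-in for f)."""
    return max(0.0, 1.0 - abs(t) / eps) / eps


def phi(x, y):
    """Hypothetical map M -> B whose fiber over b = 0 is the unit circle."""
    return x * x + y * y - 1.0


def smeared_integral(k, eps=0.05, half_width=1.5, n=600):
    """Midpoint-rule value of the double integral of k * hat(phi) dx dy."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            w = hat(phi(x, y), eps)
            if w > 0.0:
                total += k(x, y) * w
    return total * h * h


def fiber_integral(k, n=2000):
    """Integral of k * omega over the unit circle, using omega = dt / 2 there."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += k(math.cos(t), math.sin(t))
    return total * dt / 2.0


def k_one(x, y):
    return 1.0


def k_quad(x, y):
    return x * x


if __name__ == "__main__":
    for name, k in (("k = 1", k_one), ("k = x^2", k_quad)):
        print(name, smeared_integral(k), fiber_integral(k))
```

For each weight the two printed numbers should agree to within the discretization error of the grid, which is the content of (7.16) when $f$ is concentrated near a single nonsingular image point.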

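COROLLARY TO THEOREM 7.2 Suppose that $\phi^*(\psi) \wedge \omega$ is a volume element form for $M$ that is nonzero on $\phi^{-1}(b)$, with $b$ a nonsingular image point for $\phi$. Then $\phi^*(\delta_b) = \delta_{\phi^{-1}(b)}$, the "Dirac delta function" of the fiber $\phi^{-1}(b)$, is just $\omega$ in the sense that

$$\delta_{\phi^{-1}(b)}(f) = \int_{\phi^{-1}(b)} f\,\omega \quad \text{for all } f \in C_c(M).$$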

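To make this more explicit we give an example: Suppose that $M$ is $R^n$ itself and that $\phi$ is a map of $R^n \to R$; that is, $\phi$ is just a real-valued function on $R^n$, say, of the form $x \to \phi(x)$.

Suppose, say, that $b = 0$. The condition that $b$ be a regular image point is then just $d\phi(x) \ne 0$ for each $x$ such that $\phi(x) = 0$. Let $\operatorname{grad} \phi$ be the vector field

$$\sum_{i=1}^{n} \frac{\partial \phi}{\partial x_i} \frac{\partial}{\partial x_i}.$$

Then $\operatorname{grad} \phi(x) \ne 0$ for each $x$ such that $\phi(x) = 0$. Suppose the form $\psi$ on $R$ is just the Riemann integral form. Then $\phi^*(\psi) = d\phi$. Let us then try to find an $(n - 1)$-form $\omega$ such that

$$\theta = dx_1 \wedge \cdots \wedge dx_n = d\phi \wedge \omega.$$

Apply the inner product of $\operatorname{grad} \phi$ to both sides:

$$\operatorname{grad} \phi \,\lrcorner\, \theta = \operatorname{grad} \phi(\phi)\,\omega - d\phi \wedge (\operatorname{grad} \phi \,\lrcorner\, \omega).$$

Now $d\phi$ is zero when restricted to the fibers of $\phi$, and

$$\operatorname{grad} \phi(\phi) = \sum_{i=1}^{n} \Bigl( \frac{\partial \phi}{\partial x_i} \Bigr)^2,$$

which, anticipating the notation to be introduced in Part 3, we write as $\|\operatorname{grad} \phi\|^2$. Thus, we see that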

$\delta_{\phi^{-1}(0)}$ can be represented by the $(n - 1)$-form $(\operatorname{grad} \phi \,\lrcorner\, \theta)/\|\operatorname{grad} \phi\|^2$, in the sense that $\delta_{\phi^{-1}(0)}$ applied to a function $f \in C_c(M)$ is just the integral of $f$ over the hypersurface $\phi^{-1}(0)$ with respect to this form. At this point we make contact with the material in the first volume of the treatise by Gelfand and Šilov [1] on "generalized functions," and we refer the reader to that discussion for more detail and for the fascinating applications to partial differential equations.

We have now completed our admittedly fragmentary remarks about the general facts concerning the behavior of measures defined by differential forms under mappings. We now turn to the second basic general fact about integration on manifolds, namely "Stokes' formula." Now, just as the behavior of measures under mappings can ultimately be reduced (if things are not too pathological) to the very simple theorem that a multiple Riemann integral can be reduced to iterated one-dimensional Riemann integrals, so can "Stokes' formula" be reduced to the fact that the integral of a derivative of a function is the function itself. Here, again, it is rather difficult to state and prove precisely a version of Stokes' formula that is comprehensive for all geometric applications (at least without detouring into considerable technicalities). We shall compromise again by stating it in reasonable generality and by proving it under simple hypotheses.

Let $M$ be an oriented manifold that will be fixed throughout the discussion. Let $\omega$ be a form of degree equal to $(\dim M - 1)$, and let $D$ be an open subset of $M$. Let $D'$ be the boundary of $D$ in $M$ (that is, $D' = \bar{D} - D$, where $\bar{D}$ is the closure of $D$ in $M$). Now, of course, $D'$ can be quite pathological. However, nice domains will have boundaries that can be exhibited as the union of a "large piece" that is a submanifold of $M$ of codimension 1 (that is, a hypersurface) and various smaller pieces of lower dimension. Suppose this hypersurface is orientable. One of the two possible orientations can be chosen as follows: Let $p \in D'$ be a point on the hypersurface boundary $N$ of $D$, and let $u \in M_p$ be a vector such that sufficiently small curves with tangent vector $u$ point inside $D$. Then a basis $u_1, \ldots, u_{n-1}$ ($n = \dim M$) for $N_p$ is positively oriented if $(u_1, \ldots, u_{n-1}, u)$ is a positively oriented basis of $M_p$ (in terms of the given orientation on $M$; that is, if $\theta$ is the everywhere nonzero volume element form on $M$, then $\theta(u_1, \ldots, u_{n-1}, u) > 0$). This orientation of $N$ will be called the positive orientation of $N$ relative to $D$.
We denote by $\partial D$ this hypersurface on the boundary of $D$ (possibly nonconnected, of course) with the orientation described above. Stokes' formula then states that

$$\int_D d\omega = \int_{\partial D} \omega. \tag{7.17}$$

(Of course, under suitable conditions, one can allow $\partial D$ to be the whole boundary of $D$ if the proper precautions are taken to orient the hypersurface part of the boundary and the convention adopted is that the integral of $\omega$ over a subspace of lower dimension is zero; but our procedure is more in line with $\partial D$ as it is defined in topology.) Now we can prove one simple but adequate version of Stokes' formula:

THEOREM 7.3 Let $D$ be an open subset of an oriented manifold $M$, and let $\partial D$ be an oriented hypersurface of $M$ that lies on the boundary of $D$ and is positively oriented relative to $D$. Suppose that $U_1, U_2, \ldots$ is a sequence of open subsets of $M$ which covers $D \cup \partial D$ and such that, in each $U_j$, $j = 1, 2, \ldots$, there is a function $f_j$ such that

$$\partial D \cap U_j = \{ p \in U_j : f_j(p) = 0 \}, \tag{7.18a}$$

$$df_j \ne 0 \quad \text{in } U_j, \tag{7.18b}$$

$$D \cap U_j = \{ p \in U_j : f_j(p) > 0 \}, \tag{7.18c}$$

and such that

each $U_j$ meets only a finite number of the others. (7.19)

Then Stokes' formula holds in $D$ for each form $\omega$ that is defined and smooth in a neighborhood of $D \cup \partial D$.

Proof. Using a partition of unity, it suffices to deal with the case where $\omega$ vanishes outside one of the sets of the covering having the properties described in (7.18). If necessary, making the covering smaller, we can suppose that each of the sets carries a coordinate system $x_1, \ldots, x_n$ such that $x_1$ is just the function $f_j$. Thus we can reduce to the case where:

(a) $D$ is the subset $\{ (x_1, \ldots, x_n) \in R^n : x_1 > 0 \}$.
(b) $\partial D$ is the subset $\{ (x_1, \ldots, x_n) \in R^n : x_1 = 0 \}$.
(c) $\omega$ vanishes outside a compact subset of $\bar{D}$.

Suppose, for example, that

$$\omega = g_1(x)\,dx_2 \wedge \cdots \wedge dx_n + g_2\,dx_1 \wedge dx_3 \wedge \cdots \wedge dx_n + \cdots + g_n\,dx_1 \wedge \cdots \wedge dx_{n-1}.$$

Since (7.17) is linear in $\omega$ and $d\omega$, it suffices to deal with essentially just two cases, namely,

$$\omega = g_1(x)\,dx_2 \wedge \cdots \wedge dx_n \quad \text{or} \quad \omega = g_2(x)\,dx_1 \wedge dx_3 \wedge \cdots \wedge dx_n.$$

In both cases, $g_1$ and $g_2$ vanish outside a compact set. Then

In the first case,

integrating first with respect to $x_1$, we find that



In the second case,

$$\int_D d\omega = -\int_D \frac{\partial g_2}{\partial x_2}\,dx_1 \cdots dx_n = 0,$$

after integrating with respect to $x_2$ and remembering that $g_2$ vanishes at infinity. But $\int_{\partial D} \omega$ also vanishes, since $dx_1 = 0$ on $\partial D$. Q.E.D.

Remark. In both Theorems 7.2 and 7.3 we have used the same trick, namely, find a formula that is expressed in completely geometric, coordinate-free language. To prove the formula, we first verify it in the simplest possible cases, where it reduces to a well-known property of the Riemann integral, and then extend it, using "partition of unity" tricks, to more complicated situations that are built up from the simple ones. In fact the whole procedure is the prototype for many of the ideas of algebraic topology.
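Stokes' formula (7.17) can also be checked numerically in a simple two-dimensional case. The sketch below takes $D$ to be the open unit disk in $R^2$ with its standard orientation and $\partial D$ the unit circle traversed counterclockwise, and compares $\int_D d\omega$ with $\int_{\partial D} \omega$ for the 1-form $\omega = Q\,dy$ with $Q = x + xy^2$. The test form, the quadrature rules, and every function name are assumptions made for illustration; none of them comes from the text.

```python
import math


def Q(x, y):
    """Coefficient of the hypothetical test form omega = Q dy."""
    return x + x * y * y


def dQ_dx(x, y, h=1e-5):
    """Central-difference approximation to the partial derivative of Q in x."""
    return (Q(x + h, y) - Q(x - h, y)) / (2.0 * h)


def integral_d_omega(n_r=200, n_t=400):
    """Integral of d(omega) = (dQ/dx) dx ^ dy over the unit disk, in polar coordinates."""
    dr = 1.0 / n_r
    dt = 2.0 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_t):
            t = (j + 0.5) * dt
            # The area element in polar coordinates is r dr dt.
            total += dQ_dx(r * math.cos(t), r * math.sin(t)) * r
    return total * dr * dt


def integral_omega_boundary(n_t=4000):
    """Integral of omega over the unit circle, traversed counterclockwise."""
    dt = 2.0 * math.pi / n_t
    total = 0.0
    for j in range(n_t):
        t = (j + 0.5) * dt
        # On the circle x = cos t, y = sin t, so dy = cos t dt.
        total += Q(math.cos(t), math.sin(t)) * math.cos(t)
    return total * dt


if __name__ == "__main__":
    print(integral_d_omega(), integral_omega_boundary())
```

Both integrals can be evaluated by hand to $5\pi/4$, so the two printed numbers should agree with each other and with that value up to quadrature error. The disk satisfies the hypotheses of Theorem 7.3 with the single function $f = 1 - x^2 - y^2$ near the boundary.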

Exercises

1. Prove that $d\omega$ as defined by (7.9) depends skew-symmetrically on its arguments.

2. Work out $d\omega(X, Y)$ and $d\theta(X, Y, Z)$ explicitly if $\omega$ and $\theta$ are 1- and 2-forms. Guess and prove the general formula for $d\omega(X_1, \ldots, X_{r+1})$ for an $r$-form $\omega$.

3. Prove (7.10) and (7.11) in two ways: first, using local coordinates; then completely intrinsically.

4. In the proof of (7.14) given, show why "it suffices to verify (7.14) in each coordinate patch."

5. Suppose $M$ is an orientable manifold, and $N$ is an embedded submanifold of $M$ of one less dimension. Suppose that a function $f \in F(M)$ is identically zero on $N$, and $df \ne 0$ at each point of $N$. Show that $N$ is orientable.

6. Show that the classical Gauss, Green, and Stokes' theorems (proved in vector analysis) are specializations of the general Stokes' theorem.

8 The Frobenius Complete Integrability Theorem

Let $M$ be a manifold, and let $V(M)$ be the set of its vector fields. Originally, we defined an $X \in V(M)$ as a cross section of the tangent bundle to $M$. However, we have established that $X$ can be alternatively defined as a linear mapping $F(M) \to F(M)$ such that

$$X(fg) = X(f)g + fX(g) \quad \text{for } f, g \in F(M).$$

This property can be described algebraically by saying that a vector field is a derivation of the ring $F(M)$. Now, given $X, Y \in V(M)$, we established (in Chapter 6) that the vector field $[X, Y]$, the Jacobi bracket of $X$ and $Y$, can be defined by the rule

$$f \to [X, Y](f) = X(Y(f)) - Y(X(f)).$$

Formula (6.1) gave some of the algebraic properties of this bracket operation. In particular, they showed that $V(M)$ is a Lie algebra (over the real numbers) and established a connection with the theory of Lie groups, which will be explained in more detail in Chapter 10. In this section, we establish a connection between this algebraic structure and certain geometric facts.

A set $H$ of vector fields on $M$ is said to define a vector-field system on $M$ if it is an $F(M)$-submodule of $V(M)$; that is, if

$$fX + gY \in H \quad \text{for } X, Y \in H, \; f, g \in F(M).$$

We shall suppose that such a vector-field system is given on $M$. For $p \in M$, define $H_p$, the "value" of $H$ at $p$, as the set of all vectors of $M_p$ of the form $X(p)$, for $X \in H$. $H_p$ is a linear subspace of $M_p$: Its dimension is called the rank of $H$ at $p$ and is denoted by $r(p)$; the point $p$ is said to be a maximal point for $H$ if $r(p) \ge r(q)$ for all $q \in M$.

LEMMA 8.1 $r(p) \le r(q)$ for all points $q$ sufficiently close to $p$. In particular, the set of maximal points of $H$ is an open set of points in $M$.

Proof. Let $X_a$, $1 \le a \le r(p)$, be elements of $H$ such that the $(X_a(p))$ are a basis of $H_p$. To prove the lemma it suffices to show that the $(X_a(q))$ are linearly independent elements of $H_q$ whenever $q$ is sufficiently close to $p$.

This is indeed a general fact. Suppose that

$$X_a = A_{ai} \frac{\partial}{\partial x_i}.$$

Since the values of the vector fields $(\partial/\partial x_i)$ form a basis of the vector space of tangent vectors at each point, the dimension of the subspace of $M_q$ spanned by the $X_a(q)$ is equal to the rank of the $r \times n$ matrix $(A_{ai}(q))$; that is, it is equal to the number of rows of the largest square submatrix whose determinant is nonzero. Since the determinant of a matrix of continuous functions is a continuous function, the $r \times r$ subdeterminant, which is $\ne 0$ at $q = p$ (since the rank is $r$ at $p$, by construction), will remain $\ne 0$ when $q$ varies in some neighborhood of $p$. This proves the lemma.
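A small worked example may help fix the statement of Lemma 8.1. The vector fields below are chosen purely for illustration and do not come from the text. On $M = R^2$ with coordinates $(x, y)$, let $H$ be the $F(M)$-module generated by $X$ and $Y$:

```latex
\[
X = \frac{\partial}{\partial x}, \qquad
Y = x\,\frac{\partial}{\partial y}, \qquad
(A_{ai}) = \begin{pmatrix} 1 & 0 \\ 0 & x \end{pmatrix}, \qquad
r(x, y) =
\begin{cases}
1, & x = 0, \\
2, & x \neq 0.
\end{cases}
\]
```

Near a point of the line $x = 0$ the rank can only stay the same or jump up, in agreement with the lemma, and the maximal points form the open set $x \ne 0$.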

LEMMA 8.2 Let $p$ be a maximal point for $H$. Then there is a neighborhood $U$ of $p$ and a set of elements $(X_a)$, $1 \le a \le r$, in $H$ such that

(a) $(X_a(q))$ is a basis for $H_q$ for all $q \in U$.
(b) Each $X \in H$ can be written in the form $X = f_a X_a$, with $f_a \in F(U)$.

(Such a set of vector fields is called a basis for $H$ in $U$.)

The proof is a corollary to the argument of Lemma 8.1. The vector fields $(X_a)$ chosen for that proof are linearly independent at every point $q$ sufficiently close to $p$; hence they form a basis for $H_q$, since $\dim H_q = \dim H_p$. This proves (a). To prove (b), choose an $X \in H$. $X(q)$ can be written as $f_a(q)X_a(q)$ for $q$ sufficiently close to $p$. The assignment $q \to (f_a(q))$ defines the functions $f_a$. It remains only to show that they are $C^\infty$. This can be done, as in the proof of Lemma 8.1, by writing the $X_a$ in terms of local coordinates.

Definition

A mapping $\phi: N \to M$ of a manifold $N$ into $M$ is called an integral map of $H$ if $\phi_*(N_p) \subset H_{\phi(p)}$ for all $p \in N$. A function $f \in F(M)$ is called an integral function of $H$ if $X(f) = 0$ for all $X \in H$.

Notice that this notion generalizes ideas we have already discussed for the case where H has a basis consisting of a single vector field. In that case, the integral maps that are submanifolds are one-dimensional, that is, are determined locally by ordinary differential equations. As we shall see, integral maps of more general vector-field systems are determined locally by partial differential equations.
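To illustrate the single-vector-field case just mentioned, here is a short worked example. The choice of $M$ and $X$ is an assumption made for illustration only. Take $M = R^2$ minus the origin and let $H$ be generated by the rotation field $X$:

```latex
\[
X = -y\,\frac{\partial}{\partial x} + x\,\frac{\partial}{\partial y}, \qquad
f = x^2 + y^2, \qquad
X(f) = -y \cdot 2x + x \cdot 2y = 0 .
\]
\[
\sigma(t) = (\rho \cos t, \; \rho \sin t), \qquad
\sigma'(t) = X(\sigma(t)), \qquad
f(\sigma(t)) = \rho^2 .
\]
```

Thus $f$ is an integral function of $H$, the circles $\sigma$ are integral maps, and each of them lies in a level set of $f$.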


Suppose for the rest of this chapter that all points of $M$ are maximal points for $H$. Notice that an integral function $f \in F(M)$ is constant along an integral submanifold map $\phi: N \to M$. For suppose $t \to \sigma(t)$ is a curve in $N$. Notice that

$$df(H_p) = 0 \quad \text{for all } p \in M. \tag{8.1}$$

Since $\phi_*(\sigma'(t)) \in H_{\phi(\sigma(t))}$, we see that

$$\frac{d}{dt} f(\phi(\sigma(t))) = 0,$$

which shows that $f$ is constant along $N$ (if it is connected, of course).

Now, we may ask: When are the integral submanifolds determined by the integral functions? Precisely, we mean the following: For $p \in M$, let $H_p'$ be the set of all vectors $v \in M_p$ such that

$$df(v) = 0 \quad \text{for all integral functions } f \text{ defined in a neighborhood of } p.$$

By (8.1), we have $H_p \subset H_p'$. If

$$H_p = H_p' \quad \text{for all } p \in M, \tag{8.2}$$

we say that the integral functions determine the integral submanifolds. In fact, suppose $f_1, \ldots, f_s$ are a maximal set of functionally independent integral functions of $H$ defined in a neighborhood of $p$. Consider the submanifolds defined locally about $p$ by setting these functions all equal to constants. One obtains submanifolds locally defined about $p$ that will also be integral submanifolds of $H$ if (8.2) is satisfied; that is, if $r = \dim M - s$. However, $H$ cannot be an arbitrary vector-field system if this condition is satisfied. For then $H$ can locally be defined as the set of all $X \in V(M)$ such that $df_1(X) = 0 = \cdots = df_s(X)$. Thus, if $X, Y \in H$, then $[X, Y](f_j) = X(Y(f_j)) - Y(X(f_j)) = 0$, so that

$$df_1([X, Y]) = 0 = \cdots = df_s([X, Y]),$$

that is,

$$[H, H] \subset H. \tag{8.3}$$

Algebraically, this means that $H$ is a Lie subalgebra of the Lie algebra $V(M)$. Geometrically, (8.3) is an "integrability condition," as we shall now prove.

THEOREM 8.3 (FROBENIUS COMPLETE INTEGRABILITY THEOREM, LOCAL VERSION)

Suppose $H$ is a vector-field system on $M$ which satisfies the integrability condition (8.3). Suppose that $p$ is a maximal point for $H$, with $r = \dim H_p$. Then $p$ has a neighborhood $U$ and a coordinate system $(y_i)$, $1 \le i \le n$, defined in $U$ such that

(a) The $\partial/\partial y_a$, $1 \le a \le r$, form a basis for $H$ in $U$.
(b) The $y_u$, $r + 1 \le u \le n$, form a basis for integral functions of $H$ in $U$ (in the sense that any integral function $f$ can, in this coordinate system, be written as a function of the $y_u$ alone).
(c) The submanifolds $y_u = \text{constant}$ are integral submanifolds for $H$.

(A coordinate system with these properties is called a flat coordinate system for $H$.) The proof will proceed by induction on $n$; this induction will involve repeated application of Theorem 6.3, and the following trick.

LEMMA 8.4 If $M$ is sufficiently small, there exists a basis $(X_a)$ for $H$ in $M$ so that $[X_a, X_b] = 0$.

Proof. We can first suppose that $M$ is sufficiently small so that there are elements $X_a' \in H$ forming a basis of $H$ in $M$. Suppose that

$$X_a' = A_{ai} \frac{\partial}{\partial x_i}$$

in terms of any local coordinate system $(x_i)$. Thus, $\operatorname{rank}(A_{ai}(q)) = r$ for all $q \in M$. By at most relabeling the coordinate system and possibly choosing $M$ smaller, we can suppose that

$$\det(A_{ab}(q)) \ne 0 \quad \text{for } q \in M.$$

Let $(B_{ab})$ be the inverse matrix to $(A_{ab})$; that is, $B_{ab}A_{bc} = \delta_{ac}$. If $X_a = B_{ab}X_b'$, then

$$X_a = \frac{\partial}{\partial x_a} + B_{ab}A_{bu} \frac{\partial}{\partial x_u}, \quad r + 1 \le u \le n.$$

The $(X_a)$ also form a basis for $H$ in $M$, since $(B_{ab})$ is everywhere a nonsingular matrix. Thus $[X_a, X_b]$ must be a linear combination of the $X_c$; that is, $[X_a, X_b] = f_{abc}X_c$ for some functions $f_{abc} \in F(M)$. Note, however, from the form of $X_a$ given above, that $[X_a, X_b]$ does not have any terms involving $\partial/\partial x_c$, $1 \le c \le r$. This forces $f_{abc} = 0$. Q.E.D.
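The normalization used in this proof can be followed in a small worked example. The vector fields are an illustrative assumption, not taken from the text. On $R^3$ with coordinates $(x, y, z)$, start from a basis of $H$ that does not commute:

```latex
\[
X_1' = \frac{\partial}{\partial x} + y\,\frac{\partial}{\partial z}, \qquad
X_2' = x\,\frac{\partial}{\partial x} + \frac{\partial}{\partial y} + (xy + x)\,\frac{\partial}{\partial z}, \qquad
[X_1', X_2'] = X_1' .
\]
\[
(A_{ab}) = \begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix}, \qquad
(B_{ab}) = \begin{pmatrix} 1 & 0 \\ -x & 1 \end{pmatrix},
\]
\[
X_1 = X_1' = \frac{\partial}{\partial x} + y\,\frac{\partial}{\partial z}, \qquad
X_2 = -x X_1' + X_2' = \frac{\partial}{\partial y} + x\,\frac{\partial}{\partial z}, \qquad
[X_1, X_2] = \bigl( X_1(x) - X_2(y) \bigr) \frac{\partial}{\partial z} = 0 .
\]
```

Here $H$ satisfies (8.3), and the new basis has the form $\partial/\partial x_a$ plus terms in $\partial/\partial z$ only, which is exactly what forces the bracket to vanish.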



Suppose now that $(X_a)$ is a basis for $H$ in $M$ satisfying $[X_a, X_b] = 0$. Using Theorem 6.3, choose a new coordinate system $(y_i)$ for $M$ (if necessary, making $M$ even smaller) so that

$$X_1 = \frac{\partial}{\partial y_1}.$$

Suppose $X_a = C_{ai}(\partial/\partial y_i)$. Then

$$0 = [X_1, X_a] = \frac{\partial C_{ai}}{\partial y_1} \frac{\partial}{\partial y_i};$$

that is, $\partial C_{ai}/\partial y_1 = 0$, and hence the $C_{ai}$ are functions of $y_2, \ldots, y_n$ alone. Suppose

$$X_a' = X_a - C_{a1}X_1 = \sum_{i=2}^{n} C_{ai} \frac{\partial}{\partial y_i}, \quad 2 \le a \le r.$$

Then, for $2 \le a, b \le r$,

$$0 = [X_a, X_b] = [C_{a1}X_1 + X_a', \; C_{b1}X_1 + X_b'] = [X_a', X_b'] + \text{terms containing } \frac{\partial}{\partial y_1} \text{ alone}.$$

Thus, also $[X_a', X_b'] = 0$, $2 \le a, b \le r$. The $\partial/\partial y_1 = X_1$ and the $X_a'$, $2 \le a \le r$, form a new basis for $H$ in $M$. But $(X_a')$, $2 \le a \le r$, is a basis for a completely integrable vector-field system in a domain of the space of variables $(y_2, \ldots, y_n)$. Part (a) of Theorem 8.3 thus follows by induction on $n$. Parts (b) and (c) are then obvious consequences of (a). For example, let us prove (b). Let $f(y)$ be an integral function expressed in terms of these coordinates $(y_i)$. Since $(\partial/\partial y_a) \in H$, we have $\partial f/\partial y_a = 0$; that is, $f$ is a function of $y_{r+1}, \ldots, y_n$ alone. Q.E.D.
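The integrability condition (8.3) and the conclusion of Theorem 8.3 can be explored numerically. The sketch below approximates the Jacobi bracket by central differences and compares two rank-2 systems on $R^3$: one spanned by $\partial/\partial x + y\,\partial/\partial z$ and $\partial/\partial y + x\,\partial/\partial z$, which commute and have the integral function $z - xy$, and one spanned by $\partial/\partial x$ and $\partial/\partial y + x\,\partial/\partial z$, whose bracket leaves the span. Both systems, the test point, and all function names are hypothetical choices made for illustration; they are not taken from the text.

```python
def directional(V, W, p, h=1e-4):
    """Derivative of the components of W at p in the direction V(p), by central differences."""
    v = V(p)
    plus = [p[j] + h * v[j] for j in range(len(p))]
    minus = [p[j] - h * v[j] for j in range(len(p))]
    w_plus, w_minus = W(plus), W(minus)
    return [(w_plus[i] - w_minus[i]) / (2.0 * h) for i in range(len(p))]


def bracket(X, Y, p):
    """Components of [X, Y] at p, that is, X(Y^i) - Y(X^i)."""
    a = directional(X, Y, p)
    b = directional(Y, X, p)
    return [a[i] - b[i] for i in range(len(p))]


def apply_field(V, f, p, h=1e-4):
    """Value of V(f) at p for a scalar function f, by central differences."""
    v = V(p)
    plus = [p[j] + h * v[j] for j in range(len(p))]
    minus = [p[j] - h * v[j] for j in range(len(p))]
    return (f(plus) - f(minus)) / (2.0 * h)


def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


# Integrable system: X1 = d/dx + y d/dz, Y1 = d/dy + x d/dz.
def X1(p):
    return [1.0, 0.0, p[1]]


def Y1(p):
    return [0.0, 1.0, p[0]]


def f1(p):
    """Candidate integral function z - x*y for the system spanned by X1, Y1."""
    return p[2] - p[0] * p[1]


# Non-integrable system: X2 = d/dx, Y2 = d/dy + x d/dz.
def X2(p):
    return [1.0, 0.0, 0.0]


def Y2(p):
    return [0.0, 1.0, p[0]]


if __name__ == "__main__":
    p = [0.3, -0.7, 1.1]
    print("[X1, Y1] =", bracket(X1, Y1, p))
    print("X1(f1), Y1(f1) =", apply_field(X1, f1, p), apply_field(Y1, f1, p))
    b2 = bracket(X2, Y2, p)
    print("[X2, Y2] =", b2)
    print("det(X2, Y2, [X2, Y2]) =", det3(X2(p), Y2(p), b2))
```

For the first system the bracket and both derivatives of `f1` vanish up to rounding, so the surfaces $z - xy = \text{constant}$ are integral submanifolds. For the second, the determinant is nonzero, so the bracket is not in the span at the test point and (8.3) fails.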

Remarks. Theorem 8.3 provides us with $r$-dimensional integral submanifolds locally defined about each point. The global form of the Frobenius theorem provides (if every point is a maximal point) a unique, maximal, connected integral manifold passing through each point. The intuitive idea in its proof is to take the piece of an integral submanifold provided by the local version (that is, Theorem 8.3) and "analytically continue" it. For example, the process we described earlier for finding integral curves of a vector field defined over maximal intervals of real numbers is a special case. We shall give more details below.

There is a dual description of vector-field systems that is also useful. Suppose $\omega_{r+1}, \ldots, \omega_n$ are 1-differential forms on $M$. Define $H$ as the set of vector fields $X$ such that

$$0 = \omega_{r+1}(X) = \cdots = \omega_n(X).$$

Suppose $(x_i)$ is a coordinate system for $M$, and $\omega_u = a_{ui}\,dx_i$. If the rank of the $(n - r) \times n$ matrix $(a_{ui}(p))$ is $(n - r)$ at every point of $M$, then every point of $M$ is a maximal point, and $r = \operatorname{rank} H$. Suppose, for example, that

$$\omega_u = dx_u + a_{ua}\,dx_a, \quad 1 \le a \le r, \; r + 1 \le u \le n. \tag{8.4}$$

Now an integral submanifold $\phi: N \to M$ of $H$ satisfies

$$\phi^*(\omega_u) = 0, \tag{8.5}$$

since $\phi^*(\omega_u)(v) = \omega_u(\phi_*(v)) = 0$ for every tangent vector $v$ to $N$. If the $\omega_u$ are given by (8.4), we can attempt to define integral manifolds of $H$ by giving $x_u$ as a function $f_u(x_a)$. Then (8.5) requires that the following system of differential equations be satisfied:

$$\frac{\partial f_u}{\partial x_a} = -a_{ua}(x_b, f_v(x_b)).$$

The integrability conditions (8.3) can also be expressed in terms of differential forms. In fact, we have the following result, which we leave to the reader.

LEMMA 8.5 $H$ is completely integrable, that is, satisfies (8.3), if and only if $d\omega_u$ can be written in the form $\theta_{uv} \wedge \omega_v$, for some choice of 1-forms $\theta_{uv}$; that is, the $d\omega_u$ belong to the "ideal" (in the Grassmann algebra of forms) generated by the $\omega_u$.

Now we turn to the global form of the Frobenius theorem. Let $H$ be a vector-field system on $M$; suppose that every point of $M$ is a maximal point of $H$ (we then say that $H$ is nonsingular), and that the integrability condition (8.3) is satisfied. Recall that an integral curve of $H$ is a $C^\infty$ map $t \to \sigma(t)$ of an interval $[a, b] \to M$ such that

$$\sigma'(t) \in H_{\sigma(t)} \quad \text{for } a \le t \le b.$$

Let us extend this notion to define an integral path of $H$ as a continuous image of an interval of real numbers that is composed of a finite number of pieces of integral curves. For $p \in M$, let $L_p$ denote the set of points of $M$ that can be joined to $p$ by an integral path. $L_p$ is called the leaf of $H$ which passes through the point $p$.

THEOREM 8.6 (GLOBAL VERSION OF THE FROBENIUS COMPLETE INTEGRABILITY THEOREM)

Each $L_p$ can be made into a submanifold of $M$ so that it is a maximal connected integral submanifold of $H$, and so that

$$(L_p)_q = H_q \quad \text{for all } q \in L_p. \tag{8.6}$$

We can sketch the proof of this theorem from the local version proved earlier in this chapter. First of all, recall that a function $f$ defined on an open set of $M$ is an integral of $H$ if

$$X(f) = 0 \quad \text{for all } X \in H.$$

This condition is satisfied by $f$ if and only if it is constant along all integral paths of $H$. Let $q \in L_p$. By the local version, there is a neighborhood $U$ of $q$ in $M$ and a coordinate system $x_1, \ldots, x_n$ for $U$ such that

$$x_{r+1}, \ldots, x_n \text{ are integrals of } H \quad (r = \dim H_q).$$

Such a coordinate system will be said to be a flat one with respect to $H$. Any other $f \in F(U)$ that is an integral of $H$ can be written as a function of the $x_{r+1}, \ldots, x_n$.

A basis for the open sets of $L_p$ will be obtained as follows: For $q \in L_p$, let $U$ be an open set containing $q$ and carrying a flat (with respect to $H$) coordinate system $x_1, \ldots, x_n$. Since $x_{r+1}, \ldots, x_n$ are constant along integral paths of $H$, the map

$$q' \to (x_1(q'), \ldots, x_r(q'))$$

defines a 1-1 correspondence of the subset

$$\{ q' : x_{r+1}(q') = x_{r+1}(q), \ldots, x_n(q') = x_n(q) \}$$

of $L_p \cap U$ with an open subset of $R^r$. We call this subset of $L_p$ a slice of $L_p$ with respect to the flat coordinate system. All such slices will be taken as the basis for open sets of the topology of $L_p$. The slices determine a system of coordinate systems, with values in open subsets of $R^r$, for a possible manifold structure on $L_p$. The transition map between two such coordinate systems is $C^\infty$, since it is given by functions inherited from the transition maps for coordinate systems of the manifold structure of $M$. What is not obvious a priori is that, with the topology so defined, $L_p$ can be covered with a countable number of open sets. In fact, this is a rather deep fact, whose proof we shall not give here, but for which we shall refer the reader to Chevalley [1]. Let $\phi: L_p \to M$ be the inclusion map. It is clearly $C^\infty$ and 1-1; that is, it is a submanifold map.



It should be obvious from its construction that (8.6) is satisfied. However, what is meant by "maximal" integral manifold? Suppose that $\phi': N \to M$ defines a connected integral manifold of $H$, and that the inclusion map $\phi: L_p \to M$ can be written as $\phi = \phi'\psi$, where $\psi$ is a map $L_p \to N$, and $\phi'$ is an integral submanifold map $N \to M$, with $N$ connected. Since $\phi'\psi$ is a submanifold map, so is $\psi$. Now $\phi'(N)$ is contained in $L_p$, since every point in $\phi'(N)$ can be joined to $p$ by an integral path of $H$; hence $\psi(L_p) = N$. Then, $\psi$ is a 1-1 onto submanifold map and hence must be a diffeomorphism. It is in this sense that $L_p$ is a "maximal" integral submanifold. This finishes the proof of Theorem 8.6. However, as a by-product of the proof we obtain the following theorem, which plays a very important role in the theory of Lie groups.

THEOREM 8.7 Let $p \in M$, and let $L_p$ be the leaf through $p$ of the nonsingular completely integrable vector-field system $H$. Suppose that $\phi: N \to M$ is a map of manifolds such that $\phi(N) \subset L_p$. Then $\phi$ can be factored through a differentiable map $\psi: N \to L_p$; that is, $\psi$ followed by the inclusion map $L_p \to M$ is $\phi$.

Proof. Evidently, there is a point-set map $\psi: N \to L_p$ with this property, but it is not obvious that it is a $C^\infty$ map. Suppose, then, that $f \in F(L_p)$. We must show that $\psi^*(f) \in F(N)$. Since $L_p$ is a manifold, $f$ can be written as the sum of functions that vanish outside of slices of $L_p$. To see this, note that we have taken over the proof given by Chevalley [1] that $L_p$ can be covered by a countable number of slices by flat coordinate systems of $H$. The existence of a "partition of unity" (see Chapter 7) for this covering of $L_p$ then guarantees this property, since $f$ can be written

$$f = \sum_{j=1}^{\infty} f_j f,$$

where $f_1, f_2, \ldots$ is the "partition of unity," with each of its elements vanishing outside a slice-coordinate neighborhood of $L_p$. Now any such function can be written as $F(x_1, \ldots, x_r)$. This function can evidently be extended to a $C^\infty$ function in a neighborhood in $M$ surrounding the slice. Thus $\psi^*(f)$ is obtained by pulling back via $\phi$ a $C^\infty$ function on $M$, and hence is $C^\infty$ on $N$.

Q.E.D.

Theorem 8.7 guarantees that the submanifolds defined as leaves do not have one kind of possible pathology. However, there is another sort of pathology that they might have, namely, they may not be regularly embedded in the sense of the following definition:

Definition

Let $\phi: N \to M$ be a submanifold of the manifold $M$. It is said to be regularly embedded if $\phi$ is a homeomorphism of $N$ with $\phi(N)$, that is, if the map $\phi^{-1}: \phi(N) \to N$ (which exists in the point-set sense, since $\phi$ is assumed 1-1) is continuous.

In fact, $N$ is regularly embedded if and only if the following property is satisfied: Every point $p \in \phi(N)$ has a neighborhood $U$ such that $\phi^{-1}(U \cap \phi(N))$ consists of one connected coordinate neighborhood of $N$. Thus a curve that winds around a space an infinite number of times, coming nearer and nearer to a given point each time, is not regularly embedded. We leave the discussion of the global properties of completely integrable systems at this point, with an apology to the reader for lack of details and examples concerning this rich subject, which deserves a book of its own. Our immediate aim here is to do only enough to use the results as a tool in Lie group theory.

Exercises

1. Suppose $M$ is a manifold and $X_1, \ldots, X_r$ are vector fields such that $[X_i, X_j] = 0$ for $1 \le i, j \le r$, and such that the $(X_i(p))$ are linearly independent at a point $p \in M$. Show that there is a coordinate system $(x_1, \ldots, x_n)$ valid in a neighborhood of $p$ such that

$$X_i = \frac{\partial}{\partial x_i} \quad \text{for } 1 \le i \le r.$$

2. The torus is defined as the space obtained from $R^2$ by identifying two points $x = (x_1, x_2)$, $x' = (x_1', x_2')$ whose coordinates differ by integers. Consider the system of parallel lines in $R^2$ whose direction is a given vector $a = (a_1, a_2)$. Show that this, projected down to the torus, is a one-dimensional foliation† whose leaves are those lines. Find the conditions on $a$ that the leaves be nonregularly embedded, or dense, or both. Also examine the question of the existence of global integrals of the foliation. (Approached directly, one probably has to use facts from number theory, which can be found if necessary in appropriate texts. There are other indirect proofs using Lie or topological group theory or both. It would be instructive to compare the two approaches.)

† The system of leaf-submanifolds defined by a nonsingular, completely integrable vector-field system is called a foliation.

3. There is another proof (given by Cartan [1]) of the local existence of leaves that starts from (8.4). Construct the functions $x_u = f_u(x_a)$ by finding ordinary differential equations for the functions $t \to f_u(tx_a)$, with $(x_a)$ regarded as a "parameter." Work this out as a problem, and show directly that the resulting functions actually do define an integral submanifold.

4. Suppose $H$ is a vector-field system on $M$, with $\dim H(p) = n$ for all $p \in M$. Suppose that each point of $M$ has an $n$-dimensional integral submanifold of $H$ passing through it. Must $H$ be completely integrable?

9 Reduction of Dimension when a Lie Algebra of Vector Fields Leaves a Vector-Field Invariant

As we have said, the Lie theory of ordinary differential equations is concerned with discussing the interrelation between a set of differential equations and a group of its "symmetries," with particular emphasis on the question of how various properties of the group help in the practical problems connected with the differential equations. We shall now examine one typical situation. Let $M$ be a manifold, $X \in V(M)$ a vector field on $M$, and $L$ a linear set of vector fields on $M$ such that

$$[L, L] \subset L, \tag{9.1a}$$

$$[L, X] = 0. \tag{9.1b}$$

(Condition (9.1a) means that $L$ forms a Lie algebra of vector fields.) Let $p$ be a point of $M$. We shall suppose that in a neighborhood $U$ of $p$, there are elements $Y_1, \ldots, Y_r \in L$ such that each $Y \in L$ can be written uniquely in the form

$$Y = f_a(Y)Y_a + f(Y)X, \tag{9.2}$$

with functions $f_a(Y), f(Y) \in F(U)$. (Choose indices $1 \le a \le r$ and the summation convention.) Using condition (9.1b), we have

$$0 = [X, Y] = X(f_a(Y))Y_a + X(f(Y))X.$$

Hence

$$X(f_a(Y)) = 0 \quad \text{for } 1 \le a \le r, \; Y \in L. \tag{9.3}$$

This means that all the $f_a(Y)$ are integrals of $X$. Since we are trying to "solve" $X$, that is, find as many integrals as possible, note that any function $f$ that can be expressed as a polynomial in the $f_a(Y)$ is an integral. Designate by $\Omega$ the set of integrals obtained in this way. (In other words, $\Omega$ is the smallest algebra (over the real numbers) of functions defined in $U$ containing all the $f_a(Y)$, $1 \le a \le r$, $Y \in L$.)

LEMMA 9.1 If $Z \in L$, $f \in \Omega$, then $Z(f) \in \Omega$.

Proof. It suffices to prove this when $f$ is one of the generators of $\Omega$, that is, one of the $f_a(Y)$, for $Y \in L$.

Comparing these two expressions, we see that $Z(f_a(Y)) \in \Omega$, as required.

Let $N$ be the subset of $U$ consisting of the points $q$ such that

$$f(q) = f(p) \quad \text{for all } f \in \Omega.$$

We shall suppose that $N$ is a submanifold of $M$. Note that the vector field $X$ is tangent to $N$; hence we can reduce the problem of finding its integral curves to finding the integral curves of $X$ restricted to the submanifold $N$. (This process can be repeated about every point of $M$, of course.) Let $L_N$ consist of those vector fields $Y$ in $L$ that are tangent to $N$, that is, $Y(q) \in N_q$ for $q \in N$. Then $L_N$ is a Lie subalgebra of $L$, that is, $[L_N, L_N] \subset L_N$. Let $X_N$ be the vector field $X$ restricted to $N$, so that, also,

$$[L_N, X_N] = 0.$$

Then the process can be iterated. Let us examine this. Suppose the Y , , .. ., Y , EL were chosen so that Y , , . . ., Y, E L,, Y, , , , . . ., Y, are linearly independent mod L,; that is, no linear combination of them lies in L,. If Y EL, , then y = f m x ,+ f ( W .

i:

z =a = s + 1 f m x a

(9.4)

is tangent to N . Let us suppose that R has a certain number of functions that, in the neighborhood of p , are functionally independent, and such that every other function in R can be written as a function of them. We can suppose these functions, x I , . . . ,x,, are part of a coordinate system, x l , . . ., x, for M . Choose indices 1 I ii n. Then we can suppose without loss of generality that N is determined by the equations: x i = 0, and that p is the point 0 of R".

LEMMA 9.2 If $Y \in L$ satisfies $Y(p) \in N_p$ for one point $p \in N$, then $Y \in L_N$.

Proof. We know that the $Y(x_i)$, for $1 \le i \le n$, are functions $F_i(x_1, \ldots, x_n)$, and $Y$ is tangent to $N$ if $F_i(0) = 0$ for $1 \le i \le n$. But this is so if and only if $Y(p) \in N_p$. Q.E.D.

We see from this lemma that $Z$ defined by (9.4) is identically zero on $N$. For otherwise there is a point $q \in N$ such that $\sum_{a=s+1}^{r} f_a(Y)(q) Y_a \in L_N$, which is a contradiction. Thus, we have for $Y \in L_N$,

$$Y = \sum_{a=1}^{s} f_a(Y) Y_a + f(Y) X,$$

restricted to $N$. Now the $f_a(Y)$ are constant on $N$, since they belong to $\Omega$. There are then essentially two cases:

Case 1. $f(Y) = 0$ on $N$ for all $Y \in L_N$.

In this case, notice that $Y \in L_N$ is everywhere nonzero on $N$ if it is nonzero as an element of the vector space $L$. A Lie algebra of vector fields with this property is said to act simply. Thus we may say that the reduction process reduces the general case to the case where the given Lie algebra of vector fields acts simply.

Case 2. $f(Y) \ne 0$ for some $Y \in L_N$.

Then $Y - \sum_{a=1}^{s} f_a(Y) Y_a$ and $X$ are vector fields whose integral curves differ only by a change in parametrization. However, the former vector field is an element of $L$; hence its integral curves may be considered as "known." Thus the integral curves of $X$ are "known" also by a simple quadrature, once $f(Y)$ is known.

Now we consider another method for reducing dimension when a known Lie algebra $L$ of vector fields commutes with a given vector field $X$, that is,

$$[X, L] = 0.$$

Recall that a function $f \in F(M)$ is called an integral of $L$ if

$$Y(f) = 0 \quad \text{for all } Y \in L.$$

Note that: If $f$ is an integral of $L$, so is $X(f)$.

Suppose that $(x_a)$, $1 \le a \le m$, is a functionally independent basis for the integrals of $L$; that is, any integral function $f$ on $M$ is a function of the $x_1, \ldots, x_m$ above. Let $(y_i)$, $1 \le i \le n$, be a set of functions on $M$ such that $(x_a, y_i)$ forms a coordinate system for $M$. Now $X$ can be written in the form

$$X = A_a(x) \frac{\partial}{\partial x_a} + A_i(x, y) \frac{\partial}{\partial y_i}.$$

Let $X'$ be the vector field in $x$-space defined by

$$X' = A_a(x) \frac{\partial}{\partial x_a}.$$

Then the integral curves $(x(t), y(t))$ for $X$ can be obtained by solving two lower dimensional systems:

$$\frac{dx_a}{dt} = A_a(x(t)), \tag{9.5a}$$

$$\frac{dy_i}{dt} = A_i(x(t), y(t)). \tag{9.5b}$$

Thus (9.5a) can be solved first for $x(t)$, which is then substituted in the right-hand side of (9.5b) to be solved for $y(t)$. For example, we may be able to change coordinates for $x$-space so that $X' = \partial/\partial x_1$. (In fact, this is what is meant by "solving" (9.5a).) Then (9.5b) takes the form

$$\frac{dy_i}{dt} = A_i(t, c_2, \ldots, c_m, y(t))$$

for a choice of constants $c_2, \ldots, c_m$.

Continuing on a general level, suppose that $L$ and $L'$ are Lie algebras of vector fields, that $L \subset L'$, and $[L', X] = 0$. Suppose, as above, that the coordinate system $(x_a, y_i)$ is chosen so that $\partial/\partial y_i$ is a basis for the vector-field system defined by $L$. Then

We do not assume that $[L', L] \subset L$, so that the elements of $L'$ not contained in $L$ do not "pass to the quotient" to define vector fields on $x$-space leaving $X'$ invariant. Thus they are no help in the problem of integrating $X'$. However, once $X'$ is "solved," they can be of use in solving (9.6). As an explanation, suppose that the coordinate system $(x_a)$ is chosen so that $X' = \partial/\partial x_1$. Then

$$X = \frac{\partial}{\partial x_1} + A_i \frac{\partial}{\partial y_i},$$

and $x_2, \ldots, x_m$ are integrals of $X$. Hence the $Y(x_2), \ldots, Y(x_m), Y_1 Y_2(x_2), \ldots, Y_1 Y_2(x_m), \ldots$ are integrals of $X$ for all $Y, Y_1, Y_2, \ldots \in L'$. We need $(n - m)$ integrals of $X$ that are independent of $x_2, \ldots, x_m$ in order to say that $X$ has been completely "solved." The point is: "Purely algebraic" conditions can be given for $L$ and $L'$ which guarantee that this is so. We now turn to the following more explicit example.
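The two-stage integration described above (solve the $x$-system on its own, then substitute $x(t)$ into the $y$-system) can be illustrated numerically. The sketch below assumes one $x$-variable and one $y$-variable, with hypothetical coefficients $A(x) = -x$ and $B(x, y) = xy$ chosen only because the exact solution is known; `rk4`, `x_of_t`, and the step count are illustrative choices, not part of the text.

```python
import math

def rk4(f, t0, y0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with classical RK4; y is a list of floats."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

# Hypothetical coefficients (not from the text): A(x) = -x, B(x, y) = x * y.
def A(x):
    return -x

def B(x, y):
    return x * y

x0, y0, T = 1.5, 2.0, 1.0

# Stage 1: the x-equation involves x alone; here it has a closed-form solution.
def x_of_t(t):
    return x0 * math.exp(-t)

# Stage 2: substitute x(t) and integrate the y-equation by itself.
y_two_stage = rk4(lambda t, y: [B(x_of_t(t), y[0])], 0.0, [y0], T, 200)[0]

# Reference values: the coupled system integrated in one go, and the exact solution.
x_full, y_full = rk4(lambda t, s: [A(s[0]), B(s[0], s[1])], 0.0, [x0, y0], T, 200)
y_exact = y0 * math.exp(x0 * (1.0 - math.exp(-T)))

print(y_two_stage, y_full, y_exact)
```

The point of the design is that stage 2 is a lower dimensional, time-dependent problem: once $x(t)$ is known, the $y$-equation no longer needs the $x$-equation alongside it.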



Matrix-Riccati Systems

Change notations slightly. Let $i, j, \ldots$ range between 1 and $n$. The underlying space is that of the variables $(t, x_{ij})$, a space of dimension $n^2 + 1$. Consider a vector field

$$X = \frac{\partial}{\partial t} + a_{ij}(t)\, x_{jk} \frac{\partial}{\partial x_{ik}}.$$

Thus the parametrization of the integral curves of $X$ is precisely the given $t$, and the integral curves are determined by the following system of linear homogeneous ordinary differential equations:

$$\frac{dx_{ik}}{dt} = a_{ij}(t)\, x_{jk}.$$

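If $b = (b_{ij})$ is a constant matrix, let

$$X_b = x_{ik} b_{kj} \frac{\partial}{\partial x_{ij}}.$$

Then

$$
\begin{aligned}
[X_b, X] &= \left[ x_{ik} b_{kj} \frac{\partial}{\partial x_{ij}},\ \frac{\partial}{\partial t} + a_{hl} x_{lm} \frac{\partial}{\partial x_{hm}} \right] \\
&= a_{hl} x_{ik} b_{kj} \delta_{ij,lm} \frac{\partial}{\partial x_{hm}} - a_{hl} x_{lm} \delta_{ik,hm} b_{kj} \frac{\partial}{\partial x_{ij}} \\
&= a_{hl} x_{lk} b_{km} \frac{\partial}{\partial x_{hm}} - a_{hl} x_{lm} b_{mj} \frac{\partial}{\partial x_{hj}} \\
&= a_{hl} x_{lk} b_{km} \frac{\partial}{\partial x_{hm}} - a_{hl} x_{lk} b_{km} \frac{\partial}{\partial x_{hm}} = 0.
\end{aligned}
$$

If $b' = (b'_{ij})$, then

$$[X_b, X_{b'}] = \left[ x_{ik} b_{kj} \frac{\partial}{\partial x_{ij}},\ x_{hl} b'_{lm} \frac{\partial}{\partial x_{hm}} \right] = x_{ik} b_{kj} b'_{jm} \frac{\partial}{\partial x_{im}} - x_{hl} b'_{lm} b_{mj} \frac{\partial}{\partial x_{hj}}.$$

Thus,

$$[X_b, X_{b'}] = X_{bb' - b'b},$$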

and the collection of the $X_b$ forms a Lie algebra of vector fields that leaves $X$ invariant. Hence the above theory can be applied. However, the set of all $X_b$ is too big, since the vector-field system it determines is the set of all vectors on $x_{ij}$-space. Thus we look for a subalgebra of such vector fields to which to apply the theory. There is an obvious advantage in choosing the subalgebra as large as possible, since then the system (9.5a) will be as small as possible. Rather than go any further here into the general algebraic details, we shall deal only with a special choice.† Divide the indices $1 \le i, j, \ldots \le n$ into two groups: $1 \le a, b, \ldots \le m$; $m + 1 \le u, v, \ldots \le n$. Consider the set of all matrices $b = (b_{ij})$ such that

$$b_{au} = 0. \tag{9.7}$$

If $b$ and $b'$ satisfy (9.7), so do $bb'$ and $b'b$; hence so does $bb' - b'b$. To see this,

$$(bb')_{au} = b_{ai} b'_{iu} = b_{ab} b'_{bu} = 0.$$

Let $L$ be the set of vector fields $X_b = x_{ij} b_{jk} (\partial/\partial x_{ik})$ such that $b$ satisfies (9.7). Let $L'$ be the set of all vector fields of the form $X_b$, with $b$ an arbitrary matrix. According to the general theory described above, the next step is to solve the completely integrable vector-field system defined by $L$. This can be done explicitly in a certain neighborhood $U$ of the identity matrix $I = (\delta_{ij})$;‡ let $U$ be the subset of matrices $(x_{ij})$ such that $\det(x_{uv}) \ne 0$. For each $n \times n$ matrix $x \in U$, let $y(x)_{uv}$ be the functions of $x$ such that $(y(x)_{uv})$ is the inverse matrix of $(x_{uv})$; that is,

$$x_{uv}\, y(x)_{vw} = \delta_{uw}. \tag{9.8}$$

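Let $f(x)_{au} = x_{av}\, y(x)_{vu}$ be the indicated $m(n - m)$ functions defined for each $n \times n$ matrix $x \in U$. We shall show that the $f_{au}$ are integrals of the vector fields of the form $X_b$, where $b$ satisfies (9.7). First suppose that $X$ is any vector field. Apply $X$ to both sides of (9.8):

$$X(x_{uv})\, y_{vw} + x_{uv}\, X(y_{vw}) = 0, \tag{9.9a}$$

$$X(y_{vu}) = -y_{vx}\, X(x_{xw})\, y_{wu}. \tag{9.9b}$$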

† Algebraically, the set of all $X_b$ is isomorphic as a Lie algebra to the linear Lie algebra of all $n \times n$ matrices. The subalgebras we now describe are maximal subalgebras.

‡ Note that the vector fields $X_b$ are independent of $t$.



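But if $X = X_b = x_{ik} b_{kj} (\partial/\partial x_{ij})$, with $b_{au} = 0$, we have

$$X(x_{av})\, y_{vu} = x_{ak} b_{kv}\, y_{vu} = x_{aw} b_{wv}\, y_{vu},$$

$$x_{av}\, y_{vx}\, X(x_{xw})\, y_{wu} = x_{av}\, y_{vx}\, x_{xk} b_{kw}\, y_{wu} = x_{av}\, y_{vx}\, x_{xy} b_{yw}\, y_{wu} = x_{av}\, \delta_{vy}\, b_{yw}\, y_{wu} = x_{ay} b_{yw}\, y_{wu}.$$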

Hence we see that $X_b(f_{au}) = 0$ for all $b$ satisfying (9.7), that is, for all $X_b \in L$. It is also easily seen that the functions $f_{au}$ in $U$ are functionally independent; that is, the $df_{au}$ are everywhere independent. In fact the functions $(f_{au}, x_{uv}, x_{ab}, x_{ua})$ form a new coordinate system for $U$. According to the general theory, the next step is to calculate $X(f_{au})$, for then

$$X' = X(f_{au}) \frac{\partial}{\partial f_{au}} + \frac{\partial}{\partial t}.$$

THEOREM 9.3 Consider a system of linear homogeneous differential equations:

$$\frac{dx_{ij}}{dt} = a_{ik}(t)\, x_{kj}, \qquad 1 \le i, j, k, \ldots \le n;\quad 1 \le a, b, \ldots \le m;\quad m + 1 \le u, v, w, x, y, \ldots \le n. \tag{9.10}$$

Consider $U$, the open set in the space of all $n \times n$ matrices $(x_{ij}) = x$ determined by the condition $\det(x_{uv}) \ne 0$. Let $y(x)_{uv}$ be the inverse matrix of $(x_{uv})$. Introduce a space of variables $z_{au}$ and on this space a system of ordinary, time-dependent differential equations (a matrix-Riccati system):

$$\frac{dz_{au}}{dt} = a_{au}(t) + a_{ab}(t)\, z_{bu} - z_{av}\, a_{vu}(t) - z_{av}\, a_{vb}(t)\, z_{bu}. \tag{9.11}$$

Consider the map $\phi$ from $U$ to this $z$-space which assigns to each $x \in U$ the point $\phi(x) = z = (z_{au})$, with $z_{au} = x_{av}\, y(x)_{vu}$.

(a) If $x(t)$ is a solution of (9.10) that lies completely in $U$, $\phi(x(t)) = z(t)$ is a solution of (9.11).

(b) Suppose $z(t) = (z_{au}(t))$ is a solution of (9.11). Suppose that $x^0 = (x^0_{ij}) \in U$ is such that $\phi(x^0) = z(0)$. Let $x(t)$ be the solution of (9.10) such that $x(0) = x^0$. If $x(t)$ lies in $U$, then $\phi(x(t)) = z(t)$.
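Part (a) of the theorem can be checked numerically on a small case. The sketch below assumes $n = 3$, $m = 1$, a hypothetical coefficient matrix `a(t)`, and a Riccati right-hand side obtained by differentiating $z = PQ^{-1}$, where $P = (x_{av})$ and $Q = (x_{uv})$ are blocks of $x$; in block form this reads $dz/dt = a_{12} + a_{11} z - z a_{22} - z a_{21} z$. The names `rk4`, `phi`, `linear_rhs`, and `riccati_rhs` are illustrative.

```python
import numpy as np

def rk4(f, y0, t0, t1, steps):
    """Classical RK4 for a matrix-valued ODE y' = f(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, y0.copy()
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

n, m = 3, 1  # a, b run over the first m indices; u, v over the remaining n - m

def a(t):
    # Hypothetical time-dependent coefficient matrix a_ij(t).
    return np.array([[0.1, t, -0.3],
                     [0.5, -0.2, 0.4 * t],
                     [np.sin(t), 0.3, 0.2]])

def phi(x):
    # z_au = x_av * y(x)_vu, with y(x) the inverse of the block (x_uv).
    return x[:m, m:] @ np.linalg.inv(x[m:, m:])

def linear_rhs(t, x):
    return a(t) @ x  # dx_ij/dt = a_ik(t) x_kj

def riccati_rhs(t, z):
    at = a(t)
    a11, a12, a21, a22 = at[:m, :m], at[:m, m:], at[m:, :m], at[m:, m:]
    return a12 + a11 @ z - z @ a22 - z @ a21 @ z

x_start = np.eye(n) + 0.1 * np.arange(n * n, dtype=float).reshape(n, n)
T = 0.5
x_end = rk4(linear_rhs, x_start, 0.0, T, 500)          # solve the linear system
z_end = rk4(riccati_rhs, phi(x_start), 0.0, T, 500)    # solve the Riccati system

# The image of the linear solution under phi should agree with the Riccati solution.
print(np.max(np.abs(phi(x_end) - z_end)))
```

The comparison is only meaningful while the lower-right block of $x(t)$ stays invertible, which is the condition that $x(t)$ remains in $U$.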

10

Lie Groups

It will be assumed that the reader is acquainted with the elementary algebraic properties of groups. Recall that a group denoted typically by $G$, with elements $g, g_1, g'$, etc., has associated with it a multiplication operation $(g, g_1) \to g g_1$, satisfying the rules:

$$g(g_1 g_2) = (g g_1) g_2 \quad \text{(associative law)}.$$

There is an identity element $e \in G$ such that

$$eg = ge = g \quad \text{for all } g \in G. \tag{10.1}$$

For each $g \in G$, there is an inverse element $g^{-1} \in G$ such that

$$g^{-1} g = e = g g^{-1}.$$

Definition

Let $G$ be a group and let $\alpha: G \times G \to G$ be the map that assigns $g_1 g_2^{-1}$ to the pair $g_1, g_2 \in G$. If $G$ in addition has a topological structure so that $\alpha$ is a continuous map, we call it a topological group. If, further, $G$ has a manifold structure so that $\alpha$ is a differentiable (that is, $C^\infty$) map, we speak of a Lie group.

It can be proved that two such manifold structures that give rise to the same topological structure must coincide, so for most practical purposes we think of the manifold structure as determined by the group structure.

Historically, groups arose as transformation groups on spaces. Some typical examples would be: the group of permutations of a finite set; the group of linear or affine transformations of a vector space; the group of canonical transformations in classical mechanics; the group of unitary transformations in quantum mechanics; the group of Lorentz transformations in the theory of special relativity.

Definition

Let $G$ be a group and let $M$ be a space. An action of $G$ by transformations on $M$ is defined by a map $G \times M \to M$, $(g, p) \to gp$, such that

$$g_1(g_2 p) = (g_1 g_2) p \quad \text{for } g_1, g_2 \in G,\ p \in M,$$

$$ep = p \quad \text{for } p \in M,$$

where $gp$ is thought of as the transform of $p$ by the transformation $g$. If $M$ is a manifold and $G$ is a Lie group, we speak of $G$ as a Lie transformation group if this map $G \times M \to M$ is differentiable.

Let us study the simplest type of transformation group, where the group $G$ acts as a group of linear transformations on a real vector space $V$. We assume that to each element $g \in G$ is assigned a linear transformation $\rho(g): V \to V$, and the map $G \times V \to V$, $(g, v) \to \rho(g)(v)$, satisfies the transformation-group conditions described above. We call $\rho$ a linear representation of $G$. For certain applications, to be described later, $V$ must be infinite dimensional. We shall suppose that it is a topological vector space, that is, the concept of the limit $\lim_{n \to \infty} v_n = v$ of a sequence of elements of $V$ is well defined, and the limit of a sum equals the sum of the limits, that is,

$$\lim_{n \to \infty} (u_n + v_n) = u + v \quad \text{if } \lim_{n \to \infty} u_n = u,\ \lim_{n \to \infty} v_n = v.$$

This enables us to define the derivative $dv(t)/dt$ of curves $t \to v(t)$ in $V$:

$$\frac{dv}{dt} = \lim_{\Delta t \to 0} \frac{v(t + \Delta t) - v(t)}{\Delta t},$$

with the usual rules of differential calculus satisfied. Suppose that $t \to g(t)$ defines a one-parameter subgroup of $G$; that is, a map $R \to G$ is given such that

$$g(t_1 + t_2) = g(t_1)\, g(t_2) \quad \text{for } t_1, t_2 \in R.$$

As we have already seen, such objects play a very important role in the application of group-theoretic ideas to differential equations. Lie group theory (as opposed to abstract group theory) is concerned with studying a group by means of its one-parameter subgroups.

Definition

Let $\rho$ be a linear representation of a Lie group $G$ by linear transformations on a vector space $V$, with $t \to g(t)$ a one-parameter subgroup of $G$. A linear transformation $A: V \to V$ is called the infinitesimal generator of the one-parameter group $t \to \rho(g(t))$ of linear transformations if

$$A(v) = \text{(by definition)} \lim_{t \to 0} \frac{\rho(g(t))v - v}{t} \quad \text{for } v \in V.$$

We shall suppose that each one-parameter subgroup of $G$ has in this sense an infinitesimal generator. (If $V$ is finite dimensional and if $G$ acts as a Lie transformation group on $V$, then this condition is obviously satisfied. Certain infinite dimensional $V$, to be described below, also satisfy it.) Let us also suppose that the mapping $G \times V \to V$, $(g, v) \to \rho(g)(v)$, is continuous in the sense of mapping convergent sequences into convergent sequences, with limits mapped into limits. Conversely, $\rho(g(t))$ is determined by a linear differential equation involving $A$. (This is the reason for the terminology of "infinitesimal generator.") Indeed,

$$\frac{d}{dt} \rho(g(t))(v) = \lim_{\Delta t \to 0} \frac{\rho(g(\Delta t))\rho(g(t))(v) - \rho(g(t))(v)}{\Delta t} = A(\rho(g(t))(v));$$

that is, the "orbit" $t \to \rho(g(t))(v) = v(t)$ of the one-parameter group satisfies the linear differential equation

$$\frac{d}{dt} v(t) = A(v(t)); \qquad v(0) = v. \tag{10.2}$$

Conversely, these steps are reversible: If $A$ is a given linear transformation $V \to V$ and if (10.2) has a unique solution, then a one-parameter group of linear transformations, denoted by $\exp(tA)$, is defined by the rule $\exp(tA)(v) = v(t)$. The motivation for this notation is that $\exp(tA)(v)$ is defined by the power series

$$\exp(tA)(v) = \sum_{k=0}^{\infty} \frac{t^k}{k!} A^k(v) \tag{10.3}$$

within its domain of convergence. For example, if $V$ is finite dimensional, (10.3) always converges. (See Chevalley [1, Chap. 1].) Such operator-power series can indeed be handled much as real or complex power series, provided one handles the possible noncommutativity of operators with care. (See Exercise 2.)

Having associated linear transformations with one-parameter subgroups, we may ask for the relation between the algebraic operations possible on linear transformations and the properties of the one-parameter subgroups. For example, operators may be added:

$$(A + B)(v) = A(v) + B(v) \quad \text{for } v \in V,$$

multiplied by (real) scalars

$$(cA)(v) = cA(v),$$

and the commutator $[A, B] = AB - BA$ of two operators may be defined. The addition and scalar multiplication imply that the operators form a vector space (over the real numbers). The commutator operation then makes it a Lie algebra in the sense of the following definition.

Definition

A real Lie algebra, typically denoted by $\mathbf{G}$, with elements denoted by $X, Y, \ldots$, is defined by requiring that:

(a) $\mathbf{G}$ is a vector space (over the real numbers).

(b) A bilinear multiplication operation $(X, Y) \to [X, Y]$, $\mathbf{G} \times \mathbf{G} \to \mathbf{G}$, is defined for elements of $\mathbf{G}$, satisfying the following law, called the Jacobi identity:

$$[X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]] \quad \text{for } X, Y, Z \in \mathbf{G}.$$

(c) $[X, Y] = -[Y, X]$ for $X, Y \in \mathbf{G}$.
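For finite dimensional $V$ the operators are matrices, and conditions (b) and (c) can be spot-checked for the commutator $AB - BA$. The sketch below uses random $4 \times 4$ matrices; the helper name `bracket` and the seed are illustrative choices, and a numerical check of this kind is of course not a proof.

```python
import numpy as np

def bracket(A, B):
    """Commutator [A, B] = AB - BA of two square matrices."""
    return A @ B - B @ A

rng = np.random.default_rng(0)
X, Y, Z = (rng.standard_normal((4, 4)) for _ in range(3))

# Jacobi identity in the derivation form used in the definition above.
jacobi_lhs = bracket(X, bracket(Y, Z))
jacobi_rhs = bracket(bracket(X, Y), Z) + bracket(Y, bracket(X, Z))

# Antisymmetry: [X, Y] + [Y, X] should vanish.
antisymmetry_defect = bracket(X, Y) + bracket(Y, X)

print(np.max(np.abs(jacobi_lhs - jacobi_rhs)), np.max(np.abs(antisymmetry_defect)))
```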

Of course, on this purely algebraic level, the real numbers are not sacred: The definition makes sense for an arbitrary field of scalars, for example, for the complex numbers, the rational numbers, and the integers mod a prime number. However, Lie algebras over a field of nonzero characteristic have certain unpleasant features. The most interesting cases are the real, complex, and rational numbers, and accordingly one speaks of a real, complex, or rational Lie algebra. For the purposes of Lie group theory, the real case is by far the most important, and when we talk about a Lie algebra without mentioning the scalars, we shall always mean a real one.

It is readily verified that the commutator definition $(A, B) \to [A, B] = AB - BA$ makes the linear operators on $V$ into a Lie algebra. How does this translate back into terms of one-parameter groups? Explicitly, we ask the following question: Suppose $t \to g_i(t)$, $i = 1, 2, 3, 4$, are one-parameter subgroups of $G$, with infinitesimal generators $A_i$ such that

$$A_3 = A_1 + A_2, \qquad A_4 = [A_1, A_2];$$

how are $g_3(t)$ and $g_4(t)$ related to $g_1(t)$ and $g_2(t)$? To answer this question, let us work formally for the moment.


LEMMA 10.1 If $A$ is a linear operator $V \to V$, then formally:

$$\exp(tA) = \lim_{n \to \infty} \left(I + \frac{tA}{n}\right)^n. \tag{10.4}$$

Proof. There are two approaches. First, purely as an operator equation, we have, using the binomial expansion,

$$\left(I + \frac{At}{n}\right)^n = I + At + \frac{(At)^2}{2} \cdot \frac{n - 1}{n} + \frac{(At)^3}{3!} \cdot \frac{(n - 1)(n - 2)}{n^2} + \cdots.$$

Formally, as $n \to \infty$, this goes over to the power series for $\exp(tA)$. Another approach would be to work with the differential equation (10.2). Set

$$v_n(t) = \left(I + \frac{At}{n}\right)^n (v) \quad \text{for } v \in V.$$

Hence

$$v_n(t) - v = \int_0^t A \left(I + \frac{As}{n}\right)^{n-1} (v)\, ds. \tag{10.5}$$

Then, if $\lim_{n \to \infty} v_n(t)$ exists and equals, say, $v(t)$, and if the formal limiting operations in (10.5) are justified, we have

$$v(t) - v = \int_0^t A(v(s))\, ds,$$

which is the integral equation form of (10.2). We shall not get involved with the material in functional analysis necessary to justify these formal limits, since it would take us too far afield. (See Yosida [1].) However, these results will be very useful to us as intuitive motivation.
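In finite dimensions the limit in Lemma 10.1 can be observed directly. The sketch below takes $A$ to be the generator of plane rotations, for which $\exp(tA)$ is the rotation through the angle $t$, and measures how far $(I + tA/n)^n$ is from it; the helper name `euler_power` and the sample values of $n$ are illustrative.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0, 0.0]])   # generator of plane rotations
t = 1.0

# Closed form of exp(tA) for this particular A: rotation through the angle t.
exact = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t), np.cos(t)]])

def euler_power(A, t, n):
    """Return (I + tA/n)^n."""
    return np.linalg.matrix_power(np.eye(A.shape[0]) + t * A / n, n)

# The largest entrywise error shrinks roughly like 1/n.
errors = [np.max(np.abs(euler_power(A, t, n) - exact)) for n in (10, 1000, 100000)]
print(errors)
```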


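LEMMA 10.2 If $A$, $B$ are linear operators $V \to V$, we have formally:

$$\exp(t(A + B)) = \lim_{n \to \infty} \left[\exp\left(\frac{tA}{n}\right) \exp\left(\frac{tB}{n}\right)\right]^n, \tag{10.6}$$

$$\exp(t[A, B]) = \lim_{n \to \infty} \left[\exp\left(\frac{\sqrt{t}A}{n}\right) \exp\left(\frac{\sqrt{t}B}{n}\right) \exp\left(-\frac{\sqrt{t}A}{n}\right) \exp\left(-\frac{\sqrt{t}B}{n}\right)\right]^{n^2} \quad (t \ge 0). \tag{10.7}$$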

Proof. We prove only (10.6). Equation (10.7) is similar, and is left as an exercise. Set $C(t) = \exp(tA)\exp(tB)$. Then

$$\frac{dC}{dt}(0) = A + B.$$

(Since $(d/dt)\exp(tA)$ is, formally, $A\exp(tA)$, $\exp(0A) = I$, the identity operator, and the product law for differentiation holds as long as the order of the operators is respected.) Suppose Taylor's expansion holds. Then

$$C(t) = I + (A + B)t + t^2 \Delta_2(t),$$

where $\Delta_2(t)$ is a well-behaved function in the neighborhood of $t = 0$. Then

$$\left[\exp\left(\frac{t}{n}A\right) \exp\left(\frac{t}{n}B\right)\right]^n = \left[I + \frac{(A + B)t}{n} + \frac{t^2}{n^2} \Delta_2\left(\frac{t}{n}\right)\right]^n.$$

Note that the right-hand side will not be affected by the third term as $n \to \infty$, since it has an $n^2$ in the denominator and the product involves $n$ terms. Then the limit as $n \to \infty$ is

$$\lim_{n \to \infty} \left[I + \frac{(A + B)t}{n}\right]^n = \exp(t(A + B)),$$

which proves (10.6) formally.

Equations (10.6) and (10.7) are the key formulas connecting Lie algebras and Lie groups. They suggest the following ideas. The "Lie algebra" of a Lie group denoted by $G$ should be defined as the set of one-parameter subgroups of $G$. The algebraic operations necessary to define a Lie algebra can be, intuitively, presented as follows: If $t \to g(t)$ is a one-parameter subgroup, if $c \in R$, then the "scalar product" of $c$ with the subgroup is the subgroup

$$t \to g(ct). \tag{10.8}$$
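Both product formulas of Lemma 10.2 can be tried out on $2 \times 2$ matrices. The sketch below makes several assumptions that are not in the text: the matrices $A$ and $B$ are chosen so that $[A, B] = \mathrm{diag}(1, -1)$, the bracket formula is taken in the form with step $s = \sqrt{t}/n$ repeated $n^2$ times (one of several equivalent normalizations), and `expm` is a plain power-series helper written for this illustration rather than a library routine.

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by its power series (adequate for the small matrices used here)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
bracket = A @ B - B @ A          # equals diag(1, -1)
t = 1.0

# Sum formula: [exp(tA/n) exp(tB/n)]^n approaches exp(t(A + B)).
n = 10_000
sum_approx = np.linalg.matrix_power(expm(t * A / n) @ expm(t * B / n), n)
sum_error = np.max(np.abs(sum_approx - expm(t * (A + B))))

# Bracket formula: the group commutator with step s = sqrt(t)/n, repeated n^2 times.
n = 1_000
s = np.sqrt(t) / n
step = expm(s * A) @ expm(s * B) @ expm(-s * A) @ expm(-s * B)
bracket_approx = np.linalg.matrix_power(step, n * n)
bracket_error = np.max(np.abs(bracket_approx - expm(t * bracket)))

print(sum_error, bracket_error)
```

The bracket formula converges more slowly than the sum formula, since each factor agrees with $\exp(s^2[A, B])$ only up to terms of order $s^3$.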


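If $t \to g_1(t)$ and $t \to g_2(t)$ are one-parameter subgroups, the "sum" of the two is the one-parameter subgroup $t \to g_3(t)$ such that

$$g_3(t) = \lim_{n \to \infty} \left[g_1\left(\frac{t}{n}\right) g_2\left(\frac{t}{n}\right)\right]^n. \tag{10.9}$$

The "bracket" one-parameter subgroup $t \to g_4(t)$ is "defined" by the formula

$$g_4(t) = \lim_{n \to \infty} \left[g_1\left(\frac{\sqrt{t}}{n}\right) g_2\left(\frac{\sqrt{t}}{n}\right) g_1\left(-\frac{\sqrt{t}}{n}\right) g_2\left(-\frac{\sqrt{t}}{n}\right)\right]^{n^2}. \tag{10.10}$$

Now, so far there is no guarantee that these limits exist or that they satisfy the identities needed to express a "Lie algebra." However, this suggests such a direct and intuitive approach toward defining the Lie algebra of a Lie group that we shall do it anyway. Use the symbol $X$ to denote the one-parameter subgroup $t \to g(t)$, and write $g(t) = \exp(tX)$. If $\exp(tX) = g_1(t)$, $\exp(tY) = g_2(t)$, define $X + Y$ and $[X, Y]$ so that

$$\exp(t(X + Y)) = g_3(t), \qquad \exp(t[X, Y]) = g_4(t).$$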

Since Helgason does in fact prove [1, Chap. 2] that the necessary limits exist, we shall adopt this definition of the Lie algebra. We may say then that we have shown, if the formal steps can be made rigorous, that this "notation" for $\mathbf{G}$ suggests an algebraic interpretation of our work on linear transformations and infinitesimal generators. Suppose $\rho$ is a linear representation of $G$ by operators on $V$. To each one-parameter group denoted by the "symbol" $X$, that is, the group is $t \to \exp(tX) = g(t)$, associate the infinitesimal generator $A: V \to V$:

$$\rho(\exp(tX)) = \exp(tA).$$

Let $A = \rho(X)$. Regard $\rho$ as a mapping of $\mathbf{G} \to$ (Lie algebra of linear operators on $V$). Then Lemma 10.2 asserts that $\rho$ is a Lie algebra homomorphism; that is,

$$\rho(X + Y) = \rho(X) + \rho(Y), \qquad \rho([X, Y]) = [\rho(X), \rho(Y)].$$

Since the foundations of Lie group theory are not our main concern, we shall leave the development of this general approach at this point and turn to more geometric material. Suppose $G$ acts as a transformation group on a manifold $M$. For topology, adopt that of pointwise convergence; that is, a sequence $(f_n)$ of functions converges if

$$\lim_{n \to \infty} f_n(p) = f(p) \quad \text{for all } p \in M.$$

The linear representation $\rho$ of $G$ by transformations on $V = F(M)$ is defined as follows:

$$\rho(g)(f)(p) = f(g^{-1}p) \quad \text{for } f \in F(M),\ p \in M. \tag{10.11}$$

Let $\mathbf{G}$ be the collection of differentiable (say, $C^\infty$) one-parameter subgroups $t \to g(t) = \exp(tX)$ of $G$. Then, if $\rho(X)$ denotes the infinitesimal generator of the one-parameter group $t \to \rho(g(t))$ of linear transformations on $F(M)$, we have

$$\rho(X)(f)(p) = \left.\frac{d}{dt} f(\exp(-tX)p)\right|_{t=0}. \tag{10.12}$$

Also,

$$\rho(X)(f_1 f_2) = \rho(X)(f_1)\, f_2 + f_1\, \rho(X)(f_2).$$

This shows that $\rho(X)$ is a vector field on $M$, that is, an element of $V(M)$. We can sum up these ideas in the following theorem.

THEOREM 10.3 Suppose the Lie group $G$ is a transformation group on a manifold $M$. Equation (10.11) defines a representation of $G$ by linear transformations in $F(M)$, and (10.12) defines a mapping of $\mathbf{G}$, the set of all one-parameter subgroups of $G$, into $V(M)$. Suppose $t \to g(t)$ is a one-parameter subgroup of $G$ and $X \in V(M)$ is the vector field on $M$ defined by (10.12). Then each orbit $t \to g(t)p$ of the one-parameter group is an integral curve of the vector field $-X$.

Proof. The first part of the statement is evident. To prove the second, suppose $\sigma(t) = g(t)p$, $f \in F(M)$:

$$\frac{d}{dt} f(\sigma(t)) = \lim_{\Delta t \to 0} \frac{f(g(\Delta t)g(t)p) - f(g(t)p)}{\Delta t} = -X(f)(g(t)p).$$

Q.E.D.
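The sign in Theorem 10.3 is easy to lose track of, so here is a minimal worked case, chosen for illustration and not taken from the text: $G = R$ (under addition) acting on $M = R$ by translation, $g(t)p = p + t$.

```latex
% Representation (10.11) on functions: \rho(g(t))(f)(p) = f(p - t).
% Infinitesimal generator via (10.12):
\rho(X)(f)(p) = \left.\frac{d}{dt} f(p - t)\right|_{t=0} = -f'(p),
\qquad\text{so}\qquad X = -\frac{d}{dp} .
% The orbit through p is \sigma(t) = p + t, with velocity
\frac{d\sigma}{dt} = 1 ,
% which is the value of the vector field +\,d/dp = -X along \sigma.
```

So in this case the orbit is an integral curve of $-X$, as the theorem states; the minus sign comes from the inverse $g^{-1}$ in (10.11).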

There are three standard ways to make $G$ act on $G$ itself (strictly speaking, $G$ acts on $M$, where $M$ is the underlying manifold structure on the space of points making up $G$), namely:

(i) Left translation: Given $g \in G$, $L_g$ denotes the diffeomorphism $h \to gh$ of $G$.

(ii) Right translation: Given $g \in G$, $R_g$ denotes the diffeomorphism $h \to hg^{-1}$ of $G$.

(iii) Adjoint action: Given $g \in G$, Ad $g$ denotes the diffeomorphism $h \to ghg^{-1}$.

Notice that for fixed $g \in G$, $L_g$ and $R_g$ commute, and Ad $g = L_g R_g$.


Definition

A vector field $X$ on $G$ is left (right) invariant if

$$L_g^*(X(f)) = X(L_g^*(f)) \quad \text{for all } f \in F(G),\ \text{all } g \in G$$

$$\bigl(R_g^*(X(f)) = X(R_g^*(f)) \quad \text{for all } f \in F(G),\ \text{all } g \in G\bigr).$$

For each one-parameter subgroup $X \in \mathbf{G}$, let $X_L$ be the vector field on $G$ that is the infinitesimal generator of the one-parameter group $t \to R_{\exp(tX)}$. Thus $X_L$ is a left-invariant vector field on $G$. Similarly, let $X_R$ be the infinitesimal generator of the one-parameter group $t \to L_{\exp(tX)}$. It is right invariant.

THEOREM 10.4 The mappings $X \to X_L$ and $X \to X_R$ are 1-1 onto maps from the set of one-parameter subgroups to the set of left- and right-invariant vector fields on $G$.

Proof. We shall work with the left-invariant vector fields. The proof for the right-invariant fields is similar. Suppose first that two one-parameter subgroups $t \to g_1(t)$ and $t \to g_2(t)$ give rise to the same element $X_L$. Now both $t \to g_1(t)$ and $t \to g_2(t)$ are integral curves of the vector field $-X_L$ (by Theorem 10.3). Since both begin at $e$, they must coincide, that is, $g_1(t) = g_2(t)$ for all $t$. This proves that $X \to X_L$ is 1-1.

To show it is onto, proceed as follows: Let $Y$ be a left-invariant vector field on $G$. We shall first show that the integral curve of $Y$ beginning at $e$ can be extended over $(-\infty, \infty)$. Suppose otherwise, that is, that $\sigma: (a, b) \to G$ is an integral curve of $Y$ which cannot be extended over a larger interval. Now the following geometric property of left-invariant vector fields is inherent in the definition:

If $t \to \gamma(t)$ is an integral curve of $Y$, and if $g \in G$, then $t \to g\gamma(t) = L_g(\gamma(t))$ is an integral curve of $Y$.

Thus, for $t_0 \in (a, b)$, the curve $t \to \sigma(t_0)^{-1}\sigma(t)$ is an integral curve of $Y$, which is equal to $e$ for $t = t_0$. By uniqueness of integral curves,

$$\sigma(t_0)^{-1}\sigma(t) = \sigma(t - t_0) \quad \text{for } a \le t \le b.$$

This shows that the size of the neighborhood of $t_0$ in which there exists a solution of the differential equations defining the integral curves of $Y$ remains bounded away from zero as $t_0$ approaches $b$, which gives the desired contradiction.

Then let $\sigma(t)$, $-\infty < t < \infty$, be the curve in $G$ that is the integral curve of $Y$ beginning at $e$ for $t = 0$. Since the $t_0$ used above can be any real number,

$$\sigma(t_0 + t) = \sigma(t) \cdot \sigma(t_0);$$

that is, $t \to \sigma(t)$ is a one-parameter subgroup of $G$ and hence defines an element of $\mathbf{G}$, which we call $X$. The left invariance of $Y$ then proves that

$$-X_L = Y.$$

Q.E.D.

Remarks. It is more customary to define the Lie algebra of a Lie group as the set of left-invariant vector fields. (For example, this is the procedure adopted by Chevalley [1] and Helgason [1].) While this is most convenient for the purpose of proving the main theorems in the foundations of Lie group theory, it is slightly awkward when considering Lie groups as transformation groups, since the identification of the Lie algebra with the set of one-parameter subgroups is better adapted to the geometric intuition. At any rate, Theorem 10.4 and Exercise 6 show that this is compatible with the definition we have chosen.

There is an action of $G$ on the underlying vector space of $\mathbf{G}$ that is also called the adjoint action of $G$. (Strictly speaking, it should be called the infinitesimal version of the adjoint action of $G$ on $G$, but it is customary to confuse this point.) It can be most readily defined as follows: For $g \in G$, $X \in \mathbf{G}$, the one-parameter subgroup represented by Ad $g(X)$ is just

$$t \to \text{Ad } g(\exp(tX)) = g \exp(tX) g^{-1}.$$

It is readily verified that for each $g \in G$, Ad $g$ considered as a mapping $\mathbf{G} \to \mathbf{G}$ is a Lie algebra isomorphism. Thus "Ad" also stands for a linear representation of $G$ by automorphisms of $\mathbf{G}$. This can be symbolized by the relation:

$$\text{Ad } g(\exp tX) = \exp(t(\text{Ad } g(X))) \quad \text{for } g \in G,\ X \in \mathbf{G},\ -\infty < t < \infty.$$

If $G$ acts on a manifold $M$, we then also have the important formula

$$g \cdot \exp(tX) \cdot p = g \cdot \exp(tX) \cdot g^{-1} \cdot gp = \text{Ad } g(\exp(tX)) \cdot gp = \exp(t(\text{Ad } g(X))) \cdot gp.$$
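For a group of invertible matrices, the relation between the two meanings of "Ad" can be checked numerically, using the standard fact (assumed here, not derived in the text) that on matrices Ad $g(X) = gXg^{-1}$. The matrices `g` and `X` below are arbitrary illustrative choices, and `expm` is a plain power-series helper written for this sketch.

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by its power series (adequate for the small matrices used here)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

g = np.array([[2.0, 1.0],
              [1.0, 1.0]])          # invertible, det = 1
X = np.array([[0.0, 1.0],
              [-1.0, 0.5]])
t = 0.7
g_inv = np.linalg.inv(g)

lhs = g @ expm(t * X) @ g_inv       # Ad g applied to the group element exp(tX)
rhs = expm(t * (g @ X @ g_inv))     # exp of t times Ad g applied to the Lie algebra element X

print(np.max(np.abs(lhs - rhs)))
```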

It seems to be inevitable in Lie group theory that each symbol has at least two possible meanings. For example, we have seen the two meanings of “Ad.” So far, we have been working with one fixed meaning of “exp.” However, there is another related meaning.


Definition

Let $G$ be a Lie group and let $\mathbf{G}$ denote its Lie algebra. The exponential mapping, denoted also by "exp," is the mapping $\mathbf{G} \to G$ defined as follows:

For $X \in \mathbf{G}$, $\exp(X)$ is the value at $t = 1$ of the one-parameter subgroup of $G$ determined by $X$.

This completes our listing of the general facts relating a single Lie group to its Lie algebra. However, for geometric purposes it is most important to know the relation between the Lie subgroups of a Lie group and the Lie subalgebras of its Lie algebra.

Definition

Let $G$ be a Lie group. A Lie subgroup of $G$ is defined by a pair (typically denoted by $(H, \phi)$) such that $H$ is a Lie group, and $\phi$ is a submanifold map $H \to G$ that is a homomorphism of the group structures on $H$ and $G$.

As usual in the theory of submanifolds, it is often convenient to suppress explicit mention of the map $\phi$, for the sake of notational simplicity, and write $H \subset G$. However, it is quite important to keep in mind that the topology that makes $H$ into a Lie group is not necessarily the topology induced from $G$. In addition, we shall usually say "subgroup" instead of "Lie subgroup," since subgroups in the purely algebraic sense will not be considered.

A subgroup $H$ of $G$ defines a (Lie) subalgebra, denoted typically by $\mathbf{H}$, of $\mathbf{G}$. For, every one-parameter subgroup of $H$ can be regarded as a one-parameter subgroup of $G$, defining the inclusion $\mathbf{H} \subset \mathbf{G}$. It is obvious from the definition that all Jacobi brackets of elements of $\mathbf{H}$ again lie in $\mathbf{H}$, so that it is a subalgebra; we say that $\mathbf{H}$ corresponds to $H$. One of the main theorems of Lie theory is Theorem 10.5.

THEOREM 10.5 Let $G$ be a connected Lie group. The correspondence $H \to \mathbf{H}$ sets up a 1-1 correspondence between connected subgroups of $G$ and the subalgebras of $\mathbf{G}$.

Proof. First we shall show that every subalgebra $\mathbf{H}$ of $\mathbf{G}$ arises in this way from a connected subgroup of $G$. For this purpose it is most convenient to regard the Lie algebra of $G$ as the set of left-invariant vector fields on $G$. Thus $\mathbf{H}$ can be regarded as a subalgebra of $V(G)$; hence it defines a completely integrable vector-field system on $G$. This system is invariant under left translation by $G$; hence it is everywhere nonsingular. Let $H$ be its maximal connected integral submanifold passing through the identity element of $G$.



Next we show that $H$ is a subgroup of $G$ in the purely algebraic sense. Let $h \in H$. We want to prove that $h^{-1} \in H$. From Exercise 11 [part (a)], we know that $h$ can be written at least one way in the form

$$h = \exp(X_1) \cdots \exp(X_r) \quad \text{for some choice } X_1, \ldots, X_r \in \mathbf{H}.$$

Now $\exp(X)^{-1} = \exp(-X)$; hence $h^{-1} \in H$. Next we prove that $h_1 h_2 \in H$ if $h_1, h_2 \in H$. By left invariance, $L_{h_1}(H)$ is an integral submanifold of $\mathbf{H}$ passing through $h_1$; hence it must be contained in $H$, whence $h_1 h_2 \in H$.

Consider the map $H \times H \to G$ defined by $(h_1, h_2) \to h_1^{-1} h_2$. Its image is contained in $H$. By Theorem 9.4 it can be factored through a mapping $H \times H \to H$, that is, the map $(h_1, h_2) \to h_1^{-1} h_2$ is differentiable in terms of the manifold structure on $H$. This defines it as a Lie group. Clearly, the submanifold map $H \to G$ defines it as a Lie subgroup of $G$, and by its very construction the corresponding subalgebra is $\mathbf{H}$.

Suppose $H_1$ is another connected subgroup of $G$ whose corresponding Lie subalgebra of $\mathbf{G}$ is $\mathbf{H}$. Using the fact that a connected Lie group is generated by any neighborhood of the identity, we see that $H_1$ and $H$ are identical as point-sets in $G$. Using Theorem 9.4 again, we see that the identity map $H_1 \to H$ is differentiable. Turning the argument around, the identity map $H \to H_1$ is differentiable; that is, $H$ and $H_1$ are identical as Lie subgroups of $G$. Q.E.D.

Thus we begin to see how the algebraic properties of the Lie algebra reflect the algebraic properties of the groups. There is a group of useful theorems giving sufficient conditions that subgroups in the algebraic sense be Lie subgroups:

Let $H$ be a subgroup in the algebraic sense of a Lie group $G$ such that every element of $H$ can be joined to the identity element by a (broken) $C^\infty$ path lying completely within $H$. Then $H$ is a Lie subgroup of $G$.

For the proof, we refer to Kobayashi-Nomizu [1, p. 275].

Let $H$ be a subgroup in the algebraic sense of a Lie group $G$ that is a closed subset of $G$. Then $H$ is a Lie subgroup of $G$. Further, the topology on $H$ is that induced from $G$, and $H$ is a regularly embedded submanifold of $G$.

For the proof, we refer to Helgason [1, Theorem 2.3, p. 605].

These subgroups are the most important in the geometric applications. We shall call them closed subgroups. They arise geometrically in the following way: Suppose that a Lie group $G$ acts as a transformation group on a manifold $M$. Let $p \in M$, and let $H$ be defined as follows:

$$H = \{g \in G : gp = p\},$$

where $H$ is called the isotropy subgroup of $G$ at $p$. That it is a closed subgroup of $G$ follows from the fact that the mapping $G \times M \to M$ is continuous. The subset $Gp = \{gp : g \in G\} \subset M$ is called the orbit of $G$ at $p$. It can be identified with the space of left cosets of $G$ by $H$, which is denoted by $G/H$, defined as: An element of $G/H$ is a subset of the form $gH$ for one choice of $g \in G$. $G/H$ is also called the homogeneous space of $G$ with isotropy subgroup $H$. The coset $eH$ is called the origin of $G/H$. The map $G \to G/H$ sending $g \in G$ into $gH$ is called the projection of $G$ into $G/H$. Each $g_0 \in G$ induces a transformation of $G/H$ as follows: $g_0 \cdot (gH) = (g_0 g)H$. (In other words, $G$ acts on $G/H$ in such a way that the projection map $G \to G/H$ commutes with the action of $G$ on itself by left translation and on $G/H$.) Notice that $H$ is then the set of $g \in G$ such that the transformation defined by $g$ on $G/H$ leaves the origin invariant. We shall state the basic theorems concerning these ideas, referring again to Helgason [1, Chap. 2, Sects. 3-4] for the proofs.
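The purely algebraic part of the identification of the orbit $Gp$ with the coset space $G/H$ can be seen in a finite example. The sketch below uses the group of permutations of $\{0, 1, 2\}$ as a stand-in for $G$ (a finite group, so none of the manifold structure is represented); permutations are stored as tuples, and the names `compose`, `cosets`, and `coset_to_point` are illustrative.

```python
from itertools import permutations

def compose(g, h):
    """Product gh of permutations stored as tuples: (gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(h)))

G = list(permutations(range(3)))      # all permutations of {0, 1, 2}; g acts by p -> g[p]
p0 = 0

H = [g for g in G if g[p0] == p0]     # isotropy subgroup at p0
orbit = {g[p0] for g in G}            # orbit of p0

# Left cosets gH, each stored as a frozenset of group elements.
cosets = {frozenset(compose(g, h) for h in H) for g in G}

# The map gH -> g(p0): collect the points reached from each coset.
coset_to_point = {c: {g[p0] for g in c} for c in cosets}

print(len(G), len(H), len(cosets), sorted(orbit))
```

Each coset is sent to a single point, and distinct cosets go to distinct points, which is the bijection between $G/H$ and the orbit used in the text.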

Theorem. Let $H$ be a closed subgroup of a Lie group $G$. $G/H$ can be made into a manifold so that the projection map $G \to G/H$ is a maximal rank, onto mapping. The fibers† of the projection are the left cosets of $H$, and they are also integral manifolds of the left-invariant vector-field system on $G$ determined by $H$. (In fact, $G \to G/H$ is a principal fiber bundle with $H$ as structure group, referring to Auslander-Mackenzie [1] for this notion.)

Theorem. Suppose a Lie group $G$ acts as a transformation group on a manifold $M$, and that the subgroup $H$ is the isotropy subgroup of $G$ at a point $p \in M$. Then the mapping $G/H \to Gp$, which assigns to $gH$ the point $gp$,‡ defines $G/H$ as a submanifold of $M$ whose set of points is the orbit of $p$. (Thus, when we speak of the orbits as “submanifolds,” we mean their manifold structure inherited from $G/H$.)

As a bonus from these general theorems, we obtain a way of proving that various spaces are manifolds, without the necessity of going through the details of exhibiting an atlas of coordinate systems. One such important example is the Grassmann manifolds, which we do as an illustration. Let $V$ be

† If $\phi: M \to B$ is a map of $M$ into $B$, the inverse image of a point of $B$ is called the fiber above that point.
‡ Notice that this map is well defined, since if $g$ and $g_1$ define the same coset, $g_1^{-1} g \in H$; hence $gp = g_1 p$.



a real, finite dimensional vector space. Let $A(V)$ be the group of linear automorphisms of $V$. It can be easily proved that $A(V)$ is a Lie group such that the mapping $A(V) \times V \to V$ defines it as a Lie transformation group on $V$. (For example, choosing a basis for $V$ identifies $V$ with $R^n$ ($n = \dim V$) and identifies $A(V)$ with $GL(n, R)$, the group of all $n \times n$ invertible real matrices. In the exercises, we go over the Lie group generalities for $GL(n, R)$ and the other “classical” groups.) For each integer $p$, $0 < p < n = \dim V$, let $G^p(V)$ be the set of $p$-dimensional linear subspaces of $V$ (the Grassmann manifold). $A(V)$ acts on $G^p(V)$ in an obvious algebraic way: If $W$ is a $p$-dimensional subspace of $V$, that is, a “point” of $G^p(V)$, and if $a \in A(V)$, then $aW$ is just the subspace $a(W)$. It is quite simple linear algebra to prove that $A(V)$ acts transitively on $G^p(V)$. Let $W_0$ be a fixed element of $G^p(V)$. Then the isotropy subgroup of $A(V)$ at $W_0$ is $A(V, W_0)$, defined as

$$A(V, W_0) = \{a \in A(V) : a(W_0) = W_0\}.$$

It is clearly a closed subgroup of $A(V)$, identifying $G^p(V)$ with $A(V)/A(V, W_0)$, hence giving it a manifold structure. Of course a similar procedure would be followed for vector spaces over the complex numbers. We shall leave the general theory of Lie groups at this point.
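As a quick check on the identification of the Grassmann manifold with $A(V)/A(V, W_0)$, the dimension can be read off by a count of matrix entries. The sketch below assumes a basis $e_1, \ldots, e_n$ of $V$ chosen so that $W_0$ is spanned by $e_1, \ldots, e_p$; the block names $a$, $b$, $d$ are introduced here only for illustration.

```latex
% Assumption: W_0 = span(e_1, ..., e_p) in a chosen basis of V.
\begin{align*}
A(V, W_0) &\cong \left\{ \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} :
   a \in GL(p, R),\ d \in GL(n-p, R),\ b \text{ any } p \times (n-p) \text{ matrix} \right\}, \\
\dim A(V, W_0) &= p^2 + (n-p)^2 + p(n-p) = n^2 - p(n-p), \\
\dim G^p(V) &= \dim A(V) - \dim A(V, W_0) = n^2 - \bigl(n^2 - p(n-p)\bigr) = p(n-p).
\end{align*}
```

For $p = 1$ this gives $n - 1$, the dimension of the projective space of lines in $V$.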

Exercises

1. Show that the solution of (10.2) does indeed define $\exp(tA)$ as a one-parameter group of linear transformations on $V$.

2. Show that $\exp(A)\exp(B) = \exp(A + B) = \exp(B)\exp(A)$ if $AB = BA$, with $\exp(A)$ as defined by (10.2). If $[A, B]$ merely commutes with $A$ and $B$, work out the formula connecting $\exp(A + B)$, $\exp(A)$, $\exp(B)$.

3. Prove (10.4) if $V$ is finite dimensional.

4. Work out the formal details of (10.7).

5. Show that (10.6) and (10.7) hold if $A$ and $B$ are finite dimensional operators, that is, carry out the needed estimates. (In fact, the same estimates hold if $A$ and $B$ are bounded operators in a Hilbert space.)

6. Let $G$ be a Lie group, and let $X \to X_L$ be the isomorphic mapping of $\mathbf{G}$, the set of one-parameter subgroups of $G$, onto the set of left-invariant vector fields. Suppose $\rho$ is a representation of $G$ by linear transformations on a vector space $V$. For each $X \in \mathbf{G}$, let $\rho(X)$ be the infinitesimal generator of the one-parameter group of linear transformations $t \to \rho(\exp(tX))$. Suppose that $X, Y \in \mathbf{G}$. Prove directly that

$$\rho(X_L + Y_L) = \rho(X) + \rho(Y), \qquad [\rho(X), \rho(Y)] = \rho([X_L, Y_L]).$$

This result can be interpreted as follows: Using Theorem 10.4, $\mathbf{G}$ can be identified with $G_L$, the set of left-invariant vector fields on $G$. Now $G_L$ is a subalgebra of $V(G)$. Use this identification to make $\mathbf{G}$ into a Lie algebra. Then the exercise shows that $X \to \rho(X)$ is a Lie algebra homomorphism.

7. Show that $\dim \mathbf{G} = \dim G$.

8. Prove that, for $X \in \mathbf{G}$, the vector field on $G$, which is the infinitesimal generator of the one-parameter group $t \to \mathrm{Ad}(\exp(tX))$, is $X_L + X_R$.

9. Prove that, for $X \in \mathbf{G}$, $X_L(e) = -X_R(e)$, where $e$ is the identity element of $G$.

10. For $X \in \mathbf{G}$, prove that $\exp(X)$ is the value at $t = 1$ of the integral curve of the vector fields $-X_L$ or $X_R$ that begins at $e$.

11. Prove the following facts:
(a) $\exp$, considered as a map $\mathbf{G} \to G$, is differentiable.
(b) $\exp_*$, its differential, is an isomorphism $\mathbf{G}_0 \to G_e$.
(If $V$ is a neighborhood of 0 in $\mathbf{G}$ such that $\exp$ restricted to $V$ is a diffeomorphism, then $U = \exp(V)$ is called a canonical neighborhood of $e$ in $G$. The coordinate system in $U$, obtained by pulling back via $\exp^{-1}$ a Euclidean coordinate system for $\mathbf{G}$, is called a canonical coordinate system for $U$.)
(c) If $G$ is connected and $-X \in V$ for all $X \in V$, then every element of $G$ can be written as the product of a finite number of elements chosen from $U = \exp(V)$.
(d) If $G$ is connected, it is Abelian if and only if $[X, Y] = 0$ for all $X, Y \in \mathbf{G}$; that is, $\mathbf{G}$ is an Abelian Lie algebra.

12. Suppose that $X_1, \ldots, X_n$ is a basis for the vector space $\mathbf{G}$. Define a map $\phi: \mathbf{G} \to G$ as follows: If $X = x_1 X_1 + \cdots + x_n X_n$, then $\phi(X) = \exp(x_1 X_1) \cdots \exp(x_n X_n)$. Prove that $\phi$ is also a diffeomorphism in a neighborhood of $X = 0$. The coordinate system obtained in this way for the corresponding neighborhood of $e$ is called a canonical coordinate system of the second kind.

13. The Lie algebra $\mathbf{G}$ is said to be nilpotent if, for $n$ sufficiently large, $\mathrm{Ad}\,X_1 \cdots \mathrm{Ad}\,X_n = 0$ for any $n$-tuple $X_1, \ldots, X_n$ of elements of $\mathbf{G}$. Prove that $\exp: \mathbf{G} \to G$ and the map $\phi$ of Exercise 12 have everywhere nonzero Jacobian in this case.



14. If $G$ is connected, show that every element $g \in G$ can be written in the form $\exp(Y_1) \cdots \exp(Y_k)$ for some choice $Y_1, \ldots, Y_k$ of elements of $\mathbf{G}$.

The following exercises will elucidate the “classical groups.”

15. Let $V$ be a finite dimensional real vector space. Prove that the set of invertible linear transformations is a Lie group, denoted by $GL(V)$. Its Lie algebra is $E(V)$, the space of all linear operators $V \to V$. Similarly, if $V$ is a complex vector space, with $GL(V, C)$ the group of all complex-linear isomorphisms $V \to V$, its Lie algebra is $E(V, C)$, the complex-linear operators $V \to V$. If $V = R^n$, then $GL(V, R)$ becomes $GL(n, R)$, the group of $n \times n$ real invertible matrices. Similarly, if $V = C^n$, $GL(V, C)$ is $GL(n, C)$, the group of $n \times n$ complex invertible matrices.

16. Suppose $V$ is a vector space; $B(\ ,\ )$ is a bilinear, scalar-valued form on $V$. Let $G$ be the group of isomorphisms $g: V \to V$ such that $B(gu, gv) = B(u, v)$ for all $u, v \in V$; that is, $G$ is the subgroup of $GL(V)$ that preserves the form $B$. Prove that its Lie algebra $\mathbf{G}$ consists of the operators $A: V \to V$ such that

$$B(Au, v) + B(u, Av) = 0.$$

The other “classical” matrix groups (in addition to $GL(n, R)$, $GL(n, C)$) can be obtained in this way. First suppose that $V$ is a real vector space of dimension $n$, and $B(\ ,\ )$ is a symmetric, positive definite form. $G$ is then essentially $O(n, R)$, the real orthogonal matrices. Show that its Lie algebra consists of the skew-symmetric $n \times n$ real matrices.

If $V$ is a complex vector space and $B(\ ,\ )$ a nondegenerate symmetric complex-linear form, $G$ is essentially $O(n, C)$, the $n \times n$ complex-orthogonal matrices.

If $V$ is a real vector space of dimension $n$, and if $B(\ ,\ )$ is a symmetric bilinear form that, when brought to “normal form,” has $p$ plus and $(n - p)$ minus signs, $G$ is denoted by $O(p, n - p)$.

If $B$ is a nondegenerate skew-symmetric form, then $G$ is denoted by $Sp(n, R)$ or $Sp(n, C)$, according to whether the form is real or complex. Determine its Lie algebra.

Suppose $V$ is a complex vector space, but the form $B(\ ,\ )$ is Hermitian bilinear; that is, it is bilinear as a real form, and in addition,

$$B(cu, v) = cB(u, v), \qquad B(u, cv) = \bar{c}B(u, v) \qquad \text{for } u, v \in V,\ c \in C.$$

If $B(u, u) > 0$ for all nonzero $u \in V$, the group of complex-linear automorphisms preserving $B$ is denoted by $U(B)$. In terms of matrices, it can be identified with $U(n)$, the group of $n \times n$ complex unitary matrices. Determine its Lie algebra.

If $V$ is a direct sum of complex subspaces $V_1 \oplus V_2$ with

$$B(V_1, V_2) = 0; \qquad B(u_1, u_1) > 0; \qquad B(u_2, u_2) < 0 \qquad \text{for nonzero } u_1 \in V_1,\ u_2 \in V_2,$$

and complex $\dim V_1 = p$, $\dim V_2 = q$, then $U(B)$, when realized as a matrix group, is denoted by $U(p, q)$. Determine its Lie algebra.

17. Let $SL(n, R)$ and $SL(n, C)$ be the subgroups of determinant 1 matrices. Show that their Lie algebras are the $n \times n$ matrices of trace zero. Define:

$$SO(n, R) = O(n, R) \cap SL(n, R), \qquad SO(n, C) = O(n, C) \cap SL(n, C), \qquad SU(n) = U(n) \cap SL(n, C).$$

18. Prove that $SL(n, R)$, $GL(n, C)$, $SL(n, C)$, $SO(n, R)$, $SO(n, C)$, $U(n)$, $SU(n)$ are all connected. However, $GL(n, R)$, $O(n, R)$, $O(n, C)$ are disconnected and have two components. Find the formula for the dimension of each of these groups.
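The relation between skew-symmetric matrices and the orthogonal group in Exercise 16 can be spot-checked numerically before it is proved. The following is only a sketch: the helper names `mat_mul`, `mat_exp`, and `transpose` are made up for this illustration, the exponential is a truncated power series (an assumption that is adequate only for small matrices of modest norm), and the particular matrix `A` is an arbitrary choice.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=30):
    """Truncated power series exp(A) = sum over k of A^k / k!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]           # holds A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def transpose(A):
    return [list(col) for col in zip(*A)]


# An arbitrary skew-symmetric matrix: B(Au, v) + B(u, Av) = 0 for the dot product.
A = [[0.0, -0.3, 0.2],
     [0.3, 0.0, -0.5],
     [-0.2, 0.5, 0.0]]

g = mat_exp(A)
gtg = mat_mul(transpose(g), g)

# Largest deviation of g^T g from the identity matrix.
orth_error = max(abs(gtg[i][j] - (i == j)) for i in range(3) for j in range(3))
```

A small value of `orth_error` indicates that `g` preserves the dot product up to rounding, which is what the exercise asks one to establish in general.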

11 Classical Mechanics of Particles and Continua

Mechanics of a Single Particle

Our aim in this chapter is to give a survey of some topics in mechanics that are of interest from a geometer’s point of view. Let us start off at the most elementary level, with Newton’s law of motion for a particle moving in Euclidean 3-space, which we denote by $R^3$. Conforming with the notations in books on mechanics, points of $R^3$ are denoted by $r$. The Euclidean dot product is $r \cdot r'$. It is a symmetric, positive-definite bilinear form. The vector, or cross, product is $r \times r'$. It will be assumed that the reader is familiar with the rules of this Euclidean vector algebra and analysis. Consider a particle moving with time $t$, analytically given by a curve $t \to r(t)$. If $m$ is its mass and if $F(r, \dot{r}, t)$ is its force law, Newton’s equation of motion is

$$m\frac{d^2 r}{dt^2} = F\left(r, \frac{dr}{dt}, t\right). \tag{11.1}$$

This force law $F$ is then essentially a mapping $T(R^3) \times R \to R^3$, which must be prescribed by the physical theory with which one is involved. The main physical theories (for example, gravitation, electromagnetism, and fluid mechanics) provide a distinctive way of prescribing a force law. Note how strongly the vector-space structure of $R^3$ enters into the equation. First, the solutions for the force-free case are straight lines. Second, to equate both sides of (11.1), we are relying on the fact that the tangent vector to the curve, $dr/dt$, can be identified with a vector in $R^3$ itself, a characteristic property of vector spaces. Notice that the theory is “covariant” under the group of affine transformations of $R^3$; that is, if $\phi: R^3 \to R^3$ is of the form

$$\phi(r) = A(r) + a,$$

where $A$ is a linear transformation $R^3 \to R^3$, $a \in R^3$, and if $r'(t) = \phi(r(t))$ is the transformed motion, then

$$m\frac{d^2 r'}{dt^2} = F'\left(r', \frac{dr'}{dt}, t\right)$$

with

$$F'(r', \dot{r}', t) = AF(\phi^{-1}(r'), A^{-1}\dot{r}', t). \tag{11.2}$$
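The transformation law (11.2) follows from a two-line computation, sketched here under the assumption that $A$ is invertible and independent of $t$ (as it must be for an affine transformation of $R^3$):

```latex
% Assumption: A is a constant invertible linear map and a is a constant vector.
\begin{align*}
r'(t) = A\,r(t) + a
  \quad &\Longrightarrow \quad
  \frac{dr'}{dt} = A\,\frac{dr}{dt}, \qquad \frac{d^2 r'}{dt^2} = A\,\frac{d^2 r}{dt^2}, \\
m\,\frac{d^2 r'}{dt^2}
  &= A\,F\!\left(r, \frac{dr}{dt}, t\right)
   = A\,F\!\left(\phi^{-1}(r'),\ A^{-1}\frac{dr'}{dt},\ t\right).
\end{align*}
```

The key point is that the second derivative transforms linearly only because $\phi$ is affine; for a general diffeomorphism, terms quadratic in $dr/dt$ appear.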

Here, “covariance” is a rather vague term, of course: If the solutions of (11.1) are subjected to an arbitrary diffeomorphism of $R^3$, (11.1) is transformed into some second-order differential equation, but of a considerably more complicated type than (11.2). Thus the criterion of “covariance” can be regarded as esthetic; one could pinpoint the affine group by requiring that the diffeomorphism preserve the force laws that are linear in the indicated variable. After the equations are recast into the form of the Euler equations of the calculus of variations, or the Hamilton equations of Hamilton-Jacobi theory, there will be revealed a more subtle “covariance” with respect to a much larger group. (Perhaps one can look upon this as the physical reason for introducing these mathematical elaborations into mechanics.)

The metric structure of $R^3$ so far has played no role. It enters, for example, with the idea of “energy.” The kinetic energy $T$ is defined by

$$T = \frac{1}{2}m\left|\frac{dr}{dt}\right|^2, \tag{11.3}$$

with $|dr/dt|^2$ defined as $(dr/dt) \cdot (dr/dt)$. Then

$$\frac{d}{dt}T = m\frac{d^2 r}{dt^2} \cdot \frac{dr}{dt} = F \cdot \frac{dr}{dt}. \tag{11.4}$$

Suppose that $F$ is independent of $\dot{r}$ and $t$. Equation (11.4) suggests that we use $F$ to define a 1-differential form $\omega$ on $R^3$ by requiring that

$$\omega(u) = F(r) \cdot u$$

for each $r \in R^3$, each $u \in R^3_r$, which is identified with $R^3$ itself. Then, if $\omega = -dV$, where $V \in F(R^3)$, we see that

$$\frac{d}{dt}T = \omega\left(\frac{dr}{dt}\right) = -\frac{d}{dt}V(r(t));$$

that is, the “total energy” $E = T + V$ is constant along the motion or, as the physicists say, is a “conserved quantity.” The (local) condition for the existence of such a potential-energy function $V$ is, of course, that $d\omega = 0$, which means in the language of Euclidean vector analysis that the curl of the vector field $F$ is zero.
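The conservation of $E = T + V$ can be observed numerically. The sketch below makes several assumptions that are not in the text: the potential is taken to be $V(r) = \tfrac{1}{2}k|r|^2$ (so $F = -kr$), the equation of motion is integrated with the velocity-Verlet scheme, and the function name `simulate` and all parameter values are illustrative only.

```python
def simulate(r0, v0, m=1.0, k=1.0, dt=1e-3, steps=5000):
    """Integrate m r'' = F(r) with F = -grad V, V = k |r|^2 / 2 (assumed potential).

    Returns the list of total energies T + V recorded after every step.
    """
    def force(r):
        return [-k * c for c in r]

    def energy(r, v):
        kinetic = 0.5 * m * sum(c * c for c in v)
        potential = 0.5 * k * sum(c * c for c in r)
        return kinetic + potential

    r, v = list(r0), list(v0)
    f = force(r)
    energies = [energy(r, v)]
    for _ in range(steps):
        # Velocity Verlet: half kick, drift, recompute force, half kick.
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        r = [ri + dt * vi for ri, vi in zip(r, v)]
        f = force(r)
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        energies.append(energy(r, v))
    return energies


energies = simulate([1.0, 0.0, 0.0], [0.0, 0.5, 0.2])

# Largest deviation of the total energy from its initial value.
drift = max(abs(e - energies[0]) for e in energies)
```

The residual `drift` is a discretization error of the integrator, not a failure of the conservation law; it shrinks as `dt` is reduced.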

Part 1, Calculus on Manifolds


The conservation “laws” of momentum and angular momentum are also readily introduced:

$$p, \text{ the linear momentum}, = m\frac{dr}{dt}; \qquad L, \text{ the angular momentum}, = r \times p.$$

Systems of Particles

Suppose we are given $s$ particles, each moving in 3-space, subject to given external forces and to mutual interaction forces. Suppose $r_1(t), \ldots, r_s(t)$ are their paths of motion, with masses $m_1, \ldots, m_s$. Newton’s equations of motion then look something like this:

$$m_a\frac{d^2 r_a}{dt^2} = F_a.$$

Introduce indices $1 \le a, b, \ldots \le s$. The total kinetic energy $T$ is given by

$$T = \frac{1}{2}\sum_a m_a\left|\frac{dr_a}{dt}\right|^2.$$

Again a potential function $V(r_1, \ldots, r_s)$ can be introduced by the condition

$$\frac{d}{dt}V(r_1(t), \ldots, r_s(t)) = -\sum_a F_a \cdot \frac{dr_a}{dt}$$

for each curve $t \to (r_1(t), \ldots, r_s(t))$ in $R^{3s}$. Conservation of total energy $E = T + V$ will result. Total momentum

$$p = p_1 + \cdots + p_s = \sum_a m_a\frac{dr_a}{dt}$$

and total angular momentum

$$L = L_1 + \cdots + L_s = \sum_a r_a \times p_a$$

can now be introduced. These quantities are most useful if the forces $F_a$ are of a special type, namely,

$$F_a = \sum_b F_{ab},$$

where $F_{ab}$, for $b \ne a$, is interpreted as the force that the $b$th particle exerts on the $a$th particle, and $F_{aa}$ is interpreted as the action of the external forces on the $a$th particle. Then

$$\frac{dp}{dt} = \sum_{a,b} F_{ab}. \tag{11.5}$$

This suggests the simplifying condition

$$F_{ab} = -F_{ba} \qquad \text{for } a \ne b. \tag{11.6}$$

(It is just the quantitative version of Newton’s law: “Action = reaction.”) With this condition,

$$\frac{dp}{dt} = \sum_a F_{aa}. \tag{11.7}$$

The right-hand side is the sum of the external forces. In particular, total momentum is conserved if the system is isolated, that is, if there are no external forces. Turn now to total angular momentum:

=

(ra a2

As before, we can suppose that x2 - (-( w = dx,

A

ds,

+

aijdxi A dxj.

f/2); hence

i, j r 2

aij dxi A d x j .

xi,

Now $\phi^*(x_2) = 0 = \phi^*(f)$, by construction; hence $\phi$ is also an integral submanifold of the form $\sum_{i,j>2} a_{ij}\,dx_i \wedge dx_j$. As in the proof of Theorem 13.3, this form must everywhere in $M$ have rank $m - 2$; hence the induction hypothesis may be applied. Part (c) now follows from (b): For if

$$\omega = \sum_{i=1}^{r} dx_i \wedge dx_{r+i},$$

with $x_{r+1} = 0 = \cdots = x_{2r}$ defining $\phi(N)$ locally, then the vector fields $\partial/\partial x_{2r+1}, \ldots, \partial/\partial x_m$ span $C$, the characteristic vector-field system of $\omega$. Now, by inspection, the vector fields $\partial/\partial x_{2r+1}, \ldots, \partial/\partial x_m$ are tangent to the submanifold $x_{r+1} = 0 = \cdots = x_{2r}$; that is, their integral curves starting at one point of the submanifold remain on the submanifold. Abstractly, this means that

$$C_{\phi(p)} \subset \phi_*(N_p) \qquad \text{for all } p \in N.$$

13. Hamilton-Jacobi Theory


$C$ “by restriction” defines the vector-field system $H$ on $N'$ required for the proof of (c).

THEOREM 13.6 Let $\omega$ be a closed 2-form on the manifold $M$, and let $\theta$ be a 1-form such that $d\theta = \omega$. Suppose that $\phi: N \to M$ is a submanifold of $M$ that is also an integral submanifold of $\theta$, that is, $\phi^*(\theta) = 0$. Let $X$ be a characteristic vector field of $\omega$ that is not tangent to $\phi(N)$ at any point, and such that

$$X(\theta) = f\theta \qquad \text{for some } f \in F(M).$$

Let $\delta: N \times [0, a] \to M$ be the submanifold map constructed above; that is, for fixed $p$, $t \to \delta(p, t)$ is an integral curve of $X$, reducing to $\phi(p)$ at $t = 0$. Let $\phi_t$ be the submanifold map $p \to \delta(p, t) = \phi_t(p)$. Then:
(a) $\delta$ is an integral submanifold of $\omega$.
(b) For each $t$, $\phi_t$ is an integral submanifold of $\theta$.

Proof. (a) is a consequence of Theorem 13.2, since $\phi$ is also, of course, an integral submanifold of $\omega$. Now $X(\theta) = X \lrcorner\, d\theta + d(\theta(X)) = d(\theta(X))$, since $X$ is characteristic for $d\theta$. It suffices to prove (b) locally: We can then suppose that the coordinate system $(x_i)$, $1 \le i, j, \ldots \le n$, has been chosen so that $X = \partial/\partial x_1$. Then there exists a $g \in F(M)$ such that $X(g\theta) = 0$ and such that $g$ is everywhere $\ne 0$. For $g$ must satisfy

$$\frac{\partial g}{\partial x_1} + fg = 0,$$

and it is evident that (locally) such a $g$ can be found. Suppose $g\theta = a_i\,dx_i$. Hence $\partial a_i/\partial x_1 = 0$. We see, using the explicit form of the integral curves of $X$, that

$$\phi_t^*(x_i) = \phi^*(x_i) \quad \text{for } i > 1, \qquad \phi_t^*(x_1) = \phi^*(x_1) + t.$$

Thus, holding $t$ constant,

$$\phi_t^*(g\theta) = \phi_t^*(a_i(x_2, \ldots, x_n))\,d(\phi_t^*(x_i)) = \phi^*\Big(a_1(x_2, \ldots, x_n)\,d(x_1 + t) + \sum_{i>1} a_i(x_2, \ldots, x_n)\,dx_i\Big) = \phi^*(g\theta) = 0.$$

Q.E.D.

Now we shall indicate the relation of this general theory to Hamilton-Jacobi theory in the usual sense, namely, the study of the Hamilton ordinary differential equations and the Hamilton-Jacobi partial differential equations.


Part 2. Hamilton-Jacobi Theory-Variational Calculus

We change notations: $M$ will be a domain in $R^{2n+1}$, $1 \le i, j, \ldots \le n$. A point of $M$ will be denoted by $(x, y, t)$, $x = (x_i)$, $y = (y_i)$. In physical problems, the $x_i$ are coordinates of configuration space, the $y_i$ the coordinates of momentum space, $t$ the coordinate of time. (The reader will more frequently in physics books see $x$ and $y$ replaced respectively by $q$ and $p$.) Suppose that $H(x, y, t)$ is a real-valued function on $M$. A closed 2-form $\omega$ in $M$ is said to be in Hamiltonian form with Hamiltonian $H$ if

$$\omega = dy_i \wedge dx_i - dH \wedge dt.$$

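The following special notation is useful in the computations in Hamilton-Jacobi theory and the calculus of variations:

$$H_i = \frac{\partial H}{\partial x_i}, \qquad H_{n+i} = \frac{\partial H}{\partial y_i}, \qquad H_t = \frac{\partial H}{\partial t}, \qquad H_{i,j} = \frac{\partial^2 H}{\partial x_i\,\partial x_j}, \qquad H_{i,n+j} = \frac{\partial^2 H}{\partial y_j\,\partial x_i}, \qquad \text{etc.}$$

Thus,

$$dH = H_i\,dx_i + H_{n+i}\,dy_i + H_t\,dt.$$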

THEOREM 13.7 Let $\omega$ be the 2-form given by

$$\omega = dy_i \wedge dx_i - dH \wedge dt.$$

Then:
(a) Rank $\omega = 2n$, dimension of characteristic vectors $= 1$.
(b) There is exactly one characteristic vector field $X$ such that $X(t) = 1$, namely,

$$X = \frac{\partial}{\partial t} + H_{n+i}\frac{\partial}{\partial x_i} - H_i\frac{\partial}{\partial y_i}. \tag{13.1}$$

$X$ also satisfies: $X(H) = H_t$.
(c) A curve in $M$ that is a characteristic curve of $X$ can be written in the form $(t, x(t), y(t))$. The functions $x(t)$, $y(t)$ are determined as solutions of the Hamilton equations with Hamiltonian $H$:

$$\frac{dx_i}{dt} = H_{n+i}(x, y, t), \qquad \frac{dy_i}{dt} = -H_i(x, y, t). \tag{13.2}$$

Proof. Suppose $v \in T(M)$ is a nonzero tangent vector to $M$ that is a characteristic tangent vector to $\omega$. Then

$$0 = v \lrcorner\, \omega = v(y_i)\,dx_i - v(x_i)\,dy_i - v(H)\,dt + v(t)(H_i\,dx_i + H_{n+i}\,dy_i + H_t\,dt).$$

First we must have $v(t) \ne 0$: for otherwise, $0 = v(y_i) = v(x_i)$; hence $v$ is identically zero. Thus $v$ can be normalized so that $v(t) = 1$; hence

$$v(x_i) = H_{n+i}, \qquad v(y_i) = -H_i;$$

that is, $v$ is uniquely determined; hence the space of characteristic vectors is one-dimensional. Working backward, we see that (13.1) provides an everywhere $\ne 0$ characteristic vector field. That (13.2) gives the integral curves of $X$ follows, of course, from the very definition of an integral curve of a vector field.

Now, we turn to a similar geometric interpretation of the Hamilton-Jacobi partial differential equation with Hamiltonian $H$:

$$\frac{\partial S}{\partial t} + H\left(x, \frac{\partial S}{\partial x}, t\right) = 0,$$

and to an explanation of the “duality” between this single equation and the Hamilton system (13.2).
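The Hamilton equations (13.2) and the relation $X(H) = H_t$ of Theorem 13.7 can be explored numerically. The sketch below is an illustration under stated assumptions, not part of the theory: it takes $n = 1$ and the made-up time-dependent Hamiltonian $H = \tfrac{1}{2}y^2 + \tfrac{1}{2}x^2 - x\sin t$, integrates with a classical fourth-order Runge-Kutta step, and accumulates $\int H_t\,dt$ along the curve in an auxiliary variable `w`. All function names are hypothetical.

```python
import math


def H(x, y, t):
    # Assumed example Hamiltonian (one degree of freedom, explicitly time dependent).
    return 0.5 * y * y + 0.5 * x * x - x * math.sin(t)


def rhs(t, state):
    x, y, w = state
    dx = y                        # dx/dt =  dH/dy
    dy = -(x - math.sin(t))       # dy/dt = -dH/dx
    dw = -x * math.cos(t)         # dw/dt =  dH/dt evaluated along the curve
    return (dx, dy, dw)


def rk4_step(t, s, h):
    """One classical Runge-Kutta step of size h."""
    k1 = rhs(t, s)
    k2 = rhs(t + h / 2, tuple(si + h / 2 * ki for si, ki in zip(s, k1)))
    k3 = rhs(t + h / 2, tuple(si + h / 2 * ki for si, ki in zip(s, k2)))
    k4 = rhs(t + h, tuple(si + h * ki for si, ki in zip(s, k3)))
    return tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def integrate(x0, y0, t_end=2.0, n=2000):
    h = t_end / n
    t, s = 0.0, (x0, y0, 0.0)
    for _ in range(n):
        s = rk4_step(t, s, h)
        t += h
    return t, s


t_end, (x, y, w) = integrate(1.0, 0.0)

# Change of H along the characteristic curve; by X(H) = H_t it should equal w.
energy_change = H(x, y, t_end) - H(1.0, 0.0, 0.0)
```

When $H$ does not depend on $t$, the same computation gives `w = 0`, which is the familiar statement that the Hamiltonian is constant along solutions.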

THEOREM 13.8 Let $\omega$ be the 2-form $dy_i \wedge dx_i - dH \wedge dt$. Suppose, for simplicity, that $H$ is defined over all $(x, y, t)$-space so that $\omega$ is defined over all of $R^{2n+1}$. Let $\pi: R^{2n+1} \to R^{n+1}$ be the projection map of $(x, y, t)$ on $(x, t)$. Let $D$ be a convex domain of $R^{n+1}$, and $S(x, t)$ a function defined on $D$. Associate with $S$ the map $\phi_S: D \to R^{2n+1}$, defined as follows:

$$\phi_S(x, t) = \left(x, \frac{\partial S}{\partial x}(x, t), t\right).$$

(Thus $\phi_S^*(x_i) = x_i$, $\phi_S^*(y_i) = \partial S/\partial x_i$, $\phi_S^*(t) = t$.) Then:
(a) $\phi_S^*(\omega) = 0$; that is, $\phi_S$ is an integral submanifold of $\omega$ if

$$\frac{\partial S}{\partial t} + H\left(x, \frac{\partial S}{\partial x}, t\right) = 0; \tag{13.3}$$

that is, $S(x, t)$ is a solution of the Hamilton-Jacobi partial differential equation.
(b) Conversely, any map $\phi: D \to R^{2n+1}$ that is a cross-section map for $\pi$, that is, satisfies $\pi\phi =$ identity, and that is an integral submanifold of $\omega$ arises in this way as $\phi_S$ for some function $S$ on $D$ that is a solution of (13.3).
(c) If $S(x, t)$ is a solution of (13.3) defined in $D$, consider the vector field

$$Y = \frac{\partial}{\partial t} + \phi_S^*(H_{n+i})\frac{\partial}{\partial x_i}. \tag{13.4}$$

The integral curves of $Y$ are of the form $(t, x(t))$, where $x(t)$ is a solution of the system of ordinary differential equations:

$$\frac{dx_i}{dt} = H_{n+i}\left(x, \frac{\partial S}{\partial x}(x, t), t\right). \tag{13.5}$$

Each of these integral curves is the projection under $\pi$ of a characteristic curve of $\omega$.† Explicitly, for a solution $x(t)$ of (13.5), the curve $t \to (x(t), y_i(t) = (\partial S/\partial x_i)(x(t), t), t)$ is a characteristic curve of $\omega$, that is, is a solution of the Hamilton equations. Equivalently, we may say that $\phi_S$ maps every integral curve of $Y$ into an integral curve of $X$.

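Proof. Let $\theta = y_i\,dx_i - H\,dt$, so that $\omega = d\theta$. Then

$$\phi_S^*(\theta) = \frac{\partial S}{\partial x_i}\,dx_i - \phi_S^*(H)\,dt = dS - \left(\frac{\partial S}{\partial t} + \phi_S^*(H)\right)dt.$$

Thus, if $S$ is a solution of (13.3), that is, $(\partial S/\partial t) + \phi_S^*(H) = 0$, then

$$\phi_S^*(\omega) = \phi_S^*(d\theta) = d\,dS = 0.$$

This proves (a). Conversely, suppose that $\phi: D \to R^{2n+1}$ is a cross-section map such that $\phi^*(\omega) = 0$. Now $\omega = d\theta$; hence $d(\phi^*(\theta)) = 0$.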

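Since $D$ is convex, there is a function $S(x, t)$ on $D$ such that

$$dS = \phi^*(\theta) = \phi^*(y_i)\,dx_i - \phi^*(H)\,dt.$$

(The second equality holds since a cross-section map $D \to R^{2n+1}$ is characterized by the conditions $\phi^*(x_i) = x_i$, $\phi^*(t) = t$.) Hence $\phi^*(y_i) = \partial S/\partial x_i$, that is, $\phi = \phi_S$. This proves (b). Turn to (c) and note that

$$0 = X \lrcorner\, \omega = X \lrcorner\, d\theta = X(\theta).$$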

Since proving (c) involves a general relation between mappings and vector fields, it is worth our while to put down the general condition.

† For the Hamiltonians arising from the calculus of variations, the integral curves of $Y$ form what is known as an “extremal field.”

LEMMA 13.9 Let $\alpha: M \to M'$ be a mapping between manifolds. Let $X$ and $X'$ be vector fields on, respectively, $M$ and $M'$. Then $\alpha$ maps every integral curve of $X$ into an integral curve of $X'$ if and only if

$$\alpha^*(X'(f)) = X(\alpha^*(f)) \qquad \text{for all } f \in F(M'), \tag{13.6a}$$

or

$$\alpha_*(X(p)) = X'(\alpha(p)) \qquad \text{for all } p \in M. \tag{13.6b}$$

Proof. Suppose $\sigma(t)$, $0 \le t \le b$, is an integral curve of $X$, and let $\sigma_1(t) = \alpha\sigma(t)$. Then, $\sigma_1'(t) = X'(\sigma_1(t))$; that is, for $f \in F(M')$,

$$\frac{d}{dt}f(\sigma_1(t)) = X'(\sigma_1(t))(f) = X'(f)(\sigma_1(t)).$$

But

$$\frac{d}{dt}f(\sigma_1(t)) = \frac{d}{dt}\alpha^*(f)(\sigma(t)) = X(\alpha^*(f))(\sigma(t)).$$

Since $\sigma(0)$ can be any point of $M$, we get condition (b), and also condition (a), as

$$X(\alpha^*(f))(p) = \alpha^*(X'(f))(p) \qquad \text{for all } p \in M.$$

The steps we have gone through are reversible, to prove the converse. Q.E.D.

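Returning to $X$ given by (13.1) and $Y$ given by (13.4), we verify the condition (13.6) (it suffices to verify (13.6) when $f$ varies over any basis for the functions):

$$\phi_S^*(X(x_i)) = \phi_S^*(H_{n+i}) = Y(x_i) = Y(\phi_S^*(x_i)).$$

$$Y(\phi_S^*(y_i)) = Y\left(\frac{\partial S}{\partial x_i}\right) = \frac{\partial^2 S}{\partial t\,\partial x_i} + \phi_S^*(H_{n+j})\frac{\partial^2 S}{\partial x_j\,\partial x_i}$$

(using the fact that $S$ satisfies (13.3))

$$= -\frac{\partial}{\partial x_i}\left[H\left(x, \frac{\partial S}{\partial x}, t\right)\right] + H_{n+j}\left(x, \frac{\partial S}{\partial x}, t\right)\frac{\partial^2 S}{\partial x_j\,\partial x_i}$$

$$= -H_i\left(x, \frac{\partial S}{\partial x}, t\right) - H_{n+j}\left(x, \frac{\partial S}{\partial x}, t\right)\frac{\partial^2 S}{\partial x_i\,\partial x_j} + H_{n+j}\left(x, \frac{\partial S}{\partial x}, t\right)\frac{\partial^2 S}{\partial x_j\,\partial x_i} = -\phi_S^*(H_i) = \phi_S^*(X(y_i)).$$

$$\phi_S^*(X(t)) = \phi_S^*(1) = 1 = Y(t) = Y(\phi_S^*(t)).$$

Q.E.D.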

Let us now turn to the problem of actually solving the Hamilton-Jacobi equation

$$\frac{\partial S}{\partial t} + H\left(x, \frac{\partial S}{\partial x}, t\right) = 0. \tag{13.7}$$

THEOREM 13.10 Suppose that the domain $D \subset R^{2n+1}$ is sufficiently small, that $S^0(x)$ is a function defined in a domain $D'$ of $R^n$ (that is, $x$-space) such that the point $(x, y = (\partial S^0/\partial x_i)(x), t = 0)$ belongs to $D$ whenever $x \in D'$. Then, if $D$ is sufficiently small, there exists an $a > 0$ and a unique solution $S(x, t)$ of (13.7), defined for $x \in D'$, $0 \le t \le a$, such that $S(x, 0) = S^0(x)$. Further, for $x^0 \in D'$, let $(x(t), y(t))$, $0 \le t \le a$, be the solution of the Hamilton equations:

$$\frac{dx_i}{dt} = H_{n+i}(x, y, t), \qquad \frac{dy_i}{dt} = -H_i(x, y, t), \tag{13.8}$$

with $x(0) = x^0$, $y(0) = (\partial S^0/\partial x_i)(x^0)$.

Then $(x(t), y(t))$ must also satisfy (13.9). That is, $t \to (x(t), t)$ is an integral curve of the vector field defined by (13.4) on $(x, t)$-space. For $T \in [0, a]$,

$$S(x(T), T) = S^0(x^0) + \int_0^T \left[y_i(t)\frac{dx_i}{dt} - H(x(t), y(t), t)\right]dt. \tag{13.10}$$

Proof. This theorem is really nothing more than a realization of Theorem 13.2. Let $\phi^0$ be the mapping $D' \to D$ that assigns $(x, y = ((\partial S^0/\partial x_i)(x)), 0)$ to $x \in D'$. Then

$$\phi^{0*}(y_i\,dx_i - H\,dt) = \frac{\partial S^0}{\partial x_i}\,dx_i = dS^0;$$

hence $\phi^0$ defines $D'$ as an integral submanifold of the 2-form

$$\omega = dy_i \wedge dx_i - dH \wedge dt.$$

We have seen that the characteristic curves of $\omega$ are essentially defined as the solutions of the Hamilton equations (13.9). By Theorem 13.2 there is an integral submanifold $\delta: D' \times [0, a] \to D$ such that $\delta(x, 0) = \phi^0(x)$ for $x \in D'$, and such that for each $x^0 \in D'$, $t \to \delta(x^0, t)$ is a characteristic curve of $\omega$. Let $p: D \to R^{n+1}$ be the “standard” projection that assigns $(x, t)$ to the point $(x, y, t) \in D$. Consider the map $p\delta: D' \times [0, a] \to R^{n+1}$. If $a$ is sufficiently small, this map has nonzero Jacobian. Suppose $D'$ and $a$ are chosen small enough so that there is a domain $D'' \subset R^{n+1}$ and an inverse map $(p\delta)^{-1}: D'' \to D' \times [0, a]$. Let $\phi = \delta(p\delta)^{-1}$, a map $D'' \to R^{2n+1}$. Then

$$p\phi = p\delta(p\delta)^{-1} = \text{identity map};$$

that is, $\phi$ is a cross section map $D'' \to R^{2n+1}$, and is an integral submanifold of $\omega$. Then $d\phi^*(y_i\,dx_i - H\,dt) = 0$; hence

$$\phi^*(y_i)\,dx_i - \phi^*(H)\,dt = dS \tag{13.11}$$

for some function $S(x, t)$ defined for $(x, t) \in D''$. ($\phi^*(x_i) = x_i$ and $\phi^*(t) = t$ because of the cross section property of $\phi$; that is, $p\phi =$ identity map.) Thus

$$\frac{\partial S}{\partial x_i} = \phi^*(y_i), \qquad \frac{\partial S}{\partial t} = -\phi^*(H) = -H\left(x, \frac{\partial S}{\partial x}, t\right);$$

that is, $S$ satisfies the Hamilton-Jacobi equation. We now show that $S(x, 0) = S^0(x) +$ constant. We see that $\phi(x, 0) = \phi^0(x)$. Thus

$$d(S(x, 0) - S^0(x)) = 0.$$

We have already seen in Theorem 13.7 that (13.9) does define a characteristic curve of $\omega$ starting at $(x^0, y(0) = ((\partial S^0/\partial x_i)(x^0)), 0) = \phi^0(x^0)$; hence it must agree with the curve $t \to \delta(x^0, t)$. Finally we prove (13.10). By (13.11),

$$S(x(T), T) - S^0(x^0) = \int_0^T \frac{d}{dt}S(x(t), t)\,dt = \int_0^T \left[y_i(t)\frac{dx_i}{dt} - H(x(t), y(t), t)\right]dt.$$

Q.E.D.

Equation (13.10) has an interpretation in terms of the “action” that has a certain importance in physics. In general, if $(x(t), y(t))$, $0 \le t \le T$, is a curve in $(x, y)$-space, the number

$$\int_0^T \left[y_i(t)\frac{dx_i}{dt} - H(x(t), y(t), t)\right]dt \tag{13.12}$$

is called the action along the curve. In problems in mechanics, $x_i(t)$ and $y_i(t)$ describe, respectively, how the position and momentum coordinates of a particle (or system of particles) change in time. The function $H$ gives the value of the energy, so that (13.12) assigns a definite number to each possible trajectory of the physical system. The “principle of least action” requires that the trajectory actually followed is one that minimizes this value of the action. It is easily seen that these curves are contained among the solutions of the Hamilton equations (13.8). In quantum mechanics (according to the viewpoint of Feynman) there is a “smearing out” of this one definite trajectory that minimizes the action, and possible trajectories are given weights determined by the action, and by Planck’s constant $h$, in such a way that as $h \to 0$, this smearing peaks up to concentrate at the one definite minimizing trajectory. This would give a marvelously geometric picture of the “correspondence principle” (that is, of the sense in which quantum mechanics reduces to classical mechanics as Planck’s constant goes to zero) if Feynman’s ideas could be made more rigorous.

This finishes our discussion of the part of Hamilton-Jacobi theory that can be stated in terms of precise theorems. Much more material, harder to formulate as theorems, has been developed in the long history of the subject. This is mainly motivated by the role that Hamilton-Jacobi theory has played in physical applications, particularly in celestial mechanics, geometrical optics, and in the foundations of quantum mechanics. It is almost an obligatory task for anyone interested in these physical applications to read this supplementary material. We shall limit ourselves here to several remarks that fit in particularly well with the differential-form point of view.

First we ask what is meant by “solving” the Hamilton equations.
We have seen in Part 1 that solving a system of ordinary differential equations may be interpreted geometrically as finding a change of coordinates so that the vector field whose integral curves solve the differential equations is in these new coordinates a vector field whose integral curves are in some sense “known.” We can make a similar interpretation of the problem of “solving” the Hamilton equations. Suppose that $\omega = dy_i \wedge dx_i - dH \wedge dt$. If a new coordinate system $(x_i', y_i', t')$ can be found so that

$$\omega = dy_i' \wedge dx_i', \tag{13.13}$$

we can say that $\omega$ is in canonical form in the coordinate system. Since $\omega$ remains the same, the characteristic curves of $\omega$ are geometrically the same, but the coordinates of these curves in the “new” coordinate system satisfy the Hamilton equations with Hamiltonian zero. That is, the $(x_i', y_i')$ are constant along the integral curves; hence, when expressed in terms of the “old” coordinates, they are a set of $2n$ functionally independent integral functions of the original Hamilton equations. Hence these original Hamilton equations can be regarded as “solved.” We can then say that solving them is more or less equivalent to throwing $\omega$ into canonical form.

In Theorem 13.3 we have one method for writing $\omega$ in canonical form. However, this is not a really practical method for doing so: For example, it applies to any 2-form of constant rank and does not really take advantage of the fact that $\omega$ is initially in a relatively simple form. There is another method, due to Jacobi, for doing this. This method works by solving the Hamilton-Jacobi partial differential equation and seems particularly well adapted (at least as well as anything else) to the equations of celestial mechanics, particularly the 2-body problem. We now explain this method. We have seen that a single solution $S(x, t)$ of

$$\frac{\partial S}{\partial t} + H\left(x, \frac{\partial S}{\partial x}, t\right) = 0 \tag{13.14}$$

determines an $n$-parameter family of solutions of the Hamilton equations, obtained by finding the integral curves of the vector field

$$\frac{\partial}{\partial t} + H_{n+i}\left(x, \frac{\partial S}{\partial x}, t\right)\frac{\partial}{\partial x_i}$$

in $(x, t)$-space. Thus an $n$-parameter family $S(x, t; a_1, \ldots, a_n)$ of solutions of (13.14) that depends “essentially” on the parameters, that is, satisfies

$$\det\left(\frac{\partial^2 S}{\partial x_i\,\partial a_j}\right) \ne 0, \tag{13.15}$$

should in principle determine the full $2n$-parameter family of solutions of the Hamilton equations. An $n$-parameter family of solutions of (13.14) satisfying (13.15) is called a complete solution of (13.14). Jacobi’s trick consists in the observation that, given a complete solution, the reduction of $\omega$ to canonical form can be made directly without solving any differential equations, as follows: Introduce a “new” space $R^{4n+1}$ of variables $(x, y, t, a, b)$, $a = (a_i)$, $b = (b_i)$, etc. Consider the form

$$\Omega = dy_i \wedge dx_i - dH \wedge dt + db_i \wedge da_i = d(y_i\,dx_i - H\,dt + b_i\,da_i).$$
138


Calculus

Consider $S(x, t; a)$ as a function of all the indicated $2n + 1$ variables. Then

$$dS(x, t; a) = \frac{\partial S}{\partial x_i}\,dx_i + \frac{\partial S}{\partial t}\,dt + \frac{\partial S}{\partial a_i}\,da_i.$$

Consider the submanifold defined by the relations

$$y_i - \frac{\partial S}{\partial x_i} = 0, \qquad b_i - \frac{\partial S}{\partial a_i} = 0. \tag{13.16}$$

Notice first that the differentials of the functions on the left-hand side of relations (13.16) are everywhere linearly independent; hence the relations do define a bona fide $(2n + 1)$-dimensional submanifold of $R^{4n+1}$. We next show that the projection of this submanifold on $(x, y, t)$-space has nonzero Jacobian. For this it suffices to show that every form on the submanifold can be written in terms of the $dx_i$, $dy_i$, and $dt$.† Clearly, this is an integral submanifold of $\Omega$. Now, on the submanifold,

$$db_i = \frac{\partial^2 S}{\partial x_j\,\partial a_i}\,dx_j + \frac{\partial^2 S}{\partial t\,\partial a_i}\,dt + \frac{\partial^2 S}{\partial a_j\,\partial a_i}\,da_j,$$

$$dy_i = \frac{\partial^2 S}{\partial x_i\,\partial a_j}\,da_j + \frac{\partial^2 S}{\partial x_i\,\partial x_j}\,dx_j + \frac{\partial^2 S}{\partial x_i\,\partial t}\,dt.$$

By (13.15), $da_i$, restricted to the submanifold, can be expressed in terms of $dy$, $dx$, and $dt$, and thus so can $db_i$. This shows that the projection on $(x, y, t)$-space has nonzero Jacobian. Then, locally, the submanifold defined by (13.16) can be written as the graph of a mapping $\phi: (x, y, t) \to (a(x, y, t), b(x, y, t))$ of $R^{2n+1} \to R^{2n}$. We have

$$\phi^*(da_i \wedge db_i) = \omega;$$

hence 4 is the required mapping, sending o into canonical form (if ui and bi are redefined as xi', y i ' ) . Of course one point to all this is that (at least for some of the Hamiltonians occurring in the simpler problems of classical mechanics) a complete solution

4

t If : V-. V' is a linear transformation between vector spaces of the same dimension, to prove that is an isomorphism it suffices to prove that the dual map $* of linear forms: V'* + V* is mito.

4

139


of the Hamilton-Jacobi equation can be found (after a possible change in variables of x-space) in the additive form

$$S(x_1, \ldots, x_n; a_1, \ldots, a_n) = S_1(x_1, a_1) + \cdots + S_n(x_n, a_n)$$

by the method of separation of variables, so useful in elementary theoretical physics. However, the Hamilton-Jacobi equation is highly nonlinear, and solutions of this type seem to be even more accidental than are the solutions that can be obtained from the usual linear partial differential equations of theoretical physics.

Our second remark is concerned with "perturbation theory" in the special sense that it is used in celestial mechanics. However, a few remarks about perturbation theory in general might be in order. Let $D$ be a domain in an $R^n$, and let $X$ be a vector field in $D$. Let $X^\varepsilon$ be a one-parameter family of vector fields in $D$ reducing to $X$ when $\varepsilon = 0$. Suppose also, for simplicity, that the coefficients of $X^\varepsilon$ depend on $\varepsilon$ in a real analytic way. The simplest example would be $X^\varepsilon = X + \varepsilon Y$, where $Y$ is another vector field. In general, "perturbation theory" is concerned with studying how the integral curves of $X^\varepsilon$ are related to those of $X$ as $\varepsilon \to 0$. In terms of the nineteenth-century, pre-Poincaré view of differential equations, this was a purely computational problem, limited only by one's ability to compute formal expansions in $\varepsilon$. Research since Poincaré has shown that this is a hopelessly naive view. (It is of a different order of difficulty, for example, from the results in divergent series; modern research has shown here that, by and large, the formal classical work can be cleaned up.) Existence theorems for the type of expansions one is trying to find must be proved and are usually quite difficult, at least for physically realistic situations.

The typical problem of this type is that of perturbing periodic integral curves. An integral curve $\sigma(t)$, $-\infty < t < \infty$, is periodic or closed with period $T$ if $\sigma(t + T) = \sigma(t)$ for all $t$. It suffices, in view of the uniqueness theorem for integral curves, to show that $\sigma(T) = \sigma(0)$ (but $\sigma(t) \neq \sigma(0)$ for $0 < t < T$). Given such a periodic integral curve, does there exist a periodic curve $\sigma_\varepsilon$ with period $T(\varepsilon)$ for $\varepsilon$ sufficiently close to 0, reducing to $\sigma$ and $T$ as $\varepsilon \to 0$?
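The question just posed can at least be explored numerically in a toy case. The following is a minimal sketch, not taken from the text: it assumes the planar field $X = (p, -q)$ (a harmonic oscillator, all of whose integral curves are closed with period $2\pi$) and a hypothetical perturbing field $Y = (0, -q^3)$, integrates $X^\varepsilon = X + \varepsilon Y$ with a classical Runge-Kutta step, and estimates the period $T(\varepsilon)$ of the closed curve through a chosen point. The function names and step size are illustrative choices.

```python
import math


def perturbed_field(q, p, eps):
    """X^eps = X + eps*Y on the (q, p)-plane.

    X = (p, -q) is the harmonic oscillator field (an illustrative choice);
    Y = (0, -q**3) is a hypothetical perturbing field.
    """
    return p, -q - eps * q ** 3


def rk4_step(q, p, eps, h):
    """Advance one step of size h along an integral curve (classical RK4)."""
    k1q, k1p = perturbed_field(q, p, eps)
    k2q, k2p = perturbed_field(q + 0.5 * h * k1q, p + 0.5 * h * k1p, eps)
    k3q, k3p = perturbed_field(q + 0.5 * h * k2q, p + 0.5 * h * k2p, eps)
    k4q, k4p = perturbed_field(q + h * k3q, p + h * k3p, eps)
    q_new = q + h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p_new = p + h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q_new, p_new


def period(eps, amplitude=1.0, h=1e-3, max_steps=1_000_000):
    """Estimate T(eps) for the closed integral curve through (amplitude, 0).

    Starting at (amplitude, 0), p first becomes negative; the curve closes
    when p next passes from positive values back to zero.
    """
    q, p, t = amplitude, 0.0, 0.0
    for _ in range(max_steps):
        q_new, p_new = rk4_step(q, p, eps, h)
        if p > 0.0 and p_new <= 0.0:
            # Linear interpolation of the zero crossing of p.
            return t + h * p / (p - p_new)
        q, p, t = q_new, p_new, t + h
    raise RuntimeError("no closed integral curve detected")


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        print(eps, period(eps))
```

In this particular example every nearby integral curve stays closed and $T(\varepsilon)$ tends to $2\pi$ as $\varepsilon \to 0$; that is a feature of the chosen one-degree-of-freedom system, and it says nothing about the existence theorems needed in physically realistic situations.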

Let us return now to Hamilton-Jacobi theory. Suppose we have a one-parameter family of Hamiltonian functions, say, $H(x, y, t) = H_0(x, y, t) + \varepsilon H'(x, y, t)$. Suppose the Hamilton equations with Hamiltonian $H_0$ can be "solved," say by the method of Jacobi given above, and $\varepsilon H'$ is regarded as a small perturbing "energy" applied to the "known" system with Hamiltonian $H_0$. The typical example in celestial mechanics is that where $H_0$ describes the motion of the sun and earth, the "solvable" 2-body problem, and where $H'$ describes the perturbation on the earth by, say, Venus. (Or, to be "modern," replace sun by earth, earth by satellite, Venus by the Moon.) Intuitively, the effect can be visualized as a "slow" change in the elliptical orbit of the earth; that is, the "parameters" describing the orbit, which would be constant were there no perturbation, change slowly with time. Analytically, this can be interpreted as follows:

$$dy_i \wedge dx_i - dH \wedge dt = (dy_i \wedge dx_i - dH_0 \wedge dt) - \varepsilon\, dH' \wedge dt.$$

Being able to solve the system with Hamiltonian $H_0$ means that we can find functions $(x_i', y_i')$ on $(x, y, t)$-space such that

$$dy_i \wedge dx_i - dH_0 \wedge dt = dy_i' \wedge dx_i'.$$

Now $(x_i', y_i', t)$ forms a new coordinate system; for, holding $t$ = constant, we have $dy_i' \wedge dx_i' = dy_i \wedge dx_i$; hence the Jacobian, for fixed $t$, of the map going from $(x, y)$ to $(x'(x, y, t), y'(x, y, t))$ is 1. Thus

$$dy_i \wedge dx_i - dH \wedge dt = dy_i' \wedge dx_i' - \varepsilon\, dH' \wedge dt.$$

When $H'$ is expressed as a function of the new coordinates, the characteristic curves of the form, that is, the solutions of the Hamilton equations with "total" Hamiltonian $H$, are described by Hamilton equations in the primed coordinates with Hamiltonian $\varepsilon H'$: The effects of the unperturbed system have been completely taken into account, and the variations of the "constants" $(x_i', y_i')$ of the unperturbed motion due to the "perturbation" $\varepsilon H'$ are very simply taken into account.

Our third topic will be to discuss in more generality some of the underlying "transformation" properties of the Hamilton equations that we have used in the first two topics. Suppose, then, that we are given two separate systems. Consider two spaces of variables $(x_i, y_i, t)$ and $(x_i', y_i', t')$ with Hamiltonians $H$ and $H'$ given on them, respectively. A mapping $\phi$ from the unprimed to the primed space such that

$$\phi^*(dy_i' \wedge dx_i' - dH' \wedge dt') = dy_i \wedge dx_i - dH \wedge dt \tag{13.17}$$

will have the following property:

Given a curve $(x(t), y(t))$ in unprimed space that is a solution of the Hamilton equations with Hamiltonian $H$, the image curve $\phi(x(t), y(t))$ will, after reparametrization by the level surfaces of the function $t'$, be a solution of the Hamilton equations with Hamiltonian $H'$. Thus a $\phi$ satisfying (13.17) sets up a correspondence between solutions of the two Hamilton equations; however, the sense of "time" may not be preserved in this transformation, and the Hamiltonian $H'$ may bear a quite complicated relation to $H$.


Now, clearly of special importance in all this are the transformations $\phi$ of $(x, y)$-space to $(x', y')$-space above, such that

$$\phi^*(dy_i' \wedge dx_i') = dy_i \wedge dx_i. \tag{13.18}$$

Such a transformation is called a canonical transformation. If $H'(x', y', t)$ and $H(x, y, t)$ are functions such that $\phi^*(H') = H$, by (13.18) $\phi$ carries solutions of the Hamilton equations with Hamiltonian $H$ into solutions for the Hamiltonian $H'$. If we regard $\phi$ as a mapping of the same space onto itself, the canonical transformations with an inverse† form a group. This group may be regarded as permuting the Hamiltonian systems. In classical language, a Hamiltonian system of ordinary differential equations is a Lie system with respect to the group of canonical transformations (just as a linear system of differential equations defined by a vector field of the form

$$\frac{\partial}{\partial t} + A_{ij}(t)\, x_j\, \frac{\partial}{\partial x_i}$$

is a Lie system with respect to the group of all transformations of $(x, t)$-space of the form $(x, t) \to (a(t)x, t)$, where $a(t) = (a_{ij}(t))$ is an $n \times n$ matrix function of $t$). We defer further study of the group of canonical transformations until later.

Exercises

1. Work out the solution of the Hamilton-Jacobi equation for the case where $H(x, y)$ is of the form $\tfrac{1}{2} y_i y_i + V(r)$ with $r^2 = x_i x_i$ $(1 \le i \le 3)$. Work out the solution of the Kepler problem in celestial mechanics (that is, the case $V(r) = 1/r$) in as complete a form as possible.

2. Let $H(x, y)$, $H'(x, y)$ be two Hamiltonians. Show by means of Jacobi's complete solution method that locally there is a canonical transformation taking one into the other. If, for example, $H'$ is a function of $y$ alone, discuss what this means for the problem of "solving" the Hamilton equations for Hamiltonian $H$. Can you think of any "global" reasons why this canonical transformation may not exist globally? For example, discuss the Kepler problem from this global point of view. Also, discuss globally the simple case where $x$ and $y$ are one-dimensional vectors.

† Since $\phi^*(dx_1' \wedge dy_1' \wedge \cdots \wedge dx_n' \wedge dy_n') = dx_1 \wedge dy_1 \wedge \cdots \wedge dx_n \wedge dy_n$, a canonical transformation always has Jacobian equal to 1; hence it has at least a local inverse.
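The footnote's Jacobian claim, together with Jacobi's construction earlier in this chapter, can be checked numerically in the simplest case. The sketch below rests on assumptions that are not in the text: one degree of freedom, the free-particle Hamiltonian $H = y^2/2$, the Hamilton-Jacobi equation taken in the form $\partial S/\partial t + H(x, \partial S/\partial x) = 0$, and the complete solution $S(x, t; a) = ax - a^2 t/2$. The relations $y = \partial S/\partial x$, $b = \partial S/\partial a$ then give $a = y$ and $b = x - yt$. The helper names are hypothetical.

```python
def new_coordinates(x, y, t):
    """Return (b, a) for the assumed complete solution S = a*x - a*a*t/2.

    y = dS/dx = a and b = dS/da = x - a*t, so a = y and b = x - y*t.
    """
    return x - y * t, y


def jacobian_det(f, x, y, t, h=1e-5):
    """Central-difference Jacobian determinant of (x, y) -> f(x, y, t), t fixed."""
    fx_plus, fx_minus = f(x + h, y, t), f(x - h, y, t)
    fy_plus, fy_minus = f(x, y + h, t), f(x, y - h, t)
    d0_dx = (fx_plus[0] - fx_minus[0]) / (2 * h)
    d1_dx = (fx_plus[1] - fx_minus[1]) / (2 * h)
    d0_dy = (fy_plus[0] - fy_minus[0]) / (2 * h)
    d1_dy = (fy_plus[1] - fy_minus[1]) / (2 * h)
    return d0_dx * d1_dy - d0_dy * d1_dx


def free_particle(x0, y0, t):
    """Solution of dx/dt = y, dy/dt = 0 (Hamilton equations for H = y*y/2)."""
    return x0 + y0 * t, y0


if __name__ == "__main__":
    print(jacobian_det(new_coordinates, 0.7, -1.3, 2.0))
    for t in (0.0, 1.0, 2.5):
        x, y = free_particle(0.7, -1.3, t)
        print(t, new_coordinates(x, y, t))
```

For each fixed $t$ the map has Jacobian 1, and along solutions of the Hamilton equations the new coordinates $(b, a)$ stay constant, which is the sense in which the complete solution reduces the system to canonical form in this example.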

14

Extremal Fields and Sufficient Conditions for a Minimum

Let $B$ be a manifold. We have said that a Lagrangian for $B$ is a real-valued function $L : T(B) \times R \to R$, which enables us to define a real-valued function

$$\sigma \to L(\sigma) = \int_0^1 L(\sigma'(t), t)\, dt$$

on the space of curves of $B$. For theoretical purposes, it is most convenient to study homogeneous, time-independent Lagrangians. This means that

$$L \text{ is a function } L(v) \text{ of } v \in T(B) \text{ alone;} \tag{14.1a}$$

$$L(\lambda v) = \lambda L(v) \quad \text{for } \lambda > 0. \tag{14.1b}$$

Condition (14.1b) guarantees that the function $\sigma \to L(\sigma)$ is independent of the parametrization of the curves. In the development of this chapter (we are following Carathéodory's ideas [2], as modified slightly by Hermann [1]), we shall consider such Lagrangians, indicating later how nonhomogeneous ones are handled also. Introduce local coordinates $(x_i)$ for $B$ and $(x_i, \dot{x}_j)$ for $T(B)$ as explained in Chapter 12. Then $L$ becomes a function $L(x, \dot{x})$, satisfying the homogeneity condition

$$L(x, \lambda \dot{x}) = \lambda L(x, \dot{x}).$$

Differentiating both sides of this relation gives Euler's relations:

$$L(x, \dot{x}) = L_{n+j}(x, \dot{x})\,\dot{x}_j, \tag{14.2}$$

$$L_{n+i,n+j}(x, \dot{x})\,\dot{x}_j = 0, \tag{14.3}$$

$$L_{n+j}(x, \lambda \dot{x}) = L_{n+j}(x, \dot{x}). \tag{14.4}$$

(Recall that

$$L_{n+j} = \frac{\partial L}{\partial \dot{x}_j}, \qquad L_j = \frac{\partial L}{\partial x_j}, \qquad \text{etc.)}$$
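Euler's relations can be spot-checked by finite differences. The following is a minimal sketch under assumptions of our own: the Lagrangian `L` below is an arbitrary example that is positively homogeneous of degree one in the velocity, and the step sizes and helper names are illustrative.

```python
import math


def L(x, v):
    """An illustrative Lagrangian, homogeneous of degree one in v."""
    return (1.0 + x[0] ** 2) * math.sqrt(v[0] ** 2 + 2.0 * v[1] ** 2)


def grad_v(x, v, h=1e-6):
    """Central-difference approximation of L_{n+j} = dL/dv_j."""
    grad = []
    for j in range(len(v)):
        v_plus, v_minus = list(v), list(v)
        v_plus[j] += h
        v_minus[j] -= h
        grad.append((L(x, v_plus) - L(x, v_minus)) / (2 * h))
    return grad


def hessian_times_v(x, v, h=1e-3):
    """Approximate L_{n+i,n+j} v_j as the derivative of grad_v along v."""
    g_plus = grad_v(x, [vj * (1 + h) for vj in v])
    g_minus = grad_v(x, [vj * (1 - h) for vj in v])
    return [(a - b) / (2 * h) for a, b in zip(g_plus, g_minus)]


def euler_residuals(x, v, lam=3.0):
    """Return the absolute residuals of (14.2), (14.3), (14.4) at (x, v)."""
    g = grad_v(x, v)
    r1 = abs(L(x, v) - sum(gj * vj for gj, vj in zip(g, v)))
    r2 = max(abs(c) for c in hessian_times_v(x, v))
    g_scaled = grad_v(x, [lam * vj for vj in v])
    r3 = max(abs(a - b) for a, b in zip(g_scaled, g))
    return r1, r2, r3


if __name__ == "__main__":
    print(euler_residuals((0.5, -1.0), (0.3, 0.7)))
```

All three residuals should be small (limited only by the finite-difference error), reflecting that each relation follows from differentiating the homogeneity condition.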

Now

$$\theta(L) = L_{n+j}\, dx_j - (L_{n+j}\,\dot{x}_j - L)\, dt = L_{n+j}\, dx_j,$$

using (14.2); thus $\theta(L)$ becomes a form on $T(B)$ alone. Hence, in dealing with such homogeneous Lagrangians, we can take $M$ as $T(B)$ and omit explicit "time" dependence. When considering constraints, it is appropriate to consider homogeneous ones also. Thus $K$ will be a subset of $T(B)$ such that

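$$\lambda v \in K \quad \text{for all } v \in K, \text{ all } \lambda > 0.$$

In the traditional versions of the Lagrange variational problem, $K$ is defined by equations on $T(B)$:

$$g_a(v) = 0 \quad \text{for } 1 \le a \le m,$$

with

$$g_a(\lambda v) = \lambda g_a(v) \quad \text{for } \lambda > 0.$$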

Definition. An extremal field for the homogeneous variational problem is defined by a pair $(W, X)$ consisting of a real-valued function $W$ on $B$ and a vector field $X \in V(B)$ such that

$$X(W) = 1 = L(X(b)) \quad \text{for all } b \in B. \tag{14.5a}$$

That is,

$$\frac{d}{dt} W(\sigma(t)) = L(\sigma'(t))$$

along any integral curve $t \to \sigma(t)$ of $X$. (This is just a normalizing condition.)

$$X(b) \in K \quad \text{for all } b \in B. \tag{14.5b}$$

For each $b \in B$, $L(X(b))$ is a relative minimum of the function $v \to L(v)$, where $v$ varies over the vectors $v \in B_b$ satisfying

$$v \in K, \qquad v(W) = 1. \tag{14.5c}$$

THEOREM 14.1 Let $\sigma(t)$, $0 \le t \le a$, be an integral curve of $X$. If $\sigma_1(t)$, $0 \le t \le a$, is any nearby† curve to $\sigma(t)$, with $W(\sigma(0)) = W(\sigma_1(0))$, $W(\sigma_1(a)) = W(\sigma(a))$, whose tangent vector curve belongs to the constraint-set, that is, $\sigma_1'(t) \in K$ for $0 \le t \le a$, then

$$L(\sigma) = \int_0^a L(\sigma'(t))\, dt \le L(\sigma_1) = \int_0^a L(\sigma_1'(t))\, dt.$$

If, further, for each $b \in B$, $X(b)$ is the only minimal point of $L(v)$, with $v$ subject to the conditions listed in (14.5c), then $L(\sigma) < L(\sigma_1)$, unless $\sigma_1$ differs from $\sigma$ only by a change of parametrization.

† The precise meaning of "nearby" will be made clear in the proof.

Proof. We can suppose without loss of generality that $W(\sigma(0)) = 0$. Then, by (14.5a), $W(\sigma(t)) = t$, so that $W(\sigma(a)) = a$; that is, $\sigma$ is parametrized by the value of $W$ on the level surface of $W$ it lies on. (Think of the successive level surfaces of $W$ as "wave fronts," and the integral curves of $X$ as the "rays" corresponding to these wave fronts.) If $(d/dt)W(\sigma_1(t)) > 0$, the parametrization of $\sigma_1$ can be changed so that $W(\sigma_1(t)) = t$ also, that is, so that $\sigma_1$ is parametrized by the level surface of $W$ it lies on. Let us then make precise the condition that $\sigma_1$ be "nearby" to $\sigma$ by requiring that:

(a) $(d/dt)W(\sigma_1(t)) > 0$; that is, the function $W$ is always strictly increasing on $\sigma_1$. Thus the values of $W$ can be introduced as a parametrization for $\sigma_1$; that is, we can suppose that $W(\sigma_1(t)) = t$, $0 \le t \le a$.

(b) For each $t \in [0, a]$, $\sigma_1'(t)$ (which now satisfies $\sigma_1'(t)(W) = 1$ and $\sigma_1'(t) \in K$) is sufficiently close to $X(\sigma_1(t))$ so that property (14.5c) holds; that is, $L(X(\sigma_1(t))) \le L(\sigma_1'(t))$.

Thus,

$$L(\sigma_1'(t)) \ge L(X(\sigma_1(t))) = 1 = L(X(\sigma(t))) = L(\sigma'(t)),$$

whence

$$L(\sigma_1) = \int_0^a L(\sigma_1'(t))\, dt \ge \int_0^a 1\, dt = \int_0^a L(\sigma'(t))\, dt = L(\sigma).$$

Q.E.D.
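A concrete instance of the theorem may help. The sketch below assumes the plane with the Euclidean length element $L(v) = |v|$, no constraints, the wave-front function $W(x) = x_1$ and the constant field $X = \partial/\partial x_1$; none of these choices comes from the text. It checks the normalization (14.5a) and the pointwise minimum property (14.5c) on a few sample vectors, and then compares $L(\sigma)$ with $L(\sigma_1)$ for a comparison curve joining the same two level surfaces of $W$.

```python
import math


def L(v):
    """Homogeneous Lagrangian: Euclidean length of the tangent vector."""
    return math.hypot(v[0], v[1])


def W(point):
    """Wave-front function; its level surfaces are the lines x_1 = const."""
    return point[0]


X = (1.0, 0.0)  # the constant vector field d/dx_1


def curve_length(curve, a=1.0, n=2000):
    """Approximate the integral of L(curve'(t)) over [0, a] by chord sums.

    Homogeneity of L makes L(chord) equal to L(chord / dt) * dt.
    """
    total = 0.0
    previous = curve(0.0)
    for k in range(1, n + 1):
        current = curve(a * k / n)
        total += L((current[0] - previous[0], current[1] - previous[1]))
        previous = current
    return total


def sigma(t):
    """Integral curve of X through the origin."""
    return (t, 0.0)


def sigma_1(t):
    """A nearby comparison curve, also parametrized by the value of W."""
    return (t, 0.1 * math.sin(2.0 * math.pi * t))


if __name__ == "__main__":
    print(L(X))                                   # normalization (14.5a)
    print([L((1.0, s)) for s in (-0.5, 0.0, 0.5)])  # vectors with v(W) = 1
    print(curve_length(sigma), curve_length(sigma_1))
```

Among vectors $v = (1, s)$ with $v(W) = 1$, the value $L(v) = \sqrt{1 + s^2}$ is smallest at $s = 0$, that is, at $v = X$, so the straight ray gives the smaller integral, as the theorem asserts.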

Thus we have found a method (Carathéodory's) for proving that certain curves give a minimum to a given variational problem by solving an infinite succession (parametrized by the level surfaces of $W$) of finite-dimensional minimization problems. Actually, the method is an abstraction of a method that had been implicit from the beginning of the calculus of variations. Carathéodory simply stood the classical reasoning on its head and put the motivation into, first, making the method clear; and second, into carrying out the analytical details necessary to show that there is a "plentiful" (local) supply of such extremal fields, and to find the conditions that a single curve be embeddable as an integral curve of an extremal field.

At every point $b$ of the domain, we have a vector $X(b)$; these vectors are the "rays" corresponding to the wave fronts $W$ = constant. $X(b)$ represents the "optimal" direction to go when a curve is at point $b$. Since $\sigma$ is an integral curve of $X$, it is always going optimally; (14.5c) expresses this "optimality." Any other curve $\sigma_1$ going from $\sigma(0)$ to the surface $W = W(\sigma(a))$ would at some point violate this optimality; hence it would give a larger value to $L(\sigma_1)$.

Let us now work out the conditions (14.5) more explicitly in the case of the constraint-set $K$ defined by equations: $g_a(x, \dot{x}) = 0$. Introduce the abbreviations:

$$g_{a,i} = \frac{\partial g_a}{\partial x_i}, \qquad g_{a,n+i} = \frac{\partial g_a}{\partial \dot{x}_i}, \qquad W_i = \frac{\partial W}{\partial x_i}, \qquad \text{etc.}$$

Suppose that $W(x)$ and $X = A_i(x)(\partial/\partial x_i)$ define an extremal field. Fix $x^0 \in D$. We shall carry out the minimization of (14.5c), using in the usual way the Lagrange multiplier rule. Introduce real constants $\lambda$, $\lambda_a$; to the function $\dot{x} \to L(x^0, \dot{x})$ add the constraint functions multiplied by the multiplier constants,

$$\dot{x} \to L(x^0, \dot{x}) + \lambda\,(W_i(x^0)\,\dot{x}_i - 1) + \lambda_a\, g_a(x^0, \dot{x}),$$

and express the fact that $\dot{x} = A(x^0)$ is to be a critical point for this unconstrained function of $\dot{x}$; that is,

$$L_{n+i}(x^0, A(x^0)) + \lambda\, W_i(x^0) + \lambda_a\, g_{a,n+i}(x^0, A(x^0)) = 0.$$

Now we want $x^0$ to vary. It is not too unreasonable to suppose that the $\lambda$ and $\lambda_a$ will also vary with $x$; that is,

$$L_{n+i}(x, A(x)) + \lambda(x)\, W_i(x) + \lambda_a(x)\, g_{a,n+i}(x, A(x)) = 0. \tag{14.6}$$

By (14.5a), we have $W_i A_i = 1$. The Euler relations for homogeneous functions give

$$L_{n+i}(x, \dot{x})\,\dot{x}_i = L(x, \dot{x}), \qquad g_{a,n+i}(x, \dot{x})\,\dot{x}_i = g_a(x, \dot{x}).$$

By (14.5b), $g_a(x, A(x)) = 0$; hence $g_{a,n+i}(x, A(x))\,A_i(x) = g_a(x, A(x)) = 0$. Multiplying (14.6) by $A_i(x)$ and adding, using the Euler relations, we have

$$0 = L(x, A(x)) + \lambda(x) = 1 + \lambda(x).$$

Thus $\lambda(x) = -1$, and (14.6) simplifies to

$$\frac{\partial W}{\partial x_i}(x) = L_{n+i}(x, A(x)) + \lambda_a(x)\, g_{a,n+i}(x, A(x)),$$

or

$$dW = L_{n+i}(x, A(x))\, dx_i + \lambda_a(x)\, g_{a,n+i}(x, A(x))\, dx_i. \tag{14.7}$$
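To see what (14.7) says in the simplest situation, here is a worked special case. It is a sketch under assumptions not made in the text: $B$ is a domain of $R^n$, $L$ is the Euclidean length element, and there are no constraints, so the $\lambda_a$ terms are absent.

```latex
% Assumption: L(x, \dot x) = (\dot x_j \dot x_j)^{1/2}, no constraint functions g_a.
\begin{align*}
L_{n+i}(x,\dot x) &= \frac{\dot x_i}{(\dot x_j \dot x_j)^{1/2}}, \\
\frac{\partial W}{\partial x_i}(x) &= L_{n+i}(x, A(x))
   = \frac{A_i(x)}{(A_j(x) A_j(x))^{1/2}} = A_i(x)
   && \text{by (14.7) and } L(X) = 1, \\
\frac{\partial W}{\partial x_i}\,\frac{\partial W}{\partial x_i} &= A_i(x)\,A_i(x) = 1 .
\end{align*}
```

Under these assumptions $W$ satisfies the eikonal equation $|\operatorname{grad} W| = 1$ and $X$ is the gradient field of $W$, which matches the picture of wave fronts and rays; the normalization $X(W) = A_i W_i = 1$ of (14.5a) then holds automatically.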

We want to describe this relation more geometrically. Now $\theta(L) = L_{n+i}\, dx_i$. Consider $(\lambda_a)$ as the coordinates of a space $R^m$, and consider the space of the variables $(x_i, \dot{x}_i, \lambda_a)$ as $T(B) \times R^m$. Then $X$ and the functions $(\lambda_a(x))$ together define a cross-section mapping $\Phi : B \to K \times R^m$. Notice that (14.7) can be rewritten as

$$dW = \Phi^*(\theta(L) + \lambda_a\, \theta(g_a)). \tag{14.8}$$

Applying $d$ to both sides, we have

$$0 = \Phi^*[d(\theta(L) + \lambda_a\, \theta(g_a))]. \tag{14.9}$$

Thus $\Phi$, considered as a map of $B \to K \times R^m$, defines a submanifold of $K \times R^m$ such that the 2-form $\omega = d(\theta(L) + \lambda_a\, \theta(g_a))$ on $K \times R^m$ is identically zero when restricted to this submanifold. Thus the (partial) differential equations defining the extremal field can be defined in this "geometric" way.

Conversely, consider an n-dimensional submanifold of $K \times R^m$ such that $\omega$ is zero on this submanifold and such that the forms $dx_1, \ldots, dx_n$ are independent on the submanifold. Notice now that if $B$ is sufficiently small, the cross-section map $\Phi : B \to K \times R^m \subset T(B) \times R^m$ and the function $W$ on $B$ satisfying (14.8) can be reconstructed. For suppose that the submanifold is realized as a map $\phi : B' \to K \times R^m$, where $B'$ is a domain of $R^n$ such that

$$\phi^*(\omega) = 0, \qquad \phi^*(dx_1 \wedge \cdots \wedge dx_n) \neq 0.$$

Then the composite map $B' \to K \times R^m \to B$ has nonzero Jacobian (if coordinates of $B'$ are $t_1, \ldots, t_n$, $\phi$ satisfies

$$\phi^*(dx_1 \wedge \cdots \wedge dx_n) = J(t)\,(dt_1 \wedge \cdots \wedge dt_n)).$$

Hence, if $B'$ is small enough, an inverse map exists; that is, we can identify $B'$ with $B$ and suppose that $\phi$ is a cross-section map $x \to (X(x), \lambda_a(x))$, where $X(x) \in K$ and $(\lambda_a(x)) \in R^m$. Now

$$0 = \phi^*(\omega) = \phi^*(d(\theta(L) + \lambda_a\, \theta(g_a))) = d\,\phi^*(\theta(L) + \lambda_a\, \theta(g_a));$$

hence there is a function $W$ with

$$dW = \phi^*(\theta(L) + \lambda_a\, \theta(g_a));$$

that is, if $\phi$ is identified with $\Phi$, we have just (14.8). This discussion may be summed up by saying:

There is a 1-1 correspondence between extremal fields of the variational problem defined by the homogeneous Lagrangian $L(x, \dot{x})$ and the homogeneous constraint functions $g_a(x, \dot{x}) = 0$, and n-dimensional integral submanifolds of the 2-form $d(\theta(L) + \lambda_a\, \theta(g_a))$ on $K \times R^m$ (= the subset of $(x_i, \dot{x}_i, \lambda_a)$ defined by $g_a = 0$) which have the property that $dx_1 \wedge \cdots \wedge dx_n$ is nonzero on the submanifold.

Of course one source of imprecision in this statement is that, in working our way backward to define $X$, $X(x)$ is only a critical point, not necessarily a relative minimum, of the function $v \to L(v)$ when $v$ runs over those $v \in B_x$ such that $v \in K$, $v(W) = 1$. It is usual in classical treatments to impose a priori conditions on the second derivatives $L_{n+i,n+j}$, $g_{a,n+i,n+j}$ of $L$ and $g_a$, guaranteeing that the Hessian form of this function is positive. Such a condition is usually called a Legendre condition, but there is really no point in our writing it down explicitly here.

THEOREM 14.2 If $\Phi : x \to (x, \dot{x} = A(x), \lambda_a(x))$ is an integral map of $B \to K \times R^m$ such that $\Phi^*(d(\theta(L) + \lambda_a\, \theta(g_a))) = 0$, and if $X$ is the associated vector field, that is, $X = A_i(\partial/\partial x_i)$, then an integral curve $x(t)$ of $X$ has the following property: The curve $t \to (x(t), (d/dt)x(t) = \dot{x}(t), \lambda_a(x(t)))$ is a characteristic curve of the 2-form $d(\theta(L) + \lambda_a\, \theta(g_a))$. A necessary condition that a given curve $t \to x(t)$ in $B$ be embeddable in an extremal field is then that there be functions $(\lambda_a(t))$, called Lagrange multipliers, so that the curve

$$t \to \Bigl(x(t), \frac{dx}{dt}, \lambda_a(t)\Bigr)$$

is a characteristic curve of this 2-form.

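Proof. Let $\sigma(t)$, $0 \le t \le 1$, be an integral curve of $X$, and let $\sigma_1(t) = \Phi(\sigma(t)) = (\sigma'(t), \lambda_a(\sigma(t)))$ be the image curve in $K \times R^m$. Let

$$\theta = L_{n+i}\, dx_i + \lambda_a\, g_{a,n+i}\, dx_i = \theta(L) + \lambda_a\, \theta(g_a).$$

Taking $\partial/\partial \dot{x}_j$ of (14.2), we have

$$L_{n+i,n+j}(x, \dot{x})\,\dot{x}_i + L_{n+j} = L_{n+j},$$

or

$$L_{n+i,n+j}(x, \dot{x})\,\dot{x}_i = 0.$$

We remark now that $\sigma_1'(t) \,\lrcorner\, d\theta$, as a 1-covector at the point $\sigma_1(t)$, contains nonzero terms only in the $dx_i$-terms. For since $\sigma(t)$ is an integral curve of $X = A_i(\partial/\partial x_i)$,

$$\frac{d}{dt} x(t) = A(x(t)).$$

The coefficient of $\sigma_1'(t) \,\lrcorner\, d\theta$ involving $d\dot{x}_j$ is then

$$-(L_{n+i,n+j} + \lambda_a\, g_{a,n+i,n+j})\,\dot{x}_i = 0$$

by the Euler relations. The coefficient of $d\lambda_a$ is

$$-g_{a,n+i}\,\dot{x}_i = -g_a(\sigma'(t)) = 0,$$

since $\sigma'(t) \in K$; that is, $g_a(\sigma'(t)) = 0$. Now we remark that the 1-covector $\sigma_1'(t) \,\lrcorner\, d\theta$ is zero when pulled back to $B$ by $\Phi^*$; that is, $\Phi^*(\sigma_1'(t) \,\lrcorner\, d\theta) = 0$. To prove this, let $v \in B_{\sigma(t)}$. Then

$$\Phi^*(\sigma_1'(t) \,\lrcorner\, d\theta)(v) = (\sigma_1'(t) \,\lrcorner\, d\theta)(\Phi_*(v)) = d\theta(\sigma_1'(t), \Phi_*(v)) = d\theta(\Phi_*(\sigma'(t)), \Phi_*(v)) = \Phi^*(d\theta)(\sigma'(t), v) = 0.$$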

These two remarks clearly force $\sigma_1'(t) \,\lrcorner\, d\theta = 0$; that is, $t \to \sigma_1(t) = \Phi(\sigma(t))$ is a characteristic curve of the 2-form $d(\theta(L) + \lambda_a\, \theta(g_a))$, as required.

The fact that $t \to (x(t), (dx/dt), \lambda_a(t))$ is a characteristic curve of $d(\theta(L) + \lambda_a\, \theta(g_a))$ leads to an interpretation of the Lagrange variational rule: Construct the Lagrangian

$$L'(x, \dot{x}, t) = L(x, \dot{x}) + \lambda_a(t)\, g_a(x, \dot{x}).$$

Notice that the curves $t \to (x(t), (dx/dt), \lambda_a(t))$ that are characteristic curves of $d(\theta(L) + \lambda_a\, \theta(g_a))$ and satisfy the constraints $g_a = 0$ are also characteristic curves of $d\theta(L')$; that is, they are the extremals of the ordinary variational problem defined by $L'$ that happen to satisfy the constraints.

Linear Lagrangians and Convex Inequality Constraints

Suppose now that $L : v \to L(v)$ is a linear function on $T(B)$ with $B$ a domain of $R^n$. Thus, in terms of coordinates $(x_i, \dot{x}_i)$ for $T(B)$, $1 \le i, j, \ldots \le n$, $L(x, \dot{x})$ must have the form $L(x, \dot{x}) = a_i(x)\,\dot{x}_i$. Further, we shall suppose that $K$ is defined as the set of $(x, \dot{x}) \in T(B)$ such that $g_a(x, \dot{x}) \le 0$, $1 \le a, b, \ldots \le m$, where, for each $x \in B$, the Hessian matrix $(g_{a,n+i,n+j}(x, \dot{x}))$ is positive semidefinite. (Thus, the $g_a$ are convex functions when restricted to each tangent space.) Suppose that $W$ and $X = A_i(\partial/\partial x_i)$ are, respectively, a function and a vector field in $B$ that define an extremal field for the variational problem. By (14.5a) and (14.5b),

$$X(W) = 1 = a_i(x)\,A_i(x), \qquad g_a(x, A(x)) \le 0.$$

For each $x \in B$, $\dot{x} = A(x)$ is a minimum point of the function $\dot{x} \to a_i(x)\,\dot{x}_i$ when $\dot{x}$ varies in a neighborhood of $A(x)$, subject to the conditions

$$W_i(x)\,\dot{x}_i = 1, \qquad g_a(x, \dot{x}) \le 0.$$

Note first that we cannot have $g_a(x, A(x)) < 0$ for all $a$; that is, $A(x)$ cannot be in the interior of the constraint set. For otherwise the function $\dot{x} \to L(x, \dot{x})$ would be a linear function that has a relative minimum on an open subset of a space $R^{n-1}$ (after the constraint $W_i(x)\,\dot{x}_i = 1$ is taken into account), which is impossible. Suppose, then, that $g_a(x, A(x)) = 0$ for $1 \le a \le p$, but that $g_a(x, A(x)) < 0$ for $p + 1 \le a \le m$. Let us use a geometric terminology. Let $K^x$ be the subset of $\dot{x}$ such that $(x, \dot{x}) \in K$. Thus $K^x$ is that part of the constraint-set lying over the point $x$. It is a convex subset of $B_x$. Think of the subset of $K^x$ consisting of those $\dot{x}$ satisfying

$$g_a(x, \dot{x}) = 0, \quad 1 \le a \le p, \qquad g_a(x, \dot{x}) < 0, \quad p + 1 \le a \le m,$$

as a face of the convex set $K^x$. We also call the face nondegenerate if the rank of the matrix $(g_{a,n+i}(x, \dot{x}))$ is, for all $(x, \dot{x})$ on the face and for $1 \le a \le p$, $1 \le i \le n$, equal to $p$, that is, is of maximal rank. We shall then say that this is an $(n - p)$-dimensional face of the convex set $K^x$. Continue to regard $x \in B$ as fixed. We now show that the linear forms

$$\dot{x} \to W_i(x)\,\dot{x}_i, \qquad \dot{x} \to g_{a,n+i}(x, A(x))\,\dot{x}_i, \quad 1 \le a \le p, \tag{14.10}$$

are linearly independent. Suppose otherwise: By the hypothesis that the matrix $(g_{a,n+i}(x, A(x)))$ has maximal rank, the forms $\dot{x} \to g_{a,n+i}(x, A(x))\,\dot{x}_i$ are linearly independent. There must then be a relation of the form

$$W_i(x) = \sum_{a=1}^{p} c_a\, g_{a,n+i}(x, A(x)).$$

Then, by (14.5a) and the Euler relations,

$$1 = W_i(x)\,A_i(x) = \sum_{a=1}^{p} c_a\, g_{a,n+i}(x, A(x))\,A_i(x) = \sum_{a=1}^{p} c_a\, g_a(x, A(x)) = 0,$$

which is a contradiction.

Let $v \in B_x$, and consider the line segment $t \to X(x) + tv$, $0 \le t \le 1$. If the following conditions are satisfied, the whole segment lies in $K^x$, for $t$ sufficiently close to zero, and satisfies the normalizing constraint of (14.5c):

$$W_i(x)\,\dot{x}_i(v) = 0, \tag{14.11a}$$

$$\frac{d}{dt}\, g_a(X(x) + tv)\Big|_{t=0} = g_{a,n+i}(x, A(x))\,\dot{x}_i(v) \le 0 \quad \text{for } 1 \le a \le p. \tag{14.11b}$$

The minimal property of $X(x) = (x, A(x))$ requires that

$$L(v) = a_i(x)\,\dot{x}_i(v) \ge 0$$

for all vectors $v \in B_x$ satisfying (14.11). The first condition this imposes is that the form $\dot{x} \to a_i(x)\,\dot{x}_i$, or $v \to L(v)$, can be written as a linear combination of the forms in (14.10); that is,

$$a_i(x) = \sum_{a=1}^{p} \lambda_a\, g_{a,n+i}(x, A(x)) + \lambda\, W_i(x). \tag{14.12}$$

For the forms of (14.10) are linearly independent: If they are extended to a basis of linear forms, if the form $\dot{x} \to a_i(x)\,\dot{x}_i$ is expressed in terms of this basis, and if the coefficients of the forms other than those in (14.10) were nonzero, there would be a $v \in B_x$ satisfying (14.11) (in fact the expressions in (14.11b) could be zero) for which $L(v)$ could have arbitrary sign, which would give a contradiction. Using the Euler homogeneity relations again, we see that the $\lambda$ occurring in (14.12) must be 1. Thus, finally, the minimization property requires that the $\lambda_a$, $1 \le a \le p$, occurring in (14.12) be $\le 0$. We can thus extend the $\lambda_a$ to $1 \le a \le m$ by setting $\lambda_a = 0$ for $p < a \le m$, so that $\lambda_a \le 0$ for all $a$.

All this has been for a fixed value $x \in B$. If, when $x$ varies, $X(x)$ always lies on a nondegenerate $(n - p)$-dimensional face, the $\lambda_a$ can be chosen as functions of $x$, $\lambda_a(x)$, $1 \le a \le m$, and we then have

$$dW = a_i\, dx_i - \sum_{a=1}^{m} \lambda_a(x)\, g_{a,n+i}(x, A(x))\, dx_i.$$

Note that $\theta(L) = L_{n+i}\, dx_i = a_i\, dx_i$. Again this means that the mapping $\Phi : x \to (X(x), -\lambda_a(x))$ of $B \to K \times R^m$ is an integral submanifold of $d(\theta(L) + \lambda_a\, \theta(g_a))$, so that the extremal fields are determined by the same sorts of differential equations as those for the variational problems (14.5) involving equality constraints, but in addition the integral submanifolds of this 2-form must satisfy an inequality condition, namely: The functions $\lambda_a$ on $K \times R^m$ must be $\ge 0$ on the integral submanifold. This may be described as the "non-singular" part of the theory. One obvious sort of singularity may happen when $x \to X(x)$ lies on faces of different dimensions as $x$ varies over $B$. In problems of the theory of optimal control, this phenomenon has great importance, since it corresponds to "switching," but its general theory seems to be much more difficult.

It is traditional in treatises on the calculus of variations to spend considerable effort in developing the various necessary and sufficient conditions for extremals of variational problems to give minima. However, this is not a subject of active interest in differential geometry (except in the special case of a Riemannian metric, which we shall develop later with other methods), and we shall not go into more details here. Our aim in this chapter has been only to present Carathéodory's brilliant idea in as clear a form as possible. In the next chapter, we shall present further material of a classical nature that is useful in classical mechanics.

Exercises

1. Prove (14.2) through (14.4).

2. Suppose $L$ is a Lagrangian on a manifold $M$, and $N$ is a submanifold. Suppose the constraint-set $K$ consists of the $v \in T(M)$ that are tangent to $N$. What does the Lagrange variational rule say about the extremals?

3. Show that Newton's equations of motion for a system of particles (Chapter 11), for forces derivable from a potential, at least, can be written directly as the Euler equations for a Lagrangian. What is the relation of d'Alembert's principle to the Lagrange variational rule?

4. Discuss the case of "nonholonomic" constraints in classical mechanics (for example, the case of a friction-free sphere rolling on a plane) from the point of view of the Lagrange variational rule. Is there a generalization of d'Alembert's principle to this case?

5. For a simple variational problem (that is, with no constraints) work out the Legendre condition.

6. Consider a simple case where such a Legendre condition is not satisfied in a uniform way; for example, the case of a pseudo-Riemannian metric in the plane (see Part 3). Discuss the possible minimizing properties of geodesics (that is, extremals), using Carathéodory's method.

15

The Ordinary Problems of the Calculus of Variations

In this chapter, we present the more classical approach to the most important special case of the general variational problem, whose theory has been already outlined. One may regard much of this material as constituting the mathematical content of classical mechanics; some will be a repetition (for the sake of clarity) of material already given.

Let $D$ be a domain in $R^n$, the space of variables $x_i$, $1 \le i, j, \ldots \le n$. For $x \in D$, let $D_x$ be the tangent vector space to $D$ at $x$. Let

$$T(D) = \bigcup_{x \in D} D_x$$

be the tangent bundle to $D$, considered as a domain in $R^{2n}$ with coordinates $(x_i, \dot{x}_i)$, where the $\dot{x}_i$ are the functions on $T(D)$ defined as

$$\dot{x}_i(v) = v(x_i) \quad \text{for } v \in T(D).$$

Consider $R$, the real numbers, as parametrized by $t$. A Lagrangian on $D$ is a real-valued function $L$ on $T(D) \times R$. If $\sigma : [a, b] \to D$,

$$L(\sigma) = \int_a^b L(\sigma'(t), t)\, dt.$$

The function $\sigma \to L(\sigma)$ defines a function on the space of curves in $D$, and the extremal curves of $L$, in general, are the curves that in some sense are critical "points" for this function. However, in this simplest case, the first variation formula gives a rationale for a more explicit definition of the extremals as solutions of the Euler equations. Let us derive the Euler equations and the first variation formula in the standard way. Suppose $\sigma(t)$ is defined in coordinates by $x(t) = (x_i(t))$. Then

$$L(\sigma) = \int_a^b L\Bigl(x(t), \frac{dx}{dt}(t), t\Bigr)\, dt.$$

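Suppose that $\sigma_s$, $0 \le s \le 1$, is a deformation of $\sigma$, that is, a one-parameter family of curves with $\sigma_0 = \sigma$. If, in coordinates, $\sigma_s = x(t, s) = (x_i(t, s))$, and if $v(t) \in D_{\sigma(t)}$ is, for each $t \in [a, b]$, the tangent vector to the curve $s \to \sigma_s(t)$ at $s = 0$, then

$$\dot{x}_i(v(t)) = \frac{\partial x_i}{\partial s}(t, 0).$$

The tangent vector field $t \to v(t)$ along $\sigma$ is called the infinitesimal deformation corresponding to the given deformation $s \to \sigma_s$ of $\sigma$. Put $v_i(t) = (\partial x_i/\partial s)(t, 0)$.† Then

$$\frac{d}{ds} L(\sigma_s)\Big|_{s=0} = \int_a^b \Bigl( L_i\, v_i + L_{n+i}\, \frac{dv_i}{dt} \Bigr)\, dt,$$

and after the second term on the right-hand side is integrated by parts,

$$\int_a^b L_{n+i}\, \frac{dv_i}{dt}\, dt = L_{n+i}\, v_i \Big|_a^b - \int_a^b \frac{d}{dt}(L_{n+i})\, v_i\, dt.$$

Hence

$$\frac{d}{ds} L(\sigma_s)\Big|_{s=0} = \int_a^b \Bigl( L_i - \frac{d}{dt} L_{n+i} \Bigr) v_i\, dt + L_{n+i}\, v_i \big|_{t=b} - L_{n+i}\, v_i \big|_{t=a}, \tag{15.1}$$

where $L_i$ and $L_{n+i}$ are evaluated at $(x(t), dx/dt, t)$. The first variation formula leads us to consider curves that satisfy the following system of second-order differential equations, the Euler equations:

$$\frac{d}{dt}\Bigl( L_{n+i}\Bigl(x(t), \frac{dx}{dt}, t\Bigr) \Bigr) - L_i\Bigl(x(t), \frac{dx}{dt}, t\Bigr) = 0. \tag{15.2}$$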
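The role of the Euler equations in the first variation formula can be illustrated numerically. The sketch below assumes, purely for illustration, the one-dimensional Lagrangian $L = \tfrac{1}{2}\dot{x}^2 - \tfrac{1}{2}x^2$, whose Euler equation is $\ddot{x} + x = 0$, and deformations $x + s\,\eta$ with $\eta(t) = (t - a)(b - t)$ vanishing at both end points. The function names and the quadrature rule are our own choices.

```python
import math


def action(curve, velocity, a, b, n=2000):
    """Composite Simpson approximation of the integral of L over [a, b],
    for the illustrative Lagrangian L = xdot**2 / 2 - x**2 / 2 (n even)."""
    h = (b - a) / n

    def integrand(t):
        return 0.5 * velocity(t) ** 2 - 0.5 * curve(t) ** 2

    total = integrand(a) + integrand(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return total * h / 3


def deformed_action(base, base_dot, s, a, b):
    """Action of the deformed curve x_s(t) = base(t) + s * (t - a) * (b - t)."""
    def curve(t):
        return base(t) + s * (t - a) * (b - t)

    def velocity(t):
        return base_dot(t) + s * (a + b - 2 * t)

    return action(curve, velocity, a, b)


def first_variation(base, base_dot, a, b, ds=1e-3):
    """Central-difference estimate of (d/ds) L(sigma_s) at s = 0."""
    plus = deformed_action(base, base_dot, ds, a, b)
    minus = deformed_action(base, base_dot, -ds, a, b)
    return (plus - minus) / (2 * ds)


if __name__ == "__main__":
    a, b = 0.0, math.pi / 2
    # sin t solves x'' + x = 0; the straight line joins the same end points.
    print(first_variation(math.sin, math.cos, a, b))
    slope = 2 / math.pi
    print(first_variation(lambda t: slope * t, lambda t: slope, a, b))
```

For the solution $\sin t$ the estimated first variation vanishes up to quadrature error, while for the straight line through the same end points it does not, in line with the fixed end point discussion of (15.1).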

† It is convenient to use a subscript notation for partial derivatives: $L_i = \partial L/\partial x_i$, $L_{n+i} = \partial L/\partial \dot{x}_i$, etc.

We depart slightly from the general point of view and regard an extremal of the variational problem as a curve satisfying (15.2); thus (15.2) is to be regarded as a system of differential equations to be investigated for its own sake. Of course the relation of the two notions of "extremal curve" is obvious from (15.1). For example, if $\sigma$ satisfies (15.2) and $\sigma_s(a) = \sigma_0(a)$, $\sigma_s(b) = \sigma_0(b)$, that is, if $\sigma_s$ is a deformation with fixed end points, then $v_i(a) = 0 = v_i(b)$; hence $(d/ds)L(\sigma_s)|_{s=0} = 0$. Conversely, if $(d/ds)L(\sigma_s)|_{s=0} = 0$ is true for every deformation with fixed end points, then (15.2) is satisfied (see any classical book on the calculus of variations). However, (15.1) contains information about deformations that may not have fixed end points: For example, if $\sigma$ satisfies (15.2) and if each of the "boundary terms" (that is, the last two terms on the right-hand side of (15.1)) vanishes, then the left-hand side vanishes. We shall see a little later on what the vanishing of these last two terms means geometrically.

After these classical comments, we proceed to a more intrinsic, "geometric" characterization of the solutions of the Euler equations as the projection into $D$ of the characteristic curves of a certain closed 2-form on $T(D) \times R$. Define the Cartan 1-form $\theta(L)$ associated with $L$ as a 1-form on $T(D) \times R$ as

$$\theta(L) = L_{n+i}\, dx_i - (L_{n+i}\,\dot{x}_i - L)\, dt.$$

Given a curve $\sigma(t)$, $a \le t \le b$, in $D$, we can consider its extended curve $\tilde{\sigma}(t) = (\sigma'(t), t)$ in $T(D) \times R$: In other words, $\tilde{\sigma}$ is the graph of the tangent vector curve of $\sigma$. $\theta(L)$, as a 1-differential form on $T(D) \times R$, defines a Lagrangian on $T(D) \times R$.

(a) $L(\sigma) = \theta(L)(\tilde{\sigma})$; that is, the value of the function defined by the Lagrangian $L$ on the curve $\sigma$ is equal to the value of the Lagrangian $\theta(L)$, defined on $T(D) \times R$, on the curve $\tilde{\sigma}$. Explicitly:

$$\theta(L) = y_i\, dx_i - H\, dt,$$

where $y_i$ is the function $L_{n+i} = \partial L/\partial \dot{x}_i$ and $H$ is the function $(\partial L/\partial \dot{x}_i)\,\dot{x}_i - L$ on $T(D) \times R$.

(b) The curve $\sigma$ is a solution of the Euler equations (15.2), that is, is an extremal of $L$, if and only if

$$\tilde{\sigma} = \Bigl(x(t), \frac{dx}{dt}, t\Bigr)$$

is a characteristic curve of the 2-form $d\theta(L)$ on $T(D) \times R$. Thus, since $\theta(L) = y_i\, dx_i - H\, dt$, if the functions $(x_i, y_i, t)$ form a new coordinate system for $T(D) \times R$, that is, if

$$\det(L_{n+i,n+j}) \neq 0, \tag{15.5}$$

then $\sigma$ is an extremal of $L$ if and only if its extended curve, when written in the new coordinates as $(x_i(t), y_i(t), t)$, is a solution of the Hamilton equations with Hamiltonian $H$:

$$\frac{dx_i}{dt} = \frac{\partial H}{\partial y_i}, \qquad \frac{dy_i}{dt} = -\frac{\partial H}{\partial x_i}.$$

All except the remark about the Euler equations being equivalent to Hamilton's has been proved in Chapter 12. Note that

$$dy_i = L_{n+i,n+j}\, d\dot{x}_j + L_{n+i,j}\, dx_j + L_{n+i,t}\, dt.$$

Now the $(x_i, y_i, t)$ define a new coordinate system (at least locally) for $T(D) \times R$ if and only if the $d\dot{x}_j$ can be expressed in terms of the $dx_i$, $dy_i$, and $dt$. This is so if and only if (15.5) is verified. Given this condition, in the new coordinate system $d\theta(L)$ is in Hamiltonian form $dy_i \wedge dx_i - dH \wedge dt$ with Hamiltonian function $H = L_{n+i}\,\dot{x}_i - L$ when expressed in the new coordinates.† We can now refer to our work in Chapter 13 to complete the description of the extremals as solutions of the Hamilton equations.

A Lagrangian $L$ satisfying (15.5) is said to define a regular, nonparametric (or nonhomogeneous) variational problem. In such problems, the extremal curves come with their "own" parametrization, in the sense that the parametrization cannot be freely changed without destroying the extremal curve property. Thus this sort of variational problem is the appropriate one for defining the time evolution of physical systems in Newtonian physics, where "time" is "given" absolutely for the whole system, from the "outside," as it were.

A brief indication of at least the formal relation with Newton's laws of motion might be in order. The type of Lagrangian satisfying (15.5) for which the Euler equations are as simple as possible is one of the following form (assuming no explicit time dependence in $L$):

$$L(x, \dot{x}) = \tfrac{1}{2}\, m\, \dot{x}_i \dot{x}_i - V(x), \tag{15.6}$$

where $m$ is a constant, $V(x)$ is a function on $D$, and $n$ = 1, 2, or 3;

$$\frac{d}{dt}\Bigl( L_{n+i}\Bigl(x(t), \frac{dx}{dt}\Bigr) \Bigr) - L_i = m\, \frac{d^2 x_i}{dt^2} + \frac{\partial V}{\partial x_i}.$$

† In other words, $H$ may be regarded as a function $H(x_i, y_i, t)$ of the indicated $2n + 1$ variables such that, identically in $(x, \dot{x}, t)$,

$$H(x_i, L_{n+i}(x, \dot{x}, t), t) = L_{n+j}(x, \dot{x}, t)\,\dot{x}_j - L(x, \dot{x}, t).$$

Recall Newton's law for motion of a particle: “force vector” = mass × “acceleration vector.” This will be formally identical with the Euler equations if we identify $m$ with mass, $(d^2x_i/dt^2)$ with the acceleration vector, and $(-\partial V/\partial x_i)$ with the “force” vector. Thus $V(x)$ is to be regarded as the potential function giving rise to the “force” vector. Newton's laws are implicitly based on the Euclidean nature of the underlying space, in particular on the “unnatural” identification between vector fields and differential forms. It seems more natural, then, to regard the “force” vector field as a differential form, namely as $-dV$. Continuing, we see that the $\dot{x}_i$ are to be regarded as the coordinates of the velocity of the particle. $L$ then has the form: (kinetic energy - potential energy). Then $y_i = L_{n+i} = m\dot{x}_i$ are the coordinates of the linear momentum of the particle.

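$$\dot{x}_i = \frac{y_i}{m}, \qquad
\begin{aligned}
H &= L_{n+i}\dot{x}_i - L = m\dot{x}_i\dot{x}_i - \tfrac{1}{2}m\dot{x}_i\dot{x}_i + V(x) \\
  &= \tfrac{1}{2}m\dot{x}_i\dot{x}_i + V(x) = (\text{potential} + \text{kinetic})\ \text{energy} = \text{total energy}.
\end{aligned} \tag{15.7}$$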
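As an illustration of (15.6) and (15.7), the following worked computation takes $n = 1$ and a quadratic potential. The potential $V(x) = \tfrac12 kx^2$ and the constant $k$ are assumptions introduced only for this sketch; they are not part of the text.

```latex
% Assumed example: n = 1, V(x) = (1/2) k x^2 with k a constant.
\begin{align*}
L(x,\dot{x}) &= \tfrac{1}{2}m\dot{x}^2 - \tfrac{1}{2}kx^2, \qquad
  L_{n+1,\,n+1} = m \neq 0 \quad \text{(so (15.5) holds)},\\
y &= L_{n+1} = m\dot{x}, \qquad \dot{x} = y/m,\\
H(x,y) &= y\,\dot{x} - L = \frac{y^2}{2m} + \tfrac{1}{2}kx^2,\\
\frac{dx}{dt} &= \frac{\partial H}{\partial y} = \frac{y}{m}, \qquad
\frac{dy}{dt} = -\frac{\partial H}{\partial x} = -kx
\quad\Longrightarrow\quad m\frac{d^2x}{dt^2} = -kx = -\frac{dV}{dx}.
\end{align*}
```

The last implication is the Euler equation of this $L$, which is Newton's law with force $-dV/dx$.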

Thus the formal study of regular, nonparametric variational problems may be regarded as a generalization of the material that goes under the name of “classical mechanics.” This has the great advantage of replacing the Newtonian theory, which is really “covariant” only under the group of orthogonal transformations, by an apparatus that is covariant under the much bigger group of all transformations of the underlying space $D$.

We now want to make precise this “covariance” of the Euler equations under arbitrary transformations of the domain $D$. We shall be able to do even more, namely, to develop covariance with respect to arbitrary mappings of one domain $D \subset R^n$ into another $D' \subset R^m$. Suppose, then, that $D$ is a domain in $R^n$ with coordinates $(x_i)$, $1 \le i, j, \ldots \le n$, that $D'$ is a domain in $R^m$, with coordinates $(z_a)$, $1 \le a, b, \ldots \le m$, and that $\phi$ is a map: $D \to D'$. Suppose $\phi_a(x) = z_a$ are the functions defining the mapping; that is,

$$\phi^*(z_a) = \phi_a.$$

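There is a mapping $\phi_*: T(D) \to T(D')$ of the tangent bundles assigning $\phi_*(v) \in D'_{\phi(x)}$ to each $v \in D_x$. If $(x_i, \dot{x}_i)$ and $(z_a, \dot{z}_a)$ are, respectively, the standard coordinates on $T(D)$ and $T(D')$, then recall that

$$(\phi_*)^*(z_a) = \phi_a, \qquad (\phi_*)^*(\dot{z}_a) = \frac{\partial \phi_a}{\partial x_j}\dot{x}_j. \tag{15.8}$$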

THEOREM 15.1 Let $D$ and $D'$ be domains in, respectively, $R^n$ and $R^m$, $\phi: D \to D'$ a map between them, $\phi_*: T(D) \to T(D')$ the prolonged map to tangent vectors. Consider $\phi_*$ as a map also of $T(D) \times R \to T(D') \times R$ by mapping

$$\phi_*(v, t) = (\phi_*(v), t) \qquad \text{for } v \in T(D),\ t \in R;$$

that is, $(\phi_*)^*(t) = t$. Let $L$ and $L'$ be, respectively, functions on $T(D) \times R$ and $T(D') \times R$ defining Lagrangians on $D$ and $D'$ such that $(\phi_*)^*(L') = L$. Then:

(a) If $\sigma(t)$, $a \le t \le b$, is a curve in $D$, and if $\sigma_1(t) = \phi(\sigma(t))$ is the transformed curve in $D'$ under $\phi$, then $L(\sigma) = L'(\sigma_1)$.
(b) $(\phi_*)^*(\theta(L')) = \theta(L)$.

Proof. (a) Follows more or less from the definitions. The basic geometric property of $\phi_*$ is

$$\sigma_1'(t) = \phi_*(\sigma'(t));$$

hence,

$$L'(\sigma_1) = \int_a^b L'(\sigma_1'(t))\,dt = \int_a^b L'(\phi_*(\sigma'(t)))\,dt = \int_a^b (\phi_*)^*(L')(\sigma'(t))\,dt = \int_a^b L(\sigma'(t))\,dt = L(\sigma).$$

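(b) This requires a little more computation:

$$\begin{aligned}
dL &= L_{n+i}\,d\dot{x}_i + L_i\,dx_i + L_t\,dt \\
&= (\phi_*)^*(dL') = (\phi_*)^*\big(L'_{m+a}\,d\dot{z}_a + L'_a\,dz_a + L'_t\,dt\big) \\
&= (\phi_*)^*(L'_{m+a})\,d\Big(\frac{\partial\phi_a}{\partial x_j}\dot{x}_j\Big) + (\phi_*)^*(L'_a)\,d\phi_a + (\phi_*)^*(L'_t)\,dt \\
&= (\phi_*)^*(L'_{m+a})\,\frac{\partial\phi_a}{\partial x_j}\,d\dot{x}_j + \cdots
\end{aligned}$$

(the unwritten terms involve $dx_i$ and $dt$). Hence,

$$L_{n+j} = (\phi_*)^*(L'_{m+a})\,\frac{\partial\phi_a}{\partial x_j}.$$

Also,

$$(\phi_*)^*(dz_a - \dot{z}_a\,dt) = d\phi_a - \frac{\partial\phi_a}{\partial x_j}\dot{x}_j\,dt = \frac{\partial\phi_a}{\partial x_j}\,dx_j - \frac{\partial\phi_a}{\partial x_j}\dot{x}_j\,dt = \frac{\partial\phi_a}{\partial x_j}(dx_j - \dot{x}_j\,dt).$$

Since $\theta(L') = L'_{m+a}(dz_a - \dot{z}_a\,dt) + L'\,dt$, it follows that

$$(\phi_*)^*(\theta(L')) = (\phi_*)^*(L'_{m+a})\,\frac{\partial\phi_a}{\partial x_j}(dx_j - \dot{x}_j\,dt) + L\,dt = L_{n+j}(dx_j - \dot{x}_j\,dt) + L\,dt = \theta(L).$$

Q.E.D.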
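To see Theorem 15.1(b) at work, the following worked computation takes for $\phi$ the polar-coordinate map of the plane. The choice of map, the free-particle Lagrangian $L'$, and the symbols $r$, $\alpha$ for the coordinates on $D$ are assumptions made for illustration only.

```latex
% Assumed example: D = \{(r,\alpha) : r > 0\}, D' = R^2 \setminus \{0\}.
\begin{align*}
\phi_1 &= r\cos\alpha, \qquad \phi_2 = r\sin\alpha, \qquad
L' = \tfrac12 m(\dot z_1^2 + \dot z_2^2),\\
(\phi_*)^*(\dot z_1) &= \dot r\cos\alpha - r\dot\alpha\sin\alpha, \qquad
(\phi_*)^*(\dot z_2) = \dot r\sin\alpha + r\dot\alpha\cos\alpha,\\
L &= (\phi_*)^*(L') = \tfrac12 m(\dot r^2 + r^2\dot\alpha^2),\\
(\phi_*)^*(dz_1 - \dot z_1\,dt) &= \cos\alpha\,(dr - \dot r\,dt) - r\sin\alpha\,(d\alpha - \dot\alpha\,dt),\\
(\phi_*)^*(dz_2 - \dot z_2\,dt) &= \sin\alpha\,(dr - \dot r\,dt) + r\cos\alpha\,(d\alpha - \dot\alpha\,dt),\\
(\phi_*)^*(\theta(L')) &= m\,(\phi_*)^*(\dot z_a)\,(\phi_*)^*(dz_a - \dot z_a\,dt) + L\,dt\\
&= m\dot r\,(dr - \dot r\,dt) + m r^2\dot\alpha\,(d\alpha - \dot\alpha\,dt) + L\,dt = \theta(L).
\end{align*}
```

The cross terms cancel in the next-to-last step, and the final expression is exactly what one obtains by computing $\theta(L)$ directly from $L = \tfrac12 m(\dot r^2 + r^2\dot\alpha^2)$.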

COROLLARY 1 If $\phi_*(T(D)) = T(D')$, then $\phi$ carries an extremal of $L$ into an extremal of $L'$.

Proof. The proof consists in putting together three general remarks.

(a) Suppose that $E$ and $E'$ are domains and that $\varphi: E \to E'$ is a map such that $\varphi_*(T(E)) = T(E')$; that is, $\varphi$ is a maximal rank mapping. If $\omega'$ is a closed 2-form on $E'$, and if $v \in T(E)$ is a characteristic vector of $\varphi^*(\omega')$, then $\varphi_*(v)$ is a characteristic vector of $\omega'$.

For the proof, suppose that $v \in E_y$, $y \in E$. Let $u' \in E'_{\varphi(y)}$. We must show that $\omega'(\varphi_*(v), u') = 0$. But by hypothesis there exists a vector $u \in E_y$ such that $\varphi_*(u) = u'$. Then

$$\omega'(\varphi_*(v), u') = \omega'(\varphi_*(v), \varphi_*(u)) = \varphi^*(\omega')(v, u) = 0.$$

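(b) If $\phi_*(T(D)) = T(D')$, then $(\phi_*)_*: T(T(D)) \to T(T(D'))$ is onto, that is, $\phi_*$ is a maximal rank mapping of $T(D)$ onto $T(D')$.

We shall use the standard coordinates $(x_i, \dot{x}_i)$, $(z_a, \dot{z}_a)$ for, respectively, $T(D)$ and $T(D')$, $1 \le i, j, \ldots \le n$; $1 \le a, b, \ldots \le m$. To show that a mapping of vector spaces is onto is equivalent to showing that the dual mapping on covectors is 1-1. Thus, suppose that there is a covector on $T(D')$ mapped into zero by $(\phi_*)^*$; say,

$$\begin{aligned}
0 &= (\phi_*)^*(\lambda_a\,dz_a + \lambda_{m+a}\,d\dot{z}_a)
   = \lambda_a\frac{\partial\phi_a}{\partial x_i}\,dx_i + \lambda_{m+a}\,d\Big(\frac{\partial\phi_a}{\partial x_i}\dot{x}_i\Big) \\
  &= \lambda_a\frac{\partial\phi_a}{\partial x_i}\,dx_i
   + \lambda_{m+a}\frac{\partial^2\phi_a}{\partial x_j\,\partial x_i}\dot{x}_i\,dx_j
   + \lambda_{m+a}\frac{\partial\phi_a}{\partial x_i}\,d\dot{x}_i.
\end{aligned}$$

Now that $\phi$ is maximal rank means that the rank of the matrix $(\partial\phi_a/\partial x_i)$ is everywhere $m$. Then

$$\lambda_{m+a}\frac{\partial\phi_a}{\partial x_i}\,d\dot{x}_i = 0; \quad\text{hence,}\quad \lambda_{m+a}\frac{\partial\phi_a}{\partial x_i} = 0, \quad\text{hence,}\quad \lambda_{m+a} = 0,$$

$$\text{hence,}\quad \lambda_a\frac{\partial\phi_a}{\partial x_i}\,dx_i = 0, \quad\text{hence,}\quad \lambda_a\frac{\partial\phi_a}{\partial x_i} = 0, \quad\text{hence,}\quad \lambda_a = 0.$$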

(c) If $\sigma(t)$ is a curve in $D$, and if $\sigma_1(t) = \phi(\sigma(t))$ is the transformed curve under $\phi$, then the image of the curve $t \to (\sigma'(t), t)$ under the map $\phi_*$ is the curve $t \to (\sigma_1'(t), t)$.

This follows from the very definition of the map $\phi_*: T(D) \times R \to T(D') \times R$.

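COROLLARY 2 Let $X = A_i(\partial/\partial x_i)$ be a vector field on $D$. Define the prolonged vector field $\tilde{X}$ on $T(D) \times R$ as

$$\tilde{X} = A_i\frac{\partial}{\partial x_i} + \frac{\partial A_i}{\partial x_j}\dot{x}_j\frac{\partial}{\partial \dot{x}_i} + \frac{\partial}{\partial t}.$$

Then

$$\tilde{X}(\theta(L)) = \theta(\tilde{X}(L)). \tag{15.9}$$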

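Proof. This could be verified by a similar direct computation, but it is more constructive to reduce it to Theorem 15.1 by a geometric argument. Suppose, for simplicity, that $X$ generates a one-parameter group of transformations of $D$, $s \to T_s$; that is, for $x_0 \in D$, $s \to T_s(x_0)$ is an integral curve of $X$. In particular,

$$\frac{\partial}{\partial s}T_s^*(x_i) = X(T_s^*(x_i)) = A_j\frac{\partial}{\partial x_j}(T_s^*(x_i)).$$

Now, from the geometric meaning, $s \to T_{s*}$ is also a one-parameter transformation group on $T(D)$. Extend $T_{s*}$ to $T(D) \times R$ by

$$(T_{s*})^*(t) = t + s.$$

Finally, then, $s \to T_{s*}$ is a one-parameter group of transformations of $T(D) \times R$. We want to show that $\tilde{X}$ is the infinitesimal generator of this group. By (15.8), $(T_{s*})^*(x_i) = T_s^*(x_i)$ and $(T_{s*})^*(\dot{x}_i) = \big(\partial T_s^*(x_i)/\partial x_j\big)\dot{x}_j$; hence,

$$\frac{\partial}{\partial s}(T_{s*})^*(x_i)\Big|_{s=0} = A_i, \qquad
\frac{\partial}{\partial s}(T_{s*})^*(\dot{x}_i)\Big|_{s=0} = \frac{\partial A_i}{\partial x_j}\dot{x}_j, \qquad
\frac{\partial}{\partial s}(T_{s*})^*(t)\Big|_{s=0} = 1.$$

This verifies the indicated form of $\tilde{X}$. Finally, to prove (15.9), use Theorem 15.1(b) for each $T_s$:

$$\tilde{X}(\theta(L)) = \frac{\partial}{\partial s}(T_{s*})^*(\theta(L))\Big|_{s=0} = \theta\Big(\frac{\partial}{\partial s}(T_{s*})^*(L)\Big|_{s=0}\Big) = \theta(\tilde{X}(L)).$$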
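A concrete instance of the prolongation formula and of (15.9) may help. The sketch below takes the rotation field of the plane and a rotation-invariant Lagrangian; the function $U$ and the choice $n = 2$ are assumptions introduced for illustration.

```latex
% Assumed example: D = R^2, X = rotation field, L = (1/2) m |\dot x|^2 - U(x_1^2 + x_2^2).
\begin{align*}
X &= x_1\frac{\partial}{\partial x_2} - x_2\frac{\partial}{\partial x_1},
  \qquad A_1 = -x_2, \quad A_2 = x_1,\\
\tilde X &= -x_2\frac{\partial}{\partial x_1} + x_1\frac{\partial}{\partial x_2}
  - \dot x_2\frac{\partial}{\partial \dot x_1} + \dot x_1\frac{\partial}{\partial \dot x_2}
  + \frac{\partial}{\partial t},\\
\tilde X(L) &= m(-\dot x_2\dot x_1 + \dot x_1\dot x_2)
  - U'\,(x_1^2 + x_2^2)\,\bigl(2x_1(-x_2) + 2x_2 x_1\bigr) = 0,\\
\tilde X(\theta(L)) &= \theta(\tilde X(L)) = \theta(0) = 0 \qquad \text{by (15.9)}.
\end{align*}
```

So for this $L$ the prolonged rotation field leaves $\theta(L)$ invariant, which is the infinitesimal form of rotational symmetry.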

As useful as these nonparametric Lagrangians are for understanding classical mechanics from a “higher” point of view, for theoretical purposes it is more convenient (as we saw in Chapter 14) to have Lagrangians whose extremal curves can be freely reparametrized. In physics this corresponds to giving up the Newtonian picture of “time” as an “independent” variable with a different status from the “dependent” space and velocity coordinates. We first show that the extremals of a Lagrangian $L$ on $D$ are independent of the parametrization if

(a) $L$ is a function on $T(D)$ alone; that is, $L$ is time-independent;
(b) $L(\lambda v) = \lambda L(v)$ for $\lambda > 0$.

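Proof. Let $\sigma(t)$, $a \le t \le b$, and $\sigma_1(\tau)$, $\alpha \le \tau \le \beta$, be curves differing only by change in parametrization. By definition, this means that there is a real-valued function $t(\tau)$, $\alpha \le \tau \le \beta$, such that $t'(\tau) > 0$, $t(\alpha) = a$, $t(\beta) = b$, and $\sigma_1(\tau) = \sigma(t(\tau))$ for $\alpha \le \tau \le \beta$. Then, by (b) and the change of variable $t = t(\tau)$,

$$L(\sigma_1) = \int_\alpha^\beta L(\sigma_1'(\tau))\,d\tau = \int_\alpha^\beta L\big(\sigma'(t(\tau))\,t'(\tau)\big)\,d\tau = \int_\alpha^\beta L\big(\sigma'(t(\tau))\big)\,t'(\tau)\,d\tau = \int_a^b L(\sigma'(t))\,dt = L(\sigma).$$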

On the other hand, we can start with an arbitrary time-dependent Lagrangian $L(x_1, \ldots, x_n, \dot{x}_1, \ldots, \dot{x}_n, t)$. By introducing another pair of dependent variables $x_{n+1} = t$, $\dot{x}_{n+1} = \dot{t}$, and a Lagrangian $L_1$ defined by the formula

$$L_1(x_1, \ldots, x_{n+1}, \dot{x}_1, \ldots, \dot{x}_{n+1}) = L\Big(x_1, \ldots, x_n, \frac{\dot{x}_1}{\dot{x}_{n+1}}, \ldots, \frac{\dot{x}_n}{\dot{x}_{n+1}}, x_{n+1}\Big)\,\dot{x}_{n+1},$$

we convert the time-dependent, parametrization-dependent variational problem in $n$ variables to a time-independent, parametrization-independent problem in $(n + 1)$ variables. To verify that this formula actually does this, notice that $L_1$ so defined is homogeneous, and that $L_1(\sigma_1) = L(\sigma)$, where $\sigma = (x_i(t))$, and $\sigma_1 = (x_i(t), t)$ is the “graph” of $\sigma$ in $(n + 1)$-space.

We shall work from now on with such a time-independent, homogeneous Lagrangian $L(x_i, \dot{x}_i)$. Then $L$ satisfies the Euler homogeneous function relations, which will play an important role. To derive them, start with

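$$L(x, \lambda\dot{x}) = \lambda L(x, \dot{x}). \tag{15.10a}$$

Differentiate with respect to $\lambda$:

$$L_{n+i}(x, \lambda\dot{x})\,\dot{x}_i = L(x, \dot{x}). \tag{15.10b}$$

Differentiate again with respect to $\lambda$ and set $\lambda = 1$:

$$L_{n+i,\,n+j}(x, \dot{x})\,\dot{x}_i\dot{x}_j = 0. \tag{15.10c}$$

Differentiate (15.10a) with respect to $\dot{x}_i$: $L_{n+i}(x, \lambda\dot{x})\,\lambda = \lambda L_{n+i}(x, \dot{x})$, or

$$L_{n+i}(x, \lambda\dot{x}) = L_{n+i}(x, \dot{x}). \tag{15.10d}$$

Applying $\partial/\partial\lambda$ to (15.10d) and setting $\lambda = 1$, we have

$$L_{n+i,\,n+j}(x, \dot{x})\,\dot{x}_j = 0. \tag{15.10e}$$

Then

$$\theta(L) = L_{n+i}\,dx_i - (L_{n+i}\dot{x}_i - L)\,dt = L_{n+i}\,dx_i - 0,$$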

by (15.10b). Thus $\theta(L)$ is a form on $T(D)$ alone, and we can in effect ignore the additional explicit “time” variable and consider extremals as characteristic curves of the 2-form $d\theta(L)$ on $T(D)$. In addition, note that by (15.10e), $\det(L_{n+i,\,n+j}) = 0$; hence the functions $y_i = L_{n+i}$ are not functionally independent. However, if $(n - 1)$ of them are functionally independent, that is, if

$$\operatorname{rank}(L_{n+i,\,n+j}(v)) = n - 1 \qquad \text{for all } v \in T(D), \tag{15.11a}$$

then we say that $L$ defines a regular homogeneous variational problem.
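The Euler relations and the regularity condition can be checked by hand on the simplest homogeneous Lagrangian. The sketch below uses the Euclidean arc-length integrand on $\dot{x} \neq 0$; this particular $L$ is an assumption chosen for illustration.

```latex
% Assumed example: Euclidean arc length, defined for \dot x \neq 0.
\begin{align*}
L(x,\dot x) &= (\dot x_k \dot x_k)^{1/2}, \qquad
L_{n+i} = \frac{\dot x_i}{L}, \qquad
L_{n+i}\,\dot x_i = \frac{\dot x_i \dot x_i}{L} = L \quad \text{(15.10b with } \lambda = 1),\\
L_{n+i,\,n+j} &= \frac{1}{L}\Bigl(\delta_{ij} - \frac{\dot x_i \dot x_j}{L^2}\Bigr), \qquad
L_{n+i,\,n+j}\,\dot x_j = \frac{1}{L}(\dot x_i - \dot x_i) = 0 \quad \text{(15.10e)},\\
L_{n+i,\,n+j}\,b_j = 0 \;&\Longrightarrow\; b_i = \frac{\dot x_j b_j}{L^2}\,\dot x_i,
\end{align*}
```

so the null space of $(L_{n+i,\,n+j})$ is spanned by $\dot{x}$ alone, the rank is $n - 1$, and this $L$ defines a regular homogeneous problem.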

Another way of putting this condition is as follows: Whenever $(b_j)$ are numbers such that

$$L_{n+i,\,n+j}(x, \dot{x})\,b_j = 0, \tag{15.11b}$$

then we must have $b_j = a\dot{x}_j$ for some real number $a$. For (15.11b) just expresses the fact that the nullity of the matrix $(L_{n+i,\,n+j})$ is 1, equivalent to the fact that the rank is $n - 1$.

Suppose from now on that $L$ satisfies this regularity condition. The general theory of homogeneous variational problems expounded in Chapter 14 is applicable, with no constraints; that is, $K = T(D)$. Modifying slightly the general theory (since we may want to also consider extremals that do not minimize $L$), we say that an extremal field for the variational problem is defined by a vector field $X: D \to T(D)$ such that

$$X^*(d\theta(L)) = 0, \qquad X^*(L) > 0 \quad \text{at all points of } D. \tag{15.12}$$

(We are using the characterization of vector fields as mappings $D \to T(D)$ such that $X(x) \in D_x$ for all $x \in D$, that is, as cross-section maps.) Equation (15.10b) implies that

If $X \in V(D)$ defines an extremal field, so does $fX$, for each positive function $f$ on $D$. (15.13)

Suppose that $X = A_i(\partial/\partial x_i)$, and that $W$ is a function on $D$ such that $X^*(\theta(L)) = dW$; that is,

$$dW = L_{n+i}(x, A(x))\,dx_i. \tag{15.14}$$

Then

$$X(W)(x) = L_{n+i}(x, A(x))\,A_i(x) = L(x, A(x)) = X^*(L)(x) \qquad \text{for } x \in D.$$

Hence we can normalize $X$ by multiplying by a positive function so that

$$X(W) = 1 = L(X(x)) \qquad \text{for all } x \in D.$$

We shall call such a function (defined up to an additive constant) the characteristic function associated with the extremal field.

THEOREM 15.2 Suppose that $X$ is a vector field on $D$ that defines an extremal vector field for the variational problem defined by a homogeneous regular Lagrangian $L$ on $D$; that is, $X$ satisfies (15.12). Suppose that $W$ is a function on $D$ such that $X^*(\theta(L)) = dW$. Then:


(a) The integral curves of $X$ are extremals of $L$.
(b) Let $\sigma$ and $\sigma_1$ be curves beginning and ending on the same level surface of $W$ whose tangent vector curves are sufficiently close together. Suppose further that $\sigma$ is an integral curve of $X$ and that the following condition is satisfied:

The symmetric matrix $(L_{n+i,\,n+j}(\sigma'(t)))$ is positive semidefinite. (15.15)

Then $L(\sigma) \le L(\sigma_1)$. Equality holds only if $\sigma_1$ is also an integral curve of $X$.

Proof. Part (a) is a consequence of Theorem 15.1. To prove Part (b) we will show that, for each $x \in D$, $L(v) > L(X(x))$ for each $v \in D_x$ that is sufficiently close to $X(x)$, and that satisfies $v(W) = 1$, $v \ne X(x)$. If, say, $v = v_i(\partial/\partial x_i)$, then

$$\frac{d}{dt}L\big(x, A(x) + t(v - A(x))\big)\Big|_{t=0} = L_{n+i}(x, A(x))\,(v_i - A_i(x)),$$

$$\frac{d^2}{dt^2}L\big(x, A(x) + t(v - A(x))\big)\Big|_{t=0} = L_{n+i,\,n+j}(x, A(x))\,(v_i - A_i(x))(v_j - A_j(x)).$$

We know that $(d/dt)L(x, A(x) + t(v - A(x)))|_{t=0} = 0$, from the very definition of extremal vector field. Now, $(d^2/dt^2)L(x, A(x) + t(v - A(x)))|_{t=0} \ge 0$. We want to prove that it is $> 0$. Suppose otherwise; that is, it $= 0$. Then, by (15.11b), $v_i - A_i(x) = \rho A_i(x)$. The condition

$$1 = \frac{\partial W}{\partial x_i}(x)\,v_i = \frac{\partial W}{\partial x_i}(x)\,A_i(x)$$

forces $\rho = 0$; contradiction.

Q.E.D.
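The simplest example of an extremal field and its characteristic function is the field of parallel straight lines in Euclidean space. The sketch below assumes the Euclidean arc-length Lagrangian $L(x, \dot{x}) = (\dot{x}_k\dot{x}_k)^{1/2}$ and the constant field $X = \partial/\partial x_1$; both are illustrative choices, not taken from the text.

```latex
% Assumed example: L = |\dot x| on R^n, X = \partial/\partial x_1, so A = (1, 0, ..., 0).
\begin{align*}
X^*(\theta(L)) &= L_{n+i}(x, A)\,dx_i = dx_1, \qquad W = x_1, \qquad
X^*(d\theta(L)) = d(dx_1) = 0, \qquad X(W) = 1 = L(X(x)),\\
v(W) = v_1 = 1 \;&\Longrightarrow\;
L(v) = \bigl(1 + v_2^2 + \cdots + v_n^2\bigr)^{1/2} \ge 1 = L(X(x)),
\quad \text{with equality only for } v = X(x),\\
L(\sigma_1) &= \int_{c_0}^{c_1} \bigl(1 + \dot x_2^2 + \cdots + \dot x_n^2\bigr)^{1/2}\,dt
\;\ge\; c_1 - c_0 = L(\sigma),
\end{align*}
```

where $\sigma_1$ runs from the level surface $W = c_0$ to $W = c_1$ and is parametrized so that $x_1(t) = t$, while $\sigma$ is a segment of an integral curve of $X$ between the same two level surfaces. This is the familiar statement that the perpendicular segment is the shortest path between two parallel hyperplanes.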

We see that extremal vector fields are very useful if they can be found. Now we indicate how the general method given in Chapter 13 for finding integral manifolds can be adapted to finding integral manifolds of the special 2-form $d\theta(L)$. Let $\sigma(t)$, $a \le t \le b$, be a curve in $D$, and let us look for necessary conditions that $\sigma$ is the integral curve of an extremal vector field $X$. The first condition that comes to mind is that $\sigma$ is to be an extremal itself, that is, a solution of the Euler equations. Let $W$ be the function on $D$ such that $X^*(\theta(L)) = dW$. At most adding a constant to $W$, we can suppose that $W(\sigma(a)) = 0$. Suppose that $\phi: D' \to D$ is a submanifold of $D$ such that $W$ is equal to zero on $\phi(D')$. Then, for $x \in D$,

$$dW = L_{n+j}(X(x))\,dx_j.$$

Hence, for $x \in \phi(D')$ and for a tangent vector $v \in D_x$ that is tangent to $\phi(D')$ at $x$, we have

$$v(W) = L_{n+j}(X(x))\,v(x_j) = 0.$$

Forgetting for the moment how we arrived at this relation, let us make a general definition.

Definition. Let $L: T(D) \to R$ define a homogeneous, regular Lagrangian on $D$. Suppose that $u$ and $v$ are tangent vectors to a point $x \in D$; $u$ is said to be perpendicular to $v$ (with respect to the Lagrangian $L$) if

$$L_{n+i}(u)\,\dot{x}_i(v) = 0. \tag{15.16}$$

If $V$ is a subspace of $D_x$, we say that $u$ is perpendicular to $V$ (with respect to $L$) if $u$ is perpendicular to all vectors $v \in V$. If $\phi: D' \to D$ is a submanifold of $D$ such that $x \in \phi(D')$, we say that $u$ is perpendicular to the submanifold if $u$ is perpendicular to the tangent space of the submanifold at $x$. If $\chi$ is a vector field on the submanifold $\phi$, that is, $\chi$ is a map assigning a vector $\chi(x) \in D_x$ to each $x \in \phi(D')$, we say that $\chi$ is a perpendicular vector field to $\phi$ if $\chi(x)$ is perpendicular to the tangent space to $\phi(D')$ at each point $x \in \phi(D')$.

This relation is a generalization of the ordinary relation of perpendicularity for vectors in Euclidean spaces. However, notice that this relation is not necessarily symmetric, as it is for Euclidean geometry; that is, if $u$ is perpendicular to $v$, $v$ is not necessarily perpendicular to $u$. It is also easily seen that this definition is independent of the coordinate system.

Returning now to the extremal vector field $X$, its associated characteristic function $W$, and its integral curve $\sigma(t)$ with $W(\sigma(0)) = 0$, we see that $\sigma'(0) = X(\sigma(0))$ is perpendicular to any submanifold $\phi: D' \to D$ on which $W$ is zero, and the vector field on the submanifold obtained by restricting $X$ is also perpendicular to the submanifold. Note also that the map $D' \to T(D)$, which assigns $X(\phi(x'))$ to each $x' \in D'$, is an integral submanifold of the 1-form $\theta(L)$. Now we are prepared to apply the general theory of Chapter 13 concerning integral submanifolds of 1- and 2-forms to reverse this reasoning and give a condition that an isolated extremal curve of $L$ can be embedded as an integral curve of a vector field.
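The failure of symmetry in the perpendicularity relation (15.16) is easy to exhibit. The sketch below assumes a Lagrangian on a domain of $R^2$ obtained by adding a linear term to the Euclidean length; the constant $\beta$ with $0 < |\beta| < 1$ and the specific vectors are illustrative assumptions.

```latex
% Assumed example: L = |\dot x| + \beta \dot x_1 on R^2, 0 < |\beta| < 1 (positive, homogeneous, regular).
\begin{align*}
L(x,\dot x) &= (\dot x_1^2 + \dot x_2^2)^{1/2} + \beta\,\dot x_1, \qquad
L_{n+1}(u) = \frac{u_1}{|u|} + \beta, \qquad L_{n+2}(u) = \frac{u_2}{|u|},\\
u \text{ perpendicular to } v \;&\Longleftrightarrow\;
L_{n+i}(u)\,\dot x_i(v) = \frac{u_1 v_1 + u_2 v_2}{|u|} + \beta\,v_1 = 0,\\
u = (0,1),\ v = (1,-\beta):\quad
L_{n+i}(u)\,\dot x_i(v) &= -\beta + \beta = 0, \qquad
L_{n+i}(v)\,\dot x_i(u) = \frac{-\beta}{(1+\beta^2)^{1/2}} \neq 0.
\end{align*}
```

Thus $u$ is perpendicular to $v$ but $v$ is not perpendicular to $u$. For $\beta = 0$ the relation reduces to ordinary Euclidean orthogonality and becomes symmetric.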

THEOREM 15.3 Let $D$ be a domain of $R^n$, with a homogeneous, regular Lagrangian $L$. Let $\phi: D' \to D$ be a submanifold of $D$ of dimension $n - 1$, and let $\sigma: [0, 1] \to D$ be an extremal curve such that $\sigma(0)$ lies on $\phi(D')$, such that its tangent vector $\sigma'(0)$ is perpendicular there to $\phi(D')$, and such that $L(\sigma'(0)) = 1$. Then, if $D$ is sufficiently small, there is a unique extremal vector field $X$ such that:

(a) $\sigma$ is an integral curve of $X$.
(b) If $W$ is the function on $D$ such that $X^*(\theta(L)) = dW$, then $W = 0$ on $\phi(D')$.

Thus, an isolated extremal curve of $L$ can be embedded (locally) in an extremal field in many ways: every choice of a hypersurface to which it is initially perpendicular defines such a field. We shall arrange our proof of Theorem 15.3 in a series of lemmas so that the reader can see at least the beginning steps of attempting to embed $\sigma$ in an extremal field if its initial perpendicular submanifold is of lower dimension than $n - 1$. We shall not complete this project here, partly because we do not know the answer (except for Riemannian manifolds, where it can be done).

LEMMA 15.4 Let $D$ be a domain with a homogeneous regular Lagrangian $L$. Let $\phi: D' \to D$ be a submanifold of dimension $m$ of $D$. Let $P(\phi)$ be the set of tangent vectors $v \in T(D)$ to points $x \in \phi(D')$ such that $L(v) = 1$ and $v$ is perpendicular to $\phi(D')$. Then $P(\phi)$ is a submanifold of $T(D)$ of dimension $(n - 1)$, and is an integral submanifold of the 1-form $\theta(L)$. A vector $v \in P(\phi)$ is not tangent to $\phi(D')$.

Proof. It suffices to prove the lemma in the case that $D$ is as small as we please. Let $X_a = A_i^a(\partial/\partial x_i)$, $1 \le a, b, \ldots \le m$, be everywhere linearly independent vector fields in $D$ such that the $X_a$ are tangent to $\phi(D')$ and that their values at each point of $\phi(D')$ define a basis for its tangent space. A vector $v = (\dot{x}_i)$ is perpendicular to $\phi(D')$ at $x$ if

$$L_{n+i}(x, \dot{x})\,A_i^a(x) = 0.$$

We add the equation $L(x, \dot{x}) = 1$, and must show that these equations are of maximal rank. Thus we must show that the rank of the $n \times (m + 1)$ matrix

$$\big(L_{n+i,\,n+j}(x, \dot{x})\,A_i^a(x),\ \ L_{n+j}(x, \dot{x})\big)$$

is equal to $m + 1$. Suppose, then, that there is a relation of the form

$$\lambda L_{n+j}(x, \dot{x}^0) + \lambda_a L_{n+i,\,n+j}(x, \dot{x}^0)\,A_i^a(x) = 0.$$

Multiplying by $\dot{x}_j$ and using the Euler relations (15.10b) and (15.10c), we have $\lambda = 0$. Since now $L_{n+i,\,n+j}(x, \dot{x})(\lambda_a A_i^a(x)) = 0$, and since $L$ is regular, we must have relations of the form $\lambda_a A_i^a(x) = \rho\dot{x}_i$, whence

$$0 = L_{n+i}(x, \dot{x})\,\lambda_a A_i^a(x) = \rho L_{n+i}(x, \dot{x})\,\dot{x}_i = \rho.$$

Finally, then, $\lambda_a X_a(x) = 0$, whence $\lambda_a = 0$.

Now suppose $v \in P(\phi) \cap D_x$ is tangent to $\phi(D')$. We must then have relations of the form

$$v = \gamma_a X_a(x), \qquad L_{n+i}(x, \dot{x}(v))\,A_i^a(x) = 0;$$

hence,

$$0 = L_{n+i}(x, \dot{x}(v))\,A_i^a(x)\,\gamma_a = L_{n+i}(x, \dot{x}(v))\,\dot{x}_i(v) = L(x, \dot{x}(v)) = 1,$$

contradiction.
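Lemma 15.4 can be checked directly in the Euclidean case at the two extreme values of $m$. The sketch below assumes $L(x, \dot{x}) = (\dot{x}_k\dot{x}_k)^{1/2}$ and two particular submanifolds, a coordinate hyperplane and a single point; these choices are illustrative only.

```latex
% Assumed example: L = |\dot x| on R^n.
\begin{align*}
&\text{Case } m = n-1,\ \phi(D') = \{x_n = 0\},\ X_a = \partial/\partial x_a\ (1 \le a \le n-1):\\
&\qquad L_{n+i}(x,\dot x)\,A_i^a = \frac{\dot x_a}{|\dot x|} = 0,\quad |\dot x| = 1
  \;\Longrightarrow\; \dot x = (0,\ldots,0,\pm 1),\\
&\qquad P(\phi) = \{(x,\dot x) : x_n = 0,\ \dot x = \pm e_n\},\quad \dim P(\phi) = n-1,\quad
  \theta(L)\big|_{P(\phi)} = \pm\,dx_n\big|_{x_n = 0} = 0.\\
&\text{Case } m = 0,\ \phi(D') = \{x_0\}:\\
&\qquad P(\phi) = \{v \in D_{x_0} : |v| = 1\} \ \text{(a sphere)},\quad \dim P(\phi) = n-1,\quad
  \theta(L)\big|_{P(\phi)} = L_{n+i}\,dx_i\big|_{x = x_0} = 0.
\end{align*}
```

In both cases $P(\phi)$ has dimension $n - 1$, $\theta(L)$ vanishes on it, and no vector of $P(\phi)$ is tangent to $\phi(D')$, as the lemma asserts.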

LEMMA 15.5 Let $L$ be a regular homogeneous Lagrangian on a domain $D$ and let $v \in D_{x_0}$, for $x_0 \in D$, satisfy $L(v) = 1$. Then, if $a$ is sufficiently small, there is a unique extremal curve $\sigma(t)$, $0 \le t \le a$, satisfying $\sigma(0) = x_0$, $\sigma'(0) = v$, $L(\sigma'$ … $- y_i(\gamma(t)) = 0$,

$dL_{n+i}(\gamma'(t)) = \frac{d}{dt}L_{n+i}(\sigma'(t), t)$, etc. Hence the conditions that $\gamma(t) = (\sigma'(t), t)$ be a characteristic curve of $d\theta(L)$ are

$$\frac{d}{dt}L_{n+i}(\sigma'(t), t)\,(\omega_i - y_i\,dt) - L_i(\sigma'(t), t)\,\omega_i + L_i\,y_i\,dt + L_{n+i}\,C_{jki}\,y_j\,\omega_k = 0.$$

Now $\omega_i$ and $dt$ are independent differential forms; hence these equations imply that the coefficient of $\omega_i$ is zero, that is:

$$\frac{d}{dt}L_{n+i}(\sigma'(t), t) - L_i(\sigma'(t), t) + L_{n+k}\,C_{jik}(\sigma'(t), t)\,\omega_j(\sigma'(t)) = 0.$$

These differential equations for $\sigma$ are the Euler(-Lagrange) equations with respect to the basis $(\omega_1, \ldots, \omega_n)$. Notice, in case the $\omega_1, \ldots, \omega_n$ are the differentials of a set of coordinate functions $x_1, \ldots, x_n$ for $M$, that they reduce (since $C_{jik} = 0$) to the classical Euler equations

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} - \frac{\partial L}{\partial x_i} = 0$$

that were derived in Chapter 15. The more general equations are very useful in certain mechanical problems (for example, in rigid body dynamics) where a basis for differential forms can be found more readily than a “natural” coordinate system.

16. Symmetries of Variational Problems

If the variational problem is nonhomogeneous, that is, if extremals cannot be freely reparametrized, regularity of the variational problem is determined by the condition

$$(d\theta(L))^n = \underbrace{d\theta(L) \wedge \cdots \wedge d\theta(L)}_{n\ \text{factors}} \neq 0.$$

Alternately, this can be expressed by the condition $\det(L_{n+i,\,n+j}) \neq 0$, where

$$dL_{n+i} \equiv L_{n+i,\,n+j}\,dy_j \pmod{\omega_i,\ dt}.$$

If the variational problem is time-independent and homogeneous, that is, if

$$L(\lambda v) = \lambda L(v) \qquad \text{for } \lambda > 0,$$

then $(L_{n+i}y_i - L) = 0$; hence $\theta(L)$ can be considered as a 1-form on $T(M)$. Regularity is now determined by the condition: dimension of the characteristic vectors of $\theta(L)$ = 2, or $\operatorname{rank}(L_{n+i,\,n+j}) = (n - 1)$.

An extremal vector field for the variational problem can now be defined as a cross-section mapping $\Phi: M \times R \to T(M) \times R$ assigning to each pair $(p, t) \in M \times R$ a tangent vector $\Phi(p, t) \in M_p$ such that

$$\Phi^*(d\theta(L)) = 0.$$

A function $S(p, t)$ on $M \times R$ such that

$$dS = \Phi^*(\theta(L))$$

is a solution of the Hamilton-Jacobi partial differential equation associated with the variational problem. The rays of an extremal field are the curves $\sigma(t)$ that are solutions of the system of ordinary differential equations:

$$\sigma'(t) = \Phi(\sigma(t), t).$$

They are extremals of the variational problem, as described above, but such $n$-parameter families of extremals play an important role in proving that extremals really do minimize, and are very important in physics (for example, in describing the wave-particle duality).

Let $\phi: N \to M \times R$ be a submanifold of $M \times R$. A “vector field” on $N$ is a mapping $v: N \to T(M)$ such that $v(p)$ is a tangent vector at $\phi(p)$ for $p \in N$. Such a vector field is perpendicular to $N$ if $v^*(\theta(L)) = 0$. If $\Phi$ is an extremal vector field, with $S$ the associated solution of the Hamilton-Jacobi equation, and $N$ is a submanifold defined by $S$ = constant, then one sees that $\Phi$ restricted to $N$ is such a perpendicular vector field.

Now suppose that $L$ and $L'$ are Lagrangians on manifolds $M$ and $M'$, that $\phi: M' \to M$ is a map, and $\phi_*: T(M') \to T(M)$ is the differential of $\phi$. We shall also use $\phi_*$ to denote the map: $T(M') \times R \to T(M) \times R$, which acts identically on $R$. Suppose that

$$(\phi_*)^*(L) = L'.$$

Then, also, as we have proved in Chapter 15 (Theorem 15.1),

$$(\phi_*)^*(\theta(L)) = \theta(L').$$

For example, if $\phi$ is a diffeomorphism between $M'$ and $M$, it is clear that this implies that $\phi$ maps an extremal of $L'$ into an extremal of $L$. If $M = M'$, $L = L'$, such a $\phi$ can be regarded as a symmetry of the variational problem. A Lie group $G$ acting on $M$ as diffeomorphisms of $M$ can be regarded as a group of symmetries if each individual transformation is a symmetry.

Now, the action of $G$ on $M$ can be prolonged to an action of $G$ on $T(M) \times R$ by sending $\phi \in G$ into $\phi_*$. (It is left to the reader to verify that this actually defines an action of $G$ on $T(M) \times R$.) This action of $G$ preserves $\theta(L)$; hence, also $d\theta(L)$. Now the Lie algebra of $G$ also acts as a Lie algebra of vector fields on $T(M) \times R$, since the action of a Lie group on a manifold gives rise to an action of its Lie algebra as vector fields. For example, if an element $X$ in the Lie algebra of $G$ is realized as a vector field

$$X = A_i\frac{\partial}{\partial x_i}$$

on $M$ (using a local coordinate system $x_1, \ldots, x_n$ for $M$), the prolonged vector field is

$$\tilde{X} = A_i\frac{\partial}{\partial x_i} + \frac{\partial A_i}{\partial x_j}\dot{x}_j\frac{\partial}{\partial \dot{x}_i}$$

on $T(M) \times R$, as in Chapter 15.† Thus we shall automatically have

$$\tilde{X}(\theta(L)) = 0, \qquad \tilde{X}(d\theta(L)) = 0.$$

These basic geometric facts suggest that we split up the problem of discussing groups of symmetries into its “Hamiltonian” and “Lagrangian” components.

Symmetry Groups from the Point of View of Hamilton-Jacobi Theory

We shall now change our point of view. We regard Hamilton-Jacobi theory as the study of the characteristic curves of a closed 2-form, independently of whether the 2-form arises from a variational problem.

† With one slight modification, that is, we leave off $+\,(\partial/\partial t)$ in the definition of $\tilde{X}$, thus regarding $\tilde{X}$ as acting trivially on the time coordinate.


Let $P$ be a manifold. (By choosing $P$, we mean to emphasize that the typical case is that where $P$ is the “phase space” of a mechanical problem.) Let $\omega$ be a closed 2-form on $P$. Recall that a vector $v \in T(P)$ or vector field $X \in V(P)$ is characteristic for $\omega$ if

$$v \,\lrcorner\, \omega = 0, \qquad X \,\lrcorner\, \omega = 0.$$

We suppose that the dimension of the characteristic vectors (or, equivalently, the rank of $\omega$) is constant on $P$. A vector field $Y \in V(P)$ leaves $\omega$ invariant if $Y(\omega) = 0$.

THEOREM 16.1 Let $\omega$ be a closed 2-differential form of constant rank on $P$. Let

$$C = \{X \in V(P) : X \,\lrcorner\, \omega = 0\}, \qquad I = \{Y \in V(P) : Y(\omega) = 0\}.$$

Then

$$[C, I] \subset C, \qquad [I, I] \subset I.$$

In other words, $C$ is an ideal in the Lie algebra $I$. The one-parameter transformation group generated by a $Y \in I$ permutes the characteristic curves of $\omega$.

Proof. Most of this follows trivially from the rules of operation for vector fields and differential forms. For example, we prove $[C, I] \subset C$. For $X \in C$, $Y \in I$,

$$[X, Y] \,\lrcorner\, \omega = Y(X \,\lrcorner\, \omega) - X \,\lrcorner\, Y(\omega) = 0;$$

hence $[X, Y] \in C$. To prove $C \subset I$, suppose $X \in C$. Then $X(\omega) = X \,\lrcorner\, d\omega + d(X \,\lrcorner\, \omega) = 0$. We leave the last remark as an exercise.

Now, in accordance with general group-theoretical principles, one should regard $I$ as the Lie algebra of the transformation group of all diffeomorphisms $\phi$ of $P$ that preserve $\omega$, that is, that satisfy $\phi^*(\omega) = \omega$. The reader should be warned, however, that this group cannot be regarded as a Lie group. (Roughly, it cannot be described by a finite number of parameters.) Hence this relation between the group and Lie algebra will remain in the background as intuitive motivation.

Let $Y \in I$. Using another of the rules of operation concerning vector fields and forms, we have

$$0 = Y(\omega) = Y \,\lrcorner\, d\omega + d(Y \,\lrcorner\, \omega),$$

hence $d(Y \,\lrcorner\, \omega) = 0$. Thus the mapping $Y \to Y \,\lrcorner\, \omega$ sends $I$ into the set of closed 1-forms on $P$. The kernel is $C$. Since $C$ is an ideal, the image in the set of closed 1-forms inherits the Lie algebra structure associated with $I/C$.
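A small example may make the roles of $C$ and $I$ concrete. The sketch below assumes $P = R^3$ with coordinates $(q, p, t)$ and a function $H(q, p)$ with no explicit dependence on $t$; the setting is an illustration and not taken from the text.

```latex
% Assumed example: P = R^3 with coordinates (q, p, t); H = H(q, p) independent of t.
\begin{align*}
\omega &= dp \wedge dq - dH \wedge dt, \qquad d\omega = 0,\\
X &= \frac{\partial}{\partial t} + \frac{\partial H}{\partial p}\frac{\partial}{\partial q}
   - \frac{\partial H}{\partial q}\frac{\partial}{\partial p}, \qquad
X \,\lrcorner\, \omega = 0 \quad (\text{so } X \in C),\\
Y &= \frac{\partial}{\partial t}, \qquad
Y \,\lrcorner\, \omega = dH, \qquad
Y(\omega) = d(Y \,\lrcorner\, \omega) + Y \,\lrcorner\, d\omega = d(dH) = 0 \quad (\text{so } Y \in I),\\
[X, Y] &= 0 \in C, \qquad
X \,\lrcorner\, (Y \,\lrcorner\, \omega) = X(H)
  = \frac{\partial H}{\partial p}\frac{\partial H}{\partial q}
  - \frac{\partial H}{\partial q}\frac{\partial H}{\partial p} = 0.
\end{align*}
```

Here $\omega$ has constant rank 2, $C$ consists of the function multiples of $X$ (whose integral curves are the solutions of Hamilton's equations), and time translation $Y$ is a symmetry whose image under $Y \to Y \,\lrcorner\, \omega$ is the closed 1-form $dH$.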

What precisely is this image in the set of closed 1-forms? Notice that $\theta = Y \,\lrcorner\, \omega$ satisfies

$$X \,\lrcorner\, \theta = 0 \qquad \text{for all } X \in C. \tag{16.1}$$

Note another useful fact:

If $\psi$ is a 1-form satisfying $d\psi = \omega$, and if $Y(\psi) = 0$, then

$$d(Y \,\lrcorner\, \psi) = -Y \,\lrcorner\, \omega. \tag{16.2}$$
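Relation (16.2) is the mechanism by which a symmetry produces a conserved function. The following sketch checks it for translation in one coordinate; the form $\psi = p_i\,dq_i$ on $R^{2n}$ and the field $Y = \partial/\partial q_1$ are assumptions chosen for illustration.

```latex
% Assumed example: P = R^{2n}, \psi = p_i dq_i, Y = \partial/\partial q_1.
\begin{align*}
\psi &= p_i\,dq_i, \qquad \omega = d\psi = dp_i \wedge dq_i,\\
Y &= \frac{\partial}{\partial q_1}, \qquad
Y \,\lrcorner\, \psi = p_1, \qquad
Y \,\lrcorner\, \omega = dp_i(Y)\,dq_i - dq_i(Y)\,dp_i = -dp_1,\\
Y(\psi) &= Y \,\lrcorner\, d\psi + d(Y \,\lrcorner\, \psi) = -dp_1 + dp_1 = 0,\\
d(Y \,\lrcorner\, \psi) &= dp_1 = -\,Y \,\lrcorner\, \omega.
\end{align*}
```

So the function $Y \,\lrcorner\, \psi = p_1$, the momentum conjugate to $q_1$, is a potential for the closed 1-form $-Y \,\lrcorner\, \omega$.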

Before proceeding further in these abstract directions, it may be helpful for the reader to see an example of what this means in more classical language.

EXAMPLE (THE POISSON BRACKET)

Suppose $\dim P = 2n$, with coordinates $(q_1, \ldots, q_n, p_1, \ldots, p_n)$, and

$$\omega = dp_i \wedge dq_i \qquad (1 \le i, j, \ldots \le n;\ \text{summation convention in force}).$$

We choose this way of labeling coordinates on $P$ so as to suggest the usual terminology in classical mechanics. $P$ is “phase space,” the $q$ are coordinates of “configuration space,” and the $p$ are coordinates of “momentum space.” Let $Y \in I$. Then

$$Y \,\lrcorner\, \omega = Y(p_i)\,dq_i - Y(q_i)\,dp_i.$$

Since $d(Y \,\lrcorner\, \omega) = 0$, and $P$ is a Euclidean space, by the Poincaré lemma there is a function $f \in F(P)$ such that

$$\frac{\partial f}{\partial q_i} = Y(p_i), \qquad \frac{\partial f}{\partial p_i} = -Y(q_i), \tag{16.3a}$$

or

$$Y = -\frac{\partial f}{\partial p_i}\frac{\partial}{\partial q_i} + \frac{\partial f}{\partial q_i}\frac{\partial}{\partial p_i}. \tag{16.3b}$$

Suppose that we turn this around and, given $f \in F(P)$, define $Y_f$ as a vector field on $P$ by the formula (16.3). Then

$$Y_f(\omega) = d(Y_f \,\lrcorner\, \omega) + Y_f \,\lrcorner\, d\omega = d(df) + 0 = 0.$$

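Thus, for $g \in F(P)$,

$$Y_f(g) = \frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i}. \tag{16.4}$$

The function on the right-hand side of (16.4) is classically called the Poisson bracket of the functions $f$ and $g$, and denoted by $\{f, g\}$. Now we have defined $Y_f$ so that

$$df = Y_f \,\lrcorner\, \omega. \tag{16.3c}$$

Then

$$d\{f, g\} = d\,Y_f(g) = Y_f(dg) = Y_f(Y_g \,\lrcorner\, \omega) = [Y_f, Y_g] \,\lrcorner\, \omega.$$

Thus,

$$Y_{\{f, g\}} = [Y_f, Y_g]. \tag{16.5}$$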
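Relation (16.5) can be verified by hand on a pair of simple functions. The sketch below takes $n = 1$ and assumes the sign convention $Y_f = -(\partial f/\partial p)\,\partial/\partial q + (\partial f/\partial q)\,\partial/\partial p$ of (16.3b), with $\{f, g\} = Y_f(g)$; the functions $f$ and $g$ are illustrative choices.

```latex
% Assumed example: n = 1, f = q^2/2, g = p^2/2, with the convention of (16.3b).
\begin{align*}
Y_f &= q\,\frac{\partial}{\partial p}, \qquad
Y_g = -p\,\frac{\partial}{\partial q}, \qquad
\{f, g\} = Y_f(g) = q\,p,\\
Y_{\{f,g\}} &= -\frac{\partial (qp)}{\partial p}\frac{\partial}{\partial q}
  + \frac{\partial (qp)}{\partial q}\frac{\partial}{\partial p}
  = -q\,\frac{\partial}{\partial q} + p\,\frac{\partial}{\partial p},\\
[Y_f, Y_g] &= Y_f(-p)\,\frac{\partial}{\partial q} - Y_g(q)\,\frac{\partial}{\partial p}
  = -q\,\frac{\partial}{\partial q} + p\,\frac{\partial}{\partial p} = Y_{\{f,g\}}.
\end{align*}
```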

Equation (16.5) suggests that we regard $F(P)$ as some sort of algebra under the Poisson bracket operation $(f, g) \to \{f, g\}$, and (16.5) then says that the mapping $f \to Y_f$ is an algebra homomorphism of $F(P)$ onto the Lie algebra $I$. In fact, we shall show that $F(P)$ under $\{\ ,\ \}$ is itself a Lie algebra, so that $f \to Y_f$ is a Lie algebra homomorphism. Skew symmetry, $\{f, g\} = -\{g, f\}$, of the Poisson bracket is obvious from (16.4). It remains only to prove the Jacobi identity:

$$\begin{aligned}
\{f, \{g, h\}\} &= Y_f(\{g, h\}) = Y_f(Y_g(h)) = [Y_f, Y_g](h) + Y_g(Y_f(h)) \\
&= Y_{\{f, g\}}(h) + \{g, \{f, h\}\} = \{\{f, g\}, h\} + \{g, \{f, h\}\},
\end{aligned}$$

which is precisely the Jacobi identity.

There is an alternative definition of Poisson bracket in terms of exterior multiplication. Note that $\omega$ is a 2-form of maximal rank $2n$. Hence $\omega^n \ne 0$. ($\omega^n$ = exterior product of $n$ copies of $\omega$.) For $f, g \in F(P)$, $\omega^{n-1} \wedge df \wedge dg$ is a $2n$-form that then must be a multiple of $\omega^n$, say,

$$\omega^{n-1} \wedge df \wedge dg = h\,\omega^n.$$

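Let us find $h$ by applying $Y_f \,\lrcorner$ to both sides:

$$Y_f \,\lrcorner\, \omega^{n-1} = (n - 1)\,\omega^{n-2} \wedge df, \qquad Y_f \,\lrcorner\, \omega^n = n\,\omega^{n-1} \wedge df$$

(since $Y_f \,\lrcorner\, \omega = df$ and $df \wedge \omega^{n-1} = \omega^{n-1} \wedge df$), and

$$Y_f \,\lrcorner\, df = Y_f(f) = \{f, f\} = 0, \qquad Y_f \,\lrcorner\, dg = Y_f(g) = \{f, g\}.$$

Finally, then,

$$-\,\{f, g\}\,\omega^{n-1} \wedge df = h\,n\,\omega^{n-1} \wedge df.$$

This suggests that we try to prove that $h = -\{f, g\}/n$. Since they are functions, it suffices to prove that there is point-by-point equality. Now, if $df = 0$ at a point, clearly both sides must be zero; hence, equality. If $df \ne 0$, $df \wedge \omega^{n-1}$ is not zero (exercise in exterior algebra; left to the reader); hence, equality again. Finally, then,

$$\omega^{n-1} \wedge df \wedge dg = -\frac{\{f, g\}}{n}\,\omega^n. \tag{16.6}$$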

Notice now that we have used only the coordinate system $(p_i, q_i)$ to get the classical formula for Poisson bracket. Then we can sum up our results in coordinate-free form as follows:

THEOREM 16.2 Let $P$ be a manifold of dimension $2n$, and let $\omega$ be a differential form on $P$ of rank $2n$. Let $I$ be the Lie algebra of vector fields on $P$ that preserve $\omega$. There is a linear onto mapping: $F(P) \to I$, denoted by $f \to Y_f$, satisfying (in fact, defined by) (16.3c). Defining the Poisson bracket of $f, g \in F(P)$, denoted by $\{f, g\}$, by (16.4) makes $F(P)$ into a Lie algebra such that the mapping onto $I$ is a Lie algebra homomorphism with kernel the constant functions. Equation (16.6) provides an alternate definition for Poisson bracket in terms of exterior algebra.

The geometric properties of the Poisson bracket operation can be summed up as follows: Each $f \in F(P)$ gives rise (modulo the usual difficulties in extending integral curves of vector fields, which we shall ignore here) to a one-parameter group of diffeomorphisms on $P$ that preserves the form $\omega$. The condition $\{f, g\} = 0$ means that $g$ is constant on the orbit of the group, that is, that $g$ is an integral of the ordinary differential equations defining the orbit. If $(q_i, p_i)$ is a coordinate system for $P$ such that

$$\omega = dp_i \wedge dq_i$$

(that is, if $\omega$ is in canonical form with respect to the coordinate system), then, by (16.3b), the differential equations for the orbits of the group generated by $f$ are just

$$\frac{dq_i}{dt} = -\frac{\partial f}{\partial p_i}, \qquad \frac{dp_i}{dt} = \frac{\partial f}{\partial q_i}.$$

These, it will be recognized, are just the Hamilton equations with Hamiltonian $f$.

Let us rephrase some of these results in a more informal way, which is useful in physics. We are given a “phase space” $P$ and a 2-form $\omega$ of maximal rank on $P$. The functions on $P$ are the observables on phase space. With the help of $\omega$, the observables can be made into a Lie algebra. There is a mapping that assigns to each observable a one-parameter group of $\omega$-preserving transformations on $P$. (Classically, an $\omega$-preserving diffeomorphism is called a canonical transformation.) This is not quite 1-1, but the kernel is unimportant (for classical mechanics); that is, just the constant functions. In physics, choosing a particular mechanical system with phase space $P$ amounts to choosing a distinguished observable $H$, to be called the Hamiltonian (or energy). The corresponding one-parameter transformation group on $P$ is to be regarded as determining the evolution of the mechanical system with time. A one-parameter group of symmetries of the mechanical system with Hamiltonian $H$ is determined by an “observable” $f$ with $\{H, f\} = 0$.
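The standard illustration of an observable $f$ with $\{H, f\} = 0$ is angular momentum in a central potential. The sketch below takes $n = 2$, assumes the convention $\{f, g\} = Y_f(g)$ with $Y_f = -(\partial f/\partial p_i)\,\partial/\partial q_i + (\partial f/\partial q_i)\,\partial/\partial p_i$, and introduces a hypothetical potential $U$ depending only on $q_1^2 + q_2^2$.

```latex
% Assumed example: n = 2, H = |p|^2/2 + U(q_1^2 + q_2^2), f = q_1 p_2 - q_2 p_1.
\begin{align*}
Y_H &= -p_i\,\frac{\partial}{\partial q_i} + 2U'\,q_i\,\frac{\partial}{\partial p_i},
\qquad U' = U'(q_1^2 + q_2^2),\\
\{H, f\} = Y_H(f) &= -p_1 p_2 + p_2 p_1 + 2U'\,\bigl(-q_1 q_2 + q_2 q_1\bigr) = 0,\\
Y_f &= q_2\frac{\partial}{\partial q_1} - q_1\frac{\partial}{\partial q_2}
     + p_2\frac{\partial}{\partial p_1} - p_1\frac{\partial}{\partial p_2}.
\end{align*}
```

The field $Y_f$ generates simultaneous rotations of the $q$-plane and the $p$-plane, so the one-parameter symmetry group attached to $f$ is the rotation group, and $f$ is constant along the motion.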


The main point to keep in mind is this correspondence between “observables,” that is, functions on phase space, and certain one-parameter groups of diffeomorphisms of phase space, called “canonical transformations.” It is this structure that quantum mechanics has in common with classical mechanics, but in quantum theory the phase space $P$ must be regarded as being infinite dimensional.

As application of these ideas, we shall prove two simple theorems concerning the global properties of Hamiltonian systems. As above, let $P$ be an even dimensional manifold, carrying a closed 2-form $\omega$ of maximal rank. For $f \in F(P)$, let $Y_f$ be the vector field such that $df = Y_f \,\lrcorner\, \omega$. Then the Poisson bracket of $f$ and $g$ is

$$Y_f(g) = \{f, g\}.$$

Two functions $f$ and $g$ are said to be in involution† if $\{f, g\} = 0$. Thus, if $f$ and $H \in F(P)$, $f$ and $H$ are in involution if $f$ is an integral of $Y_H$, that is, if $f$ is constant along the solutions of the Hamilton equations with Hamiltonian $H$. The next theorem due to Arnold [1] gives a good qualitative picture of the global conditions that a given Hamiltonian system must satisfy in order that it admit a “large” number of integrals that are in involution.

THEOREM 16.3 Suppose $\dim P = 2n$, and that $n$ functions $f_1, \ldots, f_n$ are given on $P$ that are in involution with each other and with $H$. Let $Q$ be a connected component of

$$\{p \in P : f_1(p) = 0 = \cdots = f_n(p)\}.$$

Suppose that:
(a) $Q$ is compact.
(b) The forms $df_1, \ldots, df_n$ are linearly independent at each point of $Q$.
Then $Q$ is diffeomorphic to a torus, in such a way that $Y_H$ goes over into a vector field on the torus generated by a one-parameter subgroup.‡

Proof. Suppose for the moment that just (b) is satisfied. Then $Y_{f_1}, \ldots, Y_{f_n}$ are vector fields on $P$ that are tangent to $Q$ and are linearly independent at every point of $Q$. Further, (b) guarantees that $Q$ is a submanifold of $P$. Then $Y_{f_1}, \ldots, Y_{f_n}$ define a basis for vector fields on $Q$. Further, they commute, as

† This is the classical terminology, which probably should be changed because it is confusing to the modern reader. It comes from the classical theory of partial differential equations.
‡ That is, regarding the torus as the underlying space of a compact Abelian Lie group. Thus, either all integral curves of $Y_H$ on $Q$ are closed or they behave in the same way as do the one-parameter subgroups going off at an irrational angle.



is obvious from the condition that the $f_1, \ldots, f_n$ are all in involution. Then $Y_H$ can be written in $Q$ in the form

$$Y_H = g_1 Y_{f_1} + \cdots + g_n Y_{f_n},$$

with $g_1, \ldots, g_n \in F(Q)$. Now the condition that $Y_H$ commute with $Y_{f_1}, \ldots, Y_{f_n}$ forces $g_1 = \text{constant}, \ldots, g_n = \text{constant}$. Now, if $Q$ is compact, the $Y_{f_1}, \ldots, Y_{f_n}$ generate a global connected Abelian Lie group of diffeomorphisms of $Q$; hence $Q$ is the underlying manifold of a compact, connected Abelian Lie group, that is, a torus. Q.E.D.
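The involution hypothesis of Theorem 16.3 is easy to test numerically in a simple case. The following is a minimal sketch, not taken from the text: it assumes $P = R^4$ with canonical coordinates $(q_1, q_2, p_1, p_2)$, the bracket convention $\{f, g\} = \sum_i (\partial f/\partial q_i\,\partial g/\partial p_i - \partial f/\partial p_i\,\partial g/\partial q_i)$ (which may differ by a sign from the one used above), and two uncoupled oscillator energies `f1`, `f2` chosen purely for illustration. The names `partial`, `poisson_bracket`, and `coupled` are hypothetical helpers.

```python
import math

# Finite-difference Poisson bracket on R^{2n} with coordinates (q, p).
# Assumed convention (may differ by a sign from the text):
#   {f, g} = sum_i (df/dq_i * dg/dp_i - df/dp_i * dg/dq_i).


def partial(func, q, p, which, i, h=1e-5):
    """Central-difference partial derivative of func(q, p) in q_i or p_i."""
    q_plus, q_minus, p_plus, p_minus = list(q), list(q), list(p), list(p)
    if which == "q":
        q_plus[i] += h
        q_minus[i] -= h
    else:
        p_plus[i] += h
        p_minus[i] -= h
    return (func(q_plus, p_plus) - func(q_minus, p_minus)) / (2.0 * h)


def poisson_bracket(f, g, q, p):
    """Approximate {f, g} at the point (q, p)."""
    total = 0.0
    for i in range(len(q)):
        total += partial(f, q, p, "q", i) * partial(g, q, p, "p", i)
        total -= partial(f, q, p, "p", i) * partial(g, q, p, "q", i)
    return total


# Illustrative observables: energies of two uncoupled oscillators.
def f1(q, p):
    return 0.5 * (p[0] ** 2 + q[0] ** 2)


def f2(q, p):
    return 0.5 * (p[1] ** 2 + 4.0 * q[1] ** 2)


def hamiltonian(q, p):
    return f1(q, p) + f2(q, p)


# An observable that is NOT in involution with the Hamiltonian.
def coupled(q, p):
    return q[0] * p[1]


if __name__ == "__main__":
    q0, p0 = [0.3, -0.7], [1.1, 0.4]
    print("{f1, f2}      =", poisson_bracket(f1, f2, q0, p0))
    print("{H, f1}       =", poisson_bracket(hamiltonian, f1, q0, p0))
    print("{H, coupled}  =", poisson_bracket(hamiltonian, coupled, q0, p0))
```

Here `f1` and `f2` are in involution with each other and with `hamiltonian`, and a joint level set with both values positive is a product of two closed curves, that is, a 2-torus, as the theorem predicts (after subtracting constants from the $f_i$).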

Now let $P$ continue to be a manifold of even dimension, with a closed 2-form $\omega$ of maximal rank. Let $D$ be an open subset of $P$ whose boundary in $P$ consists of a number of submanifolds of $P$. Let $Y$ be a vector field defined in a neighborhood of $D$ such that
(a) $Y$ is tangent to the submanifolds constituting the boundary of $D$;
(b) $Y(\omega) = 0$.

Suppose, then, that there exists a function $f$ such that $df = Y \lrcorner\, \omega$. (Recall that $d(Y \lrcorner\, \omega) = 0$, so there are certain topological restrictions to the global existence of this $f$.) Then the critical points of $f$ that occur inside $D$ are also zero points of $Y$. This can be exploited (see, for example, Theorem 16.4) to show the existence of zero points for $Y$ if one knows for some a priori reason that the maxima and minima of $f$ do not occur on the boundary. This can be made explicit as follows: Let $p$ be a point on the boundary of $D$ in $P$. Let us extend the notion of tangent vector to $D$ by saying that a tangent vector $v \in P_p$ belongs to $D_p$ if there is a curve $t \to \sigma(t)$, with $\sigma(0) = p$, defined for sufficiently small $t$ and lying in the closure of $D$, such that $\sigma'(0) = v$. Then, if such a $p$ is also a critical point of $f$ restricted to the closure of $D$,

$$v(f) = 0 = \omega(Y(p), v) \quad \text{for all } v \in D_p.$$

This condition can, under suitable hypotheses on $Y$, often be used to conclude that $Y(p) = 0$. Now, we specialize.

THEOREM 16.4 Let $D$ be the region between two concentric circles in the $(x, y)$-plane. Let $t \to \phi_t$ be a one-parameter semigroup of area-preserving diffeomorphisms of $D$ such that each $\phi_t$ rotates the inner and outer boundary of $D$ in opposite senses, with no fixed points on the boundary. Then the semigroup has at least two fixed points inside $D$.



Proof. Let $Y$ be the vector field in $D$ which is the infinitesimal generator of $t \to \phi_t$. The area-preserving condition requires that $Y(\omega) = 0$, where $\omega = dx \wedge dy$. Hence also, $d(Y \lrcorner\, \omega) = 0$. We want to assert the existence of a function $f$ in $D$ such that

$$df = Y \lrcorner\, \omega.$$

Now $D$ is not simply connected, but the condition for the existence of $f$ is seen to be that the integral of $Y \lrcorner\, \omega$ around a boundary circle be zero, or, writing $Y = A\,\partial/\partial x + B\,\partial/\partial y$, so that $Y \lrcorner\, \omega = A\,dy - B\,dx$, that the line integral of the vector field

$$X = -B\frac{\partial}{\partial x} + A\frac{\partial}{\partial y}$$

around the boundary circle is zero. Now $X$ and $Y$ are perpendicular vector fields. Our assumptions on each $\phi_t$ guarantee that $Y$ has the behavior indicated in Fig. 3, where $D$ is the region between the two circles. The full arrows indicate $Y$, which is tangent to the two boundary circles. Hence $X$, represented by the dotted arrows, is perpendicular to the boundary circles, the line integral of $X$ around each is zero, and such an $f$ exists.

FIGURE 3

Now let $p$ be a critical point of $f$ restricted to the closure of $D$. Note that

$$\omega(Y, X) = (Y \lrcorner\, \omega)(X) = A^2 + B^2.$$

At most changing $Y$ to $-Y$, we can suppose that $X$ always points into $D$ on the boundary. Then $p$ cannot be on the boundary, for otherwise

$$0 = X(f)(p) = df(X)(p) = (Y \lrcorner\, \omega)(X)(p) = (A^2 + B^2)(p),$$

which contradicts that $t \to \phi_t$ has no fixed points on the boundary. Hence, $f$ has at least two critical points (namely, its maximum and minimum) inside $D$, which then must be fixed points for the semigroup. Q.E.D.


Remark. The famous Poincaré-Birkhoff fixed-point theorem asserts that a single area-preserving homeomorphism of $D$ that rotates the boundary circles of $D$ in opposite directions must have at least two fixed points. Thus, Theorem 16.4 is the infinitesimal, differentiable version of their theorem. The proof we have given of Theorem 16.4 is considerably simpler than any existing proof of the stronger theorem. (Notice that, even if the transformation is differentiable, Theorem 16.4 cannot be applied to a single one, since there is no reason to expect that it can be embedded in a whole semigroup. Perhaps this is true, but no doubt proving it would be considerably harder than proving the Poincaré-Birkhoff theorem.) It remains a challenge to topologists to formulate and prove a general fixed-point theorem including the Poincaré-Birkhoff theorem, whose proof now uses very special techniques.

This completes our discussion of the case where $P$ is even dimensional, and where $\omega$ is of rank equal to the dimension of $P$. Let us now return to the general case, where $\omega$ is a closed 2-form on $P$ whose rank is constant. As we have seen, if $Y \in V(P)$ satisfies $Y(\omega) = 0$, then $d(Y \lrcorner\, \omega) = 0$. Now, if the Poincaré lemma applies, there is a function $f \in F(P)$ such that $df = Y \lrcorner\, \omega$. At any rate, we shall suppose that such a function exists. For example, this is so if the first Betti number of $P$ is zero. Of course $f$ is only really defined up to an additive constant, but this is not particularly bothersome.

THEOREM 16.5 Let $\omega$ be a closed 2-form of constant rank on a manifold $P$; let $Y \in V(P)$ and $f \in F(P)$ satisfy $Y(\omega) = 0$, $df = Y \lrcorner\, \omega$. Then $f$ is constant along all characteristic curves of $\omega$.

Proof. If $X$ is a characteristic vector field for $\omega$, that is, if $X \lrcorner\, \omega = 0$, then

$$X(f) = df(X) = \omega(Y, X) = -(X \lrcorner\, \omega)(Y) = 0.$$

Q.E.D.

We see here the general setting for the relation between one-parameter groups of symmetries and functions on phase space which are integrals of motion that is typical of all Hamiltonian mechanics. (Physicists usually know this as “E. Noether's theorem.”)
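This correspondence between symmetries and integrals can be watched at work numerically. The sketch below is an illustration under stated assumptions, not part of the text: it takes the planar Hamiltonian $H = \tfrac12(p_1^2 + p_2^2) + V(r)$ with the rotation-invariant potential $V = \tfrac12 r^2 + \tfrac14 r^4$ (chosen arbitrarily), integrates Hamilton's equations with a classical Runge-Kutta step, and monitors the function $x_1 p_2 - x_2 p_1$ conjugate to the generator of rotations. The helper names are hypothetical.

```python
import math

# Planar Hamiltonian H = (p1^2 + p2^2)/2 + V(r), V = r^2/2 + r^4/4.
# The potential is an arbitrary rotation-invariant choice for illustration.


def potential(x1, x2):
    r2 = x1 * x1 + x2 * x2
    return 0.5 * r2 + 0.25 * r2 * r2


def hamilton_rhs(state):
    """Right-hand side of Hamilton's equations for state = (x1, x2, p1, p2)."""
    x1, x2, p1, p2 = state
    r2 = x1 * x1 + x2 * x2
    pull = -(1.0 + r2)  # -dV/dx_i = pull * x_i
    return (p1, p2, pull * x1, pull * x2)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = hamilton_rhs(state)
    k2 = hamilton_rhs(shifted(k1, 0.5 * dt))
    k3 = hamilton_rhs(shifted(k2, 0.5 * dt))
    k4 = hamilton_rhs(shifted(k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def flow(state, dt, steps):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


def angular_momentum(state):
    """Function conjugate to the rotation generator (up to sign convention)."""
    x1, x2, p1, p2 = state
    return x1 * p2 - x2 * p1


def energy(state):
    x1, x2, p1, p2 = state
    return 0.5 * (p1 * p1 + p2 * p2) + potential(x1, x2)


if __name__ == "__main__":
    start = (1.0, 0.0, 0.2, 0.9)
    end = flow(start, 1e-3, 5000)
    print("angular momentum drift:", angular_momentum(end) - angular_momentum(start))
    print("energy drift:          ", energy(end) - energy(start))
```

Both drifts stay at the level of the integration error, in agreement with $\{H, f\} = 0$ for the rotation-conjugate function $f$.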

THEOREM 16.6 Let $\omega$, $P$, $Y$, and $f$ be as in Theorem 16.5. Then a point $p \in P$ is a critical point of $f$ if and only if the integral curve of $Y$ beginning at $p$ is a characteristic curve of $\omega$.

Proof. If $Y(p) = 0$, this is true (since a point curve must be considered as a characteristic curve of $\omega$). Suppose, then, that $Y(p) \ne 0$. Since the theorem is



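a local one, we may suppose coordinates $(x_1, \ldots, x_r)$ have been chosen for $P$ so that

$$Y = \frac{\partial}{\partial x_1}, \qquad x_i(p) = 0 \quad (i = 1, \ldots, r).$$

Thus, if

$$\omega = \sum_{i=1}^{r}\sum_{j=i+1}^{r} a_{ij}\,dx_i \wedge dx_j,$$

$Y(\omega) = 0$ forces

$$\frac{\partial a_{ij}}{\partial x_1} = 0.$$

Hence, $df(p) = Y(p) \lrcorner\, \omega$ is zero if and only if $a^*_{1j}(0, \ldots, 0) = 0$, where $a^*_{ij}(\ , \ldots, \ )$ is the function on $R^r$ such that $a^*_{ij}(x_1(p), \ldots, x_r(p)) = a_{ij}(p)$. But if $\sigma'(t) = Y(\sigma(t))$, $\sigma(0) = p$, then

$$x_i(\sigma(t)) = 0 \ \text{ if } i > 1; \qquad x_i(\sigma(t)) = t \ \text{ if } i = 1,$$

$$\sigma'(t) \lrcorner\, \omega = \sum_{i=1}^{r}\sum_{j=i+1}^{r} a_{ij}(\sigma(t))\bigl(dx_i(\sigma'(t))\,dx_j - dx_j(\sigma'(t))\,dx_i\bigr) = \sum_{j=2}^{r} a^*_{1j}(t, 0, \ldots, 0)\,dx_j.$$

But $\partial a_{1j}/\partial x_1 = 0$ forces $a^*_{1j}(t, 0, \ldots, 0) = a^*_{1j}(0, \ldots, 0)$; hence $\sigma'(t) \lrcorner\, \omega = 0$ for all $t$ if and only if $df(p) = Y(p) \lrcorner\, \omega = 0$. Q.E.D.

Return to the Calculus of Variations

$M$ is a manifold of dimension $m$; $L$ is a Lagrangian on $M$, that is, a real-valued function on $T(M)$; and $P = T(M) \times R$. (For simplicity, we consider only time-independent Lagrangians on $M$.) We may as well work with coordinates. Suppose $x_i$ ($1 \le i, j, \ldots \le n = \dim M$; summation convention) is a coordinate system for $M$. Then $(x_i, \dot x_i, t)$ forms a coordinate system for $P$, with $\dot x_i(v) = dx_i(v)$, $t$ the coordinate on $R$. The $x_i$ are just the original $x_i$ on $M$ pulled up to $T(M)$ with no change in notation. The Cartan 1-form associated with $L$ is then

$$\theta(L) = L_{n+i}\,dx_i - (L_{n+i}\dot x_i - L)\,dt;$$

$\omega$ is then $d\theta(L)$. A first source of symmetry vector fields of $\omega$ is obtained by taking a vector field $X = A_i(x)\,\partial/\partial x_i$ on $M$, whose associated one-parameter group preserves the extremals of $L$, and prolonging to $P$:

$$X' = A_i\frac{\partial}{\partial x_i} + \frac{\partial A_i}{\partial x_j}\dot x_j\frac{\partial}{\partial \dot x_i}.$$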



We know that

$$X'(\theta(L)) = \theta(X'(L)).$$

We have seen earlier that $\exp(tX)$ permutes the extremals of $L$ if and only if $X'(L) = 0$. Thus, if $X$ is a vector field on $M$ that generates “symmetries” of $L$, we are in position to apply (16.2), and Theorems 16.3 and 16.4. If

$$f_X = \theta(L)(X') = L_{n+i}A_i,$$

then $f_X$ is constant along the characteristic curves of $d\theta(L)$; that is, by Theorem 16.1,

$$f_X\Bigl(x(t), \frac{dx}{dt}\Bigr) = \text{constant}$$

for any curve $t \to x(t)$ in $M$ that solves the Euler equations. We call $f_X$ the function on $P$ that is conjugate to the vector field $X$ on $M$.

Now we can discuss where Theorem 16.4 is applicable. We must examine $df_X = d(L_{n+i}A_i)$ for critical points. But

$$d(L_{n+i}A_i) = L_{n+i,n+j}A_i\,d\dot x_j + (L_{n+i,j}A_i + L_{n+i}A_{i,j})\,dx_j.$$

That this is equal to zero at a point forces

$$L_{n+i,n+j}A_j = 0, \tag{16.7a}$$

$$L_{n+i,j}A_i + L_{n+i}A_{i,j} = 0. \tag{16.7b}$$

These equations admit an immediate geometric interpretation in case $L$ is a homogeneous regular Lagrangian, since then $\operatorname{rank}(L_{n+i,n+j}) = m - 1$, and $(A_j)$ must be a multiple of $(\dot x_j)$, say, $\lambda \dot x_j$. The second equation becomes

$$0 = L_{j,n+i}(x, \dot x)\,\lambda \dot x_i + L_{n+i}(x, \dot x)\frac{\partial A_i}{\partial x_j}.$$

The Euler homogeneous function relations give

$$0 = \lambda L_j\Bigl(x, \frac{A}{\lambda}\Bigr) + L_{n+i}\Bigl(x, \frac{A}{\lambda}\Bigr)\frac{\partial A_i}{\partial x_j} = \frac{\partial}{\partial x_j} L(x, A(x)).$$

Returning to coordinate-free notations, these conditions mean that the point is a critical point of the function $q \to L(X(q))$ on $M$. There is a similarly simple geometric answer to the question in case $L$ is a nonhomogeneous, regular Lagrangian. In this case, however, (16.7) gives essentially a trivial answer. For regularity means that $\det(L_{n+i,n+j}) \ne 0$;



hence (16.7a) forces $A_j = 0$, that is, the point is a zero point of $X$. To get a more interesting answer, we must replace $X'$ by the vector field $X' + \partial/\partial t$:

$$\theta(L)\Bigl(X' + \frac{\partial}{\partial t}\Bigr) = L_{n+i}A_i - (L_{n+i}\dot x_i - L);$$

that is, we must just subtract from $f_X$ the Hamiltonian function. The critical point conditions for this function become

$$L_{n+i,n+j}A_i = L_{n+i,n+j}\dot x_i, \tag{16.8a}$$

$$\frac{\partial}{\partial x_j}\bigl(L_{n+i}A_i - L_{n+i}\dot x_i + L\bigr) = 0. \tag{16.8b}$$

But (16.8a) and regularity force $A_i = \dot x_i$; hence (16.8b) condenses to

$$\frac{\partial}{\partial x_j} L(x, A(x)) = 0.$$

Hence, geometrically, we get essentially the same answer as in the homogeneous case. We can sum these computations up in the following coordinate-free form.

THEOREM 16.7 Let $L$ be a regular Lagrangian on a manifold $M$, and let $X$ be a vector field on $M$ that generates a one-parameter group on $M$ permuting the extremals of $L$; that is, $X$ generates symmetries of $L$. Then the integral curve of $X$ beginning at $q_0$ is an extremal of $L$ if and only if $q_0$ is a critical point of the function $q \to L(X(q))$ on $M$.

A famous example of this theorem is provided by the particular solution given by Lagrange of the Newtonian 3-body problem, where the three bodies rotate uniformly at the vertices of an equilateral triangle.

Newtonian mechanics. A glance at a book on classical mechanics will convince the reader that most Newtonian problems correspond to Lagrangians of the form

$$L(x, \dot x) = \tfrac12 g_{ij}(x)\dot x_i \dot x_j + a_i(x)\dot x_i - V(x). \tag{16.9}$$

(As above, $x_1, \ldots, x_n$ continue to denote coordinates for $M$, usually the “configuration space.”) In fact the $g_{ij}(x)$ are determined by the mass distributions, the $(a_i(x))$ are components of the vector potentials, and $V(x)$ is the scalar potential, of any force fields that may be present. (For example, problems of motion of charged particles usually involve the $a_i(x)$ in a nontrivial way, while in gravitational or electrostatic problems, only $V(x)$ is present.) In fact this



possibility of separating the Lagrangian into the sum of terms involving, respectively, inertial masses, scalar potentials, and vector potentials seems to be characteristic of Newtonian physics and accounts for its simplicity by comparison, say, to Einsteinian physics. For the Lagrangian given by (16.9),

$$L_{n+i,n+j} = g_{ij}.$$

Thus the variational problem (which is nonhomogeneous) is regular if and only if $\det(g_{ij}) \ne 0$, a condition that we shall suppose is verified. Let us compute the Hamiltonian function. Put

$$y_i = L_{n+i} = g_{ij}\dot x_j + a_i,$$

$$H = L_{n+i}\dot x_i - L = g_{ij}\dot x_i \dot x_j + a_i \dot x_i - L = \tfrac12 g_{ij}\dot x_i \dot x_j + V(x) = \tfrac12 G_{ij}(y_i - a_i)(y_j - a_j) + V(x),$$

where $(G_{ij}(x))$ is the inverse matrix to $(g_{ij}(x))$. Again, this simplicity of the Hamiltonian (that is, total energy) is characteristic of Newtonian physics.

A word as to why $H$ is to be regarded as the total energy might be of interest. Most naively, $\tfrac12 g_{ij}\dot x_i \dot x_j$ is to be regarded as the “kinetic energy” ($\tfrac12$ mass $\times$ (velocity)$^2$ in terms of elementary physics), and $V(x)$ is the potential energy, so that their sum is to be regarded as the “total energy.” Of course, in elementary physics, $V$ and hence $H$ is defined only up to an additive constant. Now the Lagrangian $L$ is time-independent; hence $\partial/\partial t$ is to be regarded as generating a symmetry group on $P$. The function

$$-\theta(L)\Bigl(\frac{\partial}{\partial t}\Bigr) = H$$

is then to be regarded as the function that is “conjugate” to the symmetry $\partial/\partial t$. Another more precise way of looking at this is to regard $t$ as another dependent variable, say, $t = x_{n+1}$, and replace the Lagrangian $L(x_1, \ldots, x_n, \dot x_1, \ldots, \dot x_n)$ by the equivalent homogeneous Lagrangian

$$L'(x_1, \ldots, x_{n+1}, \dot x_1, \ldots, \dot x_{n+1}) = L\Bigl(x_1, \ldots, x_n, \frac{\dot x_1}{\dot x_{n+1}}, \ldots, \frac{\dot x_n}{\dot x_{n+1}}\Bigr)\dot x_{n+1}.$$



Then

$$L'_{n+1+i} = L_{n+i} \quad \text{for } 1 \le i \le n, \qquad L'_{2n+2} = -\sum_{j=1}^{n} L_{n+j}\frac{\dot x_j}{\dot x_{n+1}} + L.$$

Hence

$$\theta(L') = L_{n+i}\,dx_i - \Bigl(L_{n+j}\frac{\dot x_j}{\dot x_{n+1}} - L\Bigr)dx_{n+1},$$

which is just $\theta(L)$ when we identify $x_{n+1}$ with $t$, and $\dot x_{n+1}$ with $dt/dt$, that is, with 1. Considered as a group on the configuration space of variables $(x_1, \ldots, x_n, t)$, the group generated by $\partial/\partial t$ is genuinely a group of symmetries for the Lagrangian $L'$, and $H$ is genuinely its conjugate function on phase space. This identification of the total energy with the function that is conjugate to time translation is most useful in relativistic and quantum mechanics, where the “elementary,” operational ways of defining energy are not available in obvious form.

Of course this remark can be turned around and can be used to convince one that $\tfrac12 g_{ij}\dot x_i \dot x_j$ and $V(x)$ should be regarded as, respectively, the kinetic and potential energy. This indicates that the case where $V = 0 = a_i$ is to be regarded as the “free particle” case. Let $X$ be a vector field on $M$ that generates a group of symmetries of the “free particle” Lagrangian $L = \tfrac12 g_{ij}\dot x_i \dot x_j$. Thus, if $X = A_i(x)(\partial/\partial x_i)$, the conjugate function $g_{ij}A_i(x)\dot x_j$ can be regarded as the “momentum” function defined by $X$. For example, suppose that

$$g_{ij} = \delta_{ij}m, \qquad \dim M = 3;$$

that is, $L$ is the free Lagrangian appropriate to a particle of mass $m$ moving in 3-space, $R^3$. It is easily seen that the generators of symmetries are of the form

$$X = (\alpha_i + \beta_{ij}x_j)\frac{\partial}{\partial x_i},$$

where $\alpha_i$, $\beta_{ij}$ are constants, and $(\beta_{ij})$ is a skew-symmetric matrix. Let us take, say, translation in the $x_1$-direction:

$$X = \frac{\partial}{\partial x_1}.$$

The conjugate momentum function is then $m\dot x_1$, that is, just the classical linear momentum in the corresponding direction.
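The conjugate function $g_{ij}A_i(x)\dot x_j$ is simple enough to evaluate directly. The following sketch is an illustration only; the function names, the mass value, and the sample point are assumptions, not part of the text. It evaluates the conjugate function for the free particle metric $g_{ij} = \delta_{ij}m$, first for the translation generator $\partial/\partial x_1$ and then for the rotation generator $x_2\,\partial/\partial x_1 - x_1\,\partial/\partial x_2$ about the $x_3$-axis.

```python
import math


def conjugate_function(metric, field, x, xdot):
    """Evaluate g_ij(x) A_i(x) xdot_j for a vector field X = A_i d/dx_i."""
    g = metric(x)
    a = field(x)
    n = len(x)
    return sum(g[i][j] * a[i] * xdot[j] for i in range(n) for j in range(n))


MASS = 2.0  # illustrative value of the particle mass m


def free_metric(x):
    """g_ij = delta_ij * m on R^3."""
    return [[MASS if i == j else 0.0 for j in range(3)] for i in range(3)]


def translation_x1(x):
    """Generator of translations in the x_1-direction: X = d/dx_1."""
    return [1.0, 0.0, 0.0]


def rotation_x3(x):
    """Generator of rotations about the x_3-axis: X = x_2 d/dx_1 - x_1 d/dx_2."""
    return [x[1], -x[0], 0.0]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    xdot = [0.5, -1.5, 4.0]
    # Linear momentum m * xdot_1.
    print(conjugate_function(free_metric, translation_x1, x, xdot))
    # Angular momentum m * (x_2 xdot_1 - x_1 xdot_2).
    print(conjugate_function(free_metric, rotation_x3, x, xdot))
```

Other symmetry generators $(\alpha_i + \beta_{ij}x_j)\,\partial/\partial x_i$ can be passed in the same way as a function returning the components $A_i(x)$.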


Suppose now that

$$X = x_2\frac{\partial}{\partial x_1} - x_1\frac{\partial}{\partial x_2}.$$

Thus $X$ generates the group of rotations about the $x_3$-axis. The conjugate momentum function is $m(x_2\dot x_1 - x_1\dot x_2)$, which is just the classical angular momentum. These facts explain from our higher point of view the role that these momentum functions play in elementary physics. Further, giving these functions such a group-theoretic interpretation enables one to define the corresponding functions in the generalizations of classical mechanics; for example, Einsteinian and quantum mechanics.

The Equivalence of Homogeneous and Nonhomogeneous Lagrangians. The Principle of Maupertuis

Let $(x_1, \ldots, x_n)$ be a coordinate system for the manifold $M$, and let $L(x, \dot x)$ be a time-independent homogeneous, regular Lagrangian for $M$. We now ask: How can the differential equations for the extremals of $L$ be written in Hamiltonian form? One way is to give up the symmetry between all variables, and choose one, say, $x_n$, to parametrize the curves of $M$. The effect of this is to construct the Lagrangian

$$L'(x_1, \ldots, x_{n-1}, t, \dot x_1, \ldots, \dot x_{n-1}) = L(x_1, \ldots, x_{n-1}, t, \dot x_1, \ldots, \dot x_{n-1}, 1),$$

obviously leading both to rather awkward formulas and to difficulties from the global point of view. There is an alternate procedure that keeps the symmetry between the dependent variables; hence it is preferable both for esthetic reasons and because it carries over to manifolds. The extremals of $L$ are always the curves of $M$ whose tangent vector curve in $(x, \dot x)$-space is a characteristic curve for $d\theta(L)$, where

$$\theta(L) = L_{n+i}\,dx_i.$$

Now the obstacle in the way of writing the equations of the characteristic curves in Hamiltonian form is that the functions $L_{n+1}, \ldots, L_{2n}$ are not functionally independent. Indeed, since

$$\operatorname{rank}(L_{n+i,n+j}) = n - 1$$

(by definition of regularity), we know by the implicit function theorem that there is a function $H(x_1, \ldots, x_n, y_1, \ldots, y_n)$ of $2n$ variables such that

$$H(x_1, \ldots, x_n, L_{n+1}(x, \dot x), \ldots, L_{2n}(x, \dot x)) = 0$$

identically.


Define a mapping $\phi$ of $(x, \dot x, t)$-space to $(x, y, t)$-space by

$$\phi^*(x_i) = x_i, \qquad \phi^*(y_i) = L_{n+i}, \qquad \phi^*(t) = t.$$

Then, since by construction of $H$, $\phi^*(H) = 0$, we have

$$\phi^*(y_i\,dx_i - H\,dt) = \theta(L).$$

Thus $\phi$ carries a characteristic curve of $d\theta(L)$ into a characteristic curve of $d(y_i\,dx_i - H\,dt)$ restricted to the submanifold $H = 0$. We know from Chapter 13 that the characteristic curves of $d(y_i\,dx_i - H\,dt)$ are, after parametrization by $t$, just the solutions of the Hamilton equations with Hamiltonian $H(x, y)$. Hence the differential equations for an extremal curve $x(t)$ of $L$ can be written in Hamiltonian form:

$$\frac{dx_i}{dt} = \frac{\partial H}{\partial y_i}, \qquad \frac{dy_i}{dt} = -\frac{\partial H}{\partial x_i}, \tag{16.10}$$

subject to the subsidiary condition

$$H(x(t), y(t)) = 0. \tag{16.11}$$

Now suppose that $L'(x, \dot x)$ is another Lagrangian on $M$, in fact nonhomogeneous and regular. Suppose in addition that this same function $H$ is the Hamiltonian for $L'$; that is,

$$H\Bigl(x, \frac{\partial L'}{\partial \dot x}\Bigr) = \frac{\partial L'}{\partial \dot x_i}\dot x_i - L'.$$

We also know that (16.10), without the subsidiary condition, describes the differential equations for the extremals of $L'$. We can immediately draw several conclusions.

(a) Every extremal curve of $L$ can be parametrized so that it is an extremal of $L'$. This parametrization $t \to x(t)$ is determined by the condition

$$\frac{\partial L'}{\partial \dot x_i}\frac{dx_i}{dt} - L' = 0.$$

(b) Every extremal curve $x(t)$ of $L'$ that satisfies

$$\frac{\partial L'}{\partial \dot x_i}\frac{dx_i}{dt} - L' = 0$$

is also an extremal curve for $L$.



We now want to turn these remarks around and suppose that $L'$ is given. We know that $H(x, y)$ can be constructed according to the usual rules and that every extremal curve $t \to x(t)$ of $L'$ satisfies (16.10), with

$$y_i(t) = \frac{\partial L'}{\partial \dot x_i}\Bigl(x(t), \frac{dx}{dt}\Bigr)$$

and

$$H(x(t), y(t)) = \text{constant}.$$

Suppose now that we can find a one-parameter family $e \to L^e$ of homogeneous, regular Lagrangians on $x$-space such that

$$H\Bigl(x, \frac{\partial L^e}{\partial \dot x}(x, \dot x)\Bigr) = e$$

identically on $(x, \dot x)$. We conclude:

(a') For each choice of $e$, the extremals of $L^e$ can be parametrized so as to be extremals of $L'$.
(b') If $t \to x(t)$ is an extremal of $L'$, with

$$H(x(t), y(t)) = e,$$

then it is also an extremal of $L^e$.

This reduction of the problem of finding the extremals of a nonhomogeneous Lagrangian to the problem of finding the extremals of a one-parameter family of homogeneous Lagrangians is sometimes known as the isoenergetic reduction. (Notice that if $L'$ is the Lagrangian of a Newtonian-mechanical problem, $H$ is just the energy.) This circle of ideas forms what may be called the Principle of Maupertuis.

Let us descend from these generalities to find examples of Lagrangians that actually occur in Newtonian physics. Let us consider a Lagrangian of the form

$$L' = \sqrt{V(x)\,g_{ij}(x)\dot x_i \dot x_j} + a_i \dot x_i.$$

The condition for regularity is seen to be

$$\det(g_{ij}) \ne 0, \qquad V \ne 0,$$

and we shall suppose without further comment that these conditions are fulfilled. Then

$$L'_{n+i} = \frac{V g_{ij}\dot x_j}{\sqrt{V g_{kl}\dot x_k \dot x_l}} + a_i.$$


Let $H(x, y) = G_{ij}(x)(y_i - a_i)(y_j - a_j) - V(x)$, where $(G_{ij})$ is the inverse matrix to $(g_{ij})$. Then

$$H\bigl(x, L'_{n+1}(x, \dot x), \ldots, L'_{2n}(x, \dot x)\bigr) = 0.$$

As we have seen above, this function $H(x, y)$ is the Hamiltonian for the following nonhomogeneous Lagrangian:

$$L = \tfrac14 g_{ij}\dot x_i \dot x_j + V(x) + a_i \dot x_i.$$

Thus an extremal of $L'$, say, $t \to x(t)$, that satisfies

$$0 = L_{n+i}\frac{dx_i}{dt} - L \qquad \text{or} \qquad \tfrac14 g_{ij}\frac{dx_i}{dt}\frac{dx_j}{dt} - V(x(t)) = 0$$

is an extremal of $L$. For example, if $a_i = 0$, $V = \tfrac12$, this means that the extremals of $L'$ that satisfy

$$L'\Bigl(x(t), \frac{dx}{dt}\Bigr) = 1$$

are extremals of $L'^2$. Now, in this case $L'$ defines a Riemannian metric on $M$ (see Part 3). The condition $L'(x, (dx/dt)) = 1$ means that the curve $t \to x(t)$ is parametrized by arc length. (This equivalence between the variational problems defined by $L'$ and $L'^2$ is used as a simplifying technique by Milnor [1]. Notice, however, that it is quite special and is linked to the quadratic nature of the Lagrangian.)

Conversely, suppose that we start off with $L$ given. The time independence of $L$ implies that, for each extremal curve $t \to x(t)$:

$$\tfrac14 g_{ij}\frac{dx_i}{dt}\frac{dx_j}{dt} - V = \text{constant}.$$

This constant can be absorbed in $V$. The result is that the integral curves of $L$ for which this constant has the value $e$ are integral curves of the homogeneous Lagrangian

$$L^e = \sqrt{(V + e)\,g_{ij}\dot x_i \dot x_j} + a_i \dot x_i.$$

The most interesting case is where $a_i = 0$. Then this “isoenergetic reduction process” tells us that all integral curves with a given value of “energy” are geodesics (that is, extremals) of a certain Riemannian metric on configuration space. As we shall see in Part 3, the theory of geodesics on a Riemannian manifold has a much richer geometric background than a random variational problem.
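The relation between the homogeneous Lagrangian $\sqrt{V g_{ij}\dot x_i \dot x_j} + a_i\dot x_i$ and the function $H = G_{ij}(y_i - a_i)(y_j - a_j) - V$ can be checked numerically. The sketch below rests on several assumptions that are not in the text: a two-dimensional configuration space, arbitrarily chosen sample functions for $g_{ij}$, $a_i$, and $V$, momenta computed by central differences, and the normalization $\tfrac14 g_{ij}\dot x_i\dot x_j + V + a_i\dot x_i$ for the nonhomogeneous Lagrangian (the normalization for which this $H$ is its Hamiltonian). All function names are hypothetical.

```python
import math

# Sample data on a 2-dimensional configuration space (illustrative choices).


def metric(x):
    return [[2.0 + x[0] ** 2, 0.5], [0.5, 1.0 + x[1] ** 2]]


def vector_potential(x):
    return [0.3 * x[1], -0.2 * x[0]]


def scalar_potential(x):
    return 1.0 + 0.5 * (x[0] ** 2 + x[1] ** 2)


def quadratic_form(x, v):
    g = metric(x)
    return sum(g[i][j] * v[i] * v[j] for i in range(2) for j in range(2))


def lagrangian_hom(x, v):
    """Homogeneous Lagrangian sqrt(V g_ij v_i v_j) + a_i v_i."""
    a = vector_potential(x)
    return math.sqrt(scalar_potential(x) * quadratic_form(x, v)) + a[0] * v[0] + a[1] * v[1]


def lagrangian_newton(x, v):
    """Nonhomogeneous Lagrangian (1/4) g_ij v_i v_j + V + a_i v_i (assumed normalization)."""
    a = vector_potential(x)
    return 0.25 * quadratic_form(x, v) + scalar_potential(x) + a[0] * v[0] + a[1] * v[1]


def momenta(lagrangian, x, v, h=1e-6):
    """y_i = dL/dv_i by central differences."""
    y = []
    for i in range(len(v)):
        v_plus, v_minus = list(v), list(v)
        v_plus[i] += h
        v_minus[i] -= h
        y.append((lagrangian(x, v_plus) - lagrangian(x, v_minus)) / (2.0 * h))
    return y


def hamiltonian(x, y):
    """H(x, y) = G_ij (y_i - a_i)(y_j - a_j) - V, with G the inverse of g."""
    g = metric(x)
    a = vector_potential(x)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    inv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    d = [y[0] - a[0], y[1] - a[1]]
    return sum(inv[i][j] * d[i] * d[j] for i in range(2) for j in range(2)) - scalar_potential(x)


if __name__ == "__main__":
    x, v = [0.4, -0.3], [1.2, 0.7]
    # H vanishes identically on the momenta of the homogeneous Lagrangian.
    print(hamiltonian(x, momenta(lagrangian_hom, x, v)))
    # H agrees with y_i v_i - L for the nonhomogeneous Lagrangian.
    y = momenta(lagrangian_newton, x, v)
    print(hamiltonian(x, y), y[0] * v[0] + y[1] * v[1] - lagrangian_newton(x, v))
```

The first printed value is zero up to finite-difference error, which is the subsidiary condition $H = 0$; the second line shows the same $H$ reproducing the energy of the nonhomogeneous problem.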



The Transition from Newtonian to Einsteinian† Mechanics via the Calculus of Variations

The aim of this section is to show how some standard facts of special relativity theory can be derived (following rather closely Levi-Civita's ideas [1]) from the formalism of the calculus of variations. To give the essentials of the method, it suffices to suppose that $M$ is one-dimensional, so that the Newtonian picture is of a particle of mass $m$ moving on a line with coordinate $x$ and potential energy $V(x)$. The Newtonian Lagrangian is, of course, just

$$L = \tfrac12 \dot x^2 - \frac{V(x)}{m}.$$

This Lagrangian is of the nonhomogeneous, regular type, so that its extremals come with their own parametrization. This parameter, of course, is identified with the physical time. Now $L$ defines a whole class of Lagrangians, as $V(x)$ runs over the class of suitable functions. Let $\phi$ be a transformation of $(x, t)$-space into itself which “preserves” the class in the following sense: Given $L$ and $\phi$, there is a Lagrangian $L'$ of the same class, that is,

$$L' = \tfrac12 \dot x^2 - \frac{V_1(x)}{m},$$

so that if $t \to (x(t), t)$ is an extremal of $L$, $t \to \phi(x(t), t)$ is an extremal of $L'$. Now it is easily seen that this requires that $\phi^*(dt) = dt$. That is, $\phi^*(t) = t + \beta$ for some constant $\beta$, and $(\phi_*)^*(L') = L$. To determine the explicit possibilities for $\phi$, suppose $\phi^*(x) = f(x, t)$. Then

$$\tfrac12\Bigl(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\Bigr)^2 - \frac{V_1(f)}{m} = \tfrac12 \dot x^2 - \frac{V(x)}{m},$$

or

$$\frac{\partial f}{\partial x} = \pm 1,$$

or

$$f = \pm x + g(t).$$

† We believe it would be more accurate from the geometric point of view to replace the standard name “relativistic mechanics” by “Einsteinian mechanics.” For example, the beginner often is led to believe (in the numerous bad expositions of the theory) that classical Newtonian mechanics is not “relativistic.” However, it is covariant under a perfectly good group, namely, the Galilean group. In fact, the main effect of the Einsteinian revision is to replace covariance under this group by covariance under another group, the Lorentz group. (Of course Einstein himself in his own popularizations is always very good on this point.)



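Comparing the coefficients of $\dot x$ in $(\phi_*)^*(L') = L$,

$$\frac{\partial f}{\partial x}\frac{\partial f}{\partial t} = 0$$

forces $g(t) = \alpha = \text{constant}$. Finally, then, we see that $\phi$ must be of the form

$$\phi(x, t) = (\pm x + \alpha,\ t + \beta).$$

Hence the symmetry group is the group of “rigid motions” of the real line. We also see that the symmetry group permuting this class of Lagrangians is the symmetry group of the free Lagrangian.

Now let us look at the symmetry group for the extremals of the free Lagrangian. We are looking for maps $\phi$ of $(x, t)$-space into itself which permute the extremals of $L = \tfrac12 \dot x^2$. Suppose $\phi$ is of the form

$$\phi(x, t) = (f(x, t),\ t + \beta).$$

The extended transformation of $\phi$ is

$$\phi_*: (x, \dot x, t) \to \Bigl(f(x, t),\ \frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t},\ t + \beta\Bigr).$$

Now $\theta(L) = \dot x\,dx - \tfrac12 \dot x^2\,dt$; hence

$$(\phi_*)^*(\theta(L)) = \Bigl(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\Bigr)\Bigl(\frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial t}dt\Bigr) - \frac12\Bigl(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\Bigr)^2 dt.$$

Setting the coefficient of $d\dot x \wedge dt$ in $(\phi_*)^*(d\theta(L)) - d\theta(L)$ equal to zero gives

$$\Bigl(\frac{\partial f}{\partial x}\Bigr)^2 \dot x = \dot x.$$

This implies, since $(x, \dot x, t)$ are independent variables,

$$\frac{\partial f}{\partial x} = \pm 1, \qquad f = \pm x + g(t).$$

Hence,

$$(\phi_*)^*(\theta(L)) = \Bigl(\pm \dot x + \frac{dg}{dt}\Bigr)(\pm dx + dg) - \frac12\Bigl(\pm \dot x + \frac{dg}{dt}\Bigr)^2 dt.$$

Setting the coefficient of $dx \wedge dt$ in $d(\phi_*)^*(\theta(L)) - d\theta(L)$ equal to zero gives

$$\frac{d^2 g}{dt^2} = 0 \qquad \text{or} \qquad g(t) = \gamma t + \alpha.$$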


Finally, then, $\phi$ is of the form

$$(x, t) \to (\pm x + \gamma t + \alpha,\ t + \beta).$$

This is a Galilean transformation. Its defining property can be put more physically in the following way:

If $s \to (x(s), t(s))$ is a curve in space-time, let $(dx/ds)/(dt/ds)$ be the velocity of the curve. The Galilean transformations permute the curves of constant velocity. The coefficient $\gamma$ is the increment given to the velocity. The new coordinates for space-time introduced by a Galilean transformation then represent physically a coordinate system moving at constant velocity with respect to the old.

The Einstein modification of this scheme is an attempt to allow transformations of space-time that permit a more thorough “mixing” of the space and time variables, that is, that dethrone time from the absolute position it holds in Newtonian physics. Now an arbitrary transformation on $(x, t)$-space would send a Lagrangian of the form

$$L = \tfrac12 \dot x^2 - \frac{V(x)}{m}$$

into a Lagrangian on $(x, t)$-space in the form $L'(x, t, \dot x, \dot t)$, but clearly it would be of a rather unrecognizable type. Following Levi-Civita [1, p. 292], we can modify the Lagrangian $L$ to get a Lagrangian $L^*$ that does have a more reasonable transformation law under space-time transformations. Consider a given extremal of $L$, and let $c$ be a positive real number that is very large in comparison with the velocity of the given extremal. Let us first modify $L$ to

$$L - c^2 = \tfrac12 \dot x^2 - \frac{V(x)}{m} - c^2.$$

This is harmless, since it does not affect the extremals of $L$. In a neighborhood of the tangent vector field $t \to (t, x(t), (dx/dt))$ of the given extremal,

$$-c^2\Bigl(1 - \frac{\dot x^2}{2c^2} + \frac{V(x)}{mc^2}\Bigr)$$

is close to

$$-c^2\sqrt{1 - \frac{\dot x^2}{c^2} + \frac{2V(x)}{mc^2}}.$$

Thus, if we replace $L$ by

$$L^* = -c\sqrt{c^2 - \dot x^2 + \frac{2V(x)}{m}},$$


we can be confident that the extremal of $L^*$ having the same end points as $t \to x(t)$ will be “close” to $t \to x(t)$.†

What is the significance of the Lagrangian $L^*$? Notice that $L^*$ is in fact closely related to the homogeneous Lagrangian

$$L^{**} = \sqrt{\Bigl(c^2 + \frac{2V(x)}{m}\Bigr)\dot t^2 - \dot x^2}.$$

In fact the extremals of $L^{**}$, which are a priori of the form

$$s \to (x(s), t(s)),$$

are, when reparametrized by $t$, just the extremals of $L^*$. Now $L^{**}$ is a Lagrangian whose extremals are geodesics (extremals) of a pseudo-Riemannian metric (of Lorentz type) on space-time. Such Lagrangians have a very simple transformation law under changes of variable in space-time. Thus, at the expense of introducing this new sort of “Lorentzian” or (which is more just historically) Minkowskian geometry for space-time, we have “geometrized” the problem of allowing “mixing” transformations of space and time. Notice another fact that is of interest for the later extension to general relativity: The coefficients of the metric determined by $L^{**}$ depend, in case $V(x) \ne 0$, on the mass of the particle. Crudely, the massiveness of the particle actually affects the geometry.

Let us return for a moment to $L^*$ and compare it to $L$. We know how to define the energy and momentum of the Newtonian system by applying the variational formalism to $L$; namely,

$$\text{energy} = -m\,\theta(L)\Bigl(\frac{\partial}{\partial t}\Bigr), \qquad \text{momentum} = m\,\theta(L)\Bigl(\frac{\partial}{\partial x}\Bigr).$$

This also makes sense with $L$ replaced by $L^*$. Let us compute and see what happens.

$$\frac{\partial L^*}{\partial \dot x} = \frac{c\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}, \qquad \theta(L^*) = \frac{c\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}\,dx - H\,dt,$$

† The analogy is with the following fact about extrema of functions in finite dimensional spaces: If $q_0$ is a critical point of the real-valued function $q \to F(q)$, and if the critical point is nondegenerate, then for small $\varepsilon$, the function $q \to F(q) + \varepsilon G(q)$ will have a critical point that is close to $q_0$. In this finite dimensional situation, this can be proved by use of the implicit function theorem, but the matter is more delicate for functions on infinite dimensional spaces such as those that occur in the calculus of variations.



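with

$$H = \dot x\frac{\partial L^*}{\partial \dot x} - L^* = \frac{c\,(c^2 + 2V(x)/m)}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}.$$

Thus the “energy” is

$$\frac{mc\,(c^2 + 2V(x)/m)}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}$$

and the momentum is

$$\frac{mc\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}.$$

Several famous conclusions follow from these calculations: First notice that if $V(x) = 0$, the energy differs by an additive constant from what it is in the Newtonian case; namely,

$$\text{energy} = mc^2.$$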
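The size of this additive constant, and of the correction to the Newtonian kinetic energy, can be seen numerically. The sketch below assumes the modified Lagrangian has the form $L^* = -c\sqrt{c^2 - \dot x^2 + 2V/m}$ (the form consistent with the expression for $\partial L^*/\partial \dot x$ in this section), and uses illustrative units in which $c = 100$ and $m = 2$. The function names are hypothetical.

```python
import math

# Assumed modified Lagrangian (per unit mass):
#   L* = -c * sqrt(c^2 - xdot^2 + 2 V / m).


def lagrangian_star(xdot, potential, mass, c):
    return -c * math.sqrt(c * c - xdot * xdot + 2.0 * potential / mass)


def momentum(xdot, potential, mass, c):
    """m * dL*/dxdot."""
    root = math.sqrt(c * c - xdot * xdot + 2.0 * potential / mass)
    return mass * c * xdot / root


def energy(xdot, potential, mass, c):
    """m * (xdot * dL*/dxdot - L*)."""
    root = math.sqrt(c * c - xdot * xdot + 2.0 * potential / mass)
    return mass * c * (c * c + 2.0 * potential / mass) / root


if __name__ == "__main__":
    mass, c = 2.0, 100.0
    for speed in (0.0, 1.0, 10.0, 50.0, 90.0):
        e = energy(speed, 0.0, mass, c)
        # Compare the energy above the rest value with the Newtonian kinetic energy.
        print(speed, e - mass * c * c, 0.5 * mass * speed * speed)
```

For speeds small compared with $c$ the two printed columns nearly coincide; as the speed approaches $c$ the first column grows without bound.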

Further notice that if $t \to x(t)$ is an extremal, then $\dot x(t) = dx/dt$, the velocity of the particle in the usual sense. If (as we shall see in a moment) $c$ should be identified with the velocity of light in a vacuum, then as long as the velocity of the particle is small compared with this velocity of light and $V(x)$ is small compared with $mc^2$, there should be no substantial difference from the Newtonian energy (except for the additive constant); hence the motion should also not differ substantially from the Newtonian motion. (Of course this is built into our construction, but it is nice to see precisely how it is reflected in the equations of motion.) Finally note that if one wants to write

$$\text{momentum} = \text{mass} \times \text{velocity},$$

the mass of the particle must be identified not with its Newtonian $m$, but with

$$\frac{m}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}.$$

Thus the mass of a moving particle may vary with time; in particular, if $V(x)$ is small compared with $mc^2$ and $(dx/dt)/c$ is close to 1, the mass blows up. Since the energy must be a constant of motion, we see that a particle moving by this Lagrangian never can (if $V(x)$ is small compared with $mc^2$) approach



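the velocity of light (from below). Finally, let us write the Euler-Lagrange equations of motion:

$$\frac{d}{dt}\frac{\partial L^*}{\partial \dot x} = \frac{\partial L^*}{\partial x},$$

or

$$\frac{d}{dt}\Biggl(\frac{c\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}\Biggr) = -\frac{c}{m}\,\frac{dV/dx}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}},$$

or

$$\frac{d}{dt}\Biggl(\frac{mc\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}\Biggr) = \frac{-c\,(dV/dx)}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}.$$

Notice that the Newtonian law of motion,

$$\text{mass} \times \text{acceleration} = \text{force},$$

makes no kind of sense. But, if the Newtonian law is rewritten in the form derivative of momentum = force, it does make sense in Einsteinian mechanics, with the force

$$\frac{-(dV/dx)}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}.$$

Now $-(dV/dx)$ is the Newtonian force. We shall leave Einsteinian mechanics at this point, having described how the basic laws of elementary physics might be modified.

Let us now return to the homogeneous Lagrangian $L^{**}$ in $(x, t)$-space whose extremals, when parametrized by $t$, are those of $L^*$. We shall consider only the force-free Lagrangian, so

$$L^{**} = \sqrt{c^2\dot t^2 - \dot x^2}.$$

Let us look for the symmetries of $L^{**}$, that is, the transformations $\phi$ of space $\times$ time into itself such that $(\phi_*)^*(L^{**}) = L^{**}$. If

$$\phi^*(t) = g(x, t), \qquad \phi^*(x) = f(x, t),$$

then

$$(\phi_*)^*(\dot x) = \frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\dot t, \qquad (\phi_*)^*(\dot t) = \frac{\partial g}{\partial x}\dot x + \frac{\partial g}{\partial t}\dot t;$$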


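hence,

$$c^2\Bigl(\frac{\partial g}{\partial x}\dot x + \frac{\partial g}{\partial t}\dot t\Bigr)^2 - \Bigl(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\dot t\Bigr)^2 = c^2\dot t^2 - \dot x^2.$$

Thus,

$$c^2\Bigl(\frac{\partial g}{\partial x}\Bigr)^2 - \Bigl(\frac{\partial f}{\partial x}\Bigr)^2 = -1, \qquad c^2\Bigl(\frac{\partial g}{\partial t}\Bigr)^2 - \Bigl(\frac{\partial f}{\partial t}\Bigr)^2 = c^2, \qquad c^2\frac{\partial g}{\partial x}\frac{\partial g}{\partial t} - \frac{\partial f}{\partial x}\frac{\partial f}{\partial t} = 0.$$

Differentiating the first relation with respect to $t$ and the second with respect to $x$,

$$c^2\frac{\partial g}{\partial x}\frac{\partial^2 g}{\partial x\,\partial t} - \frac{\partial f}{\partial x}\frac{\partial^2 f}{\partial x\,\partial t} = 0, \qquad c^2\frac{\partial g}{\partial t}\frac{\partial^2 g}{\partial x\,\partial t} - \frac{\partial f}{\partial t}\frac{\partial^2 f}{\partial x\,\partial t} = 0.$$

Thus, either

$$\frac{\partial^2 g}{\partial x\,\partial t} = 0 = \frac{\partial^2 f}{\partial x\,\partial t} \qquad \text{or} \qquad \frac{\partial g}{\partial x}\frac{\partial f}{\partial t} - \frac{\partial g}{\partial t}\frac{\partial f}{\partial x} = 0.$$

The second possibility is impossible: For example, together with the third relation it leads to

$$\frac{\partial f/\partial t}{\partial g/\partial t} = \frac{\partial f/\partial x}{\partial g/\partial x} = c^2\,\frac{\partial g/\partial x}{\partial f/\partial x}, \qquad \text{or} \qquad \Bigl(\frac{\partial f}{\partial t}\Bigr)^2 = c^2\Bigl(\frac{\partial g}{\partial t}\Bigr)^2,$$

which, by the second relation, leads to $c^2 = 0$. The first possibility leads to

$$f = h_1(x) + k_1(t), \qquad g = h_2(x) + k_2(t).$$

Substituting this back in leads to

$$c^2\Bigl(\frac{dh_2}{dx}\Bigr)^2 - \Bigl(\frac{dh_1}{dx}\Bigr)^2 = -1, \qquad c^2\Bigl(\frac{dk_2}{dt}\Bigr)^2 - \Bigl(\frac{dk_1}{dt}\Bigr)^2 = c^2, \qquad c^2\frac{dh_2}{dx}\frac{dk_2}{dt} - \frac{dh_1}{dx}\frac{dk_1}{dt} = 0.$$

Using the first two in the square of the third gives

$$\Bigl(\frac{dk_1}{dt}\Bigr)^2 = c^4\Bigl(\frac{dh_2}{dx}\Bigr)^2.$$

This identity forces

$$\frac{dk_1}{dt} = \alpha = \text{constant},$$

or $k_1$ is a linear function of $t$.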



Similarly, working on the rest of the relations, we see that $f$ and $g$ are linear functions of $x$ and $t$, say,

$$\phi^*(x) = a_{11}x + a_{12}t + a, \qquad \phi^*(t) = a_{21}x + a_{22}t + b.$$

The relations the constants must satisfy are found by substituting back in:

$$c^2a_{21}^2 - a_{11}^2 = -1, \qquad c^2a_{22}^2 - a_{12}^2 = c^2, \qquad c^2a_{21}a_{22} - a_{11}a_{12} = 0.$$
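These three relations can be checked on a concrete one-parameter family. The sketch below assumes the standard boost $x \to \gamma(x + ut)$, $t \to \gamma(t + ux/c^2)$ with $\gamma = 1/\sqrt{1 - u^2/c^2}$; this explicit family is a standard formula supplied here for illustration and is not displayed in the text at this point. The names `boost` and `residuals` are hypothetical.

```python
import math


def boost(u, c):
    """Matrix ((a11, a12), (a21, a22)) of the assumed standard boost with velocity u."""
    gamma = 1.0 / math.sqrt(1.0 - u * u / (c * c))
    return ((gamma, gamma * u), (gamma * u / (c * c), gamma))


def residuals(a, c):
    """Left-hand minus right-hand sides of the three relations on the a_ij."""
    (a11, a12), (a21, a22) = a
    return (
        c * c * a21 * a21 - a11 * a11 + 1.0,
        c * c * a22 * a22 - a12 * a12 - c * c,
        c * c * a21 * a22 - a11 * a12,
    )


if __name__ == "__main__":
    # The relations hold for a boost at moderate c.
    print(residuals(boost(1.2, 3.0), 3.0))
    # As c grows, the matrix approaches ((1, u), (0, 1)).
    for c in (3.0, 30.0, 3000.0, 1.0e6):
        print(c, boost(1.2, c))
```

The ratio $a_{12}/a_{22}$ equals $u$ for every $c$, and for large $c$ the matrix tends to that of the transformation $x \to x + ut$, $t \to t$.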

These conditions define the affine linear transformation as the Lorentz group.† Several very important physical facts can be read off from the properties of the Lorentz transformations. First let us examine what happens as $c \to \infty$. Let us suppose that the matrix

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$

depends on $c$ and that each element goes to a finite limit as $c \to \infty$. Let

$$\begin{pmatrix} a'_{11} & a'_{12} \\ a'_{21} & a'_{22} \end{pmatrix}$$

be the limit matrix. It follows from the first and second relation that $a'_{21} = 0$, $a'_{22} = \pm 1$. Now it is readily seen that the determinant of every Lorentz transformation is $\pm 1$. This relation also holds in the limit:

$$1 = (a'_{11}a'_{22} - a'_{21}a'_{12})^2 = (a'_{11})^2(a'_{22})^2 = (a'_{11})^2.$$

Finally, then, we see that the "limit" of a Lorentz transformation as $c \to \infty$ is one of the form

$$x \to \pm x + ut + a \quad (\text{with } u = a'_{12}), \qquad t \to \pm t + b.$$

Notice that this is a Galilean transformation! Hence we may say that the "limit" as $c \to \infty$ of the Lorentz group, the symmetry group of Einsteinian physics, is the Galilean group, the symmetry group of Newtonian physics. Thus we have a more sophisticated group-theoretical way of describing the transition. A variant of the same sort of reasoning can be used to describe the transition from quantum to classical mechanics.

Now, let $s \to (x(s), t(s))$ be a curve in space-time. We agreed earlier to call

$$\frac{dx/ds}{dt/ds} = v(s)$$

the velocity of the curve. We saw that the Galilean group was characterized by

† Of course "space" is usually three-dimensional, so that what is usually called the Lorentz group is the analogous group acting on $(x_1, x_2, x_3, t)$-space.


the property of giving a constant increment $u$ to the velocity of all curves in space-time. Let us examine a Lorentz transformation from the same point of view. Now the transformed curve is

$$s \to \bigl(a_{11}x(s) + a_{12}t(s) + a,\; a_{21}x(s) + a_{22}t(s) + b\bigr).$$

Hence the transformed velocity is

$$\frac{a_{11}(dx/ds) + a_{12}(dt/ds)}{a_{21}(dx/ds) + a_{22}(dt/ds)} = \frac{a_{11}v(s) + a_{12}}{a_{21}v(s) + a_{22}}.$$

Notice that if $v(s) = 0$, then the transformed curve has velocity $a_{12}/a_{22}$. This number, then, should be an interesting invariant of the Lorentz transformation; in fact, it would be physically just the velocity of the new coordinate system defined by the Lorentz transformation with respect to the old. Let $\beta = a_{12}/a_{22}$ be this invariant. Now $\beta$ seems completely to determine the transformation. (This version of the Lorentz group is just one-dimensional.) One finds explicitly that

$$a_{11} = \frac{\pm 1}{\sqrt{1 - (\beta^2/c^2)}}, \qquad a_{21} = \frac{\pm\beta}{c^2\sqrt{1 - (\beta^2/c^2)}}.$$

(The sign $\pm$ is determined by the sign of the determinant $(a_{11}a_{22} - a_{12}a_{21})$.) Substituting this back into the expression for the transformed velocity, we see that it is

$$\frac{(a_{11}/a_{22})v(s) + \beta}{(a_{21}/a_{22})v(s) + 1} = \frac{\pm v(s) + \beta}{\pm\bigl(v(s)\beta/c^2\bigr) + 1}.$$

Thus we see explicitly how the transformation law for velocities in Newtonian physics must be modified. Notice also that the condition that the transformation be real is $\beta^2 < c^2$. Then

$$\frac{v + \beta}{v(\beta/c^2) + 1}$$

cannot be greater than or equal to $c$ if $v < c$ and $\beta < c$. For then there would be (by continuity) a $v < c$ such that

$$\frac{v + \beta}{v(\beta/c^2) + 1} = c.$$

But this equation forces $v = c$.

Hence it is impossible to take a velocity less than c and transform it to be greater than c. In effect, c is an upper bound for the velocities possible in our



world. Further, the same argument (when reversed) shows that the result of applying a Lorentz transformation to a motion of velocity $c$ is again a motion of velocity $c$. These motions (physically, they are the paths of light rays, as we shall see in a moment) then have a distinguished role, both as a limiting possibility for the motions of velocity less than $c$ and as possessing the property that their velocity is invariant under Lorentz transformation. (This latter property is just the mathematical statement of the result of the famous Michelson-Morley experiment, which initiated the "relativistic" revolution in physics.) Another way of labeling these motions serves to introduce us to the notions of general relativity. Change our previous notations slightly, and let $L$ be the following Lagrangian on $(x, t)$-space:

$$L = c^2\dot t^2 - \dot x^2.$$
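The relations on the constants, the bound on transformed velocities, and the invariance of $L$ can be checked numerically. The following is only a sketch under stated assumptions: it takes the boost with the plus sign, $a_{11} = a_{22} = 1/\sqrt{1 - \beta^2/c^2}$, $a_{12} = \beta a_{22}$, $a_{21} = \beta a_{22}/c^2$, and the value `C = 2.0` and all function names are illustrative choices, not notation from the text.

```python
import math

C = 2.0  # illustrative value of the constant c (assumption; any c > 0 works)


def boost(beta, c=C):
    """Return (a11, a12, a21, a22) for the boost with frame velocity beta."""
    if abs(beta) >= c:
        raise ValueError("the transformation is real only for beta^2 < c^2")
    gamma = 1.0 / math.sqrt(1.0 - beta**2 / c**2)
    return gamma, gamma * beta, gamma * beta / c**2, gamma


def transform_velocity(v, beta, c=C):
    """Velocity (a11 v + a12) / (a21 v + a22) of the transformed curve."""
    a11, a12, a21, a22 = boost(beta, c)
    return (a11 * v + a12) / (a21 * v + a22)


def push_tangent(xdot, tdot, beta, c=C):
    """Apply the linear part of the boost to a tangent vector (xdot, tdot)."""
    a11, a12, a21, a22 = boost(beta, c)
    return a11 * xdot + a12 * tdot, a21 * xdot + a22 * tdot


def lagrangian(xdot, tdot, c=C):
    """L = c^2 tdot^2 - xdot^2 evaluated on a tangent vector."""
    return c**2 * tdot**2 - xdot**2


if __name__ == "__main__":
    a11, a12, a21, a22 = boost(1.2)
    # The three relations on the constants (expected near -1, c^2, 0).
    print(C**2 * a21**2 - a11**2, C**2 * a22**2 - a12**2,
          C**2 * a21 * a22 - a11 * a12)
    # A motion of velocity c keeps velocity c; slower motions stay below c.
    print(transform_velocity(C, 1.2), transform_velocity(0.9 * C, 0.9 * C))
```

Calling `push_tangent` and then `lagrangian` on any tangent vector returns the same value as `lagrangian` on the original vector (up to rounding), which is the statement that the Lorentz group preserves $L$.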

In terms of the jargon to be introduced in Part 3, $L$ defines a pseudo-Riemannian metric of Lorentz type, or a Lorentz metric, for short, on $(x, t)$-space. The curves whose velocities are less than $c$ are then just those on whose tangent vectors $L$ has a positive value. Such curves are also said to be timelike. The curves on whose tangent vectors $L$ has a negative value are spacelike, while those for which $L$ has the value zero are lightlike. Since the Lorentz group preserves $L$, it is geometrically obvious that it permutes timelike, spacelike, or lightlike curves.

The route we have taken in developing the "Special Theory of Relativity" by generalizing Newtonian mechanics is not historically the way it was discovered, nor is it even the most important from the general physical point of view. Actually, the main clue in the minds of the discoverers of the theory, Einstein, Lorentz, and Poincaré (listed in alphabetical order), was that the Lorentz group, not the Galilean group, permuted the solutions of Maxwell's equations of electromagnetism. In fact, in Whittaker's judgment [2], Poincaré rather than Einstein deserves most of the credit, since he saw most clearly that it was just this property of the Lorentz transformation that was involved. (Of course, as one of the two or three greatest figures in mathematics, indeed in all of science, in the nineteenth century, this is not surprising. As in so many other things, Poincaré was far ahead of his time, for this group-invariance point of view has only recently been absorbed into the mainstream of physics.)

While it would be too great a detour to describe Maxwell's equations here, perhaps it is worthwhile to give a primitive, one-dimensional version of the argument. First, we must describe what is meant by a wave. Restricting ourselves to one space dimension $x$, it may be described as a real-valued function $S(x, t)$. For a fixed $t$ and constant $a$, the points $S(x, t) = a$ may be called the wave fronts at time $t$. A curve $t \to x(t)$ is a ray if

$$S(x(t), t) = \text{constant for all } t.$$


Thus the ray is a curve following along the wave front. (In our one-dimensional situation, of course it is more or less uniquely determined.) The velocity $(dx/dt)(t)$ of the ray may be called the velocity of the wave at time $t$ and point $x(t)$. Now the functions $S$ describing ordinary light waves in a vacuum are those satisfying the wave equation:

$$\frac{\partial^2 S}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 S}{\partial t^2} = 0.$$

Suppose $\phi$ is a Lorentz transformation of $(x, t)$-space into itself. We leave it to the reader to verify that: If $S$ is a solution of the wave equation, so is $\phi^*(S)$. This is what is meant by the Lorentz transformation "permuting" the solutions of the wave equation or, more loosely, preserving the wave equation. Suppose now that $t \to x(t)$ is a ray associated with a given wave $S(x, t)$. Then

$$\frac{\partial S}{\partial x}(x(t), t)\,\frac{dx}{dt} + \frac{\partial S}{\partial t}(x(t), t) = 0.$$

First we assume without proof the uniqueness of solutions of the wave equation: If $S(x, t)$ is a solution, and $S(x, 0) = (\partial S/\partial t)(x, 0) = 0$ for all $x$, then $S$ is identically zero. (See Courant-Hilbert [1, p. 441] for a simple proof.) Thus, if $f(\,)$ and $g(\,)$ are functions of one variable such that

$$S(x, 0) = f(x) + g(x), \qquad \frac{\partial S}{\partial t}(x, 0) = c\bigl(f'(x) - g'(x)\bigr), \qquad f'(0) = 0,$$

then

$$S(x, t) = f(x + ct) + g(x - ct).$$

(Clearly, such functions can be found.) Now this "general" solution represents a superposition of waves traveling to the left and right on the $x$-axis. Clearly, only waves traveling strictly in one direction will possess a genuine "wave velocity" and a system of rays. Suppose, for example, that $S(x, t) = f(x - ct)$. Then

$$f'(x(t) - ct)\left(\frac{dx}{dt} - c\right) = 0,$$

or

$$\frac{dx}{dt} = c,$$

or

$$x = ct + a.$$
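As a rough numerical illustration of the last two steps, the sketch below checks by finite differences that a superposition of a left-moving and a right-moving profile satisfies the wave equation, and that a purely right-moving wave is constant along the curve $x = ct + a$. The profiles `f` and `g`, the value `C = 2.0`, and the step size are arbitrary assumptions made for the example.

```python
import math

C = 2.0  # illustrative wave speed (assumption)


def f(u):
    """Left-moving profile; an arbitrary smooth choice."""
    return math.sin(u)


def g(u):
    """Right-moving profile; an arbitrary smooth choice."""
    return math.exp(-u * u)


def S(x, t, c=C):
    """Superposition f(x + ct) + g(x - ct)."""
    return f(x + c * t) + g(x - c * t)


def right_moving(x, t, c=C):
    """A wave traveling strictly to the right: S(x, t) = g(x - ct)."""
    return g(x - c * t)


def wave_residual(wave, x, t, c=C, h=1e-3):
    """Central-difference estimate of S_xx - S_tt / c^2 at (x, t)."""
    s_xx = (wave(x + h, t) - 2.0 * wave(x, t) + wave(x - h, t)) / h**2
    s_tt = (wave(x, t + h) - 2.0 * wave(x, t) + wave(x, t - h)) / h**2
    return s_xx - s_tt / c**2


def ray(t, a, c=C):
    """The ray x = ct + a of a right-moving wave."""
    return c * t + a


if __name__ == "__main__":
    print(wave_residual(S, 0.3, 0.7))           # small, limited by the step h
    print(right_moving(ray(1.5, 0.3), 1.5), g(0.3))  # equal along the ray
```

The residual is not exactly zero because the derivatives are approximated; it shrinks as `h` is reduced until rounding error takes over.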



Thus we have completed our limited discussion of the connection between $c$ as the "wave velocity" of light waves and as the constant occurring as an upper bound for velocities (and in $E = mc^2$) in Einsteinian mechanics.

Special Relativity and Lie Group Theory

In the preceding section we developed the elements of what is usually known as the Theory of Special Relativity from the point of view of generalizing Newtonian mechanics so as to replace the symmetry group of classical mechanics (the Galilean group) by another group (the Lorentz group), which permits some mixing between space and time. It is appropriate to call this and its later extensions to general relativity, which generalize the Newtonian theory even further, "Einsteinian mechanics." In this way of developing the theory, Lie group theory plays only a subsidiary role, and mechanics (or the calculus of variations) is basic. We shall now present an alternate approach based more directly on Lie group theoretical considerations.

We shall still consider a space-time which, as used throughout the Theory of Special Relativity, is a manifold covered with a single coordinate system, diffeomorphic to Euclidean space. For the moment, we shall continue to suppose that space is only one-dimensional and that $(x, t)$ are the coordinates on this space-time manifold. As in the preceding section, the velocity of a curve $s \to (x(s), t(s))$ in space-time is, as a function of its parameter,

$$s \to \frac{dx/ds}{dt/ds} = v(s).$$

We saw that both the Galilean and Lorentz groups had the property that they permuted the curves with constant velocity. We shall now show that these are essentially the only possibilities if we want this and a reasonable physical property to hold. Suppose $\phi$ is a diffeomorphism of space-time, with

$$\phi^*(x) = f(x, t), \qquad \phi^*(t) = g(x, t).$$

Then the transformed curve $s \to \phi(x(s), t(s))$ is

$$s \to \bigl(f(x(s), t(s)),\; g(x(s), t(s))\bigr).$$



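The velocity of the transformed curve is then

$$\frac{(d/ds)\,f(x(s), t(s))}{(d/ds)\,g(x(s), t(s))} = \frac{(\partial f/\partial x)(dx/ds) + (\partial f/\partial t)(dt/ds)}{(\partial g/\partial x)(dx/ds) + (\partial g/\partial t)(dt/ds)} = \frac{(\partial f/\partial x)\,v(s) + (\partial f/\partial t)}{(\partial g/\partial x)\,v(s) + (\partial g/\partial t)}.$$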

Hence, if we want this to be constant whenever $v(s)$ is constant, it is clear that the coefficients must be constant; that is, $f$ and $g$ must be linear† functions of $x$ and $t$, say,

$$\phi^*(x) = a_{11}x + a_{12}t, \qquad \phi^*(t) = a_{21}x + a_{22}t,$$

so that the velocity of the transform of a curve moving with constant velocity $v$ is

$$v' = \frac{a_{11}v + a_{12}}{a_{21}v + a_{22}}.$$

As before, it is convenient to put $\beta = a_{12}/a_{22}$, the velocity of the transform of a curve with zero velocity, so that

$$v' = \frac{(a_{11}/a_{22})v + \beta}{(a_{21}/a_{22})v + 1}.$$

Now, when $\beta$ approaches zero, one expects (for physical reasons, if none other) that $v'$ approaches $v$. This indicates that the coefficients of $v$ in the numerator and denominator should be‡ functions of $\beta$, say,

$$v' = \frac{\alpha(\beta)v + \beta}{\gamma(\beta)v + 1},$$

with $\alpha(0) = 1$, $\gamma(0) = 0$. We now determine $\alpha$ and $\gamma$ by imposing the condition that the transformations we are considering form a group, which again is obviously physical. Suppose that $\beta_1$ is the parameter of another such transformation and that the result of composing the two transformations is a third transformation characterized by parameter $\beta_2$. A direct computation shows that

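First notice that $\beta_2 = \alpha(\beta_1)\beta + \beta_1$. Comparing the other two terms in the product, we have

$$\alpha(\alpha(\beta_1)\beta + \beta_1) = \alpha(\beta_1)\alpha(\beta) + \beta_1, \qquad \gamma(\alpha(\beta_1)\beta + \beta_1) = \gamma(\beta_1)\alpha(\beta) + \gamma(\beta).$$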

† For convenience, we shall consider only homogeneous transformations.

‡ More accurately, one expects that the transformations belong to a group. The condition that the transformations approach the identity as $\beta \to 0$ requires that the group be one-dimensional.



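Differentiate these relations with respect to $\beta$ and then set $\beta = 0$:

$$\alpha'(\beta_1)\alpha(\beta_1) = \alpha(\beta_1)\alpha'(0), \qquad \gamma'(\beta_1)\alpha(\beta_1) = \gamma(\beta_1)\alpha'(0) + \gamma'(0).$$

Also differentiate the first relation with respect to $\beta_1$, and then set $\beta_1 = 0$:

$$\alpha(\beta)\bigl(\alpha'(0)\beta + 1\bigr) = \alpha'(0)\alpha(\beta) + 1.$$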

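From this last relation, we have, after setting $\beta = 0$,

$$1 = \alpha'(0) + 1 \quad\text{or}\quad \alpha'(0) = 0.$$

Now changing $\beta_1$ to $\beta$, we have the following differential equations determining $\alpha$ and $\gamma$:

$$\alpha'(\beta)\alpha(\beta) = 0, \qquad \gamma'(\beta)\alpha(\beta) = \gamma'(0).$$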

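Now $\alpha(\beta) = 0$ is ruled out, since $\alpha(0) = 1$. Hence

$$\alpha'(\beta) = 0 \quad\text{or}\quad \alpha(\beta) = 1.$$

Thus, putting $\gamma'(0) = 1/c^2$, we have†

$$\gamma(\beta) = \frac{\beta}{c^2}.$$

This gives the law of transformation for velocities that we obtained earlier for the Lorentz group:

$$v' = \frac{v + \beta}{(\beta/c^2)v + 1}.$$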

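Now, we must determine the whole matrix

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.$$

As before, the conditions that this belong to a group and that it approach the identity as $\beta \to 0$ require that the coefficients be functions of $\beta$ also. Thus

$$a_{ij} = a_{ij}(\beta).$$

Now, from the fact that $\alpha(\beta) = 1$, that is, $\beta_2 = \beta_1 + \beta$, we see that

$$\beta \to \begin{pmatrix} a_{11}(\beta) & a_{12}(\beta) \\ a_{21}(\beta) & a_{22}(\beta) \end{pmatrix}$$

is a one-parameter group of 2 × 2 real matrices.

† We include the possibility $\gamma'(0) = 0$ by possibly allowing $c = \infty$.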



One more relation is needed to determine the matrix elements completely. This can be obtained by the following considerations: Let $D(\beta) = a_{11}a_{22} - a_{21}a_{12}$ be the determinant of the matrix. As for any one-parameter group of matrices, we have

$$D(\beta + \beta_1) = D(\beta)D(\beta_1);$$

hence,

$$D'(\beta) = D'(0)D(\beta).$$

Hence,

$$D(\beta) = e^{D'(0)\beta}.$$

We want to conclude that $D'(0) = 0$. We must impose an additional physical condition to do so: Let $s \to (x(s), t(s))$ be any curve in space-time. The "time interval" along the curve, say over the interval $0 \le s \le 1$, is just $t(1) - t(0)$. The transformed curve is

$$s \to \bigl(a_{11}x(s) + a_{12}t(s),\; a_{21}x(s) + a_{22}t(s)\bigr).$$

The "time interval" along this curve,† say, for $0 \le s \le 1$, is then just

$$a_{21}x(1) + a_{22}t(1) - a_{21}x(0) - a_{22}t(0).$$

Now this is in general not the same time interval as the original curve. Of course, as $\beta \to 0$, the time intervals become the same, namely,

$$a_{21}x(1) - a_{21}x(0) + t(1)(a_{22} - 1) - t(0)(a_{22} - 1) \to 0 \quad\text{as } \beta \to 0.$$

However, it is reasonable to suppose even more: namely, that this difference divided by $\beta$ approaches zero as $\beta \to 0$. This requires, of course, that

+

-+

It is readily verified that this condition leads to $D'(0) = 0$; hence

$$D(\beta) = 1 = a_{11}a_{22} - a_{12}a_{21} = a_{22}^2 - \frac{\beta^2}{c^2}a_{22}^2,$$

or

$$a_{22} = \frac{1}{\sqrt{1 - (\beta^2/c^2)}}.$$

† Mathematically, "time" is just a real-valued function on our manifold so that the "time interval" for the end points of two curves is just the difference of the values of this "time function" at the end points.



By reversing the reasoning of the preceding section, it is seen that these conditions show that $\phi$ is a Lorentz transformation. Thus we have succeeded in characterizing the Lorentz transformations by means of reasonable conditions.

This method bypasses mechanics. We can now reintroduce it in the following way. Suppose a particle of mass $m$ moves along a curve in space-time. Its Newtonian energy and momentum are, respectively,

$$E(s) = \frac{m}{2}\left(\frac{dx/ds}{dt/ds}\right)^2, \qquad M(s) = m\,\frac{dx/ds}{dt/ds}.$$

Considered as functions on velocity space, with the coordinate $v$,

$$E(v) = \tfrac{1}{2}mv^2, \qquad M(v) = mv.$$

Now a Galilean transformation $\phi$ of parameter $\beta$ introduces the translation $v \to v + \beta$ on velocity space. Thus

$$\phi^*(E)(v) = \frac{m}{2}(v + \beta)^2 = \frac{mv^2}{2} + mv\beta + \frac{m\beta^2}{2},$$

$$\phi^*(M)(v) = mv + m\beta = M(v) + m\beta.$$

Thus,

$$\phi^*(E) = E + \beta M + \frac{m\beta^2}{2}, \qquad \phi^*(M) = M + m\beta.$$
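The two transformation formulas say that the span of $E$, $M$, and the constant function 1 is carried into itself by a Galilean transformation. A minimal sketch of this, assuming the basis is ordered as $(E, M, 1)$ and written as a column vector, is given below; the mass value and the helper names `rep`, `matmul`, and `pullback_values` are hypothetical.

```python
import math

MASS = 3.0  # illustrative particle mass (assumption)


def energy(v, m=MASS):
    """Newtonian energy E(v) = m v^2 / 2."""
    return 0.5 * m * v * v


def momentum(v, m=MASS):
    """Newtonian momentum M(v) = m v."""
    return m * v


def rep(beta, m=MASS):
    """Matrix of the pullback by v -> v + beta in the basis (E, M, 1)."""
    return [[1.0, beta, 0.5 * m * beta**2],
            [0.0, 1.0, m * beta],
            [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def pullback_values(beta, v, m=MASS):
    """Apply rep(beta) to the column (E(v), M(v), 1)."""
    column = [energy(v, m), momentum(v, m), 1.0]
    return [sum(row[k] * column[k] for k in range(3)) for row in rep(beta, m)]


if __name__ == "__main__":
    v, beta = 0.4, 1.5
    print(pullback_values(beta, v))                  # via the matrix
    print([energy(v + beta), momentum(v + beta), 1.0])  # via direct translation
```

Multiplying `rep(b1)` by `rep(b2)` with `matmul` gives `rep(b1 + b2)` up to rounding, which is the homomorphism property of a linear representation of the translation group.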

This computation indicates the following group-theoretic interpretation of energy and momentum: The transform of any one of the functions $E$, $M$ and the constant function 1 under a Galilean transformation is a linear combination with constant coefficients of the functions themselves. Thus the mapping

$$\beta \to \begin{pmatrix} 1 & \beta & m\beta^2/2 \\ 0 & 1 & m\beta \\ 0 & 0 & 1 \end{pmatrix}$$

defines a linear representation of the Galilean group by 3 × 3 real matrices.

Perhaps it is worthwhile to pause and describe the general background of this sort of phenomenon. Suppose that a Lie group $G$ acts on a manifold $P$. The space $F(P)$ of all real-valued $C^\infty$ functions on $P$ forms a vector space


under pointwise addition and multiplication by real scalars. The action of $G$ on $P$ defines a linear representation of $G$ into the group of linear transformations on $F(P)$. For $f \in F(P)$, $g \in G$, the transform of $f$ by $g$ is defined just as $g^*(f)$. Now $F(P)$ is, of course, infinite dimensional. However, there may be linear subspaces of $F(P)$ that are finite dimensional and invariant under $G$ and hence define a finite dimensional representation of $G$. The case where $P$ is the velocity space used above, $G$ is the Galilean group (which is just the translation group on velocity space), and the subspace of $F(P)$ is spanned by $E$, $M$, and 1 is the case considered above.

Now, in general, it is a very difficult problem to decompose completely the representation of $G$ on $F(P)$. (Such a decomposition would be known as the "Plancherel theorem" for the action of $G$ on $P$.) However, the cases where it can be accomplished are very important: For example, if $P$ is the group $G$ itself, with a compact $G$ acting on itself by left translation, then the resulting decomposition is the Peter-Weyl theorem. The case where $P$ is the 2-sphere in 3-space, with $G$ the group $SO(3, R)$ of rotations, is very important in quantum mechanics. The irreducible finite dimensional subspaces of $F(P)$ are generated by letting $G$ act on the spherical harmonics.

One further general remark will be useful to us in extending these considerations to special relativity. Let $f_1, \ldots, f_n$ be linearly independent functions on $P$ that are transformed among themselves by the action of $G$. Explicitly,

$$g^*(f_i) = a_{ij}(g)f_j, \qquad 1 \le i, j, \ldots \le n \text{ (summation convention)}.$$

The mapping $g \to (a_{ij}(g))$ then just defines a linear representation of $G$ by $n \times n$ real matrices. (This is just the matrix representation obtained by choosing the basis $f_1, \ldots, f_n$ in the space of functions spanned by $f_1, \ldots, f_n$.) We know from general Lie theory that there is then at the "infinitesimal" level a corresponding linear representation of the Lie algebra of $G$ by $n \times n$ real matrices (Lie bracket going into the commutator $ab - ba$). This can be obtained explicitly as follows: Recall that an element of the Lie algebra of $G$ is a one-parameter subgroup of $G$, say, $t \to g(t)$. It, acting on $P$, has an infinitesimal generator $X$, which is a vector field on $P$. This correspondence defines a Lie algebra homomorphism of $\mathbf{G}$ into $V(P)$. Explicitly,

We then see that

The mapping $X \to (a_{ij}(X))$ is the desired linear representation of $\mathbf{G}$ by $(n \times n)$ real matrices.



Returning to the case where $P$ is velocity space, $G$ the Galilean group, we see that these infinitesimal relations take the form

$$X(E) = \frac{\partial E}{\partial v} = M, \qquad X(M) = \frac{\partial M}{\partial v} = m, \qquad X(1) = 0.$$

Note that we can separate out the roles played by $E$ and $M$ by additional relations:

$$X^2(M) = 0 = X^3(E), \qquad X^2(E) \ne 0.$$

Now turn to the Lorentz group. It, too, acts on velocity space:

$$v \to \frac{v + \beta}{(\beta/c^2)v + 1}.$$

Its Lie algebra is also one-dimensional: The infinitesimal generator is then

$$X = \left(1 - \frac{v^2}{c^2}\right)\frac{\partial}{\partial v}.$$

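Now, in the Galilean case,

$$X\begin{pmatrix} E \\ M \\ 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} E \\ M \\ 1 \end{pmatrix}.$$

The trick now is to replace this matrix by

$$\begin{pmatrix} 0 & 1 & 0 \\ 1/c^2 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}.$$

It is not possible here to explain in detail why this is the correct modification. However, it is related to the fact that in higher dimensions the Lorentz group is a semisimple Lie group, while the Galilean group is not. (Notice that the matrix

$$\begin{pmatrix} 0 & 1 & 0 \\ 1/c^2 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}$$

has three distinct eigenvalues, namely, $\lambda = 0$, $\lambda = \pm(1/c)$, while

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}$$

has only $\lambda = 0$ as a multiple eigenvalue. The effect of the perturbation by $c$ is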



to "split apart" these eigenvalues.) Now $E$ and $M$ satisfy the following conditions:

$$X(E) = M, \qquad X(M) = \frac{1}{c^2}E + m.$$

Hence $X^2(M) = (1/c^2)M$. This enables us to determine $M$ explicitly by a change of variable:

$$u = \frac{c}{2}\log\left(\frac{c + v}{c - v}\right).$$

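Notice that

$$\left(1 - \frac{v^2}{c^2}\right)\frac{d}{dv}\left(\frac{c}{2}\log\left(\frac{c + v}{c - v}\right)\right) = 1.$$

Let $M^*(u)$, $E^*(u)$ be the functions such that

$$M^*(u(v)) = M(v), \qquad E^*(u(v)) = E(v).$$

Thus, by this change of variable, $X$ goes over to $\partial/\partial u$, and $M^*$, $E^*$ satisfy

$$\frac{d^2M^*}{du^2} = \frac{1}{c^2}M^*, \qquad \frac{dE^*}{du} = M^*.$$

Hence,

$$M^*(u) = a_1e^{u/c} + a_2e^{-u/c}, \qquad E^*(u) = b + c\,(a_1e^{u/c} - a_2e^{-u/c}).$$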

Now

$$e^{u/c} = \left(\frac{c + v}{c - v}\right)^{1/2}.$$

Now we clearly want

$$M(0) = 0 = a_1 + a_2;$$

hence,

$$M(v) = \frac{2a_1v/c}{[1 - (v^2/c^2)]^{1/2}}.$$



Now

$$E^*(u) = b + ca_1(e^{u/c} + e^{-u/c}),$$

or

$$E(v) = b + ca_1\left[\left(\frac{c + v}{c - v}\right)^{1/2} + \left(\frac{c - v}{c + v}\right)^{1/2}\right] = b + \frac{2a_1c}{[1 - (v^2/c^2)]^{1/2}}.$$

We can determine the integration constant $b$ by using the relation

$$X(M)(0) = \frac{1}{c^2}E(0) + m = \frac{1}{c^2}(b + 2a_1c) + m = \frac{\partial M}{\partial v}(0) = \frac{2a_1}{c},$$

or $b = -mc^2$. To determine $a_1$, it seems necessary to impose an additional condition: For example, it seems reasonable that as $v \to 0$, relativistic effects should subside and that $\partial M/\partial v$ should approach its Newtonian value $m$. But $(\partial M/\partial v)(0) = 2a_1/c$. Finally, then,

$$M(v) = \frac{mv}{[1 - (v^2/c^2)]^{1/2}}, \qquad E(v) = -mc^2 + \frac{mc^2}{[1 - (v^2/c^2)]^{1/2}}.$$
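These closed forms can be checked against the two defining conditions $X(E) = M$ and $X(M) = E/c^2 + m$. The sketch below does this with a central difference, assuming the generator acts on functions of $v$ as $(1 - v^2/c^2)\,d/dv$; the constants `C` and `MASS` and the function names are illustrative.

```python
import math

C = 3.0     # illustrative value of c (assumption)
MASS = 2.0  # illustrative mass m (assumption)


def momentum(v):
    """M(v) = m v / sqrt(1 - v^2/c^2)."""
    return MASS * v / math.sqrt(1.0 - v * v / C**2)


def energy(v):
    """E(v) = -m c^2 + m c^2 / sqrt(1 - v^2/c^2)."""
    return -MASS * C**2 + MASS * C**2 / math.sqrt(1.0 - v * v / C**2)


def generator(func, v, h=1e-5):
    """Apply X = (1 - v^2/c^2) d/dv to func, using a central difference."""
    derivative = (func(v + h) - func(v - h)) / (2.0 * h)
    return (1.0 - v * v / C**2) * derivative


if __name__ == "__main__":
    v = 1.3
    print(generator(energy, v), momentum(v))                   # X(E) vs M
    print(generator(momentum, v), energy(v) / C**2 + MASS)     # X(M) vs E/c^2 + m
    small = 1e-3 * C
    print(energy(small), 0.5 * MASS * small**2)                # Newtonian limit
```

For velocities small compared with `C` the last line shows `energy` approaching the Newtonian value $\tfrac{1}{2}mv^2$, in line with the limiting argument used to fix $a_1$.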

Now we have determined $M$ and $E$ partly by requiring that they reduce to the Newtonian values as $c \to \infty$. However, there is nothing sacred about $E = \tfrac{1}{2}mv^2$ in the Newtonian case; $\tfrac{1}{2}mv^2 + \text{constant}$ would serve just as well. However, since no particular constant serves to simplify anything, we usually are content to let it be zero. It is quite different, however, in the relativistic case. Notice that redefining $E$ as $E'$, where

$$E' = \frac{mc^2}{[1 - (v^2/c^2)]^{1/2}},$$

gives the following law of transformations under infinitesimal Lorentz transformations:

$$X(M) = \frac{1}{c^2}E', \qquad X(E') = M.$$

This is obviously a considerably simpler transformation law than our original choice of $E$. Notice, for example, that the transformation law no longer involves $m$, but is determined completely by the underlying geometry. In fact,


let us compare this transformation law to the transformation law satisfied by the functions $x$ and $t$ on space-time. We have seen that a Lorentz transformation on space-time can be written in terms of $\beta$ as follows:

$$x \to \frac{x}{[1 - (\beta^2/c^2)]^{1/2}} + \frac{\beta t}{[1 - (\beta^2/c^2)]^{1/2}}, \qquad t \to \frac{\beta x}{c^2[1 - (\beta^2/c^2)]^{1/2}} + \frac{t}{[1 - (\beta^2/c^2)]^{1/2}}.$$

Hence the infinitesimal generator is the vector field (which we shall also call $X$, since it, too, is identified with the generator of the Lie algebra of the Lorentz group):

$$X = t\frac{\partial}{\partial x} + \frac{x}{c^2}\frac{\partial}{\partial t}.$$

Hence the functions $x$ and $t$ transform under a Lorentz transformation precisely in the same way as do $M$ and $E'$. This suggests the following geometric construction: Let $s \to (x(s), t(s)) = \sigma(s)$ be a curve in space-time, with velocity function $v(s) = (dx/ds)/(dt/ds)$. Define a vector field along $\sigma$ by assigning to $\sigma(s)$ the tangent vector

$$E'(v(s))\frac{\partial}{\partial t} + M(v(s))\frac{\partial}{\partial x}.$$

This is the "momentum-energy" vector field along the curve. This vector field has the following "covariance" property: If curves $\sigma$ and $\sigma_1$ correspond under a Lorentz transformation, the momentum-energy vector fields also correspond under the same Lorentz transformation. Notice that this behavior of energy and momentum together has no analog in Newtonian physics!
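The infinitesimal law $X(M) = E'/c^2$, $X(E') = M$ integrates to a finite transformation law that can be tested directly. The sketch below assumes the boost with the plus sign and the velocity law $v \to (v + \beta)/(1 + v\beta/c^2)$; under those assumptions the pair $(E', M)$ evaluated at the transformed velocity is obtained from the pair at $v$ by the same matrix that acts on $(x, t)$. Function names and the constants `C` and `MASS` are illustrative.

```python
import math

C = 3.0     # illustrative value of c (assumption)
MASS = 2.0  # illustrative mass m (assumption)


def gamma(w):
    """The factor 1 / sqrt(1 - w^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - w * w / C**2)


def momentum(v):
    """M(v) = m v / sqrt(1 - v^2/c^2)."""
    return MASS * v * gamma(v)


def energy_prime(v):
    """E'(v) = m c^2 / sqrt(1 - v^2/c^2)."""
    return MASS * C**2 * gamma(v)


def add_velocity(v, beta):
    """Action of the boost of parameter beta on velocity space."""
    return (v + beta) / (1.0 + v * beta / C**2)


def boosted_pair(v, beta):
    """Apply the (x, t) boost matrix to the pair (E'(v), M(v))."""
    g = gamma(beta)
    e_new = g * (energy_prime(v) + beta * momentum(v))
    m_new = g * (momentum(v) + beta * energy_prime(v) / C**2)
    return e_new, m_new


if __name__ == "__main__":
    v, beta = 0.6, 1.1
    w = add_velocity(v, beta)
    print(energy_prime(w), momentum(w))  # evaluated at the transformed velocity
    print(boosted_pair(v, beta))         # obtained by the linear boost
```

Both printed pairs agree up to rounding, which is the finite form of the statement that energy and momentum transform together.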

Variational Problems Admitting Given Groups of Symmetries

As we remarked in Part 1, there is often a connection between being able to solve a given system of differential equations "by quadratures," and the differential equations admitting a symmetry group of a certain algebraic structure, although it is difficult to make this precise. The differential equations arising from variational problems have a special structure, and this leads to a



further interesting relation to possible symmetry groups. Lacking a general theory, we shall restrict ourselves to sufficiently illustrative remarks.

Let $M$ be a manifold with a Lagrangian $L$ given on $M$. Let $\theta(L)$ be the Cartan 1-form on $T(M) \times R = P$. A vector field $Y$ on $P$ generates a symmetry group of $L$ if $Y(d\theta(L)) = 0$. Suppose that $Y_1, \ldots, Y_r$ are vector fields on $P$ satisfying this condition. Choose functions $f_1, \ldots, f_r$ on $P$ such that

$$df_\alpha = -Y_\alpha \,\lrcorner\, d\theta(L) \qquad \text{for } \alpha = 1, \ldots, r.$$

Then the $f_\alpha$ are constant on the characteristic curves of $d\theta(L)$; that is, they are "integrals of motion," in the classical sense, of the extremals of $L$. For each choice $c = (c_1, \ldots, c_r)$ of real constants, consider the "submanifold" of $P$ defined by

$$P^c = \{p \in P : f_1(p) = c_1, \ldots, f_r(p) = c_r\}. \qquad (16.12)$$

(Of course this need not be a submanifold. Most of our discussion will be only "generic," ignoring the possible singularities that may arise, and is intended only to cover the high points of the theory.) Thus the problem of finding the characteristic curves of $d\theta(L)$ can be "reduced" to the problem of finding the characteristic curves of $d\theta(L)$ restricted to each of the submanifolds $P^c$. Since this is a manifold of lower dimension, we have succeeded in reducing the difficulty involved in solving the differential equations that define the characteristic curves of $d\theta(L)$. However, this remark is independent of the algebraic structure of the Lie algebra generated by $Y_1, \ldots, Y_r$. Do certain algebraic structures lead to a further simplification? Choose indices $1 \le \alpha, \beta, \ldots \le r$, and the summation convention. Then

$$[Y_\alpha, Y_\beta] = k_{\alpha\beta\gamma}Y_\gamma.$$

The $k_{\alpha\beta\gamma}$ are the structure constants of the Lie algebra generated by the $Y_1, \ldots, Y_r$. If $Y$ is a vector field, then

$$Y(f_\alpha) = df_\alpha(Y) = d\theta(L)(Y, Y_\alpha).$$

If $Y(d\theta(L)) = 0$, and if $Y(f_\alpha)$ is expressible as a linear combination of $(f_1 - c_1), \ldots, (f_r - c_r)$, then $Y$ is tangent to the submanifold $P^c$ and hence provides an additional symmetry for the characteristic curves of $d\theta(L)$ that lie on $P^c$.

In particular, the elements $Y$ in the Lie algebra generated by $Y_1, \ldots, Y_r$ that are tangent to $P^c$ form a subalgebra that can be computed in a purely algebraic fashion. (For example, if the algebra as generated by $Y_1, \ldots, Y_r$ is Abelian, this subalgebra is the whole algebra.) This subalgebra acts as symmetries on the differential equations determining the characteristic curves



of $d\theta(L)$ that lie on $P^c$. The whole algorithm can then be iterated, with the subalgebra acting on $P^c$ instead of the whole algebra acting on $P$. If the process ends with a problem of finding the characteristic curves of a 2-form on a two-dimensional submanifold of $P$, we shall have succeeded in "integrating the characteristic curves of $d\theta(L)$ (hence the extremal curves of $L$) by quadratures," in the classical terminology.

As an illustration, let us consider an example of structure for the Lie algebra. First let us suppose $r = 2$:

$$[Y_1, Y_2] = kY_2.$$

(The Lie algebra is then solvable.) We shall also suppose that $Y_1(f_2) = kf_2$. Suppose $Y = a_1Y_1 + a_2Y_2$:

$$Y(f_1) = -a_2kf_2, \qquad Y(f_2) = a_1kf_2,$$

where $Y$ is tangent to $P^c$ if $c_2 = 0$. Thus, we see in the non-Abelian case ($k \ne 0$) that we can expect that only a one-parameter family of the submanifolds $P^c$ will admit a further group of symmetries.

All this applies if the $Y$ are prolongations of vector fields on $M$ that generate groups of symmetries of the extremals of $L$. However, the technique of finding normal forms for the vector fields is more useful for practical purposes in this case. Suppose, then, that $X_1, \ldots, X_r$ are vector fields on $M$ such that

$$\dot X_\alpha(L) = 0, \qquad \alpha = 1, \ldots, r.$$

First suppose that

$$[X_\alpha, X_\beta] = 0, \qquad 1 \le \alpha, \beta \le r,$$

and that $X_1(q), \ldots, X_r(q)$ are linearly independent for all $q \in M$. Then, it is easily seen by an extension of the argument used in Chapter 8, that coordinates $(x_1, \ldots, x_n)$ can be chosen for the open set of $M$ that we are working in, so that

$$X_\alpha = \frac{\partial}{\partial x_\alpha}, \qquad \alpha = 1, \ldots, r.$$

Then, also,

We conclude that $L$ in the coordinates is a function $L(x_{r+1}, \ldots, x_n, \dot x_1, \ldots, \dot x_n)$. In classical language, the coordinates $(x_1, \ldots, x_r)$ are ignorable.
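A standard illustration of an ignorable coordinate, not taken from the text, is planar motion in a central potential: in polar coordinates the Lagrangian $\tfrac{1}{2}(\dot r^2 + r^2\dot\theta^2) + 1/r$ does not involve $\theta$, so $\partial L/\partial\dot\theta = r^2\dot\theta$ should be constant along extremals. The sketch below integrates the equations of motion in Cartesian coordinates with a classical fourth-order Runge-Kutta step (an assumed, generic integrator) and monitors $x\dot y - y\dot x$, which equals $r^2\dot\theta$. Unit mass, the potential $-1/r$, the step size, and the initial data are all assumptions made for the example.

```python
import math


def acceleration(x, y):
    """Acceleration for the potential V = -1/r (unit mass, assumed)."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3


def derivative(state):
    """Right-hand side of the first-order system for (x, y, vx, vy)."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y)
    return (vx, vy, ax, ay)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))

    k1 = derivative(state)
    k2 = derivative(shifted(k1, dt / 2.0))
    k3 = derivative(shifted(k2, dt / 2.0))
    k4 = derivative(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, dt, steps):
    """Advance the state by the given number of RK4 steps."""
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


def p_theta(state):
    """Momentum conjugate to the ignorable angle: x*vy - y*vx = r^2 theta_dot."""
    x, y, vx, vy = state
    return x * vy - y * vx


if __name__ == "__main__":
    start = (1.0, 0.0, 0.0, 1.2)
    end = integrate(start, 0.005, 2000)
    print(p_theta(start), p_theta(end))  # nearly equal along the orbit
```

The integrator conserves the quantity only approximately; the small drift reflects discretization error, not a failure of the conservation law.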

Now

$$\theta(L) = L_{n+i}\,dx_i - H\,dt, \qquad \text{with } H = L_{n+i}\dot x_i - L.$$
215


We conclude that $\theta(L)(X_1) = L_{n+1}, \ldots, \theta(L)(X_r) = L_{n+r}$ are constant along the characteristic curves of $d\theta(L)$. $H$ is a function $H(x_{r+1}, \ldots, x_n, y_1, \ldots, y_n)$. Then put $y_i = L_{n+i}$, and set $y_1 = c_1, \ldots, y_r = c_r$. The Hamilton equations for $x_{r+1}(t), \ldots, x_n(t), y_{r+1}(t), \ldots, y_n(t)$ are then, for $i = r + 1, \ldots, n$,

$$\frac{dx_i}{dt} = H_{n+i}(x_{r+1}(t), \ldots, x_n(t), c_1, \ldots, c_r, y_{r+1}(t), \ldots, y_n(t)),$$

$$\frac{dy_i}{dt} = -H_i(x_{r+1}(t), \ldots, x_n(t), c_1, \ldots, c_r, y_{r+1}(t), \ldots, y_n(t)).$$

If $(x_{r+1}(t), \ldots, x_n(t), y_{r+1}(t), \ldots, y_n(t))$ is a solution of this Hamiltonian system in $2(n - r)$ variables (for each value of $(c_1, \ldots, c_r)$), then $x_1(t), \ldots, x_r(t)$ are determined by

$$x_i(t) = \int H_{n+i}(x_{r+1}(t), \ldots, x_n(t), c_1, \ldots, c_r, y_{r+1}(t), \ldots, y_n(t))\,dt$$

for $i = 1, \ldots, r$. Of course this is ideal if $(n - r) = 2$ because the resulting reduced Hamiltonian system can, of course, be further solved "by quadratures," since the Hamiltonian $H$ is still available to us as an integral of motion.

For each value of $c = (c_1, \ldots, c_r)$ is there a Lagrangian $R^c$ on $x_{r+1}, \ldots, x_n$-space whose extremal curves are the curves in $x_{r+1}, \ldots, x_n$-space that occur as the projection of solutions of the reduced Hamiltonian system? We mention the classical procedure for finding such an $R^c$:

$R^c(x_{r+1}, \ldots, x_n, \dot x_{r+1}, \ldots, \dot x_n)$ is the function such that

$R^c$ is called the Routhian. We refer to Whittaker [1] for a fuller discussion and for the solution of many problems using Routh's method.

Integrals of the Type of "Total Angular Momentum"

Let $M$ be a manifold on which a Lagrangian $L$ is given. We have discussed the general principle that the integrals of motion are associated with vector fields on $T(M)$, leaving $d\theta(L)$ invariant. One source of such vector fields that we have exploited is obtained by taking a vector field on $M$, which generates a group permuting the extremals of $L$, and prolonging it to $T(M)$. However, there is the possibility of groups acting on $T(M)$, leaving $d\theta(L)$ invariant, which do not arise as prolongations of vector fields on $M$. In this section, we



present one method of obtaining such symmetries. It will also give us an opportunity to illustrate our earlier statement, namely, that it is often useful to work with a basis for differential forms on an $M$ that does not arise from a coordinate system. Suppose, then, that $\omega_i$ ($1 \le i, j, \ldots \le n = \dim M$; summation convention) is a basis for 1-forms on $M$, with

$$d\omega_i = c_{jki}\,\omega_j \wedge \omega_k.$$

As before, we consider $\omega_i$ as forms on $T(M)$, pulling them up with the dual of the projection map $T(M) \to M$ without any special notation. Then $y_i$ denotes the functions $v \to y_i(v) = \omega_i(v)$ on $T(M)$. If $L \in F(T(M))$ is the Lagrangian, with $dL = L_i\omega_i + L_{n+i}\,dy_i$, then

$$\theta(L) = L_{n+i}\omega_i - H\,dt, \qquad \text{with } H = L_{n+i}y_i - L.$$

Let $X$ be a vector field on $T(M) \times R$ such that

$$X(t) = 0 = X(H) = X(L_{n+i}).$$

Then

$$X \,\lrcorner\, d\theta(L) = -\omega_i(X)\,dL_{n+i} + L_{n+i}c_{jki}\,\omega_j(X)\,\omega_k.$$

$X$ will generate symmetries of the characteristic curves of $d\theta(L)$ if $d(X \,\lrcorner\, d\theta(L)) = 0$. There is one important case (to which we shall restrict ourselves) where such a choice can be made:

$$\omega_j(X) = L_{n+j}.$$

Suppose, for example, we assume that $(c_{jki})$ is skew-symmetric in all three indices, and that $L$ is a nonhomogeneous, regular Lagrangian; that is, $\det(L_{n+i,n+j}) \ne 0$. Thus $X \,\lrcorner\, d\theta(L) = -L_{n+i}\,dL_{n+i}$. Hence, with these assumptions, $I = \tfrac{1}{2}L_{n+i}L_{n+i}$ is an integral of the characteristic curves of $d\theta(L)$. We have not yet established that such an $X$ exists. Now

$$0 = X(H) = X(L_{n+i})y_i + L_{n+i}X(y_i) - L_i\omega_i(X) - L_{n+i}X(y_i) = -L_iL_{n+i}.$$

Thus the relations

$$L_iL_{n+i} = 0$$

must be considered as the conditions for the existence of this new integral of motion, which in the classical rotating rigid body problem is the integral of "total angular momentum."



Rigid Bodies Treated Group-Theoretically The configuration space for a rigid body in Euclidean 3-space, with one point fixed, is just the group of 3 x 3 real orthogonal matrices of determinant 1. (For consider the fixed point of the body as the center of the coordinate system, and consider a fixed orthogonal coordinate system. The position of the body is evidently determined by the position of another orthogonal coordinate system fixed in the body. Two such coordinate systems are related by a 3 x 3 orthogonal matrix. Since the moving coordinate system can be deformed into its original fixed position, and an orthogonal matrix always has determinant k l , it is clear that we are interested only in orthogonal matrices of determinant 1.) The traditional treatment of rigid-body dynamics usually is designed to mask the fact that the configuration space is the manifold of 3 x 3 real orthogonal matrices, that is, the underlying manifold of a Lie group. To restore some balance to this situation, we shall treat things strictly from the grouptheoretical point of view, purposely looking for variational problems that can be solved easily by using symmetry principles. We shall mention only very briefly the relation to the traditional rigid-body problems. Let G be a Lie group. We have defined the Lie algebra of G, usually denoted by G , as the set of one-parameter subgroups of G, and have justified the name “Lie algebra” by showing how the sum and Jacobi bracket of two one-parameter groups may be defined. If G acts a group of diffeomorphisms on a manifold M , we have also seen that, as the infinitesimal version” of the action, G acts a Lie algebra of vectorjields on M . The two most obvious examples of such an action are the action of G on itself by left and right translation. 
If a basis for $\mathbf{G}$ is chosen, the corresponding vector fields on G defined by these two actions form two bases for vector fields (“absolute parallelisms”) on G, and are, respectively, right- and left-invariant vector fields on G. The basis of 1-forms dual to the basis of left- (or right-) invariant vector fields on G is a basis for the vector space of left- (or right-) invariant differential 1-forms on G.† Conversely, giving a basis of left- (or right-) invariant 1-forms on G serves to fix the basis of left- (or right-) invariant vector fields by duality. Most Lie groups can be realized simply as subgroups of GL(n, R), the group of all real $n \times n$ invertible matrices. Hence, a basis for the left- or right-invariant vector fields of such a group G can usually be most easily found by finding a basis for left- or right-invariant vector fields on GL(n, R), and then finding the subspace of these vector fields that are tangent to G. The general theory then tells us that this subspace defines a basis for the left- or right-invariant vector fields on G.

† A differential form on a Lie group that is invariant under left or right translation is often called a Cartan-Maurer form.

We must compute the left- and right-invariant differential forms on GL(n, R). Choose the following range of indices and summation conventions: $1 \le i, j, \ldots \le n$; $x_{ij}$ will denote the functions on GL(n, R) which assign the entry in the ith row and jth column to a matrix in GL(n, R). Thus the whole matrix is $(x_{ij})$. The functions that assign the (i, j)th entry to the inverse matrix of an element of GL(n, R) are denoted $x^{-1}_{ij}$. Thus

$$x_{ij}\,x^{-1}_{jk} = \delta_{ik} = x^{-1}_{ij}\,x_{jk}.$$

Now the following forms define a basis for left-invariant forms,

$$\omega_{ij} = x^{-1}_{ik}\,dx_{kj}, \tag{16.13}$$

while the following forms define a basis for right-invariant ones,

$$\omega'_{ij} = dx_{ik}\,x^{-1}_{kj}. \tag{16.14}$$
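The invariance properties claimed for (16.13) and (16.14) can be checked on a concrete example. The following is a minimal sketch, not taken from the text: it works in GL(2, R) with exact rational arithmetic, represents a tangent vector at g by a matrix dg, and assumes that left translation by h sends (g, dg) to (hg, h dg) while right translation sends it to (gh, dg h). All helper names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_2x2(m):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def left_form(g, dg):
    """Values of x^{-1} dx on the tangent vector dg at g, as in (16.13)."""
    return matmul(inverse_2x2(g), dg)


def right_form(g, dg):
    """Values of dx x^{-1} on the tangent vector dg at g, as in (16.14)."""
    return matmul(dg, inverse_2x2(g))


def rational(rows):
    return [[Fraction(x) for x in row] for row in rows]


g = rational([[2, 1], [1, 1]])    # a point of GL(2, R)
dg = rational([[1, 3], [-2, 5]])  # an arbitrary tangent vector at g
h = rational([[1, 2], [3, 7]])    # the element used for translation

left_invariant = left_form(matmul(h, g), matmul(h, dg)) == left_form(g, dg)
right_invariant = right_form(matmul(g, h), matmul(dg, h)) == right_form(g, dg)
# Under right translation the left-invariant forms are conjugated by h,
# so in general they do change.
left_form_right_translated = left_form(matmul(g, h), matmul(dg, h))
print(left_invariant, right_invariant,
      left_form_right_translated == left_form(g, dg))
```

Exact fractions are used so that the comparisons are genuine equalities rather than floating-point approximations.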

The orthogonal group, O(n, R), consists of all matrices whose inverse is equal to their transpose, and is thus a closed subgroup of GL(n, R). SO(n, R), the rotation group, consists of all orthogonal matrices of determinant 1. (Recall that the orthogonality condition requires that the determinant be $\pm 1$.) Hence, from the general theory of Lie groups sketched in Chapter 10, one deduces that:
(a) O(n, R) is a closed Lie subgroup of GL(n, R).
(b) SO(n, R) is the connected component containing the identity of O(n, R); hence it is an invariant closed Lie subgroup of O(n, R).

(Prove these facts directly, as an exercise!) Now O(n, R), as a submanifold of GL(n, R), is determined by the relations

$$x_{ki}\,x_{kj} = \delta_{ij}.$$

Differentiating these relations, we have, on SO(n, R),

$$dx_{ki}\,x_{kj} + x_{ki}\,dx_{kj} = 0.$$

Restricting to SO(n, R) (with $\omega_{ij}$ and $\omega'_{ij}$ given by (16.13) and (16.14)),

$$\omega_{ij} = x_{ki}\,dx_{kj} = -x_{kj}\,dx_{ki} = -\omega_{ji}.$$

Similarly, one proves that $(\omega'_{ij})$ are skew-symmetric in i and j. Now it is readily verified by counting that

$$\dim SO(n, R) = \frac{n(n-1)}{2},$$



that is, just the number of pairs (i, j) with $1 \le i < j \le n$. We conclude: The $\omega_{ij}$ (resp. $\omega'_{ij}$) with $1 \le i < j \le n$, when restricted to SO(n, R) from GL(n, R) (where they were originally defined by (16.13), (16.14)), form a basis for the left-invariant (resp. right-invariant) 1-forms on SO(n, R).

Let us now return to the case n = 3. We have seen that SO(3, R) can be identified with the configuration space of a rigid body moving in three-dimensional Euclidean space with one point fixed, say, the origin. Recall that this is obtained by choosing a fixed orthonormal basis $(e_1^0, e_2^0, e_3^0)$ of $R^3$, and choosing one orthonormal basis $(e_1, e_2, e_3)$ that is fixed in the body. There is a unique orthogonal matrix $(x_{ij})$ such that $e_i = x_{ij}\,e_j^0$. This allows us to identify the given “configuration” of the body with the matrix $(x_{ij})$. Now a physical motion of the body as a function of time is determined by functions of time: $e_i(t)$. This allows us to define the curve $(x_{ij}(t))$ in SO(3, R) by $e_i(t) = x_{ij}(t)\,e_j^0$.

What sort of Lagrangians on SO(3, R) are suited to describe the (Newtonian) dynamics of rigid-body motion with no external forces? By general principles (or perhaps, more truthfully, because it is the type we are used to), the Lagrangian should be, at each point q of configuration space, a quadratic form on the tangent vectors at that point. Further, the general symmetry of Newtonian mechanics under rigid motions tells us that this Lagrangian should be invariant under left translation on SO(3, R), since applying a left translation just means rotating each element of the rigid body by the same rotation. These two conditions severely restrict the choice of Lagrangian. To express this analytically, let us take advantage of the fact that

$$\dim SO(n, R) = \frac{n(n-1)}{2} = n \quad \text{(only for } n = 3\text{)}.$$
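The counting argument is simple enough to confirm mechanically. The sketch below, with hypothetical helper names, lists the index pairs (i, j) with 1 ≤ i < j ≤ n, compares their number with n(n − 1)/2, and searches a finite range of n for the cases in which this dimension equals n itself.

```python
def index_pairs(n):
    """All pairs (i, j) with 1 <= i < j <= n."""
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]


def dim_so(n):
    """The dimension n(n - 1)/2 of SO(n, R)."""
    return n * (n - 1) // 2


# The number of independent skew-symmetric forms matches the formula ...
counts_agree = all(len(index_pairs(n)) == dim_so(n) for n in range(1, 21))
# ... and within the searched range the dimension equals n only for n = 3.
coincidences = [n for n in range(1, 101) if dim_so(n) == n]
print(index_pairs(3), counts_agree, coincidences)
```

For n = 3 the three pairs label the three independent forms, which is what makes the identification of the tangent space with $R^3$ possible.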

Thus the tangent space to SO(3, R) at a point, say, the identity element, can be identified with $R^3$. To do this explicitly, put

$$\omega_1 = \omega_{12}, \qquad \omega_2 = \omega_{13}, \qquad \omega_3 = \omega_{23}.$$

Then $(\omega_1, \omega_2, \omega_3)$ form a basis for left-invariant forms on SO(3, R). Let $(y_1, y_2, y_3)$ be these forms regarded as functions on T(SO(3, R)). Evidently, then, the Lagrangian L representing a possible force-free rigid-body motion must be of the form

$$L = I_{ij}\,y_i\,y_j,$$



with constants $I_{ij}$ forming a symmetric matrix. Of course this matrix is nothing but the moment of inertia matrix of the body. We can obviously exploit the freedom to choose the fixed basis $(e_1^0, e_2^0, e_3^0)$ by choosing it so that the matrix $(I_{ij})$ takes a diagonal form; that is,

$$I_{ij} = 0 \ \text{ if } i \ne j, \qquad I_{ij} = I_i \ \text{ if } i = j.$$

The axes moving with the body are then the principal axes of the body, and L takes the form

$$L = I_1 y_1^2 + I_2 y_2^2 + I_3 y_3^2.$$

In accordance with general principles, the Lagrangian for a motion under forces derived from a potential function V defined on configuration space is then just

$$L = \tfrac{1}{2}\,(I_1 y_1^2 + I_2 y_2^2 + I_3 y_3^2) - V.$$

Then a curve $\sigma(t)$ is an extremal of L if it satisfies the general Euler equations (with respect to the basis of 1-forms) that were derived in the beginning of this chapter, namely,

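The first step in making these more explicit is to find $d\omega_i$:

$$d\omega_{ij} = d(x_{ki}\,dx_{kj}) = dx_{ki} \wedge dx_{kj} = dx_{ki} \wedge \delta_{kk'}\,dx_{k'j} = dx_{ki}\,x_{kj'}\,x_{k'j'} \wedge dx_{k'j} = \omega_{ij'} \wedge \omega_{j'j}.$$

Now

$$d\omega_1 = d\omega_{12} = \omega_{13} \wedge \omega_{32} = -\omega_{13} \wedge \omega_{23} = -\omega_2 \wedge \omega_3,$$

$$d\omega_2 = d\omega_{13} = \omega_{12} \wedge \omega_{23} = \omega_1 \wedge \omega_3,$$

$$d\omega_3 = d\omega_{23} = \omega_{21} \wedge \omega_{13} = -\omega_1 \wedge \omega_2.$$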

This gives the following values of the nonzero components of $(c_{jk}^{\,i})$, the structure constants of the Lie group SO(3, R):

$$c_{12}^{\,3} = -1, \qquad c_{13}^{\,2} = 1, \qquad c_{23}^{\,1} = -1.$$

Suppose $dV = V_i\,\omega_i$. Then

$$L_i = -V_i, \qquad L_{n+i} = I_i\,y_i \quad \text{(no summation)}.$$



If $y_i(t) = \omega_i(\sigma'(t))$, the Euler equations take the form

$$I_1 \frac{dy_1}{dt} = (I_3 - I_2)\,y_2 y_3 - V_1, \tag{16.15a}$$

$$I_2 \frac{dy_2}{dt} = (I_1 - I_3)\,y_1 y_3 - V_2, \tag{16.15b}$$

$$I_3 \frac{dy_3}{dt} = (I_2 - I_1)\,y_1 y_2 - V_3. \tag{16.15c}$$

These equations are given in every textbook on rigid-body mechanics. (Notice, then, that the configuration space variables do not appear except in the potential. The problem of finding the extremals $\sigma(t)$ can be divided into two parts: First find $y_i(t)$ by solving the three-dimensional system (16.15); then find $\sigma(t)$ as solution of the second three-dimensional system: $\omega_i(\sigma'(t)) = y_i(t)$.)

Except in the most trivial cases, solving (16.15) involves the theory of elliptic functions. (In fact, the applications to rigid-body problems were the principal impetus to the development of the theory of elliptic functions in the first half of the nineteenth century.) Let us suppose that V = 0, and inquire about the possible existence of integrals of motion of the system (16.15) that arise because of the underlying group invariance. First, the Hamiltonian

$$H = L_{n+i}\,y_i - L = \tfrac{1}{2}\,I_i\,y_i^2 = L$$

is such an integral. (It is also the total energy in the case of rotating rigid bodies.) We shall now look systematically for more integrals of the characteristic curves of $d\theta(L)$ that are functions of $y_1, \ldots, y_n$ alone. Now let us look for a function f of $(y_1, \ldots, y_n)$ alone, and a vector field Y on T(SO(3, R)) such that

$$df = Y \,\lrcorner\, d\theta(L) = Y \,\lrcorner\, (dL_{n+i} \wedge \omega_i + L_{n+i}\,c_{jk}^{\,i}\,\omega_j \wedge \omega_k - dH \wedge dt),$$

and Y(t) = 0. As conditions we have (16.16a) (16.16b) (16.16c)



Now, if Y satisfies these conditions,

$$Y(d\theta(L)) = d(Y(\theta(L))) = d\big(Y \,\lrcorner\, d\theta(L) + d(\theta(L)(Y))\big) = ddf + dd(\theta(L)(Y)) = 0,$$

so that Y generates a one-parameter group that permutes the characteristic curves of $d\theta(L)$. Conversely, if a vector field Y on T(SO(3, R)) satisfies Y(t) = 0, (16.16a), (16.16b), and

$$dL_{n+i} \wedge d(\omega_i(Y)) = 0, \tag{16.17}$$

then such an f exists, is an integral of the characteristic curves of $d\theta(L)$, and is a function of $y_1, \ldots, y_n$ alone. Writing out the condition of (16.17) in more detail, we have

$$L_{n+i,\,n+j}\,dy_j \wedge d(\omega_i(Y)) = 0 \tag{16.18}$$

(since L is a function of $y_1, \ldots, y_n$ alone). Now L is a regular, nonhomogeneous Lagrangian; that is,

$$\det(L_{n+i,\,n+j}) \ne 0.$$

The conditions then become: $\omega_i(Y)$ is a function of $y_1, \ldots, y_n$ alone.

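We shall not go into a deep analysis of these conditions here. Consider the simplest choice, namely,

$$\omega_i(Y) = a_{ij}\,y_j, \ \text{ with constant } a_{ij}; \qquad Y(y_i) = 0.$$

The condition on Y then becomes $L_{n+i,\,n+j}\,a_{ik}\,dy_j \wedge dy_k = 0$. Now

$$L_{n+i,\,n+j} = 0 \ \text{ if } i \ne j, \qquad L_{n+i,\,n+j} = I_i \ \text{ if } i = j.$$

Thus,

$$\sum_{i,k} I_i\,a_{ik}\,dy_i \wedge dy_k = 0, \quad \text{or} \quad I_i\,a_{ik} = I_k\,a_{ki} \quad \text{(no summation)}.$$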

Condition (16.16a) requires that

Now $(c_{jk}^{\,i})$ is skew-symmetric in all three indices. Then this condition can be realized by choosing

$$a_{ij} = I_i\,\delta_{ij} \quad \text{(no summation)}.$$



It is then clear that all desired conditions are satisfied.† Then

$$\omega_i(Y) = I_i\,y_i \quad \text{(no summation)};$$

hence,

$$df = \sum_i I_i^2\,y_i\,dy_i,$$

or

$$2f = (I_1 y_1)^2 + (I_2 y_2)^2 + (I_3 y_3)^2. \tag{16.19}$$

This is obviously an integral of (16.15) independent of the energy integral $I_1 y_1^2 + I_2 y_2^2 + I_3 y_3^2$ (unless, of course, $I_1 = I_2 = I_3$, which is the trivial case, since the right-hand side of (16.15) is identically zero anyway). Physically, it is the integral of total angular momentum for the rigid body, and the reader will readily verify that it is the integral found by more general arguments in the preceding section. These two integrals enable one to reduce (16.15) (remember that V = 0) to three separated first-order differential equations for $y_1, y_2, y_3$, which can be solved with elliptic functions. In fact, as we show later, following Tricomi [1], these equations can be used to define the Jacobi elliptic functions and derive their principal properties.
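The constancy of the two integrals can be observed numerically. The sketch below is an illustration only: it assumes the force-free equations in the form $I_1\,dy_1/dt = (I_3 - I_2)\,y_2 y_3$ and its cyclic permutations (the form in which they reappear as the Pfaffian system (17.9)), integrates them with a classical fourth-order Runge-Kutta step, and records the drift of the energy and of the squared angular momentum. The inertia values, initial data, and step size are arbitrary choices.

```python
def euler_rhs(y, inertia):
    """Right-hand side of the force-free Euler equations (V = 0)."""
    i1, i2, i3 = inertia
    y1, y2, y3 = y
    return ((i3 - i2) * y2 * y3 / i1,
            (i1 - i3) * y1 * y3 / i2,
            (i2 - i1) * y1 * y2 / i3)


def rk4_step(y, h, inertia):
    """One classical Runge-Kutta step of size h."""
    def shifted(slope, scale):
        return tuple(v + scale * s for v, s in zip(y, slope))

    k1 = euler_rhs(y, inertia)
    k2 = euler_rhs(shifted(k1, h / 2), inertia)
    k3 = euler_rhs(shifted(k2, h / 2), inertia)
    k4 = euler_rhs(shifted(k3, h), inertia)
    return tuple(v + h / 6 * (a + 2 * b + 2 * c + d)
                 for v, a, b, c, d in zip(y, k1, k2, k3, k4))


def energy(y, inertia):
    """The energy integral I_1 y_1^2 + I_2 y_2^2 + I_3 y_3^2."""
    return sum(i * v * v for i, v in zip(inertia, y))


def angular_momentum_squared(y, inertia):
    """The integral (I_1 y_1)^2 + (I_2 y_2)^2 + (I_3 y_3)^2 of (16.19)."""
    return sum((i * v) ** 2 for i, v in zip(inertia, y))


inertia = (1.0, 2.0, 3.0)  # arbitrary distinct principal moments
y = (1.0, 0.5, -0.3)       # arbitrary initial data
e0 = energy(y, inertia)
m0 = angular_momentum_squared(y, inertia)
for _ in range(2000):
    y = rk4_step(y, 0.005, inertia)
energy_drift = abs(energy(y, inertia) - e0)
momentum_drift = abs(angular_momentum_squared(y, inertia) - m0)
print(energy_drift, momentum_drift)
```

Both drifts stay at the level of the integration error, while the individual $y_i$ vary along the solution.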

The Euler Angles for a Rotating Rigid Body

We have seen that the Euler equations for the extremals of a left-invariant variational problem on SO(3, R) (or any Lie group for that matter) split up in two parts: To find an extremal curve $\sigma(t)$, first one solves for $y_i(t) = \omega_i(\sigma'(t))$, for i = 1, 2, 3 ($\omega_1, \omega_2, \omega_3$ a convenient basis for left-invariant forms), then finds $\sigma$ itself. Of course the question arises exactly how to describe curves on SO(3, R) in explicit terms, since it is a compact manifold and hence cannot be covered by a single coordinate system. At least two methods can be used. First, we have defined the left-invariant forms as

$$\omega_{ij} = x_{ki}\,dx_{kj} \qquad (1 \le i, j, k, \ldots \le 3;\ \text{summation convention}),$$

where $x_{ij}$ are the functions on SO(3, R) which to every matrix assign its (i, j)th entry. Of course the functions $x_{ij}$ are bound by the orthogonality conditions $x_{ij}\,x_{kj} = \delta_{ik}$. In principle, these relations could be solved for three independent functions to define a coordinate system for a piece of the manifold, but we can be certain that this would be too awkward to be of much value. However, if the “momentum functions” $y_{ij}(t) = \omega_{ij}(\sigma'(t))$ have already been found, the functions $x_{ij}(\sigma(t))$ that actually describe the extremal are obtained as solutions of

$$\frac{d}{dt}\,x_{ij}(\sigma(t)) = x_{ik}(\sigma(t))\,y_{kj}(t).$$

Now these form a system of linear, ordinary, time-dependent differential equations for the functions $t \to x_{ij}(\sigma(t))$, so there are certainly methods available to solve them, although they, too, are probably not very practical for computations or for predicting the qualitative properties of the extremals.

† In making these choices, we must confess that we have been guided by knowing the answer via the analogy with rigid-body dynamics. The reader is invited to try to work out the necessary conditions to a conclusion.

The second method proceeds by introducing a coordinate system for a piece of SO(3, R) that is well adapted to describing the group structure of SO(3, R) (hence is also well adapted to the physics, since the physics and the group theory more or less coincide), namely, the Euler angles. They can be described group-theoretically. Consider the set of matrices:

$$A(\theta_1) = \begin{pmatrix} \cos\theta_1 & \sin\theta_1 & 0 \\ -\sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad -\infty < \theta_1 < \infty. \tag{16.20}$$

They form a one-parameter subgroup of SO(3, R); in fact, each one just represents a rotation about the $x_3$-axis of angle $\theta_1$. Similarly, consider the one-parameter group of rotations about the $x_1$-axis:

$$B(\theta_2) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_2 & \sin\theta_2 \\ 0 & -\sin\theta_2 & \cos\theta_2 \end{pmatrix}, \qquad -\infty < \theta_2 < \infty. \tag{16.21}$$

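It is seen now that each orthogonal matrix (of determinant 1) can be written as a product, $A(\theta_1)B(\theta_2)A(\theta_3)$, of three rotations about the two axes. It can be verified that this representation is unambiguous for a certain open subset of SO(3, R) and for $\theta_1, \theta_2, \theta_3$ suitably restricted. Thus $\theta_1$, $\theta_2$, and $\theta_3$ serve as a coordinate system for a piece of SO(3, R). In fact, a suitable (but tedious) calculation shows that

$$\omega_1 = \sin\theta_2 \sin\theta_3\,d\theta_1 + \cos\theta_3\,d\theta_2, \tag{16.22a}$$

$$\omega_2 = \sin\theta_2 \cos\theta_3\,d\theta_1 - \sin\theta_3\,d\theta_2, \tag{16.22b}$$

$$\omega_3 = \cos\theta_2\,d\theta_1 + d\theta_3. \tag{16.22c}$$

Notice, for example, that these forms are not independent for $\theta_1 = 0 = \theta_2 = \theta_3$, so the Jacobian of the map

$$(\theta_1, \theta_2, \theta_3) \to A(\theta_1)B(\theta_2)A(\theta_3)$$

is zero at $\theta_1 = 0 = \theta_2 = \theta_3$. Now, the Lagrangian of the left-invariant variational problem is just

$$L = I_1(\sin\theta_2 \sin\theta_3\,\dot\theta_1 + \cos\theta_3\,\dot\theta_2)^2 + I_2(\sin\theta_2 \cos\theta_3\,\dot\theta_1 - \sin\theta_3\,\dot\theta_2)^2 + I_3(\cos\theta_2\,\dot\theta_1 + \dot\theta_3)^2.$$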
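The Euler-angle parametrization and its degeneracy can be illustrated numerically. The following sketch assumes the matrices A and B in the form of rotations about the $x_3$- and $x_1$-axes with the sign conventions of (16.20) and (16.21); the function names are hypothetical. It checks that a product $A(\theta_1)B(\theta_2)A(\theta_3)$ is orthogonal of determinant 1, and that for $\theta_2 = 0$ the product depends only on $\theta_1 + \theta_3$, which is one way to see why the Jacobian of the parametrization must vanish there.

```python
import math


def rot_x3(theta):
    """A(theta): rotation about the x_3-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x1(theta):
    """B(theta): rotation about the x_1-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [list(column) for column in zip(*a)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


def euler(t1, t2, t3):
    """The product A(t1) B(t2) A(t3)."""
    return matmul(matmul(rot_x3(t1), rot_x1(t2)), rot_x3(t3))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
g = euler(0.4, 1.1, -0.7)
orthogonality_error = max_abs_diff(matmul(transpose(g), g), identity)
determinant = det3(g)
# With theta_2 = 0 only the sum theta_1 + theta_3 matters, so two different
# coordinate triples give the same group element.
degeneracy_error = max_abs_diff(euler(0.3, 0.0, 0.5), euler(0.1, 0.0, 0.7))
print(orthogonality_error, determinant, degeneracy_error)
```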

Thus $\theta_1$ does not appear explicitly in L. In fact, this is just the coordinate system chosen so that the infinitesimal generator of left translation by $A(\theta)$ is just $\partial/\partial\theta_1$. Note that the infinitesimal generator of right translation by $A(\theta)$ is just $\partial/\partial\theta_3$. Thus the condition that the rigid body be symmetric about the $x_3$-axis is that the Lagrangian not depend on $\theta_3$ either. Clearly, the condition for this is $I_1 = I_2$. Let us now compute the Hamiltonian for the Lagrangian $L = I_1 y_1^2 + I_2 y_2^2 + I_3 y_3^2$. Suppose the $L_{n+i}$ and $L'_{n+i}$ are such that

$$dL = L_{n+i}\,dy_i + \cdots; \qquad dL = L'_{n+i}\,d\dot\theta_i + \cdots.$$

Now $L_{n+i} = 2 I_i y_i$ (no summation). The $\dot\theta_i$ are related to the $y_i$ by solving (16.22) and making the substitutions $\omega \to y$, $d\theta \to \dot\theta$.

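$$\dot\theta_1 = \frac{\sin\theta_3\,y_1 + \cos\theta_3\,y_2}{\sin\theta_2}, \qquad \dot\theta_2 = \cos\theta_3\,y_1 - \sin\theta_3\,y_2, \qquad \dot\theta_3 = y_3 - \cot\theta_2\,(\sin\theta_3\,y_1 + \cos\theta_3\,y_2).$$

Thus,

$$L_{n+1} = L'_{n+1}\,\frac{\sin\theta_3}{\sin\theta_2} + L'_{n+2}\cos\theta_3 - L'_{n+3}\cot\theta_2 \sin\theta_3,$$

$$L_{n+2} = L'_{n+1}\,\frac{\cos\theta_3}{\sin\theta_2} - L'_{n+2}\sin\theta_3 - L'_{n+3}\cot\theta_2 \cos\theta_3,$$

$$L_{n+3} = L'_{n+3}.$$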
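The passage between the velocities $\dot\theta_i$ and the functions $y_i$ is easy to get wrong by a sign, so a round-trip check is worthwhile. The sketch below assumes the relations in the form $y_1 = \sin\theta_2 \sin\theta_3\,\dot\theta_1 + \cos\theta_3\,\dot\theta_2$, $y_2 = \sin\theta_2 \cos\theta_3\,\dot\theta_1 - \sin\theta_3\,\dot\theta_2$, $y_3 = \cos\theta_2\,\dot\theta_1 + \dot\theta_3$ (that is, (16.22) with $\omega \to y$, $d\theta \to \dot\theta$) and verifies that the inverse formulas recover the original velocities at a sample point with $\sin\theta_2 \ne 0$. The function names and sample values are illustrative.

```python
import math


def y_from_theta_dot(theta, theta_dot):
    """Apply (16.22) with omega -> y and d(theta) -> theta_dot."""
    _, t2, t3 = theta
    d1, d2, d3 = theta_dot
    return (math.sin(t2) * math.sin(t3) * d1 + math.cos(t3) * d2,
            math.sin(t2) * math.cos(t3) * d1 - math.sin(t3) * d2,
            math.cos(t2) * d1 + d3)


def theta_dot_from_y(theta, y):
    """Inverse relations; valid only where sin(theta_2) is nonzero."""
    _, t2, t3 = theta
    y1, y2, y3 = y
    planar = math.sin(t3) * y1 + math.cos(t3) * y2
    return (planar / math.sin(t2),
            math.cos(t3) * y1 - math.sin(t3) * y2,
            y3 - planar * math.cos(t2) / math.sin(t2))


theta = (0.3, 1.2, -0.8)      # sample Euler angles
theta_dot = (0.7, -1.5, 2.0)  # sample velocities
recovered = theta_dot_from_y(theta, y_from_theta_dot(theta, theta_dot))
round_trip_error = max(abs(a - b) for a, b in zip(recovered, theta_dot))
print(round_trip_error)
```

The division by $\sin\theta_2$ is the analytic trace of the coordinate singularity at $\theta_2 = 0$.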

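Now, if $p_i = L_{n+i}$, the Hamiltonian H is just L written in terms of the p (since L is a quadratic Lagrangian):

$$H = \frac{p_1^2}{4I_1} + \frac{p_2^2}{4I_2} + \frac{p_3^2}{4I_3}.$$

If $p_i' = L'_{n+i}$, we know from earlier work that the Hamiltonian for L in the $\theta_1, \theta_2, \theta_3$ coordinate system is obtained by simply substituting in the values of $p_i = L_{n+i}$ in terms of $p_i' = L'_{n+i}$:

$$H = \frac{1}{4I_1}\Big(p_1'\,\frac{\sin\theta_3}{\sin\theta_2} + p_2'\cos\theta_3 - p_3'\cot\theta_2 \sin\theta_3\Big)^2 + \frac{1}{4I_2}\Big(p_1'\,\frac{\cos\theta_3}{\sin\theta_2} - p_2'\sin\theta_3 - p_3'\cot\theta_2 \cos\theta_3\Big)^2 + \frac{p_3'^2}{4I_3}.$$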



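For example, look again at the case $I_1 = I_2$; that is, suppose that the variational problem is invariant under left and right translation by $A(\theta)$. Then

$$H = \frac{1}{4I_1}\Big(\frac{(p_1' - p_3'\cos\theta_2)^2}{\sin^2\theta_2} + p_2'^2\Big) + \frac{p_3'^2}{4I_3}.$$

As predicted by the general theory, the Hamiltonian does not depend on $\theta_1$ and $\theta_3$, so $p_1'$ and $p_3'$ are constants, say, $C_1$ and $C_3$. Then the Hamilton equations for $\theta_2$, $p_2'$ give

$$\frac{d\theta_2}{dt} = \frac{1}{2I_1}\,p_2'.$$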
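That the Hamiltonian loses its dependence on $\theta_3$ when $I_1 = I_2$ can be confirmed numerically. The sketch below is only an illustration under stated assumptions: it takes $p_1 = p_1' \sin\theta_3/\sin\theta_2 + p_2' \cos\theta_3 - p_3' \cot\theta_2 \sin\theta_3$, $p_2 = p_1' \cos\theta_3/\sin\theta_2 - p_2' \sin\theta_3 - p_3' \cot\theta_2 \cos\theta_3$, $p_3 = p_3'$ and $H = \sum_i p_i^2/4I_i$, and compares this with the reduced expression $\big((p_1' - p_3'\cos\theta_2)^2/\sin^2\theta_2 + p_2'^2\big)/4I_1 + p_3'^2/4I_3$ for several values of $\theta_3$. Function names and numerical values are hypothetical.

```python
import math


def body_momenta(theta2, theta3, p_prime):
    """The p_i expressed through the momenta p_i' conjugate to the angles."""
    p1p, p2p, p3p = p_prime
    s2, c2 = math.sin(theta2), math.cos(theta2)
    s3, c3 = math.sin(theta3), math.cos(theta3)
    cot2 = c2 / s2
    return (p1p * s3 / s2 + p2p * c3 - p3p * cot2 * s3,
            p1p * c3 / s2 - p2p * s3 - p3p * cot2 * c3,
            p3p)


def hamiltonian(theta2, theta3, p_prime, inertia):
    """H = sum of p_i^2 / (4 I_i), assuming L = sum of I_i y_i^2."""
    momenta = body_momenta(theta2, theta3, p_prime)
    return sum(p * p / (4.0 * i) for p, i in zip(momenta, inertia))


def symmetric_hamiltonian(theta2, p_prime, i1, i3):
    """Reduced form of H when I_1 = I_2."""
    p1p, p2p, p3p = p_prime
    s2, c2 = math.sin(theta2), math.cos(theta2)
    return (((p1p - p3p * c2) ** 2 / s2 ** 2 + p2p ** 2) / (4.0 * i1)
            + p3p ** 2 / (4.0 * i3))


p_prime = (0.9, -0.4, 1.3)
theta2 = 1.0
reduced = symmetric_hamiltonian(theta2, p_prime, 2.0, 5.0)
symmetric_error = max(
    abs(hamiltonian(theta2, t3, p_prime, (2.0, 2.0, 5.0)) - reduced)
    for t3 in (-2.0, -0.5, 0.0, 0.7, 1.9))
# For I_1 != I_2 the dependence on theta_3 does not cancel.
asymmetric_spread = abs(hamiltonian(theta2, 0.0, p_prime, (2.0, 3.0, 5.0))
                        - hamiltonian(theta2, 0.7, p_prime, (2.0, 3.0, 5.0)))
print(symmetric_error, asymmetric_spread)
```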

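Also, we know from “conservation of energy” (that is, the fact that the Hamiltonian is a constant of motion) that

$$\frac{C_1^2}{4I_1 \sin^2\theta_2} + \frac{4I_1^2}{4I_1}\Big(\frac{d\theta_2}{dt}\Big)^2 + \frac{C_3^2 \cot^2\theta_2}{4I_1} - \frac{2C_1C_3 \cos\theta_2}{4I_1 \sin^2\theta_2} + \frac{C_3^2}{4I_3} = \text{constant} = E,$$

or

$$\Big(\frac{d\theta_2}{dt}\Big)^2 = -\frac{1}{4I_1^2} \times \frac{C_1^2 + C_3^2 \cos^2\theta_2 - 2C_1C_3 \cos\theta_2}{\sin^2\theta_2} + \frac{E - (C_3^2/4I_3)}{I_1}.$$

Change variables to $x = \cos\theta_2$. Then

$$I_1^2\Big(\frac{dx}{dt}\Big)^2 = \Big(E - \frac{C_3^2}{4I_3}\Big)\,I_1\,(1 - x^2) - \tfrac{1}{4}\,(C_1 - C_3 x)^2,$$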

which can certainly be solved in elementary terms without elliptic functions. The most important point is to compute the roots of the second-degree polynomial on the right-hand side. The solution will then oscillate between these limits. If our rigid body is a top, $\theta_2$ will measure the angle between its $x_3$-axis (in a coordinate system fixed in the top) and the fixed space $x_3$-axis. This leads to the typical rising and falling motion of the top. In fact, we see that we can add a potential-energy term of the form $V(\cos\theta_2)$ to our Lagrangian without affecting this qualitative picture of the motion. By the general principles, this merely adds $V(\cos\theta_2)$ to the Hamiltonian: $p_1'$ and $p_3'$ remain constants of motion, and x(t) is again determined by an equation of the form

$$\Big(\frac{dx}{dt}\Big)^2 = f(x).$$

Hence the solution will oscillate between two roots of f(x) if it starts out between them. For example, if $V(\cos\theta_2) = a\cos\theta_2$ (corresponding to the example of the “heavy symmetrical top,” which is found in every textbook on rigid-body mechanics; for example, Goldstein [1]), the effect is to make f(x) a cubic polynomial in x, requiring the introduction of elliptic functions. Once we have found $x(t) = \cos\theta_2(t)$, the other Euler angles $\theta_1$, $\theta_3$ can be found by a quadrature from the Hamilton equations:

$$\frac{d\theta_1}{dt} = \frac{\partial H}{\partial p_1'} = \frac{1}{4I_1}\Big(\frac{2C_1 - 2C_3 x}{1 - x^2}\Big),$$

$$\frac{d\theta_3}{dt} = \frac{\partial H}{\partial p_3'} = \frac{1}{4I_1}\Big(\frac{2C_3 x^2 - 2C_1 x}{1 - x^2}\Big) + \frac{C_3}{2I_3}.$$

We can now summarize the qualitative features of our discussion: We have considered variational problems of Newtonian type on a manifold M that admits a relatively large group, namely, left and right translation by SO(3, R). However, the largest Abelian subgroups of this group are just two-dimensional. The Euler angles define a coordinate system in which one of these two-dimensional Abelian subgroups takes its normal form, this form being the natural coordinate system for discussing variational problems that admit the Abelian groups as symmetry groups. (All these two-dimensional Abelian groups are conjugate within the big group, so the seemingly arbitrary choice of one of them really does not matter. In terms of the theory of compact Lie groups, these two-dimensional Abelian subgroups are Cartan subgroups of SO(3, R) × SO(3, R).) These variational problems are “integrable by quadratures,” and in fact form most of the classical problems of rigid-body mechanics that have been found to be “integrable by quadratures” (except for the case discovered by S. Kovalewska, which does not seem to be explicable group-theoretically; see Golubev [1] for a full discussion). The case of a variational problem admitting left translation by SO(3, R) as a symmetry group (the rigid body with no external forces) seems to be a typical problem admitting a non-Abelian group of symmetries, a class of problems on which more research needs to be done. (The maximal Abelian subgroups of SO(3, R) are just one-dimensional, that is, the one-parameter subgroups, so that left invariance provides only a one-dimensional Abelian group of symmetries, which is not enough for “integrability by quadratures.”) In fact, the group-theoretic properties of SO(3, R) are connected with the basic properties of the elliptic functions, as we have tried to show in Chapter 17, but it is not yet possible to put this connection into definitive form.
Finally, we want to describe how the parametrization of SO(3, R) by the Euler angles fits into the general theory of Lie groups, particularly the theory of symmetric spaces. (In this paragraph, we shall be using Helgason’s book [1] as a basic reference, and shall assume that the reader is familiar with the general notions found there.) Let G be a connected Lie group, and let $s: G \to G$ be an automorphism of G such that $s^2$ = identity. (s is then called an involutive automorphism.) Let

$$K = \{g \in G : s(g) = g\}.$$

Then K is a closed subgroup of G, called a symmetric subgroup of G, and G/K is called a symmetric homogeneous space. We shall deal here with the case: K compact. (G/K is then called a Riemannian symmetric homogeneous space.) Now s defines an automorphism of $\mathbf{G}$, the Lie algebra of G, that will also be denoted by s. For example, this can be seen by identifying $\mathbf{G}$ with the set of one-parameter subgroups of G. If $t \to g(t)$ is such a one-parameter subgroup, its transform by s is the one-parameter subgroup $t \to s(g(t))$. We see, then, that

$$s^2(X) = X \quad \text{for all } X \in \mathbf{G}.$$

Since s is a linear transformation of $\mathbf{G}$, we can split $\mathbf{G}$ as the direct sum $\mathbf{K} \oplus \mathbf{P}$, with

$$\mathbf{K} = \{X \in \mathbf{G} : s(X) = X\}, \qquad \mathbf{P} = \{X \in \mathbf{G} : s(X) = -X\}.$$

From the fact that s is an automorphism of $\mathbf{G}$, we see that

$$\mathrm{Ad}\,K(\mathbf{P}) \subset \mathbf{P}, \qquad [\mathbf{P}, \mathbf{P}] \subset \mathbf{K}.$$

Thus Ad K induces a linear representation on $\mathbf{P}$ (which is essentially equivalent to the linear isotropy group of the homogeneous space G/K). Let $\mathbf{A}$ be a maximal Abelian subalgebra of $\mathbf{P}$. One basic theorem of the theory of symmetric spaces is that $\mathrm{Ad}\,K(\mathbf{A}) = \mathbf{P}$. $\mathbf{A}$ is called a Cartan subalgebra of the symmetric space G/K. (All maximal Abelian subalgebras of $\mathbf{P}$ are conjugate under Ad K.) Now let $P = \exp(\mathbf{P}) \subset G$. The exponential map of $\mathbf{G} \to G$ usually has singularities. Thus it is a remarkable fact that P can be shown to be a closed submanifold of G. In fact P is the connected component containing the identity of

$$\{g \in G : s(g) = g^{-1}\}.$$

Elements of P are called transvections of the symmetric space G/K. Let $A = \exp(\mathbf{A}) \subset P$. Again it can be shown that A is a closed submanifold of P (diffeomorphic to a multidimensional torus if G is compact, to a Euclidean space if G is noncompact, and if K is a maximal compact subgroup of G). In a sense, this “flat” submanifold A is the “core” of the symmetric space; many of the important geometric and group-theoretic facts about G/K can be reconstructed from knowledge of A and a certain finite group acting on A, the Weyl group. (The Weyl group can be defined as follows: Let N(A, K) and C(A, K) be, respectively, the normalizer and centralizer of A in K; that is,

$$N(A, K) = \{k \in K : \mathrm{Ad}\,k(\mathbf{A}) = \mathbf{A}\}, \qquad C(A, K) = \{k \in K : kak^{-1} = a \ \text{ for all } a \in A\}.$$

Then C(A, K) and N(A, K) are closed subgroups of K, C(A, K) is a normal subgroup of N(A, K), and the Weyl group is the quotient group N(A, K)/C(A, K). A more geometric way of looking at this is to notice that each $k \in N(A, K)$ induces a transformation, namely, Ad k, on A, and C(A, K) is just the subgroup of those elements that act trivially on A.) Now the relation $\mathrm{Ad}\,K(\mathbf{A}) = \mathbf{P}$ implies Ad K(A) = P. Further, it can be proved that

$$G = P \cdot K.$$

Thus,

$$G = KAK^{-1}K = KAK,$$

that is, the map $\alpha: K \times A \times K \to G$ such that $\alpha(k, a, k') = kak'$ for $k, k' \in K$, $a \in A$ is onto G. Now C(A, K) can be made to act as a transformation group on $K \times A \times K$ as follows:

$$c \cdot (k, a, k') = (kc^{-1}, a, ck') \quad \text{for } c \in C(A, K),\ k, k' \in K,\ a \in A.$$

Notice that

$$\alpha(c \cdot (k, a, k')) = \alpha(kc^{-1}, a, ck') = kc^{-1}ack' = kac^{-1}ck' = \alpha(k, a, k');$$

that is, $\alpha$ maps each orbit of C(A, K) into a point. Also, C(A, K) acts on $K \times A \times K$ in such a way that no element except the identity transformation has a fixed point. Hence the orbit space $C(A, K)/K \times A \times K$ is a manifold; $\alpha$ passes to the quotient to define a map of the orbit space onto G, which we shall denote by $\bar\alpha$. Now the orbit space and G have the same dimension: $\bar\alpha$ is not quite a diffeomorphism (this would be impossible topologically, if for no other reason), but the points of G that are regular with respect to $\bar\alpha$† are sufficiently plentiful in the sense that their complement in G is the union of a finite number of submanifolds of lower dimension.

† If $\phi: M \to M'$ is a map of manifolds of the same dimension, a point $p' \in M'$ is regular with respect to $\phi$ if $\phi$ has nonzero Jacobian at each point of $\phi^{-1}(p')$. If $\phi$ is onto, a basic general theorem on the theory of manifolds says that the complement in M' of the set of regular elements is of measure zero.


As an example, we can apply this construction to the case G = SO(3, R), K = one-parameter group of rotations about the $x_3$-axis. Explicitly,

$$K = \left\{ \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = A(\theta) : 0 \le \theta < 2\pi \right\}.$$

We want to exhibit the involutive automorphism of G that exhibits K as a symmetric subgroup and G/K as a symmetric homogeneous space.† Since K is a one-parameter group, a reasonable choice is just Ad of an element of K of order 2; namely,

$$s(g) = A(\pi)\,g\,A(-\pi) \quad \text{for } g \in SO(3, R).$$

† In fact, G/K is just the two-sphere (that is, a Riemannian manifold of constant curvature). This fits in with the alternate geometric definition of a Riemannian symmetric space (modulo certain global complications) as a Riemannian manifold whose sectional curvatures are invariant under parallel translation.

We shall leave it to the reader to show that this choice does the job. Let us compute P:

$$P = \{g \in G : A(\pi)\,g\,A(-\pi) = g^{-1}\}.$$

Consider for the moment that matrices define linear transformations on 3-vectors, a 3-vector denoted by v. Any rotation in 3-space admits one and, up to a constant multiple, only one invariant vector, say, gv = v. Then

$$A(\pi)gv = A(\pi)v = g^{-1}A(\pi)v, \quad \text{or} \quad g(A(\pi)v) = A(\pi)v.$$

Thus, $A(\pi)v = \pm v$.

Case 1: $A(\pi)v = v$. Then v is the same invariant vector as that of the whole one-parameter group $\theta \to A(\theta)$; hence g commutes with each $A(\theta)$, or $g^2 = A(-2\pi)$ = identity matrix.

Case 2: $A(\pi)v = -v$. Then v is perpendicular to the invariant vector of the one-parameter group $\theta \to A(\theta)$, that is, lies in the $(x_1, x_2)$-plane. Since we are interested only in the connected component of P containing the identity element, we see that Case 2 is the only relevant one, and P can be considered as the set of rotations about axes lying in the $(x_1, x_2)$-plane. In particular, it contains the one-parameter group

$$\theta \to B(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}$$

of rotations about the $x_1$-axis. Now the centralizer of A in K is the identity; hence the map $(\theta_1, \theta_2, \theta_3) \to A(\theta_1)B(\theta_2)A(\theta_3)$, which defined the Euler angle parametrization of SO(3, R), is essentially just the construction $\alpha: K \times A \times K \to G$ outlined for the general symmetric case.† (Another more qualitative way of putting this is to say that in this case K and A turn out to be one-dimensional (in fact, circles); hence $K \times A \times K$ can be described by three angular parameters. The specific choices we made are unimportant, since any two choices are related by a conjugacy.)
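The claims about the involution $s(g) = A(\pi)\,g\,A(-\pi)$ can be tested on sample rotations. The sketch below, with hypothetical function names, assumes A and B are the rotations about the $x_3$- and $x_1$-axes written above. It checks that s fixes $A(\theta)$, sends $B(\theta)$ to its inverse $B(-\theta)$, and satisfies s(s(g)) = g on a sample element.

```python
import math


def rot_x3(theta):
    """A(theta): rotation about the x_3-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x1(theta):
    """B(theta): rotation about the x_1-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


def involution(g):
    """s(g) = A(pi) g A(-pi)."""
    return matmul(matmul(rot_x3(math.pi), g), rot_x3(-math.pi))


theta = 0.9
# Elements of K are fixed by s ...
fixes_k = max_abs_diff(involution(rot_x3(theta)), rot_x3(theta))
# ... while rotations about the x_1-axis are inverted, so they lie in P.
inverts_b = max_abs_diff(involution(rot_x1(theta)), rot_x1(-theta))
# s is an involution.
g = matmul(rot_x3(0.4), rot_x1(1.1))
squares_to_identity = max_abs_diff(involution(involution(g)), g)
print(fixes_k, inverts_b, squares_to_identity)
```

This only samples particular elements; it does not replace the exercise of showing that K is exactly the fixed set of s.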

Exercises

1. Prove the last statement of Theorem 16.1.

2. Suppose $\phi$ is a diffeomorphism of $R^{2n}$ such that $\phi^*: F(R^{2n}) \to F(R^{2n})$ is a Lie algebra homomorphism relative to Poisson bracket. Prove that $\phi$ is a canonical transformation.

3. Investigate (using Theorem 16.6 and (16.7)) the solutions of the two- and three-body problems of celestial mechanics that are also orbits of one-parameter groups of symmetries.

4. Verify the formula given for the “Routhian.”

5. Suppose G is a Lie group and $(\omega_i)$, $1 \le i \le n$, are a basis for the left-invariant forms on G. Let $y_i$ be the functions on T(G) such that $y_i(v) = \omega_i(v)$ for $v \in T(G)$. Let $L = I_i y_i^2$ define a Lagrangian and a variational problem on G. Work out the general conditions that a polynomial $f(y_1, \ldots, y_n)$ be an integral of motion. (Hint: Is there a relation with the Casimir operators of the universal-enveloping algebra of the Lie algebra of G? See Hermann [8] for the notion.)

6. Prove Formula (16.22).

† I owe these remarks concerning the general setting of the Euler angle construction to C. C. Moore.

17 Elliptic Functions

Unaccountably, the theory of elliptic functions has virtually disappeared from recent mathematics or physics literature, despite the fact that it is amazingly rich in structure, theorems, and mathematical or physical intuition. Of course we cannot hope to give the subject the systematic treatment it needs, and shall limit ourselves to some properties that follow from the fact that they can be defined as the functions describing rigid-body motion. Our treatment by means of differential equations then follows up an idea briefly sketched by Tricomi in his book on differential equations [1, pp. 19-26]. The most readily accessible treatment of elliptic functions along classical lines can be found in Whittaker and Watson [1], although they neglect the geometric side of the theory. Recall that the problem of motion of a rotating rigid body with no external forces leads to differential equations of the form:

$$I_1 \frac{dy_1}{dt} = (I_3 - I_2)\,y_2 y_3, \tag{17.1a}$$

$$I_2 \frac{dy_2}{dt} = (I_1 - I_3)\,y_1 y_3, \tag{17.1b}$$

$$I_3 \frac{dy_3}{dt} = (I_2 - I_1)\,y_1 y_2. \tag{17.1c}$$

We have seen that the underlying rigid-body problem has two algebraic integrals, namely, those of “energy” and “total angular momentum”:

$$I_1 y_1^2 + I_2 y_2^2 + I_3 y_3^2 = c \quad (= \text{constant}), \tag{17.2}$$

$$I_1^2 y_1^2 + I_2^2 y_2^2 + I_3^2 y_3^2 = m \quad (= \text{constant}). \tag{17.3}$$

It can easily be verified directly from (17.1) that (17.2) and (17.3) are indeed integrals; that is, they are constant along solutions of (17.1). We shall suppose that $I_1 \ne I_2 \ne I_3 \ne 0$. We have already seen that if this were not satisfied (for example, if two of the I were equal), then (17.1) could be solved in terms of sines and cosines. If, on the other hand, one of the I is zero, it is clear that (17.1) can be solved in terms of exponentials. Finally, we are not necessarily assuming that the I are positive (as they are in the rigid-body problem). One of the variables can be eliminated from (17.2) and (17.3) to obtain algebraic relations among the other two:

$$(I_1 I_2 - I_2^2)\,y_2^2 + (I_1 I_3 - I_3^2)\,y_3^2 = I_1 c - m, \tag{17.4}$$

$$(I_2 I_1 - I_1^2)\,y_1^2 + (I_2 I_3 - I_3^2)\,y_3^2 = I_2 c - m, \tag{17.5}$$

$$(I_3 I_1 - I_1^2)\,y_1^2 + (I_3 I_2 - I_2^2)\,y_2^2 = I_3 c - m. \tag{17.6}$$
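The elimination is a matter of multiplying (17.2) by one of the $I_k$ and subtracting (17.3). As a sanity check, the sketch below evaluates both sides of the three resulting relations in exact rational arithmetic, with c and m computed from an arbitrary sample point; the function names and sample values are illustrative and the relations are assumed in the form $\sum_{i \ne k} (I_k I_i - I_i^2)\,y_i^2 = I_k c - m$.

```python
from fractions import Fraction


def integrals(y, inertia):
    """The constants c of (17.2) and m of (17.3) at the point y."""
    c = sum(i * v * v for i, v in zip(inertia, y))
    m = sum(i * i * v * v for i, v in zip(inertia, y))
    return c, m


def eliminated_relation(y, inertia, k):
    """Both sides of the relation obtained by eliminating y_k (k = 0, 1, 2)."""
    c, m = integrals(y, inertia)
    ik = inertia[k]
    lhs = sum((ik * i - i * i) * v * v
              for j, (i, v) in enumerate(zip(inertia, y)) if j != k)
    return lhs, ik * c - m


inertia = (Fraction(1), Fraction(2), Fraction(3))
y = (Fraction(1, 2), Fraction(-2, 3), Fraction(5, 7))
relations = [eliminated_relation(y, inertia, k) for k in range(3)]
all_hold = all(lhs == rhs for lhs, rhs in relations)
print(all_hold, relations[0][0])
```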

These can be substituted into (17.1) to actually “solve” (17.1). For example,

$$\Big(\frac{dy_1}{dt}\Big)^2 = (\alpha + \beta y_1^2)(\gamma + \delta y_1^2)$$

for suitable constants $\alpha, \beta, \gamma, \delta$, or

$$\int \frac{dy_1}{\sqrt{(\alpha + \beta y_1^2)(\gamma + \delta y_1^2)}} = t + \text{constant}.$$

The integral on the left is an “elliptic integral,” so this solution does us little good in practice. In fact we are usually interested in the reverse process, namely, inverting an elliptic integral to make it part of a system of the type of (17.1). The remarkable property of system (17.1) is that any system of solutions $(y_1(t), y_2(t), y_3(t))$ satisfies an algebraic “addition formula” that is independent of $I_1, I_2, I_3$, namely,

Two similar identities are obtained by permuting $y_1$, $y_2$, and $y_3$. Further, (17.4) through (17.6) can be used to obtain an algebraic formula connecting, say, $y_2(s + t)$ to $y_i(t)$ and $y_i(s)$, for i = 1, 2, 3. To prove (17.7), one has only to apply the differential operator $(\partial/\partial t - \partial/\partial s)$ to the left-hand side of (17.7) and verify by direct computation that it vanishes when combined with (17.1) through (17.6); hence it is a function of (s + t). The function given on the right-hand side is obtained by setting t = 0. The solutions of (17.1) with special choices of the adjustable parameters have explicit names, the Jacobian elliptic functions:

$$y_1(t) = \operatorname{sn} t, \qquad y_2(t) = \operatorname{cn} t, \qquad y_3(t) = \operatorname{dn} t. \tag{17.8}$$



For

$$\frac{I_3 - I_2}{I_1} = 1, \qquad \frac{I_1 - I_3}{I_2} = -1, \qquad \frac{I_2 - I_1}{I_3} = -k^2, \qquad y_1(0) = 0, \quad y_2(0) = 1, \quad y_3(0) = 1.$$

Putting these values in (17.2) through (17.7) gives the classical addition formulas for the Jacobian elliptic functions, the treatment of which can be found in complete detail in all the reference books. For example,

$$\operatorname{sn}(s + t) = \frac{\operatorname{sn}(s)\operatorname{cn}(t)\operatorname{dn}(t) + \operatorname{cn}(s)\operatorname{sn}(t)\operatorname{dn}(s)}{1 - k^2 \operatorname{sn}^2(s)\operatorname{sn}^2(t)},$$

where k is a free parameter, so really the Jacobian functions depend on t and k, but it is not customary to express this explicitly. Let us return to the study of system (17.1).
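The addition formula can be checked numerically straight from the differential equations, which is in the spirit of defining the elliptic functions by the system itself. The sketch below is an illustration under the assumption that sn, cn, dn are the solution of $y_1' = y_2 y_3$, $y_2' = -y_1 y_3$, $y_3' = -k^2 y_1 y_2$ with initial values (0, 1, 1); it integrates this system with a fourth-order Runge-Kutta scheme (hypothetical helper names, arbitrary sample values of s, t and $k^2$) and compares sn(s + t) with the right-hand side of the formula.

```python
def jacobi_rhs(y, k_squared):
    """Right-hand side of the system satisfied by (sn, cn, dn)."""
    sn, cn, dn = y
    return (cn * dn, -sn * dn, -k_squared * sn * cn)


def jacobi(t, k_squared, steps=2000):
    """Approximate (sn t, cn t, dn t) by Runge-Kutta from (0, 1, 1)."""
    y = (0.0, 1.0, 1.0)
    h = t / steps
    for _ in range(steps):
        a = jacobi_rhs(y, k_squared)
        b = jacobi_rhs(tuple(v + h / 2 * s for v, s in zip(y, a)), k_squared)
        c = jacobi_rhs(tuple(v + h / 2 * s for v, s in zip(y, b)), k_squared)
        d = jacobi_rhs(tuple(v + h * s for v, s in zip(y, c)), k_squared)
        y = tuple(v + h / 6 * (p + 2 * q + 2 * r + w)
                  for v, p, q, r, w in zip(y, a, b, c, d))
    return y


k_squared = 0.5
s, t = 0.4, 0.7
sn_s, cn_s, dn_s = jacobi(s, k_squared)
sn_t, cn_t, dn_t = jacobi(t, k_squared)
sn_sum = jacobi(s + t, k_squared)[0]
predicted = ((sn_s * cn_t * dn_t + cn_s * sn_t * dn_s)
             / (1 - k_squared * sn_s ** 2 * sn_t ** 2))
addition_error = abs(sn_sum - predicted)
print(addition_error)
```

The same integrator also exhibits the algebraic integrals of the system, for instance $\operatorname{sn}^2 + \operatorname{cn}^2 = 1$ along the computed solution.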

LEMMA 17.1 If any two functions among those constituting a solution $(y_1(t), y_2(t), y_3(t))$ of (17.1) vanish for one value of t, then the three functions are constant.

Proof. Let us suppose, say, that $y_1(t_0) = 0 = y_2(t_0)$. Define $(y_1^*(t), y_2^*(t), y_3^*(t))$ as follows:

$$y_1^*(t) = 0, \qquad y_2^*(t) = 0, \qquad y_3^*(t) = y_3(t_0).$$

Notice that they define a solution of (17.1) which satisfies the same initial conditions at $t = t_0$ as does our original solution. By the uniqueness theorem for ordinary differential equations, they must coincide. Q.E.D.

In studying the properties of a system of differential equations such as (17.1), it is often a good practice to start by finding how the system behaves when transformed by various groups of transformations of the underlying space. We shall now do so, considering only the simplest group that seems interesting. (A more systematic treatment would be very interesting, but would carry us too far afield.) Let us begin by rewriting the differential equations (17.1) as a Pfaffian system:

$$0 = \omega_1 = I_1\,dy_1 - (I_3 - I_2)\,y_2 y_3\,dt, \tag{17.9a}$$

$$0 = \omega_2 = I_2\,dy_2 - (I_1 - I_3)\,y_1 y_3\,dt, \tag{17.9b}$$

$$0 = \omega_3 = I_3\,dy_3 - (I_2 - I_1)\,y_1 y_2\,dt. \tag{17.9c}$$

235


We shall consider only the group of linear transformations of $(y_1, y_2, y_3, t)$-space that are dilations, that is, that multiply the coordinates by constants. Thus, if $\phi$ is such a transformation,

$$\phi^*(t) = A t, \qquad \phi^*(y_i) = A_i y_i \quad \text{for } i = 1, 2, 3 \text{ (no summation)}.$$

Consider another system of the same form as (17.9):

$$\omega_1' = I_1'\,dy_1 - (I_3' - I_2')\,y_2 y_3\,dt, \tag{17.10a}$$

$$\omega_2' = I_2'\,dy_2 - (I_1' - I_3')\,y_1 y_3\,dt, \tag{17.10b}$$

$$\omega_3' = I_3'\,dy_3 - (I_2' - I_1')\,y_1 y_2\,dt. \tag{17.10c}$$

The condition that $\phi$ carry the integral curves of (17.10) into integral curves of (17.9) is that the $\phi^*(\omega)$ be linear combinations of the $\omega'$, that is, that we have a relation of the form

$$\phi^*(\omega_i) = \sum_{j=1}^{3} a_{ij}\,\omega_j' \quad \text{for } i = 1, 2, 3.$$

Comparing the coefficients of the $dy_i$, we have

$$a_{ij} = 0 \ \text{ if } i \neq j, \qquad A_i I_i = a_{ii} I_i', \qquad \text{or} \qquad a_{ii} = \frac{A_i I_i}{I_i'} \quad \text{(no summation)}.$$

Thus,

$$I_1'(I_3 - I_2)\,A_2 A_3 A = A_1 I_1 (I_3' - I_2'), \tag{17.11a}$$

$$I_2'(I_1 - I_3)\,A_1 A_3 A = A_2 I_2 (I_1' - I_3'), \tag{17.11b}$$

$$I_3'(I_2 - I_1)\,A_1 A_2 A = A_3 I_3 (I_2' - I_1'). \tag{17.11c}$$

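Notice that if the $I$ and $I'$ are prescribed, one of the constants $A_1$, $A_2$, $A_3$, $A$ can be prescribed arbitrarily. If $I_1 = I_1'$, $I_2 = I_2'$, $I_3 = I_3'$, then (17.11) holds if and only if

$$A_2 A_3 A = A_1, \qquad A_1 A_3 A = A_2, \qquad A_1 A_2 A = A_3. \tag{17.12}$$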

Now, obviously system (17.1) is preserved under time translation. Thus, if one function of a triple $(y_1, y_2, y_3)$ that solves (17.1) vanishes at some value of $t$, then combining a permutation of $y_1$, $y_2$, $y_3$, a transformation of type (17.11), and a time translation will send the given solution into the Jacobian elliptic functions (possibly needing complex values for the parameters of the transformation). Thus, the first problem is to find those solutions.

LEMMA 17.2 Let $(y_1(t), y_2(t), y_3(t))$ be a nonconstant solution of (17.1) defined for $a \le t \le 2b$ such that $y_1(a) = y_1(b) = 0$. Then, both $y_2$ and $y_3$ must have a zero in the interval $a < t < 2b$.

Proof. Suppose otherwise: For example, suppose that $y_2(t) \neq 0$ for $a \le t \le 2b$. Without loss in generality we may suppose that $a = 0$. Then (17.7) takes the form

$$\frac{y_1(s + t)\,y_3(0)}{y_2(s + t) + y_2(0)} = \frac{y_1(t)y_3(s) + y_3(t)y_1(s)}{y_2(t) + y_2(s)}.$$

By Lemma 17.1, $y_3(0) \neq 0$. Since the denominators are nonzero, $y_1(2b) = 0$. Hence, putting $s = 2b - t$, we have

$$y_1(t)\,y_3(2b - t) + y_3(t)\,y_1(2b - t) = 0.$$

Equation (17.6) gives a relation of the form $y_3^2 = \alpha y_1^2 + \beta$. Then

$$y_1(t)^2\left(\alpha\,y_1(2b - t)^2 + \beta\right) = \left(\alpha\,y_1(t)^2 + \beta\right) y_1(2b - t)^2,$$

or $\beta\left(y_1(t)^2 - y_1(2b - t)^2\right) = 0$. Hence $\beta = 0$, since $y_1$ is nonconstant. But then $y_3(0) = 0$, because $y_1(0) = 0$, and Lemma 17.1 forces $y_1$ constant; contradiction.

LEMMA 17.3 Suppose that $(y_1(t), y_2(t), y_3(t))$ is a nonconstant solution of (17.1) defined over $-\infty < t < \infty$. Then at least one of the components of this solution must vanish at least once on $-\infty < t < \infty$.

Proof. Suppose otherwise: Then the derivatives of the components must also be everywhere nonzero. Since $I_1$, $I_2$, and $I_3$ are nonequal, at most one of the right-hand sides of (17.4) through (17.6) can vanish. Let us suppose, then, that two of them, of the form $I_i c - m$, are nonzero. Then

with

$\delta \neq 0$.

After at most transforming the system by equations of the type of (17.11), we can suppose that


Thus y,(t) > 0, and (dy,/dt)(t)< 0 for 0 5 t < co; hence

t =

I”‘o’ Yl(f)

dx

J‘

But this integral converges, since at each possible singularity the integrand has a singularity of order $-\tfrac{1}{2}$ (since $\beta \neq 0$, $\delta \neq 0$), which gives the contradiction.

Hence, in studying a nonconstant solution $(y_1(t), y_2(t), y_3(t))$, we can suppose (after making a time translation and a permutation) that

$$y_1(0) = 0. \tag{17.13}$$

(At this point a further transformation can be made by throwing the solution onto the one defining the Jacobian elliptic functions, but we shall not be particularly concerned with that here.) By Lemma 17.1, and (17.4) and (17.5), $I_2 c - m \neq 0$ and $I_3 c - m \neq 0$. Also, if (17.12) is satisfied,

$$y_1(-t) = -y_1(t); \qquad y_2(-t) = y_2(t); \qquad y_3(-t) = y_3(t). \tag{17.14}$$

To prove this, one can choose $A$, $A_1$, $A_2$, $A_3$ to make a change of variables such that

$$A_2 = A_3 = 1, \qquad A_1 = -1 = A.$$

By (17.11), $I_1 = I_1'$, $I_2 = I_2'$, $I_3 = I_3'$. Thus the new functions

$$z_1(t) = -y_1(-t), \qquad z_2(t) = y_2(-t), \qquad z_3(t) = y_3(-t)$$

satisfy the same system (17.1) as the old, with the same initial condition; whence, (17.14). Clearly we can use a change of variable of the type of (17.12) to suppose in addition that

$$y_2(0) > 0, \qquad y_3(0) > 0. \tag{17.15}$$

So far we have been proceeding without assumptions on the signs of $I_1$, $I_2$, and $I_3$. Now the global behavior of the solutions is radically different, depending on whether or not all the signs are the same. The case of like signs is the one that occurs in the force-free rigid-body problem; the case of unlike signs will be left as an exercise. We can suppose without any essential loss in generality that

$$I_1 > 0, \qquad I_2 > 0, \qquad I_3 > 0. \tag{17.16}$$

We shall now show that we can suppose (at most permuting $y_2$ and $y_3$) that

$$\frac{dy_2}{dt}(t) < 0. \tag{17.17}$$

Suppose otherwise: Then,

$$I_3 \frac{dy_3}{dt} = (I_2 - I_1)\,y_1 y_2(t) > 0 \quad \text{for sufficiently small } t > 0.$$

Case 1: $(dy_1/dt)(0) > 0$. Then $y_1(t) > 0$ for small $t > 0$, and $I_3 > I_2$. Then, also, $I_1 > I_3$ and $I_2 > I_1$, which is a contradiction.

Case 2: $(dy_1/dt)(0) < 0$. Then $I_3 < I_2$. Hence, also, $I_1 < I_3$, $I_2 < I_1$, which is again a contradiction.

Now, let $K > 0$ be the first positive real zero of $y_2$ or $y_3$. By the mean-value theorem,

$$y_1(t) \neq 0 \quad \text{for } 0 < t \le K. \tag{17.18}$$

Further, if $(dy_3/dt)(t) > 0$ for $t > 0$ sufficiently small, then

$$y_2(K) = 0. \tag{17.19}$$

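We can now show how to compute $K$. For example, suppose that

$$y_2(K) = 0. \tag{17.20}$$

Then $(dy_2/dt)(t) < 0$ for $0 < t \le K$. We can solve (17.4) and (17.6) in the form

$$\frac{dy_2}{dt} = -\sqrt{(\alpha_2 y_2^2 - \beta_2)(\gamma_2 y_2^2 - \delta_2)} \quad \text{for } 0 \le t \le K.$$

Thus,

$$t = -\int_{y_2(0)}^{y_2(t)} \frac{dx}{\sqrt{(\alpha_2 x^2 - \beta_2)(\gamma_2 x^2 - \delta_2)}} \quad \text{for } 0 \le t \le K.$$

Hence,

$$K = \int_{0}^{y_2(0)} \frac{dx}{\sqrt{(\alpha_2 x^2 - \beta_2)(\gamma_2 x^2 - \delta_2)}}. \tag{17.21}$$

In particular, notice that $K < \infty$.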


Once we have found a $K$ that is a zero of $y_2(t)$, what can be done with it? The addition formula (17.7) enables us to extract considerably more information, as described in the following theorem.

THEOREM 17.4 Let $(y_1, y_2, y_3)$ be a nonconstant solution of (17.1) (with no assumptions on $I_1$, $I_2$, $I_3$ other than $I_1 \neq 0$, $I_2 \neq 0$, $I_3 \neq 0$); let $K \neq 0$ be such that $y_1(0) = 0 = y_2(K)$. Then

$$y_1(t + 2K) = -y_1(t), \qquad y_2(t + 2K) = -y_2(t), \qquad y_3(t + 2K) = y_3(t),$$

for all $t$ for which these make sense. In particular, $y_1$ and $y_2$ are periodic with period $4K$, while $y_3$ is periodic with period $2K$.

Proof. Let us rewrite (17.7) in the form

$$\left(y_2(t) + y_2(s)\right)\left(y_1(t + s)\,y_3(0)\right) = \left(y_2(t + s) + y_2(0)\right)\left(y_1(t)y_3(s) + y_3(t)y_1(s)\right).$$

Put $t = K = s$. Now either

$$y_2(2K) + y_2(0) = 0 \tag{17.22}$$

or

$$y_1(t)\,y_3(2K - t) + y_1(2K - t)\,y_3(t) = 0.$$

We shall rule out this second identity. Now (17.5) can be solved in the form

$$y_3^2 = \frac{I_2 c - m - (I_1 I_2 - I_1^2)\,y_1^2}{I_2 I_3 - I_3^2}.$$

Substituting in this would give $I_2 c = m$. In particular, $y_3(0) = 0$, which would contradict the nonconstancy of the solution. Now we can use (17.7) again, with a permutation of $y_1$, $y_2$, and $y_3$:

$$\left[y_1(t) + y_1(s)\right]\left[y_2(t + s)y_3(0) + y_3(t + s)y_2(0)\right] = \left[y_1(t + s) + y_1(0)\right]\left[y_2(t)y_3(s) + y_3(t)y_2(s)\right].$$

Putting $t = s = K$ makes the right-hand side zero; hence

$$0 = y_2(2K)\,y_3(0) + y_3(2K)\,y_2(0),$$

whence, using (17.22), we have

$$y_3(2K) = y_3(0). \tag{17.23}$$

Similarly, playing with the other permutations proves that

$$y_1(2K) = 0. \tag{17.24}$$

Now put

Part 2. Hamilton-Jacobi Theory-Variational Calculus

$$z_1(t) = -y_1(t), \qquad z_2(t) = -y_2(t), \qquad z_3(t) = y_3(t).$$

Notice that $(z_1(t), z_2(t), z_3(t))$ is also a solution of (17.1), which has the same initial conditions at $t = 0$ as does $(y_1(t), y_2(t), y_3(t))$ at $t = 2K$. Thus

$$-y_1(t) = y_1(t + 2K), \qquad -y_2(t) = y_2(t + 2K), \qquad y_3(t) = y_3(t + 2K),$$

which proves the theorem.

Remark. Notice that Theorem 17.4 is purely formal and holds for the complex variable case as well. However, to consider this extension to complex variables would take us too far afield, and we must refer to Whittaker and Watson [1].
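The statement of Theorem 17.4 can be tested numerically. The sketch below is not part of the text: it assumes that (17.1) is the system read off from the Pfaffian form (17.9), picks the sample values $I = (1, 2, 3)$ and the initial point $(0, 1, 1)$ arbitrarily, and uses illustrative helper names (`euler_rhs`, `flow`, `first_zero_of_y2`). It locates the first zero $K$ of $y_2$ by bisection and compares the solution at $t$ and $t + 2K$.

```python
def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h * (a + 2 * b + 2 * c + d) / 6.0
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def euler_rhs(I):
    """Right-hand side of I1 y1' = (I3-I2) y2 y3 and its cyclic analogues
    (the sign convention assumed from (17.9))."""
    I1, I2, I3 = I
    def rhs(y):
        y1, y2, y3 = y
        return [(I3 - I2) * y2 * y3 / I1,
                (I1 - I3) * y1 * y3 / I2,
                (I2 - I1) * y1 * y2 / I3]
    return rhs

def flow(rhs, y0, t, h=1e-3):
    """Integrate from time 0 to time t with RK4 steps no larger than h."""
    n = max(1, int(abs(t) / h) + 1)
    step = t / n
    y = list(y0)
    for _ in range(n):
        y = rk4_step(rhs, y, step)
    return y

def first_zero_of_y2(rhs, y0, h=1e-3, t_max=50.0):
    """March forward until y2 changes sign, then bisect inside that step."""
    t, y = 0.0, list(y0)
    while t < t_max:
        y_next = rk4_step(rhs, y, h)
        if y[1] * y_next[1] < 0:
            lo, hi = 0.0, h
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if y[1] * rk4_step(rhs, y, mid)[1] > 0:
                    lo = mid   # zero lies beyond mid
                else:
                    hi = mid
            return t + 0.5 * (lo + hi)
        t, y = t + h, y_next
    raise ValueError("no sign change of y2 found before t_max")

rhs = euler_rhs((1.0, 2.0, 3.0))
y0 = [0.0, 1.0, 1.0]            # y1(0) = 0, as in the theorem
K = first_zero_of_y2(rhs, y0)   # y2(K) = 0
t = 0.37
a = flow(rhs, y0, t)
b = flow(rhs, y0, t + 2 * K)
residual = max(abs(b[0] + a[0]), abs(b[1] + a[1]), abs(b[2] - a[2]))
```

Here `residual` measures how far the computed solution is from satisfying the three relations of the theorem at the sample time `t`; it should be at the level of the integration error.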

18 Accessibility Problems for Path Systems

General Remarks

From a higher point of view, the calculus of variations should be regarded as the "theory" of a real-valued function on an infinite dimensional space, namely, the space of curves on the underlying configuration space. One might think that it would be possible to eliminate the confusion and ambiguity that afflicts the subject by treating it consistently from this point of view. However, in searching for such a panacea, one must have respect for the fact that the calculus of variations has the longest history of any currently active branch of mathematics; in addition, this basic insight into the calculus of variations has been explicit since Volterra's pioneering work on "infinite dimensional manifolds" in the 1880's. In this chapter, we hope to demonstrate that this insight is useful for developing intuition into the mathematical structure of the subject. However, the foundations are still unsettled; there is no point in committing to print a full-scale exposition.

Let $M$ be a manifold and let $P(M)$ denote the space of paths of $M$. (Recall that a path is a continuous map of an interval $[a, b]$ of real numbers into $M$ that is piecewise $C^\infty$.) By a path system on $M$ we mean a subset (denoted, say, by $\pi$) of $P(M)$ having the property that if a path $\sigma$ belongs to $\pi$, all paths obtained from $\sigma$ by changing the parametrization of $\sigma$ and by restricting $\sigma$ to a subinterval of its domain of definition also belong to $\pi$. Now, our basic intuition is that $P(M)$ is to be regarded as an "infinite dimensional manifold" and that we shall be considering path systems obtained by setting a set of real-valued functions on $P(M)$ (usually uncountably infinite in number) equal to zero. Thus, in a sense, we are to regard a path system $\pi$ as a "submanifold" of $P(M)$. Let $\pi$ be a given path system on $M$. Since paths can be freely parametrized, let us suppose they are defined over the interval $[0, 1]$, with the parameter denoted by $t$.

For $p \in M$, let $\pi(p)$ be the set of all points of $M$ that can be joined to $p$ by a path in $\pi$. We can look at this in the following way: Consider the subspace of paths in $\pi$ that begin at $p$, denoted say by $\pi_p$, and map $\pi_p \to M$ by sending a path into its end point. The "accessibility" problem, in full generality, is to describe $\pi(p)$ in terms of the differential geometric invariants of $\pi$.

We see the analogy with a problem for ordinary finite dimensional manifolds. Suppose $A$ and $B$ are two such spaces, and $\phi: A \to B$ is a map, with $\dim B \le \dim A$. We are interested in finding conditions that guarantee that $\phi$ is onto $B$. One such condition, of a local nature, is given by the implicit function theorem: For a point $a \in A$, $\phi_*$ is a linear map $A_a \to B_{\phi(a)}$. If it is onto, $\phi$ covers a small neighborhood of $\phi(a)$. If $\phi_*(A_a) = B_{\phi(a)}$ for all $a \in A$, then it is easy to see that this local fact can be "expanded out" globally, provided there is some condition of uniformity for the norm of $\phi_*$ as $a$ varies on $A$.

Considered from the opposite point of view, the "critical points" for the discussion of onto-ness are the points $a \in A$ such that $\phi_*(A_a) \neq B_{\phi(a)}$. Dually, such a critical point would be a point having the following property: There exists a function $f \in F(B)$ with $df \neq 0$ at $\phi(a)$ but $d(\phi^*(f)) = 0$ at $a$. Now, a map may be onto, even though it has critical points. Consider the maps $\phi_1$ and $\phi_2$: $R \to R$, defined by $\phi_1(x) = x^2$ and $\phi_2(x) = x^3$ for $x \in R$. Both have critical points at $x = 0$; the first is not onto; the second is. (As a side point, we may mention that the key fact is that the Jacobian $\phi_1'(x)$ of the first changes sign, whereas that of the second does not.) We shall not pursue the discussion of the type of singularities of mappings that are sufficient to guarantee that a map be onto. This would involve a delicate and still largely unknown field. (See the paper by Hartman and Nirenberg [1], for a beginning in this direction, at least for the case where $A$ and $B$ are of equal dimension.)

However, this simple example does suggest one such invariant: Suppose $f \in F(B)$ satisfies $df \neq 0$ at $\phi(a)$, but $d\phi^*(f) = 0$ at $a$. Thus, $\phi^*(f)$ has a critical point in the ordinary sense; hence we can define its Hessian, which is a quadratic form on $A_a$. Suppose the Hessian is positive semidefinite. Then $a$ has a neighborhood $U$ in which

$$\phi^*(f)(a') \ge \phi^*(f)(a) \quad \text{for all } a' \in U,$$

that is, $f(\phi(a')) \ge f(\phi(a))$. But then $\phi(U)$ cannot cover a neighborhood of $\phi(a)$, for otherwise $f$ would have a critical point at $\phi(a)$.

Another simple remark suggests itself: Consider the set of all functions $f \in F(B)$ such that $\phi^*(f) = $ constant. Obviously, a necessary condition that $\phi$ be onto is that any such function be constant on $B$. It is also plausible that in some cases (for example, if the image set $\phi(A)$ is not too wildly behaved) this condition will be sufficient.

Let us return to the case where we have the end-point mapping $\pi_p \to M$ of the space of paths from a given path system $\pi$ on a manifold $M$ beginning at the point $p$. Pursuing the analogy in the preceding paragraph, we can ask for functions on $M$ that are constant when pulled back to $\pi_p$ under the mapping. Such functions obviously have the geometric property of being constant along those paths in $\pi$ beginning at $p$. As we shall see below, for common types of path systems defined by systems of ordinary differential equations, functions on $M$ which are constant on paths of the system must satisfy certain systems of partial differential equations. Thus, if we can prove that these partial differential equations have no nonconstant solutions, we can hope to turn this around and actually prove that $\pi(p) = M$.

To make more explicit the program sketched in the last paragraph, let us make more definitions. Let $U$ be an open subset of $M$. A function $f \in F(U)$ is said to be an integral of the path system $\pi$ if

$$f(\sigma(t)) = f(\sigma(0)) \quad \text{for } 0 \le t \le 1,$$

for every path $\sigma: [0, 1] \to U$ that belongs to $\pi$.

The set of all such integrals will be denoted by $I(\pi, U)$. Since they can be multiplied and added, they form a ring of functions. We use these rings of functions to define a new path system, denoted by $\pi^*$, called the completion of $\pi$, in the following way: A path $\sigma: [0, 1] \to M$ is in $\pi^*$ if and only if for every open subset $U \subset M$ and every $f \in I(\pi, U)$, $f(\sigma(t)) = $ constant for those values of $t$ for which $\sigma(t) \in U$. Clearly, $\pi \subset \pi^*$. ($\pi^*$ is something like the dual of the dual of a vector space.) We say that the path system $\pi$ satisfies the duality principle if

$$\pi^*(p) = \pi(p) \quad \text{for all } p \in M. \tag{18.1}$$

Intuitively, we may summarize (18.1) by saying that "all points are accessible from a point $p$ along paths in $\pi$ that are not obviously inaccessible." For the existence of a nonconstant $f \in I(\pi, U)$ sets up a priori limitations on the accessible points, since all paths in $\pi$ must lie on the hypersurfaces $f = $ constant. There are several other weaker versions of the duality principle that may be formulated, but the one we have just given should be sufficient to indicate the idea.

Now, there does not seem to be as yet any general theorem describing necessary and sufficient conditions that a path system satisfy the duality principle. This seems to be an important subject for further research. The closest thing to such a general theorem is a theorem due to Chow [1], which may be interpreted as proving the duality principle in the case where $\pi$ consists of all the integral curves of a nonsingular vector-field system whose derived vector-field system is also nonsingular. In fact this is quite typical of the general situation, namely, that in order to prove the duality principle, certain local nonsingularity conditions must be imposed. In turn, Chow's theorem is a generalization of a famous theorem due to Carathéodory [1] that gives a geometric condition that a Pfaffian equation admit an integrating factor. It is interesting that Carathéodory's theorem arose from accessibility conditions. It asserts that the second law of thermodynamics, in its form postulating the nonexistence of a perpetual motion machine of the "second kind," can be formulated as requiring that certain points of the "phase space" of a physical system be nonaccessible along adiabatic paths, that is, on integral paths of a Pfaffian form $\theta$ in phase space representing "heat."

It is possible to prove the Lagrange multiplier rule for extremals of the Lagrange variational problem, using accessibility ideas. (This is due to Bliss [1] and Radon [1].) We shall now explain this approach. Suppose that $D$ is a convex domain in $R^n$ with coordinates $x_1, \ldots, x_n$. A Lagrange variational problem in homogeneous form will be supposed given on $T(D)$. Thus we are given functions $g_1(x, \dot{x}), \ldots, g_m(x, \dot{x})$ and $L(x, \dot{x})$ on $T(D)$ that are homogeneous in $\dot{x}$. Recall that a curve $x(t)$, $0 \le t \le 1$, in $D$ is an extremal of the Lagrange variational problem if

$$g_a\!\left(x(t), \frac{dx}{dt}\right) = 0 \quad \text{for } 0 \le t \le 1, \ 1 \le a \le m, \tag{18.2a}$$

$$\int_0^1 L\!\left(x(t), \frac{dx}{dt}\right) dt \le \int_0^1 L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right) dt \tag{18.2b}$$

for all paths $\hat{x}(t)$, $0 \le t \le 1$, such that

(i) $\hat{x}(0) = x(0)$, $\hat{x}(1) = x(1)$; that is, $\hat{x}(t)$ has the same end points as $x(t)$.

(ii) $g_a(\hat{x}(t), d\hat{x}/dt) = 0$ for $0 \le t \le 1$, $1 \le a \le m$, that is, the path $\hat{x}(t)$ satisfies the constraints given by (18.2a).

Bliss' idea is to convert this Lagrange variational problem into a Mayer variational problem. Since we do not want to go into full detail here explaining the Mayer problem,† we shall simply describe the situation explicitly without attaching special labels to the construction. Introduce another real variable, labeled $y$, to our variables $x_1, \ldots, x_n$. Let $D^*$ be the following convex set in $R^{n+1}$:

$$D^* = \{(x_1, \ldots, x_n, y) : (x_1, \ldots, x_n) \in D, \ -\infty < y < \infty\}.$$

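Consider the following path system $\pi$ in $D^*$: A path $(x(t), y(t))$, $0 \le t \le 1$, in $D^*$ lies in $\pi$ if and only if

$$\frac{dy}{dt} = L\!\left(x(t), \frac{dx}{dt}\right), \qquad g_a\!\left(x(t), \frac{dx}{dt}\right) = 0 \tag{18.3}$$

for $1 \le a \le m$, $0 \le t \le 1$. Let $x(t)$, $0 \le t \le 1$, be an extremal curve of the Lagrange variational problem defined by $g_1, \ldots, g_m$ and $L$, that is, a curve satisfying (18.2). Construct $y(t)$ so that

$$y(t) = \int_0^t L\!\left(x(s), \frac{dx}{ds}\right) ds.$$

† The Mayer and Lagrange problems are equivalent in the sense that one can be transformed into the other, at least locally. We are concentrating on the Lagrange problem in this book because we think it is more natural from a differential-geometric point of view, although it seems that the Bliss school of the calculus of variations considered the Mayer problem to be more basic.

Then the curve $(x(t), y(t))$ in $D^*$ belongs to $\pi$, that is, satisfies (18.3). We now show that the fact that $x(t)$ is an extremal, that is, satisfies (18.2), implies that the point $(x(1), y(1) - \epsilon)$ in $D^*$ is inaccessible with respect to $\pi$ from the point $(x(0), 0)$ in $D^*$, for any positive number $\epsilon$. Suppose otherwise: that is, $(\hat{x}(t), \hat{y}(t))$, $0 \le t \le 1$, is a path in $\pi$ joining these two points. Then, from (18.3), $g_a(\hat{x}(t), d\hat{x}/dt) = 0$; that is, $\hat{x}(t)$ satisfies the constraints, and $\hat{x}(0) = x(0)$; $\hat{x}(1) = x(1)$. Then

$$\int_0^1 L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right) dt = \hat{y}(1) - \hat{y}(0) = y(1) - \epsilon;$$

hence,

$$\int_0^1 L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right) dt = \int_0^1 L\!\left(x(t), \frac{dx}{dt}\right) dt - \epsilon;$$

hence,

$$\int_0^1 L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right) dt < \int_0^1 L\!\left(x(t), \frac{dx}{dt}\right) dt,$$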

contradicting that $x(t)$ is an extremal.

Apply the duality principle now. If $D$ is sufficiently small, which we shall suppose is true, there is a function $f \in I(\pi, D^*)$, that is, a function $f(x, y)$ such that

$$\frac{d}{dt} f(x(t), y(t)) = 0 \quad \text{for every curve } (x(t), y(t)) \text{ in } \pi;$$

hence,

$$\frac{\partial f}{\partial x_i}\frac{dx_i}{dt} + \frac{\partial f}{\partial y}\, L\!\left(x, \frac{dx}{dt}\right) = 0.$$

Now these conditions on $f$ are implied by the following conditions:

$$\frac{\partial f}{\partial x_i}(x, y)\,\dot{x}_i + \frac{\partial f}{\partial y}(x, y)\,L(x, \dot{x}) = 0 \tag{18.4}$$

for every $(x, \dot{x}, y)$ satisfying $g_a(x, \dot{x}) = 0$. These conditions are in turn implied by the following conditions: There exist functions $\lambda_a(x, y, \dot{x})$ such that

$$\frac{\partial f}{\partial x_i}(x, y)\,\dot{x}_i + \frac{\partial f}{\partial y}(x, y)\,L(x, \dot{x}) = \lambda_a(x, y, \dot{x})\,g_a(x, \dot{x}) \tag{18.5}$$

identically in $(x, y, \dot{x})$. (We are using the summation convention, with the following range of indices: $1 \le i, j, \ldots \le n$; $1 \le a, b, \ldots \le m$.) Now, conversely, (18.5) is implied by the preceding two conditions if suitable local regularity conditions are satisfied and if again $D$ is sufficiently small. (This is just a matter of applying the implicit function theorem.) In this semi-intuitive account, we shall short-circuit the matter by assuming that (18.5) is true. Then, applying $\partial/\partial \dot{x}_i$ to both sides of (18.5), we have

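$$\frac{\partial f}{\partial x_i} + \frac{\partial f}{\partial y}\frac{\partial L}{\partial \dot{x}_i} = \frac{\partial \lambda_a}{\partial \dot{x}_i}\,g_a + \lambda_a \frac{\partial g_a}{\partial \dot{x}_i},$$

or multiplying by $dx_i$ and summing, we have

$$df = \theta(\lambda_a)\,g_a + \lambda_a\,\theta(g_a) - \frac{\partial f}{\partial y}\,\theta(L),$$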

where

Thus $\theta(L)$ and $\theta(g_a)$ are the Cartan 1-forms associated with the Lagrangian functions. (Recall that $L$ and $g_a$ are assumed to be homogeneous in $\dot{x}$.) We can now eliminate $df$ by applying $d$ to both sides:

$$d(\theta(\lambda_a)\,g_a) = d(\lambda\,\theta(L) - \lambda_a\,\theta(g_a)), \quad \text{where } \lambda = \frac{\partial f}{\partial y}.$$

Now let $(x(t), y(t))$ be the curve in $D^*$ satisfying (18.3), with $x(t)$ the given extremal of the Lagrange variational problem. Let $\sigma(t)$ be the curve $(x(t), y(t), dx/dt)$ in $(x, y, \dot{x})$-space. Since $g_a(x, dx/dt) = 0$, we see that $\sigma$ is a characteristic curve of $d(\lambda\,\theta(L) - \lambda_a\,\theta(g_a))$. Working out the explicit conditions that this be so, we see that $x(t)$ satisfies the equation

where $\lambda(t) = \lambda(x(t), y(t))$, $\lambda_a(t) = \lambda_a(x(t), y(t))$. This says, however, that with this choice of the "Lagrange multipliers" $(\lambda(t), \lambda_a(t))$, the curve $x(t)$ in $D$ is an extremal of the time-dependent, ordinary, unconstrained variational problem whose Lagrangian is $\lambda(t)L + \lambda_a(t)g_a$. This, however, is just the "Lagrange multiplier rule" for finding the extremals of the Lagrange variational problem with which we started.

In summary, we may say that we have shown that the accessibility problem is really the basic one for the calculus of variations, underlying all the classical variational problems. We now turn to a treatment of the accessibility problem for vector-field systems, and in particular, to the proof of Chow's theorem.
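As a concrete illustration of accessibility for a vector-field system, consider a standard example that is not taken from the text: on $R^3$ take $X = \partial/\partial x$ and $Y = \partial/\partial y + x\,\partial/\partial z$. A function $f$ with $X(f) = 0 = Y(f)$ satisfies $\partial f/\partial x = 0$ and $\partial f/\partial y + x\,\partial f/\partial z = 0$, which forces $f$ to be constant, so there are no nontrivial integrals and one expects every point to be accessible. The sketch below (function names are hypothetical) uses the exact flows of $X$ and $Y$ and a small closed loop in the $(x, y)$ directions to produce motion in the $z$ direction.

```python
import math

def flow_X(p, a):
    """Exact flow of X = d/dx for time a."""
    x, y, z = p
    return (x + a, y, z)

def flow_Y(p, b):
    """Exact flow of Y = d/dy + x d/dz for time b (x stays constant along it)."""
    x, y, z = p
    return (x, y + b, z + x * b)

def bracket_loop(p, a, b):
    """Follow X for a, Y for b, X for -a, Y for -b.
    The x and y coordinates return to their start; z changes by a*b."""
    p = flow_X(p, a)
    p = flow_Y(p, b)
    p = flow_X(p, -a)
    p = flow_Y(p, -b)
    return p

def reach(target):
    """Join the origin to `target` using only integral curves of X and Y."""
    tx, ty, tz = target
    p = (0.0, 0.0, 0.0)
    p = flow_X(p, tx)            # fix the x coordinate
    p = flow_Y(p, ty)            # fix the y coordinate; z is now tx*ty
    dz = tz - p[2]               # remaining error in z
    a = math.sqrt(abs(dz))
    b = math.copysign(a, dz)     # a*b equals dz, including its sign
    return bracket_loop(p, a, b)

end_point = reach((1.0, -2.0, 0.5))
```

The loop in `bracket_loop` is the discrete counterpart of the Jacobi bracket $[X, Y] = \partial/\partial z$: motion along $X$ and $Y$ alone generates displacement in the third direction.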

The Accessibility Problem for Vector-Field and Pfaffian Systems

Since the material in this section will be geometric and independent of coordinate systems, it will be convenient to work with a differentiable manifold $M$ instead of a domain $D$ in $R^n$. Recall the relevant notations: $F(M)$ denotes the ring of functions on $M$. (All functions, maps, vector fields, curves, etc., will be of differentiability class $C^\infty$ unless mentioned otherwise.) $V(M)$ denotes the set of vector fields on $M$. Each $X \in V(M)$ is by definition a map $F(M) \to F(M)$ that defines a derivation of $F(M)$. For $f \in F(M)$, $X(f)$, the image of $f$ under this map, is the Lie derivative of $f$ under $X$. (Alternatively, $X$ can be thought of as a first-order partial differential operator on functions on $M$, and $X(f)$ is then the result of applying the operator symbolized by $X$ to $f$.) Geometrically, if $\sigma(t)$, $a \le t \le b$, is an integral curve of $X$, then

$$\frac{d}{dt} f(\sigma(t)) = X(f)(\sigma(t)).$$

For $p \in M$, $M_p$ is the tangent space to $M$ at $p$. Each $v \in M_p$ is by definition an $R$-linear map $F(M) \to R$ such that

$$v(fg) = v(f)\,g(p) + f(p)\,v(g) \quad \text{for all } f, g \in F(M).$$

Each $X \in V(M)$ determines a value $X(p) \in M_p$ at each $p \in M$:

$$X(p)(f) = X(f)(p) \quad \text{for } f \in F(M).$$

If $\sigma(t)$, $a \le t \le b$, is a curve, $\sigma'(t)$, the tangent vector to $\sigma$ at $t$ may be defined as that element of M_σ( 0 such that $\sigma$ can be extended as a $C^\infty$ map to $(a - \epsilon, b + \epsilon)$.)

The tangent spaces to two points of $M$ are both $n$-dimensional real vector spaces; hence they are isomorphic when considered as abstract vector spaces. However, there is no unique way of describing such an isomorphism. Of course, if $M$ is Euclidean space itself, the tangent spaces are isomorphic with $M$ (hence with each other), but we have been emphasizing that this must be ignored if one is interested in nonlinear phenomena. By parallel translation along $\sigma$ we mean some method of setting up consistently and smoothly an isomorphism between $M_{\sigma(a)}$ and $M_{\sigma(t)}$ for $a \le t \le b$. It is to be expected, however, that the isomorphism one derives between $M_{\sigma(a)}$ and $M_{\sigma(b)}$ will depend on the choice of curve joining $\sigma(a)$ to $\sigma(b)$. We shall not worry here about the more general scheme for accomplishing this goal, but shall describe how an affine connection on $M$, that is, a covariant differentiation law $\nabla$ satisfying (19.1), gives such a process.

Suppose the curve $\sigma$ is an integral curve for a vector field $X$ on $M$; that is, $\sigma'(t) = X(\sigma(t))$ for $a \le t \le b$. Of course not every curve can be so described, but those that can be are sufficiently plentiful for our purposes. Suppose that $Y$ is another vector field on $M$. Consider the condition

$$\nabla_X Y(\sigma(t)) = 0 \quad \text{for } a \le t \le b. \tag{19.3a}$$

If $Y$ satisfies this condition, we say that its covariant derivative along $\sigma$ is zero. Let us for the moment look at this condition in a domain $D \subset R^n$ with coordinates $(x_i)$. Suppose that $\sigma(t) = (x_i(t)) = x(t)$. Using (19.2) and the relations

we have


Part 3. Global Riemannian Geometry

Thus we see that (19.3a) is equivalent to the relations

As a first observation, notice that this condition on $X$ and $Y$ involves only the values of $X$ and $Y$ on the curve $\sigma(t)$. Thus, if $x(t) = (x_i(t))$, $a \le t \le b$, is an arbitrary curve in $D$, and if $v: [a, b] \to T(D)$ is a vector field along $x(t)$, that is, $v(t) \in D_{x(t)}$ for $a \le t \le b$, with components $v_i(t) = dx_i(v(t))$; that is, if

we may define another vector field, denoted by $\nabla v(t)$, along the curve $x(t)$ by the following formula and have reasonable expectations that the operator $v \to \nabla v$ on vector fields along the curve is of interest. ($\nabla v$ is called the covariant derivative of $v$ along the curve.) (19.4) Note that $\nabla v(t) = 0$ if and only if (19.5) As soon as $x(t)$ is given, (19.5) may be regarded as a system of first-order, linear, homogeneous differential equations for the components $v_k(t)$ of $v$. Thus we see that, given a $v^a \in D_{x(a)}$, there is a unique vector field $v(t)$ along the curve such that $\nabla v = 0$ and $v(a) = v^a$, and that if $u^a$ is another element of $D_{x(a)}$, and if the vector field $u(t)$ along $x(t)$ satisfies $\nabla u = 0$ and $u(a) = u^a$, then

$$\nabla(u(t) + v(t)) = 0 \quad \text{and} \quad u(a) + v(a) = u^a + v^a.$$

Thus the correspondence $v^a \to v(t)$ sets up an isomorphism between the vector spaces $D_{x(a)}$ and $D_{x(t)}$ for each $t \in [a, b]$. This is the desired "parallel translation" of tangent vectors along the curve. We can sum up what we have proved for domains of $R^n$ in terms of an arbitrary manifold, in the form of the following theorem. The proof for an arbitrary manifold can be done by referring pieces of the manifold back to $R^n$ via charts, and will be left to the reader.

19. Affine Connections on Differential Manifolds

THEOREM 19.1 Let $M$ be a manifold with an affine connection described by a covariant differentiation operation $(X, Y) \to \nabla_X Y$ satisfying (19.1). Let $\sigma: [a, b] \to M$ be a curve in $M$. A vector field on $\sigma$ is a mapping, usually denoted by $v$, assigning a tangent vector $v(t) \in M_{\sigma(t)}$ to each $t \in [a, b]$. Then there is an operation assigning a new vector field $\nabla v$ to each vector field $v$ on $\sigma$, with the following properties:

(a) If $a(t)$ is a real-valued function of $t$, $a \le t \le b$, then $\nabla(a(t)v(t)) = \dfrac{da}{dt}\,v(t) + a(t)\,\nabla v(t)$.

(b) If $u$ and $v$ are vector fields along $\sigma$, then $\nabla u + \nabla v = \nabla(u + v)$.

(c) If $X$ and $Y$ are vector fields on $M$, with $\sigma'(t) = X(\sigma(t))$, $v(t) = Y(\sigma(t))$ for $a \le t \le b$, then $\nabla v(t) = \nabla_X Y(\sigma(t))$.

(d) If $v_a$ is a given tangent vector in $M_{\sigma(a)}$, there is a unique vector field $v$ along $\sigma$ with $v(a) = v_a$ and $\nabla v = 0$. The correspondence $v_a \to v(t)$, for each $t \in [a, b]$, is linear, and sets up an isomorphism between $M_{\sigma(a)}$ and $M_{\sigma(t)}$, called the parallel translation of tangent vectors along $\sigma$.

Any vector field $v$ on $\sigma$ such that $\nabla v = 0$ is said to be self-parallel. If $M = R^n$, a special affine connection can be defined by requiring that $\nabla_{\partial/\partial x_i}(\partial/\partial x_j) = 0$; if $v(t) = v_i(t)(\partial/\partial x_i)(\sigma(t))$, the condition that $\nabla v = 0$ reduces to $dv_i/dt = 0$, that is, $v_i(t) = v_i(a)$. Thus the parallelism idea does not really depend on the curve, and the isomorphism between tangent spaces is that obtained by identifying the tangent space to Euclidean space at a point with Euclidean space itself. The straight lines in Euclidean space are those whose tangent vectors at different points are parallel, that is, curves $\sigma(t)$ satisfying

$$\nabla \sigma'(t) = 0. \tag{19.6a}$$
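In contrast with the flat case, parallel translation for a curved connection does depend on the curve. The following sketch is an illustration under assumptions not made in the text: it uses the unit sphere with its usual metric connection in coordinates $(\theta, \varphi)$, whose nonzero Christoffel symbols are $\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^\varphi_{\theta\varphi} = \cot\theta$, and the convention $dv^k/dt + \Gamma^k_{ij}(dx^i/dt)v^j = 0$ for a self-parallel field, which may differ in index placement from (19.5). The function name is hypothetical. A vector is translated once around the latitude circle $\theta = \theta_0$, $\varphi = t$.

```python
import math

def transport_along_latitude(theta0, v0, steps=4000):
    """Parallel-translate v0 = (v_theta, v_phi) once around the circle
    theta = theta0, phi = t (0 <= t <= 2*pi) on the unit sphere."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        # dv_theta/dt = sin*cos * v_phi,  dv_phi/dt = -cot(theta0) * v_theta
        return [s * c * v[1], -(c / s) * v[0]]

    h = 2.0 * math.pi / steps
    v = list(v0)
    for _ in range(steps):
        # classical fourth-order Runge-Kutta step for the linear system
        k1 = rhs(v)
        k2 = rhs([v[i] + 0.5 * h * k1[i] for i in range(2)])
        k3 = rhs([v[i] + 0.5 * h * k2[i] for i in range(2)])
        k4 = rhs([v[i] + h * k3[i] for i in range(2)])
        v = [v[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
             for i in range(2)]
    return v

theta0 = math.pi / 3.0
v_end = transport_along_latitude(theta0, [1.0, 0.0])
```

In the orthonormal components $(v^\theta, \sin\theta_0\, v^\varphi)$ the transported vector rotates through the angle $2\pi\cos\theta_0$ after one circuit, so for $\theta_0 = \pi/3$ the vector comes back reversed. The map from initial to final vector is linear, in agreement with property (d) of the theorem.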

Thus we are justified in calling the curves satisfying (19.6a), in a space with a general affine connection, the straight lines or self-parallel curves of the space. Let us look at the conditions by referral via a chart back to a domain with coordinates $(x_i)$ in $R^n$. If $\sigma(t) = (x_i(t))$, the components of the vector field $\sigma'(t)$ along $\sigma$ are $(dx_i/dt)$, and, using (19.5), (19.6a) becomes

$$\frac{d^2 x_k(t)}{dt^2} = -\Gamma_{ijk}(x(t))\,\frac{dx_i}{dt}\frac{dx_j}{dt}. \tag{19.6b}$$

These differential equations are, of course, nonlinear and of second order. The most one can say is that there is a unique solution with $x(a)$ and $(dx/dt)(a)$, that is, with $\sigma(a)$ and $\sigma'(a)$, prescribed; that the solution to this initial value problem exists if $b$ is sufficiently close to $a$; and that the obtained solution is a $C^\infty$ function of the initial conditions $x(a)$ and $(dx/dt)(a)$. Of course we also have the following homogeneity condition:

If $t \to \sigma(t)$ is a solution of (19.6a), so is the curve $t \to \sigma(\lambda t)$, where $\lambda$ is a real constant. (19.7)


Now we turn to the description of the torsion and curvature tensor fields associated with an affine connection on a manifold $M$ given by a covariant differentiation operation satisfying (19.1). We shall mainly restrict ourselves to formal considerations, since our goal is to sketch the theory of affine connections in as efficient a manner as possible for use as a tool in Riemannian geometry. The torsion and curvature tensors are, respectively, maps $T: V(M) \times V(M) \to V(M)$ and $R: V(M) \times V(M) \times V(M) \to V(M)$, defined as follows for $X, Y, Z \in V(M)$:

$$T(X, Y) = \nabla_X Y - \nabla_Y X - [X, Y]. \tag{19.8a}$$

$$R(X, Y)(Z) = \nabla_X(\nabla_Y Z) - \nabla_Y(\nabla_X Z) - \nabla_{[X, Y]} Z. \tag{19.8b}$$

We write $R(X, Y)(Z)$ because we mean to suggest that $R$ should be interpreted as a law assigning to each pair $(X, Y)$ of elements of $V(M)$ a mapping $Z \to R(X, Y)(Z)$ of $V(M)$ into itself. Certain algebraic properties of $T$ and $R$ should be evident. First, both $T$ and $R$ are skew-symmetric in $X$ and $Y$. Second, both are obviously $R$-multilinear in their arguments. Less obviously, however (since $\nabla$ is not $F(M)$-multilinear), they are both $F(M)$-multilinear with respect to multiplication of $X$, $Y$, or $Z$ by functions from $F(M)$. For example, using (19.1), for $f \in F(M)$:

$$T(fX, Y) = \nabla_{fX} Y - \nabla_Y(fX) - [fX, Y] = f\nabla_X Y - Y(f)X - f\nabla_Y X + Y(f)X - f[X, Y] = fT(X, Y).$$

$$R(fX, Y)(Z) = f\nabla_X \nabla_Y Z - \nabla_Y(f\nabla_X Z) - \nabla_{[fX, Y]} Z = f\nabla_X \nabla_Y Z - Y(f)\nabla_X Z - f\nabla_Y \nabla_X Z + Y(f)\nabla_X Z - f\nabla_{[X, Y]} Z = fR(X, Y)(Z).$$

That $f$ pulls through with no differentiations acting on it when the other arguments are multiplied by it is verified similarly. This property indicates the tensorial character of $T$ and $R$. We can use this property to define the values of $T$ and $R$ at each point $p \in M$ in a way similar to that we used earlier to define the value of a differential form at a point. The value of $T$ at $p$, denoted by $T_p$, is to be a bilinear mapping $M_p \times M_p \to M_p$. For $u, v \in M_p$, choose vector fields $X, Y \in V(M)$ with $X(p) = u$, $Y(p) = v$, and define

$$T_p(u, v) = T(X, Y)(p).$$

The value of $R$ at $p$, denoted by $R_p$, is to be a multilinear mapping $M_p \times M_p \times M_p \to M_p$ defined as follows: For $u_1, u_2, v \in M_p$, choose $X, Y, Z \in V(M)$ with $X(p) = u_1$, $Y(p) = u_2$, $Z(p) = v$, and

$$R_p(u_1, u_2)(v) = R(X, Y)(Z)(p).$$

Of course we must verify that these definitions make sense, that is, are independent of the vector fields chosen to extend the tangent vectors at $p$. One can show that it suffices to verify this for a neighborhood of each point $p$. Using charts, it suffices, then, to suppose that $M$ is a convex domain of $R^n$, with coordinates $(x_i)$. Thus, for example, if

$$u = u_i \frac{\partial}{\partial x_i}(p), \qquad v = v_i \frac{\partial}{\partial x_i}(p), \qquad X = A_i \frac{\partial}{\partial x_i}, \qquad Y = B_i \frac{\partial}{\partial x_i},$$

with $A_i(p) = u_i$, $B_i(p) = v_i$,

and if

(the $T_{ijk}$ are the components of $T$ with respect to the coordinate system $(x_i)$), then

This shows quite explicitly that defining $T_p(u, v) = T(X, Y)(p)$ is legitimate, since it is independent of how $u, v$ are extended to vector fields. Similar considerations hold also for the curvature tensor, of course.

Consider now a manifold $M$ and a two-parameter family $\delta(s, t)$ in $M$, $a \le t \le b$, $0 \le s \le 1$; that is, $\delta$ is a map $[0, 1] \times [a, b] \to M$. The geometric interpretation of $\delta$ is mainly a matter of taste, of course, but the following picture will be most useful to us: For $s$ held fixed, the curve $t \to \delta(s, t)$ is a curve in $M$, denoted by $\sigma_s$, say. Thus $s \to \sigma_s$ can be considered as a one-parameter family of curves, or as a curve in the space of curves of $M$. $\delta(s, t)$ can be thought of as a homotopy or deformation of the curve $t \to \sigma(t) = \sigma_0(t) = \delta(0, t)$. For notational convenience, we shall usually normalize the parametrization of $t$ to be also the unit interval $[0, 1]$. A vector field on $\delta$ is a map, denoted by, say, $v$, of $[0, 1] \times [0, 1] \to T(M)$, assigning to $s, t \in [0, 1]$ a tangent vector $v(s, t) \in M_{\delta(s,t)}$. The vector fields $\partial_s\delta$, $\partial_t\delta$ on $\delta$ are defined as follows:

For $0 \le s, t \le 1$, $\partial_s\delta(s, t)$ and $\partial_t\delta(s, t)$ are, respectively, the tangent vectors to the curves $u \to \delta(u, t)$ and $u \to \delta(s, u)$ at $u = s$ and $u = t$. Thus $\partial_s\delta$ and $\partial_t\delta$ are the tangent vector fields to the curves obtained by holding, respectively, $t$ and $s$ constant and varying $s$ and $t$. (19.9)



Similarly, if $v: (s, t) \to v(s, t)$ is a vector field on $\delta$, and if $\nabla$ is a fixed affine connection on $M$, define $\nabla_s v$ and $\nabla_t v$ as vector fields on $\delta$, the covariant derivatives of $v$ in, respectively, the $s$- and $t$-directions, as follows:

For $0 \le s, t \le 1$, $\nabla_s v(s, t)$ and $\nabla_t v(s, t)$ are, respectively, the values of the covariant derivative vector fields at $u = s$ and $u = t$ of the vector fields $u \to v(u, t)$ and $u \to v(s, u)$ along the curves $u \to \delta(u, t)$ and $u \to \delta(s, u)$. (19.10)

The reader will find it easier to keep these definitions in mind by referring to a special case, namely: Suppose that $X$ and $Y$ are vector fields on $M$ such that for each $s$, $t \to \delta(s, t)$ is an integral curve of $X$, and for each $t$, $s \to \delta(s, t)$ is an integral curve of $Y$. Then, from (19.9),

$$\partial_t\delta(s, t) = X(\delta(s, t)), \qquad \partial_s\delta(s, t) = Y(\delta(s, t)). \tag{19.11}$$

Of course not all homotopies can be written in this form, but usually this type of homotopy is sufficiently general so that results proved for them extend to arbitrary homotopies. If also there is a vector field $Z$ on $M$ so that $Z(\delta(s, t)) = v(s, t)$, (19.10) takes the form

$$\nabla_t v(s, t) = \nabla_X Z(\delta(s, t)), \tag{19.12a}$$

$$\nabla_s v(s, t) = \nabla_Y Z(\delta(s, t)). \tag{19.12b}$$

Note that if $X, Y \in V(M)$ satisfy (19.11), then

$$[X, Y](\delta(s, t)) = 0. \tag{19.13}$$

Proof. It suffices to prove this in case $\delta$ is contained in a sufficiently small open set of $M$. We may as well suppose, then, that we are in a domain of $R^n$ with coordinates $(x_i)$. Suppose that

$$X = A_i\frac{\partial}{\partial x_i}, \qquad Y = B_i\frac{\partial}{\partial x_i},$$

and that

$$\delta(s, t) = (x_i(s, t)) = x(s, t).$$

Then (19.11) reduces to

$$\frac{\partial x_i}{\partial t} = A_i(x(s, t)), \qquad \frac{\partial x_i}{\partial s} = B_i(x(s, t)).$$


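Thus,

$$0 = \frac{\partial^2 x_i}{\partial s\,\partial t} - \frac{\partial^2 x_i}{\partial t\,\partial s} = \frac{\partial A_i}{\partial x_j}\frac{\partial x_j}{\partial s} - \frac{\partial B_i}{\partial x_j}\frac{\partial x_j}{\partial t} = \frac{\partial A_i}{\partial x_j}B_j - \frac{\partial B_i}{\partial x_j}A_j,$$

and the right-hand side is the $i$th component of $-[X, Y]$ at $x(s, t)$, whence (19.13).

Now we are prepared to state the fundamental formulas connecting this "covariant derivative along curves" concept with the curvature and torsion tensor fields: Suppose that $\delta(s, t)$, $0 \le s, t \le 1$, is a homotopy of curves in a manifold $M$ with affine connection $\nabla$, and that $v: (s, t) \to v(s, t) \in M_{\delta(s,t)}$ is a vector field on $\delta$. Then,

$$\nabla_s\partial_t\delta(s, t) - \nabla_t\partial_s\delta(s, t) = T_{\delta(s,t)}(\partial_s\delta(s, t), \partial_t\delta(s, t)), \tag{19.14a}$$

$$\nabla_s\nabla_t v(s, t) - \nabla_t\nabla_s v(s, t) = R_{\delta(s,t)}(\partial_s\delta(s, t), \partial_t\delta(s, t))(v(s, t)), \tag{19.14b}$$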

where, for each $p \in M$, $T_p$ and $R_p$ are respectively the values, as defined above, of the torsion and curvature tensors of the affine connection.

Proof. We shall give the proof only in the special case where there are vector fields $X, Y$ on $M$ related to $\delta$ via (19.11), and a vector field $Z$ on $M$ such that $v(s, t) = Z(\delta(s, t))$. The proof in complete generality must be done by a straightforward but tedious computation in local coordinates, which we leave to the reader as an exercise. By (19.13),

$$[X, Y](\delta(s, t)) = 0.$$

Hence

$$\nabla_s\partial_t\delta(s, t) - \nabla_t\partial_s\delta(s, t) = \nabla_Y X(\delta(s, t)) - \nabla_X Y(\delta(s, t)) - [Y, X](\delta(s, t)) = T(Y, X)(\delta(s, t)) = T_{\delta(s,t)}(\partial_s\delta(s, t), \partial_t\delta(s, t)),$$

whence (19.14a). By (19.12),

$$\nabla_t v(s, t) = \nabla_X Z(\delta(s, t)), \qquad \nabla_s v(s, t) = \nabla_Y Z(\delta(s, t)).$$

Then applying (19.12) again, we have

$$\nabla_s\nabla_t v(s, t) = \nabla_Y\nabla_X Z(\delta(s, t)),$$

$$\nabla_t\nabla_s v(s, t) = \nabla_X\nabla_Y Z(\delta(s, t)) = [\nabla_Y\nabla_X Z + \nabla_{[X,Y]}Z + R(X, Y)(Z)](\delta(s, t)).$$



Now $R(X, Y)(Z)(\delta(s, t)) = R_{\delta(s,t)}(X(\delta(s, t)), Y(\delta(s, t)))(Z(\delta(s, t)))$, by definition of the "value" of $R$ at a point. We have seen that $[X, Y](\delta(s, t)) = 0$ implies that $\nabla_{[X,Y]}(Z)(\delta(s, t)) = 0$. Subtraction, together with the skew-symmetry $R(X, Y) = -R(Y, X)$, now gives (19.14b).

As first application of (19.14), we have:

THEOREM 19.2 Suppose that $M$ is a manifold with an affine connection. If, for any curve $\sigma: [a, b] \to M$, parallel translation of $M_{\sigma(a)}$ to $M_{\sigma(b)}$ along $\sigma$ does not actually depend on the curve joining $\sigma(a)$ to $\sigma(b)$, then the curvature tensor of the affine connection is identically zero. Conversely, if the curvature tensor is identically zero and if $M$ is simply connected, then parallel translation of tangent vectors along curves joining any two points of $M$ really does not depend on the curve joining the two points.

Proof. Suppose that $\delta(s, t)$, $0 \le s, t \le 1$, is a homotopy of curves with fixed end points; that is,

$$\delta(s, 0) = \delta(0, 0), \qquad \delta(s, 1) = \delta(0, 1) \qquad \text{for } 0 \le s \le 1.$$

Let $v_0$ be an element of $M_{\delta(0,0)}$, and define a vector field $v(s, t)$ on $\delta$ as follows: $v(s, 0) = v_0$, and the vector field $t \to v(s, t)$ on the curve $t \to \delta(s, t)$ is self-parallel. Analytically, this means that $\nabla_t v = 0$. Applying (19.14), we have

$$\nabla_s\nabla_t v = 0 = \nabla_t\nabla_s v + R(\partial_s\delta, \partial_t\delta)(v).$$

Now $v(s, 1) \in M_{\delta(0,1)}$ is the parallel-translate of $v_0$ along the curve $t \to \delta(s, t)$. Thus, if parallel translation is independent of the curve, we must have $v(s, 1) = v(0, 1)$, whence $\nabla_s v(s, 1) = 0$, whence

$$R_{\delta(0,1)}(\partial_s\delta(0, 1), \partial_t\delta(0, 1))(v(0, 1)) = 0.$$

Since $v_0$ and $\delta$ can be chosen completely arbitrarily, we must obviously have $R = 0$.

Conversely if $R = 0$, suppose $\sigma(t)$ and $\sigma_1(t)$, $0 \le t \le 1$, are two curves joining $\sigma(0)$ and $\sigma(1)$. If $M$ is simply connected, we can find a fixed end-point homotopy $\delta(s, t)$ such that

$$\delta(0, t) = \sigma(t), \qquad \delta(1, t) = \sigma_1(t).$$

Let $v$ be a vector field along $\delta$ such that $\nabla_t v = 0$ and $v(s, 0) = v_0$. Since $R = 0$, $\nabla_t\nabla_s v = \nabla_s\nabla_t v = 0$. Since $v(s, 0) = v_0$, also $\nabla_s v(s, 0) = 0$.


Hence, for fixed $s$, $\nabla_s v(s, t)$ is the parallel translate of $\nabla_s v(s, 0)$ along the curve $t \to \delta(s, t)$; hence $\nabla_s v(s, t)$ must also be zero. In particular, $\nabla_s v(s, 1) = 0$. Since $\delta(s, 1) = \delta(1, 1)$, this forces $v(s, 1) = v(0, 1)$ for all $s$; in particular $v(1, 1) = v(0, 1)$. But $v(1, 1)$ and $v(0, 1)$ are, respectively, the parallel translates of the vector $v_0$ along the curves $\sigma_1$ and $\sigma$. Q.E.D.
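Theorem 19.2 can be illustrated numerically on a curved example. The sketch below is not from the text: it assumes the unit 2-sphere with its Riemannian connection in coordinates $(\theta, \phi)$, whose standard Christoffel symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$. Writing a tangent vector along the latitude circle $\theta = \theta_0$ in orthonormal components $a = v^\theta$, $b = \sin\theta_0\, v^\phi$, the self-parallel condition becomes $da/d\phi = \cos\theta_0\, b$, $db/d\phi = -\cos\theta_0\, a$. The function name and step count are hypothetical choices.

```python
import math


def transport_around_latitude(theta0, a0, b0, steps=2000):
    """Parallel-translate (a0, b0) once around the latitude theta = theta0.

    Assumption (not from the text): unit 2-sphere, orthonormal components
    a = v^theta, b = sin(theta0) * v^phi, so that the transport equations are
        da/dphi =  cos(theta0) * b
        db/dphi = -cos(theta0) * a.
    Integrated with the classical fourth-order Runge-Kutta scheme.
    """
    c = math.cos(theta0)

    def rhs(a, b):
        return c * b, -c * a

    h = 2.0 * math.pi / steps
    a, b = a0, b0
    for _ in range(steps):
        k1a, k1b = rhs(a, b)
        k2a, k2b = rhs(a + 0.5 * h * k1a, b + 0.5 * h * k1b)
        k3a, k3b = rhs(a + 0.5 * h * k2a, b + 0.5 * h * k2b)
        k4a, k4b = rhs(a + h * k3a, b + h * k3b)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return a, b


if __name__ == "__main__":
    # Closed loop at colatitude pi/3: the vector does not return to itself.
    print(transport_around_latitude(math.pi / 3.0, 1.0, 0.0))
    # The equator is a geodesic; the vector returns unchanged.
    print(transport_around_latitude(math.pi / 2.0, 1.0, 0.0))
```

Under these assumptions the translated vector is rotated through the angle $2\pi\cos\theta_0$, so translation around the closed latitude circle differs from translation along the constant curve at the same point. Parallel translation therefore depends on the curve, in agreement with the nonvanishing curvature of the sphere.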

20

The Riemannian Affine Connection and the First Variation Formula

Suppose that $M$ is a manifold with a Riemannian Lagrangian $L \in F(T(M))$. As explained in the Introduction, this means that, for each $p \in M$, there is a positive definite symmetric bilinear form $(u, v) \to \langle u, v\rangle$ on each tangent space $M_p$, such that

$$L(v) = \langle v, v\rangle^{1/2} = \|v\|.$$

If $X, Y$ are vector fields on $M$, then the inner product of $X$ and $Y$, denoted by $\langle X, Y\rangle$, can be defined as

$$\langle X, Y\rangle(p) = \langle X(p), Y(p)\rangle.$$

This inner product has the properties:

$$\langle X, Y\rangle \in F(M) \quad \text{for } X, Y \in V(M). \tag{20.1a}$$

$$\langle X, Y\rangle = \langle Y, X\rangle. \tag{20.1b}$$

$$\langle f_1X_1 + f_2X_2, Y\rangle = f_1\langle X_1, Y\rangle + f_2\langle X_2, Y\rangle \quad \text{for } f_1, f_2 \in F(M),\ X_1, X_2, Y \in V(M). \tag{20.1c}$$

$$\langle X, X\rangle \ge 0; \quad \langle X, X\rangle(p) = 0 \text{ implies } X(p) = 0. \tag{20.1d}$$

Clearly, a Riemannian metric can be just as well defined by an inner product satisfying (20.1). If (20.1d) is replaced by the weaker definiteness condition,

$$\langle X, Y\rangle = 0 \text{ for all } Y \in V(M) \text{ implies } X = 0, \tag{20.1d'}$$

then the inner product is said to define a pseudo-Riemannian structure on $M$. Many of the elementary formal properties carry over from Riemannian to pseudo-Riemannian situations. However, since the deeper global properties of the geodesics do not carry over in a routine way, we shall restrict ourselves to the Riemannian case. Of course the pseudo-Riemannian case is very important for the theory of relativity, but we must refer for the moment to Lichnerowicz' book [1] for an account of this aspect.


THEOREM 20.1 Suppose that $M$ is a manifold, with a Riemannian metric defined by an inner product for vector fields satisfying (20.1). Then there is a unique affine connection $\nabla$ on $M$ such that (a) $\nabla_X Y = \nabla_Y X + [X, Y]$ for $X, Y \in V(M)$; that is, the torsion tensor is identically zero. (b) $Z(\langle X, Y\rangle) = \langle\nabla_Z X, Y\rangle + \langle X, \nabla_Z Y\rangle$ for $X, Y, Z \in V(M)$.

This affine connection is called the Riemannian (or Levi-Civita) connection.

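Proof. We prove uniqueness first. Rewrite (b) as

$$\langle\nabla_Z X, Y\rangle = Z\langle X, Y\rangle - \langle X, \nabla_Z Y\rangle$$

= (using (a)),

$$Z\langle X, Y\rangle - \langle X, \nabla_Y Z\rangle - \langle X, [Z, Y]\rangle$$

= (using (b) again),

$$Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle\nabla_Y X, Z\rangle$$

= (using (a)),

$$Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle\nabla_X Y, Z\rangle + \langle[Y, X], Z\rangle$$

= (using (b)),

$$Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle[Y, X], Z\rangle + X\langle Y, Z\rangle - \langle Y, \nabla_X Z\rangle$$

= (using (a)),

$$Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle[Y, X], Z\rangle + X\langle Y, Z\rangle - \langle Y, \nabla_Z X\rangle - \langle Y, [X, Z]\rangle.$$

Finally, then, we have

$$\langle\nabla_Z X, Y\rangle = \tfrac{1}{2}\bigl(Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle[Y, X], Z\rangle + X\langle Y, Z\rangle - \langle Y, [X, Z]\rangle\bigr). \tag{20.2}$$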

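Since the right-hand side does not now involve $\nabla$, uniqueness of $\nabla$ is proved. We can also use this formula to define $\nabla_Z X$ if it is verified (left to the reader) that when $fZ$ or $fY$ is substituted for $Z$ or $Y$ (for $f \in F(M)$), the function $f$ pulls out to multiply everything on the right. Alternately, we work out $\nabla$ in terms of a coordinate system $(x_i)$. Suppose

$$g_{ij} = \left\langle\frac{\partial}{\partial x_i}, \frac{\partial}{\partial x_j}\right\rangle.$$

The rules (20.1) imply that for each $p$, $(g_{ij}(p))$ is a positive definite, symmetric matrix. The $g_{ij}$ are the components of the metric with respect to the coordinate system. The Lagrangian $L(v) = \|v\|$ can then be written as

$$L(v) = \bigl(g_{ij}(p)\,v_i v_j\bigr)^{1/2} \quad \text{for } v = v_i\frac{\partial}{\partial x_i}(p). \tag{20.3}$$

Let $(g^{-1}_{ij})$ be the inverse matrix to $(g_{ij})$. Finally, then, applying (20.2) to the coordinate vector fields (whose Jacobi brackets vanish),

$$\nabla_{\partial/\partial x_i}\frac{\partial}{\partial x_j} = \frac{1}{2}\,g^{-1}_{kl}\left(\frac{\partial g_{jl}}{\partial x_i} + \frac{\partial g_{il}}{\partial x_j} - \frac{\partial g_{ij}}{\partial x_l}\right)\frac{\partial}{\partial x_k}. \tag{20.4}$$

This formula can serve to define $\nabla$ in each coordinate patch. The proven uniqueness guarantees that the $\nabla$ defined in each patch agrees in the overlap when two patches intersect; hence defines a $\nabla$ operator globally on $M$.

We can use the Riemannian metric to define a length function for curves. If $\sigma: [a, b] \to M$, the length of $\sigma$ is

$$\int_a^b \|\sigma'(t)\|\,dt = \int_a^b L(\sigma'(t))\,dt = L(\sigma). \tag{20.5}$$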
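The coordinate description of the Riemannian connection can be tried out numerically. The sketch below is an illustration under stated assumptions, not the author's computation: it uses the standard Christoffel-symbol form $\Gamma^k_{ij} = \tfrac{1}{2} g^{-1}_{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$, approximates the derivatives of the metric by central differences, and takes the unit 2-sphere metric $g = \mathrm{diag}(1, \sin^2\theta)$ as a test case. All function names are hypothetical.

```python
import math


def sphere_metric(x):
    """Components g_ij of the unit 2-sphere in coordinates x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def inverse_2x2(g):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivative(metric, x, l, h=1e-5):
    """Central-difference approximation of d g_ij / d x_l at the point x."""
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(n)]
            for i in range(n)]


def christoffel(metric, x):
    """Return gamma[k][i][j], the k-th component of nabla_{d_i} d_j.

    Assumes a 2-dimensional coordinate patch (inverse_2x2 is used).
    """
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, l) for l in range(n)]  # dg[l][i][j] = d_l g_ij
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                gamma[k][i][j] = 0.5 * sum(
                    ginv[k][l] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                    for l in range(n))
    return gamma


if __name__ == "__main__":
    gamma = christoffel(sphere_metric, [1.0, 0.3])
    print(gamma[0][1][1], gamma[1][0][1])
```

For the sphere the two nonzero independent components should agree with the closed forms $-\sin\theta\cos\theta$ and $\cot\theta$ up to the finite-difference error.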

Of course the general theory of homogeneous, regular, ordinary variational problems developed in Chapter 14 applies in this special case, but it is instructive to go over the same ground, using the Riemannian affine connection, which is not available when discussing more general variational problems. Certain normalizations of the parametrizations of curves are convenient. First we say that the curve $\sigma$ is parametrized by arc length if $\langle\sigma'(t), \sigma'(t)\rangle = 1$ for $a \le t \le b$. In this case, $b - a$ is the length of the curve. We say that $\sigma$ is parametrized proportionally to arc length if

$$b = 1, \qquad a = 0, \qquad \langle\sigma'(t), \sigma'(t)\rangle = \langle\sigma'(0), \sigma'(0)\rangle \quad \text{for } 0 \le t \le 1.$$

Then

$$\text{length } \sigma = \langle\sigma'(0), \sigma'(0)\rangle^{1/2}.$$

Consider a homotopy $\delta(s, t)$, $0 \le s, t \le 1$, with $\delta(0, t) = \sigma(t)$; that is, the homotopy defines a deformation of $\sigma$. For $0 \le s \le 1$, let $\sigma_s$ be the curve $\sigma_s(t) = \delta(s, t)$. Let $L(\sigma_s)$ = length of $\sigma_s$. We are interested in computing $(d/ds)L(\sigma_s)$. For this purpose, we shall use freely the ideas of covariant differentiation of vector fields along curves and homotopies developed in Chapter 19, always referring to the Riemannian connection given by Theorem 20.1. We can suppose without any loss in generality that each of the curves $t \to \delta(s, t)$ is parametrized proportionally to arc length. Now

$$L(\sigma_s) = \int_0^1 \langle\partial_t\delta(s, t), \partial_t\delta(s, t)\rangle^{1/2}\,dt.$$


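Parametrization proportional to arc length implies

$$\langle\partial_t\delta(s, t), \partial_t\delta(s, t)\rangle = L(\sigma_s)^2.$$

Hence

$$\frac{1}{2}\frac{d}{ds}L(\sigma_s)^2 = \frac{1}{2}\frac{d}{ds}\int_0^1\langle\partial_t\delta, \partial_t\delta\rangle\,dt = \int_0^1\langle\nabla_s\partial_t\delta, \partial_t\delta\rangle\,dt$$

= (using (19.14a); that is, the fact that the torsion tensor is zero),

$$\int_0^1\langle\nabla_t\partial_s\delta, \partial_t\delta\rangle\,dt = \int_0^1\left[\frac{\partial}{\partial t}\langle\partial_s\delta, \partial_t\delta\rangle - \langle\partial_s\delta, \nabla_t\partial_t\delta\rangle\right]dt.$$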

Consider now the vector field $t \to v(t) = \partial_s\delta(0, t)$ along the initial curve $\sigma(t) = \delta(0, t)$. It may be thought of as the infinitesimal deformation corresponding to the deformation $s \to \sigma_s$ of $\sigma$. In terms of this vector field, the first variation formula is

$$\frac{1}{2}\frac{d}{ds}\bigl(L(\sigma_s)^2\bigr)\Big|_{s=0} = \langle v(1), \sigma'(1)\rangle - \langle v(0), \sigma'(0)\rangle - \int_0^1\langle v(t), \nabla\sigma'(t)\rangle\,dt. \tag{20.6}$$

Now, if $\sigma$ is to be considered as an extremal of the variational problem, it should be clear that it should satisfy

$$\nabla\sigma'(t) = 0; \tag{20.7}$$

that is, the extremals or geodesics are the self-parallel curves or straight lines of the associated Riemannian affine connection. Of course it could be verified that (20.7) is just the Euler equation for the variational problem, but there is no real need to do this here explicitly. Having obtained the geodesics, we would expect next, following the general theory, to try and prove that, at least locally, they are minimizing by using extremal fields. However, again the general theory can be circumvented by use of the Riemannian connection (although the reader will notice that the "extremal field" idea appears in disguised form).

Part 3. Global Riemannian Geometry

THEOREM 20.2

Let $M$ be a Riemannian manifold and let $p_0$ be a point of $M$. For each $r > 0$, let $B(r)$ be the set of tangent vectors $v \in M_{p_0}$ whose length is less than $r$, that is, satisfying $\langle v, v\rangle < r^2$. (Thus, $B(r)$ is the open ball of radius $r$ with center 0 in the vector space $M_{p_0}$ considered as a Euclidean space with the inner product $\langle\ ,\ \rangle$.) Then:

(a) If $r$ is sufficiently small, there is a mapping denoted by $\exp: B(r) \to M$ such that, for $v \in B(r)$, $t \to \exp(tv)$, $0 \le t \le 1$, is the geodesic of $M$ starting at $p_0$ and tangent there to $v$. (b) The Jacobian of the map exp at the point $0 \in B(r)$ is nonzero. (c) If $r$ is sufficiently small, exp is a diffeomorphism of $B(r)$ with an open neighborhood $B(r, p_0)$ of $p_0$, called an open geodesic ball about $p_0$ of radius $r$.† As $p_0$ varies over any compact subset of $M$, the radius $r$ of a geodesic ball can be chosen uniformly.

Proof. We have seen that, given $v \in M_{p_0}$, there is an $\varepsilon > 0$ and a geodesic $\sigma(t; v)$ defined for $0 \le t \le \varepsilon$, which is equal to $p_0$ for $t = 0$ and tangent at $t = 0$ to $v$. Since, in a coordinate system about $p_0$, $\sigma$ is determined by a system of second-order ordinary differential equations, with $p_0$ and $v$ determining the initial conditions, we see from the existence theorem on solutions of ordinary differential equations that $\sigma$ also varies in a $C^\infty$ way when $p_0$ and $v$ vary over the tangent bundle to $M$. Further, $\varepsilon$ can be chosen uniformly when $p_0$ and $v$ vary over a compact subset of $T(M)$. We also have derived the homogeneity condition: For $\lambda > 0$, $t \to \sigma(\lambda t; v)$ is a geodesic beginning at $p_0$, defined for $0 \le t \le \varepsilon/\lambda$, tangent there to $\lambda v$. By uniqueness, we have

$$\sigma(\lambda t; v) = \sigma(t; \lambda v).$$

Thus we can normalize $\varepsilon = 1$ at the expense of making $v$ small; hence we can arrange that $\sigma(t; v)$ is defined for $0 \le t \le 1$, $\langle v, v\rangle < r^2$, when $r$ is sufficiently small. This $r$ will do for part (a), since we can then define

$$\exp(v) = \sigma(1; v) \quad \text{when } v \in B(r).$$

To prove part (b), since $M_{p_0}$ and $M$ are the same dimension, it suffices to prove that $\exp_*$ maps the tangent space to $M_{p_0}$ at 0 onto $M_{p_0}$. But, if $u \in M_{p_0}$, $t \to \exp(rt \cdot (u/r))$ is a curve in $M$ beginning at $p_0$ and tangent there to $u$. Hence, $\exp_*$ maps the tangent vector to the curve $t \to rt \cdot (u/r)$ onto $u$. Part (c) follows now from the implicit function theorem. Notice that an open geodesic ball of radius $r$ can also be characterized as an open subset $U$ of $M$ containing $p_0$ such that: (a) Each $p \in U$ can be joined to $p_0$ by a unique geodesic of length $< r$, and each geodesic of length less than $r$ beginning at $p_0$ lies in $U$. (b) The map $\exp^{-1}: U \to B(r)$ (defined because, by (a), exp restricted to $B(r)$ is 1-1 and onto $U$) is $C^\infty$. Condition (b) is just a technical point; that is, (a) describes the intuitive geometric meaning of a geodesic ball, but (b) seems to be necessary, since there are of course 1-1 onto $C^\infty$ maps of manifolds that have no $C^\infty$ inverses. For example, the map $x \to x^3$ of $R \to R$.

† If $\phi: N \to N'$ is a ($C^\infty$) map of manifolds, and if $U$ is an open subset of $N$, we say that $\phi$ establishes a diffeomorphism of $U$ with its image $\phi(U)$ if: (a) $\phi(U)$ is open in $N'$, and (b) $\phi$ restricted to $U$ has an inverse map $\phi(U) \to U$.

Another simple consequence of the first variation formula is: Suppose $\delta(s, t)$, $0 \le s, t \le 1$, is a deformation of curves such that

(a) the length of each curve $t \to \delta(s, t)$ is the same; (b) $t \to \sigma(t) = \delta(0, t)$ is a geodesic; (c) $t \to v(t) = \partial_s\delta(0, t)$ is the infinitesimal deformation.

Then, $\langle v(1), \sigma'(1)\rangle = \langle v(0), \sigma'(0)\rangle$. This result is known as Gauss' lemma.

THEOREM 20.3 Let $M$ be a manifold with a Riemannian metric and let $p_0$ be a point of $M$. Suppose that $B(r, p_0)$ is an open geodesic ball about $p_0$ of radius $r$. Suppose that $p \in B(r, p_0)$ and that the length of the unique geodesic of length less than $r$ joining $p$ to $p_0$ is $W(p, p_0)$. Then the length of any path joining $p_0$ to $p$ is greater than $W(p, p_0)$ except if the path actually is the geodesic of length less than $r$ joining $p$ to $p_0$.

Proof. The function $p \to W(p, p_0)$ is a $C^\infty$ function on $B(r, p_0) - p_0$, since it is just the carry over via $(\exp)^{-1}$ of the function $v \to \|v\|$ on $B(r) - (0)$, which is known to be $C^\infty$. Now, in general, given a function $f$ defined in an open set $U$ of a Riemannian manifold, we can define the gradient vector field in $U$, denoted by grad $f$, by either of the following equivalent conditions: $\langle\text{grad } f, Y\rangle = df(Y) = Y(f)$

n+l

)

ai(s>ui(O) = O ,

+

forcing lims+o a&) = 0 for a = n 1 5 i I 2 n . Now, an interval of a, say, 0 5 t 5 E , is free of conjugate points. Then lim us(&)= lim s-ro

s-0

(, C

jZn+l

ai(s)ui(E)

+ j sCn ai(S)ui(E)

1

= 0.

Since O ( E ) is not a conjugate point, the (vi(&))are linearly independent elements This forces of Mu,,,. lim

s+o

(3" 1 ai(sjui(E)

= 0;

hence lims.+oai(s) = 0 for all i. If a(t) is a geodesic in M and u is a continuous, piecewise C" vector field on O , define

that is, H(u) is an abbreviation for the second variation formula in the fixed end-point case. Thus, if H ( u ) < 0 for some such vector field vanishing at t = 0 and t = 1, a is not a minimizing geodesic joining a(0) to a(1). Further, if u is a Jacobi vector field, we see, after integrating by parts the first term of the integrand of (22.6), that

N u ) = - -

s u(s) - to

~

to) = Vo(to), and lims+toVuS(s)= 0, by (22.8). Thus

lim __ H(’”) - (Vu(t,), Vu(t,)) > 0. s - to

s+to

Hence $H(u^s) < 0$ for $s < t_0$ and $(t_0 - s)$ sufficiently small, which proves that there is a shorter path than $\sigma$ joining $\sigma(0)$ to $\sigma(1)$. Q.E.D.

Having shown that geodesics are not minimizing beyond the first conjugate point, we may ask: What if a geodesic has no conjugate points? (The case where the end point $\sigma(1)$ is the first conjugate point is the borderline case that can go either way.) It is unreasonable to expect that the absence of conjugate points, without any further assumptions, would imply that the geodesic was minimizing, since a geodesic's being minimizing is a global property, whereas the absence of conjugate points is a local condition. Thus the reasonable expectation is that the absence of conjugate points will imply that the length of the geodesic will be less than that of "nearby" paths joining the same end points. One way of making this precise is by requiring that $H(u) > 0$ for all nonzero, continuous, piecewise differentiable vector fields on the geodesic vanishing at the end points. We shall now show that this (and more) is true.

22. Second Variation Formula-Jacobi Vector Fields

THEOREM 22.7 Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of a Riemannian manifold $M$. Suppose that $\sigma(t)$ is not a conjugate point of $\sigma(0)$ with respect to $\sigma$ for $0 \le t \le 1$. (We shall say in this case that $\sigma$ contains no conjugate points.) Let $v$ be a Jacobi vector field on $\sigma$ such that $v(0) = 0$. If $u$ is any other continuous, piecewise differentiable vector field along $\sigma$ with $u(0) = 0$, $u(1) = v(1)$, then $H(u) \ge H(v)$. Equality holds only if $u = v$.

Proof. Suppose $\dim M = n$. Let $v_i$, $1 \le i, j, \ldots \le n$, be Jacobi vector fields on $\sigma$ such that $v_i(0) = 0$ and so that the vectors $\nabla v_i(0)$ form a basis of $M_{\sigma(0)}$. By Lemma 22.5, $v_i(t)$ forms a basis for $M_{\sigma(t)}$ for $0 < t \le 1$. Suppose that $u(t) = a_i(t)v_i(t)$ for $0 < t \le 1$:

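$$H(u) = \lim_{\varepsilon\to 0}\int_\varepsilon^1\bigl[\langle\nabla u(t), \nabla u(t)\rangle - \langle u(t), R(u(t), \sigma'(t))(\sigma'(t))\rangle\bigr]\,dt = \lim_{\varepsilon\to 0}H_\varepsilon(u),$$

where $H_\varepsilon(u)$ is the integral. We now evaluate $H_\varepsilon(u)$:

$$\nabla u = \frac{da_i}{dt}v_i + a_i\nabla v_i.$$

Substituting,

$$H_\varepsilon(u) = \int_\varepsilon^1\left[\frac{da_i}{dt}\frac{da_j}{dt}\langle v_i, v_j\rangle + 2\frac{da_i}{dt}a_j\langle v_i, \nabla v_j\rangle + a_ia_j\langle\nabla v_i, \nabla v_j\rangle - a_ia_j\langle v_i, R(v_j, \sigma')(\sigma')\rangle\right]dt.$$

We now prove that

$$\frac{d}{dt}\bigl(\langle\nabla v_i(t), v_j(t)\rangle - \langle\nabla v_j(t), v_i(t)\rangle\bigr) = 0. \tag{22.9}$$

(This holds for any pair of Jacobi vector fields, independently of the initial conditions.) The left-hand side of (22.9) is

$$\langle\nabla\nabla v_i, v_j\rangle + \langle\nabla v_i, \nabla v_j\rangle - \langle\nabla\nabla v_j, v_i\rangle - \langle\nabla v_j, \nabla v_i\rangle = -\langle R(v_i, \sigma')(\sigma'), v_j\rangle + \langle R(v_j, \sigma')(\sigma'), v_i\rangle = 0,$$

using the identities for the curvature tensor to be provided in Chapter 23.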



In our case, where $v_i(0) = v_j(0) = 0$, note that (22.9) implies that

$$\langle\nabla v_i, v_j\rangle = \langle\nabla v_j, v_i\rangle. \tag{22.10}$$

Returning to $H_\varepsilon(u)$, using (22.10), integrating by parts the third term, and applying Jacobi equations, we have

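Now $v(1) = u(1)$ implies that $v(t) = a_i(1)v_i(t)$. Thus,

$$H(v) = a_i(1)a_j(1)\langle\nabla v_i(1), v_j(1)\rangle;$$

hence,

$$H_\varepsilon(u) = H(v) + \int_\varepsilon^1\left\langle\frac{da_i}{dt}v_i, \frac{da_j}{dt}v_j\right\rangle dt - \langle a_i(\varepsilon)\nabla v_i(\varepsilon), u(\varepsilon)\rangle.$$

Now

$$a_i\nabla v_i = \nabla u - \frac{da_i}{dt}v_i.$$

Define a vector field $t \to w(t)$ along $\sigma$ for $0 < t \le 1$ by

$$w(t) = \frac{da_i}{dt}(t)\,v_i(t).$$

Then $u$ is a Jacobi field if and only if $w(t) = 0$ for $0 < t \le 1$. Substituting in $H_\varepsilon(u)$, we have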

ff,(u) = H ( u ) + Je'(w(t), w(t>> d t - 0. Suppose otherwise: Since U ( E ) --f 0 as c + 0, we must clearly have lim ( w ( E ) , w ( E ) ) = co; &-0

hence, also lim ( w ( E ) , u(E)) = - 00. E-0

Now we can choose numbers

E

arbitrarily close to zero for which

<w(c), w ( E ) ) = max <w(t), w ( t ) ) . &St 0. Then M is compact. Further, every geodesic of M of length 2 must contain at least one conjugate point, and A4 has diameter s Jniun.

Jqin

Proof. Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $M$ that is free of conjugate points. Let $n = \dim M$, and let $u_i$, $1 \le i, j, \ldots \le n$, be self-parallel vector fields (that is, $\nabla u_i = 0$) along $\sigma$ whose values at each $\sigma(t)$ form a basis of $M_{\sigma(t)}$, and with $\langle u_i(t), u_j(t)\rangle = \delta_{ij}$.

If $v(t) = a_i(t)u_i(t)$, with $a_i(0) = a_i(1) = 0$, we must have, by Theorem 22.7, $H(v) \ge 0$. But

$$H(v) = \int_0^1\bigl[\langle\nabla v, \nabla v\rangle - \langle v, R(v, \sigma')(\sigma')\rangle\bigr]\,dt$$

= (after integrating by parts the first term and using $v(0) = v(1) = 0$),

$$-\int_0^1\bigl[\langle\nabla\nabla v, v\rangle + \langle v, R(v, \sigma')(\sigma')\rangle\bigr]\,dt.$$

Suppose we try to find such a $v$ satisfying

$$\nabla\nabla v = -\lambda^2 v \quad \text{for some constant } \lambda > 0.$$

Working out the conditions $v$ would have to solve, we see that the simplest vector fields of this type would be $v_i(t) = \sin(\pi t)u_i(t)$.

Then

$$H(v_i) = \pi^2\int_0^1\sin^2(\pi t)\,dt - \int_0^1\sin^2(\pi t)\,\langle u_i(t), R_{\sigma(t)}(u_i(t), \sigma'(t))(\sigma'(t))\rangle\,dt.$$
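The sign of $H$ on the test fields $\sin(\pi t)u_i(t)$ can be explored numerically in a model case. The sketch below is an illustration only and rests on assumptions that are not in the text: the manifold has constant sectional curvature $K$, the geodesic is parametrized on $[0, 1]$ with $\|\sigma'\| = \ell$ (so $\ell$ is its length), and $u$ is a unit self-parallel field orthogonal to $\sigma'$, so that $\langle u, R(u, \sigma')(\sigma')\rangle = K\ell^2$. The function name is hypothetical.

```python
import math


def index_form_test_field(length, curvature, steps=10000):
    """Midpoint-rule value of H(v) for v(t) = sin(pi t) u(t).

    Assumptions (not from the text): constant sectional curvature,
    |sigma'| = length, u unit, self-parallel and orthogonal to sigma'.
    Then <grad v, grad v> = pi^2 cos^2(pi t) and
    <v, R(v, sigma')(sigma')> = curvature * length^2 * sin^2(pi t).
    """
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        gradient_term = (math.pi * math.cos(math.pi * t)) ** 2
        curvature_term = curvature * length ** 2 * math.sin(math.pi * t) ** 2
        total += (gradient_term - curvature_term) * h
    return total


if __name__ == "__main__":
    for length in (3.0, math.pi, 3.2):
        print(length, index_form_test_field(length, 1.0))
```

In this model the integral has the closed form $(\pi^2 - K\ell^2)/2$, so $H$ becomes negative exactly when $\ell > \pi/\sqrt{K}$. By Theorem 22.7 a geodesic free of conjugate points cannot admit such a field, which is the mechanism behind the length bound being proved.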

Definition Let $p$ be a point of a Riemannian manifold $M$. Let $v$ be a unit tangent vector at $p$. The Ricci curvature of $v$, denoted by $R(v)$, is the trace of the linear transformation $u \to R_p(u, v)(v)$ of $M_p$. If $v$ is any nonzero tangent vector, define

Thus we have

since these are the diagonal elements in the matrix of the linear transform


with respect to the basis (ui(t)) of M , ( , ) . Finally, then,

0
= ( V x V y Z - v,v,z, W )

z, w>>- (VY z, v, w>- Y ( ( V , z,W ) ) + ( V , z, v y W ) = X Y ( ( Z , W > )- X ( ( Z , VY W ) ) - Y( )+ Y( )+ X ( ( Z , v y W ) ) - ( Z , v,v, W ) = X((VY

=

( Z , vyv, W - v,vy W ) .

Hence, ( R W , W Z ) , W>=

-+ < N Z , m y ) , w>+ ( N Y , Z ) ( X ) , w>= 0. Permute W with X , Y , and Z in turn to obtain these more similar identities:

(NK

Y ) ( Z ) ,x>+ ( N Z ,W>(Y),x>+ ( R ( Y , m w ) ,

x>= 0

( R ( X , W>(Z),y > + (W,X > ( W , Y > + ( N W ,Z>(X),Y > = 0

( N X , Y W ) , Z> + ( N W , X ) ( Y ) ,Z )

+ = 0.

Now add these four identities, noting that many terms cancel by (23.3), and identity R ( X , Y ) = - R( Y , X ) , giving 2(R(X, W ) ( Z ) ,Y > + 2 + 2 ( Y )X, > = 0 or, using (23.3), ( R ( X , W ) ( Z )+ R ( K Z ) ( X ) , Y > + ( N W , Y ) ( Z ) ,X > = 0 =(using (23.2) again), = -(R(Z,

m w > ,y > + (NW, Y ) ( Z ) ,x>.

This leads to our third basic identity for the curvature tensor, which we write in the more easily remembered form

$$\langle R(X, Y)(Z), W\rangle = \langle R(Z, W)(X), Y\rangle. \tag{23.4}$$



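Now we can show that $K(\Gamma)$ really does not depend on the choice of orthonormal vectors generating $\Gamma$. Suppose that

$$u' = \cos\theta\,u + \sin\theta\,v, \qquad v' = -\sin\theta\,u + \cos\theta\,v.$$

Then

$$R(u', v') = (\cos^2\theta + \sin^2\theta)R(u, v) = R(u, v)$$

by skew-symmetry of $(u, v) \to R(u, v)$, and

$$\langle u', R(u', v')(v')\rangle = \langle u', R(u, v)(v')\rangle = \langle\cos\theta\,u + \sin\theta\,v, R(u, v)(-\sin\theta\,u + \cos\theta\,v)\rangle = (\cos^2\theta + \sin^2\theta)\langle u, R(u, v)(v)\rangle = K(\Gamma),$$

by (23.3). We can easily derive the formulas for $K(\Gamma)$ in case vectors $u, v$ span $\Gamma$ but are not orthonormal. Put

$$u' = \frac{u}{\|u\|}, \qquad v' = \frac{v - \dfrac{\langle v, u\rangle u}{\|u\|^2}}{\left\|v - \dfrac{\langle v, u\rangle u}{\|u\|^2}\right\|}.$$

$(u', v')$ are orthonormal generators of $\Gamma$ (the Gram-Schmidt orthonormalization process), and by (23.3),

$$K(\Gamma) = \langle u', R(u', v')(v')\rangle = \frac{\langle u, R(u, v)(v)\rangle}{\|u\|^2\left\|v - \dfrac{\langle v, u\rangle u}{\|u\|^2}\right\|^2}.$$

Now

$$\left\|v - \frac{\langle v, u\rangle u}{\|u\|^2}\right\|^2 = \|v\|^2 - \frac{\langle u, v\rangle^2}{\|u\|^2} = \|v\|^2\sin^2\theta,$$

where $\theta$ is the angle between $u$ and $v$. Thus

$$K(\Gamma) = \frac{\langle u, R(u, v)(v)\rangle}{\|u\|^2\|v\|^2\sin^2\theta}. \tag{23.5}$$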
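The formula for the sectional curvature in terms of a non-orthonormal pair, $K(\Gamma) = \langle u, R(u, v)(v)\rangle / (\|u\|^2\|v\|^2 - \langle u, v\rangle^2)$, where the denominator equals $\|u\|^2\|v\|^2\sin^2\theta$, can be checked on a model curvature tensor. The sketch below assumes the constant-curvature tensor $R(u, v)(w) = K(\langle v, w\rangle u - \langle u, w\rangle v)$ on Euclidean $R^3$; this model and all function names are illustrative choices, not taken from the text.

```python
def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def curvature_operator(K, u, v, w):
    """Model tensor (an assumption): R(u, v)(w) = K(<v, w> u - <u, w> v)."""
    return [K * (dot(v, w) * ui - dot(u, w) * vi) for ui, vi in zip(u, v)]


def sectional_curvature(K, u, v):
    """<u, R(u, v)(v)> divided by the squared area of the parallelogram."""
    numerator = dot(u, curvature_operator(K, u, v, v))
    squared_area = dot(u, u) * dot(v, v) - dot(u, v) ** 2
    return numerator / squared_area


if __name__ == "__main__":
    # Two different spanning pairs of the same plane give the same value.
    u, v = [1.0, 2.0, 0.0], [0.5, -1.0, 3.0]
    u2 = [a + 2.0 * b for a, b in zip(u, v)]
    print(sectional_curvature(2.5, u, v), sectional_curvature(2.5, u2, v))
```

For this model the quotient returns the constant $K$ for every linearly independent pair, which is the independence of the spanning vectors that the normalization by the squared area is meant to guarantee.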

23. Sectional Curvature and Comparison Theorems


Geometrically, of course, the denominator is just the square of the area of the parallelogram determined by $u$ and $v$. It will be recognized at once from the preceding definition that essentially it is the sectional curvature of the plane $\Gamma$, that is, $K(\Gamma) = \langle u, R_p(u, v)(v)\rangle$, that appears in the second term of the second variation formula (22.3). As a first observation, the sign of the curvature plays a crucial role. For example, if the sectional curvature is always $\le 0$, if $\sigma(t)$, $0 \le t \le 1$, is any geodesic, and if $v$ is any continuous piecewise $C^\infty$ vector field on $\sigma$, then $H(v) > 0$ unless $v = 0$ identically. By Theorem 22.4, $\sigma$ can have no conjugate points. We deduce from the corollary to Lemma 21.4:

THEOREM 23.1 (HADAMARD AND CARTAN)

Suppose that $M$ is a complete Riemannian manifold whose sectional curvature is always nonpositive. Let $p \in M$. Then, $\exp: M_p \to M$ is a covering map.

As a second observation notice from the definition the following property of the Ricci curvature of a unit tangent vector $v \in M_p$: With respect to any orthonormal basis $(u_i)$, $1 \le i \le n$, of $M_p$ chosen so that, say, $v = u_1$, the Ricci curvature is the sum of the sectional curvatures in the planes spanned by $v$ and $u_i$, $2 \le i \le n$. Thus, if the sectional curvatures of $M$ are bounded from below by a positive number, so are the Ricci curvatures; hence, by Theorem 22.8, if $M$ is complete it is also compact, and its diameter is bounded from above, depending only on this positive number. The ultimate purpose of the Rauch type of comparison theorem is to refine this result by giving more information about the topology of such a Riemannian manifold, but in this chapter we shall restrict ourselves to developing fundamental analytical tools. First, however, we discuss the more classical geometric interpretations of the sectional curvature.

THEOREM 23.2

Let $p$ be a point of a Riemannian manifold and $\Gamma$ a plane in $M_p$, $K(\Gamma)$ the sectional curvature of that tangent plane. For each $r > 0$, let $\gamma_r$ be the circle in $\Gamma$ about the origin of radius $r$. Let $L(r)$ be the length of the curve $\exp(\gamma_r)$ in $M$ (assuming that $r$ is sufficiently small so that the exp mapping is defined in some convex ball of $M_p$ about 0 containing $\gamma_r$). Then

dL dr

- (0)= 271,

d2L dr2

-(0) = 0,

d3L dr3

-(0) = -6nK(T).



Thus, taking the Taylor expansion of L(r), we have a new geometric definition of K ( T ) .

K(T) = lirn r+O

length y,. - length exp(y,) nr3

(23.6)

7

Proof. Since this is purely a local result about $p$, we can (and will), for simplicity of notation only, suppose that $M$ is complete. Then, let $\delta(s, t)$, $0 \le s, t \le 1$, be a homotopy such that:

(a) $\delta(s, 0) = p$. (b) $s \to \partial_t\delta(s, 0)$ is the circle of unit radius in $\Gamma$, that is, $\gamma_1$, in its arc-length-proportional parametrization. (c) $\delta(s, t) = \exp(t\,\partial_t\delta(s, 0))$, that is, $t \to \delta(s, t)$ is the geodesic starting from $p$ tangent there to $\gamma_1(s)$. Then,

$$L(t) = \int_0^1\langle\partial_s\delta(s, t), \partial_s\delta(s, t)\rangle^{1/2}\,ds,$$

_-

LEMMA 23.3 Let 6(s, t ) , 0 I s, f I 1 be a homotopy with 6(s, 0) = 6(0, 0), but V, 3,6(s, 0) # 0. Then lirn t-0

d s 6 ( s , t) -

v, d s 6 ( s , 0)

l l & W 7 till - I I V , a s m 0111.

Proof. lirn IIas6(s' t-0

t2

t)ll

= by

L'H6pital's rule (since lla,6(s, 0)ll

= 0).

Q.E.D.



Applying Lemma 23.3,

Now V, V, a,6 = V, V, hence V, a, = 0. Then

a, 6 + R(8,6, a, 6)(d, a),

M a , &a, wa, 6), 8,s) + iiv, a,a ii

d2L

and t + 6(r, s) is a geodesic;

IIa, 6 II

First, note by Gauss’ lemma that (a, 6, a, 6) = 0. Let T(s, t ) be the plane in Md(,,,) spanned by a,6(s, t ) and a, 6(s, t ) . Since

and

a, 6(s,

0), V, 8,6(s, 0) E r, we see that K ( r ( s , t ) )+ K ( T ) as t + 0.

(v a 6 ”>‘+(v ’ IlaS6ll

a6

- 2K(T)IIVSd,6(s,

A) v a 6

’ IIV,a,~ll

as t + O

0)II =

as t -+0.

IIV,d,6(s, 0)ll.



Putting the calculations together, we see that

d2L dt2

-(0)= 0, d3L

__

dt3

(0) = lim t+O

d2L/dt2 ~

= lim

t

1-0

d 2L/dt ~

IIas4l

. IIvt a, 6 II

= - 3 K ( r ) ~'\lVIlJsd1l ds = - 6 zK ( r) . 0

Q.E.D.

COROLLARY 1 Let $M$ and $N$ be Riemannian manifolds, $\phi: N \to M$ a map such that

$$\|\phi_*(v)\| = \|v\| \quad \text{for all } v \in T(N)$$

(that is, $\phi$ is an isometric immersion of $N$ in $M$). Let $p_0 \in N$ and let $\Gamma$ be a plane in $N_{p_0}$ such that, for all $u \in \Gamma$, $t \to \phi(\exp(tu))$ is a geodesic of $M$. Then $K_N(\Gamma) = K_M(\phi_*(\Gamma))$, where $K_N$ and $K_M$ are the sectional curvatures with respect to the metrics on $N$ and $M$.

Proof. Let $\gamma_r$ be the circle of radius $r$ in $\Gamma$. Since $\phi$ is isometric, the lengths of $\phi(\exp(\gamma_r))$ and $\exp(\gamma_r)$ are the same. But $\phi(\exp(\gamma_r)) = \exp(\phi_*(\gamma_r))$, and $\phi_*(\gamma_r)$ is the circle of radius $r$ in $\phi_*(\Gamma)$. The result follows from (23.6), applied to $\Gamma$ and $\phi_*(\Gamma)$.

This corollary leads to another geometric interpretation of the sectional curvature of a Riemannian manifold $M$. Suppose first that $\dim M = 2$. Since $\dim M_p = 2$, there is only one sectional curvature associated with each point. The function thus defined over $M$ is called the Gaussian curvature of $M$. Now suppose that $\dim M > 2$; let $p \in M$, and let $\Gamma$ be a two-dimensional subspace of $M_p$. The union of all geodesics of $M$ starting at $p$ and tangent there to $\Gamma$, at least locally about $p$, determines a two-dimensional submanifold $N(\Gamma)$ about $p$. Now any submanifold of a Riemannian manifold has on it an induced Riemannian metric obtained by restricting the inner product to the tangent spaces to the submanifold. (The resulting metric on the submanifold is such that the inclusion map of the submanifold into the big manifold is an isometric immersion.) If we apply this to $N(\Gamma)$, we see from Corollary 1 that the Gaussian curvature of $N(\Gamma)$ at $p$ is precisely the sectional curvature of $\Gamma$. Of course this fact can also be taken as a geometric definition of sectional curvature. Another remark resulting from Corollary 1 is that if $\dim M = \dim N$ (that is, if $\phi$ is a local isometry of $N$ into $M$), $\phi$ preserves the sectional curvature.



This remark is also, of course, implicit in the fact that the Riemann curvature tensor is invariantly attached to the metric. We shall present another useful corollary after a preliminary definition.

Definition Let $M$ and $N$ be Riemannian manifolds and let $\phi: M \to N$ be a map such that $\phi_*(M_p) = N_{\phi(p)}$ for all $p \in M$. ($\phi$ is then called a maximal rank mapping of $M$ into $N$. It follows from the implicit function theorem that $\phi$ is then an open mapping.) For $p \in M$, let $F_p = \{v \in M_p : \phi_*(v) = 0\}$. $F_p$ is called the space of vertical vectors with respect to $\phi$. Let $F_p^\perp = \{u \in M_p : u$ is perpendicular to $F_p$; that is, $\langle v, u\rangle = 0$ for all $v \in F_p\}$. $\phi$ is said to be a compatible mapping between the Riemannian structures on $M$ and $N$ if

$$\|\phi_*(v)\| = \|v\| \quad \text{for all } v \in F_p^\perp, \text{ all } p \in M. \tag{23.7}$$

COROLLARY 2

Let $\phi: M \to N$ be a compatible, maximal rank mapping between Riemannian manifolds. Let $p \in M$ and let $\Gamma$ be a two-dimensional subspace of $F_p^\perp$. Then,

Proof. One can show that if $v \in F_p^\perp$, and if $\sigma(t)$, $0 \le t \le 1$, is a geodesic starting at $p$ and tangent there to $v$, then (a) $\sigma'(t) \in F_{\sigma(t)}^\perp$ for all $t \in [0, 1]$; and (b) $t \to \phi(\sigma(t))$ is a geodesic of $N$. If $v$ is an arbitrary vector of $M_p$, $v = v_1 + v_2$, with $v_1 \in F_p$, $v_2 \in F_p^\perp$, $\langle v_1, v_2\rangle = 0$. Thus $\|v\|^2 = \|v_1\|^2 + \|v_2\|^2$. But $\|v_2\| = \|\phi_*(v_2)\| = \|\phi_*(v)\|$. Hence $\|v\| \ge \|\phi_*(v)\|$. In particular, if $\gamma$ is any curve of $M$, length $\gamma \ge$ length $\phi(\gamma)$. Let $\gamma_r$ be the circle of radius $r$ about 0 in $\Gamma$. Then

$$\text{length } \exp(\gamma_r) \ge \text{length } \phi(\exp(\gamma_r)).$$

But

Using (23.5), we now get (23.8). Corollary 2 is very useful in proving that certain spaces have nonnegative curvature. The sectional curvature also has a geometric interpretation in terms of the comparative size of geodesic triangles in the Riemannian and Euclidean spaces. The following theorem is representative of this type of result.



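THEOREM 23.4 Let $M$ be a Riemannian manifold, $p \in M$, $v_0, v_1 \in M_p$. For $t \in [0, 1]$, let $D(t)$ = distance from $\operatorname{Exp}(tv_0)$ to $\operatorname{Exp}(tv_1)$, $\theta$ = angle between $v_0$ and $v_1$, and let $\Gamma$ be the plane of $M_p$ spanned by $v_0$ and $v_1$. Then $D(t)^2$ has a Taylor expansion of the form

$$D(t)^2 = \|v_1 - v_0\|^2 t^2 + K(\Gamma)\|v_0\|^2 \|v_1\|^2 \sin^2\theta\; t^4 + \cdots. \tag{23.9}$$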

Proof. It is most convenient at this point to use Taylor’s formula for covariant derivatives.

LEMMA 23.5 Let $\sigma: [0, 1] \to M$ be a curve in an affinely connected manifold $M$. Let $v(t)$, $0 \le t \le 1$, be a vector field along $\sigma$. Then $v$ admits a Taylor expansion:

$$v(t) = v(0) + \nabla v(0)\,t + \nabla^2 v(0)\,\frac{t^2}{2!} + \cdots + \nabla^N v(0)\,\frac{t^N}{N!} + u(t)\,t^{N+1}. \tag{23.10}$$

(Literally, (23.10) makes no sense, since $v(t) \in M_{\sigma(t)}$, while the vector on the right-hand side belongs to $M_{\sigma(0)}$. However, we mean implicitly that they are compared by parallel-translating the left-hand side along $\sigma$ from $\sigma(t)$ to $\sigma(0)$.)

Proof. Of course, if $\sigma(t)$ is constant, $v(t)$ is just an ordinary vector-valued function in $M_{\sigma(0)}$; hence this is just the usual Taylor's formula. We can reduce to this case, however, by the following trick: For $t \in [0, 1]$, let $w(t)$ be the vector in $M_{\sigma(0)}$ which parallel-translates along $\sigma$ to give $v(t)$. It is easily seen that $(dw/dt)(t)$ parallel-translates along $\sigma$ to give $\nabla v(t)$. Thus the classical Taylor expansion for $w(t)$ parallel-translates along $\sigma$ to give (23.10).

Return to Theorem 23.4. Let $\delta(s, t)$, $0 \le s \le 1$, $0 \le t \le t_0$, be a homotopy in $M$ such that:

(a) $\delta(s, 0) = p$.
(b) $\delta(0, t) = \exp(v_0 t)$, $\delta(1, t) = \exp(v_1 t)$ for $0 \le t \le t_0$.
(c) For each $t$, $s \to \delta(s, t)$ is a geodesic parametrized proportionally to arc length.

(If $t_0$ is sufficiently small, obviously such a homotopy can be constructed.) For notational convenience we shall assume that $t_0 = 1$. Note that

$$D(t)^2 = \int_0^1 \|\partial_s\delta(s, t)\|^2\,ds.$$

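Condition (c) implies that $s \to \partial_t\delta(s, t)$ is a Jacobi vector field along the geodesic $s \to \delta(s, t)$. From (a), the Jacobi equations reduce at $t = 0$ to

$$\nabla_s\nabla_s\,\partial_t\delta(s, 0) = 0.$$

Thus, in view of the conditions $\partial_t\delta(0, 0) = v_0$, $\partial_t\delta(1, 0) = v_1$ derived from (b), we have

$$\partial_t\delta(s, 0) = (1 - s)v_0 + s v_1,$$

$$\nabla_t\partial_s\delta(s, 0) = \nabla_s\partial_t\delta(s, 0) = v_1 - v_0.$$

Thus the Taylor expansion of $\partial_s\delta(s, t)$ about $t = 0$ is

$$\partial_s\delta(s, t) = (v_1 - v_0)t + \cdots + (\text{higher-order terms in } t).$$

$$\nabla_t\nabla_t\,\partial_s\delta(s, t) = \nabla_t\nabla_s\,\partial_t\delta(s, t) = \nabla_s\nabla_t\,\partial_t\delta + R(\partial_t\delta, \partial_s\delta)(\partial_t\delta).$$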

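Hence,

$$D(t)^2 = \int_0^1 \Bigl\| (v_1 - v_0)t + \tfrac12\bigl(\nabla_s\nabla_t\,\partial_t\delta(s, t) + R(\partial_t\delta(s, t), \partial_s\delta(s, t))(\partial_t\delta(s, t))\bigr)t^2 + \cdots \Bigr\|^2 ds.$$

Now

$$\int_0^1 \langle (v_1 - v_0)t,\ \nabla_s\nabla_t\,\partial_t\delta(s, 0)\rangle\,ds = \int_0^1 \frac{\partial}{\partial s}\langle (v_1 - v_0)t,\ \nabla_t\,\partial_t\delta(s, 0)\rangle\,ds$$

$$= \langle (v_1 - v_0)t,\ \nabla_t\,\partial_t\delta(1, 0)\rangle - \langle (v_1 - v_0)t,\ \nabla_t\,\partial_t\delta(0, 0)\rangle = 0 \quad \text{(by condition (b))}.$$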

We now prove

Integrating by parts and taking into account condition (b), we have (left-hand side (23.11)) =

1 - (V, V, V, a, 6(s, 0),V, 8,6(s, 0)) ds. 1

0

v, v, v, a, 6 = V,(V, v, a, 6 + R(a, 6, a, 6)) = v, v, v, a, 6 + R(d, 6, a, S)(V, a, 6) + (V, 6, 3, w , 6) + N V ,a,6, 8,6>@,6) + R(a,6, V, 4Was 6) + R@, 698, a v , a, b).

ds



Finally we see that V,V,V, Now

a, 6(s,

0) = 0. This proves (23.11).

+ N u ,, u1 - u0>(u1)s2 + “0 =

j(u13

R(u0, ud(u0))

> u1

+ K U l ? W u , , - uo>(uo)>

+ 3< - D o N u , , -uo)(ud) + $< - 00 9

= ds

- uo)(u1)(1

=K

m 11uo 112

7

R(u0 ud(v1)) 9

llu1 112 sin2 0,

where $\theta$ is the angle between $v_0$ and $v_1$, and $\Gamma$ is the plane of $M_p$ spanned by $v_0$ and $v_1$; whence (23.9).

Note that the first term on the right-hand side of (23.9) gives the "law of cosines," or basically, the Pythagorean theorem, for triangles in Euclidean geometry. Note also that $\tfrac12\|v_0\|\,\|v_1\| \sin\theta\; t^2$ is the Euclidean area of the triangle. Thus (23.9) shows that the sectional curvature gives the deviation from the law of cosines for small triangles. Many other formulas of trigonometry on Riemannian spaces can also be derived, using (23.9) as a replacement for the law of cosines, or independently in a similar way.

So far, we have been comparing to a certain infinitesimal order the geometric entities in a Riemannian space with the corresponding entities in a Euclidean space. These are very classical results (mostly due to Riemann); only recently have results of this type been proved that are global and that enable one to compare geometric entities in two, possibly both non-Euclidean, Riemannian spaces. The following comparison theorem, due to Rauch, is the foundation of much of the work.

THEOREM 23.6

Let $M$ be a manifold, with two Riemannian metrics defined on it. For $p \in M$, $u, v \in M_p$, let $\langle u, v\rangle$ and $\langle u, v\rangle^*$ be the inner product in the two metrics. Let $R$ and $R^*$ be the Riemann curvature tensors of the unstarred and starred metrics. For $u, v \in M_p$, let $K(u, v)$ and $K^*(u, v)$ be the sectional curvature in the unstarred and starred metrics of the plane spanned by $u$ and $v$. Let $p_0$ be a point of $M$ such that the inner products on $M_{p_0}$ agree. Let $\sigma: [0, 1] \to M$ be a curve beginning at $p_0$, which is a geodesic in both metrics, which has the same length in both metrics, and such that for $0 < t \le 1$, $\sigma(t)$ is not a conjugate point of $\sigma(0)$ (with respect to $\sigma$) in the unstarred metric. Let $v(t)$ be a vector field along $\sigma$ which vanishes at $t = 0$, which is a Jacobi field with respect to both metrics, and such that

$$\langle v(t), \sigma'(t)\rangle = 0 = \langle v(t), \sigma'(t)\rangle^* \quad \text{for } 0 \le t \le 1.$$

For t E [0, 11, let u(t) E Ma(r)be defined as follows: Parallel-translate u(t) to ~ ( 1 )along CT, using the starred affine connection; then parallel-translate this vector at 4 1 ) back to o(f) along a, using the unstarred affine connection. The result is u ( t ) . Then

I J:[K*(u(L), a‘(i)) - K ( u ( l ) , a’(L))](v(l),u(L)>* d l .

(23.12)

l I t. Equality holds if and only if v ( l ) = u ( l ) for 0 I Proof. It suffices to prove (23.12) in case t = 1 and to suppose that length 1. Let dim M = n. Choose the following indices and summation convention: I < i, j , . . . < n. Let ( w i ( t ) ) ,( z i ( t ) )be vector fields along a such that: CT =

(V and V* denote covariant differentiation with respect to the unstarred and starred affine connection, respectively.) Suppose that u ( t ) = ai(t)wi(t)= bi(t)zi(t).Writing out the Jacboi equations for u for both metrics, using (b) and (c), we have d ’bi(t )

dt2

+ bj(Zi(t),R*(Zj(?),a’(t))(a’(t))>*

= 0.

(23.13)



From (a), ai(l) = bi(l). Hence u(t) = bi(t)wi(t). Since u(0) = u(0) = 0, u(1) = u(l), we have, by Theorem 22.7,

= Jol

db, db, [ z dt

= (after

+

1

0

- bi bj<wi> R(wj 9 c’(t))(0‘(l))>] dt

db, integrating by parts and using (23.13)) bi(l) -(1) dt

[bj*bi- ( u , R(u, a’)(a’)>ldt

COROLLARY 3 Suppose that $M$ is a manifold with two Riemannian metrics $\langle\ ,\ \rangle$ and $\langle\ ,\ \rangle^*$. Let $p_0 \in M$ satisfy the following conditions:

(a) The geodesics beginning at $p_0$ in the two metrics coincide and have the same length.
(b) If $\sigma: [0, 1] \to M$ is any geodesic beginning at $p_0$, having no conjugate points of $p_0$, then

$$K^*(u, \sigma'(t)) \le K(u, \sigma'(t)) \quad \text{for all } t \in [0, 1],\ \text{all } u \in M_{\sigma(t)}.$$

Let $p$ be any point of $M$ lying on a geodesic from $p_0$ having no conjugate points of $p_0$. Then

$$\langle u, u\rangle \le \langle u, u\rangle^* \quad \text{for all } u \in M_p.$$

Intuitively, inside the "conjugate locus" of $p_0$ the starred metric is bigger than the unstarred one.

Proof. In view of the relation between Jacobi vector fields and geodesic deformations, the Jacobi fields of both metrics that are zero at $p_0$ must coincide. Thus (b) implies that

$$\frac{d}{dt}\bigl(\langle v(t), v(t)\rangle - \langle v(t), v(t)\rangle^*\bigr) \le 0$$

for each Jacobi field $v$ that is zero at $p_0$; hence

$$\langle v(t), v(t)\rangle \le \langle v(t), v(t)\rangle^*.$$

If $\sigma(1)$ is not a conjugate point of $\sigma(0) = p_0$, the values at $t = 1$ of all Jacobi vector fields that are zero at $p_0$ span $M_{\sigma(1)}$. The corollary then follows.

The following more qualitative comparison theorem is due to Morse and Schoenberg. Both comparison theorems may be considered as generalizations of the classical Sturm comparison theorem.

THEOREM 23.7 Let $M$ be a Riemannian manifold, and let $\sigma: [0, 1] \to M$ be a geodesic of $M$ such that $\sigma(1)$ is the first conjugate point of $\sigma(0)$ along $\sigma$. Suppose that $c_1$ and $c_2$ are positive real numbers such that, for all $t \in [0, 1]$ and all $u \in M_{\sigma(t)}$,

$$c_1 \le K(u, \sigma'(t)), \tag{23.14a}$$

$$K(u, \sigma'(t)) \le c_2. \tag{23.14b}$$

Then

$$\text{(a) length } \sigma \le \pi/\sqrt{c_1}, \qquad \text{(b) } \pi/\sqrt{c_2} \le \text{length } \sigma. \tag{23.15}$$

Proof. Suppose for the moment that $\sigma$ is an arbitrary geodesic of $M$. Let $u_i(t)$, $0 \le t \le 1$, $2 \le i \le n$ (summation convention), be vector fields along $\sigma$ such that $\nabla u_i(t) = 0$, $\langle u_i(t), u_j(t)\rangle = \delta_{ij}$, $\langle u_i(t), \sigma'(t)\rangle = 0$. Suppose that $v$ is a vector field along $\sigma$ of the form $v(t) = a_i \sin(k\pi t)\,u_i(t)$. Thus

$$\|v(t)\|^2 = \sin^2(k\pi t)\, a_i a_i, \qquad \nabla v = k\pi\, a_i \cos(k\pi t)\,u_i(t),$$

$$H(v) = \int_0^1 \bigl[\langle \nabla v, \nabla v\rangle - \langle v, R(v, \sigma')(\sigma')\rangle\bigr]\,dt = a_i a_i \int_0^1 \bigl[k^2\pi^2\cos^2(k\pi t) - \sin^2(k\pi t)\,\|\sigma'(t)\|^2 K(\sigma'(t), v(t))\bigr]\,dt,$$

where $K(\sigma'(t), v(t))$ is the sectional curvature in the plane spanned by $\sigma'(t)$ and $v(t)$. By (23.14), the integrand is no greater than

$$k^2\pi^2\cos^2(k\pi t) - c_1 \sin^2(k\pi t)\,\|\sigma'(t)\|^2.$$

But $\|\sigma'(t)\| = \text{length } \sigma$.

Take $k = 1$. Note that $\int_0^1 \cos^2(\pi t)\,dt = \int_0^1 \sin^2(\pi t)\,dt$ and $v(0) = 0 = v(1)$. If there are no conjugate points on $\sigma$, we must have $H(v) \ge 0$. This forces length $\sigma \le \pi/\sqrt{c_1}$, and hence proves (23.15a), since the same inequality obviously holds if $\sigma(1)$ is the first conjugate point of $\sigma(0)$.

We turn to proving (23.15b). Let $v(t)$, $0 \le t \le 1$, be a continuous, piecewise $C^\infty$ vector field along $\sigma$, with $v(0) = 0 = v(1)$, and $\langle \sigma'(t), v(t)\rangle = 0$. Using a Fourier series expansion of the components of $v$, we can write

$$v(t) = \sum_{k=1}^{\infty} a_{ik}\sin(k\pi t)\,u_i(t).$$

Since $v$ is piecewise $C^\infty$, the Fourier series for $\nabla v$ also converges and is

$$\nabla v(t) = \sum_{k=1}^{\infty} a_{ik}\,k\pi\cos(k\pi t)\,u_i(t).$$

Suppose that $\sigma(1)$ is a conjugate point of $\sigma(0)$. We can then choose the vector field $v$ so that $H(v) = 0$. Then

$$\|v(t)\|^2 = \sum_{k,l=1}^{\infty} a_{ik}a_{il}\sin(k\pi t)\sin(l\pi t).$$

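Also,

$$\int_0^1 \cos(k\pi t)\cos(l\pi t)\,dt = \int_0^1 \cos((k + l)\pi t)\,dt + \int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt$$

$$= \left.\frac{\sin((k + l)\pi t)}{(k + l)\pi}\right|_0^1 + \int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt = \int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt.$$

Thus, by (23.14b),

$$0 = H(v) \ge \sum_{k,l=1}^{\infty} a_{ik}a_{il}\bigl(\pi^2 kl - (\text{length } \sigma)^2 c_2\bigr)\int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt.$$

Suppose now that (23.15b) is not true; that is,

$$(\text{length } \sigma)^2 c_2 < \pi^2 \le \pi^2 kl \quad \text{for } k, l \ge 1.$$

But $\int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt$ equals $\tfrac12$ for $k = l$ and $0$ for $k \ne l$, so the right-hand side above would then be positive for nonzero $v$.

These inequalities are thus contradictory, whence (23.15b), and the theorem is proved. This theorem can also be proved by using the Rauch comparison theorem, that is, Theorem 23.6.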
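To see that both bounds in (23.15) can be attained, one may check them on a round sphere. The following worked example is an illustration added here, not part of the text; the radius $r$ and the parallel unit normal field $u(t)$ are assumptions of the sketch.

```latex
% Illustrative check of Theorem 23.7 on the round sphere of radius r (assumed example).
% All sectional curvatures are equal, so one may take c_1 = c_2 = 1/r^2:
\[
K \equiv \frac{1}{r^{2}}, \qquad c_1 = c_2 = \frac{1}{r^{2}} .
\]
% Along a great circle, the first conjugate point of \sigma(0) is the antipodal point:
\[
\text{length } \sigma = \pi r .
\]
% Both inequalities of (23.15) then hold with equality:
\[
\frac{\pi}{\sqrt{c_2}} = \pi r \;\le\; \text{length } \sigma = \pi r \;\le\; \frac{\pi}{\sqrt{c_1}} = \pi r .
\]
% The Jacobi field vanishing at both ends is v(t) = \sin(\pi t)\,u(t), with u parallel,
% unit, and normal to \sigma; since \|\sigma'\| = \pi r it satisfies
\[
\nabla^{2} v + K\,\|\sigma'\|^{2}\, v
  = -\pi^{2}\sin(\pi t)\,u + \frac{1}{r^{2}}\,(\pi r)^{2}\sin(\pi t)\,u = 0 .
\]
```

This is the case $k = 1$ of the trial fields $a_i \sin(k\pi t)\,u_i(t)$ used in the proof.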

24

Submanifolds of Riemannian Manifolds

Throughout this chapter, let $M$ be a Riemannian manifold. Thus, each $M_p$, $p \in M$, has a positive definite inner product $\langle\ ,\ \rangle$ defined by the metric. $M$ carries the Riemannian affine connection $(X, Y) \to \nabla_X Y$. Now recall that technically a submanifold must be considered as a pair $(N, \phi)$ consisting of another manifold $N$ and a mapping $\phi: N \to M$ such that:

(a) For $p \in N$, $\phi_*: N_p \to M_{\phi(p)}$ is 1-1.
(b) $\phi$ itself is 1-1.

If (a) is satisfied, but not necessarily (b), the pair is called an immersed submanifold: By the implicit function theorem, every point of $N$ has a neighborhood so that $\phi$ restricted to this neighborhood is a submanifold. Intuitively, an immersed submanifold is locally a submanifold, but may have "self-intersections." However, many differential geometric facts proved about submanifolds carry over with little difficulty to immersed submanifolds, so we shall restrict attention here to submanifolds. If $(N, \phi)$ is a submanifold, it is customary to suppress explicit reference to $\phi$, to identify $N$ with the subset $\phi(N)$ of $M$ and each $N_p$ with the subspace $\phi_*(N_p)$ of $M_{\phi(p)}$. When there is little possibility of confusion, we shall do so.

Let $N$ be a submanifold of $M$. Since each $N_p$ is identified with a subspace of $M_p$, the given inner product $\langle\ ,\ \rangle$ on $M_p$ can be restricted to $N_p$ to define a positive definite inner product there also: Thus $N$ inherits a Riemannian metric from its embedding, called the induced metric. Our first job is to compute the affine connection and the curvature for the induced metric. For $p \in N$, let $N_p^\perp$ be the orthogonal complement of $N_p$ in $M_p$ with respect to the form $\langle\ ,\ \rangle$. An element $v \in N_p^\perp$ (satisfying $\langle v, w\rangle = 0$ for all $w \in N_p$) is called a normal vector to $N$. Define

$$N^\perp = \bigcup_{p \in N} N_p^\perp,$$

the normal vector bundle to $N$. It is readily verified that $N^\perp$ is a submanifold of $T(M)$, the tangent bundle to $M$, whose dimension is equal to that of $M$. A vector field $X \in V(M)$ is said to be tangent or normal to $N$ if, respectively, $X(p) \in N_p$ or $X(p) \in N_p^\perp$ for all $p \in N$. Suppose that $X$ and $Y$ are tangent to $N$, while $Z$ is normal. Note that, for $f \in F(M)$:

$$\langle \nabla_X(fY), Z\rangle(p) = \langle f\nabla_X Y, Z\rangle(p) + \langle X(f)Y, Z\rangle(p) = f(p)\langle \nabla_X Y, Z\rangle(p), \quad \text{since } \langle Y, Z\rangle(p) = 0.$$

$$\langle \nabla_{fX} Y, Z\rangle(p) = f(p)\langle \nabla_X Y, Z\rangle(p) = \langle \nabla_X Y, fZ\rangle(p).$$

These identities are the tipoff that the mapping of vector fields into functions: $(X, Y, Z) \to \langle \nabla_X Y, Z\rangle$ possesses a "value" at each $p \in N$; that is, if $u \in N_p^\perp$, $v, w \in N_p$, choose $X, Y, Z \in V(M)$ such that $X$ and $Y$ are tangent to $N$, $Z$ normal to $N$, so that $X(p) = v$, $Y(p) = w$, $Z(p) = u$, and define

$$S_u(v, w) = \langle \nabla_X Y, Z\rangle(p) = \langle \nabla_X Y(p), Z(p)\rangle. \tag{24.1}$$

Considered as a function in the indicated subset of $T(M) \times T(M) \times T(M)$, $S$ is called the second fundamental form of $N$. (This is the classical terminology; the first fundamental form is just the inner product $\langle\ ,\ \rangle$ on $T(M)$ restricted to $T(N)$.) The symmetric bilinear form $(v, w) \to S_u(v, w)$ defined on $T(N)$ is called the value of the second fundamental form on $u$. It must be verified that this is independent of the extension of $u$, $v$, $w$ to vector fields, but we shall do this in a moment after computing in terms of a local basis of vector fields. As algebraic properties, note from (24.1) that $S_u(v, w)$ varies linearly when $u$, $v$, or $w$ are varied separately; that is, $S$ as a function $N_p^\perp \times N_p \times N_p \to R$ is multilinear. A less automatic property is symmetry, namely,

$$S_u(v, w) = S_u(w, v). \tag{24.2}$$

Proof. First note that if $X$ and $Y \in V(M)$ are tangent to $N$, so is $[X, Y]$. To prove this, let $p \in N$. Revert to explicit mention of the map $\phi: N \to M$ defining the submanifold. For $u \in N_p$, $f \in F(M)$, $\phi_*(u)(f) = u(\phi^*(f))$. Hence $\phi_*(u)(f) = 0$, provided $\phi^*(f) = 0$. Thus, eliminating $\phi$ again from the notation,

$$N_p \subset \{u \in M_p : u(f) = 0 \text{ for all } f \in F(M) \text{ that vanish on } N\}.$$

Conversely, it is seen (using the implicit function theorem, which is left as exercise) that the set on the right-hand side has the same dimension (as a vector space) as $N_p$; hence equality holds. Now suppose that $f \in F(M)$ vanishes on $N$ and that $X, Y \in V(M)$ are tangent to $N$. Thus

$$0 = X(p)(f) = X(f)(p) = Y(p)(f) = Y(f)(p) \quad \text{for all } p \in N.$$

$$[X, Y](p)(f) = [X, Y](f)(p) = X(Y(f))(p) - Y(X(f))(p) = 0,$$

since $Y(f)$ and $X(f)$ are functions vanishing on $N$. Thus $[X, Y](p) \in N_p$ for all $p \in N$. Returning to (24.2), suppose that $X, Y \in V(M)$ are tangent to $N$, and $Z \in V(M)$ is normal to $N$. If $u = Z(p)$, $v = X(p)$, $w = Y(p)$,

$$S_u(v, w) = \langle \nabla_X Y, Z\rangle(p) = \langle \nabla_Y X, Z\rangle(p) + \langle [X, Y](p), Z(p)\rangle = S_u(w, v),$$

since $[X, Y](p) \in N_p$, $Z(p) \in N_p^\perp$. This proves (24.2).

We must learn how to compute the second fundamental form in terms of a local basis for vector fields.

LEMMA 24.1 Let $p$ be a point of $N$, let $(v_i)$, $1 \le i, j, \ldots \le m = \dim M$, be an orthonormal basis of $M_p$ such that:

(a) $(v_i)$, for $1 \le i \le n = \dim N$, is a basis for $N_p$.
(b) $(v_i)$, for $n + 1 \le i \le m$, is a basis for $N_p^\perp$.

Then there is an open set $U$ of $M$ containing $p$ and a basis $(X_i)$ of vector fields in $U$ such that

$$\langle X_i, X_j\rangle = \delta_{ij} \quad \text{for } 1 \le i, j \le m.$$

(Any basis of vector fields satisfying this condition is called an orthonormal basis.) Moreover, $X_i$, for $1 \le i \le n$, is tangent to $N \cap U$; $X_i(p) = v_i$ for $1 \le i \le m$; and $[X_i, X_j]$, for $1 \le i, j, \ldots \le n$, is expressible, in $N \cap U$, in terms of $X_1, \ldots, X_n$.

Proof. Since $N$ is a submanifold of $M$, $U$ can first be chosen so that it carries a coordinate system of functions $(x_i)$ so that: $U \cap N = \{q \in U : x_{n+1}(q) = 0 = \cdots = x_m(q)\}$ (exercise in the implicit function theorem). From $(\partial/\partial x_i)(x_j) = 0$ for $1 \le i \le n$, $n + 1 \le j \le m$, it follows that the vector fields $(\partial/\partial x_1), \ldots, (\partial/\partial x_n)$ are tangent to $N$. Recall the Gram-Schmidt orthogonalization process of linear algebra for constructing an orthonormal basis from a given basis. Applying this process to the basis $(\partial/\partial x_1), \ldots, (\partial/\partial x_m)$ at each point of $U$ produces vector fields $X_1', \ldots, X_m'$.

The construction is such that the vector fields $X_i'$ obtained form an orthonormal basis of vector fields so that the $X_1', \ldots, X_n'$ are also tangent to $N$, while the $X_{n+1}', \ldots, X_m'$ are therefore normal to $N$. Thus we have expressions of the form

$$v_i = \sum_{j=1}^{n} c_{ij}X_j'(p) \quad \text{for } 1 \le i \le n,$$

$$v_i = \sum_{j=n+1}^{m} c_{ij}X_j'(p) \quad \text{for } n + 1 \le i \le m.$$

Each of the matrices occurring in these relations is an orthogonal matrix. If now we define

$$X_i = \sum_{j=1}^{n} c_{ij}X_j', \quad 1 \le i \le n,$$

and similarly $X_i = \sum_{j=n+1}^{m} c_{ij}X_j'$ for $n + 1 \le i \le m$, the vector fields $(X_1, \ldots, X_m)$ will do the required job.

Q.E.D.
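For reference, the Gram-Schmidt step invoked in the proof can be written out as follows. This is the standard recursion, sketched here for convenience; the auxiliary names $Y_k$ and $Z_k$ are introduced only for this illustration and do not appear in the text.

```latex
% Gram-Schmidt applied pointwise to the coordinate fields (standard recursion;
% the symbols Y_k, Z_k are assumptions of this sketch).
\[
Y_k = \frac{\partial}{\partial x_k}, \qquad 1 \le k \le m .
\]
% First field: normalize.
\[
X_1' = \frac{Y_1}{\|Y_1\|} .
\]
% Inductive step: subtract the components along the earlier fields, then normalize.
\[
Z_k = Y_k - \sum_{j=1}^{k-1} \langle Y_k, X_j' \rangle\, X_j' , \qquad
X_k' = \frac{Z_k}{\|Z_k\|} , \qquad 2 \le k \le m .
\]
% Each X_k' is a combination of Y_1, ..., Y_k with smooth coefficients, so
% X_1', ..., X_n' are tangent to N on N \cap U, and X_{n+1}', ..., X_m',
% being orthogonal to them, are normal to N there.
```

Because the first $n$ coordinate fields are tangent to $N$ and the recursion never mixes in later fields, tangency of $X_1', \ldots, X_n'$ is automatic.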

Let us say that a basis of vector fields having the same properties as the basis $X_1, \ldots, X_m$ constructed in Lemma 24.1 is a local moving frame for the submanifold geometry of $N$. Lemma 24.1 can then be interpreted as asserting the existence of a plentiful supply of local moving frames. Suppose now that we work with any such local moving frame $X_1, \ldots, X_m$ defined in $U$. Since the indices $1 \le i, j, \ldots \le m$ must systematically be split into two parts to account for $N_p$ and $N_p^\perp$, it is convenient to introduce the following further ranges of indices, with the corresponding summation conventions in force:

$$1 \le i, j, \ldots \le m; \qquad 1 \le a, b, \ldots \le n; \qquad n + 1 \le \alpha, \beta, \ldots \le m. \tag{24.3}$$

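Suppose that $\nabla_{X_i} X_j = \Gamma_{ijk}X_k$ (that is, the $\Gamma_{ijk}$ are the components of the Riemannian affine connection with respect to the basis $(X_i)$). The $\Gamma_{ab\alpha}$ determine the second fundamental form, since

$$\Gamma_{ab\alpha}(p) = \langle \nabla_{X_a} X_b, X_\alpha\rangle(p) = S_{X_\alpha(p)}(X_a(p), X_b(p)), \tag{24.4a}$$

$$\Gamma_{ab\alpha}(p) = -\langle X_b, \nabla_{X_a} X_\alpha\rangle(p) = -\Gamma_{a\alpha b}(p) \quad \text{for } p \in N. \tag{24.4b}$$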

Note that 0 = Xi((X i , X,)) = .

(24.7)

To find the relation between the curvature of the metrics on $M$ and $N$ and the second fundamental form, it is necessary to apply covariant derivatives to both sides of (24.7) and use the various identities we have developed. However, there is a much neater way of doing this, developed by E. Cartan, using a dual differential form point of view. It will repay our investment to detour a moment to develop this approach.

LEMMA 24.3 Let $U$ be an open set of a Riemannian manifold $M$ of dimension $n$ ($1 \le i, j, \ldots \le n$) which has an orthonormal basis $(X_i)$ of vector fields; that is, $\langle X_i, X_j\rangle = \delta_{ij}$. Let $\omega_i$ be the dual basis of differential forms; that is, $\omega_i(X_j) = \delta_{ij}$. Let $\Gamma_{ijk}$ be the functions in $U$ such that

$$\nabla_{X_i} X_j = \Gamma_{ijk}X_k,$$

and let $\omega_{ij}$ be the 1-forms defined by

$$\omega_{ij} = \Gamma_{kij}\,\omega_k. \tag{24.8}$$

Then

$$\text{(a) } \omega_{ij} + \omega_{ji} = 0, \qquad \text{(b) } d\omega_i = \omega_{ij} \wedge \omega_j. \tag{24.9}$$

Conversely, any set of forms $\omega_{ij}$ satisfying (24.9a) and (24.9b) is uniquely determined and given by (24.8). Let the 2-forms $\Omega_{ij}$ be defined by

$$\Omega_{ij} = d\omega_{ij} - \omega_{ik} \wedge \omega_{kj}. \tag{24.10}$$

Then the curvature tensor is determined as follows:

$$\langle R(X, Y)(X_i), X_j\rangle = \Omega_{ij}(X, Y) \quad \text{for } X, Y \in V(U). \tag{24.11}$$

Proof. Equation (24.9a) is equivalent to (24.5a). We show that (24.9b) is equivalent to (24.5b):

$$(d\omega_i - \omega_{ij} \wedge \omega_j)(X_k, X_l) = X_k(\omega_i(X_l)) - X_l(\omega_i(X_k)) - \omega_i([X_k, X_l]) - \omega_{ij}(X_k)\omega_j(X_l) + \omega_{ij}(X_l)\omega_j(X_k)$$

$$= -\omega_i([X_k, X_l]) - \Gamma_{kij}\delta_{jl} + \Gamma_{lij}\delta_{jk} = -\omega_i([X_k, X_l]) + \Gamma_{kli} - \Gamma_{lki}.$$

This shows that (24.9b) is equivalent to (24.5b); hence, also that the $\omega_{ij}$ are uniquely determined by (24.9), since they determine the unique Riemannian connection. Note, for example, that for $X, Y \in V(U)$,

$$\nabla_X Y = [X(\omega_k(Y)) + \omega_j(Y)\omega_{jk}(X)]X_k. \tag{24.12}$$

In particular,

$$\nabla_{X_i} X_j = \omega_{jk}(X_i)X_k \quad \text{and} \quad \omega_k(\nabla_X X_j) = \omega_{jk}(X).$$

From (24.10), dwij(X, Y ) = X(wij(Y))- Y(wij(X)) - Oij(CX, Y l ) = X ( ( X j , V Y X i ) - Y ( ( x j 7 v ~ x i ) ()~- j 7 V ~ x , Y ] X i )

( V ~ x jV,X X i ) + < X i , R ( X , Y ) ( X i ) ) - ok(Vl'Xj)wk(VXXi) + <xj R(X, y)(xi)) = wjik(X)Wik(Y) - w k j ( y ) w k i ( x ) + ( x j ,R ( X , y)(xi)), =

-

( w , N u , u)(w1)> =

a=n+ 1

S",(WI,

U)S",(W,

u)

where R( , )( ) and RN( , )( ) are respectively the curvature tensors of M and N , and where S, )( , ) is the second fundamental form of N . In particular, if u, v are unit orthonormal vectors of N , , then

where K ( u , u ) and KN(u,u ) are respectively the sectional curvatures of M and N in the plane spanned by u and v.



COROLLARY Suppose that $N$ is a hypersurface in $M$; that is, $\dim M = \dim N + 1$. Let $v_n$ be a generating vector of $N_p^\perp$. Then, for $u, v \in N_p$,

$K_N(u, v) - K(u, v)$ = the product of the eigenvalues of the quadratic form $S_{v_n}(\ ,\ )$ restricted to the plane spanned by $u$ and $v$. (These eigenvalues are called the principal curvatures of the plane.)

To prove the corollary, recall the following facts from linear algebra:

(i) A vector $u \in N_p$ is an eigenvector with eigenvalue $\lambda$ of $S_{v_n}(\ ,\ )$ if

$$S_{v_n}(u, v) = \lambda\langle u, v\rangle \quad \text{for all } v \in N_p.$$

(ii) The set of all eigenvectors corresponding to a given eigenvalue is a linear subspace of $N_p$.
(iii) Eigenvectors corresponding to different eigenvalues are perpendicular with respect to $\langle\ ,\ \rangle$.

Hence, to compute (for $u, v \in N_p$) the sectional curvature of the plane spanned by $u$ and $v$, we can choose $u$ and $v$ so that they are eigenvectors for eigenvalues $\lambda_1$ and $\lambda_2$ of $S_{v_n}(\ ,\ )$ restricted to the plane, and satisfy

$$\langle u, v\rangle = 0, \qquad \langle u, u\rangle = 1, \qquad \langle v, v\rangle = 1.$$

Then

$$S_{v_n}(u, v) = \lambda_1\langle u, v\rangle = 0, \qquad S_{v_n}(u, u) = \lambda_1\langle u, u\rangle = \lambda_1, \qquad S_{v_n}(v, v) = \lambda_2\langle v, v\rangle = \lambda_2,$$

whence, from (24.14),

$$K_N(u, v) - K(u, v) = \lambda_1\lambda_2.$$

Q.E.D.
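Two familiar surfaces in Euclidean 3-space give a quick check of the corollary. The example below is an illustration added here; it assumes the generating normal vector is taken of unit length (here $n$), and uses $\langle \nabla_X Y, n\rangle = -\langle Y, \nabla_X n\rangle$ for tangent fields $X$, $Y$.

```latex
% Illustrative check (assumed example): surfaces in M = R^3, where K = 0.
% Sphere of radius r, outward unit normal n(x) = x / r, so \nabla_X n = X / r:
\[
S_{n}(X, Y) = -\langle Y, \nabla_X n \rangle = -\frac{1}{r}\,\langle X, Y \rangle ,
\qquad \lambda_1 = \lambda_2 = -\frac{1}{r} .
\]
% The corollary gives the Gaussian curvature of the sphere:
\[
K_N - K = \lambda_1 \lambda_2 = \frac{1}{r^{2}}, \qquad K_N = \frac{1}{r^{2}} .
\]
% Cylinder x^2 + y^2 = r^2, unit normal n = (x, y, 0) / r:
% the axial direction has eigenvalue 0, the circular direction eigenvalue -1/r.
\[
\lambda_1 = 0, \quad \lambda_2 = -\frac{1}{r}, \qquad K_N = \lambda_1 \lambda_2 = 0 .
\]
```

Reversing the direction of the normal changes the sign of both eigenvalues but not their product, so the result does not depend on that choice.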

Now we turn to the second variation formula in a more general form than that considered in Chapter 22, namely, when we consider homotopies whose end points are not necessarily fixed, but which lie on two submanifolds of $M$. Explicitly, suppose that:

(a) $\delta(s, t)$, $0 \le s, t \le 1$, is a homotopy in $M$, with each curve $t \to \delta(s, t)$ parametrized proportionally to arc length. $N$ and $N'$ are submanifolds of $M$: $\delta(s, 0) \in N$ and $\delta(s, 1) \in N'$ for $0 \le s \le 1$; that is, the end points of the homotopy lie on $N$ and $N'$. Further, $\sigma(t) = \delta(0, t)$; $v(t) = \partial_s\delta(0, t) \in M_{\sigma(t)}$. Hence, $t \to v(t)$ is the vector field on $\sigma$ representing the infinitesimal deformation of $\sigma$.

(b) $R(\ ,\ )(\ )$ is the curvature tensor of $M$; $S_{(\ )}(\ ,\ )$ and $S'_{(\ )}(\ ,\ )$ are the second fundamental forms of $N$ and $N'$, respectively. $L(s)$ = length of curve $t \to \delta(s, t)$, $0 \le t \le 1$.

From (20.6) we have

$$L(0)\,\frac{d}{ds}L(s)\Big|_{s=0} = \langle v(t), \sigma'(t)\rangle\Big|_{t=0}^{t=1} - \int_0^1 \langle v(t), \nabla\sigma'(t)\rangle\,dt.$$

This is the first variation formula. It vanishes if

$v(0) \in N_{\sigma(0)}$, $v(1) \in N'_{\sigma(1)}$, that is, $v(0)$ and $v(1)$ are tangent to, respectively, $N$ and $N'$. (24.15a)

$\sigma'(0) \in N^\perp_{\sigma(0)}$, $\sigma'(1) \in N'^\perp_{\sigma(1)}$, that is, $\sigma$ is perpendicular to $N$ and $N'$ at, respectively, $t = 0$ and $t = 1$. (24.15b)

$\sigma(t)$ is a geodesic, that is, $\nabla\sigma'(t) = 0$. (24.15c)

Formula (24.15a) is, of course, implied by our assumptions that $\delta(s, 0) \in N$ and $\delta(s, 1) \in N'$. Let us suppose further that the remaining conditions are satisfied. We can now carry out the differentiation in the first term of the right-hand side of (22.2), with the result that

Let us examine the term of the form, say, $\langle \nabla_s\partial_s\delta(s, 0), \sigma'(0)\rangle$, which at first sight does not have a familiar form. However, we shall as usual assume that $\delta(s, t)$ is of a special form, namely, that there exists a vector field $X \in V(M)$ that is tangent to $N$ such that $\partial_s\delta(s, t) = X(\delta(s, t))$.

Then

- (v(t>, 0 i=

=

o’(t))(o’(t>)>d t

1

su,{i)(m3 W)/ i=O

+ J1[llVu(t)llz 0

- K(v(t), o’(t))((v(t)I(’L(0)’ sin’ O ( t ) ] dt. (24.16)

The second form is obtained from the first by integrating by parts the first term in the integrand. The third is obtained from the second by applying the Gram-Schmidt process to the vectors $v(t)$, $\sigma'(t)$ ($\theta(t)$ is the angle between $v(t)$ and $\sigma'(t)$; $L(0)$, the length of the curve $t \to \delta(0, t) = \sigma(t)$, is equal to $\|\sigma'(t)\|$, since $\sigma$ is parametrized proportionally to arc length). There is an obvious interest in the first two terms on the right-hand side of (24.16), particularly in knowing geometric conditions that they be zero. The following theorem will give us such conditions.

THEOREM 24.5 Let $M$ be a complete Riemannian manifold, let $N$ be a submanifold of $M$, let $\delta(s, t)$, $0 \le s, t \le 1$, be a homotopy in $M$ such that $\delta(s, 0) \in N$. Let $\sigma(t) = \delta(0, t)$, and $v(t) = \partial_s\delta(0, t)$. Conclusion: If, for each $s$, the curve $t \to \delta(s, t)$ is perpendicular to $N$ at $t = 0$, then

$$\langle \nabla v(0), u\rangle = -S_{\sigma'(0)}(v(0), u) \quad \text{for all } u \in N_{\sigma(0)}. \tag{24.17}$$

Conversely, if $\sigma(t)$, $0 \le t \le 1$, is a curve in $M$, with $\sigma(0) \in N$, $\sigma'(0) \in N^\perp_{\sigma(0)}$, and if $u_0 \in N_{\sigma(0)}$ and $u_1 \in M_{\sigma(0)}$ satisfy

$$\langle u_1, u\rangle = -S_{\sigma'(0)}(u_0, u) \quad \text{for all } u \in N_{\sigma(0)},$$

then there is at least one homotopy $\delta(s, t)$, $0 \le s, t \le 1$, such that

$$\delta(s, 0) \in N, \qquad \partial_t\delta(s, 0) \in N^\perp_{\delta(s, 0)}. \tag{24.18}$$

For each $s$, $t \to \delta(s, t)$ is a geodesic, and $\delta(0, t) = \sigma(t)$. If $v(t) = \partial_s\delta(0, t)$, then $t \to v(t)$ is a Jacobi vector field along $\sigma$ satisfying $v(0) = u_0$, $\nabla v(0) = u_1$. In particular, $v$ satisfies the initial conditions (24.17).

Proof It suffices to prove this theorem locally, that is, to suppose that N is contained in an open set U that has defined on it a basis of vector fields XI, . . . , X , E V ( U ) such that (Xi,Xj)=aij

for 1 1 i , j i n = d i m M (summation convention in force)

so X,, . . . , A’, are tangent to N .

Then let Ai(s, t ) , Bi(s, t ) be the functions such that

dt6(s, t ) = A;($, [ ) X i ( $ , tj,

d,d(s, t ) = B i ( S , t)Xi(S, t ) .

Our assumptions about 6 are equivalent to the conditions for 1 5 i I n;

Ai(s,0) = 0

for

Bi(s, 0) = 0

II

+ 1 I i 5 m.

Now

V”(0) = V,d,6(0, 0) = v,d,s(o, 0) = vs(

1

n+lSi<m

AJ,).

Hence (Vu(O),

xj(a(o>)> = C Ai(O, O)B,(O, O)>,

which proves (24.17). Now we deal with the converse. Let y(s), 0 I s i 1, be any curve in N with y’(0) = u o . We show that there exists a vector field s + w(s) E N y c s , along y such that

w(s) E N;,,,

for 0 I s I 1,

Vw(0)

=ul,

w(0)

= a’@>.

To do this (again it suffices to work locally), we can choose again the orthonormal bases X i for vector fields such that X , , 1 i s n, is tangent to N. Suppose that

Let us look for w(s) of the form



Then

This suggests that we choose the aj(s), hence w(s), so that for n

aj(s) = ( X i , o’(0))

da, (O) = < U l j x k > ds

-

n+ 1

+ 1 Ij

Bi(Obj(O)(VxiXj

sj_<m

_< m , 1

lsisn

Xk)(O(O))

for n

+ 1 I k Im.

TOshow that w(s) SO defined satisfies the required conditions, it remains only to check that

(vw(o),x k )

=

for 1 5 k 5 n .

(#I, x k )

But, for 1 5 k 4 n,

=

-1Bi(o)uj(o>(v,ix, X j ) ( g ( o ) ) >

= -

as required.

Now that we have verified the existence of a vector field $s \to w(s)$ along the curve $s \to \gamma(s)$, we can proceed to the proof of the converse. Choose the homotopy $\delta(s, t)$ so that

$$\delta(s, t) = \exp(tw(s)) \quad \text{for } 0 \le s, t \le 1.$$

It should be clear that $\delta$ satisfies all conditions of (24.18) except possibly the initial conditions, which we now verify:

$$\partial_t\delta(s, 0) = w(s) \quad \text{and} \quad \partial_s\delta(s, 0) = \gamma'(s),$$

by the definition of the exponential map. Thus

$$v(0) = \partial_s\delta(0, 0) = \gamma'(0) = u_0,$$

$$\nabla v(0) = \nabla_t\partial_s\delta(0, 0) = \nabla_s\partial_t\delta(0, 0) = \nabla w(0) = u_1,$$

which completes the proof.



This theorem suggests several definitions. First, if $\sigma(t)$, $0 \le t \le 1$, is a geodesic that is perpendicular to $N$ at $t = 0$, we say that a Jacobi field $t \to v(t)$ along $\sigma$ is transversal to $N$ if it satisfies

$$v(0) \in N_{\sigma(0)}, \tag{24.19a}$$

$$\langle \nabla v(0), u\rangle = -S_{\sigma'(0)}(v(0), u) \quad \text{for all } u \in N_{\sigma(0)}. \tag{24.19b}$$

The second part of the theorem asserts that such a vector field arises as the infinitesimal deformation of at least one geodesic deformation of $\sigma$ such that each geodesic of the deformation is initially perpendicular to $N$. Let us say that a point $\sigma(t_0)$ of $\sigma$, $0 < t_0 \le 1$, is a focal point of $N$ with respect to $\sigma$ if a nonzero transversal Jacobi field exists which is zero at $t_0$. The dimension of the space of all such Jacobi fields (notice that $v(t_0) = 0$ and (24.19) are linear homogeneous conditions) is called the index of the focal point. Let $N^\perp = \bigcup_{p \in N} N_p^\perp$ be the normal bundle to $N$, and let $\exp: N^\perp \to M$ be the map such that, for $u \in N_p^\perp$, $t \to \exp(tu)$ is the geodesic starting at $p$ which is tangent there to $u$. We then have:

COROLLARY TO THEOREM 24.5 If $u \in N^\perp$, $p = \exp(u)$, then $p$ is a focal point of $N$ with respect to the geodesic $t \to \exp(tu)$ if and only if $\exp_*: (N^\perp)_u \to M_p$ is not 1-1; that is, if and only if $\exp$ has a zero Jacobian at $u$.

Proof. Suppose first that $\exp_*$ is not 1-1. Let $\gamma(s)$, $0 \le s \le 1$, be a curve in $N$, and let $w(s) \in N^\perp_{\gamma(s)}$ be a vector field on $\gamma$ such that $s \to \exp(w(s))$ has a zero tangent vector at $s = 0$, $w(0) = u$. If $\delta(s, t) = \exp(tw(s))$, $v(t) = \partial_s\delta(0, t)$, then $v(t)$ is a Jacobi vector field along $t \to \exp(tu)$ that is transversal to $N$ and vanishes at $t = 1$; that is, $p = \exp(u)$ is a focal point with respect to the geodesic $t \to \exp(tu)$, according to the above definition.

Conversely, if $t \to v(t)$ is a Jacobi vector field along the geodesic $t \to \exp(tu)$ that vanishes at $t = 1$ and that is transversal to $N$, by Theorem 24.5 we can construct a geodesic deformation $\delta(s, t)$ satisfying (24.18), such that $\partial_s\delta(0, 0) = v(0)$, $\nabla_t\partial_s\delta(0, 0) = \nabla v(0)$. Then $t \to \partial_s\delta(0, t)$ would be a Jacobi field along $t \to \exp(tu)$ satisfying the same initial conditions at $t = 0$ as $v(t)$; hence, must coincide with $v(t)$. But then

$$\delta(s, 1) = \exp(\partial_t\delta(s, 0));$$

hence, $s \to \delta(s, 1)$ is a curve starting at $\exp(u)$, which is the image under $\exp$ of a curve in $N^\perp$, and which has a zero tangent vector at $s = 0$. Hence, $\exp_*$ is not 1-1 at $u$, as required to finish the proof of the corollary.

The corollary provides us with important qualitative information about the distribution of focal points. For example, if $u \in N^\perp$, and if $\exp(u)$ is not a focal point (with respect to $t \to \exp(tu)$), then $\exp(u')$ is not a focal point with respect to $t \to \exp(tu')$ for all $u' \in N^\perp$ that are sufficiently close to $u$. Further, by Sard's theorem, the set of points $p \in M$ that are focal points with respect to some geodesic joining $p$ to $N$ and perpendicular to $N$ is of measure zero. Note further that in case $N$ is a point, say $N = \{p_0\}$, we have $N^\perp = M_{p_0}$, and the focal points are just conjugate points in the sense defined in Chapter 22.

Many of the results proved in Chapters 22 and 23 concerning conjugate points can be generalized to apply to focal points. The Jacobi theorem (Theorem 22.4) is the prime example, and takes the following form:

THEOREM 24.6 Let $N$ be a submanifold of a complete Riemannian manifold $M$, let $u \in N^\perp$ be such that $\exp(u)$ is a focal point of $N$ with respect to the geodesic $t \to \exp(tu)$. Then, for $a > 1$, $t \to \exp(tu)$ is not the geodesic of minimal length joining $\exp(au)$ to $N$.

The proof is similar to the proof of Theorem 22.4, and is left as an exercise.

Now we turn to various elementary geometric applications of these concepts. The first one we have in mind is concerned with the following situation: $N$ is a submanifold of a Riemannian manifold $M$, $p_0$ is a point of $M$, $q_0$ is a point of $N$. We ask: What are sufficient conditions that guarantee that the real-valued function $q \to d(p_0, q)$, for $q \in N$, cannot have a relative maximum at $q = q_0$? It will turn out that this is a question that unifies many isolated geometric questions concerning Riemannian spaces, and whose answer falls out in a natural way from the second variation formula. The basic theorem is:

THEOREM 24.7 Let $N$ be a submanifold of a complete Riemannian manifold $M$. Let $p_0$ be a point of $M$, $u \in M_{p_0}$ such that $\exp(u) = p \in N$, and such that the geodesic $t \to \exp(tu) = \sigma(t)$ (by definition) is perpendicular to $N$ at $t = 1$. Suppose further that $\sigma$ satisfies the following condition: For every nonzero Jacobi vector field $t \to v(t)$ along $\sigma$ such that $v(0) = 0$,

$$\langle \nabla v(t), v(t)\rangle > 0 \quad \text{for } 0 < t \le 1. \tag{24.20}$$

Suppose in addition that $N$ satisfies any one of the following conditions:

$$N \text{ is a minimal submanifold of } M \tag{24.21}$$

or

$$\dim M \le 2(\dim N) - 1, \text{ and } K_N(u_1, u_2) \le K(u_1, u_2) \quad \text{for all } u_1, u_2 \in T(N). \tag{24.22}$$



Then there is at least one geodesic deformation $\delta(s, t)$, $0 \le s, t \le 1$, with $\delta(0, t) = \sigma(t)$, $\delta(s, 0) = p_0$, $\delta(s, 1) \in N$, and with length of $t \to \delta(s, t)$, $0 \le t \le 1$, actually greater than the length of $\sigma$, if $s$ is sufficiently small, but not zero. Intuitively, $p$ cannot be a relative maximum of the function $q \to d(p_0, q)$ on $N$, although this is not strictly true (unless $N$ lies within a geodesic ball about $p_0$ and $\sigma$ is a minimizing geodesic), so we have stated the theorem in this complicated and more precise form.

A word of definition is needed for the terms used in the statement of the theorem. First, a submanifold $N$ of a Riemannian manifold $M$, in general, is said to be a minimal submanifold of $M$ if for all $p \in N$, all $w \in N_p^\perp$,

$$\lambda_1(w) + \cdots + \lambda_n(w) = 0, \tag{24.23}$$

where $\lambda_1(w), \ldots, \lambda_n(w)$ are the eigenvalues (counted according to multiplicity) of the symmetric bilinear form $S_w(\ ,\ )$ on $N_p$ ($n = \dim N$). The geometric interpretation in terms of $N$ minimizing "surface area" will be explained in a second volume. (In case $M = R^3$, with the Euclidean metric, $\dim N = 2$, this definition gives the classical one, that is, soap films.) The proof consists of a series of lemmas.

LEMMA 24.8 $\sigma$ contains no conjugate points of $\sigma(0)$ with respect to $\sigma$.

The proof is almost obvious:

$$\frac{d}{dt} \langle v(t), v(t) \rangle = 2 \langle \nabla v(t), v(t) \rangle > 0;$$

hence $v(t)$ cannot vanish for $t > 0$ because $v(0) = 0$; hence, no conjugate points.
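The step from the sign of the derivative to the nonvanishing of $v(t)$ in the proof of Lemma 24.8 can be spelled out by integrating. The following is a sketch only; it assumes, as in (24.20), that $v$ is a nonzero Jacobi field along $\sigma$ with $v(0) = 0$ and that the inequality holds for $0 < t \le 1$.

```latex
% Sketch: integrate the derivative of |v|^2 from 0 to t, using v(0) = 0.
\begin{align*}
\langle v(t), v(t) \rangle
  &= \langle v(0), v(0) \rangle + \int_0^t \frac{d}{ds} \langle v(s), v(s) \rangle \, ds \\
  &= 2 \int_0^t \langle \nabla v(s), v(s) \rangle \, ds \;>\; 0
  \qquad \text{for } 0 < t \le 1 .
\end{align*}
% A point sigma(t) is conjugate to sigma(0) only if some nonzero Jacobi field
% vanishes at both 0 and t, which the inequality above rules out.
```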

LEMMA 24.9 If $v_1 \in N_p$ is such that $S_{\sigma'(1)}(v_1, v_1) = 0$, then there is a geodesic homotopy $\delta(s, t)$ such that

$$\delta(s, 0) = p_0, \qquad \delta(s, 1) \in N, \qquad \delta(0, t) = \sigma(t), \qquad \partial_s \delta(0, 1) = v_1,$$

and $t \to \delta(s, t)$ is of greater length than $t \to \sigma(t)$ if $s$ is $> 0$, but sufficiently small.

Proof. The fact that $\sigma(1)$ is not a conjugate point of $\sigma(0) = p_0$ with respect to $\sigma$ implies, we know, that $\exp$ is 1-1 in a neighborhood of $u$ in $M_{p_0}$, where $u \in M_{p_0}$ is such that $\sigma(t) = \exp(tu)$. Thus there exists a curve $\gamma(s)$ in $M_{p_0}$, starting at $u$, such that $s \to \exp(\gamma(s))$ is any given curve in $N$, in particular, one chosen to be tangent to $v_1$ at $s = 0$. Now define

$$\delta(s, t) = \exp(t\gamma(s)), \qquad L(s) = \text{length of } t \to \delta(s, t).$$

If $v(t) = \partial_s \delta(0, t)$, we see from the second variation formula (24.16) that

since $v(t)$ is a Jacobi vector field, and $v(1) = v_1$. By (24.20), this is greater than zero; whence, the lemma.

Now, in case $N$ is a minimal submanifold of $M$ according to the definition (24.23), $S_{\sigma'(1)}(\ ,\ )$ must have at least one nonpositive and one nonnegative eigenvalue; hence there must be at least one $v_1 \in N_{\sigma(1)}$ that annihilates $S_{\sigma'(1)}$. This suffices to prove the theorem in case condition (24.21) is satisfied. Condition (24.22) is more difficult to handle. The tool is the following lemma, conjectured by Chern and Kuiper [1], but proved by Otsuki [1].

LEMMA 24.10 Let $W$ be a vector space over the real numbers of dimension $d$. Let $Q_1(\ ,\ ), \ldots, Q_{d-1}(\ ,\ )$ be symmetric, bilinear forms over $W$ such that

$$\sum_{i=1}^{d-1} \left[ Q_i(w_1, w_1) Q_i(w_2, w_2) - Q_i(w_1, w_2)^2 \right] \le 0 \tag{24.24}$$

for all choices of vectors $w_1, w_2 \in W$. Then there is at least one nonzero vector $w \in W$ such that

$$Q_1(w, w) = 0 = \cdots = Q_{d-1}(w, w).$$

For the proof, we must refer to Otsuki's paper [1]. To apply this to the theorem, we choose $W = N_p$, $Q_1 = S_{\sigma'(1)/\|\sigma'(1)\|}$, and $Q_2, \ldots, Q_{d-1}$ are the second fundamental forms $S_{u_2}(\ ,\ ), \ldots, S_{u_{d-1}}(\ ,\ )$, where $(\sigma'(1)/\|\sigma'(1)\|, u_2, \ldots, u_{d-1})$ is an orthonormal basis of $N_p^\perp$. That (24.24) is satisfied is a consequence of the assumptions made in (24.22), namely, that $K_N(\ ,\ ) \le K(\ ,\ )$, and the fundamental formula (24.14) relating $K_N(\ ,\ ) - K(\ ,\ )$ and the second fundamental forms. Now, to have Lemma 24.10 apply to give the vector $v_1 \in N_p$ needed to satisfy $S_{\sigma'(1)}(v_1, v_1) = 0$, we must have

$$\dim M - \dim N = d - 1 \qquad \text{and} \qquad d \le \dim N,$$

whence

$$\dim M \le 2(\dim N) - 1,$$

which is precisely the condition postulated in (24.22). Q.E.D.

For $p_0 \in M$, $r > 0$, recall that

$$B(p_0, r) = \{ p \in M : d(p_0, p) < r \},$$

that is, $B(p_0, r)$ is the ball of radius $r$ about $p_0$.

COROLLARY TO THEOREM 24.7 Suppose $r > 0$ is such that, for all $u \in M_{p_0}$ with $\|u\| < r$, the geodesic $t \to \exp(tu)$ satisfies (24.20) for $0 < t \le 1$, and such that $d(\exp(u), p_0) = \|u\|$. Then $B(p_0, r/2)$ is geodesically convex in the sense that, for $p, q \in B(p_0, r/2)$, any geodesic of shortest length joining $p$ to $q$ must be completely in $B(p_0, r/2)$. In particular, there is a geodesically convex ball about each point of $M$.

Proof. By the triangle inequality for the distance function $d(\ ,\ )$ we have $d(p, q) \le d(p, p_0) + d(p_0, q) < r$. Let $\gamma(s)$, $0 \le s \le 1$, be a geodesic of length $d(p, q)$ joining $p$ to $q$. Then $d(p_0, \gamma(s)) < r$ for $0 \le s \le 1$; hence $\gamma(s) \in B(p_0, r)$ for $0 \le s \le 1$. But, by Theorem 24.7, the function $s \to d(p_0, \gamma(s))$ cannot have a maximum for $0 < s < 1$, since $s \to \gamma(s)$ is a geodesic of $M$; hence $d(p_0, \gamma(s)) \le \max(d(p_0, p), d(p_0, q)) < r/2$. That is, $\gamma(s) \in B(p_0, r/2)$ for $0 \le s \le 1$.

To show that each point $p_0$ has a convex ball, it remains only to show that such a positive real number $r$ exists. Suppose, then, that $\sigma(t)$, $0 \le t \le 1$, is a geodesic of $M$, with $\sigma(0) = p_0$, and that $t \to v(t)$ is a Jacobi vector field along $\sigma$ that vanishes at $t = 0$. Then

which is obviously greater than zero, provided $\|\sigma'(0)\|$ is sufficiently small, and $\|\nabla v(0)\|$ is bounded, say, $\|\nabla v(0)\| \le 1$ and $\|\sigma'(0)\| \le r$. But if this holds for all such $v(t)$ with $\|\nabla v(0)\| \le 1$, clearly $\langle \nabla v(t), v(t) \rangle > 0$. Now this $r$ might vary as $\sigma'(0)$ varies. But again the infimum of such $r$ is positive as $\sigma'(0)$ varies in direction about $p_0$. Q.E.D.
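The quantity whose positivity is used at the end of the proof above can be written out as follows. This is a hedged sketch rather than the book's own display: it assumes the Jacobi equation in the form $\nabla\nabla v = -R(v, \sigma')(\sigma')$, and the sign convention for $R$ used in the earlier chapters may differ from the one assumed here.

```latex
% Sketch, assuming the Jacobi equation  \nabla\nabla v = -R(v,\sigma')(\sigma').
\begin{align*}
\frac{d}{dt} \langle \nabla v(t), v(t) \rangle
  &= \langle \nabla v(t), \nabla v(t) \rangle
     + \langle \nabla \nabla v(t), v(t) \rangle \\
  &= \| \nabla v(t) \|^2
     - \langle v(t), R(v(t), \sigma'(t))(\sigma'(t)) \rangle .
\end{align*}
% The curvature term is quadratic in \sigma'(t), so on a compact neighborhood
% of p_0 it is bounded by C \|\sigma'(0)\|^2 \|v(t)\|^2 for some constant C,
% and it becomes small relative to \|\nabla v(t)\|^2 when \|\sigma'(0)\| is small.
```

Under the same assumption, the right-hand side is nonnegative whenever $\langle v, R(v, \sigma')(\sigma') \rangle \le 0$, with no smallness condition on $\sigma'(0)$.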



Note an additional fact that follows from this argument: $(d/dt)\langle \nabla v(t), v(t) \rangle$ is always $> 0$ if

$$\langle v(t), R(v(t), \sigma'(t))(\sigma'(t)) \rangle \le 0.$$

This condition is automatically implied by the condition: The sectional curvature of $M$ is nonpositive. This condition, together with the condition that $M$ be simply connected, implies (Theorem 23.1) that $\exp: M_{p_0} \to M$ is a diffeomorphism. With these conditions we conclude that:

The geodesic balls $B(p_0, r)$, for any $r > 0$, are geodesically convex, if the curvature is nonpositive and if $M$ is simply connected. (24.25)

Another result of this type is:

If $u \in M_{p_0}$, $\|u\| = r$, and if (24.20) is satisfied along the geodesic $t \to \exp(tu) = \sigma(t)$, then $\exp\{ w \in M_{p_0} : \|w\| = r \}$ is a submanifold about $\exp(u)$, and its second fundamental form $S_{-\sigma'(1)}(\ ,\ )$ is positive definite there. (24.26)

Proof. That it is a submanifold follows from the implicit function theorem, since $\exp(u)$ is not a conjugate point of $p_0$ (Lemma 24.8). If $v_1 \in M_{\exp(u)}$ is tangent to the submanifold, there is a Jacobi field $t \to v(t)$ with $v(0) = 0$, $v(1) = v_1$, and a geodesic deformation $t \to \delta(s, t)$ with $v(t) = \partial_s \delta(0, t)$, $\|\partial_t \delta(s, 0)\| = r$, $\delta(s, 0) = p_0$. From the second variation formula, we have

$S_{\sigma'(1)}(v_1, v_1) + \ldots$

... in the direction $-\sigma'(r/\|\sigma'(0)\|)$ is positive definite, then $\sigma$ satisfies condition (24.20). (24.27)

Calculation of the Second Fundamental Form of Hypersurfaces

Let $f$ be a real-valued function on a Riemannian manifold $M$. We want to see how the second fundamental form, hence also the curvature, of the hypersurface $f = \text{constant}$ can be computed in terms of $f$. Construct the gradient (vector) field of $f$, an element of $V(M)$, denoted by $\operatorname{grad} f$ and defined by

$$\langle \operatorname{grad} f, X \rangle = X(f) \qquad \text{for all } X \in V(M).$$

Thus if $p \in M$ is not a critical point for $f$, that is, if $df \ne 0$ at $p$, then

$$f - f(p) = 0$$

defines a hypersurface† in a neighborhood of $p$, and $\operatorname{grad} f(p)$ is perpendicular to this hypersurface. Hence $(\operatorname{grad} f / \|\operatorname{grad} f\|)(p)$ is the unit normal to the hypersurface, and the second fundamental form is

for $X$, $Y$ tangent to the hypersurface, that is, satisfying $X(f) = 0 = Y(f)$. Since

$$\langle X, \operatorname{grad} f \rangle = 0 = \langle Y, \operatorname{grad} f \rangle,$$

this can be rewritten as

Let us compute this in terms of an orthonormal moving frame. Let $U$ be an open set of $M$ containing $p$, with a basis $(\omega_i)$ of 1-differential forms that is dual to an orthonormal basis $X_i$ of vector fields in $U$. ($1 \le i, j, \ldots \le m = \dim M$; summation convention in force.) Suppose that

$$df = f_i \omega_i, \qquad df_i = f_{ij} \omega_j.$$

Then we see that

$$\operatorname{grad} f = f_i X_i, \qquad \|\operatorname{grad} f\|^2 = f_i f_i.$$

Let $(\omega_{ij})$ be the connection forms corresponding to the given orthonormal basis. Then, for $X \in V(M)$, by (24.12),

$$\nabla_X \operatorname{grad} f = [X(f_i) + f_j \omega_{ji}(X)] X_i.$$

Hence

$$\langle \nabla_X \operatorname{grad} f, Y \rangle = (X(f_i) + f_j \omega_{ji}(X)) \omega_i(Y).$$
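The formula for $\nabla_X \operatorname{grad} f$ above follows from the product rule for covariant differentiation. The sketch below assumes that (24.12) expresses the connection forms through $\nabla_X X_j = \omega_{ji}(X) X_i$; this index convention is inferred from the stated result and is not quoted from (24.12) itself.

```latex
% Sketch, assuming (24.12) reads  \nabla_X X_j = \omega_{ji}(X) X_i
% (summation convention in force).
\begin{align*}
\nabla_X \operatorname{grad} f
  &= \nabla_X (f_j X_j)
   = X(f_j)\, X_j + f_j\, \nabla_X X_j \\
  &= X(f_i)\, X_i + f_j\, \omega_{ji}(X)\, X_i
   = \bigl[ X(f_i) + f_j\, \omega_{ji}(X) \bigr] X_i , \\
\langle \nabla_X \operatorname{grad} f, Y \rangle
  &= \bigl[ X(f_i) + f_j\, \omega_{ji}(X) \bigr] \langle X_i, Y \rangle
   = \bigl[ X(f_i) + f_j\, \omega_{ji}(X) \bigr] \omega_i(Y) .
\end{align*}
% The last step uses that (\omega_i) is dual to the orthonormal frame (X_i),
% so \langle X_i, Y \rangle = \omega_i(Y).
```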

† A hypersurface of a manifold is a submanifold of one lower dimension (that is, of codimension 1).



Thus the eigenvalues of the (normalized) second fundamental form are precisely those of the quadratic form (in $i$, $k$):

restricted by the condition $f_i \lambda_i = 0$.

Example

$M$ = Euclidean space, with the flat Euclidean metric, $f = \tfrac{1}{2} a_{ij} x_i x_j$, where $(x_1, \ldots, x_m)$ is the Euclidean coordinate system and $(a_{ij})$ is a symmetric constant matrix. Put $\omega_i = dx_i$. Then $df = a_{ij} x_j \, dx_i$, hence

$$f_i = a_{ij} x_j, \qquad f_{ij} = a_{ij}.$$

Since $\omega_{ij} = 0$, the above quadratic form reduces to

These formulas plainly indicate how the second fundamental form is to be computed in principle in terms of the algebraic properties of the matrix $(a_{ij})$. Let us carry this out explicitly for the simplest case, namely: Suppose $a_{ij} = \delta_{ij}$, $f = \tfrac{1}{2} x_i x_i$; hence $f = r^2$ determines a sphere of radius $r\sqrt{2}$. Suppose, then, that $f(x) = r^2$. Then

$$f_i = x_i, \qquad f_i f_i = x_i x_i = 2f = 2r^2.$$

Hence,

$$\|\operatorname{grad} f\| = \sqrt{2}\, r.$$

The quadratic form is then $\delta_{ij} \lambda_i \lambda_j / (\sqrt{2}\, r)$, whence: The (normalized) second fundamental form of this sphere, whose radius is $\sqrt{2}\, r$, has all eigenvalues equal to $1/(\sqrt{2}\, r)$. Thus a sphere of radius $\rho$ has constant sectional curvature equal to $1/\rho^2$.

Totally Geodesic Submanifolds

Definition

A submanifold $N$ of a Riemannian manifold $M$ is said to be geodesic at a point $p \in N$ if each sufficiently small geodesic of $M$ beginning at $p$ and tangent there to $N$ lies in $N$ completely. $N$ is said to be totally geodesic if it is geodesic at each of its points.
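A standard illustration of this definition, not taken from the text, is the equator of the unit sphere. The sketch below uses only the uniqueness of geodesics with given initial data; the reflection $\rho$ and the coordinates $x_1, x_2, x_3$ are notation introduced here for the example.

```latex
% Illustration (assumed example): the equator of S^2 is totally geodesic.
\begin{align*}
M &= S^2 = \{ x \in R^3 : x_1^2 + x_2^2 + x_3^2 = 1 \}, \qquad
N = \{ x \in S^2 : x_3 = 0 \}, \\
\rho(x_1, x_2, x_3) &= (x_1, x_2, -x_3)
  \quad \text{(an isometry of } S^2 \text{ whose fixed point set is } N\text{)} .
\end{align*}
% Let \sigma be a geodesic of S^2 with \sigma(0) = p \in N and \sigma'(0) = v \in N_p.
% Then \rho \circ \sigma is again a geodesic, with
%   (\rho \circ \sigma)(0) = p ,  (\rho \circ \sigma)'(0) = \rho_*(v) = v ,
% so \rho \circ \sigma = \sigma by uniqueness of geodesics. Hence \sigma(t) is
% fixed by \rho for all t, that is, \sigma(t) \in N: N is geodesic at p.
```

By contrast, a latitude circle $x_3 = c$ with $0 < |c| < 1$ is not geodesic at any of its points, since the geodesics of $S^2$ are great circles and the great circle tangent to the latitude at $p$ leaves it immediately.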



Of course this is the geometric definition of a totally geodesic submanifold, designed to generalize the concept of "plane" in Euclidean geometry. We now want to show that this definition is equivalent to several others and that this equivalence is reasonably nontrivial and useful.

THEOREM 24.11 A submanifold $N$ of a Riemannian manifold $M$ is totally geodesic if and only if its second fundamental form is identically zero.

Proof. Let $p \in N$, and suppose that $N$ is geodesic at $p$. Let $\sigma(t)$, $0 \le t \le 1$, be a curve in $N$, beginning at $p$, which is also a geodesic of $M$. Set $v = \sigma'(0) \in N_p$. Pick a $u \in N_p^\perp$. Then, almost by definition,

$$S_u(v, v) = \langle u, \nabla v(0) \rangle = \langle u, \nabla \sigma'(0) \rangle = 0.$$

Hence, if $N$ is geodesic at $p$, $S_u(\ ,\ )$ is identically zero for all $u \in N_p^\perp$. This proves one part of Theorem 24.11. Turn to the converse; suppose that $S(\ ,\ )$ is identically zero. By (24.7), we see that

$$\nabla_X Y = \nabla_X^* Y,$$

for any pair $X$, $Y$ of vector fields of $M$ that are tangent to $N$. ($\nabla^*$ denotes covariant differentiation in the induced metric on $N$.) In particular, we have proved:

LEMMA 24.12 If the second fundamental form of $N$ is identically zero, then every curve on $N$ that is a geodesic in the induced metric on $N$ is a geodesic of $M$ also.

But this property of $N$ clearly implies that it is geodesic at each point, by the uniqueness of geodesics. Q.E.D.

Another useful geometric characterization of total geodesity is Theorem 24.13.

THEOREM 24.13 A submanifold $N$ of a Riemannian manifold $M$ is totally geodesic in $M$ if and only if the following condition is satisfied:

Each sufficiently small geodesic of $M$ whose end points lie on $N$ must lie completely in $N$. (24.29)

Proof. Suppose $N$ is totally geodesic. Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $M$ with $\sigma(0)$ and $\sigma(1) \in N$. If length $\sigma$ is sufficiently small, $\sigma(1)$ lies in a geodesic ball about $\sigma(0)$, and $\sigma(1)$ can be joined to $\sigma(0)$ by a geodesic of the induced metric on $N$. By Lemma 24.12 and the uniqueness of geodesics, these must coincide. Conversely, suppose that (24.29) is satisfied. Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $N$ in the induced metric. If length $\sigma$ is sufficiently small, $\sigma(1)$ can be joined to $\sigma(0)$ by a geodesic of $M$. By (24.29), this geodesic must also lie in $N$; hence, by the length-minimizing property† of geodesics, it must also be a geodesic of $N$, which by uniqueness of geodesics on $N$ must equal $\sigma$. Thus every geodesic of $N$ is a geodesic of $M$; hence $N$ is totally geodesic in $M$. Q.E.D.

† Notice that up to this point we have been using only the self-parallel property of geodesics.

This completes our study of the more or less superficial properties of totally geodesic submanifolds. We now go a little deeper and investigate the relation between the properties of totally geodesic submanifolds of $M$ and its curvature tensor.

THEOREM 24.14 Let $N$ be a totally geodesic submanifold of a Riemannian manifold $M$. Then, for $p \in N$,

$$R(N_p, N_p)(N_p) \subset N_p, \qquad R(N_p; N_p, N_p)(N_p) \subset N_p, \qquad \text{etc.},$$

where $R(\ ;\ ,\ )(\ )$, $R(\ ;\ ;\ ,\ )(\ )$, etc., denote the successive covariant derivatives of the curvature tensor.

Proof. First we must define the covariant derivative of the curvature tensor. It is to be an $F(M)$-multilinear mapping $V(M) \times V(M) \times V(M) \times V(M) \to V(M)$, denoted by

$$X, Y, Z, W \to R(X; Y, Z)(W),$$

and defined by

$$R(X; Y, Z)(W) = \nabla_X (R(Y, Z)(W)) - R(\nabla_X Y, Z)(W) - R(Y, \nabla_X Z)(W) - R(Y, Z)(\nabla_X W). \tag{24.30}$$

It is easily verified that this formula does actually define an $F(M)$-multilinear mapping; hence it forms what is classically known as a "tensor field" on $M$. Our earlier discussion of how to define the values at a point of such tensor fields as differential forms and vector fields can be extended to show that all these covariant derivatives of the curvature tensor possess "values" at points of $M$. For example, the value at $p$ is a multilinear map $M_p \times M_p \times M_p \times M_p \to M_p$, denoted by

$$(u, v_1, v_2, v_3) \to R(u; v_1, v_2)(v_3).$$

For $X, Y, Z, W \in V(M)$,

$$R(X; Y, Z)(W)(p) = R(X(p); Y(p), Z(p))(W(p)).$$

This definition can be iterated to define the higher covariant derivatives of the curvature tensor.

Now we know that $N$ totally geodesic is equivalent to $\nabla_X Y$ tangent to $N$, for any $X, Y \in V(M)$ tangent to $N$. Hence $\nabla_X \nabla_Z Y$ and $\nabla_Z \nabla_X Y$ are tangent to $N$, for $X$, $Y$, $Z$ tangent to $N$. By the Ricci identity connecting iterated covariant derivatives, we see that $R(X, Z)(Y)$ is tangent to $N$. This leads to the statement:

$$R(N_p, N_p)(N_p) \subset N_p.$$

Further covariant derivation leads to the analogous statement for the covariant derivatives. Q.E.D.

Theorem 24.14 tells us that the tangent spaces to totally geodesic submanifolds cannot be arbitrary. The following theorem tells us what Riemannian manifolds have a maximal number of totally geodesic submanifolds. One feels intuitively that a "generic" Riemannian manifold can have very few totally geodesic submanifolds, but research in this direction is not very advanced.

THEOREM 24.15 Let $M$ be a Riemannian manifold of dimension $\ge 3$ such that each two- and three-dimensional tangent subspace is tangent to, respectively, a two- and three-dimensional totally geodesic submanifold. Then $M$ has constant sectional curvature.

Proof. Let $p \in M$, and let $N_p$ be any two-dimensional subspace of $M_p$. Then $R(N_p, N_p)(N_p) \subset N_p$. Let $v \in M_p \cap N_p^\perp$. Now

since each linear transformation $R(u_1, u_2)$ is skew-symmetric. $R(N_p, N_p)(v)$ must belong to $N_p + (v)$, since $N_p + (v)$ is tangent to a three-dimensional totally geodesic submanifold. By skew-symmetry of $R(\ ,\ )$, these two relations are compatible only if $R(N_p, N_p)(v) = 0$. Since $v$ is arbitrary in $N_p^\perp$,

$$R(N_p, N_p)(N_p^\perp) = 0. \tag{24.31}$$
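The argument leading to (24.31) can be written out in three short steps. The following is a sketch under the hypotheses of Theorem 24.15, with $u_1, u_2, w \in N_p$ and $v \in N_p^\perp$; it uses only the skew-symmetry of each $R(u_1, u_2)$ and Theorem 24.14.

```latex
% Sketch: why R(u_1,u_2)(v) = 0 for u_1, u_2 in N_p and v in N_p^\perp.
\begin{align*}
\langle R(u_1, u_2)(v), w \rangle
  &= - \langle v, R(u_1, u_2)(w) \rangle = 0
  && \text{since } R(u_1, u_2)(w) \in N_p \text{ and } v \perp N_p , \\
\langle R(u_1, u_2)(v), v \rangle
  &= 0
  && \text{by skew-symmetry of } R(u_1, u_2) , \\
R(u_1, u_2)(v)
  &= w' + c\, v \ \text{ with } w' \in N_p
  && \text{by Theorem 24.14 applied to } N_p + (v) .
\end{align*}
% Pairing the third line with N_p and using the first line gives w' = 0;
% pairing it with v and using the second line gives c = 0.
% Hence R(u_1,u_2)(v) = 0.
```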

Let $v_1$, $v_2$ be orthonormal vectors in $N_p$, and let $v$ be a unit vector in $N_p^\perp$. Then, since $N_p$ is an arbitrary two-dimensional subspace, there are relations of the form

$$R(v_1, v_2)(v_1) = -a v_2, \qquad R(v_1, v_2)(v_2) = a v_1,$$

$$R(v_1, v)(v_2) = 0, \qquad R(v_1, v)(v) = b v_1, \qquad R(v_1, v_2)(v) = 0.$$

We want to prove that $a = b$. Since (24.31) holds for arbitrary two-dimensional subspaces,

$$R(v_1, v + v_2)(v - v_2) = 0.$$

But also

$$R(v_1, v + v_2)(v - v_2) = R(v_1, v)(v) - R(v_1, v_2)(v_2) = (b - a) v_1,$$

which implies $a = b$. This implies that the sectional curvatures of all two-dimensional subspaces of $M_p$ are the same.

We now show that they remain constant when $p$ varies over $M$. Let $X_i$ ($1 \le i, j, \ldots \le m = \dim M$; summation convention) be an orthonormal basis for vector fields on an open subset of $M$, let $\omega_i$ be a dual basis of 1-forms, and let $\Omega_{ij} = R_{ijkl}\, \omega_k \wedge \omega_l$ be the corresponding curvature forms. Let $p \to K$ be the function on $M$ whose value at each point $p$ is the common value of the sectional curvatures at this point. Then, by (24.31),

This implies that

$$\Omega_{ij} = K \omega_i \wedge \omega_j. \tag{24.32}$$

The Bianchi identities for the curvature tensor are

$$d\Omega_{ij} = \omega_{ik} \wedge \Omega_{kj} - \Omega_{ik} \wedge \omega_{kj},$$

where $(\omega_{ij})$ are the connection forms. From (24.32),

$$d\Omega_{ij} = K \omega_{ik} \wedge \omega_k \wedge \omega_j - K \omega_i \wedge \omega_k \wedge \omega_{kj}.$$

But also

$$d\Omega_{ij} = dK \wedge \omega_i \wedge \omega_j + K \omega_{ik} \wedge \omega_k \wedge \omega_j - K \omega_i \wedge \omega_{jk} \wedge \omega_k.$$

Combining these two different ways of computing $d\Omega_{ij}$, we have

$$dK \wedge \omega_i \wedge \omega_j = 0.$$

Here is where we use the fact that $m \ge 3$. The 1-form $dK$ can have zero exterior product with all 2-forms $\omega_i \wedge \omega_j$ only if it is zero; that is, $K$ = constant. Q.E.D.
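The last step of the proof of Theorem 24.15 is a linear-algebra fact about exterior products, which the sketch below makes explicit. The coefficients $K_l$ are notation introduced here for the expansion of $dK$ in the coframe $(\omega_l)$, and sums are written out rather than left to the summation convention.

```latex
% Sketch: dK \wedge \omega_i \wedge \omega_j = 0 for all i, j forces dK = 0 when m >= 3.
\begin{align*}
dK &= \sum_{l=1}^{m} K_l\, \omega_l , \\
0 = dK \wedge \omega_i \wedge \omega_j
  &= \sum_{l \ne i,\, j} K_l\; \omega_l \wedge \omega_i \wedge \omega_j
  \qquad (i \ne j \text{ fixed}) .
\end{align*}
% The 3-forms \omega_l \wedge \omega_i \wedge \omega_j with l \ne i, j are
% linearly independent, so K_l = 0 for every l different from i and j.
% Given any index k, the hypothesis m >= 3 lets us choose i \ne j both
% different from k; hence K_k = 0 for all k, that is, dK = 0.
```

For $m = 2$ there is no admissible pair $i, j$ for a given $k$, which is consistent with the fact that every surface satisfies the hypothesis on two-dimensional subspaces trivially without having constant curvature.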

25 Groups of Isometries

One theme pervading mathematics for at least a hundred years is the emphasis on the reciprocity between a geometric structure and its group of automorphisms. This attitude pervades physics: For example, we may say that the whole point of the Theory of Special Relativity is to replace the automorphism group of Newtonian mechanics (the Galilean group) by the Lorentz group. Thus our study of Riemannian manifolds must take into account the group of its automorphisms. Since a complete development would involve us in the technicalities of Lie group theory, we shall limit our treatment to several topics that will give the flavor of what may be called "group-theoretical geometry," trying to get along with a minimum of Lie group theory. As usual in mathematics, the subject is rich and attractive precisely because it involves the interaction of two seemingly different disciplines, but this creates difficulties in exposition.

Let $M$ be a Riemannian manifold, supposed, for simplicity, to be complete. A diffeomorphism $\phi: M \to M$ is said to be an isometry of $M$ if

$$\|\phi_*(v)\| = \|v\| \qquad \text{for all } v \in T(M),$$

where $v \to \|v\| = \langle v, v \rangle^{1/2}$ is the length function defined by the metric. Then $\phi$ preserves the length of curves; hence it also preserves distances between points. (It is an interesting fact that, conversely, a distance preserving homeomorphism is an isometry.) Since obviously the product of two isometries is an isometry, as is the inverse, the set of all isometries forms a group†. Now, the first general result of interest is that the group of all isometries forms a Lie group. Let $I(M)$ be the group of isometries of $M$. As a start, we shall take over the following theorem without proof, which can be found in Helgason's book [1].

† It is assumed that the reader is familiar with the definition and elementary algebraic properties of groups, as well as certain standard notations.

THEOREM 25.1 Let $M$ be a Riemannian manifold. $I(M)$ can be made into a Lie group so that:

(a) The map $I(M) \times M \to M$ which assigns $\phi(p)$ to each pair $(\phi, p) \in I(M) \times M$ is differentiable.

(b) If $p \in M$, and $\phi_1, \phi_2, \ldots$ is a sequence in $I(M)$ so that $\lim_{j \to \infty} \phi_j(p)$ exists, then at least one subsequence of the $\phi_j$ converges to an element of $I(M)$.

The first topic to be studied concerns the relation between the orbits and isotropy groups of a closed subgroup $G$ of $I(M)$. It is known that $G$ itself is a Lie group that acts, in the manner by which it is defined, as a differentiable transformation group on $M$. For $p \in M$, the isotropy subgroup, denoted by $G^p$, of $G$ at $p$, is defined by

$$G^p = \{ g \in G : gp = p \}.$$

By (b) of Theorem 25.1, $G^p$ is compact. The orbit of $G$ at $p$, denoted by $Gp$, is defined by $Gp = \{ gp : g \in G \}$. (It is convenient to simplify the notation $g(p)$, that is, the transform of $p$ by the diffeomorphism of $M$ which is $g$, to $gp$ when no confusion is likely.) Assertion (b) of Theorem 25.1 implies that each orbit is a closed subset of $M$. Further, each orbit is a submanifold. For the coset space $G/G^p$ is a manifold and the map $G/G^p \to M$ obtained by passing to the quotient from the map $g \to g \cdot p$ of $G \to M$ is a submanifold map. It is even a regularly embedded submanifold, since part (b) of Theorem 25.1 implies that a convergent sequence in $Gp$ must also converge when considered as a sequence in $G/G^p$. Now we can state the main general result concerning the structure of the orbits and isotropy subgroups of a closed group of isometries.

THEOREM 25.2 Let $G$ be a closed group of isometries of a complete Riemannian manifold $M$. Let $p \in M$, and let $N = Gp$, the orbit of $G$ at $p$. Then there is an open set $U$ of $M$ containing $N$ such that:

(a) $GU = U$, that is, $U$ is the union of orbits of $G$.

(b) Every $q \in U$ can be joined to $N$ by exactly one geodesic whose length is $d(q, N)$.

(c) For $q \in U$, $G^q$ is conjugate within $G$ to a subgroup of $G^p$.

(d) $U$ is dense in $M$.

The main part of the proof is in the following lemma.

LEMMA 25.3 Let $N$ be a closed, regularly embedded submanifold of a Riemannian manifold $M$. Let $N^\perp$ be the normal tangent vector bundle to $N$. Consider $N$ as a submanifold of $N^\perp$, via the zero cross section.† Define:

$V = \{ v \in N^\perp$ : There are no focal points of $N$ along the geodesic $t \to \exp(tv)$, $0 < t \le 1$, and this geodesic is the only geodesic of length $\le \|v\|$ joining $\exp(v)$ to $N \}$.

Then $V$ is an open subset of $N^\perp$ which contains $N$. $\exp$ restricted to $V$ is a diffeomorphism of $V$ with $\exp(V)$.

Proof. Suppose that $v_0 \in V$ and that any neighborhood of $v_0$ in $N^\perp$ contains points of $N^\perp$ not lying in $V$. Now, since $v_0$ is not a focal point, that is, the Jacobian of $\exp: N^\perp \to M$ is nonsingular at $v_0$, a neighborhood of $v_0$ in $N^\perp$ contains no focal points. Thus there are two sequences $u_1, u_2, \ldots$; $v_1, v_2, \ldots$ of elements of $N^\perp$ with

(a) $\lim_{j \to \infty} v_j = v_0$.

(b) $\exp(u_j) = \exp(v_j)$ for $1 \le j < \infty$, but $u_j \ne v_j$.

(c) $\|u_j\| \le \|v_j\|$ for $1 \le j < \infty$.

(d) The geodesics $t \to \exp(tu_j)$ and $t \to \exp(tv_j)$ contain no focal points.

Suppose that $u_j \in N_{p_j}^\perp$. We see that all points $p_j$ lie at a bounded distance from $p$, where $p$ is the point such that $v_0 \in N_p^\perp$. Since the metric on $M$ is complete and $N$ is a closed regularly embedded submanifold of $M$, we can assume without loss of generality that

$$\lim_{j \to \infty} p_j = q \in N, \qquad \lim_{j \to \infty} u_j = u \in N_q^\perp.$$

Then $\exp(u) = \exp(v_0)$, $\|u\| \le \|v_0\|$, which implies that $u = v_0$ by the definition of $V$. This, however, contradicts the fact that there is a neighborhood of $v_0$ in $N^\perp$ on which $\exp$ is 1-1 when restricted. This shows that $V$ is open in $N^\perp$. Now, by its definition, $\exp$ is 1-1 when restricted to $V$. Since it also has nonzero Jacobian at each point of $V$, $\exp$ is a diffeomorphism of $V$ with $\exp(V)$.

Return to the case where $N$ is the orbit $G \cdot p$ of a closed subgroup $G$ of $I(M)$. If $V \subset N^\perp$ is as described in Lemma 25.3, it should be clear that‡

$$g_* v \in V \qquad \text{for all } v \in V.$$

Hence, $g(\exp V) = \exp V$; hence the $U \subset M$ required for the theorem can be chosen as $\exp(V)$. This will, at any rate, satisfy (a) and (b).

† That is, $p \in N$ is identified with the zero element of $N_p^\perp$.

‡ If $g$ denotes both the element of $G$ and the diffeomorphism of $M$ derived from $G \subset I(M)$, $g_*$ denotes the linear extension of $g$ to tangent vectors. $g \to g_*$ defines an action of $G$ on $N^\perp$. Since $g$ sends a geodesic of $M$ into a geodesic, the actions of $G$ on $M$ and $N^\perp$ commute with the map $\exp: N^\perp \to M$.

To prove (c), let $q \in U$, and let $\sigma(t)$, $0 \le t \le 1$, be the geodesic of minimal length joining $q$ to $N$. We have $G^q \subset G^{\sigma(0)}$. For otherwise there is a $g \in G^q$ such that $g \notin G^{\sigma(0)}$; that is, $g\sigma(0) \ne \sigma(0)$, but $gq = q$. Then $\sigma$ and $g\sigma$ would be distinct geodesics of minimal length joining $q$ to $N$, contradicting that $q \in U$. But $\sigma(0) \in N = Gp$; hence $\sigma(0) = gp$ for some $g \in G$. Then one checks easily that

$$G^{\sigma(0)} = G^{gp} = g G^p g^{-1} = \operatorname{Ad} g (G^p).$$

That is, $\operatorname{Ad} g^{-1}(G^q) \subset G^p$.

To show that $U$ is dense in $M$, suppose $q \in M$ and $\sigma(t)$, $0 \le t \le 1$, is a geodesic of length $d(q, N)$ joining $q$ to $N$. Then $\sigma(t) \in U$ for $0 \le t < 1$. For otherwise there would be another geodesic $\gamma$ joining $\sigma(t_0)$ to $N$, $\gamma$ perpendicular to $N$ at $\gamma(1)$. The corner between $\sigma$ and $\gamma$ at $\sigma(t_0)$ could be cut across to give a curve of shorter length than $\sigma$ joining $q$ to $N$; contradiction. This finishes the proof of Theorem 25.2.

Remarks

(A) Theorem 25.2 may be regarded as providing a local structure theorem for a group of isometries, asserting that in the neighborhood of an orbit the action of a closed isometry group is, in a sense, built up from the action of a transitive isometry group, namely, $G$ on $Gp$, and a linear action of the isotropy subgroup, namely, $G^p$ on $N_p^\perp$.

(B) Let us say that a point $p \in M$ is a maximal point for the action of $G$ on $M$ if

$$\dim G^p \le \dim G^q \qquad \text{for all } q \in M,$$

or, equivalently,

$$\dim Gp \ge \dim Gq \qquad \text{for all } q \in M.$$

Let us say that a point $p \in M$ is a principal point for the action of $G$ on $M$ if $p$ is a maximal point, and if the number of connected components† of $G^p$ is no greater than the number of connected components of $G^q$, for any other maximal point $q \in M$. Thus Theorem 25.2 guarantees that if $p$ is a principal point or maximal point for the action of $G$, so are all points of $U$. In particular, the set of all principal points and the set of all maximal points are both open and dense in $M$. In general, if $p \in M$, $g \in G^p$, $g_*$ maps $N_p^\perp$ into $N_p^\perp$. This defines a homomorphism of $G^p$ into the group of linear transformations on $N_p^\perp$. (This is the linear action referred to in (A) above.) Notice that:

If $p$ is a principal point for the action of $G$ on $M$, then for each $g \in G^p$, $g_*: N_p^\perp \to N_p^\perp$ is the identity map.

To prove this remark, suppose $u \in N_p^\perp$. We may suppose that $\|u\|$ is sufficiently small so that $\exp(tu) \in U$ for $0 \le t \le 1$. Thus $G^{\exp(u)} \subset G^p$. Since $p$ is a principal point, $G^{\exp(u)} = G^p$; hence $g \exp(u) = \exp(u)$. But $g \exp(u) = \exp(g_*(u))$; hence $\exp(u) = \exp(g_*(u))$, forcing $u = g_*(u)$.

Hence the action of $G$ in a neighborhood of a principal orbit is "trivial" in the sense that a neighborhood is the product of the orbit by a cell of Euclidean space, and the group action on the Euclidean cell is trivial, that is, every element of the group acts as the identity. Another way of putting this is to say that, at the principal points, the structure of isometry groups is just that determined by the extreme types, namely, the transitive groups and the trivial groups. Now we would like to get some idea of the structure of the action of $G$ at points that do not lie on principal orbits. Let us say that two points $p, q \in M$ lie in the same orbit class if the isotropy subgroups $G^p$ and $G^q$ are conjugate within $G$.

† Recall that $G^p$ is a compact topological group; hence, as a topological space, it has only a finite number of connected components. As in any topological group, the component containing the identity element is an invariant subgroup of $G^p$.

THEOREM 25.4 Let $G$ be a closed group of isometries of a complete Riemannian manifold $M$. Let $C$ be a compact subset of $M$. Then there are only a finite number of orbit classes among the points of $C$.

Proof. Proceed by induction on $\dim M$. If it is zero dimensional, $G$ must be a finite group; hence the statement is obvious. Suppose it is not true for $M$, but is true for all manifolds of lower dimension. Let $(p_j)$, $1 \le j < \infty$, be a sequence of points of $C$ such that the isotropy subgroups $G^{p_j}$ are all nonconjugate within $G$. Since $C$ is compact, we can suppose that $\lim_{j \to \infty} p_j = p$. Let $N = Gp$, the orbit of $G$ at $p$, and let $N^\perp$ be the normal tangent vector bundle to $N$. Define

$$S = \{ v \in N^\perp : \|v\| = 1 \}.$$

$S$ is a manifold of one less dimension than $M$. $G$ acts on $S$: For a given $g \in G$ gives, by definition, a diffeomorphism of $M$. Its differential $g_*$ is a diffeomorphism of $T(M)$, and the correspondence $g \to g_*$ defines an action of $G$ on $T(M)$. (Exercise: Prove this.) $S$ is a submanifold of $T(M)$ and clearly each $g_*$ maps $S$ into itself; hence it defines an action of $G$ on $S$ as a transformation group. We want to apply our induction hypothesis to this action. To do this we must know that $S$ admits a Riemannian metric having the property that $G$ acts as a group of isometries. This will follow from a lemma.

LEMMA 25.5 Let $M$ be a Riemannian manifold. Then $T(M)$ admits a Riemannian metric having the property that the group of isometries of $M$, when extended to an action on $T(M)$, acts as an isometry group on $T(M)$.

We leave the proof of this lemma as an exercise. At any rate, we suppose that there are only a finite number of orbit classes of the action of $G$ on $S$. Suppose that $q$ is a point of $M$ that is close to $p$, that is, so that $q = \exp u$, for some $u \in V$, where $V$ is the subset of $N^\perp$ that is described in Lemma 25.3. Note that

$$G^q = G^{u/\|u\|},$$

where the right-hand side is the isotropy group of the action of $G$ on $S$. For if $g_*(u) = u$, then

$$gq = g \exp(u) = \exp(g_*(u)) = \exp(u) = q.$$

That is, $G^{u/\|u\|} \subset G^q$. If $gq = q$, then $g_*(u) = u$, for otherwise $\exp(u) = \exp(g_*(u))$ with $g_*(u) \ne u$, contradicting the definition of $V$, whence $G^q \subset G^{u/\|u\|}$. Hence there are only a finite number of orbit classes among the orbits of points near $p$, which contradicts the fact that $\lim_{j \to \infty} p_j = p$, and that the orbit classes among the $(p_j)$ are distinct.

In studying the distribution of the various orbits of $G$, it is convenient to consider the set of orbits as itself forming a space.

Definition

Let $G$ be a closed group of isometries of a Riemannian space $M$. The orbit space of the action of $G$ on $M$ is a space, denoted by $G \backslash M$, abstractly constructed as follows: A point of $G \backslash M$ is an orbit of the action of $G$ on $M$. $G \backslash M$ is made into a metric space as follows: For $p, q \in M$, the "distance" between the orbits $Gp$ and $Gq$ is just the minimal distance $d(Gp, Gq)$ between the subsets, defined as usual by using the given Riemannian metric on $M$. Define the projection mapping $\phi: M \to G \backslash M$ by assigning $\phi(p) = Gp$ to each $p \in M$.
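A small example, not taken from the text, may help fix these definitions: the rotation group acting on the Euclidean plane. The identification of the orbit space with a half-line below is an illustration under the stated choice of $M$ and $G$.

```latex
% Illustration (assumed example): M = R^2 with the Euclidean metric,
% G = SO(2) acting by rotations about the origin.
\begin{align*}
Gp &= \{ x \in R^2 : \|x\| = \|p\| \}
   && \text{(a circle, or the single point } 0 \text{ if } p = 0\text{)} , \\
G^p &= \{ e \} \ \text{ for } p \ne 0 , \qquad G^0 = G
   && \text{(isotropy subgroups)} , \\
d(Gp, Gq) &= \bigl| \, \|p\| - \|q\| \, \bigr|
   && \text{(minimal distance between concentric circles)} .
\end{align*}
% Hence Gp -> \|p\| identifies G \backslash M isometrically with the half-line
% [0, \infty). There are exactly two orbit classes; the principal points are
% the points p \ne 0, which form an open dense subset of M.
```

In this example the projection $\phi$ is $p \to \|p\|$, and the image of the principal points is the open half-line $(0, \infty)$, a one-dimensional manifold, while the orbit of the origin is the boundary point $0$.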



THEOREM 25.6 The orbit space $G \backslash M$ with the distance function defined as above is a well-defined metric space. The projection mapping $\phi: M \to G \backslash M$ defined above is an open, continuous mapping. Let $M^0$ be the set of $p \in M$ such that $Gp$ is a principal orbit of $G$, and let $(G \backslash M)^0$ be the set of principal orbits, that is, the image under $\phi$ of $M^0$. Then $(G \backslash M)^0$ can be made into a manifold so that $\phi: M^0 \to (G \backslash M)^0$ is a maximal rank mapping, in fact a principal fiber bundle with structure group $G$.

Proof. Given $p \in M$, one can choose a point $q$ on any given orbit of $G$ so that

$$d(p, q) = d(Gp, Gq)$$

(since $M$ is complete and the orbits of $G$ are all closed in $M$). This fact suffices to show that the metric space axioms are satisfied for $G \backslash M$:

$$d(Gp, Gq) = 0 \Rightarrow d(p, q) = 0 \Rightarrow p = q \Rightarrow Gp = Gq.$$

$$d(Gp, Gq) = d(p, q) = d(q, p) = d(Gq, Gp).$$

To show the triangle inequality, that is, $d(Gp, Gr) \le d(Gp, Gq) + d(Gq, Gr)$, choose $q$ and $r$ on their respective orbits so that

$$d(Gp, Gq) = d(p, q), \qquad d(Gq, Gr) = d(q, r).$$

Then $d(Gp, Gr) \le d(p, r) \le d(p, q) + d(q, r)$, whence the triangle inequality.

Continuity of $\phi$ also follows easily from this property of the orbits. Suppose that $U$ is an open subset of $M$, that $p \in U$, and that $d(Gq, Gp)$ is sufficiently small. As remarked above, we can choose $q$ so that $d(Gq, Gp) = d(p, q)$. Hence $q \in U$ if $d(Gq, Gp)$ is sufficiently small, and $\phi(U)$ is open in $G \backslash M$. There is another way of stating this result: $\phi^{-1}(\phi(U))$ is open. But $\phi^{-1}(\phi(U))$ is the saturation of $U$ with respect to $G$, that is, the union of all orbits of $G$ that touch $U$.

Turn to the last statement of the theorem. We have seen that $M^0$ is open in $M$ (Theorem 25.2). Let $p \in M^0$ and $N = Gp$. If $V$ is the subset of $N^\perp$ defined in Lemma 25.3, we have seen that $\exp(V)$ is an open subset of $M^0$, and that $\exp(V \cap M_p)$ intersects each orbit of $G$ only once. Thus $V \cap M_p \to \phi \exp(V \cap M_p)$ provides a homeomorphism of an open subset of $(G \backslash M)^0$ with an open subset of a Euclidean space. It is readily verified that these homeomorphisms combine in the right way to provide a manifold structure for $(G \backslash M)^0$, which has almost by definition the property that $\phi$ is a maximal rank mapping.


The fact that $\phi: M^0 \to \phi(M^0)$ is a principal fiber bundle mapping follows now at once from the definition of principal fiber bundle, which goes as follows:

Definition

Let $E$ and $B$ be topological spaces, let $\phi: E \to B$ be a continuous mapping, and let $G$ be a topological group that acts on $E$. This setup is said to define a principal fiber bundle with structure group $G$ and projection map $\phi$, denoted by $(E, B, \phi, G)$, if each point $b \in B$ has a neighborhood $U \subset B$ and a mapping $\psi: U \times G \to E$, such that:

(a) $\psi(U \times G)$ is an open subset of $E$, and $\psi$ is a homeomorphism with this subset.

(b) For $b' \in U$, $g, g' \in G$, $g\psi(b', g') = \psi(b', gg')$.

Roughly, we may say that a principal fiber bundle is determined by the action of a topological group $G$ on a space $E$ so that (i) $G$ acts simply on $E$, that is, $g \in G$, $e \in E$, $ge = e$ implies $g$ = identity; (ii) each point of $E$ has a neighborhood invariant under $G$ which is isomorphic to the simplest type of action of $G$, namely, $G$ acting on a product $U \times G$ by leaving each element of $U$ fixed and acting on $G$ by left translations.

Return to the case where $G$ is a closed group of isometries acting on a complete Riemannian manifold $M$. Now the principal orbits of $G$ are dense in $G \backslash M$ and have a manifold structure. This fact, together with some computations of orbit classes in special cases, suggests that an orbit space be regarded as a sort of "generalized manifold," with the distance function on it that we have used above to define a "generalized Riemannian structure." We proceed then with a geometric study of the orbit space $G \backslash M$, based on the fact that it is a metric space in the natural way described above.

LEMMA 25.7 Let $G$ be a Lie group of isometries of a Riemannian manifold $M$. Let $\sigma: [0, 1] \to M$ be a geodesic of $M$ that is perpendicular to one orbit of $G$, say, to $G\sigma(0)$. Then $\sigma'(t)$ is perpendicular to $G\sigma(t)$ for $0 \le t \le 1$.

Proof. Let $g(s)$, $0 \le s \le 1$, be a curve in $G$, with $g(0)$ = the identity element. Let $\delta(s,t) = g(s)\sigma(t)$, $0 \le s, t \le 1$. $\delta$ is a homotopy in $M$ having the property that, for fixed $s$, the curve $t \to \delta(s,t)$ is a geodesic whose length is equal to the length of $\sigma$ (since each transformation $g(s)$ is an isometry of $M$ and hence maps geodesics of $M$ into geodesics). Let $v(t) = \partial_s \delta(0,t)$; that is, $t \to v(t)$ is the vector field on $\sigma$ representing the infinitesimal deformation. The first variation formula implies that

$$\langle v(t), \sigma'(t)\rangle = \langle v(0), \sigma'(0)\rangle \quad \text{for } 0 \le t \le 1.$$

Now $v(0) \in (G\sigma(0))_{\sigma(0)}$; hence $\langle v(0), \sigma'(0)\rangle = 0$, implying that $\langle v(t), \sigma'(t)\rangle = 0$. But, as $g$ varies over all such curves in $G$, $v(t)$ fills up the tangent space to the orbit of $G$ at $\sigma(t)$; hence $\sigma'(t)$ is perpendicular to the orbit as required.

Say that a geodesic of $M$ is transversal to the action of $G$ if it is perpendicular to each orbit of $G$ that it touches. Lemma 25.7 then asserts that there is a plentiful supply of such transversal geodesics. Let $\sigma(t)$, $0 \le t \le 1$, be one of them. We say that a Jacobi vector field $t \to v(t)$ along $\sigma$ is transversal to the action of $G$ if there is a geodesic deformation $\delta(s,t)$ of $\sigma$ whose infinitesimal deformation vector field is $v$, that is, so that $v(t) = \partial_s \delta(0,t)$. A curve of $G\backslash M$ is a geodesic of $G\backslash M$ if it is equal to the projection under the projection map $M \to G\backslash M$ of a geodesic of $M$ that is transversal to the action of $G$. (The justification for this simplification is that it can be shown that these are precisely the curves that locally minimize arc length, as arc length is defined in the general theory of metric spaces.)
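A minimal example shows what transversal geodesics and geodesics of the orbit space look like. It assumes the standard rotation action on the Euclidean plane, which is an illustration and not taken from the text.

```latex
% Assumed setting: M = R^2 with the flat metric, G = SO(2) acting by rotations.
% The orbits are the circles |x| = r and the fixed point 0.
\[
  \sigma(t) = (1 + t)\,e, \qquad |e| = 1, \quad 0 \le t \le 1 .
\]
% The tangent space to the orbit through \sigma(t) is spanned by J e,
% where J is rotation by 90 degrees, and
\[
  \langle \sigma'(t), J e \rangle = \langle e, J e \rangle = 0 ,
\]
% so \sigma is perpendicular to every orbit it meets, as Lemma 25.7 predicts.
% The orbit space is G\M = [0, \infty), with distance
\[
  d(Gp, Gq) = \bigl|\, |p| - |q| \,\bigr| ,
\]
% and \sigma projects onto the segment t -> 1 + t, which realizes this distance.
```

A straight line that misses the origin is tangent to one orbit at its closest point to the origin, so it is not transversal; only the radial lines are.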

THEOREM 25.8 Let $G$ be a closed group of isometries of a complete Riemannian manifold $M$. Let $L$ be a closed subgroup of $G$, and put $M(L) = \{p \in M : G^p \text{ is conjugate within } G \text{ to } L\}$. Then $M(L)$ and $G\backslash M(L)$ are manifolds and the projection map $M(L) \to G\backslash M(L)$ is a fiber space map. Further, $G\backslash M(L)$ is a totally geodesic subset of $G\backslash M$.

Proof. Obviously, if two points of $M$ lie on the same orbit of $G$, then the isotropy subgroups of these two points are conjugate in $G$. Hence $M(L)$ contains the entire orbit of any point it touches. By $G\backslash M(L)$ we mean the subset of the orbit space consisting of those orbits that lie in $M(L)$.

Let $p \in M$, let $N$ be the orbit of $G$ at $p$, and let $U$ be the neighborhood of $N$ having the properties listed in Theorem 25.2. We have seen that, for any $q \in U$, $G^q$ is conjugate to a subgroup of $G^p$. Suppose that $G^p = L$, that is, $p \in M(L)$. Any $q \in U$ can then be transformed by an element of $G$ so that $G^q \subset G^p$, and so that the unique geodesic of shortest length joining $q$ to $N$ ends at $p$. Suppose $t \to \exp(tv)$, $0 \le t \le 1$, $v \in N_p^\perp$, is this geodesic. Then $q \in M(L)$ if and only if $G^q = G^p$. This fact and the geometric properties of $U$ force the following conclusion:

$G^q = G^p$ if and only if $g_*(v) = v$ for all $g \in G^p$.

This suggests the following way to parametrize points of $M(L)$ close to $p$. Define $N_p^\perp(L) = \{u \in N_p^\perp : g_*(u) = u \text{ for all } g \in L\}$. There is a map $G/L \times N_p^\perp(L) \to M$ defined as follows: For $g \in G$, $u \in N_p^\perp(L)$, map $(g,u)$ into $\exp(g_*(u))$. Since $\exp((g\ell)_*(u)) = \exp(g_*(\ell_*(u))) = \exp(g_*(u))$ for $\ell \in L$, this map passes to the quotient to define the desired map of $G/L \times N_p^\perp(L)$. ($G/L$ denotes the space of left cosets of $L$ in $G$, considered, of course, as a homogeneous space of $G$.) If we restrict to the product of $G/L$ with a sufficiently small neighborhood of 0 in $N_p^\perp(L)$, by our above remarks this map is a homeomorphism with a neighborhood of $p$ in $M(L)$. Similarly, mapping this neighborhood of zero in $N_p^\perp(L)$ into $M(L)$, then projecting on $G\backslash M(L)$, defines a homeomorphism with a neighborhood of the orbit $Gp$. If we use these homeomorphisms to define manifold structures for $M(L)$ and $G\backslash M(L)$ (it is left to the reader to check that the manifold axioms are fulfilled), then the fact that the map $M(L) \to G\backslash M(L)$ is a fiber space map follows more or less by definition.

Turn to the totally geodesic statement. We must make precise what is meant by "totally geodesic" in $G\backslash M$, since it is not quite a Riemannian manifold. Now, of all the equivalent defining properties of a totally geodesic submanifold of a Riemannian manifold, one is adapted to generalize to a metric space: We say that a subset $A$ of a metric space $Q$ is totally geodesic if, given $q \in A$, there is a neighborhood $U$ of $q$ in $Q$ such that every geodesic curve of $Q$ starting at $q$, which lies in $U$ and ends on $A$, must lie completely in $A$. In the case $Q = G\backslash M$, with the fact that geodesics in $Q$ are projections of $G$-transversal geodesics in $M$, we see that the neighborhood constructed above of the orbit whose isotropy group is $L$ has precisely this property; hence $G\backslash M(L)$ is totally geodesic in $G\backslash M$. Q.E.D.
The next question is: Given a subgroup $L \subset G$, does the metric space structure on $G\backslash M$, restricted to $G\backslash M(L)$, arise from a Riemannian metric on $G\backslash M(L)$, and how can this Riemannian metric be computed in terms of the given Riemannian metric on $M$?

THEOREM 25.9 Let $M$ be a complete Riemannian manifold, and let $L$ be a group of isometries of $M$. Let $F(L)$ be the fixed point set of $L$; that is, $F(L) = \{p \in M : gp = p \text{ for all } g \in L\}$. Then $F(L)$ is a closed, regularly embedded totally geodesic submanifold of $M$. In addition, for $p \in F(L)$, the tangent space to $F(L)$ at $p$, namely $F(L)_p$, is precisely $\{v \in M_p : g_*(v) = v \text{ for all } g \in L\}$.
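A small example may make the statement of Theorem 25.9 concrete. The manifold and the group below are assumptions chosen for illustration.

```latex
% Assumed example: M = S^2 \subset R^3 with the round metric,
% L = {1, g}, where g(x, y, z) = (x, y, -z) is reflection in the plane z = 0.
\[
  F(L) = \{ (x, y, 0) : x^2 + y^2 = 1 \} \quad \text{(the equator)} .
\]
% At p = (1, 0, 0) the tangent space is M_p = span{ e_2, e_3 }, and
% g_* e_2 = e_2, g_* e_3 = -e_3, so
\[
  \{ v \in M_p : g_*(v) = v \} = \operatorname{span}\{ e_2 \} = F(L)_p .
\]
% The equator is a great circle, hence a closed totally geodesic submanifold.
```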



Proof. Let $p, q \in F(L)$. Let $\sigma(t)$, $0 \le t \le 1$, be the geodesic joining $p$ to $q$, and let $g \in L$. If $p$ and $q$ are sufficiently close together (for example, if $q$ lies in a convex ball about $p$), then $g\sigma(t) = \sigma(t)$ for $0 \le t \le 1$. (Otherwise there would be two small geodesics joining $p$ to $q$.) If $v = \sigma'(0)$, then $g_*(v) = v$ and $q = \exp(v)$. This shows that the intersection of $F(L)$ and a sufficiently small neighborhood of $p$ is contained in $\exp(\{v \in M_p : g_*(v) = v \text{ for all } g \in L\})$. This provides coordinate systems to make $F(L)$ into a manifold. The argument also shows that it is a totally geodesic submanifold, since a sufficiently small geodesic of $M$ whose end points lie on $F(L)$ must then lie completely in $F(L)$.

THEOREM 25.10 Let $G$ be a closed group of isometries of a complete Riemannian manifold $M$. Let $L$ be a compact subgroup of $G$, and let $N(L,G)$ be the normalizer of $L$ in $G$; that is,

$$N(L,G) = \{g \in G : gLg^{-1} \subset L\}.$$

Let $F^0(L) = \{p \in M : G^p = L\}$. Then $F^0(L)$ is an open submanifold of $F(L)$, which is left invariant by the action of $N(L,G)$, and $N(L,G)\backslash F^0(L)$ is isomorphic to $G\backslash M(L)$. All orbits of $N(L,G)$ on $F^0(L)$ are principal.

Proof. To show that $F^0(L)$ is open in $F(L)$, suppose $p \in F^0(L)$ and that $q \in F(L)$ is sufficiently close to $p$. Then $G^p = L \subset G^q$. We have seen that $G^q$ is conjugate to a subgroup of $G^p$. This, however, is possible (since they are compact) if and only if $G^q = L$, that is, $q \in F^0(L)$.

Suppose now that $p \in F^0(L)$, $g \in G$, and $gp \in F^0(L)$. Then $G^p = L = G^{gp}$. But $G^{gp} = gG^pg^{-1} = gLg^{-1}$. (Proof: Let $h \in G^{gp}$. Then $hgp = gp$, or $g^{-1}hgp = p$, or $g^{-1}hg \in G^p$, or $h \in gG^pg^{-1}$.) Thus $g$ must belong to $N(L,G)$.

Let $p \in F^0(L)$. The orbit of $p$ under $N(L,G)$ must be contained in $M(L)$. Thus we obtain a map $N(L,G)\backslash F^0(L) \to G\backslash M(L)$, which is obviously onto. Let us show that it is 1-1: Suppose, then, that points $p, q \in F^0(L)$ lie on the same orbit of $G$. The argument in the last paragraph shows that $p$ and $q$ must lie on the same orbit of $N(L,G)$.



The isotropy subgroup of $N(L,G)$ at each point of $F^0(L)$ is obviously $N(L,G) \cap L$; hence all orbits are principal and the orbit space $N(L,G)\backslash F^0(L)$ is a manifold. That the mapping

$$N(L,G)\backslash F^0(L) \to G\backslash M(L)$$

is differentiable is seen by referring back to the way in which the manifold structures were defined. (Details are left to the reader.) To complete the proof, we shall show that this map is an isometry of Riemannian manifolds. Let $p$ and $q$ be points of $F^0(L)$ that are close together, and let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of minimal length joining $Gp$ to $Gq$. After translating by $G$, we can suppose that $\sigma(0) = p$. Now length $\sigma \le d(p,q)$. Hence, if $d(p,q)$ is sufficiently small, we have

$$G^{\sigma(1)} \subset G^p = L.$$

But $\sigma(1) = gq$, for some $g \in G$; hence $G^{\sigma(1)} = L$, hence $\sigma(1) \in F^0(L)$, and $g \in N(L,G)$. Thus

$$d(Gp, Gq) \ge d(N(L,G)p, N(L,G)q).$$

The reverse inequality is obvious. Q.E.D.

We change direction now to treat the infinitesimal properties of groups of isometries. First we must describe the Lie algebra of a Lie group.

Definition. Let $G$ be a Lie group. Its Lie algebra, usually denoted by $\mathbf{G}$, is defined as follows:
(a) An element of $\mathbf{G}$ is a one-parameter subgroup of $G$, that is, a mapping $g: R \to G$ such that $g(t + s) = g(t) \cdot g(s)$ for $-\infty < t, s < \infty$.
(b) $\mathbf{G}$ is defined as a real Lie algebra (in the abstract sense) as follows: If $g_1$ and $g_2$ are one-parameter subgroups, that is, elements of $\mathbf{G}$, then the sum $g_1 + g_2$ and the Jacobi bracket $[g_1, g_2]$ are the one-parameter subgroups $g_3$ and $g_4$ obtained as certain limits of products of values of $g_1$ and $g_2$.

It is shown in treatises on the theory of Lie groups that these limits exist, define one-parameter subgroups, and that these operations satisfy the algebraic conditions necessary to show that $\mathbf{G}$ is well defined as a Lie algebra.

For our purposes, it is most important to see what the Lie algebra means in terms of transformation groups. Suppose, then, that $G$ acts as a transformation group (in a $C^\infty$ way) on a manifold $M$. Each one-parameter subgroup $t \to g(t)$ in $G$ then acts on $M$. We have seen that it has a unique vector field $X$ as an infinitesimal generator; that is, each orbit $t \to g(t)p$ of $g$ is an integral curve of $X$. It is most important to realize that this mapping $\mathbf{G} \to V(M)$ is a Lie algebra homomorphism; that is, that the sum and bracket of two one-parameter subgroups have as infinitesimal generators the sum and Jacobi bracket of their infinitesimal generators. (Again we assume this as a basic fact in the theory of Lie groups.) This homomorphism $\mathbf{G} \to V(M)$ may be described as the infinitesimal version of the action of $G$ on $M$.

Return to the case of a Riemannian manifold.
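For reference, the sum and bracket of one-parameter subgroups in the definition of the Lie algebra are usually expressed by limit formulas of the following kind. They are stated here as an assumption about which limits are meant, in the familiar form valid for matrix groups; the sign of the bracket depends on convention.

```latex
% g_1, g_2 : one-parameter subgroups of G; limits are taken as n -> \infty.
\[
  (g_1 + g_2)(t) = \lim_{n \to \infty}
     \Bigl( g_1\bigl(\tfrac{t}{n}\bigr)\, g_2\bigl(\tfrac{t}{n}\bigr) \Bigr)^{n},
\]
\[
  [g_1, g_2](t) = \lim_{n \to \infty}
     \Bigl( g_1\bigl(\tfrac{\sqrt{t}}{n}\bigr)\, g_2\bigl(\tfrac{\sqrt{t}}{n}\bigr)\,
            g_1\bigl(-\tfrac{\sqrt{t}}{n}\bigr)\, g_2\bigl(-\tfrac{\sqrt{t}}{n}\bigr)
     \Bigr)^{n^2}, \qquad t \ge 0 .
\]
% For matrix groups, with g_i(t) = exp(t A_i), these limits equal
% exp(t (A_1 + A_2)) and exp(t (A_1 A_2 - A_2 A_1)) respectively.
```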

THEOREM 25.11 Let $X$ be the infinitesimal generator of a one-parameter group of transformations of a Riemannian manifold $M$. Then this is a one-parameter group of isometries if and only if

$$\langle \nabla_Y X, Z\rangle + \langle \nabla_Z X, Y\rangle = 0 \quad \text{for all } Y, Z \in V(M). \tag{25.1}$$

These conditions amount to a system of differential equations for the vector field $X$. These equations are called the Killing equations, and the solutions are called Killing vector fields.
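The Killing equations (25.1) can be checked by hand in the simplest case. The sketch below assumes the flat Euclidean plane, where the covariant derivative is ordinary differentiation of components; the example is not from the text.

```latex
% Assumed example: M = R^2 with the flat metric, so \nabla_Y X is the ordinary
% directional derivative of the components of X in the direction Y.
\[
  X = -y\,\frac{\partial}{\partial x} + x\,\frac{\partial}{\partial y},
  \qquad
  \nabla_Y X = A\,Y, \quad
  A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
% A is skew-symmetric, so for all Y, Z:
\[
  \langle \nabla_Y X, Z \rangle + \langle \nabla_Z X, Y \rangle
  = \langle A Y, Z \rangle + \langle A Z, Y \rangle
  = \langle (A + A^{T}) Y, Z \rangle = 0 .
\]
% By contrast, the dilation field X = x d/dx + y d/dy has \nabla_Y X = Y and
% gives 2 <Y, Z>, which is not identically zero: dilations are not isometries.
```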

Proof. Let $t \to g(t)$ be the one-parameter transformation group generated by $X$. If $Y$ is a vector field, let $Y_t$ be the vector field $p \to g(t)_*(Y(g(-t)p))$. The following relation is easily derived:

$$\frac{\partial}{\partial t} Y_t \Big|_{t=0} = -[X, Y].$$

Expressing the fact that $g(t)$ is a group of isometries, we have

$$\langle Y_t, Z_t\rangle(p) = \langle Y, Z\rangle(g(-t)p).$$

Taking $\partial/\partial t$ and setting $t = 0$, we have

$$X(\langle Y, Z\rangle) = \langle [X,Y], Z\rangle + \langle Y, [X,Z]\rangle \quad \text{for all } Y, Z \in V(M). \tag{25.2}$$

(Although we shall not pursue the point, the equations are those guaranteeing that the Lie derivative by $X$ of the metric tensor is zero.) But

$$X(\langle Y, Z\rangle) = \langle \nabla_X Y, Z\rangle + \langle Y, \nabla_X Z\rangle,$$

and, since the torsion is zero, $[X,Y] = \nabla_X Y - \nabla_Y X$ and $[X,Z] = \nabla_X Z - \nabla_Z X$. Substituting into (25.2) gives (25.1).

$\ldots = 0$. If $X(p) = 0$, we are finished, since we count a constant integral curve of $X$ as a geodesic, even if a degenerate one. Suppose $X(p) \ne 0$. Let $\sigma(s)$, $0 \le s \le 1$, be the geodesic of $M$, with $\sigma(0) = p$, $\sigma'(0) = X(p)$. Let $v(s) = X(\sigma(s))$, that is, $v$ is $X$ restricted to $\sigma$. A general property is: $v$ is a Jacobi vector field on $\sigma$. This is easiest to see if $X$ generates a global one-parameter group of isometries $t \to \phi_t$. Then $s \to \phi_t(\sigma(s))$ is also a geodesic; hence $\delta(s,t) = \phi_t(\sigma(s))$ is a geodesic deformation of $\sigma$ whose infinitesimal deformation field is precisely $v$. That $v$ is Jacobi follows from our earlier work. In the case where $X$ does not generate a global isometry group, a slight variant of this argument may be used: $\delta(s,t)$ may still be defined, for $t$ sufficiently small, so that for each $s$, $t \to \delta(s,t)$ is an integral curve of $X$. Again we must prove that $s \to \delta(s,t)$ is geodesic. This can be done, for example, by reducing $X$ to canonical form, as in the proof of Theorem 6.3. But

$$\nabla v(0) = \nabla_{\sigma'(0)} X = \nabla_X X(p).$$

Thus $p$ is a critical point if $\nabla v(0) = 0$, that is, if and only if (by the uniqueness of Jacobi fields) $X(\sigma(s)) = v(s) = \sigma'(s)$ for $0 \le s \le 1$. This is precisely the desired conclusion.

A detailed study of the Killing equations as differential equations is not possible here, but we shall do a few items of this nature as illustrations.

LEMMA 25.13 Let $X$ be a Killing vector field on $M$; $p$ a point such that $X(p) = 0 = \nabla_u X$ for all $u \in M_p$. Then $X$ is identically zero on $M$.

Proof. Let $\sigma(s)$, $0 \le s \le 1$, be a geodesic of $M$ beginning at $p$. We have seen that $X$ restricted to $\sigma$ is a Jacobi vector field. The initial value of this Jacobi field and its first covariant derivative must vanish; hence $X$ restricted to $\sigma$ must vanish identically, hence $X$ is zero in a neighborhood of $p$. Thus the set of all points where $X$ is zero is both open and closed; hence, equals $M$, since $M$ is connected. Q.E.D.
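Lemma 25.13 says that a Killing field is determined by its value and first covariant derivative at one point. In flat Euclidean space this can be seen directly. The sketch below assumes the standard fact that the Killing fields of $R^n$ are the infinitesimal rigid motions.

```latex
% Assumed setting: M = R^n with the flat metric.
\[
  X(x) = a + A x, \qquad a \in \mathbf{R}^n, \quad A^{T} = -A .
\]
% Value and covariant derivative at the point p = 0:
\[
  X(0) = a, \qquad \nabla_u X = A u \quad (u \in M_0).
\]
% If X(0) = 0 and \nabla_u X = 0 for all u, then a = 0 and A = 0, so X = 0,
% in agreement with Lemma 25.13. Counting parameters:
\[
  \dim\{(a, A)\} = n + \frac{n(n-1)}{2} = \frac{n(n+1)}{2}.
\]
```

The same count, $n$ for the value at $p$ plus $n(n-1)/2$ for a skew-symmetric derivative, is the largest possible on any $n$-dimensional Riemannian manifold.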



THEOREM 25.14 Let $M$ be a Riemannian manifold, and let $I(M)$ denote the set of Killing vector fields on $M$. Then:
(a) $I(M)$ is a Lie subalgebra of $V(M)$.
(b) If $n = \dim M$, then $\dim I(M) \le n(n+1)/2$.
(c) If $X \in I(M)$, $Y, Z \in V(M)$, then

$$\nabla_Y \nabla_Z X = R(Y,X)(Z) + \tfrac{1}{2}\nabla_{[Y,Z]}X.$$

(d) For $X, Y \in I(M)$, $Z \in V(M)$,

$$R(X,Y)(Z) = \nabla_{[X,Y]}Z - \tfrac{1}{2}\nabla_{[Z,X]}Y + \tfrac{1}{2}\nabla_{[Z,Y]}X + [Z,[X,Y]].$$

Proof. The straightforward computation needed to prove (a) is left to the reader. We now prove (b): For $p \in M$, let $I(M)^p = \{X \in I(M) : X(p) = 0\}$. Since $I(M)^p$ is the kernel of the restriction map $X \to X(p)$, it suffices to show that

$$\dim I(M)^p \le \frac{n(n+1)}{2} - n = \frac{n(n-1)}{2}.$$

Now each $X \in I(M)^p$ has, by definition, a singular point at $p$. We can then define the linear approximation to $X$ at $p$, $l_X$, as a linear transformation $M_p \to M_p$ as follows: $l_X(u) = [Y,X](p)$, where $Y \in V(M)$ is such that $Y(p) = u$. For $w \in M_p$,

$$\langle l_X(u), w\rangle = \langle [Y,X], w\rangle = \langle \nabla_Y X, w\rangle - \langle \nabla_X Y, w\rangle = \langle \nabla_u X, w\rangle - \langle \nabla_{X(p)} Y, w\rangle = \langle \nabla_u X, w\rangle,$$

since $X(p) = 0$. We conclude:

$l_X(u) = \nabla_u X$ for $u \in M_p$.

$X$ is identically zero if and only if $l_X = 0$ (using Lemma 25.13).

$l_X$ is a skew-symmetric (with respect to the form $\langle\ ,\ \rangle$) linear transformation of $M_p$, by the Killing equations (25.1).

Thus $\dim I(M)^p \le$ dimension of the space of skew-symmetric $n \times n$ real matrices $= n(n-1)/2$.

Turn to the proof of (c). If $\sigma$ is a geodesic of $M$, we see that $X$ restricted to $\sigma$ is a Jacobi vector field. Then, writing out the Jacobi equations,

$$\nabla_{\sigma'(s)}\nabla_{\sigma'(s)}X = R(\sigma'(s), X)(\sigma'(s)).$$



Since $\sigma$ is arbitrary,

$$\nabla_Y\nabla_Y X = R(Y,X)(Y) \quad \text{for all } Y \in V(M).$$

We want to "polarize" this identity:

$$\nabla_{(Y+Z)}\nabla_{(Y+Z)}X = \nabla_Y\nabla_Y X + \nabla_Z\nabla_Z X + \nabla_Z\nabla_Y X + \nabla_Y\nabla_Z X$$
$$= R(Y+Z, X)(Y+Z) = R(Y,X)(Y) + R(Z,X)(Z) + R(Y,X)(Z) + R(Z,X)(Y).$$

Hence,

$$\nabla_Z\nabla_Y X + \nabla_Y\nabla_Z X = R(Z,X)(Y) + R(Y,X)(Z).$$
$$\nabla_Z\nabla_Y X + \nabla_Y\nabla_Z X = 2\nabla_Y\nabla_Z X + R(Z,Y)(X) + \nabla_{[Z,Y]}X$$

(using the Ricci identity for iterated covariant derivatives). Use the "cyclic permutation identity" for the curvature tensor:

$$R(Z,X)(Y) - R(Z,Y)(X) - R(Y,X)(Z) = 0.$$

Putting these together gives (c).

Proof of (d):

$$[[X,Y],Z] = \nabla_{[X,Y]}Z - \nabla_Z[X,Y] = \nabla_{[X,Y]}Z - \nabla_Z\nabla_X Y + \nabla_Z\nabla_Y X$$
$$= (\text{using (c)})\ \nabla_{[X,Y]}Z - R(Z,Y)(X) - \tfrac{1}{2}\nabla_{[Z,X]}Y + R(Z,X)(Y) + \tfrac{1}{2}\nabla_{[Z,Y]}X.$$

Use the cyclic identity for $R$:

$$R(Z,X)(Y) - R(Z,Y)(X) - R(Y,X)(Z) = 0,$$

and we get (d). Q.E.D.

In favorable cases, formula (d) may be applied to compute the curvature tensor of $M$ in terms of the Lie-algebra structure of $I(M)$. The most favorable situation is when $M$ is a symmetric homogeneous space.

Definition. Suppose that a manifold $M$ is acted on transitively by a Lie group $G$, with $K$ the isotropy subgroup of $G$ at a fixed point $p \in M$ (so that $M$ is the coset space $G/K$). Let $\mathbf{G}$ be the Lie algebra of $G$, $\mathbf{K}$ the subalgebra corresponding to $K$. $M$ is then said to be a symmetric homogeneous space (relative to the action of $G$) if there is an automorphism $\alpha$ of $\mathbf{G}$ such that

$$\mathbf{K} = \{X \in \mathbf{G} : \alpha(X) = X\}; \qquad \alpha^2 = \text{the identity}.$$

This condition can be rephrased as follows: Put

$$\mathbf{P} = \{X \in \mathbf{G} : \alpha(X) = -X\}.$$

Then,
(a) $\mathbf{G}$ is the direct sum of $\mathbf{K}$ and $\mathbf{P}$;
(b) $[\mathbf{K}, \mathbf{P}] \subset \mathbf{P}$;
(c) $[\mathbf{P}, \mathbf{P}] \subset \mathbf{K}$.

Conversely, if a subspace $\mathbf{P}$ of $\mathbf{G}$ exists satisfying these three conditions, $\alpha$ can be defined as the identity on $\mathbf{K}$, and minus the identity on $\mathbf{P}$, so that the symmetric condition is equivalent to the existence of such a $\mathbf{P}$. We also say that a $\mathbf{K}$ satisfying this condition is a symmetric subalgebra.

The symmetric spaces are very important for differential geometry, since on the one hand they include most of the interesting "classical" spaces, such as spaces of constant curvature, projective spaces, and Grassmann varieties, and on the other hand their geometric properties can be treated with general methods that do not work so well for more complicated sorts of spaces. They form a "max-min" class of spaces, that is, they seem to be the largest class of spaces that can be treated with certain unified techniques. Of course this definition is not the most geometric one possible, but we must refer to Helgason's book [1] for details. We shall present two general theorems about them as illustrations.
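The three bracket conditions on $\mathbf{K}$ and $\mathbf{P}$ can be verified by hand in the smallest nontrivial case. The example below (the 2-sphere as a coset space of the rotation group) is an assumption chosen for illustration.

```latex
% Assumed example: G = SO(3) acting on M = S^2, K = rotations about the z-axis.
% Basis E_1, E_2, E_3 of the Lie algebra so(3) with
\[
  [E_1, E_2] = E_3, \qquad [E_2, E_3] = E_1, \qquad [E_3, E_1] = E_2 .
\]
% Take K = span{E_3}, P = span{E_1, E_2}, and define \alpha by
\[
  \alpha(E_3) = E_3, \qquad \alpha(E_1) = -E_1, \qquad \alpha(E_2) = -E_2 .
\]
% Conditions (b) and (c); condition (a) is clear from the choice of basis:
\[
  [E_3, E_1] = E_2 \in P, \quad [E_3, E_2] = -E_1 \in P, \quad
  [E_1, E_2] = E_3 \in K .
\]
% \alpha preserves brackets, for instance
% [\alpha E_1, \alpha E_2] = [E_1, E_2] = E_3 = \alpha(E_3), and \alpha^2 = 1.
```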

THEOREM 25.15 Suppose $M = G/K$ is a symmetric homogeneous space, and that $M$ is a Riemannian manifold on which $G$ acts as a group of isometries. Let $\mathbf{P}$ be the subspace of $\mathbf{G}$ satisfying (a), (b), and (c) of the definition.
(a) For $X, Y, Z \in \mathbf{P}$,

$$R(X,Y)(Z)(p) = [Z,[X,Y]](p),$$

where $p$ is the point of $M$ at which $K$ is the isotropy subgroup of $G$. Qualitatively, the curvature tensor of $M$ is determined completely by the algebraic properties of $\mathbf{G}$.
(b) The covariant derivative of the curvature tensor of $M$ is zero; that is, the curvature tensor is invariant under parallel translation.

Proof. The infinitesimal action of $G$ on $M$ defines $\mathbf{G}$ as a Lie algebra of Killing vector fields on $M$. The basic conditions we need are: The restriction map $X \to X(p)$ defines an isomorphism of $\mathbf{P}$ with $M_p$.



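For $X, Y \in \mathbf{P}$, $[X,Y](p) = 0$. Theorem 25.14 gives

$$R(X,Y)(Z) = \nabla_{[X,Y]}Z - \tfrac{1}{2}\nabla_{[Z,X]}Y + \tfrac{1}{2}\nabla_{[Z,Y]}X + [Z,[X,Y]] \quad \text{for } X, Y, Z \in \mathbf{G}. \tag{25.3}$$

If we take $X, Y, Z \in \mathbf{P}$, and restrict to $p$, we have part (a). We now prove

$$\nabla_X Y(p) = 0 \quad \text{for } X, Y \in \mathbf{P}. \tag{25.4}$$

First $\nabla_X Y(p) = (\nabla_Y X + [X,Y])(p) = \nabla_Y X(p)$. Take $Z \in \mathbf{P}$:

$$\langle \nabla_X Y, Z\rangle(p) = -\langle \nabla_Z Y, X\rangle(p) = -\langle \nabla_Y Z, X\rangle(p) = \langle \nabla_X Z, Y\rangle(p) = \langle \nabla_Z X, Y\rangle(p) = -\langle \nabla_Y X, Z\rangle(p) = -\langle \nabla_X Y, Z\rangle(p),$$

or $\langle \nabla_X Y, Z\rangle(p) = 0$; hence $\nabla_X Y(p) = 0$, since the values $Z(p)$, $Z \in \mathbf{P}$, fill up $M_p$.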


26. Deformation of Submanifolds

The further conditions necessary to prove (26.10) follow in a similar way, using the definition of iterated covariant derivatives of $R$. Q.E.D.

So far, we have been working with an arbitrary $M$. In case $M$ is special, for example, of constant curvature, results even closer to the classical theory of developable surfaces can be obtained. For example:

THEOREM 26.3 Suppose that $N$ is a submanifold of $M$, that

$$C = \{X \in V(M) : X(p) \in C_p \text{ for all } p \in N\},$$

and $\dim C_p$ is constant for $p \in N$, and that

$$R(C_p, C_p)(N_p) \subset N_p \quad \text{for all } p \in N. \tag{26.12}$$

Then the vector field system $C$ is completely integrable, and if $Q$ is one of its integral submanifolds, then the tangent spaces to $N$ are self-parallel along $Q$. If, further,

$$R(C_p, N_p)(N_p) \subset N_p, \tag{26.13}$$

then $Q$ is a totally geodesic submanifold of $M$. We leave it as an exercise to the reader to show that (26.13) is automatically satisfied if $M$ has constant curvature.

Proof. We prove that $C$ is completely integrable, that is, $[C, C] \subset C$. For $X, Y \in C$, $Z$ tangent to $N$,

$$\nabla_{[X,Y]}Z = \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z - R(X,Y)(Z),$$

which is tangent to $N$. The rest of the theorem follows routinely from our work in proving Theorems 26.1 and 26.2.

In summary, we may say that this work develops a generalization of the classical theory of developable surfaces to reasonably general Riemannian situations. The classical theory completely ignores the possibility of singularities in the developable structure, and only recently (Massey [1] and Hartman-Nirenberg [1]) has work begun on the singularities. A developable surface in 3-space with singularities may be described as a surface $S \subset R^3$ such that each point $p \in S$ satisfies either (a) or (b):
(a) $p$ is contained in a line segment lying in $S$ along which the tangent plane is self-parallel.
(b) The principal curvatures are zero at $p$, that is, the second fundamental form is zero at $p$.

(The necessary and sufficient condition for this is that $S$ be of zero Gaussian curvature.) In this form the conditions may be generalized to higher dimensional and Riemannian situations, but this goes beyond the scope of this book.

Deformation Problems for Riemannian Submanifolds

In this section we shall compare different isometric embeddings of the same Riemannian manifold $N$ into a Riemannian manifold $M$. Hence, for the sake of clarity, we explicitly label submanifold mappings, supposing that $\phi$ and $\phi'$ are two isometric embeddings of $N$ into $M$. In addition to being isometries, we shall suppose that $\phi$ and $\phi'$ satisfy the following condition, which will simplify the discussion considerably. (Note that it is automatically satisfied if $M$ is of constant curvature, which is the classical situation.)

If $p \in N$, $u, v \in N_p$, then the sectional curvatures of the planes spanned by $\phi_*(u), \phi_*(v)$ and by $\phi'_*(u), \phi'_*(v)$ are the same. (26.14)

We shall not repeat this fixed assumption throughout this section.

Let $\phi$ and $\phi'$ be two isometric embeddings of a Riemannian manifold $N$ in a Riemannian manifold $M$. Let $\phi(N)^\perp$ and $\phi'(N)^\perp$ be the normal vector bundles to the submanifolds determined by $\phi$ and $\phi'$. Then $\phi$ and $\phi'$ are rigidly related if there is a map $\psi: \phi(N)^\perp \to \phi'(N)^\perp$ such that:

For $p \in N$, $\psi$ maps the normal tangent vectors to $\phi(N)$ at $\phi(p)$ linearly onto the normal tangent vectors to $\phi'(N)$ at $\phi'(p)$, preserving the inner product on these normal vectors that is inherited from $M$. (26.15a)

For $p \in N$, $u \in \phi(N)^\perp_{\phi(p)}$, $v, w \in N_p$,

$$S_u(\phi_*(v), \phi_*(w)) = S'_{\psi(u)}(\phi'_*(v), \phi'_*(w)), \tag{26.15b}$$

where $S_{(\ )}(\ ,\ )$ and $S'_{(\ )}(\ ,\ )$ are, respectively, the second fundamental forms of $\phi(N)$ and $\phi'(N)$.

Condition (26.15a) can be summarized in the language of fiber bundles by saying that $\psi$ is a bundle isomorphism of the two normal vector bundles defined on $N$ by the isometric embeddings. It is easy to see that such isomorphisms exist locally. Whether they exist globally is a purely topological question, toward which much current research in differential topology is applicable. For a geometer, (26.15b) is the key condition. However, we must defer a further explanation of its geometric meaning. Suffice it to say that in case $M$ is a space of constant curvature (which is the classical situation), it suffices to guarantee that $\phi$ and $\phi'$ are related by an isometry $\alpha$ of $M$; that is, $\phi = \alpha\phi'$.

In favorable cases it can be proved that two isometric embeddings are rigidly related for purely algebraic reasons. The following theorem offers at least a qualitative explanation of this.

THEOREM 26.4 Let $\phi$ be an isometric embedding of $N$ in $M$. Suppose that $m = \dim M$, $n = \dim N$. Choose the following range of indices and summation conventions:

$$1 \le i, j, \ldots \le n; \qquad 1 \le a, b, \ldots \le m - n.$$

For each $p \in N$, choose a fixed orthonormal basis $(v_i)$ of $N_p$ and $(u_a)$ of $\phi(N)^\perp_{\phi(p)}$. Consider the second fundamental forms $(S_{u_a}(\ ,\ ))$ as symmetric bilinear forms on $N_p$, and let $\theta_{ia}$ be the 1-covectors, that is, 1-forms, on $N_p$, defined by

$$\theta_{ia}(v) = S_{u_a}(v, v_i) \quad \text{for } v \in N_p.$$

Suppose that:
(a) Any other set $\theta'_{ia}$ of 1-forms on $N_p$ related to the $\theta_{ia}$ as

$$\theta'_{ia} \wedge \theta'_{ja} = \theta_{ia} \wedge \theta_{ja}, \qquad \theta'_{ia}(v_j) = \theta'_{ja}(v_i) \tag{26.16}$$

must be deducible from the $\theta_{ia}$ by relations of the form

$$\theta'_{ia} = M_{ab}\,\theta_{ib}, \tag{26.17}$$

where $(M_{ab})$ is an orthogonal matrix, that is, satisfies $M_{ab}M_{cb} = \delta_{ac}$. (In other words, relations (26.16) must imply (26.17), for some choice of $M_{ab}$.)
(b) The forms $S_{u_a}(\ ,\ )$ are linearly independent.

Then any other isometric embedding of $N$ in $M$ (satisfying (26.14) as always) is rigidly related to $\phi$.

Proof. Let $\phi': N \to M$ be another isometric embedding. For $p \in N$, let $(u_a')$ be an orthonormal basis for $\phi'(N)^\perp_{\phi'(p)}$, let $S'_{u_a'}(\ ,\ )$ be the corresponding second fundamental forms, and let $\theta'_{ia}$ be 1-forms defined by $\theta'_{ia}(v) = S'_{u_a'}(v, v_i)$. That (26.16) is satisfied is implied by the fundamental relation between second fundamental forms and curvature, and the fact that (26.14) is satisfied. Let $M_{ab}$ be the orthogonal matrix satisfying (26.17). Define a linear mapping $\psi_p: \phi(N)^\perp_{\phi(p)} \to \phi'(N)^\perp_{\phi'(p)}$ by the condition

$$\psi_p(u_a) = M_{ba}\,u_b'.$$

Now it is readily verified that $\psi_p$ is independent of the bases we have used to define it; hence we may consider it as defining a global map $\psi: \phi(N)^\perp \to \phi'(N)^\perp$, satisfying (26.15a). Further, condition (26.15b) follows trivially from (26.17). Q.E.D.



Remarks. The condition that $S_{u_1}(\ ,\ ), \ldots, S_{u_{m-n}}(\ ,\ )$ be linearly independent forms forces a condition between $n$ and $m$, namely,

$$m - n \le \frac{n(n+1)}{2} \quad \text{or} \quad m \le \frac{n(n+3)}{2}.$$

(Since $n(n+1)/2$ is the maximal number of linearly independent quadratic forms on a vector space of dimension $n$, this corresponds to the intuitive fact that when $m$ is too big in relation to $n$, there is just too much freedom in the perpendicular direction to hope to get rigidity.)

Finding effective sufficient conditions for reasonably general cases that (26.17) be satisfied is a difficult algebraic problem which is beyond the scope of this book. In case $\dim M - \dim N = 1$, that is, $N$ is a hypersurface in $M$, the answer is quite simple, and was found by E. Cartan [3]. We shall follow his proof.

THEOREM 26.5 Let $\phi: N \to M$ be an isometric embedding of $N$ as a hypersurface in $M$ such that the following condition is satisfied:

For $p \in N$, the dimension of the space of characteristic vectors of the second fundamental form of $N$ at $\phi(p)$ is no greater than $n - 3$. (26.18)

Then $\phi$ is rigidly related to any other isometric embedding of $N$ in $M$.

Proof. Let $S(\ ,\ ) = S_{u_1}(\ ,\ )$, $\theta_i = \theta_{i1}$, $\theta_i' = \theta'_{i1}$, in the notation of Theorem 26.4. Let $C_p$ be the set of characteristic vectors of $S$; that is,

$$C_p = \{v \in N_p : S(v, w) = 0 \text{ for all } w \in N_p\}.$$

Notice also that

$$C_p = \{v \in N_p : \theta_i(v) = 0 \text{ for } 1 \le i \le n\}.$$

Equation (26.16) takes the form

$$\theta_i \wedge \theta_j = \theta_i' \wedge \theta_j'.$$
At most relabeling things, we can suppose that $\theta_1, \ldots, \theta_a$ are a maximal linearly independent set from among the $\theta_i$ ($a = n - \dim C_p$, since $\theta_1 = 0 = \cdots = \theta_a$ defines $C_p$). Thus the 2-covectors $\theta_i \wedge \theta_j$, $1 \le i < j \le a$, are linearly independent; hence so are the $\theta_i' \wedge \theta_j'$ for $1 \le i < j \le a$. This implies that the $\theta_1', \ldots, \theta_a'$ are linearly independent. Turning the argument around, we see that $\theta_1', \ldots, \theta_a'$ are a maximal set of linearly independent forms from the $\theta_i'$.

Now $a \ge 3$. For $v \in C_p$, $1 \le i, j \le a$,

$$0 = v \,\lrcorner\, (\theta_i \wedge \theta_j) = v \,\lrcorner\, (\theta_i' \wedge \theta_j') = \theta_i'(v)\,\theta_j' - \theta_j'(v)\,\theta_i'.$$

Since $i$ can be chosen different from $j$ (requiring only $a \ge 2$, as a matter of fact), $\theta_i'(v) = 0$. By symmetry, we see that

$$C_p = \{v \in N_p : \theta_i'(v) = 0\} = \{v \in N_p : S'(v, w) = 0 \text{ for all } w \in N_p\},$$

where $S'$ is the symmetric (by assumption (26.16)) form defined by

$$S'(v, v_i) = \theta_i'(v) \quad \text{for } v \in N_p.$$

Notice that we must prove that

$$S(\ ,\ ) = \pm S'(\ ,\ ),$$

since a one-dimensional orthogonal matrix must be $\pm 1$. Since the characteristic vectors of $S$ and $S'$ are the same, it suffices to prove this relation on $C_p^\perp$ or, what amounts to the same thing, to suppose the special case $C_p = 0$ or $a = n$. Now

$$0 = \theta_1 \wedge \theta_1 \wedge \theta_2 = \theta_1 \wedge \theta_1' \wedge \theta_2'.$$

Thus $\theta_1$, $\theta_1'$, and $\theta_2'$ must be linearly dependent, or since $\theta_1'$ and $\theta_2'$ are independent, $\theta_1$ is dependent on $\theta_1'$ and $\theta_2'$. Similarly, $\theta_1$ is dependent on $\theta_1'$ and $\theta_3'$. However, since $\theta_1'$, $\theta_2'$, $\theta_3'$ are independent (here we use $a \ge 3$), $\theta_1$ must be dependent on $\theta_1'$ alone, say $\theta_1' = a_1\theta_1$. But 1 has played no privileged role; hence

$$\theta_i' = a_i\theta_i \quad \text{for } 1 \le i \le a \quad \text{(no summation)}.$$

But then

$$\theta_i' \wedge \theta_j' = a_i a_j\,\theta_i \wedge \theta_j \quad \text{(no summation)}.$$

Hence,

$$a_i a_j = 1 \quad \text{for } 1 \le i < j \le a.$$

In particular $a_1 a_2 = a_1 a_3$; hence $a_2 = a_3$. By symmetry, $a_i = a_j$ for $1 \le i, j \le a$; hence $a_1^2 = 1$, which is what is required to finish the proof.
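Condition (26.18) of Theorem 26.5 can be tested on two standard hypersurfaces of Euclidean space. The examples are assumptions chosen for illustration, with $M = R^{n+1}$ flat so that (26.14) holds automatically.

```latex
% (1) Round sphere N = S^n(r) \subset R^{n+1}, with unit normal u:
\[
  S_u(v, w) = \pm \frac{1}{r}\,\langle v, w \rangle ,
  \qquad C_p = \{ v : S_u(v, w) = 0 \ \text{for all } w \} = 0 .
\]
% So dim C_p = 0 \le n - 3 exactly when n \ge 3: the hypothesis holds for
% spheres of dimension at least 3.
%
% (2) Cylinder N = S^1 \times R^{n-1} \subset R^{n+1}:
\[
  S_u \ \text{has rank } 1, \qquad \dim C_p = n - 1 > n - 3 .
\]
% Here (26.18) fails, and the theorem gives no rigidity statement.
```

The second case is consistent with the classical picture: a cylinder is developable and can be bent without stretching.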


Part

4

DIFFERENTIAL GEOMETRY AND THE CALCULUS OF VARIATIONS: ADDITIONAL TOPICS IN DIFFERENTIAL GEOMETRY


27 First-Order Invariants of Submanifolds and Convexity for Affinely Connected Manifolds

As in Euclidean geometry, many of the geometric properties of submanifolds of Riemannian manifolds really involve only the underlying affine connection. It is worth our while to change the point of view from that of the calculus of variations and attempt to describe the geometric invariants of submanifolds more systematically, using mainly the underlying affine connection. As a bonus, of course, the results will hold also for pseudo-Riemannian manifolds, which is of interest for applications to physics (for example, the Theory of General Relativity).

In this chapter, $M$ will be a manifold with a given affine connection, denoted by $\nabla$, and $N$ will be a submanifold. We shall assume that $\nabla$ has zero torsion tensor. Recall that $\nabla$ is defined by a bilinear mapping of $V(M) \times V(M) \to V(M)$, say, $(X, Y) \to \nabla_X Y$, satisfying

$$\nabla_X(fY) = X(f)Y + f\nabla_X Y, \tag{27.1}$$
$$\nabla_{fX}Y = f\nabla_X Y \quad \text{for } X, Y \in V(M),\ f \in F(M). \tag{27.2}$$

Recall the significance of these laws. Equation (27.2) implies that the connection depends "tensorially" on $X$, but (27.1) implies that this is not so in $Y$. However, it may be possible to consider quotient relations that kill off the first term on the right-hand side of (27.1), and hence convert the covariant derivative operation into a genuine "tensor field." We can now apply this remark.

Definition. For each point $p \in N$, define a bilinear mapping $S: N_p \times N_p \to M_p/N_p$ as follows: For $u, v$ tangent vectors to $N$ at $p$, choose vector fields $X$ and $Y$ that are tangent to $N$, that satisfy $X(p) = u$, $Y(p) = v$, and set

$S(u, v)$ = image of $\nabla_X Y(p)$ under the quotient projection map $M_p \to M_p/N_p$.

To verify that $S(\ ,\ )$ is well defined, notice that the map $(X, Y) \to \nabla_X Y(p) \to M_p/N_p$ is bilinear also when $X$ and $Y$ that are tangent to $N$ are multiplied by functions. $S(\ ,\ )$ is called the second fundamental form of $N$.

To get a real-valued bilinear form from $S$, choose any 1-covector $\omega$ at $p$ that is zero on $N_p$ and so passes to the quotient to define a linear form on $M_p/N_p$. Define:

$S_\omega(u, v) = \omega(S(u, v)) = \omega(\nabla_X Y)$, if $X$ and $Y$ are vector fields tangent to $N$ with $X(p) = u$, $Y(p) = v$.
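As a worked instance of the definition of the second fundamental form, consider a graph in flat 3-space. The setting ($M = R^3$ with the ordinary flat connection, $N$ the graph of a function $h$) is an assumption made for this sketch.

```latex
% N = { z = h(x, y) } \subset M = R^3, \nabla = componentwise differentiation.
% Vector fields on M tangent to N (extended so as not to depend on z):
\[
  X = \partial_x + h_x\,\partial_z, \qquad Y = \partial_y + h_y\,\partial_z .
\]
% Covariant derivative: differentiate the components of Y along X.
\[
  \nabla_X Y = X(h_y)\,\partial_z = h_{xy}\,\partial_z .
\]
% A 1-covector vanishing on N_p:
\[
  \omega = dz - h_x\,dx - h_y\,dy, \qquad \omega(X) = \omega(Y) = 0 .
\]
% Real-valued second fundamental form, and its symmetry:
\[
  S_\omega(X, Y) = \omega(\nabla_X Y) = h_{xy} = h_{yx} = S_\omega(Y, X).
\]
% Similarly S_\omega(X, X) = h_{xx} and S_\omega(Y, Y) = h_{yy}.
```

In this chart $S_\omega$ is just the matrix of second partial derivatives of $h$, which makes the symmetry of the form visible directly.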

We now present, as formal lemmas, some of the main geometric properties of this form.

LEMMA 27.1 S( , ) is a symmetric bilinear form.

Proof. The torsion tensor $T(\ ,\ )$ is

$$T(X, Y) = \nabla_X Y - \nabla_Y X - [X, Y].$$

If $X$ and $Y$ are tangent to $N$, then so is $[X, Y]$. Thus $T(X, Y) = 0$ implies $(\nabla_X Y - \nabla_Y X)(p) \equiv 0 \pmod{N_p}$, which shows that $S(X(p), Y(p)) = S(Y(p), X(p))$. Q.E.D.

LEMMA 27.2 Let $u \in N_p$ be a tangent vector to $N$ that is tangent to a geodesic of the affine connection on $M$ that lies on $N$. Then $S(u, u) = 0$. In particular, if $N$ is geodesic at $p$, $S(\ ,\ )$ is identically zero at $p$.

Proof. Let $X$ be a vector field tangent to $N$ such that $X(p) = u$ and the integral curve of $X$ beginning at $p$ is a geodesic. By the definition of geodesic, $\nabla_X X = 0$ along the geodesic, in particular at $p$, whence $S(u, u) = 0$.

$N$ is said to be geodesic at $p$ if all geodesics beginning at $p$ and tangent there to $N$ remain tangent to $N$. That $S(\ ,\ )$ identically zero at $p$ is a necessary condition should now be evident.
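Lemma 27.2 can be seen at work on a cylinder, where some tangent directions are tangent to straight lines lying on the surface and others are not. The example is an assumption chosen for illustration, with the flat connection on $R^3$.

```latex
% Assumed example: M = R^3 (flat connection), N = cylinder x^2 + y^2 = 1.
% Two vector fields on M tangent to N:
\[
  Z = \partial_z, \qquad \Theta = -y\,\partial_x + x\,\partial_y .
\]
% The integral curves of Z are straight lines (geodesics of M) lying on N:
\[
  \nabla_Z Z = 0 \ \Longrightarrow\ S(Z, Z) = 0 .
\]
% The integral curves of \Theta are circles, which are not geodesics of M:
\[
  \nabla_\Theta \Theta = -x\,\partial_x - y\,\partial_y .
\]
% At points of N this vector does not lie in N_p, so S(\Theta, \Theta) \neq 0
% and N is not geodesic at any of its points.
```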

LEMMA 27.3 Let $p \to H_p \subset M_p$ be a field of tangent subspaces of $M$ defined for $p \in N$ such that

$$M_p = N_p \oplus H_p.$$

Such a field defines an induced affine connection on $N$, denoted by $\nabla^N$, in the following way: For $X^*, Y^* \in V(N)$, choose $X, Y \in V(M)$ which reduce to $X^*$ and $Y^*$ on $N$. For $p \in N$, put

$\nabla^N_{X^*}Y^*(p)$ = projection of $\nabla_X Y(p)$ on $N_p$.

Proof. The main point is to show that the projection of $\nabla_X Y(p)$ on $N_p$ is zero if $X$ or $Y$ is zero on $N$. But this should be rather evident.

If $H_p$ is identified with $M_p/N_p$, note that this definition of $\nabla^N$ can be rewritten as

$$\nabla_X Y - \nabla^N_X Y = S(X, Y) \quad \text{for } X, Y \in V(M) \text{ tangent to } N.$$

LEMMA 27.4 If $S(\ ,\ )$ is identically zero for all points $p \in N$, then $N$ is totally geodesic; that is, $N$ is geodesic at each of its points.

Proof. Since this is a purely local problem, we can suppose that there is at least one such field $p \to H_p$ of tangent subspaces enabling us to define an induced affine connection on $N$. Notice that $S(\ ,\ ) = 0$ implies that this connection really does not depend on $H$ (hence there is an induced connection globally defined on $N$). Also, $\nabla_X Y = \nabla^N_X Y$ for $X, Y$ tangent to $N$. In particular, a geodesic of $N$ in the induced connection is a geodesic for the given connection on $M$. By the uniqueness of geodesics having given tangent vector, this proves that $N$ is geodesic at each point. Q.E.D.

Convex Hypersurfaces and Functions

Now we turn to the following question: Let $f$ be a real-valued function on $M$, and let $p$ be a point of $N$ that is a critical point for $f$ restricted to $N$. Then $df$ restricted to $N$ is zero at $p$; that is,

$$df(u) = u(f) = 0 \quad \text{for all } u \in N_p. \tag{27.3}$$

We now ask how to compute the Hessian of $f$, restricted to $N$, at the critical point $p$. In general, the Hessian is a quadratic form, denoted by $u \to h_f(u)$, on $N_p$. It can be defined as follows: For $u \in N_p$, pick a vector field $X$ that is tangent to $N$, satisfies $X(p) = u$, and put

$$h_f(u) = X(X(f))(p). \tag{27.4}$$

We want to express (27.4) more precisely in terms of the geometry of $N$ and $f$. Suppose that $f$ itself does not have a critical point at $p$. Then (27.3) expresses the geometric fact that $N$ and the level surface $f^{-1}(f(p))$ are tangent at $p$. Let $S^N_{df}(\ ,\ )$ be the second fundamental form of $N$, and $S^f_{df}(\ ,\ )$ that of the level surface, each taken with respect to the covector $df$ at $p$. A basic formula is

$$h_f(u) = S^N_{df}(u, u) - S^f_{df}(u, u). \tag{27.5}$$

To prove (27.5), start from (27.4):

$$X(X(f)) = X(df(X)) = \nabla_X(df)(X) + df(\nabla_X X).$$

Now $df(\nabla_X X)(p) = S^N_{df}(u, u)$, by definition. $\nabla_X(df)(X)(p)$ depends only on the value of $X$ at $p$; hence we can choose a vector field $Y$ satisfying $Y(f) = 0$, $Y(p) = X(p)$. Then

$$\nabla_X(df)(X)(p) = \nabla_Y(df)(Y)(p) = \bigl(Y(df(Y)) - df(\nabla_Y Y)\bigr)(p) = -S^f_{df}(Y(p), Y(p)),$$

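which proves (27.5). To realize the significance of this formula, let us compute the second fundamental form $S^f_{df}(\ ,\ )$ of the level surface in case $M$ is Euclidean space, with coordinates $(x_i)$, $1 \le i \le n$, and the flat affine connection (that is, $\nabla_{\partial/\partial x_i}(\partial/\partial x_j) = 0$):

$$df = \frac{\partial f}{\partial x_i}\, dx_i.$$

Suppose $Y = A_i(\partial/\partial x_i)$ satisfies $df(Y) = 0$; that is, $A_i(\partial f/\partial x_i) = 0$.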
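The computation announced here can be sketched as follows, writing $S^f_{df}$ for the second fundamental form of the level surface of $f$ with respect to $df$. It assumes the flat connection, so that $\nabla_Y Y = A_j(\partial A_i/\partial x_j)\,\partial/\partial x_i$, and it uses the constraint $A_i(\partial f/\partial x_i) = 0$ differentiated along $Y$; this is a reconstruction from the surrounding definitions rather than a quotation.

```latex
\begin{aligned}
S^f_{df}(Y, Y) &= df(\nabla_Y Y)
  = A_j \frac{\partial A_i}{\partial x_j}\,\frac{\partial f}{\partial x_i}, \\
0 &= Y\!\left(A_i \frac{\partial f}{\partial x_i}\right)
  = A_j \frac{\partial A_i}{\partial x_j}\,\frac{\partial f}{\partial x_i}
  + A_i A_j \frac{\partial^2 f}{\partial x_i\,\partial x_j}, \\
\text{hence}\quad S^f_{df}(Y, Y) &= -A_i A_j \frac{\partial^2 f}{\partial x_i\,\partial x_j}.
\end{aligned}
```

So, under these assumptions, the second fundamental form of a level surface is minus the Euclidean Hessian of $f$, restricted to vectors tangent to the level surface.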

Recall that the function $f$ is said to be convex if its Hessian matrix $(\partial^2 f/(\partial x_i\, \partial x_j))$ is positive semidefinite. This then implies that:

$$\text{The form } v \to S^f_{df}(v, v) \text{ is nonpositive.} \tag{27.6}$$

Such a condition, verified at all noncritical points of $f$, is then the appropriate generalization of "convex function" to an affinely connected manifold. Similarly, we can say that a hypersurface $N$ of $M$ is convex if, for each $p \in N$ and each form $\lambda$ on $M_p$ with $\lambda(N_p) = 0$, the following condition is satisfied:

$$\text{The form } v \to S_\lambda(v, v) \text{ on } N_p \text{ does not change sign.}$$

Now we turn to the question of proving "geometric convexity" of regions bounded by convex hypersurfaces.

THEOREM 27.5 Suppose that $f$ is a function on $M$ such that $f(p) = 0$, $f^{-1}(0)$ is a hypersurface that is geodesic at $p$ and is tangent to the hypersurface $N$ at $p$. Suppose that $S^N_{df}(u, u) \le 0$ for all $u \in N_p$. Then $p$ has a neighborhood $U$ such that

$$f(q) \le 0 \quad \text{for all } q \in N \cap U;$$

that is, $N$ lies, in a neighborhood of $p$, completely on "one side" of the hypersurface $f^{-1}(0)$. Notice the analogy with the statement in Euclidean geometry that a convex hypersurface lies completely on one side of its tangent plane.

Proof. Since $f^{-1}(0)$ is geodesic at $p$, the second term of (27.5) vanishes by Lemma 27.2. Using (27.5), we see that the function $q \to f(q)$, for $q \in N$, has a relative maximum at $q = p$. Q.E.D.


THEOREM 27.6 Suppose that $f$ is a real-valued function on $M$ such that: For $t \in [0, 1]$, the hypersurface $f^{-1}(t)$ is strictly convex in the sense that $S^f_{df}(u, u) < 0$ for all $u \in M_p$ satisfying $u(f) = 0$, $0 \le f(p) \le 1$. Then the set of all points $p \in M$ such that $0 \le f(p) \le 1$ is geodesically convex in the sense that a geodesic whose end points lie on the set lies completely on the set.

Proof. Suppose $\sigma(t)$, $0 \le t \le 1$, is a geodesic with $0 \le f(\sigma(0))$ and $f(\sigma(1)) \le 1$. If $f(\sigma(t))$, for $0 \le t \le 1$, does not lie on the interval $[0, 1]$, there is a point $t_0 \in (0, 1)$ that is a relative maximum for $t \to f(\sigma(t))$. Thus we can apply (27.5), with $N$ taken as the geodesic $\sigma$. The first term of (27.5), $S^\sigma_{df}(u, u)$, is zero, since it is the second fundamental form of $\sigma$, which is zero, since $\sigma$ is a geodesic. Our hypotheses then assert that the Hessian of $f$ restricted to $\sigma$ must be positive at $t_0$, which is a contradiction. Q.E.D.

These simple results suffice to give the idea of the geometric meaning of formula (27.5). Many more sophisticated applications are possible.

28

Affine Groups of Automorphisms. Induced Connections on Submanifolds. Projective Changes of Connection

Let $M$ continue to be a manifold with a torsion-free affine connection $\nabla$. Suppose that $\nabla'$ is another affine connection. We shall show that the "difference," $D(\ ,\ )$, of $\nabla$ and $\nabla'$ is a tensor field. For $X, Y \in V(M)$, put $D(X, Y) = \nabla_X Y - \nabla'_X Y$. For $f \in F(M)$, note that

$$D(fX, Y) = fD(X, Y), \qquad D(X, fY) = fD(X, Y) + X(f)Y - X(f)Y = fD(X, Y). \tag{28.1}$$

Then $D(\ ,\ )$ is $F(M)$-multilinear; hence, defines a tensor field on $M$. It is also symmetric in $X$ and $Y$ when $\nabla'$, like $\nabla$, is torsion-free. Suppose now that $\phi$ is a diffeomorphism of $M$ and that $\nabla'$ is the "transformed" affine connection; that is,

$$\phi_*(\nabla_X Y) = \nabla'_{\phi_*(X)}\, \phi_*(Y) = -D(\phi_*(X), \phi_*(Y)) + \nabla_{\phi_*(X)}\, \phi_*(Y). \tag{28.2}$$

Suppose further that $t \to \phi_t$ is a one-parameter group of diffeomorphisms whose infinitesimal generator is the vector field $Z$. We know then that the derivative of $\phi_{t*}(X)$ with respect to $t$ at $t = 0$ is, up to a sign fixed by convention, the Lie bracket $[Z, X]$. Substituting $\phi_t$ for $\phi$ in (28.2), differentiating with respect to $t$, we have

$$[Z, \nabla_X Y] = D'(X, Y) + \nabla_{[Z, X]} Y + \nabla_X [Z, Y], \tag{28.3}$$

where $D'$ is some tensor field. (If one uses this relation as the definition of $D'$, it can be easily verified independently that it is a tensor field.) Now, if $\phi$ preserves the affine connection, that is, is an automorphism ($\nabla' = \nabla$), then $D = 0$. Thus, if $Z$ generates a one-parameter group of connection automorphisms, we have

$$[Z, \nabla_X Y] = \nabla_{[Z, X]} Y + \nabla_X [Z, Y] \quad \text{for all } X, Y \in V(M). \tag{28.4}$$

This reasoning can be readily reversed (exercise!) to show that (28.4) is also sufficient for the vector field $Z$ to generate a one-parameter group of connection automorphisms.

LEMMA 28.1 Suppose that $\phi: M \to M$ is a connection automorphism, that $M$ is connected, and that $p$ is a point of $M$ such that

$$\phi(p) = p; \qquad \phi_*: M_p \to M_p \text{ is the identity.}$$

Then $\phi$ acts as the identity on $M$.

Proof. The set of all points of $M$ that are fixed under $\phi$, and at which $\phi_*$ is the identity, is obviously closed in $M$. We shall show that it is also open in $M$, which will prove that it is all of $M$. $\phi$ maps geodesics of the connection into geodesics. Since it leaves $p$ fixed and leaves fixed all tangent vectors starting at $p$, it leaves fixed all geodesics starting at $p$, hence leaves fixed each point of a neighborhood of $p$; that is, the set of all such fixed points of $\phi$ is open in $M$. Q.E.D.

Now, if $Z$ is a vector field in $M$, the condition that the one-parameter group generated by $Z$ leave $p$ fixed is obviously $Z(p) = 0$; that is, $p$ is a singular point of $Z$. If this condition is satisfied, we can define a linear transformation $l_Z: M_p \to M_p$ in the following way: For $v \in M_p$, pick any vector field $Y$ such that $Y(p) = v$, and set

$$l_Z(v) = [Y, Z](p). \tag{28.5}$$

(The reader can readily verify, following a pattern we have used many times before (for example, in the definition of the second fundamental form), that the vanishing of $Z$ at $p$ guarantees that this does not depend on how $v$ is extended to a vector field.) The linear transformation $l_Z$ is called the linear part of $Z$ at $p$. To justify this name, we can look at it in local coordinates, say $(x_1, \ldots, x_n)$, with $x_i(p) = 0$. Suppose

$$Z = A_i \frac{\partial}{\partial x_i}, \quad \text{with } A_i(0) = 0.$$

Then

$$l_Z\!\left(\frac{\partial}{\partial x_j}\right) = \left[\frac{\partial}{\partial x_j}, Z\right](0) = \frac{\partial A_i}{\partial x_j}(0)\, \frac{\partial}{\partial x_i};$$

that is, $((\partial A_i/\partial x_j)(0))$ is the matrix of the linear transformation $l_Z$. The $((\partial A_i/\partial x_j)(0))$ are, of course, just the first terms in the Taylor series of $A_i(x)$ about $x = 0$.
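For a concrete instance, take the generator of rotations in the plane. The sketch assumes $n = 2$ and coordinates $(x_1, x_2)$ centered at $p$; the field $Z$ below is chosen purely for illustration.

```latex
% Assumed example: Z = -x_2\,\partial/\partial x_1 + x_1\,\partial/\partial x_2,
% so that A_1 = -x_2 and A_2 = x_1.
\left(\frac{\partial A_i}{\partial x_j}(0)\right)
  = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},
\qquad
l_Z\!\left(\frac{\partial}{\partial x_1}\right) = \frac{\partial}{\partial x_2},
\qquad
l_Z\!\left(\frac{\partial}{\partial x_2}\right) = -\frac{\partial}{\partial x_1}.
```

Here $Z(0) = 0$ but the linear part is nonzero, consistent with the fact that rotations fix the origin without acting trivially on tangent vectors there.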


LEMMA 28.2 Let $Z$ be a vector field, with $Z(p) = 0$. Then $l_Z = 0$ on $M_p$ if and only if the one-parameter group generated by $Z$ acts as the identity on tangent vectors at $p$.

We leave the proof to the reader. From Lemmas 28.1 and 28.2, we have immediately Lemma 28.3.

LEMMA 28.3 If $Z$ generates a group of connection automorphisms, and if $Z(p) = 0 = l_Z$, then $Z$ is zero at every point of $M$.

Now we turn to the following question: Suppose that $Z$ is a vector field that generates a one-parameter group of connection automorphisms, that is, satisfies (28.4). Suppose that $N$ is a submanifold of $M$ and that $Z$ is tangent to $N$; that is, $Z(p) \in N_p$ for all $p \in N$, so that $Z$ restricted to $N$ defines a vector field on $N$.

Definition

Let $Z$ be a vector field on a manifold $N$. We say that $Z$ vanishes to the $k$th order at a point $p \in N$ if

$$Z(p) = 0 \quad \text{and} \quad [X_1, [\ldots [X_k, Z] \ldots]](p) = 0 \tag{28.6}$$

for all choices of $k$-tuples of vector fields $(X_1, \ldots, X_k)$ on $N$. (Notice that $Z$ vanishes to the first order if $Z(p) = 0$ and $l_Z = 0$.) Return to the case where $N$ is a submanifold of $M$, $Z$ is tangent to $N$, and $Z$ satisfies (28.4). We ask: Is it possible that $Z$ restricted to $N$ vanishes to the $k$th order at a point of $N$ without vanishing everywhere on $M$? This is clearly possible for some choices of $N$. For example, if $N$ is a plane in Euclidean space $M$, there is clearly a nontrivial one-parameter group of affine transformations of $M$ leaving every point of the plane fixed. However, we can find conditions on $N$ that prevent this. Let $X$ and $Y$ be vector fields tangent to $N$. Then, by (28.4),

$$[Z, \nabla_X Y] = \nabla_{[Z, X]} Y + \nabla_X [Z, Y] = \nabla_{[Z, X]} Y + \nabla_{[Z, Y]} X + [X, [Z, Y]].$$

Thus, if $Z$ restricted to $N$ vanishes to the second order at $p$, then

$$[Z, \nabla_X Y](p) = 0.$$
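Why each term vanishes can be spelled out. The sketch assumes, as in the text, that $\nabla$ is torsion-free and that $Z$ restricted to $N$ vanishes to the second order at $p$, read here as $Z(p) = 0$, $[X, Z](p) = 0$ and $[X, [Y, Z]](p) = 0$ for all $X$, $Y$ tangent to $N$.

```latex
% Torsion-free: \nabla_X W - \nabla_W X = [X, W], applied with W = [Z, Y].
\begin{aligned}
\nabla_X [Z, Y] &= \nabla_{[Z, Y]} X + [X, [Z, Y]], \\
% \nabla_W Y(p) depends only on W(p), and [Z, X](p) = [Z, Y](p) = 0:
\nabla_{[Z, X]} Y(p) &= 0, \qquad \nabla_{[Z, Y]} X(p) = 0, \\
% second-order vanishing:
[X, [Z, Y]](p) &= 0.
\end{aligned}
```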

If every tangent vector at $p$ can be written as a combination of elements of $N_p$ and vectors of the form $\nabla_X Y(p)$, for $X, Y$ tangent to $N$, then we see that $l_Z = 0$; hence, by Lemma 28.3, $Z$ is identically zero on $M$. In general, let us call the space of vectors spanned by $N_p$ and $\{\nabla_X Y(p) : X, Y \text{ vector fields tangent to } N\}$ the first osculating space to $N$. Then we have the following rather trivial theorem, which is important for the qualitative picture it gives us of the relation of the "induced geometry" on $N$ to the group of affine connection automorphisms.

THEOREM 28.4 Let $N$ be a submanifold of the affinely connected space $M$, and let $Z$ be a vector field that is tangent to $N$ and generates a one-parameter group of connection automorphisms. Suppose that the first osculating space to $N$ at $p$ fills up $M_p$. Then if $Z$ restricted to $N$ vanishes to the second order at $p$, it must vanish everywhere on $M$.

This analysis can be carried further to get criteria for nonvanishing of order higher than 2, by considering the higher osculating spaces to $N$. (The second osculating space to $N$ at $p$ would be that spanned by $N_p$, the set of $\nabla_X Y(p)$ and $\nabla_{X_1} \nabla_X Y(p)$, for vector fields $X$, $X_1$, $Y$ tangent to $N$, and so on for the higher osculating spaces.) We leave this to the reader, since it involves a simple iteration of the basic argument, that is, applying (28.4) to $[Z, \nabla_X \nabla_{X_1} Y]$.

Theorem 28.4 is not the best possible answer. There are clearly submanifolds having the property that a vector field satisfying (28.4), vanishing to the first order when restricted to $N$, vanishes identically. For example, this is so if $N$ admits an "induced affine connection," that is, one that is left invariant by an affine automorphism of $M$ which maps $N$ into itself. (For example, we have seen that if $N$ is totally geodesic, it admits such an "induced connection.")

We shall now turn to this more basic question of induced affine connections on submanifolds, using Cartan's "method of the moving frame." (Indeed, this problem gives a good introduction to Cartan's theory.) The most complete work on this subject is by Klingenberg [1]. We shall be working with a particularly simple case, namely, that where $N$ is a hypersurface in $M$.
Of course this question is a purely local one, so that we are free to choose "moving frames," that is, a basis $(\omega_1, \ldots, \omega_n)$ for 1-forms on $M$. (Choose the following range of indices and summation convention: $1 \le i, j, \ldots \le n = \dim M$; $1 \le a, b, \ldots \le n - 1$.) Let $(\omega_{ij})$ be the connection forms associated with this basis and the affine connection. As definition,

$$\omega_{ij}(X) = -\omega_i(\nabla_X X_j) \quad \text{for } X \in V(M), \tag{28.7}$$

where $(X_j)$ denotes the basis of vector fields dual to the $(\omega_i)$.

Then the following relations hold:

$$d\omega_i = \omega_{ij} \wedge \omega_j \tag{28.8}$$

(expressing the fact that the torsion tensor is zero).

$$d\omega_{ij} = \omega_{ik} \wedge \omega_{kj} + R_{ij}, \qquad R_{ij} = R_{ijkl}\, \omega_k \wedge \omega_l. \tag{28.9}$$

(The $R_{ijkl}$ are the components of the curvature tensor $R(\ ,\ )(\ )$, just as defined earlier for Riemannian geometry.) Now we can arrange the moving frame so that $\omega_n = 0$ on $N$. By (28.8),

$$d\omega_n = \omega_{nj} \wedge \omega_j,$$

whence $\omega_{na} \wedge \omega_a = 0$ restricted to $N$.

Suppose that $(\omega_i')$ is another such moving frame. They are related to the $(\omega_i)$ by relations of the form $\omega_i = M_{ij}\, \omega_j'$, where $(M_{ij})$ is a matrix-valued function on $M$, whose determinant is not zero. Let $(\omega_{ij}')$ be the connection forms (defined by (28.7)) in the primed system. The transformation law between the $(\omega_{ij})$ and $(\omega_{ij}')$ is readily calculated from (28.7), and is found to be

$$\omega_{ij} = M_{ik}\, \omega_{kl}'\, M^{-1}_{lj} + dM_{ik}\, M^{-1}_{kj}, \tag{28.10}$$

where $(M^{-1}_{ij})$ is the inverse matrix to $(M_{ij})$. Suppose that the primed frame also satisfies $\omega_n' = 0$; that is, $M_{na} = 0$ restricted to $N$. Our goal is to use this freedom to change frames to satisfy more conditions. From (28.10), we then have $\omega_{na} = M_{nn}\, \omega_{nb}'\, M^{-1}_{ba}$, restricted to $N$, hence

$$\omega_{na} \cdot \omega_a = M_{nn}\, \omega_{na}' \cdot \omega_a'. \tag{28.11}$$
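Two quick consistency checks on (28.10) and (28.11) may help. The sketch below assumes only (28.8) in both frames and the relation $\omega_i = M_{ij}\,\omega_j'$; it is a verification, not a derivation of (28.10) from (28.7).

```latex
% Check 1: (28.10) is compatible with (28.8) holding in both frames.
\begin{aligned}
d\omega_i &= dM_{ij} \wedge \omega_j' + M_{ij}\, d\omega_j'
           = dM_{ik} M^{-1}_{kj} \wedge \omega_j + M_{ik}\, \omega_{kl}' \wedge \omega_l' \\
          &= \bigl(dM_{ik} M^{-1}_{kj} + M_{ik}\, \omega_{kl}'\, M^{-1}_{lj}\bigr) \wedge \omega_j
           = \omega_{ij} \wedge \omega_j .
\end{aligned}

% Check 2: on N, where M_{na} = 0 = M^{-1}_{na} and \omega_n' = 0,
\begin{aligned}
\omega_{na} \cdot \omega_a
  &= M_{nn}\, \omega_{nb}'\, M^{-1}_{ba} \cdot M_{ac}\, \omega_c'
   = M_{nn}\, \omega_{nb}' \cdot \omega_b' .
\end{aligned}
```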

(The dot $(\cdot)$ indicates the symmetric product of 1-forms; that is, if $\theta_1$, $\theta_2$ are 1-forms, $\theta_1 \cdot \theta_2(X, Y) = \tfrac{1}{2}(\theta_1(X)\theta_2(Y) + \theta_1(Y)\theta_2(X))$.) It can be readily seen that the form

$$(u, v) \to \omega_{na} \cdot \omega_a(u, v)$$

is the second fundamental form of $N$, $(u, v) \to S_{\omega_n}(u, v)$. We shall suppose that this form is definite, that is, that

$$S_{\omega_n}(u, v) = 0 \quad \text{for all } v \text{ tangent to } N \text{ implies } u = 0.$$

For simplicity, we suppose that the form is positive definite; the cases of other signatures of the quadratic form $u \to S_{\omega_n}(u, u)$ can be handled similarly. The law of transformation (28.11) then assures us that a moving frame $(\omega_i)$ can be chosen so that $\omega_{na} \cdot \omega_a = \omega_a \cdot \omega_a$; that is, so that

$$\omega_{na} = \omega_a \quad \text{restricted to } N. \tag{28.12}$$

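Suppose that $(\omega_i')$ is another choice of moving frame satisfying (28.12). Then the transition functions $(M_{ij})$ must satisfy

$$M_{ac} M_{ab} = \delta_{cb} M_{nn} \quad \text{and} \quad M_{na} = 0 \quad \text{restricted to } N. \tag{28.13}$$

Differentiate (28.13):

$$dM_{ac}\, M_{ab} + M_{ac}\, dM_{ab} = \delta_{cb}\, dM_{nn},$$

or

$$dM_{ac}\, M^{-1}_{cb} + M^{-1}_{ca}\, dM_{bc} = \delta_{ab}\, dM_{nn}\, M^{-1}_{nn}.$$

Hence,

$$dM_{ab}\, M^{-1}_{ba} + M^{-1}_{ba}\, dM_{ab} = (n - 1)\, dM_{nn}\, M^{-1}_{nn},$$

or

$$dM_{ab}\, M^{-1}_{ba} = \left(\frac{n - 1}{2}\right) dM_{nn}\, M^{-1}_{nn}.$$

Let

$$\theta = \omega_{aa} - \left(\frac{n - 1}{2}\right) \omega_{nn} \quad \text{restricted to } N.$$

Computing the relation between $\theta$ and $\theta'$, we have

$$\theta = \theta' + M_{an}\, \omega_c'\, M^{-1}_{ca} - \left(\frac{n - 1}{2}\right) M_{nn}\, \omega_c'\, M^{-1}_{cn}. \tag{28.14}$$

Now, given the primed moving frame, we can arrange the unprimed moving frame by a change of frame for which the transition functions $(M_{ij})$ satisfy (28.13), that is, leave the relation (28.12) invariant, and such that

$$\theta = 0. \tag{28.15}$$

In fact we can accomplish this by a choice of $(M_{ij})$ satisfying $M_{nn} = 1$; $M_{ab} = \delta_{ab}$; $M^{-1}_{an} = -M_{an}$; for with these choices, (28.14) set equal to zero can be regarded as a set of equations for $M_{an}$. Now, with conditions (28.13) imposed, let us see how the possible changes of frame are restricted. Using (28.14) again and the relation $\theta = \theta' = 0$, we have

$$M_{an}\, \omega_c'\, M^{-1}_{ca} = \left(\frac{n - 1}{2}\right) M_{nn}\, \omega_c'\, M^{-1}_{cn}.$$

Hence,

$$M^{-1}_{ca}\, M_{an} = \left(\frac{n - 1}{2}\right) M_{nn}\, M^{-1}_{cn};$$

hence,

$$M_{bn} = M_{bc}\, M^{-1}_{ca}\, M_{an} = \left(\frac{n - 1}{2}\right) M_{nn}\, M_{bc}\, M^{-1}_{cn} = -\left(\frac{n - 1}{2}\right) M_{bn}, \quad \text{forcing } M_{bn} = 0. \tag{28.16}$$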
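The index manipulations leading to (28.16) rest on the block form of the transition matrix on $N$. A sketch in matrix notation follows; the names $A$, $m$, $\mu$ are introduced here for illustration and are not the author's.

```latex
% Assumption: on N, M_{na} = 0. Write A = (M_{ab}), m = (M_{an}), \mu = M_{nn}.
M = \begin{pmatrix} A & m \\ 0 & \mu \end{pmatrix}, \qquad
M^{-1} = \begin{pmatrix} A^{-1} & -\mu^{-1} A^{-1} m \\ 0 & \mu^{-1} \end{pmatrix}.

% Consequences:
M^{-1}_{na} = 0, \qquad M^{-1}_{nn} = M_{nn}^{-1}, \qquad
M_{nn}\, M_{bc}\, M^{-1}_{cn} = -M_{bn}.
```

The last identity is what turns an equation of the form $M_{bn} = c\, M_{nn} M_{bc} M^{-1}_{cn}$ into $M_{bn} = -c\, M_{bn}$, which forces $M_{bn} = 0$ whenever $c \neq -1$.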

But this relation tells us that the set of vectors $v$ satisfying $\omega_a(v) = 0$ is left invariant when the moving frames are changed; that is, for $p \in N$, there is a subspace $H_p$ of $M_p$ such that $M_p = N_p \oplus H_p$. Further, the reader can convince himself by working through the above argument that a connection automorphism of $M$ that maps $N$ onto itself will preserve the splitting of the tangent space of $M$ at points of $N$. (For example, notice that the transformation applied to a moving frame satisfying (28.13) and (28.15) will again satisfy (28.13) and (28.15).) Hence, if we use this field $p \to H_p$ of subspaces to define an induced affine connection on $N$ (as explained in Lemma 27.3), an affine transformation of $M$ leaving $N$ invariant will be an automorphism of the induced connection on $N$. Summing up, we have proved Theorem 28.5.

THEOREM 28.5 Let $N$ be a hypersurface in an affinely connected space $M$ whose second fundamental form is definite. Then there is an "induced" affine connection on $N$ which is preserved by every connection automorphism of $M$ which leaves $N$ invariant.

The procedure we have followed is typical of Cartan's method of the moving frame. In terms of the jargon, what we have done is to reduce the structure group of the tangent bundle of $M$ restricted to $N$ in a natural way to a subgroup that is small enough to enable one to define an induced affine connection on $N$. Many examples of this reduction process can be found in Cartan's book on the method of the moving frame [3], although the generalities are not very clear there. A general (although sketchy) treatment of these matters can be found in a paper by Hermann [9].

Projective Change of Connections

If two affine connections, $\nabla$ and $\nabla'$, are given on a manifold $M$, we have seen that their "difference" $D(\ ,\ )$ is a tensor field. Recall that

$$D(X, Y) = \nabla_X Y - \nabla'_X Y \quad \text{for } X, Y \in V(M).$$

We may ask: When are $\nabla$ and $\nabla'$ related in such a way that the second fundamental forms of a submanifold with respect to the two connections are the same? The answer is Theorem 28.6.

THEOREM 28.6 Suppose that $\theta$ is a 1-form on $M$ such that

$$\nabla_X Y - \nabla'_X Y = \tfrac{1}{2}(\theta(X)Y + \theta(Y)X) \tag{28.17}$$

for each pair $X, Y \in V(M)$. Then a submanifold $N$ of $M$ has the same second fundamental form with respect to $\nabla'$ as it does with respect to $\nabla$. Conversely, if this property is valid for every submanifold of $M$, then $\nabla$ and $\nabla'$ are related by (28.17). Further, (28.17) is satisfied if and only if $\nabla$ and $\nabla'$ are projectively related in the classical sense, that is, their geodesics differ only by a change in parametrization.

Proof. Suppose (28.17) is satisfied. Let $N$ be a submanifold of $M$. Let $\omega$ be a 1-form on $M$ such that $\omega(N_p) = 0$ for all $p \in N$. Then, for $X, Y$ vector fields that are tangent to $N$, we have

$$S_\omega(X, Y) = \omega(\nabla_X Y) = \omega(\nabla'_X Y) = S'_\omega(X, Y),$$

in view of (28.17), since the right-hand side of (28.17) is then tangent to $N$ and is annihilated by $\omega$. The converse is obvious. Suppose now that the geodesics of $\nabla$ and $\nabla'$ differ only by a change in parametrization. Let $X$ be a vector field whose integral curves are geodesics of $\nabla$; that is, $\nabla_X X = 0$. Then the integral curves must, after a change in parametrization, be geodesics of $\nabla'$; that is,

$$\nabla'_X X = fX \quad \text{for some } f \in F(M).$$

Let $D(X, Y) = \nabla_X Y - \nabla'_X Y$. Then $D(X, X) = -fX$. Since $D$ is a tensor field, this must hold for an arbitrary vector field on $M$. Polarization of this identity (that is, substitution of $X + Y$ in place of $X$) shows that (28.17) is satisfied. The converse is readily obtained by reversing these steps and observing that, at least locally, each geodesic can be exhibited as the integral curve of a vector field $X$ satisfying $\nabla_X X = 0$. Q.E.D.
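The polarization step can be written out as follows. This is a sketch under two assumptions that the text leaves implicit: that $D$ is symmetric, and that $D(X, X) = \theta(X)\,X$ with $\theta$ additive in $X$ (here $\theta(X) = -f$); the additivity is part of what a full proof must verify.

```latex
% Assumptions: D symmetric; D(X, X) = \theta(X) X; \theta(X + Y) = \theta(X) + \theta(Y).
\begin{aligned}
D(X + Y, X + Y) &= \theta(X + Y)\,(X + Y), \\
D(X, X) + 2 D(X, Y) + D(Y, Y)
  &= \theta(X) X + \theta(X) Y + \theta(Y) X + \theta(Y) Y, \\
D(X, Y) &= \tfrac{1}{2}\bigl(\theta(X)\, Y + \theta(Y)\, X\bigr).
\end{aligned}
```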

29

The Laplace-Beltrami Operator

Let $M$ be a manifold, with a pseudo-Riemannian metric on $M$ defined by a nondegenerate inner product $\langle\ ,\ \rangle$ on tangent vectors. Associated with the metric we can define a linear differential operator $\Delta$ that acts on functions on $M$. By specializing the metric, many of the differential operators that are important in mathematical physics may be obtained. For example, the ordinary Laplace operator

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$

is associated with the flat Euclidean metric on Euclidean space; the d'Alembertian, or wave, operator

$$\Box = \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2}$$

is associated with the Lorentz metric on space-time. Our aim in this chapter is to illustrate the power of the theory of affine connections on manifolds by deriving many of the properties of these operators without introducing coordinates. Let $f$ be a function on $M$. The gradient of $f$, denoted by $\operatorname{grad} f$, is a vector field on $M$ defined as

$$\langle \operatorname{grad} f, X \rangle = X(f) = df(X) \quad \text{for } X \in V(M). \tag{29.1}$$

The first-order differential operator $f \to \|\operatorname{grad} f\|$ is sometimes called the first Beltrami differential operator. In fact, we have already seen in Chapter 13 that a function $f$ such that $\|\operatorname{grad} f\|^2$ is constant on the level surfaces of $f$ is a solution of the Hamilton-Jacobi partial differential equation associated with the variational problem; for such an $f$, the integral curves of $\operatorname{grad} f$ are geodesics. Let $\theta$ be the $n$-form on $M$ defining the volume element associated with the metric. The easiest way to define $\theta$ is by using an orthonormal moving frame $(\omega_1, \ldots, \omega_n)$ of 1-forms, that is, the metric is given by

$$ds^2 = \pm\, \omega_1 \cdot \omega_1 \pm \cdots \pm \omega_n \cdot \omega_n,$$

where $(\cdot)$ denotes the symmetric product of 1-forms. Then $\theta$ is given by

$$\theta = \omega_1 \wedge \cdots \wedge \omega_n. \tag{29.2}$$

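Let $X$ be a vector field on $M$. The divergence of $X$, denoted by $\operatorname{div}(X)$, is the function such that

$$X(\theta) = (\operatorname{div} X)\,\theta, \tag{29.3}$$

where $X(\theta)$ denotes the Lie derivative of $\theta$ by $X$. Thus $\operatorname{div}(X) = 0$ is the condition that the one-parameter group generated by $X$ leave invariant the volume on $M$ defined by $\theta$. Let $f \in F(M)$. We can define $\Delta(f)$, the Laplace-Beltrami operator (sometimes called the second Beltrami operator), as

$$\Delta(f) = \operatorname{div}(\operatorname{grad} f). \tag{29.4}$$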

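We can give a more explicit form of $\Delta(f)$ in terms of the orthonormal moving frame $(\omega_i)$ and the connection. Let $(X_i)$, $1 \le i, j, \ldots \le n = \dim M$, be a basis of vector fields dual to the $(\omega_i)$; that is, $\omega_i(X_j) = \delta_{ij}$. Then

$$X(\omega_i) = X(\omega_i)(X_j)\,\omega_j = [X(\omega_i(X_j)) - \omega_i([X, X_j])]\,\omega_j = \omega_i(\nabla_{X_j} X - \nabla_X X_j)\,\omega_j.$$

Now

$$\begin{aligned}
X(\theta) &= X(\omega_1) \wedge \cdots \wedge \omega_n + \cdots + \omega_1 \wedge \cdots \wedge X(\omega_n) \\
&= [\omega_1(\nabla_{X_1} X - \nabla_X X_1)]\,\omega_1 \wedge \cdots \wedge \omega_n + \cdots + \omega_1 \wedge \cdots \wedge \omega_{n-1} \wedge [\omega_n(\nabla_{X_n} X - \nabla_X X_n)]\,\omega_n \\
&= [\omega_i(\nabla_{X_i} X - \nabla_X X_i)]\,\theta.
\end{aligned}$$

By (29.3),

$$\operatorname{div} X = \omega_i(\nabla_{X_i} X - \nabla_X X_i).$$

Now $Y = \omega_i(Y) X_i$; hence

$$\langle Y, X_j \rangle = \omega_i(Y) \langle X_i, X_j \rangle.$$

Put $e_i = \langle X_i, X_i \rangle$. (Then $e_i = \pm 1$.) Now

$$\operatorname{div} X = \sum_i e_i \bigl(\langle X_i, \nabla_{X_i} X \rangle - \langle X_i, \nabla_X X_i \rangle\bigr).$$

But

$$\langle X_i, \nabla_X X_i \rangle = X(\langle X_i, X_i \rangle) - \langle \nabla_X X_i, X_i \rangle, \quad \text{so that} \quad \langle X_i, \nabla_X X_i \rangle = \tfrac{1}{2} X(\langle X_i, X_i \rangle) = 0.$$

Now we have div and $\Delta$ expressed in terms of the frames and the connection:

$$\operatorname{div} X = \sum_i e_i \langle \nabla_{X_i} X, X_i \rangle, \tag{29.5}$$

$$\Delta(f) = \operatorname{div}(\operatorname{grad} f) = \sum_i e_i \langle \nabla_{X_i} \operatorname{grad} f, X_i \rangle. \tag{29.6}$$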

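There is another useful form of $\Delta(f)$. Starting from (29.6), we have

$$\Delta(f) = \sum_i e_i \bigl(X_i(\langle \operatorname{grad} f, X_i \rangle) - \langle \operatorname{grad} f, \nabla_{X_i} X_i \rangle\bigr).$$

Hence,

$$\Delta(f) = \sum_i e_i \bigl(X_i X_i(f) - (\nabla_{X_i} X_i)(f)\bigr). \tag{29.7}$$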
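As a check on (29.7), the sketch below works it out for the Euclidean plane in polar coordinates $(r, \varphi)$. The frame $X_1 = \partial/\partial r$, $X_2 = (1/r)\,\partial/\partial\varphi$ and the covariant derivatives quoted are standard facts assumed here, with $e_1 = e_2 = 1$.

```latex
% Assumed frame: X_1 = \partial_r, X_2 = r^{-1} \partial_\varphi
% (orthonormal for ds^2 = dr^2 + r^2 d\varphi^2).
% Assumed covariant derivatives: \nabla_{X_1} X_1 = 0, \nabla_{X_2} X_2 = -r^{-1} X_1.
\begin{aligned}
\Delta(f) &= X_1 X_1(f) + X_2 X_2(f) - (\nabla_{X_1} X_1)(f) - (\nabla_{X_2} X_2)(f) \\
          &= \frac{\partial^2 f}{\partial r^2}
           + \frac{1}{r^2} \frac{\partial^2 f}{\partial \varphi^2}
           + \frac{1}{r} \frac{\partial f}{\partial r}.
\end{aligned}
```

This is the familiar polar form of the Laplacian; the first-order term comes entirely from $\nabla_{X_2} X_2$.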

Equation (29.7) is the form that most closely resembles the usual Laplace (or d'Alembert) operator for Euclidean space. For if the connection is flat, a coordinate system $(x_i)$ can be chosen so that

$$X_i = \frac{\partial}{\partial x_i}, \qquad \nabla_{X_i} X_j = 0.$$

Then

$$\Delta(f) = \sum_i e_i \frac{\partial^2 f}{\partial x_i^2}.$$

(Laplace or d'Alembert are obtained by specializing the $e_i$.) On the other hand, for an arbitrary metric we can always find an orthonormal moving frame $(X_i)$ such that at a given point $p$,

$$\nabla_{X_i} X_j(p) = 0.$$

Further, a coordinate system $(x_i)$ about that point can be chosen so that

$$X_i X_i(f)(p) = \frac{\partial^2 f}{\partial x_i^2}(p).$$

(Exercise: Prove these statements.) Then, at one point, we can always arrange that $\Delta$ has the same form as in the flat Euclidean case.

Finally, to illustrate the usefulness of these formulas, we shall derive a formula for the Laplace-Beltrami operator for the metric induced on a submanifold $N$ of $M$. Suppose that $\dim N = p$, and choose the additional range of indices: $1 \le a, b, \ldots \le p$; $p + 1 \le u, v, \ldots \le n$. Suppose that $(X_a)$ are tangent to $N$. (Notice that we are implicitly assuming that the metric induced on $N$ is nondegenerate.) Suppose that $\operatorname{grad} f = Y + Z$, where $Y$ and $Z$ are vector fields that are, respectively, tangent and perpendicular to $N$. Then $Y$ restricted to $N$ is the gradient vector field of $f$ restricted to $N$ with respect to the induced metric on $N$. Then

$$\Delta^N(f) = \sum_a e_a \langle X_a, \nabla_{X_a} Y \rangle,$$

where $\Delta^N$ is the Laplace-Beltrami operator with respect to the metric induced on $N$ applied to $f$ restricted to $N$. Now

$$\langle X_a, \nabla_{X_a} Z \rangle = X_a(\langle X_a, Z \rangle) - \langle \nabla_{X_a} X_a, Z \rangle = -S_Z(X_a, X_a),$$

where $S_{(\ )}(\ ,\ )$ is the second fundamental form of $N$. Now, by (29.6),

$$\Delta(f) = \Delta^N(f) - \sum_a e_a S_Z(X_a, X_a) + \sum_u e_u \langle \nabla_{X_u} \operatorname{grad} f, X_u \rangle \quad \text{on } N.$$

Example: Spherical Harmonics

Suppose that $M$ is flat Euclidean space, with Euclidean coordinates $(x_i)$, and that $N$ is the sphere of radius $r$. Let $g = \tfrac{1}{2} x_i x_i = \tfrac{1}{2}|x|^2$. Then $p = n - 1$. Choosing the moving frame so that $(X_a)$ is tangent to $N$, we have

$$X_n = \frac{\operatorname{grad} g}{\|\operatorname{grad} g\|} = \frac{x_i(\partial/\partial x_i)}{|x|}.$$

Suppose that $f$ is a function on $R^n$ that is homogeneous of degree $\lambda$. By Euler's homogeneous function relation,

$$x_i \frac{\partial f}{\partial x_i} = \lambda f.$$

That is,

$$X_n(f) = \frac{\lambda f}{|x|}.$$

Now $g$ is a solution of the Hamilton-Jacobi equation for the Riemannian metric $ds^2 = dx_i\, dx_i$, since $\|\operatorname{grad} g\|^2 = 2g$. We know, then, from general principles that $\nabla_{X_n} X_n = 0$; that is, the integral curves of $X_n$ are geodesics. (This is obvious geometrically, of course: The integral curves of $X_n$ are the orthogonal trajectories of the spheres concentric about the origin, that is, straight lines.) We have also seen that the second fundamental form $S_{X_n}(\ ,\ )$ of the sphere of radius $r$ has all its eigenvalues equal to $-(1/r)$. Then, on $N$,

$$\Delta^N(f) - \Delta(f) = -\frac{\lambda^2 f}{r^2} + \frac{\lambda f}{r^2} - \frac{\lambda f (n - 1)}{r^2} = -\frac{\lambda}{r^2}(\lambda + n - 2) f.$$

In particular, we have the following.

THEOREM 29.1 If $f$ is a function on $R^n$ which is harmonic (that is, satisfies $\Delta(f) = 0$) and which is homogeneous of degree $\lambda$, then $f$ restricted to the unit sphere in $R^n$ is an eigenfunction of the Laplace-Beltrami operator on the sphere with eigenvalue $-\lambda(\lambda + n - 2)$. In particular, we may consider the $f$ that are polynomials and that also are harmonic on $R^n$. Those polynomials of degree $\lambda$ are permuted by the action of the rotation group, and this gives finite-dimensional linear representations of the group of rotations of $R^n$.
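Two low-degree cases illustrate the statement. The sketch assumes $n = 3$ and the sign convention $\Delta = \operatorname{div} \operatorname{grad}$, under which a harmonic function homogeneous of degree $\lambda$ satisfies $\Delta^N(f) = -\lambda(\lambda + n - 2) f$ on the unit sphere.

```latex
\begin{aligned}
f = x_1 &: \quad \lambda = 1, \quad \Delta(f) = 0, \quad
  \Delta^N(f) = -1\,(1 + 3 - 2)\, f = -2 f, \\
f = x_1^2 - x_2^2 &: \quad \lambda = 2, \quad \Delta(f) = 2 - 2 = 0, \quad
  \Delta^N(f) = -2\,(2 + 3 - 2)\, f = -6 f.
\end{aligned}
```

These are the values $-\ell(\ell + 1)$ for $\ell = 1, 2$ familiar from spherical harmonics on the 2-sphere.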

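The Geometric Background for "Separation of Variables"

Now we return to the first definition of $\Delta(f)$ as $\operatorname{div}(\operatorname{grad} f)$. Suppose $f, g$ are functions on $M$. Then

$$\operatorname{grad}(fg) = f \operatorname{grad} g + g \operatorname{grad} f.$$

$$\begin{aligned}
\Delta(fg)\,\theta = \operatorname{grad}(fg)(\theta) &= (f \operatorname{grad} g)(\theta) + (g \operatorname{grad} f)(\theta) \\
&= f \operatorname{grad} g(\theta) + df \wedge (\operatorname{grad} g \,\lrcorner\, \theta) + g \operatorname{grad} f(\theta) + dg \wedge (\operatorname{grad} f \,\lrcorner\, \theta).
\end{aligned}$$

Now, since $df \wedge \theta = 0$,

$$df \wedge (\operatorname{grad} g \,\lrcorner\, \theta) = -\operatorname{grad} g \,\lrcorner\, (df \wedge \theta) + (\operatorname{grad} g \,\lrcorner\, df)\,\theta = \langle \operatorname{grad} f, \operatorname{grad} g \rangle\,\theta.$$

Hence,

$$\Delta(fg) = f \Delta(g) + g \Delta(f) + 2\langle \operatorname{grad} f, \operatorname{grad} g \rangle.$$

We have now proved

LEMMA 29.2 $\Delta(fg) = \Delta(f)g + f\Delta(g)$ if and only if $\langle \operatorname{grad} f, \operatorname{grad} g \rangle = 0$.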
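Two Euclidean examples show both directions of Lemma 29.2. They assume the flat metric on $R^2$ with coordinates $(x, y)$ and $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2$.

```latex
\begin{aligned}
f = x,\ g = y &: \quad \langle \operatorname{grad} f, \operatorname{grad} g \rangle = 0, \quad
   \Delta(xy) = 0 = \Delta(f)\, g + f\, \Delta(g); \\
f = g = x &: \quad \langle \operatorname{grad} f, \operatorname{grad} g \rangle = 1, \quad
   \Delta(x^2) = 2 \neq 0 = \Delta(f)\, g + f\, \Delta(g).
\end{aligned}
```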


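LEMMA 29.3 Suppose that $f$ is a function on $M$ such that

$$\|\operatorname{grad} f\|^2 = F_1(f), \tag{29.9}$$

$$\Delta(f) = F_2(f). \tag{29.10}$$

Then there exists a function $g = F(f)$ such that

$$\Delta(g) = \lambda g. \tag{29.11}$$

($F_1(\ )$, $F_2(\ )$, and $F(\ )$ are functions of one real variable, say, $x$.) In fact, $g$ satisfies (29.11) if and only if

$$F''(x) F_1(x) + F'(x) F_2(x) = \lambda F(x). \tag{29.12}$$

Proof. Suppose we look for $g = F(f)$ to satisfy (29.11). $dg = F'(f)\, df$; hence, $\operatorname{grad} g = F'(f) \operatorname{grad} f$.

$$\begin{aligned}
\Delta(g)\,\theta = (\operatorname{grad} g)(\theta) &= (F'(f) \operatorname{grad} f)(\theta) \\
&= F'(f)\,\Delta(f)\,\theta + d(F'(f)) \wedge (\operatorname{grad} f \,\lrcorner\, \theta) \\
&= F'(f)\,\Delta(f)\,\theta + F''(f)\, df \wedge (\operatorname{grad} f \,\lrcorner\, \theta) \\
&= F'(f)\,\Delta(f)\,\theta + F''(f)\, \|\operatorname{grad} f\|^2\, \theta.
\end{aligned}$$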
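A standard instance may make (29.12) concrete. The sketch assumes $M = R^n$ minus the origin with the flat metric and takes $f = |x|$; the values of $F_1$ and $F_2$ below are well-known facts about the radial function, not taken from the text.

```latex
% Assumption: f(x) = |x| on R^n \setminus \{0\}, flat metric.
\|\operatorname{grad} f\|^2 = 1 = F_1(f), \qquad
\Delta(f) = \frac{n - 1}{f} = F_2(f),

% so (29.12) becomes the radial eigenvalue equation
F''(x) + \frac{n - 1}{x}\, F'(x) = \lambda F(x).

% Example: for n = 3 and \lambda = -k^2, one solution is F(x) = \sin(kx)/x.
```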

Formula (29.12) now follows. Functions $f$ satisfying (29.9) and (29.10) are called isoparametric (with respect to the given Riemannian manifold). Once such functions have been found, Lemma 29.3 tells us that eigenfunctions for the Laplace-Beltrami operator can be found by solving an ordinary linear differential equation, namely, (29.12). Conditions (29.9) and (29.10) also have a geometric significance:

THEOREM 29.4 Let $f$ be a function such that $\|\operatorname{grad} f\|$ is a (nonzero) function of $f$. Then $\Delta(f)$ is a function of $f$ if and only if the mean curvature of the level surfaces of $f$ is constant on each surface.

Proof. We see from the proof of Lemma 29.3 that we may change $f$ by a function of one variable, $f \to F(f)$. In particular, we may suppose $\|\operatorname{grad} f\|^2 = \pm 1$; hence we may choose an orthonormal moving frame $(X_1, \ldots, X_n)$ such that

$$\operatorname{grad} f = X_1.$$

Now, by (29.6), $\Delta(f) = \sum_i e_i \langle \nabla_{X_i} X_1, X_i \rangle$, and $\langle \nabla_{X_1} X_1, X_1 \rangle = \tfrac{1}{2} X_1(\langle X_1, X_1 \rangle) = 0$. Hence,

$$\Delta(f) = \sum_{i \ge 2} e_i \langle \nabla_{X_i} X_1, X_i \rangle = -\sum_{i \ge 2} e_i \langle X_1, \nabla_{X_i} X_i \rangle.$$

The right-hand side is now, up to sign, precisely the mean curvature of the level surface $f$ = constant, that is, the sum of the eigenvalues of the second fundamental form.

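Green's Formula

Suppose now that $M$ is a manifold with a boundary hypersurface $\partial M = N$ and a pseudo-Riemannian metric, denoted by $\langle\ ,\ \rangle$. We shall be using the concepts involving integration that were introduced in Chapter 7. Let $f, g$ be functions on $M$. We suppose that $M$ is orientable and that $\theta$ ($= \omega_1 \wedge \cdots \wedge \omega_n$, in terms of orthonormal moving frames) is the volume-element differential form defined by the metric. Then

$$g\,\Delta(f)\,\theta = g \operatorname{grad} f(\theta) = g\, d(\operatorname{grad} f \,\lrcorner\, \theta) = d(g \operatorname{grad} f \,\lrcorner\, \theta) - dg \wedge (\operatorname{grad} f \,\lrcorner\, \theta).$$

$$dg \wedge (\operatorname{grad} f \,\lrcorner\, \theta) = (\operatorname{grad} f \,\lrcorner\, dg)\,\theta = \langle \operatorname{grad} f, \operatorname{grad} g \rangle\,\theta.$$

This is symmetric in $g$ and $f$; hence

$$(f\,\Delta(g) - g\,\Delta(f))\,\theta = d(f \operatorname{grad} g \,\lrcorner\, \theta - g \operatorname{grad} f \,\lrcorner\, \theta).$$

Integrating over $M$ and using Stokes' formula on the right-hand side, we have

$$\int_M (f\,\Delta(g) - g\,\Delta(f))\,\theta = \int_N f(\operatorname{grad} g \,\lrcorner\, \theta) - g(\operatorname{grad} f \,\lrcorner\, \theta). \tag{29.13}$$

Suppose now that $X_N$ is a unit-length vector field perpendicular to $N$, pointing into $M$ from $N$. Then $X_N \,\lrcorner\, \theta$ restricted to $N$ is the volume element form on $N$, which we denote by $\theta_N$. If $X$ is any vector perpendicular to $X_N$, notice that $X \,\lrcorner\, \theta$ is zero restricted to $N$. Now $\operatorname{grad} f - X_N(f) X_N$ is perpendicular to $X_N$. Hence

$$\operatorname{grad} f \,\lrcorner\, \theta = X_N(f)\,\theta_N \quad \text{restricted to } N.$$

Equation (29.13) now becomes

$$\int_M (f\,\Delta(g) - g\,\Delta(f))\,\theta = \int_N (f X_N(g) - g X_N(f))\,\theta_N. \tag{29.14}$$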

This is Green's formula, from which many integral formulas and uniqueness theorems can be proved. We shall refer to a textbook on partial differential equations for detail on these applications (Garabedian [1]). As an illustration, we shall consider integral formulas obtained by inserting for $g$ a fundamental solution of the Laplace-Beltrami equation $\Delta = 0$. Let $p$ be a point of $M$. A function $g$ that satisfies $\Delta(g) = 0$ on $M - (p)$, but has a singularity at $p$, with

$$\int_M f\,\Delta(g)\,\theta = f(p) \quad \text{for all } f \in C^\infty(M), \tag{29.15}$$

is called a fundamental solution. Symbolically, $\Delta(g) = \delta_p$. Suppose that $\Delta(f) = 0$. Then, from (29.14),

$$f(p) = \int_N [f X_N(g) - g X_N(f)]\,\theta_N. \tag{29.16}$$

This can be interpreted as a "mean-value" formula expressing the value of $f$ at $p$ in terms of the values of $f$ and its normal derivative over the boundary of $M$. Suppose now that $g$ = constant on $N$. Now, inserting $g = 1$ in (29.14), we have

$$\int_N X_N(f)\,\theta_N = 0.$$

Hence, the second term in (29.16) drops out, and we have

$$f(p) = \int_N f X_N(g)\,\theta_N, \tag{29.17}$$

which is a "mean-value formula" for $f(p)$. For example, in Euclidean space it is readily verified that the fundamental solution can be chosen so as to be constant on concentric spheres about $p$, giving the ordinary mean-value formula for harmonic functions.

30

Characteristics and Shock Waves

We shall give only a sketchy treatment of the subject matter indicated in the title. A full-scale exposition would involve us in a good deal of the theory of partial differential equations and applied mathematics, and hence would require another book. Our starting point is a brilliant but little-known exposition of the theory of shock waves given by Levi-Civita [2]. The main idea can be described very easily in terms of manifolds. Let M and E be manifolds, and let (b be a mapping of M + E. We shall suppose that, in local coordinates for M and E, 4 satisfies a given system P of partial differential equations of order r.

Definition A submanifold N of M is a characteristic submanifold for the system P of partial differential equations if there is a map 4 : M + E such that:

(a) $\phi$ restricted to $M - N$ is differentiable to all orders and is a solution of the system $P$.

(b) $\phi$ is of class $C^{r-1}$ on all of $M$, but not all the $r$th derivatives of $\phi$ are continuous at each point of $N$.

(c) The limits of the $r$th-order derivatives of $\phi$ exist along curves in $M - N$ that approach points of $N$. (This condition will be made more precise below.)

A map $\phi: M \to E$ satisfying (a) through (c) will be called a shock solution of $P$, with $N$ as its submanifold of discontinuity. In addition to giving this general definition (not precisely in this language, of course), Levi-Civita points out how the differential equation for the characteristic submanifolds, and the jump conditions for the $r$th derivatives of $\phi$ on $N$, can be obtained as a consequence of "geometrical-dynamical compatibility conditions," which are obtained by combining the compatibility relations implied by (a) through (c) with the fact that $\phi$ "solves" $P$. Rather than carry out the details of this program in full generality, we propose in this chapter to concentrate on what seem to be the most important cases for geometric and physical applications, namely, where $E$ is a vector bundle over $M$ and in addition $\phi$ is a cross section. We shall be especially interested in the case where $E$ is the tangent bundle (that is, the $\phi$ are vector fields, which we like to denote by such letters as $X, Y, \ldots$). It will be almost obvious how to extend from this case to the most general vector bundles; hence we shall not pursue the more general directions.

The Geometric Compatibility Conditions for Vector Fields

Let $M$ be a manifold and let $N$ be a submanifold of $M$. All data will be $C^\infty$ unless mentioned otherwise. Let $X$ be a vector field on $M$ which satisfies the following conditions:

$X$ is continuous (as a cross-section map: $M \to T(M)$) on $M$. (30.1a)

$X$ is $C^\infty$ on $M - N$. (30.1b)

Let $\sigma(t)$ and $\sigma_1(t)$, $0 \le t \le 1$, be curves in $M$, with $\sigma(t)$ and $\sigma_1(t) \in M - N$ for $0 < t < 1$, and $\sigma(0) = \sigma_1(0) = p \in N$. Let $Y$ be a ($C^\infty$) vector field on $M$. Suppose that
$$\lim_{t \to 0}\, [Y, X](\sigma(t))$$
exists. We shall call it $\delta(Y, X, \sigma)$. It is an element of $M_{\sigma(0)}$. Let $f \in F(M)$. Then
$$\delta(fY, X, \sigma) = f(p)\,\delta(Y, X, \sigma) - X(f)(p)\,Y(p). \tag{30.2}$$

From (30.2) we have

LEMMA 30.1. $\delta(Y, X, \sigma) - \delta(Y, X, \sigma_1)$ depends only on the value of $Y$ at $p$.

If, for $v \in M_p$, we put
$$\delta_p(v) = \delta(Y, X, \sigma) - \delta(Y, X, \sigma_1), \tag{30.3}$$
where $Y$ is a vector field such that $Y(p) = v$, we conclude that $\delta_p(v)$ depends linearly on $v$. Thus, if we are given a smooth family of pairs of curves, one pair for each point of $N$, $v \to \delta_p(v)$ can be considered as a tensor field on $N$ which measures the jump in the first derivatives of $X$ across $N$. We aim now to find the compatibility conditions that result from the fact that $X$ itself is continuous across $N$.

Suppose that $Z$ is a vector field on $M$ that is tangent to $N$. For $p \in M$, let $s \to \exp(sZ)p$ be the integral curve of $Z$ starting at $p$. Let $\omega$ be a differential 1-form on $M$. Then, for each $s$,
$$\lim_{t \to 0} \omega\bigl(X(\exp(sZ)\sigma(t))\bigr) = \lim_{t \to 0} \omega\bigl(X(\exp(sZ)\sigma_1(t))\bigr).$$
Take the derivative with respect to $s$ of both sides of this relation, assume that it is permissible to interchange limit and derivative, and set $s = 0$. The result is
$$\lim_{t \to 0} \bigl[ Z(\omega)(X(\sigma(t))) + \omega([Z, X](\sigma(t))) \bigr] = \lim_{t \to 0} \bigl[ Z(\omega)(X(\sigma_1(t))) + \omega([Z, X](\sigma_1(t))) \bigr].$$
($Z(\omega)$ denotes the Lie derivative of $\omega$ by the vector field $Z$, which is again a 1-form.) The first terms on both sides of this relation cancel each other, since $X$ is continuous across $N$. Since $\omega$ is an arbitrary differential form, we have proved the following.

THEOREM 30.2. With the assumption listed above, $\delta_p(v)$ is zero when $v$ is tangent to $N$. (This condition may be referred to as the geometric compatibility condition.)

The extension to the case where $X$ has discontinuities of the $r$th derivatives across $N$, but is $C^{r-1}$ on all of $M$, can be made similarly. For any choice $(Z_1, \ldots, Z_{r-1})$ of vector fields, $[Z_1, [Z_2, \ldots [Z_{r-1}, X]] \ldots]$ is continuous across $N$, but its derivatives have a jump across $N$. Thus we can define, for $v_1, \ldots, v_r \in M_p$, $\delta_p(v_1, \ldots, v_r)$ by choosing vector fields $(Z_1, \ldots, Z_r)$ whose value at $p$ is $(v_1, \ldots, v_r)$, and letting
$$\delta_p(v_1, \ldots, v_r) = \lim_{t \to 0}\, [Z_1, \ldots [Z_r, X] \ldots](\sigma(t)) - \lim_{t \to 0}\, [Z_1, \ldots [Z_r, X] \ldots](\sigma_1(t)).$$
We see immediately that $\delta_p$ depends tensorially and symmetrically on $v_1, \ldots, v_r$, and that $\delta_p(v_1, \ldots, v_r) = 0$ whenever one of the $v_i$ is tangent to $N$.
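As an illustration of the geometric compatibility condition, here is a minimal numerical sketch. Everything in it is an assumption chosen for the example and is not taken from the text: the manifold is the plane, $N$ is the line $x = 0$, the vector field is $X = |x|\,\partial/\partial y$, and the helper names are hypothetical. For a constant field $Y$ the bracket $[Y, X]$ reduces to the directional derivative of the components of $X$ along $Y$, which is approximated by central differences on each side of $N$.

```python
# Jump of the first derivatives of X = (0, |x|) across N = {x = 0} in the plane.
# All names here are illustrative assumptions, not notation from the text.

def X(x, y):
    """Continuous vector field on the plane, smooth away from the line x = 0."""
    return (0.0, abs(x))

def bracket_with_constant(Y, point, h=1e-6):
    """[Y, X] at `point` for a constant field Y.

    For constant Y the bracket is the directional derivative of the
    components of X along Y, approximated by a central difference.
    """
    x, y = point
    a, b = Y
    fwd = X(x + h * a, y + h * b)
    bwd = X(x - h * a, y - h * b)
    return tuple((f - g) / (2 * h) for f, g in zip(fwd, bwd))

def jump(Y, p=(0.0, 0.0), eps=1e-3):
    """Approximate delta_p(Y): value on the side x > 0 minus value on the side x < 0."""
    right = bracket_with_constant(Y, (p[0] + eps, p[1]))
    left = bracket_with_constant(Y, (p[0] - eps, p[1]))
    return tuple(r - l for r, l in zip(right, left))

jump_normal = jump((1.0, 0.0))   # Y = d/dx, transversal to N
jump_tangent = jump((0.0, 1.0))  # Y = d/dy, tangent to N
print(jump_normal, jump_tangent)
```

In this example the jump is nonzero only for the transversal direction, in agreement with Theorem 30.2.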

The Dynamic Compatibility Conditions

In general, the "dynamic compatibility conditions" are those conditions imposed on the tensor field $\delta$ constructed in the preceding section by the condition that $X$ satisfy a certain $r$th-order system of differential equations; hence, in particular, certain functions of the $r$th-order derivatives of $X$ must be continuous across $N$. A common type of differential equation to require of a vector field $X$ is that the Lie derivative of a tensor field on $M$ be zero. For example, we shall examine the cases where the tensor field is a $p$-differential form $\theta$. Suppose, then, that $X(\theta)$ is continuous across $N$. Suppose that $Y_1, \ldots, Y_p$ are $C^\infty$ vector fields on $M$. Then $X(\theta)(Y_1, \ldots, Y_p)$ is continuous across $N$. But
$$X(\theta)(Y_1, \ldots, Y_p) = X(\theta(Y_1, \ldots, Y_p)) - \theta([X, Y_1], Y_2, \ldots, Y_p) - \cdots - \theta(Y_1, \ldots, [X, Y_p]).$$
Now $X(\theta(Y_1, \ldots, Y_p))$ is continuous across $N$, since $X$ is. Thus we have proved

THEOREM 30.3. If $\theta$ is a $p$-form such that $X(\theta)$ is continuous across $N$, then for $v_1, \ldots, v_p \in M_p$,
$$\theta(\delta_p(v_1), v_2, \ldots, v_p) + \cdots + \theta(v_1, \ldots, v_{p-1}, \delta_p(v_p)) = 0.$$
In particular, if $\theta$ is a nonzero $m$-form ($m = \dim M$), then the linear transformation $v \to \delta_p(v)$ of $M_p \to M_p$ has trace zero.
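A small worked example may help to see the trace condition at work. It is only a sketch under assumptions that are not in the text: $M = R^2$ with coordinates $(x, y)$, $N = \{x = 0\}$, $\theta = dx \wedge dy$, and two hypothetical vector fields $X_1$, $X_2$ that are continuous on $M$ and smooth off $N$. The jumps are taken from the side $x > 0$ minus the side $x < 0$.

```latex
\begin{align*}
X_1 &= |x|\,\frac{\partial}{\partial y}: &
\delta_p\!\left(\tfrac{\partial}{\partial x}\right) &= 2\,\frac{\partial}{\partial y}, &
\delta_p\!\left(\tfrac{\partial}{\partial y}\right) &= 0, &
\operatorname{tr}\delta_p &= 0, &
X_1(\theta) &= (\operatorname{div} X_1)\,\theta = 0, \\
X_2 &= |x|\,\frac{\partial}{\partial x}: &
\delta_p\!\left(\tfrac{\partial}{\partial x}\right) &= 2\,\frac{\partial}{\partial x}, &
\delta_p\!\left(\tfrac{\partial}{\partial y}\right) &= 0, &
\operatorname{tr}\delta_p &= 2, &
X_2(\theta) &= \operatorname{sgn}(x)\,\theta .
\end{align*}
```

Both fields satisfy the geometric compatibility condition, but only $X_1$ has $X(\theta)$ continuous across $N$, and only for $X_1$ does the jump tensor have trace zero.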

Example

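$M$ has a pseudo-Riemannian metric, defined by an inner product $\langle\ ,\ \rangle$; $X = \operatorname{grad} f$ for $f \in F(M)$, that is, $\langle X, Y \rangle = Y(f)$ for $Y \in V(M)$; $\theta$ is the volume element differential form defined by the metric; $N$ is a hypersurface of $M$. Now $X(\theta) = (\operatorname{div} X)\theta$, where $\operatorname{div} X$ is the divergence of the vector field $X$. Hence $\Delta(f)$, the Laplacian of $f$, is $\operatorname{div}(\operatorname{grad} f)$. Suppose that we require that $\Delta(f)$ be continuous across $N$.

LEMMA 30.4. With $X = \operatorname{grad} f$, we have that $\langle \delta_p(u), v \rangle = \langle u, \delta_p(v) \rangle$.

Proof. Let $Y, Z$ be vector fields on $M$, and let $\nabla$ be the Levi-Civita connection of the metric. Then
$$[Y, X] = \nabla_Y X - \nabla_X Y,$$
$$\langle \nabla_Y X, Z \rangle = Y(\langle X, Z \rangle) - \langle X, \nabla_Y Z \rangle = Y(Z(f)) - \langle X, \nabla_Y Z \rangle.$$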

We see, then, that the only continuity jumps in $\langle [Y, X], Z \rangle$ occur in the contribution of $Y(Z(f))$, which is equal to $Z(Y(f)) + [Y, Z](f)$. Since $[Y, Z]$ is a first-order operator, we have the lemma.

Now suppose that $g$ is a function on $M$ such that $dg \ne 0$ at every point of $N$, but $g = 0$ on $N$. Then, for $p \in N$, $\operatorname{grad} g$ is perpendicular to $N_p$. We shall show that: If the first derivatives of $X$ are discontinuous across $N$ at every point of $N$, then $\operatorname{grad} g$ must lie in $N_p$ for each $p \in N$. In particular, $\langle \operatorname{grad} g, \operatorname{grad} g \rangle = 0$ on $N$.

Suppose otherwise: That is, for some $p \in N$, $\operatorname{grad} g(p)$ is linearly independent from $N_p$, so that $N_p$ and $\operatorname{grad} g(p)$ together span $M_p$. By Theorem 30.3, the trace of $\delta_p$ must be zero; the geometric compatibility conditions require that $\delta_p(N_p) = 0$. These two facts require that $\delta_p(\operatorname{grad} g(p)) \in N_p$. Then
$$\langle \delta_p(\operatorname{grad} g(p)), \operatorname{grad} g(p) \rangle = 0.$$
But also, by Lemma 30.4,
$$\langle \delta_p(\operatorname{grad} g(p)), N_p \rangle = 0.$$
Since $\langle\ ,\ \rangle$ is nondegenerate, $\delta_p(\operatorname{grad} g(p))$ must be zero, which implies that $\delta_p(M_p) = 0$, contradicting that the first derivatives of $X$ have a jump across $N$. In other words, we have proved the following theorem:

THEOREM 30.5. Let $\Delta$ be the Laplace-Beltrami operator associated with a pseudo-Riemannian manifold. Consider the partial differential equation $f \to \Delta(f) + \cdots$. (The dots indicate terms of lower order than the second.) Let $g$ be a function on $M$ whose level surfaces are characteristics for the equation in the sense that there are shock solutions for which they are the surfaces of discontinuity for the second derivatives. Then the length of $\operatorname{grad} g$ is zero; that is, $\operatorname{grad} g$ is a lightlike vector field on $M$.
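The simplest instance of Theorem 30.5 is the one-dimensional wave equation. The sketch below is only an illustration under assumptions not made in the text: the manifold is the $(t, x)$ plane with the flat metric $dt^2 - dx^2$, so that the Laplace-Beltrami operator is $f_{tt} - f_{xx}$, and the hypothetical function $f(t, x) = h(t - x)$ is $C^1$ with second derivatives that jump across the line $g = t - x = 0$.

```python
# Shock solution of f_tt - f_xx = 0 with discontinuity surface g = t - x = 0.
# The profile h and the sample points are assumptions made for this sketch.

def h(s):
    """C^1 profile whose second derivative jumps from 0 to 1 at s = 0."""
    return 0.5 * s * s if s > 0 else 0.0

def f(t, x):
    """Function on the (t, x) plane, constant along the lines t - x = const."""
    return h(t - x)

def second_derivs(t, x, step=1e-3):
    """Central-difference approximations of f_tt and f_xx at (t, x)."""
    f_tt = (f(t + step, x) - 2 * f(t, x) + f(t - step, x)) / step**2
    f_xx = (f(t, x + step) - 2 * f(t, x) + f(t, x - step)) / step**2
    return f_tt, f_xx

ftt_plus, fxx_plus = second_derivs(1.0, 0.5)    # a point with g > 0
ftt_minus, fxx_minus = second_derivs(0.5, 1.0)  # a point with g < 0

wave_plus = ftt_plus - fxx_plus      # wave operator on the side g > 0
wave_minus = ftt_minus - fxx_minus   # wave operator on the side g < 0
jump_ftt = ftt_plus - ftt_minus      # jump of a second derivative across g = 0

# dg = dt - dx, so the squared length of grad g in the metric dt^2 - dx^2 is:
norm_sq_grad_g = 1.0**2 - (-1.0)**2
print(wave_plus, wave_minus, jump_ftt, norm_sq_grad_g)
```

The equation holds on both sides of the line, a second derivative jumps across it, and $\operatorname{grad} g$ has length zero, as the theorem requires.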

Shock Conditions for Tensor Fields Occurring in Classical Continuum Physics

Let $M$ be a manifold with an affine connection $\nabla$. For simplicity, we again suppose that $\nabla$ has a zero torsion tensor. Let $T$ be a tensor field of type $(1, 1)$ on $M$. This means that $T$ is an $F(M)$-linear map of $V(M) \to V(M)$. Let $Y \in V(M)$. The covariant derivative of $T$ by $Y$, denoted by $\nabla_Y T$, is another tensor field of the same algebraic type as $T$. Explicitly,
$$\nabla_Y T(X) = \nabla_Y(T(X)) - T(\nabla_Y X) \quad \text{for } X \in V(M). \tag{30.4}$$
(The reader will verify that $\nabla_Y T$ so defined is $F(M)$-linear on $X$; hence it really does define a tensor field on $M$.) Hold $X$ and a point $p \in M$ fixed: $v \to \nabla_v T(X)$ defines a linear transformation of vectors at $p$ into vectors at $p$. The trace of this linear transformation is a number depending linearly on the value of $X$ at $p$. As $p$ and $X$ vary, we get a differential form on $M$ called the divergence of $T$, denoted by $\operatorname{div}(T)$. To get an explicit formula for $\operatorname{div} T$, choose a basis $(X_i)$ for vector fields on $M$ ($1 \le i, j, \ldots \le n = \dim M$). Let $(\omega_i)$ be a dual basis for 1-forms on $M$; that is, $\omega_i(X_j) = \delta_{ij}$. Then (summation over the repeated index $i$ being understood)
$$\operatorname{div} T(X) = \omega_i(\nabla_{X_i}(T)(X)) \quad \text{for } X \in V(M). \tag{30.5}$$
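For readers who prefer components, formula (30.5) can be unwound in a local coordinate system. The following is a sketch under assumptions that are not part of the text: $X_i = \partial/\partial x^i$ is a coordinate basis, $\omega_i = dx^i$, the components are defined by $T(\partial_j) = T^i{}_j\,\partial_i$, and the connection coefficients by $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k$.

```latex
\begin{align*}
(\nabla_{\partial_i} T)(\partial_j)
  &= \nabla_{\partial_i}\bigl(T^k{}_j\,\partial_k\bigr) - T\bigl(\Gamma^l_{ij}\,\partial_l\bigr)
   = \bigl(\partial_i T^k{}_j + \Gamma^k_{il}\,T^l{}_j - \Gamma^l_{ij}\,T^k{}_l\bigr)\,\partial_k , \\
\operatorname{div} T(\partial_j)
  &= dx^i\bigl((\nabla_{\partial_i} T)(\partial_j)\bigr)
   = \partial_i T^i{}_j + \Gamma^i_{il}\,T^l{}_j - \Gamma^l_{ij}\,T^i{}_l .
\end{align*}
```

In a flat space with affine coordinates the $\Gamma$ terms vanish, and $\operatorname{div} T$ reduces to the familiar $\partial_i T^i{}_j$ of elasticity theory.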

Tensor fields of the type $T$ can occur in a very fundamental way in classical continuum physics. This tensor operation of "divergence" is the basic one appearing in the partial differential equations of continuum physics. We shall not attempt here to explain why this is so. It seems to be built into the geometric assumptions underlying our understanding of the "continuum." Brillouin's book [1] offers the reader the most convincing detailed explanation of this fact, at least for elasticity theory. Many books on general relativity offer more general explanations. (The books by Levi-Civita [1] and Einstein [1] are the best from this point of view.) Indeed, that this tensor operation of "divergence" occurs in the same way in all branches of classical continuum mechanics was one of the main mathematical clues in Einstein's mind in constructing the general theory of relativity.

Suppose, then, that $T$ is such a tensor field on $M$, that $N$ is a hypersurface of $M$ such that $T$ is $C^\infty$ on $M - N$, but is merely continuous on all $M$. For each $X \in V(M)$, $T(X)$ is then a vector field that is $C^\infty$ on $M - N$, but which is only continuous on $M$. Hence, for $p \in N$, for two curves $\sigma$ and $\sigma_1$ starting at $p$ but pointing out into $M - N$, for $v \in M_p$, we can define $\delta_{T(X)}(v) \in M_p$ as before, measuring the jump in the first derivatives of $T(X)$ across $N$. To make things more symmetric, we can define $\delta_T: M_p \times M_p \to M_p$ by setting
$$\delta_T(X(p), v) = \delta_{T(X)}(v).$$
Explicitly, then, for $X, Y \in V(M)$,
$$\delta_T(X(p), Y(p)) = \lim_{t \to 0} \bigl\{ [Y, T(X)](\sigma(t)) - [Y, T(X)](\sigma_1(t)) \bigr\}. \tag{30.6}$$

Now
$$[Y, T(X)] = \nabla_Y(T(X)) - \nabla_{T(X)} Y = (\nabla_Y T)(X) + T(\nabla_Y X) - \nabla_{T(X)} Y.$$
Since $X$ and $Y$ are $C^\infty$ vector fields, and $T$ itself is continuous across $N$, the last two terms contribute nothing to (30.6); hence
$$\delta_T(X(p), Y(p)) = \lim_{t \to 0} \bigl\{ (\nabla_Y T)(X)(\sigma(t)) - (\nabla_Y T)(X)(\sigma_1(t)) \bigr\}. \tag{30.7}$$
The geometric compatibility conditions are then
$$\delta_T(M_p, N_p) = 0 \quad \text{for all } p \in N. \tag{30.8}$$
Suppose that $\operatorname{div} T$ is continuous across $N$. Then, by (30.7), the trace of the transformation $v \to \delta_T(u, v)$ is zero for fixed $u \in M_p$. This, together with (30.8), implies that
$$\delta_T(M_p, M_p) \subset N_p \quad \text{for all } p \in N. \tag{30.9}$$
400


Geometry

To get some interesting information about N , let us suppose that M has a pseudo-Riemannian metric ( , ) for which V is the Levi-Civita affine connection, and such that (T(X), Y ) = k ( X , T( Y ) )

for X , Y

E

V(M).

(30.10)

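(In elasticity it is usual to have a ($+$) sign, while the ($-$) occurs in electromagnetism.) Suppose that $g$ is a function on $M$ such that $dg \ne 0$ at points of $N$, but that $g(N) = 0$. For $X, Y, Z \in V(M)$,
$$\langle \nabla_Y(T)(X), Z \rangle = Y(\langle T(X), Z \rangle) - \langle T(X), \nabla_Y Z \rangle - \langle T(\nabla_Y X), Z \rangle = \pm \langle X, \nabla_Y(T)(Z) \rangle. \tag{30.11}$$
In particular,
$$0 = \langle \delta_T(X, Y), \operatorname{grad} g \rangle = \pm \langle \delta_T(\operatorname{grad} g, Y), X \rangle.$$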

Summing up, we have proved:

THEOREM 30.6. Suppose that $T$ is a tensor field on $M$ of type $(1, 1)$ satisfying (30.11); that $N$ is a hypersurface on $M$ across which $T$ and $\operatorname{div} T$ are continuous. Suppose that $g$ is a function such that $g(N) = 0$. Then
$$\delta_T(\operatorname{grad} g, M_p) = 0 \quad \text{for all } p \in N. \tag{30.12}$$
Notice that the condition that $\operatorname{div} T$ be continuous is not yet sufficient to indicate a condition for $N$ which is independent of $T$. Such a condition requires additional differential equations that $T$ must satisfy, which in physical problems involve thermodynamic conditions.

31 The Morse Index Theorem

Consider a differential equation of the form
$$v''(t) + r(t)v(t) = 0. \tag{31.1}$$
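For orientation in the scalar case of (31.1), the following minimal sketch integrates the equation numerically and counts the zeros of the solution that vanishes at $t = 0$. It is an illustration only: the function name `count_zeros`, the initial slope $v'(0) = 1$, the interval length, and the constant coefficients used below are assumptions chosen for the example, and the integrator is the classical fourth-order Runge-Kutta scheme rather than anything taken from the text.

```python
# Zeros of the solution of v'' + r(t) v = 0 with v(0) = 0, v'(0) = 1.
# Names and parameter values are illustrative assumptions.

def count_zeros(r, a, steps=10000):
    """Integrate by classical RK4 on [0, a] and count sign changes of v on (0, a]."""
    dt = a / steps
    t, v, w = 0.0, 0.0, 1.0  # w stands for v'
    zeros = 0
    for _ in range(steps):
        k1v, k1w = w, -r(t) * v
        k2v, k2w = w + 0.5 * dt * k1w, -r(t + 0.5 * dt) * (v + 0.5 * dt * k1v)
        k3v, k3w = w + 0.5 * dt * k2w, -r(t + 0.5 * dt) * (v + 0.5 * dt * k2v)
        k4v, k4w = w + dt * k3w, -r(t + dt) * (v + dt * k3v)
        v_new = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        w_new = w + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        if v * v_new < 0.0:  # strict sign change between consecutive grid points
            zeros += 1
        t, v, w = t + dt, v_new, w_new
    return zeros

zeros_r1 = count_zeros(lambda t: 1.0, 10.0)  # v = sin t
zeros_r4 = count_zeros(lambda t: 4.0, 10.0)  # v = sin(2t) / 2
print(zeros_r1, zeros_r4)
```

Increasing the coefficient $r$ increases the number of zeros on a fixed interval, which is the behavior described by the classical Sturm comparison theorem.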

Classical Sturm-Liouville theory deals with such equations in which $v(t)$ and $r(t)$ are scalar-valued functions of $t$. The theory of these equations is well known, and of course they appear in many contexts in applied mathematics. However, the theory of systems of type (31.1) in which $v(t)$ is a vector-valued function of $t$ is considerably less developed and less well known, despite the fact that many physical problems lead to such equations in a very natural way, particularly in stability problems. Morse [1] has developed the foundations for a successful generalization of the classical Sturm-Liouville theory to such systems.

We shall now present enough notations and definitions to be able to state the main result: the Morse index theorem. The proof will be given later. Since it may be difficult for the reader to see the forest for the trees while reading the proof, we may point out here that the proof basically consists in putting together certain well-known analytical techniques concerning systems of second-order linear differential equations with the basic ideas of the calculus of variations. The main difference between our proof and Morse's is that we try to work directly with the infinite-dimensional linear spaces that occur, whereas Morse, by a variety of ingenious analytical and geometric tricks, tries to reduce the infinite-dimensional situation to a finite one.

Let $V$ be a vector space of finite dimension over the real numbers. Elements of $V$ will be denoted by such letters as $u, v, w, \ldots$. It will be assumed that $V$ has a given fixed, positive-definite, symmetric bilinear form $(u, v) \to \langle u, v \rangle$. Thus

0, and two sets ( W , Q ) and ( W " , p") of boundary conditions. We refer to (31.5) and (31.6) as, respectively, left- and righthand boundary conditions.


There is a problem in the calculus of variations associated with (31.4) through (31.6) that is the foundation for the Morse treatment. Proceed as follows to find it: Suppose $v(t)$, $0 \le t \le a$, satisfies (31.4) through (31.6). Then

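$$\frac{d}{dt}\bigl( \langle v'(t), w(t) \rangle - \langle v(t), w'(t) \rangle \bigr) = \langle v''(t), w(t) \rangle + \langle v'(t), w'(t) \rangle - \langle v'(t), w'(t) \rangle - \langle v(t), w''(t) \rangle = \langle -R_t(v(t)), w(t) \rangle + \langle v(t), R_t(w(t)) \rangle = 0,$$
obtained by use of the symmetry property of $R_t$. Now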

(~'(o),~ ( 0 ) -) J-V

which is equal to $fL(X, Y)$ when the right-hand side is projected mod $H$. Hence $L$ passes to the quotient with respect to the restriction mapping $H \to H_p$, and we get, for each $p \in N$, a bilinear mapping (which we again denote by $L(\ ,\ )$) of $H_p \times H_p \to M_p/N_p$. This field of bilinear mappings is called the Levi form of $N$. Explicitly, then, for $X, Y \in H$, $p \in N$,
$$L(X(p), Y(p)) = J[JX, Y](p) \ \text{projected into } M_p/N_p. \tag{32.10}$$

LEMMA 32.2. The Levi form is symmetric.

Proof. This follows from the integrability condition (32.4):
$$J[JX, Y] - J[JY, X] = [Y, X] + [JY, JX].$$
The right-hand side projects into zero when projected mod $N_p$. The left-hand side, though, is $L(X(p), Y(p)) - L(Y(p), X(p))$.

Let us examine now the consequences of the Levi form vanishing identically.

THEOREM 32.3. If the Levi form vanishes, then the field $p \to H_p$ of tangent subspaces of $T(N)$ is completely integrable. The maximal integral manifolds of this field then define a foliation of $N$ by maximal complex submanifolds. In particular, if $N$ is a hypersurface of $M$ (that is, if $\dim M = \dim N + 1$), then these complex submanifolds of $N$ are hypersurfaces in $N$; hence $N$ may be considered locally as a one-parameter family of complex-analytic hypersurfaces† of $M$. (Such objects are called hyperplanoids in the classical literature.) Conversely, if a real hypersurface of $M$ has this geometric property, then its Levi form vanishes.

Proof. To prove integrability of $p \to H_p$, we must show that $[H, H] \subset H$. If $X, Y \in H$, $L(X, Y) = 0$ if and only if $J[JX, Y]$ is tangent to $N$, hence if $[JX, Y]$ also belongs to $H$. This condition is obviously equivalent to $[H, H] \subset H$. That the maximal integral submanifolds of the field $p \to H_p$ are complex-analytic submanifolds of $M$ is clear from Theorem 32.1, since $J(H_p) = H_p$, and the tangent space to the maximal integral submanifolds is precisely $H_p$. The converse is obvious.

For the rest of this chapter, we shall concentrate on the case where $N$ is a real hypersurface. We can give a more convenient characterization of the hyperplanoids.

THEOREM 32.4. The hyperplanoids that are real-analytic are, locally, precisely the hypersurfaces that can be written as $f = 0$, where $f$ is the real part of a holomorphic function $f + \sqrt{-1}\,g = F$.

Proof. First notice that a hypersurface determined by $f = 0$ can also be written locally as the locus determined by
$$\sqrt{-1}\,F - t = 0, \quad \text{where } t \text{ is a real variable};$$
that is, the hypersurface is composed of a one-parameter family of complex-analytic hypersurfaces. Conversely, suppose that $A$ is a complex manifold of one complex dimension less than $M$, and that $\phi: A \times R \to M$ is a real-analytic submanifold mapping such that, for fixed $t$, the mapping $p \to \phi(p, t)$ of $A \to M$ is holomorphic. (This is precisely what is meant by a hyperplanoid.) We can suppose without loss in generality that $M$ is $C^n$ itself, and that $A$ is $C^{n-1}$. Since $\phi$ is real-analytic, we can extend $\phi$ to a mapping of $C^{n-1} \times C \to C^n$ by extending $t$ to be a complex variable. The condition that $\phi$ be a submanifold map requires that this extended map of $C^{n-1} \times C \to C^n$ have nonzero Jacobian. Then, by the implicit function theorem, there is (always, locally, of course) an inverse holomorphic map $C^n \to C^{n-1} \times C$. Following this map by the projection $C^{n-1} \times C \to C$, we obtain a holomorphic function $F$ on $C^n$, that is, on $M$. The image of $N$ in $M$ is characterized by the condition that $F$ take real values on $N$; that is, $N$ is obtained by setting the real part of $\sqrt{-1}\,F$ equal to zero. Q.E.D.

† There is a constant confusion in the terminology of complex manifold theory between real and complex dimension. A "complex-analytic hypersurface" is a complex submanifold of two less real dimensions.

All through this chapter there has been, in the background, an analogy with the theory of submanifolds of affinely connected spaces that was built up in Chapter 27, with the Levi form analogous to the second fundamental form. We can now make this analogy more explicit.

THEOREM 32.5. Suppose that, in addition to the complex structure, $M$ has an affine connection $\nabla$ with zero torsion tensor such that the covariant derivative of the $J$-tensor defining the complex structure is zero. Let $N$ be a submanifold of $M$, let $S(\ ,\ )$ be its second fundamental form with respect to the affine connection, and let $L(\ ,\ )$ be its Levi form with respect to the complex structure. Then
$$L(u, v) = S(u, v) + S(Ju, Jv) \quad \text{for } u, v \in H_p,\ p \in N. \tag{32.11}$$

Proof. The condition that the covariant derivative of the $J$-tensor be zero is explicitly
$$\nabla_X J(Y) = J\nabla_X Y \quad \text{for } X, Y \in V(M).$$
The torsion-free condition is $\nabla_X Y - \nabla_Y X = [X, Y]$. Then, for $X, Y \in H$,
$$J[JX, Y] = J(\nabla_{JX} Y - \nabla_Y JX) = \nabla_{JX} JY + \nabla_Y X.$$
Taking the value of both sides at $p \in N$ and projecting mod $N_p$ gives (32.11). Q.E.D.

The simple formula suggests a relation between "affine convexity" and "pseudoconvexity." We shall discuss this only in case $N$ is a hypersurface, so that $\dim M_p/N_p$ is 1. Identifying it with $R$, recall that $N$ was said to be convex if $S: N_p \times N_p \to R$ kept a fixed sign. Notice then that $L: H_p \times H_p \to R$ also keeps a fixed sign: This property of $L$ is called "pseudoconvexity." Notice, for example, that the flat affine connection on $R^{2n}$ satisfies the hypotheses of Theorem 32.5 when $R^{2n}$ is identified with $C^n$, so that a hypersurface of Euclidean space that is convex in the usual sense is pseudoconvex. This analogy can be pursued much further, but this would involve us in another book describing the theory of functions of several complex variables.
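Formula (32.11) can be checked by hand on the round sphere. The computation below is a sketch under assumptions not spelled out in the text: $M = C^n = R^{2n}$ with its flat connection and Euclidean inner product $\langle\ ,\ \rangle$, $N$ is the sphere $|x| = R$, the second fundamental form is taken to be the normal component of $\nabla_X Y$ for $X, Y$ tangent to $N$, and $M_p/N_p$ is identified with the normal line spanned by the position vector $x$.

```latex
\begin{align*}
\langle \nabla_X Y, x \rangle
  &= X\langle Y, x \rangle - \langle Y, \nabla_X x \rangle
   = -\langle Y, X \rangle
   && (\langle Y, x \rangle = 0 \text{ on } N,\ \nabla_X x = X), \\
S(u, v) &= -\frac{\langle u, v \rangle}{R^2}\, x , \\
L(u, u) &= S(u, u) + S(Ju, Ju)
   = -\frac{|u|^2 + |Ju|^2}{R^2}\, x
   = -\frac{2\,|u|^2}{R^2}\, x
   && (J \text{ is an isometry}).
\end{align*}
```

Thus $L(u, u)$ keeps a fixed sign and vanishes only for $u = 0$, so the sphere is pseudoconvex with a nowhere-zero Levi form.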

The last topic we shall touch on is the question of the geometric nature of the "domain of holomorphy" idea. To someone who is familiar only with the classical theory of holomorphic functions of one complex variable, the theory for several variables seems very bewildering, since many of the general principles that are familiar in the one-variable case are completely different in the case of several variables. For example, any domain in $C^1$ is a "domain of holomorphy"; that is, there are holomorphic functions in the domain that cannot be extended to be holomorphic in any larger domain. The situation is completely different in $C^n$ for $n \ge 2$. The simplest example was pointed out by Hartogs, namely: Any function that is holomorphic in the region between two concentric spheres can be extended to a holomorphic function in the interior of the bigger sphere. It turns out that the geometric key to this phenomenon is that the sphere is strongly convex; hence it is also pseudoconvex. (Notice that "pseudoconvexity" for real curves in $C^1$ makes no sense, since $H_p$ must always be zero in this case.) Now the most definitive results along these lines have been proved by Grauert, Kohn, and H. Rossi. We cite Kohn [1] for further details. Again, these involve analytical techniques that transcend the scope of this book; we shall present only a simple remark that gives some intuitive geometric insight into their results.

THEOREM 32.6. Let $N$ be a real hypersurface of a complex manifold $M$. Suppose that the Levi form of $N$ is nonzero at each point of $N$. Let $f$ be a function on $M$ that is the real part of a holomorphic function. Then the derivatives of $f$ at points of $N$ in directions normal to $N$ are determined by derivatives of $f$ in directions tangential to $N$. In particular, $f$ cannot be constant on $N$ unless it is identically constant on $M$.

Proof. Let $X \in H$. By hypothesis, for each $p \in N$ we can choose $X$ so that $L(X(p), X(p))$ is not tangent to $N$. Then $J[JX, X]$ is not tangent to $N$ at $p$; hence, also in a certain neighborhood of $p$. Then any vector field $Z$ in a neighborhood of $p$ can, after multiplication by a factor, be written as $J[JX, X] + Y$, where $Y$ is tangent to $N$. Suppose $f + \sqrt{-1}\,g$ is holomorphic on $M$; that is, $f$ and $g$ satisfy (32.1). Then
$$Z(f) = J[JX, X](f) + Y(f) = -[JX, X](g) + Y(f) = X(JX)(g) - (JX)(X)(g) + Y(f) = X(X)(f) + (JX)(JX)(f) + Y(f).$$
The left-hand side involves a derivative of $f$ in a normal direction to $N$, while the right-hand side involves derivatives that are in directions tangent to $N$. The argument can be iterated to show that all normal derivatives can be so expressed. Q.E.D.

33 Mechanics on Riemannian Manifolds

Recall from our short exposition (in Chapter 11) of classical mechanics of particles and waves that many things are closely related to the geometry of Euclidean spaces. Now that we have acquired more experience with Riemannian geometry, it is interesting to extend these ideas to Riemannian spaces. This is more than an academic exercise. Some ideas become simpler when looked at from such a Riemannian standpoint, and some (for example, constrained motion) must inevitably involve Riemannian geometric ideas, even if the classical treatments manage to disguise this point rather well. Arnold has recently made this point [2], and some of our work will follow his ideas.

Newton's Equations on an Affinely Connected Manifold

Let $M$ be a manifold, with an affine connection, defined by a covariant-derivative operation $\nabla$. If $\sigma: t \to \sigma(t)$ is a curve in $M$, and if $v: t \to v(t)$ is a vector field on $\sigma$, then $\nabla v(t)$ denotes the covariant derivative of the vector field. Recall how this is defined. Suppose $X, Y$ are vector fields on $M$ such that
$$\sigma'(t) = X(\sigma(t)), \qquad v(t) = Y(\sigma(t)).$$
Then
$$\nabla v(t) = \nabla_X Y(\sigma(t)).$$

This immediately enables us to formulate Newton's equations. Let $F$, a "force field," be a map: $T(M) \times R \to T(M)$. Newton's equations with this force field are
$$\nabla m(\sigma'(t)) = F(\sigma'(t), t) \tag{33.1}$$
($m$ is a tensor field on $M$ such that its value at a point $p$ is a linear map: $M_p \to M_p$. It will be called the mass tensor).

D'Alembert's principle also makes sense on an affinely connected manifold. Let $N$ be a submanifold of $M$, and let $p \to N_p^{\perp}$ be a field of transversal subspaces of $M_p$, defined for $p \in N$, such that
$$M_p = N_p \oplus N_p^{\perp} \quad \text{for } p \in N.$$
Suppose a "particle" with mass tensor $m$ moves under a force law $F$, with the additional condition that it is constrained to be on $N$. "D'Alembert's principle" now prescribes that the forces of constraint be in the transversal direction defined by $N^{\perp}$, that is,
$$\nabla m(\sigma'(t)) - F(\sigma'(t), t) \in N^{\perp}_{\sigma(t)}. \tag{33.2}$$
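A familiar special case shows what (33.2) says. The computation below is a sketch under assumptions not made in the text: $M = R^3$ with its flat connection, $N$ is the sphere $|x| = R$, $N_p^{\perp}$ is the radial line, the mass tensor is the identity, and $F = 0$. Condition (33.2) then says that $\sigma''(t)$ is radial, $\sigma''(t) = \lambda(t)\,\sigma(t)$, and the multiplier $\lambda$ is found by differentiating the constraint twice.

```latex
\begin{align*}
\langle \sigma, \sigma \rangle = R^2
  &\;\Longrightarrow\; \langle \sigma', \sigma \rangle = 0
   \;\Longrightarrow\; \langle \sigma'', \sigma \rangle + \langle \sigma', \sigma' \rangle = 0 , \\
\sigma'' = \lambda\,\sigma
  &\;\Longrightarrow\; \lambda = -\frac{|\sigma'|^2}{R^2},
  \qquad
  \sigma''(t) = -\frac{|\sigma'(t)|^2}{R^2}\,\sigma(t) , \\
\frac{d}{dt}\,|\sigma'|^2
  &= 2\,\langle \sigma'', \sigma' \rangle = 2\lambda\,\langle \sigma, \sigma' \rangle = 0 .
\end{align*}
```

The speed is therefore constant and the particle moves along a great circle; the right-hand side of the second line is the force of constraint, which is transversal to $N$ as D'Alembert's principle requires.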

In Chapter 11, we described a method for writing these equations in more explicit form. We can now show how this is done from the affine connection point of view. Recall (Chapter 28) that the induced affine connection $\nabla'$ is defined on $N$ as follows: If $v: t \to v(t)$ is a vector field along $\sigma$ tangent to $N$, then $\nabla' v(t)$ = projection of $\nabla v(t)$ on $N_{\sigma(t)}$; that is,
$$\nabla v(t) - \nabla' v(t) = S(\sigma'(t), v(t)) \in N^{\perp}_{\sigma(t)}, \tag{33.3}$$
where $S$ is the second fundamental form of the submanifold. Apply this to (33.2): We obtain the equations
$$\nabla' m(\sigma'(t)) + S(\sigma'(t), m(\sigma'(t))) - F \in N^{\perp}_{\sigma(t)};$$
that is,

(33.4)

(33.5)

where $F_{\parallel}$ and $F_{\perp}$ are the projections of $F$ tangent to and transversal to $N$. Notice that Newton's law for the constrained motion, (33.4), is of the same form as Newton's law on $M$.

Newton's Law of Motion and Killing Vector Fields

Suppose that $M$ is a manifold with a Riemannian metric. In fact, we do not need to assume that the metric is positive-definite. Let $\langle\ ,\ \rangle$ be the inner product on vector fields defining the metric, and let $\nabla$ be the Levi-Civita affine connection associated with the metric. Consider a force field $F$ and a particle moving along a curve $t \to \sigma(t)$ according to Newton's law:
$$\nabla(\sigma'(t)) = F(\sigma'(t), t). \tag{33.6}$$
(For simplicity, we assume that the mass tensor is the identity. It can always be absorbed in $F$.) Suppose $X$ is a vector field on $M$. Define a function $f_X$ on $T(M)$ as follows:
$$f_X(v) = \langle v, X(p) \rangle \quad \text{for } p \in M,\ v \in M_p.$$


We want to investigate whether $f_X$ is a "conserved quantity," that is, whether $(d/dt) f_X(\sigma'(t)) = 0$, where $\sigma(t)$ satisfies (33.6). In fact we have
$$\frac{d}{dt} f_X(\sigma'(t)) = \langle \nabla \sigma'(t), X \rangle + \langle \sigma', \nabla_{\sigma'} X \rangle = \langle F(\sigma'(t), t), X \rangle + \langle \sigma', \nabla_{\sigma'} X \rangle.$$
Suppose that $X$ is a Killing vector field, that is, is the infinitesimal generator of a one-parameter group of isometries of $M$. Then we know (Chapter 28) that the condition for this is that
$$\langle v, \nabla_v X \rangle = 0 \quad \text{for all } v \in T(M).$$
Hence we see that
$$\frac{d}{dt} f_X(\sigma'(t)) = \langle F(\sigma'(t), t), X(\sigma(t)) \rangle. \tag{33.7}$$
This is a remarkably simple formula that accounts for the comparative simplicity of the equations of motion of those mechanical systems whose configuration space admits a transitive group of motions.
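Formula (33.7) can be tried out numerically in the most elementary setting. The sketch below rests on assumptions that are not in the text: $M$ is the Euclidean plane with its flat connection, $X(x, y) = (-y, x)$ is the Killing field generating rotations (so that $f_X$ is the angular momentum), and the hypothetical force $F = -q$ is central, hence orthogonal to $X$ everywhere. The integrator is the velocity-Verlet scheme, chosen here because it respects rotational symmetry; it is not a method discussed in the book.

```python
# Conservation of f_X along a solution of q'' = F(q) when F is orthogonal to X.
# The force law, initial data and step size are assumptions for this sketch.

def killing_momentum(q, v):
    """f_X(v): inner product of v with X(q), where X(x, y) = (-y, x)."""
    return q[0] * v[1] - q[1] * v[0]

def force(q):
    """Hypothetical central force F = -q; it is orthogonal to X at every point."""
    return (-q[0], -q[1])

def integrate(q, v, dt=1e-3, steps=5000):
    """Velocity-Verlet integration of q'' = F(q) (unit mass, flat connection)."""
    a = force(q)
    for _ in range(steps):
        v = (v[0] + 0.5 * dt * a[0], v[1] + 0.5 * dt * a[1])
        q = (q[0] + dt * v[0], q[1] + dt * v[1])
        a = force(q)
        v = (v[0] + 0.5 * dt * a[0], v[1] + 0.5 * dt * a[1])
    return q, v

q0, v0 = (1.0, 0.0), (0.3, 0.8)
q1, v1 = integrate(q0, v0)
drift = abs(killing_momentum(q1, v1) - killing_momentum(q0, v0))
print(drift)
```

Replacing `force` by a field with a component along $X$ makes $f_X$ change at the rate given by the right-hand side of (33.7).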

Newton's Equation of Motion on a Lie Group; Euler's Equations of Rigid Body Motion

As we saw in Chapter 14, if a Lie group $G$ acts transitively and simply on the configuration space of a system of particles, Newton's equations of motion take the form
$$\nabla \sigma'(t) = F(\sigma'(t), t), \tag{33.8}$$
where $\nabla$ is the affine connection defined by a right-invariant metric $\langle\ ,\ \rangle$ on $G$. Consider $\mathbf{G}$, the Lie algebra of $G$, as the subalgebra of $V(G)$ consisting of the right-invariant vector fields. Then
$$\langle X, Y \rangle = \text{constant} \quad \text{for } X, Y \in \mathbf{G}. \tag{33.9}$$
Suppose $X: t \to X(t)$ is a curve in $\mathbf{G}$ such that $X(t)(\sigma(t)) = \sigma'(t)$ for all $t$, where $t \to \sigma(t)$ is the solution of (33.8). Then
$$\nabla_{\sigma'(t)}(X(t)) = F(\sigma'(t), t).$$
But the left-hand side equals
$$\frac{dX}{dt}(\sigma(t)) + \nabla_X X.$$

Now, for $Y \in \mathbf{G}$, we have, using (20.2),
$$\langle \nabla_X X, Y \rangle = X(\langle X, Y \rangle) - \tfrac{1}{2}\,Y(\langle X, X \rangle) + \langle X, [Y, X] \rangle = \langle X, [Y, X] \rangle,$$
by (33.9). Suppose now that $B(X, Y)$ is a nondegenerate, symmetric, bilinear form on $\mathbf{G}$ that is invariant under the adjoint representation; that is,
$$B([Z, X], Y) + B(X, [Z, Y]) = 0 \quad \text{for } X, Y, Z \in \mathbf{G}.$$
(Such a form, the Killing form, exists if $G$ is semisimple, a condition that is adequate for our purposes. See Helgason [1] or Hermann [8].) Let $A$ be the symmetric linear transformation: $\mathbf{G} \to \mathbf{G}$ such that
$$\langle X, Y \rangle = B(AX, Y) \quad \text{for } X, Y \in \mathbf{G}.$$
Then we have
$$\Bigl\langle \frac{dX}{dt}, Y \Bigr\rangle + \langle X, [Y, X] \rangle = \langle F, Y \rangle,$$
or
$$B\Bigl(A\,\frac{dX}{dt}, Y\Bigr) - B([AX, X], Y) = B(AF, Y).$$
Since this holds for all $Y \in \mathbf{G}$, we have
$$A\,\frac{dX}{dt} = [AX, X] + AF$$
or
$$\frac{dX}{dt} = A^{-1}[AX, X] + F. \tag{33.10}$$
In particular, if $A$ = identity, that is, if the metric $\langle\ ,\ \rangle$ is left-invariant also, then $dX/dt = F$. The equations are then (for the case $G = SO(3, R)$) equivalent to Euler's equations (16.14). In the force-free case, $F = 0$, they determine the geodesics of the right-invariant metric on $G$. (Arnold has particularly pointed out [2] the importance of these equations for problems in point and fluid mechanics.)
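The force-free case of (33.10) is easy to explore numerically for $G = SO(3, R)$. The sketch below depends on several assumptions that are not in the text: the Lie algebra is identified with $R^3$ so that the bracket becomes the cross product, $B$ is the Euclidean inner product (which is invariant under this bracket), $A$ is a diagonal matrix of hypothetical principal moments of inertia, and the sign conventions of that identification are taken for granted. The integrator is the classical fourth-order Runge-Kutta scheme.

```python
# Force-free equation dX/dt = A^{-1}[AX, X] on R^3 with [a, b] = a x b.
# INERTIA and the initial value x0 are illustrative assumptions.

def cross(a, b):
    """Cross product on R^3, standing in for the Lie bracket of so(3)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

INERTIA = (1.0, 2.0, 3.0)  # A = diag(INERTIA), hypothetical principal moments

def rhs(x):
    """Right-hand side A^{-1}[AX, X] of (33.10) with F = 0."""
    ax = tuple(i * c for i, c in zip(INERTIA, x))
    br = cross(ax, x)
    return tuple(b / i for b, i in zip(br, INERTIA))

def rk4_step(x, dt):
    """One classical Runge-Kutta step for dx/dt = rhs(x)."""
    k1 = rhs(x)
    k2 = rhs(tuple(c + 0.5 * dt * k for c, k in zip(x, k1)))
    k3 = rhs(tuple(c + 0.5 * dt * k for c, k in zip(x, k2)))
    k4 = rhs(tuple(c + dt * k for c, k in zip(x, k3)))
    return tuple(c + dt * (a + 2 * b + 2 * d + e) / 6
                 for c, a, b, d, e in zip(x, k1, k2, k3, k4))

def energy(x):
    """Half of the metric norm squared: B(AX, X) / 2."""
    return 0.5 * sum(i * c * c for i, c in zip(INERTIA, x))

def momentum_sq(x):
    """B(AX, AX), the squared length of AX."""
    return sum((i * c) ** 2 for i, c in zip(INERTIA, x))

x0 = (1.0, 0.2, 0.1)
x = x0
for _ in range(2000):
    x = rk4_step(x, 1e-3)

energy_drift = abs(energy(x) - energy(x0))
momentum_drift = abs(momentum_sq(x) - momentum_sq(x0))
print(energy_drift, momentum_drift)
```

Both $B(AX, X)$ and $B(AX, AX)$ are constant along exact solutions, since $B(X, [AX, X]) = 0$ and $B(AX, [AX, X]) = 0$ by the invariance of $B$; the numerical drifts are correspondingly small.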

Bibliography

ABRAHAM, R.
[1] "Foundations of Mechanics." Benjamin, New York, 1967.
AMBROSE, W.
[1] The Cartan structural equations in classical Riemannian geometry. J. Indian Math. Soc. 24, 23-76 (1960).
ARNOLD, V.
[1] Small denominators and problems of stability of motion in classical and celestial mechanics. Russian Math. Surveys 18, 85-192 (1963).
[2] Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l'hydrodynamique des fluides parfaits. Ann. Inst. Grenoble 16, 319-361 (1966).
AUSLANDER, L., and MACKENZIE, R. E.
[1] "Introduction to Differentiable Manifolds." McGraw-Hill, New York, 1963.
BISHOP, R., and CRITTENDEN, R.
[1] "Geometry of Manifolds." Academic Press, New York, 1964.
BLISS, G. A.
[1] The problem of Mayer with variable end points. Trans. Am. Math. Soc. 19, 305-314 (1918).
[2] "Lectures on the Calculus of Variations." Univ. of Chicago Press, Chicago, Illinois, 1946.
BRILLOUIN, L.
[1] "Tensors in Mechanics and Elasticity." Academic Press, New York, 1964.
CARATHEODORY, C.
[1] Untersuchungen über die Grundlagen der Thermodynamik. Math. Ann. 67, 355-386 (1909).
[2] "Variationsrechnung." Teubner, Leipzig, 1935.
CARTAN, E.
[1] "Leçons sur les Invariants Intégraux." Hermann, Paris, 1922.
[2] "Géométrie des Espaces de Riemann." Gauthier-Villars, Paris, 1952.
[3] "Leçons sur la Méthode de la Repère Mobile." Gauthier-Villars, Paris, 1936.
CHERN, S. S., and KUIPER, N.
[1] Some theorems on the isometric imbedding of compact Riemannian manifolds in Euclidean space. Ann. Math. 56, 422-430 (1952).
CHEVALLEY, C.
[1] "Lie Groups." Princeton Univ. Press, Princeton, New Jersey, 1946.
CHOW, W. L.
[1] Über Systeme von linearen partiellen differential Gleichungen. Math. Ann. 117, 89-105 (1940).
COURANT, R., and HILBERT, D.
[1] "Methods of Mathematical Physics," Vol. II. Wiley (Interscience), New York, 1962.
EINSTEIN, A.
[1] "The Meaning of Relativity." Princeton Univ. Press, Princeton, New Jersey, 1950.
FEDERER, H.
[1] Curvature measures. Trans. Am. Math. Soc. 93, 418-491 (1959).


FLANDERS, H.
[1] "Differential Forms, with Application to the Physical Sciences." Academic Press, New York, 1963.
FROHLICHER, A.
[1] Zur Differentialgeometrie der komplexen Structuren. Math. Ann. 129, 50-95 (1955).
GARABEDIAN, P.
[1] "Partial Differential Equations." Wiley, New York, 1964.
GELFAND, I. M., and FOMIN, S.
[1] "Calculus of Variations." Prentice Hall, Englewood Cliffs, New Jersey, 1963.
GELFAND, I. M., and SILOV, G. E.
[1] "Generalized Functions." Academic Press, New York, 1964.
GOLDSTEIN, H.
[1] "Classical Mechanics." Addison-Wesley, Reading, Massachusetts, 1951.
GOLUBEV, V. V.
[1] "Lectures on Integration of the Equations of Motion of a Rigid Body about a Fixed Point." Off. of Tech. Serv., U. S. Dept. of Commerce, Washington, D. C., 1960.
HALKIN, H.
[1] The principle of optimal evolution. In Intern. Symp. on Nonlinear Differential Equations and Nonlinear Mechanics, 1961 (J. P. LaSalle and S. Lefschetz, eds.), pp. 184-302. Academic Press, New York, 1963.
HARTMAN, P., and NIRENBERG, L.
[1] On spherical image maps whose Jacobians do not change sign. Am. J. Math. 81, 901-920 (1959).
HELGASON, S.
[1] "Differential Geometry and Symmetric Spaces." Academic Press, New York, 1962.
HERMANN, R.
[1] On geodesics that are also orbits. Bull. Am. Math. Soc. 66, 91-93 (1960).
[2] On the accessibility problem in control theory. In Intern. Symp. on Nonlinear Differential Equations and Nonlinear Mechanics, 1961 (J. P. LaSalle and S. Lefschetz, eds.), pp. 325-332. Academic Press, New York, 1963.
[3] Convexity and pseudoconvexity for complex manifolds. J. Math. Mech. 13, 243-248 (1964).
[4] Second variation for variational problems in canonical form. Bull. Am. Math. Soc. 71, 145-149 (1965).
[5] Second variation for minimal submanifolds. J. Math. Mech. 16, 473-492 (1966).
[6] Remarks on the foundations of integral geometry. Rend. Circ. Mat. Palermo 9, 91-96 (1960).
[7] Some differential-geometric aspects of the Lagrange variational problem. Illinois J. Math. 6, 634-673 (1962).
[8] "Lie Groups for Physicists." Benjamin, New York, 1966.
[9] Equivalence invariants for submanifolds of homogeneous spaces. Math. Ann. 158, 284-289 (1965).
KLINGENBERG, W.
[1] Zur affinen Differentialgeometrie. Math. Z. 54, 65-80 (1951).
KOBAYASHI, S., and NOMIZU, K.
[1] "Foundations of Differential Geometry." Wiley (Interscience), New York, 1963.
KOHN, J. J.
[1] Harmonic integrals on strongly pseudoconvex manifolds. Ann. Math. 78, 112-148 (1963).
LEVI-CIVITA, T.
[1] "The Absolute Differential Calculus." Blackie, London, 1928.

Bibliography

433

[2] “ CaractCristiques des Systkmes Diff&entielles.” F. Alcon, Paris, 1932. LICHNEROWICZ, A. [l] “ ThCories Relativistes de la Gravitation.” Masson, Paris, 1955. MASSEY, W. S. [l] Surfaces of Gaussian curvature zero in Euclidean 3-space. Tshuku Math. J. 14, 73-79 (1962). MILNOR,J. [I] “Morse Theory.” Princeton Univ. Press, Princeton, New Jersey, 1963. MORSE,M. [I] “Calculus of Variations in the Large.” Am. Math. SOC.,Providence, Rhode Island, (1935). MUNKRES, J. [l] “ Elementary Differential Topology.” Princeton Univ. Press, Princeton, New Jersey, 1963. A., and NIRENBERG, L. NEWLANDER, [l] Complex analytic coordinates in almost complex manifolds. Ann. Math. 65, 391-404 (1957). NOMIZU,K., and OZEKI,H. [l] The existence of complete Riemannian metrics. Proc. Am. Math. Soc. 12, 889-891 (1961). OTSUKI,K. 111 On the existence of solutions of a system of quadratic equations. Proc. Japan Acad. 29, 99-100 (1953). PONTRJAGIN, L. [I] “ The Mathematical Theory of Optimal Processes.” Wiley (Interscience), New York, 1962. PRAGER, W. [I] “Introduction to the Mechanics of Continua.” Ginn, Boston, 1961. RADON, J. [ l ] Zum problem von Lagrange. Hamburg Math. Einzelschriften, Number 2. Teubner, Leipzig (1928). ROXIN,E. [ I ] A geometric interpretation of Pontrjagin’s maximal principle. In Intern. Symp. on Nonlitiear Differential Equations and Nonlinear Mechanics, 1961 (J. P. LaSalle and S. Lefschetz, eds.). Academic Press, New York, 1963. SPIVAK,M. [I] “Calculus on Manifolds.” Benjamin, New York, 1965. STERNBERG, S. [ I ] “Lectures on Differential Geometry.” Prentice Hall, Englewood Cliffs, New Jersey, 1964. F. TRICOMI, [ I ] “Differential Equations.” Hafner Publ. Co., New York, 1961. WHITTAKER, E. T., and WATSON, G. N. [I] “A Course of Modern Analysis.” Cambridge Univ. Press, London and New York, 1940. WHITTAKER, E. T. [l] “A Treatise on Analytical Dynamics of Particles and Rigid Bodies.” Cambridge Univ. Press, London and New York, 1959. YOSIDA, K. 
[l] Introduction to Functional Analysis.” Springer, Berlin, 1965. ‘I


Subject Index

A

Abelian, 95
Abelian group, 180
Action, 135, 136
Addition formula, 233, 239
Adjoint action, 88, 90
Affine connection, 261, 262, 269, 270, 313, 378, 381, 384, 425, 427
Affine transformations, 89
Algebra, 73
Algebraic topology, 62
Angular momentum, 100, 109
Arc length, 274
Arcwise connected, 24
Associative law, 81
Atlas, 24
Augmented index, 414
Autonomous system, 36
Automorphism, 342

B

Barriers, 36
Betti number, 182
Bianchi identities, 341
Boundary, 113
    condition, 402, 406, 415
Bounded operator, 94
Bracket, 87

C

Calculus of variations, 99, 130, 151-153
Canonical coordinate system, 95
Canonical coordinate system of the second kind, 95
Canonical form, 40, 125, 178
Canonical neighborhood, 95
Canonical transformations, 81, 141, 179, 231
Cartan form, 118
Cartan-1 form, 154, 170, 183, 213
Cartan-Maurer form, 217
Cartan subalgebra, 228
Cartan subgroup, 227
Cauchy-Riemann equations, 421
Celestial mechanics, 136, 137, 139, 231
Characteristic, 154, 167, 175, 182, 184, 189, 213, 363, 364, 394, 400
    curve, 118, 147, 171, 172
    function, 162
Chart, 24
Chow’s theorem, 247
Christoffel symbols, 262, 324
Classical groups, 96
Classical mechanics, 81, 98, 152, 179, 427
Closed subgroups, 92
Commutator, 84
Compact, 6, 24
Compatible mapping, 309
Complete solution, 137
Completely integrable, 68
Completely integrable systems, 71
Completely integrable vector field system, 78
Completeness, 285, 287, 293, 299
Completion of a path system, 243
Complex
    Lie algebra, 84
    manifold, 420, 426
    submanifold, 422
    vector space, 96
Conformal Riemannian metric, 288
Conjugate point, 293, 294, 296, 299, 305, 313, 316, 332, 411
Configuration space, 176, 185, 219, 429
Connected component, 345
Connected Lie group, 91
Connected subgroup, 92
Connection, 24, 95, 97
    automorphism, 379, 380, 384
    forms, 324, 336, 381
Conservation laws, 100
Conserved quantity, 99, 429
Constant curvature, 359
Continuous induction, 284
Continuous mechanics, 104
Contraction, 11, 47, 114
Constraint subset, 117
Convex, 21, 376
    hypersurface, 375
Coordinate neighborhood, 24, 51
Coordinate patch, 62
Coordinate system, 25
Cosets, 93
Covariant, 98
    derivative, 293, 310, 312, 339, 357, 359, 360, 425
    differentiation, 262-264, 268, 269
    tensors, 25
Covector, 11, 12, 17, 19, 20, 23
Covering, 24
Covering space or map, 286, 290
Critical points, 114, 147, 181, 242
Cross section, 9, 10, 17-20, 22, 26, 63
Curl, 3, 99
Curvature, 266, 297, 303, 323, 326, 335, 359, 382

D

D’Alembertian, 386
D’Alembert’s principle, 102, 109, 151, 427, 428
Deformation, 152, 153
    of submanifolds, 362
Derivation, 10, 22, 23, 34, 63
Determinant, 14, 17, 20, 64
Developable surface, 362, 365
Diffeomorphism, 25, 28, 39, 51
Differential, 19, 25
    forms, 4, 14, 17-20, 23, 21, 46
    ideal, 113
    manifold, 23, 27
    one parameter subgroup, 88
    operator, 10, 34, 402
Dirac delta function, 56
Disconnected, 97
Distance function, 281, 282
Divergence, 3, 139, 386, 399
Domain, 21, 23, 35, 39, 107
    of convergence, 83
    of holomorphy, 426
Dot product, 3
Dual mapping, 15
Dual space, 11
Duality principle for path system, 250
Dynamics, 102

E

Earth, 140
Eigenvalue, 209, 210, 337, 390
Einsteinian mechanics, 188, 192
Electromagnetism, 98
Elliptic functions, 227, 232, 240
Elliptic integral, 235
Elliptical orbit, 140
Energy, 99, 207
Entropy, 254
Equation of continuity, 106
Euclidean dot product, 98
Euclidean space, 3, 22, 24, 98, 164, 276
Euclidean geometry, 102, 106
Euler angles, 223, 224, 227, 231
Euler equations, 42, 99, 103, 151-154, 163, 168, 169, 222, 299
Euler’s equations of fluid motion, 108
Euler relation, 161, 165, 184, 389
Eulerian, 104
Exponential mapping, 91
Extended curve, 154
Exterior
    algebra, 113, 177, 178
    derivative, 20, 46, 113
    differentiation, 18
    product, 12
External forces, 100
Extremal, 116, 152, 158, 165, 171, 174, 183, 184
    curve, 119
    field, 142, 143, 162-164, 168, 173

F

Fermat’s principle, 168
Fiber, 9
    bundle, 348, 349
First fundamental form, 319
Flat coordinate system, 66
Fluid mechanics, 98
Focal point, 330, 331, 344, 403, 407, 408
Foliation, 71
Force law, 98
Foundations of Lie group theory, 90
Frobenius complete integrability, 364
Functional analysis, 85
Functionally independent, 75, 79
Fundamental solution, 393

G

Galilean group, 192, 199, 203, 207-209
Galilean transformation, 194
Gaussian curvature, 308, 362, 366
Generalized functions, 56, 59
Generalized manifold, 349
General position, 31, 33
General relativity, 399
Geodesic ball, 288
Geodesically convex, 334, 335
Geodesic deformation, 292
Geodesic triangle, 308
Geodesics, 191, 261, 275, 278, 282, 285, 286, 292, 297, 299, 314, 331
Geometric physical compatibility conditions, 374, 400
Geometrical optics, 136
Global integrals, 71
Gram-Schmidt orthogonalization, 320
Grassmann variety, 359
Grassmann manifolds, 93
Gravitation, 98
Green’s formula, 393
Gradient, 3, 277, 336, 386
Group, 81

H

Hamilton equations, 99, 103, 104, 118, 129, 130, 132, 134, 136, 140, 154, 155
Hamilton-Jacobi equation, 134, 135, 137, 139, 173, 386, 389
Hamilton-Jacobi partial differential equation, 131
Hamilton-Jacobi theory, 122, 129, 130, 134-136, 139, 168, 174
Hamiltonian, 130-132, 137-141, 154, 155, 178, 182, 185, 186, 215, 225, 226
    system, 179
Harmonic function, 390, 393
Hausdorff, 6, 24
Hermitian bilinear, 96
Hessian, 147, 375, 377
Hilbert space, 94
Holomorphic, 420
Hopf-Rinow theorem, 284, 286, 289, 301
Homeomorphism, 71, 87, 91, 177
Hyperplanoids, 424
Hypersurface, 59, 60, 107, 336

I

Ideal, 175
Identity map, 92
Immersed submanifold, 318
Immersion, 30, 32
Implicit function theorem, 25, 28, 33, 58
Index, 404
Infinite dimensional, 82
Infinite dimensional manifold, 114, 241
Infinitesimal deformation, 114, 116, 121, 152, 280, 293
Infinitesimal generator, 44, 83, 87-89, 95, 159, 354, 378
Infinitesimal version of adjoint action, 90
Inner product, 12
Integral, 40, 51
    curve, 35, 44, 68, 89
    curves, of vector fields, 178
    equation, 85
    function, 64, 75
    geometry, 50, 57, 59
    manifold, 116
    map, 64
    submanifold, 67, 123, 124, 126, 127
Integrability condition, 65, 421
Integration
    factor, 252
    on manifolds, 60
    over fibers, 55, 57
Intersection, 31
Inverse, 81
Inverse function theorem, 28
Involution, 179
Involutive automorphism, 228
Isoenergetic reduction, 190
Isometric, 287, 362
    imbedding, 362, 366, 368
Isometries, 342, 361
Isomorphic, 78
Isoparametric function, 391
Isotropy subgroup, 93, 94, 343, 346, 353, 359

J

Jacobi bracket, 18, 34, 36, 37, 44, 63, 263, 353
Jacobi identity, 35, 84
Jacobian, 51, 135, 138
Jacobi vector fields, 291, 293, 294, 297, 298, 315, 331, 333, 356
Jacobian matrix, 25

K

Kepler problem, 141
Kernel, 16
Killing form, 430
Killing vector field, 354-356, 359
Kinetic energy, 99, 100, 102, 156, 186
Kronecker delta, 11

L

L'Hôpital's rule, 306
Lagrange
    multipliers, 147, 246, 247, 255
    multiplier rule, 145
    problem, 103
    variational problem, 142, 148, 151, 244
Lagrangian, 117, 142, 143, 146, 148, 151, 152, 154, 155, 157, 160, 162, 164, 165, 167, 168, 170, 171, 174, 183, 185, 187, 219, 220, 224, 225
Laplace-Beltrami operator, 386, 393, 398
Laplace operator, 27
Leaf, 69, 70
Lebesgue integral, 53
Left invariant, 89
Left translation, 88, 93
Legendre condition, 147, 151
Levi-Civita affine connection, 273
Levi form, 423, 425
Lie
    algebra, 35, 63, 65, 78, 84, 217, 353
    algebra of Lie group, 86
    algebra of vector fields, 75, 174, 178
    derivative, 10, 18, 40, 41, 46, 47, 106, 354, 396
    groups, 26, 34, 70, 81, 175, 342
    series, 249
    subalgebra, 65, 74, 91
    theory, 73
    transformation group, 82, 94
Linear automorphism, 94
Linear homogeneous differential equations, 79
Linear Lie algebra, 78
Linear operator, 85
Linear part of a vector field, 379
Linear representation, 82, 88
Linear transformation, 83
Local moving frame, 321
Lorentz group, 192, 199, 203, 205, 207, 209, 342
Lorentz metric, 386

M

Mayer variational problem, 244
Manifold, 26, 30, 35, 50
Mass, 98, 196
Matrix, 15
    Riccati systems, 77
Maximal connected integral submanifold, 67, 69, 91
Maximal integral submanifold, 126, 127
Maximal point, 64, 65, 345, 346
Maximal rank mapping, 29, 32, 58, 158
Maximal subalgebra, 78
Maxwell's equations, 201
Mean, 140
Mean curvature, 391, 392
Measure zero, 57
Metric space, 285, 348
Metric structure, 99
Michelson-Morley experiment, 201
Minkowskian geometry, 195
Minimal submanifold, 331, 332
Module, 10, 22, 30, 46
Moment of inertia, 220
Momentum, 100, 207
    space, 176
Monge system, 254, 255
Morse index theorem, 401, 419
Moving frames, 26, 383, 384
Multilinear, 11
Multiple integral, 113

N

Negatively oriented coordinate system, 50
Newtonian mechanics, 185, 186, 190, 192, 195
Newton's equations, 427
Newton's equations of motion, 100, 151
Newton's law of motion, 98, 155
Nilpotents, 95
Nonholonomic, 151
Nonsingular points, 57
Normal bundle, 343
Normal vector bundle, 318
Number theory, 71

O

Observables, 178, 179
One parameter
    group, 38, 39, 44, 48, 94, 159, 248
    subgroup, 82, 353
Operator power series, 83
Optimality, 145
Optimal control, 151, 256
Orbit, 83, 93, 178, 229, 343, 344, 346, 351
    space, 347, 349
Ordinary differential equations, 38
Orientable, 50, 57
Orientation, 50
Oriented manifold, 60, 62
Orthogonal group, 218
Orthogonal matrices, 96
Orthogonal matrix, 217
Orthogonal transformations, 156
Osculating space, 381

P

Parallel translation, 264, 265, 270, 359
Partial differential equations, 59
Particle, 98
Partition of unity, 50, 52, 61, 70
Path, 241, 277, 279
    system, 241, 249
Periodic, 139
Permutations, 81
Perpendicular, 164, 167, 173
Perpetual motion machine, 244
Perturbation, 139
    theory, 139
Peter-Weyl theorem, 208
Pfaffian equation, 243, 254, 255
Phase space, 176, 178, 179, 242
Plancherel theorem, 208
Planck's constant, 136
Poincaré-Birkhoff fixed point theorem, 182
Poincaré lemma, 176
Point set topology, 6
Poisson bracket, 176-178, 231
Polar coordinates, 27
Positive orientation, 60
Positive oriented coordinate system, 50
Potential energy, 99
Power series, 83, 85
Principal axes, 220
Principal curvature, 325, 365
Principal point, 345, 346
Principle of least action, 136
Principle of Maupertuis, 188, 190
Projection, 93
Projective space, 359
Prolonged vector field, 158
Proper map, 85
Pseudoconvexity, 425, 426
Pseudo-Riemannian manifold, 392, 397, 400
Pseudo-Riemannian metric, 195, 272

Q

Quadrature, 43, 75, 227
Quantum field theory, 9
Quantum mechanics, 81, 136, 179, 188

R

Rational Lie algebra, 84
Rauch comparison theorem, 313
Ray(s), 144, 168, 172, 201
Real Lie algebra, 84
Regular variational problem, 155, 161, 162, 173, 186
Regularly imbedded submanifold, 70, 71, 343, 344, 351
Relativity, 272
Repère mobile, 321
Riccati equation, 44
Ricci curvature, 305
Ricci identity, 302, 358
Ricci tensor, 299, 300
Riemann integral, 51, 53, 58, 60
Riemannian affine connection, 272-275
Riemannian connection, 273, 274, 321-323
Riemannian geometry, 257, 261
Riemannian metric or manifold, 27, 165, 191, 261, 272, 275, 277, 280, 287, 289, 292, 294, 297, 308, 310, 312, 318, 322, 331, 342, 349, 362, 420
Riemannian space, 228
Rigid body, 109, 172, 216, 217, 219, 221, 223, 232, 429
Rigidly related submanifolds, 366
Right invariant, 89
Right translation, 88
Routhian, 215
Ring, 8

S

Sard’s theorem, 331
Scalar field, 3
Scalar potential, 185, 186
Schwarz inequality, 279, 280
Second fundamental form, 319, 320, 326, 333, 335, 337, 338, 363, 373, 376, 382, 384, 425
Second variation, 117, 257
    formula, 119, 121, 291, 295, 325, 327, 335
Sectional curvature, 302, 305, 308, 316, 337
Self-parallel curves, 265
Semigroups, 180
Simply connected, 181, 270, 290
Singularities of mappings, 54
Skew symmetric, 17
    matrices, 96
Slice, 69, 70
Special relativity, 203, 208, 342
Sphere, 23, 230
Spherical harmonics, 208, 389, 390
Soap bubbles, 332
Solvable, 214
Stokes’ formula, 57, 60, 113, 114, 116, 392
Stokes’ theorem, 107
Stress tensor, 107
Structure coefficients, 220
Sturm comparison theorem, 315
Sturm-Liouville theory, 401
Subgroup, 92
Submanifolds, 28, 58, 65
    map, 30, 69
Sum of one-parameter subgroups, 87
Summation convention, 21, 336
Sun, 140
Surface area, 332
Symmetric space, 358, 359
Symmetric spaces, 227, 228
Symmetry(ies), 73, 170, 174, 184, 185, 187

T

Tangent bundle, 7, 24, 63, 89, 152
Tangent space, 8, 9, 114
Tangent vector, 7, 8, 17, 19, 24, 152
Taylor’s expansion, 86
Taylor’s formula for covariant derivatives, 310
Tensor algebra, 11
Tensor analysis, 4, 14, 21, 26, 262
Tensor field, 373, 378
Thermodynamics, 254
Top, 226
Topology, 23
Topological groups, 81, 345, 349
    theory, 72
Topological vector space, 82
Torsion, 266
    tensor, 374, 381
Total angular momentum, 223, 233
Total energy, 99, 103
Totally geodesic submanifold, 337, 338, 339, 340, 351, 352, 365, 375, 381
Transformation group, 81, 87, 93, 354
Transition function, 382
Transversal, 116, 120
    geodesic, 350
    Jacobi field, 330

V

Vector analysis, 3, 62, 98
Vector bundle, 9, 11
Vector field, 3, 10, 19, 22, 24, 27, 34, 35, 39, 43, 46
    system, 63, 68
Vector potential, 185, 186
Vector product, 3
Vector space, 6, 11, 20, 401
Velocity, 196, 199, 201, 203, 204, 208
Volume
    element, 14, 19, 58
    form, 54
Volume forces, 107

W

Wave, 168
Wave equation, 202
Wave fronts, 144, 201

Mathematics in Science and Engineering

A Series of Monographs and Textbooks

Edited by RICHARD BELLMAN, University of Southern California

1. TRACY Y. THOMAS. Concepts from Tensor Analysis and Differential Geometry. Second Edition. 1965
2. TRACY Y. THOMAS. Plastic Flow and Fracture in Solids. 1961
3. RUTHERFORD ARIS. The Optimal Design of Chemical Reactors: A Study in Dynamic Programming. 1961
4. JOSEPH LASALLE and SOLOMON LEFSCHETZ. Stability by Liapunov’s Direct Method with Applications. 1961
5. GEORGE LEITMANN (ed.). Optimization Techniques: With Applications to Aerospace Systems. 1962
6. RICHARD BELLMAN and KENNETH L. COOKE. Differential-Difference Equations. 1963
7. FRANK A. HAIGHT. Mathematical Theories of Traffic Flow. 1963
8. F. V. ATKINSON. Discrete and Continuous Boundary Problems. 1964
9. A. JEFFREY and T. TANIUTI. Non-Linear Wave Propagation: With Applications to Physics and Magnetohydrodynamics. 1964
10. JULIUS T. TOU. Optimum Design of Digital Control Systems. 1963
11. HARLEY FLANDERS. Differential Forms: With Applications to the Physical Sciences. 1963
12. SANFORD M. ROBERTS. Dynamic Programming in Chemical Engineering and Process Control. 1964
13. SOLOMON LEFSCHETZ. Stability of Nonlinear Control Systems. 1965
14. DIMITRIS N. CHORAFAS. Systems and Simulation. 1965
15. A. A. PERVOZVANSKII. Random Processes in Nonlinear Control Systems. 1965
16. MARSHALL C. PEASE, III. Methods of Matrix Algebra. 1965
17. V. E. BENES. Mathematical Theory of Connecting Networks and Telephone Traffic. 1965
18. WILLIAM F. AMES. Nonlinear Partial Differential Equations in Engineering. 1965
19. J. ACZEL. Lectures on Functional Equations and Their Applications. 1966
20. R. E. MURPHY. Adaptive Processes in Economic Systems. 1965
21. S. E. DREYFUS. Dynamic Programming and the Calculus of Variations. 1965
22. A. A. FEL’DBAUM. Optimal Control Systems. 1965
23. A. HALANAY. Differential Equations: Stability, Oscillations, Time Lags. 1966
24. M. NAMIK OGUZTORELI. Time-Lag Control Systems. 1966
25. DAVID SWORDER. Optimal Adaptive Control Systems. 1966
26. MILTON ASH. Optimal Shutdown Control of Nuclear Reactors. 1966
27. DIMITRIS N. CHORAFAS. Control System Functions and Programming Approaches (In Two Volumes). 1966
28. N. P. ERUGIN. Linear Systems of Ordinary Differential Equations. 1966
29. SOLOMON MARCUS. Algebraic Linguistics; Analytical Models. 1967
30. A. M. LIAPUNOV. Stability of Motion. 1966
31. GEORGE LEITMANN (ed.). Topics in Optimization. 1967
32. MASANAO AOKI. Optimization of Stochastic Systems. 1967
33. HAROLD J. KUSHNER. Stochastic Stability and Control. 1967
34. MINORU URABE. Nonlinear Autonomous Oscillations. 1967
35. F. CALOGERO. Variable Phase Approach to Potential Scattering. 1967
36. A. KAUFMANN. Graphs, Dynamic Programming, and Finite Games. 1967
37. A. KAUFMANN and R. CRUON. Dynamic Programming: Sequential Scientific Management. 1967
38. J. H. AHLBERG, E. N. NILSON, and J. L. WALSH. The Theory of Splines and Their Applications. 1967
39. Y. SAWARAGI, Y. SUNAHARA, and T. NAKAMIZO. Statistical Decision Theory in Adaptive Control Systems. 1967
40. RICHARD BELLMAN. Introduction to the Mathematical Theory of Control Processes, Volume I. 1967 (Volumes II and III in preparation)
41. E. STANLEY LEE. Quasilinearization and Invariant Imbedding. 1968
42. WILLIAM AMES. Nonlinear Ordinary Differential Equations in Transport Processes. 1968
43. WILLARD MILLER, JR. Lie Theory and Special Functions. 1968
44. PAUL B. BAILEY, LAWRENCE F. SHAMPINE, and PAUL E. WALTMAN. Nonlinear Two Point Boundary Value Problems. 1968
45. IU. P. PETROV. Variational Methods in Optimum Control Theory. 1968
46. O. A. LADYZHENSKAYA and N. N. URAL’TSEVA. Linear and Quasilinear Elliptic Equations. 1968
47. A. KAUFMANN and R. FAURE. Introduction to Operations Research. 1968
48. C. A. SWANSON. Comparison and Oscillation Theory of Linear Differential Equations. 1968
49. ROBERT HERMANN. Differential Geometry and the Calculus of Variations. 1968
50. N. K. JAISWAL. Priority Queues. 1968

In preparation

ROBERT P. GILBERT. Function Theoretic Methods in the Theory of Partial Differential Equations
YUDELL LUKE. The Special Functions and Their Approximations (In Two Volumes)
HUKUKANE NIKAIDO. Convex Structures and Economic Theory
V. LAKSHMIKANTHAM and S. LEELA. Differential and Integral Inequalities
KING-SUN FU. Sequential Methods in Pattern Recognition and Machine Learning
